JP3974412B2 - Audio converter

Info

Publication number: JP3974412B2
Application number: JP2002014834A
Authority: JP
Inventors: 研治水谷; 由実脇田; 謙二松井; 良文 ▲ひろ▼瀬; 英嗣前川; 伸一芳澤
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2001-01-24
Filing date: 2002-01-23
Publication date: 2007-09-12
Anticipated expiration: 2022-01-23
Also published as: JP2003288339A

Description

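【０００１】
【Technical Field of the Invention】
The present invention relates to a speech conversion device that, for example, converts a spoken utterance in a source language into a target language and outputs the result as speech.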
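【０００２】
【Prior Art】
Speech interpretation technology has been developed as software intended to run on high-performance workstations and personal computers, and when the scope of conversation is restricted to, for example, travel conversation, its performance has reached a practical level. For ordinary users to use it daily as a speech interpretation device, however, it is necessary to design hardware small enough to be carried easily on an overseas trip, design a user interface that is simple to operate, and port software with equivalent functions to that hardware.
【０００３】
Conventionally, work has proceeded on porting speech interpretation software to notebook personal computers of about B5 size.
【０００４】
Meanwhile, recent advances in hardware technology have made it possible to implement a speech-input translation function on small portable information devices, mainly for conversation used in overseas travel. The translation function of this other prior art is bidirectional; for example, it provides both Japanese-to-English and English-to-Japanese conversion.
【０００５】
Inventions of this other prior art include a foreign language translation device (see Japanese Unexamined Patent Publication No. H8-77176) and a speech-input translation device (see Japanese Unexamined Patent Publication No. H8-278972). In these inventions, the shape of the device, the arrangement of the display unit, and the displayed content are determined so that two people who speak different languages can converse face to face using a single device.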
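【０００６】
【Problems to Be Solved by the Invention】
However, a notebook personal computer of about B5 size is not small enough for a user to carry easily and use in various places. Because it must be operated with an ordinary keyboard and mouse, its user interface is not easy to use either. Furthermore, the amount of computational resources required for speech recognition, such as CPU performance and working memory capacity, is generally proportional to the size of the recognition vocabulary.
【０００７】
Because computational resources are limited on small hardware, it is difficult to implement a recognition vocabulary containing all the words a speech interpretation device needs, which lowers the usefulness of the device. This is the problem with the prior art described above (the first problem).
【０００８】
Next, the problem with the other prior art is described (the second problem).
【０００９】
In the translation devices of the other prior art, when the device is small enough to fit in a clothing pocket and its display area has a low resolution, not all of the information needed by the two users can be displayed, so the usability of the translation device suffers. Mounting multiple display units increases power consumption and shortens the operating time of the device. In addition, because the translation device does not handle the two users' utterances exclusively, overlapping utterances lower the speech recognition rate and degrade the performance of the device.
【００１０】
In view of the first problem of the conventional speech interpretation device, an object of the present invention is to provide a speech conversion device that can be made smaller than before and can be operated easily.
【００１１】
In view of the second problem of the conventional translation devices, an object of the other inventions of technology related to the present invention is to provide a speech conversion device, a speech conversion method, a program, and a medium that improve the usability of the displayed content compared with the prior art.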
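【００１２】
【Means for Solving the Problems】
The first aspect of the present invention (corresponding to claim 1) is a speech conversion device comprising: speech input means for inputting speech in a first language;
speech recognition means for recognizing the input speech;
an example database that stores in advance examples in the first language and dependency relations between predetermined words among the words constituting each example;
first extraction/display means that, when the speech recognition result contains the predetermined words, uses the dependency relations of those words to extract, from the first-language examples stored in the example database, examples corresponding to the speech, and displays one or more word strings constituting the examples;
conversion target selection means for selecting, from the displayed word strings constituting the first-language examples, a word string intended for conversion into a second language;
a word class dictionary in which words contained in the examples are classified in advance and which stores in advance words that can replace the classified words;
second extraction/display means that, when a classified word in the selected word string is specified, extracts words of the same class as the specified classified word from the word class dictionary as replacement candidates and displays them;
candidate selection means for selecting one of the displayed candidate words of the same class; and
conversion means that determines the conversion target for the second language on the basis of the selected word string constituting the first-language example and the selected candidate word of the same class, and converts the determined conversion target into spoken language in the second language.
(End of the first aspect.)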
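【００１３】
The second aspect of the present invention (corresponding to claim 2) is the speech conversion device of the first aspect, wherein the first extraction/display means has a display unit with a display screen for displaying the plurality of word strings subject to selection and the selected word string in respective predetermined areas, and
the second extraction/display means displays the candidate words superimposed, in window form, on part of the display screen.
【００１４】
The third aspect of the present invention (corresponding to claim 3) is the speech conversion device of the second aspect, wherein, when displaying the selected word string on the display screen, the first extraction/display means also attaches to part of the word string information indicating that the corresponding candidate words can be displayed.
【００１５】
The fourth aspect of the present invention (corresponding to claim 4) is the speech conversion device of the third aspect, comprising screen display specifying means for specifying, on the display screen, the part of the word string for which the attached information is displayed.
【００１６】
The fifth aspect of the present invention (corresponding to claim 5) is the speech conversion device of the first aspect, wherein the conversion means determines as the conversion target the result of replacing the specified part of the word string with the selected candidate word.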
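【００１７】
An invention of technology related to the present invention is a speech conversion method for a speech conversion device that converts input speech in a first language into spoken language in a second language, the method comprising:
a speech input step of inputting the speech in the first language;
a speech recognition step of recognizing the input speech;
a first extraction/display step of, when the speech recognition result contains predetermined words, using the dependency relations of those words to extract examples corresponding to the speech from the first-language examples stored in an example database of the speech conversion device, the example database storing in advance the first-language examples and the dependency relations between the predetermined words among the words constituting each example, and displaying one or more word strings constituting the examples;
a conversion target selection step of selecting, from the displayed word strings constituting the first-language examples, a word string intended for conversion into the second language;
a second extraction/display step of, when a classified word in the selected word string is specified, extracting words of the same class as the specified classified word as replacement candidates from a word class dictionary of the speech conversion device, in which words contained in the examples are classified in advance and which stores in advance words that can replace the classified words, and displaying them;
a candidate selection step of selecting one of the displayed candidate words of the same class; and
a conversion step of determining the conversion target for the second language on the basis of the selected word string constituting the first-language example and the selected candidate word of the same class, and converting the determined conversion target into spoken language in the second language.
(End of the speech conversion method.)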
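【００１９】
A further invention of technology related to the present invention is a recording medium recording a program that causes a computer to execute the following steps of the above speech conversion method: the speech recognition step of recognizing the input speech;
the first extraction/display step of, when the speech recognition result contains predetermined words, using the dependency relations of those words to extract examples corresponding to the speech from the first-language examples stored in the example database of the speech conversion device, the example database storing in advance the first-language examples and the dependency relations between the predetermined words among the words constituting each example, and displaying one or more word strings constituting the examples;
the conversion target selection step of selecting, from the displayed word strings constituting the first-language examples, a word string intended for conversion into the second language;
the second extraction/display step of, when a classified word in the selected word string is specified, extracting words of the same class as the specified classified word as replacement candidates from the word class dictionary of the speech conversion device, in which words contained in the examples are classified in advance and which stores in advance words that can replace the classified words, and displaying them;
the candidate selection step of selecting one of the displayed candidate words of the same class; and
the conversion step of determining the conversion target for the second language on the basis of the selected word string constituting the first-language example and the selected candidate word of the same class, and converting the determined conversion target into spoken language in the second language.
The recording medium is processable by a computer.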
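【００２２】
With the above configuration, the present invention can provide, for example, small hardware that the user holds in one hand and operates easily with buttons or a touch panel. It is also possible, for example, to classify and hold the words contained in the example sentences to be interpreted, and to implement in the speech recognition unit only a small number of class-representative words as the recognition vocabulary. When a sentence containing a class-representative word is uttered, examples containing that word can be retrieved and presented to the user. Normally the user selects the desired example and has the translated speech output. If necessary, however, the user can replace the word with another word of the same class before the translated speech is output. For example, to input 「アスピリンはありますか」 ("Do you have aspirin?") in Japanese, the user substitutes 「薬」 (medicine), the word representing the class to which 「アスピリン」 (aspirin) belongs, utters 「何か薬はありますか」 ("Do you have any medicine?") in Japanese, and then replaces the 「薬」 part with 「アスピリン」. This staged operation maintains the usefulness of the device as a speech interpreter without implementing a large recognition vocabulary.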
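【００２３】
The first to fourteenth other inventions of technology related to the present invention, which address the second problem of the prior art, are described below.
The first other invention is a speech conversion device comprising: an input unit for inputting speech in a first or second language;
a translation unit that (1) on receiving speech in the first language from the input unit, recognizes it and, in accordance with a predetermined control instruction, either (1-a) outputs notation data of the recognized first language or (1-b) converts into the second language a conversion target determined from the recognition result and outputs at least notation data of the converted language, and (2) on receiving speech in the second language from the input unit, recognizes it and, in accordance with a predetermined control instruction, either (2-a) outputs notation data of the recognized second language or (2-b) converts into the first language a conversion target determined from the recognition result and outputs at least notation data of the converted language;
a support unit for supporting the determination of the conversion target by the translation unit;
a display unit for displaying, in accordance with the predetermined control instruction, the notation data of the converted language output from the translation unit; and
a control unit that issues the predetermined control instruction to at least the translation unit and the display unit.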
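【００２４】
The second other invention is the speech conversion device of the first other invention, comprising a dialogue history management unit that successively holds the notation data based on the speech input through the input unit during a dialogue between a user of the first language and another user of the second language, and outputs it to the display unit as history information.
【００２５】
The third other invention is the speech conversion device of the first other invention, comprising a language conversion direction detection unit that detects information for determining the translation direction, that is, whether the translation unit should translate from the first language to the second language or from the second language to the first language,
wherein the control unit, on the basis of the detection result, specifies the translation direction to the translation unit and controls the input unit.
【００２６】
The fourth other invention is the speech conversion device of the third other invention, wherein, when the input unit consists of a plurality of speech input units, the control of the input unit by the control unit consists of selecting the speech input unit that best picks up the voice of the speaking user.
【００２７】
The fifth other invention is the speech conversion device of the third other invention, wherein the control unit controls the display content of the display unit, according to the translation direction, so that it rotates substantially 180 degrees relative to the display screen of the display unit.
【００２８】
The sixth other invention is the speech conversion device of the third other invention, wherein the language conversion direction detection unit consists of a button switch, and the speaking user selects the translation direction by pressing the button switch.
【００２９】
The seventh other invention is the speech conversion device of the third other invention, wherein the language conversion direction detection unit consists of an angle sensor that detects the direction of best acoustic directivity of a movable microphone, and the speaking user selects the translation direction by changing the direction of the microphone.
【００３０】
The eighth other invention is the speech conversion device of the third other invention, wherein the language conversion direction detection unit consists of a gyro sensor installed inside the speech conversion device, and the translation direction is selected by the position in which the speaking user holds the speech conversion device.
【００３１】
The ninth other invention is the speech conversion device of the third other invention, wherein the language conversion direction detection unit consists of the sound source direction detector of an input unit formed by a microphone array unit, and the translation direction is selected by the speaking user's position relative to the microphone array unit.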
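【００３２】
The tenth other invention is a speech conversion method comprising: a speech input step of inputting and outputting speech in a first or second language;
a translation step of (1) on receiving the first-language speech output by the speech input step, recognizing it and, in accordance with a predetermined control instruction, either (1-a) outputting notation data of the recognized first language or (1-b) converting into the second language a conversion target determined from the recognition result and outputting at least notation data of the converted language, and (2) on receiving the second-language speech output by the speech input step, recognizing it and, in accordance with a predetermined control instruction, either (2-a) outputting notation data of the recognized second language or (2-b) converting into the first language a conversion target determined from the recognition result and outputting at least notation data of the converted language;
a support step of supporting the determination of the conversion target in the translation step;
a display step of displaying, in accordance with the predetermined control instruction, the notation data of the converted language output by the translation step; and
a control step of issuing the predetermined control instruction to at least the translation step and the display step.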
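【００３３】
The eleventh other invention is a program for causing a computer to function as all or part of the translation unit, the support unit, the display unit, and the dialogue history management unit of the speech conversion device of any one of the first to ninth other inventions.
【００３４】
The twelfth other invention is a program for causing a computer to execute all or part of the translation step, the support step, the display step, and the dialogue history management step of the speech conversion method of the tenth other invention.
【００３５】
The thirteenth other invention is a medium carrying the program of the eleventh other invention, the medium being processable by a computer.
【００３６】
The fourteenth other invention is a medium carrying the program of the twelfth other invention, the medium being processable by a computer.
【００３７】
According to the configurations of the other inventions of technology related to the present invention, a speech conversion device is used that, for example, the user can hold in one hand and operate easily with buttons or a touch panel. For two users facing each other (one using the first language and the other using the second language), the device adds means for manually acquiring operating authority, means for manually giving operating authority to the other party, or means for acquiring it automatically, indicates explicitly which user holds operating authority, and provides display and input means that are easy for that user to operate. This makes it possible, for example, to improve the usability of the displayed content compared with the prior art without increasing display power consumption.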
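【００３８】
【Embodiments of the Invention】
The configuration and operation of a speech interpretation device, which is one embodiment of the speech conversion device of the present invention, are described below with reference to the drawings, together with the operation of the speech conversion method of the related invention.
【００３９】
Fig. 1 is a block diagram showing the hardware configuration of the speech interpretation device of this embodiment.
【００４０】
Audio input/output device 102 receives the user's utterance in the source language and outputs speech interpreted into the target language. Image output device 103 displays the source-language examples that the device is to interpret. Image pointing device 105 and buttons 106 are used to let the user select an example displayed on image output device 103. Arithmetic control unit 101 converts, in terms of spoken language, the source-language data input from audio input/output device 102, image pointing device 105, and buttons 106 into target-language data, and outputs it to audio input/output device 102 and image output device 103. External large-capacity nonvolatile storage device 104 holds the program and data that instruct arithmetic control unit 101 on the processing procedure. External data input/output terminal 107 is used by arithmetic control unit 101 to exchange programs and data with external equipment. Power supply device 108 supplies the power needed to drive arithmetic control unit 101.
【００４１】
Here, the speech input means of the present invention corresponds to audio input/output device 102, and the first and second extraction/display means of the present invention correspond to the components that include image output device 103, arithmetic control unit 101, and so on. The screen display specifying means of the present invention corresponds to image pointing device 105 and buttons 106. The first language of the present invention corresponds to the source language of this embodiment, and the second language corresponds to the target language.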
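【００４２】
Fig. 2 shows a concrete configuration example in which a PC/AT-compatible motherboard is used as arithmetic control unit 101. Audio input/output device 203 is connected through a USB port of motherboard 201. Image output device 204 is connected through the digital RGB interface terminal of motherboard 201. A 2.5-inch hard disk drive 202 is used as external large-capacity nonvolatile storage device 104 and is connected to motherboard 201 through an IDE interface. A flash memory disk may be used instead of this hard disk drive. A Li-ion secondary battery 208 is used as power supply device 108 and supplies +5 V and +12 V to motherboard 201. Of the input/output terminals of motherboard 201, the analog display output terminal, the local area network terminal, and the keyboard connection terminal are brought out to form external data input/output terminal 207.
【００４３】
Fig. 3 shows the detailed configuration of image output device 204. LCD unit 301, which has a 4-inch display with VGA resolution and a cold-cathode tube backlight mounted behind it, is connected using 18 bits of the digital RGB interface of motherboard 302. The video synchronization signals and the backlight control signal are also connected.
【００４４】
Fig. 4 shows the detailed configuration of image pointing device 205 and buttons 206. A 3.8-inch pressure-sensitive touch panel 402 is connected to touch panel controller 401, which converts the X and Y coordinates of the indicated position into RS-232C serial data and is connected to serial port COM1 of motherboard 405. Buttons 403 and 404 are each connected to touch panel controller 401, and the ON or OFF state of each button is appended to the position information. The device driver software for touch panel controller 401 installed on motherboard 405 decodes the received serial data and generates mouse click events such that button 403 corresponds to the left button, and button 404 to the right button, of a mouse connected to motherboard 405.
【００４５】
Fig. 5 shows the detailed configuration of audio input/output device 203. USB audio interface 504 converts input analog audio into digital data and sends it to motherboard 505, and converts digital data sent from motherboard 505 (corresponding to 201 in Fig. 2) into analog audio. The digital data is sent and received over the USB interface. Microphone 503 picks up the analog audio. The output of USB audio interface 504 is amplified by audio amplifier 502 and output from speaker 501. The audio interface mounted on motherboard 505 may be used instead of USB audio interface 504.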
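【００４６】
Fig. 6 is a perspective view of an example in which the configuration of Fig. 2 is mounted in a housing that the user can hold in one hand, and Figs. 7(a) to 7(c) are its three orthographic views. Main housing 601 carries image pointing device 205, image display device 204, and buttons 206. Reference numerals 603 and 604 correspond to buttons 403 and 404, respectively. Sub-housing 602 carries audio input/output device 203. When the interpretation device is not in use, sub-housing 602 covers and protects the display surface of image display device 204.
【００４７】
When the interpretation device is used, sub-housing 802 is first moved, as shown in Fig. 8, to a predetermined position at which the directivity of audio input/output device 203 (microphone 803) points toward the user's face. Figs. 9(a) to 9(c) are the corresponding three views. That is, sub-housing 802 is raised until speaker 804 mounted on it faces the user, and microphone 803 is raised likewise. In this state, touch-panel LCD 805 becomes usable.
【００４８】
Figs. 10(a) to 10(c) show how the components are mounted in main housing 601. The 4-inch VGA LCD unit 301 and touch panel 402 are stacked and mounted as touch-panel LCD 1005. Figs. 11(a) to 11(c) show how the components are mounted in sub-housing 602.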
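【００４９】
Fig. 12 shows the software configuration, which is an embodiment of the program and data of the present invention. In Fig. 12, the software consists of the following components:
・control unit 1201, which issues instructions to each component and controls the flow of data from each component;
・GUI (Graphical User Interface) unit 1202, which displays information from control unit 1201 and passes user input to control unit 1201;
・speech input unit 1203, which records the user's speech on instruction from control unit 1201;
・speech recognition unit 1204, which performs continuous speech recognition on the user's speech sent from the speech input unit;
・example database 1205, which holds the correspondence between source-language and target-language examples;
・word class dictionary 1206, which holds the words that are classified in example database 1205;
・example selection unit 1207, which selects examples by referring to example database 1205 on the basis of the speech recognition result sent from control unit 1201;
・word selection unit 1208, which, on instruction from control unit 1201, selects the classified words in the example selected by example selection unit 1207;
・alternative word selection unit 1209, which selects alternative words by looking up in word class dictionary 1206 the words that can replace the classified word specified by control unit 1201;
・language conversion unit 1210, which converts the example specified by control unit 1201 into the target language by referring to example database 1205 and word class dictionary 1206; and
・speech synthesis unit 1211, which synthesizes and outputs speech for the target-language example sentence specified by the control unit.
Example database 1205 through language conversion unit 1210 are collectively called translation unit 1220.
【００５０】
Here, the speech recognition means of the present invention corresponds to speech recognition unit 1204, and the conversion target selection means corresponds to example selection unit 1207 and related parts. The screen display specifying means corresponds to word selection unit 1208 and related parts, and the candidate selection means corresponds to alternative word selection unit 1209 and related parts. The conversion means corresponds to the components that include language conversion unit 1210 and speech synthesis unit 1211.
【００５１】
Fig. 14 shows a concrete example of example database 1205. Each example corresponds to one sentence of dialogue and holds the correspondence between the source language and the target language together with predetermined information (the components of the source-language sentence and the dependency relations between those components). A source-language word enclosed in < > is a classified word. A classified word can be replaced with another word of the same class.
【００５２】
Fig. 15 shows a concrete example of word class dictionary 1206. Here, a class is a word with a high level of abstraction, such as 「果物」 (fruit), and the words belonging to a class are words that express concrete instances of the class, such as 「りんご」 (apple) and 「みかん」 (mandarin orange). Example selection can be made efficient by adjusting the level of abstraction of the classes to the performance of speech recognition unit 1204. Word class dictionary 1206 may also be organized with hierarchical classes.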
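The word class dictionary described above can be pictured with a minimal sketch. Fig. 15 is not reproduced in this text, so the entries below are taken from the words mentioned in the description, and the names `WORD_CLASS_DICTIONARY`, `alternatives_for`, and `recognition_vocabulary`, as well as the flat one-level layout, are assumptions made for illustration only.

```python
# Hypothetical contents; the actual entries are those of Fig. 15.
WORD_CLASS_DICTIONARY = {
    "果物": ["りんご", "みかん"],                          # fruit: apple, mandarin orange
    "薬": ["アスピリン", "かぜ薬", "トローチ", "胃腸薬"],  # medicine and its members
}


def alternatives_for(class_word):
    """Return the words that may replace a classified word (empty if it is not a class)."""
    return list(WORD_CLASS_DICTIONARY.get(class_word, []))


def recognition_vocabulary():
    """Only the class-representative words need to be recognizable by speech."""
    return sorted(WORD_CLASS_DICTIONARY)
```

Under these assumptions the recognizer only has to know 「薬」, while 「アスピリン」 and the other members are reached afterwards through `alternatives_for("薬")`.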
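【００５３】
Fig. 16 shows details of GUI unit 1202 as displayed on touch-panel LCD 805. 1601 is a translation direction designation section that specifies the direction of translation; 1603 is a speech recognition result display section that shows the result recognized by speech recognition unit 1204; 1604 is an example candidate display section that shows the example sentences selected by example selection unit 1207; 1605 is an example selection result display section that shows the example specified by the user; 1606 is a translation result display section that shows the example converted into the target language by the language conversion unit; and 1607 and 1608 correspond to buttons 806 and 807, respectively, and accept input from the user. The user can also perform pointing input on touch-panel LCD 805.
【００５４】
Fig. 13 is a flowchart of the software of the present invention. The steps are as follows:
・1301: select the translation direction;
・1302: input speech through microphone 803 and perform speech recognition;
・1303: search example database 1205 for examples on the basis of the speech recognition result;
・1304: the user selects an example from the retrieved examples;
・1305: decide whether to modify the example selected in step 1304 or to translate it;
・1306: select the word to be modified in the example selected in step 1304;
・1307: output a list of words that can replace the word selected in step 1306;
・1308: the user selects the desired word from the list output in step 1307;
・1309: replace the word in the example with the word chosen in step 1308;
・1310: convert the example sentence decided in step 1305 into the target language;
・1311: synthesize speech for the example converted into the target language in step 1310 and output it from speaker 804.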
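【００５５】
The operation of the software of the present invention is described below with reference to the flowchart of Fig. 13 and to the display of GUI unit 1202 on touch-panel LCD 805 shown in Figs. 17 to 25. As an example, consider the case in which the user wants to translate the sentence 「アスピリンはありますか」 ("Do you have aspirin?"). Specifically, the user inputs 「薬はありますか」 ("Do you have medicine?") and then replaces the 「薬」 (medicine) part with 「アスピリン」 (aspirin). Because the present invention supports two kinds of input operation, using the touch panel and using the buttons, touch panel input is described first and button input second.
【００５６】
Fig. 17 shows the display of GUI unit 1202 from step 1301 to step 1303 for touch panel input. In step 1301, the user clicks translation direction designation section 1701 on the touch panel to specify Japanese-to-English translation. GUI unit 1202 then sends the translation direction to control unit 1201, and control unit 1201 instructs speech input unit 1203 to accept speech. The user says 「何か薬はありますか」 ("Do you have any medicine?") into microphone 803. Speech input unit 1203 sends the input speech to speech recognition unit 1204. In step 1302, suppose that speech recognition unit 1204 performs recognition for the specified translation direction and sends control unit 1201 the result 「７日薬はありますか」, which contains a misrecognition (「何か」, "any", recognized as 「７日」, "seven days"). Control unit 1201 sends the recognition result to GUI unit 1202 and to example selection unit 1207. GUI unit 1202 displays the recognition result in recognition result display section 1702. Meanwhile, in step 1303, example selection unit 1207 searches for examples on the basis of the recognition result by the following method and sends the retrieved examples to control unit 1201.
【００５７】
From the recognition result 「７日薬はありますか」, example selection unit 1207 extracts 「７日」, 「薬」, and 「あり」 as the set of keywords defined in example database 1205.
【００５８】
Here, 「７日」 belongs to the classified word ＜日数＞ (number of days), and 「薬」 belongs to the classified word ＜薬＞ (medicine). 「あり」 does not belong to any classified word.
【００５９】
Example selection unit 1207 checks the dependency relations between the components in Fig. 14 one by one and, among the examples in which at least one dependency relation holds, selects the examples in descending order of the number of relations that hold. For example, for example No. 1, 「かかり」 is not in the keyword set, so the number of dependency relations that hold is 0. For example No. 2, 「何か」 is not in the keyword set, so among the dependency relations between its components (1→2) does not hold but (2→3) does (see Fig. 14). The number of dependency relations that hold is therefore 1.
【００６０】
If example selection unit 1207 is designed to select from example database 1205 the examples in which one or more dependency relations hold, example No. 1 in Fig. 14 is not selected and example No. 2 is selected. Because 「何か」 is not in the keyword set, for the selected example No. 2 the notation
・「薬はありますか」
is output.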
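The dependency-based selection just described can be sketched as follows. This is only an illustration under stated assumptions: Fig. 14 is not reproduced in this text, so the two `Example` rows are hypothetical stand-ins, the names `Example`, `satisfied_count`, and `select_examples` are invented here, and a dependency (i→j) is assumed to hold when both components i and j occur in the keyword set after classified words have been mapped to their class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    number: int
    components: tuple    # component 1, 2, ...; "<...>" marks a classified word
    dependencies: tuple  # (i, j) pairs of 1-based component indices


def satisfied_count(example, keywords):
    """Count the dependencies whose two components both occur among the keywords."""
    def present(i):
        return example.components[i - 1] in keywords
    return sum(1 for i, j in example.dependencies if present(i) and present(j))


def select_examples(examples, keywords, minimum=1):
    """Keep examples with at least `minimum` satisfied dependencies, best first."""
    scored = [(satisfied_count(e, keywords), e) for e in examples]
    hits = [pair for pair in scored if pair[0] >= minimum]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in hits]


# Hypothetical stand-ins for two rows of Fig. 14.
EXAMPLES = [
    Example(1, ("いくら", "かかり"), ((1, 2),)),
    Example(2, ("何か", "<薬>", "あり"), ((1, 2), (2, 3))),
]
# Keywords from 「７日薬はありますか」, with classified words mapped to their class.
KEYWORDS = {"<日数>", "<薬>", "あり"}
```

With these stand-in rows, `select_examples(EXAMPLES, KEYWORDS)` keeps only example No. 2, matching the walk-through: (1→2) fails because 「何か」 is missing, while (2→3) holds.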
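【００６１】
The following description assumes that the other examples in example database 1205,
・「薬ですか」 ("Is it medicine?")
・「薬です」 ("It is medicine.")
have been selected in the same way.
【００６２】
Control unit 1201 sends the example sentences received from example selection unit 1207 to GUI unit 1202. GUI unit 1202 displays the selected example sentences in example candidate display section 1703.
【００６３】
Fig. 18 shows the display of GUI unit 1202 in step 1304. In step 1304, by clicking 1801 on the touch panel, the user selects from the candidates shown in example candidate display section 1703 the example 「薬はありますか」, which has the same meaning as the sentence the user uttered. GUI unit 1202 then sends the selected example sentence to control unit 1201.
【００６４】
Fig. 19 shows the display of GUI unit 1202 in step 1305. In step 1305, GUI unit 1202 displays the selected example sentence in example result display section 1901 and clears example candidate display section 1902. The user then chooses either to confirm the example and translate it, or to modify the example by replacing a classified word with an alternative word. The user can confirm the example by clicking example result display section 1901 on the touch panel. The confirmed example is sent to control unit 1201. The user can instead switch to the mode for replacing a word in the example by double-clicking example result display section 1901 on the touch panel.
【００６５】
Fig. 20 shows the display of GUI unit 1202 when the example is confirmed in step 1305. In step 1310, control unit 1201 sends the example confirmed by the user, 「薬はありますか」, to language conversion unit 1210. Language conversion unit 1210 converts it into the target-language sentence "Any medicine" using example database 1205 and sends the result to control unit 1201. Control unit 1201 sends the conversion result to GUI unit 1202 and to speech synthesis unit 1211. In step 1311, GUI unit 1202 displays the conversion result in interpretation result display section 2001, and speech synthesis unit 1211 synthesizes speech for the conversion result and outputs it from speaker 804.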
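【００６６】
Fig. 21 shows the display of GUI unit 1202 in step 1306. In step 1306, which is reached when the user chose the word selection mode in step 1305, the user selects the word to be changed. Control unit 1201 instructs word selection unit 1208 to select words. Word selection unit 1208 extracts the classified word 「薬」 from the example and sends it to control unit 1201. Control unit 1201 sends the word to GUI unit 1202, and GUI unit 1202 underlines 「薬」 in example result display section 2101 to show the user that the word can be changed. The user clicks the word to be modified, 「薬」, on the touch panel. GUI unit 1202 sends the selected word to control unit 1201.
【００６７】
Fig. 22 shows the display of GUI unit 1202 in step 1307. In step 1307, a list of alternatives to the word 「薬」 specified by the user in step 1306 is displayed. Control unit 1201 sends the word 「薬」 specified by the user to alternative word selection unit 1209. Alternative word selection unit 1209 refers to word class dictionary 1206 shown in Fig. 15, extracts the words of the same class as the specified word 「薬」,
・「アスピリン」 (aspirin)
・「かぜ薬」 (cold medicine)
・「トローチ」 (troche)
・「胃腸薬」 (stomach medicine)
and sends them to control unit 1201. Control unit 1201 sends the list of alternative words to GUI unit 1202, and GUI unit 1202 displays the list in list window 2201.
【００６８】
Fig. 23 shows the display of GUI unit 1202 in step 1308. In step 1308, the user selects the desired word from the list of alternatives shown in list window 2201. When the user clicks the desired alternative word 2301 on the touch panel, GUI unit 1202 obtains the alternative word 「アスピリン」 and sends it to control unit 1201.
【００６９】
Fig. 24 shows the display of GUI unit 1202 in step 1309. In step 1309, the example is changed to 「アスピリンはありますか」 using the specified alternative word 「アスピリン」. GUI unit 1202 then changes the example shown in example result display section 2401 to 「アスピリンはありますか」. Processing then returns to step 1305.
【００７０】
Fig. 25 shows the display of GUI unit 1202 when, after steps 1305 to 1308 have been repeated, the user chooses to confirm the example in step 1305, 「アスピリンはありますか」 is converted into the target-language sentence "Any aspirin", and synthesized speech is output.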
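The substitution in step 1309 followed by the conversion in step 1310 can be illustrated with a small sketch. It assumes, purely for illustration, that an example-database row stores a source template and a target template sharing one class slot, and that each word of the class has a stored translation; the names `EXAMPLE`, `WORD_TRANSLATIONS`, and `fill` are hypothetical and the text does not specify how Fig. 14 actually encodes this.

```python
# Hypothetical example-database row: source and target templates share a class slot.
EXAMPLE = {"source": "<薬>はありますか", "target": "Any <薬>"}

# Hypothetical per-word translations for words that can fill the slot.
WORD_TRANSLATIONS = {"薬": "medicine", "アスピリン": "aspirin"}


def fill(example, word, slot="<薬>"):
    """Return (source sentence, target sentence) with the class slot filled by `word`."""
    source = example["source"].replace(slot, word)
    target = example["target"].replace(slot, WORD_TRANSLATIONS[word])
    return source, target
```

Filling the slot with the class word itself reproduces the pair shown in Fig. 20, and filling it with 「アスピリン」 reproduces the pair shown in Fig. 25.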
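【００７１】
Next, button input is described. In the following description, SW1 physically corresponds to button 806 and SW2 to button 807.
【００７２】
Fig. 17 shows the display of GUI unit 1202 from step 1301 to step 1303. In step 1301, clicking SW1 specifies Japanese-to-English translation and clicking SW2 specifies English-to-Japanese translation. In this case the user clicks SW1 to specify Japanese-to-English translation. GUI unit 1202 then sends the translation direction to control unit 1201, and control unit 1201 instructs speech input unit 1203 to accept speech. The user says 「何か薬はありますか」 into microphone 803. Speech input unit 1203 sends the input speech to speech recognition unit 1204. In step 1302, suppose that speech recognition unit 1204 performs recognition for the specified translation direction and sends control unit 1201 the result 「７日薬はありますか」, which contains a misrecognition. Control unit 1201 sends the recognition result to GUI unit 1202 and to example selection unit 1207. GUI unit 1202 displays the recognition result in recognition result display section 1702. Meanwhile, in step 1303, example selection unit 1207 retrieves examples on the basis of the recognition result and sends them to control unit 1201.
【００７３】
From the recognition result 「７日薬はありますか」, example selection unit 1207 extracts 「７日」, 「薬」, and 「あり」 as the set of keywords defined in example database 1205.
【００７４】
Here, 「７日」 belongs to the classified word ＜日数＞, and 「薬」 belongs to the classified word ＜薬＞. 「あり」 does not belong to any classified word.
【００７５】
Example selection unit 1207 checks the dependency relations between the components in Fig. 14 one by one and, among the examples in which at least one dependency relation holds, selects the examples in descending order of the number of relations that hold. For example, for example No. 1, 「かかり」 is not in the keyword set, so the number of dependency relations that hold is 0. For example No. 2, 「何か」 is not in the keyword set, so among the dependency relations between its components (1→2) does not hold but (2→3) does (see Fig. 14). The number of dependency relations that hold is therefore 1.
【００７６】
If example selection unit 1207 is designed to select from example database 1205 the examples in which one or more dependency relations hold, example No. 1 in Fig. 14 is not selected and example No. 2 is selected. Because 「何か」 is not in the keyword set, for the selected example No. 2 the notation
・「薬はありますか」
is output.
【００７７】
The following description assumes that the other examples in example database 1205,
・「薬ですか」
・「薬です」
have been selected in the same way.
【００７８】
Control unit 1201 sends the example sentences received from example selection unit 1207 to GUI unit 1202. GUI unit 1202 displays the selected example sentences in example candidate display section 1703.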
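【００７９】
Fig. 18 shows the display of GUI unit 1202 in step 1304. In step 1304, using the buttons, the user selects from the candidates shown in example candidate display section 1604 the example 「薬はありますか」, which has the same meaning as the sentence the user uttered. To make the selection, clicking SW1 moves the highlighted line up by one line, and clicking SW2 moves it down by one line. Double-clicking SW1 selects the highlighted example. GUI unit 1202 then sends the selected example sentence to control unit 1201.
【００８０】
Fig. 19 shows the display of GUI unit 1202 in step 1305. In step 1305, GUI unit 1202 displays the selected example sentence in example result display section 1901 and clears example candidate display section 1902. The user then chooses either to confirm the example and translate it, or to modify the example by replacing a classified word with an alternative word. The user can confirm the example by clicking SW2. The confirmed example is sent to control unit 1201. Clicking SW1 instead switches to the mode for replacing a word in the example, and this choice is sent to control unit 1201.
【００８１】
Fig. 20 shows the display of GUI unit 1202 when the example is confirmed in step 1305. In step 1310, control unit 1201 sends the example confirmed by the user, 「薬はありますか」, to language conversion unit 1210. Language conversion unit 1210 converts it into the target-language sentence "Any medicine" using example database 1205 and sends the result to control unit 1201. Control unit 1201 sends the conversion result to GUI unit 1202 and to speech synthesis unit 1211. In step 1311, GUI unit 1202 displays the conversion result in interpretation result display section 2001, and speech synthesis unit 1211 synthesizes speech for the conversion result and outputs it from speaker 804.
【００８２】
Fig. 21 shows the display of GUI unit 1202 in step 1306. In step 1306, which is reached when the user chose the word selection mode in step 1305, the user selects the word to be changed. Control unit 1201 instructs word selection unit 1208 to select words. Word selection unit 1208 extracts the classified word 「薬」 from the example and sends it to control unit 1201. Control unit 1201 sends the word to GUI unit 1202, and GUI unit 1202 underlines 「薬」 in example result display section 2101 to show the user that the word can be changed. The user selects the word to be modified, 「薬」, using the buttons: clicking SW1 moves one word to the left, clicking SW2 moves one word to the right, and double-clicking SW1 selects the word to be modified. GUI unit 1202 sends the selected word to control unit 1201.
【００８３】
Fig. 22 shows the display of GUI unit 1202 in step 1307. In step 1307, a list of alternatives to the word 「薬」 specified by the user in step 1306 is displayed. Control unit 1201 sends the word 「薬」 specified by the user to alternative word selection unit 1209. Alternative word selection unit 1209 refers to word class dictionary 1206 shown in Fig. 15, extracts the words of the same class as the specified word 「薬」,
・「アスピリン」
・「かぜ薬」
・「トローチ」
・「胃腸薬」
and sends them to control unit 1201. Control unit 1201 sends the list of alternative words to GUI unit 1202, and GUI unit 1202 displays the list in list window 2201.
【００８４】
Fig. 23 shows the display of GUI unit 1202 in step 1308. In step 1308, the user selects the desired word from the list of alternatives shown in list window 2201. GUI unit 1202 obtains the user's desired alternative word 「アスピリン」 through button input and sends it to control unit 1201. Clicking SW1 moves the cursor up by one word, and clicking SW2 moves it down by one word. Double-clicking SW1 selects the word under the cursor.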
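The two-button list operation described above (SW1 click moves up, SW2 click moves down, SW1 double-click selects) can be sketched as a small cursor model. The class name `ListCursor`, the event strings, the initial cursor position, and the clamping at the ends of the list are assumptions for illustration; the text does not say what happens at the first or last item.

```python
class ListCursor:
    """Cursor over a candidate list driven by two buttons (SW1 and SW2)."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0  # assumption: the cursor starts on the first item

    def handle(self, event):
        """Apply one button event; return the chosen item on selection, else None."""
        if event == "SW1_CLICK":            # move up one item (clamped at the top)
            self.index = max(0, self.index - 1)
        elif event == "SW2_CLICK":          # move down one item (clamped at the bottom)
            self.index = min(len(self.items) - 1, self.index + 1)
        elif event == "SW1_DOUBLE_CLICK":   # select the item under the cursor
            return self.items[self.index]
        return None
```

For example, with the four alternatives of list window 2201, the event sequence SW2 click, SW2 click, SW1 click, SW1 double-click leaves the cursor on the second entry and returns 「かぜ薬」.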
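【００８５】
Fig. 24 shows the display of GUI unit 1202 in step 1309. In step 1309, the example is changed to 「アスピリンはありますか」 using the specified alternative word 「アスピリン」. GUI unit 1202 then changes the example shown in example result display section 2401 to 「アスピリンはありますか」. Processing then returns to step 1305.
【００８６】
Fig. 25 shows the display of GUI unit 1202 when, after steps 1305 to 1308 have been repeated, the user chooses to confirm the example in step 1305, 「アスピリンはありますか」 is converted into the target-language sentence "Any aspirin", and synthesized speech is output.
【００８７】
In the above description, user input to GUI unit 1202 was limited to touch panel input and button input in turn, but words and examples can also be selected and confirmed by voice using speech recognition. The touch panel, button, and voice input modalities can also be combined. Japanese and English were taken as an example, but the invention can be implemented in the same way for other languages such as Chinese; the present invention does not depend on the language.
【００８８】
In the above embodiment, the word string of the present invention was described for the case of a sentence consisting of a plurality of words, but it is not limited to this and may consist of a single word, such as 「こんにちは」 ("Hello").
【００８９】
In the above embodiment, the first extraction/display means and the second extraction/display means of the present invention were described as being realized by the same display device, but they are not limited to this and may, for example, be realized by separate display devices.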
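【００９０】
As described above, a speech interpretation device that is one example of the present invention is a speech interpretation device that selects an example on the basis of speech input and translates it, characterized in that the hardware of the device comprises: an audio input/output device as the audio modality; an image output device as the image modality; one or more buttons and an image pointing device as the touch modality; an arithmetic control unit that converts, in terms of spoken language, the source-language data input by the user from the audio input/output device, the image pointing device, and the buttons into target-language data and outputs that data to the audio input/output device and the image output device; an external large-capacity nonvolatile storage device that holds the program and data instructing the arithmetic control unit on the processing procedure; an external data input/output terminal through which the arithmetic control unit exchanges the program and data with external equipment; and a power supply device that supplies the power needed to drive the arithmetic control unit.
【００９１】
Another example is the above speech interpretation device, characterized in that a PC/AT-compatible motherboard is used as the arithmetic control unit.
【００９２】
Another example is the above speech interpretation device, characterized in that a hard disk drive of 2.5 inches or smaller is used as the external large-capacity nonvolatile storage device.
【００９３】
Another example is the above speech interpretation device, characterized in that a flash memory disk is used as the external large-capacity nonvolatile storage device.
【００９４】
Another example is the above speech interpretation device, characterized in that a liquid crystal display with a resolution of at least 240 dots vertically and at least 240 dots horizontally is used as the image output device.
【００９５】
Another example is the above speech interpretation device, characterized in that two mechanical buttons are used as the buttons and are made functionally equivalent to the buttons of a mouse connected to the motherboard.
【００９６】
Another example is the above speech interpretation device, characterized in that a touch panel of the same size as the display surface of the liquid crystal display, or of a size that encloses the display surface, is used as the image pointing device.
【００９７】
Another example is the above speech interpretation device, characterized in that the external data input/output terminal uses, among the input/output terminals of the motherboard, the keyboard connection terminal, the analog display output terminal, and the local area network terminal.
【００９８】
Another example is the above speech interpretation device, characterized in that the audio input/output device consists of a USB audio interface that inputs and outputs analog and digital audio data through a USB port of the motherboard, a microphone that picks up the user's utterance and supplies it to the USB audio interface, an audio amplifier that amplifies the output of the USB audio interface, and a speaker connected to the audio amplifier.
【００９９】
Another example is the above speech interpretation device, characterized in that the audio input/output device consists of the audio interface of the motherboard, a microphone that picks up the user's utterance and supplies it to the audio interface, an audio amplifier that amplifies the output of the audio interface, and a speaker connected to the audio amplifier.
【０１００】
Another example is the above speech interpretation device, characterized in that the power supply device consists of a lithium-ion secondary battery.
【０１０１】
Another example is the above speech interpretation device, characterized in that it is designed so that the user can hold it in one hand, can easily operate the buttons with the thumb of that hand, can operate the image pointing device with the other hand, and can easily point both the normal of the display surface of the image display device and the directivity of the audio input/output device toward the user's face.
【０１０２】
Another example is the above speech interpretation device, characterized in that it consists of a main housing carrying the buttons, the image pointing device, and the image display device, and a sub-housing carrying the audio input/output device; when the device is not in use, the sub-housing covers and protects the display surface of the image display device, and when the device is used, the sub-housing is first moved to a predetermined position at which the directivity of the audio input/output device points toward the user's face.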
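【０１０３】
Another example of the present invention is a speech interpretation device that selects an example on the basis of speech input and translates it, characterized in that the software of the device consists of: a GUI unit that handles input from and output to the user; a source language input unit that inputs speech and performs speech recognition; a translation unit that translates the source language input from the source language input unit into the target language; a speech synthesis unit that synthesizes and outputs speech for the target language translated by the translation unit; and a control unit that controls the source language input unit, the GUI unit, the translation unit, and the speech synthesis unit.
【０１０４】
Another example is the above speech interpretation device, characterized in that each example is one sentence of dialogue.
【０１０５】
Another example is the above speech interpretation device, characterized in that the examples hold sentence patterns that are used frequently in travel conversation.
【０１０６】
Another example is the above speech interpretation device, characterized in that the words contained in the examples are classified together with related words that can replace them.
【０１０７】
Another example is the above speech interpretation device, characterized in that the source language input unit consists of a speech input unit that inputs speech on instruction from the control unit, and a speech recognition unit that performs continuous speech recognition on the speech input from the speech input unit and converts it into a word string.
【０１０８】
Another example is the above speech interpretation device, characterized in that the translation unit consists of: an example database that holds the correspondence between source-language and target-language examples; a word class dictionary that holds class information for the words contained in the example database; an example selection unit that selects the relevant examples from the example database on the basis of input from the source language input unit; a word selection unit that selects the word to be modified in the example selected by the example selection unit; an alternative word selection unit that selects from the word class dictionary the words that can replace the word selected by the word selection unit; and a language conversion unit that converts the confirmed example into the target language using the example database.
【０１０９】
Another example is the above speech interpretation device, characterized in that the GUI unit consists, on the display unit, of: a translation direction designation section that specifies the direction of translation; a speech recognition result display section that shows the recognition result output by the source language input unit; an example candidate display section that shows the examples selected from the example database by the example selection unit; an example result display section that shows the example selected by the user; and an interpretation result display section that outputs the target-language example output by the language conversion unit.
【０１１０】
Another example is the above speech interpretation device, characterized in that, when the user selects an example from those shown in the example candidate display section, the GUI unit lets the user select the desired example by touch panel operation or button operation.
【０１１１】
Another example is the above speech interpretation device, characterized in that, when presenting one or more modifiable words to the user, the word selection unit marks the modifiable words in the example result display section of the GUI unit.
【０１１２】
Another example is the above speech interpretation device, characterized in that the mark on a modifiable word consists of underlining the word, displaying it in reverse video, displaying it in bold, or making it blink.
【０１１３】
Another example is the above speech interpretation device, characterized in that, when the user selects the word to be modified, the word selection unit determines the word through touch panel operation, button operation, or voice operation by speech recognition on the GUI unit.
【０１１４】
Another example is the above speech interpretation device, characterized in that, when an alternative word is to be selected, the alternative word selection unit obtains a list of alternative candidates using the word class dictionary, and the GUI unit displays the candidates arranged as a list.
【０１１５】
Another example is the above speech interpretation device, characterized in that an alternative candidate is selected from the list of alternative candidates by touch panel operation, button operation, or voice operation by speech recognition on the GUI unit.
【０１１６】
Another example is the above speech interpretation device, characterized in that, once the user has changed the example into the desired one, the GUI unit confirms the example through touch panel operation or button operation, the language conversion unit translates it into the target language, and the speech synthesis unit outputs synthesized speech for the example.
【０１１７】
As is clear from the above, the small hardware can be carried comfortably as a speech interpretation device when the user travels overseas. Because its user interface can be operated easily with one hand, it can readily be used in various situations such as shopping and restaurants. Furthermore, because the user can input speech using a class-representative word, confirm the example, and then replace the word with a related word of the same class, the usefulness of the device as a speech interpreter does not decrease even with a small recognition vocabulary.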
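【０１１８】
Next, a speech-input translation device, which is one embodiment of a speech conversion device according to the other inventions of technology related to the present invention and which addresses the second problem of the prior art, is described with reference to the drawings.
【０１１９】
Fig. 26 shows the configuration of this embodiment.
【０１２０】
As shown in the figure, the basic speech translation function is realized by speech input unit 4101, translation support unit 4108, speech translation unit 4102, display unit 4103, and speech output unit 4107.
【０１２１】
The internal configuration of the device of this embodiment has already been described in the above embodiment, so a detailed description is omitted here.
【０１２２】
The correspondence between the configuration of this embodiment (see Fig. 26) and, for example, the configuration shown in Fig. 12 is as follows. Speech input unit 4101 in Fig. 26 corresponds to speech input unit 1203 in Fig. 12, and speech translation unit 4102 corresponds to translation unit 1220, speech recognition unit 1204, and related parts. Translation support unit 4108 and display unit 4103 correspond to GUI unit 1202 and related parts, and speech output unit 4107 corresponds to speech synthesis unit 1211 and related parts.
【０１２３】
Next, the components specific to this embodiment are described.
【０１２４】
In Fig. 26, language conversion direction control unit 4105 decides which of the two users holds operating authority over the translation device, controls the input mode of speech input unit 4101, specifies the translation direction to speech translation unit 4102, and instructs display unit 4103 on what to display. Here, one of the two users is assumed to use Japanese (corresponding to the first language of the other inventions of technology related to the present invention) and the other to use English (corresponding to the second language of those inventions).
【０１２５】
Language conversion direction detection unit 4104 collects the information that language conversion direction control unit 4105 needs in order to decide which user holds operating authority. Dialogue management unit 4106 successively holds the translation pairs displayed on display unit 4103 and uses them to display, on display unit 4103 and in one of the two languages, the history of the dialogue exchanged between the users (see Fig. 29).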
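As a rough sketch of the dialogue management unit described above, the history can be modelled as a list of translation pairs that is rendered in one language at a time. The class name `DialogueHistory`, the speaker tags `"ja"` and `"en"`, and the tuple layout are assumptions for illustration; the sample pairs are the ones used in the example dialogue of this embodiment.

```python
class DialogueHistory:
    """Holds translation pairs in order and renders them in a single language."""

    def __init__(self):
        self._pairs = []  # (speaker, Japanese text, English text)

    def add(self, speaker, japanese, english):
        """Record one translation pair together with the side that spoke it."""
        self._pairs.append((speaker, japanese, english))

    def render(self, language):
        """Return (speaker, text) rows in one language, 'ja' or 'en'."""
        column = {"ja": 1, "en": 2}[language]
        return [(pair[0], pair[column]) for pair in self._pairs]


history = DialogueHistory()
history.add("ja", "薬はありますか", "Do you have medicine?")
history.add("en", "はい", "Yes, I do.")
```

Calling `history.render("ja")` gives the rows a Japanese-side history window would show, and `history.render("en")` gives the English-side rows, without storing the dialogue twice.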
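【０１２６】
In the following example, the operation of the translation device for Japanese and English is described with reference to Fig. 27 and other figures, together with one embodiment of the speech conversion method of the other inventions of technology related to the present invention.
【０１２７】
Here, for example, as shown in Fig. 27, the Japanese-speaking user on the lower side of the figure and the English-speaking user on the upper side are assumed to face each other across the translation device.
【０１２８】
The speech-input translation device of Fig. 27 is one in which language conversion direction detection unit 4104 is a button. Assume that, in the initial state, language conversion direction control unit 4105 specifies the Japanese-to-English conversion direction. Input unit 4101 consists of microphone 4202 and microphone 4206, but the input of microphone 4206 is cut off so that Japanese can be input. The user presses speech input button 4201 and then speaks into microphone 4202 of input unit 4101 (for example, 「くすりはありませんか」, "Don't you have any medicine?"). The uttered speech is translated by speech translation unit 4102.
【０１２９】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１３０】
When speech translation unit 4102 receives Japanese speech from speech input unit 4101, it recognizes the speech, extracts one or more Japanese word strings corresponding to the recognition result, and displays them on display unit 4103 as example candidates.
【０１３１】
That is, example candidates are displayed in example candidate selection window 4203 of display unit 4103 (for example, the three examples 「薬ですか」, 「薬はありますか」, and 「薬です」). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4204 (for example, 「薬はありますか」), and the English text translating the example is spoken from speech output unit 4107 (for example, "Do you have medicine?"). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「薬はありますか」, "Do you have medicine?")). To prompt the other party to operate the translation device and give an answer, the user presses button 4205, which serves as language conversion direction detection unit 4104.
【０１３２】
On the basis of the information from language conversion direction detection unit 4104, language conversion direction control unit 4105 specifies the English-to-Japanese conversion direction to speech translation unit 4102 and display unit 4103. The content of display unit 4103 is rotated by 180° as shown in Fig. 28, so that the English-speaking user on the opposite side can use it easily, and is displayed in English. In input unit 4101, the input of microphone 4307 is cut off and microphone 4303 is enabled so that English can be input. Dialogue history window 4301 shows the English side of the translation pairs from dialogue management unit 4106. Specifically, when dialogue management unit 4106 holds, for example, the translation pair consisting of 「薬はありますか」 and "Do you have medicine?" and the translation pair consisting of 「はい」 and "Yes, I do.", dialogue history window 4401 shown in Fig. 29 displays (日): 「薬はありますか？」 and (英): 「はい」, where (日) and (英) mark the Japanese-speaking and English-speaking users. The user presses speech input button 4302 and then speaks into microphone 4303 of input unit 4101 (for example, "Yes, certainly"). The uttered speech is translated by speech translation unit 4102.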
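The hand-over of operating authority described above can be summarized as a small state change: the translation direction flips, the active microphone switches, and the display rotates by 180°. The sketch below is illustrative only; the names `DirectionState` and `hand_over`, the microphone labels `ja_side` and `en_side`, and the representation of rotation as 0 or 180 degrees are assumptions, not details given in the text.

```python
from dataclasses import dataclass


@dataclass
class DirectionState:
    direction: str = "ja->en"    # initial state assumed in the description
    active_mic: str = "ja_side"  # hypothetical label for the enabled microphone
    rotation_deg: int = 0        # display orientation relative to the Japanese user


def hand_over(state):
    """Give operating authority to the other user (for example, on a button press)."""
    if state.direction == "ja->en":
        return DirectionState("en->ja", "en_side", 180)
    return DirectionState("ja->en", "ja_side", 0)
```

Because only one microphone is active in each state, the two users' utterances are handled exclusively, which is the property the description relies on to avoid overlapping speech.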
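【０１３３】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１３４】
When speech translation unit 4102 receives English speech from speech input unit 4101, it recognizes the speech, extracts one or more English word strings corresponding to the recognition result, and displays them on display unit 4103 as example candidates.
【０１３５】
That is, candidates are displayed in example candidate selection window 4304 of display unit 4103 (for example, the three examples "Yes, I do.", "Surely.", and "Certainly."). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4305 (for example, "Yes, I do."), and the Japanese text translating the example is spoken from speech output unit 4107 (for example, 「はい。」). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「はい」, "Yes, I do.")). To prompt the other party to operate the translation device and give an answer, the user presses button 4306, which serves as language conversion direction detection unit 4104.
【０１３６】
On the basis of the information from language conversion direction detection unit 4104, language conversion direction control unit 4105 specifies the Japanese-to-English conversion direction to speech translation unit 4102 and display unit 4103. The content of display unit 4103 is rotated by 180° as shown in Fig. 29, so that the Japanese-speaking user on the opposite side can use it easily, and is displayed in Japanese. In input unit 4101, the input of microphone 4405 is cut off and microphone 4403 is enabled so that Japanese can be input. Dialogue history window 4401 shows the Japanese side of the translation pairs from dialogue management unit 4106. The user presses speech input button 4402 and then speaks into microphone 4403 of input unit 4101 (for example, 「ありがとうございます」, "Thank you very much"). The uttered speech is translated by speech translation unit 4102.
【０１３７】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１３８】
That is, candidates are displayed in example candidate selection window 4413 of display unit 4103 (for example, the single example 「ありがとう。」). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4414 (for example, 「ありがとう。」), and the English text translating the example is spoken from speech output unit 4107 (for example, "Thank you."). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「ありがとう」, "Thank you.")). To prompt the other party to operate the translation device and give a further answer, the user presses button 4404, which serves as language conversion direction detection unit 4104.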
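【０１３９】
The speech-input translation device of Fig. 30 is one in which language conversion direction detection unit 4104 is tilt angle sensor 4502 of microphone 4501. That is, angle sensor 4502 is used to determine whether microphone 4501 is tilted toward the Japanese-speaking user or toward the English-speaking user. In the state shown in Fig. 30, language conversion direction control unit 4105 specifies the Japanese-to-English conversion direction. The user presses speech input button 4503 and then speaks into microphone 4501 of input unit 4101 (for example, 「くすりはありませんか」). The uttered speech is translated by speech translation unit 4102.
【０１４０】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１４１】
That is, candidates are displayed in example candidate selection window 4504 of display unit 4103 (for example, the three examples 「薬ですか」, 「薬はありますか」, and 「薬です」). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4505 (for example, 「薬はありますか」), and the English text translating the example is spoken from speech output unit 4107 (for example, "Do you have medicine?"). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「薬はありますか」, "Do you have medicine?")). To prompt the other party to operate the translation device and give an answer, the user turns microphone 4501 toward the English-speaking user.
【０１４２】
On the basis of the information from language conversion direction detection unit 4104, language conversion direction control unit 4105 specifies the English-to-Japanese conversion direction to speech translation unit 4102 and display unit 4103. The content of display unit 4103 is rotated by 180° as shown in Fig. 31, so that the English-speaking user on the opposite side can use it easily, and is displayed in English. Dialogue history window 4601 shows the English side of the translation pairs from dialogue management unit 4106. The user presses speech input button 4602 and then speaks into microphone 4603 of input unit 4101 (for example, "Yes, certainly"). The uttered speech is translated by speech translation unit 4102.
【０１４３】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１４４】
That is, candidates are displayed in example candidate selection window 4604 of display unit 4103 (for example, the three examples "Yes, I do.", "Surely.", and "Certainly."). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4605 (for example, "Yes, I do."), and the Japanese text translating the example is spoken from speech output unit 4107 (for example, 「はい。」). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「はい」, "Yes, I do.")). To obtain the other party's answer, the user turns microphone 4603 toward the Japanese-speaking user.
【０１４５】
On the basis of the information from language conversion direction detection unit 4104, language conversion direction control unit 4105 specifies the Japanese-to-English conversion direction to speech translation unit 4102 and display unit 4103. The content of display unit 4103 is rotated by 180° as shown in Fig. 32, so that the Japanese-speaking user on the opposite side can use it easily, and is displayed in Japanese. Dialogue history window 4701 shows the Japanese side of the translation pairs from dialogue management unit 4106. The user presses speech input button 4702 and then speaks into microphone 4703 of input unit 4101 (for example, 「ありがとうございます」). The uttered speech is translated by speech translation unit 4102.
【０１４６】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１４７】
That is, candidates are displayed in example candidate selection window 4704 of display unit 4103 (for example, the single example 「ありがとう。」). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4705 (for example, 「ありがとう。」), and the English text translating the example is spoken from speech output unit 4107 (for example, "Thank you."). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「ありがとう」, "Thank you.")). To prompt the other party to operate the translation device and give a further answer, the user turns microphone 4703 toward the English-speaking user.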
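【０１４８】
The speech-input translation device of Fig. 33 is one in which language conversion direction detection unit 4104 is gyro sensor 4801, which detects the tilt of the main body. Assume that, given the state of the gyro sensor, language conversion direction control unit 4105 specifies the Japanese-to-English conversion direction. Input unit 4101 consists of microphone 4802 and microphone 4803, but the input of microphone 4803 is cut off so that Japanese can be input. The user presses speech input button 4804 and then speaks into microphone 4802 of input unit 4101 (for example, 「くすりはありませんか」). The uttered speech is translated by speech translation unit 4102.
【０１４９】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１５０】
That is, candidates are displayed in example candidate selection window 4805 of display unit 4103 (for example, the three examples 「薬ですか」, 「薬はありますか」, and 「薬です」). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4806 (for example, 「薬はありますか」), and the English text translating the example is spoken from speech output unit 4107 (for example, "Do you have medicine?"). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「薬はありますか」, "Do you have medicine?")). To prompt the other party to operate the translation device and give an answer, the user holds the device out to the other party and has the other party hold it so that it is turned upside down.
【０１５１】
On the basis of the information from gyro sensor 4901, language conversion direction control unit 4105 specifies the English-to-Japanese conversion direction to speech translation unit 4102 and display unit 4103. The content of display unit 4103 is rotated by 180° as shown in Fig. 34, so that the English-speaking user on the opposite side can use it easily, and is displayed in English. In input unit 4101, the input of microphone 4902 is cut off and microphone 4903 is enabled so that English can be input. Dialogue history window 4904 shows the English side of the translation pairs from dialogue management unit 4106. The user presses speech input button 4905 and then speaks into microphone 4903 of input unit 4101 (for example, "Yes, certainly"). The uttered speech is translated by speech translation unit 4102.
【０１５２】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１５３】
That is, candidates are displayed in example candidate selection window 4906 of display unit 4103 (for example, the three examples "Yes, I do.", "Surely.", and "Certainly."). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 4907 (for example, "Yes, I do."), and the Japanese text translating the example is spoken from speech output unit 4107 (for example, 「はい。」). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「はい」, "Yes, I do.")). To prompt the other party to operate the translation device and give an answer, the user holds the device out to the other party and has the other party hold it so that it is turned upside down.
【０１５４】
On the basis of the information from gyro sensor 5001, language conversion direction control unit 4105 specifies the Japanese-to-English conversion direction to speech translation unit 4102 and display unit 4103. The content of display unit 4103 is rotated by 180° as shown in Fig. 35, so that the Japanese-speaking user on the opposite side can use it easily, and is displayed in Japanese. In input unit 4101, the input of microphone 5002 is cut off and microphone 5003 is enabled so that Japanese can be input. Dialogue history window 5004 shows the Japanese side of the translation pairs from dialogue management unit 4106. The user presses speech input button 5005 and then speaks into microphone 5003 of input unit 4101 (for example, 「ありがとうございます」). The uttered speech is translated by speech translation unit 4102.
【０１５５】
The translation operation of speech translation unit 4102 was described in the above embodiment with reference to Fig. 14 and other figures, so a detailed description is omitted here; in outline, it is as follows.
【０１５６】
That is, candidates are displayed in example candidate selection window 5006 of display unit 4103 (for example, the single example 「ありがとう。」). When the user selects one of them using translation support unit 4108 (for example, by touch panel), the selected example is displayed in example result window 5007 (for example, 「ありがとう。」), and the English text translating the example is spoken from speech output unit 4107 (for example, "Thank you."). The translation pair is sent from display unit 4103 to dialogue management unit 4106 (for example, the pair (「ありがとう」, "Thank you.")). To prompt the other party to operate the translation device and give a further answer, the user holds the device out to the other party and has the other party hold it so that it is turned upside down.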
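【０１５７】
The speech-input translation device of Fig. 36 is one in which input unit 4101 and language conversion direction detection unit 4104 are microphone array unit 5101, which can detect the direction of a sound source. Microphone array unit 5101 identifies the direction of the sound source and then picks up sound with sharp directivity. It generally consists of a plurality of geometrically arranged microphone units and an arithmetic unit that applies digital signal processing to the outputs of the microphone units and converts them into a single output.
【０１５８】
When the Japanese-speaking user starts to speak (for example, 「あの、」, "Um,"), microphone array unit 5101 detects the direction of the speaker's voice, and the device enters the utterance-ready state. While the device is not in the utterance-ready state, the background of display unit 4103 is a color that alerts the user (for example, red); when the device becomes utterance-ready, the background changes to a color that indicates permission (for example, green). On the basis of the information from microphone array unit 5101, language conversion direction control unit 4105 specifies the Japanese-to-English conversion direction. When the user speaks in the utterance-ready state (for example, 「くすりはありませんか」), the uttered speech is translated by speech translation unit 4102.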
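The behaviour described above, in which the onset of speech both fixes the translation direction and switches the background from the alert color to the permission color, can be sketched as two small functions. The function names, the side labels `ja_side` and `en_side`, and the dictionary result are assumptions for illustration; the actual sound-source localization performed by the microphone array is outside this sketch.

```python
def on_speech_onset(source_side):
    """Map the detected sound-source side to a translation direction and a ready state."""
    direction = {"ja_side": "ja->en", "en_side": "en->ja"}[source_side]
    return {"direction": direction, "ready": True}


def background_color(ready):
    """Red while utterance is not yet permitted, green once the device is ready."""
    return "green" if ready else "red"
```

In this reading, the short onset utterance (such as 「あの、」) only triggers `on_speech_onset`; the sentence to be translated is the one spoken after the background has turned green.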
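[0159]
The translation operation of the translation unit 4102 has already been described in the above embodiment with reference to FIG. 14 and other figures, so a detailed description is omitted here; in outline, it proceeds as follows.
[0160]
That is, candidates are displayed in the example candidate selection window 5102 of the display unit 4103 (for example, the three examples 「薬ですか」, 「薬はありますか」, and 「薬です」). When the user selects one of them using the translation support unit 4108 (for example, on the touch panel), the selected example is displayed in the example result window 5103 (for example, 「薬はありますか」), and the English text obtained by translating the example is spoken from the voice output unit 4107 (for example, "Do you have medicine?"). The display unit 4103 sends the translation pair to the dialogue management unit 4106 (for example, the pair (「薬はありますか」, "Do you have medicine?")).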
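[0161]
When the other party starts to speak in reply (for example, "Hmm,"), the microphone array unit 5201 detects the direction of the speaker's voice and the device enters the speech-ready state. Based on information from the microphone array unit 5201, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from English to Japanese. The display content of the display unit 4103 is rotated 180° as shown in FIG. 37 and switches to English so that the English-speaking user facing the device can use it easily. The dialogue history window 5202 shows the English side of each translation pair supplied by the dialogue management unit 4106. When the English-speaking user speaks in the speech-ready state (for example, "Yes, certainly"), the utterance is translated by the speech translation unit 4102.
[0162]
The translation operation of the translation unit 4102 has already been described in the above embodiment with reference to FIG. 14 and other figures, so a detailed description is omitted here; in outline, it proceeds as follows.
[0163]
That is, candidates are displayed in the example candidate selection window 5203 of the display unit 4103 (for example, the three examples "Yes, I do.", "Surely.", and "Certainly."). When the user selects one of them using the translation support unit 4108 (for example, on the touch panel), the selected example is displayed in the example result window 5204 (for example, "Yes, I do."), and the Japanese text obtained by translating the example is spoken from the voice output unit 4107 (for example, 「はい。」). The display unit 4103 sends the translation pair to the dialogue management unit 4106 (for example, the pair (「はい」, "Yes, I do.")).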
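[0164]
When the other party starts to speak in reply (for example, 「あ、」), the microphone array unit 5301 detects the direction of the speaker's voice and the device enters the speech-ready state. Based on information from the microphone array unit 5301, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from Japanese to English. The display content of the display unit 4103 is rotated 180° as shown in FIG. 38 and switches to Japanese so that the Japanese-speaking user facing the device can use it easily. The dialogue history window 5302 shows the Japanese side of each translation pair supplied by the dialogue management unit 4106. When the user speaks in the speech-ready state (for example, 「ありがとう」), the utterance is translated by the speech translation unit 4102.
[0165]
The translation operation of the translation unit 4102 has already been described in the above embodiment with reference to FIG. 14 and other figures, so a detailed description is omitted here; in outline, it proceeds as follows.
[0166]
That is, candidates are displayed in the example candidate selection window 5303 of the display unit 4103 (for example, the single example 「ありがとう。」). When the user selects one of them using the translation support unit 4108 (for example, on the touch panel), the selected example is displayed in the example result window 5304 (for example, 「ありがとう。」), and the English text obtained by translating the example is spoken from the voice output unit 4107 (for example, "Thank you."). The display unit 4103 sends the translation pair to the dialogue management unit 4106 (for example, the pair (「ありがとう」, "Thank you.")).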
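The way the dialogue management unit holds translation pairs and supplies one side of each pair to the dialogue history window might be modeled as below. This is a minimal sketch: the class `DialogueHistory`, its methods, and the language codes `"ja"` and `"en"` are assumptions, while the sample pairs are taken from the walkthrough above.

```python
class DialogueHistory:
    """Holds translation pairs in the order they were produced."""

    def __init__(self):
        self._pairs = []  # list of (japanese, english) tuples

    def add_pair(self, japanese: str, english: str) -> None:
        self._pairs.append((japanese, english))

    def window_lines(self, language: str):
        """Return the side of every pair that matches the active user's language."""
        if language not in ("ja", "en"):
            raise ValueError("language must be 'ja' or 'en'")
        index = 0 if language == "ja" else 1
        return [pair[index] for pair in self._pairs]


history = DialogueHistory()
history.add_pair("薬はありますか", "Do you have medicine?")
history.add_pair("はい", "Yes, I do.")
history.add_pair("ありがとう", "Thank you.")
```

Because both sides of every pair are kept, each user can read the whole exchange in their own language whichever way the device is currently facing.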
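[0167]
The touch panel, button, and voice input modalities may be combined, and the buttons may be replaced by the touch panel. Japanese and English were used as an example, but the same applies to other languages such as Chinese; the other inventions of technology related to the present invention do not depend on the language.
[0168]
As is clear from the above, with the above configuration the operation screen for one language fills the entire display unit, so the usability of the translation device is maintained even with a small display unit. In addition, the screen content makes it easy to tell which user holds the operating authority, so the two users' utterances do not overlap. The speech recognition rate therefore does not fall, and the performance of the translation device is not degraded.
[0169]
In the above embodiment, the translation result in the second language is both displayed and output as speech; however, the configuration is not limited to this and may, for example, only display the translation result.
[0170]
Also, in the above embodiment, the two users face each other across the translation device; however, the configuration is not limited to this, and the two users may, for example, use the device side by side.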
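[0171]
Specifically, this is the configuration shown in FIGS. 39 and 40. As these figures show, the user of the first language (for example, Japanese) uses the voice input button 4201a, and the user of the second language (for example, English) uses the voice input button 4201b. In this configuration, a single microphone 5501 is provided at the top center of the device. This configuration provides the same effects as the configuration described above.
[0172]
Also, in the above embodiment, the example result itself is the translation target, as shown for example in FIGS. 27 and 28; however, the configuration is not limited to this. For example, a list of alternative words may be displayed for a word that the user designates in the word string shown in the example result window 4204, a desired word may be selected from those alternatives, and the example reflecting that selection may be used as the translation target. In other words, this configuration applies the configuration described with reference to FIGS. 21 to 24 to the configuration shown in FIG. 27 and related figures.
[0173]
Specifically, this is the configuration shown in FIGS. 41 to 44. A list of alternative words is displayed (FIG. 42) for the word that the user designates (FIG. 41) in the word string shown in the example result window 4204a; a desired word, for example the word 2301 "aspirin" (アスピリン), is selected from the alternatives (FIG. 43); and the example result reflecting the selection is displayed in the example result window 4204a (FIG. 44) and used as the translation target. The subsequent translation operation is the same as shown in FIG. 28 and related figures. This widens the range of translatable content and improves usability.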
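The alternative-word flow of FIGS. 41 to 44 can be sketched as a lookup in a word class dictionary followed by a substitution. The data layout, the class name `medicine`, the word segmentation, the English template, and all function names below are assumptions made for illustration; only the words 薬 and アスピリン and the two English sentences come from the description.

```python
# Hypothetical word class dictionary: word -> class, and class -> member words.
WORD_CLASS_OF = {"薬": "medicine", "アスピリン": "medicine"}
CLASS_MEMBERS = {"medicine": ["薬", "アスピリン"]}
# Hypothetical target-language side: per-word translations and an example template.
ENGLISH_WORD = {"薬": "medicine", "アスピリン": "aspirin"}
ENGLISH_TEMPLATE = "Do you have {}?"


def alternatives(word):
    """List the words of the same class as `word` (empty if unclassified)."""
    return list(CLASS_MEMBERS.get(WORD_CLASS_OF.get(word), []))


def replace_word(example_words, index, new_word):
    """Return a copy of the example with the word at `index` replaced."""
    if new_word not in alternatives(example_words[index]):
        raise ValueError("replacement must belong to the same word class")
    updated = list(example_words)
    updated[index] = new_word
    return updated


def translate(class_word):
    """Fill the English template with the translation of the chosen word."""
    return ENGLISH_TEMPLATE.format(ENGLISH_WORD[class_word])


example = ["薬", "は", "あります", "か"]
revised = replace_word(example, 0, "アスピリン")
```

Restricting replacements to members of the same class is what lets a single stored example cover many concrete requests.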
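[0174]
An invention of technology related to the present invention is a program for causing a computer to execute the functions of all or some of the means (or elements, circuits, units, etc.) of the speech conversion device described above, the program operating in cooperation with the computer.
[0175]
An invention of technology related to the present invention is also a program for causing a computer to execute the operations of all or some of the steps (or processes, operations, actions, etc.) of the speech conversion method described above, the program operating in cooperation with the computer.
[0176]
An invention of technology related to the present invention is also a recording medium carrying a program for causing a computer to execute all or some of the operations of all or some of the steps of the speech conversion method of the speech conversion device described above, the recording medium being readable by a computer, and the program, once read, executing those operations in cooperation with the computer.
[0177]
An invention of technology related to the present invention is also a medium carrying a program for causing a computer to execute all or some of the functions of all or some of the means of the speech conversion device described above, the medium being readable by a computer, and the program, once read, executing those functions in cooperation with the computer.
[0178]
In the inventions of technology related to the present invention, "some of the steps (or processes, operations, actions, etc.)" of the speech conversion method of the speech conversion device means several means or steps among those steps, or some of the functions or operations within a single means or step.
[0179]
In the inventions of technology related to the present invention, "some of the devices (or elements, circuits, units, etc.)" means several devices among those devices, or some of the means (or elements, circuits, units, etc.) within a single device, or some of the functions within a single means.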
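[0180]
One form of use of the program of the inventions of technology related to the present invention may be that it is recorded on a computer-readable recording medium and operates in cooperation with a computer.
[0181]
Another form of use of the program of the inventions of technology related to the present invention may be that it is transmitted through a transmission medium, read by a computer, and operates in cooperation with the computer.
[0183]
Recording media include ROM and the like; transmission media include transmission media such as the Internet, as well as light, radio waves, sound waves, and the like.
[0184]
The computer of the present invention described above is not limited to pure hardware such as a CPU and may include firmware, an OS, and even peripheral devices.
[0185]
As described above, the configuration of the present invention may be implemented in software or in hardware.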
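[0186]
[Effects of the Invention]
As is clear from the above, the present invention has the advantage that the device can be made even smaller than before and can be operated easily.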
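[Brief Description of the Drawings]
[FIG. 1] Block diagram showing the hardware configuration of a speech interpretation device according to an embodiment of the present invention
[FIG. 2] Detailed block diagram of FIG. 1 when a PC/AT-compatible motherboard is used
[FIG. 3] Detailed block diagram of the image output device 204
[FIG. 4] Detailed block diagram of the image instruction device 205 and the button 206
[FIG. 5] Detailed block diagram of the voice input/output device 203
[FIG. 6] Overall view of the housing when the speech interpretation device is not in use
[FIG. 7] (a) Front view showing the detailed structure of the speech interpretation device shown in FIG. 6
(b) Side view showing the detailed structure of the speech interpretation device shown in FIG. 6
(c) Plan view showing the detailed structure of the speech interpretation device shown in FIG. 6
[FIG. 8] Overall view of the housing when the speech interpretation device is in use
[FIG. 9] (a) Front view showing the detailed structure of the speech interpretation device shown in FIG. 8
(b) Side view showing the detailed structure of the speech interpretation device shown in FIG. 8
(c) Plan view showing the detailed structure of the speech interpretation device shown in FIG. 8
[FIG. 10] (a) Front view showing how the components of FIG. 2 are mounted in the main housing 801 (b) Side view showing how the components of FIG. 2 are mounted in the main housing 801 (c) Plan view showing how the components of FIG. 2 are mounted in the main housing 801
[FIG. 11] (a) Front view showing how the components of FIG. 2 are mounted in the sub-housing 802 (b) Side view showing how the components of FIG. 2 are mounted in the sub-housing 802 (c) Plan view showing how the components of FIG. 2 are mounted in the sub-housing 802
[FIG. 12] Block diagram showing the software configuration of the speech interpretation device according to an embodiment of the present invention
[FIG. 13] Flowchart showing the flow of software processing
[FIG. 14] Diagram showing an example of the contents of the example database 1205
[FIG. 15] Diagram showing an example of the contents of the word class dictionary 1206
[FIG. 16] Diagram showing the display contents of the GUI unit 1202
[FIG. 17] Diagram showing the display contents of the GUI unit 1202 in steps 1301 to 1303
[FIG. 18] Diagram showing the display contents of the GUI unit 1202 in the processing of step 1304
[FIG. 19] Diagram showing the display contents of the GUI unit 1202 in the processing of step 1305
[FIG. 20] Diagram showing the display contents of the GUI unit 1202 in the processing of steps 1310 to 1311
[FIG. 21] Diagram showing the display contents of the GUI unit 1202 in the processing of step 1306
[FIG. 22] Diagram showing the display contents of the GUI unit 1202 in the processing of step 1307
[FIG. 23] Diagram showing the display contents of the GUI unit 1202 in the processing of step 1308
[FIG. 24] Diagram showing the display contents of the GUI unit 1202 in the processing of step 1309
[FIG. 25] Diagram showing the display contents of the GUI unit 1202 in the processing of steps 1310 to 1311
[FIG. 26] Block diagram showing the configuration of a voice input translation device according to an embodiment of another invention of technology related to the present invention
[FIG. 27] Diagram showing Japanese-language use of the speech translation device in which the language conversion direction detection unit 4104 is a button
[FIG. 28] Diagram showing English-language use of the speech translation device in which the language conversion direction detection unit 4104 is a button
[FIG. 29] Diagram showing Japanese-language use of the speech translation device in which the language conversion direction detection unit 4104 is a button
[FIG. 30] Diagram showing Japanese-language use of the speech translation device in which the language conversion direction detection unit 4104 is an angle sensor on the microphone axis
[FIG. 31] Diagram showing English-language use of the speech translation device in which the language conversion direction detection unit 4104 is an angle sensor on the microphone axis
[FIG. 32] Diagram showing Japanese-language use of the speech translation device in which the language conversion direction detection unit 4104 is an angle sensor on the microphone axis
[FIG. 33] Diagram showing Japanese-language use of the speech translation device in which the language conversion direction detection unit 4104 is a gyro sensor
[FIG. 34] Diagram showing English-language use of the speech translation device in which the language conversion direction detection unit 4104 is a gyro sensor
[FIG. 35] Diagram showing Japanese-language use of the speech translation device in which the language conversion direction detection unit 4104 is a gyro sensor
[FIG. 36] Diagram showing Japanese-language use of the speech translation device in which the input unit 4101 and the language conversion direction detection unit 4104 are a microphone array unit
[FIG. 37] Diagram showing English-language use of the speech translation device in which the input unit 4101 and the language conversion direction detection unit 4104 are a microphone array unit
[FIG. 38] Diagram showing Japanese-language use of the speech translation device in which the input unit 4101 and the language conversion direction detection unit 4104 are a microphone array unit
[FIG. 39] Diagram for explaining Japanese-language use of a speech translation device according to another embodiment of another invention of technology related to the present invention
[FIG. 40] Diagram for explaining English-language use of a speech translation device according to another embodiment of another invention of technology related to the present invention
[FIG. 41] Diagram for explaining the alternative word function in Japanese-language use of a speech translation device according to yet another embodiment of another invention of technology related to the present invention
[FIG. 42] Diagram for explaining the alternative word function in Japanese-language use of a speech translation device according to yet another embodiment of another invention of technology related to the present invention
[FIG. 43] Diagram for explaining the alternative word function in Japanese-language use of a speech translation device according to yet another embodiment of another invention of technology related to the present invention
[FIG. 44] Diagram for explaining the alternative word function in Japanese-language use of a speech translation device according to yet another embodiment of another invention of technology related to the present invention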
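[Explanation of Reference Numerals]
101 Arithmetic and control unit
102 Voice input/output device
103 Image output device
104 External large-capacity nonvolatile storage device
105 Image instruction device
106 Button
107 External data input/output terminal
108 Power supply device
201 Motherboard
202 2.5-inch hard disk drive
203 Voice input/output device
204 Image output device
205 Image instruction device
206 Button
207 External data output terminal
208 Li-ion secondary battery
301 4-inch VGA LCD unit with backlight
302 Motherboard
401 Touch panel controller
402 3.8-inch pressure-sensitive touch panel
403 Button
404 Button
405 Motherboard
501 Speaker
502 Audio amplifier
503 Microphone
504 USB audio device
505 Motherboard
601 Main housing
602 Sub-housing
603 Button
604 Button
701 Front view
702 Right side view
703 Top view
801 Main housing
802 Sub-housing
803 Microphone
804 Speaker
805 LCD with touch panel
901 Front view
902 Right side view
903 Top view
1001 Front view
1002 Right side view
1003 Top view
1004 Motherboard
1005 LCD with touch panel
1006 2.5-inch hard disk drive
1007 Button
1008 Button
1101 Front view
1102 Right side view
1103 Top view
1104 Microphone
1105 Speaker
1106 USB audio device
1107 Audio amplifier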
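1201 Control unit
1202 GUI unit
1203 Voice input unit
1204 Speech recognition unit
1205 Example database
1206 Word class dictionary
1207 Example selection unit
1208 Word selection unit
1209 Alternative word selection unit
1210 Language conversion unit
1211 Speech synthesis unit
1301 Step of determining the translation direction
1302 Step of performing speech recognition
1303 Step of searching the example database for examples
1304 Step of selecting an example
1305 Step of deciding whether to confirm or modify the example
1306 Step of determining the word to modify
1307 Step of obtaining the list of alternative words
1308 Step of determining the alternative word
1309 Step of modifying the example
1310 Step of performing language conversion
1311 Step of performing speech synthesis
1601 Translation direction designation unit
1602 Translation direction designation unit
1603 Recognition result display unit
1604 Example candidate display unit
1605 Example result display unit
1606 Interpretation result display unit
1607 Button SW1
1608 Button SW2
1701 Translation direction designation unit
1702 Recognition result display unit
1703 Example candidate display unit
1801 Selected example
1901 Example result display unit
1902 Example candidate display unit
2001 Interpretation result display unit
2101 Example result display unit
2201 List window
2301 Selected alternative word
2401 Example result display unit
4105 Language conversion direction control unit
4106 Dialogue history management unit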
4104 Language conversion direction detection unit
[0001]
BACKGROUND OF THE INVENTION
  The present invention relates to, for example, a speech conversion device that converts an utterance spoken in a source language into a target language and outputs it as speech.
[0002]
[Prior art]
Speech interpretation technology has been developed as software intended to run on high-performance workstations and personal computers, and if the scope of conversation is limited to, say, travel conversation, its performance has reached a practical level. However, for ordinary users to use a speech interpretation device on a daily basis, it is necessary to design hardware small enough to be carried easily on overseas trips and a user interface that is simple to operate, and to port software of equivalent functionality to that hardware.
[0003]
Conventionally, the work of porting voice interpretation software to a B5 size notebook personal computer has been promoted.
[0004]
On the other hand, recent advances in hardware technology have made it possible to implement a translation function by voice input mainly for conversations used on overseas trips using portable information devices. Such other conventional translation functions are bidirectional, for example, having both a Japanese-to-English conversion function and an English-to-Japanese conversion function.
[0005]
Such other prior art inventions include a foreign language translation device (see Japanese Patent Laid-Open No. 8-77176) and a speech input translation device (see Japanese Patent Laid-Open No. 8-278972). In these inventions, the shape of the device, the arrangement of the display unit, and the contents thereof are determined so that two people having different languages can interact using one device in a face-to-face format.
[0006]
[Problems to be solved by the invention]
However, a notebook personal computer of about B5 size is not small enough for a user to carry easily and use in a variety of places. Moreover, because it must be operated with an ordinary keyboard and mouse, it is not easy to use as a user interface. Furthermore, the amount of computational resources required for speech recognition, such as CPU performance and working memory capacity, is generally proportional to the size of the recognition target vocabulary.
[0007]
Because computational resources are limited on small hardware, it is difficult to implement as the recognition target vocabulary all the words that a speech interpretation device needs, and the device's utility as a speech interpretation device is reduced. This is the problem with the above prior art (the first problem).
[0008]
Next, a problem with the other prior art will be described (second problem).
[0009]
That is, in the translation device of the other prior art, when a small information device that fits in a clothing pocket has a low-resolution display area, it cannot display all the information that the two users need. This reduces the usability of the translation device. In addition, mounting a plurality of display units increases power consumption and shortens the operating time of the translation device. Furthermore, because the translation device does not handle the two users' utterances exclusively, overlapping utterances lower the speech recognition rate and degrade the performance of the translation device.
[0010]
  In view of the first problem of the conventional speech interpretation device, an object of the present invention is to provide a speech conversion device that can be made even smaller than before and can be operated easily.
[0011]
  In view of the second problem of the conventional translation device, an object of the other inventions of technology related to the present invention is to provide a speech conversion device, a speech conversion method, a program, and a medium that can improve the usability of the display contents compared with the prior art.
[0012]
[Means for Solving the Problems]
  The first aspect of the present invention (corresponding to the invention described in claim 1) is a speech conversion device comprising:
  voice input means for inputting speech in a first language;
  speech recognition means for recognizing the input speech;
  an example database that stores in advance examples of the first language and dependency relationships between predetermined words among the words constituting each example;
  first extraction/display means for, when a predetermined word is included in the speech recognition result, extracting an example corresponding to the speech from among the examples of the first language stored in the example database by using the dependency relationship of the included predetermined word, and displaying one or more word strings constituting that example;
  conversion target selection means by which any word string to be converted into a second language is selected from the displayed word strings constituting the examples of the first language;
  a word class dictionary in which the words included in the examples are classified in advance and which stores in advance words that can replace the classified words;
  second extraction/display means for, when a classified word in the selected word string is identified, extracting words of the same class as the identified classified word from the word class dictionary and displaying them as candidates for the replacement;
  candidate selection means by which any candidate is selected from the displayed candidate words of the same class; and
  conversion means for determining the conversion target for the second language based on the selected word string constituting the example of the first language and the selected candidate word of the same class, and converting the determined conversion target into the spoken language of the second language.
[0013]
The second aspect of the present invention (corresponding to the invention described in claim 2) is the speech conversion device according to the first aspect, wherein the first extraction/display means has a display unit with a display screen that displays the plurality of word strings to be selected and the selected word string, each in a predetermined area,
and the second extraction/display means displays the candidate words in a window over a partial area of the display screen.
[0014]
The third aspect of the present invention (corresponding to the invention described in claim 3) is the speech conversion device according to the second aspect, wherein, when the first extraction/display means displays the selected word string on the display screen, it displays the part of the word string that has corresponding candidate words together with added information indicating that those candidates can be displayed.
[0015]
The fourth aspect of the present invention (corresponding to the invention described in claim 4) is the speech conversion device according to the third aspect, further comprising screen display specifying means for specifying, on the display screen, the part of the word string that is displayed with the added information.
[0016]
The fifth aspect of the present invention (corresponding to the invention described in claim 5) is the speech conversion device according to the first aspect, wherein the conversion means replaces the identified part of the word string with the selected candidate word and determines the result as the conversion target.
[0017]
  An invention of technology related to the present invention is a speech conversion method of a speech conversion device that converts input speech of a first language into the spoken language of a second language, the method comprising:
  a voice input step of inputting speech in the first language;
  a speech recognition step of recognizing the input speech;
  a first extraction/display step of, when a predetermined word is included in the speech recognition result, extracting an example corresponding to the speech from among the examples of the first language stored in an example database of the speech conversion device, which stores in advance examples of the first language and dependency relationships between predetermined words among the words constituting each example, by using the dependency relationship of the included predetermined word, and displaying one or more word strings constituting that example;
  a conversion target selection step in which any word string to be converted into the second language is selected from the displayed word strings constituting the examples of the first language;
  a second extraction/display step of, when a classified word in the selected word string is identified, extracting words of the same class as the identified classified word from a word class dictionary of the speech conversion device, in which the words included in the examples are classified in advance and which stores in advance words that can replace the classified words, and displaying them as candidates for the replacement;
  a candidate selection step in which any candidate is selected from the displayed candidate words of the same class; and
  a conversion step of determining the conversion target for the second language based on the selected word string constituting the example of the first language and the selected candidate word of the same class, and converting the determined conversion target into the spoken language of the second language.
[0019]
  An invention of technology related to the present invention is also a computer-processable recording medium on which is recorded a program for causing a computer to execute the following steps of the speech conversion method of the speech conversion device described above:
  a speech recognition step of recognizing the input speech;
  a first extraction/display step of, when a predetermined word is included in the speech recognition result, extracting an example corresponding to the speech from among the examples of the first language stored in an example database of the speech conversion device, which stores in advance examples of the first language and dependency relationships between predetermined words among the words constituting each example, by using the dependency relationship of the included predetermined word, and displaying one or more word strings constituting that example;
  a conversion target selection step in which any word string to be converted into the second language is selected from the displayed word strings constituting the examples of the first language;
  a second extraction/display step of, when a classified word in the selected word string is identified, extracting words of the same class as the identified classified word from a word class dictionary of the speech conversion device, in which the words included in the examples are classified in advance and which stores in advance words that can replace the classified words, and displaying them as candidates for the replacement;
  a candidate selection step in which any candidate is selected from the displayed candidate words of the same class; and
  a conversion step of determining the conversion target for the second language based on the selected word string constituting the example of the first language and the selected candidate word of the same class, and converting the determined conversion target into the spoken language of the second language.
[0022]
With the above configuration, in the present invention, for example, it is possible to provide small hardware that the user can hold with one hand and easily operate with buttons and a touch panel. For example, it is possible to classify and hold words included in an example sentence to be speech-interpreted, and to implement only a small number of words representing the class as a recognition target vocabulary in the speech recognition unit. When a sentence including a word representing a class is spoken, an example including the word can be searched and presented to the user. Normally, the user selects a desired example and outputs a translated speech. However, if necessary, the user can replace the word with another word in the class and output the translated speech. For example, if you want to enter “Do you have aspirin” in Japanese, replace it with the word “medicine” representing the class to which the word “aspirin” belongs, and say “Do you have any medicine” in Japanese. Then, the “medicine” part can be replaced with “aspirin”. Through such stepwise operations, the utility value as a speech interpreting apparatus is maintained without implementing a large-scale recognition target vocabulary.
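The retrieval idea in this paragraph, where only class-representative words are recognized and examples containing them are looked up, might be sketched as follows. The matching rule used here (an example is returned when both words of at least one stored dependency pair appear in the recognition result) is a simplifying assumption, as are the data layout, the word segmentation, the second and third sample examples, and all names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str
    dependencies: tuple  # pairs of words that depend on each other in the example


# Hypothetical example database; only the first sentence appears in the description.
EXAMPLES = (
    Example("薬はありますか", (("薬", "あります"),)),
    Example("薬ですか", (("薬", "です"),)),
    Example("トイレはどこですか", (("トイレ", "どこ"),)),
)


def search_examples(recognized_words, examples=EXAMPLES):
    """Return example texts whose dependency pairs are covered by the recognized words."""
    words = set(recognized_words)
    hits = []
    for example in examples:
        matched = sum(1 for a, b in example.dependencies if a in words and b in words)
        if matched:
            hits.append((matched, example.text))
    hits.sort(key=lambda hit: -hit[0])  # stable sort: best matches first
    return [text for _, text in hits]
```

Under this assumption the recognizer only needs the representative word 薬 in its vocabulary; a specific word such as アスピリン is introduced afterwards by replacement rather than by recognition.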
[0023]
  The first to fourteenth other inventions of technology related to the present invention, which address the second problem of the prior art, are described below.
  The first other invention is a speech conversion device comprising:
  an input unit for inputting speech in a first or second language;
  a translation unit that (1) on receiving speech of the first language from the input unit, recognizes the speech and, based on a predetermined control instruction, either (1-a) outputs notation data of the recognized first language or (1-b) converts a conversion target, determined based on the recognition result, into the second language and outputs at least notation data of the converted language, and (2) on receiving speech of the second language from the input unit, recognizes the speech and, based on a predetermined control instruction, either (2-a) outputs notation data of the recognized second language or (2-b) converts a conversion target, determined based on the recognition result, into the first language and outputs at least notation data of the converted language;
  a support unit that supports the determination of the conversion target by the translation unit;
  a display unit that displays the notation data of the converted language output from the translation unit, based on the predetermined control instruction; and
  a control unit that issues the predetermined control instruction to at least the translation unit and the display unit.
[0024]
  The second other invention is the speech conversion device according to the first other invention, further comprising a dialogue history management unit that sequentially holds the notation data based on the speech input through the input unit during a dialogue between a user of the first language and another user of the second language, and outputs it to the display unit as history information.
[0025]
  The third other invention is the speech conversion device according to the first other invention, further comprising a language conversion direction detection unit that detects information for determining the translation direction, that is, whether the speech translation unit should translate from the first language into the second language or from the second language into the first language,
  wherein, based on the detection result, the control unit specifies the translation direction to the speech translation unit and controls the input unit.
[0026]
  The fourth other invention is the speech conversion device according to the third other invention, wherein, when the input unit includes a plurality of voice input units, the control unit controls the input unit by selecting the voice input unit that best picks up the voice of the user who is speaking.
[0027]
  The fifth other invention is a speech conversion device according to the above invention in which the control unit rotates the display content of the display unit by substantially 180 degrees with respect to the display screen of the display unit, according to the translation direction.
[0028]
  The sixth other invention is the speech conversion device according to the third other invention, wherein the language conversion direction detection unit is a button switch, and the user who is speaking selects the translation direction by pressing the button switch.
[0029]
  The seventh other invention is the speech input translation device according to the third other invention, wherein the language conversion direction detection unit is an angle sensor that detects the direction of best acoustic directivity of a movable microphone, and the user who is speaking selects the translation direction by changing the orientation of the microphone.
[0030]
  The eighth other invention is the speech conversion device according to the third other invention, wherein the language conversion direction detection unit is a gyro sensor installed inside the speech conversion device, and the user who is speaking selects the translation direction by the position in which the speech conversion device is held.
[0031]
  The ninth other invention is the speech conversion device according to the third other invention, wherein the language conversion direction detection unit is the sound source direction detection device of an input unit formed by a microphone array unit, and the translation direction is selected according to the position, relative to the microphone array unit, from which the speaking user speaks.
[0032]
  The tenth other invention is a speech conversion method comprising:
  a voice input step of inputting speech in a first or second language and outputting it;
  a translation step of (1) on receiving the speech of the first language output by the voice input step, recognizing the speech and, based on a predetermined control instruction, either (1-a) outputting notation data of the recognized first language or (1-b) converting a conversion target, determined based on the recognition result, into the second language and outputting at least notation data of the converted language, and (2) on receiving the speech of the second language output by the voice input step, recognizing the speech and, based on a predetermined control instruction, either (2-a) outputting notation data of the recognized second language or (2-b) converting a conversion target, determined based on the recognition result, into the first language and outputting at least notation data of the converted language;
  a support step of supporting the determination of the conversion target in the translation step;
  a display step of displaying the notation data of the converted language output by the translation step, based on the predetermined control instruction; and
  a control step of issuing the predetermined control instruction to at least the translation step and the display step.
[0033]
  The eleventh other invention is a program for causing a computer to function as all or some of the translation unit, the support unit, the display unit, and the dialogue history management unit of the speech conversion device of any one of the first to ninth other inventions.
[0034]
  The twelfth other invention is a program for causing a computer to execute all or some of the translation step, the support step, the display step, and the dialogue history management step of the tenth other invention.
[0035]
  The thirteenth other invention is a medium carrying the program of the eleventh other invention, the medium being processable by a computer.
[0036]
  The fourteenth other invention is a medium carrying the program of the twelfth other invention, the medium being processable by a computer.
[0037]
  According to the other inventions of technology related to the present invention, this configuration uses, for example, a speech conversion device that a user can hold in one hand and operate easily with buttons or a touch panel. For two users facing each other (one using the first language and the other using the second language), it provides a means of acquiring the operating authority manually, a means of handing the operating authority to the other party manually, or a means of acquiring it automatically; it indicates explicitly which user holds the operating authority; and it provides display and input means that are easy for that user to operate. As a result, the usability of the display contents can be improved compared with the prior art without, for example, increasing the display power.
[0038]
DETAILED DESCRIPTION OF THE INVENTION
  The configuration and operation of a speech interpretation device according to an embodiment of the speech conversion device of the present invention are described below with reference to the drawings, together with the operation of the speech conversion method of an invention of technology related to the present invention.
[0039]
FIG. 1 is a block diagram showing a hardware configuration of the speech interpretation apparatus according to the present embodiment.
[0040]
The voice input / output device 102 receives the user's utterance in the source language and outputs the voice translated into the target language. The image output device 103 displays an example of the source language to be interpreted by the interpreting device. The image instruction device 105 and the button 106 are used for allowing the user to select an example displayed on the image output device 103. The arithmetic and control unit 101 converts the data related to the source language input from the voice input / output device 102, the image instruction device 105, and the button 106 into data related to the target language in a voice language, and the voice input / output device 102 and the image output device 103 Output to. The external large-capacity nonvolatile storage device 104 holds a program and data for instructing the processing procedure to the arithmetic control device 101. The external data input / output terminal 107 is used by the arithmetic and control unit 101 to exchange programs and data with external devices. The power supply device 108 supplies power necessary for driving the arithmetic control device 101.
[0041]
Here, the voice input means of the present invention corresponds to the voice input / output device 102, and the first extraction / display means and the second extraction / display means of the present invention correspond to components including the image output device 103, the arithmetic and control unit 101, and the like. The screen display specifying means of the present invention corresponds to the image instruction device 105 and the button 106. Further, the first language of the present invention corresponds to the source language in the present embodiment, and the second language of the present invention corresponds to the target language in the present embodiment.
[0042]
A specific configuration example in which a PC / AT compatible motherboard is used for the arithmetic and control unit 101 is shown in FIG. The voice input / output device 203 is connected using the USB terminal of the motherboard 201. The image output device 204 is connected using the digital RGB interface terminal of the motherboard 201. The external large-capacity nonvolatile storage device 104 uses a 2.5-inch hard disk drive 202 and is connected to the motherboard 201 through an IDE interface. A flash memory disk may be used instead of the hard disk drive. Further, a Li-ion secondary battery 208 is used as the power supply device 108, and + 5V and + 12V voltages are supplied to the motherboard 201. Out of the input / output terminals of the motherboard 201, an analog display output terminal, a local area network terminal, and a keyboard connection terminal are pulled out to constitute an external data input / output terminal 207.
[0043]
A detailed configuration of the image output device 204 is shown in FIG. 3. An LCD unit 301 with a 4-inch display area, VGA resolution, and a cold-cathode tube backlight mounted on its back is connected to the digital RGB interface of the motherboard 302 using 18 bits. A video synchronization signal and a backlight control signal are also connected.
[0044]
A detailed configuration of the image instruction device 205 and the button 206 is shown in FIG. A 3.8-inch pressure-sensitive touch panel 402 is connected to the touch panel controller 401, and the X coordinate and Y coordinate of the indicated position are converted into RS232C standard serial data and connected to the serial terminal COM1 of the motherboard 405. The buttons 403 and 404 are connected to the touch panel controller 401, respectively, and button ON / OFF information is added to the indicated position information. The serial data received by the device driver software of the touch panel controller 401 mounted on the motherboard 405 is decoded, so that the button 403 corresponds to the left button when the mouse is connected to the motherboard 405, and the button 404 corresponds to the right button. A mouse click event occurs.
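The driver-side decoding described above, in which serial data carrying a position and button state is turned into mouse-style events, could look roughly like this. The description gives no packet layout, so the 5-byte format (one flag byte, then big-endian 16-bit X and Y), the bit assignments, and the names `PointerEvent` and `decode_packet` are all hypothetical.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerEvent:
    x: int
    y: int
    left_down: bool   # button 403, treated as the left mouse button
    right_down: bool  # button 404, treated as the right mouse button


def decode_packet(packet: bytes) -> PointerEvent:
    """Decode one hypothetical 5-byte packet: flags, X (u16), Y (u16), big-endian."""
    if len(packet) != 5:
        raise ValueError("expected a 5-byte packet")
    flags, x, y = struct.unpack(">BHH", packet)
    return PointerEvent(x, y, bool(flags & 0x01), bool(flags & 0x02))
```

For instance, `decode_packet(bytes([0x01, 0x01, 0x40, 0x00, 0xF0]))` yields a press of button 403 at position (320, 240) under this assumed layout.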
[0045]
The detailed configuration of the voice input/output device 203 is shown in FIG. The USB audio interface 504 converts the input analog sound into digital data and transmits it to the motherboard 505 (corresponding to 201 in FIG. 2), and converts the digital data transmitted from the motherboard 505 into analog sound. A USB interface is used to send and receive the digital data. A microphone 503 is used to collect analog audio. The output of the USB audio interface 504 is amplified by the audio amplifier 502 and output from the speaker 501. Instead of the USB audio interface 504, an audio interface mounted on the motherboard 505 may be used.
[0046]
FIG. 6 shows a perspective view of an example in which the configuration of FIG. 2 is mounted in a housing that the user can hold with one hand, and FIGS. 7A to 7C show three views thereof. An image instruction device 205, an image display device 204, and a button 206 are mounted on the main housing 601. Reference numerals 603 and 604 correspond to the button 403 and the button 404, respectively. A voice input/output device 203 is mounted on the sub-housing 602. When the interpretation apparatus is not in use, the display surface of the image display device 204 is covered and protected by the sub-housing 602.
[0047]
When using this interpreting apparatus, the sub-housing 802 is used after being moved to a predetermined position where the directivity direction of the voice input / output apparatus 203 (microphone 803) faces the user's face as shown in FIG. The three views are shown in FIGS. 9 (a) to 9 (c). That is, the speaker 804 mounted on the sub-housing 802 is lifted until it faces the user, and the microphone 803 is also lifted. In this state, the LCD 805 with a touch panel can be used.
[0048]
FIGS. 10A to 10C show how the main housing 601 is mounted. The 4-inch VGA LCD unit 301 and the touch panel 402 are overlapped and mounted as a touch panel equipped LCD 1005. FIGS. 11A to 11C show how the sub-housing 602 is mounted.
[0049]
FIG. 12 shows a software configuration diagram as an embodiment of the program and data of the present invention. In FIG. 12, 1201 is a control unit that instructs each component and controls the flow of data from each component; 1202 is a GUI (Graphical User Interface) unit that displays information from the control unit 1201 and sends input from the user to the control unit 1201; 1203 is a voice input unit that records the user's voice according to an instruction from the control unit 1201; 1204 is a voice recognition unit that performs continuous recognition of the user's voice sent from the voice input unit 1203; 1205 is an example database that holds the correspondence between source language and target language examples; 1206 is a word class dictionary that holds the classified words of the example database 1205; 1207 is an example selection unit that selects an example by referring to the example database 1205 based on the voice recognition result sent from the control unit 1201; 1208 is a word selection unit that, following an instruction from the control unit 1201, selects a classified word from the example selected by the example selection unit 1207; 1209 is an alternative word selection unit that selects, by referring to the word class dictionary 1206, alternative words that can replace the classified word specified by the control unit 1201; 1210 is a language conversion unit that converts the example specified by the control unit 1201 into the target language by referring to the example database 1205 and the word class dictionary 1206; and 1211 is a speech synthesis unit that synthesizes and outputs, as speech, the target-language example sentence specified by the control unit 1201. Here, the example database 1205 through the language conversion unit 1210 are collectively referred to as a translation unit 1220.
[0050]
Here, the speech recognition unit of the present invention corresponds to the speech recognition unit 1204, and the conversion target selection unit of the present invention corresponds to the example selection unit 1207 and the like. The screen display specifying means of the present invention corresponds to the word selection unit 1208 and the like, and the candidate selection means of the present invention corresponds to the alternative word selection unit 1209 and the like. The conversion means of the present invention corresponds to the components including the language conversion unit 1210 and the speech synthesis unit 1211.
[0051]
FIG. 14 shows a specific example of the example database 1205. Each example corresponds to one sentence of a dialogue, and the correspondence between the source language and the target language is held together with information predetermined for each example (the components of the source language sentence and the dependency relationships among those components). Source language words enclosed in <> are classified words, which means that each of them can be replaced with another word of the same class.
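One possible record layout for an entry of the example database is sketched below. FIG. 14 is not reproduced here, so the field names, the class `ExampleEntry`, and the sample sentence pair are assumptions chosen to match the description (a source/target pair, a list of components, and dependency pairs), not the actual contents of the figure.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleEntry:
    number: int
    source: str          # source-language sentence; <...> marks a classified word
    target: str          # target-language counterpart with the same slot
    components: tuple    # components of the source sentence, in order
    dependencies: tuple  # pairs (i, j): component i depends on component j (1-based)

    def classified_words(self):
        """Return the class names that appear as <...> slots in the source."""
        return re.findall(r"<([^<>]+)>", self.source)


# Hypothetical entry modelled on example number 2 as discussed in the text.
entry = ExampleEntry(
    number=2,
    source="Is there any <medicine>",
    target="Any <medicine>",
    components=("something", "<medicine>", "yes"),
    dependencies=((1, 2), (2, 3)),
)
```

Keeping the slot marker identical on both sides lets the same replacement be applied to the source and target sentences.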
[0052]
FIG. 15 shows a specific example of the word class dictionary 1206. Here, a class is a word with a high degree of abstraction such as “fruit”, and a word belonging to a class is a word that expresses a specific entity of that class, such as “apple” or “mandarin orange”. Note that examples can be selected efficiently by changing the abstraction level of the classification according to the performance of the speech recognition unit 1204. Further, the word class dictionary 1206 may be configured by arranging the classes in a hierarchy.
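A hierarchical word class dictionary of the kind mentioned above might be sketched as a mapping whose members are either words or the names of narrower classes. The class "food" and the word "bread" are invented for illustration; only "fruit", "apple", and "mandarin orange" come from the description.

```python
# Hypothetical hierarchy: "food" is an assumed parent class of "fruit".
WORD_CLASSES = {
    "food": ["fruit", "bread"],
    "fruit": ["apple", "mandarin orange"],
}


def words_in_class(name, classes=WORD_CLASSES):
    """Collect the concrete words under a class, descending into sub-classes.

    A member that is itself a key of the mapping is treated as a sub-class.
    The sketch assumes the hierarchy contains no cycles.
    """
    words = []
    for member in classes.get(name, []):
        if member in classes:
            words.extend(words_in_class(member, classes))
        else:
            words.append(member)
    return words
```

Choosing a higher class such as "food" widens the set of replaceable words, while a lower class such as "fruit" narrows it.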
[0053]
FIG. 16 shows details of the GUI unit 1202 displayed on the LCD 805 with a touch panel. 1601 is a translation direction designating unit that designates the direction of translation; 1603 is a speech recognition result display unit that displays the speech recognition result recognized by the speech recognition unit 1204; 1604 is an example candidate display unit that displays the example sentences selected by the example selection unit 1207; 1605 is an example selection result display unit that displays the example specified by the user; 1606 is a translation result display unit that displays the example converted into the target language by the language conversion unit 1210; and 1607 and 1608 correspond to the buttons 806 and 807, respectively, and accept input from the user. In addition, the user can perform pointing input on the LCD 805 with a touch panel.
[0054]
FIG. 13 is a flowchart of the software of the present invention. 1301 is a step of selecting the direction of translation; 1302 is a step of inputting voice through the microphone 803 and performing voice recognition; 1303 is a step of searching for examples in the example database 1205 based on the voice recognition result; 1304 is a step in which the user selects an example from the retrieved examples; 1305 is a step of determining whether the example selected in step 1304 is to be modified or translated; 1306 is a step of selecting the word to be corrected in the example selected in step 1304; 1307 is a step of outputting a list of words that can replace the word selected in step 1306; 1308 is a step of selecting the word desired by the user from the list output in step 1307; 1309 is a step of replacing the word in the example with the word selected in step 1308; 1310 is a step of converting the example sentence determined in step 1305 into the target language; and 1311 is a step of synthesizing speech for the example converted into the target language in step 1310 and outputting it from the speaker 804.
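The control flow of the flowchart can be sketched as a small loop around step 1305. The function name `run_flow` and the string decisions "modify" and "translate" are assumptions for illustration; the sketch only records which steps are visited and performs no recognition or translation.

```python
def run_flow(decisions):
    """Return the sequence of step numbers visited.

    `decisions` lists the user's choices at step 1305, each either
    "modify" (replace a word) or "translate" (determine the example).
    """
    trace = [1301, 1302, 1303, 1304]
    for decision in decisions:
        trace.append(1305)
        if decision == "translate":
            # Convert to the target language, then synthesize and output.
            trace += [1310, 1311]
            return trace
        if decision != "modify":
            raise ValueError(f"unknown decision: {decision!r}")
        # Select the word, list alternatives, pick one, replace it,
        # then return to step 1305.
        trace += [1306, 1307, 1308, 1309]
    raise ValueError("flow ended without a 'translate' decision")
```

For the aspirin case discussed later, the user modifies once and then translates, which corresponds to `run_flow(["modify", "translate"])`.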
[0055]
Hereinafter, the operation of the software of the present invention will be described with reference to the flowchart of FIG. 13 and the display contents of the GUI unit 1202 displayed on the touch panel-equipped LCD 805 of FIGS. As an example, a case where the user wants to translate the sentence “Is there aspirin?” will be described. Specifically, the user inputs “Is there a medicine” by voice and then performs an operation of replacing the “medicine” portion with “aspirin”. In the present invention, two types of input operation are available, one using the touch panel and one using the buttons, so touch panel input and button input are described below in this order.
[0056]
FIG. 17 shows display contents of the GUI unit 1202 from step 1301 to step 1303 in the case of touch panel input. In step 1301, the user clicks the translation direction designation unit 1701 by touch panel input and designates translation in the Japanese-English direction. At this time, the GUI unit 1202 transmits the translation direction to the control unit 1201, and the control unit 1201 instructs the voice input unit 1203 to input a voice. The user uses the microphone 803 to say “Is there any medicine?” The voice input unit 1203 transmits the input voice to the voice recognition unit 1204. In step 1302, it is assumed that the speech recognition unit 1204 performs speech recognition corresponding to the specified translation direction, and transmits a recognition result “Is there a 7-day medicine” including misrecognition to the control unit 1201. The control unit 1201 transmits the voice recognition result to the GUI unit 1202 and the example selection unit 1207. The GUI unit 1202 displays the transmitted speech recognition result on the recognition result display unit 1702. On the other hand, in step 1303, the example selection unit 1207 searches for an example by the following method based on the voice recognition result, and transmits the searched example to the control unit 1201.
[0057]
The example selection unit 1207 extracts “7 days”, “medicine”, and “yes” as a set of important words defined in the example database 1205 from the speech recognition result “Is there 7 day medicine?”.
[0058]
Here, “7 days” belongs to the classified word <days>, and “medicine” belongs to the classified word <medicine>. “Yes” does not belong to any classified word.
[0059]
The example selection unit 1207 sequentially checks the dependency relationships of the components in FIG. 14 and, from among the examples in which one or more dependency relationships are established, selects examples in descending order of the number of established relationships. For example, for the example of example number 1, “over” does not exist in the above set of key words, so the number of established dependency relationships is zero. For the example of example number 2, “something” does not exist in the above set of key words, so (1 → 2) does not hold among the dependency relationships of the components, but (2 → 3) does (see FIG. 14). Therefore, the number of established dependency relationships is 1.
[0060]
If the example selection unit 1207 is designed to select, from the example database 1205, examples having one or more established dependency relationships, the example of example number 1 in FIG. 14 is not selected and the example of example number 2 is selected. Since “something” does not exist in the set of key words, the following is output for the selected example of example number 2:
・ "Is there any medicine?"
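The selection rule described above (count the dependency relationships whose two components both appear among the key words, keep examples with at least one, and order them by that count) can be sketched as follows. The contents of `EXAMPLES` and `CLASS_OF` are placeholders, since FIG. 14 and FIG. 15 are not reproduced; only the outcome for example numbers 1 and 2 follows the text.

```python
# Hypothetical stand-ins for FIG. 14; dependency pairs are 1-based indices.
EXAMPLES = [
    {"number": 1, "components": ("over", "<days>"), "dependencies": ((1, 2),)},
    {"number": 2, "components": ("something", "<medicine>", "yes"),
     "dependencies": ((1, 2), (2, 3))},
]

# Assumed mapping from recognized words to their classes.
CLASS_OF = {"7 days": "<days>", "medicine": "<medicine>"}


def normalize(words):
    """Replace each classified word by its class name; keep other words as-is."""
    return {CLASS_OF.get(word, word) for word in words}


def count_established(example, key_words):
    """Count dependencies whose two components are both present."""
    parts = example["components"]
    return sum(
        1
        for i, j in example["dependencies"]
        if parts[i - 1] in key_words and parts[j - 1] in key_words
    )


def select_examples(examples, key_words, minimum=1):
    """Return qualifying examples, highest count first (ties keep input order)."""
    scored = [(count_established(e, key_words), e) for e in examples]
    hits = [pair for pair in scored if pair[0] >= minimum]
    hits.sort(key=lambda pair: -pair[0])
    return [example for _, example in hits]
```

With the key words "7 days", "medicine", and "yes", only example number 2 qualifies, with one established relationship.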
[0061]
The following description assumes that other examples in the example database 1205,
・ "Is it a medicine"
・ "It is a medicine"
are selected in the same manner as described above.
[0062]
The control unit 1201 transmits the example sentence transmitted from the example selection unit 1207 to the GUI unit 1202. The GUI unit 1202 displays the selected example sentence on the example candidate display unit 1703.
[0063]
The display content of the GUI unit 1202 in step 1304 is shown in FIG. In step 1304, from the example candidates displayed on the example candidate display unit 1703, the user selects the example “Is there a medicine”, which has the same meaning as the intended sentence, by clicking 1801 through touch panel input. At this time, the GUI unit 1202 transmits the selected example sentence to the control unit 1201.
[0064]
The display contents of the GUI unit 1202 in step 1305 are shown in FIG. In step 1305, the GUI unit 1202 displays the selected example sentence on the example result display unit 1901 and clears the example candidate display unit 1902. After that, it is selected whether to determine the example and translate it, or to modify the example and replace the classified word with a replaceable word. At this time, the user can determine an example by clicking the example result display unit 1901 on the touch panel. The determined example is transmitted to the control unit 1201. Further, the user can shift to a mode for replacing words in the example by double-clicking the example result display unit 1901 on the touch panel.
[0065]
The display content of the GUI unit 1202 when the example is determined in step 1305 is shown in FIG. In step 1310, the control unit 1201 transmits the example “Is there any medicine” determined by the user to the language conversion unit 1210. The language conversion unit 1210 converts it into the target-language sentence “Any medicine” using the example database 1205 and transmits the conversion result to the control unit 1201. The control unit 1201 transmits the conversion result to the GUI unit 1202 and the speech synthesis unit 1211. In step 1311, the GUI unit 1202 displays the conversion result on the interpretation result display unit 2001. Meanwhile, the speech synthesis unit 1211 synthesizes the conversion result as speech and outputs it from the speaker 804.
[0066]
The display content of the GUI unit 1202 in step 1306 is shown in FIG. In step 1306, which is reached when the user selects the word selection mode in step 1305, the word to be changed is selected. At this time, the control unit 1201 instructs the word selection unit 1208 to select a word. The word selection unit 1208 extracts the classified word “medicine” from the example and transmits it to the control unit 1201. The control unit 1201 transmits the word to the GUI unit 1202, and the GUI unit 1202 underlines “medicine” in the example result display unit 2101 to show the user that the word can be changed. The user clicks the word “medicine” to be corrected by touch panel input. The GUI unit 1202 transmits the selected word to the control unit 1201.
[0067]
The display content of the GUI unit 1202 in step 1307 is shown in FIG. In step 1307, a list of alternative words for the word “medicine” designated by the user in step 1306 is displayed. The control unit 1201 transmits the word “medicine” designated by the user to the alternative word selection unit 1209. The alternative word selection unit 1209 refers to the word class dictionary 1206 shown in FIG. and extracts the following words, which belong to the same class as the word “medicine” specified by the user:
・ "Aspirin"
・ "Cold medicine"
・ "Troche"
・ "Gastrointestinal medicine"
It then transmits them to the control unit 1201. The control unit 1201 transmits the list of alternative words to the GUI unit 1202, and the GUI unit 1202 displays the list of alternative words in the list window 2201.
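The lookup performed by the alternative word selection unit can be sketched as below. The dictionary contents follow the words named in the text, but the data layout and the function names `class_of` and `alternative_words` are assumptions; in particular, treating “medicine” as both a class name and a selectable word is one possible reading of the description.

```python
# Assumed layout of the word class dictionary: class name -> member words.
WORD_CLASS_DICTIONARY = {
    "medicine": ["aspirin", "cold medicine", "troche", "gastrointestinal medicine"],
    "fruit": ["apple", "mandarin orange"],
}


def class_of(word, dictionary=WORD_CLASS_DICTIONARY):
    """Return the class a word belongs to; a class name maps to itself."""
    if word in dictionary:
        return word
    for name, members in dictionary.items():
        if word in members:
            return name
    return None


def alternative_words(word, dictionary=WORD_CLASS_DICTIONARY):
    """List the words of the same class that could replace `word`."""
    name = class_of(word, dictionary)
    if name is None:
        return []
    return [candidate for candidate in dictionary[name] if candidate != word]
```

The returned list is what the GUI unit would show in the list window; a word that belongs to no class yields an empty list and is therefore not replaceable.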
[0068]
The display content of the GUI unit 1202 in step 1308 is shown in FIG. Step 1308 selects a desired word from the list of alternative words shown in the list window 2201. At this time, the GUI unit 1202 obtains an alternative word “aspirin” by clicking on the alternative word 2301 desired by the user through the user's touch panel input, and transmits the acquired alternative word to the control unit 1201.
[0069]
The display content of the GUI unit 1202 in step 1309 is shown in FIG. Step 1309 replaces the word in the example with the designated alternative word “aspirin”, changing the example to “Is there aspirin”. The GUI unit 1202 then updates the example displayed in the example result display unit 2401 to “Is there aspirin” and displays it. The process then returns to step 1305.
[0070]
FIG. 25 shows the display content of the GUI unit 1202 when the user repeats steps 1305 to 1308, selects the example determination in step 1305, and the example “Is there aspirin” is converted into the target-language sentence “Any aspirin” and output as synthesized speech.
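The word replacement of step 1309 and the conversion of step 1310 can be sketched together as slot filling on a paired example. The romanized Japanese source strings ("kusuri", "asupirin", "wa arimasu ka"), the `LEXICON` table, and the function names are assumptions introduced only to make the two languages visibly different; the description does not give the stored forms.

```python
# Hypothetical paired example sharing one classified slot.
EXAMPLE = {"source": "<medicine> wa arimasu ka", "target": "Any <medicine>"}

# Assumed bilingual entries for words of the <medicine> class.
LEXICON = {"kusuri": "medicine", "asupirin": "aspirin"}


def fill_slot(template, slot, word):
    """Replace the <slot> marker in a template with a concrete word."""
    marker = f"<{slot}>"
    if marker not in template:
        raise KeyError(marker)
    return template.replace(marker, word)


def replace_and_convert(example, slot, source_word, lexicon=LEXICON):
    """Return (source sentence, target sentence) after filling the slot."""
    source_sentence = fill_slot(example["source"], slot, source_word)
    target_sentence = fill_slot(example["target"], slot, lexicon[source_word])
    return source_sentence, target_sentence
```

Because the sentence pattern is stored as a pair, only the slot word needs a dictionary lookup; the rest of the target sentence is taken directly from the example.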
[0071]
Next, the case of button input will be described. In the following description, SW1 physically corresponds to the button 806 and SW2 physically corresponds to the button 807.
[0072]
The display contents of the GUI unit 1202 from step 1301 to step 1303 are shown in FIG. In step 1301, the translation in the Japanese-English direction is designated by clicking SW1, and the translation in the English-Japanese direction is designated by clicking SW2. In this case, click SW1 to specify Japanese-English translation. At this time, the GUI unit 1202 transmits the translation direction to the control unit 1201, and the control unit 1201 instructs the voice input unit 1203 to input a voice. The user uses the microphone 803 to say “Is there any medicine?” The voice input unit 1203 transmits the input voice to the voice recognition unit 1204. In step 1302, it is assumed that the speech recognition unit 1204 performs speech recognition corresponding to the specified translation direction, and transmits a recognition result “Is there a 7-day medicine” including misrecognition to the control unit 1201. The control unit 1201 transmits the voice recognition result to the GUI unit 1202 and the example selection unit 1207. The GUI unit 1202 displays the transmitted speech recognition result on the recognition result display unit 1702. On the other hand, in step 1303, the example selection unit 1207 transmits the example to the control unit 1201 based on the voice recognition result.
[0073]
The example selection unit 1207 extracts “7 days”, “medicine”, and “yes” as a set of important words defined in the example database 1205 from the speech recognition result “Is there 7 day medicine?”.
[0074]
Here, “7 days” belongs to the classified word <days>, and “medicine” belongs to the classified word <medicine>. “Yes” does not belong to any classified word.
[0075]
The example selection unit 1207 sequentially checks the dependency relationships of the components in FIG. 14 and, from among the examples in which one or more dependency relationships are established, selects examples in descending order of the number of established relationships. For example, for the example of example number 1, “over” does not exist in the above set of key words, so the number of established dependency relationships is zero. For the example of example number 2, “something” does not exist in the above set of key words, so (1 → 2) does not hold among the dependency relationships of the components, but (2 → 3) does (see FIG. 14). Therefore, the number of established dependency relationships is 1.
[0076]
If the example selection unit 1207 is designed to select, from the example database 1205, examples having one or more established dependency relationships, the example of example number 1 in FIG. 14 is not selected and the example of example number 2 is selected. Since “something” does not exist in the set of important words, the following is output for the selected example of example number 2:
・ "Is there any medicine?"
[0077]
The following description assumes that other examples in the example database 1205,
・ "Is it a medicine"
・ "It is a medicine"
are selected in the same manner as described above.
[0078]
The control unit 1201 transmits the example sentence transmitted from the example selection unit 1207 to the GUI unit 1202. The GUI unit 1202 displays the selected example sentence on the example candidate display unit 1703.
[0079]
The display content of the GUI unit 1202 in step 1304 is shown in FIG. In step 1304, from the example candidates displayed on the example candidate display unit 1604, the user selects the example “Is there a medicine”, which has the same meaning as the intended sentence, by button input. Clicking SW1 moves the designated line up by one line, and clicking SW2 moves it down by one line. Double-clicking SW1 selects the example on the designated line. At this time, the GUI unit 1202 transmits the selected example sentence to the control unit 1201.
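The two-button selection scheme (SW1 click moves up, SW2 click moves down, SW1 double-click selects) can be sketched as a small cursor object. The class name `ListCursor`, the event strings, and the choice to stop at the first and last lines rather than wrap around are assumptions; the description does not say what happens at the ends of the list.

```python
class ListCursor:
    """Cursor over a list of candidates driven by two buttons."""

    def __init__(self, items):
        if not items:
            raise ValueError("at least one candidate is required")
        self.items = list(items)
        self.index = 0  # the designated line starts at the top

    def handle(self, event):
        """Process one button event; return the chosen item or None."""
        if event == "SW1_click":
            self.index = max(self.index - 1, 0)  # up, clamped (assumed)
        elif event == "SW2_click":
            self.index = min(self.index + 1, len(self.items) - 1)  # down
        elif event == "SW1_double_click":
            return self.items[self.index]
        else:
            raise ValueError(f"unknown event: {event!r}")
        return None
```

The same cursor logic would apply to the alternative word list in step 1308, and with left and right movement substituted for up and down it would also cover the word selection in step 1306.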
[0080]
The display contents of the GUI unit 1202 in step 1305 are shown in FIG. In step 1305, the GUI unit 1202 displays the selected example sentence on the example result display unit 1901 and clears the example candidate display unit 1902. After that, it is selected whether to determine the example and translate it, or to modify the example and replace the classified word with a replaceable word. At this time, the user can determine an example by clicking SW2 with a button input. The determined example is transmitted to the control unit 1201. Further, by clicking SW1 with a button input, it is possible to shift to the word replacement mode in the example, and it is transmitted to the control unit 1201.
[0081]
The display content of the GUI unit 1202 when the example is determined in step 1305 is shown in FIG. In step 1310, the control unit 1201 transmits the example “Is there any medicine” determined by the user to the language conversion unit 1210. The language conversion unit 1210 converts it into the target-language sentence “Any medicine” using the example database 1205 and transmits the conversion result to the control unit 1201. The control unit 1201 transmits the conversion result to the GUI unit 1202 and the speech synthesis unit 1211. In step 1311, the GUI unit 1202 displays the conversion result on the interpretation result display unit 2001. Meanwhile, the speech synthesis unit 1211 synthesizes the conversion result as speech and outputs it from the speaker 804.
[0082]
The display content of the GUI unit 1202 in step 1306 is shown in FIG. In step 1306, which is reached when the user selects the word selection mode in step 1305, the word to be changed is selected. At this time, the control unit 1201 instructs the word selection unit 1208 to select a word. The word selection unit 1208 extracts the classified word “medicine” from the example and transmits it to the control unit 1201. The control unit 1201 transmits the word to the GUI unit 1202, and the GUI unit 1202 underlines “medicine” in the example result display unit 2101 to show the user that the word can be changed. The user selects the word “medicine” to be corrected by button input. That is, clicking SW1 moves the selection one word to the left, and clicking SW2 moves it one word to the right. Double-clicking SW1 selects the word to be corrected. The GUI unit 1202 transmits the selected word to the control unit 1201.
[0083]
The display contents of the GUI unit 1202 in step 1307 are shown in FIG. In step 1307, a list of alternative words for the word “medicine” designated by the user in step 1306 is displayed. The control unit 1201 transmits the word “medicine” designated by the user to the alternative word selection unit 1209. The alternative word selection unit 1209 refers to the word class dictionary 1206 shown in FIG. 15 and extracts the following words, which belong to the same class as the word “medicine” specified by the user:
・ "Aspirin"
・ "Cold medicine"
・ "Troche"
・ "Gastrointestinal medicine"
It then transmits them to the control unit 1201. The control unit 1201 transmits the list of alternative words to the GUI unit 1202, and the GUI unit 1202 displays the list of alternative words in the list window 2201.
[0084]
The display content of the GUI unit 1202 in step 1308 is shown in FIG. Step 1308 selects a desired word from the list of alternative words shown in the list window 2201. At this time, the GUI unit 1202 acquires the alternative word “aspirin” desired by the user through the user's button input and transmits it to the control unit 1201. As for the input method, clicking SW1 moves the cursor up one word, and clicking SW2 moves the cursor down one word. A word is selected by double-clicking SW1.
[0085]
The display content of the GUI unit 1202 in step 1309 is shown in FIG. Step 1309 replaces the word in the example with the designated alternative word “aspirin”, changing the example to “Is there aspirin”. The GUI unit 1202 then updates the example displayed in the example result display unit 2401 to “Is there aspirin” and displays it. The process then returns to step 1305.
[0086]
FIG. 25 shows the display content of the GUI unit 1202 when the user repeats steps 1305 to 1308, selects the example determination in step 1305, and the example “Is there aspirin” is converted into the target-language sentence “Any aspirin” and output as synthesized speech.
[0087]
In the above description, the user input to the GUI unit 1202 is limited to touch panel input and button input. However, it is also possible to select and determine words and examples by voice using voice recognition processing. It is also possible to operate the apparatus by combining the touch panel, button, and voice input modalities. Moreover, although Japanese and English were taken as an example, the invention can be implemented in the same way for other languages such as Chinese, and it does not depend on the language.
[0088]
Also, although the word sequence of the present invention has been described in the above embodiment as a sentence including a plurality of words, it is not limited to this and may consist of a single word such as “Hello”.
[0089]
In the above embodiment, the first extraction / display unit and the second extraction / display unit of the present invention have been described as being realized by the same display device. A configuration realized by the display device described above may be used.
[0090]
As described above, the speech interpreting apparatus as an example of the present invention is a speech interpreting apparatus that selects an example based on speech input and translates it. The hardware of the speech interpreting apparatus comprises: a voice input/output device as a voice modality; an image output device as an image modality; one or more buttons and an image instruction device as a contact modality; an arithmetic and control unit that linguistically converts the source-language data input by the user through the voice input/output device, the image instruction device, and the buttons into target-language data and outputs it to the voice input/output device and the image output device; an external large-capacity nonvolatile storage device that holds the program that instructs the arithmetic and control unit, together with its data; an external data input/output terminal through which the arithmetic and control unit exchanges data; and a power supply device that supplies the power necessary for driving the arithmetic and control unit.
[0091]
Another example is the above-described speech interpreting apparatus using a PC / AT compatible motherboard as the arithmetic and control unit.
[0092]
Another example is the above-described speech interpreting apparatus using a hard disk drive of 2.5 inches or less as the external large-capacity nonvolatile storage device.
[0093]
Another example is the above-described speech interpreting apparatus using a flash memory disk as the external large-capacity nonvolatile storage device.
[0094]
Another example is the above-mentioned speech interpreting apparatus using a liquid crystal display device having a resolution of 240 dots or more in the vertical direction and 240 dots or more in the horizontal direction as the image output apparatus.
[0095]
Another example is the above-described voice interpreting apparatus characterized in that two buttons are used as the buttons and functionally correspond to the mouse buttons of a mouse connected to the motherboard.
[0096]
Another example is the above-described voice interpreting apparatus characterized in that a touch panel having a size equivalent to the display surface of the liquid crystal display device, or a size that covers that display surface, is used as the image instruction device.
[0097]
Another example is the above-described voice interpreting apparatus characterized in that the external data input/output terminal uses a keyboard connection terminal, an analog display output terminal, and a local area network terminal among the input/output terminals of the motherboard.
[0098]
In another example, the audio input / output device includes a USB audio interface that inputs and outputs analog audio data and digital audio data through the USB terminal of the motherboard, and a microphone that collects user's utterances and supplies the audio to the USB audio interface. And an audio amplifier that amplifies the output of the USB audio interface, and a speaker connected to the audio amplifier.
[0099]
In another example, the audio input / output device includes an audio interface of the motherboard, a microphone that collects a user's utterance and applies it to the audio interface, an audio amplifier that amplifies the output of the audio interface, and the audio The voice interpreting apparatus is constituted by a speaker connected to an amplifier.
[0100]
In another example, the voice interpreting device is characterized in that the power supply device is constituted by a lithium ion secondary battery.
[0101]
Another example is the above-described voice interpreting device, designed so that one user can hold it in one hand, easily operate the buttons with the thumb of that hand, and operate the image instruction device with the other hand, and so that the normal of the display surface of the image display device and the directivity of the voice input/output device can easily be directed toward the user's face.
[0102]
As another example, the voice interpreting device is constituted by a main housing in which buttons, an image instruction device and an image display device are mounted, and a sub-housing in which a voice input / output device is mounted. If the voice interpreter is not used, the display surface of the image display device is covered and protected by the sub-housing, and if the voice interpreting device is used, the sub-housing has a directivity direction of the voice input / output device. The speech interpretation apparatus is used after being moved to a predetermined position facing the user's face.
[0103]
An example of the present invention is a speech interpreting apparatus that selects an example based on speech input and translates it, in which the software of the speech interpreting apparatus comprises: a GUI unit that handles input from and output to the user; a source language input unit that inputs speech and performs speech recognition; a translation unit that translates the source language input from the source language input unit into the target language; a speech synthesis unit that synthesizes and outputs, as speech, the target language translated by the translation unit; and a control unit that controls the source language input unit, the GUI unit, the translation unit, and the speech synthesis unit.
[0104]
Another example is the above-described speech interpreting apparatus characterized in that, as the above-mentioned example, one sentence in a dialogue is a unit.
[0105]
Another example is the above-described speech interpreting apparatus characterized in that, as the above-described example, a sentence pattern frequently used in travel conversation is held.
[0106]
Another example is the above-described speech interpretation apparatus, wherein the words included in the above examples are classified together with related words that can replace the words.
[0107]
In another example, the source language input unit includes a voice input unit that performs voice input according to an instruction from the control unit, and performs continuous voice recognition on the voice input from the voice input unit to generate a word string. The speech interpreting apparatus is configured by a speech recognition unit for conversion.
[0108]
Another example is the above-described voice interpreting device in which the translation unit comprises: an example database that holds the correspondence between source language and target language examples; a word class dictionary that holds class information of the words included in the example database; an example selection unit that selects a corresponding example from the example database based on the input from the source language input unit; a word selection unit that selects a word to be corrected from the example selected by the example selection unit; an alternative word selection unit that selects, from the word class dictionary, a word that can replace the word selected by the word selection unit; and a language conversion unit that converts the determined example into the target language.
[0109]
In another example, in the above speech interpreting apparatus, the GUI unit comprises, on the display unit: a translation direction designating unit that designates the translation direction; a speech recognition result display unit that displays the speech recognition result output from the source language input unit; an example candidate display unit that displays the examples selected from the example database by the example selection unit; an example result display unit that displays the example selected by the user; and an interpretation result display unit that displays the target-language example output by the language conversion unit.
[0110]
In another example, in the above speech interpreting apparatus, when the user selects an example from the examples displayed on the example candidate display unit, the GUI unit lets the user select the desired example by a touch panel operation or a button operation.
[0111]
In another example, in the above speech interpreting apparatus, when one or more correctable words are presented to the user, the word selection unit adds a mark to each correctable word in the example result display unit of the GUI unit.
[0112]
In another example, in the above speech interpreting apparatus, the mark on a correctable word is an underline, highlighting, bold type, or blinking of the word.
[0113]
In another example, in the above speech interpreting apparatus, when the user selects the word to be corrected, the word selection unit determines it through a touch panel operation or a button operation on the GUI unit, or through a voice operation using speech recognition.
[0114]
In another example, in the above speech interpreting apparatus, when selecting an alternative word, the alternative word selection unit obtains a list of alternative candidates using the word class dictionary, and the GUI unit displays the alternative candidates as a list.
[0115]
In another example, in the above speech interpreting apparatus, an alternative candidate is selected from the alternative candidate list by a touch panel operation or a button operation on the GUI unit, or by a voice operation using speech recognition.
[0116]
In another example, in the above speech interpreting apparatus, once the example has been changed to the one the user wants, the GUI unit confirms the example by a touch panel operation or a button operation, the language conversion unit translates it into the target language, and the speech synthesis unit outputs synthesized speech of the translated example.
[0117]
As is clear from the above description, the small hardware makes the speech interpreting device easy to carry when the user travels abroad. Further, since the user interface can be operated easily with one hand, the device can be used readily in various situations such as shopping and restaurants. Furthermore, because the user speaks using a word that represents its class and, once the example is confirmed, can replace that word with a related word of the same class, even a small recognition vocabulary does not reduce the device's value as a speech interpreter.
[0118]
Next, a speech input translation device, which is an embodiment of a speech conversion device according to another invention of technology related to the present invention and which addresses the second problem of the prior art, will be described with reference to the drawings.
[0119]
The configuration of the present embodiment is shown in FIG.
[0120]
As shown in the figure, a basic speech translation function is realized by a speech input unit 4101, a translation support unit 4108, a speech translation unit 4102, a display unit 4103, and a speech output unit 4107.
[0121]
Here, since the internal configuration of the apparatus according to the present embodiment has already been described in the above embodiment, a detailed description thereof is omitted here.
[0122]
The correspondence between the configuration of the present embodiment (see FIG. 26) and, for example, the configuration shown in FIG. 12 is as follows. The speech input unit 4101 in FIG. 26 corresponds to the speech input unit 1203 in FIG. 12, and the speech translation unit 4102 corresponds to the translation unit 1220, the speech recognition unit 1204, and the like. The translation support unit 4108 and the display unit 4103 correspond to the GUI unit 1202 and the like, and the speech output unit 4107 corresponds to the speech synthesis unit 1211 and the like.
[0123]
Next, the characteristic components of this embodiment will be described.
[0124]
In FIG. 26, the language conversion direction control unit 4105 determines which of the two users has the authority to operate the translation apparatus, controls the input form of the speech input unit 4101, designates the translation direction to the speech translation unit 4102, and designates the display content of the display unit 4103. Here, one of the two users speaks Japanese (corresponding to the first language of the other invention of technology related to the present invention), and the other speaks English (corresponding to the second language of the other invention of technology related to the present invention).
[0125]
The language conversion direction detection unit 4104 collects the information that the language conversion direction control unit 4105 needs in order to determine which user has operating authority. The dialogue management unit 4106 sequentially holds the translation pairs displayed on the display unit 4103 and displays them on the display unit 4103, in one of the two languages, as a history of the dialogue exchanged between the users (see FIG. 29).
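The role of the dialogue management unit 4106 described above can be illustrated with a small sketch. It is a sketch under stated assumptions rather than the disclosed implementation: the class names `TranslationPair` and `DialogueManager`, the speaker labels, and the Japanese strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TranslationPair:
    speaker: str   # "ja" or "en": which user produced the utterance (assumed label)
    japanese: str  # Japanese side of the pair
    english: str   # English side of the pair


@dataclass
class DialogueManager:
    pairs: list = field(default_factory=list)

    def add(self, pair):
        """Hold translation pairs in the order they were exchanged."""
        self.pairs.append(pair)

    def history(self, language):
        """Return the whole dialogue rendered in a single language."""
        lines = []
        for p in self.pairs:
            text = p.japanese if language == "ja" else p.english
            lines.append(f"({p.speaker}) {text}")
        return lines


manager = DialogueManager()
manager.add(TranslationPair("ja", "薬はありますか", "Do you have medicine?"))
manager.add(TranslationPair("en", "はい", "Yes, I do."))
print(manager.history("en"))  # ['(ja) Do you have medicine?', '(en) Yes, I do.']
```

Storing both sides of every pair is what allows the same history to be shown to whichever user currently holds operating authority, in that user's language.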
[0126]
In the following, the operation of the translation apparatus is described with reference to the figures, taking Japanese and English as an example. An embodiment of the speech conversion method of the other invention of technology related to the present invention is described at the same time.
[0127]
Here, for example, as shown in FIG. 27, it is assumed that a Japanese user is facing the lower side and an English user is facing the upper side in the figure with the translation device in between.
[0128]
The speech input translation device in FIG. 27 is a speech input translation device in which the language conversion direction detection unit 4104 is a button. Assume that the language conversion direction control unit 4105 indicates the conversion direction from Japanese to English as an initial state. The input unit 4101 includes a microphone 4202 and a microphone 4206, but the input of the microphone 4206 is blocked in order to input Japanese. The user presses the voice input button 4201 and then speaks into the microphone 4202 of the input unit 4101 (for example, “Is there any medicine?”). The uttered voice is translated by the voice translation unit 4102.
[0129]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0130]
When the speech translation unit 4102 receives Japanese speech from the speech input unit 4101, it recognizes the speech, extracts one or more Japanese word strings corresponding to the recognition result, and displays them on the display unit 4103 as example candidates.
[0131]
That is, the example candidates are displayed in the example candidate selection window 4203 of the display unit 4103 (for example, the three examples “Is it a medicine”, “Is there a medicine”, and “Is a medicine”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 4204 (for example, “Is there a medicine?”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Do you have medicine?”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Is there a medicine?”, “Do you have medicine?”)). To obtain the other party's answer, the user prompts the other party to operate the translation apparatus by pressing the button 4205, which is the language conversion direction detection unit 4104.
[0132]
Based on the information from the language conversion direction detection unit 4104, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from English to Japanese. The display content of the display unit 4103 is rotated by 180° as shown in FIG. 28 so that it is easy to use for the English user on the opposite side, and is displayed in English. In the input unit 4101, the input of the microphone 4307 is blocked and the microphone 4303 is enabled so that English can be input. The dialogue history window 4301 displays the English side of the translation pairs from the dialogue management unit 4106. As a further illustration, when a translation pair consisting of “Is there any medicine?” and “Do you have medicine?” and a translation pair consisting of “Yes” and “Yes, I do.” are held in the dialogue management unit 4106, the dialogue history window 4401 shown in FIG. 29 displays (Japanese): “Is there any medicine?”, (English): “Yes”. The user presses the voice input button 4302 and then speaks into the microphone 4303 of the input unit 4101 (for example, “Yes, certainly”). The uttered voice is translated by the speech translation unit 4102.
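The switch performed by the language conversion direction control unit 4105 when the direction button is pressed can be sketched as a small state machine. This is a hedged illustration only: `DeviceState`, `DirectionController`, the microphone labels, and the choice to model rotation as an angle in degrees are assumptions, not details taken from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceState:
    source: str            # language being spoken into the device
    target: str            # language being output
    active_mic: str        # the enabled microphone; the other one is blocked
    rotation_deg: int      # rotation of the display content
    display_language: str  # language of the operation screen


class DirectionController:
    """Hypothetical model of the button-driven direction switch."""

    def __init__(self):
        # Assumed initial state: Japanese to English, Japanese user operating.
        self.state = DeviceState("ja", "en", "mic_ja_side", 0, "ja")

    def on_direction_button(self):
        """Hand operating authority to the other user."""
        s = self.state
        other_mic = "mic_en_side" if s.active_mic == "mic_ja_side" else "mic_ja_side"
        self.state = DeviceState(
            source=s.target,
            target=s.source,
            active_mic=other_mic,
            rotation_deg=(s.rotation_deg + 180) % 360,
            display_language=s.target,
        )
        return self.state


controller = DirectionController()
print(controller.on_direction_button())
```

Updating the direction, the enabled microphone, and the screen together in one step keeps the three from disagreeing, which is what lets only one user speak at a time.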
[0133]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0134]
When the speech translation unit 4102 receives English speech from the speech input unit 4101, it recognizes the speech, extracts one or more English word strings corresponding to the recognition result, and displays them on the display unit 4103 as example candidates.
[0135]
That is, candidates are displayed in the example candidate selection window 4304 of the display unit 4103 (for example, the three examples “Yes, I do.”, “Surely.”, and “Certainly.”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 4305 (for example, “Yes, I do.”), and the Japanese text translated from the example is uttered from the speech output unit 4107 (for example, “Yes”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Yes”, “Yes, I do.”)). To prompt the other party to operate the translation apparatus, the user presses the button 4306, which is the language conversion direction detection unit 4104.
[0136]
Based on information from the language conversion direction detection unit 4104, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from Japanese to English. The display content of the display unit 4103 is rotated by 180° as shown in FIG. 29 so that it is easy to use for the Japanese user on the opposite side, and is displayed in Japanese. In the input unit 4101, the input of the microphone 4405 is blocked and the microphone 4403 is enabled so that Japanese can be input. The dialogue history window 4401 displays the Japanese side of the translation pairs from the dialogue management unit 4106. The user presses the voice input button 4402 and then speaks into the microphone 4403 of the input unit 4101 (for example, “Thank you”). The uttered voice is translated by the speech translation unit 4102.
[0137]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0138]
That is, candidates are displayed in the example candidate selection window 4413 of the display unit 4103 (for example, the single example “Thank you”). When the user selects one of them using the translation support unit 4108, the selected example is displayed in the example result window 4414 (for example, “Thank you”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Thank you.”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Thank you”, “Thank you.”)). To obtain a further answer from the other party, the user prompts the other party to operate the translation apparatus by pressing the button 4404, which is the language conversion direction detection unit 4104.
[0139]
The speech input translation device of FIG. 30 is a speech input translation device in which the language conversion direction detection unit 4104 is the tilt angle sensor 4502 of the microphone 4501. That is, the angle sensor 4502 is used to determine whether the microphone 4501 is inclined toward the Japanese user or toward the English user. In the state of FIG. 30, the language conversion direction control unit 4105 indicates the conversion direction from Japanese to English. The user presses the voice input button 4503 and then speaks into the microphone 4501 of the input unit 4101 (for example, “Is there any medicine?”). The uttered voice is translated by the speech translation unit 4102.
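Deriving the conversion direction from the microphone's tilt angle, as described above, might look like the following sketch. The sign convention (positive angles mean the microphone leans toward the Japanese user), the function name `direction_from_tilt`, and the dead zone that ignores small angles near upright are all assumptions added for illustration; the document does not specify thresholds.

```python
def direction_from_tilt(angle_deg, current, dead_zone_deg=10.0):
    """Map a microphone tilt angle to a (source, target) language pair.

    angle_deg > 0 is assumed to mean the microphone leans toward the
    Japanese user, angle_deg < 0 toward the English user. Angles inside
    the dead zone keep the current direction so that small hand
    movements do not flip the translation direction.
    """
    if angle_deg > dead_zone_deg:
        return ("ja", "en")
    if angle_deg < -dead_zone_deg:
        return ("en", "ja")
    return current


direction = ("ja", "en")
direction = direction_from_tilt(-30.0, direction)  # leaned toward the English user
print(direction)  # ('en', 'ja')
```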
[0140]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0141]
That is, candidates are displayed in the example candidate selection window 4504 of the display unit 4103 (for example, the three examples “Is it a medicine”, “Is there a medicine”, and “Is a medicine”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 4505 (for example, “Is there a medicine?”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Do you have medicine?”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Is there a medicine?”, “Do you have medicine?”)). To prompt the other party to operate the translation device, the user points the microphone 4501 toward the English user.
[0142]
Based on the information from the language conversion direction detection unit 4104, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from English to Japanese. The display content of the display unit 4103 is rotated by 180 ° as shown in FIG. 31 so as to be easy to use for the face-to-face English user and is displayed in English. In the dialog history window 4601, the English of the translation pair is displayed from the dialog management unit 4106. The user presses the voice input button 4602 and then speaks into the microphone 4603 of the input unit 4101 (for example, “Yes, certainly”). The uttered voice is translated by the voice translation unit 4102.
[0143]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0144]
That is, candidates are displayed in the example candidate selection window 4604 of the display unit 4103 (for example, the three examples “Yes, I do.”, “Surely.”, and “Certainly.”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 4605 (for example, “Yes, I do.”), and the Japanese text translated from the example is uttered from the speech output unit 4107 (for example, “Yes”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Yes”, “Yes, I do.”)). To obtain the other party's answer, the user points the microphone 4603 toward the Japanese user.
[0145]
Based on information from the language conversion direction detection unit 4104, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from Japanese to English. The display content of the display unit 4103 is rotated by 180 ° as shown in FIG. 32 so as to be easy to use for face-to-face Japanese users, and is displayed in Japanese. In the dialogue history window 4701, the Japanese of the translation pair is displayed from the dialogue management unit 4106. The user presses the voice input button 4702 and then speaks into the microphone 4703 of the input unit 4101 (for example, “Thank you”). The uttered voice is translated by the voice translation unit 4102.
[0146]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0147]
That is, a candidate is displayed in the example candidate selection window 4704 of the display unit 4103 (for example, the single example “Thank you”). When the user selects one of them using the translation support unit 4108, the selected example is displayed in the example result window 4705 (for example, “Thank you”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Thank you.”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Thank you”, “Thank you.”)). To obtain a further answer from the other party, the user prompts the other party to operate the translation apparatus by pointing the microphone 4703 toward the English user.
[0148]
The speech input translation device of FIG. 33 is a speech input translation device in which the language conversion direction detection unit 4104 is a gyro sensor 4801 that detects the inclination of the main body. It is assumed that, in the state of the gyro sensor shown, the language conversion direction control unit 4105 indicates the conversion direction from Japanese to English. The input unit 4101 includes a microphone 4802 and a microphone 4803, but the input of the microphone 4803 is blocked so that Japanese can be input. The user presses the voice input button 4804 and then speaks into the microphone 4802 of the input unit 4101 (for example, “Is there any medicine?”). The uttered voice is translated by the speech translation unit 4102.
[0149]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0150]
That is, candidates are displayed in the example candidate selection window 4805 of the display unit 4103 (for example, the three examples “Is it a medicine”, “Is there a medicine”, and “Is a medicine”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 4806 (for example, “Is there any medicine?”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Do you have medicine?”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Is there any medicine?”, “Do you have medicine?”)). To obtain the other party's answer, the user prompts the other party to operate the translation apparatus by handing it over so that the other party holds the speech translation apparatus upside down.
[0151]
Based on the information from the gyro sensor 4901, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from English to Japanese. The display content of the display unit 4103 is rotated by 180 ° as shown in FIG. 34 so as to be easy to use for the face-to-face English user and is displayed in English. In the input unit 4101, the input of the microphone 4902 is blocked in order to input English, and the microphone 4903 is enabled. In the dialogue history window 4904, the English of the translation pair is displayed from the dialogue management unit 4106. The user presses the voice input button 4905 and then speaks into the microphone 4903 of the input unit 4101 (for example, “Yes, certainly”). The uttered voice is translated by the voice translation unit 4102.
[0152]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0153]
That is, candidates are displayed in the example candidate selection window 4906 of the display unit 4103 (for example, the three examples “Yes, I do.”, “Surely.”, and “Certainly.”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 4907 (for example, “Yes, I do.”), and the Japanese text translated from the example is uttered from the speech output unit 4107 (for example, “Yes”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Yes”, “Yes, I do.”)). To obtain the other party's answer, the user prompts the other party to operate the translation apparatus by handing it over so that the other party holds the speech translation apparatus upside down.
[0154]
Based on information from the gyro sensor 5001, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from Japanese to English. The display content of the display unit 4103 is rotated by 180 ° as shown in FIG. 35 so as to be easy to use for the face-to-face Japanese user and is displayed in Japanese. Since the input unit 4101 inputs Japanese, the input of the microphone 5002 is blocked and the microphone 5003 is enabled. In the dialogue history window 5004, the Japanese of the translation pair is displayed from the dialogue management unit 4106. The user presses the voice input button 5005 and then speaks into the microphone 5003 of the input unit 4101 (for example, “Thank you”). The uttered voice is translated by the voice translation unit 4102.
[0155]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0156]
That is, candidates are displayed in the example candidate selection window 5006 of the display unit 4103 (for example, the single example “Thank you”). When the user selects one of them using the translation support unit 4108, the selected example is displayed in the example result window 5007 (for example, “Thank you”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Thank you.”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Thank you”, “Thank you.”)). To obtain a further answer from the other party, the user prompts the other party to operate the translation device by handing it over so that the other party holds the speech translation device upside down.
[0157]
The speech input translation device of FIG. 36 is a speech input translation device in which the input unit 4101 and the language conversion direction detection unit 4104 are a microphone array unit 5101 that can detect the direction of a sound source. The microphone array unit 5101 has a function of identifying the direction of the sound source and then collecting sound with sharp directivity. In general, the microphone array unit 5101 consists of a plurality of geometrically arranged microphone units and an arithmetic unit that converts the outputs of the microphone units into a single output by digital signal processing.
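One common way to realize the two functions attributed to the microphone array above (finding the source direction, then combining the microphone outputs into one) is time-delay estimation followed by delay-and-sum. The sketch below uses that technique with only two microphones; it is a swapped-in textbook method under stated assumptions, not the processing disclosed in this document. The function names, the integer-sample lag search, and the mapping of microphone `a` to the Japanese user's side are hypothetical.

```python
def best_lag(a, b, max_lag):
    """Return the lag d in [-max_lag, max_lag] that maximizes sum(a[n] * b[n + d]).

    A positive result means signal b is a delayed copy of a, i.e. the
    sound reached microphone a first.
    """
    def correlation(d):
        return sum(a[n] * b[n + d] for n in range(len(a)) if 0 <= n + d < len(b))

    return max(range(-max_lag, max_lag + 1), key=correlation)


def speaker_side(lag):
    """Assumed geometry: microphone a is on the Japanese user's side."""
    if lag > 0:
        return "ja"
    if lag < 0:
        return "en"
    return None  # source is equidistant; direction cannot be decided


def delay_and_sum(a, b, lag):
    """Align b to a using the estimated lag and average into a single output."""
    out = []
    for n in range(len(a)):
        m = n + lag
        b_value = b[m] if 0 <= m < len(b) else 0.0
        out.append((a[n] + b_value) / 2)
    return out


pulse = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]  # the same pulse, two samples later
lag = best_lag(pulse, delayed, max_lag=3)
print(lag, speaker_side(lag))  # 2 ja
```

A real array would use more microphones, fractional delays, and noise handling; the two-channel version is kept only to show how one estimate serves both direction detection and directional pickup.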
[0158]
When the Japanese user starts to speak (for example, “no,”), the microphone array unit 5101 detects the direction of the speaker's voice, and the apparatus enters the state in which an utterance can be accepted. While an utterance cannot be accepted, the background color of the display unit 4103 is a color that cautions the user (for example, red); once an utterance can be accepted, the background changes to a color that signals permission (for example, green). Based on the information from the microphone array unit 5101, the language conversion direction control unit 4105 indicates the conversion direction from Japanese to English. When the user speaks in the utterable state (for example, “Is there any medicine?”), the uttered speech is translated by the speech translation unit 4102.
[0159]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0160]
That is, candidates are displayed in the example candidate selection window 5102 of the display unit 4103 (for example, the three examples “Is it a medicine”, “Is there a medicine”, and “Is a medicine”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 5103 (for example, “Is there a medicine?”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Do you have medicine?”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Is there a medicine?”, “Do you have medicine?”)).
[0161]
When the other party starts speaking in order to answer (for example, “Hmm,”), the microphone array unit 5201 detects the direction of the speaker's voice, and the apparatus enters the state in which an utterance can be accepted. Based on information from the microphone array unit 5201, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from English to Japanese. The display content of the display unit 4103 is rotated by 180° as shown in FIG. 37 so that it is easy to use for the English user on the opposite side, and is displayed in English. The dialogue history window 5202 displays the English side of the translation pairs from the dialogue management unit 4106. When the English user speaks in the utterable state (for example, “Yes, certainly”), the uttered speech is translated by the speech translation unit 4102.
[0162]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0163]
That is, candidates are displayed in the example candidate selection window 5203 of the display unit 4103 (for example, the three examples “Yes, I do.”, “Surely.”, and “Certainly.”). When the user selects one of them using the translation support unit 4108 (for example, with the touch panel), the selected example is displayed in the example result window 5204 (for example, “Yes, I do.”), and the Japanese text translated from the example is uttered from the speech output unit 4107 (for example, “Yes”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Yes”, “Yes, I do.”)).
[0164]
When the other party starts speaking in order to answer (for example, “Ah,”), the microphone array unit 5301 detects the direction of the speaker's voice, and the apparatus enters the state in which an utterance can be accepted. Based on information from the microphone array unit 5301, the language conversion direction control unit 4105 instructs the speech translation unit 4102 and the display unit 4103 to convert from Japanese to English. The display content of the display unit 4103 is rotated by 180° as shown in FIG. The dialogue history window 5302 displays the Japanese side of the translation pairs from the dialogue management unit 4106. When the user speaks in the utterable state (for example, “Thank you”), the uttered speech is translated by the speech translation unit 4102.
[0165]
Note that the translation operation in the translation unit 4102 has been described with reference to FIG. 14 and the like in the above embodiment, and therefore, detailed description thereof is omitted here, but the outline is as follows.
[0166]
That is, candidates are displayed in the example candidate selection window 5303 of the display unit 4103 (for example, the single example “Thank you”). When the user selects one of them using the translation support unit 4108, the selected example is displayed in the example result window 5304 (for example, “Thank you”), and the English text translated from the example is uttered from the speech output unit 4107 (for example, “Thank you.”). A translation pair is sent from the display unit 4103 to the dialogue management unit 4106 (for example, the translation pair (“Thank you”, “Thank you.”)).
[0167]
The touch panel, button, and voice input modalities can be combined, and the buttons can be replaced with the touch panel. In addition, although Japanese and English have been taken as an example, the same applies to other languages such as Chinese; the other invention of technology related to the present invention does not depend on the language.
[0168]
As is clear from the above description, according to the above configuration, the operation screen for a single language is displayed over the entire display unit, so the usability of the translation apparatus is maintained even with a small display unit. In addition, the screen contents make it easy to see which user holds the operating authority, so the two users' utterances do not overlap. Therefore, the recognition rate of speech recognition does not decrease, and the performance of the translation device does not decrease.
[0169]
In the above-described embodiment, the case has been described where the second language is output as speech together with the display of the translation result. However, the present invention is not limited to this.
[0170]
In the above embodiment, the case has been described where two users use the device facing each other with the translation device between them. However, the present invention is not limited to this. For example, a configuration in which the two users use the device side by side is also possible.
[0171]
Specifically, the configuration is as shown in FIGS. 39 and 40. As shown in these figures, the user of the first language (for example, Japanese) uses the voice input button 4201a, and the user of the second language (for example, English) uses the voice input button 4201b. In this configuration, one microphone 5501 is provided at the upper center of the apparatus. This configuration also provides the same effect as the configuration described above.
[0172]
In the above-described embodiment, the case where the example result itself is the translation target has been described, for example as shown in FIGS. 27 and 28. However, the present invention is not limited to this. For example, a list of alternative words may be displayed for a word that the user specifies in the word string shown in the example result window 4204, a desired word may be selected from those alternatives, and the result reflecting the selection may be used as the translation target. That is, the configuration in this case is obtained by applying the configuration described in FIGS. 21 to 24 to the configuration shown in FIG.
[0173]
Specifically, the configuration is as shown in the figures cited below. That is, a list of alternative words (FIG. 41) for the word specified by the user in the word string displayed in the example result window 4204a (FIG. 42) is displayed; a desired word, for example the word "aspirin" 2301, is selected from the alternative words (FIG. 43); and an example result reflecting the selection is displayed in the example result window 4204a (FIG. 44) and becomes the translation target. Subsequent translation operations and the like are the same as those shown in FIG. As a result, the range of translation targets is further expanded and usability is improved.
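The alternative-word flow above can be sketched as follows. This is a minimal illustration, assuming that the word class dictionary maps a class name to the list of mutually replaceable words. The dictionary contents other than "aspirin", the example word string, and the function names `alternatives_for` and `replace_word` are hypothetical and do not come from the description.

```python
from typing import Dict, List

# Hypothetical word class dictionary: class name -> replaceable words.
# Only "aspirin" appears in the description; the other entries are invented.
WORD_CLASSES: Dict[str, List[str]] = {
    "medicine": ["cold medicine", "aspirin", "painkiller"],
}


def alternatives_for(word: str, classes: Dict[str, List[str]] = WORD_CLASSES) -> List[str]:
    """Return the other words of the same class, or an empty list if unclassified."""
    for members in classes.values():
        if word in members:
            return [w for w in members if w != word]
    return []


def replace_word(example: List[str], index: int, chosen: str,
                 classes: Dict[str, List[str]] = WORD_CLASSES) -> List[str]:
    """Return a new word string with example[index] replaced by the chosen alternative."""
    if chosen not in alternatives_for(example[index], classes):
        raise ValueError(f"{chosen!r} is not an alternative for {example[index]!r}")
    return example[:index] + [chosen] + example[index + 1:]


# Word string shown in the example result window (hypothetical content).
example = ["do you have", "cold medicine"]
candidates = alternatives_for(example[1])       # list shown to the user
modified = replace_word(example, 1, "aspirin")  # result that becomes the translation target
```

Restricting replacements to words of the same class keeps the modified example within what the example database and language conversion can handle, which is how this sketch interprets the widened range of translation targets.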
[0174]
  An invention of technology related to the present invention is a program for causing a computer to execute the functions of all or some of the means (or devices, elements, circuits, units, etc.) of the voice conversion device described above, the program operating in cooperation with the computer.
[0175]
  Also, an invention of technology related to the present invention is a program for causing a computer to execute the operations of all or some of the steps (or processes, operations, actions, etc.) of the voice conversion method described above, the program operating in cooperation with the computer.
[0176]
  Also, an invention of technology related to the present invention is a recording medium carrying a program for causing a computer to execute all or some of the operations of all or some of the steps of the voice conversion method of the voice conversion device described above, the recording medium being readable by a computer, and the read program executing the operations in cooperation with the computer.
[0177]
  Also, an invention of technology related to the present invention is a medium carrying a program for causing a computer to execute all or some of the functions of all or some of the means of the voice conversion device described above, the medium being readable by a computer, and the read program executing the functions in cooperation with the computer.
[0178]
  Also, in the invention of technology related to the present invention, "some of the steps (or processes, operations, actions, etc.)" of the voice conversion method of the voice conversion device means several of the plurality of steps, or a part of the operations within one step.
[0179]
  Also, in the invention of technology related to the present invention, "some of the means (or devices, elements, circuits, units, etc.)" means several of the plurality of means, or a part of the functions within one means.
[0180]
  Also, one usage form of the program of the invention of technology related to the present invention may be a mode in which the program is recorded on a computer-readable recording medium and operates in cooperation with the computer.
[0181]
  Also, one usage form of the program of the invention of technology related to the present invention may be a mode in which the program is transmitted through a transmission medium, read by a computer, and operates in cooperation with the computer.
[0183]
Examples of the recording medium include a ROM, and examples of the transmission medium include the Internet, light, radio waves, and sound waves.
[0184]
The computer of the present invention described above is not limited to pure hardware such as a CPU, but may include firmware, an OS, and peripheral devices.
[0185]
As described above, the configuration of the present invention may be realized by software or hardware.
[0186]
[Effects of the invention]
As is apparent from the above description, the present invention has advantages in that it can be further reduced in size and can be easily operated.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a hardware configuration of a speech interpretation apparatus according to an embodiment of the present invention.
FIG. 2 is a detailed block diagram of FIG. 1 when a PC/AT-compatible motherboard is used.
FIG. 3 is a detailed block diagram of the image output apparatus 204.
FIG. 4 is a detailed block diagram of the image instruction device 205 and the button 206.
FIG. 5 is a detailed block diagram of the voice input / output device 203.
FIG. 6 is an overall view of the housing when the speech interpretation apparatus is not in use.
FIG. 7(A) is a front view showing the detailed structure of the speech interpretation apparatus shown in FIG.
(B) is a side view showing the detailed structure of the speech interpretation apparatus shown in FIG.
(C) is a plan view showing the detailed structure of the speech interpretation apparatus shown in FIG.
FIG. 8 is an overall view of the housing when the speech interpretation apparatus is in use.
FIG. 9(A) is a front view showing the detailed structure of the speech interpretation apparatus shown in FIG.
(B) is a side view showing the detailed structure of the speech interpretation apparatus shown in FIG.
(C) is a plan view showing the detailed structure of the speech interpretation apparatus shown in FIG.
FIG. 10(A) is a front view illustrating a method of mounting each component of FIG. 2 on the main housing 801.
(B) is a side view illustrating a method of mounting each component of FIG. 2 on the main housing 801.
(C) is a plan view illustrating a method of mounting each component of FIG. 2 on the main housing 801.
FIG. 11(A) is a front view illustrating a method of mounting each component of FIG. 2 on the sub-housing 802.
(B) is a side view illustrating a method of mounting each component of FIG. 2 on the sub-housing 802.
(C) is a plan view illustrating a method of mounting each component of FIG. 2 on the sub-housing 802.
FIG. 12 is a block diagram showing a software configuration of the speech interpretation apparatus according to the embodiment of the present invention.
FIG. 13 is a flowchart showing the flow of software processing.
FIG. 14 is a diagram showing an example of the contents of the example database 1205.
FIG. 15 is a diagram showing an example of the contents of the word class dictionary 1206.
FIG. 16 is a diagram showing the display contents of the GUI unit 1202.
FIG. 17 is a diagram showing the display contents of the GUI unit 1202 in the processing from steps 1301 to 1303.
FIG. 18 is a diagram showing the display contents of the GUI unit 1202 in the processing of step 1304.
FIG. 19 is a diagram showing the display contents of the GUI unit 1202 in the processing of step 1305.
FIG. 20 is a diagram showing the display contents of the GUI unit 1202 in the processing from steps 1310 to 1311.
FIG. 21 is a diagram showing the display contents of the GUI unit 1202 in the processing of step 1306.
FIG. 22 is a diagram showing the display contents of the GUI unit 1202 in the processing of step 1307.
FIG. 23 is a diagram showing the display contents of the GUI unit 1202 in the processing of step 1308.
FIG. 24 is a diagram showing the display contents of the GUI unit 1202 in the processing of step 1309.
FIG. 25 is a diagram showing the display contents of the GUI unit 1202 in the processing from steps 1310 to 1311.
FIG. 26 is a block diagram showing the configuration of a speech input translation apparatus according to an embodiment of another invention of technology related to the present invention.
FIG. 27 is a diagram showing the use of Japanese in the speech translation apparatus in which the language conversion direction detection unit 4104 is a button.
FIG. 28 is a diagram showing the use of English in the speech translation apparatus in which the language conversion direction detection unit 4104 is a button.
FIG. 29 is a diagram showing the use of Japanese in the speech translation device in which the language conversion direction detection unit 4104 is a button.
FIG. 30 is a diagram illustrating Japanese usage of the speech translation apparatus in which the language conversion direction detection unit 4104 is a microphone axis angle sensor.
FIG. 31 is a diagram showing the use of English by a speech translation apparatus in which the language conversion direction detection unit 4104 is a microphone axis angle sensor.
FIG. 32 is a diagram showing Japanese usage of the speech translation apparatus in which the language conversion direction detection unit 4104 is a microphone axis angle sensor.
FIG. 33 is a diagram showing Japanese usage of the speech translation device in which the language conversion direction detection unit 4104 is a gyro sensor.
FIG. 34 is a diagram showing the use of English in a speech translation apparatus in which the language conversion direction detection unit 4104 is a gyro sensor.
FIG. 35 is a diagram showing Japanese usage of a speech translation apparatus in which the language conversion direction detection unit 4104 is a gyro sensor.
FIG. 36 is a diagram showing Japanese usage of the speech translation apparatus in which the input unit 4101 and the language conversion direction detection unit 4104 are microphone array units.
FIG. 37 is a diagram illustrating the use of English in a speech translation apparatus in which the input unit 4101 and the language conversion direction detection unit 4104 are microphone array units.
FIG. 38 is a diagram showing Japanese usage of the speech translation apparatus in which the input unit 4101 and the language conversion direction detection unit 4104 are microphone array units.
FIG. 39 is a diagram for explaining Japanese-language use of a speech translation apparatus according to another embodiment of the other invention of technology related to the present invention.
FIG. 40 is a diagram for explaining English-language use of a speech translation apparatus according to another embodiment of the other invention of technology related to the present invention.
FIG. 41 is a diagram for explaining the alternative word function in Japanese-language use of a speech translation apparatus according to still another embodiment of the other invention of technology related to the present invention.
FIG. 42 is a diagram for explaining the alternative word function in Japanese-language use of a speech translation apparatus according to still another embodiment of the other invention of technology related to the present invention.
FIG. 43 is a diagram for explaining the alternative word function in Japanese-language use of a speech translation apparatus according to still another embodiment of the other invention of technology related to the present invention.
FIG. 44 is a diagram for explaining the alternative word function in Japanese-language use of a speech translation apparatus according to still another embodiment of the other invention of technology related to the present invention.
[Explanation of symbols]
101 Arithmetic and control unit
102 Voice input / output device
103 Image output device
104 External large-capacity nonvolatile memory device
105 Image instruction device
106 Buttons
107 External data input / output terminal
108 Power supply
201 Motherboard
202 2.5 inch hard disk drive
203 Voice input / output device
204 Image output device
205 Image instruction device
206 Buttons
207 External data output terminal
208 Li-ion secondary battery
301 4-inch VGA LCD unit with backlight
302 Motherboard
401 Touch panel controller
402 3.8-inch pressure-sensitive touch panel
403 Button
404 Button
405 Motherboard
501 Speaker
502 Audio amplifier
503 Microphone
504 USB audio device
505 Motherboard
601 Main housing
602 Sub housing
603 Button
604 Button
701 Front view
702 Right side view
703 Top view
801 Main housing
802 Sub housing
803 Microphone
804 Speaker
805 LCD with touch panel
901 Front view
902 Right side view
903 Top view
1001 Front view
1002 Right side view
1003 Top view
1004 Motherboard
1005 LCD with touch panel
1006 2.5 inch hard disk drive
1007 Button
1008 Button
1101 Front view
1102 Right side view
1103 Top view
1104 Microphone
1105 Speaker
1106 USB audio device
1107 Audio amplifier
1201 Control unit
1202 GUI unit
1203 Voice input unit
1204 Speech recognition unit
1205 Example database
1206 Word class dictionary
1207 Example selection unit
1208 Word selection unit
1209 Alternative word selection unit
1210 Language conversion unit
1211 Speech synthesis unit
1301 Step of determining the translation direction
1302 Step of performing speech recognition
1303 Step of retrieving examples from the example database
1304 Step of selecting an example
1305 Step of determining whether to confirm or modify the example
1306 Step of determining the word to be corrected
1307 Step of acquiring the alternative word list
1308 Step of determining the alternative word
1309 Step of modifying the example
1310 Step of performing language conversion
1311 Step of performing speech synthesis
1601 Translation direction designation part
1602 Translation direction designation part
1603 Recognition result display section
1604 Example candidate display area
1605 Example result display section
1606 Interpretation result display
1607 Button SW1
1608 Button SW2
1701 Translation direction designation part
1702 Recognition result display section
1703 Example candidate display area
1801 Selected examples
1901 Example result display area
1902 Example candidate display area
2001 Interpretation result display section
2101 Example result display area
2201 List window
2301 Selected alternative words
2401 Example result display area
4105 Language conversion direction control unit
4106 Dialog history management unit
4104 Language conversion direction detector

Claims

1. A voice conversion device comprising:
voice input means for inputting voice in a first language;
voice recognition means for performing voice recognition on the input voice;
an example database that stores in advance examples of the first language and dependency relationships between predetermined words among the words constituting each example;
first extraction/display means for, when the predetermined words are included in the voice recognition result, extracting, from the examples of the first language stored in the example database, an example corresponding to the voice by using the dependency relationships of the included predetermined words, and displaying the one or more word strings constituting the example;
conversion target selecting means for selecting, from the displayed word strings constituting the example of the first language, a word string to be converted into a second language;
a word class dictionary in which the words included in the examples are classified in advance and which stores in advance words that can replace the classified words;
second extraction/display means for, when a classified word in the selected word string is specified, extracting from the word class dictionary words of the same class as the specified classified word as replacement candidates and displaying them;
candidate selection means for selecting a candidate from the displayed candidates of words of the same class; and
conversion means for determining the target of conversion into the second language based on the selected word string constituting the example of the first language and the selected candidate word of the same class, and converting the determined conversion target into the speech language of the second language.

2. The voice conversion device according to claim 1, wherein the first extraction/display means has a display unit including a display screen that displays, in predetermined areas, the plurality of word strings to be selected and the selected word string, and
the second extraction/display means is means for displaying the replacement candidates in a window over a partial area of the display screen.

3. The voice conversion device according to claim 2, wherein the first extraction/display means is means for, when displaying the selected word string on the display screen, adding and displaying information indicating that the corresponding candidates can be displayed for a part of the word string.

4. The voice conversion device according to claim 3, further comprising screen display specifying means for specifying, on the display screen, the part of the word string for which the added information is displayed.

5. The voice conversion device according to claim 1, wherein the conversion means determines, as the conversion target, the result of replacing the specified part of the word string with the selected candidate.