JP3755941B2

JP3755941B2 - Spoken dialogue apparatus and dialogue method

Info

Publication number: JP3755941B2
Application number: JP29589696A
Authority: JP
Inventors: 和也野村
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1996-10-18
Filing date: 1996-10-18
Publication date: 2006-03-15
Anticipated expiration: 2016-10-18
Also published as: JPH10124087A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声認識技術と音声合成技術を用いた音声対話装置及び対話方法に関するものである。
【０００２】
【従来の技術】
人との音声対話が可能な装置において、選択しようとするある目的の項目を含む集団に含まれている項目の数が音声認識部の処理能力を超えるような場合、目的の項目名を音声入力する前に、予め目的の項目を含む部分集団を表す言葉を入力して、検索の対象をその集団に特定し、音声認識の対象となる単語数を絞ることが必要である。
【０００３】
例えば、音声対話機能を備えたカーナビゲーション装置において実現されている音声対話を用いた目的地設定のための項目検索機能を用いてゴルフ場を検索する場合、検索の対象となるゴルフ場の項目数の総和が日本全国で２０００施設あり、また音声認識部の最大処理能力が１００単語であるとすると、日本全国のゴルフ場名を音声認識対象として一度に検索することは不可能である。
【０００４】
そこで、県毎にカテゴリ分けした場合、各県毎の施設数が１００以内になるとすると、使用者に対し目的のゴルフ場名を入力させる前に県名を入力させ、音声認識対象を県毎に絞り込んでから目的の施設名を発声させることにより、全項目数が音声認識部の最大処理能力を超える場合でも、その全項目の中から目的の施設名を検索することが可能となる。
【０００５】
従来、このような音声対話装置としては、例えば、図７及び図８に示すようなものがあった。図７は従来の音声対話装置の構成を示すブロック図、図８は図７に示す音声対話装置による音声対話の流れを示すフローチャートである。
【０００６】
まず、図７を参照して、従来の音声対話装置の構成について説明する。図７において、３０３は音声信号を入力し、入力音声信号を分析して特徴パラメータを求める音響分析部、３０４は対話制御部３０５の指令により入力音声信号を分析して得られた特徴パラメータと音声認識辞書とを照合して音声認識を行う音声認識部、３０５は音声対話を制御する対話制御部、３０６は使用者の操作及び音声認識の結果に基づいた音声対話の流れの情報を格納する対話制御用情報格納部である。
【０００７】
また、３０７は音声認識に用いられる辞書を格納する音声認識辞書格納部、３０８は対話制御部３０５の指令により音声認識辞書格納部３０７に格納されている辞書から音声認識に用いる辞書を選択する辞書選択部、３０９は対話制御部３０５の指令により、メッセージ辞書格納部３１０に格納されているメッセージの中から使用者に対して音声により提示すべきメッセージを選択するメッセージ選択部、３１０は使用者に対して提示するメッセージを格納するメッセージ辞書格納部である。
【０００８】
次に、図７及び図８を参照して、上記従来の音声対話装置の動作について説明する。なお、以下に示す対話の流れは図８を参照し、音声認識の対話に使用する辞書の内容は図４乃至図６を参照する。図４は音声対話装置において検索項目のジャンルを音声認識するための音声認識辞書の内容を示す図、図５は音声対話装置においてゴルフ場のある県名を音声認識するための音声認識辞書の内容を示す図、図６は音声対話装置において静岡県のゴルフ場を音声認識するための音声認識辞書の内容を示す図である。
【０００９】
まず、ユーザーの指示により音声対話が開始されると、対話制御部３０５は辞書選択部３０８に対し検索のジャンルを表す言葉で構成された辞書の作成を指令する。この指令により、辞書選択部３０８は音声認識辞書格納部３０７から図４に示すような、検索のジャンルを表す言葉で構成された音声認識辞書の作成を行う。
【００１０】
次に、対話制御部３０５はメッセージ選択部３０９に対し、使用者に対して施設の種類を表す言葉の発声を促すメッセージを出力することを指令する。この指令に対し、メッセージ選択部３０９はメッセージ辞書格納部３１０から「施設の種類をどうぞ」というメッセージを選択して使用者に音声で提示する。
【００１１】
次に、対話制御部３０５は音声認識部３０４に対し、辞書選択部３０８が作成した辞書を用いて音声認識を実行することを指令する。先の「施設の種類をどうぞ」というメッセージを聞いた使用者は検索したいジャンルを表す言葉、この場合「ゴルフ場」を発声して音声対話装置に音声信号を入力する。入力された音声信号は音響分析部３０３において特徴パラメータが求められ、音声認識部３０４で認識される。
【００１２】
認識結果として、「ゴルフ場」が検索のジャンルとして選ばれる。この結果を対話制御部３０５が記憶する。次に、対話制御部３０５は辞書選択部３０８に検索の対象の県名を表す言葉で構成された辞書の作成を指令する。この指令により、辞書選択部３０８は音声認識辞書格納部３０７から図５に示すような、検索の対象の県名を表す言葉で構成された音声認識辞書の作成を行う。
【００１３】
次に、対話制御部３０５はメッセージ選択部３０９に対し、使用者に対して検索の対象の県名を表す言葉の発声をメッセージとして出力することを指令する。この指令に対し、メッセージ選択部３０９はメッセージ辞書格納部３１０から「ゴルフ場のある県名をどうぞ」というメッセージを選択し、使用者に音声で提示する。
【００１４】
次に、対話制御部３０５は、音声認識部３０４に対し、辞書選択部３０８が作成した辞書を用いて音声認識を実行することを指令すると、「ゴルフ場のある県名をどうぞ」というメッセージを聞いた使用者は検索の対象となる県を表す言葉、この場合「静岡県」を発声して音声対話装置に入力する。入力された音声信号は音響分析部３０３で特徴パラメータが求められ、音声認識部３０４で認識され、認識の結果として「静岡県」が検索対象の県名として選ばれる。
【００１５】
この結果を対話制御部３０５が記憶する。対話制御部３０５は先の音声認識の結果の「静岡県」と、その前に行われた音声認識の結果である「ゴルフ場」とを組み合わせ、辞書選択部３０８に対し、静岡県のゴルフ場の名称で構成された辞書の作成を指令する。この指令により、辞書選択部３０８は音声認識辞書格納部３０７から図６に示すような、静岡県のゴルフ場の名称で構成された音声認識辞書の作成を行う。
【００１６】
次に、対話制御部３０５はメッセージ選択部３０９に対し、使用者に対して検索の対象である静岡県のゴルフ場の名称を表す言葉の発声をメッセージとして出力することを指令する。この指令に対し、メッセージ選択部３０９はメッセージ辞書格納部３１０から「ゴルフ場の名称をどうぞ」というメッセージを選択し、使用者に音声で提示する。
【００１７】
次に、対話制御部３０５は、音声認識部３０４に対し、辞書選択部３０８が作成した辞書を用いて音声認識を実行することを指令すると、「ゴルフ場の名称をどうぞ」というメッセージを聞いた使用者は検索の対象となるゴルフ場の名称を表す言葉、この場合「○○カントリークラブ」を発声して音声対話装置に入力する。入力された音声信号は音響分析部３０３で特徴パラメータが求められ、音声認識部３０４で認識され、認識の結果として「○○カントリークラブ」が選ばれ、検索対象が確定する。
【００１８】
次に、対話制御部３０５はメッセージ選択部３０９に対し、確定した検索対象「○○カントリークラブ」をユーザーに提示することを指令する。この指令に対し、メッセージ選択部３０９はメッセージ辞書格納部３１０に格納されている内容と「○○カントリークラブ」とを組み合わせ、「○○カントリークラブ付近の地図を表示します。」というメッセージを作成して使用者に対し音声で提示する。そして、その地図が表示される。以上の動作により、図８に示した対話の流れは完了する。
【００１９】
【発明が解決しようとする課題】
しかしながら、上記の従来広く用いられている音声認識装置では、複数の入力を蓄積する手段を持たないため、先に入力した言葉によって、次に実施すべき音声認識の対象を絞り込むことにより目的の項目を検索するという方法が採られるため、上記のようなゴルフ場の検索の例では、その対話の流れが図８に示すようなものに固定されてしまうことになる。
【００２０】
一般に、音声対話装置の分野では、音声対話装置の使用者に対し違和感とかストレスを与えない、自然な音声対話を提供することが要求されている。上記の例では、ゴルフ場の名称が使用者の入力する情報の主体であり、県名は補足情報である。そのため、図８に示されるように、補足情報を先に入力させ、主体となる情報をあとから入力させると、逆の場合に比べ、主体となる情報を先に入力することができないので、使用者に対し違和感を与えがちになるという問題があった。
【００２１】
本発明は、上記従来の問題を解決するためになされたもので、音声認識部の性能限界により認識語彙数が限定されることから、まず補足情報を先に入力させ認識語彙数を絞り込んだ後に、主体となる情報を入力させるという対話の流れにせざるを得ないような場合でも、同一性能の音声認識部を用いて、主体となる情報を先に入力した後に補足情報を入力するという対話の流れを実現することができ、発声順序を変更して目的の項目を検索しうる音声対話装置及び対話方法を提供することを目的とする。
【００２２】
【課題を解決するための手段】
本発明による音声対話装置及び対話方法は、入力された音声信号を入力音声信号の形でまたは入力音声信号を分析した結果の特徴パラメータの形で蓄積する蓄積手段を設け、音声信号を入力した順序を入れ替えて音声認識することにより、後で発声した言葉の音声認識結果から、前に発声した言葉に対する音声認識の対象を絞るようにしたものである。
【００２３】
本発明によれば、同一性能の音声認識部を用いて、主体となる情報を先に入力（発声）した後に補足情報を入力した場合でも、後で発声した言葉の音声認識結果から前に発声した言葉に対する音声認識の対象を絞ることができるようにしたことにより、使用者に対し違和感を与えない音声対話装置及び対話方法が得られる。
【００２４】
【発明の実施の形態】
本発明の請求項１に記載の発明は、対話制御部の指令により入力音声信号を蓄積するかまたは分析するか、蓄積した入力音声信号を分析するかの切り換えを行う入力音声制御手段と、対話制御部の指令により入力音声信号を蓄積する入力音声蓄積手段と、入力された音声信号を分析して特徴パラメータを求める音響分析手段と、対話制御部の指令により入力音声信号を分析して得られた特徴パラメータと音声認識辞書とを照合して音声認識を行う音声認識手段と、音声対話を制御する対話制御部と、対話制御部の指令により格納されているメッセージの中から使用者に対して提示すべきメッセージを選択して出力するメッセージ選択手段とからなり、入力した音声信号を入力音声蓄積手段に蓄積し、入力音声信号の順序を入れ替えて音声認識するようにしたものであり、入力した音声信号の順序を入れ替えて音声認識することにより、発声順序を変更して目的の項目を検索しうる音声対話装置が得られるという作用を有する。
【００２５】
本発明の請求項２に記載の発明は、入力された音声信号を分析してその特徴パラメータを求める音響分析手段と、対話制御部の指令により入力音声信号を分析して得られた特徴パラメータを蓄積するかまたは音声認識するか、蓄積していた特徴パラメータを音声認識するかの切り換えを行うパラメータ制御手段と、対話制御部の指令により入力音声信号を分析して得られた特徴パラメータを蓄積するパラメータ蓄積手段と、対話制御部の指令により入力音声信号を分析して得られた特徴パラメータと音声認識辞書とを照合して音声認識を行う音声認識手段と、音声対話を制御する対話制御部と、対話制御部の指令によりメッセージ辞書格納手段に格納されているメッセージの中から使用者に対して提示すべきメッセージを選択して出力するメッセージ選択手段とからなり、入力した音声信号を分析して得られた特徴パラメータをパラメータ蓄積手段に蓄積し、特徴パラメータの順序を入れ替えて音声認識するようにしたものであり、入力した音声信号の特徴パラメータの順序を入れ替えて音声認識することにより、発声順序を変更して目的の項目を検索しうる音声対話装置が得られるという作用を有する。
【００２６】
本発明の請求項３に記載の発明は、対話による音声信号を入力し、入力した音声信号を分析して特徴パラメータを求め、前記入力した音声信号かまたは該音声信号から求められた特徴パラメータを蓄積し、制御手段の制御による対話の流れに従い格納されているメッセージから提示すべきメッセージを選択して提示し、前記蓄積した音声信号かまたは特徴パラメータの順序を前記対話の流れとは異なるように入れ換え音声認識辞書と照合して音声認識を行うようにしたものであり、入力した音声信号かまたは音声信号の特徴パラメータの順序を入れ替えて音声認識することにより、発声順序を変更して目的の項目を検索しうる音声対話方法が得られるという作用を有する。
【００２７】
以下、添付図面、図１乃至図６に基づき、本発明の実施の形態を詳細に説明する。図１は本発明の第１の実施の形態における音声対話装置の構成を示すブロック図、図２は本発明の第２の実施の形態における音声対話装置の構成を示すブロック図、図３は図１及び図２に示す音声対話装置による音声対話の流れを示すフローチャートを示す図、図４は音声対話装置において検索項目のジャンルを音声認識するための音声認識辞書の内容を示す図、図５は音声対話装置においてゴルフ場のある県名を音声認識するための音声認識辞書の内容を示す図、図６は音声対話装置において静岡県のゴルフ場を音声認識するための音声認識辞書の内容を示す図である。
【００２８】
（実施の形態１）
まず、図１を参照して、本発明の第１の実施の形態における音声対話装置の構成について詳細に説明する。図１において、１０１は対話制御部１０５の指令により入力音声信号を蓄積するか、入力音声信号を分析するか、または蓄積した入力音声信号を分析するかの切り換えを行う入力音声制御部、１０２は対話制御部１０５の指令により入力音声信号を蓄積する入力音声蓄積部である。
【００２９】
また、１０３は入力された音声信号を分析して特徴パラメータを求める音響分析部、１０４は対話制御部１０５の指令により、入力音声信号を分析して得られた特徴パラメータと音声認識辞書とを照合して音声認識を行う音声認識部、１０５は音声対話を制御する対話制御部、１０６は使用者の操作とか音声認識の結果に従って決まる音声対話の流れに対する情報を格納する対話制御用情報格納部、１０７は音声認識に用いられる辞書を格納する音声認識辞書格納部である。
【００３０】
また、１０８は対話制御部１０５の指令により、音声認識辞書格納部１０７に格納されている辞書から音声認識に用いる辞書を選択する辞書選択部、１０９は対話制御部１０５の指令により、メッセージ辞書格納部１１０に格納されているメッセージの中から使用者に対して提示すべきメッセージを選択するメッセージ選択部、１１０は使用者に対して音声で提示するメッセージを格納するメッセージ辞書格納部である。
【００３１】
尚、入力音声制御部１０１、入力音声蓄積部１０２、音響分析部１０３、音声認識部１０４、対話制御部１０５及びメッセージ選択部１０９はそれぞれ入力音声制御手段、入力音声蓄積手段、音響分析手段、音声認識手段、対話制御手段及びメッセージ選択手段に対応する。
【００３２】
次に、図１及び図３乃至図６を参照して、本発明の第１の実施の形態における音声対話装置の動作について、図３に示す対話の流れを例に詳細に説明する。
まず、ユーザーの指示により音声対話が開始されると、対話制御部１０５は辞書選択部１０８に対し検索のジャンルを表す言葉で構成された辞書の作成を指令する。この指令により、辞書選択部１０８は音声認識辞書格納部１０７から図４に示すような、検索のジャンルを表す言葉で構成された音声認識辞書の作成を行う。
【００３３】
次に、対話制御部１０５はメッセージ選択部１０９に対し、使用者に対して施設の種類を表す言葉の発声を促すメッセージを出力することを指令する。この指令に対し、メッセージ選択部１０９はメッセージ辞書格納部１１０から「どのジャンルを検索しますか？」というメッセージを選択し、使用者に音声で提示する。（尚、使用者に対するこの提示は音声によるほか、表示装置に対する表示をも併用することもできる、以下同じ）。
【００３４】
次に、対話制御部１０５は、音声認識部１０４に対し辞書選択部１０８が作成した辞書を用いて音声認識を実行することを指令するとともに、入力音声制御部１０１に対し、入力音声信号を音響分析部へ出力することを指令する。先の「どのジャンルを検索しますか？」というメッセージを聞いた使用者は検索を希望するジャンルを表す言葉、この場合「ゴルフ場」を発声して音声対話装置に入力する。入力された音声信号は、入力音声制御部１０１を経由し、音響分析部１０３においてその特徴パラメータが求められ、音声認識部１０４で認識される。
【００３５】
認識結果として、「ゴルフ場」が検索のジャンルとして選ばれる。対話制御部１０５はこの結果を記憶する。次に、対話制御部１０５は、メッセージ選択部１０９に対し先の音声認識の結果である「ゴルフ場」の名称の発声を使用者に対して促す言葉をメッセージとして出力することを指令する。この指令に対し、メッセージ選択部１０９はメッセージ辞書格納部１１０から「何というゴルフ場ですか？」というメッセージを選択して使用者に音声で提示する。
【００３６】
次に、対話制御部１０５は、入力音声制御部１０１及び入力音声蓄積部１０２に対し、入力した入力音声信号を蓄積することを指令する。入力音声制御部１０１は、この指令により入力した音声信号を音声蓄積部１０２に出力し、入力音声蓄積部１０２は入力音声信号の蓄積を開始する。
【００３７】
また、先に提示された「何というゴルフ場ですか？」というメッセージを聞いた使用者は検索を希望するゴルフ場を表す言葉、この場合「○○カントリークラブ」を発声し、音声対話装置に入力する。入力された音声である「○○カントリークラブ」は入力音声制御部１０１を経由して、入力音声蓄積部１０２に蓄積される。
【００３８】
この蓄積が終了すると、対話制御部１０５は辞書選択部１０８に対し検索の対象とする県名を表す言葉で構成された辞書の作成を指令する。この指令により、辞書選択部１０８は音声認識辞書格納部１０７から図５に示すような、検索対象の県名を表す言葉で構成された音声認識辞書の作成を行う。
【００３９】
次に、対話制御部１０５はメッセージ選択部１０９に対し、使用者に対して検索の対象の県名を表す言葉の発声を促す言葉をメッセージとして出力することを指令する。この指令に対し、メッセージ選択部１０９はメッセージ辞書格納部１１０から「どの県にありますか？」というメッセージを選択し、使用者に音声で提示する。
【００４０】
次に、対話制御部１０５は、音声認識部１０４に対し、辞書選択部１０８が作成した辞書を用いて音声認識を実行することを指令するとともに、入力音声制御部１０１に対し、入力音声信号を音響分析部１０３へ出力することを指令する。先の「どの県にありますか？」というメッセージを聞いた使用者は検索の対象となる県を表す言葉、この場合「静岡県」を発声し、音声対話装置に入力する。入力された音声信号「静岡県」は入力音声制御部１０１を経由して、音響分析部１０３で特徴パラメータが求められ、音声認識部１０４で認識され、その認識結果として、「静岡県」が検索対象の県名として選ばれる。
【００４１】
対話制御部１０５は、その結果を記憶するとともに、音声認識の結果の「静岡県」と、その前に行われた音声認識の結果である「ゴルフ場」とを組み合わせて、辞書選択部１０８に対し、静岡県のゴルフ場の名称で構成された辞書の作成を指令する。この指令により、辞書選択部１０８は音声認識辞書格納部１０７から図６に示すような、静岡県のゴルフ場の名称で構成された音声認識辞書の作成を行う。
【００４２】
次に、対話制御部１０５は、入力音声制御部１０１及び入力音声蓄積部１０２に対し、先に蓄積した使用者の発声である「○○カントリークラブ」の音声信号を音響分析部１０３に出力することを指令する。この指令により音声蓄積部１０２は蓄積された音声信号を入力音声制御部１０１に出力し、入力音声制御部１０１は音響分析部１０３に対してその入力音声信号の出力を開始する。この音声信号が、音響分析部１０３で分析されてその特徴パラメータが求められ、音声認識部１０４で認識される。その認識結果から、図６に示すような「○○カントリークラブ」が選ばれて検索対象が確定する。
【００４３】
次に、対話制御部１０５はメッセージ選択部１０９に対し、確定した検索対象「○○カントリークラブ」をユーザーに対し音声で提示することを指令する。この指令に対し、メッセージ選択部１０９はメッセージ辞書格納部１１０に格納されている内容と「○○カントリークラブ」を組み合わせ、「○○カントリークラブ付近の地図を表示します。」というメッセージを作成して使用者に対し音声で提示する。そして、その地図が表示される。以上の動作により、図３に示した対話の流れが完了する。
【００４４】
（実施の形態２）
次に、図２を参照して、本発明の第２の実施の形態における音声対話装置の構成について詳細に説明する。図２において、２０１は入力された音声信号を分析してその特徴パラメータを求める音響分析部、２０２は、対話制御部２０５の指令により、入力音声信号を分析した結果得られた特徴パラメータを蓄積するか、入力音声信号を分析した結果得られた特徴パラメータを音声認識するか、または蓄積していた特徴パラメータを音声認識するかの切り換えを行うパラメータ制御部である。
【００４５】
また、２０３は対話制御部２０５の指令により、入力音声信号を分析して得られた特徴パラメータを蓄積するパラメータ蓄積部、２０４は対話制御部２０５の指令により、入力音声信号を分析して得られた特徴パラメータと音声認識辞書とを照合して音声認識を行う音声認識部、２０５は音声対話を制御する対話制御部、２０６は使用者の操作とか音声認識の結果に従って行われる音声対話の流れの情報を格納する対話制御用情報格納部、２０７は音声認識に用いられる辞書を格納する音声認識辞書格納部である。
【００４６】
また、２０８は対話制御部の指令により、音声認識辞書格納部に格納されている辞書から、音声認識に用いる辞書を選択する辞書選択部、２０９は対話制御部の指令により、メッセージ辞書格納部に格納されているメッセージの中から使用者に対して提示すべきメッセージを選択するメッセージ選択部、２１０は使用者に対して提示するメッセージを格納するメッセージ辞書格納部である。
【００４７】
尚、音響分析部２０１、パラメータ制御部２０２、パラメータ蓄積部２０３、音声認識部２０４、対話制御部２０５及びメッセージ選択部２０９はそれぞれ音響分析手段、パラメータ制御手段、パラメータ蓄積手段、音声認識手段、対話制御手段及びメッセージ選択手段に対応する。
【００４８】
次に、図２及び図３乃至図６を参照して、本発明の第２の実施の形態における音声対話装置の動作について、図３に示す対話の流れを例に詳細に説明する。
まず、ユーザーの指示により音声対話が開始されると、対話制御部２０５は、辞書選択部２０８に対し検索のジャンルを表す言葉で構成された辞書の作成を指令する。この指令により、辞書選択部２０８は音声認識辞書格納部２０７から図４に示すような、検索のジャンルを表す言葉で構成された音声認識辞書の作成を行う。
【００４９】
次に、対話制御部２０５はメッセージ選択部２０９に対し、使用者に対して施設の種類を表す言葉の発声を促すメッセージを出力することを指令する。この指令に対し、メッセージ選択部２０９はメッセージ辞書格納部２１０から「どのジャンルを検索しますか？」というメッセージを選択し、使用者に対し音声で提示する。
【００５０】
そこで、対話制御部２０５は、音声認識部２０４に対し、辞書選択部２０８が作成した辞書を用いて音声認識を実行することを指令するとともに、パラメータ制御部２０２に対し、音響分析部２０１において入力音声信号を分析した結果得られた特徴パラメータを音声認識部２０４へ出力することを指令する。先の「どのジャンルを検索しますか？」というメッセージを聞いた使用者は検索を希望するジャンルを表す言葉、この場合「ゴルフ場」を発声して音声対話装置に入力する。
【００５１】
入力された音声信号「ゴルフ場」は、音響分析部２０１で分析されて特徴パラメータに変換され、パラメータ制御部２０２を経由し、音響分析部２０１で求めた特徴パラメータが音声認識部２０４で認識される。認識結果として、「ゴルフ場」が検索のジャンルとして選ばれる。この結果は対話制御部２０５に記憶される。次に、対話制御部２０５はメッセージ選択部２０９に対し、先の音声認識の結果であるゴルフ場の名称の発声を使用者に対して促す言葉をメッセージとして出力することを指令する。この指令に対し、メッセージ選択部２０９はメッセージ辞書格納部２１０から「何というゴルフ場ですか？」というメッセージを選択して、使用者に音声で提示する。
【００５２】
次に、対話制御部２０５はパラメータ制御部２０２とパラメータ蓄積部２０３に対し、音響分析部２０１において、入力音声信号を分析した結果得られた特徴パラメータの蓄積を指令する。この指令により、パラメータ制御部２０２は入力された音声信号をパラメータ蓄積部２０３に出力し、パラメータ蓄積部２０３は入力音声信号を分析して得られた特徴パラメータの蓄積を開始する。
【００５３】
また、先に提示された「何というゴルフ場ですか？」というメッセージを聞いた使用者は検索を希望するゴルフ場を表す言葉、この場合は「○○カントリークラブ」を発声して音声対話装置に入力する。入力された音声信号「○○カントリークラブ」は音響分析部２０１において分析されて特徴パラメータに変換され、パラメータ制御部２０２を経由して、パラメータ蓄積部２０３に蓄積される。
【００５４】
この蓄積が終了すると、対話制御部２０５は辞書選択部２０８に対し検索の対象の県名を表す言葉で構成された辞書の作成を指令する。この指令により、辞書選択部２０８は音声認識辞書格納部２０７から図５に示すような、検索の対象の県名を表す言葉で構成された音声認識辞書の作成を行う。
【００５５】
次に、対話制御部２０５はメッセージ選択部２０９に対し、使用者に対して検索の対象の県名を表す言葉の発声を促す言葉をメッセージとして出力することを指令する。この指令に対し、メッセージ選択部２０９はメッセージ辞書格納部２１０から「どの県にありますか？」というメッセージを選択して、使用者に音声で提示する。
【００５６】
次に、対話制御部２０５は、音声認識部２０４に対し、辞書選択部２０８が作成した辞書を用いて音声認識を実行することを指令するとともに、パラメータ制御部２０２に対し、入力音声信号を音響分析部２０１で分析して得られた特徴パラメータを音声認識部２０４へ出力することを指令する。
【００５７】
先の「どの県にありますか？」というメッセージを聞いた使用者は検索の対象となる県を表す言葉、この場合は「静岡県」を発声し、音声対話装置に入力する。入力された音声信号「静岡県」は音響分析部２０１で分析されて特徴パラメータに変換され、パラメータ制御部２０２を経由して、音響分析部２０１で求められた特徴パラメータが音声認識部２０４で認識される。
【００５８】
認識結果として、「静岡県」が検索対象の県名として選ばれる。この結果は対話制御部２０５に記憶される。対話制御部２０５は先の音声認識の結果の「静岡県」と、その前に行われた音声認識の結果である「ゴルフ場」とを組み合わせ、辞書選択部２０８に対し、静岡県のゴルフ場の名称で構成された辞書の作成を指令する。この指令により、辞書選択部２０８は音声認識辞書格納部２０７から図６に示すような、静岡県のゴルフ場の名称で構成された音声認識辞書の作成を行う。
【００５９】
次に、対話制御部２０５は、パラメータ制御部２０２及びパラメータ蓄積部２０３に対し、先に蓄積した使用者の発声による「○○カントリークラブ」の特徴パラメータを音声認識部２０４へ出力することを指令する。この指令により、パラメータ蓄積部２０３は、蓄積された特徴パラメータをパラメータ制御部２０２に出力し、パラメータ制御部２０２はその特徴パラメータの音声認識部２０４に対する出力を開始する。この特徴パラメータは、音声認識部２０４で認識され、その認識の結果として、「○○カントリークラブ」が選ばれて検索対象が確定する。
【００６０】
次に、対話制御部２０５はメッセージ選択部２０９に対し、確定した検索対象「○○カントリークラブ」をユーザーに提示することを指令する。この指令に対し、メッセージ選択部２０９はメッセージ辞書格納部２１０に格納されている内容と「○○カントリークラブ」を組み合わせ、「○○カントリークラブ付近の地図を表示します。」というメッセージを作成して使用者に対し音声で提示する。そして、その地図が表示される。以上の動作により、図３に示した対話の流れが完了する。
【００６１】
【発明の効果】
本発明は、以上のように構成し、特に、入力音声信号を一時蓄積する入力音声蓄積部かまたは入力音声信号を分析した結果の特徴パラメータを一時蓄積するパラメータ蓄積部を備え、後から発声した言葉の音声認識結果から、前に発声した言葉に対する認識対象の語彙を絞って音声認識できるようにしたことにより、同じ性能の音声認識部を用いて、主体となる情報を先に入力した後補足情報を入力するという対話の流れを実現することができ、発声順序を変更して目的の項目を検索しうる、使用者に対してより使い易い音声対話装置及び対話方法を提供することができる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態における音声対話装置の構成を示すブロック図
【図２】本発明の第２の実施の形態における音声対話装置の構成を示すブロック図
【図３】図１及び図２に示す音声対話装置による音声対話の流れを示すフローチャートを示す図
【図４】音声対話装置において検索項目のジャンルを音声認識するための音声認識辞書の内容を示す図
【図５】音声対話装置においてゴルフ場のある県名を音声認識するための音声認識辞書の内容を示す図
【図６】音声対話装置において静岡県のゴルフ場を音声認識するための音声認識辞書の内容を示す図
【図７】従来の音声対話装置の構成を示すブロック図
【図８】図７に示す音声対話装置による音声対話の流れを示すフローチャート
【符号の説明】
１０１入力音声制御部
１０２入力音声蓄積部
１０３音響分析部
１０４音声認識部
１０５対話制御部
１０６対話制御用情報格納部
１０７音声認識辞書格納部
１０８辞書選択部
１０９メッセージ選択部
１１０メッセージ辞書格納部
２０１音響分析部
２０２パラメータ制御部
２０３パラメータ蓄積部
２０４音声認識部
２０５対話制御部
２０６対話制御用情報格納部
２０７音声認識辞書格納部
２０８辞書選択部
２０９メッセージ選択部
２１０メッセージ辞書格納部
３０３音響分析部
３０４音声認識部
３０５対話制御部
３０６対話制御用情報格納部
３０７音声認識辞書格納部
３０８辞書選択部
３０９メッセージ選択部
３１０メッセージ辞書格納部

[0001]
[Technical Field of the Invention]
The present invention relates to a voice dialogue apparatus and a dialogue method using voice recognition technology and voice synthesis technology.
[0002]
[Prior art]
In a device capable of voice dialogue with a person, when the number of items in the set containing the item to be selected exceeds the processing capacity of the speech recognition unit, the user must, before speaking the name of the target item, first input a word identifying a subset that contains that item. This confines the search to that subset and reduces the number of words subject to speech recognition.
[0003]
For example, consider searching for a golf course with the item search function for destination setting by voice dialogue, as implemented in a car navigation device with a voice dialogue function. If the golf courses to be searched total 2000 facilities throughout Japan and the maximum processing capacity of the speech recognition unit is 100 words, the names of all golf courses in Japan cannot be searched at once as speech recognition targets.
[0004]
Therefore, suppose the facilities are categorized by prefecture and the number of facilities in each prefecture is 100 or fewer. The user is made to input the prefecture name before the name of the target golf course, so that the speech recognition targets are narrowed to a single prefecture before the target facility name is uttered. In this way the target facility name can be found among all the items even when the total number of items exceeds the maximum processing capacity of the speech recognition unit.
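The arithmetic behind this narrowing can be illustrated with a minimal Python sketch. It assumes the 100-word limit and the 2000-facility total from the example above; the catalog itself is synthetic, and the names MAX_VOCAB, group_by_prefecture and fits_recognizer are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict

MAX_VOCAB = 100  # assumed recognizer capacity, taken from the example in the text


def group_by_prefecture(facilities):
    """Group (prefecture, name) pairs into one vocabulary per prefecture."""
    groups = defaultdict(list)
    for prefecture, name in facilities:
        groups[prefecture].append(name)
    return dict(groups)


def fits_recognizer(vocabulary, limit=MAX_VOCAB):
    """Return True if the vocabulary is small enough to recognize in one pass."""
    return len(vocabulary) <= limit


# Synthetic catalog: 2000 facilities spread evenly over 47 prefectures.
facilities = [(f"prefecture-{i % 47}", f"course-{i}") for i in range(2000)]
all_names = [name for _, name in facilities]
groups = group_by_prefecture(facilities)

print(fits_recognizer(all_names))                          # whole country at once
print(all(fits_recognizer(v) for v in groups.values()))    # one prefecture at a time
print(fits_recognizer(list(groups)))                       # the prefecture names themselves
```

With an even spread, every per-prefecture list and the list of prefecture names stay under the limit, while the nationwide list does not. A real catalog is not evenly distributed, so the per-prefecture check would have to be made against actual data.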
[0005]
Conventionally, as such a voice interactive apparatus, there have been those shown in FIGS. 7 and 8, for example. FIG. 7 is a block diagram showing the configuration of a conventional voice dialogue apparatus, and FIG. 8 is a flowchart showing the flow of voice dialogue by the voice dialogue apparatus shown in FIG.
[0006]
First, with reference to FIG. 7, the configuration of a conventional voice dialogue apparatus will be described. In FIG. 7, 303 is an acoustic analysis unit that receives a speech signal and analyzes it to obtain feature parameters; 304 is a speech recognition unit that, on command from the dialogue control unit 305, performs speech recognition by collating the feature parameters obtained by analyzing the input speech signal with a speech recognition dictionary; 305 is a dialogue control unit that controls the voice dialogue; and 306 is a dialogue control information storage unit that stores information on the flow of the voice dialogue based on user operations and speech recognition results.
[0007]
Further, 307 is a speech recognition dictionary storage unit that stores the dictionaries used for speech recognition; 308 is a dictionary selection unit that, on command from the dialogue control unit 305, selects the dictionary to be used for speech recognition from the dictionaries stored in the speech recognition dictionary storage unit 307; 309 is a message selection unit that, on command from the dialogue control unit 305, selects the message to be presented to the user by voice from among the messages stored in the message dictionary storage unit 310; and 310 is a message dictionary storage unit that stores the messages to be presented to the user.
[0008]
Next, with reference to FIGS. 7 and 8, the operation of the conventional voice dialogue apparatus will be described. For the flow of the dialogue described below, refer to FIG. 8; for the contents of the dictionaries used for speech recognition in the dialogue, refer to FIGS. 4 to 6. FIG. 4 shows the contents of the speech recognition dictionary used to recognize the genre of the search item, FIG. 5 shows the contents of the speech recognition dictionary used to recognize the name of the prefecture in which the golf course is located, and FIG. 6 shows the contents of the speech recognition dictionary used to recognize golf courses in Shizuoka Prefecture.
[0009]
First, when a voice dialogue is started by a user instruction, the dialogue control unit 305 instructs the dictionary selection unit 308 to create a dictionary composed of words representing the genre of search. In response to this command, the dictionary selection unit 308 creates a speech recognition dictionary composed of words representing the genre of search as shown in FIG. 4 from the speech recognition dictionary storage unit 307.
[0010]
Next, the dialogue control unit 305 instructs the message selection unit 309 to output a message that prompts the user to utter a word representing the type of facility. In response to this command, the message selection unit 309 selects the message “Please select the type of facility” from the message dictionary storage unit 310 and presents it to the user by voice.
[0011]
Next, the dialogue control unit 305 instructs the speech recognition unit 304 to execute speech recognition using the dictionary created by the dictionary selection unit 308. The user, having heard the message “Please select the type of facility”, utters a word representing the genre to be searched, in this case “golf course”, and thereby inputs a speech signal to the voice dialogue apparatus. The acoustic analysis unit 303 obtains feature parameters from the input speech signal, and the speech recognition unit 304 recognizes them.
[0012]
As a recognition result, “golf course” is selected as a search genre. The dialogue control unit 305 stores this result. Next, the dialogue control unit 305 instructs the dictionary selection unit 308 to create a dictionary composed of words representing prefecture names to be searched. In response to this command, the dictionary selection unit 308 creates a speech recognition dictionary composed of words representing prefecture names to be searched as shown in FIG. 5 from the speech recognition dictionary storage unit 307.
[0013]
Next, the dialogue control unit 305 instructs the message selection unit 309 to output a message prompting the user to utter a word representing the name of the prefecture to be searched. In response to this command, the message selection unit 309 selects the message “Please enter the name of the prefecture where the golf course is located” from the message dictionary storage unit 310 and presents it to the user by voice.
[0014]
Next, the dialogue control unit 305 instructs the speech recognition unit 304 to execute speech recognition using the dictionary created by the dictionary selection unit 308. The user, having heard the message “Please enter the name of the prefecture where the golf course is located”, utters a word representing the prefecture to be searched, in this case “Shizuoka Prefecture”, and inputs it to the voice dialogue apparatus. The acoustic analysis unit 303 obtains feature parameters from the input speech signal, the speech recognition unit 304 recognizes them, and as a result “Shizuoka Prefecture” is selected as the prefecture to be searched.
[0015]
The dialogue control unit 305 stores this result. It then combines “Shizuoka Prefecture”, the result of the latest speech recognition, with “golf course”, the result of the speech recognition performed before it, and instructs the dictionary selection unit 308 to create a dictionary composed of the names of golf courses in Shizuoka Prefecture. In response to this command, the dictionary selection unit 308 creates, from the speech recognition dictionary storage unit 307, a speech recognition dictionary composed of the names of golf courses in Shizuoka Prefecture, as shown in FIG. 6.
[0016]
Next, the dialogue control unit 305 instructs the message selection unit 309 to output a message prompting the user to utter a word representing the name of a golf course in Shizuoka Prefecture, which is the search target. In response to this command, the message selection unit 309 selects the message “Please name the golf course” from the message dictionary storage unit 310 and presents it to the user by voice.
[0017]
Next, the dialogue control unit 305 instructs the speech recognition unit 304 to execute speech recognition using the dictionary created by the dictionary selection unit 308. The user, having heard the message “Please name the golf course”, utters a word representing the name of the golf course to be searched, in this case “XX country club”, and inputs it to the voice dialogue apparatus. The acoustic analysis unit 303 obtains feature parameters from the input speech signal, the speech recognition unit 304 recognizes them, and as a result “XX country club” is selected and the search target is determined.
[0018]
Next, the dialogue control unit 305 instructs the message selection unit 309 to present the determined search target “XX country club” to the user. In response to this command, the message selection unit 309 combines the content stored in the message dictionary storage unit 310 with “XX country club”, creates the message “A map of the area around XX country club will be displayed.”, and presents it to the user by voice. The map is then displayed. This completes the dialogue flow shown in FIG. 8.
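The fixed order of this conventional flow can be sketched in Python as follows. This is only an illustration under stated assumptions: utterances are modeled as plain strings, recognition as an exact match against a size-limited word list, and the dictionaries GENRES, PREFECTURES and NAMES are hypothetical stand-ins for FIGS. 4 to 6, whose actual contents are not reproduced here.

```python
MAX_VOCAB = 100  # assumed recognizer capacity, as in the example in the text

# Hypothetical dictionary contents standing in for FIGS. 4 to 6.
GENRES = ["golf course", "restaurant", "hotel"]
PREFECTURES = ["Shizuoka Prefecture", "Kanagawa Prefecture"]
NAMES = {
    ("golf course", "Shizuoka Prefecture"): ["XX country club", "YY golf club"],
}


def recognize(utterance, vocabulary):
    """Toy recognizer: exact match against a vocabulary of limited size."""
    if len(vocabulary) > MAX_VOCAB:
        raise ValueError("vocabulary exceeds recognizer capacity")
    if utterance not in vocabulary:
        raise LookupError(f"not in dictionary: {utterance!r}")
    return utterance


def conventional_dialogue(listen):
    """Fixed order: genre, then prefecture, then facility name.

    `listen` is a callable that presents a prompt and returns the reply.
    Each reply is recognized immediately; nothing is stored for later.
    """
    genre = recognize(listen("Please select the type of facility"), GENRES)
    prefecture = recognize(
        listen(f"Please enter the name of the prefecture where the {genre} is located"),
        PREFECTURES,
    )
    name = recognize(listen(f"Please name the {genre}"), NAMES[(genre, prefecture)])
    return f"A map of the area around {name} will be displayed."


# Scripted user replies, in the order the prompts force on the user.
answers = iter(["golf course", "Shizuoka Prefecture", "XX country club"])
result = conventional_dialogue(lambda prompt: next(answers))
print(result)
```

Because each reply is recognized as soon as it is given, the facility name can only be requested after the prefecture is known, which is the ordering constraint discussed in the following paragraphs.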
[0019]
[Problems to be solved by the invention]
However, the conventional, widely used speech recognition apparatus described above has no means for storing multiple inputs, so the target item is found by using the word input first to narrow down the targets of the speech recognition performed next. Consequently, in the golf course search example above, the flow of the dialogue is fixed to that shown in FIG. 8.
[0020]
In general, in the field of voice dialogue apparatuses, it is required to provide a natural voice dialogue that does not cause the user discomfort or stress. In the above example, the name of the golf course is the main information input by the user, and the prefecture name is supplementary information. Therefore, when the supplementary information must be input first and the main information afterwards, as shown in FIG. 8, the user cannot input the main information first, unlike in the reverse order, and this tends to make the user feel uncomfortable.
[0021]
The present invention has been made to solve the above conventional problem. Because the recognition vocabulary is limited by the performance limit of the speech recognition unit, a dialogue may otherwise have to make the user input supplementary information first, to narrow down the recognition vocabulary, and only then input the main information. Even in such a case, the invention uses a speech recognition unit of the same performance to realize a dialogue flow in which the main information is input first and the supplementary information afterwards. Its object is to provide a voice dialogue apparatus and a dialogue method that can search for a target item with the utterance order changed.
[0022]
[Means for Solving the Problems]
The voice dialogue apparatus and dialogue method according to the present invention provide storage means that stores an input speech signal either as the input speech signal itself or as the feature parameters obtained by analyzing it, and perform speech recognition in an order different from the order in which the speech signals were input. As a result, the speech recognition result for a word uttered later narrows down the speech recognition targets for a word uttered earlier.
[0023]
According to the present invention, even when the main information is input (uttered) first and the supplementary information afterwards, using a speech recognition unit of the same performance, the speech recognition targets for the word uttered earlier can be narrowed down from the speech recognition result for the word uttered later. This yields a voice dialogue apparatus and a dialogue method that do not make the user feel uncomfortable.
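The reordering idea can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: a string stands in for the stored speech signal or feature parameters, recognition is modeled as an exact match against a size-limited word list, and the catalog COURSES_BY_PREFECTURE is synthetic. All names in the code are hypothetical.

```python
MAX_VOCAB = 100  # assumed recognizer capacity, as in the example in the text

# Synthetic catalog: 120 course names in total, 60 per prefecture.
COURSES_BY_PREFECTURE = {
    "Shizuoka Prefecture": ["XX country club"]
    + [f"Shizuoka course {i}" for i in range(59)],
    "Kanagawa Prefecture": [f"Kanagawa course {i}" for i in range(60)],
}
PREFECTURES = list(COURSES_BY_PREFECTURE)


def recognize(utterance, vocabulary):
    """Toy recognizer: exact match against a vocabulary of limited size."""
    if len(vocabulary) > MAX_VOCAB:
        raise ValueError("vocabulary exceeds recognizer capacity")
    if utterance not in vocabulary:
        raise LookupError(f"not in dictionary: {utterance!r}")
    return utterance


def reordered_dialogue(listen):
    """Ask for the main information first, but recognize it last."""
    # 1. The course name is only stored; no recognition is attempted yet.
    stored_name = listen("What is the name of the golf course?")
    # 2. The prefecture is recognized at once against a small dictionary.
    prefecture = recognize(listen("Which prefecture is it in?"), PREFECTURES)
    # 3. The stored utterance is now recognized against the narrowed dictionary.
    return recognize(stored_name, COURSES_BY_PREFECTURE[prefecture])


# Scripted user replies: main information first, supplementary information second.
answers = iter(["XX country club", "Shizuoka Prefecture"])
found = reordered_dialogue(lambda prompt: next(answers))
print(found)
```

The same toy recognizer would reject the course name if it were matched against all 120 names at once, so the stored utterance can only be resolved after the later reply has narrowed the dictionary.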
[0024]
[Embodiments of the Invention]
The invention according to claim 1 comprises: input voice control means that, on command from a dialogue control unit, switches among storing the input speech signal, analyzing the input speech signal, and analyzing a stored input speech signal; input voice storage means that stores the input speech signal on command from the dialogue control unit; acoustic analysis means that analyzes the input speech signal to obtain feature parameters; speech recognition means that, on command from the dialogue control unit, performs speech recognition by collating the feature parameters obtained by analyzing the input speech signal with a speech recognition dictionary; the dialogue control unit, which controls the voice dialogue; and message selection means that, on command from the dialogue control unit, selects and outputs the message to be presented to the user from among stored messages. The input speech signals are stored in the input voice storage means and are recognized in a changed order. Recognizing the input speech signals in a changed order has the effect of providing a voice dialogue apparatus that can search for a target item with the utterance order changed.
[0025]
The invention according to claim 2 comprises: acoustic analysis means that analyzes the input speech signal to obtain its feature parameters; parameter control means that, on command from a dialogue control unit, switches among storing the feature parameters obtained by analyzing the input speech signal, recognizing them, and recognizing previously stored feature parameters; parameter storage means that stores the feature parameters obtained by analyzing the input speech signal on command from the dialogue control unit; speech recognition means that, on command from the dialogue control unit, performs speech recognition by collating the feature parameters obtained by analyzing the input speech signal with a speech recognition dictionary; the dialogue control unit, which controls the voice dialogue; and message selection means that, on command from the dialogue control unit, selects and outputs the message to be presented to the user from among the messages stored in message dictionary storage means. The feature parameters obtained by analyzing the input speech signals are stored in the parameter storage means and are recognized in a changed order. Recognizing the feature parameters of the input speech signals in a changed order has the effect of providing a voice dialogue apparatus that can search for a target item with the utterance order changed.
[0026]
In the invention according to claim 3, a speech signal from a dialogue is input; the input speech signal is analyzed to obtain feature parameters; either the input speech signal or the feature parameters obtained from it are stored; a message to be presented is selected from stored messages and presented, following the dialogue flow under the control of control means; and the stored speech signals or feature parameters are reordered so that their order differs from the dialogue flow and are collated with a speech recognition dictionary to perform speech recognition. Recognizing the input speech signals, or their feature parameters, in a changed order has the effect of providing a voice dialogue method that can search for a target item with the utterance order changed.
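The two storage options named in the claims, storing the raw speech signal or storing its feature parameters, can be contrasted with a short Python sketch. The analyze function below is a deliberately simple, hypothetical stand-in (mean absolute amplitude per frame); the source does not specify which acoustic analysis is used, and the sample values are made up.

```python
def analyze(samples, frame_size=4):
    """Toy stand-in for acoustic analysis: mean absolute amplitude per frame."""
    features = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        features.append(sum(abs(s) for s in frame) / len(frame))
    return features


samples = [0, 2, -2, 4, 1, -1, 3, -3]  # made-up input speech signal

# Option A (as in claim 1): store the raw signal, analyze it when it is replayed.
stored_signal = list(samples)
features_from_signal = analyze(stored_signal)

# Option B (as in claim 2): analyze immediately, store only the feature parameters.
stored_features = analyze(samples)

print(features_from_signal == stored_features)
print(len(stored_signal), len(stored_features))
```

Under these assumptions both options hand the same feature parameters to the recognizer. They differ in what is kept in storage and in when the analysis runs: at replay time for option A, at input time for option B.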
[0027]
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, FIGS. 1 to 6. FIG. 1 is a block diagram showing the configuration of the voice dialogue apparatus according to the first embodiment of the present invention; FIG. 2 is a block diagram showing the configuration of the voice dialogue apparatus according to the second embodiment of the present invention; FIG. 3 is a flowchart showing the flow of the voice dialogue performed by the voice dialogue apparatuses shown in FIGS. 1 and 2; FIG. 4 shows the contents of the speech recognition dictionary used to recognize the genre of the search item; FIG. 5 shows the contents of the speech recognition dictionary used to recognize the name of the prefecture in which the golf course is located; and FIG. 6 shows the contents of the speech recognition dictionary used to recognize golf courses in Shizuoka Prefecture.
[0028]
(Embodiment 1)
First, with reference to FIG. 1, the configuration of the voice dialogue apparatus according to the first embodiment of the present invention will be described in detail. In FIG. 1, 101 is an input voice control unit that, on command from the dialogue control unit 105, switches among storing the input speech signal, analyzing the input speech signal, and analyzing a stored input speech signal; and 102 is an input voice storage unit that stores the input speech signal on command from the dialogue control unit 105.
[0029]
Further, 103 is an acoustic analysis unit that analyzes the input speech signal to obtain feature parameters; 104 is a speech recognition unit that, on command from the dialogue control unit 105, performs speech recognition by collating the feature parameters obtained by analyzing the input speech signal with a speech recognition dictionary; 105 is a dialogue control unit that controls the voice dialogue; 106 is a dialogue control information storage unit that stores information on the flow of the voice dialogue, which is determined by user operations and speech recognition results; and 107 is a speech recognition dictionary storage unit that stores the dictionaries used for speech recognition.
[0030]
Further, 108 is a dictionary selection unit that, in response to a command from the dialogue control unit 105, selects the dictionary to be used for voice recognition from the dictionaries stored in the voice recognition dictionary storage unit 107, 109 is a message selection unit that, in response to a command from the dialogue control unit 105, selects the message to be presented to the user by voice from among the messages stored in the message dictionary storage unit 110, and 110 is a message dictionary storage unit that stores the messages to be presented to the user.
[0031]
The input voice control unit 101, the input voice storage unit 102, the acoustic analysis unit 103, the voice recognition unit 104, the dialogue control unit 105, and the message selection unit 109 correspond respectively to the input voice control means, input voice storage means, acoustic analysis means, voice recognition means, dialogue control means, and message selection means.
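The three-way switching performed by the input voice control unit 101 can be illustrated with a minimal sketch. This is only one possible reading of the description: the names `Route`, `InputVoiceControl`, `command`, `on_input`, and `replay` are hypothetical, signals are plain Python lists, and the `analyze` callable merely stands in for the acoustic analysis unit 103.

```python
from enum import Enum, auto


class Route(Enum):
    """Hypothetical names for the three settings of input voice control unit 101."""
    STORE = auto()           # input signal -> input voice storage unit 102
    ANALYZE_INPUT = auto()   # input signal -> acoustic analysis unit 103
    ANALYZE_STORED = auto()  # stored signal -> acoustic analysis unit 103


class InputVoiceControl:
    """Toy stand-in for units 101 and 102; signals are plain Python lists."""

    def __init__(self, analyze):
        self.analyze = analyze           # callable standing in for unit 103
        self.route = Route.ANALYZE_INPUT
        self.stored = None               # contents of storage unit 102

    def command(self, route):
        """Called by the dialogue control unit (105) to switch the signal path."""
        self.route = route

    def on_input(self, signal):
        """Handle a newly input voice signal according to the current route."""
        if self.route is Route.STORE:
            self.stored = list(signal)
            return None
        if self.route is Route.ANALYZE_INPUT:
            return self.analyze(signal)
        raise RuntimeError("no live input expected while replaying")

    def replay(self):
        """Send the stored signal to analysis (route must be ANALYZE_STORED)."""
        if self.route is not Route.ANALYZE_STORED or self.stored is None:
            raise RuntimeError("nothing to replay")
        return self.analyze(self.stored)


# The analysis function here is an arbitrary placeholder (sum of magnitudes).
ctl = InputVoiceControl(analyze=lambda s: sum(abs(x) for x in s))
ctl.command(Route.STORE)
ctl.on_input([1, -2, 3])           # stored, not analyzed
ctl.command(Route.ANALYZE_INPUT)
live = ctl.on_input([4, -4])       # analyzed immediately
ctl.command(Route.ANALYZE_STORED)
replayed = ctl.replay()            # stored signal analyzed later
```

In this sketch the dialogue control side decides the route before each utterance, which mirrors the way the description has the dialogue control unit 105 issue a command ahead of every input.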
[0032]
Next, with reference to FIG. 1 and FIGS. 3 to 6, the operation of the voice dialogue apparatus according to the first embodiment of the present invention will be described in detail, following the dialogue flow shown in FIG. 3.
First, when a voice dialogue is started in response to a user instruction, the dialogue control unit 105 instructs the dictionary selection unit 108 to create a dictionary composed of words representing the genre of search. In response to this command, the dictionary selection unit 108 creates a speech recognition dictionary composed of words representing the genre of search as shown in FIG. 4 from the speech recognition dictionary storage unit 107.
[0033]
Next, the dialogue control unit 105 instructs the message selection unit 109 to output a message that prompts the user to speak a word representing the type of facility. In response to this command, the message selection unit 109 selects the message “Which genre do you want to search?” from the message dictionary storage unit 110 and presents it to the user by voice. (This presentation to the user may be made by voice and may also be shown on a display device; the same applies hereinafter.)
[0034]
Next, the dialogue control unit 105 instructs the voice recognition unit 104 to execute voice recognition using the dictionary created by the dictionary selection unit 108, and instructs the input voice control unit 101 to output the input voice signal to the acoustic analysis unit 103. The user who has heard the message “Which genre do you want to search?” utters a word representing the genre to be searched, in this case “golf course”, and inputs it to the voice dialogue apparatus. The input voice signal passes through the input voice control unit 101 to the acoustic analysis unit 103, which obtains its feature parameters, and is recognized by the voice recognition unit 104.
[0035]
As the recognition result, “golf course” is selected as the search genre. The dialogue control unit 105 stores this result. Next, the dialogue control unit 105 instructs the message selection unit 109 to output a message prompting the user to speak the name of a golf course, “golf course” being the result of the preceding voice recognition. In response to this command, the message selection unit 109 selects the message “What is the golf course?” from the message dictionary storage unit 110 and presents it to the user by voice.
[0036]
Next, the dialogue control unit 105 instructs the input voice control unit 101 and the input voice storage unit 102 to store the voice signal that is input. In accordance with this command, the input voice control unit 101 outputs the input voice signal to the input voice storage unit 102, and the input voice storage unit 102 starts storing the input voice signal.
[0037]
Meanwhile, the user who heard the message “What is the golf course?” presented earlier utters a word representing the golf course to be searched for, in this case “XX Country Club”, and inputs it to the voice dialogue apparatus. The input voice “XX Country Club” is stored in the input voice storage unit 102 via the input voice control unit 101.
[0038]
When the accumulation is completed, the dialogue control unit 105 instructs the dictionary selection unit 108 to create a dictionary composed of words representing prefecture names to be searched. In response to this command, the dictionary selection unit 108 creates a speech recognition dictionary composed of words representing prefecture names to be searched as shown in FIG. 5 from the speech recognition dictionary storage unit 107.
[0039]
Next, the dialogue control unit 105 instructs the message selection unit 109 to output a message that prompts the user to speak a word representing the name of the prefecture to be searched. In response to this command, the message selection unit 109 selects the message “In which prefecture is it?” from the message dictionary storage unit 110 and presents it to the user by voice.
[0040]
Next, the dialogue control unit 105 instructs the voice recognition unit 104 to execute voice recognition using the dictionary created by the dictionary selection unit 108, and instructs the input voice control unit 101 to output the input voice signal to the acoustic analysis unit 103. The user who heard the message “In which prefecture is it?” utters a word representing the prefecture to be searched, in this case “Shizuoka Prefecture”, and inputs it to the voice dialogue apparatus. The input voice signal “Shizuoka Prefecture” passes through the input voice control unit 101 to the acoustic analysis unit 103, which obtains its feature parameters, and is recognized by the voice recognition unit 104. As the recognition result, “Shizuoka Prefecture” is selected as the name of the prefecture to be searched.
[0041]
The dialogue control unit 105 stores this result, combines the recognition result “Shizuoka Prefecture” with the earlier recognition result “golf course”, and instructs the dictionary selection unit 108 to create a dictionary composed of the names of golf courses in Shizuoka Prefecture. In response to this command, the dictionary selection unit 108 creates, from the speech recognition dictionary storage unit 107, a speech recognition dictionary composed of the names of golf courses in Shizuoka Prefecture as shown in FIG. 6.
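The way earlier recognition results determine the next dictionary can be sketched as a lookup keyed by the results accumulated so far. Everything below is illustrative: `DICTIONARY_STORE`, `select_dictionary`, and `MAX_VOCABULARY` are hypothetical names, the facility and genre entries are placeholders rather than the actual contents of FIGS. 4 to 6, and the 100-word limit simply echoes the capacity assumed in the prior-art example earlier in this document.

```python
MAX_VOCABULARY = 100  # assumed capacity of the recognizer, in words

# Hypothetical stand-in for voice recognition dictionary storage unit 107.
# Keys are the recognition results accumulated so far; names are placeholders.
DICTIONARY_STORE = {
    (): ["golf course", "hotel", "station"],
    ("golf course",): ["Shizuoka Prefecture", "Chiba Prefecture"],
    ("golf course", "Shizuoka Prefecture"): ["AA Country Club", "XX Country Club"],
    ("golf course", "Chiba Prefecture"): ["CC Country Club"],
}


def select_dictionary(results):
    """Return the vocabulary for the next recognition, given earlier results."""
    vocabulary = DICTIONARY_STORE[tuple(results)]
    if len(vocabulary) > MAX_VOCABULARY:
        raise ValueError("dictionary exceeds the recognizer's capacity")
    return vocabulary


results = []                                   # kept by the dialogue control unit
genre_words = select_dictionary(results)       # genre dictionary (cf. FIG. 4)
results.append("golf course")
prefecture_words = select_dictionary(results)  # prefecture dictionary (cf. FIG. 5)
results.append("Shizuoka Prefecture")
facility_words = select_dictionary(results)    # narrowed dictionary (cf. FIG. 6)
```

Keying the store by the tuple of prior results is just one convenient layout; the description itself does not say how the storage unit is organized.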
[0042]
Next, the dialogue control unit 105 instructs the input voice control unit 101 and the input voice storage unit 102 to output to the acoustic analysis unit 103 the previously stored voice signal of the user's utterance “XX Country Club”. In response to this command, the input voice storage unit 102 outputs the stored voice signal to the input voice control unit 101, and the input voice control unit 101 starts outputting it to the acoustic analysis unit 103. The voice signal is analyzed by the acoustic analysis unit 103 to obtain its feature parameters and is recognized by the voice recognition unit 104. As the recognition result, “XX Country Club”, shown in FIG. 6, is selected and the search target is determined.
[0043]
Next, the dialogue control unit 105 instructs the message selection unit 109 to present the confirmed search target “XX Country Club” to the user by voice. In response to this command, the message selection unit 109 combines content stored in the message dictionary storage unit 110 with “XX Country Club” to create the message “Displaying a map near XX Country Club.” and presents it to the user by voice. The map is then displayed. With the above operation, the dialogue flow shown in FIG. 3 is completed.
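The overall order of operations in the first embodiment (recognize the genre, store the facility name without recognizing it, recognize the prefecture, then recognize the stored name against the narrowed dictionary) can be sketched as follows. This is a toy under stated assumptions: utterances are text strings standing in for voice signals, `difflib.get_close_matches` from the Python standard library stands in for the voice recognition unit 104, and the vocabularies and the function names `recognize` and `run_dialogue` are hypothetical.

```python
import difflib

# Placeholder vocabularies standing in for the dictionaries of FIGS. 4 to 6.
GENRES = ["golf course", "hotel", "station"]
PREFECTURES = ["Shizuoka Prefecture", "Chiba Prefecture"]
FACILITIES = {
    ("golf course", "Shizuoka Prefecture"): ["AA Country Club", "XX Country Club"],
    ("golf course", "Chiba Prefecture"): ["CC Country Club"],
}


def recognize(utterance, vocabulary):
    """Toy recognizer: the vocabulary entry most similar to the utterance."""
    return difflib.get_close_matches(utterance, vocabulary, n=1, cutoff=0.0)[0]


def run_dialogue(utterances):
    """Run the three-prompt dialogue; `utterances` are the user's replies in order."""
    replies = iter(utterances)
    genre = recognize(next(replies), GENRES)   # recognized immediately
    stored_name = next(replies)                # stored only (cf. unit 102)
    prefecture = recognize(next(replies), PREFECTURES)
    # Only now is the stored utterance recognized, against the narrowed dictionary.
    facility = recognize(stored_name, FACILITIES[(genre, prefecture)])
    return f"Displaying a map near {facility}."


message = run_dialogue(["golf course", "XX Country Club", "Shizuoka Prefecture"])
```

The sketch only covers the golf course genre; a genre with no entry in `FACILITIES` would raise a `KeyError`, which a fuller version would need to handle.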
[0044]
(Embodiment 2)
Next, with reference to FIG. 2, the configuration of the voice dialogue apparatus according to the second embodiment of the present invention will be described in detail. In FIG. 2, 201 is an acoustic analysis unit that analyzes the input voice signal to obtain its feature parameters, and 202 is a parameter control unit that, in response to a command from the dialogue control unit 205, switches among storing the feature parameters obtained by analyzing the input voice signal, performing voice recognition on those feature parameters, and performing voice recognition on stored feature parameters.
[0045]
Reference numeral 203 denotes a parameter storage unit that, in response to a command from the dialogue control unit 205, stores the feature parameters obtained by analyzing the input voice signal, 204 denotes a voice recognition unit that, in response to a command from the dialogue control unit 205, performs voice recognition by collating the feature parameters obtained by analyzing the input voice signal with a voice recognition dictionary, 205 denotes a dialogue control unit that controls the voice dialogue, 206 denotes a dialogue control information storage unit that stores information on the flow of the voice dialogue, which is determined by the user's operations and the results of voice recognition, and 207 denotes a voice recognition dictionary storage unit that stores the dictionaries used for voice recognition.
[0046]
208 is a dictionary selection unit that, in response to a command from the dialogue control unit 205, selects the dictionary to be used for voice recognition from the dictionaries stored in the voice recognition dictionary storage unit 207, 209 is a message selection unit that, in response to a command from the dialogue control unit 205, selects the message to be presented to the user from among the messages stored in the message dictionary storage unit 210, and 210 is a message dictionary storage unit that stores the messages to be presented to the user.
[0047]
The acoustic analysis unit 201, the parameter control unit 202, the parameter storage unit 203, the voice recognition unit 204, the dialogue control unit 205, and the message selection unit 209 correspond respectively to the acoustic analysis means, parameter control means, parameter storage means, voice recognition means, dialogue control means, and message selection means.
[0048]
Next, with reference to FIG. 2 and FIGS. 3 to 6, the operation of the voice dialogue apparatus according to the second embodiment of the present invention will be described in detail, following the dialogue flow shown in FIG. 3.
First, when a voice dialogue is started in accordance with a user instruction, the dialogue control unit 205 instructs the dictionary selection unit 208 to create a dictionary composed of words representing a search genre. In response to this command, the dictionary selection unit 208 creates a speech recognition dictionary composed of words representing the genre of search as shown in FIG. 4 from the speech recognition dictionary storage unit 207.
[0049]
Next, the dialogue control unit 205 instructs the message selection unit 209 to output a message that prompts the user to speak a word indicating the type of facility. In response to this command, the message selection unit 209 selects the message “Which genre do you want to search?” from the message dictionary storage unit 210 and presents it to the user by voice.
[0050]
Next, the dialogue control unit 205 instructs the voice recognition unit 204 to execute voice recognition using the dictionary created by the dictionary selection unit 208, and instructs the parameter control unit 202 to output to the voice recognition unit 204 the feature parameters obtained by analyzing the input voice signal in the acoustic analysis unit 201. The user who has heard the message “Which genre do you want to search?” utters a word representing the genre to be searched, in this case “golf course”, and inputs it to the voice dialogue apparatus.
[0051]
The input voice signal “golf course” is analyzed by the acoustic analysis unit 201 and converted into feature parameters, and the feature parameters obtained by the acoustic analysis unit 201 are recognized by the voice recognition unit 204 via the parameter control unit 202. As the recognition result, “golf course” is selected as the search genre. This result is stored in the dialogue control unit 205. Next, the dialogue control unit 205 instructs the message selection unit 209 to output a message that prompts the user to speak the name of a golf course, “golf course” being the result of the preceding voice recognition. In response to this command, the message selection unit 209 selects the message “What is the golf course?” from the message dictionary storage unit 210 and presents it to the user by voice.
[0052]
Next, the dialogue control unit 205 instructs the parameter control unit 202 and the parameter storage unit 203 to store the feature parameters obtained by analyzing the input voice signal in the acoustic analysis unit 201. In response to this command, the parameter control unit 202 outputs the feature parameters to the parameter storage unit 203, and the parameter storage unit 203 starts storing the feature parameters obtained by analyzing the input voice signal.
[0053]
Meanwhile, the user who heard the message “What is the golf course?” presented earlier utters a word representing the golf course to be searched for, in this case “XX Country Club”, and inputs it to the voice dialogue apparatus. The input voice signal “XX Country Club” is analyzed by the acoustic analysis unit 201, converted into feature parameters, and stored in the parameter storage unit 203 via the parameter control unit 202.
[0054]
When the accumulation is completed, the dialogue control unit 205 instructs the dictionary selection unit 208 to create a dictionary composed of words representing prefecture names to be searched. In response to this instruction, the dictionary selection unit 208 creates a speech recognition dictionary composed of words representing prefecture names to be searched as shown in FIG. 5 from the speech recognition dictionary storage unit 207.
[0055]
Next, the dialogue control unit 205 instructs the message selection unit 209 to output a message that prompts the user to speak a word representing the name of the prefecture to be searched. In response to this command, the message selection unit 209 selects the message “In which prefecture is it?” from the message dictionary storage unit 210 and presents it to the user by voice.
[0056]
Next, the dialogue control unit 205 instructs the voice recognition unit 204 to execute voice recognition using the dictionary created by the dictionary selection unit 208, and instructs the parameter control unit 202 to output to the voice recognition unit 204 the feature parameters obtained by analyzing the input voice signal in the acoustic analysis unit 201.
[0057]
The user who heard the message “In which prefecture is it?” utters a word representing the prefecture to be searched, in this case “Shizuoka Prefecture”, and inputs it to the voice dialogue apparatus. The input voice signal “Shizuoka Prefecture” is analyzed by the acoustic analysis unit 201 and converted into feature parameters, and the feature parameters obtained by the acoustic analysis unit 201 are recognized by the voice recognition unit 204 via the parameter control unit 202.
[0058]
As the recognition result, “Shizuoka Prefecture” is selected as the name of the prefecture to be searched. This result is stored in the dialogue control unit 205. The dialogue control unit 205 combines the recognition result “Shizuoka Prefecture” with the earlier recognition result “golf course”, and instructs the dictionary selection unit 208 to create a dictionary composed of the names of golf courses in Shizuoka Prefecture. In response to this command, the dictionary selection unit 208 creates, from the speech recognition dictionary storage unit 207, a speech recognition dictionary composed of the names of golf courses in Shizuoka Prefecture as shown in FIG. 6.
[0059]
Next, the dialogue control unit 205 instructs the parameter control unit 202 and the parameter storage unit 203 to output to the voice recognition unit 204 the previously stored feature parameters of the user's utterance “XX Country Club”. In response to this command, the parameter storage unit 203 outputs the stored feature parameters to the parameter control unit 202, and the parameter control unit 202 starts outputting the feature parameters to the voice recognition unit 204. These feature parameters are recognized by the voice recognition unit 204, and as the recognition result, “XX Country Club” is selected and the search target is determined.
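The difference from the first embodiment, namely that analyzed feature parameters rather than the raw voice signal are stored and later passed directly to recognition, can be sketched as below. The feature extraction (mean energy per frame), the nearest-template matching, and all data values are assumptions chosen only to keep the example small; the description does not specify the analysis or matching method, and the names `analyze`, `recognize`, `PREFECTURE_TEMPLATES`, and `FACILITY_TEMPLATES` are hypothetical.

```python
def analyze(signal, frame=4):
    """Toy acoustic analysis (cf. unit 201): mean energy per fixed-size frame."""
    return [sum(x * x for x in signal[i:i + frame]) / frame
            for i in range(0, len(signal) - frame + 1, frame)]


def recognize(features, templates):
    """Toy recognizer (cf. unit 204): template name closest in squared distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(features, templates[name]))
    return min(templates, key=distance)


# Placeholder reference templates; real dictionaries would hold many entries.
PREFECTURE_TEMPLATES = {
    "Shizuoka Prefecture": [4.0, 0.0],
    "Chiba Prefecture": [0.0, 4.0],
}
FACILITY_TEMPLATES = {
    "Shizuoka Prefecture": {"XX Country Club": [1.0, 9.0], "AA Country Club": [9.0, 1.0]},
    "Chiba Prefecture": {"CC Country Club": [1.0, 8.0]},
}

name_signal = [1, 1, 1, 1, 3, 3, 3, 3]        # stands in for the facility utterance
prefecture_signal = [2, 2, 2, 2, 0, 0, 0, 0]  # stands in for the prefecture utterance

# The facility utterance is analyzed once and only its features are kept (cf. unit 203).
stored_features = analyze(name_signal)
# The later utterance is analyzed and recognized right away.
prefecture = recognize(analyze(prefecture_signal), PREFECTURE_TEMPLATES)
# The stored features go straight to recognition against the narrowed dictionary.
facility = recognize(stored_features, FACILITY_TEMPLATES[prefecture])
```

In this arrangement the stored utterance is not analyzed a second time when it is finally recognized, which is the practical distinction the second embodiment draws from storing the raw signal.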
[0060]
Next, the dialogue control unit 205 instructs the message selection unit 209 to present the confirmed search target “XX Country Club” to the user. In response to this command, the message selection unit 209 combines content stored in the message dictionary storage unit 210 with “XX Country Club” to create the message “Displaying a map near XX Country Club.” and presents it to the user by voice. The map is then displayed. With the above operation, the dialogue flow shown in FIG. 3 is completed.
[0061]
[Effect of the invention]
The present invention is configured as described above. In particular, it includes either an input voice storage unit that temporarily stores the input voice signal or a parameter storage unit that temporarily stores the feature parameters obtained by analyzing the input voice signal, and it uses the voice recognition result for a word uttered later to narrow down the recognition vocabulary for a word uttered earlier. As a result, using a voice recognition unit of the same performance, a dialogue flow can be realized in which the main information is input first and the information that supplements it is input afterwards. The target item can thus be searched for with the utterance order changed, which provides a voice dialogue apparatus and a dialogue method that are easier for the user to use.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a voice interaction apparatus according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of a voice interaction apparatus according to a second embodiment of the present invention.
FIG. 3 is a flowchart showing the flow of a voice dialogue by the voice dialogue apparatus shown in FIGS. 1 and 2;
FIG. 4 is a diagram showing the contents of a voice recognition dictionary for voice recognition of a genre of a search item in a voice dialogue apparatus.
FIG. 5 is a diagram showing the contents of a voice recognition dictionary for voice recognition of a prefecture name with a golf course in a voice dialogue apparatus.
FIG. 6 is a diagram showing the contents of a voice recognition dictionary for voice recognition of a golf course in Shizuoka Prefecture in a voice dialogue apparatus.
FIG. 7 is a block diagram showing a configuration of a conventional voice interaction apparatus.
FIG. 8 is a flowchart showing the flow of a voice dialogue by the voice dialogue apparatus shown in FIG. 7.
[Explanation of symbols]
101 Input voice control unit
102 Input voice storage unit
103 Acoustic analysis unit
104 Voice recognition unit
105 Dialogue control unit
106 Dialogue control information storage unit
107 Voice recognition dictionary storage unit
108 Dictionary selection unit
109 Message selection unit
110 Message dictionary storage unit
201 Acoustic analysis unit
202 Parameter control unit
203 Parameter storage unit
204 Voice recognition unit
205 Dialogue control unit
206 Dialogue control information storage unit
207 Voice recognition dictionary storage unit
208 Dictionary selection unit
209 Message selection unit
210 Message dictionary storage unit
303 Acoustic analysis unit
304 Voice recognition unit
305 Dialogue control unit
306 Dialogue control information storage unit
307 Voice recognition dictionary storage unit
308 Dictionary selection unit
309 Message selection unit
310 Message dictionary storage unit

Claims

Input voice control means for switching, in response to a command from a dialogue control unit, among storing an input voice signal, analyzing the input voice signal, and analyzing a stored input voice signal;
Input voice storage means for storing the input voice signal in response to a storage command from the dialogue control unit;
Acoustic analysis means for analyzing the input voice signal to obtain feature parameters;
Voice recognition means for performing voice recognition, in response to a command from the dialogue control unit to execute voice recognition, by collating the feature parameters obtained by analyzing the input voice signal with a voice recognition dictionary;
A dictionary selection unit that takes out the dictionary designated by the dialogue control unit from the voice recognition dictionary storage unit and passes it to the voice recognition means;
A dialogue control unit that controls the voice dialogue by instructing selection of voice recognition dictionaries in a predetermined order, presentation of a message to the user, and execution of voice recognition, and by performing the same operations for the next voice recognition on the basis of the recognition result obtained;
Message selection means for selecting and outputting, in response to a command from the dialogue control unit, the message to be presented to the user from among stored messages;
A voice dialogue apparatus characterized in that a voice signal uttered earlier is stored in the input voice storage means, a voice signal uttered later is recognized by the acoustic analysis means and the voice recognition means, and the voice recognition result thus obtained is used to narrow down the voice recognition targets when the stored, earlier-uttered voice signal is recognized, so that voice recognition is performed with the order of the input voice signals changed.

Acoustic analysis means for analyzing an input voice signal to obtain its feature parameters;
Input parameter control means for switching, in response to a command from a dialogue control unit, between storing the analyzed feature parameters and outputting the analyzed feature parameters or the stored feature parameters to voice recognition means;
Parameter storage means for storing the feature parameters in response to a storage command from the dialogue control unit;
Voice recognition means for performing voice recognition, in response to a command from the dialogue control unit to execute voice recognition, by collating the feature parameters obtained by analyzing the input voice signal with a voice recognition dictionary;
A dictionary selection unit that takes out the dictionary designated by the dialogue control unit from the voice recognition dictionary storage unit and passes it to the voice recognition means;
A dialogue control unit that controls the voice dialogue by instructing selection of voice recognition dictionaries in a predetermined order, presentation of a message such as a question to the user, and execution of voice recognition, and by performing the same operations for the next voice recognition on the basis of the recognition result obtained;
Message selection means for selecting and outputting, in response to a command from the dialogue control unit, the message to be presented to the user from among stored messages;
A voice dialogue apparatus characterized in that the feature parameters obtained by analyzing a voice signal uttered earlier are stored in the parameter storage means, the feature parameters of a voice signal uttered later are recognized by the voice recognition means, and the voice recognition result thus obtained is used to narrow down the voice recognition targets when the stored feature parameters of the earlier-uttered voice signal are recognized, so that voice recognition is performed with the order of the input voice signals changed.