JP3972632B2

JP3972632B2 - Voice recognition device

Info

Publication number: JP3972632B2
Application number: JP2001333890A
Authority: JP
Inventors: 健大野; 剛司寸田
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2001-10-31
Filing date: 2001-10-31
Publication date: 2007-09-05
Anticipated expiration: 2021-10-31
Also published as: JP2003140687A

Description

【０００１】
【発明の属する技術分野】
本発明は、入力された音声を認識して、入力された実際の音声に対する認識候補を表示する音声認識装置に関する。
【０００２】
【従来の技術】
従来の音声認識装置として、特開平１１−３５２９９１号公報に開示されたものがある。この音声認識装置では、単音節ごとに区切って発声された音声を認識して認識候補を表示し、表示した認識候補が音声入力者によって確定されるまで、順次認識候補を表示していくものである。
【０００３】
【発明が解決しようとする課題】
しかしながら、従来の音声認識装置では、所望の認識候補が得られない場合、次の認識候補を順次表示させていくが、例えば音声入力時に大きいレベルの騒音が混入した時には、認識候補を順次表示させていっても所望の認識候補が表示されないことがある。
【０００４】
本発明の目的は、認識候補を順次表示させても所望の認識結果が得られないと判断されるときには、その旨を操作者に伝えることにより、認識候補の無駄な選択操作を防ぐことができる音声認識装置を提供することにある。
【０００５】
【課題を解決するための手段】
（１）請求項１の発明は、音声を入力する音声入力装置と、入力される音声に対する認識対象語を記憶する記憶装置と、音声入力装置に入力された音声と、記憶装置に記憶されている認識対象語とが一致する度合いを示す一致度を演算するとともに、一致度の大きい順に所定の数の認識対象語を抽出して認識候補とする制御装置と、少なくとも認識候補の中から所望の認識候補を選択する操作を操作者が行うことができる操作装置と、操作者が操作装置を操作したときの操作感を認識候補の一致度の大きさに応じて変更する操作感変更装置とを備えることにより、上記目的を達成する。
（２）請求項２の発明は、請求項１の音声認識装置において、操作感変更装置は、操作者が操作装置を用いて認識候補を選択する操作をアシストする力を発生させるものであり、認識候補の一致度が大きいほど発生させるアシスト力を大きくすることを特徴とする。
（３）請求項３の発明は、請求項２の音声認識装置において、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、操作感変更装置は、ホイールの回転力を認識候補の一致度の大きさに応じて変更することを特徴とする。
（４）請求項４の発明は、請求項１の音声認識装置において、操作感変更装置は、操作者が操作装置を用いて認識候補を選択する操作力に抵抗する力を発生させるものであり、一致度が所定の値より小さい認識候補を操作者が選択する際に、操作装置の操作力に抵抗する力を加えることを特徴とする。
（５）請求項５の発明は、請求項１の音声認識装置において、操作感変更装置は、操作者が操作装置を用いて認識候補を選択する操作を妨げる力を発生させるものであり、認識候補の一致度と次の認識候補の一致度との差が所定の値より大きい場合に、操作者が次の認識候補を選択する際に操作装置の操作力に抵抗する力を加えることを特徴とする。
（６）請求項６の発明は、請求項４または５の音声認識装置において、操作感変更装置から操作装置に加えられた抵抗力に反して操作者が操作装置の操作を継続したときは、音声入力処理を再度行うことを特徴とする。
（７）請求項７の発明は、請求項４または５の音声認識装置において、操作感変更装置から操作装置に加えられた力に反して操作者が操作装置の操作を継続したときは、音声入力以外の入力機能が起動することを特徴とする。
（８）請求項８の発明は、請求項７の音声認識装置において、音声入力以外の入力機能は、文字入力機能であることを特徴とする。
（９）請求項９の発明は、請求項４〜８のいずれかの音声認識装置において、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、操作感変更装置は、ホイールの回転力に抵抗する力を発生させることを特徴とする。
（１０）請求項１０の発明は、請求項１の音声認識装置において、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、操作感変更装置は、認識候補が選択された状態から次の認識候補を操作者が選択する操作をアシストする力をホイールの回転力として発生させ、一致度が所定の値より小さい認識候補を操作者が選択する際のホイールの回転操作量を、一致度が所定の値より大きい認識候補を操作者が選択する際のホイールの回転操作量よりも多くしたことを特徴とする。
【０００６】
【発明の効果】
本発明によれば、次のような効果を奏する。
（１）請求項１〜１０の発明によれば、入力された音声と認識対象語との一致度の大きさに応じて、認識候補を選択する際に用いられる操作装置の操作感を変更するようにしたので、認識候補の選択操作を継続して行っても所望の認識候補が得られないとの判断を操作者が行うことができる。
（２）請求項２の発明によれば、操作装置には、操作者が認識候補を選択する操作をアシストする力が加わり、認識候補の一致度が大きいほど加わる力を大きくするので、操作装置に加わる力の大きさに基づいて、所望の認識候補が得られる可能性の判断を操作者が行うことができる。
（３）請求項３の発明によれば、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、ホイールが回転する方向に力が加わるので、操作者はホイールの回転操作時の感覚により、所望の認識候補が得られる可能性の判断を行うことができる。
（４）請求項４の発明によれば、一致度が所定の値より小さい認識候補を操作者が選択する際に、操作者が認識候補を選択する操作力に抵抗する力が操作装置に加わるので、継続して認識候補の選択操作を行っても所望の認識候補が得られないとの判断を操作者が行うことができる。
（５）請求項５の発明によれば、認識候補の一致度と次の認識候補の一致度との差が所定の値より大きい場合に、操作者が次の認識候補を選択する際に、操作者が認識候補を選択する操作力に抵抗する力が操作装置に加わるので、継続して認識候補の選択操作を行っても所望の認識候補が得られないとの判断を操作者が行うことができる。
（６）請求項６の発明によれば、操作装置に加えられた力に反して認識候補を選択する操作が行われたときは、音声入力処理を再度行うので、所望の認識候補が得られる可能性が低いにも関わらず、認識候補の選択操作が継続されるのを防ぐことができる。
（７）請求項７の発明によれば、操作装置に加えられた力に反して認識候補を選択する操作が行われたときは、音声入力以外の入力機能が起動するので、所望の認識候補が得られる可能性が低いにも関わらず、認識候補の選択操作が継続されるのを防ぐことができる。
（８）請求項８の発明によれば、音声入力以外の入力機能は文字入力機能であるので、文字入力による確実な入力を行うことができる。
（９）請求項９の発明によれば、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、ホイールの回転力に抵抗する力が加わるので、操作者はホイールの回転操作時の感覚により、認識候補の選択操作を継続して行っても所望の認識候補が得られないとの判断を操作者が行うことができる。
（１０）請求項１０の発明によれば、操作装置はホイールを備えた回転式入力装置であって、操作者がホイールを回転させることにより選択操作を行うことができるものであり、一致度が所定の値より小さい認識候補を操作者が選択する際のホイールの回転操作量を、一致度が所定の値より大きい認識候補を操作者が選択する際の回転操作量よりも多くしたので、操作者はホイールの回転操作時の感覚により、認識候補の選択操作を継続して行っても所望の認識候補が得られないとの判断を操作者が行うことができる。
【０００７】
【発明の実施の形態】
（第１の実施の形態）
図１は、本発明による音声認識装置の第１の実施の形態の構成を示す図である。本発明による音声認識装置は、マイク１０１と、スピーカ１０２と、信号処理ユニット１０３と、入力装置１０４と、ディスプレイ１０５とを備える。信号処理ユニット１０３は、Ａ／Ｄコンバータ１０３１と、Ｄ／Ａコンバータ１０３２と、出力アンプ１０３３と、信号処理装置１０３４とを有する。
【０００８】
マイク１０１を介して入力された音声は、音声信号として信号処理ユニット１０３のＡ／Ｄコンバータ１０３１に入力される。Ａ／Ｄコンバータ１０３１は、入力された音声信号をデジタル信号に変換して、信号処理装置１０３４に出力する。信号処理装置１０３４は、ＣＰＵ１０３４ａとメモリ１０３４ｂとを有し、外部記憶装置１０３５に記憶されている認識対象語のデジタル信号と、入力された音声のデジタル信号との一致度を演算する。
【０００９】
Ｄ／Ａコンバータ１０３２は、スピーカ１０２から音声等を出力するために、デジタル信号をアナログ信号に変換して、出力アンプ１０３３に出力する。Ｄ／Ａコンバータ１０３２から出力アンプ１０３３に入力されたアナログ信号は増幅されて、スピーカ１０２を介して音声として出力される。
【００１０】
ディスプレイ１０５は、入力された音声の認識候補等を表示するためのものである。入力装置１０４は、ホイール１０４ａと複数個のスイッチ１０４ｂとを有し、操作者の音声認識開始要求入力、入力の取り消し、認識候補選択操作等を検出して信号処理装置１０３４に出力する。ホイール１０４ａは、図１の矢印Ａの方向への押し込み操作と、矢印Ｂの方向への回転操作とが可能である。矢印Ａの方向への押し込み操作は、例えばディスプレイ１０５に表示された認識候補の確定操作時に行われ、矢印Ｂの方向への回転操作は、ディスプレイ１０５に表示された認識候補の選択操作時に行われる。
【００１１】
図２は、入力装置１０４の構成を示す詳細図である。入力装置１０４は、上述したホイール１０４ａとスイッチ１０４ｂの他に、ホイール駆動モータ１０４ｃとホイール制御ＣＰＵ１０４ｄとホイール位置センサ１０４ｅと通信デバイス１０４ｆとを備える。ホイール駆動モータ１０４ｃは、ホイール１０４ａの矢印Ｂの回転方向にトルクを発生することができる。操作者が回転する方向にトルクを発生すると、操作者がホイール１０４ａを回転させるのを助け、操作者が回転する方向と逆の方向にトルクを発生すると、操作者がホイール１０４ａを回転させるのを妨げることになる。このトルクの発生により、操作者はホイール１０４ａの回転操作が軽くなる感覚や重くなる感覚を感じる。従って、ホイール駆動モータ１０４ｃは、操作者のホイール１０４ａの操作感を変更させることができる。
【００１２】
ホイール位置センサ１０４ｅは、ホイール１０４ａの回転角および矢印Ａ方向の押し込み操作を検出する。ホイール位置センサ１０４ｅにより検出された信号は、ホイール制御ＣＰＵ１０４ｄに送られる。ホイール制御ＣＰＵ１０４ｄは、ホイール位置センサ１０４ｅから入力された信号をデジタル化してホイール位置情報に変換するとともに、信号処理装置１０３４から入力される、後述する発生トルクパターン情報とホイール位置情報とに基づいて、ホイール駆動モータ１０４ｃに発生させるトルク量を計算する。ホイール制御ＣＰＵ１０４ｄは、計算した発生トルク量に基づいたトルク制御信号をホイール駆動モータ１０４ｃに出力する。ホイール駆動モータ１０４ｃは、この制御信号に基づいて駆動し、ホイール１０４ａの矢印Ｂの回転方向にトルクを発生させる。
【００１３】
通信デバイス１０４ｆは、信号処理装置１０３４と接続されており、ホイール制御ＣＰＵ１０４ｄから入力されるホイール位置情報を信号処理装置１０３４に出力するとともに、信号処理装置１０３４から入力される発生トルクパターン情報をホイール制御ＣＰＵ１０４ｄに出力する。
【００１４】
図３は、本発明による音声認識装置により行われる一実施の形態の処理手順を示すフローチャートである。この制御は、信号処理ユニット１０３の信号処理装置１０３４により行われる。ステップＳ２０１から始まる処理は、操作者が入力装置１０４を操作して、音声入力を開始する旨の信号が信号処理装置１０３４に入力されることにより始まる。
【００１５】
ステップＳ２０１では、音声認識処理を開始する旨を操作者に知らせるための告知音信号を外部記憶装置１０３５から読み込んで、Ｄ／Ａコンバータ１０３２に出力する。Ｄ／Ａコンバータ１０３２でアナログ変換された告知音信号は、出力アンプ１０３３を介してスピーカ１０２から告知音として出力される。操作者は、スピーカ１０２から発せられる告知音を聞いて、マイク１０１に音声入力を開始する。ここでは、本発明による音声認識装置をカーナビゲーション装置に適用する例について取りあげる。すなわち、操作者が目的地を音声入力するものである。ここでは、目的地の都道府県の名称を音声入力するものとし、外部記憶装置１０３５には、都道府県の名称が認識対象語として記憶されているものとする。
【００１６】
次のステップＳ２０２では、入力された音声の取り込みを開始する。操作者がマイク１０１に向かって発した音声は、Ａ／Ｄコンバータ１０３１でデジタル信号に変換された後、信号処理装置１０３４に入力される。マイク１０１は、不図示の電源から電力が供給されると、ステップＳ２０１で操作者が入力装置１０４を操作する前から、周辺の音を拾ってＡ／Ｄコンバータ１０３１に出力し、Ａ／Ｄコンバータ１０３１で変換されたデジタル信号が信号処理装置１０３４に入力されている。信号処理装置１０３４は、ステップＳ２０１で操作者が入力装置１０４を操作するまでは、入力されるデジタル信号の平均パワーを演算している。ステップＳ２０１で入力装置１０４が操作されて音声が入力されると、演算していたデジタル信号の平均パワーより大きいパワーのデジタル信号が入力される。従って、信号処理装置１０３４は、演算していた平均パワーより所定値以上のパワーのデジタル信号が入力されたときに、操作者がマイク１０１に向かって音声入力を行ったと判断し、音声の取り込みを開始する。
【００１７】
音声の取り込みを開始するとステップＳ２０３に進む。ステップＳ２０３では、取り込んだ音声と、外部記憶装置１０３５に記憶されている認識対象語との一致度を演算する。信号処理装置１０３４は、取り込みを開始した音声のデジタル信号のうち、信号のパワーに基づいて、操作者が発した音声区間を識別しておく。この音声区間のデジタル信号と、外部記憶装置１０３５に記憶されている複数の認識対象語のデジタル信号とが、それぞれどれほど似ているかを数値化することにより、一致度を演算する。数値化された一致度の値が大きいほど、比較している両者が似ていることを意味する。なお、並列処理により、一致度の演算が行われている間も、音声の取り込みは継続して行われている。
【００１８】
取り込んでいる音声のデジタル信号のパワーが所定値以下となる時間が所定時間以上継続すると、操作者による音声入力が終了したと判断して、ステップＳ２０４にて音声の取り込みを終了する。次のステップＳ２０５では、一致度の演算処理が終了した後に、一致度の大きい順に所定の数の認識対象語を抽出して認識候補とする。図４は、ディスプレイ１０５に表示された認識候補の一例である。ディスプレイ１０５には、認識候補とともに一致度も表示される。抽出する認識対象語の所定の数は、予め定めることができ、例えば１０である。図４では、一致度が高い順に５つの認識候補が表示されており、所定の数を１０とした場合、一致度が８８０（「秋田県」）より小さい５つの認識候補がさらに存在する。
【００１９】
抽出された所定の数の認識候補をディスプレイ１０５に表示すると、ステップＳ２０６に進む。ステップＳ２０６では、操作者がディスプレイ１０５に表示された認識候補の中から、所望の認識候補を選択して確定したことを示す信号が入力されることにより、本制御を終了する。すなわち、操作者は、ディスプレイ１０５に表示された認識候補の中から、入力装置１０４のホイール１０４ａを回転操作して所望の認識候補を選択し、選択した所望の認識候補に対して、ホイール１０４ａの押し込み操作を行うことにより、所望の認識候補を確定させる。上述したように、ホイール１０４ａの回転操作や押し込み操作は、ホイール位置センサ１０４ｅにて検出されてホイール制御ＣＰＵ１０４ｄに送られ、通信デバイス１０４ｆを介して信号処理装置１０３４に入力される。信号処理装置１０３４は、この信号を受信すると本制御を終了する。
【００２０】
本発明による音声認識装置は、ステップＳ２０６で、操作者がディスプレイ１０５に表示された複数の認識候補の中から、ホイール１０４ａの回転操作により所望の候補を選択する際の入力装置１０４の制御に特徴がある。この制御について、図５を用いて説明する。
【００２１】
図５は、ホイール駆動モータ１０４ｃに対してホイール１０４ａの回転方向にトルクを発生させるための発生トルクポテンシャルと、ホイール１０４ａの回転角との関係を示す図である。この発生トルクポテンシャルは、ホイール１０４ａの回転角に対応する発生トルクを積分したグラフである。この発生トルクポテンシャルと回転角との関係を示すグラフには、いくつかの種類があり、これらを発生トルクパターンと呼ぶ。このグラフは、複数ある発生トルクパターンを視覚的に捉えやすいので、以下の説明のために用いるが、実際にホイール１０４ａの回転角ごとに発生させるトルクは、回転角に対応するグラフの傾きである。発生トルクポテンシャルのうち、図５に示す軸方向（正方向）のトルクが発生すると、操作者のホイール１０４ａの回転操作を妨げることになり、軸方向と反対方向（負方向）のトルクが発生すると、操作者のホイール１０４ａの回転操作をアシストすることになる。
【００２２】
図５に示すように、操作者がディスプレイ１０５に表示された認識候補の中から、所望の候補を選択するためにホイール１０４ａの回転操作を行うと、表示された認識候補、すなわち、「神奈川県」、「佐賀県」、「滋賀県」、「熊本県」、「秋田県」等が順次選択される。図５に示すように、各認識候補に対応する発生トルクポテンシャルを「発生トルクポテンシャルの谷」と呼ぶことにする。この谷の深さは、ステップＳ２０３で演算した一致度に比例させる。従って、図４に示される認識候補と対応する一致度とを参照すると、「神奈川県」に対応する発生トルクポテンシャルの谷が一番深く、「佐賀県」、「滋賀県」、「熊本県」、「秋田県」の順に、それぞれに対応する発生トルクポテンシャルの谷は浅くなっていく。
【００２３】
上述したように、発生トルクポテンシャルの軸方向と反対方向のトルク、すなわち、発生トルクポテンシャルの谷の部分に対応するトルクがホイール駆動モータ１０４ｃに発生すると、ホイール１０４ａの回転をアシストすることになる。従って、操作者がホイール１０４ａの回転操作により、第１の認識候補である「神奈川県」を選択する際には、強く引き寄せられるような感覚がホイール１０４ａに発生し、「神奈川県」を選択しやすいようになっている。
【００２４】
操作者が、「神奈川県」が選択された状態からさらに同じ方向にホイール１０４ａを回転操作すると、発生トルクポテンシャルは、谷の部分を通過して水平になる。この水平な部分は、ホイール１０４ａにトルクが発生しない状態である。操作者が、さらにホイール１０４ａを同じ方向に回転操作すると、発生トルクポテンシャルは、「佐賀県」に対応する谷の部分にさしかかる。この時も、強く引き寄せられるような感覚がホイール１０４ａに発生し、「佐賀県」を選択しやすくなっている。ただし、「佐賀県」に対応する谷の深さは、「神奈川県」に対応する谷の深さよりも浅いので、引き寄せられるような感覚は前回の操作時よりも小さくなっている。
【００２５】
操作者は、ホイール１０４ａの回転操作により、次の認識候補を順次選択することができる。選択された認識候補は、ディスプレイ１０５に拡大表示されると同時にスピーカ１０２により合成音声で操作者に知らされる。図６は、操作者が「神奈川県」を選択したときのディスプレイ１０５の表示４０１と、スピーカ１０２から発せられる合成音声４０２とを示したものである。これにより、操作者は選択した認識候補が何であるかを正確に知ることができる。また、ホイール１０４ａの回転操作時の引き寄せられるような感覚の大きさにより、認識候補の一致度の大きさを知ることができる。
【００２６】
操作者が音声入力した言葉が「神奈川県」である場合は、「神奈川県」を選択した状態でホイール１０４ａの押し込み操作を行うことにより、「神奈川県」を確定することができる。「神奈川県」以外の認識候補を選択したい場合には、ホイール１０４ａの回転操作により次の認識候補を順次選択していく。図４に示される第４の認識候補である「熊本県」や第５の認識候補である「秋田県」を選択する際には、ホイール１０４ａ回転操作時の引き寄せられる感覚により、認識候補の一致度がかなり小さいということを知ることができる。従って、第６の認識候補以降の認識候補を選択する操作を行っても、所望の認識候補を得ることが困難であるという判断を操作者がすることができる。
【００２７】
（第２の実施の形態）
第２の実施の形態の音声認識装置が第１の実施の形態の音声認識装置と異なるのは、信号処理装置１０３４で行われる処理である。すなわち、信号処理装置１０３４で行われる処理のうち、図３のフローチャートを用いて説明した処理は同じであるが、操作者がホイール１０４ａの回転操作により認識候補の選択を行う時に、ホイール駆動モータ１０４ｃに発生させるトルクパターンが異なる。従って、以下では、トルクパターンの説明を主に行う。
【００２８】
図７は、ホイール１０４ａの回転角と各回転角に対応する発生トルクポテンシャルとの関係を示す図である。第２の実施の形態の音声認識装置で用いられるトルクパターンは、図７に示すような丘型となっている。すなわち、トルクを発生させない領域がしばらく続いた後、ホイール１０４ａの回転操作を妨げる向きのトルクを発生させ（境界Ｃ）、トルクポテンシャルが所定値に達するとその状態を保つ。
【００２９】
境界Ｃの位置は、図３のフローチャートのステップＳ２０３で演算した一致度に基づいて定める。具体的には、認識候補ごとの一致度が、所定のしきい値Ｘを下回った所に境界Ｃを設ける。一致度がある値より小さくなると、操作者が所望の認識候補を得られる可能性が低くなるので、この点を考慮して所定のしきい値Ｘを予め実験等により求めておく。図４に示される一致度を用いた例について説明すると、しきい値Ｘを例えば１０００とする。この場合、境界Ｃは、一致度が５０００である「滋賀県」と、一致度が９００である「熊本県」との間に設けられる。
【００３０】
操作者がホイール１０４ａを回転操作して、「神奈川県」、「佐賀県」、「滋賀県」を選択するときには、ホイール駆動モータ１０４ｃを作動させていない（トルクを発生させていない）ので、通常通りスムーズにホイール１０４ａを操作することができる。しかし、操作者が、「滋賀県」を選択した状態から「熊本県」を選択するためにホイール１０４ａを回転操作する際は、ホイール１０４ａの回転操作を妨げる向きにホイール駆動モータ１０４ｃにトルクを発生させるので、ホイール１０４ａに丘を登るような感覚が発生する。従って、操作者は、ホイール１０４ａに発生する抗力により、認識候補の一致度がしきい値Ｘを下回ったことを知ることができ、それ以上認識候補の選択操作を行っても、所望の認識候補を得ることが難しいと判断することができる。
【００３１】
（第２の実施の形態の変形例）
第２の実施の形態の音声認識装置で用いられるトルクパターンは、認識候補の選択をさらに行っても所望の認識候補を得られない部分に、ホイール１０４ａの回転操作を妨げる向きにトルクを発生させるものである。従って、トルクパターンは、図８に示すような山型のものでもよい。この場合も、操作者がホイール１０４ａの回転操作を行って、認識候補の一致度がしきい値Ｘを下回る境界Ｃの部分にさしかかると、ホイール１０４ａに抗力が発生し、それ以上認識候補の選択操作を行っても、所望の認識候補を得ることが難しいと判断することができる。
【００３２】
（第３の実施の形態）
第３の実施の形態の音声認識装置が第１，第２の実施の形態の音声認識装置と異なるのは、信号処理装置１０３４で行われる処理である。すなわち、信号処理装置１０３４で行われる処理のうち、図３のフローチャートを用いて説明した処理は同じであるが、ホイール１０４ａが回転操作される時に発生させるトルクパターンが異なる。従って、以下では、トルクパターンの説明を主に行う。
【００３３】
図９は、ホイール１０４ａの回転角と各回転角に対応する発生トルクポテンシャルとの関係を示す図である。第３の実施の形態の音声認識装置で用いられるトルクパターンは、第１の実施の形態の音声認識装置で用いられるトルクパターンのように、各認識候補に対してトルクポテンシャルの谷が割り当てられている。ただし、図９に示すように、各認識候補に対するトルクポテンシャルの谷の深さは同じであり、境界Ｃの位置の谷と谷との間隔が、他の谷と谷との間隔と比べて広く設定されている。
【００３４】
境界Ｃは、第２の実施の形態と同様に、認識候補ごとの一致度が所定のしきい値Ｘを下回った所に設ける。従って、図４に示される一致度を用いた例において、しきい値Ｘを例えば１０００とすると、境界Ｃは、一致度が５０００である「滋賀県」と、一致度が９００である「熊本県」との間に設けられる。この場合、操作者が「神奈川県」から「佐賀県」を選択する際のホイール１０４ａの回転操作量と、「佐賀県」から「滋賀県」を選択する際のホイール１０４ａの回転操作量とは同じであるが、「滋賀県」を選択した状態から「熊本県」を選択するために行うホイール１０４ａの回転操作量は、他の認識候補、例えば「佐賀県」を選択した状態から「滋賀県」を選択するために行うホイール１０４ａの回転操作量よりも多くなる。
【００３５】
操作者がホイール１０４ａの回転操作により認識候補を順次選択していく際、上述したように、「滋賀県」を選択した状態から「熊本県」を選択するためには、ホイール１０４ａをより多く回転操作する必要がある。従って、操作者は、「熊本県」を選択する際に、「熊本県」の一致度がしきい値Ｘより小さいことを知ることができ、それ以上認識候補の選択操作を行っても、所望の認識候補を得ることが難しいと判断することができる。
【００３６】
（第３の実施の形態の変形例）
第３の実施の形態の音声認識装置で用いられるトルクパターンは、認識候補の選択をさらに行っても所望の認識候補を得られない部分の谷と谷との間隔を、それまでの谷と谷との間隔よりも広くした。この谷と谷との間隔を認識候補の一致度に基づいて変化させたトルクパターンを、図１０に示す。図１０に示すトルクパターンは、原点Ｏから各認識候補を選択するまでの回転角を、各認識候補の一致度の逆数、または逆数に比例する値としている。認識候補の一致度が小さくなるほど、一致度の逆数は大きくなる。従って、実際に入力された音声と認識対象語とのずれが大きくなるにつれて、ホイール１０４ａの回転操作量が大きくなるようにしている。これにより、上述した境界Ｃの位置を特定しなくても、認識候補を選択する際のホイール１０４ａの回転操作量により、所望の認識候補を得ることができる可能性を知ることができる。すなわち、ある認識候補が選択された状態から次の認識候補を選択する際のホイール１０４ａの回転操作量が多くなったときに、所望の認識候補を得ることが難しいと判断することができる。
【００３７】
（第４の実施の形態）
第４の実施の形態の音声認識装置が第１，第２，第３の実施の形態の音声認識装置と異なるのは、信号処理装置１０３４で行われる処理である。すなわち、信号処理装置１０３４で行われる処理のうち、図３のフローチャートのステップＳ２０１〜ステップＳ２０５までの処理は同じであるが、ステップＳ２０６で所望の認識候補が得られる可能性が低い場合に、音声の再入力機能を起動させる。
【００３８】
図１１に示すトルクパターンは、図８に示すトルクパターンと同じであるので、その詳しい説明は省略する。図４を参照すると、第４の認識候補である「熊本県」以降の認識候補の一致度はかなり低い。従って、操作者が音声入力した言葉が「神奈川県」、「佐賀県」、「滋賀県」では無い場合、さらに認識候補の選択操作を行っても、所望の認識候補を得られる可能性は低い。第４の実施の形態の音声認識装置では、「滋賀県」以降の認識候補はディスプレイ１０５に表示せず、「滋賀県」の次に「戻る」というコマンドを表示する。この「滋賀県」と「戻る」のコマンドとの境界Ｃは、第２，第３の実施の形態の音声認識装置で境界Ｃを定めたように、一致度としきい値Ｘとの大きさを比較して定める。
【００３９】
操作者が、「滋賀県」を選択した状態からさらにホイール１０４ａの回転操作を行うと、ホイール１０４ａには抗力が働く（図１１の境界Ｃ）。この抗力に反して回転操作を続けると、「戻る」というコマンドが選択されて、音声の再入力機能が起動する。この制御を図１３に示すフローチャートを用いて説明する。図１３に示すフローチャートの各ステップにおける処理のうち、ステップＳ２０１〜ステップＳ２０５における処理は、図３に示すフローチャートで行われる処理と同じなので、説明を省略する。
【００４０】
ステップＳ２０５において、認識候補をディスプレイ１０５に表示するとステップＳ２１０に進む。ステップＳ２１０では、上述した「戻る」のコマンドが操作者により選択されたか否かを判定する。操作者が、ホイール１０４ａに発生する抗力に反して回転操作を行い、「戻る」を選択すると、ホイール位置センサ１０４ｅによりホイール１０４ａの回転操作が検出されてホイール制御ＣＰＵ１０４ｄに送られ、通信デバイス１０４ｆを介して信号処理装置１０３４に入力される。信号処理装置１０３４がこの信号を受信すると、ステップＳ２０１に戻り、再度操作者に音声入力を行ってもらうために告知音信号をＤ／Ａコンバータ１０３２に出力する。Ｄ／Ａコンバータ１０３２に入力された告知音信号は、出力アンプ１０３３を介して、スピーカ１０２から告知音として発せられる。この告知音を聞いた操作者は、再度音声入力を行う。一方、ステップＳ２１０で「戻る」のコマンドの選択・確定が行われなかった場合は、ステップＳ２０６にて認識候補の決定処理が行われ、制御を終了する。
【００４１】
第４の実施の形態の音声認識装置によれば、操作者がホイール１０４ａの回転操作により認識候補を選択する際に、認識候補の一致度の大きさから所望の認識候補が得られる可能性が低い場合に、一致度が所定値Ｘより小さい認識候補をディスプレイ１０５に表示せず、再度音声入力を行うための「戻る」というコマンドを設けた。これにより、速やかに音声の再入力を行うことを可能とし、無駄な認識候補の選択操作を省いて、より早く認識候補を確定させることができる。
【００４２】
なお、上述した実施の形態では、「戻る」のコマンドが選択されると、自動的に音声の再入力機能を起動するようにしたが、「戻る」のコマンドが選択されて、ホイール１０４ａの押し込み操作により確定された後に音声の再入力機能が起動するようにしてもよい。
【００４３】
（第５の実施の形態）
第４の実施の形態の音声認識装置では、所望の認識候補が得られる可能性が低い場合に、再度操作者に音声入力を行ってもらうようにしたが、第５の実施の形態の音声認識装置では、再度の音声入力の代わりにキー入力をしてもらう。
【００４４】
図１２に示すトルクパターンも図８に示すトルクパターンと同じである。第４の音声認識装置と同様に、操作者が音声入力した言葉が「神奈川県」、「佐賀県」、「滋賀県」では無い場合、さらに認識候補の選択操作を行っても、所望の認識候補を得られる可能性は低い。第５の実施の形態の音声認識装置では、「滋賀県」以降の認識候補はディスプレイ１０５に表示せず、「滋賀県」の次にキー入力、すなわち操作者にマニュアル入力を行ってもらうための「マニュアル」というコマンドを表示する。「滋賀県」と「マニュアル」のコマンドとの境界Ｃの定め方も、第２〜第４の実施の形態の音声認識装置と同じである。
【００４５】
以下で説明する制御および操作者による操作は、図３に示すフローチャートのステップＳ２０６で、操作者が所望の認識候補を得られないときに行われるものである。すなわち、ステップＳ２０１〜ステップＳ２０５までの処理は、第１の実施の形態の音声認識装置と同様に行われる。
【００４６】
所望の認識候補がディスプレイ１０５に表示されていないために、操作者がホイール１０４ａに発生する抗力（図１２の境界Ｃ）に反してさらに回転操作を行うと、「マニュアル」のコマンドが選択されて、図１４に示すようなキー入力画面がディスプレイ１０５に表示される。キー入力画面では、「あ」〜「ん」までの４６個のキーと、「ハーフ」等の文字に含まれる長音を入力するための「―」キーと、「検索」キー、「訂正」キー、入力した文字を表示するための表示ディスプレイ５００が示される。
【００４７】
「検索」キーは、キー入力により入力された文字列と一致する認識対象語を検索するためのものであり、入力途中でも認識対象語を検索することが可能である。例えば、「神奈川県」と入力する際に、「かなが」までの３文字を入力して「検索」キーを選択・確定すると、表示ディスプレイ５００に「神奈川県」が表示される。この場合、「検索」キーを入力する前に入力する文字数が多いほど、所望の認識対象語が得られる可能性が高くなる。「訂正」キーは、入力された文字列の末尾の一文字を取り消すためのものである。例えば、「かなざ」と入力されたときに「訂正」キーを選択・確定すると、末尾の一文字である「ざ」が消去される。
【００４８】
図１４に示す画面が表示された状態では、操作者によるホイール１０４ａの回転操作により、「あ」、「か」、「さ」、…、「わ」等の各行の先頭文字と、「検索」、「訂正」を選択することができる。図１４では、「な」が選択されている状態である。この状態で、ホイール１０４ａに予め定められている順方向の回転操作を行うと、「は」、「ま」、…、の順に選択され、逆方向の回転操作を行うと、「た」、「さ」、…、の順に選択される。「な」を選択した状態で、ホイール１０４ａの押し込み操作を行うと、図１５に示すように、「な」行の「な」、「に」、「ぬ」、「ね」、「の」のうちのいずれかを選択可能となる。この状態で、ホイール１０４ａの回転操作を行うことにより、「な」〜「の」うちの所望の文字を選択することができ、ホイール１０４ａの押し込み操作により選択した文字を確定することができる。
【００４９】
次の文字を入力するために、再度行の選択を行いたい時は、入力装置１０４に設けられているスイッチ１０４ｂのうち、図示しない「戻るスイッチ」を押すことにより、図１４に示す画面に戻ることができる。上述した方法により全ての文字を入力した後、または上述したように、文字入力の途中で「検索」キーを選択・確定することにより、認識対象語がディスプレイ５００に表示される。
【００５０】
第５の実施の形態の音声認識装置によれば、操作者がホイール１０４ａの回転操作により認識候補を選択する際に、認識候補の一致度の大きさから所望の認識候補が得られる可能性が低い場合に、一致度の低い認識候補をディスプレイ１０５に表示せず、マニュアル入力（キー入力）するための「マニュアル」というコマンドを設けた。ホイール１０４ａに発生する抗力に反して回転操作を続けると、「マニュアル」のコマンドが選択されてキー入力するための画面がディスプレイ５００に表示され、キー入力を行うことができる。キー入力は、音声入力に比べると、操作時間・手間ともにかかるが、確実に入力を行うことができる。従って、音声入力に対する認識候補の選択操作を行っても所望の認識候補が得られない場合に、無駄な認識候補の選択操作を省いて、迅速・確実に認識候補を確定させることができる。
【００５１】
なお、上述した第５の実施の形態では、「マニュアル」のコマンドが選択されると、自動的にキー入力画面を表示するようにしたが、「マニュアル」のコマンドが選択されて、ホイール１０４ａの押し込み操作により確定された後にキー入力画面を表示するようにしてもよい。
【００５２】
本発明は、上述した実施の形態に限定されることはない。例えば、第２〜第５の実施の形態の音声認識装置では、各認識候補の一致度が所定のしきい値Ｘを下回る箇所を境界Ｃに定めたが、ある認識候補と次の認識候補との一致度の差が、所定のしきい値Ｙを上回った箇所を境界Ｃに定めてもよい。図４に示す認識候補と対応する一致度とを例に取りあげると、所定のしきい値Ｙを例えば１５００とする。「神奈川県」と「佐賀県」との一致度の差は５００であり、「佐賀県」と「滋賀県」との一致度の差も５００であるので、いずれもしきい値Ｙより小さい。しかし、「滋賀県」と「熊本県」との一致度の差は４１００であるので、しきい値Ｙを上回る。従って、「滋賀県」と「熊本県」との間に境界Ｃを定めることができる。
【００５３】
また、第４，第５の実施の形態の音声認識装置では、トルクパターンを図８に示す山型のものを用いたが、図７に示されるトルクパターンを用いることもできる。
【００５４】
上述した各実施の形態では、入力装置１０４を用いて認識候補を選択するために、ホイール１０４ａの回転操作を行うものとしているが、入力装置１０４にジョイスティックを採用して、ジョイスティックにより認識候補の選択を行うこともできる。また、入力装置にキーボードやコントローラを採用して、十字キーにより認識候補の選択を行ってもよい。また、第５の実施の形態の音声認識装置で用いられるキー入力画面も、図１４に示すものに限定されることはない。
【００５５】
上述した実施の形態では、本発明による音声認識装置をカーナビゲーション装置に適用した例について説明したが、カーナビゲーション装置以外のものにも適用することができる。
【図面の簡単な説明】
【図１】本発明による音声認識装置の一実施の形態の構成を示す図
【図２】本発明による音声認識装置に用いられる入力装置の一実施の形態の構成を示す図
【図３】信号処理装置にて行われる一実施の形態の制御手順を示すフローチャート
【図４】ディスプレイに表示される認識候補の一例を示す図
【図５】第１の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【図６】選択された認識候補をディスプレイに表示するとともに音声で知らせることを示す図
【図７】第２の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【図８】第２の実施の形態の音声認識装置で用いられるトルクパターンの変形例を示す図
【図９】第３の実施の形態の音声認識装置で用いられるトルクパターンを示す図
【図１０】第３の実施の形態の音声認識装置で用いられるトルクパターンの変形例を示す図
【図１１】第４の実施の形態の音声認識装置で用いられるトルクパターンと音声を再入力させるコマンドとを示す図
【図１２】第５の実施の形態の音声認識装置で用いられるトルクパターンとマニュアル入力をさせるコマンドとを示す図
【図１３】第４の実施の形態による音声認識装置の信号処理装置にて行われる制御手順を示すフローチャート
【図１４】マニュアル入力を行うための画面を示す図
【図１５】マニュアル入力画面にて「な」行が選択されたときの図
【符号の説明】
１０１…マイク、１０２…スピーカ、１０３…信号処理ユニット、１０３１…Ａ／Ｄコンバータ、１０３２…Ｄ／Ａコンバータ、１０３３…出力アンプ、１０３４…信号処理装置、１０３４ａ…ＣＰＵ、１０３４ｂ…メモリ、１０３５…外部記憶装置、１０４…入力装置、１０４ａ…ホイール、１０４ｂ…スイッチ、１０４ｃ…ホイール駆動モータ、１０４ｄ…ホイール制御ＣＰＵ、１０４ｅ…ホイール位置センサ、１０４ｆ…通信デバイス、５００…表示ディスプレイ
[0001]
[Technical field of the invention]
The present invention relates to a speech recognition apparatus that recognizes input speech and displays recognition candidates for the speech that was actually input.
[0002]
[Prior art]
A conventional speech recognition apparatus is disclosed in Japanese Patent Application Laid-Open No. 11-352991. This apparatus recognizes speech uttered one syllable at a time, displays a recognition candidate, and continues to display further recognition candidates one after another until the speaker confirms a displayed candidate.
[0003]
[Problems to be solved by the invention]
However, when the desired recognition candidate is not obtained, the conventional speech recognition apparatus simply displays the next recognition candidates in sequence. If, for example, loud noise is mixed in during speech input, the desired recognition candidate may never be displayed no matter how many candidates are shown.
[0004]
An object of the present invention is to provide a speech recognition apparatus that prevents useless candidate selection operations by informing the operator when it is judged that the desired recognition result will not be obtained even if the recognition candidates are displayed in sequence.
[0005]
[Means for Solving the Problems]
(1) The invention of claim 1 achieves the above object by comprising: a voice input device for inputting speech; a storage device that stores recognition target words for the input speech; a control device that calculates a degree of coincidence indicating how well the speech input to the voice input device matches each recognition target word stored in the storage device, and extracts a predetermined number of recognition target words in descending order of the degree of coincidence as recognition candidates; an operating device with which an operator can at least perform an operation of selecting a desired recognition candidate from among the recognition candidates; and an operation feeling changing device that changes the operation feeling experienced when the operator operates the operating device according to the magnitude of the degree of coincidence of the recognition candidates.
(2) In the invention of claim 2, in the speech recognition apparatus of claim 1, the operation feeling changing device generates a force that assists the operator's operation of selecting a recognition candidate using the operating device, and the assist force is made larger as the degree of coincidence of the recognition candidate is larger.
(3) In the invention of claim 3, in the speech recognition apparatus of claim 2, the operating device is a rotary input device having a wheel, with which the operator performs the selection operation by rotating the wheel, and the operation feeling changing device changes the rotational force of the wheel according to the magnitude of the degree of coincidence of the recognition candidate.
(4) In the invention of claim 4, in the speech recognition apparatus of claim 1, the operation feeling changing device generates a force that resists the operating force with which the operator selects a recognition candidate using the operating device, and applies that resisting force to the operating device when the operator selects a recognition candidate whose degree of coincidence is smaller than a predetermined value.
(5) In the invention of claim 5, in the speech recognition apparatus of claim 1, the operation feeling changing device generates a force that hinders the operator's operation of selecting a recognition candidate using the operating device, and, when the difference between the degree of coincidence of a recognition candidate and that of the next recognition candidate is larger than a predetermined value, applies a force that resists the operating force of the operating device when the operator selects the next recognition candidate.
(6) In the invention of claim 6, in the speech recognition apparatus of claim 4 or 5, when the operator continues to operate the operating device against the resisting force applied to it by the operation feeling changing device, the voice input process is performed again.
(7) In the invention of claim 7, in the speech recognition apparatus of claim 4 or 5, when the operator continues to operate the operating device against the force applied to it by the operation feeling changing device, an input function other than voice input is activated.
(8) In the invention of claim 8, in the speech recognition apparatus of claim 7, the input function other than voice input is a character input function.
(9) In the invention of claim 9, in the speech recognition apparatus of any one of claims 4 to 8, the operating device is a rotary input device having a wheel, with which the operator performs the selection operation by rotating the wheel, and the operation feeling changing device generates a force that resists the rotational force of the wheel.
(10) In the invention of claim 10, in the speech recognition apparatus of claim 1, the operating device is a rotary input device having a wheel, with which the operator performs the selection operation by rotating the wheel, and the operation feeling changing device generates, as a rotational force of the wheel, a force that assists the operator in selecting the next recognition candidate from the state in which a recognition candidate is selected, and makes the amount of wheel rotation required to select a recognition candidate whose degree of coincidence is smaller than a predetermined value larger than the amount required to select a recognition candidate whose degree of coincidence is larger than the predetermined value.
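The candidate extraction of claim 1 and the two resistance conditions of claims 4 and 5 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the sample scores follow the FIG. 4 example as described in paragraphs [0018], [0029] and [0052] of the original text (the values for Kanagawa and Saga are inferred from the stated differences of 500). The thresholds 1000 and 1500 are the example values the original text gives for X and Y.

```python
def extract_candidates(scores, n):
    """Return the n (word, score) pairs with the highest degree of
    coincidence, best first (claim 1)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def boundary_by_threshold(candidates, x):
    """Index of the first candidate whose degree of coincidence is below
    the threshold x (claim 4), or None if every candidate reaches x."""
    for index, (_, score) in enumerate(candidates):
        if score < x:
            return index
    return None


def boundary_by_gap(candidates, y):
    """Index of the first candidate whose degree of coincidence is more
    than y below that of the preceding candidate (claim 5), or None."""
    for index in range(1, len(candidates)):
        if candidates[index - 1][1] - candidates[index][1] > y:
            return index
    return None


# Illustrative scores only; see the note above on where they come from.
EXAMPLE_SCORES = {
    "Kumamoto": 900,
    "Kanagawa": 6000,
    "Akita": 880,
    "Shiga": 5000,
    "Saga": 5500,
}
```

With these values, both rules place the point at which resistance starts between Shiga and Kumamoto, that is, at the fourth candidate.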
[0006]
[Effects of the invention]
The present invention has the following effects.
(1) According to the inventions of claims 1 to 10, the operation feeling of the operating device used to select a recognition candidate is changed according to the magnitude of the degree of coincidence between the input speech and the recognition target words, so the operator can judge that the desired recognition candidate will not be obtained even if the selection operation is continued.
(2) According to the invention of claim 2, a force that assists the operator's operation of selecting a recognition candidate is applied to the operating device, and this force is made larger as the degree of coincidence of the recognition candidate is larger, so the operator can judge the likelihood of obtaining the desired recognition candidate from the magnitude of the force applied to the operating device.
(3) According to the invention of claim 3, the operating device is a rotary input device having a wheel, with which the operator performs the selection operation by rotating the wheel, and a force is applied in the direction in which the wheel rotates, so the operator can judge the likelihood of obtaining the desired recognition candidate from the feel of rotating the wheel.
(4) According to the invention of claim 4, when the operator selects a recognition candidate whose degree of coincidence is smaller than a predetermined value, a force that resists the operator's selecting operation is applied to the operating device, so the operator can judge that the desired recognition candidate will not be obtained even if the selection operation is continued.
(5) According to the invention of claim 5, when the difference between the degree of coincidence of a recognition candidate and that of the next recognition candidate is larger than a predetermined value, a force that resists the operator's selecting operation is applied to the operating device when the operator selects the next recognition candidate, so the operator can judge that the desired recognition candidate will not be obtained even if the selection operation is continued.
(6) According to the invention of claim 6, when the operation of selecting a recognition candidate is performed against the force applied to the operating device, the voice input process is performed again, which prevents the selection operation from being continued even though the desired recognition candidate is unlikely to be obtained.
(7) According to the invention of claim 7, when the operation of selecting a recognition candidate is performed against the force applied to the operating device, an input function other than voice input is activated, which prevents the selection operation from being continued even though the desired recognition candidate is unlikely to be obtained.
(8) According to the invention of claim 8, the input function other than voice input is a character input function, so reliable input can be performed by entering characters.
(9) According to the invention of claim 9, the operating device is a rotary input device having a wheel, with which the operator performs the selection operation by rotating the wheel, and a force that resists the rotational force of the wheel is applied, so the operator can judge from the feel of rotating the wheel that the desired recognition candidate will not be obtained even if the selection operation is continued.
(10) According to the invention of claim 10, the operating device is a rotary input device having a wheel, with which the operator performs the selection operation by rotating the wheel, and the amount of wheel rotation required to select a recognition candidate whose degree of coincidence is smaller than a predetermined value is made larger than the amount required to select a recognition candidate whose degree of coincidence is larger than the predetermined value, so the operator can judge from the feel of rotating the wheel that the desired recognition candidate will not be obtained even if the selection operation is continued.
[0007]
[Embodiments of the invention]
(First embodiment)
FIG. 1 is a diagram showing the configuration of a first embodiment of a speech recognition apparatus according to the present invention. The speech recognition apparatus according to the present invention includes a microphone 101, a speaker 102, a signal processing unit 103, an input device 104, and a display 105. The signal processing unit 103 includes an A/D converter 1031, a D/A converter 1032, an output amplifier 1033, and a signal processing device 1034.
[0008]
Speech input via the microphone 101 is input to the A/D converter 1031 of the signal processing unit 103 as an audio signal. The A/D converter 1031 converts the input audio signal into a digital signal and outputs it to the signal processing device 1034. The signal processing device 1034 includes a CPU 1034a and a memory 1034b, and calculates the degree of coincidence between the digital signals of the recognition target words stored in the external storage device 1035 and the digital signal of the input speech.
[0009]
The D/A converter 1032 converts a digital signal into an analog signal and outputs it to the output amplifier 1033 so that sound can be output from the speaker 102. The analog signal input from the D/A converter 1032 to the output amplifier 1033 is amplified and output as sound through the speaker 102.
[0010]
The display 105 displays recognition candidates for the input speech and other information. The input device 104 has a wheel 104a and a plurality of switches 104b; it detects operations by the operator, such as a request to start speech recognition, cancellation of an input, and selection of a recognition candidate, and outputs them to the signal processing device 1034. The wheel 104a can be pushed in the direction of arrow A in FIG. 1 and rotated in the direction of arrow B. The push in the direction of arrow A is used, for example, to confirm a recognition candidate displayed on the display 105, and the rotation in the direction of arrow B is used to select among the recognition candidates displayed on the display 105.
[0011]
FIG. 2 is a detailed diagram showing the configuration of the input device 104. In addition to the wheel 104a and the switches 104b described above, the input device 104 includes a wheel drive motor 104c, a wheel control CPU 104d, a wheel position sensor 104e, and a communication device 104f. The wheel drive motor 104c can generate torque in the rotation direction of the wheel 104a indicated by arrow B. Torque generated in the direction in which the operator is rotating the wheel helps the operator rotate the wheel 104a, and torque generated in the opposite direction hinders the rotation. Because of this torque, the operator feels the rotation of the wheel 104a become lighter or heavier. The wheel drive motor 104c can therefore change the operation feeling of the wheel 104a for the operator.
[0012]
The wheel position sensor 104e detects the rotation angle of the wheel 104a and the push operation in the direction of arrow A. The signal detected by the wheel position sensor 104e is sent to the wheel control CPU 104d. The wheel control CPU 104d digitizes the signal input from the wheel position sensor 104e and converts it into wheel position information. It also calculates the amount of torque to be generated by the wheel drive motor 104c, based on the wheel position information and on the generated torque pattern information (described later) input from the signal processing device 1034. The wheel control CPU 104d outputs a torque control signal based on the calculated torque amount to the wheel drive motor 104c. The wheel drive motor 104c is driven according to this control signal and generates torque in the rotation direction of the wheel 104a indicated by arrow B.
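As a rough illustration of the torque calculation performed by the wheel control CPU 104d: paragraph [0021] of the original text states that the torque generated at each rotation angle is the slope of the generated torque potential, with a positive slope hindering the rotation and a negative slope assisting it, and paragraph [0022] makes the depth of each candidate's valley proportional to its degree of coincidence. The Python sketch below follows those two statements. The V-shaped valleys, the spacing of 30 degrees, the half-width of 10 degrees, the gain of 0.001 and all names are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Valley:
    center: float             # wheel angle (degrees) at which the candidate is selected
    depth: float              # valley depth, proportional to the degree of coincidence
    half_width: float = 10.0  # assumed half-width of the V-shaped valley (degrees)


def build_valleys(scores, spacing=30.0, gain=0.001):
    """One valley per candidate, evenly spaced, with depth = gain * score."""
    return [Valley(center=i * spacing, depth=gain * s) for i, s in enumerate(scores)]


def potential(valleys, angle):
    """Generated torque potential at the given wheel angle (zero outside valleys)."""
    value = 0.0
    for v in valleys:
        distance = abs(angle - v.center)
        if distance < v.half_width:
            value -= v.depth * (1.0 - distance / v.half_width)
    return value


def torque(valleys, angle, step=1e-3):
    """Torque as the slope of the potential (central difference).

    Negative values assist a forward rotation, positive values hinder it.
    """
    return (potential(valleys, angle + step) - potential(valleys, angle - step)) / (2 * step)
```

With the illustrative degrees of coincidence 6000, 5500, 5000, 900 and 880, the assisting torque on the approach to the second candidate is about six times stronger than on the approach to the fourth, which is the difference in "pull" that the first embodiment relies on.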
[0013]
The communication device 104f is connected to the signal processing device 1034. It outputs the wheel position information input from the wheel control CPU 104d to the signal processing device 1034, and outputs the generated torque pattern information input from the signal processing device 1034 to the wheel control CPU 104d.
[0014]
FIG. 3 is a flowchart showing a processing procedure of an embodiment performed by the speech recognition apparatus according to the present invention. This control is performed by the signal processing device 1034 of the signal processing unit 103. The processing starting from step S201 starts when the operator operates the input device 104 and a signal indicating that voice input is to be started is input to the signal processing device 1034.
[0015]
In step S201, a notification sound signal for notifying the operator that voice recognition processing is to be started is read from the external storage device 1035 and output to the D/A converter 1032. The notification sound signal converted to analog by the D/A converter 1032 is output as a notification sound from the speaker 102 via the output amplifier 1033. The operator hears the notification sound emitted from the speaker 102 and starts voice input to the microphone 101. Here, an example in which the speech recognition apparatus according to the present invention is applied to a car navigation apparatus will be described. That is, the operator inputs the destination by voice. It is assumed that the name of the destination prefecture is input by voice, and that the names of the prefectures are stored in the external storage device 1035 as recognition target words.
[0016]
In the next step S202, capturing of the input voice is started. The voice uttered by the operator toward the microphone 101 is converted into a digital signal by the A/D converter 1031 and then input to the signal processing device 1034. When power is supplied from a power source (not shown), the microphone 101 picks up surrounding sounds and outputs them to the A/D converter 1031 even before the operator operates the input device 104 in step S201, and the digital signal converted by the A/D converter 1031 is input to the signal processing device 1034. The signal processing device 1034 calculates the average power of the input digital signal until the operator operates the input device 104 in step S201. When voice is input after the input device 104 is operated in step S201, a digital signal having a power greater than the calculated average power is input. Therefore, the signal processing device 1034 determines that the operator has made a voice input to the microphone 101 when a digital signal whose power exceeds the calculated average power by a predetermined value or more is input, and starts capturing the voice.
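The onset decision described in this paragraph can be sketched as follows. This is only an illustrative sketch: the frame-based processing, the sample values, the function names `average_power` and `find_voice_onset`, and the margin of 0.01 are assumptions and do not appear in the specification.

```python
def average_power(samples):
    """Mean squared amplitude of one block of digital samples."""
    return sum(s * s for s in samples) / len(samples)


def find_voice_onset(frames, noise_power, margin):
    """Return the index of the first frame whose power exceeds the
    background average power by more than `margin`, or None if no
    frame does (assumed decision rule for the start of capture)."""
    for index, frame in enumerate(frames):
        if average_power(frame) > noise_power + margin:
            return index
    return None


# Background sound picked up before the operator requests voice input
# (hypothetical sample values).
background = [[0.01, -0.02, 0.015, -0.01], [0.02, -0.01, 0.01, -0.015]]
noise_power = sum(average_power(f) for f in background) / len(background)

# Frames received after the request: one quiet frame, then speech.
frames = [
    [0.01, -0.01, 0.02, -0.02],
    [0.5, -0.4, 0.45, -0.5],
    [0.6, -0.55, 0.5, -0.6],
]
onset = find_voice_onset(frames, noise_power, margin=0.01)
print(onset)
```

Measuring the background power before the request lets the threshold follow the ambient noise level rather than relying on a fixed absolute value.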
[0017]
When the audio capturing is started, the process proceeds to step S203. In step S203, the degree of coincidence between the captured voice and the recognition target word stored in the external storage device 1035 is calculated. The signal processing device 1034 identifies a voice section uttered by the operator based on the power of the signal among the digital signals of the voice that has been captured. The degree of coincidence is calculated by quantifying how similar the digital signal of this speech section and the digital signals of a plurality of recognition target words stored in the external storage device 1035 are. The larger the value of the degree of coincidence, the more similar the two being compared. Note that, while the matching degree is being calculated by the parallel processing, the voice is continuously captured.
[0018]
If the time during which the power of the digital signal of the voice being captured is equal to or less than the predetermined value continues for a predetermined time or longer, it is determined that the voice input by the operator has been completed, and the voice capturing is terminated in step S204. In the next step S205, after completion of the coincidence calculation process, a predetermined number of recognition target words are extracted in descending order of coincidence and set as recognition candidates. FIG. 4 is an example of recognition candidates displayed on the display 105. The display 105 displays the degree of coincidence along with the recognition candidates. The predetermined number of recognition target words to be extracted can be determined in advance, for example, 10. In FIG. 4, five recognition candidates are displayed in descending order of the degree of coincidence, and when the predetermined number is 10, there are further five recognition candidates having a degree of coincidence smaller than 880 (“Akita Prefecture”).
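The extraction in step S205 can be illustrated with a short sketch. The degrees of coincidence for “Shiga Prefecture” (5000), “Kumamoto Prefecture” (900) and “Akita Prefecture” (880) are the values given in the description; the values for “Kanagawa Prefecture” and “Saga Prefecture” are inferred from the differences of 500 stated later in the description, and the “Nara Prefecture” entry and the function name `extract_candidates` are purely hypothetical.

```python
def extract_candidates(scores, count):
    """Return the `count` recognition target words with the largest
    degree of coincidence, in descending order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


scores = {
    "Akita Prefecture": 880,
    "Shiga Prefecture": 5000,
    "Kanagawa Prefecture": 6000,  # inferred value
    "Kumamoto Prefecture": 900,
    "Saga Prefecture": 5500,      # inferred value
    "Nara Prefecture": 300,       # hypothetical extra entry
}
candidates = extract_candidates(scores, 5)
for name, score in candidates:
    print(name, score)
```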
[0019]
When the extracted predetermined number of recognition candidates are displayed on the display 105, the process proceeds to step S206. In step S206, the control is terminated when a signal indicating that the operator has selected and confirmed a desired recognition candidate from the recognition candidates displayed on the display 105 is input. That is, the operator selects a desired recognition candidate from the recognition candidates displayed on the display 105 by rotating the wheel 104a of the input device 104, and confirms the selected recognition candidate by pushing in the wheel 104a. As described above, the rotation operation and push-in operation of the wheel 104a are detected by the wheel position sensor 104e, sent to the wheel control CPU 104d, and input to the signal processing device 1034 via the communication device 104f. The signal processing device 1034 ends this control when it receives this signal.
[0020]
The voice recognition apparatus according to the present invention is characterized by the control of the input device 104 that is performed when, in step S206, the operator selects a desired candidate from among the plurality of recognition candidates displayed on the display 105 by rotating the wheel 104a. This control will be described with reference to FIG. 5.
[0021]
FIG. 5 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential used to make the wheel drive motor 104c generate torque in the rotation direction of the wheel 104a. The generated torque potential is a graph obtained by integrating the generated torque with respect to the rotation angle of the wheel 104a. There are several types of graphs showing the relationship between the generated torque potential and the rotation angle, and these are called generated torque patterns. Because this graph makes it easy to grasp a plurality of generated torque patterns visually, it is used in the following explanation. The torque actually generated at each rotation angle of the wheel 104a is the slope of the graph at that rotation angle. When torque in the axial direction (positive direction) shown in FIG. 5 is generated, the rotation operation of the wheel 104a by the operator is hindered, and when torque in the opposite direction (negative direction) is generated, the rotation operation of the wheel 104a by the operator is assisted.
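The relationship between the potential graph and the generated torque can be sketched numerically. The V-shaped valley below (centre at 10 degrees, depth 6, half-width 2) is a hypothetical shape chosen for illustration and is not taken from FIG. 5; the function names are likewise assumptions. The sign convention follows the description: a positive slope hinders the rotation and a negative slope assists it.

```python
def torque_at(potential, angle, step=1e-3):
    """Slope of the generated torque potential at `angle`,
    estimated with a central difference."""
    return (potential(angle + step) - potential(angle - step)) / (2 * step)


def valley(angle):
    """Hypothetical V-shaped valley centred at 10 degrees with
    depth 6 and half-width 2; the potential is flat elsewhere."""
    distance = abs(angle - 10.0)
    if distance < 2.0:
        return -6.0 * (1.0 - distance / 2.0)
    return 0.0


def effect(torque):
    """Classify the torque according to the sign convention above."""
    if torque < 0:
        return "assists"
    if torque > 0:
        return "hinders"
    return "none"


for angle in (5.0, 9.0, 11.0):
    print(angle, effect(torque_at(valley, angle)))
```

Approaching the bottom of the valley the slope is negative, so the wheel feels drawn in; leaving the valley the slope is positive, so the wheel resists; on the flat portion no torque is generated.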
[0022]
As shown in FIG. 5, when the operator rotates the wheel 104a to select a desired candidate from the recognition candidates displayed on the display 105, the displayed recognition candidates, that is, “Kanagawa Prefecture”, “Saga Prefecture”, “Shiga Prefecture”, “Kumamoto Prefecture”, “Akita Prefecture”, and so on, are selected sequentially. As shown in FIG. 5, the portion of the generated torque potential corresponding to each recognition candidate is referred to as a “generated torque potential valley”. The depth of this valley is proportional to the degree of coincidence calculated in step S203. Therefore, referring to the recognition candidates and the corresponding degrees of coincidence shown in FIG. 4, the valley of the generated torque potential corresponding to “Kanagawa Prefecture” is the deepest, and the valleys become shallower in the order of “Saga Prefecture”, “Shiga Prefecture”, “Kumamoto Prefecture”, and “Akita Prefecture”.
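As a rough illustration of valley depth being proportional to the degree of coincidence, the sketch below scales each depth relative to the best candidate. The normalisation to a maximum depth of 1.0 and the name `valley_depths` are assumptions; the degrees of coincidence for “Kanagawa Prefecture” and “Saga Prefecture” are inferred from the differences stated elsewhere in the description.

```python
def valley_depths(candidates, max_depth):
    """Depth of each generated torque potential valley, proportional
    to the degree of coincidence and scaled so that the best
    candidate receives `max_depth`."""
    best = candidates[0][1]
    return {name: max_depth * score / best for name, score in candidates}


candidates = [
    ("Kanagawa Prefecture", 6000),  # inferred value
    ("Saga Prefecture", 5500),      # inferred value
    ("Shiga Prefecture", 5000),
    ("Kumamoto Prefecture", 900),
    ("Akita Prefecture", 880),
]
depths = valley_depths(candidates, max_depth=1.0)
for name, depth in depths.items():
    print(f"{name}: {depth:.3f}")
```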
[0023]
As described above, when torque in the direction opposite to the axial direction of the generated torque potential, that is, torque corresponding to a valley portion of the generated torque potential, is generated by the wheel drive motor 104c, the rotation of the wheel 104a is assisted. Therefore, when the operator selects the first recognition candidate “Kanagawa Prefecture” by rotating the wheel 104a, the wheel 104a feels as if it is being strongly drawn in, which makes “Kanagawa Prefecture” easy to select.
[0024]
When the operator further rotates the wheel 104a in the same direction from the state in which “Kanagawa Prefecture” is selected, the generated torque potential passes through the valley and becomes horizontal. In this horizontal portion, no torque is generated in the wheel 104a. When the operator rotates the wheel 104a still further in the same direction, the generated torque potential approaches the valley corresponding to “Saga Prefecture”. At this time as well, the wheel 104a feels as if it is being drawn in, which makes “Saga Prefecture” easy to select. However, since the valley corresponding to “Saga Prefecture” is shallower than the valley corresponding to “Kanagawa Prefecture”, the drawn-in feeling is weaker than in the previous operation.
[0025]
The operator can sequentially select the next recognition candidates by rotating the wheel 104a. The selected recognition candidate is enlarged and displayed on the display 105, and at the same time, the speaker 102 notifies the operator with synthesized speech. FIG. 6 shows a display 401 on the display 105 when the operator selects “Kanagawa Prefecture” and a synthesized voice 402 emitted from the speaker 102. In this way, the operator can know exactly which recognition candidate has been selected. In addition, the operator can tell the degree of coincidence of a recognition candidate from the strength of the drawn-in feeling when the wheel 104a is rotated.
[0026]
If the word input by the operator is “Kanagawa Prefecture”, “Kanagawa Prefecture” can be confirmed by pushing in the wheel 104a while “Kanagawa Prefecture” is selected. When a recognition candidate other than “Kanagawa Prefecture” is to be selected, the next recognition candidates are selected sequentially by rotating the wheel 104a. When selecting “Kumamoto Prefecture”, the fourth recognition candidate shown in FIG. 4, or “Akita Prefecture”, the fifth recognition candidate, the operator can tell from the weak drawn-in feeling when the wheel 104a is rotated that the degree of coincidence of the recognition candidate is quite small. Therefore, the operator can judge that a desired recognition candidate is unlikely to be obtained even if the sixth and subsequent recognition candidates are selected.
[0027]
(Second Embodiment)
The speech recognition apparatus according to the second embodiment differs from the speech recognition apparatus according to the first embodiment in the processing performed by the signal processing device 1034. That is, among the processes performed by the signal processing device 1034, the processes described using the flowchart of FIG. 3 are the same, but the torque pattern generated by the wheel drive motor 104c when the operator selects a recognition candidate by rotating the wheel 104a is different. Therefore, the torque pattern will be mainly described below.
[0028]
FIG. 7 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential corresponding to each rotation angle. The torque pattern used in the speech recognition apparatus of the second embodiment has a hill shape as shown in FIG. That is, after a region in which no torque is generated continues for a while, torque is generated in a direction that hinders the rotation operation of the wheel 104a (boundary C), and the state is maintained when the torque potential reaches a predetermined value.
[0029]
The position of the boundary C is determined based on the degree of coincidence calculated in step S203 in the flowchart of FIG. 3. Specifically, the boundary C is provided at the point where the degree of coincidence of the recognition candidates falls below a predetermined threshold value X. When the degree of coincidence becomes smaller than a certain value, the operator is less likely to obtain a desired recognition candidate. Taking this into account, the predetermined threshold value X is obtained in advance by experiments or the like. In the example using the degrees of coincidence shown in FIG. 4, if the threshold value X is set to 1000, for example, the boundary C is provided between “Shiga Prefecture” with a degree of coincidence of 5000 and “Kumamoto Prefecture” with a degree of coincidence of 900.
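The placement of the boundary C can be sketched as a search for the first candidate whose degree of coincidence falls below the threshold value X. The value X = 1000 is used here as an example value; the function name `boundary_index` is an assumption, and the degrees of coincidence for “Kanagawa Prefecture” and “Saga Prefecture” are inferred from the differences stated elsewhere in the description.

```python
def boundary_index(candidates, threshold_x):
    """Index of the first candidate whose degree of coincidence is
    below `threshold_x`; the boundary C lies just before it.
    Returns None when every candidate reaches the threshold."""
    for index, (_, score) in enumerate(candidates):
        if score < threshold_x:
            return index
    return None


candidates = [
    ("Kanagawa Prefecture", 6000),  # inferred value
    ("Saga Prefecture", 5500),      # inferred value
    ("Shiga Prefecture", 5000),
    ("Kumamoto Prefecture", 900),
    ("Akita Prefecture", 880),
]
index = boundary_index(candidates, threshold_x=1000)
print(candidates[index - 1][0], "|", candidates[index][0])
```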
[0030]
When the operator operates the wheel 104a to select “Kanagawa Prefecture”, “Saga Prefecture”, or “Shiga Prefecture”, the wheel drive motor 104c is not operated (no torque is generated), so the wheel 104a can be operated smoothly. However, when the operator rotates the wheel 104a to select “Kumamoto Prefecture” from the state where “Shiga Prefecture” is selected, the wheel drive motor 104c generates torque in a direction that hinders the rotation of the wheel 104a, so the wheel 104a feels as if it is climbing a hill. From the drag generated on the wheel 104a, the operator can therefore know that the degree of coincidence of the recognition candidate is below the threshold value X, and can judge that a desired recognition candidate is unlikely to be obtained even if further recognition candidates are selected.
[0031]
(Modification of the second embodiment)
The torque pattern used in the speech recognition apparatus according to the second embodiment generates torque in a direction that hinders the rotation operation of the wheel 104a at the portion beyond which a desired recognition candidate is unlikely to be obtained even if further recognition candidates are selected. Therefore, the torque pattern may instead have a mountain shape as shown in FIG. 8. In this case as well, when the operator rotates the wheel 104a and reaches the boundary C where the degree of coincidence of the recognition candidates falls below the threshold value X, drag occurs on the wheel 104a, and the operator can judge that a desired recognition candidate is unlikely to be obtained even if the selection operation is continued.
[0032]
(Third embodiment)
The speech recognition apparatus according to the third embodiment is different from the speech recognition apparatuses according to the first and second embodiments in processing performed by the signal processing apparatus 1034. That is, among the processes performed by the signal processing apparatus 1034, the processes described using the flowchart of FIG. 3 are the same, but the torque pattern generated when the wheel 104a is rotated is different. Therefore, the torque pattern will be mainly described below.
[0033]
FIG. 9 is a diagram showing the relationship between the rotation angle of the wheel 104a and the generated torque potential corresponding to each rotation angle. In the torque pattern used in the speech recognition apparatus according to the third embodiment, a torque potential valley is assigned to each recognition candidate, as in the torque pattern used in the speech recognition apparatus according to the first embodiment. However, as shown in FIG. 9, the valleys of the torque potential for the recognition candidates all have the same depth, and the interval between the valleys at the position of the boundary C is set wider than the intervals between the other valleys.
[0034]
The boundary C is provided at the point where the degree of coincidence of the recognition candidates falls below a predetermined threshold value X, as in the second embodiment. Therefore, in the example using the degrees of coincidence shown in FIG. 4, if the threshold value X is 1000, for example, the boundary C is provided between “Shiga Prefecture” with a degree of coincidence of 5000 and “Kumamoto Prefecture” with a degree of coincidence of 900. In this case, the rotation operation amount of the wheel 104a needed to select “Saga Prefecture” from “Kanagawa Prefecture” and the rotation operation amount needed to select “Shiga Prefecture” from “Saga Prefecture” are the same, but the rotation operation amount needed to select “Kumamoto Prefecture” from the state where “Shiga Prefecture” is selected is larger than the rotation operation amount needed to select another recognition candidate, for example, “Saga Prefecture”.
[0035]
When the operator sequentially selects recognition candidates by rotating the wheel 104a, the wheel 104a must, as described above, be rotated by a larger amount in order to select “Kumamoto Prefecture” from the state where “Shiga Prefecture” is selected. Therefore, when selecting “Kumamoto Prefecture”, the operator can know that the degree of coincidence of “Kumamoto Prefecture” is smaller than the threshold value X, and can judge that a desired recognition candidate is unlikely to be obtained.
[0036]
(Modification of the third embodiment)
The torque pattern used in the speech recognition apparatus according to the third embodiment makes the interval between valleys wider than the other intervals at the portion beyond which a desired recognition candidate is unlikely to be obtained even if further recognition candidates are selected. FIG. 10 shows a torque pattern in which the interval between the valleys is changed based on the degrees of coincidence of the recognition candidates. In the torque pattern shown in FIG. 10, the rotation angle from the origin O to the position at which each recognition candidate is selected is the reciprocal of the degree of coincidence of that recognition candidate or a value proportional to the reciprocal. The smaller the degree of coincidence of a recognition candidate, the larger its reciprocal. Therefore, the amount of rotation operation of the wheel 104a increases as the difference between the actually input voice and the recognition target word increases. As a result, even without specifying the position of the boundary C described above, the operator can know how likely it is that a desired recognition candidate will be obtained from the amount of rotation operation of the wheel 104a needed to select a recognition candidate. That is, when the amount of rotation operation needed to select the next recognition candidate from a state where a certain recognition candidate is selected becomes large, the operator can judge that a desired recognition candidate is unlikely to be obtained.
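The reciprocal spacing of FIG. 10 can be illustrated with a short sketch. The proportionality constant of 30000 and the name `valley_angles` are hypothetical, and the degrees of coincidence for “Kanagawa Prefecture” and “Saga Prefecture” are inferred from the differences stated elsewhere in the description.

```python
def valley_angles(candidates, scale):
    """Rotation angle from the origin O to each candidate's valley,
    taken as `scale` divided by the degree of coincidence."""
    return [(name, scale / score) for name, score in candidates]


candidates = [
    ("Kanagawa Prefecture", 6000),  # inferred value
    ("Saga Prefecture", 5500),      # inferred value
    ("Shiga Prefecture", 5000),
    ("Kumamoto Prefecture", 900),
    ("Akita Prefecture", 880),
]
angles = valley_angles(candidates, scale=30000.0)
for name, angle in angles:
    print(f"{name}: {angle:.2f}")

# Rotation needed to move from one candidate to the next.
gaps = [angles[i + 1][1] - angles[i][1] for i in range(len(angles) - 1)]
```

With these values the step from “Shiga Prefecture” to “Kumamoto Prefecture” is far larger than the steps between the first three candidates, which is the cue the operator feels through the wheel.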
[0037]
(Fourth embodiment)
The speech recognition apparatus according to the fourth embodiment differs from the speech recognition apparatuses according to the first, second, and third embodiments in the processing performed by the signal processing device 1034. That is, among the processes performed by the signal processing device 1034, the processes from step S201 to step S205 in the flowchart of FIG. 3 are the same, but when the possibility of obtaining a desired recognition candidate in step S206 is low, a voice re-input function is activated.
[0038]
Since the torque pattern shown in FIG. 11 is the same as the torque pattern shown in FIG. 8, detailed description thereof is omitted. Referring to FIG. 4, the degrees of coincidence of the fourth recognition candidate “Kumamoto Prefecture” and the subsequent recognition candidates are quite low. Therefore, if the word input by the operator is not “Kanagawa Prefecture”, “Saga Prefecture”, or “Shiga Prefecture”, a desired recognition candidate is unlikely to be obtained even if the recognition candidate selection operation is continued. In the speech recognition apparatus according to the fourth embodiment, the recognition candidates after “Shiga Prefecture” are not displayed on the display 105; instead, a “return” command is displayed after “Shiga Prefecture”. The boundary C between “Shiga Prefecture” and the “return” command is determined by comparing the degree of coincidence with the threshold value X, in the same way as the boundary C is determined in the speech recognition apparatuses of the second and third embodiments.
[0039]
When the operator further rotates the wheel 104a from the state where “Shiga Prefecture” is selected, drag acts on the wheel 104a (boundary C in FIG. 11). When the rotation operation is continued against this drag, the “return” command is selected and the voice re-input function is activated. This control will be described with reference to the flowchart shown in FIG. Of the processes in each step of the flowchart shown in FIG. 13, the processes in steps S201 to S205 are the same as the processes performed in the flowchart shown in FIG.
[0040]
When the recognition candidates are displayed on the display 105 in step S205, the process proceeds to step S210. In step S210, it is determined whether or not the above-described “return” command has been selected by the operator. When the operator performs a rotation operation against the drag generated on the wheel 104a and selects “return”, the wheel position sensor 104e detects the rotation operation of the wheel 104a and sends the detection result to the wheel control CPU 104d, which sends it to the signal processing device 1034 via the communication device 104f. When the signal processing device 1034 receives this signal, the process returns to step S201, and a notification sound signal is output to the D/A converter 1032 so that the operator can input voice again. The notification sound signal input to the D/A converter 1032 is emitted as a notification sound from the speaker 102 via the output amplifier 1033. The operator who hears the notification sound performs voice input again. On the other hand, if the “return” command is not selected in step S210, the recognition candidate confirmation processing is performed in step S206, and the control is terminated.
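The loop formed by steps S201 to S205, S210, and S206 can be sketched as follows. The callables `recognize` and `choose` are hypothetical stand-ins for the voice capture and matching of steps S201 to S205 and for the operator's wheel operation; they, the `RETURN` marker, and the canned results are assumptions made for illustration and are not interfaces defined in the specification.

```python
RETURN = "return"  # marker for the "return" command shown after the last candidate


def run_recognition(recognize, choose):
    """Repeat voice input until the operator confirms a candidate.

    `recognize()` stands in for steps S201 to S205 and returns the
    displayed candidates; `choose(options)` stands in for the
    operator's selection in steps S210 and S206."""
    attempts = 0
    while True:
        attempts += 1
        options = recognize() + [RETURN]
        choice = choose(options)
        if choice != RETURN:      # S206: a candidate was confirmed
            return choice, attempts
        # S210: "return" was selected, so go back to S201


# Canned results: the first attempt is disturbed by noise,
# the second attempt contains the intended word.
results = iter([
    ["Saga Prefecture", "Shiga Prefecture"],
    ["Kanagawa Prefecture", "Saga Prefecture"],
])


def recognize():
    return next(results)


def choose(options):
    wanted = "Kanagawa Prefecture"
    return wanted if wanted in options else RETURN


final, attempts = run_recognition(recognize, choose)
print(final, attempts)
```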
[0041]
According to the speech recognition apparatus of the fourth embodiment, when the operator selects a recognition candidate by rotating the wheel 104a and the possibility of obtaining a desired recognition candidate is judged to be low from the degrees of coincidence of the recognition candidates, recognition candidates having a degree of coincidence smaller than the predetermined value X are not displayed on the display 105, and a “return” command for performing voice input again is provided instead. As a result, the voice can be re-input promptly, and a recognition candidate can be confirmed sooner without wasteful recognition candidate selection operations.
[0042]
In the above-described embodiment, the voice re-input function is automatically activated when the “return” command is selected. However, the voice re-input function may instead be activated after the “return” command is selected and then confirmed by pushing in the wheel 104a.
[0043]
(Fifth embodiment)
In the voice recognition apparatus according to the fourth embodiment, the operator inputs voice again when a desired recognition candidate is unlikely to be obtained. The voice recognition apparatus according to the fifth embodiment requests key input instead of a second voice input.
[0044]
The torque pattern shown in FIG. 12 is also the same as the torque pattern shown in FIG. 8. As in the speech recognition apparatus of the fourth embodiment, if the word input by the operator is not “Kanagawa Prefecture”, “Saga Prefecture”, or “Shiga Prefecture”, a desired recognition candidate is unlikely to be obtained even if further recognition candidates are selected. In the speech recognition apparatus according to the fifth embodiment, the recognition candidates after “Shiga Prefecture” are not displayed on the display 105; instead, a “manual” command, which allows the operator to perform key input, that is, manual input, is displayed after “Shiga Prefecture”. The method of determining the boundary C between “Shiga Prefecture” and the “manual” command is the same as in the speech recognition apparatuses of the second to fourth embodiments.
[0045]
The control and operation by the operator described below are performed when the operator cannot obtain a desired recognition candidate in step S206 of the flowchart shown in FIG. That is, the processing from step S201 to step S205 is performed in the same manner as the speech recognition apparatus according to the first embodiment.
[0046]
Since the desired recognition candidate is not displayed on the display 105, if the operator continues the rotation operation against the drag generated on the wheel 104a (boundary C in FIG. 12), the “manual” command is selected, and a key input screen as shown in FIG. 14 is displayed on the display 105. The key input screen shows 46 character keys from “A” to “N”, a “-” key for inputting the long sound contained in words such as “half”, a “Search” key, a “Correction” key, and a display 500 for displaying the input characters.
[0047]
The “search” key is used to search for a recognition target word that matches a character string entered by key input, and the recognition target words can be searched even while input is still in progress. For example, when inputting “Kanagawa Prefecture”, if the three characters up to “Kanaga” are input and the “search” key is selected and confirmed, “Kanagawa Prefecture” is displayed on the display 500. In this case, the more characters that are input before the “search” key is used, the higher the possibility of obtaining the desired recognition target word. The “correction” key is used to delete the last character of the input character string. For example, if the “correction” key is selected and confirmed when “Kanaza” has been input, “ZA”, which is the last character, is deleted.
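The behaviour of the “search” and “correction” keys can be sketched with simple string operations. The sketch assumes that the recognition target words are held as kana readings and that the search is a prefix match; the word list, the readings, and the function names `search` and `correct` are assumptions, since the description does not state how the search is implemented.

```python
def search(words, typed):
    """Recognition target words whose reading starts with the
    characters entered so far (assumed prefix match)."""
    return [word for word in words if word.startswith(typed)]


def correct(typed):
    """Delete the last character of the entered string."""
    return typed[:-1]


# Hypothetical readings: Kanagawa, Kagawa and Saga prefectures.
words = ["かながわけん", "かがわけん", "さがけん"]

print(search(words, "かなが"))   # three characters entered, then "search"
print(correct("かなざ"))         # mistyped last character removed
```

Entering fewer characters leaves more words matching the prefix, which is why the description notes that more input before the search raises the chance of obtaining the desired word.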
[0048]
When the screen shown in FIG. 14 is displayed, the first character of each row, such as “A”, “KA”, “SA”, ..., or “Correction” can be selected. In FIG. 14, “NA” is selected. In this state, when a predetermined forward rotation operation is performed on the wheel 104a, “HA”, “MA”, ... are selected in this order, and when a reverse rotation operation is performed, “TA”, “SA”, ... are selected in this order. When the wheel 104a is pushed in while “NA” is selected, one of “NA”, “NI”, “NU”, “NE”, and “NO” in the “NA” row can be selected, as shown in FIG. 15. In this state, a desired character from “NA” to “NO” can be selected by rotating the wheel 104a, and the selected character can be confirmed by pushing in the wheel 104a.
[0049]
When the user wants to select a row again in order to input the next character, the user can return to the screen shown in FIG. 14 by pressing a “return switch” (not shown) among the switches 104b provided in the input device 104. The recognition target word is displayed on the display 500 after all characters have been input by the above-described method, or when the “search” key is selected and confirmed during character input as described above.
[0050]
According to the speech recognition apparatus of the fifth embodiment, when the operator selects a recognition candidate by rotating the wheel 104a and the possibility of obtaining a desired recognition candidate is judged to be low from the degrees of coincidence of the recognition candidates, recognition candidates having a low degree of coincidence are not displayed on the display 105, and a “manual” command for manual input (key input) is provided instead. When the rotation operation is continued against the drag generated on the wheel 104a, the “manual” command is selected, a screen for key input is displayed on the display 500, and key input can be performed. Key input takes more time and effort than voice input, but it is reliable. Accordingly, when a desired recognition candidate cannot be obtained through the recognition candidate selection operation for voice input, useless recognition candidate selection operations can be omitted and a recognition candidate can be confirmed quickly and reliably.
[0051]
In the above-described fifth embodiment, the key input screen is automatically displayed when the “manual” command is selected. However, the key input screen may instead be displayed after the “manual” command is selected and then confirmed by pushing in the wheel 104a.
[0052]
The present invention is not limited to the embodiment described above. For example, in the speech recognition apparatuses according to the second to fifth embodiments, the boundary C is defined as a location where the matching degree of each recognition candidate is lower than a predetermined threshold value X. The boundary C may be defined as a location where the difference in the degree of coincidence exceeds a predetermined threshold value Y. Taking the recognition candidate shown in FIG. 4 and the corresponding matching degree as an example, the predetermined threshold Y is set to 1500, for example. The difference in coincidence between “Kanagawa Prefecture” and “Saga Prefecture” is 500, and the difference in coincidence between “Saga Prefecture” and “Shiga Prefecture” is also 500, so both are smaller than the threshold Y. However, since the difference in coincidence between “Shiga Prefecture” and “Kumamoto Prefecture” is 4100, the threshold value Y is exceeded. Therefore, the boundary C can be defined between “Shiga Prefecture” and “Kumamoto Prefecture”.
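The alternative definition of the boundary C using the threshold value Y can be sketched as follows. The function name `boundary_by_gap` is an assumption; Y = 1500 and the differences of 500, 500, and 4100 are the values given in the paragraph above, and the degrees of coincidence for “Kanagawa Prefecture” and “Saga Prefecture” are inferred from those differences.

```python
def boundary_by_gap(candidates, threshold_y):
    """Index of the first candidate that follows a drop in the degree
    of coincidence larger than `threshold_y`; the boundary C lies just
    before it. Returns None when no drop exceeds the threshold."""
    for index in range(len(candidates) - 1):
        drop = candidates[index][1] - candidates[index + 1][1]
        if drop > threshold_y:
            return index + 1
    return None


candidates = [
    ("Kanagawa Prefecture", 6000),  # inferred value
    ("Saga Prefecture", 5500),      # inferred value
    ("Shiga Prefecture", 5000),
    ("Kumamoto Prefecture", 900),
    ("Akita Prefecture", 880),
]
index = boundary_by_gap(candidates, threshold_y=1500)
print(candidates[index - 1][0], "|", candidates[index][0])
```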
[0053]
In the speech recognition apparatuses of the fourth and fifth embodiments, the mountain pattern shown in FIG. 8 is used as the torque pattern, but the torque pattern shown in FIG. 7 can also be used.
[0054]
In each of the above-described embodiments, a recognition candidate is selected by rotating the wheel 104a of the input device 104. However, a joystick may be adopted as the input device 104, and the recognition candidate may be selected with the joystick. Alternatively, a keyboard or controller may be employed as the input device, and recognition candidates may be selected using a cross key. Also, the key input screen used in the speech recognition apparatus of the fifth embodiment is not limited to that shown in FIG. 14.
[0055]
In the above-described embodiment, an example in which the speech recognition apparatus according to the present invention is applied to a car navigation apparatus has been described, but the present invention can also be applied to apparatuses other than a car navigation apparatus.
[Brief description of the drawings]
FIG. 1 is a diagram showing the configuration of an embodiment of a speech recognition apparatus according to the present invention.
FIG. 2 is a diagram showing a configuration of an embodiment of an input device used in a speech recognition device according to the present invention.
FIG. 3 is a flowchart showing a control procedure of an embodiment performed in the signal processing device.
FIG. 4 is a diagram showing an example of recognition candidates displayed on the display.
FIG. 5 is a diagram showing a torque pattern used in the speech recognition apparatus according to the first embodiment.
FIG. 6 is a diagram showing that a selected recognition candidate is displayed on the display and notified by voice.
FIG. 7 is a diagram showing a torque pattern used in the speech recognition apparatus according to the second embodiment.
FIG. 8 is a diagram showing a modification of the torque pattern used in the speech recognition apparatus according to the second embodiment.
FIG. 9 is a diagram showing a torque pattern used in the speech recognition apparatus according to the third embodiment.
FIG. 10 is a diagram showing a modification of the torque pattern used in the speech recognition apparatus according to the third embodiment.
FIG. 11 is a diagram showing a torque pattern used in the speech recognition apparatus according to the fourth embodiment and a command for re-inputting speech.
FIG. 12 is a diagram showing torque patterns and commands for manual input used in the speech recognition apparatus according to the fifth embodiment.
FIG. 13 is a flowchart showing a control procedure performed by the signal processing device of the speech recognition device according to the fourth embodiment.
FIG. 14 is a diagram showing a screen for manual input.
FIG. 15 is a diagram when the “NA” line is selected on the manual input screen.
[Explanation of symbols]
101 ... Microphone, 102 ... Speaker, 103 ... Signal processing unit, 1031 ... A/D converter, 1032 ... D/A converter, 1033 ... Output amplifier, 1034 ... Signal processing device, 1034a ... CPU, 1034b ... Memory, 1035 ... External storage device, 104 ... Input device, 104a ... Wheel, 104b ... Switch, 104c ... Wheel drive motor, 104d ... Wheel control CPU, 104e ... Wheel position sensor, 104f ... Communication device, 500 ... Display

Claims

A speech recognition apparatus comprising:
a voice input device for inputting voice;
a storage device for storing recognition target words for the input voice;
a control device that calculates a matching degree indicating the degree to which the voice input to the voice input device matches each recognition target word stored in the storage device, and that extracts a predetermined number of recognition target words in descending order of matching degree as recognition candidates;
an operating device that allows an operator to perform at least an operation of selecting a desired recognition candidate from among the recognition candidates; and
an operation feeling changing device that changes the operation feeling experienced when the operator operates the operating device according to the magnitude of the matching degree of the recognition candidates.
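The candidate extraction performed by the control device can be illustrated with a minimal sketch. It assumes the matching degrees have already been computed by some recognizer, which is not modeled here; the function name `extract_candidates`, the example words, and the numeric values are all hypothetical and do not come from the specification.

```python
from typing import Dict, List, Tuple


def extract_candidates(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n recognition target words with the largest matching degree.

    `scores` maps each stored recognition target word to its matching degree.
    How that degree is computed is outside the scope of this sketch.
    """
    # Sort by matching degree, largest first, and keep the first n entries.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical matching degrees for a single utterance.
example_scores = {
    "Yokohama": 0.91,
    "Yokosuka": 0.78,
    "Yotsukaido": 0.40,
    "Kamakura": 0.12,
}
candidates = extract_candidates(example_scores, n=3)
```

The resulting list is ordered from the best match downward, which is the order in which the operator would step through the candidates.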

The speech recognition apparatus according to claim 1,
wherein the operation feeling changing device generates a force that assists the operation by which the operator selects a recognition candidate using the operating device, and increases the generated assist force as the matching degree of the recognition candidate increases.
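One way to picture the relationship between matching degree and assist force is the sketch below. The claim only requires that the assist force grow with the matching degree; the linear mapping, the assumption that matching degrees lie in the range 0 to 1, and the name `assist_torque` are illustrative choices, not details taken from the specification.

```python
def assist_torque(matching_degree: float, max_torque: float = 1.0) -> float:
    """Map a matching degree to an assist torque in arbitrary units.

    Assumption: matching degrees lie in [0, 1]; values outside are clamped.
    The linear relationship is only one possible monotonic mapping.
    """
    clamped = min(max(matching_degree, 0.0), 1.0)
    return max_torque * clamped
```

With this mapping, a candidate with a large matching degree is easier to reach on the wheel than one with a small matching degree.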

The speech recognition device according to claim 2,
wherein the operating device is a rotary input device including a wheel, the operator being able to perform the selection operation by rotating the wheel, and
the operation feeling changing device changes the rotational force of the wheel according to the magnitude of the matching degree of the recognition candidates.

The speech recognition apparatus according to claim 1,
wherein the operation feeling changing device generates a force that resists the operating force with which the operator selects a recognition candidate using the operating device, and applies the force resisting the operating force of the operating device when the operator selects a recognition candidate whose matching degree is smaller than a predetermined value.
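The threshold rule in this claim can be sketched as follows. The function name `threshold_resistance`, the torque value, and the example numbers are hypothetical; only the comparison against a predetermined value comes from the claim.

```python
from typing import List


def threshold_resistance(matching: List[float], threshold: float,
                         resist_torque: float = 0.5) -> List[float]:
    """Resisting torque applied when each candidate is about to be selected.

    A candidate whose matching degree is smaller than `threshold` gets
    `resist_torque`; every other candidate gets no resistance.
    """
    return [resist_torque if m < threshold else 0.0 for m in matching]


# Hypothetical matching degrees, ordered from best to worst candidate.
torques = threshold_resistance([0.91, 0.78, 0.40, 0.12], threshold=0.5)
```

In this example the operator would feel resistance only when moving onto the third and fourth candidates.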

The speech recognition apparatus according to claim 1,
wherein the operation feeling changing device generates a force that hinders the operation by which the operator selects a recognition candidate using the operating device, and, when the difference between the matching degree of a recognition candidate and the matching degree of the next recognition candidate is greater than a predetermined value, applies a force resisting the operating force of the operating device when the operator selects the next recognition candidate.
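The gap rule in this claim compares adjacent candidates rather than testing each candidate against a fixed threshold. A minimal sketch, assuming the candidates are already sorted in descending order of matching degree; the name `gap_resistance` and all numeric values are hypothetical.

```python
from typing import List


def gap_resistance(matching: List[float], max_gap: float,
                   resist_torque: float = 0.5) -> List[float]:
    """Resisting torque for each move from candidate i to candidate i + 1.

    Resistance is applied when the matching degree drops by more than
    `max_gap` between the current candidate and the next one.
    """
    return [
        resist_torque if (current - following) > max_gap else 0.0
        for current, following in zip(matching, matching[1:])
    ]


# Hypothetical matching degrees, ordered from best to worst candidate.
transitions = gap_resistance([0.91, 0.78, 0.40, 0.12], max_gap=0.3)
```

Here only the move from the second to the third candidate is resisted, because that is the only step where the matching degree falls by more than 0.3.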

The speech recognition device according to claim 4 or 5,
wherein voice input processing is performed again when the operator continues to operate the operating device against the resisting force applied to the operating device by the operation feeling changing device.

The speech recognition device according to claim 4 or 5,
wherein an input function other than voice input is activated when the operator continues to operate the operating device against the force applied to the operating device by the operation feeling changing device.

The speech recognition apparatus according to claim 7,
wherein the input function other than voice input is a character input function.
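The behavior when the operator turns through the resisting force can be summarized as a small decision function. This is a sketch under stated assumptions: the `Action` names and the `on_continued_rotation` function are hypothetical, and how the apparatus detects that the operator has overcome the resistance is not modeled.

```python
from enum import Enum


class Action(Enum):
    SELECT_NEXT = "select_next"          # ordinary move to the next candidate
    REINPUT_VOICE = "reinput_voice"      # voice input processing is performed again
    CHARACTER_INPUT = "character_input"  # manual character input is started


def on_continued_rotation(resistance_applied: bool, fallback: Action) -> Action:
    """Decide what a continued rotation of the wheel triggers.

    If no resisting force was being applied, the rotation simply selects the
    next candidate. If the operator turned through a resisting force, the
    configured fallback (voice re-input or character input) is triggered.
    """
    return fallback if resistance_applied else Action.SELECT_NEXT
```

For example, `on_continued_rotation(True, Action.CHARACTER_INPUT)` would correspond to the case in which an input function other than voice input is started.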

The speech recognition apparatus according to any one of claims 4 to 8,
wherein the operating device is a rotary input device including a wheel, the operator being able to perform the selection operation by rotating the wheel, and
the operation feeling changing device generates a force that resists the rotational force of the wheel.

The speech recognition apparatus according to claim 1,
wherein the operating device is a rotary input device including a wheel, the operator being able to perform the selection operation by rotating the wheel,
the operation feeling changing device generates, as a rotational force of the wheel, a force that assists the operator in selecting the next recognition candidate from a state in which a recognition candidate is selected, and
the rotation operation amount of the wheel required when the operator selects a recognition candidate whose matching degree is smaller than a predetermined value is made larger than the rotation operation amount of the wheel required when the operator selects a recognition candidate whose matching degree is larger than the predetermined value.
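The effect of the differing rotation amounts can be illustrated by computing the wheel angle at which each candidate becomes selected. The step angles, the assumption that the first candidate is selected at angle 0, and the name `selection_angles` are hypothetical. The claim does not say how a matching degree exactly equal to the predetermined value is treated; the sketch gives it the normal step.

```python
from itertools import accumulate
from typing import List


def selection_angles(matching: List[float], threshold: float,
                     normal_step: float = 30.0,
                     extended_step: float = 60.0) -> List[float]:
    """Cumulative wheel angle (in degrees) at which each candidate is selected.

    The first candidate is assumed to be selected at angle 0. Reaching a later
    candidate requires `extended_step` if its matching degree is smaller than
    `threshold`, and `normal_step` otherwise.
    """
    if not matching:
        return []
    steps = [extended_step if m < threshold else normal_step for m in matching[1:]]
    return list(accumulate(steps, initial=0.0))


# Hypothetical matching degrees, ordered from best to worst candidate.
angles = selection_angles([0.91, 0.78, 0.40, 0.12], threshold=0.5)
```

Candidates with a small matching degree sit further apart on the wheel, so the operator must turn further to reach them.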