JP6996944B2 - Speech recognition system

Info

Publication number: JP6996944B2
Application number: JP2017214359A
Authority: JP
Inventors: 昭寛四家
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2017-11-07
Filing date: 2017-11-07
Publication date: 2022-01-17
Anticipated expiration: 2037-11-07
Also published as: JP2019086643A

Description

The present invention relates to voice recognition technology for recognizing a user's spoken utterances.

One known voice recognition technique recognizes the user's speech continuously, without requiring the user to perform an operation that instructs the start of voice recognition (see, for example, Patent Document 1).

In another known technique, a terminal recognizes the user's speech with its own built-in voice recognition device and also with an external voice recognition server, and then uses the recognition result obtained from one of the two (see, for example, Patent Documents 2 and 3).

Patent Document 1: International Publication No. 2014/068788
Patent Document 2: JP-A-2015-102795
Patent Document 3: JP-A-2013-064777

Suppose a terminal uses recognition by its own voice recognition device and recognition via an external voice recognition server to input commands to different functions. Running both recognitions in parallel is then inappropriate, because a command the user speaks may be entered as a command for a function other than the one the user intended.

It is therefore desirable to switch selectively between recognition by the terminal's own voice recognition device and recognition via the external voice recognition server.

However, if recognition by the terminal's own voice recognition device is performed at all times, without requiring a start-instruction operation from the user, then switching selectively between it and recognition via the external voice recognition server raises the following problem.

Specifically, one could provide a first mode, in which only the terminal's own voice recognition device operates, continuously and without any start-instruction operation, and a second mode, in which only the external voice recognition server is used. Recognition would normally run in the first mode, switch temporarily to the second mode in response to a user operation, and return to the first mode when that recognition ends. With this arrangement, however, if an urgent command must be entered during the second mode into a function that takes its commands from the terminal's own voice recognition device, the command cannot be entered: the first mode has no start-instruction operation, so there is no way to force a switch from the second mode back to the first.

Conversely, if the terminal's own voice recognition device keeps recognizing during the second mode, a command the user speaks may, as described above, be entered as a command for a function other than the intended one.

Accordingly, an object of the present invention is, in a voice recognition system that uses a first voice recognition means, which performs recognition at all times without requiring a start-instruction operation from the user, and a second voice recognition means for input to different functions, to allow urgent input via the first voice recognition means at all times while suppressing erroneous input to each function.

To achieve this object, the present invention provides a voice recognition system that recognizes speech uttered by a user, comprising: a microphone; a first voice recognition means that continuously recognizes commands represented by the sound picked up by the microphone; a first functional unit that executes the processing an input command instructs; a second voice recognition means that performs a voice recognition operation on the sound picked up by the microphone; a second functional unit that processes the recognition result of the second voice recognition means; a priority command storage means in which some of the commands recognized by the first voice recognition means are registered as priority commands; and a voice recognition operation control means. The voice recognition operation control means controls switching between a first voice recognition mode and a second voice recognition mode. In the first voice recognition mode, it stops the voice recognition operation of the second voice recognition means and inputs the commands recognized by the first voice recognition means to the first functional unit. In the second voice recognition mode, it causes the second voice recognition means to perform the voice recognition operation, and it inputs a command recognized by the first voice recognition means to the first functional unit only if that command is a priority command registered in the priority command storage means.

In such a voice recognition system, the voice recognition operation control means may be configured to switch to the second voice recognition mode when a predetermined user input occurs in the first voice recognition mode, and to switch back to the first voice recognition mode when the voice recognition operation of the second voice recognition means completes in the second voice recognition mode.

The second voice recognition means may perform, as the voice recognition operation, recognition of the sound picked up by the microphone by connecting over a communication link to an external voice recognition server that provides a voice recognition service and using that service.

The voice recognition system may be one used for voice input in a system mounted on an automobile.
The voice recognition system may also consist of an in-vehicle system mounted on an automobile and a portable device selectively connected to the in-vehicle system. In that case, the in-vehicle system includes the microphone, the first voice recognition means, the first functional unit, the priority command storage means, and the voice recognition operation control means, and the portable device includes the second voice recognition means and the second functional unit. In the second voice recognition mode, the voice recognition operation control means transfers the sound picked up by the microphone from the in-vehicle system to the portable device, and the second voice recognition means recognizes the transferred sound.

In each of the above voice recognition systems, the priority command storage means may have registered as priority commands at least those commands, among the commands recognized by the first voice recognition means, that instruct processing related to ensuring the safety of the automobile.

In each of the above voice recognition systems, the priority command storage means may also have registered as priority commands at least those commands, among the commands recognized by the first voice recognition means, for which the second functional unit defines, at that point in time, no meaningful processing of the recognition result of the speech representing the command.

With this voice recognition system, recognition by the second voice recognition means is stopped while commands are being input to the first functional unit via the first voice recognition means in the first voice recognition mode. Conversely, while voice input to the second functional unit via the second voice recognition means is in progress in the second voice recognition mode, command input to the first functional unit via the continuously running first voice recognition means is, as a rule, suspended.

This suppresses two kinds of error: speech the user utters to enter a command into the first functional unit being input to the second functional unit, and speech the user utters for the second functional unit being misrecognized as a command and entered into the first functional unit.

At the same time, even during voice input to the second functional unit via the second voice recognition means, if the first voice recognition means recognizes a priority command registered in the priority command storage means, that command is input to the first functional unit.

Therefore, by registering commands that instruct urgent processing, such as processing related to ensuring the safety of the automobile, as priority commands in the priority command storage means, such commands can be input to the first functional unit via the first voice recognition means at all times.

Likewise, commands of the first functional unit for which the second functional unit defines, at that point in time, no meaningful processing of the recognition result, that is, commands whose speech does no harm even if it is also input to the second functional unit, can be registered as priority commands in the priority command storage means. These commands can then also be input to the first functional unit via the first voice recognition means at all times.

As described above, the present invention makes it possible, in a voice recognition system that uses a first voice recognition means, which performs recognition at all times without requiring a start-instruction operation from the user, and a second voice recognition means for input to different functions, to accept urgent input via the first voice recognition means at all times while suppressing erroneous input to each function.

FIG. 1 is a block diagram showing the configuration of an information processing system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the external voice recognition control process according to the embodiment.
FIG. 3 is a flowchart showing the voice recognition result filter process according to the embodiment.
FIG. 4 is a sequence diagram showing an operation example of the information processing system according to the embodiment.

An embodiment of the present invention is described below, taking as an example its application to an information processing system used in an automobile.
FIG. 1 shows the configuration of the information processing system.
As shown, the information processing system includes an in-vehicle system 1 mounted on an automobile and a portable device 2 selectively connected to the in-vehicle system 1.
The portable device 2 is a device the user can carry, such as a smartphone or tablet. The portable device 2 connects to an external voice recognition server 3 via mobile communication, uses the voice recognition service of the voice recognition server 3 to recognize audio transferred from the in-vehicle system 1, accepts the recognition result as voice input to the portable device 2, and operates according to that input.

The in-vehicle system 1 includes a microphone 101, a voice recognition device 102, a voice recognition dictionary 103, a voice recognition result filter unit 104, a priority word table 105, an external voice recognition control unit 106, a talk switch 107, a communication interface 108 for communicating with the portable device 2, a data processing device 109, and various peripheral devices 110 such as a display, cameras that capture the vehicle's surroundings, AV equipment, and an air conditioner.

The voice recognition dictionary 103 holds voice recognition data for the words that represent commands of the data processing device 109. Using the voice recognition dictionary 103, the voice recognition device 102 recognizes a command whenever the user's speech input from the microphone 101 represents a command of the data processing device 109, and outputs it as a voice recognition result to the voice recognition result filter unit 104. It does this at all times, without being triggered by a start-instruction operation from the user.

The priority word table 105 holds two kinds of commands, both drawn from the commands represented by the words whose recognition data is registered in the voice recognition dictionary 103: commands that must be processed urgently, and commands that never appear in voice input to the portable device 2. Using the priority word table 105, the voice recognition result filter unit 104 performs the voice recognition result filter process described in detail later, and outputs to the data processing device 109 only those recognition results from the voice recognition device 102 that satisfy a predetermined condition.

A command that must be processed urgently is, for example, one that instructs the data processing device 109 to execute processing related to ensuring the safety of the automobile. One example is the command "back camera", which instructs the data processing device 109 to show on the display the image captured by the camera facing the rear of the automobile.

As for commands that never appear in voice input to the portable device 2: if, at that point in time, the portable device 2 is using voice input to enter commands to itself, these can be the commands, among those represented by the words registered in the voice recognition dictionary 103, that the portable device 2 does not support.

Commands registered in the voice recognition dictionary 103 that, by common sense, are very unlikely to appear in voice input to the portable device 2 may also be treated as commands that never appear in such input.

An example of such a command is "air volume up", which instructs the data processing device 109 to increase the airflow of the air conditioner.
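The composition of the priority word table 105 described above can be sketched as a set computation. This is only an illustration under stated assumptions: the function name `build_priority_table`, the commands "play music" and "call home", and the list of commands supported by the portable device are hypothetical; only "back camera" and "air volume up" come from the description.

```python
def build_priority_table(dictionary_commands, urgent_commands, portable_commands):
    """Return the commands that remain accepted in external recognition mode.

    The table is the union of (a) urgent commands and (b) dictionary commands
    that the portable device does not support, both restricted to commands
    registered in the recognition dictionary.
    """
    dictionary = set(dictionary_commands)
    urgent = set(urgent_commands) & dictionary
    never_on_portable = dictionary - set(portable_commands)
    return urgent | never_on_portable


# Hypothetical dictionary contents; only the first two appear in the description.
DICTIONARY = {"back camera", "air volume up", "play music", "call home"}
URGENT = {"back camera"}
# Hypothetical set of commands the portable device itself accepts by voice.
PORTABLE = {"play music", "call home"}

priority_table = build_priority_table(DICTIONARY, URGENT, PORTABLE)
```

With these assumed inputs the table contains "back camera" (urgent) and "air volume up" (not supported by the portable device), while "play music" and "call home" are left out because the portable device might legitimately receive them.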

The data processing device 109 performs processing according to the command output from the voice recognition result filter unit 104.
The external voice recognition control unit 106 performs the external voice recognition control process described in detail later: it transfers the user's speech input from the microphone 101 to the portable device 2 via the communication interface 108 and has the portable device 2 recognize it using the voice recognition service of the voice recognition server 3 described above.

The external voice recognition control process performed by the external voice recognition control unit 106 is described next.
FIG. 2 shows the procedure of the external voice recognition control process.
As shown, in this process the external voice recognition control unit 106 monitors for the user turning on the talk switch 107 (step 202) and, when it is turned on, sets the external voice recognition mode (step 204).

It then issues a voice recognition start command to the portable device 2 via the communication interface 108 (step 206) and starts transferring the user's speech input from the microphone 101 to the portable device 2 via the communication interface 108 (step 208).

On receiving the voice recognition start command from the in-vehicle system 1, the portable device 2 starts a voice recognition process: using the voice recognition service of the voice recognition server 3, it recognizes the run of audio transferred from the in-vehicle system 1 that ends at the start of a silent interval of at least a predetermined length.
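The end-of-utterance rule above (the utterance ends where a sufficiently long silent interval begins) can be illustrated with a small sketch. The description does not say how silence is detected; the per-frame energy values, the threshold, the rule that leading silence before any speech is ignored, and the name `find_utterance_end` are all assumptions made for illustration.

```python
def find_utterance_end(frame_energies, silence_threshold, min_silent_frames):
    """Return the index of the first frame of the first silent run that lasts
    at least min_silent_frames and follows some speech, or None if there is none.

    A frame counts as silent when its energy is below silence_threshold
    (an assumed detection criterion, not taken from the description).
    """
    heard_speech = False
    run_start = None  # index where the current silent run began
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True
            run_start = None
        elif heard_speech:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_silent_frames:
                return run_start
    return None


# Hypothetical per-frame energies: a short pause at index 3 is too brief to
# end the utterance; the three-frame silence starting at index 5 ends it.
energies = [0.0, 0.9, 0.8, 0.1, 0.7, 0.0, 0.0, 0.0, 0.6]
end_index = find_utterance_end(energies, silence_threshold=0.2, min_silent_frames=3)
```

Returning the start of the silent run, rather than the frame at which the run becomes long enough, mirrors the description's wording that the end point is the start point of the silent interval.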

Once the transfer of the user's speech to the portable device 2 via the communication interface 108 has started (step 208), the external voice recognition control unit 106 monitors for two events: receipt of an external voice recognition stop command from the voice recognition result filter unit 104 (step 210), and receipt of a voice recognition end notification from the portable device 2 via the communication interface 108 (step 212).

When the voice recognition process described above finishes, the portable device 2 outputs a voice recognition end notification to the in-vehicle system 1.
When either event occurs, that is, receipt of an external voice recognition stop command from the voice recognition result filter unit 104 (step 210) or receipt of a voice recognition end notification from the portable device 2 (step 212), the external voice recognition control unit 106 stops transferring the user's speech input from the microphone 101 to the portable device 2 via the communication interface 108 (step 214) and cancels the external voice recognition mode (step 216).
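The control flow of FIG. 2, steps 202 through 216 including the return to waiting for the talk switch, can be sketched as a small event-driven state machine. This is a minimal illustration, not the patented implementation: the names `Event`, `ExternalRecognitionController`, `send_to_portable`, and the `"START_RECOGNITION"` message are hypothetical stand-ins for the talk switch 107, the external voice recognition control unit 106, and the communication interface 108.

```python
from enum import Enum, auto


class Event(Enum):
    TALK_SWITCH_ON = auto()     # user turns on the talk switch (step 202)
    STOP_COMMAND = auto()       # stop command from the filter unit (step 210)
    RECOGNITION_ENDED = auto()  # end notification from the portable device (step 212)


class ExternalRecognitionController:
    """Hypothetical model of the external voice recognition control unit."""

    def __init__(self, send_to_portable):
        self.external_mode = False     # external voice recognition mode flag
        self.forwarding = False        # whether microphone audio is transferred
        self._send = send_to_portable  # stand-in for the communication interface

    def handle(self, event):
        if not self.external_mode:
            if event is Event.TALK_SWITCH_ON:
                self.external_mode = True        # step 204
                self._send("START_RECOGNITION")  # step 206
                self.forwarding = True           # step 208
        elif event in (Event.STOP_COMMAND, Event.RECOGNITION_ENDED):
            self.forwarding = False              # step 214
            self.external_mode = False           # step 216, then wait again (step 202)

    def on_audio(self, chunk):
        # Microphone audio is forwarded only while the transfer is active.
        if self.forwarding:
            self._send(chunk)


sent = []
controller = ExternalRecognitionController(sent.append)
controller.on_audio(b"ignored")           # not forwarded: normal mode
controller.handle(Event.TALK_SWITCH_ON)
controller.on_audio(b"hello")             # forwarded to the portable device
controller.handle(Event.STOP_COMMAND)
controller.on_audio(b"ignored again")     # not forwarded: mode cancelled
```

Treating the stop command and the end notification as equivalent exit events reflects the description, where either one leads to steps 214 and 216.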

The process then returns to step 202.
This concludes the description of the external voice recognition control process performed by the external voice recognition control unit 106.
Next, the voice recognition result filter process performed by the voice recognition result filter unit 104 is described.
FIG. 3 shows the procedure of the voice recognition result filter process.
As shown, in this process the voice recognition result filter unit 104 waits for a voice recognition result from the voice recognition device 102 (step 302) and, when one arrives, checks whether the external voice recognition control unit 106 has set the external voice recognition mode (step 304).

If the external voice recognition mode is not set (step 304), the filter unit outputs the received voice recognition result to the data processing device 109 (step 310) and returns to step 302.

If the external voice recognition mode is set (step 304), the filter unit checks whether the received voice recognition result is a command registered in the priority word table 105 (step 306).

If the received voice recognition result is not a command registered in the priority word table 105 (step 306), the filter unit discards it and returns directly to step 302.

If the received voice recognition result is a command registered in the priority word table 105 (step 306), the filter unit sends an external voice recognition stop command to the external voice recognition control unit 106 (step 308), outputs the received voice recognition result to the data processing device 109 (step 310), and returns to step 302.
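The decision logic of FIG. 3 (steps 304 through 310) for a single recognition result can be sketched as one function. The function name `filter_result` and its callback parameters `stop_external` and `dispatch` are hypothetical; they stand in for the stop command sent to the external voice recognition control unit 106 and the output to the data processing device 109.

```python
def filter_result(command, external_mode, priority_table, stop_external, dispatch):
    """Handle one recognition result; return True if it was forwarded."""
    if not external_mode:              # step 304: normal mode
        dispatch(command)              # step 310
        return True
    if command not in priority_table:  # step 306: not a priority command
        return False                   # result is discarded
    stop_external()                    # step 308: stop external recognition
    dispatch(command)                  # step 310
    return True


executed = []  # commands that reached the data processing device
stops = []     # records each stop command issued

priority_table = {"back camera", "air volume up"}


def stop_external():
    stops.append("STOP")


# Normal mode: every recognized command is forwarded ("play music" is a
# hypothetical non-priority command).
filter_result("play music", False, priority_table, stop_external, executed.append)
# External mode: a non-priority command is dropped...
filter_result("play music", True, priority_table, stop_external, executed.append)
# ...while a priority command stops external recognition and is forwarded.
filter_result("back camera", True, priority_table, stop_external, executed.append)
```

Issuing the stop command before dispatching follows the order of steps 308 and 310 in the description.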

This concludes the description of the voice recognition result filter process performed by the voice recognition result filter unit 104.
FIG. 4 shows an example of voice recognition operation under the external voice recognition control process and the voice recognition result filter process described above.
As shown, in normal operation the user's speech input from the microphone 101 is sent to the voice recognition device 102 (401), where it is recognized, and the voice recognition result is sent to the voice recognition result filter unit 104 (402). The voice recognition result filter unit 104 then outputs the received result to the data processing device 109 (403).

When the user turns on the talk switch 107 in order to give voice input to the portable device 2 (411), the external voice recognition control unit 106 sets the external voice recognition mode (412).

If the user then utters a word that does not represent a command registered in the priority word table 105, that speech (unregistered-word speech) is sent from the microphone 101 to both the voice recognition device 102 and the external voice recognition control unit 106 (413).

The voice recognition device 102 recognizes the received speech and sends the voice recognition result to the voice recognition result filter unit 104 (414). Because the received result is not a command registered in the priority word table 105, the voice recognition result filter unit 104 discards it without outputting it to the data processing device 109.

Meanwhile, the external voice recognition control unit 106 transfers the received speech to the portable device 2 (415), and the portable device 2 sends the transferred speech to the voice recognition server 3 (416).
Suppose that afterwards, before the portable device 2 has received from the voice recognition server 3 the recognition result for the speech it sent (416), the user utters a word that represents a command registered in the priority word table 105. That speech (registered-word speech) is sent from the microphone 101 to both the voice recognition device 102 and the external voice recognition control unit 106 (417).

The voice recognition device 102 performs voice recognition on the received voice and sends the voice recognition result to the voice recognition result filter unit 104 (418). Because the received voice recognition result is a command registered in the priority word table 105, the voice recognition result filter unit 104 issues an external voice recognition stop command to the external voice recognition control unit 106 (419) and outputs the voice recognition result to the data processing device 109 (420).
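The two filter outcomes described above (discarding at 414, and forwarding together with a stop command at 419 and 420) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Mode`, `FilterDecision`, `filter_result`, and the example table contents are hypothetical and were introduced only for this sketch. The behavior in the normal mode (every recognized command is forwarded) follows the summary of the embodiment.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Mode(Enum):
    NORMAL = auto()    # only the on-board recognizer (102) supplies commands
    EXTERNAL = auto()  # external voice recognition mode (set at 412)


@dataclass(frozen=True)
class FilterDecision:
    forward_to_processor: bool  # output the result to the data processing device (109)
    issue_stop_command: bool    # send an external voice recognition stop command (419)


def filter_result(word: str, mode: Mode, priority_words: frozenset) -> FilterDecision:
    """Decide what the result filter (104) does with one recognized word."""
    if mode is Mode.NORMAL:
        # Normal operation: every recognized command is forwarded.
        return FilterDecision(True, False)
    if word in priority_words:
        # External mode, registered word: forward it and stop external recognition.
        return FilterDecision(True, True)
    # External mode, unregistered word: discard silently.
    return FilterDecision(False, False)


# Hypothetical table contents, for illustration only.
PRIORITY_WORDS = frozenset({"stop", "defrost"})
```

Keeping the decision a pure function of the recognized word, the current mode, and the table makes the three cases easy to check in isolation.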

Meanwhile, the external voice recognition control unit 106 transfers the received voice to the portable device 2 (421). Here, the portable device 2 is assumed to be configured to ignore transferred voice while it is waiting for a voice recognition result from the voice recognition server 3, so the voice transferred to the portable device 2 (421) is discarded without being transmitted to the voice recognition server 3.
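The assumed behavior of the portable device 2 (dropping transferred voice while a server result is pending) can be modeled with a single flag. The class and method names below are hypothetical, and audio is represented as a plain string purely for illustration.

```python
class PortableDevice:
    """Toy model of the portable device (2); all names are illustrative."""

    def __init__(self):
        self.awaiting_result = False  # True between steps 416 and 423
        self.sent_to_server = []      # audio that was actually transmitted

    def on_transferred_audio(self, audio: str) -> bool:
        """Return True if the audio was sent to the server, False if dropped."""
        if self.awaiting_result:
            # Step 421: a result is still pending, so the audio is ignored.
            return False
        # Step 416: transmit the audio and start waiting for the result.
        self.sent_to_server.append(audio)
        self.awaiting_result = True
        return True

    def on_server_result(self, result: str) -> str:
        """Step 423: the server responded, so the device accepts audio again."""
        self.awaiting_result = False
        return result
```

With this model, the registered-word voice transferred at 421 never reaches the server, which matches the assumption stated in the text.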

Further, upon receiving the external voice recognition stop command (419), the external voice recognition control unit 106 cancels the external voice recognition mode (422).

Meanwhile, when the voice recognition server 3 returns to the portable device 2 the voice recognition result for the voice that the portable device 2 transmitted (416) (423), the portable device 2 processes that voice recognition result.

Then the voice recognition processing of the portable device 2 ends, and the portable device 2 notifies the external voice recognition control unit 106 that voice recognition has ended (424).

The above is an example of the voice recognition operation achieved by the external voice recognition control processing and the voice recognition result filter processing.

Unlike the example shown in FIG. 4, the user may utter a word representing a command registered in the priority word table 105 immediately after turning on the talk switch 107, or after turning on the talk switch 107 and uttering part of a word that does not represent a registered command. In these cases too, the result of the voice recognition device 102 recognizing that word is output to the data processing device 109 via the voice recognition result filter unit 104.

In these cases, the uttered voice of the word representing the command registered in the priority word table 105 may be transferred from the external voice recognition control unit 106 to the portable device 2, and the portable device 2 may then perform voice recognition on that voice using the voice recognition server 3, which is an inconvenience. However, if the voice uttered by the user represents a command that needs to be processed urgently, inputting that command to the data processing device 109 should take priority over the inconvenience. And if the uttered voice represents a command that never appears in voice input to the portable device 2, the portable device 2 will not perform an unintended operation based on the voice recognition result.
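The mode handling of the external voice recognition control unit 106 across steps 411 to 424 can be sketched as a small event-driven class. This is a sketch under assumptions: the class name, method names, and the `transfer` callback are hypothetical, and the sketch assumes that voice is transferred only while the external voice recognition mode is set.

```python
class ExternalRecognitionController:
    """Toy model of the external voice recognition control unit (106)."""

    def __init__(self, transfer):
        self._transfer = transfer   # callable that sends audio to the portable device
        self.external_mode = False  # external voice recognition mode flag

    def on_talk_switch(self) -> None:
        """Steps 411 and 412: the talk switch sets the external mode."""
        self.external_mode = True

    def on_audio(self, audio: str) -> bool:
        """Steps 415 and 421: transfer audio only while the external mode is set."""
        if not self.external_mode:
            return False
        self._transfer(audio)
        return True

    def on_stop_command(self) -> None:
        """Steps 419 and 422: a stop command from the filter cancels the mode."""
        self.external_mode = False

    def on_recognition_end(self) -> None:
        """Step 424: the portable device reports that recognition has ended."""
        self.external_mode = False
```

In the FIG. 4 sequence the registered-word voice is transferred (421) before the mode is cancelled (422). In this sketch that ordering is determined by the order in which the caller invokes `on_audio` and `on_stop_command`.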

The embodiment of the present invention has been described above.

According to the present embodiment, normally, with voice input of the user's uttered voice by the portable device 2 using the voice recognition server 3 stopped, the voice recognition device 102 constantly recognizes the user's uttered voice, and the commands obtained as voice recognition results are input to the data processing device 109. When the user turns on the talk switch 107 to perform voice input to the portable device 2, voice input of the user's uttered voice by the portable device 2 using the voice recognition server 3 starts.

During the period in which voice input of the user's uttered voice by the portable device 2 using the voice recognition server 3 is being performed, input to the data processing device 109 of commands obtained as voice recognition results of the voice recognition device 102 is, as a rule, stopped.

This suppresses two kinds of error: voice that the user uttered to input a command to the data processing device 109 being taken as voice input to the portable device 2, and voice that the user uttered as voice input to the portable device 2 being misrecognized as a command for the data processing device 109 and erroneously input to it.

On the other hand, even during the period in which voice input of the user's uttered voice by the portable device 2 using the voice recognition server 3 is being performed, a command recognized by the voice recognition device 102 is input to the data processing device 109 only when the user utters a word representing a command registered in the priority word table 105, that is, a command requesting execution of urgent processing or a command that never appears in voice input to the portable device 2.

Therefore, according to the present embodiment, commands that request the data processing device 109 to execute urgent processing can be input by voice at all times, while erroneous command input to the data processing device 109 and erroneous voice input to the portable device 2 are suppressed. Likewise, commands that never appear in voice input to the portable device 2 can be input by voice to the data processing device 109 at all times.

In the above embodiment, turning on the talk switch 107 is used as the trigger that causes the portable device 2 to perform voice recognition, but the trigger may be something else. For example, the trigger may be voice input of a command instructing the start of voice input to the portable device 2. In this case, the occurrence of that voice input is detected by the voice recognition device 102 recognizing, in the user's uttered voice, the word representing the command.
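The voice-triggered variant can be sketched as a check on each result from the voice recognition device 102. The function name and the start phrase below are hypothetical placeholders; the text does not specify the actual start command.

```python
# Hypothetical start phrase, for illustration only.
START_WORDS = frozenset({"start voice input"})


def update_external_mode(external_mode: bool, recognized_word: str,
                         start_words: frozenset = START_WORDS) -> bool:
    """Return the external-mode flag after one on-board recognition result.

    A recognized start word sets the mode, replacing the talk switch as
    the trigger. Any other word leaves the mode unchanged.
    """
    if not external_mode and recognized_word in start_words:
        return True
    return external_mode
```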

Further, the above embodiment can be applied in the same way when, in place of the portable device 2, the system includes a device other than the portable device 2 that recognizes uttered voice input from the microphone 101 using the voice recognition server 3, or any device that recognizes uttered voice input from the microphone 101 using its own voice recognition function without using the voice recognition server 3. In such cases, the portable device 2 is simply replaced with that device.

1 ... in-vehicle system, 2 ... portable device, 3 ... voice recognition server, 101 ... microphone, 102 ... voice recognition device, 103 ... voice recognition dictionary, 104 ... voice recognition result filter unit, 105 ... priority word table, 106 ... external voice recognition control unit, 107 ... talk switch, 108 ... communication interface, 109 ... data processing device, 110 ... peripheral device.

Claims

A voice recognition system that recognizes voice uttered by a user, comprising:
a microphone;
a first voice recognition means for constantly recognizing commands represented by voice picked up by the microphone;
a first functional unit that executes the processing whose execution is commanded by an input command;
a second voice recognition means that performs a voice recognition operation of recognizing the voice picked up by the microphone;
a second functional unit that processes the voice recognition result of the second voice recognition means;
a priority command storage means in which some of the commands recognized by the first voice recognition means are registered as priority commands; and
a voice recognition operation control means,
wherein the voice recognition operation control means
controls switching between a first voice recognition mode and a second voice recognition mode provided as voice recognition modes,
when the voice recognition mode is the first voice recognition mode, stops the voice recognition operation of the second voice recognition means and inputs the commands recognized by the first voice recognition means to the first functional unit, and
when the voice recognition mode is the second voice recognition mode, causes the second voice recognition means to execute the voice recognition operation and inputs a command recognized by the first voice recognition means to the first functional unit only when that command is registered as a priority command in the priority command storage means.

The voice recognition system according to claim 1,
wherein the voice recognition operation control means switches the voice recognition mode to the second voice recognition mode when a predetermined input from the user occurs in the first voice recognition mode, and switches the voice recognition mode to the first voice recognition mode when the voice recognition operation of the second voice recognition means is completed in the second voice recognition mode.

The voice recognition system according to claim 1 or 2,
wherein, as the voice recognition operation, the second voice recognition means connects via communication to an external voice recognition server that provides a voice recognition service, and uses the voice recognition service of the connected voice recognition server to recognize the voice picked up by the microphone.

The voice recognition system according to claim 1, 2 or 3,
wherein the voice recognition system is used for voice input in a system mounted on an automobile.

The voice recognition system according to claim 1, 2 or 3,
comprising an in-vehicle system mounted on an automobile and a portable device selectively connected to the in-vehicle system,
wherein the in-vehicle system includes the microphone, the first voice recognition means, the first functional unit, the priority command storage means, and the voice recognition operation control means, and the portable device includes the second voice recognition means and the second functional unit,
the voice recognition operation control means transfers the voice picked up by the microphone from the in-vehicle system to the portable device when the voice recognition mode is the second voice recognition mode, and
the second voice recognition means performs, as the voice recognition operation, voice recognition of the voice transferred from the in-vehicle system to the portable device.

The voice recognition system according to claim 4 or 5,
wherein at least those commands, among the commands recognized by the first voice recognition means, whose commanded processing relates to ensuring the safety of the automobile are registered as priority commands in the priority command storage means.

The voice recognition system according to claim 1, 2, 3, 4, 5 or 6,
wherein at least those commands, among the commands recognized by the first voice recognition means, for which no significant processing is defined in the second functional unit for the voice recognition result obtained by recognizing voice representing the command are registered as priority commands in the priority command storage means.