JPS6038745B2

JPS6038745B2 - Voice information input device

Info

Publication number: JPS6038745B2
Application number: JP58141410A
Authority: JP
Inventors: 富男田所; 真卿各務
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1983-08-02
Filing date: 1983-08-02
Publication date: 1985-09-03
Also published as: JPS6031638A

Description

[Detailed description of the invention]

[Field of the invention]

The present invention relates to a voice information input device through which a person enters information by speaking.

[Background of the invention]

Conventional voice information input devices display the recognition result of a voice input as characters or symbols on the screen of a display device such as a CRT.

As a result, information could not be entered by voice from a position away from the screen. To remove this inconvenience, a method was devised that answers back the recognition result by voice. That method, however, needs a synthesized-voice memory for answerback holding as many entries as there are possible voice inputs, which makes the device large and expensive.

[Object of the invention]

The object of the present invention is to eliminate these drawbacks of the prior art. It provides a voice information input device that is smaller and less expensive, achieves a high recognition rate, and answers back the recognition result of voice information by voice.

[Summary of the invention]

In the present invention, when a voice input is completely recognized, the input voice itself, which is held temporarily, is answered back. When recognition is incomplete, a synthesized voice stored in advance is answered back instead.
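The rule above reduces to a two-way choice of answerback source. The following is a minimal Python sketch, assuming the audio is already available as byte strings. The names `Recognition` and `select_answerback` are hypothetical and do not come from the patent.

```python
from enum import Enum, auto


class Recognition(Enum):
    COMPLETE = auto()    # input matched a registered voice word
    INCOMPLETE = auto()  # input matched no registered voice word


def select_answerback(result: Recognition, input_voice: bytes, stored_voice: bytes) -> bytes:
    """Pick the audio to answer back according to the recognition result."""
    if result is Recognition.COMPLETE:
        # Complete recognition: replay the talker's own, temporarily stored voice.
        return input_voice
    # Incomplete recognition: replay the voice prepared in advance.
    return stored_voice
```

Only the handful of prepared messages ever needs permanent storage. Every recognizable word is answered back with audio that already exists in the temporary buffer.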

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the configuration of one embodiment of the voice input device according to the present invention.

The voice recognition device 1 consists of the following parts:
- an amplifier 11 that amplifies the voice signal from the voice input microphone 6;
- an A/D converter 12 that converts the voice signal into a digital signal;
- a registered voice memory 14 that stores registered voices in advance;
- a voice recognition control circuit 13 that recognizes speech by comparing the input voice with the registered voices.

The voice output device 2 consists of the following parts:
- an input voice memory 25 that temporarily holds the input voice for voice output;
- a synthesized voice memory 22 that stores voices for voice output;
- a voice output control circuit 21 that, according to the voice recognition result, selects the contents of either the input voice memory 25 or the synthesized voice memory 22 and outputs them;
- a D/A converter 23 that converts the output signal of the voice output control circuit 21 into an analog signal;
- an amplifier 24 that amplifies the analog signal to produce the answerback voice from the speaker (or earphone) 7.
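The A/D converter 12 and the D/A converter 23 bracket the digital part of the device. The patent gives no sample format. The sketch below assumes uniform 8-bit signed PCM with amplitudes normalized to [-1.0, 1.0]. The function names `a_d_convert` and `d_a_convert` are hypothetical.

```python
def a_d_convert(samples: list[float], bits: int = 8) -> list[int]:
    """Uniformly quantize amplitudes in [-1.0, 1.0] to signed integer codes."""
    max_code = 2 ** (bits - 1) - 1
    # Clip to the converter's input range, then scale and round to the nearest code.
    return [round(max(-1.0, min(1.0, x)) * max_code) for x in samples]


def d_a_convert(codes: list[int], bits: int = 8) -> list[float]:
    """Map signed integer codes back to amplitudes in [-1.0, 1.0]."""
    max_code = 2 ** (bits - 1) - 1
    return [c / max_code for c in codes]
```

Under this assumption, a round trip through both converters is accurate to within half a quantization step, that is, 0.5/127 of full scale. Inputs outside the range are clipped.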

The control circuit 3 is a control computer with three roles:
- it controls the voice recognition control circuit 13 of the voice recognition device 1 to fetch voice recognition results;
- it controls the voice output control circuit 21 of the voice output device 2 to output an answerback sound from the speaker 7;
- it shows the control status, voice recognition results and similar information on the display (or printer) 5.

The control circuit 3 can also be operated from a keyboard 4, in addition to voice. A table listing examples of the voice words used with one embodiment of the invention is given below.

In the table, voice words No. 1 to No. 42 (the range Q) are the registered voices for voice recognition. They are registered in the registered voice memory 14 in advance.

To register the voices, the talker reads voice words No. 1 to No. 42 aloud in sequence into the microphone 6. Each voice is stored in the registered voice memory 14 via the amplifier 11, the A/D converter 12 and the voice recognition control circuit 13. The range Q in the table above also describes what the input voice memory 25 holds for voice output, that is, the words answered back by voice when recognition succeeds.

When the talker then speaks any of voice words No. 1 to No. 42 into the microphone 6, the voice recognition control circuit 13 searches the words stored in the registered voice memory 14 for the matching word. It then outputs data indicating that word's number to the control circuit 3.
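The patent does not say how the voice recognition control circuit 13 compares an input with the registered voices. A minimal sketch is shown below. It assumes that each utterance has already been reduced to a fixed-length feature vector. Registration and lookup are then modelled as nearest-template matching with a rejection threshold. The Euclidean distance, the threshold and the class name `RegisteredVoiceMemory` are illustrative assumptions, not the patent's method.

```python
import math
from typing import Optional


class RegisteredVoiceMemory:
    """Hypothetical model of registered voice memory 14: word number -> feature template."""

    def __init__(self) -> None:
        self.templates: dict[int, list[float]] = {}

    def register(self, word_no: int, features: list[float]) -> None:
        # Registration: the talker reads each word (No. 1 to No. 42) aloud once.
        self.templates[word_no] = list(features)

    def recognize(self, features: list[float], threshold: float) -> Optional[int]:
        """Return the number of the nearest registered word, or None if none is near enough."""
        best_no: Optional[int] = None
        best_dist = math.inf
        for word_no, template in self.templates.items():
            dist = math.dist(features, template)  # Euclidean distance; an assumed measure
            if dist < best_dist:
                best_no, best_dist = word_no, dist
        # An input far from every template is rejected instead of forced onto a word.
        return best_no if best_dist <= threshold else None
```

Returning `None` corresponds to an unrecognizable input. Without a threshold, every utterance would be mapped to some registered word, and the synthesized-answerback path would never be taken.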

On receiving the voice word number data, the control circuit 3 takes it in as data and shows it on the display 5. It also issues a command to the voice output control circuit 21 to perform an answerback.
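The control circuit 3's reaction to a word number can be sketched as a small event handler. This is an illustration only. The command strings, the list-based stand-in for the display 5, and the `None` convention for an unrecognizable input are assumptions, and the class name `ControlCircuit` is hypothetical. The unrecognized branch covers the case the description treats for inputs outside the registered words.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ControlCircuit:
    """Hypothetical model of control circuit 3 reacting to recognizer output."""
    captured: list[int] = field(default_factory=list)  # word numbers taken in as data
    display: list[str] = field(default_factory=list)   # messages shown on display 5
    commands: list[str] = field(default_factory=list)  # commands to voice output control circuit 21

    def on_recognition(self, word_no: Optional[int]) -> None:
        if word_no is not None:
            # Recognized: take the number in as data, show it, request an input-voice answerback.
            self.captured.append(word_no)
            self.display.append(f"word No. {word_no}")
            self.commands.append("ANSWERBACK_INPUT_VOICE")
        else:
            # Unrecognizable: report it and request a synthesized answerback instead.
            self.display.append("unrecognizable")
            self.commands.append("ANSWERBACK_SYNTHESIZED")
```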

When the control circuit 3 issues the answerback command, the voice output control circuit 21 takes the talker's input word, held temporarily in the input voice memory 25. It outputs that word as an answerback voice from the speaker 7 via the D/A converter 23 and the amplifier 24.

In other words, suppose the talker speaks a word into the microphone 6 and it is recognized as a word registered in advance in the registered voice memory 14. The very word the talker spoke is then read from the input voice memory 25 and played from the speaker 7. In short, when the input voice is completely recognized, the talker's own voice is answered back from the speaker 7.

If the input voice memory 25 is cleared before the next input voice is accepted, the device needs only a very small memory capacity and can be made inexpensive.
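The input voice memory 25 behaves like a single-utterance buffer that must be emptied before it accepts the next word. A minimal sketch is shown below. It assumes samples are stored as integers and that the capacity is sized for one word. Refusing a new utterance while the old one is still held is an assumption, added to make the clear-before-next discipline explicit. The class name `InputVoiceMemory` is hypothetical.

```python
class InputVoiceMemory:
    """Hypothetical model of input voice memory 25: holds one utterance at a time."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # in samples; sized for one word, not the whole vocabulary
        self._samples: list[int] = []

    def store(self, samples: list[int]) -> None:
        if self._samples:
            raise RuntimeError("previous utterance has not been cleared")
        if len(samples) > self.capacity:
            raise ValueError("utterance is longer than the buffer capacity")
        self._samples = list(samples)

    def play_back(self) -> list[int]:
        # Answerback after complete recognition: the talker's own voice, unchanged.
        return list(self._samples)

    def clear(self) -> None:
        # Clearing before the next input lets one small buffer serve every registered word.
        self._samples = []
```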

Voice words No. 51 to No. 55 in the table above (the range marked 8) are the contents of the synthesized voice memory 22 for voice output. They are recorded in the synthesized voice memory 22 in advance.

Voice words can be recorded in the synthesized voice memory 22 in any of three ways:
- voice words No. 51 to No. 55 are recorded as they are;
- each of voice words No. 51 to No. 55 is divided into several pieces, which the voice output control circuit 21 arranges in the correct order and outputs;
- only phoneme segments are recorded in the synthesized voice memory 22, and the voice output control circuit 21 synthesizes voice words No. 51 to No. 55 from them and outputs them as spoken words.

Any of these methods may be used.

Suppose instead that the talker speaks something other than voice words No. 1 to No. 42, that is, a voice not stored in the registered voice memory 14. The voice recognition control circuit 13 then judges the input unrecognizable and outputs its unrecognizable level to the control circuit 3.
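The three recording schemes for the synthesized voice memory 22 described above differ in what is stored, not in what is played. The sketch below contrasts them for a single message, word No. 51. The sample values, the segment names `p1` and `p2`, and all function and table names are hypothetical placeholders.

```python
from itertools import chain

# Hypothetical contents of synthesized voice memory 22 for word No. 51.
# Sample values are placeholders for digitized speech.
WHOLE_WORDS = {51: [10, 20, 30, 40]}                 # scheme 1: word recorded as is
WORD_PIECES = {51: [[10, 20], [30, 40]]}             # scheme 2: word recorded as ordered pieces
PHONEME_SEGMENTS = {"p1": [10, 20], "p2": [30, 40]}  # scheme 3: phoneme segments only
WORD_SPELLING = {51: ["p1", "p2"]}                   # scheme 3: order in which segments are joined


def output_whole_word(word_no: int) -> list[int]:
    return list(WHOLE_WORDS[word_no])


def output_from_pieces(word_no: int) -> list[int]:
    # The output control circuit 21 emits the stored pieces in their recorded order.
    return list(chain.from_iterable(WORD_PIECES[word_no]))


def output_from_phonemes(word_no: int) -> list[int]:
    # Segments can be shared between words; per word only the segment order is stored.
    return list(chain.from_iterable(PHONEME_SEGMENTS[p] for p in WORD_SPELLING[word_no]))
```

All three functions return the same samples. The third scheme saves memory when several messages reuse the same phoneme segments, and it is the variant named in claim 2.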

The control circuit 3 takes in the unrecognizable level, indicates this on the display 5 and issues a command to the voice output control circuit 21. On receiving the command, the voice output control circuit 21 does one of two things:
- it reads out the phoneme segment data in the synthesized voice memory 22 in the proper order; or
- it outputs one of voice words No. 51 to No. 55 directly.

The synthesized voice is then produced from the speaker 7 via the D/A converter 23 and the amplifier 24. In other words, when the talker's input cannot be recognized, the phoneme segments in the synthesized voice memory 22 are assembled and the answerback is output as a synthesized voice.

Next, FIG. 2 shows the processing performed by the voice information input device according to the present invention.

In the figure, processing runs as follows.
- Step 101: the talker's voice is input from the voice input microphone 6.
- Step 102: the amplifier 11 amplifies the input voice.
- Step 103: the A/D converter 12 converts the voice signal into a digital signal.
- Step 104: the A/D-converted input voice is temporarily stored in the input voice memory 25 so that it can be output later.
- Step 105: the voice recognition control circuit 13 extracts features of the input voice to make recognition easier.
- Step 106: the voice recognition control circuit 13 compares the input voice with the contents of the registered voice memory 14 and makes a recognition decision. If they match, recognition is judged complete and processing moves to step 107. If they do not match, recognition is judged incomplete and processing moves to step 113.

In the case of complete recognition, the following steps are performed.
- Step 112: a signal is output as control data.
- Step 107: the voice output control circuit 21 sends the input voice held in the input voice memory 25 to the D/A converter 23.
- Step 108: the D/A converter 23 converts the digitized input voice into an analog voice signal.
- Step 109: the amplifier 24 amplifies the signal.
- Step 110: the voice is output to the talker through the speaker or earphone 7.
- Step 111: a control signal from the voice output control circuit 21 clears the input voice data held in the input voice memory 25, so that the next input voice can be accepted.

If recognition is judged incomplete in step 106, the following steps are performed instead.
- Step 112: a signal is output as control data.
- Step 113: the voice output control circuit 21 synthesizes the phoneme segments stored in the synthesized voice memory 22.
- Step 114: the synthesized voice is output.
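Steps 101 to 114 can be read as one pass of a processing loop. The sketch below strings them together in Python under several stated assumptions:
- 8-bit signed PCM;
- amplifiers 11 and 24 modelled as unit gain;
- normalized samples standing in for the features of step 105, which the patent does not specify;
- a Euclidean distance threshold for the step 106 decision.

The function name `process_utterance` and its parameters are hypothetical.

```python
import math
from typing import Optional

FULL_SCALE = 127  # assumed 8-bit signed PCM


def process_utterance(
    analog: list[float],
    input_memory: list[int],
    templates: dict[int, list[float]],
    synthesized_codes: list[int],
    threshold: float = 0.5,
) -> tuple[Optional[int], list[float]]:
    """Run one pass of steps 101-114; returns (control data, answerback waveform)."""
    # Steps 101-103: take in the voice, amplify (unit gain assumed), A/D convert.
    digital = [round(max(-1.0, min(1.0, x)) * FULL_SCALE) for x in analog]
    # Step 104: hold the digitized input in input voice memory 25.
    input_memory[:] = digital
    # Step 105: "feature extraction", here just the normalized samples (assumption).
    features = [c / FULL_SCALE for c in digital]
    # Step 106: compare with every registered template of the same length.
    best_no: Optional[int] = None
    best_dist = math.inf
    for word_no, template in templates.items():
        if len(template) == len(features):
            dist = math.dist(features, template)
            if dist < best_dist:
                best_no, best_dist = word_no, dist
    complete = best_no is not None and best_dist <= threshold
    # Step 112: the recognition result is passed on as control data.
    control_data = best_no if complete else None
    # Step 107 (complete) or steps 113-114 (incomplete): choose the answerback source.
    out_codes = list(input_memory) if complete else list(synthesized_codes)
    # Steps 108-110: D/A conversion and output through amplifier 24 (unit gain assumed).
    answerback = [c / FULL_SCALE for c in out_codes]
    # Step 111: clear input voice memory 25 before the next input is accepted.
    input_memory.clear()
    return control_data, answerback
```

Both branches share the output tail (steps 108 to 111). This is why the input voice memory is cleared whether or not recognition succeeded.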

From this point on, processing is the same as when complete recognition is judged in step 106:
- Step 108: D/A conversion.
- Step 109: voice amplification.
- Step 110: voice output. The synthesized voice is emitted to the talker from the speaker or earphone 7.
- Step 111: the temporarily stored input voice data is cleared, and processing returns to waiting for the next voice input at step 101.

As described above, in this embodiment the input voice itself is played back when the input is completely recognized. A voice synthesized from phoneme segments is played back when recognition is incomplete.

[Effects of the invention]

According to the present invention, the synthesized voice memory 22 needs to hold only a very small number of voice words, so the device can be made small and light.
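The memory saving can be made concrete with a back-of-the-envelope calculation. The patent gives no sample rate, sample width or word length. The constants below (8 kHz, 1 byte per sample, 1 second per word) are purely illustrative assumptions, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8_000  # assumed
BYTES_PER_SAMPLE = 1    # assumed 8-bit PCM
WORD_SECONDS = 1        # assumed length of one spoken word


def word_bytes() -> int:
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * WORD_SECONDS


def prior_art_bytes(n_input_words: int) -> int:
    # One stored answerback voice for every recognizable input word.
    return n_input_words * word_bytes()


def invention_bytes(n_synth_words: int) -> int:
    # One reusable input-voice buffer plus a few stored synthetic messages.
    return word_bytes() + n_synth_words * word_bytes()
```

The embodiment has 42 registered words and 5 synthetic messages. Under these assumptions, that gives 336,000 bytes for the prior-art approach against 48,000 bytes here. The gap widens as the vocabulary grows, because the invention's cost does not depend on the number of registered words.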

In addition, the kind of answerback tells the talker whether the voice input was completely recognized:
- if it was, the talker's own voice is answered back;
- if recognition was incomplete, a synthesized voice is answered back.

[Brief explanation of drawings]

FIG. 1 is a system configuration diagram showing the configuration of a voice information input device according to an embodiment of the present invention. FIG. 2 is a flowchart showing the processing performed by the voice information input device shown in FIG. 1.

Claims

[Scope of claims]

1. A voice information input device that answers back the recognition result of voice input information by voice, comprising:
- a voice recognition device that takes in the voice input information, compares it with registered voices registered in advance, and determines the recognition state of the voice input information according to whether they match;
- a control circuit that receives the determination output of the voice recognition device and outputs an answerback command according to the recognition state; and
- a voice output device that temporarily stores the voice input information and, in response to the answerback command from the control circuit, answers back the input voice when the voice input information is completely recognized, answers back a voice stored in advance when it is incompletely recognized, and clears the temporarily stored voice input information.

2. The voice information input device according to claim 1, wherein the voice output device obtains the voice answered back during incomplete recognition by synthesizing phoneme segment data stored in advance.