JPH0437997B2

Info

Publication number: JPH0437997B2
Application number: JP57227706A
Authority: JP
Inventors: Teruhiko Ukita; Hidenori Shinoda; Yoichi Takebayashi
Original assignee: Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1982-12-28
Filing date: 1982-12-28
Publication date: 1992-06-23
Also published as: JPS59121097A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field of the invention]

The present invention relates to a speech recognition device that can effectively and reliably recognize input speech uttered in synchronization with a signal tone.

[Technical background of the invention and its problems]

Speech recognition technology has advanced remarkably in recent years. Speech recognition devices that recognize isolated utterances with high accuracy, and voice typewriters that take as input speech uttered one syllable at a time, have entered practical use in some fields. In a word speech recognition device of this kind, recognizing a single utterance is not enough: once recognition of one utterance is finished, recognition of the next must begin promptly. Conventionally, therefore, a signal tone such as a "beep" is output each time recognition of one utterance finishes, prompting the speaker to input the next utterance in synchronization with the tone. A voice typewriter handles syllables uttered in succession, so to ease the segmentation problem it outputs the signal tone at a predetermined period to indicate when to speak. The speaker therefore only needs to listen for the signal tone and speak in synchronization with it.

In a typical conventional speech recognition device, the recognition result for the input speech is shown on a display attached to the device, such as a CRT character display. To confirm the result the speaker must look at the display, which seriously undermines the advantages of speech input such as its speed. To remove this drawback, Japanese Unexamined Patent Application Publication No. S56-138799 and others disclose preparing two signal tones, a "pip" and a "boo", and switching between them according to whether the speech input in synchronization with the tone was recognized correctly. This approach can, for example, prompt the speaker to re-enter speech that could not be recognized. However, it does not tell the speaker how to speak in order to be recognized correctly, so it can hardly be expected to prevent errors before they occur.

[Purpose of the invention]

The present invention was made in view of these circumstances. Its object is to provide a highly practical speech recognition device that informs the speaker of the recognition status of input speech in a simple way, effectively prevents recognition errors before they occur, and thereby speeds up speech input.

[Summary of the invention]

To achieve this object, the speech recognition device according to the present invention comprises: means for outputting a signal tone that prompts speech input; means for performing recognition processing on a speech signal uttered in synchronization with the signal tone; and output-form varying means for varying the output form of the signal tone in accordance with an average recognition reliability, obtained by averaging the recognition reliability of the recognition processing performed at the present time with the recognition reliabilities of the recognition processing performed within a period extending back a fixed number of recognition operations from the present time.
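The averaging described here can be written compactly. The notation is not in the original and is introduced only for illustration: r_t is the recognition reliability of the t-th recognition operation and N is the fixed number of past operations included. The text says only that the values are "averaged", so the equal weighting below is an assumption.

```latex
\bar{r}_t = \frac{1}{N+1} \sum_{k=0}^{N} r_{t-k}
```

Under this reading, the output form of the signal tone that prompts utterance t+1 is chosen from \bar{r}_t rather than from r_t alone.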

[Effect of the invention]

Thus, according to the present invention, the recognition status of the input speech, that is, the recognition reliability, is conveyed to the speaker as a change in the signal tone, prompting the speaker to speak correctly. Misrecognition is therefore prevented before it occurs and speech input can be made faster. In particular, the output form of the signal tone is varied in accordance with the average recognition reliability, which averages the reliability of the current recognition with the reliabilities of the recognitions performed within a period extending back a fixed number of recognition operations. When speech is input over a long period, the speaker can therefore be told how their speaking tendency is changing over time. This eases the confusion a speaker tends to feel when prompted for input by a signal tone, and so improves ease of use. Moreover, the signal tone only has to be varied using the recognition reliability, for example the similarity computed during recognition processing, so control is simple and the device is highly practical.

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to the drawings.

Figs. 1(a) to 1(c) show the relationship among (a) the signal tone, which is output at a predetermined period to prompt speech input, (b) the power of the speech uttered in synchronization with the signal tone, and (c) the similarity obtained in recognition processing of that speech. The figure shows the change in speech power when the syllables "o, n, se, i, ni, n, shi, ki" (onsei ninshiki, "speech recognition") are uttered in synchronization with the signal tone.

Fig. 2 is a schematic block diagram of the device of the embodiment, in which 1 denotes a speech recognition unit. The speech recognition unit 1, for example, analyzes the input speech to obtain its speech feature parameters, performs pattern matching against standard speech patterns registered in a dictionary by, for example, computing similarities, and recognizes the input speech according to the similarity values. The reliability of the recognition is expressed by, for example, the similarity value obtained during recognition processing, or the difference in similarity between the first- and second-ranked candidates when candidates are ordered by decreasing similarity. This reliability information is stored in shift register 2 each time one utterance is recognized. Similarity averaging circuit 3 takes the reliabilities (similarity values) of the past several samples stored in shift register 2 and averages them to judge the state of speech recognition. The reliability information indicating the recognition status obtained in this way is stored in register 4 as control data.
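The path from similarity values through the shift register to the averaged control data can be sketched in software. This is a minimal illustration, not the circuit of the embodiment: the names `margin_reliability` and `ReliabilityAverager`, the default depth of 4, and the equal-weight mean are all assumptions made for the example.

```python
from collections import deque


def margin_reliability(similarities):
    """Reliability as the gap between the best and second-best similarity."""
    ranked = sorted(similarities, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate similarities")
    return ranked[0] - ranked[1]


class ReliabilityAverager:
    """Holds the last `depth` reliabilities (the shift register's role)
    and returns their mean (the averaging circuit's role)."""

    def __init__(self, depth=4):  # depth is a hypothetical choice
        # A bounded deque drops the oldest entry when a new one arrives.
        self._history = deque(maxlen=depth)

    def update(self, reliability):
        """Store the newest reliability and return the current average."""
        self._history.append(reliability)
        return sum(self._history) / len(self._history)
```

Calling `update` once per recognized utterance yields the value that would be written to the control register. Until the history fills, the mean is taken over however many values are available, which is one possible start-up behavior and is not specified in the text.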

Signal tone generation circuit 5 receives the signal that the speech recognition unit 1 outputs each time it finishes recognizing one utterance of input speech, and generates a signal tone that prompts the speaker to input the next utterance. The output form of this signal tone is varied according to the control data stored in register 4.

The output form of the signal tone is controlled as follows. As shown in Fig. 3, for example, the signal tone is held at a constant frequency when the reliability of the recognition result is at or above a threshold Th, that is, when speech is being recognized accurately. The tone frequency is then varied to correspond to the reliability, for example by lowering the frequency as the recognition reliability rises. Alternatively, as shown in Fig. 4, for example, the signal tone may be controlled by changing its amplitude or its output duration according to the reliability. The output waveform of the signal tone may also be distorted to indicate a drop in reliability. Specifically, if the frequency of the signal tone is varied over the range 2000 Hz to 1000 Hz for reliabilities of 0.1 to 1.0, the change is easy to perceive.
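One way to realize the frequency control described above is a clamped linear map from reliability to frequency. The text gives only the endpoints (reliability 0.1 to 1.0, 2000 Hz to 1000 Hz) and the existence of a threshold Th, so the linear interpolation, the function name `tone_frequency_hz`, and the handling of out-of-range values are assumptions.

```python
def tone_frequency_hz(reliability, threshold=None,
                      r_min=0.1, r_max=1.0,
                      f_at_min=2000.0, f_at_max=1000.0):
    """Map an averaged reliability to a signal-tone frequency in Hz.

    Higher reliability gives a lower tone. Linear interpolation between
    the endpoints is an assumption; the source states only the ranges.
    """
    # Clamp to the stated reliability range.
    r = min(max(reliability, r_min), r_max)
    # At or above the threshold Th the tone stays at one fixed frequency.
    if threshold is not None:
        r = min(r, threshold)
    fraction = (r - r_min) / (r_max - r_min)
    return f_at_min + fraction * (f_at_max - f_at_min)
```

With `threshold=0.8`, for instance, every reliability of 0.8 or more produces the same tone, while lower values raise the pitch toward 2000 Hz. The same shape of function could drive amplitude or duration instead of frequency.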

With this device, in which the form of the signal tone is varied according to the reliability of speech recognition, a speaker uttering in synchronization with the signal tone can easily tell whether the way they have been speaking so far is easy for the device to recognize. The speaker can then adjust how they speak and immediately input speech that the device recognizes easily, and the device can recognize it reliably. Misrecognition is thus effectively prevented before it occurs, and fast, efficient speech input becomes possible. Moreover, unlike conventional signal tones that only indicate whether or not recognition succeeded, the tone conveys the recognition status, so the speaker can readily adapt their speech to obtain high recognition accuracy, and unstable recognition decisions can be avoided. Finally, as noted above, varying the output form of the signal tone is simple, so the device can be built easily and inexpensively.

The present invention is not limited to the embodiment described above. For example, the signal tone may be varied in several discrete steps according to the reliability. Other decision factors used in speech recognition processing may also serve as the reliability information, and may be mapped to a reliability value. In short, the present invention can be practiced with various modifications without departing from its gist.
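The stepwise variant mentioned above could look like the following sketch. The stage boundaries, the four frequencies, and the function name are hypothetical values chosen for illustration; the source says only that the tone may be varied in several stages.

```python
import bisect

# Hypothetical stage boundaries (reliability) and per-stage tones (Hz).
STAGE_BOUNDS = [0.3, 0.6, 0.8]
STAGE_FREQS_HZ = [2000, 1600, 1300, 1000]


def staged_frequency_hz(reliability):
    """Pick one of a few fixed tone frequencies from the reliability."""
    # bisect_right counts how many boundaries the reliability has reached,
    # which is the index of the stage it falls into.
    stage = bisect.bisect_right(STAGE_BOUNDS, reliability)
    return STAGE_FREQS_HZ[stage]
```

A small number of clearly separated tones may be easier for a speaker to tell apart than a continuous sweep, at the cost of coarser feedback.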

[Brief explanation of drawings]

Fig. 1 shows the relationship among the signal tone, the input speech, and the reliability of its recognition result. Fig. 2 is a schematic block diagram of a device according to one embodiment of the present invention. Figs. 3 and 4 each show the relationship between the output form of the signal tone and the reliability.

1: speech recognition unit; 2: register; 3: similarity averaging circuit; 4: register; 5: signal tone generation circuit.

Claims

1. A speech recognition device comprising: means for outputting a signal tone that prompts speech input; means for performing recognition processing on a speech signal uttered in synchronization with the signal tone; and output-form varying means for varying the output form of the signal tone in accordance with an average recognition reliability obtained by averaging the recognition reliability of the recognition processing performed by said means at the present time and the recognition reliabilities of the recognition processing performed within a period extending back a fixed number of recognition operations from the present time.

2. The speech recognition device according to claim 1, wherein the output-form varying means varies at least one of the frequency, amplitude, time width, and waveform of the signal tone.