JPH0314359B2

Info

Publication number: JPH0314359B2
Application number: JP58221097A
Authority: JP
Inventors: Toyoshi Yamada
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-11-24
Filing date: 1983-11-24
Publication date: 1991-02-26
Also published as: JPS60113298A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a speaker-dependent speech recognition device that has a learning function and that automatically redisplays the same prompt when an utterance is rejected, when a similar word exists, or when the utterance varies.

[Conventional technology and problems]

In a speech recognition device, one important factor in raising the recognition rate is the optimization of the dictionary patterns against which input speech patterns are compared and matched.

A speaker-dependent speech recognition device has a learning function for optimizing these dictionary patterns: during voice registration, the operator utters the same word several times so that the device can learn it. In an advanced learning function, multiple dictionary templates are provided for each word, and the dictionary is modified (by averaging), added to, or updated by deletion followed by addition.
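The three dictionary operations named above (modification by averaging, addition, and deletion followed by addition) can be sketched as follows. This is only an illustrative sketch: the names `WordEntry`, `average_into`, `add_template`, and `replace_template` are hypothetical, a pattern is assumed to be a fixed-length feature vector, and averaging is assumed to be an equal-weight mean of the stored template and the new input. The source does not specify any of these details.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: a speech pattern is a fixed-length feature vector.
Pattern = List[float]


@dataclass
class WordEntry:
    """One registered word with its (hypothetical) set of dictionary templates."""
    text: str
    templates: List[Pattern] = field(default_factory=list)
    max_templates: int = 3  # assumed capacity; not given in the source


def average_into(entry: WordEntry, index: int, pattern: Pattern) -> None:
    """Modify: replace a template with the element-wise mean of it and the input."""
    old = entry.templates[index]
    entry.templates[index] = [(a + b) / 2 for a, b in zip(old, pattern)]


def add_template(entry: WordEntry, pattern: Pattern) -> bool:
    """Add: store the input pattern as-is if a free template slot exists."""
    if len(entry.templates) >= entry.max_templates:
        return False
    entry.templates.append(list(pattern))
    return True


def replace_template(entry: WordEntry, index: int, pattern: Pattern) -> None:
    """Delete/add: discard one template and store the input pattern in its place."""
    entry.templates[index] = list(pattern)
```

Keeping the capacity check inside `add_template` mirrors the description's use of free-template information when choosing between addition and the other operations.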

Voice registration has an initial registration mode and a learning mode. In initial registration, the operator utters every registered word once to create the initial patterns of the dictionary. Learning takes place after initial registration, that is, after the initial patterns have been created: recognition processing is executed on an utterance of a given word, and, based on the rank and distance information in the recognition result, the utterance is rejected or a dictionary pattern is modified or added. As learning progresses further, deletion and addition of dictionary patterns are also performed.

In voice registration, it is desirable for the operator to complete the dictionary efficiently with as few utterances as possible, and to be spared unnecessary key operations and judgments so as to concentrate on speaking.

[Purpose of the invention]

The present invention is based on the above considerations. Its object is to provide a speaker-dependent speech recognition device with a learning function that automatically and successively displays to the operator the next word to be uttered, minimizes the operator's judgments and key operations, and thereby reduces the operator's burden.

[Structure of the invention]

To this end, the speaker-dependent speech recognition device of the present invention comprises: a dictionary memory in which the character strings of registered words are stored and in which dictionary patterns of the speech for the registered words are registered; a matching unit that matches an input pattern against the dictionary patterns and outputs a recognition result and modification information for the input pattern; a display; a dictionary control and modification unit that additionally registers and modifies dictionary patterns in the dictionary memory; a learning mechanism unit that receives the recognition result and the input-pattern modification information output from the matching unit; and a prompt sending control unit that, in accordance with instructions from the learning mechanism unit, reads the character string of a registered word from the dictionary memory and causes the display to show a prompt requesting voice input of that registered word.

The learning mechanism unit is configured to control the dictionary control and modification unit so as to execute processing such as rejection, addition of a dictionary pattern, or modification of a dictionary pattern, and to judge whether re-utterance is necessary. When re-utterance is not necessary, it performs control to display on the display a prompt requesting voice input of the next registered word. When re-utterance is necessary, it performs control to display again the prompt requesting voice input of the same registered word, and also performs control to display auxiliary information indicating why re-utterance is necessary.
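The prompt control described above can be sketched as a simple loop: advance to the next registered word only when the learning step reports that no re-utterance is needed, and otherwise show the same prompt again together with the reason. The function `run_registration` and its callback parameters (`get_utterance`, `learn`, `show`) are hypothetical names introduced for illustration; they stand in for the voice input, the learning mechanism unit, and the display, and are not taken from the source.

```python
from typing import Callable, List, Optional, Sequence


def run_registration(
    words: Sequence[str],
    get_utterance: Callable[[], List[float]],
    learn: Callable[[str, List[float]], Optional[str]],
    show: Callable[[str, Optional[str]], None],
) -> None:
    """Prompt each registered word in order, repeating a prompt while needed.

    `learn` is assumed to return None when no re-utterance is needed, or a
    short reason string (for example "reject") when the word must be repeated.
    """
    for word in words:
        reason: Optional[str] = None
        while True:
            # First pass shows the plain prompt; later passes add the reason.
            show(word, reason)
            pattern = get_utterance()
            reason = learn(word, pattern)
            if reason is None:
                break  # move on to the next registered word
```

The operator never chooses what to say next: the loop alone decides whether the prompt advances, which is the burden reduction the invention aims at.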

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to the drawing.

The figure shows the configuration of one embodiment of the present invention. In the figure, 1 denotes an input pattern buffer, 2 a dictionary control and modification unit, 3 a dictionary memory, 4 a matching unit, 5 a prompt sending control unit, 6 a learning mechanism unit, and 7 a display.

The input pattern buffer 1 stores the input pattern of the operator's voice input. The matching unit 4 matches the input pattern stored in the input pattern buffer 1 against the dictionary patterns stored in the dictionary memory 3 and, in learning mode, sends the recognition result (multiple candidates and their distances) and the modification information for the input pattern to the learning mechanism unit 6 as the matching result.

The dictionary memory 3 consists of a table and dictionary patterns. The table stores the character strings of the registered words; the dictionary patterns provide multiple dictionary templates for each registered word, in which speech patterns are registered. In initial registration, every registered word in the table is uttered once and its pattern is registered as a dictionary pattern. In learning, the input pattern uttered for a registered word is matched against the dictionary patterns and, depending on the matching result, the utterance is rejected, a dictionary pattern is added, or a dictionary pattern is modified. These operations on the dictionary memory 3 are executed by the dictionary control and modification unit 2 on the basis of instructions from the learning mechanism unit 6.

As described above, the learning mechanism unit 6 controls the dictionary control and modification unit 2 to execute one of rejection, addition of a dictionary pattern, or modification of a dictionary pattern, according to the distance information of the correct answer (the uttered word), the distance difference between the correct answer and other words, and the availability of free dictionary templates. Rejection means that no learning is performed; addition of a dictionary pattern stores the input pattern as-is as an additional dictionary pattern; modification of a dictionary pattern averages the dictionary pattern and replaces it with the result.

In connection with this learning result, the learning mechanism unit 6 also performs processing for showing the operator on the display 7 what to utter next. The learning mechanism unit 6 has the prompt sending control unit 5 display the character strings in the table of the dictionary memory 3 on the display 7 one after another. When, for example, any of the following cases (1) to (4) occurs, it judges that re-utterance is necessary and, instead of advancing to the next prompt, has the same prompt displayed again.

(1) The utterance is rejected (it is not regarded as normal voice input).

(2) The correct answer is not recognized in first place (similar word exists).

(3) The correct answer is recognized in first place, but its distance to the second-place candidate is small, that is, a similar word exists (similar word exists).

(4) The correct answer is recognized in first place, but its distance is greater than a certain threshold, that is, the word is one whose utterance tends to vary (utterance varies).
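The four re-utterance cases listed above can be expressed as a single decision function. This is a sketch under stated assumptions: the name `reutterance_reason`, the two threshold parameters, and the boolean `rejected` flag are hypothetical, since the source gives neither threshold values nor the criterion by which an input is rejected. A smaller distance is taken to mean a better match, and an empty candidate list is treated as a rejection purely for robustness.

```python
from typing import List, Optional, Tuple

REJECT = "reject"
SIMILAR = "similar word exists"
VARIATION = "utterance varies"


def reutterance_reason(
    candidates: List[Tuple[str, float]],
    correct: str,
    rejected: bool,
    margin_threshold: float,
    distance_threshold: float,
) -> Optional[str]:
    """Return why the word must be uttered again, or None if it need not be.

    `candidates` holds (word, distance) pairs sorted by ascending distance.
    """
    # Rejected input; the empty-list check is an added assumption.
    if rejected or not candidates:
        return REJECT
    top_word, top_distance = candidates[0]
    # Correct answer is not in first place.
    if top_word != correct:
        return SIMILAR
    # First place, but the runner-up is too close.
    if len(candidates) > 1 and candidates[1][1] - top_distance < margin_threshold:
        return SIMILAR
    # First place, but the match itself is poor.
    if top_distance > distance_threshold:
        return VARIATION
    return None
```

The checks run in the order of the list above, so an utterance that is both close to a similar word and a poor match is reported as "similar word exists".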

Furthermore, to inform the operator why re-utterance is necessary, the learning mechanism unit 6 has the display 7 show the following information (a) and (b) as auxiliary information.

(a) The recognition candidates down to about fourth place.

(b) Which of the following applies: rejection, similar word exists, or utterance varies.
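The auxiliary information described above could be assembled for the display roughly as follows. The function `format_reprompt`, the message wording, and the choice to print each candidate's distance are all assumptions made for illustration; the source states only that candidates down to about fourth place and the reason are shown.

```python
from typing import List, Tuple


def format_reprompt(
    word: str,
    reason: str,
    candidates: List[Tuple[str, float]],
    top_n: int = 4,
) -> str:
    """Build the text shown when the same prompt is displayed again.

    `candidates` holds (word, distance) pairs sorted by ascending distance;
    only the first `top_n` are shown.
    """
    lines = [f"Say again: {word}", f"Reason: {reason}"]
    for rank, (candidate, distance) in enumerate(candidates[:top_n], start=1):
        lines.append(f"{rank}. {candidate} (distance {distance:.1f})")
    return "\n".join(lines)
```

Showing the ranked candidates lets the operator see which registered word the utterance was confused with, which is useful when the reason is a similar word.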

[Effect of the invention]

As is clear from the above description, according to the present invention, in a speaker-dependent speech recognition device with a learning function, the recognition device itself automatically judges, as learning proceeds, how far (how many times) each word needs to be uttered, and displays prompts accordingly. The operator therefore only has to speak as the prompts direct, without unnecessary judgments or key operations, which reduces the operator's burden and improves learning efficiency.

[Brief explanation of the drawing]

The figure shows the configuration of one embodiment of the present invention.

1: input pattern buffer, 2: dictionary control and modification unit, 3: dictionary memory, 4: matching unit, 5: prompt sending control unit, 6: learning mechanism unit, 7: display.

Claims

[Scope of Claims]

1. A speaker-dependent speech recognition device comprising: a dictionary memory in which the character strings of registered words are stored and in which dictionary patterns of the speech for the registered words are registered; a matching unit that matches an input pattern against the dictionary patterns and outputs a recognition result and modification information for the input pattern; a display; a dictionary control and modification unit that additionally registers and modifies dictionary patterns in the dictionary memory; a learning mechanism unit that receives the recognition result and the input-pattern modification information output from the matching unit; and a prompt sending control unit that, in accordance with instructions from the learning mechanism unit, reads the character string of a registered word from the dictionary memory and causes the display to show a prompt requesting voice input of the registered word, characterized in that the learning mechanism unit is configured to control the dictionary control and modification unit so as to execute processing such as rejection, addition of a dictionary pattern, or modification of a dictionary pattern, to judge whether re-utterance is necessary, to perform control, when re-utterance is not necessary, to display on the display a prompt requesting voice input of the next registered word, and, when re-utterance is necessary, to perform control to display on the display again the prompt requesting voice input of the same registered word and to perform control to display on the display auxiliary information indicating the reason why re-utterance is necessary.

2. The speaker-dependent speech recognition device according to claim 1, wherein the auxiliary information includes the recognition results from first place to Nth place (N being an integer of 2 or more).