JPH0634189B2

JPH0634189B2 - Voice recognizer

Info

Publication number: JPH0634189B2
Application number: JP59140650A
Authority: JP
Inventors: 裕飯塚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1984-07-09
Filing date: 1984-07-09
Publication date: 1994-05-02
Anticipated expiration: 2009-05-02
Also published as: JPS6120098A

Description

[Detailed description of the invention]

(Industrial Field of Application) The present invention relates to a voice recognition device, and more specifically to a voice recognition device that can judge, for voice recognition, the aptitude of a speaker, the suitability of an utterance method, the suitability of a recognition word set, and the like.

(Prior Art) Conventionally, the recognition rate has been used to make such judgments. A conventional voice recognition device is described with reference to FIG. 5. FIG. 5 is a block diagram of a conventional voice recognition device, in which 1 is a voice input terminal, 2 is a recognition unit (comprising a frequency analysis unit 201, a pattern matching unit 202, a standard pattern unit 203, and a determination unit 204), 3 is a correct-answer counter, 4 is a pattern number counter, 5 is a divider, 6 is a multiplier, and 7 is an output terminal.

In operation, the voice input from input terminal 1 is first analyzed by the frequency analysis unit 201 to produce a two-dimensional input voice pattern with a frequency axis and a time axis. Next, the pattern matching unit 202 matches this pattern against the standard pattern of each recognition category stored in the standard pattern unit 203 and obtains a distance for each. The determination unit 204 then finds the category with the smallest distance and takes it as the recognition result. If the recognition result equals the category of the voice that was input, the recognition is correct; otherwise it is a misrecognition. When the result is correct, the correct-answer counter 3 is incremented. The pattern number counter 4 is incremented on every recognition. Several hundred to several thousand utterances are recognized in this way to obtain the number of patterns and the number of correct answers. The divider 5 then divides the number of correct answers by the number of patterns, the multiplier 6 multiplies the quotient by 100 to obtain the recognition rate, and the result is output from output terminal 7.
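The conventional procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: patterns are flattened to plain numeric vectors and the distance is Euclidean, neither of which the source specifies, and the names `recognize` and `recognition_rate` are hypothetical.

```python
import math

def recognize(input_pattern, standard_patterns):
    """Return the category whose standard pattern is nearest to the input.

    standard_patterns maps category name -> reference vector.
    Euclidean distance is an assumption; the source only says "distance".
    """
    distances = {
        category: math.dist(input_pattern, reference)
        for category, reference in standard_patterns.items()
    }
    return min(distances, key=distances.get)  # role of determination unit 204

def recognition_rate(samples, standard_patterns):
    """samples is a list of (true_category, input_pattern) pairs."""
    correct = 0  # correct-answer counter 3
    total = 0    # pattern number counter 4
    for true_category, pattern in samples:
        if recognize(pattern, standard_patterns) == true_category:
            correct += 1
        total += 1
    return correct / total * 100  # divider 5, then multiplier 6
```

The result depends only on how many trials were wrong, not on how close each decision was.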

(Problems to Be Solved by the Invention) However, the recognition rate in effect considers only the number of misrecognized patterns. If three digits of precision are required (for example, 98.8%), at least 1000 recognitions are needed, which makes the work very time-consuming and laborious.
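The arithmetic behind the 1000-recognition figure can be illustrated with a short sketch. It assumes the recognition rate is computed as a count of correct answers divided by the number of trials, so one additional error changes the rate by 100/N percentage points; the function name `rate_step` is hypothetical.

```python
def rate_step(num_trials):
    """Smallest change in recognition rate (in percentage points)
    that a single extra error can produce over num_trials recognitions."""
    return 100 / num_trials

# With 100 trials the rate moves in steps of 1 point (e.g. 98% or 99%),
# so a value such as 98.8% can only be resolved with about 1000 trials.
steps = {n: rate_step(n) for n in (100, 1000)}
```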

The present invention was made to eliminate this drawback of the prior art. Its object is to provide a voice recognition device that can obtain, from a small number of recognitions, an index for judging the aptitude of a speaker, the suitability of an utterance method, and the suitability of a recognition word set.

(Means for Solving the Problems) To solve these problems of the prior art, the invention is a voice recognition device comprising a standard pattern storage unit that stores a standard pattern for each voice category, a pattern matching unit that compares an input voice pattern with the standard patterns to obtain a distance representing the similarity between the two patterns, and a recognition determination unit that makes a recognition decision based on the magnitude of the distances obtained by the pattern matching unit. The device is characterized in that it further comprises: a first distance holding circuit that holds, for each pattern, the distance D1 to the standard pattern of the same category as the input voice; a second distance holding circuit that holds, for each pattern, D2, the smallest of the distances to the standard patterns of categories different from that of the input voice; a pattern number counting circuit that counts the number of recognized patterns; and a geometric mean calculation circuit that, based on the number of patterns obtained by the pattern number counting circuit, obtains the geometric mean of the D2/D1 values of the patterns.

(Operation) Because the voice recognition device of the present invention is configured as described above, each means operates as follows.

The first distance holding circuit holds, for each pattern, the distance D1 to the standard pattern of the same category as the input voice, while the second distance holding circuit holds, for each pattern, D2, the smallest of the distances to the standard patterns of categories different from that of the input voice. The pattern number counting circuit counts the number of recognized patterns. The geometric mean calculation circuit receives the information from the first and second distance holding circuits, obtains D2/D1 for each pattern, and obtains the geometric mean of as many D2/D1 values as were counted by the pattern number counting circuit; this value is defined as the recognition stability. Based on the recognition stability, the voice recognition device of the present invention judges the aptitude of the speaker, the suitability of the utterance method, the suitability of the recognition word set, and the like.
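A minimal Python sketch of this definition follows. It assumes each recognition trial is available as a dictionary of per-category distances together with the known category of the input, and that all distances are positive; the names `split_distances` and `recognition_stability` are hypothetical, and `statistics.geometric_mean` stands in for the geometric mean calculation circuit.

```python
import statistics

def split_distances(distances, true_category):
    """Return (D1, D2) for one recognition trial.

    distances maps category name -> distance to that category's standard pattern.
    D1 is the distance for the input's own category; D2 is the smallest
    distance among all other categories.
    """
    d1 = distances[true_category]
    d2 = min(d for category, d in distances.items() if category != true_category)
    return d1, d2

def recognition_stability(trials):
    """trials is a list of (distances, true_category) pairs."""
    ratios = []
    for distances, true_category in trials:
        d1, d2 = split_distances(distances, true_category)
        ratios.append(d2 / d1)
    return statistics.geometric_mean(ratios)
```

A value well above 1 means the correct category is, on average, much closer than its nearest competitor.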

(Embodiment) FIG. 1 is a circuit diagram showing a first embodiment of the present invention. The voice input terminal 1 is connected to the recognition unit 2, and the recognition unit 2 is connected to the data register D1 11, the data register D2 12, and the pattern number counter 17. The data register D1 11 and the data register D2 12 are connected to the divider 13, the output of the divider 13 is connected to the multiplier 14, and the output of the multiplier 14 is connected to the register 15. The output of the register 15 is connected to the Nth-root calculator 16 and to the other input of the multiplier 14. The other input of the Nth-root calculator 16 is connected to the output of the pattern number counter 17. The output of the Nth-root calculator 16 is the recognition stability, which is output from the output terminal 18.

The operation of the first embodiment is as follows. First, the register 15 is set to 1 and the pattern number counter 17 is cleared to zero. Next, a voice whose category is known is input from input terminal 1 and recognized by the recognition unit 2. The recognition unit 2 yields the distance value and category name for each category, in ascending order of distance. The distance for the category of the input voice (D1) is set in the data register D1 11, and the smallest distance among the categories other than that of the input voice (D2) is set in the data register D2 12. The pattern number counter 17 is incremented on every recognition. Next, the divider 13 divides the contents of the data register D2 12 by the contents of the data register D1 11, the multiplier 14 multiplies the value of the register 15 by the output of the divider 13, and the result is set in the register 15. This operation is repeated N times, after which the Nth-root calculator 16 takes the Nth root of the value in the register 15 and outputs it from the output terminal 18. Here N is in the range of several tens to several hundreds.
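The data flow of the first embodiment can be mirrored in software roughly as follows. This is a sketch rather than the patented circuit: it assumes the (D1, D2) pairs have already been produced by a recognizer, and the variable names simply echo the reference numerals.

```python
def stability_running_product(pairs):
    """pairs is a non-empty list of (D1, D2) tuples, one per recognized pattern."""
    register_15 = 1.0  # register 15 is set to 1
    counter_17 = 0     # pattern number counter 17 is cleared to zero
    for d1, d2 in pairs:
        counter_17 += 1                    # one count per recognition
        quotient = d2 / d1                 # divider 13
        register_15 = register_15 * quotient  # multiplier 14, result stored back
    return register_15 ** (1.0 / counter_17)  # Nth-root calculator 16
```

For example, the pairs `(1.0, 2.0)` and `(1.0, 8.0)` give a product of 16 and a stability of 4.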

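The above operation is expressed by equation (1):

$$A = \sqrt[N]{\prod_{i=1}^{N} \frac{D2_i}{D1_i}} \tag{1}$$

Here A is the recognition stability, and D1_i and D2_i denote the values of D1 and D2 for the i-th pattern (the index i is introduced here to label the patterns).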

Next, a second embodiment of the present invention is described. Equation (1) is transformed as follows.
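The transformed equations are not reproduced in this text. The following is a reconstruction that is consistent with the circuit of the second embodiment (logarithm, accumulation, division by N, exponential). The index i, the use of natural logarithms, and the numbering of the intermediate steps as (2) and (3) are assumptions; only the final form (4) is referred to by the description.

```latex
\begin{align}
\log A &= \log \left( \prod_{i=1}^{N} \frac{D2_i}{D1_i} \right)^{1/N} \tag{2} \\
       &= \frac{1}{N} \sum_{i=1}^{N} \log \frac{D2_i}{D1_i} \tag{3} \\
A      &= \exp \left( \frac{1}{N} \sum_{i=1}^{N} \log \frac{D2_i}{D1_i} \right) \tag{4}
\end{align}
```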

The second embodiment is based on equation (4), and its block diagram is shown in FIG. 2. In FIG. 2, the voice input terminal 1 is connected to the recognition unit 2, and the recognition unit 2 is connected to the data register D1 11, the data register D2 12, and the pattern number counter 17. The data register D1 11 and the data register D2 12 are connected to the divider 13, and the output of the divider 13 is connected to the logarithm calculator 21. The output of the logarithm calculator 21 is connected to the adder 22, the output of the adder 22 is connected to the register 23, and the output of the register 23 is connected to the divider 24 and to the other input of the adder 22. The other input of the divider 24 is connected to the output of the pattern number counter 17. The output of the divider 24 is connected to the exponential function calculator 25, whose output is the recognition stability A, which is output from the output terminal 18.

To operate the circuit of the second embodiment, the register 23 and the pattern number counter 17 are first cleared to zero. The subsequent steps, from inputting a voice and incrementing the pattern number counter 17 up to obtaining the value of D2/D1 with the divider 13, are the same as in the first embodiment.

Next, the logarithm calculator 21 calculates the logarithm of D2/D1, the adder 22 adds the value of the register 23 to it, and the result is set in the register 23.

The above operation is repeated N times. The divider 24 then divides the value of the register 23 by N, the exponential function calculator 25 computes the exponential of the quotient, and the result is output from the output terminal 18.
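A software sketch of the second embodiment is given below. It assumes natural logarithms (the source does not name the base, and any base works if the exponential uses the same one) and positive distances; the variable names echo the reference numerals.

```python
import math

def stability_log_sum(pairs):
    """pairs is a non-empty list of (D1, D2) tuples with positive distances."""
    register_23 = 0.0  # register 23 is cleared to zero
    counter_17 = 0     # pattern number counter 17 is cleared to zero
    for d1, d2 in pairs:
        counter_17 += 1
        quotient = d2 / d1                   # divider 13
        register_23 += math.log(quotient)    # logarithm calculator 21, adder 22
    mean_log = register_23 / counter_17      # divider 24
    return math.exp(mean_log)                # exponential function calculator 25
```

One practical consequence of this form, which the source does not state, is that the accumulated value grows only additively, so it is less prone to overflow or underflow than a running product over hundreds of ratios.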

Next, a third embodiment of the present invention is described.

Equation (4) is transformed as follows.
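The transformed equation is not reproduced in this text. A reconstruction consistent with the circuit of the third embodiment, which takes the logarithms of D1 and D2 separately and subtracts them, is shown below. The index i and the use of natural logarithms are assumptions.

```latex
\begin{equation}
A = \exp \left( \frac{1}{N} \sum_{i=1}^{N} \left( \log D2_i - \log D1_i \right) \right) \tag{5}
\end{equation}
```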

The third embodiment is based on equation (5), and its block diagram is shown in FIG. 3. In FIG. 3, the voice input terminal 1 is connected to the recognition unit 2, and the recognition unit 2 is connected to the data register D1 11, the data register D2 12, and the pattern number counter 17. The output of the data register D1 11 is connected to the logarithm calculator 31, and the output of the data register D2 12 is connected to the logarithm calculator 32. The output of the logarithm calculator 32 is connected to the + input of the subtractor 33, and the output of the logarithm calculator 31 is connected to the − input of the subtractor 33. The output of the subtractor 33 is connected to the adder 22, the output of the adder 22 is connected to the register 23, and the output of the register 23 is connected to the divider 24 and to the other input of the adder 22. The other input of the divider 24 is connected to the output of the pattern number counter 17. The output of the divider 24 is connected to the exponential function calculator 25, whose output is the recognition stability A, which is output from the output terminal 18.

To operate the third embodiment, the register 23 and the pattern number counter 17 are first cleared to zero. The subsequent steps are the same as in the first embodiment up to the point where the pattern number counter 17 is incremented: a voice is input, the distance for the category of the input voice (D1) is set in the data register D1 11, and the smallest distance among the categories other than that of the input voice (D2) is set in the data register D2 12.

Next, the logarithm calculator 31 calculates the logarithm of D1 and the logarithm calculator 32 calculates the logarithm of D2, and the subtractor 33 subtracts the logarithm of D1 from the logarithm of D2. Accumulating the output of the subtractor 33, dividing by N, and computing the exponential function are the same as in the second embodiment.
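The third embodiment differs only in replacing the division by a subtraction of logarithms. The sketch below, under the same assumptions as before (natural logarithms, positive distances, hypothetical names echoing the reference numerals), also checks the result against a direct geometric mean of D2/D1 to show that the rearrangement does not change the value.

```python
import math
import statistics

def stability_log_difference(pairs):
    """pairs is a non-empty list of (D1, D2) tuples with positive distances."""
    register_23 = 0.0  # register 23 is cleared to zero
    counter_17 = 0     # pattern number counter 17 is cleared to zero
    for d1, d2 in pairs:
        counter_17 += 1
        log_d1 = math.log(d1)             # logarithm calculator 31
        log_d2 = math.log(d2)             # logarithm calculator 32
        register_23 += log_d2 - log_d1    # subtractor 33, then adder 22
    return math.exp(register_23 / counter_17)  # divider 24, exponential calculator 25

# Illustrative data only; these distances are made up for the check.
example_pairs = [(1.0, 2.0), (2.0, 3.0), (4.0, 5.0)]
direct = statistics.geometric_mean([d2 / d1 for d1, d2 in example_pairs])
assert math.isclose(stability_log_difference(example_pairs), direct)
```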

FIG. 4 shows a histogram of the D2/D1 values obtained from 600 recognitions. In the figure, the region where D2/D1 < 1 (marked in the figure) has D2 < D1 and is therefore misrecognized. The asterisks (*) indicate correct recognitions.
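The relation between the D2/D1 values and misrecognition can be sketched as follows. The function names are hypothetical, and the treatment of a tie (D2/D1 exactly 1) as correct is an assumption, since the source only states that D2/D1 < 1 is a misrecognition.

```python
def misrecognition_count(ratios):
    """Count trials where a wrong category was closer than the correct one."""
    return sum(1 for ratio in ratios if ratio < 1.0)

def recognition_rate_from_ratios(ratios):
    """Recognition rate in percent, recovered from the D2/D1 values alone."""
    correct = len(ratios) - misrecognition_count(ratios)
    return correct / len(ratios) * 100
```

The recognition rate uses only which side of 1 each ratio falls on, whereas the geometric mean of the same ratios also uses how far each one lies from 1.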

The recognition rate used in the prior art concerns, so to speak, only the number of such misrecognitions. As FIG. 4 makes clear, these are a very small fraction of the whole, so with the prior art at least 1000 recognitions are needed if three digits of precision are required, which takes a very long time. In the embodiments of the present invention described above, by contrast, the geometric mean of D2/D1 is defined as the recognition stability, and the ease of recognition is judged from the entire histogram. About 100 recognitions therefore suffice to obtain three digits of precision, which has the advantage that the judgment does not take long.

(Effects of the Invention) As described in detail above, the present invention defines as the recognition stability the geometric mean of the ratio (D2/D1) of the smallest distance among the other categories (D2) to the distance of the correct category (D1). Based on this recognition stability, it judges the aptitude of the speaker, the suitability of the utterance method, the suitability of the recognition word set, the need to re-register standard patterns, and the like. The judgment can therefore be made with few recognitions and in a short time, and the invention is well suited to voice recognition systems.

[Brief description of drawings]

FIG. 1 is a block diagram of a first embodiment of the present invention, FIG. 2 is a block diagram of a second embodiment of the present invention, FIG. 3 is a block diagram of a third embodiment of the present invention, FIG. 4 is a histogram of D2/D1, and FIG. 5 is a block diagram of a conventional voice recognition device.

1: voice input terminal; 2: recognition unit; 11: data register; 12: data register; 13: divider; 14: multiplier; 15: register; 16: Nth-root calculator; 17: pattern number counter; 18: output terminal; 21: logarithm calculator; 22: adder; 23: register; 24: divider; 25: exponential function calculator; 31, 32: logarithm calculators; 33: subtractor.

Claims

[Claims]

1. A voice recognition device comprising a standard pattern storage unit that stores a standard pattern for each voice category, a pattern matching unit that compares an input voice pattern with the standard patterns to obtain a distance representing the similarity between the two patterns, and a recognition determination unit that makes a recognition decision based on the magnitude of the distances obtained by the pattern matching unit, the voice recognition device being characterized by comprising: a first distance holding circuit that holds, for each pattern, the distance D1 to the standard pattern of the same category as the input voice; a second distance holding circuit that holds, for each pattern, D2, the smallest of the distances to the standard patterns of categories different from that of the input voice; a pattern number counting circuit that counts the number of recognized patterns; and a geometric mean calculation circuit that, based on the number of patterns obtained by the pattern number counting circuit, obtains the geometric mean of the D2/D1 values of the patterns.