JPH0679232B2

JPH0679232B2 - Voice recognizer

Info

Publication number: JPH0679232B2
Application number: JP59173083A
Authority: JP
Inventors: 哲也室井
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-08-20
Filing date: 1984-08-20
Publication date: 1994-10-05
Anticipated expiration: 2009-10-05
Also published as: JPS6151198A

Description

[Technical field]

The present invention relates to a voice recognition device that performs pattern matching in voice recognition.

[Prior art]

Voice recognition is needed to operate equipment by voice, and voice recognition devices are being actively developed. The form most widely used at present is word recognition, in which the feature pattern of the input voice is compared with the feature patterns of word utterances registered in advance, and a similar pattern is selected as the recognition result. Methods have been proposed for removing the influence of pattern variation in this comparison, and they fall broadly into those that use nonlinear expansion and contraction and those that use linear expansion and contraction. The former remove the influence of pattern variation accurately but require a large amount of computation; the latter require little computation but are less accurate than the former.

[Object]

The present invention has been made in view of the circumstances described above, and its object is, in particular, to provide a voice recognition device that performs highly accurate pattern matching with a small amount of computation.

[Configuration]

To achieve the above object, the present invention is characterized by comprising: a conversion unit that converts input voice into feature parameters; a dictionary storage unit that stores the feature parameters converted by the conversion unit together with the positions of portions in which the feature parameter values are smaller than a predetermined value; a silent section determination unit that, when unknown voice is input, finds the positions of portions in which the feature parameter values obtained at the same time are smaller than the predetermined value; and a matching unit that, when the feature parameters of the input voice contain a portion smaller than the predetermined value, associates that portion with the position, stored in the dictionary storage unit, of the portion in which the feature parameter values are smaller than the predetermined value, linearly expands or contracts the segments before and after that portion so that the data being compared have equal length, and then compares and matches them.

FIG. 1 is a configuration diagram for explaining an embodiment of the present invention. In the figure, 1 is a microphone, 2 is a bandpass filter group, 3 is a silent section determination unit, 4 is a matching unit, and 5 is a dictionary storage unit. To achieve the above object, the present invention finds, in a voice pattern converted into feature parameters, any portion in which all parameters are simultaneously at or below a certain value, that is, a silent section, and records it together with the registered pattern. If a similar portion exists in an unknown input pattern, that is, if a silent section exists and lies at a similar position on the feature pattern, the registered pattern and the unknown input pattern are compared and matched with these portions associated with each other.

In the embodiment shown in FIG. 1, a bandpass filter group is used to extract the feature parameters: voice input from the microphone 1 is converted into a voice pattern by the bandpass filter group 2. The silent section determination unit 3 finds the portions in which the output of every filter is at or below a certain value and stores their positions. As one example, the threshold may be set to each filter's output value for the noise present when there is no input voice; if a portion at or below this threshold occurs in the middle of the input voice, that portion is regarded as a silent section. The matching unit 4 associates the silent sections of the input and of the dictionary 5 with each other and matches the portions before and after them by linearly expanding or contracting each one separately.
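The silent section determination described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the circuit of the embodiment: the function names, the representation of a voice pattern as a list of frames (one list of filter outputs per frame), and the choice of the per-filter maximum over noise-only frames as the threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple


def noise_thresholds(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-filter threshold taken from frames recorded with no input voice.

    Assumption: the maximum noise output of each filter is used as its threshold.
    """
    return [max(column) for column in zip(*noise_frames)]


def find_silent_sections(
    frames: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of silent sections.

    A frame is silent when every filter output is at or below its threshold.
    Only silence in the middle of the utterance is reported; leading and
    trailing silence is ignored.
    """
    silent = [all(v <= t for v, t in zip(frame, thresholds)) for frame in frames]
    voiced = [i for i, s in enumerate(silent) if not s]
    if not voiced:
        return []
    first, last = voiced[0], voiced[-1]

    sections = []
    start = None
    for i in range(first, last + 1):
        if silent[i] and start is None:
            start = i                      # a silent run begins
        elif not silent[i] and start is not None:
            sections.append((start, i))    # the run ended at the previous frame
            start = None
    return sections
```

For a two-filter pattern such as `[[5, 6], [4, 7], [1, 1], [0, 1], [6, 5], [7, 4]]` with thresholds `[1, 1]`, the middle two frames form the only silent section.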

FIGS. 2 and 3 illustrate the difference between the conventional method based on linear expansion and contraction (FIG. 2) and the matching method of the present invention (FIG. 3); both use the word 下 (/sita/) as an example. As FIG. 2 shows, when the whole word is linearly expanded or contracted for matching, the correspondence is poor: for example, the /t/ portion of the input ends up aligned with the silent section of the dictionary pattern. As FIG. 3 shows, when the silent sections are associated with each other and the portions before and after them are linearly expanded or contracted for matching, the /si/ of the input can be matched with the /si/ of the dictionary, and the /ta/ with the /ta/, by linear expansion and contraction.
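The contrast between whole-word and segment-wise linear matching can be illustrated with a small sketch. Everything here is an assumption made for the example: the function names, nearest-frame sampling as the form of linear expansion and contraction, the city-block distance, and the decision to leave the silent sections themselves out of the distance. The text does not specify any of these details.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def stretch(frames: Sequence[Frame], n: int) -> List[Frame]:
    """Linearly map a frame sequence onto n frames (nearest-frame sampling)."""
    if n <= 0 or not frames:
        return []
    if n == 1:
        return [frames[0]]
    m = len(frames)
    return [frames[int(j * (m - 1) / (n - 1) + 0.5)] for j in range(n)]


def distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """City-block distance between two equal-length frame sequences."""
    return sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def whole_word_distance(inp: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Conventional approach: stretch the whole input to the dictionary length."""
    return distance(stretch(inp, len(ref)), ref)


def segmented_distance(
    inp: Sequence[Frame], inp_sil: Tuple[int, int],
    ref: Sequence[Frame], ref_sil: Tuple[int, int],
) -> float:
    """Align the silent sections, then stretch each side separately.

    inp_sil and ref_sil are (start, end) frame indices, end exclusive, and are
    assumed to lie strictly inside their patterns so both sides are non-empty.
    """
    (i0, i1), (r0, r1) = inp_sil, ref_sil
    before = distance(stretch(inp[:i0], r0), ref[:r0])
    after = distance(stretch(inp[i1:], len(ref) - r1), ref[r1:])
    return before + after
```

With a one-filter dictionary pattern `[[1], [1], [0], [0], [2], [2]]` (silence at frames 2 to 3) and an input `[[1], [1], [1], [1], [0], [2], [2]]` (silence at frame 4), whole-word stretching aligns a voiced input frame with dictionary silence, while the segment-wise version matches both sides exactly.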

As FIGS. 2 and 3 also show, if a span bounded by the start or end point of the utterance and a silent section is regarded as one section of voice, it is known that its length stretches or shrinks in time by about ±30% as long as the same person is speaking. Given this fact, it follows readily that "a silent section exists at a similar place" means that, in the two patterns being compared, the positions of the silent section measured from the start of the voice lie within ±30% of each other. A silent section at a similar position therefore means a silent section whose position is within ±30% of the length from the beginning or the end of the pattern to that point. Accordingly, in the present invention only silent sections within this range are associated with each other for matching; otherwise this kind of matching is not performed.
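The ±30% criterion might be checked along the following lines. This is one possible reading, not a definitive one: the function names are hypothetical, the tolerance is applied relative to the shorter of the two lengths so that each length is within 30% of the other, and measuring from either the beginning or the end of the pattern is treated as sufficient.

```python
from typing import Tuple


def within_tolerance(a: float, b: float, tol: float = 0.3) -> bool:
    """True when two lengths differ by no more than tol times the shorter one."""
    return abs(a - b) <= tol * min(a, b)


def silent_sections_correspond(
    inp_len: int, inp_sil: Tuple[int, int],
    ref_len: int, ref_sil: Tuple[int, int],
    tol: float = 0.3,
) -> bool:
    """Decide whether two silent sections lie at similar positions.

    inp_sil and ref_sil are (start, end) frame indices, end exclusive.
    Assumption: agreement measured from the beginning OR from the end of the
    pattern is enough for the sections to be associated.
    """
    head_ok = within_tolerance(inp_sil[0], ref_sil[0], tol)
    tail_ok = within_tolerance(inp_len - inp_sil[1], ref_len - ref_sil[1], tol)
    return head_ok or tail_ok
```

When the check fails, the silent sections would not be associated, and the segment-wise matching would simply not be applied to that pair of patterns.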

[Effects]

As is clear from the above description, the present invention finds, in an input voice pattern converted into feature parameters, silent sections in which all parameters are simultaneously at or below a certain value, and records them together with the registered pattern. When a silent section exists at a similar position in an unknown input pattern, the silent sections of the registered pattern and of the unknown input pattern are associated with each other before comparison and matching, so highly accurate pattern matching becomes possible with a small amount of computation.

[Brief description of drawings]

FIG. 1 is a configuration diagram for explaining an embodiment of the voice recognition method according to the present invention, FIG. 2 is a diagram showing an example of conventional word recognition, and FIG. 3 is a diagram showing an example of word recognition according to the present invention.

1: microphone, 2: bandpass filter group, 3: silent section determination unit, 4: matching unit, 5: dictionary storage unit.

Claims

[Claims]

1. A voice recognition device comprising: a conversion unit that converts input voice into feature parameters; a dictionary storage unit that stores the feature parameters converted by the conversion unit together with the positions of portions in which the feature parameter values are smaller than a predetermined value; a silent section determination unit that, when unknown voice is input, finds the positions of portions in which the feature parameter values obtained at the same time are smaller than the predetermined value; and a matching unit that, when the feature parameters of the input voice contain a portion smaller than the predetermined value, associates that portion with the position, stored in the dictionary storage unit, of the portion in which the feature parameter values are smaller than the predetermined value, linearly expands or contracts the segments before and after that portion so that the data being compared have equal length, and then compares and matches them.