JPS6332200B2

Info

Publication number: JPS6332200B2
Application number: JP55003871A
Authority: JP
Inventors: Hidefumi Ooga; Hidekazu Yabuchi
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1980-01-16
Filing date: 1980-01-16
Publication date: 1988-06-28
Also published as: JPS56101200A

Description

[Detailed description of the invention]

The present invention relates to a method for compressing speech patterns.

A registration-type speech recognition device has the speech to be recognized registered in it in advance. At recognition time, it compares the input speech pattern with the plurality of registered speech patterns and recognizes the input speech by finding the registered pattern that is most similar. Such a speech recognition device is described first.

A speech signal is converted into a speech pattern as shown in FIG. 1A. In FIG. 1, 100 is a microphone, 101 is an amplifier, and 102, 103, and 104 are filter banks. As shown in FIG. 1B, each of the filter banks 102, 103, and 104 consists of a band-pass filter 120, a rectifier circuit 121 that rectifies its output, and a low-pass filter 122. The output of the band-pass filter 120 is an AC component, and the rectifier circuit 121 and the low-pass filter 122 convert it to a DC level. The filter banks 102, 103, and 104 have different center frequencies. Three filters are shown in the figure, but in practice eight or more are provided. The following description takes the case of eight filters as an example.

105 is a selection circuit that selects one of the eight outputs of the filters 102 to 104 according to a control signal S from the capture control circuit 112. 106 is an analog-to-digital (A/D) converter. Driven by the signal from the oscillation circuit 111, the selection circuit 105 switches among the eight signal levels at regular intervals, and each level is converted to a digital signal by the A/D converter 106 and stored in the capture area 107. The speech signal, frequency-analyzed by the eight filters, is thus captured into the capture area 107 at regular intervals. When nothing is being uttered, the sum of the eight output levels is close to zero; during an utterance the sum becomes large. Accordingly, a fixed threshold is set: capture starts when the sum rises above this level and ends once the sum has stayed small for a fixed period, so that the frequency-analyzed form of the uttered word is stored in the capture area 107. This control is performed by the capture control unit 112.
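
The capture control described above can be sketched in software as follows. This is only a minimal illustration, not the circuit of FIG. 1: the function name `capture_utterance`, the parameters `threshold` and `hangover`, and the choice to discard the trailing quiet frames are all assumptions made for the example.

```python
from typing import List, Sequence


def capture_utterance(frames: Sequence[Sequence[float]],
                      threshold: float,
                      hangover: int) -> List[List[float]]:
    """Return the frames belonging to the first utterance in `frames`.

    Each frame holds the filter-bank levels sampled at one instant.
    `threshold` and `hangover` are hypothetical tuning values: capture
    starts when the level sum exceeds `threshold` and stops after the sum
    has stayed at or below it for `hangover` consecutive frames.
    """
    if hangover < 1:
        raise ValueError("hangover must be at least 1 frame")
    captured: List[List[float]] = []
    capturing = False
    quiet = 0  # consecutive low-level frames seen since capture started
    for frame in frames:
        loud = sum(frame) > threshold
        if not capturing:
            if loud:
                capturing = True
                captured.append(list(frame))
            continue
        captured.append(list(frame))
        quiet = 0 if loud else quiet + 1
        if quiet >= hangover:
            # Assumption: discard the quiet frames that triggered the stop.
            del captured[-hangover:]
            break
    return captured
```

A short pause inside a word (fewer than `hangover` quiet frames) does not end the capture, which corresponds to the requirement that the sum stay small "for a fixed period" before capture ends.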

108 is a normalization unit, which normalizes the outputs so that the sum of the outputs from the filters captured at the same instant is constant. That is,

$$a_1 = \frac{f_1}{\sum_{i=1}^{8} f_i} \times K,\quad a_2 = \frac{f_2}{\sum_{i=1}^{8} f_i} \times K,\quad \ldots,\quad a_8 = \frac{f_8}{\sum_{i=1}^{8} f_i} \times K \qquad (1)$$

where $K$ is a constant and $f_1, f_2, \ldots, f_8$ are the outputs from the respective filters. As a result, the sum of the normalized outputs is $K$, entirely independent of the magnitude of the speech signal. Because there are eight filters, the speech signal at a given instant can be represented by an eight-dimensional vector. Denoting the $n$-th captured speech signal by $a_n$, we have $a_n = (a_{n1}, a_{n2}, a_{n3}, \ldots, a_{n8})$, and the speech signal as a whole can be represented by the vector sequence $a_1, a_2, a_3, \ldots, a_n, \ldots, a_N$, where $N$ is the number of captured vectors. This sequence is called a speech pattern.
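
Equation (1) can be illustrated with a few lines of code. This is a sketch under stated assumptions: the function name `normalize_frame`, the default value of `k`, and the handling of an all-zero frame are not taken from the source.

```python
from typing import List, Sequence


def normalize_frame(levels: Sequence[float], k: float = 100.0) -> List[float]:
    """Scale one frame of filter outputs so that its components sum to k."""
    total = sum(levels)
    if total == 0:
        # Assumption: a silent frame is mapped to all zeros, since
        # equation (1) is undefined when the sum is zero.
        return [0.0] * len(levels)
    return [k * x / total for x in levels]
```

Two frames that differ only in loudness, such as `[1, 1, 2]` and `[2, 2, 4]`, normalize to the same vector, which is the independence from signal magnitude that the normalization is meant to provide.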

At registration time, the speech pattern is stored in the registration area 110 via path e in FIG. 1.

At recognition time, the input speech pattern is output to the distance calculation unit 109 via path γ. Pattern matching is performed against each speech pattern in the registration area to obtain a distance, and the reference pattern that is closest in distance is output as the recognition result for the input speech.

Because the same word differs in duration depending on the speaking rate, some form of normalization along the time axis is required. A known normalization method uses dynamic programming to expand and contract the time axis nonlinearly.

First, the registered pattern $a_1, a_2, a_3, \ldots, a_n, \ldots, a_N$ and the input pattern $b_1, b_2, b_3, \ldots, b_m, \ldots, b_M$ are placed on the x-axis and y-axis of a lattice graph, as shown in FIG. 2. Along the $i$-th path through this lattice, the distance $d(m, n)$ between $a_n$ and $b_m$ is obtained at each point on the path, and the weighted average of these $d(m, n)$ values, taken with respect to weights defined on that path, is defined as the distance $D(i)$ between the registered pattern and the input pattern with respect to path $i$.

The distance $d(m, n)$ between $a_n$ and $b_m$ is commonly defined by what is called the city-block distance.

That is, in this case

$$d(m, n) = \lVert a_n - b_m \rVert = \sum_{i=1}^{8} \lvert a_{ni} - b_{mi} \rvert$$

where $a_n = (a_{n1}, a_{n2}, \ldots, a_{n8})$ and $b_m = (b_{m1}, b_{m2}, \ldots, b_{m8})$.
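
As a concrete instance of this definition, the city-block distance is simply the sum of absolute component differences. The function name `city_block` is chosen here for illustration only.

```python
from typing import Sequence


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between corresponding components."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because it needs only subtractions, absolute values, and additions, this distance is inexpensive to compute for each lattice point.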

Many paths are conceivable for comparing the two patterns, as shown in FIG. 2. FIG. 2 shows three paths as examples, but various others are possible. Pattern matching using dynamic programming is one technique for efficiently finding, among these paths, the path for which the distance $D(i)$ is smallest (the path denoted with subscript 0). The distance with respect to this path is taken as the distance between the two patterns. (Dynamic programming itself lies outside the subject of the present invention and is not described further.)
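
Since the source omits the dynamic-programming step, the following is only a sketch of one common formulation, not the recurrence used by the inventors. The step weights (1 for horizontal and vertical steps, 2 for diagonal steps), the normalization by N + M, and the names `dtw_distance` and `window` are assumptions.

```python
import math
from typing import Optional, Sequence

Vector = Sequence[float]


def city_block(a: Vector, b: Vector) -> float:
    """City-block distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dtw_distance(ref: Sequence[Vector], inp: Sequence[Vector],
                 window: Optional[int] = None) -> float:
    """Weighted-average distance along the best path through the lattice.

    Assumed recurrence (not from the source): horizontal and vertical steps
    weigh 1, diagonal steps weigh 2, so every path has total weight N + M.
    `window` optionally limits |n - m|; if it excludes the end point the
    result is infinity.
    """
    n_len, m_len = len(ref), len(inp)
    if n_len == 0 or m_len == 0:
        raise ValueError("patterns must not be empty")
    # g[n][m] holds the smallest accumulated weighted distance to (n, m).
    g = [[math.inf] * (m_len + 1) for _ in range(n_len + 1)]
    g[0][0] = 0.0
    for n in range(1, n_len + 1):
        for m in range(1, m_len + 1):
            if window is not None and abs(n - m) > window:
                continue
            d = city_block(ref[n - 1], inp[m - 1])
            g[n][m] = min(g[n - 1][m] + d,
                          g[n][m - 1] + d,
                          g[n - 1][m - 1] + 2 * d)
    return g[n_len][m_len] / (n_len + m_len)
```

With this formulation a pattern and a time-stretched copy of it, for example `[[0], [1]]` and `[[0], [0], [1], [1]]`, have distance zero, which is the purpose of nonlinear time normalization.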

Pattern matching is performed as described above, the registered pattern with the smallest distance (that is, the greatest similarity) is selected, and the input pattern is thereby identified.
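
The selection step amounts to a nearest-pattern search over the registration area. In the sketch below, the function name `recognize`, the dictionary layout of the registered patterns, and the `distance` callback are hypothetical; any pattern-to-pattern distance, such as a dynamic-programming distance, can be passed in.

```python
from typing import Callable, Dict, Sequence, Tuple

Pattern = Sequence[Sequence[float]]


def recognize(input_pattern: Pattern,
              registered: Dict[str, Pattern],
              distance: Callable[[Pattern, Pattern], float]) -> Tuple[str, float]:
    """Return the label of the closest registered pattern and its distance."""
    if not registered:
        raise ValueError("no registered patterns")
    best_word = min(registered,
                    key=lambda w: distance(registered[w], input_pattern))
    return best_word, distance(registered[best_word], input_pattern)
```

The cost of this search grows in proportion to the number of registered words, which is why shortening each pattern reduces recognition time.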

In such a method, the capacity required for the registration area 110 and for the input pattern is as follows. Let the capture interval be $T_t$ seconds, the maximum length of the input speech be $T_i$ seconds, and the number of filters be $F$. Then $(T_i / T_t) \times F$ samples is the memory capacity required to store the input pattern, and the capacity of the registration area is this value multiplied by the number $K$ of spoken words to be recognized. The processing time required for pattern matching is roughly

$$(T_d + T_D) \times (\bar{T}_i / T_t) \times K \times W$$

where $T_d$ is the time needed to compute the distance $d(m, n)$ between two vectors, $T_D$ is the time needed to evaluate the dynamic-programming recurrence once for one lattice point, and $\bar{T}_i$ is the average of $T_i$. $W$ is a value called the matching window and is 2 or greater.
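
To give a feel for these formulas, the sketch below evaluates them for made-up figures. All numeric values, the function names, and the use of integer milliseconds and microseconds are assumptions for illustration; none of them come from the source.

```python
def pattern_samples(max_len_ms: int, interval_ms: int, filters: int) -> int:
    """(Ti / Tt) x F: samples needed to store one pattern."""
    return (max_len_ms // interval_ms) * filters


def registration_samples(max_len_ms: int, interval_ms: int,
                         filters: int, words: int) -> int:
    """Capacity of the registration area: one pattern per word."""
    return pattern_samples(max_len_ms, interval_ms, filters) * words


def matching_time_us(td_us: int, tD_us: int, avg_len_ms: int,
                     interval_ms: int, words: int, window: int) -> int:
    """(Td + TD) x (average Ti / Tt) x K x W, in microseconds."""
    return (td_us + tD_us) * (avg_len_ms // interval_ms) * words * window


# Hypothetical figures: Tt = 10 ms, Ti = 1 s, F = 8, K = 32 words,
# average length 600 ms, Td = 20 us, TD = 10 us, W = 5.
ONE_PATTERN = pattern_samples(1000, 10, 8)
REGISTRATION = registration_samples(1000, 10, 8, 32)
MATCH_TIME_US = matching_time_us(20, 10, 600, 10, 32, 5)
```

With these illustrative figures one pattern needs 800 samples, the registration area 25,600 samples, and one recognition about 288,000 microseconds. Halving the pattern length would halve each of these.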

The present invention aims, through pattern compression, to reduce the capacity of the registration area and to shorten the processing time required for pattern matching.

In the present invention, the distance between two adjacent vectors in the normalized pattern is obtained. When the distance is large, both vectors are kept as they are; when it is small, one of the two vectors, or their average, is taken as a new vector. The pattern is compressed in this way. FIG. 3 illustrates the procedure. Let the speech pattern be $a_1, a_2, a_3, \ldots, a_n, \ldots, a_N$ (denoted $P_0$). The distance $d_{1,2}$ between $a_1$ and $a_2$ is obtained; when it is greater than a threshold $S$, both $a_1$ and $a_2$ are kept, and when it is smaller than $S$, only one of $a_1$ and $a_2$ is kept. In FIG. 3, $d_{1,2}$ is greater than $S$, so $a_1$ and $a_2$ are both kept. Next, the distance $d_{3,4}$ between $a_3$ and $a_4$ is obtained; in the figure $d_{3,4}$ is smaller than $S$, so $a_3$ is kept. The same processing is applied to the remaining pairs. The compressed pattern sequence is denoted $P_1$. If every $d_{n,n+1}$ were smaller than $S$, the pattern would be halved in length. Because it is extremely rare for every $d_{n,n+1}$ to be smaller than $S$, the length in practice is between 1/2 and 1 times the original. In vowel segments $d_{n,n+1}$ is generally smaller than $S$, and since nearly every speech pattern contains vowel segments, it is rare for no compression to occur at all. In consonant segments the signal changes rapidly and $d_{n,n+1}$ takes large values, so in effect the consonant segments are left intact and the vowel segments are compressed.
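
One compression pass over non-overlapping pairs can be sketched as follows. The names `compress_once` and `city_block` are illustrative, keeping the first vector of a close pair follows the FIG. 3 example (where $a_3$ is kept), and passing an unpaired last vector through unchanged is an assumption for odd-length patterns.

```python
from typing import List, Sequence

Vector = Sequence[float]


def city_block(a: Vector, b: Vector) -> float:
    """City-block distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def compress_once(pattern: Sequence[Vector], threshold: float) -> List[Vector]:
    """Apply one pass: pairs (1,2), (3,4), ... are examined in turn."""
    out: List[Vector] = []
    for k in range(0, len(pattern) - 1, 2):
        first, second = pattern[k], pattern[k + 1]
        if city_block(first, second) < threshold:
            out.append(first)            # close pair: keep only one vector
        else:
            out.extend((first, second))  # distant pair: keep both
    if len(pattern) % 2 == 1:
        out.append(pattern[-1])          # assumption: unpaired vector is kept
    return out
```

For example, with a threshold of 2 the pattern `[[0], [5], [5], [5], [9]]` becomes `[[0], [5], [5], [9]]`: the first pair differs by 5 and is kept whole, while the second pair is identical and collapses to a single vector.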

Repeating the above operation several times compresses the pattern further. Applying the same processing to the pattern sequence $P_1$ of FIG. 3 yields the compressed pattern $P_2$, compressing $P_2$ in turn yields $P_3$, and so on. The compression ratio increases with each pass. Ultimately the pattern becomes one in which the distances between adjacent vectors are all greater than $S$, and the unnecessary vectors have been removed.
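
Repeated passes can be sketched by feeding each result back into the same routine. The function names, the `passes` parameter, and the early exit when a pass removes nothing are assumptions for this example; the source fixes the number of passes by experiment.

```python
from typing import List, Sequence

Vector = Sequence[float]


def city_block(a: Vector, b: Vector) -> float:
    """City-block distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def compress_once(pattern: Sequence[Vector], threshold: float) -> List[Vector]:
    """One pass over non-overlapping pairs, keeping the first of a close pair."""
    out: List[Vector] = []
    for k in range(0, len(pattern) - 1, 2):
        first, second = pattern[k], pattern[k + 1]
        if city_block(first, second) < threshold:
            out.append(first)
        else:
            out.extend((first, second))
    if len(pattern) % 2 == 1:
        out.append(pattern[-1])
    return out


def compress(pattern: Sequence[Vector], threshold: float,
             passes: int) -> List[Vector]:
    """Produce P1, P2, ... up to `passes` times, stopping early if stable."""
    current = list(pattern)
    for _ in range(passes):
        shorter = compress_once(current, threshold)
        if len(shorter) == len(current):
            break  # this pass removed nothing, so later passes would not either
        current = shorter
    return current
```

A pattern of four identical vectors followed by four different identical vectors shrinks to four vectors after one pass and to two after the second, after which no further pass changes it.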

In practice, however, the durations of the vowel and consonant segments are themselves necessary information, so care should be taken not to compress too far. The number of compression passes is determined experimentally, by checking the recognition results for the words to be recognized. For example, when trying to distinguish "kii" (KEY) from "ki" (KE), excessive compression removes the "i" portion of "kii", making the two hard to tell apart. When the vocabulary contains words like these, which become hard to distinguish once their vowel segments are compressed, the number of passes should be kept small.

Like the number of compression passes, the threshold $S$ is also determined experimentally.

FIG. 4 shows an example of a specific circuit configuration of the compression device of the present invention, and FIG. 5 is a flowchart explaining its operation. 400 is a shift register, which is assumed to hold the speech pattern $a_1, a_2, \ldots, a_N$ to be compressed, as shown in the figure. E is a code indicating the end of the pattern. 401 is likewise a shift register, in which the compressed pattern is stored. 402 and 403 generate the shift clocks CLK₁ and CLK₂; they generate clocks in response to signals from the control circuit 407 and the compression determination circuit 406, and shift the shift registers 400 and 401 independently of each other. 404 is a distance calculation unit that calculates the distance $d$ from the contents of stages R₂ and R₁ of the shift register 400. 405 is a circuit that detects the end code E. 406 is a compression determination circuit that detects whether the distance $d$ is greater or smaller than the threshold $S$.

The operation is described below with reference to FIG. 5.

The end code detection unit 405 performs operations 450 and 451. If no end code is found, the distance calculation unit 404 calculates the distance $d$ from the contents of R₁ and R₂, and the compression determination circuit 406 determines whether it is greater or smaller than the threshold $S$. If $d < S$, a signal from 406 causes the clock generation circuits 402 and 403 each to generate one clock (shown at 452), transferring the contents of R₁ to 401 and shifting 400 one position to the right. Then, through operation 453, the control circuit 407 shifts 400 once more to the right so that the distance between the next pair of adjacent vectors can be calculated. If $d \ge S$, the clocks CLK₁ and CLK₂ are generated twice, as shown at 454 and 455, transferring the contents of R₁ and R₂ to the shift register 401 and shifting the shift register 400 twice, so that the distance between the next pair of adjacent vectors can be calculated. If the content of R₁ is the end code E, only the end code is transferred to the shift register 401 (shown at 456). If R₂ is the end code, the contents of R₁ and R₂ are transferred to the shift register 401 (shown at 457 and 458).
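
The flow of FIG. 5 can be mimicked in software by treating the two shift registers as queues. This is a behavioral sketch only: the names `run_compressor` and `END`, the use of a `deque` for register 400 and a list for register 401, and the city-block distance are assumptions, and the step numbers in the comments are those quoted in the text above.

```python
from collections import deque
from typing import List, Sequence

END = "E"  # stand-in for the end code E


def run_compressor(vectors: Sequence[Sequence[float]],
                   threshold: float) -> List[object]:
    """Simulate one compression pass; returns the contents of register 401."""
    src = deque(list(vectors) + [END])  # register 400: R1 is src[0], R2 is src[1]
    dst: List[object] = []              # register 401
    while True:
        r1 = src[0]
        if r1 is END:                   # 456: only the end code is transferred
            dst.append(src.popleft())
            break
        r2 = src[1]
        if r2 is END:                   # 457, 458: R1 and the end code are transferred
            dst.append(src.popleft())
            dst.append(src.popleft())
            break
        d = sum(abs(x - y) for x, y in zip(r1, r2))
        if d < threshold:
            dst.append(src.popleft())   # 452: R1 moves to 401, 400 shifts once
            src.popleft()               # 453: 400 shifts again, dropping R2
        else:
            dst.append(src.popleft())   # 454: first transfer and shift
            dst.append(src.popleft())   # 455: second transfer and shift
    return dst
```

Because the end code is carried into the output, the result can be loaded back into the source queue unchanged for another pass.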

Compression is achieved by the operation described above.

Transferring the contents of the shift register 401 back to the shift register 400 and performing the above operation again compresses the pattern further. Alternatively, the circuit may be configured so that the roles of the shift registers 401 and 400 are exchanged.

The control circuit 407 receives the signal from the compression determination circuit 406 and instructs 402 to perform operation 453. It also terminates the compression operation in response to the end code detection circuit 405 and notifies other circuits of the end of compression by an END signal. On receiving the START signal, it instructs each circuit to begin operation.

When the method of the present invention is used in a speech recognition device, the compression function described above is applied to the output of the normalization unit 108 in FIG. 1. At registration time the compressed pattern is registered as it is; the input pattern is compressed in the same way and then pattern-matched against the patterns in the registration area.

When the distance between two adjacent vectors is small, FIGS. 3 and 4 keep one of the two, but the two vectors may instead be averaged into a single vector. When $S$ is relatively large, taking the average of the two vectors is more faithful to the original pattern. The averaging is defined as follows.

$$a'_{ni} = \frac{a_{ni} + a_{n+1,i}}{2}$$

where $i$ is the filter channel number, and $a'_n = [a'_{n1}, a'_{n2}, \ldots, a'_{ni}, \ldots, a'_{nI}]$ is the averaged vector.
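
The averaging variant changes only what is stored for a close pair. In the sketch below, the names `average_vectors` and `compress_once_avg` are illustrative, and the unpaired last vector of an odd-length pattern is again assumed to pass through unchanged.

```python
from typing import List, Sequence

Vector = Sequence[float]


def average_vectors(a: Vector, b: Vector) -> List[float]:
    """Channel-by-channel mean of two vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def compress_once_avg(pattern: Sequence[Vector],
                      threshold: float) -> List[Vector]:
    """One pass in which a close pair is replaced by its average."""
    out: List[Vector] = []
    for k in range(0, len(pattern) - 1, 2):
        first, second = pattern[k], pattern[k + 1]
        d = sum(abs(x - y) for x, y in zip(first, second))
        if d < threshold:
            out.append(average_vectors(first, second))
        else:
            out.extend((first, second))
    if len(pattern) % 2 == 1:
        out.append(pattern[-1])  # assumption: unpaired vector is kept
    return out
```

If both input vectors sum to the same normalization constant, their average does as well, so the averaged pattern remains normalized.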

As described above, the method of the present invention reduces the capacity required for the registration area.

The processing time for pattern matching can also be shortened. Because the aforementioned $(T_i / T_t)$ becomes smaller, the number of evaluations of the distance $d(m, n)$ and of the recurrence decreases. Computing $d_{n,n+1}$ during compression adds some work, but compression reduces the number of $d(m, n)$ computations during pattern matching, so the overall processing time is shorter. The larger the recognition vocabulary, the greater the benefit of compression.
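
This trade-off can be made explicit with a rough estimate. The symbols $r$ (the ratio of compressed to original pattern length, between 1/2 and 1 after one pass) and $n$ (the number of vectors in the input pattern before compression) are introduced here for illustration and do not appear in the original. The estimate also assumes that the matching window $W$ is unchanged and that only one compression pass of the input pattern is counted, costing at most $n/2$ distance computations.

$$\underbrace{(T_d + T_D)\,(1 - r)\,n\,K\,W}_{\text{matching time saved}} \;>\; \underbrace{T_d\,\frac{n}{2}}_{\text{compression cost}} \quad\Longleftrightarrow\quad (T_d + T_D)\,(1 - r)\,K\,W \;>\; \frac{T_d}{2}$$

Under these assumptions the left side grows with the vocabulary size $K$ while the right side does not, which is consistent with the benefit increasing as more words are registered.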

Patterns that differ only in their consonant segments can also be discriminated more accurately. For example, SAGA and KAGA (the place names Saga and Kaga) differ only in the initial S and K, so the distance between the two uncompressed patterns is extremely small. After compression, the vowel segments are shortened and the S accounts for a larger share of SAGA; likewise the K accounts for a larger share of KAGA. As a result, the distance between the two patterns becomes larger, and these similar-sounding words can be distinguished more accurately.

In this embodiment the degree of difference between two feature vectors has been described in terms of distance, so a large value means the two differ greatly and a small value means they differ little. A similarity measure may of course be used instead; it expresses how alike the two are, so a large value means they differ little and a small value means they differ greatly. The principle of the present invention is essentially the same whichever concept is used.

[Brief explanation of drawings]

FIG. 1A is a block diagram showing the configuration of a speech recognition device, and FIG. 1B is a block diagram of part of it. FIG. 2 is a diagram explaining pattern matching in speech recognition. FIG. 3 is a diagram explaining an example of the compression method of the present invention. FIG. 4 is a block diagram showing an example configuration of the device of the present invention, and FIG. 5 is a flowchart explaining its operation.

400, 401: shift registers; 402, 403: shift clock generation circuits; 404: distance calculation unit; 405: end code detection unit; 406: compression determination unit; 407: control unit.

Claims

[Claims]

1. A pattern compression device comprising: first storage means for holding a pattern represented by a sequence of feature vectors; distance calculation means for calculating the distance between adjacent vectors, taken in non-overlapping pairs; compression determination means for determining whether the distance calculated by the distance calculation means exceeds a predetermined threshold; and second storage means for storing, when the distance does not exceed the threshold, either one of the two adjacent vectors or the average of the two in place of the two adjacent vectors, and for storing the two adjacent vectors as they are when the distance exceeds the threshold; wherein the contents of the second storage means are transferred to the first storage means and the same processing is repeated using the above means, thereby reducing the number of feature vectors constituting the original pattern.