JPH0636155B2 - Pattern comparison device

Info

Publication number: JPH0636155B2
Application number: JP60151731A
Authority: JP
Inventors: 英一坪香
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1985-07-09
Filing date: 1985-07-09
Publication date: 1994-05-11
Anticipated expiration: 2009-05-11
Also published as: JPS6210698A

Description

Field of industrial application

The present invention relates to a pattern comparison device that compares an input pattern against a plurality of registered patterns in order to identify the input pattern, and in particular to a pattern comparison device applicable to tasks such as the recognition of continuously spoken words.

Prior art

Speech is the most natural means by which humans produce information. For speech to realize its full value as an input means for human-machine systems, it is desirable that ordinary continuous conversational speech can be recognized without restricting the speaker.

FIG. 2 is a block diagram of a speech recognition device that uses the word as its recognition unit. (21) is the input terminal for the speech signal. (22) is an acoustic analysis unit that converts the input speech signal into a sequence of sets of numerical values (feature vectors) by frequency analysis, LPC analysis, PARCOR analysis, correlation analysis, or the like. (23) is a standard pattern storage unit in which each word to be recognized is registered as a sequence of such feature vectors. (24) is a pattern matching unit that compares the feature vector sequence of the input speech signal, as produced by the acoustic analysis unit (22), with each of the standard patterns and calculates the distance or similarity between them. (25) is a decision unit that, based on the results from the pattern matching unit (24), selects the word corresponding to the standard pattern closest to the input speech pattern as the recognition result, and (26) is the output terminal for that result. In a speech recognition device of this configuration, an excellent pattern matching method is DP matching, which uses dynamic programming to align the patterns by nonlinear expansion and contraction of the time axis.

The device of the present invention uses this DP matching for continuous word recognition, so the DP matching algorithm is outlined briefly here. Let

$$A = a_1 a_2 \cdots a_i \cdots a_I, \qquad B = b_1 b_2 \cdots b_j \cdots b_J \tag{1}$$

be two speech patterns; that is, each pattern is represented by a sequence of feature vectors a_i or b_j.

Let d(i,j) be the distance between vectors a_i and b_j. For each possible correspondence between the vectors of the two sequences, a weighted average of d(i,j) is computed. The correspondence that minimizes this weighted average is taken as the optimal correspondence between the two sequences, and the minimum value is taken as the distance D(A,B) between them. DP matching carries out this procedure efficiently by dynamic programming. The Euclidean distance or the city-block distance between a_i and b_j is normally used for d(i,j).

FIG. 3 illustrates this in two dimensions. The time correspondence between patterns A and B, that is, the time warping function j(i), is represented by a sequence of lattice points c(k) = (i(k), j(k)) on the i-j plane:

$$F = c(1)\,c(2) \cdots c(k) \cdots c(K), \qquad i(K) = I,\ j(K) = J \tag{2}$$

D(A,B) is then defined as the minimum weighted average of d over all such sequences:

$$D(A,B) = \min_F \left[ \frac{\sum_{k=1}^{K} d(c(k))\,w(k)}{\sum_{k=1}^{K} w(k)} \right] \tag{3}$$

Here w(k) is a non-negative constant whose value is determined by the scheme used to approximate the time warping function j(i) by a sequence of points. If the denominator of equation (3) is a constant M that does not depend on F, D(A,B) can be obtained efficiently by dynamic programming. That is, since

$$g(c(k)) = \min_{c(k-1)} \left[ g(c(k-1)) + d(c(k))\,w(k) \right] \tag{4}$$

setting g(c(1)) = g(1,1) = d(1,1) and solving recurrence (4) yields g(c(K)) = g(I,J), from which D(A,B) is obtained as

$$D(A,B) = \frac{g(I,J)}{M} \tag{5}$$

There are two ways to make the denominator of equation (3) constant: the symmetric type, in which M = I + J, and the asymmetric type, in which M = I or M = J. FIGS. 4(a) to 4(f) show examples of the constraints imposed when choosing the point sequence F; a path may reach point (i,j) only along the arrows shown in the figure. The number on each line segment is the weight w(k) applied when that segment is chosen as part of the path. (a) and (b) are symmetric examples with M = I + J, and (c) to (f) are asymmetric examples with M = I.
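The asymmetric case can be sketched in Python as follows. This is an illustration only: FIG. 4 is not reproduced here, so the set of allowed predecessors (i-1, j), (i-1, j-1), (i-1, j-2) with weight 1 on every step is an assumption, and the names `city_block` and `dp_match_asymmetric` are hypothetical. With that step set every path from (1,1) to (I,J) visits exactly I lattice points, so the normalization constant is M = I.

```python
from math import inf

def city_block(a, b):
    """City-block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def dp_match_asymmetric(A, B, dist=city_block):
    """Return g(I, J) / M with M = I for an assumed asymmetric step set."""
    I, J = len(A), len(B)
    # g[i][j] is the minimum cumulative distance at lattice point (i+1, j+1).
    g = [[inf] * J for _ in range(I)]
    g[0][0] = dist(A[0], B[0])  # initial condition g(1,1) = d(1,1)
    for i in range(1, I):
        for j in range(J):
            # Assumed predecessors: (i-1, j), (i-1, j-1), (i-1, j-2).
            best = min(g[i - 1][j - s] for s in (0, 1, 2) if j - s >= 0)
            if best < inf:
                g[i][j] = best + dist(A[i], B[j])  # every step has weight 1
    return g[I - 1][J - 1] / I
```

If no admissible path reaches (I,J), the function returns infinity, which a caller can treat as "no match".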

Word speech is recognized with this matching method as follows. Let n (n = 1 to N) denote the word classes to be recognized and B^n the standard pattern of class n. The distance D_n = D(A, B^n) between the input A and each standard pattern B^n is computed by the method above, and the class n_0 that gives

$$D_{n_0} = \min_n D_n$$

is taken as the recognition result for A.
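The decision rule above amounts to an arg-min over the registered standard patterns. A minimal Python sketch follows; `recognize` and `toy_distance` are hypothetical names, and `toy_distance` is only a stand-in for the DP matching distance D(A, B^n) so that the example runs on its own.

```python
def toy_distance(A, B):
    """Stand-in for D(A, B): frame-wise absolute difference of scalar frames."""
    return sum(abs(a - b) for a, b in zip(A, B))

def recognize(A, templates, distance=toy_distance):
    """Return the class n0 whose standard pattern minimizes distance(A, B^n)."""
    return min(templates, key=lambda n: distance(A, templates[n]))

# Usage: two word classes with scalar "feature vectors".
templates = {"one": [0, 0], "two": [5, 5]}
result = recognize([4, 6], templates)
```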

If the asymmetric DP matching is arranged so that M = I, M depends only on the length of the input pattern and, in equation (5), is the same for every standard pattern. The distance can therefore be defined as

$$D(A,B) = g(I,J) \tag{6}$$

To evaluate equation (6) under the constraint of FIG. 4(c), the following recurrence (7) is calculated.

Initial condition: g(1,1) = d(1,1)

Recognition of continuous word speech is described next. It can be formulated as follows. Let A denote the speech pattern produced when X words q(1), q(2), …, q(X) are uttered in succession.

$$A = a_1 a_2 \cdots a_i \cdots a_I \tag{8}$$

Let B_q(x) denote the standard pattern of word q(x), itself a sequence of feature vectors (equation (9)). The standard pattern obtained by connecting the X word patterns B_q(1), B_q(2), …, B_q(X) in sequence is called the concatenated standard pattern (equation (10)).

Continuous word speech recognition thus becomes the problem of running DP matching between the concatenated standard pattern and the input speech pattern A, and choosing the number of words X and the words q(x) (x = 1, 2, …, X) so that the resulting distance is minimized. That is, the quantity T of equation (11), the minimum of that distance over X and q(1), …, q(X), is computed and the conditions that attain it are found. Evaluating equation (11) directly requires an enormous amount of computation: if K is the maximum number of continuously uttered words in the input speech pattern and N is the number of word standard patterns, N^K calculations must be carried out. In practice the problem is therefore reduced to solving the following recurrence.

In the input speech pattern A, the partial interval from i = l+1 to i = m is defined as the partial pattern A(l,m).

$$A(l,m) = a_{l+1}\,a_{l+2} \cdots a_m \tag{12}$$

If the distance between patterns is defined by equation (6), the following holds.

Using this, equation (11) can be solved as follows.

The meanings of the symbols used from here on are summarized in Table 1.

i) When the number of input words X is known: once recurrence (14) has been solved, the recognition result is obtained by the flowchart of FIG. 5, proceeding in order from the name and segmentation result of the last word of the X-word string back to those of the first word.

ii) When the number of input words X is unknown: the recognition result is obtained from the solution of recurrence (15) by the flowchart of FIG. 6.

The two-stage DP method has been proposed as a way of realizing the above idea. It is outlined below.

In the two-stage DP method, D^n_0(s:t) is first obtained by DP for every combination of s and t, and D(i) is then obtained by a second DP; the use of two DP stages is its distinguishing feature. Both a forward algorithm and a backward algorithm have been proposed for the two-stage DP method; the backward algorithm is described here.

Assume that D(i-1), N(i-1), and B(i-1) have already been obtained for frame i-1 of the input pattern.

The standard pattern of each word n (n = 1, 2, …, N) and the input pattern are DP-matched in the reverse time direction, starting from frame i. The path constraints therefore become those of FIGS. 8(a), (b), (c), and (d), corresponding to FIGS. 4(c), (d), (e), and (f) respectively. The matching range could be limited by an adjustment window of width R, but here it is taken to be the range of slopes from 1/2 to 2 (within the slope limits; the hatched area in FIG. 7). The matching is performed with a free end point. As a result, D^n_0(s:t) is obtained, where i - 2J^n + 1 ≤ s ≤ i - (1/2)J^n.

D(i), N(i), and B(i) of equation (15) are then calculated.

Set i = i + 1 and repeat the above procedure.

The above reasoning holds when the matching path constraints are of the asymmetric type shown in FIGS. 4(c) to 4(f), so that the normalization coefficient M depends only on the number of frames of the input pattern (more generally, on the sum of the weights when suitable weights are assigned to the frames of the input pattern). With symmetric paths such as those of FIGS. 4(a) and 4(b), the normalization coefficient also depends on the number of frames of the standard pattern, and recurrences (14), (15), and so on no longer hold.

The reason is as follows. In the symmetric case the cumulative matching distance also varies with the length of the standard pattern, so to judge which standard pattern fits best, the cumulative matching distance between the two patterns must be divided (normalized) by the sum of the input pattern length and the standard pattern length, as described above.

Suppose that B_1, of length b_1, is the standard pattern that best matches the partial pattern A(0,m) of the input pattern A, and that B_2, of length b_2, is any other standard pattern. Then the following holds:

$$\frac{D(A(0,m),B_1)}{m+b_1} < \frac{D(A(0,m),B_2)}{m+b_2} \tag{16}$$

Here D(P,Q) denotes the cumulative matching distance between patterns P and Q before normalization.

Now consider searching, at the time the input has reached frame i, for the back pointer and the last word (or monosyllable) on the basis of equations (14) and (15), normalizing of course by the input pattern length and the standard pattern length. Let X be the last word, x its length, and m the back pointer. The cumulative matching distance between the standard pattern formed by concatenating B_1 and X and the partial input pattern A(0,i), normalized by the sum of the input pattern length and the standard pattern length, is

$$\alpha = \frac{D(A(0,m),B_1) + D(A(m+1,i),X)}{i + b_1 + x} \tag{17}$$

For m and X to be found by equations (14) and (15), α must naturally be smaller than

$$\beta = \frac{D(A(0,m),B_2) + D(A(m+1,i),X)}{i + b_2 + x} \tag{18}$$

That is, if β < α held, the value D(m) obtained at frame m could no longer be used as D(m) in equation (15).

However, α < β does not hold in general. For example, if D(A(0,m),B_1) = 10, D(A(0,m),B_2) = 20, m = 20, b_1 = 10, and b_2 = 20, then in equation (16) the left side is 10/(20+10) = 1/3 and the right side is 20/(20+20) = 1/2, so these values satisfy equation (16). But if i = 40, x = 10, and D(A(m+1,i),X) = 60, then α = (10+60)/(40+10+10) = 7/6 and β = (20+60)/(40+20+10) = 8/7, so α > β and the ordering given by equation (16) is no longer preserved.
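The arithmetic of this counterexample can be checked with exact fractions. The short Python sketch below uses only the numbers given in the text; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Values from the example in the text.
D1, D2 = 10, 20          # D(A(0,m),B1), D(A(0,m),B2)
m, b1, b2 = 20, 10, 20   # back pointer and standard pattern lengths
i, x, DX = 40, 10, 60    # current frame, length of last word X, D(A(m+1,i),X)

lhs = Fraction(D1, m + b1)             # left side of (16)
rhs = Fraction(D2, m + b2)             # right side of (16)
alpha = Fraction(D1 + DX, i + b1 + x)  # normalized distance via B1
beta = Fraction(D2 + DX, i + b2 + x)   # normalized distance via B2

print(lhs < rhs, alpha < beta)
```

The first comparison holds and the second does not, which is the reversal the text describes for normalization by the sum of both pattern lengths.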

With the asymmetric DP method, in which normalization depends only on the input pattern length, if

$$\frac{D(A(0,m),B_1)}{m} < \frac{D(A(0,m),B_2)}{m}$$

then clearly

$$\frac{D(A(0,m),B_1) + D(A(m+1,i),X)}{i} < \frac{D(A(0,m),B_2) + D(A(m+1,i),X)}{i}$$

so equations (14) and (15) can be used without contradiction.

Thus, when symmetric DP paths are used, continuous word speech recognition based on the above formulation is not possible as it stands.

Means for solving the problems

To solve the above problems, the present invention comprises: feature extraction means for converting an input signal into a sequence of feature vectors; standard pattern normalization means for converting each standard pattern B^n (n = 1, 2, …, N), which consists of a sequence of feature vectors, into a sequence of feature vectors with a fixed number of frames J; normalized standard pattern storage means for storing the normalized standard patterns; cumulative matching distance calculation means for calculating the minimum cumulative matching distance D_x(i) from the first frame to the i-th frame of the input pattern by minimizing, over n and m, the sum of (a) the minimum cumulative matching distance D_x-1(m) between the partial pattern A(0,m), from the first frame to the m-th frame of the input pattern, and the concatenated pattern obtained by optimally concatenating x-1 of the normalized standard patterns, and (b) the minimum cumulative matching distance D^n_x(m+1:i) between the partial pattern A(m,i), from the (m+1)-th frame to the i-th frame of the input pattern, and the n-th normalized standard pattern; back pointer storage means and last word storage means for storing the minimizing m and n as B_x(i) and N_x(i) respectively; and means for recognizing the continuously input word speech in reverse order once input is complete, such that, with I the final frame of the input and the input assumed to consist of x words, the last word is N_x(I), the second word from the end is N_x-1(B_x(I)), the third word from the end is N_x-2(B_x-1(B_x(I))), and so on. When the number of input words is specified, the device outputs the optimal word string under the assumption that the input consists of that number of words.

Operation

With this configuration, each standard pattern B^n (n = 1, 2, …, N), consisting of a sequence of feature vectors, is converted into a normalized standard pattern with a fixed number of frames J. The minimum cumulative matching distance D_x(i) from the first frame to the i-th frame of the input pattern is calculated as the minimum over n and m of the sum of the minimum cumulative matching distance D_x-1(m), between the partial pattern from the first frame to the m-th frame of the input pattern and the concatenated pattern obtained by optimally concatenating x-1 normalized standard patterns, and the minimum cumulative matching distance D^n_x(m+1:i), between the partial pattern from the (m+1)-th frame to the i-th frame of the input pattern and the n-th normalized standard pattern. The minimizing m is taken as the back pointer B_x(i), and the minimizing n as N_x(i), the name of the last word when the input is assumed to end at frame i. B_x(i) and N_x(i) are obtained in turn for i = 1, 2, …, I and x = 1, 2, …, X. When input is complete (i = I), and the input is assumed to consist of x words, the last word is N_x(I), the second word from the end is N_x-1(B_x(I)), the third word from the end is N_x-2(B_x-1(B_x(I))), and so on, so that the continuously input word speech is recognized in reverse order.
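The minimization over n and m described in this section can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the device itself: `connected_word_dp` and `seg_dist` are hypothetical names, and `seg_dist(n, m, i)` is assumed to return the matching distance between word n and input frames m+1 to i, which in the device would come from DP matching against the normalized standard pattern.

```python
from math import inf

def connected_word_dp(I, X, words, seg_dist):
    """Fill D[x][i], B[x][i], N[x][i] for x = 1..X words and end frames i = 1..I."""
    D = [[inf] * (I + 1) for _ in range(X + 1)]
    B = [[None] * (I + 1) for _ in range(X + 1)]
    N = [[None] * (I + 1) for _ in range(X + 1)]
    D[0][0] = 0.0  # zero words cover zero frames at no cost
    for x in range(1, X + 1):
        for i in range(1, I + 1):
            for m in range(i):               # candidate end frame of word x-1
                if D[x - 1][m] == inf:
                    continue
                for n in words:              # candidate last word
                    total = D[x - 1][m] + seg_dist(n, m, i)
                    if total < D[x][i]:
                        D[x][i], B[x][i], N[x][i] = total, m, n
    return D, B, N

# Usage with scalar frames and single-value word templates (toy data).
frames = [0, 0, 5, 5]
words = {"a": 0, "b": 5}
toy = lambda n, m, i: sum(abs(f - words[n]) for f in frames[m:i])
D, B, N = connected_word_dp(len(frames), 2, words, toy)
```

In the toy data the best two-word reading ends with word "b" starting after frame 2, preceded by word "a".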

Embodiment

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the continuous speech recognition device of the present invention. In FIG. 1, I_n is the input terminal for the speech signal. (1) is a feature extraction unit, built from a filter bank or the like, that converts the input speech signal into a sequence A of feature vectors. (2) is a word standard pattern storage unit in which the N words of the recognition vocabulary are registered in advance as standard patterns B^n (n = 1, 2, …, N), each in the form of a sequence of feature vectors. (3) is a standard pattern normalization unit that performs time-axis normalization: by linear expansion or contraction it maps the j-th frame of a pattern B^n held in the word standard pattern storage unit (2) to a frame of the normalized pattern, converting a pattern of J^n frames into a pattern of J frames, where J is a constant.

FIG. 9 illustrates this conversion. FIG. 9(a) shows how frame j of the word standard pattern B^n is mapped to a normalized frame. (100) is the straight line given by equation (19), with x on the horizontal axis and y on the vertical axis. Because j can take only integer values, j and the normalized frame index are in practice associated by equation (20), where [Z] denotes the largest integer not exceeding Z.

FIG. 9(b) explains, for a pattern B^n represented by a sequence of feature vectors, the relation between the vectors before time-axis normalization and the vectors after it. According to equation (20), the j corresponding to a given normalized frame is the y coordinate of lattice point (103), the lattice point that lies on the vertical line through that normalized frame, below the intersection (102) of that line with the straight line (100), and closest to the intersection. The normalized vector is obtained by linear interpolation between the vector at lattice point (103) and the vector at lattice point (101), the lattice point on the same vertical line that lies above the intersection (102) and closest to it. That is, the interpolation is defined by taking the intersection (102) to divide the segment between lattice points (101) and (103) in the ratio 1-S to S. S is obtained by subtracting the j given by equation (20) from the y obtained by substituting the normalized frame index into equation (19).
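The time-axis normalization just described can be sketched in Python as follows. Equations (19) and (20) are not reproduced in this text, so the sketch assumes the line (100) is y = (J^n / J) x and that indices falling outside 1..J^n are clamped to the nearest valid frame; both are assumptions, as is the name `normalize_length`.

```python
from math import floor

def normalize_length(B, J):
    """Linearly stretch or compress the frame sequence B to exactly J frames."""
    Jn = len(B)
    out = []
    for jh in range(1, J + 1):      # normalized frame index (1-based)
        y = Jn * jh / J             # assumed line (100): y = (J^n / J) * x
        j = floor(y)                # lattice point below the intersection
        S = y - j                   # fractional part used as interpolation weight
        lo = max(j, 1)              # clamp to valid 1-based frame indices
        hi = min(j + 1, Jn)
        out.append([(1 - S) * p + S * q for p, q in zip(B[lo - 1], B[hi - 1])])
    return out

# Usage: a 2-frame pattern stretched to 4 frames.
stretched = normalize_length([[0.0], [2.0]], 4)
```

After this step every standard pattern has the same number of frames J, which is the property the following paragraphs rely on.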

(4) is a normalized standard pattern storage unit, which stores as standard patterns the feature vector sequences whose time axes have been normalized by the standard pattern normalization unit (3). If the number of frames of the standard patterns is thus made constant regardless of the word, equation (14) holds even when symmetric DP paths are used. In the example above, if b_1 = b_2 = c = J (a constant), then α and β in the expressions corresponding to equations (17) and (18) have the same denominator, so α < β always follows whenever the relation corresponding to equation (16) holds. These denominators are equal, however, only when the numbers of words are the same. The argument therefore does not apply when the number of words is unknown, as in equation (15), where candidates with different numbers of words must be compared. The present invention is accordingly applicable when the number of words is known.

(5) is an inter-vector distance calculation unit, which obtains the distance d^n(i,j) between the feature vector of the i-th frame of the input pattern and the j-th feature vector of the n-th normalized word standard pattern for j = 1, 2, …, J. This embodiment uses the matching path of FIG. 4(b), so each d^n(i,j) is used only once, in the calculation that immediately follows, and need not be stored afterwards. It can therefore be held in a single variable d = d^n(i,j). d^n(i,j) can be defined, for example, as the city-block distance between the two vectors, that is, the sum of the absolute differences of their components over the l dimensions of the vectors.

(6) is a cumulative distance calculation unit, which solves recurrence (14) for the path constraint of FIG. 4(b). For the i-th frame it obtains the intermediate cumulative distance D^n_x(i,j), the terminal cumulative distance D_x(i), the intermediate back pointer B^n_x(i,j), and the back pointer B_x(i) for j = 1, 2, …, J and n = 1, 2, …, N, and obtains N_x(i), the last word when frame i is taken as the end frame. Under the path constraint used in this embodiment, D^n_x(i,j) and D_x(i) are found by evaluating the following recurrences in turn for j = 1, 2, …, J.

Initial condition: D^n_x(i-1,0) = D_x-1(i-1)

B^n_x(i,j) and B_x(i) are obtained from the following equations.

Initial condition: B^n_x(i-1,0) = i-1

The calculations of equations (24) to (28) are carried out for each of x = 1, 2, …, X, finally yielding D_x(i), B_x(i), and N_x(i) for each frame. Since D_x(i) is needed only in the calculation for the next frame, it can be held as D_x = D_x(i); that is, only D_1, D_2, …, D_X need to be stored.

The terminal cumulative distance D_x, back pointer B_x(i), and last word name N_x(i) obtained in this way are stored, for x = 1, 2, …, X, in the terminal cumulative distance storage unit (7), the back pointer storage unit (8), and the last word name storage unit (9), respectively.

D^n(i,j) and B^n(i,j) (j = 1, 2, …, J; n = 1, 2, …, N) are held temporarily in memory inside the cumulative distance calculation unit (6) until they are no longer needed. In this embodiment, the calculation for the i-th frame needs only the values for the preceding frame and the current frame, so storage for two frames suffices. That is, D^n(k,j) and B^n(k,j) are provided for k = 0, 1; for odd-numbered input frames the calculation treats k = 0 as the previous frame and k = 1 as the current frame, and for even-numbered input frames it treats k = 1 as the previous frame and k = 0 as the current frame.
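The alternating use of the two frame buffers can be sketched in Python as follows. The function names are hypothetical, and the per-column running sum in `accumulate` is only a toy recurrence standing in for the cumulative distance update; what the sketch shows is the index swap between odd and even frames.

```python
def frame_buffers(i):
    """Return (previous, current) buffer indices k for 1-based input frame i."""
    if i % 2 == 1:   # odd frame: k = 0 is the previous frame, k = 1 the current
        return 0, 1
    return 1, 0      # even frame: k = 1 is the previous frame, k = 0 the current

def accumulate(frames):
    """Toy recurrence: per-column running sums using only two rows of storage."""
    J = len(frames[0])
    buf = [[0.0] * J, [0.0] * J]
    cur = 0
    for i, frame in enumerate(frames, start=1):
        prev, cur = frame_buffers(i)
        for j in range(J):
            buf[cur][j] = buf[prev][j] + frame[j]
    return buf[cur]

totals = accumulate([[1, 2], [3, 4], [5, 6]])
```

Only two rows are ever allocated, regardless of how many input frames arrive.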

(10) is a speech interval detection unit, which determines the speech interval from the magnitude of the input signal and the like. When the speech interval detection unit (10) detects that speech input has started, the frame counting unit (11) begins counting frames. The processing described above was for the i-th frame, and it is the count of the frame counting unit (11) that sets this i. The same processing is therefore performed each time the frame advances by one. The frame counting unit (11) starts counting when a speech interval is detected and is reset when the speech interval ends. The last word storage unit (9) and the back pointer storage unit (8) thus come to hold N_x(i) and B_x(i) for i = 1, 2, …, I and x = 1, 2, …, X.

The segmentation unit (12) issues to the back pointer storage unit (8) the commands for reading out the required back pointers. When the segmentation unit (12) sends the value B_x(i) to the back pointer storage unit (8), the back pointer B_x-1(B_x(i)) is read out. When the segmentation unit (12) receives the value B_x-1(B_x(i)) from the back pointer storage unit (8), it sends that same value back to the back pointer storage unit (8). Accordingly, when the speech interval detection unit (10) detects the end of speech input, the final value I of the frame counting unit (11) and the number of words X are supplied to the segmentation unit (12), which first sends the values I and X to the back pointer storage unit (8). Thereafter, following the operation just described, the back pointer storage unit (8) outputs B_x(I), B_x-1(B_x(I)), B_x-2(B_x-1(B_x(I))), …, 0 in turn. These values are the end frame of the second word from the end, the end frame of the third word from the end, the end frame of the fourth word from the end, and so on. Since N_x(i) is the word that ends at frame i, supplying these values directly to the last word storage unit (9) yields the recognition result in reverse order, starting from the last word. To avoid obtaining the result in reverse order, the order can be reversed at the output of either the back pointer storage unit (8) or the last word storage unit (9).
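The read-out performed by the segmentation unit can be sketched in Python as a walk along the stored back pointers. The table layout (`B[x][i]` and `N[x][i]` indexed by word count and end frame) and the name `backtrace` are assumptions made for this illustration.

```python
def backtrace(B, N, I, X):
    """Recover the word sequence from back pointers B[x][i] and last words N[x][i]."""
    words = []
    i = I
    for x in range(X, 0, -1):
        words.append(N[x][i])  # last word of the best x-word string ending at frame i
        i = B[x][i]            # end frame of the preceding word
    words.reverse()            # restore spoken order
    return words

# Usage: a two-word result in which "a" ends at frame 2 and "b" ends at frame 4.
B = {2: {4: 2}, 1: {2: 0}}
N = {2: {4: "b"}, 1: {2: "a"}}
sequence = backtrace(B, N, 4, 2)
```

Omitting the final `reverse()` gives the words in the last-to-first order in which the device first produces them.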

FIG. 10 is a flowchart for the case where the functions of the device of the above embodiment are implemented in software. It is described below in connection with the operation of each unit of the device.

Steps (100) to (105) initialize the cumulative distance D_x, the intermediate cumulative distance D^n_x(i, j), the back pointer B_x(i), and the intermediate back pointer B^n_x(i, j).

Steps (107) to (117) are the processing executed for the i-th frame. Step (111) gives the initial values of the intermediate cumulative distance and the intermediate back pointer for the i-th frame. The processing of steps (114) to (117) is carried out mainly by the cumulative distance calculation unit (6). The notation used in step (116) (the expression itself is missing from this text) means that the value of n that minimizes D^n_x(k, J^n) is taken as the selected word. Step (114) obtains the intermediate cumulative distance D^n_x(k, j) and the intermediate back pointer B^n_x(k, j). Step (116) takes the results of the calculation in step (114) for n = 1, 2, ..., N and, treating the i-th input frame as the end of the input, finds the last word for which the cumulative distance D^n_x(k, J^n) is smallest. Step (117) stores the results for the optimal word found in step (116) in memory (the individual assignment expressions are missing from this text); these memories correspond to the last word storage unit (9), the cumulative distance storage unit (7), and the back pointer storage unit (8), that is, to N_x(i), D_x(i), and B_x(i).
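As a rough illustration of steps (116) and (117), the following Python sketch picks the word whose intermediate cumulative distance at its final template frame is smallest and returns the three values to be stored. The function name `select_last_word`, the dictionary inputs, and the numbers are hypothetical; the original assignment expressions are not available in this text, so the mapping to N_x(i), D_x(i), and B_x(i) is an assumption based on the storage units named above.

```python
def select_last_word(D_end, B_end):
    """Pick the best last word for the current frame i and level x.

    D_end[n]: intermediate cumulative distance D^n_x(k, J^n) at the final
              frame of template n.
    B_end[n]: intermediate back pointer at the same position.
    Returns (word, distance, back pointer), i.e. the values assumed to be
    stored as N_x(i), D_x(i) and B_x(i).
    """
    n_best = min(D_end, key=D_end.get)  # n that minimizes D^n_x(k, J^n)
    return n_best, D_end[n_best], B_end[n_best]


# Hypothetical values for three templates at one frame.
D_end = {1: 4.2, 2: 3.1, 3: 5.0}
B_end = {1: 10, 2: 12, 3: 9}

N_xi, D_xi, B_xi = select_last_word(D_end, B_end)
print(N_xi, D_xi, B_xi)
```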

Steps (106) and (108) set k = 1 when i is odd and k = 0 when i is even. As described above, the memory that stores the intermediate cumulative distance D^n_x(k, j) and the intermediate back pointer B^n_x(k, j) therefore alternates, each time the frame changes, between holding the values for the immediately preceding frame and holding the values for the current frame. That is, in steps (111), (114), (116), and (117), k refers to the current frame and the complementary index (the other of 0 and 1) refers to the immediately preceding frame.
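The alternating use of two buffers can be sketched as follows. This is only an illustration of the indexing scheme: the helper `buffer_indices`, the buffer size `J`, and the placeholder update rule are assumptions, and the real update would be the DP recurrence of step (114).

```python
J = 4  # number of template frames (hypothetical size)


def buffer_indices(i):
    """Return (current, previous) buffer indices for frame i."""
    k = i % 2          # k = 1 for odd i, k = 0 for even i
    return k, 1 - k    # the other buffer holds the preceding frame


# Two rows are enough because frame i only needs the values of frame i - 1.
buf = [[0] * J for _ in range(2)]

for i in range(1, 6):
    k, k_prev = buffer_indices(i)
    for j in range(J):
        # Placeholder update standing in for the DP recurrence:
        # the current row is built only from the previous row.
        buf[k][j] = buf[k_prev][j] + i

print(buf[1])  # row written at the last frame (i = 5, odd, so k = 1)
```

Keeping only two rows bounds the memory for the intermediate values regardless of the input length, which is the point of switching k with the parity of i.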

Steps (118) to (120) obtain the word recognition results in reverse order from the N_x(i) and B_x(i) found as described above. They correspond to the processing carried out among the segmentation unit (12), the back pointer storage unit (8), and the last word storage unit (9).

Various values can be set for x in step (118). For example, if the calculation has been carried out up to x = 5, D_x(I) is available for each assumed number of input words from 1 to 5. The best x among x = 1 to 5 can then be determined from these values (presumably the x that minimizes D_x(I); the expression is missing from this text), and steps (118) to (120) can be carried out starting from that x. In this way recognition is possible even when the number of input words is unknown, within the range x = 1 to 5.

This embodiment has been described for the recognition of continuously uttered words, but units such as single syllables may be used instead of words, and the invention is also applicable to the recognition of other continuous patterns.

The path constraint shown in FIG. 4(b) was used here, but other constraints, for example the one shown in FIG. 4(a), may also be used; the constraint can be chosen to suit the physical properties of the input pattern.

Furthermore, although this embodiment has been described in terms of distance, the closeness between vectors can instead be defined by a similarity measure such as correlation. Maximizing similarity in place of minimizing distance can be realized in exactly the same way.

EFFECTS OF THE INVENTION As described above, according to the present invention, continuously uttered speech can be recognized using a symmetric DP path, and a continuous speech recognition device with high recognition accuracy can be realized.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a block diagram explaining the principle of a voice recognition device; FIG. 3 is a diagram explaining nonlinear expansion/contraction matching in isolated word voice recognition; FIG. 4 is a diagram showing examples of path constraint conditions in DP matching; FIG. 5 is a flowchart showing the flow of processing in the word determination stage of continuous word voice recognition when the number of words is known; FIG. 6 is the corresponding flowchart for the case where the number of words is unknown; FIG. 7 is a diagram showing the matching range when continuous word voice recognition is performed; FIG. 8 is a diagram showing an example of path constraint conditions when reverse-time DP matching is performed; FIG. 9 is a diagram explaining a method of producing a normalized standard pattern by converting the length of a standard pattern to a fixed value; and FIG. 10 is an NS chart explaining a method of realizing the above embodiment in software.

(1) feature extraction unit, (2) word standard pattern storage unit, (3) standard pattern normalization unit, (4) normalized standard pattern storage unit, (5) inter-vector distance calculation unit, (6) cumulative distance calculation unit, (7) terminal cumulative distance storage unit, (8) back pointer storage unit, (9) last word storage unit, (10) voice section detecting unit, (11) frame number counting unit, (12) segmentation unit

Claims

[Claims]

1. A pattern comparison device comprising: feature extraction means for converting an input signal into a sequence of feature vectors a_1 a_2 ... a_I; standard pattern normalizing means for converting each standard pattern B^n (where n = 1, 2, ..., N), which is a sequence of feature vectors, into a normalized standard pattern that is a sequence of feature vectors with a fixed number of frames J; normalized standard pattern storing means for storing the normalized standard patterns; cumulative matching distance (or cumulative matching similarity) calculating means for calculating D_x(i), the minimum cumulative matching distance (or maximum cumulative matching similarity) between the partial pattern A(0, i) from the first frame to the i-th frame of the input pattern and a concatenated pattern of x normalized standard patterns joined so that the distance is minimized (the similarity is maximized), by minimizing (maximizing) over n and m the sum of D_{x-1}(m), the minimum cumulative matching distance (or maximum cumulative matching similarity) between the partial pattern A(0, m) from the first frame to the m-th frame of the input pattern and a concatenated pattern obtained by optimally joining x-1 of the normalized standard patterns, and D^n_x(m+1 : i), the minimum cumulative matching distance (or maximum cumulative matching similarity) between the partial pattern A(m, i) from the (m+1)-th frame to the i-th frame of the input pattern and the normalized standard pattern of B^n; back pointer storing means for storing the m at that time as B_x(i); means for storing the n at that time as N_x(i); and means for recognizing continuously input word speech in reverse order such that, when the input is complete and its final frame is I, if the input consists of x words, the last word is N_x(I), the second-to-last word is N_{x-1}(B_x(I)), the third-to-last word is N_{x-2}(B_{x-1}(B_x(I))), and so on.
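The recurrence recited in the claim, D_x(i) = min over n and m of [D_{x-1}(m) + D^n_x(m+1 : i)], can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the claimed device: features are scalars, each template is a single scalar value, and `segment_distance` is a hypothetical stand-in for the DP matching distance between an input segment and a normalized standard pattern. The names `level_building` and `segment_distance` and the toy data are likewise assumptions.

```python
import math


def segment_distance(a, m, i, template_value):
    """Hypothetical stand-in for D^n_x(m+1 : i): sum of absolute differences
    between input frames m+1..i (a[m:i]) and a scalar template value."""
    return sum(abs(v - template_value) for v in a[m:i])


def level_building(a, templates, X):
    """Fill D[x][i], B[x][i], N[x][i] for x = 1..X and i = 1..I."""
    I = len(a)
    D = [[math.inf] * (I + 1) for _ in range(X + 1)]
    B = [[0] * (I + 1) for _ in range(X + 1)]
    N = [[None] * (I + 1) for _ in range(X + 1)]
    D[0][0] = 0.0  # zero words cover zero frames at no cost
    for x in range(1, X + 1):
        for i in range(1, I + 1):
            for m in range(i):               # candidate end of the first x-1 words
                if math.isinf(D[x - 1][m]):
                    continue
                for n, t in templates.items():
                    d = D[x - 1][m] + segment_distance(a, m, i, t)
                    if d < D[x][i]:
                        D[x][i] = d          # minimum cumulative distance
                        B[x][i] = m          # back pointer: the m at that time
                        N[x][i] = n          # last word: the n at that time
    return D, B, N


# Toy input: three "words" with constant feature values 1, 5 and 9.
a = [1, 1, 1, 5, 5, 9, 9, 9]
templates = {"A": 1, "B": 5, "C": 9}

D, B, N = level_building(a, templates, X=3)
print(D[3][len(a)], N[3][len(a)], B[3][len(a)])
```

The exhaustive search over m is written out for clarity; the embodiment instead carries intermediate cumulative distances and back pointers frame by frame, which avoids recomputing each segment distance.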