JP3000642B2 - Pattern recognition method and standard pattern learning method

Info

Publication number: JP3000642B2
Application number: JP2243632A
Authority: JP
Inventors: 健一磯
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-09-13
Filing date: 1990-09-13
Publication date: 2000-01-17
Anticipated expiration: 2015-01-17
Also published as: JPH04165400A

Description

[Detailed description of the invention]

[Industrial application field]

The present invention relates to a pattern recognition method for recognizing patterns that are represented as a time series of feature vectors, such as speech signals, and to a standard pattern learning method that automatically constructs the standard patterns from training data.

[Conventional technology]

A "neural prediction model" is known as a pattern recognition method based on the prediction of time-series patterns. It is described in detail in the specification of Japanese Patent Application No. 1-344214 (hereinafter Reference 1) and in the Proceedings of the Acoustical Society of Japan, October 1989, pp. 175-176 (hereinafter Reference 2). In the neural prediction model, the local distance between the feature vector of the input pattern at time $i$ and the $j$-th state of a standard pattern model composed of a finite state transition network is the distance between that feature vector and a prediction vector for time $i$. The prediction vector is computed by the predictor (a multilayer perceptron) attached to state $j$ from several feature vectors of the input pattern at time $i-1$ and earlier.

As a learning method that automatically determines the model parameters defining the neural prediction model, the above references disclose a method in which the initial values of the model parameters are set more or less arbitrarily, for example with random numbers, and are then corrected toward optimal values by an iterative algorithm.

[Problems to be solved by the invention]

As described above, when the neural prediction model predicts the feature vector of the input pattern at time $i$, the predictor determines the prediction vector using only the feature vectors of the input pattern at time $i-1$ and earlier. Prediction therefore works particularly well when the feature vector at a given time is strongly correlated with the feature vectors that precede it.

In the opposite case, however, where the feature vector at a given time is strongly correlated with the feature vectors that follow it, the prediction accuracy may fail to improve. For example, the onset of a plosive is considered to correlate more strongly with the transient part of the following vowel after the burst than with the silent part before the burst.

In addition, no appropriate method for setting the initial values has been known for the model parameter learning method, and initialization with random numbers or the like has been used so far. However, the iterative learning method given in References 1 and 2 is an algorithm that converges to a local minimum of the evaluation function used for learning. When the initial parameter values are given by random numbers, learning can therefore end while still converged on an undesirable local minimum of the evaluation function.

An object of the present invention is to provide a pattern recognition method based on more accurate time-series pattern prediction which, when predicting the feature vector at a given time, takes into account not only the correlation with the feature vectors before that time but also the correlation with the feature vectors after it.

Another object of the present invention is to provide a standard pattern learning method that, when the parameters of the standard pattern model are determined by iterative learning, sets initial values that enable better parameter estimates.

[Means for solving the problem]

The first invention is a pattern recognition method that recognizes an input pattern represented as a time series of feature vectors by using a standard pattern model composed of a finite state transition network. It is characterized in that each state of the finite state transition network has a predictor that computes a prediction vector for the feature vector at time $i$ from a plurality of feature vectors of the input pattern at time $i+1$ and later, and in that the distance $D(a_i, A_i(j))$ between the feature vector $a_i$ of the input pattern at time $i$ and the prediction vector $A_i(j)$ produced for time $i$ by the predictor of state $j$ is used as the local distance $d(i,j)$ between the feature vector of the input pattern at time $i$ and the $j$-th state of the finite state transition network.

The second invention is a pattern recognition method that recognizes an input pattern represented as a time series of feature vectors by using a standard pattern model composed of a finite state transition network. It is characterized in that each state of the finite state transition network has a predictor that computes a prediction vector for the feature vector at time $i$ from a plurality of feature vectors of the input pattern at time $i-1$ and earlier together with a plurality of feature vectors of the input pattern at time $i+1$ and later, and in that the distance $D(a_i, A_i(j))$ between the feature vector $a_i$ of the input pattern at time $i$ and the prediction vector $A_i(j)$ produced for time $i$ by the predictor of state $j$ is used as the local distance $d(i,j)$ between the feature vector of the input pattern at time $i$ and the $j$-th state of the finite state transition network.

The third invention is a pattern recognition method that recognizes an input pattern represented as a time series of feature vectors by using a standard pattern model composed of a finite state transition network. It is characterized in that each state of the finite state transition network has a forward predictor that computes a forward prediction vector for the feature vector at time $i$ from a plurality of feature vectors of the input pattern at time $i-1$ and earlier, and a backward predictor that computes a backward prediction vector for the feature vector at time $i$ from a plurality of feature vectors of the input pattern at time $i+1$ and later. It is further characterized in that a quantity $D(d_F(a_i, A_i^F(j)), d_B(a_i, A_i^B(j)))$ is used as the local distance $d(i,j)$ between the feature vector of the input pattern at time $i$ and the $j$-th state of the finite state transition network. This quantity is computed from the forward prediction distance $d_F(a_i, A_i^F(j))$ between the feature vector $a_i$ at time $i$ and the forward prediction vector $A_i^F(j)$ produced for time $i$ by the forward predictor of state $j$, and from the backward prediction distance $d_B(a_i, A_i^B(j))$ between $a_i$ and the backward prediction vector $A_i^B(j)$ produced for time $i$ by the backward predictor of state $j$.

The fourth invention is a standard pattern learning method that automatically determines the predictor parameters from training data when a multilayer perceptron is used as the predictor attached to each state of the finite state transition network in the pattern recognition method of the first, second, or third invention. It is characterized in that a representative vector computed from the training data is used as the initial value of the thresholds of the output-layer units of the multilayer perceptron attached to state $j$.

[Action]

In the pattern recognition method and standard pattern learning method of the present invention, the standard pattern model of each category to be recognized is represented by a finite state transition network having a start state and an end state, and each state of the network has its own predictor. Each predictor takes as input a fixed-length sequence of feature vectors cut out of the input pattern's feature vector time series at time $i+1$ and later, and outputs a prediction vector for the feature vector that should appear at time $i$. The distance between this prediction vector and the feature vector actually input at time $i$ is the prediction error. The distance between the input pattern and the finite state transition network serving as the standard pattern is defined, for example, as follows. Among the possible state transition sequences that connect the start state to the end state and advance in synchronization with the time of the input pattern, dynamic programming finds the sequence that minimizes the accumulated prediction error, and the accumulated prediction error along that optimal sequence is taken as the distance. The parameters of the predictors making up the standard pattern are determined by learning with the steepest descent method, using the prediction error as the evaluation function.

The pattern recognition and standard pattern learning methods of the present invention are described in more detail below. The description uses speech pattern recognition as the example, but the present invention applies equally to other time-series patterns if "speech pattern" is read as "pattern vector sequence".

The predictor according to the first invention predicts the feature vector $a_i$ that should appear at time $i$ backward along the time axis, from the feature vector sequence $a_{i+1}, a_{i+2}, \ldots$ of the input speech at time $i+1$ and later. The prediction vector $A_i(j)$ produced by the predictor attached to the $j$-th state is expressed as follows.

$$A_i(j) = F(W_j, a_{i+1}, a_{i+2}, \ldots, a_{i+\tau}) \tag{1}$$

Here $F(\cdot)$ is a nonlinear vector-valued function that gives the input/output relationship of the predictor attached to state $j$, and it is characterized by the parameter $W_j$. $W_j$ stands collectively for a set of parameters, and $\tau$ is the number of input speech feature vectors used for prediction. A predictor that predicts backward along the time axis in this way improves prediction accuracy when the feature vector at a given time is strongly correlated with the feature vectors that follow it, and recognition performance is expected to improve as a result. For example, the onset of a plosive in speech correlates more strongly with the transient part of the following vowel after the burst than with the silent part before the burst, so the method is expected to be particularly effective there.

When a multilayer perceptron is used as such a predictor, the concrete form of equation (1) is given by equation (2).

Here $U_0(j), U_1(j), \ldots, U_\tau(j)$ are the coupling coefficient matrices between the units of the perceptron, $\theta_0(j)$ and $\theta_1(j)$ are the threshold vectors of the output-layer and hidden-layer units respectively, and $f(\cdot)$ denotes the vector obtained by applying the sigmoid function to each component of its argument vector. In this case the parameter $W_j$ of equation (1) is

$$W_j = \{U_0(j), U_1(j), \ldots, U_\tau(j), \theta_0(j), \theta_1(j)\} \tag{3}$$

A three-layer perceptron is used as the example here, but the same applies to other cases such as four layers. Multilayer perceptrons are described in detail in the publication "PDP Model" (Sangyo Tosho, 1989).
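
The following is a minimal sketch of a three-layer backward predictor. It assumes a sigmoid hidden layer fed by $U_1(j), \ldots, U_\tau(j)$ and $\theta_1(j)$, followed by a linear output layer $U_0(j)$ with threshold $\theta_0(j)$. That form is inferred from the symbol definitions above and is an assumption, not a transcription of equation (2). The function names and the list-of-lists matrix layout are illustrative only.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def predict_backward(U0, U, theta0, theta1, frames, i):
    """Predict frame i from frames i+1 .. i+tau, where tau = len(U).

    U[k-1] maps frame a_{i+k} to the hidden layer (assumed layout);
    U0 maps the hidden layer to the output layer.
    """
    hidden_in = list(theta1)
    for k, Uk in enumerate(U, start=1):
        contrib = matvec(Uk, frames[i + k])
        hidden_in = [h + c for h, c in zip(hidden_in, contrib)]
    hidden = [sigmoid(h) for h in hidden_in]
    # Assumed linear output layer: U0 * hidden + theta0.
    return [o + t for o, t in zip(matvec(U0, hidden), theta0)]
```

Under this assumed form, setting every coupling coefficient to zero makes the prediction equal to the output-layer threshold $\theta_0(j)$, whatever the input frames are.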

Using the prediction vector $A_i(j)$ described above, the local distance $d(i,j)$ between the feature vector $a_i$ of the input speech at time $i$ and the $j$-th state of the standard pattern model (finite state transition network) is given, for example, by the following equations.

$$d(i,j) = D(a_i, A_i(j)) \tag{4}$$

$$D(a_i, A_i(j)) = \| a_i - A_i(j) \|^2 \tag{5}$$

The squared distance is used as the distance between vectors here, but the following discussion holds equally for other commonly used vector distances, such as the Mahalanobis distance.
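
As an illustration of the alternative mentioned above, a squared Mahalanobis form could take the place of equation (5). The matrix $\Sigma_j$ is a hypothetical covariance matrix associated with state $j$; the text does not define one.

```latex
D(a_i, A_i(j)) = (a_i - A_i(j))^{\top} \, \Sigma_j^{-1} \, (a_i - A_i(j))
```

With $\Sigma_j$ equal to the identity matrix this reduces to the squared distance of equation (5).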

The discussion so far concerns the predictor according to the first invention. The predictor according to the second invention is obtained by replacing equation (1), which expresses the input and output of the predictor, with the following equation.

$$A_i(j) = F(W_j, a_{i-1}, a_{i-2}, \ldots, a_{i-\tau_1}, a_{i+1}, a_{i+2}, \ldots, a_{i+\tau_2}) \tag{6}$$

Here $\tau_1$ and $\tau_2$ are the numbers of input speech feature vectors before and after time $i$ that are used for prediction. As in the first invention, the nonlinear vector-valued function $F(\cdot)$ can be built with a multilayer perceptron. This method improves the prediction accuracy for patterns that are strongly correlated in both the forward and backward directions of the time axis.
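
A minimal sketch of how the inputs of equation (6) might be gathered for one frame is shown below. The helper name `bidirectional_context`, the 0-based indexing, and the choice to return `None` at the edges of the pattern are all assumptions; the text does not say how frames near the beginning or end of the input are handled.

```python
def bidirectional_context(frames, i, tau1, tau2):
    """Collect a_{i-1}, ..., a_{i-tau1} followed by a_{i+1}, ..., a_{i+tau2}.

    `frames` is a list of feature vectors and `i` is a 0-based frame index.
    Returns None when the window would run past either end of the pattern
    (an assumed policy, not one stated in the text).
    """
    if i - tau1 < 0 or i + tau2 >= len(frames):
        return None
    past = [frames[i - k] for k in range(1, tau1 + 1)]
    future = [frames[i + k] for k in range(1, tau2 + 1)]
    return past + future
```

Setting `tau1` to zero gives the backward-only input of the first invention.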

Next, the method of the third invention is described. In this case two kinds of predictor, a forward predictor and a backward predictor, are attached to the $j$-th state of the finite state transition network. The prediction vectors given by the two predictors are expressed as follows.

$$A_i^F(j) = F(W_j^F, a_{i-1}, a_{i-2}, \ldots, a_{i-\tau_1}) \tag{7}$$

$$A_i^B(j) = F(W_j^B, a_{i+1}, a_{i+2}, \ldots, a_{i+\tau_2}) \tag{8}$$

From the forward prediction vector $A_i^F(j)$ obtained from the forward predictor, the forward prediction distance $d_F(a_i, A_i^F(j))$ is defined as follows.

$$d_F(a_i, A_i^F(j)) = \| a_i - A_i^F(j) \|^2 \tag{9}$$

The backward prediction distance $d_B(a_i, A_i^B(j))$ is defined in the same way.

$$d_B(a_i, A_i^B(j)) = \| a_i - A_i^B(j) \|^2 \tag{10}$$

The squared distance is again used as the example, but the following discussion applies equally to other distances.

The local distance $d(i,j)$ between the feature vector of the input speech at time $i$ and the $j$-th state of the standard pattern model (finite state transition network) is given by the following equation.

$$d(i,j) = D(d_F(a_i, A_i^F(j)), d_B(a_i, A_i^B(j))) \tag{11}$$

One possible choice for the function $D(\cdot)$ that determines the local distance is to select the smaller of the forward prediction distance and the backward prediction distance. With this choice the more accurate of the forward and backward predictions is selected automatically, so accurate prediction can be achieved without knowing in advance whether the time series being handled is more strongly correlated in the forward or the backward direction of the time axis.
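
The selection rule described above can be sketched as follows, assuming the squared distances of equations (9) and (10) and taking the two prediction vectors as already computed. The function names are illustrative.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_distance(a_i, forward_pred, backward_pred):
    """Local distance as the smaller of the forward and backward errors."""
    d_f = squared_distance(a_i, forward_pred)   # forward prediction distance
    d_b = squared_distance(a_i, backward_pred)  # backward prediction distance
    return min(d_f, d_b)
```

Taking the minimum is only the example the text gives for $D(\cdot)$; other ways of combining the two distances fit the same interface.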

Once the local distance is given by any of the first, second, and third inventions as described above, the overall distance (accumulated distance) between the input pattern and the finite state transition network serving as the standard pattern can be defined. Known definitions of this distance include, for example, one in which the state transitions are deterministic (DP matching) and a probabilistic one (hidden Markov model). DP matching and hidden Markov models are described in detail in the publication "Speech Recognition by Probabilistic Models" (edited by the Institute of Electronics, Information and Communication Engineers, by Seiichi Nakagawa, 1988; hereinafter Reference 3). If, for example, the DP matching definition is adopted, the accumulated distance between the input pattern and the standard pattern (finite state transition network) is obtained as follows. Among the possible state transition sequences that connect the start state to the end state and advance in synchronization with the time of the input pattern, dynamic programming finds the sequence that minimizes the accumulated prediction error, and the accumulated prediction error along that optimal sequence is the accumulated distance. The concrete algorithm is given in detail in References 1 and 2.
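
For concreteness, the DP matching definition can be written as a recurrence. This is only a sketch under an assumed left-to-right topology in which each state may either repeat or advance to the next state; the transitions actually allowed are those of References 1 and 2 and are not restated in this text. The symbol $g(i,j)$ denotes the accumulated prediction error up to frame $i$ and state $j$, $J$ stands here for the number of states, and $D_{\mathrm{acc}}$ is an illustrative name for the accumulated distance.

```latex
g(1,1) = d(1,1), \qquad
g(i,j) = d(i,j) + \min\{\, g(i-1,j),\; g(i-1,j-1) \,\}, \qquad
D_{\mathrm{acc}} = g(I,J)
```

Entries that cannot be reached from the start state are treated as $+\infty$ in this sketch.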

Next, the standard pattern learning method, which automatically determines from training data the parameters that characterize the standard pattern model (finite state transition network), is described. The learning method given in References 1 and 2 can be applied as it is to the standard pattern model of the present invention. In that method the initial values of the model parameters are set more or less arbitrarily, for example with random numbers, and are then corrected toward optimal values by an iterative correction algorithm that uses the accumulated prediction error on the training data as the evaluation function. However, this iterative correction is an algorithm that converges to a local minimum of the evaluation function, so when the initial parameter values are given by random numbers, learning can end while still converged on an undesirable local minimum. For the case where a multilayer perceptron is used as the predictor, the standard pattern learning method of the present invention removes this difficulty by providing a way of setting the initial parameter values that enables better model parameter estimates than random initialization.

In this case the parameters to be estimated are the parameters of equation (2), and their initial values are given by equations (12) to (17).

Here $\bar{b}_j$ is a representative vector computed from the training data by the method described later. The symbol attached to the remaining parameters indicates that their initial values are given by random numbers whose absolute values are sufficiently small compared with those of $\bar{b}_j$. With this initialization, equation (2) can be approximated as follows in the early stage of iterative correction learning.

$$A_i(j) \approx \bar{b}_j \tag{18}$$

Under this approximation the prediction vector is a constant that does not depend on the feature vectors of the input speech. The local distance of equation (4) then becomes the following.

$$d(i,j) = \| a_i - \bar{b}_j \|^2 \tag{19}$$

This can be regarded as the distance, used in speech recognition by ordinary DP matching, between the feature vector of the input pattern at time $i$ and the feature vector of the standard pattern at time $j$. A feature vector (representative vector) of a standard pattern created from the training data can therefore be used as $\bar{b}_j$, the initial value of $\theta_0(j)$. An example of a concrete way to set $\bar{b}_j$ follows. Let $b_j(s,m)$ $(j = 1, \ldots, J)$ be the feature vector of the $j$-th frame of the $m$-th training sample $(m = 1, \ldots, M_s)$ of recognition category $s$. Differences in the lengths of the individual utterances in the training data are assumed to have been normalized, for example by DP matching. The initial value $\theta_0(j)$ of the predictor of state $j$ in the standard pattern model of category $s$ is then set to the average of the feature vectors $b_j(s,m)$ over the $M_s$ training samples (equation (21)).

When the initial values are set in this way, the predictor outputs a representative standard pattern in the early stage of learning regardless of the feature vectors of the input pattern. Learning therefore starts from ordinary pattern matching as its zeroth-order approximation, and the likelihood of converging to an undesirable local minimum is far lower than when the model is initialized with random numbers.
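
The initialization idea can be sketched as follows: the output-layer threshold of one state is set to the mean of the corresponding training frames, and every other parameter is drawn from a small uniform range. The function names, the dictionary layout, the matrix shapes, and the use of a seeded `random.Random` are assumptions made for illustration; only the mean-plus-small-random-values idea comes from the text.

```python
import random


def mean_vector(vectors):
    # Component-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rnd(mag, rng):
    # Uniform random number with absolute value at most mag.
    return rng.uniform(-mag, mag)


def init_state_predictor(frames_j, n_hidden, tau, mag, rng):
    """Initialise the perceptron of one state.

    frames_j: the j-th frame of every length-normalised training sample.
    mag should be small compared with the magnitude of the mean vector.
    """
    dim = len(frames_j[0])
    theta0 = mean_vector(frames_j)  # representative vector
    theta1 = [rnd(mag, rng) for _ in range(n_hidden)]
    U0 = [[rnd(mag, rng) for _ in range(n_hidden)] for _ in range(dim)]
    U = [[[rnd(mag, rng) for _ in range(dim)] for _ in range(n_hidden)]
         for _ in range(tau)]
    return {"theta0": theta0, "theta1": theta1, "U0": U0, "U": U}
```

A call such as `init_state_predictor(frames_j, n_hidden=8, tau=2, mag=0.01, rng=random.Random(0))` would give a predictor whose initial output stays close to the mean frame, provided the output layer is linear as assumed here.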

[Example]

Figs. 1 to 5 show flowcharts of recognition by the pattern recognition method of the present invention. The feature vector time series $a_1, \ldots, a_I$ of an input pattern of length $I$ and the parameters of the standard pattern models are assumed to be given externally. The flowcharts are a concrete form of the recognition method described in the Action section, and the notation for variables follows that section. The index $s$ attached to variables denotes the category to be recognized $(s = 1, \ldots, S)$. The description below follows the flow.

In step 101 of Fig. 1, variables are initialized. The details are shown in Fig. 2. Steps 201 to 203 of Fig. 2 set the initial values of the counters. Step 204 initializes the storage areas for the local distance $d_s(i,j)$ and the accumulated prediction error $g_s(i,j)$. Steps 205 to 210 increment the counters and test the loop conditions so that the initialization of step 204 is carried out for all $s$, $i$, and $j$. Step 212 sets the value of the accumulated prediction error of each category $s$ at the starting point.

Returning to Fig. 1, steps 102 to 104 set the initial values of the counters. Step 105 computes the local distance. This is the part that corresponds to the first, second, and third inventions. In the case of the first invention, the local distance is computed in steps 301 and 302 of Fig. 3. The computation of step 301 corresponds to equation (2) described in the Action section and implements the predictor as a three-layer perceptron. Step 302 uses the squared distance of equation (4) as the local distance.

In the case of the second invention, the local distance is computed in steps 401 and 402 of Fig. 4. The computation of step 401 implements the predictor of equation (6) as a three-layer perceptron. In the expression, $U_0^{(s)}$, $U_k^{(s)}$, $V_k^{(s)}$, $\theta_0^{(s)}(j)$, and $\theta_1^{(s)}(j)$ are the parameters that characterize the three-layer perceptron, and the function $f(\cdot)$ is the sigmoid function. Step 402 uses the squared distance of equation (4) as the local distance.

In the case of the third invention, the local distance is computed in steps 501 to 507 of Fig. 5. Step 501 implements the forward predictor of equation (7) as a three-layer perceptron, and step 502 implements the backward predictor of equation (8) as a three-layer perceptron. Step 503 uses the squared distance of equation (9) as the forward prediction distance, and step 504 uses the squared distance of equation (10) as the backward prediction distance. Step 505 selects the smaller of the forward and backward prediction distances, and steps 506 and 507 set that value as the local distance.

Return to Fig. 1 once more. Steps 106 to 115 use the local distance given by step 105 to compute the distance (accumulated prediction error) between the input pattern and the standard pattern model, based on the dynamic programming given in References 1 to 3. Steps 106 to 108 evaluate the dynamic programming recurrence. Steps 109 to 114 increment the counters and test the loop conditions so that the computation is carried out for all frames, categories, and states. Step 115 selects, as the recognition result, the category with the smallest accumulated prediction error at the end point.
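
The recognition loop of steps 106 to 115 can be sketched as below, taking the local distance table of each category as already computed. The sketch assumes a left-to-right network in which a path may stay in the current state or move to the next one, starts in the first state, and ends in the last state; the flowcharts themselves are not reproduced in this text, so the topology, the function names, and the use of infinity for unreachable entries are assumptions.

```python
import math


def cumulative_error(d):
    """Accumulated prediction error for one category.

    d[i][j] is the local distance between frame i and state j (0-based).
    """
    n_frames, n_states = len(d), len(d[0])
    g = [math.inf] * n_states
    g[0] = d[0][0]  # the path starts in the first state
    for i in range(1, n_frames):
        new_g = [math.inf] * n_states
        for j in range(n_states):
            # Stay in state j, or arrive from state j - 1.
            best_prev = g[j] if j == 0 else min(g[j], g[j - 1])
            new_g[j] = d[i][j] + best_prev
        g = new_g
    return g[-1]  # the path ends in the last state


def recognize(tables):
    """tables maps each category to its local distance table."""
    return min(tables, key=lambda s: cumulative_error(tables[s]))
```

Only one row of accumulated errors is kept per frame, which mirrors the frame-synchronous order of the computation.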

The standard pattern learning method of the present invention provides a way to set initial parameter values within a standard pattern learning method that automatically determines, from learning data, the parameters characterizing the standard pattern model (finite state transition network). FIGS. 6 and 7 show the processing flow of the initial value setting method given by equations (12) to (21), as described in the Operation section. Step 603 of FIG. 6 follows equation (21): it sets the initial value of a predictor (multilayer perceptron) parameter, namely the threshold vector of the output layer units, to the mean of the feature vectors of a plurality of learning data. FIG. 7 follows equation (12): it initializes the threshold vector (step 703) and the inter-unit coupling coefficient matrix (step 704) of each state of each category with random numbers. In steps 703 and 704, RND(mag) denotes a uniform random number whose absolute value is less than mag. As stated in the Operation section, mag is chosen to be sufficiently small compared with the absolute value of the initial value of θ₀^s(j).
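The initialization in steps 603, 703, and 704 might look roughly like the following sketch for a single state. Equations (12) and (21) are not reproduced in this passage, so several points are assumptions: that the training frames assigned to the state are already available as `state_frames`, that thresholds are additive, and that the perceptron has one hidden layer. The function and key names are hypothetical.

```python
import random

def rnd(mag, rng):
    """Uniform random number within +/- mag, standing in for RND(mag)."""
    return rng.uniform(-mag, mag)

def init_state_predictor(state_frames, n_in, n_hidden, mag=0.01, seed=0):
    """Initial parameters for the predictor of one state (assumed layout)."""
    rng = random.Random(seed)
    dim = len(state_frames[0])
    # Step 603: output-layer threshold vector = mean of the training feature vectors.
    theta_out = [sum(f[k] for f in state_frames) / len(state_frames)
                 for k in range(dim)]
    # Steps 703 and 704: remaining thresholds and coupling matrices get
    # small random values, with mag much smaller than |theta_out|.
    return {
        "theta_hidden": [rnd(mag, rng) for _ in range(n_hidden)],
        "w_hidden": [[rnd(mag, rng) for _ in range(n_in)] for _ in range(n_hidden)],
        "w_out": [[rnd(mag, rng) for _ in range(n_hidden)] for _ in range(dim)],
        "theta_out": theta_out,
    }
```

Under these assumptions the untrained predictor outputs approximately the mean feature vector, because the small coupling coefficients contribute little next to the output thresholds.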

[Effect of the invention]

As described above, the present invention provides two things. The first is a pattern recognition method based on more accurate time-series pattern prediction: when predicting the feature vector at a given time, it takes into account not only the correlation with feature vectors before that time but also the correlation with feature vectors after it. The second is a standard pattern learning method that sets initial values enabling better parameter estimation when the parameters of the standard pattern model are determined by iterative learning.

[Brief description of the drawings]

FIG. 1 is a flowchart of the pattern recognition method of the present invention when DP matching is adopted as the definition of the cumulative prediction error. FIG. 2 is a flowchart of the initialization algorithm used in the flowchart of FIG. 1. FIG. 3 is a flowchart of the local distance calculation when the predictor of the first invention is used. FIG. 4 is a flowchart of the local distance calculation when the predictor of the second invention is used. FIG. 5 is a flowchart of the local distance calculation when the predictor of the third invention is used. FIGS. 6 and 7 are flowcharts for setting the initial values of the model parameters by the standard pattern learning method of the present invention.

Continuation of the front page

(56) References
JP-A-4-324500 (JP, A)
Proceedings of the Acoustical Society of Japan 1990 Autumn Meeting, Vol. I, 2-P-17, "Speech Recognition Using a Demisyllable Neural Prediction Model", pp. 163-164 (presented September 20, 1990)
Proceedings of the Acoustical Society of Japan 1990 Autumn Meeting, Vol. I, 1-8-22, "Speech Recognition by Neural Network Predictive HMM", pp. 43-44 (presented September 19, 1990)
Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J73-D-II, No. 8, August 1990, "Speaker-Independent Speech Recognition Using a Neural Prediction Model", pp. 1315-1321 (published August 25, 1990)
IEICE Technical Report [Speech], Vol. 91, No. 95, SP91-14, "Hierarchical Neural Network with Time-Series Processing Capability", pp. 63-70 (published June 20, 1991)
IEICE Technical Report [Speech], Vol. 89, No. 90, SP89-23, "Speech Recognition Using a Prediction Model Based on Neural Networks", pp. 81-87 (published June 22, 1989)
IEICE Technical Report [Speech], Vol. 89, No. 340, SP89-83, "Neural Network Driven HMM", pp. 55-62 (published December 14, 1989)
Proceedings of the 1990 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, "S8.8 Speaker-Independent Word Recognition Using a Neural Prediction Model", pp. 441-444 (April 3-6, 1990)

(58) Fields investigated (Int. Cl.⁷, DB name)
C10L 3/00 535
C10L 3/00 521
C10L 3/00 539
C10L 9/10 301
G06F 15/18
JICST file (JOIS)
INSPEC

Claims

(57) [Claims]

1. A pattern recognition method for recognizing an input pattern represented as a time series of feature vectors using a standard pattern model composed of a finite state transition network, wherein each state of the finite state transition network has a predictor that calculates a prediction vector for the feature vector at time i from a plurality of feature vectors of the input pattern at time i+1 and later, and wherein the distance D(a_i, A_i(j)) between the feature vector a_i at time i of the input pattern and the prediction vector A_i(j) for the feature vector at time i produced by the predictor of state j is used as the local distance d(i, j) between the feature vector at time i of the input pattern and the j-th state of the finite state transition network.

2. A pattern recognition method for recognizing an input pattern represented as a time series of feature vectors using a standard pattern model composed of a finite state transition network, wherein each state of the finite state transition network has a predictor that calculates a prediction vector for the feature vector at time i from a plurality of feature vectors of the input pattern at time i-1 and earlier and a plurality of feature vectors of the input pattern at time i+1 and later, and wherein the distance D(a_i, A_i(j)) between the feature vector a_i at time i of the input pattern and the prediction vector A_i(j) for the feature vector at time i produced by the predictor of state j is used as the local distance d(i, j) between the feature vector at time i of the input pattern and the j-th state of the finite state transition network.

3. A pattern recognition method for recognizing an input pattern represented as a time series of feature vectors using a standard pattern model composed of a finite state transition network, wherein each state of the finite state transition network has a forward predictor that calculates a forward prediction vector for the feature vector at time i from a plurality of feature vectors of the input pattern at time i-1 and earlier, and a backward predictor that calculates a backward prediction vector for the feature vector at time i from a plurality of feature vectors of the input pattern at time i+1 and later, and wherein a quantity D(d_F(a_i, A_i^F(j)), d_B(a_i, A_i^B(j))) is used as the local distance d(i, j) between the feature vector at time i of the input pattern and the j-th state of the finite state transition network, the quantity being calculated from the forward prediction distance d_F(a_i, A_i^F(j)) between the feature vector a_i at time i of the input pattern and the forward prediction vector A_i^F(j) for the feature vector at time i produced by the forward predictor of state j, and the backward prediction distance d_B(a_i, A_i^B(j)) between the feature vector a_i at time i of the input pattern and the backward prediction vector A_i^B(j) for the feature vector at time i produced by the backward predictor of state j.

4. A standard pattern learning method that, when a multilayer perceptron is used as the predictor associated with each state of the finite state transition network in the pattern recognition method according to any one of claims 1 to 3, automatically determines the parameters of the multilayer perceptron from learning data, wherein a representative vector calculated from the learning data is used as the initial value of the thresholds of the output layer units of the multilayer perceptron associated with state j.
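As an illustration of the predictor input described in claim 2, the sketch below builds a single context vector from frames on both sides of time i while leaving out frame i itself. The window lengths `tau_past` and `tau_future`, the function name, and the choice to reject frames without full context are assumptions made for this example only.

```python
def bidirectional_context(frames, i, tau_past=2, tau_future=2):
    """Concatenate frames i-tau_past..i-1 and i+1..i+tau_future (frame i excluded)."""
    if i - tau_past < 0 or i + tau_future >= len(frames):
        raise ValueError("not enough context around frame i")
    past = frames[i - tau_past:i]
    future = frames[i + 1:i + 1 + tau_future]
    # Flatten the selected frames into one predictor input vector.
    return [x for frame in past + future for x in frame]
```

A predictor of the kind in claim 1 corresponds to `tau_past=0`, which uses only the frames after time i.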