JPH0634191B2

JPH0634191B2 - Pattern feature normalization method

Info

Publication number: JPH0634191B2
Application number: JP62240093A
Authority: JP
Inventors: 健一磯
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1987-09-24
Filing date: 1987-09-24
Publication date: 1994-05-02
Anticipated expiration: 2009-05-02
Also published as: JPS6482000A

Description

[Detailed description of the invention] (Industrial field of application) The present invention relates to pattern recognition apparatus, and in particular to an improvement of a pattern normalization method for coping with variation in the patterns to be recognized.

(Prior art) In speech recognition by pattern matching between standard patterns and an input pattern, variation of the patterns to be recognized, caused by transmission-line characteristics, individual differences, gender differences and the like, lowers the recognition rate. Normalizing the variation of the input pattern is therefore important.

One conventionally used normalization method obtains the least-squares approximation line of the speech spectrum and subtracts it from the original speech spectrum. Details are given in Technical Report PRL79-46 of the Institute of Electronics and Communication Engineers of Japan, "A Method of Word Speech Recognition by Nonlinear Spectral Matching," October 1979.
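The line-subtraction idea described above can be sketched as follows. This is a minimal illustration, not the procedure of the cited report: the function name `remove_spectral_tilt` is hypothetical, and using the channel index 0, 1, 2, ... as the abscissa of the fit is an assumption.

```python
def remove_spectral_tilt(spectrum):
    """Subtract the least-squares straight line from a (log) spectrum.

    spectrum: list of channel values; the channel index is used as the
    x coordinate of the fit (an assumption made for this sketch).
    """
    n = len(spectrum)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(spectrum) / n
    # Closed-form least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, spectrum))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    # What remains is the spectrum with its overall tilt removed.
    return [y - (slope * x + intercept) for x, y in zip(xs, spectrum)]
```

A spectrum that is itself a straight line is reduced to zeros, which is why this approach only compensates for variation in tilt.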

(Problems to be solved by the invention) The above method is effective when the variation of the speech spectrum appears only as variation in spectral tilt. Normalizing pattern variation caused by differences in telephone or microphone characteristics, or variation due to gender differences, requires a normalization method that can also handle more complex, nonlinear pattern variation.

An object of the present invention is to provide a pattern feature normalization method in which both an operation that normalizes the nonlinear variation component of a pattern independently of the recognition target class, and the variation type that should be prepared as the standard pattern at recognition time, can be determined inductively from variation patterns for learning.

If the input pattern normalization method of the present invention is used as the preprocessing unit of a recognition apparatus, input patterns that may have any of several variation types can be recognized accurately while standard patterns are prepared for only a single variation type.

(Means for solving the problems) The pattern feature normalization method according to the present invention is characterized in that, in pattern recognition, when normalizing variation in the case where input patterns have a plurality of variation types (for example, differences in input device characteristics, individual differences, or gender differences), learning patterns for a small number of recognition target classes (which must nevertheless include patterns belonging to every variation type) are used to inductively determine, by optimization means using the steepest descent method, a standard variation type and a group of parameterized normalization operations that normalize patterns of the other variation types with reference to the standard variation type.

(Operation) The basic principle of the present invention is to determine the normalization operations inductively, using representative variation-type patterns for a small number of classes, in order to normalize variation components that do not depend on the pattern class of the input pattern.

The set of representative variation patterns of speech is written $\{x^{(p)}(a) : p = 1, \ldots, N_v,\ a = 1, \ldots, N_c\}$. Here $p$ denotes the variation type (for example, for gender differences, $p = 1$ (male) and $p = 2$ (female)), and $a$ denotes the pattern class.

The normalization operation $F$ from variation type $p$ to the standard type $p_\phi$ is written $y^{(p,p_\phi)}(a) = F(x^{(p)}(a) : c_m^{(p,p_\phi)})$. Here $\{c_m^{(p,p_\phi)}\}$ ($m = 1, \ldots, M$; $p = 1, \ldots, N_v$, $p \neq p_\phi$) are the parameters that characterize the normalization function from variation type $p$ to the standard variation type $p_\phi$, and $y^{(p,p_\phi)}(a)$ is the pattern obtained by normalizing the pattern $x^{(p)}(a)$ of variation type $p$ with reference to the standard variation type $p_\phi$. A method of inductively determining the standard variation type $p_\phi$ and $\{c_m^{(p,p_\phi)}\}$ from the above variation pattern set is described below.

Given a standard variation type $p_\phi$ and parameters $\{c_m^{(p,p_\phi)}\}$, the following quantity $E(c_m^{(p,p_\phi)})$ is defined as an evaluation function representing the normalization error.
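The defining formula itself does not survive in this text. One form consistent with the surrounding definitions would sum, over every non-standard variation type and every learning class, the distance between the standard-type pattern and the normalized pattern. The order of the arguments of $d$ and the absence of any normalizing constant are assumptions.

```latex
E\bigl(c_m^{(p,p_\phi)}\bigr)
  = \sum_{p \neq p_\phi} \sum_{a=1}^{N_c}
    d\Bigl[\, x^{(p_\phi)}(a),\; F\bigl(x^{(p)}(a) : c_m^{(p,p_\phi)}\bigr) \Bigr]
```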

Here $d[\,\cdot\,,\,\cdot\,]$ is a distance function between patterns. Once $p_\phi$ is given, the parameters $\{c_m^{(p,p_\phi)}\}$ that bring this error $E$ to a local minimum can be found by the steepest descent method (see Iwanami Course in Information Science, vol. 19, "Optimization," p. 46). Therefore, if $E$ is minimized with each variation type taken in turn as the provisional standard type $p_\phi$, and the variation type giving the smallest error is selected as the standard variation type, the optimal $p_\phi$ and $\{c_m^{(p,p_\phi)}\}$ are obtained.

When this method is applied to patterns represented by time-series vectors of differing length, such as speech, the defining formula for the error $E(c_m^{(p,p_\phi)})$ may be modified as follows.

Here the subscript $i$ denotes the time axis of the time-series pattern. The patterns are compared after their time axes have been brought into correspondence with that of the standard-variation-type pattern $x_i^{(p_\phi)}(a)$ by DP matching or a similar method.
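The time-axis correspondence can be illustrated with a basic dynamic-programming alignment. This is a sketch under assumptions, not the matching procedure of the patent: the function names `dp_align` and `time_series_error`, the three allowed path steps, the squared Euclidean frame distance, and the choice to align before normalizing are all hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two frames."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def dp_align(ref, seq):
    """For each frame i of ref, return the index of the seq frame aligned to it."""
    n, m = len(ref), len(seq)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = sq_dist(ref[i], seq[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            # Allowed predecessors: vertical, horizontal, diagonal.
            steps = []
            if i > 0:
                steps.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                steps.append((cost[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                steps.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            best_cost, prev = min(steps, key=lambda s: s[0])
            cost[i][j] = d + best_cost
            back[i][j] = prev
    # Trace the optimal path back from the final cell.
    mapping = [None] * n
    cell = (n - 1, m - 1)
    while cell is not None:
        i, j = cell
        if mapping[i] is None:
            mapping[i] = j
        cell = back[i][j]
    return mapping


def time_series_error(ref, seq, normalize):
    """Sum of frame distances between ref and the aligned, normalized seq."""
    mapping = dp_align(ref, seq)
    return sum(sq_dist(ref[i], normalize(seq[j])) for i, j in enumerate(mapping))
```

Here `ref` plays the role of the standard-variation-type pattern and `seq` that of a pattern of another variation type; `normalize` stands in for the normalization function $F$ applied frame by frame.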

A concrete example is given below for the case where the following functions are adopted as the normalization function $F$ and the distance function $d$.

(The superscripts $m$ and $n$ on $x$ and $y$ denote components of the pattern vector.)

$d[x, y] = \sum_m (x^m - y^m)^2$

If $x_i^{n\,(p)}(a)$ is regarded as the $n$-th component of the short-time spectrum of the $i$-th frame of speech, this normalization function is effective for normalizing differences in the positions of spectral peaks (formant frequencies) on the frequency axis caused by gender differences and the like. In this case the parameters $c_n^{m\,(p,p_\phi)}$ are updated by the steepest descent method, with $\varepsilon$ a small constant, as

$c_n^{m\,(p,p_\phi)} \leftarrow c_n^{m\,(p,p_\phi)} + \Delta c_n^{m\,(p,p_\phi)}$

and the processing proceeds as follows.

1) Set the initial values of $p_\phi$ and $\{c_n^{m\,(p,p_\phi)}\}$.

2) Compute the error $E(c_n^{m\,(p,p_\phi)})$ and the corrections $\{\Delta c_n^{m\,(p,p_\phi)}\}$.

3) Update $\{c_n^{m\,(p,p_\phi)}\}$: $c_n^{m\,(p,p_\phi)} \leftarrow c_n^{m\,(p,p_\phi)} + \Delta c_n^{m\,(p,p_\phi)}$.

4) Repeat (2) and (3) until convergence.

5) Update the center $p_\phi$ and return to (1).

6) Take as the center of variation the $p_\phi$ that minimizes the error $E(c_n^{m\,(p,p_\phi)})$.
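The formulas that define $F$ and the correction $\Delta c$ used in steps 2 and 3 are not preserved in this text. Purely as an illustration, suppose $F$ is the linear map below. This is an assumption suggested by the two-index parameter $c_n^m$, not something the text confirms. With the squared distance $d$ and an error summed over classes $a$ and frames $i$, the steepest descent correction would then take the following form.

```latex
% Assumed normalization function (hypothetical):
y_i^{m\,(p,p_\phi)}(a) = \sum_n c_n^{m\,(p,p_\phi)}\, x_i^{n\,(p)}(a)

% Resulting error:
E = \sum_{p \neq p_\phi} \sum_a \sum_i \sum_m
    \Bigl( x_i^{m\,(p_\phi)}(a) - y_i^{m\,(p,p_\phi)}(a) \Bigr)^2

% Steepest descent correction with small constant \varepsilon:
\Delta c_n^{m\,(p,p_\phi)}
  = -\varepsilon\, \frac{\partial E}{\partial c_n^{m\,(p,p_\phi)}}
  = 2\varepsilon \sum_a \sum_i
    \Bigl( x_i^{m\,(p_\phi)}(a) - y_i^{m\,(p,p_\phi)}(a) \Bigr)\, x_i^{n\,(p)}(a)
```

Under this assumption each row $m$ of the parameter array is fitted independently, and the correction vanishes once the normalized pattern coincides with the standard-type pattern.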

(Embodiment) FIG. 1 is a block diagram showing one embodiment of an apparatus implementing the present invention. The learning data storage unit 1 stores representative variation pattern data for a small number of classes. The parameter initialization unit 2 initializes $p_\phi$, $\{c_n^{m\,(p,p_\phi)}\}$ and $E(c_n^{m\,(p,p_\phi)})$ and stores them in the parameter buffer 3. The steepest descent calculation unit 4 minimizes the error $E(c_n^{m\,(p,p_\phi)})$ by the steepest descent method, using the normalization function parameters read from the parameter buffer 3 and the data in the learning data storage unit 1, and writes the resulting error $E(c_n^{m\,(p,p_\phi)})$ and parameters $\{c_n^{m\,(p,p_\phi)}\}$ back to the parameter buffer 3. This operation is repeated with every variation type taken in turn as the provisional center $p_\phi$. Finally, the output unit 5 selects from the parameter buffer 3 the $p_\phi$ and $\{c_n^{m\,(p,p_\phi)}\}$ that minimize the error $E$ and outputs them as the learning result for the normalization function.
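The learning flow of FIG. 1 can be sketched in a few functions. This is an illustration under stated assumptions rather than the apparatus itself: the normalization function is taken to be a linear map (the text does not preserve its definition), patterns are fixed-length vectors so no DP matching is needed, the parameters start from the identity, and a fixed iteration count stands in for a convergence test. All function names are hypothetical.

```python
def apply_linear(c, x):
    """Assumed normalization F: y[m] = sum over n of c[m][n] * x[n]."""
    return [sum(cmn * xn for cmn, xn in zip(row, x)) for row in c]


def sq_dist(x, y):
    """Distance d[x, y] = sum over m of (x[m] - y[m]) ** 2."""
    return sum((xm - ym) ** 2 for xm, ym in zip(x, y))


def fit_to_standard(sources, targets, eps=0.01, iters=2000):
    """Steepest descent for one variation type toward a provisional standard.

    sources[a], targets[a]: class-a pattern of the variation type being
    normalized and of the provisional standard type, respectively.
    """
    dim = len(sources[0])
    # Identity initialization is an assumption of this sketch.
    c = [[1.0 if m == n else 0.0 for n in range(dim)] for m in range(dim)]
    for _ in range(iters):
        grad = [[0.0] * dim for _ in range(dim)]
        for x, t in zip(sources, targets):
            y = apply_linear(c, x)
            for m in range(dim):
                for n in range(dim):
                    grad[m][n] += -2.0 * (t[m] - y[m]) * x[n]
        for m in range(dim):
            for n in range(dim):
                c[m][n] -= eps * grad[m][n]  # c <- c + delta_c
    err = sum(sq_dist(t, apply_linear(c, x)) for x, t in zip(sources, targets))
    return c, err


def train(patterns, eps=0.01, iters=2000):
    """patterns[p][a] is the class-a vector of variation type p.

    Tries every variation type as the provisional standard and returns
    (standard type, {other type: parameters}, total error) for the best one.
    """
    best = None
    for p0 in patterns:
        params, total = {}, 0.0
        for p in patterns:
            if p == p0:
                continue
            c, err = fit_to_standard(patterns[p], patterns[p0], eps, iters)
            params[p] = c
            total += err
        if best is None or total < best[2]:
            best = (p0, params, total)
    return best
```

In this sketch the tuple kept in `best` plays the role of the parameter buffer 3, and the value returned by `train` corresponds to what the output unit 5 selects. The step size `eps` must be small relative to the pattern magnitudes for the descent to converge.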

FIG. 2 shows an example in which the normalization function group $\{p_\phi, c_n^{m\,(p,p_\phi)}\}$ obtained in this way is applied to speech recognition. In FIG. 2, the standard pattern storage unit 23 stores only the patterns of the standard variation type, $\{x_i^{(p_\phi)}(a)\}$. The normalization parameter storage unit 22 stores the normalization function parameters determined by the above method. When an input pattern arrives, the normalization operation unit 21 normalizes it using the parameters read from the normalization parameter storage unit 22. Because the variation type of the input pattern is unknown, the normalization operations from every variation type to the standard variation type are applied to the input pattern in parallel, and the results are sent to the matching unit 24. The matching unit 24 matches, in parallel, the standard patterns stored in the standard pattern storage unit 23 against the group of normalized input patterns sent from the normalization operation unit 21, and outputs the best-matching pattern as the recognition result.
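The recognition flow of FIG. 2 can be sketched as follows. The sketch makes several assumptions that the text does not state: the normalization is a linear map, patterns are fixed-length vectors, the candidates are evaluated in a sequential loop rather than in parallel, and an identity map is registered for the standard variation type itself. The function names are hypothetical.

```python
def apply_linear(c, x):
    """Assumed normalization F: y[m] = sum over n of c[m][n] * x[n]."""
    return [sum(cmn * xn for cmn, xn in zip(row, x)) for row in c]


def sq_dist(x, y):
    """Squared Euclidean distance between two pattern vectors."""
    return sum((xm - ym) ** 2 for xm, ym in zip(x, y))


def recognize(x, standard_patterns, normalizers):
    """Normalize x under every variation type and pick the best match.

    standard_patterns: {class label: pattern of the standard variation type}
    normalizers: {variation type: parameter matrix mapping that type to the
                  standard type}
    Returns (class label, variation type that gave the best match).
    """
    best = None
    for p, c in normalizers.items():
        y = apply_linear(c, x)  # candidate normalized input pattern
        for label, ref in standard_patterns.items():
            d = sq_dist(ref, y)
            if best is None or d < best[0]:
                best = (d, label, p)
    return best[1], best[2]
```

Only `standard_patterns` has to grow when a new class is added; `normalizers` is reused unchanged, which mirrors the role split between storage units 23 and 22.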

(Effects of the invention) As described above, according to the present invention, a group of transformation functions that accurately normalize the expected variation of input patterns can be determined inductively and adaptively from data for a small number of recognition target classes. Because the resulting normalization function group does not depend on the recognition target class, it is also effective for normalizing variation in patterns of classes not used in learning, and learning need not be repeated each time the recognition target classes change. That is, when a recognition target class is added, for example, it suffices to collect patterns of the standard variation type only and add them to the standard patterns. The normalization operation unit can be used without modification.

[Brief description of drawings]

FIG. 1 is a block diagram showing one embodiment of the present invention, and FIG. 2 is a block diagram showing an example of application of the present invention to a speech recognition apparatus. In the figures, 1 is a learning data storage unit, 2 is a parameter initialization unit, 3 is a parameter buffer, 4 is a steepest descent calculation unit, 5 is an output unit, 21 is a normalization operation unit, 22 is a normalization parameter storage unit, 23 is a standard pattern storage unit, and 24 is a matching unit.

Claims

1. A pattern feature normalization method for pattern recognition, characterized in that, when normalizing variation in the case where input patterns have a plurality of variation types, learning patterns for a small number of recognition target classes are used to inductively determine, by optimization means using the steepest descent method, a standard variation type and a group of parameterized normalization operations that normalize patterns of the other variation types with reference to the standard variation type.