JPS5936760B2

JPS5936760B2 - Recognition method using nonlinear matching

Info

Publication number: JPS5936760B2
Application number: JP11736275A
Authority: JP
Inventors: 博平川; 幸和蕪山; 俊夫松浦
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1975-09-29
Filing date: 1975-09-29
Publication date: 1984-09-05
Also published as: JPS5242006A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a recognition method using nonlinear matching. In particular, when nonlinear matching (determination of the similarity sum ΣR) is performed on the basis of the similarities between each element of a standard pattern sequence and each element of an input speech pattern sequence, a time series of the time-slope components of the PARCOR coefficients extracted from the input speech is obtained, and a matching processing means is provided that makes the similarity Rij used to obtain the similarity sum ΣR transition successively in correspondence with the peak timings of that time series.

As one technique for pattern recognition of input speech, a scheme is under consideration in which a time series of coefficients called PARCOR coefficients is obtained and matched against a reference time series of PARCOR coefficients.

The PARCOR coefficients referred to here are those disclosed in "Proceedings of the 8th Symposium of the Research Institute of Electrical Communication, Tohoku University: Fumitada Itakura, 'Speech feature extraction by statistical methods'" and in "Proceedings of the 1970 (Showa 45) National Convention of the Institute of Electronics and Communication Engineers of Japan, 5-3-9: Fumitada Itakura et al., 'PARCOR-type speech synthesis'".

So-called nonlinear matching is widely used to match a standard pattern sequence, such as the above time series of PARCOR coefficients, against the input speech pattern sequence extracted from the input speech (hereinafter sometimes abbreviated to "input pattern sequence").

This processing can be understood as follows. From a standard pattern sequence S = (Si) (i = 1, 2, ..., m) and an input pattern sequence P = (Pj) (j = 1, 2, ..., n), a matrix is formed that contains the similarity Rij between every pair of elements Si and Pj of the two sequences.
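The construction of the matrix can be sketched as follows. This is only an illustration: the text does not say how the similarity between two elements is measured, so the cosine similarity used here, the representation of each element as a list of numbers, and the names `cosine_similarity` and `similarity_matrix` are all assumptions. Indices are 0-based, so `R[0][0]` plays the role of R11.

```python
import math


def cosine_similarity(a, b):
    """Assumed similarity measure; the patent text does not specify one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define the similarity with an all-zero element as 0
    return dot / (norm_a * norm_b)


def similarity_matrix(S, P, sim=cosine_similarity):
    """Return R with R[i][j] = similarity between S[i] and P[j]."""
    return [[sim(s, p) for p in P] for s in S]
```

Any other similarity function with the same two-argument shape can be passed as `sim`.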

Then, proceeding from the similarity R11 corresponding to the start point to the similarity Rmn corresponding to the end point, at each step from Ri,j the larger of Ri,j+1 and Ri+1,j+1 is selected, and the similarities selected in this way are traced in order while their sum is accumulated. This computation of the similarity sum is carried out for every standard pattern sequence, and the input pattern sequence is recognized as belonging to the one standard pattern sequence that yields the largest similarity sum.

Nonlinear matching is performed as described above. However, because the path is traced by selecting, at each step from Ri,j, the larger of Ri,j+1 and Ri+1,j+1 (that is, by transition processing), if for some reason a locally high similarity appears where it is not wanted, a transition occurs at an undesired position and the correct similarity sum may not be obtained.

An object of the present invention is to solve the above problem. The recognition method using nonlinear matching according to the present invention performs nonlinear matching between a standard pattern sequence prepared in advance and an input speech pattern sequence on the basis of the similarities between each element of the standard pattern sequence and each element of the input speech pattern sequence, and is characterized as follows. There are provided a PARCOR coefficient time-slope component extraction unit that determines the time-slope component of the PARCOR coefficients on the basis of the PARCOR coefficients extracted from the input speech, a time-series generation unit that generates a time series of the extracted PARCOR coefficient time-slope components, and a matching processing means that obtains the match between the standard pattern sequence and the input speech pattern sequence. The matching processing means is configured so that, in correspondence with the peak timings in the PARCOR coefficient time-slope component time series, a transition timing is given at which the similarity R used to obtain the similarity sum ΣR transitions from one similarity Rij, between the i-th element of the standard pattern sequence and the j-th element of the input speech pattern sequence, to another similarity Ri+1,j+1, between the (i+1)-th element of the standard pattern sequence and the (j+1)-th element of the input speech pattern sequence, and the transition is forcibly made.

The invention is described below with reference to the drawings. FIG. 1 shows the configuration of one embodiment of the recognition method according to the present invention; FIG. 2 shows the configuration of one embodiment of the time-slope processing unit shown in FIG. 1; FIG. 3 shows an example of the Q parameters extracted according to the present invention; FIG. 4 is an explanatory diagram of the nonlinear matching processing according to the present invention; and FIG. 5 shows, in flowchart form, one embodiment of the processing of the matching unit according to the present invention.

In FIG. 1, reference numeral 1 denotes a PARCOR coefficient k-parameter extraction unit; 2-1 to 2-10 denote time-slope processing units, each of which takes the average of its k parameter over a predetermined short interval and extracts the temporal change of that average to obtain a Q parameter; 3 denotes a time-series generation unit that produces the PARCOR coefficient time-slope component time series Vj; 4 denotes an input speech pattern sequence generation unit that obtains the input pattern sequence P corresponding to the input speech, for example on the basis of the k parameters; 5 denotes a matching processing unit; and 6 denotes a standard pattern sequence group storage unit that stores a plurality of predetermined standard pattern sequences s(0), ..., s(r), ....

In the embodiment described below, the standard pattern sequences s(0), ... take the same form as the input pattern sequence P, but they are stored in the storage unit 6 in a form that is thinned out along the time axis compared with P. For the input speech, the PARCOR coefficient k-parameter extraction unit 1, which uses known means, obtains the k parameters k1 to k10, and these k parameters are supplied to the input pattern sequence generation unit 4.

In the generation unit 4, a pattern element is obtained, for example, at each time t = 0, T, 2T, ..., so that the input pattern sequence P is determined, and the matching processing unit 5 matches P against each of the standard pattern sequences s(0) to s(r). In the present invention, when the matching processing unit 5 performs this matching, the PARCOR coefficient time-slope component time series Vj described above is obtained, and the timings at which Vj peaks are detected and used.

FIG. 2 shows the configuration of one embodiment of the time-slope processing unit shown in FIG. 1. In the figure, 7 to 10 denote delay circuits, 11 denotes a summing operational amplifier, 12 denotes a differential amplifier, and 13 to 16 denote resistors.

For example, when the k parameter k1 is input, the values k1(Ti), k1(Ti+T0), and k1(Ti+2T0) are averaged by the summing operational amplifier 11 to give a value A.

The delay circuit 10 delays the value A, and the differential amplifier 12 outputs the difference between the delayed value of A and the value A itself, that is, the Q parameter Q1. As shown in FIG. 3, the Q parameters obtained in this way contain features that depend on the speaker (A, B, ...) and on the word spoken (such as "4" or "9"). Because PARCOR coefficients are known to correspond to characteristics of the vocal tract, the Q parameters Q1 to Q10 correspond to the change in the vocal tract, that is, the change in the shape of the mouth, when the speaker utters a word. In other words, the PARCOR coefficient time-slope component time series Vj is determined from the Q parameters Q1 to Q10 by

$$V_j(t) = \{ Q_1^2(t) + Q_2^2(t) + \cdots + Q_{10}^2(t) \}^{1/2} \tag{1}$$

and the timings at which Vj peaks are the points where the change in the shape of the mouth is large; they may be regarded as the timings at which the phoneme changes. For an input corresponding to a word such as "akai" (アカイ), for example, they may be regarded as the timing at which the phoneme "a" changes to the phoneme "ka" and the timing at which the phoneme "ka" changes to the phoneme "i". The time-series generation unit shown in FIG. 1 generates the time series Vj according to equation (1).

FIG. 4 is an explanatory diagram of the nonlinear matching processing performed by the matching unit according to the present invention.

As in conventionally known nonlinear matching, the elements Si of one standard pattern sequence S (each corresponding to a phoneme such as "a", "ka", or "i") and the elements Pj of the input pattern sequence P are arranged along the horizontal and vertical axes, and the similarity Rij between each pair of elements Si and Pj is determined to obtain the matrix R. Then, using the PARCOR coefficient time-slope component time series Vj obtained as described above, the similarities R that make up the similarity sum ΣR are extracted as follows, starting from the similarity R11:

1) At timings where the time series Vj is at or below a predetermined threshold E0, the path moves from an extracted similarity Rij to Ri,j+1.

2) At timings where the time series Vj exceeds the predetermined threshold E0, the path is forced to move from an extracted similarity Rij to Ri+1,j+1.

That this transition processing is justified is readily understood from the fact that, as stated above, the PARCOR coefficient time-slope component corresponds to the change in the shape of the mouth, that is, to the change from one phoneme to the next.

FIG. 5 shows, in flowchart form, the processing in the matching processing unit 5 that obtains the similarity sum ΣR according to the transitions described in connection with FIG. 4.

The processing is as follows.

a) At the start point, the similarity R11 corresponding to the elements S1 and P1 is set in a register W.

b) Next, to decide whether the similarity R12 should be extracted and added to the register W, it is checked whether V2 > E0.

c) If V2 > E0 does not hold, the similarity R12 is added to the register W, and processing proceeds to checking whether the similarity R13 should be extracted next.

d) If V2 > E0 holds, the similarity R12 is not extracted; instead the similarity R22 is added to the register W, and processing proceeds to checking whether the similarity R23 should be extracted next.

The matching processing unit 5 shown in FIG. 1 performs the processing described in connection with FIGS. 4 and 5. Instead of checking Vj > E0 as described above, however, the transitions can also be made while tracking changes in the peak level of the time series Vj.

In this case, the transition processing is performed as follows.

(3) Suppose that the similarity R currently extracted for the similarity sum ΣR is Rij. The value (Rij/Vj), obtained by dividing Rij by the PARCOR coefficient time-slope component Vj at the corresponding time, is computed.

(4) Meanwhile, the value (Ri+1,j+1/K) is computed by multiplying the similarity Ri+1,j+1, which may be extracted next, by a constant (1/K).

(5) The value (Rij/Vj) is compared with the value (Ri+1,j+1/K). If the former is larger, the similarity Ri,j+1 is extracted as the next similarity R, the corresponding value (Ri,j+1/Vj+1) is computed, and it is compared with the value (Ri+1,j+2/K).

(6) If the comparison in (5) shows that the latter is larger, a phoneme change is taken to be present and the similarity Ri+1,j+1 is extracted. Processing then proceeds to comparing the value (Ri+1,j+1/Vj+1) with the value (Ri+2,j+2/K).

When a peak of the time series Vj appears, the value (Rij/Vj) becomes small and the value (Ri+1,j+1/K) becomes relatively large, so this transition processing likewise makes the path move from the similarity Rij to Ri+1,j+1 in correspondence with the peak timings of the PARCOR coefficient time-slope component.

In the above description, all of the Q parameters Q1 to Q10 are used, according to equation (1), to obtain the PARCOR coefficient time-slope component time series Vj. However, the time series Vj may instead be obtained from a selected subset of the Q parameters Q1 to Q10, chosen so that the peaks in Vj appear cleanly, as illustrated in FIG. 4.

Simulations by the inventors confirmed that adopting this method gives more favorable results.

As stated above, in conventionally known nonlinear matching the path is traced by selecting, from Ri,j, the larger of Ri,j+1 and Ri+1,j+1, so there was a risk of transitions occurring in an undesired manner. In the present invention, by contrast, a sufficiently reliable transition timing is supplied, so the possibility that a point which should transition to Ri+1,j+1 erroneously proceeds to Ri+1,j is reduced.

As described above, the present invention uses the time series Vj of the time-slope components of the PARCOR coefficients and performs the transition processing using its peak timings. Undesired transitions caused by locally high similarities appearing where they are not wanted, as in the conventionally known method, therefore no longer occur.

[Brief explanation of drawings]

FIG. 1 shows the configuration of one embodiment of the recognition method according to the present invention; FIG. 2 shows the configuration of one embodiment of the time-slope processing unit shown in FIG. 1; FIG. 3 shows an example of the Q parameters extracted according to the present invention; FIG. 4 is an explanatory diagram of the nonlinear matching processing according to the present invention; and FIG. 5 shows, in flowchart form, one embodiment of the processing of the matching unit according to the present invention.

In the figures, 1 denotes the PARCOR coefficient k-parameter extraction unit, 2-1 to 2-10 denote the time-slope processing units, 3 denotes the time-series generation unit, 4 denotes the input speech pattern sequence generation unit, 5 denotes the matching processing unit, and 6 denotes the standard pattern sequence group storage unit.

Claims

[Claims]

1. A recognition method using nonlinear matching, in which nonlinear matching processing between a standard pattern sequence prepared in advance and an input speech pattern sequence is performed on the basis of the similarities between each element of the standard pattern sequence and each element of the input speech pattern sequence, characterized in that there are provided: a PARCOR coefficient time-slope component extraction unit that determines the time-slope component of the PARCOR coefficients on the basis of the PARCOR coefficients extracted from the input speech; a time-series generation unit that generates a time series of the extracted PARCOR coefficient time-slope components; and a matching processing means that obtains the match between the standard pattern sequence and the input speech pattern sequence; and in that the matching processing means is configured so that, in correspondence with the peak timings in the PARCOR coefficient time-slope component time series, a transition timing is given at which the similarity R used to obtain the similarity sum ΣR transitions from one similarity Rij, between the i-th element of the standard pattern sequence and the j-th element of the input speech pattern sequence, to another similarity Ri+1,j+1, between the (i+1)-th element of the standard pattern sequence and the (j+1)-th element of the input speech pattern sequence, and the transition is forcibly made.