JPS6240754B2

JPS6240754B2 -

Info

Publication number: JPS6240754B2
Application number: JP53144427A
Authority: JP
Inventors: Hiroaki Sekoe
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1978-11-22
Filing date: 1978-11-22
Publication date: 1987-08-29
Also published as: JPS5569879A

Description

[Detailed description of the invention]

本発明は音声等のパタン認識システムの主要構
成要素であるパタンマツチング装置の改良に関す
る。本発明によるパタンマツチング装置によつて取
り扱われるパタンは特徴量の時系列として表現さ
れるものであれば何ら限定されるものではない
が、以下では特に代表的な例として音声パタンを
対象とするパタンマツチング装置について説明す
る。音声パタンを自動認識して機械コード化する装
置である音声認識装置は高能率でかつ扱い易い新
時代のデータ入力装置として広範囲に応用分野を
有している。音声パタンを自動認識するための方
法としてはパタンマツチング法が最も信頼性が高
く簡単であるとして広く利用されている。この方
法では、認識対象とする各単語に標準的にパタン
（以下標準パタンと称する）が用意されており、
未知パタン（以下入力パタンと称する）が与えら
れると、上記標準パタンとの比較（すなわち、パ
タンマツチング）を行なつて、最も良好な一致が
得られる標準パタンを決定することによつてこの
入力パタンは該標準パタンと同一単語であると判
定される。このパタンマツチング法では入力パタ
ンと標準パタンとの比較を行うための距離尺度の
選定が重要である。すなわち、同一単語の発声で
あつても、その時々の声道系の運動が必ずしも一
定でないために音声パタンは複雑な変形を受け
る。この変形に影響されにくい尺度によつてパタ
ンマツチングを行なうことによつて、はじめて確
度の高い音声認識処理が可能になる。従来発表さ
れているパタンマツチング手法（またはそれを実
行する装置たるパタンマツチング装置）で最も信
頼性が高いと考えられるものが1974年に発行され
た刊行物「日本音響学会音声研究会資料S74−30
に“ミニコンとDPプロセツサによる実時間音声
認識システム”と題して発表された論文」（以下
引用文献(1)と呼ぶ）に記載されている。この方法
では、音声信号はスペクトラム分析と時間標本化
によつて各時点の音声的特徴を表わすベクトル
（以下特徴ベクトルと称する）の時系列に変換さ
れて処理されている。以下ではこのように特徴ベ
クトルの時系列として表わされたパタンを音声パ
タンと呼ぶ。上記引用文献(1)記載の方法によると
発声速度変動に記因する時間軸の非線形伸縮に対
して安定なマツチング動作を行なうことができ、
結果として確度の高い認識結果が得られるとされ
ている。しかし、上記引用文献(1)の方法は比較の対象と
なる入力パタンと標準パタンとがともに十分短か
い一定の（例えば、10ｍｓ程度）標本化周期で時
間標本化されている場合を主として想定して考案
されたものと言える。一方、このような時間標本
化によつて得られる音声パタンは極めて冗長であ
ることが知られている。すなわち、母音のように
定常な音韻では、ほとんど一定のスペクトラムが
100ｍｓ以上も継続することがある。従つて、上
記のような時間標本化を行なうと同一の音韻が10
点以上も時間標本化される。音声認識の目的では
一音韻について１個の時間標本点が得られれば十
分であるので、このような多重の時間標本化は冗
長なのである。しかし、実際には一音韻から１時
間標本点のみを選ぶような時間標本化は困難であ
るので、例えば、音声研究会資料S78−21（昭和
53年６月）に“マイクロプロセツサを用いた実時
間単語音声認識システム”と題して発表された論
文（以下引用文献(2)と称する）に記載される如
く、定常な部分は粗に変動している部分は密に時
間標本化する方法が取られる。このような時間標
本化によると定常な部分の標本点数が少なくなる
ので全体として冗長性の低い音声パタンを得るこ
とができる。以下ではこのような時間標本化を不
等間隔標本化と呼ぶことにする。前記引用文献(1)に述べられているマツチングア
ルゴリズムは時間軸が均等に標本化されているこ
とを前提として考案されたものである。不等間隔
時間標本化の場合には当然それに適したマツチン
グ方法が採用されなくてはならない。前記引用文
献(2)では次のようなマツチング方法が提案されて
いる。すなわち、２個の音声パタンＡとＢをＡ＝a₁、a₂、……ａ_i、……、ａ_I Ｂ＝b₁、b₂、……ｂ_j、……ｂ_J (1) と示す。ここにａ_iは音声パタンＡの第ｉ時間標
本点ｔ(i) (2) におけるスペクトラム的特徴を与えるスペクトル
であり、ｂ_jは音声パタンＢの第ｊ時間標本点 τ(j) におけるスペクトラム的特徴を示すベクトルであ
る。第１図にはこれら２個の音声パタンＡとＢをそ
れぞれｉ軸とｊ軸に展開して示している。このｉ
−ｊ平面上の点（ｉ、ｊ）は特徴ベクトルａ_iと
特徴ベクトルｂ_jとの組合を示している。前記引
用文献(1)に記載されている方法ではこれらの各点
（ｉ、ｊ）にａ_iとｂ_jとの間の距離（以後ベクト
ル距離と呼ぶ）ｄ（ｉ、ｊ）＝ａ_i−ｂ_j (3) を考えている。点（１、１）から点（Ｉ、Ｊ）
（第１図ではＩ＝８、Ｊ＝６）に至る点列（第１
図参照数字１）（ｉ(1)、ｊ(1)）、（ｉ(2)、ｊ(2)）、…… （ｉ(k)、ｊ(k)）、……、（ｉ(k)、ｊ(k)） (4) を想定する。ただし、ｉ(1)＝１、ｊ(1)＝１、ｉ(k)
＝Ｉ、ｊ(k)＝Ｊ。この点列上距離ｄ（ｉ(k)、ｊ(k)））の総和を最
小にすることによりこれら２個のパタン間の時間
軸変動を補正し、両パタン間の距離（以下パタン
距離と呼ぶ）を算出する。具体的にはによつて音声パタンＡとＢとのパタン距離が定義
されている。ここに、Ｎは正規化係数と呼ばれ、Ｎ＝Ｉ＋Ｊ (6) と定義される。またＷ(k)は荷重係数であつて、Ｗ(k)＝（ｉ(k)−ｉ（ｋ−１））＋（ｊ(k)−ｊ（ｋ−
１）） (7) と定義される。上記(5)式の計算は次のような動的計画法演算に
よつて計算されている。初期条件ｇ（１、１）＝ｄ（１、１） (8) 漸化式パタン距離Ｄ（Ａ、Ｂ）＝１／Ｎｇ（Ｉ、Ｊ） (10) 以上の交献(1)記載による方法は、例えば、
「IEEE Transactions on Acoustics、Speech、
and Signal Processing、Vol.ASSP−26、No.1、
（1978年２月）」に“Dynamic Programming
Algorithm Optimization for Spoken word
Recognition”と題して発表された論文（以下引
用文献(3)と称する）の第44ページに記載されてい
る如く音声パタンＡとＢがともに共通一定の周期
で時間標本化された場合に適用されるべく開発さ
れたのである。従つて、本発明において想定され
ている如く、不等間隔標本化された音声パタンを
対象とする場合には、不適当である。前記引用文献(2)には不等間標本化された音声パ
タンを対象として扱うべく工夫されたマツチング
アルゴリズムが記載されている。それによると、
各ベクトルａ_iに対してもそのベクトルが標本化
された時刻ｔ(i)と、その直前のベクトルai−１が
標本化された時刻ｔ（ｉ−１）との差 α(i)＝ｔ(i)−ｔ（ｉ−１） (11) を荷重係数としかつ各ベクトルｂ_jに対して同様
に β(j)＝τ(j)−τ（ｊ−１） (12) を荷重係数として算出しておく。そしてマツチン
グ処理にあたつては前記(9)式の代りになる漸化式演算を行なう。また、この計算に並行
して正規化係数Ｎを次の漸化式によつて算出す
る。ここにカツコ〔〕内の値は（13）式のカツコ
〔〕内の３値の内のいずれが選択されるかに従
がつて対応する値が選択されるものとする。（13）式において、右辺で（α(i)＋β(j)）／２
が荷重されているのは、次の目的が有つてのこと
とされている。すなわち、例えば、第１図におけ
るベクトルａ(4)、ｂ(4)のように粗に標本化されて
いる部分には大きな荷重（α(4)＋β(4)）／２を乗
ずることによつて原時間軸上で占める時間区間で
の割合を復元することを目的としている。しか
し、第２図に示す２種の点列２と点列３とを比較
して見ると、この引用文献(2)に記載された荷重係
数の用い方には重大な不都合があることがわか
る。すなわち、点列２上で荷重係数（α(i)＋β
(i)）／２の総和Ｎ（Ｉ、Ｊ）を計算すると、Ｎ（Ｉ、Ｊ）＝（１＋１）／２＋（１＋２）／２＋（１＋２）／２（１＋２）／２＋（３＋２）／
２＋（３＋２）／２＋（３＋１）／２＋（３＋
２）／２＋（３＋２）／２＋（１＋２）／２＋（２＋
２）／２＋（１＋２）／２＝22.5 となる。同様のことを点列３上で行なうと、Ｎ（Ｉ、Ｊ）＝（１＋１）／２＋（１＋１）／２＋（１＋１）／２＋（３＋１）／２＋（１＋
１）／２＋（２＋１）／２＋（１＋１）／２＋（１＋
２）／２＋（１＋２）／２＋（１＋１）／２＋（１＋
２）／２＋（１＋２）／２＝15.5 となる。これら荷重係数は（13）式の右辺でｄ
（ｉ、ｊ）に乗ぜられているので、（13）式がｇ
（Ｉ、Ｊ）まで計算され終つた時点では、点列２
を経由する場合よりも点列３を経由する場合の方
でｇ（Ｉ、Ｊ）の値が小さくなる傾向が存在する
ことになる。この結果、第２図の点列２が時間軸
変動補正のための最適点列であつても、上述した
ような荷重係数の影響で点列３が選択され誤まつ
た時間軸補正（すなわち、正規化）がなされると
いう欠点が文献(2)の方法にはあることになる。な
お、文献(2)では音声パタンＡとＢの間のパタン距
離をＤ（Ａ、Ｂ）＝ｇ（Ｉ、Ｊ）／Ｎ（Ｉ、
Ｊ）と定義している。この正規化係数Ｎ（Ｉ、Ｊ）を
計算するためには（14）式の漸化式計算が第１図
のｉ−ｊ平面内の各点で必要であるが、このため
に余分の計算が必要となる。本発明の目的は従来装置の有する上述の諸欠点
を除去して最小の計算量で正確な時間軸正規化が
できるパタンマツチング装置を提供することにあ
る。本発明の装置は不等間標本化されたスペクトル
ａ_iの時系列として表現される入力パタンＡを保
持するための入力パタンバツフア、ベクトルｂ_j
の時系列として表現される標準パタンＢを保持す
るための標準パタン記憶、前記入力パタンＡの各
ベクトルａ_iに対する荷重係数α(i)を記憶するた
めの入力パタン荷重重数記憶、前記標準パタンＢ
の各ベクトルｂ_jに対する荷重係数β(j)を記憶す
るための標準パタン荷重係数記憶、前記ベクトル
ａ_iと前記ベクトルｂ_jとの間の距離ｄ（ｉ、ｊ）
を時刻ｉとｊの任意の組合せに対して算出する機
能を有する距離計算部、時刻ｉとｊをインデツク
スとする漸化式値ｇ（ｉ、ｊ）として次の３種の
値 (1) 漸化式値ｇ（ｉ、ｊ−１）に前記距離ｄ
（ｉ、ｊ）と前記荷重係数β(j)の積を加算した
もの、 (2) 漸化式値ｇ（ｉ−１、ｊ−１）に前記距離ｄ
（ｉ、ｊ）と前記荷重係数の和（α(i)＋β(j)）
の積を加算したもの、 (3) 漸化式値ｇ（ｉ−１、ｊ）に前記距離ｄ
（ｉ、ｊ）と前記荷重係数α(i)の積を加算した
ものの中の最大値を選択出力する機能を有する
漸化式計算部とから構成されている。上記の漸化式を数式的に書下すと次のようにな
る。この漸化式によつて荷重される係数は第３図の
ようになる。すなわち、ｉ座標が増加すると、荷
重係数α(i)が、ｊ座標が増加する経路ではβ(j)
が、また、ｉ座標がともに増加する経路では（α
(i)＋β(j)）がそれぞれ荷重される。かくの如き荷
重係数によると、前記引用文献(2)において第２図
の場合に生じた不都合は解決される。すなわち、
第２図の点列２または点列３のいずれが選択され
ても荷重の総和は一定値20となる。一般には、荷
重係数の総和は、となる。このため、荷重係数の影響で特定の点列
が選択されるという傾向は排除された。また、正
規化係数Ｎは（16）式で与えられるので、（14）
式の漸化式を多数回計算するという処理は不要と
なる。従つて、本発明の原理によると、正解な時間軸
正規化な時間軸正規化能力を有しかつ簡単なパタ
ーンマツチング装置が実現できる。以下に本発明の構成を第４図以下の図面を参照
しながら詳細に説明する。第４図は本発明の一実施例を示す図である。以下に述べるパタンマツチング動作が開始され
る前に入力パタンバツフア２１と標準パタン記憶
２３のそれぞれには(1)式に示すごとき入力パタン
Ａおよび標準パタンＢが記憶され、また、入力パ
タン荷重係数記憶２２と標準パタン荷重係数記憶
２４にはそれぞれ(11)式及び(12)式で定義される入力
パタン荷重係数α(i)と標準パタン荷重係数β(j)と
が記憶されているものとする。最初に、リセツト信号ｒが１にセツトされ、番
地指定信号ｉとｊはｉ＝１、ｊ＝１とされる。ワ
ーキングメモリ２７は、制御部からの番地信号ｉ
とｊにより２次元的に番地指定され、漸化式値ｇ
（ｉ、ｊ）が記憶されるように構成されている。
前記リセツト信号ｒが１として与えられる間に次
の初期条件・設定が実行される。すなわち、入力
パタンバツフア２１からベクトルa₁が、また標準
パタン記憶２３からベクトルb₁が読み出され距離
計算部２５に送られる。そこではベクトルa₁とb₁
の距離ｄ（１、１）が計算され、漸化式計算部２
６に送られる。入力パタン荷重係数記憶２２から
は入力パタン荷重係数α(1)が、また、標準パタン
荷重係数記憶からは標準パタン荷重係数β(1)がそ
れぞれ読み出され漸化式計算部２６に送られる。
漸化式計算部２６では以上の入力をもとにして、ｇ（１、１）＝ｄ（１、１）×（α(1)＋β(1)）（17）なる初期条件設定をワーキングメモリ２７に行な
う。これは(8)式に対応するものであるが、本発明
の場合には荷重係数α(1)とβ(1)の効果を得るため
に（α(1)＋β(1)）がｄ（１、１）に掛けられてい
る。漸化式計算部２６は第５図のように構成されて
いる。初期条件設定動作においては、標準パタン
荷重係数α(1)と入力パタン荷重係数β(1)とがそれ
ぞれ信号線α，βを経由して入力される。加算回
路２６１では、これらの和（α(1)＋β(1)）が計算
され、信号αβとして乗算回路２６３に送られ
る。乗算回路２６３では信号線ｄを経由して入力
される距離ｄ（１、１）と前記信号αβとの積を
計算することによつて（17）式のｇ（１、１）が
信号x₂として算出される。一方、ゲート回路２６
０はリセツト信号ｒが１になつている間は数値０
なる信号を出力する。このため、加算回路２６６
の出力y₂は前記信号x₂の値に等しくｇ（１、１）
となる。マルチプレクサ２６９はリセツト信号ｒ
が１の間は信号y₂を選択し出力する。このため、
漸化式計算部２６の出力g₀としては(1)式のｇ
（１、１）が得られる。この出力g₀はワーキング
メモリ２７に書込まれる。以上の初期条件設定動作が終了すると、リセツ
ト信号ｒは０にリセツトされ、番地指定信号ｉと
ｊとが順次増加方向に変化する。すなわち、前記
信号ｉとｊは、第１図のｉ−ｊ平面上の各点をす
べてを網羅するようにかつ増加方向にｉ＝Ｉ、ｊ
＝Ｊとなるまで変化させられる。このように、番地指定信号ｉとｊが変化する間
になされる第４図各部の動作を、一般的にｇ
（ｉ、ｊ）が計算される場合について説明する。
番地指定信号ｉ、ｊが指定されると入力パタンバ
ツフア２１からベクトルａ_iが、標準パタン記憶
２３からベクトルｂ_jが出力される。距離計算部
２５ではこれら２個のベクトルａ_iとｂ_jの間の距
離ｄ（ｉ、ｊ）が計算され信号ｄとして出力され
る。入力パタン荷重記憶２２からは荷重係数α(i)
が、標準パタン荷重記憶２４からは荷重係数β(j)
が出力される。第５図の加算回路２６１には荷重
係数α(i)とβ(j)が入力され、それらの和（α(i)＋
β(j)が計算され信号αβとして出力される。乗算
回路２６２には荷重系数α(i)と距離ｄ（ｉ、ｊ）
が入力され、それらの積ｄ（ｉ、ｊ）・β(j)が計
算され信号x₃として出力される。一方、第４図の
バツフア記憶２７に番地指定信号ｉ，ｊが与えら
れると、既に計算され記憶されている漸化式値ｇ
（ｉ−１、ｊ）、ｇ（ｉ−１、ｊ−１）およびｇ
（ｉ、ｊ−１）が信号g₃，g₂およびg₁、としてそ
れぞれ出力される。加算回路２６５には信号x₃と
してｄ（ｉ、ｊ）・β(j)が入力され、信号g₃とし
てｇ（ｉ−１、ｊ）が入力される。従つて、この
加算回路からは信号y₃として、ｇ（ｉ−１、ｊ）
＋ｄ（ｉ、ｊ）・α(i)、すなわち、（15）式右辺カ
ツコ〔〕内の第３式の値が出力される。同様に
して、乗算回路２６３の出力はｄ（ｉ、ｊ）・（α
(i)＋β(j)）となり信号x₂として加算回路２６６に
入力される。このときはリセツト信号ｒは０であ
るので、ゲート回路２６０はその入力であるg₂を
そのまま通過せしめる。したがつて、加算回路２
６６では前記信号x₂とg₂が加算される結果、
（15）式右辺カツコ〔〕内の第２式、すなわ
ち、ｇ（ｉ−１、ｊ−１）＋ｄ（ｉ、ｊ）・（α(i)
＋β(j)）が信号y₂として出力される。同様にし
て、乗算回路２６４および加算回路２６７の働き
によつて信号y₁として（15）式右辺カツコ〔〕
内の第１式ｇ（ｉ、ｊ−１）＋ｄ（ｉ、ｊ）・β(j)
が算出される。これら３種の信号y₁，y₂およびy₃
は比較器２６８によつて比較され、それらのうち
最小値が信号gmとして出力される。このように
して、（15）式の計算が実行され漸化式値ｇ
（ｉ、ｊ）が信号gmとして算出される。このと
き、リセツト信号ｒは０であるので、マルチプレ
クサ２６９では入力信号gmが選択され、信号g₀
として第４図のワーキングメモリ２７に送られ漸
化式値ｇ（ｉ、ｊ）として記憶される。以上のようにして、漸化式値の計算はｉ＝Ｉ、
ｊ＝Ｊとして最後の漸化式値ｇ（Ｉ、Ｊ）が算出
されるまで実行される。この漸化式値９（Ｉ、
Ｊ）が算出されたとき、制御部２０からパルス信
号ｅが発生され、これによつて次のような動作が
実行される。第４図の入力パタン荷重係数記憶２２と標準パ
タン荷重係数記憶２３には（16）式に現われる荷
重係数和 The present invention relates to an improvement in a pattern matching device which is a main component of a pattern recognition system for speech, etc. The patterns handled by the pattern matching device according to the present invention are not limited in any way as long as they are expressed as a time series of feature values, but below, we will focus on speech patterns as a particularly representative example. The pattern matching device will be explained. Speech recognition devices, which are devices that automatically recognize voice patterns and convert them into machine codes, have a wide range of applications as highly efficient and easy-to-use data input devices of the new era. Pattern matching is widely used as the most reliable and simplest method for automatically recognizing speech patterns. In this method, a standard pattern (hereinafter referred to as standard pattern) is prepared for each word to be recognized.
When an unknown pattern (hereinafter referred to as an input pattern) is given, this input pattern is determined by comparing it with the standard pattern (i.e., pattern matching) and determining the standard pattern that provides the best match. It is determined that the pattern is the same word as the standard pattern. In this pattern matching method, it is important to select a distance measure for comparing the input pattern and the standard pattern. That is, even when the same word is uttered, the movement of the vocal tract system is not necessarily constant at each time, so the speech pattern undergoes complex deformation. Only by performing pattern matching using a scale that is not easily affected by this deformation can highly accurate speech recognition processing be achieved. The most reliable pattern matching method (or pattern matching device that performs it) that has been published so far is the publication published in 1974, "Acoustical Society of Japan Speech Study Group Material S74. −30
This is described in a paper titled ``Real-time speech recognition system using a minicomputer and DP processor'' (hereinafter referred to as cited document (1)). In this method, an audio signal is processed by being converted into a time series of vectors (hereinafter referred to as feature vectors) representing audio characteristics at each point in time through spectrum analysis and time sampling. In the following, a pattern expressed as a time series of feature vectors in this manner will be referred to as a speech pattern. According to the method described in the above-mentioned cited document (1), it is possible to perform a stable matching operation against nonlinear expansion and contraction of the time axis caused by fluctuations in speaking rate.
As a result, highly accurate recognition results are said to be obtained. However, the method in cited document (1) above mainly assumes that the input pattern and the standard pattern to be compared are time-sampled at a sufficiently short and constant sampling period (for example, about 10 ms). It can be said that it was devised. On the other hand, it is known that the speech patterns obtained by such time sampling are extremely redundant. In other words, for stationary phonemes like vowels, the spectrum is almost constant.
It may continue for more than 100ms. Therefore, if time sampling is performed as described above, the same phoneme will be divided into 10
Points or more are also time sampled. For purposes of speech recognition, it is sufficient to obtain one time sampling point for one phoneme, so such multiple time sampling is redundant. However, in reality, it is difficult to perform time sampling in which only one hour sampling point is selected from one phoneme.
As described in a paper entitled "Real-time word speech recognition system using microprocessor" published in June 1953 (hereinafter referred to as cited document (2)), the steady part fluctuates roughly. A method of densely time-sampling is used for the parts that are According to such time sampling, the number of sample points in the stationary portion is reduced, so that it is possible to obtain a speech pattern with low redundancy as a whole. Hereinafter, such time sampling will be referred to as nonuniform interval sampling. The matching algorithm described in the cited document (1) was devised on the premise that the time axis is evenly sampled. Naturally, in the case of non-uniform time sampling, a matching method suitable for that must be adopted. In the cited document (2), the following matching method is proposed. In other words, two voice patterns A and B are expressed as A=a ₁ , a ₂ , ...a _i , ..., a _I B=b ₁ , b ₂ , ...b _j , ...b _J (1) show. Here, a _i is the spectrum giving the spectral feature at the i-th time sample point t(i) (2) of speech pattern A, and b _j is the spectrum giving the spectral feature at the j-th time sample point τ(j) of speech pattern B. It is a vector indicating the characteristics. In FIG. 1, these two voice patterns A and B are shown expanded on the i-axis and the j-axis, respectively. This i
A point (i, j) on the −j plane indicates a combination of feature vector a _i and feature vector b _j . In the method described in the cited document (1), the distance between a _i and b _j (hereinafter referred to as vector distance) is calculated for each of these points (i, j) as follows: d(i, j) = a _i − I am thinking of b _j (3). From point (1, 1) to point (I, J)
(I=8, J=6 in Figure 1)
Figure reference number 1) (i(1), j(1)), (i(2), j(2)), ... (i(k), j(k)), ..., (i(k) ), j(k)) (4). However, i(1)=1, j(1)=1, i(k)
= I, j(k) = J. By minimizing the sum of distances d (i(k), j(k))) on this point sequence, the time axis fluctuations between these two patterns are corrected, and the distance between both patterns (hereinafter referred to as pattern distance) is call). in particular The pattern distance between voice patterns A and B is defined by . Here, N is called a normalization coefficient and is defined as N=I+J (6). Also, W(k) is a loading coefficient, W(k)=(i(k)-i(k-1))+(j(k)-j(k-
1)) (7) The above equation (5) is calculated by the following dynamic programming operation. Initial condition g (1, 1) = d (1, 1) (8) Recurrence formula Pattern distance D (A, B) = 1/Ng (I, J) (10) The method described in intersection (1) above is, for example,
“IEEE Transactions on Acoustics, Speech;
and Signal Processing, Vol.ASSP−26, No.1,
(February 1978)” in “Dynamic Programming
Algorithm Optimization for spoken word
This method is applied when audio patterns A and B are both time-sampled at a common fixed period, as described on page 44 of the paper entitled "Recognition" (hereinafter referred to as cited document (3)). Therefore, it is inappropriate when targeting speech patterns sampled at irregular intervals, as envisaged in the present invention. A matching algorithm devised to handle unevenly sampled speech patterns is described.According to it,
For each vector a _i, the difference between the time t(i) at which the vector was sampled and the time t(i-1) at which the immediately preceding vector ai-1 was sampled α(i) = t Let (i)−t(i−1) (11) be the weighting factor, and similarly for each vector b _j , let β(j)=τ(j)−τ(j−1) (12) be the weighting factor. Calculate it. And in the matching process, instead of the above equation (9), Perform the recurrence formula operation. Further, in parallel with this calculation, a normalization coefficient N is calculated using the following recurrence formula. Here, it is assumed that the value in brackets [ ] is selected according to which of the three values in brackets [ ] in equation (13) is selected. In equation (13), on the right side (α(i) + β(j))/2
It is said that the purpose of the load is as follows. That is, for example, by multiplying coarsely sampled parts such as vectors a(4) and b(4) in Figure 1 by a large load (α(4)+β(4))/2, The purpose of this is to restore the proportion of the time interval on the original time axis. However, when comparing the two types of point sequence 2 and point sequence 3 shown in Figure 2, it can be seen that there is a serious disadvantage in the use of the loading coefficient described in this cited document (2). . In other words, on the point sequence 2, the weighting coefficient (α(i) + β
(i))/2, N(I, J) = (1+1)/2+(1+2)/2 +(1+2)/2(1+2)/2+(3+2)/
2 + (3 + 2) / 2 + (3 + 1) / 2 + (3 +
2)/2 +(3+2)/2+(1+2)/2+(2+
2)/2 + (1+2)/2=22.5. Doing the same thing on the point sequence 3, N (I, J) = (1 + 1) / 2 + (1 + 1) / 2 + (1 + 1) / 2 + (3 + 1) / 2 + (1 +
1)/2 +(2+1)/2+(1+1)/2+(1+
2)/2 +(1+2)/2+(1+1)/2+(1+
2)/2 + (1+2)/2 = 15.5. These load coefficients are d on the right side of equation (13).
(i, j), so equation (13) becomes g
When the calculation up to (I, J) is completed, the point sequence 2
There is a tendency for the value of g(I, J) to be smaller when passing through point sequence 3 than when passing through . As a result, even though point sequence 2 in FIG. 2 is the optimal point sequence for time axis fluctuation correction, point sequence 3 is selected due to the influence of the above-mentioned loading coefficient, resulting in incorrect time axis correction (i.e., This means that the method in document (2) has the disadvantage that normalization) is performed. In addition, in literature (2), the pattern distance between speech patterns A and B is D (A, B) = g (I, J) / N (I,
J). In order to calculate this normalization coefficient N(I, J), it is necessary to calculate the recurrence formula of equation (14) at each point in the i-j plane in Figure 1, but this requires extra calculations. Is required. SUMMARY OF THE INVENTION An object of the present invention is to provide a pattern matching device that eliminates the above-mentioned drawbacks of conventional devices and can perform accurate time axis normalization with a minimum amount of calculation. The device of _the present invention has an input pattern buffer, a vector b _j
standard pattern storage for holding the standard pattern B expressed as a time series of input pattern A, input pattern weight storage for storing the weight coefficient α(i) for each vector a _i of the input pattern A, and storage for the standard pattern A; B
Standard pattern weighting factor storage for storing the weighting factor β(j) for each vector b _j of , the distance d(i,j) between said vector a _i and said vector b _j
The following three types of values (1) are calculated as the recurrence formula value g(i, j) using time i and j as indexes. The distance d is added to the formula value g(i, j-1)
(i, j) and the product of the weighting coefficient β(j), (2) the recurrence formula value g(i-1, j-1) and the distance d
(i, j) and the sum of the weighting coefficients (α(i)+β(j))
(3) The sum of the products of (3) the recurrence formula value g(i-1, j) and the distance d
It is comprised of a recurrence formula calculation unit having a function of selectively outputting the maximum value of the sum of the products of (i, j) and the weighting coefficient α(i). The above recurrence formula can be written down mathematically as follows. The coefficients loaded by this recurrence formula are as shown in FIG. In other words, when the i-coordinate increases, the weighting coefficient α(i) becomes β(j) on the path where the j-coordinate increases.
However, on the path where both i-coordinates increase (α
(i)+β(j)) are respectively loaded. According to such a load coefficient, the inconvenience that occurred in the case of FIG. 2 in the cited document (2) can be solved. That is,
Regardless of whether point sequence 2 or point sequence 3 in FIG. 2 is selected, the total sum of loads will be a constant value of 20. Generally, the sum of the load factors is becomes. Therefore, the tendency that a specific point sequence is selected due to the influence of the loading coefficient has been eliminated. Also, since the normalization coefficient N is given by equation (16), (14)
There is no need to calculate the recurrence formula many times. Therefore, according to the principles of the present invention, it is possible to realize a simple pattern matching device that has accurate time axis normalization capabilities. The configuration of the present invention will be explained in detail below with reference to the drawings from FIG. 4 onwards. FIG. 4 is a diagram showing an embodiment of the present invention. Before the pattern matching operation described below is started, the input pattern buffer 21 and the standard pattern memory 23 respectively store an input pattern A and a standard pattern B as shown in equation (1), and also store an input pattern weight coefficient. It is assumed that input pattern loading coefficient α(i) and standard pattern loading coefficient β(j) defined by equations (11) and (12) are stored in 22 and standard pattern loading coefficient memory 24, respectively. . First, the reset signal r is set to 1, and the addressing signals i and j are set to i=1, j=1. The working memory 27 receives an address signal i from the control section.
The address is specified two-dimensionally by and j, and the recurrence formula value g
(i, j) is stored.
While the reset signal r is applied as 1, the following initial conditions and settings are executed. That is, vector a ₁ is read from the input pattern buffer 21 and vector b ₁ is read from the standard pattern memory 23 and sent to the distance calculation section 25 . There the vectors a ₁ and b ₁
The distance d(1, 1) is calculated, and the recurrence formula calculation unit 2
Sent to 6. The input pattern weight coefficient α(1) is read from the input pattern weight coefficient memory 22, and the standard pattern weight coefficient β(1) is read from the standard pattern weight coefficient memory and sent to the recurrence formula calculation unit 26.
Based on the above input, the recurrence formula calculation unit 26 stores the initial condition settings as g(1,1)=d(1,1)×(α(1)+β(1))(17) in working memory. It will be held on the 27th. This corresponds to equation (8), but in the case of the present invention, (α(1)+β(1)) is changed to d( 1, 1). The recurrence formula calculation section 26 is configured as shown in FIG. In the initial condition setting operation, standard pattern weight coefficient α(1) and input pattern weight coefficient β(1) are input via signal lines α and β, respectively. The adder circuit 261 calculates the sum (α(1)+β(1)), and sends it to the multiplier circuit 263 as a signal αβ. The multiplication circuit 263 calculates the product of the distance d(1, 1) input via the signal line d and the signal αβ, so that g(1, 1) in equation (17) becomes the signal x ₂ It is calculated as On the other hand, the gate circuit 26
0 is a numerical value of 0 while the reset signal r is set to 1.
Outputs a signal. Therefore, the addition circuit 266
The output y ₂ of is equal to the value of the signal x ₂ g(1, 1)
becomes. The multiplexer 269 receives the reset signal r
While is 1, signal _y2 is selected and output. For this reason,
The output g ₀ of the recurrence formula calculation unit 26 is g in equation (1).
(1, 1) is obtained. This output g ₀ is written to the working memory 27. When the above initial condition setting operation is completed, the reset signal r is reset to 0, and the address designation signals i and j sequentially change in the increasing direction. That is, the signals i and j are arranged so as to cover all the points on the i-j plane in FIG.
= J. In this way, the operations of each part in FIG. 4 that are performed while the address designation signals i and j change are generally expressed as g.
A case where (i, j) is calculated will be explained.
When the address designation signals i and j are designated, the input pattern buffer 21 outputs the vector a _i and the standard pattern memory 23 outputs the vector b _j . The distance calculation unit 25 calculates the distance d(i, j) between these two vectors a _i and b _j and outputs it as a signal d. From the input pattern load memory 22, the load coefficient α(i)
However, from the standard pattern load memory 24, the load coefficient β(j)
is output. Loading coefficients α(i) and β(j) are input to the adder circuit 261 in FIG. 5, and their sum (α(i)+
β(j) is calculated and output as a signal αβ. The multiplier circuit 262 has a load system α(i) and a distance d(i, j).
are input, and their product d(i,j)·β(j) is calculated and output as a signal _x3 . On the other hand, when the address designation signals i and j are given to the buffer memory 27 in FIG.
(i-1, j), g (i-1, j-1) and g
(i, j-1) are output as signals g ₃ , g ₂ and g ₁ , respectively. The adder circuit 265 receives d(i, j) and β(j) as the signal x ₃ and receives g(i-1, j) as the signal g ₃ . Therefore, from this adder circuit, g(i-1, j) is output as signal _y3 .
+d(i,j)·α(i), that is, the value of the third equation in brackets [ ] on the right side of equation (15) is output. Similarly, the output of the multiplication circuit 263 is d(i,j)・(α
(i)+β(j)) and is input to the adder circuit 266 as a signal x ₂ . At this time, since the reset signal r is 0, the gate circuit 260 allows its input g ₂ to pass through as is. Therefore, adder circuit 2
At 66, the signals x ₂ and g ₂ are added, resulting in
The second equation in brackets [ ] on the right side of equation (15), that is, g(i-1, j-1) + d(i, j)・(α(i)
+β(j)) is output as signal _y2 . Similarly, by the functions of the multiplier circuit 264 and the adder circuit 267, the signal y ₁ is converted to the right-hand side of equation (15).
The first equation g(i, j-1) + d(i, j)・β(j)
is calculated. These three signals y ₁ , y ₂ and y ₃
are compared by the comparator 268, and the minimum value among them is output as the signal gm. In this way, the calculation of equation (15) is executed and the recurrence formula value g
(i, j) is calculated as the signal gm. At this time, since the reset signal r is 0, the multiplexer 269 selects the input signal gm, and the signal g ₀
is sent to the working memory 27 in FIG. 4 and stored as the recurrence formula value g(i,j). In the above manner, the recurrence formula value is calculated by i=I,
The process is executed until the final recurrence formula value g(I, J) is calculated with j=J. This recurrence formula value 9(I,
When J) is calculated, a pulse signal e is generated from the control section 20, and the following operations are thereby performed. The input pattern load coefficient memory 22 and the standard pattern load coefficient memory 23 in FIG.

【式】および[expression] and

【式】が予め計算されそれぞれ記憶されている。これら
荷重係数和δαとδβは正規化部２８に入力され
る。正規化部２８は、荷重係数和δαとδβとの
加算を行ない、（16）式の正規化係数Ｎを算出す
る。一方、前記漸化式計算部２６からは信号線g₀
を経由して前記の漸化式値ｇ（Ｉ、Ｊ）が入力さ
れる。この値は先に計算された正規化係数Ｎによ
つて割られ(10)式で与えられるパタン距離が算出さ
れ信号線Ｓに出力される。このようにして、第４
図および第５図を参照してパタンマツチング装置
の構成および動作が説明された。第６図はさらに改良された漸化式計算部の構成
を示す図である。この図には第５図における比較
器２６８の近傍のみが示れている。同一出願人に
よる特許願昭和49年第2418号明細書には、第１図
に示す点列１によつて構成される曲線を平滑にし
た方が、第２図に示すように極端に折れ曲る点列
を許す場合に比べて良好な成績が得られる旨の記
載がある。第６図はこの点を加味して構成されたものであ
る。信号y₁、すなわち、（15）式右辺のカツコ
〔〕内第１式には加算回路２６８１によつて罰
金P₁が加算され、信号y₂、すなわち、上記カツコ
〔〕内の第３式には加算回路２６８３によつて
罰金P₃が加算される。この結果、信号y₁とy₃の値
がy₂に比べて大きくなる傾向が生じる。このた
め、比較回路２６８が信号y₂を最小値として選択
する確率が高くなる。このことは（15）式右辺の
カツコ〔〕内の第２式が選択されることを意味
し、第１図に示す点列１が平滑化される傾向を示
すことになる。なお、構成をより簡便にするため
に第７図に示すように、減算器２６８２によつて
信号y₂から定数P₂を減算する構成を用いてもほぼ
等価な効果が得られる。また、加算減算の代りに
定係数を乗算しても同様な効果が得られる。以上のように、本発明を実施例により説明した
が、これらの記載は本発明の範囲を限定するもの
ではない。特に、第５図においては、加算機能お
よび乗算機能等を空間分割的に実現しているが、
時分割的に構成することも可能である。また、荷
重係数の定義としては上記引用文献(2)に定義され
（本明細書中の(11)式および(12)式に対応）たものに
限定されるものではない。例えば、「日本音響学
会講演論文集（昭和53年５月発行）の第385ペー
ジに“最適短形波近似を用いた可変フレーム音声
分析合成方式”と題して発表された論文」に記載
されているように、音声パタン中から代表ベクト
ルを選び使用する場合には、個々のベクトルが代
表する時間区間長を荷重係数α(i)およびβ(j)とし
て使用することが考えられる。さらに、以上の説明は短離尺度をもとにして与
えられているため（15）式等は最小化問題となつ
ている。これと逆に相関値のように距離とは大小
関係が逆の量を使用する場合には、（15）式等の
最小化処理に代えて最大化処理を行なえばよい。[Formula] are calculated in advance and stored respectively. These weighting coefficient sums δα and δβ are input to the normalization unit 28. The normalization unit 28 adds the weighting coefficient sums δα and δβ to calculate the normalization coefficient N of equation (16). On the other hand, from the recurrence formula calculation unit 26, the signal line g ₀
The recurrence formula value g(I, J) is input via . This value is divided by the normalization coefficient N calculated previously to calculate the pattern distance given by equation (10) and output to the signal line S. In this way, the fourth
The configuration and operation of the pattern matching device have been explained with reference to the figures and FIG. FIG. 6 is a diagram showing the configuration of a further improved recurrence formula calculation section. This figure only shows the vicinity of comparator 268 in FIG. 5. Patent Application No. 2418 of 1972 filed by the same applicant states that if the curve formed by point sequence 1 shown in Fig. 1 is smoothed, it will not be extremely curved as shown in Fig. 2. There is a statement that better results can be obtained compared to when a point sequence is allowed. FIG. 6 is constructed taking this point into consideration. The fine P ₁ is added by the addition circuit 2681 to the signal y ₁ , that is, the first equation in brackets [ ] on the right side of equation (15), and to the signal y ₂ , that is, the third equation in brackets [ ] above. The fine P ₃ is added by the addition circuit 2683. As a result, the values of signals y ₁ and y ₃ tend to be larger than y ₂ . Therefore, the probability that the comparison circuit 268 selects the signal y ₂ as the minimum value increases. This means that the second equation in brackets [ ] on the right side of equation (15) is selected, and indicates that the point sequence 1 shown in FIG. 1 tends to be smoothed. In order to simplify the configuration, a configuration in which a constant P ₂ is subtracted from the signal y ₂ by a subtracter 2682 as shown in FIG. 7 may be used, but a substantially equivalent effect can be obtained. Furthermore, the same effect can be obtained by multiplying by a constant coefficient instead of adding and subtracting. As mentioned above, although the present invention has been explained using Examples, these descriptions do not limit the scope of the present invention. In particular, in Fig. 5, addition functions, multiplication functions, etc. are realized in a space-divided manner.
It is also possible to configure it in a time-division manner. Furthermore, the definition of the load coefficient is not limited to that defined in the cited document (2) (corresponding to equations (11) and (12) in this specification). For example, it is written in the paper titled "Variable frame speech analysis and synthesis method using optimal rectangular wave approximation" on page 385 of the Proceedings of the Acoustical Society of Japan (published in May 1978). When selecting and using a representative vector from a speech pattern as shown in FIG. Furthermore, since the above explanation is given based on the short distance measure, equations (15) etc. are minimization problems. On the other hand, when using a quantity such as a correlation value whose magnitude relationship is opposite to the distance, maximization processing may be performed instead of minimization processing such as equation (15).

[Brief explanation of the drawing]

第１図から第３図は本発明の原理を説明するた
めの図、第４図は本発明の一実施例を示す図、第
５図は本発明の実施例に用いる漸化式計算部を示
す図、および第６図と第７図は漸化式計算部のさ
らに改良された例を示す図である。第４図から第
７図において、２１は入力パタンバツフア、２２
は入力パタン荷重係数記憶、２３は標準パタン記
憶、２４は標準パタン荷重係数記憶、２５は距離
計算部、２６は漸化式計算部、２７はワーキング
メモリ、２８……正規化部、２０は制御部、２６
０はゲート回路、２６２，２６３，２６４は乗算
回路、２６１，２６５，２６６，２６７は加算回
路、２６８は比較器、２６９はマルチプレクサ、
２６８１，２６８３は加算回路、２６８２は減算
回路。 Figures 1 to 3 are diagrams for explaining the principle of the present invention, Figure 4 is a diagram showing an embodiment of the present invention, and Figure 5 is a diagram showing a recurrence formula calculation unit used in the embodiment of the present invention. 6 and 7 are diagrams showing further improved examples of the recurrence formula calculation section. 4 to 7, 21 is an input pattern buffer, 22
23 is an input pattern weight coefficient memory, 23 is a standard pattern memory, 24 is a standard pattern weight coefficient memory, 25 is a distance calculation unit, 26 is a recurrence formula calculation unit, 27 is a working memory, 28 is a normalization unit, 20 is a control unit Part, 26
0 is a gate circuit, 262, 263, 264 are multiplication circuits, 261, 265, 266, 267 are addition circuits, 268 is a comparator, 269 is a multiplexer,
2681 and 2683 are addition circuits, and 2682 is a subtraction circuit.

Claims

[Claims] 1. Input pattern buffer for holding input pattern A expressed as a time series of vectors a _i sampled at irregular intervals, expressed as a time series of vectors b _j sampled at uneven intervals. standard pattern memory for holding standard pattern B;
Input pattern weight coefficient storage for storing a group of weight coefficients α( _i ) corresponding to the time interval length represented by a _i for each vector a i of the previous input pattern A, and each vector b of the standard pattern B. Standard pattern weighting coefficient storage for storing a group of weighting coefficients β(j ₎ corresponding to the time interval length represented by b _j for _j , the distance _d (i, j) for any combination of time i and j, and the following three types of values (1 ) The distance d is added to the recurrence formula value g(i, j-1).
The value obtained by adding the product of (i, j) and the weighting coefficient β(j), (2) the recurrence formula value g(i-1, j-1) and the distance d
(i, j) and the sum of the weighting coefficients (α(i)+β(j))
(3) The value obtained by adding the product of (3) the recurrence formula value g(i-1, j) to the distance d
A recurrence formula calculator having a function of selecting and outputting the minimum value of (i, j) and the product of the weighting coefficient α(i), coefficient group α
1. A pattern matching device comprising: a normalization section for calculating an inter-pattern distance D (A, B) by dividing (i) by the sum of β(j).