JP3973789B2

JP3973789B2 - Element distribution search method, vector quantization method, pattern recognition method, speech recognition method, speech recognition apparatus, and recording medium on which a program for determining a recognition result is recorded

Info

Publication number: JP3973789B2
Application number: JP06236299A
Authority: JP
Inventors: 芳春阿部
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1999-03-09
Filing date: 1999-03-09
Publication date: 2007-09-12
Anticipated expiration: 2019-03-09
Also published as: JP2000261321A

Description

【０００１】
【発明の属する技術分野】
この発明は、例えば、多次元の連続確率分布の演算量等を削減することができる要素分布の探索方法，ベクトル量子化方法，パターン認識方法，音声認識方法，音声認識装置及び認識結果を決定するためのプログラムが記録された記録媒体に関するものである。
【０００２】
【従来の技術】
図２７は混合連続分布ＨＭＭを用いる音声認識装置を示す構成図であり、図において、１０１は入力音声、１０２は入力音声１０１を音声分析して、Ｋ次元の入力特徴ベクトル１０３を出力する音声分析部、１０３は入力特徴ベクトル、１０４は混合連続分布ＨＭＭ１０８で表された音素ＨＭＭの確率演算を実行する確率演算部、１０５は経路演算を実行する経路演算部、１０６は最適な経路を決定する最適経路決定部、１０７は最適な経路を示す認識結果、１０８は混合連続分布ＨＭＭ、１０９は経路データである。
図２８は混合連続分布ＨＭＭの説明図であり、図２９は経路演算の説明図である。
【０００３】
次に動作について説明する。
音声分析部１０２で得られたＫ次元の入力特徴ベクトル１０３について、確率演算部１０４がＭ個のＫ次元正規確率密度関数の混合分布で表された音素ＨＭＭの確率演算を実行し、最適経路決定部１０６が各フレーム毎に単語の接続条件に従って、ビタビ演算を実行して最適な経路を求める。最適な経路を後戻り探索することで単語系列を求める。
時刻ｔの特徴ベクトルｘ（ｔ）に対する確率演算は次式に基づいて行われる。ただし、次式において、λ（ｍ）は第ｍ番目の要素分布の分岐確率を示し、ｆ（ｘ（ｔ），ｍ）は第ｍ番目の要素分布の分岐密度を示す。
【０００４】
【数１】

【０００５】
さらに、要素分布の分岐密度ｆ（ｘ（ｔ），ｍ）は、各次元間に相関のない無相関正規分布は次式のように計算される。
ただし、ｘ（ｔ，ｋ）は入力ベクトルｘ（ｔ）の第ｋ次元目の要素の値、μ（ｍ，ｋ）は第ｍ番目の要素分布の第ｋ次元目の平均値、σ（ｍ，ｋ）は第ｍ番目の要素分布の第ｋ次元目の分散（即ち標準偏差の２乗）である。
なお、一般的な表記では肩に２乗を付すが、本明細書では、肩の２乗を付さずに分散を表す。
【０００６】
【数２】

【０００７】
大語彙の音声認識装置では、音素のＨＭＭを用いる。
しかも、音素には前後の音素に依存した環境依存の音素モデルを用いるため、混合分布に含まれる要素分布の総数はかなり大きくなる。
例えば、環境依存音素モデルの各状態を２０００状態のＨＭＭで表現し、各状態を３２混合分布とすれば、全体で６４０００個の要素分布を有することになる。
【０００８】
これらをフレーム周期の１０ミリ秒毎に計算すると、１秒当り６４０００００個の要素分布計算が必要となり、混合分布計算をリアルタイムで処理するためには、１つの混合分布を約１５６ナノ秒で処理する必要があるが、これは現在の汎用のプロセッサでは実現困難である。
このため、以下に述べるような演算量削減の技術が提案されている。
【０００９】
・予備選択による方法：
混合連続分布ＨＭＭの計算量削減に関して次のような予備選択による方法が提案されている。
【００１０】
岩崎らは、文献（“混合連続分布ＨＭＭを用いた不特定話者連続音声認識のための演算量削減法”、日本音響学会講演論文集、３−５−４、平成３年１０月）で、各音素片ＨＭＭについて単一分布の出力確率により予備選択し、選択された音素片ＨＭＭのみ厳密な混合分布計算を行っている。
【００１１】
Ｂｏｃｃｈｉｅｒｉらは、文献（“Ｖｅｃｔｏｒｑｕａｎｔｉｚａｔｉｏｎｆｏｒｔｈｅｅｆｆｉｃｉｅｎｔｃｏｍｐｕｔａｔｉｏｎｏｆｃｏｎｔｉｎｕｏｕｓｄｅｎｓｉｔｙｌｉｋｅｌｉｈｏｏｄｓ”，ＩＣＡＳＳＰ９３予稿集，ＩＩ−６９２頁〜ＩＩ−６９５頁）で、ガウス分布の平均ベクトルを予めベクトル量子化して、入力ベクトルとこの量子化ベクトルとの距離情報を用いて混合ガウス分布のクラスタを予備選択した後、出力確率を計算した。
【００１２】
Ｄｉｇａｌａｋｉｓらは、文献（“Ｇｅｎｏｎｅｓ：ＧｅｎｅｒａｌｉｚｅｄＭｉｘｔｕｒｅＴｙｉｎｇｉｎＣｏｎｔｉｎｕｏｕｓＨｉｄｄｅｎＭａｒｋｏｖＭｏｄｅｌ−ＢａｓｅｄＳｐｅｅｃｈＲｅｃｏｇｎｉｚｅｒｓ”，ＩＥＥＥＴｒａｎｓａｃｔｉｏｎｓｏｎＳｐｅｅｃｈａｎｄａｕｄｉｏｐｒｏｃｅｓｓｉｎｇ，ｖｏｌ．４，ｎｏ．４，ｊｕｌｙ１９９６，２８１頁〜２８９頁）で、上記Ｂｏｃｃｈｉｅｒｉらの方法を一般化音素ＨＭＭに用いる音声認識システムに適用して、それぞれの一般化音素ＨＭＭについて混合ガウス分布のクラスタを予備選択して、出力確率を計算した。
【００１３】
渡辺らは、文献（“木構造確率分布を用いた音声認識”、日本音響学会講演論文集、１−８−７、平成５年１０月、及び、特開平６−３４８２９２号公報）で、ガウス分布のクラスタの予備選択及び混合ガウス分布の出力確率計算において木構造を導入した。木構造の作成にはトップダウンクラスタリングを適用している。
【００１４】
小森らは、文献（“ＲｏｕｇｈＨＭＭとＤｅｔａｉｌＨＭＭを用いた連続ＨＭＭ出力確率計算の高速化”、日本音響学会講演論文集、１−Ｑ−２０、平成７年３月）で、認識に貢献度の高そうなＨＭＭの状態を少数分布のＨＭＭを用いて予備選択後、多数分布の出力確率を再計算している。
【００１５】
中川らは、文献（“連続出力分布型ＨＭＭの出力確率計算の短縮法”、日本音響学会講演論文集、１−Ｑ−２２、平成７年３月）で、すべてのコードベクトルに対して出力確率を事前に計算してテーブル化しておき、入力ベクトルに一番近いコードベクトルに対応する出力確率の値を表引きし、必要に応じて確率の大きい場合のみ、従来の方法で再計算して出力確率を求めている。
【００１６】
以上の何れの方法も、実際に計算する総分布数の削減を行っている。しかし、上記の予備選択による方法には次のような問題がある。
岩崎らの方法は、混合分布の他に、混合分布をカバーする単一分布を用意する必要がある。
渡辺らの方法は、木の根に近い分布が、木の葉に近い分布をカバーするように、要素分布が木構造となるように設計されている必要がある。このため、任意の混合分布を有する既存の混合分布の演算削減には適用できない。
【００１７】
小森らの方法は、従来の混合分布のための記憶領域の他に、精度の荒い混合分布の記憶領域を用意するとともに、この精度の荒い混合分布の演算が前処理として必要である。
Ｂｏｃｃｈｉｅｒｉ，Ｄｉｇａｌａｋｉｓ及び中川らの方法は、ベクトル量子化のためのコードブックのための記憶領域と、入力ベクトルに対して最も距離の近いコードを決定するためのベクトル量子化演算が別途に必要である。
【００１８】
ところで、ベクトル量子化は複数の信号を個々に量子化せず、一括して一つの符合で表現する多次元信号量子化手法として、音声や画像の高能率符号化法として注目を集めている。
ベクトル量子化は入力ベクトルに最も近いコードベクトルを求め、そのコードで入力ベクトルを代表させる方法である。したがって、入力ベクトルと全コードベクトルとの距離演算が必要である。
ベクトル量子化による符号化誤差を小さくするため、コードブックの規模及びコードベクトルの探索時間が指数関数的に増大し、ハードウェア実現が困難になる。
【００１９】
ベクトル量子化の演算量を削減する手法として、次の３つの方法がある。
第１の方法は、２進木探索が可能なようにコードベクトルを設計し、入力ベクトルに対して距離最小のコードベクトルを２進木による探索により求めるものである。
【００２０】
渡辺らは、文献（“主成分分析を用いた２進木探索コードブックの設計法”、音声研究会資料ＳＰ８７−１２９、１９８７年２月）で、２進木探索が可能なコードブックの設計法を示している。２進木探索によれば、入力ベクトルと比較するコードベクトルの数は、ｌｏｇ₂ （Ｎ）のオーダとなり、比較回数を大幅に削減できる。
【００２１】
第２の方法は、任意の構造を持ったコードベクトルの集合について、２進木探索を行うものである。
Ｃｈｅｎｇらは、文献（“ＡＦａｓｔｃｏｄｅｂｏｏｋｓｅａｒｃｈａｌｇｏｒｉｔｈｍｆｏｒｎｅａｒｅｓｔ−ｎｅｉｇｈｂｏｒｐａｔｔｅｒｎｍａｔｃｈｉｎｇ”、ＩＣＡＳＳＰ８６予稿集、２６５頁〜２６８頁、１９８６年）で、任意のコードベクトルからなるコードベクトル集合について、コードベクトルの全体集合から始めて、超平面によるコードベクトル集合の２分割処理を、要素一つからなるコードベクトル集合が得られるまで２進木にしたがって順次繰り返し、入力ベクトルに最も近いコードベクトルを探索する方法を提案している。
【００２２】
ノードｎに付随する超平面は、入力ベクトルｘ（ｔ）に対する係数ベクトルｕ（ｎ）と定数ベクトルｃ（ｎ）とで記憶されている。
入力ベクトルｘ（ｔ）に対して、各ノードで次の関係式を計算し、関係（Ａ）が満足されれば、左側の枝のノードに進み、入力ベクトルの最近隣コードは左側の部分集合Ω_L に属するとする。また、関係（Ｂ）が満足されれば、右側の枝のノードに進み、入力ベクトルの最近隣コードは右側の部分集合Ω_R に属するとする。
【００２３】
【数３】

【００２４】
第３の方法は、一般の探索手法を用いるものである。
丸田らは、文献（“文字認識における距離計算の高速化の検討”、１９９７年電子情報通信学会総合大会予稿集、３１０頁）で、特徴空間を探索しながら最小距離を与えるカテゴリ（コードベクトルに相当）を検出している。
各カテゴリはテンプレートベクトルとして与えられる。
この方法では、何らかの方法で与えられた初期カテゴリを元に、カテゴリ毎に予め作成しておいた近傍のカテゴリリストを参照し、現在のカテゴリの近傍カテゴリについて、その中での最小距離を与えるカテゴリを選択し、このカテゴリを新たに種のカテゴリと設定して、再度探索を繰り返すものである。
【００２５】
以上のベクトル量子化の方法には次のような問題がある。
第１の方法では、２進木探索が可能なコードベクトルの設計が必要である。
第２の方法では、多次元空間を超平面で次々に分割して行くものであるが、超平面で多次元空間をうまく均等に分割することは一般には難しく、最終的に１個の要素分布にたどり着くには多くの判定が必要になり、演算量の削減の効果はほとんどない。また、判定のための情報を記憶するため、かなりのメモリを必要とする。
第３の方法では、初期ベクトルを与える方法が開示されていない。また、探索の結果がローカルミニマムに陥ることがあり、最適な解である保証はない。このため、ベクトル量子化の精度の劣化が生じる可能性がある。また、パターン認識に用いたときは認識精度劣化の可能性がある。
【００２６】
・共有化による混合分布演算の削減方法：
混合分布又はベクトル量子化の演算量削減の方法としては、上記の「予備選択による方法」の他に、ＨＭＭの部分を共通化して、演算量を削減する方法がある。
【００２７】
小森らは、文献（特開平７−２６１７８８号公報）で、認識時の演算量を減らすため、あるクラス内において、複数のＨＭＭのうち、そのＨＭＭの関連情報が同値であるＨＭＭを用いてクラス毎に音声認識する方法を示している。
高橋らは、文献（“４階層共有構造音素モデルにおける分散値共有化の効果”、日本音響学会講演論文集、１−Ｑ−２３、平成７年３月）及び文献（特開平８−２４８９８６号公報）で、ＨＭＭの各状態の多次元正規分布における各次元に存在する正規分布中の、平均値及び分散値が共に類似するものを共通化することにより、ＨＭＭの混合分布の演算量を削減する方法を示している。
【００２８】
上記のＨＭＭの部分の共有化による演算量削減の方法には次のような問題がある。
共有化を進めると演算時間は減少するものの、ＨＭＭの分解能が低下し、認識精度が低下する。
また、共有化された後のＨＭＭについてはすべて演算を行う必要があり、演算量の削減には限界がある。
【００２９】
以上、ＨＭＭの混合分布の演算量削減に関する従来の技術について説明した。
次に、経路探索の演算量削減に関する従来の技術について説明する。
経路探索の演算量削減法としては、音声のフレームに同期して仮説の数を制限するビーム探索して音声認識する方法がある。
また、１パス目では制約の緩い経路にそって経路スコアの推定値を求め、２パス目ではこの経路スコアの推定値に基づいて、最適な経路の探索を行うマルチパスの音声認識の方法がある。
【００３０】
野田らは、文献（“前向きヒューリスティック関数を用いたビーム探索によるＨＭＭ―ＬＲ）音声認識の検討”、日本音響学会講演論文集、３−８−１６、平成６年１０月）で、ビーム探索において、前向きヒューリスティクスを導入し、経路探索の演算を削減している。
【００３１】
加藤らは、文献（“連続分布ＨＭＭのよる単語音声認識のＶｉｔｅｒｂｉｂｅｓｔ−ｆｉｒｓｔサーチにおける推定スコア設定法の検討”、日本音響学会講演論文集、３−８−１５、平成６年１０月）で、マルチパスの音声認識において、１パス目では、混合分布の最大分岐密度及び最大経路スコアを推定スコアとして求め、２パス目では、ベストファースト探索を行うことにより、経路探索に要する演算量を削減している。
【００３２】
１パス目で求める最大経路スコアとしては、音素内、単語内、全単語内で求める方法が示されている。
なお、ベストファースト探索は、仮説（探索の途中であるので部分仮説と呼ばれる）を展開する際に、現在のスコアと将来のスコアの推定値に基づいて評価値を計算し、この評価値が最大の仮説を優先的に展開し、探索を進める方法である。
【００３３】
ここで、スコアの推定値が真のスコアに比較して大きいか、等しいとき、Ａ＊条件を満足すると言われる。ベストファースト探索において、Ａ＊条件を満足する推定値を用いる探索をＡ＊探索という。
Ａ＊探索では、探索の最初に見つかる解がフル探索での最適解と一致することが知られている。
【００３４】
上記の方法では次のような問題がある。
野田らの方法では、ビーム幅を一定値以下に下げると精度が劣化する。
加藤らの方法では、Ａ＊探索の条件が満足されるので、精度劣化はない。しかしながら、１回目のパスで、推定スコアを求める際に、混合分布のなかで最大分岐密度を有する要素分布を求める必要がある。実際には、大語彙の音声認識に適用する場合、このための演算量は無視できないものとなる。
【００３５】
【発明が解決しようとする課題】
従来の音声認識装置は以上のように構成されているので、前処理としてベクトル量子化の演算を必要とし、また、混合分布として木構造化された分布を用いる必要があり、さらに、予備選択のために別の混合分布が必要であるため、演算量が膨大になり、音声認識処理に長時間を要する課題があった。
また、共有化による認識精度が低下する課題があり、また、マルチパスの音声認識処理において、推定スコアの演算量が無視できないほど大きくなる課題もあった。
【００３６】
この発明は上記のような課題を解決するためになされたもので、演算量を削減することができる要素分布の探索方法，ベクトル量子化方法，パターン認識方法，音声認識方法，音声認識装置及び認識結果を決定するためのプログラムが記録された記録媒体を得ることを目的とする。
【００３７】
【課題を解決するための手段】
この発明に係る要素分布の探索方法は、展開済の次元数が入力ベクトルの次元数に一致するまで、次元方向にベストファースト探索を実行するようにしたものである。
【００３８】
この発明に係る要素分布の探索方法は、展開済の次元数が増加する方向にベストファースト探索を実行するようにしたものである。
【００３９】
この発明に係る要素分布の探索方法は、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、自由度Ｋ−ｋのカイ２乗分布の分布関数値が所定値に達するときのカイ２乗分布の独立変数値に基づいて計算するようにしたものである。
【００４０】
この発明に係る要素分布の探索方法は、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、学習用の音声データが示す値の分布に基づいて計算するようにしたものである。
【００４１】
この発明に係る要素分布の探索方法は、要素分布を複数のクラスタに分割し、各クラスタについて、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、学習用の音声データが示す値の分布に基づいて計算するようにしたものである。
【００４２】
この発明に係る要素分布の探索方法は、ベストファースト探索で最初から複数個の解を求めて、その複数個の解の中で、上位複数個の評価値を有する要素分布を求めるようにしたものである。
【００４３】
この発明に係る要素分布の探索方法は、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、各要素分布における各次元の一次元の分布を各分布の平均値の数直線上での配列に基づいて複数の区間に分割し、各区間について求めた各次元毎の不等式と、入力ベクトルの各次元の数値とに基づいて計算するようにしたものである。
【００４４】
この発明に係る要素分布の探索方法は、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、要素分布を複数のクラスタに分割し、複数のクラスタのそれぞれについて各次元毎に得られた不等式と、入力ベクトルの各次元の数値とに基づいて計算するようにしたものである。
【００４５】
この発明に係る要素分布の探索方法は、探索の閾値を設けて、第ｋ次元までの探索の累積値又は評価値が、その閾値を下回る場合、当該仮説の探索を打ち切るようにしたものである。
【００４６】
この発明に係る要素分布の探索方法は、要素分布に依存して探索の閾値を設けて、第ｋ次元までの探索の累積値又は評価値が、その閾値を下回る場合、当該仮説の探索を打ち切るようにしたものである。
【００４７】
この発明に係る要素分布の探索方法は、探索の閾値として、入力ベクトルの次元数をＫとし、自由度Ｋ−ｋのカイ２乗分布の分布関数が一定の数値に到達するときの独立変数の値に基づいて計算した値を用いるようにしたものである。
【００４８】
この発明に係る要素分布の探索方法は、展開済の次元数を増加させる際に２以上次元数を増加させるベストファースト探索を実行するようにしたものである。
【００５１】
この発明に係る音声認識方法は、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、評価値最大をとる要素確率分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素確率分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するようにしたものである。
【００５２】
この発明に係る音声認識方法は、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の分岐確率密度のベストファースト探索を次元方向に実行し、評価値の大きい方からＮ個の分岐密度をとる要素確率分布の確率はベストファースト探索の結果として計算された確率の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するようにしたものである。
【００５３】
この発明に係る音声認識方法は、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、その評価値最大の要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するようにしたものである。
【００５４】
この発明に係る音声認識方法は、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の分岐確率密度のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の分岐密度をとる要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するようにしたものである。
【００５５】
この発明に係る音声認識装置は、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を行うようにしたものである。
【００５６】
この発明に係る音声認識装置は、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の要素分布のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を行うようにしたものである。
【００５７】
この発明に係る音声認識装置は、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、その評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するようにしたものである。
【００５８】
この発明に係る音声認識装置は、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の要素分布のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するようにしたものである。
【００５９】
この発明に係る認識結果を決定するためのプログラムが記録された記録媒体は、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布のモデルで表現された認識単位のスコアを評価値最大の要素分布のベストファースト探索を次元方向に実行し、その評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された要素分布の確率を用い、その他の要素確率分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行し、そのスコア計算の結果に基づいて、認識単位の系列の中でスコアの最も高い認識単位の系列を選択するようにしたものである。
【００６０】
この発明に係る認識結果を決定するためのプログラムが記録された記録媒体は、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布のモデルで表現された認識単位のスコアを評価値の大きい方からＮ個の要素分布のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の評価値をとる要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行し、そのスコア計算の結果に基づいて認識単位の系列の中でスコアの最も高い認識単位の系列を選択するようにしたものである。
【００６１】
この発明に係る認識結果を決定するためのプログラムが記録された記録媒体は、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布に基づいて認識単位のスコアの推定値を計算するスコア計算し、１回目は認識単位の系列の中で経路の制約を緩めたスコアの推定値を求め、２回目は１回目で求めたスコアの推定値に基づいて認識単位の完全な系列の中で最もスコアの高い認識単位の系列を探索して求め、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、その多次元の入力ベクトルに対して評価値最大の分岐確率密度のベストファースト探索を次元方向に実行し、その評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して、認識結果を決定するようにしたものである。
【００６２】
この発明に係る認識結果を決定するためのプログラムが記録された記録媒体は、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布に基づいて認識単位のスコアを計算し、１回目は認識単位の系列の中で経路の制約を緩めたスコアの推定値を求め、２回目は１回目で求めたスコアの推定値に基づいて認識単位の完全な系列の中で最もスコアの高い認識単位の系列を探索して求め、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、その多次元の入力ベクトルに対して、評価値の大きい方からＮ個の分岐確率密度のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して、認識結果を決定するようにしたものである。
【００６３】
【発明の実施の形態】
以下、この発明の実施の一形態を説明する。
実施の形態１．
図１はこの発明の実施の形態１による音声認識装置を示す構成図であり、図において、１は入力音声、２は入力音声１を分析して特徴ベクトル３に変換する音声分析手段、４は特徴ベクトル３に対して、ＨＭＭ記憶手段８に記憶された混合連続分布ＨＭＭの状態ＨＭＭの確率を計算する確率演算手段（スコア計算手段）、５は経路データ記憶手段９の経路データに従って、状態ＨＭＭの連結から音素や単語を構成し、経路の各状態において、その状態に至る状態ＨＭＭの系列の中でスコアが最大の経路を決定し、そのスコアを求める経路演算手段（スコア計算手段）、６は経路演算手段５の決定した経路を音声の終端から後ろ向きに探索して最適な経路に沿う状態ＨＭＭの系列を求めて、入力音声１に対する認識結果７を出力する最適経路決定手段（認識結果選択手段）である。
【００６４】
次に動作について説明する。
音声分析の周期は１０ｍｓであり、特徴ベクトル３は、分析次数１２のメル周波数軸ケプストラム分析の結果得られたｃ（０），ｃ（１），ｃ（２），…，ｃ（１２）と、それらの時間変化率であるΔｃ（０），Δｃ（１），Δｃ（２），…，Δｃ（１２）とをベクトルの成分とした２６次元のベクトルである。
以後特徴ベクトル３の次元をＫとする。
【００６５】
ＨＭＭ記憶手段８には状態ＨＭＭが記憶されている。図２は状態ＨＭＭの構造を示す。状態ＨＭＭは、２状態１ループの構造を持つ。
状態遷移に付随する出力確率分布は、複数（本実施の形態では１６個）の連続分布を要素分布とした混合連続分布となっている。
また、各状態遷移に付随する出力確率分布は、いわゆる結びの関係を持たせ共通なものとしてある。
【００６６】
状態ＨＭＭは、全体で２０００個ある。音素ＨＭＭは、音素環境依存型であり、これらの状態ＨＭＭのうちの何れかを３つ連結し、３状態のＨＭＭとして構成される。
単語ＨＭＭは、さらにこのようにして構成される３状態の音素ＨＭＭを単語の音素表記に従って音素数分連結したものとして構成される。
【００６７】
時刻ｔの入力ベクトルｘ（ｔ）に対するある状態ＨＭＭの出力確率は、厳密には、次式で求められる。ただし、Ｍ（ｊ）は第ｊ番目の状態の混合連続分布に含まれる要素分布の個数であり、本実施の形態では全状態とも１６である。
以下、状態ｊの混合分布を説明対象とするので、インデックスｊを省略して、Ｍ（ｊ）をＭ、λ（ｊ，ｍ）をλ（ｍ）などと略記する。
【００６８】
【数４】

【００６９】
また、ｆ（ｘ（ｔ），ｍ）は第ｍ番目の要素分布の分岐密度（連続分布の密度関数）であり、無相関正規分布である本実施の形態では、次式で与えられる。
ただし、ｘ（ｔ，ｋ）は入力ベクトルｘ（ｔ）の第ｋ次元目の要素の値、μ（ｍ，ｋ）は第ｍ番目の要素分布の第ｋ次元目の平均値、σ（ｍ，ｋ）は第ｍ番目の要素分布の第ｋ次元目の分散である。
【００７０】
【数５】

【００７１】
上記の出力確率は、分岐密度の和として計算されるが、本実施の形態では、この出力確率計算の近似式として、次式のように、分岐確率λ（ｍ）と分岐密度ｆ（ｘ（ｔ），ｍ）の積λ（ｍ）ｆ（ｘ（ｔ），ｍ）（以後、分岐出力確率と呼ぶ）の最大値を用いている。
この近似式による出力確率（以後、最大分岐出力確率と呼ぶ）を用いたときの精度低下は僅かであることは認識実験により確認されている。
【００７２】
【数６】

【００７３】
本実施の形態では、最大化出力確率ｂ‘（ｘ（ｔ））を以下のベストファースト探索を適用して計算する。
【００７４】
以下、この方法について説明する。
まず、分岐出力確率λ（ｍ）ｆ（ｘ（ｔ），ｍ）の対数をとると、式（１）から次のように変形できる。
【００７５】
【数７】

【００７６】
ここに、Ａ（ｍ）及びＢ（ｍ）は入力ベクトルに依存しないので、事前に計算しておく。対数関数の単調増加性から、最大化出力確率ｂ‘（ｘ（ｔ））を求めることと、式（２）のｍに関する最大値を求めることとは等価である。
よって、式（２）を最大とする要素分布を求められれば、式（１）はその値の指数関数（ｅｘｐ（ｘ））の値として計算することができる。
【００７７】
次に式（２）を最大とする要素分布を求める探索方法について説明する。
図３はこの探索で用いる機能の構成図である。仮説展開制御部２００１は仮説記憶部（スタックとも呼ぶ）２００２に記憶された仮説の取り出し、展開処理を行う。
【００７８】
仮説記憶部２００２に記憶される仮説は、それぞれ、分布の番号ｍ、計算済の次元数ｋ、第１次元から第ｋ次元までの各次元の一次元の分布の確率の累積値ｇ（ｍ，ｋ）の数値を有する。
累積値ｇ（ｍ，ｋ）は、式（２）のＣ（ｘ（ｔ），ｍ，ｋ）のＫをｋとしたときの値であり、
ｇ（ｍ，ｋ）＝Ａ（ｍ）＋Ｂ（ｍ）＋Ｃ（ｘ（ｔ），ｍ，ｋ）
である。
【００７９】
探索の途中の仮説はスタックに格納される。
スタックは、ヒープと呼ばれる完全２分木として作成することができる。図４はヒープを用いた仮説記憶部の説明図である。ヒープはメモリの１次元配列Ａとして実現される。配列Ａのインデックスを一般にｎと２^* ｎ、２^* ｎ＋１とすると、Ａ［ｎ］≧ｍａｘ（Ａ［２^* ｎ］，Ａ［２^* ｎ＋１］）の関係を有する。
【００８０】
ヒープで仮説を管理することで、最大値を有する仮説は配列Ａの先頭Ａ［１］に必ずあるため、極めて高速に最大値を取り出すことができる。
また、ヒープに仮説を格納するときは、ヒープの根（即ち配列の要素Ａ［１］）から葉に向かって、２分木を辿って、配列の要素を交換しながら、上記の大小関係Ａ［ｎ］≧ｍａｘ（Ａ［２^* ｎ］，Ａ［２^* ｎ＋１］）を満足する位置に仮説を格納する。
この処理は仮説の数の２の対数の２倍のオーダー（Ｏ（２ｌｏｇ２Ｎ））の演算量しか必要とせず、極めて高速に行われる。
【００８１】
図５は式（２）の計算を次元方向に探索を進めるベストファースト探索のフローチャートである。
次に各処理ステップについて説明する。
ステップＳＴ１において、第ｊ状態の混合連続分布のすべての要素分布（第１〜第Ｍ（ｊ）分布）について、初期仮説を生成する。生成された仮説は仮説記憶部２００２に記憶される。
ここで、第ｍ番目の要素分布の初期仮説は、分布の番号をｍ、計算済の次元数を０、各次元の一次元の分布の累積値ｇ（ｍ，０）を
ｇ（ｍ，０）＝Ａ（ｍ）＋Ｂ（ｍ）
とする。
【００８２】
ステップＳＴ２において、累積値最大の仮説を仮説記憶部２００２から取り出す。この仮説の要素分布番号、展開済次元数、累積値を、以下、それぞれ、ｍ，ｋ，ｇ（ｍ，ｋ）として説明する。
【００８３】
ステップＳＴ３において、計算済の次元数ｋが最終次元のＫ（本実施の形態では２６）に到達しているかを調べ、最終次元に到達している場合（ｋ＝２６のとき）は、ベストファースト探索の理論により、この仮説が確率最大の要素分布に該当するので、この仮説を探索された解として、解の処理を行うため、ステップＳＴ７に進む。
もし、最終次元に到達していない場合（ｋ＜２６のとき）は、ステップＳＴ４に進む。
【００８４】
ステップＳＴ４において、次元数を１増やし、ｋ＋１とする。
ステップＳＴ５において、次元数ｋ＋１までの累積値を計算する（ただし、ここでの次元数ｋはステップＳＴ４において１増やす前の値として説明する。なお、以下の実施の形態でも同様とする）。
この累積値は、次元ｋまでの累積値ｇ（ｍ，ｋ）に、次元ｋ＋１の一次元の連続分布の対数確率値（以後尤度と呼ぶ）を加算して、ｇ（ｍ，ｋ＋１）を次式のように計算する。ただし、一次元の連続分布の対数確率値に含まれる定数項−（１／２）ｌｏｇ（２πσ（ｍ，ｋ））はステップＳＴ１において既に定数項Ｂ（ｍ）の中で加算済であり除く。
【００８５】
【数８】

【００８６】
ところで、分散σ（ｍ，ｋ）は正の値であり、かつ、分子も２乗されているため正の値であるので、累積値ｇ（ｍ，ｋ＋１）はｇ（ｍ，ｋ）に等しいか、より小さく、累積値ｇ（ｍ，ｋ）は次元数ｋに関して単調非増加関数となっている。
【００８７】
ステップＳＴ６において、この仮説の要素分布番号、展開済次元数、累積値を、それぞれ、ｍ，ｋ＋１，ｇ（ｍ，ｋ＋１）とした仮説を作成し、仮説記憶部２００２に記憶して、ステップＳＴ２に戻る。
ステップＳＴ２における処理は前述の通りであり、ステップＳＴ３において、解が見つかるまで以上の処理が繰り返される。
【００８８】
ステップＳＴ７では、仮説を解として記憶し、処理を終了する。
さて、ステップＳＴ７において、解として得られた仮説については、要素分布番号、展開済次元数、累積値は、それぞれ、ｍ、Ｋ、ｇ（ｍ，Ｋ）となっている。
ｇ（ｍ，Ｋ）は次式のように各次元の尤度の和であり、要素分布ｍの第１次元から第Ｋ次元までの尤度の和であり、Ｍ（ｊ）個の混合分布の中では、最大の尤度となっている。
ｇ（ｍ，Ｋ）＝Ａ（ｍ）＋Ｂ（ｍ）＋Ｃ（ｘ（ｔ），ｍ，Ｋ）
【００８９】
これによって、最大の尤度を有する要素分布は解として求まった仮説の要素分布番号ｍとして求められる。
また、その対数確率は解として求まった仮説の累積値として得られる。
【００９０】
図６は探索終了時点のスタックに残された仮説の様子をそれらの展開履歴と共に示すグラフ図である。
この図で□印が最終次元Ｋまで展開された各次元の尤度の累積値最大の仮説の展開履歴を示す。また、○印及び×印が途中の次元まで展開された仮説の展開履歴を示す。
【００９１】
さて、この図から分かるように、スタックに展開されずに途中次元の計算して残っている仮説については、その途中次元から最終次元までの一次元の尤度計算をしていない。
したがって、本実施の形態の方法では、演算回数はスタックに展開されずに残っている要素分布の残りの展開すべき次元数の分だけ、一次元の尤度計算をしなくて済み、全要素分布の確率を求めてから、確率の最大の要素分布を求める従来のフル探索の方法に比べて、演算量が削減される。
【００９２】
従って、本実施の形態では、ベストファースト探索を用いているので、フル探索に比較して、一次元の尤度計算の演算回数を削減して、尤度最大の分岐確率密度を求めることができるという効果がある。
また、ベクトル量子化や、元々の混合分布に含まれる要素分布以外に確率分布を用いる必要がないという効果がある。
また、ローカルミニマムに陥る危険のある一般の探索手法に比較して、ベストファースト探索により、必ず、最大の確率を有する要素分布を最初の解として求められるため、最大分岐出力確率の計算の精度を犠牲にすることがないという効果がある。
【００９３】
実施の形態２．
上記実施の形態１では、次元数Ｋの多次元の入力ベクトルに対して、第ｋ次元までの１次元の分布の確率値の第１次元から第ｋ次元までの累積値が最大の要素分布についての仮説を、次元方向に展開するベストファースト探索を用いた。
この場合、将来の第ｋ＋１次元から入力ベクトルの次元数であるＫ次元までの累積値の推定値の情報は用いていない。
【００９４】
そこで、本実施の形態では、第１次元から第ｋ次元までの累積値に、第ｋ＋１次元から前記第Ｋ次元までの累積値の推定値を加えて、解が得られるまでに要するベストファースト探索の仮説の展開回数の削減を図る。
【００９５】
図７は探索のためのフローチャート、図８は探索のための機能ブロック図である。仮説展開制御部２００１は仮説記憶部２００２に記憶された仮説の取り出し、展開処理を行う。
仮説記憶部２００２に記憶される仮説は、それぞれ、要素分布の番号ｍ、計算済の次元数ｋ、第１次元から第ｋ次元までの各次元の一次元の分布の尤度の累積値ｇ（ｍ，ｋ）、評価値ｆ（ｍ，ｋ）の各数値を有する。
【００９６】
累積値ｇ（ｍ，ｋ）は、要素分布ｍの第１次元から第ｋ次元までの各次元の尤度の累積値であり、式（２）のＣ（ｘ（ｔ），ｍ，ｋ）のＫをｋとしたときの値として、次式で計算した値である。
ｇ（ｍ，ｋ）＝Ａ（ｍ）＋Ｂ（ｍ）＋Ｃ（ｘ（ｔ），ｍ，ｋ）
また、評価値ｆ（ｍ，ｋ）は、この累積値ｇ（ｍ，ｋ）と推定値ｈ（ｍ，ｋ）とを用いて、次式で計算される。
ｆ（ｍ，ｋ）＝ｇ（ｍ，ｋ）＋ｈ（ｍ，ｋ）
【００９７】
ここで、推定値ｈ（ｍ，ｋ）は、要素分布ｍの第ｋ＋１次元から最終の第Ｋ次元までの各次元の尤度の累積値の推定値である。その具体例は実施の形態３以下で説明するので、ここでは、その詳細な説明を省略し、この推定値があるものとして、これを用いたベストファースト探索の概略を説明する。
次に各処理ステップについて説明する。
【００９８】
ステップＳＴ１１において、時刻ｔの特徴ベクトルｘ（ｔ）が与えられる。この入力ベクトルの各次元ｋ（ｋ＝１，２，…，Ｋ）の値をｘ（ｔ，ｋ）とする。ステップＳＴ１２において、要素分布ｍの第ｋ＋１次元から最終の第Ｋ次元までの各次元の尤度の累積値の推定値ｈ（ｍ，ｋ）（ｋ＝１，２，…，Ｋ）を計算する。
ただし、最終次元Ｋの推定値はｈ（ｍ，Ｋ）＝０とする。
【００９９】
ステップＳＴ１において、第ｊ状態の混合分布のすべての要素分布（第１〜第Ｍ（ｊ）分布）ついて、初期仮説を生成する。生成された仮説は仮説記憶部２００２に記憶される。
ここで、第ｍ番目の分布の初期仮説は、要素分布の番号をｍ、計算済の次元数を０、各次元の一次元の対数確率（尤度）の累積値をｇ（ｍ，０）、評価値をｆ（ｍ，０）とする。
ただし、次式のように累積値ｇ（ｍ，０）は要素分布の分岐出力確率の定数項とし、また、評価値ｆ（ｍ，０）は累積値ｇ（ｍ，０）と推定値ｈ（ｍ，０）に基づいて次式のように与える。
ｇ（ｍ，０）＝Ａ（ｍ）＋Ｂ（ｍ）
ｆ（ｍ，０）＝ｇ（ｍ，０）＋ｈ（ｍ，０）
【０１００】
ステップＳＴ２において、評価値ｆ（ｍ，ｋ）最大の仮説を仮説記憶部２００２から取り出す。
この仮説の要素分布番号、展開済次元数、累積値、評価値を、以下、それぞれ、ｍ，ｋ，ｇ（ｍ，ｋ），ｆ（ｍ，ｋ）として説明する。
【０１０１】
ステップＳＴ３において、計算済の次元数ｋが最終次元Ｋ（＝２６）に到達しているかを調べ、最終次元に到達している場合（ｋ＝２６のとき）は、ベストファースト探索の理論により、この仮説が評価値最大の要素分布に該当するので、この仮説を探索された解として、解の処理を行うため、ステップＳＴ７に進む。もし、最終次元に到達していない場合（ｋ＜２６のとき）は、ステップＳＴ４に進む。
【０１０２】
ステップＳＴ４において、次元数を１増やし、ｋ＋１とする。
ステップＳＴ５において、まず、次元数ｋ＋１までの累積値ｇ（ｍ，ｋ＋１）を計算する。
この累積値ｇ（ｍ，ｋ＋１）は、次式のように次元ｋまでの累積値ｇ（ｍ，ｋ）に、次元ｋ＋１の一次元の連続分布の尤度を計算して加える。ただし、一次元の連続分布の対数確率値に含まれる定数項−（１／２）ｌｏｇ（２πσ（ｍ，ｋ））はステップＳＴ１において既に定数項Ｂ（ｍ）の中で加算済であり除く。
【０１０３】
【数９】

【０１０４】
また、第ｋ＋１次元から第Ｋ次元までの各次元の一次元の分布の累積値の推定値（ｈ（ｍ，ｋ＋１）とする）を推定値計算部２００３で計算する。
この推定値の具体的な計算例は以下の幾つかの実施の形態で説明されるのでここでの説明を省略する。
そして、累積尤度ｇ（ｍ，ｋ＋１）と推定値ｈ（ｍ，ｋ＋１）とに基づいて、評価値ｆ（ｍ，ｋ＋１）を次式で計算する。
ｆ（ｍ，ｋ＋１）＝ｇ（ｍ，ｋ＋１）＋ｈ（ｍ，ｋ＋１）
【０１０５】
ステップＳＴ６において、この仮説の要素分布番号、展開済次元数、累積値、評価値を、それぞれ、ｍ，ｋ＋１，ｇ（ｍ，ｋ＋１），ｆ（ｍ，ｋ＋１）とした仮説を作成し、仮説記憶部２００２に記憶して、ステップＳＴ２に戻る。
戻った後のステップＳＴ２における処理は前述の通りである。
【０１０６】
ステップＳＴ７では、仮説を解として記憶し、処理を終了する。
以下の幾つかの実施の形態では、推定値ｈ（ｍ，ｋ）の具体的な例について説明する。
【０１０７】
実施の形態３．
本実施の形態では、上記実施の形態２において、第ｋ＋１次元から最終の第Ｋ次元までの尤度累積値の推定値ｈ（ｍ，ｋ）を統計理論にに基づいて推定するものである。
【０１０８】
次元数Ｋの入力ベクトルに対して、分岐出力確率最大の要素分布を決定するためのベストファースト探索のフローチャートは図７に示したものと同一である。即ち、文献（Ｍｏｒｒｉｓｏｎ著：“ＭｕｌｔｉｖａｒｉａｔｅＳｔａｔｉｓｔｉｃａｌＭｅｔｈｏｄｓ”、１０頁）に述べられているように、平均０、分散１の独立した正規分布に従うｎ個の確率変数、Ｘ１，Ｘ２，…，Ｘｎの和として定義される変数χは、自由度ｎのカイ２乗分布に従って分布する。
一般に、Ｘｉが独立した平均μｉ，分散σｉに従うとき
【０１０９】
【数１０】

【０１１０】
として定義される変数は、自由度ｎのカイ２乗分布に従って分布する。
ところで、ある要素分布の第ｋ＋１次元から第Ｋ次元までの一次元の尤度の累積値は、前記式の通り、
【０１１１】
【数１１】

【０１１２】
であり、形式的に上の式と一致する。
よって、第ｋ＋１次元から第Ｋ次元までの一次元の尤度の累積値は、比例定数（この場合−１／２）を除けば、自由度Ｋ−ｋのカイ２乗分布に従って分布すると期待される。
【０１１３】
図９は自由度ｎのカイ２乗分布の分布関数のテーブルの一部分を示す（テーブルの全体は上記文献の３６６頁に示されている）。
このテーブルから、例えば、自由度１０のカイ２乗分布の独立変数χの値が２．５６以下となる確率は１％、３．２５以下となる確率は２．５％、３．９４以下となる確率は５％等々であることが読み取れる。
従って、残り次元が１０次元の時、一次元の確率の累積値が２．５６×比例定数＝２．５６×（−１／２）＝−１．２８以上の値をとる確率は１％である。したがって、９９％の確率で、累積値は−１．２８より小さな値となることが期待される。
【０１１４】
この性質を使うことにより、第ｋ＋１次元から第Ｋ次元までの累積値の推定値ｈ（ｍ，ｋ）を次式で与える。
ｈ（ｍ，ｋ）＝Ｘ（Ｋ−ｋ，ｐ）×（−１／２）
ここで、Ｘ（ｎ，ｐ）は自由度ｎのカイ２乗分布のパーセントポイントがｐであるときの独立変数の値である（例えば、自由度１０のカイ２乗分布の５％ポイントはＸ（１０，０．０５）＝３．９４である）。この値は、図９で示したテーブルを検索して求められる。
【０１１５】
この式から明らかなように、本実施の形態の評価値ｈ（ｍ，ｋ）は、要素分布の各次元の平均値および分散には依存しないので、すべての要素分布に共通の評価値となる。このため、評価値の計算が仮説の展開済の次元数ｋをキーとしたテーブル検索で実現可能である。
【０１１６】
欠点としては、推定値が実際の累積値より小さくなる確率がｐパーセントあるので、この推定値に基づく評価値は、真の値より小さくなる可能性がｐパーセントあり、Ａ＊条件を満たさない可能性がある。このため、評価値最大の仮説を優先的に展開するベストファースト探索で最初に得られる解は、必ずしも、入力ベクトルによっては、分岐出力確率が最大の要素分布が得られず、即ち、探索エラーが生じる可能性がある。
【０１１７】
実施の形態４．
本実施の形態では、上記実施の形態２において、第ｋ＋１次元から第Ｋ次元までの累積値の推定値ｈ（ｍ，ｋ）をＫ次元の入力ベクトルのＩ個の標本ｘ（ｉ）（ｉ＝１，２，…，Ｉ）から求めた値とするものである。
【０１１８】
次元数Ｋの入力ベクトルに対して、分岐出力確率が最大の要素分布を決定するためのベストファースト探索のフローチャートは図７に示したものと同一である。
Ｋ次元の入力ベクトルの標本ｘ（ｉ）は、学習用の音声データを分析して求める。
【０１１９】
次に入力ベクトルの各標本ｘ（ｉ）について、第ｍ番目の要素分布を用いたときの第ｋ＋１次元から第Ｋ次元までの各次元の一次元の尤度の累積値ｈ（ｍ，ｋ，ｘ（ｉ））を次式で求める。
【０１２０】
【数１２】

【０１２１】
ここで、ｘ（ｉ，ｎ）は標本ｘ（ｉ）のｎ次元目の値である。
探索に用いる推定値ｈ（ｍ，ｋ）は、これら標本から求めた累積値ｈ（ｍ，ｋ，ｘ（ｉ））を要素分布のインデックスｍ及び標本のインデックスｉに関する最大値として次式で求める。
【０１２２】
【数１３】

【０１２３】
第ｋ＋１次元から第Ｋ次元までの累積値の推定値として、実際の入力ベクトルから求めた最大値を使うので、Ａ＊条件を満たさない可能性が減少する。この結果、探索エラーが減少するという効果がある。
【０１２４】
実施の形態５．
本実施の形態では、上記実施の形態２において、第ｋ＋１次元から第Ｋ次元までの累積値の推定値ｈ（ｍ，ｋ）をＫ次元の入力ベクトルの標本から求めた値とするものである。
さらに、要素分布を要素分布間の類似度に基づいて、複数のクラスタに分類し、このクラスタ毎に、推定値を求めるようにしたものである。
【０１２５】
次元数Ｋの入力ベクトルに対して、分岐出力確率最大の要素分布を決定するためのベストファースト探索のフローチャートは図７に示したものと同一である。
次に推定値の作成について説明する。
【０１２６】
まず、要素分布をＬＢＧアルゴリズムやＫ平均法などの適宜なクラスタリング手法によって、要素分布をＣ個のクラスタに分類する。ここでは、Ｋ平均法を用いている。
Ｋ平均法では、始めにＣ個の種となる要素分布をランダムに選択し、代表分布とする。
【０１２７】
次にこれらＣ個の代表分布を用いて、代表分布からの距離に基づいて、全要素分布の各要素分布を距離が最小の代表分布の分類とすることにより全要素分布を分類する。
分布間の距離はカルバックダイバージェンスや、チェルノフ距離、バッタチャリア距離などを用いることができるが、ここでは、分布間の距離は計算が簡単であることからカルバックダイバージェンスを用いている。
【０１２８】
次にＣ個の分類のそれぞれについて、その分類に分類された要素分布どうしの平均距離が最小となるように代表分布を選び直す。
さらに、選び直した代表分布を用いて全要素分布を再分類する。
以上の処理を分類が収束するまで繰り返す。
【０１２９】
次にＫ次元の入力ベクトルの標本は、学習用の音声データを分析して求める。この手続きは上記実施の形態４と同じである。
次に入力ベクトルの各標本ｘ（ｉ）（ｉ＝１，２，…，Ｉ、Ｉ標本数）について、第ｋ＋１次元から第Ｋ次元までの累積値ｈ（ｍ，ｋ，ｘ（ｉ））を次式で求める。
【０１３０】
【数１４】

【０１３１】
ここで、ｘ（ｉ，ｎ）は標本ｘ（ｉ）のｎ次元目の値である。
探索に用いる推定値ｈ（ｍ，ｋ）は、これら標本から求めた累積値ｈ（ｍ，ｋ，ｘ（ｉ））をクラスタｃ（ｃ＝１，２，…，Ｃ）に属する要素分布のインデックスｍ（ｍ∈Ω（ｃ）、ただし、Ω（ｃ）はクラスタｃに属する要素分布のインデックス）及び標本のインデックスｉに関する最大値として次式で求める。
【０１３２】
【数１５】

【０１３３】
以上の処理で求めた推定値は、実際の入力ベクトルから求めた最大値を使い、さらに、分類したクラスタ内について最大値をとるので、Ａ＊条件を満たさない可能性がさらに減少するため、探索エラーがさらに減少するという効果がある。
【０１３４】
実施の形態６．
本実施の形態では、ベストファースト探索を用いて、評価値の大きい方から所定の数（以下の説明ではＳ個とする）の解を順次求めて、これら所定の数Ｓ個の解の中で、最大の分岐出力確率を有する要素分布を決定する。
探索のための機能ブロック図は、図８と同一であり、仮説展開制御部２００１は仮説記憶部２００２に記憶された仮説の取り出し、展開処理を行う。
【０１３５】
仮説記憶部２００２に記憶される仮説は、それぞれ、分布の番号ｍ、計算済の次元数ｋ、第１次元から第ｋ次元までの各次元の一次元の分布の尤度の累積値ｇ（ｍ，ｈ）及び評価値ｆ（ｍ，ｈ）の各数値を有する。
累積値ｇ（ｍ，ｈ）及び評価値ｆ（ｍ，ｈ）は、上記実施の形態２と同様、それぞれ、次式で得られる。
ｇ（ｍ，ｋ）＝Ａ（ｍ）＋Ｂ（ｍ）＋Ｃ（ｘ（ｔ），ｍ，ｋ）
ｆ（ｍ，ｋ）＝ｇ（ｍ，ｋ）＋ｈ（ｍ，ｋ）
ここで、ｈ（ｍ，ｋ）は、要素分布ｍの第ｋ＋１次元から最終の第Ｋ次元までの各次元の尤度の累積値の推定値である。
【０１３６】
図１０は次元数Ｋの入力ベクトルに対して、評価値の大きい方からＳ個の要素分布を決定するためのベストファースト探索のフローチャートである。
ステップＳＴ１〜ＳＴ７の各処理ステップの動作は上記実施の形態２と同様であり、その説明を省略し、異なる部分の処理について説明する。
【０１３７】
ステップＳＴ２１において、解の数が所定の数であるＳ個に達したら、ステップＳＴ２２に行く。解の数が所定の数であるＳ個未満であればステップＳＴ２に戻る。
ステップＳＴ２２において、評価値が大きい方から順に記憶されたＳ個の解について、その要素分布の番号をｍ（ｓ）（ｓ＝１，２，…，Ｓ）として、これらの解の仮説どうしで、尤度の累積値ｇ（ｍ（ｓ），Ｋ）を比較して、尤度の累積値ｇ（ｍ（ｓ），Ｋ）が最大である解の仮説を選択して、この解の仮説の保持する要素分布の番号ｍを分岐出力確率最大の要素分布の番号として出力する。
【０１３８】
次にこの実施の形態の効果を説明する。
要素分布ｍの第ｋ＋１次元から最終の第Ｋ次元までの各次元の尤度の累積値の推定値ｈ（ｍ，ｋ）として、前記３つの実施の形態では、それぞれ、
（ａ）自由度Ｋ−ｋのカイ２乗分布の分布関数の値が所定の値に達するときの独立変数値、
（ｂ）学習用の音声データの示す値の分布、
（ｃ）クラスタリングされた要素分布の各クラスタについての学習用の音声データの示す値の分布から決めた値を用いた。
【０１３９】
このような推定値は、要素分布ｍの第ｋ＋１次元から最終の第Ｋ次元までの各次元の尤度の累積値の真の値より下回ることがある。
このため、上記３つの実施の形態では、評価値が入力ベクトルによっては、Ａ＊条件を満たさないことがあった。
このため、ベストファースト探索で最初に求まる解は最適解である保証は必ずしもなく（即ち、解が求められた時点での評価値が最大ではあるが、必ずしも、各次元の尤度の累積値は最大ではないので）、探索エラーが生じることがあった。
【０１４０】
そこで、本実施の形態では、ベストファースト探索で評価値が下位の順位の候補も含めた合計Ｓ個の解を求めているので、その中に真の尤度最大の仮説が含まれる可能性を高め、探索エラーを減少している。
よって、本実施の形態によれば、ベストファースト探索で最初からＳ個の解を求めて、さらに、その中で、尤度の最も大きい解を選択しているので、探索エラーの生じる可能性を少なくできるという効果がある。
【０１４１】
実施の形態７．
本実施の形態では、尤度最大の要素分布の探索に際して、探索エラーが生じないようにＡ＊条件を満足するように推定値を計算し、ベストファースト探索を行うようにした。
そのため、つぎの一般的な不等式を用いる。
ここで、右辺のμｍｉｎ及びμｍａｘは、それぞれ左辺にある平均値μの最小値及び最大値である。また、σｍａｘは左辺の分散σの最大値である。
【０１４２】
【数１６】

【０１４３】
次に、この不等式に現れる平均値の最小値μｍｉｎ、最大値μｍａｘ及び、分散の最大値σｍａｘの求め方について説明する。
まず、一次元の分布の分類について説明する。
図１１は一次元の分布の分類のフローチャートである。また、図１２は数直線上の平均値と分割の様子を示す説明図である。
【０１４４】
次に動作について説明する。
ステップＳＴ３１において、変数ｋを０とする。以後、変数ｋは次元を表す。
ステップＳＴ３２において、次元を１増やす。
ステップＳＴ３３において、各要素分布の次元ｋの一次元分布の平均値μ（ｍ，ｋ）を数直線上に並べる。
【０１４５】
図１２は全要素分布の平均値を並べ終えた後の数直線上の平均値の配列の様子を示す。この例では、１６個の要素分布の平均値を数直線上の○印で示す。また、○印の上の数字は要素分布の番号を示す。
ステップＳＴ３４において、数直線上の平均値をＣ個の区間に分割する。図では４個ずつの平均値を含む４個の区間に分割した場合を示す。分割の仕方としては、例えば、各区間の平均値どうしの平均距離が最小になるように分割しても構わない。
【０１４６】
ステップＳＴ３５において、Ｃ個のそれぞれの区間について、区間の両端に位置する平均値、即ち、各区間の平均値の最小値μｍｉｎ（ｃ，ｋ）と最大値μｍａｘ（ｃ，ｋ）を求める。
図の例では区間２の平均値の最小値μｍｉｎ（２，ｋ）は要素分布７の平均値であり、また、同区間の平均値の最大値μｍａｘ（ｚ，ｋ）は要素分布１５の平均値である。
【０１４７】
ステップＳＴ３６において、Ｃ個のそれぞれの区間ｃについて、区間ｃの中に存在する平均値を所有する要素分布ｍ（∈Ω（ｃ）、ただしΩ（ｃ）はその次元ｋの一次元の分布の平均値が区間ｃに属する要素分布番号の集合とする）を求め、これら要素分布の中で次元ｋの各一次元の分布を見てそのなかで最大の分散σｍａｘ（ｃ，ｋ）を求める。即ち、
【０１４８】
【数１７】

【０１４９】
図の例では区間２の最大の分散σｍａｘ（２，ｋ）は、要素分布７、１０、１３、１５の分散のなかの最大の分散の値として求める。
また、要素分布ｍの区間番号をｃ（ｍ，ｋ）として記憶する。
この区間番号の記憶ｃ（ｍ，ｋ）は、次の探索の中で仮説の次元ｋについて、要素分布番号ｍからその区間番号を求めるために使用される。
【０１５０】
次に時刻ｔの特徴ベクトルｘ（ｔ）に対して、分岐出力確率最大の要素分布を探索する動作について説明する。
図１３は探索のフローチャートである。図１４は探索のための機能ブロック図である。
ステップＳＴ１１において、特徴ベクトルｘ（ｔ）が入力される。
ステップＳＴ１２において、入力の特徴ベクトルｘ（ｔ）と、上記のように既に求められたＣ個のそれぞれの区間ｃについての次元ｋの平均値の最小値μｍｉｎ（ｃ，ｋ）、最大値μｍａｘ（ｃ，ｋ）、分散の最大値σｍａｘ（ｃ，ｋ）に基づいて、区間ｃに平均値を有する要素分布のための推定値ｈ（ｃ，ｋ）を計算する。推定値ｈ（ｃ，ｋ）は、次の漸化式で計算する。
【０１５１】
【数１８】

【０１５２】
上の式に現れるθ（ｃ，ｋ）は、第ｃ番目の分割について、第ｋ＋１次元目の一次元の分布の平均値の最小値μｍｉｎ（ｃ，ｋ）、最大値μｍａｘ（ｃ，ｋ）、及び分散の最大値σｍａｘ（ｃ，ｋ）と、入力ベクトルｘ（ｔ）の第ｋ次元目の値ｘ（ｔ，ｋ）とから、上記不等式（３）の右辺の値を計算した値で次式のように計算される。
【０１５３】
【数１９】

【０１５４】
この値は、Ａ＊探索の条件を満足する。
したがって、この値θ（ｃ，ｋ）を使って求めた上記推定値ｈ（ｃ，ｋ）もＡ＊探索の条件を満足する。
以上のようにして求められた推定値ｈ（ｃ，ｋ）は、推定値記憶部２００４に記憶される。
【０１５５】
ステップＳＴ１〜ＳＴ４については、上記実施の形態２と同様の動作であるのでその説明を省略する。
ステップＳＴ５において、まず、次元数ｋ＋１までの累積尤度ｇ（ｍ，ｋ＋１）を計算する。
この累積尤度ｇ（ｍ，ｋ＋１）は、上記実施の形態２と同様に次式のように計算する。
【０１５６】
【数２０】

【０１５７】
次に第ｋ＋１次元から第Ｋ次元までの各次元の一次元の分布の尤度の累積値の推定値ｈ（ｃ，ｋ＋１）を推定値記憶部２００４から読み出す。そして、累積尤度ｇ（ｍ，ｋ＋１）と推定値ｈ（ｃ，ｋ＋１）とに基づいて、評価値ｆ（ｍ，ｋ＋１）を次式で計算する。
ｆ（ｍ，ｋ＋１）＝ｇ（ｍ，ｋ＋１）＋ｈ（ｃ，ｋ＋１）
ここで、番号ｃは要素分布ｍの区間番号ｃ（ｍ，ｋ）の記憶から求める。
【０１５８】
ステップＳＴ６では、ステップＳＴ５で作成された仮説を仮説記憶部２００２に記憶する。
最後に、効果について説明する。
本実施の形態によれば、評価値がＡ＊条件を満たすようにしたので、探索の結果、最初に見つかる評価値最大の解は、尤度の累積値が最大の解である。従って、探索エラーが生じることがない。また、評価値を探索に用いているので、評価値を探索に用いていない上記実施の形態１の場合に比較して、探索のなかで仮説の生成回数を減少させ、探索のための演算量を削減することができる。
【０１５９】
実施の形態８．
本実施の形態では、分岐出力確率最大の要素分布の探索に際して、探索エラーが生じないようにＡ＊条件を満足するように推定値を設計し、ベストファースト探索を行うようにした。そのため、本実施の形態でも、上記実施の形態７で用いたものと同一の一般的な不等式（３）を用いる。
この不等式については、上記実施の形態７で説明したので、説明を省略する。
【０１６０】
次に本実施の形態におけるこの不等式に現れる平均値の最小値、最大値、及び分散の最大値の求め方について説明する。
まず、混合分布を構成する要素分布をその類似の度合いに基づいて複数のクラスタに分割する。要素分布間の類似の程度は連続分布間の距離で測る。また、クラスタへの分割には、ＬＢＧアルゴリズムやＫ平均法などのクラスタリング手法が適用できる。
ここでは、要素分布間の類似度としてカルバックダイバージェンスを用い、クラスタリング手法としてＫ平均法を用いてクラスタリングを行う。この場合、クラスタリングについては、上記実施の形態４で述べたものと同じであり説明を省略する。
【０１６１】
次に探索の動作について説明する。
探索のフローチャートは図１３と同一であり、この図を用いて説明する。
その各部の動作は、ステップＳＴ１２の推定値の計算、及びステップＳＴ５における評価値計算が異なるだけで、その他は上記実施の形態７と同様であるため、これら同様部分の動作については説明を省略する。
【０１６２】
ステップＳＴ１２において、入力の特徴ベクトルｘ（ｔ）について、クラスタｃの要素分布ｍに関する次元ｋ＋１から最終次元Ｋまでの各一次元の分布の尤度の累積値の推定値ｈ（ｃ，ｋ）を計算し、推定値記憶部２００４に記憶する。推定値ｈ（ｃ，ｋ）は、次の漸化式を用いて計算する。
初期値：
ｈ（ｃ，Ｋ）＝０Ｋ：最終次元
漸化式：
ｋ＝Ｋ−１，Ｋ−２，…，０について
ｈ（ｃ，ｋ）＝θ（ｃ，ｋ＋１）＋ｈ（ｃ，ｋ＋１）
【０１６３】
ここで、θ（ｃ，ｋ）は、第ｃ番目のクラスタについて、そのクラスタに含まれる要素分布の第ｋ＋１次元目の一次元の分布の平均値の最小値μｍａｘ（ｃ，ｋ）、最大値μｍｉｎ（ｃ，ｋ）及び分散の最大値σｍａｘ（ｋ）と、入力ベクトルｘ（ｔ）の第ｋ次元目の値ｘ（ｔ，ｋ）とから、前記の不等式（３）の右辺の値を次式で計算した値である。
【０１６４】
【数２１】

【０１６５】
ステップＳＴ５において、評価値ｆ（ｍ，ｋ＋１）を計算する。
評価値ｆ（ｍ，ｋ＋１）は、累積値ｇ（ｍ，ｋ＋１）と推定値ｈ（ｃ，ｋ＋１）の和である。
推定値ｈ（ｃ，ｋ）は、ステップＳＴ１２において、推定値記憶部２００４に記憶された値を用いる。
なお、ステップＳＴ１２における漸化式から明らかなように、推定値を求めるための漸化式にｍａｘ演算を含んでいないので、上記の実施の形態７に比較して、推定値を計算するための計算量を削減できることが分かる。
また、そのなかに、クラスに関するｍａｘ演算を含まないので、他のクラスの推定値を考慮しなくて済むため、推定値が、より実際の値に近くなるため、評価値の精度が向上し、Ａ＊探索の理論から、最適解が得られるまでに必要な仮説の生成数が減少でき、探索の計算量を削減することができる。
【０１６６】
最後に、効果について説明する。
本実施の形態によれば、評価値がＡ＊条件を満たすようにしたので、探索の結果、最初に見つかる評価値最大の解は、尤度最大の解であり、その尤度は最大の尤度を見つける。また、評価値を探索に用いているので、評価値を探索に用いていない実施の形態１の場合に比較して、探索の仮説取り出しと展開のループのなかのループの回数を減少させ、演算量を削減することができる。また、推定値を計算するための計算量を削減することができる。また、推定値がより実際の値に近くなるため、探索の計算量を削減することができる。
【０１６７】
実施の形態９．
この実施の形態では、ある要素分布についての探索の途中で、途中の次元までの一次元の分布の尤度の累積値が、余りに小さいときは、その要素分布が最大の尤度累積値をとることがないと考えて、その要素分布の探索を打ち切る。
【０１６８】
図１５は本実施の形態のベストファースト探索のフローチャートである。
ステップＳＴ４１において、要素分布の最大の尤度の累積値が、これを下回ることがないという絶対的な閾値Θを設定する。具体的には本実施の形態での最大の尤度は２０〜７０の範囲に分布しており、その最小値より小さな値として数値０を設定する。
【０１６９】
ステップＳＴ２では仮説記憶部２００２にある仮説の中で尤度の累積値が最大の仮説を選択する。
ステップＳＴ４〜ＳＴ５において、次元数ｋを１増やして、累積値を計算する。
その後、ステップＳＴ４２において、ステップＳＴ５で計算された累積値がステップＳＴ４１で設定した閾値Θ（本実施の形態では０）より小さいときは、その仮説は最終次元においても、累積値は現在の累積値より大きくなることはない。したがって、最終次元の累積値は閾値より大きくなることはなく、したがって、現在の要素分布が最大の尤度を有する要素分布となることもないので、ステップＳＴ６には行かずに、ステップＳＴ２に戻る。
なお、計算された累積値が閾値Θより大きいときは、ステップＳＴ６に進み、その仮説はスタックに格納される。
【０１７０】
以上のような動作により、探索の途中で、スタックに格納される仮説の数がステップＳＴ６を通過しない分だけ減少する。
また、ステップＳＴ２でスタックにある仮説の中で累積値が最大の仮説を選択する際に、選択対象となる仮説はスタックの中の仮説全体である。このため、ステップＳＴ２の処理で使われる演算量が削減される。
【０１７１】
次に効果について説明する。
本実施の形態によれば、累積尤度が設定された閾値より小さいとき、仮説の展開格納を行わないので、探索途中で生じる仮説の数が減少し、演算量を削減できる。
なお、閾値を最大尤度の最小値より小さく設定することで、最大の尤度を有する要素分布を探索することができる。また、累積値と閾値を比較して仮説の展開を打ち切る場合について説明したが、上記実施の形態２〜実施の形態８のように、累積値と推定値とに基づいて評価値を計算する場合にも、適用しても構わない。この場合は、閾値との比較対象は、累積値と比較しても、また、評価値と比較しても同様の効果がある。
【０１７２】
また、要素分布を複数のクラスタに分割し、各クラスタについて、最大尤度の範囲を求めて、最大尤度の最小値より小さな値を閾値として、展開された仮説がこのクラスタごとの閾値を下回ったとき、展開された仮説をスタックに格納しないようにしても構わない。
【０１７３】
実施の形態１０．
上記実施の形態９では、すべての要素分布で共通の閾値Θを用いた。本実施の形態では、最大の尤度の最小値の分布は、要素分布ｍに依存して変化する。そこで、要素分布毎に尤度の最大値を調べ、この尤度最大値の最小値の比較的大きな要素分布については、より大きな閾値を設定し、より効率的に探索を打ち切るようにする。
本実施の形態での探索のフローチャートは図１５と同一である。
【０１７４】
本実施の形態では、ステップＳＴ４１において設定する閾値を、要素分布に依存して、そのとり得る最大の尤度の値の範囲を調べ、その最小値より小さい値を閾値Θ（ｍ）とする。
また、ステップＳＴ４２において要素分布番号ｍに応じてこの閾値Θ（ｍ）を取り出し、累積値ｇ（ｍ，ｋ）と比較する。
【０１７５】
次に効果について説明する。
本実施の形態によれば、要素分布に依存して最大の尤度の最小値の比較的大きな要素分布について、より大きな閾値を設定することができ、要素分布によって、より効率的に探索を打ち切ることができる。その結果、探索の演算量をより削減できる。
また、要素分布を複数のクラスタに分類して、各クラスタごとに共通の閾値を用いることで、閾値の記憶を削減することができる。
【０１７６】
実施の形態１１．
この実施の形態では、上記実施の形態９と同様に、ある要素分布についての探索の途中で、途中の次元までの一次元の分布の確率の累積値が、余りに小さいときは、その要素分布が最大の尤度を有することがないと考えて、その要素分布の探索を打ち切る。
さらに、打ち切りの効率を高めるため、閾値を次元数ごとに変化させる。
次元数に応じて閾値を変化させるため、ここでは実施の形態３で用いたカイ２乗分布を用いる。
【０１７７】
図１６はカイ２乗分布の分布関数が一定の数値に到達するときの値に基づいて閾値を決める際の考え方を説明する説明図である。
図において、仮説Ａは既に展開されている仮説、仮説Ｂあるいは仮説Ｃは途中次元ｋにおいて、これから展開されようとしている仮説である。
また、ＦＡは、仮説Ａの次元ｋにおける累積値から、自由度Ｋ−ｋのカイ２乗分布の分布関数が０．９５となるときの独立変数の値の−１／２倍を加えた値である。したがって、仮説Ａが最終次元Ｋまで展開されたときに、仮説Ａの累積値がこのＦＡを上回る確率は０．９５であり、ほとんどの場合ＦＡを上回ることが期待できる。
【０１７８】
また、ＦＢは仮説Ｂの次元ｋにおける累積値から、自由度Ｋ−ｋのカイ２乗分布の分布関数が０．０５となるときの独立変数の値の−１／２倍を加えた値である。したがって、仮説Ｂが最終次元Ｋまで展開されたときに、仮説Ｂの累積値がこのＦＢを上回る確率は０．９５でほとんどの場合上まわることはないことになる。
【０１７９】
同様に、ＦＣは、仮説Ｃの次元ｋにおける累積値から、自由度Ｋ−ｋのカイ２乗分布の分布関数が０．０５となるときの独立変数の値の−１／２倍を加えた値である。したがって、仮説Ｃが最終次元Ｋまで展開されたときに、仮説Ｃの累積値が、このＦＣを上回る確率は０．０５でほとんどの場合上まわることはないことになる。
【０１８０】
この図から、ＦＣがＦＡを上回る確率は、０．０５×０．０５より小さくなるので、仮説Ｃを打ち切ることができる。
また、ＦＢがＦＡを上回る確率は、０．０５×０．０５よりは大きいので、仮説Ｂを打ち切ることはあきらめる方が良い。
【０１８１】
実際には、途中次元ｋにおける最大の累積値を常に保存しておき、その最大値から自由度Ｋ−ｋのカイ２乗分布の分布関数から求めた累積値の上限と下限の推定値を幅として、仮説の打ち切りを行う。
図１７は本実施の形態の探索のフローチャートである。
【０１８２】
ステップＳＴ５１において、各次元ｋ（ｋ＝１，２，…，Ｋ）の閾値の幅を設定する。次元ｋの閾値の幅は、自由度Ｋ−ｋのカイ２乗分布の分布関数から得た二つのパーセントポイント（つまり本実施の形態では０．９５と０．０５）の差の−１／２倍として、次式で計算して記憶する。
【０１８３】
【数２２】

【０１８４】
ここで、Ｘ（ｎ，ｐ）は自由度ｎのカイ２乗分布の分布関数がｐとなるときの独立変数の値（即ち、パーセントポイント）である。
ステップＳＴ５２では、各次元ｋ（ｋ＝１，２，…，Ｋ）までの累積値の最大値の初期値として負の最大値（−∞とする）に設定する。
以後、この次元ｋまでの累積値の最大値をＥ（ｋ）で表す。
【０１８５】
ステップＳＴ４２では、仮説の累積値と次元ｋの閾値とを比較する。次元ｋの閾値は次元ｋの最大値Ｅ（ｋ）から、カイ２乗分布に基づいてステップＳＴ５１で求めた次元ｋの幅Ｗ（ｋ）を引くことにより、次式で求める。
仮説の次元ｋの閾値＝Ｅ（ｋ）−Ｗ（ｋ）
【０１８６】
この閾値との比較の結果、累積値が小さいときはステップＳＴ２に戻る。逆に、累積値が小さくないときはステップＳＴ５３に進む。
ステップＳＴ５３では仮説の次元ｋの最大値を更新する。これは累積値が最大値Ｅ（ｋ）を上回ったとき、最大値に累積値を代入することで行う。
ステップＳＴ６では次元数を増加した仮説をスタックに格納する。
【０１８７】
次に効果について説明する。
本実施の形態によれば、次元ごとに閾値を変更するため、一定の閾値を用いる場合に比較して、探索途中に格納される仮説がより削減できる。このため、スタックから取り出すときの演算量が削減できるという効果がある。
【０１８８】
実施の形態１２．
この実施の形態では、次元方向のベストファースト探索によって仮説を展開する際、次元数を２以上増加させることで、最終次元に到達するまでのスタックから仮説を取り出す処理と、スタックに仮説を格納する処理の回数を削減する。これによって、スタックの管理のための演算量を削減する。
【０１８９】
図１８は本実施の形態のベストファースト探索のフローチャートである。
ステップＳＴ２において、累積値最大の仮説を取り出す。
その仮説の要素分布番号、展開済の次元数、及び次元１から次元ｋまでの各次元の一次元の分布の確率の累積値を、それぞれ、ｍ，ｋ，ｇ（ｍ，ｋ）とする。
【０１９０】
ステップＳＴ３において、展開済の次元数ｋが最終次元Ｋでなければ、ステップＳＴ４に進む。
ステップＳＴ４において、次元数を２以上の数Δだけ増加し、ｋ＋Δとする。ただし、増加後の値が最終次元Ｋより大きくなる場合は、最終次元Ｋと一致させる。
【０１９１】
ステップＳＴ５では、累積値を更新し、ｇ（ｍ，ｋ＋Δ）とする。更新後の累積値ｇ（ｍ，ｋ＋Δ）は次式で計算する。
【０１９２】
【数２３】

【０１９３】
次元数を２以上増加させたので、本実施の形態では、最終次元に到達するまでのスタックから仮説を取り出す処理と、スタックに仮説を格納する処理の回数を削減する。これによって、スタックの管理のための演算量を削減するという効果がある。
なお、累積値が最大の仮説を優先的に次元数方向に展開する場合について説明したが、推定値を含む評価値が最大の仮説を優先的に次元数方向に展開する場合（具体的には実施の形態２以降の上記各実施の形態）についても同様に適用できることは言うまでもない。
【０１９４】
実施の形態１３．
上記の各実施の形態で説明した尤度累積値の最大の要素分布を次元方向にベストファースト探索して求める方法は、任意の空間的配置を有するコードベクトルを用いるベクトル量子化に応用でき、最近隣コードベクトルを計算するための演算量削減のため用いることができる。
【０１９５】
以下、これについて説明する。
音声の特徴ベクトルをｘとする。
特徴ベクトルｘは、限られた数の典型的なベクトルｖ（１），ｖ（２），…，ｖ（Ｃ）のうちで最も近いベクトルｖ（ｃ）のインデックスｃに変換される（Ｃはコードブックのサイズ）。
この変換は連続的な値を持つベクトルａを離散的で限られた数のベクトル｛ｃ｜ｃ＝１，２，…，Ｃ｝に変換するものでベクトルの量子化を行っていることになる。
離散的な値をとるベクトルの有限集合Ｂ＝｛ｖ（１），ｖ（２），…，ｖ（Ｃ）｝をコードブックと呼び、その要素をコードベクトルと呼ぶ。
【０１９６】
図１９はベクトル量子化を用いた符号化・復号化の構成図である。
図において、１１は入力データとしての音声信号である。１２は入力データ１１を時間窓を１０ｍｓの周期でずらしながら線形予測法で周波数分析し、特徴ベクトル１３に変換する分析部である。ここでは、スペクトル包絡情報を１２次元のＰＡＲＣＯＲ係数として求め、これを特徴ベクトルとし、その次元数ＫはＫ＝１２である。
【０１９７】
１５は複数のコードベクトルを記憶したコードブック、１４はコードブック１５を参照して、コードブック内の距離の最も近いコードベクトルを求めて、そのコードベクトル番号を符合１６として出力するコードベクトル探索部である。１７は符合１６を伝送または蓄積する伝送・蓄積部である。１８は伝送又は蓄積された符合１６からコードブック１９を参照し、コードベクトル２０を求める復号部である。２１はコードベクトル２０をパラメータとして音声を合成し、出力データ２２を作成する合成部である。
【０１９８】
本実施の形態では特徴ベクトルｘと距離の最も小さいコードベクトルの番号を探索するのに、これまでに説明した実施の形態と同様の次元方向のベストファースト探索を適用する。
入力ベクトルｘの第ｋ次元目の値をｘ（ｋ）とし、第ｍ番目のコードベクトルの第ｋ次元目の値をｖ（ｍ，ｋ）とする。
また、入力ベクトルと第ｍ番目のコードベクトルｖ（ｍ）の距離はユークリッド距離として次式で計算する。
【０１９９】
【数２４】

【０２００】
ベクトル量子化の結果として探索するコードベクトルは、上式の右辺を最小とするｍの値である。
【０２０１】
【数２５】

【０２０２】
これは、上式の右辺を−１倍した次式を最大とするｍの値を求めるのと等価である。
【０２０３】
【数２６】

【０２０４】
即ち、上式の右辺の総和（Σ）の内側の次元ｋで計算される２次形式を類似度と考えると、ベクトル量子化は、上式の右辺で定義される各次元の類似度の累積値ｇ（ｍ，Ｋ）を最大化するコードベクトルを求めることに相当する。ただし、
【０２０５】
【数２７】

【０２０６】
とする。
次元１から最終次元Ｋまでの各次元の類似度の累積値ｇ（ｍ，Ｋ）が最大のコードベクトルを求める処理の基本的流れは実施の形態１と同一である。
【０２０７】
図２０は本実施の形態のベクトル量子化におけるコードベクトルをベストファースト探索で求めるときのフローチャートである。
以下、上記実施の形態１と異なる点について説明する。
コードブックに記憶されたコードベクトルの数は２０４８（一般にＭ個）である。
【０２０８】
コードブックは多数の標本の入力ベクトルをＫ平均法で２０４８個のクラスタに分割して、総合の量子化誤差が最小になるようにクラスタの代表ベクトル（セントロイド）を決め、２０４８個のセントロイドをコードベクトルとおくことで作成される。各コードベクトルの次元は１２次元（一般にＫ次元）である。
【０２０９】
次に各処理ステップについて説明する。
ステップＳＴ１において、コードブック内の２０４８個のコードベクトルついて、初期仮説を生成する。生成された仮説は仮説記憶部２００２に記憶される。
ここで、第ｍ番目のコードベクトルの初期仮説は、コードベクトルの番号をｍ、計算済の次元数を０、各次元の類似度の累積値ｇ（ｍ，０）を０とする。
【０２１０】
ステップＳＴ２において、類似度の累積値最大の仮説を仮説記憶部２００２から取り出す。この仮説の要素分布番号、展開済次元数、各次元の類似度の累積値を、以下、それぞれ、ｍ，ｋ，ｇ（ｍ，ｋ）として説明する。
【０２１１】
ステップＳＴ３において、計算済の次元数ｋが最終次元をあらわす数値Ｋに到達しているかを調べ、最終次元に到達している場合は、ベストファースト探索の理論により、この仮説が累積値最大のコードベクトルに該当するので、この仮説を探索された解として、解の処理を行うため、ステップＳＴ７に進む。
もし、最終次元に到達していない場合（ｋ＜Ｋのとき）は、ステップＳＴ４に進む。
【０２１２】
ステップＳＴ４において、次元数ｋを１増やし、ｋ＋１とする。
ステップＳＴ５において、次元数ｋ＋１までの類似度の累積値ｇ（ｍ，ｋ＋１）を計算する。
この累積尤度は、次式のように次元ｋまでの累積尤度であるｇ（ｍ，ｋ）に、次元ｋ＋１の一次元の類似度η（ｍ，ｋ＋１）を計算して加える。
ｇ（ｍ，ｋ＋１）＝ｇ（ｍ，ｋ）＋η（ｍ，ｋ＋１）
次元ｋ＋１の一次元の類似度は次式のように計算される。
η（ｍ，ｋ＋１）＝−｜ｘ（ｋ＋１）−ｖ（ｍ，ｋ＋１）｜²
【０２１３】
ステップＳＴ６において、この仮説のコードベクトル番号、展開済次元数、類似度累積値を、それぞれ、ｍ，ｋ＋１，ｇ（ｍ，ｋ＋１）とした仮説を作成し、仮説記憶部２００２に記憶して、ステップＳＴ２に戻る。
ステップＳＴ７では、解として得られた仮説について、そのコードベクトル番号ｍをベクトル量子化の結果のコード番号として出力する。以上で処理を終了する。
【０２１４】
さて、ステップＳＴ７において、解として得られた仮説については、要素分布番号、展開済次元数、累積値はそれぞれ、ｍ，Ｋ，ｇ（ｍ，Ｋ）となっている。
ｇ（ｍ，Ｋ）は第ｍ番目のコードベクトルの次元１から最終次元Ｋまでの各次元の類似度の和であり、２０４８個のコードベクトルの中では、最大の類似度の累積値を有している。
【０２１５】
以上の処理によって出力されるコード番号は、コードブックの中のコードベクトルの中で、入力ベクトルとの類似度の累積値が最大であるコードベクトル、換言すれば、入力ベクトルとの距離が最小であるコードベクトルに一致している。即ち、ベクトル量子化の結果が得られる。
【０２１６】
本実施の形態で、探索終了時点のスタックに残された仮説の様子は、上記実施の形態１と同様に図６のようになっている。この図で□印が最終次元Ｋまで展開された尤度最大の仮説の展開履歴を示す。また、○印及び×印が途中の次元まで展開された仮説の展開履歴を示す。
【０２１７】
さて、この図から分かるように、類似度累積値最大のコードベクトルの全次元について各次元の距離を計算してからベクトル間の距離最小のコードベクトルを判定するフル探索に基づく方法に比較して、本実施の形態の方法では、演算回数はスタックに展開されずに残っているコードベクトルの残りの展開すべき次元数の分だけ、一次元の類似度の計算をしないので演算量が削減されている。
従って、本実施の形態では、ベストファースト探索を用いているので、フル探索に比較して、一次元の距離計算の演算回数を削減できるという効果がある。
また、コードブックを木構造が有するように設計されている必要がなく、任意の分布のコードブックに対して演算量削減の効果を発揮できるという効果がある。
【０２１８】
実施の形態１４．
本実施の形態は、多次元入力ベクトルの最大尤度を有する要素分布を次元方向にベストファースト探索して求める方法の一つの応用として、多次元の特徴ベクトル空間で最近隣または最尤のコードベクトルを探索してカテゴリを認識する問題に適用したものである。
文字認識、特に漢字のようにカテゴリ数が３０００以上と極めて多い場合、距離最小又は尤度最大の分布を計算する部分の演算量が格段に増える。
【０２１９】
本実施の形態では、各カテゴリはマルチテンプレートで表されており、テンプレート数は１６個である。このとき、合計で３７１５カテゴリの１２８次元ベクトルを入力として、距離最小のカテゴリベクトルを探索する例である。
【０２２０】
図２１は文献（“文字認識における距離計算の高速化の検討”、１９９７年電子情報通信学会総合大会予稿集、３１０頁）に基づき構成した文字認識装置の構成図である。
図において、３１は認識対象の印刷漢字が印刷された入力の紙文書、３２は入力文書３１を文字認識装置に読み込むためのイメージスキャナ、３３はイメージスキャナ３２で読み取ったイメージデータから文字の存在する領域を一文字毎に分離して切り出す文字領域切出部、３４は文字領域切出部３３によって切り出された文字領域にある文字イメージから輪郭特徴６４次元（４×４領域×４方向）、及び１次周辺特徴３２次元（８次元×４方向）、さらに、２次周辺特徴３２次元（８次元×４方向）の合計１２８次元を特徴量として抽出し、１２８次元の特徴ベクトル３５を出力する特徴抽出部である。
【０２２１】
３６は各カテゴリのテンプレートのベクトルがカテゴリと関連づけられて記憶された認識辞書、３７は特徴ベクトル３５に対して、認識辞書３６にカテゴリに関連づけて記憶されたテンプレートベクトルとの距離が最小となるテンプレートベクトルを探索し、距離最小のテンプレートベクトルとして探索されたテンプレートベクトルに関連づけられたカテゴリを認識結果３８として出力するカテゴリ検索部である。
【０２２２】
上記文献ではカテゴリ探索は初期カテゴリから中心カテゴリをその近傍にある中心ベクトルの中で距離最小のベクトルに移動することを反復することで距離最小のベクトルを探索している。
本実施の形態では、次元方向のベストファースト探索を適用する。
【０２２３】
図２２は特徴ベクトル３５に対して認識辞書３６に記憶されたテンプレートのベクトルのなかで距離最小のベクトルを次元方向にベストファースト探索するカテゴリ探索部３７の動作を示すフローチャートである。
本実施の形態では、入力ベクトルと距離の最も小さいベクトルコードを探索する。距離は市街地距離である。
【０２２４】
入力ベクトルの第ｋ次元目の値をＸ_k とし、第ｍ番目のコードベクトルの第ｋ次元目の値をＶ_mkとする。このとき、入力ベクトルと第ｍ番目のコードベクトルの距離はユークリッド距離として次式で計算される。
【０２２５】
【数２８】

【０２２６】
符合として求めるコードベクトルは、上式の右辺を最小とするｍの値である。これは、上式の右辺を−１倍した次式で定義される類似度（以後、距離の−１倍を簡単のため「類似度」と呼ぶ）の累積値を最大とするｍの値を求めるのと等価である。
【０２２７】
【数２９】

【０２２８】
そこで、上式で定義される各次元の類似度の累積値ｇを最大化するコードベクトルを求める。
上式が最大のコードベクトルを求める処理の基本的流れは実施の形態１と同一である。
図２０はベクトル量子化におけるコードベクトルをベストファースト探索で求めるときのフローチャートである。
【０２２９】
認識辞書内にはＭ個のテンプレートベクトルが存在し、それぞれ、１からＭまでの通し番号を付けられている。また、それぞれのテンプレートベクトルにはカテゴリＣ（ｍ）が関連づけられている。
次に各処理ステップについて説明する。
【０２３０】
ステップＳＴ１において、認識辞書内の３７１５カテゴリのテンプレートベクトルのそれぞれについて、初期仮説を生成する。生成された仮説は仮説記憶部２００２に記憶される。
ここで、第ｍ番目のコードベクトルの初期仮説は、テンプレートベクトルの番号をｍ、計算済の次元数を０、各次元の類似度の累積値を０とする。各次元の類似度は次式の市街地距離の−１倍である。
次元ｋの類似度＝−｜ｘ_k −ｖ_mk｜
式中、｜ａ｜はａの絶対値を示す。
【０２３１】
ステップＳＴ２において、類似度の累積値最大の仮説を仮説記憶部２００２から取り出す。この仮説の要素分布番号、展開済次元数、各次元の類似度の累積値を、以下、それぞれ、ｍ，ｋ，ｇ（ｍ，ｋ）として説明する。
ここで、類似度の累積値は仮説の展開済次元数をｋとすると次式で与えられる。
【０２３２】
【数３０】

【０２３３】
ステップＳＴ３において、計算済の次元数ｋが最終次元に一致しているかを調べ、最終次元に一致している場合は、ベストファースト探索の理論により、この仮説が累積値最大のテンプレートベクトルに該当するので、この仮説を探索された解として、ステップＳＴ７に進む。
もし、計算済の次元数ｋが最終次元に一致しない場合は、ステップＳＴ４に進む。
【０２３４】
ステップＳＴ４において、次元数を１増やし、ｋ＋１とする。
ステップＳＴ５において、次元数ｋ＋１までの各次元の類似度の累積値ｇ（ｍ，ｋ＋１）を計算する。
この累積類似度ｇ（ｍ，ｋ＋１）は、次式のように次元ｋまでの各次元の類似度の累積であるｇ（ｍ，ｋ）に、次元ｋ＋１の一次元の類似度を計算して加える。
ｇ（ｍ，ｋ＋１）＝ｇ（ｍ，ｋ）＋η（ｍ，ｋ＋１）
【０２３５】
次元ｋ＋１の一次元の類似度η（ｍ，ｋ＋１）は次式で計算される。
η（ｍ，ｋ＋１）＝−｜ｘ_k+1 −ｖ_mk+1｜
【０２３６】
ステップＳＴ６において、この仮説のテンプレートベクトルの番号、展開済次元数、類似度累積値を、それぞれ、ｍ，ｋ＋１，ｇ（ｍ，ｋ＋１）とした仮説を作成し、仮説記憶部２００２に記憶して、ステップＳＴ２に戻る。
【０２３７】
ステップＳＴ７では、解として得られた順位１の仮説及びスタックに残された類似度累積値の大きい上位Ｎ−１個の候補について、そのテンプレートベクトル番号ｍ１，ｍ２，ｍ３，…，ｍＮを求め、そのテンプレートベクトルに関連づけられたカテゴリＣ（ｍ１），Ｃ（ｍ２），Ｃ（ｍ３），…，Ｃ（ｍＮ）を認識辞書３６から求め、認識結果３８の候補として出力し、処理を終了する。
【０２３８】
さて、ステップＳＴ７において、解として得られた順位１の仮説については、要素分布番号、展開済次元数、累積値は、それぞれ、ｍ，Ｋ，ｇ（ｍ，Ｋ）となっている。
ｇ（ｍ，Ｋ）は第ｍ番目のコードベクトルの次元１から最終次元Ｋまでの各次元の類似度の和であり、テンプレートベクトルの中では、最大の類似度の累積値を有している。
【０２３９】
以上の処理によって出力されるカテゴリ番号は、認識辞書３６の中のテンプレートベクトルの中で、特徴ベクトルとの類似度の累積値が最大であるテンプレートベクトル、換言すれば、特徴ベクトルとの距離が最小であるテンプレートベクトルに一致している。即ち、カテゴリ認識の結果が得られる。
【０２４０】
上記文献のカテゴリ探索方法は、初期カテゴリの与え方によっては、必ずしも距離最小のテンプレートベクトルを探索できなかったが、本実施の形態によれば、特徴ベクトルに対して、複数の多次元ベクトルの中で距離最小のベクトルの探索のため、次元方向のベストファースト探索を用いたので、距離最小のテンプレートベクトルを探索でき、演算量を削減でき精度を劣化させないという効果がある。
【０２４１】
実施の形態１５．
本実施の形態は、混合連続分布ＨＭＭを用いる音声認識装置において、多次元入力ベクトルの最大尤度を有する要素分布を次元方向にベストファースト探索して求める方法を適用したものである。
本実施の形態の音声認識装置のブロック図は、図１であり、実施の形態１と同一の図を用いて説明する。
図において、１は入力音声、２は入力音声１を分析して時系列の特徴ベクトル３に変換する音声分析手段、４は時系列の特徴ベクトル３を入力し、混合連続分布ＨＭＭ８に対する尤度を計算し尤度行列を出力する尤度計算手段、５は経路データ記憶手段９である単語ネットワークに従って、音素ＨＭＭの連結から単語を構成し、ネットワークの各ノードにおいて、各ノードに至る音素ＨＭＭの状態系列の中で累積尤度最大の経路を求めるビタビ演算手段、６はビタビ演算の局所経路を後ろ向きに探索して最適な単語列を得る最適経路決定手段である。
【０２４２】
音素ＨＭＭは混合連続分布ＨＭＭであり、図２８のように３状態の多次元連続出力確率分布からなる。
図２３は各フレームの特徴ベクトルが求められた後の各フレームの処理のフローチャートである。
以下、各ステップの動作を説明する。
【０２４３】
ステップＳＴ６１において、当該フレームの特徴ベクトルが入力される。
ステップＳＴ６２において、すべての混合分布のすべての要素分布をまとめて、特徴ベクトルに対する尤度計算を行う。この尤度計算は、本発明の次元方向のベストファースト探索を用いる。詳細は後で述べる。
【０２４４】
ステップＳＴ６３において、ステップＳＴ６２において得られた特徴ベクトルに対する要素分布の尤度計算結果に基づいて、各状態の出力尤度を計算する。
【０２４５】
ステップＳＴ６４において、経路の尤度を計算する。計算はビタビ演算による。ビタビ演算は音素ｐ、状態ｊの尤度をこの状態に行き得るすべての状態からの経路に沿い尤度が最大となる経路を選択しその経路と尤度を記憶する。以上で各フレームの処理は終了する。
次に、上記のステップＳＴ６２における本発明の次元方向のベストファースト探索を用いる尤度の計算方法について説明する。ここでは、実施の形態８に説明した方法を適用する。
【０２４６】
図２４は探索の処理のフローチャートである。入力ベクトルは、当該フレームの特徴ベクトルである。探索する要素分布は、２０００状態のすべての要素分布であり、総数は２０００状態×１６分布＝３２０００分布である。
全要素分布の集合はＫ平均クラスタリング法により２０４８個のクラスタに分割されている。
それぞれのクラスタについて、その中の要素分布の各次元の一次元分布の平均、分散の最大最小値が予め求められている。
【０２４７】
ステップＳＴ１１では、これらに基づいて、特徴ベクトルに対する第ｍ次元から最終次元までの累積値の推定値が計算される。
以下、実施の形態７で説明した動作を行い、評価値最大の仮説がステップＳＴ３で最終次元に到達し、ステップＳＴ７を実行する。
ステップＳＴ７では、スタックの内容を保存して、ステップＳＴ６２は終了する。
【０２４８】
ステップＳＴ６３では、保存されたスタックの内容をみて、全要素分布の尤度を求める。ステップＳＴ３で最初に見つかった解の要素分布は正確な尤度が計算されているので、その値を用いる。また、その他の要素分布については、スタックに残されている途中次元までの累積尤度を用いることができる。
また、途中次元までの累積尤度の他に、途中次元＋１から最終次元までの推定値を加えた評価値が残っているので、この値を用いることができる。
【０２４９】
実施の形態１６．
上記実施の形態１５のフレーム処理内の混合分布演算において、Ｎベストの探索を行う。
フレーム処理の流れは図２３と同一である。探索のフローチャートは図２５である。
【０２５０】
ステップＳＴ２１において、上位Ｎ個の解を求めてから、ステップＳＴ２２においてスタックの内容を保存する。この後、ステップＳＴ６３では、上位Ｎ個の解については対応する要素分布の尤度の値とする。
その他の要素分布の尤度は、スタックに残った累積値あるいは累積値と推定値に基づいて計算された評価値を用いる。
上記実施の形態１５に比べて、少なくとも、上位Ｎ個の要素分布については厳密な尤度が得られるため、認識の精度が向上する。
【０２５１】
実施の形態１７．
図２６はマルチパスの音声認識方法に基づいて構成された、本実施の形態の音声認識装置の構成図である。
音声分析手段２は、入力音声１を分析し、多次元の特徴ベクトル３に変換する。
尤度計算１手段４１及び尤度計算２手段４２は、この多次元の特徴ベクトル３に対して、ＨＭＭ記憶手段８を参照して、複数の多次元の連続分布からなるＨＭＭの状態の尤度を計算する。ＨＭＭの状態の尤度は上記実施の形態１５と同様の処理で計算する。
【０２５２】
経路計算１手段４３は、認識単位の系列の中で、より制約の弱い経路データ１を記憶する経路データ１記憶手段４５を参照して、完全な経路計算のための推定値を計算する。
経路計算２手段４４は、完全な経路データ２を記憶する経路データ２記憶手段４６を参照するとともに、経路計算１手段４３で計算された経路の制約を緩めた経路の尤度推定値と、現在までに得られた完全な経路の途中尤度とを評価値として、ベストファースト探索法により完全な経路の尤度をフル探索に比べて少ない演算量で効率的に計算する。
【０２５３】
最適経路決定手段６は、経路計算２手段４４で得られた経路を調べて最適な経路を決定し、最適な経路に関連づけられた認識単位の系列を認識結果７として出力する。
以上の構成において、尤度計算１手段４１の計算した尤度は真の尤度に比較して同値かより大きく、Ａ＊条件を満足する。
よって、経路の制約を緩めた経路に沿って，経路計算１手段４３の計算した尤度は真の尤度に比較して同値か、それより大きく、Ａ＊条件を満足する。
このように、Ａ＊条件を満足するＨＭＭの出力確率を少ない演算量で求めることができるという効果がある。
【０２５４】
実施の形態１８．
本実施の形態は、上記実施の形態１７で述べたマルチパスの音声認識方法において、尤度計算１手段４１の計算において、この多次元の特徴ベクトル３に対して、ＨＭＭ記憶手段８を参照して、複数の多次元の連続分布からなるＨＭＭの状態の尤度を計算する。ＨＭＭの状態の尤度は上記実施の形態１６と同様に上位Ｎ個の要素分布の尤度を厳密に計算するようにしたものである。
【０２５５】
それによって、マルチパスの１回目で得られる経路尤度の推定値の精度が向上し、その結果、マルチパスの２回目のベストファースト探索による経路探索に要する演算回数が減少するという効果がある。
【０２５６】
実施の形態１９．
以上の実施の形態で述べた方法は、パターン認識装置及び音声認識装置の中で実施することもできる。また、パターン認識及び音声認識を実行するプログラムを記憶した記録媒体としても実施することもできる。
また、入力ベクトルと多次元の複数のベクトルとの距離、類似度及び尤度を計算する応用、例えば、特徴ベクトルのクラスタリングや、ＨＭＭの学習などに適用することで演算量削減の効果があることは言うまでもない。
【０２５７】
【発明の効果】
以上のように、この発明によれば、要素分布の探索方法は、展開済の次元数が入力ベクトルの次元数に一致するまで、次元方向にベストファースト探索を実行するように構成したので、要素分布を探索するための演算量が削減される効果がある。
【０２５８】
この発明によれば、展開済の次元数が増加する方向にベストファースト探索を実行するように構成したので、要素分布を探索するための演算量が削減される効果がある。
【０２５９】
この発明によれば、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、自由度Ｋ−ｋのカイ２乗分布の分布関数値が所定値に達するときのカイ２乗分布の独立変数値に基づいて計算するように構成したので、評価値の計算が仮説の展開済の次元数ｋをキーとしたテーブル検索で実現可能になる効果がある。
【０２６０】
この発明によれば、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、学習用の音声データが示す値の分布に基づいて計算するように構成したので、探索エラーが減少する効果がある。
【０２６１】
この発明によれば、要素分布を複数のクラスタに分割し、各クラスタについて、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、学習用の音声データが示す値の分布に基づいて計算するように構成したので、更に探索エラーが減少する効果がある。
【０２６２】
この発明によれば、ベストファースト探索で最初から複数個の解を求めて、その複数個の解の中で、上位複数個の評価値を有する要素分布を求めるように構成したので、探索エラーの生じる可能性を少なくできる効果がある。
【０２６３】
この発明によれば、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、各要素分布における各次元の一次元の分布を各分布の平均値の数直線上での配列に基づいて複数の区間に分割し、各区間について求めた各次元毎の不等式と、入力ベクトルの各次元の数値とに基づいて計算するように構成したので、探索エラーの発生を防止することができるとともに、検索の演算量を削減することができる効果がある。
【０２６４】
この発明によれば、第ｋ＋１次元から第Ｋ次元における確率値の累積値の推定値を、要素分布を複数のクラスタに分割し、複数のクラスタのそれぞれについて各次元毎に得られた不等式と、入力ベクトルの各次元の数値とに基づいて計算するように構成したので、要素分布を探索するための演算量等が更に削減される効果がある。
【０２６５】
この発明によれば、探索の閾値を設けて、第ｋ次元までの探索の累積値又は評価値が、その閾値を下回る場合、当該仮説の探索を打ち切るように構成したので、要素分布を探索するための演算量等が更に削減される効果がある。
【０２６６】
この発明によれば、要素分布に依存して探索の閾値を設けて、第ｋ次元までの探索の累積値又は評価値が、その閾値を下回る場合、当該仮説の探索を打ち切るように構成したので、探索の演算量を更に削減できる効果がある。
【０２６７】
この発明によれば、探索の閾値として、入力ベクトルの次元数をＫとし、自由度Ｋ−ｋのカイ２乗分布の分布関数が一定の数値に到達するときの独立変数の値に基づいて計算した値を用いるように構成したので、スタックから取り出すときの演算量が削減できる効果がある。
【０２６８】
この発明によれば、展開済の次元数を増加させる際に２以上次元数を増加させるベストファースト探索を実行するように構成したので、スタックの管理のための演算量が削減される効果がある。
【０２７１】
この発明によれば、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、評価値最大をとる要素確率分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素確率分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するように構成したので、精度を劣化させることなく、演算量を削減できる効果がある。
【０２７２】
この発明によれば、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の分岐確率密度のベストファースト探索を次元方向に実行し、評価値の大きい方からＮ個の分岐密度をとる要素確率分布の確率はベストファースト探索の結果として計算された確率の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するように構成したので、上位Ｎ個の要素分布については厳密な尤度が得られる結果、認識の精度が向上する効果がある。
【０２７３】
この発明によれば、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算するに際して、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、その評価値最大の要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するように構成したので、Ａ＊条件を満足するＨＭＭの出力確率を少ない演算量で求めることができる効果がある。
【０２７４】
この発明によれば、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の分岐確率密度のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の分岐密度をとる要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するように構成したので、マルチパスの１回目で得られる経路尤度の推定値の精度が向上する結果、マルチパスの２回目のベストファースト探索による経路探索に要する演算回数が減少する効果がある。
【０２７５】
この発明によれば、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を行うように構成したので、精度を劣化させることなく、演算量を削減できる効果がある。
【０２７６】
この発明によれば、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の要素分布のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を行うように構成したので、上位Ｎ個の要素分布については厳密な尤度が得られる結果、認識の精度が向上する効果がある。
【０２７７】
この発明によれば、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、多次元の入力ベクトルに対して、評価値最大の要素分布のベストファースト探索を次元方向に実行し、その評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するように構成したので、Ａ＊条件を満足するＨＭＭの出力確率を少ない演算量で求めることができる効果がある。
【０２７８】
この発明によれば、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、多次元の入力ベクトルに対して、評価値の大きい方からＮ個の要素分布のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して認識結果を決定するように構成したので、マルチパスの１回目で得られる経路尤度の推定値の精度が向上する結果、マルチパスの２回目のベストファースト探索による経路探索に要する演算回数が減少する効果がある。
【０２７９】
この発明によれば、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布のモデルで表現された認識単位のスコアを評価値最大の要素分布のベストファースト探索を次元方向に実行し、その評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された要素分布の確率を用い、その他の要素確率分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行し、そのスコア計算の結果に基づいて、認識単位の系列の中でスコアの最も高い認識単位の系列を選択するように構成したので、精度を劣化させることなく、演算量を削減できる効果がある。
【０２８０】
この発明によれば、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布のモデルで表現された認識単位のスコアを評価値の大きい方からＮ個の要素分布のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の評価値をとる要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行し、そのスコア計算の結果に基づいて認識単位の系列の中でスコアの最も高い認識単位の系列を選択するように構成したので、上位Ｎ個の要素分布については厳密な尤度が得られる結果、認識の精度が向上する効果がある。
【０２８１】
この発明によれば、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布に基づいて認識単位のスコアの推定値を計算するスコア計算し、１回目は認識単位の系列の中で経路の制約を緩めたスコアの推定値を求め、２回目は１回目で求めたスコアの推定値に基づいて認識単位の完全な系列の中で最もスコアの高い認識単位の系列を探索して求め、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、その多次元の入力ベクトルに対して評価値最大の分岐確率密度のベストファースト探索を次元方向に実行し、その評価値最大をとる要素分布の確率はベストファースト探索の結果として計算された確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して、認識結果を決定するように構成したので、Ａ＊条件を満足するＨＭＭの出力確率を少ない演算量で求めることができる効果がある。
【０２８２】
この発明によれば、多次元の入力ベクトルに対して、複数の多次元の連続分布からなる混合連続確率分布に基づいて認識単位のスコアを計算し、１回目は認識単位の系列の中で経路の制約を緩めたスコアの推定値を求め、２回目は１回目で求めたスコアの推定値に基づいて認識単位の完全な系列の中で最もスコアの高い認識単位の系列を探索して求め、１回目の認識単位の系列の中で経路の制約を緩めたスコアの推定値を計算する事に際して、その多次元の入力ベクトルに対して、評価値の大きい方からＮ個の分岐確率密度のベストファースト探索を次元方向に実行し、その評価値の大きい方からＮ個の要素分布の確率はベストファースト探索の結果として計算された評価値の大きい方からＮ個の要素分布の確率を用い、その他の要素分布の確率についてはベストファースト探索の途中次元までの評価値を用いて認識単位のスコア計算を実行して、認識結果を決定するように構成したので、マルチパスの１回目で得られる経路尤度の推定値の精度が向上する結果、マルチパスの２回目のベストファースト探索による経路探索に要する演算回数が減少する効果がある。
【図面の簡単な説明】
【図１】この発明の実施の形態１による音声認識装置を示す構成図である。
【図２】状態ＨＭＭの構造を示す説明図である。
【図３】探索で用いる機能の構成図である。
【図４】ヒープを用いた仮説記憶部の説明図である。
【図５】式（２）の計算を次元方向に探索を進めるベストファースト探索のフローチャートである。
【図６】探索終了時点のスタックに残された仮説の様子をそれらの展開履歴と共に示すグラフ図である。
【図７】探索のためのフローチャートである。
【図８】探索のための機能ブロック図である。
【図９】自由度ｎのカイ２乗分布の分布関数のテーブルの一部分を示す表図である。
【図１０】次元数Ｋの入力ベクトルに対して、評価値の大きい方からＳ個の要素分布を決定するためのベストファースト探索のフローチャートである。
【図１１】一次元の分布の分類のフローチャートである。
【図１２】数直線上の平均値と分割の様子を示す説明図である。
【図１３】探索のフローチャートである。
【図１４】探索のための機能ブロック図である。
【図１５】ベストファースト探索のフローチャートである。
【図１６】カイ２乗分布の分布関数が一定の数値に到達するときの値に基づいて閾値を決める際の考え方を説明する説明図である。
【図１７】探索のフローチャートである。
【図１８】ベストファースト探索のフローチャートである。
【図１９】ベクトル量子化を用いた符号化・復号化の構成図である。
【図２０】ベクトル量子化におけるコードベクトルをベストファースト探索で求めるときのフローチャートである。
【図２１】文字認識装置の構成図である。
【図２２】カテゴリ探索部の動作を示すフローチャートである。
【図２３】フレーム処理を示すフローチャートである。
【図２４】探索のフローチャートである。
【図２５】探索のフローチャートである。
【図２６】音声認識装置の構成図である。
【図２７】混合連続分布ＨＭＭを用いる音声認識装置を示す構成図である。
【図２８】混合連続分布ＨＭＭの説明図である。
【図２９】経路演算の説明図である。
【符号の説明】
２音声分析手段、４確率演算手段（スコア計算手段）、５経路演算手段（スコア計算手段）、６最適経路決定手段（認識結果選択手段）。[0001]
BACKGROUND OF THE INVENTION
The present invention determines, for example, an element distribution search method, a vector quantization method, a pattern recognition method, a speech recognition method, a speech recognition apparatus, and a recognition result that can reduce the amount of computation of a multidimensional continuous probability distribution. The present invention relates to a recording medium on which a program for recording is recorded.
[0002]
[Prior art]
FIG. 27 is a configuration diagram showing a speech recognition apparatus using a mixed continuous distribution HMM. In FIG. 27, 101 is an input speech, 102 is a speech analysis of the input speech 101, and outputs a K-dimensional input feature vector 103. , 103 is an input feature vector, 104 is a probability calculation unit that executes a probability calculation of the phoneme HMM represented by the mixed continuous distribution HMM 108, 105 is a path calculation unit that executes a route calculation, and 106 is an optimum that determines an optimum route A route determination unit 107 is a recognition result indicating an optimum route, 108 is a mixed continuous distribution HMM, and 109 is route data.
FIG. 28 is an explanatory diagram of the mixed continuous distribution HMM, and FIG. 29 is an explanatory diagram of the path calculation.
[0003]
Next, the operation will be described.
For the K-dimensional input feature vector 103 obtained by the speech analysis unit 102, the probability calculation unit 104 executes the probability calculation of the phoneme HMM represented by the mixed distribution of M K-dimensional normal probability density functions, and determines the optimum path. The unit 106 performs a Viterbi operation according to the word connection condition for each frame to obtain an optimum route. A word sequence is obtained by searching the optimal route backward.
The probability calculation for the feature vector x (t) at time t is performed based on the following equation. In the following expression, λ (m) indicates the branch probability of the mth element distribution, and f (x (t), m) indicates the branch density of the mth element distribution.
[0004]
[Expression 1]

[0005]
Further, the branch density f (x (t), m) of the element distribution is calculated as the following equation for an uncorrelated normal distribution having no correlation between dimensions.
Where x (t, k) is the value of the kth element of the input vector x (t), μ (m, k) is the average value of the kth dimension of the mth element distribution, and σ (m , K) is the variance (ie, the square of the standard deviation) in the kth dimension of the mth element distribution.
In general, the shoulder is squared, but in this specification, the shoulder is not squared and the variance is expressed.
[0006]
[Expression 2]

[0007]
In a large vocabulary speech recognition apparatus, a phoneme HMM is used.
Moreover, since an environment-dependent phoneme model that depends on the preceding and following phonemes is used as the phoneme, the total number of element distributions included in the mixture distribution becomes considerably large.
For example, if each state of the environment-dependent phoneme model is expressed by a 2000-state HMM and each state has 32 mixed distributions, the total distribution has 64000 elements.
[0008]
If these are calculated every 10 milliseconds of the frame period, it is necessary to calculate 640000 element distributions per second. In order to process the mixed distribution calculation in real time, one mixed distribution is processed in about 156 nanoseconds. Although necessary, this is difficult to achieve with current general-purpose processors.
For this reason, techniques for reducing the amount of calculation described below have been proposed.
[0009]
・ Preliminary selection method:
For reducing the amount of calculation of the mixed continuous distribution HMM, the following preliminary selection method has been proposed.
[0010]
Iwasaki et al. In the literature ("Computation amount reduction method for continuous speech recognition of unspecified speakers using mixed continuous distribution HMM", Proc. Of the Acoustical Society of Japan, 3-5-4, October 1991). Each phoneme HMM is preliminarily selected based on the output probability of a single distribution, and only the selected phoneme HMM is subjected to exact mixed distribution calculation.
[0011]
Bocchieri et al., In the literature ("Vector quantization for the effective computation of continuous density likelihoods", ICASSP93 Proceedings, II-692-II-695) The output probability was calculated after pre-selecting a cluster of mixed Gaussian distribution using distance information with quantization vector.
[0012]
Digalakis et al., “Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Spe. Re. 28”, IEEE Transactions on Spe. The method of Bocchieri et al. Was applied to a speech recognition system used for a generalized phoneme HMM, and a cluster of a mixed Gaussian distribution was preselected for each generalized phoneme HMM to calculate an output probability.
[0013]
Watanabe et al. In the literature ("Speech recognition using tree structure probability distribution", Acoustical Society of Japan Proceedings, 1-8-7, October, 1993, and Japanese Patent Laid-Open No. 6-348292), Gauss. A tree structure was introduced in the pre-selection of distribution clusters and the output probability calculation of mixed Gaussian distributions. Top-down clustering is applied to create the tree structure.
[0014]
Komori et al. In the literature ("Acceleration of continuous HMM output probability calculation using Rough HMM and Detail HMM", Proc. Of the Acoustical Society of Japan, 1-Q-20, March 1995) After preselecting the state of the HMM that seems to be high using a minority distribution HMM, the output probability of the majority distribution is recalculated.
[0015]
Nakagawa et al., Published for all code vectors in the literature ("Shortening Method for Output Probability Calculation of Continuous Output Distributed HMM", Acoustical Society of Japan Proceedings, 1-Q-22, March 1995). Probabilities are calculated in advance and tabulated, the output probability value corresponding to the code vector closest to the input vector is tabulated, and recalculated using the conventional method only when the probability is large as necessary. The output probability is obtained.
[0016]
In any of the above methods, the total number of distributions actually calculated is reduced. However, the above-described method based on the preliminary selection has the following problems.
In the method of Iwasaki et al., It is necessary to prepare a single distribution covering the mixed distribution in addition to the mixed distribution.
Watanabe et al.'S method needs to be designed so that the element distribution has a tree structure so that the distribution close to the root of the tree covers the distribution close to the leaves of the tree. For this reason, it cannot be applied to the calculation reduction of an existing mixed distribution having an arbitrary mixed distribution.
[0017]
In the method of Komori et al., In addition to the conventional storage area for the mixed distribution, a storage area for the rough mixed distribution is prepared, and the calculation of the rough mixed distribution is required as preprocessing.
The method of Bocchieri, Digigalakis, and Nakagawa et al. Requires an additional storage area for a codebook for vector quantization and a vector quantization operation for determining a code closest to the input vector. .
[0018]
By the way, vector quantization is attracting attention as a high-efficiency encoding method for speech and images as a multi-dimensional signal quantization method that does not individually quantize a plurality of signals but expresses them as one code.
Vector quantization is a method of obtaining a code vector closest to an input vector and representing the input vector with the code. Therefore, it is necessary to calculate the distance between the input vector and all code vectors.
Since the coding error due to vector quantization is reduced, the codebook size and code vector search time increase exponentially, making hardware implementation difficult.
[0019]
There are the following three methods for reducing the amount of vector quantization computation.
In the first method, a code vector is designed so that a binary tree search is possible, and a code vector having a minimum distance with respect to an input vector is obtained by a search using the binary tree.
[0020]
Watanabe et al., In the literature ("Design Method of Binary Tree Search Codebook Using Principal Component Analysis", Speech Study Group Material SP87-129, February 1987), Design of Codebook that Enables Binary Tree Search Indicates the law. According to the binary tree search, the number of code vectors compared to the input vector is log₂ (N) order, and the number of comparisons can be greatly reduced.
[0021]
The second method is to perform a binary tree search for a set of code vectors having an arbitrary structure.
Cheng et al. In the literature (“A Fast codebook search algorithm for nearest-neighbor pattern matching”, ICASSP86 Proceedings, pages 265-268, 1986), the entire code vector consisting of an arbitrary code vector. We propose a method to search for the code vector closest to the input vector by starting from the set and repeating the bipartition processing of the code vector set by the hyperplane sequentially according to the binary tree until the code vector set consisting of one element is obtained. ing.
[0022]
The hyperplane associated with node n is stored as coefficient vector u (n) and constant vector c (n) for input vector x (t).
For the input vector x (t), the following relational expression is calculated at each node. If the relation (A) is satisfied, the process proceeds to the node on the left branch, and the nearest neighbor code of the input vector is the left subset. Ω_L It belongs to. If the relationship (B) is satisfied, the process proceeds to the right branch node, and the nearest neighbor code of the input vector is the right subset Ω._R It belongs to.
[0023]
[Equation 3]

[0024]
The third method uses a general search method.
Maruta et al., In the literature ("Examination of high-speed distance calculation in character recognition", Proceedings of the 1997 IEICE General Conference, page 310), a category that gives a minimum distance while searching for a feature space (in the code vector) Equivalent).
Each category is given as a template vector.
In this method, based on an initial category given in some way, a category list of neighborhoods created in advance for each category is referred to, and a category that gives the minimum distance among the neighborhood categories of the current category Is selected, this category is newly set as a seed category, and the search is repeated again.
[0025]
The above vector quantization method has the following problems.
In the first method, it is necessary to design a code vector that allows binary tree search.
In the second method, the multidimensional space is divided one after another on the hyperplane, but it is generally difficult to divide the multidimensional space equally well on the hyperplane, and finally one element distribution is obtained. A lot of judgments are required to reach this, and there is almost no effect of reducing the amount of calculation. Also, a considerable amount of memory is required to store information for determination.
In the third method, a method for providing an initial vector is not disclosed. In addition, the search result may fall into a local minimum, and there is no guarantee that it is the optimal solution. For this reason, the accuracy of vector quantization may be deteriorated. Also, when used for pattern recognition, there is a possibility of degradation of recognition accuracy.
[0026]
・ Method of reducing mixed distribution calculation by sharing:
As a method for reducing the amount of calculation for mixed distribution or vector quantization, there is a method for reducing the amount of calculation by sharing the HMM part in addition to the above-described “method by preliminary selection”.
[0027]
Komori et al., In the literature (Japanese Patent Laid-Open No. 7-261788), in order to reduce the amount of computation at the time of recognition, among a plurality of HMMs, a class using a HMM in which related information of the HMM is the same value. A method for recognizing speech is shown for each.
Takahashi et al., Literature (“Effect of sharing shared values in a four-layer shared phoneme model”, Acoustical Society of Japan, 1-Q-23, March 1995) and literature (Japanese Patent Laid-Open No. 8-248986). Gazette) reduces the amount of computation of the mixed distribution of HMM by sharing the normal distribution that exists in each dimension of the multi-dimensional normal distribution of each state of HMM with the same average value and variance value. Shows how to do.
[0028]
The above-described method for reducing the amount of calculation by sharing the HMM portion has the following problems.
If the sharing is advanced, the calculation time is reduced, but the resolution of the HMM is lowered and the recognition accuracy is lowered.
In addition, all the HMMs that have been shared must be operated, and there is a limit in reducing the amount of calculation.
[0029]
The conventional technique related to the reduction of the calculation amount of the HMM mixture distribution has been described above.
Next, a conventional technique related to a reduction in the amount of calculation for route search will be described.
As a method for reducing the amount of calculation for route search, there is a method of performing speech recognition by performing beam search that limits the number of hypotheses in synchronization with a speech frame.
In the first pass, there is a multipath speech recognition method for obtaining an estimated value of a route score along a loosely restricted route, and searching for an optimum route based on the estimated value of the route score in the second pass. is there.
[0030]
Noda et al. In the beam search in the literature ("Study on HMM-LR speech recognition by beam search using forward-looking heuristic function", Acoustical Society of Japan, 3-8-16, October 1994). Introducing forward-looking heuristics to reduce route search operations.
[0031]
Kato et al. In the literature ("Study of Estimated Score Setting Method in Viterbi best-first Search for Word Speech Recognition Using Continuously Distributed HMM", Proc. Of the Acoustical Society of Japan, 3-8-15, October 1994). In multi-pass speech recognition, the maximum branch density and maximum route score of the mixed distribution are obtained as estimated scores in the first pass, and the best first search is performed in the second pass, thereby reducing the amount of computation required for route search. is doing.
[0032]
As the maximum route score to be obtained in the first pass, a method of obtaining within a phoneme, a word, or all words is shown.
Note that the best-first search calculates an evaluation value based on the estimated value of the current score and the future score when developing a hypothesis (which is called a partial hypothesis because it is in the middle of the search), and this evaluation value is the maximum. This is a method of preferentially developing the hypothesis and proceeding with the search.
[0033]
Here, it is said that the A * condition is satisfied when the estimated score value is greater than or equal to the true score. In the best first search, a search using an estimated value that satisfies the A * condition is called an A * search.
In the A * search, it is known that the solution found at the beginning of the search matches the optimal solution in the full search.
[0034]
The above method has the following problems.
In the method of Noda et al., The accuracy deteriorates when the beam width is lowered below a certain value.
In the method of Kato et al., Since the A * search condition is satisfied, there is no deterioration in accuracy. However, when obtaining an estimated score in the first pass, it is necessary to obtain an element distribution having the maximum branch density in the mixed distribution. In practice, when applied to speech recognition of large vocabulary, the amount of computation for this is not negligible.
[0035]
[Problems to be solved by the invention]
Since the conventional speech recognition apparatus is configured as described above, it requires a vector quantization operation as preprocessing, and it is necessary to use a tree-structured distribution as a mixed distribution. Therefore, since another mixture distribution is necessary, the amount of calculation becomes enormous, and there is a problem that a long time is required for the speech recognition processing.
In addition, there is a problem that recognition accuracy is reduced due to sharing, and there is also a problem that the amount of calculation of the estimated score becomes so large that it cannot be ignored in multi-pass speech recognition processing.
[0036]
The present invention has been made in order to solve the above-described problems. An element distribution search method, a vector quantization method, a pattern recognition method, a speech recognition method, a speech recognition device, and a recognition that can reduce the amount of calculation. An object is to obtain a recording medium on which a program for determining a result is recorded.
[0037]
[Means for Solving the Problems]
In the element distribution search method according to the present invention, the best-first search is executed in the dimension direction until the number of developed dimensions matches the number of dimensions of the input vector.
[0038]
In the element distribution search method according to the present invention, the best-first search is executed in the direction in which the number of developed dimensions increases.
[0039]
In the element distribution search method according to the present invention, the estimated value of the probability value in the (k + 1) -th dimension to the K-th dimension is calculated, and the distribution function value of the chi-square distribution with degrees of freedom K−k reaches a predetermined value. The calculation is based on the independent variable value of the chi-square distribution.
[0040]
In the element distribution search method according to the present invention, the estimated value of the cumulative probability value in the (k + 1) -th dimension to the K-th dimension is calculated based on the distribution of values indicated by the learning speech data. .
[0041]
In the element distribution search method according to the present invention, the element distribution is divided into a plurality of clusters, and the learning speech data indicates the estimated value of the probability value in the (k + 1) -th dimension to the K-th dimension for each cluster. The calculation is based on the distribution of values.
[0042]
In the element distribution search method according to the present invention, a plurality of solutions are obtained from the beginning by a best-first search, and an element distribution having a plurality of higher evaluation values is obtained from the plurality of solutions. It is.
[0043]
In the element distribution search method according to the present invention, the cumulative value of the probability value in the (k + 1) -th dimension to the K-th dimension, the one-dimensional distribution of each dimension in each element distribution on the number line of the average value of each distribution Are divided into a plurality of sections based on the arrangement in Fig. 5, and calculated based on the inequality for each dimension obtained for each section and the numerical value of each dimension of the input vector.
[0044]
In the element distribution search method according to the present invention, an estimated value of cumulative probability values in the (k + 1) -th dimension to the K-th dimension is obtained by dividing the element distribution into a plurality of clusters, and obtaining each of the plurality of clusters for each dimension. The calculation is performed based on the obtained inequality and the numerical value of each dimension of the input vector.
[0045]
In the element distribution search method according to the present invention, a search threshold value is provided, and when the cumulative value or evaluation value of the search up to the k-th dimension falls below the threshold value, the search for the hypothesis is terminated. .
[0046]
The element distribution search method according to the present invention provides a search threshold value depending on the element distribution, and terminates the search for the hypothesis when the accumulated value or evaluation value of the search up to the k-th dimension is lower than the threshold value. It is what I did.
[0047]
In the element distribution search method according to the present invention, as the search threshold, the number of dimensions of the input vector is K, and the independent variable when the distribution function of the chi-square distribution with K−k degrees of freedom reaches a constant value. A value calculated based on the value is used.
[0048]
The element distribution search method according to the present invention performs a best-first search for increasing the number of dimensions by two or more when the number of developed dimensions is increased.
[0051]
  The speech recognition method according to the present invention performs a best-first search for an element distribution having the maximum evaluation value for a multidimensional input vector.DimensionallyThe probability of the element probability distribution that is executed and takes the maximum evaluation value is recognized using the probability calculated as a result of the best-first search, and the other element probability distribution probabilities are recognized using the evaluation values up to the middle dimension of the best-first search. The unit score calculation is executed to determine the recognition result.
[0052]
  The speech recognition method according to the present invention performs a best-first search of N branch probability densities from the one with the larger evaluation value for a multidimensional input vector.DimensionallyThe probability of the element probability distribution having N branch density from the largest evaluation value is executed using the probabilities of N element distributions from the largest probability calculated as a result of the best-first search, and other element distributions. For the probability of, the recognition result is determined by executing the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best-first search.
[0053]
  The speech recognition method according to the present invention calculates an element distribution having the maximum evaluation value with respect to a multidimensional input vector when calculating an estimated value of a score in which a restriction on a path is relaxed in the first recognition unit sequence. The best first searchDimensionallyThe probability of the element distribution with the maximum evaluation value is the probability calculated as a result of the best-first search, and the probabilities of other element distributions are evaluated using the evaluation values up to the middle dimension of the best-first search. The score calculation is executed to determine the recognition result.
[0054]
  The speech recognition method according to the present invention calculates a score estimate value obtained by loosening the path constraint in the first recognition unit sequence, from the one having a larger evaluation value with respect to a multidimensional input vector. Best-first search for N branch probability densitiesDimensionallyThe probability of the element distribution having N branch density from the largest evaluation value is executed using the probability of the N element distribution from the largest evaluation value calculated as a result of the best-first search, and other elements. As for the distribution probability, the recognition result is determined by executing the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best-first search.
[0055]
  The speech recognition apparatus according to the present invention performs a best-first search for an element distribution having the maximum evaluation value for a multidimensional input vector.DimensionallyThe probability of the element distribution that takes the maximum evaluation value is executed using the probability calculated as a result of the best-first search, and the other element distribution probabilities are evaluated using the evaluation values up to the middle dimension of the best-first search. The score is calculated.
[0056]
  The speech recognition apparatus according to the present invention performs a best-first search of N element distributions in descending order of evaluation values for a multidimensional input vector.DimensionallyThe probability of N element distributions from the largest evaluation value is used as the probability of N element distributions from the largest evaluation value calculated as a result of the best-first search. Is to calculate the score of the recognition unit using the evaluation values up to the middle dimension of the best-first search.
[0057]
  The speech recognition apparatus according to the present invention calculates an element distribution having a maximum evaluation value with respect to a multidimensional input vector when calculating an estimated value of a score in which a path restriction is relaxed in a first recognition unit sequence. The best first searchDimensionallyUse the probability calculated as a result of the best-first search for the probability of the element distribution that is executed and take the maximum evaluation value, and use the evaluation value up to the middle dimension of the best-first search for the probability of the other element distribution. The score calculation is executed to determine the recognition result.
[0058]
  The speech recognition apparatus according to the present invention calculates an estimated value of a relaxed path in the first recognition unit sequence from the one with the larger evaluation value for a multidimensional input vector. Best-first search of N element distributionsDimensionallyThe probability of N element distributions from the largest evaluation value is used as the probability of N element distributions from the largest evaluation value calculated as a result of the best-first search. Uses the evaluation value up to the middle dimension of the best-first search to calculate the recognition unit score and determine the recognition result.
[0059]
  A recording medium on which a program for determining a recognition result according to the present invention is recorded is a recognition unit expressed by a mixed continuous probability distribution model composed of a plurality of multidimensional continuous distributions with respect to a multidimensional input vector. Best-first search of element distribution with maximum evaluation valueDimensionallyThe probability of the element distribution that takes the maximum evaluation value is used as the probability of the element distribution calculated as a result of the best-first search, and the evaluation value up to the middle dimension of the best-first search is used for the other element probability distribution probabilities. Using this, the score calculation of the recognition unit is executed, and based on the result of the score calculation, the recognition unit sequence having the highest score is selected from the recognition unit sequence.
[0060]
  A recording medium on which a program for determining a recognition result according to the present invention is recorded is a recognition unit expressed by a mixed continuous probability distribution model composed of a plurality of multidimensional continuous distributions with respect to a multidimensional input vector. The best first search of N element distribution from the one with the highest evaluation valueDimensionallyThe probability of the element distribution that takes N evaluation values from the larger evaluation value is executed using the probability of the N element distributions from the largest evaluation value calculated as a result of the best-first search, and other elements For the probability of distribution, execute the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best first search, and based on the result of the score calculation, the sequence of the recognition unit with the highest score among the recognition unit series Is to be selected.
[0061]
  A recording medium on which a program for determining a recognition result according to the present invention is recorded has a score of a recognition unit based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions with respect to a multidimensional input vector. The score is calculated to calculate the estimated value, and the first time obtains the estimated value of the relaxed path constraint in the series of recognition units, and the second time determines the recognition unit based on the estimated score value obtained in the first time. When searching for the sequence of the recognition unit with the highest score in the complete sequence and calculating the estimated value of the score that relaxed the path constraint in the sequence of the first recognition unit, Best-first search for branch probability density with maximum evaluation value for input vectorDimensionallyUse the probability calculated as a result of the best-first search for the probability of the element distribution that is executed and take the maximum evaluation value, and use the evaluation value up to the middle dimension of the best-first search for the probability of the other element distribution. The score calculation is executed to determine the recognition result.
[0062]
  A recording medium on which a program for determining a recognition result according to the present invention is recorded has a recognition unit score for a multidimensional input vector based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions. Calculate the first time to obtain the estimated value of the relaxed path in the sequence of recognition units, and the second time in the complete sequence of recognition units based on the estimated score obtained in the first time When searching for a sequence of recognition units with the highest score and calculating an estimated value of a relaxed path in the sequence of recognition units for the first time, for the multidimensional input vector, Best-first search of N branch probability densities from the highest evaluation valueDimensionallyThe probability of N element distributions from the largest evaluation value is used as the probability of N element distributions from the largest evaluation value calculated as a result of the best-first search. Is a score calculation of recognition units using evaluation values up to the middle dimension of the best-first search to determine the recognition result.
[0063]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described below.
Embodiment 1 FIG.
FIG. 1 is a block diagram showing a speech recognition apparatus according to Embodiment 1 of the present invention. In the figure, 1 is input speech, 2 is speech analysis means for analyzing input speech 1 and converting it into

feature vector

3, and 4 is Probability calculation means (score calculation means) for calculating the probability of the state HMM of the mixed continuous distribution HMM stored in the HMM storage means 8 for the

feature vector

3, and 5 is a state HMM according to the route data in the route data storage means 9. A route calculation means (score calculation means) for determining a route having the highest score in the state HMM sequence leading to that state in each state of the route, comprising phonemes and words from the connection of Searches for the route determined by the route calculation means 5 backward from the end of the speech, obtains a sequence of state HMMs along the optimum route, and outputs a recognition result 7 for the input speech 1. A means (recognition result selection means).
[0064]
Next, the operation will be described.
The period of speech analysis is 10 ms, and the feature vector 3 is c (0), c (1), c (2),..., C (12) obtained as a result of the mel frequency axis cepstrum analysis of analysis order 12. , .DELTA.c (0), .DELTA.c (1), .DELTA.c (2),..., .DELTA.c (12), which are time change rates, are 26-dimensional vectors.
Hereinafter, the dimension of the feature vector 3 is assumed to be K.
[0065]
The HMM storage unit 8 stores a state HMM. FIG. 2 shows the structure of the state HMM. The state HMM has a two-state one-loop structure.
The output probability distribution associated with the state transition is a mixed continuous distribution in which a plurality (16 in the present embodiment) of continuous distributions are element distributions.
In addition, the output probability distribution associated with each state transition has a so-called knot relationship and is common.
[0066]
There are 2000 state HMMs in total. The phoneme HMM is a phoneme environment-dependent type, and is configured as a three-state HMM by connecting three of these state HMMs.
The word HMM is further configured by concatenating the three-state phoneme HMMs configured as described above for the number of phonemes according to the phoneme notation of the word.
[0067]
Strictly speaking, the output probability of a certain state HMM with respect to the input vector x (t) at time t is obtained by the following equation. However, M (j) is the number of element distributions included in the mixed continuous distribution in the jth state, and is 16 in all states in this embodiment.
In the following, since the mixed distribution of the state j will be described, the index j is omitted, M (j) is abbreviated as M, λ (j, m) is abbreviated as λ (m), and the like.
[0068]
[Expression 4]

[0069]
Further, f (x (t), m) is a branch density (density function of continuous distribution) of the m-th element distribution, and is given by the following expression in this embodiment that is an uncorrelated normal distribution.
Where x (t, k) is the value of the kth element of the input vector x (t), μ (m, k) is the average value of the kth dimension of the mth element distribution, and σ (m , K) is the variance of the kth dimension of the mth element distribution.
[0070]
[Equation 5]

[0071]
The above output probability is calculated as the sum of the branch density. In this embodiment, as an approximate expression of the output probability calculation, the branch probability λ (m) and the branch density f (x ( The maximum value of the product λ (m) f (x (t), m) (hereinafter referred to as branch output probability) of t) and m) is used.
It has been confirmed by a recognition experiment that the accuracy degradation is small when the output probability (hereinafter referred to as the maximum branch output probability) by this approximate expression is used.
[0072]
[Formula 6]

[0073]
In the present embodiment, the maximum output probability b ′ (x (t)) is calculated by applying the following best-first search.
[0074]
Hereinafter, this method will be described.
First, taking the logarithm of the branch output probability λ (m) f (x (t), m), the following transformation can be made from the equation (1).
[0075]
[Expression 7]

[0076]
Here, since A (m) and B (m) do not depend on the input vector, they are calculated in advance. Obtaining the maximum output probability b ′ (x (t)) from the monotonic increase of the logarithmic function is equivalent to obtaining the maximum value for m in Equation (2).
Therefore, if an element distribution that maximizes Equation (2) is obtained, Equation (1) can be calculated as the value of the exponential function (exp (x)) of that value.
[0077]
Next, a search method for obtaining an element distribution that maximizes Expression (2) will be described.
FIG. 3 is a block diagram of functions used in this search. The hypothesis development control unit 2001 extracts hypotheses stored in a hypothesis storage unit (also called a stack) 2002 and performs a development process.
[0078]
The hypotheses stored in the hypothesis storage unit 2002 are the distribution number m, the calculated number of dimensions k, and the cumulative value g (m, m) of the one-dimensional distribution probability of each dimension from the first dimension to the k-th dimension. k).
The accumulated value g (m, k) is a value when K in C (x (t), m, k) in Equation (2) is k,
g (m, k) = A (m) + B (m) + C (x (t), m, k)
It is.
[0079]
Hypotheses in the middle of the search are stored in the stack.
A stack can be created as a complete binary tree called a heap. FIG. 4 is an explanatory diagram of a hypothesis storage unit using a heap. The heap is realized as a one-dimensional array A of memories. The index of array A is generally n and 2^* n, 2^* If n + 1, then A [n] ≧ max (A [2^* n], A [2^* n + 1]).
[0080]
By managing the hypothesis in the heap, the hypothesis having the maximum value is always at the top A [1] of the array A, and therefore the maximum value can be extracted very quickly.
When a hypothesis is stored in the heap, the above magnitude relationship A is exchanged while tracing the binary tree from the root of the heap (that is, the element A [1] of the array) toward the leaf and exchanging the elements of the array. [N] ≧ max (A [2^* n], A [2^* n + 1]).
This processing requires only an amount of calculation (O (2log2N)) twice as large as the logarithm of the hypothetical number 2, and is performed at a very high speed.
[0081]
FIG. 5 is a flowchart of the best-first search in which the calculation of Expression (2) is advanced in the dimension direction.
Next, each processing step will be described.
In step ST1, initial hypotheses are generated for all element distributions (first to M (j) distributions) of the j-th state mixed continuous distribution. The generated hypothesis is stored in the hypothesis storage unit 2002.
Here, the initial hypothesis of the m-th element distribution is that the distribution number is m, the calculated number of dimensions is 0, and the cumulative value g (m, 0) of the one-dimensional distribution of each dimension.
g (m, 0) = A (m) + B (m)
And
[0082]
In step ST2, the hypothesis having the maximum cumulative value is extracted from the hypothesis storage unit 2002. The hypothetical element distribution number, the number of developed dimensions, and the accumulated value will be described below as m, k, and g (m, k), respectively.
[0083]
In step ST3, it is checked whether or not the calculated number of dimensions k has reached the final dimension K (26 in this embodiment). If the final dimension has been reached (when k = 26), the best first is obtained. Since this hypothesis corresponds to the element distribution with the maximum probability according to the search theory, the process proceeds to step ST7 in order to process the solution using the hypothesis as the searched solution.
If the final dimension has not been reached (when k <26), the process proceeds to step ST4.
[0084]
In step ST4, the number of dimensions is increased by 1 to k + 1.
In step ST5, the cumulative value up to the dimension number k + 1 is calculated (however, the dimension number k is described as a value before incrementing by 1 in step ST4. Note that the same applies to the following embodiments).
This cumulative value is obtained by adding a logarithmic probability value (hereinafter referred to as likelihood) of a one-dimensional continuous distribution of dimension k + 1 to a cumulative value g (m, k) up to dimension k to obtain g (m, k + 1). Calculate as follows: However, the constant term− (½) log (2πσ (m, k)) included in the logarithmic probability value of the one-dimensional continuous distribution has already been added in the constant term B (m) in step ST1. .
[0085]
[Equation 8]

[0086]
By the way, since the variance σ (m, k) is a positive value and is also a positive value because the numerator is also squared, the accumulated value g (m, k + 1) is equal to g (m, k). The accumulated value g (m, k) is a monotonically non-increasing function with respect to the dimension number k.
[0087]
In step ST6, a hypothesis is created with the element distribution number, the number of developed dimensions, and the cumulative value of this hypothesis set as m, k + 1, g (m, k + 1), respectively, and stored in the hypothesis storage unit 2002. Return to.
The processing in step ST2 is as described above, and the above processing is repeated until a solution is found in step ST3.
[0088]
In step ST7, the hypothesis is stored as a solution, and the process ends.
Now, for the hypothesis obtained as a solution in step ST7, the element distribution number, the number of developed dimensions, and the accumulated value are m, K, and g (m, K), respectively.
g (m, K) is the sum of the likelihoods of each dimension as in the following equation, is the sum of the likelihoods from the first dimension to the Kth dimension of the element distribution m, and M (j) mixed distributions Among them, it is the maximum likelihood.
g (m, K) = A (m) + B (m) + C (x (t), m, K)
[0089]
Thereby, the element distribution having the maximum likelihood is obtained as the element distribution number m of the hypothesis obtained as a solution.
The log probability is obtained as a cumulative value of a hypothesis obtained as a solution.
[0090]
FIG. 6 is a graph showing the hypotheses remaining in the stack at the end of the search together with their development history.
In this figure, □ indicates the development history of the hypothesis with the maximum cumulative value of the likelihood of each dimension expanded to the final dimension K. Further, the development history of the hypothesis in which the circles and the crosses are expanded to the middle dimension is shown.
[0091]
As can be seen from this figure, the one-dimensional likelihood calculation from the intermediate dimension to the final dimension is not performed for the hypothesis remaining after calculation of the intermediate dimension without being developed on the stack.
Therefore, in the method of the present embodiment, the number of computations is not expanded on the stack, and the one-dimensional likelihood calculation does not have to be performed for the remaining number of dimensions to be expanded of the element distribution, and all elements Compared with the conventional full search method for obtaining the element distribution with the maximum probability after obtaining the distribution probability, the amount of calculation is reduced.
[0092]
Therefore, since the best first search is used in this embodiment, the maximum likelihood branch probability density can be obtained by reducing the number of one-dimensional likelihood calculation operations compared to the full search. There is an effect.
Further, there is an effect that it is not necessary to use a probability distribution other than the vector quantization and the element distribution included in the original mixed distribution.
Compared to general search methods that are likely to fall into the local minimum, the best-first search always obtains the element distribution with the maximum probability as the first solution, so the accuracy of calculating the maximum branch output probability is improved. There is an effect that there is no sacrifice.
[0093]
Embodiment 2. FIG.
In the first embodiment, with respect to a multi-dimensional input vector having a number of dimensions K, the element distribution having the maximum cumulative value from the first dimension to the k-th dimension of the probability value of the one-dimensional distribution up to the k-th dimension. We used a best-first search that expands the hypothesis in the dimensional direction.
In this case, information on the estimated value of the cumulative value from the future k + 1th dimension to the Kth dimension, which is the number of dimensions of the input vector, is not used.
[0094]
Therefore, in the present embodiment, the best first search required until a solution is obtained by adding an estimated value of the cumulative value from the (k + 1) th dimension to the Kth dimension to the cumulative value from the first dimension to the kth dimension. Reduce the number of times the hypothesis is deployed.
[0095]
FIG. 7 is a flowchart for searching, and FIG. 8 is a functional block diagram for searching. The hypothesis development control unit 2001 extracts hypotheses stored in the hypothesis storage unit 2002 and performs a development process.
The hypotheses stored in the hypothesis storage unit 2002 are respectively the element distribution number m, the calculated number of dimensions k, and the cumulative value g of the one-dimensional distribution likelihood of each dimension from the first dimension to the k-th dimension g ( m, k) and evaluation value f (m, k).
[0096]
The cumulative value g (m, k) is a cumulative value of the likelihood of each dimension from the first dimension to the k-th dimension of the element distribution m, and C (x (t), m, k) in Expression (2). Is a value calculated by the following equation as a value when K is k.
g (m, k) = A (m) + B (m) + C (x (t), m, k)
The evaluation value f (m, k) is calculated by the following equation using the accumulated value g (m, k) and the estimated value h (m, k).
f (m, k) = g (m, k) + h (m, k)
[0097]
Here, the estimated value h (m, k) is an estimated value of the cumulative value of the likelihood of each dimension from the (k + 1) th dimension to the final Kth dimension of the element distribution m. Specific examples thereof will be described in Embodiment 3 and below, and therefore, detailed description thereof will be omitted, and an outline of the best-first search using the estimated values will be described here assuming that there is an estimated value.
Next, each processing step will be described.
[0098]
In step ST11, a feature vector x (t) at time t is given. The value of each dimension k (k = 1, 2,..., K) of this input vector is assumed to be x (t, k). In step ST12, an estimated value h (m, k) (k = 1, 2,..., K) of the cumulative value of the likelihood of each dimension from the (k + 1) th dimension to the final Kth dimension of the element distribution m is calculated. .
However, the estimated value of the final dimension K is h (m, K) = 0.
[0099]
In step ST1, initial hypotheses are generated for all element distributions (first to M (j) distributions) of the j-th state mixed distribution. The generated hypothesis is stored in the hypothesis storage unit 2002.
Here, the initial hypothesis of the mth distribution is that the element distribution number is m, the calculated number of dimensions is 0, and the cumulative value of the one-dimensional log probability (likelihood) of each dimension is g (m, 0). The evaluation value is f (m, 0).
However, as shown in the following equation, the accumulated value g (m, 0) is a constant term of the branch output probability of the element distribution, and the evaluation value f (m, 0) is the accumulated value g (m, 0) and the estimated value h. Based on (m, 0), the following equation is given.
g (m, 0) = A (m) + B (m)
f (m, 0) = g (m, 0) + h (m, 0)
[0100]
In step ST <b> 2, the hypothesis having the maximum evaluation value f (m, k) is extracted from the hypothesis storage unit 2002.
The hypothesis element distribution number, the number of developed dimensions, the cumulative value, and the evaluation value will be described below as m, k, g (m, k), and f (m, k), respectively.
[0101]
In step ST3, it is checked whether or not the calculated number of dimensions k has reached the final dimension K (= 26). If the final dimension has been reached (when k = 26), according to the theory of best-first search, Since this hypothesis corresponds to the element distribution with the maximum evaluation value, the process proceeds to step ST7 in order to process the hypothesis as a searched solution. If the final dimension has not been reached (when k <26), the process proceeds to step ST4.
[0102]
In step ST4, the number of dimensions is increased by 1 to k + 1.
In step ST5, first, an accumulated value g (m, k + 1) up to the dimension number k + 1 is calculated.
This cumulative value g (m, k + 1) is calculated by adding the likelihood of a one-dimensional continuous distribution of dimension k + 1 to the cumulative value g (m, k) up to dimension k as in the following equation. However, the constant term− (½) log (2πσ (m, k)) included in the logarithmic probability value of the one-dimensional continuous distribution has already been added in the constant term B (m) in step ST1. .
[0103]
[Equation 9]

[0104]
In addition, the estimated value calculation unit 2003 calculates an estimated value (h (m, k + 1)) of a cumulative value of one-dimensional distribution of each dimension from the (k + 1) th dimension to the Kth dimension.
Specific calculation examples of the estimated value will be described in the following several embodiments, and thus description thereof will be omitted.
Based on the cumulative likelihood g (m, k + 1) and the estimated value h (m, k + 1), an evaluation value f (m, k + 1) is calculated by the following equation.
f (m, k + 1) = g (m, k + 1) + h (m, k + 1)
[0105]
In step ST6, a hypothesis is created with the element distribution number, the number of developed dimensions, the accumulated value, and the evaluation value of this hypothesis as m, k + 1, g (m, k + 1), and f (m, k + 1), respectively. It memorize | stores in the memory | storage part 2002, and returns to step ST2.
The processing in step ST2 after returning is as described above.
[0106]
In step ST7, the hypothesis is stored as a solution, and the process ends.
In the following embodiments, specific examples of the estimated value h (m, k) will be described.
[0107]
Embodiment 3 FIG.
In the present embodiment, the estimated value h (m, k) of the likelihood cumulative value from the (k + 1) -th dimension to the final K-th dimension in the second embodiment is estimated based on statistical theory.
[0108]
The flowchart of the best first search for determining the element distribution with the maximum branch output probability with respect to the input vector of dimension number K is the same as that shown in FIG. That is, the sum of n random variables X1, X2,..., Xn according to an independent normal distribution with mean 0 and variance 1 as described in the literature (Morrison: “Multivariate Statistical Methods”, page 10). Is distributed according to a chi-square distribution with n degrees of freedom.
In general, when Xi follows independent mean μi and variance σi
[0109]
[Expression 10]

[0110]
Is distributed according to a chi-square distribution with n degrees of freedom.
By the way, the cumulative value of the one-dimensional likelihood from the (k + 1) -th dimension to the K-th dimension of a certain element distribution is as follows:
[0111]
## EQU11 ##

[0112]
Which formally matches the above formula.
Therefore, the cumulative value of the one-dimensional likelihood from the (k + 1) -th dimension to the K-th dimension is expected to be distributed according to the chi-square distribution with the degree of freedom K−k, except for the proportionality constant (−1/2 in this case). The
[0113]
FIG. 9 shows a part of a distribution function table of chi-square distribution with n degrees of freedom (the entire table is shown on page 366 of the above document).
From this table, for example, the probability that the value of the independent variable χ of the chi-square distribution with 10 degrees of freedom is 2.56 or less is 1%, the probability that it is 3.25 or less is 2.5%, 3.94 or less. It can be seen that the probability of
Therefore, when the remaining dimension is 10 dimensions, the probability that the cumulative value of the one-dimensional probability is 2.56 × proportional constant = 2.56 × (−½) = 1.28 or more is 1%. is there. Therefore, the cumulative value is expected to be smaller than -1.28 with a probability of 99%.
[0114]
By using this property, an estimated value h (m, k) of the cumulative value from the (k + 1) th dimension to the Kth dimension is given by the following equation.
h (m, k) = X (K−k, p) × (−1/2)
Here, X (n, p) is the value of the independent variable when the percentage point of the chi-square distribution with n degrees of freedom is p (for example, the 5% point of the chi-square distribution with 10 degrees of freedom is X (10, 0.05) = 3.94). This value is obtained by searching the table shown in FIG.
[0115]
As is apparent from this equation, the evaluation value h (m, k) of the present embodiment does not depend on the average value and the variance of each dimension of the element distribution, and thus is an evaluation value common to all element distributions. . For this reason, the calculation of the evaluation value can be realized by a table search using the developed dimension number k as a key.
[0116]
Disadvantageously, since there is p percent probability that the estimated value will be smaller than the actual cumulative value, the evaluation value based on this estimated value may be smaller than the true value and may not satisfy the A * condition. There is sex. For this reason, the first solution obtained by the best-first search that preferentially develops the hypothesis with the highest evaluation value does not necessarily provide an element distribution with the maximum branch output probability depending on the input vector. It can happen.
[0117]
Embodiment 4 FIG.
In the present embodiment, in the second embodiment, the estimated value h (m, k) of the cumulative value from the (k + 1) th dimension to the Kth dimension is used as the I samples x (i) (i) of the K-dimensional input vector. = 1, 2,..., I).
[0118]
The flowchart of the best first search for determining the element distribution having the maximum branch output probability for the input vector of dimension number K is the same as that shown in FIG.
The K-dimensional input vector sample x (i) is obtained by analyzing speech data for learning.
[0119]
Next, for each sample x (i) of the input vector, the cumulative value h (m, k,) of the one-dimensional likelihood of each dimension from the (k + 1) -th dimension to the K-th dimension when the m-th element distribution is used. x (i)) is obtained by the following equation.
[0120]
[Expression 12]

[0121]
Here, x (i, n) is the n-th value of the sample x (i).
The estimated value h (m, k) used for the search is obtained by the following equation using the cumulative value h (m, k, x (i)) obtained from these samples as the maximum value regarding the index m of the element distribution and the index i of the sample. .
[0122]
[Formula 13]

[0123]
Since the maximum value obtained from the actual input vector is used as the estimated value of the cumulative value from the (k + 1) th dimension to the Kth dimension, the possibility that the A * condition is not satisfied is reduced. As a result, there is an effect that search errors are reduced.
[0124]
Embodiment 5 FIG.
In the present embodiment, the estimated value h (m, k) of the cumulative value from the (k + 1) -th dimension to the K-th dimension is the value obtained from the sample of the K-dimensional input vector in the second embodiment. .
Further, the element distribution is classified into a plurality of clusters based on the similarity between the element distributions, and an estimated value is obtained for each cluster.
[0125]
The flowchart of the best first search for determining the element distribution with the maximum branch output probability with respect to the input vector of dimension number K is the same as that shown in FIG.
Next, creation of an estimated value will be described.
[0126]
First, the element distribution is classified into C clusters by an appropriate clustering method such as LBG algorithm or K-average method. Here, the K-average method is used.
In the K-means method, first, an element distribution that becomes C seeds is selected at random to obtain a representative distribution.
[0127]
Next, using the C representative distributions, based on the distance from the representative distribution, the total element distribution is classified by classifying each element distribution of the total element distribution as the representative distribution having the smallest distance.
The distance between distributions can be the Calbach divergence, the Chernoff distance, the Battacharia distance, or the like, but here, the distance between the distributions uses the Calbach divergence because it is easy to calculate.
[0128]
Next, for each of the C classifications, the representative distribution is selected again so that the average distance between the element distributions classified in the classification is minimized.
Further, the entire element distribution is reclassified using the selected representative distribution.
The above processing is repeated until the classification converges.
[0129]
Next, a sample of a K-dimensional input vector is obtained by analyzing learning speech data. This procedure is the same as in the fourth embodiment.
Next, for each sample x (i) (i = 1, 2,..., I, I sample number) of the input vector, the cumulative value h (m, k, x (i)) from the (k + 1) th dimension to the Kth dimension. Is obtained by the following equation.
[0130]
[Expression 14]

[0131]
Here, x (i, n) is the n-th value of the sample x (i).
The estimated value h (m, k) used for the search is the cumulative value h (m, k, x (i)) obtained from these samples of the element distribution belonging to the cluster c (c = 1, 2,..., C). An index m (mεΩ (c), where Ω (c) is an index of an element distribution belonging to the cluster c) and a maximum value regarding the index i of the sample is obtained by the following equation.
[0132]
[Expression 15]

[0133]
The estimated value obtained by the above processing uses the maximum value obtained from the actual input vector, and further takes the maximum value in the classified cluster, so that the possibility of not satisfying the A * condition is further reduced. This has the effect of further reducing errors.
[0134]
Embodiment 6 FIG.
In the present embodiment, the best first search is used to sequentially obtain a predetermined number (S in the following description) of solutions having a larger evaluation value, and among the predetermined number S of solutions. Determine the element distribution with the largest branch output probability.
The functional block diagram for the search is the same as that in FIG. 8, and the hypothesis development control unit 2001 extracts the hypothesis stored in the hypothesis storage unit 2002 and performs the development process.
[0135]
The hypotheses stored in the hypothesis storage unit 2002 are the distribution number m, the calculated number of dimensions k, and the cumulative value g (m) of the one-dimensional distribution of each dimension from the first dimension to the kth dimension. , H) and evaluation value f (m, h).
The cumulative value g (m, h) and the evaluation value f (m, h) are respectively obtained by the following equations, as in the second embodiment.
g (m, k) = A (m) + B (m) + C (x (t), m, k)
f (m, k) = g (m, k) + h (m, k)
Here, h (m, k) is an estimated value of the cumulative value of likelihood in each dimension from the (k + 1) th dimension to the final Kth dimension of the element distribution m.
[0136]
FIG. 10 is a flowchart of the best-first search for determining S element distributions from the largest evaluation value with respect to an input vector of dimension number K.
The operation of each processing step of steps ST1 to ST7 is the same as that of the second embodiment, and the description thereof is omitted, and the processing of different parts will be described.
[0137]
In step ST21, when the number of solutions reaches a predetermined number S, the process goes to step ST22. If the number of solutions is less than the predetermined number S, the process returns to step ST2.
In step ST22, for the S solutions stored in order from the largest evaluation value, the element distribution number is m (s) (s = 1, 2,..., S), and the hypotheses of these solutions are used. , The cumulative likelihood value g (m (s), K) is compared, the hypothesis of the solution having the maximum likelihood cumulative value g (m (s), K) is selected, and the hypothesis of this solution is selected. Is output as the element distribution number having the maximum branch output probability.
[0138]
Next, the effect of this embodiment will be described.
As the estimated value h (m, k) of the cumulative value of the likelihood of each dimension from the (k + 1) th dimension to the final Kth dimension of the element distribution m, in the above three embodiments,
(A) an independent variable value when the value of the distribution function of the chi-square distribution with the degree of freedom K−k reaches a predetermined value;
(B) a distribution of values indicated by the speech data for learning;
(C) A value determined from the distribution of values indicated by the speech data for learning for each cluster of the clustered element distribution was used.
[0139]
Such an estimated value may be lower than the true value of the cumulative value of the likelihood of each dimension from the (k + 1) th dimension to the final Kth dimension of the element distribution m.
For this reason, in the above three embodiments, the evaluation value may not satisfy the A * condition depending on the input vector.
For this reason, there is no guarantee that the first solution obtained by the best-first search is the optimal solution (that is, the evaluation value at the time when the solution is obtained is the maximum, but the cumulative value of the likelihood of each dimension is not necessarily Because it was not the maximum, search errors could occur.
[0140]
Therefore, in the present embodiment, since a total of S solutions including candidates with lower ranks in the evaluation value are obtained by the best-first search, there is a possibility that a hypothesis having the maximum true likelihood is included in the solution. Increase and decrease search errors.
Therefore, according to the present embodiment, S solutions are obtained from the beginning in the best first search, and the solution with the highest likelihood is selected among them. There is an effect that it can be reduced.
[0141]
Embodiment 7 FIG.
In this embodiment, when searching for an element distribution with the maximum likelihood, an estimated value is calculated so as to satisfy the A * condition so that a search error does not occur, and a best-first search is performed.
Therefore, the following general inequality is used.
Here, μmin and μmax on the right side are the minimum value and the maximum value of the average value μ on the left side, respectively. Σmax is the maximum value of the variance σ on the left side.
[0142]
[Expression 16]

[0143]
Next, how to obtain the average minimum value μmin, maximum value μmax, and maximum dispersion σmax appearing in the inequality will be described.
First, one-dimensional distribution classification will be described.
FIG. 11 is a flowchart of classification of a one-dimensional distribution. FIG. 12 is an explanatory diagram showing an average value on the number line and a state of division.
[0144]
Next, the operation will be described.
In step ST31, the variable k is set to 0. Hereinafter, the variable k represents a dimension.
In step ST32, the dimension is increased by one.
In step ST33, the average value μ (m, k) of the one-dimensional distribution of the dimension k of each element distribution is arranged on a number line.
[0145]
FIG. 12 shows the arrangement of the average values on the number line after the average values of all the element distributions are arranged. In this example, the average value of the 16 element distributions is indicated by a circle on the number line. The numbers above the circles indicate element distribution numbers.
In step ST34, the average value on the number line is divided into C sections. In the figure, a case where it is divided into four sections each including four average values is shown. As a method of division, for example, the division may be performed so that the average distance between the average values in each section is minimized.
[0146]
In step ST35, for each of the C sections, an average value located at both ends of the section, that is, a minimum value μmin (c, k) and a maximum value μmax (c, k) of the average value of each section are obtained.
In the example of the figure, the minimum value μmin (2, k) of the average value of the section 2 is the average value of the element distribution 7, and the maximum value μmax (z, k) of the average value of the section is the average of the element distribution 15. Value.
[0147]
In step ST36, for each of the C sections c, an element distribution m (∈Ω (c) having an average value existing in the section c, where Ω (c) is a one-dimensional distribution of the dimension k. The average value is a set of element distribution numbers belonging to the interval c), and each one-dimensional distribution of the dimension k is looked at among these element distributions, and the maximum variance σmax (c, k) is obtained. That is,
[0148]
[Expression 17]

[0149]
In the example shown in the figure, the maximum variance σmax (2, k) in section 2 is obtained as the maximum variance value among the variances of the

element distributions

7, 10, 13, and 15.
The section number of the element distribution m is stored as c (m, k).
This section number storage c (m, k) is used to obtain the section number from the element distribution number m for the hypothesis dimension k in the next search.
[0150]
Next, an operation for searching for an element distribution with the maximum branch output probability for the feature vector x (t) at time t will be described.
FIG. 13 is a flowchart of the search. FIG. 14 is a functional block diagram for searching.
In step ST11, the feature vector x (t) is input.
In step ST12, the input feature vector x (t) and the minimum value μmin (c, k) and maximum value μmax () of the average value of the dimension k for each of the C sections c already obtained as described above. c, k) and an estimated value h (c, k) for an element distribution having an average value in the interval c are calculated based on the maximum variance σmax (c, k). The estimated value h (c, k) is calculated by the following recurrence formula.
[0151]
[Expression 18]

[0152]
Θ (c, k) appearing in the above equation is the minimum value μmin (c, k) and maximum value μmax (c, k) of the average value of the one-dimensional distribution of the (k + 1) th dimension for the c-th division. , And the maximum value σmax (c, k) of the variance and the value x (t, k) of the k-th dimension of the input vector x (t), the value calculated on the right side of the inequality (3) It is calculated as follows:
[0153]
[Equation 19]

[0154]
This value satisfies the A * search condition.
Therefore, the estimated value h (c, k) obtained using this value θ (c, k) also satisfies the A * search condition.
The estimated value h (c, k) obtained as described above is stored in the estimated value storage unit 2004.
[0155]
Since steps ST1 to ST4 are the same as those in the second embodiment, the description thereof is omitted.
In step ST5, first, the cumulative likelihood g (m, k + 1) up to the dimension number k + 1 is calculated.
This cumulative likelihood g (m, k + 1) is calculated as in the following equation as in the second embodiment.
[0156]
[Expression 20]

[0157]
Next, the estimated value h (c, k + 1) of the cumulative value of the likelihood of the one-dimensional distribution of each dimension from the (k + 1) th dimension to the Kth dimension is read from the estimated value storage unit 2004. Based on the cumulative likelihood g (m, k + 1) and the estimated value h (c, k + 1), the evaluation value f (m, k + 1) is calculated by the following equation.
f (m, k + 1) = g (m, k + 1) + h (c, k + 1)
Here, the number c is obtained from the storage of the section number c (m, k) of the element distribution m.
[0158]
In step ST6, the hypothesis created in step ST5 is stored in the hypothesis storage unit 2002.
Finally, the effect will be described.
According to the present embodiment, since the evaluation value satisfies the A * condition, the solution with the largest evaluation value found first as a result of the search is the solution with the maximum likelihood accumulated value. Therefore, no search error occurs. In addition, since the evaluation value is used for the search, the number of hypotheses generated in the search is reduced compared to the case of the first embodiment in which the evaluation value is not used for the search. Can be reduced.
[0159]
Embodiment 8 FIG.
In this embodiment, when searching for an element distribution with the maximum branch output probability, an estimated value is designed so as to satisfy the A * condition so that a search error does not occur, and a best-first search is performed. For this reason, the same general inequality (3) as that used in the seventh embodiment is also used in this embodiment.
Since this inequality has been described in the seventh embodiment, a description thereof will be omitted.
[0160]
Next, a method for obtaining the minimum value, maximum value, and maximum dispersion value appearing in this inequality in this embodiment will be described.
First, the element distribution constituting the mixed distribution is divided into a plurality of clusters based on the degree of similarity. The degree of similarity between element distributions is measured by the distance between continuous distributions. A clustering technique such as an LBG algorithm or a K-average method can be applied to the division into clusters.
Here, clustering is performed using Cullback divergence as the similarity between the element distributions and using the K-average method as the clustering method. In this case, clustering is the same as that described in the fourth embodiment, and a description thereof is omitted.
[0161]
Next, the search operation will be described.
The search flowchart is the same as FIG. 13, and will be described with reference to this figure.
The operation of each part is the same as that of the seventh embodiment except for the calculation of the estimated value in step ST12 and the evaluation value calculation in step ST5. Therefore, the description of the operation of these similar parts is omitted. .
[0162]
In step ST12, for the input feature vector x (t), an estimated value h (c, k) of the cumulative value of the likelihood of each one-dimensional distribution from the dimension k + 1 to the final dimension K regarding the element distribution m of the cluster c is obtained. Calculate and store in the estimated value storage unit 2004. The estimated value h (c, k) is calculated using the following recurrence formula.
default value:
h (c, K) = 0 K: Final dimension
Recurrence formula:
About k = K-1, K-2, ..., 0
h (c, k) = θ (c, k + 1) + h (c, k + 1)
[0163]
Here, θ (c, k) is the minimum value μmax (c, k), the maximum value of the average value of the k + 1-dimensional one-dimensional distribution of the element distribution included in the c-th cluster. From the μmin (c, k) and the maximum variance σmax (k) and the value x (t, k) of the kth dimension of the input vector x (t), the value of the right side of the inequality (3) is obtained. The value calculated by the following formula.
[0164]
[Expression 21]

[0165]
In step ST5, an evaluation value f (m, k + 1) is calculated.
The evaluation value f (m, k + 1) is the sum of the accumulated value g (m, k + 1) and the estimated value h (c, k + 1).
As the estimated value h (c, k), the value stored in the estimated value storage unit 2004 in step ST12 is used.
As is clear from the recurrence formula in step ST12, the recurrence formula for obtaining the estimated value does not include the max operation, so that the estimated value is calculated as compared with the seventh embodiment. It can be seen that the amount of calculation can be reduced.
In addition, since it does not include the max operation regarding the class, it is not necessary to consider the estimated value of the other class, the estimated value is closer to the actual value, the accuracy of the evaluation value is improved, From the A * search theory, it is possible to reduce the number of hypotheses required until an optimal solution is obtained, and to reduce the amount of calculation for the search.
[0166]
Finally, the effect will be described.
According to the present embodiment, since the evaluation value satisfies the A * condition, the solution with the maximum evaluation value found first as a result of the search is the solution with the maximum likelihood, and the likelihood is the maximum likelihood. Find the degree. In addition, since the evaluation value is used for the search, the number of loops in the search hypothesis extraction and expansion loops is reduced compared to the case of the first embodiment in which the evaluation value is not used for the search, and the calculation is performed. The amount can be reduced. In addition, the amount of calculation for calculating the estimated value can be reduced. In addition, since the estimated value is closer to the actual value, the calculation amount of the search can be reduced.
[0167]
Embodiment 9 FIG.
In this embodiment, when the cumulative value of the likelihood of a one-dimensional distribution up to a halfway dimension is too small during the search for a certain element distribution, the element distribution takes the maximum likelihood cumulative value. Considering that there is no such thing, the search for the element distribution is terminated.
[0168]
FIG. 15 is a flowchart of the best first search of the present embodiment.
In step ST41, an absolute threshold value Θ is set so that the cumulative value of the maximum likelihood of the element distribution never falls below this value. Specifically, the maximum likelihood in this embodiment is distributed in the range of 20 to 70, and a numerical value 0 is set as a value smaller than the minimum value.
[0169]
In step ST2, the hypothesis having the maximum likelihood cumulative value is selected from the hypotheses stored in the hypothesis storage unit 2002.
In steps ST4 to ST5, the dimension number k is increased by 1, and the cumulative value is calculated.
Thereafter, in step ST42, when the cumulative value calculated in step ST5 is smaller than the threshold value Θ (0 in the present embodiment) set in step ST41, the hypothesis is that the cumulative value is the current cumulative value even in the final dimension. It will never be bigger. Therefore, the cumulative value of the final dimension does not become larger than the threshold value, and therefore, the current element distribution does not become the element distribution having the maximum likelihood. Therefore, the process returns to step ST2 without going to step ST6. .
When the calculated cumulative value is larger than the threshold value Θ, the process proceeds to step ST6, and the hypothesis is stored in the stack.
[0170]
By the operation as described above, the number of hypotheses stored in the stack is reduced by the amount not passing through step ST6 during the search.
Further, when the hypothesis having the maximum accumulated value is selected from the hypotheses in the stack in step ST2, the hypothesis to be selected is the entire hypothesis in the stack. For this reason, the amount of calculation used in the process of step ST2 is reduced.
[0171]
Next, the effect will be described.
According to the present embodiment, when the cumulative likelihood is smaller than the set threshold, the hypothesis is not expanded and stored, so that the number of hypotheses generated during the search is reduced and the amount of calculation can be reduced.
Note that by setting the threshold value to be smaller than the minimum value of the maximum likelihood, it is possible to search for an element distribution having the maximum likelihood. Further, the case where the cumulative value is compared with the threshold value and the development of the hypothesis is terminated has been described, but the evaluation value is calculated based on the cumulative value and the estimated value as in the second to eighth embodiments. Also, it may be applied. In this case, the comparison target with the threshold value has the same effect regardless of whether it is compared with the accumulated value or the evaluation value.
[0172]
In addition, the element distribution is divided into a plurality of clusters, the range of maximum likelihood is obtained for each cluster, and a value smaller than the minimum value of the maximum likelihood is set as a threshold, and the developed hypothesis falls below the threshold for each cluster. In this case, the expanded hypothesis may not be stored in the stack.
[0173]
Embodiment 10 FIG.
In the ninth embodiment, a common threshold Θ is used for all element distributions. In the present embodiment, the distribution of the minimum value of the maximum likelihood changes depending on the element distribution m. Therefore, the maximum likelihood value is checked for each element distribution, and for a relatively large element distribution having a minimum maximum likelihood value, a larger threshold value is set so that the search is terminated more efficiently.
The search flowchart in this embodiment is the same as FIG.
[0174]
In the present embodiment, the threshold value set in step ST41 depends on the element distribution, the range of the maximum likelihood value that can be taken is examined, and a value smaller than the minimum value is set as the threshold value Θ (m).
In step ST42, the threshold Θ (m) is extracted according to the element distribution number m and compared with the accumulated value g (m, k).
[0175]
Next, the effect will be described.
According to the present embodiment, it is possible to set a larger threshold value for a relatively large element distribution having the maximum minimum likelihood depending on the element distribution, and the search can be more efficiently terminated by the element distribution. be able to. As a result, the calculation amount of search can be further reduced.
Further, by classifying the element distribution into a plurality of clusters and using a common threshold value for each cluster, it is possible to reduce the storage of the threshold value.
[0176]
Embodiment 11 FIG.
In this embodiment, as in the ninth embodiment, when the cumulative value of the probability of the one-dimensional distribution up to the middle dimension is too small during the search for a certain element distribution, the element distribution is The search for the element distribution is aborted, assuming that it has no maximum likelihood.
Furthermore, the threshold value is changed for each number of dimensions in order to increase the truncation efficiency.
In order to change the threshold according to the number of dimensions, the chi-square distribution used in Embodiment 3 is used here.
[0177]
FIG. 16 is an explanatory diagram for explaining the concept when the threshold value is determined based on the value when the distribution function of the chi-square distribution reaches a certain numerical value.
In the figure, hypothesis A is a hypothesis that has already been developed, and hypothesis B or hypothesis C is a hypothesis that is about to be developed in the middle dimension k.
FA is a value obtained by adding -1/2 times the value of the independent variable when the distribution function of the chi-square distribution with the degree of freedom Kk is 0.95 from the cumulative value in the dimension k of hypothesis A. It is. Therefore, when hypothesis A is expanded to final dimension K, the probability that the cumulative value of hypothesis A exceeds this FA is 0.95, and in most cases it can be expected to exceed FA.
[0178]
Further, FB is a value obtained by adding -1/2 times the value of the independent variable when the distribution function of the chi-square distribution with the degree of freedom Kk is 0.05 from the cumulative value in the dimension k of the hypothesis B. is there. Therefore, when hypothesis B is expanded to the final dimension K, the probability that the cumulative value of hypothesis B exceeds this FB is 0.95, which is almost never exceeded.
[0179]
Similarly, FC adds -1/2 times the value of the independent variable when the distribution function of the chi-square distribution with the degree of freedom Kk is 0.05 from the cumulative value in dimension k of hypothesis C. Value. Therefore, when the hypothesis C is expanded to the final dimension K, the probability that the cumulative value of the hypothesis C exceeds this FC is 0.05, which is almost never exceeded.
[0180]
From this figure, since the probability that FC exceeds FA is smaller than 0.05 × 0.05, Hypothesis C can be discontinued.
In addition, since the probability that FB exceeds FA is greater than 0.05 × 0.05, it is better to give up aborting Hypothesis B.
[0181]
Actually, the maximum accumulated value in the intermediate dimension k is always stored, and the upper and lower estimated values of the accumulated value obtained from the distribution function of the chi-square distribution with the degree of freedom Kk from the maximum value are ranged. As a result, the hypothesis is terminated.
FIG. 17 is a flowchart of the search according to this embodiment.
[0182]
In step ST51, the threshold width of each dimension k (k = 1, 2,..., K) is set. The threshold width of the dimension k is −½ of the difference between two percentage points (that is, 0.95 and 0.05 in the present embodiment) obtained from the distribution function of the chi-square distribution with K−k degrees of freedom. As a double, it is calculated and stored by the following formula.
[0183]
[Expression 22]

[0184]
Here, X (n, p) is the value of the independent variable (ie, the percentage point) when the distribution function of the chi-square distribution with n degrees of freedom is p.
In step ST52, a negative maximum value (−∞) is set as the initial value of the maximum value of the accumulated values up to each dimension k (k = 1, 2,..., K).
Hereinafter, the maximum value of accumulated values up to this dimension k is represented by E (k).
[0185]
In step ST42, the cumulative value of the hypothesis is compared with the threshold value of dimension k. The threshold value of the dimension k is obtained by the following equation by subtracting the width W (k) of the dimension k obtained in step ST51 based on the chi-square distribution from the maximum value E (k) of the dimension k.
Hypothesis dimension k threshold = E (k) −W (k)
[0186]
If the cumulative value is small as a result of the comparison with the threshold, the process returns to step ST2. Conversely, when the accumulated value is not small, the process proceeds to step ST53.
In step ST53, the maximum value of the hypothesis dimension k is updated. This is done by substituting the cumulative value for the maximum value when the cumulative value exceeds the maximum value E (k).
In step ST6, a hypothesis having an increased number of dimensions is stored in the stack.
[0187]
Next, the effect will be described.
According to this embodiment, since the threshold value is changed for each dimension, hypotheses stored during the search can be further reduced as compared with the case where a constant threshold value is used. For this reason, there is an effect that the amount of calculation when taking out from the stack can be reduced.
[0188]
Embodiment 12 FIG.
In this embodiment, when a hypothesis is developed by a best-first search in the dimension direction, the hypothesis is stored in the stack by increasing the number of dimensions by 2 or more and extracting the hypothesis from the stack until the final dimension is reached. Reduce the number of processes. This reduces the amount of calculation for managing the stack.
[0189]
FIG. 18 is a flowchart of the best first search of the present embodiment.
In step ST2, a hypothesis having the maximum accumulated value is extracted.
Assume that the hypothesized element distribution number, the number of developed dimensions, and the cumulative values of the one-dimensional distribution probabilities for each dimension from dimension 1 to dimension k are m, k, and g (m, k), respectively.
[0190]
If the developed dimension number k is not the final dimension K in step ST3, the process proceeds to step ST4.
In step ST4, the number of dimensions is increased by a number Δ equal to or greater than 2, and is set to k + Δ. However, when the increased value is larger than the final dimension K, the final dimension K is made coincident.
[0191]
In step ST5, the accumulated value is updated to be g (m, k + Δ). The updated cumulative value g (m, k + Δ) is calculated by the following equation.
[0192]
[Expression 23]

[0193]
Since the number of dimensions is increased by 2 or more, in the present embodiment, the number of processes for extracting hypotheses from the stack until reaching the final dimension and the number of processes for storing hypotheses in the stack are reduced. This has the effect of reducing the amount of computation for managing the stack.
In addition, although the case where the hypothesis with the maximum cumulative value is preferentially expanded in the dimension number direction has been described, the hypothesis with the maximum evaluation value including the estimated value is preferentially expanded in the dimension number direction (specifically, Needless to say, the present invention can be similarly applied to the above-described embodiments after the second embodiment.
[0194]
Embodiment 13 FIG.
The method of obtaining the maximum element distribution of the cumulative likelihood value described in each of the above embodiments by performing a best-first search in the dimensional direction can be applied to vector quantization using a code vector having an arbitrary spatial arrangement. It can be used to reduce the amount of calculation for calculating the neighborhood code vector.
[0195]
This will be described below.
Let x be the feature vector of speech.
The feature vector x is converted into an index c of the nearest vector v (c) among a limited number of typical vectors v (1), v (2),. Codebook size).
In this conversion, a vector a having a continuous value is converted into a discrete and limited number of vectors {c | c = 1, 2,..., C}, and the vector is quantized. .
A finite set of vectors taking discrete values B = {v (1), v (2),..., V (C)} is called a code book, and its elements are called code vectors.
[0196]
FIG. 19 is a configuration diagram of encoding / decoding using vector quantization.
In the figure, 11 is an audio signal as input data. An analysis unit 12 performs frequency analysis on the input data 11 using a linear prediction method while shifting the time window at a period of 10 ms and converts the input data 11 into a feature vector 13. Here, spectrum envelope information is obtained as a 12-dimensional PARCOR coefficient, which is used as a feature vector, and the number of dimensions K is K = 12.
[0197]
15 is a code book that stores a plurality of code vectors, 14 is a code vector search unit that refers to the code book 15 and obtains the code vector having the closest distance in the code book and outputs the code vector number as a code 16 It is. Reference numeral 17 denotes a transmission / storage unit that transmits or stores the code 16. A decoding unit 18 obtains a code vector 20 by referring to the code book 19 from the transmitted or accumulated code 16. Reference numeral 21 denotes a synthesizing unit that synthesizes speech using the code vector 20 as a parameter and creates output data 22.
[0198]
In the present embodiment, the best-first search in the dimensional direction similar to the embodiments described so far is applied to search for the number of the code vector having the smallest distance from the feature vector x.
The value of the kth dimension of the input vector x is x (k), and the value of the kth dimension of the mth code vector is v (m, k).
The distance between the input vector and the m-th code vector v (m) is calculated as the Euclidean distance by the following equation.
[0199]
[Expression 24]

[0200]
The code vector to be searched as a result of vector quantization is the value of m that minimizes the right side of the above equation.
[0201]
[Expression 25]

[0202]
This is equivalent to obtaining the value of m that maximizes the following expression obtained by multiplying the right side of the above expression by -1.
[0203]
[Equation 26]

[0204]
That is, when considering the quadratic form calculated by the dimension k inside the sum (Σ) of the right side of the above equation as the similarity, vector quantization is the accumulation of the similarity of each dimension defined by the right side of the above equation. This corresponds to obtaining a code vector that maximizes the value g (m, K). However,
[0205]
[Expression 27]

[0206]
And
The basic flow of the processing for obtaining the code vector having the maximum cumulative value g (m, K) of similarity in each dimension from dimension 1 to final dimension K is the same as that of the first embodiment.
[0207]
FIG. 20 is a flowchart when the code vector in the vector quantization of this embodiment is obtained by the best first search.
Hereinafter, differences from the first embodiment will be described.
The number of code vectors stored in the code book is 2048 (generally M).
[0208]
The code book divides the input vector of a large number of samples into 2048 clusters by the K-means method, determines the representative vector (centroid) of the cluster so that the total quantization error is minimized, and 2048 centroids. It is created by putting as a code vector. Each code vector has 12 dimensions (generally K dimensions).
[0209]
Next, each processing step will be described.
In step ST1, initial hypotheses are generated for 2048 code vectors in the code book. The generated hypothesis is stored in the hypothesis storage unit 2002.
Here, the initial hypothesis of the m-th code vector is that the code vector number is m, the calculated number of dimensions is 0, and the cumulative value g (m, 0) of similarity of each dimension is 0.
[0210]
In step ST <b> 2, the hypothesis having the maximum accumulated similarity value is extracted from the hypothesis storage unit 2002. The hypothesized element distribution number, the number of developed dimensions, and the cumulative value of similarity in each dimension will be described below as m, k, g (m, k), respectively.
[0211]
In step ST3, it is checked whether or not the calculated number of dimensions k has reached the numerical value K representing the final dimension. If the final dimension has been reached, this hypothesis is based on the best first search theory. Since it corresponds to a vector, the process proceeds to step ST7 in order to process this hypothesis as a searched solution.
If the final dimension has not been reached (when k <K), the process proceeds to step ST4.
[0212]
In step ST4, the number of dimensions k is increased by 1 to k + 1.
In step ST5, a cumulative value g (m, k + 1) of similarities up to the dimension number k + 1 is calculated.
This cumulative likelihood is calculated by adding a one-dimensional similarity η (m, k + 1) of dimension k + 1 to g (m, k) which is a cumulative likelihood up to dimension k as shown in the following equation.
g (m, k + 1) = g (m, k) + η (m, k + 1)
The one-dimensional similarity of dimension k + 1 is calculated as follows:
η (m, k + 1) = − | x (k + 1) −v (m, k + 1) |²
[0213]
In step ST6, a hypothesis is created with the code vector number, the number of developed dimensions, and the similarity accumulated value of this hypothesis as m, k + 1, g (m, k + 1), respectively, and stored in the hypothesis storage unit 2002. Return to step ST2.
In step ST7, the code vector number m of the hypothesis obtained as a solution is output as the code number as a result of vector quantization. The process ends here.
[0214]
Now, for the hypothesis obtained as a solution in step ST7, the element distribution number, the number of developed dimensions, and the accumulated value are m, K, and g (m, K), respectively.
g (m, K) is the sum of the similarities of each dimension from the dimension 1 to the final dimension K of the m-th code vector. Among 2048 code vectors, the accumulated value of the maximum similarity is present. is doing.
[0215]
The code number output by the above processing is the code vector in the code vector in the code book that has the largest cumulative value of similarity to the input vector, in other words, the distance from the input vector is the smallest. Matches a code vector. That is, the result of vector quantization is obtained.
[0216]
In the present embodiment, the state of the hypothesis remaining in the stack at the end of the search is as shown in FIG. 6 as in the first embodiment. In this figure, the □ mark indicates the development history of the maximum likelihood hypothesis expanded to the final dimension K. Further, the development history of the hypothesis in which the circles and the crosses are expanded to the middle dimension is shown.
[0217]
Now, as can be seen from this figure, the distance between each dimension is calculated for all dimensions of the code vector with the maximum similarity accumulated value, and then the code vector with the smallest distance between the vectors is determined. In the method of this embodiment, the number of operations is not expanded on the stack, and the amount of operations is reduced because the one-dimensional similarity is not calculated by the number of remaining dimensions of the code vector remaining. ing.
Therefore, since the best first search is used in the present embodiment, the number of one-dimensional distance calculation operations can be reduced as compared to the full search.
Further, it is not necessary to design the code book so that the tree structure has a tree structure, and there is an effect that it is possible to exhibit the effect of reducing the amount of calculation with respect to a code book having an arbitrary distribution.
[0218]
Embodiment 14 FIG.
In the present embodiment, as one application of the method of finding the element distribution having the maximum likelihood of the multidimensional input vector by performing the best first search in the dimensional direction, the nearest neighbor or the maximum likelihood code vector in the multidimensional feature vector space Is applied to the problem of recognizing a category by searching for.
When the number of categories is very large, such as character recognition, particularly kanji, as 3000 or more, the amount of computation for calculating the minimum distance or maximum likelihood distribution is remarkably increased.
[0219]
In the present embodiment, each category is represented by a multi-template, and the number of templates is 16. In this case, a total of 3,715 categories of 128-dimensional vectors are input, and the category vector with the smallest distance is searched.
[0220]
FIG. 21 is a configuration diagram of a character recognition device configured based on a document (“Examination of Speeding up Distance Calculation in Character Recognition”, Proceedings of the 1997 IEICE General Conference, page 310).
In the figure, reference numeral 31 is an input paper document on which a print kanji character to be recognized is printed, 32 is an image scanner for reading the input document 31 into a character recognition device, and 33 is a character from image data read by the image scanner 32. A character region cutout unit 34 that separates and cuts out a region for each character, and has a contour feature 64 dimensions (4 × 4 region × 4 directions) from the character image in the character region cut out by the character

region cutout unit

33, and 1 Extracting a total of 128 dimensions of the next peripheral feature 32 dimensions (8 dimensions × 4 directions) and a secondary peripheral feature 32 dimensions (8 dimensions × 4 directions) as feature amounts, and outputting a 128-dimensional feature vector 35 Part.
[0221]
36 is a recognition dictionary in which the vector of the template of each category is stored in association with the category, 37 is a template that minimizes the distance between the feature vector 35 and the template vector stored in the recognition dictionary 36 in association with the category. A category search unit that searches for a vector and outputs a category associated with the searched template vector as a template vector with the shortest distance as a recognition result 38.
[0222]
In the above document, category search searches for a vector with the smallest distance by repeatedly moving the central category from the initial category to a vector with the smallest distance among the central vectors in its vicinity.
In this embodiment, the best-first search in the dimensional direction is applied.
[0223]
FIG. 22 is a flowchart showing the operation of the category search unit 37 that performs a best-first search in the dimension direction for a vector having the smallest distance among the template vectors stored in the recognition dictionary 36 for the feature vector 35.
In the present embodiment, a vector code having the smallest distance from the input vector is searched. The distance is a city distance.
[0224]
The value of the kth dimension of the input vector is X_k And the value of the kth dimension of the mth code vector is V_mkAnd At this time, the distance between the input vector and the mth code vector is calculated as the Euclidean distance by the following equation.
[0225]
[Expression 28]

[0226]
The code vector obtained as a sign is a value of m that minimizes the right side of the above equation. This is the value of m that maximizes the cumulative value of the similarity defined by the following equation obtained by multiplying the right side of the above equation by −1 (hereinafter, −1 times the distance is referred to as “similarity” for simplicity). It is equivalent to seeking.
[0227]
[Expression 29]

[0228]
Therefore, a code vector that maximizes the cumulative value g of similarities in each dimension defined by the above equation is obtained.
The basic flow of the process for obtaining the code vector with the maximum expression is the same as in the first embodiment.
FIG. 20 is a flowchart when a code vector in vector quantization is obtained by a best-first search.
[0229]
There are M template vectors in the recognition dictionary, and serial numbers from 1 to M are assigned respectively. Each template vector is associated with a category C (m).
Next, each processing step will be described.
[0230]
In step ST1, an initial hypothesis is generated for each of the 3715 category template vectors in the recognition dictionary. The generated hypothesis is stored in the hypothesis storage unit 2002.
Here, the initial hypothesis of the m-th code vector is m for the template vector number, 0 for the calculated number of dimensions, and 0 for the accumulated similarity of each dimension. The similarity in each dimension is -1 times the city distance in the following equation.
Similarity of dimension k =-| x_k -V_mk｜
In the formula, | a | indicates the absolute value of a.
[0231]
In step ST <b> 2, the hypothesis having the maximum accumulated similarity value is extracted from the hypothesis storage unit 2002. The hypothesized element distribution number, the number of developed dimensions, and the cumulative value of similarity in each dimension will be described below as m, k, g (m, k), respectively.
Here, the cumulative value of the similarity is given by the following equation, where k is the number of developed dimensions of the hypothesis.
[0232]
[30]

[0233]
In step ST3, it is checked whether or not the calculated dimension number k matches the final dimension. If the calculated dimension number k matches the final dimension, this hypothesis corresponds to the template vector having the maximum cumulative value according to the theory of best-first search. Therefore, the process proceeds to step ST7 with this hypothesis as the searched solution.
If the calculated dimension number k does not match the final dimension, the process proceeds to step ST4.
[0234]
In step ST4, the number of dimensions is increased by 1 to k + 1.
In step ST5, a cumulative value g (m, k + 1) of similarity of each dimension up to the dimension number k + 1 is calculated.
The cumulative similarity g (m, k + 1) is obtained by calculating the one-dimensional similarity of dimension k + 1 to g (m, k), which is the cumulative similarity of each dimension up to dimension k as in the following equation. Add.
g (m, k + 1) = g (m, k) + η (m, k + 1)
[0235]
The one-dimensional similarity η (m, k + 1) of the dimension k + 1 is calculated by the following equation.
η (m, k + 1) = − | x_{k + 1} -V_{mk + 1}｜
[0236]
In step ST6, a hypothesis is created with the template vector number of the hypothesis, the number of developed dimensions, and the similarity cumulative value as m, k + 1, g (m, k + 1), respectively, and stored in the hypothesis storage unit 2002. Return to step ST2.
[0237]
In step ST7, the template vector numbers m1, m2, m3,..., MN are obtained for the hypothesis of rank 1 obtained as a solution and the top N−1 candidates with the large similarity accumulation value remaining in the stack, The categories C (m1), C (m2), C (m3),..., C (mN) associated with the template vector are obtained from the recognition dictionary 36, output as candidates for the recognition result 38, and the process ends.
[0238]
Now, for the hypothesis of rank 1 obtained as a solution in step ST7, the element distribution number, the number of developed dimensions, and the accumulated value are m, K, and g (m, K), respectively.
g (m, K) is the sum of the similarities of each dimension from the dimension 1 to the final dimension K of the m-th code vector, and has the maximum accumulated similarity value in the template vector. .
[0239]
The category number output by the above processing is the template vector having the maximum cumulative value of similarity with the feature vector among the template vectors in the recognition dictionary 36, in other words, the distance from the feature vector is the minimum. Matches the template vector. That is, a category recognition result is obtained.
[0240]
The category search method in the above document cannot always search for the template vector with the minimum distance depending on how the initial category is given. However, according to the present embodiment, among the multi-dimensional vectors for the feature vector, Since the best-first search in the dimensional direction is used to search for the vector with the smallest distance, the template vector with the smallest distance can be searched, and the amount of calculation can be reduced and the accuracy is not deteriorated.
[0241]
Embodiment 15 FIG.
The present embodiment is an application of a method for obtaining a best-first search in the dimensional direction for an element distribution having the maximum likelihood of a multidimensional input vector in a speech recognition apparatus using a mixed continuous distribution HMM.
A block diagram of the speech recognition apparatus according to the present embodiment is FIG. 1 and will be described with reference to the same diagram as that of the first embodiment.
In the figure, 1 is an input speech, 2 is a speech analysis means for analyzing the input speech 1 and converting it into a time

series feature vector

3, 4 is a time series feature vector 3 input, and the likelihood for the mixed continuous distribution HMM 8 is shown. Likelihood calculation means 5 for calculating and outputting a likelihood matrix 5 constitutes a word from concatenation of phoneme HMMs according to the word network which is the path data storage means 9, and the state of the phoneme HMM reaching each node at each node of the network Viterbi calculation means for obtaining a path with the maximum cumulative likelihood in the series, and 6 is an optimum path determination means for searching backward for a local path of the Viterbi calculation to obtain an optimum word string.
[0242]
The phoneme HMM is a mixed continuous distribution HMM, and consists of a three-state multi-dimensional continuous output probability distribution as shown in FIG.
FIG. 23 is a flowchart of processing of each frame after the feature vector of each frame is obtained.
Hereinafter, the operation of each step will be described.
[0243]
In step ST61, the feature vector of the frame is input.
In step ST62, all the element distributions of all the mixed distributions are put together, and the likelihood calculation for the feature vector is performed. This likelihood calculation uses the best-first search in the dimensional direction of the present invention. Details will be described later.
[0244]
In step ST63, the output likelihood of each state is calculated based on the likelihood calculation result of the element distribution for the feature vector obtained in step ST62.
[0245]
In step ST64, the likelihood of the route is calculated. Calculation is based on Viterbi arithmetic. In the Viterbi operation, the likelihood of the phoneme p and the state j is selected along the route from all the states that can go to this state, and the route and the likelihood are stored. This completes the processing of each frame.
Next, the likelihood calculation method using the best-first search in the dimensional direction of the present invention in step ST62 will be described. Here, the method described in the eighth embodiment is applied.
[0246]
FIG. 24 is a flowchart of search processing. The input vector is a feature vector of the frame. The element distribution to be searched is all element distributions in 2000 states, and the total number is 2000 states × 16 distributions = 32000 distributions.
The set of all element distributions is divided into 2048 clusters by the K-means clustering method.
For each cluster, the average of the one-dimensional distribution of each dimension of the element distribution in the cluster and the maximum and minimum values of the variance are obtained in advance.
[0247]
In step ST11, based on these, an estimated value of the cumulative value from the m-th dimension to the final dimension for the feature vector is calculated.
Thereafter, the operation described in the seventh embodiment is performed, the hypothesis having the maximum evaluation value reaches the final dimension in step ST3, and step ST7 is executed.
In step ST7, the contents of the stack are saved, and step ST62 ends.
[0248]
In step ST63, the likelihood of the total element distribution is obtained by looking at the contents of the saved stack. Since the exact likelihood is calculated for the element distribution of the solution first found in step ST3, the value is used. For other element distributions, the cumulative likelihood up to a halfway dimension remaining in the stack can be used.
In addition to the cumulative likelihood up to the intermediate dimension, an evaluation value obtained by adding the estimated values from the intermediate dimension +1 to the final dimension remains, and this value can be used.
[0249]
Embodiment 16 FIG.
In the mixture distribution calculation in the frame processing of the fifteenth embodiment, the N best search is performed.
The flow of frame processing is the same as in FIG. The flowchart of the search is shown in FIG.
[0250]
After obtaining the top N solutions in step ST21, the contents of the stack are stored in step ST22. Thereafter, in step ST63, the top N solutions are set to the likelihood values of the corresponding element distributions.
For the likelihood of other element distributions, an accumulated value remaining in the stack or an evaluation value calculated based on the accumulated value and the estimated value is used.
Compared to the fifteenth embodiment, since the strict likelihood is obtained at least for the top N element distributions, the recognition accuracy is improved.
[0251]
Embodiment 17. FIG.
FIG. 26 is a block diagram of the speech recognition apparatus of the present embodiment configured based on the multipath speech recognition method.
The voice analysis means 2 analyzes the input voice 1 and converts it into a multidimensional feature vector 3.
The likelihood calculation 1 means 41 and the likelihood calculation 2 means 42 refer to the HMM storage means 8 for this multidimensional feature vector 3, and the likelihood of the state of the HMM consisting of a plurality of multidimensional continuous distributions. Calculate The likelihood of the state of the HMM is calculated by the same process as in the fifteenth embodiment.
[0252]
The route calculation 1 means 43 refers to the route data 1 storage means 45 that stores the route data 1 with less restrictions in the series of recognition units, and calculates an estimated value for complete route calculation.
The route calculation 2 means 44 refers to the route data 2 storage means 46 that stores the complete route data 2 and also estimates the likelihood estimate of the route with the restriction of the route calculated by the route calculation 1 means 43 and the current By using the midway likelihood of the complete route obtained up to the evaluation value as an evaluation value, the likelihood of the complete route is efficiently calculated with a smaller amount of computation than the full search by the best first search method.
[0253]
The optimum route determination means 6 examines the route obtained by the route calculation 2 means 44 to determine an optimum route, and outputs a recognition unit sequence associated with the optimum route as a recognition result 7.
In the above configuration, the likelihood calculated by the likelihood calculation 1 means 41 is equal to or larger than the true likelihood and satisfies the A * condition.
Therefore, the likelihood calculated by the route calculation 1 means 43 is equal to or larger than the true likelihood along the route whose route restriction is relaxed, and satisfies the A * condition.
In this way, the output probability of the HMM that satisfies the A * condition can be obtained with a small amount of computation.
[0254]
Embodiment 18 FIG.
In this embodiment, in the multipath speech recognition method described in Embodiment 17, the HMM storage unit 8 is referred to the multidimensional feature vector 3 in the calculation of the likelihood calculation 1 unit 41. Thus, the likelihood of the state of the HMM consisting of a plurality of multidimensional continuous distributions is calculated. Like the above-described Embodiment 16, the likelihood of the state of the HMM is such that the likelihood of the top N element distributions is calculated strictly.
[0255]
Thereby, the accuracy of the estimated value of the route likelihood obtained in the first multipath is improved, and as a result, the number of operations required for the route search by the second best-first search of the multipath is reduced.
[0256]
Embodiment 19. FIG.
The methods described in the above embodiments can also be implemented in a pattern recognition device and a speech recognition device. It can also be implemented as a recording medium storing a program for executing pattern recognition and voice recognition.
In addition, it has the effect of reducing the amount of calculation by applying to an application that calculates the distance, similarity, and likelihood between an input vector and a plurality of multidimensional vectors, such as feature vector clustering or HMM learning. Needless to say.
[0257]
【The invention's effect】
As described above, according to the present invention, the element distribution search method is configured to perform the best-first search in the dimensional direction until the number of developed dimensions matches the number of dimensions of the input vector. There is an effect that the amount of calculation for searching the distribution is reduced.
[0258]
According to the present invention, since the best-first search is executed in the direction in which the number of developed dimensions increases, there is an effect that the amount of calculation for searching the element distribution is reduced.
[0259]
According to the present invention, the estimated value of the cumulative probability value from the (k + 1) -th dimension to the K-th dimension is used as the chi-square distribution when the distribution function value of the chi-square distribution with degrees of freedom K−k reaches a predetermined value. Since the calculation is based on the independent variable value, there is an effect that the evaluation value can be calculated by a table search using the hypothesized dimension k as a key.
[0260]
According to this invention, since the estimated value of the cumulative value of the probability values in the (k + 1) -th dimension to the K-th dimension is calculated based on the distribution of values indicated by the learning speech data, the search error is reduced. effective.
[0261]
According to this invention, the element distribution is divided into a plurality of clusters, and for each cluster, the estimated value of the cumulative probability value in the (k + 1) -th dimension to the K-th dimension is based on the distribution of values indicated by the learning speech data. Therefore, the search error is further reduced.
[0262]
According to the present invention, since a plurality of solutions are obtained from the beginning by the best-first search, and an element distribution having a plurality of higher evaluation values is obtained among the plurality of solutions, a search error is determined. This has the effect of reducing the possibility of occurrence.
[0263]
According to the present invention, the estimated value of the probability value in the (k + 1) -th dimension to the K-th dimension is calculated, the one-dimensional distribution of each dimension in each element distribution is based on the arrangement on the number line of the average value of each distribution. And is calculated based on the inequality for each dimension obtained for each section and the numerical value for each dimension of the input vector, so that the occurrence of search errors can be prevented. There is an effect that the amount of calculation of search can be reduced.
[0264]
According to this invention, the estimated value of the cumulative value of the probability values in the (k + 1) -th dimension to the K-th dimension is divided into a plurality of clusters, and the inequalities obtained for each dimension for each of the plurality of clusters; Since the calculation is made based on the numerical value of each dimension of the input vector, there is an effect that the calculation amount for searching the element distribution is further reduced.
[0265]
According to the present invention, the search threshold value is provided, and when the cumulative value or evaluation value of the search up to the k-th dimension is lower than the threshold value, the search for the hypothesis is terminated, so the element distribution is searched. For this reason, there is an effect that the amount of calculation for the purpose is further reduced.
[0266]
According to the present invention, the search threshold is provided depending on the element distribution, and when the cumulative value or evaluation value of the search up to the k-th dimension is lower than the threshold, the search for the hypothesis is terminated. There is an effect that the calculation amount of the search can be further reduced.
[0267]
According to the present invention, the number of dimensions of the input vector is set as K as the search threshold, and the calculation is performed based on the value of the independent variable when the distribution function of the chi-square distribution with K−k degrees of freedom reaches a certain value. Since this value is used, there is an effect that it is possible to reduce the amount of calculation when taking out from the stack.
[0268]
According to the present invention, when the number of developed dimensions is increased, the best first search for increasing the number of dimensions by two or more is executed, so that the amount of calculation for managing the stack is reduced. .
[0271]
  According to the present invention, the best first search for the element distribution with the maximum evaluation value is performed on a multidimensional input vector.DimensionallyThe probability of the element probability distribution that is executed and takes the maximum evaluation value is recognized using the probability calculated as a result of the best-first search, and the probabilities of other element probability distributions are recognized using the evaluation values up to the middle dimension of the best-first search. Since the unit score calculation is executed to determine the recognition result, there is an effect that the amount of calculation can be reduced without degrading accuracy.
[0272]
  According to the present invention, a best-first search of N branch probability densities from the one with the larger evaluation value is performed on a multidimensional input vector.DimensionallyThe probability of the element probability distribution having N branch density from the largest evaluation value is executed using the probabilities of N element distributions from the largest probability calculated as a result of the best-first search, and other element distributions. Since the recognition result is determined by executing the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best-first search, the strict likelihood is given to the top N element distributions. As a result, the accuracy of recognition is improved.
[0273]
  According to the present invention, when calculating an estimated value of a relaxed path in the first recognition unit sequence, a best-first search for an element distribution having the maximum evaluation value is performed for a multidimensional input vector. TheDimensionallyThe probability of the element distribution with the maximum evaluation value is the probability calculated as a result of the best-first search, and the probabilities of other element distributions are evaluated using the evaluation values up to the middle dimension of the best-first search. Since the score calculation is executed and the recognition result is determined, the output probability of the HMM that satisfies the A * condition can be obtained with a small amount of calculation.
[0274]
  According to the present invention, when calculating an estimated value of a score in which a path restriction is relaxed in the first recognition unit sequence, N pieces of evaluation values having a larger evaluation value are calculated for a multidimensional input vector. Best-first search for branch probability densityDimensionallyThe probability of the element distribution having N branch density from the largest evaluation value is executed using the probability of the N element distribution from the largest evaluation value calculated as a result of the best-first search, and other elements. Since the distribution probability is determined by executing the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best-first search for the distribution probability, the path likelihood obtained in the first multipath is calculated. As a result of improving the accuracy of the estimated value, there is an effect of reducing the number of computations required for route search by the second best-first search of multipath.
[0275]
  According to the present invention, the best first search for the element distribution with the maximum evaluation value is performed on a multidimensional input vector.DimensionallyThe probability of the element distribution that takes the maximum evaluation value is executed using the probability calculated as a result of the best-first search, and the other element distribution probabilities are evaluated using the evaluation values up to the middle dimension of the best-first search. Since the score calculation is performed, there is an effect that the amount of calculation can be reduced without degrading accuracy.
[0276]
  According to the present invention, a best-first search of N element distributions from the largest evaluation value is performed on a multidimensional input vector.DimensionallyThe probability of N element distributions from the largest evaluation value is used as the probability of N element distributions from the largest evaluation value calculated as a result of the best-first search. Is configured to calculate the score of the recognition unit using the evaluation values up to the middle dimension of the best-first search. As a result, strict likelihood is obtained for the top N element distributions, and the recognition accuracy is improved. effective.
[0277]
  According to the present invention, when calculating the estimated value of the score in which the path restriction is relaxed in the first recognition unit sequence, the best first of the element distribution with the maximum evaluation value is calculated for the multidimensional input vector. SearchDimensionallyUse the probability calculated as a result of the best-first search for the probability of the element distribution that is executed and take the maximum evaluation value, and use the evaluation value up to the middle dimension of the best-first search for the probability of the other element distribution. Since the score calculation is executed and the recognition result is determined, the output probability of the HMM that satisfies the A * condition can be obtained with a small amount of calculation.
[0278]
  According to the present invention, when calculating an estimated value of a score in which a path restriction is relaxed in the first recognition unit sequence, N pieces of evaluation values having a larger evaluation value are calculated for a multidimensional input vector. Best-first search for element distributionDimensionallyThe probability of N element distributions from the largest evaluation value is used as the probability of N element distributions from the largest evaluation value calculated as a result of the best-first search. Is configured so that the recognition result is determined by executing the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best-first search, so the accuracy of the estimated path likelihood obtained in the first multipath As a result, the number of computations required for route search by the second best-first search of the multipath is reduced.
[0279]
  According to the present invention, the best first search of the element distribution having the maximum evaluation value is performed on the recognition unit score expressed by the mixed continuous probability distribution model composed of a plurality of multidimensional continuous distributions with respect to the multidimensional input vector.DimensionallyThe probability of the element distribution that takes the maximum evaluation value is used as the probability of the element distribution calculated as a result of the best-first search, and the evaluation values up to the middle dimension of the best-first search are used for the probabilities of other element probability distributions. The recognition unit score calculation is executed, and the recognition unit sequence having the highest score is selected from the recognition unit sequences based on the result of the score calculation. This has the effect of reducing the amount of calculation.
[0280]
  According to the present invention, for a multidimensional input vector, a recognition unit score expressed by a mixed continuous probability distribution model composed of a plurality of multidimensional continuous distributions is assigned to N element distributions in descending order of evaluation values. The best first searchDimensionallyThe probability of the element distribution that takes N evaluation values from the larger evaluation value is executed using the probability of the N element distributions from the largest evaluation value calculated as a result of the best-first search, and other elements For the probability of distribution, execute the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best first search, and based on the result of the score calculation, the sequence of the recognition unit with the highest score among the recognition unit series Since the strict likelihood is obtained for the top N element distributions, the recognition accuracy is improved.
[0281]
  According to the present invention, for a multidimensional input vector, score calculation is performed to calculate an estimated value of a recognition unit score based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions. The estimated value of the score with the path restriction relaxed in the sequence of the second is obtained, and the second time is the sequence of the recognition unit having the highest score among the complete series of recognition units based on the estimated value of the score obtained in the first time When calculating the estimated value of the relaxed path constraint in the first recognition unit sequence, the best branch probability density of the evaluation value for the multidimensional input vector is calculated. First searchDimensionallyUse the probability calculated as a result of the best-first search for the probability of the element distribution to be executed and take the maximum evaluation value, and use the evaluation value up to the middle dimension of the best-first search for the probability of the other element distribution Since the score calculation is executed and the recognition result is determined, the output probability of the HMM that satisfies the A * condition can be obtained with a small amount of calculation.
[0282]
  According to the present invention, a score of a recognition unit is calculated based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions for a multidimensional input vector, and the first time is a path in a sequence of recognition units. The second estimate is obtained by searching for the sequence of the recognition unit with the highest score among the complete sequence of recognition units based on the estimate of the score obtained in the first time. When calculating the estimated value of the relaxed path constraint in the first recognition unit sequence, the best of the N branch probability densities from the one with the largest evaluation value for the multidimensional input vector. First searchDimensionallyThe probability of N element distributions from the largest evaluation value is used as the probability of N element distributions from the largest evaluation value calculated as a result of the best-first search. Is configured to determine the recognition result by executing the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best-first search, so the estimated value of the path likelihood obtained in the first multipath As a result of the improved accuracy, there is an effect that the number of computations required for the route search by the second best-first search of the multipath is reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram showing a structure of a state HMM.
FIG. 3 is a configuration diagram of functions used in a search.
FIG. 4 is an explanatory diagram of a hypothesis storage unit using a heap.
FIG. 5 is a flowchart of a best-first search that advances the calculation of Expression (2) in the dimensional direction.
FIG. 6 is a graph showing the state of hypotheses left in the stack at the end of the search together with their development history.
FIG. 7 is a flowchart for searching.
FIG. 8 is a functional block diagram for searching.
FIG. 9 is a table showing a part of a distribution function table of chi-square distribution with n degrees of freedom.
FIG. 10 is a flowchart of a best-first search for determining S element distributions in descending order of evaluation values for an input vector of dimensionality K;
FIG. 11 is a flowchart of classification of a one-dimensional distribution.
FIG. 12 is an explanatory diagram showing an average value on a number line and a state of division.
FIG. 13 is a flowchart of search.
FIG. 14 is a functional block diagram for searching.
FIG. 15 is a flowchart of a best-first search.
FIG. 16 is an explanatory diagram illustrating a concept for determining a threshold based on a value when a distribution function of a chi-square distribution reaches a certain numerical value.
FIG. 17 is a flowchart of search.
FIG. 18 is a flowchart of a best first search.
FIG. 19 is a configuration diagram of encoding / decoding using vector quantization.
FIG. 20 is a flowchart when a code vector in vector quantization is obtained by a best-first search.
FIG. 21 is a configuration diagram of a character recognition device.
FIG. 22 is a flowchart showing the operation of the category search unit.
FIG. 23 is a flowchart showing frame processing.
FIG. 24 is a flowchart of search.
FIG. 25 is a flowchart of search.
FIG. 26 is a block diagram of a voice recognition device.
FIG. 27 is a block diagram showing a speech recognition apparatus using a mixed continuous distribution HMM.
FIG. 28 is an explanatory diagram of a mixed continuous distribution HMM.
FIG. 29 is an explanatory diagram of route calculation.
[Explanation of symbols]
2 speech analysis means, 4 probability calculation means (score calculation means), 5 route calculation means (score calculation means), 6 optimum route determination means (recognition result selection means).

Claims

In an element distribution search method for searching for an element probability distribution indicating the maximum probability density value among a plurality of multidimensional probability distributions for a multidimensional input vector and searching for the probability density of the element probability distribution, Based on the probability value of the one-dimensional continuous distribution existing in each dimension of each element distribution for each dimension value from the first dimension to the developed k-th dimension of the multi-dimensional input vector, the first dimension to the k-th A process of calculating a cumulative value up to a dimension, a process of selecting an element distribution having the maximum cumulative value up to the k-th dimension, and increasing the number of already calculated dimensions for the selected element distribution. A recalculation process, and performing a best-first search in a dimensional direction until the number of developed dimensions matches the number of dimensions of an input vector.

A search is made for an element probability distribution showing the maximum probability density value among a plurality of multidimensional probability distributions with respect to a multidimensional input vector of dimension number K, and an element distribution for searching the probability density of the element probability distribution is searched. In the search method, based on a probability value of a one-dimensional continuous distribution existing in each dimension of each element distribution with respect to each dimension value from the first dimension of the multi-dimensional input vector of dimension number K to the developed k-th dimension. The process of calculating the evaluation value from the cumulative value up to the k-th dimension and the estimated value of the cumulative value from the (k + 1) -th dimension to the K-th dimension, and the process of selecting the element distribution having the maximum evaluation value And a process of recalculating the evaluation value by increasing the calculated number of dimensions for the selected element distribution, and performing a best-first search in a direction in which the number of expanded dimensions increases. To search element distribution .

Based on the independent variable value of the chi-square distribution when the distribution function value of the chi-square distribution with degrees of freedom K−k reaches a predetermined value, the estimated value of the probability value from the (k + 1) -th dimension to the K-th dimension is calculated. The element distribution search method according to claim 2, wherein calculation is performed.

3. The element distribution search method according to claim 2, wherein an estimated value of cumulative probability values from the (k + 1) th dimension to the Kth dimension is calculated based on a distribution of values indicated by learning speech data.

The element distribution is divided into a plurality of clusters, and for each cluster, an estimated value of cumulative probability values in the (k + 1) -th dimension to the K-th dimension is calculated based on the distribution of values indicated by the speech data for learning. The element distribution search method according to claim 2.

6. A plurality of solutions are obtained from the beginning by a best-first search, and an element distribution having a plurality of higher evaluation values is obtained from the plurality of solutions. The element distribution search method according to claim 1.

Divide the estimated value of the cumulative probability value from the (k + 1) th dimension to the Kth dimension, and divide the one-dimensional distribution of each dimension in each element distribution into a plurality of sections based on the arrangement on the number line of the average value of each distribution 3. The element distribution search method according to claim 2, wherein the calculation is performed based on the inequality for each dimension obtained for each section and the numerical value of each dimension of the input vector.

The estimated distribution of the probability values in the (k + 1) -th dimension to the K-th dimension is divided into a plurality of clusters, the inequalities obtained for each dimension for each of the plurality of clusters, and each dimension of the input vector The element distribution search method according to claim 2, wherein the calculation is performed based on a numerical value.

The search of the said hypothesis is stopped when the threshold value of a search is provided and the accumulation value or evaluation value of the search to the k-th dimension is less than the threshold value. A method for searching an element distribution according to item 1.

A search threshold value is provided depending on the element distribution, and the search for the hypothesis is terminated when the accumulated value or evaluation value of the search up to the k-th dimension is lower than the threshold value. The element distribution search method according to any one of 8.

As a search threshold, the number of dimensions of the input vector is K, and a value calculated based on the value of the independent variable when the distribution function of the chi-square distribution with the degree of freedom K−k reaches a certain value is used. The element distribution search method according to claim 9, wherein the element distribution is searched.

The element distribution search method according to any one of claims 1 to 10, wherein a best-first search for increasing the number of dimensions by two or more is performed when the number of developed dimensions is increased.

The input speech is analyzed and converted into a multidimensional input vector, and the recognition unit score is calculated based on a mixed continuous probability distribution consisting of multiple multidimensional continuous distributions. in speech recognition method for outputting the most scores series of high recognition unit of the accumulated value in units of sequence as the recognition result, the dimension for the input vector of the multi-dimensional, the best first search evaluation value maximum element distribution The probability of the element probability distribution that executes in the direction and takes the maximum evaluation value uses the probability calculated as a result of the best first search, and for the other element probability distribution probabilities, the evaluation value up to the middle dimension of the best first search is used. A speech recognition method, wherein a recognition result is determined by executing a score calculation of a recognition unit.

The input speech is analyzed and converted into a multidimensional input vector, and the recognition unit score is calculated based on a mixed continuous probability distribution consisting of multiple multidimensional continuous distributions. In a speech recognition method for outputting a recognition unit sequence having the highest accumulated score value among unit sequences as a recognition result, N branch probability densities from the one with the highest evaluation value for the multidimensional input vector Best first search run dimension direction, probability elements probability distribution taking N number of branch density from the larger evaluation value N elements distributed from the largest probability calculated as a result of the best first search For other element distribution probabilities, the recognition result is determined by executing the score calculation of the recognition unit using the evaluation values up to the middle dimension of the best-first search. Speech recognition method according to claim.

The input speech is analyzed and converted into a multidimensional input vector, and the recognition unit score is calculated based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions for the multidimensional input vector. The second round finds the estimated value of the relaxed path in the series of recognition units, and the second round has the highest score among the complete series of recognition units based on the estimated score obtained in the first round In the multi-pass speech recognition method obtained by searching for a sequence of recognition units, when calculating an estimated score of a loosened path constraint in the first recognition unit sequence, the multidimensional input vector in contrast, run best first search of the evaluation values up elements distributed in the dimension direction, uses the calculated probabilities as a result of the best first search probability of the evaluation value maximum element distribution, other elements partial Speech recognition method multipath and determining a recognition result by executing the scoring recognition unit by using the evaluation value of the halfway dimension best first search for the probability.

The input speech is analyzed and converted into a multidimensional input vector, and the recognition unit score is calculated based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions for the multidimensional input vector. The second round finds the estimated value of the relaxed path in the series of recognition units, and the second round has the highest score among the complete series of recognition units based on the estimated score obtained in the first round In the multi-pass speech recognition method obtained by searching for a sequence of recognition units, when calculating an estimated score of a loosened path constraint in the first recognition unit sequence, the multidimensional input vector against it, from the largest evaluation value running best first search of the N branch probability density dimension direction, of the element distribution taking N number of branch density from the largest of the evaluation value probabilities of the best first search Using the probabilities of the N element distributions from the larger evaluation value as a result, the other element distribution probabilities are used to calculate the recognition unit score using the evaluation values up to the middle dimension of the best-first search. And determining a recognition result.

Speech analysis means for analyzing input speech and converting it to a multidimensional input vector, and a mixed continuous probability distribution consisting of a plurality of multidimensional continuous distributions for the multidimensional input vector converted by the speech analysis means. A voice comprising: a score calculating means for calculating a recognition unit score based on the recognition unit; and a recognition result selecting means for selecting a recognition unit series having the highest score among the recognition unit series and outputting the series as a recognition result. In the recognition apparatus, the score calculation means performs a best-first search for the element distribution having the maximum evaluation value in the dimension direction with respect to the multidimensional input vector, and the probability of the element distribution having the maximum evaluation value is the best first. The probabilities calculated as the search results are used, and the probabilities of other element distributions are determined using the evaluation values up to the middle dimension of the best-first search. Speech recognition apparatus and performing core calculations.

Speech analysis means for analyzing input speech and converting it into a multidimensional input vector, and a mixed continuous probability distribution consisting of a plurality of multidimensional continuous distributions for the multidimensional input vector converted by the speech analysis means. And a score calculation means for calculating a score of the recognition unit based on the recognition unit, and a recognition result selection means for selecting the recognition unit series having the highest score among the recognition unit series and outputting the series as a recognition result. In the speech recognition apparatus, the score calculation means performs a best-first search of N element distributions in the dimensional direction from the one with the larger evaluation value with respect to the multidimensional input vector, and from the one with the larger evaluation value. The probabilities of N element distributions use the probabilities of N element distributions from the larger evaluation value calculated as a result of the best-first search, and the probabilities of other element distributions. Speech recognition apparatus and performing score calculation of recognition units using the evaluation value of the halfway dimension best first search Te.

The input speech is analyzed and converted into a multidimensional input vector, and the recognition unit score is calculated based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions for the multidimensional input vector. The second round finds the estimated value of the relaxed path in the series of recognition units, and the second round has the highest score among the complete series of recognition units based on the estimated score obtained in the first round In a multi-pass speech recognition device that searches and obtains a sequence of recognition units, when calculating an estimated value of a relaxed path constraint in the first sequence of recognition units, the multi-dimensional input vector in contrast, run best first search of the evaluation values up elements distributed in the dimension direction, is using the probability calculated as a result of the best first search probability of the element distribution taking the evaluation value maximum, other essential Speech recognition apparatus of a multi-path and determining a recognition result by executing the scoring recognition unit by using the evaluation value of the halfway dimension best first search for probability distributions.

The input speech is analyzed and converted into a multidimensional input vector, and the recognition unit score is calculated based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions for the multidimensional input vector. The second round finds the estimated value of the relaxed path in the series of recognition units, and the second round has the highest score among the complete series of recognition units based on the estimated score obtained in the first round In a multipath speech recognition apparatus that searches for a sequence of recognition units and calculates an estimated value of a relaxed path constraint in the first recognition unit sequence, the multidimensional input vector in contrast, the best first search of N elements distributed executed dimension direction from the larger evaluation value, the probability of N elements distributed from the largest of the evaluation values is calculated as a result of the best first search Using the probabilities of N element distributions from the largest evaluation value, and for the other element distribution probabilities, the recognition result is determined by executing the score calculation of the recognition unit using the evaluation values up to the middle dimension of the best-first search. A multipath speech recognition apparatus characterized by:

Analyzes input speech and converts it to a multidimensional input vector, and evaluates the score of the recognition unit expressed by a mixed continuous probability distribution model consisting of multiple multidimensional continuous distributions for the multidimensional input vector The best-first search of the element distribution with the maximum value is executed in the dimension direction, and the probability of the element distribution taking the maximum evaluation value is the probability of the element distribution calculated as a result of the best-first search. For the probability, the score of the recognition unit is calculated using the evaluation value up to the middle dimension of the best-first search, and based on the result of the score calculation, the sequence of the recognition unit with the highest score among the recognition unit sequences. A recording medium on which a program for selecting and determining a recognition result of the input voice is recorded.

Analyzes input speech and converts it to a multidimensional input vector, and evaluates the score of the recognition unit expressed by a mixed continuous probability distribution model consisting of multiple multidimensional continuous distributions for the multidimensional input vector The best first search of N element distributions from the largest value is executed in the dimension direction, and the probability of the element distribution taking N evaluation values from the largest evaluation value is calculated as a result of the best first search. Using the probability of N element distributions from the one with the largest evaluation value, and for the other element distribution probabilities, the score calculation of the recognition unit is executed using the evaluation values up to the middle dimension of the best-first search. A recording medium on which a program for selecting a recognition unit sequence having the highest score among recognition unit sequences based on the result and determining a recognition result of the input speech is recorded.

Analyzes the input speech and converts it to a multidimensional input vector, and calculates an estimated score for the recognition unit based on a mixed continuous probability distribution consisting of multiple multidimensional continuous distributions for that multidimensional input vector The first time obtains the estimated value of the relaxed path in the recognition unit series, and the second time obtains the complete series of recognition units based on the estimated score obtained in the first time. Searching for a sequence of recognition units with the highest score among them, and calculating the estimated score of the first recognition unit sequence with relaxed path constraints, the multi-dimensional input vector Best first search evaluation value largest branch probability density executed dimension direction Te, probability of its probability of the element distribution taking evaluation value maximum using the probability calculated as a result of the best first search, other elements distribution For running the score calculation of the recognizing unit by using the evaluation value of the halfway dimension best first search is a recording medium having a program recorded thereon for determining a recognition result.

The input speech is analyzed and converted into a multidimensional input vector, and the recognition unit score is calculated based on a mixed continuous probability distribution composed of a plurality of multidimensional continuous distributions for the multidimensional input vector. The second round finds the estimated value of the relaxed path in the series of recognition units, and the second round has the highest score among the complete series of recognition units based on the estimated score obtained in the first round When calculating the estimated value of the score that relaxed the path restriction in the first recognition unit sequence by searching for the recognition unit sequence, the evaluation value is large for the multidimensional input vector. The best first search of the N branch probability density from the direction is executed in the dimensional direction, and the probability of N element distributions from the one with the largest evaluation value is from the one with the largest evaluation value calculated as a result of the best first search. N key points A record that records the program for determining the recognition result by executing the score calculation of the recognition unit using the evaluation value up to the middle dimension of the best-first search for the probability of other element distributions using the probability of distribution Medium.