JPH0634192B2 - Voice recognizer

Info

Publication number: JPH0634192B2
Application number: JP59108667A
Authority: JP
Inventors: 敦子広田; 陽一山田; 裕飯塚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1984-05-30
Filing date: 1984-05-30
Publication date: 1994-05-02
Anticipated expiration: 2009-05-02
Also published as: JPS60254099A

Description

[Detailed Description of the Invention]
(Technical Field) The present invention relates to a speech recognition device, and in particular to an improved method of matching speech data for better recognition performance.

(Background Art) A conventional speech recognition device is configured as shown in FIG. 1, in which 1 is an input terminal, 2 is a frequency analysis unit, 3 is a spectrum conversion unit, 4 is a speech section determination unit, 5 is a dissimilarity calculation unit, 6 is a standard speech spectrum pattern memory, 7 is a decision unit, and 8 is a recognition result output terminal.

In the conventional device, the dissimilarity Dk between the spectrum-converted input speech spectrum pattern and each standard spectrum pattern k (k = 1 to K) is calculated by Equation (1), where A(m, n) is the element of channel m at the n-th time sample point of the input spectrum pattern and Sk(m, n) is the corresponding element of standard spectrum pattern k. The category of the standard spectrum pattern that minimizes Dk among the K patterns is taken as the recognition result. Various methods exist for calculating the weight Wk(m, n); they are omitted here because they are not the subject of this invention.

In the conventional device, spectrum conversion completely discards the power information of the input speech. As a result, "ichi" (one) may be misrecognized as "ni" (two), or "go" (five) as "roku" (six).

FIG. 2 shows example sonagrams of the speech patterns "ichi", "ni", "go", and "roku". In FIG. 2 the horizontal direction is the frequency axis and the vertical direction is the time axis. After spectrum conversion, "ichi" and "ni", and likewise "go" and "roku", become quite similar patterns. The main difference between them is the silent interval between "i" and "chi" and between "ro" and "ku", but because the power information has been lost this difference is not available to the matcher. Misrecognition therefore occurs, which lowers the recognition rate.

(Object of the Invention) The object of this invention is to overcome these drawbacks and provide a speech recognition device that improves the recognition rate by using the power information, in particular the silent-interval information, that spectrum conversion has conventionally discarded. The key point is to use the presence or absence of a valley in power within a word (hereinafter, a power dip) together with the conventional spectral distance in the recognition decision. This removes confusions such as "ichi" with "ni" and "roku" with "go", and is aimed especially at speech whose phonemic information is degraded, such as telephone speech.

FIG. 3 shows how power dips appear for different words.

(Structure of the Invention) According to the present invention, there is provided a speech recognition device comprising: means for frequency-analyzing an input speech signal; means for creating a power pattern of the analyzed input speech; means for creating a spectrum pattern normalized by the spectral slope of the analyzed input speech; means for matching previously prepared spectrum patterns of standard speech against the spectrum pattern of the input speech and calculating a spectral dissimilarity; means for detecting, from the power pattern, a power dip section in which the power reaches a local minimum within the speech section, and calculating a power dip dissimilarity for each recognition target category from the presence or absence of the power dip section and the size of the power dip; means for calculating, for each category, a total matching distance by adding the power dip distance representing the power dip dissimilarity to the spectral matching distance representing the spectral dissimilarity; and means for selecting as the recognition result the category whose total matching distance is smallest.

(Operation) The device detects power dip sections, in which the power reaches a local minimum within the speech section. From this detection, the presence or absence of a power dip section and the size of the power dip are obtained as features for each recognition target category. Because the dissimilarity is calculated by focusing only on the sections where power differs between categories, an accurate power dip dissimilarity can be calculated for each category, which greatly improves recognition performance.

(Embodiment) FIG. 4 is a block diagram showing one embodiment of the invention. In FIG. 4, 100 is an input terminal, 200 is a frequency analysis unit, 300 is a spectrum conversion unit, 400 is a speech section determination unit, and 500 is a resampling unit. 600 is a power dip calculation unit, which consists of power information memory unit 601, shift register 1 (602), shift register 2 (603), shift register 3 (604), shift register 4 (605), adder-subtractor 606, power differentiator 607, comparison unit 1 (608), comparison unit 2 (609), comparison unit 3 (610), DS determination unit 611, DE determination unit 612, power dip section determination unit 613, PA calculator 614, QA calculator 615, power dip evaluation value calculation unit 616, power dip distance calculation unit 617, comparison unit 4 (618), latch 619, division calculator 620, subtraction calculator 1 (621), subtraction calculator 2 (622), subtraction calculator 3 (623), subtraction calculator 4 (624), comparison unit 5 (625), comparison unit 6 (626), comparison unit 7 (627), comparison unit 8 (628), and OR gate 629. 700 is a power dip constant table, 800 is a total distance calculation unit, and 900 is a distance output terminal.

In this configuration, the input speech signal from input terminal 100 enters frequency analysis unit 200, where it is frequency-analyzed into quantized signals corresponding to a plurality of frequency bands, and is then sent to spectrum conversion unit 300. There the data is spectrum-converted into normalized spectrum information and speech power information for each frame, which are sent to speech section determination unit 400 and resampling unit 500. Speech section determination unit 400 uses the speech power information to determine the start and end of the speech section and sends them to resampling unit 500 and power dip calculation unit 600. In resampling unit 500, the spectrum data and power data of the extracted speech section are time-normalized to 16 or 32 points, and only the power information is passed on to power dip calculation unit 600. The spectrum data is sent to a separate spectral dissimilarity calculation unit (not shown), where the spectral distance is obtained in the same way as in the conventional device of FIG. 1. That result is input at 1000 in FIG. 4.

Next, to calculate a distance from the power dip, the resampled power information written into power information memory unit 601 is transferred frame by frame, in order, to shift registers 602 to 605.

The main aim of the invention is to improve the recognition rate by converting the silent-interval characteristics, which differ from category to category, into a matching distance, using the presence or absence of a power dip within a word.

FIG. 5 shows a speech power pattern; the speech has been cut out as the section from the start frame (STFR) to the end frame (EDFR). The resampled speech power data for frame j (j = STFR to EDFR), transferred successively to shift registers 602 to 605, is sent to adder-subtractor 606 and then to power differentiator 607, which calculates the value DFPW(j) by Equation (1) below.

DFPW(j) = POW(j+2) + POW(j+1) - POW(j) - POW(j-1) ... (1)
(j = STFR to EDFR), where j starts from the start frame.
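
The four-point difference of Equation (1) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and clamping indices at the ends of the speech section is an assumption, since the text does not say how POW(j-1) and POW(j+2) are treated at the boundaries.

```python
def power_difference(pow_, j):
    """DFPW(j) = POW(j+2) + POW(j+1) - POW(j) - POW(j-1), as in Equation (1)."""
    last = len(pow_) - 1

    def at(i):
        # Assumption: frames outside the speech section reuse the nearest end value.
        return pow_[min(max(i, 0), last)]

    return at(j + 2) + at(j + 1) - at(j) - at(j - 1)


def power_difference_curve(pow_):
    """DFPW for every frame of a resampled power pattern (index 0 plays the role of STFR)."""
    return [power_difference(pow_, j) for j in range(len(pow_))]


if __name__ == "__main__":
    # Made-up power pattern with one valley in the middle.
    power = [10, 10, 4, 1, 1, 5, 10, 10]
    print(power_difference_curve(power))
```

DFPW(j) is negative where the power is falling and positive where it is rising, which is what the threshold comparisons that follow rely on.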

Next, the following comparisons are made, in order, to determine the start and end points of the power dip.

DFPW(j) calculated by Equation (1) is first compared with threshold THL1 in comparison unit 1 (608); the point at which DFPW(j) falls below THL1 is taken as j1. Starting from j1, the point at which DFPW(j) becomes greater than or equal to threshold THL2 is taken as j2. Starting from j2, the point at which DFPW(j) falls below threshold THL3 is taken as j3. If comparators 1 (608) to 3 (610) have already yielded a power dip section candidate, j1 is set to j3 and the same operation is repeated. If no power dip section candidate was obtained, j2 and j3 are determined anew.
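
The three-stage threshold scan for j1, j2, and j3 might look like the sketch below. The function name and the threshold values in the example are hypothetical; the text names THL1, THL2, and THL3 but gives no values for them.

```python
def find_dip_candidate(dfpw, thl1, thl2, thl3, start=0):
    """Return (j1, j2, j3) found by the three comparisons, or None if the scan fails."""

    def first(begin, predicate):
        # Index of the first frame at or after `begin` that satisfies the predicate.
        for j in range(begin, len(dfpw)):
            if predicate(dfpw[j]):
                return j
        return None

    j1 = first(start, lambda d: d < thl1)    # power starts to fall
    if j1 is None:
        return None
    j2 = first(j1, lambda d: d >= thl2)      # power starts to rise again
    if j2 is None:
        return None
    j3 = first(j2, lambda d: d < thl3)       # rise has ended
    if j3 is None:
        return None
    return j1, j2, j3


if __name__ == "__main__":
    dfpw = [-6, -15, -12, 1, 13, 14, 5, 0]
    # Hypothetical thresholds chosen only for this example.
    print(find_dip_candidate(dfpw, thl1=-5, thl2=5, thl3=5))
```

To look for a further candidate, the scan can be restarted with `start` set to the previous j3, which mirrors the "j1 = j3" step in the text.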

In FIG. 5, j1 = P1, j2 = P2, and j3 = P3, and these values are sent to DS determination unit 611 and DE determination unit 612. The data written in power information memory unit 601 is also sent to comparison unit 4 (618), where successive comparisons using comparison unit 4 (618) and latch 619 yield the maximum power PMAX, as given by Equation (2).

PMAX = MAX(POW(j)), j = STFR to EDFR ... (2)

The power values at j1, j2, j2-1, and j3, the points found above from Equation (1), are subtracted and compared by subtraction calculators 621 to 624 and comparison units 5 (625) to 8 (628), and the results pass through OR gate 629. Only when the output is "1" is a power dip section detected and reported to DS determination unit 611 and DE determination unit 612.

After the maximum power PMAX has been obtained from Equation (2), j1 to j3 is taken as a power dip section candidate if any of conditions 1) to 4) of Equation (4) below is satisfied.

1) POW(j1) - POW(j2) > PMAX/8
2) POW(j1) - POW(j2-1) > PMAX/8
3) POW(j3) - POW(j2-1) > PMAX/8
4) POW(j3) - POW(j2-1) > PMAX/8 ... (4)

Subtraction calculators 621 to 624 perform the subtractions on the left-hand sides of Equation (4), and division calculator 620 forms the right-hand side by dividing the maximum power PMAX from Equation (2) by 8. Comparison units 5 (625) to 8 (628) then compare the two sides. If any of 1) to 4) is satisfied, the section is regarded as a power dip section candidate via OR gate 629, and this information is sent to DS determination unit 611 and DE determination unit 612. DS determination unit 611 stores the power dip start point DS, and DE determination unit 612 stores the power dip end point DE.
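
A software equivalent of the subtract, compare, and OR stage could be sketched as follows. The function name is hypothetical. Conditions 3) and 4) are coded exactly as printed in Equation (4), where they are identical, and guarding the index j2-1 at zero is an assumption.

```python
def is_dip_candidate(pow_, j1, j2, j3):
    """True if any of conditions 1) to 4) of Equation (4) holds (the OR gate)."""
    pmax = max(pow_)        # Equation (2): maximum power in the speech section
    limit = pmax / 8        # right-hand side of Equation (4)
    j2_prev = max(j2 - 1, 0)  # assumption: clamp j2-1 at the first frame
    conditions = (
        pow_[j1] - pow_[j2] > limit,       # condition 1)
        pow_[j1] - pow_[j2_prev] > limit,  # condition 2)
        pow_[j3] - pow_[j2_prev] > limit,  # condition 3), as printed
        pow_[j3] - pow_[j2_prev] > limit,  # condition 4), as printed (same as 3)
    )
    return any(conditions)


if __name__ == "__main__":
    power = [10, 10, 4, 1, 1, 5, 10, 10]
    print(is_dip_candidate(power, j1=0, j2=4, j3=7))
```

Scaling the limit by PMAX makes the test relative to the loudness of the utterance, so a quiet recording and a loud one are judged by the same proportional drop.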

The values j1 and j3 and the power dip section detection data sent to DS determination unit 611 and DE determination unit 612 then pass through power dip section determination unit 613 to PA calculator 614 and QA calculator 615. When two power dip section candidates arise, for example P1 to P3 and P3 to P5 in FIG. 5, power dip section determination unit 613 treats P1 to P5 as a single power dip section if conditions 1) to 3) of Equation (5) are all satisfied:

1) POW(P1) - POW(P3) > 0
2) POW(P5) - POW(P3) > 0
3) POW(P1) + POW(P5) > 3 * POW(P3) ... (5)

In FIG. 5, the two sections D1 to D2 and D2 to D3 are power dip sections.
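
The merge test of Equation (5) for two adjacent candidates can be sketched as below. The function name and the sample power values are hypothetical.

```python
def should_merge(pow_, p1, p3, p5):
    """True if candidates P1..P3 and P3..P5 form one dip P1..P5 (all of Equation (5))."""
    return (
        pow_[p1] - pow_[p3] > 0                   # condition 1)
        and pow_[p5] - pow_[p3] > 0               # condition 2)
        and pow_[p1] + pow_[p5] > 3 * pow_[p3]    # condition 3)
    )


if __name__ == "__main__":
    # The shared point P3 (index 2) is far below both outer points: merge.
    print(should_merge([10, 2, 3, 2, 10], 0, 2, 4))
    # The shared point is close to the outer level: keep the two candidates separate.
    print(should_merge([4, 1, 3, 1, 4], 0, 2, 4))
```

Condition 3) is the strict one: the shared point must lie well below the average of the two outer points, so a small bump inside one deep valley does not split it into two dips.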

The data determined by power dip section determination unit 613 is sent to PA calculator 614 and QA calculator 615, which calculate the slope and intercept of the straight line connecting the power values at the start and end of the power dip section, shown as (i) in FIG. 5.

First, the straight line through the two points (DS, POW(DS)) and (DE, POW(DE)) is expressed by Equation (7).

(value of the line at frame j) = PA * j + QA ... (7)
Here PA is the slope of the line and QA is its intercept.

Equation (8) gives the slope PA of the line, and Equation (9) gives the intercept QA.
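
Because a line through two given points has a unique slope and intercept, PA and QA can be sketched directly from Equation (7). The function name is hypothetical, and the form in which Equations (8) and (9) are printed may differ from the two-point form used here, although it must give the same values.

```python
def dip_baseline(pow_, ds, de):
    """Slope PA and intercept QA of the line through (DS, POW(DS)) and (DE, POW(DE))."""
    if de == ds:
        raise ValueError("a dip section needs distinct start and end frames")
    pa = (pow_[de] - pow_[ds]) / (de - ds)  # rise over run between the two end points
    qa = pow_[ds] - pa * ds                 # chosen so that PA * DS + QA == POW(DS)
    return pa, qa


if __name__ == "__main__":
    power = [8, 2, 1, 2, 12]
    pa, qa = dip_baseline(power, ds=0, de=4)
    # The line reproduces both end points of the section.
    print(pa, qa, pa * 0 + qa, pa * 4 + qa)
```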

After the intercept and slope of the line have been obtained in this way, power dip evaluation value calculation unit 616 calculates an evaluation function value for the size of the power dip. The evaluation function value is defined in normalized form by Equation (10).

In the power dip evaluation function value PDV of Equation (10), C is a constant, with C = 2. SS represents the area of the power dip and is calculated by Equation (11).

WW represents the width of the power dip and is calculated by Equation (12).

WW = DE - DS + 1 ... (12)
That is, the larger the value of WW, the less likely the section is a power dip.

AA represents the slope of the power dip and is calculated by Equation (13).

AA = PA^2 + 1 ... (13)
The power dip size PDV calculated from Equation (10) as described above is sent to power dip distance calculation unit 617, which calculates the distance due to the power dip by Equation (14), giving a distance value that depends on the presence or absence of a power dip.

Equation (14-1) applies when a power dip is present, and Equation (14-2) when it is absent. C1, C2, and DBMAX are constants, with C1 = 3, C2 = 5, and DBMAX = 300. The values of CCP and CCN are supplied by power dip constant table 700.

FIG. 6 shows some of the CCP and CCN settings for each category.

CCP and CCN are thus determined in advance for each category according to the presence or absence of a power dip, the size of the power dip, and so on. CCP is set in the range 0 to 8: categories with a power dip, such as "ichi", "roku", "hachi", and "horyu", have CCP = 0, while categories without a power dip, such as "yon", "go", and "hai", have CCP = 8.

CCN is the reverse of CCP: categories without a power dip have CCN = 0, and categories with a power dip have CCN = 30.

If the result calculated by Equation (14) satisfies the condition of Equation (14-1), a power dip is judged to be present; if it satisfies the condition of Equation (14-2), a power dip is judged to be absent. The resulting value DBC is sent to total distance calculation unit 800. There the conventional spectral distance value supplied at spectral distance input 1000 is added to the power dip result DBC obtained from Equation (14), and the sum is output from distance output terminal 900.
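
The final stage, adding the power dip distance to the spectral distance for each category and choosing the smallest total, can be sketched as follows. The function name and all distance values are hypothetical, and the power dip distances are taken as given inputs because Equation (14) itself is not reproduced in this text.

```python
def recognize(spectral_distance, dip_distance):
    """Return (best_category, totals) where total = spectral distance + power dip distance."""
    totals = {
        category: spectral_distance[category] + dip_distance[category]
        for category in spectral_distance
    }
    # The recognition result is the category with the smallest total matching distance.
    best = min(totals, key=totals.get)
    return best, totals


if __name__ == "__main__":
    # Made-up numbers: on spectrum alone "ni" would narrowly win, but the input
    # contains a power dip, so "ni" (a category without one) is penalized.
    spectral = {"ichi": 120, "ni": 115}
    dip = {"ichi": 0, "ni": 30}
    print(recognize(spectral, dip))
```

In this made-up case the spectral distances alone would select "ni", and the added power dip term reverses the decision to "ichi", which is the kind of confusion the combined distance is meant to resolve.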

As described above, in addition to the usual spectral matching distance, the invention incorporates information on the presence or absence of a power dip in each category. This reduces confusion between categories that have a power dip, such as "ichi", "roku", and "hachi", and categories that do not, such as "ni", "yon", and "go", and thereby improves the recognition rate.

The results of a recognition experiment conducted to demonstrate the effectiveness of the invention are described next.

A recognition experiment was run on about 2,900 patterns of male speech data, using standard patterns created from about 5,800 patterns. The recognition rate rose from 96.61% for the conventional method without power dips to 97.80%, an improvement of slightly more than one percentage point. At the same time, the gap between the first- and second-ranked distances widened, indicating more stable recognition.
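
The reported rates can also be read as error rates. The short calculation below uses only the two published figures; the function name is hypothetical. It shows the gain in percentage points and the relative reduction in errors, which works out to roughly a third.

```python
def recognition_gain(rate_before, rate_after):
    """Gain in percentage points and relative error reduction between two recognition rates."""
    gain_points = rate_after - rate_before
    error_before = 100.0 - rate_before
    error_after = 100.0 - rate_after
    relative_error_reduction = (error_before - error_after) / error_before
    return gain_points, relative_error_reduction


if __name__ == "__main__":
    gain, reduction = recognition_gain(96.61, 97.80)
    print(f"gain: {gain:.2f} points, error reduction: {reduction:.1%}")
```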

(Effect of the Invention) As described in detail above, in addition to the usual spectral dissimilarity, the invention detects power dip sections, in which the power reaches a local minimum within the speech section, and calculates a power dip dissimilarity for each recognition target category from the presence or absence of a power dip section and the size of the power dip. Because the dissimilarity is calculated by focusing only on the sections where power differs between categories, an accurate power dip dissimilarity is obtained for each category, and recognition performance is greatly improved as a result.

[Brief description of drawings]

FIG. 1 is a block diagram of a conventional speech recognition device; FIG. 2 shows examples of speech patterns; FIG. 3 shows examples of power dips in words; FIG. 4 is a block diagram of one embodiment of the speech recognition device according to the invention; FIG. 5 shows the range over which a power dip is set; and FIG. 6 shows the power dip constant table.

100: input terminal; 200: frequency analysis unit; 300: spectrum conversion unit; 400: speech section determination unit; 500: resampling unit; 600: matching calculation unit; 601: power information memory unit; 602: shift register (J+2); 603: shift register (J+1); 604: shift register (J); 605: shift register (J-1); 606: adder-subtractor; 607: power differentiator; 608: comparison unit 1; 609: comparison unit 2; 610: comparison unit 3; 611: DS determination unit; 612: DE determination unit; 613: power dip section determination unit; 614: PA calculator; 615: QA calculator; 616: power dip evaluation value calculation unit; 617: adder; 618: comparison unit 4; 619: latch; 620: division calculator; 621: subtraction calculator; 622: subtraction calculator 2; 623: subtraction calculator 3; 624: subtraction calculator 4; 625: comparison unit 5; 626: comparison unit 6; 627: comparison unit 7; 628: comparison unit 8; 629: OR gate; 700: power dip constant table; 800: total distance calculation unit; 900: distance output terminal.

Claims

[Claims]

1. A speech recognition device characterized by comprising: means for frequency-analyzing an input speech signal; means for creating a power pattern of the analyzed input speech; means for creating a spectrum pattern normalized by the spectral slope of the analyzed input speech; means for matching previously prepared spectrum patterns of standard speech against the spectrum pattern of the input speech and calculating a spectral dissimilarity; means for detecting, from the power pattern, a power dip section in which the power reaches a local minimum within the speech section, and calculating a power dip dissimilarity for each recognition target category from the presence or absence of the power dip section and the size of the power dip; means for calculating, for each category, a total matching distance by adding the power dip distance representing the power dip dissimilarity to the spectral matching distance representing the spectral dissimilarity; and means for selecting as the recognition result the category for which the calculated total matching distance is smallest.