JPH0458636B2


Info

Publication number: JPH0458636B2
Application number: JP59129854A
Authority: JP
Inventors: Takao Irumano; Hisanori Kanezashi; Kunio Akiba
Original assignee: Matsushita Communication Industrial Co Ltd
Current assignee: Panasonic Mobile Communications Co Ltd
Priority date: 1984-06-22
Filing date: 1984-06-22
Publication date: 1992-09-18
Also published as: JPS617894A

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a speech recognition method that recognizes words by matching input speech against a word dictionary written in phonemic notation.

Configuration of the Conventional Example and Its Problems

A conventional speech recognition method will be described with reference to the drawing. In the figure, the word dictionary lists every word to be recognized in phonemic notation. For example, Asahi, Yuuki, and Yuki are written /ASAHI/, /JUUKI/, and /JUKI/. A standard pattern is prepared in advance for each phoneme, for example through preliminary experiments.
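As a minimal sketch of the dictionary as a data structure, assuming (hypothetically) that every phoneme in this notation is a single letter, each word's phoneme sequence can be obtained by splitting its notation into characters. `WORD_DICTIONARY` and `phoneme_inventory` are illustrative names, not from the patent; the inventory is the set of phonemes for which standard patterns would have to be prepared.

```python
# Hypothetical word dictionary: each recognizable word maps to its phoneme
# sequence. Assumes every phoneme in this notation is a single letter.
WORD_DICTIONARY: dict[str, list[str]] = {
    "Asahi": list("ASAHI"),
    "Yuuki": list("JUUKI"),
    "Yuki": list("JUKI"),
}


def phoneme_inventory(dictionary: dict[str, list[str]]) -> set[str]:
    """Return every phoneme that needs a standard pattern."""
    return {ph for phonemes in dictionary.values() for ph in phonemes}


print(sorted(phoneme_inventory(WORD_DICTIONARY)))  # ['A', 'H', 'I', 'J', 'K', 'S', 'U']
```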

Next, the operation of the conventional example will be described. The input speech is analyzed in 10 ms frames, parameters are extracted from each frame, and a parameter time series is built. The parameters are computed in advance. A similarity is then computed for each dictionary item. During this computation, the input is segmented into phonemes according to the dictionary phoneme sequence of that item, and for each segment a likelihood is computed: a measure of how probable it is that the segmented speech interval is an utterance of that phoneme. The similarity of the item is the average of the likelihoods of its phonemes, and the dictionary item with the highest similarity is taken as the recognized word.
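The matching step above reduces to an average and an argmax. A minimal sketch, assuming the per-phoneme likelihoods for each dictionary item have already been computed from the segmentation; the numbers are made up for illustration, and `similarity` and `recognize` are hypothetical names.

```python
from statistics import mean


def similarity(likelihoods: list[float]) -> float:
    """Similarity of one dictionary item: mean of its per-phoneme likelihoods."""
    return mean(likelihoods)


def recognize(likelihoods_by_item: dict[str, list[float]]) -> str:
    """Return the dictionary item whose similarity is largest."""
    return max(likelihoods_by_item, key=lambda item: similarity(likelihoods_by_item[item]))


# Illustrative (not measured) per-phoneme likelihoods for one utterance.
example = {
    "ASAHI": [2.0, 1.5, 3.0, 2.5, 1.0],
    "JUUKI": [8.0, 7.5, 7.5, 9.0, 8.5],
    "JUKI": [8.0, 6.0, 9.0, 8.5],
}
print(recognize(example))  # JUUKI
```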

A long vowel, such as the /UU/ in /JUUKI/, is handled as a unit: because the boundary between the two /U/s usually cannot be found, /UU/ is segmented as a single interval and its likelihood is also computed as a single unit. A long vowel such as /UU/ could instead be regarded as one long-vowel phoneme, but this conventional example treats it as two consecutive /U/ phonemes, so its likelihood value is computed as covering two phonemes.
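One way to express this handling of long vowels, as a sketch under assumptions: runs of the same vowel are merged into one segmentation unit that records how many phonemes it stands for, and that count is used as a weight when averaging. The weighting is only one reading of "a likelihood covering two phonemes"; the patent gives no exact formula, and the function names are hypothetical.

```python
from itertools import groupby

VOWELS = set("AIUEO")


def segmentation_units(phonemes: list[str]) -> list[tuple[str, int]]:
    """Merge a run of identical vowels (a long vowel such as /UU/) into one unit.

    Each unit is (phoneme, count); /UU/ becomes ("U", 2). Consonants are not merged.
    """
    units: list[tuple[str, int]] = []
    for ph, run in groupby(phonemes):
        n = len(list(run))
        if ph in VOWELS:
            units.append((ph, n))
        else:
            units.extend((ph, 1) for _ in range(n))
    return units


def weighted_similarity(units: list[tuple[str, int]], unit_likelihoods: list[float]) -> float:
    """Average likelihood where a long-vowel unit counts as `count` phonemes (assumed weighting)."""
    total = sum(lik * n for (_, n), lik in zip(units, unit_likelihoods))
    return total / sum(n for _, n in units)


units = segmentation_units(list("JUUKI"))
print(units)  # [('J', 1), ('U', 2), ('K', 1), ('I', 1)]
print(weighted_similarity(units, [1.0, 2.0, 3.0, 4.0]))  # 2.4
```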

In this conventional example, the likelihood $l_i$ of the i-th phoneme in the dictionary phoneme sequence is given by

$$l_i = l_{iO} - l_{iP}$$

Here $l_{iO}$ is a measure of how well the parameters in each frame of the segmented interval match the standard pattern of that phoneme, and $l_{iP}$ is the deduction applied to the likelihood when the segmented interval is too long or too short. Because the likelihood of a long vowel is computed over several phonemes together, as described above, its formula differs slightly from the equation above, but the principle is the same: the likelihood is determined by the degree of match with the standard pattern minus a deduction based on length.

The words /JUUKI/ and /JUKI/ differ only in whether /U/ is long or short. To distinguish such words, this conventional example deducts from the likelihood of a long vowel (here /UU/) when the length of its interval is shorter than a predetermined threshold, and deducts from the likelihood of an ordinary short vowel (here /U/) when the length of its interval is longer than the threshold.
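A sketch of the conventional vowel-length rule, with durations counted in 10 ms frames. The threshold value, the linear penalty slope, and the function names are assumptions; the patent states only that a deduction is made when a long vowel is shorter, or a short vowel longer, than a predetermined threshold. The match score stands in for $l_{iO}$ and the returned penalty for $l_{iP}$; length penalties for non-vowel phonemes are not modeled.

```python
THRESHOLD = 12           # assumed long/short boundary, in 10 ms frames
PENALTY_PER_FRAME = 1.0  # assumed deduction per frame beyond the threshold


def vowel_length_penalty(length: int, is_long_vowel: bool) -> float:
    """l_iP for a vowel under the conventional rule (vowel interval only)."""
    if is_long_vowel and length < THRESHOLD:
        return PENALTY_PER_FRAME * (THRESHOLD - length)  # long vowel too short
    if not is_long_vowel and length > THRESHOLD:
        return PENALTY_PER_FRAME * (length - THRESHOLD)  # short vowel too long
    return 0.0


def vowel_likelihood(match_score: float, length: int, is_long_vowel: bool) -> float:
    """l_i = l_iO - l_iP, with match_score playing the role of l_iO."""
    return match_score - vowel_length_penalty(length, is_long_vowel)


print(vowel_likelihood(10.0, 8, is_long_vowel=True))    # 6.0
print(vowel_likelihood(10.0, 8, is_long_vowel=False))   # 10.0
print(vowel_likelihood(10.0, 16, is_long_vowel=False))  # 6.0
```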

However, this conventional example has the following drawback. Suppose the input word is /JUUKI/ and the dictionary phoneme sequence is also /JUUKI/. When segmenting /J/, the point of large parameter change is taken as the boundary between /J/ and /U/, but the /J/ interval often becomes very long and the /UU/ interval correspondingly short. This is because, when a semivowel is followed by a long vowel, the parameter changes characteristic of the semivowel persist longer than when it is followed by a short vowel. In other words, to the ear the /UU/ in /JUU/ is clearly longer than the /U/ in /JU/, yet the /UU/ in /JUU/ is not simply a stretched /U/: the character of /J/ is prolonged. Consequently, when /J/ is segmented as described above, the extra length of /JUU/ relative to /JU/ is absorbed by the lengthened /J/, and the remaining /UU/ part is short for a long vowel. In that case, for the input /JUUKI/, the likelihood of /UU/ in the dictionary item /JUUKI/ is lowered by the too-short deduction, whereas the /U/ in the dictionary item /JUKI/ receives no deduction and so has a high likelihood. The similarity for /JUKI/ therefore becomes larger, and the word is misrecognized as /JUKI/.
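The failure can be reproduced with made-up numbers. Assume (hypothetically) that every segment matches its standard pattern equally well (score 10), that /J/ was segmented at 14 frames leaving 8 frames for the vowel, and that the conventional threshold is 12 frames with a deduction of 1 per frame. For simplicity each segment contributes one term to the average.

```python
THRESHOLD = 12  # assumed conventional threshold, in 10 ms frames
MATCH = 10.0    # assumed equal match score (l_iO) for every segment

J_FRAMES = 14      # hypothetical: /J/ absorbed part of the long vowel (ignored by this rule)
VOWEL_FRAMES = 8   # hypothetical: what is left for /UU/ (or /U/)


def conventional_penalty(length: int, is_long: bool) -> float:
    """Deduction based on the vowel interval alone."""
    if is_long and length < THRESHOLD:
        return float(THRESHOLD - length)
    if not is_long and length > THRESHOLD:
        return float(length - THRESHOLD)
    return 0.0


def item_similarity(is_long: bool) -> float:
    vowel = MATCH - conventional_penalty(VOWEL_FRAMES, is_long)
    segments = [MATCH, vowel, MATCH, MATCH]  # /J/, vowel, /K/, /I/
    return sum(segments) / len(segments)


scores = {"JUUKI": item_similarity(True), "JUKI": item_similarity(False)}
print(scores)                       # {'JUUKI': 9.0, 'JUKI': 10.0}
print(max(scores, key=scores.get))  # JUKI (misrecognized)
```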

Object of the Invention

The present invention eliminates the above drawback of the conventional example. Its object is to improve the accuracy of the similarity calculation and thereby improve the word recognition rate.

Structure of the Invention

To achieve the above object, the present invention judges whether the length of a long or short vowel following a semivowel is appropriate using the sum of that vowel's interval length and the interval length of the preceding semivowel. Thus, even if the interval of a long vowel is short, no too-short deduction is applied when its sum with the preceding semivowel interval is long; conversely, even if the interval of a short vowel is short, a deduction is applied when its sum with the preceding semivowel interval is long. This improves the accuracy of the likelihood and similarity calculations.

Description of the Embodiment

An embodiment of the present invention is described below with reference to the drawing. The basic configuration of this embodiment, including the word dictionary and the standard patterns of the phonemes, is the same as that of the conventional example.

The operation of this embodiment will now be described. It differs from the conventional example only in how the likelihood of a vowel following a semivowel is computed; everything else is the same, so only that likelihood calculation is described. Here, "semivowel" covers not only semivowels at the beginning of a word or between vowels, but also the semivowel part of a palatalized syllable (yōon), such as the /J/ in rya (/RJA/). As in the conventional example, the input speech is segmented according to the dictionary phoneme sequence, and the likelihood is obtained from the equation $l_i = l_{iO} - l_{iP}$ given above or, for a long vowel, from a formula based on it. In this embodiment, however, the deduction, that is, the way $l_{iP}$ is determined, differs from the conventional example.

When computing the likelihood of a long vowel following a semivowel, the too-short deduction applies a threshold not to the length of the long-vowel interval alone but to the sum of the lengths of the long-vowel interval and the preceding semivowel interval, and the deduction is made when that sum is shorter than the threshold. Likewise, when computing the likelihood of a short vowel following a semivowel, the too-long deduction applies a threshold not to the length of the vowel interval alone but to the sum of the lengths of the short-vowel interval and the preceding semivowel interval, and the deduction is made when that sum is longer than the threshold.
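A minimal sketch of the embodiment's rule, assuming durations in 10 ms frames, a hypothetical threshold of 18 frames on the semivowel-plus-vowel sum, a linear deduction of 1 per frame, and illustrative function names. A vowel that does not follow a semivowel keeps the conventional per-vowel threshold (here an assumed 12 frames).

```python
from typing import Optional

VOWEL_THRESHOLD = 12     # assumed conventional threshold on the vowel alone (frames)
SUM_THRESHOLD = 18       # assumed threshold on semivowel + vowel (frames)
PENALTY_PER_FRAME = 1.0  # assumed deduction per frame


def _penalty(length: int, threshold: int, is_long_vowel: bool) -> float:
    if is_long_vowel and length < threshold:
        return PENALTY_PER_FRAME * (threshold - length)  # long vowel too short
    if not is_long_vowel and length > threshold:
        return PENALTY_PER_FRAME * (length - threshold)  # short vowel too long
    return 0.0


def vowel_length_penalty(vowel_len: int, is_long_vowel: bool,
                         semivowel_len: Optional[int] = None) -> float:
    """l_iP for a vowel; after a semivowel, judge the summed length instead."""
    if semivowel_len is None:
        return _penalty(vowel_len, VOWEL_THRESHOLD, is_long_vowel)
    return _penalty(semivowel_len + vowel_len, SUM_THRESHOLD, is_long_vowel)


# /J/ = 14 frames, vowel = 8 frames (sum 22):
print(vowel_length_penalty(8, True, semivowel_len=14))   # 0.0 (no too-short deduction)
print(vowel_length_penalty(8, False, semivowel_len=14))  # 4.0 (too-long deduction)
```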

An example of the effect of this embodiment follows, using the same input word /JUUKI/ shown in the figure as for the conventional example. For this input and the dictionary item /JUUKI/, segmentation gave the same result as in the conventional example: the /J/ interval was long, and the /UU/ interval was short for a long vowel. However, because this embodiment looks at the sum of the /UU/ interval and the /J/ interval, and the /J/ interval was long, no too-short deduction was applied to /UU/. For the same input /JUUKI/ and the dictionary item /JUKI/, segmentation assigned /J/ the same interval as the /J/ of /JUUKI/, and /U/ the same interval as the /UU/ of /JUUKI/. The length of this /U/ was standard for a short vowel /U/. However, since this embodiment looks at the sum of the /U/ interval and the /J/ interval, and the /J/ interval was very long, the sum exceeded the threshold and the likelihood of /U/ received a too-long deduction. As a result, for the input /JUUKI/, the similarity of the dictionary item /JUUKI/ was larger than in the conventional example by the amount of the /UU/ too-short deduction that was no longer applied, while the similarity of the dictionary item /JUKI/ was smaller than in the conventional example by the amount of the /U/ too-long deduction. The word was therefore correctly recognized as /JUUKI/.
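The worked example above can be checked numerically by running both rules on the same segmentation. All numbers are hypothetical: an equal match score of 10 for every segment, /J/ at 14 frames, 8 frames left for the vowel, a conventional threshold of 12 frames, a summed threshold of 18 frames, and a deduction of 1 per frame.

```python
# Hypothetical values, not from the patent.
MATCH = 10.0
J_FRAMES, VOWEL_FRAMES = 14, 8
VOWEL_THRESHOLD, SUM_THRESHOLD = 12, 18


def penalty(length: int, threshold: int, is_long: bool) -> float:
    if is_long and length < threshold:
        return float(threshold - length)
    if not is_long and length > threshold:
        return float(length - threshold)
    return 0.0


def similarity(is_long: bool, use_sum: bool) -> float:
    if use_sum:  # embodiment: judge /J/ + vowel together
        p = penalty(J_FRAMES + VOWEL_FRAMES, SUM_THRESHOLD, is_long)
    else:        # conventional: judge the vowel interval alone
        p = penalty(VOWEL_FRAMES, VOWEL_THRESHOLD, is_long)
    segments = [MATCH, MATCH - p, MATCH, MATCH]  # /J/, vowel, /K/, /I/
    return sum(segments) / len(segments)


def recognize(use_sum: bool) -> str:
    scores = {"JUUKI": similarity(True, use_sum), "JUKI": similarity(False, use_sum)}
    return max(scores, key=scores.get)


print(recognize(use_sum=False))  # JUKI  (conventional)
print(recognize(use_sum=True))   # JUUKI (embodiment)
```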

In this embodiment, the too-short deduction for a long vowel following a semivowel and the too-long deduction for a short vowel following a semivowel are made by applying a threshold to the sum of the lengths of the vowel interval and the preceding semivowel interval. This allows the likelihoods of long and short vowels following a semivowel to be penalized appropriately and improves the accuracy of the likelihood calculation.

Effects of the Invention

With the configuration described above, the present invention prevents a long vowel following a semivowel from being accepted when too short, and a short vowel following a semivowel from being accepted when too long, by applying a threshold to the sum of the vowel interval length and the preceding semivowel interval length and deducting from the likelihood accordingly. This makes the similarity calculation better at discriminating long vowels from short vowels, yielding a high word recognition rate.

[Brief explanation of the drawing]

The figure shows the speech recognition method of the conventional example and of an embodiment of the present invention.

Claims


1. A speech recognition method in which input speech is matched against the dictionary items of a word dictionary that lists the words to be recognized in phonemic notation; the input speech is segmented into phonemes according to the dictionary phoneme sequence of each dictionary item; a likelihood, a measure of how probable it is that the phoneme was uttered in each segmented interval, is computed; the similarity between each dictionary item and the input speech is obtained from these likelihood values; and the input word is thereby recognized; the method being characterized in that a long vowel following a semivowel is restricted from being too short, and a short vowel following a semivowel is restricted from being too long, on the basis of the sum of the length of that long or short vowel and the length of the preceding semivowel.