JPH0242238B2


Info

Publication number: JPH0242238B2
Application number: JP58025069A
Authority: JP
Priority date: 1983-02-16
Filing date: 1983-02-16
Publication date: 1990-09-21
Also published as: JPS59149400A

Description

[Detailed Description of the Invention] <Technical Field> The present invention relates to an improvement of the syllable boundary selection method in a voice input device and, more specifically, to a voice input device that can determine syllable boundaries according to the speaking rate.

<Prior Art> In general, in methods that extract syllable portions from continuously uttered speech and then identify them, the accuracy of the syllable segmentation greatly influences recognition performance.

Conventional segmentation methods have the problem that the number of segmentation errors changes when the speaking rate changes. This is because the segmentation algorithm is fixed regardless of the speaking rate.

<Purpose> The present invention has been made in view of the above points. Its object is to provide a voice input device that estimates the speaking rate of continuous speech and, from among the syllable boundary candidates output by the syllable boundary detection unit, determines the syllable boundaries based on the estimated speaking rate.

<Example> The present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the overall configuration of a voice input device embodying the present invention.

In FIG. 1, the speech analysis unit 1 extracts feature parameters such as the power p(t) and the spectrum y(t) from the speech signal at input time t. The feature parameters extracted by the speech analysis unit 1 are input to the speech rate detection unit 2, where the silent section detection unit 21 and the voiced section detection unit 22 distinguish voiced sections from silent sections based on, for example, the strength of the power p(t) of the input parameters.
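
The voiced/silent distinction described above can be sketched as follows. This is a minimal illustration, not the method of the specification: the fixed power threshold, the frame-indexed list `power`, and the function name `split_sections` are all assumptions, since the text says only that the decision is based on the strength of p(t).

```python
from typing import List, Tuple

# Hypothetical helper: the specification does not give a concrete rule,
# so a fixed power threshold is ASSUMED here for illustration.
def split_sections(power: List[float], threshold: float) -> List[Tuple[bool, int, int]]:
    """Group consecutive frames into runs of (is_voiced, start, end).

    A frame counts as voiced when its power p(t) is at least `threshold`.
    `end` is exclusive, so the run length in frames is end - start.
    """
    sections: List[Tuple[bool, int, int]] = []
    start = 0
    for t in range(1, len(power) + 1):
        # Close the current run at the end of input or when voicing flips.
        at_end = t == len(power)
        if at_end or (power[t] >= threshold) != (power[start] >= threshold):
            sections.append((power[start] >= threshold, start, t))
            start = t
    return sections


if __name__ == "__main__":
    p = [0.0, 0.1, 0.9, 0.8, 0.7, 0.05, 0.0, 0.6]
    for voiced, s, e in split_sections(p, threshold=0.5):
        print("voiced" if voiced else "silent", s, e)
```

In this sketch, the runs with `is_voiced` false play the role of the output of the silent section detection unit 21, and the runs with `is_voiced` true play the role of the output of the voiced section detection unit 22.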

In addition, the speech rate estimation unit 23 in the speech rate detection unit 2 estimates and outputs the average syllable length $\bar{L}$ based on the durations of the voiced sections in the speech input of a training sentence whose number of syllables is known.

That is, when the voice input device is used, the user first utters a training sentence whose number of syllables is known, and the speech rate estimation unit 23 estimates the average syllable length (the reciprocal of the average speaking rate).

Now, let L(i) (i = 1, 2, ..., m) be the duration of the i-th voiced section detected by the voiced section detection unit 22 when a sentence containing n syllables is uttered. The speech rate estimation unit 23 then calculates and outputs the average syllable length

$$\bar{L} = \frac{1}{n} \sum_{i=1}^{m} L(i)$$
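
The average syllable length computation can be sketched in a few lines: the sum of the voiced section durations divided by the known syllable count n. The function name and the millisecond units in the usage example are assumptions made for illustration.

```python
from typing import Sequence

def average_syllable_length(voiced_durations: Sequence[float], n_syllables: int) -> float:
    """Sum of the voiced section durations L(1..m) divided by n.

    `voiced_durations` holds the m voiced section durations measured while
    the user reads the training sentence; `n_syllables` is the known number
    of syllables in that sentence.
    """
    if n_syllables <= 0:
        raise ValueError("n_syllables must be positive")
    return sum(voiced_durations) / n_syllables


if __name__ == "__main__":
    # Hypothetical measurement: three voiced sections (ms), 12 syllables.
    print(average_syllable_length([600.0, 450.0, 750.0], 12))
```

Note that the sum runs over the m voiced sections while the divisor is the syllable count n, so silent pauses in the training utterance do not inflate the estimate.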

Based on the durations of the silent sections detected by the silent section detection unit 21, the bunsetsu (phrase) boundary detection unit 3 detects any silent section whose duration exceeds a predetermined length, regards that silent section as a bunsetsu boundary, and outputs that result.
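
The bunsetsu boundary rule amounts to a duration filter over the silent sections. The sketch below assumes silent sections are given as (start, end) pairs in a common time unit; the name `min_duration` stands in for the "predetermined length", whose actual value the specification does not state.

```python
from typing import List, Tuple

def bunsetsu_boundaries(
    silent_sections: List[Tuple[float, float]], min_duration: float
) -> List[Tuple[float, float]]:
    """Keep only the silent sections longer than `min_duration`.

    Each kept section is treated as a bunsetsu (phrase) boundary.
    The comparison is strict because the text says the duration must
    exceed the predetermined length.
    """
    return [(s, e) for (s, e) in silent_sections if (e - s) > min_duration]


if __name__ == "__main__":
    # Hypothetical silent sections (ms) and a hypothetical 20 ms threshold.
    print(bunsetsu_boundaries([(0, 5), (40, 70), (100, 108)], min_duration=20))
```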

The syllable boundary detection unit 4 takes as its unit the speech divided into bunsetsu by the bunsetsu boundary detection unit 3 and outputs syllable boundary candidates using the feature parameters extracted by the speech analysis unit 1 (the interval between syllable boundaries is the syllable length). As shown in FIG. 2, the syllable boundary detection unit 4 may clearly detect syllable boundaries at times t₁ and t₃ yet find it difficult to decide whether a syllable boundary exists at time t₂. In such a case, the final decision on the syllable boundary is made by the syllable boundary selection unit 5.

The syllable boundary selection unit 5 determines the syllable boundaries by comparing the syllable lengths of the syllable boundary candidates detected by the syllable boundary detection unit 4 with the average syllable length estimated by the speech rate estimation unit 23.

In the example shown in FIG. 2, if time t₂ is not a syllable boundary, a single syllable of length t₃ − t₁ (length A1 in the figure) exists in the time region t₁ < t < t₃. If t₂ is a syllable boundary, two syllables exist, of length t₂ − t₁ (length B1 in the figure) and length t₃ − t₂ (length B2 in the figure). The syllable boundary selection unit 5 compares these candidate syllable lengths A1, B1, and B2 with the average syllable length to determine the syllable boundary. In the example shown in FIG. 2, the length A1 is closer to the average syllable length than the lengths B1 and B2 are, so the syllable of length A1 is selected and time t₂ is judged not to be a syllable boundary.
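
As a worked illustration with hypothetical numbers (none of them come from the specification), suppose the estimated average syllable length is 150 ms and the candidate boundaries lie at t₁ = 0 ms, t₂ = 90 ms, and t₃ = 170 ms. Comparing plain absolute differences from the average, and averaging them over the syllables of each hypothesis, gives:

```latex
\begin{aligned}
A_1 &= t_3 - t_1 = 170, & |A_1 - 150| &= 20 \\
B_1 &= t_2 - t_1 = 90,  & |B_1 - 150| &= 60 \\
B_2 &= t_3 - t_2 = 80,  & |B_2 - 150| &= 70 \\
\tfrac{1}{2}\,(60 + 70) &= 65 > 20
\end{aligned}
```

Under these assumed values the single syllable A1 deviates less from the average than the pair B1, B2, so t₂ would be rejected as a syllable boundary, matching the outcome described for FIG. 2.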

The syllable boundary selection algorithm performed by the syllable boundary selection unit 5 is described below in more general form.

Now, as shown in FIG. 3, suppose that in a certain time region T₁ < t < T₂ the syllable boundaries are difficult to determine, so the syllable boundary detection unit 4 creates and outputs several syllable candidate sequences A, B, C, ... (where syllable candidate sequence A consists of a syllable candidates of lengths A1, A2, ..., Aa, and likewise for syllable candidate sequences B, C, ...).

These syllable candidate sequences A, B, C, ... are input to the syllable boundary selection unit 5, and the deviations $D_A, D_B, D_C, \ldots$ of the syllable candidate sequences A, B, C, ... from the average syllable length $\bar{L}$ are calculated respectively as

$$D_A = \frac{1}{a} \sum_{i=1}^{a} d\bigl(A(i), \bar{L}\bigr), \qquad D_B = \frac{1}{b} \sum_{i=1}^{b} d\bigl(B(i), \bar{L}\bigr), \qquad D_C = \frac{1}{c} \sum_{i=1}^{c} d\bigl(C(i), \bar{L}\bigr)$$

where A(i) denotes the length of the i-th syllable candidate in sequence A (likewise B(i), C(i)), and

$$d(x, y) = \begin{cases} |x - k_1 y| & \text{if a silent section precedes the syllable of length } x \\ |x - k_2 y| & \text{if a bunsetsu boundary follows the syllable of length } x \\ |x - y| & \text{otherwise} \end{cases}$$

Here, syllables at the beginning of a bunsetsu and plosives are often shorter than the average syllable length, so k₁ is set such that 0 < k₁ < 1. Syllables at the end of a bunsetsu are often longer, so k₂ is set such that k₂ > 1.

From among the deviations $D_A, D_B, D_C, \ldots$ from the average syllable length calculated as described above, the syllable boundary selection unit 5 selects the syllable candidate sequence with the smallest deviation and outputs it as the syllable sequence.
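
The generalized selection can be sketched as follows. This is a sketch under stated assumptions rather than the implementation of the specification: the `Syllable` record, the function names, and the default values k1 = 0.8 and k2 = 1.2 are hypothetical (the text requires only 0 < k₁ < 1 and k₂ > 1), and giving the silence case priority when both conditions hold is a choice made here because the text does not address that case.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Syllable:
    """One syllable candidate (hypothetical record type for this sketch)."""
    length: float
    after_silence: bool = False             # a silent section precedes it
    before_bunsetsu_boundary: bool = False  # a bunsetsu boundary follows it

def d(s: Syllable, avg: float, k1: float, k2: float) -> float:
    """Context-dependent distance between a syllable length and the average."""
    if s.after_silence:
        # ASSUMPTION: this case wins if both flags are set.
        return abs(s.length - k1 * avg)
    if s.before_bunsetsu_boundary:
        return abs(s.length - k2 * avg)
    return abs(s.length - avg)

def deviation(seq: List[Syllable], avg: float, k1: float, k2: float) -> float:
    """Mean distance of a candidate sequence from the average syllable length."""
    return sum(d(s, avg, k1, k2) for s in seq) / len(seq)

def select_sequence(candidates: Dict[str, List[Syllable]], avg: float,
                    k1: float = 0.8, k2: float = 1.2) -> str:
    """Return the name of the candidate sequence with the smallest deviation."""
    return min(candidates, key=lambda name: deviation(candidates[name], avg, k1, k2))


if __name__ == "__main__":
    # Hypothetical lengths in ms, mirroring the A1 versus B1/B2 case of FIG. 2.
    candidates = {
        "A": [Syllable(170.0)],
        "B": [Syllable(90.0), Syllable(80.0)],
    }
    print(select_sequence(candidates, avg=150.0))
```

Averaging the distances within each sequence keeps sequences with different numbers of syllables comparable, which is why each deviation is divided by the number of candidates in its own sequence.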

The syllable recognition unit 6 matches the syllable sections obtained as described above against the standard syllable patterns stored in the syllable standard pattern memory 7 and outputs the recognition result.

In the above embodiment, a known training sentence is uttered first when the voice input device is used, and the average syllable length is calculated from it. The present invention is not limited to this, however. For example, average syllable lengths may be calculated and stored in advance for a plurality of speakers. Alternatively, a plurality of average syllable lengths may be calculated and stored for the same speaker speaking at fast, normal, and slow rates, and the average syllable length may be selected according to the speaking state at recognition time.

<Effects> As explained above, according to the present invention, the speaking rate is first estimated and the syllable boundaries are then determined based on the estimated speaking rate. Syllable boundaries can therefore be detected and determined accurately regardless of differences in the speaking rate of the input speech caused by speaker characteristics and the like.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a voice input device embodying the present invention, FIG. 2 is a diagram showing an example of detected syllable boundaries, and FIG. 3 is a diagram showing another example of detected syllable boundary candidates.

1: speech analysis unit; 21: silent section detection unit; 22: voiced section detection unit; 23: speech rate estimation unit; 3: bunsetsu boundary detection unit; 4: syllable boundary detection unit; 5: syllable boundary selection unit.

Claims

[Scope of Claims] 1. A voice input device comprising: a speech rate estimation unit that calculates an average syllable length by dividing the sum of the durations of the voiced sections of speech whose utterance content is known by the number of syllables contained in that speech; a syllable boundary detection unit that detects syllable boundaries; and a syllable boundary selection unit that compares the plurality of syllable boundary candidates detected by the syllable boundary detection unit with the average syllable length calculated by the speech rate estimation unit and determines the candidate with the highest degree of similarity as the syllable boundary.