JP2558658B2

JP2558658B2 - Basic frequency analyzer

Info

Publication number: JP2558658B2
Application number: JP27207986A
Authority: JP
Inventors: 博也藤崎; 啓吉広瀬; 圭典清水; 幹雄山口
Original assignee: Sumitomo Electric Industries Ltd
Current assignee: Sumitomo Electric Industries Ltd
Priority date: 1986-11-13
Filing date: 1986-11-13
Publication date: 1996-11-27
Anticipated expiration: 2011-11-27
Also published as: JPS63124100A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to an improved device for analyzing the fundamental frequency of speech, and more specifically to a speech fundamental frequency analyzer used in applications such as speech analysis and synthesis, speech verification, and high-efficiency coded transmission of speech.

[Conventional technology]

There are many methods for extracting the fundamental frequency of speech, such as methods based on the cepstrum and methods that use the autocorrelation function of the speech waveform itself. As the conventional technique corresponding to the present invention, the method that uses the autocorrelation function of the residual waveform obtained by linear prediction analysis is described here.

FIG. 2 shows how the fundamental frequency is obtained in the prior art, and FIG. 3 schematically shows the data at each stage up to the normalized autocorrelation function.

The input speech waveform is continuous in time, and its state changes over time (it switches from voiced to unvoiced, or the manner of articulation changes). The short-time waveform extraction process 21 first takes out a fixed interval (for example, 20 msec). Next, process 22 applies a window function such as a Hamming or Hanning window, which attenuates the amplitude at the edges of the window and thereby reduces the effect of the cut-out boundaries. Linear prediction analysis 23 is then performed on the assumption that the system generating the speech waveform is stationary over the length of the window. Linear prediction analysis yields a residual waveform, which corresponds to the sound source information, and coefficients, which correspond to the vocal tract transfer function. For unvoiced sound the residual waveform is close to random noise; for voiced sound it is an impulse train corresponding to the vibration of the vocal cords (34 in FIG. 3).
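As a rough illustration of processes 21 and 22, the following Python sketch cuts a 20 msec interval out of a sampled waveform and tapers it with a Hamming window. The 10 kHz sampling rate, the function names, and the synthetic sine input are assumptions made for the example; the linear prediction analysis 23 is not shown.

```python
import math


def hamming_window(length):
    """Standard Hamming coefficients: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length < 2:
        return [1.0] * length
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def extract_windowed_frame(signal, start, frame_len):
    """Cut frame_len samples starting at `start` and taper them with a Hamming window."""
    frame = signal[start:start + frame_len]
    window = hamming_window(len(frame))
    return [s * w for s, w in zip(frame, window)]


fs = 10_000                       # assumed sampling rate in Hz
frame_len = round(0.020 * fs)     # 20 msec -> 200 samples
# Synthetic stand-in for a speech waveform (assumption for the example).
signal = [math.sin(2.0 * math.pi * 125.0 * n / fs) for n in range(1000)]
frame = extract_windowed_frame(signal, start=300, frame_len=frame_len)
```

The taper leaves the middle of the interval almost unchanged and scales the first and last samples by 0.08, which is what reduces the influence of the cut-out boundaries.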

To obtain the fundamental frequency, it suffices to find the repetition period of this impulse train. When the normalized autocorrelation function of the residual waveform is computed (the autocorrelation function divided by the power, so that its value is 1 at delay $\tau = 0$), it shows local maxima at delays equal to integer multiples of the fundamental period (denoted $\tau_p$). The fundamental frequency can therefore be calculated by finding this local maximum in the normalized autocorrelation function of the residual waveform and taking the reciprocal of $\tau_p$. In practice the input may also be unvoiced sound or silence (an interval in which neither voiced nor unvoiced sound is produced), so the processing is carried out as described below.
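The relationship between the impulse-train period and the peak of the normalized autocorrelation function can be illustrated with a minimal Python sketch. The idealized residual (one unit impulse every 80 samples), the 10 kHz sampling rate, and the function names are assumptions for the example, not part of the described device.

```python
def normalized_autocorrelation(frame, max_lag):
    """Autocorrelation for lags 0..max_lag divided by the frame power (value 1 at lag 0)."""
    power = sum(v * v for v in frame)
    if power == 0.0:
        return [0.0] * (max_lag + 1)   # all-zero frame: nothing to normalize
    return [
        sum(frame[i] * frame[i - lag] for i in range(lag, len(frame))) / power
        for lag in range(max_lag + 1)
    ]


def strongest_lag(r, min_lag):
    """Lag (>= min_lag) at which the normalized autocorrelation is largest."""
    return max(range(min_lag, len(r)), key=lambda lag: r[lag])


fs = 10_000                                                   # assumed sampling rate in Hz
frame = [1.0 if i % 80 == 0 else 0.0 for i in range(400)]     # idealized voiced residual
r = normalized_autocorrelation(frame, max_lag=150)
lag = strongest_lag(r, min_lag=20)    # skip the neighbourhood of lag 0
f0 = fs / lag                         # reciprocal of the period, in Hz
```

Here the strongest peak away from lag 0 falls at 80 samples (8 msec), so the reciprocal gives 125 Hz.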

For unvoiced sound, the normalized autocorrelation function of the residual waveform takes its maximum value of 1 at delay $\tau = 0$ and relatively small values elsewhere. For voiced sound, besides the maximum value of 1 at $\tau = 0$, it has local maxima at $\tau = \tau_p$, $\tau = 2\tau_p$, and so on. It also often shows local maxima at values of $\tau$ unrelated to the fundamental period $\tau_p$. During silence no signal should in principle appear in the speech waveform, but noise is usually present (in particular 50 or 60 Hz hum from the power supply), and this noise often makes the normalized autocorrelation function show periodic local maxima just as for voiced sound. Because the noise is small compared with the speech waveform, however, silence can be identified by the small power of the residual waveform. The actual processing, including the voiced/unvoiced/silence decision, is carried out as follows.

First, a threshold $\theta_p$ is set, and if the power of the residual waveform over the length of the window does not exceed $\theta_p$, the interval is judged to be silence (the fundamental frequency is then "none"). If it is not silence, a threshold $\theta_v$ is set, and if no local maximum of the normalized autocorrelation function of the residual waveform, other than near $\tau = 0$, exceeds $\theta_v$, the interval is judged to be unvoiced (again the fundamental frequency is "none"). An interval that is neither silence nor unvoiced is voiced. In that case, the values of $\tau$ at which the normalized autocorrelation function has local maxima exceeding $\theta_v$, other than near $\tau = 0$, are taken in descending order of the local maximum as the first, second, third, ... candidates for the fundamental period, and their reciprocals are taken as the first, second, third, ... candidates for the fundamental frequency (reference numeral 26).
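The decision procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the `min_lag` parameter used to exclude the neighbourhood of $\tau = 0$, the threshold values, and the hand-made autocorrelation data are all hypothetical.

```python
def classify_interval(power, r, theta_p, theta_v, min_lag, fs):
    """Return (label, candidates); candidates are F0 values in Hz, strongest peak first.

    power : residual power over the window
    r     : normalized autocorrelation, r[lag] for lag = 0, 1, 2, ...
    """
    if power <= theta_p:                      # power does not exceed theta_p
        return "silence", []
    # Local maxima away from lag 0 whose value exceeds theta_v.
    peaks = [
        lag for lag in range(max(min_lag, 1), len(r) - 1)
        if r[lag] > theta_v and r[lag] >= r[lag - 1] and r[lag] > r[lag + 1]
    ]
    if not peaks:
        return "unvoiced", []
    peaks.sort(key=lambda lag: r[lag], reverse=True)
    return "voiced", [fs / lag for lag in peaks]


# Hand-made example data (assumption): peaks at lags 50, 80 and 160.
r = [0.0] * 200
r[0], r[50], r[80], r[160] = 1.0, 0.2, 0.8, 0.6
label, candidates = classify_interval(5.0, r, theta_p=0.1, theta_v=0.3,
                                      min_lag=20, fs=10_000)
```

With these values the small peak at lag 50 is rejected by $\theta_v$, and the remaining peaks give a first candidate of 125 Hz and a second candidate of 62.5 Hz.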

These fundamental frequency candidates are obtained for each window position while the window is shifted little by little (for example, by 10 msec). FIG. 4 schematically shows how the window position is shifted, and FIG. 5 shows the fundamental frequency candidates obtained at each window position, up to the second candidate. In FIG. 5 the first and second candidates are plotted with different marks. The fundamental frequency finally adopted is basically the first candidate, but the continuity with the candidates at the preceding and following window positions is examined, and if the second or third candidate makes the time pattern of the fundamental frequency across window positions smoother, that candidate is adopted instead (see reference numeral 27 in FIG. 2). The fundamental frequency at window position N is selected from the candidates as follows, for example.

First, the first candidates for the fundamental frequency at window positions N−2, N−1, N, N+1, and N+2 are obtained; suppose they are 128, 104, 121, 107, and 108 Hz in that order. Arranged in descending order they are 128, 121, 108, 107, and 104 Hz, and the third value, that is the median, is 108 Hz. At window position N, the candidate closest to this median is selected as the fundamental frequency. Here the second candidate, 106 Hz, is closer to 108 Hz than the first candidate, 121 Hz, so the second candidate is finally adopted as the fundamental frequency. This processing is applied to the candidates at every window position to determine the fundamental frequency at each window position.
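The worked example can be reproduced with a few lines of Python. The function name is hypothetical; the numbers are the ones given in the text.

```python
import statistics


def select_f0(first_candidates, candidates_at_n):
    """Pick the candidate at position N closest to the median of the first candidates.

    first_candidates : first candidates at positions N-2 .. N+2 (Hz)
    candidates_at_n  : all candidates at position N, first candidate first (Hz)
    """
    median = statistics.median(first_candidates)
    return min(candidates_at_n, key=lambda f: abs(f - median))


first_candidates = [128, 104, 121, 107, 108]   # positions N-2, N-1, N, N+1, N+2
median = statistics.median(first_candidates)   # middle value of the five
chosen = select_f0(first_candidates, [121, 106])
```

The median of the five first candidates is 108 Hz, and 106 Hz lies 2 Hz from it against 13 Hz for 121 Hz, so the second candidate is selected.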

[Problems to be solved by the invention]

In the prior art, the analysis is performed by setting a window and assuming that the speech production system is in a steady state within that interval. This leads to the following problems.

(1) The position of the window relative to the speech waveform is not always optimal, so the accuracy of the fundamental frequency analysis may deteriorate depending on the window position.

(2) The length of the window is not always optimal. The longer the window, the longer the interval that is assumed to be stationary (although it is actually non-stationary), and the fundamental frequency can no longer be extracted where it changes rapidly. Conversely, if the window is short and becomes less than twice the fundamental period (which tends to happen with low-pitched voices), the fundamental frequency may not be extractable.

Even for a single speaker, the fundamental frequency ranges over as much as two octaves in conversation, so a finite, fixed window length can hardly be optimal at all times. To improve the analysis accuracy, it is desirable to vary the window length adaptively.

[Means for solving problems]

FIG. 1 shows a block diagram of the processing according to the present invention.

Block 1 continuously performs linear prediction analysis (taken here to include partial autocorrelation analysis, which is equivalent to linear prediction analysis, and analyses by variants of these methods) and outputs a residual waveform that is continuous in time. Block 2 is a low-pass filter that emphasizes the low-frequency components of the residual waveform. Block 3 applies a semi-infinite window function that decays exponentially toward the past. Block 4 continuously calculates the autocorrelation function. Blocks 3 and 4 are shown separately for clarity, but they can be processed in a single step as described in the embodiment. Block 5 sets thresholds on the local maxima of the autocorrelation function and on the power of the residual waveform, and lets the search for fundamental frequency candidates proceed only for voiced sound. Block 6 obtains fundamental frequency candidates from the delays $\tau$ at which the autocorrelation function has local maxima. Block 7 selects the fundamental frequency from the candidates.

[Action]

FIG. 6 conceptually shows how the speech waveform is processed according to the present invention. 61 is the input speech waveform. 62 is the residual waveform obtained from it by the continuous linear prediction analysis 1. The low-pass filter 2 then attenuates the high-frequency components and emphasizes the low-frequency range, giving a waveform such as 63. 64 shows the portion of the residual waveform that is multiplied by the window function, and 65 shows the semi-infinite window function that decays exponentially toward the past. Multiplying 64 and 65 at each sample time gives waveform 66, whose amplitude is large at the start of the window and shrinks as the samples become older. In other words, the present value is weighted more heavily than past values. 67 is the autocorrelation function calculated from waveform 66. In FIG. 3 it is shown normalized and enlarged horizontally for clarity.

Whether the sound is voiced can be decided from the power of the residual waveform and the values of the normalized autocorrelation function, as described for the prior art. As in the prior art, the delays $\tau_1$, $\tau_2$, ... at which the autocorrelation function has local maxima are taken as candidates for the fundamental period, fundamental frequency candidates are derived from them, and the fundamental frequency is selected from among the candidates. For the sake of conceptual explanation, FIG. 6 shows the result 66 of multiplying the residual waveform by the exponentially decaying semi-infinite window function, but as shown in the embodiment, the autocorrelation function can be obtained directly without computing 66.

The above processing gives the fundamental frequency when the start of the window is at sample point $n$. By shifting the start of the window successively and obtaining the fundamental frequency at each position, the time pattern of the fundamental frequency is obtained.

[Example]

First, as an embodiment of the linear prediction analysis for obtaining a continuous residual waveform, FIG. 7 shows a method of obtaining the prediction residual waveform by the lattice method. The lattice method is explained in Saito and Nakata, "Basics of Speech Information Processing", Ohmsha (1981), pp. 110-112. Here the conventional lattice method is modified so that it can be processed continuously in time.

In FIG. 7(a), $k_t^{(i)}$ ($i = 1, \ldots, p$) is the partial correlation coefficient at sample time $t$, obtained at every sample time by the continuous cross-correlation calculator $C_i$. Let the forward prediction residual obtained as the output of stage $i$ at sample time $t$ be $\varepsilon_{f,t}^{(i)}$ and the backward prediction residual be $\varepsilon_{b,t}^{(i)}$. The forward prediction residual of stage $i+1$ (FIG. 7(b)) is calculated by [equation not reproduced], and the backward prediction residual by [equation not reproduced]. In practice the backward prediction residual is calculated at the preceding sample time $t' = t - 1$ as [equation not reproduced], and that value is passed through a delay element so that it is output as $\varepsilon_{b,t}^{(i+1)}$ at sample time $t$. The forward and backward prediction residuals of the first stage ($i = 0$) are [equation not reproduced] (the input waveform at the preceding sample time). $k_t^{(i+1)}$ is calculated by [equation not reproduced], in which [symbol not reproduced] denotes a time average.

So that a time average in which past values are weighted can be calculated continuously, the time average is computed by applying an exponentially decaying window. As an embodiment of the window, taking the current time as 0 and writing $W(j\tau)$ for the value of the window function at a time $j\tau$ in the past, the function

$$W(j\tau) = \exp(-j\tau / T_c)$$

is used, where $\tau$ is the sampling period and $T_c$ is a parameter that sets how fast the window function decays, chosen to be about 10 to 20 msec.

The time average [symbol not reproduced] at sample time $t$ of the sample values $y_t$ obtained as a time series is defined by [equation not reproduced]. Using the time average obtained at the immediately preceding sample time $t - 1$, it can be calculated by a recurrence formula [equation not reproduced]. With this recurrence formula, [expression not reproduced] can be calculated efficiently. The residual waveform fed to the low-pass filter 2 of FIG. 1 is the forward prediction residual $\varepsilon_{f,t}^{(i+1)}$ obtained as the output of stage $p$.
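The defining equation and the recurrence for the time average are not reproduced in this text. The Python sketch below shows one common form, assuming the unnormalized sum of $y_{t-j}$ weighted by $W(j\tau) = \exp(-j\tau / T_c)$. This definition, the function names, and the parameter values are assumptions for illustration; the patent's own expression may differ, for example by a normalizing factor.

```python
import math


def running_weighted_sums(samples, tau, tc):
    """Assumed form: S_t = sum_j exp(-j*tau/tc) * y_{t-j}, via S_t = y_t + beta * S_{t-1}."""
    beta = math.exp(-tau / tc)      # decay per sample
    sums, acc = [], 0.0
    for y in samples:
        acc = y + beta * acc        # one multiply-add per sample
        sums.append(acc)
    return sums


def direct_weighted_sum(samples, t, tau, tc):
    """Same quantity evaluated directly from the assumed definition (for checking)."""
    return sum(math.exp(-j * tau / tc) * samples[t - j] for j in range(t + 1))


tau, tc = 1e-4, 15e-3               # 10 kHz sampling, Tc = 15 msec (assumed values)
samples = [math.sin(0.05 * n) for n in range(500)]
sums = running_weighted_sums(samples, tau, tc)
```

Under this assumption the recursive and direct evaluations agree, while the recursive form needs only the previous value rather than the whole history.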

Next, the low-pass filter 2 is described. The fundamental frequency can be obtained without a low-pass filter, but using a low-pass filter with a cutoff frequency of about 500 Hz emphasizes the fundamental frequency component contained in the residual waveform and improves the analysis accuracy.

Next, an embodiment of the semi-infinite window function 3 that decays exponentially toward the past is the first-order exponential function

$$W_1(n\tau) = A_1 \exp(-n\tau / T_1) \quad (n \ge 0)$$

where $A_1 = 1$ and $\tau$ is the sampling period. $T_1$ is a parameter that sets how fast the window function decays; for example, a value of about 15 msec is used for male speakers and about 10 msec for female speakers.

Let $\sigma(n, k)$ be the value of the autocorrelation function at sample point $n$ for a lag of $k$ samples. Using the value $\sigma(n-1, k)$ at the immediately preceding sample point $n - 1$, the autocorrelation function at sample point $n$ can be calculated by the following recurrence formula.

$$\sigma(n, k) = \alpha^2 \sigma(n-1, k) + \alpha^k x_n x_{n-k}$$

where $\alpha = \exp(-\tau / T_1)$, $x_n$ is the sample value of the residual waveform, $\tau$ is the sampling period, and $k = 0, 1, 2, \ldots$. The initial value, that is, the state before any $x_n$ is input, is $\sigma(0, k) = 0$ ($k = 0, 1, 2, \ldots$). In this way, the exponentially decaying window function of the present application can be implemented by a recurrence formula rather than by evaluating the exponential function itself each time.
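The recurrence can be sketched in Python as follows. The function name, the use of a `deque` to hold the most recent samples, and the synthetic impulse-train residual are assumptions for illustration; $T_1$ = 15 msec and a 0.1 msec sampling period follow the example values in the text, and samples before the start of the input are treated as zero, consistent with the zero initial state.

```python
import math
from collections import deque


def running_autocorrelation(residual, k_max, alpha):
    """Apply sigma[k] <- alpha^2 * sigma[k] + alpha^k * x_n * x_{n-k} for every sample.

    Returns sigma[0..k_max] after the last sample of `residual`.
    """
    sigma = [0.0] * (k_max + 1)                           # state before any input
    recent = deque([0.0] * (k_max + 1), maxlen=k_max + 1)
    weights = [alpha ** k for k in range(k_max + 1)]      # alpha^k, computed once
    alpha_sq = alpha * alpha
    for x_n in residual:
        recent.appendleft(x_n)                            # recent[k] is now x_{n-k}
        for k in range(k_max + 1):
            sigma[k] = alpha_sq * sigma[k] + weights[k] * x_n * recent[k]
    return sigma


tau, t1 = 1e-4, 15e-3                       # sampling period and T1 in seconds
alpha = math.exp(-tau / t1)
# Idealized voiced residual: one unit impulse every 80 samples (assumption).
residual = [1.0 if n % 80 == 0 else 0.0 for n in range(2000)]
sigma = running_autocorrelation(residual, k_max=143, alpha=alpha)
peak_lag = max(range(20, 144), key=lambda k: sigma[k] / sigma[0])
```

Each new sample costs one multiply-add per lag, and the windowed waveform itself never has to be formed. For this input the normalized peak away from lag 0 falls at the impulse spacing of 80 samples.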

With this recurrence formula, the autocorrelation function at each sample point can be obtained with a relatively small amount of computation. The case $k = 0$, that is $\sigma(n, 0)$, represents the power of the windowed residual waveform. The normalized autocorrelation function is given by

$$\sigma(n, k) / \sigma(n, 0) \quad (k = 0, 1, 2, \ldots)$$

The range of $k$ that actually has to be computed, $k = 0, 1, 2, \ldots, k_{\max}$, is determined as follows. Let $F_{\min}$ be the lower limit of the fundamental frequency that can be observed. The local maximum of the autocorrelation function due to $F_{\min}$ appears at the delay $k_{\max}\tau = 1 / F_{\min}$. Therefore, if the autocorrelation function is computed up to $k_{\max} = \mathrm{ceil}(1 / (\tau F_{\min}))$, that local maximum can be found. Here $\mathrm{ceil}(x)$ denotes the smallest integer exceeding $x$. For ordinary speech by a male speaker, $F_{\min}$ is set to about 70 Hz, for example.
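As a small worked example, the lag range and the normalization can be written in Python as below. The function names are hypothetical, and the 0.1 msec sampling period is the 10 kHz example value used elsewhere in the text. Note that Python's `math.ceil` returns the smallest integer not less than its argument, which differs from the wording above only when $1 / (\tau F_{\min})$ is exactly an integer.

```python
import math


def max_lag(sampling_period, f_min):
    """Largest lag (in samples) needed to see the peak of the lowest expected F0."""
    return math.ceil(1.0 / (sampling_period * f_min))


def normalize(sigma):
    """Divide every lag by sigma[0], the power of the windowed residual."""
    power = sigma[0]
    if power <= 0.0:
        return [0.0] * len(sigma)      # no signal yet: nothing to normalize
    return [s / power for s in sigma]


k_max = max_lag(1e-4, 70.0)            # 1 / (0.1 msec * 70 Hz) = 142.86 -> 143
normalized = normalize([4.0, 2.0, 1.0])
```

With a 70 Hz lower limit at 10 kHz sampling, 144 lag values (0 through 143) have to be maintained per sample.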

Processes 5 and 6 can be carried out in the same way as in the prior art. The selection of the fundamental frequency in process 7 can also be carried out as in the prior art, except that fundamental frequency candidates are obtained at every sample point and can therefore be compared on a finer time scale.

As is clear from the above description, $\sigma(n, k)$ is obtained at every sample point $n$, that is, continuously in time, so the fundamental frequency can also be output at every sample point $n$. From the standpoint of using the fundamental frequency, however, a value at every sample point (for example, every 0.1 msec at a sampling frequency of 10 kHz) is often unnecessary, and a value every 10 msec, for example, is sufficient. For that purpose, the fundamental frequency obtained from process 7 can be decimated to one value every 10 msec, or the processing from 3 onward or from 4 onward can be performed only every 10 msec.

As other embodiments of the semi-infinite window function 3 that decays exponentially toward the past, higher-order exponential windows are also possible, for example

$$W_2(n\tau) = A_2\, n\tau \exp(1 - n\tau / T_2), \quad A_2 = 1 / T_2$$

or

$$W_3(n\tau) = A_3 (n\tau)^2 \exp(2 - n\tau / T_3), \quad A_3 = 1 / (2 T_3)^2$$

For $W_2(n\tau)$ and $W_3(n\tau)$ as well, the autocorrelation function can be calculated by a recurrence formula, as for $W_1(n\tau)$ although somewhat more complicated, so the computation remains efficient.

FIG. 8 shows the shapes of the window functions $W_1(n\tau)$, $W_2(n\tau)$, and $W_3(n\tau)$ at 81, 82, and 83, respectively. Needless to say, $W_2(n\tau)$ and $W_3(n\tau)$ decay toward the past because of the exponential factors $\exp(1 - n\tau / T_2)$ and $\exp(2 - n\tau / T_3)$, respectively. In FIG. 8, $T_2$ and $T_3$ are chosen so that the time at which $W_1(n\tau)$ equals 0.5 coincides with the time at which $W_2(n\tau)$ and $W_3(n\tau)$ reach their maxima, specifically

$$T_2 = T_1 \ln 2, \qquad T_3 = T_2 / 2$$
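The three window shapes and the choice of $T_2$ and $T_3$ can be checked numerically with a short Python sketch. The function names are hypothetical, continuous time `t` stands in for $n\tau$, and $T_1$ = 15 msec is the example value given for male speakers.

```python
import math


def w1(t, t1):
    """First-order window: exp(-t / T1), with A1 = 1."""
    return math.exp(-t / t1)


def w2(t, t2):
    """Second-order window: (t / T2) * exp(1 - t / T2), with A2 = 1 / T2."""
    return (t / t2) * math.exp(1.0 - t / t2)


def w3(t, t3):
    """Third-order window: (t / (2*T3))^2 * exp(2 - t / T3), with A3 = 1 / (2*T3)^2."""
    return (t / (2.0 * t3)) ** 2 * math.exp(2.0 - t / t3)


t1 = 15e-3                       # seconds
t2 = t1 * math.log(2.0)          # T2 = T1 ln 2
t3 = t2 / 2.0                    # T3 = T2 / 2
t_half = t1 * math.log(2.0)      # time at which W1 has fallen to 0.5

values_at_t_half = (w1(t_half, t1), w2(t_half, t2), w3(t_half, t3))
```

At `t_half` the first-order window has fallen to 0.5 while the second- and third-order windows are at their maximum value of 1, which matches the alignment described for FIG. 8.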

[Effect of the invention]

All of the problems that were unavoidable, or required special care, because conventional methods used a finite-length window are avoided in advance. That is, the accuracy of the fundamental frequency analysis no longer deteriorates depending on the window position, and the window length no longer needs to be controlled strictly.

Furthermore, whereas the prior art performed the analysis on the premise of stationarity within the window, the present invention weights the current speech waveform more heavily than the past speech waveform and does not require stationarity over the window, so an improvement in analysis accuracy can be expected.

In addition, because an exponentially decaying function is used for the window, the autocorrelation function at the next sample point can be calculated from the autocorrelation function at the immediately preceding sample point by relatively simple processing.

[Brief description of drawings]

FIG. 1 is an explanatory diagram of how the fundamental frequency is obtained according to the present invention.
FIG. 2 is an explanatory diagram of how the fundamental frequency is obtained in the prior art.
FIG. 3 is an explanatory diagram of the processing steps up to the normalized autocorrelation function in the prior art.
FIG. 4 is an explanatory diagram of the movement of the window position.
FIG. 5 is an explanatory diagram of the fundamental frequency candidates.
FIG. 6 is a schematic explanatory diagram of the processing up to the calculation of the autocorrelation function according to the present invention.
FIG. 7(a) is an overall diagram of the lattice-type partial correlation calculation circuit, and FIG. 7(b) is a circuit diagram of one stage of it.
FIG. 8 is an outline diagram of semi-infinite window functions that decay exponentially toward the past.

Reference numerals:
1: continuous linear prediction analysis
2: low-pass filter
3: processing by a semi-infinite window function that decays exponentially toward the past
4: continuous calculation of the autocorrelation function
5, 25: voiced decision by thresholds
6, 26: selection of fundamental frequency candidates
7, 27: selection of the fundamental frequency
21: extraction of a short-time waveform by windowing
22: application of a window function
23: linear prediction analysis
24: calculation of the autocorrelation function
31, 61: speech waveform
32: speech waveform of one window length
33: waveform multiplied by the window function
34: residual waveform
35: normalized autocorrelation function
62: residual waveform continuous in time
63: residual waveform after the low-pass filter
64: portion of the residual waveform multiplied by the semi-infinite window function
65: semi-infinite window function
66: windowed result
67: autocorrelation function
81: outline of the window function expressed by a first-order exponential function
82: outline of the window function expressed by a second-order exponential function
83: outline of the window function expressed by a third-order exponential function

Continuation of front page
(72) Inventor: Mikio Yamaguchi, c/o Osaka Works, Sumitomo Electric Industries, Ltd., 1-1-3 Shimaya, Konohana-ku, Osaka
(56) References: JP-A-52-26107 (JP, A); JP-A-61-187000 (JP, A)

Claims

(57) [Claims]

1. A fundamental frequency analyzer comprising: means for obtaining a residual by continuously performing linear prediction analysis on an input speech waveform; means for obtaining an autocorrelation function of the residual weighted by a semi-infinite window function having an exponential decay toward the past; and means for obtaining the period of the fundamental frequency from a maximum point of the autocorrelation function of the residual; characterized in that the residual is obtained continuously from the speech waveform, the autocorrelation function of the residual is obtained by applying the semi-infinite window function having an exponential decay toward the past, and the fundamental frequency is obtained from the maximum point of the autocorrelation function.

2. The fundamental frequency analyzer according to claim 1, wherein the means for obtaining the residual by continuously performing linear prediction analysis continuously calculates partial correlation by a lattice method.

3. The fundamental frequency analyzer according to claim 1 or 2, wherein the window function $W(n\tau)$ is $A_1 \exp(-n\tau / T_1)$, where $A_1$ and $T_1$ are constants, $\tau$ is the sampling period, and $n$ is a non-negative integer.

4. The fundamental frequency analyzer according to claim 1 or 2, wherein the window function $W(n\tau)$ is $A_2\, n\tau \exp(1 - n\tau / T_2)$, where $A_2$ and $T_2$ are constants, $\tau$ is the sampling period, and $n$ is a non-negative integer.

5. The fundamental frequency analyzer according to claim 1 or 2, wherein the window function $W(n\tau)$ is $A_3 (n\tau)^2 \exp(2 - n\tau / T_3)$, where $A_3$ and $T_3$ are constants, $\tau$ is the sampling period, and $n$ is a non-negative integer.