JPS6211360B2


Info

Publication number: JPS6211360B2
Application number: JP53048955A
Authority: JP
Inventors: Satoru Taguchi; Kazuo Ochiai
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1978-04-24
Filing date: 1978-04-24
Publication date: 1987-03-12
Also published as: JPS54151303A

Abstract

PURPOSE: To enable accurate voiced/unvoiced discrimination by obtaining the speech spectrum with linear predictive analysis and the pitch with autocorrelation analysis, and by compressing the skew in the distributions of the K parameters and of the maximum autocorrelation value. CONSTITUTION: The K parameter extractor 102 receives the speech input signal from the terminal 101, extracts the K parameters (linear prediction coefficients also called partial autocorrelation coefficients) up to a plurality of orders by linear predictive analysis, and outputs the extracted K parameters to the log area ratio converter 103. The converter 103 converts the first-order, second-order, ... K parameters into the log area ratios L1, L2, ... using values stored in advance in a read-only memory, and outputs them to the voiced/unvoiced discriminator 104. The discriminator 104 decides voiced or unvoiced according to whether the value of W1L1+W2L2+... is greater or smaller than a predetermined discrimination threshold, and outputs the result. W1 and W2 are constants obtained in advance.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a voiced/unvoiced discriminator that determines from a speech waveform whether speech is voiced or unvoiced, and in particular to one that uses a linear discriminant. More specifically, it relates to a voiced/unvoiced discriminator that uses a linear discriminant combining the autocorrelation coefficients used for autocorrelation-type pitch extraction, the K parameters representing spectral envelope information, and similar parameters, either as they are or after a simple transformation.

Voiced/unvoiced discrimination information is known to be an important parameter in speech analysis and synthesis. In a speech analysis-synthesis system, for example, the voiced/unvoiced decision made in the analysis section strongly affects the quality of the speech produced in the synthesis section. This effect cannot be ignored, because an error in either direction, judging a voiced segment to be unvoiced or an unvoiced segment to be voiced, often degrades the synthesized speech fatally. If a voiced segment is judged unvoiced, the synthesized speech sounds hoarse and loses much of its naturalness. If an unvoiced segment is judged voiced, the synthesized speech sounds buzzy, which greatly impairs naturalness and can seriously affect intelligibility.

Many parameters are known that can distinguish voiced from unvoiced speech to some extent: the short-time average power, which exploits the difference in energy between voiced and unvoiced speech; the ratio between the short-time average power and the short-time average power in a relatively low frequency band, which exploits the difference in the density function of the frequency power spectrum between the two; the prediction residual power obtained by linear predictive analysis (small for voiced speech, large for unvoiced speech); the zero-crossing count, which differs between the two; autocorrelation coefficient values that mainly represent formant information, such as the autocorrelation coefficient at a delay relatively close to zero; the maximum value of the autocorrelation coefficient over the range of delays in which the pitch period is expected to lie (hereinafter δ_MAX); various parameters obtained by linear predictive analysis, such as the so-called α parameters, which are obtained as the direct solution of a set of linear equations, and the related K parameters, which represent the reflection coefficients in the vocal tract and are also known as partial autocorrelation coefficients; and the parameters known as the cepstrum.

However, none of these parameters alone provides sufficient voiced/unvoiced discrimination. In other words, each of them can be expected to contribute to some extent, but none is complete by itself.

For this reason, conventional voiced/unvoiced discriminators have combined several such parameters. The following three methods of combination are commonly used.

The first method applies logical operations, that is, logical sums (OR) and logical products (AND). For example, each parameter is first tested against a threshold at which the frame can clearly be judged voiced, and the frame is judged voiced overall if at least one parameter is judged voiced. The second method uses a linear discriminant whose variables are the parameters to be combined. The third method is a mixture of the first and second. Of the three, the second is known to be particularly suited to computer-like devices. Needless to say, it is advantageous to choose, as far as possible, parameters that are already produced by the spectrum analysis device and the pitch extraction device used together with the voiced/unvoiced discriminator.

Parameters extracted by linear predictive analysis, such as the α parameters and the K parameters, can distinguish voiced from unvoiced speech to some extent. In particular, the distributions of the low-order K parameters, such as the first-order K parameter, are known to be relatively well separated between voiced and unvoiced speech.

The maximum value of the autocorrelation coefficient over the range of delays in which the pitch period is expected to lie, that is, δ_MAX obtained by autocorrelation analysis, is also known to have distributions that are relatively well separated between voiced and unvoiced speech. A method that exploits these properties is known, in which voiced/unvoiced discrimination uses a linear discriminant whose variables are, for example, the first-order K parameter and δ_MAX (see, for example, Japanese Unexamined Patent Publication No. 51-149705, "Drive sound source signal analysis method"). However, the distribution of the first-order K parameter is strongly skewed, as is clear from published examples (for example, FIG. 1 on page 205 of Fumihiro Yato and Akira Kurematsu, "A Study on Voiced/Unvoiced Determination in Speech Analysis and Synthesis Systems", Proceedings of the 1977 National Conference of the Information Division of the Institute of Electronics and Communication Engineers). Consequently, whether the linear discriminant is determined statistically, for example by multivariate analysis, or empirically, this method has the drawback that discriminant coefficients and a decision threshold giving good separation between voiced and unvoiced speech cannot be chosen. Since δ_MAX is also skewed, it has a similar drawback.

That is, the threshold of a linear discriminant is determined on the assumption that the distribution of each class of each parameter used for discrimination is normal. As noted above, however, the voiced or unvoiced distributions of the parameters conventionally used, such as K₁ and K₂, deviate considerably from a normal distribution. Conventional methods that use K₁, K₂, and so on directly as discrimination parameters therefore suffer from frequent misclassification.

SUMMARY OF THE INVENTION An object of the present invention is to provide a voiced/unvoiced discriminator that substantially remedies the above drawbacks and enables precise voiced/unvoiced discrimination.

Specifically, the invention provides a voiced/unvoiced discriminator that applies a transformation to parameters obtained by linear predictive analysis (for speech spectrum information) and by autocorrelation analysis (for pitch information), thereby compressing the skew in the distributions of, for example, the K parameters and δ_MAX and enabling precise voiced/unvoiced discrimination.

According to the present invention, there is provided a voiced/unvoiced discriminator that determines voiced or unvoiced by comparing a predetermined threshold with the value of a discriminant that has predetermined coefficients and takes predetermined feature parameters of an input speech signal as its variables, characterized in that at least one of the variables is a partial autocorrelation coefficient (K parameter) up to a predetermined order, and that this partial autocorrelation coefficient is converted into a log area ratio before being used as a variable of the discriminant.

Next, embodiments of the present invention will be described in detail with reference to the drawings.

First, the basic concept of the present invention is explained. As noted above, the first-order and second-order K parameters obtained by linear predictive analysis of speech spectrum information have strongly skewed distributions. When these two parameters are quantized, for example for transmission or storage, each is therefore sometimes converted to a log area ratio (the logarithm of a vocal tract cross-sectional area ratio) before quantization; see, for example, equation (31) of R. Viswanathan and John Makhoul, "Quantization Properties of Transmission Parameters in Linear Prediction Systems", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-23, no. 3, June 1975, pp. 309-321. One reason for converting a K parameter to a log area ratio is that the conversion compresses the skew of its distribution. In general, the cross-sectional area of each part of the vocal tract varies continuously with time. In speech analysis, the vocal tract is described by the ratio of the cross-sectional areas on either side of each sampling point in the tract, each side being represented, for example, by the average cross-sectional area between adjacent sampling points. Because the K parameter represents the reflection coefficient in the vocal tract, this area ratio can be written as (1 + K_n)/(1 - K_n), where n is the order of K. The log area ratio is therefore log[(1 + K_n)/(1 - K_n)]. More generally, with sound velocity Vo and sampling period To, the n-th vocal tract area ratio corresponds to the vocal tract cross section at a distance nVoTo from the opening.
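As a minimal sketch of the conversion described above, the function below maps a K parameter to its log area ratio. The natural logarithm, the clamping margin `eps`, and the function name are assumptions made for illustration; the text does not specify a logarithm base.

```python
import math


def log_area_ratio(k: float, eps: float = 1e-6) -> float:
    """Map a reflection coefficient k in (-1, 1) to log((1 + k) / (1 - k)).

    The natural logarithm and the clamping margin eps are illustrative
    assumptions; they are not taken from the patent text.
    """
    # Clamp so that values at or beyond +/-1 cannot cause a division by zero.
    k = max(-1.0 + eps, min(1.0 - eps, k))
    return math.log((1.0 + k) / (1.0 - k))


if __name__ == "__main__":
    # Values crowded near +1 are spread apart by the transform.
    for k in (0.0, 0.5, 0.9, 0.99):
        print(f"K = {k:5.2f} -> L = {log_area_ratio(k):.3f}")
```

The transform is odd-symmetric and stretches the regions near K = +1 and K = -1, which is the sense in which it compresses the skew of a distribution piled up near one end of the interval.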

Applying this to voiced/unvoiced discrimination, more reliable discrimination can be achieved by not using the first-order and second-order K parameters, whose distributions are especially skewed, directly in the linear discriminant, and using their log area ratios instead.

Moreover, the log area ratios have usually already been computed for quantizing the K parameters, so adopting them has the advantage that it generally adds no computational load. The K parameters of third and higher order have relatively little skew and can clearly be used directly in the linear discriminant for voiced/unvoiced discrimination. The distribution of δ_MAX is slightly skewed (see, for example, FIG. 2 of the paper by Yato et al. cited above), so applying a nonlinear transformation such as aδ_MAX/(b - cδ_MAX), where a, b, and c are all constants, clearly makes it more effective as a parameter of the linear discriminant.
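The nonlinear transformation of δ_MAX can be sketched as follows. The default values of a, b, and c are placeholders chosen only so that the denominator stays positive for δ_MAX up to 1; the text gives no numerical values for these constants.

```python
def transform_delta_max(delta_max: float, a: float = 1.0,
                        b: float = 1.2, c: float = 1.0) -> float:
    """Return a * delta_max / (b - c * delta_max).

    a, b and c are hypothetical constants; with b > c the denominator is
    positive for every normalized autocorrelation value delta_max <= 1.
    """
    denom = b - c * delta_max
    if denom <= 0.0:
        raise ValueError("b - c * delta_max must be positive")
    return a * delta_max / denom


if __name__ == "__main__":
    # Larger delta_max values are stretched more strongly than small ones.
    for d in (0.1, 0.3, 0.6, 0.9):
        print(f"delta_max = {d:.1f} -> {transform_delta_max(d):.3f}")
```

With positive constants the mapping is monotonically increasing, so it preserves the ordering of frames while reshaping the distribution.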

The nonlinear transformation generally increases the amount of computation. In addition, unlike the first-order and second-order K parameters, δ_MAX can be expected to be reasonably effective even when used directly as a parameter of the linear discriminant. Whether to use δ_MAX directly or after the nonlinear transformation therefore has to be chosen according to the required discrimination accuracy; where a few voiced/unvoiced errors can be tolerated, δ_MAX may be used directly.

The present invention comprises means for performing voiced/unvoiced discrimination using a linear discriminant formed from various combinations of the first-order log area ratio, the second-order log area ratio, the K parameters of third and higher order, and δ_MAX, each combination always including the first-order log area ratio.

Next, specific configuration examples of the voiced/unvoiced discriminator according to the present invention will be described.

FIG. 1 is a block diagram showing a first embodiment.

A speech waveform input signal is supplied to a K parameter extractor 102 via a waveform input terminal 101. The K parameter extractor 102 extracts from the speech waveform input signal the K parameters, a kind of linear prediction coefficient also called partial autocorrelation coefficients, up to a plurality of orders (at least the second order) by a linear predictive analysis method such as the autocorrelation method, the regular lattice method, or the covariance lattice method (for these three methods see, for example, John Makhoul, "Stable and Efficient Lattice Methods for Linear Prediction", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, no. 5, October 1977, pp. 423-428). It outputs at least the extracted first-order and second-order K parameters to a log area ratio converter 103.
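One way the extractor could be realized with the autocorrelation method is the Levinson-Durbin recursion, sketched below. The function names and the sign convention (chosen so that K₁ equals the lag-1 autocorrelation divided by the lag-0 autocorrelation) are assumptions for illustration, not details taken from the patent.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Short-time autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def k_parameters(r: Sequence[float], order: int) -> List[float]:
    """Levinson-Durbin recursion returning K_1..K_order.

    Sign convention (an assumption): K_1 = r[1] / r[0].
    """
    a = [0.0] * (order + 1)   # predictor coefficients a[1..m]
    err = r[0]                # prediction residual power
    ks: List[float] = []
    for m in range(1, order + 1):
        if err <= 0.0:        # silent or perfectly predicted frame
            ks.append(0.0)
            continue
        acc = r[m] - sum(a[j] * r[m - j] for j in range(1, m))
        k = acc / err
        prev = a[:]
        a[m] = k
        for j in range(1, m):
            a[j] = prev[j] - k * prev[m - j]
        err *= 1.0 - k * k
        ks.append(k)
    return ks


if __name__ == "__main__":
    frame = [1.0, 0.8, 0.3, -0.2, -0.6, -0.7, -0.4, 0.1]
    print(k_parameters(autocorrelation(frame, 2), 2))
```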

The log area ratio converter 103 converts the first-order and second-order K parameters supplied from the K parameter extractor 102 into log area ratios, for example by using values stored in advance in a read-only memory, and outputs the results to a voiced/unvoiced discriminator 104. With L₁ denoting the first-order log area ratio and L₂ the second-order log area ratio, the voiced/unvoiced discriminator 104 decides voiced or unvoiced according to whether the value W₁L₁ + W₂L₂ is larger or smaller than a predetermined discrimination threshold. The discrimination threshold, W₁, and W₂ are constants obtained in advance, for example by multivariate analysis. The voiced/unvoiced discriminator 104 outputs the result as a voiced/unvoiced discrimination signal to a voiced/unvoiced signal output terminal 105.
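The first embodiment can be sketched as a table lookup followed by a weighted sum. Everything numerical here is hypothetical: the table size of 256, the weights, the threshold, and the choice that a score above the threshold means "voiced" are assumptions for illustration; in practice the constants would come from multivariate analysis of labeled speech.

```python
import math
from typing import List

ROM_SIZE = 256  # hypothetical number of table entries


def build_lar_rom(size: int = ROM_SIZE) -> List[float]:
    """Precompute log((1 + K) / (1 - K)) at the centre of each K cell."""
    table = []
    for i in range(size):
        k = -1.0 + (i + 0.5) * 2.0 / size   # cell centre, never exactly +/-1
        table.append(math.log((1.0 + k) / (1.0 - k)))
    return table


LAR_ROM = build_lar_rom()


def lar_lookup(k: float) -> float:
    """Convert a K parameter to a log area ratio by table lookup."""
    index = int((k + 1.0) * 0.5 * ROM_SIZE)
    index = max(0, min(ROM_SIZE - 1, index))
    return LAR_ROM[index]


def is_voiced(k1: float, k2: float, w1: float = 1.0, w2: float = -0.3,
              threshold: float = 1.0) -> bool:
    """Evaluate W1*L1 + W2*L2 and compare it with the threshold.

    Weights, threshold and decision direction are illustrative assumptions.
    """
    score = w1 * lar_lookup(k1) + w2 * lar_lookup(k2)
    return score > threshold


if __name__ == "__main__":
    print(is_voiced(0.9, -0.5), is_voiced(-0.2, 0.1))
```

Using a precomputed table avoids evaluating a logarithm per frame, at the cost of a quantization error set by the table size.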

FIG. 2 is a block diagram for explaining a second embodiment.

A speech waveform input signal is supplied to a K parameter extractor 202 via a waveform input terminal 201. The K parameter extractor 202 extracts K parameters up to at least the third order from the speech waveform input signal by a linear predictive analysis method such as the autocorrelation method. Of the extracted K parameters, it outputs the first-order and second-order ones to a log area ratio converter 203 and those from the third order to the N-th order to a voiced/unvoiced discriminator 204, where N is an integer of 3 or more.

The log area ratio converter 203 converts the first-order and second-order K parameters supplied from the K parameter extractor 202 into log area ratios, for example by using values stored in advance in a read-only memory, and outputs the results to the voiced/unvoiced discriminator 204. With L₁ denoting the first-order log area ratio, L₂ the second-order log area ratio, K₃ the third-order K parameter, and K_N the N-th-order K parameter, the voiced/unvoiced discriminator 204 decides voiced or unvoiced according to whether the value V₁L₁ + V₂L₂ + V₃K₃ + ... + V_N K_N is larger or smaller than a predetermined discrimination threshold, where V₁, V₂, ..., V_N are constants. The voiced/unvoiced discriminator 204 outputs the result as a voiced/unvoiced discrimination signal to a voiced/unvoiced signal output terminal 205.

FIG. 3 is a block diagram for explaining another configuration example of the second embodiment.

A speech waveform input signal is supplied to a K parameter extractor 302 via a waveform input terminal 301. The K parameter extractor 302 is an analyzer that has means for measuring, from the waveform input signal, the ratio of the prediction residual power to the input signal power, that is, the normalized prediction residual power, and that controls the order of the K parameters according to the measurement.

The K parameter extractor 302 analyzes the speech waveform input signal supplied via the waveform input terminal 301 and extracts K parameters of successive orders until the normalized prediction residual power falls to or below a preset value.

The maximum order of these K parameters is normally limited, for example to the eighth order. It is also known empirically that the minimum order is 2 if a suitable set value is chosen for the normalized prediction residual power. The K parameter extractor 302 outputs the extracted first-order and second-order K parameters to a first K parameter transmission line 303 and an analysis order signal indicating the order N1 of the K parameters to an analysis order signal transmission line 305. When N1 is 3 or more, the K parameter extractor 302 also outputs the one or more K parameters from the third order to the N1-th order to a second K parameter transmission line 304. A log area ratio converter 306 converts the first-order and second-order K parameters supplied via the first K parameter transmission line 303 into log area ratios and outputs the results to a log area ratio transmission line 307.
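The order control of the extractor 302 can be sketched by stopping a Levinson-Durbin recursion once the normalized prediction residual power reaches a target. The target value 0.1, the enforced minimum order of 2, and the sign convention K₁ = r[1]/r[0] are assumptions for illustration; the maximum order of 8 follows the example in the text.

```python
from typing import List, Sequence, Tuple


def k_parameters_adaptive(r: Sequence[float], residual_target: float = 0.1,
                          min_order: int = 2, max_order: int = 8
                          ) -> Tuple[List[float], float]:
    """Return (K_1..K_N1, normalized residual power).

    The recursion stops at the first order N1 >= min_order whose normalized
    residual power err / r[0] is at or below residual_target, or at max_order.
    """
    if len(r) <= max_order:
        raise ValueError("need autocorrelation values r[0..max_order]")
    if r[0] <= 0.0:
        return [0.0] * min_order, 0.0   # silent frame
    a = [0.0] * (max_order + 1)
    err = r[0]
    ks: List[float] = []
    for m in range(1, max_order + 1):
        k = 0.0
        if err > 0.0:
            k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for j in range(1, m):
            a[j] = prev[j] - k * prev[m - j]
        err *= 1.0 - k * k
        ks.append(k)
        if m >= min_order and err / r[0] <= residual_target:
            break
    return ks, err / r[0]


if __name__ == "__main__":
    r = [0.95 ** m for m in range(9)]   # autocorrelation of a first-order process
    ks, residual = k_parameters_adaptive(r)
    print(len(ks), residual)
```

The length of the returned list plays the role of the analysis order signal N1.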

The voiced/unvoiced discriminator 308 is controlled by the analysis order signal, supplied from the analysis order signal transmission line 305, that indicates the order N1 of the K parameters. When N1 is 2, the voiced/unvoiced discriminator 308 performs voiced/unvoiced discrimination using W₁L₁ + W₂L₂, where L₁ and L₂ are the first-order and second-order log area ratios supplied via the log area ratio transmission line 307. When N1 is 3 or more, it performs voiced/unvoiced discrimination using L₁, L₂, and the K parameters K₃, ..., K_N1 from the third order to the N1-th order supplied via the second K parameter transmission line 304, in the expression U₁^N1 L₁ + U₂^N1 L₂ + U₃^N1 K₃ + ... + U_N1^N1 K_N1. Here W₁, W₂, and U_i^N1 (i = 1, ..., N1) are constants. The voiced/unvoiced discriminator 308 outputs the discrimination result to a voiced/unvoiced discrimination signal output terminal 309.
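A discriminator whose weights depend on the analysis order could be organized as a table keyed by N1, as in the sketch below. The weight values, the shared threshold, the use of one table for all orders, and the decision direction are all hypothetical; the sketch also assumes the discriminant is a plain weighted sum of L₁, L₂, and the higher-order K parameters.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical weight sets, one per analysis order N1.
WEIGHTS: Dict[int, List[float]] = {
    2: [1.0, -0.3],
    3: [1.0, -0.3, 0.2],
    4: [1.0, -0.3, 0.2, -0.1],
}
THRESHOLD = 1.0  # hypothetical decision threshold


def lar(k: float, eps: float = 1e-6) -> float:
    """Log area ratio log((1 + k) / (1 - k)), clamped away from +/-1."""
    k = max(-1.0 + eps, min(1.0 - eps, k))
    return math.log((1.0 + k) / (1.0 - k))


def discriminant(ks: Sequence[float]) -> float:
    """Weighted sum of L1, L2 and K3..K_N1 using the weights for N1 = len(ks)."""
    weights = WEIGHTS[len(ks)]            # KeyError for an unsupported order
    features = [lar(ks[0]), lar(ks[1])] + list(ks[2:])
    return sum(w * f for w, f in zip(weights, features))


def is_voiced(ks: Sequence[float]) -> bool:
    return discriminant(ks) > THRESHOLD


if __name__ == "__main__":
    print(is_voiced([0.9, -0.5, 0.1]), is_voiced([-0.2, 0.1]))
```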

FIG. 4 is a block diagram for explaining a third embodiment.

A speech waveform input signal band-limited to, for example, 3400 Hz is supplied to an A/D converter 402 via a waveform input terminal 401. The A/D converter 402 samples the speech waveform input signal at, for example, 8000 Hz and outputs it to an autocorrelation coefficient measuring device 403. The autocorrelation coefficient measuring device 403 measures δ₁, the ratio of the autocorrelation coefficient at a delay of one sampling period (for example, 1/8000 s) of the sampled signal to the autocorrelation coefficient at zero delay.

As is well known, δ₁ coincides with the first-order K parameter K₁.

The autocorrelation coefficient measuring device 403 also measures δ_MAX, the ratio of the maximum value of the autocorrelation coefficient over the range of delays in which the pitch period is expected to lie, for example 2 ms to 18 ms, to the autocorrelation coefficient at zero delay. It outputs δ₁ to a log area ratio converter 404 and δ_MAX to a nonlinear converter 405. The log area ratio converter 404 treats the δ₁ supplied from the autocorrelation coefficient measuring device 403 as the first-order K parameter K₁, converts it into the first-order log area ratio L₁, and outputs L₁ to a voiced/unvoiced discriminator 406. The nonlinear converter 405 converts the δ_MAX supplied from the autocorrelation coefficient measuring device 403 into δ′_MAX, for example by δ′_MAX = aδ_MAX/(b - cδ_MAX), where a, b, and c are constants, and outputs δ′_MAX to the voiced/unvoiced discriminator 406. The voiced/unvoiced discriminator 406 performs voiced/unvoiced discrimination from the first-order log area ratio L₁ and the nonlinearly transformed δ_MAX, that is, δ′_MAX, using T₁L₁ + T₂δ′_MAX.
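The two measurements made by the autocorrelation coefficient measuring device can be sketched for a signal sampled at 8000 Hz, where delays of 2 ms and 18 ms correspond to lags of 16 and 144 samples. The function names and the unwindowed, direct-sum autocorrelation are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

SAMPLE_RATE_HZ = 8000
MIN_LAG = SAMPLE_RATE_HZ * 2 // 1000    # 2 ms  -> 16 samples
MAX_LAG = SAMPLE_RATE_HZ * 18 // 1000   # 18 ms -> 144 samples


def autocorr(x: Sequence[float], lag: int) -> float:
    """Direct-sum autocorrelation of one frame at the given lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def delta_features(x: Sequence[float]) -> Tuple[float, float]:
    """Return (delta_1, delta_max), both normalized by the lag-0 value."""
    if len(x) <= MAX_LAG:
        raise ValueError("frame must be longer than the largest lag")
    r0 = autocorr(x, 0)
    if r0 <= 0.0:
        return 0.0, 0.0   # silent frame
    delta_1 = autocorr(x, 1) / r0
    delta_max = max(autocorr(x, lag)
                    for lag in range(MIN_LAG, MAX_LAG + 1)) / r0
    return delta_1, delta_max


if __name__ == "__main__":
    # A 100 Hz sine (period 80 samples) stands in for a strongly voiced frame.
    frame = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE_HZ) for n in range(400)]
    print(delta_features(frame))
```

Because the sum at lag L covers fewer products than the sum at lag 0, δ_MAX for a perfectly periodic frame stays below 1 and depends on the frame length.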

Here T₁ and T₂ are constants. The voiced/unvoiced discriminator 406 outputs the discrimination result as a voiced/unvoiced discrimination signal to a voiced/unvoiced discrimination signal output terminal 407. It is clear that the third embodiment can also be implemented with the nonlinear converter 405 of FIG. 4 removed.

FIG. 5 is a block diagram for explaining a fourth embodiment.

A speech waveform input signal band-limited to, for example, 3400 Hz is supplied to an A/D converter 502 via a waveform input terminal 501. The A/D converter 502 samples the speech waveform input signal at, for example, 8000 Hz and outputs it to a K parameter extractor 503 and an autocorrelation coefficient measuring device 504. The K parameter extractor 503 extracts the first-order and second-order K parameters, K₁ and K₂, from the sampled speech waveform input signal and outputs them to a log area ratio converter 505.

The log area ratio converter 505 converts the first-order and second-order K parameters K₁ and K₂ supplied from the K parameter extractor 503 into the first-order and second-order log area ratios L₁ and L₂, and outputs L₁ and L₂ to a voiced/unvoiced discriminator 507. The autocorrelation coefficient measuring device 504 measures, from the sampled speech waveform input signal, δ_MAX, the ratio of the maximum value of the autocorrelation coefficient at delays in the range of, for example, 2 ms to 18 ms to the autocorrelation coefficient at zero delay, and outputs δ_MAX to a nonlinear converter 506.

The nonlinear converter 506 converts the δ_MAX supplied from the autocorrelation coefficient measuring device 504 into δ′_MAX using, for example, the formula δ′_MAX = aδ_MAX / (b − cδ_MAX), and outputs δ′_MAX to the voiced/unvoiced discriminator 507. Here a, b, and c are constants. The voiced/unvoiced discriminator 507 performs voiced/unvoiced discrimination from the first-order log area ratio L₁, the second-order log area ratio L₂, and the nonlinearly transformed δ_MAX, that is, δ′_MAX, using the expression S₁L₁ + S₂L₂ + S₃δ′_MAX.

Note that S₁, S₂, and S₃ above are constants. The voiced/unvoiced discriminator 507 further outputs the discrimination result as a voiced/unvoiced discrimination signal to a voiced/unvoiced discrimination signal output terminal 508. It is clear that the fourth embodiment can also be implemented with the nonlinear converter 506 of FIG. 5 removed.
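The nonlinear conversion and the linear discriminant of the fourth embodiment can be sketched as below. The function names and all numeric constants are placeholders, since the text gives no values for a, b, c, S₁, S₂, S₃ or the threshold. Treating a score above the threshold as voiced is also an assumption: the abstract and claim 1 only say that the discriminant value is compared with a predetermined threshold.

```python
def nonlinear_delta(delta, a=1.0, b=1.0, c=0.5):
    # delta' = a * delta / (b - c * delta); a, b, c are placeholder constants.
    denom = b - c * delta
    if denom == 0.0:
        raise ZeroDivisionError("b - c * delta must be non-zero")
    return a * delta / denom


def is_voiced(l1, l2, delta_prime, weights=(1.0, 1.0, 1.0), threshold=0.0):
    # Linear discriminant S1*L1 + S2*L2 + S3*delta' compared with a threshold.
    s1, s2, s3 = weights
    score = s1 * l1 + s2 * l2 + s3 * delta_prime
    return score > threshold  # assumed direction: larger score means voiced
```

Removing the nonlinear converter, as the text allows, corresponds to passing δ_MAX directly as `delta_prime`.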

FIG. 6 is a block diagram for explaining another configuration example in the fifth embodiment.

A speech waveform input signal band-limited to, for example, 3400 Hz is supplied to an A/D converter 602 via a waveform input terminal 601. The A/D converter 602 samples the speech waveform input signal at, for example, 8000 Hz and outputs it to a K parameter extractor 603 and an autocorrelation coefficient measuring device 604. The K parameter extractor 603 extracts K parameters up to the Nth order from the sampled speech waveform input signal. Of these, it outputs the first-order and second-order K parameters to a log area ratio converter 605 and the third- to Nth-order K parameters to a voiced/unvoiced discriminator 607. Here N is an integer of 3 or more.

The log area ratio converter 605 converts the first-order and second-order K parameters K₁ and K₂ supplied from the K parameter extractor 603 into first-order and second-order log area ratios L₁ and L₂, and outputs L₁ and L₂ to the voiced/unvoiced discriminator 607. The autocorrelation coefficient measuring device 604 measures the so-called δ_MAX, extracted from the sampled speech waveform input signal over delay times in the range of, for example, 2 ms to 18 ms, and outputs δ_MAX to a nonlinear converter 606. The nonlinear converter 606 converts δ_MAX into δ′_MAX using, for example, the formula δ′_MAX = aδ_MAX / (b − cδ_MAX), and outputs δ′_MAX to the voiced/unvoiced discriminator 607. Here a, b, and c are constants.

The voiced/unvoiced discriminator 607 performs voiced/unvoiced discrimination using a linear discriminant whose variables are the first-order log area ratio L₁, the second-order log area ratio L₂, the third- to Nth-order K parameters K₃, ..., K_N, and the nonlinearly transformed δ_MAX, that is, δ′_MAX. The coefficients Q₁, ..., Q_{N+1} of the discriminant are constants. The voiced/unvoiced discriminator 607 further outputs the discrimination result as a voiced/unvoiced discrimination signal to a voiced/unvoiced discrimination signal output terminal 608. It is clear that the fifth embodiment can also be implemented with the nonlinear converter 606 of FIG. 6 removed.
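The discriminant expression for this embodiment is not reproduced in the text. One plausible form, assuming the N+1 constants are assigned to the variables in the order in which they are listed (an assumption, not something the text confirms), is:

```latex
Q_1 L_1 + Q_2 L_2 + \sum_{i=3}^{N} Q_i K_i + Q_{N+1}\,\delta'_{MAX}
```

With the K₃, ..., K_N terms dropped, this has the same three-variable shape as the expression S₁L₁ + S₂L₂ + S₃δ′_MAX of the fourth embodiment.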

FIG. 7 is a block diagram for explaining another configuration example of the fifth embodiment.

A speech waveform input signal band-limited to, for example, 3400 Hz is supplied to an A/D converter 702 via a waveform input terminal 701. The A/D converter 702 samples the speech waveform input signal at, for example, 8000 Hz and outputs it to a K parameter extractor 703 and an autocorrelation coefficient measuring device 704.

The K parameter extractor 703 is an analyzer that has means for measuring the normalized prediction residual power from the sampled speech waveform input signal and that controls the order of the K parameters according to the measurement result. The K parameter extractor 703 analyzes the sampled speech waveform input signal and extracts K parameters of successive orders until the normalized prediction residual power falls to or below a preset value. The maximum order of the K parameters is usually limited to, for example, the eighth order. It is also known empirically that, if an appropriate set value of the normalized prediction residual power is chosen, the minimum order is 2.
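The order control described above can be illustrated with a Levinson-Durbin recursion that stops once the normalized prediction residual power is small enough. The patent does not state which algorithm the extractor uses, so this is a sketch under assumptions: the function name, the default limit of 0.1, and the sign convention (K₁ = r[1] / r[0]) are all chosen for illustration. The bounds of 2 and 8 follow the minimum and maximum orders mentioned in the text.

```python
def k_parameters(r, residual_limit=0.1, min_order=2, max_order=8):
    """Return (K parameters, normalized residual power).

    r holds autocorrelation values r[0] .. r[max_order] of one frame.
    """
    alpha = []   # predictor coefficients alpha_1 .. alpha_(m-1)
    err = r[0]   # prediction residual power at order 0
    ks = []
    for m in range(1, max_order + 1):
        if err <= 0.0:
            break  # perfectly predictable frame; nothing left to model
        acc = r[m] - sum(alpha[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err
        # Order update: alpha_j <- alpha_j - k * alpha_(m-j), then alpha_m = k.
        alpha = [alpha[i] - k * alpha[m - 2 - i] for i in range(m - 1)] + [k]
        ks.append(k)
        err *= 1.0 - k * k
        if m >= min_order and err / r[0] <= residual_limit:
            break  # residual small enough: stop raising the order
    return ks, err / r[0]
```

The length of the returned list plays the role of the analysis order N1 that the extractor reports alongside the K parameters.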

The K parameter extractor 703 further outputs the extracted first-order and second-order K parameters to a K parameter transmission line 1 (705), and outputs an analysis order signal indicating the order N1 of the K parameters to an analysis order signal transmission line 707. When N1 is 3 or more, the K parameter extractor 703 additionally outputs the one or more K parameters from the third order to the N1-th order to a K parameter transmission line 2 (706). A log area ratio converter 708 converts the first-order and second-order K parameters supplied via the K parameter transmission line 1 (705) into log area ratios and outputs the results to a log area ratio transmission line 709. The autocorrelation coefficient measuring device 704 measures the so-called δ_MAX, extracted from the sampled speech waveform input signal supplied from the A/D converter 702 over delay times in the range of, for example, 2 ms to 18 ms. This δ_MAX is converted into δ′_MAX by a nonlinear converter 710 using, for example, the formula δ′_MAX = aδ_MAX / (b − cδ_MAX), and δ′_MAX is output to a δ transmission line 711. A voiced/unvoiced discriminator 712 is controlled by the analysis order signal, supplied via the analysis order signal transmission line 707, that indicates the order N1 of the K parameters. When N1 is 2, the voiced/unvoiced discriminator 712 takes the first-order log area ratio supplied via the log area ratio transmission line 709 as L₁, the second-order log area ratio as L₂, and the value supplied via the δ transmission line 711 as δ′_MAX, and performs voiced/unvoiced discrimination using the expression S₁L₁ + S₂L₂ + S₃δ′_MAX. Note that S₁, S₂, and S₃ above are constants.

When N1 is 3 or more, voiced/unvoiced discrimination is performed using a linear discriminant in the above L₁, L₂, and δ′_MAX together with the third- to N1-th-order K parameters K₃, ..., K_N1 supplied via the K parameter transmission line 2 (706).

The coefficients U_j^{N1} (j = 1, ..., N1+1) of this discriminant are constants. The voiced/unvoiced discriminator 712 further outputs the discrimination result as a voiced/unvoiced discrimination signal to a voiced/unvoiced discrimination signal output terminal 713. It is clear that the fifth embodiment can also be implemented with the nonlinear converter 710 of FIG. 7 removed.
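Selecting a discriminant according to the analysis order N1 can be sketched as a table lookup. The function name, the table layout, and the coefficient values are hypothetical, and the ordering of the terms (L₁, L₂, K₃ to K_N1, then δ′_MAX) is an assumption, since the discriminant expression itself is not reproduced in the text.

```python
def discriminant_score(l1, l2, higher_ks, delta_prime, coeff_table):
    # higher_ks holds K3 .. K_N1 (empty when N1 == 2).
    n1 = 2 + len(higher_ks)
    coeffs = coeff_table[n1]  # the N1 + 1 constants for this order
    variables = [l1, l2, *higher_ks, delta_prime]
    if len(coeffs) != len(variables):
        raise ValueError("expected N1 + 1 coefficients for order N1")
    return sum(c * v for c, v in zip(coeffs, variables))


# Placeholder coefficient sets, one per analysis order.
EXAMPLE_TABLE = {
    2: (1.0, 0.5, 2.0),
    3: (1.0, 0.5, -0.3, 2.0),
}
```

For N1 = 2 the lookup reduces to the three-term expression S₁L₁ + S₂L₂ + S₃δ′_MAX given in the text.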

The embodiments described above can be summarized as the following configurations:

(1) A voiced/unvoiced discrimination device for discriminating whether speech is voiced or unvoiced, having means for discriminating voiced/unvoiced speech using a linear discriminant whose variables are the logarithm of the first-order vocal tract cross-sectional area ratio and the logarithm of the second-order vocal tract cross-sectional area ratio.

(2) A voiced/unvoiced discrimination device for discriminating whether speech is voiced or unvoiced, having means for discriminating voiced/unvoiced speech using either a single linear discriminant whose variables are the logarithm of the first-order vocal tract cross-sectional area ratio, the logarithm of the second-order vocal tract cross-sectional area ratio, and the third-order or a plurality of third- and higher-order partial autocorrelation coefficients, or multiple linear discriminants whose variables are the logarithm of the first-order vocal tract cross-sectional area ratio, the logarithm of the second-order vocal tract cross-sectional area ratio, and partial autocorrelation coefficients of a plurality of orders, excluding the first and second orders, whose order is determined from the normalized prediction residual power value.

(3) A voiced/unvoiced discrimination device for discriminating whether speech is voiced or unvoiced, having means for discriminating voiced/unvoiced speech using a linear discriminant whose variables are the logarithm of the first-order vocal tract cross-sectional area ratio and either the maximum value of the autocorrelation coefficient within a finite delay time range excluding the vicinity of zero delay, or a nonlinear transformation of that maximum value.

(4) A voiced/unvoiced discrimination device for discriminating whether speech is voiced or unvoiced, having means for discriminating voiced/unvoiced speech using a linear discriminant whose variables are the logarithm of the first-order vocal tract cross-sectional area ratio, the logarithm of the second-order vocal tract cross-sectional area ratio, and either the maximum value of the autocorrelation coefficient within a finite delay time range excluding the vicinity of zero delay, or a nonlinear transformation of that maximum value.

(5) A voiced/unvoiced discrimination device for discriminating whether speech is voiced or unvoiced, having means for discriminating voiced/unvoiced speech using either a single linear discriminant whose variables are three variables, namely the logarithm of the first-order vocal tract cross-sectional area ratio, the logarithm of the second-order vocal tract cross-sectional area ratio, and the maximum value of the autocorrelation coefficient within a finite delay time range excluding the vicinity of zero delay or a nonlinear transformation of that maximum value, together with the third-order or a plurality of third- and higher-order partial autocorrelation coefficients, or multiple linear discriminants whose variables are the said three variables and partial autocorrelation coefficients of a plurality of orders, excluding the first and second orders, whose order is determined from the normalized prediction residual power value.

As explained above, the present invention takes those parameters used for voiced/unvoiced discrimination whose distributions are strongly skewed and which, if used directly as variables of a linear discriminant, separate voiced from unvoiced speech poorly (for example, the first-order K parameter), converts them in advance into parameters that compress the skew of the distribution (for example, the log area ratio), and uses the converted values as variables of the linear discriminant. As a result, voiced/unvoiced discrimination accuracy is significantly improved compared with conventional voiced/unvoiced discrimination devices.

[Brief explanation of the drawing]

FIGS. 1 to 7 are schematic block diagrams showing embodiments of the voiced/unvoiced discrimination device according to the present invention.

102, 202, 302, 503, 603, 703: K parameter extractor
402, 502, 602, 702: A/D converter
103, 203, 306, 404, 505, 605, 708: log area ratio converter
104, 204, 308, 406, 507, 607, 712: voiced/unvoiced discriminator
403, 504, 604, 704: autocorrelation coefficient measuring device
405, 506, 606, 710: nonlinear converter

Claims

1. A voiced/unvoiced discrimination device that discriminates between voiced and unvoiced speech by comparing, with a predetermined threshold value, the value of a discriminant that has predetermined coefficients and uses predetermined characteristic parameters of an input audio signal as variables, wherein at least one of the variables is a partial autocorrelation coefficient (K parameter) up to a predetermined order, and this partial autocorrelation coefficient is converted into a log area ratio and used as a variable in the discriminant.

2. The voiced/unvoiced discrimination device according to claim 1, wherein the predetermined order is 1.

3. The voiced/unvoiced discrimination device according to claim 1, wherein the predetermined order is 2.