JPS5912198B2

JPS5912198B2 - Audio parameter abnormality detection method

Info

Publication number: JPS5912198B2
Application number: JP56214567A
Authority: JP
Inventors: 亨金盛
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-12-25
Filing date: 1981-12-25
Publication date: 1984-03-21
Also published as: JPS58111996A

Description

DETAILED DESCRIPTION OF THE INVENTION

(1) Technical Field of the Invention

The present invention relates to a method, used in a speech analysis-synthesis system, for automatically detecting abnormal speech parameters that can cause unpleasant sounds such as abnormal noises in synthesized speech. In particular, it relates to a method that detects abnormal parameters by approximately converting them into a function of pole frequency and bandwidth.

(2) Technical Background

In general, linear prediction methods such as PARCOR and LSP approximate the spectral envelope of speech with an all-pole model expressed by

$$H(z) = \frac{1}{1 + \sum_{i=1}^{P} \alpha_i z^{-i}}$$

where P is the prediction order and α_i are the linear prediction coefficients. This model often fits poorly for nasal consonants such as N and M, for the moraic nasal, and for vowels such as I. For such speech, where the first formant frequency is low and the pitch frequency nearly coincides with it, the bandwidth of the first formant may be analyzed as abnormally narrow.
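To illustrate why an abnormally narrow first-formant bandwidth matters, the following minimal Python sketch evaluates the gain of the all-pole model at a single resonance. The function names, the 8 kHz sampling rate, and the 80 Hz and 5 Hz bandwidths are illustrative assumptions, not values from the patent. The relation r = exp(-πB/fs) between pole radius and bandwidth is the usual textbook approximation.

```python
import cmath
import math


def all_pole_gain(alphas, freq_hz, fs_hz):
    """Return |H| at freq_hz for H(z) = 1 / (1 + sum_i alphas[i-1] * z^-i)."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    denom = 1.0 + sum(a * cmath.exp(-1j * w * (i + 1)) for i, a in enumerate(alphas))
    return 1.0 / abs(denom)


def resonance_alphas(freq_hz, bandwidth_hz, fs_hz):
    """Second-order coefficients for one conjugate pole pair (assumed mapping)."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius
    theta = 2.0 * math.pi * freq_hz / fs_hz        # pole angle
    return [-2.0 * r * math.cos(theta), r * r]


fs = 8000.0  # assumed sampling rate
wide_gain = all_pole_gain(resonance_alphas(300.0, 80.0, fs), 300.0, fs)
narrow_gain = all_pole_gain(resonance_alphas(300.0, 5.0, fs), 300.0, fs)
print(f"gain at 300 Hz: wide={wide_gain:.1f}, narrow={narrow_gain:.1f}")
```

With these assumed values the narrow resonance has a much larger gain at 300 Hz, so a pitch harmonic falling on it is amplified far more than its neighbors.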

In such cases, the synthesized speech often has a very large amplitude or contains unpleasant abnormal sounds.

(3) Prior Art and Problems

Conventionally, improving the quality of a particular piece of synthesized speech in speech analysis-synthesis required repeated cycles of synthesis and listening. An operator had to search manually for the point in time at which a parameter abnormality occurred and then try suitable corrections to the parameters, over and over.

Moreover, it was not always possible to determine accurately which part of the speech parameters caused an abnormality in the synthesized speech. As a result, suppressing, for example, a short but strongly irritating abnormal sound required an inefficient, trial-and-error correction process.

(4) Object of the Invention

The object of the present invention is to provide a means for automatically detecting parameter abnormalities that are particularly likely to produce abnormal speech.

(5) Structure of the Invention

To detect abnormal parameters automatically, the present invention focuses on the fact that the proximity between the pitch frequency and a pole frequency having a narrow bandwidth in the spectral envelope is closely related to the occurrence of abnormal sounds in synthesized speech.

FIG. 1 is an explanatory diagram of the relationship between the pole frequency and the pitch frequency described above.

In the figure, F is the spectral envelope approximately expressed by a plurality of pole frequencies Fi and bandwidths Bi. In the present invention, when the pitch frequency Pi lies close, for example within a few Hz to about 30 Hz, to a pole frequency Fi (for example 300 Hz) whose bandwidth Bi is particularly narrow among the bandwidths of the pole frequencies, this is judged to be an abnormal parameter.

The present invention applies to a speech analysis-synthesis system that analyzes speech, extracts information on the pitch frequency and information on the spectral envelope, and uses these as a parameter time series for speech synthesis. It is characterized by comprising: means for approximately converting the analyzed spectral envelope information into a function of pole frequencies and their bandwidths; means for extracting, from the converted function, a pole frequency having a narrow bandwidth; and means for judging the proximity between the extracted pole frequency and the analyzed pitch frequency. A parameter abnormality is judged when the proximity between the pole frequency and the pitch frequency is high.

(6) Embodiment of the Invention

The present invention is described in detail below with reference to an embodiment.

FIG. 2 is a block diagram of a speech analyzer embodying the present invention. In the figure, 1 is a speech analysis processing unit and 2 is a speech parameter abnormality detection unit according to the present invention. In the speech analysis processing unit 1, 3 is a sound source information analysis unit that extracts pitch, amplitude, and voiced/unvoiced information from the input speech, and 4 is a linear prediction analysis unit that performs linear prediction analysis and PARCOR conversion to parameterize the spectral envelope information. The analysis outputs of units 3 and 4 are used for speech synthesis in a speech synthesizer (not shown). In the parameter abnormality detection unit 2, 5 is a conversion unit that converts the PARCOR coefficients output from the linear prediction analysis unit 4 back into linear prediction coefficients.
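The conversion performed by unit 5 can be sketched with the standard step-up recursion, as below. The patent does not spell out the recursion or its sign convention. This sketch assumes A(z) = 1 + Σ α_i z^{-i} and treats each PARCOR value k_m as the last coefficient of the order-m predictor; some texts define PARCOR coefficients with the opposite sign, in which case each k would be negated first. The function name is hypothetical.

```python
def parcor_to_lpc(ks):
    """Step-up recursion from reflection (PARCOR) coefficients to alphas.

    Assumed convention: A(z) = 1 + sum_i alphas[i-1] * z^-i, and the
    order-m polynomial has alpha_m = k_m.
    """
    alphas = []
    for m, k in enumerate(ks, start=1):
        prev = alphas
        # alpha_i(m) = alpha_i(m-1) + k_m * alpha_{m-i}(m-1), for i = 1 .. m-1
        alphas = [prev[i] + k * prev[m - 2 - i] for i in range(m - 1)] + [k]
    return alphas


example = parcor_to_lpc([0.5, 0.2])
print(example)
```

For the two-coefficient example, the first-order result is [0.5], and the second step gives α_1 = 0.5 + 0.2 × 0.5 and α_2 = 0.2.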

If the linear prediction coefficients can be taken from the linear prediction analysis unit 4 as intermediate processing data, the conversion unit 5 is unnecessary. Reference numeral 6 denotes a root-finding unit. Taking A(z) as the denominator of the transfer function H(z) of the all-pole model of the speech spectral envelope described above, it uses a method such as Newton-Raphson to find the roots on the z-plane that satisfy A(z) = 0.

Reference numeral 7 denotes a conversion unit that obtains the pole frequency and bandwidth from each solution produced by the root-finding unit 6, where T is the sampling period. [The expressions for the solution, the pole frequency, and the bandwidth are not reproduced in this text.]
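Because the patent's own expressions are missing here, the sketch below uses the conventional conversion as an assumption: writing a root as z = r·e^{jθ}, the pole frequency is F = θ / (2πT) and the bandwidth is B = -ln(r) / (πT). The function name is hypothetical.

```python
import cmath
import math


def root_to_pole(z, T):
    """Map a z-plane root to (pole frequency in Hz, bandwidth in Hz).

    Assumed textbook relations: F = arg(z) / (2*pi*T), B = -ln|z| / (pi*T).
    """
    freq_hz = cmath.phase(z) / (2.0 * math.pi * T)
    bandwidth_hz = -math.log(abs(z)) / (math.pi * T)
    return freq_hz, bandwidth_hz


T = 1.0 / 8000.0  # assumed sampling period (8 kHz)
z = math.exp(-math.pi * 20.0 * T) * cmath.exp(1j * 2.0 * math.pi * 300.0 * T)
freq, bandwidth = root_to_pole(z, T)
print(freq, bandwidth)
```

Roots in the lower half of the z-plane map to negative frequencies; they are the conjugates of the upper-half roots and carry no extra information.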

Reference numeral 8 denotes an extraction unit for pole frequencies Fi having a narrow bandwidth Bi.

The narrowness of a bandwidth is judged, and the pole extracted, according to whether the value of Bi is at or below a certain threshold, or alternatively whether a second quantity (its expression is not reproduced in this text) is at or above a certain threshold.

Reference numeral 9 denotes a judgment unit that judges the proximity between the pole frequency of each extracted narrow-bandwidth pole and the pitch information (frequency) Pi in the output of the sound source information analysis unit 3.

The proximity judgment is made, for example, on the basis of the following expression. [The expression is not reproduced in this text.]
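Since the judgment expression itself is missing, the following sketch assumes the simplest form consistent with the earlier description: a frame is flagged when |Fi - Pi| is at or below a tolerance. The 30 Hz default follows the "a few Hz to about 30 Hz" range mentioned in section (5). The frame layout, the use of None for unvoiced frames, and the function name are hypothetical.

```python
def abnormal_frames(frames, max_diff_hz=30.0):
    """Flag frames in which a narrow-bandwidth pole lies close to the pitch.

    frames: list of (pitch_hz or None, [(pole_freq_hz, bandwidth_hz), ...]),
    where the pole list holds only the already-extracted narrow poles.
    Returns a list of (frame_index, [offending pole frequencies]).
    """
    flagged = []
    for t, (pitch_hz, narrow_poles) in enumerate(frames):
        if pitch_hz is None:  # unvoiced frame: no pitch to compare (assumption)
            continue
        hits = [f for f, _ in narrow_poles if abs(f - pitch_hz) <= max_diff_hz]
        if hits:
            flagged.append((t, hits))
    return flagged


frames = [
    (120.0, [(300.0, 12.0)]),  # pitch far from the narrow pole
    (290.0, [(300.0, 12.0)]),  # pitch within 30 Hz of the narrow pole
    (None, [(300.0, 12.0)]),   # unvoiced
    (310.0, []),               # no narrow poles in this frame
]
print(abnormal_frames(frames))
```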

When a pole frequency Fi satisfying this judgment expression is found, it is judged to be an abnormal parameter and output. Once this judgment output is obtained, the data can be used to correct the speech analysis output in advance, for example by widening the narrow bandwidth, shifting the pitch frequency, or changing some other suitable factor that affects sound quality.

(7) Effects of the Invention

Because the present invention can automatically detect parameters that are likely to produce abnormal speech, it greatly simplifies the work of improving the quality of synthesized speech.

[Brief explanation of drawings]

FIG. 1 is an explanatory diagram of the principle of the present invention, and FIG. 2 is a block diagram of an embodiment of the present invention. In the figures, 1 is a speech analysis processing unit, 2 is a speech parameter abnormality detection unit, 3 is a sound source information analysis unit, 4 is a linear prediction analysis unit, 5 is a conversion unit from PARCOR coefficients to linear prediction coefficients, 6 is a root-finding unit, 7 is a conversion unit from roots to pole frequencies and bandwidths, 8 is a narrow-bandwidth extraction unit, and 9 is a unit for judging the proximity between pole frequency and pitch frequency.

Claims

[Claims]

1. A speech parameter abnormality detection method for use in a speech analysis-synthesis system that analyzes speech, extracts information on the pitch frequency and information on the spectral envelope, and uses these as a parameter time series for speech synthesis, the method comprising:
means for approximately converting the analyzed spectral envelope information into a function of pole frequencies and their bandwidths; means for extracting, from the converted function, a pole frequency having a narrow bandwidth; and means for judging the proximity between the extracted pole frequency and the analyzed pitch frequency,
wherein a parameter abnormality is judged when the proximity between the pole frequency and the pitch frequency is high.