JPS5912199B2

JPS5912199B2 - Audio parameter modification method

Info

Publication number: JPS5912199B2
Application number: JP56214568A
Authority: JP
Inventors: 亨金盛
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-12-25
Filing date: 1981-12-25
Publication date: 1984-03-21
Also published as: JPS58111997A

Description

Detailed Description of the Invention

(1) Technical Field of the Invention

The present invention relates to a speech parameter correction method for a speech analysis-synthesis system, which automatically suppresses unpleasant sounds, such as abnormal noises, in the synthesized speech.

(2) Technical Background

In linear prediction methods such as PARCOR and LSP, the spectral envelope of speech is generally approximated by the all-pole model

$$H(z) = \frac{1}{1 + \sum_{i=1}^{P} \alpha_i z^{-i}}$$

where $P$ is the prediction order and $\alpha_i$ are the linear prediction coefficients. Some sounds often do not fit this model, such as nasal consonants like N and M and the moraic nasal. For these sounds, or for speech with a low first formant frequency such as the vowel "e", the first formant may be analyzed as having an abnormally narrow bandwidth, for instance when the fundamental frequency nearly coincides with the first formant frequency.

In such cases, the synthesized speech often has a very large amplitude or contains unpleasant abnormal sounds.

(3) Prior Art and Its Problems

Conventionally, improving the quality of a particular piece of synthesized speech in analysis-synthesis required repeated cycles of synthesizing and listening: the operator manually searched for the point in time at which a parameter anomaly occurred and then tried an appropriate correction to the parameters.

Moreover, it was not always possible to determine accurately which part of the speech parameters caused the abnormality in the synthesized speech. For example, when a short but strongly jarring noise such as "gyon, gyon" was mixed into the synthesized speech, parameter correction became a matter of trial and error, forcing inefficient work.

(4) Object of the Invention

The object of the present invention is to improve the quality of synthesized speech by automatically detecting and correcting parameters of the kind described above, which are particularly likely to produce abnormal sounds.

(5) Structure of the Invention

To perform the detection and correction of abnormal parameters automatically, the present invention focuses on the following fact: when a pole frequency with a narrow bandwidth in the spectral envelope lies close to the pitch frequency, the power level of the synthesized speech near that pole frequency increases abnormally and causes abnormal sounds.
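The effect described above can be illustrated with a minimal sketch, which is not part of the patent text. It assumes a second-order all-pole resonator (one conjugate pole pair), a hypothetical sampling rate of 8000 Hz, and the common textbook mapping from bandwidth to pole radius. The function name `resonator_gain` is an illustrative choice.

```python
import cmath
import math

def resonator_gain(freq_hz, pole_hz, bw_hz, fs=8000.0):
    """Magnitude of H at freq_hz for one conjugate pole pair (order-2 all-pole filter).

    fs = 8000 Hz is an assumed sampling rate, not a value from the patent.
    """
    T = 1.0 / fs
    r = math.exp(-math.pi * bw_hz * T)          # pole radius from bandwidth (textbook relation)
    theta = 2.0 * math.pi * pole_hz * T         # pole angle from pole frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r  # A(z) = 1 + a1 z^-1 + a2 z^-2
    z = cmath.exp(1j * 2.0 * math.pi * freq_hz * T)
    return abs(1.0 / (1.0 + a1 / z + a2 / (z * z)))

if __name__ == "__main__":
    # Compare the gain seen by a harmonic sitting on the pole with one 30 Hz away.
    for bw in (200.0, 50.0, 10.0):
        on_pole = resonator_gain(300.0, 300.0, bw)
        shifted = resonator_gain(330.0, 300.0, bw)
        print(f"B = {bw:5.0f} Hz   gain at pole = {on_pole:8.1f}   gain 30 Hz away = {shifted:8.1f}")
```

Under these assumptions, the gain at the pole frequency grows as the bandwidth shrinks, and moving the excitation 30 Hz away from a narrow pole reduces the gain considerably, which is consistent with the motivation given in the text.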

FIG. 1 is an explanatory diagram of the relationship between the pole frequency and the pitch frequency described above.

In the figure, F is the spectral envelope, approximately expressed by a plurality of pole frequencies $F_i$ and bandwidths $B_i$. In the present invention, when the pitch frequency $P_i$ lies close to a pole frequency $F_i$ (for example, 300 Hz) whose bandwidth $B_i$ is particularly narrow among the bandwidths of the poles, with a difference of, for example, a few Hz to about 30 Hz, the parameter is judged to be abnormal and the pitch frequency is shifted by an appropriate amount.

Based on this principle, the present invention is configured as follows. In a speech analysis-synthesis system that analyzes a speech wave, extracts information on the pitch frequency and information on the spectral envelope, and performs speech synthesis using these as a parameter time series, the invention comprises: means for approximating the information on the spectral envelope by a function of pole frequencies and bandwidths; means for extracting, from the pole frequencies of the function, those accompanied by a narrow bandwidth; means for determining the proximity between the pole frequencies of the function and the pitch frequency; and means for changing the pitch frequency when the extracting means and the determining means detect a pole frequency that has a narrow bandwidth and is close to the pitch frequency. By widening the interval between the narrow-bandwidth pole frequency and the pitch frequency, the invention suppresses the occurrence of abnormal sounds in the synthesized speech.

(6) Embodiments of the Invention

The present invention is described in detail below with reference to an embodiment.

FIG. 2 is a block diagram of a speech analyzer embodying the present invention. In the figure, 1 is a speech analysis processing unit and 2 is the speech parameter correction processing unit according to the present invention. Within the speech analysis processing unit 1, 3 is a sound source information analysis unit that extracts pitch, amplitude, and voiced/unvoiced information from the input speech, and 4 is a linear prediction analysis unit that performs linear prediction analysis and PARCOR conversion to parameterize the spectral envelope information. The analysis outputs of units 3 and 4 are used for speech synthesis in a speech synthesizer (not shown). Within the parameter correction processing unit 2, 5 is a conversion unit that converts the PARCOR coefficients output by the linear prediction analysis unit 4 back into linear prediction coefficients.
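The conversion performed by unit 5 is commonly done with the step-up recursion. The following is a minimal sketch of that recursion, not the patent's own implementation. It assumes the convention $A(z) = 1 + \sum_i \alpha_i z^{-i}$ and that each PARCOR coefficient has the same sign as the last coefficient of the corresponding intermediate order. Sign conventions differ between references, and the name `parcor_to_lpc` is illustrative.

```python
def parcor_to_lpc(k):
    """Step-up recursion: PARCOR (reflection) coefficients -> linear prediction coefficients.

    Assumed convention: A(z) = 1 + sum_i alpha_i z^-i, with alpha_m^(m) = k_m.
    Returns [alpha_1, ..., alpha_P] for input [k_1, ..., k_P].
    """
    alpha = []
    for m, km in enumerate(k):
        # Each existing coefficient is updated with its mirror-image partner
        # from the previous order, then the new reflection coefficient is appended.
        alpha = [alpha[i] + km * alpha[m - 1 - i] for i in range(m)] + [km]
    return alpha

if __name__ == "__main__":
    # Second-order example: alpha_1 = k_1 * (1 + k_2), alpha_2 = k_2
    print(parcor_to_lpc([0.5, -0.2]))
```

If the analysis stage uses the opposite sign convention for the PARCOR coefficients, each `km` would need to be negated before the recursion.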

If the linear prediction coefficients can be taken from the linear prediction analysis unit 4 as intermediate data, the conversion unit 5 is unnecessary. 6 is a root-finding unit: taking $A(z)$ as the denominator of the transfer function $H(z)$ of the all-pole model of the speech spectral envelope described above, it finds the roots on the z-plane that satisfy $A(z) = 0$, using the Newton-Raphson method or a similar technique.
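As a rough illustration of what the root-finding unit 6 computes, the sketch below finds all roots of $A(z) = 0$ in pure Python. Note that it uses the Durand-Kerner simultaneous iteration as a stand-in, not the Newton-Raphson method named in the text. The function name `lpc_roots`, the starting points, and the iteration count are assumptions made for illustration.

```python
import math

def lpc_roots(alpha, iterations=200):
    """Roots of z^P + alpha_1 z^(P-1) + ... + alpha_P = 0, i.e. of A(z) = 0.

    Uses the Durand-Kerner iteration (a substitute for Newton-Raphson).
    """
    coeffs = [1.0] + list(alpha)
    n = len(alpha)

    def poly(z):
        acc = 0j
        for c in coeffs:            # Horner evaluation of the monic polynomial
            acc = acc * z + c
        return acc

    roots = [(0.4 + 0.9j) ** i for i in range(n)]   # distinct complex starting points
    for _ in range(iterations):
        for i in range(n):
            denom = 1.0 + 0j
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= poly(roots[i]) / denom      # update each estimate in turn
    return roots

if __name__ == "__main__":
    # Conjugate pole pair at radius 0.9 and angle pi/4
    alpha = [-2.0 * 0.9 * math.cos(math.pi / 4.0), 0.81]
    for root in lpc_roots(alpha):
        print(root, abs(root))
```

A fixed iteration count keeps the sketch short; a practical implementation would test for convergence and handle repeated roots.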

7 is a conversion unit that obtains the pole frequencies and bandwidths from the solutions computed by the root-finding unit 6, where $T$ is the sampling period [the expressions for the solutions and the conversion are missing in the source text].
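Since the conversion expressions are not reproduced here, the sketch below uses the relations commonly found in the linear prediction literature, as an assumption rather than the patent's own formula. For a root $z_i = r_i e^{j\theta_i}$, it takes $F_i = \theta_i / (2\pi T)$ and $B_i = -\ln r_i / (\pi T)$. The function name `pole_to_freq_bw` and the 8 kHz sampling rate in the usage example are illustrative.

```python
import cmath
import math

def pole_to_freq_bw(z, T):
    """Map a z-plane root to (pole frequency, bandwidth) in Hz.

    Assumed textbook relations, not taken from the patent text:
        F = arg(z) / (2 * pi * T),   B = -ln|z| / (pi * T)
    """
    r, theta = cmath.polar(z)
    freq = theta / (2.0 * math.pi * T)
    bandwidth = -math.log(r) / (math.pi * T)
    return freq, bandwidth

if __name__ == "__main__":
    T = 1.0 / 8000.0   # hypothetical sampling period
    # Build a root that should map back to 300 Hz with a 50 Hz bandwidth
    z = cmath.rect(math.exp(-math.pi * 50.0 * T), 2.0 * math.pi * 300.0 * T)
    print(pole_to_freq_bw(z, T))
```

Only roots with a positive angle need to be converted, since each complex pole appears together with its conjugate.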

8 is an extraction unit that selects the pole frequencies $F_i$ having a narrow bandwidth $B_i$.

The narrowness of a bandwidth is judged, and the pole extracted, according to whether the value of $B_i$ is at or below a certain threshold, or whether [quantity missing in the source text] is at or above a certain threshold.

9 is a determination unit that judges the proximity between the pole frequency of each extracted narrow-bandwidth pole and the pitch information (frequency) $P_i$ in the output of the sound source information analysis unit 3.

The proximity judgment is based, for example, on the following inequality:

$$F_i > 100\,|F_i - P_i|$$

When a pole frequency $F_i$ satisfying this condition is found, it is judged to be an abnormal parameter and output. 10 is a pitch frequency changing unit, which corrects the pitch frequency so that it is separated from the narrow-bandwidth pole frequency by at least a certain amount, for example 30 Hz or more, and supplies the result to the analysis output as $P_i$.
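The chain formed by units 8, 9, and 10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The factor 100 and the 30 Hz separation come from the examples in the text. The 50 Hz bandwidth threshold, the function name `correct_pitch`, and the choice to push the pitch away on whichever side of the pole it already lies are hypothetical.

```python
def correct_pitch(pitch_hz, poles, bw_threshold_hz=50.0, ratio=100.0, min_gap_hz=30.0):
    """Return a corrected pitch frequency.

    poles: list of (F_i, B_i) pairs in Hz.
    bw_threshold_hz is an assumed value; the text gives no number for it.
    """
    for pole_hz, bw_hz in poles:
        if bw_hz > bw_threshold_hz:
            continue                          # unit 8: keep only narrow-bandwidth poles
        gap = abs(pole_hz - pitch_hz)
        if pole_hz > ratio * gap:             # unit 9: F_i > 100 * |F_i - P_i|
            # unit 10: move the pitch at least min_gap_hz away from the pole,
            # staying on the side it was already on (assumed policy)
            direction = 1.0 if pitch_hz >= pole_hz else -1.0
            pitch_hz = pole_hz + direction * min_gap_hz
    return pitch_hz

if __name__ == "__main__":
    poles = [(300.0, 20.0), (1200.0, 80.0)]
    print(correct_pitch(299.0, poles))   # pitch 1 Hz below a narrow pole at 300 Hz
    print(correct_pitch(250.0, poles))   # pitch far from every narrow pole
```

In this sketch a pitch of 299 Hz next to a 20 Hz wide pole at 300 Hz is moved to 270 Hz, while a pitch of 250 Hz is left unchanged. A real system would also need to decide how to smooth the corrected pitch across neighboring frames, which the text does not discuss.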

Besides correcting the pitch frequency, other corrections are also possible, such as widening the bandwidth of the pole frequency or adjusting the time window.

(7) Effects of the Invention

Because the present invention can automatically correct, in advance, parameters that are likely to produce abnormal sounds, it greatly simplifies the work of improving the quality of synthesized speech.

Brief Description of the Drawings

FIG. 1 is an explanatory diagram of the principle of the present invention, and FIG. 2 is a block diagram of an embodiment of the present invention. In the figures, 1 is the speech analysis processing unit, 2 is the speech parameter correction processing unit, 3 is the sound source information analysis unit, 4 is the linear prediction analysis unit, 5 is the conversion unit from PARCOR coefficients to linear prediction coefficients, 6 is the root-finding unit, 7 is the conversion unit from roots to pole frequencies and bandwidths, 8 is the narrow-bandwidth extraction unit, 9 is the unit that determines the proximity between pole frequency and pitch frequency, and 10 is the pitch frequency changing unit.

Claims

1. In a speech analysis-synthesis system that analyzes a speech wave, extracts information on the pitch frequency and information on the spectral envelope, and performs speech synthesis using these as a parameter time series, a speech parameter correction method comprising: means for approximating the information on the spectral envelope by a function of pole frequencies and bandwidths; means for extracting, from the pole frequencies of the function, those accompanied by a narrow bandwidth; means for determining the proximity between the pole frequencies of the function and the pitch frequency; and means for changing the pitch frequency when the extracting means and the determining means detect a pole frequency that has a narrow bandwidth and is close to the pitch frequency; wherein the occurrence of abnormal sounds in the synthesized speech is suppressed by widening the interval between the narrow-bandwidth pole frequency and the pitch frequency.