JPS5912198B2 - Audio parameter abnormality detection method - Google Patents
Audio parameter abnormality detection method
- Publication number
- JPS5912198B2 (application number JP56214567A / JP21456781A)
- Authority
- JP
- Japan
- Prior art keywords
- frequency
- speech
- polar
- abnormality detection
- pitch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Landscapes
- Alarm Systems (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Description
DETAILED DESCRIPTION OF THE INVENTION
(1) Technical Field of the Invention
The present invention relates to a method, for use in speech analysis-synthesis systems, of automatically detecting abnormal speech parameters that cause unpleasant sounds such as noise artifacts in the synthesized speech. In particular, it relates to a method that detects abnormal parameters after approximately converting them into a function of pole frequency and bandwidth.
(2) Technical Background
In general, linear prediction methods such as PARCOR and LSP approximate the spectral envelope of speech with an all-pole model

$$H(z) = \frac{1}{1 + \sum_{i=1}^{P} \alpha_i z^{-i}}$$

where P is the prediction order and α_i are the linear prediction coefficients. Some speech does not fit this model well. For nasal consonants such as N and M, the moraic nasal, or sounds with a low first formant frequency such as the vowel I, the bandwidth of the first formant may be analyzed as abnormally narrow, particularly when the pitch frequency nearly coincides with the first formant frequency.
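To see why an abnormally narrow bandwidth matters, the following minimal sketch evaluates the gain of a single two-pole section of the all-pole model when it is driven exactly at its pole frequency. The 8 kHz sampling rate, the function name `resonance_gain`, and the radius-to-bandwidth relation r = exp(-πBT) are assumptions chosen for illustration; none of them comes from the patent text.

```python
import cmath
import math


def resonance_gain(pole_hz, bandwidth_hz, probe_hz, fs=8000.0):
    """Gain |H| of a two-pole all-pole resonator evaluated at probe_hz.

    Assumes the usual mapping between a z-plane pole and its
    frequency/bandwidth: angle = 2*pi*F*T, radius = exp(-pi*B*T).
    """
    T = 1.0 / fs
    r = math.exp(-math.pi * bandwidth_hz * T)   # pole radius from bandwidth
    theta = 2.0 * math.pi * pole_hz * T         # pole angle from frequency
    a1 = -2.0 * r * math.cos(theta)             # denominator 1 + a1 z^-1 + a2 z^-2
    a2 = r * r
    z_inv = cmath.exp(-1j * 2.0 * math.pi * probe_hz * T)
    return 1.0 / abs(1.0 + a1 * z_inv + a2 * z_inv ** 2)


if __name__ == "__main__":
    # Excite a 300 Hz pole exactly at 300 Hz for progressively narrower bandwidths.
    for bw in (100.0, 30.0, 5.0):
        print(f"bandwidth {bw:5.1f} Hz -> gain {resonance_gain(300.0, bw, 300.0):.1f}")
```

The gain grows roughly in inverse proportion to the bandwidth, so a pitch harmonic that lands on a very narrow pole is amplified far more than its neighbours.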
In such cases, the synthesized speech often has a very large amplitude or contains unpleasant noise.
(3) Prior Art and Problems
Conventionally, improving the quality of a particular synthesized utterance in speech analysis-synthesis required repeated synthesis and listening: an operator had to locate by hand the point in time at which a parameter abnormality occurred, apply a suitable correction to the parameters, and then try again.
Moreover, it was not always possible to determine accurately which part of the speech parameters caused an abnormality in the synthesized speech. Suppressing, for example, a short but strongly irritating noise therefore required inefficient, trial-and-error correction.
(4) Object of the Invention
The object of the present invention is to provide a means of automatically detecting parameter abnormalities that are particularly likely to produce abnormal speech.
(5) Structure of the Invention
To detect abnormal parameters automatically, the present invention exploits the observation that the proximity between the pitch frequency and a pole frequency having a narrow bandwidth in the spectral envelope is closely related to the occurrence of abnormal sounds in synthesized speech.
FIG. 1 is an explanatory diagram of the relationship between the pole frequency and the pitch frequency described above.
In the figure, F is the spectral envelope approximately represented by a plurality of pole frequencies Fi and bandwidths Bi. Among the poles, the present invention looks for a pole frequency Fi (for example, 300 Hz) whose bandwidth Bi is particularly narrow. If the pitch frequency Pi lies close to that pole frequency, for example within a few Hz to about 30 Hz, the parameter is judged to be abnormal.
The present invention applies to a speech analysis-synthesis system that analyzes speech, extracts information on the pitch frequency and information on the spectral envelope, and uses them as a parameter time series for speech synthesis. It is characterized by comprising means for approximately converting the analyzed spectral envelope information into a function of pole frequencies and their bandwidths, means for extracting from the converted function a pole frequency having a narrow bandwidth, and means for judging the proximity between the extracted pole frequency and the analyzed pitch frequency, and by judging a parameter abnormality when the proximity between the pole frequency and the pitch frequency is high.
(6) Embodiments of the Invention
The present invention is described in detail below with reference to an embodiment.
FIG. 2 is a block diagram of a speech analyzer embodying the present invention. In the figure, 1 is a speech analysis processing section and 2 is the speech parameter abnormality detection section according to the present invention. In the speech analysis processing section 1, 3 is a sound source information analysis section that extracts pitch, amplitude, and voiced/unvoiced information from the input speech, and 4 is a linear prediction analysis section that performs linear prediction analysis and PARCOR conversion to parameterize the spectral envelope information. The analysis outputs of sections 3 and 4 are used for speech synthesis in a speech synthesizer (not shown). In the parameter abnormality detection section 2, 5 is a conversion section that converts the PARCOR coefficients output by the linear prediction analysis section 4 back into linear prediction coefficients.
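The conversion performed by section 5 can be sketched with the standard step-up recursion. This is a minimal sketch, not the patent's own procedure: the function name `parcor_to_lpc` is hypothetical, and the sign convention (reflection coefficient k_m equals the last coefficient of the order-m predictor, with A(z) = 1 + Σ a_i z^-i) is an assumption, since texts differ on the sign of PARCOR coefficients.

```python
def parcor_to_lpc(k):
    """Convert PARCOR (reflection) coefficients k_1..k_P to LPC a_1..a_P.

    Step-up recursion for A(z) = 1 + sum_i a_i z^-i:
        a_i(m) = a_i(m-1) + k_m * a_(m-i)(m-1)   for i = 1..m-1
        a_m(m) = k_m
    The sign convention is an assumption; flip the sign of k if the
    analyzer defines PARCOR coefficients the other way round.
    """
    a = []
    for m, km in enumerate(k, start=1):
        prev = a
        # prev holds a_1..a_(m-1) at 0-based indices 0..m-2
        a = [prev[i] + km * prev[m - 2 - i] for i in range(m - 1)] + [km]
    return a


if __name__ == "__main__":
    print(parcor_to_lpc([0.5, 0.2]))
```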
If the linear prediction coefficients can be taken from the linear prediction analysis section 4 as intermediate data, the conversion section 5 is unnecessary. 6 is a root-finding section. Taking A(z) as the denominator of the transfer function H(z) of the all-pole model of the speech spectral envelope described above, it uses the Newton-Raphson method or a similar technique to find the roots in the z-plane that satisfy A(z) = 0.
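The root-finding step can be sketched as Newton-Raphson iteration combined with polynomial deflation. The text names Newton-Raphson but gives no details, so the starting point, the tolerance, the deflation scheme, and the names `find_roots_newton` and `lpc_roots` are all assumptions. Newton iteration from a fixed start is not guaranteed to converge for every polynomial, and a production system would more likely use an established library root finder.

```python
def find_roots_newton(coeffs, start=complex(0.4, 0.9), tol=1e-12, max_iter=200):
    """Roots of a polynomial given by coefficients in descending powers.

    Each root is located by Newton-Raphson from a complex starting point,
    then divided out (deflation) before the next root is sought.
    """
    work = [complex(c) for c in coeffs]
    roots = []
    while len(work) > 2:
        z = start
        for _ in range(max_iter):
            # Horner's scheme: p = polynomial value, dp = derivative value at z
            p, dp = work[0], 0j
            for c in work[1:]:
                dp = dp * z + p
                p = p * z + c
            if dp == 0:
                z += 0.1          # nudge away from a stationary point
                continue
            step = p / dp
            z -= step
            if abs(step) < tol:
                break
        roots.append(z)
        # Synthetic division by (x - z) lowers the degree by one.
        deflated = [work[0]]
        for c in work[1:-1]:
            deflated.append(c + deflated[-1] * z)
        work = deflated
    if len(work) == 2:
        roots.append(-work[1] / work[0])
    return roots


def lpc_roots(lpc):
    """Roots of A(z) = 1 + sum_i a_i z^-i, i.e. of z^P + a_1 z^(P-1) + ... + a_P."""
    return find_roots_newton([1.0] + list(lpc))


if __name__ == "__main__":
    print(lpc_roots([-1.2, 0.81]))
```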
7 is a conversion section that computes the pole frequency and bandwidth from each root obtained by the root-finding section 6 [expression not reproduced], where T is the sampling period [expressions for the pole frequency and bandwidth not reproduced].
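The patent's own expressions for this conversion are missing from the text. A minimal sketch, assuming the widely used textbook relations F = arg(z) / (2πT) and B = -ln|z| / (πT) for a pole z in the z-plane, is shown below; the function name and the 8 kHz sampling rate in the example are hypothetical.

```python
import cmath
import math


def pole_to_frequency_bandwidth(z, T):
    """Map a z-plane pole to (pole frequency in Hz, bandwidth in Hz).

    Assumed relations (not taken from the patent text):
        F = arg(z) / (2 * pi * T)
        B = -ln|z| / (pi * T)
    T is the sampling period in seconds.
    """
    freq = cmath.phase(z) / (2.0 * math.pi * T)
    bandwidth = -math.log(abs(z)) / (math.pi * T)
    return freq, bandwidth


if __name__ == "__main__":
    T = 1.0 / 8000.0                      # assumed 8 kHz sampling
    z = cmath.rect(0.99, 2.0 * math.pi * 300.0 * T)
    print(pole_to_frequency_bandwidth(z, T))
```

Under these relations a pole closer to the unit circle has a narrower bandwidth, which is the quantity the next section thresholds.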
8 is a section that extracts pole frequencies Fi having a narrow bandwidth Bi. A pole is judged narrow, and extracted, according to whether the value of Bi is at or below a certain threshold, or whether a second quantity [expression not reproduced] is at or above a certain threshold.
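The extraction step amounts to a threshold filter over the (Fi, Bi) pairs. The sketch below covers only the first criterion (Bi at or below a threshold); the 50 Hz default and the function name are assumptions, as the text gives no numerical threshold.

```python
def extract_narrow_poles(poles, max_bandwidth_hz=50.0):
    """Keep the (frequency_hz, bandwidth_hz) pairs whose bandwidth is narrow.

    max_bandwidth_hz is a hypothetical threshold; the source only says
    "a certain threshold".
    """
    return [(f, b) for f, b in poles if b <= max_bandwidth_hz]


if __name__ == "__main__":
    poles = [(300.0, 12.0), (1200.0, 90.0), (2500.0, 140.0)]
    print(extract_narrow_poles(poles))
```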
9 is a judgment section that judges the proximity between the pole frequency of each extracted narrow-bandwidth pole and the pitch information (frequency) Pi in the output of the sound source information analysis section 3. The proximity is judged, for example, on the basis of the following expression. [expression not reproduced]
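The judgment expression itself is not reproduced in this text. A minimal sketch, assuming the criterion is an absolute frequency difference compared against a threshold, follows. The 30 Hz default reflects the "a few Hz to about 30 Hz" range mentioned earlier; the function names are hypothetical.

```python
def is_close_to_pitch(pole_hz, pitch_hz, max_gap_hz=30.0):
    """True when the pole frequency lies within max_gap_hz of the pitch frequency.

    The form |Fi - Pi| <= threshold is an assumption about the missing
    judgment expression.
    """
    return abs(pole_hz - pitch_hz) <= max_gap_hz


def detect_abnormal_poles(narrow_poles, pitch_hz, max_gap_hz=30.0):
    """Return the narrow-bandwidth poles judged abnormal for this frame."""
    return [(f, b) for f, b in narrow_poles
            if is_close_to_pitch(f, pitch_hz, max_gap_hz)]


if __name__ == "__main__":
    narrow = [(300.0, 12.0), (2500.0, 40.0)]
    print(detect_abnormal_poles(narrow, pitch_hz=290.0))
```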
When a pole frequency Fi satisfying this judgment expression is found, it is judged to be an abnormal parameter and output. Once an abnormal-parameter output is available, it can be used to correct the speech analysis output in advance, for example by widening the narrow bandwidth, shifting the pitch frequency, or changing some other suitable factor that affects sound quality.
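The text does not say how the bandwidth would be widened. One common technique, substituted here purely as an illustration, is bandwidth expansion: scaling each coefficient a_i by γ^i moves every root of A(z) toward the origin by the factor γ. The function name and the mapping from extra bandwidth to γ (which assumes B = -ln|z| / (πT)) are assumptions, and the method widens all poles, not only the flagged one.

```python
import math


def widen_bandwidth(lpc, extra_bandwidth_hz, T):
    """Bandwidth expansion of A(z) = 1 + sum_i a_i z^-i.

    Replacing a_i with a_i * gamma**i scales every pole radius by gamma.
    With gamma = exp(-pi * extra_bandwidth_hz * T), each pole's bandwidth
    grows by extra_bandwidth_hz under the assumed relation B = -ln|z|/(pi*T).
    """
    gamma = math.exp(-math.pi * extra_bandwidth_hz * T)
    return [a * gamma ** (i + 1) for i, a in enumerate(lpc)]


if __name__ == "__main__":
    T = 1.0 / 8000.0                                   # assumed 8 kHz sampling
    r = math.exp(-math.pi * 5.0 * T)                   # pole with 5 Hz bandwidth
    theta = 2.0 * math.pi * 300.0 * T
    lpc = [-2.0 * r * math.cos(theta), r * r]
    print(widen_bandwidth(lpc, 45.0, T))
```

A targeted alternative would be to move only the offending root and rebuild the polynomial, at the cost of a second root-to-coefficient conversion.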
(7) Effects of the Invention
Because the present invention can automatically detect parameters that are likely to produce abnormal speech, it greatly simplifies the work of improving the quality of synthesized speech.
FIG. 1 is an explanatory diagram of the principle of the present invention, and FIG. 2 is a block diagram of an embodiment of the present invention.
In the figures, 1 is the speech analysis processing section, 2 is the speech parameter abnormality detection section, 3 is the sound source information analysis section, 4 is the linear prediction analysis section, 5 is the section converting PARCOR coefficients to linear prediction coefficients, 6 is the root-finding section, 7 is the section converting roots to pole frequencies and bandwidths, 8 is the narrow-bandwidth extraction section, and 9 is the section judging the proximity between pole frequency and pitch frequency.
Claims (1)
1. A speech parameter abnormality detection method for a speech analysis-synthesis system that analyzes speech, extracts information on the pitch frequency and information on the spectral envelope, and uses them as a parameter time series for speech synthesis, the method comprising: means for approximately converting the analyzed spectral envelope information into a function of pole frequencies and their bandwidths; means for extracting, from the converted function, a pole frequency having a narrow bandwidth; and means for judging the proximity between the extracted pole frequency and the analyzed pitch frequency, wherein a parameter abnormality is judged when the proximity between the pole frequency and the pitch frequency is high.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP56214567A JPS5912198B2 (en) | 1981-12-25 | 1981-12-25 | Audio parameter abnormality detection method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP56214567A JPS5912198B2 (en) | 1981-12-25 | 1981-12-25 | Audio parameter abnormality detection method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS58111996A (en) | 1983-07-04 |
| JPS5912198B2 (en) | 1984-03-21 |
Family
ID=16657852
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP56214567A Expired JPS5912198B2 (en) | 1981-12-25 | 1981-12-25 | Audio parameter abnormality detection method |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPS5912198B2 (en) |
- 1981-12-25: JP application JP56214567A, patent JPS5912198B2 (en), status: not active (expired)
Also Published As
| Publication number | Publication date |
|---|---|
| JPS58111996A (en) | 1983-07-04 |