JPH0145640B2

Info

Publication number: JPH0145640B2
Application number: JP56139575A
Authority: JP
Inventors: Seiichi Kotani
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 1981-09-04
Filing date: 1981-09-04
Publication date: 1989-10-04
Also published as: JPS5842096A

Description

[Detailed description of the invention]

The present invention relates to a noise suppression method for speech signals, such as the output signal of a speech synthesis system.

As shown in FIG. 1, the waveform of speech, and in particular of vowels such as "a", "i", and "u", has its energy concentrated in certain frequency regions, in this example near 1 kHz, 2 kHz, and 3.6 kHz. These concentrations are caused by resonances of the vocal tract and are called formants. The dips d₁ and d₂ between the first, second, and third formants F₁, F₂, and F₃ are caused by antiresonances of the vocal tract and are called antiformants. In a signal that has formants and antiformants, the signal-to-noise ratio S/N is sufficiently large in the formant regions but small in the antiformant regions, so the overall S/N is degraded. (This is easily understood by taking the noise level as the chain line in the figure: at d₁ the signal is barely above the noise, and at d₂ it falls below the noise level.)

As a countermeasure, a method has been proposed in which a notch filter that band-eliminates the antiformant regions (the hatched areas indicate the removed portions) is inserted at the output of the decoder of the speech synthesis system (see, for example, T. S. Lamba et al.: Intelligible Voice Communication……, Radio and Electronic Engineer, Vol. 48, No. 4). In this case the signal component in the eliminated band is lost along with the noise component, but because the formant components of the original signal are preserved, speech quality degrades only slightly while the noise characteristics improve (the speech sounds as if it were free of noise).

However, the formant frequencies vary widely with the type of speech, for example with the type of vowel, as shown in the following table. A notch filter that is effective for one kind of speech therefore actually degrades the quality of other kinds.
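The fixed-notch countermeasure described above can be illustrated with a minimal sketch. It is not taken from the cited paper or from this patent: it assumes a second-order (biquad) notch in the widely used "audio EQ cookbook" form, a 16 kHz sample rate, and a dip d₂ centered at 2.8 kHz with a 400 Hz bandwidth (the text only places the neighboring formants near 2 kHz and 3.6 kHz, so the dip position is a guess). All function names are hypothetical.

```python
import math

def notch_coefficients(center_hz, bandwidth_hz, sample_rate_hz):
    """Biquad notch coefficients, normalized so that a0 == 1.

    Q is approximated as center / bandwidth (an assumption of this sketch).
    """
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    q = center_hz / bandwidth_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def apply_biquad(samples, b, a):
    """Direct form I: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine wave used as a stand-in test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

if __name__ == "__main__":
    fs = 16_000
    b, a = notch_coefficients(2800.0, 400.0, fs)
    # A component inside the dip is suppressed; one at the second formant passes.
    print("in dip:    ", rms(apply_biquad(tone(2800.0, fs, 4000), b, a)[2000:]))
    print("at formant:", rms(apply_biquad(tone(2000.0, fs, 4000), b, a)[2000:]))
```

Because the center frequency and bandwidth are fixed when the coefficients are computed, the same filter applied to a sound whose formant falls near 2.8 kHz would remove wanted signal rather than noise, which is the drawback the following paragraphs address.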

[Table]

[Table]

This drawback was confirmed by listening tests. Using a system modeled on the one described in the paper cited above, various speech sounds were written to memory, read out, and decoded while the center frequency and bandwidth of the notch filter were varied. To remove this problem, the present invention provides a notch filter whose center frequency and bandwidth can be changed automatically, and changes them to suit each speech sound, so as to improve the noise characteristics while preserving the fidelity of the speech. The invention is described in detail below with reference to an embodiment.

FIG. 2 shows an example in which the present invention is applied to a waveform-coding speech synthesis system. Reference numeral 10 denotes a storage section composed of a ROM (read-only memory). It has a part A that stores waveform-coded speech segments and a part B that stores the characteristic control information of the notch filter to be used for each speech segment. In the drawing, parts A and B are shown as separate, one in the first half of the memory and the other in the second half, but this is only for explanation; the actual arrangement in the memory system can be changed as appropriate. Reference numeral 12 denotes a control section composed of a microprocessor, which outputs the read address signals for the storage section 10, among other signals. Reference numeral 14 denotes a decoder section, which decodes the digital output read from the storage section 10 and outputs an analog speech signal. Reference numeral 16 denotes a notch filter section; the center frequency and width of the band it eliminates are changed according to the control information.

For example, suppose that a vowel with the frequency spectrum shown in FIG. 1 is to be read out from the speech segments stored in part A of the storage section. The control section 12 sequentially outputs the addresses in part A that hold the digital data obtained by sampling the speech signal of that vowel and encoding it by delta modulation or the like. The data read out are decoded by the decoder section 14 into an analog speech signal, which is output through the notch filter section 16 and drives a loudspeaker via an amplifier (not shown). In addition, the first address of the group output by the control section reads out, from part B, the characteristic control information of the notch filter section 16 for that vowel, that is, a control signal that sets the center frequency and bandwidth of the filter (in this example, to the center and width of the hatched area in FIG. 1). The characteristics of the notch filter section 16 are changed accordingly.

Because the information written to the memory is digital, DA conversion is performed if the filter requires an analog control signal. A digital filter is preferable as the notch filter. A plurality of notch filters may also be provided so as to remove the first, second, …… antiformant regions of a speech segment; in that case, a corresponding plurality of sets of characteristic control information is prepared.

Data published in the literature indicate that ordinary speech consists of 50% silent periods, 49% vowel periods, and 1% consonant periods. When speech is divided into syllables, phonemes, or vowels and consonants, it is therefore effective to control the notch filter characteristics per phoneme or per vowel. It is also effective to give the filter section a function that expands the formant regions.

It is also conceivable to pass the speech signal through a notch filter and remove the antiformant regions at the time the signal is delta modulated (or similarly encoded) and written to part A as speech segments. In that case the signal need not pass through the notch filter section 16 during playback. The method of the present invention, however, is effective for speech synthesis output and similar uses that rely on what may be called a general-purpose speech memory, in which the speech segments have been written without antiformant removal.

As explained above, the present invention makes it possible to output meaningful speech with good sound quality and excellent noise characteristics from a group of speech segments such as "a", "i", ……, "ka", "ki", …… stored in a memory, and is therefore highly effective.
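The read-out path of FIG. 2 can be sketched in software as follows. This is only an illustration under stated assumptions, not the patented circuit: part A and part B are modeled as hypothetical dictionaries `PART_A` and `PART_B`, the coding is plain linear delta modulation at an assumed 64 kHz rate with an assumed step size, the notch is a cookbook-style biquad, and the "vowels" are toy two-tone signals whose frequencies are invented for the demonstration (they are not values from the patent's table). The weak tone merely stands in for noise lying in an antiformant dip.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE_HZ = 64_000  # assumed; delta modulation needs heavy oversampling
STEP = 0.2               # assumed delta-modulation step size

@dataclass(frozen=True)
class NotchControl:
    """One characteristic-control record of part B (hypothetical layout)."""
    center_hz: float
    bandwidth_hz: float

def delta_encode(samples, step=STEP):
    """Linear delta modulation: one bit per sample, tracking the input."""
    bits, estimate = [], 0.0
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_decode(bits, step=STEP):
    """Role of decoder section 14: integrate the bit stream into a waveform."""
    out, estimate = [], 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

def notch(samples, control, sample_rate_hz=SAMPLE_RATE_HZ):
    """Role of notch filter section 16, retuned from one NotchControl record."""
    w0 = 2.0 * math.pi * control.center_hz / sample_rate_hz
    alpha = math.sin(w0) * control.bandwidth_hz / (2.0 * control.center_hz)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0
    a1, a2 = b1, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def make_segment(formant_hz, dip_hz, n=6400):
    """Toy segment: a strong tone at a 'formant' plus a weak tone in a 'dip'."""
    return [math.sin(2.0 * math.pi * formant_hz * i / SAMPLE_RATE_HZ)
            + 0.3 * math.sin(2.0 * math.pi * dip_hz * i / SAMPLE_RATE_HZ)
            for i in range(n)]

# Part A: waveform-coded segments. Part B: notch settings for each segment.
# Toy frequencies only, not taken from the patent's table.
PART_A = {"a": delta_encode(make_segment(1000.0, 1500.0)),
          "i": delta_encode(make_segment(500.0, 1200.0))}
PART_B = {"a": [NotchControl(1500.0, 200.0)],
          "i": [NotchControl(1200.0, 200.0)]}

def read_out(segment_id):
    """Role of control section 12: fetch data and control info, decode, filter."""
    signal = delta_decode(PART_A[segment_id])
    for control in PART_B[segment_id]:  # several records give several notches
        signal = notch(signal, control)
    return signal

def tone_level(samples, freq_hz):
    """Amplitude of one frequency component, estimated by correlation."""
    w = 2.0 * math.pi * freq_hz / SAMPLE_RATE_HZ
    re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)
```

Calling `read_out("a")` and `read_out("i")` applies a different notch to each segment from the same code path, which is the point of storing the control information alongside the segment. Storing a list of records per segment is one possible way to model the variant with several notch filters.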

[Brief explanation of drawings]

FIG. 1 is a graph showing the frequency spectral density characteristics of a speech segment, and FIG. 2 is a block diagram for explaining the present invention. In the drawings, 10 is a memory, 14 is a decoder section, and 16 is a notch filter.

Claims

[Claims]

1. A noise suppression method for a speech signal, in a system that selectively reads out a large number of speech segments stored in a memory, decodes them, and outputs a speech signal, characterized in that a notch filter for removing the antiformant part of each speech segment is provided, its characteristic control information is stored in the memory in correspondence with each speech segment, when each speech segment is read out from the memory the characteristic control information for that speech segment is also read out and applied to the notch filter, and the decoded output of the speech segment read out is passed through the notch filter.