JPH0754436B2

JPH0754436B2 - CSM type speech synthesizer

Info

Publication number: JPH0754436B2
Application number: JP62019114A
Authority: JP
Inventors: 哲田口
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1987-01-28
Filing date: 1987-01-28
Publication date: 1995-06-07
Anticipated expiration: 2010-06-07
Also published as: JPS63186300A

Description

[Industrial field of application]

The present invention relates to a CSM type speech synthesizer, and more particularly to a CSM type speech synthesizer with improved synthesized sound quality.

[Conventional technology]

CSM type speech synthesizers, which represent a speech signal as a combination of at most about four to six sinusoidal frequencies, have recently become well known.

LPC (Linear Prediction Coding) type speech synthesizers have conventionally been widely used as speech synthesizers. However, an LPC type speech synthesizer generally has a complicated configuration. In addition, the stability of the LPC synthesis filter used for speech synthesis is frequently impaired by errors and the like that occur while the speech parameters are transmitted.

In contrast, as described later, a CSM type speech synthesizer does not need a speech synthesis filter such as an LPC synthesis filter. Its configuration is therefore greatly simplified, and it is inherently free of stability problems during synthesis.

[Problems to be solved by the invention]

However, in a conventional CSM type speech synthesizer of this kind, simply combining linearly the CSM components, which are represented by at most about four to six sinusoidal frequencies, is entirely insufficient for synthesizing speech. Some special processing is required in addition.

This special processing converts a simple set of CSM components, that is, a line spectrum of at most about four to six lines, into a spectrum in the form of a convolution of two parts: a spectral envelope corresponding to the vocal tract transfer characteristic, and a spectral fine structure corresponding to the characteristic of the sound source, which is either the vocal cord vibration or, for unvoiced speech, the source created by a constriction of the vocal tract or the like. As detailed later, its purpose is the necessary spectrum spreading of the few CSM components, and it is carried out through special window processing that uses pitch information. FIG. 11 is an explanatory diagram of window processing in a conventional CSM. It shows window processing for generating voiced sound: a time window given by a preset window function is applied while all CSM sinusoidal signals are output with their phases reset in response to the pitch period. However, when the sound source is imparted on the basis of such window processing, the source, as the spectral fine structure, ends up with too perfect a pitch structure.
Real speech does have a pitch structure, but it also deviates from that structure considerably. The conventional voiced sound generation method therefore produces overly artificial synthesized sound that is perceptually unnatural, that is, mechanical.

The problem described above concerns voiced sound. For unvoiced sound, there is the following problem.

Namely, how unvoiced sound should be processed is at present not necessarily clearly known in general. In particular, the spectrum spreading method needed to synthesize unvoiced sound with CSM has not been established, so CSM type speech synthesizers can hardly be said to have reached practical use yet.

An object of the present invention is to eliminate the drawbacks described above and to provide a CSM type speech synthesizer that is fit for practical use.

[Means for solving problems]

The CSM type speech synthesizer of the present invention comprises voiced sound generating means that includes spectrum spreading means not constrained by the pitch structure.

Further, in a CSM type speech synthesizer that has random number generating means and unvoiced sound generating means including spectrum spreading means that uses a continuous waveform whose period is set on the basis of the random number signal generated by the random number generating means, the spectrum spreading means of the present invention is configured as frequency modulation.

Further, in a CSM type speech synthesizer that has random number generating means and unvoiced sound generating means including spectrum spreading means that uses a synchronizing signal set on the basis of the random number signal generated by the random number generating means, the spectrum spreading means of the present invention includes window function generating means whose period differs from the period specified by the synchronizing signal.

[Example]

First, the principle of the CSM type speech synthesizer is explained.

CSM represents a speech signal as the sum of a specific number of sinusoids whose amplitudes and frequencies are freely selectable parameters. The number of sinusoids is preset, at most about four to six. To perform CSM speech synthesis, the speech signal to be analyzed must therefore first be expressed, by CSM speech analysis, as the sum of the preset number of sinusoids.

CSM speech analysis is described in detail later; only the main points are given here.

Like LPC analysis, CSM speech analysis uses autocorrelation coefficients as intermediate parameters, for purposes such as disregarding phase information, averaging out the influence of the sound source, and avoiding instability due to noise components.

That is, for each analysis frame, CSM analysis determines the frequency and the intensity (power amplitude) of each sinusoid to be combined so that the N lowest-order taps of the autocorrelation coefficients of the synthesized wave coincide with the N lowest-order taps of the sample autocorrelation coefficients computed directly from the speech waveform to be represented.

Let n be the number of sinusoids to be combined, ω_i (i = 1, 2, …, n) the angular frequency of each sinusoid, and m_i the intensity of each sinusoid. The synthesized wave y_t produced by CSM is then expressed by equation (1).

The autocorrelation coefficient γ_l of equation (1) at tap l can easily be expressed in terms of ω_i and m_i, as in equation (2).

On the other hand, let X_t denote the samples of the speech waveform to be represented. The sample autocorrelation coefficient V_l at tap l in a given frame can then be expressed by equation (3).

In equation (3), M is the number of samples per analysis frame.
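Equation (3) is not reproduced in this text. As a rough illustration of a sample autocorrelation over one analysis frame of M samples, the following sketch assumes the common biased estimator V_l = (1/M) Σ X_t X_{t+l}; the normalization and the function name `sample_autocorrelation` are assumptions made here, not taken from the patent.

```python
import numpy as np

def sample_autocorrelation(x, max_lag):
    """Return V_0 .. V_max_lag for one analysis frame x.

    Assumed form: V_l = (1/M) * sum over t of x[t] * x[t + l],
    where M is the number of samples in the frame.
    """
    x = np.asarray(x, dtype=float)
    M = len(x)
    return np.array([np.dot(x[:M - l], x[l:]) / M for l in range(max_lag + 1)])

# Tiny worked example on a four-sample frame.
V = sample_autocorrelation([1.0, 2.0, 3.0, 4.0], max_lag=1)
```

Under this assumed definition, V[0] is the mean power of the frame, which matches the later use of V₀ as the power information of the frame.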

In CSM analysis, m_i and ω_i are determined so that γ_l equals the given V_l for the N lowest-order taps, that is, so that γ_l = V_l (l = 0, 1, 2, …, N) holds. Here, the m_i and ω_i of the n sinusoids are assumed to be obtained successively, one analysis frame after another, for the given speech signal. FIG. 12 is a speech feature vector pattern diagram showing an example of a speech feature vector pattern based on the CSM parameters m_i and ω_i.
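Equation (2) is likewise not reproduced here. A sum of sinusoids with powers m_i and angular frequencies ω_i (in radians per sample) has the long-term autocorrelation Σ m_i cos(ω_i l), and the sketch below assumes that this is the form of γ_l. The function name `csm_autocorrelation` is hypothetical, and the sketch only evaluates the model side of the matching condition γ_l = V_l; it does not solve for m_i and ω_i.

```python
import numpy as np

def csm_autocorrelation(m, omega, max_lag):
    """Model autocorrelation gamma_0 .. gamma_max_lag of n sinusoids.

    Assumed form: gamma_l = sum_i m_i * cos(omega_i * l), with m_i treated
    as the power of sinusoid i and omega_i given in radians per sample.
    """
    m = np.asarray(m, dtype=float)
    omega = np.asarray(omega, dtype=float)
    lags = np.arange(max_lag + 1)
    return (m * np.cos(np.outer(lags, omega))).sum(axis=1)

# n = 2 sinusoids, so N = 2n - 1 = 3 and taps l = 0..3 are compared.
gamma = csm_autocorrelation([1.0, 0.5], [0.3, 1.2], max_lag=3)
```

With n sinusoids there are 2n unknowns (m_i and ω_i), which is consistent with matching the taps l = 0 to N when N = 2n − 1.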

FIG. 13 is a CSM/LPC spectrum comparison diagram that shows a CSM line spectrum against an LPC spectral envelope. In FIG. 13, the CSM line spectrum of a CSM (n = 5) with an analysis frame length of 30 ms and an analysis order of 9 (N = 9) is compared with the 9th-order LPC spectral envelope obtained from the same speech samples.

It is also well known that the order N and the number of sinusoids n are related by N = 2n − 1.

However, suppose the n values of m_i and ω_i obtained from such CSM analysis are used to generate n sinusoids with the intensities (amplitudes) and angular frequencies they specify, and these sinusoids are simply added together. To the human ear the result sounds merely like a mixture of sinusoids, and the aim of reproducing the original speech is not achieved.

This is because simply adding the sinusoids yields a signal whose spectrum consists of only n discrete spectral lines. The spectrum of a speech signal, by contrast, has a continuous spectral envelope and, in addition, a fine spectral structure, which for voiced sound is a pitch structure and for unvoiced sound is described by a stochastic process. The spectral structures of the simply added CSM and of the speech signal are thus entirely different. To synthesize speech with CSM, the line spectrum must therefore be spread into a continuous spectrum by some means. In other words, CSM speech synthesis comes down to generating a speech spectrum pattern from a speech feature spectrum pattern expressed as line spectra, as shown in FIGS. 12 and 13.

The present invention uses the following techniques to perform this spectrum spreading.

That is, exploiting the fact that voiced sound has a clear pitch structure, the phase of each of the n sinusoids is reset every pitch period. This readily produces both the spreading of the spectrum and the pitch fine structure. In addition, as described later, special window processing is applied to the phase-reset waveform to remove the discontinuity of the synthesized waveform at each phase reset. FIG. 14 is a spectral envelope and pitch fine structure characteristic diagram of the CSM before spectrum spreading in the present invention, and FIG. 15 is a spectrum characteristic diagram of the conventional CSM after spectrum spreading. Experimental results have also shown that the content of FIG. 14 changes into a spectrum having a spectral envelope and a pitch fine structure, giving sound quality that is perceptually quite adequate for practical use. In the case of FIG. 14, by contrast, the output is heard merely as a waveform of combined sinusoids, and the purpose of speech synthesis is not achieved.
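The phase reset described above can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: it assumes sampled signals, angular frequencies in radians per sample, m_i treated as powers (so the amplitude is sqrt(2·m_i)), and pitch periods given as whole numbers of samples. The function name `phase_reset_sum` is hypothetical, and no window is applied here.

```python
import numpy as np

def phase_reset_sum(m, omega, periods):
    """Sum of n sinusoids whose phases restart at zero every pitch period.

    m       : intensities, treated here as powers (amplitude = sqrt(2 * m))
    omega   : angular frequencies in radians per sample
    periods : pitch period lengths in samples, one entry per period
    """
    amp = np.sqrt(2.0 * np.asarray(m, dtype=float))
    omega = np.asarray(omega, dtype=float)
    segments = []
    for T in periods:
        k = np.arange(T)  # samples elapsed since the last phase reset
        segments.append((amp * np.sin(np.outer(k, omega))).sum(axis=1))
    return np.concatenate(segments)

voiced = phase_reset_sum([1.0, 0.5], [0.3, 1.2], periods=[80, 80, 75])
```

Because every period starts from the same phase, the output repeats from one period to the next, which is what introduces the pitch harmonics around each CSM frequency. The jump at the end of each period is the discontinuity that the window processing is meant to remove.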

The above applies to voiced sound. Unvoiced sound is handled as follows. The phase reset and special window processing that are performed every pitch period for voiced sound are instead performed, for unvoiced sound, at each occurrence of a pulse that is generated randomly as a stochastic process, with a distribution width and a lower limit set for its period.

However, unlike voiced sound, unvoiced sound inherently has no distinct sound source such as the vocal cords; turbulence generated throughout the vocal tract acts as the source. Unvoiced sound therefore does not need the definite phase relationship among frequency components that phase initialization produces. In the present invention, as one way of synthesizing unvoiced sound that spreads the frequencies without introducing an unnecessary phase relationship, each CSM frequency is FM-modulated with noise. The noise may be white noise or the above-mentioned signal whose period has a set distribution width and lower limit. Phase initialization, which is unnecessary for unvoiced sound, can thus be eliminated.
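A minimal sketch of FM-modulating each CSM sinusoid with a noise signal is given below. It assumes sampled signals, angular frequencies in radians per sample, and a single modulating signal shared by all n sinusoids; the function name `fm_spread`, the `depth` parameter, and its value are assumptions for illustration, since the text only says that the modulation index is chosen empirically.

```python
import numpy as np

def fm_spread(m, omega, modulator, depth):
    """Sum of n sinusoids, each frequency-modulated by the same signal.

    The instantaneous angular frequency of sinusoid i at sample k is
    omega_i + depth * modulator[k]; the phase is its running sum.
    No phase reset is performed.
    """
    amp = np.sqrt(2.0 * np.asarray(m, dtype=float))   # m treated as power
    omega = np.asarray(omega, dtype=float)
    modulator = np.asarray(modulator, dtype=float)
    inst = omega[None, :] + depth * modulator[:, None]  # shape (samples, n)
    phase = np.cumsum(inst, axis=0) - inst              # phase[0] = 0
    return (amp * np.sin(phase)).sum(axis=1)

rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)                       # white noise modulator
unvoiced = fm_spread([1.0, 0.5], [0.3, 1.2], noise, depth=0.05)
```

With `depth=0` the function reduces to the plain sum of sinusoids, so the spreading comes entirely from the modulating signal.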

The present invention also uses the following technique as another method of synthesizing unvoiced sound.

That is, in a CSM type speech synthesizer having unvoiced sound generating means that includes spectrum spreading means using a synchronizing signal set on the basis of a random number signal generated by random number generating means, unvoiced sound is synthesized by spectrum spreading means that includes window function generating means whose period differs from that of the synchronizing signal. Using such window function generating means markedly improves continuity in the vicinity of each phase reset.

Furthermore, because conventional techniques make the pitch structure of voiced sound too distinct and so give a highly unnatural impression, the present invention also synthesizes voiced sound using spectrum spreading means that is not constrained by the pitch structure. As in the unvoiced case above, where window function generating means with a period different from that of the synchronizing signal is used, this aims to improve the continuity of the synthesized speech at the window boundaries.

Specific embodiments of the present invention are now described in detail with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of the present invention.

The first embodiment is a CSM type speech synthesizer provided with unvoiced sound generating means that includes spectrum spreading means using a continuous waveform whose period is set on the basis of a random number signal produced by random number generation. It illustrates a specific case in which frequency modulation serves as the spectrum spreading.

The first embodiment consists of an analysis side (transmitting side) and a synthesis side (receiving side). The analysis side, which is not shown, comprises an A/D converter, a Hamming window processor, an autocorrelation coefficient measuring unit, a CSM analyzer, a CSM quantizer, a power quantizer, a pitch extractor, a voiced/unvoiced discriminator, and a multiplexer.

As shown in FIG. 1, the synthesis side comprises a demultiplexer/decoder 101, an interpolator 102, a period calculator 103, a random number generator 104, n variable frequency oscillators with phase reset function 105(1), 105(2), …, 105(n), n variable gain amplifiers 106(1), …, 106(n), an adder 107, a variable length window function generator 108, multipliers 109 and 110, n FM modulators 111(1), 111(2), …, 111(n), a sawtooth wave generator 112, voiced/unvoiced switches 113(a) to 113(c), and so on.

On the analysis side, the A/D converter converts the speech waveform into digital data quantized in amplitude and in time, and supplies it to the Hamming window processor, the pitch extractor, the voiced/unvoiced discriminator, and so on.

The digital data supplied to the Hamming window processor is weighted, one predetermined analysis frame at a time, by the well-known Hamming window function, and the data of each analysis frame is supplied to the autocorrelation coefficient measuring unit.
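The frame-by-frame Hamming weighting can be sketched as below. The sketch assumes non-overlapping frames of M samples and drops any trailing partial frame; the patent text does not state whether frames overlap, and the function name `hamming_frames` is hypothetical. `np.hamming` is NumPy's standard Hamming window.

```python
import numpy as np

def hamming_frames(x, M):
    """Split x into non-overlapping frames of M samples and apply a Hamming window.

    Frame overlap is not specified in the text; none is assumed here.
    Samples that do not fill a whole frame are discarded.
    """
    x = np.asarray(x, dtype=float)
    num_frames = len(x) // M
    frames = x[:num_frames * M].reshape(num_frames, M)
    return frames * np.hamming(M)  # window broadcast over every frame

# Example: 10 samples, frames of 4 samples -> 2 windowed frames.
windowed = hamming_frames(np.ones(10), M=4)
```

For the 30 ms frame length mentioned for FIG. 13, M would be 240 samples at an assumed sampling rate of 8 kHz; the sampling rate is not given in the text.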

For the data of each analysis frame thus received, the autocorrelation coefficient measuring unit computes the N lowest-order autocorrelation coefficients V_l by the calculation of equation (3) above. The set of V_l obtained for each analysis frame is supplied to the CSM analyzer. V₀ among them, that is, the autocorrelation coefficient at zero lag, is supplied to the power quantizer as the power information of that analysis frame.

The CSM analyzer, on receiving the set of autocorrelation coefficients V_l for each analysis frame, performs the calculation described later to determine m_i and ω_i, which specify the intensity and angular frequency of each of the n CSM sinusoids for that frame, and supplies them to the CSM quantizer.

The CSM quantizer quantizes the set of m_i and ω_i with an appropriate coarseness, chosen in view of the required reproduced sound quality and the transmission capacity of the channel, and supplies the result to the multiplexer. The power quantizer likewise quantizes V₀ with a coarseness chosen on the same basis and supplies it to the multiplexer. The pitch extractor, on receiving the digital data from the A/D converter, extracts the pitch period, quantizes it, and supplies it to the multiplexer. The voiced/unvoiced discriminator makes a voiced/unvoiced decision from the digital data supplied to it and supplies the decision to the multiplexer as a binary signal.

The multiplexer, on receiving these signals, multiplexes them in a format that is easy to separate on the synthesis side and suitable for transmission, and transmits them to the synthesis side.

On the synthesis side, the demultiplexer/decoder 101 receives the transmitted signal, and decodes and demultiplexes it to restore the signals as they were at the multiplexer input on the analysis side.

To simplify the explanation, the description first proceeds as if the n FM modulators 111(1) to 111(n) were not present.

The signals thus restored are supplied to the interpolator 102, which has a memory function. After the necessary interpolation, ω₁ to ω_n are supplied as frequency control inputs to the n variable frequency oscillators with phase reset function 105(1) to 105(n), setting the output angular frequencies of these oscillators to ω₁ to ω_n, respectively.

Likewise, m₁ to m_n, which specify the intensity (power amplitude) of each of the n CSM sinusoids, are supplied as gain control inputs to the n variable gain amplifiers 106(1) to 106(n), which control the oscillation power at each frequency to the specified value.

The outputs of the n variable gain amplifiers 106(1) to 106(n) are summed by the adder 107 and then supplied to the multiplier 109.

The pitch period information output from the demultiplexer/decoder 101 undergoes the necessary interpolation in the interpolator 102 and is then supplied to the variable length window function generator 108.

The variable length window function generator 108 generates a window function that removes the discontinuities caused in the output waveform by phase resetting and so preserves the continuity inherent in a speech waveform. It also generates phase reset pulses that are closely timed to this window function. As stated above, the variable length window function generator 108 successively receives a pitch period data sequence that specifies the intervals between phase reset pulses. It successively generates impulses at the time intervals specified by this data and supplies them, through the voiced/unvoiced switch 113(a), to the phase reset inputs of the variable frequency oscillators with phase reset function 105(1) to 105(n), thereby resetting the phase of each oscillator. The impulses are also supplied to the interpolator 102, where they serve as timing signals for interpolating ω_i and m_i.

In synchronization with the generation of the phase reset pulses, the variable length window function generator 108 generates a variable length window function W(t) as follows. Let T be the phase reset pulse interval specified at that time by the supplied data, and let t be the time elapsed since the previous phase reset pulse. W(t) is then given by equation (4).

FIG. 16 is a characteristic diagram of the variable length window function of the embodiment of FIG. 1. For voiced sound, T represents the pitch period and changes with time, so W(t) has a variable length. FIG. 17 is a characteristic diagram showing the relative timing of the variable length window function of FIG. 1 and the phase reset pulses. As FIG. 17 shows, the start and end of the window function nearly coincide with the phase reset pulses.

The output of the variable length window function generator 108 is supplied to the multiplier 109 through the voiced/unvoiced switch 113(b). The multiplier 109 thus yields the product of the sum of the n sinusoidal waveforms from the adder 107 and the window function W(t) generated in synchronization with each phase reset pulse. In the resulting waveform, multiplication by W(t) makes each sinusoid converge continuously to zero just before its phase is reset, and at the reset each sinusoid starts again from zero, so the waveform remains continuous. Multiplying by W(t) thus removes the discontinuity that phase resetting would otherwise cause.
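The effect of the window multiplication can be illustrated for a single period of T samples. Equation (4) is not reproduced in this text, so the window used below, a half raised cosine that falls from 1 at the reset to nearly 0 at the end of the period, is purely a hypothetical stand-in with the property the text describes; it is not the patent's W(t). The function names and the treatment of m_i as powers are also assumptions.

```python
import numpy as np

def hypothetical_window(T):
    """Stand-in for W(t): 1 at t = 0, decaying smoothly to about 0 at t = T.

    NOT equation (4) of the patent, which is unavailable in this text.
    """
    k = np.arange(T)
    return 0.5 * (1.0 + np.cos(np.pi * k / T))

def windowed_period(m, omega, T):
    """One pitch period: phase-reset sum of sinusoids times the window."""
    amp = np.sqrt(2.0 * np.asarray(m, dtype=float))   # m treated as power
    k = np.arange(T)                                  # samples since reset
    summed = (amp * np.sin(np.outer(k, np.asarray(omega, dtype=float)))).sum(axis=1)
    return hypothetical_window(T) * summed

segment = windowed_period([1.0, 0.5], [0.3, 1.2], T=80)
```

The segment starts at exactly zero, because every sinusoid restarts at zero phase, and ends close to zero, because the window has decayed, so consecutive segments join without a jump. Any window with that end behaviour would serve the same illustrative purpose.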

The output of the multiplier 109 is supplied to the multiplier 110, where it is weighted by the power V₀ output through the interpolator 102, and is then output as synthesized speech.

The above is the operation of the first embodiment during speech synthesis, but it applies to voiced sound. The voiced/unvoiced switches 113(a), (b), and (c) are accordingly all connected to the (V) side shown in FIG. 1, that is, the voiced side.

For unvoiced sound, the voiced/unvoiced switches 113(a), (b), and (c) are switched to the (UV) side, and processing proceeds as follows.

The random number generator 104 generates a random number signal and outputs it to the period calculator 103. The period calculator 103 calculates period data from the supplied random number signal and outputs it to the sawtooth wave generator 112.

The sawtooth wave generator 112 generates a sawtooth wave whose period is controlled by the supplied period data. FIG. 18 is a waveform diagram of the sawtooth wave output in the embodiment of FIG. 1.

The sawtooth wave generator 112 outputs a sawtooth wave with periods T₁, T₂, T₃, T₄, … as shown in FIG. 18 and supplies it to the voiced/unvoiced switch 113(c).
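The chain of random number generator, period calculator, and sawtooth wave generator can be sketched as follows. The sketch assumes periods measured in whole samples and drawn uniformly from the range [lower, lower + width]; the text states only that a lower limit and a distribution width are set, so the uniform distribution, the 0-to-1 ramp, and the function names are assumptions.

```python
import numpy as np

def random_periods(rng, count, lower, width):
    """Draw `count` periods (in samples) uniformly from [lower, lower + width]."""
    return lower + rng.integers(0, width + 1, size=count)

def sawtooth(periods):
    """Concatenate one linear ramp from 0 toward 1 for each period T_1, T_2, ..."""
    return np.concatenate([np.arange(T) / T for T in periods])

rng = np.random.default_rng(0)
periods = random_periods(rng, count=4, lower=20, width=30)
wave = sawtooth(periods)
```

The resulting `wave` could then serve as the modulating signal for the FM modulators in the unvoiced case, with the modulation depth chosen empirically as the text indicates.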

The FM modulators 111(1) to 111(n) frequency-modulate the sinusoids supplied by the variable frequency oscillators with phase reset function 105(1) to 105(n). As the modulating signal, the output of the sawtooth wave generator 112 is supplied through the voiced/unvoiced switch 113(c) for unvoiced sound, and a "0" voltage is supplied through the voiced/unvoiced switch 113(c) for voiced sound. In other words, the output waveforms of the variable frequency oscillators with phase reset function 105(1) to 105(n) are FM-modulated by the sawtooth wave for unvoiced sound and are not modulated for voiced sound. That FM modulation spreads the frequency of a sinusoid is well known and is not explained here. The modulation index is set empirically from a perceptual standpoint.
The voiced/unvoiced switches 113(a) to 113(c) are switched together by the V (voiced)/UV (unvoiced) signal output from the demultiplexer/decoder 101. Speech is thus synthesized using the phase reset pulses for voiced sound and by FM modulation for unvoiced sound.

In unvoiced synthesis, the variable frequency oscillators with phase reset function 105(1) to 105(n) do not perform phase resetting, and a constant DC signal is applied to the multiplier 110 so that no waveform shaping takes place. The interpolator 102 performs interpolation in synchronization with the phase reset pulses supplied to the variable frequency oscillators with phase reset function 105(1) to 105(n) only for voiced sound; for unvoiced sound it performs interpolation at a fixed period, for example 5 ms.

The CSM synthesis needed for speech synthesis is carried out in this way, and comparatively good sound quality is obtained despite compression of the amount of information and transmission errors on the channel.

In the above description, the interpolation applied to each data item in the interpolator 102 can be performed in various combinations, depending on how coarsely each item is quantized on the analysis side: for example, only ω_i, or only ω_i and m_i. The interpolation method may be linear interpolation or interpolation by a higher-order function. For ω_i and m_i, it is advantageous to choose the interpolation points so that interpolated data is obtained at each instant a phase reset pulse is generated. Because ω_i and m_i are updated at this timing, the phase reset pulses are also supplied to the interpolator 102, as described above. With such interpolation, the interpolated data can be computed only after the required later data has arrived or been generated, so the interpolator 102 has a memory that holds the required information until the interpolation is computed.
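A minimal sketch of the linear-interpolation option, evaluated at the phase reset instants, might look as follows. The function name, the frame times, and the use of seconds as the time unit are assumptions for illustration. Note that the later frame value `v1` must already be available, which is why the interpolator needs a memory.

```python
def interpolate_at_resets(t0, v0, t1, v1, reset_times):
    """Linearly interpolate one parameter (e.g. omega_i or m_i) between two
    frame values (t0, v0) and (t1, v1), sampled at each phase reset time."""
    out = []
    for t in reset_times:
        if not t0 <= t <= t1:
            raise ValueError("reset time outside the frame pair")
        a = (t - t0) / (t1 - t0)          # 0 at the earlier frame, 1 at the later
        out.append((1.0 - a) * v0 + a * v1)
    return out
```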

FIG. 2 is a circuit diagram showing in detail the variable frequency oscillator with a phase reset function of FIG. 1.

The currents flowing through the constant current sources 1051 and 1052 are controlled by the voltage applied to the frequency control terminal. Controlling these constant currents controls the charge and discharge currents of the capacitor 1053, which is switched by the switch 1050. The output at point V is a triangular waveform that rises and falls linearly between the reference voltages +V_γ and −V_γ. The constant currents i₁ and i₂ determine the rising and falling slopes of this triangular waveform, respectively, and therefore also its period; for example, the smaller the currents, the longer the period.
The switch 1050 is controlled by the Q output of the RS flip-flop circuit 1059. The RS flip-flop circuit 1059 receives the output of the comparator 1056 at its reset terminal R and, at its set terminal S, the output of the OR circuit 1058, whose two inputs are the output of the comparator 1057 and the phase reset terminal. It thereby switches the switch 1050 so as to form the triangular waveform. That is, when an impulse is applied to the phase reset terminal, point V is forcibly returned to zero potential by the switch 1054, the switch 1050 is connected to the charging side, and oscillation restarts from zero potential, so that the phase is reset.

The triangular wave output obtained at point V in this way is supplied to the sine wave converter 1055 and converted into a sine wave. This converter can easily be implemented, for example, by using the input waveform to read out sine function values stored in a ROM.

Such a variable frequency oscillator with a phase reset function can also easily be implemented as a computer program.
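One way such a program could look is sketched below in Python. It mirrors the circuit of FIG. 2 loosely: a triangle value plays the role of point V, a direction flag plays the role of switch 1050, and a sine shaper stands in for converter 1055. The class name, the normalization of the reference voltage to ±1, and the direct use of `math.sin` instead of a ROM table are assumptions of this sketch.

```python
import math

class PhaseResetOscillator:
    """Triangle-core oscillator with a sine shaper and a phase reset input."""

    def __init__(self, fs):
        self.fs = fs          # sample rate in Hz
        self.v = 0.0          # triangle value at point V, normalized to [-1, 1]
        self.rising = True    # state of switch 1050 (True = charging side)

    def step(self, freq_hz, reset=False):
        if reset:
            # Phase reset: force V to zero and restart on the charging side.
            self.v, self.rising = 0.0, True
        out = math.sin(0.5 * math.pi * self.v)   # triangle-to-sine shaping
        # One full cycle travels 4 units (0 -> 1 -> -1 -> 0), so the slope is 4*f/fs.
        dv = 4.0 * freq_hz / self.fs
        self.v += dv if self.rising else -dv
        if self.v >= 1.0:                        # upper comparator trips
            self.v, self.rising = 2.0 - self.v, False
        elif self.v <= -1.0:                     # lower comparator trips
            self.v, self.rising = -2.0 - self.v, True
        return out
```

Because the shaper maps a unit triangle through sin(πv/2), the output is an exact sine for a constant frequency, and a reset always restarts it from zero phase.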

FIG. 3 is a circuit diagram showing in detail the variable gain amplifier of FIG. 1. In the variable gain amplifier of FIG. 3, the gain of the operational amplifier formed by the amplifier 2071 and the resistor 2073 is varied according to the voltage applied to the field-effect transistor 2072. The signal to be amplified is applied to the input terminal and the control signal to the control terminal, and an output of controlled amplitude is obtained from the output terminal.

Alternatively, the amplifier can be realized with an analog multiplier. It can also easily be realized, for example, by feeding the analog waveform to the reference voltage input of a D/A converter and applying the control information, expressed as a digital quantity, to its digital input.

FIG. 4 is a circuit diagram showing in detail the random number generator of FIG. 1. It combines a 15-stage shift register 1041, with stages D₁ to D₁₅, and a half adder 1042. The exclusive OR of the signals applied to the two input terminals A and B is fed back from the output terminal S to the input of the shift register 1041, generating a 15th-order M-sequence random number signal with a period of 2¹⁵−1. Applying a shift pulse to the CLK (clock) terminal at the required timing advances the register to the next random value.
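The shift-register generator can be sketched in a few lines of Python. The text does not say which register stages drive inputs A and B, so the sketch assumes feedback from stages 15 and 14 (the polynomial x^15 + x^14 + 1, a commonly used maximal-length choice); the function name and seed handling are likewise illustrative.

```python
def m_sequence_states(seed=1):
    """Yield successive 15-bit states of a maximal-length shift register.

    Feedback is the XOR of stages 15 and 14 (polynomial x^15 + x^14 + 1).
    This tap choice is an assumption; the text does not name the stages.
    """
    if not 0 < seed < (1 << 15):
        raise ValueError("seed must be a non-zero 15-bit value")
    state = seed
    while True:
        yield state
        new_bit = ((state >> 14) ^ (state >> 13)) & 1   # half adder sum S = A xor B
        state = ((state << 1) | new_bit) & 0x7FFF       # one shift per CLK pulse
```

Each call to `next()` on the generator corresponds to one shift pulse on the CLK terminal. The all-zero state never occurs, so the values run over 1 to 2¹⁵−1.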

FIG. 5 is a block diagram showing in detail the period calculator 103 of FIG. 1. It converts the random numbers output from the random number generator 104, which are uniformly distributed over the range 0 to 2¹⁵−1, into a distribution suitable for specifying the time interval between phase reset pulses during unvoiced sound. It consists of a constant multiplier 1031 and a constant adder 1032; through this constant multiplication and addition, the upper limit, lower limit, and distribution width of the random numbers can be set to suitable values.
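The multiply-then-add mapping amounts to an affine rescaling of the raw random value. A minimal sketch, with a hypothetical function name and with the limits passed as parameters because the text gives no concrete values:

```python
def reset_interval(r, lower, upper, r_max=(1 << 15) - 1):
    """Map a raw random value r in [0, r_max] onto the range [lower, upper].

    The scale factor plays the role of constant multiplier 1031 and the
    offset that of constant adder 1032.
    """
    scale = (upper - lower) / r_max
    return lower + scale * r
```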

FIG. 6 is a block diagram showing in detail the variable-length window function generator of FIG. 1, which comprises a register 1081, a presettable down counter 1082, a counter 1083, a ROM 1084, and so on.

The pitch period data supplied from the interpolator 102, which specifies the interval between phase reset pulses, is stored in the register 1081.

The down counter 1082 counts a high-speed clock CLK of fixed period. It is preset with the content T of the register 1081 and counts down on CLK. When its content reaches zero, it generates a pulse at its output terminal, which presets it again with the content of the register 1081 and restarts the down count. The output of the down counter 1082 is therefore a pulse train whose period is proportional to T, for example T/K. This pulse train serves as the clock of the counter 1083. The count output of the counter 1083, which advances on this clock, is applied to the ROM 1084 as an address signal, and the data of the window function W(t) written at those addresses is read out in order. When the content of the counter 1083 reaches K, the last data of the window function W(t) is read from the ROM 1084; at the same time the counter 1083 is reset and a phase reset pulse is output from the ROM 1084. The window function data, stored in advance in the ROM 1084 as K samples, is supplied to the multiplier 109 via the voiced/unvoiced switch 113(b). In this way, phase reset pulses whose spacing takes the successively specified values, and a variable-length window function W(t) synchronized with them, are generated.
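The effect of this arrangement is that a fixed table of K samples is stretched or compressed to fit whatever pitch period is currently specified. The Python sketch below works at the output sample rate rather than with a high-speed clock, and it fills the table with a Hann shape purely as a placeholder, since the actual ROM contents are not given in this passage; `K`, `WINDOW_ROM`, and the function name are all assumptions.

```python
import math

K = 64
# Placeholder window table (Hann shape); the real ROM contents are not given here.
WINDOW_ROM = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (K - 1)) for k in range(K)]

def variable_length_window(period_samples):
    """Return (window, reset_flags) for one pitch period of the given length.

    The K stored samples are stretched to the period by advancing the ROM
    address once every period_samples / K output samples.
    """
    window, resets = [], []
    for n in range(period_samples):
        address = (n * K) // period_samples      # value of counter 1083
        window.append(WINDOW_ROM[address])
        resets.append(n == period_samples - 1)   # pulse when the table is exhausted
    return window, resets
```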

CSM analysis and its concrete implementation are described in detail in Sagayama et al., "Speech spectrum analysis using a composite sinusoidal model," Transactions of the Institute of Electronics and Communication Engineers of Japan, 81/2, Vol. J64-A, No. 2, pp. 105-112, and in Japanese Patent Application No. 59-143045 by the same authors.

The CSM analysis on the analysis side in the first embodiment described above solves equations that set the sample autocorrelation coefficients equal to the autocorrelation coefficients of the CSM. Alternatively, a method based on computing line spectrum frequencies by making the LPC coefficients lossless, followed by a residue calculation, can be used. In this embodiment the interpolator interpolates the parameters at the phase reset instants, but this interpolation may be omitted depending on operating conditions.

The variable-length window function used in this embodiment is only an example, and other function forms may be used.

Likewise, the random number generator, the period calculator, and so on are shown only as examples, and the invention need not be limited to them.

Next, a second embodiment of the present invention is described. FIG. 7 is a block diagram showing the second embodiment. In this embodiment, frequency modulation is used as the spectrum spreading means in a CSM type speech synthesizer whose voiced sound generating means includes a spectrum spreading means not restricted by the pitch structure. That is, whereas the first embodiment uses FM modulation in synthesizing unvoiced sound, FM modulation can also be used for voiced sound, and the second embodiment makes use of this.

In FIG. 7, components bearing the same reference symbols as in FIG. 1 are identical to those of FIG. 1, and their detailed description is omitted.

The voiced/unvoiced switch 114 receives the V/UV information supplied from the demultiplexer/decoder 101. When this information specifies voiced (V), the switch supplies the pitch period information from the interpolator 102 to the variable-length window function generator 108, and the phase reset pulses generated as in FIG. 1 are supplied to the variable frequency oscillators 105(1) to 105(n) with a phase reset function and to the interpolator 102. The variable-length window function is likewise provided to the multiplier 109. Frequency modulation as the spectrum spreading means is thus applied to voiced sound as well. In the unvoiced case, the supply of pitch period data to the variable-length window function generator is stopped, and unvoiced sound is generated as in FIG. 1.
FM modulation is thereby also applied in synthesizing voiced sound, making it possible to synthesize speech closer to the natural voice. The effect is as follows. Speech synthesized by simple CSM phase initialization has a spectrum with an overly distinct pitch harmonic structure, whereas real speech has a somewhat indistinct pitch structure owing to fluctuations of the vocal cords and the like. Using FM modulation together with phase reset in voiced sound synthesis therefore brings the synthesized speech closer to the natural voice.

Next, a third embodiment of the present invention is described. FIG. 8 is a block diagram showing the third embodiment. In addition to the demultiplexer/decoder 101, interpolator 102, period calculator 103, and random number generator 104 described above, it comprises a voiced/unvoiced switch 114, a counter 115, fixed-window-length CSM synthesizers 116(1), 116(2), ..., 116(n), an adder/combiner 117, and so on.

The third embodiment combines two cases to generate voiced and unvoiced sound. The first is a CSM type speech synthesizer having voiced sound generating means that includes a spectrum spreading means not restricted by the pitch structure, where the spectrum spreading means includes window function generating means whose period differs from the pitch period length. The second is a CSM type speech synthesizer having unvoiced sound generating means that includes a spectrum spreading means using a periodic signal set on the basis of a random number signal generated by random number generating means, where the spectrum spreading means includes window function generating means whose period differs from the period specified by that periodic signal.

The voiced/unvoiced switch 114 is switched by the V/UV information received from the demultiplexer/decoder 101. It supplies the counter 115 with the pitch period when the sound is voiced, and with the random period data output from the period calculator 103 when it is unvoiced.

The counter 115 is advanced each time pitch period data or random period data is input in this way, and outputs phase reset pulses one after another to the n fixed-window-length CSM synthesizers 116(1) to 116(n).

Each of the n fixed-window-length CSM synthesizers 116(1) to 116(n) receives the interpolated data m₁ to m_n and ω₁ to ω_n and the power information V₀ from the interpolator 102, synthesizes a CSM waveform with a fixed window length that differs from synthesizer to synthesizer, and outputs it to the adder/combiner 117.

FIG. 9 is a block diagram showing in detail the fixed-window-length CSM synthesizer 116 of FIG. 8. It comprises n variable frequency oscillators 1161(1) to 1161(n) with a phase reset function, a fixed-length window function generator 1162, n variable gain amplifiers 1163(1) to 1163(n), an adder/combiner 1164, a multiplier 1165, a multiplier 1166, and so on.

The n variable frequency oscillators 1161(1) to 1161(n) with a phase reset function receive ω₁ to ω_n, respectively, from the interpolator, generate sine waves while being phase-reset by the phase reset pulses, and output them to the variable gain amplifiers 1163(1) to 1163(n). The outputs of these amplifiers, after gain control according to m₁ to m_n respectively, are summed in the adder/combiner 1164. For voiced sound the phase of the output waveform is reset according to the pitch period; for unvoiced sound it is reset according to the random period.
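The oscillator, gain, and summing stages described above can be sketched in Python as a sum of sinusoids whose phases all restart at each reset. For simplicity the sketch uses a constant reset period and applies the m_i values directly as gains; both simplifications, and the function name, are assumptions rather than details taken from the patent.

```python
import math

def csm_frame(omegas, gains, reset_period, n_samples, fs):
    """Sum of sinusoids whose phases all restart every reset_period samples.

    omegas: angular frequencies in rad/s; gains: matching gains derived from m_i.
    """
    out = []
    for n in range(n_samples):
        t = (n % reset_period) / fs    # time elapsed since the last phase reset
        out.append(sum(g * math.sin(w * t) for w, g in zip(omegas, gains)))
    return out
```

With a constant reset period the output repeats exactly from one period to the next, which is the sharply defined pitch structure that the windowing and spectrum spreading stages are meant to soften.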

The multiplier 1165 multiplies the output of the adder/combiner 1164 by the fixed-length window function received from the fixed-length window function generator 1162 and supplies the product to the multiplier 1166. The fixed-length window function differs for each of the n fixed-window-length CSM synthesizers, so the windows can be chosen to overlap one another or to be separated from one another. The purpose of using such fixed-length window functions is to further improve the continuity of the synthesized sound near the phase reset instants.

FIG. 10 is a block diagram showing in detail the fixed-length window function generator 1162 of FIG. 9. It comprises an RS flip-flop circuit 1162-1, an AND circuit 1162-2, a counter 1162-3, a ROM 1162-4, a gate circuit 1162-5, and so on.

Each time the RS flip-flop circuit 1162-1 receives a phase reset pulse from the counter 115 at its S terminal, its Q output terminal goes to the "0" level, the counter 1162-3 is reset, and the AND circuit 1162-2 passes the high-speed clock to the counter 1162-3. The counter 1162-3 holds data whose value corresponds to the time length of the fixed-length window function to be formed for the particular fixed-window-length CSM synthesizer. As it is clocked by the incoming high-speed clock, it outputs this data as address data for the ROM 1162-4, and the fixed-length window function data stored at that address in the ROM 1162-4 is output to the multiplier 1165. The counter 1162-3 also outputs the address data to the gate circuit 1162-5, whose logic gates produce a reset pulse of a predetermined form.

The reset pulse output from the gate circuit 1162-5 is applied to the R terminal of the RS flip-flop circuit 1162-1, inverting the polarity of its output terminal and stopping the counting operation of the counter 1162-3.

In this way, the ROM 1162-4 outputs the predetermined window function each time a phase reset pulse is input, and provides it to the multiplier 1165.

The multiplier 1165 multiplies the fixed-length window function thus input by the output of the adder/combiner 1164 and supplies the product to the multiplier 1166.

The multiplier 1166 multiplies the output of the multiplier 1165 by the power information V₀ and outputs the result to the adder/combiner 117.

The adder/combiner 117 sums the CSM synthesized sounds output from the fixed-window-length CSM synthesizers and outputs the result as the synthesized speech.
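Read as a signal-processing operation, this final summation resembles an overlap-add of windowed bursts, each started by one phase reset pulse. The sketch below is one possible reading under that assumption: `bursts` stands for the windowed, power-scaled outputs of the individual synthesizers and `onsets` for the sample positions of their reset pulses. Both names and the list-based representation are hypothetical.

```python
def overlap_add(bursts, onsets, total_len):
    """Sum windowed bursts into one output, each placed at its reset-pulse onset."""
    out = [0.0] * total_len
    for burst, start in zip(bursts, onsets):
        for k, x in enumerate(burst):
            if start + k < total_len:      # drop samples past the end of the buffer
                out[start + k] += x
    return out
```

Bursts of different lengths may overlap or leave gaps, matching the remark that the fixed-length windows can be chosen to overlap or to be separated.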

In this way, speech is readily synthesized from voiced sound that is not restricted by the pitch structure and uses window functions whose period differs from the pitch period length, and from unvoiced sound that uses window functions whose period differs from the period specified by the random number signal.

[Effect of the invention]

As described above, according to the present invention, a CSM type speech synthesizer is provided with voiced sound generating means including a spectrum spreading means not restricted by the pitch structure; or with unvoiced sound generating means in which the spectrum spreading means, using a continuous waveform whose period is set on the basis of a random number signal, is frequency modulation; or with unvoiced sound generating means including a spectrum spreading means that has window function generating means whose period differs from the period specified by a periodic signal set on the basis of a random number signal. This has the effect of realizing a CSM type speech synthesizer that can synthesize speech perceptually very close to the natural voice.

[Brief description of drawings]

FIG. 1 is a block diagram of the first embodiment of the present invention; FIG. 2 is a block diagram showing in detail the variable frequency oscillator 105 with a phase reset function in the embodiment of FIG. 1; FIG. 3 is a circuit diagram showing in detail the variable gain amplifier 106 in the embodiment of FIG. 1; FIG. 4 is a block diagram showing in detail the random number generator 104 in the embodiment of FIG. 1; FIG. 5 is a block diagram showing in detail the period calculator 103 in the embodiment of FIG. 1; FIG. 6 is a block diagram showing in detail the variable-length window function generator 108 in the embodiment of FIG. 1; FIG. 7 is a block diagram of the second embodiment of the present invention; FIG. 8 is a block diagram of the third embodiment of the present invention; FIG. 9 is a block diagram showing in detail the fixed-window-length CSM synthesizer 116 in the embodiment of FIG. 8; FIG. 10 is a block diagram showing in detail the fixed-length window function generator 1162 of FIG. 9; FIG. 11 is an explanatory diagram of window processing in a conventional CSM; FIG. 12 is a speech feature vector pattern diagram based on the CSM parameters m_i and ω_i; FIG. 13 is a CSM/LPC spectrum comparison diagram contrasting the CSM line spectrum with the LPC spectral envelope; FIG. 14 is a characteristic diagram of the spectral envelope and pitch fine structure after spectrum spreading of the CSM in the present invention; FIG. 15 is a spectrum characteristic diagram after spectrum spreading of a conventional CSM; FIG. 16 is a characteristic diagram of the variable-length window function in the embodiment of FIG. 1; FIG. 17 is a characteristic diagram of the relative timing between the window function and the phase reset pulses in the embodiment of FIG. 1; and FIG. 18 is a sawtooth wave output waveform diagram in the embodiment of FIG. 1.
101: demultiplexer/decoder; 102: interpolator; 103: period calculator; 104: random number generator; 105(1) to 105(n): variable frequency oscillators with a phase reset function; 106(1) to 106(n): variable gain amplifiers; 107: adder/combiner; 108: variable-length window function generator; 109, 110: multipliers; 111(1) to 111(n): FM modulators; 112: sawtooth wave generator; 113(a), (b), (c), 114: voiced/unvoiced switches; 115: counter; 116(1) to 116(n): fixed-window-length CSM synthesizers; 117: adder/combiner.

Claims

[Claims]

1. A CSM type speech synthesizer having random number generating means and unvoiced sound generating means including a spectrum spreading means that uses a continuous waveform whose period is set on the basis of a random number signal generated by the random number generating means, characterized in that the spectrum spreading means is frequency modulation.

2. A CSM type speech synthesizer having random number generating means and unvoiced sound generating means including a spectrum spreading means that uses a periodic signal set on the basis of a random number signal generated by the random number generating means, characterized in that the spectrum spreading means includes window function generating means having a plurality of fixed time lengths that are set independently of the period specified by the periodic signal and have mutually different periods.

3. A CSM type speech synthesizer having voiced sound generating means including a spectrum spreading means not restricted by the pitch structure, characterized in that it includes frequency modulation means as the spectrum spreading means.

4. A CSM type speech synthesizer having voiced sound generating means including a spectrum spreading means not restricted by the pitch structure, characterized in that it includes, as the spectrum spreading means, window function generating means having a plurality of fixed time lengths that are set independently of the pitch period length and have mutually different periods.

5. The CSM type speech synthesizer according to claim 1, wherein the continuous waveform having a period set based on the random number signal generated by the random number generating means is a sawtooth wave.

6. The CSM type speech synthesizer according to claim 1, comprising random number generating means for generating a random number signal having a distribution range set by preset finite lower and upper limit values.

7. The CSM type speech synthesizer according to claim (1), characterized by comprising: sine wave generating means having means for receiving or storing separately extracted intensity and frequency parameters of a plurality of sine wave signals representing a speech signal, and for outputting a plurality of sine wave signals using those parameters; superposing means for superposing the plurality of sine wave signals output by the sine wave generating means; random number generating means for generating a random number signal having a distribution range set by preset finite lower and upper limit values; and modulating means for resetting the phase of the sine wave signals in accordance with the pitch period of the speech signal when the speech signal is voiced, and for modulating the sine wave signals with a continuous waveform whose period is set on the basis of the random number signal when the speech signal is unvoiced.

8. The CSM type speech synthesizer according to claim (1), characterized by comprising: means for extracting intensity and frequency parameters of a plurality of sine wave signals representing a speech signal; sine wave generating means for outputting a plurality of sine wave signals having the extracted intensity and frequency parameters; superposing means for superposing the plurality of sine wave signals output by the sine wave generating means; random number generating means for generating a random number signal having a distribution range set by preset finite lower and upper limit values; and modulating means for resetting the phase of the sine wave signals in accordance with the pitch period of the speech signal when the speech signal is voiced, and for modulating the sine wave signals with a continuous waveform whose period is set on the basis of the random number signal when the speech signal is unvoiced.

9. The CSM type speech synthesizer according to claim (2), comprising random number generating means for generating a random number signal having a distribution range set by preset finite lower and upper limit values.