JPH0315759B2


Info

Publication number: JPH0315759B2
Application number: JP56069029A
Authority: JP
Inventors: Yukio Mitome; Katsunobu Fushikida
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1981-05-08
Filing date: 1981-05-08
Publication date: 1991-03-01
Also published as: JPS57185097A

Description

[Detailed Description of the Invention]

The present invention relates to a speech analysis and synthesis device that edits and synthesizes speech segment waveforms, each about one pitch period long, extracted from natural speech.

Speech analysis and synthesis methods that compress the amount of speech information are conventionally known. They exploit two facts:

- the voiced portions of a speech waveform are periodic;
- in steady vowel portions and similar regions, the pitch-period waveform (speech segment waveform) changes relatively slowly.

In such a method, the analysis section selects representative speech segment waveforms. The synthesis section then generates the speech waveform by using those speech segment waveforms repeatedly.

Such a method is described in detail in, for example, reference (1): Kurematsu and Inoue, "Simulation of speech synthesis by recording and editing pitch-unit speech segments," Acoustical Society of Japan, lecture proceedings (2-1-4), pp. 125-126, October 1970.

However, the conventional method has two drawbacks.

First, when the spectra of adjacent speech segment waveforms change abruptly, a large discontinuity appears in the time evolution of the short-time spectrum of the synthesized waveform. The quality of the synthesized speech therefore tends to deteriorate.

Second, the extracted speech segment waveform may differ greatly in duration from the pitch period used when it is edited and synthesized. In that case the synthesized speech quality deteriorates because the segment waveform must be truncated or otherwise altered.

An object of the present invention is to provide a speech analysis and synthesis device that edits and synthesizes speech segment waveforms extracted from natural speech. The device mitigates the changes in sound quality caused by abrupt changes in spectral shape or pitch period between adjacent speech segment waveforms, and so produces synthesized speech of relatively high quality.

The present invention comprises two means:

- In the analysis section, means for selecting a speech segment waveform as a representative speech segment waveform. The selection compares spectral envelope parameters and pitch frequencies extracted from the natural speech waveform pitch-synchronously, that is, in synchronization with each pitch interval.
- In the synthesis section, means for generating the speech waveform between two representative speech segments. The preceding and the following representative speech segment waveforms are weighted and then added together to interpolate between them.

The invention is characterized by two steps:

- a speech segment waveform is selected as a representative speech segment waveform by comparing pitch-synchronously extracted spectral envelope parameters and pitch frequencies;
- the speech waveform between two representative speech segment waveforms is generated by weighting the preceding and following representative waveforms and adding them together to interpolate.

Representative speech segment waveforms can be selected, for example, as follows.

1. Extract the spectral envelope parameters and the pitch frequency from the natural speech pitch-synchronously.
2. Compare these values with the corresponding values of the temporally preceding representative speech segment waveform. Check two distances separately against their own thresholds: the distance between the spectral envelope parameter values, and the distance between the pitch frequencies.
3. If either distance exceeds its threshold, select the current speech segment waveform as a new representative speech segment waveform.
4. If neither distance exceeds its threshold, regard the current waveform as identical to the preceding representative speech segment waveform and do not select it. Repeat the analysis until a new representative waveform is selected.

During this process, a parameter is also extracted that indicates the time interval over which the preceding representative speech segment waveform stands in for the speech segment waveforms.
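The selection rule above can be sketched in Python. This is a minimal sketch, not the patented circuit.

Assumptions (none of these are given in the text):

- The container `PitchFrame` and the functions `envelope_distance` and `select_representatives` are hypothetical names.
- The spectral envelope distance is Euclidean.
- The first frame always becomes a representative.

The per-representative count plays the role of the parameter giving the time interval each representative covers.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PitchFrame:
    """One pitch-synchronous analysis frame (hypothetical container)."""
    waveform: List[float]   # samples of one pitch period
    pitch_hz: float         # pitch frequency of this period
    envelope: List[float]   # spectral envelope parameters (assumed, e.g. cepstral coefficients)


def envelope_distance(a: List[float], b: List[float]) -> float:
    # Assumption: Euclidean distance; the text does not fix the measure.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_representatives(frames, pitch_threshold_hz, envelope_threshold):
    """Return (representative frame, repeat count) pairs in time order.

    The repeat count is the number of later frames judged to be the same
    as that representative and therefore not kept.
    """
    selected: List[list] = []
    for frame in frames:
        if not selected:
            # Assumption: the first frame always becomes a representative.
            selected.append([frame, 0])
            continue
        rep = selected[-1][0]
        pitch_differs = abs(frame.pitch_hz - rep.pitch_hz) > pitch_threshold_hz
        envelope_differs = envelope_distance(frame.envelope, rep.envelope) > envelope_threshold
        if pitch_differs or envelope_differs:
            selected.append([frame, 0])   # either threshold exceeded: new representative
        else:
            selected[-1][1] += 1          # neither exceeded: covered by the preceding one
    return [(frame, count) for frame, count in selected]
```

Raising either threshold keeps fewer representatives, which means more compression. The price is larger jumps that the synthesis side must smooth over.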

Next, the method of the present invention for generating the synthesized waveform between representative speech segments is described in detail.

Let f(t) denote the amplitude of the temporally preceding representative speech segment waveform, and g(t) the amplitude of the following representative speech segment waveform, where t denotes time.

The preceding representative waveform is multiplied by a weight α_i, and the following representative waveform by a weight β_i. Here i = 1, 2, ..., n, and n is the number of repetitions of the pitch interval. The two weighted waveforms are then added to synthesize the waveform between the representative segments.

Let h_i(t) (i = 1, 2, ..., n) denote the synthesized speech segment waveform for one pitch period. Then h_i(t) is given by the following equation:

$$h_i(t) = \alpha_i\, f(t) + \beta_i\, g(t) \qquad (1)$$

The weights α_i and β_i are changed with each repetition as follows:

$$1 > \alpha_1 > \alpha_2 > \cdots > \alpha_n > 0, \qquad 0 < \beta_1 < \beta_2 < \cdots < \beta_n < 1 \qquad (2)$$

The pitch period T_f of the preceding representative waveform f(t) and the pitch period T_g of the following representative waveform g(t) generally differ in length. The pitch period T_i of each one-pitch speech segment waveform h_i(t) to be synthesized is therefore obtained by interpolating between T_f and T_g.

For example, when T_f > T_g,

$$T_f \ge T_1 \ge \cdots \ge T_n \ge T_g \qquad (3)$$

and when T_f < T_g,

$$T_f \le T_1 \le \cdots \le T_n \le T_g \qquad (4)$$
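The following Python sketch computes one possible weight and pitch-period schedule. It is illustrative only.

Assumptions:

- The function name `interpolation_schedule` is hypothetical.
- The weights change linearly, with α_i = 1 − i/(n+1) and β_i = i/(n+1). The text only requires the strict orderings of equation (2).
- T_i is interpolated linearly and rounded to whole samples. Rounding keeps the non-strict monotone ordering of equations (3) and (4).

```python
def interpolation_schedule(n, period_f, period_g):
    """Return [(alpha_i, beta_i, T_i) for i = 1..n].

    Assumption: linear schedules. alpha falls and beta rises strictly
    inside (0, 1), as Eq. (2) requires, and T_i moves monotonically from
    T_f towards T_g (periods in samples), as Eqs. (3)/(4) require.
    """
    schedule = []
    for i in range(1, n + 1):
        frac = i / (n + 1)                 # strictly between 0 and 1
        alpha = 1.0 - frac                 # weight on the preceding waveform f(t)
        beta = frac                        # weight on the following waveform g(t)
        period = round(period_f + (period_g - period_f) * frac)
        schedule.append((alpha, beta, period))
    return schedule
```

With n = 3, T_f = 80 and T_g = 100 samples, this gives α = 0.75, 0.5, 0.25 and T_i = 85, 90, 95. Any other strictly monotone weight curve, such as a raised cosine, would also satisfy equation (2).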

The pitch period T_f or T_g of a representative speech segment waveform may be too long or too short for the pitch period T_i of the one-pitch waveform h_i(t) being synthesized. For example, an over-long waveform can be truncated, discarding the excess. An over-short waveform can be extended by holding its final amplitude value. The present method selects representative speech segment waveforms so that pitch-period differences never become large. Under that condition, these simple adjustments produce the synthesized waveform with almost no loss of sound quality.
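The truncate-or-hold adjustment, followed by the weighted sum of equation (1), can be sketched in Python. This is a minimal sketch.

Assumptions:

- The names `fit_to_period` and `synthesize_period` are hypothetical.
- Waveforms are plain lists of samples.
- Periods are counted in samples.

```python
def fit_to_period(segment, period):
    """Adjust a representative segment to `period` samples.

    Too long: truncate the excess. Too short: hold the final amplitude value.
    """
    if len(segment) >= period:
        return list(segment[:period])
    last = segment[-1] if segment else 0.0
    return list(segment) + [last] * (period - len(segment))


def synthesize_period(f, g, alpha, beta, period):
    """One interpolated period h_i(t) = alpha_i*f(t) + beta_i*g(t), Eq. (1).

    Both representatives are first fitted to the target period T_i.
    """
    f_fit = fit_to_period(f, period)
    g_fit = fit_to_period(g, period)
    return [alpha * a + beta * b for a, b in zip(f_fit, g_fit)]
```

Holding the last sample, rather than padding with zeros, avoids introducing a step to silence inside the period.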

The description above assumes that the pitch interval is repeated n times. In practice, the number of repetitions is controlled by the parameter extracted in the analysis section, which gives the time interval represented by the preceding representative speech segment waveform. The count may be one or more than one. It may also be zero, in which case no synthesized waveform is generated between the two representative speech segment waveforms.
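The ordering of representatives and interpolated periods, including the zero-repetition case, can be sketched in Python. This is a minimal end-to-end sketch.

Assumptions:

- The names `fit` and `edit_synthesize` are hypothetical.
- Each representative is stored as a list of samples exactly one pitch period long, so its pitch period equals its length.
- Weights and periods are interpolated linearly.
- The repeat count attached to the last representative is ignored, because the text does not say what follows the final segment.

```python
def fit(segment, period):
    # Truncate if too long; hold the final amplitude value if too short.
    if len(segment) >= period:
        return segment[:period]
    return segment + [segment[-1]] * (period - len(segment))


def edit_synthesize(representatives):
    """representatives: list of (waveform list, repeat count) in time order.

    Output: rep_k, then `count_k` interpolated periods between rep_k and
    rep_{k+1}, then rep_{k+1}, and so on. A count of 0 means the two
    representatives are simply concatenated.
    """
    out = []
    for k, (f, n) in enumerate(representatives):
        out.extend(f)                              # the representative itself
        if k + 1 == len(representatives):
            break                                  # assumption: last count unused
        g = representatives[k + 1][0]
        for i in range(1, n + 1):
            frac = i / (n + 1)                     # assumed linear weights
            period = round(len(f) + (len(g) - len(f)) * frac)  # T_i between T_f and T_g
            out.extend((1 - frac) * a + frac * b
                       for a, b in zip(fit(f, period), fit(g, period)))
    return out
```

For example, a 2-sample representative of zeros with count 1, followed by a 4-sample representative of 4.0, yields one interpolated 3-sample period of 2.0 between them.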

As described above, the present invention reduces the loss of synthesized sound quality caused by abrupt changes in spectral envelope or pitch period between two representative speech segment waveforms. It thereby yields synthesized speech of relatively high quality.

Next, an embodiment of the present invention is described with reference to the drawing.

The figure is a block diagram showing one embodiment of the present invention.

First, natural speech enters through the natural speech input terminal 103 of the analysis section 101. It is fed to three circuits:

- the input speech waveform temporary storage circuit 104;
- the pitch extraction circuit 105;
- the spectral envelope information extraction circuit 106.

Several methods and block diagrams for the pitch extraction circuit are described in detail in, for example, reference [2]: L.R. Rabiner et al., "A Comparative Performance Study of Several Pitch Detection Algorithms," IEEE Trans. ASSP-24, No. 5, pp. 399-418.

Several methods and block diagrams for the spectral envelope information extraction circuit are described in detail in, for example, reference [3]: L.R. Rabiner and R.W. Schafer, "Digital Processing of Speech Signals," Prentice-Hall, 1978, Chapters 6 to 8.

Detailed descriptions of these two circuits are therefore omitted.

The pitch extraction circuit 105 extracts pitch information according to control information from the analysis section control circuit 120, received via the pitch extraction circuit control information transmission line 145. The extracted pitch information is sent via the pitch information transmission line 108 to the pitch information comparison circuit 110 and to the representative speech segment waveform extraction circuit 121.

The spectral envelope information extraction circuit 106 extracts spectral envelope information according to control information from the analysis section control circuit 120, received via the spectral envelope information extraction circuit control information transmission line 146. The extracted spectral envelope information is sent via the spectral envelope information transmission line 109 to the spectral envelope information comparison circuit 112.

The pitch information comparison circuit 110 operates according to control information from the analysis section control circuit 120, received via the pitch information comparison circuit control information transmission line 116. It compares two values:

- the pitch information sent from the pitch extraction circuit 105;
- the pitch information of the temporally preceding representative speech segment waveform, stored in the pitch information storage circuit 111.

It then reports whether the absolute value of their difference exceeds a certain reference value. This comparison result is sent to the determination circuit 123 via the pitch information comparison information transmission line 115.

The spectral envelope information comparison circuit 112 operates according to control information from the analysis section control circuit 120, received via the spectral envelope information comparison circuit control information transmission line 119. It compares two sets of spectral envelope information:

- the set sent from the spectral envelope information extraction circuit 106;
- the set belonging to the temporally preceding representative speech segment waveform, stored in the spectral envelope information storage circuit 113.

It then reports whether the distance between the two sets exceeds a certain reference value. This comparison result is sent to the determination circuit 123 via the spectral envelope information comparison information transmission line 118.

The determination circuit 123 operates according to control information from the analysis section control circuit 120, received via the determination circuit control information transmission line 124. It makes its decision from the two comparison results:

- If at least one result shows that its reference value has been exceeded, the input speech waveform is adopted as a new representative speech segment waveform.
- If both results show that neither reference value has been exceeded, the waveform is not adopted as a new representative.

The decision is sent on four lines:

- determination information transmission line A114, to the pitch information comparison circuit 110;
- line B117, to the spectral envelope information comparison circuit 112;
- line C122, to the representative speech segment waveform extraction circuit 121;
- line D147, to the counting circuit 148.

The determination circuit 123 may send determination information indicating that a new representative speech segment waveform has been adopted. In that case the two comparison circuits store the new representative's values:

- the pitch information comparison circuit 110 writes its pitch information into the pitch information storage circuit 111;
- the spectral envelope information comparison circuit 112 writes its spectral envelope information into the spectral envelope information storage circuit 113.

The representative speech segment waveform extraction circuit 121 acts on two inputs: control information from the analysis section control circuit 120, received via the representative speech segment waveform extraction circuit control information transmission line 125, and determination information from the determination circuit 123.

Using the pitch information from the pitch extraction circuit 105, it cuts a representative speech segment waveform out of the input speech waveform. That waveform arrives from the input speech waveform temporary storage circuit 104 via the input speech waveform transmission line 107.

The circuit has two outputs:

- the representative speech segment waveform data, to the analysis section representative speech segment waveform data output terminal 126;
- the pitch information of the representative speech segment waveform, to the analysis section representative speech segment waveform pitch information output terminal 127.

The counting circuit 148 operates according to control information from the analysis section control circuit 120, received via the counting circuit control information transmission line 149. It takes the determination information sent from the determination circuit 123 via determination information transmission line D147 and counts the "not adopted" decisions between each pair of adjacent "adopted as a new representative speech segment waveform" decisions.

This count gives the number of pitch intervals of the speech waveform that the preceding representative speech segment waveform represents. It is output as repetition information to the analysis section representative speech segment waveform repetition information output terminal 128.
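The counting circuit's behavior can be sketched in Python. This is a minimal sketch, not the circuit itself.

Assumptions:

- The function name `repetition_counts` is hypothetical.
- Decisions arrive as booleans, with True meaning "adopted as a new representative".
- "Not adopted" decisions before the first representative are ignored.
- The count for the last representative runs to the end of the input.

```python
def repetition_counts(decisions):
    """Turn a stream of determination results into repetition information.

    decisions: iterable of booleans, True = adopted as a new representative,
    False = not adopted. Returns one count per adopted representative: the
    number of 'not adopted' decisions that follow it before the next
    'adopted' decision (or the end of the input).
    """
    counts = []
    for adopted in decisions:
        if adopted:
            counts.append(0)          # start counting for the new representative
        elif counts:
            counts[-1] += 1           # one more period covered by the current one
        # assumption: 'not adopted' before any representative is ignored
    return counts
```

For example, the sequence adopted, not, not, adopted, adopted, not gives the counts [2, 0, 1].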

The analysis section control circuit 120 controls the pitch information comparison circuit 110, the spectral envelope information comparison circuit 112, the determination circuit 123, and the representative speech segment waveform extraction circuit 121.

In the synthesis section 102, three kinds of data are received and stored:

- The representative speech segment waveform data enters through the synthesis section representative speech segment waveform data input terminal 129. It is stored in the representative speech segment waveform data storage circuit 132.
- The representative speech segment waveform pitch information enters through the synthesis section representative speech segment waveform pitch information input terminal 130. It is stored in the representative speech segment waveform pitch information storage circuit 133.
- The representative speech segment waveform repetition information enters through the synthesis section representative speech segment waveform repetition information input terminal 131. It is stored in the representative speech segment waveform repetition information storage circuit 134.

The weight and pitch period calculation circuit 138 operates according to control information from the synthesis section control circuit 140, received via the weight and pitch period calculation circuit control information transmission line 139. It gathers two inputs:

- the pitch information of the preceding and the following representative speech segment waveforms, from the representative speech segment waveform pitch information storage circuit 133 via the representative speech segment waveform pitch information transmission line 136;
- the repetition information of the preceding representative speech segment waveform, from the representative speech segment waveform repetition information storage circuit 134 via the representative speech segment waveform repetition information transmission line 137.

From these it calculates a weight value and a pitch period for each repetition interval, satisfying equation (2) and either equation (3) or equation (4). The weight values are sent via the weight value transmission line 150, and the pitch periods via the pitch period transmission line 151, to the amplitude and pitch period adjustment circuit 152.

The amplitude and pitch period adjustment circuit 152 operates according to control information from the synthesis section control circuit 140, received via the amplitude and pitch period adjustment circuit control information transmission line 153. It proceeds in four steps:

1. Obtain the preceding and the following representative speech segment waveform data from the representative speech segment waveform data storage circuit 132, via the representative speech segment waveform data transmission line 135.
2. Multiply each waveform by its weight from the weight and pitch period calculation circuit 138.
3. Adjust each waveform to the pitch period sent from circuit 138. If the representative waveform's pitch period is shorter, hold its final amplitude value. If it is longer, truncate the excess.
4. Send the weighted speech segment waveform data to the adder circuit 155 via the speech segment waveform data transmission line 154.

The adder circuit 155 operates according to control information from the synthesis section control circuit 140, received via the adder circuit control information transmission line 156. It adds the amplitude values of the weighted preceding and following speech segment waveform data at corresponding times, as received from the amplitude and pitch period adjustment circuit 152. The resulting interpolated waveform is sent to the editing and synthesis circuit 142 via the interpolated waveform transmission line 141.

The editing and synthesis circuit 142 operates according to control information from the synthesis section control circuit 140, received via the editing and synthesis circuit control information transmission line 143. It edits and combines two inputs:

- the representative speech segment waveform data, from the representative speech segment waveform data storage circuit 132 via the representative speech segment waveform data transmission line 135;
- the interpolated waveform, from the adder circuit 155 via the interpolated waveform transmission line 141.

The result is output to the synthesized speech output terminal 144.

The synthesis section control circuit 140 controls the weight and pitch period calculation circuit 138, the amplitude and pitch period adjustment circuit 152, the adder circuit 155, and the editing and synthesis circuit 142.

In the illustrated embodiment, the analysis section's outputs are kept in storage circuits. These are the representative speech segment waveform data, the representative speech segment waveform pitch information, and the representative speech segment waveform repetition information. The device can also be built as a speech communication device that synthesizes in real time, without storing these data.

[Brief explanation of the drawing]

The figure is a block diagram showing one embodiment of the present invention. In the figure, the reference numerals denote the following:

101: analysis section
102: synthesis section
103: natural speech input terminal
104: input speech waveform temporary storage circuit
105: pitch extraction circuit
106: spectral envelope information extraction circuit
110: pitch information comparison circuit
111: pitch information storage circuit
112: spectral envelope information comparison circuit
113: spectral envelope information storage circuit
120: analysis section control circuit
121: representative speech segment waveform extraction circuit
123: determination circuit
126: analysis section representative speech segment waveform data output terminal
127: analysis section representative speech segment waveform pitch information output terminal
128: analysis section representative speech segment waveform repetition information output terminal
129: synthesis section representative speech segment waveform data input terminal
130: synthesis section representative speech segment waveform pitch information input terminal
131: synthesis section representative speech segment waveform repetition information input terminal
132: representative speech segment waveform data storage circuit
133: representative speech segment waveform pitch information storage circuit
134: representative speech segment waveform repetition information storage circuit
138: weight and pitch period calculation circuit
140: synthesis section control circuit
142: editing and synthesis circuit
144: synthesized speech output terminal
148: counting circuit
152: amplitude and pitch period adjustment circuit
155: adder circuit

Claims


1. A speech analysis and synthesis device of the type that edits and synthesizes speech segment waveforms, each about one pitch period long, extracted from a natural speech signal, the device comprising:

means for selecting, from among the extracted speech segment waveforms, a new representative speech segment waveform together with its spectral envelope parameter and pitch frequency, by comparing the spectral envelope parameter value and the pitch frequency of each extracted speech segment waveform with the respective values of a predetermined representative speech segment waveform; and

means for generating a speech waveform between two representative speech segment waveforms by weighting the preceding representative speech segment waveform and the following representative speech segment waveform and adding them together.