JPH0315759B2 - Voice analyzer/synthesizer - Google Patents
Info
- Publication number
- JPH0315759B2
- Authority
- JP
- Japan
- Prior art keywords
- waveform
- speech
- representative
- pitch
- circuit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Description
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a speech analysis and synthesis device that edits and synthesizes speech segment waveforms, each about one pitch period long, extracted from natural speech.
A known speech analysis and synthesis method compresses the information content of speech by exploiting two facts: the voiced portions of a speech waveform are periodic, and the pitch-period waveform (speech segment waveform) changes relatively slowly in regions such as the steady-state part of a vowel. In this method the analysis section selects representative speech segment waveforms, and the synthesis section generates the speech waveform by using each of them repeatedly. The method is described in detail in, for example, reference (1): Kurematsu and Inoue, "Simulation of speech synthesis by recording and editing of pitch-unit speech segments," Proceedings of the Acoustical Society of Japan meeting, October 1970, paper 2-1-4, pp. 125-126.
However, the conventional method has two drawbacks. First, when the spectra of adjacent speech segment waveforms change abruptly, a large discontinuity appears in the time evolution of the short-time spectrum of the synthesized waveform, so the synthesized sound quality tends to degrade. Second, when the duration of an extracted speech segment waveform differs greatly from the pitch period used when that waveform is edited and synthesized, operations such as truncating the segment waveform degrade the synthesized sound quality.
An object of the present invention is to provide a speech analysis and synthesis device that edits and synthesizes speech segment waveforms extracted from natural speech, mitigates the changes in sound quality caused by abrupt changes in spectral shape or pitch period between adjacent speech segment waveforms, and thereby produces synthesized speech of relatively high quality.
The present invention comprises two means. In the analysis section, a selecting means compares spectral envelope parameters and pitch frequencies extracted from the natural speech waveform pitch-synchronously, that is, in synchronization with the pitch intervals, and selects speech segment waveforms as representative speech segment waveforms. In the synthesis section, a generating means weights the preceding representative speech segment waveform and the following representative speech segment waveform, adds them together to interpolate, and thereby generates the speech waveform between the two representative speech segments.
The feature of the present invention is that speech segment waveforms are selected as representative speech segment waveforms by comparing pitch-synchronously extracted spectral envelope parameters and pitch frequencies, and that the speech waveform between two representative speech segment waveforms is generated by weighting the preceding and following representative speech segment waveforms and adding them together to interpolate.
A representative speech segment waveform is selected, for example, by extracting the spectral envelope parameters and the pitch frequency pitch-synchronously from the natural speech and comparing them with the corresponding parameter values of the temporally preceding representative speech segment waveform. The comparison checks whether the distance between the spectral envelope parameter values, and also the distance between the pitch frequencies, each exceeds its threshold. If either distance exceeds its threshold, the current segment is selected as a new representative speech segment waveform. If neither distance exceeds its threshold, the segment is regarded as identical to the preceding representative speech segment waveform and is not selected as a representative, and the analysis is repeated until a new representative segment waveform is selected. During this process, a parameter is extracted that indicates over how long a time interval the preceding representative speech segment waveform represents the speech segment waveforms.
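The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not specify the distance measure, so a Euclidean distance between spectral envelope parameter vectors is assumed, the first frame is assumed to be adopted unconditionally, and the function name `select_representatives` and the threshold arguments are hypothetical.

```python
from math import sqrt

def select_representatives(frames, spec_threshold, pitch_threshold):
    """Return the indices of frames adopted as representative segments.

    frames: list of (spectral_params, pitch_hz), one entry per pitch-synchronous frame.
    A frame is adopted when EITHER distance to the current representative
    exceeds its threshold (Euclidean spectral distance is an assumption).
    """
    selected = []
    ref_spec = ref_pitch = None
    for idx, (spec, pitch) in enumerate(frames):
        if ref_spec is None:
            adopt = True  # assumption: the first frame has nothing to compare with
        else:
            spec_dist = sqrt(sum((a - b) ** 2 for a, b in zip(spec, ref_spec)))
            pitch_dist = abs(pitch - ref_pitch)
            adopt = spec_dist > spec_threshold or pitch_dist > pitch_threshold
        if adopt:
            selected.append(idx)
            ref_spec, ref_pitch = spec, pitch  # new reference for later frames
    return selected
```

Note that each frame is compared with the most recent representative, not with its immediate neighbour, so slow drift eventually triggers a new representative once the accumulated change crosses a threshold.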
Next, the method of the present invention for generating the synthesized waveform between representative speech segments is described in detail.
Let $f(t)$ denote the amplitude of the temporally preceding representative speech segment waveform and $g(t)$ the amplitude of the following representative speech segment waveform, where $t$ is time. The preceding representative speech segment waveform is multiplied by a weight $\alpha_i$ and the following one by a weight $\beta_i$, where $i = 1, 2, \ldots, n$ indexes the repetitions of the pitch interval, and the two are added to synthesize the waveform between the representative segments. That is, the synthesized speech segment waveform for one pitch period, $h_i(t)$ ($i = 1, 2, \ldots, n$), is given by

$$h_i(t) = \alpha_i \cdot f(t) + \beta_i \cdot g(t) \qquad (1)$$

The weights $\alpha_i$ and $\beta_i$ are changed at each repetition as follows:

$$1 > \alpha_1 > \alpha_2 > \cdots > \alpha_n > 0, \qquad 0 < \beta_1 < \beta_2 < \cdots < \beta_n < 1 \qquad (2)$$

The pitch period $T_f$ of the preceding representative speech segment waveform $f(t)$ and the pitch period $T_g$ of the following representative segment waveform $g(t)$ generally differ in length. The pitch period $T_i$ of each one-pitch speech segment waveform $h_i(t)$ to be synthesized is therefore obtained by interpolating between $T_f$ and $T_g$. For example, when $T_f > T_g$,

$$T_f \ge T_1 \ge \cdots \ge T_n \ge T_g \qquad (3)$$

and when $T_f < T_g$,

$$T_f \le T_1 \le \cdots \le T_n \le T_g \qquad (4)$$
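One schedule that satisfies conditions (2) to (4) is a linear one, sketched below in Python. The text only requires monotonic weights and interpolated pitch periods, so the linear spacing, the choice of weights that sum to one, and the function name `interpolation_schedule` are all assumptions made for illustration.

```python
def interpolation_schedule(t_f, t_g, n):
    """Return [(alpha_i, beta_i, T_i)] for i = 1..n.

    Assumed linear schedule: beta_i = i / (n + 1), alpha_i = 1 - beta_i,
    and T_i moves linearly from t_f toward t_g. This keeps
    1 > alpha_1 > ... > alpha_n > 0 and 0 < beta_1 < ... < beta_n < 1,
    and every T_i lies between t_f and t_g.
    """
    schedule = []
    for i in range(1, n + 1):
        beta = i / (n + 1)
        alpha = 1.0 - beta
        period = t_f + (t_g - t_f) * beta
        schedule.append((alpha, beta, period))
    return schedule
```

For example, with a preceding period of 80 samples, a following period of 100 samples, and three repetitions, the weights on the preceding segment fall 0.75, 0.5, 0.25 while the period rises 85, 90, 95. With `n = 0` the schedule is empty and no intermediate waveform is produced.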
In this case the pitch period $T_f$ or $T_g$ of a representative speech segment waveform may be too long or too short relative to the pitch period $T_i$ of the one-pitch speech segment waveform $h_i(t)$ to be synthesized. For example, when it is too long the excess waveform is discarded, and when it is too short the final amplitude value of the waveform is held. Provided the representative speech segment waveforms have been selected, as in this method, so that the difference in pitch period does not become large, the synthesized waveform can then be generated with almost no degradation in sound quality.
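The length adjustment and the weighted sum of equation (1) can be sketched as follows. This is a minimal illustration that assumes waveforms are non-empty Python lists of samples and that the target pitch period is given as an integer number of samples; the names `fit_to_length` and `blend` are hypothetical.

```python
def fit_to_length(segment, length):
    """Match a segment to `length` samples.

    Too long: discard the excess samples.
    Too short: hold the final amplitude value until the length is reached.
    Assumes `segment` is non-empty.
    """
    if len(segment) >= length:
        return list(segment[:length])
    return list(segment) + [segment[-1]] * (length - len(segment))

def blend(f, g, alpha, beta, length):
    """One interpolated pitch period: alpha * f(t) + beta * g(t), sample by sample."""
    f_fit = fit_to_length(f, length)
    g_fit = fit_to_length(g, length)
    return [alpha * a + beta * b for a, b in zip(f_fit, g_fit)]
```

Holding the last sample avoids introducing a jump to zero at the end of a short segment, which is presumably why the text prefers it to zero padding.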
The description above assumes that the pitch interval is repeated n times. This count is controlled according to the parameter, extracted in the analysis section, that indicates the time interval represented by the preceding representative speech segment waveform. It may be one, it may be more than one, and it may be zero, in which case no synthesized waveform is generated between the two representative speech segment waveforms.
As described above, the present invention clearly has the effect of mitigating the degradation in synthesized sound quality caused by abrupt changes in spectral envelope or pitch period between the two representative speech segment waveforms, so that synthesized speech of relatively good quality is obtained.
Next, an embodiment of the present invention is described with reference to the drawing.
The figure is a block diagram showing one embodiment of the present invention.
First, natural speech is input through the natural speech input terminal 103 of the analysis section 101 to the input speech waveform temporary storage circuit 104, the pitch extraction circuit 105, and the spectral envelope information extraction circuit 106. For the pitch extraction circuit, several methods and block diagrams are described in detail in, for example, reference [2]: L. R. Rabiner et al., "A Comparative Performance of Several Pitch Detection Algorithms," IEEE Trans. ASSP-24, No. 5, pp. 399-418. For the spectral envelope information extraction circuit, several methods, block diagrams, and the like are described in detail in, for example, reference [3]: L. R. Rabiner et al., "Digital Processing of Speech Signals," Prentice-Hall, 1978, Chapters 6 to 8. A detailed description is therefore omitted here.
The pitch extraction circuit 105 extracts pitch information in accordance with control information sent from the analysis section control circuit 120 over the pitch extraction circuit control information transmission line 145. The extracted pitch information is sent over the pitch information transmission line 108 to the pitch information comparison circuit 110 and the representative speech segment waveform extraction circuit 121.
The spectral envelope information extraction circuit 106 extracts spectral envelope information in accordance with control information sent from the analysis section control circuit 120 over the spectral envelope information extraction circuit control information transmission line 146. The extracted spectral envelope information is sent over the spectral envelope information transmission line 109 to the spectral envelope information comparison circuit 112.
The pitch information comparison circuit 110, in accordance with control information sent from the analysis section control circuit 120 over the pitch information comparison circuit control information transmission line 116, compares the pitch information sent from the pitch extraction circuit 105 with the pitch information of the temporally preceding representative speech segment waveform stored in the pitch information storage circuit 111. It sends the comparison result, indicating whether the absolute value of the difference exceeds a reference value, to the determination circuit 123 over the pitch information comparison information transmission line 115.
The spectral envelope information comparison circuit 112, in accordance with control information sent from the analysis section control circuit 120 over the spectral envelope information comparison circuit control information transmission line 119, compares the spectral envelope information sent from the spectral envelope information extraction circuit 106 with the spectral envelope information of the temporally preceding representative speech segment waveform stored in the spectral envelope information storage circuit 113. It sends the comparison result, indicating whether the distance between the two sets of spectral envelope information exceeds a reference value, to the determination circuit 123 over the spectral envelope information comparison information transmission line 118.
The determination circuit 123 operates in accordance with control information sent from the analysis section control circuit 120 over the determination circuit control information transmission line 124. If at least one of the comparison results sent from the two comparison circuits indicates that its reference value has been exceeded, the circuit determines that the input speech waveform is to be adopted as a new representative speech segment waveform. If both comparison results indicate that the reference values have not been exceeded, it determines that the waveform is not to be adopted as a new representative speech segment waveform. The determination result is sent over determination information transmission lines A 114, B 117, C 122, and D 147 to the pitch information comparison circuit 110, the spectral envelope information comparison circuit 112, the representative speech segment waveform extraction circuit 121, and the counting circuit 148, respectively.
When the determination circuit 123 sends determination information indicating adoption as a new representative speech segment waveform, the pitch information comparison circuit 110 and the spectral envelope information comparison circuit 112 write the pitch information and the spectral envelope information of the new representative speech segment waveform into the pitch information storage circuit 111 and the spectral envelope information storage circuit 113, respectively.
The representative speech segment waveform extraction circuit 121 operates in accordance with control information sent from the analysis section control circuit 120 over the representative speech segment waveform extraction circuit control information transmission line 125 and with the determination information sent from the determination circuit 123. Based on the pitch information sent from the pitch extraction circuit 105, it cuts out the representative speech segment waveform from the input speech waveform sent from the input speech waveform temporary storage circuit 104 over the input speech waveform transmission line 107. It outputs the representative speech segment waveform data to the analysis section representative speech segment waveform data output terminal 126, and the pitch information of the representative speech segment waveform to the analysis section representative speech segment waveform pitch information output terminal 127.
The counting circuit 148 operates in accordance with control information sent from the analysis section control circuit 120 over the counting circuit control information transmission line 149. Based on the determination information sent from the determination circuit 123 over the determination information transmission line D 147, it counts the determinations indicating non-adoption as a representative speech segment waveform that occur between two adjacent determinations indicating adoption as a new representative speech segment waveform. It outputs this count to the analysis section representative speech segment waveform repetition information output terminal 128 as repetition information indicating how many pitch intervals of the speech waveform the preceding representative speech segment waveform represents.
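The behaviour of the counting circuit can be sketched as a run-length count over the sequence of adopt / do-not-adopt determinations. The sketch below is an illustration only: the boolean encoding of the determinations, the handling of frames after the last representative (they are counted toward it), and the function name `repetition_counts` are assumptions.

```python
def repetition_counts(adopted_flags):
    """Count the non-adopted frames that follow each representative.

    adopted_flags: one bool per pitch frame, True when the frame was
    adopted as a new representative segment.
    Returns one count per representative, in order of adoption.
    """
    counts = []
    for flag in adopted_flags:
        if flag:
            counts.append(0)      # a new representative starts a new run
        elif counts:
            counts[-1] += 1       # frame is represented by the latest representative
        # frames before the first representative are ignored (assumption)
    return counts
```

A count of zero corresponds to the case in which two representatives are adjacent and no interpolated waveform is generated between them.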
The analysis section control circuit 120 controls the pitch information comparison circuit 110, the spectral envelope information comparison circuit 112, the determination circuit 123, and the representative speech segment waveform extraction circuit 121.
In the synthesis section 102, the representative speech segment waveform data is input through the synthesis section representative speech segment waveform data input terminal 129 and stored in the representative speech segment waveform data storage circuit 132. The representative speech segment waveform pitch information is input through the synthesis section representative speech segment waveform pitch information input terminal 130 and stored in the representative speech segment waveform pitch information storage circuit 133. The representative speech segment waveform repetition information is input through the synthesis section representative speech segment waveform repetition information input terminal 131 and stored in the representative speech segment waveform repetition information storage circuit 134.
The weight and pitch period calculation circuit 138 operates in accordance with control information sent from the synthesis section control circuit 140 over the weight and pitch period calculation circuit control information transmission line 139. It obtains the pitch information of the preceding and following representative speech segment waveforms from the representative speech segment waveform pitch information storage circuit 133 over the representative speech segment waveform pitch information transmission line 136, and the repetition information of the preceding representative speech segment waveform from the representative speech segment waveform repetition information storage circuit 134 over the representative speech segment waveform repetition information transmission line 137. For each repetition interval it calculates the weight values so as to satisfy equation (2) and the pitch period so as to satisfy equation (3) or (4), and sends them to the amplitude and pitch period adjustment circuit 152 over the weight value transmission line 150 and the pitch period transmission line 151, respectively.
The amplitude and pitch period adjustment circuit 152 operates in accordance with control information sent from the synthesis section control circuit 140 over the amplitude and pitch period adjustment circuit control information transmission line 153. It obtains the preceding and following representative speech segment waveform data from the representative speech segment waveform data storage circuit 132 over the representative speech segment waveform data transmission line 135, and multiplies each by the weight sent from the weight and pitch period calculation circuit 138. It then adjusts each waveform to the pitch period sent from the weight and pitch period calculation circuit 138: when the pitch period of a representative speech segment waveform is shorter, the final amplitude value is held, and when it is longer, the excess waveform is discarded. The weighted speech segment waveform data are sent to the adder circuit 155 over the speech segment waveform data transmission line 154.
The adder circuit 155, in accordance with control information sent from the synthesis section control circuit 140 over the adder circuit control information transmission line 156, adds the amplitude values at corresponding times of the weighted preceding and following speech segment waveform data sent from the amplitude and pitch period adjustment circuit 152 to generate the interpolated waveform, and sends it to the editing and synthesis circuit 142 over the interpolated waveform transmission line 141.
The editing and synthesis circuit 142, in accordance with control information sent from the synthesis section control circuit 140 over the editing and synthesis circuit control information transmission line 143, edits and synthesizes the representative speech segment waveform data sent from the representative speech segment waveform data storage circuit 132 over the representative speech segment waveform data transmission line 135 together with the interpolated waveform sent from the representative speech segment waveform interpolation circuit 138, and outputs the result to the synthesized speech output terminal 144.
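Taken together, the synthesis side amounts to emitting each representative segment followed by the interpolated pitch periods that lead to the next one. The Python sketch below illustrates that ordering only. It assumes non-empty lists of samples, a linear weight and length schedule (the text requires only monotonic weights and interpolated pitch periods), and lengths rounded with Python's built-in `round`; the names `fit` and `synthesize` are hypothetical.

```python
def fit(segment, length):
    """Truncate a long segment, or hold the last sample of a short one."""
    if len(segment) >= length:
        return list(segment[:length])
    return list(segment) + [segment[-1]] * (length - len(segment))

def synthesize(reps, counts):
    """Concatenate representatives with interpolated pitch periods between them.

    reps:   list of representative segments (non-empty lists of samples).
    counts: counts[k] is the number of interpolated periods between
            reps[k] and reps[k + 1]; zero means none are generated.
    """
    out = []
    for k, f in enumerate(reps):
        out.extend(f)                      # the representative itself
        if k + 1 == len(reps):
            break                          # nothing follows the last one
        g, n = reps[k + 1], counts[k]
        for i in range(1, n + 1):
            beta = i / (n + 1)             # assumed linear weight schedule
            length = round(len(f) + (len(g) - len(f)) * beta)
            f_fit, g_fit = fit(f, length), fit(g, length)
            out.extend((1 - beta) * a + beta * b for a, b in zip(f_fit, g_fit))
    return out
```

With a two-sample segment of zeros, a four-sample segment of fours, and one interpolated period, the output is the first segment, one three-sample period of twos, and then the second segment.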
The synthesis section control circuit 140 controls the weight and pitch period calculation circuit 138, the amplitude and pitch period adjustment circuit 152, the adder circuit 155, and the editing and synthesis circuit 142.
In the embodiment shown in the figure, the representative speech segment waveform data, the representative speech segment waveform pitch information, and the representative speech segment waveform repetition information obtained in the analysis section are stored in storage circuits. It is also possible to implement a speech communication device by synthesizing in real time without storing these data in storage circuits.
The figure is a block diagram showing one embodiment of the present invention.
In the figure, 101 is the analysis section, 102 is the synthesis section, 103 is the natural speech input terminal, 104 is the input speech waveform temporary storage circuit, 105 is the pitch extraction circuit, 106 is the spectral envelope information extraction circuit, 110 is the pitch information comparison circuit, 111 is the pitch information storage circuit, 112 is the spectral envelope information comparison circuit, 113 is the spectral envelope information storage circuit, 120 is the analysis section control circuit, 121 is the representative speech segment waveform extraction circuit, 123 is the determination circuit, 126 is the analysis section representative speech segment waveform data output terminal, 127 is the analysis section representative speech segment waveform pitch information output terminal, 128 is the analysis section representative speech segment waveform repetition information output terminal, 129 is the synthesis section representative speech segment waveform data input terminal, 130 is the synthesis section representative speech segment waveform pitch information input terminal, 131 is the synthesis section representative speech segment waveform repetition information input terminal, 132 is the representative speech segment waveform data storage circuit, 133 is the representative speech segment waveform pitch information storage circuit, 134 is the representative speech segment waveform repetition information storage circuit, 138 is the weight and pitch period calculation circuit, 140 is the synthesis section control circuit, 142 is the editing and synthesis circuit, 144 is the synthesized speech output terminal, 148 is the counting circuit, 152 is the amplitude and pitch period adjustment circuit, and 155 is the adder circuit.
Claims (1)
1. A speech analysis and synthesis device of the type that edits and synthesizes speech segment waveforms of about one pitch period extracted from a natural speech signal, characterized by comprising: means for comparing the spectral envelope parameter values and pitch frequency corresponding to each extracted speech segment waveform with the respective values corresponding to the preceding representative speech segment waveform, and thereby selecting a new spectral envelope parameter, a pitch frequency, and a representative speech segment waveform from among the speech segment waveforms; and means for generating the speech waveform between two representative speech segment waveforms by weighting and adding together the preceding representative speech segment waveform and the following representative speech segment waveform.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP56069029A JPS57185097A (en) | 1981-05-08 | 1981-05-08 | Voice analyzer/synthesizer |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP56069029A JPS57185097A (en) | 1981-05-08 | 1981-05-08 | Voice analyzer/synthesizer |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS57185097A (en) | 1982-11-15 |
| JPH0315759B2 (en) | 1991-03-01 |
Family
ID=13390741
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP56069029A Granted JPS57185097A (en) | 1981-05-08 | 1981-05-08 | Voice analyzer/synthesizer |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPS57185097A (en) |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS5936273B2 (en) * | 1977-04-01 | 1984-09-03 | 日本電気株式会社 | Fragment editing type speech synthesizer |
| JPS6422636A (en) * | 1987-07-16 | 1989-01-25 | Nissan Motor | Constant speed driving device for automobile |
-
1981
- 1981-05-08 JP JP56069029A patent/JPS57185097A/en active Granted
Also Published As
| Publication number | Publication date |
|---|---|
| JPS57185097A (en) | 1982-11-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CA1046642A (en) | | Phase vocoder speech synthesis system |
| US5630013A (en) | | Method of and apparatus for performing time-scale modification of speech signals |
| EP0714089B1 (en) | | Code-excited linear predictive coder and decoder, and method thereof |
| US4058676A (en) | | Speech analysis and synthesis system |
| US4301329A (en) | | Speech analysis and synthesis apparatus |
| CA1065490A (en) | | Emphasis controlled speech synthesizer |
| EP0427953A2 (en) | | Apparatus and method for speech rate modification |
| US4821324A (en) | | Low bit-rate pattern encoding and decoding capable of reducing an information transmission rate |
| JPH06266390A (en) | | Waveform editing type speech synthesizer |
| US3909533A (en) | | Method and apparatus for the analysis and synthesis of speech signals |
| US4945565A (en) | | Low bit-rate pattern encoding and decoding with a reduced number of excitation pulses |
| JPH01155400A (en) | | Voice encoding system |
| JPH04358200A (en) | | Speech synthesizer |
| US4601052A (en) | | Voice analysis composing method |
| US5826231A (en) | | Method and device for vocal synthesis at variable speed |
| JPH0315759B2 (en) | | |
| JPS642960B2 (en) | | |
| US12170092B2 (en) | | Signal processing device, method, and program |
| US4520502A (en) | | Speech synthesizer |
| JPH04279B2 (en) | | |
| JPS6232800B2 (en) | | |
| JPS5816297A (en) | | Voice synthesizing system |
| JP2535807B2 (en) | | Speech synthesizer |
| DE3943795C2 (en) | | interpolation |
| JP3112462B2 (en) | | Audio coding device |