JPS5936273B2

JPS5936273B2 - Fragment editing type speech synthesizer

Info

Publication number: JPS5936273B2
Application number: JP52037738A
Authority: JP
Inventors: 勝信伏木田
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1977-04-01
Filing date: 1977-04-01
Publication date: 1984-09-03
Also published as: JPS53123007A

Abstract

PURPOSE: To make it possible to edit and synthesize segment waveforms with a small number of multiplier circuits by controlling the amplitude of the extension waveform, that is, the part of the segment waveform from the preceding pitch period that overlaps the following pitch period, with the mean-amplitude data of the following pitch period.

Description

[Detailed description of the invention] The present invention relates to a segment-editing speech synthesizer.

A segment-editing speech synthesis method is known in which various speech segment waveforms, each with a duration of about one to several pitch intervals, are prepared in advance, for example by extracting them from natural speech waveforms. A synthesized speech waveform is then generated by editing and synthesizing these segment waveforms according to synthesis data such as segment numbers, pitch period data, and amplitude data giving the average amplitude of each segment waveform.

As a way of editing and synthesizing the segment waveforms, it is also known to scale the amplitude of each segment waveform according to the amplitude data given as synthesis data, then shift the segment waveform by the pitch period and add it to the following segment waveform to generate the synthesized waveform.

In this method, as shown in Fig. 1, let the sample values of the segment waveform specified in a given pitch interval be $f_1(nT)$ (101 in Fig. 1), and let the sample values of the segment waveforms selected in earlier pitch intervals be, starting with the one nearest to that interval, $f_2(nT)$ (102 in Fig. 1), $f_3(nT)$ (103 in Fig. 1), ... (where $n = 1, 2, \ldots, N$; $N$ is the number of samples in a segment waveform and $T$ is the sampling period). Let the pitch periods of the respective pitch intervals be $P_1 T, P_2 T, P_3 T, \ldots$ (each $P_i$ a positive integer) and the average amplitude values be $A_1, A_2, A_3, \ldots$. The synthesized waveform $y(iT)$ (104 in Fig. 1) is then given by equation (1):

$$y(iT) = A_1 f_1(iT) + A_2 f_2((i + P_2)T) + \cdots \qquad (1)$$

Compared with a method that truncates each segment waveform at every pitch period without adding the segment waveforms used in preceding pitch intervals, this method gives better waveform continuity and better sound quality. However, as equation (1) makes clear, it requires many multiplications, so generating the synthesized waveform is computationally expensive. This is a particular disadvantage when a single synthesis circuit is time-shared to synthesize multiple channels.
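
The conventional scheme of equation (1) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the circuit of any actual device: the function name `synthesize_conventional`, the list-based signal representation, and the 0-based sample indices are choices made for the example and do not appear in the patent.

```python
def synthesize_conventional(segments, pitch_periods, amplitudes):
    """Overlap-add in the style of equation (1): scale each segment, then add.

    segments[k]      -- samples of the segment chosen for pitch interval k
    pitch_periods[k] -- length of pitch interval k in samples (positive int)
    amplitudes[k]    -- average amplitude given for pitch interval k
    Intervals are listed in time order; indices are 0-based (an assumption).
    Returns (waveform, number_of_multiplications).
    """
    total = sum(pitch_periods)
    out = [0.0] * total
    mults = 0
    start = 0
    for seg, period, amp in zip(segments, pitch_periods, amplitudes):
        for n, sample in enumerate(seg):
            if start + n >= total:
                break  # drop the tail that runs past the last interval
            out[start + n] += amp * sample  # one multiplication per segment sample
            mults += 1
        start += period
    return out, mults


# Hypothetical test data: three identical 4-sample segments, pitch period 2.
segs = [[1.0, 0.5, 0.25, 0.125]] * 3
wave, mults = synthesize_conventional(segs, [2, 2, 2], [1.0, 2.0, 2.0])
print(wave, mults)
```

With this data, every output sample covered by two overlapping segments costs two multiplications, which is the cost that equation (1) implies.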

An object of the present invention is to provide a segment-editing speech synthesizer that edits and synthesizes speech segment waveforms with relatively little computation and almost no loss of sound quality, and that is therefore also suited to real-time, multi-channel editing and synthesis of speech segment waveforms.

The segment-editing speech synthesizer of the present invention comprises an adder circuit and a multiplier circuit. The adder circuit computes the sum of (a) the portion of the segment waveform selected in a temporally preceding pitch interval that overlaps the temporally following pitch interval and (b) the segment waveform selected in that following pitch interval. The multiplier circuit multiplies the sum output by the adder circuit by the average amplitude value in the vicinity of the following pitch interval, which is given for that interval, and outputs the product as the synthesized speech waveform.

The feature of the present invention is that the amplitude of the extension waveform, that is, the part of a segment waveform used in a temporally preceding pitch interval that overlaps the following pitch interval, is controlled by the average amplitude data given for the following pitch interval.
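
As a worked illustration in the notation of equation (1), assume for simplicity that only the immediately preceding segment $f_2$ overlaps the current interval (this restriction is an assumption of the example). The conventional method scales the extension of $f_2$ by its own amplitude $A_2$, whereas controlling it by the amplitude data of the following interval scales it by $A_1$. The two outputs then differ only in the extension term:

```latex
\begin{aligned}
\text{conventional:}\quad
  & A_1 f_1(iT) + A_2 f_2((i+P_2)T) \\
\text{extension scaled by } A_1\text{:}\quad
  & A_1 f_1(iT) + A_1 f_2((i+P_2)T)
    = A_1 \{ f_1(iT) + f_2((i+P_2)T) \} \\
\text{difference:}\quad
  & (A_1 - A_2)\, f_2((i+P_2)T)
\end{aligned}
```

The difference vanishes when $A_1 = A_2$ and stays small whenever adjacent amplitude values are close.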

Consequently, with the present invention the discontinuity at the boundary between adjacent pitch intervals is relatively small, and segment waveforms can be edited and synthesized with relatively few multiplications.

The invention exploits the fact that a small waveform discontinuity at a pitch-interval boundary has almost no effect on the quality of the synthesized sound. As expressed by equation (2), the segment waveforms $f_2(nT), f_3(nT), \ldots$ selected in temporally preceding pitch intervals are first added to the segment waveform $f_1(nT)$ selected in the current pitch interval, and the sum is then multiplied by the average amplitude data $A_1$ given for the current pitch interval to obtain the synthesized waveform $y'(iT)$ for that interval:

$$y'(iT) = A_1 \{ f_1(iT) + f_2((i + P_2)T) + \cdots \} \qquad (2)$$

The waveform $y'(iT)$ given by equation (2) has a discontinuity at the boundary between adjacent pitch intervals that does not occur in the waveform $y(iT)$ given by equation (1). However, because the average amplitude data of adjacent pitch intervals differ only slightly in most cases, this discontinuity is small and has almost no effect on the quality of the synthesized sound.

The editing and synthesis method of equation (2) requires one multiplication per time slot, clearly fewer than the conventional method of equation (1). The method of the invention therefore needs relatively little time to compute the synthesized waveform for one time slot and is suited to so-called multi-channel voice response systems, which generate different synthesized waveforms for several channels simultaneously.
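
The "add first, multiply once" idea described for equation (2) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented circuit: the function name `synthesize_proposed`, the list-based signals, and the 0-based indices are hypothetical choices for the example.

```python
def synthesize_proposed(segments, pitch_periods, amplitudes):
    """Sum the overlapping segments first, then scale once per output sample.

    Arguments have the same meaning as in a conventional overlap-add:
    one segment, pitch period (in samples) and amplitude per pitch interval,
    listed in time order. Returns (waveform, number_of_multiplications).
    """
    # Start position of every pitch interval on the output time axis.
    starts = []
    pos = 0
    for period in pitch_periods:
        starts.append(pos)
        pos += period

    out = []
    mults = 0
    for k, (start, period) in enumerate(zip(starts, pitch_periods)):
        for i in range(period):
            t = start + i
            acc = 0.0
            # Add the current segment and the extensions of all earlier ones.
            for j in range(k + 1):
                n = t - starts[j]
                if n < len(segments[j]):
                    acc += segments[j][n]
            out.append(amplitudes[k] * acc)  # the only multiplication in this time slot
            mults += 1
    return out, mults


# Hypothetical test data: three identical 4-sample segments, pitch period 2.
segs = [[1.0, 0.5, 0.25, 0.125]] * 3
wave2, mults2 = synthesize_proposed(segs, [2, 2, 2], [1.0, 2.0, 2.0])
print(wave2, mults2)
```

In this example the result differs from per-segment scaling only in the second interval, where the amplitude changes from 1.0 to 2.0; in the third interval, where adjacent amplitudes are equal, the two approaches coincide. The multiplication count equals the number of output samples.
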
Next, an embodiment of the present invention shown in FIG. 2 will be described in detail.

First, speech output command data is input to the control circuit 202 via the speech output command input terminal 201.

In accordance with the speech output command data, the control circuit 202 outputs synthesis data storage circuit control data to the synthesis data storage circuit 206 via the synthesis data storage circuit control data transmission line 203, and outputs segment waveform output circuit control data to the segment waveform output circuit 208 via the segment waveform output circuit control data transmission line 204.

In accordance with its control data, the synthesis data storage circuit 206 outputs the prestored segment number data (which specifies a segment waveform), pitch data, and amplitude data to the segment waveform address register 209, the pitch period data temporary storage circuit 211, and the amplitude data temporary storage circuit 212, respectively. Before this segment number data is written to segment address register 209, the old segment number data that had been held in register 209 is transferred to segment address register 210. The segment number data held in registers 209 and 210 is output to the segment waveform output circuit 208.

In accordance with the segment waveform output circuit control data, the segment waveform output circuit 208 accesses, among the segment waveforms prestored in the segment waveform storage circuit 207, the two segment waveforms corresponding to the segment number data output from registers 209 and 210, and stores each of them via switch 219 in segment waveform data temporary storage circuit 213 or 214.

In accordance with the pitch period data output from the pitch period data temporary storage circuit 211, the adder circuit 215 adds the two segment waveform data output from temporary storage circuits 213 and 214, offset from each other by the pitch period that the pitch period data specifies, and outputs the result to the multiplier circuit 216. The multiplier circuit 216 multiplies the amplitude data output from the amplitude data temporary storage circuit 212 by the waveform data output from the adder circuit 215 and outputs the digital product to the DA converter circuit 217. The DA converter circuit 217 converts this digital value to analog and outputs the synthesized waveform via the synthesized waveform output terminal 218.

By repeating the above operations every pitch period, the synthesized waveform is obtained at the synthesized waveform output terminal 218.
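
The per-pitch-period operation of the embodiment can be mimicked in software roughly as follows. This is a loose sketch under assumptions, not a model of the actual hardware: the function name `run_synthesizer`, the dictionary standing in for storage circuit 207, the frame tuples standing in for the synthesis data from circuit 206, and the choice of which temporary store (213 or 214) holds which waveform are all hypothetical. The text does not state which pitch period value sets the offset at each moment; the sketch assumes it is the length of the preceding interval, in line with equation (1). DA conversion is omitted.

```python
def run_synthesizer(frames, waveform_memory):
    """Step through frames of (segment_number, pitch_period, amplitude).

    waveform_memory maps a segment number to its list of samples
    (a stand-in for the segment waveform storage circuit 207).
    Returns the digital output that would be fed to the DA converter.
    """
    reg_209 = None   # address register holding the current segment number
    reg_210 = None   # address register holding the previous segment number
    prev_offset = 0  # assumption: read offset into the previous segment
    output = []
    for segment_number, pitch_period, amplitude in frames:
        # The old number moves from 209 to 210 before the new one is latched.
        reg_210, reg_209 = reg_209, segment_number
        buf_213 = waveform_memory[reg_209]  # current segment waveform
        buf_214 = waveform_memory[reg_210] if reg_210 is not None else []
        for i in range(pitch_period):
            current = buf_213[i] if i < len(buf_213) else 0.0
            j = i + prev_offset
            extension = buf_214[j] if j < len(buf_214) else 0.0
            summed = current + extension       # role of adder circuit 215
            output.append(amplitude * summed)  # role of multiplier circuit 216
        prev_offset = pitch_period
    return output


# Hypothetical data: one stored 4-sample segment reused in three frames.
memory = {0: [1.0, 0.5, 0.25, 0.125]}
frames = [(0, 2, 1.0), (0, 2, 2.0), (0, 2, 2.0)]
digital_out = run_synthesizer(frames, memory)
print(digital_out)
```

Only two segment waveforms are combined per interval here, matching the two address registers and two temporary stores of the embodiment.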

In the above embodiment, the adder circuit 215 computes the sum on the assumption that only two segment waveforms overlap: the one specified in the immediately preceding pitch interval and the one specified in the following pitch interval. Clearly, a device that also uses segment waveforms specified in still earlier pitch intervals can be realized as well.

[Brief explanation of drawings]

Fig. 1 is a diagram for explaining the method of editing and synthesizing segment waveforms: 101 denotes the segment waveform $f_1(nT)$ selected in the current pitch interval, 102 the segment waveform $f_2(nT)$ selected one pitch interval earlier, 103 the segment waveform $f_3(nT)$ selected two pitch intervals earlier, and 104 the synthesized waveform $y(iT)$.

Fig. 2 is a block diagram showing an embodiment of the present invention: 201 is the speech output command input terminal, 202 the control circuit, 203 the synthesis data storage circuit control data transmission line, 204 the segment waveform output circuit control data transmission line, 206 the synthesis data storage circuit, 207 the segment waveform storage circuit, 208 the segment waveform output circuit, 209 and 210 the segment waveform address registers, 211 the pitch period data temporary storage circuit, 212 the amplitude data temporary storage circuit, 213 and 214 the segment waveform data temporary storage circuits, 215 the adder circuit, 216 the multiplier circuit, 217 the DA converter circuit, 218 the synthesized waveform output terminal, and 219 the switch.

Claims

[Claims]

1. A segment-editing speech synthesizer that edits and synthesizes speech segment waveforms prepared in advance, each having a time length of about several pitch intervals, comprising: an adder circuit that outputs a sum obtained by adding the portion of a segment waveform selected in a temporally preceding pitch interval that overlaps the following pitch interval to the segment waveform selected in the following pitch interval; and a multiplier circuit that outputs, as a synthesized speech waveform, the product of the sum output by the adder circuit and the amplitude value given for the following pitch interval.