JPS5948399B2 - Speech synthesis method

Info

Publication number: JPS5948399B2
Application number: JP53155350A
Authority: JP
Inventors: 正宏浜田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1978-12-15
Filing date: 1978-12-15
Publication date: 1984-11-26
Also published as: JPS5581400A

Description

[Detailed Description of the Invention] The present invention relates to a speech synthesis method based on segment editing. Its object is to reduce the amount of data required for the segments without impairing the phonemic quality of the synthesized speech.

In conventional segment editing, each required speech segment is quantized with multiple bits (usually 6 to 12 bits) and then recorded and reproduced according to a fixed synthesis procedure.

Fricative consonants in natural speech are generally regarded as noise passed through a filter with particular characteristics, so their waveforms have no periodicity. If such a consonant is synthesized by repeating a segment whose length is comparable to the pitch period of a vowel, the repetition of identically shaped waveforms produces an oscillation-like tone that degrades the quality of the synthesized speech. This tone could be suppressed by using a sufficiently long segment (about 30 msec or longer), but doing so requires a large amount of digital data, and the resulting increase in cost and equipment size has been a drawback. The present invention eliminates this problem in the synthesis of fricative consonants.

First, examples of speech waveforms synthesized by the method of the present invention are shown. Figure 1A is the natural speech waveform of a male voice pronouncing "hachi" (eight).

The waveform portions labeled 1 and 2 in the figure correspond to [h] and [tʃ], respectively. Figure 1B is the waveform obtained by synthesizing this natural waveform with the segment synthesis method. The portions labeled 3 and 4 were each formed by extracting part of the consonant waveform portions 1 and 2 in Figure 1A and concatenating two repetitions of it. Figure 1C, in turn, shows the result of converting the consonant waveforms 3 and 4 of Figure 1B into zero-crossing waveforms and applying them to the corresponding portions as waveforms 5 and 6. Here, a zero-crossing wave is a waveform whose absolute amplitude is always constant and which swings above and below zero; in other words, only the phase information of the original waveform remains and the amplitude information is discarded. Figure 1D is an enlarged view of part of the natural waveform of the consonant [tʃ] together with its zero-crossing waveform.

Next, an embodiment of the present invention is described with reference to Figure 2.
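The conversion to a zero-crossing wave defined above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `to_zero_crossing`, the sample values, and the choice to treat a sample of exactly zero as positive are all assumptions made for illustration.

```python
def to_zero_crossing(samples, level=1.0):
    """Keep only the sign of each sample.

    Every output sample has the same absolute amplitude (`level`), so only
    the positions of the zero crossings survive.
    Assumption: a sample equal to zero is treated as positive.
    """
    return [level if s >= 0 else -level for s in samples]


# Hypothetical noise-like fragment; the values are illustrative only.
fragment = [0.30, -0.05, -0.80, 0.12, 0.02, -0.40]
zero_crossing = to_zero_crossing(fragment)
print(zero_crossing)  # [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
```

Each output sample carries one bit of information, which is what allows the 1-bit storage used in the embodiment.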

In Figure 2, 1 is a conversion table for externally supplied bit patterns; 2 is a table for reading out segment-group start addresses used to compose words; 3 is the segment-group table; 4 is the bit data that switches between 1-bit output for zero-crossing waves and ordinary multi-bit output; 5 is the segment file for zero-crossing waves; 6 is the ordinary multi-bit data segment file; 7 is a D-A converter; 8 is the output level control data; and 9 is the output level control attenuator. Figure 2 is a block diagram of an arrangement that selects and outputs either a zero-crossing wave or an ordinary segment waveform as appropriate.

In Figure 2, 5 is a memory holding the shapes of the zero-crossing waves, in which, for example, 1 is written for a positive signal and 0 for a negative signal. As for the write order, when a memory with 8 bits per word is used, for example, each word can be filled starting at the LSB and ending at the MSB. Because the segment length of a fricative consonant can generally be chosen independently of the voice pitch, readout is convenient if the data of each segment is arranged to begin at the LSB of one memory word and end at the MSB of another. 6 is a memory holding the shapes of segments other than fricative consonants; it represents the segment shape using all the bits of one word per sample point.
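The write order described for memory 5 can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation: the function name `pack_signs_lsb_first` and the sample values are hypothetical, and rejecting segments whose length is not a whole number of words is an assumption that mirrors the suggestion to start each segment at an LSB and end it at an MSB.

```python
def pack_signs_lsb_first(samples, bits_per_word=8):
    """Pack one sign bit per sample into words, filling each word LSB first.

    A positive sample is stored as 1 and a negative sample as 0.
    Assumption: zero counts as positive, and the segment length must be a
    multiple of the word size so that it ends on an MSB.
    """
    if len(samples) % bits_per_word != 0:
        raise ValueError("segment length must be a whole number of words")
    words = []
    for start in range(0, len(samples), bits_per_word):
        word = 0
        for bit_index, sample in enumerate(samples[start:start + bits_per_word]):
            if sample >= 0:
                word |= 1 << bit_index  # bit 0 is the LSB
        words.append(word)
    return words


# Hypothetical 16-sample fricative segment (two 8-bit words).
segment = [1, -1, -1, 1, 1, -1, 1, 1,
           -1, 1, -1, -1, -1, -1, -1, -1]
print(pack_signs_lsb_first(segment))  # [217, 2]
```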

Since memories 5 and 6 both hold segment shapes, they are referred to below as segment files. 3 is a memory that stores the start address, stop address, and repetition count used to read out a segment file, together with data 4 and 8 described below. It defines a waveform consisting of several repetitions of the same segment (called a segment group) and is called the segment-group table.

4 is a data item in the segment-group table that indicates with 1 bit whether the specified segment is a fricative consonant.
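One entry of the segment-group table might be modeled as shown below. This is a hedged sketch: the patent lists the stored items (start address, stop address, repetition count, data 4, and data 8) but gives no field names, bit widths, or address conventions, so the class name, the field names, and the inclusive stop address are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentGroupEntry:
    """One segment group: the same segment repeated several times."""
    start_address: int   # first word of the segment in the segment file
    stop_address: int    # last word of the segment (assumed inclusive)
    repeat_count: int    # number of times the segment is repeated
    is_fricative: bool   # data 4: selects 1-bit or multi-bit output
    level: int           # data 8: playback level sent to the attenuator

    def word_addresses(self):
        """Return the segment-file addresses in playback order."""
        one_pass = list(range(self.start_address, self.stop_address + 1))
        return one_pass * self.repeat_count


# Hypothetical entry: a 3-word fricative segment played twice.
entry = SegmentGroupEntry(start_address=16, stop_address=18,
                          repeat_count=2, is_fricative=True, level=3)
print(entry.word_addresses())  # [16, 17, 18, 16, 17, 18]
```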

Also in the figure, 7 is a D-A converter and 9 is a digitally controllable attenuator. Data 8 in the segment-group table indicates the playback level of the segment and controls attenuator 9. Synthesizing a word requires concatenating and outputting several of the segment groups described above. Memory 2 holds the start addresses of various segment-group tables written consecutively, so reading memory 2 in order outputs, one after another, the segment groups needed to synthesize the target word. Memory 1 converts an externally supplied bit pattern into a start address in memory 2. The program required for speech synthesis is shown in Figure 3.
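The two-stage lookup through memories 1 and 2 can be sketched in Python as follows. The table contents, the function name `group_addresses_for`, and in particular the `END_OF_WORD` terminator are hypothetical; the patent does not state how the end of a word's address list in memory 2 is marked.

```python
END_OF_WORD = -1  # assumption: a sentinel marks the end of each word's list

# Memory 1: externally supplied bit pattern -> start address in memory 2.
memory1 = {0b0001: 0, 0b0010: 3}

# Memory 2: segment-group table start addresses, stored consecutively.
memory2 = [100, 104, END_OF_WORD, 108, 100, 112, END_OF_WORD]


def group_addresses_for(bit_pattern):
    """Return the segment-group table addresses that make up one word."""
    address = memory1[bit_pattern]
    groups = []
    while memory2[address] != END_OF_WORD:
        groups.append(memory2[address])
        address += 1  # memory 2 is read in order
    return groups


print(group_addresses_for(0b0001))  # [100, 104]
print(group_addresses_for(0b0010))  # [108, 100, 112]
```

In this sketch the address 100 appears in both lists, which illustrates how a word can be described by a short list of addresses while the same segment group is shared between words.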

In step 10, data 4 in the segment-group table described above is consulted to determine which branch the program takes.

When the output segment is a fricative consonant, one word is read from the segment file and output one bit at a time starting from the LSB; once all the bits in the word have been output, the next word is addressed. When the output segment is not a fricative consonant, the whole word is output at once. Thus, by operating the functional blocks of Figure 2 with the program of Figure 3, a zero-crossing wave is output for fricative consonants and ordinary segment data is output otherwise, yielding a synthesized speech signal by the segment editing method.
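The branch described for the program of Figure 3 can be sketched as follows. This is an illustrative Python sketch rather than the patent's program: the function name `output_samples` is hypothetical, and mapping a stored 1 to +1 and a stored 0 to -1 is an assumption, since the actual output level is set by the D-A converter and attenuator.

```python
def output_samples(words, is_fricative, bits_per_word=8):
    """Turn segment-file words into output samples.

    Fricative segments: each word holds one sign bit per sample, emitted
    LSB first (assumed mapping: 1 -> +1, 0 -> -1).
    Other segments: each word is one multi-bit sample, emitted unchanged.
    """
    out = []
    for word in words:
        if is_fricative:
            for bit_index in range(bits_per_word):
                bit = (word >> bit_index) & 1
                out.append(1 if bit else -1)
            # All bits of this word are done; the loop addresses the next word.
        else:
            out.append(word)
    return out


print(output_samples([217], is_fricative=True))
# [1, -1, -1, 1, 1, -1, 1, 1]
print(output_samples([12, 200], is_fricative=False))
# [12, 200]
```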

The speech synthesis method of the present invention is configured as described above. Because zero-crossing waves are applied to the fricative-consonant segments, which require longer segment lengths than vowel segments and the like, the amount of data can be reduced. At the same time, because ordinary multi-bit word data is used for everything other than the fricative consonants, synthesized speech with little loss of phonemic quality can be obtained.
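The size of the data reduction can be estimated with a short calculation. The 30 msec segment length and the 6 to 12 bit range come from the description above; the 8 kHz sampling rate is an assumption made only for this example, since the patent does not state one.

```python
def segment_bits(duration_s, sample_rate_hz, bits_per_sample):
    """Storage in bits for one segment at a given resolution."""
    return round(duration_s * sample_rate_hz) * bits_per_sample


SAMPLE_RATE_HZ = 8000   # assumption: not given in the patent
DURATION_S = 0.030      # about 30 msec, as in the description

one_bit = segment_bits(DURATION_S, SAMPLE_RATE_HZ, 1)
for bits in (6, 8, 12):
    multi_bit = segment_bits(DURATION_S, SAMPLE_RATE_HZ, bits)
    print(bits, multi_bit, one_bit, multi_bit // one_bit)
# 6 1440 240 6
# 8 1920 240 8
# 12 2880 240 12
```

Under these assumptions the storage for a fricative segment shrinks by a factor equal to the original number of bits per sample, regardless of the sampling rate chosen.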

[Brief explanation of drawings]

Figure 1A is a natural speech waveform diagram; Figure 1B is a synthesized speech waveform diagram; Figure 1C is a synthesized speech waveform diagram in which zero-crossing waves are used for the fricative consonant portions; Figure 1D shows the natural speech waveform of the fricative consonant portion [tʃ] and its zero-crossing waveform; Figure 2 is a block diagram of an apparatus implementing the speech synthesis method of the present invention; and Figure 3 is a program flowchart for the same apparatus.

1 ... conversion table, 2 ... table for reading segment-group start addresses, 3 ... segment-group table, 4 ... switching bit data, 5 ... segment file for zero-crossing waves, 6 ... multi-bit data segment file, 7 ... D-A converter, 8 ... output level control data, 9 ... attenuator.

Claims

[Claims]

1. A speech synthesis method characterized in that, for segments of fricative consonants or similar noise-like waveforms, a zero-crossing wave of the corresponding fricative consonant or similar noise-like waveform is used, while for segments other than said fricative consonants or similar noise-like waveforms, digital data segments quantized with multiple bits are used.