JP2848604B2

JP2848604B2 - Speech synthesizer

Info

Publication number: JP2848604B2
Application number: JP63025941A
Authority: JP
Inventors: 治木村; 延佳海木; 淳悟鬼頭
Original assignee: Consejo Superior de Investigaciones Cientificas CSIC
Current assignee: Consejo Superior de Investigaciones Cientificas CSIC
Priority date: 1988-02-03
Filing date: 1988-02-03
Publication date: 1999-01-20
Anticipated expiration: 2014-01-20
Also published as: JPH01200290A

Description

DETAILED DESCRIPTION OF THE INVENTION

<Industrial Application Field> The present invention relates to a speech synthesizer that generates rule-synthesized speech.

<Prior Art> To generate natural-sounding synthesized speech, it is important to devoice certain vowels. General rules for vowel devoicing are given in Shigeharu Sakurai, "Points to Note in Pronouncing the Common Language," Japanese Pronunciation and Accent Dictionary (revised new edition), NHK (ed.), Commentary and Appendix, p. 128, 1985. That reference describes in detail the typical phonological environments in which vowel devoicing occurs.

Devoicing rules actually used in speech synthesizers include, for example, those in Hirokazu Sato and Kazuo Hakoda, "Speech Synthesis by Rule," Kenkyu Jitsuyoka Hokoku (Research and Practical Application Reports), Vol. 27, No. 12, p. 2562 (62), Nippon Telegraph and Telephone Public Corporation (ed.), 1978.

FIG. 3 is a flowchart of a devoicing decision routine that uses the conventional vowel devoicing rules described above. The conventional routine is described below with reference to FIG. 3.

In step S31, it is determined whether the target vowel is a high vowel (/i/ or /u/). If it is, the process proceeds to step S32; otherwise the vowel is judged to be voiced and the process proceeds to step S37.

In step S32, it is determined whether the target vowel lies between voiceless consonants. If it does, the process proceeds to step S33; otherwise the vowel is judged to be voiced and the process proceeds to step S37.

In step S33, it is determined whether the target vowel carries the accent nucleus (the position at which the pitch changes from relatively high to low). If it does, the process proceeds to step S39; otherwise it proceeds to step S34.

In step S34, it is determined whether the target vowel is in the first mora. If it is, the process proceeds to step S36; otherwise it proceeds to step S35.

In step S35, it is determined whether the preceding vowel has already been devoiced. If it has, the target vowel is treated as quasi-devoiced and the process proceeds to step S39; otherwise it proceeds to step S36.

In step S36, it is determined whether the target vowel lies between voiceless fricatives of the same kind. If it does, the vowel is treated as quasi-devoiced and the process proceeds to step S39; otherwise it proceeds to step S38.

In step S37, the target vowel is determined to be voiced.

In step S38, the target vowel is determined to be devoiced.

In step S39, quasi-devoicing processing, such as shortening the duration of the target vowel, is performed.
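The conventional routine of FIG. 3 can be sketched as a single decision function. This is only an illustrative reading of steps S31 to S39: the function name, the argument names, and the consonant inventory (taken from the phoneme symbols listed later in this description) are assumptions, not part of the cited references.

```python
from enum import Enum


class Decision(Enum):
    VOICED = "voiced"                # step S37
    DEVOICED = "devoiced"            # step S38
    QUASI_DEVOICED = "quasi"         # step S39 (e.g. shortened duration)


# Assumed inventory, copied from the phoneme symbols used in this description.
VOICELESS = {"p", "t", "k", "py", "ky", "s", "sh", "h", "hy", "ch", "ts"}
FRICATIVES = {"s", "sh", "h"}


def conventional_decision(vowel, prev_c, next_c, has_accent_nucleus,
                          mora_index, prev_vowel_devoiced):
    """Follow the branches of FIG. 3 for one vowel (mora_index is 1-based)."""
    if vowel not in ("i", "u"):                               # S31
        return Decision.VOICED
    if prev_c not in VOICELESS or next_c not in VOICELESS:    # S32
        return Decision.VOICED
    if has_accent_nucleus:                                    # S33
        return Decision.QUASI_DEVOICED
    if mora_index != 1 and prev_vowel_devoiced:               # S34, S35
        return Decision.QUASI_DEVOICED
    if prev_c in FRICATIVES and prev_c == next_c:             # S36
        return Decision.QUASI_DEVOICED
    return Decision.DEVOICED
```

Every branch is a yes/no test with no notion of degree, which is the limitation the following section raises.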

<Problems to be Solved by the Invention> However, the conventional devoicing rules described above give no clear devoicing criterion relating vowel devoicing to the phoneme sequence or to the accent pattern, so they are ambiguous as decision criteria. In addition, the proportion of vowels that are devoiced generally changes with the speaking rate, but the relationship between speaking rate and the devoicing criterion is not described quantitatively. As a result, the devoicing rules do not always agree with real speech, and synthesized speech generated with the conventional devoicing rules sounds unnatural.

Accordingly, an object of the present invention is to provide a speech synthesizer that can generate natural synthesized speech close to real speech by deciding whether a target vowel is voiced or devoiced on the basis of a devoicing occurrence degree for each phoneme sequence, a devoicing occurrence degree for each accent pattern, and a devoicing threshold set according to the speaking rate.

<Means for Solving the Problems> To achieve the above object, the invention according to claim 1 is a speech synthesizer comprising a character string analysis unit that determines a phoneme sequence and an accent pattern from an input character string, a speech synthesis parameter creation unit that creates speech synthesis parameters in accordance with the phoneme sequence, the accent pattern, and rules stored in a rule file, and a speech synthesis unit that performs speech synthesis based on the speech synthesis parameters. The speech synthesis parameter creation unit comprises: first devoicing occurrence degree setting means for setting a phoneme-sequence devoicing occurrence degree for each vowel, based on the phoneme sequence and the devoicing occurrence degrees for each phoneme sequence stored in the rule file; second devoicing occurrence degree setting means for setting an accent-pattern devoicing occurrence degree for each vowel, based on the accent pattern and the devoicing occurrence degrees for each accent pattern stored in the rule file; devoicing occurrence degree calculating means for calculating a devoicing occurrence degree for each vowel from the phoneme-sequence devoicing occurrence degree and the accent-pattern devoicing occurrence degree; setting means for setting a devoicing decision threshold based on a specified speaking rate and the devoicing decision thresholds for each speaking rate stored in the rule file; and vowel voicing/devoicing decision means for deciding, for each vowel, whether the vowel is devoiced or voiced, based on the calculated devoicing occurrence degree and the set devoicing decision threshold. The speech synthesis parameters are created in accordance with the decision result of the vowel voicing/devoicing decision means.

<Operation> An arbitrary character string is input to the character string analysis unit, which determines the phoneme sequence and accent pattern of the string. When the phoneme sequence and accent pattern from the character string analysis unit are input to the speech synthesis parameter creation unit, speech synthesis parameters are created in accordance with the phoneme sequence, the accent pattern, and the rules stored in the rule file.

In doing so, the first devoicing occurrence degree setting means of the speech synthesis parameter creation unit sets a phoneme-sequence devoicing occurrence degree for each vowel of the input character string, based on the phoneme sequence and the devoicing occurrence degrees for each phoneme sequence stored in the rule file. The second devoicing occurrence degree setting means sets an accent-pattern devoicing occurrence degree for each vowel of the input character string, based on the accent pattern and the devoicing occurrence degrees for each accent pattern stored in the rule file. The devoicing occurrence degree calculating means then calculates the devoicing occurrence degree for each vowel from the phoneme-sequence devoicing occurrence degree and the accent-pattern devoicing occurrence degree thus set.

Further, the setting means sets a devoicing decision threshold corresponding to the specified speaking rate, based on the devoicing decision thresholds for each speaking rate stored in the rule file. The vowel voicing/devoicing decision means then decides, for each vowel of the input character string, whether the vowel is voiced or devoiced, based on the calculated devoicing occurrence degree and the set devoicing decision threshold. The speech synthesis parameters are generated in accordance with the decision result of the vowel voicing/devoicing decision means.

Once the speech synthesis parameter creation unit has generated the speech synthesis parameters as described above, the speech synthesis unit performs speech synthesis based on them.

In this way, vowels are devoiced with consideration given to the phoneme sequence, the accent pattern, and the speaking rate, all of which strongly affect vowel devoicing, and natural synthesized speech is obtained.

<Embodiment> An outline of the configuration and operation of the speech synthesizer of the present invention is given below with reference to the block diagram of FIG. 1.

When an arbitrary character string is input to character string analysis unit 1, the unit parses the input string and determines the intonation pattern of the whole string. It also looks up the words contained in the string in word dictionary 2 and determines the accent and phoneme sequence of each word, thereby determining the phoneme sequence and accent pattern of the string. The intonation pattern of the whole string, together with the phoneme sequence and accent pattern of the string, as determined by character string analysis unit 1, is output to rule control unit 3, which serves as the speech synthesis parameter creation unit.

Feature parameter file 8 consists of target feature parameter file 6 and time-series feature parameter file 7. Target feature parameter file 6 outputs target feature parameters, which represent vowel features, to rule control unit 3, and time-series feature parameter file 7 outputs time-series feature parameters, which represent consonant features, to rule control unit 3. Rule file 4 outputs to rule control unit 3 both the phoneme control rules for connecting the target feature parameters and time-series feature parameters output from the feature parameter file and the prosody control rules for controlling each prosodic feature. The prosody control rules include the devoicing phoneme sequences, the vowel devoicing occurrence degree for each phoneme sequence, and the vowel devoicing occurrence degree for each accent pattern, all of which are described later.

Rule control unit 3 refers to the target feature parameters and time-series feature parameters input from feature parameter file 8, and to the phoneme control rules for joining phonemes and the prosody control rules for controlling prosody input from rule file 4. Using the intonation pattern of the whole string, the phoneme sequence, and the accent pattern input from character string analysis unit 1, together with the vowel voicing/devoicing decision results described later, it generates the parameters required for speech synthesis and outputs them to speech synthesizer 5.

Speech synthesizer 5 performs speech synthesis based on the input parameters and outputs rule-synthesized speech corresponding to the input character string.

FIG. 2 is a flowchart of the voicing/devoicing decision routine performed in rule control unit 3. The routine is described below with reference to FIG. 2.

In step S1, a phoneme-sequence devoicing occurrence coefficient is first obtained for each vowel from the phoneme sequence input from character string analysis unit 1 and the devoicing occurrence coefficients stored in rule file 4, which express the devoicing occurrence degree for each phoneme sequence.

However, when either the preceding or the following consonant is a geminate consonant (sokuon), the phoneme-sequence devoicing occurrence coefficient is multiplied by α (a constant with 0 < α < 1) to lower the devoicing occurrence degree.

The phoneme-sequence devoicing occurrence coefficient has, for example, the following characteristics.

1. When the vowel is a high vowel (/i/, /u/), the preceding consonant is a voiceless fricative (/s/, /sh/, /h/) or voiceless affricate (/ch/, /ts/), and the following consonant is a voiceless plosive (/p/, /t/, /k/, /py/, /ky/) or voiceless affricate (/ch/, /ts/), the degree of devoicing is higher than when both neighboring consonants are voiceless plosives (/p/, /t/, /k/; the following consonant may also be /py/ or /ky/).

2. When the vowel is a high vowel (/i/, /u/) and both neighboring consonants are voiceless plosives (/p/, /t/, /k/; the following consonant may also be /py/ or /ky/), the degree of devoicing is higher than when the neighboring consonants are different voiceless fricatives (/s/, /sh/, /h/; the following consonant may also be /hy/).

3. When the vowel is a high vowel (/i/, /u/) and the neighboring consonants are different voiceless fricatives (/s/, /sh/, /h/; the following consonant may also be /hy/), the degree of devoicing is higher than when the neighboring consonants are the same voiceless fricative (/s/, /sh/, /h/).

4. When the vowel is a high vowel (/i/, /u/) and the neighboring consonants are the same voiceless fricative (/s/, /sh/, /h/), the degree of devoicing is higher than when either neighboring consonant is not a voiceless consonant.

5. When the vowel is not a high vowel (/i/, /u/), it is not devoiced.

Table 1 shows an example of phoneme-sequence devoicing occurrence coefficients, classified by the consonants preceding and following a high vowel. Rows give the preceding consonant and columns the following consonant; a larger coefficient indicates a higher degree of devoicing. Figures in parentheses apply when the preceding and following consonants are identical.
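The ordering described in characteristics 1 to 5, together with the α scaling for geminates, can be sketched as a small lookup. The numeric values below are hypothetical placeholders chosen only to respect that ordering (they are not the Table 1 entries), the function and argument names are assumptions, and consonant pairs that the text does not rank share the lowest non-zero class here.

```python
FRICATIVES = {"s", "sh", "h"}
AFFRICATES = {"ch", "ts"}
PLOSIVES = {"p", "t", "k"}
FOLLOWING_PLOSIVES = PLOSIVES | {"py", "ky"}   # /py/, /ky/ occur only as following consonants
FOLLOWING_FRICATIVES = FRICATIVES | {"hy"}     # /hy/ occurs only as a following consonant


def phoneme_coefficient(vowel, prev_c, next_c, geminate=False, alpha=0.5):
    """Return a placeholder phoneme-sequence devoicing occurrence coefficient.

    geminate=True means the preceding or following consonant is a geminate;
    alpha (0 < alpha < 1) is an assumed value of the scaling constant.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if vowel not in ("i", "u"):
        return 0.0                                  # characteristic 5
    if prev_c in FRICATIVES | AFFRICATES and next_c in FOLLOWING_PLOSIVES | AFFRICATES:
        coeff = 9.0                                 # characteristic 1 (highest)
    elif prev_c in PLOSIVES and next_c in FOLLOWING_PLOSIVES:
        coeff = 7.0                                 # characteristic 2
    elif prev_c in FRICATIVES and next_c in FOLLOWING_FRICATIVES and prev_c != next_c:
        coeff = 5.0                                 # characteristic 3
    elif prev_c in FRICATIVES and prev_c == next_c:
        coeff = 3.0                                 # characteristic 4
    else:
        coeff = 1.0                                 # unranked or non-voiceless neighbor
    return coeff * alpha if geminate else coeff
```

For instance, `phoneme_coefficient("i", "sh", "t")` falls in the highest class, and passing `geminate=True` halves it under the assumed α of 0.5.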

Next, an accent-pattern devoicing occurrence coefficient is obtained for each vowel from the accent pattern input from character string analysis unit 1 and the devoicing occurrence coefficients for each accent pattern stored in rule file 4.

The devoicing occurrence degree for each accent pattern has the following characteristics.

a. A vowel at the accent nucleus is almost never devoiced.

b. In type-0 utterances, vowels are devoiced to a higher degree than in utterances of type 1 or higher.

c. The word-initial vowel has a high degree of devoicing, except in type-1 utterances.

d. A vowel immediately before the accent nucleus has a higher degree of devoicing than the vowel at the accent nucleus, but a lower degree than vowels after the accent nucleus.

Table 2 shows an example of the devoicing occurrence degree for each accent pattern. Rows give the accent type and columns the mora position of the target vowel; a larger coefficient indicates a higher degree of devoicing. Here, the accent type classifies a word by the position of the syllables that are pronounced at high pitch. For example, "type n" means that the second through n-th syllables are pronounced high and that the first syllable and the (n+1)-th and subsequent syllables are all pronounced low (accent list in the appendix to Shin Meikai Kokugo Jiten, 3rd edition).
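To make the accent-type notation concrete, the following sketch expands an accent type into a high/low pitch mark per mora and reports where the accent nucleus falls. The "type n" branch follows the definition quoted above; the handling of type 0 (no fall) and type 1 (high first mora, then low) follows the usual Tokyo-accent convention and is an assumption beyond that definition, as are the function names.

```python
def accent_nucleus(accent_type):
    """Return the 1-based mora carrying the accent nucleus, or None for type 0."""
    return accent_type if accent_type >= 1 else None


def pitch_pattern(accent_type, n_morae):
    """Return one 'H' or 'L' mark per mora for a word of n_morae morae."""
    if n_morae < 1 or not 0 <= accent_type <= n_morae:
        raise ValueError("accent type must lie between 0 and the mora count")
    marks = []
    for pos in range(1, n_morae + 1):
        if accent_type == 0:
            high = pos >= 2                  # assumed: rises after mora 1, never falls
        elif accent_type == 1:
            high = pos == 1                  # assumed: falls right after mora 1
        else:
            high = 2 <= pos <= accent_type   # quoted "type n" definition
        marks.append("H" if high else "L")
    return "".join(marks)
```

A table such as Table 2 would then be indexed by the accent type and the mora position of the target vowel, with the nucleus position separating the cases in characteristics a to d.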

Further, from the phoneme-sequence devoicing occurrence coefficient and the accent-pattern devoicing occurrence coefficient, the devoicing occurrence degree ρ(n) is obtained for each vowel by the following equation (n = 1, 2, ..., WM+1, where WM is the number of morae in the input phoneme sequence and WM+1 represents the silence at the end of the word).

ρ(n) = (phoneme-sequence devoicing occurrence coefficient) × (accent-pattern devoicing occurrence coefficient)

In step S2, the threshold θ that serves as the devoicing criterion is obtained as follows from the devoicing decision thresholds defined for each speaking rate and stored in rule file 4, and from the speaking rate input from character string analysis unit 1. A threshold is defined for the normal speaking rate; the threshold is lowered when the speaking rate is faster than normal and raised when it is slower.

As an example, Table 3 shows the threshold for each speaking rate when the threshold for the normal speaking rate is 6.
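The product formula for ρ(n) and the rate-dependent threshold θ can be sketched together as follows. Only the value 6 for the normal rate comes from the text; the "slow" and "fast" values, the rate labels, and the function names are hypothetical and chosen only so that a faster rate gives a lower threshold.

```python
# Assumed threshold table: only "normal" = 6 is stated in the description.
RATE_THRESHOLDS = {"slow": 7.0, "normal": 6.0, "fast": 5.0}


def devoicing_degrees(phoneme_coeffs, accent_coeffs):
    """Return rho(n) for n = 1..WM+1 as the element-wise product of both coefficients.

    Both inputs hold one value per mora plus a final entry for the
    word-final silence (index WM+1 in the description).
    """
    if len(phoneme_coeffs) != len(accent_coeffs):
        raise ValueError("coefficient lists must have the same length")
    return [p * a for p, a in zip(phoneme_coeffs, accent_coeffs)]


def devoicing_threshold(rate):
    """Return theta for a speaking-rate label ('slow', 'normal' or 'fast')."""
    return RATE_THRESHOLDS[rate]
```

With these placeholders, a vowel whose ρ(n) is 5.5 would stay voiced at the normal rate but become a devoicing candidate at the fast rate.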

In addition, the devoicing decision threshold θ2, which is used when vowels that are likely to be devoiced occur in succession, is obtained by the following equation.

θ2 = θ × β (β is a real number with β > 1)

Next, the devoicing decision is made for each vowel.

In step S3, n is set to 1 so that the decision starts from the first mora.

In step S4, the devoicing occurrence degree ρ(n) of the current mora (mora n) is compared with the devoicing decision threshold θ. If ρ(n) ≥ θ, the vowel of the current mora may be devoiced and the process proceeds to step S5; if ρ(n) < θ, the vowel is judged not to be devoiced and the process proceeds to step S11.

In step S5, the devoicing occurrence degree ρ(n+1) of the next mora (n+1) is compared with the devoicing occurrence degree ρ(n) of the current mora (n). If ρ(n) ≥ ρ(n+1), that is, the current mora has the higher degree of devoicing, the vowel of the current mora is devoiced but the vowel of the next mora may not be, and the process proceeds to step S8. If ρ(n) < ρ(n+1), that is, the current mora has the lower degree of devoicing, the vowel of the current mora may not be devoiced, and the process proceeds to step S6.

The reason is that, when vowels to be devoiced occur in succession, the one with the lower degree of devoicing is left voiced so that the pronunciation does not become unclear.

In step S6, ρ(n+1) is increased so that the vowel of the next mora is always devoiced, and the process proceeds to step S7.

In step S7, the devoicing occurrence degree ρ(n) of the current mora is compared with the threshold θ2 used when vowels that are likely to be devoiced occur in succession. If ρ(n) ≥ θ2, the vowel of the current mora has a strong tendency to devoice, and the process proceeds to step S10. If ρ(n) < θ2, the vowel of the current mora is not devoiced, and the process proceeds to step S11.

In step S8, the devoicing occurrence degree ρ(n+1) of the next mora is compared with the threshold θ2. If ρ(n+1) < θ2, the vowel of the next mora is judged not to be devoiced and the process proceeds to step S9. If ρ(n+1) ≥ θ2, the vowel of the next mora has a strong tendency to devoice, and the process proceeds directly to step S10.

In step S9, ρ(n+1) is set to 0 so that the vowel of the next mora is not devoiced, and the process proceeds to step S10.

In step S10, the vowel of mora n is determined to be devoiced, and the process proceeds to step S12.

In step S11, the vowel of mora n is determined to be voiced, and the process proceeds to step S12.

In step S12, n is incremented by 1 to move on to the decision for the vowel of the next mora.

In step S13, it is determined whether n is less than or equal to the number of morae (WM) in the input phoneme sequence. If n ≤ WM, steps S4 through S12 are repeated; if n > WM, the devoicing decision routine ends.
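Steps S3 to S13 can be sketched as one loop over the morae. This is a minimal reading of the flowchart description, not the patented implementation: the function name is hypothetical, the default β of 1.5 is an assumed value (the text only requires β > 1), and step S6, which the text describes only as making ρ(n+1) large, is modelled here by setting it to infinity.

```python
def decide_devoicing(rho, theta, beta=1.5):
    """Return one boolean per mora: True if its vowel is devoiced.

    rho holds rho(1)..rho(WM+1); the last entry stands for the
    word-final silence and receives no decision of its own.
    """
    if beta <= 1:
        raise ValueError("beta must be greater than 1")
    rho = list(rho)                  # S6 and S9 overwrite entries, so work on a copy
    wm = len(rho) - 1                # number of morae WM
    theta2 = theta * beta            # threshold for runs of devoicing-prone vowels
    devoiced = []
    for n in range(wm):              # S3 (start), S12 (increment), S13 (loop test)
        cur, nxt = rho[n], rho[n + 1]
        if cur < theta:              # S4 -> S11: stays voiced
            devoiced.append(False)
        elif cur >= nxt:             # S5 -> S8: current mora is the stronger one
            if nxt < theta2:         # S8 -> S9: keep the next vowel voiced
                rho[n + 1] = 0.0
            devoiced.append(True)    # S10
        else:                        # S5 -> S6: next mora is the stronger one
            rho[n + 1] = float("inf")        # S6 (assumed realisation)
            devoiced.append(cur >= theta2)   # S7 -> S10 or S11
    return devoiced
```

For example, with θ = 6 and θ2 = 9, the sequence ρ = [8, 7, 0, 0] devoices only the first vowel, because the second vowel is reset to 0 in step S9, whereas ρ = [7, 8, 0] leaves the first vowel voiced and forces the second to be devoiced.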

As described above, the present invention decides whether a target vowel is voiced or devoiced on the basis of the devoicing occurrence degree for each phoneme sequence, the devoicing occurrence degree for each accent pattern, the devoicing decision threshold set according to the speaking rate, and the devoicing decision threshold used when vowels that are likely to be devoiced occur in succession. Vowels in an input character string can therefore be devoiced in accordance with the phoneme sequence, accent pattern, and speaking rate of that string. Consequently, the present invention makes it possible to generate more natural synthesized speech.

<Effects of the Invention> As is clear from the above, in the speech synthesizer of the invention according to claim 1, the speech synthesis parameter creation unit calculates a devoicing occurrence degree for each vowel of the input character string based on the devoicing occurrence degrees for each phoneme sequence and for each accent pattern in the rule file, sets a devoicing decision threshold corresponding to the specified speaking rate based on the devoicing decision thresholds for each speaking rate, decides for each vowel of the input character string whether it is voiced or devoiced based on the devoicing occurrence degree and the devoicing decision threshold, and generates speech synthesis parameters in accordance with that decision. By performing speech synthesis based on these parameters, the target vowels can be devoiced with consideration given to the phoneme sequence, the accent pattern, and the speaking rate, all of which strongly affect vowel devoicing. That is, the present invention makes it possible to generate natural synthesized speech close to real speech.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the speech synthesizer of the present invention, FIG. 2 is a flowchart of the vowel voicing/devoicing decision routine in that embodiment, and FIG. 3 is a flowchart of a conventional voicing/devoicing decision routine.

1: character string analysis unit; 2: word dictionary; 3: rule control unit; 4: rule file; 5: speech synthesizer; 6: target feature parameter file; 7: time-series feature parameter file; 8: feature parameter file.

Continued from front page: (58) Fields searched (Int. Cl.⁶, DB name): G10L 5/02, G10L 3/00

Claims

(57) [Claims]

A speech synthesizer comprising a character string analysis unit for determining a phoneme sequence and an accent pattern from an input character string, a speech synthesis parameter creation unit (3) for creating speech synthesis parameters in accordance with the phoneme sequence, the accent pattern, and rules stored in a rule file (4), and a speech synthesis unit (5) for performing speech synthesis based on the speech synthesis parameters, wherein the speech synthesis parameter creation unit (3) comprises:
first devoicing occurrence degree setting means (S1) for setting a phoneme-sequence devoicing occurrence degree for each vowel, based on the phoneme sequence and the devoicing occurrence degrees for each phoneme sequence stored in the rule file (4);
second devoicing occurrence degree setting means (S1) for setting an accent-pattern devoicing occurrence degree for each vowel, based on the accent pattern and the devoicing occurrence degrees for each accent pattern stored in the rule file (4);
devoicing occurrence degree calculating means (S1) for calculating a devoicing occurrence degree for each vowel from the phoneme-sequence devoicing occurrence degree and the accent-pattern devoicing occurrence degree;
setting means (S2) for setting a devoicing decision threshold based on a specified speaking rate and the devoicing decision thresholds for each speaking rate stored in the rule file (4); and
vowel voicing/devoicing decision means (S4 to S13) for deciding, for each vowel, whether the vowel is devoiced or voiced, based on the calculated devoicing occurrence degree and the set devoicing decision threshold,
and wherein the speech synthesis parameters are created in accordance with the decision result of the vowel voicing/devoicing decision means (S4 to S13).