JP3078074B2

JP3078074B2 - Fundamental frequency pattern generation method

Info

Publication number: JP3078074B2
Application number: JP03344628A
Authority: JP
Inventors: 隆矢頭
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1991-12-26
Filing date: 1991-12-26
Publication date: 2000-08-21
Anticipated expiration: 2015-08-21
Also published as: JPH05173591A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a method of generating a fundamental frequency pattern for use in speech synthesis.

[0002]

[Prior Art] A speech synthesizer that takes character information as input, converts it into speech, and outputs it places no restriction on the output vocabulary, and is therefore expected to replace recording-and-playback speech synthesis. In this type of synthesizer, the technique for generating the fundamental frequency (pitch) pattern of vocal cord vibration, which expresses the accent and intonation of speech, is a very important elemental technology for obtaining natural-sounding synthesized speech.

[0003] The pitch pattern of speech is strongly influenced not only by the accent type of each word but also by sentence structure, meaning, and so on. It is therefore difficult to obtain the pattern of a whole sentence by preparing various fundamental frequency patterns extracted from real speech and combining them, and appropriate modeling is indispensable.

[0004] Several models for generating the fundamental frequency pattern of speech have been proposed. A widely used model expresses the fundamental frequency pattern on a logarithmic axis as the sum of a phrase component, corresponding to the gradual descent from the beginning to the end of a sentence (intonation), and an accent component, corresponding to local undulations (accent). In this model the phrase component is approximated as the response of a critically damped second-order linear system to an impulse-like phrase command, and the accent component is approximated as the response of a critically damped second-order linear system to a step-like accent command (see Keikichi Hirose, Hiroya Fujisaki, Hisashi Kawai, Mikio Yamaguchi: "Synthesis of Sentence Speech Based on a Fundamental Frequency Pattern Generation Process Model", Transactions of the Institute of Electronics, Information and Communication Engineers, January 1989, Vol. J72-A, No. 1).

[0005] Fig. 5 is a block diagram showing the conventional fundamental frequency pattern generation model. In this model the logarithmic fundamental frequency ln F_0(t) is given as a function of time t by the following equation.

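[0006]

[Equation 1]

\ln F_0(t) = \ln F_{min} + \sum_{i=1}^{I} A_{pi}\, G_p(t - T_{0i}) + \sum_{j=1}^{J} A_{aj} \{ G_a(t - T_{1j}) - G_a(t - T_{2j}) \} \qquad (1)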

[0007] Here F_min is the base frequency, A_pi is the magnitude of the i-th phrase command in the sentence, A_aj is the magnitude of the j-th accent command in the sentence, I is the number of phrase commands in one sentence, J is the number of accent commands in one sentence, T_0i is the start time of the i-th phrase command, and T_1j and T_2j are the start time and end time, respectively, of the j-th accent command.

[0008] G_p(t) and G_a(t) are the impulse response function of the phrase control mechanism and the step response function of the accent control mechanism, respectively. For t ≥ 0 they are given by equations (2) and (3) below (for t < 0, G_p(t) = G_a(t) = 0).

[0009]

G_p(t) = \alpha t \exp(-\alpha t) \qquad (2)

G_a(t) = \min[\, 1 - (1 + \beta t) \exp(-\beta t),\ \theta \,] \qquad (3)

[0010] Here α and β are constants that determine the response speed of the phrase control mechanism and of the accent control mechanism, respectively; values of about α = 3.0 and β = 20.0 are used. θ is the upper limit of the accent component and is usually chosen as θ = 0.9 or a similar value.
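As a minimal sketch of equations (2) and (3), the two response functions of the conventional model might be written as follows. The function names `gp` and `ga` and the choice of Python are illustrative assumptions and do not appear in the original; the default constants are the typical values quoted above.

```python
import math

ALPHA = 3.0   # response speed of the phrase control mechanism (typical value from the text)
BETA = 20.0   # response speed of the accent control mechanism (typical value from the text)
THETA = 0.9   # upper limit of the accent component (typical value from the text)


def gp(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism, equation (2)."""
    if t < 0:
        return 0.0
    return alpha * t * math.exp(-alpha * t)


def ga(t, beta=BETA, theta=THETA):
    """Step response of the accent control mechanism, equation (3)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)
```

Both functions return 0 for negative t, matching the condition stated for t < 0, and `ga` saturates at θ once the unclipped response exceeds it.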

[0011] In human speech, even when an utterance is spoken flatly, the pitch is high at the start of the utterance and then falls naturally, for example because of decreasing expiratory pressure. The phrase control mechanism described above models this natural declination of pitch.

[0012] As for accent, in standard Japanese a word always shows a marked rise or fall in pitch from the first mora to the second mora, and a marked fall in pitch occurs at no more than one place within a word. A word of n moras therefore has (n + 1) possible accent types. The accent types are named after the position at which the pitch falls: type 0, type 1, type 2, type 3, and so on (type i is the type in which the pitch falls markedly between the i-th mora and the (i + 1)-th mora; type 0 is also called the flat accent). The rise and fall of the pitch correspond to the start point and end point of the accent command in the fundamental frequency generation model described above.

[0013]

[Problems to be Solved by the Invention] Fig. 6(a) shows the shape of the phrase component produced by the phrase control mechanism of the conventional model. It is an asymptotically decreasing pattern that rises relatively steeply at first, then decays gradually and becomes almost zero after a certain time. Fig. 6(a) is an example for the commonly used response speed (α = 3.0); the phrase component becomes almost zero after about 2 seconds.

[0014] In real speech, however, long phrases exceeding 2 seconds are not uncommon. In such cases the conventional phrase control mechanism cannot express the natural declination of pitch in the latter half of the phrase, and the pitch at the end of the phrase is perceived as rising unnaturally.

[0015] If the constant α in the phrase component generation function (equation (2)) is made smaller, the phrase component decays more slowly and the natural declination in the latter half of the phrase can be secured (see Fig. 6(b)). At the same time, however, the rise also becomes slower and no longer matches the rising shape of real speech. With the conventional phrase control mechanism, therefore, modeling the phrase component forces a compromise value that only roughly satisfies both the rise response speed and the decay speed. This makes it difficult to approach the phrase shape of natural speech, and it also inherently limits the length of the phrase component that can be generated.
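The trade-off described above can be illustrated numerically. The sketch below evaluates equation (2) for α = 3.0 and, as an assumed example of a "smaller" value that is not taken from the original, α = 1.0. Setting the derivative of equation (2) to zero gives a peak at t = 1/α, so a smaller α delays the peak as well as slowing the decay. The function names are illustrative.

```python
import math


def gp(t, alpha):
    """Conventional phrase impulse response, equation (2)."""
    if t < 0:
        return 0.0
    return alpha * t * math.exp(-alpha * t)


def peak_time(alpha):
    """Time of the maximum of equation (2): dGp/dt = 0 at t = 1/alpha."""
    return 1.0 / alpha


def residual_ratio(t, alpha):
    """Value of Gp at time t relative to its peak value."""
    return gp(t, alpha) / gp(peak_time(alpha), alpha)


# alpha = 3.0 is the typical value from the text; alpha = 1.0 is an assumed example.
for alpha in (3.0, 1.0):
    print(f"alpha={alpha}: peak at {peak_time(alpha):.2f} s, "
          f"level at 2 s = {residual_ratio(2.0, alpha):.1%} of peak")
```

With α = 3.0 the component has fallen to about 4% of its peak by 2 s, consistent with "almost zero"; with α = 1.0 it retains roughly 74% at 2 s, but the peak moves from about 0.33 s to 1 s.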

[0016] An object of the present invention is to solve the problems described above by providing a fundamental frequency pattern generation method that controls the rise response speed and the decay speed independently, so as to approach the phrase shape of natural speech, and that adapts to phrases of various lengths so that natural pitch patterns can be generated.

[0017]

[Means for Solving the Problems] To solve the above problems, the present invention provides a fundamental frequency pattern generation method that receives accent commands and phrase commands for generating prosody, calculated by analysis of an input sentence, and that represents the fundamental frequency pattern on a logarithmic axis as the sum of a phrase component corresponding to intonation and an accent component corresponding to accent. The method is characterized in that the phrase component is approximated by the impulse response function of time t given by equation (4) below, where b_i and c_i are constants that determine the response speed of the phrase control mechanism for the i-th phrase of the input sentence and a_i is a constant that corrects the response strength of the phrase control mechanism for the i-th phrase of the input sentence, and in that the constants a_i, b_i, and c_i are set according to the length of the phrase to which the fundamental frequency is to be assigned.

[0018]

G_{pi}(t) = a_i \, t^{c_i} \exp(-b_i t) \qquad (4)

[0019] In setting the pattern shape, the value of (c_i / b_i) is preferably set in the range 0.2 to 0.4, and b_i is preferably set in the range 1.0 to 4.5 according to the length of the phrase.
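A minimal sketch of equation (4), together with a check of the preferred ranges just stated, is shown below. The function names `gpi` and `in_preferred_range` are illustrative assumptions; the explicit guard for t < 0 reflects that the response is zero before the command and also avoids raising a negative number to a non-integer power.

```python
import math


def gpi(t, a, b, c):
    """Phrase-component impulse response of equation (4): a * t**c * exp(-b*t)."""
    if t < 0:
        return 0.0
    return a * (t ** c) * math.exp(-b * t)


def in_preferred_range(b, c):
    """True if (b, c) lies in the preferred ranges: 1.0 <= b <= 4.5 and 0.2 <= c/b <= 0.4."""
    return 1.0 <= b <= 4.5 and 0.2 <= c / b <= 0.4
```

For example, b = 3.0 with c = 0.9 gives c/b = 0.3 and falls inside both ranges, whereas b = 3.0 with c = 3.0 does not.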

[0020]

[Operation] According to equation (4), the rise response speed, that is, the time from the start of the phrase until the maximum value is reached, is given by (c_i / b_i) seconds. The decay speed, on the other hand, is proportional to the constant b_i. Therefore, by controlling the constant b_i according to the given phrase length and controlling c_i according to the desired rise response speed, the rise response and the fall response of the phrase can be controlled independently. The constant a_i can be set so that the maximum value of the function (equation (4)) takes a desired value; it serves to correct the response strength of the function (its maximum value), which changes when the constants b_i and c_i are adjusted, to the desired value.
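The statement that the maximum is reached after (c_i / b_i) seconds follows from differentiating equation (4). The short derivation below also gives the value of a_i that yields a chosen maximum; the symbols t_peak and G_max are introduced here for illustration only and are not used in the original.

```latex
\frac{dG_{pi}}{dt} = a_i\, t^{c_i - 1} e^{-b_i t}\,(c_i - b_i t) = 0
\;\Longrightarrow\; t_{\mathrm{peak}} = \frac{c_i}{b_i},
\qquad
G_{pi}(t_{\mathrm{peak}}) = a_i \left(\frac{c_i}{b_i}\right)^{c_i} e^{-c_i},
\qquad
a_i = G_{\max} \left(\frac{b_i}{c_i}\right)^{c_i} e^{c_i}.
```

If the ratio c_i / b_i is held fixed while b_i is varied, the peak time stays the same and only the decay changes, which is the sense in which rise and decay are controlled separately.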

[0021] Fig. 2 shows patterns generated by function (4). Fig. 2(a) is an example in which the rise response is fast and the decay is slow; Fig. 2(b), conversely, is a case in which the rise response is slow and the decay is fast. In this way, according to the present invention, the shapes of the rising and falling parts of the pattern can be set freely by adjusting the constants. Consequently, when modeling the phrase component, the pattern best suited to the shape of the phrase component of natural speech can be provided without being constrained by the response speed of either the rise or the fall.

[0022] Furthermore, according to the present invention, the constants a_i, b_i, and c_i can be set according to the length of the phrase, as described above. Specifically, a plurality of sets of constants (a_i, b_i, c_i) can be prepared in advance and the most suitable set selected according to the length of the phrase component to be generated, which makes it possible to handle phrases of various lengths. The problems described above are solved by these operations.

[0023]

[Embodiments] Fig. 1 is a functional block diagram of an apparatus showing a first embodiment of the present invention. The apparatus consists of a character string input unit 10, a character string analysis unit 11, a phrase control mechanism 12, an accent control mechanism 13, a constant setting unit 14, a vocal cord vibration mechanism 15, and a fundamental frequency output terminal 16. The method of generating the fundamental frequency in the first embodiment is described below.

[0024] First, a kana character string with prosodic symbols for the sentence to be converted into speech is input from the character string input unit 10. A kana character string with prosodic symbols is, for example, a character string such as that shown in Table 1. It consists of the kana character string corresponding to the reading of the sentence to be converted into speech, together with prosodic control symbols such as phrase symbols, accent symbols, and pause symbols (delimiters).

[0025]

[Table 1]

[0026] Based on the input kana character string with prosodic symbols, the character string analysis unit 11 determines the phrase commands that are input to the phrase control mechanism 12 and the accent commands that are input to the accent control mechanism 13.

[0027] The magnitude of a phrase command (A_pi in equation (1)) is determined as shown in Table 2 according to the type of phrase symbol inserted in the input character string.

[0028]

[Table 2]

[0029] The start time of a phrase command is likewise determined according to the position at which the phrase symbol is inserted in the kana character string of the input. For example, in the character string shown in Table 1, the command for phrase symbol P1 of the first phrase is normally set at a time 100 to 200 ms before the onset of the first syllable "キ" (ki) of that phrase.

[0030] The magnitude and timing of an accent command are determined according to the positions and types of the pause symbols and accent symbols. An accent symbol is inserted immediately after the syllable carrying the accent nucleus; it indicates the accent position of the word, and its type indicates the strength of the accent. Table 3 shows the magnitude of the accent command for each type of accent symbol. A word without an accent symbol is regarded as having the flat (type 0) accent and is likewise given the command magnitude in Table 3.

[0031]

[Table 3]

[0032] The start time of an accent command depends on the accent type, but is determined with reference to the vowel onset of the first or second mora of the word. The end time of the command is obtained, from the position of the accent symbol, with reference to the vowel onset of the mora following the accent nucleus. For the flat type, the final mora of the word is the reference.

[0033] The phrase commands and accent commands determined as described above are sent to the phrase control mechanism 12 and the accent control mechanism 13, respectively.

[0034] The phrase control mechanism 12 generates the phrase component by computing equation (4), the impulse response function for the given phrase command. Before this, the constant setting unit 14 sets the values of the constants a_i, b_i, and c_i in equation (4) according to the length of the phrase to be synthesized. These constant values must be determined in advance, by observation, so that they fit the phrase components of real speech. In this embodiment, three combinations of constants are used according to the phrase length.

[0035] Fig. 4 shows the relationship between these constant values and the shape of the phrase component. As shown in the figure, the rise response speed hardly changes even when the decay speed changes, so a phrase component pattern suited to the phrase length can be generated.

[0036] Based on the number of moras in the phrase, obtained from the character string analysis unit 11, the constant setting unit 14 selects one of the three combinations of constants described above and outputs it to the phrase control mechanism 12. Strictly, the length of a phrase is calculated from the durations of the phonemes that make up the phrase. However, the relationship between the phrase length and the length of the phrase component is not very strict, so this embodiment substitutes the mora count, which is easily obtained from the input character string.
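The selection step can be sketched as a small lookup keyed on mora count. The original does not give the three constant sets or the mora thresholds, so every number below is a hypothetical placeholder chosen only to lie within the preferred ranges (b_i from 1.0 to 4.5, c_i / b_i from 0.2 to 0.4). The names `PhraseConstants`, `make_constants`, and `select_constants` are also assumptions. Here a_i is chosen so that the maximum of equation (4), which occurs at t = c_i / b_i, equals a given peak value.

```python
import math
from typing import NamedTuple


class PhraseConstants(NamedTuple):
    a: float  # strength correction
    b: float  # decay speed
    c: float  # together with b, sets the rise time c / b


def make_constants(b, rise_time=0.3, peak=1.0):
    """Build (a, b, c) with peak time c/b = rise_time and maximum value = peak."""
    c = rise_time * b
    a = peak * (b / c) ** c * math.exp(c)
    return PhraseConstants(a, b, c)


# Hypothetical table: (largest mora count handled, constants). Not from the original.
CONSTANT_SETS = (
    (10, make_constants(4.5)),    # short phrases: fast decay
    (20, make_constants(2.5)),    # medium phrases
    (None, make_constants(1.0)),  # long phrases: slow decay
)


def select_constants(mora_count):
    """Return the constant set for a phrase with the given number of moras."""
    for max_moras, constants in CONSTANT_SETS:
        if max_moras is None or mora_count <= max_moras:
            return constants
    return CONSTANT_SETS[-1][1]
```

Because all three placeholder sets share the same ratio c/b, only the decay differs between them, in line with the observation that the rise response hardly changes when the decay speed changes.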

[0037] The accent control mechanism 13, meanwhile, calculates the accent component for each accent command given by the character string analysis unit 11, using the step response function of equation (3).

[0038] The phrase components and accent components for all commands are thus added together and output to the vocal cord vibration mechanism 15. The vocal cord vibration mechanism 15 adds the lower limit of the logarithmic fundamental frequency, ln F_min, to the phrase and accent components to obtain the logarithmic fundamental frequency, then applies exponential conversion and outputs the fundamental frequency F_0(t) from the output terminal 16.
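The overall computation of the first embodiment can be sketched as follows: sum the phrase components (equation (4)) and accent components (equation (3)) over all commands, add ln F_min, and exponentiate. The representation of commands as tuples, the function names, and all numeric values in the usage note are illustrative assumptions. The accent component of a command lasting from T_1 to T_2 is taken here as the difference of two step responses.

```python
import math


def gpi(t, a, b, c):
    """Phrase impulse response, equation (4)."""
    if t < 0:
        return 0.0
    return a * (t ** c) * math.exp(-b * t)


def ga(t, beta=20.0, theta=0.9):
    """Accent step response, equation (3)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), theta)


def f0(t, f_min, phrase_cmds, accent_cmds):
    """Fundamental frequency at time t (seconds).

    phrase_cmds: iterable of (A_p, T_0, (a, b, c))
    accent_cmds: iterable of (A_a, T_1, T_2)
    """
    log_f0 = math.log(f_min)
    for amp, t0, (a, b, c) in phrase_cmds:
        log_f0 += amp * gpi(t - t0, a, b, c)
    for amp, t1, t2 in accent_cmds:
        log_f0 += amp * (ga(t - t1) - ga(t - t2))
    return math.exp(log_f0)
```

For instance, `f0(0.4, 80.0, [(0.5, 0.0, (1.0, 3.0, 0.9))], [(0.4, 0.2, 0.6)])` evaluates one hypothetical phrase command and one hypothetical accent command above an 80 Hz base frequency; with no commands the function simply returns F_min.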

[0039] Fig. 3 is a block diagram showing a second embodiment of the present invention. The apparatus consists of a character string input unit 20, a character string analysis unit 21, a phrase control mechanism 22, a phrase pattern table 23, an accent control mechanism 24, an accent pattern table 25, a vocal cord vibration mechanism 26, a logarithmic conversion table 27, and a fundamental frequency output terminal 28. The method of generating the fundamental frequency in the second embodiment is described below.

[0040] The character string input unit 20 and the character string analysis unit 21, which take a kana character string with prosodic symbols as input and generate the phrase commands and accent commands, operate in the same way as in the first embodiment, so their description is omitted. The phrase commands and accent commands determined by the character string analysis unit 21 are input to the phrase control mechanism 22 and the accent control mechanism 24, respectively.

[0041] The phrase control mechanism 22 and the accent control mechanism 24 generate their respective components based on the commands. In this embodiment, instead of computing equations (3) and (4) each time, the response patterns of the impulse response function of the phrase control mechanism 22 and of the step response function of the accent control mechanism 24 are calculated in advance and stored in the phrase pattern table 23 and the accent pattern table 25, respectively, and the response value of each component is obtained by referring to the stored contents. The phrase pattern table 23 stores the three phrase patterns of the first embodiment, and the phrase control mechanism 22 selects a suitable phrase pattern according to the mora count output from the character string analysis unit 21.
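A table-based variant of the phrase response might be sketched as below: the response of equation (4) is sampled once at a fixed frame period and later read back by index. The frame period, table length, and function names are assumptions made for illustration; the original does not specify how the tables are organized.

```python
import math

FRAME_PERIOD = 0.01   # seconds per table entry (assumed)
TABLE_LENGTH = 500    # number of entries, i.e. 5 s of response (assumed)


def build_phrase_table(a, b, c, n=TABLE_LENGTH, dt=FRAME_PERIOD):
    """Precompute a * t**c * exp(-b*t) at t = 0, dt, 2*dt, ..."""
    return [a * (k * dt) ** c * math.exp(-b * k * dt) for k in range(n)]


def lookup(table, t, dt=FRAME_PERIOD):
    """Response at time t (seconds) from the nearest entry; 0 outside the table."""
    if t < 0:
        return 0.0
    k = int(round(t / dt))
    return table[k] if k < len(table) else 0.0
```

One such table would be built per constant set, so that selecting a phrase pattern by mora count reduces to choosing which table to read. Returning 0 beyond the end of the table is a simplification that assumes the stored response has decayed sufficiently by then.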

[0042] The vocal cord vibration mechanism 26 also requires an exponential conversion when converting the logarithmic fundamental frequency into the fundamental frequency, so a logarithmic conversion table 27 is provided for this purpose as well. Because what is actually used in speech synthesis is in many cases not the fundamental frequency but the pitch period (the reciprocal of the fundamental frequency), the logarithmic conversion table 27 may instead be a table that converts the logarithmic fundamental frequency directly into the pitch period.
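A direct conversion from logarithmic fundamental frequency to pitch period could be tabulated as in the following sketch. The frequency range, the number of table entries, and the names are assumptions; the original states only that such a table may be used. Since the pitch period is 1/F_0, each entry is simply exp(−ln F_0) for a quantized value of ln F_0.

```python
import math

LOG_F0_MIN = math.log(50.0)    # lower edge of the table in ln(Hz) (assumed)
LOG_F0_MAX = math.log(500.0)   # upper edge of the table in ln(Hz) (assumed)
STEPS = 1024                   # table resolution (assumed)
STEP = (LOG_F0_MAX - LOG_F0_MIN) / (STEPS - 1)

# Entry k holds the pitch period 1/F0 = exp(-ln F0), in seconds, for the k-th quantized ln F0.
PERIOD_TABLE = [math.exp(-(LOG_F0_MIN + k * STEP)) for k in range(STEPS)]


def pitch_period(log_f0):
    """Pitch period in seconds for a given ln F0, clamped to the table range."""
    k = int(round((log_f0 - LOG_F0_MIN) / STEP))
    k = max(0, min(STEPS - 1, k))
    return PERIOD_TABLE[k]
```

With these placeholder settings the quantization error in the period is roughly 0.1% at most; a real table would be sized according to the pitch resolution the synthesizer needs.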

[0043]

[Effects of the Invention] As described in detail above, according to the fundamental frequency pattern generation method of the present invention, the phrase component of the fundamental frequency pattern on a logarithmic axis is approximated by an impulse response function whose rise speed and decay speed can be controlled separately. When modeling the phrase component, a model closer to the phrase shape of natural speech can therefore be set without being constrained by either the rise response speed or the decay speed.

[0044] Furthermore, because the present invention selects the shape of the phrase component according to the length of the phrase for which the fundamental frequency pattern is to be generated, phrase components close to natural speech can be generated for phrases of various lengths. Performing speech synthesis with the fundamental frequency pattern generation method of the present invention therefore yields synthesized speech with more natural intonation.

[Brief description of the drawings]

[Fig. 1] A functional block diagram of an apparatus showing a first embodiment of the present invention.

[Fig. 2] A diagram showing examples of patterns generated by equation (4).

[Fig. 3] A functional block diagram of an apparatus showing a second embodiment of the present invention.

[Fig. 4] A diagram showing the relationship between the constant values of equation (4) and the shape of the phrase component.

[Fig. 5] A block diagram showing the conventional fundamental frequency pattern generation model.

[Fig. 6] A diagram showing the shape of the phrase component according to the conventional method.

[Explanation of symbols]

10 character string input unit
11 character string analysis unit
12 phrase control mechanism
13 accent control mechanism
14 constant setting unit
15 vocal cord vibration mechanism
16 fundamental frequency output terminal
20 character string input unit
21 character string analysis unit
22 phrase control mechanism
23 phrase pattern table
24 accent control mechanism
25 accent pattern table
26 vocal cord vibration mechanism
27 exponential conversion table
28 fundamental frequency output terminal

Continuation of front page

(56) References: JP-A-64-28695 (JP, A); JP-A-2-129700 (JP, A); JP-A-62-138898 (JP, A); Fujisaki, Sudo, "A Model of the Fundamental Frequency Pattern of Japanese Word Accent and Its Generation Mechanism", Journal of the Acoustical Society of Japan, Vol. 27, No. 9, pp. 445-453 (1971)

(58) Fields searched (Int. Cl.7, DB name): G10L 11/00 - 21/06; JICST file (JOIS)

Claims

(57) [Claims]

1. A fundamental frequency pattern generation method that receives accent commands and phrase commands for generating prosody, calculated by analysis of an input sentence, and that represents the fundamental frequency pattern on a logarithmic axis as the sum of a phrase component corresponding to intonation and an accent component corresponding to accent, characterized in that, where b_i and c_i are constants that determine the response speed of the phrase control mechanism for the i-th phrase of the input sentence and a_i is a constant that corrects the response strength of the phrase control mechanism for the i-th phrase of the input sentence, the phrase component is approximated by the impulse response function of time t expressed as G_pi(t) = a_i · t^(c_i) · exp(−b_i t), and the constants a_i, b_i, and c_i are set according to the length of the phrase to which the fundamental frequency is to be assigned.