JP5925082B2

JP5925082B2 - Speech synthesis apparatus, method and program

Info

Publication number: JP5925082B2
Application number: JP2012176759A
Authority: JP
Inventors: Sadao Hiroya
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2012-08-09
Filing date: 2012-08-09
Publication date: 2016-05-25
Anticipated expiration: 2032-08-09
Also published as: JP2014035460A

Description

The present invention relates to a speech synthesis technique using an HMM (hidden Markov model).

FIG. 3 shows the configuration of a conventional HMM speech synthesizer. It includes a phoneme HMM sequence generation unit 1, a vocal tract spectrum parameter calculation unit 2, a speech synthesis unit 3, and a phoneme HMM storage unit 4.

The phoneme HMM sequence generation unit 1 takes the phoneme string of the input text to be synthesized. It concatenates phoneme HMMs (hidden Markov models) trained in advance on a training speech database according to that string, producing a sequence of phoneme HMM states. The generated state sequence is sent to the vocal tract spectrum parameter calculation unit 2.

Here, the sequence of phoneme HMM states is represented by two objects:
- a vector composed of the means of the vocal tract spectrum parameters and the means of their velocities;
- a matrix composed of the variances of the vocal tract spectrum parameters and the variances of their velocities.

Each of these four quantities is computed for every state of the phoneme HMMs that make up the state sequence.

The vocal tract spectrum parameter calculation unit 2 calculates the vocal tract spectrum parameters that maximize the output probability.

Finally, the speech synthesis unit 3 synthesizes speech by convolving the input excitation signal, based on the calculated vocal tract spectrum parameters (see, for example, Non-Patent Document 1). The excitation is a pulse train at the fundamental period for voiced sounds, or white noise for unvoiced sounds.

LSP parameters ω_i (i = 1, 2, ..., p) are widely used as the vocal tract spectrum parameters. They represent the vocal tract spectrum of speech as an all-pole filter. A necessary condition for this filter to be stable is that the LSP parameters satisfy the ascending-order property, that is, 0 < ω_1 < ω_2 < ... < ω_p. If speech is synthesized from LSP parameters that violate this property, the synthesized speech signal may diverge.
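The condition just stated can be checked directly. The sketch below uses a hypothetical helper that tests one frame of LSP frequencies (in radians) for the ascending-order property. The extra bound ω_p < π reflects the usual (0, π) range of LSP frequencies. It is an assumption added here, not part of the condition stated above.

```python
import numpy as np


def satisfies_ascending_order(lsp, upper=np.pi):
    """Check 0 < w_1 < w_2 < ... < w_p (< upper) for one frame of LSPs in radians."""
    w = np.asarray(lsp, dtype=float)
    if w.size == 0:
        return False
    # Strictly positive first value, strictly increasing values, last value below upper.
    return bool(w[0] > 0.0 and np.all(np.diff(w) > 0.0) and w[-1] < upper)


print(satisfies_ascending_order([0.25, 0.6, 1.3, 2.1]))  # True
print(satisfies_ascending_order([0.25, 1.3, 0.6, 2.1]))  # False: w_3 < w_2
```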

Now let ω_i(t) denote the LSP parameter at time t, and define the velocity Δω_i(t) of the LSP parameter as follows (Equation (0)):

Let ω_static be the vector consisting only of the LSP parameters ω_i(t). Let ω_dynamic be the vector consisting of the LSP parameters ω_i(t) and their velocities Δω_i(t). The transformation matrix R that maps ω_static to ω_dynamic can then be defined as follows:

Here, a is, for example, 0.5. The following notation is used in this specification:
- A superscript T on a matrix (·^T) denotes transposition. Any T other than this superscript denotes a predetermined time T.
- 0_{a×b} denotes the a×b matrix whose elements are all 0.
- I_{a×a} denotes the a×a identity matrix, whose diagonal elements are 1 and whose other elements are 0.
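Equation (0) itself is not reproduced above. The sketch below therefore assumes the common symmetric velocity Δω_i(t) = a(ω_i(t+1) − ω_i(t−1)) and treats frames outside 1..T as 0. For a single LSP order, it stacks ω_dynamic as [ω(1), Δω(1), ..., ω(T), Δω(T)]. The function name and this layout are illustrative assumptions, and the matrix defined in the patent may differ.

```python
import numpy as np


def build_delta_matrix(T, a=0.5):
    """Hypothetical R for one LSP order: maps [w(1), ..., w(T)] to
    [w(1), dw(1), ..., w(T), dw(T)], assuming dw(t) = a * (w(t+1) - w(t-1))
    and treating frames outside 1..T as 0."""
    R = np.zeros((2 * T, T))
    for t in range(T):
        R[2 * t, t] = 1.0              # static row: w(t)
        if t + 1 < T:
            R[2 * t + 1, t + 1] = a    # velocity row: +a * w(t+1)
        if t >= 1:
            R[2 * t + 1, t - 1] = -a   # velocity row: -a * w(t-1)
    return R


R = build_delta_matrix(3)
print(R @ np.array([1.0, 2.0, 4.0]))  # pairs (w, dw): (1, 1.0), (2, 1.5), (4, -1.0)
```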

The vector ω̄ is composed of the LSP parameter means ω̄_i(t) and the velocity means Δω̄_i(t). The matrix σ is composed of the LSP parameter variances σ_{ω_i(t)} and the velocity variances σ_{Δω_i(t)}. They can be expressed as follows, where diag denotes a diagonal matrix:

Here, the LSP parameter vector ω̂ estimated by the vocal tract spectrum parameter calculation unit 2 is defined as follows:

The vocal tract spectrum parameter calculation unit 2 then obtains the LSP parameters that maximize the output probability by minimizing the following (Equation (1)):

That is, it suffices to solve the following (see, for example, Non-Patent Document 2):

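The sketch below implements the standard closed-form solution of this kind of maximum-likelihood parameter generation (Non-Patent Document 2). For a diagonal σ, minimizing (Rω̂ − ω̄)^T σ^{-1} (Rω̂ − ω̄) gives (R^T σ^{-1} R) ω̂ = R^T σ^{-1} ω̄.

The sketch works on one LSP order at a time. It makes the following assumptions, none of which are the patent's Equation (1) verbatim:
- the velocity is Δω(t) = 0.5(ω(t+1) − ω(t−1)), with frames outside the utterance treated as 0;
- the vector layout interleaves static and velocity values;
- the function names are hypothetical.

Nothing in it couples neighbouring LSP orders, which is why the ascending-order property can be lost.

```python
import numpy as np


def build_delta_matrix(T, a=0.5):
    """R maps [w(1..T)] to [w(1), dw(1), ..., w(T), dw(T)] with dw(t) = a*(w(t+1) - w(t-1))."""
    R = np.zeros((2 * T, T))
    for t in range(T):
        R[2 * t, t] = 1.0
        if t + 1 < T:
            R[2 * t + 1, t + 1] = a
        if t >= 1:
            R[2 * t + 1, t - 1] = -a
    return R


def ml_generate(mean, var, a=0.5):
    """Minimize (R w - mean)^T diag(var)^-1 (R w - mean) for one LSP order.

    Setting the gradient to zero gives (R^T S^-1 R) w = R^T S^-1 mean.
    """
    mean = np.asarray(mean, dtype=float)
    prec = 1.0 / np.asarray(var, dtype=float)   # diagonal of S^-1
    R = build_delta_matrix(mean.size // 2, a)
    lhs = R.T @ (prec[:, None] * R)
    rhs = R.T @ (prec * mean)
    return np.linalg.solve(lhs, rhs)


# If the means are exactly consistent with some trajectory, ML generation recovers it.
w_true = np.array([0.30, 0.35, 0.50, 0.45])
mean = build_delta_matrix(4) @ w_true
var = np.tile([0.01, 0.1], 4)
print(np.allclose(ml_generate(mean, var), w_true))  # True
```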

However, the LSP values obtained by the above calculation lie close to the HMM means. Their dynamic range is therefore narrower than that of the original LSP parameters of natural human speech, and the synthesized speech sounds muffled. To evaluate the dynamic range of the LSP parameters, a parameter called the global variance (GV) is defined (see, for example, Non-Patent Document 3).

The GV ν_{i,n} of a single sentence n is defined as follows, using the LSP parameters of the utterance of sentence n. T_n is the duration of the utterance of sentence n.

In addition, μ and U are defined as follows, using the LSP parameters of the utterances of N training sentences n (n = 1, ..., N), where N is a predetermined positive integer. μ is the mean of ν_{i,n} (n = 1, ..., N), and U is its variance. μ and U are assumed to be stored in the phoneme HMM storage unit 4.
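The exact GV formulas are not reproduced above. The sketch below assumes the usual definition from Non-Patent Document 3: the per-order variance of an utterance's LSP trajectory over its T_n frames, normalized by 1/T_n. It takes μ and U as the mean and the per-order (diagonal) variance of these GVs across the N training utterances. The function names and the toy data are hypothetical.

```python
import numpy as np


def global_variance(lsp_traj):
    """GV of one utterance: per-order variance of its LSP trajectory over T_n frames.

    lsp_traj has shape (T_n, p); the 1/T_n normalization is an assumption.
    """
    x = np.asarray(lsp_traj, dtype=float)
    return np.mean((x - x.mean(axis=0)) ** 2, axis=0)


def gv_statistics(utterances):
    """Mean mu and per-order (diagonal) variance U of the GV over N training utterances."""
    v = np.stack([global_variance(u) for u in utterances])  # shape (N, p)
    return v.mean(axis=0), v.var(axis=0)


mu, U = gv_statistics([
    [[0.1, 1.0], [0.3, 1.0]],   # GV = [0.01, 0.00]
    [[0.0, 1.0], [0.4, 1.2]],   # GV = [0.04, 0.01]
])
print(mu, U)
```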

The GV of the estimated parameters, ν(ω), is defined as follows:

To obtain the LSP parameters that maximize the output probability of the GV, it suffices to minimize the following (Equation (2)):

Finally, in Non-Patent Document 2, the vocal tract spectrum parameter calculation unit 2 minimizes Equation (3) instead of Equation (1) alone. Equation (3) is an evaluation function that combines Equations (1) and (2), and the LSP parameters ω̂ that minimize it are found with an algorithm such as steepest descent.

Here, 1/(2T) is a weight that brings the values of Equations (1) and (2) to the same order of magnitude.

The above method does not necessarily preserve the ascending-order property of the LSP parameters, for two reasons. HMM speech synthesis requires relatively high-order LSP parameters (for example, p = 40), and the GV term is added. A method has therefore been proposed that imposes a constraint through the following penalty function (see, for example, Non-Patent Document 4):

Here, α is a moderately small positive value, M is an even number, and ω_{p+1} = π. The penalty grows when ω_{i−1} > ω_i. However, this penalty does not guarantee that the ascending-order property is satisfied. Because of its influence, maximization of the output probability is no longer guaranteed either.

Non-Patent Document 1: L.R. Rabiner, R.W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, 1978.
Non-Patent Document 2: Tokuda, Masuko, Kobayashi, Imai, "Speech parameter generation algorithm from HMM using dynamic features", Journal of the Acoustical Society of Japan, Vol. 53, No. 3, pp. 192-200, 1997 (in Japanese).
Non-Patent Document 3: T. Toda, K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis", IEICE Trans. Inf. & Syst., pp. 816-824, 2007.
Non-Patent Document 4: M. Lei, Z.-H. Ling, L.-R. Dai, "Preserve ordering property of generated LSPs for minimum generation error training in HMM-based speech synthesis", Proc. ICASSP, pp. 4712-4715, 2011.

As described above, the conventional methods cannot preserve the ascending-order property of the LSP parameters.

An object of the present invention is to provide a speech synthesis apparatus, method, and program that preserve the ascending-order property of the LSP parameters.

A speech synthesis apparatus according to one aspect of the present invention includes the following units:
- A phoneme HMM storage unit that stores phoneme HMMs of a plurality of phonemes.
- A phoneme HMM state sequence generation unit that reads, from the phoneme HMM storage unit, the phoneme HMM corresponding to each phoneme of an input text, and generates a sequence of phoneme HMM states by concatenating the read phoneme HMMs.
- A differential LSP parameter calculation unit. Let p be the order of the LSP parameters, let ω̂_i(t) (i = 1, 2, ..., p) be the LSP parameters, and let the differential LSP parameters d_i(t) be defined by d_1(t) = ω̂_1(t) > 0 and d_j(t) = ω̂_j(t) − ω̂_{j−1}(t) > 0 (2 ≤ j ≤ p). This unit calculates d_i(t) from the LSP parameters constituting the sequence of phoneme HMM states, by repeatedly updating them with a multiplicative update rule.
- An LSP parameter calculation unit that uses the calculated d_i(t) to calculate the LSP parameters ω̂_i(t) = d_1(t) + ... + d_i(t).
- A speech synthesis unit that generates speech corresponding to the input text based on the calculated LSP parameters ω̂_i(t).

The ascending-order property of the LSP parameters can be preserved.

FIG. 1 is a diagram illustrating an example of the speech synthesis apparatus.
FIG. 2 is a diagram illustrating an example of the speech synthesis method.
FIG. 3 is a diagram illustrating an example of a conventional speech synthesis apparatus.

Embodiments of the speech synthesis apparatus and method are described below with reference to the drawings.

As shown in FIG. 1, the speech synthesis apparatus includes a phoneme HMM sequence generation unit 1, a differential LSP parameter calculation unit 5, an LSP parameter calculation unit 6, a speech synthesis unit 3, and a phoneme HMM storage unit 4. In this embodiment, the differential LSP parameter calculation unit 5 and the LSP parameter calculation unit 6 calculate the LSP parameters. They replace the vocal tract spectrum parameter calculation unit 2 described in the background art. The other units are the same as in the background art.

The phoneme HMM storage unit 4 stores the phoneme HMMs of a plurality of phonemes. Let N be a predetermined positive integer. The phoneme HMM storage unit 4 is also assumed to store the GV statistics (μ and U) obtained from the LSP parameters of the utterances of N sentences n (n = 1, ..., N).

Each state of a phoneme HMM is represented by four quantities: the LSP mean ω̄_i(t), the LSP velocity mean Δω̄_i(t), the LSP variance σ_{ω_i(t)}, and the LSP velocity variance σ_{Δω_i(t)}. These are computed in advance and stored in the phoneme HMM storage unit 4.

The phoneme HMM sequence generation unit 1 performs the same processing as the unit of the same name described in the background art. It reads the phoneme HMM corresponding to each phoneme of the input text from the phoneme HMM storage unit 4. It then generates a sequence of phoneme HMM states by concatenating the read phoneme HMMs (step S1). The generated state sequence is sent to the differential LSP parameter calculation unit 5.

Each state of a phoneme HMM is represented by ω̄_i(t), Δω̄_i(t), σ_{ω_i(t)}, and σ_{Δω_i(t)}. The sequence of phoneme HMM states is therefore likewise represented by two objects. The first is the vector ω̄, composed of the LSP means ω̄_i(t) and the velocity means Δω̄_i(t). The second is the matrix σ, composed of the LSP variances σ_{ω_i(t)} and the velocity variances σ_{Δω_i(t)}.
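The sketch below makes this representation concrete. It expands per-state statistics of one LSP order into frame-level arrays. It assumes each state is held for an integer number of frames, so the total length is T, the sum of the state durations. The interleaved layout [ω̄(1), Δω̄(1), ..., ω̄(T), Δω̄(T)] and the function name are illustrative choices, not taken from the patent.

```python
import numpy as np


def expand_states(means, delta_means, variances, delta_variances, durations):
    """Expand per-state statistics of one LSP order to frame level (hypothetical layout).

    Returns the interleaved mean vector [m(1), dm(1), ..., m(T), dm(T)],
    the matching diagonal of the covariance matrix, and T = sum(durations).
    """
    dur = np.asarray(durations, dtype=int)
    T = int(dur.sum())
    mean = np.empty(2 * T)
    var = np.empty(2 * T)
    mean[0::2] = np.repeat(means, dur)            # static means
    mean[1::2] = np.repeat(delta_means, dur)      # velocity means
    var[0::2] = np.repeat(variances, dur)         # static variances
    var[1::2] = np.repeat(delta_variances, dur)   # velocity variances
    return mean, var, T


# Two states held for 2 and 1 frames respectively.
mean, var, T = expand_states([0.3, 0.5], [0.0, 0.1], [0.01, 0.02], [0.001, 0.002], [2, 1])
print(T)     # 3
print(mean)
```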

The differential LSP parameter calculation unit 5 calculates the differential LSP parameters d_j from the LSP parameters that constitute the sequence of phoneme HMM states. It does so by repeatedly updating d_j with a multiplicative update rule (step S2). The calculated d_j are sent to the LSP parameter calculation unit 6.

Specifically, the differential LSP parameter calculation unit 5 calculates d_j using parameters such as ω̄_i(t), Δω̄_i(t), σ_{ω_i(t)}, σ_{Δω_i(t)}, μ, and U. These parameters are computed from the LSP parameters ω_i(t) that constitute the sequence of phoneme HMM states. The unit can therefore be said to calculate d_j from those LSP parameters.

Let p be the order of the LSP parameters, and let ω̂_i(t) be the LSP parameters determined by the differential LSP parameters d_i(t). Then d_i(t) and ω̂_i(t) have the following relationship:
$$ d_1(t) = \hat{\omega}_1(t), \quad d_j(t) = \hat{\omega}_j(t) - \hat{\omega}_{j-1}(t) \ (2 \le j \le p), \quad \text{equivalently} \quad \hat{\omega}_i(t) = \sum_{j=1}^{i} d_j(t) \ (1 \le i \le p) $$
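This relationship is a cumulative sum in one direction and a first difference in the other. The minimal sketch below has hypothetical function names and random example data. It shows the two conversions, and that any strictly positive d yields strictly ascending LSPs.

```python
import numpy as np


def lsp_from_differential(d):
    """w_i(t) = d_1(t) + ... + d_i(t), taken along the last (LSP order) axis."""
    return np.cumsum(np.asarray(d, dtype=float), axis=-1)


def differential_from_lsp(w):
    """d_1(t) = w_1(t) and d_j(t) = w_j(t) - w_{j-1}(t) for 2 <= j <= p."""
    return np.diff(np.asarray(w, dtype=float), axis=-1, prepend=0.0)


# Any strictly positive d (here random, shape (T, p)) yields strictly ascending LSPs.
rng = np.random.default_rng(0)
d = rng.uniform(0.01, 0.2, size=(5, 10))        # hypothetical T = 5 frames, p = 10
w = lsp_from_differential(d)
print(np.all(np.diff(w, axis=-1) > 0))           # True
print(np.allclose(differential_from_lsp(w), d))  # True
```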

The differential LSP parameter calculation unit 5 calculates the d_j(t) that minimize Equation (3) described in the background art.

For example, let F(d_i(t)) be the evaluation function obtained by substituting ω̂_i(t) = Σ_{j=1}^{i} d_j(t) into Equation (3). Let ∇F(d_i(t)) be the partial derivative of F(d_i(t)) with respect to d_i(t), which is a polynomial. Let ∇F⁺(d_i(t)) be the polynomial consisting of its positive terms, and ∇F⁻(d_i(t)) the polynomial consisting of its negative terms. Then d_j can be calculated by repeatedly applying the following multiplicative update rule (Equation (4)) (see, for example, Reference 1):
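The gradient of Equation (3) is not reproduced above, so the sketch below demonstrates the multiplicative update on a simpler, hypothetical objective. It covers a single frame with a weighted static term only: F(d) = Σ_i W_i (ω̂_i − m_i)², where ω̂ = L d and L is the lower-triangular matrix of ones.

The gradient splits as ∇F = ∇F⁺ − ∇F⁻, with ∇F⁺ = 2 L^T W L d and ∇F⁻ = 2 L^T W m. Both parts are non-negative for non-negative d and m. Writing ∇F⁻ as a magnitude in this way is an assumed sign convention. Under it, the update is d ← d ⊙ ∇F⁻ / ∇F⁺.

Starting from positive random values, the update only rescales each d_j by a positive factor, so d can never become negative. The function name, the fixed iteration count (used here instead of a stopping threshold on Equation (3)), and the example targets are all illustrative.

```python
import numpy as np


def multiplicative_lsp_fit(target, weights, n_iter=10000, seed=0):
    """Fit one frame w = L d (L: lower-triangular ones) to target LSP means with d >= 0.

    Toy objective (hypothetical): F(d) = sum_i weights_i * (w_i - target_i)^2.
    grad F = 2 M d - 2 b with M = L^T W L and b = L^T W target, so for
    non-negative targets grad_plus = 2 M d and grad_minus = 2 b are both >= 0.
    """
    m = np.asarray(target, dtype=float)
    W = np.asarray(weights, dtype=float)
    p = m.size
    L = np.tril(np.ones((p, p)))          # w_i = d_1 + ... + d_i
    M = L.T @ (W[:, None] * L)            # all entries >= 0
    b = L.T @ (W * m)                     # >= 0 when target >= 0
    rng = np.random.default_rng(seed)
    d = rng.uniform(0.1, 1.0, size=p)     # positive random initial values
    for _ in range(n_iter):
        d *= b / (M @ d)                  # d <- d * grad_minus / grad_plus
    return L @ d, d


# Ascending targets: the fit reproduces them (d converges to their differences).
w, d = multiplicative_lsp_fit([0.3, 0.8, 1.5, 2.2], [1.0, 1.0, 1.0, 1.0])
print(np.round(w, 4))

# Crossing targets: d stays non-negative, so w can never decrease with i.
w2, d2 = multiplicative_lsp_fit([0.5, 0.4, 1.2], [1.0, 1.0, 1.0])
print(np.all(np.diff(w2) >= 0))  # True
```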

〔参考文献１〕Virtanen, IEEE Trans. Audio Speech Lang. Process., 15(3), 1066-1074, 2007.
When the velocity of the LSP parameters is defined as in Equation (0) of the background art, the multiplicative update rule of Equation (4) becomes specifically as follows:

When the constant a in Equation (0) of the background art is set to 0.5, the multiplicative update rule of Equation (4) becomes specifically as follows:

If the velocity of the LSP parameters is instead defined as follows,

then the multiplicative update rule of Equation (4) becomes specifically as follows:

The multiplicative update rules above use the GV, but the present invention can also be applied to embodiments that do not use the GV. For example, the velocity of the LSP parameters may be defined as follows,

and the multiplicative update rule may then be as follows. When such a rule without the GV is used, μ and U need not be stored in advance in the phoneme HMM storage unit 4.

T in the above multiplicative update rules is the total duration of the sequence of phoneme HMM states. It is calculated by the phoneme HMM state sequence generation unit 1.

In the above multiplicative update rules, d_i(t) and Δω̄_j(t) are undefined for t ≤ 0 or t > T. Any term containing d_i(t) or Δω̄_j(t) with t ≤ 0 or t > T is therefore set to 0. For example, consider A in Equation (5), the first concrete update rule given above. If the second term on its right-hand side contains d_l(0), the value of that term is taken as 0. Likewise, if the third term on its right-hand side contains d_l(T), the value of that term is taken as 0.

The initial values of d_i(t) are, for example, non-negative random numbers. The number of updates of d_i(t) is chosen according to the required accuracy and specifications. In general, the more updates are performed, the more accurate d_i(t) becomes. For example, updating continues until the value of Equation (3) changes by no more than a predetermined threshold (e.g., 10^−7) between the (K−1)-th update and the K-th update.

Since d_i(t) converges to a non-negative value, the following holds:

0 < ω̂_1(t) = d_1(t) < ω̂_2(t) = d_1(t) + d_2(t) < ... < ω̂_p(t) = d_1(t) + ... + d_p(t)

The method calculates d_i(t) instead of ω_i(t), then obtains ω̂_i(t) from d_i(t). The resulting ω̂_i(t) therefore always satisfies the ascending-order property while minimizing the error of Equation (3).

The LSP parameter calculation unit 6 uses the d_i(t) obtained by the differential LSP parameter calculation unit 5 to calculate the LSP parameters ω̂_i(t) by the following formula (step S3). The calculated ω̂_i(t) are sent to the speech synthesis unit 3.
$$ \hat{\omega}_i(t) = \sum_{j=1}^{i} d_j(t), \quad i = 1, \ldots, p $$

The speech synthesis unit 3 performs the same processing as the unit of the same name described in the background art. Based on the calculated LSP parameters ω̂_i(t), it generates speech corresponding to the input text (step S4).

Specifically, the speech synthesis unit 3 generates the speech signal s̃(t) as in the following equation. It convolves the input excitation signal u(t), based on the calculated vocal tract spectrum parameters. The excitation is a pulse train at the fundamental period for voiced sounds, or white noise for unvoiced sounds.
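The synthesis equation itself is not reproduced above. The sketch below is a rough, time-invariant illustration of the idea. It converts one frame of LSPs into all-pole (LPC) coefficients using the standard sum/difference-polynomial construction, then filters an excitation through 1/A(z).

The following are simplifying assumptions:
- even order p;
- a single fixed frame;
- unit gain;
- a hypothetical 80-sample pulse period.

A practical synthesizer would update or interpolate the filter frame by frame and would choose pulse train or white noise per frame. The function names are hypothetical.

```python
import numpy as np


def lsp_to_lpc(lsp):
    """Convert ascending LSPs (radians, even order p) to [1, a_1, ..., a_p] of A(z).

    Uses A(z) = (P(z) + Q(z)) / 2 with P(z) = (1 + z^-1) * prod over odd-indexed w_k
    and Q(z) = (1 - z^-1) * prod over even-indexed w_k of (1 - 2 cos(w_k) z^-1 + z^-2).
    """
    w = np.asarray(lsp, dtype=float)
    if w.size % 2:
        raise ValueError("this sketch assumes an even LSP order p")
    P = np.array([1.0, 1.0])
    Q = np.array([1.0, -1.0])
    for k, wk in enumerate(w):
        quad = np.array([1.0, -2.0 * np.cos(wk), 1.0])
        if k % 2 == 0:       # w_1, w_3, ... (1-based odd indices)
            P = np.convolve(P, quad)
        else:                # w_2, w_4, ...
            Q = np.convolve(Q, quad)
    return (0.5 * (P + Q))[: w.size + 1]   # the z^-(p+1) terms cancel


def all_pole_filter(a, excitation):
    """s(n) = u(n) - sum_{k=1}^{p} a_k s(n-k), i.e. filtering u through 1 / A(z)."""
    a = np.asarray(a, dtype=float)
    u = np.asarray(excitation, dtype=float)
    p = a.size - 1
    s = np.zeros(u.size)
    for n in range(u.size):
        acc = u[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * s[n - k]
        s[n] = acc
    return s


# Hypothetical voiced frame: pulse train with an 80-sample fundamental period.
a = lsp_to_lpc([0.3, 0.7, 1.2, 1.9, 2.4, 2.9])
u = np.zeros(400)
u[::80] = 1.0
s = all_pole_filter(a, u)
print(np.all(np.abs(np.roots(a)) < 1.0))  # True: ascending LSPs give a stable 1/A(z)
```

With uniformly spaced LSPs ω_k = kπ/(p+1), the construction reduces to A(z) = 1. This makes a convenient sanity check.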

Adding the following parameters to the phoneme HMMs stored in the phoneme HMM storage unit 4 guarantees that d_i(t) < π. This further reduces the possibility that the synthesized speech signal diverges.

The phoneme HMM storage unit 4 may store phoneme HMMs expressed by the LSP parameters ω_i(t) instead of by ω̄_i(t), Δω̄_i(t), σ_{ω_i(t)}, and σ_{Δω_i(t)}. In this case, the phoneme HMM sequence generation unit 1 uses the LSP parameters ω_i(t) read from the phoneme HMM storage unit 4 to calculate ω̄_i(t), Δω̄_i(t), σ_{ω_i(t)}, and σ_{Δω_i(t)}. It then sends them to the differential LSP parameter calculation unit 5.

μ and U need not be stored in the phoneme HMM storage unit 4. In that case, if a multiplicative update rule that uses the GV is employed, the phoneme HMM sequence generation unit 1 calculates μ and U from the LSP parameters of the utterances of N sentences n (n = 1, ..., N). It then sends them to the differential LSP parameter calculation unit 5.

The processes described for the above apparatus and method need not be executed only in time series in the described order. They may also be executed in parallel or individually, depending on the processing capability of the executing apparatus or as needed.

When the processing means of the above apparatus are implemented by a computer, a program describes the processing content of the functions the apparatus should have. Executing this program on the computer implements the processing means of each apparatus on that computer.

The program describing this processing content can be recorded on any computer-readable recording medium, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

Each processing means may be implemented by executing a predetermined program on a computer. Alternatively, at least part of the processing may be implemented in hardware.

Other modifications may be made as appropriate without departing from the spirit of the present invention.

Description of Reference Signs
1 Phoneme HMM sequence generation unit
2 Vocal tract spectrum parameter calculation unit
3 Speech synthesis unit
4 Phoneme HMM storage unit
5 Differential LSP parameter calculation unit
6 LSP parameter calculation unit

Claims

A speech synthesis apparatus comprising:
a phoneme HMM storage unit that stores phoneme HMMs of a plurality of phonemes;
a phoneme HMM state sequence generation unit that reads, from the phoneme HMM storage unit, the phoneme HMM corresponding to each phoneme of an input text and generates a sequence of phoneme HMM states by concatenating the read phoneme HMMs;
a differential LSP parameter calculation unit that calculates the differential LSP parameters d_i(t) by repeatedly updating them with a multiplicative update rule, where p is the order of the LSP parameters, ω̂_i(t) (i = 1, 2, ..., p) are the LSP parameters, and the differential LSP parameters d_i(t) satisfy d_1(t) = ω̂_1(t) > 0 and d_j(t) = ω̂_j(t) − ω̂_{j−1}(t) > 0 (2 ≤ j ≤ p);
an LSP parameter calculation unit that uses the calculated differential LSP parameters d_i(t) to calculate the LSP parameters ω̂_i(t) defined by the following formula:
$$ \hat{\omega}_i(t) = \sum_{j=1}^{i} d_j(t) $$
and
a speech synthesis unit that generates speech corresponding to the input text based on the calculated LSP parameters ω̂_i(t).

The speech synthesis apparatus according to claim 1, wherein:
F(d_i(t)) is an evaluation function for obtaining the LSP parameters ω̂_i(t) that maximize the output probability;
∇F⁺(d_i(t)) is the polynomial consisting of the positive terms of the polynomial ∇F(d_i(t)), obtained by partially differentiating F(d_i(t)) with respect to d_i(t);
∇F⁻(d_i(t)) is the polynomial consisting of the negative terms of ∇F(d_i(t)); and
the multiplicative update rule is the following equation:


A speech synthesis method comprising:
a phoneme HMM state sequence generation step in which a phoneme HMM state sequence generation unit reads, from a phoneme HMM storage unit that stores phoneme HMMs of a plurality of phonemes, the phoneme HMM corresponding to each phoneme of an input text and generates a sequence of phoneme HMM states by concatenating the read phoneme HMMs;
a differential LSP parameter calculation step in which a differential LSP parameter calculation unit calculates the differential LSP parameters d_i(t) by repeatedly updating them with a multiplicative update rule, where p is the order of the LSP parameters, ω̂_i(t) (i = 1, 2, ..., p) are the LSP parameters, and the differential LSP parameters d_i(t) satisfy d_1(t) = ω̂_1(t) > 0 and d_j(t) = ω̂_j(t) − ω̂_{j−1}(t) > 0 (2 ≤ j ≤ p);
an LSP parameter calculation step in which an LSP parameter calculation unit uses the calculated differential LSP parameters d_i(t) to calculate the LSP parameters ω̂_i(t) defined by the following formula:
$$ \hat{\omega}_i(t) = \sum_{j=1}^{i} d_j(t) $$
and
a speech synthesis step in which a speech synthesis unit generates speech corresponding to the input text based on the calculated LSP parameters ω̂_i(t).

A program for causing a computer to function as each unit of the speech synthesis apparatus according to claim 1 or 2.