JP7125599B2

JP7125599B2 - Prosody control device, prosody control method and program

Info

Publication number: JP7125599B2
Application number: JP2018133062A
Authority: JP
Inventors: Hideharu Nakajima (中嶋秀治); Yoshinori Sagisaka (匂坂芳典); Kazuma Takada (高田一眞)
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2018-07-13
Filing date: 2018-07-13
Publication date: 2022-08-25
Anticipated expiration: 2038-07-13
Also published as: JP2020012867A

Description

Application of Article 30, Paragraph 2 of the Patent Act: At the Waseda University 2017 (Heisei 29) Master's Thesis and Graduation Thesis Presentation, Kazuma Takada gave a presentation titled "Analysis of Dialogue Prosody Based on Modalities of Judgment", disclosing technology related to the "prosody control device, prosody control method and program" invented by Hideharu Nakajima, Yoshinori Sagisaka, and Kazuma Takada.

The present invention relates to a prosody control device, a prosody control method, and a program that control prosody, in particular the fundamental frequency (voice pitch), in speech synthesis of dialogue speech.

Conventionally, analyses of the relationship between the intention or meaning of the text to be synthesized and the corresponding shape of fundamental frequency change (for example, Non-Patent Document 1), and control based on classification of those shapes (for example, Non-Patent Document 2), have been carried out.

Non-Patent Document 1: David Yoshikazu Oshima, "Correlation between Intonation Type and Final Particle Function in Japanese", International Development Forum 43, March 2013, pp. 47-63
Non-Patent Document 2: Kazuhiko Iwata, Tetsunori Kobayashi, "Experimental Study on Expression of Utterance Intent by Combining Sentence-Final Words and Tones for Improving Expressiveness of Dialogue Speech Synthesis", The Institute of Electronics, Information and Communication Engineers Transactions D, Vol. J100-D, No. 11, 2017, pp. 938-948

The methods described in Non-Patent Documents 1 and 2 focus on the words in the phrase to be synthesized, in particular the final particle at the end of the utterance, and classify and use the shapes of fundamental frequency change on that final particle. These methods split a phrase containing a final particle into the final particle and the element preceding it, and select a shape of fundamental frequency change from the correspondence between that combination and the meaning or intention conveyed by the phrase as a whole, thereby giving a plausible shape of fundamental frequency change.

For example, when the final particle is "ne" (ね), the preceding element is "tabete" (食べて), and the whole phrase is "tabete ne" (食べてね), the above methods select different shapes of fundamental frequency change depending on whether the phrase expresses a request or a command.

However, elements other than those examined in Non-Patent Documents 1 and 2 can also precede the final particle, and there is no knowledge about how such unexamined elements affect the phrase as a whole. This makes the fundamental frequency difficult to control. For example, the fundamental frequency rises in both "suru + rashii + yo" (する＋らしい＋よ) and "suru + mitaida + yo" (する＋みたいだ＋よ). The method of Non-Patent Document 1, however, indicates only the direction of the change and cannot select its degree. Likewise, between "suru + nichigainai + yo" (する＋にちがいない＋よ) and "suru + mitaida + yo", the fundamental frequency of "yo" falls sharply in the former and rises sharply in the latter. Because Non-Patent Document 2 provides no correspondence with fine-grained classes of intention or meaning, it is difficult to select the fundamental frequency appropriately. In the following description, elements such as "rashii" (らしい, "it seems"), "mitaida" (みたいだ, "it looks like"), and "nichigainai" (にちがいない, "it must be") that immediately precede a final particle are referred to as the change factor part.

An object of the present disclosure, made in view of the above, is to provide a prosody control device, a prosody control method, and a program capable of varied prosody control that follows even fine changes in the element preceding the final particle.

To solve the above problem, a prosody control device according to the present invention includes:
a language analysis unit that receives input of an expression including a phrase, divides the phrase into words, identifies a final particle and a verb part, and, if any part lies between the verb part and the final particle, sets that part as a change factor part;
a scaling unit that determines a scale value corresponding to the change factor part;
a scale value control amount conversion unit that converts the scale value into a prosody control amount; and
a prosody generation unit that controls the prosody of the expression based on the prosody control amount,
wherein, when the language analysis unit determines that the phrase contains no change factor part, the scale value control amount conversion unit sets a predetermined constant as the prosody control amount.

To solve the above problem, a prosody control method according to the present invention includes:
a step in which a language analysis unit receives input of an expression including a phrase, divides the phrase into words, identifies a final particle and a verb part, and, if any part lies between the verb part and the final particle, sets that part as a change factor part;
a step in which a scaling unit determines a scale value corresponding to the change factor part;
a step in which a scale value control amount conversion unit converts the scale value into a prosody control amount;
a step in which a prosody generation unit controls the prosody of the expression based on the prosody control amount; and
a step in which, when the language analysis unit determines that the phrase contains no change factor part, the scale value control amount conversion unit sets a predetermined constant as the prosody control amount.

According to the prosody control device, the prosody control method, and the program of the present disclosure, varied prosody control is possible that follows even fine changes in the element preceding the final particle.

FIG. 1 is a functional block diagram of the prosody control device of the present embodiment.
FIG. 2 is a flowchart of the processing executed by the prosody control device of the present embodiment.
FIG. 3 is a diagram showing the method of scoring impression pairs in the present embodiment.
FIG. 4 is a diagram showing the correspondence table of the present embodiment.
FIG. 5 is a diagram showing the amount of increase in fundamental frequency and related values in the present embodiment.

The present invention is described more specifically below with reference to the drawings.

FIG. 1 is a functional block diagram of the prosody control device D of the present embodiment. The prosody control device D includes a language analysis unit 1, a scaling unit 2, a scale value control amount conversion unit 3, and a prosody generation unit 4. The arrows in FIG. 1 indicate the direction in which information flows. The functions of the prosody control device D are described below; this description is not intended to exclude other functions that the prosody control device D may have.

The processing performed by the language analysis unit 1, the scaling unit 2, the scale value control amount conversion unit 3, and the prosody generation unit 4 is executed by one or more processors (not shown) such as CPUs (Central Processing Units). The processor may include one or more memories that store programs for various kinds of processing and information being computed. The memory includes volatile memory and non-volatile memory, and includes both memory independent of the processor and memory built into the processor. The processor includes general-purpose processors that load a specific program to execute a specific function, and dedicated processors specialized for specific processing.

Various kinds of information and/or programs for operating the prosody control device D may be stored in a storage unit inside or outside the prosody control device D. The storage unit consists of semiconductor memory, magnetic memory, or the like, and may also function as working memory.

The prosody control method executed by the prosody control device D is described below with reference to the flowchart of FIG. 2.

In step S1, the language analysis unit 1 receives input of an expression including a phrase. Using an existing morphological analysis technique, the language analysis unit 1 divides the phrase into words, assigns a part of speech to each word based on the analysis result, and identifies the final particle. Based on the assigned parts of speech, the language analysis unit 1 groups the verb stem and its conjugated ending, which form the main element of the predicate preceding the final particle (on the sentence-initial side), and sets them as the verb part. If any part remains between the verb part and the final particle, the language analysis unit 1 groups that part as the change factor part; that is, one or more of the words make up the change factor part. Instead of grouping in this way, the language analysis unit 1 may assign each word a final particle label, a verb part label, or a change factor part label.

For example, when the phrase is "surumitaidayo" (するみたいだよ), the language analysis unit 1 obtains three elements as the result of morphological analysis: "suru (する), verb, base form", "mitaida (みたいだ), auxiliary verb, base form", and "yo (よ), particle, final particle". Each element contains three pieces of information: from left to right, the surface form (the written string), the major part-of-speech category, and the minor part-of-speech category. In this phrase, the first word from the left, "suru", is the verb part; the third word from the left, "yo", is the final particle; and the word between them, "mitaida", is the change factor part. In other embodiments, depending on the morphological analyzer used, the language analysis unit 1 may divide the phrase into finer units.

Alternatively, the language analysis unit 1 may perform the division by labeling "suru" as the verb part, "mitaida" as the change factor part, and "yo" as the final particle.

In the present embodiment, the absence of a change factor part is denoted by φ. Accordingly, when the phrase is "suruyo" (するよ), the language analysis unit 1 divides it into "suru", "φ", and "yo".
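The division performed in step S1 can be sketched as follows. This is only an illustrative sketch: the `Morpheme` type, the `split_phrase` function, and the English tag strings are hypothetical names, the morphemes are written by hand rather than produced by a real morphological analyzer, and the rule "the first verb is the verb part" is a simplifying assumption that the text does not state.

```python
from typing import List, NamedTuple, Tuple


class Morpheme(NamedTuple):
    surface: str    # surface form (the written string)
    pos_major: str  # major part-of-speech category
    pos_minor: str  # minor part-of-speech category


PHI = "φ"  # marker for "no change factor part", as in the description


def split_phrase(morphemes: List[Morpheme]) -> Tuple[str, str, str]:
    """Return (verb part, change factor part or PHI, final particle).

    Assumption: the analyzer emits the whole verb as one morpheme, and the
    first verb before the final particle is the verb part.
    Raises ValueError if no final particle or no verb is found.
    """
    particle_idx = max(
        i for i, m in enumerate(morphemes) if m.pos_minor == "final particle"
    )
    verb_idx = min(
        i for i, m in enumerate(morphemes[:particle_idx]) if m.pos_major == "verb"
    )
    # Everything strictly between the verb part and the final particle
    # is grouped as the change factor part.
    between = morphemes[verb_idx + 1:particle_idx]
    change_factor = "".join(m.surface for m in between) if between else PHI
    return morphemes[verb_idx].surface, change_factor, morphemes[particle_idx].surface


with_factor = [
    Morpheme("する", "verb", "base form"),
    Morpheme("みたいだ", "auxiliary verb", "base form"),
    Morpheme("よ", "particle", "final particle"),
]
without_factor = [with_factor[0], with_factor[2]]
```

Returning φ as an ordinary string keeps the three-way split uniform, so the later stages can branch on it in the way step S2 describes.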

In step S2, the language analysis unit 1 determines whether the phrase contains a change factor part. If it does not, the language analysis unit 1 sends the final particle and the verb part to the scale value control amount conversion unit 3. If it does, the language analysis unit 1 sends the final particle, the verb part, and the change factor part to the scaling unit 2.

When the language analysis unit 1 determines in step S2 that a change factor part is present among the words, the scaling unit 2 determines, in step S3, a scale value corresponding to the change factor part. Specifically, the scaling unit 2 can obtain scale values for various phrases from scores assigned from viewpoints such as certainty (the degree of conviction), indication (the degree to which the utterance is an act of pointing something out), and assertion (the strength of the assertion). The scoring is done, for example, by one or more people who judge, by hand or otherwise, where the impression of each phrase falls between the paired impressions shown in FIG. 3. The scaling unit 2 can then use, for example, the average of those scores as the scale value.

Alternatively, the scaling unit 2 can associate scale values with expressions using a table such as the one shown in FIG. 4. The table is prepared, for example, by hand by one or more people.
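The averaging of rater scores and the table lookup described above can be sketched as follows. The scores, the 7-point range, and the function names are hypothetical; the contents of FIG. 3 and FIG. 4 are not reproduced in the text, so none of these numbers come from the patent.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical certainty scores from three raters (assumed 1-7 scale).
certainty_ratings: Dict[str, List[int]] = {
    "にちがいない": [7, 6, 5],
    "みたいだ": [3, 4, 2],
}


def build_scale_table(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the raters' scores for each change factor part."""
    return {expr: float(mean(scores)) for expr, scores in ratings.items()}


def lookup_scale_value(table: Dict[str, float], change_factor: str) -> float:
    """Look up the scale value of a change factor part (KeyError if unknown)."""
    return table[change_factor]


certainty_table = build_scale_table(certainty_ratings)
```

A plain table only covers expressions that were rated in advance, which is one reason the description goes on to model-based alternatives.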

Alternatively, to absorb differences in linguistic expression, the scaling unit 2 may use, for example, a Quantification Theory Type I model in which the change factor part is a categorical explanatory variable and the scale value is a continuous dependent variable. This allows the scaling unit 2 to determine the scale value corresponding to the change factor part.

In Quantification Theory Type I, the scaling unit 2 determines the value of the continuous dependent variable as a linear sum of categorical variables and their weight coefficients. The weight coefficients are computed from a large number of previously collected pairs of a dependent variable value and the corresponding categorical variables. The scaling unit 2 assigns one categorical variable to each word that occurs in the change factor parts of the various phrases described above. To determine the scale value for a given change factor part, the scaling unit 2 sets the categorical variables of the words contained in that change factor part to 1 and computes the scale value as the sum of products with their weights. General quantification theory procedures, such as estimating the weights from data, are disclosed in Reference 1 below and elsewhere, and are not described here.
Reference 1: Tsutomu Komazawa, "Quantification Theory and Data Processing" (数量化理論とデータ処理), Asakura Shoten, 1982
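The linear-sum computation described above can be sketched as a least-squares fit on 0/1 indicator variables. This is a minimal sketch under stated assumptions: the function names and the training data are hypothetical, and ordinary least squares via NumPy is used here as one common way of estimating the weights, not necessarily the procedure of Reference 1.

```python
import numpy as np


def fit_weights(samples, vocab):
    """Estimate one weight per word from (words, scale value) pairs.

    Each row of X holds 0/1 categorical variables, one per vocabulary word.
    """
    X = np.array([[1.0 if w in words else 0.0 for w in vocab] for words, _ in samples])
    y = np.array([score for _, score in samples], dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(vocab, weights))


def scale_value(words, weights):
    """Sum of products: variables of the contained words are 1, all others 0."""
    return float(sum(weights[w] for w in words))


# Hypothetical training data: words of the change factor part and a rated score.
vocab = ["らしい", "みたいだ", "にちがいない"]
samples = [
    (["らしい"], 4.0), (["らしい"], 5.0),
    (["みたいだ"], 3.0),
    (["にちがいない"], 7.0), (["にちがいない"], 6.0),
]
weights = fit_weights(samples, vocab)
```

With one word per sample, as in this toy data, each fitted weight is simply the mean score of that word; the linear sum becomes useful when a change factor part contains several words.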

Alternatively, the scaling unit 2 may convert each word of the change factor part into a real-valued vector representing that word and determine the scale value with a numerical converter (for example, a recurrent neural network) that receives the words of the change factor part in order from the beginning to the end of the sentence. Converting discrete words into real-valued vectors in this way is called word embedding. The process is disclosed in Reference 2 below and elsewhere, and is not described here.
Reference 2: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean, "Distributed Representations of Words and Phrases and their Compositionality", [Online], 2013, [retrieved June 1, 2018], arXiv:1310.4546, Internet <URL: https://arxiv.org/abs/1310.4546>

Instead of directly obtaining the certainty, indication, and assertion of the linguistic expression of the change factor part as described above, the scaling unit 2 can also derive scale values such as certainty, indication, and assertion from a general impression evaluation of the change factor part. For example, the scaling unit 2 may take quantitative judgments of which side of each word pair used in general impression evaluation the change factor part is closer to (for example, dull-sharp, clear-muddy, dark-bright, dirty-clean, pointed-rounded, heavy-light, hard-soft, flashy-plain, thick-thin, rough-smooth), average them over several raters, and use the result for scaling. The judgments may be given, for example, by hand. To obtain each scale value, the scaling unit 2 may use Quantification Theory Type I or a recurrent neural network in the same way as described above.

In step S4 of FIG. 2, the scale value control amount conversion unit 3 converts the scale value corresponding to the change factor part into a prosody control amount using an arbitrary conversion formula. In the present embodiment, the prosody control amount is, as one example, an amount of change in fundamental frequency: the change from the fundamental frequency at the end of the change factor part to the fundamental frequency of the final particle affected by that change factor part. If it is determined in step S2 that no change factor part is present, the scale value control amount conversion unit 3 may set the prosody control amount to a predetermined constant.
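The conversion in step S4, including the fallback to a predetermined constant, can be sketched as follows. The text leaves the conversion formula open, so the linear form, its coefficients, the constant, and the function name are all hypothetical choices made for illustration.

```python
from typing import Optional

DEFAULT_CONTROL_AMOUNT = 0.0  # hypothetical "predetermined constant"


def to_control_amount(
    scale_value: Optional[float],
    slope: float = -0.05,     # hypothetical coefficient
    intercept: float = 0.2,   # hypothetical coefficient
) -> float:
    """Convert a scale value into an amount of F0 change.

    None stands for "no change factor part" (φ); in that case the
    predetermined constant is returned instead of applying the formula.
    """
    if scale_value is None:
        return DEFAULT_CONTROL_AMOUNT
    return slope * scale_value + intercept
```

The negative slope merely mimics the tendency described for FIG. 5, where expressions of high certainty go with a falling fundamental frequency; it is not a value taken from the patent.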

FIG. 5 shows an example of the amount of increase in fundamental frequency ("F0 increase (log)" in the figure) for each expression ("modality of judgment" in the figure). Along the horizontal axis of FIG. 5, the expressions of the change factor part are arranged from left to right in descending order of certainty, one of the scale values (see the line graph). For each expression, bars show the amount of increase in fundamental frequency in dialogue speech and in read-aloud speech.

For example, for the expression "nichigainai" (にちがいない) at the far left of FIG. 5, the amount of increase in fundamental frequency is about -0.18 in dialogue speech and about -0.08 in read-aloud speech. That is, the fundamental frequency falls from the "i" (い) at the end of the expression to the final particle immediately following it (for example, "yo"), so the voice becomes lower.
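To give a feel for these numbers, the sketch below converts a log-domain increase into a frequency in hertz. The figure labels the values as logarithmic but the text does not state the base, so the natural logarithm is an assumption here, as are the 200 Hz starting value and the function name.

```python
import math


def apply_log_increase(f0_hz: float, log_increase: float) -> float:
    """Shift an F0 value by an increase expressed in the (assumed natural) log domain."""
    return f0_hz * math.exp(log_increase)


# Hypothetical F0 of 200 Hz at the end of "nichigainai".
dialogue_f0 = apply_log_increase(200.0, -0.18)    # roughly 167 Hz
read_aloud_f0 = apply_log_increase(200.0, -0.08)  # roughly 185 Hz
```

Under this assumption the final particle would be about 16% lower than the preceding mora in dialogue speech and about 8% lower in read-aloud speech.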

In step S4 of FIG. 2, the scale value control amount conversion unit 3 controls prosody by converting the scale value into an amount of fundamental frequency increase given by the height of the corresponding bar, for dialogue speech or read-aloud speech, using for example the values shown in FIG. 5. Alternatively, the scale value control amount conversion unit 3 may set the amount of fundamental frequency variation in the change factor part, or the amount of variation in the verb part, as the prosody control amount.

The scale value control amount conversion unit 3 can control prosody using the command response model of Reference 3 below.
Reference 3: H. Fujisaki and S. Nagashima, "A model for the synthesis of pitch contours of connected speech", Annual Report of the Engineering Research Institute, Faculty of Engineering, University of Tokyo, 1969, pp. 53-60

In this case, the scale value control amount conversion unit 3 may use multiple control amounts, such as the magnitudes of the phrase components or accent components used by the command response model and the times at which they begin and cease to take effect.
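The patent does not give the equations of the command response model. The sketch below uses the commonly cited later formulation of the Fujisaki model, in which log F0 is a base value plus impulse-driven phrase components and step-driven accent components. The parameter values (alpha = 3/s, beta = 20/s, gamma = 0.9) are typical textbook values, and the function names and command tuples are hypothetical.

```python
import numpy as np


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism: alpha^2 * t * exp(-alpha t) for t >= 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)
    return np.where(t >= 0.0, alpha**2 * tp * np.exp(-alpha * tp), 0.0)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)
    g = 1.0 - (1.0 + beta * tp) * np.exp(-beta * tp)
    return np.where(t >= 0.0, np.minimum(g, gamma), 0.0)


def log_f0(t, base_hz, phrase_cmds, accent_cmds):
    """Natural-log F0 contour at times t (seconds).

    phrase_cmds: list of (onset time, magnitude)
    accent_cmds: list of (onset time, offset time, amplitude)
    """
    t = np.asarray(t, dtype=float)
    y = np.full_like(t, np.log(base_hz))
    for t0, ap in phrase_cmds:
        y += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        y += aa * (accent_response(t - t1) - accent_response(t - t2))
    return y


# One hypothetical accent command from 0.3 s to 0.8 s with amplitude 0.4.
times = np.array([0.0, 0.5, 2.0])
contour = log_f0(times, 120.0, phrase_cmds=[], accent_cmds=[(0.3, 0.8, 0.4)])
```

In this formulation, the control amounts named in the text map onto the command magnitudes and the onset and offset times passed in `phrase_cmds` and `accent_cmds`.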

In either case, the scale value control amount conversion unit 3 converts the one or more scale values obtained by the scaling unit 2 into prosody control amounts. For this conversion, with the one or more scale values as explanatory variables and the prosody control amounts as dependent variables, it can use a linear or nonlinear multiple regression model that converts each control amount individually, or a neural network that converts them all at once.
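The linear multiple regression option can be sketched as follows for a single control amount. The function names are hypothetical, the three explanatory variables stand for certainty, indication, and assertion, and ordinary least squares with an intercept is assumed as the fitting method.

```python
import numpy as np


def fit_regression(scale_values, control_amounts):
    """Fit control = w . scale_values + b by least squares; returns [w..., b]."""
    X = np.asarray(scale_values, dtype=float)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    y = np.asarray(control_amounts, dtype=float)
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, scale_value_vector):
    """Convert one vector of scale values into a prosody control amount."""
    x1 = np.append(np.asarray(scale_value_vector, dtype=float), 1.0)
    return float(x1 @ coef)
```

One such model would be fitted per control amount (for example, one for the amount of F0 increase and one for an onset time), which corresponds to the "individually" case in the text; the neural network alternative would instead predict all control amounts jointly.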

In step S5 of FIG. 2, the prosody generation unit 4 generates prosody based on the control amount obtained by the scale value control amount conversion unit 3. For example, when the amount of fundamental frequency increase is used as the control amount, the prosody generation unit 4 can control the fundamental frequency within the phrase so that it rises (the voice becomes higher) if the amount is positive and falls (the voice becomes lower) if the amount is negative.
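One simple way to realize this rise or fall is to generate frame-level F0 values for the final particle that move gradually toward the target. The sketch below interpolates linearly in the log domain; the interpolation scheme, the natural-log assumption, and the function name are illustrative choices, not details taken from the patent.

```python
import math
from typing import List


def final_particle_contour(start_f0_hz: float, log_increase: float, n_frames: int) -> List[float]:
    """F0 values for the final particle, one per frame.

    start_f0_hz is the F0 at the end of the change factor part. The contour
    moves in equal log-domain steps and ends at start * exp(log_increase).
    """
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    return [
        start_f0_hz * math.exp(log_increase * (i + 1) / n_frames)
        for i in range(n_frames)
    ]


rising = final_particle_contour(200.0, 0.2, 5)     # positive amount: pitch rises
falling = final_particle_contour(200.0, -0.18, 5)  # negative amount: pitch falls
```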

When the control amounts of the command response model of Reference 3 are used, the prosody generation unit 4 can control the change in voice pitch through the magnitudes of the phrase and accent components, and can control how gradually or sharply the fundamental frequency rises through the onset times of those components.

As described above, according to the present embodiment, the prosody control device D is capable of varied prosody control that follows even fine changes in the element preceding the final particle. Specifically, the prosody control device D takes into account the influence of each part of the expression making up the phrase, namely the verb part, the final particle, and the change factor part between them, first converts them into one or more scale values, and then converts the scale values into prosody control amounts. This makes it possible to control the degree of prosodic change flexibly according to fine-grained intention or meaning.

Further, according to the present embodiment, the language analysis unit 1 determines whether the phrase contains a change factor part, and when it determines that the phrase contains none, the scale value control amount conversion unit 3 sets a predetermined constant as the control amount. Flexible and varied prosody control is therefore possible for a phrase regardless of whether it contains a change factor part.

The embodiment described above is an example. It will be apparent to those skilled in the art that many modifications and substitutions can be made to the embodiment within the spirit and scope of the invention. The present disclosure should therefore not be construed as limited by the embodiment described above, and various modifications and changes are possible without departing from the scope of the claims. For example, several of the blocks shown in the configuration diagram of the embodiment may be combined into one, or one block may be divided.

The device of the present invention can also be realized by a computer and a program, and the program can be recorded on a recording medium or provided over a network. For example, when the prosody control device D is implemented on a computer, a program describing the processing that realizes each function is stored in a storage means inside or outside the computer, and the central processing unit (CPU) of the computer reads and executes the program so that the computer performs those functions. Such a program can be distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM, or by storing it in a storage means of a server on a network and transferring it from the server to other computers over the network. A computer that executes such a program can, for example, first store the program recorded on the portable recording medium, or the program transferred from the server, in its own storage means. As another mode of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or may execute processing according to the received program each time the program is transferred to it from the server.

D Prosody control device
1 Language analysis unit
2 Scaling unit
3 Scale value control amount conversion unit
4 Prosody generation unit

Claims

1. A prosody control device comprising:
a language analysis unit that receives input of an expression including a phrase, divides the phrase into words, identifies a final particle and a verb part, and, if any part lies between the final particle and the verb part, sets that part as a change factor part;
a scaling unit that determines a scale value corresponding to the change factor part;
a scale value control amount conversion unit that converts the scale value into a prosody control amount; and
a prosody generation unit that controls the prosody of the expression based on the prosody control amount,
wherein, when the language analysis unit determines that the phrase contains no change factor part, the scale value control amount conversion unit sets a predetermined constant as the prosody control amount.

2. A prosody control method comprising:
a step in which a language analysis unit receives input of an expression including a phrase, divides the phrase into words, identifies a final particle and a verb part, and, if any part lies between the final particle and the verb part, sets that part as a change factor part;
a step in which a scaling unit determines a scale value corresponding to the change factor part;
a step in which a scale value control amount conversion unit converts the scale value into a prosody control amount;
a step in which a prosody generation unit controls the prosody of the expression based on the prosody control amount; and
a step in which, when the language analysis unit determines that the phrase contains no change factor part, the scale value control amount conversion unit sets a predetermined constant as the prosody control amount.

3. A program for causing a computer to function as the prosody control device according to claim 1.