JP6633466B2

JP6633466B2 - Pause length control device, pause length control method, and program

Info

Publication number: JP6633466B2
Application number: JP2016137889A
Authority: JP
Inventors: 秀治中嶋; 裕司青野
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2016-07-12
Filing date: 2016-07-12
Publication date: 2020-01-22
Anticipated expiration: 2036-07-12
Also published as: JP2018010095A

Description

The present invention relates to speech synthesis technology, and more particularly to a technique for controlling the length of the pause inserted between sentences in synthesized speech.

In recent years, the introduction of corpus-based and statistical methods has improved the naturalness of the synthesized speech for each individual sentence. However, no method for controlling the length of the pause between sentences has been established. Speech analysis shows that the length of the pause between sentences is not constant. To bring synthesized speech content, in which multiple synthesized utterances are arranged with pauses of moderate, natural length between them, closer to speech content created by human narration, a technique for appropriately controlling the length of the pause between sentences is required.

Non-Patent Document 1 provides findings relevant to controlling the length of pauses between sentences. Its analysis, based on the layout structure of documents, shows that 1) the pause between a title and the body text that follows is long when the title is a sentence, but behaves differently when the title is a phrase, and 2) the pause length between sentences in bulleted items takes a nearly constant value. Based on these findings, the length of the pauses between such sentences can be controlled.

Non-Patent Document 1: 中嶋秀治 (Hideharu Nakajima), 宮崎昇 (Noboru Miyazaki), 阪内澄宇 (Sumitaka Sakauchi), "Analysis of Pause Length in Speech Addressed to Elderly People," Proceedings of the Autumn Meeting of the Acoustical Society of Japan, 3-Q-49, September 2015.

However, the findings of Non-Patent Document 1 alone cannot control the length of pauses between sentences within a paragraph, because such pauses cannot be distinguished by the document layout structure.

In view of the above, an object of the present invention is to provide a pause length control technique that can appropriately control the length of inter-sentence pauses based on the structure of each sentence to be synthesized.

To solve the above problem, a pause length control device according to the present invention includes: a model storage unit that stores an inter-sentence pause length prediction model, which outputs a pause length between sentences according to a feature amount extracted from a sentence dependency analysis result; a feature amount extraction unit that extracts the feature amount of each sentence included in a synthesis target document from the dependency analysis result of that sentence; and a pause length prediction unit that uses the inter-sentence pause length prediction model to predict, from the feature amounts of the sentences included in the synthesis target document, the pause length for each inter-sentence position in the document.

According to the present invention, the length of the pause between sentences can be predicted solely from the structural information of the sentences before and after each pause position. As a result, the presence and length of pauses can convey the structure of the document to the listener, making the content easier to understand.

FIG. 1 is a diagram illustrating the functional configuration of the pause length control device.

FIG. 2 is a diagram illustrating the processing flow of the pause length control method.

FIG. 3 is a diagram illustrating an example of the inter-sentence pause length prediction model.

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference numerals, and redundant description is omitted.

The pause length control device of the embodiment takes as input the dependency analysis result of each sentence of a document that contains multiple sentences and is to be speech-synthesized (hereinafter, the synthesis target document), and outputs the inter-sentence pause lengths for the synthesized speech generated from that document. As shown in FIG. 1, the pause length control device 1 includes a model storage unit 10, a feature amount extraction unit 11, and a pause length prediction unit 12. The pause length control method of the embodiment is realized by the pause length control device 1 performing the processing of each step described below.

The pause length control device 1 is, for example, a special device configured by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main storage device (RAM: Random Access Memory), and the like. The pause length control device 1 executes each process under the control of the central processing unit, for example. Data input to the pause length control device 1 and data obtained in each process are stored, for example, in the main storage device, and are read out as needed for use in other processes. At least part of each processing unit of the pause length control device 1 may be implemented in hardware such as an integrated circuit. Each storage unit of the pause length control device 1 can be implemented, for example, by a main storage device such as RAM; an auxiliary storage device composed of a hard disk, an optical disk, or a semiconductor memory element such as flash memory; or middleware such as a relational database or a key-value store.

The processing procedure of the pause length control method of the embodiment is described with reference to FIG. 2.

The dependency analysis result of each sentence of the synthesis target document is input to the pause length control device. The dependency analysis result may be one between the phrases (bunsetsu) that make up each sentence, or one between the accent phrases that make up each sentence. Any existing dependency analysis technique may be used. The input dependency analysis result is passed to the feature amount extraction unit 11.

The model storage unit 10 stores an inter-sentence pause length prediction model that outputs a pause length between sentences according to the feature amount extracted from a sentence dependency analysis result. The feature amount extracted from the dependency analysis result and the inter-sentence pause length prediction model are described in detail below.

The feature amount extracted from the dependency analysis result is described first. From the dependency relations between the phrases (bunsetsu) that make up each of the sentences on either side of an inter-sentence position, that is, a pause position, the number of phrase boundaries at which a phrase does not depend on the immediately following phrase is counted as the indirect dependency count. When the input is a dependency analysis result between accent phrases, the indirect dependency count is counted from the dependency relations between accent phrases instead. The indirect dependency count may be extracted not only from the sentences immediately before and after the inter-sentence position, but also from multiple sentences within a fixed window: the sentences one further away, two further away, and so on. That is, the range from the N-th sentence before to the M-th sentence after the inter-sentence position whose pause length is to be predicted is set as a predetermined window, and the set of (N + M) indirect dependency counts extracted from the sentences in that window is used as the feature amount. This feature amount can be expressed by the following equation.
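The equation itself does not survive in this text. Based only on the definitions given around it (N sentences before and M sentences after the inter-sentence position, with x as the indirect dependency count), one plausible reconstruction of the feature amount is the following tuple; the exact typesetting in the original publication may differ.

```latex
\left( x_{-N},\ \dots,\ x_{-1},\ x_{1},\ \dots,\ x_{M} \right)
```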

Here, x denotes the indirect dependency count of a sentence, and the subscript denotes the position relative to the inter-sentence position to be predicted, which is taken as 0. That is, x_-1 is the indirect dependency count of the sentence immediately before the inter-sentence position to be predicted, x_1 is that of the sentence immediately after, x_-N is that of the N-th sentence before, and x_M is that of the M-th sentence after. N and M may be the same or different.
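As an illustration of how the indirect dependency count of a single sentence might be computed, here is a minimal Python sketch. The input representation is an assumption made for this example, not something the text specifies: a list `heads` in which `heads[i]` is the index of the phrase that phrase `i` depends on, with `-1` for the sentence-final root. The function name `indirect_dependency_count` is likewise hypothetical. The final phrase is skipped because no phrase boundary follows it within the sentence, which is also an interpretation.

```python
from typing import List


def indirect_dependency_count(heads: List[int]) -> int:
    """Count phrase boundaries where the left phrase does not depend on
    the phrase immediately to its right.

    heads[i] is the index of the phrase that phrase i depends on
    (assumed encoding; -1 marks the root, normally the last phrase).
    """
    # Boundary i lies between phrase i and phrase i + 1.
    return sum(1 for i in range(len(heads) - 1) if heads[i] != i + 1)


# Hypothetical five-phrase sentence: phrase 0 depends on phrase 4,
# every other phrase depends on its right-hand neighbour.
example_heads = [4, 2, 3, 4, -1]
count = indirect_dependency_count(example_heads)
```

With this encoding, only the first boundary in `example_heads` is counted, because phrase 0 skips over its neighbour and depends on phrase 4.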

The inter-sentence pause length prediction model is, for example, a regression equation with the feature amounts as explanatory (input) variables and the pause length to be predicted as the dependent (output) variable. Letting Y be the pause length at the inter-sentence position to be predicted, that is, the position between x_-1 and x_1, the inter-sentence pause length prediction model that predicts Y can be expressed as the following linear equation.
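The linear equation is also missing from this text. Given that the explanatory variables are the indirect dependency counts x, each with a coefficient a, plus a constant term b, a reconstruction consistent with those definitions would be the following; the index notation is an assumption.

```latex
Y = \sum_{i=-N}^{-1} a_{i}\, x_{i} + \sum_{j=1}^{M} a_{j}\, x_{j} + b
```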

Here, a is the coefficient of each explanatory variable x, and b is a constant term. The coefficients and the constant term of the linear equation can be computed in advance by the least squares method from a large amount of training data in which indirect dependency counts are paired with pause lengths. Instead of the linear regression equation, a nonlinear regression equation with the same input and output variables, or a machine-learned regression model such as a regression tree or a neural network, may be used.
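The least squares fit described here could be sketched as follows in plain Python, solving the normal equations by Gauss-Jordan elimination. This is only an illustrative sketch: the function names `fit_linear` and `predict` are hypothetical, and the training pairs are synthetic values invented for the example, not data from the patent.

```python
from typing import List, Sequence, Tuple


def fit_linear(xs: Sequence[Sequence[float]],
               ys: Sequence[float]) -> Tuple[List[float], float]:
    """Ordinary least squares via the normal equations (A^T A) w = A^T y,
    where each row of A is a feature vector with a trailing 1 for b."""
    rows = [list(x) + [1.0] for x in xs]
    d = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(d):
            if k != col:
                factor = m[k][col] / m[col][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    w = [m[i][d] / m[i][i] for i in range(d)]
    return w[:-1], w[-1]  # coefficients a, constant term b


def predict(coeffs: Sequence[float], intercept: float,
            x: Sequence[float]) -> float:
    """Evaluate Y = sum(a_i * x_i) + b for one feature vector."""
    return sum(a * v for a, v in zip(coeffs, x)) + intercept


# Synthetic training pairs, invented for illustration only:
# features are (x_-1, x_1); targets follow Y = 0.1*x_-1 + 0.05*x_1 + 1.0.
train_x = [(1, 2), (2, 1), (3, 4), (4, 2), (5, 5)]
train_y = [0.1 * a + 0.05 * b + 1.0 for a, b in train_x]
coeffs, intercept = fit_linear(train_x, train_y)
```

Because the synthetic targets are exactly linear, the fit recovers the generating coefficients up to floating-point error. Real pause data would be noisy and would need far more training pairs.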

The inter-sentence pause length prediction model may also be, for example, a step function whose only argument is the feature amount obtained from the sentence immediately before the inter-sentence position to be predicted. FIG. 3 shows an example of such a step function. In FIG. 3, the horizontal axis is the feature amount, that is, the indirect dependency count obtained from the sentence immediately before the inter-sentence position, and the vertical axis is the pause length corresponding to that count. With the step function, for example, as shown in FIG. 3, an indirect dependency count of 1 or 2 is assigned one pause length (1.25 seconds), and a count of 3 to 5 is assigned another, common pause length (1.45 seconds).
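A step function of this kind might be sketched as a small lookup table. The two bands (1 to 2 mapping to 1.25 seconds, 3 to 5 mapping to 1.45 seconds) come from the FIG. 3 description above. The text does not say what happens for counts outside 1 to 5, so the `default` fallback below is an assumption, and the names `STEP_BANDS` and `step_pause_length` are hypothetical.

```python
from typing import List, Optional, Tuple

# (lowest count, highest count, pause length in seconds).
# Both bands are taken from the FIG. 3 description.
STEP_BANDS: List[Tuple[int, int, float]] = [
    (1, 2, 1.25),
    (3, 5, 1.45),
]


def step_pause_length(count: int,
                      default: Optional[float] = None) -> Optional[float]:
    """Return the pause length for the indirect dependency count of the
    sentence immediately before the pause.

    Counts outside every band return `default`; this fallback is an
    assumption, since the source gives no value for such counts.
    """
    for low, high, pause in STEP_BANDS:
        if low <= count <= high:
            return pause
    return default
```

For example, `step_pause_length(2)` falls in the first band and `step_pause_length(4)` in the second, while `step_pause_length(0)` returns the caller-supplied default.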

The inter-sentence pause length prediction model can also set the pause length probabilistically. For example, the height on the vertical axis of FIG. 3 for indirect dependency counts of 1 and 2 is taken as the mean of the pause lengths for those counts, their variance is also obtained, and the pause length is modeled as a probability distribution, such as a normal distribution, with that mean and variance.
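The probabilistic variant could be sketched by drawing the pause length from a normal distribution with Python's standard `random` module. The mean of 1.25 seconds is the FIG. 3 value for counts 1 and 2. The variance of 0.01 is a placeholder invented for this example, and clamping negative draws to a minimum is an added safeguard that the text does not mention. The function name `sample_pause_length` is hypothetical.

```python
import random


def sample_pause_length(mean: float, variance: float,
                        rng: random.Random,
                        minimum: float = 0.0) -> float:
    """Draw a pause length (seconds) from N(mean, variance).

    The draw is clamped to `minimum` so that a pause can never be
    negative; the clamp is an assumption, not part of the source.
    """
    return max(minimum, rng.gauss(mean, variance ** 0.5))


rng = random.Random(0)  # fixed seed so runs are repeatable
# Mean from FIG. 3 (counts 1 and 2); variance is a made-up placeholder.
pause = sample_pause_length(mean=1.25, variance=0.01, rng=rng)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which can be useful when the same document must be synthesized identically more than once.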

In step S1, the feature amount extraction unit 11 extracts the feature amount of each sentence included in the input synthesis target document from the dependency analysis result of that sentence. Table 1 shows an example of the feature amounts extracted from each sentence.

The sentence number is the index of each sentence in the synthesis target document, and the feature amount is the feature amount corresponding to that sentence. In the example of Table 1, the indirect dependency count of each sentence is set as the feature amount. When the indirect dependency counts of the sentences in the predetermined window are included in the feature amount, for a sentence with no preceding sentences, such as sentence 1 at the beginning of the document, the N preceding indirect dependency counts are set to, for example, 0. The extracted feature amounts are sent to the pause length prediction unit 12.
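The windowing with zero padding described above might look like the following sketch. Here `counts` holds the indirect dependency count of each sentence in document order, and `boundary` is the 0-based index of the sentence immediately before the pause; both conventions, and the name `window_features`, are assumptions for this example. The source mentions padding with 0 only for missing preceding sentences, so applying the same padding to missing following sentences is also an assumption.

```python
from typing import List


def window_features(counts: List[int], boundary: int,
                    n_before: int, m_after: int,
                    pad: int = 0) -> List[int]:
    """Build (x_-N, ..., x_-1, x_1, ..., x_M) for the pause that
    follows sentence `boundary` (0-based).

    Positions that fall outside the document are filled with `pad`.
    """
    def at(index: int) -> int:
        return counts[index] if 0 <= index < len(counts) else pad

    # x_-j is the j-th sentence before the pause: counts[boundary - j + 1].
    before = [at(boundary - j + 1) for j in range(n_before, 0, -1)]
    # x_j is the j-th sentence after the pause: counts[boundary + j].
    after = [at(boundary + j) for j in range(1, m_after + 1)]
    return before + after


# Hypothetical counts for a four-sentence document.
doc_counts = [2, 4, 1, 3]
first_pause = window_features(doc_counts, boundary=0, n_before=2, m_after=2)
```

For the pause after the first sentence with N = M = 2, the slot for x_-2 has no sentence to draw from and is therefore padded.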

In step S2, the pause length prediction unit 12 reads the inter-sentence pause length prediction model stored in the model storage unit 10 and uses it to predict, from the feature amounts of the sentences included in the synthesis target document, the pause length for each inter-sentence position in the document. Table 2 shows an example of the pause lengths predicted from the example feature amounts in Table 1.

In the example of Table 2, each predicted pause length gives, in seconds, the length of the pause immediately after the sentence in the same row. That is, the pause between sentence 1 and sentence 2 is 1.2 seconds, and the pause between sentence 2 and sentence 3 is 1.3 seconds.

The pause length control device outputs the pause lengths predicted by the pause length prediction unit 12 for the inter-sentence positions of the synthesis target document. Synthesizing speech for the synthesis target document using these pause lengths produces synthesized speech with pauses of appropriate length inserted.
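How a downstream synthesizer consumes the predicted pause lengths is not specified here. As one hypothetical arrangement, the sentences and the predicted pauses could be interleaved into a simple playback schedule, as sketched below. The `("speak", ...)` and `("pause", ...)` tuple format and the name `build_schedule` are invented for illustration. The pause values 1.2 and 1.3 seconds are the two values quoted from Table 2.

```python
from typing import List, Sequence, Tuple, Union

Step = Tuple[str, Union[str, float]]


def build_schedule(sentences: Sequence[str],
                   pauses: Sequence[float]) -> List[Step]:
    """Interleave sentences with the pause (seconds) that follows each.

    pauses[i] is the pause between sentences[i] and sentences[i + 1],
    so a document of n sentences needs exactly n - 1 pauses.
    """
    if len(pauses) != max(len(sentences) - 1, 0):
        raise ValueError("expected one pause per inter-sentence position")
    schedule: List[Step] = []
    for i, sentence in enumerate(sentences):
        schedule.append(("speak", sentence))
        if i < len(pauses):
            schedule.append(("pause", pauses[i]))
    return schedule


# Placeholder sentence labels; pause values as quoted from Table 2.
schedule = build_schedule(["sentence 1", "sentence 2", "sentence 3"],
                          [1.2, 1.3])
```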

With this configuration, the pause length control device of the present invention can predict the length of the pause between sentences solely from the structural information of the sentences before and after each pause position. As a result, the presence and length of pauses can convey the structure of the document to the listener, making the content easier to understand.

Embodiments of the present invention have been described above, but the specific configuration is not limited to these embodiments; it goes without saying that design changes made as appropriate without departing from the spirit of the invention are included in the invention. The various processes described in the embodiments need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the device executing them, or as needed.

[Program, recording medium]
When the various processing functions of each device described in the above embodiment are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on a computer realizes the various processing functions of each device on that computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or transferred from the server computer in its own storage device. When executing a process, the computer reads the program stored in its own recording medium and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or may execute processing according to the received program each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, which does not transfer the program from the server computer to this computer and realizes the processing functions only through execution instructions and result acquisition. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

Description of reference numerals
1 Pause length control device
10 Model storage unit
11 Feature amount extraction unit
12 Pause length prediction unit

Claims

A pause length control device comprising:
a model storage unit that stores an inter-sentence pause length prediction model that outputs a pause length between sentences according to a feature amount extracted from a sentence dependency analysis result;
a feature amount extraction unit that extracts a feature amount of each sentence included in a synthesis target document from a dependency analysis result of each sentence included in the synthesis target document; and
a pause length prediction unit that predicts, using the inter-sentence pause length prediction model, a pause length for each inter-sentence position in the synthesis target document from the feature amounts of the sentences included in the synthesis target document,
wherein the inter-sentence pause length prediction model is a step function whose only argument is the feature amount extracted from the sentence immediately before the inter-sentence position to be predicted.

A pause length control device comprising:
a model storage unit that stores an inter-sentence pause length prediction model that outputs a pause length between sentences according to a feature amount extracted from a sentence dependency analysis result;
a feature amount extraction unit that extracts a feature amount of each sentence included in a synthesis target document from a dependency analysis result of each sentence included in the synthesis target document; and
a pause length prediction unit that predicts, using the inter-sentence pause length prediction model, a pause length for each inter-sentence position in the synthesis target document from the feature amounts of the sentences included in the synthesis target document,
wherein the feature amount is a combination, over a predetermined number of sentences before and after each inter-sentence position, of indirect dependency counts, each indirect dependency count being the number of phrase boundaries, among the phrases (bunsetsu) constituting a sentence, at which a phrase does not depend on the immediately following phrase.

A pause length control device comprising:
a model storage unit that stores an inter-sentence pause length prediction model that outputs a pause length between sentences according to a feature amount extracted from a sentence dependency analysis result;
a feature amount extraction unit that extracts a feature amount of each sentence included in a synthesis target document from a dependency analysis result of each sentence included in the synthesis target document; and
a pause length prediction unit that predicts, using the inter-sentence pause length prediction model, a pause length for each inter-sentence position in the synthesis target document from the feature amounts of the sentences included in the synthesis target document,
wherein the feature amount is a combination, over a predetermined number of sentences before and after each inter-sentence position, of indirect dependency counts, each indirect dependency count being the number of accent phrase boundaries, among the accent phrases constituting a sentence, at which an accent phrase does not depend on the immediately following accent phrase.

A pause length control method in which:
a model storage unit stores an inter-sentence pause length prediction model that outputs a pause length between sentences according to a feature amount extracted from a sentence dependency analysis result;
a feature amount extraction unit extracts a feature amount of each sentence included in a synthesis target document from a dependency analysis result of each sentence included in the synthesis target document; and
a pause length prediction unit predicts, using the inter-sentence pause length prediction model, a pause length for each inter-sentence position in the synthesis target document from the feature amounts of the sentences included in the synthesis target document,
wherein the inter-sentence pause length prediction model is a step function whose only argument is the feature amount extracted from the sentence immediately before the inter-sentence position to be predicted.

A pause length control method in which:
a model storage unit stores an inter-sentence pause length prediction model that outputs a pause length between sentences according to a feature amount extracted from a sentence dependency analysis result;
a feature amount extraction unit extracts a feature amount of each sentence included in a synthesis target document from a dependency analysis result of each sentence included in the synthesis target document; and
a pause length prediction unit predicts, using the inter-sentence pause length prediction model, a pause length for each inter-sentence position in the synthesis target document from the feature amounts of the sentences included in the synthesis target document,
wherein the feature amount is a combination, over a predetermined number of sentences before and after each inter-sentence position, of indirect dependency counts, each indirect dependency count being the number of phrase boundaries, among the phrases (bunsetsu) constituting a sentence, at which a phrase does not depend on the immediately following phrase.

A pause length control method in which:
a model storage unit stores an inter-sentence pause length prediction model that outputs a pause length between sentences according to a feature amount extracted from a sentence dependency analysis result;
a feature amount extraction unit extracts a feature amount of each sentence included in a synthesis target document from a dependency analysis result of each sentence included in the synthesis target document; and
a pause length prediction unit predicts, using the inter-sentence pause length prediction model, a pause length for each inter-sentence position in the synthesis target document from the feature amounts of the sentences included in the synthesis target document,
wherein the feature amount is a combination, over a predetermined number of sentences before and after each inter-sentence position, of indirect dependency counts, each indirect dependency count being the number of accent phrase boundaries, among the accent phrases constituting a sentence, at which an accent phrase does not depend on the immediately following accent phrase.

A program for causing a computer to function as the pause length control device according to any one of claims 1 to 3.