JP3589972B2

JP3589972B2 - Speech synthesizer

Info

Publication number: JP3589972B2
Application number: JP2000312354A
Authority: JP
Inventors: 英二小松
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2000-10-12
Filing date: 2000-10-12
Publication date: 2004-11-17
Anticipated expiration: 2020-10-12
Also published as: JP2002123281A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声合成のテキスト解析における、単語の表記の処理に関する。
【０００２】
【従来の技術】
文章を伝達媒体で分類すると、文字により表現する文字言語と、音声による音声言語に分けられる。また、文章を文体で分類すると、文語（文語体、書き言葉）と、口語（口語体、話し言葉）がある。文語は、主に文字言語で用いられ、口語は、主に音声言語で用いられる。
【０００３】
文字言語においては、通常文語が用いられるが、電子メールなどでは、口語が用いられることも多い。このような場合、韻律を文字で表す場合が多い。文献［１］では、単語を強調する方法についての分類がなされている。発明者が、収集した例文に対して、上記の方法を当てはめたところ、多くの表記が強調として説明できることが分かった。
【０００４】
図２８は、文献［１］の強調の分類に、それぞれの場合に用いられる表記法を、発明者が追加したものである。
【０００５】
図２６は、従来の音声合成装置の構成図である。文入力部００１から日本語の文章を入力すると、言語解析部００２で言語的な解析が行われる。まず、語彙解析部００３が言語解析に不要な文字などを削除、置換し、入力を文に分割する。次に、形態素解析部００４で形態素解析を行い、入力文を単語に区切る処理を行うと共に、取り出した単語から入力文の分割として最も妥当と判定する単語列を選択する。単語辞書００６からの単語の取り出しは、辞書検索処理部００５が行う。
【０００６】
図２７は、入力文の単語辞書検索結果の一部である。図のように、入力文の各文字位置において、そこの文字を先頭とする、入力文の部分文字列を単語見出しとする単語を単語辞書から取出す。辞書にない単語については、未知語処理部００７が未知語を推定して、単語を作成する。
【０００７】
形態素解析終了後、構文解析部００８が構文解析を行う。アクセント結合・連濁処理部００９は、形態素解析と構文解析の結果を用いて、単語辞書から得られたアクセントに対して、前後の単語とのアクセント結合処理を行い、読み上げの際の基本単位となるアクセント句を作成する。韻律処理部０１０は、アクセント句の列に対してフレーズ指令やポーズ指令を設定する。合成パラメータ部０１１は、言語解析部の解析結果を基に、音声合成を制御するのに必要なパラメータの時系列を生成する。音声波形生成部０１２は、制御パラメータをＤ／Ａ変換して、音声波形を生成する。音声出力部０１３は、音声波形を音声として出力する。
【０００８】
形態素解析部００４、辞書検索処理部００５における処理のアルゴリズムとしては、文献［２］に示されるようなものがある。
【０００９】
文献［１］：講座─日本語と日本語教育２、日本語の音声・音韻（上）、明治書院、ｐ３１６〜３４２
文献［２］：岩波講座─言語の科学３、単語と辞書、岩波書店、ｐ５５〜７３
【００１０】
【発明が解決しようとする課題】
図２８の１に示されるように、「発音による強調法」を行い、強調された単語の表記を作成すると、発音の異なる複数の単語を作り出せる。さらに、それらの単語の表記は一意でないため、上記の強調法により作り出される単語は非常に多い。例えば、「すごい」に対して、「すっごい」、「すごーい」、「すっごーい」、「すんごい」、「すーごい」、「すごぉーい」等の単語を作ることができる。
【００１１】
以下、本発明における、説明のための用語として、「すっごい」のように、ある単語に強調を行い、表記変更がなされた結果できた単語を「強調形」、「すごい」のように、特に強調しない場合に用いられる単語を「中立形」と呼ぶことにする。
【００１２】
上述のように、１つの中立形に対して、強調形の数は非常に多く、強調形をすべて単語辞書に登録することは事実上不可能である。このため、通常は、中立形、及び、頻度が高い強調形のみを単語辞書に登録する。この結果、強調形が多用される文章、特に、口語体等の文章については、入力文の解析精度が低くなり、音声合成が不自然になるという欠点があった。
【００１３】
また、「すっごーい」のように強調形の単語を単語辞書に登録した場合についても、強調形であることを考慮しないため、中立形の韻律で音声合成しており、合成音が不自然であるという欠点があった。
【００１４】
さらに、図２８の２．及び文献［１］で述べられているように、中立形が既に強調の意味を持っている「強調語」も存在し、このような語についても、自然な合成音が得られなかった。
【００１５】
本発明では、単語辞書を格納するメモリの記憶容量の増大を抑制すると共に、強調形及び強調語を用いることにより、韻律の制御を柔軟にし、より使い勝手のよい、音声合成装置を提供することを目的とする。
【００１６】
【課題を解決するための手段】
この発明に係る音声合成装置は、強調表記された強調形単語を含む文章を入力する文入力手段と、入力された文章の言語解析を行う言語解析手段と、言語解析手段による言語解析の結果に対して韻律情報を設定する韻律処理手段と、韻律処理手段により韻律情報が設定された言語解析の結果を基に音声合成パラメータを生成する合成パラメータ生成手段と、音声合成パラメータに基づいて音声波形を生成する音声波形生成手段と、音声波形を音声として出力する音声出力手段とを備えた音声合成装置であって、言語解析手段は、入力された文章から言語解析に不要な文字を削除、置換すると共に、不要な文字が削除、置換された文書を文に分割する語彙解析手段と、表記中立化規則の様々な適用を行い単語辞書から取り出した単語同士の接続の可能性を検証することにより、語彙解析手段によって分割された文における強調形の表記を中立形の表記に変換すると共に文を単語に分割する形態素解析手段と、表記中立化規則の適用状況に応じて、形態素解析手段により分割された単語の読みに対して読み修正規則を適用して強調形の読みに修正すると共に、形態素解析手段により分割された単語の韻律に対して韻律修正規則を適用して強調した韻律に修正し、韻律処理主手段に出力する強調表記処理手段とを備えている。
【００２０】
【発明の実施の形態】
以下、本発明の音声合成装置の実施の形態（実施形態）について図面を参照しながら詳細に説明する。
【００２１】
＜第１の実施形態＞
＜構成＞
図１はこの発明の第１の実施形態を示す構成図である。１０１は、日本語の文章を入力する文入力部、１０２は、入力文の言語的な解析を行なう言語解析部、１０３は、言語解析に不要な文字などを削除、置換し、入力を文に分割する語彙解析部、１０４は、入力文を単語に分割する形態素解析部、１０５は、単語辞書を検索する辞書検索処理部、１０６は、言語解析部で用いられる言語情報を格納する単語辞書である。
【００２２】
１０７は、辞書検索の検索文字列を変更するための表記中立化規則である。表記中立化規則は、強調形の表記を中立形の表記に変更し、辞書に登録されている単語の表記とマッチさせるための規則である。図２は、表記中立化規則で用いられる、強調形の表記に関係する文字の分類である。図３は、辞書検索処理部１０５で用いる表記中立化規則である。
【００２３】
１０８は、入力された文書中の単語が単語辞書にない場合に、未知語を推定し、単語候補を作成する未知語処理部である。
【００２４】
１０９は、形態素解析結果の単語について、表記中立化規則１０７の適用結果を参照して、検索時とは逆に、取出した単語の読みを入力文中の強調形の読みに近くなるように修正し、さらに、強調した韻律に修正する強調表記処理部である。１１０は、強調表記処理部で用いる読み修正規則を格納する読み修正規則格納部である。図６は、読み修正規則である。
【００２５】
１１１は、強調表記処理部で用いる韻律修正規則を格納する韻律修正規則格納部である。図８は、韻律修正規則である。韻律修正規則に於ける「アクセント強調規則」は、入力文で、強調形が用いられている場合に、アクセントを強くするための規則であり、「速度遅延規則」は、強調形が用いられている場合に、発声速度を遅くして、強調の効果を生じさせる規則である。
【００２６】
１１２は、形態素解析結果を用いて構文解析を行う構文解析部、１１３は、アクセント結合、連濁処理等を行う、アクセント・連濁処理部、１１４は、言語解析部の出力にフレーズ、ポーズ等の韻律情報を設定する韻律処理部、１１５は、音声合成制御パラメータを生成する合成パラメータ生成部、１１６は、制御パラメータをＤ／Ａ変換して、音声波形を生成する音声波形生成部、１１７は、音声波形を音声として出力する音声出力部である。
【００２７】
＜動作＞
文入力部１０１から日本語の文章を入力すると、言語解析部１０２で言語的な解析が行われる。まず、語彙解析部１０３が、言語解析に不要な文字を削除、置換し、入力を文に分割する。次に、形態素解析部１０４で形態素解析を行い、入力文を単語に区切る。形態素解析のアルゴリズムに関わらず、入力文の各文字位置において、その文字位置から始まる、入力文の部分文字列を見出しとする単語を、まとめて単語辞書１０６から取り出す処理が行われる。形態素解析部１０４は、単語辞書１０６から取り出した単語同士の接続の可能性を検証して、入力文の単語分割として、最も妥当と思われる単語列を決定し、出力する。単語辞書１０６から取り出した単語には、見出し、品詞、接続情報等、形態素解析に必要な情報以外に、音声合成に必要な各種情報が付与されている。
【００２８】
単語辞書１０６からの単語の取り出しは、辞書検索処理部１０５が行う。本発明では、単語辞書検索で用いる検索文字列として、入力文字列の部分列を用いて、図２７で示されるような辞書検索を行う以外に、図２に示される文字分類、及び、表記中立化規則１０７により文字を置換えた文字列を用いて検索を行う。図３は、辞書検索部で用いる表記中立化規則である。
【００２９】
図４は、辞書検索のフローチャートである。Ｓｅａｒｃｈ（）は、図２７に示したような、入力文の部分文字列となる単語を得る手続きであるとする。例えば、Ｓ＝”郵送しまぁ〜〜す。”、ｎ＝９、ｉ＝４とする。ステップＳ１でＤ（Ｓ（４））＝φとなる。ステップＳ２で、Ｓ（４）＝”まぁ〜〜す。”として、Ｓｅａｒｃｈ（Ｓ（４））を実行すると、Ｄ（Ｓ（４））＝｛ま（感動詞、…）、ま（副詞、…）｝を得る。ステップＳ３で、Ｓ（４）の長音相当文字連鎖、促音連鎖、撥音の連鎖をそれぞれ１つにまとめ、長音相当文字を標準表記に置換えると、Ｓ（４）＝”まぁーす”となる。ステップＳ４で、Ｓ２＝Ｓ（４）＝”まぁーす”とする。ステップＳ５で、長音削除規則、促音規則なし、撥音規則なしの組み合わせを選ぶ。ステップＳ６で未適用の組み合わせが有ったか否かを判定し、組み合わせが有った場合、ステップＳ７で、その規則をＳ２に適用すると、Ｓ２＝”まぁす。”となる。ステップＳ８で、Ｓｅａｒｃｈ（Ｓ２）を実行すると、Ｄ（Ｓ２）＝｛ま（感動詞、…）、ま（副詞、…）｝となる。ステップＳ９で、Ｄ（Ｓ（４））＝Ｄ（Ｓ（４））∪Ｄ（Ｓ２）＝｛ま（感動詞、…）、ま（副詞、…）｝となり、ステップＳ４に戻る。尚、前記ステップＳ６で未適用の組み合わせが無かった場合は処理を終了する。
【００３０】
ステップＳ４で、Ｓ２＝Ｓ（４）＝”まぁーす”とする。ステップＳ５で、長音削除規則、促音削除規則、撥音規則なしの組み合わせを選ぶ。この場合ステップＳ７で選んだ規則をＳ２に適用すると、Ｓ２＝”ます。”となる。ステップＳ８で、Ｓｅａｒｃｈ（Ｓ２）を実行すると、Ｄ（Ｓ２）＝｛ます（普通名詞、…）、ます（助動詞、…）、ま（感動詞、…）、ま（副詞、…）｝となる。ステップＳ９で、Ｄ（Ｓ（４））＝Ｄ（Ｓ（４））∪Ｄ（Ｓ２）＝｛ます（普通名詞、…）、ます（助動詞、…）、ま（感動詞、…）、ま（副詞、…）｝となる。
【００３１】
以下、規則の組み合わせを変えて、ステップＳ４〜ステップＳ９を実行するが、Ｄ（Ｓ（４））は変化せず、規則の組み合わせがなくなり、ステップＳ６の判定結果がＮＯとなり、処理が終了する。
【００３２】
結果として、Ｄ（Ｓ（４））＝｛ます（普通名詞、…）、ます（助動詞、…）、ま（感動詞、…）、ま（副詞、…）｝を得る。
【００３３】
未知語処理部１０８は、辞書検索で用いた、入力文の部分文字列の先頭部分が未知語である可能性があるため、未知語の候補をいくつか作成し、辞書検索結果に追加する。
【００３４】
辞書から取り出された、「ます（助動詞）」が、最終的な形態素解析結果の一部として得られるかどうかは、形態素解析のアルゴリズムによるが、この例では、辞書引き結果に含まれる単語の集合は、強調形が用いられていない文「郵送します。」の辞書引き結果と同じであり、「郵送します」が正しく解析できる限り、「郵送しまぁ〜〜す。」についても正しい結果が得られる。
【００３５】
図５は、本発明による形態素解析結果の一例を示したものである。強調表記処理部１０９で必要になるため、辞書から取出された単語には、適用された表記中立化規則と、規則を適用した文字位置の情報を記憶しておく。また、表記中立化規則が適用される前の単語見出しも記憶しておく。複数の規則が適用された場合は、複数の規則を記憶する。
【００３６】
形態素解析終了後、強調表記処理部１０９では、まず、読み修正規則格納部１１０に格納された読み修正規則を用いて、形態素解析結果の読みを修正する。図６は、読み修正規則を示したものである。図７は、読み修正規則の適用例である。
【００３７】
強調表記処理部１０９は、次に、韻律修正規則格納部１１１に格納された韻律修正規則により、韻律の修正を行う。図９は、韻律修正規則の適用例である。図９において、まず、「郵送しまぁ〜〜す。」という文を解析して得られた形態素解析結果に、読み修正規則を適用する。形態素解析結果に含まれる表記中立化規則の適用状況から、「ます」の読みが変更される。次に、韻律修正規則を適用する。形態素解析結果に含まれる表記中立化規則の適用状況から、「ます」にアクセント強化規則を適用する。さらに、入力文では、「ます」の長音化部分の表記として、「〜〜」が用いられているため、速度遅延規則が適用される。この結果、図９に示されるように、「ます」のアクセントと遅延速度が変更される。
【００３８】
形態素解析の出力を用いて、構文解析部１１２が構文解析を行う。アクセント結合・連濁処理部１１３は、形態素解析と構文解析の結果を用いて、単語辞書１０６から得られたアクセントに対して、前後の単語とのアクセント結合処理を行い、読み上げの際の基本単位となるアクセント句を作成する。韻律処理部１１４は、アクセント句の列に対してフレーズ指令やポーズ指令を設定する。
【００３９】
合成パラメータ生成部１１５は、言語解析部１０２の解析結果を基に、音声合成を制御するのに必要なパラメータの時系列を生成する。音声波形生成部１１６は、制御パラメータをＤ／Ａ変換して、音声波形を生成する。音声出力部１１７は、音声波形を音声として出力する。
【００４０】
＜第２の実施形態＞
＜構成＞
図１０は、第２の実施形態の構成図である。第１の実施形態における、単語辞書１０６の代わりに、強調情報が付与されている単語辞書２０１を用いていること、及び、強調語処理部２０２、強調語処理規則格納部２０３が設けられている点が異なる。
【００４１】
図１１は、単語辞書に付与されている強調情報の例である。強調単語情報は、単語が常に強調される語彙であることを示す情報である。Ｅ１ならば、強調単語である。読み変更フラグは、既に、強調により、読みを変更している単語かどうかを示すフラグである。ＯＮならば、変更済み、ＯＦＦならば、未変更である。読み変更情報は、その単語が強調されたときに、どのように読みを変更するかを記述した情報である。形容詞「すごい」については、読みが「スゴーイ」、「スッゴイ」、「スンゴイ」のように変化できることを示している。読み変更情報で生じる読みは、そのまま、単語見出しの変化に用いることもできる。単語辞書は通常語幹部分を登録するが、図１２の強調語処理規則は、説明上、形容詞については語尾まで含めた形にしている。
【００４２】
＜動作＞
強調表記処理部１０９までは、実施形態１と同じ動作を行う。ただし、形態素解析で用いる単語辞書２０１には、強調情報が含まれているため、形態素解析結果にも、単語辞書から取出された強調情報が含まれている。強調語処理部２０２は、強調語処理規則格納部２０３の強調語処理規則を用いて、強調語の処理を行う。
【００４３】
図１３は、強調語が含まれる文の強調語処理の例を示したものである。「かなり」は、強調語として単語辞書に登録されているとする。強調表記処理部１０９の処理結果では、通常のアクセントが付けられている。強調語処理部２０２では、強調語処理規則を参照する。この場合、「かなり」に対して、「強調語アクセント強化規則」が適用され、「かなり」のアクセントが強くなる。
【００４４】
図１４は、ユーザが強調を指定したときの強調語処理の例を示したものである。「＜強調＞」と「＜／強調＞」の間の文字列を強調することが指定されているとする。図１３と同様に、「かなり」が、強調語として単語辞書に登録されているとする。強調表記処理部１０９の処理結果では、通常のアクセントが付けられている。強調語処理部２０２では、強調語処理規則を参照する。この場合、「かなり」に対して、「強調語アクセント強化規則」が適用され、「かなり」のアクセントが強くなり、韻律は、「カ＆＆ナリ」になる。さらに、「強調語読み変更規則」が適用される。「かなり」には、読み変更情報としてＡ４がついているため、第１拍に長音化が起き、韻律は「カ＆＆ーナリ」となる。
【００４５】
図１５は、ユーザが「強調」を指定したときの非強調語処理である。「＜強調＞」と「＜／強調＞」の間の文字列を強調することが指定されているとする。図１３、図１４の例と異なり、文章中に強調語がないとする。強調表記処理部の処理結果では、通常のアクセントが付けられている。強調語処理部は、強調語処理規則を参照する。この場合、「こんな」、「変な」、及び、「ひょっとして」の３つの単語についての処理を説明する。単語辞書から取出した強調語情報は、３単語とも、非強調語であることを示す「Ｅ０」である。そこで、３単語には、「非強調語強調規則」が適用される。この結果、アクセントを強くすると同時に、読みを修正し、図１５強調語処理結果のように、それぞれ、「コーンナ」、「ヘ＆＆ーンナ」、「ヒョーットシテ」という読み、及び、韻律（アクセント）が得られる。
【００４６】
＜第３の実施形態＞
＜構成＞
図１６はこの発明の第３の実施形態を示す構成図である。第２の実施形態と比べて、モジュール３０１〜３０５を備えている点が異なる。３０１は、形態素解析結果から、表記中立化が行われている単語を抽出する強調情報抽出部、３０２は、抽出した強調情報を格納する強調情報格納部、３０３は、３０２に格納された強調情報を辞書に追加するためのユーザインタフェースである強調情報管理部、３０４は、強調情報格納部３０２から強調情報を取出し、辞書に追加してよい情報かどうかをユーザが確認するための、強調情報確認部、３０５は、辞書に登録してよいと確認された強調情報を単語辞書に登録する、強調情報登録部である。
【００４７】
＜動作＞
文入力部１０１から入力された文章は、語彙解析部１０３で文に区切られ、形態素解析部１０４で形態素解析される。形態素解析結果は、例えば図５のようになっているものとする。１０９以降のモジュールでは、実施形態２と同じ動作がなされ、音声合成が行われる。
【００４８】
強調情報抽出部３０１は、形態素解析結果の各単語を走査し、表記中立化規則が適用されているかどうかを調べ、同規則が適用されている単語の情報だけを、強調情報格納部３０２に格納する。図１７は、抽出強調情報の例である。ここまでの処理を、文入力部から入力された、すべての文に対して行う。
【００４９】
文章中のすべての文について、上記の処理が終了した後、ユーザが強調情報管理部３０３を操作する。強調情報管理部３０３は、強調情報確認部３０４を動作させる。強調情報確認部３０４は、強調情報格納部３０２から、形態素解析結果を１単語ずつ取出す。以下、強調情報格納部３０２に、図１７のような結果が得られているとして説明する。
【００５０】
まず、「ます（助動詞、マ＆ス、長音相当文字削除規則、１文字目の直後、”まぁ〜〜す”）」が取出される。ユーザの画面に、単語の情報を表示する。図１８は、ユーザ表示画面の例である。中立形は、入力文の文字列を本装置の表記中立化規則により、単語辞書の登録単語とマッチングできるように変更した表記であり、強調形は、入力文の文字列に実際に現れた表記である。
【００５１】
表記中立化規則が誤って適用された場合には、誤った中立形が作成されることがあるため、ユーザに確認を促す。ユーザには、中立形と強調形以外に、原文、及び、形態素解析された品詞を表示して、判断を正確にさせる。上記の例では、「まぁ〜〜す」は、「ます」の強調形であると、ユーザが判断すれば、助動詞「ます」に、このような変化をする強調情報を追加するような指定がされる。
【００５２】
強調情報管理部３０３は、強調情報確認部３０４から、ユーザの確認が入力されると、強調情報登録部３０５を呼び出し、形態素解析結果を渡す。強調情報登録部３０５は、渡された形態素解析結果の表記中立化規則の適用結果を参照する。この場合、「まぁ〜〜す」から「ます」への変更に、「１文字目の直後に長音相当文字削除規則を適用」しているため、「ます」から「まぁ〜〜す」への変更は、「語頭拍の長音化」となり、単語辞書の「ます」に対して、図１１の強調情報の読み変更情報に示される、Ａ４（「語頭拍の長音化」）という情報を追加する。この結果、「ます」の強調情報は、図１９のようになる。
【００５３】
次に、図１７で示される形態素解析結果から、「しかり（動詞、シカリ、促音削除規則、１文字目の直後、”しっかり”）」が取出される。ユーザの画面に、「中立形：しかり」、「強調形：しっかり」、「品詞：動詞」、及び、入力文を表示する。この場合は、「しっかり」が、動詞「しかり」の強調形であることはない。ユーザが、この抽出強調情報を不適当なものと判断すれば、単語辞書への登録を指示しないため、抽出強調情報確認部は、抽出強調情報登録部を呼び出すことなく、次の形態素解析結果を、抽出強調情報格納部から取出し、上記のような処理を繰り返す。
【００５４】
抽出強調情報格納部のすべての形態素解析結果を処理すると、動作を終了する。
【００５５】
＜第４の実施形態＞
＜構成＞
図２０はこの発明の第４の実施形態を示す構成図である。第２の実施形態と比べて、強調表記作成部４０１と、強調表記出力部４０２を備えている点が異なる。図２１は、強調表記作成規則である。
【００５６】
＜動作＞
文入力部１０１〜音声出力部１１７まで、実施形態２と同じ動作をする。一方、強調語処理部２０２の処理が終了した時点で、解析結果を強調表記作成部に送る。
【００５７】
図２２は、強調語処理部２０２から、強調表記作成部４０１へ送られる情報の例である。各単語には、品詞、読み（アクセント位置、アクセントの強さを含む）、強調語かどうかを示す強調語情報（Ｅ０：非強調語、Ｅ１：強調語）、既に強調により表記の変更がなされているかどうかを示す読み修正情報（ＯＮ：読み修正済み、ＯＦＦ：読み未修正）、修正内容（「語頭拍長音化」等）の情報が付けられている。
【００５８】
強調表記作成部４０１は、強調表記作成規則を用いて、単語見出しを修正する。図２２の例では、「一体」、「どうして」、「こんな」、「だ」の読み修正フラグがＯＮであるため、図２１の「強調語表記作成規則」が適用される。この結果、「一体」は、読みを片仮名にしたものを単語見出しとし、「どうして」、「こんな」、「だ」については、読みを平仮名に直したものを単語見出しとする。さらに、「だ」の表記については、長音でなく「あ」を用いる。一方、「こと」、「に」、「なっ」、「た」、「ん」については、読み修正フラグがＯＦＦであるため、図２１の「非強調語表記作成規則」が適用され、入力文の表記をそのまま単語見出しとして用いる。この結果、図２３のような結果が得られる。強調表記作成部で得られた結果は、強調表記出力部４０２に送られる。
【００５９】
強調表記出力部４０２は、単語列の単語見出しを羅列して、文を作成し、音声出力部１１７が音声を出力するのと同時に、作成した文を出力して、入力文と共に、ユーザに表示する。図２４は、ユーザへの表示の例である。
【００６０】
図２５は、文の一部に強調が指定されている場合のユーザへの表示の例である。指定された部分だけが、強調されて音声合成され、その部分だけが、強調形で表示される。
【００６１】
尚、本発明は前述の実施形態に限定されるものではなく、本発明の趣旨に基づいて種々の変形が可能である。例えば、第１の実施形態で用いた、文字種別以外に、「い、か、な、い」という文で用いられるような読点、或いは、「キレイ」のような仮名型表記に関する規則を使用することができる。また、第１の実施形態では、韻律として、アクセントだけを修正したが、入力文字の種類によって、ピッチを修正することもできる。
【００６２】
【発明の効果】
以上詳細に説明したように、請求項１記載の発明によれば、強調表記された強調形単語を含む文章を入力する文入力手段と、入力された文章の言語的な解析を行う言語解析手段と、言語解析手段による言語解析結果に対して韻律情報を設定する韻律処理手段と、音声合成パラメータを生成する合成パラメータ生成手段と、合成パラメータに基づく音声波形を生成する音声波形生成手段と、音声波形を音声として出力する音声出力手段とを備えた音声合成装置であって、前記言語解析手段は、入力文章から言語解析に不要な文字を削除、置換すると共に、入力文章を文に分割する語彙解析手段と、強調表記された単語を中立形単語に変換する規則である表記中立化規則を用いて、前記語彙解析手段により分割された文を、中立形単語のみを格納した単語辞書を参照することにより単語に分割する形態素解析手段と、前記形態素解析手段により分割された単語について、読み修正規則を参照することにより、中立形単語に変換された強調単語の読みを修正すると共に、韻律修正規則を参照して当該単語の韻律を強調した韻律に修正する強調表記処理手段とを備えた構成としたので、（１）強調を表すために表記を変化させた単語に対して、新たに単語を単語辞書に登録することなく、音声合成を行うことができる効果がある。また、（２）強調を表すために表記を変化させた単語に対して、強調された韻律で、自然な音声合成を行うことができるという効果がある。
【００６３】
また、請求項２記載の発明によれば、請求項１に記載の音声合成装置において、前記言語解析手段における前記単語辞書は、当該単語が強調語であるか否かの強調情報を付与した単語辞書であり、前記言語解析手段は更に、前記形態素解析手段により当該単語が強調単語であると判定された場合に、強調語に対する強調処理規則を参照することにより当該強調語の韻律を修正する強調語処理手段を備えた構成としたので、（１）強調語の韻律を強調することにより、自然な韻律の音声合成音を得ることができる効果がある。また、（２）強調形の韻律を、強調することにより、自然な韻律の音声合成音を得ることができる効果がある。
【００６４】
また、請求項４に記載の発明によれば、請求項２に記載の音声合成装置において、更に、入力文章中の全ての文から抽出された単語について、表記中立化規則が適用されている単語のみを抽出して強調情報格納手段に格納する強調情報抽出手段と、ユーザの操作により、前記強調情報格納手段から単語を読みだして当該単語に関する情報を表示させ、ユーザが強調形単語と判断した単語のみを当該単語の強調情報を付与して前記単語辞書に登録する強調情報管理手段とを備えた構成としたので、単語を強調したときの変化に関する辞書情報を、簡便に得ることができるという効果がある。
【００６５】
また、請求項５に記載の発明によれば、請求項２に記載の音声合成装置において、更に、前記強調語処理手段から出力される各単語情報に対して、強調表記作成規則を適用することにより音声の韻律を表記に反映した文を作成して入力文と共に表示する強調表記作成手段を備えた構成としたので、合成音の韻律の強調の変化を文字で表示することができる効果がある。
【図面の簡単な説明】
【図１】第一の実施形態を示す装置の構成図である。
【図２】強調形の表記で用いられる文字の分類図である。
【図３】辞書検索部で用いる表記中立化規則である。
【図４】辞書検索のフローチャートである。
【図５】形態素解析結果の例である。
【図６】読み修正規則の例である。
【図７】読み修正規則の適用例である。
【図８】韻律修正規則の例である。
【図９】韻律修正規則の適用例である。
【図１０】第二の実施形態を示す装置の構成図である。
【図１１】強調情報の例である。
【図１２】強調語処理規則の例である。
【図１３】強調語処理（強調語・ユーザ強調非指定）の例である。
【図１４】強調語処理（強調語・ユーザの強調指定）の例である。
【図１５】強調語処理（非強調語・ユーザの強調語指定）の例である。
【図１６】第三の実施形態を示す装置の構成図である。
【図１７】抽出強調情報の例である。
【図１８】ユーザ表示画面の表示例である。
【図１９】単語辞書への読み変更情報の追加の例である。
【図２０】第四の実施形態を示す装置の構成図である。
【図２１】強調表記作成規則の例である。
【図２２】強調表記作成規則部へ送られる情報の例である。
【図２３】強調表記作成規則の適用結果の例である。
【図２４】ユーザへの表示例（文全体の強調）である。
【図２５】ユーザへの表示例（部分的な強調）である。
【図２６】従来の装置の構成図である。
【図２７】単語辞書検索の例である。
【図２８】強調の表記法を示す図である。
【符号の説明】
１０１文入力部
１０２言語解析部
１０３語彙解析部
１０４形態素解析部
１０５辞書検索処理部
１０６単語辞書
１０７表記中立化規則
１０８未知語処理部
１０９強調表記処理部
１１０読み修正規則格納部
１１１韻律修正規則格納部
１１２構文解析部
１１３アクセント結合・連濁処理部
１１４韻律処理部
１１５合成パラメータ生成部
１１６音声波形生成部
１１７音声出力部
２０１単語辞書（強調情報付き）
２０２強調語処理部
２０３強調語処理規則格納部
３０１強調情報抽出部
３０２強調情報格納部
３０３強調情報管理部
３０４強調情報確認部
３０５強調情報登録部
４０１強調表記作成部
４０２強調表記出力部

[0001]
[Technical field of the invention]
The present invention relates to the processing of word notation in the text analysis stage of speech synthesis.
[0002]
[Prior art]
Classified by transmission medium, text falls into written language, expressed in characters, and spoken language, expressed in speech. Classified by style, there is the literary style (bungo, the written style) and the colloquial style (kogo, the spoken style). The literary style is used mainly in written language, and the colloquial style mainly in spoken language.
[0003]
In written language the literary style is normally used, but in e-mail and similar media the colloquial style is also common. In such cases prosody is often expressed through the characters themselves. Document [1] classifies the methods of emphasizing words. When the inventor applied that classification to collected example sentences, many of the notations could be explained as emphasis.
[0004]
FIG. 28 shows the emphasis classification of Document [1], to which the inventor has added the notation used in each case.
[0005]
FIG. 26 is a configuration diagram of a conventional speech synthesizer. When a Japanese sentence is input from the sentence input unit 001, a linguistic analysis is performed by the linguistic analysis unit 002. First, the vocabulary analysis unit 003 deletes and replaces characters and the like unnecessary for language analysis, and divides the input into sentences. Next, the morphological analysis unit 004 performs a morphological analysis to divide the input sentence into words, and selects a word string determined to be most appropriate as a division of the input sentence from the extracted words. The retrieval of words from the word dictionary 006 is performed by the dictionary search processing unit 005.
[0006]
FIG. 27 shows part of the word dictionary search result for an input sentence. As shown in the figure, at each character position of the input sentence, the words whose headword is a substring of the input sentence beginning at that character are retrieved from the word dictionary. For words that are not in the dictionary, the unknown word processing unit 007 estimates the unknown word and creates a word entry.
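The position-by-position lookup described for FIG. 27 can be sketched as follows. This is only an illustration under stated assumptions: the dictionary is a toy mapping from headword to parts of speech, the function names `search` and `lookup_all_positions` are hypothetical, and the entries for "郵送" and "し" are invented for the example.

```python
# Toy word dictionary: headword -> list of parts of speech (assumed entries).
TOY_DICT = {
    "郵送": ["普通名詞"],
    "し": ["動詞"],
    "ま": ["感動詞", "副詞"],
    "ます": ["普通名詞", "助動詞"],
}

def search(text, dictionary):
    """Return (headword, part of speech) pairs whose headword is a prefix of text."""
    hits = []
    for length in range(1, len(text) + 1):
        head = text[:length]
        for pos in dictionary.get(head, []):
            hits.append((head, pos))
    return hits

def lookup_all_positions(sentence, dictionary):
    """Run the prefix search at every character position of the sentence."""
    return {i: search(sentence[i:], dictionary) for i in range(len(sentence))}

result = lookup_all_positions("郵送します", TOY_DICT)
```

Here `result[3]` holds the candidates that start at "ま", which is the position used in the worked example of the first embodiment.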
[0007]
After the morphological analysis is completed, the syntax analysis unit 008 performs syntax analysis. The accent combination and rendaku processing unit 009 uses the results of the morphological analysis and the syntax analysis to combine the accents obtained from the word dictionary with those of the preceding and following words, creating accent phrases, which are the basic units of reading aloud. The prosody processing unit 010 sets phrase commands and pause commands for the sequence of accent phrases. The synthesis parameter unit 011 generates the time series of parameters needed to control speech synthesis, based on the analysis result of the language analysis unit. The audio waveform generation unit 012 D/A-converts the control parameters to generate an audio waveform. The audio output unit 013 outputs the audio waveform as audio.
[0008]
As an algorithm of processing in the morphological analysis unit 004 and the dictionary search processing unit 005, there is an algorithm shown in Document [2].
[0009]
Document [1]: Course: Japanese and Japanese Language Education 2, Japanese Speech and Phonology (Part 1), Meiji Shoin, pp. 316-342
Document [2]: Iwanami Course: Science of Language 3, Words and Dictionaries, Iwanami Shoten, pp. 55-73
[0010]
[Problems to be solved by the invention]
As shown in item 1 of FIG. 28, applying "emphasis by pronunciation" and writing down the emphasized word can produce several words with different pronunciations. Moreover, because the notation of each of those words is not unique, the number of words this emphasis method can produce is very large. For example, from "すごい" (sugoi) one can make "すっごい", "すごーい", "すっごーい", "すんごい", "すーごい", "すごぉーい", and so on.
[0011]
Hereinafter, for the purposes of this description, a word such as "すっごい" that results from emphasizing a word and changing its notation is called an "emphasized form", and a word such as "すごい" that is used when there is no particular emphasis is called a "neutral form".
[0012]
As described above, a single neutral form has a very large number of emphasized forms, and registering all of them in the word dictionary is practically impossible. For this reason, normally only the neutral forms and the high-frequency emphasized forms are registered. As a result, for text in which emphasized forms are used heavily, colloquial text in particular, the analysis accuracy of the input sentence falls and the synthesized speech becomes unnatural.
[0013]
Also, even when an emphasized form such as "すっごーい" is registered in the word dictionary, the fact that it is an emphasized form is not taken into account, so speech is synthesized with the prosody of the neutral form and the synthesized sound is unnatural.
[0014]
Further, as described in item 2 of FIG. 28 and in Document [1], there are also "emphasized words" whose neutral form already carries an emphatic meaning, and natural synthesized speech could not be obtained for such words either.
[0015]
An object of the present invention is to provide a more user-friendly speech synthesis device that suppresses growth in the storage capacity of the memory holding the word dictionary and makes prosody control more flexible by making use of emphasized forms and emphasized words.
[0016]
[Means for Solving the Problems]
A speech synthesis device according to the present invention comprises: sentence input means for inputting text that includes emphasized-form words written with emphasis notation; language analysis means for performing language analysis of the input text; prosody processing means for setting prosody information on the result of the language analysis by the language analysis means; synthesis parameter generation means for generating speech synthesis parameters based on the language analysis result on which the prosody processing means has set prosody information; speech waveform generation means for generating a speech waveform based on the speech synthesis parameters; and speech output means for outputting the speech waveform as speech. The language analysis means comprises: lexical analysis means for deleting or replacing characters unnecessary for language analysis in the input text and dividing the resulting text into sentences; morphological analysis means for converting emphasized-form notation in the sentences divided by the lexical analysis means into neutral-form notation and dividing each sentence into words, by applying the notation neutralization rules in various ways and verifying the possibility of connection between the words retrieved from the word dictionary; and emphasis notation processing means for, according to how the notation neutralization rules were applied, applying reading correction rules to the readings of the words divided by the morphological analysis means to correct them to emphasized-form readings, applying prosody modification rules to the prosody of those words to correct it to an emphasized prosody, and outputting the result to the prosody processing means.
[0020]
[Embodiments of the invention]
Hereinafter, embodiments of the speech synthesis device of the present invention will be described in detail with reference to the drawings.
[0021]
<First embodiment>
<Structure>
FIG. 1 is a configuration diagram showing a first embodiment of the present invention. Reference numeral 101 denotes a sentence input unit for inputting Japanese text; 102, a language analysis unit that performs linguistic analysis of the input sentence; 103, a vocabulary analysis unit that deletes or replaces characters and the like unnecessary for language analysis and divides the input into sentences; 104, a morphological analysis unit that divides the input sentence into words; 105, a dictionary search processing unit that searches the word dictionary; and 106, a word dictionary that stores the linguistic information used by the language analysis unit.
[0022]
Reference numeral 107 denotes a notation neutralization rule for changing a search character string in a dictionary search. The notation neutralization rule is a rule for changing the notation of the emphasized form to the notation of the neutral form and matching the notation of a word registered in the dictionary. FIG. 2 is a classification of characters related to the emphasis notation used in the notation neutralization rule. FIG. 3 shows a notation neutralization rule used in the dictionary search processing unit 105.
[0023]
An unknown word processing unit 108 estimates an unknown word and creates a word candidate when the word in the input document is not in the word dictionary.
[0024]
Reference numeral 109 denotes an emphasis notation processing unit. For each word in the morphological analysis result, it refers to the result of applying the notation neutralization rules 107 and, in the reverse direction to the search, corrects the reading of the retrieved word so that it approaches the reading of the emphasized form in the input sentence, and further corrects the prosody to an emphasized prosody. Reference numeral 110 denotes a reading correction rule storage unit that stores the reading correction rules used by the emphasis notation processing unit. FIG. 6 shows the reading correction rules.
[0025]
Reference numeral 111 denotes a prosody modification rule storage unit that stores the prosody modification rules used by the emphasis notation processing unit. FIG. 8 shows the prosody modification rules. Among them, the "accent emphasis rule" strengthens the accent when an emphasized form is used in the input sentence, and the "speed delay rule" slows the utterance speed when an emphasized form is used, producing the effect of emphasis.
[0026]
Reference numeral 112 denotes a syntax analysis unit that performs syntax analysis using the morphological analysis result; 113, an accent combination and rendaku processing unit that performs accent combination, rendaku processing and the like; 114, a prosody processing unit that sets prosody information such as phrases and pauses on the output of the language analysis unit; 115, a synthesis parameter generation unit that generates speech synthesis control parameters; 116, an audio waveform generation unit that D/A-converts the control parameters to generate an audio waveform; and 117, an audio output unit that outputs the audio waveform as audio.
[0027]
<Operation>
When Japanese text is input from the sentence input unit 101, the language analysis unit 102 performs linguistic analysis. First, the vocabulary analysis unit 103 deletes or replaces characters unnecessary for language analysis and divides the input into sentences. Next, the morphological analysis unit 104 performs morphological analysis and divides the input sentence into words. Whatever the morphological analysis algorithm, at each character position of the input sentence the words whose headword is a substring of the input sentence starting at that position are retrieved from the word dictionary 106 in a batch. The morphological analysis unit 104 verifies the possibility of connection between the words retrieved from the word dictionary 106, determines the word sequence judged most appropriate as the word division of the input sentence, and outputs it. In addition to the information needed for morphological analysis, such as headword, part of speech, and connection information, the words retrieved from the word dictionary 106 carry the various kinds of information needed for speech synthesis.
[0028]
The dictionary search processing unit 105 retrieves words from the word dictionary 106. In the present invention, in addition to the dictionary search shown in FIG. 27, which uses substrings of the input character string as search strings, a search is also performed using character strings in which characters have been replaced according to the character classification shown in FIG. 2 and the notation neutralization rules 107. FIG. 3 shows the notation neutralization rules used by the dictionary search unit.
[0029]
FIG. 4 is a flowchart of the dictionary search. Search() is a procedure that obtains the words matching substrings of the input sentence, as shown in FIG. 27. For example, let S = "郵送しまぁ〜〜す。", n = 9, and i = 4. In step S1, D(S(4)) = φ. In step S2, with S(4) = "まぁ〜〜す。", executing Search(S(4)) yields D(S(4)) = {ま (interjection, ...), ま (adverb, ...)}. In step S3, each run of long-vowel-equivalent characters, each run of geminate consonant marks (sokuon), and each run of moraic nasals (hatsuon) in S(4) is collapsed into a single character, and the long-vowel-equivalent characters are replaced by the standard notation, giving S(4) = "まぁーす". In step S4, S2 = S(4) = "まぁーす". In step S5, the combination "long-vowel deletion rule, no sokuon rule, no hatsuon rule" is selected. Step S6 determines whether an unapplied combination was found; if so, in step S7 that rule is applied to S2, giving S2 = "まぁす。". In step S8, executing Search(S2) yields D(S2) = {ま (interjection, ...), ま (adverb, ...)}. In step S9, D(S(4)) = D(S(4)) ∪ D(S2) = {ま (interjection, ...), ま (adverb, ...)}, and the process returns to step S4. If no unapplied combination remains in step S6, the process ends.
[0030]
In step S4, S2 = S(4) = "まぁーす". In step S5, the combination "long-vowel deletion rule, sokuon deletion rule, no hatsuon rule" is selected. Applying the selected rules to S2 in step S7 gives S2 = "ます。". In step S8, executing Search(S2) yields D(S2) = {ます (common noun, ...), ます (auxiliary verb, ...), ま (interjection, ...), ま (adverb, ...)}. In step S9, D(S(4)) = D(S(4)) ∪ D(S2) = {ます (common noun, ...), ます (auxiliary verb, ...), ま (interjection, ...), ま (adverb, ...)}.
[0031]
Thereafter, steps S4 to S9 are executed with other combinations of rules, but D(S(4)) does not change. Once no combinations remain, the determination in step S6 is NO and the process ends.
[0032]
As a result, D(S(4)) = {ます (common noun, ...), ます (auxiliary verb, ...), ま (interjection, ...), ま (adverb, ...)} is obtained.
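The search flow of FIG. 4 can be sketched roughly as below. FIG. 2 and FIG. 3 are not reproduced in this text, so the character classes and the four deletion rules here are assumptions made for illustration, as are all function and rule names. Each rule deletes every occurrence of its character class, which is a simplification of position-specific rules.

```python
import re
from itertools import combinations

LONG_MARK = "\u30fc"                      # standard long-vowel mark
LONG_EQUIV = "\u30fc\u301c\uff5e"         # assumed long-vowel-equivalent characters
SMALL_VOWELS = "ぁぃぅぇぉ"                # assumed to count as lengthening characters

TOY_DICT = {"ま": ["感動詞", "副詞"], "ます": ["普通名詞", "助動詞"]}

def search(text, dictionary):
    """Plain prefix search: all (headword, part of speech) pairs matching a prefix."""
    hits = set()
    for length in range(1, len(text) + 1):
        head = text[:length]
        for pos in dictionary.get(head, []):
            hits.add((head, pos))
    return hits

def normalize(text):
    """Step S3: collapse runs and map long-vowel equivalents to the standard mark."""
    text = re.sub(f"[{LONG_EQUIV}]+", LONG_MARK, text)
    text = re.sub("っ+", "っ", text)
    text = re.sub("ん+", "ん", text)
    return text

# Hypothetical neutralization rules standing in for FIG. 3.
RULES = {
    "long_mark_deletion": lambda s: s.replace(LONG_MARK, ""),
    "small_vowel_deletion": lambda s: re.sub(f"[{SMALL_VOWELS}]", "", s),
    "sokuon_deletion": lambda s: s.replace("っ", ""),
    "hatsuon_deletion": lambda s: s.replace("ん", ""),
}

def search_with_neutralization(text, dictionary):
    found = search(text, dictionary)              # steps S1 and S2
    base = normalize(text)                        # step S3
    names = list(RULES)
    for size in range(1, len(names) + 1):         # steps S5 and S6: each combination once
        for combo in combinations(names, size):
            s2 = base                             # step S4
            for name in combo:                    # step S7
                s2 = RULES[name](s2)
            found |= search(s2, dictionary)       # steps S8 and S9
    return found

candidates = search_with_neutralization("まぁ\u301c\u301cす。", TOY_DICT)
```

With this toy dictionary, the candidates for the emphasized string are the same set that a plain search of "ます。" returns, which is the property the worked example relies on.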
[0033]
Because the leading part of the input-sentence substring used in the dictionary search may be an unknown word, the unknown word processing unit 108 creates several unknown-word candidates and adds them to the dictionary search result.
[0034]
Whether "ます (auxiliary verb)" retrieved from the dictionary ends up in the final morphological analysis result depends on the morphological analysis algorithm. In this example, however, the set of words in the dictionary lookup result is the same as that for the sentence "郵送します。", which contains no emphasized form, so as long as "郵送します" can be analyzed correctly, the correct result is also obtained for "郵送しまぁ〜〜す。".
[0035]
FIG. 5 shows an example of a morphological analysis result according to the present invention. Because the emphasis notation processing unit 109 needs them, each word retrieved from the dictionary is stored together with the notation neutralization rule that was applied and the character position at which it was applied. The word headword as it was before the notation neutralization rule was applied is also stored. When several rules were applied, all of them are stored.
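One way to picture the per-word record described here is the small data structure below. FIG. 5 is not reproduced in this text, so the field names and the class names `RuleApplication` and `Morpheme` are hypothetical. The sample values follow the "ます" entry that the description gives for the extracted emphasis information (auxiliary verb, reading マ＆ス, long-vowel-equivalent character deletion rule applied immediately after the first character, original notation "まぁ〜〜す").

```python
from dataclasses import dataclass, field

@dataclass
class RuleApplication:
    rule: str        # name of the notation neutralization rule that was applied
    position: int    # the rule was applied immediately after this many characters

@dataclass
class Morpheme:
    headword: str    # neutral-form headword found in the dictionary
    pos: str         # part of speech
    reading: str     # reading with accent marks, as stored in the dictionary
    surface: str     # notation that actually appeared in the input sentence
    applied_rules: list = field(default_factory=list)

    @property
    def is_emphasized_form(self):
        """A word counts as an emphasized form if any rule was needed to match it."""
        return bool(self.applied_rules)

masu = Morpheme(
    headword="ます",
    pos="助動詞",
    reading="マ\uff06ス",
    surface="まぁ\u301c\u301cす",
    applied_rules=[RuleApplication("長音相当文字削除規則", 1)],
)
yuso = Morpheme(headword="郵送", pos="普通名詞", reading="ユーソー", surface="郵送")
```

Keeping the list of applied rules on the word lets a later stage reverse them when it restores the emphasized reading.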
[0036]
After the morphological analysis is completed, the emphasis notation processing unit 109 first corrects the reading of the morphological analysis result using the reading correction rules stored in the reading correction rule storage unit 110. FIG. 6 shows the reading correction rule. FIG. 7 is an application example of the reading correction rule.
[0037]
Next, the emphasis notation processing unit 109 modifies the prosody using the prosody modification rules stored in the prosody modification rule storage unit 111. FIG. 9 is an application example of the prosody modification rules. In FIG. 9, the reading correction rules are first applied to the morphological analysis result obtained by analyzing the sentence "郵送しまぁ〜〜す。". Based on the application status of the notation neutralization rules recorded in the morphological analysis result, the reading of "ます" is changed. Next, the prosody modification rules are applied. Based on the same application status, the accent enhancement rule is applied to "ます". Furthermore, because the input sentence writes the lengthened part of "ます" as "〜〜", the speed delay rule is applied. As a result, as shown in FIG. 9, the accent and the speech rate of "ます" are changed.
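The two correction passes described above might look like the following sketch. FIG. 6, FIG. 8, and FIG. 9 are not reproduced in this text, so everything concrete here is an assumption: the integer accent strength, the rate factor 0.8, the function name `apply_emphasis`, and the shortcut of treating one kana as one mora.

```python
LONG_MARK = "\u30fc"
WAVE_MARKS = "\u301c\uff5e"   # wave-dash characters assumed to trigger the speed delay rule

def apply_emphasis(reading, rule_positions, surface, accent=1, rate=1.0):
    """Return (reading, accent strength, rate) after the two correction passes.

    rule_positions lists the character counts after which a lengthening
    character was deleted during notation neutralization.
    """
    # Reading correction: put a long-vowel mark back where one was removed.
    # Working from the right keeps the earlier positions valid.
    for position in sorted(rule_positions, reverse=True):
        reading = reading[:position] + LONG_MARK + reading[position:]
    # Accent rule: any neutralization means the word was emphasized.
    if rule_positions:
        accent += 1
    # Speed delay rule: a wave dash in the original notation slows the word down.
    if any(mark in surface for mark in WAVE_MARKS):
        rate *= 0.8
    return reading, accent, rate

corrected = apply_emphasis("マス", [1], "まぁ\u301c\u301cす")
plain = apply_emphasis("マス", [], "ます")
```

A word that needed no neutralization passes through unchanged, so the same function can be applied to every word of the analysis result.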
[0038]
Using the output of the morphological analysis, the syntax analysis unit 112 performs syntax analysis. The accent combination and rendaku processing unit 113 uses the results of the morphological analysis and the syntax analysis to combine the accents obtained from the word dictionary 106 with those of the preceding and following words, creating accent phrases, which are the basic units of reading aloud. The prosody processing unit 114 sets phrase commands and pause commands for the sequence of accent phrases.
[0039]
The synthesis parameter generation unit 115 generates a time series of parameters necessary for controlling speech synthesis based on the analysis result of the language analysis unit 102. The audio waveform generation unit 116 performs D / A conversion of the control parameter to generate an audio waveform. The audio output unit 117 outputs an audio waveform as audio.
[0040]
<Second embodiment>
<Structure>
FIG. 10 is a configuration diagram of the second embodiment. It differs from the first embodiment in that a word dictionary 201 to which emphasis information has been added is used in place of the word dictionary 106, and in that an emphasized word processing unit 202 and an emphasized word processing rule storage unit 203 are provided.
[0041]
FIG. 11 is an example of the emphasis information added to the word dictionary. The emphasized word information indicates whether a word is a vocabulary item that is always emphasized: E1 denotes an emphasized word. The reading change flag indicates whether the reading of the word has been changed due to emphasis: ON means it has been changed, and OFF means it has not. The reading change information describes how the reading changes when the word is emphasized. For example, the entry for the adjective "Wow" indicates that its reading can be changed to "Sgoi", "Sugoi", or "Sungoi". The readings generated from the reading change information can be used as they are to change the word headings. Although the word dictionary normally registers only the stem, the emphasized word processing rules in FIG. 12 are explained using adjectives that include the inflectional ending.
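As a rough illustration of the three kinds of emphasis information, a dictionary entry could be modeled as below. The class and field names are hypothetical, and the sample entries (including the parts of speech) are assumptions; only the codes E0/E1, the ON/OFF flag, and the code A4 come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmphasisInfo:
    emphasized_word: str = "E0"     # "E1": emphasized word, "E0": not
    reading_changed: bool = False   # reading change flag (ON = True)
    reading_changes: List[str] = field(default_factory=list)  # e.g. ["A4"]


@dataclass
class DictEntry:
    heading: str
    part_of_speech: str
    reading: str
    emphasis: EmphasisInfo = field(default_factory=EmphasisInfo)

    def is_emphasized_word(self) -> bool:
        return self.emphasis.emphasized_word == "E1"


# Hypothetical sample entries.
kanari = DictEntry("kanari", "adverb", "kanari",
                   EmphasisInfo("E1", False, ["A4"]))
kore = DictEntry("kore", "pronoun", "kore")
```

Keeping the emphasis information in its own record means an ordinary entry needs no extra fields beyond the defaults.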
[0042]
<Operation>
The same operation as in the first embodiment is performed up to the emphasis notation processing unit 109. However, since the word dictionary 201 used in the morphological analysis contains the emphasis information, the result of the morphological analysis also includes the emphasis information extracted from the word dictionary. The emphasized word processing unit 202 processes the emphasized word using the emphasized word processing rule of the emphasized word processing rule storage unit 203.
[0043]
FIG. 13 shows an example of the emphasized word processing of a sentence that includes an emphasized word. It is assumed that "pretty" is registered in the word dictionary as an emphasized word. The processing result of the emphasis notation processing unit 109 has a normal accent. The emphasized word processing unit 202 refers to the emphasized word processing rules. In this case, "emphasis" is applied to "pretty", and its accent becomes stronger.
[0044]
FIG. 14 shows an example of the emphasized word processing when the user specifies emphasis. Assume that the character string between "<emphasis>" and "</emphasis>" is designated for emphasis. As in FIG. 13, it is assumed that "pretty" is registered as an emphasized word in the word dictionary. The processing result of the emphasis notation processing unit 109 has a normal accent. The emphasized word processing unit 202 refers to the emphasized word processing rules. In this case, the "emphasized word accent enhancement rule" is applied to "pretty", its accent becomes stronger, and the prosody becomes "ka & nari". Further, the "emphasized word reading change rule" is applied. Since "pretty" has A4 as its reading change information, the first beat is prolonged, and the prosody becomes "Ka &nari".
[0045]
FIG. 15 shows the processing of non-emphasized words when the user specifies emphasis. Assume that the character string between "<emphasis>" and "</emphasis>" is designated for emphasis. Unlike the examples in FIGS. 13 and 14, it is assumed that the text contains no emphasized word. The processing result of the emphasis notation processing unit 109 has a normal accent. The emphasized word processing unit 202 refers to the emphasized word processing rules. Here, the processing for the three words "this", "weird", and "by chance" is described. The emphasized word information extracted from the word dictionary is "E0" for all three words, indicating that they are non-emphasized words. Therefore, the "non-emphasized word emphasis rule" is applied to the three words. As a result, the accent is strengthened and the pronunciation is corrected at the same time, and as shown in the emphasized word processing result of FIG. 15, the pronunciations "Koruna", "He &&Nanna", and "Hyotshite" and the corresponding prosody (accent) are obtained.
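The three cases of FIGS. 13 to 15 can be read as a small decision procedure, sketched below. This is one interpretation of the examples rather than the rules of FIG. 12 themselves: the `Word` record, the katakana readings, and the simplification that the first beat is the first character of the reading are assumptions, and the pronunciation correction for non-emphasized words is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Word:
    reading: str               # katakana reading
    emphasized_word: bool      # True for E1, False for E0
    reading_change: str = ""   # reading change information, e.g. "A4"
    accent_strength: int = 1


def prolong_first_beat(reading: str) -> str:
    # Simplification: treats the first character as the first beat.
    return reading[:1] + "ー" + reading[1:]


def process_emphasized_word(word: Word, user_emphasis: bool) -> Word:
    if word.emphasized_word:
        # Emphasized word accent enhancement rule (FIGS. 13 and 14).
        word.accent_strength += 1
        # Emphasized word reading change rule, applied here only when the
        # user designates emphasis (FIG. 14); A4 prolongs the first beat.
        if user_emphasis and word.reading_change == "A4":
            word.reading = prolong_first_beat(word.reading)
    elif user_emphasis:
        # Non-emphasized word emphasis rule (FIG. 15): the accent is
        # strengthened; the pronunciation correction is omitted here.
        word.accent_strength += 1
    return word


fig13 = process_emphasized_word(Word("カナリ", True, "A4"), user_emphasis=False)
fig14 = process_emphasized_word(Word("カナリ", True, "A4"), user_emphasis=True)
fig15 = process_emphasized_word(Word("コンナ", False), user_emphasis=True)
untouched = process_emphasized_word(Word("コンナ", False), user_emphasis=False)
```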
[0046]
<Third embodiment>
<Structure>
FIG. 16 is a configuration diagram showing a third embodiment of the present invention. The difference from the second embodiment is that modules 301 to 305 are provided. Reference numeral 301 denotes an emphasis information extraction unit that extracts, from the result of the morphological analysis, the words to which notation neutralization has been applied; 302 denotes an emphasis information storage unit that stores the extracted emphasis information; 303 denotes an emphasis information management unit, which is a user interface for adding the emphasis information stored in 302 to the word dictionary; 304 denotes an emphasis information confirmation unit that retrieves the emphasis information from the emphasis information storage unit 302 and has the user confirm whether the information may be added to the dictionary; and 305 denotes an emphasis information registration unit that registers in the word dictionary the emphasis information confirmed for registration.
[0047]
<Operation>
The text input from the sentence input unit 101 is divided into sentences by the vocabulary analysis unit 103 and morphologically analyzed by the morphological analysis unit 104. The morphological analysis result is, for example, as shown in FIG. From module 109 onward, the same operation as in the second embodiment is performed, and speech synthesis is carried out.
[0048]
The emphasis information extraction unit 301 scans each word of the morphological analysis result, checks whether a notation neutralization rule has been applied, and stores only the information of the words to which a rule has been applied in the emphasis information storage unit 302. FIG. 17 is an example of the extracted emphasis information. The processing up to this point is performed for all sentences input from the sentence input unit 101.
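The scan performed by the emphasis information extraction unit amounts to a filter over the morphological analysis result. The sketch below assumes each word is a dictionary with a `rule` key that is empty when no notation neutralization rule was applied; the key names and sample values are hypothetical.

```python
from typing import Dict, List, Optional

Morpheme = Dict[str, Optional[str]]


def extract_emphasis_info(morphemes: List[Morpheme]) -> List[Morpheme]:
    """Keep only words to which a notation neutralization rule was applied."""
    return [m for m in morphemes if m.get("rule")]


# Hypothetical analysis result for one sentence.
analysis = [
    {"neutral": "mail", "pos": "noun", "rule": None, "surface": "mail"},
    {"neutral": "masu", "pos": "auxiliary verb",
     "rule": "long_sound_deletion", "surface": "ma~~su"},
]
storage = extract_emphasis_info(analysis)
```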
[0049]
After the above processing is completed for all the sentences in the input text, the user operates the emphasis information management unit 303. The emphasis information management unit 303 starts the emphasis information confirmation unit 304. The emphasis information confirmation unit 304 extracts the morphological analysis results from the emphasis information storage unit 302 one word at a time. The following description assumes that the result shown in FIG. 17 has been obtained in the emphasis information storage unit 302.
[0050]
First, "masu (auxiliary verb, ma & s, rule for deleting long-sound-equivalent characters, immediately after the first character, "")" is extracted, and the word information is displayed on the user's screen. FIG. 18 is an example of the user display screen. The neutral form is the form obtained by changing the character string of the input sentence according to the notation neutralization rules of the present apparatus so that it can be matched against the words registered in the word dictionary. The emphasized form is the form that actually appears in the character string of the input sentence.
[0051]
If a notation neutralization rule is applied incorrectly, an erroneous neutral form may be created, so the user is prompted for confirmation. In addition to the neutral form and the emphasized form, the original text and the part of speech obtained by the morphological analysis are displayed so that the user can judge accurately. In the above example, if the user determines that "Ma- ~ s" is an emphasized form of "masu", the user designates that this emphasis information be added to the auxiliary verb "masu".
[0052]
When the user's confirmation is input from the emphasis information confirmation unit 304, the emphasis information management unit 303 calls the emphasis information registration unit 305 and passes it the morphological analysis result. The emphasis information registration unit 305 refers to the application result of the notation neutralization rule in the passed morphological analysis result. In this case, the rule for deleting long-sound-equivalent characters was applied immediately after the first character to convert the emphasized form into "masu". The change from "masu" to the emphasized form is therefore "prolonging the initial beat", and the information A4 ("prolonging the initial beat") among the reading change information of the emphasis information is added to "masu" in the word dictionary. As a result, the emphasis information of "masu" is as shown in FIG.
[0053]
Next, from the morphological analysis result shown in FIG. 17, "shikari (verb, shikari, sounding deletion rule, immediately after the first character, "firm")" is extracted. On the user's screen, "neutral form: shikari", "emphasized form: firm", "part of speech: verb", and the input sentence are displayed. In this case, "firm" is not an emphasized form of the verb "shikari". If the user determines that the extracted emphasis information is inappropriate, the user does not instruct registration in the word dictionary. The emphasis information confirmation unit 304 then extracts the next morphological analysis result from the emphasis information storage unit 302 without the emphasis information registration unit 305 being called, and the above processing is repeated.
[0054]
When all the morphological analysis results in the emphasis information storage unit 302 have been processed, the operation ends.
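The confirmation and registration loop of the third embodiment can be sketched as follows. The user's judgment is modeled as a callback, and the mapping from an applied neutralization rule to a reading change code contains only the single case named in the text (A4, prolonging the initial beat); the rule labels, key names, and dictionary layout are assumptions.

```python
from typing import Callable, Dict, List, Set

# Only the A4 case is described in the text; the labels are hypothetical.
RULE_TO_READING_CHANGE = {
    ("long_sound_deletion", "after_first_char"): "A4",
}


def review_and_register(
    stored: List[Dict[str, str]],
    confirm: Callable[[Dict[str, str]], bool],
    dictionary: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    for info in stored:
        # Confirmation unit: the user rejects wrongly neutralized words.
        if not confirm(info):
            continue
        # Registration unit: derive the reading change from the applied rule.
        change = RULE_TO_READING_CHANGE.get((info["rule"], info["position"]))
        if change is not None:
            dictionary.setdefault(info["neutral"], set()).add(change)
    return dictionary


stored = [
    {"neutral": "masu", "rule": "long_sound_deletion",
     "position": "after_first_char"},
    {"neutral": "shikari", "rule": "sounding_deletion",
     "position": "after_first_char"},
]
# The user accepts "masu" and rejects "shikari", as in the description.
result = review_and_register(stored, lambda i: i["neutral"] == "masu", {})
```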
[0055]
<Fourth embodiment>
<Structure>
FIG. 20 is a configuration diagram showing a fourth embodiment of the present invention. The difference from the second embodiment lies in that an emphasis notation creation unit 401 and an emphasis notation output unit 402 are provided. FIG. 21 shows rules for creating emphasis notation.
[0056]
<Operation>
The same operations as in the second embodiment are performed from the sentence input unit 101 to the voice output unit 117. In addition, when the processing of the emphasized word processing unit 202 ends, the analysis result is also sent to the emphasis notation creation unit 401.
[0057]
FIG. 22 is an example of the information sent from the emphasized word processing unit 202 to the emphasis notation creation unit 401. For each word, the information includes the part of speech, the pronunciation (including accent position and accent strength), the emphasized word information indicating whether the word is an emphasized word (E0: non-emphasized word, E1: emphasized word), a reading correction flag indicating whether the reading has already been changed by emphasis (ON: reading corrected, OFF: reading uncorrected), and the details of the correction (such as "initial beat lengthening").
[0058]
The emphasis notation creation unit 401 corrects the word headings using the emphasis notation creation rules. In the example of FIG. 22, since the reading correction flags of "one", "why", "this", and "da" are ON, the "emphasis word notation creation rule" of FIG. 21 is applied. As a result, the heading for "one" is obtained by converting the pronunciation into katakana, and the headings for "why", "this", and "da" are obtained by converting it into hiragana. Further, in the notation of "da", "a" is used instead of a long sound mark. On the other hand, since the reading correction flag is OFF for "koto", "ni", "n", "ta", and "n", the "non-emphasized word notation creation rule" is applied and the original notation is used as the word heading as it is. As a result, a result as shown in FIG. 23 is obtained. The result obtained by the emphasis notation creation unit 401 is sent to the emphasis notation output unit 402.
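A minimal sketch of the heading correction described above follows. The per-word `script` field that selects katakana or hiragana is an assumption, since the text does not say how FIG. 21 chooses between them; the sample words and readings are hypothetical, and the replacement of the long sound mark by a vowel kana is not modeled.

```python
from typing import Dict, List


def kata_to_hira(text: str) -> str:
    # Katakana U+30A1..U+30F6 map to hiragana by subtracting 0x60;
    # other characters (including the long sound mark) are kept.
    return "".join(
        chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text
    )


def create_heading(word: Dict[str, object]) -> str:
    if word["reading_corrected"]:
        # Emphasized word notation creation rule: spell out the reading.
        reading = str(word["reading"])
        if word.get("script") == "katakana":
            return reading
        return kata_to_hira(reading)
    # Non-emphasized word notation creation rule: keep the original notation.
    return str(word["heading"])


# Hypothetical word string; readings are illustrative only.
words: List[Dict[str, object]] = [
    {"heading": "一体", "reading": "イッタイ",
     "reading_corrected": True, "script": "katakana"},
    {"heading": "何故", "reading": "ナーゼ",
     "reading_corrected": True, "script": "hiragana"},
    {"heading": "こと", "reading": "コト", "reading_corrected": False},
]
headings = [create_heading(w) for w in words]
sentence = "".join(headings)
```

Joining the headings in order corresponds to how the output unit lists the word headings of the word string to form the displayed sentence.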
[0059]
The emphasis notation output unit 402 creates a sentence by listing the word headings of the word string, outputs the created sentence at the same time as the voice output unit 117 outputs the voice, and displays it to the user together with the input sentence. FIG. 24 is an example of the display to the user.
[0060]
FIG. 25 is an example of display to the user when emphasis is specified for a part of a sentence. Only the specified portion is emphasized and synthesized, and only that portion is displayed in an emphasized form.
[0061]
Note that the present invention is not limited to the above-described embodiments, and various modifications are possible based on the gist of the present invention. For example, in addition to the character types used in the first embodiment, rules can be used for notations such as the commas used in "i, ka, na, i" or a change of kana type as in "beautiful". In the first embodiment, only the accent is corrected as the prosody, but the pitch can also be corrected according to the type of the input character.
[0062]
[Effect of the invention]
As described in detail above, according to the first aspect of the present invention, the speech synthesizer comprises sentence input means for inputting a sentence including an emphasized word, language analysis means for performing a linguistic analysis of the input sentence, prosody processing means for setting prosody information for the language analysis result produced by the language analysis means, synthesis parameter generation means for generating speech synthesis parameters, speech waveform generation means for generating a speech waveform based on the synthesis parameters, and voice output means for outputting the waveform as voice. The language analysis means includes: vocabulary analysis means that deletes and replaces characters unnecessary for language analysis in the input text and divides the text into sentences; morphological analysis means that, using notation neutralization rules, which are rules for converting an emphasized word into a neutral word, divides each sentence produced by the vocabulary analysis means into words by referring to a word dictionary that stores only neutral words; and emphasis notation processing means that, for the words divided by the morphological analysis means, corrects the reading of the emphasized words converted into neutral words by referring to the reading correction rules, and modifies the prosody of those words into an emphasized prosody by referring to the prosody modification rules. Consequently, (1) for a word whose notation has been changed to represent emphasis, speech synthesis can be performed without registering a new word in the word dictionary. Also, (2) for a word whose notation has been changed to indicate emphasis, natural speech synthesis can be performed with emphasized prosody.
[0063]
According to the second aspect of the present invention, in the speech synthesizer according to the first aspect, the word dictionary in the language analysis means is a word dictionary to which emphasis information indicating whether each word is an emphasized word is added, and the language analysis means further includes emphasized word processing means that, when the morphological analysis means determines that a word is an emphasized word, corrects the prosody of the emphasized word by referring to an emphasis processing rule for the emphasized word. Consequently, (1) by emphasizing the prosody of the emphasized word, synthesized speech with natural prosody can be obtained. Also, (2) by strengthening the emphasized prosody, synthesized speech with natural prosody can be obtained.
[0064]
According to the fourth aspect of the present invention, the speech synthesizer according to the second aspect further includes emphasis information extraction means that, from the words extracted from all the sentences in the input text, extracts only the words to which a notation neutralization rule has been applied and stores them in emphasis information storage means, and emphasis information management means that, by operation of the user, reads the words from the emphasis information storage means and displays information about them, adds emphasis information only to the words determined by the user to be emphasized words, and registers those words in the word dictionary. Consequently, dictionary information on how a word changes when it is emphasized can be obtained easily.
[0065]
According to the fifth aspect of the present invention, the speech synthesizer according to the second aspect further includes emphasis notation creating means that applies the emphasis notation creation rules to each piece of word information output from the emphasized word processing means, creates a sentence whose notation reflects the prosody of the voice, and displays that sentence together with the input sentence.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an apparatus showing a first embodiment.
FIG. 2 is a classification diagram of characters used in the notation of emphasized forms.
FIG. 3 is a notation neutralization rule used in a dictionary search unit.
FIG. 4 is a flowchart of a dictionary search.
FIG. 5 is an example of a morphological analysis result.
FIG. 6 is an example of a reading correction rule.
FIG. 7 is an application example of a reading correction rule.
FIG. 8 is an example of a prosody modification rule.
FIG. 9 is an application example of a prosody modification rule.
FIG. 10 is a configuration diagram of an apparatus showing a second embodiment.
FIG. 11 is an example of emphasis information.
FIG. 12 is an example of an emphasis word processing rule.
FIG. 13 is an example of emphasized word processing (emphasized word, no emphasis designation by the user).
FIG. 14 is an example of emphasized word processing (emphasized word, emphasis designated by the user).
FIG. 15 is an example of emphasized word processing (non-emphasized word, emphasis designated by the user).
FIG. 16 is a configuration diagram of an apparatus showing a third embodiment.
FIG. 17 is an example of extraction emphasis information.
FIG. 18 is a display example of a user display screen.
FIG. 19 is an example of adding reading change information to a word dictionary.
FIG. 20 is a configuration diagram of an apparatus showing a fourth embodiment.
FIG. 21 is an example of an emphasis notation creation rule.
FIG. 22 is an example of information sent to the emphasis notation creation unit.
FIG. 23 is an example of an application result of the emphasis notation creation rule.
FIG. 24 is an example of display to the user (emphasis on the entire sentence).
FIG. 25 is an example of display to the user (partial emphasis).
FIG. 26 is a configuration diagram of a conventional device.
FIG. 27 is an example of a word dictionary search.
FIG. 28 is a diagram showing a notation of emphasis.
[Explanation of symbols]
101 Sentence input unit
102 Language analysis unit
103 Vocabulary analysis unit
104 Morphological analysis unit
105 Dictionary search processing unit
106 Word dictionary
107 Notation neutralization rules
108 Unknown word processing unit
109 Emphasis notation processing unit
110 Reading correction rule storage unit
111 Prosody modification rule storage unit
112 Syntax analysis unit
113 Accent combining and rendaku processing unit
114 Prosody processing unit
115 Synthesis parameter generation unit
116 Audio waveform generation unit
117 Audio output unit
201 Word dictionary (with emphasis information)
202 Emphasized word processing unit
203 Emphasized word processing rule storage unit
301 Emphasis information extraction unit
302 Emphasis information storage unit
303 Emphasis information management unit
304 Emphasis information confirmation unit
305 Emphasis information registration unit
401 Emphasis notation creation unit
402 Emphasis notation output unit

Claims

A speech synthesizer comprising: sentence input means for inputting a sentence including an emphasized form produced by emphasis; language analysis means for performing language analysis of the input sentence; prosody processing means for setting prosodic information for the result of the language analysis by the language analysis means; synthesis parameter generation means for generating speech synthesis parameters based on the result of the language analysis in which the prosodic information has been set by the prosody processing means; speech waveform generation means for generating a speech waveform based on the speech synthesis parameters; and voice output means for outputting the speech waveform as voice,
wherein the language analysis means comprises:
vocabulary analysis means for deleting and replacing characters unnecessary for the language analysis in the input text, and for dividing the text in which the unnecessary characters have been deleted and replaced into sentences;
morphological analysis means for dividing each sentence into words by applying various notation neutralization rules and verifying the connectability between words extracted from a word dictionary, thereby converting the notation of an emphasized form in the sentence divided by the vocabulary analysis means into the notation of the neutral form; and
emphasis notation processing means for, depending on the application status of the notation neutralization rules, correcting the reading into that of the emphasized form by applying reading correction rules to the pronunciation of the words divided by the morphological analysis means, correcting the prosody into an emphasized prosody by applying prosody modification rules to the prosody of the words divided by the morphological analysis means, and outputting the result to the prosody processing means.

The speech synthesizer according to claim 1,
wherein the word dictionary in the language analysis means is a word dictionary to which emphasis information indicating whether each word is an emphasized word is added, and
the language analysis means further comprises emphasized word processing means for correcting, when the morphological analysis means determines that a word is an emphasized word, the prosody of the emphasized word by referring to an emphasis processing rule for the emphasized word.

The speech synthesizer according to claim 2,
wherein the emphasized word processing means performs the emphasis processing only on a character string specified by a user.

The speech synthesizer according to claim 2, further comprising:
emphasis information extraction means for extracting, from the words extracted from all the sentences in the input text, only the words to which a notation neutralization rule has been applied, and for storing them in emphasis information storage means; and
emphasis information management means for, by operation of the user, reading the words from the emphasis information storage means and displaying information about them, adding emphasis information only to the words determined by the user to be emphasized words, and registering the words to which the emphasis information has been added in the word dictionary.

The speech synthesizer according to claim 2, further comprising:
emphasis notation creating means for creating a sentence whose notation reflects the prosody of the voice by applying emphasis notation creation rules to each piece of word information output from the emphasized word processing means, and for displaying that sentence together with the input sentence.