JPH0738113B2 - Speech synthesizer

Info

Publication number: JPH0738113B2
Application number: JP62278541A
Authority: JP
Inventors: 隆之大山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-11-04
Filing date: 1987-11-04
Publication date: 1995-04-26
Anticipated expiration: 2010-04-26
Also published as: JPH01120599A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Outline] The present invention relates to a speech synthesis method in which syllable-unit speech uttered by a human is analyzed at short time intervals, the result is stored for each syllable as parameter time-series data of that speech, and arbitrary speech is synthesized by concatenating speech composed of these parameter time-series data. Its object is to provide a means of obtaining synthesized speech with high naturalness and intelligibility when the speech to be synthesized contains a long vowel. To this end, when a long vowel is synthesized, the vowel part of the syllable immediately preceding the long vowel is extended and used as the long-vowel part.

[Industrial Field of Application] Among speech synthesis methods, the present invention relates to a method in which syllable-unit speech actually uttered by a human is analyzed at short time intervals, the result is stored for each syllable as parameter time-series data of that speech, and arbitrary speech is synthesized by concatenating speech composed of these parameter time-series data. In particular, it relates to a speech synthesis method that can improve the naturalness of speech containing long vowels.

[Prior Art] Methods of synthesizing speech artificially fall broadly into three types: the recording-editing method, the analysis-synthesis method, and the pure synthesis method.

Of these, the recording-editing method synthesizes speech by splicing together prerecorded human speech waveforms. Its equipment and control are relatively simple and its sound quality is good, but because the amount of information is very large, it has the drawback of requiring a large-capacity storage device.

The pure synthesis method synthesizes speech from text or the like by applying fixed synthesis rules. It does not require human speech to be analyzed in advance, but producing speech with good naturalness makes the synthesis rules extremely complex, so the method is not easy to implement.

In contrast, the analysis-synthesis method analyzes human speech in advance, converts it into parameters that represent its characteristics, stores those parameters, and synthesizes speech from them. In this method the speech is not stored as a waveform. Instead, it is converted into parameters representing the spectral shape produced by articulation and parameters representing the state of the sound source, and this information can be compressed before it is stored. The required storage capacity is therefore small and the sound quality is relatively good, so speech synthesis based on this method has been coming into wide use in recent years.

A widely used approach based on this method analyzes syllable-unit speech uttered by a human (for example "ア" (a), "イ" (i), "ウ" (u), and so on) in advance, then synthesizes arbitrary speech by concatenating those syllables and controlling the pitch of the voice.

[Problems to Be Solved by the Invention] In conventional arbitrary-word speech synthesis based on the analysis-synthesis method described above, a long vowel was synthesized in the same way as a same-type vowel sequence.

For example, "キー" (kī, "key") and "キイ" (kii, 奇異, "strange") were both synthesized in exactly the same way, by connecting the syllable "イ" (i) after the syllable "キ" (ki), that is, as a same-type vowel sequence.

As a result, not only did the long-vowel part sound unnatural, but a long vowel could not be distinguished from a same-type vowel sequence.

In view of these problems, an object of the present invention is to provide an arbitrary-word speech synthesis method in which long vowels sound natural and in which a long vowel can be clearly distinguished from a same-type vowel sequence.

[Means for Solving the Problems] According to the present invention, the above object is achieved by the means recited in the claims. That is, the present invention is a speech synthesizer that analyzes uttered syllable-unit speech at short time intervals, stores the result for each syllable as parameter time-series data of that speech, and synthesizes arbitrary speech by concatenating speech composed of these parameter time-series data, wherein, when a long vowel is synthesized, the vowel part of the syllable immediately preceding the long vowel is extended and used as the long-vowel part.

[Operation] FIG. 1 illustrates the principle of the present invention. Part (a) shows the synthesis of "奇異" (kii) and part (b) shows the synthesis of "キー" (kī). Reference numeral 1 denotes the syllable "キ" (ki), 2 denotes the syllable "イ" (i), and 3 denotes the extended part.

Conventionally, both "奇異" and "キー" were synthesized by the method shown in (a). According to the present invention, "キー" is instead synthesized by the method shown in (b), which yields synthesized speech in which such a long vowel can be clearly told apart by ear from a same-type vowel sequence such as the one in (a).
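The contrast between methods (a) and (b) can be illustrated with a minimal Python sketch. It is not taken from the patent: the frame labels, the `SYLLABLES` table, the `vowel_start` field, and both function names are hypothetical, and repeating the final vowel frame is only one assumed way of extending the vowel part.

```python
# Toy frame labels stand in for per-frame parameter vectors (hypothetical data).
SYLLABLES = {
    "キ": {"frames": ["k1", "k2", "i1", "i2"], "vowel_start": 2},
    "イ": {"frames": ["I1", "I2", "I3"], "vowel_start": 0},
}


def concat_chain(first, second):
    """Method (a): join two stored syllables end to end."""
    return SYLLABLES[first]["frames"] + SYLLABLES[second]["frames"]


def extend_long(syllable, extra_frames):
    """Method (b): lengthen the syllable's own vowel part by extra_frames."""
    unit = SYLLABLES[syllable]
    frames = unit["frames"]
    vowel = frames[unit["vowel_start"]:]
    # Assumption: repeat the final vowel frame; a real system might interpolate.
    return frames + [vowel[-1]] * extra_frames


kii = concat_chain("キ", "イ")  # same-type vowel sequence, as in (a)
kee = extend_long("キ", 3)      # long sound (long vowel), as in (b)
```

Both results have the same number of frames, but only `kii` contains the onset of a second, separately recorded "イ", which is what would let a listener hear the two words differently.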

[Embodiment] FIG. 2 is a functional block diagram of one embodiment of the present invention, in which 4 denotes a syllable reading unit, 5 a syllable storage unit, 6 a syllable concatenation unit, 7 a duration setting unit, 8 a pitch pattern setting unit, 9 a waveform synthesis unit, and 10 a long-vowel detection unit.

In the figure, when an input character string to be synthesized is supplied, the syllable reading unit 4 reads the speech parameters of the required syllables from the syllable storage unit 5 and sends them to the syllable concatenation unit 6.

Meanwhile, the duration setting unit 7 sets the duration of each syllable based on the input character string.

The long-vowel detection unit 10 monitors the input character string and, when it detects a long vowel, instructs the syllable concatenation unit 6 to extend the vowel of the immediately preceding syllable.
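The detection step can be sketched as follows. This is an illustrative assumption rather than the patent's implementation: it supposes that the input is a katakana string with one character per syllable (so contracted sounds such as "キャ" are not handled) and that a long sound (long vowel) is written with the prolonged sound mark "ー". The name `plan_syllables` is hypothetical.

```python
LONG_MARK = "ー"  # katakana prolonged sound mark


def plan_syllables(text):
    """Return (syllable, lengthen) pairs for a katakana string.

    Assumes one character per syllable; lengthen is True when the
    syllable is followed by the prolonged sound mark.
    """
    plan = []
    for ch in text:
        if ch == LONG_MARK:
            if not plan:
                raise ValueError("long mark has no preceding syllable")
            # Flag the preceding syllable instead of emitting a new unit.
            plan[-1][1] = True
        else:
            plan.append([ch, False])
    return [tuple(entry) for entry in plan]
```

With this sketch, `plan_syllables("キー")` yields a single flagged syllable, while `plan_syllables("キイ")` yields two unflagged syllables, so the concatenation stage can treat the two inputs differently.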

The syllable concatenation unit 6 concatenates the speech parameters according to the syllable durations set by the duration setting unit 7.

At this point, if the duration of the parameters that were read out does not match the set duration, the unit adjusts it by stretching or shrinking the vowel part. At the same time it performs the vowel extension for any long-vowel part, and it sends the concatenated speech parameters to the waveform synthesis unit 9.
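One way to picture the duration adjustment is the Python sketch below. The patent does not say how the vowel part is stretched or shrunk, so the nearest-frame resampling used here is an assumption, as are the function name `fit_to_length`, the `vowel_start` argument, and the use of a frame count as the unit of duration.

```python
def fit_to_length(frames, vowel_start, target_len):
    """Stretch or shrink the vowel part so the syllable has target_len frames.

    frames      -- per-frame parameters of one stored syllable
    vowel_start -- index of the first vowel frame (hypothetical annotation)
    target_len  -- desired total number of frames
    """
    consonant = frames[:vowel_start]
    vowel = frames[vowel_start:]
    new_vowel_len = target_len - len(consonant)
    if not vowel or new_vowel_len < 1:
        raise ValueError("target length leaves no room for the vowel part")
    # Nearest-frame resampling: map each output position back to a source frame.
    resampled = [
        vowel[(i * len(vowel)) // new_vowel_len] for i in range(new_vowel_len)
    ]
    # The consonant part is left untouched; only the vowel changes length.
    return consonant + resampled
```

Under these assumptions the same routine could serve both purposes described above: a modest change of `target_len` matches the set duration, and a larger one produces the extended vowel for a long sound.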

The waveform synthesis unit 9 synthesizes the speech waveform from these speech parameters and the pitch pattern set by the pitch pattern setting unit 8.

[Effects of the Invention] As described above, the present invention has the advantage of increasing the naturalness of synthesized speech containing long vowels and thereby improving its sound quality.

It also makes clear the distinction, which could not previously be made, between a long vowel (for example "アー") and an identical-vowel sequence ("アア") or a same-type vowel sequence.

[Brief description of drawings]

FIG. 1 is a diagram illustrating the principle of the present invention, and FIG. 2 is a functional block diagram of one embodiment of the present invention.

1 ... syllable "キ" (ki), 2 ... syllable "イ" (i), 3 ... extended part, 4 ... syllable reading unit, 5 ... syllable storage unit, 6 ... syllable concatenation unit, 7 ... duration setting unit, 8 ... pitch pattern setting unit, 9 ... waveform synthesis unit, 10 ... long-vowel detection unit

Claims

[Claims]

1. A speech synthesizer that analyzes uttered syllable-unit speech at short time intervals, stores the result for each syllable as parameter time-series data of that speech, and synthesizes arbitrary speech by concatenating speech composed of these parameter time-series data, wherein, when a long vowel is synthesized, the vowel part of the syllable immediately preceding the long vowel is extended and used as the long-vowel part.