JPH0644199B2 - Variable-length frame speech analysis / synthesis method

Info

Publication number: JPH0644199B2
Application number: JP59159846A
Authority: JP
Inventors: 哲田口
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1984-07-30
Filing date: 1984-07-30
Publication date: 1994-06-08
Anticipated expiration: 2009-06-08
Also published as: JPS6136800A

Description

Detailed Description of the Invention

(Technical Field) The present invention relates to a variable-length frame speech analysis and synthesis method, and in particular to one that applies variable-length frame processing that is optimal over the whole of a relatively long speech signal, for example about 10 seconds.

(Prior Art) Analysis and synthesis of speech signals in units of, for example, about 10 seconds has various fields of application, such as voice mail and public address. When variable-length frame processing is applied to such analysis and synthesis in order to compress the amount of information, it is generally carried out as follows.

For example, a 10-second interval is divided into 50 equal parts to form sections of about 200 ms each, and variable-length frame processing is performed independently on each section.

With this approach, optimal variable-length frame processing can be performed within each 200 ms section, but the result is not necessarily optimal when the 10 seconds are viewed as a whole.

It is also theoretically possible to treat the whole 10 seconds as a single unit from the start and optimize it by variable-length frame processing using dynamic programming (DP). However, the amount of computation needed for the DP then becomes enormous, the delay in the analysis and synthesis apparatus becomes large, and recovery from transmission-channel errors is poor, so this approach is not practical.

(Object of the Invention) The object of the present invention is to provide a practical variable-length frame speech analysis and synthesis method that optimizes a speech signal of a relatively long unit, about 10 seconds, as a whole by variable-length frame processing, and thereby enables more efficient compression of the amount of information.

(Structure of the Invention) The method of the present invention uses the following means.

Speech analysis means analyzes the input speech signal periodically, at a predetermined fixed analysis period, and extracts a feature parameter vector.

Piecewise optimal function approximation means operates on each section, a section being a predetermined number of consecutive analysis periods. For each section it selects an arbitrary number of representative parameter vectors from the feature parameter vectors in the section, approximates the section by a piecewise optimal function, and computes both the resulting configuration of representative parameter vectors and the residual distortion of that optimal approximation.

Mutual optimal frame selection means operates on a large interval consisting of a predetermined number of consecutive sections. It compares the residual distortions computed for the sections by the piecewise optimal function approximation means, and repeats the processing step of replacing the representative parameter vector configuration of the section with the largest residual distortion by a configuration containing more representative parameter vectors. In this way it selects all of a predetermined number of representative parameter vectors that optimally approximate the large interval.

(Embodiment) The present invention is now described in detail with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the present invention.

This embodiment consists of a speech analysis side 1 and a speech synthesis side 2.

The analysis side 1 includes a low-pass filter and A/D converter (LPF & A/D) 101, a window function processor 102, an LSP analyzer 103, a piecewise optimal function approximator 104, an overall optimal frame selector 105, a quantizer 106, a sound source information analyzer 107, a coder 108, and a memory 109. The synthesis side 2 includes a memory 201, a decoder 202, a pulse oscillator 203, a noise generator 204, a V/UV switch 205, a power controller 206, an LSP synthesis filter 207, a D/A converter and low-pass filter (D/A & LPF) 208, and an interpolator 209.

This embodiment operates as follows.

The speech signal input on line 1000 is band-limited to, for example, 3.4 kHz in the low-pass filter and A/D converter (LPF & A/D) 101, then sampled at a sampling frequency of 8 kHz and quantized into digital data. The resulting data are supplied to the window function processor 102.

The window function processor 102 temporarily stores one block of the supplied data (for example, 240 samples), multiplies it by the weights of a predetermined window function, and supplies the result to the LSP analyzer 103 and the sound source information analyzer 107. The window function processor 102 repeats this processing at a period of, for example, 10 ms. The LSP analyzer 103 and the sound source information analyzer 107 therefore receive one block of windowed data every 10 ms.
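The block and period figures above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says "a predetermined window function", so the Hamming window here is an assumption, and the names `hamming`, `windowed_blocks`, `BLOCK_LEN` and `HOP_LEN` are hypothetical. At 8 kHz, a 10 ms period corresponds to 80 samples, so consecutive 240-sample blocks overlap.

```python
import math

SAMPLE_RATE_HZ = 8000                    # sampling frequency given in the text
BLOCK_LEN = 240                          # samples per block, given in the text
HOP_LEN = SAMPLE_RATE_HZ * 10 // 1000    # 10 ms analysis period -> 80 samples


def hamming(n):
    """Hamming window of length n (the window type is an assumption)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def windowed_blocks(samples, block_len=BLOCK_LEN, hop_len=HOP_LEN):
    """Yield one windowed block of block_len samples per analysis period."""
    window = hamming(block_len)
    for start in range(0, len(samples) - block_len + 1, hop_len):
        block = samples[start:start + block_len]
        yield [s * w for s, w in zip(block, window)]
```

Each yielded block would then go to both the LSP analysis and the sound source analysis.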

The LSP analyzer 103 performs LSP (line spectrum pair) analysis on the supplied block of data by a known technique and determines an LSP parameter vector. This LSP parameter vector is an $S$-dimensional vector $\mathbf{P} = (P_1, P_2, \ldots, P_S)$ with $S$ (an even number) components. The components $P_1$ to $P_S$ are data that express, as a set of resonance frequencies, information about the shape of the vocal tract when the speech in that block was uttered. As stated above, an LSP parameter vector is generated in each basic analysis period of 10 ms, and the components of the resulting vector are supplied to the piecewise optimal function approximator 104 once per basic analysis period of 10 ms (hereinafter called a basic frame).

The piecewise optimal function approximator 104 handles the parameter vectors, which are supplied one after another in this way, in groups of $K$.

Each basic frame is represented by the parameter vector belonging to it. $K$ consecutive basic frames are grouped into one section, and the piecewise optimal function approximation described below is performed on each section. The case described here uses rectangular approximation as the function for the piecewise optimal function approximation. The number of basic frames in one section is assumed to be 20 ($K = 20$), so one section is 200 ms long.

The processing performed by the piecewise optimal function approximator 104 is as follows.

From the 20 basic frames in one section, $i$ representative frames ($i = 1, 2, \ldots, 20$) are chosen, and the parameter vectors belonging to these representative frames are used to represent (approximate) the parameter vectors of the other basic frames in the section as well. This is the rectangular approximation. The $i$ representative frames (representative vectors) are selected so that the distortion caused by this approximation is minimized. The minimum distortion $E_i$ attainable with $i$ representative frames is obtained at the same time.

The distortion of the rectangular approximation in this case is computed as follows.

As an example, take $i = 2$, so that the rectangular approximation uses the parameter vectors of two representative frames. Suppose the 4th basic frame is selected as the first representative frame, and its parameter vector $\mathbf{P}^{(4)}$ is used as the first representative vector to approximate the nine frames from the 1st to the 9th basic frame. Suppose further that the 13th basic frame is selected as the second representative frame, and its parameter vector $\mathbf{P}^{(13)}$ is used as the second representative vector to approximate the remaining eleven frames from the 10th to the 20th basic frame. The distortion of this rectangular approximation is then obtained as follows.

Here $W_l$ ($l = 1, 2, \ldots, S$) are predetermined weighting coefficients that correct for the fact that a difference in a component of the parameter vector affects the distortion differently depending on its spectral position.
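The distortion equation itself is not reproduced in this text, so the following is only a sketch under a stated assumption: that the per-frame distortion is a weighted sum of squared component differences, with weights $W_l$. The functions `frame_distortion` and `rect_distortion` and the tuple layout of `example_segments` are hypothetical names chosen for illustration.

```python
def frame_distortion(p, q, weights):
    """Weighted squared difference between two parameter vectors.

    The squared form is an assumption; the text only states that each
    component difference is weighted by W_l.
    """
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q))


def rect_distortion(vectors, segments, weights):
    """Total distortion of a rectangular approximation of one section.

    vectors  -- one parameter vector per basic frame (list index 0 = frame 1)
    segments -- (first, last, rep) tuples in 1-based frame numbers, meaning
                frames first..last are all represented by frame rep
    """
    total = 0.0
    for first, last, rep in segments:
        rep_vec = vectors[rep - 1]
        for k in range(first, last + 1):
            total += frame_distortion(vectors[k - 1], rep_vec, weights)
    return total


# The example from the text: frames 1-9 represented by frame 4,
# frames 10-20 represented by frame 13.
example_segments = [(1, 9, 4), (10, 20, 13)]
```

The representative frame contributes zero to its own segment, which is why only the other frames in each segment add distortion.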

The optimal rectangular approximation for, say, $i = 2$ means determining the configuration of representative parameter vectors, containing two representative vectors, that minimizes the distortion computed in this way. That is, it means determining the two representative frames and the two intervals of basic frames that their parameter vectors represent. The residual distortion attained with this configuration is also recorded as data.

The computation described above can be carried out readily by dynamic programming (DP).

Let $G(b, a)$ denote the minimum distortion (residual distortion) attainable when the interval formed by the first $a$ basic frames of a section is approximated by $b$ representative frames that include the last basic frame (the $a$-th). When $b = 1$, that is, with a single representative vector, $\mathbf{P}^{(a)}$ represents the whole interval from the 1st to the $a$-th basic frame, so the residual distortion $G(1, a)$ is

$$G(1, a) = \sum_{k=1}^{a} d_{k,a}$$

which is uniquely determined for $a = 1$ to $20$.

Here $d_{k,a}$ is the distortion incurred when the parameter vector of the $k$-th basic frame is represented by the parameter vector (representative vector) of the $a$-th basic frame (representative frame).

Next, let $D_{x,y}$ denote the minimum distortion attainable when the interval from the $x$-th to the $y$-th basic frame (with $y > x$) is approximated with both end frames as representative frames, that is, using the representative vectors $\mathbf{P}^{(x)}$ and $\mathbf{P}^{(y)}$. $D_{x,y}$ is readily obtained as

$$D_{x,y} = \min_{x \le t \le y-1} \left[ \sum_{k=x}^{t} d_{k,x} + \sum_{k=t+1}^{y} d_{k,y} \right]$$

In other words, the interval represented by $\mathbf{P}^{(x)}$ is first only the $x$-th basic frame, then the $x$-th and $(x+1)$-th, then the $x$-th to $(x+2)$-th, and so on, with the remainder represented by $\mathbf{P}^{(y)}$ each time. The distortion is computed for each case and the smallest is selected.
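The split search that defines $D_{x,y}$ can be sketched as below. This is an illustrative sketch only: `pair_distortion` is a hypothetical name, frame indices are 0-based rather than the 1-based numbering of the text, and the table `d[k][a]` of per-frame distortions $d_{k,a}$ is assumed to have been computed beforehand.

```python
def pair_distortion(d, x, y):
    """Minimum distortion over frames x..y with both ends as representatives.

    d[k][a] is the distortion of representing frame k by frame a.
    Frame x covers x..t and frame y covers t+1..y; every split point t
    is tried and the smallest total is returned.
    """
    best = float("inf")
    for t in range(x, y):
        left = sum(d[k][x] for k in range(x, t + 1))
        right = sum(d[k][y] for k in range(t + 1, y + 1))
        best = min(best, left + right)
    return best
```

For adjacent frames the result is always zero, since each frame then represents only itself.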

Using $G(1, a)$ and $D_{x,y}$ obtained above, $G(2, a)$ for two representative frames is readily obtained as follows.

$$G(2, a) = \min \{\, G(1, a-1),\; G(1, a-2) + D_{a-2,a},\; \ldots,\; G(1, 1) + D_{1,a} \,\}$$

If the $(a-1)$-th basic frame is chosen as the first representative frame (the second representative frame being, of course, the $a$-th basic frame), the distortion is clearly $G(1, a-1)$, the first term on the right-hand side of the above equation.

If the first representative frame is moved one frame earlier, to the $(a-2)$-th, the minimum attainable distortion is $G(1, a-2) + D_{a-2,a}$, the second term on the right-hand side. Here $G(1, a-2)$ is the distortion when the interval from the 1st to the $(a-2)$-th frame is represented by the $(a-2)$-th basic frame (representative vector $\mathbf{P}^{(a-2)}$), and $D_{a-2,a}$ is the minimum distortion attainable when the interval from the $(a-2)$-th to the $a$-th frame is represented by the basic frames at its two ends (representative vectors $\mathbf{P}^{(a-2)}$ and $\mathbf{P}^{(a)}$). The minimum distortion in this case is clearly the sum of the two. (The distortion of the $(a-2)$-th basic frame itself is 0, so it does not matter that the intervals covered by $G(1, a-2)$ and $D_{a-2,a}$ overlap at this frame.) In the same way, the first representative frame is moved earlier one frame at a time, and the minimum attainable distortion is computed for each position.

When the first representative frame is moved all the way to the start and placed at the 1st basic frame, the minimum attainable distortion is clearly $G(1, 1) + D_{1,a}$, the last term on the right-hand side. Of course, $G(1, 1) = 0$.

Thus the minimum distortion (residual distortion) $G(2, a)$ attainable when the interval from the 1st to the $a$-th frame is represented by two representative frames (one of which is the $a$-th basic frame) is found by selecting the smallest of all the distortions obtained above, as the above equation expresses. Along with the residual distortion $G(2, a)$, the configuration of representative parameter vectors that yields it is determined: the two representative vectors $\mathbf{P}^{(x)}$ and $\mathbf{P}^{(a)}$, and the interval widths $B_x$ and $a - B_x$ that they respectively represent. In this way $G(2, a)$ and the corresponding configuration of representative parameter vectors are obtained for all $a = 1$ to $20$.

With one more representative frame, $G(3, a)$ is obtained from $G(2, a)$ and $D_{x,y}$ in exactly the same way as $G(2, a)$, by the following equation.

$$G(3, a) = \min \{\, G(2, a-1),\; G(2, a-2) + D_{a-2,a},\; \ldots,\; G(2, 2) + D_{2,a} \,\}$$

Once $G(3, a)$ has been obtained for $a = 3$ to $20$, $G(4, a)$, with one more representative frame, is obtained in exactly the same way for $a = 4$ to $20$.

By using DP in this way to increase the number of representative frames one at a time and find the minimum attainable distortion each time, $G(i, j)$ can be determined for any $i$ and $j$ ($j = 1, 2, \ldots, 20$; $i \le j$), together with the corresponding configuration of representative parameter vectors, that is, the set of $i$ representative vectors and the widths of the basic-frame intervals that each of them represents.
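The recursion for $G(i, j)$ can be sketched as a small table-filling routine. This is a sketch under assumptions, not the patent's implementation: `build_tables` is a hypothetical name, frame indices are 0-based, the per-frame distortion table `d[k][a]` is taken as given, and the bookkeeping needed to recover the chosen representative frames and interval widths is omitted for brevity.

```python
def build_tables(d):
    """Return (D, G) for one section of n = len(d) basic frames.

    D[x][y] -- best distortion over frames x..y with x and y both
               representative (0 when y <= x)
    G[b][a] -- best distortion over frames 0..a using b representatives,
               the last of which is frame a (row 0 is unused)
    """
    n = len(d)
    inf = float("inf")
    D = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x + 1, n):
            D[x][y] = min(
                sum(d[k][x] for k in range(x, t + 1))
                + sum(d[k][y] for k in range(t + 1, y + 1))
                for t in range(x, y)
            )
    G = [[inf] * n for _ in range(n + 1)]
    for a in range(n):
        # One representative: frame a stands for every frame up to a.
        G[1][a] = sum(d[k][a] for k in range(a + 1))
    for b in range(2, n + 1):
        for a in range(b - 1, n):
            # The previous representative x must leave room for b - 1 frames.
            G[b][a] = min(G[b - 1][x] + D[x][a] for x in range(b - 2, a))
    return D, G
```

For a 20-frame section the tables are small, which is consistent with the text's remark that per-section DP is easy while DP over the whole 10 seconds is not.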

$G(i, 20)$ obtained in this way is the minimum distortion attainable when one section of 20 basic frames is approximated by $i$ representative frames, but, as noted above, under the constraint that one of the $i$ representative frames is the 20th basic frame.

Let $E_i$ be the minimum distortion (residual distortion) attainable without this constraint, when any $i$ representative frames in a section of 20 basic frames are chosen for the optimal approximation. $E_i$ is obtained from the $G(i, j)$ found above as follows.

Let $D_k$ denote the distortion when the interval from the $k$-th to the 20th basic frame is approximated using the $k$-th basic frame as the representative frame (that is, using $\mathbf{P}^{(k)}$). It is readily obtained as

$$D_k = \sum_{j=k}^{20} d_{j,k}$$

Using $D_k$, $E_1$, for example, is obtained as

$$E_1 = \min_{1 \le k \le 20} \left[ G(1, k) + D_k \right]$$

and $E_i$ for any $i$ ($i = 1, 2, \ldots, 20$) as

$$E_i = \min_{i \le k \le 20} \left[ G(i, k) + D_k \right]$$

Once $E_i$ has been found, the configuration of representative parameter vectors that yields this residual distortion is determined, as described earlier: the set of $i$ representative vectors and the set of $i$ numbers giving the widths of the basic-frame intervals that each of them represents.
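Removing the constraint on the last frame amounts to letting the final representative frame $k$ also cover every frame after it. A minimal sketch, assuming a table `G[b][a]` of constrained minima and a per-frame distortion table `d[k][a]` are already available (both 0-based; `section_residuals` is a hypothetical name):

```python
def section_residuals(G, d):
    """Return [E_1, ..., E_n] for a section of n = len(d) basic frames.

    G[b][a] -- minimum distortion over frames 0..a with b representatives,
               the last of which is frame a (row 0 unused)
    d[k][a] -- distortion of representing frame k by frame a
    """
    n = len(d)
    # tail[k]: distortion when frame k also stands for frames k..n-1.
    tail = [sum(d[j][k] for j in range(k, n)) for k in range(n)]
    return [
        min(G[i][k] + tail[k] for k in range(i - 1, n))
        for i in range(1, n + 1)
    ]
```

The last entry is always zero when `d[k][k]` is zero, because every frame is then its own representative.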

For each section of 20 basic frames (200 ms), the piecewise optimal function approximator 104 performs the above computation to determine the configuration of representative parameter vectors containing any number $i$ of representative vectors, and the residual distortion of that configuration. That is, for each section it determines the set $\{E\}$ of $E_i$ (for all $i = 1$ to $20$), the set $\{\mathbf{P}\}$ of $i$ representative vectors corresponding to each $E_i$, and the set $\{B\}$ of the widths $B$ of the $i$ intervals that these representative vectors represent, and supplies these data to the overall optimal frame selector 105.

The overall optimal frame selector 105 performs optimal frame selection over a large interval (for example, 10 s long) made up of, for example, 50 of the sections of 20 basic frames described above.

The selector 105 has a memory that can hold the per-section data $\{E\}$, $\{\mathbf{P}\}$ and $\{B\}$, supplied as described above, for at least one large interval (50 sections). When the data for one large interval have all been supplied, it uses them to start the overall optimal frame selection described below.

The $\{E\}$ of each section is stored in a work area of the memory of the selector 105 as a matrix-form table, as shown in FIG. 2. The superscript $(j)$ of $E_i^{(j)}$ indicates that it is the distortion of the $j$-th section, and the subscript $i$ indicates that it is the minimum distortion (residual distortion) attainable when that section is optimally approximated by rectangles using $i$ representative frames ($i$ representative vectors). Clearly, for the same $j$ (within the same column), $E_i^{(j)}$ becomes smaller as $i$ becomes larger.

In this embodiment, as described above, the piecewise optimal function approximator 104 computes, for each section, the residual distortion $E_i$ of the optimal approximation for every possible number $i$ of representative frames and supplies the related data. It does not, however, decide which of these configurations should actually be used as the representative parameter vector configuration of each section.

The overall optimal frame selector 105, by contrast, selects the representative-frame configuration of each large interval. Given that the total number of representative frames (representative vectors) for the large interval is fixed in advance at $N$, it chooses the configuration so that the overall distortion in the large interval is as small as possible and balanced across the sections.

If the average number of representative frames per section is, for example, 5, the total number $N$ of representative frames (representative vectors) in the large interval is $N = 5 \times 50 = 250$.

The overall optimal frame selection is performed using the table shown in FIG. 2, according to the following algorithm.

(A0): First, one representative frame is selected for each section, and $N$ is set to 50. In this case the first row of the matrix, $E_1^{(j)}$ ($j = 1$ to $50$), gives the distortion of each section. Within each section, of course, the approximation is the optimal one for a single representative frame (representative vector).

(A1): The entries of the first row $E_1^{(j)}$ ($j = 1$ to $50$) are compared and the largest is selected. That is, the section that produces the largest distortion under the current representative-frame configuration of the large interval is selected.

(A2): Let the maximum selected above be $E_1^{(m)}$, that is, suppose the distortion belonging to the $m$-th column (the $m$-th section) was selected. All the $E_i^{(m)}$ in that column are then shifted up by one position. That is, $E_i^{(m)}$ is replaced by $E_{i+1}^{(m)}$ (for $i = 1, 2, \ldots, 19$).

Clearly $E_{20}^{(j)}$ is always 0 regardless of $j$, so each such upward shift adds exactly one more 0 at the tail of the shifted column.

The processing of (A1) and (A2) thus finds the section that produces the largest distortion under the current representative-frame configuration of the large interval, increases the number of representative frames of that section alone by one, and so updates that section to a finer approximation.

(A3): Because the processing of (A2) has increased the number of representative frames in the large interval by one, $N$ is increased by 1. If $N$ has thereby reached the predetermined total number of representative frames for the large interval, 250, the processing of (A4) is performed next. Otherwise the processing returns to (A1), and the procedure of finding the section with the largest distortion under the current configuration and increasing its number of representative frames by one is repeated.

(A4): After the above processing, the first row of the $E_i^{(j)}$ table holds the residual distortion of each section when the large interval is optimally approximated with 250 representative frames. Let $M_j$ be the number of 0s at the tail of the column for each $j$ ($j = 1$ to $50$). For the reason given above, $M_j$ is exactly the number of representative frames for the $j$-th section under this approximation of the large interval.
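Steps (A0) to (A4) can be sketched as a greedy loop. This is an illustrative sketch rather than the table-shifting procedure itself: instead of shifting each column, it keeps a per-section counter, which gives the same allocation. The name `allocate_frames` is hypothetical, and skipping sections that are already exact (every frame a representative) is an added safeguard not described in the text.

```python
def allocate_frames(E, total):
    """Distribute `total` representative frames across sections greedily.

    E[j][i - 1] is the residual distortion of section j when it uses i
    representative frames (non-increasing in i, 0 for the last entry).
    Returns the number of representative frames chosen for each section.
    """
    n_sections = len(E)
    if total < n_sections:
        raise ValueError("need at least one representative frame per section")
    counts = [1] * n_sections                       # (A0): one per section
    while sum(counts) < total:                      # (A3): stop at the target
        open_sections = [j for j in range(n_sections) if counts[j] < len(E[j])]
        if not open_sections:
            break                                   # every section is exact
        # (A1): section with the largest current residual distortion.
        worst = max(open_sections, key=lambda j: E[j][counts[j] - 1])
        counts[worst] += 1                          # (A2): refine that section
    return counts
```

With the figures in the text, `E` would have 50 rows of 20 entries each and `total` would be 250.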

The number of representative vectors in each section, the component values of each representative vector, and the interval width that each representative vector represents have thus been determined for the optimal approximation of the large interval with 250 representative frames. The selector 105 reads the component values of the representative vectors of each section, and the interval widths (numbers of basic frames) that they represent, from the memory area in which they are stored. It supplies the component values of each representative vector, one after another, to the quantizer 106, and supplies the interval width (number of basic frames) represented by each representative vector to the coder 108 as the number of basic frames over which that vector is to be repeated.

The quantizer 106 requantizes the supplied components of each representative vector at a coarseness determined by the requirements of the transmission channel and the transmission quality, and supplies them to the coder 108.

Meanwhile, the sound source information analyzer 107 extracts pitch information, voiced/unvoiced (V/UV) information, volume information and the like from the speech data supplied by the window function processor 102, using known means, and supplies this information to the coder 108.

The coder 108 combines and encodes the information supplied as described above into a form suitable for transmission and supplies it to the memory 109.

The memory 109 temporarily stores the supplied data for store-and-forward transmission of the speech, and sends it to the synthesis side 2 when the transmission path 1200 is free.

On the synthesis side 2, the data transmitted over the transmission path 1200 is first stored in the memory 201, read out of the memory 201 whenever speech is to be generated, and reproduced as speech by the following processing.

That is, the data read from the memory 201 is decoded by the decoder 202, which restores the data that was supplied to the input of the coder 108 on the analysis side 1.

Of the restored data, the pitch information from the sound source information analyzer 107 is supplied to the pulse oscillator 203 and controls it so that its oscillation frequency equals the fundamental frequency of the pitch. The voiced/unvoiced information (V/UV) is supplied as the switching control signal of the V/UV switch 205. When it specifies voiced sound (V), the switch 205 selects the output of the pulse oscillator 203; when it specifies unvoiced sound (UV), the switch 205 selects the output of the noise generator 204.

Furthermore, the volume information is supplied as control information to the power controller 206, which variably amplifies the output selected by the switch 205 so that its output has the specified power.
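The excitation path described above (pulse oscillator 203, noise generator 204, V/UV switch 205, power controller 206) can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the function name `make_excitation`, the representation of pitch as a period in samples, and the Gaussian noise source are all assumptions made for the example.

```python
import math
import random


def make_excitation(n_samples, voiced, pitch_period, target_power, seed=0):
    """Return an excitation signal scaled to a target mean power.

    Hypothetical helper: `pitch_period` is the pitch period in samples
    (an assumed representation of the pitch information).
    """
    rng = random.Random(seed)
    if voiced:
        # Pulse oscillator: one unit pulse per pitch period.
        source = [1.0 if i % pitch_period == 0 else 0.0 for i in range(n_samples)]
    else:
        # Noise generator: Gaussian noise (an assumed noise model).
        source = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Power controller: amplify so the mean power equals the specified value.
    power = sum(x * x for x in source) / n_samples
    gain = math.sqrt(target_power / power) if power > 0 else 0.0
    return [gain * x for x in source]


voiced_signal = make_excitation(80, True, 20, 0.25)
unvoiced_signal = make_excitation(80, False, 20, 0.25)
```

The resulting signal would then drive the synthesis filter as the sound source signal.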

The output of the power controller 206 thus obtained is supplied to the LSP synthesis filter 207 as the sound source signal that drives the filter.

Meanwhile, the components of each representative vector and the interval width represented by each representative vector, as decoded by the decoder 202, are supplied to the LSP synthesis filter 207 via the interpolator 209.

The interpolator 209 performs the interpolation for the rectangular approximation by repeating the components of each supplied representative vector in every basic frame over the interval width that the vector represents. It thereby generates the components of the LSP parameter vector for each basic frame and supplies them to the LSP synthesis filter 207.
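The rectangular interpolation amounts to repeating each representative vector for the number of basic frames it represents. A minimal sketch, assuming vectors are plain lists of floats and using the hypothetical name `expand_rectangular`:

```python
def expand_rectangular(rep_vectors, widths):
    """Repeat each representative vector over the basic frames it represents.

    rep_vectors: list of parameter vectors (lists of floats).
    widths: number of basic frames represented by each vector.
    """
    if len(rep_vectors) != len(widths):
        raise ValueError("one width is required per representative vector")
    frames = []
    for vector, width in zip(rep_vectors, widths):
        # Every basic frame in the interval receives a copy of the same vector.
        frames.extend(list(vector) for _ in range(width))
    return frames


reps = [[0.10, 0.25], [0.30, 0.45]]
widths = [3, 2]
frames = expand_rectangular(reps, widths)
```

Here five basic frames are regenerated from two transmitted vectors and two repetition counts.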

The LSP synthesis filter 207 synthesizes a speech signal by known means from the components of the LSP parameter vectors thus supplied and the sound source signal, and outputs it to the D/A converter and low-pass filter 208.

The synthesized digital speech signal is thus converted into an analog speech signal, stripped of unnecessary frequency components, and output on the output line 2000.

As described above, according to this embodiment, the transmitted speech information is not only an optimal approximation of each segment of about 200 msec for the number of representative frames allocated to that segment. It is also an optimal approximation over the large section of as long as 10 sec, made up of about 50 such segments, in which the distortions of the individual segments are well balanced.

That is, a segment in which the speech information changes rapidly is approximated more closely by using more representative frames, while a segment in which the speech information changes little is approximated coarsely with fewer representative frames. With the total amount of information to be transmitted held to a fixed limit, the approximation is thus optimized so that the distortions of the segments are balanced and as small as possible. Compared with fixing the number of representative frames per segment, this follows the fluctuation in the amount of speech information from segment to segment within the large section more faithfully, and so achieves more efficient compression of the amount of information or reproduction of higher-quality speech.

Moreover, if one tried to optimally approximate a large section of as long as 10 sec, containing as many as 1000 basic analysis frames, by applying the technique described for piecewise optimal approximation directly to the whole section, the amount of computation would be enormous and the approach would be almost impossible to realize. In this embodiment, the large section is divided into segments of about 200 msec, a length in common use. Piecewise optimal function approximation is first applied to each segment to obtain the optimal approximation, and its distortion, for every number of representative frames that might be allocated to that segment. Skillful use of these results makes optimal approximation of the large section feasible.
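The per-segment step, computing the smallest residual distortion for every candidate number of representative frames, can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not necessarily the procedure of the embodiment: the function name `segment_distortions`, the squared-error distortion measure, and the rule that each interval is represented by the best single frame inside it are all choices made for the example.

```python
def segment_distortions(frames, k_max):
    """Return a list d where d[k] is the minimum distortion of approximating
    `frames` with k contiguous intervals, each represented by one of its own
    frames (rectangular approximation). d[0] is infinity for non-empty input.
    """
    n = len(frames)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # cost[i][j]: distortion of frames i..j-1 when replaced by the best
    # single frame chosen from that interval.
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = min(
                sum(sqdist(frames[r], frames[t]) for t in range(i, j))
                for r in range(i, j)
            )

    inf = float("inf")
    # best[k][j]: minimum distortion covering the first j frames with k intervals.
    best = [[inf] * (n + 1) for _ in range(k_max + 1)]
    best[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            best[k][j] = min(best[k - 1][i] + cost[i][j] for i in range(k - 1, j))
    return [best[k][n] for k in range(k_max + 1)]


table = segment_distortions([[0.0], [0.0], [1.0], [1.0]], 4)
```

Running this once per 20-frame segment yields a small table per segment, which is far cheaper than one dynamic program over all 1000 frames of the large section.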

The above describes one embodiment of the present invention, and the invention is clearly not limited to it.

For example, the above embodiment was described with specific values: a basic frame length of 10 msec, 20 basic frames per segment (hence a segment length of 200 msec), 50 segments per large section (hence a large section of 10 sec containing 1000 basic frames), and 250 representative frames per large section. These are of course only examples, and the invention need not be limited to these values.
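The relationships among these example values can be checked with a few lines of arithmetic. The two averages at the end are derived here for illustration and are not figures stated in the embodiment.

```python
# Example values from the embodiment.
FRAME_MS = 10                 # basic frame length in milliseconds
FRAMES_PER_SEGMENT = 20       # basic frames per segment
SEGMENTS_PER_SECTION = 50     # segments per large section
REP_FRAMES_PER_SECTION = 250  # representative frames per large section

segment_ms = FRAME_MS * FRAMES_PER_SEGMENT                     # 200 msec
section_ms = segment_ms * SEGMENTS_PER_SECTION                 # 10 sec
frames_per_section = FRAMES_PER_SEGMENT * SEGMENTS_PER_SECTION  # 1000 frames

# Derived averages (illustrative only).
avg_reps_per_segment = REP_FRAMES_PER_SECTION / SEGMENTS_PER_SECTION
avg_frames_per_rep = frames_per_section / REP_FRAMES_PER_SECTION
```

On average each segment receives 5 representative frames and each representative frame stands for 4 basic frames, although the allocation to an individual segment varies with its distortion.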

Likewise, the dynamic programming method used for the piecewise optimal function approximation is only one example, and the invention need not be limited to it.

Furthermore, although the method was described using LSP (line spectrum pair) parameters as the speech feature parameter vector, it need not be limited to LSP parameter vectors. It can clearly also be implemented with, for example, LPC parameter vectors or other feature parameter vectors.

In addition, although this embodiment uses rectangular approximation as the function for the piecewise optimal function approximation, linear approximation or trapezoidal approximation may be used instead.

In linear approximation, the tips of successive selected representative vectors are joined by straight lines, and the vector of each basic frame they represent is determined by linear interpolation and used in place of that frame's actual parameter vector. The distortion of such an approximation is also easily obtained, in the same way as described above, from the component-wise differences between the actual parameter vector of each basic frame and the vector used in its place. The technique of this embodiment can therefore be applied almost unchanged to perform piecewise optimal function approximation and total optimum frame selection.

That is, the piecewise optimal function approximator obtains all the distortions that result when optimal linear approximation is performed with the number of representative vectors (representative frames) of each segment varied freely within the required range, and the total optimum frame selector uses these results to perform total optimum frame selection exactly as described above.

Specifically, every segment in the large section is first provisionally assigned the same minimum number of representative vectors. The segment that produces the largest distortion under this assignment is then found from the above results, and its number of representative vectors is increased by one to reduce the distortion. For the updated assignment, the segment that now produces the largest distortion is found again, and its number of representative vectors is increased by one to reduce the distortion further. In this way, each step finds the segment with the largest distortion and increases its number of representative vectors, so that the number of representative vectors in the whole large section grows by one per step. The steps are repeated until this number reaches the predetermined value, which completes the total optimum frame selection.
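The selection loop just described can be sketched as a greedy allocation over precomputed distortion tables. The function name `allocate_representatives` and the table layout (`distortion[j][k]` is the residual distortion of segment `j` when approximated with `k` representative vectors, with index 0 unused) are assumptions made for this illustration.

```python
def allocate_representatives(distortion, k_min, total):
    """Greedily distribute `total` representative vectors over the segments.

    distortion[j][k]: residual distortion of segment j with k vectors
    (index 0 is a placeholder and is never read).
    """
    n_segments = len(distortion)
    if total < k_min * n_segments:
        raise ValueError("total is smaller than the minimum allocation")
    # Start from the same minimum count in every segment.
    counts = [k_min] * n_segments
    while sum(counts) < total:
        # Only segments whose table has an entry for one more vector qualify.
        candidates = [j for j in range(n_segments)
                      if counts[j] + 1 < len(distortion[j])]
        if not candidates:
            break
        # Find the segment with the largest current distortion and give it
        # one more representative vector.
        worst = max(candidates, key=lambda j: distortion[j][counts[j]])
        counts[worst] += 1
    return counts


tables = [
    [None, 9.0, 4.0, 1.0, 0.0],  # segment 0
    [None, 2.0, 1.0, 0.5, 0.0],  # segment 1
    [None, 6.0, 5.0, 2.0, 0.0],  # segment 2
]
counts = allocate_representatives(tables, k_min=1, total=6)
```

In this toy case the quiet segment 1 keeps a single vector while the two busier segments absorb the extra ones, giving the allocation `[2, 1, 3]`.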

When linear approximation is used, the interpolator 209 on the synthesis side 2 performs linear interpolation using the successive representative parameter vectors supplied from the decoder 202 and the number of basic frames between them, generates a parameter vector for each basic frame, and supplies it to the synthesis filter 207.
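A minimal sketch of this linear interpolation follows. The function name `interpolate_linear` and the convention that `gaps[i]` counts the basic frames from representative vector `i` up to, but not including, representative vector `i + 1` are assumptions made for the example.

```python
def interpolate_linear(rep_vectors, gaps):
    """Generate one parameter vector per basic frame by joining successive
    representative vectors with straight lines.

    gaps[i]: number of basic frames from representative i to representative i+1.
    """
    if len(gaps) != len(rep_vectors) - 1:
        raise ValueError("one gap is required between each pair of vectors")
    frames = []
    for start, end, gap in zip(rep_vectors, rep_vectors[1:], gaps):
        for step in range(gap):
            weight = step / gap  # 0 at the start vector, approaching 1 at the end
            frames.append([(1 - weight) * a + weight * b
                           for a, b in zip(start, end)])
    # The final representative vector closes the sequence.
    frames.append(list(rep_vectors[-1]))
    return frames


line_frames = interpolate_linear([[0.0, 1.0], [1.0, 3.0]], [2])
```

With a gap of two basic frames, the frame midway between the two representative vectors receives their average.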

Trapezoidal approximation exploits a property of speech: the transient portions in which the speech information changes rapidly have a nearly constant duration of about 20 msec. Optimal approximation is performed with a trapezoidal function whose transition portions have a predetermined fixed duration (for example, two basic frames). It is particularly effective for optimally approximating speech feature parameter vectors. Using trapezoidal approximation reduces adverse effects, such as reverberant sound, that accompany abrupt changes in the parameter vector.

With trapezoidal approximation too, the distortion of the approximation can be obtained in much the same way as described in this embodiment. The method of the present invention, based on the piecewise optimal function approximation described above and total optimum frame selection using its results, can therefore clearly be applied as is.

In this embodiment, the method of the present invention was applied to a so-called voice store-and-forward device such as voice mail. The memory 109 on the analysis side 1 accumulates speech information arranged in a form suitable for sending over the transmission path and transmits it to the synthesis side 2 at a time convenient for the transmission path. The synthesis side 2 stores the transmitted speech information as is in the memory 201 and plays it back when convenient for the user. The method can clearly also be applied to ordinary speech analysis and synthesis devices such as vocoders. In that case, the memory 109 on the analysis side and the memory 201 on the synthesis side shown in this embodiment may be omitted.

Furthermore, the method can be applied to devices such as public address systems that generate speech by combining various prestored short speech segments as specified. That is, applying the method when generating the speech segments used by such a speech synthesizer can compress the amount of information and/or improve the sound quality.

Although the method of the present invention optimally approximates a large section of as long as about 10 sec, the synthesis side needs no time delay longer than the segment duration (about 20 msec) to reproduce the speech. This is particularly useful in devices such as voice mail and public address systems, because it allows speech to be reproduced without delay in response to the user's request.

(Effect of the Invention) As described above, the present invention realizes a variable-length frame speech analysis and synthesis method that can optimally approximate, with variable-length frames and as a whole, a speech block containing an enormous number of basic frames, on the order of 1000.

This achieves more effective compression of the amount of speech information and/or improved sound quality, and improves the performance of speech analysis and synthesis devices, voice store-and-forward devices, and speech synthesis devices.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, and FIG. 2 is a diagram for explaining the contents of the work area of the memory in the total optimum frame selector of that embodiment. In the figures: 1, speech analysis side; 2, speech synthesis side; 101, low-pass filter and A/D converter (LPF & A/D); 102, window function processor; 103, LSP analyzer; 104, piecewise optimal function approximator; 105, total optimum frame selector; 106, quantizer; 109, memory; 201, memory; 202, decoder; 203, pulse oscillator; 204, noise generator; 205, V/UV switch; 206, power controller; 208, D/A converter and low-pass filter (D/A & LPF); 209, interpolator.

Claims

[Claims]

1. A variable-length frame speech analysis and synthesis method characterized by comprising: speech analysis means for periodically analyzing an input speech signal at a predetermined constant analysis period to extract feature parameter vectors; piecewise optimal function approximation means for determining, for each segment consisting of a predetermined plurality of consecutive analysis periods, the configuration of the representative parameter vectors of that segment obtained when an arbitrary number of representative parameter vectors are selected from the feature parameter vectors in the segment and piecewise optimal function approximation is performed, together with the residual distortion of the optimal approximation of the segment in each such case; and total optimum frame selection means for selecting, for a large section consisting of a predetermined plurality of consecutive segments, a predetermined total number of representative parameter vectors that optimally approximate the large section, by repeating a processing step of comparing the residual distortions of the segments calculated by the piecewise optimal function approximation means and replacing the configuration of the representative parameter vectors of the segment having the largest residual distortion with a configuration containing a larger number of representative parameter vectors.