JP2853266B2

JP2853266B2 - Audio encoding device and audio decoding device

Info

Publication number: JP2853266B2
Application number: JP2129607A
Authority: JP
Inventors: 利幸森井
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-05-18
Filing date: 1990-05-18
Publication date: 1999-02-03
Anticipated expiration: 2014-02-03
Also published as: JPH0424699A

Description

[Detailed description of the invention]

[Field of industrial application]

The present invention relates to a speech encoding device and a speech decoding device that encode or decode speech.

[Prior art]

Conventionally, two approaches have been used to achieve speech coding at a low bit rate (about 4.8 kbps). Speech analysis-synthesis coding extracts the spectral features of speech by spectral analysis such as linear predictive analysis and encodes them together with excitation-source information. Speech waveform coding exploits the redundancy of speech to encode the waveform itself.

[Problems to be solved by the invention]

Of these conventional approaches, however, the former (speech analysis-synthesis coding) can achieve a low bit rate, but it is difficult to encode the excitation source well enough to synthesize high-quality speech. The latter (speech waveform coding) can decode high-quality speech, but it is difficult to achieve a low bit rate.

Thus, conventional speech encoding and decoding methods could not deliver high sound quality and low-bit-rate coding at the same time. A further problem common to both approaches is that their complex processing increases the amount of computation.

In view of the above problems, an object of the present invention is to realize low-bit-rate speech coding with simple data processing while retaining the form of speech waveform coding.

[Means for solving the problems]

To achieve this object, the first invention comprises: skeleton-count determining means that obtains the shape of a one-pitch basic waveform by pitch analysis of a speech signal, searches the basic waveform for the points of maximum absolute value on the positive and negative sides, takes the perpendiculars dropped from those points to the time axis as a skeleton, and determines the optimal number of skeleton sets, out of a plurality of kinds, according to the waveform of the speech signal; skeleton encoding means that encodes the number of skeletons so determined using the temporal positions and amplitudes of those points; a first inter-bone waveform codebook in which a plurality of numbered inter-bone waveform samples are stored; and inter-bone waveform encoding means that uses the first inter-bone waveform codebook to encode the inter-bone waveforms stretched between the skeletons obtained by the skeleton encoding means.

The second invention comprises: skeleton decoding means that takes as a skeleton the perpendiculars dropped to the time axis from points determined by information encoded using the temporal position and amplitude of a speech signal, and creates a plurality of kinds of skeletons using the number of skeleton sets extracted from the information on the speech signal; a second inter-bone waveform codebook in which a plurality of numbered inter-bone waveform samples are stored; and inter-bone waveform decoding means that uses the second inter-bone waveform codebook to decode the inter-bone waveforms stretched between the skeletons.

In the third invention, the first inter-bone waveform codebook stores inter-bone waveform samples, each obtained by taking one of a plurality of inter-bone waveforms found in advance by analysis of speech signals, normalizing it in time and power with its end points fixed, and assigning it an identification number; and the inter-bone waveform encoding means performs encoding by comparing the inter-bone waveform of the speech signal with the inter-bone waveform samples and selecting the identification number of the closest sample. This constitutes the speech encoding device. Likewise, the second inter-bone waveform codebook stores inter-bone waveform samples obtained in the same way, and the inter-bone waveform decoding means performs decoding by extracting the identification number from the received information and selecting the inter-bone waveform sample corresponding to that number. This constitutes the speech decoding device.

[Operation]

With the above configuration, the present invention assumes that speech is a waveform having a fundamental frequency and first obtains pitch information by pitch analysis of the speech. Next, a one-pitch basic waveform is obtained based on the pitch information, and several kinds of skeletons representing the shape of that basic waveform are searched to obtain skeleton information. The information of the waveforms stretched between the skeletons (inter-bone waveforms) is then compressed and encoded. By transmitting the skeleton information together with the inter-bone waveform information, low-bit-rate speech encoding and decoding can be realized with simple data processing.

[Embodiment]

FIG. 1 is a functional block diagram showing one embodiment of a speech encoding device and a speech decoding device according to the present invention.

First, each block constituting the speech encoding device and the speech decoding device is described below.

The encoder 1 is supplied with an input speech signal 3 that has been sampled, converted into a digital signal, and divided into segments of fixed duration (one frame).

In the pitch analysis unit 4 of the encoder 1, the pitch within the segment of the input speech signal 3 is determined and taken as the pitch information. Based on this pitch information, the pitch analysis unit 4 obtains an average one-pitch waveform from the waveform in the segment and sends it to the skeleton search unit 5 as the basic waveform.
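The averaging step can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the pitch is estimated or how the pitch periods are aligned, so the hypothetical function `average_pitch_waveform` takes the pitch period (in samples) as given and simply averages consecutive period-length segments starting at the beginning of the frame.

```python
def average_pitch_waveform(frame, pitch_period):
    """Average consecutive pitch-length segments of one frame.

    Assumption (not from the source): periods are aligned to the start of
    the frame, and any incomplete trailing period is ignored.
    """
    if pitch_period <= 0 or pitch_period > len(frame):
        raise ValueError("pitch_period must be between 1 and len(frame)")
    n_periods = len(frame) // pitch_period
    basic = [0.0] * pitch_period
    for p in range(n_periods):
        start = p * pitch_period
        for i in range(pitch_period):
            basic[i] += frame[start + i]
    return [v / n_periods for v in basic]
```

For a frame of 160 samples (20 msec at 8 kHz) and a pitch period of 40 samples, this would average four periods into one 40-sample basic waveform.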

The skeleton search unit 5 analyzes the shape of the basic waveform produced by the pitch analysis unit 4 and, while deciding how many stages of skeleton to set up, searches for the points of maximum absolute value on the positive and negative sides for each stage. The positions and amplitudes of those points constitute the skeleton information.

The skeleton search method used in the skeleton search unit 5 is now described in detail.

Every one-pitch basic waveform has an impulse-response-like shape, but the shape varies with the speaker and the circumstances of the utterance. To represent its outline with a skeleton, the number of stages must therefore be chosen according to the shape of the waveform: few stages for a waveform shaped like a gentle hill, and many stages for a waveform that oscillates strongly between positive and negative. An algorithm that performs the skeleton search while taking the number of skeleton stages into account is described below.

(1) Set the initial values.

Xi (i = 1, ..., L): the one-pitch basic waveform, where L is its length.

D: the maximum number of skeleton stages.

K: the set of positions excluded from the search, whose elements are positions from 1 to L. Initially K = ∅ (the empty set).

M: the number of stages searched so far. Initially M = 0.

Hi = (Ax, An, Ix, In): the skeleton information, consisting of four values: the signal value Ax of the MAX, the signal value An of the MIN, the position Ix of the MAX, and the position In of the MIN.

(2) M = M + 1

(3) X_max = max{ X_i | i = 1, ..., L, i ∉ K } = X_{i1}
    X_min = min{ X_i | i = 1, ..., L, i ∉ K } = X_{i2}
    H_M = (X_max, X_min, i1, i2)

(4) Around each of i1 and i2, add to K, as excluded positions, every position in the surrounding interval over which the sign of Xi does not change.

(5) If M = D, or if K contains every position from 1 to L, go to (6). Otherwise, go to (2).

(6) Extract only the position information from Hj (j = 1, ..., M) and sort the positions in ascending order.

(7) Going from the smallest position upward, check whether each position is a MAX position or a MIN position. If either kind occurs twice in succession, set M = M - 1 and go to (6). If the MAX and MIN positions all alternate, go to (8).

(8) End the search, with M as the number of skeleton stages and Hj (j = 1, ..., M) as the skeleton information.
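Steps (1) to (8) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `sign_run` and `skeleton_search` are hypothetical, positions are 0-based rather than 1-based, and a sample equal to zero is treated as non-negative, which the text does not specify.

```python
def sign_run(x, idx):
    """Positions around idx over which the sign of x does not change.

    Assumption: zero counts as non-negative.
    """
    positive = x[idx] >= 0
    lo = idx
    while lo > 0 and (x[lo - 1] >= 0) == positive:
        lo -= 1
    hi = idx
    while hi < len(x) - 1 and (x[hi + 1] >= 0) == positive:
        hi += 1
    return range(lo, hi + 1)


def skeleton_search(x, max_stages):
    """Return a list of (Ax, An, Ix, In) tuples, one per skeleton stage."""
    length = len(x)
    excluded = set()          # K: positions excluded from the search
    stages = []               # H_1 ... H_M
    # Steps (2)-(5): add stages until D is reached or K covers everything.
    while len(stages) < max_stages and len(excluded) < length:
        allowed = [i for i in range(length) if i not in excluded]
        i1 = max(allowed, key=lambda i: x[i])   # position of X_max
        i2 = min(allowed, key=lambda i: x[i])   # position of X_min
        stages.append((x[i1], x[i2], i1, i2))
        excluded.update(sign_run(x, i1))
        excluded.update(sign_run(x, i2))
    # Steps (6)-(7): drop the last stage until MAX and MIN alternate in time.
    while stages:
        marks = sorted([(h[2], "MAX") for h in stages] +
                       [(h[3], "MIN") for h in stages])
        if all(a[1] != b[1] for a, b in zip(marks, marks[1:])):
            break
        stages.pop()
    return stages             # step (8): M = len(stages)
```

For example, `skeleton_search([5, -4, 1, -2, 3, 1], 2)` first finds two stages, but the sorted positions read MAX, MIN, MIN, MAX, so the second stage is discarded and a single stage remains.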

FIG. 2 shows examples of sets of basic waveforms classified by the above algorithm. In FIG. 2, the solid lines show one-pitch basic waveforms and the broken lines show the skeleton positions. FIG. 2(a) shows the one-stage case, FIG. 2(b) the two-stage case, and FIG. 2(c) the three-stage case. FIG. 3 shows the relationship between the basic waveform and the skeleton information: A11, A12, A21, and A22 are the skeleton position information, and B11, B12, B21, and B22 are the signal value information.

Next, the function of the inter-bone waveform selection unit 6 is described with reference to FIG. 4, which shows the case of a one-stage skeleton.

First, based on the skeleton information supplied from the skeleton search unit 5, the waveform stretched from MAX signal C1 to MIN signal C2 and the waveform stretched from MIN signal C2 to MAX signal C1 within one pitch are obtained (see FIG. 4(a)) and taken as the basic inter-bone waveforms D1 and D2 (see FIG. 4(b)). Next, each of the basic inter-bone waveforms D1 and D2 is normalized in time and power with its end points fixed, giving waveforms E1 and E2 (see FIG. 4(c)).
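One possible reading of "normalized in time and power with end points fixed" is sketched below. The exact procedure is not given in the text, so everything here is an assumption: time normalization is done by linear interpolation to a fixed number of points `n`, and amplitude normalization is an affine map that sends the first end point to +1 and the last end point to -1. The function names are hypothetical.

```python
def resample(segment, n):
    """Linearly interpolate segment to n points, keeping both end points."""
    if len(segment) < 2 or n < 2:
        raise ValueError("need at least two points on both sides")
    out = []
    for k in range(n):
        t = k * (len(segment) - 1) / (n - 1)
        j = min(int(t), len(segment) - 2)
        frac = t - j
        out.append(segment[j] * (1 - frac) + segment[j + 1] * frac)
    return out


def normalize_interbone(segment, n=16):
    """Map a MAX-to-MIN (or MIN-to-MAX) segment to n points running +1 to -1.

    Assumption: the two end points (the skeleton points) differ in value.
    """
    first, last = segment[0], segment[-1]
    mid = (first + last) / 2
    half = (first - last) / 2
    if half == 0:
        raise ValueError("end points must differ")
    return [(v - mid) / half for v in resample(segment, n)]
```

With this convention the end points of every normalized waveform are identical, so only the shape between them has to be matched against the codebook.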

As shown in FIG. 4(d), waveforms E1 and E2 are compared with the numbered inter-bone waveform samples stored in the inter-bone waveform codebook 7 (see FIG. 1), and the numbers N and M attached to the closest samples are output as the inter-bone waveform information. The pitch information (output of the pitch analysis unit 4), the skeleton information (output of the skeleton search unit 5), and the inter-bone waveform information (output of the inter-bone waveform selection unit 6) obtained in this way are transmitted to the decoder 2 as the code for one unit time of speech.
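The codebook lookup can be sketched as a nearest-neighbour search. The text only says the "closest" sample is chosen; squared Euclidean distance is assumed here because the codebook is trained with Euclidean distance. The function name and the dictionary layout (identification number to sample) are hypothetical.

```python
def nearest_sample_id(waveform, codebook):
    """Return the identification number of the closest codebook sample.

    codebook: dict mapping identification number -> list of sample values,
    all the same length as waveform (an assumption of this sketch).
    """
    best_id = None
    best_dist = float("inf")
    for ident, sample in codebook.items():
        d = sum((a - b) ** 2 for a, b in zip(waveform, sample))
        if d < best_dist:
            best_id, best_dist = ident, d
    return best_id
```

Only the returned number is transmitted, so a codebook of size 2^b costs b bits per inter-bone waveform.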

The inter-bone waveform codebook 7 used here stores basic inter-bone waveforms collected in advance by analyzing a large amount of speech data, each normalized in time and power with its end points fixed and assigned a number.

The method of creating the information stored in the inter-bone waveform codebook 7 is now described in detail.

Clearly, the larger the inter-bone waveform codebook 7, the smaller its coding distortion, so a large codebook is desirable for high sound quality. To achieve a low bit rate, however, the number of bits of inter-bone waveform information should be small, and for the encoder 1 to operate in real time, the computation required for matching against the inter-bone waveform codebook 7 should also be small. An efficient inter-bone waveform codebook 7 that is small yet gives low coding distortion is therefore required.

To create such a codebook, a sufficiently large set of inter-bone waveform samples is clustered so as to minimize the Euclidean distance between each sample and its centroid, dividing the set into as many classes as the intended size of the inter-bone waveform codebook 7. The codebook is then built from the centroids of those clusters. The clustering algorithm used in this embodiment is a cell-division type algorithm, described below.

(1) K = 1

(2) Compute the centroids of the K clusters by simple averaging. Then compute the Euclidean distance between every sample in each cluster and that cluster's centroid, and take the maximum as the distortion of the cluster.

(3) Create two centroids near the centroid of the cluster with the largest distortion among the K clusters. (These become the nuclei of the cell division.)

(4) Perform clustering based on the K + 1 centroids and recompute the centroids.

(5) If there is an empty cluster, delete its centroid and go to (3).

(6) Compute the distortion of the K + 1 clusters as in (2). If the change in their sum is at or below a preset small threshold, go to (7); if it is larger than the threshold, go to (4).

(7) If K + 1 has not reached the target number of clusters, set K = K + 1 and go to (2); if it has, go to (8).

(8) Compute the centroids of all clusters and create the inter-bone waveform codebook 7.
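The cell-division procedure can be sketched as below. This is a simplified illustration under several assumptions that are not in the text: the two new centroids in step (3) are placed a small step `eps` on either side of the old centroid along the direction of its farthest sample, empty clusters in step (5) are simply dropped and the split is retried on the next round, and the outer loop is bounded so that the sketch always terminates. All function and parameter names are hypothetical.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(cluster):
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def distortion(cluster, center):
    # Step (2): largest sample-to-centroid distance within the cluster.
    return max(dist(s, center) for s in cluster)


def assign(samples, centers):
    clusters = [[] for _ in centers]
    for s in samples:
        nearest = min(range(len(centers)), key=lambda j: dist(s, centers[j]))
        clusters[nearest].append(s)
    return [c for c in clusters if c]      # step (5), simplified: drop empties


def build_codebook(samples, size, eps=1e-3, tol=1e-9, max_iter=100):
    clusters = [list(samples)]                                  # step (1)
    for _ in range(4 * size):              # safety bound (assumption)
        centers = [centroid(c) for c in clusters]               # step (2)
        scores = [distortion(c, m) for c, m in zip(clusters, centers)]
        worst = max(range(len(clusters)), key=scores.__getitem__)
        if len(centers) >= size or scores[worst] == 0.0:
            return centers                                      # step (8)
        seed = centers[worst]                                   # step (3)
        far = max(clusters[worst], key=lambda s: dist(s, seed))
        plus = [c + eps * (f - c) for c, f in zip(seed, far)]
        minus = [c - eps * (f - c) for c, f in zip(seed, far)]
        centers = centers[:worst] + centers[worst + 1:] + [plus, minus]
        previous = None
        for _ in range(max_iter):                               # steps (4)-(6)
            clusters = assign(samples, centers)
            centers = [centroid(c) for c in clusters]
            total = sum(distortion(c, m) for c, m in zip(clusters, centers))
            if previous is not None and abs(previous - total) <= tol:
                break
            previous = total
    return [centroid(c) for c in clusters]
```

Because one cluster is split per round, the codebook can be grown to any target size, not only powers of two.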

Next, the function of the decoder 2 is described with reference to FIGS. 1 and 5. FIG. 5 shows the case of a one-stage skeleton.

First, the skeleton forming unit 8 forms the speech skeletons C1 and C2 based on the pitch information (output of the pitch analysis unit 4) and the skeleton information (output of the skeleton search unit 5) obtained by encoding in the encoder 1. FIG. 5(a) shows an example, in which the skeleton is formed from the skeleton information.

The waveform synthesis unit 9 uses the inter-bone waveform information N and M supplied from the inter-bone waveform selection unit 6 to select the basic inter-bone waveforms E1 and E2 from the inter-bone waveform codebook 10, which is identical to the one stored in the encoder 1. It converts them in time and power to fit the skeleton, stretches them between the bones, and outputs the resulting waveform F as the output speech signal 11 (see FIG. 1). FIGS. 5(b) to 5(d) show an example of this waveform synthesis: inter-bone waveform samples selected from the inter-bone waveform codebook 10 according to the inter-bone waveform information N and M are used to stretch the basic inter-bone waveforms between the skeletons.
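The time and power conversion in the decoder can be sketched as the inverse of a normalization step. The text does not define the conversion, so this sketch assumes a hypothetical convention in which every codebook sample is stored running from +1 at its first point to -1 at its last point; the sample is then resampled by linear interpolation to the required length and mapped so that its end points land on the two skeleton amplitudes.

```python
def stretch_between(sample, start_amp, end_amp, length):
    """Stretch a normalized sample (+1 ... -1) between two skeleton points.

    sample:    normalized codebook waveform (assumed to start at +1, end at -1)
    start_amp: amplitude of the skeleton point where the segment begins
    end_amp:   amplitude of the skeleton point where the segment ends
    length:    number of output points, including both end points
    """
    if len(sample) < 2 or length < 2:
        raise ValueError("need at least two points on both sides")
    mid = (start_amp + end_amp) / 2
    half = (start_amp - end_amp) / 2
    out = []
    for k in range(length):
        t = k * (len(sample) - 1) / (length - 1)
        j = min(int(t), len(sample) - 2)
        frac = t - j
        v = sample[j] * (1 - frac) + sample[j + 1] * frac
        out.append(v * half + mid)
    return out
```

Concatenating the stretched segments for one pitch period, and repeating that period according to the pitch information, would give the output waveform F under these assumptions.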

To demonstrate the effect of this speech coding method, a simulation experiment of the encoding and decoding was carried out. The results are described below.

The speech data to be encoded is a weather forecast spoken by one female announcer: "Weather forecast. Here is the weather forecast issued at 1:30 p.m. by the Forecast Department of the Japan Meteorological Agency. A front extending east to west is stationary along the southern coast of Japan, and there are low-pressure systems on the front east of Hachijojima and near the Goto Islands off northern Kyushu, moving east-northeast." The speech was A/D converted at 8 kHz sampling into digital speech data about 20 seconds long. The speech data is analyzed every 20 msec (one frame). The inter-bone waveform codebook 7 was created with the above clustering algorithm from a set of inter-bone waveform samples obtained by analyzing about 10 seconds of speech from each of 50 male and female speakers, none of which includes the above speech data. The number of samples is about 20,000.

The maximum number of skeleton stages was set to three. To lower the bit rate further, bits were allocated adaptively according to the number of stages. The skeleton position information for the second and third stages and the skeleton gain information for the third stage were encoded by vector quantization to save bits. The size of the inter-bone waveform codebook 7 used to obtain the inter-bone waveform information was also varied adaptively according to the stage and the length of the waveform, so that short waveforms were encoded with a small inter-bone waveform codebook 7 and long waveforms with a large one. The bit allocation per unit of speech data (20 msec) is shown in Table 1 below.

In the coding experiment under these conditions, smooth and natural speech was synthesized despite the low bit rate, and an S/N ratio of about 10 dB was obtained. When the same experiment was tried on speech other than this data, S/N ratios of 7 to 11 dB were obtained and the sound quality was also good.
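The text does not state how the S/N ratio was computed. A common definition, assumed here, is the ratio of the energy of the original signal to the energy of the coding error over the whole utterance, expressed in decibels. The function name is hypothetical.

```python
import math


def snr_db(original, decoded):
    """Overall signal-to-noise ratio in dB (assumed definition).

    SNR = 10 * log10( sum(x^2) / sum((x - y)^2) )
    """
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if signal == 0 or noise == 0:
        raise ValueError("SNR is undefined for zero signal or zero error")
    return 10 * math.log10(signal / noise)
```

Under this definition, 10 dB means the error energy is one tenth of the signal energy.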

This speech coding experiment verified that the speech coding method of the present invention achieves low-bit-rate speech coding with simple data processing.

[Effects of the invention]

As described above, the present invention analyzes a speech signal to search for several kinds of skeletons and obtain skeleton information, compresses the information of the waveforms stretched between the skeletons (inter-bone waveforms) to obtain inter-bone waveform information, and encodes and decodes speech using the skeleton information and the inter-bone waveform information. A low-bit-rate, high-quality speech encoding device and speech decoding device can therefore be obtained.

[Brief description of the drawings]

FIG. 1 is a functional block diagram showing one embodiment of a speech encoding device and a speech decoding device according to the present invention. FIG. 2 is a waveform diagram showing sets of basic waveforms classified by number of skeleton stages by the skeleton search algorithm of the embodiment. FIG. 3 is a waveform diagram showing the relationship between the basic waveform and the skeleton information in the operation of the embodiment, taking the two-stage skeleton as an example. FIG. 4 is a waveform diagram explaining the operation of the embodiment for the encoder 1. FIG. 5 is a waveform diagram explaining the operation of the embodiment for the decoder 2.

1: encoder, 2: decoder, 3: input speech signal, 4: pitch analysis unit, 5: skeleton search unit, 6: inter-bone waveform selection unit, 7: inter-bone waveform codebook, 8: skeleton forming unit, 9: waveform synthesis unit, 10: inter-bone waveform codebook, 11: output speech signal.

Claims

(57) [Claims]

1. A speech encoding device comprising a pitch analysis unit (4), a skeleton search unit (5), a first inter-bone waveform codebook (7), and an inter-bone waveform selection unit (6), and outputting pitch information, skeleton information, and an identification number for each frame, wherein: the pitch analysis unit (4) analyzes the pitch of the speech signal of an input frame and extracts the pitch information of the frame and a basic waveform for one pitch; the skeleton search unit (5) finds the absolute-value maximum points of the basic waveform based on a predetermined rule and outputs the amplitude information and position information of the absolute-value maximum points as skeleton information; the first inter-bone waveform codebook (7) has a plurality of inter-bone waveform samples corresponding to identification numbers; and the inter-bone waveform selection unit (6) selects from the first inter-bone waveform codebook (7) the inter-bone waveform sample that matches the waveform between the absolute-value maximum points obtained by the skeleton search unit (5), and outputs the identification number of the selected inter-bone waveform sample.

2. The speech encoding device according to claim 1, wherein the inter-bone waveform samples of the first inter-bone waveform codebook (7) are waveforms normalized in time and power, and the matching inter-bone waveform sample is selected after the speech signal has been converted in time and power.

3. A speech decoding device comprising a skeleton forming unit (8), a second inter-bone waveform codebook (10), and a waveform synthesis unit (9), which receives, for each frame, pitch information, position information and amplitude information of absolute-value maximum points, and an identification number, and decodes a speech signal, wherein: the skeleton forming unit (8) determines the absolute-value maximum points based on the pitch information and the position information and amplitude information of the absolute-value maximum points; the second inter-bone waveform codebook (10) has a plurality of inter-bone waveform samples corresponding to identification numbers; and the waveform synthesis unit (9) places the inter-bone waveform sample corresponding to the identification number between the absolute-value maximum points to form a one-pitch basic waveform, and restores a speech waveform between frames based on the basic waveform.

4. The speech decoding device according to claim 3, wherein the inter-bone waveform samples of the second inter-bone waveform codebook (10) are waveforms normalized in time and power, and the waveform synthesis unit (9) converts the inter-bone waveform sample corresponding to the identification number in time and power and then places it between the absolute-value maximum points.