JP4626261B2

JP4626261B2 - Speech coding apparatus and speech coding method

Info

Publication number: JP4626261B2
Application number: JP2004307029A
Authority: JP
Inventors: 博康井手
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2004-10-21
Filing date: 2004-10-21
Publication date: 2011-02-02
Anticipated expiration: 2024-10-21
Also published as: JP2006119363A

Description

The present invention relates to a speech coding apparatus and a speech coding method.

Conventionally, audio signal compression methods in practical use include μ-law, ADPCM (Adaptive Differential Pulse Code Modulation), MP3 (MPEG Audio Layer-3), which is used for music, and CELP (Code-Excited Linear Prediction) family methods such as VSELP (Vector Sum Excited Linear Prediction) and G.729, which are used in mobile phones and the like. Patent Document 1 discloses an audio compression technique that uses vector quantization.
Patent Document 1: Japanese Patent Laid-Open No. 10-63299

When recording conversation or the like for language learning, a sampling frequency of about 16 kHz is considered appropriate because it preserves the characteristics of each language without requiring a large amount of data. However, the compression noise produced by CELP-family methods is unsuitable for language learning. μ-law and ADPCM provide sufficient sound quality, but their coding rates are high, so using them in a portable device shortens the available recording time. MP3 is mainly intended for compressing high-quality audio and cannot compress effectively at a sampling frequency of about 16 kHz.

An object of the present invention is to realize effective speech coding by improving coding efficiency.

A speech coding apparatus according to the present invention comprises: a framing unit that divides an input speech signal into frames; a frequency conversion unit that applies a frequency transform to the speech signal for each block constituting one frame and calculates transform coefficients; a division unit that divides the transform coefficients of the other blocks in the same frame by the transform coefficients of one block designated in advance from among the transform coefficients obtained by the frequency conversion unit; a vector quantization unit that groups the divided transform coefficient values of each block obtained by the division unit by frequency band and vector-quantizes them; and an encoding unit that encodes the index data obtained by the vector quantization unit using a predetermined encoding method.
Another speech coding apparatus according to the present invention comprises: a framing unit that divides an input speech signal into frames; a frequency conversion unit that applies a frequency transform to the speech signal for each block constituting one frame and calculates transform coefficients; a division unit that, based on the transform coefficients obtained by the frequency conversion unit, calculates the sum of the absolute values of the transform coefficients for each block constituting the frame and divides the transform coefficients of the other blocks in the same frame by the transform coefficients of the block whose sum is largest; a vector quantization unit that groups the divided transform coefficient values of each block obtained by the division unit by frequency band and vector-quantizes them; and an encoding unit that encodes the index data obtained by the vector quantization unit using a predetermined encoding method.
Yet another speech coding apparatus according to the present invention comprises: a framing unit that divides an input speech signal into frames; a frequency conversion unit that applies a frequency transform to the speech signal for each block constituting one frame and calculates transform coefficients; a division unit that finds the block containing the transform coefficient with the largest absolute value among the transform coefficients calculated for each block constituting the frame and divides the transform coefficients of the other blocks in the same frame by the transform coefficients of that block; a vector quantization unit that groups the divided transform coefficient values of each block obtained by the division unit by frequency band and vector-quantizes them; and an encoding unit that encodes the index data obtained by the vector quantization unit using a predetermined encoding method.

According to the present invention, among the transform coefficients obtained by the frequency transform, the transform coefficients of the other blocks in a frame are divided by the transform coefficients of one predetermined block in the same frame. This narrows the range of the absolute values of the transform coefficients after division, improves the efficiency of vector quantization, entropy coding, and the like, and thereby improves sound quality.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
First, the configuration of the present embodiment will be described.

FIG. 1 shows the configuration of a speech coding apparatus 100 according to an embodiment of the present invention. As shown in FIG. 1, the speech coding apparatus 100 comprises an A/D conversion unit 1, a DC (Direct Current) removal unit 2, a framing unit 3, a level adjustment unit 4, a frequency conversion unit 5, a division unit 6, a frequency rearrangement unit 7, a vector quantization unit 8, an entropy encoding unit 9, a rate controller 10, and a data deletion unit 11.

The A/D conversion unit 1 converts the input analog speech signal into a digital signal and outputs it to the DC removal unit 2. The sampling frequency is preferably about 16 kHz, but 11.025 kHz, 22.05 kHz, or the like may also be used.

The DC removal unit 2 removes the DC component of the speech signal input from the A/D conversion unit 1 and outputs the result to the framing unit 3. The DC component is removed because it has almost no bearing on sound quality. DC removal can be realized by, for example, a high-pass filter. One example of such a high-pass filter is given by Expression (1).
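Expression (1) is not reproduced in this text, so the following Python sketch uses a common first-order DC-blocking filter, y[n] = x[n] - x[n-1] + a·y[n-1], as a stand-in. The filter form, the function name `remove_dc`, and the coefficient `a = 0.995` are assumptions for illustration, not values taken from the patent.

```python
def remove_dc(samples, a=0.995):
    """First-order high-pass (DC-blocking) filter.

    Assumed stand-in for Expression (1): y[n] = x[n] - x[n-1] + a * y[n-1].
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + a * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

With a constant (pure DC) input, the output decays geometrically toward zero, which is the behavior the DC removal unit 2 needs.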

The framing unit 3 divides the signal input from the DC removal unit 2 into frames, which are the processing units of encoding (compression), and outputs them to the level adjustment unit 4. One frame is long enough to contain one or more, preferably four or more, blocks. One block is the unit on which a single MDCT (Modified Discrete Cosine Transform) is performed and has a length equal to the order of the MDCT. Hereinafter, each block constituting a frame is called an MDCT block. FIG. 2 shows the relationship between the input signal and the frames, and FIG. 3 shows the relationship between one frame and its MDCT blocks. As shown in FIG. 3, each MDCT block overlaps the immediately preceding MDCT block by half the length of an MDCT block. As shown in FIG. 2, each frame likewise overlaps the immediately preceding frame by half the length of an MDCT block.

The level adjustment unit 4 adjusts the level of the input speech signal frame by frame and outputs the level-adjusted signal to the frequency conversion unit 5. Level adjustment means making the maximum amplitude of the signal in one frame fit within a specified number of bits (hereinafter, the suppression target bits). For example, if the maximum amplitude of the signal in one frame occupies n bits and the number of suppression target bits is N, level adjustment can be realized by shifting every sample in the frame toward the LSB (Least Significant Bit) by the number of bits shift_bit that satisfies Expression (2).

For speech signals, suppression to about 10 bits is conceivable. Because a signal whose amplitude has been suppressed to the suppression target bits or fewer must be restored at playback, a signal representing shift_bit is output as part of the encoded speech signal.
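The level adjustment can be sketched in Python as follows. Expression (2) is not reproduced in this text, so the sketch assumes shift_bit = n - N when n > N and 0 otherwise; that rule, the function names `level_adjust` and `level_restore`, and the sign handling are assumptions.

```python
def level_adjust(frame, target_bits=10):
    """Shift every sample toward the LSB so the peak amplitude fits in target_bits."""
    peak = max(abs(s) for s in frame)
    n = peak.bit_length()                    # bits occupied by the largest amplitude
    shift_bit = max(0, n - target_bits)      # assumed form of Expression (2)
    adjusted = [s >> shift_bit if s >= 0 else -((-s) >> shift_bit) for s in frame]
    return adjusted, shift_bit


def level_restore(adjusted, shift_bit):
    """Decoder side: undo the shift (the discarded low-order bits are not recovered)."""
    return [s << shift_bit if s >= 0 else -((-s) << shift_bit) for s in adjusted]
```

Because shift_bit is needed to restore the level, it has to travel with the frame, which matches the statement that a signal representing shift_bit is output as part of the encoded speech signal.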

The frequency conversion unit 5 applies a frequency transform to the signal input from the level adjustment unit 4 and outputs the result to the division unit 6. The present embodiment uses the MDCT (Modified Discrete Cosine Transform) as the frequency transform. When the length of an MDCT block is M and the input signal is {x_n | n = 0, …, M-1}, the MDCT coefficients {X_k | k = 0, …, M/2-1} are given by Expression (3).

Here, h_n is a window function given by Expression (4).

For speech sampled at about 16 kHz, a block length M of about 256 is conceivable.
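Expressions (3) and (4) are not reproduced in this text. As a rough illustration only, the Python sketch below uses the widely used MDCT definition with a sine window; both the exact formula and the window are assumptions and may differ from the patent's expressions. It maps a block of M samples to M/2 coefficients, as described above.

```python
import math


def sine_window(M):
    """Sine window h_n (an assumed choice; Expression (4) may define another window)."""
    return [math.sin(math.pi * (n + 0.5) / M) for n in range(M)]


def mdct(block):
    """Direct O(M^2) MDCT of one block of length M, returning M/2 coefficients."""
    M = len(block)
    h = sine_window(M)
    n0 = 0.5 + M / 4.0                       # standard MDCT phase offset
    coeffs = []
    for k in range(M // 2):
        acc = 0.0
        for n in range(M):
            acc += h[n] * block[n] * math.cos(2.0 * math.pi / M * (n + n0) * (k + 0.5))
        coeffs.append(acc)
    return coeffs
```

A practical encoder would use an FFT-based MDCT; the direct form is shown only because it mirrors the summation.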

The division unit 6 divides the MDCT coefficients of the other MDCT blocks by the MDCT coefficients of an MDCT block designated in advance from among the MDCT blocks of one frame obtained by the frequency transform. For example, suppose that one frame contains four MDCT blocks M₀ to M₃ and that each MDCT block has MDCT coefficients of order 6. The MDCT coefficients of the i-th MDCT block are written M_i = {m_in | n = 0, …, 5}, and the MDCT block designated in advance as the divisor in the division process is M₀. The result M'_i = {m'_in | n = 0, …, 5} of dividing the MDCT coefficients of the other MDCT blocks M₁ to M₃ in the same frame by the MDCT coefficients of M₀ is given by Expression (5).
m'_in = m_in / m_0n, i = 1, 2, 3, n = 0, …, 5 (5)
If the absolute value of the divisor m_0n is less than 1, the division would make the MDCT coefficient larger rather than smaller, so the division is performed only when the absolute value of the divisor m_0n is 1 or more (|m_0n| > 1).
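The division step of Expression (5), including the guard on the divisor, can be sketched in Python as below. The function name `divide_blocks` and the parameter `ref` (index of the designated block) are illustrative names, not terms from the patent.

```python
def divide_blocks(blocks, ref=0):
    """Divide every non-reference block by the reference block, coefficient by coefficient.

    blocks[i][n] corresponds to m_in; blocks[ref] plays the role of M_0.
    """
    base = blocks[ref]
    result = []
    for i, block in enumerate(blocks):
        if i == ref:
            result.append(list(block))       # the reference block is kept unchanged
            continue
        result.append([
            m / d if abs(d) > 1 else m       # divide only when |m_0n| > 1
            for m, d in zip(block, base)
        ])
    return result
```

When |m_0n| equals exactly 1 the division changes at most the sign, so the sketch follows the strict inequality written in the text.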

For example, when the MDCT coefficients of the MDCT blocks M₀ to M₃ are as given in Expression (6), M'_i (i = 1, 2, 3) after the division process are as given in Expression (7).

The frequency rearrangement unit 7 rearranges the MDCT coefficients obtained by the division process of the division unit 6 by frequency, groups the coefficients of the same frequency band into vectors, and outputs them to the vector quantization unit 8. Grouping signals of the same frequency band into one vector improves the accuracy of the subsequent vector quantization when, for example, the signal contains many stationary components. When one frame contains m MDCT blocks and each MDCT yields M/2 MDCT coefficients, and the j-th MDCT coefficient of the i-th MDCT block is X_ij, the vector F_j that groups the j-th frequency band is F_j = {X_ij | i = 0, …, m-1}, j = 0, …, M/2-1.
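The regrouping into per-band vectors F_j is a transpose of the block-by-coefficient layout. A minimal Python sketch, with the hypothetical name `group_by_frequency`:

```python
def group_by_frequency(blocks):
    """blocks[i][j] is X_ij; returns F with F[j] = [X_0j, X_1j, ..., X_(m-1)j]."""
    return [list(band) for band in zip(*blocks)]
```

For a frame of m blocks with M/2 coefficients each, the result has M/2 vectors of length m.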

The vector quantization unit 8 has a VQ (Vector Quantization) table that stores representative vectors representing a plurality of speech patterns. It compares each vector F_j created by the frequency rearrangement unit 7 with the representative vectors stored in the VQ table and outputs the index of the most similar representative vector, as a code, to the entropy encoding unit 9.

For example, let the vector to be encoded, of vector length N, be {s_j | j = 1, …, N}, and let the k representative vectors stored in the VQ table be {V_i | i = 1, …, k}, V_i = {v_ij | j = 1, …, N}. The code to be output is the index i that minimizes the error e_i between the vector to be encoded and the elements v_ij of the i-th representative vector stored in the VQ table. The error e_i is calculated by Expression (8).

The number k of representative vectors and the vector length N of the vector to be encoded are determined in consideration of the processing time required for vector quantization, the capacity of the VQ table, and the like. For example, the vector length may be 2 with 256 representative vectors, or the vector length may be 4 with 8192 (= 2¹³) representative vectors; the combination can be chosen freely.
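The table search can be sketched in Python as follows. Expression (8) is not reproduced in this text, so the sketch assumes the usual sum of squared differences as the error e_i; the function name `quantize` is hypothetical, and indices are 0-based here whereas the text counts from 1.

```python
def quantize(vector, codebook):
    """Return the index of the representative vector closest to `vector`.

    Assumed error measure: e_i = sum_j (s_j - v_ij)^2.
    """
    best_index = 0
    best_error = float("inf")
    for i, rep in enumerate(codebook):
        error = sum((s - v) ** 2 for s, v in zip(vector, rep))
        if error < best_error:
            best_index, best_error = i, error
    return best_index
```

The exhaustive search costs k × N operations per vector, which is why the text ties the choice of k and N to processing time and table capacity.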

Because speech often has different characteristics in its high-frequency and low-frequency parts, the present embodiment uses different VQ tables for the high band and the low band. The VQ table storing the high-band representative vectors is called the high-band VQ table 8a, and the VQ table storing the low-band representative vectors is called the low-band VQ table 8b. For the vectors F_j = {X_ij | i = 0, …, m-1}, j = 0, …, M/2-1 created by the frequency rearrangement unit 7, the boundary between the high band and the low band can be set by simply splitting the band index j in half. That is, F₀, F₁, …, F_(M/4-1) are treated as the low band, and F_(M/4), F_(M/4+1), …, F_(M/2-1) as the high band. Accordingly, the low-band vectors F₀, F₁, …, F_(M/4-1) are compared with the representative vectors stored in the low-band VQ table 8b, and the index of the most similar representative vector is output as a code. Similarly, the high-band vectors F_(M/4), F_(M/4+1), …, F_(M/2-1) are compared with the representative vectors stored in the high-band VQ table 8a, and the index of the most similar representative vector is output as a code.

The entropy encoding unit 9 applies entropy coding to the codes input from the vector quantization unit 8 and outputs the result to the rate controller 10. Entropy coding is a coding method that uses the statistical properties of a signal to convert codes into shorter codes; examples include Huffman coding, arithmetic coding, and coding by a range coder. Entropy coding is described in detail later with reference to FIGS. 4 to 8.

The rate controller 10 determines whether the data amount of the code obtained by entropy coding is larger than a preset target data amount. If it is larger, the rate controller 10 requests the data deletion unit 11 to reduce the data amount of the code. If it is less than or equal to the target data amount, the rate controller 10 outputs the code obtained by entropy coding as the encoded (compressed) speech signal. The encoded speech signal output from the rate controller 10 is recorded on a recording medium or transmitted to an external device via a communication network.

When the rate controller 10 determines that the data amount of the code obtained by entropy coding is larger than the target data amount, the data deletion unit 11 deletes the band with the smallest energy |F_j|², outputs the speech signal after deletion to the entropy encoding unit 9, and requests entropy coding again.
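The interplay between the rate controller 10 and the data deletion unit 11 amounts to a loop that drops the weakest band and re-encodes until the target is met. The Python sketch below is one way to picture it; `fit_to_target`, the `encode` callback (standing in for everything downstream of the deletion), and the choice to represent a deleted band by zeros are all assumptions.

```python
def fit_to_target(bands, encode, target_bits):
    """Delete the lowest-energy band and re-encode until the code fits target_bits.

    bands[j] is the vector F_j; encode(bands) returns the code as a bit string.
    """
    bands = [list(b) for b in bands]         # do not modify the caller's data
    alive = set(range(len(bands)))           # bands that have not been deleted yet
    code = encode(bands)
    while len(code) > target_bits and alive:
        weakest = min(alive, key=lambda j: sum(x * x for x in bands[j]))  # min |F_j|^2
        bands[weakest] = [0.0] * len(bands[weakest])
        alive.discard(weakest)
        code = encode(bands)
    return code, bands
```

The loop stops either when the code is short enough or when no band is left to delete.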

<Entropy coding>
In the following, Huffman coding and coding by a range coder are described as examples of the entropy coding applied in the present embodiment.

(Huffman coding)
Huffman coding compresses the overall data amount by assigning short codes to symbols that appear frequently and long codes to symbols that appear rarely. For example, suppose there is 100 characters of data consisting of the four symbols {a, b, c, d}. If a binary code of the same length (a fixed-length code) is assigned to every symbol, 2 bits are needed to represent the four symbols, so the data amount of the 100 characters is 2 [bit] × 100 = 200 [bit].

In Huffman coding, a binary code is assigned to each symbol according to its appearance frequency. FIG. 4 shows an example of the binary codes assigned to the symbols a, b, c, and d when their appearance frequencies in the 100-character data are 10, 70, 1, and 19, respectively. As shown in FIG. 4, when the codes 100, 0, 101, and 11 are assigned to the symbols a, b, c, and d, respectively, the data amount of the 100 characters is 3 [bit] × 10 + 1 [bit] × 70 + 3 [bit] × 1 + 2 [bit] × 19 = 141 [bit], which is about 70% of the data amount of the fixed-length code.
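The code lengths in this example can be checked with a short Python sketch that builds a Huffman tree from the frequencies and records only the length of each code. The function name `huffman_code_lengths` is hypothetical; the exact bit patterns depend on tie-breaking and need not equal those in FIG. 4, but the lengths and the 141-bit total do.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    # Each heap entry: (total frequency, tie-breaker, symbols under this node).
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1                # merged symbols move one level deeper
        heapq.heappush(heap, (f1 + f2, tie, syms1 + syms2))
        tie += 1
    return lengths


freqs = {"a": 10, "b": 70, "c": 1, "d": 19}
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
print(lengths, total_bits)  # {'a': 3, 'b': 1, 'c': 3, 'd': 2} 141
```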

(Coding by a range coder)
Let the set of symbols contained in the original signal before encoding be S = {s_i | i = 1, …, n}, and let the appearance probability of each symbol s_i be p_i. In the symbol sequence {s₁, s₂, …, s_n} obtained by arranging the symbols s_i of the original signal in a predetermined order, let G_k be the sum of the appearance probabilities of the symbols that precede the symbol s_k (k ≥ 2). That is, G_k is given by Expression (9).

In coding by a range coder, a numerical range (lower limit, width) is assigned to the symbol sequence represented by the signals input so far, based on a table that stores the appearance probability p_i and G_i for each symbol (hereinafter, the occurrence probability table). The range (lower limit, width) assigned after an input is determined from the range assigned at the immediately preceding input and from the occurrence probability table.

Let range' and low' be the width and lower limit set when the signal s_k to be encoded is input, and let range and low be the width and lower limit set when the signal immediately before s_k was input. Then range' and low' are given by Expressions (10) and (11), respectively.
range' = range × p_k (10)
low' = low + range × G_k (11)
The range' and low' calculated by Expressions (10) and (11) become the range and low used when the next signal is input.

The calculations of Expressions (10) and (11) are repeated until no input signal remains, and a value in the interval from low to low + range, determined from the range and low calculated when the last signal was input, is output as the code value.

FIG. 5 shows an example of range coder encoding. FIG. 5(a) shows an example of the occurrence probability table when the set of symbols contained in the original signal is S = {s₁ = a, s₂ = b, s₃ = c, s₄ = d}. FIG. 5(b) shows an example of encoding the symbol sequence {baca}. In FIG. 5(b), the code representing the symbol sequence is a decimal number, the initial value of low is 0, and the initial value of range is 10⁶. In FIG. 5(b), the "input signal" column shows the input symbol, the "symbol sequence" column shows the symbols input so far, the "low" column shows low' calculated by Expression (11), and the "range" column shows range' calculated by Expression (10). The "interval" column shows the interval of code values determined from low and range. The notation [x, y) means that the code value Z satisfies x ≤ Z < y. According to FIG. 5(b), one of the code values Z satisfying 593750 ≤ Z < 603125 (for example, 600000) is output as the result of encoding the symbol sequence {baca}.
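Expressions (10) and (11) can be traced with the short Python sketch below. FIG. 5(a) is not reproduced in this text, so the probabilities used here (a = 1/2, b = 1/4, c = 3/20, d = 1/10) are assumed values, chosen because they reproduce the interval [593750, 603125) quoted above; the function name `range_encode` is also hypothetical.

```python
from fractions import Fraction


def range_encode(symbols, probs, low=0, rng=10**6):
    """Apply Expressions (10) and (11) once per symbol with a fixed probability table.

    probs maps each symbol to p_i; the dict order is the predetermined symbol order.
    """
    order = list(probs)
    for s in symbols:
        k = order.index(s)
        G = sum((probs[t] for t in order[:k]), Fraction(0))  # sum of p_i before s_k
        low = low + rng * G                                  # Expression (11)
        rng = rng * probs[s]                                 # Expression (10)
    return low, rng


# Assumed table; FIG. 5(a) itself is not available in this text.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(3, 20), "d": Fraction(1, 10)}
low, rng = range_encode("baca", probs)
print(low, low + rng)  # 593750 603125
```

Exact fractions are used so that the trace is free of floating-point rounding; low is updated before range because Expression (11) uses the previous range.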

As described above, coding by a range coder encodes each input symbol using predetermined appearance probabilities, so it is very effective when the signal comes from an information source in which the appearance probability of each symbol is fixed. However, a signal to be encoded very rarely comes from an information source with constant appearance probabilities, so in the range coder described above the appearance probabilities are not adapted to the signal being encoded. In the present embodiment, therefore, the appearance probabilities are updated every time a signal is input during range coder encoding, which makes the coding adaptable to the actual signal. Coding by the range coder of the present embodiment is described below.

As above, let the set of symbols contained in the original signal before encoding be S = {s_i | i = 1, …, n}. Let g_i be the appearance frequency of the symbol s_i in the original signal, cum the sum of the appearance frequencies g_i, and p_i the appearance probability of each symbol s_i. Then cum and p_i are given by Expressions (12) and (13), respectively.

The entropy encoding unit 9 has an occurrence probability table 81, as shown in FIG. 6, as the table for setting the width range and the lower limit low for an input signal. As shown in FIG. 6, the occurrence probability table 81 stores, for each symbol, the appearance frequency g_i, the appearance probability p_i, and G_i in association with one another. G_i is defined as in Expression (9).

Let range' and low' be the width and lower limit set when the signal s_k to be encoded is input to the entropy encoding unit 9, and let range and low be the width and lower limit set when the signal immediately before s_k was input. Then range' and low' are given by Expressions (14) and (15), respectively.

The range' and low' calculated by Expressions (14) and (15) become the range and low used when the next signal is input.

When range and low have been calculated for the input signal s_k, the entropy encoding unit 9 adds 1 to the appearance frequency g_k, as shown in Expression (16), and takes the resulting g_k' as the new g_k.
g_k' = g_k + 1 (16)
Following the increment of the appearance frequency g_k, the entropy encoding unit 9 recalculates cum, the appearance probabilities p_i, and G_i, and updates the occurrence probability table 81. The entropy encoding unit 9 repeats this processing until no input signal remains and outputs, as the code value, a value in the interval from low to low + range determined from the range and low calculated when the last signal was input.

FIGS. 7 and 8 show an example of range coder encoding according to this embodiment. FIG. 7(a) shows an example of the default occurrence probability table 81 when the set of symbols contained in the original signal is S = {s₁ = a, s₂ = b, s₃ = c, s₄ = d}. The values p_i and G_i of the default occurrence probability table 81 shown in FIG. 7(a) are assumed to be the same as those of the occurrence probability table shown in FIG. 5(a). FIG. 7(b) shows an example of encoding the symbol string {baca}, the same string as in FIG. 5(b). In FIG. 7(b) as well, the code representing the symbol string is expressed in decimal, the initial value of low is 0, and the initial value of range is 10⁶. In FIG. 7(b), the "input signal" item indicates the input symbol, the "symbol string" item indicates the symbols input so far, the "low" item indicates low' calculated by Equation (15), and the "range" item indicates range' calculated by Equation (14). The "interval" item indicates the interval of code values determined from low and range. The "occurrence probability table" item indicates the occurrence probability table as updated after each symbol is input. FIG. 8 shows the occurrence probability table updated after each symbol input. According to FIG. 7(b), because the occurrence probability table is updated each time a symbol is input, the interval for the symbol string {baca} differs from the one obtained when the occurrence probability table is fixed, as in FIG. 5(b): one of the code values Z satisfying 591992 ≤ Z < 599757 is output as the result of encoding the symbol string {baca}.
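The adaptive encoding described above can be illustrated with a minimal sketch. It assumes the usual range coder update, range' = range × p_i and low' = low + range × G_i, with p_i and G_i derived from integer occurrence counts, and it assumes that the table update simply adds 1 to the count of the symbol just encoded. The function name, the initial counts, and the update rule are hypothetical; the sketch does not reproduce the tables or the numerical values of FIGS. 7 and 8.

```python
def encode_adaptive(symbols, alphabet, init_counts, total_range=10**6):
    """Return (low, range) after encoding `symbols` with an adaptive table.

    `init_counts` maps each symbol to an integer occurrence count
    (hypothetical values; not the table of FIG. 7(a)).
    """
    counts = dict(init_counts)
    low, rng = 0, total_range
    for s in symbols:
        cum = sum(counts[a] for a in alphabet)  # total count
        below = 0                               # count of symbols ordered before s
        for a in alphabet:
            if a == s:
                break
            below += counts[a]
        # low' = low + range * G_i, range' = range * p_i (integer arithmetic)
        low += rng * below // cum
        rng = rng * counts[s] // cum
        counts[s] += 1                          # assumed update rule
    return low, rng


if __name__ == "__main__":
    low, rng = encode_adaptive("baca", "abcd", {"a": 1, "b": 1, "c": 1, "d": 1})
    print(low, low + rng)  # any Z with low <= Z < low + rng identifies the string
```

Because the counts change after every symbol, the same symbol string maps to a different interval than it would with a fixed table, which is the effect FIG. 7(b) illustrates.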

FIG. 9 shows the configuration of a speech decoding apparatus 200 that decodes a speech signal encoded (compressed) by the speech encoding apparatus 100. As shown in FIG. 9, the speech decoding apparatus 200 includes an entropy decoding unit 21, an inverse vector quantization unit 22, a time-order rearrangement unit 23, a multiplication unit 24, a frequency inverse transform unit 25, a level reproduction unit 26, a frame synthesis unit 27, and a D/A conversion unit 28. The speech encoding apparatus 100 and the speech decoding apparatus 200 may be provided together in a single housing, or each may be provided as a separate unit.

The entropy decoding unit 21 decodes the entropy-encoded signal and outputs the result to the inverse vector quantization unit 22. The inverse vector quantization unit 22 has a high-frequency VQ table 22a and a low-frequency VQ table 22b as tables storing representative vectors that indicate a plurality of speech patterns. It extracts the representative vector corresponding to the signal (index) input from the entropy decoding unit 21 and outputs it to the time-order rearrangement unit 23. The high-frequency VQ table 22a and the low-frequency VQ table 22b are identical to the high-frequency VQ table 8a and the low-frequency VQ table 8b shown in FIG. 1, respectively.

The time-order rearrangement unit 23 rearranges the vectors input from the inverse vector quantization unit 22 into time order and outputs them to the multiplication unit 24. The multiplication unit 24 multiplies the MDCT coefficients of the other MDCT blocks by the MDCT coefficients of a pre-designated MDCT block among the MDCT blocks for one frame obtained by the time-order rearrangement unit 23, and outputs the multiplication result to the frequency inverse transform unit 25. Here, the pre-designated MDCT block is the same MDCT block that was used as the divisor in the division unit 6 of the speech encoding apparatus 100 at the time of encoding.

The frequency inverse transform unit 25 applies an inverse MDCT to the signal (MDCT coefficients) input from the multiplication unit 24 and outputs the result to the level reproduction unit 26. The level reproduction unit 26 adjusts the level of the signal input from the frequency inverse transform unit 25 to restore the original level, and outputs the result to the frame synthesis unit 27.

The frame synthesis unit 27 combines the frames that served as the processing units for encoding and decoding, and outputs the combined signal to the D/A conversion unit 28. The D/A conversion unit 28 converts the digital signal input from the frame synthesis unit 27 into an analog signal and outputs it as the reproduced audio signal.

Next, the operation of this embodiment will be described.
First, the speech encoding process executed in the speech encoding apparatus 100 will be described with reference to the flowchart of FIG. 10.

First, when an analog audio signal is input, the A/D conversion unit 1 converts it into a digital audio signal (step S1). Hereinafter, the digital audio signal to be encoded is simply referred to as the audio signal. Next, the DC removal unit 2 removes the DC component of the audio signal (step S2), and the framing unit 3 divides the audio signal, with its DC component removed, into frames (step S3).

Next, the level adjustment unit 4 adjusts the level of the input audio signal for each frame (step S4), and the frequency conversion unit 5 applies the MDCT to the level-adjusted audio signal (step S5).

Next, the division unit 6 performs, for each frame, a division process (frequency conversion coefficient division process) on the MDCT coefficients obtained in step S5 (step S6). In step S6, as shown in FIG. 11, the MDCT coefficients of the other MDCT blocks in the same frame are divided by the MDCT coefficients of a pre-designated MDCT block among the MDCT blocks for one frame obtained by the frequency conversion unit 5 (step S20).

Next, the frequency rearrangement unit 7 rearranges the MDCT coefficients obtained by the frequency conversion coefficient division process of step S6 by frequency (step S7), and the coefficients belonging to the same frequency band are grouped into vectors.
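The rearrangement by frequency can be pictured as a transpose of a blocks-by-frequency table. The following is a minimal sketch under the assumption that one vector is formed per frequency index; the actual band layout is not specified in this passage, and the function name is hypothetical.

```python
def regroup_by_frequency(blocks):
    """blocks[i][n] is coefficient n of block i (time order).

    Returns vectors[n][i]: one vector per frequency index n, collecting
    that frequency's coefficient from every block in the frame.
    Assumes all blocks have the same number of coefficients.
    """
    return [list(column) for column in zip(*blocks)]


if __name__ == "__main__":
    frame = [[1, 2, 3], [4, 5, 6]]        # two blocks, three frequencies
    print(regroup_by_frequency(frame))
```

Applying the same function to its own output restores the time-ordered layout, which corresponds to the role of the time-order rearrangement unit 23 on the decoding side.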

Next, the vector quantization unit 8 compares the vectors of high-frequency MDCT coefficients with the representative vectors stored in the high-frequency VQ table 8a, and compares the vectors of low-frequency MDCT coefficients with the representative vectors stored in the low-frequency VQ table 8b. The index of the most similar representative vector is output as the code (step S8).
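The table lookup in step S8 amounts to a nearest-neighbour search over a codebook. The sketch below assumes squared Euclidean distance as the similarity measure, which this passage does not specify; the function names and the toy codebook are hypothetical.

```python
def nearest_index(vector, codebook):
    """Return the index of the representative vector closest to `vector`.

    Similarity is measured by squared Euclidean distance (an assumption).
    Ties resolve to the lowest index.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def lookup(index, codebook):
    """Inverse vector quantization: fetch the representative vector."""
    return codebook[index]


if __name__ == "__main__":
    toy_table = [[0, 0], [1, 1], [4, 4]]   # stand-in for a VQ table
    idx = nearest_index([0.9, 1.2], toy_table)
    print(idx, lookup(idx, toy_table))
```

Only the index is transmitted, so the encoder and decoder must hold identical tables, as stated for VQ tables 8a/8b and 22a/22b.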

Next, entropy encoding is applied, frame by frame, to the vector-quantized audio signal (step S9), and the entropy-encoded signal is output to the rate controller 10 as the encoded speech signal. The rate controller 10 then determines whether the encoded speech signal for one frame input from the entropy encoding unit 9 is at or below a preset target data amount (step S10).

If it is determined in step S10 that the input encoded speech signal exceeds the target data amount (step S10; NO), the data deletion unit 11 deletes, from the encoded speech signal for one frame, the signal of the band whose energy |F_j|² is smallest (step S12). When step S12 ends, the process returns to step S9, and entropy encoding is performed again on the audio signal of that frame (step S9).

If it is determined in step S10 that the input encoded speech signal is at or below the target data amount (step S10; YES), it is determined whether the audio signal of the next frame has been input to the entropy encoding unit 9 (step S11).

If it is determined in step S11 that the audio signal of the next frame has been input to the entropy encoding unit 9 (step S11; YES), the process returns to step S9, and entropy encoding is performed on that frame (step S9). If it is determined in step S11 that entropy encoding has been completed for all frames input to the entropy encoding unit 9 (step S11; NO), the speech encoding process ends.
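The loop formed by steps S9, S10, and S12 can be sketched as follows. The sketch assumes that a frame is held as a mapping from band identifier to coefficient list, that the energy of a band is the sum of squared coefficients, and that "deleting" a band means dropping it from the mapping before re-encoding. The `encode` callable stands in for the entropy encoder and, like every name here, is hypothetical.

```python
def fit_frame(bands, encode, target_size):
    """Re-encode a frame, dropping the weakest band until it fits.

    bands: dict mapping band id -> list of coefficients (assumed layout)
    encode: callable taking such a dict and returning bytes (stand-in encoder)
    target_size: maximum allowed length of the encoded frame in bytes
    """
    remaining = dict(bands)
    coded = encode(remaining)                      # step S9
    while len(coded) > target_size and remaining:  # step S10; NO
        weakest = min(
            remaining, key=lambda j: sum(c * c for c in remaining[j])
        )
        del remaining[weakest]                     # step S12
        coded = encode(remaining)                  # back to step S9
    return coded, remaining


if __name__ == "__main__":
    frame = {0: [10, -8], 1: [1, 0], 2: [3, 3], 3: [0.5, 0.5]}
    toy_encode = lambda b: bytes(4 * len(b))       # 4 bytes per band, illustration only
    coded, kept = fit_frame(frame, toy_encode, target_size=8)
    print(len(coded), sorted(kept))
```

Dropping the lowest-energy band first removes the data expected to contribute least to the reconstructed signal, so the frame meets the target size with the smallest loss under this energy criterion.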

<Modification of the frequency conversion coefficient division process>
The method of the frequency conversion coefficient division process in step S6 is not limited to the method shown in FIG. 11. A modification of the frequency conversion coefficient division process of FIG. 11 is described below with reference to the flowchart of FIG. 12.

First, the sum of the absolute values of the MDCT coefficients is calculated for each MDCT block of one frame (step S30). If the MDCT coefficients of the i-th MDCT block are M_i = {m_in | n = 0, …, k−1}, the sum SM_i of the absolute values of the MDCT coefficients of the i-th MDCT block is expressed by Equation (17).

$$ SM_i = \sum_{n=0}^{k-1} \lvert m_{in} \rvert \tag{17} $$

Next, among the MDCT blocks of the frame, the MDCT block with the largest sum SM_i of absolute MDCT coefficient values is selected (step S31). The MDCT coefficients of the other MDCT blocks in the same frame are divided by the MDCT coefficients of the selected MDCT block (step S32), and the frequency conversion coefficient division process ends.

For example, in the example of Equation (6), calculating the sum SM_i of the absolute values of the MDCT coefficients for each MDCT block gives SM₀ = 186, SM₁ = 201, SM₂ = 150, and SM₃ = 180. Since SM₁ is the largest of these, M₁ is selected as the MDCT block used as the divisor. The values M'_i (i = 0, 2, 3) after division are then as shown in Equation (18).

Comparing Equation (18) with Equation (7) shows that the MDCT coefficients after division span a narrower range in Equation (18).

As another method for the frequency conversion coefficient division process, the MDCT block containing the MDCT coefficient with the largest absolute value may be used as the divisor. For example, in the example of Equation (6), the largest absolute value of an MDCT coefficient in each MDCT block is 112 for M₀, 120 for M₁, 97 for M₂, and 110 for M₃. Since the largest of these is the 120 in M₁, M₁ is selected as the MDCT block used as the divisor.
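The two selection rules and the division itself can be sketched as follows. The sketch assumes that the division is performed coefficient by coefficient at the same frequency index, and that a coefficient is left undivided when the absolute value of its divisor is below 1. The function names and the sample values are hypothetical and are not the values of Equation (6).

```python
def pick_by_abs_sum(blocks):
    """Index of the block whose sum of absolute coefficients is largest."""
    return max(range(len(blocks)), key=lambda i: sum(abs(c) for c in blocks[i]))


def pick_by_peak(blocks):
    """Index of the block containing the largest absolute coefficient."""
    return max(range(len(blocks)), key=lambda i: max(abs(c) for c in blocks[i]))


def divide_blocks(blocks, ref):
    """Divide every block except `ref` by block `ref`, element by element.

    A coefficient is left unchanged when |divisor| < 1 (assumed guard),
    so the division never enlarges a coefficient.
    """
    divisor = blocks[ref]
    out = []
    for i, block in enumerate(blocks):
        if i == ref:
            out.append(list(block))   # the reference block is kept as is
        else:
            out.append(
                [c / d if abs(d) >= 1 else c for c, d in zip(block, divisor)]
            )
    return out


if __name__ == "__main__":
    frame = [[8, -4, 2], [16, 8, 0.5], [4, 2, 1]]   # illustrative values only
    ref = pick_by_abs_sum(frame)
    print(ref, divide_blocks(frame, ref))
```

In this toy frame both rules pick the same block; on other data the two rules may select different blocks.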

The multiplication unit 24 of the speech decoding apparatus 200 performs a frequency conversion coefficient multiplication process that corresponds to the method used for the frequency conversion coefficient division process (hereinafter simply the "division process") of step S6 in FIG. 10.

When the division process of FIG. 11 is performed in step S6, the multiplication unit 24, as shown in FIG. 13, multiplies the MDCT coefficients of the other MDCT blocks in the same frame by the MDCT coefficients of the pre-designated MDCT block among the MDCT blocks for one frame (step S40). Here, the pre-designated MDCT block is the same MDCT block that was used as the divisor in the division unit 6 of the speech encoding apparatus 100 at the time of encoding.

When the division process of FIG. 12 is performed in step S6, the multiplication unit 24, as shown in FIG. 14, multiplies the MDCT coefficients of the other MDCT blocks by the MDCT coefficients of the MDCT block selected in step S31 among the MDCT blocks for one frame (step S50).

When the division process performed in step S6 uses, as the divisor, the MDCT block containing the MDCT coefficient with the largest absolute value, the multiplication unit 24 multiplies the MDCT coefficients of the other MDCT blocks by the MDCT coefficients of that MDCT block.
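The multiplication on the decoding side mirrors the division on the encoding side. The sketch below assumes element-wise multiplication at the same frequency index, the same guard on divisors with absolute value below 1, and that the index of the reference block is already known to the decoder; how that index is conveyed is not described in this passage. The function name and the sample values are hypothetical.

```python
def multiply_blocks(blocks, ref):
    """Undo an element-wise division by block `ref`.

    Every block except `ref` is multiplied by block `ref`, coefficient by
    coefficient. Coefficients whose reference value has |value| < 1 are
    left unchanged, mirroring the guard assumed on the encoding side.
    """
    reference = blocks[ref]
    out = []
    for i, block in enumerate(blocks):
        if i == ref:
            out.append(list(block))   # the reference block was never divided
        else:
            out.append(
                [c * r if abs(r) >= 1 else c for c, r in zip(block, reference)]
            )
    return out


if __name__ == "__main__":
    divided = [[0.5, -0.5, 2], [16, 8, 0.5], [0.25, 0.25, 1]]  # illustrative values
    print(multiply_blocks(divided, ref=1))
```

Because the reference block is transmitted undivided, the decoder can apply the same guard as the encoder without any extra side information beyond the block index.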

As described above, according to the speech encoding apparatus 100 and the speech decoding apparatus 200 of this embodiment and its modifications, the MDCT coefficients of the other MDCT blocks in a frame are divided by the MDCT coefficients of one MDCT block in that frame. This narrows the range of the absolute values of the MDCT coefficients after division, improves the efficiency of vector quantization, entropy encoding, and the like, and thereby improves sound quality. In addition, performing the division only when the absolute value of the divisor is 1 or more prevents the MDCT coefficients from becoming larger instead, so the efficiency of vector quantization is not adversely affected.

The content described in this embodiment may be modified as appropriate without departing from the spirit of the present invention.
For example, although the above embodiment uses the MDCT as the frequency transform, another frequency transform such as the FFT (Fast Fourier Transform) may be used.

FIG. 1 is a block diagram showing the configuration of a speech encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the division of an input signal into frames.
FIG. 3 is a diagram showing the relationship between one frame and each MDCT block.
FIG. 4 is a diagram showing an example of a Huffman code.
FIG. 5 is a diagram showing an example of encoding by a conventional range coder.
FIG. 6 is a diagram showing the data structure of the occurrence probability table 81 required for the range coder encoding of this embodiment.
FIG. 7 is a diagram showing an example of the default occurrence probability table 81 (FIG. 7(a)) and an example of encoding (FIG. 7(b)).
FIG. 8 is a diagram showing an example of updating the occurrence probability table 81.
FIG. 9 is a block diagram showing the configuration of a speech decoding apparatus that decodes encoded speech.
FIG. 10 is a flowchart showing the speech encoding process executed in the speech encoding apparatus of this embodiment.
FIG. 11 is a flowchart showing the frequency conversion coefficient division process executed in the speech encoding apparatus of this embodiment.
FIG. 12 is a flowchart showing a modification of the frequency conversion coefficient division process of FIG. 11.
FIG. 13 is a flowchart showing the frequency conversion coefficient multiplication process executed in the speech decoding apparatus of this embodiment.
FIG. 14 is a flowchart showing a modification of the frequency conversion coefficient multiplication process of FIG. 13.

Explanation of symbols

1 A/D conversion unit
2 DC removal unit
3 Framing unit
4 Level adjustment unit
5 Frequency conversion unit
6 Division unit
7 Frequency rearrangement unit
8 Vector quantization unit
8a High-frequency VQ table
8b Low-frequency VQ table
9 Entropy encoding unit
10 Rate controller
11 Data deletion unit
81 Occurrence probability table
21 Entropy decoding unit
22 Inverse vector quantization unit
23 Time-order rearrangement unit
24 Multiplication unit
25 Frequency inverse transform unit
26 Level reproduction unit
27 Frame synthesis unit
28 D/A conversion unit
100 Speech encoding apparatus
200 Speech decoding apparatus

Claims

A speech encoding apparatus comprising:
  a framing unit that divides an input audio signal into frames;
  a frequency conversion unit that performs a frequency transform on the audio signal and calculates transform coefficients for each of the blocks constituting one frame;
  a division unit that divides the transform coefficients of the other blocks in the same frame by the transform coefficients of one pre-designated block among the transform coefficients obtained by the frequency conversion unit;
  a vector quantization unit that groups, for each frequency band, the divided transform coefficients of the blocks obtained by the division unit and vector-quantizes them; and
  an encoding unit that encodes the index data obtained by the vector quantization unit using a predetermined encoding method.

A speech encoding apparatus comprising:
  a framing unit that divides an input audio signal into frames;
  a frequency conversion unit that performs a frequency transform on the audio signal and calculates transform coefficients for each of the blocks constituting one frame;
  a division unit that, based on the transform coefficients obtained by the frequency conversion unit, calculates the sum of the absolute values of the transform coefficients for each block constituting the frame, and divides the transform coefficients of the other blocks in the same frame by the transform coefficients of the block for which that sum is largest;
  a vector quantization unit that groups, for each frequency band, the divided transform coefficients of the blocks obtained by the division unit and vector-quantizes them; and
  an encoding unit that encodes the index data obtained by the vector quantization unit using a predetermined encoding method.

A speech encoding apparatus comprising:
  a framing unit that divides an input audio signal into frames;
  a frequency conversion unit that performs a frequency transform on the audio signal and calculates transform coefficients for each of the blocks constituting one frame;
  a division unit that finds the block containing the transform coefficient with the largest absolute value among the transform coefficients calculated for each block constituting the frame, and divides the transform coefficients of the other blocks in the same frame by the transform coefficients of that block;
  a vector quantization unit that groups, for each frequency band, the divided transform coefficients of the blocks obtained by the division unit and vector-quantizes them; and
  an encoding unit that encodes the index data obtained by the vector quantization unit using a predetermined encoding method.

A speech encoding method comprising:
  a framing step of dividing an input audio signal into frames;
  a frequency conversion step of performing a frequency transform on the audio signal and calculating transform coefficients for each of the blocks constituting one frame;
  a division step of dividing the transform coefficients of the other blocks in the same frame by the transform coefficients of one pre-designated block among the transform coefficients obtained in the frequency conversion step;
  a vector quantization step of grouping, for each frequency band, the divided transform coefficients of the blocks obtained in the division step and vector-quantizing them; and
  an encoding step of encoding the index data obtained in the vector quantization step using a predetermined encoding method.

A speech encoding method comprising:
  a framing step of dividing an input audio signal into frames;
  a frequency conversion step of performing a frequency transform on the audio signal and calculating transform coefficients for each of the blocks constituting one frame;
  a division step of calculating, based on the transform coefficients obtained in the frequency conversion step, the sum of the absolute values of the transform coefficients for each block constituting the frame, and dividing the transform coefficients of the other blocks in the same frame by the transform coefficients of the block for which that sum is largest;
  a vector quantization step of grouping, for each frequency band, the divided transform coefficients of the blocks obtained in the division step and vector-quantizing them; and
  an encoding step of encoding the index data obtained in the vector quantization step using a predetermined encoding method.

A speech encoding method comprising:
  a framing step of dividing an input audio signal into frames;
  a frequency conversion step of performing a frequency transform on the audio signal and calculating transform coefficients for each of the blocks constituting one frame;
  a division step of finding the block containing the transform coefficient with the largest absolute value among the transform coefficients calculated for each block constituting the frame, and dividing the transform coefficients of the other blocks in the same frame by the transform coefficients of that block;
  a vector quantization step of grouping, for each frequency band, the divided transform coefficients of the blocks obtained in the division step and vector-quantizing them; and
  an encoding step of encoding the index data obtained in the vector quantization step using a predetermined encoding method.