JP3630082B2

JP3630082B2 - Audio signal encoding method and apparatus

Info

Publication number: JP3630082B2
Application number: JP2000204915A
Authority: JP
Inventors: 定浩安良
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2000-07-06
Filing date: 2000-07-06
Publication date: 2005-03-16
Anticipated expiration: 2020-07-06
Also published as: JP2002026736A

Description

【０００１】
【発明の属する技術分野】
本発明は、オーディオ信号を周波数領域に変換した後に符号化を行なうオーディオ信号符号化方法に係り、特に装置の許容歪み、使用ビット数が許容範囲の適正値に入れるまでの算出推定ループの回数を削減可能なオーディオ信号符号化装置に関する。
【０００２】
【従来の技術】
従来より、オーディオ信号の符号化方法には、例えば適応スペクトル聴感制御エントロピー符号化法（ＡＳＰＥＣ，ＡｄａｐｔｉｖｅＳｐｅｃｔｒａｌＰｅｒｃｅｐｔｕａｌＥｎｔｒｏｐｙＣｏｄｉｎｇ）、ＭＰＥＧ１オーディオ・レイヤ３、ＭＰＥＧ２オーディオＡＡＣ（ＡｄｖａｎｃｅｄＡｕｄｉｏＣｏｄｉｎｇ）がある。
これらは、非線形量子化とハフマン符号化のために２重ループを構成して、量子化歪みと、符号量を制御している。
【０００３】
それぞれのループは、アウターループ、インナーループと呼ばれており、アウターループでは、量子化歪みが、聴覚心理モデルから得られた許容ノイズレベル以下になるように制御し、インナーループでは、量子化を行ない所定のビット数の範囲内に収まるように制御を行なう。
【０００４】
図４には、従来の量子化符号化部におけるイタレーションループ処理を示す。
従来の処理では、所定ビット数に収める処理と量子化歪みを所定量に収める処理とに対して、それぞれループを作ることで実現している。
所定ビット数とは、設定されたビットレートより求められる１オーディオフレームにおいて使用可能なビット数を意味する。
【０００５】
まず、インナーループでは、量子化（ＳＴＥＰ１１Ａ）とハフマン符号化（ＳＴＥＰ１２Ａ）により求められる使用ビット数が所定ビット数に収まっているかの判断を行ない（ＳＴＥＰ１３Ａ）、収まっていない場合には、周波数スペクトルを全ての帯域に対して一様に可変する変数（ｇｌｏｂａｌｇａｉｎ）を調整することで、量子化器のステップサイズに相当するｇｌｏｂａｌｇａｉｎを調整する（ＳＴＥＰ１４Ａ）ことにより、所定のビット数に納める。
【０００６】
インナーループを実行するために必要なｇｌｏｂａｌｇａｉｎの初期値は初期化部で求められる（ＳＴＥＰ１Ａ）。
初期化部では、周波数スペクトル中の最大値を量子化式により量子化した際に、量子化値がハフマン符号化を行なうための制限値を超えないようなｇｌｏｂａｌｇａｉｎを求め、これを初期値とする。
この初期値は、周波数スペクトルの最大値を基準に求めるため、量子化値全体が小さい値となる。
そのため、所定のビット数に対してビットが余る傾向になる。
【０００７】
つぎに、アウターループでは、インナーループで求められた量子化結果を元に逆量子化を行ない、周波数スペクトルのバンド単位で量子化歪みを求める（ＳＴＥＰ３Ａ）。
【０００８】
求めた量子化歪みが聴覚モデル部の信号対マスキング率ＳＭＲ（Ｓｉｇｎａｌ−ｔｏ−Ｍａｓｋ−ｒａｔｉｏ）から求めた許容歪み内に収まっているかを判断し（ＳＴＥＰ４Ａ）、収まっていない場合、そのバンドのｓｃａｌｅｆａｃｔｏｒ（ｓｆｂ）を調整する（ＳＴＥＰ５Ａ）。
量子化歪みが収まっていないバンドが１バンド以上存在する場合には、再びインナーループからやり直す。
【０００９】
収まっていないバンドが残っている場合は、量子化歪みが収まるまでｓｃａｌｅｆａｃｔｏｒ（ｓｆｂ）を何度も調整し（ＳＴＥＰ５Ａ）、インナーループを繰り返す。
よって、これまでのやり方では収束するまで時間が掛かってしまう。
【００１０】
【発明が解決しようとする課題】
前記のインナー、アウターの２重ループ処理では、外側に存在するアウターループが満足されない場合に、再び内側にあるインナーループを呼び出さねばならないため、収束時間の確定が難しい。
この収束時間を速めること、即ち、ループの回数を削減するには、ループを開始する初期値の決定が重要となる。
そこで本発明は、ループの初期値の決定手段を有する符号化装置及びその方法を提供することで、ループ回数を削減することを目的とする。
【００１１】
【課題を解決するための手段】
上記目的を達成するための手段として、前記インナーループ内で用いるパラメータの初期値を推測するブロックを用意して、周波数スペクトルを別の符号化方法で符号化を行ない算出した使用ビット数と所定ビット数を用いてパラメータの初期値を推測し、求められた初期値を用いてインナーループ処理を行なう。
【００１２】
本発明の第１の発明は、入力されたオーディオ信号を時間軸から周波数軸へ変換して周波数スペクトル信号を出力し、前記周波数スペクトル信号を所定のビット数に収まるように符号量を制御するためのパラメータを用いて量子化符号化し、前記量子化符号化された信号をビットストリームとして出力するオーディオ信号符号化方法において、
前記量子化符号化は、予め前記周波数スペクトル信号を符号化して得られた量子化値を２進数で表わしたときのビットの個数に第１の係数を乗算して得られた第１のビット数と、前記ビットの個数を２進数で表わすために予め用意されたビットの個数に第２の係数を乗算して得られた第２のビット数と、前記所定のビット数とから、前記パラメータの初期値を推測して行うことを特徴とするオーディオ信号符号化方法を提供する。
第２の発明は、入力されたオーディオ信号を時間軸から周波数軸へ変換して周波数スペクトル信号を出力する時間−周波数変換手段と、前記入力された前記オーディオ信号から量子化雑音量を制御するための許容雑音量を算出する聴覚モデル手段と、前記時間−周波数変換手段と聴覚モデル手段の各出力が供給され前記周波数スペクトル信号を量子化符号化する量子化符号化手段と、前記量子化符号化手段で量子化符号化された信号をビットストリームに変換して出力するビットストリーム化手段とを備えたオーディオ信号符号化装置において、
前記量子化符号化手段は、
前記時間−周波数変換手段から出力された周波数スペクトル信号を予め符号化して得られた量子化値を２進数で表わしたときのビットの個数に第１の係数を乗算して得られた第１のビット数と、前記ビットの個数を２進数で表わすために予め用意されたビットの個数に第２の係数を乗算して得られた第２のビット数と、所定のビット数とから、前記周波数スペクトルを所定のビット数内に収まるように符号量を制御するためのパラメータの初期値を推測する全帯域レベル推測手段と、前記全帯域レベル推測手段で推測された前記初期値に基づいて、前記周波数スペクトル信号を量子化符号化してビット信号として出力する量子化符号化変換手段と、前記量子化符号化変換手段から出力された前記ビット信号を逆量子化して逆量子化信号を出力する逆量子化手段と、前記符号量を制御するためのパラメータを変更するために、前記量子化符号化手段に変更制御信号を出力する全帯域レベル変更手段と、前記量子化符号化変換手段から出力された前記ビット信号が前記所定のビット数内に収まっているかどうかを判断して、前記ビット信号が前記所定のビット数以内に収まっていない場合には、前記全帯域レベル変更手段に制御信号を出力し、前記ビット信号が前記所定のビット数以内に収まっている場合には、前記ビット信号を前記逆量子化手段に出力する使用ビット数算出判断手段と、前記周波数スペクトル信号のバンドレベルを制御するためのパラメータを変更するために、前記量子化符号化手段に変更制御信号を出力するバンドレベル変更手段と、前記逆量子化手段から出力された前記逆量子化信号が前記聴覚モデル手段で算出された前記許容雑音量以内に収まっているかどうかを判断し、前記ビット信号が前記許容雑音量以内に収まっている場合には、何も出力せず、前記ビット信号が前記許容雑音量以内に収まっていない場合には、前記バンドレベル変更手段に制御信号を出力する量子化歪算出判断手段と、からなることを特徴とするオーディオ信号符号化装置を提供する。
【００１３】
【発明の実施の形態】
本発明のオーディオ符号化装置及びその方法の一実施例について、図と共に以下に説明する。
図１は本発明のオーディオ信号符号化装置の一実施例のブロック構成図を示し、図２には本発明のオーディオ信号符号化装置の量子化符号化部の一実施例のブロック構成図を示す。
【００１４】
図１に示される本発明のオーディオ信号符号化装置の一実施例は、時間−周波数変換部１１、聴覚モデル部１２、量子化符号化部１３、及びビットストリーム化部１４より構成されている。
【００１５】
図２に示されている本発明のオーディオ符号化装置の量子化符号化部１３の一実施例は、量子化、符号化器１３０、全帯域レベル（ｇｌｏｂａｌｇａｉｎ）推測器１３１、使用ビット数算出判断器１３２、全帯域レベル（ｇｌｏｂａｌｇａｉｎ）変更器１３３、量子化歪算出判断器１３４、バンドレベル（ｓｃａｌｅｆａｃｔｏｒ）変更器１３５、及び逆量子化器１３６より構成されている。
【００１６】
まず、図１に示される、入力されたＰＣＭ信号は、時間−周波数変換部１１においてＦＦＴやＭＤＣＴ等を用いて、時間軸から周波数軸への変換が行なわれ、その周波数スペクトルが量子化符号化部１３に供給される。
【００１７】
聴覚モデル部１２では、入力された信号に対して聴覚心理に基づいたマスキングレベルの計算により求められた信号対マスキング率SMR(Signal to Mask Ratio)が量子化符号化部１３に供給される。
【００１８】
量子化符号化部１３においては、各周波数に対するレベルを所定のビット数でかつ、前記ＳＭＲより求められた許容歪み内に量子化歪みが収まるように量子化、符号化を行ない、量子化、符号化された信号をビットストリーム化部１４に出力する。
ビットストリーム化部１４では、量子化符号化部１３より供給された信号をビットストリームとして出力する。
【００１９】
下記の（数１）及び（数２）には、量子化符号化部１３において、量子化、及び逆量子化で使用される各式の一実施例を示した。
【００２０】
【数１】

【００２１】
【数２】

【００２２】
前記（数１）の量子化式において、ｍｄｃｔｌｉｎｅ（ｋ）は、時間−周波数変換部１１より出力される周波数スペクトルを示しており、ｇｌｏｂａｌｇａｉｎは、周波数スペクトル全帯域のレベルを全帯域レベル変更器１３３により変更するものである。
【００２３】
また、ｓｃａｌｅｆａｃｔｏｒ（ｓｆｂ）は、バンド単位で周波数スペクトルのレベルをバンドレベル変更器１３５により変更するものである。
Ｇｌｏｂａｌｇａｉｎは、量子化器のステップサイズに相当し、ｓｃａｌｅｆａｃｔｏｒ（ｓｆｂ）は各スケールファクタバンドの増幅度を決定する。
【００２４】
インナーループでは、量子化とハフマン符号化により求められる使用ビット数が所定ビット数に収まっているかどうかの判断を行ない、収まっていない場合には、周波数スペクトルを全ての帯域に対して一様に可変する全帯域レベル（ｇｌｏｂａｌｇａｉｎ）変更器１３３の変数（ｇｌｏｂａｌｇａｉｎ）を調整することで、所定のビット数に納めるようにする。
【００２５】
そこで、インナーループで求められた量子化結果を元に逆量子化を行ない、バンド単位で量子化歪みを求める。
求めた量子化歪みが聴覚モデル部１２の信号対マスキング率ＳＭＲから求めた許容歪み内に収まっているかどうかを判断し、収まっていない場合、そのバンドのｓｃａｌｅｆａｃｔｏｒ（ｓｆｂ）をバンドレベル（ｓｃａｌｅｆａｃｔｏｒ）変更器１３５により調整する。
量子化歪みが収まっていないバンドが１バンド以上存在する場合には、再びインナーループからやり直す。
【００２６】
図３に本発明のオーディオ信号符号化装置及びその方法におけるイタレーションループを示す。
本発明のイタレーションループは、先に示した図４に対して、その先頭に、所定ビット数に収まるｇｌｏｂａｌｇａｉｎ値を推測するブロックである、全帯域レベル（ｇｌｏｂａｌｇａｉｎ）推測器１３１を追加した形のものになる。
【００２７】
これは、図４の従来の初期化部におけるｇｌｏｂａｌｇａｉｎ算出方法を変更して、最初から最終結果に近いｇｌｏｂａｌｇａｉｎ値を推測することが可能な算出方法としたものである。
【００２８】
このブロックにおいて使用される、global gain算出方法について、以下に説明する。
ここで求めたglobal gain値を初期値として（ＳＴＥＰ1）、つぎのインナーループ（ＳＴＥＰ２）を実行する。
【００２９】
この（ＳＴＥＰ１）の、前記所定ビット数に収まるｇｌｏｂａｌｇａｉｎ値を推測する全帯域レベル推測器１３１において使用される、別の符号化法を用いて使用ビット数を求め、ｇｌｏｂａｌｇａｉｎ値を算出推測する一実施例について、以下に示す。
【００３０】
量子化は前記（数１）より変形すると、下記（数３）となる。
【００３１】
【数３】

【００３２】
量子化された値が、何ビットで表現されるかを求めると、下記（数４）のｎｕｍ−ｂｉｔのように示される。
【００３３】
【数４】

【００３４】
ところで、前記ｎｕｍ−ｂｉｔは入力信号が１６ビットＰＣＭである場合、絶対値を取っているため、正負の符号を取り除いた１５ビットまで取り得る。
【００３５】
つぎに、例えば、それをビットストリームのように、ビット単位の羅列をした場合、サンプルｘ１，ｘ２が何ビットで表現されているかが情報として与えられていないと、取り出せなくなる。
そのため、取り出すためのサイド情報として０〜１５まで表現出来るように４ビットを別に使用する（補助情報としての第２のビット数４×１０２４）。
【００３６】
また、初期値を求めるので、ｓｃａｌｅｆａｃｔｏｒ（ｓｆｂ）＝ａｌｌｚｅｒｏとすると、前記（数４）は、下記（数５）のようになる。
【００３７】
【数５】

【００３８】
さらに、前式をサンプル数１０２４個分求めると、下記の（数６）のようになる。
【００３９】
【数６】

【００４０】
このｔｏｔａｌｎｕｍｂｉｔが前記所定ビット数（ａｖｅｒａｇｅｂｉｔ）であるような、ｇｌｏｂａｌｇａｉｎを下記（数７）より求める。
【００４１】
【数７】

【００４２】
よって、前記（数７）を整理すると下記（数８）が得られる。
【００４３】
【数８】

【００４４】
この（数８）に従って、前記時間−周波数変換部１１より供給される周波数スペクトルを量子化して得られる量子化値に対して、全帯域レベル（ｇｌｏｂａｌｇａｉｎ）推測器１３１は、２の対数を取ることで求められる、量子化値をビット表現するのに必要な第１のビット数（（１／１９２）×（３／４）Σ（ｌｏｇ２（ｘ））と、そのビット表現が何ビット幅であるかを示す補助情報としての第２のビット数（１／１９２×４０９６）と、前記所定ビット数（１／１９２）×（ａｖｅｒａｇｅｂｉｔ）とから、前記全帯域レベル変更手段１３３の初期値（ｇｌｏｂａｌｇａｉｎ）を推測する。
これによって、ループが収束した時に得られる値に近い値が推測される。
【００４５】
第１ブロックのインナーループでは、量子化とハフマン符号化により求められる使用ビット数（ＳＴＥＰ１２）が所定ビット数に収まっているかの判断を、使用ビット数算出判断器１３２により行なう（ＳＴＥＰ１３）。
【００４６】
収まっていない場合には、周波数スペクトルを全ての帯域に対して一様に可変する全帯域レベル変更器１３３の変数（ｇｌｏｂａｌｇａｉｎ）を調整する（ＳＴＥＰ１４）ことにより、所定のビット数に納めるようにする。
【００４７】
第２ブロックのアウターループでは、インナーループ（ＳＴＥＰ２）で求められた量子化結果を元に逆量子化を行ない、バンド単位で量子化歪みを求める（ＳＴＥＰ３）。
【００４８】
求めた量子化歪みが聴覚モデル部１２の信号対マスキング率ＳＭＲから求めた許容歪み内に収まっているかどうかを量子化歪算出判断器１３４により判断し、収まっていない場合（ＳＴＥＰ４）には、そのバンドのｓｃａｌｅｆａｃｔｏｒ（ｓｆｂ）をバンドレベル（ｓｃａｌｅｆａｃｔｏｒ）変更器１３５により調整を行なう（ＳＴＥＰ５）。
【００４９】
量子化歪みが収まっていないバンドが１バンド以上存在する場合（ＳＴＥＰ４）には、再びインナーループ（ＳＴＥＰ２）からやり直す。
【００５０】
インナーループを実行するために必要なｇｌｏｂａｌｇａｉｎの初期値は従来のものでは初期化部で求められる。
この初期化部では、周波数スペクトル中の最大値を量子化式により量子化した際に、量子化値がハフマン符号化を行なうための制限値を超えないようなｇｌｏｂａｌｇａｉｎを求め、これを初期値としている。
【００５１】
よって、この初期値は、周波数スペクトルの最大値を基準に求めるため、量子化値全体が小さい値となる。
【００５２】
そのため、所定のビット数に対してビットが余る傾向になるが、本発明のものは所定のビット数が的確に推定されるので、大幅にループの改善がなされ、ループの回数を減少させることが出来る。
【００５３】
本発明は全帯域レベル（ｇｌｏｂａｌｇａｉｎ）推測器１３１により初期値の推定を最適に近く出来る（ＳＴＥＰ１）ことにより、インナーループからやり直す回数は従来のものと比較すると大幅に削減させることが出来る。
【００５４】
【発明の効果】
本発明のオーディオ符号化装置及びその方法によれば、前記第１のブロック内のパラメータ（ｇｌｏｂａｌｇａｉｎ）の初期値を推測する全帯域レベル推測手段により求められた初期値は、ループが収束した時に得られる値に近い値が推測されるため、第２のブロックにおけるインナーループの回数を大幅に削減させることが出来る。
【図面の簡単な説明】
【図１】本発明のオーディオ符号化装置及びその方法の一実施例のブロック構成を示した図である。
【図２】本発明のオーディオ符号化装置及びその方法の一実施例のブロック構成を示した図である。
【図３】本発明のオーディオ符号化装置及びその方法のイタレーションループのフローを示した図である。
【図４】従来のイタレーションループのフローを示した図である。
【符号の説明】
１１時間−周波数変換部（時間−周波数変換ステップ）
１２聴覚モデル部（聴覚モデルステップ）
１３量子化符号化部（量子化符号化ステップ）
１３０量子化符号化器
１３１全帯域レベル（ｇｌｏｂａｌｇａｉｎ）推測器（全帯域レベル推測手段、ステップ）
１３２使用ビット数算出判断器（使用ビット数算出判断手段、ステップ）
１３３全帯域レベル（ｇｌｏｂａｌｇａｉｎ）変更器（全帯域レベル変更手段、ステップ）
１３４量子化歪算出判断器（量子化歪算出判断手段、ステップ）
１３５バンドレベル（ｓｃａｌｅｆａｃｔｏｒ）変更器（バンドレベル変更手段、ステップ）
１３６逆量子化器
１４ビットストリーム化部（ビットストリーム化ステップ）

[0001]
[Technical field to which the invention belongs]
The present invention relates to an audio signal encoding method that encodes an audio signal after converting it into the frequency domain, and in particular to an audio signal encoding apparatus capable of reducing the number of iterations of the calculation and estimation loop that runs until the allowable distortion and the number of bits used settle at appropriate values within their allowable ranges.
[0002]
[Prior art]
Conventional audio signal encoding methods include, for example, Adaptive Spectral Perceptual Entropy Coding (ASPEC), MPEG-1 Audio Layer 3, and MPEG-2 Audio AAC (Advanced Audio Coding).
These methods form a double loop around nonlinear quantization and Huffman coding in order to control the quantization distortion and the code amount.
[0003]
The two loops are called the outer loop and the inner loop. The outer loop controls the quantization distortion so that it stays at or below the allowable noise level obtained from the psychoacoustic model. The inner loop performs quantization and controls it so that the result fits within a predetermined number of bits.
[0004]
FIG. 4 shows an iteration loop process in the conventional quantization coding unit.
The conventional processing uses a separate loop for each of two tasks: keeping the number of bits within the predetermined number, and keeping the quantization distortion within a predetermined amount.
The predetermined number of bits means the number of bits available for one audio frame, as derived from the configured bit rate.
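As a rough illustration of how a per-frame bit budget follows from a bit rate, the sketch below assumes a frame of 1024 samples (the sample count used later in this description) and ignores multichannel allocation, headers, and any bit reservoir. The function name and the example rates are hypothetical and do not come from the patent.

```python
def bits_per_frame(bit_rate_bps: int, sample_rate_hz: int,
                   frame_samples: int = 1024) -> int:
    """Average number of bits available for one audio frame.

    Assumption: the budget is simply bit rate x frame duration, rounded down.
    """
    return bit_rate_bps * frame_samples // sample_rate_hz


if __name__ == "__main__":
    # Illustrative values only: 128 kbit/s at a 48 kHz sampling rate.
    print(bits_per_frame(128_000, 48_000))
```

A real encoder would usually carry unused bits over between frames, so this figure is only an average target.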
[0005]
First, the inner loop determines (STEP 13A) whether the number of used bits obtained by quantization (STEP 11A) and Huffman coding (STEP 12A) fits within the predetermined number of bits. If it does not, the global gain, a variable that scales the frequency spectrum uniformly across all bands and corresponds to the step size of the quantizer, is adjusted (STEP 14A) so that the bit count fits within the predetermined number.
[0006]
The initial value of global gain necessary for executing the inner loop is obtained by the initialization unit (STEP 1A).
The initialization unit finds a global gain such that, when the maximum value in the frequency spectrum is quantized by the quantization formula, the quantized value does not exceed the limit value imposed by Huffman coding, and uses this global gain as the initial value.
Because this initial value is derived from the maximum value of the frequency spectrum, the quantized values are small overall.
As a result, bits tend to be left over relative to the predetermined number of bits.
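The conventional initialization described above can be sketched as follows. This is a minimal sketch under two assumptions that are not stated in this text: the quantizer has the AAC-style power-law form `(|x| * 2**(-global_gain/4))**0.75` with all scale factors at zero, and the Huffman limit is 8191. Both the function name and the `limit` default are hypothetical placeholders.

```python
import math


def conventional_initial_gain(spectrum, limit=8191):
    """Smallest integer global gain whose quantized peak stays within `limit`.

    Assumed quantizer: (|x| * 2**(-gain / 4)) ** 0.75, rounding ignored.
    """
    peak = max(abs(x) for x in spectrum)
    if peak == 0:
        return 0  # silent frame: any gain works, pick zero
    # (peak * 2**(-gain/4))**0.75 <= limit
    #   <=>  gain >= 4 * log2(peak / limit**(4/3))
    return math.ceil(4.0 * math.log2(peak / limit ** (4.0 / 3.0)))
```

Because only the largest spectral line is considered, every other line is quantized more coarsely than the bit budget would require, which is the surplus of bits noted above.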
[0007]
Next, in the outer loop, inverse quantization is performed based on the quantization result obtained in the inner loop, and quantization distortion is obtained in band units of the frequency spectrum (STEP 3A).
[0008]
It is determined whether the obtained quantization distortion is within the allowable distortion derived from the signal-to-mask ratio (SMR) of the auditory model unit (STEP 4A). If it is not, the scale factor (sfb) of that band is adjusted (STEP 5A).
If the quantization distortion exceeds the allowable distortion in one or more bands, the process starts again from the inner loop.
[0009]
If any such band remains, the scale factor (sfb) is adjusted repeatedly (STEP 5A) and the inner loop is rerun until the quantization distortion fits.
The conventional method therefore takes a long time to converge.
[0010]
[Problems to be solved by the invention]
In the inner/outer double-loop processing described above, whenever the condition of the enclosing outer loop is not satisfied, the inner loop nested inside it must be invoked again, which makes the convergence time difficult to determine.
In order to speed up the convergence time, that is, to reduce the number of loops, it is important to determine an initial value for starting the loop.
Therefore, an object of the present invention is to reduce the number of loops by providing an encoding apparatus having a means for determining an initial value of a loop and a method therefor.
[0011]
[Means for Solving the Problems]
As a means of achieving the above object, a block that estimates the initial value of the parameter used in the inner loop is provided. The initial value of the parameter is estimated from the predetermined number of bits and from the number of used bits calculated by encoding the frequency spectrum with a different encoding method, and the inner loop processing is performed using the initial value thus obtained.
[0012]
A first aspect of the present invention provides an audio signal encoding method in which an input audio signal is converted from the time axis to the frequency axis to output a frequency spectrum signal, the frequency spectrum signal is quantized and encoded using a parameter for controlling the code amount so that it fits within a predetermined number of bits, and the quantized and encoded signal is output as a bit stream,
wherein the quantization and encoding is performed by estimating an initial value of the parameter from: a first number of bits obtained by multiplying, by a first coefficient, the number of bits in the binary representation of the quantized values obtained by encoding the frequency spectrum signal in advance; a second number of bits obtained by multiplying, by a second coefficient, the number of bits provided in advance for representing that number of bits in binary; and the predetermined number of bits.
A second aspect of the present invention provides an audio signal encoding apparatus comprising: time-frequency conversion means for converting an input audio signal from the time axis to the frequency axis to output a frequency spectrum signal; auditory model means for calculating, from the input audio signal, an allowable noise amount for controlling the quantization noise amount; quantization encoding means, supplied with the outputs of the time-frequency conversion means and the auditory model means, for quantizing and encoding the frequency spectrum signal; and bit stream converting means for converting the signal quantized and encoded by the quantization encoding means into a bit stream and outputting the bit stream,
wherein the quantization encoding means comprises:
all-band level estimating means for estimating an initial value of a parameter for controlling the code amount so that the frequency spectrum fits within a predetermined number of bits, the estimate being made from a first number of bits obtained by multiplying, by a first coefficient, the number of bits in the binary representation of the quantized values obtained by encoding in advance the frequency spectrum signal output from the time-frequency conversion means, a second number of bits obtained by multiplying, by a second coefficient, the number of bits provided in advance for representing that number of bits in binary, and the predetermined number of bits; quantization encoding conversion means for quantizing and encoding the frequency spectrum signal based on the initial value estimated by the all-band level estimating means and outputting the result as a bit signal; inverse quantization means for inversely quantizing the bit signal output from the quantization encoding conversion means and outputting an inverse quantized signal; all-band level changing means for outputting a change control signal to the quantization encoding means in order to change the parameter for controlling the code amount; used bit number calculation determining means for determining whether the bit signal output from the quantization encoding conversion means fits within the predetermined number of bits, outputting a control signal to the all-band level changing means when it does not, and outputting the bit signal to the inverse quantization means when it does; band level changing means for outputting a change control signal to the quantization encoding means in order to change a parameter for controlling the band level of the frequency spectrum signal; and quantization distortion calculation determining means for determining whether the inverse quantized signal output from the inverse quantization means is within the allowable noise amount calculated by the auditory model means, outputting nothing when the bit signal is within the allowable noise amount, and outputting a control signal to the band level changing means when the bit signal is not within the allowable noise amount.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of an audio encoding apparatus and method according to the present invention will be described below with reference to the drawings.
FIG. 1 shows a block configuration diagram of an embodiment of an audio signal encoding device of the present invention, and FIG. 2 shows a block configuration diagram of an embodiment of a quantization encoding unit of the audio signal encoding device of the present invention.
[0014]
An embodiment of the audio signal encoding apparatus of the present invention shown in FIG. 1 includes a time-frequency conversion unit 11, an auditory model unit 12, a quantization encoding unit 13, and a bitstreaming unit 14.
[0015]
An embodiment of the quantization encoding unit 13 of the audio encoding apparatus of the present invention shown in FIG. 2 includes a quantization encoder 130, an all-band level (global gain) estimator 131, a used bit number calculation and determination unit 132, an all-band level (global gain) changer 133, a quantization distortion calculation and determination unit 134, a band level (scale factor) changer 135, and an inverse quantizer 136.
[0016]
First, the input PCM signal shown in FIG. 1 is converted from the time axis to the frequency axis in the time-frequency conversion unit 11 using an FFT, MDCT, or the like, and the resulting frequency spectrum is supplied to the quantization encoding unit 13.
[0017]
In the auditory model unit 12, the signal-to-mask ratio (SMR), obtained for the input signal by calculating the masking level based on psychoacoustics, is supplied to the quantization encoding unit 13.
[0018]
The quantization encoding unit 13 quantizes and encodes the level at each frequency so that the result fits within the predetermined number of bits and the quantization distortion falls within the allowable distortion obtained from the SMR, and outputs the quantized and encoded signal to the bitstreaming unit 14.
The bitstreaming unit 14 outputs the signal supplied from the quantization encoding unit 13 as a bit stream.
[0019]
In the following (Equation 1) and (Equation 2), an example of each equation used in quantization and inverse quantization in the quantization encoding unit 13 is shown.
[0020]
[Equation 1]

[0021]
[Equation 2]

[0022]
In the quantization formula of (Equation 1), mdct line (k) denotes the frequency spectrum output from the time-frequency conversion unit 11, and global gain is the parameter through which the all-band level changer 133 changes the level of the entire frequency spectrum.
[0023]
The scale factor (sfb) is the parameter through which the band level changer 135 changes the level of the frequency spectrum band by band.
Global gain corresponds to the step size of the quantizer, and scale factor (sfb) determines the amplification factor of each scale factor band.
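The images for (Equation 1) and (Equation 2) are not reproduced in this text, so the following is only a sketch of a quantizer and inverse quantizer consistent with the description: a 3/4 power law, a global gain that acts as the step size, and a scale factor that amplifies a band. The exact form, the sign convention of the exponent, and the round-to-nearest offset are assumptions borrowed from AAC-style encoders, not taken from the patent.

```python
def quantize(x, global_gain, scalefactor=0):
    """Assumed power-law quantizer: larger global_gain gives a coarser step."""
    scaled = abs(x) * 2.0 ** ((scalefactor - global_gain) / 4.0)
    q = int(scaled ** 0.75 + 0.5)  # round-to-nearest is an assumption
    return -q if x < 0 else q


def dequantize(q, global_gain, scalefactor=0):
    """Assumed inverse: undo the 3/4 power, then undo the gain scaling."""
    mag = abs(q) ** (4.0 / 3.0) * 2.0 ** ((global_gain - scalefactor) / 4.0)
    return -mag if q < 0 else mag


if __name__ == "__main__":
    q = quantize(1000.0, global_gain=0)
    print(q, dequantize(q, global_gain=0))
```

Raising `global_gain` by 4 halves the scaled magnitude before the power law, while raising `scalefactor` by 4 doubles it for that band only.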
[0024]
In the inner loop, it is determined whether the number of used bits obtained by quantization and Huffman coding fits within the predetermined number of bits. If it does not, the variable (global gain) of the all-band level (global gain) changer 133, which scales the frequency spectrum uniformly across all bands, is adjusted so that the bit count fits within the predetermined number.
[0025]
Next, inverse quantization is performed on the quantization result obtained in the inner loop, and the quantization distortion is obtained for each band.
It is determined whether the obtained quantization distortion is within the allowable distortion derived from the signal-to-mask ratio (SMR) of the auditory model unit 12. If it is not, the scale factor (sfb) of that band is adjusted by the band level (scale factor) changer 135.
If the quantization distortion exceeds the allowable distortion in one or more bands, the process starts again from the inner loop.
[0026]
FIG. 3 shows an iteration loop in the audio signal encoding apparatus and method according to the present invention.
Compared with FIG. 4 shown above, the iteration loop of the present invention adds at its head the all-band level (global gain) estimator 131, a block that estimates a global gain value fitting within the predetermined number of bits.
[0027]
This replaces the global gain calculation method of the conventional initialization unit in FIG. 4 with a calculation method that can estimate, from the start, a global gain value close to the final result.
[0028]
The global gain calculation method used in this block will be described below.
The global gain value obtained here is set as the initial value (STEP 1), and the next inner loop (STEP 2) is executed.
[0029]
The following describes one embodiment of this (STEP 1), in which the all-band level estimator 131, which estimates a global gain value fitting within the predetermined number of bits, obtains the number of used bits with a different encoding method and from it calculates an estimate of the global gain value.
[0030]
Rearranging the quantization formula (Equation 1) yields (Equation 3) below.
[0031]
[Equation 3]

[0032]
The number of bits needed to represent a quantized value is given by num-bit in (Equation 4) below.
[0033]
[Equation 4]

[0034]
Note that when the input signal is 16-bit PCM, num-bit can be at most 15 bits, because the absolute value is taken and the sign bit is therefore excluded.
[0035]
Next, if these values are simply laid out bit by bit, as in a bit stream, the samples x1 and x2 cannot be extracted unless the number of bits used to represent each of them is also given as information.
Therefore, 4 bits per sample, enough to express 0 to 15, are used separately as side information for extraction (a second number of bits of 4 × 1024 as auxiliary information).
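The bit count of this simple alternative code can be sketched as below: each quantized magnitude costs as many bits as its binary representation needs, plus a fixed 4-bit width field. The function name is hypothetical, and the integer `bit_length()` is used here in place of the real-valued base-2 logarithm that the description works with, so the two counts agree only approximately.

```python
def simple_code_bits(quantized, side_bits=4):
    """Total bits for magnitudes plus a per-sample width field.

    Assumption: sign bits are not counted, matching the absolute-value
    treatment in the text; zero needs no magnitude bits.
    """
    return sum(abs(q).bit_length() + side_bits for q in quantized)


if __name__ == "__main__":
    # 0 needs 0+4 bits, 1 needs 1+4, -5 needs 3+4, 32767 needs 15+4.
    print(simple_code_bits([0, 1, -5, 32767]))
    # A frame of 1024 zeros still costs the 4 x 1024 side-information bits.
    print(simple_code_bits([0] * 1024))
```

The 4-bit field is just wide enough for the 0 to 15 range of widths that a 15-bit magnitude can require.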
[0036]
Further, because an initial value is being sought, setting scalefactor (sfb) to zero for all bands reduces (Equation 4) to (Equation 5) below.
[0037]
[Equation 5]

[0038]
Summing the previous equation over the 1024 samples gives (Equation 6) below.
[0039]
[Equation 6]

[0040]
The global gain for which this total num bit equals the predetermined number of bits (average bit) is obtained from (Equation 7) below.
[0041]
[Equation 7]

[0042]
Therefore, rearranging (Equation 7) yields (Equation 8) below.
[0043]
[Equation 8]
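The equation images are not reproduced in this text. The derivation below is a reconstruction, not the patent's own typesetting: it assumes a quantizer of the form shown in its first line (rounding ignored) and follows the steps described in the surrounding paragraphs. It is offered because it reproduces exactly the three coefficients quoted in the next paragraph, namely (1/192) × (3/4), (1/192) × 4096, and (1/192) applied to average bit. The sign of each term follows from the assumed exponent convention.

```latex
\begin{align*}
x_{\mathrm{quant}}(k) &= \Bigl(\lvert \mathrm{mdct\_line}(k)\rvert \cdot
   2^{\frac{1}{4}\left(\mathrm{scalefactor}(sfb)-\mathrm{global\_gain}\right)}\Bigr)^{3/4} \\
\mathrm{num\_bit}(k) &= \log_2 x_{\mathrm{quant}}(k)
   = \tfrac{3}{4}\log_2\lvert \mathrm{mdct\_line}(k)\rvert
   + \tfrac{3}{16}\bigl(\mathrm{scalefactor}(sfb)-\mathrm{global\_gain}\bigr) \\
\mathrm{num\_bit}(k) &= \tfrac{3}{4}\log_2\lvert \mathrm{mdct\_line}(k)\rvert
   - \tfrac{3}{16}\,\mathrm{global\_gain}
   \qquad \bigl(\mathrm{scalefactor}(sfb)=0\bigr) \\
\mathrm{total\_num\_bit} &= \sum_{k=0}^{1023}\mathrm{num\_bit}(k) + 4\cdot 1024
   = \tfrac{3}{4}\sum_{k=0}^{1023}\log_2\lvert \mathrm{mdct\_line}(k)\rvert
   - 192\,\mathrm{global\_gain} + 4096 \\
\mathrm{average\_bit} &= \tfrac{3}{4}\sum_{k=0}^{1023}\log_2\lvert \mathrm{mdct\_line}(k)\rvert
   - 192\,\mathrm{global\_gain} + 4096 \\
\mathrm{global\_gain} &= \tfrac{1}{192}\Bigl(\tfrac{3}{4}\sum_{k=0}^{1023}
   \log_2\lvert \mathrm{mdct\_line}(k)\rvert + 4096 - \mathrm{average\_bit}\Bigr)
\end{align*}
```

The factor 192 arises as (3/16) × 1024, and 4096 is the 4 × 1024 bits of side information.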

[0044]
According to (Equation 8), the all-band level (global gain) estimator 131 estimates the initial value (global gain) of the all-band level changer 133 from three terms: the first number of bits, (1/192) × (3/4) Σ log2(x), which is the number of bits needed to express the quantized values of the frequency spectrum supplied from the time-frequency conversion unit 11 and is found by taking the base-2 logarithm; the second number of bits, (1/192) × 4096, which is the auxiliary information indicating how many bits wide each bit representation is; and the predetermined number of bits, (1/192) × (average bit).
As a result, a value close to the value obtained when the loop converges is estimated.
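A minimal sketch of such an estimator is given below. The three terms and the coefficients 3/4, 4 bits per sample, and 1/192 (for 1024 samples) come from the paragraph above; the signs of the terms, the handling of zero-valued spectral lines, and all names are assumptions made for illustration.

```python
import math


def estimate_global_gain(spectrum, average_bit, side_bits_per_sample=4):
    """Closed-form initial global gain from a log2-based bit estimate.

    Assumed signs: gain = (0.75 * sum(log2|x|) + side bits - budget) / coeff,
    where coeff = (3/16) * N, which is 192 for N = 1024.
    """
    n = len(spectrum)
    # log2(0) is undefined; skipping zero lines is an assumption.
    log_sum = sum(math.log2(abs(x)) for x in spectrum if x != 0)
    coeff = 3.0 * n / 16.0
    return (0.75 * log_sum + side_bits_per_sample * n - average_bit) / coeff


if __name__ == "__main__":
    # Toy frame: 1024 lines of magnitude 4096 and a budget of 7 bits per line.
    print(estimate_global_gain([4096.0] * 1024, average_bit=7168))
```

For this toy frame the estimate is 32: each line then needs about 3 magnitude bits plus 4 side-information bits, which matches the 7168-bit budget. A real encoder would round the result to an integer before entering the inner loop.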
[0045]
In the inner loop, which is the first block, the used bit number calculation and determination unit 132 determines (STEP 13) whether the number of used bits obtained by quantization and Huffman coding (STEP 12) fits within the predetermined number of bits.
[0046]
If it does not, the variable (global gain) of the all-band level changer 133, which scales the frequency spectrum uniformly across all bands, is adjusted (STEP 14) so that the bit count fits within the predetermined number of bits.
[0047]
In the outer loop of the second block, inverse quantization is performed based on the quantization result obtained in the inner loop (STEP 2), and quantization distortion is obtained in band units (STEP 3).
[0048]
The quantization distortion calculation and determination unit 134 determines whether the obtained quantization distortion is within the allowable distortion derived from the signal-to-mask ratio (SMR) of the auditory model unit 12. If it is not (STEP 4), the scale factor (sfb) of that band is adjusted by the band level (scale factor) changer 135 (STEP 5).
[0049]
If the quantization distortion exceeds the allowable distortion in one or more bands (STEP 4), the process starts again from the inner loop (STEP 2).
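The control flow of STEP 2 through STEP 5 can be sketched as follows, with the initial global gain passed in so that different starting points can be compared. Everything here is a simplified stand-in and not the patent's implementation: the quantizer form is assumed, Huffman coding is replaced by a magnitude-bits-plus-4 count, distortion is a per-band mean squared error, the allowed distortion values stand in for SMR-derived thresholds, and `max_outer` is a hypothetical guard against endless looping.

```python
def quantize(x, gain, sf=0):
    """Assumed power-law quantizer with round-to-nearest."""
    q = int((abs(x) * 2.0 ** ((sf - gain) / 4.0)) ** 0.75 + 0.5)
    return -q if x < 0 else q


def dequantize(q, gain, sf=0):
    """Inverse of quantize() up to rounding error."""
    mag = abs(q) ** (4.0 / 3.0) * 2.0 ** ((gain - sf) / 4.0)
    return -mag if q < 0 else mag


def count_bits(quantized, side_bits=4):
    """Stand-in for Huffman coding: magnitude bits plus a width field."""
    return sum(abs(q).bit_length() + side_bits for q in quantized)


def iteration_loop(spectrum, bands, allowed, budget, initial_gain, max_outer=32):
    """Return (gain, scalefactors, inner_passes) for one frame.

    `bands` must be contiguous (lo, hi) ranges covering the whole spectrum.
    """
    if budget < 4 * len(spectrum):
        raise ValueError("budget is below the fixed side-information cost")
    gain = initial_gain
    sf = [0] * len(bands)
    inner_passes = 0
    for _ in range(max_outer):
        # Inner loop (STEP 2): raise global gain until the frame fits.
        while True:
            inner_passes += 1
            q = [quantize(spectrum[k], gain, sf[b])
                 for b, (lo, hi) in enumerate(bands) for k in range(lo, hi)]
            if count_bits(q) <= budget:
                break
            gain += 1
        # Outer loop (STEP 3 to STEP 5): per-band distortion check.
        noisy = []
        for b, (lo, hi) in enumerate(bands):
            err = sum((spectrum[k] - dequantize(q[k], gain, sf[b])) ** 2
                      for k in range(lo, hi)) / (hi - lo)
            if err > allowed[b]:
                noisy.append(b)
        if not noisy:
            break
        for b in noisy:
            sf[b] += 1  # amplify the band, then redo the inner loop
    return gain, sf, inner_passes


if __name__ == "__main__":
    frame = [4096.0] * 1024
    far = iteration_loop(frame, [(0, 1024)], [1e12], 7168, initial_gain=0)
    near = iteration_loop(frame, [(0, 1024)], [1e12], 7168, initial_gain=32)
    print(far, near)
```

In this toy frame both runs end at the same global gain, but the run started near the converged value needs far fewer inner-loop passes than the run started at zero, which is the effect the estimator is meant to achieve.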
[0050]
In the conventional method, the initial value of the global gain needed to run the inner loop is obtained by the initialization unit.
This initialization unit finds a global gain such that, when the maximum value in the frequency spectrum is quantized by the quantization formula, the quantized value does not exceed the limit value imposed by Huffman coding, and uses this global gain as the initial value.
[0051]
Because this initial value is derived from the maximum value of the frequency spectrum, the quantized values are small overall.
[0052]
As a result, bits tend to be left over relative to the predetermined number of bits. In the present invention, by contrast, a global gain that fits the predetermined number of bits is estimated accurately, so the looping is greatly improved and the number of loop iterations can be reduced.
[0053]
In the present invention, because the all-band level (global gain) estimator 131 can estimate a nearly optimal initial value (STEP 1), the number of times the process must start again from the inner loop is greatly reduced compared with the conventional method.
[0054]
[Effects of the invention]
According to the audio encoding apparatus and method of the present invention, the initial value of the parameter (global gain) in the first block, obtained by the all-band level estimating means, is close to the value obtained when the loop converges, so the number of inner loop runs in the second block can be greatly reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an embodiment of an audio encoding apparatus and method according to the present invention.
FIG. 2 is a block diagram showing an audio encoding apparatus and method according to an embodiment of the present invention.
FIG. 3 is a diagram showing an iteration loop flow of the audio encoding apparatus and method according to the present invention.
FIG. 4 is a diagram showing a flow of a conventional iteration loop.
[Explanation of symbols]
11 Time-frequency conversion unit (time-frequency conversion step)
12 Auditory model unit (auditory model step)
13 Quantization encoding unit (quantization encoding step)
130 Quantization encoder
131 All-band level (global gain) estimator (all-band level estimating means, step)
132 Used bit number calculation and determination unit (used bit number calculation determining means, step)
133 All-band level (global gain) changer (all-band level changing means, step)
134 Quantization distortion calculation and determination unit (quantization distortion calculation determining means, step)
135 Band level (scale factor) changer (band level changing means, step)
136 Inverse quantizer
14 Bitstreaming unit (bitstreaming step)

Claims

An audio signal encoding method in which an input audio signal is converted from the time axis to the frequency axis to output a frequency spectrum signal, the frequency spectrum signal is quantized and encoded using a parameter for controlling the code amount so that it fits within a predetermined number of bits, and the quantized and encoded signal is output as a bit stream,
wherein the quantization and encoding is performed by estimating an initial value of the parameter from: a first number of bits obtained by multiplying, by a first coefficient, the number of bits in the binary representation of the quantized values obtained by encoding the frequency spectrum signal in advance; a second number of bits obtained by multiplying, by a second coefficient, the number of bits provided in advance for representing that number of bits in binary; and the predetermined number of bits.

An audio signal encoding apparatus comprising: time-frequency conversion means for converting an input audio signal from the time axis to the frequency axis to output a frequency spectrum signal; auditory model means for calculating, from the input audio signal, an allowable noise amount for controlling the quantization noise amount; quantization encoding means, supplied with the outputs of the time-frequency conversion means and the auditory model means, for quantizing and encoding the frequency spectrum signal; and bit stream converting means for converting the signal quantized and encoded by the quantization encoding means into a bit stream and outputting the bit stream,
wherein the quantization encoding means comprises:
all-band level estimating means for estimating an initial value of a parameter for controlling the code amount so that the frequency spectrum fits within a predetermined number of bits, the estimate being made from a first number of bits obtained by multiplying, by a first coefficient, the number of bits in the binary representation of the quantized values obtained by encoding in advance the frequency spectrum signal output from the time-frequency conversion means, a second number of bits obtained by multiplying, by a second coefficient, the number of bits provided in advance for representing that number of bits in binary, and the predetermined number of bits;
quantization encoding conversion means for quantizing and encoding the frequency spectrum signal based on the initial value estimated by the all-band level estimating means and outputting the result as a bit signal;
inverse quantization means for inversely quantizing the bit signal output from the quantization encoding conversion means and outputting an inverse quantized signal;
all-band level changing means for outputting a change control signal to the quantization encoding means in order to change the parameter for controlling the code amount;
used bit number calculation determining means for determining whether the bit signal output from the quantization encoding conversion means fits within the predetermined number of bits, outputting a control signal to the all-band level changing means when the bit signal does not fit within the predetermined number of bits, and outputting the bit signal to the inverse quantization means when it does;
band level changing means for outputting a change control signal to the quantization encoding means in order to change a parameter for controlling the band level of the frequency spectrum signal; and
quantization distortion calculation determining means for determining whether the inverse quantized signal output from the inverse quantization means is within the allowable noise amount calculated by the auditory model means, outputting nothing when the bit signal is within the allowable noise amount, and outputting a control signal to the band level changing means when the bit signal is not within the allowable noise amount.