JP4313993B2

JP4313993B2 - Audio decoding apparatus and audio decoding method

Info

Publication number: JP4313993B2
Application number: JP2002211443A
Authority: JP
Inventors: 峰生津島; 直也田中; 武志則松; セン・チョンコク; ハン・クアキム; ホン・ネオスア; 俊之野村; 修嶋田; 雄一郎高見沢; 芹沢　　昌宏
Original assignee: Panasonic Corp; NEC Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; NEC Corp; Panasonic Holdings Corp
Priority date: 2002-07-19
Filing date: 2002-07-19
Publication date: 2009-08-12
Anticipated expiration: 2022-07-19
Also published as: JP2004053940A

Abstract

PROBLEM TO BE SOLVED: To decode a high-quality wideband audio signal at a low bit rate.

SOLUTION: Low-band component information separated by a bitstream separating means is decoded into a low-band time signal representing the low-band component. The low-band time signal is divided into a plurality of low-band subband signals. A band expanding means generates high-band subband signals from the low-band subband signals and high-band component information. A synthesis subband filter combines the low-band and high-band subband signals into a time-domain composite signal. According to sine-wave addition information, a time-domain signal representing a sine wave of a desired frequency and amplitude characteristic is generated and combined with the time-domain composite signal. With this configuration, a decoded signal more faithful to the input signal can be obtained.

COPYRIGHT: (C)2004, JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、少ない情報量の補助情報を付加することによって、狭帯域なオーディオ信号から広帯域なオーディオ信号を生成する帯域拡張システムに関わり、当該システムにおける再生信号の高品質化と低演算量化のための技術に関する。
【０００２】
【従来の技術】
一般的な音響信号を、少ない情報量で符号化でき、かつ高品質な再生信号を得られる技術として、帯域分割符号化を利用する方法が広く知られている。これは、入力された音響信号を、帯域分割フィルタを用いて複数の周波数帯域の信号に分割するか、もしくはフーリエ変換等の時間−周波数変換を用いて周波数軸の信号に変換した後、周波軸上で複数の帯域に分割した上で、分割された各帯域に適切な符号化ビット割当を行うことにより、実現されるものである。帯域分割符号化を用いることにより、少ない情報量の符号から高品質な再生信号を得られる理由は、符号化段階において人間の聴覚特性に基づいた処理を行うことができることにある。一般に、人間の聴覚は、１０ｋＨｚ程度以上の高い周波数の音に対しては感度が下がり、レベルの低い音は感知されにくくなる。また、周波数マスキングと呼ばれる現象も良く知られており、ある特定の周波数帯域に高いレベルの音が存在する場合、その周辺帯域のレベルの低い音は感知されにくくなる。このような、聴覚的な特性によって感知されにくい部分については、ビット割当を行って符号化を行っても再生信号の品質にはほとんど影響を及ぼさず、符号化の意味をなさない。逆に、聴覚的特性を考慮しないままこの部分に割り当てられていた符号化ビットを、他の聴覚的に敏感な部分に割当て直すことによって、聴覚的に敏感な部分を詳細に符号化し、再生信号の品質を向上することができる。このような帯域分割を利用した符号化の代表例としては、ＩＳＯ国際標準規格ＭＰＥＧ−４ＡＡＣ（ＩＳＯ／ＩＥＣ１４４９６−３）があり、９６ｋｂｐｓ程度のビットレートにおいて、１６ｋＨｚ以上の広帯域のステレオ信号を高品質に符号化することが可能である。
【０００３】
しかしながら、ビットレートを例えば４８ｋｂｐｓ程度に低下させた場合、高品質に符号化できる帯域は１０ｋＨｚ程度以下となり、聴感的にはこもった感じの音となる。このような帯域制限による音質劣化を補償する方法としては、例えば、ＥＴＳＩ（ＥｕｒｏｐｅａｎＴｅｌｅｃｏｍｍｕｎｉｃａｔｉｏｎｓＳｔａｎｄａｒｄｓＩｎｓｔｉｔｕｔｅ）が勧告する「ＤｉｇｉｔａｌＲａｄｉｏＭｏｎｄｉａｌｅ（ＤＲＭ）；ＳｙｓｔｅｍＳｐｅｃｉｆｉｃａｔｉｏｎ」（ＥＴＳＩＴＳ１０１９８０）に記載される、ＳＢＲ（ＳｐｅｃｔｒａｌＢａｎｄＲｅｐｌｉｃａｔｉｏｎ）と呼ばれる技術がある。
【０００４】
図７はＳＢＲによる帯域拡張を行うデコーダの一例を示す図である。以降、図を参照しながら、その動作を説明する。入力ビットストリーム７０６は、ビットストリーム分離手段７０１において、低域成分情報７０７、高域成分情報７０８、および正弦波付加情報７０９に分離される。低域成分情報７０７は、例えばＭＰＥＧ−４ＡＡＣ等の符号化方式を用いて符号化された情報であり、低域復号手段７０２において復号され、低域成分を表す時間信号が生成される。生成された低域成分を表す時間信号は、分析フィルタバンク７０３において複数（Ｍ個）のサブバンドに分割され、帯域拡張手段７０４に入力される。帯域拡張手段７０４は、低域成分を表す低域サブバンド信号を高域のサブバンドにコピーすることによって、帯域制限によって失われた高域成分を補償する。ここで、帯域拡張手段７０４に入力される高域成分情報７０８には、補償される高域サブバンドに対するゲイン情報が含まれており、生成された高域サブバンドごとにゲインが調整される。また、正弦波付加情報７０９にしたがって、各高域サブバンドに対して、ゲイン制御された正弦波が加算される。帯域拡張手段７０４において生成された高域サブバンド信号は、低域サブバンド信号と共に合成フィルタバンク７０５に入力されて帯域合成され、出力信号７１０が生成される。このとき、合成フィルタバンク側のサブバンド数は、分析フィルタ側のサブバンド数と一致していなくても良い。例えば、図７においてＮ＝２Ｍの関係が成り立つとすれば、出力信号のサンプリング周波数は、分析フィルタバンクに入力される時間信号のサンプリング周波数に対して２倍となる。
【０００５】
上記の構成では、高域成分情報７０８もしくは正弦波付加情報７０９に含まれる情報は、ゲイン制御に関わる情報のみであるので、スペクトル情報を含む低域成分情報７０７と比較して非常に少ない情報量しか必要としない。したがって、低ビットレートにおいて広帯域の信号を符号化するのに適した方法である。
【０００６】
【発明が解決しようとする課題】
しかしながら、上記の構成においては、特に正弦波を付加する場合においてサブバンドフィルタに起因する制限を受ける。たとえば、高域サブバンドに正弦波を付加する際に、隣接するサブバンドフィルターとの境界に相当するような周波数に正弦波を付加する場合は、隣接する２つのサブバンドに対して、正弦波成分を追加しなくてはならず、かつ、その正弦波成分に対する振幅値や位相は相互に影響するので、それを導出するのに多くの計算量が必要となることが予想される。よって、実際には、各サブバンドの中心周波数に正弦波を注入することが拘束条件となってしまい、入力信号に忠実な再生音を得ることが困難である。また、本発明の対象とする帯域拡張システムにおいては、正弦波を付加する高域成分の信号は、その位相成分を詳細に符号化して情報転送すると、多くの情報量を必要とするので、情報量削減の意味から位相成分は一般的に符号化されない。したがって、復号化側では位相成分は未知であり、そのエネルギーのみから復号処理をおこなう必要がある。その場合、複素数係数のサブバンドフィルターを用いて信号の位相を考慮するのが一般であるが、演算量削減のために実数係数のサブバンドフィルタを用いることも可能である。しかしながら、実数係数のサブバンドフィルタにおける位相の制御は、複素数係数のサブバンドフィルターにおける制御と比較して困難である。
【０００７】
本発明では、このような従来の問題点に鑑みてなされたものであって、正弦波付加を時間領域の信号に対して行うように構成することによって、サブバンドフィルタに起因する正弦波周波数の制限や位相に関わる問題を解決し、できるだけ入力信号に忠実な信号の復号を可能にする復号化方法を提供するものである。
【０００８】
【課題を解決するための手段】
上記課題を解決するために、本発明は、定められた時間長のフレームに分割され、さらに複数のサブバンドに分割されて符号化されたオーディオ信号情報を多重化したビットストリームから、各符号化情報を分離するビットストリーム分離手段と、分離された低域の符号化情報から低域成分を表す時間信号を復号する低域復号化手段と、復号された低域時間信号を複数の低域サブバンド信号に分割する分析サブバンドフィルタと、低域サブバンド信号と分離された高域の符号化情報とから高域サブバンド信号を生成する帯域拡張手段と、低域および高域サブバンド信号を合成して時間領域の信号を得る合成サブバンドフィルタと、分離された正弦波付加情報から所望の周波数および振幅特性の正弦波を表す時間領域の信号を生成する正弦波生成手段と、前記正弦波生成手段から生成される正弦波の周波数を、復号された低域時間信号の周波数の整数倍になるように調整する周波数調整手段と、前記合成フィルタバンクから得られた時間領域信号と前記正弦波生成手段から得られた正弦波を合成して出力オーディオ信号を得る時間信号合成手段とを設けるものである。
【０００９】
【発明の実施の形態】
本発明の第１の実施の形態関わるオーディオ復号化方法は、定められた時間長のフレームに分割され、さらに複数のサブバンドに分割されて符号化されたオーディオ信号情報を多重化したビットストリームから、各符号化情報を分離するビットストリーム分離手段と、分離された低域の符号化情報から低域成分を表す時間信号を復号する低域復号化手段と、復号された低域時間信号を複数の低域サブバンド信号に分割する分析サブバンドフィルタと、低域サブバンド信号と分離された高域の符号化情報とから高域サブバンド信号を生成する帯域拡張手段と、低域および高域サブバンド信号を合成して時間領域の信号を得る合成サブバンドフィルタと、分離された正弦波付加情報から所望の周波数および振幅特性の正弦波を表す時間領域の信号を生成する正弦波生成手段と、複数の時間領域の信号を合成する時間信号合成手段とを設け、前記合成フィルタバンクから得られた時間領域の信号と正弦波生成手段から得られた正弦波を合成するようにした構成である。
【００１０】
本発明の第２の実施の形態に関わるオーディオ復号化方法は、本発明の第１の実施の形態に関わるオーディオ復号化方法に対して、低域復号化手段から得られた低域時間信号を分析し、信号の周期性およびその周波数を検出する周期性分析手段と、検出された周波数に基づいて、正弦波生成手段を制御する周波数調整手段とを備え、低域時間信号の特性に従って、生成される正弦波の周波数を適応的に制御するようにした構成である。
【００１１】
本発明の第３の実施の形態に関わるオーディオ復号化方法は、本発明の第１の実施の形態に関わるオーディオ復号化方法に対して、前記低域時間信号を分析し、信号の変化を検出する信号変化検出手段を設け、低域時間信号の特性に従って、生成される正弦波の振幅制御位置を適応的に制御するようにした構成である。
【００１２】
以下、本発明の実施の形態におけるオーディオ復号化方法について、図面を用いて説明する。
（実施の形態１）
図１は、本発明の実施の形態１における復号化方法を示す構成図である。先に説明した図７に示される従来例と異なる点は、従来例においては正弦波付加情報７０９が帯域拡張手段７０４に直接入力され、サブバンド信号に対して正弦波付加処理が行われていたのに対して、本発明の実施の形態では、時間領域正弦波生成手段１０６を設け、正弦波付加情報１１２にしたがって時間領域の正弦波信号１１４を生成し、合成フィルタバンク１０５から出力された時間領域合成信号１１６と加算するように構成されていることである。
【００１３】
入力ビットストリーム１０９は、ビットストリーム分離手段１０１において、低域成分情報１１０、高域成分情報１１１、および正弦波付加情報１１２に分離される。低域成分情報１１０は、例えばＭＰＥＧ−４ＡＡＣ等の符号化方式を用いて符号化された情報であり、低域復号手段１０２において復号され、低域成分を表す時間信号が生成される。生成された低域成分を表す時間信号は、分析フィルタバンク１０３において複数（Ｍ個）のサブバンドに分割され、帯域拡張手段１０４に入力される。帯域拡張手段１０４は、低域成分を表す低域サブバンド信号を高域のサブバンドにコピーすることによって、帯域制限によって失われた高域成分を補償する。ここで、帯域拡張手段１０４に入力される高域成分情報１１１には、補償される高域サブバンドに対するゲイン情報が含まれており、生成された高域サブバンドごとにゲインが調整される。帯域拡張手段１０４において生成された高域サブバンド信号は、低域サブバンド信号と共に合成フィルタバンク１０５に入力されて帯域合成され、時間領域合成信号１１６が生成される。また、時間領域正弦波生成手段１０６は、入力された正弦波付加情報１１２に基づいて、正弦波調整手段１０７において生成する正弦波の特性を調整するパラメータを生成した後、正弦波生成手段１０８において、所望の正弦波を生成し、時間領域正弦波信号１１４として出力する。時間領域合成信号１１６と、時間領域正弦波信号１１４は時間信号として合成され、出力信号１１５となる。
【００１４】
ここで、時間領域正弦波生成手段１０６の構成および動作を詳しく説明する。図２は時間領域正弦波生成手段１０６の構成を示す図である。正弦波調整手段１０７は、振幅調整手段２０１、タイミング調整手段２０２および概形生成手段２０４より構成される。振幅調整手段２０１は、正弦波付加情報２０５を参照して、生成される正弦波の振幅を制御する振幅情報２０７を出力する。同様に、タイミング調整手段２０２は、正弦波付加情報２０５を参照して、生成される正弦波の変化のタイミングを制御するタイミング情報２０８を出力する。続いて、概形生成手段２０４は、振幅情報２０７とタイミング情報２０８から、正弦波の概形を生成する。図３は、生成された正弦波の概形を表す図である。通常、正弦波の概形はある時刻を示すタイミング情報ｔと、対応する振幅情報Ａの組み合わせにより、（ａ）に示される補間前の概形３０１のような階段状の形状となる。正弦波付加情報２０５に含まれる正弦波の周波数情報２０６に基づいて、正弦波生成手段２０３において生成された振幅一定の正弦波に対して、前記概形を適用することにより、所望の時間振幅特性を有する時間領域の正弦波２０９を生成することができる。
【００１５】
このような構成とすることにより、サブバンドフィルタによる正弦波の周波数に対する制限を受けずに、付加する正弦波の周波数を設定することができるので、より入力信号に近い高品質な出力信号を得ることができる。また、付加する正弦波は、時間信号として生成されるので、位相の制御も容易である。
【００１６】
なお、正弦波付加情報に複数の正弦波を付加する情報が含まれている場合には、複数の周波数の正弦波を順次生成し、それぞれの正弦波に対して、対応する概形を適応した後、全ての正弦波を合成すればよい。対応する概形は、全ての正弦波に対して同一であっても良いし、それぞれに異なる形状であっても良い。
【００１７】
また、概形生成手段２０４における正弦波の概形生成にあたっては、隣接フレーム間の振幅情報を用いて補間処理を行うことにより、信号の急激な変化を抑制し、音質を向上させることができる。図３の（ｂ）に示される補間後の概形では、（ａ）に示される補間前の概形３０１に存在する振幅が急激に変化する点が無いため、出力信号の音量が滑らかに変化し聴感上の音質が向上する。同様の処理は、従来のサブバンド信号に対する正弦波付加においても可能であるが、この場合、振幅を制御できる時間方向の単位はサブサンプル（サブバンド信号におけるサンプル）単位であり、補間精度は低下することになる。本構成では、出力信号のサンプル単位での補間が可能であり、より高品質な出力信号を得ることができる。
【００１８】
また、隣接フレーム間の補間としては、周波数の補間も可能である。振幅情報と同様に出力信号のサンプル単位での補間が可能であり、付加される正弦波の周波数を滑らかに変化させることにより、より高音質な出力信号を得ることができる。
【００１９】
（実施の形態２）
図４は、本発明の実施の形態２における復号化方法を示す構成図である。実施の形態２の構成は、図２に示される実施の形態１の時間領域正弦波生成手段において、周波数分析手段４１０と周波数調整手段４１１を設け、周波数分析手段４１０において低域時間信号４１２を分析し、分析結果に基づいて、周波数調整手段４１１において、生成する正弦波の周波数を適応的に制御するようにした構成である。低域時間信号４１２としては、例えば図１の低域復号手段１０２の出力を使用する。
【００２０】
ここで、低域時間信号の周波数を分析し、その結果に基づいて生成する正弦波の周波数を制御する理由を説明する。図５は、オーディオ信号のスペクトル分布を示す図である。入力ビットストリームに含まれる情報のうち、オーディオ信号のスペクトルの符号化情報を保持しているのは低域成分情報のみであり、低域成分の符号化にあたっては、５０１で示される帯域制限が適用されている。したがって、低域復号手段から出力された低域時間信号には、帯域制限５０１の範囲内のスペクトルしか含まれないことになる。正弦波付加処理においては、帯域制限５０１の範囲を超える高い周波数に正弦波を付加することによって、復号されるオーディオ信号の帯域幅を拡張するが、通常、付加される正弦波は、ある基本周波数を持つ基本波５０２の整数倍の周波数を持つ高調波５０３である。これは、一般に符号化されるオーディオ信号の多くが、複数の高調波の集合により構成されているという事実に基づいている。したがって、正弦波付加情報４０５には、基本波５０２の整数倍を正しく表す周波数情報を含める必要があるが、そのためには、多くの情報量を割り当てる必要があるため、本発明が対象とするような低いビットレートにおいては、近似値となる周波数情報しか割り当てることができない。これにより、付加される正弦波の周波数は、整数倍高調波５０３と異なることとなり、復号されるオーディオ信号の品質の低下につながっていた。これに対し、本発明の実施の形態では、周波数分析手段４１０において低域時間信号４１２を分析し、分析された基本周波数に基づいて、周波数調整手段４１１において、生成される正弦波の周波数を、基本周波数の整数倍になるように制御するので、付加される正弦波の周波数と、整数倍高調波５０３のずれが解消されるので、より入力信号に忠実な出力信号を得ることができる。
【００２１】
なお、周波数分析手段４１０においては、分析された基本周波数の強度情報等に基づいて、周波数調整手段４１１での周波数の適応制御を行うか行わないかを切り替えるように構成することも可能である。
【００２２】
（実施の形態３）
図６は、本発明の実施の形態３おける復号化方法を示す構成図である。実施の形態３の構成は、図２に示される実施の形態１の時間領域正弦波生成手段において、信号変化検出手段６１１を設け、信号変化検出手段６１１において低域時間信号６１０を分析し、分析結果に基づいて、タイミング調整手段６０２において、生成する正弦波の振幅制御位置を適応的に制御するようにした構成である。低域時間信号６１０としては、例えば、図１の低域復号手段１０２の出力を使用する。
【００２３】
ここで、低域時間信号の変化を分析し、その結果に基づいて生成する正弦波の振幅制御位置を制御する理由を説明する。付加される正弦波の振幅は、図３に示されるようにある時間位置ｔにおける振幅情報として与えられる。入力信号をできるだけ忠実に表現するためには、振幅調整位置をできるだけ多く設定する必要があるが、本発明の復号化装置が対象とするような低いビットレートにおいては、情報量削減のため、少数の振幅調整位置しか設置することができない。また、振幅調整位置はあらかじめ定められた複数の候補点からしか選択できない。このため、実際の入力信号の変化点と振幅調整位置にずれが生じ、復号されるオーディオ信号の品質の低下につながっていた。これに対し、本発明の実施の形態では、信号変化検出手段６１１において低域時間信号６１０を分析し、信号の変化点を検出して、その位置情報に基づいて正弦波の振幅調整位置を適応的に制御するので、入力信号の変化点と振幅調整位置のずれが解消し、より入力信号に忠実な出力信号を得ることができる。
【００２４】
なお、本実施の形態の正弦波の振幅調整位置制御は、図1に示す帯域拡張手段１０４における高域信号生成に対しても適用が可能である。高域信号の振幅は、正弦波信号と同様に、ある時間位置ｔにおける振幅情報として与えられるので、低域信号を分析して得られた変化点の位置情報に基づいて、振幅調整位置を適応的に制御することにより、入力信号の変化点と振幅調整位置のずれを解消し、より入力信号に忠実な出力信号を得ることができる。
【００２５】
なお、前記本実施の形態１から３においては、正弦波付加情報に基づく正弦波の生成方法および合成方法を説明したが、正弦波の代わりに周期性を持つどのような波形の信号を用いてもよい。時間領域信号に対する処理として実現するため、使用する波形の周波数スペクトル分布に影響される事無く、本実施の形態と同様な構成により実現することが可能である。
【００２６】
【発明の効果】
本発明によれば、正弦波付加を時間領域の信号に対して行うように構成することによって、サブバンドフィルタに起因する正弦波周波数の制限や位相に関わる問題を解決し、より入力信号に忠実な高品質なオーディオ信号の復号が可能となる。
また、本発明によれば、低域時間信号の分析結果に基づいて、付加する正弦波の周波数を適応的に制御することによって、より入力信号に忠実な高品質なオーディオ信号の復号が可能となる。
また、本発明によれば、低域時間信号の分析結果に基づいて、付加する正弦波の振幅制御位置を適応的に制御することによって、より入力信号に忠実な高品質なオーディオ信号の復号が可能となる。
【図面の簡単な説明】
【図１】本発明のオーディオ復号化装置の構成の一例を示す図
【図２】本発明の時間領域正弦波生成手段の一例を示す図
【図３】基本波と高調波の関係を示す図
【図４】本発明の時間領域正弦波生成手段の一例を示す図
【図５】振幅調整のための正弦波の概形を示す図
【図６】本発明の時間領域正弦波生成手段の一例を示す図
【図７】従来のオーディオ復号化装置の一例を示す図
【符号の説明】
１０１ビットストリーム分離手段
１０２低域復号手段
１０３分析フィルタバンク
１０４帯域拡張手段
１０５合成フィルタバンク
１０６時間領域正弦波生成手段
１０７正弦波調整手段
１０８正弦波生成手段
１０９ビットストリーム
１１０低域成分情報
１１１高域成分情報
１１２正弦波付加情報
１１３正弦波調整情報
１１４時間領域正弦波信号
１１５出力信号
１１６時間領域合成信号
２０１、４０１、６０１振幅調整手段
２０２、４０２、６０２タイミング調整手段
２０３、４０３、６０３正弦波生成手段
２０４、４０４、６０４概形生成手段
２０５、４０５、６０５正弦波付加情報
２０６、４０６、６０６周波数情報
２０７、４０７、６０７振幅情報
２０８、４０８、６０８タイミング情報
２０９、４０９、６０９時間領域正弦波信号
３０１補間前の概形
３０２補間後の概形
４１０周波数分析手段
４１１周波数調整手段
４１２低域時間信号
５０１帯域制限
５０２基本波
５０３高調波
５０４整数倍の関係
６１０低域時間信号
６１１信号変化検出手段
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a band expansion system that generates a wideband audio signal from a narrowband audio signal by adding a small amount of auxiliary information, and to techniques for improving the quality of the reproduced signal and reducing the amount of computation in such a system.
[0002]
[Prior art]
Band division coding is widely known as a technique that can encode general acoustic signals with a small amount of information while still yielding a high-quality reproduced signal. In this technique, the input acoustic signal is either divided into signals of a plurality of frequency bands using a band division filter, or converted into a frequency-axis signal using a time-frequency transform such as the Fourier transform and then divided into a plurality of bands on the frequency axis; appropriate coding bits are then allocated to each of the divided bands. Band division coding yields a high-quality reproduced signal from a small amount of code because processing based on human auditory characteristics can be performed at the encoding stage. In general, human hearing is less sensitive to sounds at high frequencies of about 10 kHz or more, and low-level sounds at such frequencies are difficult to perceive. A phenomenon called frequency masking is also well known: when a high-level sound exists in a particular frequency band, low-level sounds in the neighboring bands become difficult to perceive. Allocating bits to parts that are difficult to perceive because of these auditory characteristics has almost no effect on the quality of the reproduced signal, so encoding them serves no purpose. Conversely, by taking the coding bits that would have been assigned to such parts if auditory characteristics were ignored and reallocating them to other, perceptually sensitive parts, the sensitive parts can be encoded in detail and the quality of the reproduced signal improved. A typical example of coding based on such band division is the ISO international standard MPEG-4 AAC (ISO/IEC 14496-3), which can encode a wideband stereo signal of 16 kHz or more with high quality at a bit rate of about 96 kbps.
[0003]
However, when the bit rate is reduced to, for example, about 48 kbps, the band that can be encoded with high quality shrinks to about 10 kHz or less, and the result sounds muffled. One method of compensating for such sound quality degradation due to band limitation is a technique called SBR (Spectral Band Replication), described in "Digital Radio Mondiale (DRM); System Specification" (ETSI TS 101 980), recommended by ETSI (European Telecommunications Standards Institute).
[0004]
FIG. 7 shows an example of a decoder that performs band extension by SBR. Its operation is described below with reference to the figure. The input bitstream 706 is separated by the bitstream separation means 701 into low-frequency component information 707, high-frequency component information 708, and sine wave additional information 709. The low-frequency component information 707 is information encoded with a coding scheme such as MPEG-4 AAC; it is decoded by the low-frequency decoding means 702 to generate a time signal representing the low-frequency component. This time signal is divided into a plurality (M) of subbands by the analysis filter bank 703 and input to the band extending means 704. The band extending means 704 compensates for the high-frequency component lost through band limitation by copying the low-frequency subband signals, which represent the low-frequency component, to high-frequency subbands. The high-frequency component information 708 input to the band extending means 704 includes gain information for the high-frequency subbands to be compensated, and the gain is adjusted for each generated high-frequency subband. In addition, a gain-controlled sine wave is added to each high-frequency subband according to the sine wave additional information 709. The high-frequency subband signals generated by the band extending means 704 are input, together with the low-frequency subband signals, to the synthesis filter bank 705 and band-synthesized to generate the output signal 710. The number of subbands on the synthesis filter bank side need not match the number on the analysis filter bank side. For example, if the relationship N = 2M holds in FIG. 7, the sampling frequency of the output signal is twice that of the time signal input to the analysis filter bank.
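As a rough illustration of the copy-and-gain step and of the N = 2M remark in this paragraph, the following Python sketch may help. It is not the SBR algorithm itself: the cyclic choice of which low subband feeds which high subband, the use of real-valued samples, and the function names `extend_band` and `output_sample_rate` are all assumptions made for the example, as is reading N as the number of synthesis-side subbands.

```python
def extend_band(low_subbands, gains):
    """Copy low subbands upward and scale each copy by its gain.

    low_subbands: list of M lists of subband samples.
    gains: one gain per generated high subband (hypothetical layout).
    Returns the M original subbands followed by len(gains) scaled copies.
    """
    m = len(low_subbands)
    high = []
    for k, gain in enumerate(gains):
        source = low_subbands[k % m]  # assumed mapping: reuse low subbands cyclically
        high.append([gain * x for x in source])
    return low_subbands + high


def output_sample_rate(input_rate_hz, m_analysis, n_synthesis):
    """Output rate scales with the ratio of synthesis to analysis subbands."""
    return input_rate_hz * n_synthesis // m_analysis


low = [[1.0, 2.0], [3.0, 4.0]]           # M = 2 low subbands
full = extend_band(low, [0.5, 0.25])      # two gain-adjusted high subbands
rate = output_sample_rate(24000, 32, 64)  # N = 2M doubles the sampling rate
```

With M = 32 and N = 64, a 24 kHz decoder output becomes a 48 kHz band-extended output, which matches the factor of two stated above.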
[0005]
In the above configuration, the high-frequency component information 708 and the sine wave additional information 709 contain only information related to gain control, so they require very little information compared with the low-frequency component information 707, which includes spectrum information. The method is therefore well suited to encoding a wideband signal at a low bit rate.
[0006]
[Problems to be solved by the invention]
However, the above configuration is subject to limitations imposed by the subband filters, particularly when a sine wave is added. For example, when a sine wave is to be added to a high-frequency subband at a frequency on the boundary with an adjacent subband filter, sine wave components must be added to both adjacent subbands; moreover, the amplitudes and phases of those components affect each other, so deriving them is expected to require a large amount of computation. In practice, therefore, the sine wave is constrained to be injected at the center frequency of each subband, which makes it difficult to obtain reproduced sound faithful to the input signal. Furthermore, in the band expansion system addressed by the present invention, encoding and transmitting the phase of the high-frequency component to which the sine wave is added would require a large amount of information, so the phase component is generally not encoded in order to reduce the amount of information. The phase component is therefore unknown on the decoding side, and decoding must be performed from the energy alone. In this case, the signal phase is generally taken into account by using complex-coefficient subband filters, although real-coefficient subband filters can also be used to reduce the amount of computation. However, controlling the phase with real-coefficient subband filters is more difficult than with complex-coefficient subband filters.
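The center-frequency constraint described in this paragraph can be made concrete with a small Python sketch. It assumes a uniform filter bank whose equal-width subbands cover 0 Hz to half the sampling rate; the 64-subband, 48 kHz figures and the function names are illustrative assumptions, not values taken from the patent.

```python
def subband_center_hz(index, num_subbands, sample_rate_hz):
    """Center frequency of subband `index` in a uniform filter bank (assumed layout)."""
    width = sample_rate_hz / 2 / num_subbands
    return (index + 0.5) * width


def nearest_center_hz(freq_hz, num_subbands, sample_rate_hz):
    """Frequency actually used if a sine may only sit at a subband center."""
    width = sample_rate_hz / 2 / num_subbands
    index = min(int(freq_hz // width), num_subbands - 1)
    return subband_center_hz(index, num_subbands, sample_rate_hz)


wanted = 12000.0                                # lies exactly on a subband boundary
placed = nearest_center_hz(wanted, 64, 48000)   # 12187.5
error_hz = placed - wanted                      # 187.5, half a subband width
```

Under these assumptions each subband is 375 Hz wide, so a sine wave that should sit on a boundary is displaced by 187.5 Hz when it is forced onto a center frequency.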
[0007]
The present invention has been made in view of these conventional problems. By adding the sine wave to a time-domain signal, it resolves the restriction on sine wave frequency and the phase-related problems caused by the subband filters, and provides a decoding method that makes it possible to decode a signal as faithful as possible to the input signal.
[0008]
[Means for Solving the Problems]
To solve the above problems, the present invention provides: bitstream separation means for separating each piece of encoded information from a bitstream in which audio signal information, divided into frames of a predetermined time length, further divided into a plurality of subbands, and encoded, is multiplexed; low-frequency decoding means for decoding a time signal representing the low-frequency component from the separated low-frequency encoded information; an analysis subband filter that divides the decoded low-frequency time signal into a plurality of low-frequency subband signals; band extending means for generating high-frequency subband signals from the low-frequency subband signals and the separated high-frequency encoded information; a synthesis subband filter that combines the low-frequency and high-frequency subband signals to obtain a time-domain signal; sine wave generating means for generating, from the separated sine wave additional information, a time-domain signal representing a sine wave of desired frequency and amplitude characteristics; frequency adjusting means for adjusting the frequency of the sine wave generated by the sine wave generating means to be an integer multiple of the frequency of the decoded low-frequency time signal; and time signal synthesizing means for combining the time-domain signal obtained from the synthesis filter bank with the sine wave obtained from the sine wave generating means to obtain an output audio signal.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
The audio decoding method according to the first embodiment of the present invention comprises: bitstream separation means for separating each piece of encoded information from a bitstream in which audio signal information, divided into frames of a predetermined time length, further divided into a plurality of subbands, and encoded, is multiplexed; low-frequency decoding means for decoding a time signal representing the low-frequency component from the separated low-frequency encoded information; an analysis subband filter that divides the decoded low-frequency time signal into a plurality of low-frequency subband signals; band extending means for generating high-frequency subband signals from the low-frequency subband signals and the separated high-frequency encoded information; a synthesis subband filter that combines the low-frequency and high-frequency subband signals to obtain a time-domain signal; sine wave generating means for generating, from the separated sine wave additional information, a time-domain signal representing a sine wave of desired frequency and amplitude characteristics; and time signal synthesizing means for combining a plurality of time-domain signals. The time-domain signal obtained from the synthesis filter bank and the sine wave obtained from the sine wave generating means are combined.
[0010]
In addition to the configuration of the audio decoding method according to the first embodiment, the audio decoding method according to the second embodiment of the present invention includes periodicity analysis means for analyzing the low-frequency time signal obtained from the low-frequency decoding means and detecting the periodicity of the signal and its frequency, and frequency adjusting means for controlling the sine wave generating means based on the detected frequency, so that the frequency of the generated sine wave is adaptively controlled according to the characteristics of the low-frequency time signal.
[0011]
In addition to the configuration of the audio decoding method according to the first embodiment, the audio decoding method according to the third embodiment of the present invention includes signal change detecting means for analyzing the low-frequency time signal and detecting changes in the signal, so that the amplitude control positions of the generated sine wave are adaptively controlled according to the characteristics of the low-frequency time signal.
[0012]
Hereinafter, an audio decoding method according to an embodiment of the present invention will be described with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the decoding method according to Embodiment 1 of the present invention. It differs from the conventional example of FIG. 7 as follows: in the conventional example, the sine wave additional information 709 is input directly to the band extending means 704 and the sine wave is added to the subband signals, whereas in this embodiment a time-domain sine wave generating means 106 is provided, which generates a time-domain sine wave signal 114 according to the sine wave additional information 112, and this signal is added to the time-domain synthesized signal 116 output from the synthesis filter bank 105.
[0013]
The input bitstream 109 is separated by the bitstream separation means 101 into low-frequency component information 110, high-frequency component information 111, and sine wave additional information 112. The low-frequency component information 110 is information encoded with a coding scheme such as MPEG-4 AAC; it is decoded by the low-frequency decoding means 102 to generate a time signal representing the low-frequency component. This time signal is divided into a plurality (M) of subbands by the analysis filter bank 103 and input to the band extending means 104. The band extending means 104 compensates for the high-frequency component lost through band limitation by copying the low-frequency subband signals, which represent the low-frequency component, to high-frequency subbands. The high-frequency component information 111 input to the band extending means 104 includes gain information for the high-frequency subbands to be compensated, and the gain is adjusted for each generated high-frequency subband. The high-frequency subband signals generated by the band extending means 104 are input, together with the low-frequency subband signals, to the synthesis filter bank 105 and band-synthesized to generate the time-domain synthesized signal 116. Meanwhile, in the time-domain sine wave generating means 106, the sine wave adjusting means 107 generates, based on the input sine wave additional information 112, parameters that adjust the characteristics of the sine wave to be generated; the sine wave generating means 108 then generates the desired sine wave and outputs it as the time-domain sine wave signal 114. The time-domain synthesized signal 116 and the time-domain sine wave signal 114 are combined as time signals to form the output signal 115.
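The final combining step of this paragraph, adding a sine wave to the already synthesized time signal, can be sketched in a few lines of Python. The function name `add_time_domain_sine` and its parameters are hypothetical, and the sketch uses a fixed amplitude for brevity; the point is only that frequency and starting phase are free parameters once the addition happens in the time domain.

```python
import math


def add_time_domain_sine(synth, freq_hz, amplitude, sample_rate_hz, phase=0.0):
    """Add a sine of arbitrary frequency and phase to a time-domain signal.

    synth: samples output by the synthesis filter bank (signal 116 in the text).
    Returns the sample-wise sum, corresponding to the output signal 115.
    """
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [x + amplitude * math.sin(step * n + phase) for n, x in enumerate(synth)]


# A 2 kHz sine at 8 kHz sampling added to a constant signal of 1.0:
mixed = add_time_domain_sine([1.0, 1.0, 1.0, 1.0], 2000.0, 1.0, 8000.0)
```

Nothing in this step ties `freq_hz` to a subband grid, which is the property the embodiment relies on.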
[0014]
The configuration and operation of the time-domain sine wave generating means 106 are now described in detail. FIG. 2 shows the configuration of the time-domain sine wave generating means 106. The sine wave adjusting means 107 consists of amplitude adjusting means 201, timing adjusting means 202, and envelope generating means 204. The amplitude adjusting means 201 refers to the sine wave additional information 205 and outputs amplitude information 207, which controls the amplitude of the generated sine wave. Similarly, the timing adjusting means 202 refers to the sine wave additional information 205 and outputs timing information 208, which controls the timing at which the generated sine wave changes. The envelope generating means 204 then generates the envelope (outline) of the sine wave from the amplitude information 207 and the timing information 208. FIG. 3 shows the generated envelope. Normally, the envelope is formed from pairs of timing information t, indicating a time, and corresponding amplitude information A, and has a stepped shape like the pre-interpolation envelope 301 shown in (a). The sine wave generating means 203 generates a constant-amplitude sine wave based on the frequency information 206 included in the sine wave additional information 205; applying the envelope to this sine wave yields a time-domain sine wave 209 with the desired time-amplitude characteristic.
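One way to picture the stepped outline (envelope) of FIG. 3(a) and its application to a constant-amplitude sine wave is the Python sketch below. The representation of the timing and amplitude information as a sorted list of `(start_sample, amplitude)` pairs is an assumption made for the example; the patent does not specify a data layout.

```python
import math


def step_envelope(points, num_samples):
    """Build a stepped envelope from (start_sample, amplitude) pairs.

    points must be sorted by start_sample; each amplitude holds until the
    next point. Samples before the first point stay at 0.0.
    """
    env = [0.0] * num_samples
    for i, (start, amp) in enumerate(points):
        end = points[i + 1][0] if i + 1 < len(points) else num_samples
        for n in range(start, min(end, num_samples)):
            env[n] = amp
    return env


def shaped_sine(points, freq_hz, sample_rate_hz, num_samples):
    """Apply the stepped envelope to a constant-amplitude sine wave."""
    env = step_envelope(points, num_samples)
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [env[n] * math.sin(step * n) for n in range(num_samples)]


envelope = step_envelope([(0, 1.0), (2, 0.5)], 4)            # [1.0, 1.0, 0.5, 0.5]
wave = shaped_sine([(0, 1.0), (2, 0.5)], 2000.0, 8000.0, 4)
```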
[0015]
With this configuration, the frequency of the added sine wave can be set without the restriction that the subband filters impose on sine wave frequency, so a high-quality output signal closer to the input signal can be obtained. In addition, because the added sine wave is generated as a time signal, its phase is also easy to control.
[0016]
When the sine wave additional information specifies that a plurality of sine waves are to be added, sine waves of the respective frequencies are generated one after another, the corresponding envelope is applied to each, and all the sine waves are then combined. The envelope may be the same for all sine waves or may differ from one sine wave to another.
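The multi-sine case can be sketched as a loop over partials, each with its own per-sample envelope, summed into one signal. The `(freq_hz, envelope)` pair layout and the name `synthesize_partials` are assumptions for illustration only.

```python
import math


def synthesize_partials(partials, sample_rate_hz, num_samples):
    """Sum several enveloped sine waves.

    partials: list of (freq_hz, envelope), where envelope is a list of
    per-sample amplitudes at least num_samples long. The same envelope
    list may be shared by several partials.
    """
    out = [0.0] * num_samples
    for freq_hz, env in partials:
        step = 2.0 * math.pi * freq_hz / sample_rate_hz
        for n in range(num_samples):
            out[n] += env[n] * math.sin(step * n)
    return out


loud = [1.0] * 4
quiet = [0.5] * 4
summed = synthesize_partials([(2000.0, loud), (2000.0, quiet)], 8000.0, 4)
```

Passing the same list for every partial corresponds to the shared-envelope option mentioned in the text, and passing different lists corresponds to per-sine envelopes.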
[0017]
When the envelope generating means 204 generates the envelope of the sine wave, interpolating with the amplitude information of adjacent frames suppresses abrupt changes in the signal and improves sound quality. The post-interpolation envelope shown in FIG. 3(b) has none of the points of abrupt amplitude change present in the pre-interpolation envelope 301 shown in (a), so the volume of the output signal changes smoothly and the perceived sound quality improves. The same processing is possible when a sine wave is added to subband signals as in the conventional method, but in that case the amplitude can be controlled in the time direction only in units of subsamples (samples of the subband signal), so the interpolation accuracy is lower. In the present configuration, interpolation can be performed in units of output signal samples, and a higher-quality output signal can be obtained.
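A per-sample interpolated envelope, in the spirit of FIG. 3(b), might look like the following Python sketch. Linear interpolation between control points is an assumption here; the text only says that interpolation is performed, not which kind. The control-point layout and function name are likewise hypothetical.

```python
def interpolated_envelope(points, num_samples):
    """Linearly interpolate between (sample_index, amplitude) control points.

    points must be sorted by sample_index and non-empty. Before the first
    point and after the last point the nearest amplitude is held.
    """
    env = [0.0] * num_samples
    for n in range(num_samples):
        if n <= points[0][0]:
            env[n] = points[0][1]
        elif n >= points[-1][0]:
            env[n] = points[-1][1]
        else:
            for (t0, a0), (t1, a1) in zip(points, points[1:]):
                if t0 <= n < t1:
                    env[n] = a0 + (a1 - a0) * (n - t0) / (t1 - t0)
                    break
    return env


ramp = interpolated_envelope([(0, 0.0), (4, 1.0)], 5)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the envelope is computed for every output sample, the amplitude can change at output-sample resolution rather than at the coarser subband-sample resolution.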
[0018]
Frequency can also be interpolated between adjacent frames. As with the amplitude information, interpolation can be performed in units of output signal samples, and smoothly changing the frequency of the added sine wave yields an output signal of higher sound quality.
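Per-sample frequency interpolation is commonly implemented by accumulating phase, so that the waveform stays continuous while the frequency moves. The Python sketch below assumes a linear glide from a start frequency to an end frequency over one frame; both the linear law and the name `glide_sine` are illustrative assumptions.

```python
import math


def glide_sine(freq_start_hz, freq_end_hz, sample_rate_hz, num_samples):
    """Generate a unit-amplitude sine whose frequency changes every sample.

    The instantaneous frequency is interpolated linearly and integrated
    into a running phase, which avoids discontinuities in the waveform.
    """
    out = []
    phase = 0.0
    for n in range(num_samples):
        out.append(math.sin(phase))
        frac = n / num_samples
        freq = freq_start_hz + (freq_end_hz - freq_start_hz) * frac
        phase += 2.0 * math.pi * freq / sample_rate_hz
    return out


steady = glide_sine(2000.0, 2000.0, 8000.0, 4)    # no glide: an ordinary sine
sweep = glide_sine(1000.0, 3000.0, 8000.0, 100)   # frequency rises across the frame
```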
[0019]
(Embodiment 2)
FIG. 4 is a block diagram showing the decoding method according to Embodiment 2 of the present invention. Embodiment 2 adds frequency analysis means 410 and frequency adjusting means 411 to the time-domain sine wave generating means of Embodiment 1 shown in FIG. 2. The frequency analysis means 410 analyzes the low-frequency time signal 412, and based on the analysis result the frequency adjusting means 411 adaptively controls the frequency of the sine wave to be generated. The low-frequency time signal 412 is, for example, the output of the low-frequency decoding means 102 in FIG. 1.
[0020]
The reason for analyzing the frequency of the low-frequency time signal and controlling the frequency of the generated sine wave based on the result is as follows. FIG. 5 shows the spectral distribution of an audio signal. Of the information contained in the input bitstream, only the low-frequency component information carries encoded spectrum information of the audio signal, and the band limitation indicated by 501 is applied when the low-frequency component is encoded. The low-frequency time signal output from the low-frequency decoding means therefore contains only the spectrum within the band limitation 501. The sine wave addition process extends the bandwidth of the decoded audio signal by adding sine waves at frequencies above the band limitation 501; the added sine wave is usually a harmonic 503 whose frequency is an integer multiple of that of a fundamental wave 502 having a certain fundamental frequency. This reflects the fact that many of the audio signals that are typically encoded consist of a set of harmonics. The sine wave additional information 405 would therefore need to include frequency information that accurately represents an integer multiple of the fundamental wave 502, but that requires a large amount of information, and at the low bit rates targeted by the present invention only approximate frequency information can be allocated. As a result, the frequency of the added sine wave deviates from the integer-multiple harmonic 503, which has degraded the quality of the decoded audio signal. In this embodiment, by contrast, the frequency analysis means 410 analyzes the low-frequency time signal 412, and based on the fundamental frequency found by the analysis, the frequency adjusting means 411 controls the frequency of the generated sine wave so that it is an integer multiple of the fundamental frequency. This eliminates the deviation between the frequency of the added sine wave and the integer-multiple harmonic 503, so an output signal more faithful to the input signal can be obtained.
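The two steps of this embodiment, estimating a fundamental frequency from the low-frequency time signal and moving the coded sine frequency onto the nearest integer multiple, can be sketched in Python as follows. The patent does not say how the analysis is done; the plain autocorrelation peak search used here, the search range parameters, and both function names are assumptions chosen for simplicity.

```python
import math


def estimate_fundamental_hz(signal, sample_rate_hz, min_hz, max_hz):
    """Estimate the fundamental as the lag with the largest autocorrelation.

    Only lags corresponding to frequencies between min_hz and max_hz are
    searched. This is one simple analysis method, assumed for illustration.
    """
    min_lag = int(sample_rate_hz / max_hz)
    max_lag = int(sample_rate_hz / min_hz)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate_hz / best_lag


def snap_to_harmonic(freq_hz, fundamental_hz):
    """Move an approximate frequency to the nearest integer multiple."""
    multiple = max(1, round(freq_hz / fundamental_hz))
    return multiple * fundamental_hz


# Low-band signal: a 200 Hz tone sampled at 8 kHz.
low_band = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(800)]
f0 = estimate_fundamental_hz(low_band, 8000.0, 100.0, 400.0)
corrected = snap_to_harmonic(10100.0, 250.0)   # coded value is only approximate
```

In the last line, a coded frequency of 10100 Hz with a 250 Hz fundamental is moved to 10000 Hz, the 40th harmonic.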
[0021]
The frequency analysis means 410 may also be configured to switch the adaptive frequency control in the frequency adjusting means 411 on or off based on, for example, the intensity information of the analyzed fundamental frequency.
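A possible form of this on/off switch is sketched below in Python. The text speaks only of "intensity information"; using a normalized autocorrelation value as the strength measure and 0.5 as the threshold are both assumptions, as are the function names.

```python
import math


def periodicity_strength(signal, lag):
    """Normalized autocorrelation at `lag`: near 1.0 for a strongly periodic signal."""
    num = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
    den = sum(x * x for x in signal)
    return num / den if den > 0 else 0.0


def choose_frequency(coded_hz, fundamental_hz, strength, threshold=0.5):
    """Apply harmonic correction only when the fundamental is strong enough.

    threshold is a hypothetical tuning value. Below it, the coded
    frequency is used unchanged.
    """
    if strength < threshold:
        return coded_hz
    multiple = max(1, round(coded_hz / fundamental_hz))
    return multiple * fundamental_hz


tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(800)]
strength = periodicity_strength(tone, 40)          # lag of one 200 Hz period
used = choose_frequency(10100.0, 250.0, strength)  # correction is applied
```

For weakly periodic material the strength falls below the threshold and the decoder falls back to the transmitted frequency, which avoids snapping to an unreliable fundamental.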
[0022]
(Embodiment 3)
FIG. 6 is a block diagram showing the decoding method according to Embodiment 3 of the present invention. Embodiment 3 adds signal change detecting means 611 to the time-domain sine wave generating means of Embodiment 1 shown in FIG. 2. The signal change detecting means 611 analyzes the low-frequency time signal 610, and based on the analysis result the timing adjusting means 602 adaptively controls the amplitude control positions of the sine wave to be generated. The low-frequency time signal 610 is, for example, the output of the low-frequency decoding means 102 in FIG. 1.
[0023]
Here, the reason for controlling the amplitude control position of the generated sine wave based on analysis of changes in the low-frequency signal will be described. The amplitude of the added sine wave is given as amplitude information at a certain time position t, as shown in the drawings. To represent the input signal as faithfully as possible, as many amplitude adjustment positions as possible should be set. However, at the low bit rates targeted by the decoding apparatus of the present invention, only a small number of amplitude adjustment positions can be set in order to reduce the amount of information. Moreover, each amplitude adjustment position can be selected only from a plurality of predetermined candidate points. For this reason, a shift occurs between the actual change point of the input signal and the amplitude adjustment position, which degrades the quality of the decoded audio signal. In contrast, in this embodiment of the present invention, the signal change detection means 611 analyzes the low frequency signal 610 to detect the signal change point, and the amplitude adjustment position of the sine wave is adaptively controlled based on that position information. This eliminates the shift between the change point of the input signal and the amplitude adjustment position, so that an output signal more faithful to the input signal can be obtained.
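The timing adjustment can be sketched as follows. The source does not describe how the signal change detection means 611 finds a change point, so the frame-energy rise used here is an assumption, as are the frame length, the `max_shift` limit, and the function names.

```python
def detect_change_point(samples, frame=32):
    """Return the start index of the frame with the largest energy rise
    over the preceding frame, or None if the energy never rises.

    Assumption: a simple short-time energy detector stands in for the
    unspecified change detection of the low-frequency signal.
    """
    energies = [sum(s * s for s in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    best_index, best_rise = None, 0.0
    for k in range(1, len(energies)):
        rise = energies[k] - energies[k - 1]
        if rise > best_rise:
            best_index, best_rise = k * frame, rise
    return best_index


def adjust_position(transmitted_pos, change_point, max_shift):
    """Move the amplitude adjustment position to the detected change point
    when it lies within max_shift samples of the transmitted candidate
    position; otherwise keep the transmitted position."""
    if change_point is None or abs(change_point - transmitted_pos) > max_shift:
        return transmitted_pos
    return change_point
```

For instance, if the transmitted position is restricted to a coarse grid and points at sample 128 while the low-frequency signal actually changes at sample 160, the adjustment moves the amplitude change to sample 160. Limiting the shift is a design choice of this sketch, intended to avoid attaching the position to an unrelated change elsewhere in the frame.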
[0024]
Note that the amplitude adjustment position control of the sine wave according to the present embodiment can also be applied to high-frequency signal generation in the band extension means 104 shown in the drawings. Like the sine wave signal, the amplitude of the high-frequency signal is given as amplitude information at a certain time position t. Therefore, by adaptively controlling the amplitude adjustment position based on the position information of the change point obtained by analyzing the low-frequency signal, the shift between the change point of the input signal and the amplitude adjustment position can be eliminated, and an output signal more faithful to the input signal can be obtained.
[0025]
In the first to third embodiments, the sine wave generation method and the synthesis method based on the sine wave additional information have been described. However, any periodic waveform signal may be used instead of the sine wave. Because the processing is performed on a time domain signal, it can be realized with the same configuration as that of the present embodiments, regardless of the frequency spectrum distribution of the waveform used.
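A minimal sketch of time domain addition with an interchangeable periodic waveform is given below. The linear interpolation of the amplitude outline, the triangle waveform, and all function names are assumptions chosen for illustration; the source only states that amplitude information is given at time positions and that any periodic waveform may replace the sine wave.

```python
import math


def interpolate_envelope(points, length):
    """Build a per-sample amplitude outline from (position, amplitude)
    pairs. Assumption: linear interpolation, holding the end values."""
    points = sorted(points)
    env = []
    for n in range(length):
        if n <= points[0][0]:
            env.append(points[0][1])
        elif n >= points[-1][0]:
            env.append(points[-1][1])
        else:
            for (p0, a0), (p1, a1) in zip(points, points[1:]):
                if p0 <= n < p1:
                    env.append(a0 + (a1 - a0) * (n - p0) / (p1 - p0))
                    break
    return env


def sine(phase):
    """One period of a sine wave for phase in [0, 1)."""
    return math.sin(2.0 * math.pi * phase)


def triangle(phase):
    """One period of a triangle wave for phase in [0, 1)."""
    return 4.0 * abs(phase - 0.5) - 1.0


def add_periodic(synth, freq_hz, sample_rate, points, wave=sine):
    """Add an amplitude-shaped periodic waveform to the time domain
    synthesized signal, sample by sample."""
    env = interpolate_envelope(points, len(synth))
    return [x + a * wave((freq_hz * n / sample_rate) % 1.0)
            for n, (x, a) in enumerate(zip(synth, env))]
```

Because the waveform is passed in as a function of phase, replacing `sine` with `triangle` requires no change to the addition itself, which mirrors the point that the configuration does not depend on the spectrum of the waveform used.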
[0026]
[Effects of the invention]
According to the present invention, adding the sine wave to the signal in the time domain solves the problems of the limitations on sine wave frequency and phase caused by the subband filter, so that a high-quality audio signal more faithful to the input signal can be decoded.
Further, according to the present invention, adaptively controlling the frequency of the added sine wave based on the analysis result of the low-frequency signal makes it possible to decode a high-quality audio signal more faithful to the input signal.
In addition, according to the present invention, adaptively controlling the amplitude control position of the added sine wave based on the analysis result of the low-frequency signal makes it possible to decode a high-quality audio signal more faithful to the input signal.
[Brief description of the drawings]
FIG. 1 is a diagram showing an example of the configuration of an audio decoding apparatus according to the present invention.
FIG. 2 is a diagram showing an example of a time domain sine wave generating means according to the present invention.
FIG. 4 is a diagram showing an example of a time domain sine wave generating means of the present invention.
FIG. 5 is a diagram showing an outline of a sine wave for amplitude adjustment.
FIG. 6 is a diagram showing an example of a time domain sine wave generating means of the present invention.
FIG. 7 is a diagram showing an example of a conventional audio decoding apparatus.
101 Bit stream separation means
102 Low frequency decoding means
103 Analysis filter bank
104 Band expansion means
105 Synthesis filter bank
106 Time domain sine wave generation means
107 Sine wave adjustment means
108 Sine wave generation means
109 Bit stream
110 Low frequency component information
111 High frequency component information
112 Sine wave additional information
113 Sine wave adjustment information
114 Time domain sine wave signal
115 Output signal
116 Time domain synthesized signal
201, 401, 601 Amplitude adjustment means
202, 402, 602 Timing adjustment means
203, 403, 603 Sine wave generation means
204, 404, 604 Outline generation means
205, 405, 605 Sine wave additional information
206, 406, 606 Frequency information
207, 407, 607 Amplitude information
208, 408, 608 Timing information
209, 409, 609 Time domain sine wave signal
301 General shape before interpolation
302 General shape after interpolation
410 Frequency analysis means
411 Frequency adjustment means
412 Low-frequency signal
501 Band limit
502 Fundamental wave
503 Harmonic wave
504 Integer multiple relationship
610 Low-frequency signal
611 Signal change detection means

Claims

Bitstream separation means for separating each encoded information from a bitstream obtained by multiplexing audio signal information divided into frames of a predetermined time length and further divided into a plurality of subbands and encoded;
Low frequency decoding means for decoding a time signal representing a low frequency component from the separated low frequency encoding information;
An analysis subband filter that divides the decoded lowband time signal into a plurality of lowband subband signals;
Band extension means for generating a high frequency subband signal from the low frequency subband signal and the separated high frequency encoded information;
A synthesis subband filter that synthesizes low and high frequency subband signals to obtain a time domain signal;
Sine wave generating means for generating a time-domain signal representing a sine wave having a desired frequency and amplitude characteristic from the separated sine wave additional information;
Frequency adjusting means for adjusting the frequency of the sine wave generated from the sine wave generating means to be an integral multiple of the frequency of the decoded low-frequency signal;
An audio decoding device comprising: a time signal synthesis unit that synthesizes a time domain signal obtained from the synthesis filter bank and a sine wave obtained from the sine wave generation unit to obtain an output audio signal.

Bitstream separation means for separating each encoded information from a bitstream obtained by multiplexing audio signal information divided into frames of a predetermined time length and further divided into a plurality of subbands and encoded;
Low frequency decoding means for decoding a time signal representing a low frequency component from the separated low frequency encoding information;
An analysis subband filter that divides the decoded lowband time signal into a plurality of lowband subband signals;
Band extension means for generating a high frequency subband signal from the low frequency subband signal and the separated high frequency encoded information;
A synthesis subband filter that synthesizes low and high frequency subband signals to obtain a time domain signal;
Sine wave generating means for generating a time-domain signal representing a sine wave having a desired frequency and amplitude characteristic from the separated sine wave additional information;
Timing adjustment means for adjusting the amplitude change position of the sine wave generated from the sine wave generation means based on the change position of the decoded low-frequency signal;
An audio decoding device comprising: a time signal synthesis unit that synthesizes a time domain signal obtained from the synthesis filter bank and a sine wave obtained from the sine wave generation unit to obtain an output audio signal.

A bit stream separation process procedure for separating each encoded information from a bit stream obtained by multiplexing audio signal information divided into a plurality of frames of a predetermined time length and further divided into a plurality of subbands and encoded;
A low frequency decoding processing procedure for decoding a time signal representing a low frequency component from the separated low frequency encoding information;
An analysis subband filtering procedure for dividing the decoded lowband time signal into a plurality of lowband subband signals;
A band extension processing procedure for generating a high frequency subband signal from the low frequency subband signal and the separated high frequency encoded information;
A synthesis subband filter processing procedure for synthesizing the low frequency and high frequency subband signals to obtain a time domain signal;
A sine wave generation processing procedure for generating a time-domain signal representing a sine wave having a desired frequency and amplitude characteristic from the separated sine wave additional information;
A frequency adjustment processing procedure for adjusting the frequency of the sine wave generated in the sine wave generation processing procedure to be an integral multiple of the frequency of the decoded low-frequency signal;
An audio decoding method comprising: a time signal synthesis processing procedure for obtaining an output audio signal by synthesizing the time domain signal obtained in the synthesis filter bank processing procedure and the sine wave obtained in the sine wave generation processing procedure.

A bit stream separation process procedure for separating each encoded information from a bit stream obtained by multiplexing audio signal information divided into a plurality of frames of a predetermined time length and further divided into a plurality of subbands and encoded;
A low frequency decoding processing procedure for decoding a time signal representing a low frequency component from the separated low frequency encoding information;
An analysis subband filtering procedure for dividing the decoded lowband time signal into a plurality of lowband subband signals;
A band extension processing procedure for generating a high frequency subband signal from the low frequency subband signal and the separated high frequency encoded information;
A synthesis subband filter processing procedure for synthesizing the low frequency and high frequency subband signals to obtain a time domain signal;
A sine wave generation processing procedure for generating a time-domain signal representing a sine wave having a desired frequency and amplitude characteristic from the separated sine wave additional information;
A timing adjustment processing procedure for adjusting the amplitude change position of the sine wave generated in the sine wave generation processing procedure based on the change position of the decoded low-frequency signal;
An audio decoding method comprising: a time signal synthesis processing procedure for obtaining an output audio signal by synthesizing the time domain signal obtained in the synthesis filter bank processing procedure and the sine wave obtained in the sine wave generation processing procedure.

A program for causing a computer to execute each processing procedure of the audio decoding method according to claim 3 or 4.

A computer-readable recording medium on which a program for causing a computer to execute each processing procedure of the audio decoding method according to claim 3 or 4 is recorded.