JP3776004B2 - Encoding method of digital data

Info

Publication number: JP3776004B2
Application number: JP2001158767A
Authority: JP
Inventors: 修藤井
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2001-05-28
Filing date: 2001-05-28
Publication date: 2006-05-17
Anticipated expiration: 2021-05-28
Also published as: JP2002351500A

Description

【０００１】
【発明の属する技術分野】
本発明は、ミニディスク等の記録媒体に楽音や音声等のディジタルデータを記録するにあたって、前記楽音や音声等に適応して各周波数帯域のスペクトルに対するビット割り当てを行い、データ量を圧縮する符号化方法に関する。
【０００２】
【従来の技術】
楽音や音声等のディジタルデータを高能率で圧縮符号化する方法として、ミニディスクで用いられているＡＴＲＡＣ（Adaptive TRansform Acoustic Coding）が挙げられる。このＡＴＲＡＣでは、ディジタルデータを高能率で圧縮するために、入力ディジタルデータは複数の周波数帯域（以下、適宜サブバンドフレームと呼ぶ）に分割され、可変長の単位時間でブロック化される。ブロック化されたディジタルデータはＭＤＣＴ（Modified Discrete Cosine Transform）処理によってスペクトル信号に変換され、さらに聴覚心理特性を利用して割り当てられたビット数で各スペクトル信号がそれぞれ符号化される。
【０００３】
上記の圧縮符号化に適用することができる聴覚心理特性には、等ラウドネス特性やマスキング効果が挙げられる。等ラウドネス特性とは、同じ音圧レベルの音であっても、人間が感じ取る音の大きさが周波数によって変化することを表すものである。従って、人間が感じ取ることのできる音の大きさを示す最小可聴限がその音の周波数によって変化することを表している。
【０００４】
一方、マスキング効果には同時マスキング効果と経時マスキング効果がある。同時マスキング効果とは、複数の周波数成分の音が同時に発生しているときに、ある音が別の音を聴き取り難くさせる現象を言う。また、経時マスキング効果とは、大きな音の時間軸方向の前後では、別の音が聞き取り難くなる現象を言う。
【０００５】
このような聴覚心理特性を利用したビット割り当て法、例えば反復法と呼ばれる割り当て法では、入力されたディジタルデータに適応した実際のビット割り当てを、次のようにして行っている。
【０００６】
まず、各周波数帯域のパワーＳを求め、そのパワーＳによる他の周波数帯域に対するマスキング閾値Ｍを求める。次に、このマスキング閾値Ｍと、各周波数帯域をｎビットで量子化したときの量子化雑音パワーＮ（ｎ）とから、マスキング閾値対雑音比ＭＮＲ（ｎ）＝Ｍ／Ｎ（ｎ）を求める。続いて、そのマスキング閾値対雑音比ＭＮＲ（ｎ）が最小となる周波数帯域にビットの割当てを行った後、該マスキング閾値対雑音比ＭＮＲ（ｎ）を更新し、再び最小の周波数帯域にビットの割り当てが行われる。
【０００７】
【発明が解決しようとする課題】
確かに、上記で説明した従来の符号化方法によれば、楽音や音声等のディジタルデータを高能率で圧縮符号化することができる。
【０００８】
しかしながら、正弦波のように純音性の高いディジタルデータに対してマスキング閾値対雑音比ＭＮＲ（ｎ）を用いたビット割り当てを行うと、自身のパワー或いはエネルギーによって自身がマスキングの影響を受けてしまうため、信号対雑音比ＳＮＲ（ｎ）＝Ｓ／Ｎ（ｎ）を用いたビット割り当てを行った場合と比べて、符号化・復号化時の歪み率、Ｓ／Ｎ特性、及びダイナミックレンジといったオーディオ特性が悪化するという課題があった。
【０００９】
また、超低域や超高域の正弦波に対してマスキング閾値対雑音比ＭＮＲ（ｎ）を用いたビット割り当てを行うと、最小可聴限もオーディオ特性の悪化要因となるおそれがあった。なお、超低域の正弦波に対して信号対雑音比ＳＮＲ（ｎ）を用いたビット割り当てを行った場合には、隣接するサブバンドフレーム間の分析窓のクロスポイントで、知覚可能な量子化誤差が生じるおそれがあった。
【００１０】
一方、ホワイトノイズのように純音性の低いディジタルデータに対してマスキング閾値対雑音比ＭＮＲ（ｎ）を用いたビット割り当てを行うと、自身のパワー或いはエネルギーによって、マスキング閾値対雑音比ＭＮＲ（ｎ）が広帯域でフラットにならないため、信号対雑音比ＳＮＲ（ｎ）を用いたビット割り当てを行った場合と比べて、符号化・復号化時の音質が悪化するという課題があった。また、最小可聴限も音質の悪化要因となるおそれがあった。
【００１１】
この点、本件出願人は、特開平１０−２０７４８９号公報において、純音性の高いディジタルデータ或いは純音性の低いディジタルデータを符号化する場合、各周波数帯域の相互に隣接するスペクトルのパワーＳ（或いはエネルギー）の差から求めたピーク及びローカルピークとマスキング閾値Ｍとの関係に対応して、マスキング閾値対雑音比ＭＮＲ（ｎ）を用いたビット割り当てを行うビットレートと、信号対雑音比ＳＮＲ（ｎ）を用いたビット割り当てを行うビットレートとを、可変させる構成のディジタルデータ符号化方法を提案している。
【００１２】
確かに、上記の符号化方法によれば、正弦波のように狭帯域なディジタルデータから、ホワイトノイズのように広帯域なディジタルデータに至るまで、最適なビット割り当てを自動的に行うことができるので、マスキング閾値対雑音比ＭＮＲ（ｎ）等の同時マスキングを利用したビット割り当てに不向きな楽音に対しても音質の劣化を防止することができる。しかしながら、マスキング閾値対雑音比ＭＮＲ（ｎ）と信号対雑音比ＳＮＲ（ｎ）とを併用する上記の符号化方法では、アルゴリズムが複雑なものとなっていた。
【００１３】
本発明は上記の問題点に鑑み、アルゴリズムを複雑化することなく、純音性の高いディジタルデータから純音性の低いディジタルデータまで、高忠実に符号化することが可能なディジタルデータの符号化方法を提供することを第１の目的とする。また、本発明は、超低域であってかつ純音性の高いディジタルデータを符号化する際であっても、隣接するサブバンドフレーム間の分析窓のクロスポイントで、知覚可能な量子化誤差が生じるおそれの少ないディジタルデータの符号化方法を提供することを第２の目的とする。
【００１４】
【課題を解決するための手段】
上記した第１の目的を達成するために、本発明に係るディジタルデータの符号化装置は、楽音や音声等のディジタルデータを周波数領域に変換する手段を有するディジタルデータの符号化装置において、
前記周波数領域を複数の周波数帯域に分割する手段と、
前記各周波数帯域のパワーまたはエネルギーを算出する手段と、
前記算出された各周波数帯域のパワーまたはエネルギーの最大値及び平均値に基づき前記ディジタルデータの純音性を判定する純音性判定手段と、
聴覚心理特性を反映した基準マスキング特性に基づきマスキング閾値を決定する第一のマスキング算出手段と、
周波数に重み付けを行っていない平坦なマスキング特性に基づきマスキング閾値を決定する第二のマスキング算出手段と、
前記純音性判定手段の判定結果に基づき前記第一のマスキング算出手段と前記第二のマスキング算出手段とを切換える切換手段と、
前記算出されたパワーまたはエネルギーと前記マスキング閾値とに基づきマスキング閾値対雑音比を算出する手段と、
前記マスキング閾値対雑音比に基づき前記各周波数帯域にビットを割り当てる手段と、を備えることを特徴とする。
【００１６】
また、上記した第２の目的を達成するために、本発明に係るディジタルデータの符号化装置は、楽音や音声等のディジタルデータを複数のサブバンドフレームに分割する手段と、
前記サブバンドフレーム単位毎に周波数領域に変換する手段と、
前記周波数領域を複数の周波数帯域に分割する手段と、
前記各周波数帯域のパワーを算出する手段と、
マスキング閾値を決定する手段と、
前記算出されたパワーと前記マスキング閾値とに基づきマスキング閾値対雑音比を算出する手段と、
前記マスキング閾値対雑音比に基づき前記各周波数帯域にビット割り当てを行う割り当て手段と、を備える符号化装置において、
前記算出されたパワーの最大値と平均値の差分値が所定値以上であり、かつ、前記パワーの最大値が存在する周波数が所定周波数以下である場合、前記パワーの最大値が存在するサブバンドフレームの全ての周波数帯域に、少なくとも最低量子化ビット数以上のビットを割り当てるよう前記割り当て手段に指示を行う手段を備えることを特徴とする。
【００１７】
【発明の実施の形態】
本発明に係るディジタルデータの符号化方法を採用したディジタル録音再生装置として、ここでは、ミニディスク録音再生装置を例に挙げて説明を行う。図１は本発明に係るディジタルデータの符号化方法を採用したミニディスク録音再生装置の一構成例を示すブロック図である。
【００１８】
本図に示すミニディスク録音再生装置１に設けられた入力端子２には、コンパクトディスク再生装置や衛星放送受信装置などのディジタル音声信号源から出力されたディジタル音声データが、例えば光信号としてシリアル入力される。入力端子２に入力された光信号は、光電素子３によって電気信号に変換された後、ディジタルＰＬＬ（Phase-Locked-Loop）回路４に入力される。
【００１９】
ディジタルＰＬＬ回路４は、入力されたディジタル音声データからクロックの抽出を行うとともに、サンプリング周波数および量子化ビット数に対応したマルチビットデータを再現する。このマルチビットデータは信号源毎に異なるサンプリングレート（コンパクトディスク；４４．１ｋＨｚ、ディジタルオーディオテープレコーダ；４８ｋＨｚ、衛星放送（Ａモード）；３２ｋＨｚなど）で標本化されたディジタルデータである。そこで、ディジタルＰＬＬ回路４から出力されたマルチビットデータは、周波数変換回路５によってそのサンプリングレートをミニディスクの規格に対応した４４．１ｋＨｚに変換される。
【００２０】
音声圧縮回路６は、ＡＴＲＡＣ（Adaptive TRansform Acoustic Coding）方式によって入力されたディジタル音声データの圧縮符号化を行い、符号化されたディジタル音声データをショックプルーフメモリコントローラ７を介して信号処理回路８に送出する。なお、音声圧縮回路６におけるディジタルデータの符号化方法については、後ほど詳細に説明を行う。
【００２１】
ショックプルーフメモリコントローラ７で制御されるショックプルーフメモリ９は、音声圧縮回路６から出力されるディジタル音声データの転送速度と、信号処理回路８に入力されるディジタル音声データの転送速度との差を吸収するとともに、再生時における振動等の外乱による再生信号の中断を補間し、ディジタル音声データを保護するためのものである。
【００２２】
信号処理回路８は、エンコーダおよびデコーダとしての機能を備えており、入力されたディジタル音声データをシリアルの磁界変調信号にエンコードしてヘッド駆動回路１０に与える。ヘッド駆動回路１０は、記録ヘッド１１をミニディスク１２上の所定記録位置に移動させるとともに、前記磁界変調信号に対応した磁界を発生させる。このとき、ミニディスク１２上の所定記録位置には、光ピックアップ１３からレーザ光が照射されており、これによって前記磁界に対応した磁化パターンがミニディスク１２上に形成される。
【００２３】
一方、光ピックアップ１３は、ミニディスク１２から前記磁化パターンに対応したシリアル信号を再生する。再生されたシリアル信号は高周波アンプ１４（以下、ＲＦアンプ１４と呼ぶ）で増幅された後、信号処理回路８によってディジタル音声データにデコードされる。デコードされたディジタル音声データは、ショックプルーフメモリコントローラ７及びショックプルーフメモリ９で外乱による影響を除去された後、音声伸長回路１５に送出される。
【００２４】
音声伸長回路１５は、ＡＴＲＡＣ方式による圧縮符号化の逆変換処理を行い、フルビットのディジタル音声データを復調する。復調されたディジタル音声データは、ディジタル／アナログ変換回路１６（以下、Ｄ／Ａ変換回路１６と呼ぶ）によってアナログ音声データに変換され、出力端子１７から外部へ出力される。
【００２５】
なお、ＲＦアンプ１４で増幅されたシリアル信号は、サーボ回路１８にも入力されている。サーボ回路１８は、再生されたシリアル信号に応じてドライバ回路１９に制御信号を送出し、該ドライバ回路１９を介してスピンドルモータ２０の回転速度をフィードバック制御する。このようなフィードバック制御により、ミニディスク１２を線速度一定で回転させることができる。
【００２６】
また、サーボ回路１８は、ドライバ回路１９を介して送りモータ２１の回転速度もフィードバック制御している。このようなフィードバック制御により、ミニディスク１２の半径方向に対する光ピックアップ１３の変移制御、すなわちトラッキング制御を行うことができる。さらに、サーボ回路１８は、ドライバ回路１９を介して光ピックアップ１３のフォーカシング制御も行っている。
【００２７】
上記した信号処理回路８、光ピックアップ１３、ＲＦアンプ１４、サーボ回路１８、及びドライバ回路１９等には、図示しない電源回路から電力供給が行われるが、このような電力供給動作や後述する信号処理動作は、全てシステムコントロールマイコン２２によって集中管理されている。なお、システムコントロールマイコン２２には、曲名入力や選曲操作、或いは音質調整動作等を行うための入力装置２３が接続されている。
【００２８】
続いて、上記した音声圧縮回路６におけるディジタルデータ符号化処理の第１実施形態について説明する。図２は音声圧縮回路６の第１実施形態を示すブロック図であり、特に、スペクトル変換部に続くビット割当処理部を模式化したものである。
【００２９】
本図に示すビット割当処理部の入力端には、その前段に設けられたスペクトル変換部（図示せず）で得られたＭＤＣＴ係数（ディジタル音声データを構成する周波数成分（スペクトル））が入力される。なお、スペクトル変換部は、周波数変換回路５から入力されたディジタル音声データ（４４．１ｋＨｚ）を、帯域分割フィルタであるＱＭＦ（Quadrature Mirror Filter）によって複数の周波数帯域（サブバンドフレーム）に分割し、そのサブバンドフレーム単位毎にＭＤＣＴ（Modified Discrete Cosine Transform）処理を施すことで、ディジタル音声データのスペクトル変換を行っている。
【００３０】
パワー算出部３１は、入力されたＭＤＣＴ係数をさらにｉ個の周波数帯域（臨界帯域等）に分割し、各周波数帯域に属するＭＤＣＴ係数の２乗和から、各周波数帯域のスペクトルパワーＳ_i（ｉ＝１，２，…，Ｉ、例えばＩ＝２５）を算出する。なお、臨界帯域とは、周波数選択性・マスキング閾値等の特定の音響心理学的規則性が有効な広帯域オーディオスペクトルの特性的部分のことである。
【００３１】
純音性判定部３２は、パワー算出部３１で算出されたスペクトルパワーＳ_iの最大値Ｓ_maxと平均値Ｓ_av（＝ΣＳ_i／Ｉ）との差分値（Ｓ_max−Ｓ_av）を求めるとともに、該差分値の大小からディジタル音声データの純音性の高低を判定し、その判定結果に基づいて切換部３３の切換制御を行う。
【００３２】
図３はパワー算出部３１で算出されたスペクトルパワーＳ_iの一例を示す図である。本図中（ａ）に示すように、スペクトルパワーＳ_iの最大値Ｓ_maxと平均値Ｓ_avとの差分値（Ｓ_max−Ｓ_av）が非常に大きい場合、例えばＳ_max−Ｓ_av≧４０ｄＢを満たす場合、純音性判定部３２は、入力されたディジタル音声データの純音性が高いと判定して、平坦マスキング算出部３５を選択するように切換部３３の切換制御を行う。
【００３３】
また、本図中（ｂ）に示すように、スペクトルパワーＳ_iの最大値Ｓ_maxと平均値Ｓ_avとの差分値（Ｓ_max−Ｓ_av）が非常に小さい場合、例えばＳ_max−Ｓ_av≦６ｄＢを満たす場合、純音性判定部３２は、入力されたディジタル音声データの純音性が低いと判定して、上記と同様、平坦マスキング算出部３５を選択するように切換部３３の切換制御を行う。
【００３４】
一方、スペクトルパワーＳ_iの最大値Ｓ_maxと平均値Ｓ_avとの差分値（Ｓ_max−Ｓ_av）が上記のいずれにも該当しない場合、例えば６ｄＢ＜Ｓ_max−Ｓ_av＜４０ｄＢを満たす場合、純音性判定部３２は、入力されたディジタル音声データに対する聴覚心理、すなわちマスキング効果が有効であると判断して、基準マスキング算出部３４を選択するように切換部３３の切換制御を行う。
【００３５】
上記の純音性判定動作により、基準マスキング算出部３４が選択された場合、最小可聴限合成部３６は、音声圧縮部６のテーブルＲＯＭ（図示せず）に予め格納されている基準マスキング特性と最小可聴限特性とを合成することで、最終的なマスキング閾値Ｍ_iを決定する。一方、平坦マスキング算出部３５が選択された場合、最小可聴限合成部３６は、周波数に重み付けを行っていない平坦なマスキング特性と最小可聴限特性とを合成することで、最終的なマスキング閾値Ｍ_iを決定する。
【００３６】
ＳＭＲ算出部３７は、各周波数帯域のインデックスを前記ｉとするとき、パワー算出部３１で算出されたスペクトルパワーＳ_iと、最小可聴限合成部３６で決定された各周波数帯域のマスキング閾値Ｍ_iとの比ＳＭＲ_i（＝Ｓ_i／Ｍ_i）を、全ての周波数帯域に亘って計算する。
【００３７】
ＭＮＲ算出部３８は、まず各周波数帯域のスペクトルパワーＳ_iと、該スペクトルパワーＳ_iをｎビットで量子化したときに生じる量子化雑音パワーＮ_i（ｎ）との比、すなわち信号対雑音比ＳＮＲ_i（ｎ）（＝Ｓ_i／Ｎ_i（ｎ））を求める。なお、この信号対雑音比ＳＮＲ_i（ｎ）は、統計的には信号特性に応じた定数となるので、予め統計処理によって求めておいてもよい。さらに、ＭＮＲ算出部３８は、この信号対雑音比ＳＮＲ_i（ｎ）と、ＳＭＲ算出部３７で得られた比ＳＭＲ_iから、マスキング閾値Ｍ_iと量子化雑音パワーＮ_iとの比、すなわちマスキング閾値対雑音比ＭＮＲ_i（ｎ）（＝ＳＮＲ_i（ｎ）／ＳＭＲ_i）を算出する。
【００３８】
量子化ビット数算出部３９は、各周波数帯域の量子化ビット数ｎを０から順に大きくしていき、その都度、各周波数帯域のマスキング閾値対雑音比ＭＮＲ_i（ｎ）を計算する。そして、マスキング閾値対雑音比ＭＮＲ_i（ｎ）が最小となる周波数帯域から順にビットを割り当てていく。その後、量子化ビット数ｎを更新する度毎に、マスキング閾値対雑音比ＭＮＲ_i（ｎ）が最小となる周波数帯域に対する同様のビット割り当てを行う。所定の割り当て可能ビット数となるまで割り当てを行うと、各周波数帯域の語長が決定されて出力が行われる。すなわち、スペクトルパワーＳ_iの絶対値が、マスキング閾値Ｍ_iを超えた部分の長さが最も長い周波数帯域から順にビット割り当てが行われることになる。
【００３９】
上記したディジタルデータの符号化方法であれば、正弦波のように純音性の高いディジタルデータ、或いはホワイトノイズのように純音性の低いディジタルデータに対して、マスキング閾値対雑音比ＭＮＲ_i（ｎ）のみを用いてビット割り当てを行った場合であっても、信号対雑音比ＳＮＲ_i（ｎ）を用いてビット割り当てを行った場合と同等のオーディオ特性及び音質を得ることができる。
【００４０】
また、本実施形態におけるディジタルデータの符号化方法であれば、楽音や音声のように、聴覚心理を利用した方が好ましい音源には、通常のマスキング閾値対雑音比ＭＮＲ_i（ｎ）を用いたビット割り当てを行うので、信号対雑音比ＳＮＲ_i（ｎ）を用いてビット割り当てを行うよりも、聴覚的に優れた音質を得ることができる。さらに、マスキング閾値対雑音比ＭＮＲ_i（ｎ）と信号対雑音比ＳＮＲ_i（ｎ）とを併用する従来の符号化方法に比べて、アルゴリズムを容易に実現することが可能である。
【００４１】
続いて、音声圧縮回路６におけるディジタルデータ符号化処理の第２実施形態について説明する。図４は音声圧縮回路６の第２実施形態を示すブロック図であり、説明の理解を深めるために音声伸長回路１５も合わせて示している。
【００４２】
本図に示す音声圧縮回路６の入力端には、周波数変換回路５で得られたディジタル音声データ（４４．１ｋＨｚ）が入力される。音声圧縮回路６の最前段に設けられた周波数帯域分割部４１は、入力されたディジタル音声データを複数の周波数帯域（サブバンドフレーム）に分割する。
【００４３】
時間周波数変換部４２は、周波数帯域分割部４１で得られたサブバンドフレーム単位毎にＭＤＣＴ処理を施すことで、ディジタル音声データをＭＤＣＴ係数に変換する。このときのＭＤＣＴ処理によって得られる変換データＸ_m（ｋ）は、次の（１）式で示される。
【数１】

【００４４】
なお、上式中の変数ｍはブロック番号を表しており、関数ｘ_m（ｉ）は入力信号を表している。また、関数ｈ（ｉ）は順変換用窓関数を表している。図５は順変換窓関数ｈ（ｉ）の時間特性の一例を示す概念図であり、図６は順変換窓関数ｈ（ｉ）の周波数特性の一例を示す概念図である。
【００４５】
パワー算出部４３は、時間周波数変換部４２で得られたＭＤＣＴ係数をさらにｉ個の周波数帯域（臨界帯域等）に分割し、各周波数帯域に属するＭＤＣＴ係数の２乗和から、各周波数帯域のスペクトルパワーＳ_i（ｉ＝１，２，…，Ｉ、例えばＩ＝２５）を算出する。
【００４６】
純音性判定部４４は、パワー算出部４３で算出されたスペクトルパワーＳ_iの最大値Ｓ_maxと平均値Ｓ_av（＝ΣＳ_i／Ｉ）との差分値（Ｓ_max−Ｓ_av）を求めるとともに、該差分値の大小からディジタル音声データの純音性の高低を判定し、その判定結果に基づいて量子化ビット数算出部４９における量子化ビット数の割り当て制御を行う。
【００４７】
図７は本実施形態における量子化ビット数の割り当て制御を説明するための図であり、パワー算出部３１で算出されたスペクトルパワーＳ_iの一例（ａ）と、その際に割り当てられる量子化ビット数の一例（ｂ）と、を示している。なお、本図では、入力されたディジタル音声データが４つのサブバンドフレームＳＢ１〜ＳＢ４に分割されている場合を例に挙げて説明を行う。
【００４８】
本図中（ａ）に示すように、スペクトルパワーＳ_iの最大値Ｓ_maxと平均値Ｓ_avとの差分値（Ｓ_max−Ｓ_av）が非常に大きく（例えばＳ_max−Ｓ_av≧４０ｄＢ）、かつスペクトルパワーＳ_iの最大値Ｓ_maxが存在する周波数が所定周波数（例えば１００Ｈｚ）以下である場合、純音性判定部４４は、入力されたディジタル音声データが超低域であるとともに純音性が高いと判定して、本図中（ｂ）に示すように、スペクトルパワーＳ_iの最大値Ｓ_maxが存在するサブバンドフレームＳＢ１に、少なくとも最低の量子化ビット数を割り当てるよう、量子化ビット数算出部４９に対する指示を行う。
【００４９】
このようなビット割り当てを行うことにより、特定周波数のノイズを低減することが可能となる。従って、超低域であるとともに純音性が高いディジタル音声データ（例えば、超低域の正弦波）を符号化する場合であっても、隣接するサブバンドフレーム間の分析窓のクロスポイントで、知覚可能な量子化誤差が生じるおそれが少なくなる。
【００５０】
パワー算出部４３の後段に接続されたマスキング算出部４５、最小可聴限合成部４６、ＳＭＲ算出部４７、ＭＮＲ算出部４８、及び量子化ビット数算出部４９は、前述の第１実施形態と同様、マスキング閾値対雑音比ＭＮＲ_i（ｎ）を用いたビット割り当てを行い、量子化ビット数を決定する。
【００５１】
量子化部５０及びパッキング部５１は、量子化ビット数算出部４９で得られた量子化ビット数に従って、入力されたディジタル音声データを圧縮符号化する。このようにして圧縮符号化されたディジタル音声データは、信号処理回路８などを介してミニディスク１２に記録される。
【００５２】
一方、ミニディスク１２を再生する際、音声伸長回路１５のアンパッキング部５２及び逆量子化部５３は、圧縮符号化されたディジタル音声データを元のＭＤＣＴ係数に復元する。
【００５３】
周波数時間変換部５４は、復元されたＭＤＣＴ係数に対して、サブバンドフレーム単位毎にＩＭＤＣＴ（Inverse Modified Discrete Cosine Transform）処理を施す。このときのＩＭＤＣＴ処理によって得られる復元信号ｘ＾_m（ｉ）は、次の（２）式で示される。
【数２】

【００５４】
なお、上式中の変数ｍはブロック番号を表しており、関数Ｘ_m（ｋ）は変換データ（復元されたＭＤＣＴ係数）を表している。また、関数ｙ_m（ｉ）は逆変換信号を表しており、関数ｆ（ｉ）は逆変換用窓関数を表している。
【００５５】
続く周波数帯域合成部５５は、周波数時間変換部５４によって得られた復元信号ｘ＾_m（ｉ）を合成することで元のディジタル音声データを復元し、該ディジタル音声データを次段のＤ／Ａ変換回路１６に送出する。
【００５６】
なお、変換データＸ_m（ｋ）が量子化による影響を受けることなく、復元信号ｘ＾_m（ｉ）から元のディジタル音声データを復元できるように、上記した時間周波数変換部４２及び周波数時間変換部５４は、次の（３）式を満たすように設計されるべきである。本条件はＣＡＳ９０−１０やＤＳＰ９０−１４等により既に公知とされている。
【数３】

【００５７】
一方、変換データＸ_m（ｋ）が量子化による影響を受けた場合について、参考までに説明する。図８は変換データＸ_m（ｋ）が量子化による影響を受けた場合を説明する図であり、周波数帯域合成部５５から出力されるディジタル音声データの一例（ａ）と、該ディジタル音声データの符号化時におけるビット割り当ての一例（ｂ）と、を示している。
【００５８】
本図中（ｂ）に示すように、サブバンドフレームＳＢ１の低域から１／３程度の周波数成分にのみビット割り当てを行った場合、復元信号ｘ＾_m（ｉ）の周波数特性は本図中（ａ）のようになり、窓関数の周波数特性のノイズが発生する。このような量子化ノイズは、サブバンドフレームＳＢ１のどの周波数にでも少なからず発生するが、この例ではサブバンドフレームＳＢ１の低域から１／３程度に集中してビットを割り当てているため、残りの２／３の周波数成分における量子化ノイズが知覚されやすい。従って、前述した通り、スペクトルパワーＳ_iの最大値Ｓ_maxが存在するサブバンドフレームＳＢ１に、少なくとも最低の量子化ビット数を割り当てれば、特定周波数のノイズを低減することが可能となる。
【００５９】
なお、上記の実施形態では、本発明に係るディジタルデータの符号化方法をミニディスク録音再生装置に適用した例を挙げて説明を行ったが、本発明の適用範囲がこれに限定されないことは言うまでもない。
【００６０】
【発明の効果】
本発明に係るディジタルデータの符号化装置は、楽音や音声等のディジタルデータを周波数領域に変換する手段を有するディジタルデータの符号化装置において、
前記周波数領域を複数の周波数帯域に分割する手段と、
前記各周波数帯域のパワーまたはエネルギーを算出する手段と、
前記算出された各周波数帯域のパワーまたはエネルギーの最大値及び平均値に基づき前記ディジタルデータの純音性を判定する純音性判定手段と、
聴覚心理特性を反映した基準マスキング特性に基づきマスキング閾値を決定する第一のマスキング算出手段と、
周波数に重み付けを行っていない平坦なマスキング特性に基づきマスキング閾値を決定する第二のマスキング算出手段と、
前記純音性判定手段の判定結果に基づき前記第一のマスキング算出手段と前記第二のマスキング算出手段とを切換える切換手段と、
前記算出されたパワーまたはエネルギーと前記マスキング閾値とに基づきマスキング閾値対雑音比を算出する手段と、
前記マスキング閾値対雑音比に基づき前記各周波数帯域にビットを割り当てる手段と、を備える。
【００６２】
このようなディジタルデータの符号化方法であれば、正弦波のように純音性の高いディジタルデータ、或いはホワイトノイズのように純音性の低いディジタルデータに対して、マスキング閾値対雑音比のみを用いてビット割り当てを行った場合であっても、信号対雑音比を用いてビット割り当てを行った場合と同等のオーディオ特性及び音質を得ることができる。
【００６３】
また、本発明に係るディジタルデータの符号化方法であれば、楽音や音声のように、聴覚心理を利用した方が好ましい音源には、通常のマスキング閾値対雑音比を用いたビット割り当てを行うので、信号対雑音比を用いてビット割り当てを行うよりも、聴覚的に優れた音質を得ることができる。さらに、マスキング閾値対雑音比と信号対雑音比とを併用する従来の符号化方法に比べて、アルゴリズムを容易に実現することが可能である。
【００６４】
また、本発明に係るディジタルデータの符号化装置は、楽音や音声等のディジタルデータを複数のサブバンドフレームに分割する手段と、
前記サブバンドフレーム単位毎に周波数領域に変換する手段と、
前記周波数領域を複数の周波数帯域に分割する手段と、
前記各周波数帯域のパワーを算出する手段と、
前記各周波数帯域にビット割り当てを行う割り当て手段と、を備える符号化装置において、
前記算出されたパワーの最大値と平均値の差分値が所定値以上であり、かつ、前記パワーの最大値が存在する周波数が所定周波数以下である場合、前記パワーの最大値が存在するサブバンドフレームの全ての周波数帯域に、少なくとも最低量子化ビット数以上のビットを割り当てるよう前記割り当て手段に指示を行う手段を備える。
【００６５】
このようなディジタルデータの符号化方法であれば、特定周波数のノイズを低減することが可能となる。従って、超低域であるとともに純音性が高いディジタル音声データ（例えば、超低域の正弦波）を符号化する場合であっても、隣接するサブバンドフレーム間の分析窓のクロスポイントで、知覚可能な量子化誤差が生じるおそれが少なくなる。
【図面の簡単な説明】
【図１】本発明に係るディジタルデータの符号化方法を採用したミニディスク録音再生装置の一構成例を示すブロック図である。
【図２】音声圧縮回路６の第１実施形態を示すブロック図である。
【図３】パワー算出部３１で算出されたスペクトルパワーＳ_iの一例を示す図である。
【図４】音声圧縮回路６の第２実施形態を示すブロック図である。
【図５】順変換窓関数ｈ（ｉ）の時間特性の一例を示す概念図である。
【図６】順変換窓関数ｈ（ｉ）の周波数特性の一例を示す概念図である。
【図７】第２実施形態における量子化ビット数の割り当て制御を説明するための図である。
【図８】変換データＸ_m（ｋ）が量子化による影響を受けた場合を説明する図である。
【符号の説明】
１ミニディスク録音再生装置
２入力端子
３光電素子
４ディジタルＰＬＬ回路
５周波数変換回路
６音声圧縮回路
７ショックプルーフメモリコントローラ
８信号処理回路
９ショックプルーフメモリ
１０記録ヘッド駆動回路
１１記録ヘッド
１２ミニディスク
１３光ピックアップ
１４高周波アンプ（ＲＦアンプ）
１５音声伸長回路
１６ディジタル／アナログ変換回路（Ｄ／Ａ変換回路）
１７出力端子
１８サーボ回路
１９ドライバ回路
２０スピンドルモータ
２１送りモータ
２２システムコントロールマイコン
２３入力装置
３１パワー算出部
３２純音性判定部
３３切換部
３４基準マスキング算出部
３５平坦マスキング算出部
３６最小可聴限合成部
３７ＳＭＲ算出部
３８ＭＮＲ算出部
３９量子化ビット数算出部
４１周波数帯域分割部
４２時間周波数変換部
４３パワー算出部
４４純音性判定部
４５マスキング算出部
４６最小可聴限合成部
４７ＳＭＲ算出部
４８ＭＮＲ算出部
４９量子化ビット数算出部
５０量子化部
５１パッキング部
５２アンパッキング部
５３逆量子化部
５４周波数時間変換部
５５周波数帯域合成部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an encoding method for recording digital data such as musical sounds and voices on a recording medium such as a minidisc, in which bits are allocated to the spectrum of each frequency band adaptively to the musical sound or voice so as to compress the amount of data.
[0002]
[Prior art]
ATRAC (Adaptive TRansform Acoustic Coding), which is used in minidiscs, is one method of compressing and encoding digital data such as musical sounds and voices with high efficiency. In ATRAC, to compress digital data with high efficiency, the input digital data is divided into a plurality of frequency bands (hereinafter referred to as subband frames where appropriate) and grouped into blocks of variable-length unit time. The blocked digital data is converted into spectrum signals by MDCT (Modified Discrete Cosine Transform) processing, and each spectrum signal is then encoded with a number of bits allocated on the basis of psychoacoustic characteristics.
[0003]
Examples of the psychoacoustic characteristics that can be applied to the above compression coding include the equal loudness characteristic and the masking effect. The equal loudness characteristic expresses the fact that the loudness perceived by humans varies with frequency even for sounds of the same sound pressure level. It follows that the minimum audible limit, which indicates the smallest sound level that humans can perceive, also varies with the frequency of the sound.
[0004]
The masking effect, on the other hand, includes a simultaneous masking effect and a temporal masking effect. The simultaneous masking effect is a phenomenon in which, when sounds of a plurality of frequency components occur at the same time, one sound makes another sound difficult to hear. The temporal masking effect is a phenomenon in which other sounds become difficult to hear immediately before and after a loud sound on the time axis.
[0005]
In a bit allocation method using such psychoacoustic characteristics, for example, an allocation method called an iterative method, actual bit allocation adapted to input digital data is performed as follows.
[0006]
First, the power S of each frequency band is obtained, and the masking threshold M that this power S imposes on the other frequency bands is obtained. Next, the masking threshold-to-noise ratio MNR(n) = M / N(n) is obtained from the masking threshold M and the quantization noise power N(n) that results when each frequency band is quantized with n bits. Subsequently, a bit is allocated to the frequency band in which the masking threshold-to-noise ratio MNR(n) is smallest, the masking threshold-to-noise ratio MNR(n) is updated, and a bit is again allocated to the frequency band in which it is then smallest.
[0007]
[Problems to be solved by the invention]
Certainly, according to the conventional encoding method described above, digital data such as musical sounds and voices can be compression-encoded with high efficiency.
[0008]
However, when bit allocation using the masking threshold-to-noise ratio MNR(n) is performed on highly pure-tone digital data such as a sine wave, the signal is subjected to masking by its own power or energy. As a result, audio characteristics at the time of encoding and decoding, such as the distortion rate, the S/N characteristic, and the dynamic range, are worse than when bit allocation is performed using the signal-to-noise ratio SNR(n) = S / N(n).
[0009]
In addition, when bit allocation using the masking threshold-to-noise ratio MNR(n) is performed on a sine wave in the ultra-low or ultra-high frequency range, the minimum audible limit may also degrade the audio characteristics. Moreover, when bit allocation using the signal-to-noise ratio SNR(n) is performed on an ultra-low-frequency sine wave, a perceptible quantization error may occur at the cross point of the analysis windows between adjacent subband frames.
[0010]
On the other hand, when bit allocation using the masking threshold-to-noise ratio MNR(n) is performed on digital data with a low pure-tone property such as white noise, MNR(n) does not become flat over a wide band because of the signal's own power or energy, so the sound quality at the time of encoding and decoding is worse than when bit allocation is performed using the signal-to-noise ratio SNR(n). The minimum audible limit may also degrade the sound quality.
[0011]
In this regard, the present applicant has proposed, in Japanese Patent Laid-Open No. 10-207489, a digital data encoding method for encoding digital data with a high or low pure-tone property, in which the bit rate allotted to bit allocation using the masking threshold-to-noise ratio MNR(n) and the bit rate allotted to bit allocation using the signal-to-noise ratio SNR(n) are varied according to the relationship between the masking threshold M and the peaks and local peaks obtained from the differences in power S (or energy) between mutually adjacent spectra in each frequency band.
[0012]
Certainly, according to the above encoding method, optimum bit allocation can be performed automatically for everything from narrow-band digital data such as a sine wave to wide-band digital data such as white noise, so deterioration of sound quality can be prevented even for musical sounds that are unsuited to bit allocation based on simultaneous masking, such as allocation using the masking threshold-to-noise ratio MNR(n). However, the above encoding method, which uses both the masking threshold-to-noise ratio MNR(n) and the signal-to-noise ratio SNR(n), requires a complicated algorithm.
[0013]
In view of the above problems, a first object of the present invention is to provide a digital data encoding method capable of encoding with high fidelity everything from digital data with a high pure-tone property to digital data with a low pure-tone property, without complicating the algorithm. A second object of the present invention is to provide a digital data encoding method in which a perceptible quantization error is unlikely to occur at the cross point of the analysis windows between adjacent subband frames, even when encoding digital data that is both ultra-low in frequency and highly pure-tone.
[0014]
[Means for Solving the Problems]
  In order to achieve the first object described above, a digital data encoding apparatus according to the present invention is a digital data encoding apparatus having means for converting digital data such as musical sounds and voices into a frequency domain, the apparatus being characterized by comprising:
  means for dividing the frequency domain into a plurality of frequency bands;
  means for calculating the power or energy of each frequency band;
  pure tone determination means for determining the pure-tone property of the digital data based on the maximum value and the average value of the calculated power or energy of the frequency bands;
  first masking calculation means for determining a masking threshold based on a reference masking characteristic reflecting psychoacoustic characteristics;
  second masking calculation means for determining a masking threshold based on a flat masking characteristic that is not weighted by frequency;
  switching means for switching between the first masking calculation means and the second masking calculation means based on the determination result of the pure tone determination means;
  means for calculating a masking threshold to noise ratio based on the calculated power or energy and the masking threshold; and
  means for allocating bits to each frequency band based on the masking threshold to noise ratio.
[0016]
  In order to achieve the second object described above, a digital data encoding apparatus according to the present invention is an encoding apparatus comprising:
  means for dividing digital data such as musical sounds and voices into a plurality of subband frames;
  means for converting each subband frame into the frequency domain;
  means for dividing the frequency domain into a plurality of frequency bands;
  means for calculating the power of each frequency band;
  means for determining a masking threshold;
  means for calculating a masking threshold to noise ratio based on the calculated power and the masking threshold; and
  allocation means for performing bit allocation to each frequency band based on the masking threshold to noise ratio,
  the apparatus being characterized by further comprising means for instructing the allocation means, when the difference between the maximum value and the average value of the calculated power is equal to or greater than a predetermined value and the frequency at which the maximum power value exists is equal to or lower than a predetermined frequency, to allocate at least the minimum number of quantization bits to all frequency bands of the subband frame in which the maximum power value exists.
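The condition in the preceding paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: band powers are taken to be in dB, the function name, argument layout, and `min_bits` value are hypothetical, and the defaults of 40 dB and 100 Hz are the example values given in paragraph [0048] of the Japanese description rather than fixed parameters of the claim.

```python
def force_minimum_bits(bits, band_power_db, band_center_hz, subband_of_band,
                       diff_db=40.0, max_freq_hz=100.0, min_bits=2):
    """Return a copy of `bits` with a floor applied to one subband frame.

    bits            : bits already allocated to each frequency band
    band_power_db   : power S_i of each band in dB (assumed unit)
    band_center_hz  : representative frequency of each band (hypothetical)
    subband_of_band : index of the subband frame each band belongs to
    min_bits        : stand-in for the "minimum number of quantization bits"
    """
    peak = max(range(len(band_power_db)), key=band_power_db.__getitem__)
    s_av = sum(band_power_db) / len(band_power_db)
    is_pure_tone = band_power_db[peak] - s_av >= diff_db
    is_very_low = band_center_hz[peak] <= max_freq_hz
    if not (is_pure_tone and is_very_low):
        return list(bits)
    target = subband_of_band[peak]
    # Every band of the subband frame holding the peak gets at least min_bits.
    return [max(b, min_bits) if sb == target else b
            for b, sb in zip(bits, subband_of_band)]


result = force_minimum_bits([5, 0, 0, 0], [90.0, 10.0, 10.0, 10.0],
                            [50.0, 300.0, 6000.0, 12000.0], [0, 0, 1, 2])
```

In this example the peak lies at 50 Hz and exceeds the average by 60 dB, so the second band, which shares subband frame 0 with the peak, is raised from 0 bits to the floor.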
[0017]
DETAILED DESCRIPTION OF THE INVENTION
Here, as a digital recording / reproducing apparatus employing the digital data encoding method according to the present invention, a mini-disc recording / reproducing apparatus will be described as an example. FIG. 1 is a block diagram showing an example of the configuration of a mini-disc recording / reproducing apparatus employing the digital data encoding method according to the present invention.
[0018]
Digital audio data output from a digital audio signal source such as a compact disc player or a satellite broadcast receiver is serially input, for example as an optical signal, to an input terminal 2 provided in the minidisc recording / reproducing apparatus 1 shown in this figure. The optical signal input to the input terminal 2 is converted into an electric signal by the photoelectric element 3 and then input to a digital PLL (Phase-Locked-Loop) circuit 4.
[0019]
The digital PLL circuit 4 extracts a clock from the input digital audio data and reproduces multi-bit data corresponding to the sampling frequency and the number of quantization bits. This multi-bit data is digital data sampled at different sampling rates for each signal source (compact disc; 44.1 kHz, digital audio tape recorder; 48 kHz, satellite broadcast (A mode); 32 kHz, etc.). Therefore, the multi-bit data output from the digital PLL circuit 4 is converted by the frequency conversion circuit 5 into a sampling rate of 44.1 kHz corresponding to the minidisk standard.
[0020]
The audio compression circuit 6 compresses and encodes the input digital audio data by the ATRAC (Adaptive TRansform Acoustic Coding) method, and sends the encoded digital audio data to the signal processing circuit 8 via the shock proof memory controller 7. The digital data encoding method used in the audio compression circuit 6 will be described in detail later.
[0021]
The shock proof memory 9 controlled by the shock proof memory controller 7 absorbs the difference between the transfer speed of the digital audio data output from the audio compression circuit 6 and the transfer speed of the digital audio data input to the signal processing circuit 8. At the same time, the interruption of the reproduction signal due to disturbance such as vibration during reproduction is interpolated to protect the digital audio data.
[0022]
The signal processing circuit 8 has functions as an encoder and a decoder. The signal processing circuit 8 encodes the input digital audio data into a serial magnetic field modulation signal and supplies the serial magnetic field modulation signal to the head driving circuit 10. The head drive circuit 10 moves the recording head 11 to a predetermined recording position on the mini disk 12 and generates a magnetic field corresponding to the magnetic field modulation signal. At this time, a predetermined recording position on the mini disk 12 is irradiated with laser light from the optical pickup 13, whereby a magnetization pattern corresponding to the magnetic field is formed on the mini disk 12.
[0023]
On the other hand, the optical pickup 13 reproduces a serial signal corresponding to the magnetization pattern from the mini disk 12. The reproduced serial signal is amplified by a high frequency amplifier 14 (hereinafter referred to as an RF amplifier 14), and then decoded by a signal processing circuit 8 into digital audio data. The decoded digital audio data is sent to the audio decompression circuit 15 after the influence of disturbance is removed by the shock proof memory controller 7 and the shock proof memory 9.
[0024]
The audio decompression circuit 15 performs inverse conversion processing of compression encoding by the ATRAC method and demodulates full-bit digital audio data. The demodulated digital audio data is converted into analog audio data by a digital / analog conversion circuit 16 (hereinafter referred to as D / A conversion circuit 16), and is output to the outside from an output terminal 17.
[0025]
The serial signal amplified by the RF amplifier 14 is also input to the servo circuit 18. The servo circuit 18 sends a control signal to the driver circuit 19 according to the reproduced serial signal, and feedback-controls the rotational speed of the spindle motor 20 via the driver circuit 19. By such feedback control, the mini disk 12 can be rotated at a constant linear velocity.
[0026]
The servo circuit 18 also feedback-controls the rotation speed of the feed motor 21 via the driver circuit 19. By such feedback control, shift control of the optical pickup 13 with respect to the radial direction of the mini disk 12, that is, tracking control can be performed. Further, the servo circuit 18 also performs focusing control of the optical pickup 13 via the driver circuit 19.
[0027]
The signal processing circuit 8, the optical pickup 13, the RF amplifier 14, the servo circuit 18, the driver circuit 19, and the like are supplied with power from a power supply circuit (not shown). This power supply operation and the signal processing operations described later are all centrally managed by the system control microcomputer 22. The system control microcomputer 22 is connected to an input device 23 for performing song name input, music selection operation, sound quality adjustment operation, and the like.
[0028]
Next, a first embodiment of the digital data encoding process in the audio compression circuit 6 will be described. FIG. 2 is a block diagram showing a first embodiment of the audio compression circuit 6, and in particular, schematically shows a bit allocation processing unit following the spectrum conversion unit.
[0029]
The MDCT coefficients (the frequency components (spectrum) constituting the digital audio data) obtained by a spectrum conversion unit (not shown) provided in the preceding stage are input to the input end of the bit allocation processing unit shown in this figure. The spectrum conversion unit divides the digital audio data (44.1 kHz) input from the frequency conversion circuit 5 into a plurality of frequency bands (subband frames) by a QMF (Quadrature Mirror Filter), which is a band division filter, and performs spectrum conversion of the digital audio data by applying MDCT (Modified Discrete Cosine Transform) processing to each subband frame.
[0030]
The power calculation unit 31 further divides the input MDCT coefficients into I frequency bands (critical bands or the like), and calculates the spectral power S_i of each frequency band (i = 1, 2, ..., I; for example, I = 25) from the sum of squares of the MDCT coefficients belonging to that band. A critical band is a characteristic portion of the wideband audio spectrum within which specific psychoacoustic regularities, such as frequency selectivity and the masking threshold, hold.
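The band-power calculation described above can be sketched as follows. The band edges are hypothetical (the text does not list the critical-band boundaries), and the dB helper with its small floor is an added convenience, not part of the description.

```python
import math


def band_powers(mdct, band_edges):
    """Sum of squared MDCT coefficients in each band.

    band_edges holds I + 1 increasing indices into `mdct`; band i covers
    mdct[band_edges[i]:band_edges[i + 1]] (hypothetical layout).
    """
    return [sum(c * c for c in mdct[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def to_db(power, floor=1e-12):
    """Convert a power to dB, clamping to avoid log10(0)."""
    return 10.0 * math.log10(max(power, floor))


coeffs = [0.0, 1.0, 2.0, 0.5, 0.5, 0.0, 3.0, 0.0]
edges = [0, 2, 5, 8]                 # three illustrative bands
powers = band_powers(coeffs, edges)  # [1.0, 4.5, 9.0]
```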
[0031]
The pure tone determination unit 32 obtains the difference (S_max - S_av) between the maximum value S_max and the average value S_av (= ΣS_i / I) of the spectral power S_i calculated by the power calculation unit 31, determines from the magnitude of this difference whether the pure-tone property of the digital audio data is high or low, and performs switching control of the switching unit 33 based on the determination result.
[0032]
FIG. 3 is a diagram showing an example of the spectral power S_i calculated by the power calculation unit 31. As shown in (a) of this figure, when the difference (S_max - S_av) between the maximum value S_max and the average value S_av of the spectral power S_i is very large, for example when S_max - S_av ≧ 40 dB is satisfied, the pure tone determination unit 32 determines that the pure-tone property of the input digital audio data is high, and performs switching control of the switching unit 33 so as to select the flat masking calculation unit 35.
[0033]
In addition, as shown in (b) of this figure, when the difference (S_max - S_av) between the maximum value S_max and the average value S_av of the spectral power S_i is very small, for example when S_max - S_av ≦ 6 dB is satisfied, the pure tone determination unit 32 determines that the pure-tone property of the input digital audio data is low and, as in the above case, performs switching control of the switching unit 33 so as to select the flat masking calculation unit 35.
[0034]
On the other hand, when the difference (S_max - S_av) between the maximum value S_max and the average value S_av of the spectral power S_i falls into neither of the above cases, for example when 6 dB < S_max - S_av < 40 dB is satisfied, the pure tone determination unit 32 determines that psychoacoustics, that is, the masking effect, is effective for the input digital audio data, and performs switching control of the switching unit 33 so as to select the reference masking calculation unit 34.
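The three-way decision described in the preceding paragraphs might look like the following sketch. It assumes that each S_i is already expressed in dB and that the average is taken over those dB values; the text does not state this, but with a linear average the maximum can never exceed I times the mean (about 14 dB for I = 25), so the 40 dB example could not be reached. The function name and the returned labels are hypothetical.

```python
HIGH_DB = 40.0  # example threshold from the text: high pure-tone property
LOW_DB = 6.0    # example threshold from the text: low pure-tone property


def select_masking(band_power_db):
    """Return which masking calculation the switching unit would select."""
    s_max = max(band_power_db)
    s_av = sum(band_power_db) / len(band_power_db)
    diff = s_max - s_av
    if diff >= HIGH_DB or diff <= LOW_DB:
        # Sine-like or noise-like input: use the flat masking characteristic.
        return "flat"
    # Ordinary music or speech: use the psychoacoustic reference characteristic.
    return "reference"


sine_like = select_masking([0.0] * 24 + [100.0])
noise_like = select_masking([50.0, 52.0, 51.0, 49.0])
music_like = select_masking([60.0, 40.0, 30.0, 30.0])
```

Both extremes select the same flat path, which is what lets the method keep a single MNR-based allocation loop instead of switching between MNR and SNR.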
[0035]
When the reference masking calculation unit 34 is selected by the above pure tone determination operation, the minimum audible limit synthesis unit 36 determines the final masking threshold M_i by combining the reference masking characteristic and the minimum audible limit characteristic, both of which are stored in advance in a table ROM (not shown) of the audio compression circuit 6. When the flat masking calculation unit 35 is selected, on the other hand, the minimum audible limit synthesis unit 36 determines the final masking threshold M_i by combining a flat masking characteristic that is not weighted by frequency with the minimum audible limit characteristic.
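The text does not say how the masking characteristic and the minimum audible limit are combined. The sketch below assumes, purely for illustration, that both are per-band levels in dB and that the combination takes the larger of the two in each band; the function names and sample values are hypothetical.

```python
def flat_masking(num_bands, level_db):
    """A masking characteristic with the same level in every band."""
    return [level_db] * num_bands


def combine_threshold(masking_db, quiet_db):
    """Assumed combination rule: per-band maximum of the two curves."""
    return [max(m, q) for m, q in zip(masking_db, quiet_db)]


quiet = [20.0, 5.0, 10.0]  # illustrative minimum audible limit per band
m_flat = combine_threshold(flat_masking(3, 10.0), quiet)
```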
[0036]
With i denoting the index of each frequency band, the SMR calculation unit 37 calculates, over all frequency bands, the ratio SMR_i (= S_i / M_i) of the spectral power S_i calculated by the power calculation unit 31 to the masking threshold M_i of each frequency band determined by the minimum audible limit synthesis unit 36.
[0037]
First, the MNR calculation unit 38 obtains the ratio of the spectral power S_i of each frequency band to the quantization noise power N_i(n) generated when that spectral power S_i is quantized with n bits, that is, the signal-to-noise ratio SNR_i(n) (= S_i / N_i(n)). Since this signal-to-noise ratio SNR_i(n) is statistically a constant that depends on the signal characteristics, it may be obtained in advance by statistical processing. Further, from the signal-to-noise ratio SNR_i(n) and the ratio SMR_i obtained by the SMR calculation unit 37, the MNR calculation unit 38 calculates the ratio of the masking threshold M_i to the quantization noise power N_i(n), that is, the masking threshold-to-noise ratio MNR_i(n) (= SNR_i(n) / SMR_i).
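Because MNR_i(n) = M_i / N_i(n) = (S_i / N_i(n)) / (S_i / M_i), the division becomes a subtraction when the quantities are expressed in dB. The following minimal Python sketch illustrates the relation. The function names are hypothetical, and the 6.02 dB-per-bit SNR model is an assumed rule of thumb for uniform quantization; the text only says that SNR_i(n) may be obtained in advance by statistical processing.

```python
def snr_db(n_bits):
    """Assumed SNR model: about 6.02 dB per quantization bit.

    This stands in for the statistically pre-computed SNR_i(n)
    mentioned in the text; it is not taken from the source.
    """
    return 6.02 * n_bits


def smr_db(s_db, m_db):
    """SMR_i = S_i / M_i, expressed in dB as a difference."""
    return s_db - m_db


def mnr_db(s_db, m_db, n_bits):
    """MNR_i(n) = SNR_i(n) / SMR_i, expressed in dB as a difference."""
    return snr_db(n_bits) - smr_db(s_db, m_db)


# A band whose power lies 20 dB above its masking threshold:
for n in range(0, 6):
    print(n, round(mnr_db(60.0, 40.0, n), 2))
```

In this sketch a negative MNR means the quantization noise still exceeds the masking threshold, so the band is a candidate for more bits.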
[0038]
The quantization bit number calculation unit 39 sequentially increases the number of quantization bits n of each frequency band from 0, and calculates the masking threshold-to-noise ratio MNR_i(n) of each frequency band each time. Bits are then allocated starting from the frequency band in which the masking threshold-to-noise ratio MNR_i(n) is smallest. Thereafter, every time the number of quantization bits n is updated, bits are similarly allocated to the frequency band in which the masking threshold-to-noise ratio MNR_i(n) is smallest. When allocation has proceeded until a predetermined number of allocatable bits is reached, the word length of each frequency band is fixed and output. In other words, bits are allocated in order starting from the frequency band in which the spectral power S_i exceeds the masking threshold M_i by the largest margin.
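The iterative allocation described above can be sketched as a greedy loop. This is a simplified illustration, not the circuit's actual procedure: `allocate_bits`, `max_bits` and the 6.02 dB-per-bit SNR model are assumptions, and each step is assumed to cost one bit from the budget regardless of how many spectral lines the band contains.

```python
def allocate_bits(smr_db, total_bits, max_bits=16, db_per_bit=6.02):
    """Greedy allocation: give one bit at a time to the band with the lowest MNR.

    smr_db: SMR_i of each band in dB.
    total_bits: number of allocatable bits (budget).
    max_bits, db_per_bit: illustrative assumptions, not from the source.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break  # every band is already at its maximum word length
        # MNR_i(n) in dB = SNR_i(n) - SMR_i
        worst = min(candidates, key=lambda i: db_per_bit * bits[i] - smr_db[i])
        bits[worst] += 1
    return bits


print(allocate_bits([20.0, 5.0, -10.0], total_bits=6))
```

For these made-up SMR values the band with the largest SMR receives bits first, which matches the statement that allocation starts from the band whose power exceeds the masking threshold by the largest margin.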
[0039]
With the digital data encoding method described above, even when bit allocation is performed using only the masking threshold-to-noise ratio MNR_i(n) for digital data with a high pure tone property such as a sine wave, or for digital data with a low pure tone property such as white noise, audio characteristics and sound quality equivalent to those obtained when bit allocation is performed using the signal-to-noise ratio SNR_i(n) can be obtained.
[0040]
In the digital data encoding method according to the present embodiment, normal bit allocation using the masking threshold-to-noise ratio MNR_i(n) is performed for sound sources for which the use of psychoacoustics is desirable, such as music and voice, so that sound quality superior to that of bit allocation using the signal-to-noise ratio SNR_i(n) can be obtained. In addition, the algorithm can be realized more easily than a conventional encoding method that uses both the masking threshold-to-noise ratio MNR_i(n) and the signal-to-noise ratio SNR_i(n).
[0041]
Next, a second embodiment of the digital data encoding process in the audio compression circuit 6 will be described. FIG. 4 is a block diagram showing the second embodiment of the audio compression circuit 6; the audio decompression circuit 15 is also shown to aid understanding of the explanation.
[0042]
The digital audio data (44.1 kHz) obtained by the frequency conversion circuit 5 is input to the input terminal of the audio compression circuit 6 shown in the figure. A frequency band dividing unit 41 provided at the front stage of the audio compression circuit 6 divides the input digital audio data into a plurality of frequency bands (subband frames).
[0043]
The time-frequency conversion unit 42 converts the digital audio data into MDCT coefficients by performing MDCT processing on each subband frame obtained by the frequency band dividing unit 41. The transformed data X_m(k) obtained by this MDCT processing is expressed by the following equation (1).
[Expression 1]

[0044]
Note that the variable m in the above formula represents the block number, and the function x_m(i) represents the input signal. The function h(i) represents the forward transform window function. FIG. 5 is a conceptual diagram illustrating an example of the time characteristics of the forward transform window function h(i), and FIG. 6 is a conceptual diagram illustrating an example of the frequency characteristics of the forward transform window function h(i).
[0045]
The power calculation unit 43 further divides the MDCT coefficients obtained by the time-frequency conversion unit 42 into I frequency bands (critical bands or the like), and calculates the spectral power S_i (i = 1, 2, ..., I; for example, I = 25) as the sum of the squares of the MDCT coefficients belonging to each frequency band.
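As a small illustration of this step, the band power can be computed as a sum of squares over each band's coefficients. The function name `band_powers` and the band boundaries below are hypothetical; the actual band layout (for example a critical-band table with I = 25) is not given in the text.

```python
def band_powers(coeffs, band_edges):
    """Spectral power S_i: sum of squared MDCT coefficients in each band.

    coeffs: MDCT coefficients of one block.
    band_edges: hypothetical index boundaries; band i covers
                coeffs[band_edges[i]:band_edges[i + 1]].
    """
    return [sum(c * c for c in coeffs[lo:hi])
            for lo, hi in zip(band_edges, band_edges[1:])]


coeffs = [1.0, -2.0, 0.5, 0.5, 3.0, 0.0, 0.0, 1.0]
edges = [0, 2, 4, 8]          # three bands of widths 2, 2 and 4
print(band_powers(coeffs, edges))  # [5.0, 0.5, 10.0]
```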
[0046]
The pure tone determination unit 44 obtains the difference (S_max - S_av) between the maximum value S_max and the mean value S_av (= ΣS_i / I) of the spectral power S_i calculated by the power calculation unit 43, determines the degree of the pure tone property of the digital audio data based on that difference, and controls the allocation of the number of quantization bits by the quantization bit number calculation unit 49 based on the determination result.
[0047]
FIG. 7 is a diagram for explaining the quantization bit number allocation control in the present embodiment. The figure shows an example (a) of the spectral power S_i calculated by the power calculation unit 31 and an example (b) of the number of quantization bits allocated at that time. In the figure, the case where the input digital audio data is divided into four subband frames SB1 to SB4 is described as an example.
[0048]
As shown in FIG. 7, when the difference (S_max - S_av) between the maximum value S_max and the mean value S_av of the spectral power S_i is very large (for example, S_max - S_av ≥ 40 dB) and the frequency at which the maximum value S_max of the spectral power S_i exists is equal to or lower than a predetermined frequency (for example, 100 Hz), the pure tone determination unit 44 determines that the input digital audio data is in a very low frequency range and has a high pure tone property, and, as shown in (b) of the figure, instructs the quantization bit number calculation unit 49 to assign at least the minimum number of quantization bits to the subband frame SB1 in which the maximum value S_max of the spectral power S_i exists.
[0049]
By performing such bit allocation, it is possible to reduce noise at specific frequencies. Therefore, even when digital audio data in a very low frequency range with a high pure tone property (for example, a very low frequency sine wave) is encoded, there is less risk that perceptible quantization errors arise at the cross point of the analysis windows between adjacent subband frames.
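The allocation control of this embodiment can be sketched as a post-processing step on an existing bit allocation. All names below are hypothetical, the mean is assumed to be taken over dB values, and `min_bits=1` is only a placeholder for "the minimum number of quantization bits", whose actual value the text does not state. The 40 dB and 100 Hz defaults are the example values from the text.

```python
def enforce_low_freq_floor(bits, power_db, band_freq_hz, band_subband,
                           min_bits=1, diff_db=40.0, max_freq_hz=100.0):
    """Give every band of the peak's subband frame at least min_bits.

    bits:          current word length of each frequency band.
    power_db:      spectral power S_i of each band in dB (assumed).
    band_freq_hz:  representative frequency of each band (hypothetical).
    band_subband:  subband frame label of each band, e.g. 1 for SB1.
    """
    peak = max(range(len(power_db)), key=lambda i: power_db[i])
    s_av = sum(power_db) / len(power_db)
    is_tonal = power_db[peak] - s_av >= diff_db
    if is_tonal and band_freq_hz[peak] <= max_freq_hz:
        target = band_subband[peak]
        return [max(b, min_bits) if sb == target else b
                for b, sb in zip(bits, band_subband)]
    return list(bits)  # condition not met: allocation is left unchanged


bits = [5, 0, 0, 0, 0, 0]
power = [90.0, 0.0, 0.0, 0.0, 0.0, 0.0]
freqs = [50.0, 2000.0, 4000.0, 7000.0, 12000.0, 18000.0]
subbands = [1, 1, 1, 2, 2, 2]
print(enforce_low_freq_floor(bits, power, freqs, subbands))
```

In this made-up case the peak lies at 50 Hz in SB1, so the two unallocated bands of SB1 are raised to the floor while SB2 is untouched.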
[0050]
The masking calculation unit 45, the minimum audible limit synthesis unit 46, the SMR calculation unit 47, the MNR calculation unit 48, and the quantization bit number calculation unit 49 connected to the subsequent stage of the power calculation unit 43 are the same as those in the first embodiment, and determine the number of quantization bits by performing bit allocation using the masking threshold-to-noise ratio MNR_i(n).
[0051]
The quantization unit 50 and the packing unit 51 compress and encode the input digital audio data according to the number of quantization bits obtained by the quantization bit number calculation unit 49. The digital audio data compression-encoded in this way is recorded on the mini-disc 12 via the signal processing circuit 8 and the like.
[0052]
On the other hand, when the mini-disc 12 is reproduced, the unpacking unit 52 and the inverse quantization unit 53 of the audio decompression circuit 15 restore the compression-coded digital audio data to the original MDCT coefficients.
[0053]
The frequency-time conversion unit 54 performs IMDCT (Inverse Modified Discrete Cosine Transform) processing on the restored MDCT coefficients for each subband frame. The restored signal x^_m(i) obtained by this IMDCT processing is expressed by the following equation (2).
[Expression 2]

[0054]
Note that the variable m in the above formula represents the block number, and the function X_m(k) represents the transformed data (restored MDCT coefficients). The function y_m(i) represents the inverse transform signal, and the function f(i) represents the inverse transform window function.
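The forward and inverse transforms described above can be illustrated with a textbook MDCT/IMDCT pair. This sketch is not a reconstruction of equations (1) and (2): the phase term, the 2/M scaling, the use of the same sine window for both h(i) and f(i), and all function names are assumptions chosen so that overlap-adding consecutive blocks cancels the time-domain aliasing when no quantization is applied.

```python
import math


def sine_window(n):
    """Sine window; satisfies w(i)^2 + w(i + n/2)^2 = 1 (assumed window)."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def mdct(block, window):
    """Textbook MDCT: 2M windowed input samples -> M coefficients."""
    n = len(block)
    m = n // 2
    return [sum(window[i] * block[i]
                * math.cos(math.pi / m * (i + 0.5 + m / 2) * (k + 0.5))
                for i in range(n))
            for k in range(m)]


def imdct(coeffs, window):
    """Textbook IMDCT with synthesis windowing: M coefficients -> 2M samples."""
    m = len(coeffs)
    return [window[i] * (2.0 / m)
            * sum(coeffs[k]
                  * math.cos(math.pi / m * (i + 0.5 + m / 2) * (k + 0.5))
                  for k in range(m))
            for i in range(2 * m)]


def round_trip(signal, m):
    """MDCT then IMDCT with 50% overlap-add; len(signal) must be a multiple of m."""
    n = 2 * m
    w = sine_window(n)
    padded = [0.0] * m + list(signal) + [0.0] * m
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n + 1, m):
        y = imdct(mdct(padded[start:start + n], w), w)
        for i in range(n):
            out[start + i] += y[i]  # overlap-add cancels the aliasing
    return out[m:m + len(signal)]


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
restored = round_trip(signal, m=4)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # tiny rounding error
```

Quantizing the coefficients between `mdct` and `imdct` breaks the exact cancellation, which is the situation the later discussion of quantization noise addresses.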
[0055]
The subsequent frequency band synthesis unit 55 restores the original digital audio data by synthesizing the restored signals x^_m(i) obtained by the frequency-time conversion unit 54, and sends the digital audio data to the D / A conversion circuit 16 at the next stage.
[0056]
When the transformed data X_m(k) is not affected by quantization, the time-frequency conversion unit 42 and the frequency-time conversion unit 54 should be designed so as to satisfy the following equation (3), so that the original digital audio data can be restored from the restored signal x^_m(i). This condition is already known from CAS90-10, DSP90-14, and the like.
[Expression 3]

[0057]
On the other hand, the case where the transformed data X_m(k) is affected by quantization will be described for reference. FIG. 8 is a diagram explaining the case where the transformed data X_m(k) is affected by quantization, and shows an example (a) of the digital audio data output from the frequency band synthesis unit 55 and an example (b) of the bit allocation at the time of encoding this digital audio data.
[0058]
As shown in (b) of this figure, when bits are assigned only to the frequency components in about the lowest 1/3 of the subband frame SB1, the frequency characteristic of the restored signal x^_m(i) becomes as shown in (a) of the figure, and noise having the frequency characteristic of the window function is generated. Such quantization noise arises to some degree at every frequency of the subband frame SB1, but in this example the bits are concentrated in about the lowest 1/3 of the subband frame SB1, so the quantization noise in the remaining 2/3 of the frequency components is easily perceived. Therefore, as described above, if at least the minimum number of quantization bits is assigned to the subband frame SB1 in which the maximum value S_max of the spectral power S_i exists, noise at specific frequencies can be reduced.
[0059]
In the above embodiments, an example in which the digital data encoding method according to the present invention is applied to a mini-disc recording / playback apparatus has been described. However, it goes without saying that the scope of the present invention is not limited to this.
[0060]
[Effect of the invention]
  The digital data encoding apparatus according to the present invention is a digital data encoding apparatus having means for converting digital data such as musical sounds and voices into the frequency domain, comprising:
  Means for dividing the frequency domain into a plurality of frequency bands;
  Means for calculating the power or energy of each frequency band;
  Pure tone determination means for determining the pure tone of the digital data based on the maximum and average values of the power or energy of each calculated frequency band;
  A first masking calculation means for determining a masking threshold based on a reference masking characteristic reflecting the psychoacoustic characteristic;
  A second masking calculation means for determining a masking threshold based on a flat masking characteristic that does not weight the frequency;
  Switching means for switching between the first masking calculation means and the second masking calculation means based on the determination result of the pure tone determination means;
  Means for calculating a masking threshold to noise ratio based on the calculated power or energy and the masking threshold;
  Means for allocating bits to each frequency band based on the masking threshold to noise ratio.
[0062]
With such a digital data encoding method, even when bit allocation is performed using only the masking threshold-to-noise ratio for digital data with a high pure tone property such as a sine wave, or for digital data with a low pure tone property such as white noise, audio characteristics and sound quality equivalent to those obtained when bit allocation is performed using the signal-to-noise ratio can be obtained.
[0063]
In addition, in the digital data encoding method according to the present invention, normal bit allocation using the masking threshold-to-noise ratio is performed for sound sources for which the use of psychoacoustics is desirable, such as musical sound and voice. As a result, sound quality superior to that of bit allocation using the signal-to-noise ratio can be obtained. Furthermore, the algorithm can be realized more easily than a conventional encoding method that uses both the masking threshold-to-noise ratio and the signal-to-noise ratio.
[0064]
  Also, the digital data encoding apparatus according to the present invention is an encoding apparatus comprising:
  Means for dividing digital data such as musical sounds and voices into a plurality of subband frames;
  Means for converting each subband frame into the frequency domain;
  Means for dividing the frequency domain into a plurality of frequency bands;
  Means for calculating the power of each frequency band; and
  Allocation means for performing bit allocation to each frequency band,
  the apparatus further comprising means for instructing the allocation means to assign at least the minimum number of quantization bits to all frequency bands of the subband frame in which the maximum power value exists, when the difference between the calculated maximum and average power values is equal to or greater than a predetermined value and the frequency at which the maximum power value exists is equal to or lower than a predetermined frequency.
[0065]
With such a digital data encoding method, it is possible to reduce noise at specific frequencies. Therefore, even when digital audio data in a very low frequency range with a high pure tone property (for example, a very low frequency sine wave) is encoded, there is less risk that perceptible quantization errors arise at the cross point of the analysis windows between adjacent subband frames.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of the configuration of a mini-disc recording / playback apparatus employing a digital data encoding method according to the present invention.
FIG. 2 is a block diagram showing a first embodiment of the audio compression circuit 6.
FIG. 3 is a diagram showing an example of the spectral power S_i calculated by the power calculation unit 31.
FIG. 4 is a block diagram showing a second embodiment of the audio compression circuit 6.
FIG. 5 is a conceptual diagram showing an example of time characteristics of a forward conversion window function h (i).
FIG. 6 is a conceptual diagram showing an example of frequency characteristics of a forward conversion window function h (i).
FIG. 7 is a diagram for explaining quantization bit number allocation control in the second embodiment.
FIG. 8 is a diagram explaining the case where the transformed data X_m(k) is affected by quantization.
[Explanation of symbols]
1 Mini-disc recording / playback device
2 Input terminal
3 Photoelectric element
4 Digital PLL circuit
5 Frequency conversion circuit
6 Audio compression circuit
7 Shockproof memory controller
8 Signal processing circuit
9 Shockproof memory
10 Recording head drive circuit
11 Recording head
12 Mini Disc
13 Optical pickup
14 High frequency amplifier (RF amplifier)
15 Audio decompression circuit
16 Digital / analog conversion circuit (D / A conversion circuit)
17 Output terminal
18 Servo circuit
19 Driver circuit
20 Spindle motor
21 Feed motor
22 System control microcomputer
23 Input device
31 Power calculation unit
32 Pure tone determination unit
33 Switching unit
34 Reference masking calculation unit
35 Flat masking calculation unit
36 Minimum audible limit synthesis unit
37 SMR calculation unit
38 MNR calculation unit
39 Quantization bit number calculation unit
41 Frequency band dividing unit
42 Time-frequency conversion unit
43 Power calculation unit
44 Pure tone determination unit
45 Masking calculation unit
46 Minimum audible limit synthesis unit
47 SMR calculation unit
48 MNR calculation unit
49 Quantization bit number calculation unit
50 Quantization unit
51 Packing unit
52 Unpacking unit
53 Inverse quantization unit
54 Frequency-time conversion unit
55 Frequency band synthesis unit

Claims

A digital data encoding apparatus having means for converting digital data such as music and voice into the frequency domain, comprising:
Means for dividing the frequency domain into a plurality of frequency bands;
Means for calculating the power or energy of each frequency band;
Pure tone determination means for determining the pure tone of the digital data based on the maximum and average values of the power or energy of each calculated frequency band;
A first masking calculation means for determining a masking threshold based on a reference masking characteristic reflecting the psychoacoustic characteristic;
A second masking calculation means for determining a masking threshold based on a flat masking characteristic that does not weight the frequency;
Switching means for switching between the first masking calculation means and the second masking calculation means based on the determination result of the pure tone determination means;
Means for calculating a masking threshold to noise ratio based on the calculated power or energy and the masking threshold;
Means for allocating bits to each of the frequency bands based on the masking threshold-to-noise ratio.

An encoding apparatus comprising:
Means for dividing digital data such as music and voice into a plurality of subband frames;
Means for converting each subband frame into the frequency domain;
Means for dividing the frequency domain into a plurality of frequency bands;
Means for calculating the power of each frequency band;
Means for determining a masking threshold;
Means for calculating a masking threshold-to-noise ratio based on the calculated power and the masking threshold; and
Allocation means for performing bit allocation to each frequency band based on the masking threshold-to-noise ratio,
the encoding apparatus further comprising means for instructing the allocation means to assign at least the minimum number of quantization bits to all frequency bands of the subband frame in which the maximum power value exists, when the difference between the calculated maximum and average power values is equal to or greater than a predetermined value and the frequency at which the maximum power value exists is equal to or lower than a predetermined frequency.