JP4095348B2

JP4095348B2 - Noise reduction system and program

Info

Publication number: JP4095348B2
Application number: JP2002159676A
Authority: JP
Inventors: 義久石田; 隆啓村上
Original assignee: Meiji University
Current assignee: Meiji University
Priority date: 2002-05-31
Filing date: 2002-05-31
Publication date: 2008-06-04
Anticipated expiration: 2022-05-31
Also published as: JP2004004286A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声信号等の原信号に雑音が重畳された観測信号から雑音成分を取り除く雑音除去システムおよびプログラムに関し、特にＭＵＳＩＣ（Multiple Signal Classification）法を用いた雑音除去システムおよびプログラムに関する。
【０００２】
【従来の技術】
従来より、音声信号に含まれる雑音の除去方法として、ＭＵＳＩＣ法が知られている（M. Kaveh and A. J. Barabell,“The statistical performance of the MUSIC and the minimum-norm algorithms in resolving plane waves in noise", IEEE Trans. ASSP-34, 331-341, 1986）。ＭＵＳＩＣ法は部分空間法の一つで、音声信号等の原信号に雑音が重畳された観測信号の自己相関行列に固有値分解を適用し、観測信号を、原信号の情報を持つ“信号部分空間”と、雑音情報を持つ“雑音部分空間”とに分解し、得られた雑音部分空間を用いて原信号の情報を推定するものである。このＭＵＳＩＣ法により推定された信号情報を用いると、最尤法や一般化逆行列計算により観測信号に含まれる雑音成分を効果的に除去することができる。
【０００３】
【発明が解決しようとする課題】
しかしながら、従来のＭＵＳＩＣ法による雑音除去方法では、▲１▼自己相関行列の固有値分解の計算が複雑で、計算量が非常に多い、▲２▼最尤法や一般化逆行列計算は行列の乗数や逆行列計算を含むので計算量が非常に多い、などの理由により多大な計算時間を必要とする。このため、リアルタイムでの処理が困難であるという問題がある。
【０００４】
本発明は、このような点に鑑みなされたもので、計算量を削減することができ、リアルタイムでの雑音除去処理が可能な雑音除去システムおよびプログラムを提供することを目的とする。
【０００５】
【課題を解決するための手段】
本発明に係る雑音除去システムは、原信号に雑音信号が重畳された観測信号を入力し、この観測信号を所定長さのフレーム毎に切り出すフレーム切出し部と、前記切り出されたフレーム毎に離散的フーリエ変換を施し、前記フレームの振幅スペクトルと位相スペクトルとを抽出する離散的フーリエ変換部と、前記観測信号から前記雑音信号の分散を推定する雑音分散推定部と、前記推定された雑音信号の分散に基づいて前記観測信号の信号部分空間の次元を決定する信号部分空間の次元決定部と、前記抽出されたフレームの振幅スペクトルから前記決定された次元の振幅スペクトルを抽出し、この抽出された振幅スペクトルと前記推定された雑音信号の分散とから前記原信号の振幅スペクトルを推定し、更に前記原信号の振幅スペクトルの推定値と前記フレームの位相スペクトルとから前記原信号のスペクトルを推定し、前記推定された原信号のスペクトルを逆フーリエ変換して原信号を復元することにより前記原信号に重畳された雑音信号を除去する雑音除去部と、前記雑音信号が除去された各フレームを接続して復元信号を得るフレーム接続部とを備えたものであることを特徴とする。
【０００６】
本発明に係る雑音除去プログラムは、原信号に雑音信号が重畳された観測信号を入力し、この観測信号を所定長さのフレーム毎に切り出すステップと、前記切り出されたフレーム毎に離散的フーリエ変換を施し、前記フレームの振幅スペクトルと位相スペクトルとを抽出するステップと、前記観測信号から前記雑音信号の分散を推定するステップと、前記推定された雑音信号の分散に基づいて前記観測信号の信号部分空間の次元を決定するステップと、前記抽出されたフレームの振幅スペクトルから前記決定された次元の振幅スペクトルを抽出し、この抽出された振幅スペクトルと前記推定された雑音信号の分散とから前記原信号の振幅スペクトルを推定し、更に前記原信号の振幅スペクトルの推定値と前記フレームの位相スペクトルとから前記原信号のスペクトルを推定し、前記推定された原信号のスペクトルを逆フーリエ変換して原信号を復元することにより前記原信号に重畳された雑音信号を除去するステップと、前記雑音信号が除去された各フレームを接続して復元信号を得るステップとをコンピュータに実行させるように構成されたものである。
【０００７】
本発明によれば、ＭＵＳＩＣ法によって推定された信号情報を利用した雑音除去方法として、自己相関行列の固有値と固有ベクトルを求めるための行列演算を行わず、観測信号の自己相関行列の特徴に注目して観測信号に対する離散的フーリエ変換を用いて自己相関行列の固有値および固有ベクトルを求めるようにしているので、リアルタイムでの信号除去処理が可能になる。しかも、ＭＵＳＩＣ法をベースとした雑音除去がなされるので、雑音除去能力も極めて高い。
【０００８】
【発明の実施の形態】
以下、図面を参照しながら、本発明の一実施形態を詳細に説明する。
図１は、本実施形態に係る音声信号の雑音除去システムの構成を示す図である。
この情報検索支援システムは、音声信号等の原信号ｘに雑音ｎが重畳された観測信号ｙを入力し、この観測信号ｙを所定長さのフレーム毎に切出すフレーム切出し部１と、切出されたフレームに対して離散的フーリエ変換（ＤＦＴ）を施して観測信号ｙの振幅スペクトルと位相スペクトルとを求めるＤＦＴ部２と、このＤＦＴ部２で求められた振幅スペクトルと位相スペクトルのうち、信号部分空間を占めるスペクトルを確定するため、信号部分空間の次元Ｐを決定する信号部分空間の次元決定部３と、決定された信号部分空間のスペクトルを逆ＤＦＴ処理して雑音が除去されたフレームを生成するＭＵＳＩＣ法による雑音除去部４と、逆ＤＦＴ処理されたフレームを接続して原信号ｘの復元信号を出力するフレーム接続部５と、雑音の分散を推定する雑音分散推定部６とを備えて構成されている。また、雑音分散推定部６は、フレーム切出し部１で切出されたフレームからフレームの分散を計算する分散計算部１１と、ＤＦＴ部２の出力と分散計算部１１の出力から、観測信号の無音区間を検出する無音区間検出部１２と、検出された無音区間の分散で雑音の分散値を更新する雑音分散更新部１３とを備えている。なお、このシステムは、上述した各部の処理をステップとして含む雑音除去プログラムを、コンピュータに実行させることにより実現されるシステムであっても良い。
【０００９】
次に、このように構成された雑音除去システムの動作について説明する。
図２は、本システムの動作を示すフローチャートである。
まず、マイクロホン等を介して入力された原信号の音声信号を含む観測信号ｙは、フレーム切出部１において、Ｎサンプルの長さを持つハニング窓によって１フレームずつ切り出される（Ｓ１）。切り出しに当たっては、図３（ａ）に示すように、前のフレーム（切り出し区間）とＮ／２サンプルだけ重複させて次のフレームを切り出すようにする。そして、切り出されたそれぞれのフレームに対して、ＤＦＴ部２で離散的フーリエ変換により振幅スペクトルと位相スペクトルを求める（Ｓ２）。また、観測信号のフレームから雑音信号の分散を推定し（Ｓ３）、この雑音の分散に基づいて、信号部分空間の次元決定部３により信号部分空間の次元Ｐを決定する（Ｓ４）。そして、求めた観測信号ｙの信号部分空間の振幅スペクトルと位相スペクトルとに基づいてＭＵＳＩＣ法による雑音除去を行う(Ｓ５)。最後に、雑音除去されたフレームを図３（ｂ）に示すように、Ｎ／２フレームずつ重ねて接続することにより、原信号ｘの復元信号が得られる。
【００１０】
次に、ＤＦＴ処理（Ｓ２）で求められる観測信号ｙの振幅スペクトルおよび位相スペクトルと、ＭＵＳＩＣ法における原信号を構成する正弦波成分の周波数の推定値との関係について詳細に説明する。
【００１１】
［１］観測信号ｙの自己相関行列と固有値分解
いま、ステップＳ１で切り出されたフレームの原信号を、Ｎサンプルの離散時間信号ベクトルｘとすると、このｘは、次のように表すことができる。
【００１２】
【数１】

Ｔ：行列またはベクトルの転置
【００１３】
原信号ｘがＰ個の正弦波成分から構成されているとすると、原信号ｘは、
【００１４】
【数２】

【００１５】
と表される。ここで、Ｓ，ａは、
【００１６】
【数３】

【００１７】
【数４】

Ｘ（ｆ_ｌ）：ｘを構成する正弦波成分のうち周波数ｆ_ｌにあたる成分の複素振幅値
【００１８】
で与えられる。ここで、Ｓの列要素である複素指数ベクトルｓ(ｆ_l)は、
【００１９】
【数５】

【００２０】
で与えられる。一般に、原信号ｘを構成する正弦波成分の数Ｐとその周波数ｆ_l（ｌ＝０，１，…，Ｐ−１）は未知である。
いま、原信号ｘに雑音が付加された信号、
【００２１】
【数６】

【００２２】
が観測されたものとする。ここでｎは平均０、分散σ_n ²で与えられる正規分布雑音信号であり、原信号ｘと雑音ｎとは互いに無相関である。このとき、観測信号ｙの自己相関関数行列Ｒ_yyは、
【００２３】
【数７】

【００２４】
で定義される。ここで、Ｅ［ｙｙ^H］はｙｙ^Hの期待値、Ｈは行列またはベクトルの複素共役転置を表す。また、Ｒ_xxとＲ_nnは、それぞれｘとｎの自己相関行列であり、
【００２５】
【数８】

【００２６】
【数９】

【００２７】
で与えられる。また、式（８）の行列Ａは、
【００２８】
【数１０】

【００２９】
で与えられる。
【００３０】
ＭＵＳＩＣ法では、一般に、信号の自己相関行列を固有値分解することにより、信号部分空間と雑音部分空間とを求め、得られた雑音部分空間を用いて信号情報を推定する。しかし、一般に行列の固有値分解は計算が複雑で計算量が非常に多い。このため音声信号のようにサンプル数Ｎが比較的大きい場合には、実時間処理は不可能である。そこで、本発明では、自己相関行列の特徴に注目し、実時間処理が可能な新しい固有値分解の手法を提案する。
【００３１】
いま、原信号ｘがＰ−Ｎ個の周波数成分Ｘ（φ_k）（ｋ＝０，１，…，Ｎ−１）で構成されており、その周波数φ_kがＤＦＴ（離散フーリエ変換）のように
【００３２】
【数１１】

【００３３】
で与えられるものとする。このとき、式（３）で与えられるＳの行要素は複素平面にある単位円の円周を等分する点に等間隔で配置されるので、Ｓの異なる行は互いに直交する。そのため、Ｓは、
【００３４】
【数１２】

【００３５】
【数１３】

【００３６】
という性質を持つ。式（１２）と式（１３）とにより、Ｓの複素共役転置Ｓ^Hは、
【００３７】
【数１４】

【００３８】
と表され、式（１４）を式（８）に代入すると、
【００３９】
【数１５】

【００４０】
が得られる。式（１０）より、Ａは対角行列なのでＡにスカラー量ＮをかけたＮＡも対角行列である。よって、式（１５）はＲ_xxの固有値分解を表しており、Ｒ_xxの固有値をλ_k（ｋ＝０，１，…，Ｎ−１）、それに対応する固有ベクトルをν_k（ｋ＝０，１，…，Ｎ−１）とすると、固有値λ_kおよび固有ベクトルν_kは、次の式（１６）、（１７）により容易に求めることができる。
【００４１】
【数１６】

【００４２】
【数１７】

【００４３】
以上のことから、自己相関行列とその固有値と固有ベクトルに関する次のような性質が得られる。
（１）自己相関行列の固有値は、信号のＤＦＴから直接求めることができる。
（２）自己相関行列の固有ベクトルは、式（１７）で与えられるので、自己相関行列からの計算によって求める必要がない。
（３）自己相関行列の固有値と固有ベクトルを計算するために、信号から自己相関行列を求める必要がない。
【００４４】
更に、サンプル数Ｎを２の乗数に設定することにより、ＤＦＴ部２としてＦＦＴ（高速フーリエ変換）を用いた固有値計算が可能となる。信号の自己相関行列から固有値と固有ベクトルとが得られると、ＭＵＳＩＣ法によって信号を構成する正弦波成分の周波数を推定することができる。
【００４５】
［２］ＭＵＳＩＣ法による雑音除去
ＭＵＳＩＣ法は、固有値分解に基づく部分空間法の一つで、雑音に埋もれた中から目的とする原信号を構成する正弦波成分の周波数を推定するものである。いま、雑音を含まない原信号ｘ、雑音を含んだ観測信号ｙがそれぞれ式（２）と式（６）とで与えられるものとする。ここで、ｘを構成する正弦波成分の数はＰ＜Ｎとする。このとき、ｘの自己相関行列Ｒ_xxは、ランクがＰの非負定値エルミート行列となるので、Ｒ_xxの固有値λ_kは全て実数で、
【００４６】
【数１８】

【００４７】
で与えられる。一方、式（７）と式（９）とにより、ｙの自己相関行列Ｒ_yyの固有値μ_kは、
【００４８】
【数１９】

【００４９】
で与えられるので、μ_kは、
【００５０】
【数２０】

【００５１】
となる。また、μ_kに対応する固有ベクトルをν_k（ｋ＝０，１，…，Ｎ−１）とすると、それらは互いに直交する二組の部分空間に分けることができる。｛ν₀，ν₁，…ν_p-1｝は、Ｓと等価な信号部分空間を張り、一方｛ν_p，ν_p+1，…ν_pN-1｝は、Ｓと直交するザツサ部分空間を張る。ｙのＭＵＳＩＣスペクトラムは、雑音部分空間を張る固有ベクトルを用いて、
【００５２】
【数２１】

【００５３】
で定義される。ここで、ｆは任意の周波数である。式（２１）は、ｆ＝ｆ_lのときに極を持つので、その点において鋭いピークが現れる。よって、ｘを構成する正弦波成分の周波数ｆ_lは、ＭＵＳＩＣスペクトラム上のＰ個のピーク点を検出することによって推定することができる。さらに、ＭＵＳＩＣスペクトラムから推定された周波数（以下、式中においてはｆ_lの上に推定値であることを示す“＾”を付加して表記する）を用いると、最尤法や一般化逆行列計算によってｙからｘを復元することができる。しかし、この方法は、行列の乗算や逆行列計算を含むので計算量が多く、実時間処理には不向きである。そこで本発明では、これらの方法によらない以下に述べる方法でｘを復元する。
【００５４】
上述したように、ｘはＰ（＜Ｎ）個の周波数成分から構成される。一方、ｎは正規分布雑音なので、その成分は全周波数帯域にわたって分布している。つまり、ｙの周波数成分のうちｆ_l以外の周波数における成分は雑音のみが含まれる。式（１６）と式（１９）とにより、ｘとｙの周波数成分の間には、
【００５５】
【数２２】

【００５６】
が成り立つ。ここで、Ｘ（ｆ_l）とＹ（ｆ_l）は、それぞれｘとｙのｆ_lにおける周波数成分である。よって、Ｘ（ｆ_l）の推定値は、
【００５７】
【数２３】

【００５８】
で与えられる。式（２３）は自己相関行列の固有値から導かれた式であるから、この式により求められるＸ（ｆ_l）の推定値は、位相に関する情報を一切持たない。そこで、人間の聴覚が位相の変化に対して敏感でないという性質を利用し、ｙの周波数成分から求められた位相スペクトルを、Ｘ（ｆ_l）の推定値の位相情報として用いる。ｙの周波数成分Ｙ（ｆ_l）（但しｆ_lは推定値）の位相情報を、
【００５９】
【数２４】

【００６０】
とする。ここでＲｅ｛・｝とＩｍ｛・｝とはそれぞれ複素数の実数部と虚数部とを表す。このとき、Ｘ（ｆ_l）の推定値は、
【００６１】
【数２５】

【００６２】
によって復元される。よって、ｘの復元信号は、Ｘ_MUSICのＩＤＦＴ（逆離散フーリエ変換）によって、
【００６３】
【数２６】

【００６４】
で求めることができる。
【００６５】
次に、ＭＵＳＩＣスペクトラムの持つ意味について考える。式（１６）と式(２０)より、信号部分空間に相当する固有値は、ｙの周波数成分のうちパワーの大きいＰ個の成分と対応する。同様にして、雑音部分空間に相当する固有値は、ｙの周波数成分のうちパワーの小さいＮ−Ｐ個の成分と対応している。さらに、固有ベクトルν_kは、式（１７）よりｙを構成する正弦波成分に相当する複素指数ベクトルｓ（φ_k）で与えられる。一方、任意の周波数ｆにおける複素指数ベクトルｓ（ｆ）は、その周波数が式（１１）で与えられるときにのみ、その要素がすべて複素平面の単位円を等分する点に等間隔で配置される。よって、式（２１）で表されるＭＵＳＩＣスペクトラムの分母は、以下のように与えられる。
【００６６】
【数２７】

【００６７】
【数２８】

【００６８】
【数２９】

【００６９】
これらのことから、式（２１）で表されるＭＵＳＩＣスペクトラムは、観測信号ｙの周波数成分のうちパワーの大きいＰ個の周波数成分における周波数のみに極を持つ。このことは、観測信号ｙのＭＵＳＩＣスペクトラムに現れるＰ個のピーク点から周波数を推定することが、観測信号ｙの周波数成分のうちパワーの大きいＰ個の成分における周波数を検出することに他ならないことを意味する。よって、ＭＵＳＩＣスペクトラムを計算しなくても、観測信号ｙの振幅スペクトルから直接、ＭＵＳＩＣ法によって推定されるものと同じ周波数を推定することができる。
【００７０】
図４は、ＭＵＳＩＣ法による雑音除去の処理（Ｓ５）の詳細を示すフローチャートであり、その左には、各処理で得られる信号又は信号のスペクトルを示している。まず、観測信号ｙ（離散時間信号）から、ＤＦＴ部２で振幅スペクトルおよび位相スペクトルが求められたら、振幅スペクトルのうち、パワーの大きいＰ個の成分が抽出される（Ｓ１１）。次に、原信号の振幅スペクトルが求められ（Ｓ１２）、この振幅スペクトルに観測信号ｙの位相スペクトルが付加されて位相情報が復元される（Ｓ１３）。そして、これを逆ＤＦＴすることにより，フレームの復元信号が得られる（Ｓ１４）。
【００７１】
なお、以上では、ｘを構成する正弦波成分の数、つまり信号部分空間の次元Ｐが既知であるものとして説明した。しかし、一般にはＰは未知である。そこでＲ_yyの固有値μ_kの分布とｎの分散σ_n ²を用いてＰを推定する方法について説明する。
【００７２】
Ｒ_yyの固有値μ_kは、式（２０）のように分布する。実際には、サンプル数Ｎが有限長であるために、｛μ_P，μ_P+1，…，μ_N-1｝は一定値σ_n ²にならず、σ_n ²を平均値として、ある程度の範囲に散らばって分布するが、それでも｛μ₀，μ₁，…，μ_P-1｝と比較すると、それは十分σ_n ²とみなすことができる範囲で分布している。そこで、本発明では、σ_n ²をしきい値として用い、σ_n ²よりも大きいＲ_yyの固有値の数をＰとしている。信号部分空間の次元決定部３は、雑音分散推定部６で求められた分散σ_n ²の推定値よりも大きいピークを持つ成分の数を信号部分空間の次元Ｐとして決定する。
【００７３】
しかし、観測信号には、ｘとｎの両方が含まれているため、σ_n ²を推定することは現実的には困難である。そこで、人間の発する音声は文章間に無音区間を含み、また、雑音の分散は、時間変化に対して緩やかに変化すると仮定すると、音声区間に含まれる雑音の分散は、直前の無音区間から推定することができる。観測信号ｙから音声区間と無音区間を検出するため、次の式（３０）を定義する。
【００７４】
【数３０】

【００７５】
ここで、σ_n ²の上に“＾”を付した値は、σ_n ²の推定値である。Ｒ_yyの固有値μ_kは、式（２０）で与えられるので、Ｄ_VADは無音区間では０、音声区間ではそれよりも大きな値を示す。Ｄ_VAD≦０の場合、σ_n ²の推定値をその区間の信号の分散によって更新することにより、時間変化に対して緩やかに特性が変化する雑音に対しても雑音除去や部分空間の次元の決定が効果的に行われる。本システムでは、分散計算部１１が観測信号ｙの分散を計算し、無音区間検出部１２が、上記式（３０）に基づいて無音区間を検出する。そして、雑音分散更新部１３は、無音区間での分散を雑音信号の分散であるとして、分散の更新に使用する。なお、本実施形態では、マイクロホンから入力される観測信号ｙは必ず無音区間から始まると仮定して、観測信号の最初のフレームの分散をσ_n ²の推定値の初期値として用いるようにしている。
【００７６】
なお、以上では、原信号が音声信号である例について説明したが、原信号が心電図における心拍信号である場合や脳波信号である場合にも、それらに重畳された雑音を除去するのに極めて有用である。
【００７７】
【発明の効果】
以上述べたように本発明によれば、ＭＵＳＩＣ法によって推定された信号情報を利用した雑音除去方法として、自己相関行列の固有値と固有ベクトルを求めるための行列演算を行わず、観測信号に対する離散的フーリエ変換を用いて自己相関行列の固有値および固有ベクトルを求めるようにしているので、精度の良い雑音除去をリアルタイムで行うことが可能になるという効果を奏する。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る雑音除去システムの構成を示すブロック図である。
【図２】同システムの動作を示すフローチャートである。
【図３】同システムにおけるフレーム切出しとフレーム接続を説明するための図である。
【図４】同システムにおける雑音除去処理の詳細を示すフローチャートである。
【符号の説明】
１…フレーム切出し部
２…ＤＦＴ(離散フーリエ変換)部
３…信号部分空間の次元決定部
４…ＭＵＳＩＣ法による雑音除去部
５…フレーム接続部
６…雑音分散推定部
１１…分散計算部
１２…無音区間検出部
１３…雑音分散更新部
[0001]
[Technical field of the invention]
The present invention relates to a noise removal system and program for removing a noise component from an observation signal in which noise is superimposed on an original signal such as an audio signal, and more particularly to a noise removal system and program using a MUSIC (Multiple Signal Classification) method.
[0002]
[Prior art]
Conventionally, the MUSIC method has been known as a method for removing noise contained in an audio signal (M. Kaveh and A. J. Barabell, "The statistical performance of the MUSIC and the minimum-norm algorithms in resolving plane waves in noise", IEEE Trans. ASSP-34, 331-341, 1986). The MUSIC method is one of the subspace methods. It applies eigenvalue decomposition to the autocorrelation matrix of an observation signal, in which noise is superimposed on an original signal such as a speech signal, decomposes the observation signal into a "signal subspace" carrying the information of the original signal and a "noise subspace" carrying the noise information, and estimates the information of the original signal using the obtained noise subspace. Using the signal information estimated by the MUSIC method, the noise component contained in the observation signal can be removed effectively by the maximum likelihood method or by generalized inverse matrix calculation.
[0003]
[Problems to be solved by the invention]
However, the conventional noise removal method using the MUSIC method requires a great deal of computation time, because (1) the eigenvalue decomposition of the autocorrelation matrix is complicated and computationally very expensive, and (2) the maximum likelihood method and the generalized inverse matrix calculation involve matrix multiplication and matrix inversion and are therefore also computationally very expensive. For this reason, there is a problem that real-time processing is difficult.
[0004]
The present invention has been made in view of these points, and an object of the present invention is to provide a noise removal system and program that can reduce the amount of calculation and can perform noise removal processing in real time.
[0005]
[Means for Solving the Problems]
The noise removal system according to the present invention comprises: a frame cutout unit that receives an observation signal in which a noise signal is superimposed on an original signal and cuts out the observation signal for each frame of a predetermined length; a discrete Fourier transform unit that applies a discrete Fourier transform to each cut-out frame and extracts the amplitude spectrum and phase spectrum of the frame; a noise variance estimation unit that estimates the variance of the noise signal from the observation signal; a signal subspace dimension determining unit that determines the dimension of the signal subspace of the observation signal based on the estimated variance of the noise signal; a noise removing unit that extracts, from the amplitude spectrum of the frame, the amplitude spectrum components of the determined dimension, estimates the amplitude spectrum of the original signal from the extracted amplitude spectrum and the estimated variance of the noise signal, estimates the spectrum of the original signal from the estimated amplitude spectrum and the phase spectrum of the frame, and restores the original signal by applying an inverse Fourier transform to the estimated spectrum, thereby removing the noise signal superimposed on the original signal; and a frame connecting unit that connects the frames from which the noise signal has been removed to obtain a restored signal.
[0006]
The noise removal program according to the present invention is configured to cause a computer to execute the steps of: receiving an observation signal in which a noise signal is superimposed on an original signal and cutting out the observation signal for each frame of a predetermined length; applying a discrete Fourier transform to each cut-out frame and extracting the amplitude spectrum and phase spectrum of the frame; estimating the variance of the noise signal from the observation signal; determining the dimension of the signal subspace of the observation signal based on the estimated variance of the noise signal; extracting, from the amplitude spectrum of the frame, the amplitude spectrum components of the determined dimension, estimating the amplitude spectrum of the original signal from the extracted amplitude spectrum and the estimated variance of the noise signal, estimating the spectrum of the original signal from the estimated amplitude spectrum and the phase spectrum of the frame, and restoring the original signal by applying an inverse Fourier transform to the estimated spectrum, thereby removing the noise signal superimposed on the original signal; and connecting the frames from which the noise signal has been removed to obtain a restored signal.
[0007]
According to the present invention, as a noise removal method using signal information estimated by the MUSIC method, no matrix operations are performed to obtain the eigenvalues and eigenvectors of the autocorrelation matrix. Instead, attention is paid to the characteristics of the autocorrelation matrix of the observation signal, and its eigenvalues and eigenvectors are obtained using the discrete Fourier transform of the observation signal, so that noise removal processing in real time becomes possible. Moreover, since the noise removal is based on the MUSIC method, the noise removal capability is extremely high.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing a configuration of a sound signal noise removal system according to the present embodiment.
This noise removal system comprises: a frame cutout unit 1 that receives an observation signal y, in which noise n is superimposed on an original signal x such as a speech signal, and cuts out the observation signal y for each frame of a predetermined length; a DFT unit 2 that applies a discrete Fourier transform (DFT) to each cut-out frame to obtain the amplitude spectrum and phase spectrum of the observation signal y; a signal subspace dimension determining unit 3 that determines the dimension P of the signal subspace in order to identify, among the amplitude and phase spectra obtained by the DFT unit 2, the spectral components occupying the signal subspace; a noise removing unit 4 based on the MUSIC method that applies inverse DFT processing to the spectrum of the determined signal subspace to generate a frame from which noise has been removed; a frame connecting unit 5 that connects the inverse-DFT-processed frames and outputs a restored signal of the original signal x; and a noise variance estimation unit 6 that estimates the variance of the noise. The noise variance estimation unit 6 includes a variance calculation unit 11 that calculates the variance of each frame cut out by the frame cutout unit 1, a silent section detection unit 12 that detects silent sections of the observation signal from the output of the DFT unit 2 and the output of the variance calculation unit 11, and a noise variance update unit 13 that updates the noise variance value with the variance of a detected silent section. This system may also be realized by causing a computer to execute a noise removal program that includes the processing of each of the above units as steps.
[0009]
Next, the operation of the noise removal system configured as described above will be described.
FIG. 2 is a flowchart showing the operation of the present system.
First, the observation signal y, which contains the original speech signal and is input via a microphone or the like, is cut out one frame at a time by the frame cutout unit 1 using a Hanning window with a length of N samples (S1). As shown in FIG. 3(a), each frame is cut out so that it overlaps the previous frame (cutout section) by N/2 samples. Then, for each cut-out frame, the DFT unit 2 obtains the amplitude spectrum and phase spectrum by discrete Fourier transform (S2). The variance of the noise signal is estimated from the frames of the observation signal (S3), and based on this noise variance the signal subspace dimension determining unit 3 determines the dimension P of the signal subspace (S4). Noise removal by the MUSIC method is then performed based on the amplitude spectrum and phase spectrum of the signal subspace of the observation signal y (S5). Finally, as shown in FIG. 3(b), the noise-removed frames are connected with an overlap of N/2 samples each to obtain the restored signal of the original signal x.
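The frame cutout (S1) and the final frame connection can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of the patent: the names `hann_periodic`, `cut_frames`, and `connect_frames` are hypothetical, and a periodic Hanning window is assumed so that windows shifted by N/2 samples sum to one.

```python
import numpy as np

def hann_periodic(n_samples: int) -> np.ndarray:
    """Periodic Hanning window; copies shifted by n_samples / 2 sum to 1."""
    n = np.arange(n_samples)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * n / n_samples)

def cut_frames(y: np.ndarray, n_samples: int) -> np.ndarray:
    """Cut y into windowed frames of n_samples with a hop of n_samples // 2."""
    hop = n_samples // 2
    n_frames = (len(y) - n_samples) // hop + 1
    window = hann_periodic(n_samples)
    return np.stack(
        [window * y[i * hop : i * hop + n_samples] for i in range(n_frames)]
    )

def connect_frames(frames: np.ndarray) -> np.ndarray:
    """Overlap-add frames that were cut with a hop of half the frame length."""
    n_frames, n_samples = frames.shape
    hop = n_samples // 2
    out = np.zeros(hop * (n_frames - 1) + n_samples)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_samples] += frame
    return out

# Hypothetical test signal; no processing is applied between cut and connect.
rng = np.random.default_rng(0)
y = rng.standard_normal(1024)
frames = cut_frames(y, n_samples=256)
restored = connect_frames(frames)
```

With this window choice, samples covered by two frames are recovered exactly when the frames are left unprocessed; the first and last N/2 samples are covered by a single window only and remain attenuated.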
[0010]
Next, the relationship between the amplitude spectrum and phase spectrum of the observation signal y obtained in the DFT process (S2) and the estimated value of the frequency of the sine wave component constituting the original signal in the MUSIC method will be described in detail.
[0011]
[1] Autocorrelation matrix and eigenvalue decomposition of the observation signal y
Let the original signal of the frame cut out in step S1 be an N-sample discrete-time signal vector x. This x can be expressed as follows.
[0012]
[Equation 1]

T: transpose of a matrix or vector
[0013]
If the original signal x is composed of P sine wave components, the original signal x is expressed as
[0014]
[Equation 2]

[0015]
Here, S and a are given by
[0016]
[Equation 3]

[0017]
[Equation 4]

X(f_l): complex amplitude of the sine wave component of x at frequency f_l
[0018]
Here, the complex exponential vector s(f_l), which is a column element of S, is given by
[0019]
[Equation 5]

[0020]
In general, the number P of sine wave components constituting the original signal x and their frequencies f_l (l = 0, 1, ..., P−1) are unknown.
Now, suppose that a signal in which noise is added to the original signal x,
[0021]
[Equation 6]

[0022]
is observed. Here, n is a normally distributed noise signal with mean 0 and variance σ_n², and the original signal x and the noise n are uncorrelated with each other. In this case, the autocorrelation matrix R_yy of the observation signal y is defined by
[0023]
[Equation 7]

[0024]
Here, E[yy^H] represents the expected value of yy^H, and H represents the complex conjugate transpose of a matrix or vector. R_xx and R_nn are the autocorrelation matrices of x and n, respectively, and are given by
[0025]
[Equation 8]

[0026]
[Equation 9]

[0027]
The matrix A in Equation (8) is given by
[0028]
[Equation 10]

[0029]
As Equation (10) shows, A is a diagonal matrix.
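For reference, the definitions in paragraphs [0011] to [0029] suggest the following forms for Equations (1) to (10). This is a reconstruction from the surrounding text and not a copy of the published equations. In particular, the sign convention and normalization of s(f_l), and the use of an expectation on the diagonal of A, are assumptions.

```latex
\begin{align}
\mathbf{x} &= [x(0),\, x(1),\, \dots,\, x(N-1)]^{T} \tag{1} \\
\mathbf{x} &= S\mathbf{a} \tag{2} \\
S &= [\mathbf{s}(f_0),\, \mathbf{s}(f_1),\, \dots,\, \mathbf{s}(f_{P-1})] \tag{3} \\
\mathbf{a} &= [X(f_0),\, X(f_1),\, \dots,\, X(f_{P-1})]^{T} \tag{4} \\
\mathbf{s}(f_l) &= [1,\, e^{j2\pi f_l},\, \dots,\, e^{j2\pi f_l (N-1)}]^{T} \tag{5} \\
\mathbf{y} &= \mathbf{x} + \mathbf{n} \tag{6} \\
R_{yy} &= E[\mathbf{y}\mathbf{y}^{H}] = R_{xx} + R_{nn} \tag{7} \\
R_{xx} &= S A S^{H} \tag{8} \\
R_{nn} &= \sigma_n^{2} I \tag{9} \\
A &= \operatorname{diag}\bigl(E[|X(f_0)|^{2}],\, \dots,\, E[|X(f_{P-1})|^{2}]\bigr) \tag{10}
\end{align}
```

The split of R_yy in (7) relies on x and n being uncorrelated, and the diagonal form of (10) additionally assumes that the sine wave components are mutually uncorrelated.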
[0030]
In the MUSIC method, the signal subspace and the noise subspace are generally obtained by eigenvalue decomposition of the autocorrelation matrix of the signal, and the signal information is estimated using the obtained noise subspace. In general, however, the eigenvalue decomposition of a matrix is complicated and computationally very expensive. For this reason, real-time processing is not possible when the number of samples N is relatively large, as with a speech signal. The present invention therefore focuses on the characteristics of the autocorrelation matrix and proposes a new eigenvalue decomposition technique that allows real-time processing.
[0031]
Now, suppose that the original signal x is composed of P = N frequency components X(φ_k) (k = 0, 1, ..., N−1), and that the frequencies φ_k are given, as in the DFT (discrete Fourier transform), by
[0032]
[Equation 11]

[0033]
In this case, the elements of each row of S given by Equation (3) are placed at equal intervals on the points that equally divide the circumference of the unit circle in the complex plane, so that different rows of S are orthogonal to each other. Therefore, S has the properties
[0034]
[Equation 12]

[0035]
[Equation 13]

[0036]
From Equation (12) and Equation (13), the complex conjugate transpose S^H of S is expressed as
[0037]
[Equation 14]

[0038]
and substituting Equation (14) into Equation (8) gives
[0039]
[Equation 15]

[0040]
From Equation (10), A is a diagonal matrix, so NA, obtained by multiplying A by the scalar N, is also a diagonal matrix. Equation (15) therefore represents the eigenvalue decomposition of R_xx. Letting the eigenvalues of R_xx be λ_k (k = 0, 1, ..., N−1) and the corresponding eigenvectors be ν_k (k = 0, 1, ..., N−1), the eigenvalues λ_k and eigenvectors ν_k can be obtained easily from the following Equations (16) and (17).
[0041]
[Equation 16]

[0042]
[Equation 17]

[0043]
From the above, the following properties regarding the autocorrelation matrix, its eigenvalues and eigenvectors can be obtained.
(1) The eigenvalue of the autocorrelation matrix can be obtained directly from the DFT of the signal.
(2) Since the eigenvector of the autocorrelation matrix is given by Expression (17), it is not necessary to obtain it by calculation from the autocorrelation matrix.
(3) In order to calculate the eigenvalue and eigenvector of the autocorrelation matrix, it is not necessary to obtain the autocorrelation matrix from the signal.
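Properties (1) to (3) can be checked numerically with a small sketch. It assumes that Equation (11) is φ_k = k/N, that the eigenvalues take the form λ_k = N·E[|X(φ_k)|²] with eigenvectors proportional to s(φ_k), and that X(φ_k) equals the unnormalized FFT divided by N. The component powers and phases below are hypothetical values chosen for illustration.

```python
import numpy as np

N = 8
n = np.arange(N)
k = np.arange(N)
# Columns of S are s(phi_k) with phi_k = k / N (assumed form of Eq. (11)).
S = np.exp(2j * np.pi * np.outer(n, k) / N)

# Hypothetical component powers E[|X(phi_k)|^2]: the diagonal of A.
powers = np.array([4.0, 0.0, 2.5, 0.0, 0.0, 1.0, 0.0, 0.0])
A = np.diag(powers)
R_xx = S @ A @ S.conj().T

# Assumed eigenpairs: lambda_k = N * A[k, k], nu_k = s(phi_k) / sqrt(N).
eigenvalues = N * powers
eigenvectors = S / np.sqrt(N)

# The same eigenvalues read directly from one signal realisation via the FFT,
# without ever forming R_xx: |FFT_k|^2 / N.
phases = np.array([0.3, 0.0, 1.1, 0.0, 0.0, -2.0, 0.0, 0.0])
amplitudes = np.sqrt(powers) * np.exp(1j * phases)
x = S @ amplitudes
eigenvalues_from_fft = np.abs(np.fft.fft(x)) ** 2 / N
```

The last line is the point of the derivation: the eigenvalues come from a single FFT of the frame, so neither the autocorrelation matrix nor a general eigensolver is needed.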
[0044]
Furthermore, by setting the number of samples N to a power of 2, the eigenvalues can be calculated using an FFT (fast Fourier transform) as the DFT unit 2. Once the eigenvalues and eigenvectors of the autocorrelation matrix of the signal are obtained, the frequencies of the sine wave components constituting the signal can be estimated by the MUSIC method.
[0045]
[2] Noise removal by the MUSIC method
The MUSIC method is one of the subspace methods based on eigenvalue decomposition, and estimates the frequencies of the sine wave components constituting a target original signal buried in noise. Now, suppose that the noise-free original signal x and the noisy observation signal y are given by Equation (2) and Equation (6), respectively. Here, the number of sine wave components constituting x is P < N. In this case, the autocorrelation matrix R_xx of x is a non-negative definite Hermitian matrix of rank P, so the eigenvalues λ_k of R_xx are all real and are given by
[0046]
[Equation 18]

[0047]
On the other hand, from Equation (7) and Equation (9), the eigenvalues μ_k of the autocorrelation matrix R_yy of y are given by
[0048]
[Equation 19]

[0049]
so that μ_k becomes
[0050]
[Equation 20]

[0051]
Letting the eigenvectors corresponding to μ_k be ν_k (k = 0, 1, ..., N−1), they can be divided into two sets spanning mutually orthogonal subspaces. {ν_0, ν_1, ..., ν_{P−1}} spans a signal subspace equivalent to S, while {ν_P, ν_{P+1}, ..., ν_{N−1}} spans a noise subspace orthogonal to S. The MUSIC spectrum of y is defined, using the eigenvectors that span the noise subspace, by
[0052]
[Equation 21]

[0053]
Here, f is an arbitrary frequency. Equation (21) has a pole at f = f_l, so a sharp peak appears at that point. Therefore, the frequencies f_l of the sine wave components constituting x can be estimated by detecting the P peak points on the MUSIC spectrum. Furthermore, using the frequencies estimated from the MUSIC spectrum (hereinafter written in equations with "^" above f_l to indicate an estimated value), x can be restored from y by the maximum likelihood method or by generalized inverse matrix calculation. However, since this approach involves matrix multiplication and matrix inversion, it is computationally expensive and unsuitable for real-time processing. Therefore, in the present invention, x is restored by the method described below, which does not rely on these methods.
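The eigenvalue structure and the behaviour of the MUSIC spectrum described above can be illustrated with the conventional (matrix-based) computation that the invention sets out to avoid. The sketch assumes the usual form of the MUSIC spectrum, the reciprocal of the sum of |ν_k^H s(f)|² over the noise eigenvectors, and evaluates only that denominator to avoid dividing by a value near zero. The bin positions, powers, and noise variance are hypothetical.

```python
import numpy as np

def steering(f: float, n_samples: int) -> np.ndarray:
    """Complex exponential vector s(f), f in cycles per sample (assumed form)."""
    return np.exp(2j * np.pi * f * np.arange(n_samples))

N, P, sigma2 = 8, 3, 0.5
signal_bins = [1, 2, 5]                        # hypothetical frequencies k / N
S = np.stack([steering(b / N, N) for b in signal_bins], axis=1)   # N x P
A = np.diag([3.0, 1.5, 0.8])                   # hypothetical component powers
R_yy = S @ A @ S.conj().T + sigma2 * np.eye(N)

# Conventional route: explicit eigenvalue decomposition of R_yy.
mu, vectors = np.linalg.eigh(R_yy)             # eigenvalues in ascending order
order = np.argsort(mu)[::-1]                   # re-sort in descending order
mu, vectors = mu[order], vectors[:, order]
noise_vectors = vectors[:, P:]                 # basis of the noise subspace

def music_denominator(f: float) -> float:
    """Sum over the noise eigenvectors of |nu_k^H s(f)|^2."""
    s = steering(f, N)
    return float(np.sum(np.abs(noise_vectors.conj().T @ s) ** 2))

at_signal = music_denominator(2 / N)   # close to zero, so the spectrum peaks
at_noise = music_denominator(3 / N)    # s(f) lies wholly in the noise subspace
```

The P largest eigenvalues exceed σ_n² by N times the component power, and the remaining N − P eigenvalues equal σ_n², which is the separation used later to determine P.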
[0054]
As described above, x is composed of P (< N) frequency components. On the other hand, since n is normally distributed noise, its components are distributed over the entire frequency band. That is, among the frequency components of y, those at frequencies other than f_l contain only noise. From Equation (16) and Equation (19), the following relation holds between the frequency components of x and y.
[0055]
[Equation 22]

[0056]
Here, X(f_l) and Y(f_l) are the frequency components of x and y at f_l, respectively. Therefore, the estimated value of X(f_l) is given by
[0057]
[Equation 23]

[0058]
Since Equation (23) is derived from the eigenvalues of the autocorrelation matrix, the estimated value of X(f_l) obtained from it carries no phase information at all. Therefore, exploiting the property that human hearing is not sensitive to changes in phase, the phase spectrum obtained from the frequency components of y is used as the phase information of the estimated value of X(f_l). Let the phase information of the frequency component Y(f_l) of y (where f_l is the estimated value) be
[0059]
[Equation 24]

[0060]
Here, Re{·} and Im{·} represent the real part and imaginary part of a complex number, respectively. The estimated value of X(f_l) is then restored by
[0061]
[Equation 25]

[0062]
Therefore, the restored signal of x is obtained by the IDFT (inverse discrete Fourier transform) of X_MUSIC as
[0063]
[Equation 26]

[0064]
This yields the restored signal of the frame.
[0065]
Next, the meaning of the MUSIC spectrum is considered. From Equation (16) and Equation (20), the eigenvalues corresponding to the signal subspace correspond to the P components with the largest power among the frequency components of y. Similarly, the eigenvalues corresponding to the noise subspace correspond to the N − P components with the smallest power among the frequency components of y. Furthermore, from Equation (17), the eigenvector ν_k is given by the complex exponential vector s(φ_k) corresponding to a sine wave component constituting y. On the other hand, the elements of the complex exponential vector s(f) at an arbitrary frequency f are all placed at equal intervals on the points that equally divide the unit circle in the complex plane only when that frequency is given by Equation (11). Therefore, the denominator of the MUSIC spectrum expressed by Equation (21) is given as follows.
[0066]
[Expression 27]

[0067]
[Expression 28]

[0068]
[Expression 29]

[0069]
From these facts, the MUSIC spectrum expressed by Equation (21) has poles only at the frequencies of the P frequency components with the largest power among the frequency components of the observation signal y. This means that estimating the frequencies from the P peak points appearing in the MUSIC spectrum of the observation signal y is nothing other than detecting the frequencies of the P components with the largest power among the frequency components of y. Therefore, the same frequencies as those estimated by the MUSIC method can be estimated directly from the amplitude spectrum of the observation signal y, without calculating the MUSIC spectrum.
[0070]
FIG. 4 is a flowchart showing details of the noise removal processing (S5) by the MUSIC method, and the left side shows a signal obtained by each processing or a spectrum of the signal. First, when the amplitude spectrum and the phase spectrum are obtained from the observation signal y (discrete time signal) by the DFT unit 2, P components having high power are extracted from the amplitude spectrum (S11). Next, the amplitude spectrum of the original signal is obtained (S12), and the phase spectrum of the observation signal y is added to the amplitude spectrum to restore the phase information (S13). Then, by performing inverse DFT on this, a frame restoration signal is obtained (S14).
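Steps S11 to S14 can be sketched for a single frame as follows. This is a hedged illustration rather than the patented implementation: the function name `denoise_frame` is hypothetical, the eigenvalue estimate μ_k = |FFT_k|²/N and the amplitude estimate λ_k = μ_k − σ_n² are an assumed reading of Equations (22) and (23), and the scaling effect of the Hanning window on the noise variance is ignored.

```python
import numpy as np

def denoise_frame(frame: np.ndarray, noise_var: float) -> tuple[np.ndarray, int]:
    """Sketch of S11 to S14 for one frame; returns (restored frame, dimension P)."""
    n = len(frame)
    spectrum = np.fft.fft(frame)
    phase = np.angle(spectrum)                  # phase spectrum of y
    mu = np.abs(spectrum) ** 2 / n              # assumed eigenvalue estimates of R_yy
    keep = mu > noise_var                       # S11: components above the noise level
    lam = np.where(keep, mu - noise_var, 0.0)   # S12: assumed lambda_k = mu_k - sigma_n^2
    amplitude = np.sqrt(n * lam)                # back to the FFT amplitude scale
    restored = np.fft.ifft(amplitude * np.exp(1j * phase))   # S13 and S14
    return restored.real, int(np.count_nonzero(keep))

# Hypothetical test frame: two sinusoids on the DFT grid plus white noise.
rng = np.random.default_rng(1)
N, sigma = 256, 0.1
t = np.arange(N)
clean = np.sin(2 * np.pi * 8 * t / N) + 0.5 * np.sin(2 * np.pi * 20 * t / N)
noisy = clean + sigma * rng.standard_normal(N)
restored, P = denoise_frame(noisy, noise_var=sigma**2)

err_noisy = np.mean((noisy - clean) ** 2)
err_restored = np.mean((restored - clean) ** 2)
```

Because the threshold is the noise variance itself, a fair share of noise-only bins still pass it with reduced amplitude. The count returned as P is therefore usually larger than the number of true sinusoids, which matches the thresholding rule in the text rather than an ideal model order.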
[0071]
In the above description, the number of sine wave components constituting x, that is, the dimension P of the signal subspace, was assumed to be known. In general, however, P is unknown. A method for estimating P using the distribution of the eigenvalues μ_k of R_yy and the variance σ_n² of n is therefore described below.
[0072]
The eigenvalues μ_k of R_yy are distributed as shown in Equation (20). In practice, because the number of samples N is finite, {μ_P, μ_{P+1}, ..., μ_{N−1}} do not take the constant value σ_n² but are scattered over a certain range around σ_n² as their mean. Even so, compared with {μ_0, μ_1, ..., μ_{P−1}}, they are distributed within a range that can be regarded as sufficiently close to σ_n². Therefore, in the present invention, σ_n² is used as a threshold, and the number of eigenvalues of R_yy larger than σ_n² is taken as P. The signal subspace dimension determining unit 3 determines, as the dimension P of the signal subspace, the number of components having a peak larger than the estimated value of the variance σ_n² obtained by the noise variance estimation unit 6.
[0073]
However, since the observation signal contains both x and n, it is difficult in practice to estimate σ_n². Assuming that human speech contains silent sections between sentences and that the noise variance changes slowly over time, the variance of the noise contained in a speech section can be estimated from the immediately preceding silent section. In order to detect speech sections and silent sections from the observation signal y, the following Equation (30) is defined.
[0074]
[Equation 30]

[0075]
Here, σ_n² with "^" placed above it denotes the estimated value of σ_n². Since the eigenvalues μ_k of R_yy are given by Equation (20), D_VAD is 0 in a silent section and takes a larger value in a speech section. When D_VAD ≤ 0, the estimated value of σ_n² is updated with the variance of the signal in that section, so that noise removal and determination of the subspace dimension are performed effectively even for noise whose characteristics change slowly over time. In this system, the variance calculation unit 11 calculates the variance of the observation signal y, and the silent section detection unit 12 detects silent sections based on Equation (30). The noise variance update unit 13 then regards the variance in a silent section as the variance of the noise signal and uses it to update the variance. In the present embodiment, the observation signal y input from the microphone is assumed always to start with a silent section, and the variance of the first frame of the observation signal is used as the initial value of the estimated value of σ_n².
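The update logic of the noise variance estimation unit 6 can be sketched as below. The detector used here is only a stand-in: Equation (30) is not reproduced in this text, so D is assumed to be the frame variance minus a scaled copy of the current estimate. The function name `track_noise_variance` and the `margin` parameter are hypothetical.

```python
import numpy as np

def track_noise_variance(frames: np.ndarray, margin: float = 0.5) -> list[float]:
    """Return the noise-variance estimate in force after each frame.

    D = var(frame) - (1 + margin) * estimate is a stand-in for Eq. (30);
    `margin` is a hypothetical tolerance, not a parameter from the patent.
    """
    estimate = float(np.var(frames[0]))     # first frame assumed to be silence
    history = [estimate]
    for frame in frames[1:]:
        frame_var = float(np.var(frame))
        d_vad = frame_var - (1.0 + margin) * estimate
        if d_vad <= 0.0:                    # silent section: refresh the estimate
            estimate = frame_var
        history.append(estimate)            # speech section: keep the old value
    return history

# Hypothetical frames: slowly rising noise with one speech-like frame inserted.
rng = np.random.default_rng(2)
N = 256
base = rng.standard_normal(N)
base = (base - base.mean()) / base.std()    # unit-variance noise pattern
speech = 2.0 * np.sin(2 * np.pi * 10 * np.arange(N) / N)
frames = np.stack([0.10 * base, 0.11 * base, speech + 0.11 * base, 0.12 * base])
history = track_noise_variance(frames)
```

With `margin` set to zero the rule becomes the literal "update when D_VAD ≤ 0", under which the estimate can only decrease; the tolerance is added here so that slowly rising noise is also tracked, which is a design choice of this sketch.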
[0076]
In the above, an example in which the original signal is a speech signal has been described. However, the invention is also extremely useful for removing noise superimposed on the original signal when that signal is a heartbeat signal in an electrocardiogram or an electroencephalogram signal.
[0077]
[Effects of the invention]
As described above, according to the present invention, as a noise removal method using signal information estimated by the MUSIC method, the eigenvalues and eigenvectors of the autocorrelation matrix are obtained using the discrete Fourier transform of the observation signal, without performing matrix operations to obtain them, so that accurate noise removal can be performed in real time.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a noise removal system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the system.
FIG. 3 is a diagram for explaining frame cutout and frame connection in the system.
FIG. 4 is a flowchart showing details of noise removal processing in the system.
[Explanation of symbols]
1 ... Frame cutout unit
2 ... DFT (discrete Fourier transform) unit
3 ... Signal subspace dimension determination unit
4 ... Noise removal unit using the MUSIC method
5 ... Frame connection unit
6 ... Noise variance estimation unit
11 ... Variance calculation unit
12 ... Silence interval detection unit
13 ... Noise variance update unit

Claims

A noise removal system comprising:
a frame cutout unit that receives an observation signal in which a noise signal is superimposed on an original signal and cuts the observation signal into frames of a predetermined length;
a discrete Fourier transform unit that performs a discrete Fourier transform on each cut-out frame and extracts an amplitude spectrum and a phase spectrum of the frame;
a noise variance estimator that estimates a variance of the noise signal from the observation signal;
a signal subspace dimension determining unit that determines a dimension of a signal subspace of the observation signal based on the estimated variance of the noise signal;
a noise removing unit that extracts amplitude spectra of the determined dimension from the amplitude spectrum of the frame, estimates an amplitude spectrum of the original signal from the extracted amplitude spectra and the estimated variance of the noise signal, further estimates a spectrum of the original signal from the estimated amplitude spectrum of the original signal and the phase spectrum of the frame, and restores the original signal by applying an inverse Fourier transform to the estimated spectrum of the original signal, thereby removing the noise signal superimposed on the original signal; and
a frame connection unit that connects the frames from which the noise signal has been removed to obtain a restored signal.

The noise removal system according to claim 1, wherein the noise variance estimator comprises:
a variance calculation unit that calculates a variance of the observation signal;
a no-signal section detector that detects, from the amplitude spectrum of the observation signal and the estimated variance of the noise signal, a no-signal section of the observation signal that does not include the original signal; and
a noise variance updating unit that obtains the variance of the observation signal in the detected no-signal section and updates the estimated variance of the noise signal.

A noise removal program configured to cause a computer to execute the steps of:
receiving an observation signal in which a noise signal is superimposed on an original signal, and cutting the observation signal into frames of a predetermined length;
performing a discrete Fourier transform on each of the cut-out frames, and extracting an amplitude spectrum and a phase spectrum of the frame;
estimating a variance of the noise signal from the observation signal;
determining a dimension of a signal subspace of the observation signal based on the estimated variance of the noise signal;
extracting amplitude spectra of the determined dimension from the amplitude spectrum of the frame, estimating an amplitude spectrum of the original signal from the extracted amplitude spectra and the estimated variance of the noise signal, further estimating a spectrum of the original signal from the estimated amplitude spectrum of the original signal and the phase spectrum of the frame, and restoring the original signal by applying an inverse Fourier transform to the estimated spectrum of the original signal, thereby removing the noise signal superimposed on the original signal; and
connecting the frames from which the noise signal has been removed to obtain a restored signal.

The noise removal program according to claim 3, wherein the step of estimating the variance of the noise signal comprises:
calculating a variance of the observation signal;
detecting, from the amplitude spectrum of the observation signal and the estimated variance of the noise signal, a no-signal section of the observation signal that does not include the original signal; and
obtaining the variance of the observation signal in the detected no-signal section and updating the estimated variance of the noise signal.

The noise removal program according to claim 3 or 4, wherein the step of determining the dimension of the signal subspace of the observation signal determines, as the dimension of the signal subspace, the number of amplitude spectra having peaks larger than the estimated variance of the noise signal among the amplitude spectra obtained from the observation signal.
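
For illustration only, the per-frame processing recited in the claims can be sketched as below. Several details are assumptions made for this example and are not stated in the claims: the spectrum is scaled as |Y_k|²/M before comparison with the noise variance estimate, every bin above the estimate is counted as one signal-subspace component, and the amplitude of the original signal is estimated by subtracting the noise variance from each retained component. The function name `denoise_frame` is likewise hypothetical.

```python
import numpy as np


def denoise_frame(frame, sigma2_hat):
    """Process one frame; return (restored_frame, signal_subspace_dimension)."""
    frame = np.asarray(frame, dtype=float)
    m = len(frame)
    spectrum = np.fft.fft(frame)
    amplitude = np.abs(spectrum)   # amplitude spectrum of the frame
    phase = np.angle(spectrum)     # phase spectrum of the frame

    # Assumed scaling: compare |Y_k|^2 / M with the noise variance estimate.
    power = amplitude ** 2 / m
    keep = power > sigma2_hat
    dimension = int(np.count_nonzero(keep))

    # Assumed estimator: remove the noise variance from each kept component
    # and discard all other components.
    clean_power = np.where(keep, power - sigma2_hat, 0.0)
    clean_amplitude = np.sqrt(clean_power * m)

    # Recombine with the observed phase and return to the time domain.
    restored = np.fft.ifft(clean_amplitude * np.exp(1j * phase)).real
    return restored, dimension


# Example: a pure tone occupies two DFT bins (k and M - k) of a real frame.
n = np.arange(64)
tone = np.sin(2 * np.pi * 8 * n / 64)
restored, dimension = denoise_frame(tone, sigma2_hat=0.01)
print(dimension)
```

In a complete system this function would be applied to each cut-out frame, with `sigma2_hat` supplied by the noise variance estimation, and the restored frames would then be connected to form the output signal.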