JP4183338B2

JP4183338B2 - Noise reduction system

Info

Publication number: JP4183338B2
Application number: JP18305999A
Authority: JP
Inventors: 孝一中田
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 1999-06-29
Filing date: 1999-06-29
Publication date: 2008-11-19
Anticipated expiration: 2019-06-29
Also published as: JP2001014000A

Description

【０００１】
【発明の属する技術分野】
本発明は音声認識時における話者音声信号のＳＮ比を改善するノイズリダクションシステムに係わり、特に、AMNOR(Adaptive Microphone-array for Noise Reduction)方式に用いて好適なノイズリダクションシステムに関する。
【０００２】
【従来の技術】
現在の音声認識システムは、15dB以上のSN比が確保されている場合、約95％の認識率が実現できる程の技術レベルまでに達している。しかし、周囲に存在するノイズによりSN比が低下すると、それに伴って認識率が急激に低下する性質も有している。図５はSN比と認識性能との関係をいくつかの種類のマイクロホン（無指向性、単一指向性、狭指向性等)について評価したもので、各マイクロホンのSN比と認識率はおおむねＳ字特性を示す帯100の中に包含されている。この図５から明らかなように、認識率はSN比の低下により急激に低下し、SN比が0dBの環境下において約50％にまで低下してしまう。
【０００３】
そのため、自動車が発生するノイズ（エンジン音・ロードノイズ・パターンノイズ・風切り音など）が存在する自動車車室内において、上記のような認識性能の劣化は避けられず、音声認識システムを車載化する上で大きな問題の一つとなっている。前記したような事情から、周囲に存在するノイズの影響を少なくし、高いSN比で音声を受音するための方式が種々提案されており、AMNOR方式はその一例である。
【０００４】
AMNOR方式のノイズリダクションシステムでは、複数のマイクロホンを設け、目標信号を各マイクロホン毎に相当量遅延し、各遅延信号を対応するマイクロホン出力信号に加算して参照信号とする。適応信号処理部は、学習時、各参照信号と誤差信号を入力され、誤差信号のパワーが最小となるように適応信号処理を行って適応フィルタの係数を更新し、音声認識時などの非学習時、適応フィルタの係数更新を停止し、学習時の係数を適応フィルタに設定したままにし、目標応答設定部から出力する信号と適応フィルタから出力する信号の差を音声信号としえ音声認識部に出力する。
【０００５】
図６は一般的な２つのマイクを用いたAMNOR方式のノイズリダクションシステムの構成例である。図中、１１，１２は第１、第２のマイクロホン、１３，１４はアンプ、１５はランダムノイズたとえばホワイトノイズを目標信号として発生するシグナルジェネレータ（ＳＧ）、１６はゲイン可変アンプ、１７，１８は話者口元から各マイクロホン迄の信号遅延時間ｄ₁，ｄ₂に相当する遅延を目標信号に付加する遅延部、１９，２０は各遅延部から出力する信号をそれぞれ各マイクロホンの出力信号に加算する加算部である。
【０００６】
２１は２入力／１出力の適応信号処理部であり、第１、第２の２つの適応信号処理部２１ａ，２１ｂ及び各適応信号処理部２１ａ，２１ｂの出力を加算して出力する加算器２１ｃを有している。各適応信号処理部２１ａ，２１ｂは図示しないがLMS演算部と、FIR型ディジタルフィルタ構成の適応フィルタを有している。第１の信号処理部２１ａは、学習時、加算器１９の出力信号を参照信号とし、エラー信号ｅのパワーが最小となるように適応フィルタ係数Ｗ₁を更新し、音声認識時、適応フィルタの係数更新を停止し、学習時に得られている係数Ｗ₁を適応フィルタに設定したままにして入力信号にフィルタリング処理を施して出力する。第２の信号処理部２１ｂは、学習時、加算器２０の出力信号を参照信号とし、エラー信号ｅのパワーが最小となるように適応フィルタ係数Ｗ₂を更新し、音声認識時、適応フィルタの係数更新を停止し、学習時に得られている係数Ｗ₂を適応フィルタに設定したままにして入力信号にフィルタリング処理を施して出力する。加算器２１ｃは各適応フィルタ出力を加算して出力する。
【０００７】
２２はシグナルジェネレータ１５から出力するノイズ信号を目標信号として入力される目標応答設定部であり、音響系の逆特性を精度よく近似するためのものである。適応フィルタのタップ長の半分の信号遅延時間をｄ′、遅延時間ｄ₁，ｄ₂の平均値をｄ″とするとき、目標応答設定部２２は遅延時間ｄ（＝ｄ′＋ｄ″）の遅延特性を有し、オーディオ周波数帯域でフラットな特性（ゲイン１の特性）を有する。２３は減算部であり、目標応答設定部２２から出力する目標応答より適応信号処理部２１の出力信号を減算して誤差信号ｅを出力する。
【０００８】
学習時、マイクロホン１１、１２には自動車ＣＲが発生するノイズXn₁(z)，Xn₂(z)のみが入力する。加算器１９、２０は目標信号としてのランダムノイズと各マイクロホン出力を合成し、適応信号処理部２１は加算器１９，２０の出力信号を参照信号とし、エラー信号ｅのパワーが最小となるように適応信号処理を行って適応フィルタの係数Ｗ₁，Ｗ₂を更新する。
音声認識時、適応信号処理部２１はフィルタ係数の更新をせず、学習時に得られた係数Ｗ₁，Ｗ₂を各適応フィルタに設定したままにし、これら適応フィルタの出力信号を合成して減算部２３に入力する。減算部２３は目標応答設定部２２から出力する目標応答より適応信号処理部２１の出力信号を減算し、差信号を音声信号として音声認識部に入力する。かかるAMNOR方式のノイズリダクションシステムによれば、音声認識時にノイズは最小になり、しかも、大きな話者音声出力が得られＳＮ比を改善できる。
【０００９】
【発明が解決しようとする課題】
(1) かかるAMNOR方式は音響伝達特性がそれほど複雑でなく、話者口元から各マイクへの音響伝達特性の差分が遅延のみであらわせる環境、例えば、比較的広い部屋などで使用する場合には有効である。しかし、車室内のように非常に複雑な音響伝達特性を持つ環境では、伝達特性の差分は遅延のみで表現できず不十分である。
(2)車室内において、ノイズ源が多数存在し、各マイクで受信するノイズは相関が低いため、マイク間距離を短くせざるを得ない。このため、話者―マイク間距離は短距離に限定されてしまい、各マイクへの音声の到来の時間差は微少であり、必ずしも遅延分を精度よく設定できず、遅延のみでは不十分である。たとえば、サンプリング周波数fsが11.025(kHｚ)のとき、1サンプルは、
(1/11.025×10³)(sec)×340m/sec)＝3.08(cm)に相当するため、距離差3(cm)以上で、かつ3(cm)単位に相当する遅延しか設定できない。
(3) AMNOR方式のマイクロホンアレイシステムにおいては、複数個のマイクを使用するが、理想的には各マイクの特性は同一である必要があるが、現実的にはマイクロホンの特性は同一でなく、このため意図するとおりのノイズ低減効果を期待できない。尚、仮に同一の特性を有するマイクロホンを用意するとすればコストがかかることになる。
【００１０】
以上から、本発明は、ノイズ削減効果を向上できるノイズリダクションシステムを提供することである。
本発明の別の目的は、各マイクロホンの特性が同一でなくてもノイズ削減効果を向上できるノイズリダクションシステムを提供することである。
【００１１】
【課題を解決するための手段】
上記課題は本発明によれば、▲１▼複数のマイクロホン、▲２▼ランダムノイズ信号を発生するシグナルジェネレータ、▲３▼話者口元から各マイクロホンまでの伝達特性を模擬する伝達回路、▲４▼各伝達回路を介して出力するランダムノイズ信号をそれぞれ各マイクロホンの出力信号に合成する合成部、▲５▼学習時、前記ランダムノイズ信号を目標信号、各合成部出力をそれぞれ参照信号として適応信号処理を行って適応フィルタの係数を更新し、非学習時にフィルタ係数の更新を停止する適応信号処理部、▲６▼目標信号に所定の遅延を付与する目標応答設定部、▲７▼適応フィルタの出力信号と目標応答設定部の出力信号との差を求め、非学習時、該差信号を音声信号として出力する減算部を備えたノイズリダクションシステムにより達成される。本発明によれば、従来、遅延特性のみで話者口元からマイクロホンまでの伝達特性を模擬していたものを、話者口元からマイクロホンまでの実際の伝達特性で模擬するため、ノイズ削減効果を向上することができる
【００１２】
又、上記目的は、本発明によれば、話者口元から各マイクロホン出力端までの伝達特性を測定する伝達特性測定手段を伝達回路に設け、システムを伝達特性測定モード、学習モード、非学習モードに切り替えて、伝達特性の測定、フィルタ係数の学習、音声信号の出力を行うことにより達成される。このようにすれば、話者口元から各マイクロホンの出力端までの伝達特性を測定して伝達回路に設定でき、マイクロホンの特性が同一でなくてもノイズ削減効果を向上できる。
【００１３】
【発明の実施の形態】
図１は本発明の第１実施例の構成図である。図中、５１，５２は第１、第２のマイクロホン、５３，５４はアンプ、５５はランダムノイズたとえばホワイトノイズを目標信号として発生するシグナルジェネレータ（ＳＧ）、５６はゲイン可変アンプ、５７，５８は話者口元から各マイクロホンの出力端迄の伝達特性（伝搬特性）CS1′、CS2′を模擬し、該伝搬特性を目標信号に付与する伝達回路、５９，６０は各伝達回路から出力する信号をそれぞれ各マイクロホンの出力信号に加算する加算部である。
【００１４】
６１は２入力／１出力の適応信号処理部であり、第１、第２の２つの適応信号処理部６１ａ，６１ｂ及び各適応信号処理部の出力を加算して出力する加算器６１ｃを有している。各適応信号処理部６１ａ，６１ｂは図示しないがLMS演算部と、FIR型ディジタルフィルタ構成の適応フィルタを有している。
第１の信号処理部６１ａは、学習時、加算器５９の出力信号を参照信号とし、エラー信号ｅのパワーが最小となるように適応フィルタ係数Ｗ₁を更新し、非学習時たとえば音声認識時、適応フィルタの係数更新を停止し、学習時に得られている係数Ｗ₁を適応フィルタに設定したままにして入力信号にフィルタリング処理を施して出力する。第２の信号処理部２１ｂは、学習時、加算器６０の出力信号を参照信号とし、エラー信号ｅのパワーが最小となるように適応フィルタ係数Ｗ₂を更新し、音声認識時、適応フィルタの係数更新を停止し、学習時に得られている係数Ｗ₂を適応フィルタに設定したままにして入力信号にフィルタリング処理を施して出力する。加算器６１ｃは各適応フィルタ出力を合成して出力する。
【００１５】
６２はシグナルジェネレータ５５から出力するノイズ信号を目標信号として入力される目標応答設定部であり、音響系の逆特性を精度よく近似するためのものである。適応フィルタのタップ長の半分の信号遅延時間をｄとすれば、目標応答設定部４は該遅延時間ｄの遅延特性を有し、オーディオ周波数帯域でフラットな特性（ゲイン１の特性）を有する。６３は減算部であり、目標応答設定部６２から出力する目標応答より適応信号処理部７１の出力信号を減算して誤差信号ｅを出力する。
【００１６】
学習時、マイクロホン５１、５２には自動車ＣＲが発生するノイズXn₁(z)，Xn₂(z)のみが入力する。伝達回路５７、５８は、目標信号としてのランダムノイズに対し、話者口元から各マイクロホン出力端迄の伝達特性CS1′、CS2′付与する。加算器５９、６０は伝達回路５７，５８の出力と各マイクロホン出力を合成し、適応信号処理部６１は加算器５９，６０の出力信号を参照信号とし、エラー信号ｅのパワーが最小となるように適応信号処理を行って適応フィルタの係数Ｗ₁，Ｗ₂を更新する。
音声認識時、適応信号処理部６１はフィルタ係数の更新をせず、学習時に得られた係数Ｗ₁，Ｗ₂を各適応フィルタに設定したままにし、これら適応フィルタの出力信号を合成して減算部６３に入力する。減算部６３は目標応答設定部６２から出力する目標応答より適応信号処理部６１の出力信号を減算し、差信号を音声信号として音声認識部に入力する。
以上のように、伝達特性CS1′，CS2′を目標信号に付与した信号と各マイクロホンの出力信号とを加算した信号を参照信号として適応信号処理するから、学習時にノイズ出力のパワーが最小となるように正確に適応フィルタ係数を決定でき、この結果、音声認識時にノイズを低減でき、ＳＮ比の大きな音声信号を出力できる。
【００１７】
図２は話者ＤＲの口元から各マイクロホン出力端迄の伝達特性CS１′、CS2′を測定する測定装置の構成図であり、図１と同一部分には同一符号を付している。図中、７０は話者口元付近に設けたスピーカであり、シグナルジェネレータ５５から出力するホワイトノイズをマイクロホン５１，５２に向けて出力する。
７１はホワイトノイズを参照信号、マイクロホン５１の出力を目標信号とし、エラーｅ₁のパワーが最小となるように適応信号処理を行って適応フィルタの係数Ｗcs₁を更新する適応信号処理部、７２はマイクロホン５１の出力と適応信号処理部７１の出力との差（エラー）ｅ₁を出力する減算部である。
７３はホワイトノイズを参照信号、マイクロホン５２の出力を目標信号とし、エラーｅ₂のパワーが最小となるように適応信号処理を行って適応フィルタの係数Ｗcs₂を更新する適応信号処理部、７４はマイクロホン５２の出力と適応信号処理部７３の出力との差（エラー）ｅ₂を出力する減算部である。
【００１８】
適応信号処理部７１、７３において、継続して適応信号処理を行って適応フィルタ（図示せず）の係数Ｗcs₁,Ｗcs₂を更新すると、該係数は一定値に収束する。係数値Ｗcs₁が一定値に収束したとき、適応信号処理部７１の適応フィルタの特性は、話者口元からマイクロホン５１の出力端迄の伝達関数CS1′を示す。又、係数値Ｗcs₂が一定値に収束したとき、適応信号処理部７３の適応フィルタの特性は、話者口元からマイクロホン５２の出力端迄の伝達関数CS2′を示す。
従って、図１の伝達回路５７、５８をＦＩＲ型ディジタルフィルタで構成し、これらフィルタの係数としてＷcs₁,Ｗcs₂を設定すれば、伝達回路５７，５８により話者口元からマイクロホン５１，５２の出力端迄の伝達関数CS1′,CS2′を模擬できる。
【００１９】
以上より、車両毎にＷcs₁,Ｗcs₂を決定するようにすれば、マイクの特性を含めて話者口元からマイクロホン出力端までの伝達特性を測定できる。しかし、車両毎にＷcs₁,Ｗcs₂を決定するのは煩雑である。そこで、音声認識装置を搭載する、車種およびマイク位置が特定できる場合には、あらかじめ、１台の車両について係数Ｗcs₁,Ｗcs₂の値を確定し、それを図１の伝達回路５７、５８に設定する。しかし、この方法はマイク特性の補正効果を有しない。
【００２０】
図３は話者口元から各マイクロホンの出力端迄の伝達関数を測定する機能を備えたAMNOR方式の別のノイズリダクションシステムの構成図であり、図１と同一部分には同一符号を付している。この図３のシステムはスイッチの切り替えにより、伝達関数測定時には図２に示す構成になり、学習／音声認識時には図１の構成になる。図３において、図１と異なる点は、
(1) 切替スイッチSW1〜SW4を設けている点、
(2) 話者口元近傍に伝達特性測定用のスピーカ８０を設けている点、
(3) 伝達回路５７，５８を適応信号処理が可能な構成にし、話者口元から各マイクロホン出力端迄の伝達関数を測定できるようにした点、
である。
【００２１】
伝達回路５７は、適応信号処理部５７ａと減算部５７ｂで構成されている。適応信号処理部５７ａは、ホワイトノイズを参照信号、マイクロホン５１の出力を目標信号とし、エラーｅ₁のパワーが最小となるように適応信号処理を行って適応フィルタの係数Ｗcs₁を更新し、減算部５７ｂはマイクロホン５１の出力と適応信号処理部５７ａの出力との差（エラー）ｅ₁を出力する。
又、伝達回路５８は、適応信号処理部５８ａと減算部５８ｂで構成されている。適応信号処理部５８ａは、ホワイトノイズを参照信号、マイクロホン５２の出力を目標信号とし、エラーｅ₂のパワーが最小となるように適応信号処理を行って適応フィルタの係数Ｗcs₂を更新し、減算部５８ｂはマイクロホン５２の出力と適応信号処理部５８ａの出力との差（エラー）ｅ₂を出力する。
【００２２】
話者口元から各マイクロホンの出力端迄の伝達関数を測定するには、スイッチSW1,SW2,SW3をオンし、スイッチSW4をオフし、システムを図２に示す構成にする。しかる後、図２で説明した方法により、適応信号処理部５７ａ，５８ａの適応フィルタ（図示せず）の係数が一定値Ｗcs₁，Ｗcs₂に収束すれば、各適応フィルタは話者口元からマイクロホン５１，５２の出力端迄の伝達関数CS1′，CS2′を模擬することになる。尚、測定完了によりスピーカ８０を除去する。
学習／音声認識するには、スイッチSW1,SW2,SW3をオフし、スイッチSW4をオンし、システムを図１に示す構成にする。しかる後、図１で説明した方法により、学習、音声認識を行う。
【００２３】
図４は図３のノイズリダクションシステムの全体の制御フローである。
最初に、話者口元から各マイクロホンの出力端迄の伝達関数を同定するモードであるかチェックし（ステップ１０１）、同定モードであれば、スイッチSW1,SW2,SW3をオン、スイッチSW4をオフし、システムを図２に示す構成にし、適応信号処理により係数Ｗcs₁，Ｗcs₂を更新する（ステップ１０２）。ついで、係数Ｗcs₁，Ｗcs₂が一定値に収束したかチェックし（ステップ１０３）、一定値に収束するまでステップ１０２の更新処理を行う。係数Ｗcs₁，Ｗcs₂が一定値に収束すれば、適応信号処理部５７ａ，５８ａの各適応フィルタは、話者口元からマイクロホン５１，５２の出力端迄の伝達関数CS1′，CS2′を模擬することになる。一定値に収束すれば、スイッチSW1,SW2,SW3をオフし、スイッチSW4をオンし、システムを図１に示す構成にする。
【００２４】
ついで、音声認識開始を指示するトークスイッチが操作されて音声認識状態になったかチェックする（ステップ１０４）。トークスイッチがオン操作されなければ、学習モードであるから、２入力／１出力の適応信号処理部６１は、加算器５９，６０の出力信号を参照信号としてエラー信号ｅのパワーが最小となるように適応信号処理を行って適応フィルタの係数Ｗ₁，Ｗ₂を更新する（ステップ１０５）。以後、トークスイッチがオン操作されるまで、係数Ｗ₁，Ｗ₂の更新動作を行う。これにより、係数Ｗ₁，Ｗ₂は一定値に収束する。
ステップ１０４において、トークスイッチがオン操作されると音声認識モードになり、係数Ｗ₁，Ｗ₂の更新動作を終了する（ステップ１０６）。音声認識時、適応信号処理部６１はフィルタ係数の更新をせず、学習時に決定したフィルタ係数Ｗ₁，Ｗ₂を適応フィルタに設定したままにし、減算部６３は目標応答設定部６２から出力する目標信号より適応信号処理部６１の出力信号を減算した信号、すなわち、ノイズが低減し、ＳＮ比が向上した音声信号を図示しない音声認識部に出力する。
【００２５】
以後、トークスイッチがオフ操作されて音声認識が解除される迄、ステップ１０６の動作を行い、トークスイッチがオフ操作されると（ステップ１０７）、学習モードに戻りステップ１０５以降の係数Ｗ₁，Ｗ₂の更新動作が再開する。
図３のノイズリダクションシステムによれば、車の出荷前に生産ライン等にて、あるいは、販売店等で伝達特性CS1′,CS2′を同定する作業を１回行うだけで良く、しかも、マイクロホンの特性を含めて伝達特性CS1′,CS2′を同定できるためマイクロホンに特性の不揃があっても何ら問題を生じない。又、予め、車種を特定できない場合等不確定要素を含むような場合であっても、伝達特性CS1′,CS2′を同定して設定することができる。
以上では、本発明システムから出力する音声信号を非学習時に音声認識装置に入力する場合について説明したが、かかる場合に限らず、非学習時に音声信号をハンズフリー電話器に入力したり、その他の機器に入力する場合に応用できるものである。
以上、本発明を実施例により説明したが、本発明は請求の範囲に記載した本発明の主旨に従い種々の変形が可能であり、本発明はこれらを排除するものではない。
【００２６】
【発明の効果】
以上本発明によれば、従来、遅延特性のみで口元からマイクロホンまでの伝達特性を模擬していたものを、実際の口元からマイクロホンまでの伝達特性で模擬するため、ノイズ削減効果を向上することができる
又、本発明によれば、一般的なAMNOR方式では対応できない、車室内ノイズの除去が可能となる。
又、本発明によれば、マイク−話者間の伝達特性を模擬する機構をノイズリダクション装置に組み込むことで、マイク特性のバラツキ補正が可能になる。
又、本発明によれば、ノイズリダクションのための適応処理をリアルタイムに行うことができる。
【図面の簡単な説明】
【図１】本発明のノイズリダクションシステムの構成図である。
【図２】本発明の話者口元から各マイクロホン出力端までの伝達特性測定装置の構成図である。
【図３】伝達特性測定機能を備えたノイズリダクションシステムの構成図である。
【図４】本発明の全体の制御フローである。
【図５】ＳＮ比と認識率の関係図である。
【図６】従来のAMNOR方式のノイズリダクションシステムである。
【符号の説明】
５１，５２・・第１、第２のマイクロホン
５５・・ランダムノイズを目標信号として発生するシグナルジェネレータ
５７，５８・・伝達特性を模擬する伝達回路
５９，６０・・加算部
６１・・２入力／１出力の適応信号処理部
６１ａ，６１ｂ・・第１、第２の適応信号処理部
６２・・目標応答設定部
６３・・減算部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a noise reduction system that improves the S/N ratio of a speaker's voice signal during speech recognition, and more particularly to a noise reduction system suitable for use with the AMNOR (Adaptive Microphone-array for Noise Reduction) method.
[0002]
[Prior art]
Current speech recognition systems have reached a technical level at which a recognition rate of about 95% can be achieved when an S/N ratio of 15 dB or more is secured. However, when the S/N ratio falls because of ambient noise, the recognition rate drops sharply with it. FIG. 5 shows the relationship between S/N ratio and recognition performance evaluated for several types of microphone (omnidirectional, unidirectional, narrow-directivity, and so on); the S/N ratio versus recognition rate of each microphone falls roughly within band 100, which has an S-shaped characteristic. As is apparent from FIG. 5, the recognition rate drops sharply as the S/N ratio decreases, falling to about 50% in an environment where the S/N ratio is 0 dB.
[0003]
For this reason, such deterioration in recognition performance is unavoidable in a vehicle cabin, where noise generated by the automobile (engine noise, road noise, pattern noise, wind noise, and so on) is present, and it is one of the major problems in installing a speech recognition system in a vehicle. Under these circumstances, various methods have been proposed for picking up speech at a high S/N ratio while reducing the influence of ambient noise, and the AMNOR method is one example.
[0004]
In an AMNOR noise reduction system, a plurality of microphones are provided, the target signal is delayed by a corresponding amount for each microphone, and each delayed signal is added to the corresponding microphone output signal to form a reference signal. During learning, the adaptive signal processing unit receives the reference signals and the error signal, performs adaptive signal processing so that the power of the error signal is minimized, and updates the coefficients of the adaptive filters. During non-learning periods such as speech recognition, it stops updating the coefficients, leaves the coefficients obtained during learning set in the adaptive filters, and outputs the difference between the signal from the target response setting unit and the signal from the adaptive filters to the speech recognition unit as the speech signal.
[0005]
FIG. 6 is a configuration example of a typical AMNOR noise reduction system using two microphones. In the figure, 11 and 12 are first and second microphones; 13 and 14 are amplifiers; 15 is a signal generator (SG) that generates random noise, for example white noise, as the target signal; 16 is a variable-gain amplifier; 17 and 18 are delay units that add to the target signal delays corresponding to the signal delay times d₁ and d₂ from the speaker's mouth to the respective microphones; and 19 and 20 are adders that add the signals output from the respective delay units to the output signals of the respective microphones.
[0006]
Reference numeral 21 denotes a 2-input/1-output adaptive signal processing unit, which has first and second adaptive signal processing units 21a and 21b and an adder 21c that adds the outputs of the adaptive signal processing units 21a and 21b and outputs the result. Although not shown, each of the adaptive signal processing units 21a and 21b has an LMS calculation unit and an adaptive filter configured as an FIR digital filter. During learning, the first signal processing unit 21a uses the output signal of the adder 19 as a reference signal and updates the adaptive filter coefficients W₁ so that the power of the error signal e is minimized; during speech recognition, it stops updating the coefficients, leaves the coefficients W₁ obtained during learning set in the adaptive filter, filters the input signal, and outputs the result. Likewise, during learning, the second signal processing unit 21b uses the output signal of the adder 20 as a reference signal and updates the adaptive filter coefficients W₂ so that the power of the error signal e is minimized; during speech recognition, it stops updating the coefficients, leaves the coefficients W₂ obtained during learning set in the adaptive filter, filters the input signal, and outputs the result. The adder 21c adds the adaptive filter outputs and outputs the sum.
[0007]
Reference numeral 22 denotes a target response setting unit that receives the noise signal output from the signal generator 15 as the target signal; it serves to approximate the inverse characteristic of the acoustic system accurately. Where d′ is the signal delay time corresponding to half the tap length of the adaptive filter and d″ is the average of the delay times d₁ and d₂, the target response setting unit 22 has a delay characteristic of delay time d (= d′ + d″) and a flat characteristic (gain of 1) over the audio frequency band. Reference numeral 23 denotes a subtracting unit, which subtracts the output signal of the adaptive signal processing unit 21 from the target response output by the target response setting unit 22 and outputs the error signal e.
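The delay-only reference model of the conventional system described above can be sketched as follows. This is a minimal illustration that assumes whole-sample delays; the helper names (`delayed`, `reference_signal`, `target_response_delay`) and all numeric values are hypothetical and do not come from the specification.

```python
def delayed(signal, num_samples):
    """Delay a sequence by a whole number of samples, zero-padding the start."""
    return ([0.0] * num_samples + list(signal))[:len(signal)]


def reference_signal(target, mic_output, mouth_to_mic_delay):
    """Adder 19 or 20: delayed target signal plus the microphone output."""
    shifted = delayed(target, mouth_to_mic_delay)
    return [t + m for t, m in zip(shifted, mic_output)]


def target_response_delay(num_taps, d1, d2):
    """d = d' + d'': half the tap length plus the mean of d1 and d2, in samples."""
    return num_taps // 2 + round((d1 + d2) / 2)


# Example values, assumed purely for illustration
target = [1.0, 2.0, 3.0, 4.0]      # signal generator output
mic1 = [0.1, 0.1, 0.1, 0.1]        # noise picked up by microphone 11
ref1 = reference_signal(target, mic1, mouth_to_mic_delay=1)
d = target_response_delay(num_taps=64, d1=1, d2=3)
print(ref1, d)
```

Because the delays are whole samples, this model can only shift the target signal; it cannot represent any frequency-dependent difference between the two acoustic paths.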
[0008]
During learning, only the noises Xn₁(z) and Xn₂(z) generated by the automobile CR are input to the microphones 11 and 12. The adders 19 and 20 combine the random noise serving as the target signal with the respective microphone outputs, and the adaptive signal processing unit 21 takes the output signals of the adders 19 and 20 as reference signals and performs adaptive signal processing so that the power of the error signal e is minimized, updating the adaptive filter coefficients W₁ and W₂.
During speech recognition, the adaptive signal processing unit 21 does not update the filter coefficients; it leaves the coefficients W₁ and W₂ obtained during learning set in the adaptive filters, combines the output signals of these adaptive filters, and inputs the result to the subtracting unit 23. The subtracting unit 23 subtracts the output signal of the adaptive signal processing unit 21 from the target response output by the target response setting unit 22 and inputs the difference signal to the speech recognition unit as the speech signal. With such an AMNOR noise reduction system, noise is minimized during speech recognition while a large speaker voice output is obtained, so the S/N ratio can be improved.
[0009]
[Problems to be solved by the invention]
(1) The AMNOR method is effective when used in an environment where the acoustic transfer characteristics are not very complicated and the difference between the acoustic transfer characteristics from the speaker's mouth to the individual microphones can be expressed by a delay alone, for example in a relatively large room. However, in an environment with very complicated acoustic transfer characteristics, such as a vehicle cabin, the difference in transfer characteristics cannot be expressed by a delay alone, so a delay is insufficient.
(2) In a vehicle cabin there are many noise sources and the noise received at each microphone has low correlation, so the distance between the microphones has to be made short. As a result, the speaker-to-microphone distance is also limited to a short distance, the difference in arrival time of the voice at the microphones is very small, the delay cannot always be set accurately, and a delay alone is insufficient. For example, when the sampling frequency fs is 11.025 kHz, one sample corresponds to
(1 / (11.025 × 10³)) s × 340 m/s = 3.08 cm, so only delays corresponding to a distance difference of 3 cm or more, in units of 3 cm, can be set.
(3) An AMNOR microphone array system uses a plurality of microphones, and ideally the characteristics of the microphones should be identical. In practice they are not, so the intended noise reduction effect cannot be expected. Preparing microphones with identical characteristics would be costly.
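The arithmetic in item (2) can be checked with a short sketch. The function names are hypothetical; the sampling frequency (11.025 kHz) and speed of sound (340 m/s) are the values given in the text, while the 1 cm and 5 cm path differences are made-up examples.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # value used in the text


def path_difference_per_sample_cm(fs_hz, c=SPEED_OF_SOUND_M_PER_S):
    """Distance sound travels during one sampling period, in centimetres."""
    return c / fs_hz * 100.0


def quantized_delay_samples(path_difference_cm, fs_hz):
    """Nearest whole-sample delay for a given path-length difference."""
    return round(path_difference_cm / path_difference_per_sample_cm(fs_hz))


step_cm = path_difference_per_sample_cm(11_025.0)
print(f"{step_cm:.2f} cm per sample")          # 3.08 cm per sample
print(quantized_delay_samples(1.0, 11_025.0))  # 0: a 1 cm difference is lost
print(quantized_delay_samples(5.0, 11_025.0))  # 2: 5 cm is treated as about 6 cm
```

With closely spaced microphones the true path difference is often below one sample, which is why a whole-sample delay cannot represent it.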
[0010]
In view of the above, an object of the present invention is to provide a noise reduction system that can improve the noise reduction effect.
Another object of the present invention is to provide a noise reduction system that can improve the noise reduction effect even if the characteristics of the microphones are not the same.
[0011]
[Means for Solving the Problems]
According to the present invention, the above problems are solved by a noise reduction system comprising: (1) a plurality of microphones; (2) a signal generator that generates a random noise signal; (3) transfer circuits that simulate the transfer characteristics from the speaker's mouth to each microphone; (4) combining units that each combine the random noise signal output via a transfer circuit with the output signal of the corresponding microphone; (5) an adaptive signal processing unit that, during learning, performs adaptive signal processing with the random noise signal as the target signal and the outputs of the combining units as reference signals to update the coefficients of the adaptive filters, and that stops updating the filter coefficients when not learning; (6) a target response setting unit that applies a predetermined delay to the target signal; and (7) a subtracting unit that obtains the difference between the output signal of the adaptive filters and the output signal of the target response setting unit and, when not learning, outputs the difference signal as the speech signal. According to the present invention, the transfer characteristic from the speaker's mouth to each microphone, which was conventionally simulated by a delay characteristic alone, is simulated by the actual transfer characteristic from the speaker's mouth to the microphone, so the noise reduction effect can be improved.
[0012]
Further, according to the present invention, the above object is achieved by providing the transfer circuits with transfer characteristic measuring means for measuring the transfer characteristics from the speaker's mouth to the output terminal of each microphone, and by switching the system among a transfer characteristic measurement mode, a learning mode, and a non-learning mode to measure the transfer characteristics, learn the filter coefficients, and output the speech signal, respectively. In this way, the transfer characteristics from the speaker's mouth to the output terminal of each microphone can be measured and set in the transfer circuits, and the noise reduction effect can be improved even if the microphone characteristics are not identical.
[0013]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a block diagram of a first embodiment of the present invention. In the figure, 51 and 52 are first and second microphones; 53 and 54 are amplifiers; 55 is a signal generator (SG) that generates random noise, for example white noise, as the target signal; 56 is a variable-gain amplifier; 57 and 58 are transfer circuits that simulate the transfer characteristics (propagation characteristics) CS1′ and CS2′ from the speaker's mouth to the output terminal of each microphone and apply them to the target signal; and 59 and 60 are adders that add the signals output from the respective transfer circuits to the output signals of the respective microphones.
[0014]
Reference numeral 61 denotes a 2-input/1-output adaptive signal processing unit, which has first and second adaptive signal processing units 61a and 61b and an adder 61c that adds the outputs of the adaptive signal processing units and outputs the result. Although not shown, each of the adaptive signal processing units 61a and 61b has an LMS calculation unit and an adaptive filter configured as an FIR digital filter.
During learning, the first signal processing unit 61a uses the output signal of the adder 59 as a reference signal and updates the adaptive filter coefficients W₁ so that the power of the error signal e is minimized; during non-learning periods, for example during speech recognition, it stops updating the coefficients, leaves the coefficients W₁ obtained during learning set in the adaptive filter, filters the input signal, and outputs the result. Likewise, during learning, the second signal processing unit 61b uses the output signal of the adder 60 as a reference signal and updates the adaptive filter coefficients W₂ so that the power of the error signal e is minimized; during speech recognition, it stops updating the coefficients, leaves the coefficients W₂ obtained during learning set in the adaptive filter, filters the input signal, and outputs the result. The adder 61c combines the adaptive filter outputs and outputs the result.
[0015]
Reference numeral 62 denotes a target response setting unit that receives the noise signal output from the signal generator 55 as the target signal; it serves to approximate the inverse characteristic of the acoustic system accurately. Where d is the signal delay time corresponding to half the tap length of the adaptive filter, the target response setting unit 62 has a delay characteristic of delay time d and a flat characteristic (gain of 1) over the audio frequency band. Reference numeral 63 denotes a subtracting unit, which subtracts the output signal of the adaptive signal processing unit 61 from the target response output by the target response setting unit 62 and outputs the error signal e.
[0016]
During learning, only the noises Xn₁(z) and Xn₂(z) generated by the automobile CR are input to the microphones 51 and 52. The transfer circuits 57 and 58 apply the transfer characteristics CS1′ and CS2′ from the speaker's mouth to the output terminal of each microphone to the random noise serving as the target signal. The adders 59 and 60 combine the outputs of the transfer circuits 57 and 58 with the respective microphone outputs, and the adaptive signal processing unit 61 takes the output signals of the adders 59 and 60 as reference signals and performs adaptive signal processing so that the power of the error signal e is minimized, updating the adaptive filter coefficients W₁ and W₂.
During speech recognition, the adaptive signal processing unit 61 does not update the filter coefficients; it leaves the coefficients W₁ and W₂ obtained during learning set in the adaptive filters, combines the output signals of these adaptive filters, and inputs the result to the subtracting unit 63. The subtracting unit 63 subtracts the output signal of the adaptive signal processing unit 61 from the target response output by the target response setting unit 62 and inputs the difference signal to the speech recognition unit as the speech signal.
As described above, adaptive signal processing is performed using, as reference signals, the signals obtained by adding the target signal with the transfer characteristics CS1′ and CS2′ applied to the output signals of the respective microphones. The adaptive filter coefficients can therefore be determined accurately during learning so that the power of the noise output is minimized; as a result, noise can be reduced during speech recognition and a speech signal with a high S/N ratio can be output.
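The learning and non-learning behaviour of the 2-input/1-output unit 61 can be sketched in plain Python. This is a toy model under loud assumptions, not the patented implementation: CS1′ and CS2′ are replaced by pure gains (1.0 and 0.5), the cabin noise is identical at both microphones, the tap length and step size are arbitrary, and the class and variable names are hypothetical. The sketch inspects the summed adaptive filter output directly and leaves out the final subtracting unit 63.

```python
import random


class TwoInputLms:
    """Two FIR adaptive filters (61a, 61b) whose outputs are summed (61c)."""

    def __init__(self, num_taps, step_size):
        self.weights = [[0.0] * num_taps for _ in range(2)]  # W1, W2
        self.history = [[0.0] * num_taps for _ in range(2)]  # newest sample first
        self.step_size = step_size

    def filter(self, x1, x2):
        """Push one sample per input and return the summed filter output."""
        total = 0.0
        for w, h, x in zip(self.weights, self.history, (x1, x2)):
            h.insert(0, x)
            h.pop()
            total += sum(wk * hk for wk, hk in zip(w, h))
        return total

    def learn(self, x1, x2, target_response):
        """One LMS update; returns the error e = target response - output."""
        error = target_response - self.filter(x1, x2)
        for w, h in zip(self.weights, self.history):
            for k, hk in enumerate(h):
                w[k] += self.step_size * error * hk
        return error


NUM_TAPS = 8
DELAY = NUM_TAPS // 2   # target response delay: half the tap length
CS1, CS2 = 1.0, 0.5     # toy stand-ins for CS1', CS2' (assumed pure gains)

rng = random.Random(1)
lms = TwoInputLms(NUM_TAPS, step_size=0.01)
delay_line = [0.0] * (DELAY + 1)
errors = []

# Learning: SG noise passes through the toy paths and is added to cabin noise.
for _ in range(30000):
    s = rng.gauss(0.0, 1.0)   # target signal from the signal generator
    v = rng.gauss(0.0, 1.0)   # cabin noise (identical at both mics in this toy)
    delay_line.insert(0, s)
    delay_line.pop()
    errors.append(lms.learn(CS1 * s + v, CS2 * s + v, delay_line[DELAY]))

final_error_power = sum(e * e for e in errors[-1000:]) / 1000

# Non-learning: coefficients are frozen, so only filter() is called.
speech = [1.0, -0.5, 0.25] + [0.0] * 10
outputs = []
for x in speech:
    v = rng.gauss(0.0, 1.0)
    outputs.append(lms.filter(CS1 * x + v, CS2 * x + v))
recovered = outputs[DELAY:]   # speech reappears after DELAY samples
```

In this toy case an exact solution exists (twice the difference of the two inputs, delayed by `DELAY` samples), so the error power falls to nearly zero and the frozen filters pass the speech while cancelling the common noise. Real cabin paths and noise fields would leave a residual error.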
[0017]
FIG. 2 is a block diagram of a measuring apparatus for measuring the transfer characteristics CS1′ and CS2′ from the mouth of the speaker DR to the output terminal of each microphone; the same parts as in FIG. 1 are given the same reference numerals. In the figure, 70 is a loudspeaker placed near the speaker's mouth, which emits the white noise output from the signal generator 55 toward the microphones 51 and 52.
71 is an adaptive signal processing unit that takes the white noise as the reference signal and the output of the microphone 51 as the target signal, performs adaptive signal processing so that the power of the error e₁ is minimized, and updates the adaptive filter coefficients Wcs₁; 72 is a subtracting unit that outputs the difference (error) e₁ between the output of the microphone 51 and the output of the adaptive signal processing unit 71.
73 is an adaptive signal processing unit that takes the white noise as the reference signal and the output of the microphone 52 as the target signal, performs adaptive signal processing so that the power of the error e₂ is minimized, and updates the adaptive filter coefficients Wcs₂; 74 is a subtracting unit that outputs the difference (error) e₂ between the output of the microphone 52 and the output of the adaptive signal processing unit 73.
[0018]
When the adaptive signal processing units 71 and 73 continue adaptive signal processing and keep updating the coefficients Wcs₁ and Wcs₂ of their adaptive filters (not shown), the coefficients converge to constant values. Once Wcs₁ has converged, the characteristic of the adaptive filter in the adaptive signal processing unit 71 represents the transfer function CS1′ from the speaker's mouth to the output terminal of the microphone 51. Likewise, once Wcs₂ has converged, the characteristic of the adaptive filter in the adaptive signal processing unit 73 represents the transfer function CS2′ from the speaker's mouth to the output terminal of the microphone 52.
Therefore, if the transfer circuits 57 and 58 in FIG. 1 are configured as FIR digital filters and Wcs₁ and Wcs₂ are set as their coefficients, the transfer circuits 57 and 58 can simulate the transfer functions CS1′ and CS2′ from the speaker's mouth to the output terminals of the microphones 51 and 52.
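The identification described for FIG. 2 is ordinary LMS system identification, which can be sketched as follows. The impulse response `true_response`, the tap count, and the step size are made-up values used only to generate test data, and the function name is hypothetical; a real measurement would use the recorded microphone output instead of a simulated one.

```python
import random


def lms_identify(reference, target, num_taps, step_size):
    """Adapt FIR coefficients so that filtering `reference` approximates `target`."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps            # most recent reference sample first
    for x, d in zip(reference, target):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y                          # error e1 (or e2) from the subtracting unit
        weights = [w + step_size * e * h for w, h in zip(weights, history)]
    return weights


# Hypothetical mouth-to-microphone impulse response, used only to make test data
true_response = [0.0, 0.6, 0.3, -0.1]

rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # SG output via the loudspeaker
mic_output = [
    sum(h * white_noise[n - k] for k, h in enumerate(true_response) if n - k >= 0)
    for n in range(len(white_noise))
]

wcs = lms_identify(white_noise, mic_output, num_taps=4, step_size=0.01)
print([round(w, 3) for w in wcs])
```

Once the coefficients have settled, `wcs` can be copied into a fixed FIR filter, which is the role the text assigns to the transfer circuits 57 and 58.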
[0019]
As described above, if Wcs₁ and Wcs₂ are determined for each vehicle, the transfer characteristics from the speaker's mouth to the microphone output terminals can be measured including the characteristics of the microphones. However, determining Wcs₁ and Wcs₂ for every vehicle is laborious. Therefore, when the vehicle model in which the speech recognition device is installed and the microphone positions can be specified, the values of the coefficients Wcs₁ and Wcs₂ are determined in advance on one vehicle and set in the transfer circuits 57 and 58 of FIG. 1. This method, however, does not compensate for differences in microphone characteristics.
[0020]
FIG. 3 is a configuration diagram of another AMNOR noise reduction system, which has a function for measuring the transfer functions from the speaker's mouth to the output terminal of each microphone; the same parts as in FIG. 1 are given the same reference numerals. By switching its switches, the system of FIG. 3 takes the configuration shown in FIG. 2 when measuring the transfer functions and the configuration of FIG. 1 during learning and speech recognition. FIG. 3 differs from FIG. 1 in that:
(1) changeover switches SW1 to SW4 are provided;
(2) a loudspeaker 80 for measuring the transfer characteristics is provided near the speaker's mouth; and
(3) the transfer circuits 57 and 58 are configured so that they can perform adaptive signal processing and thereby measure the transfer functions from the speaker's mouth to the output terminal of each microphone.
[0021]
The transfer circuit 57 consists of an adaptive signal processing unit 57a and a subtracting unit 57b. The adaptive signal processing unit 57a takes the white noise as the reference signal and the output of the microphone 51 as the target signal, performs adaptive signal processing so that the power of the error e₁ is minimized, and updates the adaptive filter coefficients Wcs₁; the subtracting unit 57b outputs the difference (error) e₁ between the output of the microphone 51 and the output of the adaptive signal processing unit 57a.
Similarly, the transfer circuit 58 consists of an adaptive signal processing unit 58a and a subtracting unit 58b. The adaptive signal processing unit 58a takes the white noise as the reference signal and the output of the microphone 52 as the target signal, performs adaptive signal processing so that the power of the error e₂ is minimized, and updates the adaptive filter coefficients Wcs₂; the subtracting unit 58b outputs the difference (error) e₂ between the output of the microphone 52 and the output of the adaptive signal processing unit 58a.
[0022]
To measure the transfer functions from the speaker's mouth to the output terminal of each microphone, the switches SW1, SW2, and SW3 are turned on and the switch SW4 is turned off, putting the system in the configuration shown in FIG. 2. Then, by the method described with reference to FIG. 2, once the coefficients of the adaptive filters (not shown) of the adaptive signal processing units 57a and 58a have converged to the constant values Wcs₁ and Wcs₂, the adaptive filters simulate the transfer functions CS1′ and CS2′ from the speaker's mouth to the output terminals of the microphones 51 and 52. The loudspeaker 80 is removed once the measurement is complete.
For learning and speech recognition, the switches SW1, SW2, and SW3 are turned off and the switch SW4 is turned on, putting the system in the configuration shown in FIG. 1. Learning and speech recognition are then performed by the method described with reference to FIG. 1.
[0023]
FIG. 4 shows the overall control flow of the noise reduction system of FIG. 3.
First, it is checked whether the system is in the mode for identifying the transfer functions from the speaker's mouth to the output terminal of each microphone (step 101). If it is in the identification mode, the switches SW1, SW2, and SW3 are turned on and the switch SW4 is turned off, putting the system in the configuration shown in FIG. 2, and the coefficients Wcs₁ and Wcs₂ are updated by adaptive signal processing (step 102). Next, it is checked whether the coefficients Wcs₁ and Wcs₂ have converged to constant values (step 103), and the update processing of step 102 is repeated until they do. Once the coefficients Wcs₁ and Wcs₂ have converged, the adaptive filters of the adaptive signal processing units 57a and 58a simulate the transfer functions CS1′ and CS2′ from the speaker's mouth to the output terminals of the microphones 51 and 52. At that point, the switches SW1, SW2, and SW3 are turned off and the switch SW4 is turned on, putting the system in the configuration shown in FIG. 1.
[0024]
Next, it is checked whether the talk switch, which instructs the start of voice recognition, has been operated to enter the voice recognition state (step 104). If the talk switch is not on, the system is in the learning mode, and the 2-input/1-output adaptive signal processing unit 61 takes the output signals of the adders 59 and 60 as reference signals and performs adaptive signal processing so that the power of the error signal e is minimized, thereby updating the coefficients W₁ and W₂ of the adaptive filters (step 105). The coefficients W₁ and W₂ continue to be updated until the talk switch is turned on, and as a result they converge to constant values.
When the talk switch is turned on in step 104, the system enters the voice recognition mode, and the updating of the coefficients W₁ and W₂ is stopped (step 106). During speech recognition, the adaptive signal processing unit 61 does not update the filter coefficients but keeps the coefficients W₁ and W₂ determined during learning set in the adaptive filters. The subtraction unit 63 subtracts the output signal of the adaptive signal processing unit 61 from the target signal output by the target response setting unit 62, and outputs the result, that is, a speech signal with reduced noise and an improved S/N ratio, to a speech recognition unit (not shown).
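The difference between the learning mode (step 105) and the voice recognition mode (step 106) can be illustrated with a small sketch. It is not taken from the embodiment: the class and function names, the LMS update, the tap count, the step size, and the toy signal model (references s + v and s − v, target response equal to a one-sample delay of s) are assumptions made only to show coefficients being updated while learning and held fixed afterwards.

```python
import random


class TwoInputAdaptiveProcessor:
    """2-input/1-output LMS processor with a learning / non-learning switch."""

    def __init__(self, num_taps=4, mu=0.005):
        self.w = [[0.0] * num_taps, [0.0] * num_taps]     # W1, W2
        self.hist = [[0.0] * num_taps, [0.0] * num_taps]  # newest sample first
        self.mu = mu
        self.learning = True   # True: step 105, False: step 106

    def process(self, x1, x2, target):
        """Return target minus the summed filter outputs (the error signal e)."""
        for h, x in zip(self.hist, (x1, x2)):
            h.insert(0, x)
            h.pop()
        y = sum(wk * xk
                for w, h in zip(self.w, self.hist)
                for wk, xk in zip(w, h))
        e = target - y
        if self.learning:      # coefficients change only in the learning mode
            for w, h in zip(self.w, self.hist):
                for k in range(len(w)):
                    w[k] += self.mu * e * h[k]
        return e


def simulate(num_learning=3000, num_recognition=200, seed=1):
    """Run the toy model; return (errors, frozen coefficients, final coefficients)."""
    rng = random.Random(seed)
    proc = TwoInputAdaptiveProcessor()
    s_prev, errors, frozen = 0.0, [], None
    for n in range(num_learning + num_recognition):
        if n == num_learning:              # talk switch turned on
            proc.learning = False
            frozen = [list(w) for w in proc.w]
        s = rng.gauss(0.0, 1.0)            # target signal sample
        v = rng.gauss(0.0, 1.0)            # noise sample picked up by both inputs
        errors.append(proc.process(s + v, s - v, s_prev))
        s_prev = s
    return errors, frozen, proc.w
```

In this toy model the exact solution is a coefficient of 0.5 at a one-sample lag in both filters, so the error power falls toward zero during learning, and the coefficients recorded when learning stops remain unchanged through the recognition phase.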
[0025]
Thereafter, the operation of step 106 continues until the talk switch is turned off and voice recognition is released. When the talk switch is turned off (step 107), the system returns to the learning mode, and the update operation of the coefficients W₁ and W₂ from step 105 onward resumes.
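The mode transitions of steps 101 to 107 can be summarized as a small state machine. The sketch below is a reading of the flow described above rather than code from the embodiment; the names `Mode`, `SWITCH_STATES`, and `next_mode` are hypothetical, and the switch settings are those given in the description (SW1 to SW3 on and SW4 off only while identifying).

```python
from enum import Enum


class Mode(Enum):
    IDENTIFICATION = "identification"   # steps 102-103
    LEARNING = "learning"               # step 105
    RECOGNITION = "recognition"         # step 106


# Switch positions per mode (True = on), as stated in the description.
_MEASURE = {"SW1": True, "SW2": True, "SW3": True, "SW4": False}
_NORMAL = {"SW1": False, "SW2": False, "SW3": False, "SW4": True}
SWITCH_STATES = {
    Mode.IDENTIFICATION: _MEASURE,
    Mode.LEARNING: _NORMAL,
    Mode.RECOGNITION: _NORMAL,
}


def next_mode(mode, wcs_converged, talk_switch_on):
    """Return the mode for the next pass through the control flow.

    wcs_converged  : result of the step 103 convergence check
    talk_switch_on : state of the talk switch (steps 104 and 107)
    """
    if mode is Mode.IDENTIFICATION and not wcs_converged:
        return Mode.IDENTIFICATION      # keep updating Wcs1 and Wcs2
    # After identification, the talk switch alone selects the mode.
    return Mode.RECOGNITION if talk_switch_on else Mode.LEARNING
```

For instance, `next_mode(Mode.RECOGNITION, True, False)` yields `Mode.LEARNING`, which corresponds to step 107 returning the system to coefficient updating.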
According to the noise reduction system of FIG. 3, the transfer characteristics CS1′ and CS2′ need to be identified only once, on the production line or at a dealer before the vehicle is shipped. Because the identified transfer characteristics CS1′ and CS2′ include the characteristics of the microphones themselves, no problem occurs even if the microphone characteristics vary. Further, even when an uncertain element is involved, such as when the vehicle type cannot be specified, the transfer characteristics CS1′ and CS2′ can be identified and set in advance.
The above describes the case where the speech signal output from the system of the present invention is input to a speech recognition apparatus at the time of non-learning. However, the present invention is not limited to this and can also be applied when the speech signal is input to another device, such as a hands-free telephone, at the time of non-learning.
The present invention has been described above with reference to the embodiments. However, various modifications are possible in accordance with the gist of the present invention described in the claims, and the present invention does not exclude them.
[0026]
【Effect of the Invention】
As described above, according to the present invention, the transfer characteristic from the mouth to the microphone is not simulated with a delay characteristic alone; instead, the actual transfer characteristic from the mouth to the microphone is simulated, so that the noise reduction effect can be improved. In addition, according to the present invention, vehicle interior noise that a general AMNOR system cannot handle can be removed.
Further, according to the present invention, by incorporating into the noise reduction device a mechanism for simulating the transfer characteristic from the speaker's mouth to each microphone, variations in microphone characteristics can be compensated for.
Further, according to the present invention, adaptive processing for noise reduction can be performed in real time.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a noise reduction system of the present invention.
FIG. 2 is a configuration diagram of a transfer characteristic measuring apparatus from a speaker's mouth to each microphone output terminal according to the present invention.
FIG. 3 is a configuration diagram of a noise reduction system having a transfer characteristic measurement function.
FIG. 4 is an overall control flow of the present invention.
FIG. 5 is a relationship diagram between an SN ratio and a recognition rate.
FIG. 6 is a conventional AMNOR type noise reduction system.
[Explanation of symbols]
51, 52 ... First and second microphones
55 ... Signal generator for generating random noise as a target signal
57, 58 ... Transfer circuits for simulating transfer characteristics
59, 60 ... Adders
61 ... 2-input/1-output adaptive signal processing unit
61a, 61b ... First and second adaptive signal processing units
62 ... Target response setting unit
63 ... Subtraction unit

Claims

An AMNOR-type noise reduction system for improving the SN ratio of a speaker's speech signal, characterized by comprising:
a plurality of microphones;
a signal generator that generates a random noise signal;
transfer circuits that simulate the transfer characteristics from the speaker's mouth to the output end of each microphone;
synthesis units that each synthesize the random noise signal output via the corresponding transfer circuit with the output signal of the corresponding microphone;
an adaptive signal processing unit that, using the random noise signal as a target signal and each synthesis unit output as a reference signal, performs adaptive signal processing to update the coefficients of an adaptive filter during learning, and stops updating the coefficients of the adaptive filter during non-learning;
a target response setting unit that gives a predetermined delay to the target signal; and
a calculation unit that outputs the difference between the output signal of the adaptive filter and the output signal of the target response setting unit as an audio signal,
wherein the transfer circuits measure the transfer characteristics from the speaker's mouth to the output end of each microphone and simulate those transfer characteristics using the measured transfer characteristics, and
wherein the system is switched among a transfer characteristic measurement mode, a learning mode, and a non-learning mode to measure the transfer characteristics, learn the filter coefficients, and output the audio signal, respectively.