JP5698166B2

JP5698166B2 - Sound source distance estimation apparatus, direct ratio estimation apparatus, noise removal apparatus, method thereof, and program

Info

Publication number: JP5698166B2
Application number: JP2012041053A
Authority: JP
Inventors: 日岡 裕輔; 古家 賢一; 羽田 陽一; 丹羽 健太
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2012-02-28
Filing date: 2012-02-28
Publication date: 2015-04-08
Anticipated expiration: 2032-02-28
Also published as: JP2013178110A

Description

The present invention relates to a sound source distance measuring apparatus that uses a microphone array to estimate the distance from the array to a sound source, to a noise removal apparatus, to a direct-to-reverberant ratio (DRR) estimation apparatus used in both, and to the corresponding methods and program. It is applicable, for example, to hands-free systems in which a device is operated by voice input.

In the prior art described in Patent Document 1, the direct-to-reverberant ratio (DRR) is obtained by converting the signals received by a microphone array into the frequency domain and computing the powers of the direct sound and the indirect sound from the spatial correlation matrix of those signals (see, for example, paragraphs [0025] to [0040] of Example 1 of that document).

Patent Document 1: JP 2011-53062 A

The method disclosed in Patent Document 1 cannot distinguish the direct sound from indirect sound arriving from the same direction, so every sound arriving from the direct sound direction is treated as direct sound. As a result, the direct sound power is overestimated (or the indirect sound power is underestimated), and the direct-to-reverberant ratio (DRR) finally obtained is larger than its true value.

The present invention has been made in view of this problem. Its object is to provide a direct-to-reverberant ratio estimation apparatus that distinguishes the reverberant sound arriving from the direct sound direction when estimating the direct sound power and the reverberant sound power, and thereby obtains a direct-to-reverberant ratio estimate (DRR: Direct-to-Reverberation energy Ratio) closer to the true value than the conventional method. A further object is to provide a sound source distance estimation apparatus and a noise removal apparatus that use this estimation apparatus, together with the corresponding methods and program.

The direct-to-reverberant ratio (DRR) estimation apparatus of the present invention comprises a received sound power estimation unit, a direct sound direction power estimation unit, a reverberant sound direction power estimation unit, a subtraction unit, and a DRR computation unit. The received sound power estimation unit uses frequency-domain signals, obtained by converting the signals received by the plurality of microphones of a microphone array into the frequency domain, to obtain a power estimate of the frequency-domain signal. The direct sound direction power estimation unit obtains a power estimate of a direct sound direction signal, which is obtained either by applying to the frequency-domain signals a process that mainly passes the signal components arriving from the direct sound source direction, or by applying such a process to the received signals and converting the result into the frequency domain. The reverberant sound direction power estimation unit obtains a power estimate of a reverberant sound direction signal, which is obtained either by a process that passes signal components arriving mainly from directions other than the direct sound source direction, using the same directivity shape as the process of the direct sound direction power estimation unit, or by applying to the received signals a process that passes components arriving mainly from directions other than the direct sound source direction and converting the result into the frequency domain. The subtraction unit outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal. The DRR computation unit uses the power estimate of the frequency-domain signal and the power estimate of the reverberant sound direction signal to obtain a DRR estimate representing the ratio of the direct sound power estimate to the power estimate of the reverberant sound direction signal.

The sound source distance estimation apparatus of the present invention estimates the distance to the sound source using the direct-to-reverberant ratio (DRR) estimate output by the DRR estimation apparatus described above. The noise removal apparatus of the present invention uses that DRR estimate to remove sounds other than the desired sound source.

The direct-to-reverberant ratio (DRR) estimation apparatus of the present invention is a new apparatus that obtains the DRR by exploiting the isotropy of reverberant sound, which results from its strongly diffuse nature. Using two or more beamformers that are realized with the microphone array and have identical directivity shapes, the estimation method separates the direct sound from the reverberant sound within the signal arriving from the direct sound direction and estimates the power of each correctly. The apparatus can therefore improve the accuracy of DRR estimation.

The sound source distance estimation apparatus of the present invention can accurately estimate the distance to the sound source from the direct-to-reverberant ratio (DRR) estimate obtained by the proposed estimation method. The noise removal apparatus of the present invention filters the received sound according to that DRR estimate, so that only the components of sound sources judged to lie within a given distance range are emphasized or suppressed in the picked-up sound. As a result, the microphone array can accurately pick up only the sound sources at a specific distance.

FIG. 1: An example of a scene in which the sound source distance estimation apparatus 400 is used.
FIG. 2: Propagation paths of sound indoors.
FIG. 3: Relationship between the direct-to-reverberant ratio (DRR) and the distance between the microphone and the sound source.
FIG. 4: Conceptual illustration of the principle underlying the embodiments.
FIG. 5: Unit vectors representing the sound source direction: (a) the three-dimensional unit vector r; (b) the two-dimensional unit vector u.
FIG. 6: Two beamformers with the same directivity shape whose main beams point in different directions: (a) beamformer with its beam toward the sound source; (b) beamformer with a null toward the sound source.
FIG. 7: Example functional configuration of the DRR estimation apparatus 100 of Example 1.
FIG. 8: Operation flow of the DRR estimation apparatus 100.
FIG. 9: Example functional configuration of the DRR calculation unit 44.
FIG. 10: Example functional configuration of the DRR calculation unit 44′.
FIG. 11: Schematic examples of the directivity shapes of the reverberation directivity forming units 4431_1 to 4431_N.
FIG. 12: Example functional configuration of the DRR calculation unit 44″.
FIG. 13: Example functional configuration of the sound source distance estimation apparatus 400 of Example 2.
FIG. 14: Example functional configuration of the noise removal apparatus 700 of Example 3.
FIG. 15: Operation flow of the noise removal apparatus 700.
FIG. 16: Example functional configuration of the processing target signal generation unit 72.
FIG. 17: Experimental conditions of the experiment confirming the effect.
FIG. 18: Simulation results of DRR estimation.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in different drawings carry the same reference numerals, and their description is not repeated. In the following description, symbols such as "￣" and "^" that properly belong directly above a character are written immediately before that character in running text because of the limits of text notation. In the equations, these symbols appear in their proper positions.

Before the embodiments are described, the principle underlying them is explained.
[Principle]
FIG. 1 illustrates a scene in which the sound source distance estimation apparatus 400 of Example 2 is used. A microphone array 11 and a speaker 12 are present in a room 10 that has reverberation characteristics, and they are placed some distance apart.

In this situation, the goal is to estimate the distance d between the speaker 12 and the microphone array 11. The present invention estimates the distance between the sound source and the microphone array from the direct-to-reverberant ratio (DRR) estimate.

The direct-to-reverberant ratio (DRR) is the ratio between the direct sound and the indirect sound (also called reverberant sound) contained in the received sound, for example a ratio of powers, of power spectra, of power spectral densities, or of values of a monotonically increasing function of amplitude. A value representing the ratio of the direct sound power estimate to the indirect sound power estimate is called the "DRR estimate". The DRR estimate may be, for example, the direct sound power estimate divided by the indirect sound power estimate, the indirect sound power estimate divided by the direct sound power estimate, or a function of either quotient. A power estimate is a value that increases as the power increases. Examples include the power, the power spectrum, the power spectral density, a monotonically increasing function of amplitude, and estimates of these.

FIG. 2 shows the propagation paths of sound from a sound source 21 to a microphone 22 when sound is recorded with a microphone placed indoors. The direct sound is the sound wave, drawn as a thick solid line, that travels straight from the sound source 21 to the microphone. The reverberant sound is the sound wave, drawn as broken lines, that reaches the microphone 22 after the sound emitted by the sound source 21 has been reflected by walls, the floor, the ceiling, and so on.

FIG. 3 shows the relationship between the direct-to-reverberant ratio (DRR) and the distance from the microphone to the sound source. The horizontal axis is that distance, and the vertical axis is the DRR. In general, the indirect sound has a constant level that does not depend on the distance from the microphone, whereas the direct sound decreases monotonically as that distance increases. The DRR, obtained by dividing the direct sound by the indirect sound, therefore also decreases monotonically with distance, like the direct sound. This relationship makes it possible to estimate the distance between the microphone array and the sound source from the sound received by the microphone array 11. The DRR estimation apparatus of the present invention outputs a DRR estimate, and the noise removal apparatus of the present invention removes noise from the received signal according to that estimate.
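The monotonic relationship described here can be inverted to read a distance off a ratio estimate. The sketch below is only an illustration under assumptions that the text does not state: the direct sound power is taken to fall off as 1/d² (a free-field point source), the reverberant power is taken to be constant, and `critical_distance` is a hypothetical calibration constant (the distance at which the two powers are equal).

```python
import math


def drr_from_distance(d: float, critical_distance: float) -> float:
    """Linear power ratio under the assumed inverse-square model."""
    return (critical_distance / d) ** 2


def distance_from_drr(drr: float, critical_distance: float) -> float:
    """Invert the monotonically decreasing curve to recover the distance."""
    if drr <= 0.0:
        raise ValueError("the ratio must be positive")
    return critical_distance / math.sqrt(drr)


if __name__ == "__main__":
    d_c = 1.5  # hypothetical critical distance in metres
    for d in (0.5, 1.5, 3.0):
        ratio = drr_from_distance(d, d_c)
        recovered = distance_from_drr(ratio, d_c)
        print(f"d = {d} m, ratio = {10 * math.log10(ratio):+.1f} dB, "
              f"recovered d = {recovered:.2f} m")
```

In practice the curve would be measured or calibrated for the room rather than assumed; any monotonic curve can be inverted in the same way.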

FIG. 4 illustrates the idea behind this principle. It is known that when reverberation is sufficiently strong, the reverberant sound can be assumed to be diffuse and, seen from the microphone, can be modeled as sound arriving with equal magnitude from every direction. Applying an arbitrary beamformer BF1 to the output signals of the small microphone array 11 yields the reverberant-direction power 23, received through a given directivity shape D_1. The three arrows of the reverberant-direction power 23 schematically represent the magnitude of the reverberant sound picked up through D_1.

Now assume that the position of the sound source 21 is known. If beamformer BF0 is given a directivity shape D_0 identical to D_1 and is steered toward the sound source 21, it receives the direct-sound-direction power 26. This power consists of the direct sound power 25, which arrives straight from the sound source 21 at the small microphone array 11, plus reverberant power of the same magnitude as the reverberant-direction power 23.

Subtracting the reverberant-direction power 23 from the direct-sound-direction power 26, which contains the same reverberant component as the reverberant-direction power 23, gives the direct sound power 25. This principle is explained theoretically below.

<Isotropic arrival model of reverberant sound>
The proposed method introduces a model that takes the isotropy of reverberant sound into account. The explanation below uses the power spectral density, or an estimate of it, as the power estimate, but this does not limit the present invention.

When the signal received by the m-th microphone of a microphone array of M (M ≥ 2) microphones is converted into the frequency domain, for example by a short-time Fourier transform, the following frequency-domain signal X^(m)(ω, t) is obtained.
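The expression referred to as equation (1) is not reproduced in this text. A form consistent with the definitions of H_D^(m)(ω), H_R^(m)(ω) and S(ω, t) that follow would be the one below; it is offered as an assumption, not as the published formula.

```latex
X^{(m)}(\omega, t) = \left\{ H_D^{(m)}(\omega) + H_R^{(m)}(\omega) \right\} S(\omega, t)
```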

Here ω is the frequency, H_D^(m)(ω) is the transfer function of the direct sound from the source to the m-th microphone, H_R^(m)(ω) is the transfer function of the indirect sound from the source to the m-th microphone, and S(ω, t) is the signal obtained by converting the source sound into the frequency domain. t is the time frame index.

Here the direct sound is assumed to be coherent, whereas the indirect sound, whose main component is reverberation, is assumed to be diffuse. In terms of direction of arrival, the direct sound arrives only from the direction of the source, while the indirect sound arrives with uniform power from all directions (a property referred to below as "isotropy"). The proposed method exploits this difference in spatial arrival characteristics to estimate the direct and indirect sound powers and obtain the direct-to-reverberant ratio (DRR).

As preconditions, the direction of arrival of the direct sound (hereinafter the "direct sound source direction") is known, the direct sound and the indirect sound arriving from any direction can be regarded as plane waves, and, by the definition of diffuse sound, the direct and indirect sounds are mutually uncorrelated. The transfer functions H_D^(m)(ω) and H_R^(m)(ω) of the direct and indirect sound from the source to the m-th microphone can then be expressed as follows.

Here H_Dref(ω) is the direct sound component of the transfer function from the source to a reference point of the microphone array (the "reference point"), and H_{Rref,θ}(ω) is the indirect sound component in direction θ as seen from the reference point. The reference point may lie inside or outside the microphone array. "Inside the microphone array" means, for example, on a straight line through the microphones that make up the array, inside a plane figure bounded by line segments through those microphones, or inside a solid bounded by surfaces through those microphones. "Outside the microphone array" means any position that is not inside it. For example, the distance between each microphone of the array and the reference point is shorter than the distance between that microphone and the direct sound source. Examples of the reference point are the center of the microphone array and the position of any one microphone. The propagation delay τ_θ^(m) of sound arriving from direction θ, from the reference point to the m-th microphone, is then given by the following equation.

Here the position p_m of the m-th microphone is given by equation (5), and the unit vector u representing the direct sound source direction, shown in FIGS. 5(a) and 5(b), is given by equation (6). c is the propagation speed of sound.

θ_D is the direct sound source direction as seen from the reference point, j is the imaginary unit, and e is the base of the natural logarithm (equations (2) and (3)). Integration over θ is performed over the range 0 ≤ θ < 2π (the same applies to the integrals below).

In other words, each of the transfer functions H_D^(m)(ω) and H_R^(m)(ω) can be decomposed into a transfer-function component from the source to the reference point and a phase-difference component due to the propagation delay from the reference point to the m-th microphone. Accordingly, the microphone array input vector →x(ω, t) = [X^(1)(ω, t), …, X^(M)(ω, t)]^T, whose elements are the frequency-domain signals X^(m)(ω, t) (m ∈ {1, …, M}), is expressed by the following equation, where T denotes transposition.

Here S_D(ω, t) = H_Dref(ω) S(ω, t) and S_{R,θ}(ω, t) = H_{Rref,θ}(ω) S(ω, t), and →a_θ(ω) is the steering vector (array manifold vector) for direction θ given by equation (8). Each element of the array manifold vector depends on the propagation delay τ_θ^(m). When the direct and indirect sounds can be regarded as plane waves, τ_θ^(m) depends on the position of each microphone relative to the reference point of the array and on the direction θ. For details of the array manifold vector see, for example, Chapter 1 (pp. 1-26) of Reference 1: 浅野太 (Futoshi Asano), 「音のアレイ信号処理−音源の定位・追跡と分離」 (Array Signal Processing for Sound: Localization, Tracking and Separation of Sound Sources), Acoustical Society of Japan Acoustic Technology Series, Corona Publishing Co., Ltd., February 25, 2011, ISBN 978-4-339-01116-6.
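As an illustration of how each element of the array manifold vector follows from the propagation delay, the sketch below builds a plane-wave steering vector for a two-dimensional geometry. The equations themselves are not reproduced in this text, so the details are assumptions: the reference point is the origin, the direction vector is [cos θ, sin θ], the delay is τ = −(p_m · u)/c, each element is e^{−jωτ}, and c = 340 m/s. The function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value for c


def propagation_delay(mic_xy, theta, c=SPEED_OF_SOUND):
    """Delay in seconds from the reference point (origin) to one microphone.

    Assumed sign convention: a microphone displaced toward the source
    receives the wavefront earlier, so its delay is negative.
    """
    ux, uy = math.cos(theta), math.sin(theta)
    return -(mic_xy[0] * ux + mic_xy[1] * uy) / c


def steering_vector(mics, theta, omega, c=SPEED_OF_SOUND):
    """One unit-modulus phase term per microphone for direction theta."""
    return [cmath.exp(-1j * omega * propagation_delay(p, theta, c))
            for p in mics]


if __name__ == "__main__":
    mics = [(-0.02, 0.0), (0.02, 0.0)]  # two microphones 4 cm apart
    omega = 2.0 * math.pi * 1000.0      # 1 kHz in rad/s
    for deg in (0, 45, 90):
        a = steering_vector(mics, math.radians(deg), omega)
        phases = [round(math.degrees(cmath.phase(v)), 2) for v in a]
        print(deg, "deg ->", phases, "deg of phase per microphone")
```

A different sign convention or a three-dimensional direction vector would change only `propagation_delay`.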

Applying an arbitrary beamformer (BF) to this microphone array input gives an output power spectral density (PSD) that, as shown in equation (9), is the sum of the PSDs of the direct sound and of the indirect sound, each multiplied by the beamformer's power gain |D_θ(ω)|².

Here P_D(ω) = E[|S_D(ω, t)|²]_t and P_{R,θ}(ω) = E[|S_{R,θ}(ω, t)|²]_t, →w(ω) is the vector of beamformer (BF) filter coefficients, and R(ω) is the spatial correlation matrix of the microphone array input signals, whose (i, j) element is R_ij(ω) = E[X_i(ω, t) X_j*(ω, t)]_t. E[·] denotes expectation.
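The relation between the filter coefficients, the spatial correlation matrix, and the beamformer output power can be sketched as follows for a single frequency. The sketch assumes that the expectation is replaced by an average over frames and that the beamformer output is Y = w^H x, so that its power is w^H R w; the function names are hypothetical.

```python
def spatial_correlation(snapshots):
    """Estimate R with R[i][j] = mean over frames of X_i * conj(X_j).

    `snapshots` is a list of frames, each a list of M complex values.
    """
    n_frames = len(snapshots)
    n_mics = len(snapshots[0])
    return [[sum(x[i] * x[j].conjugate() for x in snapshots) / n_frames
             for j in range(n_mics)]
            for i in range(n_mics)]


def beamformer_output_psd(w, corr):
    """Output power w^H R w of a beamformer with coefficient vector w."""
    n_mics = len(w)
    total = sum(w[i].conjugate() * corr[i][j] * w[j]
                for i in range(n_mics) for j in range(n_mics))
    return total.real  # the imaginary part vanishes for Hermitian R


if __name__ == "__main__":
    frames = [[1 + 1j, 2 - 1j], [0.5j, -1 + 0j]]
    weights = [0.5 + 0j, 0.5 + 0j]
    print(beamformer_output_psd(weights, spatial_correlation(frames)))
```

Computing the power through R is convenient when several beamformers are applied to the same input, because R is estimated once and reused.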

<Direct-to-reverberant ratio estimation using multiple beamformers>
In a sound field where the indirect sound can be assumed to arrive isotropically, the reverberant power P_{R,θ}(ω) in equation (9) can be replaced by a constant ¯P_R(ω) that does not depend on the direction θ, and the output power spectral density is given by equation (10).

Now suppose, as shown in FIG. 6, that there are two beamformers BF0 and BF1 with the same directivity shape whose main beams point in different directions. The second term on the right-hand side of equation (10), ∫_θ |D_θ(ω)|² dθ, is then the same for both, and the outputs of the two beamformers differ only through the first term, that is, through each beamformer's power gain for the direct sound.

Therefore, the direct sound power 25 can be obtained by subtracting the output power spectral density P_1(ω) of beamformer BF1, which steers a null (a point of low directional sensitivity) toward the sound source, from the output power spectral density P_0(ω) of beamformer BF0, which steers its beam toward the sound source.

By this principle, the reverberant sound arriving from the direct sound source direction can be distinguished from the direct sound, and the accuracy of direct-to-reverberant ratio (DRR) estimation improves as a result.
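The subtraction can be checked numerically with a toy model. The sketch below assumes a hypothetical cardioid power pattern for both beamformers (the text does not prescribe any particular pattern), writes each output as the direct term plus an isotropic reverberant term integrated over 0 ≤ θ < 2π, and recovers the direct sound power from the difference of the two outputs divided by the difference of their gains toward the source.

```python
import math


def cardioid_power_gain(theta, steer):
    """Power gain |D_theta|^2 of an assumed cardioid steered toward `steer`."""
    return (0.5 * (1.0 + math.cos(theta - steer))) ** 2


def output_psd(p_direct, p_reverb, theta_d, steer, n=3600):
    """Direct term plus isotropic reverberant term (Riemann sum over theta)."""
    dtheta = 2.0 * math.pi / n
    diffuse_gain = sum(cardioid_power_gain(k * dtheta, steer)
                       for k in range(n)) * dtheta
    return cardioid_power_gain(theta_d, steer) * p_direct + p_reverb * diffuse_gain


def estimate_direct_power(p0, p1, gain0, gain1):
    """Solve P0 - P1 = (gain0 - gain1) * P_D for the direct sound power."""
    return (p0 - p1) / (gain0 - gain1)


if __name__ == "__main__":
    p_direct, p_reverb, theta_d = 2.0, 0.3, 0.7     # hypothetical values
    steer0, steer1 = theta_d, theta_d + math.pi      # beam / null at the source
    p0 = output_psd(p_direct, p_reverb, theta_d, steer0)
    p1 = output_psd(p_direct, p_reverb, theta_d, steer1)
    est = estimate_direct_power(p0, p1,
                                cardioid_power_gain(theta_d, steer0),
                                cardioid_power_gain(theta_d, steer1))
    print("BF0 output alone:", p0)
    print("estimated direct power:", est)
```

The output of BF0 alone exceeds the true direct power because it still contains the reverberant term; the subtraction removes that term because rotating a pattern does not change its integral over all directions.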

FIG. 7 shows an example of the functional configuration of the direct-to-reverberant ratio (DRR) estimation apparatus 100 of Example 1, and FIG. 8 shows its operation flow. The apparatus 100 comprises a microphone array 41, a plurality of frequency domain conversion units 42_1 to 42_M, and a DRR calculation unit 44. Each functional unit other than the microphone array 41 is realized by loading a predetermined program into a computer made up of, for example, a ROM, a RAM and a CPU, and having the CPU execute it.

The microphone array 41 consists of a plurality of microphones m_1, …, m_M. The received signals x_m(n) picked up by the microphones m_1, …, m_M are input to the respective frequency domain conversion units 42_1, …, 42_M, which convert each received signal into a frequency-domain signal (step S42). Each unit samples its received signal x_m(n) at a sampling frequency of, for example, 16 kHz to obtain a digital signal, groups, for example, 256 samples into one frame, applies a discrete Fourier transform to each frame, and outputs the frequency component X_m(ω, t). Here ω is the frequency and t is the frame number. The A/D converter that converts the received signal x_m(n) into a digital signal is omitted.
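The framing and transform of step S42 can be sketched as follows, using the example values from the text (16 kHz sampling, 256 samples per frame). The text does not mention frame overlap or a window function, so this sketch assumes non-overlapping rectangular frames, and it uses a naive discrete Fourier transform for clarity rather than a fast one. The function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 16000  # Hz, example value from the text
FRAME_LEN = 256      # samples per frame, example value from the text


def split_frames(x, frame_len=FRAME_LEN):
    """Cut a signal into consecutive non-overlapping frames (assumption)."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * b * k / n)
                for k in range(n))
            for b in range(n)]


def bin_frequency(b, sample_rate=SAMPLE_RATE, frame_len=FRAME_LEN):
    """Centre frequency in Hz of DFT bin b."""
    return b * sample_rate / frame_len


if __name__ == "__main__":
    tone = [math.cos(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(512)]
    frames = split_frames(tone)
    spectrum = dft(frames[0])
    magnitudes = [abs(v) for v in spectrum[:FRAME_LEN // 2 + 1]]
    peak = magnitudes.index(max(magnitudes))
    print("frames:", len(frames), "peak at", bin_frequency(peak), "Hz")
```

Each frame index t and bin index then give one value X_m(ω, t) per microphone.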

The direct-to-reverberant ratio (DRR) calculation unit 44 takes as input the frequency-domain signals X_m(ω, t) output by the frequency domain conversion units 42_1, …, 42_M and calculates the DRR estimate DRR(ω, t) of the received signals (step S44).

FIG. 9 shows an example of the functional configuration of the direct-to-reverberant ratio (DRR) calculation unit 44. It comprises a received sound power estimation unit 441, a direct sound direction power estimation unit 442, a reverberant sound direction power estimation unit 443, a subtraction unit 444, and a DRR computation unit 445.

The received sound power estimation unit 441 uses the frequency-domain signals X_1(ω, t), …, X_M(ω, t), obtained by converting the signals received by the microphones of the microphone array 41 into the frequency domain, to generate and output a power estimate of the frequency-domain signal corresponding to the received signal. This power estimate may be the power estimate of the frequency-domain signal X_m(ω, t) of any one microphone m (m ∈ {1, …, M}), as in equation (12), or a weighted average of the power estimates of the individual signals X_1(ω, t), …, X_M(ω, t), as in equation (13). In Example 1, the power spectral density P_{X,L}(ω) is obtained as this power estimate.

Here L is the number of frames, and α_m is a non-negative weight for microphone m, set so as to satisfy equation (14). E[·] denotes expectation.
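The two options for the received sound power estimate can be sketched as follows for a single frequency. Equations (12) to (14) are not reproduced in this text, so the sketch assumes that the expectation is approximated by an average of |X_m(ω, t)|² over L frames and that the constraint on the weights α_m is that they are non-negative and sum to one. The function names are hypothetical.

```python
def psd_single(frames_m):
    """Average of |X_m(omega, t)|^2 over the L frames of one microphone."""
    return sum(abs(x) ** 2 for x in frames_m) / len(frames_m)


def psd_weighted(frames_by_mic, weights):
    """Weighted average of the per-microphone estimates.

    Assumes the weights are non-negative and sum to one.
    """
    if any(a < 0 for a in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(a * psd_single(f) for a, f in zip(weights, frames_by_mic))


if __name__ == "__main__":
    mic1 = [1 + 0j, 2j]          # L = 2 frames at one frequency
    mic2 = [2 + 0j, 2 + 0j]
    print(psd_single(mic1))
    print(psd_weighted([mic1, mic2], [0.5, 0.5]))
```

Equal weights 1/M give a plain average over microphones; a single weight of one reduces to the single-microphone estimate.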

The direct sound direction power estimation unit 442 obtains the power estimate P_DD(ω) of a direct sound direction signal. This signal is obtained either by applying to the frequency-domain signals X_1(ω, t), …, X_M(ω, t) a process that passes only the signal components arriving from the direct sound source direction, or by applying such a process to the received signals and converting the result into the frequency domain. The power P_DD(ω) of the direct sound direction signal is the same as P_0(ω) in equation (11) above.

The direct-sound-direction power estimation unit 442 comprises a directivity forming unit 4421 and a power estimation unit 4422. The directivity forming unit 4421 forms a directivity whose beam points in a direction given in advance and outputs the signal that has passed through that directivity. Its directivity is set so that the main beam points in the direct sound direction. The directivity can be formed, for example, by delay-and-sum beamforming as described in Reference 1 (Futoshi Asano, "Array Signal Processing of Sound: Localization, Tracking, and Separation of Sound Sources," Corona Publishing, pp. 70-79).

Denoting the output of the directivity forming unit 4421 by Y_BF(ω, t), the power estimate P_DD(ω) of the direct-sound-direction signal output by the power estimation unit 4422 is given by Eq. (15).

The power spectral density of this output, that is, of the direct-sound-direction signal whose power estimate is P_DD(ω), is expressed by Eq. (16).

Here, |D_{0,θ}(ω)|^2 is the power gain of the beamformer BF0, described with reference to FIG. 4, in the direction θ.
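As an illustration of the directivity forming unit, the power estimate, and the power gain |D_{0,θ}(ω)|^2, the following is a minimal delay-and-sum sketch at a single frequency. It assumes a linear array, far-field plane waves, and a sound speed of 343 m/s, none of which is specified in the text, and all function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def steering_vector(mic_x, theta, omega):
    """Phase factors of a unit plane wave from direction theta (radians)
    at microphones placed at positions mic_x (metres) on a line."""
    return [cmath.exp(-1j * omega * x * math.cos(theta) / SPEED_OF_SOUND)
            for x in mic_x]


def delay_and_sum(X, mic_x, look_theta, omega):
    """Beamformer output Y_BF for one frame: phase-align and average."""
    a = steering_vector(mic_x, look_theta, omega)
    return sum(am.conjugate() * xm for am, xm in zip(a, X)) / len(X)


def power_gain(mic_x, look_theta, arrival_theta, omega):
    """Power gain of the beamformer for a plane wave from arrival_theta."""
    X = steering_vector(mic_x, arrival_theta, omega)
    return abs(delay_and_sum(X, mic_x, look_theta, omega)) ** 2


def direction_power(Y_frames):
    """Power estimate of the beamformer output, averaged over the frames."""
    return sum(abs(y) ** 2 for y in Y_frames) / len(Y_frames)
```

With this normalization the gain in the look direction is 1, and the gain in other directions depends on the array geometry and the frequency.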

The reverberant-sound-direction power estimation unit 443 obtains the power estimate of a reverberant-sound-direction signal. This signal is obtained either by passing mainly the signal components arriving from directions other than that of the direct sound source, using the same directivity shape as the processing in the direct-sound-direction power estimation unit 442 that mainly passes components arriving from the direct sound source direction, or by applying to the received signals processing that mainly passes components arriving from directions other than the direct sound source and then transforming the result into the frequency domain.

Ideally, the reverberant-sound-direction power estimation unit 443 comprises a reverberation directivity forming unit 4431 and a reverberation power estimation unit 4432. The directivity of the reverberation directivity forming unit 4431 is set so that its main beam avoids the direct sound direction, while its directivity shape is set to be the same as that of the directivity forming unit 4421. It is desirable that the two shapes match as closely as possible; such a directivity shape can readily be realized with conventional techniques.

The reverberation power estimation unit 4432 takes as input the reverberant sound picked up while avoiding the direct sound direction and outputs the power estimate P_RD(ω) of the reverberant-sound-direction signal (Eq. (17)). Because the sound is picked up so as to avoid the direct sound direction, setting |D_{1,θD}(ω)|^2 ≪ 1 makes the direct sound component |D_{1,θD}(ω)|^2 P_D(ω) contained in P_RD(ω) sufficiently small.

Here, |D_{1,θ}(ω)|^2 is the power gain of the beamformer BF1, described with reference to FIG. 4, in the direction θ.

The subtraction unit 444 subtracts the power estimate P_RD(ω) of the reverberant-sound-direction signal, output by the reverberation power estimation unit 4432, from the power estimate P_DD(ω) of the direct-sound-direction signal, output by the direct-sound-direction power estimation unit 442, and outputs the resulting direct sound power estimate ^P_D(ω) (Eq. (18)).

The denominator of Eq. (18) is a term that normalizes the direct sound power estimate ^P_D(ω) by the difference between the power gains of the beamformers (BF) of the directivity forming unit 4421 and the reverberation directivity forming unit 4431.
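The subtraction and normalization can be sketched as follows. Eq. (18) itself is not reproduced in this text, so the form used here, the power difference divided by the difference of the two beamformers' power gains toward the source, is an assumption inferred from the description. The function and parameter names are hypothetical.

```python
def direct_sound_power(p_dd, p_rd, gain_bf0_src, gain_bf1_src):
    """Estimate the direct sound power at one frequency bin.

    p_dd: power of the beam steered at the source (direct + reverberant).
    p_rd: power of the beam steered away from the source.
    gain_bf0_src, gain_bf1_src: power gains of the two beamformers in
    the source direction. Assumes both beams, having the same shape,
    pass the same amount of isotropic reverberation.
    """
    denom = gain_bf0_src - gain_bf1_src
    if denom <= 0.0:
        raise ValueError("BF0 must have the larger gain toward the source")
    return (p_dd - p_rd) / denom
```

For example, with a direct power of 2.0, a reverberant contribution of 0.5 in each beam, and gains of 1.0 and 0.01, the inputs are p_dd = 2.5 and p_rd = 0.52, and the function recovers 2.0 up to rounding.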

The DRR computation unit 445 uses the power spectral density P_{X,L}(ω) output by the received-sound power estimation unit 441 and the direct sound power estimate ^P_D(ω) to obtain the direct-to-reverberant ratio (DRR) estimate DRR(ω), which is the ratio of the direct sound power estimate ^P_D(ω) to the power estimate of the reverberant sound (Eq. (19)).

When the received-sound power output by the received-sound power estimation unit 441 is given by Eq. (12), that is, for any single microphone m (m ∈ {1, ..., M}), the DRR can also be estimated by Eq. (20).

Furthermore, a frequency-independent DRR can be estimated by Eqs. (21) and (22). Because the value is obtained for every L frames, it is written DRR(ω); a value obtained per frame and per frequency is written DRR(ω, t).
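A hedged sketch of the DRR calculation follows. The equations are not reproduced in this text, so three points are assumptions: the reverberant power is taken as the received power minus the direct sound power estimate, the frequency-independent value is formed by summing both powers over the bins before taking the ratio, and a small floor guards the logarithm. All names are hypothetical.

```python
import math


def drr_db(p_x, p_d_hat, floor=1e-12):
    """DRR in decibels at one frequency bin.

    Assumption: reverberant power = received power - direct power.
    The floor keeps the ratio positive when an estimate is degenerate.
    """
    p_d = max(p_d_hat, floor)
    p_r = max(p_x - p_d_hat, floor)
    return 10.0 * math.log10(p_d / p_r)


def drr_db_broadband(p_x_bins, p_d_hat_bins, floor=1e-12):
    """Frequency-independent DRR: sum the powers over bins, then take
    the ratio (one possible choice, not confirmed by the source)."""
    return drr_db(sum(p_x_bins), sum(p_d_hat_bins), floor)
```

Omitting the logarithm gives the corresponding estimate not expressed in decibels.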

The DRR estimation method described above is a new method based on the observation that reverberant sound, being a highly diffuse signal, arrives at the microphone array isotropically. Two beamformers with identical directivity shapes, realized by the microphone array, yield one signal containing both direct and reverberant sound and another containing only reverberant sound. This allows the direct and indirect sound components to be separated correctly, which in turn improves the accuracy of the DRR estimate.

The DRR estimates of Eqs. (19), (20), (21), and (22) may also be expressed without conversion to decibels, as follows.

[Modification 1]
FIG. 10 shows an example functional configuration of a DRR calculation unit 44′, in which the functional configuration of the reverberant-sound-direction power estimation unit 443 of the DRR calculation unit 44 is modified. The DRR calculation unit 44′ obtains the reverberant-sound-direction power P_RD(ω) by averaging the reverberant-sound-direction powers P_RD1(ω) to P_RDN(ω) of a plurality of (two or more) look directions.

The reverberant-sound-direction power estimation unit 443′ of the DRR calculation unit 44′ differs from that of the DRR calculation unit 44 in that it comprises two or more reverberation directivity forming units 4431_1 to 4431_N, two or more reverberation power estimation units 4432_1 to 4432_N, and a reverberation direction power calculation unit 4433. The main beam of the beamformer of the reverberation directivity forming unit 4431_1 points, for example, in the direction θ_1 as seen from the reference point. Likewise, the main beam of the unit 4431_2 points in the direction θ_2, and that of the unit 4431_N in the direction θ_N.

FIG. 11 schematically shows the directivity shapes of the reverberation directivity forming units 4431_1 to 4431_N. The shapes are identical and differ only in the main beam direction θ. From the signals that have passed through the directivities of the units 4431_1 to 4431_N, the reverberation power estimation units 4432_1 to 4432_N connected to them obtain the reverberant sound power estimates P_RD1(ω) to P_RDN(ω) for the respective look directions.

The reverberation direction power calculation unit 4433 computes the reverberant-sound-direction power P_RD(ω) as a weighted average of the power estimates P_RD1(ω) to P_RDN(ω) (Eq. (23)).

Here, β_n is a non-negative weighting coefficient set in advance so as to satisfy Eq. (24). Because the reverberant-sound-direction power P_RD(ω) obtained in this way is an average over the reverberant-sound-direction powers of several directions, its accuracy is improved, and so is the accuracy of the DRR estimate DRR(ω).
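The weighted average of Modification 1 can be sketched as below. It assumes that Eq. (24) requires the weights β_n to sum to 1, which is not reproduced in this text. The function name and the default of equal weights are hypothetical.

```python
def reverberant_direction_power(p_rd_list, betas=None):
    """Weighted average of N >= 2 reverberant-direction power estimates.

    Assumption: the weights beta_n are non-negative and sum to 1.
    With betas=None, equal weights 1/N are used.
    """
    n = len(p_rd_list)
    if n < 2:
        raise ValueError("two or more look directions are required")
    if betas is None:
        betas = [1.0 / n] * n
    if len(betas) != n:
        raise ValueError("one weight per look direction is required")
    if any(b < 0 for b in betas) or abs(sum(betas) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(b * p for b, p in zip(betas, p_rd_list))
```

Averaging over several look directions reduces the influence of any single direction in which the reverberation deviates from the isotropic assumption.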

[Modification 2]
FIG. 12 shows an example functional configuration of a DRR calculation unit 44″, in which the functional configuration of the reverberant-sound-direction power estimation unit 443 of the DRR calculation unit 44 is changed. The DRR calculation unit 44″ allows the main beam directions of the beamformers of the directivity forming unit 4421 and the reverberation directivity forming unit 4431 to be set automatically.

The DRR calculation unit 44″ differs from the DRR calculation unit 44 in that it comprises a sound source direction estimation unit 446 and a beamformer generation unit 447. The sound source direction estimation unit 446 takes as input the frequency-domain signals X_1(ω, t), ..., X_M(ω, t), obtained by transforming the signals picked up by the microphones of the microphone array 41 into the frequency domain, estimates the direction of the sound source, and outputs a sound source direction signal. The direction can be obtained with conventional techniques, for example from the phase differences among X_1(ω, t), ..., X_M(ω, t). Sound source direction estimation is described, for example, in Section 7.2 of Reference 2 (Ohga, Yamasaki, and Kaneda, "Acoustic Systems and Digital Signal Processing," published by the Institute of Electronics, Information and Communication Engineers).

The beamformer generation unit 447 takes the sound source direction signal as input and generates a beamformer BF0, whose main beam points in the sound source direction, and a beamformer BF1, whose main beam is set so as to avoid the sound source direction. It outputs BF0 to the direct-sound-direction power estimation unit 442 and BF1 to the reverberant-sound-direction power estimation unit 443. The directivity forming unit 4421 of the direct-sound-direction power estimation unit 442 applies BF0 and outputs the signal Y_BF(ω, t) described above. The reverberant-sound-direction power estimation unit 443 applies BF1 and outputs the reverberant-sound-direction power P_RD(ω).

In this way, the DRR calculation unit 44″ can automatically set the directivities of the direct-sound-direction power estimation unit 442 and the reverberant-sound-direction power estimation unit 443.

The operation of the DRR calculation units 44, 44′, and 44″ has been described for the frequency domain, but the technical idea of the invention, including the modifications, can be applied to time-domain operation as is. The idea of the DRR calculation unit 44″ can also be applied to the DRR calculation unit 44′.

FIG. 13 shows an example functional configuration of the sound source distance estimation apparatus 400 of Embodiment 2. The apparatus 400 comprises the DRR estimation apparatus 100, a distance-DRR database (hereinafter, distance-DRR DB) 45, and a distance determination unit 46. The DRR estimation apparatus 100 is the same as that described in Embodiment 1. Each functional component other than the microphone array 41 is realized by loading a predetermined program into a computer comprising, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program.

Information on the relationship between distance and DRR is recorded in advance in the distance-DRR DB 45. This information is a function d = f(DRR) expressing that relationship, for example a function obtained by linearly interpolating pairs (d_1, DRR_1), (d_2, DRR_2), ... of distances and DRRs measured experimentally, or an approximating function fitted to those pairs. A function f(DRR) is described, for example, in the reference M. Tohyama et al., "The Nature and Technology of Acoustic Space," Academic Press, 1995.

The distance determination unit 46 refers to the DRR estimate input from the DRR estimation apparatus 100 and to the relationship between distance and DRR recorded in the distance-DRR DB 45, and outputs the sound source distance estimate ^d corresponding to that DRR.

When the pairs (d_1, DRR_1), (d_2, DRR_2), ... themselves are stored in the distance-DRR DB 45, the sound source distance estimate ^d is obtained and output by the following three steps.

Step 1: Among DRR_1, DRR_2, ... stored in the distance-DRR DB 45, find the two values DRR_m and DRR_n adjacent to the DRR estimate obtained by the DRR estimation apparatus 100.

Step 2: Look up, in the distance-DRR DB 45, the distances d_m and d_n corresponding to DRR_m and DRR_n, respectively.

Step 3: Obtain the sound source distance estimate ^d from the distances d_m and d_n by linear interpolation, as shown in Eq. (25).
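The three steps above can be sketched as a table lookup with linear interpolation. Eq. (25) is not reproduced in this text, so standard linear interpolation between the two bracketing pairs is assumed. Returning the nearest stored distance when the DRR lies outside the table is an additional assumption, and the function name is hypothetical.

```python
def estimate_distance(drr, table):
    """Estimate the source distance from a DRR value.

    table: list of (distance, DRR) pairs measured in advance.
    """
    pts = sorted(table, key=lambda p: p[1])  # ascending DRR
    if len(pts) < 2:
        raise ValueError("at least two (distance, DRR) pairs are required")
    # Outside the measured range: clamp to the nearest entry (assumption).
    if drr <= pts[0][1]:
        return pts[0][0]
    if drr >= pts[-1][1]:
        return pts[-1][0]
    # Steps 1 and 2: find the bracketing DRRs and their distances.
    for (d_m, drr_m), (d_n, drr_n) in zip(pts, pts[1:]):
        if drr_m <= drr <= drr_n:
            if drr_n == drr_m:
                return d_m
            # Step 3: linear interpolation between d_m and d_n.
            frac = (drr - drr_m) / (drr_n - drr_m)
            return d_m + frac * (d_n - d_m)
    raise AssertionError("unreachable: drr lies inside the table range")
```

With the illustrative table [(1.0, 6.0), (2.0, 0.0), (4.0, -6.0)], a DRR of 3.0 dB lies halfway between the entries for 1 m and 2 m and yields 1.5 m.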

When estimating the sound source distance estimate ^d from the distances d_m and d_n, the DRR(ω) of a specific frequency (or frequency band) may be used. In that case, Eq. (25) becomes the following expression.

When the function d = f(DRR) is stored in the distance-DRR DB 45, the distance determination unit 46 computes the sound source distance estimate ^d from the DRR estimate input from the DRR estimation apparatus 100 and outputs it.

FIG. 14 shows an example functional configuration of the noise removal apparatus 700 of the invention, and FIG. 15 shows its operation flow. The noise removal apparatus 700 comprises the DRR estimation apparatus 100 described in Embodiment 1, a processing target signal generation unit 72, a target signal adjustment unit 73, and an inverse frequency-domain transform unit 74.

The processing target signal generation unit 72 takes as input the frequency-domain signals X_m(ω, t) output by the frequency-domain transform units 42_1 to 42_M in the DRR estimation apparatus 100 and outputs a processing target signal X(ω, t) (step S72). The processing target signal X(ω, t) is obtained by combining the signals X_m(ω, t), for example with an adder (not shown). Each X_m(ω, t) may be multiplied by a weight before the addition.

FIG. 16 shows a more specific example of the functional configuration of the processing target signal generation unit 72. The unit comprises weight multiplication means 721_1 to 721_M and addition means 722. The weight multiplication means 721_1 to 721_M multiply the frequency components X_1(ω, t), ..., X_M(ω, t) of the signals x_m(n) received by the M microphones by the weighting coefficients w_m(ω).

As for the weights used by the weight multiplication means 721_1 to 721_M, when the M microphones are omnidirectional, for example, setting w_m = 1/M averages all the frequency components X_1(ω, t), ..., X_M(ω, t) and thereby stabilizes the processing target signal X(ω, t). When the M microphones are directional, setting w_1 = 1 and w_m = 0 (m ∈ {2, ..., M}) uses only the signal of one specific microphone. Alternatively, if beamforming filter coefficients are used as the weights, obtained for example by a method described in Reference 2 (Ohga, Yamasaki, and Kaneda, "Acoustic Systems and Digital Signal Processing," published by the Institute of Electronics, Information and Communication Engineers), an arbitrary directivity can be formed with the microphone array.

The addition means 722 adds all the weighted frequency components X_1(ω, t), ..., X_M(ω, t) and outputs the processing target signal X(ω, t).

The target signal adjustment unit 73 takes as input the DRR estimate DRR(ω) output by the DRR estimation apparatus 100 and the processing target signal X(ω, t) output by the processing target signal generation unit 72, and generates a processed signal Y(ω, t) by adjusting the amplitude of X(ω, t) (step S73). The inverse frequency-domain transform unit 74 transforms the processed signal Y(ω, t) into a time-domain signal y(n) (step S74).

The target signal adjustment unit 73 comprises, for example, distance calculation means 731, filter formation means 732, and multiplication means 733. The distance calculation means 731 holds a function d = f(DRR) expressing the relationship between the DRR estimate and the distance between the microphone array 41 and the sound source, and computes the sound source distance estimate ^d corresponding to the input DRR estimate (distance calculation step S731).

As shown in Eq. (27), the filter formation means 732 forms a filter that emphasizes the time-frequency components for which the sound source distance estimate ^d lies between two thresholds of different magnitude, d_f and d_n. The filter thus emphasizes only sound sources located in the band-shaped region between the two distances.

Here, with respect to t and ω, the same G(ω, t) is applied to all of the L frames and all of the frequencies over which the received-sound power estimation unit 441, the direct-sound-direction power estimation unit 442, and the reverberant-sound-direction power estimation unit 443 of the DRR estimation apparatus 100 performed averaging. The values of G(ω, t) in Eq. (27) need not be exactly 1 and 0; sufficiently different values such as 0.9 and 0.1 may be used instead.

The multiplication means 733 multiplies the processing target signal X(ω, t) by the filter G(ω, t) to generate the processed signal Y(ω, t). In the processed signal Y(ω, t), the sound of sources located between the two distances, that is, within a specific range of distances from the microphone array 41, is therefore emphasized or suppressed. The processed signal Y(ω, t) is transformed into the time-domain signal y(n) by the inverse frequency-domain transform unit 74.
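The filter formation and multiplication can be sketched as a simple distance gate. Eq. (27) is not reproduced in this text, so three points are assumptions: d_n is read as the near threshold and d_f as the far threshold, the boundaries are inclusive, and the gains default to 1 and 0. All names are hypothetical.

```python
def distance_gate(d_hat, d_near, d_far, pass_gain=1.0, stop_gain=0.0):
    """Filter gain G for one block of frames and frequencies.

    Returns pass_gain when the estimated distance d_hat lies between
    the two thresholds, and stop_gain otherwise.
    """
    if d_near > d_far:
        raise ValueError("d_near must not exceed d_far")
    return pass_gain if d_near <= d_hat <= d_far else stop_gain


def apply_gate(X_block, d_hat, d_near, d_far, pass_gain=1.0, stop_gain=0.0):
    """Multiply every X(w, t) in the block by the same gain G."""
    g = distance_gate(d_hat, d_near, d_far, pass_gain, stop_gain)
    return [g * x for x in X_block]
```

Swapping the two gains, for example pass_gain=0.1 and stop_gain=0.9, suppresses sources inside the distance range instead of emphasizing them.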

The invention is not limited to the embodiments described above. For example, Embodiment 2 may be carried out using DRR(ω), obtained for each frequency ω, as the DRR estimate. Specific examples of DRR(ω) are the DRR estimates of Eqs. (19) and (20) or Eqs. (21) and (22). In this case, ^d(ω) is obtained by substituting the estimate DRR(ω) into a relation d(ω) = f(DRR(ω)) prepared in advance for each frequency ω, and the filter is formed as in Eq. (28), where d_f(ω) and d_n(ω) are thresholds of different magnitude prepared in advance.

Functional components of the sound source distance measurement apparatus, the DRR estimation apparatus, or the noise removal apparatus may be realized by external devices. For example, the sound source distance measurement apparatus 400 or the noise removal apparatus 700 may omit the microphone array and provide the same functions when connected to an external microphone array. Similarly, the noise removal apparatus 700 may omit the frequency-domain transform units and the inverse frequency-domain transform unit and provide the same functions by using external ones.
[Experimental results]
To confirm the effect of the invention, a simulation experiment using the image method was carried out.

FIG. 17 shows the simulation conditions. FIG. 17 is a plan view of the assumed room, which is 4 m wide, 6 m deep, and 2.7 m high. The sound absorption coefficient of the walls was set to α = 0.05 (reverberation time T_60 = 1.8 s). A microphone array of eight microphones arranged in a circle was used, with its reference point at a height of 1.5 m. The sound source was also at a height of 1.5 m.

FIG. 18 compares, under these conditions, the measured DRR obtained from the impulse response, DRR_actual (□), with the estimates of the invention (▽) and of the conventional method (○). The DRR estimated by the method of the invention (▽) is closer to DRR_actual (□) than that of the conventional method, with an improvement of about 3 dB particularly when the sound source is far away.

In general, the power of the indirect component is constant regardless of the distance of the sound source, whereas the power of the direct component is inversely proportional to the square of the distance. For a distant source, the direct component is therefore very small compared with the indirect component, and even a small error in the estimated direct component strongly affects the DRR estimate. The method of the invention obtains the power of the indirect sound while suppressing, through directivity control of the microphone array, the influence of signals arriving from the sound source direction as far as possible. This enables a more accurate estimate and allows the DRR to be estimated correctly for more distant sources.
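The distance dependence stated above implies that, under this idealized model, the DRR falls by about 6 dB for each doubling of the distance. The sketch below illustrates this. The reference distance and reference DRR are hypothetical parameters chosen for illustration, not values from the experiment.

```python
import math


def drr_at_distance(drr_ref_db, d_ref, d):
    """DRR in decibels at distance d under the idealized model:
    direct power proportional to 1/d^2, reverberant power constant.

    drr_ref_db: DRR at the reference distance d_ref (hypothetical).
    """
    if d <= 0.0 or d_ref <= 0.0:
        raise ValueError("distances must be positive")
    return drr_ref_db - 20.0 * math.log10(d / d_ref)
```

Starting from 0 dB at 1 m, this model gives about -6 dB at 2 m and -20 dB at 10 m, which shows how quickly the direct component becomes small relative to the reverberation.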

As described above, the new DRR estimation method of the invention assumes that reverberant sound, being a highly diffuse signal, arrives at the microphone array isotropically. It uses two beamformers with identical directivity shapes realized by the microphone array: one whose main beam points in the direction of the direct sound source and one whose main beam avoids that direction. These allow the direct component arriving from the sound source direction and the indirect component to be separated correctly, which improves the accuracy of the DRR estimate.

Because the sound source distance estimation apparatus 400 uses an accurate DRR estimate, it can accurately estimate the distance to the sound source. Likewise, because the noise removal apparatus 700 uses an accurate DRR estimate, it can accurately pick up only the sound sources located at a specific distance.

Equations (19) to (22) show examples in which the DRR estimate is expressed in decibels, but the DRR estimate may of course be obtained simply as a ratio of power spectral densities. The DRR value given by those equations multiplied by some constant, or the reciprocal of that value multiplied by a constant, may also be used as the DRR estimate. The constant may instead be a value of a monotonically increasing function. In short, the DRR estimate of the invention is not limited to the forms of Eqs. (19) to (22).

The processes described for the above methods and apparatuses need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the apparatus executing them or as needed.

When the processing means of the above apparatuses are realized by a computer, the processing content of the functions that each apparatus should have is described by a program. Executing this program on the computer realizes the processing means of each apparatus on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk device, flexible disk, or magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be realized in hardware.

Claims

A sound source distance estimation apparatus comprising:
a received sound power estimation unit that obtains a power estimate of a frequency domain signal, using the frequency domain signal obtained by converting into the frequency domain the received sound signals picked up by a plurality of microphones constituting a microphone array;
a direct sound direction power estimation unit that obtains either a power estimate of a direct sound direction signal obtained by applying, to the frequency domain signal, processing that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a direct sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation unit that obtains either a power estimate of a reverberant sound direction signal obtained by processing that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the processing of the direct sound direction power estimation unit that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a reverberant sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction unit that outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct ratio calculation unit that uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct ratio estimate representing the ratio of the direct sound power estimate to the power estimate of the reverberant sound direction signal;
a distance-direct ratio database that records the relationship between direct ratio estimates and distances; and
a distance determination unit that takes the direct ratio estimate as input, refers to the distance-direct ratio database, and outputs the sound source distance estimate corresponding to that direct ratio estimate.

A direct ratio estimation apparatus comprising:
a received sound power estimation unit that obtains a power estimate of a frequency domain signal, using the frequency domain signal obtained by converting into the frequency domain the received sound signals picked up by a plurality of microphones constituting a microphone array;
a direct sound direction power estimation unit that obtains either a power estimate of a direct sound direction signal obtained by applying, to the frequency domain signal, processing that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a direct sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation unit that obtains either a power estimate of a reverberant sound direction signal obtained by processing that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the processing of the direct sound direction power estimation unit that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a reverberant sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction unit that outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal; and
a direct ratio calculation unit that uses the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct ratio estimate representing the ratio of the direct sound power estimate to the power estimate of the reverberant sound direction signal.

A noise removal apparatus comprising:
the direct ratio estimation apparatus according to claim 2;
a processing target signal generation unit that generates a processing target signal from the frequency domain signals output by the plurality of frequency domain conversion units in the direct ratio estimation apparatus;
a target signal adjustment unit that receives the direct ratio estimate output by the direct ratio estimation apparatus and the processing target signal, and generates a processed signal in which the amplitude of the processing target signal is adjusted according to the direct ratio estimate; and
an inverse frequency domain conversion unit that converts the processed signal into a time domain signal.

A sound source distance estimation method comprising:
a received sound power estimation step of obtaining a power estimate of a frequency domain signal, using the frequency domain signal obtained by converting into the frequency domain the received sound signals picked up by a plurality of microphones constituting a microphone array;
a direct sound direction power estimation step of obtaining either a power estimate of a direct sound direction signal obtained by applying, to the frequency domain signal, processing that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a direct sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation step of obtaining either a power estimate of a reverberant sound direction signal obtained by processing that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the processing in the direct sound direction power estimation step that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a reverberant sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction step of outputting a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal;
a direct ratio calculation step of using the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct ratio estimate representing the ratio of the direct sound power estimate to the power estimate of the reverberant sound direction signal; and
a distance determination step of taking the direct ratio estimate as input, referring to the relationship between direct ratio estimates and distances recorded in a distance-direct ratio database, and outputting the sound source distance estimate corresponding to the input direct ratio estimate.

A direct ratio estimation method comprising:
a received sound power estimation step of obtaining a power estimate of a frequency domain signal, using the frequency domain signal obtained by converting into the frequency domain the received sound signals picked up by a plurality of microphones constituting a microphone array;
a direct sound direction power estimation step of obtaining either a power estimate of a direct sound direction signal obtained by applying, to the frequency domain signal, processing that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a direct sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from the direct sound source direction;
a reverberant sound direction power estimation step of obtaining either a power estimate of a reverberant sound direction signal obtained by processing that mainly passes signal components arriving from directions other than the direct sound source direction, with the same directivity shape as the processing in the direct sound direction power estimation step that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a reverberant sound direction signal obtained by converting into the frequency domain a signal produced by applying, to the received sound signal, processing that mainly passes signal components arriving from directions other than the direct sound source direction;
a subtraction step of outputting a direct sound power estimate obtained by subtracting the power estimate of the reverberant sound direction signal from the power estimate of the direct sound direction signal; and
a direct ratio calculation step of using the power estimate of the frequency domain signal and the power estimate of the reverberant sound direction signal to obtain a direct ratio estimate representing the ratio of the direct sound power estimate to the power estimate of the reverberant sound direction signal.

A noise removal method comprising:
the direct ratio estimation method according to claim 5;
a processing target signal generation step of generating a processing target signal from the plurality of frequency domain signals obtained in the direct ratio estimation method;
a target signal adjustment step of receiving the direct ratio estimate obtained by the direct ratio estimation method and the processing target signal, and generating a processed signal in which the amplitude of the processing target signal is adjusted according to the direct ratio estimate; and
an inverse frequency domain conversion step of converting the processed signal into a time domain signal.

A program for causing a computer to function as the sound source distance estimation apparatus according to claim 1, the direct ratio estimation apparatus according to claim 2, or the noise removal apparatus according to claim 3.