JP2985982B2 - Sound source direction estimation method

Info

Publication number: JP2985982B2
Application number: JP3249411A
Authority: JP
Inventors: 豊金田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1991-09-27
Filing date: 1991-09-27
Publication date: 1999-12-06
Anticipated expiration: 2014-12-06
Also published as: JPH0587903A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a sound source direction estimation method that estimates the direction or position of a sound source based on the cross-correlation function between signals observed by a plurality of microphones.

[0002]

[Prior Art] Sound source direction estimation is a basic technology required for a variety of purposes, for example pointing a monitoring camera toward the direction in which an abnormal sound occurred in a remote monitoring system, pointing a camera toward the speaker in a video conference, or tracking the flight path of an aircraft outdoors.

[0003] The most basic conventional method of sound source direction estimation is based on the time difference between the signals observed by a plurality of microphones. FIG. 7A illustrates this. A sound wave with wavefront 3 arrives at a first microphone 1 and a second microphone 2 from incident direction 4. Let the output signal 5 of the first microphone 1 be x(t) and the output signal 6 of the second microphone 2 be y(t). In the situation shown in the figure, the sound wave is received first by the first microphone 1 and, slightly later, by the second microphone 2. If the distance between the two microphones is d and the arrival direction of the sound wave is the angle θ shown in the figure, the delay (time difference) with which the wave reaches the second microphone 2 is the time τ0 that the sound wave needs to travel the distance d · sin θ. With the speed of sound denoted by c, this gives

$$\tau_0 = \frac{d \sin\theta}{c} \tag{1}$$

If the sound wave arrives from a single direction, the output signal y(t) of the second microphone 2 can be expressed in terms of the output signal x(t) of the first microphone 1 as y(t) = x(t − τ0). If τ0 can be obtained from the two signals x(t) and y(t), then from equation (1) the arrival direction of the sound wave is given by

$$\theta = \sin^{-1}\!\left(\frac{c\,\tau_0}{d}\right) \tag{2}$$
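As a rough illustration of equations (1) and (2), the following sketch converts between an arrival angle and a time difference. The function names, the microphone spacing of 0.30 m, and the speed of sound of 343 m/s are assumptions chosen for the example and do not come from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def delay_from_angle(d, theta_rad, c=SPEED_OF_SOUND):
    """Equation (1): tau0 = d * sin(theta) / c."""
    return d * math.sin(theta_rad) / c


def angle_from_delay(d, tau0, c=SPEED_OF_SOUND):
    """Equation (2): theta = asin(c * tau0 / d)."""
    ratio = c * tau0 / d
    # Clamp to [-1, 1] so that small numerical overshoot does not raise an error.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


d = 0.30                      # hypothetical microphone spacing in metres
theta = math.radians(30.0)    # hypothetical arrival angle
tau0 = delay_from_angle(d, theta)
recovered_deg = math.degrees(angle_from_delay(d, tau0))
print(f"tau0 = {tau0 * 1e6:.1f} us, recovered angle = {recovered_deg:.1f} deg")
```

The clamp is a defensive choice for measured delays, where noise can push c · τ0 / d slightly outside the domain of the inverse sine.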

[0004] The time difference τ0 of the sound wave can be obtained by computing the cross-correlation function φxy(τ) of the two signals x(t) and y(t) and taking the value of τ at which it is maximal. Here the cross-correlation function of discretized signals (t an integer) is defined as

$$\phi_{xy}(\tau) = \sum_t x(t)\, y(t+\tau) \tag{3}$$

where the sum is taken over t (for continuous signals, the sum is replaced by an integral). Using the relation y(t) = x(t − τ0), this becomes

$$\phi_{xy}(\tau) = \sum_t x(t)\, x(t+\tau-\tau_0) = \phi_{xx}(\tau-\tau_0) \tag{4}$$

where the sum is again over t and φxx(τ) is the autocorrelation function of x(t), which, as is well known, takes its maximum at τ = 0. It follows that φxy(τ) takes its maximum at τ = τ0. FIG. 7B shows, for the case where the sound wave is a pulse, the signals x(t) and y(t) and the cross-correlation function φxy(τ) computed from them. φxy(τ) has a clear maximum at τ = τ0.
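The delay search described by equations (3) and (4) can be sketched as follows. This is a minimal direct-sum illustration, not an optimized implementation; the function names, the lag range, and the pulse test signal are assumptions made for the example.

```python
def cross_correlation(x, y, max_lag):
    """Equation (3): phi_xy(tau) = sum over t of x(t) * y(t + tau).

    Returns a dict mapping each integer lag in [-max_lag, max_lag] to its value.
    Samples outside the recorded range are treated as zero (an assumption).
    """
    phi = {}
    for tau in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(x)):
            if 0 <= t + tau < len(y):
                total += x[t] * y[t + tau]
        phi[tau] = total
    return phi


def estimate_delay(x, y, max_lag):
    """Return the lag at which the cross-correlation is largest."""
    phi = cross_correlation(x, y, max_lag)
    return max(phi, key=phi.get)


# Pulse received at sample 10 by microphone 1 and three samples later by microphone 2.
true_delay = 3
x = [0.0] * 32
y = [0.0] * 32
x[10] = 1.0
y[10 + true_delay] = 1.0

estimated = estimate_delay(x, y, max_lag=8)
print("estimated delay in samples:", estimated)
```

The estimated lag is in samples; dividing by the sampling rate gives τ0 in seconds for use in equation (2).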

[0005]

[Problems to Be Solved by the Invention] This conventional method is known to work well (the maximum occurs at approximately τ = τ0) even when sound waves other than the one emitted by the target source, or reflected sounds, are present, provided their power is small. In a room sound field in particular, however, the power of reflected sound is often large and strongly affects the conventional method. This is explained with reference to FIG. 7C.

[0006] FIG. 7C shows, for the case where the sound wave is a pulse and there is a single reflection, the signals x(t) and y(t) and the cross-correlation function φxy(τ) computed from them; a reflected sound 8 is received in addition to the direct sound 7. The direct sound 7 is the sound that reaches the microphone directly from the source, and its arrival direction coincides with the source direction. The reflected sound 8, on the other hand, is sound emitted by the source that reaches the microphone after being reflected by a wall or similar surface, so its arrival direction generally differs from the source direction. Consequently, only the arrival time difference of the direct sound 7 carries information about the source direction.

[0007] In the example of FIG. 7C, the direct sound 7 and the reflected sound 8 are contained in the two microphone output signals x(t) and y(t) with different time differences τ0 and τ1, respectively. Only the time difference τ0 of the direct sound 7 carries information about the source direction. The cross-correlation function φxy(τ) computed from x(t) and y(t) is then as shown in the same figure. Comparison with FIG. 7B shows that local maxima also arise at several points other than τ = τ0, so that the value of τ giving the maximum becomes ambiguous. Moreover, the number of points giving local maxima increases in proportion to the square of the number of reflected sounds. In a room sound field containing many high-power reflections, it is therefore difficult to determine τ0 and estimate the source direction by the conventional method.
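The growth in spurious peaks can be seen in a small numerical sketch. With two arrivals in each signal (one direct sound and one reflection), the cross-correlation contains 2 × 2 = 4 peaks, only one of which lies at τ0. The pulse positions and amplitudes below are invented for illustration, and the helper names are assumptions.

```python
def cross_correlation(x, y, max_lag):
    """phi_xy(tau) = sum over t of x(t) * y(t + tau); x and y have equal length."""
    n = len(x)
    return {
        tau: sum(x[t] * y[t + tau] for t in range(n) if 0 <= t + tau < n)
        for tau in range(-max_lag, max_lag + 1)
    }


def pulses(length, arrivals):
    """Build a signal that is zero except at the given (sample, amplitude) pairs."""
    signal = [0.0] * length
    for t, amplitude in arrivals:
        signal[t] = amplitude
    return signal


# Direct sound: time difference tau0 = 3 samples.
# Reflection:   time difference tau1 = -2 samples, amplitude 0.8 (hypothetical).
x = pulses(64, [(10, 1.0), (20, 0.8)])
y = pulses(64, [(13, 1.0), (18, 0.8)])

phi = cross_correlation(x, y, max_lag=12)
peaks = sorted(tau for tau, value in phi.items() if value > 0.0)
print("lags with non-zero correlation:", peaks)
```

In this toy case the peak at τ0 is still the largest, but the cross terms between the direct sound and the reflection are nearly as large, and each further reflection adds more of them.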

[0008] An object of the present invention is to solve the problems of the conventional sound source direction estimation method described above and to provide a novel sound source direction estimation method that estimates the source direction well even in a room sound field with many reflections.

[0009]

[Means for Solving the Problems] In a method for estimating a sound source direction based on the cross-correlation function between signals observed by a plurality of microphones, the invention of claim 1 is characterized in that peak hold processing is performed on each of the output signals of the plurality of microphones. The influence of reflected sound is thereby masked, and the time difference of the direct sound is estimated well.

[0010] According to the invention of claim 2, the output of each microphone is divided into a plurality of frequency components, peak hold processing is performed on each of the divided components, a cross-correlation function is calculated for each pair of corresponding frequency components of the processed signals, these cross-correlation functions are combined by weighted averaging, and the sound source direction is estimated from the result. According to the invention of claim 3, in the invention of claim 1 or 2, the peak-hold-processed signals are logarithmically processed, and the cross-correlation function is obtained from the logarithmically processed signals.

[0011]

[Operation] First, the effect of the peak hold processing is described. Information about the source direction is contained only in the time difference τ0 of the direct sound. As shown in FIG. 7C, however, when a reflected sound with a time difference τ1 (≠ τ0) is added to the signals x(t) and y(t), the cross-correlation function φxy(τ) no longer has a clear maximum at τ0. If peak hold processing (processing that holds and outputs the maximum input signal power up to each time point) is applied to the signals x(t) and y(t) of FIG. 7C, the result is the signals x(t) and y(t) shown in FIG. 1A. As FIG. 1A shows, each signal is held at its peak value once the direct sound 7 has been received; because the reflected sound 8 that arrives later has less power than the direct sound 7, it is masked and can no longer be observed. When time difference processing

$$x(t) \leftarrow x(t) - x(t-1) \tag{5}$$

(or differentiation) is applied to the peak-hold-processed signals x(t) and y(t), only the portions where the waveform changes (increases) are extracted, and the result is identical to the reflection-free signals x(t) and y(t) shown in FIG. 7B. The cross-correlation function computed from these signals is therefore identical to the φxy(τ) shown in FIG. 7B and has a clear maximum at τ0.
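The peak hold and time difference steps can be sketched as below. The input is assumed to be a non-negative power sequence, the hold is assumed to start from zero, and x(−1) is taken as zero in equation (5); the function names and the test values are illustrative only.

```python
def peak_hold(power):
    """Hold and output the maximum input power seen up to each time point."""
    held = []
    current = 0.0  # assumed initial value; power is non-negative
    for p in power:
        current = max(current, p)
        held.append(current)
    return held


def time_difference(signal):
    """Equation (5): x(t) <- x(t) - x(t-1), with x(-1) assumed to be 0."""
    return [signal[0]] + [signal[t] - signal[t - 1] for t in range(1, len(signal))]


# Direct sound of power 1.0 at sample 4, weaker reflection of power 0.5 at sample 9.
power = [0.0] * 16
power[4] = 1.0
power[9] = 0.5

held = peak_hold(power)
onsets = time_difference(held)
print("non-zero samples after processing:",
      [(t, v) for t, v in enumerate(onsets) if v != 0.0])
```

After both steps only the direct sound at sample 4 remains, which matches the reflection-free situation of FIG. 7B described in the text.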

[0012] Next, the effect of the logarithmic processing is described. A reflected sound travels a longer path from the source to the microphone than the direct sound and is partly absorbed by the wall on reflection, so its power is lower than that of the direct sound; this is why the peak hold processing described above is effective. In an actual sequence of room reflections, however, several high-power early reflections may arrive at almost the same time, and as a result a reflected sound (more precisely, a superposition of several reflections) with more power than the direct sound is often observed. FIG. 1B shows the result of peak hold processing when such a high-power reflected sound arrives. As the figure shows, the influence 9 of such a reflection cannot be removed by peak hold processing alone. However, because the power of the reflected sound is at most a few times that of the direct sound, this influence is reduced by logarithmic processing. For example, suppose that the background (stationary) noise has a power of 1, that a direct sound with a power of 1000 arrives, and that a reflected sound with a power of 2000 follows. After logarithmic processing, the background noise is 0 dB, the direct sound 30 dB, and the reflected sound 33 dB. On a linear scale the reflected sound is twice as large as the direct sound, but after logarithmic processing it is only 1.1 times as large, so its influence is numerically reduced. FIG. 1C shows the result of applying logarithmic processing to the signal of FIG. 1B. The influence 9 of the reflected sound in FIG. 1C is smaller than in FIG. 1B, which shows the effectiveness of the logarithmic processing.
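The numerical example in this paragraph can be checked with a few lines. The sketch assumes the usual power decibel definition, 10 · log10(power / reference), with the background noise power of 1 as the reference; the function name is hypothetical.

```python
import math


def to_db(power, reference=1.0):
    """Power level in decibels relative to the reference power."""
    return 10.0 * math.log10(power / reference)


noise, direct, reflected = 1.0, 1000.0, 2000.0  # powers from the example in the text

linear_ratio = reflected / direct
db_ratio = to_db(reflected) / to_db(direct)

print(f"noise {to_db(noise):.1f} dB, direct {to_db(direct):.1f} dB, "
      f"reflected {to_db(reflected):.1f} dB")
print(f"linear ratio {linear_ratio:.1f}, ratio after logarithm {db_ratio:.2f}")
```

The ratio of reflected to direct sound drops from 2 on the linear scale to about 1.1 after the logarithm, as stated above.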

[0013] Band division processing is also effective in that it increases the opportunities to observe the direct sound. FIG. 2A is an example of the time-frequency spectrum of a speech signal displayed as contour lines. Relative to the onset time 10 of this speech, the time 11 at which its 1 kHz to 2 kHz frequency components begin is slightly delayed. Thus, when a speech signal or the like is observed separately in its frequency components, the onset time differs from component to component. The onset of the 1 kHz to 2 kHz component also contains direct sound information, but when the signal is viewed over the full band this onset is masked by the 0 to 1 kHz component and others, and the direct sound cannot be observed well. Performing band division and processing each band individually therefore increases the opportunities to observe the direct sound and improves the accuracy with which the time difference τ0 is estimated. The invention of claim 2 makes use of this idea.

[0014] As described above, the method of the present invention is characterized in that the influence of reflected sound is removed by peak hold processing and logarithmic processing. As a result, the influence of reflected sound on the cross-correlation function, which was the problem with the conventional cross-correlation-based sound source direction estimation method, is greatly reduced.

[0015]

[Embodiment] FIG. 3 shows an embodiment of the present invention. The output signals of two microphones 21 are each divided into M frequency bands by two band division units 22. The band division may be performed, for example, with an FFT (fast Fourier transform). The following processing is then applied to the signal of each band in each of the two channels. First, the power calculation unit 23 obtains the power of the signal. Next, the peak hold processing unit 24 performs peak hold processing on the signal. Next, the logarithmic processing unit 25 takes the logarithm of the signal. Next, the time difference processing unit 26 obtains the time difference of the signal. Next, the cross-correlation function calculation unit 27 obtains the cross-correlation function between the signals of corresponding frequency bands of the two identically processed microphone outputs. If the time-differenced signals for the k-th frequency band are denoted x′(k, t) and y′(k, t), their cross-correlation function φxy′(k, τ) is obtained by the following equation.

[0016]

$$\phi'_{xy}(k,\tau) = \sum_t x'(k,t)\, y'(k,t+\tau) \tag{6}$$

where the sum is taken over t. The sound source direction estimation unit 28 averages the cross-correlation functions obtained for the individual bands as in the following equation to calculate φxy(τ):

$$\phi_{xy}(\tau) = \sum_{k=1}^{M} W_k\, \phi'_{xy}(k,\tau) \tag{7}$$

[0017] Here Wk is the weight assigned to each band in the averaging; its value is determined by the measurement conditions, for example by making Wk small for bands with a poor signal-to-noise ratio. The sound source direction estimation unit 28 finds the value of τ at which the cross-correlation function φxy(τ) obtained by the above operations is maximal and takes it as the estimate of the time difference τ0 of the direct sound. The sound source direction is then obtained from

$$\theta = \sin^{-1}\!\left(\frac{c\,\tau_0}{d}\right) \tag{8}$$

and output.
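The final stage of the embodiment, equations (7) and (8), might be sketched as follows. The per-band correlation values, the weights, the microphone spacing, and the speed of sound are all invented for illustration; lags are assumed to be in samples and are converted to seconds with the sampling rate.

```python
import math


def combine_bands(band_correlations, weights):
    """Equation (7): phi_xy(tau) = sum over k of W_k * phi'_xy(k, tau)."""
    lags = band_correlations[0].keys()
    return {
        tau: sum(w * phi[tau] for w, phi in zip(weights, band_correlations))
        for tau in lags
    }


def estimate_direction(band_correlations, weights, d, fs, c=343.0):
    """Pick the lag maximizing the combined correlation, then apply equation (8)."""
    phi = combine_bands(band_correlations, weights)
    lag_samples = max(phi, key=phi.get)
    tau0 = lag_samples / fs
    ratio = max(-1.0, min(1.0, c * tau0 / d))  # clamp against numerical overshoot
    return lag_samples, math.asin(ratio)


# Hypothetical per-band cross-correlations phi'_xy(k, tau) for M = 3 bands.
bands = [
    {-1: 0.1, 0: 0.2, 1: 0.9, 2: 0.1},
    {-1: 0.0, 0: 0.3, 1: 0.7, 2: 0.2},
    {-1: 0.8, 0: 0.1, 1: 0.2, 2: 0.1},  # band assumed to have a poor SN ratio
]
weights = [0.45, 0.45, 0.10]            # small weight for the noisy band

lag, theta = estimate_direction(bands, weights, d=0.30, fs=8000.0)
print(f"lag = {lag} sample(s), theta = {math.degrees(theta):.1f} deg")
```

Giving the third band a small weight keeps its misleading peak at lag −1 from dominating the combined function, which is the role the text assigns to Wk.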

[0018] The above example has been described with two microphones, but the estimation accuracy improves if three or more microphones can be used. In that case, several pairs of microphones are selected from the plurality of microphones, the source direction θ is estimated for each pair in the same manner as above, and the resulting source directions are averaged to give the final estimate. Averaging several estimates in this way reduces the effect of estimation errors caused by noise and the like.

[0019] In FIG. 3, the cross-correlation function may also be obtained by peak hold processing and logarithmic processing without band division. In either case, the logarithmic processing may be omitted. The acoustic signal emitted by the target source is often non-stationary, as with speech. When a non-stationary signal occurs intermittently, each onset offers another opportunity to observe the direct sound, so it is effective to give the peak hold a decay characteristic. This is explained with reference to FIG. 2B (a), (b), and (c). FIG. 2B (a) shows the signal received by a microphone when pulse sounds occur intermittently: direct sounds 7₁ to 7₃ occur one after another, separated in time, and each is followed by a series of reflected sounds 8₁ to 8₃. FIG. 2B (b) shows the result of applying peak hold processing without a decay characteristic to this signal. The peak value of the first direct sound 7₁ remains held, so the reflected sounds are removed, but the second and third direct sounds 7₂ and 7₃ are removed as well. Peak hold processing with a decay characteristic is therefore applied to the signal of FIG. 2B (a); the result is shown in FIG. 2B (c). As this figure shows, when the peak hold has a decay characteristic, the reflected sounds can be removed without removing the second and third direct sounds 7₂ and 7₃. Increasing the opportunities to observe the direct sound in this way and estimating the time difference τ0 from several direct sounds makes the estimate less susceptible to computational error and noise, and hence more accurate, than estimating τ0 from a single direct sound.
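The difference between the two kinds of peak hold can be sketched as follows. The patent does not specify the form of the decay; the per-sample exponential decay factor of 0.9, the function names, and the pulse positions below are assumptions made for illustration.

```python
def peak_hold_with_decay(power, decay):
    """Peak hold whose held value is multiplied by `decay` at every sample.

    decay = 1.0 gives the plain peak hold without a decay characteristic.
    The exponential form of the decay is an assumption for this sketch.
    """
    held = []
    current = 0.0
    for p in power:
        current = max(current * decay, p)
        held.append(current)
    return held


def onsets(held):
    """Sample indices at which the held value increases."""
    return [t for t in range(len(held))
            if held[t] > (held[t - 1] if t > 0 else 0.0)]


# Two direct sounds (power 1.0) at samples 5 and 30, each followed three
# samples later by a reflection of power 0.6.
power = [0.0] * 40
for t, p in [(5, 1.0), (8, 0.6), (30, 1.0), (33, 0.6)]:
    power[t] = p

onsets_plain = onsets(peak_hold_with_decay(power, decay=1.0))
onsets_decay = onsets(peak_hold_with_decay(power, decay=0.9))
print("without decay:", onsets_plain)
print("with decay:   ", onsets_decay)
```

Without decay only the first direct sound is detected; with decay both direct sounds are detected while the reflections stay masked. The decay rate has to be slow enough that a reflection arriving shortly after the direct sound still falls below the held value.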

[0020]

[Effects of the Invention] Next, the results of an experiment carried out with the embodiment of FIG. 3 to verify the effectiveness of the method of the invention are described. The experiment was conducted in a room with a volume of 50 m³ and a reverberation time of 0.25 seconds. FIG. 4 shows the arrangement of the sound sources and microphones in the experiment. In the figure, 31 and 32 denote sound sources, and 33 and 34 denote microphones. Speech was used as the signal for sound source direction estimation. The received signals were sampled at 8 kHz and divided into bands with a 64-point FFT (that is, into 32 bands), and 2 seconds of received data were used to compute the correlation.
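The figures quoted for the experiment are related by simple arithmetic, sketched below. The variable names are hypothetical; the band width follows from the standard relation between sampling rate and FFT length.

```python
fs_hz = 8000        # sampling rate stated in the text
fft_points = 64     # FFT length stated in the text
window_s = 2.0      # duration of data used for the correlation

num_bands = fft_points // 2            # 32 bands up to the Nyquist frequency
band_width_hz = fs_hz / fft_points     # 125.0 Hz per band
num_samples = int(fs_hz * window_s)    # 16000 samples in 2 seconds

print(num_bands, band_width_hz, num_samples)
```

The 32 bands of 125 Hz each together cover 0 to 4 kHz, the Nyquist frequency for 8 kHz sampling.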

[0021] FIG. 5 (a) and (b) show the experimental results when only sound source 32 is present. The results are shown as the cross-correlation function φxy(τ) that was obtained. FIG. 5 (a) shows the result obtained by the conventional method without peak hold processing, and FIG. 5 (b) shows the result obtained by the method of the invention. The correct time difference corresponding to the source direction is indicated by a black arrow in the figure. In the result obtained with the conventional method (FIG. 5 (a)), the maximum coincides with the correct value, but local maxima that do not coincide with the correct value also occur because of reflected sound. In the result obtained with the method of the invention (FIG. 5 (b)), on the other hand, a clear maximum is obtained at the correct position.

[0022] FIG. 6 (a) and (b) show the results when different speech signals (power ratio 1 to 2) are emitted from the two sound sources 31 and 32. FIG. 6 (a) shows the result obtained by the conventional method, and FIG. 6 (b) shows the result obtained by the method of the invention. In the result obtained with the conventional method (FIG. 6 (a)), it is difficult to distinguish the two sources, whereas in the result obtained with the method of the invention (FIG. 6 (b)), the time difference of the direct sound from each source can be estimated clearly.

[0023] As described above and confirmed by experiment, the present invention is a highly effective technique for estimating the sound source direction in a sound field with many reflections.

[Brief description of the drawings]

[FIG. 1] A is a diagram showing the signal obtained by applying peak hold processing to the signal of FIG. 7C, B is a diagram showing the signal obtained by applying peak hold processing to a signal in which the power of the reflected sound is greater than that of the direct sound, and C is a diagram showing the signal obtained by applying logarithmic processing to the signal of FIG. 1B.

[FIG. 2] A is a diagram showing an example of the time-frequency spectrum of speech, and B is a diagram explaining the effectiveness of peak hold processing with a decay characteristic.

[FIG. 3] A block diagram showing an embodiment of the present invention.

[FIG. 4] A diagram showing the experimental conditions used to confirm the effectiveness of the present invention.

[FIG. 5] A diagram showing the experimental results for a single sound source.

[FIG. 6] A diagram showing the experimental results for two sound sources.

[FIG. 7] A is a diagram explaining the relationship between the arrival direction θ of a sound wave and the signals received by two microphones, B is a diagram showing the output signals of the two microphones and the correlation function computed from them when there is no reflected sound, and C is a diagram showing the output signals of the two microphones and the correlation function computed from them when there is reflected sound.

Continuation of the front page
(58) Fields searched (Int. Cl.⁶, DB name): G01S 3/00 - 3/86; G01H 3/00

Claims

(57) [Claims]

1. A sound source direction estimation method for estimating a sound source direction based on a cross-correlation function between signals observed by a plurality of microphones, characterized in that peak hold processing is performed on the output of each of the plurality of microphones to generate a plurality of processed signals, and the cross-correlation function is calculated from the plurality of processed signals.

2. A sound source direction estimation method for estimating a sound source direction based on a cross-correlation function between signals observed by a plurality of microphones, characterized in that the output of each of the plurality of microphones is divided into a plurality of frequency components, peak hold processing is performed on each of the divided components to generate a plurality of processed signals, a cross-correlation function is calculated between corresponding frequency components of the plurality of processed signals, and the sound source direction is estimated based on a cross-correlation function obtained by weighted averaging of the cross-correlation functions of the individual frequency components.

3. The sound source direction estimation method according to claim 1, wherein the peak-hold-processed signals are logarithmically processed and the cross-correlation function is calculated from the logarithmically processed signals.