JP4113169B2

JP4113169B2 - Method for estimating the number of signal sources, estimation apparatus, estimation program, and recording medium

Info

Publication number: JP4113169B2
Application number: JP2004238174A
Authority: JP
Inventors: Hiroshi Sawada (澤田 宏); Ryo Mukai (向井 良); Shoko Araki (荒木 章子); Shoji Makino (牧野 昭二)
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2004-08-18
Filing date: 2004-08-18
Publication date: 2008-07-09
Anticipated expiration: 2024-08-18
Also published as: JP2006058065A

Description

The present invention relates to a technique for estimating the number of signals from observations, made by a plurality of sensors, of a mixture of an unknown number of signals, and more particularly to a technique for correctly estimating the number of signal sources in a real environment.

A method has been proposed that estimates the number of signal sources by applying a short-time Fourier transform to observed signals in which a plurality of sound sources are mixed and examining the eigenvalues of the spatial correlation matrix in each frequency bin (see, for example, Non-Patent Document 1).
[Problem formulation]
First, the problem handled by this method is formulated. All signals are assumed to be sampled at some sampling frequency and represented discretely. Suppose that N signals are mixed and observed by M sensors. The following deals with the situation in which the signal sources are at a distance from the sensors, so that each signal is attenuated and delayed and reaches the sensors via a plurality of paths. Mixing in this situation is a convolutive mixture with the impulse response h_jk(l) from signal source k to sensor j:
x_j(t) = Σ_{k=1}^{N} Σ_l h_jk(l)・s_k(t-l) + n_j(t) …(1)
Here, s_k(t) is the k-th source signal, P represents the duration of the impulse response h_jk(l) (the range over which l is summed), and n_j(t) represents the noise at sensor j. As a concrete example, when sound signals are mixed in a room, the sound is attenuated and delayed according to the distance from the sound source to the microphone, reverberation arises from reflections off walls and the like, and background noise is added at the microphone.
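By way of illustration only, the convolutive mixture of equation (1) can be sketched in Python as follows. The function name `convolutive_mix`, the nested-list layout of the impulse responses, and the treatment of samples before t = 0 as zero are assumptions made for this example; they are not taken from the specification.

```python
def convolutive_mix(sources, impulse_responses, noise=None):
    """x_j(t) = sum_k sum_l h_jk(l) * s_k(t - l) + n_j(t), as in equation (1).

    sources:           list of N source signals, each a list of T samples
    impulse_responses: impulse_responses[j][k] is h_jk, a list of P taps
    noise:             optional list of M noise signals n_j, each of length T
    """
    num_samples = len(sources[0])
    observations = []
    for j, responses_to_sensor_j in enumerate(impulse_responses):
        x_j = [0.0] * num_samples
        for k, s_k in enumerate(sources):
            h_jk = responses_to_sensor_j[k]
            for t in range(num_samples):
                for l, tap in enumerate(h_jk):
                    if t - l >= 0:  # assumption: the sources are silent before t = 0
                        x_j[t] += tap * s_k[t - l]
        if noise is not None:
            x_j = [x + n for x, n in zip(x_j, noise[j])]
        observations.append(x_j)
    return observations


# Hypothetical toy case: two sources, two sensors.
# Sensor 1 receives source 0 delayed by one sample and halved.
s = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
h = [[[1.0], [0.5]],
     [[0.0, 0.5], [1.0]]]
x = convolutive_mix(s, h)
```

In a real room the impulse responses would span thousands of taps, which is why the description moves to the frequency domain rather than working with this direct form.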

[Method based on eigenvalues]
Next, the method of estimating the number of signal sources proposed in Non-Patent Document 1 will be described step by step. It is assumed that the number of sensors is equal to or greater than the number of signal sources, that is, N≦M.
First, an L-point short-time discrete Fourier transform is applied to the observed signal x_j(t) at sensor j to obtain a time series for each frequency:
X_j(f,τ) = Σ_l x_j(τ+l)・g(l)・e^{-i2π(f/f_s)l} …(2)
where the sum runs over the L samples of the frame and i denotes the imaginary unit. Here, f is a frequency, discretized as f=0, (1/L)f_s,...,{(L-1)/L}f_s (f_s is the sampling frequency). g(l) is a window function. By using a window function whose power is centered at g(0), such as the Hanning window g(l)=(1/2)(1+cos(2πl/L)), X_j(f,τ) represents the frequency characteristics of the observed signal x_j(t) centered at time t=τ. Since X_j(f,τ) contains information spanning L samples, it need not be obtained for every time τ; it is obtained at times τ spaced at a suitable interval (for example, L/2 or L/4).
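The short-time discrete Fourier transform described above can be sketched as follows. This is a minimal, unoptimized sketch that assumes a standard form of the transform: frames of L samples, l = -L/2,...,L/2-1, centered at τ, and bin b corresponding to f = (b/L)f_s. The names `hann` and `stft`, the choice of frame centers, and the test tone are assumptions made for illustration.

```python
import cmath
import math


def hann(l, L):
    """Hanning window g(l) = (1/2)(1 + cos(2*pi*l/L)), which peaks at l = 0."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * l / L))


def stft(x, L, hop):
    """Return X[b][n]: frequency bin b (f = b * f_s / L) at the n-th frame center.

    Frame centers are tau = L/2, L/2 + hop, ... so that every frame lies
    inside the signal (an assumption of this sketch).
    """
    half = L // 2
    centers = range(half, len(x) - half + 1, hop)
    X = [[0j] * len(centers) for _ in range(L)]
    for n, tau in enumerate(centers):
        for b in range(L):
            acc = 0j
            for l in range(-half, half):
                acc += x[tau + l] * hann(l, L) * cmath.exp(-2j * math.pi * b * l / L)
            X[b][n] = acc
    return X


# Hypothetical test tone lying exactly on bin 2 of an 8-point transform,
# analyzed with a hop of L/2 = 4 samples.
tone = [math.cos(2.0 * math.pi * 2 * t / 8) for t in range(32)]
X = stft(tone, L=8, hop=4)
```

The hop of L/2 corresponds to the interval mentioned in the text; a practical implementation would use an FFT rather than the explicit sum.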

Operating in the frequency domain is effective for convolutively mixed signals, because the convolutive mixture in the time domain expressed by equation (1) can be approximated in the frequency domain as a simple mixture at each frequency:
X_j(f,τ) = Σ_{k=1}^{N} H_jk(f)・S_k(f,τ) + N_j(f,τ) …(3)
Here, H_jk(f) is the frequency response from signal source k to sensor j, and S_k(f,τ) and N_j(f,τ) are obtained by applying the short-time discrete Fourier transform to the source signal s_k(t) and the noise n_j(t), respectively, in the same manner as equation (2).
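The practical consequence of equation (3) is that, at a single frequency and frame, mixing reduces to a complex matrix-vector product. The following sketch shows this for one bin; the function name `mix_one_bin` and the numerical values are hypothetical.

```python
def mix_one_bin(H, S, noise=None):
    """X_j = sum_k H_jk * S_k + N_j at one frequency f and one frame tau.

    H:     M x N list of complex frequency responses H_jk(f)
    S:     list of N complex source coefficients S_k(f, tau)
    noise: optional list of M complex noise coefficients N_j(f, tau)
    """
    X = [sum(h_jk * s_k for h_jk, s_k in zip(row, S)) for row in H]
    if noise is not None:
        X = [x + n for x, n in zip(X, noise)]
    return X


# Hypothetical 2-sensor, 2-source example.
H = [[1, 1j], [0.5, 2]]
S = [1 + 1j, 2]
X_bin = mix_one_bin(H, S)
```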

Next, for X(f,τ)=[X₁(f,τ),...,X_M(f,τ)]^T, the correlation matrix R(f)=〈X(f,τ)・X(f,τ)^H〉_τ is calculated and subjected to eigenvalue decomposition as R(f)=V(f)・Λ(f)・V(f)^H. Here, V(f)=[v₁(f),v₂(f),...,v_M(f)], and Λ(f) is an M×M diagonal matrix whose diagonal elements are λ₁(f),λ₂(f),...,λ_M(f). Further, ・^H denotes the conjugate transpose of a matrix, 〈・〉_τ denotes the average over time τ, v_j(f) is an eigenvector (an M-dimensional column vector), and λ_j(f) is the corresponding eigenvalue; the eigenvalues are sorted so that λ₁(f)≧λ₂(f)≧...≧λ_M(f). Each eigenvalue λ_j(f) indicates the power value of the j-th signal Y_j(f,τ) obtained by [Y₁(f,τ),...,Y_M(f,τ)]^T←V(f)^H・[X₁(f,τ),...,X_M(f,τ)]^T.
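For M = 2 the correlation matrix and its eigenvalues can be computed in closed form, which makes the idea easy to check by hand. The sketch below is restricted to two sensors purely to avoid a general eigenvalue solver; the function names and the toy data are assumptions made for illustration.

```python
import math


def correlation_matrix_2ch(X1, X2):
    """Entries of R(f) = <X X^H>_tau for M = 2 at one frequency.

    Returns (r11, r12, r22), where R = [[r11, r12], [conj(r12), r22]].
    """
    n = len(X1)
    r11 = sum(abs(a) ** 2 for a in X1) / n
    r22 = sum(abs(b) ** 2 for b in X2) / n
    r12 = sum(a * complex(b).conjugate() for a, b in zip(X1, X2)) / n
    return r11, r12, r22


def eigenvalues_2x2_hermitian(r11, r12, r22):
    """Eigenvalues of a 2x2 Hermitian matrix, largest first."""
    mean = 0.5 * (r11 + r22)
    radius = math.sqrt((0.5 * (r11 - r22)) ** 2 + abs(r12) ** 2)
    return mean + radius, mean - radius


# Hypothetical noise-free case: one source seen by two sensors with
# frequency responses 1 and 0.5j.
S = [1 + 0j, 2j, -1 + 0j, -2j]
X1 = [1.0 * s for s in S]
X2 = [0.5j * s for s in S]
lam1, lam2 = eigenvalues_2x2_hermitian(*correlation_matrix_2ch(X1, X2))
```

With a single noise-free source, R(f) has rank one, so only one eigenvalue is non-zero; this is the ideal case that the eigenvalue-counting method relies on.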

Then, the number N of eigenvalues having dominant values among the decomposed eigenvalues is estimated as the number of signal sources, and the magnitude of the remaining M-N eigenvalues is estimated as the noise power value σ_n(f)² (λ_{N+1}(f)=…=λ_M(f)=σ_n(f)²).
Non-Patent Document 1: Kiyoshi Yamamoto, W. F. G. van Rooijen, E. Y. Ling, Futoshi Asano, Takeshi Yamada, Nobuhiko Kitawaki, "Application of a Sound Source Number Estimation Method Using SVM to a Sound Source Separation System", Acoustical Society of Japan 2002 Autumn Meeting, 2-5-10, pp. 537-538, September 2002

However, the conventional eigenvalue-based method has the problem that the actual number of signal sources cannot always be estimated correctly.
For example, when the eigenvalue-based method described above is used in a realistic situation, the following two problems must be considered.
The first problem is the influence of reverberation. In general, the reverberation is longer than the frame length L of the short-time discrete Fourier transform, so a component of a signal at a given time affects a plurality of frames. As a result, the number of dominant eigenvalues may be estimated to be larger than the actual number of signals.

FIG. 11(a) shows the normalized power values of the eigenvalues at each frequency when only one sound source is played under the conditions shown in FIG. 10. As shown in this figure, the power value of the second-largest eigenvalue is about -20 dB owing to reverberation. In the eigenvalue-based method described above, the number of eigenvalues larger than a predetermined threshold is taken as the number of signal sources. If this threshold is smaller than -20 dB, the second-largest eigenvalue is also counted as a "dominant eigenvalue", and the number of sound sources is estimated to be two. That is, because of reverberation, the number of sound sources cannot be estimated accurately unless this threshold is set reasonably large.
The second problem is that the power of each signal may not be properly reflected in the eigenvalues. This problem is particularly noticeable at low frequencies, where the phase differences are small.

FIG. 11(b) shows the normalized power values of the eigenvalues at each frequency when all three sound sources are played under the conditions shown in FIG. 10. In this example the number of sound sources is three, so there should be three dominant eigenvalues. However, as shown in this figure, although the power values of the sound sources were set to be equal, the power values of the eigenvalues become progressively smaller for the second and third eigenvalues. This tendency becomes more pronounced at lower frequencies. Therefore, for three sound sources to be estimated at many frequencies in this situation, the threshold must be set to about -30 dB or less. However, if the threshold is set that small, the eigenvalues corresponding to reverberation are in turn counted as "dominant eigenvalues"; for example, in the one-source case of FIG. 11(a), two or more sound sources would be estimated.
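The conflict between the two problems can be made concrete with a small sketch of threshold-based counting. The eigenvalue lists below are made-up numbers chosen only to mimic the levels described for FIG. 11 (a second eigenvalue near -20 dB for one source, a third eigenvalue near -27 dB for three sources); they are not data from the figures, and normalizing by the largest eigenvalue is an assumption of this sketch.

```python
import math


def count_dominant(eigenvalues, threshold_db):
    """Count eigenvalues whose power, relative to the largest one, exceeds threshold_db."""
    largest = max(eigenvalues)
    return sum(1 for lam in eigenvalues
               if 10.0 * math.log10(lam / largest) > threshold_db)


# Hypothetical eigenvalues at one frequency (illustrative values only).
one_source = [1.0, 0.012, 0.0001]     # second eigenvalue at about -19 dB (reverberation)
three_sources = [1.0, 0.05, 0.002]    # third eigenvalue at about -27 dB (low frequency)

for threshold_db in (-15.0, -30.0):
    print(threshold_db,
          count_dominant(one_source, threshold_db),
          count_dominant(three_sources, threshold_db))
```

With these values, a threshold of -15 dB counts 1 and 2 sources, while -30 dB counts 2 and 3: neither setting is correct for both cases, which is the difficulty the description points out.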

As explained above, the conventional eigenvalue-based method may fail to estimate the actual number of signal sources correctly, both because of the influence of reverberation and because the power of each signal is not properly reflected in the eigenvalues.
The present invention has been made in view of these points, and an object thereof is to provide a technique capable of correctly estimating the number of signal sources even in a real environment.

To solve the above problem, the present invention first converts the observed signals x_j(t) (j={1,...,M}) at M sensors into time-series data X_j(f,τ) for each frequency, generates separated signals Y_i(f,τ) (i={1,...,M}) from this time-series data X_j(f,τ), and stores them in a storage unit. Next, the power value of each separated signal Y_i(f,τ) is obtained and stored in the storage unit, and the envelope correlation value between different separated signals Y_i(f,τ) for a time difference Δτ is calculated and stored in the storage unit. Here, the envelope of a separated signal Y_i(f,τ) means the envelope |Y_i(f,τ)| of the absolute value of the separated signal. Then, the power value and envelope correlation value of each separated signal Y_i(f,τ) are compared with a plurality of parameters stored in the storage unit to determine whether or not that separated signal Y_i(f,τ) is a source signal component.

Here, since a source signal and its reverberation signal are correlated, the envelope correlation value between them is high. In addition, a reverberation signal has lower power than the corresponding source signal. In other words, a reverberation signal is one that has a high envelope correlation value and a relatively small power value. The present invention focuses on this feature and compares the envelope correlation value and power value of each separated signal with a plurality of parameters indicating the respective thresholds to determine whether or not the separated signal is a source signal. This makes it possible to estimate the number of signal sources in a way that takes the real environment into account, compared with the case in which source signals are identified using only the power values of the eigenvalues as an index.

As described above, the present invention calculates the envelope correlation values and power values of the separated signals and identifies source signals by comparing these with a plurality of parameters, so that the number of signal sources can be estimated correctly in a real environment.

Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment]
First, a first embodiment of the present invention will be described.
<Overall configuration>
FIG. 1 is a block diagram showing the whole of the estimation apparatus 1 in this embodiment.
The estimation apparatus 1 is constructed, for example, by causing a von Neumann computer, in which a CPU (central processing unit), a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, and the like are connected by a bus, to execute a predetermined program (estimation program).

As illustrated in FIG. 1, the estimation apparatus 1 of this embodiment has a memory 10, a frequency domain conversion unit 20, a signal source number estimation unit 30, a result integration unit 40, and a control unit 50. The signal source number estimation unit 30 has a signal separation unit 31, a power calculation unit 32, an envelope correlation calculation unit 33, and a determination unit 34. The memory 10 has an observed signal area 11, a per-frequency time-series data area 12, a separated signal area 13, a power value area 14, an envelope area 15, a parameter area 16, and a signal source number area 17. The control unit 50 has a register 51 and controls the estimation apparatus 1 as a whole. In the figure, broken-line arrows indicate the theoretical flow of information, and solid-line arrows indicate the actual flow of data (and also the electrical or informational connections). The input and output data of the control unit 50 are not shown.

<Outline of processing>
In this embodiment, the number of source signals is estimated from the observed signals x₁(t),...,x_M(t) obtained by observing, with M sensors, a mixture of the source signals.
First, as preprocessing, data specifying a plurality of parameters (a first parameter th_noise indicating a noise level threshold, a second parameter th_rev indicating a reverberation level threshold, and a third parameter th_cor indicating an envelope correlation value threshold) is stored in the memory 10. Each input time-domain observed signal x_j(t) (j={1,...,M}) is converted by the frequency domain conversion unit 20 into time-series data X_j(f,τ) for each frequency and sent to the signal separation unit 31 of the signal source number estimation unit 30. The signal separation unit 31 generates separated signals Y_i(f,τ) (i={1,...,M}) from the time-series data X_j(f,τ) for each frequency f. The power calculation unit 32 then calculates the power value of each separated signal Y_i(f,τ), and the envelope correlation calculation unit 33 calculates the envelope correlation value between different separated signals Y_i(f,τ) for a time difference Δτ. Once these have been calculated, the determination unit 34 compares the power value and envelope correlation value of each separated signal Y_i(f,τ) with the parameters in the memory, determines whether or not that separated signal Y_i(f,τ) is a source signal component, and estimates the number of signal sources EN(f) for the frequency f.

Thereafter, for a wideband signal such as speech, the estimates for the individual frequencies are finally integrated in the result integration unit 40 to obtain an overall estimate en of the number of signal sources. For narrowband signals such as those used in the communications field, on the other hand, there is no need to integrate the per-frequency estimates; it suffices to obtain the estimate EN(f) at the frequency f of interest.
<Details of this embodiment>
FIG. 2(a) is a block diagram illustrating the functional configuration of the signal separation unit 31 shown in FIG. 1, FIG. 2(b) that of the power calculation unit 32, FIG. 3(a) that of the envelope correlation calculation unit 33, and FIG. 3(b) that of the determination unit. FIGS. 4 and 5 are flowcharts for explaining the method of estimating the number of signal sources in this embodiment.

The configuration and processing of this embodiment will be described in detail below with reference to FIGS. 1 to 5.
[Preprocessing]
First, as preprocessing, data specifying the first parameter th_noise indicating a noise level threshold, the second parameter th_rev indicating a reverberation level threshold, and the third parameter th_cor indicating an envelope correlation value threshold is stored in the parameter area 16 of the memory 10 (corresponding to the "storage unit"). Example parameter values are th_noise=0.01, th_rev=0.2, and th_cor=0.5. However, if samples for which the number of active sound sources is known are available at the time of actual measurement, each parameter may be adjusted on the basis of that observation data. Specifically, for example, the first parameter th_noise is adjusted to be larger than the normalized power value of a noise signal and smaller than that of a reverberation signal, the second parameter th_rev is adjusted to be smaller than the normalized power value of a source signal and larger than that of a reverberation signal, and the third parameter th_cor is adjusted to be smaller than the envelope correlation value between a source signal and a reverberation signal. The meanings of the normalized power value and the envelope correlation value will be described later.
In addition, the time-domain observed signals x_j(t) (j={1,...,M}) for which the number of signal sources is to be estimated are written into the observed signal area 11 of the memory 10. These observed signals x_j(t) are the signals observed at M sensors (microphones or the like), and the subscript j indicates that the observed signal x_j(t) was observed at the j-th sensor.
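To show how the three parameters might work together, the sketch below labels each separated signal at one frequency as noise, reverberation, or source. The decision rule is an assumed reading of the description (low power means noise; moderate power together with a high envelope correlation to a stronger signal means reverberation; everything else is a source) and is not the procedure as claimed. The function names, the orientation of the correlation table, and the sample values are hypothetical.

```python
TH_NOISE = 0.01  # first parameter th_noise (example value given in the text)
TH_REV = 0.2     # second parameter th_rev (example value given in the text)
TH_COR = 0.5     # third parameter th_cor (example value given in the text)


def classify(i, norm_power, env_cor):
    """Label separated signal i as 'noise', 'reverberation' or 'source'.

    norm_power[i]: normalized power value of separated signal i
    env_cor[i][k]: envelope correlation value of signal i with signal k
    Assumed rule for illustration only.
    """
    if norm_power[i] < TH_NOISE:
        return "noise"
    if norm_power[i] < TH_REV:
        for k in range(len(norm_power)):
            # Reverberation: weaker than, and correlated with, another signal.
            if k != i and norm_power[k] > norm_power[i] and env_cor[i][k] > TH_COR:
                return "reverberation"
    return "source"


def estimate_count(norm_power, env_cor):
    """Number of separated signals labeled as sources at this frequency."""
    return sum(classify(i, norm_power, env_cor) == "source"
               for i in range(len(norm_power)))


# Hypothetical case 1: one source, its reverberation, and noise.
power_a = [0.9, 0.095, 0.005]
cor_a = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical case 2: three sources of unequal apparent power, weakly correlated.
power_b = [0.5, 0.35, 0.15]
cor_b = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]]
```

Under these assumptions, case 1 yields one source and case 2 yields three, even though the weakest signal in case 2 has less normalized power than th_rev; a single power threshold could not separate the two cases in this way.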

[Conversion to frequency domain]
First, the control unit 50 (FIG. 1) assigns 1 to a variable j and stores it in the register 51 (step S1). Next, the frequency domain conversion unit 20 accesses the observed signal area 11 of the memory 10 and reads the observed signal x_j(t) (step S2). Having read the observed signal x_j(t), the frequency domain conversion unit 20 converts it into time-series data X_j(f,τ) for each frequency and stores the result in the per-frequency time-series data area 12 of the memory 10 (step S3). In this example, the conversion uses an L-point short-time discrete Fourier transform at the sampling frequency f_s.
Next, the control unit 50 determines whether or not the variable j stored in the register 51 equals M (step S4). If j=M does not hold, the control unit 50 takes the value obtained by adding 1 to j as the new j (step S5), stores it in the register 51, and returns to step S2. If j=M holds, the process proceeds to the signal source estimation processing below.

[Signal source estimation processing]
First, the control unit 50 (FIG. 1) assigns 0 to a variable f and stores it in the register 51 (step S6).
Independent component analysis (ICA) processing:
Next, the independent component analysis (ICA) unit 31a (FIG. 2(a)) extracts the time-series data X_j(f,τ) from the per-frequency time-series data area 12 of the memory 10 and, using independent component analysis (ICA), generates from X(f,τ)=[X₁(f,τ),...,X_M(f,τ)]^T an M×M separation matrix W(f) and ICA separated signals Z(f,τ)=[Z₁(f,τ),...,Z_M(f,τ)]^T, which it stores in the register 31b (corresponding to the "storage unit") (step S7). Signal separation by ICA is a technique that calculates W(f) such that Z(f,τ)=W(f)・X(f,τ) and the elements of the ICA separated signals Z(f,τ) are mutually independent. Various ICA algorithms are presented in, for example, A. Hyvarinen and J. Karhunen and E. Oja, "Independent Component Analysis," John Wiley & Sons, 2001, ISBN 0-471-40540.

Note that the ICA solution has a scaling ambiguity, because multiplying any element of Z(f,τ) by a scalar value does not change the independence among the elements. Therefore, at this stage it is highly likely that the power of the source signals as observed at the sensors is not correctly reflected in the ICA separated signals. Moreover, if the number N of source signals is smaller than the number M of sensors, N elements of the ICA separated signals Z(f,τ) correspond to source signals and the remaining M-N elements correspond to noise or reverberation components, but the magnitudes of the elements corresponding to noise or reverberation are generally amplified at this stage. The scaling unit 31c (FIG. 2(a)) therefore resolves this scaling ambiguity next.

Scaling problem resolution processing:
To resolve the scaling ambiguity, the scaling unit 31c performs the following operations. First, the diagonal matrix generation unit 31ca reads the separation matrix W(f) from the register 31b and generates from it a diagonal matrix Λ(f) for resolving the scaling problem (step S8). An example of this diagonal matrix Λ(f) is
Λ(f)=sqrt(diag[(W(f)・W(f)^H)^-1]) …(4)
Here, ・^-1 denotes the inverse matrix, ・^H the conjugate transpose, diag an operation that sets all elements other than the diagonal components to 0, and sqrt an operation that calculates the square root of each element.
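Equation (4) can be checked numerically in the two-sensor case, where the inverse of W(f)・W(f)^H has a closed form. The sketch below is limited to M = 2 for that reason; the function name `scaling_matrix_2x2`, the mixing matrix `A`, and the arbitrary scale factors 3 and 0.2 are assumptions made for illustration.

```python
import math


def scaling_matrix_2x2(W):
    """Diagonal entries of Lambda(f) = sqrt(diag[(W W^H)^-1]) for a 2x2 matrix W."""
    (a, b), (c, d) = W
    # G = W W^H is Hermitian: [[g11, g12], [conj(g12), g22]].
    g11 = abs(a) ** 2 + abs(b) ** 2
    g22 = abs(c) ** 2 + abs(d) ** 2
    g12 = a * complex(c).conjugate() + b * complex(d).conjugate()
    det = g11 * g22 - abs(g12) ** 2  # real and positive when W is invertible
    # The diagonal of the inverse of a 2x2 matrix is (g22 / det, g11 / det).
    return math.sqrt(g22 / det), math.sqrt(g11 / det)


# Hypothetical mixing matrix A (columns = sources) and an ICA-like solution
# W = D * A^-1 whose rows carry arbitrary scale factors 3 and 0.2.
A = [[1.0, 0.5], [0.5, 1.0]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det_A, -A[0][1] / det_A],
         [-A[1][0] / det_A, A[0][0] / det_A]]
W = [[3.0 * A_inv[0][0], 3.0 * A_inv[0][1]],
     [0.2 * A_inv[1][0], 0.2 * A_inv[1][1]]]
lam_1, lam_2 = scaling_matrix_2x2(W)
```

In this toy case each entry of Λ(f) equals the norm of the corresponding column of A divided by the arbitrary scale factor, so multiplying Z_i(f,τ) by it cancels the arbitrary factor and restores the power with which that source reaches the two sensors.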

The generated diagonal matrix Λ(f) is sent to the product calculation unit 31cb. Using it and the ICA separated signals Z(f,τ) read from the register 31b, the product calculation unit 31cb performs the operation [Y₁(f,τ),...,Y_M(f,τ)]^T←Λ(f)・[Z₁(f,τ),...,Z_M(f,τ)]^T to generate separated signals Y_i(f,τ) (i={1,...,M}) in which the scaling problem has been resolved (the power has been restored), and stores them in the separated signal area 13 (FIG. 1) of the memory 10 (step S9).
As a result of the above series of operations including equation (4), the separated signals Y_i(f,τ) have the following two properties. First, if the separated signals Y_i(f,τ) are mutually uncorrelated,
Σ_{i=1}^{M} 〈|Y_i(f,τ)|²〉_τ = Σ_{j=1}^{M} 〈|X_j(f,τ)|²〉_τ
holds. That is, the total power of the separated signals Y_i(f,τ) equals the total power of the observed signals X_j(f,τ) at the sensors. Furthermore, if the separated signals Y_i(f,τ) are mutually independent,
|Y_i(f,τ)|² = Σ_{j=1}^{M} |H_jk(f)・S_k(f,τ)|²
holds, where S_k(f,τ) (k={1,...,N}) denotes a source signal component. That is, the power of a given separated signal Y_i(f,τ) equals the sum of the powers with which the corresponding source signal S_k(f,τ) is observed at all the sensors. Making the separated signals Y_i(f,τ) mutually uncorrelated, and further mutually independent, is the aim of independent component analysis, and in many cases these conditions are approximately satisfied. Consequently, through the above series of operations, the power of each separated signal Y_i(f,τ) becomes close to the sum of the powers with which the corresponding source signal S_k(f,τ) is observed at the sensors.
Note that, instead of the diagonal matrix Λ(f) of equation (4), the diagonal matrix Λ(f)=diag[W(f)^-1] may be used; more generally, a diagonal matrix Λ whose diagonal component in the j-th row is the element in the i-th column and j-th row of W(f)^-1 may be used. In this case, the power of each separated signal Y_i(f,τ) approximates the power with which the corresponding source signal S_k(f,τ) is observed at a given sensor j, namely |H_jk(f)・S_k(f,τ)|².

[Determination processing]
In the determination processing, the number of source signals is estimated from the separated signals Y_i(f,τ) in which the scaling problem has been resolved (the power has been restored). First, the control unit 50 (FIG. 1) assigns 1 to a variable i and stores it in the register 51 (step S10).
Next, the average power calculation unit 32a (FIG. 2(b)) of the power calculation unit 32 sequentially extracts, for example, the separated signal Y_i(f,τ) for each τ from the separated signal area 13 (FIG. 1) of the memory 10, sequentially calculates its power value |Y_i(f,τ)|², and stores it in the register 32b. The average power calculation unit 32a then reads the power values |Y_i(f,τ)|² stored in the register 32b, calculates the average power value of the separated signal Y_i(f,τ) over time τ,
σ_i²(f)←〈|Y_i(f,τ)|²〉_τ
and stores it in the register 32b (corresponding to the "storage unit") (step S11).

次に、エンベロープ相関算出部３３のエンベロープ算出部３３ａ（図３（ａ））が、例えばメモリ１０の分離信号領域１３（図１）から各τに対する分離信号Y_i(f,τ)を順次抽出し、その絶対値｜Y_i(f,τ)｜を順次算出してレジスタ３３ｂに格納する。次に、エンベロープ算出部３３ａは、レジスタ３２ｂに格納された絶対値｜Y_i(f,τ)｜を読み出し、時間τに関する平均が０になるように分離信号Y_i(f,τ)の絶対値｜Y_i(f,τ)｜を正規化したエンベロープ
ｖ_ｉ(f,τ)←｜Y_i(f,τ)｜-〈｜Y_i(f,τ)｜〉_τ
を算出してレジスタ３３ｂ（「記憶部」に相当）に格納する（ステップＳ１２）。 Next, the envelope calculation unit 33a (FIG. 3(a)) of the envelope correlation calculation unit 33 sequentially extracts the separated signal Y_i(f,τ) for each τ from, for example, the separated signal area 13 (FIG. 1) of the memory 10, sequentially calculates its absolute value |Y_i(f,τ)|, and stores it in the register 33b. Next, the envelope calculation unit 33a reads the absolute values |Y_i(f,τ)| stored in the register 32b, calculates the envelope, that is, the absolute value |Y_i(f,τ)| of the separated signal Y_i(f,τ) normalized so that its average over time τ becomes 0,
v_i(f,τ) ← |Y_i(f,τ)| − 〈|Y_i(f,τ)|〉_τ
and stores it in the register 33b (corresponding to the "storage unit") (step S12).
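Steps S11 and S12 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation described in the patent: the function names and the sample data are assumptions, and the separated signal of one frequency bin is represented as a plain list of complex numbers indexed by the frame τ.

```python
def average_power(y):
    """Average of |Y_i(f, tau)|^2 over the frames tau (step S11)."""
    return sum(abs(v) ** 2 for v in y) / len(y)


def envelope(y):
    """Zero-mean envelope |Y_i(f, tau)| - <|Y_i(f, tau)|>_tau (step S12)."""
    mags = [abs(v) for v in y]
    mean_mag = sum(mags) / len(mags)
    return [m - mean_mag for m in mags]


# Hypothetical separated signal for one frequency bin (four frames).
y_i = [3 + 4j, 0j, 3 - 4j, 0j]
sigma2_i = average_power(y_i)  # average power value
v_i = envelope(y_i)            # envelope whose time average is zero
```

Subtracting the mean magnitude is what makes the time average of the envelope zero, which the later correlation step relies on.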

次に、制御部５０（図１）が、レジスタ５１に格納された変数ｉがＭであるか否かを判断する（ステップ１３）。ここでｉ＝Ｍでなければ、制御部５０がｉに１を加算し、その値を新たなｉとしレジスタ５１に格納し（ステップＳ１４）、ステップＳ１１に戻る。一方、ｉ＝Ｍであれば、制御部５０は、この変数ｉに１を代入してレジスタ５１に格納し（ステップＳ１５）、以下の処理を実行する。
まず、パワー算出部３２のパワー正規化部３２ｃ（図２（ｂ））が、レジスタ３２ｂから平均パワー値σ₁ ²(f),...,σ_M ²(f)を抽出し、平均パワー値σ_i ²(f)を正規化した正規化パワー値

を算出して、メモリ１０のパワー値領域１４に格納する（ステップＳ１６）。 Next, the control unit 50 (FIG. 1) determines whether or not the variable i stored in the register 51 is M (step 13). If i = M is not satisfied, the control unit 50 adds 1 to i, sets the value as a new i, stores the value in the register 51 (step S14), and returns to step S11. On the other hand, if i = M, the controller 50 assigns 1 to the variable i and stores it in the register 51 (step S15), and executes the following processing.
First, the power normalization unit 32c (FIG. 2(b)) of the power calculation unit 32 extracts the average power values σ₁²(f), ..., σ_M²(f) from the register 32b, calculates the normalized power value NP_i(f) obtained by normalizing the average power value σ_i²(f), and stores it in the power value area 14 of the memory 10 (step S16).
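The expression for the normalized power value is not reproduced in this text. One plausible choice, assumed here purely for illustration, is to divide each average power value by the sum of the M average power values at the same frequency, so that the normalized values sum to 1. The function name and the sample numbers are hypothetical.

```python
def normalize_powers(sigma2):
    """Assumed normalization: each average power divided by the total.

    sigma2 holds the average power values of the M separated signals at
    one frequency. The patent's own expression may differ from this.
    """
    total = sum(sigma2)
    if total == 0:
        raise ValueError("all average power values are zero")
    return [s / total for s in sigma2]


# Hypothetical average powers of M = 3 separated signals in one bin.
np_values = normalize_powers([8.0, 1.5, 0.5])
```

With this normalization the thresholds applied later become independent of the overall recording level.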

次に、制御部５０（図１）が、変数ｋに１を代入し、レジスタ５１に格納する（ステップＳ１７）。次に、エンベロープ相関算出部３３の相関算出部３３ｃ（図３（ａ））が、レジスタ３３ｂからエンベロープv_i(f,τ)及びv_k(f,τ)を抽出する。そして、相関算出部３３ｃは、これらのエンベロープv_i(f,τ)及びv_k(f,τ)を用い、時間差Δτ（例えばL/2やL/4）による分離信号Y_i(f,τ)の分離信号Y_k(f,τ)とのエンベロープ相関値

を算出し、その演算結果Cor_i,k(f)をレジスタ３３ｂに格納する（ステップＳ１８）。なお、この例のΔτは、例えばプログラムコードに組み込まれた定数である。 Next, the control unit 50 (FIG. 1) assigns 1 to the variable k and stores it in the register 51 (step S17). Next, the correlation calculation unit 33c (FIG. 3(a)) of the envelope correlation calculation unit 33 extracts the envelopes v_i(f,τ) and v_k(f,τ) from the register 33b. Using these envelopes, the correlation calculation unit 33c then calculates the envelope correlation value Cor_{i,k}(f) between the separated signal Y_i(f,τ) and the separated signal Y_k(f,τ) for a time difference Δτ (for example, L/2 or L/4), and stores the calculation result Cor_{i,k}(f) in the register 33b (step S18). In this example, Δτ is a constant embedded in, for example, the program code.

次に、制御部５０（図１）はレジスタ５１に格納された変数ｋがＭであるか否かを判断する（ステップＳ１９）。ここで、ｋ＝Ｍでなかった場合、制御部５０がｋに１を加算し、その値を新たなｋとしてレジスタ５１に格納し、ステップＳ１８に戻る（ステップＳ２０）。一方、ｋ＝Ｍであった場合、エンベロープ相関算出部３３の最大値算出部３３ｄ（図３（ａ））は、レジスタ３３ｂ（図３（ａ））から、エンベロープ相関値Cor_i,1(f),...,Cor_i,M(f)を抽出する。そして、最大値算出部３３ｄは、これらを用い、エンベロープ相関値Cor_i,k(f)のｉごとの最大値maxCor_i(f)を算出し、メモリ１０のエンベロープ領域１５（図１）に格納する（ステップＳ２１）。 Next, the control unit 50 (FIG. 1) determines whether or not the variable k stored in the register 51 is M (step S19). If k is not M, the control unit 50 adds 1 to k, stores the result in the register 51 as the new k, and returns to step S18 (step S20). If, on the other hand, k = M, the maximum value calculation unit 33d (FIG. 3(a)) of the envelope correlation calculation unit 33 extracts the envelope correlation values Cor_{i,1}(f), ..., Cor_{i,M}(f) from the register 33b (FIG. 3(a)). Using these, the maximum value calculation unit 33d calculates, for each i, the maximum value maxCor_i(f) of the envelope correlation values Cor_{i,k}(f), and stores it in the envelope area 15 (FIG. 1) of the memory 10 (step S21).
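The expression for Cor_{i,k}(f) is not reproduced in this text, so the following sketch of steps S18 and S21 rests on assumptions: it uses a normalized cross-correlation (consistent with the later statement that the value lies between −1 and +1), it shifts the envelope of signal k by Δτ frames relative to signal i, and it treats Δτ as a frame count. The function names and sample envelopes are hypothetical.

```python
import math


def envelope_correlation(v_i, v_k, lag):
    """Assumed form: normalized correlation of v_i(tau) and v_k(tau + lag)."""
    n = len(v_i) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the envelope")
    num = sum(v_i[t] * v_k[t + lag] for t in range(n))
    den = math.sqrt(sum(a * a for a in v_i[:n]) * sum(b * b for b in v_k[lag:]))
    return num / den if den > 0 else 0.0


def max_correlation(envelopes, i, lag):
    """Maximum of Cor_{i,k} over k = 1..M for a fixed i, as in step S21."""
    return max(envelope_correlation(envelopes[i], v_k, lag) for v_k in envelopes)


# Hypothetical envelopes: the second is a delayed, attenuated copy of the first.
v_1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
v_2 = [-0.5, 0.5, -0.5, 0.5, -0.5, 0.5]
cor_12 = envelope_correlation(v_1, v_2, lag=1)
max_cor_1 = max_correlation([v_1, v_2], i=0, lag=1)
```

A delayed and attenuated copy still yields a correlation close to 1, which is the property that lets the method flag a separated signal as a reverberation component even when its power is not small.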

次に、判定部３４の比較部３４ａ（図３（ｂ））が、メモリ１０のパワー値領域１４、エンベロープ領域１５及びパラメータ領域１６から、平均パワー値の正規化値NP_i(f)、エンベロープ相関値の最大値maxCor_i(f)並びに第１パラメータth_noise、第２パラメータth_rev及び第３パラメータth_corを読み出す。そして、比較部３４ａは、以下の論理式により、分離信号Y_i(f,τ)が、源信号に対応するか、ノイズや残響成分に対応するかを判定する（ステップＳ２２）。

Next, the comparison unit 34a (FIG. 3(b)) of the determination unit 34 reads the normalized average power value NP_i(f), the maximum envelope correlation value maxCor_i(f), and the first parameter th_noise, the second parameter th_rev, and the third parameter th_cor from the power value area 14, the envelope area 15, and the parameter area 16 of the memory 10. The comparison unit 34a then determines, by the following logical expression, whether the separated signal Y_i(f,τ) corresponds to a source signal or to a noise or reverberation component (step S22).

すなわち、ここでは３種類のパラメータth_noise、th_rev、th_corを用いている。そして、平均パワー値の正規化値NP_i(f)が第１パラメータth_noise未満であればノイズ成分と判定し、平均パワー値の正規化値NP_i(f)が第２パラメータth_rev未満であり、さらにエンベロープ相関値の最大値maxCor_i(f)が第３パラメータth_corを超えれば残響成分と判定する。結局、sig_i(f)が0になれば、分離信号Y_i(f,τ)がノイズや残響成分に対応する（源信号成分でない）と判定されたことになり、sig_i(f)=1になれば、分離信号Y_i(f,τ)が源信号に対応すると判定されたことになる。そして、このように生成された判定結果sig_iはレジスタ３４ｂ（図３（ｂ））に送られて格納される。なお、上記論理式中の「＜」の少なくとも一部を「≦」としてもよく、「＞」を「≧」としてもよい。 That is, here, three types of parameters th _noise , th _rev , and th _cor are used. If the normalized value NP _i (f) of the average power value is less than the first parameter th _noise , it is determined as a noise component, and the normalized value NP _i (f) of the average power value is less than the second parameter th _rev In addition, if the maximum value maxCor _i (f) of the envelope correlation value exceeds the third parameter th _cor , it is determined as a reverberation component. Eventually, when sig _i (f) becomes 0, it is determined that the separated signal Y _i (f, τ) corresponds to noise and reverberation components (not source signal components), and sig _i (f) = When it is 1, it is determined that the separated signal Y _i (f, τ) corresponds to the source signal. The determination result sig _i generated in this way is sent to and stored in the register 34b (FIG. 3B). Note that at least a part of “<” in the above logical expression may be “≦”, and “>” may be “≧”.
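The decision rule described above can be written as a small function. This is a sketch of the rule as stated in the text, using the strict inequalities; the function name and the threshold values in the example are illustrative assumptions, not values taken from the patent.

```python
def is_source(np_i, max_cor_i, th_noise, th_rev, th_cor):
    """Return sig_i(f): 1 for a source signal, 0 for noise or reverberation."""
    if np_i < th_noise:
        return 0  # noise component: normalized power below th_noise
    if np_i < th_rev and max_cor_i > th_cor:
        return 0  # reverberation component: weak and strongly correlated
    return 1      # otherwise treated as a source signal


# Illustrative thresholds only (hypothetical values).
TH_NOISE, TH_REV, TH_COR = 0.01, 0.2, 0.5

sig = [
    is_source(0.70, 0.90, TH_NOISE, TH_REV, TH_COR),   # strong signal
    is_source(0.10, 0.80, TH_NOISE, TH_REV, TH_COR),   # weak and correlated
    is_source(0.005, 0.10, TH_NOISE, TH_REV, TH_COR),  # below the noise level
]
```

Note that a high envelope correlation alone does not reject a signal: a strong component is kept as a source regardless of its correlation, because the second test only applies when the normalized power is below th_rev.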

次に、制御部５０はレジスタ５１（図１）に格納されている変数ｉがＭであるか否かを判断する（ステップＳ２３）。ここで、ｉ＝Ｍでなければ、制御部５０がｉに１を加算し、その値を新たなｉとしてレジスタ５１に格納してステップＳ１６に戻る（ステップＳ２４）。一方、ｉ＝Ｍであれば、判定部３４の信号源数算出部３４ｃ（図３（ｂ））が、レジスタ３４ｂから判定結果sig₁(f),...,sig_M(f)を抽出し、信号源数推定値
EN(f)=Σ_isig_i(f)
を算出し、それをメモリ１０の信号源数領域１７（図１）に格納する（ステップＳ２５）。 Next, the control unit 50 determines whether or not the variable i stored in the register 51 (FIG. 1) is M (step S23). If i is not M, the control unit 50 adds 1 to i, stores the result in the register 51 as the new i, and returns to step S16 (step S24). If, on the other hand, i = M, the signal source number calculation unit 34c (FIG. 3(b)) of the determination unit 34 extracts the determination results sig₁(f), ..., sig_M(f) from the register 34b, calculates the estimated number of signal sources
EN(f) = Σ_i sig_i(f)
and stores it in the signal source number area 17 (FIG. 1) of the memory 10 (step S25).

次に、制御部５０は、レジスタ５１に格納された変数ｆが{(L-1)/L}f_s（f_sはサンプリング周波数）であるか否かを判断する（ステップＳ２６）。ここで、変数ｆが{(L-1)/L}f_sでなかった場合、制御部５０が変数ｆにf_s/Lを加算し、その値を新たな変数ｆとし、レジスタ５１に格納してステップＳ７の処理に戻る（ステップＳ２７）。一方、変数ｆが{(L-1)/L}f_sであった場合、以下の結果統合処理を行う。
［結果統合処理］
まず結果統合部４０が、メモリ１０の信号源数領域１７から各周波数fで推定された信号源数推定値EN(0),...,EN({(L-1)/L}f_s)を読み出し、これを元に、全体としての信号源数の推定値enを算出して出力する（ステップＳ２８）。この例では、単純に多数決で全体の推定値enを決定する。信頼できる周波数（例えば、高い周波数）に大きな重みを与えて、重みづけの多数決で全体の推定値enを決定しても良い。 Next, the control unit 50 determines whether or not the variable f stored in the register 51 is {(L-1)/L}f_s (f_s is the sampling frequency) (step S26). If the variable f is not {(L-1)/L}f_s, the control unit 50 adds f_s/L to the variable f, stores the result in the register 51 as the new variable f, and returns to the processing of step S7 (step S27). If, on the other hand, the variable f is {(L-1)/L}f_s, the following result integration processing is performed.
[Result integration processing]
First, the result integration unit 40 reads the estimated numbers of signal sources EN(0), ..., EN({(L-1)/L}f_s) estimated at each frequency f from the signal source number area 17 of the memory 10 and, based on these, calculates and outputs the estimated value en of the overall number of signal sources (step S28). In this example, the overall estimate en is determined by a simple majority vote. Alternatively, a larger weight may be given to reliable frequencies (for example, high frequencies), and the overall estimate en may be determined by a weighted majority vote.
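The per-frequency count EN(f) and the result integration step can be sketched as follows. The function names, the sample data, and the tie-breaking behavior (the count seen first wins) are assumptions; the text does not say how ties are resolved or how the weights are chosen.

```python
from collections import Counter


def count_sources(sig):
    """EN(f): sum of the determination results sig_1(f), ..., sig_M(f)."""
    return sum(sig)


def majority_vote(counts, weights=None):
    """Overall estimate en: the count with the largest (weighted) vote."""
    if weights is None:
        weights = [1.0] * len(counts)  # plain majority vote
    tally = Counter()
    for en_f, w in zip(counts, weights):
        tally[en_f] += w
    return max(tally.items(), key=lambda kv: kv[1])[0]


# Hypothetical per-frequency estimates for seven frequency bins.
en_per_bin = [2, 2, 3, 2, 1, 2, 3]
en_plain = majority_vote(en_per_bin)

# Weighted variant: the last two bins (higher frequencies) count double.
en_weighted = majority_vote([1, 1, 2, 2], weights=[1.0, 1.0, 2.0, 2.0])
```

Voting across bins makes the overall estimate robust to individual frequency bins in which the separation or the thresholds fail.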

［適用結果］
本形態の信号源数の推定方法を音源数の推定に適用した結果を示す。
図１０に一般的な実験条件を例示する。この実験条件は以下である。
・信号源：７秒間の音声
・残響時間：Ｔ_Ｒ＝２００ｍｓ
・背景ノイズパワー：−２１．８ｄＢ
・サンプリング周波数：ｆ_ｓ＝８０００Ｈｚ
・部屋の大きさ：４．４５ｍ×３．５５ｍ×２．５０ｍ
・音源数：１〜３個
・音源配置・間隔：４ｃｍの間隔で直線上に配置
・センサの数：３個
・中心音源と各センサとの距離：１．１ｍ
・中心音源と各センサを結んだ直線と、各センサが配置される直線とがなす角度：４５°，９０°，１２０°
この図１０に示す条件で１〜３個の音源を鳴らし、３個のマイクでの観測信号を用いて鳴っている音源の数を推定した。 [Application result]
The following shows the result of applying the signal source number estimation method of this embodiment to estimating the number of sound sources.
FIG. 10 illustrates the general experimental conditions, which are as follows.
・Signal source: 7 seconds of speech
・Reverberation time: T_R = 200 ms
・Background noise power: −21.8 dB
・Sampling frequency: f_s = 8000 Hz
・Room size: 4.45 m × 3.55 m × 2.50 m
・Number of sound sources: 1 to 3
・Sound source arrangement and spacing: arranged on a straight line at 4 cm intervals
・Number of sensors: 3
・Distance between the central sound source and each sensor: 1.1 m
・Angle between the straight line connecting the central sound source and each sensor and the straight line on which the sensors are arranged: 45°, 90°, 120°
Under the conditions shown in FIG. 10, one to three sound sources were played, and the number of active sound sources was estimated from the signals observed by three microphones.

図６に従来手法と本形態の手法とによる推定結果の比較を示す。ここで、図６（ａ）は固有値に基づく従来手法による信号源数の推定結果を示しており、図６（ｂ）は本形態の手法よる信号源数の推定結果を示している。また、横軸は真の音源数、縦軸は音源数0,1,2,3としてそれぞれ推定した周波数ビンの数を示す。
この図に示すように、従来手法では、１音源や３音源の場合にも多数決によると２音源と推定してしまっている。このように従来手法が推定を誤る原因は、図１１を用いて説明したように、個々の音源やノイズのパワーが固有値に適切に現れていないことや、残響の影響を考慮されていないことである。一方、本形態の手法によるとすべての場合に正しく推定されている。 FIG. 6 shows a comparison of estimation results between the conventional method and the method of this embodiment. Here, FIG. 6A shows the estimation result of the number of signal sources by the conventional method based on the eigenvalue, and FIG. 6B shows the estimation result of the number of signal sources by the method of the present embodiment. The horizontal axis represents the number of true sound sources, and the vertical axis represents the number of frequency bins estimated as the number of sound sources 0, 1, 2, and 3, respectively.
As shown in this figure, with the conventional method the majority vote yields an estimate of two sound sources even when there are actually one or three. As explained with reference to FIG. 11, the conventional method errs because the power of the individual sound sources and of the noise does not appear properly in the eigenvalues and because the influence of reverberation is not taken into account. With the method of this embodiment, by contrast, the number of sound sources is estimated correctly in all cases.

次に本形態の手法による推定が正確である理由を示す。
まず、パワーの回復（各分離信号が、各信号のパワーを適切に反映しているか）に関して考察する。図７は、３音源の場合にセンサで観測された真のパワー値（図７（ａ））と、これらの混合音を本形態の手法により分離した分離信号（図７（ｂ））のパワー値との比較を示すものである。なお、図７（ａ）の観測結果は、各音源を１つずつ鳴らして測定し、その結果を正規化したものである。これらの図に示すように、本形態の手法による各分離信号のパワー値は、各音源の観測値の真のパワー値に近似し、音源数を推定できる程度に正しくパワーが回復されていることがわかる。 Next, the reason why the estimation by the method of this embodiment is accurate will be described.
First, consider power recovery (whether each separated signal properly reflects the power of each signal). FIG. 7 compares, for the case of three sound sources, the true power values observed at the sensors (FIG. 7(a)) with the power values of the separated signals obtained by separating the mixture with the method of this embodiment (FIG. 7(b)). The observations in FIG. 7(a) were measured by playing each sound source one at a time, and the results were normalized. As these figures show, the power value of each separated signal obtained by the method of this embodiment approximates the true power value observed for each sound source, and the power is recovered correctly enough to estimate the number of sound sources.

次に、残響の影響ヘの対処に関して考察する。図８は、１音源の場合の１番目と２番目（i=1,2）の分離信号のパワー値（図８（ａ））とそれらのエンベロープの相関値（図８（ｂ））を示すものである。図８（ａ）のパワー値だけを見ると、２番目の分離信号のパワー値が決して十分には小さくないので、信号源なのか残響を含むノイズなのか判断し難い。しかし、右側に示す１番目と２番目の分離信号のエンベロープの相関値を見ると、その値が十分に大きいため、２番目の分離信号は１番目の分離信号の残響成分を多く含むノイズであることがわかる。すなわち、エンベロープの相関値は−１〜＋１の値をとり、信号間の相関性が低いほど０に近づく。図８（ｂ）の例では、エンベロープの相関値が０．６〜１の間に集中しており、１番目の分離信号と２番目の分離信号の相関性が高いことが分かる。そしてパワー値が弱い２番目の分離信号が１番目の信号の残響成分であることが推定できる。 Next, consider how the influence of reverberation is handled. FIG. 8 shows, for the case of one sound source, the power values of the first and second (i = 1, 2) separated signals (FIG. 8(a)) and the correlation values of their envelopes (FIG. 8(b)). Judging from the power values in FIG. 8(a) alone, the power value of the second separated signal is by no means sufficiently small, so it is difficult to tell whether it is a signal source or noise containing reverberation. However, the correlation values of the envelopes of the first and second separated signals, shown on the right, are sufficiently large, which indicates that the second separated signal is noise containing a large reverberation component of the first separated signal. The envelope correlation value takes values from −1 to +1 and approaches 0 as the correlation between the signals decreases. In the example of FIG. 8(b), the envelope correlation values are concentrated between 0.6 and 1, showing that the first and second separated signals are highly correlated. It can therefore be inferred that the second separated signal, whose power value is low, is a reverberation component of the first signal.

そして、これらの判断に必要なノイズレベルのしきい値を示す第１パラメータth_noise、残響レベルのしきい値を示す第２パラメータth_rev、及びエンベロープ相関値のしきい値を示す第３パラメータth_corを適切に設定することにより、ノイズや残響の影響が無視できない実環境において、アクティブな源信号の数を精度良く推定することができる。
〔第２の実施の形態〕
次に、本発明における第２の実施の形態について説明する。
本形態は第１の実施の形態の変形例であり、ＩＣＡを用いた信号分離の代わりに固有値に基づく信号分離を行う形態である。以下では、第１の実施の形態との相違点を中心に説明を行い、第１の実施の形態と共通する事項については説明を省略する。 By appropriately setting the three parameters needed for these determinations, namely the first parameter th_noise indicating the noise level threshold, the second parameter th_rev indicating the reverberation level threshold, and the third parameter th_cor indicating the envelope correlation value threshold, the number of active source signals can be estimated accurately in a real environment where the influence of noise and reverberation cannot be ignored.
[Second Embodiment]
Next, a second embodiment of the present invention will be described.
This embodiment is a modification of the first embodiment in which signal separation based on eigenvalues is performed instead of signal separation using ICA. The following description focuses on the differences from the first embodiment and omits matters common to the first embodiment.

図９（ａ）は、本形態における信号分離部１３１の構成を例示したブロック図である。
なお、本形態の推定装置と第１の実施の形態の推定装置１との相違点は、信号分離部３１が信号分離部１３１になる点のみである。また、本形態の処理と第１の実施の形態の処理との相違点は、信号分離処理（図４：ステップＳ７〜９）と平均パワー算出処理（図４：ステップＳ１１）のみである。 FIG. 9A is a block diagram illustrating the configuration of the signal separation unit 131 in this embodiment.
Note that the only difference between the estimation apparatus of the present embodiment and the estimation apparatus 1 of the first embodiment is that the signal separation unit 31 becomes the signal separation unit 131. Further, the difference between the process of the present embodiment and the process of the first embodiment is only the signal separation process (FIG. 4: steps S7 to 9) and the average power calculation process (FIG. 4: step S11).

［信号分離処理］
図９（ｂ）は、本形態の信号分離処理を説明するためのフローチャートである。
まず、信号分離部１３１の相関行列生成部１３１ａ（図９（ａ））が、メモリ１０の周波数毎の時系列データ領域１２（図１）から時系列データX_j(f,τ)を順次抽出し、時系列ベクトルX(f,τ)=[X₁(f,τ),...,X_M(f,τ)]^Tに対する相関行列R(f)=〈X(f,τ)・X(f,τ)^H〉_τを生成する（ステップＳ３１）。
生成された相関行列R(f)は固有値分解部１３１ｂ（図９（ａ））に送られ、固有値分解部１３１ｂはこの相関行列R(f)を、R(f)=V(f)・Λ(f)・V(f)^Hの積に分解する（ステップＳ３２）。なお、V(f)=[v₁(f),v₂(f),...,v_M(f)]とし、Λ(f)をλ₁(f)，λ₂(f)，...，λ_M(f)を対角要素とするＭ行Ｍ列の対角行列とし、v_j(f)を固有ベクトルとし、λ_j(f)をこれに対応する固有値とする。生成された固有値λ₁(f)，λ₂(f)，...，λ_M(f)は対応するτに関連つけてメモリ１０（図１）に格納され（ステップＳ３２）、V(f)は積演算部１３１ｃ（図９（ａ））に送られる。 [Signal separation processing]
FIG. 9B is a flowchart for explaining the signal separation processing of the present embodiment.
First, the correlation matrix generation unit 131a (FIG. 9(a)) of the signal separation unit 131 sequentially extracts the time-series data X_j(f,τ) from the per-frequency time-series data area 12 (FIG. 1) of the memory 10 and generates the correlation matrix R(f) = 〈X(f,τ)・X(f,τ)^H〉_τ for the time-series vector X(f,τ) = [X₁(f,τ), ..., X_M(f,τ)]^T (step S31).
The generated correlation matrix R(f) is sent to the eigenvalue decomposition unit 131b (FIG. 9(a)), which decomposes it into the product R(f) = V(f)・Λ(f)・V(f)^H (step S32). Here, V(f) = [v₁(f), v₂(f), ..., v_M(f)], Λ(f) is an M-row, M-column diagonal matrix whose diagonal elements are λ₁(f), λ₂(f), ..., λ_M(f), v_j(f) is an eigenvector, and λ_j(f) is the corresponding eigenvalue. The generated eigenvalues λ₁(f), λ₂(f), ..., λ_M(f) are stored in the memory 10 (FIG. 1) in association with the corresponding τ (step S32), and V(f) is sent to the product operation unit 131c (FIG. 9(a)).

積演算部１３１ｃは、メモリ１０の周波数毎の時系列データ領域１２（図１）から時系列データX_j(f,τ)を抽出し、[Y₁(f,τ),...,Y_M(f,τ)]^T=V(f)^H・[X₁(f,τ),...,X_M(f,τ)]^Tの演算によって、分離信号Y_i(f,τ)(i={1,...,M})を生成してメモリ１０の分離信号領域１３に格納する（ステップＳ３３）。なお、〈｜Y_i(f,τ)｜²〉_τ=λ_i(f)が成立する。 The product operation unit 131c extracts the time-series data X_j(f,τ) from the per-frequency time-series data area 12 (FIG. 1) of the memory 10, generates the separated signals Y_i(f,τ) (i = {1, ..., M}) by the operation [Y₁(f,τ), ..., Y_M(f,τ)]^T = V(f)^H・[X₁(f,τ), ..., X_M(f,τ)]^T, and stores them in the separated signal area 13 of the memory 10 (step S33). Note that 〈|Y_i(f,τ)|²〉_τ = λ_i(f) holds.
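The identity between the average power of each separated signal and the corresponding eigenvalue can be checked in a few lines. The derivation below assumes that the eigenvectors are orthonormal, so that V(f)^H V(f) = I; the text does not state this explicitly, but it can always be arranged for a Hermitian correlation matrix R(f). Here Y(f,τ) denotes the vector [Y₁(f,τ), ..., Y_M(f,τ)]^T.

```latex
\begin{aligned}
\langle Y(f,\tau)\,Y(f,\tau)^{H}\rangle_{\tau}
  &= V(f)^{H}\,\langle X(f,\tau)\,X(f,\tau)^{H}\rangle_{\tau}\,V(f) \\
  &= V(f)^{H}\,R(f)\,V(f) \\
  &= V(f)^{H}\,V(f)\,\Lambda(f)\,V(f)^{H}\,V(f) \\
  &= \Lambda(f).
\end{aligned}
```

The i-th diagonal element of the left-hand side is 〈|Y_i(f,τ)|²〉_τ and that of the right-hand side is λ_i(f), which gives the stated relation. The vanishing off-diagonal elements show that the separated signals are mutually uncorrelated.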

［平均パワー算出処理］
本形態では、第１の実施の形態のステップＳ１１において、パワー算出部３２の平均パワー算出部３２ａ（図２（ｂ））が、分離信号Y_i(f,τ)からパワー値｜Y_i(f,τ)｜²を算出し、平均パワー値σ_i ²(f)←〈｜Y_i(f,τ)｜²〉_τを算出してレジスタ３２ｂに格納していた代わりに、平均パワー算出部３２ａがメモリ１０（図１）から固有値λ_i(f)を順次抽出し、分離信号Y_i(f,τ)の時間τに関する平均パワー値
σ_i ²(f)←λ_i(f)
を算出して、レジスタ３２ｂに格納する。
なお、その他の処理については第１の実施の形態と同様である。 [Average power calculation processing]
In this embodiment, the processing of step S11 of the first embodiment, in which the average power calculation unit 32a (FIG. 2(b)) of the power calculation unit 32 calculates the power value |Y_i(f,τ)|² from the separated signal Y_i(f,τ), calculates the average power value σ_i²(f) ← 〈|Y_i(f,τ)|²〉_τ, and stores it in the register 32b, is replaced as follows. The average power calculation unit 32a sequentially extracts the eigenvalues λ_i(f) from the memory 10 (FIG. 1), sets the average power value of the separated signal Y_i(f,τ) with respect to time τ to
σ_i²(f) ← λ_i(f)
and stores it in the register 32b.
The other processing is the same as in the first embodiment.

以上のような構成の場合、パワーが適切に回復されない問題（各信号のパワーが固有値に適切に現れていない問題）は解決されないが、第１の実施の形態と同様、残響の影響の問題は解決できる。そのため、パワーが適切に回復されない問題の影響が少ない周波数領域では、本形態でも正確な信号源数の推定ができる。
なお、本発明は上述の各実施の形態に限定されるものではない。例えば、上述の各種の処理は、記載に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されてもよい。その他、本発明の趣旨を逸脱しない範囲で適宜変更が可能であることはいうまでもない。 With the configuration described above, the problem that power is not properly recovered (the problem that the power of each signal does not appear properly in the eigenvalues) is not solved, but, as in the first embodiment, the problem of the influence of reverberation can be solved. Therefore, in frequency regions where the power recovery problem has little effect, this embodiment can also estimate the number of signal sources accurately.
The present invention is not limited to the embodiments described above. For example, the various processes described above are not only executed in time series according to the description, but may also be executed in parallel or individually as required by the processing capability of the apparatus that executes the processes. Needless to say, other modifications are possible without departing from the spirit of the present invention.

また、上述の構成をコンピュータによって実現する場合、各装置が有すべき機能の処理内容はプログラムによって記述される。そして、このプログラムをコンピュータで実行することにより、上記処理機能がコンピュータ上で実現される。
この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよいが、具体的には、例えば、磁気記録装置として、ハードディスク装置、フレキシブルディスク、磁気テープ等を、光ディスクとして、ＤＶＤ（Digital Versatile Disc）、ＤＶＤ−ＲＡＭ（Random Access Memory）、ＣＤ−ＲＯＭ（Compact Disc Read Only Memory）、ＣＤ−Ｒ（Recordable）／ＲＷ（ReWritable）等を、光磁気記録媒体として、ＭＯ（Magneto-Optical disc）等を、半導体メモリとしてＥＥＰ−ＲＯＭ（Electronically Erasable and Programmable-Read Only Memory）等を用いることができる。 Further, when the above-described configuration is realized by a computer, processing contents of functions that each device should have are described by a program. The processing functions are realized on the computer by executing the program on the computer.
The program describing the processing contents can be recorded on a computer-readable recording medium. The computer-readable recording medium may be any medium such as a magnetic recording device, an optical disk, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, the magnetic recording device may be a hard disk device or a flexible Discs, magnetic tapes, etc. as optical disks, DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable) / RW (ReWritable), etc. As the magneto-optical recording medium, MO (Magneto-Optical disc) or the like can be used, and as the semiconductor memory, EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like can be used.

また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ−ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。
このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶装置に格納する。そして、処理の実行時、このコンピュータは、自己の記録媒体に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるＡＳＰ（Application Service Provider）型のサービスによって、上述の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。 The program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of the server computer and transferring the program from the server computer to another computer via a network.
A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or the program transferred from a server computer temporarily in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or, each time a program is transferred from the server computer to this computer, it may sequentially execute processing according to the received program. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to this computer and the processing functions are realized only through execution instructions and result acquisition. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

また、この形態では、コンピュータ上で所定のプログラムを実行させることにより、本装置を構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 In this embodiment, the present apparatus is configured by executing a predetermined program on a computer. However, at least a part of these processing contents may be realized by hardware.

本発明の音信号に対する応用例としては、例えば、適応ビームフォーマやブラインド音源分離の前処理において、ある区間でのアクティブな音源数を推定する処理を例示できる。 As an application example of the sound signal of the present invention, for example, a process of estimating the number of active sound sources in a certain section in pre-processing of adaptive beamformer or blind sound source separation can be exemplified.

図１：第１の実施の形態における推定装置の全体を示すブロック図。 FIG. 1: Block diagram showing the entire estimation apparatus in the first embodiment.
図２：（ａ）は図１に例示した信号分離部の機能構成を例示したブロック図。（ｂ）はパワー算出部の機能構成を例示したブロック図。 FIG. 2: (a) is a block diagram illustrating the functional configuration of the signal separation unit illustrated in FIG. 1. (b) is a block diagram illustrating the functional configuration of the power calculation unit.
図３：（ａ）はエンベロープ相関算出部の機能構成を例示したブロック図。（ｂ）は判定部の機能構成を例示したブロック図。 FIG. 3: (a) is a block diagram illustrating the functional configuration of the envelope correlation calculation unit. (b) is a block diagram illustrating the functional configuration of the determination unit.
図４：第１の実施の形態における信号源数の推定方法を説明するためのフローチャート。 FIG. 4: Flowchart for explaining the method for estimating the number of signal sources in the first embodiment.
図５：第１の実施の形態における信号源数の推定方法を説明するためのフローチャート。 FIG. 5: Flowchart for explaining the method for estimating the number of signal sources in the first embodiment.
図６：（ａ）は固有値に基づく従来手法による信号源数の推定結果を示したグラフ。（ｂ）は本形態の手法よる信号源数の推定結果を示したグラフ。 FIG. 6: (a) is a graph showing the number of signal sources estimated by the conventional eigenvalue-based method. (b) is a graph showing the number of signal sources estimated by the method of this embodiment.
図７：（ａ）は３音源の場合にセンサで観測された真のパワー値を示した図。（ｂ）は混合音を第１の実施の形態の手法により分離した分離信号のパワー値を示した図。 FIG. 7: (a) is a diagram showing the true power values observed at the sensors in the case of three sound sources. (b) is a diagram showing the power values of the separated signals obtained by separating the mixture with the method of the first embodiment.
図８：（ａ）は１音源の場合の１番目と２番目の分離信号のパワー値を示した図。（ｂ）は、それらのエンベロープの相関値を示した図。 FIG. 8: (a) is a diagram showing the power values of the first and second separated signals in the case of one sound source. (b) is a diagram showing the correlation values of their envelopes.
図９：（ａ）は、第２の実施の形態における信号分離部の構成を例示したブロック図。（ｂ）は第２の実施の形態における信号分離処理を説明するためのフローチャート。 FIG. 9: (a) is a block diagram illustrating the configuration of the signal separation unit in the second embodiment. (b) is a flowchart for explaining the signal separation processing in the second embodiment.
図１０：実験条件を示した図。 FIG. 10: Diagram showing the experimental conditions.
図１１：固有値に基づく方法でのパワー推定値を示した図。（ａ）は、１つの音源だけを鳴らした場合の各周波数における固有値の正規化パワー値。（ｂ）は、３音源すべてを鳴らした場合の各周波数における固有値の正規化パワー値。 FIG. 11: Diagram showing the power estimates obtained by the eigenvalue-based method. (a) shows the normalized power values of the eigenvalues at each frequency when only one sound source is played. (b) shows the normalized power values of the eigenvalues at each frequency when all three sound sources are played.

Explanation of symbols

１ 推定装置　　３１ 信号分離部
１０ メモリ　　３２ パワー算出部
２０ 周波数領域変換部　　３３ エンベロープ相関算出部
３０ 信号減数推定部　　３４ 判定部
1 Estimation apparatus; 31 Signal separation unit
10 Memory; 32 Power calculation unit
20 Frequency domain conversion unit; 33 Envelope correlation calculation unit
30 Signal source number estimation unit; 34 Determination unit

Claims

A method for estimating the number of signal sources, which estimates the number of signal sources from observed signals,
Data for specifying a plurality of parameters is stored in the storage unit,
A procedure in which a frequency domain conversion unit converts observed signals x_j(t) (j = {1, ..., M}) from M sensors into time-series data X_j(f,τ) for each frequency and stores them in the storage unit;
A procedure in which a signal separation unit generates separated signals Y_i(f,τ) (i = {1, ..., M}) from the time-series data X_j(f,τ) and stores them in the storage unit;
The power calculation unit calculates the power value of each of the separated signals Y _i (f, τ) and stores it in the storage unit,
A procedure in which an envelope correlation calculating unit calculates an envelope correlation value for a time difference Δτ between different separated signals Y _i (f, τ) and stores it in a storage unit;
The determination unit compares the power value and envelope correlation value of each of the separated signals Y _i (f, τ) with the parameters, and the separated signal Y _i (f, τ) is a source signal component. A procedure for determining whether or not
A method for estimating the number of signal sources.

A method for estimating the number of signal sources according to claim 1,
The procedure for generating the separated signal Y _i (f, τ) from the time series data X _j (f, τ) and storing it in the storage unit is as follows.
A procedure in which an independent component analysis unit generates, by independent component analysis, an M-row, M-column separation matrix W(f) and ICA-separated signals [Z₁(f,τ), ..., Z_M(f,τ)]^T from the time-series data X₁(f,τ), ..., X_M(f,τ) and stores them in the storage unit;
A procedure in which a diagonal matrix generation unit generates, from the separation matrix W(f), a diagonal matrix Λ(f) for solving the scaling problem; and
A procedure in which a product operation unit generates the separated signals Y_i(f,τ) by the operation [Y₁(f,τ), ..., Y_M(f,τ)]^T ← Λ(f)・[Z₁(f,τ), ..., Z_M(f,τ)]^T and stores them in the storage unit;
A method for estimating the number of signal sources.

A method for estimating the number of signal sources according to claim 1,
The procedure for generating the separated signal Y _i (f, τ) from the time series data X _j (f, τ) and storing it in the storage unit is as follows.
A procedure in which a correlation matrix generation unit generates a correlation matrix R(f) ← 〈X(f,τ)・X(f,τ)^H〉_τ for the time-series vector X(f,τ) = [X₁(f,τ), ..., X_M(f,τ)]^T,
A procedure in which an eigenvalue decomposition unit decomposes the correlation matrix R(f) into the product R(f) = V(f)・Λ(f)・V(f)^H, where V(f) = [v₁(f), v₂(f), ..., v_M(f)], Λ(f) is an M-row, M-column diagonal matrix whose diagonal elements are λ₁(f), λ₂(f), ..., λ_M(f), v_j(f) is an eigenvector, and λ_j(f) is the corresponding eigenvalue, and
A procedure in which a product operation unit generates the separated signals Y_i(f,τ) by the operation [Y₁(f,τ), ..., Y_M(f,τ)]^T ← V(f)^H・[X₁(f,τ), ..., X_M(f,τ)]^T and stores them in the storage unit,
A method for estimating the number of signal sources.

A method for estimating the number of signal sources according to claim 1,
The procedure for calculating the power value of each separated signal Y _i (f, τ) and storing it in the storage unit is as follows:
The average power calculation unit calculates the average power value for the time τ of each separated signal Y _i (f, τ),
A power normalization unit normalizing the average power value,
The procedure for determining whether the separated signal Y _i (f, τ) is a source signal component is as follows:
It is a procedure for comparing the normalized value of the average power value and the envelope correlation value with each of the above parameters and determining whether or not the separated signal Y _i (f, τ) is a source signal component.
A method for estimating the number of signal sources.

A method for estimating the number of signal sources according to claim 1,
The procedure for calculating the envelope correlation value and storing it in the storage unit is as follows.
A procedure in which an envelope calculation unit calculates an envelope v_i(f,τ) by normalizing the absolute value |Y_i(f,τ)| of the separated signal Y_i(f,τ) so that its average over time τ becomes 0, and stores it in the storage unit;
A procedure in which a correlation calculation unit calculates an envelope correlation value Cor_{i,k}(f) by the operation

and stores it in the storage unit; and
A procedure in which a maximum value calculation unit calculates, for each i, the maximum value maxCor_i(f) of the envelope correlation values Cor_{i,k}(f), and
The procedure for determining whether the separated signal Y _i (f, τ) is a source signal component is as follows:
The power value of each of the separated signals Y _i (f, τ) and the maximum value maxCor _i (f) are compared with the parameters, and the separated signal Y _i (f, τ) is a source signal component. It is a procedure to determine whether there is,
A method for estimating the number of signal sources.
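The envelope and maximum-correlation steps of this claim might be sketched as follows. The exact formula for Cor_{i,k}(f) is not available in this text, so the sketch assumes a normalized cross-correlation between zero-mean envelopes, evaluated over a small range of time differences Δτ and maximized; that formula, the lag range `max_lag`, and all function names are assumptions and may differ from the patented calculation.

```python
import numpy as np

def envelopes(Y):
    """Zero-mean envelopes v_i(f, tau) from Y of shape (M, T)."""
    mag = np.abs(Y)
    return mag - mag.mean(axis=1, keepdims=True)

def lagged_correlation(a, b, lag):
    """Normalized correlation between a(tau) and b(tau + lag) (assumed form)."""
    if lag >= 0:
        x, y = a[: len(a) - lag], b[lag:]
    else:
        x, y = a[-lag:], b[: len(b) + lag]
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return float(np.sum(x * y) / denom) if denom > 0 else 0.0

def max_envelope_correlation(Y, max_lag=5):
    """maxCor_i(f): largest correlation of envelope i with any other envelope k."""
    v = envelopes(Y)
    M = v.shape[0]
    # Starting from 0 means negative correlations are ignored (a design choice).
    max_cor = np.zeros(M)
    for i in range(M):
        for k in range(M):
            if k == i:
                continue
            for lag in range(-max_lag, max_lag + 1):
                max_cor[i] = max(max_cor[i], lagged_correlation(v[i], v[k], lag))
    return max_cor
```

Searching over a range of lags reflects the idea that a reverberant component tends to follow the envelope of a direct component with some delay; a component whose envelope tracks another one closely receives a high maxCor_i(f).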

A method for estimating the number of signal sources according to claim 1,
The parameters are
A first parameter th_noise indicating a noise level threshold, a second parameter th_rev indicating a reverberation level threshold, and a third parameter th_cor indicating an envelope correlation value threshold,
The procedure for determining whether the separated signal Y_i(f, τ) is a source signal component is
A procedure for determining that the separated signal Y_i(f, τ) is not a source signal component when the power value of the separated signal Y_i(f, τ) is less than or equal to, or less than, the first parameter th_noise, or when the power value of the separated signal Y_i(f, τ) is less than or equal to the second parameter th_rev and the envelope correlation value is greater than or equal to the third parameter th_cor,
A method for estimating the number of signal sources.
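The decision rule in this claim can be written as a short Python sketch. The function names are hypothetical, and the helper `count_sources`, which counts the components that pass the test within one frequency bin, is an assumption about how the per-component decisions would be turned into a source count.

```python
def is_source_component(power, cor, th_noise, th_rev, th_cor):
    """Return False when the component is judged to be noise or reverberation.

    power: (normalized) power value of Y_i(f, tau)
    cor:   envelope correlation value for that component
    """
    # Noise test. The claim allows either "<=" or "<"; "<=" is used here.
    if power <= th_noise:
        return False
    # Reverberation test: weak power AND envelope similar to another component.
    if power <= th_rev and cor >= th_cor:
        return False
    return True

def count_sources(powers, cors, th_noise, th_rev, th_cor):
    """Assumed helper: number of components judged to be source signals."""
    return sum(
        is_source_component(p, c, th_noise, th_rev, th_cor)
        for p, c in zip(powers, cors)
    )
```

For example, with hypothetical thresholds th_noise = 0.01, th_rev = 0.1 and th_cor = 0.5, the components with (power, correlation) pairs (1.0, 0.1), (0.2, 0.1), (0.05, 0.8) and (0.001, 0.0) are classified as source, source, reverberation and noise respectively, so `count_sources` returns 2.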

An estimation device for estimating the number of signal sources from observed signals,
A storage unit storing data for specifying a plurality of parameters;
A frequency domain conversion unit that converts observation signals x_j(t) (j = 1, ..., M) from M sensors into time-series data X_j(f, τ) for each frequency and stores them in the storage unit;
A signal separation unit that generates separated signals Y_i(f, τ) (i = 1, ..., M) from the time-series data X_j(f, τ) and stores them in the storage unit;
A power calculation unit that calculates the power value of each separated signal Y_i(f, τ) and stores it in the storage unit;
An envelope correlation calculation unit that calculates an envelope correlation value between different separated signals Y_i(f, τ) with respect to a time difference Δτ and stores it in the storage unit;
A determination unit that compares the power value and the envelope correlation value of each separated signal Y_i(f, τ) with the respective parameters and determines whether or not the separated signal Y_i(f, τ) is a source signal component;
An estimation device characterized by comprising the above units.
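The frequency domain conversion unit listed in this claim converts each sensor signal into per-frequency time series, which the description refers to as a short-time Fourier transform. A minimal NumPy sketch is given below; the frame length, hop size, Hann window, and the function name `stft` are assumptions for illustration and are not specified by the claim.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform of multichannel data.

    x: real array of shape (M, n_samples), one row per sensor,
       with n_samples >= frame_len.
    Returns X of shape (M, F, T): X[j, f, tau] corresponds to X_j(f, tau),
    with F = frame_len // 2 + 1 frequency bins and T frames.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    # Cut each channel into overlapping windowed frames: shape (M, frame_len, T).
    frames = np.stack(
        [x[:, t * hop : t * hop + frame_len] * window for t in range(n_frames)],
        axis=-1,
    )
    # Real-input FFT along the sample axis of each frame.
    return np.fft.rfft(frames, axis=1)
```

Each slice `X[:, f, :]` then has shape (M, T) and can be handed to the separation, power, and envelope correlation steps one frequency bin at a time.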

A procedure for converting observation signals x_j(t) (j = 1, ..., M) from M sensors into time-series data X_j(f, τ) for each frequency;
A procedure for generating separated signals Y_i(f, τ) (i = 1, ..., M) from the time-series data X_j(f, τ) and storing them in a storage unit;
A procedure for calculating the power value of each separated signal Y_i(f, τ) and storing it in the storage unit;
A procedure for calculating an envelope correlation value between different separated signals Y_i(f, τ) with respect to a time difference Δτ and storing it in the storage unit;
A procedure for comparing the power value and the envelope correlation value of each separated signal Y_i(f, τ) with the respective parameters stored in the storage unit and determining whether or not the separated signal Y_i(f, τ) is a source signal component;
An estimation program for causing a computer to execute the above procedures.

A computer-readable recording medium on which the estimation program according to claim 8 is recorded.