JP4714892B2

JP4714892B2 - High reverberation blind signal separation apparatus and method

Info

Publication number: JP4714892B2
Application number: JP2005132885A
Authority: JP
Inventors: 博五反田; 武志古屋; 圭市金田
Original assignee: Kitakyushu Foundation for Advancement of Industry Science and Technology
Current assignee: Kitakyushu Foundation for Advancement of Industry Science and Technology
Priority date: 2005-04-28
Filing date: 2005-04-28
Publication date: 2011-06-29
Anticipated expiration: 2025-04-28
Also published as: JP2006308955A

Description

The present invention relates to a signal separation technique for separating a mixed signal in which a plurality of statistically independent original signals are linearly mixed, and in particular to a technique that separates the original signals using independent component analysis (hereinafter "ICA").

The signal processing technique of separating a mixed signal into its original signals is called blind signal separation. In recent years ICA has attracted attention as a blind signal separation method, and research and development are under way in fields such as audio signal processing, image processing, and communications. In ICA, for example, a mixed sound in which original signals emitted from a plurality of sound sources are linearly mixed is received by R sound receivers. By processing the resulting observed signals on the basis that the original signals are statistically independent, R original signals (the same number as the receivers) or fewer can be separated.

ICA is broadly divided into time-domain ICA (hereinafter "TDICA"), which performs signal separation in the time domain, and frequency-domain ICA (hereinafter "FDICA"), which performs signal separation in the frequency domain.

The case in which the sound waves reaching two or more sound receivers involve no time delay or convolution, and the elements of the mixing matrix are real constants, is called a "spatial mixture" or "instantaneous mixture". For an instantaneous mixture, the sound sources can be completely separated by either TDICA or FDICA.

In a real environment, however, the signals received at the individual sound receivers differ in arrival time because the distances between the sound sources and the receivers differ. Echoes and reverberation are also convolved into the observed signals. The case in which the original signals are mixed both spatially and temporally in this way is called a "spatio-temporal mixture" or "convolutive mixture". The problem of separating the original signals of a convolutive mixture is called "blind source deconvolution" (hereinafter "BSD"). The BSD problem is formulated as follows.

[1] Blind source deconvolution (BSD)
Let R₁ be the number of sound sources and R₂ the number of sound receivers. The original signal output from a sound source is denoted s_i(t), and the observed signal at a sound receiver is denoted x_j(t), where i is the index of the sound source and j is the index of the sound receiver.

The observed signal x_j(t) in the time domain is modeled with an L-th order filter as in the following equation.

Here, t denotes time and τ denotes the delay. The impulse response from each sound source to each sound receiver is written a_ji(τ), and the matrix A(τ) = (a_ji(τ)) is called the "mixing matrix". A typical impulse response a_ji(τ) shows a strong pulse-like response after a certain delay and then decays with time. Let T be the length of this impulse response; T is unknown.

The aim of BSD is to obtain, without any information about the original signals s_i(t) or the impulse responses a_ji(τ), the coefficients w_ij(τ) of a length-L FIR (finite impulse response) separation filter and the separated signals y_i(t). The coefficient matrix W(τ) = (w_ij(τ)) is called the "separation matrix".
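As an illustration of the model described above, the following sketch mixes sources through FIR impulse responses. The equations referred to as (1) and (2) are not reproduced in this text, so the summation form used here, x_j(t) = Σ_i Σ_τ a_ji(τ) s_i(t−τ), is an assumption based on the surrounding definitions. The function name `convolve_mix` and all sample values are hypothetical.

```python
def convolve_mix(filters, signals):
    """Apply a matrix of FIR filters to a list of signals.

    filters[j][i] is the impulse response (list of taps) from input i to
    output j; signals[i] is the i-th input sequence. Returns
    out[j][t] = sum_i sum_tau filters[j][i][tau] * signals[i][t - tau],
    with samples before t = 0 taken as zero.
    """
    n_samples = len(signals[0])
    outputs = []
    for row in filters:
        out = [0.0] * n_samples
        for taps, sig in zip(row, signals):
            for t in range(n_samples):
                for tau, a in enumerate(taps):
                    if t - tau >= 0:
                        out[t] += a * sig[t - tau]
        outputs.append(out)
    return outputs


# Two sources, two receivers, impulse responses of length 2 (hypothetical values).
s = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
A = [[[1.0, 0.5], [0.3, 0.0]],
     [[0.2, 0.0], [1.0, 0.4]]]
x = convolve_mix(A, s)
```

Under the same assumption, passing a separation matrix W(τ) and the observed signals x to the same function would give the separated signals y_i(t).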

FIG. 12 shows the BSD configuration for R₁ = 2 and R₂ = 2. Transforming Equations (1) and (2) into the frequency domain gives the following equations.

Here, x_j(ω_n, k) and s_i(ω_n, k) are the discrete Fourier transforms (hereinafter "DFT") of x_j(t) and s_i(t), respectively; a_ji(ω_n) and w_ij(ω_n) are the DFTs of a_ji(t) and w_ij(t), respectively; ω_n is the normalized angular frequency; and k is the frame number.

TDICA performs signal separation by computing w_ij(τ) of Equation (2) in the time domain, whereas FDICA does so by computing w_ij(ω_n) of Equation (4) in the frequency domain. In BSD, separating the original signals from the mixed signal by TDICA requires a filter whose order (tap length) L runs to several thousand, which raises problems of algorithm stability and computational load. BSD is therefore generally solved by FDICA. This is because a convolution in the time domain becomes a product in the frequency domain, so the problem can be handled as simply as an instantaneous mixture.
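The statement that a time-domain convolution becomes a frequency-domain product can be checked numerically. The sketch below uses a naive DFT and a circular convolution on one short frame; the helper names and sample values are hypothetical and chosen only for illustration.

```python
import cmath


def dft(seq):
    """Naive discrete Fourier transform of a sequence."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def circular_convolve(a, s):
    """Circular convolution of two equal-length sequences."""
    n = len(s)
    return [sum(a[tau] * s[(t - tau) % n] for tau in range(n)) for t in range(n)]


a = [1.0, 0.5, 0.25, 0.0]   # hypothetical impulse response
s = [0.3, -1.2, 0.7, 2.0]   # hypothetical source frame

lhs = dft(circular_convolve(a, s))                  # transform of the convolution
rhs = [af * sf for af, sf in zip(dft(a), dft(s))]   # product of the transforms
max_error = max(abs(l - r) for l, r in zip(lhs, rhs))
```

The identity is exact for circular convolution. For a linear convolution observed through a finite analysis window it holds only approximately, and the approximation degrades when the impulse response is long relative to the window.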

[2] TDICA
Next, TDICA is outlined briefly. In TDICA, the separation matrix W(τ) = (w_ij(τ)) is determined so that the components of the separated signal Y(t) = [y₁(t), …, y_R1(t)]^T (where x^T denotes the transpose of x) become independent. As a measure of independence, the Kullback-Leibler information (hereinafter "KL information") I_KL(Y(t)) between the joint distribution and the marginal distributions of the components of Y(t) can be used.

Here, C(Y) denotes the entropy of Y. To make the components of the separated signal Y(t) completely independent, the separation matrix W(τ) is determined so that I_KL(Y(t)) = 0. Because the KL information is non-negative, it is minimized where its derivative with respect to W(τ) is zero. Solving this yields the following update rule for the separation matrix.
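Equation (5) is not reproduced in this text. As a rough illustration of the independence measure, the sketch below computes the KL divergence between a discrete joint distribution and the product of its marginals. The discrete two-variable setting and the helper name `kl_independence` are assumptions made only for this example.

```python
import math


def kl_independence(joint):
    """KL divergence between a 2-D joint pmf and the product of its marginals."""
    p_row = [sum(row) for row in joint]
    p_col = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                total += p * math.log(p / (p_row[i] * p_col[j]))
    return total


independent = [[0.25, 0.25], [0.25, 0.25]]   # joint equals product of marginals
dependent = [[0.45, 0.05], [0.05, 0.45]]     # strongly coupled components

kl_independent = kl_independence(independent)
kl_dependent = kl_independence(dependent)
```

The value is zero for the independent table and strictly positive for the coupled one, which is the property the text relies on when it drives I_KL(Y(t)) toward zero.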

Here, I is the R₁×R₁ identity matrix when R₁ = R₂, and the R₁×R₂ matrix (I : O) when R₁ < R₂, where O is the R₁×(R₂−R₁) zero matrix; η is the search step size; and δ(τ) is the Kronecker delta. Ideally, φ is the function defined by the following equation using the marginal probability density p_yi(y) of y_i. In general, however, the exact distribution of a sound source is unknown, so a nonlinear function such as tanh(·) is used for φ.
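To illustrate how a nonlinearity such as tanh relates to a density, the sketch below assumes the common score-function form φ(y) = −d log p(y)/dy. Equations (8) and (9) are not reproduced in this text, so this form is an assumption. Under it, the density p(y) = 1/(π cosh y) gives φ(y) = tanh(y), which the code checks with a finite difference.

```python
import math


def log_density(y):
    """Log of the density p(y) = 1 / (pi * cosh(y))."""
    return -math.log(math.pi * math.cosh(y))


def score(y, h=1e-6):
    """phi(y) = -d/dy log p(y), estimated by a central difference."""
    return -(log_density(y + h) - log_density(y - h)) / (2.0 * h)


points = [-2.0, -0.5, 0.0, 0.7, 3.0]
max_gap = max(abs(score(y) - math.tanh(y)) for y in points)
```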

First, consider the off-diagonal components of the TDICA algorithm (Equation (6)). They are expressed as in Equation (10).

Therefore, Equation (11) holds at the convergence point of W(τ).

This shows that, after convergence, the separated signals y_i(t) and y_j(t) are independent not only spatially but also temporally. That is, the off-diagonal components of Equation (6) act to make the separated signals independent both spatially and temporally.

Next, consider the diagonal components of Equation (6). They are expressed as in Equation (12).

Therefore, Equation (13) holds at the convergence point of W(τ).

This shows that the separated signal y_i(t) is scaled so that φ(y_i(t)) y_i(t) = 1 and, in addition, becomes temporally independent. That is, the diagonal components of Equation (6) scale the separated signal and also make it temporally independent (whiten it). For the off-diagonal components, temporal independence is not a problem, because they act to make signals that were different to begin with independent of each other. The diagonal components, however, make a single signal temporally independent of itself, which introduces distortion. This distortion is called "whitening distortion".

FIG. 13 shows the mixing process and the separation process as seen from TDICA. The white signal s_i'(t) is assumed to come from a source whose distribution is i.i.d. (independent and identically distributed), and the original signal s_i(t) is the speech signal obtained when s_i'(t) is shaped by the articulatory organs of the oral cavity. Unlike s_i'(t), the original signal s_i(t) is temporally correlated. The TDICA algorithm assumes that the signals to be separated are i.i.d. and updates the separation matrix so as to satisfy that assumption. The resulting separated signal y_i(t) is therefore made independent (whitened) temporally as well as spatially, as described above. Consequently, when the TDICA algorithm is applied to a temporally correlated original signal s_i(t) such as speech, the separated signal y_i(t) is an estimate of s_i'(t) rather than of s_i(t), and is distorted. Because the distorted signal resembles white noise, this distortion is called whitening distortion.
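The effect described above can be reproduced with a toy signal. In the sketch below a white sequence stands in for s'(t), a one-pole filter stands in for the articulation that colors it into s(t), and the inverse filter plays the role of the whitening that TDICA implicitly performs. The filter coefficient 0.9, the sample count, and all names are hypothetical choices for illustration, not values from the source.

```python
import random


def lag1_correlation(x):
    """Normalized autocorrelation of a sequence at lag 1."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    num = sum(c[t] * c[t - 1] for t in range(1, len(c)))
    den = sum(v * v for v in c)
    return num / den


random.seed(0)
white = [random.gauss(0.0, 1.0) for _ in range(5000)]   # stands in for s'(t)

# Color the white signal with a one-pole filter e(z) = 1 / (1 - 0.9 z^-1).
colored = []
prev = 0.0
for w in white:
    prev = w + 0.9 * prev
    colored.append(prev)                                 # stands in for s(t)

# A whitening filter d(z) = 1 - 0.9 z^-1 undoes the coloring.
whitened = [colored[0]] + [colored[t] - 0.9 * colored[t - 1]
                           for t in range(1, len(colored))]

r_colored = lag1_correlation(colored)
r_whitened = lag1_correlation(whitened)
```

The colored signal is strongly correlated at lag 1, while the whitened output is close to uncorrelated and matches the white excitation rather than the colored signal, which is the distortion the text describes.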

[3] Minimal distortion principle
As a method of resolving whitening distortion in TDICA, the method based on the Minimal Distortion Principle (hereinafter "MDP") described in Non-Patent Document 3 is known. The MDP and its application to TDICA are outlined below.

First, consider the mixing process from s_i'(t) to x_j(t) and the separation process from x_j(t) to y_i(t) as the block diagram of FIG. 13. The separation filter matrix obtained by TDICA is expressed in z-transform form as in Equation (14).

Even when TDICA is used to separate the sources s_i(t), the separation filter matrix W(z) does not become the inverse A⁻¹(z) of the mixing matrix. Instead, as in Equation (15), it takes the form of A⁻¹(z) multiplied by a diagonal matrix D(z) that results from whitening.

Here, D(z) is a diagonal matrix whose diagonal elements are the filters d_i(z) (i = 1, …, R₁). The source s(t), in turn, can be regarded as the white signal s'(t) modulated by a diagonal matrix E(z) whose diagonal elements are the filters e_i(z) (i = 1, …, R₁). TDICA can therefore also be interpreted as estimating E⁻¹(z) as D(z).

D(z) can be cancelled by applying the operation of Equation (16) to the separation matrix W(z).

Therefore, by taking the left-hand side of Equation (16) as the new W(z), the arbitrariness of D(z) can be removed from W(z).

Equation (17) is equivalent to setting D(z) = diag A(z) in Equation (16). The separated signal y_i(t) then becomes a_ii(z) s_i(t). The scaling of y_i(t) is thus determined by the transfer function, which is reasonable, and the distortion of the separated signal is minimized.

It has been shown that D(z) can be made equal to diag A(z) by evaluating the distortion as E[‖y(t) − x(t)‖²] and finding the W(z) that minimizes this evaluation function (see Non-Patent Document 3). This is the minimal distortion principle (MDP).
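The cancellation of D can be illustrated in the simplest setting. Equation (16) is not reproduced in this text; the sketch below assumes the rescaling W ← diag(W⁻¹) W, the form usually associated with the MDP, and restricts it to an instantaneous 2×2 case with constant real matrices. All names and values are hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def diag_part(m):
    """Keep only the diagonal of a 2x2 matrix."""
    return [[m[0][0], 0.0], [0.0, m[1][1]]]


def mdp_rescale(w):
    """Return diag(W^-1) W, which fixes the row scaling of W."""
    return matmul(diag_part(inverse(w)), w)


A = [[1.0, 0.6], [0.4, 2.0]]                  # hypothetical mixing matrix
target = matmul(diag_part(A), inverse(A))     # diag(A) A^-1

results = []
for d1, d2 in [(1.0, 1.0), (3.0, -0.5), (0.1, 7.0)]:
    W = matmul([[d1, 0.0], [0.0, d2]], inverse(A))   # W = D A^-1 with arbitrary D
    results.append(mdp_rescale(W))
```

Whatever diagonal D is chosen, the rescaled matrix equals diag(A) A⁻¹, so each output becomes a_ii s_i, in line with the statement that the scaling is determined by the transfer function.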

Next, the MDP-based TDICA algorithm is described. To minimize E[‖y(t) − x(t)‖²] in accordance with the MDP, the natural gradient of E[‖y(t) − x(t)‖²] is taken. This natural gradient is given by Equation (18), and its on-line form is given by Equation (19).

Meanwhile, the TDICA update rule for the separation matrix (Equation (6)) can be split into an off-diagonal component ΔW_off(τ) and a diagonal component ΔW_diag(τ) and written as Equations (20) to (22).

Here, ξ denotes the ratio of ΔW_diag(τ) to ΔW_off(τ); ξ = 1 in Equation (6). As described above, the off-diagonal component does not contribute to whitening, so it is used as is. Written in on-line form, ΔW_off(τ) is given by Equation (23).

The diagonal component ΔW_diag(τ), on the other hand, contributes to the scaling adjustment and is also the cause of whitening, so it is replaced with the natural gradient derived from the MDP. As a result, the on-line form of the separation-matrix update rule in the MDP-based TDICA method is given by Equation (24).

Here, ψ(t) = y(t) − x(t) and γ = 2ξ. γ denotes the update ratio of the diagonal terms to the off-diagonal terms.

[4] FDICA
Next, FDICA is outlined briefly. In FDICA, the separation matrix W(ω_n) = (w_ij(ω_n)) is determined so that the components of the separated signal Y(ω_n, k) = [y₁(ω_n, k), …, y_R1(ω_n, k)]^T (where x^T denotes the transpose of x) become independent. The update rule for the separation matrix W(ω_n) is obtained in the same way as for TDICA and is as follows.

Here, I is the identity matrix defined in the same way as for Equation (6), η is the search step size, and φ is a nonlinear function defined in the same way as in Equations (8) and (9).
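The update equation itself is not reproduced in this text. For orientation only, a natural-gradient form commonly used for per-frequency ICA with square W(ω_n) is shown below; whether it matches the omitted equation term by term is an assumption. Here ⟨·⟩_k denotes averaging over frames and ^H the conjugate transpose, both notational assumptions.

```latex
\Delta W(\omega_n) = \eta \left[ I - \left\langle \varphi\bigl(Y(\omega_n,k)\bigr)\, Y^{H}(\omega_n,k) \right\rangle_{k} \right] W(\omega_n),
\qquad
W(\omega_n) \leftarrow W(\omega_n) + \Delta W(\omega_n)
```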

In FDICA, permutation and scaling are major problems. As with TDICA, the per-frequency time series Y(ω_n, k) = [y₁(ω_n, k), …, y_R1(ω_n, k)]^T of the separated signals obtained by FDICA is arbitrary in the order of its elements and in their scaling (magnitude and phase), because the independence among the elements is preserved even if their order is swapped or their scaling changes. From Equation (4), these ambiguities in the separated signal Y(ω_n, k) correspond to ambiguities in the separation matrix W(ω_n) = (w_ij(ω_n)). It is therefore necessary to arrange the rows of W(ω_n) so that each row corresponds to the same signal across frequencies, and to adjust the scaling of each row appropriately. The former is the permutation problem and the latter is the scaling problem.

FIG. 14 shows an example with two sound sources and two sound receivers. In FDICA, the observed signals x₁(t) and x₂(t) are Fourier transformed frame by frame, each frame having a time width N, and decomposed into N subbands {x₁(ω₀, k), …, x₁(ω_N-1, k)} and {x₂(ω₀, k), …, x₂(ω_N-1, k)}. Instantaneous-mixture FDICA is then executed for each subband to compute N pairs of separated signals {y₁(ω_n, k), y₂(ω_n, k) | n = 0, …, N−1}. These are inverse Fourier transformed and output as the time-domain separated signals y₁(t) and y₂(t). In the example of FIG. 14, the components of y₁(ω_n, k) and y₂(ω_n, k) are swapped between ω_n = ω₀ and ω_n = ω₁ because of permutation. When such a permutation occurs, the subband components are mixed again when the time-domain separated signals y₁(t) and y₂(t) are reconstructed by the inverse Fourier transform, and distortion results.
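A minimal sketch of this re-mixing is given below for a single frame of length 4. The two frames, the choice of which bins are swapped, and the helper names are hypothetical; the point is only that swapping outputs in some bins and then inverting the transform puts energy from both signals into one output.

```python
import cmath


def dft(seq, inverse=False):
    """Naive DFT; with inverse=True, the inverse DFT."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(seq[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n)
               for t in range(n)) for f in range(n)]
    return [v / n for v in out] if inverse else out


y1 = [1.0, 0.0, -1.0, 0.0]      # hypothetical separated signal 1 (one frame)
y2 = [0.5, 0.5, 0.5, 0.5]       # hypothetical separated signal 2 (one frame)
Y1, Y2 = dft(y1), dft(y2)

# Swap the two outputs in bins 1 and 3 only, as a permutation would.
P1, P2 = list(Y1), list(Y2)
for n in (1, 3):
    P1[n], P2[n] = P2[n], P1[n]

out1 = [v.real for v in dft(P1, inverse=True)]
out2 = [v.real for v in dft(P2, inverse=True)]
```

In this example the second output ends up as the sum y1 + y2 and the first output is empty, so the per-bin separation is undone by the inconsistent ordering.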

Similarly, because of the scaling ambiguity, the gains of the separated signals y₁(ω_n, k) and y₂(ω_n, k) differ from subband to subband. When the time-domain separated signals y₁(t) and y₂(t) are reconstructed by the inverse Fourier transform, the subbands are therefore weighted unnaturally and distortion results.

As a method of solving the scaling problem, the MDP-based method described in Non-Patent Document 3 above is known, for example (see Patent Document 4, paragraph [0011]).

As methods of solving the permutation problem, those of Patent Documents 1 to 3 and Non-Patent Documents 1, 2, and 5 to 7 are known, for example. The permutation resolution method described in Non-Patent Document 1 is explained here as one example. For simplicity, the case of two sound sources and two sound receivers is described.

First, suppose that the separation matrix W(ω_n) has been obtained by FDICA and that the separated signal y_i(ω_n, k) is expressed in terms of the observed signals x_j(ω_n, k) by Equation (4). The split spectrum v_i(ω_n, k) of the separated signal y_i(ω_n, k) is defined as in the following equation.

When there is no permutation, the split spectrum is given by Equation (30); when a permutation has occurred, it is given by Equation (31). In either case it is uniquely expressed as the product of a source s_i(ω_n, k) and a transfer function a_ji(ω_n) between source and receiver.

Moreover, the first and second components of v_i(ω_n, k) are both estimates of the same source s_i(ω_n, k), so the estimates of the two sources s₁(ω_n, k) and s₂(ω_n, k) can each be represented by one of the split-spectrum components v₁₁(ω_n, k) or v₂₂(ω_n, k).
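The defining equation of the split spectrum is not reproduced in this text. The sketch below assumes the form v_i = W⁻¹ [0, …, y_i, …, 0]^T, that is, the inverse separation matrix applied to the separated vector with every component except y_i set to zero. It then checks, for one frequency bin and one frame, that the result does not depend on the scaling D and that a permutation P only exchanges v₁ and v₂. All matrices, values, and helper names are hypothetical.

```python
def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Inverse of a 2x2 matrix (real or complex)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def split_spectra(w, x):
    """Return v_1, v_2: W^-1 applied to y with all but y_i zeroed (assumed form)."""
    y = matvec(w, x)
    w_inv = inverse(w)
    return matvec(w_inv, [y[0], 0.0]), matvec(w_inv, [0.0, y[1]])


A = [[1.0 + 0.5j, 0.3 - 0.2j], [0.4 + 0.1j, 0.9 - 0.6j]]   # hypothetical a_ji
s = [2.0 - 1.0j, -0.5 + 1.5j]                               # hypothetical sources
x = matvec(A, s)

D = [[3.0j, 0.0], [0.0, -0.25]]          # arbitrary scaling
P = [[0.0, 1.0], [1.0, 0.0]]             # permutation
W_plain = matmul(D, inverse(A))          # no permutation
W_swapped = matmul(P, W_plain)           # permuted rows

v1, v2 = split_spectra(W_plain, x)
u1, u2 = split_spectra(W_swapped, x)
```

Without permutation, v1 equals the first column of A times s₁ and v2 the second column times s₂; with permutation the two are exchanged, consistent with the description of Equations (30) and (31).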

This permutation resolution method exploits the fact that the entropy of the split spectrum of speech is higher than the entropy of the split spectrum of noise.

To introduce the entropy of the split spectrum, first divide the distribution range of the real part Re{v_ij(ω_n)} of the split spectrum into Γ equal bins ι_m (m = 0, …, Γ−1), and let κ_ij(ω_n, ι_m) be the frequency with which values fall in bin ι_m. The probability ρ_ij(ω_n, ι_m) obtained by normalizing κ_ij(ω_n, ι_m) is defined by the following equation.

Using this probability ρ_ij(ω_n, ι_m), the entropy of the distribution of the real part Re{v_ij(ω_n)} of the split spectrum is defined by the following equation.

Because the split spectrum v_ij(ω_n, k) is uniquely expressed as the product of a source s_i(ω_n, k) and a source-to-receiver transfer function a_ji(ω_n), C_i1 and C_i2 represent the entropy of the same source. The difference between the entropies (Equation (34)) is therefore examined. If the entropy difference ΔC(ω_n) is negative, it is judged that no permutation has occurred, and the final output is assigned as z(ω_n, k) = [v₁₁(ω_n, k), v₂₂(ω_n, k)]. Conversely, if ΔC(ω_n) is positive, it is judged that a permutation has occurred, and the final output is assigned as z(ω_n, k) = [v₂₂(ω_n, k), v₁₁(ω_n, k)].
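The histogram-based entropy described above can be sketched as follows. The normalization ρ = κ / Σκ and the entropy −Σ ρ log ρ are the usual definitions and are assumed here, since the equations are not reproduced in this text. The function name, the bin count, and the sample values are hypothetical, and the decision rule of Equation (34) is not implemented.

```python
import math


def histogram_entropy(values, n_bins):
    """Entropy of values binned into n_bins equal-width bins over their range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin; a zero range uses bin 0.
        m = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        counts[m] += 1
    total = len(values)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


flat = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]        # spread evenly over the range
peaked = [0.0, 0.1, 0.1, 0.2, 0.1, 0.0, 0.2, 7.0]      # concentrated near zero
c_flat = histogram_entropy(flat, 4)
c_peaked = histogram_entropy(peaked, 4)
```

An evenly spread set of real parts gives the maximum entropy log Γ, while a set concentrated in one bin gives a much smaller value. A difference of this kind between the entropies of two components is what the method tests.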

Although the above description assumes two sound sources and two sound receivers, it extends readily to R₁ sound sources and R₂ sound receivers. In this way, the permutation problem in FDICA can be solved.

[5] Convolutive FDICA (ConvFDICA)
FDICA has the problem that its source separation performance drops markedly as reverberation increases. When a signal is transformed into the frequency domain by the DFT, a narrow analysis window cannot contain the whole impulse response, so even if the direct sound of a source can be separated, the reflections and reverberation that follow cannot. FIG. 15 illustrates this for an analysis window of width N applied to an impulse response of length T. The part of the impulse response that extends beyond N is not captured by the DFT, and this uncaptured reverberant part cannot be separated.

It might seem that the reverberant components could be removed by widening the analysis window. For an observed signal of a given length, however, widening the analysis window coarsens the time resolution, so that the number of data points needed by ICA is no longer sufficient. Widening the analysis window therefore also lowers the separation performance of ICA. If the observed signal is long, the reverberant components can be removed by widening the analysis window, but in practice the same environment is unlikely to persist for a long time. In addition, a moving sound source has to be separated from a short observed signal.
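The trade-off can be made concrete by counting how many analysis frames a fixed-length observation yields. The sampling rate, the 3-second duration, and the half-window hop in the sketch below are hypothetical choices for illustration.

```python
def frame_count(signal_length, window, hop):
    """Number of full analysis frames available in a signal."""
    if signal_length < window:
        return 0
    return (signal_length - window) // hop + 1


fs = 8000                        # hypothetical sampling rate in Hz
length = 3 * fs                  # a 3-second observation
counts = {n: frame_count(length, n, n // 2) for n in (256, 1024, 4096)}
```

With these values, 256-sample windows give 186 frames per frequency bin, whereas 4096-sample windows give only 10, which leaves very little data from which to estimate the statistics in each bin.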

To solve this problem, Non-Patent Document 4 introduces the concept of segmented convolution: by segmenting the impulse response of the mixing process, FDICA, which had until then been modeled approximately as an instantaneous mixture in the frequency domain, is modeled rigorously as a convolutive mixture. On this basis, the convolutive FDICA (hereinafter "ConvFDICA") method has been proposed, in which the TDICA algorithm for time-domain convolutive mixtures is applied in the frequency domain.

In ConvFDICA, a narrowband separation filter matrix is defined as in Equation (35). This W(ω_n, z) is called the "narrowband separation filter matrix".

Applying the TDICA approach to this convolution model in the frequency domain yields the update rule of Equation (36). This is called ConvFDICA.

Here, k is the frame number, M is the tap length of the narrowband separation filter, and L = MN. ΔW(ω_n, p), W(ω_n, p), φ(y(ω_n, p)), and y(ω_n, p) are each expressed as in the following equations.

In ConvFDICA, with an analysis window width of N and a separation filter tap length of M at each frequency, a time-domain filter of length MN is decomposed in the frequency domain into M taps at each of N frequencies. This yields a separation filter that uses an analysis window short enough for sufficient statistics to be obtained even from a short observed signal, yet works effectively under high reverberation and converges well.
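The structure of a narrowband separation filter can be sketched for a single frequency bin. Equation (35) is not reproduced in this text; the sketch assumes W(ω_n, z) = Σ_p W(ω_n, p) z^(−p) for p = 0, …, M−1, so that the output in frame k is y(ω_n, k) = Σ_p W(ω_n, p) x(ω_n, k−p). The function name and all values are hypothetical.

```python
def narrowband_filter(taps, frames):
    """Filter one frequency bin across frames with an M-tap matrix FIR filter.

    taps[p] is the R x R matrix W(omega_n, p); frames[k] is the length-R
    vector x(omega_n, k). Returns y(omega_n, k) = sum_p taps[p] x(omega_n, k - p),
    treating frames before k = 0 as zero.
    """
    r = len(frames[0])
    out = []
    for k in range(len(frames)):
        y = [0j] * r
        for p, w in enumerate(taps):
            if k - p < 0:
                break
            x = frames[k - p]
            for i in range(r):
                y[i] += sum(w[i][j] * x[j] for j in range(r))
        out.append(y)
    return out


# One bin, two channels, M = 2 taps (hypothetical values).
taps = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 0.5j], [-0.5j, 0.0]]]
frames = [[1 + 0j, 0j], [0j, 2 + 0j], [0j, 0j]]
y = narrowband_filter(taps, frames)
```

With M = 1 this reduces to the instantaneous per-bin separation of ordinary FDICA. The additional taps let each bin draw on earlier frames, which is how the filter can cover reverberation longer than the analysis window.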

However, because ConvFDICA inherits the properties of TDICA, it suffers from the whitening distortion inherent in TDICA. In addition, because separation is performed in the frequency domain, the scaling ambiguity and the permutation problem also arise.
Patent Document 1: JP-A-2005-91560
Patent Document 2: JP-A-2005-91732
Patent Document 3: JP-A-2004-302122
Patent Document 4: JP-A-2005-79781
Non-Patent Document 1: K. Kaneda, T. Koya, H. Gotanda, "Permutation resolution method based on the entropy of the split spectrum" (in Japanese), IEICE Transactions, Vol. J87-A, No. 7, pp. 1065-1069, July 2004.
Non-Patent Document 2: H. Sawada, R. Mukai, S. Araki, S. Makino, "A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation," IEEE Trans. Speech and Audio Processing, vol. 12, no. 5, pp. 530-538, Sep. 2004.
Non-Patent Document 3: K. Matsuoka and S. Nakashima, "Minimal Distortion Principle for Blind Source Separation," Proc. ICA 2001, pp. 722-727, 2001.
Non-Patent Document 4: C. Serviere, "Separation of speech signals with segmentation of the impulse responses under reverberant conditions," ICA, Int. Workshop on Independent Component Analysis and Signal Separation, Nara, April 2003.
Non-Patent Document 5: H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, K. Shikano, "Blind source separation using subband ICA and beamforming" (in Japanese), IEICE Technical Report, EA2000-14, pp. 1-8, 2000.
Non-Patent Document 6: T. Ishibashi, K. Kaneda, T. Koya, H. Gotanda, "Estimation of sound source direction based on the phase of the split spectrum" (in Japanese), 56th Joint Conference of Electrical and Electronics Engineers in Kyushu, Proceedings (CD-ROM), 12-2A-10, September 2003 (Sojo University).
Non-Patent Document 7: H. Gotanda, K. Nobu, T. Koya, K. Kaneda, T. Ishibashi, N. Haratani, "Permutation correction and speech extraction based on split spectrum through FastICA," Proc. ICA2003, pp. 379-384, 2003.
Non-Patent Document 8: S. Araki, S. Makino, R. Aichner, T. Nishikawa, and H. Saruwatari, "Subband based blind source separation with appropriate processing for each frequency band," Proc. ICA2003, pp. 499-504, 2003.

The problems of the conventional signal separation techniques described above are summarized as follows.

(1) In TDICA, component permutation is not a serious obstacle in practice, but scaling ambiguity and whitening distortion are. These problems can be solved by TDICA to which MDP is applied.

However, applying TDICA to a BSD model that accounts for reflections and reverberation requires filters of enormous order, which makes implementation difficult. The algorithmic difficulty and the computational load are also large. TDICA is therefore unsuited to the BSD model and is not effective in highly reverberant environments.

(2) FDICA performs separation in each subband as an instantaneous mixture, so even when applied to the BSD model it can be implemented with a small arithmetic circuit, converges quickly, and yields stable solutions. On the other hand, FDICA suffers from component permutation, from scaling ambiguity, and from the inability to separate reverberation completely when the impulse response is long.

The permutation problem has been solved by, for example, the above-mentioned permutation-resolution method based on the entropy of the split spectrum. The scaling problem has been solved by the above-mentioned MDP-based TDICA method.

However, the problem that reverberation cannot be completely separated when the impulse response of the observed signals is long remains unsolved.

(3) ConvFDICA described in Non-Patent Document 4 uses a moderate analysis window width, for which sufficient statistics can be obtained even from a short observed signal, and provides a separation filter that works effectively under high reverberation and converges well. However, because ConvFDICA inherits the properties of TDICA, it suffers from the whitening distortion inherent to TDICA. The scaling ambiguity and component permutation that inevitably arise in the frequency domain also remain.

Accordingly, an object of the present invention is to provide a signal separation technique that can effectively separate observed signals even under high reverberation and that solves the problems of whitening distortion, scaling, and component permutation all together.

A first configuration of the high-reverberation-tolerant blind signal separation apparatus according to the present invention is an apparatus that applies a Fourier transform to a plurality of observed signals to calculate time-series data for each frequency, calculates a separation matrix for each frequency from the time-series data by independent component analysis (hereinafter "ICA"), and calculates separated signals using the separation matrix, characterized by comprising separation matrix updating means for updating the separation matrix by applying the separation matrix update formula of MDP-based ConvFDICA to the time-series data for each frequency.

With this configuration, the use of MDP solves the whitening distortion problem and also the scaling problem. Furthermore, because the separation matrix is updated in the frequency domain, convergence is faster than with TDICA. Because a convolutive model is used, reverberation can also be removed.

Here, the "separation matrix update formula of MDP-based ConvFDICA" refers to the update formula given by equation (85) described later.

A second configuration of the high-reverberation-tolerant blind signal separation apparatus according to the present invention is characterized in that, in the first configuration, the apparatus comprises separation matrix initialization means for calculating, on the basis of the observed signals, an approximate value of the separation matrix that is free of permutation of matrix elements between frequencies (hereinafter the "initial separation matrix"), and the separation matrix updating means updates the separation matrix using the initial separation matrix as the initial value.

With this configuration, an initial separation matrix free of permutation of matrix elements between frequencies is calculated first and used as the initial value, so that the initial value lies near the final convergence point of the separation matrix. Updating the separation matrix from this initial value lets it converge to the final convergence point without component permutation.

That is, the solution space of the separation coefficients can be regarded, schematically, as a complicated surface like the one shown in FIG. 3. The measure of independence is minimized at the correct convergence point, but solutions with component permutation also exist as local minima. Because ICA searches the separation-coefficient space by the natural gradient method, an initial value set near the correct convergence point moves in the direction of decreasing curvature and converges to the correct convergence point without jumping to a permuted one. The present invention solves the permutation problem by exploiting this property.

Here, the method of "calculating an initial separation matrix free of permutation of matrix elements between frequencies" is not particularly limited, and any of the known methods described above can be used.

A third configuration of the high-reverberation-tolerant blind signal separation apparatus according to the present invention is characterized in that, in the second configuration, the separation matrix initialization means calculates the permutation-free initial separation matrix from the observed signals by frequency-domain ICA for instantaneous mixtures.

A first configuration of the high-reverberation-tolerant blind signal separation method according to the present invention is a method that applies a Fourier transform to a plurality of observed signals to calculate time-series data for each frequency, calculates a separation matrix for each frequency from the time-series data by ICA, and calculates separated signals using the separation matrix, characterized by comprising a separation matrix updating procedure of updating the separation matrix by applying the separation matrix update formula of MDP-based ConvFDICA to the time-series data for each frequency.

A second configuration of the high-reverberation-tolerant blind signal separation method according to the present invention is characterized in that, in the first configuration, the method comprises a separation matrix initialization procedure of calculating, on the basis of the observed signals, an initial separation matrix that is an approximate value of the separation matrix free of permutation of matrix elements between frequencies, and the separation matrix is updated in the separation matrix updating procedure using the initial separation matrix as the initial value.

A third configuration of the high-reverberation-tolerant blind signal separation method according to the present invention is characterized in that, in the second configuration, the separation matrix initialization procedure calculates, from the observed signals by frequency-domain ICA for instantaneous mixtures, an initial separation matrix that is a permutation-free approximate optimum of the separation matrix.

A program according to the present invention is characterized in that, when executed by a computer, it causes the computer to function as the high-reverberation-tolerant blind signal separation apparatus of any one of the first to third configurations.

As described above, the present invention provides a signal separation technique that can effectively separate observed signals even under high reverberation and that solves the problems of whitening distortion, scaling, and component permutation all together.

The best mode for carrying out the present invention is described below with reference to the drawings. The basic concept is explained first, followed by embodiments of the present invention.

[1] Exact modeling of the mixing process
The high-reverberation-tolerant blind signal separation apparatus of the present invention performs ICA in the time-frequency domain. In time-frequency-domain ICA, the observed signal of equation (1) is split into subbands by the DFT, and ICA is applied to the subband observed signals arranged as a time series of frames. However, if equation (3), obtained by simply applying the DFT to equation (1), is transformed back to the time domain, the resulting time-domain expression is a cyclic convolution. The cyclic convolution equals the linear convolution of equation (1) when L >> T, but differs from it otherwise, and this difference causes an error.
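The difference between the two convolutions can be checked numerically. The following is a minimal sketch, assuming NumPy and hypothetical random data; the names `a`, `s`, and `circular_convolve` are illustrative and do not come from the patent. It shows that multiplying DFTs gives a cyclic convolution whose head contains the wrapped-around tail of the linear convolution.

```python
import numpy as np

def circular_convolve(a, s):
    """Cyclic convolution of two equal-length sequences via the DFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(s)))

rng = np.random.default_rng(0)
L = 8
a = rng.standard_normal(L)  # hypothetical impulse response as long as the window
s = rng.standard_normal(L)  # one window of a hypothetical source signal

linear = np.convolve(a, s)          # linear convolution, length 2L - 1
circular = circular_convolve(a, s)  # cyclic convolution, length L

# In the cyclic result, the tail of the linear result is wrapped onto its head.
wrapped = linear[:L].copy()
wrapped[: L - 1] += linear[L:]
```

When the impulse response is much shorter than the window, only the first few samples are affected by the wrap-around, which is consistent with the statement that the two convolutions agree for L >> T.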

Therefore, a method of eliminating the error caused by the cyclic convolution is described first, followed by a model that exactly reflects the mixing process of equation (1) in the frequency domain.

First, the linear convolution of equation (1) is written in matrix form as the following equation (41).

On the other hand, when equation (3) is returned to the time domain by the inverse DFT, it is expressed as the following equation (45).

Comparing equations (41) and (45) shows that the matrix representation of the linear convolution differs from that of the cyclic convolution. Therefore, let A_ji^(2L) denote A_ji^(L) with L zeros appended. Rewriting the linear-convolution representation of equation (3) with length 2L gives the following equation (49).

Rewriting the cyclic-convolution representation in the same way gives the following equation (53).

Comparing equations (49) and (53) shows that the last L elements of the cyclic-convolution result of equation (53) are identical to the last L elements of X_j^(2L)(t), whereas the first L elements differ.

The cyclic-convolution result and the linear-convolution result X_j^(2L)(t) thus differ as described above, and this difference causes an error. However, the two can be made equal by the following procedure (see Non-Patent Document 4).

First, a window h for the first L elements is defined by equation (57), and a diagonal matrix H whose diagonal components are the elements of h is defined.

Multiplying both sides of equation (53) by the diagonal matrix H gives the following equation (58), so that equation (59) holds and the linear convolution and the cyclic convolution become equal.
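The effect of the window can be illustrated with a short numerical sketch. Equation (57) is not reproduced in this text, so the sketch assumes that h is zero over the first L samples and one over the last L samples; the random data and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 4
a = rng.standard_normal(L)      # hypothetical impulse response of length L
s = rng.standard_normal(2 * L)  # 2L consecutive samples of a hypothetical source

a_2l = np.concatenate([a, np.zeros(L)])  # filter with L zeros appended
cyclic = np.real(np.fft.ifft(np.fft.fft(a_2l) * np.fft.fft(s)))
# Linear convolution, treating the source as zero before this block.
linear = np.convolve(a, s)[: 2 * L]

# Assumed window: discard the first L samples, keep the last L samples.
h = np.concatenate([np.zeros(L), np.ones(L)])
H = np.diag(h)
windowed_cyclic = H @ cyclic
windowed_linear = H @ linear
```

After windowing, only the samples that are free of wrap-around remain, so the two results coincide. This is the same idea as the overlap-save method of fast convolution.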

Next, the matrix F, which is the Fourier transform operator, is defined by equation (60).

Multiplying both sides of equation (58) by F gives equation (62).

Transforming this further finally yields equation (63), which is the frequency-domain representation obtained by applying the DFT to the time-domain representation of equation (58). Equation (63) is therefore called the exact frequency-domain model of the time-domain convolutive mixing process.

[2] Mixing model based on partitioning the impulse response
Under high reverberation the duration T of the impulse response is long, so the convolution filter length L in equation (1) must be made large relative to T in order to remove the reverberation sufficiently. As described above, however, increasing L leaves too few data in each subband to estimate the sources, which degrades convergence. Conversely, if L < T, the DFT does not accurately reflect the impulse response, and an error arises because the window cannot cover the entire impulse response.

Therefore, an appropriate window width N is assumed first, and the filter of length L is divided equally into segments of width N so that L = MN. L is set sufficiently larger than T. The linear convolution of equation (1) then becomes the following equation (69).
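The partitioning can be sketched in the time domain as follows. Equation (69) is not reproduced in this text, so this is only an illustration of the general idea, assuming NumPy and hypothetical data: a filter of length L = MN is split into M segments of width N, and the full linear convolution is recovered as the sum of the segment convolutions, each delayed by its segment offset. The function name `partitioned_convolve` is illustrative.

```python
import numpy as np

def partitioned_convolve(a, s, n):
    """Linear convolution as a sum of delayed width-n segment convolutions."""
    assert len(a) % n == 0, "filter length must be a multiple of the window width"
    m_blocks = len(a) // n
    out = np.zeros(len(a) + len(s) - 1)
    for m in range(m_blocks):
        block = a[m * n:(m + 1) * n]            # m-th segment of the impulse response
        part = np.convolve(block, s)            # length n + len(s) - 1
        out[m * n: m * n + len(part)] += part   # delay by m*n samples and accumulate
    return out

rng = np.random.default_rng(2)
N, M = 4, 3
a = rng.standard_normal(M * N)  # hypothetical impulse response, L = M*N
s = rng.standard_normal(50)     # hypothetical source signal
y_partitioned = partitioned_convolve(a, s, N)
y_direct = np.convolve(a, s)
```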

Written in matrix form, this becomes the following equation (70).

Transforming equation (70) directly to the frequency domain does not reflect the mixing process exactly, so the relationship of equation (58) is used to rewrite it as the following equation (74).

Multiplying both sides of equation (74) by the DFT matrix F to transform it to the frequency domain finally yields the exact model of equation (78).

Here, the diagonal matrix in equation (78) has equation (79) as its diagonal components, and the respective symbols are given by equations (79) to (83).

Rewriting the matrix representation of equation (78) as a mixing process for each subband ω_n (n = 0, 1, …, 2N−1) gives a convolution in the frequency domain, as in equation (84). This is called the discrete windowed discrete Fourier transform.

FIG. 1 illustrates this. In FIG. 1, the observed signal x_j(t) is transformed by a DFT of window width N in each time frame k, and each subband is convolved over frames 0 to K. This captures the entire impulse response while keeping a moderate window width N, so a sufficient number of data for estimating the sources in each subband can be obtained in a short time.
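A frame-wise convolution within each subband can be sketched as below. Equations (78) to (84) are not reproduced in this text, so this is a generic uniformly partitioned overlap-save scheme written under that assumption, not the patent's exact formulation. It uses 2N-point DFTs (subbands n = 0, …, 2N−1), a hop of N samples, and keeps the last N samples of each inverse transform. All names and data are hypothetical.

```python
import numpy as np

def subband_convolve(a, s, n):
    """Convolve s with a by summing per-subband products over past frames."""
    m_blocks = len(a) // n
    n_frames = len(s) // n
    # 2N-point spectra of the M filter segments (each zero-padded to 2N).
    A = np.fft.fft(a.reshape(m_blocks, n), n=2 * n, axis=1)       # (M, 2N)
    # Frame k holds the N samples before block k followed by block k itself.
    s_pad = np.concatenate([np.zeros(n), s[: n_frames * n]])
    frames = np.stack([s_pad[k * n: k * n + 2 * n] for k in range(n_frames)])
    S = np.fft.fft(frames, axis=1)                                 # (K, 2N)
    y = np.zeros(n_frames * n)
    for k in range(n_frames):
        X_k = np.zeros(2 * n, dtype=complex)
        for m in range(min(m_blocks, k + 1)):
            X_k += A[m] * S[k - m]       # convolution over frames in each subband
        # Keep only the last N samples, which are free of wrap-around.
        y[k * n:(k + 1) * n] = np.real(np.fft.ifft(X_k))[n:]
    return y

rng = np.random.default_rng(3)
N, M = 4, 3
a = rng.standard_normal(M * N)  # hypothetical impulse response, L = M*N
s = rng.standard_normal(40)     # hypothetical source, length a multiple of N
y_subband = subband_convolve(a, s, N)
y_reference = np.convolve(a, s)[: len(s)]
```

The window width stays at N regardless of the total filter length, which is the property the passage above relies on.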

[3] ICA algorithm of the present invention
In the present invention, the discrete windowed discrete Fourier transform described above is used to decompose the observed signal of equation (1) into subbands, and the MDP-based ConvFDICA algorithm is applied to each subband. The separation matrix update formula is then given by the following equation (85).

Here, η is the search step size and γ is the internal division ratio. ΔW(ω_n, p), W(ω_n, p), φ(y(ω_n, k)), ψ(ω_n, k), and y(ω_n, k) are given by equations (86) to (90), respectively. The number of sound sources and the number of sound receivers are both R.

φ_i(y(ω_n, k)) and ψ_i(ω_n, k) are given by equations (91) and (92), respectively.

Here, y(ω_n, k) is calculated as in equation (93).
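Equation (93) is not reproduced in this text. As a rough sketch, and assuming that the separated signal in one subband is a convolution of the tap matrices W(ω_n, p) with the observed frames, y(ω_n, k) = Σ_p W(ω_n, p) x(ω_n, k−p), the computation could look like the following. The function name `separate` and the array shapes are illustrative assumptions.

```python
import numpy as np

def separate(W, X):
    """Frame-wise convolution in one subband.

    W: (P, R, R) separation matrices for taps p = 0..P-1 (assumed layout)
    X: (K, R) observed subband signal over K frames
    Returns Y: (K, R) separated subband signal.
    """
    P = W.shape[0]
    K, R = X.shape
    Y = np.zeros((K, R), dtype=complex)
    for k in range(K):
        for p in range(min(P, k + 1)):
            Y[k] += W[p] @ X[k - p]   # contribution of tap p from frame k - p
    return Y

rng = np.random.default_rng(4)
R, K = 2, 6
X = rng.standard_normal((K, R)) + 1j * rng.standard_normal((K, R))

W_identity = np.eye(R, dtype=complex)[None, :, :]      # single tap: pass-through
W_delay = np.zeros((2, R, R), dtype=complex)
W_delay[1] = np.eye(R)                                  # pure one-frame delay
Y_same = separate(W_identity, X)
Y_delayed = separate(W_delay, X)
```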

The algorithm of the present invention applies the concept of MDP to the ConvFDICA algorithm of Serviere mentioned above (see Non-Patent Document 4). Hereinafter, this algorithm is called "MDP-based ConvFDICA (MDP-ConvFDICA)". Performing ICA with this algorithm eliminates the distortion caused by whitening, which was a problem in ConvFDICA, and also resolves the scaling ambiguity.

[4] Method of resolving component permutation
Finally, the method of resolving component permutation is described. ICA obtains the true separation filter by updating the separation filter coefficients so that the output signals become mutually independent. With the separation filter coefficients as parameters and the true separation filter as the convergence value, the parameter space is searched so as to minimize a measure of independence (the evaluation function), and the convergence value is obtained. ICA can therefore be regarded as a problem of searching for a solution in parameter space.

For an instantaneous mixture with a tap length of 1, the parameter surface is simple, as in FIG. 2(a), and convergence and stability are good. For a convolutive mixture that accounts for reverberation and reflections, however, the parameter surface becomes complicated, as in FIG. 2(b), convergence becomes less stable, and the search is more likely to fall into a local minimum other than the true separation filter.

However, as shown in FIG. 3, if a result obtained by beamforming (see Non-Patent Document 5) or by a method using a priori information (Non-Patent Documents 1, 6, and 7) is set as an appropriate initial value W⁰(z), the solution can be searched for stably and component permutation is resolved at the same time (see Non-Patent Document 8).

Therefore, in the present invention, the component permutation and scaling ambiguity are first resolved in FDICA for instantaneous mixtures, and the resulting separation matrix is adopted as the initial value of the ICA algorithm of the present invention described above. Executing the ICA algorithm of the present invention from this initial value yields the true separation filter without component permutation.

FIG. 3 schematically shows how the ICA algorithm converges when an appropriate initial value is set. As shown in FIG. 3, the parameter surface of a convolutive mixture is complicated, and solutions with component permutation also exist as local minima. Because the ICA algorithm is a search based on the natural gradient method, the search point moves in the direction in which the curvature of the parameter surface decreases. Setting the initial value appropriately therefore makes it possible to find the true, permutation-free separation filter without falling into a local minimum.
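The dependence of a gradient-based search on its initial value can be illustrated with a toy example. The sketch below is not the patent's evaluation function and uses plain gradient descent on a hypothetical one-dimensional cost with one global and one local minimum, not the natural gradient; it only shows that the starting point decides which minimum is reached.

```python
def cost(w):
    """Hypothetical 1-D cost with a global minimum near w = -1 and a local one near w = +1."""
    return (w * w - 1.0) ** 2 + 0.3 * w

def grad(w):
    """Derivative of the cost above."""
    return 4.0 * w * (w * w - 1.0) + 0.3

def descend(w0, step=0.01, n_iter=5000):
    """Plain gradient descent from the initial value w0."""
    w = w0
    for _ in range(n_iter):
        w -= step * grad(w)
    return w

w_from_good_init = descend(-0.5)  # starts in the basin of the global minimum
w_from_bad_init = descend(0.5)    # starts in the basin of the local minimum
```

In the patent's setting, the permutation-free FDICA result plays the role of the good starting point.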

FIG. 4 shows the overall configuration of the high-reverberation-tolerant blind signal separation apparatus 1 according to Embodiment 1 of the present invention. The apparatus 1 of this embodiment separates the observed signals x₁(t) and x₂(t), obtained by receiving the audio signals emitted from two sound sources with two sound receivers, and outputs the separated signals y₁(t) and y₂(t).

The high-reverberation-tolerant blind signal separation apparatus 1 comprises FFT means 2, subband separation means 3_0 to 3_(N-1), and IFFT means 4.

The FFT means 2 decomposes each of the observed signals x₁(t) and x₂(t) into subbands by applying the FFT frame by frame, and outputs the observed signals x₁(ω_n, k) and x₂(ω_n, k) (n = 0, …, N-1) of N subbands. Here, ω_n is the angular frequency, k is the frame number, and n is the subband number.

The subband separation means 3_n (n = 0, …, N-1) separates the observed signals x₁(ω_n, k) and x₂(ω_n, k) by ICA and outputs the separated signals y₁(ω_n, k) and y₂(ω_n, k).

The IFFT means 4 applies the inverse fast Fourier transform (IFFT) to each of the frequency-domain separated signals y₁(ω_n, k) and y₂(ω_n, k) (n = 0, …, N-1) and outputs the time-domain separated signals y₁(t) and y₂(t).
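The roles of the FFT means 2 and the IFFT means 4 can be sketched as a frame-wise transform pair. The sketch below is a simplification that assumes non-overlapping rectangular frames of width N, which the patent does not specify; the function names `to_subbands` and `from_subbands` are illustrative.

```python
import numpy as np

def to_subbands(x, n):
    """Frame-wise FFT: returns X[k, n_bin] for frames k of width n."""
    n_frames = len(x) // n
    frames = x[: n_frames * n].reshape(n_frames, n)
    return np.fft.fft(frames, axis=1)

def from_subbands(X):
    """Frame-wise IFFT followed by concatenation of the frames."""
    return np.real(np.fft.ifft(X, axis=1)).reshape(-1)

rng = np.random.default_rng(5)
N = 8
x1 = rng.standard_normal(10 * N)   # hypothetical observed signal x1(t)
X1 = to_subbands(x1, N)            # x1(omega_n, k), shape (K, N)
x1_back = from_subbands(X1)        # round trip with no processing in between
```

In the apparatus, the subband separation means 3_n would operate on each column of `X1` between the two transforms.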

FIG. 5 is a block diagram showing the configuration of the subband separation means 3_n (n = 0, …, N-1) of FIG. 4. The subband separation means 3_n comprises separation matrix storage means 11, separation calculation means 12, separation matrix initialization means 13, separation matrix updating means 14, separation matrix synthesis means 15, component permutation and scaling means 16, and convergence determination means 17.

The separation matrix storage means 11 stores the separation matrix W(ω_n, p). The separation calculation means 12 uses the separation matrix W(ω_n, p) stored in the separation matrix storage means 11 to calculate the separated signals y₁(ω_n, k) and y₂(ω_n, k) from the observed signals x₁(ω_n, k) and x₂(ω_n, k).

The separation matrix initialization means 13 calculates the update value ΔW(ω_n, p) of the separation matrix W(ω_n, p) by FDICA on the basis of the observed signals x₁(ω_n, k) and x₂(ω_n, k). In cooperation with the separation calculation means 12 and the separation matrix synthesis means 15, it also calculates an approximate value of the separation matrix that is free of permutation of matrix elements between frequencies (the initial separation matrix).

The separation matrix updating means 14 calculates the update value ΔW(ω_n, p) of the separation matrix W(ω_n, p) by MDP-ConvFDICA using equation (85), on the basis of the observed signals x₁(ω_n, k) and x₂(ω_n, k). In cooperation with the separation calculation means 12 and the separation matrix synthesis means 15, it also updates the separation matrix.

The separation matrix synthesis means 15 adds the update value ΔW(ω_n, p) output by the separation matrix initialization means 13 or the separation matrix updating means 14 to the separation matrix W(ω_n, p) stored in the separation matrix storage means 11 to form a new separation matrix W(ω_n, p), and updates the separation matrix W(ω_n, p) in the separation matrix storage means 11 with it.

The component permutation and scaling means 16 detects component permutation in the separated signals y₁(ω_n, k) and y₂(ω_n, k) obtained by FDICA and permutes the components of the separation matrix W(ω_n, p) stored in the separation matrix storage means 11 accordingly. It also resolves the scaling ambiguity.

The convergence determination means 17 determines whether the update of the separation matrix by the separation matrix initialization means 13 and the separation matrix updating means 14 has converged, and controls switching between the operations of the separation matrix initialization means 13 and the separation matrix updating means 14.

The operation of the high-reverberation-tolerant blind signal separation apparatus 1 of this embodiment, configured as described above, is described below.

FIG. 6 is a flowchart showing the flow of the high-reverberation-tolerant blind signal separation method according to Embodiment 1 of the present invention. In FIG. 6, steps S1 to S6 are the separation matrix initialization processing, and steps S7 to S12 are the signal separation processing by MDP-ConvFDICA.

First, in step S1, the time-domain observed signals x₁(t) and x₂(t) are input to the FFT means 2.

In step S2, the FFT means 2 applies the FFT frame by frame to each of the observed signals x₁(t) and x₂(t) to decompose them into subbands. The time-domain observed signals x₁(t) and x₂(t) are thereby converted into the observed signals x₁(ω_n, k) and x₂(ω_n, k) (n = 0, …, N-1) of N subbands. The observed signals x₁(ω_n, k) and x₂(ω_n, k) are input to the separation calculation means 12 and the separation matrix initialization means 13.

In step S3, the separation calculation means 12 of each subband reads the separation matrix W(ω_n, p) stored in the separation matrix storage means 11 and calculates the separated signal Y(ω_n, k) = [y₁(ω_n, k), y₂(ω_n, k)]^T by equation (93). In the initial state, the separation matrix W(ω_n, p) is assumed to be initialized to the identity matrix.

In step S4, the separation matrix initialization means 13 of each subband calculates the update value ΔW(ω_n, p) of the separation matrix by FDICA (equation (25)). The separation matrix synthesis means 15 reads the separation matrix W(ω_n, p) stored in the separation matrix storage means 11 and adds the update value ΔW(ω_n, p) to it to generate a new separation matrix W_new(ω_n, p). The separation matrix W(ω_n, p) stored in the separation matrix storage means 11 is then replaced with the new separation matrix W_new(ω_n, p).

In step S5, the convergence determination means 17 determines whether the magnitude |ΔW(ω_n, p)| of the update value ΔW(ω_n, p) of the separation matrix is smaller than a predetermined threshold W_th. If |ΔW(ω_n, p)| ≥ W_th, the process returns to step S1. If |ΔW(ω_n, p)| < W_th, the process proceeds to step S6.
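The iteration of steps S3 to S5 for a single subband can be sketched as follows. Equation (25) is not reproduced in this section, so a natural-gradient rule with the score function y/|y| is swapped in purely as a stand-in, and the actual FDICA update of the invention may differ. The step size, threshold, iteration cap, synthetic sources, and mixing matrix `A` are all hypothetical.

```python
import numpy as np

def fdica_update(W, X, eta=0.1):
    """Stand-in for equation (25): one natural-gradient step (assumed rule)."""
    Y = W @ X                                     # separated frames, shape (R, K)
    phi = Y / np.maximum(np.abs(Y), 1e-12)        # score function y/|y| (assumed)
    C = phi @ Y.conj().T / X.shape[1]             # sample estimate of E[phi(y) y^H]
    return eta * (np.eye(W.shape[0]) - C) @ W     # update value dW

def initialise_subband(X, w_th=1e-4, max_iter=500):
    """Iterate W <- W + dW until |dW| < W_th or max_iter is reached."""
    W = np.eye(X.shape[0], dtype=complex)         # identity initial state (step S3)
    for it in range(1, max_iter + 1):
        dW = fdica_update(W, X)                   # step S4: update value
        W = W + dW                                # step S4: W_new = W + dW
        if np.linalg.norm(dW) < w_th:             # step S5: convergence test
            break
    return W, it

# Synthetic instantaneous mixture in one subband (hypothetical data)
rng = np.random.default_rng(0)
K = 4000
S = rng.laplace(size=(2, K)) + 1j * rng.laplace(size=(2, K))
A = np.array([[1.0, 0.5], [0.4, 1.0]])
X = A @ S
W, n_iter = initialise_subband(X)
G = W @ A                                         # overall mixing-separation system
```

The Frobenius norm is used here for the magnitude |ΔW|; the source does not state which norm is intended.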

In step S6, the component permutation and scaling means 16 first resolves the component permutation. It calculates the entropies H₁₁(ω_n), H₂₂(ω_n) of the distribution of the real parts of the divided vectors of the separated signals y₁(ω_n, k), y₂(ω_n, k) of each subband by equation (33), and their difference ΔH(ω_n) by equation (34). If ΔH(ω_n) > 0, it swaps the order of the separated signals y₁(ω_n, k), y₂(ω_n, k) and permutes the rows of the separation matrix W(ω_n, p) stored in the separation matrix storage means 11. Next, the scaling ambiguity of the separation matrix is resolved in accordance with equation (16). This eliminates the component permutation in FDICA. The initialization of the separation matrix by the separation matrix initialization means 13 is thus complete, and the process moves on to signal separation by MDP-based ConvFDICA (step S7).
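The swap performed in step S6 can be illustrated with the sketch below. Equations (33) and (34) are not reproduced in this section, so the entropy difference ΔH(ω_n) is taken as a given input with hypothetical values. Only the reordering of the two separated signals and the matching row permutation of the separation matrix are shown, for a single-tap matrix per subband; with several taps the same row swap would apply to every tap.

```python
import numpy as np

def resolve_permutation(Y, W, delta_H):
    """Swap outputs and rows of W in every subband where delta_H > 0.

    Y: (2, N, K) separated signals, W: (N, 2, 2), delta_H: (N,).
    """
    swap = np.asarray(delta_H) > 0
    Y, W = Y.copy(), W.copy()
    Y[:, swap] = Y[::-1, swap]            # exchange y1 and y2 in the flagged subbands
    W[swap] = W[swap][:, ::-1, :]         # exchange the two rows of W in those subbands
    return Y, W

rng = np.random.default_rng(1)
N, K = 6, 10
X = rng.standard_normal((2, N, K)) + 1j * rng.standard_normal((2, N, K))
W = rng.standard_normal((N, 2, 2)) + 1j * rng.standard_normal((N, 2, 2))
Y = np.einsum('nij,jnk->ink', W, X)                      # Y = W X in each subband
delta_H = np.array([0.3, -0.1, 0.0, 1.2, -2.0, 0.5])     # hypothetical values
Y_fixed, W_fixed = resolve_permutation(Y, W, delta_H)
```

Swapping the rows of W together with the outputs keeps the relation Y = W X valid in every subband, so later steps can keep using the stored matrix.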

In step S7, the time-domain observation signals x₁(t), x₂(t) are input to the FFT means 2.

In step S8, the FFT means 2 performs a frame-by-frame FFT on each of the observation signals x₁(t), x₂(t) to decompose them into subbands. The time-domain observation signals x₁(t), x₂(t) are thereby converted into observation signals x₁(ω_n, k), x₂(ω_n, k) (n = 0, …, N−1) in N subbands. Each observation signal x₁(ω_n, k), x₂(ω_n, k) is input to the separation calculation means 12 and the separation matrix update means 14.

In step S9, the separation calculation means 12 of each subband reads the separation matrix W(ω_n, p) stored in the separation matrix storage means 11 and calculates the separated signal Y(ω_n, k) = [y₁(ω_n, k), y₂(ω_n, k)]^T by equation (93).

In step S10, the separation matrix update means 14 of each subband calculates the update value ΔW(ω_n, p) of the separation matrix by MDP-ConvFDICA (equation (85)). The separation matrix synthesis means 15 reads the separation matrix W(ω_n, p) stored in the separation matrix storage means 11 and adds the update value ΔW(ω_n, p) to it to generate a new separation matrix W_new(ω_n, p). The separation matrix W(ω_n, p) stored in the separation matrix storage means 11 is then replaced by the new separation matrix W_new(ω_n, p).

In step S11, the IFFT means 4 performs an IFFT on the separated signals of all subbands, {y₁(ω_n, k) | n = 0, …, N−1} and {y₂(ω_n, k) | n = 0, …, N−1}, to convert them into time-domain separated signals y₁(t), y₂(t).
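The return to the time domain in step S11 can be sketched as follows. The source states only that an IFFT is applied; combining the overlapping frames by overlap-add and dividing by the accumulated window is an assumption made here so that the round trip is exact. The names `subband_synthesize` and `hop`, and the random test signal, are hypothetical.

```python
import numpy as np

def subband_synthesize(Y, hop):
    """Y: (N, K) subband frames -> time signal via IFFT and overlap-add."""
    N, K = Y.shape
    frames = np.fft.ifft(Y, axis=0).real          # windowed time-domain frames
    window = np.hamming(N)
    out = np.zeros(hop * (K - 1) + N)
    norm = np.zeros_like(out)
    for k in range(K):
        out[k * hop : k * hop + N] += frames[:, k]
        norm[k * hop : k * hop + N] += window     # accumulated analysis window
    return out / norm                             # undo the window weighting

# Round trip on a hypothetical 1 s signal sampled at 8 kHz
N, hop = 128, 32
rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
K = 1 + (len(x) - N) // hop
analysis = np.stack(
    [x[k * hop : k * hop + N] * np.hamming(N) for k in range(K)], axis=1
)
Y = np.fft.fft(analysis, axis=0)                  # stands in for a separated subband signal
y = subband_synthesize(Y, hop)
```

Because the Hamming window is never zero, the normalization is well defined at every sample covered by at least one frame.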

Finally, in step S12, the IFFT means 4 outputs the time-domain separated signals y₁(t), y₂(t). The process then returns to step S7, and the same operations are repeated.

The present invention is not limited to the embodiment described above and may be modified as appropriate without departing from the gist of the invention. The configuration of the present invention can also be realized by a computer. In that case, the processing performed by each means is described in a program, and executing the program on a computer realizes the processing functions described above.

[Experimental results]
Finally, experimental results for the high-reverberation-resistant blind signal separation method (MDP-ConvFDICA) according to the present invention are presented. This section describes the results of an experiment on separating speech mixed under reverberation with one sound source and two receivers.

The mixed signals were created by convolving 8-second speech data selected from the newspaper article reading speech corpus (Acoustical Society of Japan, newspaper article reading speech corpus, JNAS Vols. 1-16, 1997) with impulse responses having a reverberation time of 300 ms selected from the RWCP real-environment speech and acoustic database (Real World Computing Partnership, "RWCP sound scene database in real acoustic environments," http://tosa.mri.co.jp/sounddb/index.htm). The arrangement of the sound sources and receivers for the impulse responses is as shown in FIG. 7: the distance between the receivers is 2.83 cm, and the distance from the center of the receivers to the sound source is 2 m. Two sentences read by two male and two female speakers were used as the speech data, and 12 mixed signals were created. The sampling frequency was 8 kHz.

In addition to the MDP-ConvFDICA of the present invention, separation experiments were also carried out, as comparative examples, with the natural gradient method (S. Amari, "Natural Gradient Works Efficiently in Learning", Neural Computation, Vol. 10, pp. 251-276, 1998), FastICA (A. Hyvarinen, J. Karhunen and E. Oja, "Independent Component Analysis", John Wiley & Sons, pp. 165-202, 2001, ISBN 0-471-40540), and ConvICA (Non-Patent Document 4). For the natural gradient method and FastICA, the Fourier transform was performed with frame lengths of 128, 256, 512, 1024, and 2048 points, a frame period of 1/4 of the frame length, and a Hamming window. The length of the signal used for learning was 3 seconds and 8 seconds. For the natural gradient method, the initial weights were random numbers in [−1, 1], and the maximum number of updates was 100. For FastICA, the initial weights were random numbers in [−1, 1], the maximum number of updates was 1000, and the convergence criterion was that the absolute value of the inner product of the separation weights before and after an update be 0.999999 or more.
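The FastICA stopping rule quoted above can be written as a short check. This is a sketch only: normalizing both weight vectors to unit length before taking the inner product is an assumption (the source gives just the 0.999999 threshold), and the function name and example vectors are hypothetical.

```python
import numpy as np

def fastica_converged(w_old, w_new, tol=0.999999):
    """True if |<w_old, w_new>| >= tol for unit-normalized weight vectors."""
    w_old = np.asarray(w_old) / np.linalg.norm(w_old)
    w_new = np.asarray(w_new) / np.linalg.norm(w_new)
    return bool(abs(np.vdot(w_old, w_new)) >= tol)   # vdot conjugates its first argument

w = np.array([0.6 + 0.2j, -0.3 + 0.7j])
same_direction = fastica_converged(w, w * np.exp(0.5j))                    # phase change only
orthogonal = fastica_converged(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Taking the absolute value makes the test insensitive to a sign or phase change of the weight vector, which does not alter the separated component apart from scaling.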

For ConvFDICA and the MDP-ConvFDICA of the present invention, the Fourier transform was performed with a frame length of 128, a frame period of 1/4 of the frame length, and a Hamming window. The tap length of the separation matrix was varied over 4, 8, and 16 for each frame length. The length of the signal used for learning was 3 seconds. For ConvFDICA, the numbers of updates for tap lengths 4, 8, and 16 were 1000, 2000, and 4000, and the search step sizes were 0.007, 0.002, and 0.002, respectively.

For the MDP-ConvFDICA of the present invention, the maximum numbers of updates for tap lengths 4, 8, and 16 were 1000, 2000, and 4000, the search step sizes η were 0.02, 0.006, and 0.002, respectively, and the internal division ratio γ was 0.1.

ConvFDICA and MDP-ConvFDICA were accelerated by processing the data in blocks and computing with the FFT. The block size was set equal to the tap length. The final separation matrix was taken to be the one at the update count that gave the best separation performance.

[1] Comparison of separation performance
In this experiment, the Noise Reduction Rate (NRR; ≡ output SNR [dB] − input SNR [dB]) is used as the measure of separation performance (see Non-Patent Document 5). Taking the SNR to be the ratio of the target speech to the interfering speech (noise), NRR is defined as in equation (91).

Here, OSNR_i and ISNR_i are the output SNR and the input SNR, respectively, and i ≠ j. g_ij is the element in row i and column j of the product G(z) = W(z)A(z) of the separation matrix W(z) and the mixing matrix A(z). FIG. 8 and FIG. 9 show the separation performance of each method. In FIG. 8, the horizontal axis is the tap length and the vertical axis is the NRR. In FIG. 9, the horizontal axis is the window width and the vertical axis is the NRR. These figures show that the NRR increases in the order FastICA, natural gradient method (NG), MDP-ConvFDICA, ConvFDICA. FastICA and the natural gradient method (NG) use an instantaneous mixing model, which corresponds to setting the tap length of the narrowband separation filter to M = 1, so no whitening distortion occurs. MDP-ConvFDICA is designed, as described so far, so that whitening distortion does not occur. In ConvFDICA, however, whitening distortion does occur, as pointed out repeatedly above. NRR has conventionally been used as a measure of separation ability, but it has the drawback that it also increases as whitening distortion increases. For this reason, SDR (Signal Distortion Ratio) has been proposed as a measure that evaluates separation ability with the influence of whitening distortion taken into account.
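As a rough numerical illustration of the NRR measure, the sketch below computes output SNR minus input SNR in decibels from separately available target and interference components. It follows only the verbal definition given above; equation (91), which is expressed through the elements g_ij, is not reproduced here, and the function names and test signals are hypothetical.

```python
import numpy as np

def snr_db(target, interference):
    """Power ratio of target to interference, in dB."""
    p_target = np.sum(np.abs(target) ** 2)
    p_interf = np.sum(np.abs(interference) ** 2)
    return 10.0 * np.log10(p_target / p_interf)

def nrr_db(out_target, out_interf, in_target, in_interf):
    """NRR = output SNR [dB] - input SNR [dB]."""
    return snr_db(out_target, out_interf) - snr_db(in_target, in_interf)

# Hypothetical case: equal powers at the input, interference amplitude
# reduced to one tenth at the output
t = np.ones(100)
nrr = nrr_db(out_target=t, out_interf=0.1 * t, in_target=t, in_interf=t)
```

In this case the input SNR is 0 dB and the output SNR is about 20 dB, so the NRR is about 20 dB. A filter that also distorts the target would not be penalized by this measure, which is the drawback noted in the text.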

[2] Comparison of the sound quality of the separated signals
In this experiment, SDR (Signal Distortion Ratio) is used as the measure of whitening distortion in the separated signals (T. Takatani, T. Nishikawa, H. Saruwatari, K. Shikano, "High-fidelity blind source separation using a SIMO model," IEICE Technical Report, EA2002-108, pp. 19-24, 2003). SDR is defined as in equation (92).

SDR evaluates the sound quality of the separated signals: the higher the value, the less distortion the separation has introduced. FIG. 10 and FIG. 11 show the SDR of the separated signals. In FIG. 10, the horizontal axis is the tap length and the vertical axis is the SDR. In FIG. 11, the horizontal axis is the window width and the vertical axis is the SDR. These figures show that FastICA and the natural gradient method (NG) maintain high values and that the MDP-ConvFDICA of the present invention shows little degradation, whereas ConvFDICA shows low values, indicating degraded sound quality. MDP-ConvFDICA thus has an SDR about 6 dB higher than ConvFDICA and reproduces the original sound faithfully.

As described above, the high-reverberation-resistant blind signal separation apparatus and method according to the present invention provide the following effects.

(1) By using a separation model in which the mixing process is rigorously formulated as a convolution model in the frequency domain, signals can be separated well even under high reverberation.
(2) The distortion caused by whitening, which was a problem in TDICA and ConvFDICA, is eliminated.
(3) By using the initial value setting method of the present invention, an optimal solution free of component permutation is obtained. This solves the component permutation problem that has been an issue until now.
(4) The component permutation and scaling problems left unsolved in ConvICA are resolved.

In addition, the experiments confirmed the following points regarding the effectiveness of the high-reverberation-resistant blind signal separation method according to the present invention under high reverberation.

(1) Because the impulse response is divided among the frequency bins, the filter length in each frequency bin can be short even when the reverberation is long. This mitigates the drawback of TDICA and MDP-based TDICA, which fail to converge when the reverberation time is long, that is, when a long filter length is required.
(2) Because the impulse response is divided among the frequency bins, the entire reverberation can be covered even with a short window width. This mitigates the drawback of FDICA that sufficient statistics cannot be obtained when the window width is increased.
(3) Applying MDP to each frequency bin avoids the influence of distortion due to whitening. This reduces the whitening distortion seen in ConvFDICA.
(4) It was confirmed that the signals separated by the MDP-ConvFDICA of the present invention show no degradation from whitening distortion, reproduce the original sound almost faithfully, and improve sound quality by 6 dB over ConvFDICA.

FIG. 1 is a diagram showing the time-windowed discrete Fourier transform.
FIG. 2 is a schematic diagram showing the state of the parameter surface for instantaneous mixing and for convolutive mixing.
FIG. 3 is a diagram explaining the convergence of the solution when an appropriate initial value is set.
FIG. 4 is a diagram showing the overall configuration of the high-reverberation-resistant blind signal separation apparatus 1 according to Embodiment 1 of the present invention.
FIG. 5 is a block diagram showing the configuration of the subband separation means 3_n (n = 0, …, N−1) of FIG. 4.
FIG. 6 is a flowchart showing the flow of the high-reverberation-resistant blind signal separation method according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing the arrangement of the receivers and the sound source in the experiment.
FIG. 8 is a diagram showing the NRR measurement results for FDICA.
FIG. 9 is a diagram showing the NRR measurement results for ConvFDICA and MDP-ConvFDICA.
FIG. 10 is a diagram showing the SDR measurement results for FDICA.
FIG. 11 is a diagram showing the SDR measurement results for ConvFDICA and MDP-ConvFDICA.
FIG. 12 is a diagram showing the configuration of BSD for R₁ = 2, R₂ = 2.
FIG. 13 is a diagram showing the mixing process and the separation process as seen from TDICA.
FIG. 14 is a diagram explaining the component permutation problem.
FIG. 15 is a diagram showing the relationship between the analysis window width of the Fourier transform and the impulse response.

Explanation of symbols

1 High-reverberation-resistant blind signal separation apparatus
2 FFT means
3_0, 3_1, …, 3_(N−1) Subband separation means
4 IFFT means
11 Separation matrix storage means
12 Separation calculation means
13 Separation matrix initialization means
14 Separation matrix update means
15 Separation matrix synthesis means
16 Component permutation and scaling means
17 Convergence determination means

Claims

1. A high-reverberation-resistant blind signal separation apparatus that, for time-series data {x_i(ω_n, k)} of observed signals (where ω_n (n = 0, …, N−1) is the normalized frequency, N is the window width, k (= 0, 1, …) is the frame number, i (= 1, …, R) is the receiver number, and R is the number of receivers), calculates a separation matrix for each frequency by independent component analysis (hereinafter "ICA") and calculates separated signals using the separation matrix, the apparatus comprising:
a separation matrix initialization means for calculating, based on the observed signals, a separation matrix W(ω_n, p) (where ω_n is the normalized frequency and p is the filter tap number) that is free of component permutation of its matrix elements between frequencies (hereinafter the "initial separation matrix"); and
a separation matrix update means for updating the separation matrix W(ω_n, p) for the time-series data of each frequency, with the initial separation matrix as the initial value, in accordance with the update formula for the separation matrix W(ω_n, p) of the convolutive frequency-domain ICA based on the minimal distortion principle.

2. The high-reverberation-resistant blind signal separation apparatus according to claim 1, wherein the separation matrix initialization means calculates, based on the observed signals, the initial separation matrix free of component permutation by frequency-domain ICA for instantaneous mixing.

3. A high-reverberation-resistant blind signal separation method that, for time-series data {x_i(ω_n, k)} of observed signals (where ω_n (n = 0, …, N−1) is the normalized frequency, N is the window width, k (= 0, 1, …) is the frame number, i (= 1, …, R) is the receiver number, and R is the number of receivers), calculates a separation matrix for each frequency by independent component analysis (hereinafter "ICA") and calculates separated signals using the separation matrix, the method comprising:
a separation matrix initialization procedure for calculating, based on the observed signals, a separation matrix W(ω_n, p) (where ω_n is the normalized frequency and p is the filter tap number) that is free of component permutation of its matrix elements between frequencies (hereinafter the "initial separation matrix"); and
a separation matrix update procedure for updating the separation matrix W(ω_n, p) for the time-series data of each frequency, with the initial separation matrix as the initial value, in accordance with the update formula for the separation matrix W(ω_n, p) of the convolutive frequency-domain ICA based on the minimal distortion principle.

4. The high-reverberation-resistant blind signal separation method according to claim 3, wherein, in the separation matrix initialization procedure, the initial separation matrix free of component permutation is calculated, based on the observed signals, by frequency-domain ICA for instantaneous mixing.

5. A program that, when executed by a computer, causes the computer to function as the high-reverberation-resistant blind signal separation apparatus according to claim 1 or 2.