JP7639382B2

JP7639382B2 - Audio signal enhancement device, method and program

Info

Publication number: JP7639382B2
Application number: JP2021020858A
Authority: JP
Inventors: 智広中谷; 林太郎池下; 慶介木下; 章子荒木; 哲也上田
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2021-02-12
Filing date: 2021-02-12
Publication date: 2025-03-05
Anticipated expiration: 2041-02-12
Also published as: JP2022123507A

Description

Article 30, paragraph 2 of the Patent Act applies. (1) Date posted on the website: October 18, 2020. (2) Website address: Interspeech 2020 (international conference) website: http://www.interspeech2020.org/ http://www.interspeech2020.org/index.php?m=content&c=index&a=show&catid=244&id=325 http://www.interspeech2020.org/uploadfile/pdf/Mon-1-2-9.pdf

Article 30, paragraph 2 of the Patent Act applies. (1) Dates held: October 25 to October 29, 2020 (date of disclosure: October 26, 2020). (2) Name of the meeting: Interspeech 2020 (international conference), held online.

This invention relates to an audio signal enhancement technique that, given an acoustic signal in which multiple sounds and their reverberation are mixed and picked up by multiple microphones, suppresses the reverberation and separates the mixture into individual sounds without prior information about any of the component sounds.

Online dereverberation methods have been devised that suppress the reverberation of all component sounds by online processing when no prior information about the component sounds is available (see, for example, Non-Patent Document 1).

Online sound source separation methods have also been devised that separate a mixture containing no reverberation into individual sounds (see, for example, Non-Patent Document 2).

Therefore, by connecting these as a dereverberation step followed by a sound source separation step, as shown in FIG. 6, it has been possible to construct an audio signal enhancement method that suppresses the reverberation of all component sounds and separates them into individual sounds by online processing.

Non-Patent Document 1: J. Caroselli, et al., "Adaptive multichannel dereverberation for automatic speech recognition," in Proc. Interspeech, 2017, pp. 3877-3881.
Non-Patent Document 2: T. Taniguchi, et al., "An auxiliary-function approach to online independent vector analysis for real-time blind source separation," in Proc. HSCMA, 2014, pp. 107-111.

However, in the conventional method the dereverberation step is processed independently of the subsequent sound source separation step, so the processing as a whole is not optimal for performing dereverberation and sound source separation together.

An object of this invention is to provide an audio signal enhancement device, method, and program that perform processing that is optimal as a whole.

An audio signal enhancement device according to one aspect of this invention includes: a dereverberation unit that, at each time t, receives the observed signal vector at time t and the inverse of the spatio-temporal covariance matrix obtained at time t-1, and generates a dereverberated signal vector corresponding to the observed signal vector, either for each sound source n or common to all sound sources; a sound source separation unit that, at each time t, uses the generated dereverberated signal vector (for sound source n or common to all sound sources) to obtain the enhanced sound of sound source n and the power of sound source n; and a spatio-temporal parameter update unit that, at each time t, receives the power of sound source n and the observed signal vector and obtains the inverse of the spatio-temporal covariance matrix corresponding to sound source n.

In an audio signal enhancement device according to one aspect of this invention, t is the time frame index, f is the frequency index, N is the number of sound sources, M is the number of microphones, n=1,…,N, and m=1,…,M. The device includes: a dereverberation unit that uses the dereverberation filter G_n(f;t-1) for sound source n obtained one time step earlier and the observed signal vector X(f,t), composed of the observed signals x_m(f,t) of the microphones m, to generate a dereverberated signal vector Y_n(f,t) for the enhanced sound of sound source n, composed of the dereverberated signals y_n,m(f,t) corresponding to the observed signals x_m(f,t); a sound source separation unit that uses the sound source separation filter Q_n(f;t-1) for sound source n generated one time step earlier and the generated dereverberated signal vector Y_n(f,t) to obtain the enhanced sound s_n(f,t) of sound source n and the power v_n(t) of sound source n; and a spatio-temporal parameter update unit that uses the power v_n(t) of sound source n, the observed signal vector X(f,t), and the inverse R_n^-1(f;t-1) of the spatio-temporal covariance matrix for sound source n obtained one time step earlier to obtain the Kalman gain K_n(f,t) for sound source n and the inverse R_n^-1(f;t) of the spatio-temporal covariance matrix for sound source n. The dereverberation unit obtains the dereverberation filter G_n(f;t) for sound source n using the dereverberation filter G_n(f;t-1) obtained one time step earlier, the Kalman gain K_n(f,t), and the power v_n(t). The sound source separation unit obtains the spatial covariance matrix Σ_n(f,t) for sound source n using the dereverberated signal vector Y_n(f,t) and the power v_n(t), and obtains the sound source separation filter Q_n(f;t) for sound source n using the obtained Σ_n(f,t) and the sound source separation filter Q_n(f;t-1) generated one time step earlier.

In an audio signal enhancement device according to another aspect of this invention, t is the time frame index, f is the frequency index, N is the number of sound sources, M is the number of microphones, n=1,…,N, and m=1,…,M. The device includes: a dereverberation unit that uses the dereverberation filter G(f;t-1) obtained one time step earlier and the observed signal vector X(f,t), composed of the observed signals x_m(f,t) of the microphones m, to generate a dereverberated signal vector Y(f,t) composed of the dereverberated signals y_m(f,t) corresponding to the observed signals x_m(f,t); a sound source separation unit that uses the sound source separation filter Q(f;t-1) generated one time step earlier and the generated dereverberated signal vector Y(f,t) to obtain the enhanced sound vector S(f,t), composed of the enhanced sounds s_n(f,t) of the sound sources n, and the power v_n(t) of sound source n; and a spatio-temporal parameter update unit that uses the power v_n(t) of sound source n, the observed signal vector X(f,t), and the inverse R_n^-1(f;t-1) of the spatio-temporal covariance matrix for sound source n obtained one time step earlier to obtain the Kalman gain K_n(f,t) for sound source n and the inverse R_n^-1(f;t) of the spatio-temporal covariance matrix for sound source n. The dereverberation unit obtains the dereverberation filter G_n(f;t) for sound source n using the dereverberation filter G_n(f;t-1) obtained one time step earlier, the Kalman gain K_n(f,t), and the power v_n(t), and then obtains the dereverberation filter G(f;t) using the obtained filters G_n(f;t) for each sound source n and the sound source separation filter Q(f;t-1) generated one time step earlier. The sound source separation unit obtains the spatial covariance matrix Σ_n(f,t) for sound source n using the dereverberated signal vector Y(f,t) and the power v_n(t), and obtains the sound source separation filter Q_n(f;t) for sound source n using the obtained Σ_n(f,t) and the sound source separation filter Q_n(f;t-1) generated one time step earlier.

Processing that is optimal as a whole can be performed.

FIG. 1 is a diagram showing an example of the functional configuration of the audio signal enhancement device of the first embodiment.
FIG. 2 is a diagram showing an example of the processing procedure of the audio signal enhancement method.
FIG. 3 is a diagram showing an example of the functional configuration of the audio signal enhancement device of the first embodiment.
FIG. 4 is a diagram showing an example of the functional configuration of an audio signal enhancement device that is a generalization of the first and second embodiments.
FIG. 5 is a diagram showing an example of the functional configuration of a computer.
FIG. 6 is a diagram for explaining the background art.

Embodiments of the present invention are described in detail below. In the drawings, components having the same function are given the same reference numerals, and duplicate descriptions are omitted.

[First embodiment]
The audio signal enhancement device of the first embodiment performs dereverberation using a different dereverberation filter G_n(f;t) for each sound source.

As shown in FIG. 1, the audio signal enhancement device of the first embodiment includes, for example, an initialization unit 1, a dereverberation unit 2, a sound source separation unit 3, and a spatio-temporal parameter update unit 4.

The audio signal enhancement method of the first embodiment is realized, for example, by the components of the audio signal enhancement device performing the processing of steps S1 to S6 described below and shown in FIG. 2.

The symbol "^-" used in the text should properly be written directly above the character that follows it but, because of the limitations of text notation, it is written immediately before that character. In equations, the symbol is written in its proper position, directly above the character. For example, "^-X" in the text appears in equations as X with an overbar.

First, the notation is explained.

M is the number of microphones, and m (1≦m≦M) is the microphone index. M is an integer greater than or equal to 2.

N is the number of sound sources, and n (1≦n≦N) is the sound source index. N is an integer greater than or equal to 2.

The microphone index and the sound source index are written as lower-right subscripts.

t, τ (1≦t,τ≦T) are time frame indices. T is the total number of time frames and is an integer greater than or equal to 2.

f (1≦f≦F) is the frequency index. F is the frequency corresponding to the highest frequency bin.

(·)^T is the non-conjugate transpose of a matrix or vector, and (·)^H is the conjugate transpose of a matrix or vector. Here · stands for any matrix or vector.

Lowercase letters denote scalar variables. For example, the observed signal x_m(f,t) at microphone m, time t, and frequency f is a scalar variable.

Uppercase letters denote vectors or matrices. For example, X(f,t)=[x_1(f,t),x_2(f,t),…,x_M(f,t)]^T ∈ C^{M×1} is the observed signal vector over all microphones at time t and frequency f.

C^{M×N} is the set of all M×N complex matrices. X ∈ C^{M×N} means that X is an element of C^{M×N}.

^-X(f,t) is the vector of the past observed signal sequence from time t-D-L+1 to time t-D: ^-X(f,t)=[X(f,t-D)^T,X(f,t-D-1)^T,…,X(f,t-D-L+1)^T]^T ∈ C^{ML×1}. D is the prediction delay and is an integer greater than or equal to 1.

v_n(t) is the power of sound source n at time t and is a scalar.

s_n(f,t) is the enhanced sound of sound source n at time t and frequency f and is a scalar.

G_n(f;t) ∈ C^{M(L-D)×M} is the estimate at time t of the dereverberation filter for sound source n at frequency f, and G(f;t) ∈ C^{M(L-D)×M} is the corresponding estimate of the dereverberation filter common to all sound sources. L is the filter order and is an integer greater than or equal to 2.

Q(f;t)=[Q_1(f;t),Q_2(f;t),…,Q_N(f;t)]^T ∈ C^{M×N} is the separation matrix at frequency f, and Q_n(f;t) is the sound source separation filter for sound source n at frequency f.

R_n^-1(f;t) ∈ C^{M(L-D)×M(L-D)} is the inverse of the spatio-temporal covariance matrix for sound source n at frequency f and time t.

K_n(f,t) ∈ C^{M(L-D)×1} is the Kalman gain for sound source n at frequency f and time t.
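The stacked vector ^-X(f,t) defined above can be assembled from a buffer of past frames. The following is an illustrative sketch only: the function name `stack_past_frames`, the (T, M) array layout, the zero-padding before the first frame, and the 0-based frame indices are assumptions of this example and do not come from the source. It stacks the L frames from t-D back to t-D-L+1 for a single frequency bin.

```python
import numpy as np

def stack_past_frames(frames, t, delay, order):
    """Stack X(f,t-D), X(f,t-D-1), ..., X(f,t-D-L+1) into one vector.

    frames: complex array of shape (T, M), one row per time frame of a
            single frequency bin (hypothetical layout for this sketch).
    t:      current frame index (0-based here, unlike the 1-based text).
    delay:  prediction delay D (>= 1).
    order:  number of stacked frames L.
    Frames that would precede the start of the signal are taken as zeros.
    """
    n_frames, n_mics = frames.shape
    pieces = []
    for k in range(order):
        tau = t - delay - k
        if 0 <= tau < n_frames:
            pieces.append(frames[tau])
        else:
            pieces.append(np.zeros(n_mics, dtype=frames.dtype))
    return np.concatenate(pieces)  # shape (M * L,)
```

With D >= 1 the current frame X(f,t) never enters the stack, so the early part of the signal is left out of the prediction.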

Each component of the audio signal enhancement device is described below. The description treats the time t and the frequency f as given, but in practice the processing described below is performed for each frequency f at each time t. That is, the audio signal enhancement device sequentially receives acoustic signals that have been divided by frequency, for example by a short-time Fourier transform, and performs the following processing for each time t and each frequency f.

<Initialization unit 1>
For n=1,…,N, the initialization unit 1 initializes every element of the dereverberation filter G_n(f;0) to a predetermined value (for example, 0), and sets the sound source separation filter Q(f;0)=[Q_1(f;0),Q_2(f;0),…,Q_N(f;0)], the inverse R_n^-1(f;0) of the spatio-temporal covariance matrix, and the spatial covariance matrix Σ_n(f,0) of the dereverberated sound for each sound source to predetermined matrices (for example, identity matrices).

The predetermined value may be a value other than 0. The predetermined matrix may be a matrix other than the identity matrix.

The initialized dereverberation filter G_n(f;0) is output to the dereverberation unit 2 and stored there.

The initialized sound source separation filter Q(f;0) and spatial covariance matrix Σ_n(f,0) are output to the sound source separation unit 3 and stored there.

The inverse R_n^-1(f;0) of the spatio-temporal covariance matrix is output to the spatio-temporal parameter update unit 4 and stored there.

The processing of the initialization unit 1 is performed only once, at time t=0. The processing of the dereverberation unit 2, the sound source separation unit 3, and the spatio-temporal parameter update unit 4 described below is performed at each time t.
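The initialization described above can be sketched for one frequency bin as follows. The function name `initialize_state`, the dictionary layout, and the free dimension `stacked_len` (the length of ^-X(f,t)) are hypothetical choices for this example; the zero and identity values are the examples given in the text. In this sketch the columns of `Q` hold the filters Q_n.

```python
import numpy as np

def initialize_state(n_sources, n_mics, stacked_len):
    """Return the t = 0 parameters of one frequency bin."""
    return {
        # G_n(f;0): all elements set to 0, one filter per source
        "G": [np.zeros((stacked_len, n_mics), dtype=complex)
              for _ in range(n_sources)],
        # Q(f;0): identity, column n is Q_n(f;0)
        "Q": np.eye(n_mics, n_sources, dtype=complex),
        # R_n^-1(f;0): identity, one matrix per source
        "R_inv": [np.eye(stacked_len, dtype=complex)
                  for _ in range(n_sources)],
        # Sigma_n(f,0): identity, one matrix per source
        "Sigma": [np.eye(n_mics, dtype=complex)
                  for _ in range(n_sources)],
    }
```

One such state would be kept per frequency f, since every parameter above depends on f.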

<Dereverberation unit 2 (first process)>
The dereverberation unit 2 performs a first process and a second process, which is described later. The first process is described here.

The observed signal vector X(f,t), composed of the observed signals x_m(f,t) of the microphones m, is input to the dereverberation unit 2.

For n=1,…,N, the dereverberation unit 2 uses the dereverberation filter G_n(f;t-1) for sound source n obtained one time step earlier and the observed signal vector X(f,t) to generate the dereverberated signal vector Y_n(f,t) for the enhanced sound of sound source n, composed of the dereverberated signals y_n,m(f,t) corresponding to the observed signals x_m(f,t) (step S2).

That is, the dereverberation unit 2 generates the dereverberated signal vectors Y_1(f,t),…,Y_N(f,t) corresponding to the sound sources 1,…,N. Here, Y_n(f,t)=[y_n,1(f,t),y_n,2(f,t),…,y_n,M(f,t)], and for m=1,…,M, y_n,m(f,t) is the dereverberated signal, for the enhanced sound of sound source n, that corresponds to the observed signal x_m(f,t).

The generated dereverberated signal vector Y_n(f,t) is output to the sound source separation unit 3.

The dereverberation filter G_n(f;t-1) for sound source n obtained one time step earlier is stored in the dereverberation unit 2, which uses this stored filter for the processing.

The dereverberation unit 2 obtains the dereverberated signal vector Y_n(f,t) based on, for example, the following equation.
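The equation itself does not appear in this text. A minimal sketch of the step, assuming the usual linear-prediction form of dereverberation, Y_n(f,t) = X(f,t) - G_n(f;t-1)^H ^-X(f,t), whose dimensions agree with the definitions of G_n and ^-X given earlier, is shown below. The function name `dereverberate` and the argument names are hypothetical.

```python
import numpy as np

def dereverberate(x, x_bar, G_prev):
    """Assumed form: Y_n(f,t) = X(f,t) - G_n(f;t-1)^H ^-X(f,t).

    x:      observed vector X(f,t), shape (M,)
    x_bar:  stacked past observations ^-X(f,t), shape (P,)
    G_prev: dereverberation filter G_n(f;t-1), shape (P, M)
    """
    # Subtract the late reverberation predicted from past frames.
    return x - G_prev.conj().T @ x_bar
```

With the zero initialization of G_n(f;0), the first output frame equals the observation, so dereverberation only takes effect as the filter is updated.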

<Sound source separation unit 3 (first process)>
The sound source separation unit 3 performs a first process and a second process, which will be described later. Here, the first process performed by the sound source separation unit 3 will be described.

The dereverberated signal vector Y_n(f,t) generated by the dereverberation unit 2 is input to the sound source separation unit 3.

For n=1,…,N, the sound source separation unit 3 uses the sound source separation filter Q_n(f;t-1) for sound source n generated one time step earlier and the dereverberated signal vector Y_n(f,t) to obtain the enhanced sound s_n(f,t) of sound source n and the power v_n(t) of sound source n (step S3).

That is, the sound source separation unit 3 generates the enhanced sounds s_1(f,t),…,s_N(f,t) and the powers v_1(t),…,v_N(t) corresponding to the sound sources 1,…,N.

The generated enhanced sound s_n(f,t) of sound source n is output from the audio signal enhancement device. The generated power v_n(t) of sound source n is output to the spatio-temporal parameter update unit 4.

The sound source separation filter Q_n(f;t-1) for sound source n obtained one time step earlier is stored in the sound source separation unit 3, which uses this stored filter for the processing.

The sound source separation unit 3 obtains the enhanced sound s_n(f,t) of sound source n based on, for example, the following equation.

The sound source separation unit 3 also obtains the power v_n(t) of sound source n based on, for example, the following equation.
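The two equations do not appear in this text. The sketch below assumes the forms commonly used in independent vector analysis: s_n(f,t) = Q_n(f;t-1)^H Y_n(f,t), and v_n(t) taken as the average of |s_n(f,t)|^2 over the F frequency bins, which matches v_n(t) depending on t only. Both forms, the function name `enhance_sources`, and the array layouts are assumptions of this example.

```python
import numpy as np

def enhance_sources(Q_prev, Y):
    """Assumed separation and power estimation for one time frame.

    Q_prev: shape (F, M, N); Q_prev[f][:, n] is Q_n(f;t-1)
    Y:      shape (F, N, M); Y[f][n] is Y_n(f,t)
    Returns s of shape (F, N) and v of shape (N,).
    """
    # s_n(f,t) = Q_n(f;t-1)^H Y_n(f,t), summed over microphones m
    s = np.einsum("fmn,fnm->fn", Q_prev.conj(), Y)
    # v_n(t) = (1/F) * sum over f of |s_n(f,t)|^2
    v = np.mean(np.abs(s) ** 2, axis=0)
    return s, v
```

In practice a small floor on `v` would likely be needed, because v_n(t) later appears in a denominator; the source does not discuss this.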

<Spatio-temporal parameter update unit 4>
The observed signal vector X(f,t) and the power v_n(t) of sound source n generated by the sound source separation unit 3 are input to the spatio-temporal parameter update unit 4.

For n=1,…,N, the spatio-temporal parameter update unit 4 uses the power v_n(t) of sound source n, the observed signal vector X(f,t), and the inverse R_n^-1(f;t-1) of the spatio-temporal covariance matrix for sound source n obtained one time step earlier to obtain the Kalman gain K_n(f,t) for sound source n and the inverse R_n^-1(f;t) of the spatio-temporal covariance matrix for sound source n (step S4).

That is, the spatio-temporal parameter update unit 4 obtains the Kalman gains K_1(f,t),…,K_N(f,t) and the inverses R_1^-1(f;t),…,R_N^-1(f;t) of the spatio-temporal covariance matrices corresponding to the sound sources 1,…,N.

The obtained Kalman gain K_n(f,t) and inverse R_n^-1(f;t) of the spatio-temporal covariance matrix for sound source n are output to the dereverberation unit 2.

The inverse R_n^-1(f;t-1) of the spatio-temporal covariance matrix for sound source n obtained one time step earlier is stored in the spatio-temporal parameter update unit 4, which uses this stored inverse for the processing.

The spatio-temporal parameter update unit 4 obtains the Kalman gain K_n(f,t) for sound source n based on, for example, the following equation, where β is a forgetting factor with 0<β<1.

The spatio-temporal parameter update unit 4 also obtains the inverse R_n^-1(f;t) of the spatio-temporal covariance matrix for sound source n based on, for example, the following equation.
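Neither equation appears in this text. The sketch below assumes the standard recursive-least-squares form with power weighting: K_n = R_n^-1(f;t-1) ^-X / (β v_n + ^-X^H R_n^-1(f;t-1) ^-X) and R_n^-1(f;t) = (R_n^-1(f;t-1) - K_n ^-X^H R_n^-1(f;t-1)) / β. Under this assumption the update is the matrix-inversion-lemma form of R_n(f;t) = β R_n(f;t-1) + ^-X ^-X^H / v_n. The function name `update_spatiotemporal` and the default value of `beta` are hypothetical.

```python
import numpy as np

def update_spatiotemporal(R_inv_prev, x_bar, v, beta=0.99):
    """One assumed RLS step for source n at one frequency bin.

    R_inv_prev: R_n^-1(f;t-1), shape (P, P)
    x_bar:      stacked past observations ^-X(f,t), shape (P,)
    v:          source power v_n(t), positive scalar
    beta:       forgetting factor, 0 < beta < 1
    Returns the Kalman gain K_n(f,t) and R_n^-1(f;t).
    """
    Rx = R_inv_prev @ x_bar
    # np.vdot conjugates its first argument: x_bar^H R^-1 x_bar
    denom = beta * v + np.vdot(x_bar, Rx)
    K = Rx / denom
    R_inv = (R_inv_prev - np.outer(K, x_bar.conj()) @ R_inv_prev) / beta
    return K, R_inv
```

No explicit matrix inversion is needed, which is what makes this kind of update suitable for frame-by-frame processing.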

<Dereverberation Unit 2 (Second Processing)>
The second process of the dereverberation unit 2 will be described.

The Kalman gain K_n(f,t) and the inverse R_n^-1(f;t) of the spatio-temporal covariance matrix for sound source n obtained by the spatio-temporal parameter update unit 4, and the power v_n(t) of sound source n generated by the sound source separation unit 3, are input to the dereverberation unit 2.

For n=1,…,N, the dereverberation unit 2 uses the dereverberation filter G_n(f;t-1) for sound source n obtained one time step earlier and the Kalman gain K_n(f,t) for sound source n to obtain the dereverberation filter G_n(f;t) for sound source n (step S5).

That is, the dereverberation unit 2 obtains the dereverberation filters G_1(f;t),…,G_N(f;t) corresponding to the sound sources 1,…,N.

The obtained dereverberation filter G_n(f;t) for sound source n is stored in the dereverberation unit 2. In the processing at the next time t+1, it is used as the dereverberation filter for sound source n obtained one time step earlier.

The dereverberation unit 2 obtains the dereverberation filter G_n(f;t) for sound source n based on, for example, the following equation.
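The equation does not appear in this text. A minimal sketch, assuming the standard recursive-least-squares filter update G_n(f;t) = G_n(f;t-1) + K_n(f,t) Y_n(f,t)^H, in which the dereverberated vector Y_n(f,t) from the first process serves as the prediction error, is shown below. That reuse of Y_n(f,t) and the function name `update_dereverb_filter` are assumptions of this example.

```python
import numpy as np

def update_dereverb_filter(G_prev, K, y):
    """Assumed form: G_n(f;t) = G_n(f;t-1) + K_n(f,t) Y_n(f,t)^H.

    G_prev: G_n(f;t-1), shape (P, M)
    K:      Kalman gain K_n(f,t), shape (P,)
    y:      dereverberated vector Y_n(f,t), shape (M,)
    """
    # Rank-one correction along the gain direction.
    return G_prev + np.outer(K, y.conj())
```

The returned filter would be stored and used as G_n(f;t-1) when frame t+1 arrives.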

<Sound source separation unit 3 (second processing)>
The second process of the sound source separation unit 3 will be described.

For n=1,…,N, the sound source separation unit 3 obtains the spatial covariance matrix Σ_n(f,t) for sound source n using the dereverberated signal vector Y_n(f,t) and the power v_n(t) of sound source n, and obtains the sound source separation filter Q_n(f;t) for sound source n using the obtained Σ_n(f,t) and the sound source separation filter Q_n(f;t-1) for sound source n generated one time step earlier (step S6).

That is, the sound source separation unit 3 obtains the sound source separation filters Q_1(f;t),…,Q_N(f;t) corresponding to the sound sources 1,…,N.

The obtained sound source separation filter Q_n(f;t) for sound source n is stored in the sound source separation unit 3. In the processing at the next time t+1, it is used as the sound source separation filter for sound source n obtained one time step earlier.

The sound source separation unit 3 obtains the spatial covariance matrix Σ_n(f,t) based on, for example, the following equation, where L_b is the block length and is a positive integer.

The sound source separation unit 3 updates the sound source separation filter Q_n(f;t) based on, for example, formulas (3) and (4) below. More specifically, Q(f;t-1) obtained one time step earlier is first copied to Q(f;t) by formula (3'); then, for each n in turn, Q_n(f;t) is updated by formula (3), and the result is substituted into the right-hand side of formula (4) to compute the Q_n(f;t) defined by formula (4).

Here, for n=1,…,N, e_n is the N-dimensional vector whose n-th element is 1 and whose other elements are 0.
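The formulas referred to above do not appear in this text. The sketch below is therefore a guess at their general shape, not a reproduction. It assumes (a) a recursive covariance estimate that averages the power-normalized outer products Y_n Y_n^H / v_n over the last L_b frames, and (b) the iterative-projection update known from auxiliary-function-based independent vector analysis: Q_n = (Q^H Σ_n)^-1 e_n followed by normalization by sqrt(Q_n^H Σ_n Q_n). The function names, the smoothing weight `alpha`, and the column layout of `Q` (with M equal to N) are hypothetical.

```python
import numpy as np

def update_spatial_covariance(Sigma_prev, Y_block, v_block, alpha=0.99):
    """Assumed recursive estimate of Sigma_n(f,t).

    Y_block: the last L_b vectors Y_n(f,tau), each of shape (M,)
    v_block: the matching powers v_n(tau)
    alpha:   hypothetical forgetting weight, 0 < alpha < 1
    """
    L_b = len(Y_block)
    inst = sum(np.outer(y, y.conj()) / v
               for y, v in zip(Y_block, v_block)) / L_b
    return alpha * Sigma_prev + (1.0 - alpha) * inst

def update_separation_filter(Q, Sigma_n, n):
    """Assumed iterative-projection update of column n of Q (M == N)."""
    Q = Q.copy()                        # start from the previous Q
    e_n = np.zeros(Q.shape[1], dtype=complex)
    e_n[n] = 1.0
    # Projection step: q = (Q^H Sigma_n)^-1 e_n
    q = np.linalg.solve(Q.conj().T @ Sigma_n, e_n)
    # Normalization step: scale so that q^H Sigma_n q = 1
    q = q / np.sqrt(np.real(np.vdot(q, Sigma_n @ q)))
    Q[:, n] = q
    return Q
```

Calling `update_separation_filter` for n = 0, …, N-1 in turn, each time passing the `Q` returned by the previous call, mirrors the sequential update over all n described in the text.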

By feeding the result of sound source separation back into the processing of the spatio-temporal parameter update unit 4 in this way, processing that is optimal as a whole can be performed. In addition, because the spatio-temporal parameters, namely the Kalman gain K_n(f,t) and the inverse R_n^-1(f;t) of the spatio-temporal covariance matrix, are obtained separately for each sound source n by online processing, the relationships between sound sources no longer need to be considered, so the matrices required for the optimization can be smaller than in the background art. This reduces the overall computational cost.

In the first embodiment, all parameters are optimized under a single optimization criterion in order to achieve global optimization. One example of such a criterion is given by the following equation (5).

Here, X_t={x_m(f,t')}_{f,t'≤t,n} denotes the observed signals from past times t' up to the current time t.

In other words, at each time the above processing finds, by online processing, the dereverberation filter G_n(f,t), the sound source separation filter Q_n(f,t), and the power v_n(t) of each separated sound that maximize, for example, equation (5).

Note that equation (5) is a criterion derived by the maximum likelihood method, taking into account, for example, the processing of equations (1) and (2), under the assumptions that (i) each separated sound follows a complex Gaussian distribution whose power v_n(t) varies over time, and (ii) at each time a forgetting factor β is used that emphasizes the most recent observed signals (that is, forgets older ones).
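Assumption (ii) can be illustrated with a small sketch. It assumes the common exponential form in which a sample observed at time t' receives weight β^(t-t') at time t. The exact weighting inside equation (5) is not reproduced here, and both function names are hypothetical.

```python
def recursive_weighted_sum(values, beta):
    """Online form: only the running total is kept between time steps."""
    acc = 0.0
    for x in values:
        acc = beta * acc + x      # older contributions shrink by beta each step
    return acc

def explicit_weighted_sum(values, beta):
    """Batch form: the sample at time tp gets weight beta ** (t - tp)."""
    t = len(values) - 1
    return sum(beta ** (t - tp) * x for tp, x in enumerate(values))
```

The two forms give the same result, which is what lets an online method apply such a weighting without storing past observations.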

Note that, because dereverberation and sound source separation are processed separately, modifications such as optimizing them under partially different criteria (for example, with different forgetting factors) are also possible.

[Second embodiment]
Unlike the audio signal enhancement device of the first embodiment, the audio signal enhancement device of the second embodiment uses a dereverberation filter G(f,t-1) common to all sound sources to dereverberate all sound sources simultaneously, and obtains a dereverberated signal vector Y(f,t)∈C^(M×1) common to all sound sources.

The following description focuses on the differences from the audio signal enhancement device of the first embodiment; parts that are the same as in the first embodiment are not described again.

Like the audio signal enhancement device of the first embodiment, the audio signal enhancement device of the second embodiment includes, for example, an initialization unit 1, a dereverberation unit 2, a sound source separation unit 3, and a spatio-temporal parameter update unit 4, as shown in FIG. 3.

<Initialization unit 1>
The initialization unit 1 initializes the dereverberation filter G(f;0) by setting all of its elements to a predetermined value (for example, 0). As in the first embodiment, the initialization unit 1 also initializes the sound source separation filter Q(f;0), the inverse matrix R_n^-1(f;0) of the spatio-temporal covariance matrix, and the spatial covariance matrix Σ_n(f,0) of the dereverberated sound for each sound source.

<Dereverberation unit 2 (first process)>
As a first process, the dereverberation unit 2 uses the dereverberation filter G(f;t-1) obtained one time step earlier and the observed signal vector X(f,t), composed of the observed signals x_m(f,t) of the microphones m, to generate the dereverberated signal vector Y(f,t), composed of the dereverberated signals y_m(f,t) corresponding to the observed signals x_m(f,t) (step S2).

Here, Y(f,t)=[y_1(f,t),…,y_M(f,t)]. The dereverberated signal vector Y(f,t) can be regarded as a dereverberated sound common to all sound sources.

The generated dereverberated signal vector Y(f,t) is output to the sound source separation unit 3.

The dereverberation unit 2 calculates the dereverberated signal vector Y(f,t) based on, for example, the following equation.
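The equation for this step is not reproduced in this text. As a sketch only, the following assumes the usual linear-prediction form of dereverberation, Y(f,t) = X(f,t) - G^H X̄(f,t), where X̄(f,t) stacks past observation frames starting from a prediction delay. The names `stack_past_frames`, `dereverberate`, `delay`, and `taps`, and the stacking order, are hypothetical.

```python
import numpy as np

def stack_past_frames(X_hist, t, delay, taps):
    """Stack frames t-delay, t-delay-1, ... into one vector of length M*taps.

    X_hist : (T, M) complex array of observations for one frequency bin.
    Frames before the start of the recording are treated as zeros.
    """
    M = X_hist.shape[1]
    frames = []
    for k in range(taps):
        idx = t - delay - k
        frames.append(X_hist[idx] if idx >= 0 else np.zeros(M, dtype=X_hist.dtype))
    return np.concatenate(frames)

def dereverberate(G_prev, X_hist, t, delay, taps):
    """Assumed form: Y(f,t) = X(f,t) - G(f;t-1)^H X_bar(f,t), G_prev is (M*taps, M)."""
    x_bar = stack_past_frames(X_hist, t, delay, taps)
    return X_hist[t] - G_prev.conj().T @ x_bar
```

Under this form, a filter initialized to all zeros, as the initialization unit 1 does for G(f;0), makes the first output equal to the observation itself.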

<Sound source separation unit 3 (first process)>
As a first process, for n=1,…,N, the sound source separation unit 3 uses the sound source separation filter Q(f;t-1) generated one time step earlier and the dereverberated signal vector Y(f,t) generated by the dereverberation unit 2 to obtain the enhanced sound vector S(f,t), composed of the enhanced sounds s_n(f,t) of the sound sources n, and the power v_n(t) of each sound source n (step S3).

The sound source separation unit 3 obtains the enhanced sound vector S(f,t) based on, for example, the following equation, where S(f,t)=[s_1(f,t),…,s_N(f,t)] and Q(f;t-1)=[Q_1(f;t-1),…,Q_N(f;t-1)].

The power v_n(t) of sound source n is obtained from the enhanced sound vector S(f,t) in the same way as in the first embodiment.
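A minimal sketch of this step follows. It assumes the separation is the linear operation s_n(f,t) = Q_n(f;t-1)^H Y(f,t) and that the power v_n(t) is the average of |s_n(f,t)|^2 over frequency. Neither formula is reproduced in this text, so both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def separate(Q_prev, Y):
    """Assumed form: s_n(f,t) = Q_n(f;t-1)^H Y(f,t) for every f and n.

    Q_prev : (F, M, N) array, columns Q_1..Q_N per frequency bin.
    Y      : (F, M) dereverberated signal vectors.
    Returns S with shape (F, N).
    """
    return np.einsum('fmn,fm->fn', Q_prev.conj(), Y)

def source_power(S):
    """Assumed form: v_n(t) = mean over f of |s_n(f,t)|^2."""
    return np.mean(np.abs(S) ** 2, axis=0)
```

The power is shared across frequency bins in this sketch, which matches the notation v_n(t) having no frequency index.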

The enhanced sound s_n(f,t) of each sound source n, which makes up the generated enhanced sound vector S(f,t), is output from the audio signal enhancement device. The generated power v_n(t) of sound source n is output to the spatio-temporal parameter update unit 4.

<Spatio-temporal parameter update unit 4>
The processing of the spatio-temporal parameter update unit 4 is the same as in the first embodiment. That is, for n=1,…,N, the spatio-temporal parameter update unit 4 uses the power v_n(t) of sound source n, the observed signal vector X(f,t), and the inverse matrix R_n^-1(f;t-1) of the spatio-temporal covariance matrix for sound source n obtained one time step earlier to calculate the Kalman gain K_n(f,t) for sound source n and the inverse matrix R_n^-1(f;t) of the spatio-temporal covariance matrix for sound source n (step S4).
The resulting Kalman gain K_n(f,t) and inverse matrix R_n^-1(f;t) for sound source n are output to the dereverberation unit 2.
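The update equations themselves are given with the first embodiment and are not repeated here. As an illustration of how an inverse covariance matrix can be tracked without any matrix inversion, the sketch below uses the standard weighted recursive-least-squares form, assuming the covariance recursion R_n(t) = β R_n(t-1) + x̄ x̄^H / v_n(t). This recursion, the name `x_bar` for the stacked observation vector, and the function name are assumptions rather than the patent's stated formulas.

```python
import numpy as np

def update_spatio_temporal(R_inv_prev, x_bar, v_n, beta):
    """Return (K, R_inv) for one source and one frequency bin.

    R_inv_prev : (d, d) inverse spatio-temporal covariance from time t-1.
    x_bar      : (d,) stacked observation vector (hypothetical name).
    v_n        : power of source n at time t (positive scalar).
    beta       : forgetting factor, 0 < beta <= 1.
    """
    Rx = R_inv_prev @ x_bar
    denom = beta * v_n + np.real(np.vdot(x_bar, Rx))   # beta*v + x^H R^{-1} x
    K = Rx / denom                                     # gain vector
    # Rank-one (Sherman-Morrison) update of the inverse
    R_inv = (R_inv_prev - np.outer(K, x_bar.conj()) @ R_inv_prev) / beta
    return K, R_inv
```

Each call costs on the order of d^2 operations for a d-dimensional stacked vector, so handling one source at a time keeps d, and therefore the cost, small.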

<Dereverberation unit 2 (second process)>
In the second embodiment, the sound source separation filter Q(f;t), as initialized by the initialization unit 1 or updated by the sound source separation unit 3, is also input to the dereverberation unit 2.

As a second process, for n=1,…,N, the dereverberation unit 2 obtains, by the same processing as in the first embodiment, the dereverberation filter G_n(f;t) for sound source n from the dereverberation filter G_n(f;t-1) for sound source n obtained one time step earlier, the Kalman gain K_n(f,t) for sound source n, and the power v_n(t) of sound source n (step S51).

The dereverberation unit 2 then obtains the dereverberation filter G(f;t) from the dereverberation filters G_n(f;t) obtained for each sound source n and the sound source separation filter Q(f;t-1) generated one time step earlier (step S52).

The dereverberation unit 2 calculates the dereverberation filter G(f;t) based on, for example, the following equation.

The processing of steps S51 and S52 corresponds to step S5.

The dereverberation filter G(f,t) is a dereverberation filter common to all sound sources.
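The combining equation of step S52 is not reproduced in this text. One combination that is consistent with the surrounding definitions, offered only as a sketch, requires the common filter to produce the same enhanced sounds as the per-source filters, that is, G Q_n = G_n Q_n for every n, which gives G = [G_1 Q_1, …, G_N Q_N] Q^-1. The sketch assumes M = N (square, invertible Q), the linear-prediction form Y = X - G^H X̄, and the hypothetical function name `combine_filters`.

```python
import numpy as np

def combine_filters(G_list, Q):
    """Build one common filter G from per-source filters G_1..G_N.

    G_list : list of N arrays of shape (d, M), one filter per source.
    Q      : (M, N) separation filter with M == N, columns Q_1..Q_N.
    Returns G with shape (d, M) satisfying G @ Q[:, n] == G_list[n] @ Q[:, n].
    """
    # Column n of B is G_n Q_n, so B has shape (d, N)
    B = np.stack([G_n @ Q[:, n] for n, G_n in enumerate(G_list)], axis=1)
    # Solve G Q = B for G by transposing: Q^T G^T = B^T
    return np.linalg.solve(Q.T, B.T).T
```

With this choice, applying Q_n^H to the commonly dereverberated signal gives the same result as applying it to the signal dereverberated with G_n, so a single dereverberation pass serves all sources.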

<Sound source separation unit 3 (second process)>
As a second process, for n=1,…,N, the sound source separation unit 3 uses the dereverberated signal vector Y(f,t) and the power v_n(t) of sound source n to obtain the spatial covariance matrix Σ_n(f,t) for sound source n (step S61).

Unlike in the first embodiment, the sound source separation unit 3 of the second embodiment calculates the spatial covariance matrix Σ_n(f,t) based on, for example, the following equation.
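The equation for Σ_n(f,t) is not reproduced in this text. The sketch below assumes a simple recursive estimate in which the common dereverberated vector Y(f,t) is weighted by the inverse of the source power: Σ_n(t) = β Σ_n(t-1) + (1-β) Y Y^H / v_n(t). The normalization by (1-β), the use of a forgetting factor here, and the function name are all assumptions.

```python
import numpy as np

def update_spatial_covariance(Sigma_prev, Y, v_n, beta):
    """Assumed recursion for one source and one frequency bin.

    Sigma_prev : (M, M) spatial covariance Sigma_n(f,t-1).
    Y          : (M,) dereverberated signal vector common to all sources.
    v_n        : power of source n at time t (positive scalar).
    beta       : forgetting factor, 0 <= beta < 1.
    """
    outer = np.outer(Y, Y.conj())            # Y Y^H
    return beta * Sigma_prev + (1.0 - beta) * outer / v_n
```

Only the weight 1/v_n(t) differs between sources in this sketch, since every source shares the same Y(f,t) in the second embodiment.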

Then, by the same processing as in the first embodiment, the sound source separation unit 3 uses the spatial covariance matrix Σ_n(f,t) obtained for sound source n and the sound source separation filter Q_n(f;t-1) for sound source n generated one time step earlier to obtain the sound source separation filter Q_n(f;t) for sound source n (step S62).

The processing of steps S61 and S62 corresponds to step S6.

In the second embodiment as well, all parameters at each time are optimized under a single optimization criterion in order to achieve global optimization. As in the first embodiment, an example of the criterion is the one given by equation (5).

In other words, at each time the above processing finds, by online processing, the dereverberation filter G(f,t), the sound source separation filter Q(f;t), and the power v(t) of each separated sound that maximize, for example, equation (5).

As in the first embodiment, because dereverberation and sound source separation are processed separately, modifications such as optimizing them under partially different criteria (for example, with different forgetting factors) are also possible.

[Generic concept of the first and second embodiments]
The audio signal enhancement devices and methods of the first and second embodiments have at least the following points in common.

In other words, the dereverberation unit 2, the sound source separation unit 3, and the spatio-temporal parameter update unit 4 of the audio signal enhancement devices of the first and second embodiments can be said to perform the following processing.

At each time t, the dereverberation unit 2 receives the observed signal vector at time t and the inverse matrix of the spatio-temporal covariance matrix obtained at time t-1, and generates a dereverberated signal vector corresponding to the observed signal vector.

At each time t, the sound source separation unit 3 receives the dereverberated signal vector generated by the dereverberation unit 2, and obtains the enhanced sound of sound source n and the power of sound source n.

At each time t, the spatio-temporal parameter update unit 4 receives the power of sound source n obtained by the sound source separation unit 3 and the observed signal vector, and obtains the inverse matrix of the spatio-temporal covariance matrix for sound source n.

The audio signal enhancement device of the present invention need only have at least this configuration common to the first and second embodiments.

In other words, the audio signal enhancement device need only include: a dereverberation unit 2 that, at each time t, receives the observed signal vector at time t and the inverse matrix of the spatio-temporal covariance matrix obtained at time t-1, and generates a dereverberated signal vector corresponding to the observed signal vector; a sound source separation unit 3 that, at each time t, receives the generated dereverberated signal vector and obtains the enhanced sound of sound source n and the power of sound source n; and a spatio-temporal parameter update unit 4 that, at each time t, receives the power of sound source n and the observed signal vector and obtains the inverse matrix of the spatio-temporal covariance matrix for sound source n.

Because the spatio-temporal parameter update unit 4 uses the power of sound source n obtained by the sound source separation unit 3, the sound source separation results are fed back into the processing of the spatio-temporal parameter update unit 4, so that the processing is optimized as a whole.
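The data flow common to both embodiments can be sketched as a single per-frame function. The three callables stand in for units 2, 3 and 4, and the `state` dictionary and all names are hypothetical. The sketch shows only the order of operations and the feedback path, not the internals of any unit.

```python
def process_frame(x_t, state, dereverberate, separate, update_params):
    """One time step t of the common structure.

    x_t   : observed signal vector at time t.
    state : dict holding "R_inv", the inverse spatio-temporal covariance
            obtained at time t-1 (hypothetical container).
    """
    # Unit 2: uses the observation at t and R^-1 from t-1
    y_t = dereverberate(x_t, state["R_inv"])
    # Unit 3: enhanced sound and source power from the dereverberated signal
    s_t, v_t = separate(y_t)
    # Unit 4: the separation result (power) is fed back into the parameter update
    state["R_inv"] = update_params(v_t, x_t, state["R_inv"])
    return s_t
```

Calling this function once per incoming frame yields online processing in which the output of time t depends only on parameters carried over from time t-1.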

[Modifications]
Although embodiments of the present invention have been described above, the specific configuration is not limited to these embodiments, and it goes without saying that design changes and the like made as appropriate without departing from the spirit of the present invention are included in the present invention.

The various processes described in the embodiments need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device executing them or as needed.

For example, data may be exchanged between the components of the audio signal enhancement device directly or via a storage unit (not shown).

[Program and recording medium]
The processing of each unit of each device described above may be implemented by a computer, in which case the processing content of the functions that each device should have is described by a program. By loading this program into the storage unit 1020 of the computer 1000 shown in FIG. 5 and having the arithmetic processing unit 1010, the input unit 1030, the output unit 1040, and so on operate accordingly, the various processing functions of each device are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium is, for example, a non-transitory recording medium, specifically a magnetic recording device, an optical disc, or the like.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or transferred from the server computer in the auxiliary recording unit 1050, which is its own non-transitory storage device. When executing the processing, the computer reads the program stored in the auxiliary recording unit 1050 into the storage unit 1020 and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium into the storage unit 1020 and execute processing according to it, or may execute processing according to the received program each time a program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions only through execution instructions and result acquisition, without transferring the program from the server computer to the computer. The program in this embodiment includes information that is used for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has properties that define the computer's processing).

In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

It goes without saying that other changes may be made as appropriate without departing from the spirit of the present invention.

[Experimental results]
Dereverberation and sound source separation were performed, using the embodiment described above, on the voices of two people speaking simultaneously that were recorded in a reverberant environment. With four microphones, the processing delay of the online processing was 8.89 milliseconds, and the signal-to-distortion ratio (SDR) of the separated sounds was 6.81 dB. Given that the SDR of the conventional method was 3.81 dB, this shows that low-delay (8.98 milliseconds) online processing improved the SDR over the conventional method.
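As a quick check of the reported figures, the SDR improvement in decibels can be converted to a ratio of signal-to-distortion power ratios. The variable names are hypothetical; the two SDR values are the ones stated above.

```python
sdr_proposed_db = 6.81       # SDR of the embodiment, from the text
sdr_conventional_db = 3.81   # SDR of the conventional method, from the text

# Improvement in decibels
gain_db = sdr_proposed_db - sdr_conventional_db

# A difference in dB corresponds to a power ratio of 10 ** (dB / 10)
power_ratio = 10 ** (gain_db / 10)
```

The difference is about 3 dB, which corresponds to roughly doubling the signal-to-distortion power ratio.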

Claims

1. An audio signal enhancement device comprising:
a dereverberation unit that, at each time t, receives an observed signal vector at time t and an inverse matrix of a spatio-temporal covariance matrix obtained at time t-1, and generates a dereverberated signal vector that corresponds to the observed signal vector and either corresponds to sound source n or is common to all sound sources;
a sound source separation unit that, at each time t, obtains an enhanced sound of the sound source n and a power of the sound source n by using the generated dereverberated signal vector that corresponds to the sound source n or is common to all sound sources; and
a spatio-temporal parameter update unit that, at each time t, receives the power of the sound source n and the observed signal vector, and obtains an inverse matrix of a spatio-temporal covariance matrix corresponding to the sound source n.

2. An audio signal enhancement device, where t is the time frame number, f is the frequency number, N is the number of sound sources, M is the number of microphones, n = 1, ..., N, and m = 1, ..., M, comprising:
a dereverberation unit that uses a dereverberation filter G_n(f;t-1) corresponding to a sound source n obtained one time step earlier and an observed signal vector X(f,t) composed of observed signals x_m(f,t) of microphones m to generate a dereverberated signal vector Y_n(f,t) for the enhanced sound of the sound source n, the dereverberated signal vector Y_n(f,t) being composed of dereverberated signals y_n,m(f,t) corresponding to the observed signals x_m(f,t);
a sound source separation unit that obtains an enhanced sound s_n(f,t) of the sound source n and a power v_n(t) of the sound source n by using a sound source separation filter Q_n(f;t-1) corresponding to the sound source n generated one time step earlier and the generated dereverberated signal vector Y_n(f,t); and
a spatio-temporal parameter update unit that calculates a Kalman gain K_n(f,t) corresponding to the sound source n and an inverse matrix R_n^-1(f;t) of the spatio-temporal covariance matrix corresponding to the sound source n, using the power v_n(t) of the sound source n, the observed signal vector X(f,t), and an inverse matrix R_n^-1(f;t-1) of the spatio-temporal covariance matrix corresponding to the sound source n obtained one time step earlier,
wherein the dereverberation unit obtains a dereverberation filter G_n(f;t) corresponding to the sound source n by using the dereverberation filter G_n(f;t-1) corresponding to the sound source n obtained one time step earlier, the Kalman gain K_n(f,t) corresponding to the sound source n, and the power v_n(t) of the sound source n, and
the sound source separation unit obtains a spatial covariance matrix Σ_n(f,t) corresponding to the sound source n by using the dereverberated signal vector Y_n(f,t) and the power v_n(t) of the sound source n, and obtains a sound source separation filter Q_n(f;t) corresponding to the sound source n by using the obtained spatial covariance matrix Σ_n(f,t) and the sound source separation filter Q_n(f;t-1) corresponding to the sound source n generated one time step earlier.

3. An audio signal enhancement device, where t is the time frame number, f is the frequency number, N is the number of sound sources, M is the number of microphones, n = 1, ..., N, and m = 1, ..., M, comprising:
a dereverberation unit that uses a dereverberation filter G(f;t-1) obtained one time step earlier and an observed signal vector X(f,t) composed of observed signals x_m(f,t) of microphones m to generate a dereverberated signal vector Y(f,t) composed of dereverberated signals y_m(f,t) corresponding to the observed signals x_m(f,t);
a sound source separation unit that uses a sound source separation filter Q(f;t-1) generated one time step earlier and the generated dereverberated signal vector Y(f,t) to obtain an enhanced sound vector S(f,t) composed of enhanced sounds s_n(f,t) of sound sources n, and a power v_n(t) of the sound source n; and
a spatio-temporal parameter update unit that calculates a Kalman gain K_n(f,t) corresponding to the sound source n and an inverse matrix R_n^-1(f;t) of the spatio-temporal covariance matrix corresponding to the sound source n, using the power v_n(t) of the sound source n, the observed signal vector X(f,t), and an inverse matrix R_n^-1(f;t-1) of the spatio-temporal covariance matrix corresponding to the sound source n obtained one time step earlier,
wherein the dereverberation unit obtains a dereverberation filter G_n(f;t) corresponding to the sound source n by using a dereverberation filter G_n(f;t-1) corresponding to the sound source n obtained one time step earlier, the Kalman gain K_n(f,t) corresponding to the sound source n, and the power v_n(t) of the sound source n, and obtains a dereverberation filter G(f;t) by using the obtained dereverberation filter G_n(f;t) corresponding to each sound source n and the sound source separation filter Q(f;t-1) generated one time step earlier, and
the sound source separation unit obtains a spatial covariance matrix Σ_n(f,t) corresponding to the sound source n by using the dereverberated signal vector Y(f,t) and the power v_n(t) of the sound source n, and obtains a sound source separation filter Q_n(f;t) corresponding to the sound source n by using the obtained spatial covariance matrix Σ_n(f,t) and the sound source separation filter Q_n(f;t-1) corresponding to the sound source n generated one time step earlier.

4. An audio signal enhancement method comprising:
a dereverberation step in which a dereverberation unit receives an observed signal vector at time t and an inverse matrix of a spatio-temporal covariance matrix obtained at time t-1, and generates a dereverberated signal vector that corresponds to the observed signal vector and either corresponds to sound source n or is common to all sound sources;
a sound source separation step in which a sound source separation unit, at each time t, obtains an enhanced sound of the sound source n and a power of the sound source n by using the generated dereverberated signal vector that corresponds to the sound source n or is common to all sound sources; and
a spatio-temporal parameter update step in which a spatio-temporal parameter update unit, at each time t, receives the power of the sound source n and the observed signal vector, and obtains an inverse matrix of a spatio-temporal covariance matrix corresponding to the sound source n.

5. An audio signal enhancement method, where t is the time frame number, f is the frequency number, N is the number of sound sources, M is the number of microphones, n = 1, ..., N, and m = 1, ..., M, comprising:
a dereverberation step in which a dereverberation unit uses a dereverberation filter G_n(f;t-1) corresponding to a sound source n obtained one time step earlier and an observed signal vector X(f,t) composed of observed signals x_m(f,t) of microphones m to generate a dereverberated signal vector Y_n(f,t) for the enhanced sound of the sound source n, the dereverberated signal vector Y_n(f,t) being composed of dereverberated signals y_n,m(f,t) corresponding to the observed signals x_m(f,t);
a sound source separation step in which a sound source separation unit obtains an enhanced sound s_n(f,t) of the sound source n and a power v_n(t) of the sound source n by using a sound source separation filter Q_n(f;t-1) corresponding to the sound source n generated one time step earlier and the generated dereverberated signal vector Y_n(f,t); and
a spatio-temporal parameter update step in which a spatio-temporal parameter update unit calculates a Kalman gain K_n(f,t) corresponding to the sound source n and an inverse matrix R_n^-1(f;t) of the spatio-temporal covariance matrix corresponding to the sound source n, using the power v_n(t) of the sound source n, the observed signal vector X(f,t), and an inverse matrix R_n^-1(f;t-1) of the spatio-temporal covariance matrix corresponding to the sound source n obtained one time step earlier,
wherein the dereverberation unit obtains a dereverberation filter G_n(f;t) corresponding to the sound source n by using the dereverberation filter G_n(f;t-1) corresponding to the sound source n obtained one time step earlier, the Kalman gain K_n(f,t) corresponding to the sound source n, and the power v_n(t) of the sound source n, and
the sound source separation unit obtains a spatial covariance matrix Σ_n(f,t) corresponding to the sound source n by using the dereverberated signal vector Y_n(f,t) and the power v_n(t) of the sound source n, and obtains a sound source separation filter Q_n(f;t) corresponding to the sound source n by using the obtained spatial covariance matrix Σ_n(f,t) and the sound source separation filter Q_n(f;t-1) corresponding to the sound source n generated one time step earlier.

6. An audio signal enhancement method, where t is the time frame number, f is the frequency number, N is the number of sound sources, M is the number of microphones, n = 1, ..., N, and m = 1, ..., M, comprising:
a dereverberation step in which a dereverberation unit uses a dereverberation filter G(f;t-1) obtained one time step earlier and an observed signal vector X(f,t) composed of observed signals x_m(f,t) of microphones m to generate a dereverberated signal vector Y(f,t) composed of dereverberated signals y_m(f,t) corresponding to the observed signals x_m(f,t);
a sound source separation step in which a sound source separation unit uses a sound source separation filter Q(f;t-1) generated one time step earlier and the generated dereverberated signal vector Y(f,t) to obtain an enhanced sound vector S(f,t) composed of enhanced sounds s_n(f,t) of sound sources n, and a power v_n(t) of the sound source n; and
a spatio-temporal parameter update step in which a spatio-temporal parameter update unit calculates a Kalman gain K_n(f,t) corresponding to the sound source n and an inverse matrix R_n^-1(f;t) of the spatio-temporal covariance matrix corresponding to the sound source n, using the power v_n(t) of the sound source n, the observed signal vector X(f,t), and an inverse matrix R_n^-1(f;t-1) of the spatio-temporal covariance matrix corresponding to the sound source n obtained one time step earlier,
wherein the dereverberation unit obtains a dereverberation filter G_n(f;t) corresponding to the sound source n by using a dereverberation filter G_n(f;t-1) corresponding to the sound source n obtained one time step earlier, the Kalman gain K_n(f,t) corresponding to the sound source n, and the power v_n(t) of the sound source n, and obtains a dereverberation filter G(f;t) by using the obtained dereverberation filter G_n(f;t) corresponding to each sound source n and the sound source separation filter Q(f;t-1) generated one time step earlier, and
the sound source separation unit obtains a spatial covariance matrix Σ_n(f,t) corresponding to the sound source n by using the dereverberated signal vector Y(f,t) and the power v_n(t) of the sound source n, and obtains a sound source separation filter Q_n(f;t) corresponding to the sound source n by using the obtained spatial covariance matrix Σ_n(f,t) and the sound source separation filter Q_n(f;t-1) corresponding to the sound source n generated one time step earlier.

7. A program for causing a computer to function as each unit of the audio signal enhancement device according to any one of claims 1 to 3.