JP5285665B2

JP5285665B2 - Reflected sound information estimation apparatus, reflected sound information estimation method, program

Info

Publication number: JP5285665B2
Application number: JP2010176019A
Authority: JP
Inventors: 健太丹羽; 裕輔日岡; 澄宇阪内; 賢一古家; 陽一羽田
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2010-08-05
Filing date: 2010-08-05
Publication date: 2013-09-11
Anticipated expiration: 2030-08-05
Also published as: JP2012039276A

Abstract

PROBLEM TO BE SOLVED: To provide a technique for estimating the relative arrival time difference between a reference reflected sound and other reflected sounds.

SOLUTION: Let Q be a preset integer of 2 or more. The method uses the complex amplitude (arrival amplitude) of each of Q reflected sounds, where each reflected sound is expressed as the product of its complex amplitude and a function (transfer characteristic function) that models, for every frequency in a prescribed band, the transfer characteristic between an arbitrary position in space and a plurality of microphones. For each reflected sound other than the reference reflected sound (the target reflected sound), the argument of the target's arrival amplitude relative to the reference's arrival amplitude is divided by the frequency, and the arithmetic mean of this quotient over frequency is taken as the relative arrival time difference between the reference reflected sound and the target reflected sound.

COPYRIGHT: (C)2012,JPO&INPIT

Description

The present invention relates to information about reflected sound, and more particularly to a technique for estimating the relative arrival time difference of other reflected sounds with respect to a reference reflected sound.

A system that exchanges voice information, such as a telephone or voice conferencing system, is generally called a voice communication system. In a voice communication system, obtaining information about reflected sound (arrival amplitude, arrival direction, and so on) is very important. In a reverberant environment such as a conference room, the signal picked up through a microphone contains not only the direct sound arriving straight from a source such as a speaker, but also reflected sounds that arrive after bouncing off the floor, walls, and ceiling. When a speaker's utterance is recorded in such an environment, reflected sounds delayed relative to the direct sound are mixed in, which makes the speech hard to understand. If the arrival information of each reflected sound can be estimated from the picked-up signal and the reflected sounds removed, intelligible speech can be recovered. Non-Patent Document 1 is an example of prior work on estimating reflected sound information.

FIG. 1 shows a functional configuration that implements the technique disclosed in Non-Patent Document 1. The processing procedure of this technique is as follows.

1. A source signal emitted from an impulse sound source 100 is picked up by four microphones 110-1, 110-2, 110-3, and 110-4. An AD converter 120 converts the picked-up analog signals into a digital signal $\vec{x}(t) = [x_1(t), x_2(t), x_3(t), x_4(t)]^T$, where $[\cdot]^T$ denotes transposition and $t$ is the discrete-time index. The four microphones are assumed to be placed at the vertices of a regular tetrahedron.

2. An impulse response calculation unit 130 takes the digital signal $\vec{x}(t) = [x_1(t), x_2(t), x_3(t), x_4(t)]^T$ as input and calculates the impulse response of each microphone, $\vec{h}(t) = [h_1(t), h_2(t), h_3(t), h_4(t)]^T$. Methods for calculating an impulse response include the TSP method and the M-sequence method; any method may be used.

3. A virtual sound source calculation unit 140 takes the four-channel impulse response $\vec{h}(t) = [h_1(t), h_2(t), h_3(t), h_4(t)]^T$ as input and outputs virtual sound source information $\vec{v} = [\vec{v}_1, \ldots, \vec{v}_D]^T$, where $D$ is the number of virtual sound sources. A virtual sound source is a source assumed to exist virtually in order to express the arrival amplitude, arrival direction, and arrival time of each reflected sound. The virtual sound source is explained with reference to FIG. 2, which shows the path along which a source signal reflected by the right-hand wall reaches the microphone. The source signal that arrives after reflecting off the right-hand wall (the reflected sound) is equivalent to a signal arriving directly from the position labeled "virtual sound source", except that it is also subject to attenuation from the wall reflection and to distance attenuation.

This prior art is described in more detail below. When impulse responses are measured at four closely spaced receiving points (the microphone positions), the arrival times of a reflected sound differ slightly between the points. By using the cross-correlation of short segments of the impulse responses to match up the reflected sounds across the microphones, the arrival times $t_{1n}$, $t_{2n}$, $t_{3n}$, $t_{4n}$ ($1 \le n \le D$) of the $n$-th reflected wave at the respective receiving points are obtained, as shown in FIG. 3. With $d$ denoting the edge length of the regular-tetrahedron microphone array and $c$ the speed of sound, each item of virtual sound source information $\vec{v}_n = [X_n, Y_n, Z_n, S_n]^T$ is obtained. Here $X_n$, $Y_n$, $Z_n$ represent the position of the $n$-th virtual sound source (see equations (1) to (3)), which carries information corresponding to the arrival direction and arrival time of each reflected sound. $S_n$ represents the strength of the $n$-th virtual sound source and is obtained as the average amplitude of the $n$-th reflected sound matched across the four impulse response channels.

Non-Patent Document 1: Yoshio Yamazaki et al., "Spatial information of a hall obtained from impulse responses at four adjacent points", Proceedings of the Acoustical Society of Japan, 1981, pp. 759-760.

With the prior art, impulse responses must be prepared in advance in order to estimate the "arrival amplitude", "arrival direction", and "arrival time" of the reflected sounds, which that work calls virtual sound source information. However, preparing an impulse response requires a measurement with a special signal, so it is unrealistic to assume that impulse responses are available in advance for every position.

An object of the present invention is therefore to provide a technique for estimating reflected sound information, in particular the relative arrival time difference of other reflected sounds with respect to a reference reflected sound, without using a special signal.

A storage unit stores templates. Let P be a predetermined integer of 2 or more and p each integer from 1 to P; for each p, a function that models the per-frequency transfer characteristic between the p-th position and each of the positions where M microphones are placed (hereinafter, a transfer characteristic function) is stored as a template. The inputs are the templates and the signals obtained by converting to the frequency domain each of the M picked-up signals that result from picking up a source signal with the M microphones (hereinafter, observation signals). Let Q be a predetermined integer of 2 or more, and regard the observation signals as a superposition of reflected sounds arriving at the M microphones from Q directions (hereinafter, arrival directions). Based on the templates, the complex amplitude of each of the Q reflected sounds (hereinafter, arrival amplitude) is estimated. Then, for each reflected sound (hereinafter, target reflected sound) other than the one serving as the reference (hereinafter, reference reflected sound), the argument of the target's arrival amplitude relative to the reference's arrival amplitude is divided by the frequency, and the arithmetic mean of this quotient over frequency is taken as the relative arrival time difference of the target reflected sound with respect to the reference reflected sound.
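The final step above (argument of the amplitude ratio, divided by frequency, averaged over frequency) can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name, the test data, and the use of angular frequency are assumptions. It also assumes that a delay of τ appears as a factor exp(-iωτ), so a target arriving τ later than the reference yields -τ (negate the result under the opposite Fourier convention), and that |ωτ| stays below π so the phase does not wrap.

```python
import cmath
import math


def relative_arrival_time(ref_amps, tgt_amps, omegas):
    """Average arg(A_target / A_reference) / omega over the given frequencies."""
    quotients = [
        cmath.phase(a_tgt / a_ref) / omega
        for a_ref, a_tgt, omega in zip(ref_amps, tgt_amps, omegas)
    ]
    return sum(quotients) / len(quotients)


# Hypothetical data: the target is the reference delayed by 1 ms and attenuated.
tau = 1.0e-3
omegas = [2 * math.pi * f for f in (100.0, 200.0, 300.0)]
ref = [0.8 * cmath.exp(0.3j) for _ in omegas]
tgt = [0.5 * a * cmath.exp(-1j * w * tau) for a, w in zip(ref, omegas)]
estimate = relative_arrival_time(ref, tgt, omegas)  # close to -tau here
```

Because only the ratio of the two amplitudes enters, any phase common to both (for example the phase of the source itself) cancels out.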

According to the present invention, the argument of the target reflected sound's arrival amplitude relative to the reference reflected sound's arrival amplitude is divided by the frequency, and the arithmetic mean of this quotient over frequency is taken as the relative arrival time difference of the target reflected sound with respect to the reference reflected sound. Reflected sound information (the relative arrival time difference of other reflected sounds with respect to the reference reflected sound) can therefore be estimated without using a special source signal to obtain an impulse response. Once reflected sound information is available, it can be applied to uses that conventional speech processing techniques could not achieve, such as source distance estimation and speech enhancement (picking up distant sound, or picking up sound selectively by distance).

FIG. 1: Diagram showing the functional configuration of the reflected sound information estimation technique in the prior art.
FIG. 2: Diagram for explaining a virtual sound source.
FIG. 3: Diagram for explaining the matching of reflected sounds in the prior art.
FIG. 4: Diagram showing the functional configuration of the reflected sound information estimation apparatus according to the first embodiment.
FIG. 5: Diagram showing the processing procedure of the reflected sound information estimation method according to the first embodiment.
FIG. 6: Diagram showing a configuration example of a two-dimensional microphone array.
FIG. 7: Diagram for explaining the transfer characteristic between the p-th point $[x_p, y_p, z_p]$ and the m-th receiving point $[u_m, v_m, w_m]$.
FIG. 8: Diagram for explaining that the sound pressure distribution observed on a plane with the microphone array of FIG. 6 is obtained, for example, as the superposition of a direct sound, reflected sound 1, and reflected sound 2.
FIG. 9: Diagram for explaining the principle of the present invention.
FIG. 10: Diagram showing the positional relationship between the coordinates used to create the template information and the microphone array in the experimental example.
FIG. 11: Diagram showing experimental results.
FIG. 12: Diagram for explaining the effectiveness of the present invention through the experimental example.
FIG. 13: Diagram showing the processing procedure of the reflected sound information estimation method according to the second embodiment.
FIG. 14: (a) Diagram for explaining that, ideally, only information about the estimated arrival direction should be extracted. (b) Diagram for explaining that, in practice, information about directions other than the estimated arrival direction is also mixed in.
FIG. 15: Diagram for explaining that aggregating the power of the residual signal over all frequencies reduces the influence of directions other than the estimated arrival direction.
FIG. 16: Diagram showing the sound pressure distribution and its decomposition when a practical two-dimensional matrix microphone array is used.

<< First Embodiment >>
The present invention estimates at least one of the "arrival amplitude" and the "arrival direction" of reflected sounds from the signal (picked-up signal) obtained when an audio signal emitted from a sound source, such as an utterance (source signal), is picked up by a microphone array composed of a plurality of microphones. FIGS. 4 and 5 show the functional configuration and the processing flow of the first embodiment.

The source signal emitted from a sound source 200 is picked up by M microphones 210-1, ..., 210-M (step S1). M is preferably greater than 4. An AD converter 220 converts the picked-up analog signals into a digital signal $\vec{xx}(t) = [xx_1(t), \ldots, xx_M(t)]^T$ (step S2), where $[\cdot]^T$ denotes transposition and $t$ is the discrete-time index.

The M microphones are preferably arranged at equal intervals in two or three dimensions. This is so that the correspondence between the arrival direction of a reflected sound and a template (described later; a template models the transfer characteristic of a reflected sound) is uniquely determined. In principle the invention can also be carried out with microphones arranged in one dimension or at unequal intervals, but in that case the relationship between the transfer characteristic of a reflected sound and its arrival direction is not one-to-one, which is why an equally spaced two- or three-dimensional arrangement is preferred. FIG. 6 shows an example in which microphones are arranged at equal intervals on a two-dimensional plane. The microphone spacing $d$ is preferably set so that the spatial sampling theorem is satisfied; in that case $d$ takes a value that satisfies equation (4), where $c$ is the speed of sound and $f$ is the frequency to be analyzed. For example, when analyzing a frequency of 4 kHz, a microphone spacing of about 4 cm is appropriate.
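The 4 cm figure can be checked with a short calculation. Equation (4) is not reproduced in this text, so the sketch below assumes the usual half-wavelength form of the spatial sampling condition, d ≤ c / (2f), and a speed of sound of 340 m/s; both the function name and these values are assumptions.

```python
def max_mic_spacing(freq_hz, sound_speed=340.0):
    """Largest spacing (m) satisfying the assumed condition d <= c / (2 f)."""
    return sound_speed / (2.0 * freq_hz)


spacing_4khz = max_mic_spacing(4000.0)  # 0.0425 m, i.e. about 4 cm
```

Under these assumptions the bound at 4 kHz is 4.25 cm, which agrees with the spacing of about 4 cm suggested in the text.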

A frame division unit 230 takes the digital signal $\vec{xx}(t) = [xx_1(t), \ldots, xx_M(t)]^T$ output by the AD converter 220 as input and outputs a signal $\vec{x}(k) = [x_1(k), \ldots, x_M(k)]^T$ that has been divided, channel by channel, into sets of multiple samples (frames) (step S3). $k$ is the frame index. Frame division buffers $W$ samples of each channel's digital signal $xx_i(t)$ ($1 \le i \le M$) and outputs them. A suitable $W$ depends on the sampling frequency; for 16 kHz sampling, about 512 samples is reasonable.
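A minimal sketch of the buffering in step S3 for a single channel is given below. The text does not say whether frames overlap or how a trailing partial frame is handled, so this sketch assumes non-overlapping frames and drops any incomplete final frame; the function name is hypothetical.

```python
def split_frames(channel, frame_len=512):
    """Split one channel into consecutive, non-overlapping frames of frame_len samples."""
    n_frames = len(channel) // frame_len  # an incomplete trailing frame is dropped
    return [channel[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]
```

Applying the same call to each of the M channels gives, for every frame index k, the M sample blocks that make up the frame signal.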

A frequency domain conversion unit 240 takes the digital signal $\vec{x}(k)$ of each frame as input, converts it into a frequency-domain signal $\vec{X}(\omega,k) = [X_1(\omega,k), \ldots, X_M(\omega,k)]^T$, and outputs it (step S4). This signal $\vec{X}(\omega,k)$ is called the observation signal. Here $\omega$ is the discrete frequency index and $k$ is the frame index. Since the frequency $f$ and the angular frequency $\omega$ are related by $\omega = 2\pi f$, the frequency index $\omega$ may be identified with this angular frequency; below, the "frequency index" $\omega$ is also simply called the "frequency". The discrete Fourier transform is one way to convert to the frequency domain, but any other method that converts to the frequency domain may be used. The frequency-domain observation signal $\vec{X}(\omega,k)$ is output for each frequency $\omega$ and each frame $k$.
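Step S4 can be illustrated with a direct discrete Fourier transform applied to one frame of every channel. This is a sketch only: the function names and test frames are assumptions, no window is applied, and a real system would use an FFT rather than this O(W²) sum.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * b * t / n) for t, x in enumerate(frame))
        for b in range(n)
    ]


def observation(frames_per_channel):
    """Return X[b][m]: for frequency bin b, the vector of the M channel spectra."""
    spectra = [dft(frame) for frame in frames_per_channel]
    return [[spec[b] for spec in spectra] for b in range(len(spectra[0]))]


# Hypothetical 8-sample frames for M = 2 channels.
cosine = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]  # energy in bin 2
impulse = [1.0] + [0.0] * 7                                     # flat spectrum
X = observation([cosine, impulse])
```

Each `X[b]` plays the role of the observation vector for one frequency in one frame.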

A template generation unit 250 generates, for each frequency $\omega$, template information $\vec{S}(\omega) = [\vec{S}_1(\omega), \ldots, \vec{S}_P(\omega)]$ ($\forall \omega \in \Omega$, where $\Omega$ is the set of frequency indices $\omega$), which is the set of $P$ templates $\vec{S}_p(\omega)$ (written as a vector for computational convenience) (step Sp). This processing is normally carried out before steps S1 to S4, that is, before any signal is observed with the microphones. $P$ is the total number of templates and is set in advance to an integer of 2 or more. The larger $P$ is, the more accurately the reflected sound information can be estimated, but the computational cost also grows, so a value of about $P = 1000$, for example, is a good choice. Unless the microphone positions (for example, the microphone spacing $d$) or the total number of templates $P$ are changed, the templates normally do not need to be regenerated each time. A "template" here models the transfer characteristic (acoustic propagation characteristic) corresponding to the arrival direction of a reflected sound. The $p$-th ($1 \le p \le P$) template $\vec{S}_p(\omega) = [S_{p1}(\omega), \ldots, S_{pM}(\omega)]^T$ ($\omega \in \Omega$) represents the per-frequency transfer characteristic between a predetermined $p$-th point $[x_p, y_p, z_p]$ and the $M$ receiving points (see FIG. 7). A receiving point is a position where a microphone is placed, and the $m$-th ($1 \le m \le M$) receiving point is denoted $[u_m, v_m, w_m]$. Equation (5) gives an example of how each element $S_{pm}(\omega)$ of the $p$-th template $\vec{S}_p(\omega)$ is calculated. The symbol $i$ denotes the imaginary unit.
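As an illustration of what a template element might look like, the sketch below uses a free-field point-source model, S_pm(ω) = exp(-iωr/c) / r, where r is the distance between the p-th point and the m-th receiving point. Equation (5) is not reproduced in this text, so this model, the function name, and the value c = 340 m/s are assumptions and may differ from the patent's formula.

```python
import cmath
import math


def template(point, mic_positions, omega, sound_speed=340.0):
    """Assumed free-field transfer characteristic from one point to every microphone."""
    elements = []
    for mic in mic_positions:
        r = math.dist(point, mic)  # distance between the point and the receiving point
        elements.append(cmath.exp(-1j * omega * r / sound_speed) / r)
    return elements


# Hypothetical geometry: a point 10 m above the origin and two microphones.
omega = 2 * math.pi * 1000.0
s = template((0.0, 0.0, 10.0), [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)], omega)
```

Calling `template` for each of the P points and each frequency in Ω would fill a table with the same shape as the template information.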

Direction information $\vec{\theta}_p(\omega)$ is associated with the $p$-th template $\vec{S}_p(\omega)$. The direction information $\vec{\theta}_p(\omega)$ is the direction in which the $p$-th point $[x_p, y_p, z_p]$ is seen from the origin of the three-dimensional orthogonal coordinate system that serves as the reference for the position coordinates of the $p$-th point $[x_p, y_p, z_p]$ and the receiving points $[u_m, v_m, w_m]$. It is expressed, for example, as two angles (the polar angle $\theta_{p,\mathrm{pol}}$ and the azimuth angle $\theta_{p,\mathrm{azi}}$) in a spherical coordinate system that shares its origin with the orthogonal coordinate system; that is, $\vec{\theta}_p(\omega) = [\theta_{p,\mathrm{pol}}(\omega), \theta_{p,\mathrm{azi}}(\omega)]$. If the $p$-th point $[x_p, y_p, z_p]$ is associated with the $p$-th template $\vec{S}_p(\omega)$, the direction information $\vec{\theta}_p(\omega)$ can be calculated from the position $[x_p, y_p, z_p]$, so associating direction information with the template is not an essential requirement. Because the three-dimensional orthogonal coordinate system and the spherical coordinate system can be converted into each other (coordinate transformation), the right-hand side of equation (5) can also be written using the direction information $\vec{\theta}_p(\omega) = [\theta_{p,\mathrm{pol}}(\omega), \theta_{p,\mathrm{azi}}(\omega)]$ instead of the position $[x, y, z]$, for example as in equation (5a). There, $d$ is the microphone spacing, the microphone array is a two-dimensional array of $\Phi$ rows and $\Xi$ columns ($\Phi \times \Xi = M$), and the $m$-th microphone is located in row $\varphi$ and column $\xi$ ($1 \le \varphi \le \Phi$, $1 \le \xi \le \Xi$).
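The conversion from a point to its direction information can be sketched as a standard Cartesian-to-spherical transformation. The convention used here (polar angle measured from the z axis, azimuth measured in the x-y plane from the x axis) is an assumption, as is the function name; the text does not fix a convention.

```python
import math


def direction_of(point):
    """Return (polar angle, azimuth angle) of a point as seen from the origin."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    polar = math.acos(z / r)    # angle from the +z axis, in [0, pi]
    azimuth = math.atan2(y, x)  # angle in the x-y plane, in (-pi, pi]
    return polar, azimuth
```

Since the direction follows from the point alone, storing it with the template is a convenience rather than a necessity, as the text notes.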

When the templates correspond to directions, as in the first embodiment, the $P$ points $[x_p, y_p, z_p]$ ($1 \le p \le P$) are preferably located in mutually different directions. For example, each point $[x_p, y_p, z_p]$ may be placed at the same, sufficiently large distance from the origin, giving $P$ different points on a sphere centered on the origin. The points are placed sufficiently far from the origin because, although a signal emitted from a source or virtual source propagates spherically, within a local region (around the origin) sufficiently far from the source or virtual source the direct or reflected sound can be modeled as a plane wave. This is not intended to exclude template information that contains templates corresponding to positions in the same direction. The microphone array is assumed to be placed near the origin of the coordinate system (in the local region).
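One way to obtain P points in different directions on a sphere around the origin is a Fibonacci (golden-angle) lattice, sketched below. The text only requires distinct directions at a common large distance; the lattice itself, the function name, and the radius used in the example are assumptions chosen for illustration.

```python
import math


def sphere_points(count, radius):
    """Spread `count` points roughly evenly over a sphere of the given radius."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(count):
        z = 1.0 - (2.0 * k + 1.0) / count  # evenly spaced heights in (-1, 1)
        ring = math.sqrt(1.0 - z * z)      # radius of the circle at that height
        phi = k * golden_angle             # rotate by the golden angle each step
        points.append((radius * ring * math.cos(phi),
                       radius * ring * math.sin(phi),
                       radius * z))
    return points


grid = sphere_points(1000, 50.0)  # e.g. P = 1000 points at an assumed 50 m
```

Any other sampling of the sphere would serve equally well as long as the directions differ.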

A template storage unit 260 stores the template information $\vec{S}(\omega)$ output by the template generation unit 250 and provides it to a reflected sound information estimation unit 270 at analysis time.

The reflected sound information estimation unit 270 takes the frequency-domain observation signal $\vec{X}(\omega,k)$ and the template information $\vec{S}(\omega)$ as input and outputs, for each frame $k$ and each frequency $\omega$, reflected sound information $\vec{rs}(\omega,k) = [\vec{rs}_1(\omega,k), \ldots, \vec{rs}_Q(\omega,k)]^T$, which is the set of $Q$ reflected sound information components $\vec{rs}_q(\omega,k)$ (written as a vector for computational convenience) (step S5). Here $Q$ is the total number of reflected sounds to be estimated and is set in advance to an integer of 1 or more. The $q$-th ($1 \le q \le Q$) reflected sound information component consists of two elements, $\vec{rs}_q(\omega,k) = [rsA_q(\omega,k), rsB_q(\omega,k)]$, where $rsA_q(\omega,k)$ is the arrival amplitude of the $q$-th reflected sound and $rsB_q(\omega,k)$ is its arrival direction.

The principle of estimating reflected sound information is as follows. The grayscale image at the left end of FIG. 8 shows an example of the sound pressure distribution on a plane observed with a two-dimensional microphone array such as the one in FIG. 6. In these grayscale images, dark areas indicate low sound pressure and light areas indicate high sound pressure. The observed distribution contains the sound pressure distribution of the direct sound and also those of the reflected sounds. When the direct and reflected sounds arrive from sufficiently far away, each of their sound pressure distributions on the two-dimensional plane is a stripe pattern, as in the three grayscale images on the right of FIG. 8. The contrast of the stripes corresponds to the arrival amplitude of the direct or reflected sound, and their rotation and period correspond to its arrival direction. The example of FIG. 8 shows that the sound pressure distribution of the observation signal is the superposition of the distributions of a direct sound, reflected sound 1, and reflected sound 2, which differ in arrival amplitude and arrival direction.

In the frequency domain, the direct sound and each reflected sound are represented by a complex sinusoid whose frequency varies with the arrival direction, and the observation signal is the superposition of the complex sinusoids corresponding to the direct sound and the reflected sounds. The problem addressed by the present invention is to estimate the arrival amplitude and/or the arrival direction of the reflected sounds using only the observation signal. Solving it corresponds to estimating, from the distribution at the left end of FIG. 8, the contrast and/or the rotation and period of the stripe patterns of the direct sound and each reflected sound shown in the three images on the right.

An outline of the method for estimating the reflected sound information $\vec{rs}(\omega,k)$ is given with reference to FIG. 9. First, the strongest reflected sound contained in the observation signal observed on a two-dimensional plane, reflected sound 0, is estimated (this corresponds to $q = 1$; because it has the greatest power, reflected sound 0 is normally understood to be the direct sound), and reflected sound 0 is subtracted from the observation signal to obtain a residual signal $E_2$. Next, the strongest reflected sound contained in the residual signal $E_2$, reflected sound 1 (corresponding to $q = 2$), is estimated and subtracted from $E_2$ to obtain a new residual signal $E_3$. Then the strongest reflected sound contained in the residual signal $E_3$, reflected sound 2 (corresponding to $q = Q = 3$), is estimated. This describes the case $Q = 3$. In general, the operation of estimating the strongest reflected sound $q-1$ contained in the $q$-th residual signal $E_q$ and subtracting it from that residual (where the first residual signal is the observation signal and reflected sound 0 is the direct sound) is carried out successively up to $q = Q$, which yields $Q$ reflected sound information components $\vec{rs}_1(\omega,k), \ldots, \vec{rs}_Q(\omega,k)$.

The first component $\vec{rs}_1(\omega,k)$ corresponds to reflected sound 0 (the direct sound), the second component $\vec{rs}_2(\omega,k)$ to reflected sound 1, the third component $\vec{rs}_3(\omega,k)$ to reflected sound 2, and so on, with the $Q$-th component $\vec{rs}_Q(\omega,k)$ corresponding to reflected sound $Q-1$. A suitable $Q$ depends on the available computing power and on the application that uses the reflected sound information, but a value of about 30 is a good choice.

The sound pressure distributions in FIGS. 8 and 9 are drawn as high-resolution grayscale images, but displaying a distribution at that resolution would require an extremely large number of microphones, which is impractical. Even a practical two-dimensional matrix microphone array, for example 100 microphones arranged as a 10 × 10 array, yields only a coarse (low-resolution) sound pressure distribution (see FIG. 16). From a practical standpoint, therefore, the arrival amplitude and arrival direction of the reflected sounds must be estimated accurately in a situation where only a low-resolution sound pressure distribution is available. To improve spatial resolution, the present invention expresses plane waves arriving from arbitrary positions explicitly (formulation). To prevent a weak reflected sound from becoming impossible to estimate because of a strong one, each reflected sound that has already been estimated is removed from the observation signal before the next one is estimated (decomposition). The formulation is as described above for the template information, and the decomposition is as described in the outline of the method for estimating the reflected sound information $\vec{rs}(\omega,k)$.

The method for estimating the reflected sound information rs^→(ω,k) described above is now explained in detail. First, the symbols are defined. The q-th residual signal is E_q^→(ω,k)=[E_q1(ω,k),…,E_qM(ω,k)]^T, and the estimated q-th reflected sound (the direct sound when q=1) is A_q(ω,k,g(q,k))S_{g(q,k)}^→(ω). Here, g(q,k) is the index of the template that represents the q-th reflected sound in frame k most accurately. The coefficient A_q(ω,k,g(q,k)) of the reflected sound represents the difference between the template S_{g(q,k)}^→(ω) and the reflected sound caused by, for example, the phase of the sound source 200 itself, reflection at walls, and attenuation with distance. Expressed as a formula, the above method of subtracting the reflected sounds from the residual signal in ascending order of q is given by Equation (6), where 1≦q≦Q and E₁^→(ω,k)=X^→(ω,k).

At this time, the q-th reflected sound information component rs_q^→(ω,k)=[rsA_q(ω,k),rsB_q(ω,k)] is given by Equations (7) and (8). The direction information θ_{g(q,k)}^→(ω,k) is the direction information corresponding to the template S_{g(q,k)}^→(ω).

Next, a method for estimating the q-th reflected sound A_q(ω,k,g(q,k))S_{g(q,k)}^→(ω) is described. The q-th reflected sound is estimated as the one that minimizes the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the (q+1)-th residual signal E_{q+1}^→(ω,k). The symbol H denotes the conjugate transpose. Various estimation methods are possible; one of them is described here. Because the reflected sound consists of two elements, A_q(ω,k,g(q,k)) and S_{g(q,k)}^→(ω), both elements must be optimized. <Process 1> and <Process 2> below are performed for each q in ascending order of q.

<Process 1>
The symbol Λ denotes the set obtained by removing, from the full index set {1,…,p,…,P}, the indices already determined by Equation (10) described later. That is, Λ={1,…,p,…,P}-{g(1,k),…,g(q-1,k)}. When <Process 1> is performed for the first time, Λ={1,…,p,…,P}.
Assuming that the p-th template S_p^→(ω) is the optimum template for minimizing the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the residual signal E_{q+1}^→(ω,k), the corresponding coefficient A_q(ω,k,p) is obtained by Equation (9) based on the least squares method. Note that at this stage the subscript q on the left side of Equation (9) carries no meaning.
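The least-squares step can be written out explicitly. Equation (9) itself is not reproduced in this text, so the following is only the standard least-squares solution for a single complex coefficient, which is consistent with the description above; the cost symbol J_p is introduced here for illustration. The last line shows why choosing the template that leaves the smallest residual power amounts to choosing the template with the largest normalized correlation with the current residual.

```latex
\begin{aligned}
J_p(A) &= \bigl(\vec{E}_q - A\,\vec{S}_p\bigr)^{H}\bigl(\vec{E}_q - A\,\vec{S}_p\bigr),\\
A_q(\omega,k,p) &= \operatorname*{arg\,min}_{A} J_p(A)
  = \frac{\vec{S}_p(\omega)^{H}\,\vec{E}_q(\omega,k)}{\vec{S}_p(\omega)^{H}\,\vec{S}_p(\omega)},\\
\min_{A} J_p(A) &= \vec{E}_q^{H}\vec{E}_q
  - \frac{\bigl|\vec{S}_p^{H}\vec{E}_q\bigr|^{2}}{\vec{S}_p^{H}\vec{S}_p}.
\end{aligned}
```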

<Process 2>
Let |Λ| denote the number of elements (cardinality) of the set Λ. Using the |Λ| coefficients A_q(ω,k,p) (p∈Λ) obtained from Equation (9), the index g(q,k) of the template S_{g(q,k)}^→(ω) is obtained by Equation (10) as the index that minimizes the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the residual signal E_{q+1}^→(ω,k).

The coefficient A_q(ω,k,g(q,k)) of the q-th reflected sound is obtained by substituting g(q,k) from Equation (10) for p in Equation (9). Because the coefficients A_q(ω,k,p) are already computed while evaluating Equation (10), they may be stored at that time if sufficient memory is available. Once g(q,k) has been obtained from Equation (10), the coefficient corresponding to p=g(q,k) can then simply be read back from memory.

Through the above process, the Q reflected sound information components rs_q^→(ω,k)=[rsA_q(ω,k),rsB_q(ω,k)] (q=1,…,Q) are obtained.
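The decomposition described above (least-squares coefficient per candidate template, selection of the template that leaves the least residual power, then subtraction) can be sketched as follows. This is a minimal illustration for a single frequency bin and frame, not the patent's implementation: the function name `decompose_reflections`, the array layout, and the 0-based indices are assumptions, and the way Equation (10) combines information across frequencies to yield g(q,k) is not reproduced here.

```python
import numpy as np

def decompose_reflections(x, templates, num_reflections):
    """Greedy decomposition of one observation vector at a single (omega, k).

    x:          complex array of shape (M,), the observed signal X(omega, k)
    templates:  complex array of shape (P, M); row p is the template S_p(omega)
    Returns a list of (template_index, coefficient) pairs, one per reflection.
    Requires num_reflections <= P.
    """
    residual = np.asarray(x, dtype=complex).copy()    # E_1 = X
    remaining = list(range(templates.shape[0]))       # the index set Lambda
    result = []
    for _ in range(num_reflections):
        best = None
        for p in remaining:
            s = templates[p]
            # Process 1: least-squares coefficient for candidate template p
            a = np.vdot(s, residual) / np.vdot(s, s)
            # Power of the residual that would remain after subtracting a * s
            err = residual - a * s
            power = np.vdot(err, err).real
            if best is None or power < best[0]:
                best = (power, p, a)
        _, g, a_g = best                              # Process 2: minimizer
        residual = residual - a_g * templates[g]      # next residual
        remaining.remove(g)                           # g is excluded from Lambda
        result.append((g, a_g))
    return result
```

Storing the coefficient together with the candidate power, as in the `best` tuple, corresponds to the memory-for-recomputation trade mentioned in the text.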

The arrival time difference estimation unit 280 receives the reflected sound information rs^→(ω,k)=[rs₁^→(ω,k),…,rs_Q^→(ω,k)]^T as input (strictly speaking, the complex amplitudes {rsA₁(ω,k),…,rsA_Q(ω,k)} alone are sufficient). For each frame k, it outputs the arrival time difference information rsD^→(k)=[rsD₁(k),…,rsD_Q(k)]^T, which is the set of relative arrival time differences rsD_q(k) (1≦q≦Q) of the reflected sounds with respect to a reference reflected sound (step S6). The vector notation is used for the convenience of any application that uses the arrival time differences. Specifically, rsD_q(k) (1≦q≦Q) is obtained by performing ((Process 1)) and ((Process 2)) below for each q (1≦q≦Q). The q being processed is denoted qc.

((Process 1))
The index N of the reference reflected sound is determined from the Q estimated reflected sounds. When ((Process 1)) is performed for each q (1≦q≦Q), the arrival time difference of the qc-th reflected sound with respect to the reference is trivially zero if N=qc, so N is preferably selected from the difference set {1,…,Q}-{qc} of reflected sound indices. Various methods of determining the reference reflected sound are conceivable; for example, it is suitable to select a reflected sound with strong power among those corresponding to the difference set {1,…,Q}-{qc}. In doing so, the influence of frequency can be cancelled by summing the magnitudes of the arrival amplitudes over the frequency indices ω in the frequency band to be analyzed. In this case, N is determined by Equation (10a). Here, Ω denotes the set of frequency indices ω in the band to be analyzed; for speech signals, for example, Ω may be the set of indices corresponding to the 1.0 to 3.0 kHz band.

((Process 2))
The relative arrival time difference rsD_qc(k) is calculated from the phase difference between the arrival amplitude rsA_N(ω,k) of the N-th reflected sound and the arrival amplitude rsA_qc(ω,k) of the qc-th (qc≠N) reflected sound. In general, the arrival amplitude rsA_q(ω,k) is a complex amplitude and can therefore be written as rsA_q(ω,k)=λ_q(ω,k)exp[iωτ_q(ω,k)] using an amplitude λ_q(ω,k) and a time τ_q(ω,k). Accordingly, the phase difference of the q-th (q≠N) reflected sound with respect to the N-th reflected sound is given by the argument arg(rsA_q(ω,k)/rsA_N(ω,k)) of rsA_q(ω,k) relative to rsA_N(ω,k). Dividing this argument by the frequency index ω gives the time difference between the N-th and q-th reflected sounds, that is, τ_q(ω,k)-τ_N(ω,k)=arg(rsA_q(ω,k)/rsA_N(ω,k))/ω. A negative sign is introduced to distinguish whether the time τ_q(ω,k) is "early" or "late" relative to the time τ_N(ω,k). If -arg(rsA_q(ω,k)/rsA_N(ω,k))/ω<0, the time τ_q(ω,k) is early relative to τ_N(ω,k); if -arg(rsA_q(ω,k)/rsA_N(ω,k))/ω>0, it is late; and if -arg(rsA_q(ω,k)/rsA_N(ω,k))/ω=0, there is no time difference. The influence of frequency can be cancelled by summing -arg(rsA_q(ω,k)/rsA_N(ω,k))/ω over the frequency indices ω in the band to be analyzed. That is, the relative arrival time difference rsD_qc(k) is given by Equation (10b), where |Ω| denotes the number of elements (cardinality) of the set Ω.
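The calculation in ((Process 2)) can be sketched as below: the argument of the amplitude ratio is divided by frequency, negated, and averaged over the analysis band. This is a hedged illustration rather than Equation (10b) itself; the function name `relative_arrival_time` is hypothetical, and ω is assumed to be supplied as nonzero angular frequency in rad/s, whereas the text speaks of a frequency index. `numpy.angle` returns the principal value of the argument, so the result is only meaningful while the true phase difference stays within ±π at every analyzed frequency.

```python
import numpy as np

def relative_arrival_time(rsa_q, rsa_ref, omega):
    """Average over frequency of -arg(rsA_q / rsA_N) / omega.

    rsa_q, rsa_ref: complex arrays of shape (F,), arrival amplitudes of the
                    target and reference reflections for the F bins in Omega
    omega:          array of shape (F,), angular frequency of each bin in
                    rad/s (an assumption; must be nonzero)
    """
    ratio = np.asarray(rsa_q) / np.asarray(rsa_ref)
    phase = np.angle(ratio)                     # principal value in (-pi, pi]
    return float(np.mean(-phase / np.asarray(omega)))
```

For example, with amplitudes of the form λ·exp(iωτ), a target whose τ exceeds the reference's by 0.1 ms yields -0.0001 s under this sign convention.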

Although the index of the reference reflected sound is determined in ((Process 1)), it is sufficient to make this determination only once when the same reference reflected sound is used for every q. In that case, ((Process 1a)) is performed first, and then ((Process 2)) above is performed for each element qc of the difference set {1,…,Q}-{N}.

((Process 1a))
The index N of the reference reflected sound is determined from the Q estimated reflected sounds. Various methods of determining the reference reflected sound are conceivable; for example, it is suitable to select the reflected sound with strong power (the direct sound) among the Q reflected sounds. In doing so, the influence of frequency can be cancelled by summing the magnitudes of the arrival amplitudes over the frequency indices ω in the frequency band to be analyzed. In this case, N is determined by Equation (10c).
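One way to realize the selection of the reference reflected sound is to sum the magnitude of each arrival amplitude over the analysis band and take the largest. The sketch below is an assumption-laden illustration, not Equations (10a) or (10c) themselves: the function name `choose_reference` and the optional `exclude` argument are hypothetical, and indices are 0-based whereas the text numbers reflections from 1.

```python
import numpy as np

def choose_reference(rsa, exclude=None):
    """Index of the reflection with the largest summed amplitude magnitude.

    rsa:     complex array of shape (Q, F): Q reflections, F bins in Omega
    exclude: optional index (the target qc) that must not be chosen, as in
             ((Process 1)); None gives the common-reference case of
             ((Process 1a))
    """
    strength = np.abs(np.asarray(rsa)).sum(axis=1)   # sum over frequency bins
    if exclude is not None:
        strength[exclude] = -np.inf                  # remove qc from candidates
    return int(np.argmax(strength))
```

With a common reference, the function is called once per frame and the time difference is then evaluated for every other reflection.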

The sound source distance dis_q(ω,k) from the microphone array to the sound source or virtual sound source is the relative arrival time difference multiplied by a constant, and is therefore given by Equation (10d), where c is the speed of sound.

Several embodiments for obtaining the reflected sound information rs^→(ω,k)=[rs₁^→(ω,k),…,rs_Q^→(ω,k)]^T needed to determine the relative arrival time differences are described below. In every embodiment, the relative arrival time differences rsD_q(k) (1≦q≦Q) can be obtained by performing the processing of step S6 of the first embodiment.

<< Second Embodiment >>
In the first embodiment, the reflected sound information rs^→(ω,k) is obtained using the template information S^→(ω). However, it is not essential to prepare the template information S^→(ω), which is the set of P templates S_p^→(ω), in advance. An embodiment in which the template information S^→(ω) is not prepared in advance is described as the second embodiment.

In the second embodiment, steps S1 to S4 of the first embodiment are performed, but step Sp of the first embodiment is unnecessary, and step S5a is performed in place of step S5 of the first embodiment (see FIG. 13). Matters that are the same as in the first embodiment are therefore not described again, and only the differences are explained.

The symbols are defined as follows. The q-th residual signal is E_q^→(ω,k)=[E_q1(ω,k),…,E_qM(ω,k)]^T, and the q-th reflected sound (the direct sound when q=1) is A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)). The element R_q^→(ω,θ_q^→(ω,k))=[R₁(ω,θ_q^→(ω,k)),…,R_M(ω,θ_q^→(ω,k))]^T of the reflected sound is a function that simulates, for each frequency, the transfer characteristics between an arbitrary position [x,y,z] in space and each microphone (hereinafter, the transfer characteristic function). Any function that simulates the transfer characteristics to the microphones may be used. Normally, each transfer characteristic R_m(ω,θ_q^→(ω,k)) of the transfer characteristic function is calculated by the same formula as each template element S_pm(ω). In this case, the per-frequency transfer characteristic R_m(ω,θ_q^→(ω,k)) between the position [x,y,z] lying in the direction given by the direction information θ_q^→(ω,k) and the m-th sound receiving point [u_m,v_m,w_m] is expressed by Equation (11). The position [x,y,z] lying in the direction given by θ_q^→(ω,k) may be, for example, a position on a sphere sufficiently far from the origin of the coordinate system. The reason for placing [x,y,z] sufficiently far from the origin is as already described. More precisely, [x,y,z] is preferably any position in space at a distance from which the direct or reflected sound from the sound source or virtual sound source can be modeled as a plane wave in the local region where the microphone array is located.
Because the three-dimensional Cartesian coordinate system and the spherical coordinate system are mutually convertible (coordinate transformation), the right side of Equation (11) can also be expressed using the direction information θ_q^→(ω,k)=[θ_{q,pol}(ω,k),θ_{q,azi}(ω,k)] instead of the position [x,y,z], for example as in Equation (11a). Here, d is the microphone spacing, the microphone array is a two-dimensional array of Φ rows and Ξ columns (Φ×Ξ=M), and the m-th microphone is located at row φ and column ξ (1≦φ≦Φ, 1≦ξ≦Ξ).
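To make the idea of a direction-parameterized transfer characteristic concrete, the sketch below builds a far-field plane-wave vector for a planar array with uniform spacing d. It is not Equation (11) or (11a), which are not reproduced in this text. The array is assumed to lie in the x-y plane with the first microphone at the origin, the sign of the phase term is an assumed convention, and the function name `plane_wave_transfer` and the default speed of sound are hypothetical.

```python
import numpy as np

def plane_wave_transfer(omega, theta_pol, theta_azi, rows, cols, d, c=340.0):
    """Far-field transfer vector for a rows x cols planar array (assumed model).

    omega: angular frequency in rad/s; theta_pol, theta_azi: polar and
    azimuth angles in radians; d: microphone spacing in metres;
    c: speed of sound in m/s.
    Returns a complex vector of length M = rows * cols.
    """
    row_idx, col_idx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    # Microphone coordinates in the array plane (z = 0)
    mic_x = col_idx.ravel() * d
    mic_y = row_idx.ravel() * d
    # In-plane components of the unit vector pointing toward the source
    ux = np.sin(theta_pol) * np.cos(theta_azi)
    uy = np.sin(theta_pol) * np.sin(theta_azi)
    # Relative propagation delay at each microphone, turned into a phase term
    delay = (mic_x * ux + mic_y * uy) / c
    return np.exp(1j * omega * delay)
```

Under this model every element has unit magnitude, so amplitude and absolute phase are carried entirely by the coefficient A_q(ω,k).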

The coefficient A_q(ω,k) of the reflected sound represents the difference between the template R_q^→(ω,θ_q^→(ω,k)) and the reflected sound, such as the phase of the sound source 200 itself, reflection at walls, and attenuation with distance, and corresponds to the arrival amplitude. Expressed as a formula, the above method of subtracting the reflected sounds from the residual signal in ascending order of q is given by Equation (12), where 1≦q≦Q and E₁^→(ω,k)=X^→(ω,k).

Next, a method for optimizing the reflected sound A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) is described.
The q-th optimized reflected sound A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) is determined according to the criterion of minimizing the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the (q+1)-th residual signal E_{q+1}^→(ω,k) given by Equation (12). Specifically, noting that the transfer characteristic function R_q^→(ω,θ_q^→(ω,k)) is determined by the direction information θ_q^→(ω,k), the optimum values A_{q,opt}(ω,k) and θ_{q,opt}^→(ω,k) of the parameters A_q(ω,k) and θ_q^→(ω,k) that describe the q-th reflected sound are obtained by Equation (13). The symbol H denotes the conjugate transpose.

At this time, the q-th reflected sound information component rs_q^→(ω,k)=[rsA_q(ω,k),rsB_q(ω,k)] is given by Equations (14) and (15).

Various concrete methods of evaluating Equation (13) are conceivable; one example is given here. The direction information θ_q^→(ω,k) is determined by a direction-of-arrival estimation method such as the beamformer method. The beamformer method scans a directional beam over space and searches the resulting power spectrum for directions in which the power is large. Here, it is assumed that P arrival directions have been estimated by the beamformer method.

In practice, the power spectrum obtained by the beamformer method may not be sharply peaked with respect to the arrival direction. In such a case, arrival directions may be set at predetermined intervals within the range of directions whose spectral intensity is equal to or greater than a predetermined value. As a specific example, suppose that the power spectrum is at or above the predetermined intensity at a polar angle of 5° over azimuth angles from 10° to 20°. With a predetermined interval of 2°, the arrival directions are then (polar angle 5°, azimuth angle 10°), (polar angle 5°, azimuth angle 12°), (polar angle 5°, azimuth angle 14°), (polar angle 5°, azimuth angle 16°), (polar angle 5°, azimuth angle 18°), and (polar angle 5°, azimuth angle 20°).

Even if the power spectrum shows a sharp peak in a certain direction, the arrival directions may be set within a predetermined range around that direction instead of simply taking that direction as a single arrival direction. As a specific example, suppose that the power spectrum shows a sharp peak at a polar angle of 30° and an azimuth angle of 50°. With a predetermined range of polar angle ±4° and azimuth angle ±4° at 2° intervals, the following 25 directions, written as (polar angle, azimuth angle), are taken as arrival directions:
(26°, 46°), (28°, 46°), (30°, 46°), (32°, 46°), (34°, 46°), (26°, 48°), (28°, 48°), (30°, 48°), (32°, 48°), (34°, 48°), (26°, 50°), (28°, 50°), (30°, 50°), (32°, 50°), (34°, 50°), (26°, 52°), (28°, 52°), (30°, 52°), (32°, 52°), (34°, 52°), (26°, 54°), (28°, 54°), (30°, 54°), (32°, 54°), (34°, 54°). Note that P is a fixed value in the first embodiment, whereas in the second embodiment P depends on the result of the direction-of-arrival estimation method, such as the beamformer method.
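The two examples of candidate arrival directions can be generated mechanically. The short sketch below is only an illustration of the enumeration; the function name `candidate_directions` is hypothetical, and angles are integer degrees.

```python
import itertools

def candidate_directions(polar_values, azimuth_values):
    """All (polar, azimuth) pairs in degrees, with azimuth varying slowest."""
    return [(pol, azi) for azi, pol in itertools.product(azimuth_values, polar_values)]

# Broad lobe: polar angle 5 deg, azimuth 10 to 20 deg at 2 deg intervals
broad = candidate_directions([5], range(10, 21, 2))

# Sharp peak at (30, 50): +/-4 deg around the peak at 2 deg intervals
sharp = candidate_directions(range(26, 35, 2), range(46, 55, 2))
```

The number of candidates, and therefore P, follows from the width of the selected range and the interval, which is why P is not fixed in this embodiment.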

Templates are generated for the P arrival directions obtained by the beamformer method. Each template element is calculated by, for example, Equation (6). The processing described in the first embodiment may then be performed using these P templates (template information S^→(ω)).

<< Third Embodiment >>
Unlike the second embodiment, the third embodiment makes use of the template information to optimize the reflected sound A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)). An example of a concrete method of evaluating Equation (13) is shown. The optimization method described below is applied to each q in ascending order of q.

§1 Initial value setting of direction information
First, the initial value θ_{ini,q}^→(ω,k) of the direction information θ_q^→(ω,k) is determined using the template information S^→(ω). To do so, the template corresponding to the direction considered closest to the arrival direction to be estimated is determined, and the direction information corresponding to that template is used as the initial value θ_{ini,q}^→(ω,k).

To determine such a template from the template information, the reflected sound is written, for convenience, as A_q(ω,k,g(ω,q))S_{g(ω,q)}^→(ω). Here, g(ω,q) is the index of the template that, within the template information, represents the q-th reflected sound most accurately. The coefficient A_q(ω,k,g(ω,q)) of the reflected sound represents the difference between the template S_{g(ω,q)}^→(ω) and the reflected sound caused by, for example, the phase of the sound source 200 itself, reflection at walls, and attenuation with distance. In this case, the (q+1)-th residual signal E_{q+1}^→(ω,k) is expressed by Equation (16), where E₁^→(ω,k)=X^→(ω,k).

The reflected sound A_q(ω,k,g(ω,q))S_{g(ω,q)}^→(ω) is estimated according to the criterion of minimizing the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the (q+1)-th residual signal E_{q+1}^→(ω,k) based on Equation (16). Various estimation methods are possible; one of them is described here. Because the reflected sound consists of two elements, A_q(ω,k,g(ω,q)) and S_{g(ω,q)}^→(ω), both elements must be optimized. =Process 1= and =Process 2= below are performed for each q in ascending order of q.

=Process 1=
The symbol Λ denotes the set obtained by removing, from the full index set {1,…,p,…,P}, the indices already determined by Equation (18) described later. That is, Λ={1,…,p,…,P}-{g(ω,1),…,g(ω,q-1)}. When =Process 1= is performed for the first time, Λ={1,…,p,…,P}.
Assuming that the p-th template S_p^→(ω) is the optimum template for minimizing the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the residual signal E_{q+1}^→(ω,k), the corresponding coefficient A_q(ω,k,p) is obtained by Equation (17) based on the least squares method. Note that at this stage the subscript q on the left side of Equation (17) carries no meaning.

=Process 2=
Let |Λ| denote the number of elements (cardinality) of the set Λ. Using the |Λ| coefficients A_q(ω,k,p) (p∈Λ) obtained from Equation (17), the index g(ω,q) of the template S_{g(ω,q)}^→(ω) is obtained by Equation (18) as the index that minimizes the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the residual signal E_{q+1}^→(ω,k).

Accordingly, the initial value θ_{ini,q}^→(ω,k) of the direction information θ_q^→(ω,k) is given as the direction information θ_{g(ω,q)}^→(ω)=[θ_{g(ω,q),pol}(ω),θ_{g(ω,q),azi}(ω)] corresponding to the template S_{g(ω,q)}^→(ω) whose index g(ω,q) is obtained by Equation (18). That is, θ_{ini,q}^→(ω,k)=[θ_{g(ω,q),pol}(ω),θ_{g(ω,q),azi}(ω)]. Note that the initial value θ_{ini,q}^→(ω,k) does not depend on the frame index k.

§2 Optimization of reflected sound
Next, starting from the initial value θ_{ini,q}^→(ω,k) of the direction information θ_q^→(ω,k), the reflected sound A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) is optimized so as to minimize the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the (q+1)-th residual signal E_{q+1}^→(ω,k) given by Equation (12). Because the reflected sound consists of two elements, the coefficient A_q(ω,k) and R_q^→(ω,θ_q^→(ω,k)), both elements must be optimized. Various optimization methods exist; one of them (a gradient method) is described here. In the illustrated method, the reflected sound A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) is optimized by alternately repeating the correction of the direction information θ_q^→(ω,k) and the correction of the coefficient A_q(ω,k) a predetermined number of times (δ times). δ is set to, for example, about 50, but may be 1.

§2.1 Correction of direction information
The direction information θ_q^→(ω,k)=[θ_{q,pol}(ω,k),θ_{q,azi}(ω,k)] is corrected by the update of Equation (19). When §2.1 is performed for the first time, the direction information θ_q^→(ω,k) on the right side of Equation (19) is the initial value θ_{ini,q}^→(ω,k) obtained in §1; otherwise, it is the direction information obtained in the immediately preceding execution of §2.1. Likewise, when §2.1 is performed for the first time, the coefficient A_q(ω,k) used in calculating the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) is the A_q(ω,k,p) obtained by Equation (17); otherwise, it is the coefficient A_q(ω,k) obtained in the immediately preceding execution of §2.2 (described below). The step sizes α₁ and α₂ are small positive constants determined in consideration of the convergence speed and the like; for example, each is set to about 0.1.

§2.2 Correction of coefficient
The coefficient A_q(ω,k) is corrected by computing a new coefficient A_q(ω,k) according to Equation (20), based on the least-squares method. R_q^→(ω,θ_q^→(ω,k)) used in Equation (20) is obtained from Equation (6) and the direction information θ_q^→(ω,k) obtained in §2.1.
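The least-squares step for a single complex coefficient can be illustrated as follows. Equation (20) is not reproduced in this text, so the sketch assumes the textbook closed form A = (R^H E) / (R^H R), which minimizes the power of E − A·R for a fixed vector R. The function names and the list-of-complex representation are illustrative choices, not taken from the patent.

```python
def inner(u, v):
    """Hermitian inner product u^H v for vectors given as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def ls_coefficient(r, e):
    """Complex scalar a that minimizes ||e - a*r||^2 (assumed form of Eq. (20))."""
    return inner(r, e) / inner(r, r)


def residual_power(r, e, a):
    """Power (e - a*r)^H (e - a*r) of the residual after removing a*r."""
    d = [ei - a * ri for ei, ri in zip(e, r)]
    return inner(d, d).real


# Toy data: the signal is exactly (2 - 1j) times the transfer vector.
r = [1 + 0j, 1j, -1 + 0j]
e = [(2 - 1j) * ri for ri in r]
a = ls_coefficient(r, e)
```

When the signal is an exact multiple of the transfer vector, as in the toy data, the residual power drops to zero.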

The coefficient A_q(ω,k) and the direction information θ_q^→(ω,k) obtained when the δ iterations are complete are A_{q,opt}(ω,k) and θ_{q,opt}^→(ω,k), and they form the q-th reflected sound information component rs_q^→(ω,k). That is, the q-th reflected sound information component rs_q^→(ω,k) = [rsA_q(ω,k), rsB_q(ω,k)] is given by Equations (21) and (22).

Through the above process, Q reflected sound information components rs_q^→(ω,k) = [rsA_q(ω,k), rsB_q(ω,k)] (q = 1, …, Q) are obtained. When δ = 1 is set, the coefficient correction can be skipped so that only the direction of arrival is obtained as reflected sound information.
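The alternation between direction correction and coefficient correction can be sketched as below. Several parts are assumptions made only for illustration: `steering` is a hypothetical stand-in for Equation (6) (a far-field plane wave on a line array with a single angle instead of polar and azimuth angles), the gradient is taken numerically rather than from Equation (19), and the step is halved whenever it would increase the residual power, a safeguard the patent text does not describe.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value


def steering(omega, theta, n_mics=8, spacing=0.04):
    """Hypothetical transfer function: plane wave arriving at angle theta on a line array."""
    return [cmath.exp(-1j * omega * m * spacing * math.cos(theta) / SPEED_OF_SOUND)
            for m in range(n_mics)]


def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))


def ls_coeff(r, e):
    return inner(r, e) / inner(r, r)


def residual_power(e, a, r):
    d = [ei - a * ri for ei, ri in zip(e, r)]
    return inner(d, d).real


def refine(e, omega, theta_init, delta=50, alpha=0.1, h=1e-6):
    """Alternate direction correction and coefficient correction delta times."""
    theta = theta_init
    a = ls_coeff(steering(omega, theta), e)
    for _ in range(delta):
        # Direction correction: gradient step on theta with the coefficient held fixed.
        p0 = residual_power(e, a, steering(omega, theta))
        grad = (residual_power(e, a, steering(omega, theta + h)) - p0) / h
        step = alpha
        while step > 1e-12:
            candidate = theta - step * grad
            if residual_power(e, a, steering(omega, candidate)) <= p0:
                theta = candidate
                break
            step *= 0.5  # illustrative safeguard, not part of the source
        # Coefficient correction: least squares with the direction held fixed.
        a = ls_coeff(steering(omega, theta), e)
    return a, theta
```

Because each half-step either lowers the residual power or leaves it unchanged, the power after `refine` is never larger than the power at the initial direction.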

<<Fourth Embodiment>>
Next, unlike the first, second, and third embodiments, a method of optimizing the Q reflected sounds A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) collectively is described.

The Q optimized reflected sounds A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) are determined according to the criterion of minimizing the power (E^→(ω,k))^H E^→(ω,k) of the residual signal E^→(ω,k) obtained by removing the Q reflected sounds from the observed signal (see Equation (23a)). Specifically, noting that the transfer characteristic function R_q^→(ω,θ_q^→(ω,k)) is determined by the direction information θ_q^→(ω,k), the optimum values A_{q,opt}(ω,k), θ_{q,opt}^→(ω,k) (1 ≦ q ≦ Q) of the parameters A_q(ω,k), θ_q^→(ω,k) (1 ≦ q ≦ Q) that represent the Q reflected sounds A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) (1 ≦ q ≦ Q) are obtained by Equation (23b). The symbol H denotes the conjugate transpose. In Equation (23b), {(A_{q,opt}(ω,k), θ_{q,opt}^→(ω,k))}_{q∈{1,…,Q}} denotes {(A_{1,opt}(ω,k), θ_{1,opt}^→(ω,k)), …, (A_{q,opt}(ω,k), θ_{q,opt}^→(ω,k)), …, (A_{Q,opt}(ω,k), θ_{Q,opt}^→(ω,k))}, and {(A_q(ω,k), θ_q^→(ω,k))}_{q∈{1,…,Q}} denotes {(A_1(ω,k), θ_1^→(ω,k)), …, (A_q(ω,k), θ_q^→(ω,k)), …, (A_Q(ω,k), θ_Q^→(ω,k))}.
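The equation images for (23a) and (23b) are not reproduced in this text. Based only on the prose description (a residual formed by removing Q reflected sounds from the observed signal, and a joint minimization of its power), they presumably take a form like the following; the exact typography of the original may differ.

```latex
% Presumed form of Eq. (23a): residual after removing all Q reflected sounds
\vec{E}(\omega,k) = \vec{X}(\omega,k)
  - \sum_{q=1}^{Q} A_q(\omega,k)\,\vec{R}_q\bigl(\omega,\vec{\theta}_q(\omega,k)\bigr)

% Presumed form of Eq. (23b): joint minimization of the residual power
\bigl\{\bigl(A_{q,\mathrm{opt}}(\omega,k),\,\vec{\theta}_{q,\mathrm{opt}}(\omega,k)\bigr)\bigr\}_{q\in\{1,\dots,Q\}}
  = \operatorname*{arg\,min}_{\{(A_q(\omega,k),\,\vec{\theta}_q(\omega,k))\}_{q\in\{1,\dots,Q\}}}
    \vec{E}(\omega,k)^{H}\,\vec{E}(\omega,k)
```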

In this case, the q-th reflected sound information component rs_q^→(ω,k) = [rsA_q(ω,k), rsB_q(ω,k)] is given by Equations (24) and (25).

Equation (23b) can be computed in various ways; one example is shown here.

§1.1 Initial value setting of direction information
First, the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k) are set. This embodiment describes a beam-search determination method, which determines these Q initial values from the observed signal X^→(ω,k) and the template information S^→(ω). In this method, Q templates are determined that correspond to the Q directions considered closest to the Q directions of arrival to be estimated, and the direction information corresponding to those Q templates is used as the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k). In this case, P and Q satisfy Q < P.

To determine such templates from the template information, the q-th reflected sound is written, for convenience, as A_q(ω,k,g(ω,q))S_{g(ω,q)}^→(ω). Here, g(ω,q) is the index of the template that represents the q-th reflected sound most accurately among the template information. The coefficient A_q(ω,k,g(ω,q)) of the reflected sound expresses the difference between the template S_{g(ω,q)}^→(ω) and the reflected sound that arises from the phase of the sound source 200 itself, reflection at walls, attenuation with distance, and so on. In this case, the residual signal E_q^→(ω,k) obtained by removing the q-th reflected sound A_q(ω,k,g(ω,q))S_{g(ω,q)}^→(ω) from the observed signal X^→(ω,k) is expressed by Equation (26).

The q-th reflected sound A_q(ω,k,g(ω,q))S_{g(ω,q)}^→(ω) is estimated according to the criterion of minimizing the power (E_q^→(ω,k))^H E_q^→(ω,k) of the residual signal E_q^→(ω,k) based on Equation (26). Various estimation methods exist; one of them is described here.

Because the reflected sound consists of two elements, A_q(ω,k,g(ω,q)) and S_{g(ω,q)}^→(ω), both must be optimized. First, the coefficient A_q(ω,k,p) under the assumption that the p-th template S_p^→(ω) is the optimal template for minimizing the power (E_q^→(ω,k))^H E_q^→(ω,k) of the residual signal E_q^→(ω,k) is obtained from Equation (27), based on the least-squares method. Note that at this stage the q on the left side of Equation (27) carries no meaning.

Next, from the P coefficients A_q(ω,k,p) (1 ≦ p ≦ P) obtained from Equation (27), Q coefficients A_q(ω,k,p) are taken in descending order of absolute value, and the index q (1 ≦ q ≦ Q) is assigned in that order (see Equation (28)). The symbol Λ denotes the set obtained by removing the indices already determined by Equation (28) from the full index set {1,…,p,…,P}; that is, Λ = {1,…,p,…,P} − {g(ω,1),…,g(ω,q−1)}.
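The beam-search selection can be sketched as follows, assuming Equation (27) has the usual least-squares form A(p) = S_p^H X / (S_p^H S_p). Equations (27) and (28) are not reproduced in this text, so the function name `beam_search_init` and the toy templates are illustrative only.

```python
def inner(u, v):
    """Hermitian inner product u^H v for vectors given as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def beam_search_init(x, templates, q_count):
    """Return the indices of the q_count templates with the largest |A(p)|, largest first."""
    # One least-squares coefficient per template, each computed against the observation x.
    coeffs = [inner(s, x) / inner(s, s) for s in templates]
    order = sorted(range(len(templates)), key=lambda p: abs(coeffs[p]), reverse=True)
    return order[:q_count], coeffs


# Toy data: three orthogonal templates (P = 3) and Q = 2.
templates = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
x = [0.5, 2j, -1]
g, coeffs = beam_search_init(x, templates, 2)
```

Here `g[0]` plays the role of g(ω,1) and `g[1]` the role of g(ω,2); the direction information attached to those templates would then serve as the initial values.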

Accordingly, the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k) are given as the direction information θ_{g(ω,q)}^→(ω) = [θ_{g(ω,q),pol}(ω), θ_{g(ω,q),azi}(ω)] (1 ≦ q ≦ Q) corresponding to the templates S_{g(ω,q)}^→(ω) (1 ≦ q ≦ Q) whose indices are the Q values g(ω,q) (1 ≦ q ≦ Q) obtained from Equation (28). That is, θ_{ini,q}^→(ω,k) = [θ_{g(ω,q),pol}(ω), θ_{g(ω,q),azi}(ω)] (1 ≦ q ≦ Q). Note that the initial value θ_{ini,q}^→(ω,k) does not depend on the frame index k.

§1.2 Initial value setting of coefficient A_q(ω,k)
Next, the initial values A_{ini,q}(ω,k) of the Q coefficients A_q(ω,k) are set. Various methods can be used to determine the Q initial values A_{ini,q}(ω,k) (1 ≦ q ≦ Q); as one example, a method based on a power-minimization criterion is described here. First, A_q(ω,k,p) = 0 (1 ≦ p ≦ P) is set. The initial values A_{ini,q}(ω,k) (1 ≦ q ≦ Q) are then obtained from Equation (29), based on the least-squares method, so as to minimize the power (E^→(ω,k))^H E^→(ω,k) of the residual signal E^→(ω,k). In Equation (29), F_q^→(ω,k) is given by Equation (30). In Equation (30), Υ = {1,…,q−1,q+1,…,Q}, and F_q^→(ω,k) is the residual signal obtained by removing the q-th reflected sound from the observed signal. The Q initial values θ_{ini,q}^→(ω,k) determined in §1.1 are used as the Q pieces of direction information θ_q^→(ω,k). R_q^→(ω,θ_q^→(ω,k)) used in Equation (29) is obtained from Equation (6) and the initial value θ_{ini,q}^→(ω,k) of the direction information.
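A minimal sketch of this coefficient initialization follows. Equations (29) and (30) are not reproduced in this text, so the sketch assumes that F_q is the observation minus the reflected sounds indexed by Υ (all reflected sounds except the q-th), which is the reading under which Equation (29) becomes a least-squares fit for A_q. It also assumes the coefficients are filled in sequentially in order of q, starting from zero; the source does not state the update order.

```python
def inner(u, v):
    """Hermitian inner product u^H v for vectors given as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def initial_coefficients(x, r_list):
    """Sequential least-squares initialization of the Q coefficients (assumed order)."""
    q_count = len(r_list)
    a = [0j] * q_count  # all coefficients start at zero
    for q in range(q_count):
        # F_q: observation minus every reflected sound except the q-th.
        f_q = list(x)
        for j in range(q_count):
            if j != q:
                f_q = [f - a[j] * rj for f, rj in zip(f_q, r_list[j])]
        a[q] = inner(r_list[q], f_q) / inner(r_list[q], r_list[q])
    return a


# Toy data: two orthogonal transfer vectors mixed with coefficients 2 and 1j.
r_list = [[1, 1, 0, 0], [0, 0, 1, -1]]
x = [2, 2, 1j, -1j]
a_init = initial_coefficients(x, r_list)
```

With orthogonal transfer vectors, as in the toy data, one pass recovers the mixing coefficients exactly; with correlated vectors the values are only starting points for the iteration of §2.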

§2 Optimization of reflected sound
Next, starting from the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k), the Q reflected sounds A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) (1 ≦ q ≦ Q) are optimized collectively so as to minimize the power (E^→(ω,k))^H E^→(ω,k) of the residual signal E^→(ω,k) given by Equation (23a). Because each reflected sound consists of two elements, the coefficient A_q(ω,k) and R_q^→(ω,θ_q^→(ω,k)), both must be optimized. Various optimization methods exist; one of them, a gradient method, is described here. In the illustrated method, the Q reflected sounds A_q(ω,k)R_q^→(ω,θ_q^→(ω,k)) (1 ≦ q ≦ Q) are optimized by alternately repeating the correction of the direction information θ_q^→(ω,k) and the correction of the coefficient A_q(ω,k) a predetermined number of times (δ times). δ is set to, for example, about 100, but may be 1.

§2.1 Correction of direction information
The Q pieces of direction information θ_q^→(ω,k) = [θ_{q,pol}(ω,k), θ_{q,azi}(ω,k)] (1 ≦ q ≦ Q) are corrected by the update of Equation (31). For each q (1 ≦ q ≦ Q), when the process of §2.1 is performed for the first time, the direction information θ_q^→(ω,k) on the right side of Equation (31) is the initial value θ_{ini,q}^→(ω,k) obtained in §1.1; otherwise, it is the direction information obtained in the immediately preceding pass of §2.1. Likewise, when §2.1 is performed for the first time, A_{ini,q}(ω,k) (1 ≦ q ≦ Q) obtained from Equation (29) is used as the coefficient A_q(ω,k) (q ∈ Υ) for calculating the power (F_q^→(ω,k))^H F_q^→(ω,k); otherwise, the coefficient A_q(ω,k) (1 ≦ q ≦ Q) obtained in the immediately preceding pass of §2.2 (described below) is used. The step sizes α₁ and α₂ are small positive constants chosen with the convergence speed and similar factors in mind; each is set to, for example, about 0.1.

§2.2 Correction of coefficient
The Q coefficients A_q(ω,k) (1 ≦ q ≦ Q) are corrected by computing new coefficients A_q(ω,k) (1 ≦ q ≦ Q) according to Equation (32), based on the least-squares method. R_q^→(ω,θ_q^→(ω,k)) used in Equation (32) is obtained from Equation (6) and the direction information θ_q^→(ω,k) obtained in §2.1. As the coefficient A_q(ω,k) (q ∈ Υ) used to calculate F_q^→(ω,k) in Equation (32), the initial value A_{ini,q}(ω,k) obtained in §1.2 is used when §2.2 is performed for the first time; otherwise, the coefficient A_q(ω,k) (1 ≦ q ≦ Q) obtained in the immediately preceding pass of §2.2 is used.

The Q pairs (A_q(ω,k), θ_q^→(ω,k)) (1 ≦ q ≦ Q) of coefficient A_q(ω,k) and direction information θ_q^→(ω,k) obtained when the δ iterations are complete are {(A_{q,opt}(ω,k), θ_{q,opt}^→(ω,k))}_{q∈{1,…,Q}}, and they form the Q reflected sound information components rs_q^→(ω,k) (1 ≦ q ≦ Q). That is, the q-th reflected sound information component rs_q^→(ω,k) = [rsA_q(ω,k), rsB_q(ω,k)] is given by Equations (33) and (34).

<<Fifth Embodiment>>
The fifth embodiment differs from the fourth embodiment in "§1.1 Initial value setting of direction information". Matters identical to the fourth embodiment are therefore not described again, and only the differences are described.

§1.1 Initial value setting of direction information
First, the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k) are set. This embodiment describes a generalized-harmonic-analysis determination method, which determines these Q initial values from the observed signal X^→(ω,k) and the template information S^→(ω). In this method, Q templates are determined that correspond to the Q directions considered closest to the Q directions of arrival to be estimated, and the direction information corresponding to those Q templates is used as the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k). In this case, P and Q satisfy Q < P.

To determine such templates from the template information, the q-th reflected sound is written, for convenience, as A_q(ω,k,g(ω,q))S_{g(ω,q)}^→(ω). Here, g(ω,q) is the index of the template that represents the q-th reflected sound most accurately among the template information. The coefficient A_q(ω,k,g(ω,q)) of the reflected sound expresses the difference between the template S_{g(ω,q)}^→(ω) and the reflected sound that arises from the phase of the sound source 200 itself, reflection at walls, attenuation with distance, and so on. In this case, the residual signal E_{q+1}^→(ω,k) obtained by removing the q reflected sounds from the first through the q-th from the observed signal is expressed by Equation (35), where 1 ≦ q ≦ Q and E_1^→(ω,k) = X^→(ω,k).

The q-th reflected sound A_q(ω,k,g(ω,q))S_{g(ω,q)}^→(ω) is estimated according to the criterion of minimizing the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the residual signal E_{q+1}^→(ω,k) based on Equation (35). Various estimation methods exist; one of them is described here. Because the reflected sound consists of two elements, A_q(ω,k,g(ω,q)) and S_{g(ω,q)}^→(ω), both must be optimized. <Process 1> and <Process 2> below are performed for each q in ascending order of q.

<Process 1>
The symbol Λ denotes the set obtained by removing the indices determined by Equation (37), described below, from the full index set {1,…,p,…,P}. That is, Λ = {1,…,p,…,P} − {g(ω,1),…,g(ω,q−1)}. When <Process 1> is performed for the first time, Λ = {1,…,p,…,P}.
The coefficient A_q(ω,k,p) under the assumption that the p-th (p ∈ Λ) template S_p^→(ω) is the optimal template for minimizing the power (E_{q+1}^→(ω,k))^H E_{q+1}^→(ω,k) of the residual signal E_{q+1}^→(ω,k) based on Equation (35) is obtained from Equation (36), based on the least-squares method. Note that at this stage the q on the left side of Equation (36) carries no meaning.

<Process 2>
Let |Λ| denote the number of elements (cardinality) of the set Λ. From the |Λ| coefficients A_q(ω,k,p) (p ∈ Λ) obtained from Equation (36), the index q (1 ≦ q ≦ Q) of the coefficient A_q(ω,k,p) with the largest absolute value is determined (see Equation (37)).
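<Process 1> and <Process 2> together amount to a greedy selection: fit every remaining template to the current residual, keep the one with the largest coefficient magnitude, subtract it, and repeat. A minimal sketch follows, assuming Equation (36) has the least-squares form A(p) = S_p^H E_q / (S_p^H S_p); the equations themselves are not reproduced in this text, and the function name `greedy_init` is illustrative.

```python
def inner(u, v):
    """Hermitian inner product u^H v for vectors given as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def greedy_init(x, templates, q_count):
    """Pick q_count templates one at a time, updating the residual after each pick."""
    residual = list(x)                       # E_1 = X
    available = set(range(len(templates)))   # the set called Lambda in the text
    chosen = []
    for _ in range(q_count):
        # Process 1: least-squares coefficient of each remaining template.
        coeffs = {p: inner(templates[p], residual) / inner(templates[p], templates[p])
                  for p in available}
        # Process 2: keep the template whose coefficient has the largest magnitude.
        g = max(coeffs, key=lambda p: abs(coeffs[p]))
        chosen.append((g, coeffs[g]))
        residual = [e - coeffs[g] * s for e, s in zip(residual, templates[g])]
        available.remove(g)
    return chosen, residual


# Toy data: template 1 overlaps template 0, template 2 is orthogonal to both.
templates = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
x = [3, 0, 1]
chosen, residual = greedy_init(x, templates, 2)
picked = [g for g, _ in chosen]
```

In the toy data a one-shot ranking of the coefficients (3, 1.5, 1) would pick templates 0 and 1, whereas the greedy procedure picks 0 and 2, because template 1 no longer explains anything once template 0 has been removed.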

Accordingly, the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k) are given as the direction information θ_{g(ω,q)}^→(ω) = [θ_{g(ω,q),pol}(ω), θ_{g(ω,q),azi}(ω)] (1 ≦ q ≦ Q) corresponding to the templates S_{g(ω,q)}^→(ω) (1 ≦ q ≦ Q) whose indices are the Q values g(ω,q) (1 ≦ q ≦ Q) obtained from Equation (37). That is, θ_{ini,q}^→(ω,k) = [θ_{g(ω,q),pol}(ω), θ_{g(ω,q),azi}(ω)] (1 ≦ q ≦ Q). Note that the initial value θ_{ini,q}^→(ω,k) does not depend on the frame index k.

In the fifth embodiment, the generalized-harmonic-analysis determination method is used to determine the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k). Compared with the beam-search determination method, this method requires more computation for the initial values, but it is more likely to yield initial values close to the reflected sounds to be estimated (the correct answer). In that case, not only can the estimation accuracy be expected to improve, but the number of iterations can also be reduced.

<<Sixth Embodiment>>
The sixth embodiment differs from the fourth embodiment in "§1.1 Initial value setting of direction information". Matters identical to the fourth embodiment are therefore not described again, and only the differences are described. The concept of the sixth embodiment is to prevent the above-mentioned bias in the initial values while remaining simpler than the fourth embodiment.

§1.1 Initial value setting of direction information
First, for one of θ_pol(ω) and θ_azi(ω), which constitute the direction information, a plurality of directions are determined so that they are free of bias. In this example, β (β ≧ 2) directions {θ_{1,pol}(ω),…,θ_{β,pol}(ω)} are determined so that the polar angle θ_pol(ω) is free of bias. Since the polar angle normally satisfies 0° ≦ θ_pol(ω) ≦ 180°, choosing directions at equal intervals of 10°, for example, gives {θ_{1,pol}(ω),…,θ_{β,pol}(ω)} = {0,10,20,…,180} (β = 19). Let Ψ be the set of indices of those templates in the template information whose corresponding direction information (position) has one of the elements of {θ_{1,pol}(ω),…,θ_{β,pol}(ω)} as its polar angle θ_pol(ω). The number of elements (cardinality) |Ψ| of the set Ψ preferably satisfies Q ≦ |Ψ| < P. The set Ψ is a proper subset of the set of direction information (positions) corresponding to the templates in the template information.
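Building the index set Ψ can be sketched as follows. The representation of template directions as (polar, azimuth) pairs in degrees, the tolerance `tol`, and the function names are assumptions made for illustration.

```python
def polar_grid(step_deg=10):
    """Evenly spaced polar angles from 0 to 180 degrees inclusive."""
    return list(range(0, 181, step_deg))


def select_psi(directions, grid, tol=1e-9):
    """Indices of templates whose polar angle matches one of the grid angles."""
    return [p for p, (pol, _azi) in enumerate(directions)
            if any(abs(pol - g) <= tol for g in grid)]


# Toy data: (polar, azimuth) in degrees for five hypothetical templates.
directions = [(0, 0), (5, 0), (10, 90), (95, 180), (180, 0)]
grid = polar_grid()
psi = select_psi(directions, grid)
```

With a 10° step the grid has 19 angles, matching β = 19 in the text; only the templates indexed by `psi` take part in the subsequent coefficient ranking.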

Then, the coefficient A_q(ω,k,p) under the assumption that the p-th template S_p^→(ω) is the optimal template for minimizing the power (E_q^→(ω,k))^H E_q^→(ω,k) of the residual signal E_q^→(ω,k) is obtained from Equation (38a), based on the least-squares method. Here, the template index p used on the right side of Equation (38a) satisfies p ∈ Ψ. Note that at this stage the q on the left side of Equation (38a) carries no meaning.

Next, from the |Ψ| coefficients A_q(ω,k,p) (1 ≦ p ≦ |Ψ|) obtained from Equation (38a) under the condition p ∈ Ψ, Q coefficients A_q(ω,k,p) are taken in descending order of absolute value, and the index q (1 ≦ q ≦ Q) is assigned in that order (see Equation (38b)). The symbol Γ denotes the set obtained by removing the indices already determined by Equation (38b) from the set Ψ; that is, Γ = Ψ − {g(ω,1),…,g(ω,q−1)}.

Accordingly, the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k) are given as the direction information θ_{g(ω,q)}^→(ω) = [θ_{g(ω,q),pol}(ω), θ_{g(ω,q),azi}(ω)] (1 ≦ q ≦ Q) corresponding to the templates S_{g(ω,q)}^→(ω) (1 ≦ q ≦ Q) whose indices are the Q values g(ω,q) (1 ≦ q ≦ Q) obtained from Equation (38). That is, θ_{ini,q}^→(ω,k) = [θ_{g(ω,q),pol}(ω), θ_{g(ω,q),azi}(ω)] (1 ≦ q ≦ Q). Note that the initial value θ_{ini,q}^→(ω,k) does not depend on the frame index k.

Besides these embodiments, an embodiment in which the initial values θ_{ini,q}^→(ω,k) (1 ≦ q ≦ Q) of the Q pieces of direction information θ_q^→(ω,k) are set randomly is also permissible.

<Modification>
In the embodiments described above, the reflected sound information rs^→(ω,k) is estimated from the observed signal X^→(ω,k) for each frequency. When the reflected sound information is estimated per frequency, however, it can include information on directions other than the direction of the virtual sound source that should be uniquely estimated (the estimated arrival direction), and this can introduce errors into the reflected sound information. For example, it is desirable to extract only the information on the estimated arrival direction, as shown in FIG. 14(a), but in practice information on other directions can be mixed in, as shown in FIG. 14(b).

Therefore, in the modification, the estimation error of the reflected sound information is reduced by calculating the power collectively over all frequencies. That is, as shown in FIG. 15, aggregating the power of the residual signal over all frequencies reduces the influence of directions other than the estimated arrival direction as far as possible. In directions other than the estimated arrival direction, the power generally varies from frequency to frequency, so aggregating the residual power over all frequencies lowers the influence of the power in those directions relative to the power in the estimated arrival direction. Note that in FIG. 15 the vertical axis shows relative power, so the graphs do not share the same scale.

The processing in this modification is as follows. Let Ω be the set of indices ω of the frequencies contained in the frequency band to be analyzed. For a speech signal, for example, Ω may be the set of indices corresponding to the 1.0 to 3.0 kHz band. The index g(q,k) of the template S_{g(q,k)}^→(ω) is then obtained by Equation (39) instead of Equation (10).

The index g(ω,q) of the template S_{g(ω,q)}^→(ω) is obtained by Equation (40) instead of Equation (18). The direction information θ_q^→(ω,k) = [θ_{q,pol}(ω,k), θ_{q,azi}(ω,k)] is corrected by the update of Equation (41) instead of Equation (19).

The index g(ω,q) of the template S_{g(ω,q)}^→(ω) is obtained by Equation (42) instead of Equation (28) or Equation (37). Similarly, Equation (43) is used instead of Equation (38). The direction information θ_q^→(ω,k) = [θ_{q,pol}(ω,k), θ_{q,azi}(ω,k)] is corrected by the update of Equation (44) instead of Equation (31).
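The idea of aggregating the residual power over the band Ω before choosing a template can be sketched as below. Equations (39) through (44) are not reproduced in this text, so the selection rule used here (sum the per-frequency least-squares residual power and take the template with the smallest total) is an assumed reading of the prose, and the data layout `templates[p][w]` is illustrative.

```python
def inner(u, v):
    """Hermitian inner product u^H v for vectors given as Python lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def ls_residual_power(s, e):
    """Residual power left after the best least-squares fit of template s to signal e."""
    a = inner(s, e) / inner(s, s)
    d = [ei - a * si for ei, si in zip(e, s)]
    return inner(d, d).real


def select_template_broadband(signals, templates):
    """signals[w]: vector at frequency index w; templates[p][w]: template p at index w."""
    totals = [sum(ls_residual_power(tp[w], signals[w]) for w in range(len(signals)))
              for tp in templates]
    best = min(range(len(templates)), key=lambda p: totals[p])
    return best, totals


# Toy data: two frequency bins, two microphones, two candidate templates.
templates = [
    [[1, 1], [1, -1]],  # template 0 at bins 0 and 1
    [[1, 1], [1, 1]],   # template 1 at bins 0 and 1
]
signals = [[2, 2], [1j, -1j]]
best, totals = select_template_broadband(signals, templates)
```

At bin 0 alone the two templates are indistinguishable (both leave zero residual), which mirrors the per-frequency ambiguity described above; summing over both bins separates them.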

Experimental results of estimating the relative arrival time differences using the present invention are shown. An environment was simulated in which 100 microphones were arranged two-dimensionally at equal intervals in 10 rows and 10 columns and placed near a wall of a rectangular-parallelepiped room. The microphone spacing d is 4 cm. Detailed experimental conditions are shown in FIG. 11. The template coordinates were placed at equal intervals on a hemisphere surrounding the microphone array, as shown in FIG. 10. FIG. 12 shows the results when the 1.0 to 3.0 kHz band was analyzed. The direct sound was selected as the reference reflected sound. To keep the figure uncluttered, only reflected sounds 1 to 3 are shown. The arrival time difference from the direct sound, averaged over frequency, was 1.4 msec for reflected sound 1, 2.5 msec for reflected sound 2, and 4.7 msec for reflected sound 3. In the low-frequency band the phase changes little with distance, so the error in the estimated arrival time difference tends to be large; it is therefore advisable to obtain the arrival time difference by averaging over the entire frequency band.
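The frequency-averaged estimate described in the abstract (the argument of the target amplitude relative to the reference amplitude, divided by the frequency, then averaged) can be sketched as follows. The sign convention is an assumption: a delay τ is modeled as a phase factor exp(−jωτ), hence the default `sign=-1.0`. The sketch also does no phase unwrapping, so it is only valid while ωτ stays below π; delays of the order reported above (milliseconds at 1 to 3 kHz) would need unwrapping that is not shown here.

```python
import cmath
import math


def arrival_time_difference(a_target, a_ref, omegas, sign=-1.0):
    """Mean over frequency of sign * arg(A_target / A_ref) / omega, in seconds."""
    terms = [sign * cmath.phase(at / ar) / w
             for at, ar, w in zip(a_target, a_ref, omegas)]
    return sum(terms) / len(terms)


# Toy data: a reflection delayed by 0.1 ms relative to the reference, 1 to 3 kHz.
omegas = [2 * math.pi * f for f in (1000.0, 1500.0, 2000.0, 2500.0, 3000.0)]
tau_true = 1.0e-4
a_ref = [0.9 * cmath.exp(0.3j) for _ in omegas]
a_tgt = [0.5 * ar * cmath.exp(-1j * w * tau_true) for ar, w in zip(a_ref, omegas)]
tau_est = arrival_time_difference(a_tgt, a_ref, omegas)
```

Dividing the two amplitudes cancels the phase they share (for example the phase of the source itself), so only the relative delay remains in the argument.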

<Application example>
Reflected sound information is very important acoustic information in human life. For example, a visually impaired person grasps the surroundings by listening to the sound produced by tapping after it has been reflected by walls, the ceiling, and so on. In everyday conversation as well, the ease of conversation differs between a room with moderate reflections and an environment with relatively few reflected sounds. Examples of services that use the reflected sound information estimated by the present invention are described below.
The first is an example in which the present invention is incorporated into a conference system. Because the amplitude of a reflected sound changes with the orientation of a directional sound source, knowing the reflected sound information makes it possible to estimate which way the sound source is facing. Incorporating a sound source orientation estimation device into a conference system can be applied to indicating to whom a remark was addressed.
The second is a system that lets the user view video and listen to audio from a freely chosen position. A distant sound is difficult to pick up because the power of the sound arriving directly from the source is small. When the reflected sound information is known, the reflected sounds as well as the direct sound can be emphasized and picked up, so distant sounds can be enhanced. In the field of speech processing, sound sources can be emphasized and picked up by direction, but doing so by distance is considered very difficult. When the reflected sound information is known, a physical feature corresponding to distance is obtained, so sound can be picked up by distance. If distant sounds can be picked up, and sound can be picked up by direction and by distance, a sound field corresponding to the position selected by the viewer can be generated in a simulated manner.

In a voice communication system, estimating reflected sound information leads to obtaining sound field information that could not be obtained from the direct sound alone. Once the reflected sound information is known, it enables the pickup of distant sounds and pickup by distance, which conventional speech enhancement techniques could not achieve, and it allows estimation of sound field information (for example, the orientation of a sound source) that conventional sound pickup techniques could not estimate. Estimating such sound field information leads to the development of speech processing apparatuses that could not be realized with earlier techniques. Conventional techniques for estimating reflected sound information required observing a special signal in order to obtain an impulse response, whereas the present invention has the advantage that reflected sound information can be obtained from an ordinary observed signal such as a speech signal.

<Example of hardware configuration of reflected sound information estimation apparatus>
The reflected sound information estimation apparatus according to the above-described embodiments includes an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a CPU (Central Processing Unit) [which may include a cache memory and the like], RAM (Random Access Memory) and ROM (Read Only Memory) as memory, an external storage device that is a hard disk, and a bus that connects the input unit, output unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the reflected sound information estimation apparatus may also be provided with a device (drive) that can read from and write to a storage medium such as a CD-ROM. A general-purpose computer is one example of a physical entity having such hardware resources.

The external storage device of the reflected sound information estimation apparatus stores a program for estimating reflected sound information and the data required by the processing of this program [the program need not be stored in the external storage device; for example, it may be stored in ROM, which is a read-only storage device]. Data obtained by the processing of these programs is stored as appropriate in RAM, the external storage device, or the like. Hereinafter, a storage device that stores data, the addresses of its storage areas, and the like is simply referred to as a "storage unit".

The storage unit of the reflected sound information estimation apparatus stores a program for performing AD conversion on an analog signal, a program for performing frame division processing, a program for converting the digital signal of each frame into an observation signal in the frequency domain, a program for generating template information, and a program for estimating reflected sound information using the frequency-domain observation signal and the template information.
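The frame division and frequency-domain conversion steps listed above can be sketched as follows. This is an illustration under assumptions rather than the embodiment's processing: the frame length, hop size, Hann window, sampling rate, and the function name `to_frequency_domain` are all chosen for the example, and AD conversion is assumed to have been done already.

```python
import numpy as np


def to_frequency_domain(signals, frame_len=512, hop=256):
    """Split M-channel samples into frames and apply an FFT to each frame.

    signals: real array of shape (M, T).
    Returns a complex array of shape (num_frames, M, frame_len // 2 + 1).
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, num_samples = signals.shape
    if num_samples < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # assumed analysis window
    starts = range(0, num_samples - frame_len + 1, hop)
    frames = np.stack([signals[:, s:s + frame_len] * window for s in starts])
    return np.fft.rfft(frames, axis=-1)


fs = 16000                                   # assumed sampling rate in Hz
t = np.arange(fs) / fs                       # one second of samples
signals = np.stack([np.sin(2 * np.pi * 1000 * t),
                    np.sin(2 * np.pi * 1000 * t + 0.3)])   # two test channels
spectra = to_frequency_domain(signals)
bin_freqs_hz = np.fft.rfftfreq(512, d=1.0 / fs)
```

Each `spectra[frame]` is an M x F block that can serve as the frequency-domain observation signal for that frame.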

In the reflected sound information estimation apparatus, each program stored in the storage unit and the data required for its processing are read into RAM as needed and are interpreted, executed, and processed by the CPU. As a result, the CPU implements predetermined functions (AD conversion unit, frame division unit, frequency domain conversion unit, template generation unit, reflected sound information estimation unit, arrival time difference estimation unit), and the estimation of reflected sound information is thereby realized.

<Supplementary note>
The present invention is not limited to the above-described embodiments and can be modified as appropriate without departing from the spirit of the present invention. The processes described in the above embodiments need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the apparatus that executes them or as needed.

When the processing functions of the hardware entity (reflected sound information estimation apparatus) described in the above embodiments are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. Executing this program on a computer realizes the processing functions of the hardware entity on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or, each time a program is transferred from the server computer to the computer, the computer may execute processing according to the received program in sequence. The above-described processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and acquisition of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least part of this processing content may instead be realized in hardware.

Claims

A reflected sound information estimation apparatus in which, with P being a predetermined integer of 2 or more and p being each integer from 1 to P, a function that simulates the transfer characteristics, for each frequency, between the p-th position and each of the positions where M microphones are arranged (hereinafter referred to as a transfer characteristic function) serves as a template for each p, and
a signal obtained by picking up a sound source signal with the M microphones and converting it into the frequency domain (hereinafter referred to as an observation signal) and the templates are taken as input, the apparatus comprising:
a reflected sound information estimation unit that, with Q being a predetermined integer of 2 or more and smaller than P and q being each integer from 1 to Q, treats the observation signal as a superposition of Q reflected sounds arriving from Q directions relative to the M microphones (hereinafter referred to as arrival directions), obtains from the P templates the template that can represent the q-th reflected sound with the highest accuracy, takes the direction corresponding to that template as the q-th of the Q directions, estimates the complex amplitude of each of the Q reflected sounds based on the Q obtained templates and the directions corresponding to them, and takes the Q obtained complex amplitudes as the arrival amplitudes of the Q reflected sounds; and
an arrival time difference estimation unit that, for each reflected sound (hereinafter referred to as a target reflected sound) other than a reflected sound serving as a reference (hereinafter referred to as a reference reflected sound), takes as the relative arrival time difference of the target reflected sound with respect to the reference reflected sound the arithmetic mean, over frequency, of the argument of the arrival amplitude of the target reflected sound relative to the arrival amplitude of the reference reflected sound divided by the frequency.
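The arrival time difference described in this claim (the frequency average of the relative phase angle divided by the frequency) can be sketched as follows. This is a minimal illustration under stated assumptions: "frequency" is taken to be the angular frequency, a delay is assumed to appear as the phase factor exp(-j*omega*tau) so the sign is flipped to make a later arrival positive, and the delay is assumed short enough that the phase stays within (-pi, pi] across the band. Longer delays would wrap in phase, which this sketch does not attempt to handle. The function name and the synthetic amplitudes are hypothetical.

```python
import numpy as np


def arrival_time_difference(a_target, a_ref, freqs_hz):
    """Mean over frequency of -arg(a_target / a_ref) / omega, in seconds.

    Assumes a delay tau contributes the factor exp(-1j * omega * tau) and that
    |omega * tau| < pi over the whole band, so no phase unwrapping is needed.
    """
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    relative_phase = np.angle(np.asarray(a_target) / np.asarray(a_ref))
    return float(np.mean(-relative_phase / omega))


# Synthetic arrival amplitudes: the target arrives 0.1 ms after the reference.
freqs_hz = np.linspace(1000.0, 3000.0, 41)
omega = 2.0 * np.pi * freqs_hz
true_delay_s = 1.0e-4
a_ref = np.ones_like(freqs_hz, dtype=complex)
a_target = 0.6 * np.exp(-1j * omega * true_delay_s)

estimated_delay_s = arrival_time_difference(a_target, a_ref, freqs_hz)
```

Averaging over all bins, rather than reading the delay from a single bin, is what lets bins with a weak phase response contribute less to the overall error.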

The reflected sound information estimation apparatus according to claim 1, wherein
the reflected sound information estimation unit
(1) for each p, determines the p-th complex amplitude so that the power of the residual signal obtained by subtracting from the observation signal the p-th reflected sound, represented as the p-th template multiplied by the p-th complex amplitude, is minimized, obtains for each p the power of the residual signal obtained by subtracting from the observation signal the p-th reflected sound represented as the p-th template multiplied by the determined p-th complex amplitude, and determines the Q templates that gave the smallest powers, and
(2) estimates the complex amplitude of each of the Q reflected sounds based on the determined Q templates and the directions corresponding to them, and takes the Q obtained complex amplitudes as the arrival amplitudes of the Q reflected sounds.
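Step (1) above can be sketched as a per-template least-squares fit followed by a ranking of residual powers. This is an illustration under assumptions: the amplitude is fitted independently at each frequency with the standard closed form a = h^H x / (h^H h), the residual power is summed over all frequencies and microphones, and the names `fit_single_templates` and `select_templates` and the random test data are hypothetical.

```python
import numpy as np


def fit_single_templates(observation, templates):
    """Fit every template to the observation on its own.

    observation: (F, M) complex; templates: (P, F, M) complex.
    Returns amplitudes of shape (P, F) and residual powers of shape (P,).
    """
    # Closed-form least-squares amplitude per template and frequency.
    numer = np.einsum("pfm,fm->pf", templates.conj(), observation)
    denom = np.einsum("pfm,pfm->pf", templates.conj(), templates).real
    amplitudes = numer / denom
    residual = observation[None] - amplitudes[..., None] * templates
    residual_power = np.sum(np.abs(residual) ** 2, axis=(1, 2))
    return amplitudes, residual_power


def select_templates(observation, templates, num_reflections):
    """Indices and amplitudes of the templates leaving the least residual power."""
    amplitudes, residual_power = fit_single_templates(observation, templates)
    chosen = np.argsort(residual_power)[:num_reflections]
    return chosen, amplitudes[chosen]


rng = np.random.default_rng(0)
templates = rng.standard_normal((6, 4, 8)) + 1j * rng.standard_normal((6, 4, 8))
observation = (1.5 - 0.5j) * templates[3]      # a single component, for checking
chosen, amplitudes = select_templates(observation, templates, num_reflections=2)
```

In this toy case the first selected index is 3 and its fitted amplitude equals the one used to build the observation, because that template explains the observation exactly.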

The reflected sound information estimation apparatus according to claim 2, wherein
the reflected sound information estimation unit
estimates the complex amplitude by which the template determined in (1) is multiplied as the arrival amplitude of the p-th reflected sound.

The reflected sound information estimation apparatus according to claim 2, wherein
the reflected sound information estimation unit
estimates the arrival direction of the reflected sound by correcting a direction D, in the vicinity of the direction D determined by the position corresponding to the template determined in (1), so that the power of the residual signal E obtained by subtracting from the observation signal a template for each frequency between an arbitrary position in space and each of the microphones multiplied by a complex amplitude is minimized, and estimates the complex amplitude by which the template corresponding to that arrival direction is multiplied as the arrival amplitude of the p-th reflected sound.

The reflected sound information estimation apparatus according to claim 1, wherein
the reflected sound information estimation unit
first, with p being each integer from 1 to P, determines for each p the p-th virtual amplitude so that the power of the p-th residual signal, obtained by subtracting from the observation signal the p-th template multiplied by the p-th complex amplitude (hereinafter referred to as a virtual amplitude), is minimized, and takes as Q initial directions the Q arrival directions corresponding to the Q templates multiplied by the Q largest of the P obtained virtual amplitudes in terms of magnitude,
with q being each integer from 1 to Q, determines the q-th virtual amplitude so that the power of the residual signal obtained by subtracting from the observation signal the transfer characteristic function determined for the q-th initial direction multiplied by the q-th virtual amplitude (hereinafter referred to as the residual signal after batch removal) is minimized, and takes these Q virtual amplitudes as Q initial amplitudes,
next, alternately repeats, a predetermined number of times, an arrival direction correction process that corrects the arrival directions of the Q reflected sounds and a complex amplitude correction process that corrects the complex amplitudes of the Q reflected sounds, thereby obtaining the arrival directions and complex amplitudes corresponding to the Q reflected sounds,
In the arrival direction correction process, q is an integer between 1 and Q, and in the first arrival direction correction process, the transfer characteristic function determined corresponding to the qth initial direction is multiplied by the qth initial amplitude. In the second and subsequent arrival direction correction processes, the q th complex amplitude obtained by the immediately preceding complex amplitude correction process is transferred to the transfer characteristic function determined corresponding to the q th arrival direction obtained by the immediately preceding arrival direction correction process. The qth direction of arrival is corrected so that the power of the signal obtained by multiplying by the residual signal after batch removal is reduced,
In the complex amplitude correction process, q is an integer of 1 or more and Q or less, and in the first complex amplitude correction process, the transfer characteristic function determined in correspondence with the qth arrival direction obtained in the immediately preceding arrival direction correction process is q In the second and subsequent complex amplitude correction processes, the complex amplitude correction process immediately before is added to the transfer characteristic function determined corresponding to the qth arrival direction obtained in the previous arrival direction correction process. The q-th complex amplitude is corrected so that the power of the signal obtained by multiplying the q-th reflected sound obtained in step 1 multiplied by the complex amplitude is added to the residual signal after collective removal , and obtained. An apparatus for estimating reflected sound information, wherein the complex amplitude of the Q reflected sounds is estimated as the arrival amplitude of each of the Q reflected sounds.
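The initial-amplitude step of this claim can be read as fitting all Q amplitudes jointly, so that the "residual signal after batch removal" is what remains once all Q fitted components are subtracted at once. The sketch below follows that reading, which is an interpretation rather than something the claim states explicitly. It solves one least-squares problem per frequency with a pseudo-inverse, and it does not cover the alternating direction and amplitude corrections. The function name and test data are hypothetical.

```python
import numpy as np


def joint_amplitudes(observation, chosen_templates):
    """Jointly fit Q complex amplitudes per frequency.

    observation: (F, M) complex; chosen_templates: (Q, F, M) complex.
    Returns amplitudes of shape (Q, F) and the residual of shape (F, M) left
    after removing all Q fitted components together.
    """
    basis = np.transpose(chosen_templates, (1, 2, 0))          # (F, M, Q)
    # np.linalg.pinv operates on the last two axes of a stacked array.
    amplitudes = np.einsum("fqm,fm->fq", np.linalg.pinv(basis), observation)
    residual = observation - np.einsum("fmq,fq->fm", basis, amplitudes)
    return amplitudes.T, residual


rng = np.random.default_rng(1)
chosen_templates = (rng.standard_normal((2, 5, 8))
                    + 1j * rng.standard_normal((2, 5, 8)))
true_amplitudes = np.array([np.full(5, 1.0 + 1.0j), np.full(5, 0.5 - 0.2j)])
observation = np.einsum("qf,qfm->fm", true_amplitudes, chosen_templates)

initial_amplitudes, batch_residual = joint_amplitudes(observation, chosen_templates)
```

Fitting jointly matters when the chosen templates are not orthogonal: separate single-template fits would each absorb part of the other component.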

The reflected sound information estimation apparatus according to claim 1, wherein
the reflected sound information estimation unit
first, with q being each integer from 1 to Q, performs for each q in ascending order a first process that obtains complex amplitudes and a second process that obtains the index p identifying a template, thereby determining Q initial directions, wherein in the first process, with p representing each index in the set obtained by removing the template indices already determined in the second process from the set of indices of all templates, the p-th virtual amplitude is determined so that the power of the (q+1)-th residual signal, obtained by subtracting the p-th template multiplied by the p-th complex amplitude (hereinafter referred to as a virtual amplitude) from the q-th minimum residual signal (where the first minimum residual signal is the observation signal), is minimized, and in the second process, the index of the template multiplied by the largest of the virtual amplitudes obtained in the first process is identified and the arrival direction corresponding to that template is taken as the q-th initial direction,
with q being each integer from 1 to Q, determines the q-th virtual amplitude so that the power of the residual signal obtained by subtracting from the observation signal the transfer characteristic function determined for the q-th initial direction multiplied by the q-th virtual amplitude (hereinafter referred to as the residual signal after batch removal) is minimized, and takes these Q virtual amplitudes as Q initial amplitudes,
next, alternately repeats, a predetermined number of times, an arrival direction correction process that corrects the arrival directions of the Q reflected sounds and a complex amplitude correction process that corrects the complex amplitudes of the Q reflected sounds, thereby obtaining the arrival directions and complex amplitudes corresponding to the Q reflected sounds,
In the arrival direction correction process, q is an integer between 1 and Q, and in the first arrival direction correction process, the transfer characteristic function determined corresponding to the qth initial direction is multiplied by the qth initial amplitude. In the second and subsequent arrival direction correction processes, the q th complex amplitude obtained by the immediately preceding complex amplitude correction process is transferred to the transfer characteristic function determined corresponding to the q th arrival direction obtained by the immediately preceding arrival direction correction process. The qth direction of arrival is corrected so that the power of the signal obtained by multiplying by the residual signal after batch removal is reduced,
In the complex amplitude correction process, q is an integer of 1 or more and Q or less, and in the first complex amplitude correction process, the transfer characteristic function determined in correspondence with the qth arrival direction obtained in the immediately preceding arrival direction correction process is q In the second and subsequent complex amplitude correction processes, the complex amplitude correction process immediately before is added to the transfer characteristic function determined corresponding to the qth arrival direction obtained in the previous arrival direction correction process. The q-th complex amplitude is corrected so that the power of the signal obtained by multiplying the q-th reflected sound obtained in step 1 multiplied by the complex amplitude is added to the residual signal after collective removal , and obtained. An apparatus for estimating reflected sound information, wherein the complex amplitude of the Q reflected sounds is estimated as the arrival amplitude of each of the Q reflected sounds.
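The first and second processes of this claim resemble a greedy, matching-pursuit-style selection: fit every remaining template to the current residual, keep the strongest one, subtract it, and repeat. The sketch below illustrates that reading only. How the "largest virtual amplitude" is compared across frequencies is not stated here, so the squared magnitude summed over frequency is used as an assumption, and the test templates are built from orthogonal DFT columns purely so the outcome is easy to verify. All names are hypothetical.

```python
import numpy as np


def greedy_initial_templates(observation, templates, num_reflections):
    """Pick template indices one at a time, removing each pick from the residual.

    observation: (F, M) complex; templates: (P, F, M) complex.
    Returns the list of chosen template indices in selection order.
    """
    residual = np.array(observation, dtype=complex)
    remaining = list(range(len(templates)))
    chosen = []
    for _ in range(num_reflections):
        candidates = templates[remaining]                       # (R, F, M)
        numer = np.einsum("pfm,fm->pf", candidates.conj(), residual)
        denom = np.einsum("pfm,pfm->pf", candidates.conj(), candidates).real
        virtual = numer / denom                                 # (R, F)
        strength = np.sum(np.abs(virtual) ** 2, axis=1)         # assumed ranking
        best = int(np.argmax(strength))
        chosen.append(remaining[best])
        residual = residual - virtual[best][:, None] * candidates[best]
        remaining.pop(best)
    return chosen


num_mics, num_freqs = 8, 4
dft = np.fft.fft(np.eye(num_mics))                  # mutually orthogonal columns
templates = np.tile(dft[:, None, :], (1, num_freqs, 1))         # (8, 4, 8)
observation = 3.0 * templates[2] + 1.0 * templates[5]
initial_indices = greedy_initial_templates(observation, templates, 2)
```

With these test templates the stronger component (index 2) is selected first and the weaker one (index 5) second.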

The reflected sound information estimation apparatus according to claim 1, wherein
the reflected sound information estimation unit
determines a proper subset Ψ consisting of arrival directions selected, sparsely and without bias, from the arrival directions corresponding to the templates, where the cardinality |Ψ| of the proper subset Ψ satisfies Q ≦ |Ψ| and x represents each index p identifying a template corresponding to an arrival direction included in the proper subset Ψ,
for each x, determines the x-th virtual amplitude so that the power of the x-th residual signal, obtained by subtracting from the observation signal the x-th template multiplied by the x-th complex amplitude (hereinafter referred to as a virtual amplitude), is minimized, and takes as Q initial directions the Q arrival directions corresponding to the Q templates multiplied by the Q largest of the obtained virtual amplitudes,
with q being each integer from 1 to Q, determines the q-th virtual amplitude so that the power of the residual signal obtained by subtracting from the observation signal the transfer characteristic function determined for the q-th initial direction multiplied by the q-th virtual amplitude (hereinafter referred to as the residual signal after batch removal) is minimized, and takes these Q virtual amplitudes as Q initial amplitudes,
next, alternately repeats, a predetermined number of times, an arrival direction correction process that corrects the arrival directions of the Q reflected sounds and a complex amplitude correction process that corrects the complex amplitudes of the Q reflected sounds, thereby obtaining the arrival directions and complex amplitudes corresponding to the Q reflected sounds,
In the arrival direction correction process, q is an integer between 1 and Q, and in the first arrival direction correction process, the transfer characteristic function determined corresponding to the qth initial direction is multiplied by the qth initial amplitude. In the second and subsequent arrival direction correction processes, the q th complex amplitude obtained by the immediately preceding complex amplitude correction process is transferred to the transfer characteristic function determined corresponding to the q th arrival direction obtained by the immediately preceding arrival direction correction process. The qth direction of arrival is corrected so that the power of the signal obtained by multiplying by the residual signal after batch removal is reduced,
In the complex amplitude correction process, q is an integer of 1 or more and Q or less, and in the first complex amplitude correction process, the transfer characteristic function determined in correspondence with the qth arrival direction obtained in the immediately preceding arrival direction correction process is q In the second and subsequent complex amplitude correction processes, the complex amplitude correction process immediately before is added to the transfer characteristic function determined corresponding to the qth arrival direction obtained in the previous arrival direction correction process. The q-th complex amplitude is corrected so that the power of the signal obtained by multiplying the q-th reflected sound obtained in step 1 multiplied by the complex amplitude is added to the residual signal after collective removal , and obtained. An apparatus for estimating reflected sound information, wherein the complex amplitude of the Q reflected sounds is estimated as the arrival amplitude of each of the Q reflected sounds.

The reflected sound information estimation apparatus according to any one of claims 2 to 7, wherein
the power of the residual signal is the power summed over all of the frequencies.

The reflected sound information estimation apparatus according to any one of claims 1 to 8, wherein
the arrival time difference estimation unit
determines, from among the Q reflected sounds, a reference reflected sound that is common to all of the target reflected sounds.

The reflected sound information estimation apparatus according to any one of claims 1 to 8, wherein
the arrival time difference estimation unit
determines the reference reflected sound from among the Q reflected sounds separately for each target reflected sound.

A reflected sound information estimation method in which, with P being a predetermined integer of 2 or more and p being each integer from 1 to P, a storage unit stores, as a template for each p, a function that simulates the transfer characteristics, for each frequency, between the p-th position and each of the positions where M microphones are arranged (hereinafter referred to as a transfer characteristic function), and
a signal obtained by picking up a sound source signal with the M microphones and converting it into the frequency domain (hereinafter referred to as an observation signal) and the templates are taken as input, the method comprising:
a reflected sound information estimation process in which a reflected sound information estimation unit, with Q being a predetermined integer of 2 or more and smaller than P and q being each integer from 1 to Q, treats the observation signal as a superposition of Q reflected sounds arriving from Q directions relative to the M microphones (hereinafter referred to as arrival directions), obtains from the P templates the template that can represent the q-th reflected sound with the highest accuracy, takes the direction corresponding to that template as the q-th of the Q directions, estimates the complex amplitude of each of the Q reflected sounds based on the Q obtained templates and the directions corresponding to them, and takes the Q obtained complex amplitudes as the arrival amplitudes of the Q reflected sounds; and
an arrival time difference estimation process in which an arrival time difference estimation unit, for each reflected sound (hereinafter referred to as a target reflected sound) other than a reflected sound serving as a reference (hereinafter referred to as a reference reflected sound), takes as the relative arrival time difference of the target reflected sound with respect to the reference reflected sound the arithmetic mean, over frequency, of the argument of the arrival amplitude of the target reflected sound relative to the arrival amplitude of the reference reflected sound divided by the frequency.

The reflected sound information estimation method according to claim 11, wherein
in the reflected sound information estimation process,
(1) for each p, the p-th complex amplitude is determined so that the power of the residual signal obtained by subtracting from the observation signal the p-th reflected sound, represented as the p-th template multiplied by the p-th complex amplitude, is minimized, the power of the residual signal obtained by subtracting from the observation signal the p-th reflected sound represented as the p-th template multiplied by the determined p-th complex amplitude is obtained for each p, and the Q templates that gave the smallest powers are determined, and
(2) the complex amplitude of each of the Q reflected sounds is estimated based on the determined Q templates and the directions corresponding to them, and the Q obtained complex amplitudes are taken as the arrival amplitudes of the Q reflected sounds.

The reflected sound information estimation method according to claim 12, wherein
in the reflected sound information estimation process,
the complex amplitude by which the template determined in (1) is multiplied is estimated as the arrival amplitude of the p-th reflected sound.

The reflected sound information estimation method according to claim 12, wherein
in the reflected sound information estimation process,
the arrival direction of the reflected sound is estimated by correcting a direction D, in the vicinity of the direction D determined by the position corresponding to the template determined in (1), so that the power of the residual signal E obtained by subtracting from the observation signal a template for each frequency between an arbitrary position in space and each of the microphones multiplied by a complex amplitude is minimized, and the complex amplitude by which the template corresponding to that arrival direction is multiplied is estimated as the arrival amplitude of the p-th reflected sound.

The reflected sound information estimation method according to claim 11, wherein
in the reflected sound information estimation process,
first, with p being each integer from 1 to P, the p-th virtual amplitude is determined for each p so that the power of the p-th residual signal, obtained by subtracting from the observation signal the p-th template multiplied by the p-th complex amplitude (hereinafter referred to as a virtual amplitude), is minimized, and the Q arrival directions corresponding to the Q templates multiplied by the Q largest of the P obtained virtual amplitudes in terms of magnitude are taken as Q initial directions,
with q being each integer from 1 to Q, the q-th virtual amplitude is determined so that the power of the residual signal obtained by subtracting from the observation signal the transfer characteristic function determined for the q-th initial direction multiplied by the q-th virtual amplitude (hereinafter referred to as the residual signal after batch removal) is minimized, and these Q virtual amplitudes are taken as Q initial amplitudes,
next, an arrival direction correction process that corrects the arrival directions of the Q reflected sounds and a complex amplitude correction process that corrects the complex amplitudes of the Q reflected sounds are alternately repeated a predetermined number of times, thereby obtaining the arrival directions and complex amplitudes corresponding to the Q reflected sounds,
in the arrival direction correction process, with q an integer from 1 to Q, the q-th arrival direction is corrected so as to reduce the power of the signal obtained by adding a q-th product to the residual signal after batch removal, where the q-th product is, in the first arrival direction correction process, the transfer characteristic function determined for the q-th initial direction multiplied by the q-th initial amplitude and, in the second and subsequent arrival direction correction processes, the transfer characteristic function determined for the q-th arrival direction obtained in the immediately preceding arrival direction correction process multiplied by the q-th complex amplitude obtained in the immediately preceding complex amplitude correction process;
in the complex amplitude correction process, with q an integer from 1 to Q, the q-th complex amplitude is corrected so as to reduce the power of the signal obtained by adding a q-th product to the residual signal after batch removal, where the q-th product is, in the first complex amplitude correction process, the transfer characteristic function determined for the q-th arrival direction obtained in the immediately preceding arrival direction correction process multiplied by the q-th initial amplitude and, in the second and subsequent complex amplitude correction processes, the same transfer characteristic function multiplied by the q-th complex amplitude obtained in the immediately preceding complex amplitude correction process; and the complex amplitudes of the Q reflected sounds thus obtained are estimated as the arrival amplitudes of the respective Q reflected sounds.
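The alternation between direction correction and amplitude correction can be sketched as coordinate descent on the residual power. This is one interpretation under stated assumptions, not the claimed procedure itself: the array model in `steering`, the single frequency bin `OMEGA`, the grid step, and the one-reflection-at-a-time update order are all hypothetical. For each q, the q-th product is added back to the residual after batch removal, and the direction or amplitude is then re-fitted to that sum.

```python
import cmath
import math

OMEGA = 2.0 * math.pi * 2000.0  # single frequency bin, for brevity


def steering(theta, n_mics=6, spacing=0.05, speed=340.0):
    """Hypothetical far-field transfer function of a uniform linear array."""
    return [cmath.exp(-1j * OMEGA * m * spacing * math.cos(theta) / speed)
            for m in range(n_mics)]


def power(signal):
    return sum(abs(s) ** 2 for s in signal)


def batch_residual(observed, thetas, amps):
    """Observed signal minus all Q modelled reflections (batch removal)."""
    residual = list(observed)
    for theta, a in zip(thetas, amps):
        residual = [r - a * d for r, d in zip(residual, steering(theta))]
    return residual


def alternate(observed, thetas, amps, rounds=5, step=0.01, reach=5):
    thetas, amps = list(thetas), list(amps)
    for _ in range(rounds):
        for q in range(len(thetas)):  # arrival direction correction
            residual = batch_residual(observed, thetas, amps)
            target = [r + amps[q] * d for r, d in zip(residual, steering(thetas[q]))]
            candidates = [thetas[q] + k * step for k in range(-reach, reach + 1)]
            thetas[q] = min(candidates, key=lambda t: power(
                [x - amps[q] * d for x, d in zip(target, steering(t))]))
        for q in range(len(thetas)):  # complex amplitude correction
            residual = batch_residual(observed, thetas, amps)
            d_q = steering(thetas[q])
            target = [r + amps[q] * d for r, d in zip(residual, d_q)]
            amps[q] = (sum(d.conjugate() * x for d, x in zip(d_q, target))
                       / sum(abs(d) ** 2 for d in d_q))
    return thetas, amps


observed = [a + 0.5j * b for a, b in zip(steering(0.8), steering(1.9))]
start_thetas, start_amps = [0.83, 1.87], [0.9 + 0j, 0.4j]
before = power(batch_residual(observed, start_thetas, start_amps))
thetas, amps = alternate(observed, start_thetas, start_amps)
after = power(batch_residual(observed, thetas, amps))
```

Because every update keeps the current value among its candidates, the residual power should not increase from one step to the next, apart from rounding; the sketch makes no claim about how close the result comes to the true directions.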

The reflected sound information estimation method according to claim 11, wherein,
in the reflected sound information estimation process,
first, with q an integer from 1 to Q, a first process that obtains complex amplitudes and a second process that obtains an index identifying a template are performed for each q in ascending order of q; in the first process, p denotes each index in the set obtained by excluding the template indices already identified in the second process from the set of all template indices, and for each p the p-th complex amplitude (hereinafter, virtual amplitude) is determined so as to minimize the power of the (q+1)-th residual signal obtained by subtracting the p-th template multiplied by the p-th virtual amplitude from the q-th minimum residual signal (where the first minimum residual signal is the observed signal); in the second process, the index of the template multiplied by the largest of the virtual amplitudes obtained in the first process is identified, and the arrival direction corresponding to that template is taken as the q-th initial direction; the Q initial directions are determined in this way;
then, with q an integer from 1 to Q, the Q virtual amplitudes are determined so as to minimize the power of the residual signal (hereinafter, the residual signal after batch removal) obtained by subtracting from the observed signal, for all q collectively, the transfer characteristic function determined for the q-th initial direction multiplied by the q-th virtual amplitude, and these Q virtual amplitudes are taken as the Q initial amplitudes;
next, an arrival direction correction process that corrects the arrival directions of the Q reflected sounds and a complex amplitude correction process that corrects the complex amplitudes of the Q reflected sounds are repeated alternately a predetermined number of times to obtain the arrival directions and complex amplitudes corresponding to the Q reflected sounds;
in the arrival direction correction process, with q an integer from 1 to Q, the q-th arrival direction is corrected so as to reduce the power of the signal obtained by adding a q-th product to the residual signal after batch removal, where the q-th product is, in the first arrival direction correction process, the transfer characteristic function determined for the q-th initial direction multiplied by the q-th initial amplitude and, in the second and subsequent arrival direction correction processes, the transfer characteristic function determined for the q-th arrival direction obtained in the immediately preceding arrival direction correction process multiplied by the q-th complex amplitude obtained in the immediately preceding complex amplitude correction process;
in the complex amplitude correction process, with q an integer from 1 to Q, the q-th complex amplitude is corrected so as to reduce the power of the signal obtained by adding a q-th product to the residual signal after batch removal, where the q-th product is, in the first complex amplitude correction process, the transfer characteristic function determined for the q-th arrival direction obtained in the immediately preceding arrival direction correction process multiplied by the q-th initial amplitude and, in the second and subsequent complex amplitude correction processes, the same transfer characteristic function multiplied by the q-th complex amplitude obtained in the immediately preceding complex amplitude correction process; and the complex amplitudes of the Q reflected sounds thus obtained are estimated as the arrival amplitudes of the respective Q reflected sounds.
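The first and second processes of this claim resemble a greedy, matching-pursuit-style selection, which can be sketched as follows. The DFT vectors used as templates are hypothetical stand-ins chosen only because they are mutually orthogonal; the function names and test signal are likewise assumptions.

```python
import cmath
import math


def virtual_amplitude(template, signal):
    """Least-squares complex amplitude of one template against a signal."""
    return (sum(d.conjugate() * x for d, x in zip(template, signal))
            / sum(abs(d) ** 2 for d in template))


def greedy_initial_indices(templates, observed, n_reflections):
    """Pick one template index per pass, removing its contribution each time."""
    residual = list(observed)  # the first minimum residual is the observed signal
    chosen = []
    for _ in range(n_reflections):
        best_p, best_a = None, 0j
        for p, d in enumerate(templates):
            if p in chosen:  # first process: skip indices already identified
                continue
            a = virtual_amplitude(d, residual)
            if best_p is None or abs(a) > abs(best_a):
                best_p, best_a = p, a
        chosen.append(best_p)  # second process: largest virtual amplitude wins
        residual = [r - best_a * d for r, d in zip(residual, templates[best_p])]
    return chosen


# Hypothetical orthogonal templates (DFT vectors) standing in for P = 4 directions.
templates = [[cmath.exp(2j * math.pi * p * m / 4) for m in range(4)]
             for p in range(4)]
observed = [2.0 * a + 0.5j * b for a, b in zip(templates[2], templates[0])]
order = greedy_initial_indices(templates, observed, 2)
```

With this test signal the stronger component (index 2) is selected first and the weaker one (index 0) second. Real templates are generally not orthogonal, which is one reason the later correction steps are needed.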

The reflected sound information estimation method according to claim 11, wherein,
in the reflected sound information estimation process,
a proper subset Ψ is determined, consisting of arrival directions selected sparsely and without bias from the arrival directions corresponding to the templates, where |Ψ| denotes the cardinality of Ψ, Q ≦ |Ψ|, and x denotes each index identifying a template corresponding to an arrival direction included in Ψ;
for each x, the x-th complex amplitude (hereinafter, virtual amplitude) is determined so as to minimize the power of the x-th residual signal obtained by subtracting, from the observed signal, the x-th template multiplied by the x-th virtual amplitude, and the Q arrival directions corresponding to the Q templates multiplied by the Q largest of the virtual amplitudes thus obtained are taken as the Q initial directions;
then, with q an integer from 1 to Q, the Q virtual amplitudes are determined so as to minimize the power of the residual signal (hereinafter, the residual signal after batch removal) obtained by subtracting from the observed signal, for all q collectively, the transfer characteristic function determined for the q-th initial direction multiplied by the q-th virtual amplitude, and these Q virtual amplitudes are taken as the Q initial amplitudes;
next, an arrival direction correction process that corrects the arrival directions of the Q reflected sounds and a complex amplitude correction process that corrects the complex amplitudes of the Q reflected sounds are repeated alternately a predetermined number of times to obtain the arrival directions and complex amplitudes corresponding to the Q reflected sounds;
in the arrival direction correction process, with q an integer from 1 to Q, the q-th arrival direction is corrected so as to reduce the power of the signal obtained by adding a q-th product to the residual signal after batch removal, where the q-th product is, in the first arrival direction correction process, the transfer characteristic function determined for the q-th initial direction multiplied by the q-th initial amplitude and, in the second and subsequent arrival direction correction processes, the transfer characteristic function determined for the q-th arrival direction obtained in the immediately preceding arrival direction correction process multiplied by the q-th complex amplitude obtained in the immediately preceding complex amplitude correction process;
in the complex amplitude correction process, with q an integer from 1 to Q, the q-th complex amplitude is corrected so as to reduce the power of the signal obtained by adding a q-th product to the residual signal after batch removal, where the q-th product is, in the first complex amplitude correction process, the transfer characteristic function determined for the q-th arrival direction obtained in the immediately preceding arrival direction correction process multiplied by the q-th initial amplitude and, in the second and subsequent complex amplitude correction processes, the same transfer characteristic function multiplied by the q-th complex amplitude obtained in the immediately preceding complex amplitude correction process; and the complex amplitudes of the Q reflected sounds thus obtained are estimated as the arrival amplitudes of the respective Q reflected sounds.
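Restricting the initial search to a sparse, unbiased subset Ψ and then keeping the Q largest virtual amplitudes might look like the sketch below. Evenly strided indices are just one assumed way to build such a subset, and the DFT templates and all names are hypothetical.

```python
import cmath
import math


def virtual_amplitude(template, signal):
    """Least-squares complex amplitude of one template against a signal."""
    return (sum(d.conjugate() * x for d, x in zip(template, signal))
            / sum(abs(d) ** 2 for d in template))


def initial_indices_from_subset(templates, observed, n_reflections, stride):
    """Rank only every stride-th template and keep the Q largest amplitudes."""
    subset = range(0, len(templates), stride)  # assumed form of the subset
    amplitudes = {x: virtual_amplitude(templates[x], observed) for x in subset}
    ranked = sorted(amplitudes, key=lambda x: abs(amplitudes[x]), reverse=True)
    return ranked[:n_reflections]


# Hypothetical orthogonal templates (DFT vectors) standing in for 8 directions.
templates = [[cmath.exp(2j * math.pi * p * m / 8) for m in range(8)]
             for p in range(8)]
observed = [3.0 * a + 1.0 * b + 0.2 * c
            for a, b, c in zip(templates[4], templates[2], templates[0])]
picked = initial_indices_from_subset(templates, observed, 2, stride=2)
```

Evaluating only |Ψ| templates instead of all of them reduces the cost of the initial search, at the price of coarser initial directions that the later correction processes are then expected to refine.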

The reflected sound information estimation method according to any one of claims 12 to 17, wherein
the power of the residual signal is the power summed over all of the frequencies.
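Summing the residual power over all frequencies can be written as below. The nested-list layout, indexed by frequency bin and then by microphone, is an assumption about how the residual is stored.

```python
def total_residual_power(residual):
    """residual[k][m]: residual at frequency bin k and microphone m."""
    return sum(abs(value) ** 2
               for per_frequency in residual
               for value in per_frequency)


# Two frequency bins, two microphones (illustrative values only).
residual = [[1 + 1j, 0j], [2 + 0j, 1j]]
total = total_residual_power(residual)
```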

The reflected sound information estimation method according to any one of claims 11 to 18, wherein,
in the arrival time difference estimation process,
a reference reflected sound common to all of the target reflected sounds is determined from among the Q reflected sounds.

The reflected sound information estimation method according to any one of claims 11 to 18, wherein,
in the arrival time difference estimation process,
a reference reflected sound is determined from among the Q reflected sounds for each target reflected sound.
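According to the abstract, the relative arrival time difference is the average over frequency of the argument of the target arrival amplitude relative to the reference arrival amplitude, divided by the frequency. The sketch below follows that description under two assumptions that the source does not state: a reflection arriving at time τ carries phase −ωτ (hence the sign flip), and the phase difference stays within (−π, π] so that no unwrapping is needed. The same function serves whether the reference is common to all targets or chosen per target; only the `reference_amps` argument changes.

```python
import cmath
import math


def arrival_time_difference(omegas, target_amps, reference_amps):
    """Mean over frequency of -arg(target / reference) / omega."""
    terms = [-cmath.phase(a / r) / w
             for w, a, r in zip(omegas, target_amps, reference_amps)]
    return sum(terms) / len(terms)


# Illustrative arrival amplitudes: reference at 1.0 ms, target at 1.5 ms.
omegas = [2.0 * math.pi * 100.0 * k for k in range(1, 6)]
reference_amps = [1.0 * cmath.exp(-1j * w * 0.0010) for w in omegas]
target_amps = [0.6 * cmath.exp(-1j * w * 0.0015) for w in omegas]

difference = arrival_time_difference(omegas, target_amps, reference_amps)
```

In this example the target arrives 0.5 ms after the reference, and the estimate should come out close to 0.0005 s.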

A program for causing a computer to execute processing of the reflected sound information estimation method according to any one of claims 11 to 20.