JP5454330B2 - Sound processor

Info

Publication number: JP5454330B2
Application number: JP2010099488A
Authority: JP
Inventors: 多伸近藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-04-23
Filing date: 2010-04-23
Publication date: 2014-03-26
Anticipated expiration: 2030-04-23
Also published as: JP2011227417A

Description

The present invention relates to a technique for suppressing a noise component contained in an acoustic signal.

Techniques have been proposed for suppressing the noise component in an acoustic signal that mixes a target sound component with a noise component. For example, Patent Document 1 discloses a technique that selects, from among the low-frequency component of each of a plurality of acoustic signals and the average of those low-frequency components, the component with the minimum intensity, and combines it with the high-frequency component of each acoustic signal, thereby generating a noise-suppressed signal in which wind noise is suppressed.

Patent Document 1: Japanese Patent No. 4356670

However, in the technique of Patent Document 1, the component used to generate the noise-suppressed signal is selected on the basis of intensity alone, so the target sound component may be removed when, for example, its intensity is low compared with the wind noise. In addition, when the average component of the plurality of acoustic signals is selected as the low-frequency component of the noise-suppressed signal, the waveform of the target sound component changes substantially in the course of generating the noise-suppressed signal, so the target sound component is not reproduced faithfully. In view of these circumstances, an object of the present invention is to suppress the noise component of an acoustic signal with high accuracy.

To solve the above problems, an acoustic processing device according to a first aspect of the present invention comprises: matrix decomposition means that, for each of a first acoustic signal and a second acoustic signal collected in parallel, performs non-negative matrix factorization of an observation matrix (for example, the observation matrix Vi in FIG. 2) whose elements are the time series of the component values of the acoustic signal at each frequency, thereby generating a basis matrix (for example, the basis matrix Wi in FIG. 4) containing a plurality of bases (for example, the bases Ci[1] to Ci[K] in FIG. 4) that represent the per-frequency component values of different components of the acoustic signal, and a coefficient matrix (for example, the coefficient matrix Hi in FIG. 5) containing a plurality of weight sequences (for example, the weight sequences Ei[1] to Ei[K] in FIG. 5) each representing the time series of the weight values of one basis; noise identification means that identifies, among the plurality of bases of the basis matrix of the first acoustic signal, a basis that is highly correlated with a basis of the basis matrix of the second acoustic signal as a noise basis (for example, the noise basis Ci_noise in FIG. 4) corresponding to the noise component of the first acoustic signal; coefficient sequence generation means that generates a processing coefficient sequence (for example, the processing coefficient sequence Gi in FIG. 1), in which a coefficient value is set for each frequency so that the noise component of the first acoustic signal is suppressed, in accordance with each basis other than the noise basis in the basis matrix of the first acoustic signal and the weight sequence corresponding to each such basis in the coefficient matrix of the first acoustic signal; and noise suppression means that applies the processing coefficient sequence to the first acoustic signal.

In this configuration, the observation matrix of each of the first and second acoustic signals is decomposed into a basis matrix and a coefficient matrix, and the processing coefficient sequence is generated after excluding, from the bases of the basis matrix of the first acoustic signal, the noise basis that is highly correlated with a basis of the basis matrix of the second acoustic signal. The noise component can therefore be suppressed with high accuracy even when the intensity of the target sound component of the first acoustic signal is low compared with the noise component. Moreover, because the bases other than the noise basis are retained, the noise component can be suppressed while the waveform of the target sound component is faithfully preserved.
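The noise-identification step described above can be sketched as follows. This is a minimal illustration, not the patent's stated procedure: the text only requires a "high correlation" between bases, so the Pearson correlation measure, the 0.9 threshold, and the function name `find_noise_bases` are all assumptions introduced here.

```python
import numpy as np

def find_noise_bases(W1, W2, threshold=0.9):
    """Indices of bases (columns) of W1 highly correlated with some basis of W2.

    Pearson correlation and the 0.9 threshold are illustrative assumptions.
    """
    def standardize(W):
        # Remove the mean of each basis and scale it to unit norm.
        centered = W - W.mean(axis=0, keepdims=True)
        norms = np.linalg.norm(centered, axis=0, keepdims=True)
        return centered / np.maximum(norms, 1e-12)

    # corr[k1, k2] is the correlation between basis k1 of W1 and basis k2 of W2.
    corr = standardize(W1).T @ standardize(W2)
    return np.flatnonzero(corr.max(axis=1) >= threshold)

# Toy basis matrices (M = 4 frequencies, K = 2 bases per channel).
W1 = np.array([[4.0, 1.0], [3.0, 0.0], [2.0, 1.0], [1.0, 0.0]])
W2 = np.array([[8.0, 0.0], [6.0, 1.0], [4.0, 0.0], [2.0, 1.0]])
noise_idx = find_noise_bases(W1, W2)
```

In this toy input, the first basis of `W1` has the same spectral shape as the first basis of `W2`, so it is the only one flagged as a noise basis.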

In a preferred example of the first aspect, the matrix decomposition means generates the basis matrix and the coefficient matrix for the component of the first acoustic signal in a first band (for example, the band BL in FIG. 2), the noise suppression means applies the processing coefficient sequence to the first-band component of the first acoustic signal, and the device further comprises waveform synthesis means that combines the component processed by the noise suppression means with the component of the first acoustic signal in a second band (for example, the band BH in FIG. 2) on the higher-frequency side of the first band. In this aspect, the processing coefficient sequence is generated for and applied to only the first-band component of the first acoustic signal, and the processed component is then combined with the second-band component. The processing load is therefore lower than in a configuration that generates and applies the processing coefficient sequence over the entire band of the first acoustic signal. Setting the first band so that it contains the frequencies of the noise component has the further advantage that a processing coefficient sequence capable of suppressing the noise component with high accuracy can be generated.

An acoustic processing device according to a preferred example of the first aspect comprises phase difference calculation means that calculates the phase difference between the first acoustic signal and the second acoustic signal (for example, the phase difference ΔP[tn] in FIG. 6), and the coefficient sequence generation means variably sets each coefficient value of the processing coefficient sequence in accordance with that phase difference. For example, each coefficient value is set so that the noise suppression effect of the processing coefficient sequence increases as the phase difference between the first and second acoustic signals becomes larger (that is, as the noise component becomes more dominant). Because the phase difference between the two signals is reflected in the processing coefficient sequence, the noise component can be suppressed more thoroughly than in a configuration that does not reflect the phase difference.

An acoustic processing device according to a preferred example of the first aspect comprises intensity difference calculation means that calculates the intensity difference (amplitude difference or power difference) between the first acoustic signal and the second acoustic signal, and the coefficient sequence generation means variably sets each coefficient value of the processing coefficient sequence in accordance with that intensity difference (for example, the intensity difference ΔA[tn] in FIG. 6). For example, each coefficient value is set so that the noise suppression effect of the processing coefficient sequence increases as the intensity difference between the first and second acoustic signals becomes larger (that is, as the noise component becomes more dominant). Because the intensity difference between the two signals is reflected in the processing coefficient sequence, the noise component can be suppressed more thoroughly than in a configuration that does not reflect the intensity difference.

An acoustic processing device according to a second aspect of the present invention comprises: matrix decomposition means (for example, the matrix decomposition unit 42 in FIG. 7) that, for each of a first acoustic signal and a second acoustic signal collected in parallel, performs non-negative matrix factorization of an observation matrix (for example, the observation matrix Vi in FIG. 2) whose elements are the time series of the component values of the acoustic signal at each frequency, thereby generating a basis matrix (for example, the basis matrix Wi in FIG. 4) containing a plurality of bases (for example, the bases Ci[1] to Ci[K] in FIG. 4) that represent the per-frequency component values of different components of the acoustic signal, and a coefficient matrix (for example, the coefficient matrix Hi in FIG. 5) containing a plurality of weight sequences (for example, the weight sequences Ei[1] to Ei[K] in FIG. 5) each representing the time series of the weight values of one basis; noise identification means (for example, the noise identification unit 44 in FIG. 7) that identifies, among the plurality of bases of the basis matrix of the first acoustic signal, a basis that is highly correlated with a basis of the basis matrix of the second acoustic signal as a noise basis (for example, the noise basis Ci_noise in FIG. 4) corresponding to the noise component of the first acoustic signal; and suppression processing means (for example, the suppression processing unit 48 in FIG. 7) that generates an acoustic signal in which the noise component of the first acoustic signal is suppressed by multiplying each basis other than the noise basis in the basis matrix of the first acoustic signal by the weight sequence corresponding to that basis in the coefficient matrix of the first acoustic signal. This configuration achieves the same effects as the first aspect. In addition, because the generation and application of the processing coefficient sequence are omitted, the configuration and processing of the acoustic processing device are simpler than in the first aspect.
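The suppression processing of the second aspect amounts to rebuilding the spectrogram from the non-noise bases only. The sketch below assumes NumPy and assumes that the noise basis indices have already been identified; the function name `suppress_by_reconstruction` is hypothetical.

```python
import numpy as np

def suppress_by_reconstruction(W, H, noise_idx):
    """Multiply the non-noise bases of W by their weight sequences in H."""
    # Keep every basis index that was not identified as a noise basis.
    keep = np.setdiff1d(np.arange(W.shape[1]), noise_idx)
    # (M x K') @ (K' x N) gives the noise-suppressed M x N spectrogram.
    return W[:, keep] @ H[keep, :]

W = np.array([[1.0, 2.0], [3.0, 4.0]])            # M = 2, K = 2
H = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])  # K = 2, N = 3
Y = suppress_by_reconstruction(W, H, noise_idx=[0])
```

With basis 0 treated as noise, only the contribution of basis 1 remains in `Y`.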

The scope of the present invention (first and second aspects) is not limited to configurations that process two channels of acoustic signals. A configuration that processes three or more channels of acoustic signals naturally falls within the scope of the present invention if it satisfies the requirements of the invention when any two particular channels are considered.

The acoustic processing device according to each of the above aspects can be realized by hardware (electronic circuitry) dedicated to acoustic signal processing, such as a DSP (Digital Signal Processor), or by the cooperation of a program with a general-purpose arithmetic processing device such as a CPU (Central Processing Unit). The programs of the present invention exemplified below may be provided to the user stored on a computer-readable recording medium and installed on a computer, or may be provided from a server device by distribution over a communication network and installed on a computer.

A program according to the first aspect of the present invention causes a computer to execute: a matrix decomposition process that, for each of a first acoustic signal and a second acoustic signal collected in parallel, performs non-negative matrix factorization of an observation matrix whose elements are the time series of the component values of the acoustic signal at each frequency, thereby generating a basis matrix containing a plurality of bases that represent the per-frequency component values of different components of the acoustic signal, and a coefficient matrix containing a plurality of weight sequences each representing the time series of the weight values of one basis; a noise identification process that identifies, among the plurality of bases of the basis matrix of the first acoustic signal, a basis that is highly correlated with a basis of the basis matrix of the second acoustic signal as a noise basis corresponding to the noise component of the first acoustic signal; a coefficient sequence generation process that generates a processing coefficient sequence, in which a coefficient value is set for each frequency so that the noise component of the first acoustic signal is suppressed, in accordance with each basis other than the noise basis in the basis matrix of the first acoustic signal and the weight sequence corresponding to each such basis in the coefficient matrix of the first acoustic signal; and a noise suppression process that applies the processing coefficient sequence to the first acoustic signal. This program achieves the same operation and effects as the acoustic processing device according to the first aspect.

A program according to the second aspect of the present invention causes a computer to execute: a matrix decomposition process that, for each of a first acoustic signal and a second acoustic signal collected in parallel, performs non-negative matrix factorization of an observation matrix (for example, the observation matrix Vi in FIG. 2) whose elements are the time series of the component values of the acoustic signal at each frequency, thereby generating a basis matrix (for example, the basis matrix Wi in FIG. 4) containing a plurality of bases (for example, the bases Ci[1] to Ci[K] in FIG. 4) that represent the per-frequency component values of different components of the acoustic signal, and a coefficient matrix (for example, the coefficient matrix Hi in FIG. 5) containing a plurality of weight sequences (for example, the weight sequences Ei[1] to Ei[K] in FIG. 5) each representing the time series of the weight values of one basis; a noise identification process that identifies, among the plurality of bases of the basis matrix of the first acoustic signal, a basis that is highly correlated with a basis of the basis matrix of the second acoustic signal as a noise basis (for example, the noise basis Ci_noise in FIG. 4) corresponding to the noise component of the first acoustic signal; and a suppression process that generates an acoustic signal in which the noise component of the first acoustic signal is suppressed by multiplying each basis other than the noise basis in the basis matrix of the first acoustic signal by the weight sequence corresponding to that basis in the coefficient matrix of the first acoustic signal. This program achieves the same operation and effects as the acoustic processing device according to the second aspect.

FIG. 1 is a block diagram of an acoustic processing device according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram of an observation matrix.
FIG. 3 is a block diagram of a characteristic analysis unit.
FIG. 4 is an explanatory diagram of a basis matrix.
FIG. 5 is an explanatory diagram of a coefficient matrix.
FIG. 6 is a block diagram of a characteristic analysis unit in a second embodiment.
FIG. 7 is a block diagram of a noise suppression unit in a third embodiment.

<A: First Embodiment>
FIG. 1 is a block diagram of an acoustic processing device 100 according to a first embodiment of the present invention. As shown in FIG. 1, a signal supply device 12 and a sound emitting device 14 are connected to the acoustic processing device 100.

The signal supply device 12 supplies the acoustic processing device 100 with stereo acoustic signals s1 and s2 collected in parallel (simultaneously) at different positions. Each acoustic signal si (i = 1, 2) is a time-domain signal representing the sound pressure waveform of a mixture of a target sound component and a noise component. FIG. 1 illustrates, as the signal supply device 12, a plurality of sound collecting devices 122 (for example, omnidirectional stereo microphones) placed apart from one another. However, a playback device that reads each acoustic signal si from a portable or built-in recording medium and supplies it to the acoustic processing device 100, or a communication device that receives each acoustic signal si from a communication network and supplies it to the acoustic processing device 100, may also be used as the signal supply device 12.

The acoustic processing device 100 generates stereo acoustic signals q1 and q2 from the acoustic signals s1 and s2. Each acoustic signal qi is a time-domain signal obtained by suppressing the noise component of (that is, emphasizing the target sound component of) the acoustic signal si. The sound emitting device 14 (for example, stereo speakers or stereo headphones) emits sound waves corresponding to the acoustic signals q1 and q2 generated by the acoustic processing device 100. For convenience, the A/D converter that converts the acoustic signal si from analog to digital and the D/A converter that converts the acoustic signal qi from digital to analog are not shown.

As shown in FIG. 1, the acoustic processing device 100 is realized by a computer system comprising an arithmetic processing device 22 and a storage device 24. The storage device 24 stores a program PG executed by the arithmetic processing device 22 and various data used by the arithmetic processing device 22. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media, may be used as the storage device 24. A configuration in which the acoustic signals s1 and s2 are stored in the storage device 24 (so that the signal supply device 12 can be omitted) is also suitable.

By executing the program PG stored in the storage device 24, the arithmetic processing device 22 realizes a plurality of functions for generating the acoustic signal qi from the acoustic signal si (a frequency analysis unit 32, a characteristic analysis unit 34A, a noise suppression unit 36, and a waveform synthesis unit 38). A configuration in which the functions of the arithmetic processing device 22 are distributed over a plurality of integrated circuits, or in which a dedicated electronic circuit (DSP) realizes each function, may also be adopted.

The frequency analysis unit 32 in FIG. 1 sequentially generates a spectrum S1 of the acoustic signal s1 and a spectrum S2 of the acoustic signal s2 for each unit period (frame) on the time axis. As shown in FIG. 2, each spectrum Si is a power spectrum in which a plurality of component values (powers) xi corresponding to different frequencies on the frequency axis (f1, f2, ..., fM, ...) are arranged. Any known frequency analysis, such as the short-time Fourier transform, may be used to generate the spectrum Xi.
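The per-frame power spectrum described above can be sketched with a short-time Fourier transform. This is an illustration under assumptions not stated in the source: NumPy, a Hann window, a 1024-sample frame, and a 512-sample hop. The function name `power_spectrogram` is hypothetical. The phase is returned as well because the later waveform synthesis step reuses the phase spectrum of the input signal.

```python
import numpy as np

def power_spectrogram(s, frame_len=1024, hop=512):
    """Return (power, phase), each of shape (frame_len // 2 + 1, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(s) - frame_len) // hop
    # One windowed frame per column (one column per unit period).
    frames = np.stack(
        [s[n * hop : n * hop + frame_len] * window for n in range(n_frames)],
        axis=1,
    )
    spectrum = np.fft.rfft(frames, axis=0)
    return np.abs(spectrum) ** 2, np.angle(spectrum)

# Example: one second of a 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
S, phase = power_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
```

Each column of `S` is the power spectrum of one unit period, and every entry is non-negative, which is what the later factorization requires.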

The spectrum Si of each acoustic signal si is divided into a spectrum Xi within a band BL and a spectrum XHi within a band BH. The band BL is set so as to contain the frequencies of the noise component. This embodiment assumes wind noise as the noise component. Wind noise is a noise component generated when the air itself flows and strikes the diaphragm of the sound collecting device 122 directly. The frequency of diaphragm vibration caused by such air impact is low compared with the frequency of sound waves that reach the diaphragm as vibration of the air (changes in sound pressure). Specifically, wind noise is dominated by low-frequency components, for example at 1.5 kHz or below. In view of this tendency, the band BL is set to a frequency band of 1.5 kHz or below that contains M frequencies f1 to fM (M is a natural number). The band BH is the frequency band on the higher side of the band BL (for example, 1.5 kHz and above).
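The division into the bands BL and BH can be sketched as a split of the spectrogram rows at a cutoff frequency. The sketch assumes NumPy, a spectrogram whose rows are the bins of a real FFT with an even frame length (so the bins run evenly from 0 Hz to fs/2), and the 1.5 kHz cutoff given as an example in the text; `split_bands` is a hypothetical name.

```python
import numpy as np

def split_bands(S, fs, cutoff_hz=1500.0):
    """Split spectrogram rows into a low band (<= cutoff) and a high band."""
    n_bins = S.shape[0]
    # Bin centre frequencies of a real FFT with an even frame length.
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    low = freqs <= cutoff_hz
    return S[low], S[~low]

S_demo = np.ones((513, 4))               # 513 bins, 4 unit periods
X, XH = split_bands(S_demo, fs=16000)    # band BL and band BH
```

Here M, the number of frequencies in band BL, is simply the number of rows in `X`.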

As shown in FIG. 2, the time series of the spectra Si generated by the frequency analysis unit 32 (that is, the spectrogram) is divided on the time axis into analysis periods TA, each consisting of N unit periods t1 to tN. The analysis period TA is set to a long duration, for example several tens of seconds. For each analysis period TA, an observation matrix Vi of M rows by N columns is defined by arranging the M component values xi[f1,tn] to xi[fM,tn] of the spectrum Xi within the band BL in time series over the N unit periods t1 to tN of the analysis period TA. That is, the n-th column (n = 1 to N) of the observation matrix Vi is the sequence of the M component values xi[f1,tn] to xi[fM,tn] of the spectrum Xi in the n-th unit period tn of the analysis period TA, and the m-th row (m = 1 to M) is the time series of the component values xi[fm,t1] to xi[fm,tN] at the frequency fm over the N unit periods t1 to tN. Because each component value xi[fm,tn] of the spectrum Xi is a power (a non-negative value), the observation matrix Vi is a non-negative matrix (a matrix containing no negative numbers).
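Building the observation matrices amounts to cutting the low-band spectrogram into blocks of N consecutive unit periods. The sketch below assumes NumPy and, as a simplification not stated in the source, discards any trailing frames that do not fill a complete analysis period; `observation_matrices` is a hypothetical name.

```python
import numpy as np

def observation_matrices(X, n_frames_per_period):
    """Cut an M x T low-band spectrogram into a list of M x N matrices."""
    N = n_frames_per_period
    n_periods = X.shape[1] // N   # incomplete trailing period is dropped
    return [X[:, p * N : (p + 1) * N] for p in range(n_periods)]

X_demo = np.arange(24, dtype=float).reshape(3, 8)   # M = 3, T = 8
V_list = observation_matrices(X_demo, n_frames_per_period=4)
```

Each element of `V_list` has frequencies along its rows and unit periods along its columns, matching the layout of Vi described above.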

The characteristic analysis unit 34A in FIG. 1 analyzes the observation matrix Vi to generate a processing coefficient sequence Gi (G1, G2) for each analysis period TA. The processing coefficient sequence Gi is a matrix of M rows by N columns in which the coefficient values gi[f1,t1] to gi[fM,tN] are arranged. The coefficient value gi[fm,tn] is the gain (spectral gain) applied to the component value xi[fm,tn] at the frequency fm in the spectrum Xi of the unit period tn, and is set variably within the range from 0 to 1 inclusive in accordance with the characteristics of the acoustic signal si (the intensity of the wind noise). Specifically, the more dominant the wind noise is in the component at the frequency fm of the acoustic signal si within the unit period tn, the smaller the coefficient value gi[fm,tn] is set.

The noise suppression unit 36 applies the processing coefficient sequence Gi generated by the characteristic analysis unit 34A to the observation matrix Vi of the acoustic signal si, thereby sequentially generating, for each analysis period TA, a time series of N spectra Yi corresponding to the N unit periods t1 to tN (the spectrogram within the analysis period TA). The spectrum Yi corresponding to the unit period tn is a power spectrum in which M component values (powers) yi[f1,tn] to yi[fM,tn] corresponding to the different frequencies fm are arranged.

The component value yi[fm,tn] at the frequency fm in the spectrum Yi of the unit period tn is set to the product of the corresponding coefficient value gi[fm,tn] in the processing coefficient sequence Gi and the corresponding component value xi[fm,tn] in the observation matrix Vi (yi[fm,tn] = gi[fm,tn] × xi[fm,tn]). As described above, the coefficient value gi[fm,tn] is set to a smaller value the more dominant the wind noise is, so the spectrum Yi produced by the noise suppression unit 36 corresponds to the spectrum Xi of the acoustic signal si with the wind noise suppressed (the spectrum of the band-BL component of the acoustic signal qi). In other words, the noise suppression unit 36 functions as an element that suppresses wind noise in each of the acoustic signals s1 and s2.
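The multiplication yi[fm,tn] = gi[fm,tn] × xi[fm,tn] is an element-wise product of two M × N matrices. A minimal sketch, assuming NumPy; the clipping to the range 0 to 1 is an added safeguard that reflects the stated range of the coefficient values, and `apply_gains` is a hypothetical name.

```python
import numpy as np

def apply_gains(G, V):
    """Element-wise spectral gain: Y[m, n] = G[m, n] * V[m, n]."""
    # Coefficient values are specified to lie between 0 and 1 inclusive.
    return np.clip(G, 0.0, 1.0) * V

G_demo = np.array([[1.0, 0.5], [0.0, 1.0]])
V_demo = np.array([[4.0, 2.0], [6.0, 8.0]])
Y_demo = apply_gains(G_demo, V_demo)
```

A gain of 1 leaves a time-frequency component unchanged, and a gain of 0 removes it entirely.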

The waveform synthesis unit 38 in FIG. 1 generates the time-domain acoustic signal qi from the spectrum Yi (band BL) generated by the noise suppression unit 36 for each unit period tn and the spectrum XHi (band BH) generated by the frequency analysis unit 32. Specifically, the waveform synthesis unit 38 generates a spectrum Qi for each unit period tn by combining (adding) the spectrum Yi of the band BL and the spectrum XHi of the band BH, generates a time-domain signal by an inverse Fourier transform that uses the amplitude spectrum corresponding to the spectrum Qi and the phase spectrum of the acoustic signal si, and joins the signals of successive unit periods tn to generate the acoustic signal qi.
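The synthesis step can be sketched as follows: stack the two bands, take the square root of the power to obtain the amplitude, attach the phase of the input signal, invert each frame, and join frames by overlap-add. The sketch assumes NumPy and the same hypothetical frame length and hop as an analysis stage with 50% overlap. The source says only that successive unit periods are joined, so overlap-add is an assumption, and `synthesize` is a hypothetical name.

```python
import numpy as np

def synthesize(Y, XH, phase, frame_len=1024, hop=512):
    """Rebuild a time-domain signal from band-BL and band-BH power spectra."""
    Q = np.concatenate([Y, XH], axis=0)           # full-band power spectrum
    spectrum = np.sqrt(Q) * np.exp(1j * phase)    # amplitude with input phase
    frames = np.fft.irfft(spectrum, n=frame_len, axis=0)
    n_frames = frames.shape[1]
    q = np.zeros(frame_len + hop * (n_frames - 1))
    for n in range(n_frames):
        # Overlap-add each frame at its position on the time axis.
        q[n * hop : n * hop + frame_len] += frames[:, n]
    return q
```

If the analysis window sums to one across overlapping frames (for example a periodic Hann window with a hop of half the frame length) and no gain is applied, the interior of the output reproduces the input.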

FIG. 3 is a block diagram of the characteristic analysis unit 34A. As shown in FIG. 3, the characteristic analysis unit 34A of the first embodiment comprises a matrix decomposition unit 42, a noise identification unit 44, and a coefficient sequence generation unit 46. For each of the observation matrix V1 of the acoustic signal s1 and the observation matrix V2 of the acoustic signal s2, the matrix decomposition unit 42 generates a basis matrix Wi (W1, W2) and a coefficient matrix Hi (H1, H2) by non-negative matrix factorization (NMF) of the observation matrix Vi. The basis matrix Wi (FIG. 4) is a non-negative matrix of M rows by K columns in which the component values wi[f1,1] to wi[fM,K] are arranged, and the coefficient matrix Hi (FIG. 5) is a non-negative matrix of K rows by N columns in which the weight values hi[1,t1] to hi[K,tN] are arranged (K is a natural number). The basis matrix Wi and the coefficient matrix Hi are calculated so that their product approximates the observation matrix Vi (Vi ≈ WiHi).

As shown in FIG. 4, the basis matrix Wi is composed of K bases (codebooks) Ci[1] to Ci[K]. The basis Ci[k] in the k-th column (k = 1 to K) of the basis matrix Wi corresponds to the sequence (power spectrum) of component values wi[f1,k] to wi[fM,k] at the frequencies f1 to fM for the k-th of the K kinds of components estimated to constitute the acoustic signal si within the analysis period TA. The coefficient matrix Hi, on the other hand, is composed of K weight sequences (excitations) Ei[1] to Ei[K], as shown in FIG. 5. The weight sequence Ei[k] in the k-th row of the coefficient matrix Hi corresponds to the time series of weight values hi[k,t1] to hi[k,tN], one per unit period tn, for the component represented by the basis Ci[k] of the basis matrix Wi (that is, the temporal variation of the component values wi[fm,k] of the basis Ci[k]). As understood from these definitions, the spectrum Xi of the unit period tn is approximated by the weighted sum of the K bases Ci[1] to Ci[K] using the K weight values hi[1,tn] to hi[K,tn] of the coefficient matrix Hi that correspond to the unit period tn (Xi ≈ hi[1,tn] × Ci[1] + hi[2,tn] × Ci[2] + ... + hi[K,tn] × Ci[K]).

Any known method may be used for the non-negative matrix factorization of the observation matrix Vi. For example, a method that sequentially updates (iteratively computes) the basis matrix Wi and the coefficient matrix Hi so as to minimize the difference (for example, the distance) between the product of the basis matrix Wi and the coefficient matrix Hi and the observation matrix Vi is suitable. The initial value of the basis matrix Wi used in the iterative computation (the initial value of each component value wi[fm,k]) is set, for example, to a random number. A configuration in which the initial values of the M component values wi[f1,k] to wi[fM,k] of each basis Ci[k] are set so as to simulate the spectrum of wind noise (a frequency characteristic that attenuates toward higher frequencies) is also suitable.
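One well-known iterative scheme of the kind described above is the multiplicative update rule for the Euclidean distance. The sketch below uses it purely as an example; the source does not say which update rule the embodiment uses, and the function name `nmf`, the iteration count, and the small constant `eps` are assumptions.

```python
import numpy as np

def nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Factorise a non-negative (M, N) matrix V into W (M, K) and H (K, N)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, K)) + eps    # random initial basis matrix
    H = rng.random((K, N)) + eps    # random initial coefficient matrix
    for _ in range(n_iter):
        # Multiplicative updates keep every element non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

To mimic the wind-noise initialisation mentioned in the text, the random `W` could be replaced by columns that decay with frequency; that variant is not shown here.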

For each of the basis matrix W1 of the acoustic signal s1 and the basis matrix W2 of the acoustic signal s2, the noise identification unit 44 in FIG. 3 identifies, among the K bases Ci[1] to Ci[K] of the basis matrix Wi, the one basis Ci_noise that corresponds to wind noise (hereinafter the "noise basis"). Wind noise is generated by the turbulent flow of air striking the sound collecting devices 122, so the instantaneous frequency characteristics of the wind noise contained in the acoustic signal s1 and the acoustic signal s2, which are collected at different positions, are statistically independent of each other. The long-term frequency characteristic of wind noise, however, tends to stay the same regardless of the sound collection position, more so than that of speech and similar sounds. In other words, the frequency characteristic of wind noise over a long period such as the analysis period TA tends to be similar between the acoustic signal s1 and the acoustic signal s2.

In view of this tendency, the noise identification unit 44 identifies, from each of the basis matrix W1 and the basis matrix W2, the bases Ci[k] (C1[k1], C2[k2]) that are highly correlated between the basis matrix W1 of the acoustic signal s1 (bases C1[1] to C1[K]) and the basis matrix W2 of the acoustic signal s2 (bases C2[1] to C2[K]) as the noise bases Ci_noise. For example, for every combination of one basis C1[k] of the basis matrix W1 and one basis C2[k] of the basis matrix W2, an index indicating the degree of correlation between the two bases (a correlation index) is calculated, and the bases C1[k1] and C2[k2] of the combination for which the correlation index indicates the highest degree of correlation are extracted as the noise bases Ci_noise (C1_noise, C2_noise); the variables k1 and k2 may or may not have the same value. A distance (Euclidean distance) or an inner product, for example, is suitable as the correlation index between the basis C1[k] and the basis C2[k].
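The exhaustive pairwise search can be sketched as below. The inner product is one of the correlation indices named in the text; normalising each basis to unit length first (so that the index becomes a cosine similarity) is an assumption added here, as is the function name `find_noise_bases`.

```python
import numpy as np

def find_noise_bases(W1, W2):
    """Return (k1, k2), the column indices of the most correlated basis pair."""
    # Unit-normalise every basis so that large-energy bases do not win by scale alone.
    U1 = W1 / (np.linalg.norm(W1, axis=0, keepdims=True) + 1e-12)
    U2 = W2 / (np.linalg.norm(W2, axis=0, keepdims=True) + 1e-12)
    corr = U1.T @ U2                 # correlation index for all K x K combinations
    k1, k2 = np.unravel_index(np.argmax(corr), corr.shape)
    return int(k1), int(k2)
```

If a Euclidean distance were used as the index instead, the pair with the smallest distance would be selected.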

For each of the acoustic signal s1 and the acoustic signal s2, the coefficient sequence generation unit 46 in FIG. 3 generates a processing coefficient sequence Gi (G1, G2) from two matrices: the matrix WAi of M rows × (K-1) columns obtained by excluding the noise basis Ci_noise from the basis matrix Wi of the acoustic signal si, as shown in FIG. 4, and the matrix HAi of (K-1) rows × N columns obtained by excluding the weight sequence Ei_noise corresponding to the noise basis Ci_noise from the coefficient matrix Hi, as shown in FIG. 5.

First, the coefficient sequence generation unit 46 calculates a matrix VAi by multiplying the matrix WAi remaining after exclusion of the noise basis Ci_noise (M rows × (K-1) columns) by the matrix HAi remaining after exclusion of the weight sequence Ei_noise ((K-1) rows × N columns). The matrix VAi is a matrix of M rows × N columns in which element values va[f1,t1] to va[fM,tN] are arranged. The sequence of M element values va[f1,tn] to va[fM,tn] in the n-th column of the matrix VAi corresponds to a power spectrum obtained by suppressing wind noise in the spectrum Xi of the unit period tn.
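As a small illustration of this first step, the noise column and the matching row can be dropped before multiplying. The function name `suppressed_spectrogram` and the argument `k_noise` (the index of the noise basis) are hypothetical names introduced for this sketch.

```python
import numpy as np

def suppressed_spectrogram(W, H, k_noise):
    """Compute VA = WA @ HA after removing the noise basis and its weights."""
    WA = np.delete(W, k_noise, axis=1)   # M x (K-1): drop the noise basis column
    HA = np.delete(H, k_noise, axis=0)   # (K-1) x N: drop the matching weight row
    return WA @ HA                       # M x N matrix VA
```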

Second, the coefficient sequence generation unit 46 calculates each coefficient value gi[fm,tn] of the processing coefficient sequence Gi by the following formula (A). The symbol v[fm,tn] in formula (A) denotes the element in the m-th row and n-th column of the M-row × N-column matrix obtained by multiplying the basis matrix Wi before exclusion of the noise basis Ci_noise by the coefficient matrix Hi before exclusion of the weight sequence Ei_noise, and corresponds to an estimate of the component value xi[fm,tn] of the spectrum Xi.
gi[fm,tn] = va[fm,tn] / v[fm,tn] ... (A)
The element value va[fm,tn] is divided by the element value v[fm,tn] in formula (A) in order to limit the coefficient value gi[fm,tn] to a numerical value in the range from 0 to 1 inclusive. The processing coefficient sequence G1 and the processing coefficient sequence G2 generated by the above procedure are applied to the suppression of wind noise by the noise suppression unit 36 in FIG. 1.
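Formula (A) can be sketched as an element-wise division. Because every element of the basis and coefficient matrices is non-negative, removing one basis can only reduce each product element, which is why the ratio stays between 0 and 1. The function name, the argument `k_noise`, and the small constant `eps` that guards against division by zero are assumptions made for this sketch.

```python
import numpy as np

def processing_coefficients(W, H, k_noise, eps=1e-12):
    """Gain g = va / v of formula (A) for every frequency and unit period."""
    V_full = W @ H                                   # v[fm, tn], noise basis included
    V_a = np.delete(W, k_noise, axis=1) @ np.delete(H, k_noise, axis=0)  # va[fm, tn]
    return V_a / (V_full + eps)                      # element-wise ratio
```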

As understood from the above description, the more wind noise dominates the component at the frequency fm of the acoustic signal si within the unit period tn, the more the element value va[fm,tn] decreases relative to the element value v[fm,tn], so the coefficient value gi[fm,tn] becomes small. Conversely, when the component at the frequency fm of the acoustic signal si contains no wind noise, the element value va[fm,tn] is substantially equal to the element value v[fm,tn] (it is unaffected by the exclusion of the noise basis Ci_noise), so the coefficient value gi[fm,tn] is close to 1. Therefore, when the noise suppression unit 36 applies the processing coefficient sequence Gi to the observation matrix Vi as described above, an acoustic signal qi in which the wind noise of the acoustic signal si is suppressed is generated.

As described above, in the first embodiment the observation matrix Vi of the acoustic signal si is decomposed into the basis matrix Wi and the coefficient matrix Hi, and the processing coefficient sequence Gi is generated using the matrix WAi obtained by excluding the noise basis Ci_noise from the basis matrix Wi and the matrix HAi obtained by excluding the weight sequence Ei_noise from the coefficient matrix Hi. Wind noise can therefore be suppressed with high accuracy even when the intensity of the target sound component of the acoustic signal si is low compared with the wind noise. In addition, because every basis Ci[k] of the basis matrix Wi other than the noise basis Ci_noise and every weight sequence Ei[k] of the coefficient matrix Hi other than the weight sequence Ei_noise is retained, there is also the advantage that an acoustic signal qi in which the waveform of the target sound component of the acoustic signal si is faithfully preserved can be generated.

As a method for identifying the noise basis Ci_noise in the basis matrix Wi, a configuration that compares a model prepared in advance to simulate the frequency characteristic of wind noise with each basis Ci[k] of the basis matrix Wi could also be adopted. With a configuration that uses a wind noise model, however, wind noise whose frequency characteristic differs from that of the prepared model may not be sufficiently suppressed. In the first embodiment, by contrast, the bases Ci[k] that are highly correlated between the basis matrix W1 and the basis matrix W2 are identified as the noise bases Ci_noise, so wind noise with a wide variety of characteristics can be sufficiently suppressed, which is an advantage over a configuration that uses a wind noise model.

Furthermore, because the processing coefficient sequence Gi is generated and applied selectively for the band BL, which contains wind noise (and is omitted for the band BH), the processing load on the arithmetic processing device 22 can be reduced while the accuracy of wind noise suppression is maintained, compared with the case where the entire band of the acoustic signal si is processed.

<B: Second Embodiment>
Next, a second embodiment of the present invention will be described. In each of the following examples, elements whose operation and function are equivalent to those of the first embodiment are denoted by the reference signs used in the above description, and their detailed description is omitted as appropriate.

The target sound component arriving from the front direction at the two sound collecting devices 122 of the signal supply device 12 reaches each sound collecting device 122 with substantially the same intensity (amplitude) and almost no phase difference. Wind noise, on the other hand, is caused by air turbulence as described above, so it is unlikely to reach the sound collecting devices 122 with the same phase and the same amplitude. Consequently, the more wind noise dominates in the acoustic signal s1 and the acoustic signal s2, the larger the phase difference and the intensity difference between them tend to be. In view of this tendency, the present embodiment corrects each coefficient value gi[fm,tn] of the processing coefficient sequence Gi according to the phase difference and the intensity difference between the acoustic signal s1 and the acoustic signal s2.

The sound processing device 100 of the second embodiment has a configuration in which the characteristic analysis unit 34A of the first embodiment is replaced with the characteristic analysis unit 34B of FIG. 6. As shown in FIG. 6, the characteristic analysis unit 34B includes a phase difference calculation unit 52 and an intensity difference calculation unit 54 in addition to the same elements as the characteristic analysis unit 34A (the matrix decomposition unit 42, the noise identification unit 44, and the coefficient sequence generation unit 46). The component in a band BM of each acoustic signal si (s1, s2) is supplied to the phase difference calculation unit 52 and the intensity difference calculation unit 54. The band BM is set so as to include both the frequencies of wind noise and the frequencies of the main target sound component. For example, the band BM is set to the frequency band of 4 kHz and below (that is, a band that includes the band BL).

The phase difference calculation unit 52 calculates the phase difference ΔP[tn] between the acoustic signal s1 and the acoustic signal s2 for each unit period tn. The phase difference ΔP[tn] is, for example, a representative value (for example, the average) of the phase differences at the individual frequencies within the band BM. Similarly, the intensity difference calculation unit 54 calculates the intensity difference ΔA[tn] (for example, an amplitude difference or a power difference) between the acoustic signal s1 and the acoustic signal s2 for each unit period tn.
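A per-frame computation of the two differences might look like the sketch below. The text only requires a representative value per unit period; taking the mean of the absolute per-frequency differences, using amplitudes rather than powers, and the function name `frame_differences` are assumptions made here.

```python
import numpy as np

def frame_differences(S1, S2):
    """S1, S2: complex spectra of shape (M, N) restricted to band BM.

    Returns (delta_p, delta_a), each of shape (N,): one value per unit period.
    """
    # Multiplying by the conjugate wraps the phase difference into (-pi, pi].
    dphi = np.angle(S1 * np.conj(S2))
    delta_p = np.mean(np.abs(dphi), axis=0)                        # phase difference
    delta_a = np.mean(np.abs(np.abs(S1) - np.abs(S2)), axis=0)     # amplitude difference
    return delta_p, delta_a
```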

The coefficient sequence generation unit 46 calculates a coefficient value γi[fm,tn] by the same method as the coefficient value gi[fm,tn] of the first embodiment (γi[fm,tn] thus corresponds to gi[fm,tn] of the first embodiment), and then calculates the coefficient value gi[fm,tn] of the processing coefficient sequence Gi by correcting γi[fm,tn] according to the phase difference ΔP[tn] calculated by the phase difference calculation unit 52 and the intensity difference ΔA[tn] calculated by the intensity difference calculation unit 54.

First, the coefficient sequence generation unit 46 sets a coefficient value ξp[tn] for each unit period tn according to the phase difference ΔP[tn] calculated by the phase difference calculation unit 52. Specifically, the coefficient value ξp[tn] is set variably within the range from 0 to 1 inclusive so that it becomes smaller as the phase difference ΔP[tn] becomes larger (that is, as wind noise becomes more dominant in the unit period tn). Second, the coefficient sequence generation unit 46 sets a coefficient value ξa[tn] for each unit period tn according to the intensity difference ΔA[tn] calculated by the intensity difference calculation unit 54. Specifically, the coefficient value ξa[tn] is set variably within the range from 0 to 1 inclusive so that it becomes smaller as the intensity difference ΔA[tn] becomes larger (that is, as wind noise becomes more dominant in the unit period tn).

Third, the coefficient sequence generation unit 46 calculates each coefficient value gi[fm,tn] (gi[f1,t1] to gi[fM,tN]) of the processing coefficient sequence Gi by adjusting the coefficient value γi[fm,tn] calculated by formula (A) according to the coefficient value ξp[tn] and the coefficient value ξa[tn]. For example, the coefficient sequence generation unit 46 calculates the product of the coefficient value γi[fm,tn], the coefficient value ξp[tn], and the coefficient value ξa[tn] as the coefficient value gi[fm,tn] (gi[fm,tn] = γi[fm,tn] × ξp[tn] × ξa[tn]). As understood from the above description, the smaller the coefficient value ξp[tn] or the coefficient value ξa[tn] (that is, the more wind noise dominates in the unit period tn), the smaller the coefficient value gi[fm,tn]. The second embodiment therefore has the advantage that wind noise can be suppressed more thoroughly than in the first embodiment.
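The correction can be illustrated as follows. The text specifies only that ξp and ξa lie between 0 and 1 and decrease as the respective difference grows; the linear mappings used here, the scale parameter `a_max`, and the function name are hypothetical choices made for this sketch.

```python
import numpy as np

def corrected_coefficients(gamma, delta_p, delta_a, a_max=1.0):
    """gamma: (M, N) gains from formula (A); delta_p, delta_a: (N,) per unit period."""
    # Hypothetical linear mappings onto [0, 1]; larger difference -> smaller xi.
    xi_p = 1.0 - np.clip(delta_p / np.pi, 0.0, 1.0)
    xi_a = 1.0 - np.clip(delta_a / a_max, 0.0, 1.0)
    # The same xi values apply to every frequency row of a given unit period.
    return gamma * xi_p[np.newaxis, :] * xi_a[np.newaxis, :]
```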

In the above example the phase difference ΔP[tn] and the intensity difference ΔA[tn] are calculated for each unit period tn, but a configuration that calculates a phase difference ΔP[fm,tn] and an intensity difference ΔA[fm,tn] for each frequency fm in each unit period tn is also suitable. In that case the coefficient sequence generation unit 46 calculates, for each combination of a frequency fm and a unit period tn, a coefficient value ξp[fm,tn] according to the phase difference ΔP[fm,tn] and a coefficient value ξa[fm,tn] according to the intensity difference ΔA[fm,tn]. The coefficient value gi[fm,tn] of the processing coefficient sequence Gi is calculated, for example, as the product of the coefficient value γi[fm,tn], the coefficient value ξp[fm,tn], and the coefficient value ξa[fm,tn].

<C: Third Embodiment>
Next, a third embodiment of the present invention will be described. In the first embodiment, the time series of the spectrum Yi is generated by applying the processing coefficient sequence Gi (G1, G2), which depends on the matrix VAi remaining after exclusion of the noise basis Ci_noise, to the observation matrix Vi of the acoustic signal si. As described above, the sequence of M element values va[f1,tn] to va[fM,tn] in the n-th column of the matrix VAi corresponds to a power spectrum obtained by suppressing wind noise in the spectrum Xi of the unit period tn. The third embodiment therefore uses the matrix VAi remaining after exclusion of the noise basis Ci_noise directly as the time series of the spectrum Yi. That is, the generation of the processing coefficient sequence Gi and its application to the observation matrix Vi are omitted.

The sound processing device 100 of the third embodiment has a configuration in which the characteristic analysis unit 34A and the noise suppression unit 36 of the first embodiment are replaced with the noise suppression unit 60 of FIG. 7. As shown in FIG. 7, the noise suppression unit 60 includes a matrix decomposition unit 42, a noise identification unit 44, and a suppression processing unit 48. The matrix decomposition unit 42 and the noise identification unit 44 are the same as in the first embodiment. That is, the matrix decomposition unit 42 generates the basis matrix Wi (W1, W2) and the coefficient matrix Hi (H1, H2) by non-negative matrix factorization of each observation matrix Vi, and the noise identification unit 44 identifies the noise basis C1_noise in the basis matrix W1 and the noise basis C2_noise in the basis matrix W2.

For each of the acoustic signal s1 and the acoustic signal s2, the suppression processing unit 48 sequentially generates, for each analysis period TA, a matrix VAi of M rows × N columns as the time series (spectrogram) of the spectrum Yi after wind noise suppression. The matrix VAi is calculated by multiplying the matrix WAi of M rows × (K-1) columns, obtained by excluding the noise basis Ci_noise from the basis matrix Wi of the acoustic signal si, by the matrix HAi of (K-1) rows × N columns, obtained by excluding the weight sequence Ei_noise corresponding to the noise basis Ci_noise from the coefficient matrix Hi. That is, the sequence of M element values va[f1,tn] to va[fM,tn] in the n-th column of the matrix VAi is supplied to the waveform synthesis unit 38 as the spectrum Yi of the unit period tn. The other operations are the same as in the first embodiment.

In the third embodiment, the time series of the spectrum Yi (the matrix VAi) is generated by multiplying the matrix WAi, obtained by excluding the noise basis Ci_noise from the basis matrix Wi, by the matrix HAi, obtained by excluding the weight sequence Ei_noise from the coefficient matrix Hi. The same effects as in the first embodiment are therefore achieved. In addition, because the calculation of the processing coefficient sequence Gi (formula (A)) and its application to the observation matrix Vi are omitted, there is also the advantage that the configuration and processing of the sound processing device 100 are simpler than in the first embodiment.

<D: Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are exemplified below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate.

(1) Modification 1
In each of the above embodiments the acoustic signal si is divided into the band BL and the band BH, but this band division may be omitted. For example, a configuration in which the entire band of the acoustic signal si is processed by the characteristic analysis unit 34 (34A, 34B) and the noise suppression unit 36 may be adopted. However, because the intensity of wind noise decreases in the higher band (for example, the band BH), a configuration that omits the band division of the acoustic signal si makes it difficult to extract an independent basis Ci[k] for wind noise with high accuracy by non-negative matrix factorization. Therefore, when the frequency band of the noise component to be suppressed is known in advance, the configuration described above, in which only the frequency band containing the noise component (the band BL) is processed by the characteristic analysis unit 34 (34A, 34B) and the noise suppression unit 36, is particularly suitable.

(2) Modification 2
In each of the above embodiments the processing coefficient sequence Gi is generated for each analysis period TA of the acoustic signal si, but the division into analysis periods TA may be omitted. For example, a configuration in which the processing coefficient sequence Gi is generated using the time series of the spectrum Xi for every unit period tn over the entire duration of the acoustic signal si as a single observation matrix Vi may be adopted.

(3) Modification 3
Each of the above embodiments illustrates a configuration in which the spectrum Si is a power spectrum, but a configuration in which the spectrum Si is an amplitude spectrum may also be adopted. Because the coefficient value gi[fm,tn] calculated by formula (A) above is a gain with respect to power, in a configuration in which the spectrum Si is an amplitude spectrum the square root of the right-hand side of formula (A), (va[fm,tn] / v[fm,tn])^(1/2), is calculated as the coefficient value gi[fm,tn].
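The relation between the power gain of formula (A) and the amplitude gain of this modification can be sketched as below; the function name and the guard constant `eps` are assumptions.

```python
import numpy as np

def amplitude_gain(v_a, v, eps=1e-12):
    """Square root of the power-domain ratio va / v of formula (A)."""
    return np.sqrt(v_a / (v + eps))
```

Applying this gain to an amplitude spectrum and then squaring the result gives the same outcome as applying the power gain of formula (A) to the corresponding power spectrum.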

(4) Modification 4
In each of the above embodiments the acoustic signal qi is generated by multiplying each component value xi[fm,tn] of the acoustic signal si by the corresponding coefficient value gi[fm,tn] of the processing coefficient sequence Gi, but the method of applying the processing coefficient sequence Gi to the acoustic signal si may be changed as appropriate. For example, a configuration in which the coefficient value gi[fm,tn] is added to each component value xi[fm,tn] of the acoustic signal si may be adopted. Conversely to the examples in the above embodiments, in a configuration in which the processing coefficient sequence Gi is generated so that the coefficient value gi[fm,tn] becomes larger as wind noise becomes more dominant, a configuration in which the component value xi[fm,tn] is divided by the coefficient value gi[fm,tn], or in which the coefficient value gi[fm,tn] is subtracted from the component value xi[fm,tn], may be adopted.

(5) Modification 5
In each of the above embodiments two channels of acoustic signals qi (q1, q2) are generated, but the embodiments apply equally when only a single-channel (monaural) acoustic signal q1 is generated. For example, the coefficient sequence generation unit 46 generates only the one processing coefficient sequence G1 corresponding to the observation matrix V1, and the noise suppression unit 36 generates only the time series of the spectrum Y1 by applying the processing coefficient sequence G1 to the observation matrix V1. In this configuration, the acoustic signal s2 is used to identify the noise basis C1_noise in the basis matrix W1 of the acoustic signal s1.

(6) Modification 6
The processing of the arithmetic processing device 22 (generation and application of the processing coefficient sequence Gi) can be executed in real time in parallel with the supply of the acoustic signal s1 and the acoustic signal s2, with each acoustic signal qi reproduced sequentially as each round of processing completes. However, a configuration (batch processing) in which generation of the acoustic signal qi starts only after the generation and application of the processing coefficient sequence Gi for an acoustic signal s1 and an acoustic signal s2 prepared in advance are complete is also suitable.

DESCRIPTION OF SYMBOLS: 100 ... sound processing device, 12 ... signal supply device, 14 ... sound emission device, 22 ... arithmetic processing device, 24 ... storage device, 32 ... frequency analysis unit, 34A, 34B ... characteristic analysis unit, 36 ... noise suppression unit, 38 ... waveform synthesis unit, 42 ... matrix decomposition unit, 44 ... noise identification unit, 46 ... coefficient sequence generation unit, 52 ... phase difference calculation unit, 54 ... intensity difference calculation unit.

Claims

An acoustic processing apparatus comprising:
matrix decomposition means for generating, for each of a first acoustic signal and a second acoustic signal collected in parallel, by non-negative matrix factorization of an observation matrix whose elements are a time series of component values for each frequency of the acoustic signal, a base matrix including a plurality of bases each indicating component values for each frequency of a different component of the acoustic signal, and a coefficient matrix including a plurality of weight sequences each indicating a time series of weight values of the corresponding basis;
noise identification means for identifying, among the plurality of bases of the base matrix of the first acoustic signal, a basis having a high correlation with a basis of the base matrix of the second acoustic signal as a noise basis corresponding to a noise component of the first acoustic signal;
coefficient sequence generation means for generating, according to each basis other than the noise basis in the base matrix of the first acoustic signal and the weight sequence corresponding to each such basis in the coefficient matrix of the first acoustic signal, a processing coefficient sequence in which a coefficient value for each frequency is set so that the noise component of the first acoustic signal is suppressed; and
noise suppression means for applying the processing coefficient sequence to the first acoustic signal.
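The chain of means recited above can be sketched in a few functions. The claim does not fix the factorization algorithm, the correlation measure, or the form of the coefficient values, so everything specific here is an assumption: multiplicative updates for the Euclidean cost, cosine similarity with a threshold of 0.9, and a ratio of the non-noise reconstruction to the full reconstruction. All function names are hypothetical.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(Y, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative Y (freq x time) into W (freq x k) and H (k x time).
    Multiplicative updates are an assumed choice of algorithm."""
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(k)] for _ in Y]
    H = [[rng.random() + eps for _ in Y[0]] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, Y), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(Y, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def cosine(u, v):
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(x * x for x in v) ** 0.5
    return sum(a * b for a, b in zip(u, v)) / (nu * nv + 1e-12)

def noise_bases(W1, W2, threshold=0.9):
    """Indices of bases (columns) of W1 that closely match some basis of W2.
    Cosine similarity and the threshold value are assumptions."""
    cols1, cols2 = transpose(W1), transpose(W2)
    return [j for j, b1 in enumerate(cols1)
            if max(cosine(b1, b2) for b2 in cols2) >= threshold]

def coefficient_sequence(W1, H1, noise, eps=1e-9):
    """G[f][t] = (reconstruction from non-noise bases) / (full reconstruction).
    This ratio form is an assumption, not taken from the claim."""
    k = len(H1)
    keep = [j for j in range(k) if j not in noise]
    G = []
    for wrow in W1:
        row = []
        for t in range(len(H1[0])):
            total = sum(wrow[j] * H1[j][t] for j in range(k))
            target = sum(wrow[j] * H1[j][t] for j in keep)
            row.append(target / (total + eps))
        G.append(row)
    return G
```

Under these assumptions each coefficient lies between 0 and 1, and multiplying the spectrum of the first acoustic signal by G element by element attenuates the bins dominated by the bases identified as noise.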

The matrix decomposition means generates the base matrix and the coefficient matrix for a component in a first band of the first acoustic signal,
the noise suppression means applies the processing coefficient sequence to the component in the first band of the first acoustic signal, and
the acoustic processing apparatus comprises waveform synthesis means for synthesizing the component processed by the noise suppression means with a component of the first acoustic signal in a second band on the high-frequency side of the first band.
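The band-limited arrangement in this claim can be illustrated for a single frame as follows. This is a sketch that assumes the two bands are separated at a bin index in the frequency domain and that synthesis amounts to joining the processed first band with the untouched second band; `suppress_low_band` and `split_bin` are hypothetical names.

```python
def suppress_low_band(spectrum, coeffs, split_bin):
    """Apply coefficient values to bins below split_bin (first band) and
    pass the remaining bins (second band) through unchanged."""
    if len(coeffs) != split_bin:
        raise ValueError("one coefficient is needed per first-band bin")
    low = [g * x for g, x in zip(coeffs, spectrum[:split_bin])]
    high = list(spectrum[split_bin:])
    return low + high
```

Limiting the factorization to the first band keeps the observation matrix small, and the second band retains its original waveform because no coefficient is applied to it.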

The acoustic processing apparatus according to claim 2, comprising phase difference calculation means for calculating a phase difference between the first acoustic signal and the second acoustic signal,
wherein the coefficient sequence generation means variably sets each coefficient value of the processing coefficient sequence in accordance with the phase difference between the first acoustic signal and the second acoustic signal.

The acoustic processing apparatus according to claim 2 or 3, comprising intensity difference calculation means for calculating an intensity difference between the first acoustic signal and the second acoustic signal,
wherein the coefficient sequence generation means variably sets each coefficient value of the processing coefficient sequence in accordance with the intensity difference between the first acoustic signal and the second acoustic signal.
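The phase difference and intensity difference recited in the two preceding claims can be computed per frequency bin as sketched below. The claims do not say how the coefficient values depend on these differences, so `adjust_coefficient` is purely hypothetical: it assumes that a large difference between the two signals indicates noise, and its limits and floor value are illustrative. The intensity difference is taken here as a plain difference of magnitudes, which is also an assumption.

```python
import cmath

def phase_difference(x1, x2):
    """Phase of x1 relative to x2 for one frequency bin, in radians."""
    return cmath.phase(x1 * x2.conjugate())

def intensity_difference(x1, x2):
    """Magnitude of x1 minus magnitude of x2 for one frequency bin."""
    return abs(x1) - abs(x2)

def adjust_coefficient(g, x1, x2, phase_limit=1.0,
                       intensity_limit=1.0, floor=0.1):
    """HYPOTHETICAL rule: reduce the coefficient g when either difference
    exceeds its limit; otherwise leave g unchanged."""
    if (abs(phase_difference(x1, x2)) > phase_limit
            or abs(intensity_difference(x1, x2)) > intensity_limit):
        return g * floor
    return g
```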