JP5060438B2

JP5060438B2 - Sound collector

Info

Publication number: JP5060438B2
Application number: JP2008232586A
Authority: JP
Inventors: 実福島; 香菜川東
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2008-09-10
Filing date: 2008-09-10
Publication date: 2012-10-31
Anticipated expiration: 2028-09-10
Also published as: JP2010066506A

Abstract

PROBLEM TO BE SOLVED: To provide a sound collection device capable of further suppressing noise. SOLUTION: The sound collection device includes a sound receiving section 1 and an audio processing section 2. The sound receiving section 1 outputs four signals: a pre-processing audio signal whose intensity corresponds to the sound pressure of the input sound; a time-differential signal whose intensity corresponds to the time derivative of the sound pressure; and an x-differential signal and a y-differential signal corresponding to the spatial derivatives of the sound pressure along each axis of a two-dimensional orthogonal coordinate system. The audio processing section 2 takes a weighted sum of these four signals to generate a processed audio signal in which noise from positions other than a predetermined target position is suppressed. The audio processing section 2 updates, as appropriate, the reference distance used to determine the weights. Sound (noise) from positions other than the target position is therefore suppressed more effectively than when the reference distance is always kept equal to the distance to the target position. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a sound collection device.

Sound collection devices have conventionally been provided that generate a processed audio signal from a pre-processing audio signal output by a microphone or the like. In the processed audio signal, sound from a predetermined target position is selectively reflected (see, for example, Patent Documents 1 to 3). The target position is chosen as the position at which the source of the desired sound (such as a talker) is assumed to be.

A technique based on the spatiotemporal gradient method is also known (see, for example, Non-Patent Document 1). It uses a sound receiving means that outputs a pre-processing audio signal whose intensity corresponds to the sound pressure of the input sound, a time-differential signal whose intensity corresponds to the time derivative of that sound pressure, and an x-differential signal and a y-differential signal corresponding to the spatial derivatives of the sound pressure along each axis of a two-dimensional orthogonal coordinate system. The processed audio signal is generated as a weighted sum of these four signals. The weights can be determined by the MV (Minimum Variance) method. This method minimizes the processed power, defined as the integral of the squared intensity of the processed audio signal over a predetermined weighted-sum calculation time, under the constraint that the gain for sound from the target position is constant. In the resulting processed audio signal, sound from positions other than the target position is suppressed and sound from the target position is selectively reflected.

Patent Document 1: JP 2006-154314 A
Patent Document 2: JP 2000-47699 A
Patent Document 3: Japanese Patent No. 3484112
Non-Patent Document 1: Ono, Ando, "Measurement of sound field and directivity control," 22nd Sensing Forum, pp. 305-310, 2005.

In this type of sound collection device, the processed audio signal should contain as little as possible of the ambient noise, reverberation and other sound arriving from positions other than the target position (hereinafter referred to as "noise"). For example, when the processed audio signal output by such a device is used for speech recognition, reducing the noise in it improves recognition accuracy.

The present invention has been made in view of the above circumstances. Its object is to provide a sound collection device capable of suppressing noise further.

The invention of claim 1 comprises a sound receiving unit and an audio processing unit. The sound receiving unit outputs a pre-processing audio signal whose intensity corresponds to the sound pressure of the input sound, a time-differential signal whose intensity corresponds to the time derivative of the sound pressure, and an x-differential signal and a y-differential signal corresponding to the spatial derivatives of the sound pressure along each axis of a two-dimensional orthogonal coordinate system. The audio processing unit generates a processed audio signal as a weighted sum of these four signals.

The audio processing unit determines the weights by the MV method. It minimizes the processed power, which is the integral of the squared intensity of the processed audio signal over a predetermined weighted-sum calculation time, under the constraint that the gain for sound from a reference position is constant. The reference position is determined according to a predetermined target position at which the target sound is assumed to be present, and is selected on the straight line that starts at the sound receiving unit and passes through the target position. The audio processing unit periodically performs an update operation that may update the reference distance, which is the distance between the sound receiving unit and the reference position, so as to minimize the processed power.

According to this invention, noise from positions other than the target position can be suppressed further in the processed audio signal than when the reference position used to determine the weights always coincides with the target position.

The invention of claim 2 is the invention of claim 1, further comprising a pre-processing power calculation unit and a determination unit. The pre-processing power calculation unit calculates the pre-processing power, which is the integral of the squared intensity of the pre-processing audio signal over the weighted-sum calculation time. The determination unit compares this pre-processing power with a predetermined volume threshold. During the update operation, the audio processing unit does not change the reference distance if the pre-processing power is below the volume threshold.

According to this invention, the reference distance is not changed while the sound pressure (volume) of the sound reaching the sound receiving unit is insufficient.

The invention of claim 3 is the invention of claim 1 or 2, further comprising three units. A pre-processing power calculation unit calculates the pre-processing power, which is the integral of the squared intensity of the pre-processing audio signal over the weighted-sum calculation time. A processed power calculation unit calculates the processed power. A determination unit calculates the processing ratio, which is the processed power divided by the pre-processing power, and compares it with a predetermined target sound threshold. During the update operation, the audio processing unit does not change the reference distance if the processing ratio is equal to or greater than the target sound threshold.

According to this invention, the reference distance is not changed when there is little noise.

The invention of claim 4 is the invention of claim 1 or 2, further comprising four units:
- A null processing unit generates a null audio signal, in which sound from the target position is not reflected, as a weighted sum of the four signals output by the sound receiving unit.
- A pre-processing power calculation unit calculates the pre-processing power, which is the integral of the squared intensity of the pre-processing audio signal over the weighted-sum calculation time.
- A null-processed power calculation unit calculates the null-processed power, which is the integral of the squared intensity of the null audio signal over the same time.
- A determination unit calculates the null processing ratio, which is the null-processed power divided by the pre-processing power, and compares it with a predetermined noise threshold.

During the update operation, the audio processing unit does not change the reference distance if the null processing ratio is below the noise threshold.

The invention of claim 5 is the invention of claim 3. During the update operation, if the processing ratio obtained by the determination unit is below the target sound threshold, the audio processing unit updates the reference distance to the value that minimizes the processing ratio.

The invention of claim 6 is the invention of claim 4, further comprising a processed power calculation unit that calculates the processed power. During the update operation, if the null processing ratio obtained by the determination unit is equal to or greater than the noise threshold, the audio processing unit updates the reference distance to the value that minimizes the ratio of the processed power to the null-processed power.

The invention of claim 7 is the invention of any one of claims 1 to 6. When updating the reference distance, the audio processing unit uses the four signals output by the sound receiving unit to determine whether there is exactly one sound source. If there is, it estimates the distance to that source and uses the estimate as the updated reference distance.

The invention of claim 8 is the invention of any one of claims 5 to 7, in which the audio processing unit limits the change in the reference distance in a single update operation to a predetermined upper limit.

According to this invention, the processed audio signal is not distorted by a large change in the reference distance in a single update operation. This reduces the unnatural impression given by the sound obtained by converting (reproducing) the processed audio signal.

The invention of claim 9 is the invention of any one of claims 5 to 7. During the update operation, the audio processing unit may perform only one of three operations on the reference distance: leave it unchanged, increase it by a predetermined unit increment, or decrease it by a predetermined unit decrement.
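The gating and update rules of claims 2, 3, 5 and 8 can be illustrated with the short Python sketch below. It is a hypothetical illustration, not the patent's implementation:
- `post_power_at` is an assumed stand-in for re-running the MV weighting with a trial reference distance.
- The candidate grid and the processed-power curve are made up.
- Powers are approximated by sums of squared samples multiplied by the sample interval.

```python
from typing import Callable, Sequence


def window_power(samples: Sequence[float], dt: float) -> float:
    """Approximate the integral of the squared signal over the window."""
    return sum(s * s for s in samples) * dt


def update_reference_distance(
    r_ref: float,
    pre_power: float,
    post_power_at: Callable[[float], float],  # hypothetical: processed power for a trial distance
    candidates: Sequence[float],
    volume_threshold: float,
    target_threshold: float,
    max_step: float,
) -> float:
    """One update operation for the reference distance (sketch of claims 2, 3, 5 and 8)."""
    # Claim 2: input too quiet, so keep the current reference distance.
    if pre_power < volume_threshold:
        return r_ref
    # Claim 3: processing ratio already high (little noise), so keep it.
    if post_power_at(r_ref) / pre_power >= target_threshold:
        return r_ref
    # Claim 5: choose the candidate distance that minimises the processing ratio.
    best = min(candidates, key=lambda r: post_power_at(r) / pre_power)
    # Claim 8: cap the change made in a single update.
    step = max(-max_step, min(max_step, best - r_ref))
    return r_ref + step


# Usage with a made-up processed-power curve whose minimum is at r = 1.2 m.
def processed_power_curve(r: float) -> float:
    return 1.0 + (r - 1.2) ** 2


grid = [0.1 * k for k in range(5, 21)]      # candidate distances 0.5 .. 2.0 m
pre = window_power([1.0] * 100, dt=0.1)     # pre-processing power
new_r = update_reference_distance(0.5, pre, processed_power_curve, grid,
                                  volume_threshold=1.0, target_threshold=0.5,
                                  max_step=0.1)
```

In this example the processing ratio at 0.5 m is about 0.149, which is below the target sound threshold. The distance therefore moves one capped step of 0.1 m toward the minimum at 1.2 m. A claim 9 variant would replace the clamp with a fixed unit increment or decrement.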

According to the invention of claim 1, the audio processing unit:
- determines the weights of the weighted sum by the MV method, minimizing the processed power (the integral of the squared intensity of the processed audio signal over a predetermined weighted-sum calculation time) under the constraint that the gain for sound from the reference position is constant;
- determines the reference position according to the target position at which the target sound is assumed to be present, selecting it on the straight line that starts at the sound receiving unit and passes through the target position;
- periodically performs an update operation that may update the reference distance between the sound receiving unit and the reference position so as to minimize the processed power.

As a result, noise from positions other than the target position can be suppressed further in the processed audio signal than when the reference position always coincides with the target position.

According to the invention of claim 2, a pre-processing power calculation unit calculates the pre-processing power, which is the integral of the squared intensity of the pre-processing audio signal over the weighted-sum calculation time. A determination unit compares this power with a predetermined volume threshold. During the update operation, the audio processing unit leaves the reference distance unchanged if the pre-processing power is below the volume threshold. This avoids changing the reference distance while the sound pressure (volume) of the sound reaching the sound receiving unit is insufficient.

According to the invention of claim 3, the device includes the pre-processing power calculation unit, a processed power calculation unit that calculates the processed power, and a determination unit. The determination unit calculates the processing ratio (processed power divided by pre-processing power) and compares it with a predetermined target sound threshold. During the update operation, the audio processing unit leaves the reference distance unchanged if the processing ratio is equal to or greater than the target sound threshold. This avoids changing the reference distance when there is little noise.

According to the invention of claim 4, the device includes four units:
- a null processing unit that generates, as a weighted sum of the four signals output by the sound receiving unit, a null audio signal in which sound from the target position is not reflected;
- the pre-processing power calculation unit;
- a null-processed power calculation unit that calculates the integral of the squared intensity of the null audio signal over the weighted-sum calculation time;
- a determination unit that calculates the null processing ratio (null-processed power divided by pre-processing power) and compares it with a predetermined noise threshold.

During the update operation, the audio processing unit leaves the reference distance unchanged if the null processing ratio is below the noise threshold. This avoids changing the reference distance when there is little noise.

According to the invention of claim 8, the audio processing unit limits the change in the reference distance in a single update operation to a predetermined upper limit. According to the invention of claim 9, the audio processing unit may perform only one of three operations on the reference distance during the update operation: leave it unchanged, increase it by a predetermined unit increment, or decrease it by a predetermined unit decrement. In either case, the processed audio signal is not distorted by a large change in the reference distance in a single update operation. This reduces the unnatural impression given by the sound obtained by converting (reproducing) the processed audio signal.

The best mode for carrying out the present invention is described below with reference to the drawings.

As shown in FIG. 1, this embodiment comprises a sound receiving unit 1 and an audio processing unit 2. The sound receiving unit 1 converts input sound into electrical signals. The audio processing unit 2 uses the output of the sound receiving unit 1 to generate a processed audio signal in which sound from a predetermined target position is selectively reflected.

The sound receiving unit 1 outputs four signals:
- a pre-processing audio signal whose intensity corresponds to the sound pressure f of the input sound;
- a time-differential signal whose intensity corresponds to the time derivative f_t of f;
- an x-differential signal corresponding to the spatial derivative f_x of f along a predetermined x-axis direction;
- a y-differential signal corresponding to the spatial derivative f_y of f along a predetermined y-axis direction orthogonal to the x-axis.

Specifically, the sound receiving unit 1 includes, for example, four microphones 10A to 10D and a spatiotemporal gradient processing unit (not shown). As shown in FIG. 2, the microphones are arranged at the vertices of a square, and each generates a raw audio signal with intensity f_A to f_D corresponding to the sound pressure of the input sound. The spatiotemporal gradient processing unit uses these raw audio signals to generate the pre-processing audio signal, the time-differential signal, the x-differential signal and the y-differential signal. The microphones 10A to 10D and the spatiotemporal gradient processing unit can be implemented with known techniques, so detailed illustration is omitted.

That is, the spatiotemporal gradient processing unit uses the output intensities f_A to f_D of the microphones 10A to 10D and the side length d of the square to obtain the sound pressure f, the time derivative f_t, and the spatial derivatives f_x and f_y according to equations (1) to (4):

$$ f = \frac{f_A + f_B + f_C + f_D}{4} \qquad (1) $$
$$ f_t = \frac{df}{dt} \qquad (2) $$
$$ f_x = \frac{(f_A + f_B) - (f_C + f_D)}{2d} \qquad (3) $$
$$ f_y = \frac{(f_A + f_C) - (f_B + f_D)}{2d} \qquad (4) $$
The audio processing unit 2 takes a weighted sum of the pre-processing audio signal, the time-differential signal, the x-differential signal and the y-differential signal output by the sound receiving unit 1. This produces a processed audio signal in which sound from a predetermined reference position is selectively reflected. That is, the intensity of the processed audio signal corresponds to a weighted sum of f, f_t, f_x and f_y. The audio processing unit 2 can be implemented with known electronic circuitry, so a detailed circuit diagram is omitted.
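The following Python sketch applies equations (1) to (4) to sampled microphone signals. It rests on three assumptions:
- Because FIG. 2 is not reproduced, the microphone coordinates are assumed to be A = (+d/2, +d/2), B = (+d/2, −d/2), C = (−d/2, +d/2) and D = (−d/2, −d/2), which is consistent with equations (3) and (4).
- The time derivative in equation (2) is approximated by central differences using `numpy.gradient`.
- The test field is synthetic.

```python
import numpy as np


def spatiotemporal_gradients(fa, fb, fc, fd, d, fs):
    """Compute f, f_t, f_x, f_y from four microphone signals (equations (1)-(4)).

    fa..fd : 1-D arrays of equal length, one per microphone 10A-10D
    d      : side length of the square array in metres
    fs     : sampling rate in Hz; f_t is approximated numerically
    """
    f = (fa + fb + fc + fd) / 4.0                # (1) mean pressure
    f_t = np.gradient(f, 1.0 / fs)               # (2) central-difference time derivative
    f_x = ((fa + fb) - (fc + fd)) / (2.0 * d)    # (3) gradient along x
    f_y = ((fa + fc) - (fb + fd)) / (2.0 * d)    # (4) gradient along y
    return f, f_t, f_x, f_y


# Synthetic field: a 100 Hz tone plus a linear spatial ramp 3x - 2y.
fs = 16000.0
t = np.arange(1600) / fs
d = 0.02                 # assumed side length of the square, metres
h = d / 2


def field(x, y):
    return np.sin(2 * np.pi * 100.0 * t) + 3.0 * x - 2.0 * y


# Assumed layout: A=(+h,+h), B=(+h,-h), C=(-h,+h), D=(-h,-h)
f, f_t, f_x, f_y = spatiotemporal_gradients(
    field(h, h), field(h, -h), field(-h, h), field(-h, -h), d, fs)
```

For this linear field, f_x is 3 and f_y is −2 at every sample. f_t tracks 2π·100·cos(2π·100t) everywhere except the first and last samples, where `numpy.gradient` falls back to one-sided differences.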

Define the input vector F = (f, f_t, f_x, f_y)^T and the weight vector W = (w, w_t, w_x, w_y)^T. The intensity of the processed audio signal is then W^H F.

The well-known MV (Minimum Variance) method is used to determine the weight vector W of this weighted sum. Here the MV method is applied to control directivity using the spatiotemporal gradient method. The spatiotemporal gradient method was originally proposed as one way of determining optical flow, the apparent velocity field in a moving image (see Reference 1). More specifically, the MV method performs a constrained optimization. It holds the gain of the processed audio signal constant for sound from the reference position, and minimizes the mean (variance) E[(W^H F)^2] of the squared intensity of the processed audio signal over a predetermined period. As a result, sound from positions other than the reference position is suppressed in the processed audio signal.

A concrete method for determining the weight vector W is described below (see References 2 to 4). For simplicity, assume a single sound source. As in FIG. 2, use a coordinate system whose origin is the position of the sound receiving unit 1 (hereinafter the "observation point"). Let:
- c be the speed of sound;
- (x, y, z) be the coordinates of the sound source;
- r = (x² + y² + z²)^(1/2) be the distance between the source and the observation point;
- g be the sound field formed at the position of the source.

The sound field formed at the observation point, that is, the sound pressure f, is then given by the following equation.

The spatial derivatives (gradients) f_x and f_y of the sound pressure f at the observation point in the x and y directions are

where
$$ \xi_x = \frac{x}{r^2}, \qquad \xi_y = \frac{y}{r^2} \qquad (8) $$
are called the intensity gradients, and
$$ \tau_x = \frac{x}{c\,r}, \qquad \tau_y = \frac{y}{c\,r} \qquad (9) $$
are called the time gradients in the x and y directions.
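Equations (5) to (7) are not shown in this text. The following Python sketch assumes the standard free-field spherical-wave model f = g(t − r/c)/r. It checks numerically that, under this assumption, the gradient at the observation point equals ξ f + τ f_t, with ξ and τ as in equations (8) and (9). The waveform g, the source position and c = 343 m/s are illustrative assumptions. The sign convention follows this model and may differ from the patent's own notation.

```python
import math

C_SOUND = 343.0   # assumed speed of sound, m/s
F0 = 200.0        # frequency of the hypothetical source waveform, Hz


def g(t):
    """Hypothetical source waveform g(t)."""
    return math.sin(2 * math.pi * F0 * t)


def g_dot(t):
    """Time derivative of g(t)."""
    return 2 * math.pi * F0 * math.cos(2 * math.pi * F0 * t)


def pressure(obs, src, t):
    """Assumed spherical wave f = g(t - r/c) / r, with r = |src - obs|."""
    r = math.dist(obs, src)
    return g(t - r / C_SOUND) / r


def predicted_gradient(src, t):
    """(f_x, f_y) at the origin from xi * f + tau * f_t, using (8) and (9)."""
    x, y, z = src
    r = math.hypot(x, y, z)
    f = g(t - r / C_SOUND) / r
    f_t = g_dot(t - r / C_SOUND) / r
    return (x / r**2 * f + x / (C_SOUND * r) * f_t,
            y / r**2 * f + y / (C_SOUND * r) * f_t)


def numeric_gradient(src, t, h=1e-5):
    """Central finite differences of the pressure field at the origin."""
    fx = (pressure((h, 0.0, 0.0), src, t) - pressure((-h, 0.0, 0.0), src, t)) / (2 * h)
    fy = (pressure((0.0, h, 0.0), src, t) - pressure((0.0, -h, 0.0), src, t)) / (2 * h)
    return fx, fy


src = (0.3, 0.2, 1.0)   # hypothetical source position in metres
for t in (0.0, 0.0013, 0.01):
    print(predicted_gradient(src, t), numeric_gradient(src, t))
```

The two columns agree to within finite-difference error. Under this model, ξ and τ therefore act as the weights that convert f and f_t into the spatial gradient.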

Equations (6) and (7) give the spatial gradient of the sound pressure f(t) at the observation point in the x and y directions. Rewriting them in terms of the vector R = (−x, −y, −z), which points from the sound source to the observation point (so that |R| = r), gives

Next, when f(t), f_t(t) and ∇f(t) are observed, their weighted sum is expressed as

where w and w_t are real constants and W_S = (w_x, w_y, 0) is a unit vector. Substituting equation (10) into equation (11) gives

The weighted sum of the spatiotemporal gradients can therefore be expressed as the sum of two filters: one with directivity characteristic H(R) applied to f(t), and one with directivity characteristic H_t(R) applied to f_t(t). When H(R) = α, equation (13) can be transformed into

In general, if θ is the angle between two vectors a and b, the following formula holds.

Using formula (18), equation (16) can be rewritten as follows.

Here, since |W_S| = 1,

which is the equation of a sphere. When w + α = 0, equation (15) becomes
$$ R \cdot W_S = 0 \qquad (22) $$
When H_t(R) = α, equation (14) becomes

Let θ(R) be the angle between the vectors R and W_S. Since |W_S| = 1,

Equation (23) therefore becomes


From equations (21), (22) and (25), H(R) and H_t(R) have the following properties.
1) The two directivity characteristics H(R) and H_t(R) are rotationally symmetric about the axis W_S.
2) When H(R) = 0, the points R lie on a sphere of diameter 1/w (w ≠ 0) or on a plane (w = 0).
3) When H_t(R) = 0, the points R lie on a conical surface with apex angle 2cw_t (w_t ≠ 0) or on a plane (w_t = 0).
4) The intersection of the sets of R satisfying H(R) = 0 and H_t(R) = 0 forms a circle or a plane.

The directivity characteristics H(R) and H_t(R) above are renamed H₁(R₁) and H₂(R₁), respectively, and defined as follows. Here:
- R₁ is the position vector of the reference position;
- r₁ = |R₁| is the distance from the observation point to the reference position (hereinafter the "reference distance");
- n_1x and n_1y are the x and y components of the unit vector R₁/r₁, which points in the same direction as R₁, with n_1x² + n_1y² = 1.

Two constraints, equations (28) and (29), are then imposed on H₁(R₁) and H₂(R₁):

$$W^{H} H_1(R_1) = p \qquad (28)$$
$$W^{H} H_2(R_1) = q \qquad (29)$$

Here p and q are positive real constants. For sound from a source at the reference position indicated by the vector R₁, the gain of the weighted sum with weight vector W is therefore the constant p + jωq. To compensate for this, the audio processing unit 2 in Fig. 2 places a first-order low-pass filter 22 with transfer function (p + jωq)⁻¹ after the weighted-sum calculation unit 21. As a result, the overall gain of the audio processing unit 2 is 1 for sound from a source at the reference position indicated by R₁. In this embodiment, p = 1/r₁ and q = 1/c.
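The compensating filter (p + jωq)⁻¹ with p = 1/r₁ and q = 1/c is a first-order low-pass whose corner angular frequency is p/q = c/r₁, and whose DC gain 1/p = r₁ cancels the DC gain p of the weighted sum. A minimal Python sketch of one possible discrete-time realization follows; the backward-Euler discretization, the sampling rate fs, and the value c = 340 m/s are assumptions for illustration, not details given in the text.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value of c


def compensation_filter(x, r1, fs, c=SPEED_OF_SOUND):
    """Apply H(jw) = 1 / (p + jw*q), with p = 1/r1 and q = 1/c, to samples x.

    Uses a backward-Euler discretization of q*dy/dt + p*y = x (an assumption;
    the text does not specify how filter 22 is realized).
    """
    p = 1.0 / r1
    q = 1.0 / c
    k = q * fs  # q / T, with sampling period T = 1/fs
    y = 0.0
    out = []
    for sample in x:
        # q*(y[n] - y[n-1])*fs + p*y[n] = x[n], solved for y[n]
        y = (sample + k * y) / (p + k)
        out.append(y)
    return out


def corner_frequency_hz(r1, c=SPEED_OF_SOUND):
    """Frequency where |p| == |w*q|, i.e. w = c / r1 rad/s, converted to Hz."""
    return c / (2.0 * math.pi * r1)


if __name__ == "__main__":
    step = compensation_filter([1.0] * 2000, r1=0.5, fs=16000)
    print(round(step[-1], 6))                 # step response settles at 1/p = r1
    print(round(corner_frequency_hz(0.5), 1))
```

With r₁ = 0.5 m the corner lies near 108 Hz, so the filter mainly restores the low-frequency part of the target sound.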

The weight vector W that leaves sound from the reference position unsuppressed while suppressing sound (noise) from all other positions as strongly as possible is then obtained with the Minimum Variance Beamformer (MV method): under the constraints (28) and (29), W minimizes the power P_c of the processed audio signal over the observation interval Γ (hereinafter the "processed power"), given by equation (30). The observation window Γ thus corresponds to the weighted-sum calculation time recited in the claims. The solution is given by equations (31) and (32).

Here, B_ij (i, j = a, x, y) in equation (32) is given by equation (33), and b_a(t), b_x(t), and b_y(t) in equation (33) are given by equations (34) to (36), respectively.
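Equations (31) to (36) are not reproduced here. For orientation, minimizing P_c under linear constraints such as (28) and (29) is a linearly constrained minimum-variance problem whose standard closed-form solution is W = S⁻¹C(CᵀS⁻¹C)⁻¹g, with S the covariance of the four channels (f, f_t, f_x, f_y) over Γ, C = [H₁(R₁) H₂(R₁)], and g = (p, q)ᵀ. The Python sketch below assumes real-valued signals and treats H₁(R₁) and H₂(R₁) as given numeric 4-vectors (the example values are hypothetical); it is the generic textbook form, not necessarily the exact expression of equations (31) and (32).

```python
import numpy as np


def covariance(F):
    """Discrete estimate of S = integral over Gamma of F(t) F(t)^T dt.

    F has shape (4, N): rows are f, f_t, f_x, f_y sampled over the window.
    The sampling-period factor is omitted; it cancels in the MV solution.
    """
    return F @ F.T


def mv_weights(S, h1, h2, p, q):
    """Minimize W^T S W subject to W^T h1 = p and W^T h2 = q (LCMV form)."""
    C = np.column_stack([h1, h2])          # 4 x 2 constraint matrix
    g = np.array([p, q], dtype=float)      # required constraint values
    Sinv_C = np.linalg.solve(S, C)         # S^{-1} C without forming S^{-1}
    return Sinv_C @ np.linalg.solve(C.T @ Sinv_C, g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.standard_normal((4, 1000))     # placeholder channel data
    h1 = np.array([1.0, 0.0, 0.3, 0.1])    # hypothetical H1(R1)
    h2 = np.array([0.0, 1.0, 0.2, 0.4])    # hypothetical H2(R1)
    r1, c = 0.5, 340.0
    W = mv_weights(covariance(F), h1, h2, p=1 / r1, q=1 / c)
    print(np.column_stack([h1, h2]).T @ W)  # reproduces (p, q)
```

Solving linear systems instead of inverting S explicitly is the usual numerically safer choice for this formula.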

According to the inventors' simulations, sound from a source on the straight line connecting the observation point O (that is, the position of the sound receiving unit 1, which is the origin of the coordinate system above) and the reference position S is hardly suppressed in the processed audio signal, both when S lies on the z-axis as shown in Figs. 2, 3(a), and 3(b), and when S lies off the z-axis as shown in Figs. 4 and 5. In other words, even if the reference position S does not coincide with the target position, at which the source of the sound to be selectively retained in the processed audio signal is assumed to be located, the goal of not suppressing sound from the target source is achieved as long as S lies on the straight line connecting O and the target position. Put differently, the reference distance r₁ need not equal the distance from the observation point to the target position (hereinafter the "target distance"); even when r₁ differs from the target distance, sound from the target source is not suppressed in the processed audio signal. Each simulation shown in Figs. 2 to 5 uses a single sound source. Figs. 3(a), 3(b), and 5 show the relationship between the amount of sound suppression in the processed audio signal (hereinafter the "noise suppression amount") and the source position: the darker a position is shaded, the greater the noise suppression amount for a source assumed to be at that position. The reference distance r₁ is 0.5 m in Figs. 3(a) and 5, and 1 m in Fig. 3(b).

The inventors' simulations further show that, when a noise source lies off the straight line connecting the observation point and the reference position, the noise suppression amount in the processed audio signal depends on the reference distance r₁; for example, suppression is relatively high at points, other than S itself, on the sphere centered on O with radius r₁ (white line in Fig. 5). Consider the case where the reference position S lies on the z-axis. As shown in Fig. 6, let a noise source N lie 1.0 m away in a direction at an angle φ = 30° from the z-axis. As shown in Figs. 7 and 8, the noise suppression amount is 14.5 dB with r₁ = 0.5 m, but reaches its maximum of 25.6 dB when r₁ = 1 m, the same as the distance to the noise source N. Fig. 8(a) shows the waveform of the unprocessed audio signal, Fig. 8(b) the waveform of the processed audio signal with r₁ = 0.5 m, and Fig. 8(c) the waveform of the processed audio signal with r₁ = 1.0 m. Likewise, as shown in Fig. 9, when two noise sources N1 and N2 lie in a direction at φ = 45° from the z-axis but at different distances from the observation point (1.0 m for N1 and 0.5 m for N2), the noise suppression amount peaks when r₁ is about the distance from the observation point to the midpoint between N1 and N2, as shown in Fig. 10. Thus, changing the reference distance r₁ as appropriate may suppress sound (noise) from positions other than the target position more effectively than always making the reference position coincide with the target position.

Therefore, in this embodiment, the reference distance r₁ used to determine the weight vector W is updated as needed so that sound (noise) from positions other than the target position is suppressed further.

Specifically, the audio processing unit 2 performs, at predetermined time intervals, an update operation that updates the weight vector W, and the reference distance r₁ may also be updated at that time. The update operation is described below with reference to the flowchart of Fig. 11.

When the audio processing unit 2 starts the update operation (S1), it first determines whether the sound level input to the sound receiving unit 1 is sufficient (S2). To this end, this embodiment includes a pre-processing power calculation unit 31 that calculates the power P_f of the unprocessed audio signal over the observation window Γ, which is determined with reference to the start time of the update operation (hereinafter the "unprocessed power"), and a determination unit 30 that compares the obtained unprocessed power P_f with a predetermined silence threshold. When the determination unit 30 determines that P_f is below the silence threshold, the audio processing unit 2 judges that the input level is insufficient and unsuitable for updating the reference distance r₁ and the weight vector W, and ends the update operation (S3). When the determination unit 30 determines that P_f is equal to or greater than the silence threshold, the audio processing unit 2 judges the level sufficient and first determines the weight vector W by the MV method using the current (pre-update) reference distance r₁ (S4).
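The unprocessed power P_f is the integral of the squared unprocessed signal over Γ. A minimal Python sketch of the step S2 check follows, assuming the integral is approximated by a rectangle-rule sum over samples taken at rate fs; the threshold value is left as a parameter because the text gives none.

```python
import numpy as np


def window_power(signal, fs):
    """Approximate the integral of signal(t)^2 over the window (rectangle rule)."""
    x = np.asarray(signal, dtype=float)
    return float(np.sum(x * x) / fs)


def volume_sufficient(f, fs, silence_threshold):
    """Step S2: True if the unprocessed power P_f reaches the silence threshold."""
    return window_power(f, fs) >= silence_threshold


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    f = np.sin(2 * np.pi * 100 * t)          # 1 s of a unit-amplitude tone
    print(window_power(f, fs))               # about 0.5
    print(volume_sufficient(f, fs, 0.01))
```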

After determining the weight vector W, the audio processing unit 2 determines whether noise is present (S5). To this end, this embodiment includes a post-processing power calculation unit 32 that calculates, using equation (30), the processed power P_c of the processed audio signal obtained with the weight vector W determined in step S4. The determination unit 30 compares the ratio P_c/P_f of the processed power P_c output by the post-processing power calculation unit 32 to the unprocessed power P_f output by the pre-processing power calculation unit 31 (hereinafter the "processing ratio") with a predetermined target sound threshold. If the determination unit 30 finds the processing ratio below the target sound threshold, that is, if the overall sound pressure of the processed audio signal has dropped appreciably relative to the sound pressure f of the unprocessed audio signal, the audio processing unit 2 judges that noise is present; if the processing ratio is equal to or greater than the target sound threshold, it judges that no noise is present. When no noise is judged to be present, there is no point in updating the reference distance, so the audio processing unit 2 proceeds directly to step S3 and ends the update operation. When noise is judged to be present, the audio processing unit 2 determines whether the number of sound sources is one (S6). The target sound threshold is a positive constant smaller than 1. The determination unit 30, the pre-processing power calculation unit 31, and the post-processing power calculation unit 32 can each be implemented with well-known electronic circuits, so detailed illustrations are omitted.
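The branching in steps S2, S5, and S6 can be summarized in a short Python sketch. The enum and function names are hypothetical, the thresholds are left as parameters, and passing the source count in directly is a simplification (in the flowchart of Fig. 11 it is only estimated once noise has been detected).

```python
from enum import Enum


class Next(Enum):
    END = "S3"       # end the update operation, keep r1 unchanged
    ESTIMATE = "S7"  # one source: estimate its distance and use it as r1
    SEARCH = "S8"    # two or more sources: search for the best r1


def update_branch(p_f, p_c, n_sources, silence_threshold, target_sound_threshold):
    """Branching of steps S2, S5 and S6.

    p_f: unprocessed power P_f over the window.
    p_c: processed power P_c with the weights from step S4.
    n_sources: estimated number of sound sources (step S6).
    """
    if p_f < silence_threshold:                # S2: too quiet to update
        return Next.END
    if p_c / p_f >= target_sound_threshold:    # S5: no noise detected
        return Next.END
    return Next.ESTIMATE if n_sources == 1 else Next.SEARCH  # S6


if __name__ == "__main__":
    print(update_branch(1.0, 0.5, 2, silence_threshold=0.01, target_sound_threshold=0.8))
```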

The specific method for the determination in step S6 is as follows. The audio processing unit 2 estimates the number of sound sources by computing the rank rank(S) of the covariance matrix S estimated over the observation window Γ. Such estimation of the number of sound sources is described in Reference 4. Specifically, if rank(S) is 2 there is one sound source, if rank(S) is 3 there are two, and if rank(S) is 4 there are three or more.
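With real, noisy data the rank of S must be taken numerically, i.e. by counting singular values above a tolerance. The Python sketch below applies the rank-to-count mapping stated above; the relative tolerance, and returning 0 when the rank is below 2, are assumptions not specified in the text.

```python
import numpy as np


def estimate_source_count(F, rel_tol=1e-6):
    """Estimate the number of sources from rank(S), with S = F F^T.

    F: array of shape (4, N) with rows f, f_t, f_x, f_y over the window.
    rel_tol: assumed relative singular-value threshold for the numerical rank.
    Returns 1 for rank 2, 2 for rank 3, 3 (meaning "three or more") for
    rank 4, and 0 for rank below 2.
    """
    S = F @ F.T
    s_max = np.linalg.norm(S, 2)          # largest singular value of S
    if s_max == 0.0:
        return 0
    rank = np.linalg.matrix_rank(S, tol=rel_tol * s_max)
    return {2: 1, 3: 2, 4: 3}.get(int(rank), 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixing = rng.standard_normal((4, 2))             # rank-2 channel model
    F = mixing @ rng.standard_normal((2, 2000))
    print(estimate_source_count(F))
```

In practice rel_tol would have to be tuned to the sensor noise floor, since additive noise makes S full rank.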

If step S6 determines that there is one sound source, the audio processing unit 2 calculates the distance to that source, updates the reference distance r₁ to the obtained distance, recalculates and updates the weight vector W for the new reference distance r₁ (S7), and then proceeds to step S3 to end the update operation. The distance to the source is calculated as follows (see Reference 5). First, τ_x, τ_y, ξ_x, and ξ_y in equations (8) and (9) are obtained by the method of least squares, using the following evaluation function over the short observation window Γ:

$$J = \int_{\Gamma} \left\{ \left(f_x + \xi_x f + \tau_x f_t\right)^2 + \left(f_y + \xi_y f + \tau_y f_t\right)^2 \right\} dt \qquad (39)$$

Partially differentiating equation (39) with respect to τ_x, τ_y, ξ_x, and ξ_y and setting the derivatives to 0 yields equations (40) and (41). Using the elements of the covariance matrix S of equation (21), equations (40) and (41) can be rewritten as equations (42) and (43). Solving equations (42) and (43) gives τ_x, τ_y, ξ_x, and ξ_y as follows.
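Because the x and y terms of J do not interact, setting the partial derivatives to zero gives one 2×2 system per axis, whose matrix holds the covariance elements of f and f_t. A minimal numpy sketch, assuming the integral over Γ is replaced by a sum over samples (the common sampling-period factor cancels):

```python
import numpy as np


def gradient_coefficients(f, f_t, f_x, f_y):
    """Least-squares (xi_x, xi_y, tau_x, tau_y) minimizing J of equation (39).

    For each axis a, minimize sum_n (f_a + xi_a*f + tau_a*f_t)^2; the normal
    equations are [[S_ff, S_ft], [S_ft, S_tt]] (xi_a, tau_a)^T = -(S_fa, S_ta)^T.
    """
    A = np.column_stack([f, f_t])                  # regressors for (xi_a, tau_a)
    G = A.T @ A                                    # 2 x 2 covariance block
    xi_x, tau_x = np.linalg.solve(G, -A.T @ np.asarray(f_x, dtype=float))
    xi_y, tau_y = np.linalg.solve(G, -A.T @ np.asarray(f_y, dtype=float))
    return xi_x, xi_y, tau_x, tau_y


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    f, f_t = rng.standard_normal(1000), 300.0 * rng.standard_normal(1000)
    f_x = -(1.2 * f + 0.0018 * f_t)                # synthetic gradients
    f_y = -(1.6 * f + 0.0024 * f_t)
    print(gradient_coefficients(f, f_t, f_x, f_y))
```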

The distance r to the sound source is obtained by applying the least-squares method to equations (8) and (9). An evaluation function of the squared residuals is formed, partially differentiated with respect to 1/r, and set to 0; solving the resulting equation gives the distance r to the sound source as equation (48). In step S7, the audio processing unit 2 uses the distance r obtained from equations (38), (44), (45), and (48) as the new reference distance r₁.
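Equations (8) and (9) are not reproduced here. Assuming they express the spherical-wave gradient relation, with ξ_x = n_x/r, ξ_y = n_y/r, τ_x = n_x/c, and τ_y = n_y/c for a source in direction (n_x, n_y) at distance r (an assumption based on the standard spherical-wave model, not confirmed by this text), each ξ_a equals (1/r)·cτ_a, and the least-squares fit in 1/r has the closed form sketched below.

```python
def source_distance(xi_x, xi_y, tau_x, tau_y, c=340.0):
    """Least-squares distance r, ASSUMING xi_a = n_a / r and tau_a = n_a / c.

    Then xi_a = (1/r) * c * tau_a, and minimizing
    sum_a (xi_a - (1/r) * c * tau_a)^2 over 1/r gives
    1/r = sum(xi_a * tau_a) / (c * sum(tau_a^2)).
    c = 340 m/s is an assumed speed of sound.
    """
    num = xi_x * tau_x + xi_y * tau_y
    if num <= 0.0:
        raise ValueError("coefficients inconsistent with a source at finite distance")
    den = c * (tau_x ** 2 + tau_y ** 2)
    return den / num


if __name__ == "__main__":
    n_x, n_y, r, c = 0.6, 0.8, 0.75, 340.0
    print(source_distance(n_x / r, n_y / r, n_x / c, n_y / c, c))  # recovers r
```

The estimate does not depend on the overall sign convention of ξ and τ, since the products ξ_a·τ_a are unchanged when both flip sign.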

If step S6 determines that there are two or more sound sources, the audio processing unit 2 searches for the reference distance r₁ that maximizes the noise suppression amount, that is, minimizes the processed power P_c and hence the processing ratio (S8), updates r₁ to the value found in step S8, recalculates and updates the weight vector W for the new reference distance r₁ (S9), and then proceeds to step S3 to end the update operation. Specifically, it repeatedly increases or decreases r₁ by a predetermined step, determines the weight vector W, and obtains and stores the processed power P_c from the post-processing power calculation unit 32; r₁ is updated to a value at which changing it in either direction increases P_c (that is, decreases the noise suppression amount).
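The search in step S8 is a simple local (hill-climbing) search over r₁. The sketch below assumes a hypothetical callable processed_power(r) that recomputes the MV weights for reference distance r and returns P_c; the bounds r_min and r_max and the iteration cap are illustrative safeguards not mentioned in the text.

```python
def search_reference_distance(processed_power, r1, step,
                              r_min=0.05, r_max=5.0, max_iter=1000):
    """Step S8 sketch: move r1 by +/- step while the processed power P_c drops.

    processed_power(r): hypothetical callable returning P_c for distance r.
    Stops at an r1 where stepping in either direction no longer lowers P_c.
    """
    best = processed_power(r1)
    for _ in range(max_iter):
        moved = False
        for candidate in (r1 + step, r1 - step):
            if r_min <= candidate <= r_max:
                power = processed_power(candidate)
                if power < best:
                    r1, best, moved = candidate, power, True
                    break
        if not moved:
            break
    return r1


if __name__ == "__main__":
    # Toy stand-in for P_c(r1) with its minimum at 0.8 m
    print(search_reference_distance(lambda r: (r - 0.8) ** 2 + 1.0, r1=0.5, step=0.05))
```

The variant in which the ratio P_c/P_z is minimized can reuse the same routine by passing a callable that returns that ratio instead.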

With the above configuration, noise can be suppressed more effectively than when the reference distance is always set equal to the target distance.

In addition, because the method is based on the spatiotemporal gradient method, the sound receiving unit 1 can be made smaller than in a typical beamformer or superdirective microphone, and because all computations are performed in the time domain, the computational cost is lower than in other methods that require frequency-domain computation using the FFT.

As shown in Fig. 12, the device may also include a dead processing unit 33 that generates a dead audio signal in which sound from the target position is not reflected (that is, the target position is a dead point), and a post-dead-processing power calculation unit 34 that calculates the power P_z of the dead audio signal (hereinafter the "post-dead-processing power"). In step S5, the determination unit 30 may then compare the ratio P_z/P_f of the post-dead-processing power P_z output by the post-dead-processing power calculation unit 34 to the unprocessed power P_f output by the pre-processing power calculation unit 31 (hereinafter the "dead-processing ratio") with a predetermined noise threshold. In this case, if the determination unit 30 finds the dead-processing ratio equal to or greater than the noise threshold, some sound is arriving from positions other than the target position, so the audio processing unit 2 judges that noise is present; conversely, if the determination unit 30 finds the dead-processing ratio below the noise threshold, the audio processing unit 2 judges that no noise is present. The noise threshold is a positive constant smaller than 1. Like the audio processing unit 2, the dead processing unit 33 and the post-dead-processing power calculation unit 34 can be implemented with well-known electronic circuits, so detailed circuit diagrams are omitted.

The operation of the dead processing unit 33 is as follows. In H₁(R₁) and H₂(R₁) defined by equations (26) and (27), the vector R₁ indicating the reference position is replaced with the vector R₀ indicating the target position, and the two constraints of equations (49) and (50) are imposed:

$$W_Z^{H} H_1(R_0) = 0 \qquad (49)$$
$$W_Z^{H} H_2(R_0) = 0 \qquad (50)$$

The dead audio signal obtained as the weighted sum W_Z^H F(t), using a weight vector W_Z = (w′, w′_t, w′_x, w′_y)^T that satisfies these conditions (hereinafter the "dead-point weight vector"), has a dead point at the target position. The post-dead-processing power P_z is given by the following equation.
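The text does not say how W_Z is chosen among the vectors satisfying (49) and (50). One simple possibility, shown here purely as an illustration, is to project a starting vector onto the orthogonal complement of span{H₁(R₀), H₂(R₀)}. The sketch assumes real 4-vectors for H₁(R₀) and H₂(R₀) and a hypothetical starting vector w0; its test uses an assumed signal model F(t) = H₁(R₀)s(t) + H₂(R₀)ṡ(t) for a source at the target, which is consistent with the gain p + jωq described for the reference position.

```python
import numpy as np


def dead_point_weights(h1_target, h2_target, w0):
    """Project w0 so that W_Z^T H1(R0) = 0 and W_Z^T H2(R0) = 0 (eqs. 49, 50).

    h1_target, h2_target: constraint vectors H1(R0), H2(R0) (assumed given).
    w0: hypothetical starting weight vector, e.g. (1, 0, 0, 0), the channel f alone.
    """
    C = np.column_stack([h1_target, h2_target])         # 4 x 2
    # Remove the component of w0 that lies in span{H1(R0), H2(R0)}.
    return w0 - C @ np.linalg.solve(C.T @ C, C.T @ w0)


def dead_processing_ratio(F, w_z):
    """P_z / P_f: power of W_Z^T F(t) relative to the power of f (row 0 of F)."""
    dead_signal = w_z @ F
    return float(dead_signal @ dead_signal) / float(F[0] @ F[0])


if __name__ == "__main__":
    h1 = np.array([1.0, 0.0, 0.3, 0.1])   # hypothetical H1(R0)
    h2 = np.array([0.0, 1.0, 0.2, 0.4])   # hypothetical H2(R0)
    w_z = dead_point_weights(h1, h2, np.array([1.0, 0.0, 0.0, 0.0]))
    print(w_z @ h1, w_z @ h2)             # both numerically zero
```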

Further, in the example of Fig. 12, the post-processing power calculation unit 32 may be added, and step S8 may search for the reference distance r₁ that minimizes the ratio P_c/P_z of the processed power P_c output by the post-processing power calculation unit 32 to the post-dead-processing power P_z output by the post-dead-processing power calculation unit 34.

In steps S7 and S9, an upper limit (hereinafter the "upper-limit width") may also be placed on the change in the reference distance r₁ per update operation: if the absolute difference between the reference distance obtained in step S7 or S8 and the pre-update reference distance r₁ exceeds the upper-limit width, r₁ is changed only by the upper-limit width. This configuration suppresses the distortion of the processed audio signal caused by large changes in r₁ and reduces the unnatural impression given by the sound reproduced from the processed audio signal.
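The upper-limit rule amounts to clamping each update step. A minimal Python sketch, with hypothetical names:

```python
def limit_update(r1_current, r1_target, max_step):
    """Move r1 toward r1_target by no more than max_step (the upper-limit width)."""
    delta = r1_target - r1_current
    if abs(delta) > max_step:
        delta = max_step if delta > 0 else -max_step
    return r1_current + delta


if __name__ == "__main__":
    print(limit_update(0.5, 1.0, 0.1))   # moves only one upper-limit width
```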

Alternatively, in steps S7 and S9, the change in the reference distance r₁ per update operation may be fixed. For example, the audio processing unit 2 performs whichever one of three operations (leaving r₁ unchanged, increasing r₁ by a unit increase, or decreasing r₁ by a unit decrease) brings r₁ after the update operation closest to the reference distance obtained by the calculation in step S7 or the search in step S8. The unit increase and the unit decrease may differ from each other. This configuration likewise suppresses distortion of the processed audio signal caused by large changes in the reference distance and reduces the unnatural impression given by the sound reproduced from the processed audio signal.
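The fixed-step variant chooses among three candidate values. A minimal Python sketch with hypothetical names; on a tie, min() keeps the first candidate, so leaving r₁ unchanged wins, which is a choice the text does not specify:

```python
def quantized_update(r1_current, r1_target, unit_increase, unit_decrease):
    """Pick whichever of keep / +unit_increase / -unit_decrease is closest to r1_target."""
    options = (
        r1_current,                    # leave r1 unchanged
        r1_current + unit_increase,    # increase by the unit increase
        r1_current - unit_decrease,    # decrease by the unit decrease
    )
    return min(options, key=lambda r: abs(r - r1_target))


if __name__ == "__main__":
    print(quantized_update(0.5, 0.9, unit_increase=0.1, unit_decrease=0.05))
```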
<List of references>
Reference 1: Shigeru Ando, "Velocity vector distribution measurement system using spatio-temporal differential calculation of images," Transactions of the Society of Instrument and Control Engineers, Vol. 22, No. 12, pp. 1330-1336, 1986.
Reference 2: N. Ono, T. Arita, Y. Senjo, and S. Ando, "Directivity steering principle for biomimicry silicon microphone," Proc. Int. Conf. Solid State Sensors, Actuators, and Microsystems (Transducers '05), pp. 792-795, 2005.
Reference 3: Ono and Ando, "Measurement of sound field and directivity control," 22nd Sensing Forum, pp. 305-310, 2005.
Reference 4: Ono, Arita, Senjo, and Ando, "Theory of directivity control and sound source separation based on spatiotemporal gradient measurement," Proc. Spring Meeting of the Acoustical Society of Japan, 2-6-13, pp. 607-608, 2005.
Reference 5: Shigeru Ando, Hiroyuki Shinoda, Katsuya Ogawa, and Satoshi Mitsuyama, "3D sound source localization sensor system based on spatiotemporal gradient method," Transactions of the Society of Instrument and Control Engineers, Vol. 29, No. 5, pp. 520-528, 1993.

Fig. 1 is a block diagram showing an embodiment of the present invention.
Fig. 2 is an explanatory diagram showing an example of the positional relationship between the reference position and the sound receiving unit in the same embodiment.
Figs. 3(a) and 3(b) are explanatory diagrams each showing an example of the distribution of the noise suppression amount in the case of Fig. 2; (a) shows the case where the reference distance is 0.5 m, and (b) the case where the reference distance is 1.0 m.
Fig. 4 is an explanatory diagram showing another example of the positional relationship between the reference position and the sound receiving unit in the same embodiment.
Fig. 5 is an explanatory diagram showing an example of the distribution of the noise suppression amount in the case of Fig. 4.
Fig. 6 is an explanatory diagram showing an example of the positional relationship between a noise source and the sound receiving unit in the same embodiment.
Fig. 7 is an explanatory diagram showing the relationship between the reference distance and the noise suppression amount in the case of Fig. 6.
Figs. 8(a) to 8(c) are explanatory diagrams showing the operation of the same embodiment; (a) shows the waveform of the unprocessed audio signal, (b) the waveform of the processed audio signal when the reference distance is 0.5 m, and (c) the waveform of the processed audio signal when the reference distance is 1.0 m.
Fig. 9 is an explanatory diagram showing another example of the positional relationship between noise sources and the sound receiving unit in the same embodiment.
Fig. 10 is an explanatory diagram showing the relationship between the reference distance and the noise suppression amount in the case of Fig. 9.
Fig. 11 is a flowchart showing the update operation in the same embodiment.
Fig. 12 is a block diagram showing another form of the same embodiment.

Explanation of symbols

1 Sound receiving unit
2 Audio processing unit
30 Determination unit
31 Pre-processing power calculation unit
32 Post-processing power calculation unit
33 Dead processing unit
34 Post-dead-processing power calculation unit

Claims

A sound collection device comprising:
a sound receiving unit that outputs an unprocessed audio signal having an intensity corresponding to the sound pressure of the input sound, a time-differential signal having an intensity corresponding to the time derivative of the sound pressure, and an x-differential signal and a y-differential signal corresponding to the spatial derivatives of the sound pressure along the respective axes of a two-dimensional orthogonal coordinate system; and
an audio processing unit that generates a processed audio signal as a weighted sum of the unprocessed audio signal, the time-differential signal, the x-differential signal, and the y-differential signal output by the sound receiving unit;
wherein the audio processing unit determines the weights used for the weighted sum by the MV method, which minimizes the processed power, namely the integral of the square of the intensity of the processed audio signal over a predetermined weighted-sum calculation time, under the constraint that the gain for sound from a reference position is constant, the reference position being determined according to a predetermined target position at which the target sound is considered to originate;
the reference position is selected on a straight line that starts at the sound receiving unit and passes through the target position; and
the audio processing unit periodically performs an update operation in which the reference distance, namely the distance between the sound receiving unit and the reference position, may be updated so that the processed power is minimized.

The sound collection device according to claim 1, further comprising: a pre-processing power calculation unit that calculates the unprocessed power, namely the integral of the square of the intensity of the unprocessed audio signal over the weighted-sum calculation time; and a determination unit that compares the unprocessed power obtained by the pre-processing power calculation unit with a predetermined volume threshold; wherein, if the comparison in the determination unit shows that the unprocessed power is less than the volume threshold, the audio processing unit does not change the reference distance during the update operation.

The sound collection device according to claim 1, further comprising: a pre-processing power calculation unit that calculates the unprocessed power, namely the integral of the square of the intensity of the unprocessed audio signal over the weighted-sum calculation time; a post-processing power calculation unit that calculates the processed power; and a determination unit that calculates a processing ratio, namely the ratio of the processed power output by the post-processing power calculation unit to the unprocessed power output by the pre-processing power calculation unit, and compares the obtained processing ratio with a predetermined target sound threshold; wherein the audio processing unit does not change the reference distance during the update operation if the processing ratio obtained by the determination unit is equal to or greater than the target sound threshold.

The sound collection device according to claim 1 or claim 2, further comprising: a dead processing unit that generates, as a weighted sum of the unprocessed audio signal, the time-differential signal, the x-differential signal, and the y-differential signal output by the sound receiving unit, a dead audio signal that does not reflect sound from the target position; a pre-processing power calculation unit that calculates the unprocessed power, namely the integral of the square of the intensity of the unprocessed audio signal over the weighted-sum calculation time; a post-dead-processing power calculation unit that calculates the post-dead-processing power, namely the integral of the square of the intensity of the dead audio signal over the weighted-sum calculation time; and a determination unit that calculates a dead-processing ratio, namely the ratio of the post-dead-processing power output by the post-dead-processing power calculation unit to the unprocessed power output by the pre-processing power calculation unit, and compares the obtained dead-processing ratio with a predetermined noise threshold; wherein the audio processing unit does not change the reference distance during the update operation if the dead-processing ratio obtained by the determination unit is less than the noise threshold.

The sound collection device according to claim 3, wherein, when the processing ratio obtained by the determination unit is less than the target sound threshold during the update operation, the audio processing unit updates the reference distance to a value that minimizes the processing ratio.

The sound collection device according to claim 4, further comprising a post-processing power calculation unit that calculates the processed power,
wherein, when the dead-processing ratio obtained by the determination unit is equal to or greater than the noise threshold during the update operation, the audio processing unit updates the reference distance to a value that minimizes the ratio of the processed power output by the post-processing power calculation unit to the post-dead-processing power output by the post-dead-processing power calculation unit.

The sound collection device according to any one of the preceding claims, wherein, when updating the reference distance, the audio processing unit determines, based on the unprocessed audio signal, the time-differential signal, the x-differential signal, and the y-differential signal output by the sound receiving unit, whether the number of sound sources is one, and, when the number of sound sources is one, estimates the distance to the sound source and uses the estimated distance as the updated reference distance.

The sound collection device according to claim 5, wherein the audio processing unit limits the change in the reference distance in one update operation to a predetermined upper-limit width or less.

The sound collection device according to claim 5, wherein the operations that the audio processing unit may perform on the reference distance during the update operation are limited to leaving the reference distance unchanged, increasing it by a predetermined unit increase, and decreasing it by a predetermined unit decrease.