JP7630872B2

JP7630872B2 - Noise Update Circuit

Info

Publication number: JP7630872B2
Application number: JP2020113298A
Authority: JP
Inventors: 康二郎今里
Original assignee: Japan Radio Co Ltd
Current assignee: Japan Radio Co Ltd
Priority date: 2020-06-30
Filing date: 2020-06-30
Publication date: 2025-02-18
Anticipated expiration: 2040-06-30
Also published as: JP2022011890A

Description

This invention relates to a noise update circuit, for example one that can be used in a noise reduction circuit incorporated in a radio that transmits and receives high-frequency signals.

Spectral subtraction is a known method for suppressing noise components contained in an audio signal (see, for example, Patent Document 1 and Non-Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. H04-238399

Non-Patent Document 1: P. Scalart and J. Vieira Filho, "Speech Enhancement Based on a Priori Signal to Noise Estimation," IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA, USA, vol. 2, pp. 629-632, 1996

To apply spectral subtraction properly, however, it is important to estimate the noise components accurately and subtract them from the input audio signal.

An object of this invention is therefore to provide a noise update circuit that can accurately estimate noise components.

To solve the above problem, the invention described in claim 1 is a noise update circuit comprising:
a first update unit that, when the frame to be processed contains only noise components, takes the signal corresponding to the amplitude spectrum of that frame as the input signal spectrum Yi(f) and calculates an updated noise spectrum Ni(f) for each frequency f according to Equation 1 below, which is an IIR filter, or Equation 2 below, which is an FIR filter (where Ni-1(f) is the noise spectrum one frame before the update; Yi-j(f) is the input signal spectrum j frames before the update; Kn is a constant that determines the weighting of Ni-1(f) relative to Yi(f) when the frame to be processed contains only noise components; i is an ordinal number representing the order in the time series; and j is an integer of 0 or more representing the distance from ordinal number i in the time series); and
a second update unit that, when the frame to be processed contains a voice component, calculates the updated noise spectrum Ni(f) for each frequency f according to Equation 3 below, which is an IIR filter, or Equation 4 below, which is an FIR filter (where Ks is a constant that determines the weighting of Ni-1(f) relative to Yi(f) when the frame to be processed contains a voice component, and Kn < Ks);
the noise update circuit further comprising:
an average calculation unit that calculates an average spectrum level M of the input signal spectrum Yi(f) according to Equation 5 below (where f1 is the minimum frequency in the amplitude spectrum, f2 is the maximum frequency in the amplitude spectrum, and Fn is the number of frequencies in the range from f1 to f2); and
a third update unit that, for each frequency f at which Yi(f) ≥ Te×M, determines the updated noise spectrum Ni(f) according to Equation 6 below (where Te is a coefficient and Ni-1(f) is the noise spectrum one frame before the update).

The invention described in claim 2 is the noise update circuit of claim 1, wherein the constant Ks is set to a value within the range corresponding to an IIR filter time constant, or an FIR filter averaging interval, of 1 to 10 seconds.

The invention described in claim 3 is the noise update circuit of claim 1, wherein the coefficient Te is set to a value within the range of 1 to 100.

According to the invention described in claim 1, noise fluctuations can be followed, and the noise components estimated accurately, even when frames containing voice components arrive in succession. Specifically, a conventional circuit implementing spectral subtraction updates the noise spectrum when the frame to be processed contains only noise components but does not update it when the frame contains voice components. When frames containing voice components continue, the noise spectrum is therefore not updated, the circuit cannot follow noise fluctuations, and as a result it cannot estimate the noise components accurately. In contrast, the invention described in claim 1 updates the noise spectrum even when the frame to be processed contains voice components, so it can follow noise fluctuations and accurately estimate the noise components even when such frames continue.
The invention described in claim 1 also makes it possible to avoid suppressing tone signals. Specifically, a conventional circuit implementing spectral subtraction judges components that are steadily present in the frequency spectrum to be noise and suppresses them. Because tone signals are also stationary, they are judged to be noise and suppressed, so tone signals the user needs (for example, Morse code) are suppressed as well. In contrast, the invention described in claim 1 excludes frequency components whose amplitude spectrum is significantly larger than the average spectrum level from the update of the noise spectrum, which makes it possible to avoid suppressing tone signals.
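
The tone-exclusion rule of the third update unit can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name, the list representation of the spectra, and the example value of Te are hypothetical; Equation 5 is taken to be the arithmetic mean of Yi(f) over the Fn frequency bins; and Equation 6 is assumed to hold the previous estimate (Ni(f) = Ni-1(f)) for the excluded bins, which is how the exclusion of such components from the update is read here. The normal update applied to the remaining bins is passed in as a callable so the sketch does not commit to a particular filter.

```python
def update_with_tone_exclusion(y, n_prev, te, normal_update):
    """Return the updated noise spectrum, skipping bins that look like tones.

    y             : amplitude spectrum Yi(f) of the current frame (list of floats)
    n_prev        : previous noise spectrum Ni-1(f), same length as y
    te            : coefficient Te (the value used below is only an example)
    normal_update : callable (y_f, n_prev_f) -> new noise value for one bin
    """
    # Average spectrum level M, assumed to be the plain mean over all bins.
    m = sum(y) / len(y)
    n_new = []
    for y_f, n_f in zip(y, n_prev):
        if y_f >= te * m:
            # Bin is far above the average level: treat it as a tone and
            # keep the previous noise estimate (assumed form of Equation 6).
            n_new.append(n_f)
        else:
            n_new.append(normal_update(y_f, n_f))
    return n_new


# Example: one strong tone in bin 2 of an otherwise flat spectrum.
y = [1.0, 1.0, 20.0, 1.0]
n_prev = [0.5, 0.5, 0.5, 0.5]
n_new = update_with_tone_exclusion(
    y, n_prev, te=3.0, normal_update=lambda y_f, n_f: (y_f + n_f) / 2
)
```

In this example only bin 2 exceeds Te×M, so its noise estimate is left unchanged while the other bins are updated.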

According to the invention described in claim 2, the constant Ks, which determines the weighting of the noise spectrum Ni-1(f) one frame before the update relative to the input signal spectrum Yi(f) when the frame to be processed contains a voice component, is set to an appropriate value. This makes it possible to follow noise fluctuations more reliably when frames containing voice components continue, and to estimate the noise components more reliably and accurately.

According to the invention described in claim 3, the coefficient Te is set to an appropriate value, so frequency components whose amplitude spectrum is significantly larger than the average spectrum level can be excluded more reliably from the update of the noise spectrum, and suppression of tone signals can be avoided more reliably.

Fig. 1 is a functional block diagram showing a schematic configuration of a noise reduction circuit including a noise update circuit according to an embodiment of the present invention.
Fig. 2 is a functional block diagram showing a schematic configuration of the noise update circuit according to Embodiment 1.
Fig. 3 is a functional block diagram showing a schematic configuration of the noise update circuit according to Embodiment 2.

The present invention is described below based on the illustrated embodiments.

(Embodiment 1)
Fig. 1 is a functional block diagram showing a schematic configuration of a noise reduction circuit 1 including a noise update circuit 11 according to an embodiment of the present invention. Fig. 2 is a functional block diagram showing a schematic configuration of the noise update circuit 11 according to Embodiment 1.

The noise reduction circuit 1 is incorporated, for example, in a radio that transmits and receives high-frequency signals, and implements spectral subtraction, a method for suppressing noise components contained in an audio signal. It mainly comprises a pre-emphasis circuit 2, a window processing unit 3, a time-frequency conversion unit 4, a conversion result output unit 5, a subtraction unit 6, a synthesis unit 7, a frequency-time conversion unit 8, a de-emphasis circuit 9, a voice activity detection unit 10, and a noise update circuit 11.

The pre-emphasis (PE) circuit 2 applies high-frequency emphasis, which boosts the relative intensity of high-frequency components in advance, to the audio signal demodulated from the high-frequency signal received by the antenna, and outputs the emphasized signal.

The window processing unit 3 receives the emphasized signal output from the pre-emphasis circuit 2, extracts frames of a predetermined length from it (for example, a 25 ms time waveform every 12.5 ms), and applies windowing to each frame by multiplying it by a window function such as a Hanning window. Each time it windows a frame, the window processing unit 3 outputs the windowed frame.
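
The framing and windowing step can be sketched as follows. This is a minimal illustration, assuming a sampling rate of 8000 Hz (not stated in the text) and a symmetric Hann window; the function names are hypothetical. With these assumptions a 25 ms frame is 200 samples and the 12.5 ms hop is 100 samples, so consecutive frames overlap by half.

```python
import math


def hann(length):
    # Symmetric Hann ("Hanning") window; the exact variant is an assumption.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(signal, sample_rate, frame_s=0.025, hop_s=0.0125):
    """Cut `signal` into overlapping frames and multiply each by the window."""
    frame_len = round(sample_rate * frame_s)  # e.g. 200 samples at 8000 Hz
    hop_len = round(sample_rate * hop_s)      # e.g. 100 samples at 8000 Hz
    window = hann(frame_len)
    frames = []
    # Only full-length frames are produced; a trailing partial frame is dropped.
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


# Example: 0.1 s of a constant signal at the assumed 8000 Hz sampling rate.
signal = [1.0] * 800
frames = windowed_frames(signal, sample_rate=8000)
```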

The time-frequency conversion unit 4 receives each windowed frame output from the window processing unit 3 and, each time a frame arrives, converts it from a time-domain signal to a frequency-domain signal, calculating a frequency spectrum that contains an amplitude component and a phase component for each of a plurality of frequencies, and outputs that frequency spectrum as real and imaginary parts. The time-frequency conversion unit 4 performs the conversion by, for example, a discrete Fourier transform or a fast Fourier transform.

The conversion result output unit 5 receives the frequency spectrum signal output from the time-frequency conversion unit 4 for each frame (for example, at intervals of about 12.5 ms). For each frame, it outputs to the subtraction unit 6 a signal corresponding to the amplitude spectrum, which contains the amplitude component of each frequency in the input frequency spectrum, and outputs to the synthesis unit 7 a signal corresponding to the phase spectrum, which contains the phase component of each frequency.

The subtraction unit 6 receives, for each frame, the signal corresponding to the amplitude spectrum output from the conversion result output unit 5 and the signal corresponding to the updated noise spectrum output from the noise update circuit 11. For each frame, it subtracts the updated noise spectrum from the amplitude spectrum frequency by frequency (in other words, spectral component by spectral component). This suppresses the noise components contained in the audio signal. For each frame, the subtraction unit 6 outputs a signal corresponding to the amplitude spectrum after subtraction.
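
The per-frequency subtraction can be sketched as follows. This is a minimal illustration with a hypothetical function name. Clamping negative results to zero is an assumption added here so that the output remains a valid amplitude; the text itself does not say how negative differences are handled.

```python
def subtract_noise(amplitude, noise):
    """Subtract the noise spectrum from the amplitude spectrum, bin by bin.

    Negative differences are clamped to 0.0 (an assumption, not from the text).
    """
    return [max(a - n, 0.0) for a, n in zip(amplitude, noise)]


# Example: the last two bins are at or below the estimated noise level.
cleaned = subtract_noise([3.0, 1.0, 0.5], [1.0, 1.0, 2.0])
```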

The synthesis unit 7 receives, for each frame, the signal corresponding to the phase spectrum output from the conversion result output unit 5 and the signal corresponding to the amplitude spectrum after subtraction output from the subtraction unit 6. For each frame, it combines the two to generate a frequency spectrum and outputs that frequency spectrum as real and imaginary parts.
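
The split into amplitude and phase spectra performed by the conversion result output unit 5, and the recombination performed by the synthesis unit 7, can be sketched with complex numbers as follows. The function names and the list-of-complex representation are assumptions made for illustration only.

```python
import cmath


def split_spectrum(spectrum):
    """Split a complex spectrum into amplitude and phase lists."""
    amplitude = [abs(c) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    return amplitude, phase


def combine_spectrum(amplitude, phase):
    """Rebuild complex (real and imaginary) values from amplitude and phase."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


# Example: a round trip without any subtraction returns the original values.
spectrum = [3 + 4j, -1j]
amplitude, phase = split_spectrum(spectrum)
rebuilt = combine_spectrum(amplitude, phase)
```

In the noise reduction circuit, the amplitude list would be modified by the subtraction unit before recombination, while the phase list is passed through unchanged.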

The frequency-time conversion unit 8 receives the frequency spectrum signal output from the synthesis unit 7 for each frame and, for each frame, converts it from a frequency-domain signal to a time-domain signal, that is, applies the inverse of the conversion performed by the time-frequency conversion unit 4, and outputs an audio signal. The frequency-time conversion unit 8 performs the conversion by, for example, an inverse discrete Fourier transform or an inverse fast Fourier transform.

The de-emphasis (DE) circuit 9 receives the audio signal output from the frequency-time conversion unit 8, applies high-frequency attenuation that reduces the relative intensity of high-frequency components, that is, attenuation by the inverse filter of the pre-emphasis circuit 2, and outputs the attenuated audio signal.

The voice activity detection unit 10 receives, for each frame, the signal corresponding to the amplitude spectrum that is output from the conversion result output unit 5 and branched, and determines for each frame whether it is a frame containing only noise components or a frame containing a voice component.

The method by which the voice activity detection unit 10 determines whether the frame to be processed contains only noise components or contains a voice component is not limited to any specific procedure or technique; a suitable one may be selected as appropriate from conventional or new procedures and techniques.

One possible determination method focuses on the non-stationarity of speech: if the mean or variance of the per-frequency amplitudes of the amplitude spectrum has been below a predetermined threshold for several consecutive recent frames (for example, about 3 to 5), the frame to be processed is judged to contain only noise components; otherwise it is judged to contain a voice component. Another possible method judges the frame to contain only noise components when the mean or variance of the per-frequency amplitudes is below a predetermined threshold, and to contain a voice component when it is equal to or greater than the threshold.
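
The first of these determination methods can be sketched as follows. This is a minimal illustration, not the detector used in the embodiment: the class name, the choice of the mean (rather than the variance) as the statistic, and the threshold and count values are all assumptions.

```python
class ConsecutiveThresholdDetector:
    """Judge a frame noise-only after enough consecutive low-level frames."""

    def __init__(self, threshold, required_count=3):
        self.threshold = threshold            # hypothetical level threshold
        self.required_count = required_count  # e.g. about 3 to 5 frames
        self.below_count = 0                  # current run of low-level frames

    def is_noise_only(self, amplitude):
        # Statistic: mean of the per-frequency amplitudes of this frame.
        mean_level = sum(amplitude) / len(amplitude)
        if mean_level < self.threshold:
            self.below_count += 1
        else:
            self.below_count = 0  # any loud frame resets the run
        return self.below_count >= self.required_count


# Example: three quiet frames, one loud frame, then one quiet frame.
detector = ConsecutiveThresholdDetector(threshold=1.0, required_count=3)
results = [detector.is_noise_only(frame)
           for frame in ([0.5], [0.5], [0.5], [2.0], [0.5])]
```

Only the third frame is judged noise-only here: the first two have not yet reached the required run length, and the loud fourth frame resets the count.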

The voice activity detection unit 10 outputs a noise frame signal when it determines that the frame to be processed contains only noise components, and outputs a voice frame signal when it determines that the frame contains a voice component. For each frame, it thus outputs either a noise frame signal or a voice frame signal as the voice activity detection result.

The noise update circuit 11 according to the embodiment has a first update unit that, when the frame to be processed contains only noise components, takes the signal corresponding to the amplitude spectrum of that frame as the input signal spectrum Yi(f) and calculates the updated noise spectrum Ni(f) for each frequency f by processing that includes averaging (where i is an ordinal number representing the order in the time series), and a second update unit that, when the frame to be processed contains a voice component, calculates the updated noise spectrum Ni(f) for each frequency f by processing that includes averaging. The averaging time of the input signal spectrum Yi(f) used when calculating Ni(f) for a frame containing a voice component is longer than the averaging time used for a frame containing only noise components.

The noise update circuit 11 updates the noise spectrum, which represents the previously calculated noise component at each frequency, to the latest noise spectrum by taking into account the amplitude spectrum of the current frame (in other words, the frame to be processed, or the latest frame). It has a first update unit 111 and a second update unit 112.

The noise update circuit 11 receives, for each frame, the signal corresponding to the amplitude spectrum that is output from the conversion result output unit 5 and branched, and the voice activity detection result output from the voice activity detection unit 10. Using the input amplitude spectrum, it updates the noise spectrum Ni-1(f) of one frame earlier for each frequency f, calculating the updated noise spectrum Ni(f) according to Equation 7 or Equation 8, or Equation 9 or Equation 10, below, depending on the voice activity detection result. In the equations that follow, the subscript i is an ordinal number representing the order in the time series and applies in common to all equations. The subscript j is a variable representing the distance from ordinal number i in the time series and is an integer of 0 or more.

When the voice activity detection result input to the noise update circuit 11 is a noise frame signal, the first update unit 111 takes the signal corresponding to the input amplitude spectrum as the input signal spectrum Yi(f) and calculates the updated noise spectrum Ni(f) for each frequency f according to Equation 7 below, which is an IIR (infinite impulse response) filter, or Equation 8 below, which is an FIR (finite impulse response) filter. In the equations that follow, Ni-1(f) represents the noise spectrum one frame before the update, and Yi-j(f) represents the input signal spectrum j frames before the update.

Kn in Equations 7 and 8 is a constant that determines the weighting of the noise spectrum Ni-1(f) one frame before the update relative to the input signal spectrum Yi(f), which is the amplitude spectrum of the frame to be processed (in other words, the current frame, or the latest frame), when that frame contains only noise components. It is called the "noise pre-update weighting constant Kn."

The noise pre-update weighting constant Kn is not limited to a specific value as long as it is an integer of 0 or more. For example, Kn may be set to a value within the range corresponding to an IIR filter time constant, or an FIR filter averaging interval, of about 0.06 to 0.20 seconds (for example, about Kn = 5 to 16 at a frame interval of 12.5 ms), and in particular to a value corresponding to about 0.1 seconds (for example, about Kn = 8 at a frame interval of 12.5 ms). Note that for a given time constant or averaging interval, for example 0.1 seconds, the specific value of Kn in Equation 7 differs from the specific value of Kn in Equation 8.

When the voice activity detection result input to the noise update circuit 11 is a voice frame signal, the second update unit 112 takes the signal corresponding to the input amplitude spectrum as the input signal spectrum Yi(f) and calculates the updated noise spectrum Ni(f) for each frequency f according to Equation 9 below, which is an IIR filter, or Equation 10 below, which is an FIR filter.

Ks in Equations 9 and 10 is a constant that determines the weighting of the noise spectrum Ni-1(f) one frame before the update relative to the input signal spectrum Yi(f), which is the amplitude spectrum of the frame to be processed (in other words, the current frame, or the latest frame), when that frame contains a voice component. It is called the "voice pre-update weighting constant Ks."

The voice pre-update weighting constant Ks is not limited to a specific value as long as it is an integer larger than the noise pre-update weighting constant Kn, and preferably sufficiently larger. For example, Ks may be set to a value within the range corresponding to an IIR filter time constant, or an FIR filter averaging interval, of about 1 to 10 seconds (for example, about Ks = 80 to 800 at a frame interval of 12.5 ms), more specifically within the range corresponding to about 1 to 6 seconds (for example, about Ks = 80 to 500 at a frame interval of 12.5 ms), and in particular to a value corresponding to about 3.7 seconds (for example, about Ks = 295 at a frame interval of 12.5 ms). Note that for a given time constant or averaging interval, for example 3.7 seconds, the specific value of Ks in Equation 9 differs from the specific value of Ks in Equation 10.
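
The example values of Kn and Ks quoted for a 12.5 ms frame interval appear to follow, approximately, from dividing the duration by the frame interval. The arithmetic below is a rough check under that assumption only; as the text notes, the exact constant for a given duration differs between the IIR and FIR forms.

```latex
\begin{align*}
K &\approx \frac{T}{\Delta t}, \qquad \Delta t = 12.5\ \text{ms} = 0.0125\ \text{s} \\
T = 0.06\ \text{s} &: \quad 0.06 / 0.0125 = 4.8 \approx 5 \\
T = 0.1\ \text{s}  &: \quad 0.1 / 0.0125 = 8 \\
T = 0.20\ \text{s} &: \quad 0.20 / 0.0125 = 16 \\
T = 1\ \text{s}    &: \quad 1 / 0.0125 = 80 \\
T = 3.7\ \text{s}  &: \quad 3.7 / 0.0125 = 296 \quad (\text{quoted as about } 295) \\
T = 6\ \text{s}    &: \quad 6 / 0.0125 = 480 \quad (\text{quoted as about } 500) \\
T = 10\ \text{s}   &: \quad 10 / 0.0125 = 800
\end{align*}
```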

Because speech is non-stationary, setting the voice pre-update weighting constant Ks to a sufficiently large value prevents voice components from being strongly reflected in the noise spectrum update, so voice components are not suppressed as noise.

When an IIR filter is used as the averaging process for calculating the updated noise spectrum Ni(f) (general form: Ni(f) = ΣAjYi-j(f) + ΣBkNi-k(f), where A and B are coefficients, i is an ordinal number representing the order in the time series, j is an integer of 0 or more representing the distance from ordinal number i in the time series, and k is an integer of 1 or more representing the distance from ordinal number i in the time series), the configuration corresponds to the following case: the circuit has a first update unit 111 that, when the frame to be processed contains only noise components, takes the signal corresponding to the amplitude spectrum of that frame as the input signal spectrum Yi(f) and calculates the updated noise spectrum Ni(f) for each frequency f by processing that includes averaging, and a second update unit 112 that, when the frame to be processed contains a voice component, calculates the updated noise spectrum Ni(f) for each frequency f by processing that includes averaging; and the averaging time of the average using the input signal spectrum Yi(f), the past input signal spectra Yi-j(f), and the past noise spectra Ni-k(f) when calculating Ni(f) for a frame containing a voice component is longer than the corresponding averaging time for a frame containing only noise components.

When an IIR filter obtained from the above general form by setting Aj = 0 for j ≥ 1 is used, the configuration corresponds to the following case: the circuit has the first update unit 111 and the second update unit 112 described above, each calculating the updated noise spectrum Ni(f) for each frequency f by processing that includes averaging; and the averaging time of the average using the input signal spectrum Yi(f) and the past noise spectra Ni-k(f) when calculating Ni(f) for a frame containing a voice component is longer than the corresponding averaging time for a frame containing only noise components.

Note that, for the general form of the IIR filter above with Aj = 0 for j ≧ 1, the case where A0 = 1/(1+Kn), B1 = Kn/(1+Kn), and Bk = 0 (k ≧ 2) corresponds to Equation 7 above, and the case where A0 = 1/(1+Ks), B1 = Ks/(1+Ks), and Bk = 0 (k ≧ 2) corresponds to Equation 9 above.
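As a minimal sketch of the first-order IIR update that these coefficients describe (A0 weights the current input spectrum, B1 weights the previous noise spectrum), the following Python function applies Ni(f) = Yi(f)/(1+K) + K·Ni-1(f)/(1+K) to each frequency bin. The function name `iir_noise_update`, the list-based spectra, and the numeric values are illustrative assumptions and do not come from the patent text; K would be Kn for a noise-only frame and Ks for a frame containing voice.

```python
def iir_noise_update(y, n_prev, k):
    """One first-order IIR update per frequency bin.

    y      : current input amplitude spectrum Yi(f), one value per bin
    n_prev : previous noise spectrum Ni-1(f), same length as y
    k      : weighting constant (Kn or Ks); a larger k updates more slowly
    """
    if len(y) != len(n_prev):
        raise ValueError("spectra must have the same number of bins")
    a0 = 1.0 / (1.0 + k)   # weight of the current input spectrum (A0)
    b1 = k / (1.0 + k)     # weight of the previous noise spectrum (B1)
    return [a0 * yi + b1 * ni for yi, ni in zip(y, n_prev)]


# Hypothetical values: a larger constant moves the estimate less per frame.
fast = iir_noise_update([4.0, 8.0], [0.0, 0.0], k=1.0)   # [2.0, 4.0]
slow = iir_noise_update([4.0, 8.0], [0.0, 0.0], k=3.0)   # [1.0, 2.0]
```

Because A0 + B1 = 1, the update is a weighted average, and increasing K lengthens the effective averaging time.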

Furthermore, when an FIR filter is used as the averaging process for calculating the updated noise spectrum Ni(f) (general form: Ni(f) = ΣAjYi-j(f), where A is a coefficient, i is an ordinal number representing the order in the time series, and j is an integer equal to or greater than 0 representing the distance from ordinal number i in the time series; that is, the general form of the IIR filter described above with Bk = 0), the noise update circuit 11 has a first update unit 111 that, when the frame to be processed contains only noise components, calculates the updated noise spectrum Ni(f) for each frequency f by processing that includes averaging, using a signal corresponding to the amplitude spectrum of the frame to be processed as the input signal spectrum Yi(f), and a second update unit 112 that, when the frame to be processed contains a voice component, calculates the updated noise spectrum Ni(f) for each frequency f by processing that includes averaging. This corresponds to the case where the averaging time of the average over the input signal spectrum Yi(f) and the past input signal spectra Yi-j(f), used to calculate the updated noise spectrum Ni(f) for a frame containing a voice component, is longer than the corresponding averaging time for a frame containing only noise components.

Note that, for the general form of the FIR filter above, the case where Aj = 1/Kn corresponds to Equation 8 above, and the case where Aj = 1/Ks corresponds to Equation 10 above.
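With equal coefficients Aj = 1/K, the FIR form reduces to an arithmetic mean of recent input spectra. The sketch below assumes that the sum runs over the K most recent frames (j = 0 to K-1), so that the weights sum to one; the summation range is not stated in this passage, and the function name `fir_noise_update` and the sample data are hypothetical.

```python
from collections import deque


def fir_noise_update(history, k):
    """Average the k most recent input spectra with equal weights 1/k.

    history : sequence of amplitude spectra, oldest first, newest last;
              must hold at least k frames
    k       : number of frames averaged (Kn or Ks), a positive integer
    """
    if k <= 0 or len(history) < k:
        raise ValueError("need a positive k and at least k stored frames")
    recent = list(history)[-k:]
    bins = len(recent[0])
    return [sum(frame[b] for frame in recent) / k for b in range(bins)]


# Hypothetical two-bin spectra for four consecutive frames.
frames = deque([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0], [7.0, 2.0]], maxlen=8)
short_avg = fir_noise_update(frames, k=2)   # mean of the last 2 frames
long_avg = fir_noise_update(frames, k=4)    # mean of all 4 frames
```

The longer window responds more slowly to the rising first bin, which is the sense in which a larger constant gives a longer averaging time.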

Each time the noise update circuit 11 receives a signal corresponding to the amplitude spectrum of a frame together with the voice section detection result, it outputs a signal corresponding to the updated noise spectrum Ni(f) for each frequency f to the subtraction unit 6. For each frame, the subtraction unit 6 subtracts the signal corresponding to the updated noise spectrum Ni(f), output from the noise update circuit 11, from the signal corresponding to the amplitude spectrum output from the conversion result output unit 5.
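The per-frame subtraction performed by the subtraction unit 6 can be sketched as follows. Clamping negative results to zero is an assumption made here so that the output remains a valid amplitude spectrum; this passage does not say how negative differences are handled, and the function name `subtract_noise` is hypothetical.

```python
def subtract_noise(amplitude, noise):
    """Subtract the updated noise spectrum Ni(f) from the amplitude spectrum.

    amplitude : amplitude spectrum of the current frame, one value per bin
    noise     : updated noise spectrum Ni(f), same length
    """
    if len(amplitude) != len(noise):
        raise ValueError("spectra must have the same number of bins")
    # Assumption: clamp at zero, since an amplitude cannot be negative.
    return [max(a - n, 0.0) for a, n in zip(amplitude, noise)]


cleaned = subtract_noise([5.0, 1.0, 2.5], [2.0, 3.0, 0.5])
```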

With the noise update circuit 11 described above, the noise estimate can follow fluctuations in the noise even when frames containing voice components continue in succession, so the noise components can be estimated accurately. Specifically, a conventional circuit implementing the spectral subtraction method updates the noise spectrum when the frame to be processed contains only noise components, but does not update it when the frame contains voice components. Consequently, while frames containing voice components continue, the noise spectrum is not updated, the circuit cannot follow noise fluctuations, and as a result the noise components cannot be estimated accurately. In contrast, the noise update circuit 11 described above also updates the noise spectrum when the frame to be processed contains voice components, so it can follow noise fluctuations and estimate the noise components accurately even when such frames continue.
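The difference from the conventional behaviour can be illustrated with a small simulation. This is a sketch under simplifying assumptions: it uses the IIR form, hypothetical constants Kn = 4 and Ks = 64 (chosen only so that Kn < Ks), a single frequency bin, and an input level that represents only the changed noise floor, ignoring the voice energy that a real voice frame would also contain.

```python
def update_noise(y, n_prev, is_voice, kn=4.0, ks=64.0):
    """IIR update that also runs on voice frames, using ks > kn there."""
    k = ks if is_voice else kn
    return [(yi + k * ni) / (1.0 + k) for yi, ni in zip(y, n_prev)]


def conventional_update(y, n_prev, is_voice, kn=4.0):
    """Conventional behaviour: freeze the estimate on voice frames."""
    if is_voice:
        return list(n_prev)
    return [(yi + kn * ni) / (1.0 + kn) for yi, ni in zip(y, n_prev)]


# Hypothetical scenario: the estimate starts at 1.0, then the level in
# this bin is 2.0 for 200 consecutive frames flagged as voice.
proposed = [1.0]
frozen = [1.0]
for _ in range(200):
    proposed = update_noise([2.0], proposed, is_voice=True)
    frozen = conventional_update([2.0], frozen, is_voice=True)
```

After the run, `frozen` is still at its initial value, while `proposed` has moved most of the way toward the new level. The larger constant on voice frames keeps this drift slow, which limits how much voice energy leaks into the noise estimate.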

(Embodiment 2)
FIG. 3 is a functional block diagram showing a schematic configuration of the noise update circuit 11 according to Embodiment 2 of the present invention.

This embodiment differs from Embodiment 1 in that the noise update circuit 11 has additional components, but it also shares components and processing with Embodiment 1. Components and processing equivalent to those of Embodiment 1 are denoted by the same reference numerals, and their description is omitted.

The noise update circuit 11 according to this embodiment further includes an average value calculation unit 113 that calculates an average spectrum level M for the input signal spectrum Yi(f), and a third update unit 114 that determines the updated noise spectrum Ni(f) for each frequency f at which Yi(f) ≧ Te×M holds between the input signal spectrum Yi(f) and the average spectrum level M.

The noise update circuit 11 receives the signal corresponding to the amplitude spectrum of each frame, which is output from the conversion result output unit 5 and branched, and also receives the voice section detection result for each frame, which is output from the voice section detection unit 10. Using the input signal corresponding to the amplitude spectrum, it updates, for each frequency f, the noise spectrum Ni-1(f) of one frame before the update: the updated noise spectrum Ni(f) is calculated according to one of Equation 7 or Equation 8, Equation 9 or Equation 10, and Equation 12 above, depending on the content of the input voice section detection result and other conditions.

The average value calculation unit 113 calculates the average spectrum level M according to the following Equation 11, using the signal corresponding to the amplitude spectrum of each frame, output from the conversion result output unit 5 and branched, as the input signal spectrum Yi(f). The average spectrum level M is, in other words, the average value of the amplitude spectrum along the frequency axis.

In Equation 11, f represents a frequency in the amplitude spectrum, f1 is the minimum frequency in the amplitude spectrum, f2 is the maximum frequency in the amplitude spectrum, and Fn is the number of frequencies in the range from the minimum frequency f1 to the maximum frequency f2. That is, Fn is n/2+1, where n is the number of sample points contained in one frame in the time-frequency conversion performed by the time-frequency conversion unit 4.
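Equation 11 itself does not appear in this text. From the definitions above (M is the mean of the amplitude spectrum along the frequency axis, taken over the Fn frequencies from f1 to f2), it presumably has the following form; this is a reconstruction from the surrounding description, not the original typeset equation.

```latex
M = \frac{1}{F_n} \sum_{f = f_1}^{f_2} Y_i(f)
```

For example, with n = 8 sample points per frame there are Fn = 8/2 + 1 = 5 frequencies, and M is the sum of the five amplitude values divided by 5.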

The average value calculation unit 113 calculates the average spectrum level M each time it receives a signal corresponding to the amplitude spectrum of a frame.

Then, when the voice section detection result input to the noise update circuit 11 is a noise frame signal, the signal corresponding to the input amplitude spectrum is taken as the input signal spectrum Yi(f), and for each frequency f, process <A> or <B> below is performed according to the relationship between the input signal spectrum Yi(f) and the average spectrum level M.

<A> For a frequency f where Yi(f) < Te×M
The first update unit 111 calculates the updated noise spectrum Ni(f) according to Equation 7 or Equation 8 above.

In the relationship between the input signal spectrum Yi(f) and the average spectrum level M, Te is a coefficient for excluding from the noise spectrum update those frequency components whose amplitude spectrum is significantly larger than the average spectrum level M. The coefficient Te is not limited to a specific value and is set to a suitable value as appropriate, taking into account, for example, that frequency components corresponding to a tone signal should be excluded from the noise spectrum update so that the tone signal is not subtracted by the subtraction unit 6 and thus not suppressed as a noise component. Specifically, the coefficient Te may be set to any value in the range of about 1 to 100, and in particular to about 10.

<B> For a frequency f where Yi(f) ≧ Te×M
The third update unit 114 determines the updated noise spectrum Ni(f) according to the following Equation 12.

In Equation 12, Ni-1(f) represents the noise spectrum one frame before the update.

When the voice section detection result input to the noise update circuit 11 is a voice frame signal, the signal corresponding to the input amplitude spectrum is taken as the input signal spectrum Yi(f), and for each frequency f, process <C> or <D> below is performed according to the relationship between the input signal spectrum Yi(f) and the average spectrum level M.

<C> For a frequency f where Yi(f) < Te×M
The second update unit 112 calculates the updated noise spectrum Ni(f) according to Equation 9 or Equation 10 above.

<D> For a frequency f where Yi(f) ≧ Te×M
The third update unit 114 determines the updated noise spectrum Ni(f) according to Equation 12 above.

In other words, in the above processing, for frequency components where Yi(f) ≧ Te×M, that is, whose amplitude spectrum is significantly larger than the average spectrum level M, the updated noise spectrum Ni(f) is left equal to the noise spectrum Ni-1(f) of one frame before the update, so these components are excluded from the update.
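The four cases above amount to a per-frequency dispatch on two conditions: the voice section detection result and the comparison of Yi(f) with Te×M. The sketch below returns, for each bin, the reference numeral of the update unit that would handle it. The function names, the default Te = 10, and the 33-bin test spectrum (n = 64 samples, one strong tone) are illustrative assumptions.

```python
def average_spectrum_level(y):
    """Mean of the amplitude spectrum along the frequency axis (M)."""
    if not y:
        raise ValueError("spectrum must not be empty")
    return sum(y) / len(y)


def select_update_units(y, is_voice, te=10.0):
    """Return, per bin, the numeral of the unit handling that bin.

    111: first update unit  (noise frame, Yi(f) <  Te*M)
    112: second update unit (voice frame, Yi(f) <  Te*M)
    114: third update unit  (any frame,   Yi(f) >= Te*M)
    """
    m = average_spectrum_level(y)
    units = []
    for yi in y:
        if yi >= te * m:
            units.append(114)
        elif is_voice:
            units.append(112)
        else:
            units.append(111)
    return units


# Hypothetical spectrum: flat noise at 1.0 with one tone of amplitude 100.
spectrum = [1.0] * 33
spectrum[5] = 100.0                      # M = 132 / 33 = 4.0, Te*M = 40.0
noise_case = select_update_units(spectrum, is_voice=False)
voice_case = select_update_units(spectrum, is_voice=True)
```

Only the tone bin is routed to the third update unit in both cases, so its noise estimate stays at the previous value Ni-1(f).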

With the noise update circuit 11 described above, tone signals can be kept from being suppressed. Specifically, a conventional circuit implementing the spectral subtraction method judges components that are steadily present in the frequency spectrum to be noise and suppresses them. Because a tone signal is also stationary, it is judged to be noise and becomes a target of suppression, so tone signals that the user needs (for example, Morse code) are suppressed as well. In contrast, the noise update circuit 11 described above excludes frequency components whose amplitude spectrum is significantly larger than the average spectrum level from the noise spectrum update, so tone signals are not suppressed.

Note that the noise update circuit 11 according to Embodiment 2 may be configured so that the user selects, according to the operation at hand, a mode that determines whether the average value calculation unit 113 and the third update unit 114 are enabled, and the way the noise spectrum is updated, and hence the way noise components are suppressed, is adjusted to match the selected mode. For example, when mainly transmitting and receiving voice signals while suppressing noise components, the user may select the mode in which the average value calculation unit 113 and the third update unit 114 are disabled, so that tone signals are suppressed; when transmitting and receiving Morse code, the user may select the mode in which they are enabled, so that tone signals are not suppressed.
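One way to picture this mode selection is a single flag that switches the tone exclusion on or off around the ordinary update. The sketch below assumes the IIR form and hypothetical values Kn = 4, Ks = 64, and Te = 10; the name `update_frame` and the flag `tone_protect` are illustrative and do not appear in the patent.

```python
def update_frame(y, n_prev, is_voice, tone_protect, kn=4.0, ks=64.0, te=10.0):
    """Update the noise spectrum for one frame.

    tone_protect=True  : bins with Yi(f) >= Te*M keep Ni-1(f)
                         (units 113 and 114 enabled)
    tone_protect=False : every bin is averaged (units 113 and 114 disabled)
    """
    k = ks if is_voice else kn
    m = sum(y) / len(y)                      # average spectrum level M
    updated = []
    for yi, ni in zip(y, n_prev):
        if tone_protect and yi >= te * m:
            updated.append(ni)               # excluded from the update
        else:
            updated.append((yi + k * ni) / (1.0 + k))
    return updated


# Hypothetical noise frame: flat noise at 1.0 with one tone of amplitude 100.
spectrum = [1.0] * 33
spectrum[5] = 100.0
previous = [1.0] * 33
morse_mode = update_frame(spectrum, previous, is_voice=False, tone_protect=True)
voice_mode = update_frame(spectrum, previous, is_voice=False, tone_protect=False)
```

With protection enabled, the tone bin of the noise estimate stays at 1.0, so the tone survives subtraction. With protection disabled, the estimate in that bin jumps toward the tone level and the tone is progressively suppressed.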

Embodiments of the present invention have been described above, but the specific configuration is not limited to these embodiments, and design changes and the like that do not depart from the gist of the present invention are included in the present invention. For example, the above embodiments are described using the case where the noise update circuit 11 according to the present invention is applied to the noise reduction circuit 1 whose schematic configuration is shown in FIG. 1, but the configuration of a noise reduction circuit to which the present invention can be applied is not limited to the example shown in FIG. 1. Moreover, the circuits to which the present invention can be applied are not limited to noise reduction circuits. In other words, the present invention can be applied to various circuits that need to update a noise spectrum over time.

Furthermore, the processing including averaging used to calculate the updated noise spectrum Ni(f) may be any method, provided that it uses coefficients equivalent to the noise-time pre-update weighting constant Kn and the voice-time pre-update weighting constant Ks of the above embodiments. In other words, the formula for calculating the updated noise spectrum Ni(f) is not limited to Equations 7 to 10 of the above embodiments.

REFERENCE SIGNS LIST
1 Noise reduction circuit
2 Pre-emphasis circuit
3 Window processing unit
4 Time-frequency conversion unit
5 Conversion result output unit
6 Subtraction unit
7 Synthesis unit
8 Frequency-time conversion unit
9 De-emphasis circuit
10 Voice section detection unit
11 Noise update circuit
111 First update unit
112 Second update unit
113 Average value calculation unit (Embodiment 2)
114 Third update unit (Embodiment 2)

Claims

1. A noise update circuit comprising:
a first update unit that, when a frame to be processed is a frame containing only noise components, calculates an updated noise spectrum Ni(f) for each frequency f according to the following Equation 1, which is an IIR filter, or the following Equation 2, which is an FIR filter, using a signal corresponding to an amplitude spectrum of the frame to be processed as an input signal spectrum Yi(f) (where Ni-1(f) is the noise spectrum one frame before the update, Yi-j(f) is the input signal spectrum j frames before the update, Kn is a constant that determines the weighting of Ni-1(f) with respect to Yi(f) when the frame to be processed is a frame containing only noise components, i is an ordinal number representing the order in the time series, and j is an integer equal to or greater than 0 representing the distance from ordinal number i in the time series); and
a second update unit that, when the frame to be processed is a frame containing a voice component, calculates an updated noise spectrum Ni(f) for each frequency f according to the following Equation 3, which is an IIR filter, or the following Equation 4, which is an FIR filter (where Ks is a constant that determines the weighting of Ni-1(f) with respect to Yi(f) when the frame to be processed is a frame containing a voice component, and Kn < Ks),
the noise update circuit further comprising:
an average value calculation unit that calculates an average spectrum level M for the input signal spectrum Yi(f) according to the following Equation 5 (where f1 is the minimum frequency in the amplitude spectrum, f2 is the maximum frequency in the amplitude spectrum, and Fn is the number of frequencies in the range from the minimum frequency f1 to the maximum frequency f2); and
a third update unit that determines an updated noise spectrum Ni(f) according to the following Equation 6 for a frequency f where Yi(f) ≧ Te×M holds between the input signal spectrum Yi(f) and the average spectrum level M (where Te is a coefficient, and Ni-1(f) is the noise spectrum one frame before the update).

2. The noise update circuit according to claim 1, wherein the constant Ks is set to any value within the range for which the time constant of the IIR filter or the averaging interval of the FIR filter corresponds to 1 to 10 seconds.

3. The noise update circuit according to claim 1, wherein the coefficient Te is set to any value in the range of 1 to 100.