JP3586205B2

JP3586205B2 - Speech spectrum improvement method, speech spectrum improvement device, speech spectrum improvement program, and storage medium storing program

Info

Publication number: JP3586205B2
Application number: JP2001046450A
Authority: JP
Inventors: Kiyoaki Aikawa (相川清明); Kentaro Ishizuka (石塚健太郎)
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2001-02-22
Filing date: 2001-02-22
Publication date: 2004-11-10
Anticipated expiration: 2021-02-22
Also published as: JP2002244695A

Description

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a technique for obtaining a speech spectrum, for example for speech recognition. More particularly, it proposes a speech spectrum improvement method that enables noise-robust speech recognition, a speech spectrum improvement apparatus using this method, a program implementing the apparatus, and a storage medium storing the program.
[0002]
[Prior art]
Conventionally, speech recognition has used the spectral envelope to represent the characteristics of sounds [e.g., Sadaoki Furui, Digital Speech Processing, Tokai University Press, 1985]. The speech spectrum, however, has a harmonic structure: speech energy is concentrated at the frequencies where harmonics exist, so the signal-to-noise ratio (SNR) is high at those frequencies. A high recognition rate could therefore be obtained if speech recognition used only the frequency bands in which harmonics exist.
[0003]
[Problems to be solved by the invention]
However, conventional methods ignore the harmonic structure of the speech spectrum and use a smoothed spectrum. Portions with high and low signal-to-noise ratio (SNR) are therefore averaged together when the spectrum is estimated. As a result, the spectrum is easily affected by noise and the recognition rate drops.
One way to overcome this drawback is as follows. Suppose the speech energy is concentrated in harmonic components that occupy half of the entire frequency band. Using only that half would then effectively reduce the noise by half.
[0004]
However, this method requires identifying the bands of the harmonic components in which the speech is concentrated. This identification is difficult, so the method is hard to realize in practice.
Another method uses masking to hide the parts with weak speech energy [Hideyuki Matsuo, Tsuyoshi Usagawa, Masanao Ebata, "Extraction of speech parameters under noise using the masking effect," Proceedings of the Acoustical Society of Japan, Vol. I, pp. 67-68, October 1992]. However, the masking range is triangular. Frequencies where the speech level is weak are therefore broadly influenced by frequencies where it is strong, even when the two are far apart, and the features of the speech are hidden.
[0005]
If speech and noise occupy different frequency bands, the noise can be suppressed by filtering the speech waveform. When the two bands overlap, however, such a filter is ineffective.
An object of the present invention is to propose a spectrum improvement method that substantially improves the recognition rate even when the speech and noise frequency bands overlap.
[0006]
[Means for Solving the Problems]
According to claim 1 of the present invention, there is proposed a speech spectrum improvement method that executes:
a spectrum calculation process that analyzes a speech signal of fixed duration and calculates the frequency spectrum it contains;
a first nonlinear conversion process that emphasizes the spectrum calculated by the spectrum calculation process more strongly where its intensity is larger;
a smoothing first linear filtering process that applies linear smoothing filtering along the frequency axis to the spectrum that has undergone the first nonlinear conversion process;
a second nonlinear conversion process that applies a nonlinear conversion canceling the nonlinear conversion applied in the first nonlinear conversion process; and
a second linear filtering process that applies inverse filtering canceling the smoothing filter characteristic applied in the smoothing first linear filtering process.
[0007]
According to claim 2 of the present invention, in the speech spectrum improvement method of claim 1, the spectrum calculation process is performed by any one of a fast Fourier transform, linear prediction analysis, and a band-pass filter.
According to claim 3 of the present invention, in the speech spectrum improvement method of claim 1, the first nonlinear conversion process raises the spectrum to a power given by a real number of 1 or greater. The second nonlinear conversion process raises it to a power given by a real number of 1 or less.
According to claim 4 of the present invention, in the speech spectrum improvement method of claim 1, the second nonlinear conversion process is an approximate inverse of the first nonlinear conversion process.
[0008]
According to claim 5 of the present invention, in the speech spectrum improvement method of claim 1, the second linear filtering process is an approximate inverse of the smoothing first linear filtering process.
According to claim 6 of the present invention, there is proposed a speech spectrum improvement apparatus comprising:
spectrum calculation means for calculating the spectrum of a speech signal segment of fixed duration;
first nonlinear conversion processing means for applying a nonlinear conversion that emphasizes the spectrum calculated by the spectrum calculation means more strongly where its intensity is larger;
smoothing first linear filtering processing means for applying linear smoothing filtering along the frequency axis to the spectrum converted by the first nonlinear conversion processing means;
second nonlinear conversion processing means for applying a nonlinear conversion that cancels the nonlinear conversion characteristic applied by the first nonlinear conversion processing means; and
second linear filtering processing means for applying inverse filtering that cancels the smoothing filter characteristic applied by the smoothing first linear filtering processing means.
[0009]
According to claim 7 of the present invention, there is proposed a speech spectrum improvement program comprising:
a spectrum calculation program that calculates the spectrum of a speech signal segment of fixed duration;
a first nonlinear conversion program that applies a nonlinear conversion emphasizing the spectrum calculated by the spectrum calculation program more strongly where its intensity is larger;
a smoothing first linear filtering program that applies linear smoothing filtering along the frequency axis to the spectrum converted by the first nonlinear conversion program;
a second nonlinear conversion program that applies a nonlinear conversion canceling the nonlinear conversion applied by the first nonlinear conversion program; and
a second linear filtering program that applies inverse filtering canceling the smoothing filter characteristic applied by the smoothing first linear filtering program.
[0010]
According to claim 8 of the present invention, there is proposed a storage medium storing:
a spectrum calculation program that calculates the spectrum of a speech signal segment of fixed duration;
a first nonlinear conversion program that applies a nonlinear conversion emphasizing the spectrum calculated by the spectrum calculation program more strongly where its intensity is larger;
a smoothing first linear filtering program that applies linear smoothing filtering along the frequency axis to the spectrum converted by the first nonlinear conversion program;
a second nonlinear conversion program that applies a nonlinear conversion canceling the nonlinear conversion characteristic applied by the first nonlinear conversion program; and
a second linear filtering program that applies inverse filtering canceling the smoothing filter characteristic applied by the smoothing first linear filtering program.
[Operation]
In voiced speech, the spectral energy is concentrated at integer multiples of the fundamental frequency (the harmonic frequencies). These are referred to here as harmonic components. The signal-to-noise ratio (S/N ratio) of the harmonic components is expected to remain high even when noise is mixed with the speech. If these components are used to suppress the low-S/N components lying between them, a noise-robust spectrum estimate should be obtainable.
[0011]
The present invention assumes that the spectrum calculation process yields the spectrum of the noisy speech at a frequency resolution fine enough to resolve its harmonic structure. The first nonlinear conversion raises this spectrum to a real power greater than 1. This emphasizes the high-energy portions, that is, the harmonic components. The smoothing first linear filtering is then applied. The impulse responses of the high-energy portions are superimposed, with emphasis, onto the low-energy portions lying outside the harmonic components of the speech, that is, onto the valleys. As a result, the proportion of noise in the valleys becomes very small.
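The following sketch makes this mechanism concrete. It builds a hypothetical toy spectrum: Gaussian-shaped harmonics every 20 bins on a flat noise floor. All shapes and values are illustrative assumptions, not data from this patent. The sketch compares the fraction of a valley bin attributable to noise before and after raising to η = 5 and smoothing. As an assumption of this sketch, the "noise share" after smoothing is measured by passing the noise floor alone through the same two steps. The function names are hypothetical.

```python
import numpy as np

def toy_harmonic_spectrum(n_bins=512, spacing=20, width=1.5, noise_floor=0.05):
    """Hypothetical model: unit-height Gaussian harmonics every `spacing` bins plus flat noise."""
    k = np.arange(n_bins)
    d = np.abs(((k + spacing / 2) % spacing) - spacing / 2)  # distance to the nearest harmonic
    speech = np.exp(-0.5 * (d / width) ** 2)
    noise = np.full(n_bins, noise_floor)
    return speech, noise

def valley_noise_share(eta=5.0, valley=250, kernel_len=25):
    """Noise share at a valley bin before and after 'raise to eta, then smooth'."""
    speech, noise = toy_harmonic_spectrum()
    S = speech + noise
    before = noise[valley] / S[valley]
    h = np.hanning(kernel_len)                           # odd length keeps 'same' output centered
    P = np.convolve(S ** eta, h, mode="same")            # emphasized, then smoothed noisy spectrum
    P_noise = np.convolve(noise ** eta, h, mode="same")  # contribution of the noise floor alone
    after = P_noise[valley] / P[valley]
    return before, after

before, after = valley_noise_share()
print(f"noise share in valley: before {before:.3f}, after {after:.2e}")
```

In this toy case, bin 250 lies midway between harmonics at bins 240 and 260. Their emphasized peaks leak into bin 250 through the smoothing kernel, and that leakage dominates the fifth power of the noise floor.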
[0012]
Next, the second nonlinear conversion inverts the power operation, and the proportion of noise in the spectral valleys remains very small. The inverse-smoothing second linear filtering process then applies the inverse of the smoothing first linear filter. This removes the influence of the harmonic-component impulse responses superimposed by the smoothing filter, yielding a spectrum whose valleys are free of most of the noise.
Let S(ω) be the spectrum as a function of frequency ω and η (η > 1) the exponent used in the first nonlinear conversion process. Let h(λ) be the impulse response of the spectral smoothing filter, expressed as a function of frequency. The spectrum P(ω) obtained by the smoothing first linear filtering process is then given by
[0013]
(Equation 1)
$$P(\omega) = \int h(\lambda) \left[ S(\omega - \lambda) \right]^{\eta} d\lambda \qquad (1)$$
[0014]
When the exponent η of the first nonlinear conversion process is 1, Equation (1) is an ordinary linear filter applied to the spectrum. As η increases, the smoothing first linear filtering is applied to a spectrum in which the high-energy portions are increasingly emphasized.
In Equation (1), the spectrum S(ω) must be positive. The spectrum can be obtained by various methods, such as the fast Fourier transform, linear prediction analysis, or band-pass filters. The gain of the power spectrum obtained by the fast Fourier transform or linear prediction analysis is positive, so it can be used as is. A logarithmic spectrum can also be used: log(S(ω) + 1) gives a positive approximate log spectrum.
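A short sketch shows why positivity matters. A plain logarithm is negative wherever S(ω) < 1. A negative value raised to a fractional power such as 1/5 is undefined in real arithmetic, and NumPy returns nan for it. The log(S(ω) + 1) form from the text avoids this for any non-negative power spectrum. `numpy.log1p` computes log(1 + x); the helper name is hypothetical.

```python
import numpy as np

def approx_log_spectrum(power):
    """Approximate log spectrum log(S + 1): zero at S = 0 and positive for S > 0."""
    power = np.asarray(power, dtype=float)
    if np.any(power < 0):
        raise ValueError("power spectrum must be non-negative")
    return np.log1p(power)          # log(1 + S), accurate even when S is tiny

S = np.array([0.5, 1.0, 10.0])
print(np.log(S))                    # the first entry is negative
print(approx_log_spectrum(S))       # all entries are positive
```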
[0015]
For example, the following exponent η for the first nonlinear conversion and impulse response h(λ) for the smoothing first linear filtering can be used:
$$\eta = 5 \qquad (2)$$
$$h(\lambda) = 0.5 + 0.5\cos(40\lambda), \quad -\pi/40 < \lambda < \pi/40 \qquad (3)$$
This smoothing filter is the so-called Hanning window. The spectral band is normalized to the range 0 to π, so this Hanning window spans 1/20 of the entire frequency band.
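The sketch below is a rough discretization, an assumption of this sketch rather than the patent's stated procedure. Eq. (3) is sampled on a 512-bin grid covering 0 to π, and Eq. (1) becomes a discrete convolution of S^η with the sampled kernel. The condition |λ| < π/40 corresponds to |k| < 512/40 = 12.8 bins, so the kernel has 25 taps. This is close to the 26-point window used in the embodiment. The function names are hypothetical.

```python
import numpy as np

def hann_kernel_eq3(n_bins=512, scale=40.0):
    """Sample h(λ) = 0.5 + 0.5 cos(scale·λ) for |λ| < π/scale, with 0..π spread over n_bins bins."""
    dlam = np.pi / n_bins                      # frequency step per bin
    half = int(np.ceil(n_bins / scale)) - 1    # largest k with k·dlam < π/scale
    lam = np.arange(-half, half + 1) * dlam
    return 0.5 + 0.5 * np.cos(scale * lam)

def first_stage(S, eta=5.0, h=None):
    """Discrete form of Eq. (1): convolve S**eta with h along the frequency axis."""
    S = np.asarray(S, dtype=float)
    if np.any(S <= 0):
        raise ValueError("Eq. (1) requires a positive spectrum")
    if h is None:
        h = hann_kernel_eq3(S.size)
    return np.convolve(S ** eta, h, mode="same")  # odd-length kernel keeps the output centered
```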
[0016]
The above operations change both the intensity scale and the frequency resolution of the spectrum. A second stage therefore applies the inverse of the first nonlinear conversion and the inverse of the smoothing first linear filter. This returns the spectrum to an intensity scaling and frequency resolution close to those of the original. The second stage need not be the exact inverse of the first; an approximate inverse suffices. Let ζ be the exponent used in the second nonlinear conversion process and g(λ) the impulse response of the second linear filtering process, which acts as a high-emphasis filter. The filtered spectrum Q(ω) is then given by
(Equation 2)
$$Q(\omega) = \int g(\lambda) \left[ P(\omega - \lambda) \right]^{\zeta} d\lambda$$
[0018]
Q(ω) = S(ω) in either of two cases:
- ζ = 1/η and both h(λ) and g(λ) are delta functions; or
- ζ = η = 1 and g(λ) is the inverse filter of h(λ).

The algorithm is summarized in FIG. 1:
- Step 11 calculates the speech spectrum.
- Step 12 performs the first nonlinear conversion by exponentiation.
- Step 13 performs the smoothing first linear filtering.
- Step 14 performs the second nonlinear conversion by exponentiation.
- Step 15 performs the inverse-smoothing second linear filtering.
- Step 16 outputs the result.
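Steps 12 to 16 can be sketched generically as follows. This minimal discrete sketch assumes that the spectrum and kernels are sampled on the same frequency grid. It replaces the integrals with `numpy.convolve`; the function name and signature are hypothetical. The usage lines check the first identity case stated above (ζ = 1/η with delta-function kernels). The second case, an exact inverse of a finite smoothing kernel, would need a deconvolution step and is not shown.

```python
import numpy as np

def improve_spectrum(S, eta, h, zeta, g):
    """Steps 12-16 of FIG. 1 on a sampled spectrum S (a discrete sketch).

    P = h * S**eta   (steps 12-13, Eq. (1))
    Q = g * P**zeta  (steps 14-15)
    """
    S = np.asarray(S, dtype=float)
    if np.any(S <= 0):
        raise ValueError("S must be positive")
    P = np.convolve(S ** eta, h, mode="same")    # first nonlinear conversion + smoothing
    R = P ** zeta                                # second nonlinear conversion (needs P >= 0)
    return np.convolve(R, g, mode="same")        # second (inverse-smoothing) filtering; step 16 returns it

delta = np.array([1.0])                          # discrete delta function
S = np.linspace(0.1, 2.0, 64)
Q = improve_spectrum(S, 5.0, delta, 1 / 5.0, delta)
print(np.allclose(Q, S))                         # identity case: zeta = 1/eta, h = g = delta
```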
[0019]
For example, the following can be used as the exponent ζ of the second nonlinear conversion process and the inverse filter characteristic g(λ) of the second linear filtering process:

Here, α is the product of two factors: the ratio at which the inverse filter retains the original component, and the coefficient that normalizes the integral of the Hanning window function to 1.
[0020]
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 2 shows an embodiment of the spectrum improvement apparatus according to the present invention. In the figure, reference numeral 20 denotes the spectrum improvement apparatus as a whole. The apparatus 20 comprises:
- spectrum calculation means 21;
- first nonlinear conversion processing means 22;
- smoothing first linear filtering processing means 23;
- second nonlinear conversion processing means 24; and
- inverse-smoothing second linear filtering processing means 25.

The spectrum calculation means 21 comprises:
- speech waveform extraction means 21A, which cuts out a fixed-length segment from a continuous speech signal;
- fast Fourier transform means 21B, which applies the fast Fourier transform to the waveform extracted by 21A; and
- arithmetic means 21C, which calculates the spectrum from the Fourier transform result of 21B.
[0021]
The speech waveform extraction means 21A cuts out a segment of, for example, about 30 ms from the continuous speech signal. The extracted speech is sampled at, for example, 11025 Hz and converted into a digital signal.
The fast Fourier transform means 21B frequency-analyzes the A/D-converted speech signal using, for example, a 1024-point fast Fourier transform.
The arithmetic means 21C processes the FFT output of the fast Fourier transform means 21B and calculates the spectrum of the frequency components contained in the input signal. It outputs that spectrum as the result of the spectrum calculation means 21.
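The spectrum calculation means 21 (units 21A to 21C) can be sketched in NumPy as below. It uses the embodiment's example values: 30 ms frames, 11025 Hz sampling, a 1024-point FFT and 512 output points. Two choices are assumptions, since the source specifies neither: the analysis window (Hamming here) and the use of the squared FFT magnitude as the spectrum. The constant and function names are hypothetical.

```python
import numpy as np

FS = 11025       # example sampling rate from the embodiment (Hz)
FRAME_MS = 30    # example frame length from the embodiment (ms)
NFFT = 1024      # example FFT size from the embodiment

def frame_power_spectrum(x, start=0, fs=FS, frame_ms=FRAME_MS, nfft=NFFT):
    """Cut one frame (21A), FFT it (21B), and return a 512-point power spectrum (21C)."""
    n = int(round(fs * frame_ms / 1000))       # 331 samples for 30 ms at 11025 Hz
    frame = np.asarray(x[start:start + n], dtype=float)
    if frame.size < n:
        raise ValueError("signal is too short for one frame")
    frame = frame * np.hamming(n)              # assumption: Hamming analysis window
    spec = np.fft.rfft(frame, n=nfft)          # zero-padded 1024-point FFT (513 bins)
    return np.abs(spec[: nfft // 2]) ** 2      # keep bins 0..511 as the 512-point spectrum
```

The FFT bins are about 10.8 Hz apart (11025/1024). Whether individual harmonics are actually resolved also depends on the 30 ms window length and on the speaker's pitch.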
[0022]
The spectrum is represented by, for example, 512 data points.
The first nonlinear conversion processing unit 22 performs a first nonlinear conversion process on the spectrum output from the spectrum calculation unit 21. Here, for example, the spectrum is raised to the fifth power.
The spectrum raised to the power by the first nonlinear conversion processing means 22 is subjected to smoothing filtering processing by the smoothing first linear filtering processing means 23. The smoothing first linear filtering means 23 includes a convolution unit 23A and a smoothing filtering characteristic generating means (Hanning window generating means) 23B.
[0023]
The convolution unit 23A convolves the exponentiated input spectrum with the filter characteristic generated by the smoothing filter characteristic generation means 23B.
The smoothing filter characteristic generation means 23B generates, for example, a Hanning-window-shaped smoothing characteristic 26 points long (512/20 ≈ 26).
The second nonlinear conversion processing means 24 takes the spectrum convolved by the smoothing first linear filtering processing means 23. It performs the second nonlinear conversion by raising that spectrum to, for example, the 1/5 power.
[0024]
Like the smoothing first linear filtering processing means 23, the inverse-smoothing second linear filtering processing means 25 comprises a convolution unit 25A and an inverse-smoothing filter characteristic generation means 25B. The filter characteristic generated by 25B is approximately the inverse of the characteristic generated by the smoothing filter characteristic generation means 23B in unit 23. One example is built as follows:
- The surrounding taps form a Hanning window 103 points wide (≈512/5) with a negative sign.
- These taps are normalized so that their sum is, for example, −0.3.
- 1 is added at the center tap only.
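The example inverse-smoothing characteristic of 25B can be built as follows. This sketch assumes `numpy.hanning` for the 103-point Hanning shape, and the function name is hypothetical. The taps sum to 1 − 0.3 = 0.7. The negative lobe subtracts a local average from the center tap, so the filter sharpens the spectrum. It therefore only approximates an exact inverse, consistent with the high-emphasis filter described in [0016].

```python
import numpy as np

def inverse_smoothing_kernel(width=103, total=-0.3):
    """Negated Hanning window whose taps sum to `total`, plus 1 at the center tap."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the kernel has a center tap")
    w = np.hanning(width)
    g = w * (total / w.sum())        # negative Hanning lobe with sum(g) == total
    g[width // 2] += 1.0             # keep the original component at the center
    return g

g = inverse_smoothing_kernel()
print(g.size, round(g.sum(), 6))
```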
[0025]
The convolution unit 25A convolves this inverse filter characteristic with the spectrum supplied by the second nonlinear conversion processing means 24. It sends the result out as the improved spectrum output of the spectrum improvement apparatus 20.
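Units 22 to 25 can be combined using the embodiment's example values:
- fifth power;
- 26-point Hanning smoothing;
- 1/5 power; and
- a 103-point inverse filter with sum −0.3 plus a unit center tap.

The sketch below assumes the input is a non-negative 512-point spectrum, such as the output of unit 21. The source does not say the kernels are normalized, so they are left unnormalized here. As a result, a flat input spectrum comes out scaled by 0.7 times the 1/5 power of the smoothing kernel's sum. A 26-point kernel has no center tap, so the smoothed spectrum is offset by half a bin. The function name is hypothetical.

```python
import numpy as np

def improve_frame(power, eta=5.0, smooth_len=26, inv_width=103, inv_sum=-0.3):
    """Units 22-25 of the embodiment applied to one non-negative spectrum (a sketch)."""
    S = np.asarray(power, dtype=float)
    if np.any(S < 0):
        raise ValueError("spectrum must be non-negative")
    h = np.hanning(smooth_len)                     # 23B: Hanning-shaped smoothing characteristic
    P = np.convolve(S ** eta, h, mode="same")      # 22 + 23A: fifth power, then smoothing
    R = P ** (1.0 / eta)                           # 24: 1/5 power (P >= 0, so no NaNs)
    w = np.hanning(inv_width)                      # inv_width is assumed odd (center tap exists)
    g = w * (inv_sum / w.sum())                    # 25B: negated Hanning lobe summing to inv_sum
    g[inv_width // 2] += 1.0                       #      plus 1 at the center tap
    return np.convolve(R, g, mode="same")          # 25A: approximate inverse filtering
```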
[0026]
[Effect of the Invention]
As described above, the present invention suppresses the noise contained in a spectrum by combining four processes: the first nonlinear conversion, the smoothing first linear filtering, the second nonlinear conversion, and the inverse-smoothing second linear filtering. In particular, noise can be suppressed uniformly over the entire frequency band.
Furthermore, the spectrum improvement method and apparatus of the present invention output the input speech spectrum with the shape of its envelope preserved.
[0027]
FIG. 3 shows a model of an input speech spectrum, and FIG. 4 shows the corresponding improved spectrum. In FIGS. 3 and 4, the horizontal axis is frequency normalized to the range 0 to 1, and the vertical axis is the strength of each frequency component. In the input spectrum of FIG. 3, harmonic noise proportional to each harmonic has been added in the valleys of the speech spectrum. In the improved spectrum of FIG. 4, the harmonic components of the target signal remain at nearly their original level, while the noise between them is suppressed. Moreover, the envelope of the improved spectrum matches that of the input spectrum.
[0028]
As described above, the harmonic components of the spectrum have a high signal-to-noise ratio and are little affected by noise. They are used to suppress the noise components lying between them, so the resulting speech spectrum contains less noise on average. Using such a speech spectrum can improve the speech recognition rate.
Each unit of the spectrum improvement apparatus 20 shown in FIG. 2 can be implemented as a program running on a computer. Such programs can be distributed on storage media such as magnetic disks or semiconductor memory, or distributed and sold over a communication network.
[Brief description of the drawings]
FIG. 1 is a flowchart for explaining a spectrum improving method according to the present invention.
FIG. 2 is a block diagram for explaining an embodiment of a spectrum improving apparatus according to the present invention.
FIG. 3 is a graph showing an example of a voice spectrum input to the spectrum improving device according to the present invention.
FIG. 4 is a graph showing an example of a speech spectrum improved by the spectrum improving device according to the present invention.
[Explanation of symbols]
11 Spectrum calculation processing step
12 First nonlinear conversion processing step
13 Smoothing first linear filtering processing step
14 Second nonlinear conversion processing step
15 Inverse-smoothing second linear filtering processing step
16 Output step
20 Spectrum improvement apparatus
21 Spectrum calculation means
22 First nonlinear conversion processing means
23 Smoothing first linear filtering processing means
24 Second nonlinear conversion processing means
25 Inverse-smoothing second linear filtering processing means

Claims

1. A speech spectrum improvement method that executes:
A. a spectrum calculation process that analyzes a speech signal of fixed duration and calculates the frequency spectrum it contains;
B. a first nonlinear conversion process that emphasizes the spectrum calculated by the spectrum calculation process more strongly where its intensity is larger;
C. a smoothing first linear filtering process that applies linear smoothing filtering along the frequency axis to the spectrum that has undergone the first nonlinear conversion process;
D. a second nonlinear conversion process that applies a nonlinear conversion canceling the nonlinear conversion applied in the first nonlinear conversion process; and
E. a second linear filtering process that applies inverse filtering canceling the smoothing filter characteristic applied in the smoothing first linear filtering process.

2. The speech spectrum improvement method according to claim 1, wherein the spectrum calculation process is performed by any one of a fast Fourier transform, linear prediction analysis, and a band-pass filter.

3. The speech spectrum improvement method according to claim 1, wherein the first nonlinear conversion process raises the spectrum to a power given by a real number of 1 or greater, and the second nonlinear conversion process raises it to a power given by a real number of 1 or less.

4. The speech spectrum improvement method according to claim 1, wherein the second nonlinear conversion process is an approximate inverse of the first nonlinear conversion process.

5. The speech spectrum improvement method according to claim 1, wherein the second linear filtering process is an approximate inverse of the smoothing first linear filtering process.

6. A speech spectrum improvement apparatus comprising:
A. spectrum calculation means for calculating the spectrum of a speech signal segment of fixed duration;
B. first nonlinear conversion processing means for applying a nonlinear conversion that emphasizes the spectrum calculated by the spectrum calculation means more strongly where its intensity is larger;
C. smoothing first linear filtering processing means for applying linear smoothing filtering along the frequency axis to the spectrum converted by the first nonlinear conversion processing means;
D. second nonlinear conversion processing means for applying a nonlinear conversion that cancels the nonlinear conversion characteristic applied by the first nonlinear conversion processing means; and
E. second linear filtering processing means for applying inverse filtering that cancels the smoothing filter characteristic applied by the smoothing first linear filtering processing means.

7. A speech spectrum improvement program comprising:
A. a spectrum calculation program that calculates the spectrum of a speech signal segment of fixed duration;
B. a first nonlinear conversion program that applies a nonlinear conversion emphasizing the spectrum calculated by the spectrum calculation program more strongly where its intensity is larger;
C. a smoothing first linear filtering program that applies linear smoothing filtering along the frequency axis to the spectrum converted by the first nonlinear conversion program;
D. a second nonlinear conversion program that applies a nonlinear conversion canceling the nonlinear conversion applied by the first nonlinear conversion program; and
E. a second linear filtering program that applies inverse filtering canceling the smoothing filter characteristic applied by the smoothing first linear filtering program.

8. A storage medium storing:
A. a spectrum calculation program that calculates the spectrum of a speech signal segment of fixed duration;
B. a first nonlinear conversion program that applies a nonlinear conversion emphasizing the spectrum calculated by the spectrum calculation program more strongly where its intensity is larger;
C. a smoothing first linear filtering program that applies linear smoothing filtering along the frequency axis to the spectrum converted by the first nonlinear conversion program;
D. a second nonlinear conversion program that applies a nonlinear conversion canceling the nonlinear conversion characteristic applied by the first nonlinear conversion program; and
E. a second linear filtering program that applies inverse filtering canceling the smoothing filter characteristic applied by the smoothing first linear filtering program.