JP3430133B2

JP3430133B2 - Speech recognition device and program recording medium

Info

Publication number: JP3430133B2
Application number: JP2000257623A
Authority: JP
Inventors: 彰鶴田; 俊夫赤羽
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2000-08-28
Filing date: 2000-08-28
Publication date: 2003-07-28
Anticipated expiration: 2020-08-28
Also published as: JP2002073073A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition device that recognizes a captured speech signal, and to a program recording medium on which a speech data input conversion processing program is recorded.

[0002]

[Prior Art] At present, a keyboard, a mouse, or the like is the mainstream means of instructing a personal computer or similar device to perform a given operation. On such a personal computer, however, displaying a frequently viewed web page, for example, requires either starting a web browser and selecting the page registered under "Favorites" or the like, or entering a complicated URL (Uniform Resource Locator); either way is troublesome. Speech recognition, in which commands are given to a personal computer or the like directly by voice to make it perform a given operation, has therefore begun to attract attention.

[0003] A conventional speech recognition device of this type is outlined in FIG. 4. In this device, an analog speech signal picked up by a microphone (not shown) is digitized by an A/D conversion circuit (not shown), and the speech power calculated from the resulting digital speech signal is supplied to a speech section cut-out unit 1. The speech section cut out on the basis of this speech power is sent to an acoustic analysis unit 2, which extracts the feature vectors of the speech section. The feature vectors are sent to a matching unit 3, which matches them against standard patterns registered in advance in a dictionary unit 4 and outputs the label or the like of the most similar standard pattern as the recognition result.
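The power-based cut-out of the speech section described above can be illustrated roughly as follows. This is only a sketch under assumptions: the text does not say how the speech power is computed or thresholded, so the non-overlapping frames, the fixed threshold, and the names `frame_power` and `cut_speech_section` are all hypothetical.

```python
def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed definition of power)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def cut_speech_section(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices spanning the first through the last
    frame whose power exceeds the threshold, or None if no frame does.
    frame_len and threshold are illustrative values, not taken from the source."""
    powers = frame_power(samples, frame_len)
    active = [i for i, p in enumerate(powers) if p > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Hypothetical test signal: silence, a burst of signal, then silence again.
signal = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] * 2 + [0.0] * 8
section = cut_speech_section(signal)
```

A practical endpoint detector would typically add overlap, hangover frames, and an adaptive threshold; the sketch only shows where the speech power enters the decision.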

[0004] In this case, the A/D conversion circuit that converts the analog speech signal into a digital speech signal is configured as shown in FIG. 5. The low-pass filter 5 is an anti-aliasing filter, and its cutoff frequency may be variable. The amplifier 6 amplifies a weak analog signal to a level that the A/D converter 8 can handle easily, and in some cases its gain can be set by supplying a digital signal of several bits. The sample-and-hold 7 holds the analog signal constant while the A/D converter 8 is converting it, and must be controlled in close coordination with the control of the A/D converter 8.

[0005] In the conventional speech recognition device, the A/D conversion circuit is controlled so that sampling is performed at a sampling frequency (12 kHz) that is twice the maximum frequency of the frequency band (for example, up to 6 kHz) processed by the speech recognition unit made up of the acoustic analysis unit 2 and the matching unit 3, thereby converting the analog speech signal into a digital speech signal.

[0006]

[Problems to Be Solved by the Invention] However, the conventional speech recognition device has the following problem. Because sampling is performed at a sampling frequency of 12 kHz, twice the maximum frequency of the frequency band (0 Hz to 6 kHz) processed by the speech recognition unit, a speech signal in the frequency band 0 Hz to 6 kHz is obtained, as shown in FIG. 6. In that case, as FIG. 6 also shows, if the cutoff characteristic of the anti-aliasing filter (low-pass filter 5) of the A/D conversion circuit in the speech recognition device differs from the cutoff characteristic of the anti-aliasing filter that was used when extracting the training data for learning the standard patterns or acoustic models, the frequency characteristics in the transition band differ, and the recognition rate drops.

[0007] If the cutoff characteristic of the anti-aliasing filter of the A/D conversion circuit in the speech recognition device is known in advance, it is possible to train the standard patterns or acoustic models with that cutoff characteristic taken into account. However, where the cutoff characteristic of the anti-aliasing filter of the A/D conversion circuit mounted on the sound card may differ from model to model, as with personal computers, it is difficult to take the cutoff characteristic into account in advance when training the standard patterns or acoustic models.

[0008] An object of the present invention is therefore to provide a speech recognition device that can prevent, by simple processing, the drop in recognition rate caused by the cutoff characteristic of the anti-aliasing filter, and a program recording medium on which a speech recognition processing program is recorded.

[0009]

[Means for Solving the Problems] To achieve the above object, a first invention is a speech recognition device that recognizes input speech with speech recognition means, the device comprising: speech input means that has an anti-aliasing filter whose passband is wider than the frequency band processed by the speech recognition means, and that samples the input speech that has passed through the anti-aliasing filter to obtain speech data; input control means that sets the sampling frequency of the speech input means to an oversampling frequency at which the frequency band processed by the speech recognition means is amply contained in the passband of the anti-aliasing filter; and data conversion means that filters the speech data sampled by the speech input means with a filter having the same or a similar characteristic to the cutoff characteristic of the anti-aliasing filter used when extracting the training data for learning standard patterns or acoustic models, downsamples the filtered speech data to convert it into speech data having a predetermined frequency band, and sends the result to the speech recognition means.

[0010] With this configuration, the speech input means samples the input speech at an oversampling frequency at which the frequency band processed by the speech recognition means is amply contained in the passband of the anti-aliasing filter. The data conversion means then filters the speech data with a filter having the same or a similar characteristic to the cutoff characteristic of the anti-aliasing filter used when extracting the training data. The frequency characteristics of the input speech are thereby matched to those of the training data used to learn the standard patterns or acoustic models, and the drop in recognition rate caused by the cutoff characteristic of the anti-aliasing filter of the speech input means is prevented.

[0011] In the speech recognition device of the first invention, the input control means preferably sets the sampling frequency to an oversampling frequency that yields speech data having a frequency band n times (n: a positive integer of 2 or more) the frequency band processed by the speech recognition means.

[0012] With this configuration, the 1/n downsampling by the data conversion means is easy to perform.

[0013] In the speech recognition device of the first invention, the data conversion means preferably has a low-cut filter for removing the DC component and low-frequency components unnecessary for speech recognition from the converted speech signal.

[0014] With this configuration, the DC component and low-frequency components unnecessary for speech recognition are removed from the speech signal passed to the speech recognition means, and the recognition rate is raised further.

[0015] A program recording medium of a second invention has recorded on it a speech data input conversion processing program that causes a computer to function as the speech input means, the input control means, and the data conversion means of the first invention.

[0016] With this configuration, as in the first invention, the frequency characteristics of the input speech are matched to those of the training data, and the drop in recognition rate caused by the cutoff characteristic of the anti-aliasing filter of the speech input means is prevented.

[0017]

[Embodiments of the Invention] The present invention is described in detail below with reference to the illustrated embodiment. FIG. 1 is a block diagram of the speech recognition device of this embodiment. The device broadly consists of a speech input unit 11, an input control unit 12, a data conversion unit 13, a speech recognition unit 14, and a display unit 15.

[0018] The speech input unit 11 digitizes the analog speech signal picked up by the microphone 11a with the A/D converter 11b. The digitized speech signal is input to the data conversion unit 13 via the bus 16. The input control unit 12 controls the speech input unit 11 and thereby controls the conditions for capturing the speech signal, such as the sampling frequency setting. The data conversion unit 13 filters the digital speech signal captured by the speech input unit 11 with a filter having the same or a very similar characteristic to the cutoff characteristic of the anti-aliasing filter used when extracting the training data for learning the acoustic models, and downsamples the filtered data to convert it into a speech signal having a predetermined frequency band.

[0019] The speech recognition unit 14 uses a speech recognition technique based on, for example, hidden Markov models (hereinafter, HMMs): using the speech signal converted by the data conversion unit 13, it computes the occurrence probability for every HMM corresponding to a word to be recognized and takes the word corresponding to the HMM with the highest occurrence probability as the recognition result. An HMM models the statistical features of speech obtained from a large amount of speech data; details of HMM-based speech recognition are given in, for example, Seiichi Nakagawa, 「確率モデルによる音声認識」 ("Speech Recognition by Probabilistic Models"). The display unit 15 displays the recognition result in addition to icons and the like.
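The rule "compute the occurrence probability for every word HMM and pick the highest" can be illustrated with a toy example. The sketch below assumes discrete observation symbols and tiny hand-made two-state models; a real recognizer would work on acoustic feature vectors with trained models, and the names `forward_probability`, `recognize`, and `word_models` are hypothetical.

```python
def forward_probability(obs, init, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    init[s]     : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][k]  : probability that state s emits symbol k
    """
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, word_models):
    """Return the word whose HMM gives the observation the highest probability."""
    return max(word_models, key=lambda w: forward_probability(obs, *word_models[w]))


# Toy left-to-right models (made-up numbers, for illustration only).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
word_models = {
    "yes": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),  # symbol 0 then 1
    "no": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),   # symbol 1 then 0
}
result = recognize([0, 0, 1, 1], word_models)
```

For long observation sequences the probabilities underflow, so practical implementations usually work with log probabilities or scaled forward variables.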

[0020] FIG. 2 is a flowchart of the speech data input conversion processing performed by the speech input unit 11, the input control unit 12, and the data conversion unit 13. This processing is described concretely below with reference to FIG. 2. The case described here is one in which the speech recognition processing uses the frequency band 0 Hz to 6 kHz, so that the target sampling frequency is 12 kHz, twice the maximum frequency of that band.

[0021] In step S1, before speech is captured, the input control unit 12 sets the sampling frequency of the speech input unit 11 to 24 kHz, twice the target sampling frequency of 12 kHz. In step S2, the speech input unit 11 captures the speech signal at the 24 kHz sampling frequency. The result is a speech signal that has passed through an anti-aliasing filter with the cutoff characteristic shown by the broken line in FIG. 3.

[0022] In step S3, the data conversion unit 13 applies to the speech signal captured by the speech input unit 11 a low-pass filter, shown by the solid line in FIG. 3, that has the same or a very similar characteristic to the cutoff characteristic of the anti-aliasing filter used when extracting the training data for learning the acoustic models (HMMs). The characteristic of the transition band at recognition time is thereby made substantially the same as that of the transition band at the time the acoustic models (HMMs) were trained.

[0023] In step S4, the data conversion unit 13 further downsamples the signal by 1/2, that is, to a speech signal with the target sampling frequency of 12 kHz used in the speech recognition processing. In step S5, the data conversion unit 13 further applies a low-cut filter to remove the DC component and low-frequency components that are of little use for speech recognition (for example, components at 60 Hz and below). The speech signal is then passed to the speech recognition unit 14, and the speech data input conversion processing ends.
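Steps S3 to S5 can be sketched in a few lines, taking audio already captured at 24 kHz (steps S1 and S2) as input. This is an illustration under assumptions, not the filter of the embodiment: the text does not specify the training-time anti-aliasing characteristic, so a second-order Butterworth low-pass with a 5.5 kHz cutoff stands in for it, and the low-cut filter is a second-order Butterworth high-pass at 60 Hz. The coefficient formulas are the widely used bilinear-transform biquad forms; all function names are hypothetical.

```python
import math


def biquad_coeffs(kind, cutoff_hz, fs_hz, q=1 / math.sqrt(2)):
    """Second-order IIR coefficients (bilinear-transform design; Butterworth
    response at the default q). kind is "lowpass" or "highpass"."""
    w0 = 2 * math.pi * cutoff_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    a0, a1, a2 = 1 + alpha, -2 * cos_w0, 1 - alpha
    return [c / a0 for c in b], [a1 / a0, a2 / a0]


def iir_filter(b, a, samples):
    """Direct-form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def convert_input(samples_24k, match_cutoff_hz=5500.0, low_cut_hz=60.0):
    """Steps S3 to S5. match_cutoff_hz is an assumed stand-in for the
    training-time anti-aliasing characteristic, which the text does not give."""
    fs_in, factor = 24000, 2
    lp_b, lp_a = biquad_coeffs("lowpass", match_cutoff_hz, fs_in)
    filtered = iir_filter(lp_b, lp_a, samples_24k)   # S3: matching low-pass
    decimated = filtered[::factor]                   # S4: 1/2 downsampling to 12 kHz
    hp_b, hp_a = biquad_coeffs("highpass", low_cut_hz, fs_in // factor)
    return iir_filter(hp_b, hp_a, decimated)         # S5: low-cut filter


def tone(freq_hz, fs_hz=24000, seconds=0.5):
    """Unit-amplitude sine wave, used here only as a test signal."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(int(fs_hz * seconds))]


out_1k = convert_input(tone(1000))   # inside the 0 Hz to 6 kHz band: passes
out_9k = convert_input(tone(9000))   # above the band: attenuated before decimation
```

A second-order filter has a gentle roll-off, so the 9 kHz tone is only reduced, not eliminated; matching a real training-time filter would mean choosing the order and cutoff to follow its measured transition band.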

[0024] The low-pass filter used in step S3 and the low-cut filter used in step S5 of the flowchart can each be implemented easily with, for example, an IIR filter.

[0025] As described above, in this embodiment the following speech data input conversion processing is added ahead of the speech recognition processing by the speech recognition unit 14. The input control unit 12 sets the sampling frequency of the speech input unit 11 to 24 kHz, twice the 12 kHz sampling frequency used in the speech recognition processing by the speech recognition unit 14. The data conversion unit 13 then applies to the speech signal oversampled by the speech input unit 11 a low-pass filter having the same or a very similar characteristic to the cutoff characteristic of the anti-aliasing filter used when extracting the training data, so that the characteristic of the transition band between the stopband and the passband of the filter at recognition time is substantially the same as that of the transition band at training time. The data conversion unit 13 further downsamples the signal by 1/2, that is, to a speech signal with the 12 kHz sampling frequency used in the speech recognition processing.

[0026] Simple preprocessing can therefore match the frequency characteristics of the input speech to those of the training data used in training the acoustic models, and the drop in recognition rate caused by the cutoff characteristic of the anti-aliasing filter in the A/D conversion circuit 11b can be prevented.

[0027] In the above embodiment, oversampling is performed at twice the sampling frequency used in the speech recognition processing, but oversampling may equally be performed at n times that sampling frequency (n: an integer of 2 or more), that is, at 2n times the maximum frequency of the frequency band processed in the speech recognition processing. In that case, however, the downsampling must be by 1/n.
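The relation between the recognition band, the oversampling frequency, and the downsampling ratio is simple arithmetic, sketched below. The helper name `oversampling_plan` and its dictionary keys are hypothetical and serve only to make the numbers explicit.

```python
def oversampling_plan(max_recognition_freq_hz, n=2):
    """Rates for capturing at n times the sampling frequency the recognizer uses.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    if not isinstance(n, int) or n < 2:
        raise ValueError("n must be an integer of 2 or more")
    target_rate_hz = 2 * max_recognition_freq_hz      # rate used in recognition
    return {
        "target_rate_hz": target_rate_hz,
        "oversampling_rate_hz": n * target_rate_hz,   # 2n times the maximum frequency
        "decimation_factor": n,                       # 1/n downsampling
    }


# The embodiment's case: 0 Hz to 6 kHz band, n = 2.
plan = oversampling_plan(6000, n=2)
```

With a 6 kHz band and n = 2 this gives the 12 kHz target and 24 kHz capture rate of the embodiment; n = 4 would mean capturing at 48 kHz and downsampling by 1/4.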

[0028] The functions of the speech input means, input control means, and data conversion means in the above embodiment are realized by the speech data input conversion processing program recorded on the program recording medium. The program recording medium in the above embodiment is a program medium consisting of a ROM (read-only memory). Alternatively, it may be a program medium that is loaded into an external auxiliary storage device and read. In either case, the program reading means that reads the speech data input conversion processing program from the program medium may be configured to access the program medium directly and read it, or to download the program to a program storage area (not shown) provided in a RAM (random access memory) and read it by accessing that program storage area. The download program for downloading from the program medium to the program storage area of the RAM is assumed to be stored in the main unit in advance.

[0029] Here, the program medium is a medium that is separable from the main unit and carries the program in a fixed manner. It includes tape media such as magnetic tape and cassette tape; disk media, namely magnetic disks such as floppy (registered trademark) disks and hard disks, and optical disks such as CD (compact disc)-ROM, MO (magneto-optical) disks, MD (MiniDisc), and DVD (digital video disc); card media such as IC (integrated circuit) cards and optical cards; and semiconductor memory such as mask ROM, EPROM (ultraviolet-erasable ROM), EEPROM (electrically erasable ROM), and flash ROM.

[0030] If the speech recognition device of the above embodiment has a modem and can be connected to a communication network including the Internet, the program medium may be a medium that carries the program in a fluid manner, for example through download from the communication network. In that case, the download program for downloading from the communication network is assumed to be stored in the main unit in advance, or to be installed from another recording medium.

[0031] What is recorded on the recording medium is not limited to programs; data can also be recorded.

[0032]

[Effects of the Invention] As is clear from the above, in the speech recognition device of the first invention, the input control means sets the sampling frequency of the speech input means to an oversampling frequency at which the frequency band processed by the speech recognition means is amply contained in the passband of the anti-aliasing filter, and the data conversion means filters the speech data sampled at that oversampling frequency with a filter having the same or a similar characteristic to the cutoff characteristic of the anti-aliasing filter used when extracting the training data. Simple preprocessing can therefore match the frequency characteristics of the input speech to those of the training data used to learn the standard patterns or acoustic models. The drop in recognition rate caused by the cutoff characteristic of the anti-aliasing filter of the speech input means can thus be prevented.

[0033] In the speech recognition device of the first invention, if the input control means is configured to set the sampling frequency to an oversampling frequency that yields speech data having a frequency band n times (n: a positive integer of 2 or more) the frequency band processed by the speech recognition means, the 1/n downsampling by the data conversion means can be performed easily.

[0034] In the speech recognition device of the first invention, if the data conversion means is configured to have a low-cut filter for removing the DC component and low-frequency components unnecessary for speech recognition from the converted speech signal, those components are removed from the speech signal passed to the speech recognition means, and the recognition rate can be raised further.

[0035] The program recording medium of the second invention has recorded on it a speech data input conversion processing program that causes a computer to function as the speech input means, the input control means, and the data conversion means of the first invention. As in the first invention, simple preprocessing can therefore match the frequency characteristics of the input speech to those of the training data used to learn the standard patterns or acoustic models. The drop in recognition rate caused by the cutoff characteristic of the anti-aliasing filter of the speech input means can thus be prevented.

[Brief description of drawings]

[FIG. 1] A block diagram of the speech recognition device of the present invention.

[FIG. 2] A flowchart of the speech data input conversion processing performed by the speech input unit, the input control unit, and the data conversion unit in FIG. 1.

[FIG. 3] A diagram showing the filter cutoff characteristic at recognition time in the speech recognition device shown in FIG. 1 and the filter cutoff characteristic at the time the acoustic models were created.

[FIG. 4] A block diagram outlining a conventional speech recognition device.

[FIG. 5] A configuration diagram of the A/D conversion circuit.

[FIG. 6] A diagram showing the filter cutoff characteristic at recognition time in the speech recognition device shown in FIG. 4 and the filter cutoff characteristic at the time the acoustic models were created.

[Explanation of symbols]

11: speech input unit; 11a: microphone; 11b: A/D converter; 12: input control unit; 13: data conversion unit; 14: speech recognition unit; 15: display unit; 16: bus.

Continuation of front page (58) Fields searched (Int.Cl.⁷, DB name) G10L 21/00

Claims

(57) [Claims]

1. A speech recognition device that recognizes input speech with speech recognition means, comprising: speech input means that has an anti-aliasing filter whose passband is wider than the frequency band processed by the speech recognition means, and that samples the input speech that has passed through the anti-aliasing filter to obtain speech data; input control means that sets the sampling frequency of the speech input means to an oversampling frequency at which the frequency band processed by the speech recognition means is amply contained in the passband of the anti-aliasing filter; and data conversion means that filters the speech data sampled by the speech input means with a filter having the same or a similar characteristic to the cutoff characteristic of the anti-aliasing filter used when extracting the training data for learning standard patterns or acoustic models, downsamples the filtered speech data to convert it into speech data having a predetermined frequency band, and sends the result to the speech recognition means.

2. The speech recognition device according to claim 1, wherein the input control means sets the sampling frequency to an oversampling frequency that yields speech data having a frequency band n times (n: a positive integer of 2 or more) the frequency band processed by the speech recognition means.

3. The speech recognition device according to claim 1 or 2, wherein the data conversion means has a low-cut filter for removing the DC component and low-frequency components unnecessary for speech recognition from the converted speech signal.

4. A computer-readable program recording medium on which is recorded a speech data input conversion processing program that causes a computer to function as the speech input means, the input control means, and the data conversion means according to claim 1.