JPH0754434B2 - Voice recognizer

Info

Publication number: JPH0754434B2
Application number: JP1114733A
Authority: JP
Inventors: 由実滝沢
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1989-05-08
Filing date: 1989-05-08
Publication date: 1995-06-07
Anticipated expiration: 2010-06-07
Also published as: JPH02293797A

Description

[Detailed description of the invention]

Field of industrial application

The present invention relates to a voice recognition device.

Prior art

In recent years, as voice recognition technology has advanced, voice recognition devices have been approaching practical use in a variety of fields. Putting them into practical use, however, requires solving various problems that arise when a recognizer is actually used. One such practical problem is that when the SN ratio at the time of voice input is low, noise is erroneously detected as a voice section, which leads to misrecognition.

To address this problem, a conventional voice recognition device takes in the background noise immediately before voice input, measures its power, sets the threshold for voice section detection at or above that power, and detects the voice section with the threshold so set. With this method, noise is not erroneously detected as a voice section even in a low-SN-ratio environment, and the misrecognition rate is reduced.

A conventional voice recognition device of this kind is described below with reference to the drawings. FIG. 3 is a block diagram of a conventional registration-type word voice recognition device. In the figure, 1 is a voice input terminal, 2 is an analysis unit, 16 is a threshold setting unit, 17 is a section detection unit, 18 is a collating unit, 19 is a recognition result output terminal, 20 is a registered voice buffer, 21 is an input voice buffer, and 22 and 23 are switches. The operation of the device configured in this way is described below.

First, at registration, a background noise signal of a predetermined duration is input from voice input terminal 1 immediately before voice input, and analysis unit 2 calculates the signal power for each unit time and passes the results to threshold setting unit 16. Threshold setting unit 16 takes the average of these power values and sets the section detection threshold to that average plus a predetermined value (for example, 6 dB).
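The threshold setting step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the fixed frame length, and the choice to express frame power in dB and average the dB values are all assumptions, since the text only says that the average power is raised by a predetermined value such as 6 dB.

```python
import math


def frame_power_db(samples, frame_len):
    """Power of each full frame in dB (10*log10 of the mean square).

    The frame length and the dB floor are assumptions for illustration.
    """
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(x * x for x in frame) / frame_len
        # Floor the value so that silent frames do not produce log10(0).
        powers.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return powers


def noise_threshold_db(noise_powers_db, margin_db=6.0):
    """Average noise power plus a fixed margin (6 dB in the text's example)."""
    return sum(noise_powers_db) / len(noise_powers_db) + margin_db


# Hypothetical noise-only lead-in: constant amplitude 0.1, i.e. about -20 dB.
noise = [0.1, -0.1] * 80
threshold = noise_threshold_db(frame_power_db(noise, frame_len=40))
```

With this hypothetical input, every noise frame sits at about -20 dB, so the threshold lands about 6 dB above it.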

When a word to be registered is spoken, analysis unit 2 calculates the signal power and the feature parameters for each unit time from the signal input at voice input terminal 1; the power values go to section detection unit 17 and the feature parameters go to input voice buffer 21. If, for example, the LPC cepstrum method is used for analysis, a predetermined number of cepstrum coefficients are calculated as the feature parameters. Section detection unit 17 then compares the signal power for each unit time with the section detection threshold set earlier and takes as the voice section any portion where the signal power stays at or above the threshold for 60 msec or more. However, even when the signal power falls to or below the threshold, the portion is still treated as part of the voice section unless it stays at or below the threshold for 60 msec or more. The feature parameters for the voice section so determined are then read from input voice buffer 21 and stored in registered voice buffer 20. The processing from voice input onward is repeated for every word to be recognized.
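The 60 msec rule for section detection can be sketched as below. This is one possible reading under stated assumptions: the 10 msec frame period, the function name, and the handling of a frame exactly at the threshold (counted as "at or above") are not given in the text, and only the first voice section is returned.

```python
def detect_voice_section(powers_db, threshold_db, frame_ms=10, hold_ms=60):
    """Return (start, end) frame indices of the first voice section, or None.

    `end` is exclusive. A section starts where power stays at or above the
    threshold for at least `hold_ms`, and ends only when power stays below
    the threshold for at least `hold_ms`; shorter dips remain in the section.
    """
    hold = max(1, hold_ms // frame_ms)
    above = [p >= threshold_db for p in powers_db]

    # Look for the first run of `hold` consecutive frames at or above threshold.
    start = None
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run >= hold:
            start = i - hold + 1
            break
    if start is None:
        return None

    # Extend the section until a below-threshold gap of `hold` frames appears.
    end = start + hold
    gap = 0
    for i in range(start + hold, len(above)):
        if above[i]:
            gap = 0
            end = i + 1
        else:
            gap += 1
            if gap >= hold:
                break
    return start, end


# Hypothetical power track: 3 quiet frames, 8 loud, a 3-frame dip, 7 loud, 10 quiet.
track = [-10.0] * 3 + [10.0] * 8 + [-10.0] * 3 + [10.0] * 7 + [-10.0] * 10
section = detect_voice_section(track, threshold_db=0.0)
```

In this hypothetical track the 30 msec dip is shorter than 60 msec, so it stays inside the detected section, while the long quiet tail closes it.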

Next, at recognition, the section detection threshold is set from the background noise in the same way as at registration, after which the input voice is analyzed and its voice section detected. Both the analysis method and the section detection method are the same as at registration. Once the voice section has been detected, collating unit 18 collates the registered voices with the input voice and outputs the word giving the shortest distance from recognition result output terminal 19 as the recognition result. Switch 22 routes the calculation results to threshold setting unit 16 when noise is input immediately before voice input, and to section detection unit 17 and input voice buffer 21 during voice input. Switch 23 routes the feature parameters to registered voice buffer 20 at registration and to collating unit 18 at recognition.
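The text does not say how the collating unit measures distance. As an illustration only, the sketch below assumes dynamic time warping (DTW) with a Euclidean frame distance, a common choice for template-based word recognizers; the function names and the toy one-dimensional feature vectors are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalized DTW distance between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Allow a step from the left, from below, or along the diagonal.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def recognize(input_features, registered):
    """Return the registered word whose template is nearest to the input."""
    return min(registered, key=lambda word: dtw_distance(input_features, registered[word]))


# Hypothetical templates with one feature per frame.
registered = {"up": [[0.0], [1.0], [2.0]], "down": [[2.0], [1.0], [0.0]]}
result = recognize([[0.0], [0.0], [1.0], [2.0]], registered)
```

Because the distance is computed only over the detected voice sections, a section boundary that shifts between registration and recognition changes which frames are aligned, which is the weakness the following sections address.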

Problems to be solved by the invention

With the configuration described above, noise can be excluded regardless of changes in noise power. However, the start and end positions of the voice section shift as the noise power or the utterance power changes, so when conditions differ between the utterance of the registered or standard voice and the utterance of the input voice, collation is performed on differing voice sections, and misrecognition is therefore likely.

In view of this problem, the present invention provides a voice recognition device that prevents the voice sections of the registered or standard voice and the input voice from drifting apart and thereby reduces misrecognition caused by differing conditions.

Means for solving the problems

To achieve the above object, the voice recognition device according to claim 1 comprises: an analysis unit that detects the power of the input signal for each unit time; an SN ratio calculation unit that calculates the SN ratio; a threshold setting unit that determines the section detection threshold in consideration of the SN ratio; a section detection unit that detects the voice section of the input signal using the determined threshold; a section correction unit that corrects the section of the registered voice or standard voice; and a collating unit that collates the registered voice or standard voice with the input voice and outputs a recognition result.

The voice recognition device according to claim 2 comprises: an analysis unit that detects the power of the input signal for each unit time; a threshold setting unit that sets the section detection threshold in consideration of the peak value of the signal power and the noise power value; a section detection unit that detects the voice section of the input signal using the set threshold; a section correction unit that corrects the section of the registered voice or standard voice; and a collating unit that collates the registered voice or standard voice with the input voice and outputs a recognition result.

Operation

In the voice recognition device according to claim 1, the SN ratio calculation unit calculates the SN ratio; the threshold determination unit sets the threshold to a value at or above the noise power in a low-SN-ratio environment and to the peak value minus a predetermined value in a high-SN-ratio environment; the section detection unit detects the section of the input voice using that threshold; and the section correction unit then uses the same threshold in collating the registered voice or standard voice with the input voice.

In the voice recognition device according to claim 2, the threshold setting unit compares the noise power value plus a predetermined value with the peak value minus a predetermined value and sets the larger of the two as the threshold; the section detection unit detects the section of the input voice using that threshold; the section correction unit corrects the section of the registered voice or standard voice using the same threshold; and the collating unit collates the registered voice or standard voice with the input voice.

Embodiments

FIG. 1 is a block diagram of a registration-type word voice recognition device according to a first embodiment of the present invention (corresponding to the invention of claim 1).

In the figure, 1 is a voice input terminal, 2 is an analysis unit, 3 is a provisional threshold setting unit, 4 is an SN ratio calculation unit, 5 is a threshold setting unit, 6 is a section detection unit, 7 is a section correction unit, 8 is a collating unit, 9 is a registered voice buffer, 10 is an input voice buffer, 11 is a recognition result output terminal, and 12 and 13 are switches. Elements identical to those of the conventional example (see FIG. 3) carry the same reference numbers.

The operation of the voice recognition device configured in this way is described below.

First, at registration, a background noise signal of a predetermined duration is input from voice input terminal 1 immediately before voice input, and analysis unit 2 calculates the signal power for each unit time. The results are input to provisional threshold setting unit 3, which takes the average of these power values and sets the provisional section detection threshold to that average plus a predetermined value (6 dB in this embodiment).

When a word to be registered is spoken, analysis unit 2 calculates the signal power and the feature parameters for each unit time from the signal input at voice input terminal 1. The power values are input to SN ratio calculation unit 4 and the feature parameters to input voice buffer 10. The analysis method is the same as in the conventional example. SN ratio calculation unit 4 treats the portion of the signal at or above the previously set provisional section detection threshold as a provisional voice section and calculates, as the SN ratio, the ratio of the peak value within the provisional voice section to the average of the previously calculated noise power. If the SN ratio is at or below a predetermined value (24 dB in this example), the speaker is instructed to repeat the registration, and the registration processing described above starts over from the beginning.
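The SN ratio check at registration can be sketched as follows, assuming all powers are expressed in dB so that the ratio of peak to average noise power becomes a difference. The function names are hypothetical, and the behavior when no frame reaches the provisional threshold is an assumption, since the text does not cover that case.

```python
def sn_ratio_db(speech_powers_db, noise_mean_db, provisional_threshold_db):
    """Peak power of the provisional voice section minus average noise power.

    Returns None when no frame reaches the provisional threshold
    (an assumed convention, not stated in the text).
    """
    provisional = [p for p in speech_powers_db if p >= provisional_threshold_db]
    if not provisional:
        return None
    return max(provisional) - noise_mean_db


def registration_accepted(sn_db, limit_db=24.0):
    """Registration is repeated when the SN ratio is at or below the limit."""
    return sn_db is not None and sn_db > limit_db


# Hypothetical values: noise averages -50 dB, provisional threshold is -44 dB.
sn = sn_ratio_db([-55.0, -40.0, -20.0, -30.0, -52.0], -50.0, -44.0)
accepted = registration_accepted(sn)
```

Here the peak of the provisional section is -20 dB against -50 dB of noise, an SN ratio of 30 dB, so the utterance would be accepted for registration.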

If the SN ratio is 24 dB or more, threshold setting unit 5 sets the detection threshold to the peak value minus a predetermined value (18 dB in this embodiment). Section detection unit 6 compares the signal power for each unit time with the detection threshold and detects the voice section; the section detection method is the same as in the conventional example. The feature parameters for the voice section are then read from input voice buffer 10 and stored in registered voice buffer 9. The processing from registered voice input onward is repeated for every word to be recognized.

Next, at recognition, the provisional section detection threshold is set from the background noise in the same way as at registration, after which the input voice is analyzed and the results are input to SN ratio calculation unit 4 and input voice buffer 10. SN ratio calculation unit 4 calculates the SN ratio as at registration and passes the result to threshold setting unit 5. Threshold setting unit 5 sets the threshold to the peak value minus 18 dB if the SN ratio is 24 dB or more, and to the provisional section detection threshold if the SN ratio is 24 dB or less. Section detection unit 6 then detects the voice section using this threshold. The section detection method is the same as at registration.
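The threshold choice at recognition in the first embodiment can be sketched as below, again assuming powers in dB. The function and parameter names are hypothetical. The text assigns an SN ratio of exactly 24 dB to both branches; this sketch arbitrarily sends that case to the peak-based branch.

```python
def select_threshold_db(peak_db, noise_mean_db, sn_limit_db=24.0,
                        peak_drop_db=18.0, noise_margin_db=6.0):
    """Pick the section detection threshold from the SN ratio.

    High SN ratio: peak minus 18 dB. Low SN ratio: the provisional
    threshold, i.e. average noise power plus 6 dB.
    """
    sn_db = peak_db - noise_mean_db
    if sn_db >= sn_limit_db:
        return peak_db - peak_drop_db
    return noise_mean_db + noise_margin_db


# Hypothetical cases with noise averaging -50 dB.
quiet_room = select_threshold_db(peak_db=-20.0, noise_mean_db=-50.0)
noisy_room = select_threshold_db(peak_db=-30.0, noise_mean_db=-50.0)
```

In the first case the SN ratio is 30 dB and the threshold follows the peak (-38 dB); in the second it is 20 dB and the threshold falls back to the noise-based value (-44 dB).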

Next, section correction unit 7 leaves the section of the registered voice unchanged when the SN ratio is 24 dB or more; only when the SN ratio is 24 dB or less does it redo section detection on the registered voice using the threshold described above. Collating unit 8 then collates the registered voices with the input voice and outputs the word giving the shortest distance from output terminal 11 as the recognition result. Switch 12 routes the calculation results to provisional threshold setting unit 3 when noise is input immediately before voice input, and to SN ratio calculation unit 4 and input voice buffer 10 during voice input. Switch 13 routes the feature parameters to registered voice buffer 9 at registration and to section correction unit 7 at recognition.
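The section correction step can be sketched as follows. Two assumptions go beyond the text: the registered template is taken to keep its per-frame power alongside its feature parameters so that detection can be redone, and the re-detection is simplified to trimming at the first and last frames at or above the threshold instead of applying the full 60 msec rule. All names are hypothetical.

```python
def correct_registered_section(template_powers_db, template_features,
                               threshold_db, sn_db, sn_limit_db=24.0):
    """Re-trim a registered template when the input SN ratio is low.

    With a high SN ratio the template is returned unchanged. Otherwise only
    the span from the first to the last frame at or above the threshold is
    kept (a simplification of the section detection rule).
    """
    if sn_db >= sn_limit_db:
        return template_features
    kept = [i for i, p in enumerate(template_powers_db) if p >= threshold_db]
    if not kept:
        return []
    return template_features[kept[0]:kept[-1] + 1]


# Hypothetical template: six frames with their stored powers and feature labels.
powers = [-40.0, -30.0, -10.0, -5.0, -12.0, -35.0]
features = ["a", "b", "c", "d", "e", "f"]
trimmed = correct_registered_section(powers, features, threshold_db=-20.0, sn_db=20.0)
```

Under these assumptions the weak leading and trailing frames of the template are dropped, so the template covers roughly the same portion of the word as the input detected in the noisier environment.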

As described above, in this embodiment SN ratio calculation unit 4 calculates the ratio of the peak value of the signal to the average noise power; threshold setting unit 5 sets the threshold to the noise power value plus a predetermined value when the SN ratio is at or below a fixed value and to the peak value minus a predetermined value when the SN ratio is at or above that value; section detection unit 6 detects the section of the input voice using the threshold; section correction unit 7 corrects the registered voice using the same threshold; and collating unit 8 collates the registered voice with the input voice. This prevents the voice sections of the registered voice and the input voice from drifting apart and reduces misrecognition caused by differing conditions.

FIG. 2 is a block diagram of a registration-type word voice recognition device according to a second embodiment of the present invention (corresponding to the invention of claim 2).

In the figure, 1 is a voice input terminal, 2 is an analysis unit, 3 is a provisional threshold setting unit, 14 is a threshold setting unit, 6 is a section detection unit, 7 is a section correction unit, 8 is a collating unit, 9 is a registered voice buffer, 10 is an input voice buffer, 11 is a recognition result output terminal, and 12 and 15 are switches. Elements identical to those of the first embodiment carry the same reference numbers.

First, at registration, a background noise signal of a predetermined duration is input from voice input terminal 1 immediately before voice input, and analysis unit 2 calculates the signal power for each unit time. The results are input to provisional threshold setting unit 3, which takes the average of these power values and sets the provisional section detection threshold to that average plus a fixed value (6 dB in this embodiment).

When a word to be registered is spoken, analysis unit 2 calculates the signal power and the feature parameters for each unit time from the signal input at voice input terminal 1. The power values are input to threshold setting unit 14 and the feature parameters to input voice buffer 10. The analysis method is the same as in the first embodiment. Threshold setting unit 14 treats the portion of the signal at or above the previously set provisional section detection threshold as a provisional voice section and compares the peak value within that section minus a predetermined value (18 dB in this embodiment) with the previously calculated provisional section detection threshold. If the latter is larger, the speaker is instructed to repeat the registration, and the registration processing starts over from the beginning.

If the former is larger, that value (peak value − 18 dB) is set as the detection threshold, and section detection unit 6 compares the signal power for each unit time with the detection threshold and detects the voice section. The section detection method is the same as in the conventional example. The feature parameters for the voice section are then read from input voice buffer 10 and stored in registered voice buffer 9. The processing from registered voice input onward is repeated for every word to be recognized.

Next, at recognition, the provisional section detection threshold is set from the background noise in the same way as at registration, after which the input voice is analyzed and the results are input to threshold setting unit 14 and input voice buffer 10. Threshold setting unit 14 treats the portion of the signal at or above the provisional section detection threshold as a provisional voice section, compares the peak value within that section minus 18 dB with the previously calculated provisional section detection threshold, and sets the larger of the two as the threshold. Section detection unit 6 then detects the voice section using this threshold. The section detection method is the same as at registration.
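The second embodiment's rule reduces to taking the larger of two candidate thresholds. The sketch below assumes powers in dB and uses hypothetical function names. It also checks a point that follows from the arithmetic of the stated constants: with a 6 dB noise margin and an 18 dB drop from the peak, the peak-based candidate is the larger one exactly when the peak exceeds the average noise by 24 dB or more, which is why no explicit SN ratio calculation is needed.

```python
def threshold_without_sn_db(peak_db, noise_mean_db,
                            noise_margin_db=6.0, peak_drop_db=18.0):
    """Larger of (average noise + 6 dB) and (peak - 18 dB)."""
    provisional = noise_mean_db + noise_margin_db
    from_peak = peak_db - peak_drop_db
    return max(provisional, from_peak)


def peak_branch_chosen(peak_db, noise_mean_db):
    """True when the peak-based candidate is at least the noise-based one."""
    return (peak_db - 18.0) >= (noise_mean_db + 6.0)


# Hypothetical cases with noise averaging -50 dB.
high_sn = threshold_without_sn_db(peak_db=-20.0, noise_mean_db=-50.0)
low_sn = threshold_without_sn_db(peak_db=-30.0, noise_mean_db=-50.0)
```

For these hypothetical inputs the result matches the first embodiment's SN-ratio-based choice: -38 dB when the peak is 30 dB above the noise, and -44 dB when it is only 20 dB above.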

Next, section correction unit 7 leaves the section of the registered voice unchanged when the threshold was set to the peak value minus 18 dB; only when the threshold was set to the provisional section detection threshold does it redo section detection on the registered voice using that threshold. Collating unit 8 then collates the registered voices with the input voice and outputs the word giving the shortest distance from output terminal 11 as the recognition result. Switch 15 routes the calculation results to provisional threshold setting unit 3 when noise is input immediately before voice input, and to threshold setting unit 14 and input voice buffer 10 during voice input. Switch 13 routes the feature parameters to registered voice buffer 9 at registration and to section correction unit 7 at recognition.

As described above, in this embodiment threshold setting unit 14 compares the noise power value plus 6 dB with the peak value minus 18 dB and sets the larger of the two as the threshold; section detection unit 6 detects the section of the input voice using that threshold; the section correction unit corrects the section of the registered voice using the same threshold; and the collating unit collates the registered voice with the input voice. This prevents the voice sections of the registered voice and the input voice from drifting apart and reduces misrecognition caused by differing conditions. Compared with the first embodiment, this embodiment can be expected to give the same effect without the effort of calculating the SN ratio.

Effects of the invention

In the voice recognition device according to claim 1, the SN ratio calculation unit calculates the SN ratio; the threshold determination unit sets the threshold to a value at or above the noise power in a low-SN-ratio environment and to the peak value minus a predetermined value in a high-SN-ratio environment; the section detection unit detects the section of the input voice using that threshold; the section correction unit corrects the section of the registered voice or standard voice using the same threshold; and the collating unit collates the registered voice or standard voice with the input voice. This prevents the voice sections of the registered voice and the input voice from drifting apart and reduces misrecognition caused by differing conditions.

In the voice recognition device according to claim 2, the threshold setting unit compares the noise power value plus a predetermined value with the peak value minus a predetermined value and sets the larger of the two as the threshold; the section detection unit detects the section of the input voice using that threshold; the section correction unit corrects the section of the registered voice or standard voice using the same threshold; and the collating unit collates the registered voice or standard voice with the input voice. This prevents the voice sections of the registered or standard voice and the input voice from drifting apart and reduces misrecognition caused by differing conditions. Compared with the invention of claim 1, the same effect can be expected without the effort of calculating the SN ratio.

[Brief description of drawings]

FIG. 1 is a block diagram of a voice recognition device according to a first embodiment of the present invention, FIG. 2 is a block diagram of a voice recognition device according to a second embodiment of the present invention, and FIG. 3 is a block diagram of a voice recognition device according to a conventional example.

2: analysis unit; 4: SN ratio calculation unit; 5, 14: threshold setting unit; 6: section detection unit; 7: section correction unit; 8: collating unit.

Claims

[Claims]

1. A voice recognition device comprising: an analysis unit that detects the power of an input signal for each unit time; an SN ratio calculation unit that calculates the ratio of voice power to noise power (hereinafter, the SN ratio); a threshold determination unit that determines a section detection threshold in consideration of the SN ratio; a section detection unit that detects a voice section of the input signal using the determined threshold; a section correction unit that corrects the section of a registered voice or standard voice; and a collating unit that collates the registered voice or standard voice with the input voice and outputs a recognition result, wherein the SN ratio calculation unit calculates the SN ratio, the threshold determination unit sets as the threshold a value at or above the noise power in an environment where the SN ratio is low and a value obtained by subtracting a predetermined value from the maximum power value of the signal (hereinafter, the peak value) in an environment where the SN ratio is high, the section detection unit detects the section of the input voice using the threshold, the section correction unit corrects the section of the registered voice or standard voice using the threshold, and the collating unit collates the registered voice or standard voice with the input voice.

2. A voice recognition device comprising: an analysis unit that detects the power of an input signal for each unit time; a threshold setting unit that sets a section detection threshold in consideration of the peak value of the signal power and the noise power value; a section detection unit that detects a voice section of the input signal using the set threshold; a section correction unit that corrects the section of a registered voice or standard voice; and a collating unit that collates the registered voice or standard voice with the input voice and outputs a recognition result, wherein the threshold setting unit compares a value obtained by adding a predetermined value to the noise power with a value obtained by subtracting a predetermined value from the peak value of the signal and sets the larger of the two as the threshold, the section detection unit detects the section of the input voice using the threshold, the section correction unit corrects the section of the registered voice or standard voice using the threshold, and the collating unit collates the registered voice or standard voice with the input voice.