JPH0792671B2

JPH0792671B2 - Voice recognizer

Info

Publication number: JPH0792671B2
Application number: JP60132064A
Authority: JP
Inventors: 雅彦林; 裕史大塚
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1985-06-19
Filing date: 1985-06-19
Publication date: 1995-10-09
Anticipated expiration: 2010-10-09
Also published as: JPS61292198A

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a voice recognition device that uses noise information to detect voice sections and perform recognition processing.

(Prior Art) Conventionally, a voice recognition device of this type measures, before the voice is uttered, the noise arising from the speaker's surrounding environment or from the input conditions, takes the result of that measurement as noise information, and uses the noise information to detect the voice section and perform recognition. FIG. 5 is a block diagram showing the configuration of a conventional voice recognition device. In the figure, 1 is a voice analysis unit that analyzes input data; 2 is a switch that selects the destination of the analysis data from the voice analysis unit 1; 3 is a noise information creation unit that creates noise information from the analysis data input via the switch 2; 4 is a noise information storage unit that stores the noise information created by the noise information creation unit 3; 5 is an analysis data storage unit that stores the analysis data input via the switch 2; 6 is a voice section detection unit that sets a level threshold using the noise information stored in the noise information storage unit 4 and detects the voice section by comparing the analysis data in the analysis data storage unit 5 with the level threshold; 7 is a recognition processing unit that recognizes the input voice from the analysis data and the voice section information supplied by the voice section detection unit 6; and 8 is a controller that controls the entire device.

Next, the operation will be described. First, noise is measured before the voice is uttered. The controller 8 switches the switch 2 to the noise information creation unit 3 side and activates the voice analysis unit 1 and the noise information creation unit 3. The input data is thereby fed to the voice analysis unit 1, where analysis processing that extracts the power information and frequency information of the input data is performed at fixed time intervals (frame by frame). From this analysis result, the noise information creation unit 3 creates noise information and stores it in the noise information storage unit 4. When the noise measurement is complete, the controller 8 stops the voice analysis unit 1 and the noise information creation unit 3 and then proceeds to voice section detection and recognition. For this, the controller 8 switches the switch 2 to the voice section detection unit 6 side and activates the voice analysis unit 1 and the voice section detection unit 6. The input data is analyzed by the voice analysis unit 1 in the same way as during noise measurement, and the result is input as voice analysis data to the analysis data storage unit 5 and the voice section detection unit 6. Using the voice analysis result and the noise information obtained beforehand by noise measurement and stored in the noise information storage unit 4, the voice section detection unit 6 sets a level threshold for detecting the level of the input voice. It takes the first frame of the span exceeding the level threshold as the start point and the last frame as the end point, and determines the span from the start point to the end point to be the voice section. The controller 8 then activates the recognition processing unit 7. Based on the voice analysis data, the voice section information output from the voice section detection unit 6, and the noise information, the recognition processing unit 7 performs recognition processing such as generally known pattern matching and outputs the recognition result. FIG. 6 shows the relationship between noise measurement and voice section detection for the input data.
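The conventional detection step can be illustrated with a minimal sketch. It is not taken from the patent: it assumes each frame is reduced to a single power value, and the threshold rule (average noise power plus a fixed margin) and the names `level_threshold`, `detect_section` and `margin` are hypothetical choices made only for illustration.

```python
from statistics import mean

def level_threshold(noise_frames, margin=6.0):
    """Hypothetical rule: average noise power plus a fixed margin."""
    return mean(noise_frames) + margin

def detect_section(frames, threshold):
    """Return (start, end) frame indices of the span above the threshold, or None."""
    above = [i for i, power in enumerate(frames) if power > threshold]
    if not above:
        return None
    # First frame above the threshold is the start point, last one the end point.
    return above[0], above[-1]

# Noise is measured once, before the utterance (assumed per-frame power values).
noise = [2.0, 3.0, 2.5, 2.5]
speech = [2.0, 3.0, 20.0, 25.0, 18.0, 3.0, 2.0]

thr = level_threshold(noise)
section = detect_section(speech, thr)
```

Because the threshold here is fixed by a single measurement taken in advance, any unusual noise captured during that measurement carries over unchanged into the detection step.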

(Problems to be Solved by the Invention) However, when voice is input through a telephone, a weakly directional microphone, or the like, the device configured as above may happen to pick up, during noise measurement, non-pulse noise caused by a change in the speaker's ambient noise or by noise mixed in during transmission from the input point to the voice recognition device. The noise information then becomes anomalous and differs greatly from the actual noise condition in the vicinity of the voice, so that an accurate voice section cannot be cut out and misrecognition results.

An object of the present invention is to eliminate the above problem of erroneous voice section detection caused by anomalous noise information, and to provide a device that cuts out a more accurate voice section and achieves an excellent recognition rate.

(Means for Solving the Problems) To solve the above problems, the present invention is a voice recognition device comprising: analysis means for performing voice analysis on input data; analysis result storage means for storing the analysis results of the analysis means; noise information calculation means for calculating noise information by measuring the noise in a measurement section between a predetermined start frame and an arbitrary end frame of the analysis results stored in the analysis result storage means; stop frame setting means for changing the measurement section by increasing the end frame used by the noise information calculation means by a predetermined amount at a time; noise information storage means for storing the noise information from the noise information calculation means; detection means for setting a level threshold using the noise information from the noise information storage means and, at the timing when the threshold is set, detecting a voice section by comparing the contents of the analysis result storage means with the level threshold; recognition means for performing voice recognition based on the output signals of the analysis result storage means and the detection means; and control means for operating the stop frame setting means, until the detection means detects the start point of the voice section, so as to increase the end frame by the predetermined amount at a time, update the noise information, and have the voice section detected based on the updated noise information.

Preferably, the control means controls each of the above means so that the latest contents of the noise information storage means serve as the noise information of the first noise measurement section at the next voice recognition.

(Operation) With the voice recognition device configured as described above, the technical means operate as follows. Voice input data, including the non-voice (noise measurement) section, is analyzed by the analysis means and stored in the analysis result storage means. The control means specifies to the noise information calculation means a noise measurement section running from the start frame to an arbitrary end frame of the analysis data in the analysis result storage means, and has it calculate the noise information. The detection means then sets a level threshold using the noise information supplied via the noise information storage means and, at this timing (end frame + level threshold setting time), compares the level of the analysis data with the level threshold to detect the voice section from the start point to the end point of the input voice. When the detection means does not detect the start point of the voice section, the control means instructs the stop frame setting means to use a noise measurement section whose end frame is increased by a predetermined amount, and the noise information is updated; this is repeated until the start point is detected. The detection means sets the level threshold using the updated noise information and, at this timing (updated end frame + setting time of the updated level threshold), detects the voice section in the same way. The recognition means recognizes the input voice using the voice section information obtained in this manner. Because voice section detection thus always uses the latest noise information until the voice section is detected, the problem of the prior art described above is solved.

(Embodiments) FIG. 1 is a block diagram showing a first embodiment of the present invention. In the figure, the same reference numerals as in FIG. 5 denote the same components. 30 is a noise information creation unit corresponding to the noise information creation unit 3 of FIG. 5, and consists of a noise information calculation unit 31 and a stop frame setting unit 32. The noise information calculation unit 31 takes as input the voice analysis data (including the non-voice section) stored in the analysis data storage unit 5 and creates noise information. The stop frame setting unit 32 stores the end frame number of the noise measurement as instructed by the controller 8 and supplies it to the noise information calculation unit 31.

Next, an example of the operation will be described. First, before the voice is input, noise measurement is performed as in the conventional device; this is called the primary noise measurement. All voice analysis data produced by the voice analysis unit 1 in the conventional manner is first stored sequentially in the analysis data storage unit 5, beginning at a predetermined start frame number, and the controller 8 sets the end frame number of the primary noise measurement section in the stop frame setting unit 32. The noise information creation unit 30 is then activated, and the noise information calculation unit 31 reads from the analysis data storage unit 5 the voice analysis data of the section delimited by the start frame number and the end frame number set in the stop frame setting unit 32, creates noise information in the conventional manner, and stores it in the noise information storage unit 4. This completes the primary noise measurement. Next, the controller 8 activates the voice section detection unit 6, which performs voice section detection in the conventional manner using the noise information from the primary noise measurement. If the start point of the voice section is not detected, the frame number obtained by adding a fixed number of frames to the end frame number already set in the stop frame setting unit 32 is set in the stop frame setting unit 32 as the new end frame number. The noise information creation unit 30 is then activated again and, as in the primary noise measurement, reads from the analysis data storage unit 5 the voice analysis data of the section delimited by the start frame and the newly set end frame number, creates noise information, and stores it in the noise information storage unit 4; the voice section detection unit 6 then attempts to detect the start point of the voice section. By repeating this operation until the start point of the voice section is detected, the noise measurement section is extended and the latest noise information is supplied to the voice section detection unit 6. The voice section detection unit 6 thus detects the voice section in the conventional manner, based on the latest noise information stored in the noise information storage unit 4.
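The repeated measurement described above can be sketched as follows. This is an illustrative reading, not the patent's implementation: it assumes one power value per frame, that the noise information is the mean power over the window from the start frame to the end frame, that the threshold is that mean plus a fixed margin, and that each pass inspects only the frames between the current end frame and the next one. The names `find_start`, `first_end`, `step` and `margin` are hypothetical.

```python
from statistics import mean

def find_start(frames, head=0, first_end=4, step=2, margin=6.0):
    """Extend the noise window [head, end) until a following frame exceeds the threshold.

    Returns (start_index, end_frame, threshold), or None if no start point is found.
    """
    end = first_end                                  # end frame of the primary noise measurement
    while end < len(frames):
        noise_info = mean(frames[head:end])          # noise information for the current window
        threshold = noise_info + margin              # assumed threshold rule
        # Inspect only the frames that follow the current noise window.
        for i in range(end, min(end + step, len(frames))):
            if frames[i] > threshold:
                return i, end, threshold
        end += step                                  # stop frame setting: add a fixed number of frames
    return None

# Ambient noise drifts upward before the utterance begins at frame 8.
frames = [2.0, 2.0, 2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 30.0, 32.0, 28.0, 3.0]
result = find_start(frames)
```

In this sketch the threshold in force when the start point is found reflects all frames up to the utterance rather than only the first four; end-point detection and recognition are left out.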

FIG. 2 shows the relationship between noise measurement and voice section detection for the input data. It indicates the timing at which voice section detection is performed using, in each successive detection section, the noise information obtained in the corresponding noise measurement section [1], [2], ..., [n]. The primary noise measurement section corresponds to the first noise measurement section [1], and the figure shows that the start point of the voice section was detected using the n-th noise measurement section [n].

The operation after the voice section has been detected is the same as in the conventional device, so its description is omitted. The first embodiment has been described for the case where the primary noise measurement section and the section in which the voice section is detected are contiguous. Even if there is a gap between them, the voice analysis data may be treated as though it were contiguous, and noise measurement may be performed in the same way as in the contiguous case.

Next, a second embodiment will be described. FIG. 3 is a block diagram showing the second embodiment of the present invention. In addition to the components of the first embodiment, it provides an average noise storage unit 41 that stores the average value of the voice analysis data in the section over which the noise information used for the previous recognition was measured. An example of the operation is as follows. For the first input data, recognition proceeds as in the first embodiment: the primary noise measurement is performed, and noise measurement is then repeated until the start point of the voice section is determined. Once the voice section has finally been detected and recognition processing has been performed, the average value of the voice analysis data in the section over which the noise information used for voice section detection was measured is calculated and stored in the average noise storage unit 41. For the second and subsequent input data, the primary noise measurement is not performed. Instead, at the instruction of the controller 8, the average value of the voice analysis data stored in the average noise storage unit 41 is written into the analysis data storage area for primary noise measurement in the analysis data storage unit 5, for each frame of that area, and the primary noise measurement is treated as complete. Thereafter, as in the first embodiment, noise measurement over the section delimited by the start frame number and the end frame number given to the stop frame setting unit 32 by the controller 8 is repeated until the start point of the voice section is detected, and the subsequent operation is the same as in the first embodiment. After recognition processing, the average value of the voice analysis data in the section over which the noise information finally used for voice section detection and recognition was measured is again stored in the average noise storage unit 41. FIG. 4, like FIG. 2 for the first embodiment, shows the relationship between noise measurement and voice section detection for the input data. It indicates the timing at which voice section detection is performed using, in each successive detection section, the noise information obtained in the corresponding noise measurement section [1], [2], ..., [n]. The section holding the stored average value of the voice analysis data corresponds to the first noise measurement section [1], and the figure shows that the start point of the voice section was detected using the n-th noise measurement section [n].
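One way to picture the second embodiment is the sketch below. It rests on the same illustrative assumptions as before (one power value per frame, noise information as a mean, threshold as mean plus margin), and additionally assumes that the stored average simply fills a primary-measurement area of `first_end` frames placed ahead of the new input. The names `find_start`, `run_utterance` and `stored_average` are hypothetical.

```python
from statistics import mean

def find_start(buffer, first_end, step, margin):
    """Extend the noise window until a following frame exceeds the threshold."""
    end = first_end
    while end < len(buffer):
        threshold = mean(buffer[:end]) + margin      # assumed threshold rule
        for i in range(end, min(end + step, len(buffer))):
            if buffer[i] > threshold:
                return i, end
        end += step
    return None

def run_utterance(frames, stored_average=None, first_end=4, step=2, margin=6.0):
    """Return (start index within `frames`, average to keep for the next utterance)."""
    if stored_average is None:
        buffer, offset = list(frames), 0             # first input: leading frames are measured noise
    else:
        # Later inputs: fill the primary-measurement area with the stored average
        # instead of measuring noise again.
        buffer, offset = [stored_average] * first_end + list(frames), first_end
    found = find_start(buffer, first_end, step, margin)
    if found is None:
        return None, stored_average
    start, end = found
    return start - offset, mean(buffer[:end])        # average of the window actually used

first = [2.0, 2.0, 2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 30.0, 32.0]
start1, avg1 = run_utterance(first)
second = [3.0, 4.0, 28.0, 30.0]                      # no leading noise-only period required
start2, avg2 = run_utterance(second, stored_average=avg1)
```

The second call needs no separate noise-only lead-in, which is the point of carrying the average over from the previous recognition.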

In the second embodiment, the average value of the voice analysis data in the section over which the noise information finally used for voice section detection and recognition was measured is stored in the average noise storage unit 41 inside the device and used for the next recognition. Alternatively, the average value of the voice analysis data may be output to the outside of the device and received from outside at the next recognition.

As described above, according to these embodiments, the noise information is measured repeatedly until the start point of the voice section is detected, and the voice section is detected using noise information taken very close to the voice. As a result, a more accurate voice section can be cut out even when the speaker's ambient noise changes, and a voice recognition device with an excellent recognition rate can be provided. Furthermore, because the primary noise measurement is not performed for the second and subsequent input data, as in the second embodiment, no time is taken up by noise measurement during continuous recognition, and the speaker can speak naturally.

(Effects of the Invention) As described above, according to the present invention, the voice section can be cut out accurately, so that a voice recognition device with an excellent recognition rate can be provided.

[Brief description of drawings]

FIG. 1 is a block diagram showing a first embodiment of the voice recognition device of the present invention; FIG. 2 is a diagram explaining the operation of the first embodiment; FIG. 3 is a block diagram showing a second embodiment of the voice recognition device of the present invention; FIG. 4 is a diagram explaining the operation of the second embodiment; FIG. 5 is a block diagram showing a conventional voice recognition device; and FIG. 6 is a diagram explaining the conventional voice section detection operation.

1: voice analysis unit, 4: noise information storage unit, 5: analysis data storage unit, 6: voice section detection unit, 7: recognition processing unit, 8: controller, 30: noise information creation unit, 31: noise information calculation unit, 32: stop frame setting unit, 41: average noise storage unit.

Claims

1. A voice recognition device comprising:
(a) analysis means for performing voice analysis on input data;
(b) analysis result storage means for storing the analysis results of the analysis means;
(c) noise information calculation means for calculating noise information by measuring the noise in a measurement section between a predetermined start frame and an arbitrary end frame of the analysis results stored in the analysis result storage means;
(d) stop frame setting means for changing the measurement section by increasing the end frame used by the noise information calculation means by a predetermined amount at a time;
(e) noise information storage means for storing the noise information from the noise information calculation means;
(f) detection means for setting a level threshold using the noise information from the noise information storage means and, at the timing when the threshold is set, detecting a voice section by comparing the contents of the analysis result storage means with the level threshold;
(g) recognition means for performing voice recognition based on the output signals of the analysis result storage means and the detection means; and
(h) control means for operating the stop frame setting means, until the detection means detects the start point of the voice section, so as to increase the end frame by the predetermined amount at a time, update the noise information, and have the voice section detected based on the updated noise information.

2. The voice recognition device according to claim 1, wherein the control means controls each of the means so that the latest contents of the noise information storage means serve as the noise information of the first noise measurement section at the next voice recognition.