JPS6142280B2 - - Google Patents
Info
- Publication number
- JPS6142280B2 (JP54098857A, JP9885779A)
- Authority
- JP
- Japan
- Prior art keywords
- signal
- input
- voice
- frequency
- filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Description
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to preventing malfunction of a speech recognition device caused by external noise.
Various speech recognition systems have been proposed that extract feature parameters from a speech signal, use those parameters to recognize a speaker's utterance, and output signals, such as control signals for external equipment, according to the recognized utterance. One example is shown in the block diagram of FIG. 1. Its main components are an input section 1 that includes an acoustic-electrical signal converter (for example, a microphone) for converting input speech into an electrical signal, an input band filter 2 that passes only the frequency band to be processed, a feature extraction section 3 that extracts the features of the speech signal, a standard pattern storage section 4 that stores standard patterns of speech features registered in advance, a recognition processing section 5 that compares the feature pattern extracted from the input speech with the standard patterns and identifies the input speech, and an output control section 6 that outputs the recognition result. To improve the recognition rate, an input signal amplitude normalization circuit 7 and a time-axis adjustment section 8 are added, together with a registration control section 9 that handles the processing for registering the standard patterns of speech features in advance.
Many parameters can be used to extract the features of speech, such as the frequency spectrum distribution, correlation functions, the zero-crossing count, formant frequencies, or linear prediction coefficients. Among these, the so-called filter bank method, which separates the frequency spectrum of the speech with a set of band-splitting filters and examines its correlation with standard patterns, is widely used because it achieves a high recognition rate with a relatively simple configuration.
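The filter bank idea can be illustrated with a minimal software sketch. The patent describes analog band-pass filters; the sketch below swaps in the Goertzel algorithm to measure the power near each band center in short frames. The function names, the frame length default, and the choice of Goertzel are assumptions made for illustration and are not taken from the patent.

```python
import math


def goertzel_power(frame, sample_rate, freq):
    """Power of `frame` near `freq` (Hz), computed with the Goertzel algorithm."""
    w = 2.0 * math.pi * freq / sample_rate
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def filter_bank_features(samples, sample_rate, centers, frame_ms=10):
    """Split `samples` into frames and return one list of band powers per frame.

    `centers` holds the hypothetical band center frequencies in Hz.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        features.append([goertzel_power(frame, sample_rate, f) for f in centers])
    return features
```

Each inner list plays the role of one sampled set of filter outputs; a sequence of such lists forms the feature pattern that is compared with the standard patterns.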
In a speech processing device of this kind, the ability to separate and extract control commands from the input signal is extremely important, and it is no exaggeration to say that it determines the recognition rate of the device. In particular, when the recognition device is used in a place with a lot of external noise, the noise enters through the microphone of input section 1 and becomes a serious problem in use. If the frequency spectrum distribution of the noise after it has passed through input band filter 2 closely resembles the frequency spectrum distribution of any of the utterances registered in the device, the device will misrecognize the noise as a meaningful utterance.
External noises that occur often and cause problems include hand clapping and the impact sounds produced when objects collide. Such noises, relatively high in level and rich in high-frequency components, have become common in many environments. Because sounds of this kind are short, the device tends to reject them on the basis of normal word length, but they may be accepted when they occur several times in quick succession.
A so-called close-talking microphone is sometimes used in the input section to reject external noise, but it tends to be less effective against high-level sounds rich in high-frequency components such as those described above, and it remains insufficient for preventing malfunction.
The present invention was made to make such a speech recognition device less prone to malfunction caused by noise.
Specifically, it exploits the fact that noise of the kind described above has a wide frequency band and contains many high-frequency components that are not present in normal speech, and uses this to distinguish clearly between noise and speech.
An embodiment of the present invention is described in detail below with reference to FIG. 2. Input section 1 consists of a microphone 10 and an amplifier 11. The signal from input section 1 passes through input band filter 2 and then enters feature extraction section 3. Feature extraction section 3 consists of N band-pass filters (BPFs) 12, 13, ..., 14 with center frequencies numbered 1, 2, ..., n, a multiplexer 15 that switches among the filter outputs, and an analog-to-digital (A/D) converter 16 that converts the output level of each filter passed by the multiplexer into a digital signal. The band covered by band-pass filters 12 to 14 together is the same as, or narrower than, the band of input band filter 2. Each filter component of the speech signal from input section 1 is thus sampled at a suitable interval (in most cases around 10 milliseconds), converted into a digital code, and stored in storage memory 18 via a microcomputer 17 that includes an I/O port. A separate standard pattern memory 4 is connected to microcomputer 17; it holds the control command utterances registered in advance, each together with a code specifying its control action. In recognition mode, a control utterance is input as described above, and the signal sequence extracted by filters 12, 13, ..., 14 of feature extraction circuit 3 and digitized is stored in storage memory 18. Microcomputer 17 then computes the difference between this stored pattern and every standard pattern and identifies the input utterance by selecting the standard pattern with the smallest difference. Because the time course of human speech is not always the same even when the same word is uttered, it is well known that some time-axis adjustment circuit 8, as shown in FIG. 1, must be added; it is omitted from FIG. 2, which shows the configuration of the present invention, for convenience of explanation.
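The matching step, in which the stored pattern is compared with every standard pattern and the one with the smallest difference is selected, might be sketched as follows. The patent does not specify how the difference is measured; the sum of absolute differences used here, the function names, and the assumption that patterns have already been brought to equal length by time-axis adjustment are all illustrative choices.

```python
def pattern_distance(pattern, template):
    """Sum of absolute band-level differences between two equal-sized patterns.

    Each pattern is a list of frames; each frame is a list of band levels.
    """
    if len(pattern) != len(template):
        raise ValueError("patterns must have the same number of frames")
    return sum(
        abs(x - y)
        for frame_a, frame_b in zip(pattern, template)
        for x, y in zip(frame_a, frame_b)
    )


def closest_template(pattern, templates):
    """Return (code, distance) for the standard pattern nearest to `pattern`.

    `templates` maps a control code to its registered standard pattern.
    """
    scored = ((code, pattern_distance(pattern, t)) for code, t in templates.items())
    return min(scored, key=lambda pair: pair[1])
```

Here the dictionary keys stand in for the codes that specify the control action of each registered utterance.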
In recognition mode, speech is captured continuously. When the input speech is interrupted, that is, during a pause, the recognition calculation described above is executed and the preceding input utterance is identified by pattern matching. If the input utterance can be identified, that is, if it matches some standard pattern within the allowable error, microcomputer 17 controls output control section 6 to output the external-equipment control signal corresponding to the control utterance. If the input utterance cannot be identified, no control signal is output; instead, for example, display device 19 is driven to prompt the speaker to repeat the utterance.
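The accept-or-reject decision made during a pause can be sketched as a small function. It assumes the differences to each standard pattern have already been computed; the name `decide`, the `tolerance` parameter, and the use of `None` to mean "prompt the speaker to repeat" are hypothetical.

```python
def decide(distances, tolerance):
    """Return the control code of the best match, or None to request a repeat.

    `distances` maps each control code to the difference between the input
    pattern and that code's standard pattern; `tolerance` is the allowable error.
    """
    if not distances:
        return None
    best_code = min(distances, key=distances.get)
    if distances[best_code] <= tolerance:
        return best_code
    return None
```

A caller would output the control signal for the returned code, or drive the display to ask for the utterance again when the result is `None`.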
Next, the way noise is distinguished from speech signals in this embodiment is described. The signal from input section 1 enters input band filter 2 and is simultaneously fed to a high-pass filter (HPF) 20, which passes frequencies above the band of the input band filter. Since there is no need to detect high-frequency sound beyond what is necessary, the upper range is attenuated in accordance with the characteristics of the microphone. The output of HPF 20 is integrated by high-band signal integrator 21. The output of input band filter 2 is fed to BPFs 12, 13, ..., 14 and at the same time to voice-band signal integrator 22. The outputs of high-band signal integrator 21 and voice-band signal integrator 22 are fed to a divider 23, which forms the signal comparator. Divider 23 divides the high-band signal level (numerator) by the voice-band signal level (denominator), and its output goes to voltage comparator 24. Because noise of the kind described above contains many components above the voice band, the high-band signal level becomes large when such noise is input, and its ratio to the voice-band signal level, that is, the output of divider 23, becomes distinctly larger than when only a voice-band signal is present. Therefore, the reference voltage (V0) of voltage comparator 24 is set to a suitable value (determined experimentally), and the circuit is arranged so that when the divider output is at or above the reference voltage (V0), the input signal is regarded as noise and a signal is sent to microcomputer 17. Microcomputer 17 then interrupts the recognition operation, which prevents misrecognition. In this case too, display device 19 is driven to prompt the speaker to repeat the utterance.
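The noise test performed by integrators 21 and 22, divider 23, and comparator 24 can be sketched in software. This is a minimal illustration under stated assumptions: the two band signals are taken as already separated by the filters, the integrators are approximated by a rectify-and-average level, and the function names and the handling of a zero voice-band level are choices made here, not details from the patent.

```python
def band_level(signal):
    """Rectified average level, standing in for an analog integrator."""
    if not signal:
        return 0.0
    return sum(abs(x) for x in signal) / len(signal)


def level_ratio(high_band, voice_band):
    """High-band level divided by voice-band level (the divider output)."""
    high = band_level(high_band)
    voice = band_level(voice_band)
    if voice == 0.0:
        # Assumption: with no voice-band energy, any high-band energy counts as noise.
        return float("inf") if high > 0.0 else 0.0
    return high / voice


def is_noise(high_band, voice_band, v0):
    """True when the ratio is at or above the reference value `v0`."""
    return level_ratio(high_band, voice_band) >= v0
```

When `is_noise` returns true, the recognition step would be skipped and the speaker prompted to repeat the utterance. As in the description, a suitable `v0` would have to be found experimentally.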
As described above, the present invention distinguishes noise from normal speech by the characteristics of their spectral distributions. It therefore discriminates well between the two, prevents misrecognition by the recognition device, improves the recognition rate, and is highly practical.
FIG. 1 is a block diagram showing the configuration of an existing speech recognition device, and FIG. 2 is a block diagram showing the configuration of the device of the present invention. Reference numeral 1 denotes the input section, 3 the feature extraction section, 4 the standard pattern storage section, 17 the microcomputer, 21 the high-band signal integrator, 22 the voice-band signal integrator, and 23 the divider.
Claims (1)
1. A speech recognition device comprising: a voice-band filter and a high-band filter that divide the acoustic signal from an acoustic-electrical signal converter into a voice-band signal and a high-band signal above that band; a voice-band level detector that detects the level of the signal passed by the voice-band filter; a high-band level detector that detects the level of the signal passed by the high-band filter; and a comparator that takes the ratio of the detection outputs of the two level detectors, wherein the speech recognition operation is interrupted when the comparison by the comparator shows that the high-band level is greater than the voice-band level.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP9885779A JPS5622500A (en) | 1979-08-01 | 1979-08-01 | Audio identifying device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP9885779A JPS5622500A (en) | 1979-08-01 | 1979-08-01 | Audio identifying device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS5622500A JPS5622500A (en) | 1981-03-03 |
| JPS6142280B2 (en) | 1986-09-19 |
Family
ID=14230895
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP9885779A Granted JPS5622500A (en) | 1979-08-01 | 1979-08-01 | Audio identifying device |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPS5622500A (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS5870285A (en) * | 1981-10-22 | 1983-04-26 | 日産自動車株式会社 | Voice recognition equipment for vehicle |
| JPS5922100A (en) * | 1982-07-28 | 1984-02-04 | シャープ株式会社 | Voice recognition equipment |
| JPH0648196B2 (en) * | 1987-07-13 | 1994-06-22 | シャープ株式会社 | Noise reduction circuit for measuring instruments |
-
1979
- 1979-08-01 JP JP9885779A patent/JPS5622500A/en active Granted
Also Published As
| Publication number | Publication date |
|---|---|
| JPS5622500A (en) | 1981-03-03 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| EP0077194B1 (en) | Speech recognition system | |
| EP0398180A2 (en) | Method of and arrangement for distinguishing between voiced and unvoiced speech elements | |
| JP6220304B2 (en) | Voice identification device | |
| Kamble et al. | Novel energy separation based instantaneous frequency features for spoof speech detection | |
| JPS5876893A (en) | Voice recognition equipment | |
| GB2388947A (en) | Method of voice authentication | |
| JPS6060080B2 (en) | voice recognition device | |
| JPS6142280B2 (en) | ||
| JPH02232697A (en) | Voice recognition device | |
| JPS6226480B2 (en) | ||
| KR100587260B1 (en) | speech recognizing system of sound apparatus | |
| JPH04230796A (en) | Audio signal processing device | |
| JPS6219760B2 (en) | ||
| JP2754960B2 (en) | Voice recognition device | |
| JPH04230800A (en) | Voice signal processor | |
| JP2666296B2 (en) | Voice recognition device | |
| KR20000056849A (en) | method for recognizing speech in sound apparatus | |
| JP2882792B2 (en) | Standard pattern creation method | |
| JP2901976B2 (en) | Pattern matching preliminary selection method | |
| JP2844592B2 (en) | Discrete word speech recognition device | |
| JPS63278100A (en) | Voice recognition equipment | |
| JPS6039695A (en) | Method and apparatus for automatically detecting voice activity | |
| JP3020999B2 (en) | Pattern registration method | |
| JP2557497B2 (en) | How to identify male and female voices | |
| JPH0119597B2 (en) |