JPH06100918B2

JPH06100918B2 - Voice recognizer

Info

Publication number: JPH06100918B2
Application number: JP58085340A
Authority: JP
Inventors: 敏恵渡部
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-05-16
Filing date: 1983-05-16
Publication date: 1994-12-12
Anticipated expiration: 2009-12-12
Also published as: JPS59211098A

Description

[Detailed description of the invention]

(a) Technical field of the invention
The present invention relates to a speech recognition device that recognizes a speaker's input speech by comparing its input feature parameter time series with a plurality of registered feature parameter time series registered in advance.

(b) Background of the technology
In recent years, with advances in speech recognition technology, speech recognition devices have expanded into the field of spoken dialogue with machines. Examples include speaker recognition devices, which convert the input speech of a limited set of registered speakers into suitable input feature parameter time series, register them in a registered speech dictionary, and then check an utterance against the dictionary to decide whether it comes from an unknown speaker or from a particular registered speaker. Other examples are speaker-dependent speech recognition devices, which convert a specific speaker's spoken words into suitable input feature parameter time series, register them in a registered speech dictionary, check the utterance to be recognized against the dictionary, and display the recognition result as text.

In a typical speech recognition method, the unknown input speech is frequency-analyzed, the analog signal resulting from the analysis is converted into a digital signal, the digital signal is arranged as a time series, the speech section is determined using a threshold, and the result is extracted as an input feature parameter time series representing the features of each phoneme. The extracted input feature parameter time series is then matched against a plurality of registered feature parameter time series registered in advance, and the match with the smallest distance is selected as the recognition result for the unknown input speech. Because the form of the feature parameter time series, the registration method, and the matching and selection method all affect the difficulty of recognition and the recognition rate, various schemes have been studied.

(c) Prior art and its problems
A conventional speech recognition system has a registered speech dictionary in which a plurality of feature parameter time series of speech uttered by a specific speaker are registered. It compares the input feature parameter time series of a speech section, whose word beginning and word end have been determined for the input speech to be recognized, with the plurality of registered feature parameter time series in the dictionary, and outputs the recognition result. Conventionally, after one or more speech sections were cut out of a single input utterance, the distance to each of the registered feature parameter time series was calculated using the entire input feature parameter time series of every speech section.

The configuration of this type of speech recognition system is described below. FIG. 1 is a block diagram of the circuit configuration of a conventional speech recognition device. To register a speaker's speech in advance, one utterance by the speaker is input through microphone 1. Band filter group 2 divides the speech band of about 200 Hz to 5 kHz among 10 to 20 channel filters, and the output of each channel filter is sampled at a period of 5 to 30 ms. Feature parameter time series extraction unit 3 converts these outputs into a digital input feature parameter time series, which is stored in input feature parameter time series buffer 4. Speech section cutout circuit 5 cuts a speech section out of the stored time series, using a threshold that determines the word beginning and word end, and the result is registered in registered feature parameter time series dictionary unit 6 as a registered feature parameter time series representing the phoneme features of the input speech. This procedure is repeated for each utterance to be registered, so that a plurality of registered feature parameter time series are registered in dictionary unit 6.

Next, to recognize a speaker's speech, the utterance is input through microphone 1 and stored in input feature parameter time series buffer 4 by the same procedure as above. Speech section cutout circuit 5 then cuts the stored time series into a plurality of speech sections, one for each of several thresholds. Speech section cutout circuit 5 may consist of circuits that run the same algorithm with only the threshold changed, or of a combination of circuits whose algorithms themselves differ.

FIG. 4 shows an example in which speech section cutout circuit 5 determines a plurality of speech sections using thresholds. Suppose there are three threshold levels, TL, TM, and TH, and a single utterance, for example "Aomori", is input. Three speech sections are then determined: "Aomori" at threshold TL, "omori" at threshold TM, and "omo" at threshold TH. Accordingly, when n thresholds are set in FIG. 1, n input feature parameter time series are output, one per speech section. Matching selection circuit 7 matches the m registered feature parameter time series in dictionary unit 6 against the n input feature parameter time series in turn, calculating n × m matching distances, selects the smallest of them, and outputs the result to recognition terminal 8 as the recognition result.
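The multi-threshold cutout can be illustrated with a minimal Python sketch. The patent does not specify the cutout algorithm, so everything here is an assumption made for illustration: the per-frame `energy` values, the rule that a section runs from the first to the last frame reaching the threshold, and the function names `cut_section` and `cut_sections`.

```python
def cut_section(energy, threshold):
    """Return (start, end) frame indices, end exclusive, spanning the first
    to the last frame whose energy reaches the threshold; None if no frame does."""
    above = [t for t, e in enumerate(energy) if e >= threshold]
    if not above:
        return None
    return above[0], above[-1] + 1


def cut_sections(energy, thresholds):
    """Cut one utterance into one candidate section per threshold."""
    return [cut_section(energy, th) for th in thresholds]


# Hypothetical per-frame energy of a word with a weak onset and a weak ending.
energy = [0, 2, 5, 9, 9, 8, 6, 3, 1, 0]
# Three illustrative levels standing in for TL, TM, and TH.
sections = cut_sections(energy, thresholds=[1, 4, 7])
print(sections)  # [(1, 9), (2, 7), (3, 6)]
```

As in the "Aomori" example, a higher threshold trims the weak onset and ending, so the three candidate sections are nested.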

This is the conventional speech recognition procedure. In this system, matching the n input feature parameter time series (one per speech section) against the m registered feature parameter time series requires n × m distance calculations. As the number of recognition categories grows, the amount of calculation increases and matching takes longer. The system also has the drawback of a high misrecognition rate.
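The n × m cost of the conventional procedure can be seen in a short Python sketch of exhaustive matching. The distance measure is not given in the patent; the `distance` function below (linear time alignment followed by a mean absolute difference) is a hypothetical stand-in, and the data and names are illustrative only.

```python
def distance(a, b):
    """Hypothetical distance between two frame sequences of possibly
    different lengths: align linearly in time, then average the
    per-frame sum of absolute channel differences."""
    length = max(len(a), len(b))
    total = 0.0
    for t in range(length):
        fa = a[t * len(a) // length]
        fb = b[t * len(b) // length]
        total += sum(abs(x - y) for x, y in zip(fa, fb))
    return total / length


def exhaustive_match(sections, templates):
    """Match every candidate section against every registered template."""
    best = None
    evaluations = 0
    for s_idx, sec in enumerate(sections):
        for name, tpl in templates.items():
            d = distance(sec, tpl)
            evaluations += 1
            if best is None or d < best[0]:
                best = (d, name, s_idx)
    return best, evaluations


# Two registered templates (m = 2) and two candidate sections (n = 2),
# each frame holding two channel values.
templates = {"a": [[1, 1], [2, 2]], "b": [[5, 5], [6, 6]]}
sections = [[[1, 1], [2, 2], [9, 9]], [[1, 1], [2, 2]]]
best, evaluations = exhaustive_match(sections, templates)
print(best, evaluations)  # (0.0, 'a', 1) 4
```

The evaluation count equals n × m, and every evaluation uses all channels, which is the cost the invention sets out to reduce.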

(d) Object of the invention
The object of the present invention is to overcome the above drawbacks of the conventional system.

(e) Structure of the invention
According to the present invention, the above object is achieved by a speech recognition device comprising: a feature parameter time series extraction unit that converts each channel of input speech, divided into i channels by band filters, into i digitized input feature parameter time series; a speech section cutout circuit that cuts the input feature parameter time series of each of the i channels into n speech sections using n thresholds; and a registered feature parameter time series dictionary unit that registers the output of that circuit; the device being characterized in that a first selection circuit and a second selection circuit are provided.

Unknown input speech divided into i channels is converted by the feature parameter time series extraction unit into i digitized input feature parameter time series, each of which is cut by the speech section cutout circuit into n speech sections using the n thresholds. From these i channels, the first selection circuit selects j channels (j < i), each having n speech sections, as coarse input feature parameter time series. It matches them against the m registered feature parameter time series registered in advance in the dictionary unit, one at a time, and for each match selects the one of the n speech sections with the smallest matching distance, outputting m selection results.

The second selection circuit takes m, or r (r < m), outputs obtained by converting the m outputs of the first selection circuit from speech sections of coarse input feature parameter time series into speech sections containing all the input feature parameter time series. It matches them, m or r times respectively, against the registered feature parameter time series in the dictionary unit that correspond to the m outputs of the first selection circuit, and makes the selection.

By providing a plurality of speech section cutout circuits and a two-stage selection circuit, the present invention reduces the amount of matching distance calculation compared with the conventional system. At the same time, the input and registered feature parameter time series are in effect matched twice, so that recognition copes with variations in the input speech such as loudness, duration, and accent. In particular, a high recognition rate is obtained even when the speaker's voice is hoarse because of his or her physical condition that day.

(f) Embodiments of the invention
An embodiment of the present invention is described below. FIG. 2 is a block diagram of the circuit configuration of a speech recognition device according to the present invention. Throughout the drawings, the same elements are denoted by the same reference numerals as in FIG. 1. Reference numeral 9 denotes a first selection circuit, 10 a registered feature parameter time series dictionary unit, 11 an AND gate circuit, and 12 a second selection circuit.

In this circuit configuration, the registration procedure is the same as the conventional method shown in FIG. 1 and is therefore omitted. The recognition procedure according to the present invention is the same as in FIG. 1 up to the point where the speech of the speaker to be recognized is input through microphone 1 and the input feature parameter time series is stored in input feature parameter time series buffer 4.

Speech section cutout circuit 5 then sets n thresholds and cuts this time series into the input feature parameter time series of n speech sections. First selection circuit 9 converts these into coarse input feature parameter time series consisting of j of the i channels of band filter group 2 (where j < i), each with n speech sections. One of the m registered feature parameter time series registered in advance in registered feature parameter time series dictionary unit 10 is sent to first selection circuit 9 and matched against the n coarse input feature parameter time series, and the one speech section with the smallest matching distance is selected from the n. Only this selected speech section is input to the a side of AND gate circuit 11. At the same time, the input feature parameter time series with all i channels, for the speech sections cut out with the same thresholds, are input to the b side of the corresponding AND gate circuit 11. AND gate circuit 11 outputs an input feature parameter time series from its c side only for the path that has inputs on both the a and b sides.

Next, another of the m registered feature parameter time series is matched against the coarse input feature parameter time series of the n speech sections, a speech section is selected in the same way, and the input feature parameter time series of the selected speech section is output on the c side of AND gate circuit 11. In this way the coarse input feature parameter time series of the n speech sections are matched m times, once against each of the m registered feature parameter time series, and m speech sections in total are selected. Only the i-channel input feature parameter time series of the selected speech sections are output in turn on the c side of AND gate circuit 11 and input to second selection circuit 12.

Second selection circuit 12 calculates the matching distance m times, between the i-channel input feature parameter time series of each of the m selected speech sections and the m registered feature parameter time series registered in dictionary unit 10. It selects the smallest matching distance and outputs the selected registered feature parameter time series to recognition terminal 8 as the recognition result.

In the system of the present invention, the amount of matching distance calculation performed by the first and second selection circuits is (nj/i + 1)m. If i and j are set so that (j/i) < (1 - 1/n) and j < i, then (nj/i + 1)m < (n × m), and the amount of calculation is reduced.
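The two-stage selection of FIG. 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit itself: the `distance` function is a hypothetical stand-in for the unspecified matching distance, the AND gate is modeled simply by reusing the selected section with all channels, and the data and names are invented for the example.

```python
def distance(a, b, channels):
    """Hypothetical distance over the given channel indices: align the two
    frame sequences linearly in time, then average the per-frame sum of
    absolute differences."""
    length = max(len(a), len(b))
    total = 0.0
    for t in range(length):
        fa = a[t * len(a) // length]
        fb = b[t * len(b) // length]
        total += sum(abs(fa[c] - fb[c]) for c in channels)
    return total / length


def two_stage_match(sections, templates, coarse_channels, n_channels):
    """Stage 1: for each template, pick the best of the n sections using only
    the coarse channels. Stage 2: rescore the m survivors with all channels."""
    all_channels = range(n_channels)
    candidates = []
    for name, tpl in templates.items():
        best_sec = min(sections, key=lambda s: distance(s, tpl, coarse_channels))
        candidates.append((name, tpl, best_sec))
    winner = min(candidates, key=lambda c: distance(c[2], c[1], all_channels))
    return winner[0]


# i = 4 channels, j = 2 coarse channels (indices 0 and 2), n = 2, m = 2.
templates = {
    "a": [[1, 0, 1, 0], [2, 0, 2, 0]],
    "b": [[1, 9, 1, 9], [2, 9, 2, 9]],
}
sections = [
    [[1, 0, 1, 0], [2, 0, 2, 0], [7, 0, 7, 0]],
    [[1, 0, 1, 0], [2, 0, 2, 0]],
]
result = two_stage_match(sections, templates, coarse_channels=[0, 2], n_channels=4)
print(result)  # a
```

In this example the two templates are identical on the coarse channels, so the first stage only chooses a section for each template; the full-channel second stage is what separates "a" from "b".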

FIG. 3 shows another embodiment of the present invention, in which 13 is a first selection circuit, 14 is a selected input feature parameter time series buffer, and 15 is a second selection circuit. First selection circuit 13 matches the n coarse input feature parameter time series, for j of the i channels as described for FIG. 2, against the m registered feature parameter time series, selects the r speech sections with the smallest matching distances (where r ≦ m), and stores them in selected input feature parameter time series buffer 14. Second selection circuit 15 calculates the matching distance for each of the r stored speech sections using all i channels, selects the smallest matching distance, and outputs the result to recognition terminal 8 as the recognition result. The amount of matching distance calculation performed by the first and second selection circuits is (j/i) × nm + r. By choosing i, j, and r so that (j/i) < (1 - r/(nm)), r ≦ m, and j < i, we obtain (j/i) × nm + r < (n × m).
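The calculation amounts stated for the conventional system and the two embodiments can be compared numerically. The Python sketch below only evaluates the expressions n × m, (nj/i + 1)m, and (j/i) × nm + r from the text; the function names and the example values of n, m, i, j, and r are illustrative assumptions. Costs are counted in units of one full i-channel distance calculation.

```python
def conventional_cost(n, m):
    """n sections times m templates, each matched on all channels."""
    return n * m


def first_embodiment_cost(n, m, i, j):
    """n*m coarse matches on j of i channels, plus m full matches."""
    return (n * j / i + 1) * m


def second_embodiment_cost(n, m, i, j, r):
    """n*m coarse matches on j of i channels, plus r full matches."""
    return (j / i) * n * m + r


# Illustrative values: 3 thresholds, 100 templates, 16 channels, 4 coarse
# channels, 10 survivors passed to the second stage.
n, m, i, j, r = 3, 100, 16, 4, 10
print(conventional_cost(n, m))                # 300
print(first_embodiment_cost(n, m, i, j))      # 175.0
print(second_embodiment_cost(n, m, i, j, r))  # 85.0
```

Here j/i = 0.25, which satisfies both (j/i) < (1 - 1/n) and (j/i) < (1 - r/(nm)), so both embodiments come out below the conventional n × m.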

The embodiments above have been described using frequency analysis by band filters, but the invention can also be used in speech recognition devices that adopt other methods such as LPC analysis.

(g) Effects of the invention
As described above, by providing a plurality of speech section cutout circuits and a two-stage selection circuit, the present invention allows a speech recognition device to recognize a speaker's input speech despite variations such as loudness, duration, and accent. This improves the recognition rate and also reduces the amount of matching distance calculation.

[Brief description of drawings]

FIG. 1 is a block diagram of the circuit configuration of a conventional speech recognition device. FIGS. 2 and 3 are block diagrams of the circuit configuration of speech recognition devices according to the present invention. FIG. 4 is a diagram showing how the speech sections of input speech are determined.

In the drawings, 1 is a microphone, 2 is a band filter group, 3 is a feature parameter time series extraction unit, 4 is an input feature parameter time series buffer, 5 is a speech section cutout circuit, 7 is a matching selection circuit, 8 is a recognition terminal, 9 is a first selection circuit, 10 is a registered feature parameter time series dictionary unit, 12 is a second selection circuit, 13 is a first selection circuit, 14 is a selected input feature parameter time series buffer, and 15 is a second selection circuit.

Claims

[Claims]

1. A speech recognition device comprising: a feature parameter time series extraction unit that converts each channel of input speech, divided into i channels by band filters, into i digitized input feature parameter time series; a speech section cutout circuit that cuts the input feature parameter time series of each of the i channels into n speech sections using n thresholds; and a registered feature parameter time series dictionary unit that registers the output of said circuit; wherein a first selection circuit and a second selection circuit are provided; unknown input speech divided into i channels is converted by the feature parameter time series extraction unit into i digitized input feature parameter time series, each of which is cut by the speech section cutout circuit into n speech sections using the n thresholds, yielding i channels; for these i channels, the first selection circuit has a function of selecting j channels (j < i), each having n speech sections, as coarse input feature parameter time series, matching them against the m registered feature parameter time series registered in advance in the registered feature parameter time series dictionary unit one at a time, selecting for each match the one of the n speech sections with the smallest matching distance, and outputting m selection results; and the second selection circuit has a function of matching and selecting, m or r times respectively, between m or r (r < m) outputs, which consist of speech sections including all the input feature parameter time series and are converted from the m outputs of the first selection circuit for speech sections consisting of coarse input feature parameter time series, and the registered feature parameter time series of the registered feature parameter time series dictionary unit corresponding to the m outputs of the first selection circuit.