JPS6232798B2


Info

Publication number: JPS6232798B2
Application number: JP55012203A
Authority: JP
Inventors: Kazunaga Yoshida; Hiroaki Sekoe
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1980-02-04
Filing date: 1980-02-04
Publication date: 1987-07-16
Also published as: JPS56109399A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to an improvement of a voice input device that uses a pattern matching method.

Techniques have already been put into practical use in which speech uttered in advance is registered as standard patterns, and newly uttered speech is recognized by pattern matching between it and those standard patterns (see Information Processing Society of Japan, Research Group Material MMS23-2, January 20, 1976, "A continuous speech recognition system using DP"; hereinafter cited document (1)).

In such a device, speech must be uttered in advance and registered as standard patterns. Conventionally, the standard patterns were registered in a fixed order defined by a predetermined word dictionary. When the vocabulary consists of digits or letters of the alphabet, it is natural and easy to utter them in the order 1, 2, 3, ... or A, B, C, .... In that case, however, even if the words are uttered separately, a certain intonation tends to creep in, and the sounds at the end of one word and the beginning of the next may blend continuously into each other. As a result, the registered standard patterns may differ from those obtained when each word is uttered independently.

Moreover, standard patterns are generally registered in a voice input device repeatedly, for example once a day. If the speaker becomes used to uttering the words in dictionary order, then patterns carrying a habitual bias, unlike independently uttered words, may be registered as standard patterns for ordinary words as well as for digits and letters. In addition, once the speaker has learned the registration order and anticipates the next word, the words may be uttered inaccurately.

An object of the present invention is to prevent the biased or inaccurate utterances that a fixed utterance order causes when standard patterns are registered in a voice input device using pattern matching, and thereby to obtain reliable standard patterns.

To achieve this object, the voice input device of the present invention comprises: a feature extraction unit that extracts features from uttered speech and converts them into a pattern; a standard pattern registration unit that registers the pattern as a standard pattern; a similarity calculation unit that calculates the similarity between the standard patterns and an input pattern produced by the feature extraction unit from newly input speech; a result output unit that determines a recognition result from the similarity; and a random number generation unit that supplies a random order when the standard patterns are registered in the standard pattern registration unit.

An embodiment is described in detail below.

FIG. 1 is a block diagram of a voice input device that uses the pattern matching method. In the figure, 1 is a microphone, 2 is a feature extraction unit, 3 is a standard pattern registration unit, 4 is a similarity calculation unit, 5 is a result output unit, and 6 is a mode changeover switch. The feature extraction unit 2 converts the speech signal WV from the microphone 1 into a speech pattern P. The operation of a voice input device using pattern matching is divided into two modes: a standard pattern registration mode and a recognition mode. In the standard pattern registration mode, the mode changeover switch 6 is set to the registration position TR, and the input speech pattern P is registered as a standard pattern in the standard pattern registration unit 3. In the recognition mode, the mode changeover switch 6 is set to the recognition position OP. The similarity calculation unit 4 then calculates and outputs the similarity S between the input speech pattern P and the standard patterns RP supplied by the standard pattern registration unit 3. Based on this similarity S, the result output unit 5 outputs a recognition result R.

The above is the general operation of a voice input device using pattern matching; the device of cited document (1), for example, operates in the same way. The feature of the present invention lies in the operation in the standard pattern registration mode. The invention is not limited to any particular operation in the recognition mode, that is, to any particular pattern matching method. The operation of the standard pattern registration mode is therefore described in more detail below.
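The two-mode structure of FIG. 1 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the patented circuit. Feature extraction is assumed to have been done already, so each pattern is a list of feature vectors. The similarity S is assumed to be the negative dynamic time warping (DTW) distance, chosen in the spirit of the DP matching of cited document (1); the patent itself does not prescribe a matching method. The class and method names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Pattern = List[Sequence[float]]  # assumed: a pattern is a sequence of feature vectors


def dtw_distance(a: Pattern, b: Pattern) -> float:
    """Classic DTW distance with a Euclidean local cost between feature vectors."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


class VoiceInputDevice:
    """Hypothetical model of FIG. 1 with registration mode TR and recognition mode OP."""

    def __init__(self) -> None:
        self.mode = "TR"  # position of mode changeover switch 6
        self.standard_patterns: Dict[str, Pattern] = {}  # registration unit 3

    def set_mode(self, mode: str) -> None:
        if mode not in ("TR", "OP"):
            raise ValueError("mode must be 'TR' or 'OP'")
        self.mode = mode

    def register(self, word: str, pattern: Pattern) -> None:
        if self.mode != "TR":
            raise RuntimeError("registration requires mode TR")
        self.standard_patterns[word] = pattern

    def recognize(self, pattern: Pattern) -> str:
        if self.mode != "OP":
            raise RuntimeError("recognition requires mode OP")
        # Similarity calculation unit 4: S = -(DTW distance) to each standard pattern RP.
        scores = {w: -dtw_distance(pattern, rp) for w, rp in self.standard_patterns.items()}
        # Result output unit 5: the recognition result R is the most similar word.
        return max(scores, key=scores.get)
```

For example, after registering `"one"` as `[[0.0], [1.0], [1.0]]` and `"two"` as `[[5.0], [5.0]]` and switching to `"OP"`, `recognize([[0.0], [1.0]])` returns `"one"`. DTW is used only because it absorbs differences in speaking rate; any other similarity measure could be substituted without affecting the registration-side idea of this invention.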

FIG. 2 shows the part of the configuration needed for standard pattern registration, namely the portion enclosed by the dotted line in FIG. 1. In the figure, 11 is a random number generation unit, 12 is a word dictionary, 13 is a display, and 14 is a standard pattern memory. The microphone 1, the feature extraction unit 2, and the standard pattern registration unit 3 (enclosed by the dotted line in FIG. 2) are the same as in FIG. 1, and the mode changeover switch 6 is omitted.

To register standard patterns for N words, the N word names are first set in the word dictionary 12. Each word name is shown on the display 13 according to the address signal AD from the random number generation unit 11. The speaker reads the word name shown on the display 13 and utters it into the microphone 1. The feature extraction unit 2 converts the uttered speech into a pattern, which is then stored in the standard pattern memory 14 in the area indicated by the same address signal AD that was given to the word dictionary 12. In this way, the standard pattern corresponding to each word name is registered. When the speech input ends, a voice detection signal SD is issued, and in response the random number generation unit 11 outputs the next address signal AD. The random number generation unit 11 outputs numbers from 1 to N as the address signal AD, advancing to the next number on each voice detection signal SD. In other words, it outputs the numbers 1 to N in a randomly permuted order.

One example of a method for performing this random permutation is as follows. FIG. 3 shows an example configuration of the random number generation unit 11. In it, 31 is an address counter, 32 is an N-bit address flag memory accessible one bit at a time, 33 is a random number counter, 34 is a random number generator, and 35 is a random number maximum value register. First, all bits of the address flag memory 32 are set to 1, and N is set in the random number maximum value register 35. The random number generator 34 generates a random number not exceeding the value set in the random number maximum value register 35. Many methods of generating random numbers exist, such as using an M-sequence; another is to count the time interval between utterances and use the count as the random number.
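As an illustration of the M-sequence option mentioned above, the following sketch builds a hypothetical random number generator 34 from a 16-bit Galois linear feedback shift register. It assumes the feedback mask 0xB400 (taps 16, 14, 13 and 11), a known maximal-length choice; the patent specifies no register length or polynomial. Output is mapped to the range 1..max, as the selection procedure requires. The modulo mapping is slightly biased when max does not divide 65535 and is kept only for simplicity.

```python
class MSequenceGenerator:
    """Hypothetical random number generator 34 based on a 16-bit Galois LFSR."""

    MASK = 0xB400  # assumed feedback mask (taps 16, 14, 13, 11), maximal length

    def __init__(self, seed: int = 0xACE1) -> None:
        if not 0 < seed < (1 << 16):
            raise ValueError("seed must be a nonzero 16-bit value")
        self.state = seed

    def step(self) -> int:
        """Advance one state; the nonzero states repeat with period 2**16 - 1."""
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= self.MASK
        return self.state

    def randint_upto(self, maximum: int) -> int:
        """Return a value in 1..maximum (simple modulo mapping, slightly biased)."""
        if maximum < 1:
            raise ValueError("maximum must be >= 1")
        return self.step() % maximum + 1
```

A maximal-length register visits all 65535 nonzero states before repeating. This is why an M-sequence is a cheap hardware source of pseudo-random values, though for a short vocabulary the time interval between utterances is an equally valid seed.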

The address output signal SD resets the address counter 31, and the random number produced by the random number generator 34 is set in the random number counter 33. The clock signal CLK then counts up the address counter 31. Data R is read from the address flag memory 32 at the address given by the address counter 31, and if this value is 1, the random number counter 33 is counted down. This repeats until the random number counter 33 reaches 0. At that point a zero signal ZF is issued, and the address held in the address counter 31 is output to the word dictionary 12 and the standard pattern memory 14. At the same time, the clock signal CLK to the address counter 31 is masked, 0 is written into the currently addressed bit of the address flag memory 32, and the random number maximum value register 35 is decremented by 1, becoming N-1. The random number generator 34 then generates a new random number whose maximum value is N-1. When the next address output signal SD is input, the above operation is repeated. Consequently, each time the address output signal SD is input, one number from 1 to N is output as the address signal AD, and over N inputs the numbers 1 to N appear in random order. This is one example of the operation of the random number generation unit 11. Various other methods are conceivable, and the present invention is not limited to this particular method of random permutation.
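The selection cycle of FIG. 3 amounts to picking the r-th address whose flag is still 1, where r is drawn from 1..(number of unused addresses). If each r is uniform, the result is a uniformly random permutation. A minimal software simulation follows. The function name is hypothetical, its variables mirror elements 31 to 35, and `rand_upto` is an assumed stand-in for the random number generator 34 (any source of integers in 1..max works).

```python
import random
from typing import Callable, Iterator, List


def fig3_address_sequence(n: int, rand_upto: Callable[[int], int]) -> Iterator[int]:
    """Yield addresses 1..n in random order by simulating FIG. 3 (hypothetical sketch).

    rand_upto(m) plays the role of random number generator 34 and must return
    an integer in 1..m.
    """
    flags: List[int] = [1] * (n + 1)  # address flag memory 32 (index 0 unused)
    max_register = n                  # random number maximum value register 35
    while max_register > 0:
        # Each address output signal SD starts one selection cycle.
        random_counter = rand_upto(max_register)  # random number counter 33
        address = 0                               # address counter 31, reset
        while True:
            address += 1                  # count up on clock CLK
            if flags[address] == 1:       # data R read from flag memory 32
                random_counter -= 1       # count down only on unused addresses
                if random_counter == 0:   # zero signal ZF
                    break
        flags[address] = 0                # mark this address as used
        max_register -= 1                 # N, N-1, N-2, ...
        yield address                     # address signal AD


if __name__ == "__main__":
    # Prints a random permutation of 1..5; the order differs from run to run.
    print(list(fig3_address_sequence(5, lambda m: random.randint(1, m))))
```

Because the random number never exceeds the count of remaining flags, the scan always stops within the N addresses. Each cycle costs at most N clock steps, which suits simple counter hardware better than swapping entries in memory.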

The essential point of the present invention is that the standard patterns are registered in a different random order each time registration is performed, and the invention is not limited to the embodiment described. For example, the word name shown on the display 13 may instead be presented by voice, and the address output signal input to the random number generation unit 11 may be entered with a separate key.
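The registration procedure of FIG. 2, repeated with a fresh random order in every session, can be sketched as below. This is a hypothetical illustration. `prompt` stands for the display 13 or a spoken prompt. `capture` stands for the microphone 1 plus the feature extraction unit 2 and returns once the utterance ends, the point at which signal SD would be issued. `random.Random.shuffle` replaces the hardware of FIG. 3. Because each pattern is stored under the same address used to look up the word name, the prompting order does not change which word a pattern belongs to.

```python
import random
from typing import Callable, Dict, List, Sequence


def register_session(
    word_dictionary: Sequence[str],
    prompt: Callable[[str], None],
    capture: Callable[[], object],
    rng: random.Random,
) -> Dict[int, object]:
    """One registration session: prompt every word once, in a fresh random order."""
    n = len(word_dictionary)
    addresses: List[int] = list(range(1, n + 1))
    rng.shuffle(addresses)  # random sequence of address signals AD for this session
    standard_pattern_memory: Dict[int, object] = {}
    for ad in addresses:
        prompt(word_dictionary[ad - 1])          # present the word name at address AD
        standard_pattern_memory[ad] = capture()  # store the pattern at the same address AD
    return standard_pattern_memory
```

Calling `register_session` once per registration (for example, once a day) with the same `rng` generally yields a different order each time. Meanwhile the returned memory still maps each address to the pattern uttered for that word.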
The word name displayed at 3 may be outputted by voice, and the address output signal inputted to the random number generation section 11 may be inputted using another key.

With the voice input device of the present invention, the standard patterns are registered in a different order each time, so habits tied to a fixed order are unlikely to form, and the utterance conditions approach those of independently uttered words. In addition, because the speaker must pay some attention to each utterance, the utterances also become more accurate.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a voice input device using the pattern matching method, FIG. 2 is a block diagram of the standard pattern registration unit in an embodiment of the present invention, and FIG. 3 is a block diagram of the random number generation unit. In the figures, 1 is a microphone, 2 is a feature extraction unit, 3 is a standard pattern registration unit, 4 is a similarity calculation unit, 5 is a result output unit, 6 is a mode changeover switch, 11 is a random number generation unit, 12 is a word dictionary, 13 is a display, 14 is a standard pattern memory, 31 is an address counter, 32 is an address flag memory, 33 is a random number counter, 34 is a random number generator, and 35 is a random number maximum value register.

Claims

1. A voice input device comprising: a feature extraction unit that extracts features from uttered speech and converts them into a pattern; a standard pattern registration unit that registers the pattern as a standard pattern; a similarity calculation unit that calculates the similarity between the standard pattern and an input pattern produced by the feature extraction unit from newly input speech; a result output unit that determines a recognition result based on the similarity; and a random number generation unit that supplies a random order when the standard patterns are registered in the standard pattern registration unit.