JPH0656559B2

JPH0656559B2 - Word detection method

Info

Publication number: JPH0656559B2
Application number: JP61190261A
Authority: JP
Inventors: 畑崎 香一郎 (Kaichiro Hatazaki)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-08-12
Filing date: 1986-08-12
Publication date: 1994-07-27
Anticipated expiration: 2009-07-27
Also published as: JPS6344698A

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a word detection method that is used in speech recognition devices, speech input devices, and the like, and that detects a word contained in input speech together with the position of that word within the speech.

(Prior Art) One way to detect a word and its position in input speech, in a speech recognition device, speech input device, or the like, treats the input speech as a sequence of categories such as syllables, phonemes, or phoneme classes. Each category and its position information in the input speech are extracted, and a category string is built from the extracted categories. If that category string corresponds to the category string of some word, the word and the position of the category string in the input speech are output as the detection result.

In general, these categories are short in duration and some of them resemble one another, so it is difficult to extract the categories in the input speech entirely without error. Conventionally, therefore, several category candidates were extracted for each category segment of the input speech. Starting from one end of the input speech and taking the category candidates in order, a partial category candidate string was generated and matched against the category string of a word, and this process was repeated until the category candidate string corresponding to the word was found. The details of this method are described in, for example, Reference 1, "Japanese Patent Application No. 58-214544, Pattern Recognition Device," and are omitted here.

In addition, at the stage of extracting categories from the input speech, causes such as slurred articulation or coarticulation between adjacent categories (for example, syllables) can lead to the detection of a category that the input speech does not contain. As a result, another category candidate may appear between category candidates that ought to be adjacent. This phenomenon is hereinafter called a category insertion error.

Conventionally, category insertion errors were handled as follows. The category sequences in which insertion errors are likely to occur are surveyed in advance, and category string correction rules are prepared for the relatively frequent category insertion errors found. When a category string correction rule is applied to a category string containing a category insertion error, it deletes the inserted category. Applying these rules to the category candidate strings at word detection time corrects the relatively frequent category insertion errors. This method is described in, for example, Reference 2: Shoichi Matsunaga (松永昭一) and Masaki Kohda (好田正紀), "Effect of the Branch & Bound Method and Candidate Selection Using Bottom-Up Phoneme Recognition," Acoustical Society of Japan, Speech Research Group material S85-79, January 1986, page 616, right column, lines 15 to 18. Examples of the correction rules are shown in Table 4 on page 617 of the same reference.

(Problems to Be Solved by the Invention) In the conventional method described above, category candidate strings are first generated from the category candidates extracted from the input speech and only then matched against the category string of a word. Many category candidate strings that ultimately prove useless are therefore generated, which requires a large amount of computation.

Moreover, even when the segment occupied by the word to be detected is only a part of the input speech, the conventional method must match every category candidate, starting from the end of the input speech and including the segments in which the word does not occur, equally against the categories in the word. This wastes computation time and delays detection of the word.

Furthermore, the category string correction rules are applied equally to category candidate strings that contain a category insertion error and to those that do not. In many cases, several correction rules are also applied individually to a single category candidate string. One category candidate string thus gives rise to many category candidate strings, increasing the number of strings that must be matched against the category string of the word. Most of these strings are then rejected because they do not match the category string of the word to be detected.

In addition, the category insertion errors that the correction rules can fix are limited to those that occur relatively frequently; rare errors cannot be corrected. Increasing the kinds of errors that can be corrected requires more correction rules, which in turn further increases the number of category candidate strings generated.

For example, suppose that speech pronounced "o-n-se-i-ni-n-shi-ki-wa" (オンセイニンシキワ, "onsei ninshiki wa," meaning "speech recognition is ...") is input and its syllable candidates are extracted. The syllable "ni" (ニ) may be erroneously inserted between the syllable candidates for "n" (ン) and "shi" (シ). In that case, even if correct syllable candidates are obtained for all the other syllables, the syllable candidate string generated from the extracted candidates is "o-n-se-i-ni-n-ni-shi-ki-wa" (オンセイニンニシキワ). This string contains no part that matches "ni-n-shi-ki" (ニンシキ), the syllable string of the correct word candidate "ninshiki" (認識, "recognition"), so the word cannot be detected. Moreover, a syllable insertion error of this kind is relatively rare, and a rule for correcting it is seldom available.
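The failure described above can be reproduced with a plain contiguous match. The following is a minimal Python sketch and not part of the patent: the function name `find_contiguous` and the romanized syllable labels (for example "ni" for ニ, "n" for ン, "shi" for シ) are assumptions introduced only for illustration.

```python
from typing import Sequence


def find_contiguous(haystack: Sequence[str], needle: Sequence[str]) -> int:
    """Return the index where needle occurs as a contiguous run, or -1."""
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if list(haystack[i:i + m]) == list(needle):
            return i
    return -1


# Romanized syllables (assumed labels): "n" is ン, "ni" is ニ, "shi" is シ.
spoken = ["o", "n", "se", "i", "ni", "n", "shi", "ki", "wa"]
# Same utterance with "ni" erroneously inserted between "n" and "shi".
recognized = ["o", "n", "se", "i", "ni", "n", "ni", "shi", "ki", "wa"]
word = ["ni", "n", "shi", "ki"]

print(find_contiguous(spoken, word))      # 4
print(find_contiguous(recognized, word))  # -1
```

With a single inserted syllable the exact match fails, which is the situation the invention is meant to handle without a dedicated correction rule.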

An object of the present invention is to provide a word detection method that generates no useless category candidate strings and that efficiently detects the correct word and its position in the input speech, even when the segment of the word to be detected is only a small part of the whole input speech and even when some category candidates are erroneously inserted during detection of the category candidates in the input speech.

(Means for Solving the Problems) To solve the problems described above and achieve the stated object, the present invention provides a word detection method that detects a word in input speech and the position at which it appears by generating a category candidate string corresponding to the category string of the word, using a plurality of category candidates extracted from the input speech, which is a sequence of categories such as syllables, phonemes, or phoneme classes, together with their position information. The method is characterized in that: the category candidates obtained from the input speech and their position information are each classified by category name and stored; following the order of the categories in the word, the category candidate corresponding to each category, and its position information, are selected from among the category candidates stored under the same name as that category; and, when two adjacent categories in the word correspond respectively to the category candidates at the two ends of a run of three consecutive category candidates in the input speech, the category candidate string is generated by associating the sequence of those two categories with the sequence of those three category candidates.

(Operation) In the method of the present invention, only those category candidates extracted from the input speech that have the same name as a category contained in the word to be detected are used, and the corresponding category candidate string is generated while following the order of the categories in the word. Consequently, only category candidate strings that correspond to the category string of the word, or to a substring of it, are generated, and the generation of useless category strings is avoided.

Because the category candidate strings are built up from those category candidates in the input speech that correspond to categories in the word, the word can be detected quickly even when its segment is only a small part of the whole input speech, and wherever in the input speech that segment lies.

A word can also be detected when a category insertion error occurs, on the following principle. Suppose that category candidates are extracted from input speech containing a word $W$ whose category string is $C_1 \dots C_{i-1} C_i \dots C_n$, and that a category candidate $K_X$ is erroneously inserted between the category candidates $K_{i-1}$ and $K_i$ of the categories $C_{i-1}$ and $C_i$. That is, the category candidate string of the portion of the input speech corresponding to the word $W$ is $K_1 \dots K_{i-1} K_X K_i \dots K_n$. When the category candidate string corresponding to the word is generated by following the order of the categories in the word, if the category candidate corresponding to $C_{i-1}$ (namely $K_{i-1}$) and the category candidate corresponding to $C_i$ (namely $K_i$) are the two ends of a run of three consecutive category candidates in the input speech, that run of three category candidates is associated with the category string $C_{i-1} C_i$ in the word. In this way, the category candidate string and the category string are correctly associated even though the category candidate $K_X$ has been inserted. In addition, because only category candidate strings corresponding to the category string of the word are generated, useless category candidate strings are avoided.
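The principle can be illustrated on a deliberately simplified case: a single linear sequence of candidate names with one candidate per position, rather than the timed, overlapping candidates used in the embodiment. The sketch below is an illustration under that assumption, and the function name `align_with_insertions` is hypothetical. Adjacent categories of the word may map either to adjacent candidates or to the two ends of a run of three candidates.

```python
from typing import List, Optional, Sequence


def align_with_insertions(categories: Sequence[str],
                          candidates: Sequence[str]) -> Optional[List[int]]:
    """Align word categories to a linear candidate sequence.

    Consecutive categories map to consecutive candidates, or to the two
    ends of a run of three candidates (one inserted candidate skipped).
    Returns the matched candidate indices, or None if no alignment exists.
    """
    def extend(path: List[int]) -> Optional[List[int]]:
        if len(path) == len(categories):
            return path
        wanted = categories[len(path)]
        for step in (1, 2):  # 1: directly adjacent, 2: skip one inserted candidate
            j = path[-1] + step
            if j < len(candidates) and candidates[j] == wanted:
                result = extend(path + [j])
                if result is not None:
                    return result
        return None

    if not categories:
        return []
    for start, name in enumerate(candidates):
        if name == categories[0]:
            result = extend([start])
            if result is not None:
                return result
    return None


# K_X = "ni" is inserted between the candidates for "n" and "shi".
candidates = ["o", "n", "se", "i", "ni", "n", "ni", "shi", "ki", "wa"]
print(align_with_insertions(["ni", "n", "shi", "ki"], candidates))  # [4, 5, 7, 8]
```

Because at most one candidate is skipped between adjacent categories, two consecutive inserted candidates are not bridged in this sketch.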

(Embodiment) The present invention is described below in more detail by way of an embodiment, with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

In this embodiment, Japanese speech is input and syllables are used as the categories. The syllable extraction unit 101 detects syllable candidates in the input speech and stores them in the syllable candidate storage unit 102.

FIG. 2 is a block diagram showing an example of the syllable extraction unit 101. In FIG. 2, the input speech is first stored in the speech buffer 201. The vowel candidate detection unit 202 then detects the vowel candidates in the speech held in the speech buffer 201 and stores them in the vowel candidate storage unit 203. Vowel candidates are detected by matching the standard speech pattern of each vowel, stored in advance in the vowel pattern storage unit 204, against each segment of the input speech. Because the speech signal of a vowel is relatively stationary, detection is easy. Each vowel candidate holds at least the vowel name and its position in the input speech. After the detection of vowel candidates is complete, the consonant candidate detection unit 205 detects consonant candidates as follows. In Japanese, a syllable is a consonant (C)-vowel (V) pair. In the input speech, therefore, one consonant can be said to exist in each segment lying between two vowels and not exceeding a certain duration (called a VCV segment) and in each segment running from the start of the input speech to a vowel located within a certain duration of that start (called a CV segment). For every VCV segment and CV segment formed from the vowel candidates stored in the vowel candidate storage unit 203, the consonant candidate detection unit 205 performs matching against the VCV and CV standard speech patterns stored in advance in the consonant pattern storage unit 206, and takes the names of the several speech patterns with the highest similarity as consonant candidates. The vowel candidates and consonant candidates determined in this way are combined into syllable candidates and stored in the syllable candidate storage unit 102 together with their positions in the input speech.

As an example, suppose that the speech "o-n-se-i-ni-n-shi-ki-wa" (オンセイニンシキワ, "speech recognition is ...") is input. Syllable recognition then yields syllable candidates such as those shown in FIG. 3. In FIG. 3, each arrowed line marks the segment of a syllable candidate, and several syllable candidates are extracted for each segment. These syllable candidates are classified by syllable name and stored in the syllable candidate storage unit 102, whose contents are then as shown in FIG. 4. In FIG. 4, each syllable candidate is written in the form "syllable name/start time:end time".
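The classification by syllable name can be pictured as a dictionary keyed by name. The sketch below is an illustration, not the patent's implementation: `Candidate`, `parse_candidate`, and `group_by_name` are hypothetical names, romanized labels stand in for the kana, and ASCII "/" and ":" replace the full-width separators. The candidate list contains only the candidates that the description mentions explicitly, so it is not the full content of FIG. 4.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    name: str   # syllable name, e.g. "ni" for ニ
    start: int  # start time in the input speech
    end: int    # end time in the input speech


def parse_candidate(text: str) -> Candidate:
    """Parse the 'name/start:end' notation into a Candidate."""
    name, span = text.split("/")
    start, end = span.split(":")
    return Candidate(name, int(start), int(end))


def group_by_name(cands: Iterable[Candidate]) -> Dict[str, List[Candidate]]:
    """Classify candidates by syllable name, keeping their positions."""
    store: Dict[str, List[Candidate]] = defaultdict(list)
    for c in cands:
        store[c.name].append(c)
    return dict(store)


# Only candidates named in the description (assumed subset of FIG. 4).
raw = ["ni/0:2", "ni/2:4", "n/2:4", "shi/4:7", "ni/10:12",
       "n/12:14", "ni/14:16", "i/14:16", "shi/16:18", "ki/18:19"]
store = group_by_name(parse_candidate(t) for t in raw)
print(store["shi"])
```

Looking up a syllable of the word then returns every place in the input speech where that syllable was proposed, without scanning the speech from its start.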

The word storage unit 103 stores the syllable strings of the words to be detected. One of these words is taken out into the word buffer 104, and the input speech is then examined to determine whether it contains that word. Assume here that the word buffer 104 holds "ni-n-shi-ki" (ニンシキ), the syllable string of the word "ninshiki" (認識, "recognition").

The syllable candidate string generation unit 105 builds syllable candidate strings from the syllable candidates in the syllable candidate storage unit 102, following the order of the syllables in the word held in the word buffer 104, and stores each resulting syllable candidate string, together with its corresponding syllable string, in the syllable candidate string storage unit 106. In this embodiment, the strings are built in order starting from the first syllable of the word.

First, because the first syllable in the word buffer 104 is "ni" (ニ), the syllable candidate string generation unit 105 takes out the syllable candidates stored under "ni" in the syllable candidate storage unit 102 and stores each of them, as a syllable candidate string of length 1, in the syllable candidate string storage unit 106 together with the syllable "ni". As a result, the syllable candidate string storage unit 106 holds the following four syllable candidate strings, where the parentheses give the corresponding syllable string:

ni/0:2 (ni)
ni/2:4 (ni)
ni/10:12 (ni)
ni/14:16 (ni)

Next, the syllable candidate string generation unit 105 turns to the next syllable in the word buffer 104, "n" (ン). For each syllable candidate stored under "n" in the syllable candidate storage unit 102, it checks whether that candidate follows, in the input speech, the last syllable candidate of any syllable candidate string in the syllable candidate string storage unit 106, either immediately or with exactly one other syllable candidate in between. If it does, the candidate is appended to the end of that syllable candidate string to form a new syllable candidate string, which is stored in the syllable candidate string storage unit 106. Whether a syllable candidate A follows another syllable candidate B can be determined by comparing the start time of A with the end time of B; here, A is judged to follow B when the two times differ by no more than 1. In the present case, the syllable candidates stored under "n" include n/2:4 and n/12:14. Because n/2:4 immediately follows ni/0:2 (ニ/0:2), the last syllable candidate of one of the stored strings, it is appended to that string to give the syllable string "ni-n". Likewise, n/12:14 is appended to the string ending in ni/10:12 to give the syllable string "ni-n". The syllable candidate strings previously held in the syllable candidate string storage unit 106 are deleted. As a result, the following two syllable candidate strings remain in the syllable candidate string storage unit 106:

ni/0:2 - n/2:4 (ni-n)
ni/10:12 - n/12:14 (ni-n)
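The succession test can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the names `Candidate`, `follows_directly`, and `follows` are hypothetical, romanized labels stand in for the kana, and "via one other candidate" is interpreted as the existence of a single candidate that directly follows the earlier one and is directly followed by the later one. The tolerance of 1 is the value given in the description.

```python
from typing import Iterable, NamedTuple


class Candidate(NamedTuple):
    name: str
    start: int
    end: int


TOLERANCE = 1  # boundary times may differ by at most 1, as in the description


def follows_directly(later: Candidate, earlier: Candidate,
                     tol: int = TOLERANCE) -> bool:
    """True if `later` starts where `earlier` ends, within the tolerance."""
    return abs(later.start - earlier.end) <= tol


def follows(later: Candidate, earlier: Candidate,
            pool: Iterable[Candidate], tol: int = TOLERANCE) -> bool:
    """True if `later` follows `earlier` directly or via exactly one candidate."""
    if follows_directly(later, earlier, tol):
        return True
    return any(
        follows_directly(mid, earlier, tol) and follows_directly(later, mid, tol)
        for mid in pool
        if mid not in (later, earlier)
    )


# Candidates named in the description (assumed subset of FIG. 4).
pool = [Candidate("ni", 0, 2), Candidate("ni", 2, 4), Candidate("n", 2, 4),
        Candidate("shi", 4, 7), Candidate("ni", 10, 12), Candidate("n", 12, 14),
        Candidate("ni", 14, 16), Candidate("i", 14, 16),
        Candidate("shi", 16, 18), Candidate("ki", 18, 19)]

print(follows(Candidate("n", 2, 4), Candidate("ni", 0, 2), pool))      # True
print(follows(Candidate("shi", 16, 18), Candidate("n", 12, 14), pool))  # True
```

The second call succeeds only through the intermediate candidate spanning 14 to 16, which corresponds to tolerating one inserted syllable.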

Processing then moves on to the syllable "shi" (シ). Two syllable candidates are stored under "shi" in the syllable candidate storage unit 102: shi/4:7 and shi/16:18. Each is checked to see whether it follows the last syllable candidate of a stored syllable candidate string in the input speech, either directly or via one other syllable candidate. As a result, shi/4:7 is appended to the string ending in n/2:4, which it follows directly. The candidate shi/16:18 follows n/12:14, the last syllable candidate of the other string, via the syllable candidate ni/14:16 or i/14:16 (イ/14:16), so shi/16:18 is appended to that string. The contents of the syllable candidate string storage unit 106 therefore become:

ni/0:2 - n/2:4 - shi/4:7 (ni-n-shi)
ni/10:12 - n/12:14 - shi/16:18 (ni-n-shi)

Processing now moves on to "ki" (キ), the last syllable in the word buffer 104. One syllable candidate is stored under "ki" in the syllable candidate storage unit 102: ki/18:19. It is checked to see whether it follows the last syllable candidate of a stored syllable candidate string in the input speech, either directly or via one other syllable candidate. In this case it immediately follows shi/16:18, the last syllable candidate of the string that begins at time 10. The candidate ki/18:19 is therefore appended to that string, and the following syllable candidate string is stored in the syllable candidate string storage unit 106.

ni/10:12 - n/12:14 - shi/16:18 - ki/18:19 (ni-n-shi-ki)

Because the last syllable in the word buffer 104 has now been reached, the syllable candidate string generation unit 105 outputs the result that the word "ninshiki" (認識, "recognition") is present in the input speech in the segment from time 10 to time 19.
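The whole walk-through can be condensed into a short sketch. It is an illustration under stated assumptions, not the patent's implementation: `detect_word` and the helper names are hypothetical, romanized labels stand in for the kana, the candidate list holds only the candidates named in the description (not all of FIG. 4), and succession "via one other candidate" is interpreted as bridging through a single candidate that abuts both neighbours within the tolerance.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    name: str
    start: int
    end: int


def _adjacent(later: Candidate, earlier: Candidate, tol: int) -> bool:
    return abs(later.start - earlier.end) <= tol


def _follows(later: Candidate, earlier: Candidate,
             pool: Sequence[Candidate], tol: int) -> bool:
    """Directly adjacent, or bridged by exactly one other candidate."""
    if _adjacent(later, earlier, tol):
        return True
    return any(_adjacent(mid, earlier, tol) and _adjacent(later, mid, tol)
               for mid in pool if mid not in (later, earlier))


def detect_word(syllables: Sequence[str], candidates: Sequence[Candidate],
                tol: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) spans where the word's syllable string is found."""
    if not syllables:
        return []
    by_name: Dict[str, List[Candidate]] = defaultdict(list)
    for c in candidates:
        by_name[c.name].append(c)
    # Seed with length-1 strings for the first syllable of the word.
    strings = [[c] for c in by_name.get(syllables[0], [])]
    for name in syllables[1:]:
        # Keep only strings that can be extended; older strings are discarded.
        strings = [s + [c]
                   for c in by_name.get(name, [])
                   for s in strings
                   if _follows(c, s[-1], candidates, tol)]
    return [(s[0].start, s[-1].end) for s in strings]


CANDIDATES = [Candidate("ni", 0, 2), Candidate("ni", 2, 4), Candidate("n", 2, 4),
              Candidate("shi", 4, 7), Candidate("ni", 10, 12),
              Candidate("n", 12, 14), Candidate("ni", 14, 16),
              Candidate("i", 14, 16), Candidate("shi", 16, 18),
              Candidate("ki", 18, 19)]

print(detect_word(["ni", "n", "shi", "ki"], CANDIDATES))  # [(10, 19)]
```

Only candidates whose names occur in the word are ever touched, and the number of live strings in this example goes from four to two to two to one, matching the steps described above.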

This concludes the description of an embodiment of the present invention. Note that a single word may contain more than one category insertion error, provided the inserted candidates are not consecutive.

(Effects of the Invention) As described above, the present invention provides a word detection method that can detect the presence of a word and its position in the input speech even when some extra syllable candidates are erroneously inserted at the stage of extracting syllable candidates from the input speech, and that detects words efficiently because very few syllable candidate strings are generated during the detection process.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a block diagram showing a specific example of the syllable extraction unit in the embodiment of FIG. 1; FIG. 3 is a diagram showing an example of the input speech and the extracted syllable candidates in the embodiment of FIG. 1; and FIG. 4 is a diagram showing an example of the contents of the syllable candidate storage unit in the embodiment of FIG. 1.

101: syllable extraction unit; 102: syllable candidate storage unit; 103: word storage unit; 104: word buffer; 105: syllable candidate string generation unit; 106: syllable candidate string storage unit; 201: speech buffer; 202: vowel candidate detection unit; 203: vowel candidate storage unit; 204: vowel pattern storage unit; 205: consonant candidate detection unit; 206: consonant pattern storage unit.

Claims

[Claims]

1. A word detection method for detecting a word in input speech and the position at which the word appears, by generating a category candidate string corresponding to the category string of the word using a plurality of category candidates extracted from the input speech, which is a sequence of categories such as syllables, phonemes, or phoneme classes, together with position information of the category candidates, the method being characterized in that: the plurality of category candidates obtained from the input speech and their position information are each classified by category name and stored; following the order of the categories in the word, the category candidate corresponding to each category, and its position information, are selected from among the category candidates stored under the same name as that category; and, when two adjacent categories in the word correspond respectively to the category candidates at the two ends of a run of three consecutive category candidates in the input speech, the category candidate string is generated by associating the sequence of the two categories with the sequence of the three category candidates.