JPH0656558B2

JPH0656558B2 - Word detection method

Info

Publication number: JPH0656558B2
Application number: JP61190260A
Authority: JP
Inventors: 香一郎畑▲崎▼
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-08-12
Filing date: 1986-08-12
Publication date: 1994-07-27
Anticipated expiration: 2009-07-27
Also published as: JPS6344697A

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a word detection method used in speech recognition devices, speech input devices, and the like, which detects a word contained in input speech and the position of that word within the speech.

(Prior Art) One method used in speech recognition devices, speech input devices, and the like to detect a word in input speech and its position works as follows. The input speech is treated as a sequence of categories such as syllables, phonemes, or phoneme classes. Each category and its position in the input speech are extracted, and if a category string built from the extracted categories corresponds to the category string of some word, that word and the position of the category string in the input speech are output as the detection result.

In general, these categories are short in duration and some of them resemble one another, so it is difficult to extract the categories in input speech entirely without error. Conventionally, therefore, several category candidates are extracted for each category interval of the input speech. Starting from one end of the input speech, partial category candidate strings are generated from these candidates and matched against the category string of a word, and this process is repeated until a category candidate string corresponding to the word is found. The details of this method are described in, for example, Reference 1, "Japanese Patent Application No. 58-214544, Pattern Recognition Device", and are omitted here.

However, particularly when the input is continuous speech, deformation readily occurs because of slurred pronunciation and coarticulation between adjacent categories (for example, syllables). At the category candidate extraction stage, the correct category candidate may therefore be missing even from a set of several candidates. Hereinafter, the case in which the correct candidate for a category cannot be detected and is replaced by an incorrect candidate is called a category substitution error.

Conventionally, category substitution errors have been handled as follows. The category sequences in which substitution errors tend to occur are surveyed in advance, and category string correction rules are prepared for the relatively frequent substitution errors. When applied to a category candidate string containing a substitution error, a correction rule converts it into a category candidate string in which the incorrect candidate has been replaced by the correct one. Applying these rules to category candidate strings during word detection corrects the relatively frequent substitution errors. This method is described, for example, in Reference 2, Shoichi Matsunaga (松永昭一) and Masaki Kohda (好田正紀), "Effect of the Branch & Bound method and candidate selection using bottom-up phoneme recognition", Acoustical Society of Japan, Speech Research Group Material S85-79, January 1986, page 616, right column, lines 15 to 18. Examples of correction rules are shown in Table 4 on page 617 of the same reference.

(Problems to Be Solved by the Invention) In the conventional method described above, category candidate strings are first generated from the category candidates extracted from the input speech and only then matched against the category string of a word. As a result, many category candidate strings that ultimately prove useless are generated, which requires a large amount of computation.

Moreover, even when the word to be detected occupies only part of the input speech, the conventional method must match every category candidate, starting from the end of the input speech and including intervals where the word does not exist, equally against the categories in the word. This wastes computation time and delays detection of the word.

Furthermore, the category string correction rules are applied not only to category candidate strings that contain a substitution error but equally to those that do not. In addition, several correction rules are often applied individually to a single category candidate string. Many category candidate strings are therefore generated from one, increasing the number of strings that must be matched against the word's category string. Most of them are then rejected because they do not match the category string of the word to be detected.

Also, the substitution errors that correction rules can fix are limited to those that occur relatively often; rare errors cannot be corrected. Covering more kinds of errors requires more correction rules, which in turn further increases the number of category candidate strings generated.

For example, suppose speech pronounced "オンセイニンシキワ" (onsei ninshiki wa, "speech recognition is") is input and its syllable candidates are extracted, and that for the syllable "シ" (shi) only the two candidates "チ" (chi) and "イ" (i) are obtained. Even if correct candidates are obtained for all other syllables, the syllable candidate strings generated from the extracted candidates are "オンセイニンチキワ" or "オンセイニンイキワ". Neither contains a part that matches "ニンシキ" (ninshiki), the syllable string of the correct word candidate 「認識」 (recognition), so the word cannot be detected. Moreover, this kind of syllable substitution error is relatively rare, and a rule for correcting it is seldom available.
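The failure described above can be reproduced with a small sketch. It assumes, purely for illustration, that every syllable other than "シ" yields exactly one (correct) candidate; the names `slots` and `candidate_strings` are hypothetical and do not come from the patent.

```python
from itertools import product

# One list of candidate names per syllable slot of "オンセイニンシキワ".
# Assumption: every slot except the "シ" slot has a single correct candidate;
# the "シ" slot only yielded "チ" and "イ", as in the text.
slots = [["オ"], ["ン"], ["セ"], ["イ"], ["ニ"], ["ン"], ["チ", "イ"], ["キ"], ["ワ"]]

# Conventional approach: enumerate every candidate string first,
# then look for the word's syllable string inside each one.
candidate_strings = ["".join(names) for names in product(*slots)]

word = "ニンシキ"
found = any(word in s for s in candidate_strings)

print(candidate_strings)
print(found)
```

Neither generated string contains "ニンシキ", so exact matching misses the word even though eight of the nine syllables were recognized correctly.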

The object of the present invention is to provide a word detection method that does not generate useless category candidate strings and that efficiently detects the correct word and its position from input speech, even when the word to be detected occupies only a small part of the input speech and even when some category candidates were replaced by incorrect candidates during category candidate detection.

(Means for Solving the Problems) To solve the problems described above and achieve this object, the present invention provides a word detection method that detects a word in input speech and its position of occurrence by generating a category candidate string corresponding to the category string of the word, using a plurality of category candidates extracted from the input speech, which is a sequence of categories such as syllables, phonemes, or phoneme classes, together with their position information. The method is characterized in that each of the category candidates obtained from the input speech, with its position information, is classified by category name and stored; that, following the order of the categories in the word, the category candidate and position information corresponding to each category are selected from among the category candidates stored under the same name as that category; and that, when the categories at both ends of three adjacent categories in the word respectively correspond to the category candidates at both ends of a run of three consecutive category candidates in the input speech, the category candidate string is generated by associating the run of three categories with the run of three category candidates.

(Operation) In the method of the present invention, only those category candidates extracted from the input speech that have the same name as a category contained in the word to be detected are used, and the corresponding category candidate string is generated by following the order of the categories in the word. Consequently, only category candidate strings corresponding to the word's category string or to a substring of it are generated, and useless category candidate strings are avoided.

Also, because category candidate strings are built up from the category candidates in the input speech that correspond to categories in the word, the word can be detected quickly even when it occupies only a small part of the input speech, and wherever in the input speech it lies.

Even when a category substitution error occurs, the word can still be detected, on the following principle. Suppose category candidates are extracted from input speech containing a word W whose category string is C_1 ... C_{i-1} C_i C_{i+1} ... C_n, and that no correct category candidate is extracted for category C_i. The category candidate string of the part of the input speech corresponding to W is then K_1 ... K_{i-1} K_X K_{i+1} ... K_n, where K_{i-1} and K_{i+1} are the correct category candidates for C_{i-1} and C_{i+1} respectively, and K_X is an incorrect category candidate for C_i. While following the order of the categories in the word to be detected and generating the corresponding category candidate string, if the category candidates corresponding to the two end categories C_{i-1} and C_{i+1} of three consecutive categories C_{i-1} C_i C_{i+1} in the word are the two end candidates of a run of three consecutive category candidates in the input speech, that run of three candidates is associated with the category string C_{i-1} C_i C_{i+1}. In this way the category candidate string and the category string are correctly associated even though the candidate for C_i has been replaced by the incorrect candidate K_X. And because only category candidate strings corresponding to the word's category string are generated, useless category candidate strings are avoided.
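As a rough illustration of this principle, the sketch below aligns a word's category string with an equally long run of consecutive candidate names and tolerates a mismatch at a position only when both neighbouring positions match. The function name `aligns` is hypothetical, and requiring the first and last positions to match is an assumption: the text only discusses an incorrect candidate flanked by two correct ones.

```python
def aligns(categories, candidate_names):
    """Return True if candidate_names can be associated with categories,
    allowing isolated substitutions that are flanked by correct candidates."""
    if not categories or len(categories) != len(candidate_names):
        return False
    mismatch = [c != k for c, k in zip(categories, candidate_names)]
    # Assumption: an error at either end has no correct neighbour on one
    # side, so it cannot be bridged by the three-category rule.
    if mismatch[0] or mismatch[-1]:
        return False
    # Two adjacent mismatches cannot be bridged either.
    return not any(a and b for a, b in zip(mismatch, mismatch[1:]))


word = ["ニ", "ン", "シ", "キ"]
print(aligns(word, ["ニ", "ン", "チ", "キ"]))  # "シ" replaced by "チ"
print(aligns(word, ["ニ", "イ", "チ", "キ"]))  # two consecutive errors
```

The first call accepts the alignment because K_X = "チ" sits between the correct candidates for "ン" and "キ"; the second is rejected because neither erroneous position has two correct neighbours.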

(Embodiment) The present invention is described below in more detail by way of an embodiment, with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

In this embodiment, the input is Japanese speech and syllables are used as the categories. The syllable extraction unit 101 detects syllable candidates in the input speech and stores them in the syllable candidate storage unit 102.

FIG. 2 is a block diagram of an example of the syllable extraction unit 101. In FIG. 2, the input speech is first stored in the speech buffer 201. The vowel candidate detection unit 202 then detects vowel candidates in the speech held in the speech buffer 201 and stores them in the vowel candidate storage unit 203. Vowel candidates are detected by matching the standard speech pattern of each vowel, stored in advance in the vowel pattern storage unit 204, against each interval of the input speech. Because the speech signal of a vowel is relatively stationary, detection is easy. Each vowel candidate holds at least the vowel name and its position in the input speech.

After vowel candidate detection finishes, the consonant candidate detection unit 205 detects consonant candidates as follows. In Japanese, a syllable is a consonant (C)-vowel (V) pair. In the input speech, therefore, one consonant can be said to exist in each interval between two vowels that is no longer than a certain duration (a VCV interval) and in each interval from the start of the input speech to a vowel lying within a certain duration (a CV interval). For every VCV interval and CV interval that can be formed from the vowel candidates stored in the vowel candidate storage unit 203, the consonant candidate detection unit 205 matches the interval against the VCV and CV standard speech patterns stored in advance in the consonant pattern storage unit 206, and takes the names of several highly similar patterns as consonant candidates. The vowel candidates and consonant candidates determined in this way are combined into syllable candidates and stored, together with their positions in the input speech, in the syllable candidate storage unit 102.
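The enumeration of VCV and CV intervals from vowel candidates might be sketched as follows. Several details are assumptions made for illustration, since the text does not fix them: a VCV interval is taken to run from the start of the first vowel to the end of the second, the input is taken to start at time 0, and the thresholds `max_vcv` and `max_cv`, the vowel names, and all times are hypothetical.

```python
from typing import NamedTuple


class Vowel(NamedTuple):
    name: str
    start: int
    end: int


def consonant_search_intervals(vowels, max_vcv, max_cv):
    """Return (kind, start, end) spans in which one consonant is sought."""
    ordered = sorted(vowels, key=lambda v: v.start)
    spans = []
    # CV intervals: from the start of the input (assumed to be time 0)
    # to a vowel that ends within max_cv.
    for v in ordered:
        if v.end <= max_cv:
            spans.append(("CV", 0, v.end))
    # VCV intervals: two non-overlapping vowels whose combined span
    # is no longer than max_vcv.
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if first.end <= second.start and second.end - first.start <= max_vcv:
                spans.append(("VCV", first.start, second.end))
    return spans


vowels = [Vowel("o", 0, 1), Vowel("e", 4, 6), Vowel("i", 6, 8)]
for span in consonant_search_intervals(vowels, max_vcv=8, max_cv=2):
    print(span)
```

Each returned span would then be matched against the stored VCV or CV standard patterns; that matching step is outside the scope of this sketch.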

As an example, suppose the speech "オンセイニンシキワ" (onsei ninshiki wa, "speech recognition is") is input. Syllable recognition then yields syllable candidates such as those shown in FIG. 3. In FIG. 3, each arrowed line marks the interval of a syllable candidate, and several syllable candidates are extracted for each interval. These syllable candidates are classified by syllable name and stored in the syllable candidate storage unit 102, whose contents then become as shown in FIG. 4. In FIG. 4, each syllable candidate is written in the form "syllable name/start time:end time".
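Storage classified by syllable name can be pictured as a dictionary keyed by name. The sketch below is one possible representation, not the patent's implementation: `Candidate`, `parse`, and `build_store` are hypothetical names, ASCII "/" and ":" stand in for the separators of the notation above, and the entries are only the candidates mentioned in the text rather than the full contents of FIG. 4.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    start: int
    end: int


def parse(text):
    """Parse 'name/start:end' into a Candidate."""
    name, span = text.split("/")
    start, end = span.split(":")
    return Candidate(name, int(start), int(end))


def build_store(entries):
    """Group candidates by syllable name, keeping their input order."""
    store = defaultdict(list)
    for entry in entries:
        cand = parse(entry)
        store[cand.name].append(cand)
    return dict(store)


entries = ["ニ/0:2", "ニ/2:4", "ニ/10:12", "ニ/14:16",
           "ン/2:4", "ン/12:14", "チ/14:17", "キ/17:20"]
store = build_store(entries)
print(store["ニ"])
print(store.get("シ", []))  # no candidate was extracted under "シ"
```

Looking up a name returns only the candidates relevant to one syllable of the word, which is what lets the later steps ignore every other candidate in the input.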

The word storage unit 103 stores the syllable strings of the words to be detected. One of these words is taken out into the word buffer 104, and the input speech is then examined to see whether it contains that word. Assume here that the word buffer 104 holds "ニンシキ" (ninshiki), the syllable string of the word 「認識」 (recognition).

The syllable candidate string generation unit 105 builds syllable candidate strings from the syllable candidates in the syllable candidate storage unit 102, following the order of the syllables of the word held in the word buffer 104, and stores each resulting syllable candidate string together with its corresponding syllable string in the syllable candidate string storage unit 106. In this embodiment, the strings are built in order starting from the first syllable of the word.

First, since the first syllable in the word buffer 104 is "ニ" (ni), the syllable candidate string generation unit 105 takes out the syllable candidates stored under "ニ" in the syllable candidate storage unit 102 and stores each of them, as a syllable candidate string of length 1, in the syllable candidate string storage unit 106 together with the syllable "ニ". As a result, the syllable candidate string storage unit 106 holds four syllable candidate strings: ニ/0:2 (ニ), ニ/2:4 (ニ), ニ/10:12 (ニ), and ニ/14:16 (ニ). The parentheses give the corresponding syllable string.

Next, the syllable candidate string generation unit 105 turns to the next syllable in the word buffer 104, "ン" (n), and the one after it, "シ" (shi). For each syllable candidate stored under "ン" in the syllable candidate storage unit 102, it checks whether that candidate immediately follows, in the input speech, the last syllable candidate of any syllable candidate string in the syllable candidate string storage unit 106. For each syllable candidate stored under "シ", it checks whether that candidate follows the last syllable candidate of any stored syllable candidate string with exactly one other syllable candidate in between. Any candidate that follows in either way is appended to the end of the syllable candidate string concerned, and the resulting new syllable candidate string is stored in the syllable candidate string storage unit 106.

Whether syllable candidate A follows syllable candidate B is determined by comparing the start time of A with the end time of B. Here A is judged to follow B when the two times differ by no more than 1 in either direction.

In the present case, three syllable candidates are stored under "ン", among them ン/2:4 and ン/12:14. Because ン/2:4 immediately follows ニ/0:2, the last candidate of the first syllable candidate string, ン/2:4 is appended to that string, which now corresponds to the syllable string "ニン". Likewise, ン/12:14 is appended to the string ending in ニ/10:12, which also corresponds to "ニン". No syllable candidate is stored under "シ". The syllable candidate strings that were previously held in the syllable candidate string storage unit 106 are then deleted. As a result, two syllable candidate strings remain in the syllable candidate string storage unit 106: ニ/0:2-ン/2:4 (ニン) and ニ/10:12-ン/12:14 (ニン).
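The follow test and this extension step can be sketched as below. `Candidate` and `follows` are hypothetical names; the tolerance of 1 comes from the text, and only the two "ン" candidates that the text names are included.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    start: int
    end: int


def follows(later, earlier, tolerance=1):
    """True if `later` starts within `tolerance` of where `earlier` ends."""
    return abs(later.start - earlier.end) <= tolerance


ni = [Candidate("ニ", 0, 2), Candidate("ニ", 2, 4),
      Candidate("ニ", 10, 12), Candidate("ニ", 14, 16)]
n = [Candidate("ン", 2, 4), Candidate("ン", 12, 14)]

# Length-1 strings for the first syllable, then one direct extension step.
strings = [[c] for c in ni]
extended = [s + [c] for s in strings for c in n if follows(c, s[-1])]

for s in extended:
    print("-".join(f"{c.name}/{c.start}:{c.end}" for c in s))
```

Only the strings starting with ニ/0:2 and ニ/10:12 survive; the other two length-1 strings have no "ン" candidate that follows them and are dropped.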

Processing then moves on to the syllable "シ" (shi). No syllable candidate is stored under "シ" in the syllable candidate storage unit 102. One syllable candidate, キ/17:20, is stored under "キ" (ki), the syllable that follows "シ" in the word buffer 104. For this candidate, the unit checks whether it follows the last syllable candidate of either stored syllable candidate string with exactly one other syllable candidate in between. It finds that キ/17:20 follows ン/12:14, the last candidate of the second syllable candidate string, by way of the syllable candidate チ/14:17. キ/17:20 is therefore appended to that string, which is now associated with the syllable string "ニンシキ". The content of the syllable candidate string storage unit 106 thus becomes ニ/10:12-ン/12:14-キ/17:20 (ニンシキ).

Since the last syllable in the word buffer 104, "キ", has now been reached, the syllable candidate string generation unit 105 outputs the result that the word 「認識」 (recognition) exists in the interval of the input speech from time 10 to time 20.
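The whole walkthrough can be condensed into a short sketch. This is an illustrative reading of the embodiment, not the patented implementation: `Candidate`, `follows`, and `detect` are hypothetical names, and the candidate list contains only the candidates named in the text.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    start: int
    end: int


def follows(later, earlier, tolerance=1):
    """True if `later` starts within `tolerance` of where `earlier` ends."""
    return abs(later.start - earlier.end) <= tolerance


def detect(word, candidates, tolerance=1):
    """Return (start, end) spans where `word`, a list of syllable names, occurs."""
    if not word:
        return []
    store = {}
    for c in candidates:
        store.setdefault(c.name, []).append(c)

    # Each entry: (candidate string so far, number of word syllables covered).
    strings = [([c], 1) for c in store.get(word[0], [])]
    spans = []
    while strings:
        next_strings = []
        for chain, covered in strings:
            if covered == len(word):
                spans.append((chain[0].start, chain[-1].end))
                continue
            last = chain[-1]
            # Direct step: a candidate for the next syllable follows `last`.
            for c in store.get(word[covered], []):
                if follows(c, last, tolerance):
                    next_strings.append((chain + [c], covered + 1))
            # Bridged step: a candidate for the syllable after next follows
            # `last` with exactly one other candidate of any name in between.
            if covered + 1 < len(word):
                for c in store.get(word[covered + 1], []):
                    if any(follows(m, last, tolerance) and follows(c, m, tolerance)
                           for m in candidates):
                        next_strings.append((chain + [c], covered + 2))
        strings = next_strings  # earlier, shorter strings are discarded
    return spans


candidates = [
    Candidate("ニ", 0, 2), Candidate("ニ", 2, 4), Candidate("ニ", 10, 12),
    Candidate("ニ", 14, 16), Candidate("ン", 2, 4), Candidate("ン", 12, 14),
    Candidate("チ", 14, 17), Candidate("キ", 17, 20),
]
print(detect(["ニ", "ン", "シ", "キ"], candidates))
```

With these candidates the sketch reports the single span from time 10 to time 20, in line with the result described above. Because strings are grown only from candidates whose names occur in the word, the number of strings alive at any step stays small.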

This concludes the description of an embodiment of the present invention. Note that several category substitution errors may occur within a single word, provided they are not consecutive.

(Effects of the Invention) As described above, the present invention provides a word detection method that can detect the presence of a word and its position in the input speech even when some syllable candidates were replaced by incorrect ones at the stage of extracting syllable candidates from the input speech, and that detects words efficiently because very few syllable candidate strings are generated during the detection process.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a block diagram showing a specific example of the syllable extraction unit in the embodiment of FIG. 1; FIG. 3 shows an example of input speech and the syllable candidates extracted from it in the embodiment of FIG. 1; and FIG. 4 shows an example of the contents of the syllable candidate storage unit in the embodiment of FIG. 1.

101: syllable extraction unit; 102: syllable candidate storage unit; 103: word storage unit; 104: word buffer; 105: syllable candidate string generation unit; 106: syllable candidate string storage unit; 201: speech buffer; 202: vowel candidate detection unit; 203: vowel candidate storage unit; 204: vowel pattern storage unit; 205: consonant candidate detection unit; 206: consonant pattern storage unit.

Claims

[Claims]

1. A word detection method for detecting a word in input speech and its position of occurrence by generating a category candidate string corresponding to the category string of the word, using a plurality of category candidates extracted from the input speech, which is a sequence of categories such as syllables, phonemes, or phoneme classes, together with their position information, characterized in that: each of the plurality of category candidates obtained from the input speech, with its position information, is classified by category name and stored; following the order of the categories in the word, the category candidate and position information corresponding to each category are selected from among the category candidates stored under the same name as that category; and, when the categories at both ends of three adjacent categories in the word respectively correspond to the category candidates at both ends of a run of three consecutive category candidates in the input speech, the category candidate string is generated by associating the run of three categories with the run of three category candidates.