JPH0458634B2


Info

Publication number: JPH0458634B2
Application number: JP61078821A
Authority: JP
Inventors: Fumio Togawa; Hiroyuki Iwahashi
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1986-04-05
Filing date: 1986-04-05
Publication date: 1992-09-18
Also published as: JPS62235990A

Description

[Detailed description of the invention]

Industrial Field of Application
The present invention relates to a speech recognition method that is advantageously used in, for example, a Japanese speech input device that recognizes input speech syllable by syllable.

Background Art
In general, speech uttered one syllable at a time, with a pause after each syllable, can be recognized with high accuracy. In continuous speech, however, each syllable is influenced by the syllables before and after it (coarticulation), and the loudness and pitch of the voice vary greatly with the position of the syllable within a word, phrase, or sentence. As a result, recognition accuracy drops. This happens because, in continuous speech, the feature patterns of a syllable extracted from different context positions are deformed in various ways, even for a single speaker. Conventionally, this problem has been handled by holding many feature standard patterns at registration time, and further by updating the feature standard patterns during input: standard patterns created at registration that show poor recognition rates are replaced with feature patterns of syllables taken from various syllable environments.

Problems to Be Solved by the Invention
In the prior art described above, the registered feature standard patterns depend strongly on how often each syllable is input. Recognition accuracy therefore improved on average, but syllable recognition accuracy dropped when phrases or sentences that rarely occur were input.

An object of the present invention is to solve this technical problem by providing a syllable recognition method that can efficiently correct and update the feature standard patterns even of syllables with a low input frequency, and thereby improve syllable recognition accuracy.

Means for Solving the Problems
The present invention is a speech recognition method in which input speech is recognized syllable by syllable by computing its similarity to feature standard patterns of a plurality of kinds of syllables registered in advance, and the result is corrected by matching against a dictionary or by an external instruction such as keyboard operation to obtain the final input. The method is characterized in that each feature standard pattern of a syllable carries information representing the phonological environment on at least one side (before or after) of the position from which that feature pattern was extracted from speech. When registration is performed during input, either by adding the syllable feature pattern obtained by analyzing the speech uttered at input time as a standard feature pattern, or by correcting a standard feature pattern using that input feature pattern, the update by addition or correction is applied to the standard feature patterns whose environment information matches the phonological environment on at least one side of the position from which the input feature pattern was extracted.

Operation
According to the present invention, information representing the phoneme immediately before or after a syllable is attached to each feature standard pattern and stored together with it. When the feature pattern of an input syllable is added as a feature standard pattern, or is used to correct a feature standard pattern, the correction is applied to the feature standard patterns that have the same phonological environment information as the input syllable. Updating the syllable feature standard patterns according to phonological context in this way prevents the standard patterns from being filled with redundant, similar feature patterns extracted from frequently input syllable environments (for example, the phoneme "a" immediately preceding the syllable "i", as in the "ai" of "kai" (かい) and "sai" (さい)). The update therefore proceeds while the feature patterns extracted from infrequent syllable environments are retained.

Embodiment
The present invention uses the concept of a syllable environment, which is explained first. Tables 1, 2, and 3 show the results of analyzing "nainowa" (ないのわ) for the deformation of syllable feature patterns caused by the phonological environment before and after each syllable, and Table 4 shows the corresponding results for "nondemo" (のんでも). These tables compare the similarity of feature patterns of the same kind of syllable extracted from various phonological environments in a large number of phrase utterances. Table 1 gives the pattern distances for "i" in the input speech "nainowa", Table 2 those for "no" in "nainowa", Table 3 those for "wa" in "nainowa", and Table 4 those for "no" in the input speech "nondemo". In Tables 1 to 4, the pattern distances are given in hexadecimal notation.
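Tables 1 to 4 report pattern distances in hexadecimal. The sketch below shows what such a table computes, under two assumptions that do not come from the source: feature patterns are equal-length integer vectors, and the distance is the city-block (L1) distance. The function names are hypothetical and the vectors are made-up toy values, not data from the tables.

```python
from typing import Dict, Sequence


def pattern_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """City-block (L1) distance between two equal-length feature patterns.

    Assumption: the patent does not state its distance measure; L1 over
    integer feature values is used here purely for illustration.
    """
    if len(a) != len(b):
        raise ValueError("feature patterns must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_table(inp: Sequence[int], refs: Dict[str, Sequence[int]]) -> Dict[str, str]:
    """Distances from one input pattern to each reference, as upper-case hex."""
    return {name: format(pattern_distance(inp, ref), "X") for name, ref in refs.items()}


# Hypothetical toy vectors: "no" from "nainowa" compared with "no" taken
# from two other phrases.
no_in_nainowa = [10, 20, 30]
refs = {"nioino": [12, 18, 31], "garasuno": [25, 5, 40]}
print(distance_table(no_in_nainowa, refs))  # {'nioino': '5', 'garasuno': '28'}
```

A smaller value means a closer match, so in this toy data the "no" of "nioino" is the nearer context, in the same direction as the observation that follows.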

[Table]

[Table]
The "no" in "nainowa" (ないのわ) resembles the "no" in "nioino" (においの) and "kanjino" (かんじの) more closely than the "no" in "garasuno" (がらすの); that is, it resembles instances of "no" whose preceding vowel is "i". This shows that the feature pattern of a syllable is deformed under the strong influence of the phonemes around the position from which it was extracted, especially the immediately preceding phoneme. These deformations are correlated with the syllable environment, and the present invention makes use of this fact.

Japanese syllables are basically monosyllables consisting of a consonant (C) and vowel (V) pair. Using the symbols of Table 5, for example, "shakaino" (しゃかいの) decomposes into "sja-ka-i-no", that is, "C1C2V-C1V-V-C1V" or "CV-CV-V-C", and is therefore made up of four monosyllables. In addition, a semivowel may occur between the consonant and the vowel, and a vowel may be followed by a vowel sequence, a long vowel, a geminate consonant, and so on.
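The C/V decomposition of "sja-ka-i-no" can be reproduced mechanically. The following sketch assumes hyphen-separated romanized syllables that each end in exactly one vowel; the function names are hypothetical, and the moraic nasal, long vowels, and geminates mentioned above are deliberately not handled.

```python
VOWELS = set("aiueo")


def syllable_structure(syllable: str) -> str:
    """Return the C/V pattern of one romanized monosyllable, e.g. 'sja' -> 'C1C2V'.

    Every letter before the final vowel fills a consonant slot (C1, C2, ...),
    so a semivowel such as the 'j' in 'sja' becomes C2.
    """
    if not syllable or syllable[-1] not in VOWELS:
        raise ValueError(f"expected a syllable ending in a vowel: {syllable!r}")
    onset = syllable[:-1]
    if any(ch in VOWELS for ch in onset):
        raise ValueError(f"expected exactly one vowel: {syllable!r}")
    return "".join(f"C{i}" for i in range(1, len(onset) + 1)) + "V"


def word_structure(romanized: str) -> str:
    """Apply syllable_structure to each syllable of a hyphen-separated string."""
    return "-".join(syllable_structure(s) for s in romanized.split("-"))


print(word_structure("sja-ka-i-no"))  # C1C2V-C1V-V-C1V
```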

[Table]

[Table]
Accordingly, phonemes are encoded in hexadecimal as shown in Table 6. Using these codes, the Japanese syllable structure is expressed in 8 bits, and the syllable environment before and after each syllable is encoded.

Fig. 1 is a block diagram showing the configuration of a Japanese speech input device 1 according to an embodiment of the present invention, and Fig. 2 is a flowchart showing the procedure of speech recognition processing in the Japanese speech input device 1. The Japanese speech input device 1 recognizes continuously uttered speech syllable by syllable, corrects the recognition result using a dictionary, and then transfers the result to an external device in units such as words.

First, the process moves from step n1 to step n2, where a speech signal is input. The uttered speech enters the analog input section 3 through the microphone 2, is amplified by the amplifier 4 in the analog input section 3, and is converted into a digital signal by the analog/digital converter 5. The digital signal is fed to the speech analysis section 6 and the syllable segmentation section 7.

Next, the process moves from step n2 to step n3, where acoustic syllables are extracted. The speech analysis section 6 divides the input speech into frames of about 16 ms, performs spectral analysis, and transfers the feature parameters needed for syllable segmentation to the syllable segmentation section 7 at intervals of about 8 ms. The syllable segmentation section 7 temporarily stores the various feature parameters from the speech analysis section 6 in the ring-shaped feature pattern buffer 8 while cutting out syllables, converts the features of each syllable into a pattern, and, in step n4, stores the pattern in the feature pattern memory 9. The feature pattern buffer 8 is configured to hold a plurality of syllables. Starting and stopping of the processing in the syllable segmentation section 7 are controlled by commands from the central processing unit (hereinafter "CPU") 10.

Next, syllable recognition is performed in step n5, and recognition result candidates are selected in step n6. The syllable recognition section 11 calculates the pattern distance between the feature pattern of each syllable and every feature standard pattern stored in advance in the feature standard pattern memory 12, and outputs candidates in order of decreasing similarity. Candidates with the same syllable name are then merged according to the labels stored in the feature standard pattern memory 16, and the result is stored in the recognition result memory 13 as the syllable recognition result.

Next, errors in the syllable recognition result are corrected in step n7, and the confirmed syllable recognition result is derived in step n8. The correction processing section 11a in the syllable recognition section 11 automatically corrects errors in the syllable recognition result using the dictionary stored in the language processing dictionary memory 14. Alternatively, the operator may use the keyboard 15 to select the correct candidate from the recognition candidates for the input speech, or to correct an erroneous part directly. The correct result confirmed in this way is converted into kanji and output as a character string. For example, when the input speech is "kaiwa" (かいわ), the pattern matching shown in Table 7 is performed, identical syllables are merged as shown in Table 8, the correction shown in Table 9 is performed, and the confirmed character string "kaiwa" is output.
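Steps n5 and n6 (distance to every standard pattern, then merging candidates that share a syllable label) can be sketched as follows. The L1 distance, the `StandardPattern` record, the function names, and the memory contents are all assumptions for illustration; the toy data only mirrors the "kaiwa" example, in which "i" was misrecognized as "pi".

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StandardPattern:
    name: str              # e.g. "P11"
    label: str             # syllable name the pattern stands for
    features: List[int]


def l1(a: Sequence[int], b: Sequence[int]) -> int:
    # Assumed distance measure; the patent does not specify one.
    return sum(abs(x - y) for x, y in zip(a, b))


def recognize(inp: Sequence[int], patterns: List[StandardPattern],
              n_best: int = 3) -> List[Tuple[str, int]]:
    """Rank all standard patterns by distance (closest first), then merge
    candidates with the same label, keeping each label's best distance."""
    ranked = sorted((l1(inp, p.features), p.label) for p in patterns)
    merged: List[Tuple[str, int]] = []
    seen = set()
    for dist, label in ranked:
        if label not in seen:  # first occurrence is this label's best distance
            seen.add(label)
            merged.append((label, dist))
    return merged[:n_best]


# Hypothetical memory: two patterns labeled "i", one labeled "pi".
memory = [
    StandardPattern("P1", "pi", [0, 0]),
    StandardPattern("P2", "i", [5, 5]),
    StandardPattern("P3", "i", [1, 1]),
]
print(recognize([0, 0], memory))  # [('pi', 0), ('i', 2)]
```

Here the top candidate is wrong ("pi"), which is the situation the dictionary correction of step n7 and the learning of step n8 are meant to repair.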

[Table]

[Table]
Next, learning is performed in step n8, and the processing ends in step n9. Everything except the speech analysis section 6 is controlled by the CPU 10.

Fig. 3 is a flowchart showing the standard pattern learning process of step n8 in more detail. After the syllables are confirmed, the process moves from step n8 to steps m1 and m2, where the learning control section 11b in the syllable recognition section 11 uses information such as recent correct/incorrect tendencies to decide whether to learn the feature pattern of each input syllable (that is, whether to update a feature standard pattern using that feature pattern). For example, comparing the automatically corrected syllable string "kaiwa" with the syllable recognition result shows that the syllable "i" was misrecognized as "pi".

The following describes the case in which it is decided to learn the feature pattern of this input syllable "i". In step m3, the update pattern limiting section 11c in the syllable recognition section 11 limits the update targets to the feature standard patterns that have the same syllable label as the input syllable and the same environment information (stored in the feature standard pattern memory 18) as the phonological environment from which the input syllable was extracted. For example, the phoneme immediately preceding a syllable is classified into one of the eight classes shown in Table 10, and the class number (using the lower 3 bits of the 8-bit monosyllable code) is stored as the syllable environment information.
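Reading an environment class from the lower 3 bits of an 8-bit code is a simple bit mask, and 3 bits give exactly the eight classes of Table 10. This minimal sketch assumes nothing about the upper 5 bits, whose layout (Table 6) is not reproduced here; the function name is hypothetical and the example code value 0x29 is arbitrary.

```python
ENV_BITS = 3
ENV_MASK = (1 << ENV_BITS) - 1  # 0b111: the lower 3 bits select one of 8 classes


def environment_class(code: int) -> int:
    """Extract the syllable environment class from an 8-bit monosyllable code.

    Only the 'lower 3 bits' rule comes from the text; the meaning of the
    remaining bits is not modeled here.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError(f"not an 8-bit code: {code:#x}")
    return code & ENV_MASK


# 0x29 is 0b00101001: its lower 3 bits are 001, i.e. class 1.
print(environment_class(0x29))  # 1
```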

[Table]
"*": silence (word-initial position, or the geminate "tsu"); "N": the moraic nasal "n" (ん).

The vowel immediately preceding the input syllable "i" is "a", so its syllable environment information is 1 according to Table 10. The structure and contents of the M feature standard patterns are shown in Table 11.
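The decision of steps m1 and m2, together with finding the phoneme that precedes each syllable, might look like this for the "kaiwa" example. The learning rule here (learn only misrecognized syllables) is a hypothetical simplification of the "recent correct/incorrect tendencies" the text mentions, and taking the last letter of the previous romanized syllable as the preceding phoneme is likewise an assumption. The geminate "tsu", which Table 10 treats as silence, is not handled.

```python
from typing import List, Tuple


def learning_requests(confirmed: List[str],
                      recognized: List[str]) -> List[Tuple[int, str, str]]:
    """Return (position, correct label, preceding phoneme) for each syllable to learn.

    Hypothetical rule: learn a syllable only when it was misrecognized.
    The preceding phoneme is the last letter of the previous confirmed
    syllable, or "*" (silence) at the start of the word.
    """
    if len(confirmed) != len(recognized):
        raise ValueError("syllable sequences must be aligned one-to-one")
    requests = []
    for i, (correct, got) in enumerate(zip(confirmed, recognized)):
        if correct != got:
            preceding = confirmed[i - 1][-1] if i > 0 else "*"
            requests.append((i, correct, preceding))
    return requests


# "kaiwa" confirmed by the dictionary; the recognizer had output "ka-pi-wa".
print(learning_requests(["ka", "i", "wa"], ["ka", "pi", "wa"]))  # [(1, 'i', 'a')]
```

The preceding phoneme "a" is what Table 10 maps to environment class 1 in the text.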

[Table]

[Table]
From Table 11, the feature standard patterns to be updated are the three patterns P7, P11, and P13. In step m4, the updating section 11b in the syllable recognition section 11 takes the worst pattern among the selected feature standard patterns and either replaces it with the feature pattern of the input syllable according to Eq. (1), or averages it with that feature pattern according to Eq. (2), which completes the update of that feature standard pattern.

$$P'_{11} = P_{\mathrm{in}} \tag{1}$$

$$P'_{11} = \frac{P_{11} + P_{\mathrm{in}}}{2} \tag{2}$$

Here $P_{\mathrm{in}}$ denotes the input feature pattern, and $P'_{11}$ denotes the new feature standard pattern obtained when the feature standard pattern $P_{11}$ is updated. For example, the pattern P11, whose recognition counter (stored in the feature standard pattern memory 16 and indicating its contribution to recognition) has the smallest value, that is, the pattern that contributes least to recognition, is updated as shown in Table 12. At the same time, its recognition counter is reset.
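Steps m3 and m4 can be combined into one small function: restrict the candidates to the same label and environment class, pick the pattern with the smallest recognition counter, apply Eq. (1) or Eq. (2), and reset the counter. The record layout, the function name, and the memory contents are hypothetical stand-ins for Table 11 (only the target set P7, P11, P13 follows the text). Raising an error when no pattern matches is a choice of this sketch; the source also allows adding the input pattern as a new standard pattern.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StandardPattern:
    name: str              # e.g. "P11"
    label: str             # syllable name
    env: int               # syllable environment class (Table 10)
    features: List[float]
    counter: int           # recognition counter (contribution to recognition)


def learn(patterns: List[StandardPattern], label: str, env: int,
          p_in: Sequence[float], average: bool = True) -> StandardPattern:
    """Sketch of steps m3 and m4.

    m3: keep only patterns with the same label and the same environment class.
    m4: update the one with the smallest recognition counter by Eq. (1)
        (replacement) or Eq. (2) (averaging), then reset its counter.
    """
    targets = [p for p in patterns if p.label == label and p.env == env]
    if not targets:
        raise LookupError(f"no pattern for label {label!r} in environment {env}")
    worst = min(targets, key=lambda p: p.counter)
    if len(worst.features) != len(p_in):
        raise ValueError("feature patterns must have the same length")
    if average:
        worst.features = [(a + b) / 2 for a, b in zip(worst.features, p_in)]  # Eq. (2)
    else:
        worst.features = list(p_in)  # Eq. (1)
    worst.counter = 0
    return worst


# Hypothetical stand-in for Table 11 (P6 and P8 fail the label/env filter).
memory = [
    StandardPattern("P6", "i", 0, [9.0, 9.0], counter=0),
    StandardPattern("P7", "i", 1, [2.0, 2.0], counter=5),
    StandardPattern("P8", "pi", 1, [0.0, 0.0], counter=0),
    StandardPattern("P11", "i", 1, [4.0, 0.0], counter=1),
    StandardPattern("P13", "i", 1, [1.0, 3.0], counter=3),
]
updated = learn(memory, "i", 1, [0.0, 4.0])
print(updated.name, updated.features, updated.counter)  # P11 [2.0, 2.0] 0
```

Because P6 and P8 are excluded by the environment and label filter, their low counters do not make them update targets; this is the property that keeps patterns from infrequent environments alive.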

[Table]
When the update for one syllable is finished, step m5 determines whether there is a next syllable; if there is, the process returns to step m1. The operations of steps m1 to m5 are thus repeated to update the patterns for each input syllable. When step m5 finds no next syllable, the process moves to step n10 and ends completely. Everything except the speech analysis section 6 is controlled by the CPU 10.

When a single kind of syllable has a very large number of feature standard patterns, for example several tens or more, setting finer syllable environment classes is expected to make the invention even more effective. Although the embodiment above uses the immediately preceding phoneme, the syllable environment may also be defined by the preceding and following phonemes.

Fig. 4 conceptually shows the difference in the distribution of the feature standard patterns of a certain syllable after updating. In the present invention, as shown in Fig. 4(1), patterns are placed for each environment and are spread sparsely over the phonological distance space, whereas in the prior art, as shown in Fig. 4(2), the distribution is biased toward frequently input patterns. In Fig. 4, the numerals 1 to 6 indicate standard pattern numbers, and the symbols a to e indicate syllable environments.

Effects
As described above, the present invention provides the following effects.
(1) The feature standard patterns are well organized using information, defined as the environment before and after a syllable, that associates the physiological movements of people producing similar articulation with the phonemes in speech. The feature standard patterns are therefore corrected and updated efficiently, and a high recognition rate is ultimately obtained (high-accuracy recognition).
(2) The final recognition accuracy is reached quickly, without being strongly constrained by the input frequency of syllables (fast convergence).

[Brief description of the drawings]

Fig. 1 is a block diagram showing the configuration of a Japanese speech input device 1 according to an embodiment of the present invention; Fig. 2 is a flowchart showing the procedure of speech recognition processing in the Japanese speech input device 1; Fig. 3 is a flowchart showing the standard pattern learning process of step n8 in more detail; and Fig. 4 is a diagram conceptually showing the difference in the distribution of the feature standard patterns of a certain syllable after updating.

1 ... Japanese speech input device, 2 ... microphone, 6 ... speech analysis section, 7 ... syllable segmentation section, 8 ... feature pattern buffer, 9 ... feature pattern memory, 10 ... CPU, 11 ... syllable recognition section, 12, 16, 18 ... feature standard pattern memories, 13 ... recognition result memory, 14 ... language dictionary memory, 15 ... keyboard.

Claims

[Scope of Claims] 1. A speech recognition method in which input speech is recognized syllable by syllable by computing its similarity to feature standard patterns of a plurality of kinds of syllables registered in advance, and the recognition result is corrected by matching against a dictionary or by an external instruction operation such as keyboard input to obtain the final input, the method being characterized in that: each feature standard pattern of a syllable is given information representing the phonological environment on at least one side, before or after, of the position at which that feature pattern was extracted from speech; and, when registration is performed at input time, either by adding the syllable feature pattern obtained by analyzing the speech uttered at input time as a standard feature pattern or by correcting a standard feature pattern using that input feature pattern, the update by addition or correction is performed on the standard feature patterns having the same environment information as the phonological environment on at least one side, before or after, of the position at which the input feature pattern was extracted.