JPH0157370B2

Info

Publication number: JPH0157370B2
Application number: JP55178695A
Authority: JP
Inventors: Masayoshi Yurugi; Terukazu Kito; Masaji Kobayashi
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1980-12-19
Filing date: 1980-12-19
Publication date: 1989-12-05
Also published as: JPS57102689A

Description

[Detailed description of the invention] The present invention relates to a voice input typewriter that uses a monosyllable-input speech recognition device.

Various speech recognition devices have been proposed. Depending on what is being recognized, speech recognition is divided into word recognition and monosyllable recognition. When applied to a voice typewriter, monosyllable recognition can in principle handle an unlimited range of language, because the unit being recognized is the single syllable.

A monosyllable recognition device is described in "Monosyllable Voice Typewriter" (Research Institute of Applied Electricity, Hokkaido University), Acoustical Society of Japan, Speech Research Group material No. S77-46 (December 1977), and is configured roughly as shown in Fig. 1. In Fig. 1, 1 is a microphone, 2 is a preamplifier, 3 is a 16-channel band-pass filter (hereinafter 16CH BPF), 4 is a 16-channel analog-to-digital converter (hereinafter 16CH A/D converter), 5 is a read-only memory that converts its input value to a logarithm (hereinafter logarithmic conversion ROM), 6 is a pitch extractor, 7 is a counter timer, 8 is a microprocessor (hereinafter micro CPU), 9 is a bus line (hereinafter BUS LINE), 10 is a read-only memory (hereinafter ROM), 11 is a random access memory (hereinafter RAM), 12 is a floppy disk drive (hereinafter FDD), 13 is a character display device (hereinafter CRT), 14 is a keyboard device (hereinafter KB), and 15 is a hardware arithmetic unit.

The voice typewriter configured as above operates as follows. Speech picked up by microphone 1 has a spectrum that falls off at 6 dB/octave, so preamplifier 2 boosts the high frequencies at 6 dB/octave to flatten the power spectrum. The preamplifier output is fed to the 16CH BPF 3, which divides the 200 to 4400 Hz band into 16 channels, and the output of each channel is fed to A/D converter 4. A/D converter 4 converts each channel to a 12-bit digital value, which is then input to logarithmic conversion ROM 5 and converted to an 8-bit logarithm. This is done for two reasons: human hearing is logarithmic and the conversion approximates it, and the power spectrum can afterwards be normalized using only addition and subtraction. The output of A/D converter 4 is taken into micro CPU 8 over BUS LINE 9. Connected to micro CPU 8 are ROM 10 for control, RAM 11 as a data area, FDD 12 for storing data such as registered speech, CRT 13 for displaying the recognition results, and KB 14 for manual input to micro CPU 8; together these form a small computer system.
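The logarithmic conversion step can be pictured as a lookup table. The following is a minimal sketch only: the text gives the bit widths (12-bit input, 8-bit output) but not the ROM contents, so the scaling formula, the function names `build_log_rom` and `normalize`, and the choice of mean subtraction as the normalization are all assumptions made for illustration.

```python
import math

ADC_BITS = 12   # A/D converter output width stated in the text
LOG_BITS = 8    # width of the logarithmic code stated in the text


def build_log_rom():
    """Return a 4096-entry table mapping a 12-bit value to an 8-bit log code.

    Assumed scaling: code = round(255 * log2(1 + x) / 12), so that
    x = 0 maps to 0 and x = 4095 maps to 255.
    """
    full_scale = math.log2(1 << ADC_BITS)
    max_code = (1 << LOG_BITS) - 1
    return [round(max_code * math.log2(1 + x) / full_scale)
            for x in range(1 << ADC_BITS)]


def normalize(log_frame):
    """Normalize one log spectrum by subtracting its mean level.

    In the log domain a change of overall level is (approximately) an
    additive offset, so it can be removed by subtraction. With 16
    channels the division by the channel count is a simple shift.
    """
    mean = sum(log_frame) // len(log_frame)
    return [v - mean for v in log_frame]
```

Two frames that differ only by a constant offset in their log codes normalize to the same values, which is the property the text attributes to working with logarithms.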

The output of the 16CH BPF 3 is also fed to pitch extractor 6, which is arranged to trigger the 16CH A/D converter once per pitch period. One technique for pitch extractor 6 is proposed in, for example, Japanese Patent Application Laid-Open No. 54-162405, "Pitch Frequency Extraction Device", and the pitch frequency may be assumed to be extracted by such a method. A pitch frequency exists for vowels and voiced consonants, but for unvoiced consonants the sound source is noise and there is no pitch frequency. To sample unvoiced consonants, counter timer 7 generates pulses at a fixed period, for example every 1 msec (a frequency of 1 kHz), and supplies them to the 16CH A/D converter 4. The 16CH A/D converter 4 thus samples the output of the 16CH BPF 3 once per pitch period for vowels and voiced consonants and at the fixed period for unvoiced consonants; analog-to-digital conversion of these samples yields the envelope of the power spectrum of the input speech waveform.
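The choice between the two sampling triggers can be sketched as follows. This is an illustration under assumptions: the function name `trigger_times`, the representation of pitch marks as a list of times in milliseconds, and the rule "an empty list means unvoiced" are hypothetical and are not taken from the patent.

```python
def trigger_times(duration_ms, pitch_marks_ms=None, timer_period_ms=1.0):
    """Return the instants (in ms) at which the A/D converter is triggered.

    Voiced speech: one trigger per detected pitch mark.
    Unvoiced speech (no pitch marks): one trigger per timer period,
    1 ms by default as in the example given in the text.
    """
    if pitch_marks_ms:  # voiced: sample pitch-synchronously
        return [t for t in pitch_marks_ms if 0 <= t < duration_ms]
    count = int(duration_ms // timer_period_ms)  # unvoiced: fixed period
    return [i * timer_period_ms for i in range(count)]
```

A 20 ms voiced segment with an 8 ms pitch period would therefore be sampled three times, while a 5 ms unvoiced segment would be sampled five times.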

The digitized input speech waveform obtained in this way is normalized in power spectrum, and the squared distance between it and the data of each registered speech waveform is then calculated. This squared-distance calculation has to be carried out several thousand times, which micro CPU 8 cannot do in real time, so it is performed in hardware by hardware arithmetic unit 15. The monosyllable whose registered waveform gives the smallest squared distance to the input waveform is recognized as the input speech.
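The minimum-squared-distance decision described above can be sketched in a few lines. The data layout (a dictionary from syllable to a list of channel values) and the names `squared_distance` and `recognize` are assumptions for illustration; the patent performs this computation in dedicated hardware.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(frame, templates):
    """Return (syllable, distance) for the registered pattern nearest to frame.

    templates: dict mapping a syllable label to its registered pattern.
    """
    scored = ((syl, squared_distance(frame, pat))
              for syl, pat in templates.items())
    return min(scored, key=lambda item: item[1])
```

Every input pattern is compared against every registered pattern, which is why the text notes that the distance calculation must be repeated thousands of times.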

The recognized monosyllables are output one after another, and once they form a word or phrase they are converted into mixed kanji-kana text by reference to a word dictionary and a grammar dictionary. This kana-kanji conversion can use a known method such as that described in "Kanji-kana mixed text conversion system" (Matsushita et al.), Journal of the Information Processing Society of Japan, Vol. 15, No. 1 (January 1974), pp. 2-9. The speech recognition unit therefore performs monosyllable recognition, and speech must be entered one syllable at a time. To enter the word 暖かい (atatakai, "warm"), for example, the speaker must pronounce "a", "ta", "ta", "ka", "i" one syllable at a time into the speech recognition unit.

Japanese monosyllables, including voiced sounds (dakuon), semi-voiced sounds (handakuon) and contracted sounds (yoon), number 101 in all: the five vowels a, i, u, e, o; the syllabic nasal N (one); 62 syllables of the form consonant + vowel; and 33 contracted sounds. For the most numerous group, the consonant + vowel syllables, the amount of information that can be obtained from the speech waveform is very small. Consider how people listen to speech. When listening to someone else, a person first grasps what the topic is and then listens while predicting what the speaker is going to say, and this is why the content can be understood. As a familiar example, take answering the telephone in an office. If you pick up the phone at your own desk and the operator says quickly, "Call for Mr. ×× from Mr. △△ of 〇〇 Company", you can in most cases make out "〇〇 Company", "Mr. △△" and "Mr. ××". That is because Mr. △△ of 〇〇 Company calls from time to time and Mr. ×× is either yourself or someone sitting nearby, so the operator's rapid speech can be understood. If, however, Mr. △△ of 〇〇 Company is someone with whom there have been no previous dealings, and Mr. △△ moreover has a rare or unusual name, it will most likely not be understood. And if the phone you pick up is not your own but one in another department, you may not even catch "Mr. ××".

As described above, people do not so much hear speech as recognize and understand it by matching it against information they already hold subconsciously. In other words, they do not understand spoken input by recognizing each monosyllable individually and assembling the results. Indeed, when a monosyllable recognition experiment was carried out on human listeners, there were many mishearings among the ga, da and ba rows, among the pa, ka and ta rows, and between the ma and na rows, so even a person cannot serve as a monosyllable recognizer with a 100% recognition rate.

Getting a machine to recognize with 100% accuracy monosyllables that even a person cannot reliably make out is thus close to impossible; a recognition rate of 95% is the limit, which is a serious and inherent drawback. Moreover, because the input speech is produced by a person, it varies with the speaker's state of health and emotional state. Conventional speech recognition devices calculate the error between the registered speech waveform and the input speech waveform and then, for example, reject inputs whose error falls outside a threshold and have the speaker re-enter them, or simply take the candidate with the smallest error as the recognition result even when the error is at or below the threshold. All of these approaches have the drawback that they depend solely on the recognition rate of the speech recognition unit.

The object of the present invention is to remove the above drawbacks of the monosyllable-input voice typewriter and raise the effective speech recognition rate. It relies on the fact that the input speech is a word or syntactic construction that a person would recognize on hearing it: when the recognition result cannot exist as a word or construction, it is not adopted as the result, and some of its monosyllables are replaced with the second-candidate monosyllables output by the speech recognition unit so as to obtain a word or construction that a person can recognize. The invention is described in detail below.

Fig. 2 is a block diagram showing an embodiment of the present invention. In it, 16 is a speech recognition unit that outputs, for each input syllable, two recognized syllables and their respective accuracies; 17 is a recognition result register that stores the recognition results; 18 is a word/syntax understanding processing unit; 19 is a word/grammar dictionary; 20 is an output device; and 21 is a retry processing unit.

Its operation is described next.

First, speech is input from microphone 1 to speech recognition unit 16 phrase by phrase, separated into single syllables. For example, to enter 暖かい春が来た (atatakai haru ga kita, "a warm spring has come"), the first phrase 暖かい is entered as "a", "ta", "ta", "ka", "i", one syllable at a time. As shown in Fig. 3, speech recognition unit 16 first performs vowel recognition and determines, for example, that the syllable belongs to the "a" row (here, the syllables sharing the vowel "a": "a", "ka" and so on). It then stores all the characters of the "a" row in syllable register 22a of the syllable and accuracy register 22 shown in Fig. 4, transfers the registered pattern 23a for "a" (a 16-channel digital code), held at addresses 1000 to 12FF of the registered pattern memory 23 shown in Fig. 5, to comparison register 23b, and sets accuracy storage address register 23c to address F002, the first address at which accuracies are stored. Next, the contents of comparison register 23b are compared with the contents of input speech register 23d; the squared distance, that is, the error of the input speech waveform relative to the registered speech waveform, is converted to an accuracy on a scale where zero error corresponds to an accuracy of 100, and this accuracy is stored at the head of accuracy register 22b of the syllable and accuracy register 22. In the same way, the search address in search address register 23e, which holds the start address of the registered pattern currently being checked, is increased by 1000 at a time so that the registered patterns for "ka" onward are transferred in turn to comparison register 23b; each is compared with the contents of input speech register 23d, and the contents of accuracy storage address register 23c are increased by "10" each time so that the comparison results are stored one after another in accuracy register 22b. Once the accuracies for the whole "a" row have been stored in accuracy register 22b, the highest and second-highest accuracies are found, and the corresponding syllables and accuracies are output to recognition result register 17. The flow of this processing is shown in Fig. 3. As shown in Fig. 7, recognition result register 17 stores the top-ranked syllable and its accuracy in first candidate register 17a and the second-ranked syllable and its accuracy in second candidate register 17b. "No." in the figure is the number indicating input order.
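The scoring of one row of registered patterns and the selection of the first and second candidates can be sketched as below. The text states only that zero error corresponds to an accuracy of 100; the linear fall-off, the `full_scale` parameter and the function names `accuracy` and `top_two` are assumptions made for illustration.

```python
def accuracy(distance, full_scale=1000):
    """Map a squared distance to a score from 0 to 100.

    Distance 0 gives 100, as stated in the text. The linear decrease
    to 0 at full_scale is an assumed mapping, not the patent's.
    """
    return max(0, 100 - (100 * distance) // full_scale)


def top_two(frame, row_templates, full_scale=1000):
    """Score every syllable in the row and return the two best.

    row_templates: dict mapping a syllable to its registered pattern;
    it must contain at least two entries.
    Returns ((syllable1, accuracy1), (syllable2, accuracy2)).
    """
    scored = []
    for syllable, pattern in row_templates.items():
        dist = sum((x - y) ** 2 for x, y in zip(frame, pattern))
        scored.append((syllable, accuracy(dist, full_scale)))
    scored.sort(key=lambda cand: cand[1], reverse=True)
    return scored[0], scored[1]
```

The two returned pairs correspond to what the text stores in first candidate register 17a and second candidate register 17b for one input syllable.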

The example in Fig. 7 shows that, to enter 暖かい, the syllables "a", "ta", "ta", "ka", "i" were spoken one at a time and the result recognized by speech recognition unit 16 as the first candidate was "a-ta-ka-ka-i" (アタカカイ). As Fig. 7 shows, the recognition accuracies in input order are 95, 85, 81, 86 and 96. The second candidates, given as recognized syllable and accuracy in input order, are sa 55, pa 70, ta 75, ta 73 and ki 60. From the sixth input onward a mark (*) is stored to indicate that nothing was input. Word/syntax understanding processing unit 18 takes the group of monosyllables stored in recognition result register 17 and uses the word dictionary in word/grammar dictionary 19 to check, by the longest-match method, whether it is valid as a single word. That is, the recognized syllables "a-ta-ka-ka-i" in first candidate register 17a are transferred to verification register 18a shown in Fig. 9 and checked against the word dictionary; if the word is not found, the check is repeated on "a-ta-ka-ka" with the final "i" removed, and then in turn on "a-ta-ka", "a-ta" and "a". "a" is the stem of verbs such as 合う, 会う, 明く, 明ける, 当てる, 浴びる, 在る and 有る, but in no case does the "ta-ka-ka" that follows match a conjugational ending. When none of these is found among the dictionary headwords, the unit next checks whether the leading syllable "a" is a prefix. In this example "a" is not a prefix, so no prefix processing is performed; if a prefix such as "o" had been in that position, it would be removed and the remaining four syllables checked by the longest-match method for validity as a single word. In this example, therefore, nothing in the syllable string "a-ta-ka-ka-i" can be identified as a word.
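The longest-match check can be sketched as follows. This is a simplified illustration, not the patent's dictionary algorithm: the sets `headwords` and `endings`, the function name `verify`, and the rule that the remainder must be empty or a known ending are assumptions, and prefix handling is omitted.

```python
def verify(syllables, headwords, endings):
    """Longest-match check of a syllable string against a word dictionary.

    Tries the whole string first, then drops syllables from the end one
    at a time. A leading part is accepted only if it is a headword and
    the remaining syllables are empty or form a known ending.
    Returns (head, rest) on success, or None if nothing matches.
    """
    for end in range(len(syllables), 0, -1):
        head = "".join(syllables[:end])
        rest = "".join(syllables[end:])
        if head in headwords and (rest == "" or rest in endings):
            return head, rest
    return None
```

With the stem "ア" registered as a headword, the string アタカカイ is still rejected because the remainder "タカカイ" is not a known ending, which mirrors the failure described in the text.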

In the conventional scheme, the result would simply be shown on the display in hiragana or katakana and left to the speaker's judgment. The method of the present invention instead proceeds on the basis that what was input is a monosyllable string a person would recognize as a word or construction. One example of this processing is described in detail below with reference to Fig. 6. When word/syntax understanding processing unit 18 has been unable to identify the string as a word, it sends a retry signal to retry processing unit 21 to start retry processing. On receiving the retry signal, retry processing unit 21 examines retry counter 21b shown in Fig. 8. If the count is "0", it increments the count by "1", extracts the lowest accuracy from first candidate register 17a, and sets the corresponding input-order number ("3" in this example) in replacement number register 21c. If the count is "1", it increments the count by "1", reloads the recognized syllables of first candidate register 17a into verification register 18a, extracts the highest accuracy from second candidate register 17b, and sets the corresponding input-order number in replacement number register 21c. If the count is "2", it resets retry counter 21b to "0" and sends an error signal to word/syntax understanding processing unit 18. In this example this is the first retry and the count of retry counter 21b is "0", so "3" is set in replacement number register 21c. Retry processing unit 21 then, on the basis of the contents "3" of replacement number register 21c, reads the third entry "ta" of second candidate register 17b, stores this "ta" in the third position of verification register 18a, and sends a verification signal to word/syntax understanding processing unit 18. Word/syntax understanding processing unit 18 performs the verification again as described above. This time the contents of verification register 18a are "a-ta-ta-ka-i" (アタタカイ), which appears in the word dictionary of word/grammar dictionary 19 under the headword 暖かい and is found to be the terminal form of an adjective. The kanji code for 暖 and the kana codes for か and い are therefore output from word/grammar dictionary 19 and printed or displayed by output device 20.
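The two-stage retry procedure can be sketched as a single function. This is an illustration under assumptions: the function name `recognize_phrase`, the list-of-pairs data layout and the `is_valid` callback standing in for the dictionary verification are hypothetical, and the register and counter mechanics of the patent are replaced by a simple loop.

```python
def recognize_phrase(first, second, is_valid):
    """Apply the retry procedure to one phrase.

    first, second: lists of (syllable, accuracy), one entry per input
    position, holding the first and second candidates.
    is_valid: callable that takes a list of syllables and returns True
    if the dictionary check accepts it (stand-in for unit 18).
    Returns (syllables, accepted).
    """
    base = [syl for syl, _ in first]
    if is_valid(base):
        return base, True

    # Retry 1: position of the least accurate first candidate.
    lowest = min(range(len(first)), key=lambda i: first[i][1])
    # Retry 2: position of the most accurate second candidate.
    highest = max(range(len(second)), key=lambda i: second[i][1])

    for pos in (lowest, highest):
        trial = list(base)            # start again from the first candidates
        trial[pos] = second[pos][0]   # swap in the second candidate
        if is_valid(trial):
            return trial, True

    # Both retries failed: fall back to the first candidates as kana.
    return base, False
```

Using the Fig. 7 values (first candidates ア 95, タ 85, カ 81, カ 86, イ 96 and second candidates サ 55, パ 70, タ 75, タ 73, キ 60, assuming the first-candidate string was アタカカイ as the truncation steps in the text imply), the first retry replaces position 3 with "タ" and yields アタタカイ.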

If, unlike in this example, the first retry does not lead to identification, the recognized syllables of first candidate register 17a are transferred to verification register 18a again, the highest-accuracy recognized syllable in second candidate register 17b is substituted for the corresponding syllable in verification register 18a, and the verification is repeated. If the string still cannot be recognized, the recognized syllables of first candidate register 17a are sent to output device 20 as kana codes and left to the speaker's judgment. Note that even if the word dictionary contains only the adjectival verb 暖か (atataka) as a headword, it is found on the second pass of the longest-match verification, and the input turns out to be the adjectival verb 暖か with the suffix い attached. The dictionary therefore does not need an enormous number of headwords; this is a matter of how word/grammar dictionary 19 is maintained and of the algorithm of word/syntax understanding processing unit 18, and does not stand in the way of the present invention. Word/syntax understanding processing unit 18 remembers that the input was the adjective 暖かい and outputs the kanji code for 暖, the kana code for か and the kana code for い to output device 20 in that order, so that 暖かい is output. Output device 20 may be the CRT 13 included in speech recognition unit 16, or a separate printer or the like. Using word/grammar dictionary 19, word/syntax understanding processing unit 18 verifies not only words, as described in detail above, but also syntax and the like from a grammatical standpoint. The reason for remembering that the input was the adjective 暖かい is to use it as reference data when processing the next input phrase; if semantic processing is also performed, it even becomes possible to estimate what kind of phrase will be input next.

As described above, the embodiment assumes that the monosyllable input string is entered in units of words or phrases and relies on the fact that it forms a word or construction a person would understand on hearing it. Speech recognition unit 16 outputs the first-candidate syllables, the second-candidate syllables and the recognition accuracy of each; word/syntax understanding processing unit 18 verifies them and outputs words or phrases that a person would understand. This has the major advantage that the effective recognition rate of the voice typewriter as a device can be raised without depending solely on the recognition rate of speech recognition unit 16.

In the embodiment, speech recognition unit 16 is built on the simple scheme described with reference to Fig. 1, but the present invention is not limited to the speech recognition scheme of Fig. 1. It goes without saying that, whatever speech recognition scheme is adopted, the invention contributes to raising the overall effective recognition rate of the voice typewriter.

In the embodiment, for simplicity, the monosyllable string passed from speech recognition unit 16 through recognition result register 17 to word/syntax understanding processing unit 18 was described as a word or phrase. If word/syntax understanding processing unit 18 is also given the ability to split its input into phrases, the monosyllable string need not be limited to a word or phrase and a whole sentence can be entered. Adding a semantic understanding function to word/syntax understanding processing unit 18 can raise the effective recognition rate still further.

Furthermore, in the embodiment, when the first-candidate monosyllable string cannot be identified as a word or phrase, the first candidate with the lowest recognition accuracy is discarded and replaced with the second candidate for that syllable, and the string is verified again; if it still cannot be recognized, the most accurate monosyllable in the second-candidate string is substituted for the corresponding syllable of the first-candidate string and the string is verified once more. Alternatively, only one of these two re-verifications may be performed, and if it fails to identify the string, the first-candidate monosyllable string is output as is and left to the speaker's judgment.

When the input is something a person could not understand as a word or phrase, such as ciphertext, a selector switch may be provided so that the input is sent to output device 20 without using the verification function of word/syntax understanding processing unit 18. Going a step further, a word/syntax understanding processing unit 18 and a word/grammar dictionary 19 adapted to the grammar and rules of the cipher could be prepared to verify that the ciphertext is well formed. Such applications are also conceivable.

As described in detail above, according to the present invention, word and syntax understanding processing that makes skillful use of the characteristics of the Japanese language removes the serious drawback of misrecognized speech input and makes it possible to provide an ideal voice typewriter. The invention can also be said to bring a major change not only to voice typewriters but to speech-input Japanese processing technology in general, including speech-input word processors.

[Brief explanation of drawings]

Fig. 1 is a block diagram of the speech recognition section of a conventional monosyllable typewriter; Fig. 2 is a block diagram showing an embodiment of the present invention; Fig. 3 is a diagram explaining the operation of the speech recognition unit; Figs. 4 and 5 show the contents of the RAM of the speech recognition unit; Fig. 6 is a diagram explaining the operation of the retry processing unit; Fig. 7 shows the contents of the recognition result register; Fig. 8 shows the elements of the retry processing unit; and Fig. 9 shows the verification register.

1: microphone; 2: preamplifier; 3: 16CH BPF; 4: 16CH A/D converter; 5: logarithmic conversion ROM; 6: pitch extractor; 7: counter timer; 8: micro CPU; 9: BUS LINE; 10: ROM; 11: RAM; 12: FDD; 13: CRT; 14: KB; 15: hardware arithmetic unit; 16: speech recognition unit; 17: recognition result register; 17a: first candidate register; 17b: second candidate register; 18: word/syntax understanding processing unit; 18a: verification register; 19: word/grammar dictionary; 20: output device; 21: retry processing unit; 22: syllable and accuracy register; 22a: syllable register; 22b: accuracy register; 23: registered pattern memory; 23a: registered pattern; 23b: comparison register; 23c: accuracy storage register; 23d: input speech register.

Claims

[Claims] 1. A monosyllable voice input typewriter that takes speech as input, processes it, and outputs it as information in a predetermined format, comprising: a speech recognition unit that outputs at least two recognized syllables and their respective accuracies for one input syllable; a recognition result register that stores the recognized syllables and accuracies, classified into a plurality of syllable groups according to differences in accuracy; a word/grammar dictionary that stores words and grammar; a word/syntax understanding processing unit that extracts a syllable group from the recognition result register and verifies whether it is correct by referring to the words and grammar stored in the word/grammar dictionary; a retry processing unit that replaces a part of the syllable group with a part of another syllable group stored in the recognition result register and causes the verification to be performed again; and an output device that prints or displays the output of the word/syntax understanding processing unit.