JPH0786908B2

JPH0786908B2 - Word matching device

Info

Publication number: JPH0786908B2
Application number: JP61276804A
Authority: JP
Inventors: 直樹中島; 敏夫堤田; 隆彦川谷
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1986-11-21
Filing date: 1986-11-21
Publication date: 1995-09-20
Anticipated expiration: 2010-09-20
Also published as: JPS63131289A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a device that performs word matching based on the result of reading a character string with a character reading device.

[Conventional technology]

Conventionally, there are word matching devices that take the result obtained by reading a character string on a form with an optical character reader (hereinafter, OCR) and match it, word by word, against the word data in a word dictionary unit in which the group of words to be read is registered in advance. To search the word dictionary unit, devices of this kind use indexes based on the word length of the input word and on the character type at specific positions in the input word.

[Problems to be solved by the invention]

When a writer makes a mistake on a form, the writer may erase a single character by striking it out with a vertical line or the like, as shown in FIG. 2(a). In that case the erased character is wrongly matched as though one character were present at that position. The correct character may also be written outside the box of the erased character. Consequently, when the writing restrictions on erased characters are relaxed, the device cannot tell whether an erased character should be treated as a character that is present or as no character at all, and correct matching cannot be obtained.

An object of the present invention is to relax the restrictions on how a writer may make corrections using erased characters, while maintaining the reading accuracy of the word matching process in word matching fields.

[Means for solving problems]

According to the present invention, a word matching device comprises a word dictionary unit in which the word data of the group of words to be read is registered in advance, and a word matching processing unit that matches the reading result of a character string containing an erased character against that word data. The word matching processing unit comprises first matching means that performs matching on the reading result of the character string with the erased character removed, and second matching means that performs matching with the erased character regarded as a rejected character for which no candidate character was obtained.

[Action]

According to the present invention, correct matching is obtained by the first matching means when the erased character is a deletion, and by the second matching means when the erased character is a correction.

[Example]

FIG. 1 is a block diagram showing an embodiment of the present invention, in which 1 is an input form, 2 is an OCR that can be realized with conventional technology, 3 is a word matching processing unit embodying the present invention, 4 is a word dictionary, and 5 is the word matching output. The operation of this scheme is described below using the reading of an address as an example.

The content written on the input form 1 is assumed to be as shown in FIG. 2(a), with words separated by spaces. Suppose that the OCR 2 reads it with the result shown in FIG. 2(b). In FIG. 2(b), "?" is a character that the OCR 2 could not read, and "!" is a character that the writer erased when making a correction. The "ク" (ku) of "トツカク" (Totsukaku) has been misread as "ノ" (no). The content of FIG. 2(b) is the input to the word matching processing unit 3.

The table contents of the word dictionary 4 are shown in FIG. 3. Here, 6 is the word length index and 7 is the word table. The word table 7 is stored sorted by word length. For each word length, the word length index 6 holds the start address at which words of that length are stored, and this address gives the position in the word table 7 where the search begins.
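The lookup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `build_length_index` and `words_of_length` are hypothetical, list positions stand in for the start addresses, and the four dictionary entries are simply the words quoted in the embodiment.

```python
def build_length_index(words):
    """Return (table, index): the words sorted by length, plus the
    position of the first word of each length (the "start address")."""
    table = sorted(words, key=len)  # stable sort: input order is kept within a length
    index = {}
    for pos, word in enumerate(table):
        index.setdefault(len(word), pos)  # record only the first position per length
    return table, index


def words_of_length(table, index, length):
    """Scan the table from the start address until the word length changes."""
    start = index.get(length)
    if start is None:
        return []  # no word of this length is registered
    end = start
    while end < len(table) and len(table[end]) == length:
        end += 1
    return table[start:end]


# Hypothetical dictionary built from the words quoted in the embodiment.
table, index = build_length_index(
    ["ヨコハマシ", "トツカク", "カマクラシ", "ナカノチヨウ"]
)
candidates = words_of_length(table, index, 5)
```

With this layout, a query for length 5 touches only the two five-character entries instead of the whole table.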

An example of the operation of the word matching processing unit 3 is described in detail below with reference to the flowchart of FIG. 4.

First, the word matching processing unit 3 receives the address reading result of FIG. 2(b) and, treating spaces as delimiters between words, extracts the first word, "ヨコハ?シ" (Yokoha?shi), as the input word W_i. For this word the number of characters other than erased characters is L₁ = 5 and the number of erased characters is L₂ = 0; that is, the word contains no erased character. From these values the word length is L = 5. The variable FLG that appears in the word length formula is described later. Next, the word length index 6 of FIG. 3 gives the start address of the words of length 5, and each such word W_j is read from the word table 7.

The characters of W_i and W_j are compared position by position and, depending on the outcome, the addition value given in FIG. 5 is added to the distance value d. For example, to obtain d between the dictionary word "カマクラシ" (Kamakurashi) and the input word "ヨコハ?シ", the first characters "カ" and "ヨ" are compared; they do not match, so 1 is added according to FIG. 5. Likewise, 1 is added for each of the second and third characters. The fourth input character is the unreadable "?", which accounts for the remaining 0.5 of the total. The last character "シ" matches, so 0 is added. The resulting distance value d is 3.5.

This processing is carried out for every word of the same word length, and each dictionary word compared is registered as a candidate word together with its distance value d. When matching against all words of that length is finished and exactly one of the registered candidates has the smallest distance value d, that word is output as the correct word. In this case the distance between the input "ヨコハ?シ" and "ヨコハマシ" (Yokohamashi) in the word table 7 is 0.5, which is the unique minimum, so "ヨコハマシ" is taken as the word matching result.
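The distance calculation and the unique-minimum rule can be sketched in Python as below. The sketch rests on assumptions: the addition table of FIG. 5 is not reproduced in the text, so the values used here (0 for a match, 1 for a mismatch, 0.5 for an unreadable character) are taken or inferred from the worked totals of 3.5 and 0.5; the names `distance` and `unique_best` and the ASCII marker "?" are hypothetical.

```python
UNREADABLE = "?"  # assumption: ASCII marker for a character the OCR could not read


def distance(input_word, dict_word):
    """Sum the per-position addition values for two words of equal length."""
    d = 0.0
    for a, b in zip(input_word, dict_word):
        if a == UNREADABLE:
            d += 0.5  # inferred from the totals in the text, not stated explicitly
        elif a != b:
            d += 1.0  # mismatch
        # a match adds 0
    return d


def unique_best(input_word, candidates):
    """Return the candidate with the smallest distance, or None on a tie
    or when no candidate has the same length as the input."""
    scored = [(distance(input_word, w), w)
              for w in candidates if len(w) == len(input_word)]
    if not scored:
        return None
    best = min(d for d, _ in scored)
    winners = [w for d, w in scored if d == best]
    return winners[0] if len(winners) == 1 else None


d_kamakura = distance("ヨコハ?シ", "カマクラシ")   # 1 + 1 + 1 + 0.5 + 0 = 3.5
d_yokohama = distance("ヨコハ?シ", "ヨコハマシ")   # 0 + 0 + 0 + 0.5 + 0 = 0.5
result = unique_best("ヨコハ?シ", ["ヨコハマシ", "カマクラシ"])
```

Returning `None` on a tie mirrors the requirement that the minimum be unique before a word is accepted.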

Next, the operation of the present invention is described using the second and third words, "トツ!カノ" (Totsu!kano) and "ナ!ノチヨウ" (Na!nochiyou), which contain characters corrected by erasure. The example given here works as follows. Word matching is first performed with the erased character removed, in the same way as for the first word. If a word with a unique minimum distance value exists, that word is taken as the word matching result. If no such word exists, word matching is performed again with the erased character kept in place and treated as a character with no candidate, and the result of that second matching is adopted. Switching between the two is achieved by setting FLG to 0 or 1 in the word length formula L = L₁ + (L₂ × FLG). Other ways of handling the results of the two matching methods can of course be readily conceived.
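As a small illustration of the word length formula, the following Python sketch counts L₁ and L₂ for an input word and evaluates L = L₁ + (L₂ × FLG). The function name `word_length`, the ASCII marker "!" for an erased character, and the sample line (assembled from the three words quoted in the text) are assumptions made for the example.

```python
ERASED = "!"  # assumption: ASCII marker for a character the writer erased


def word_length(word, flg):
    """Evaluate L = L1 + (L2 * FLG) for one input word.

    L1 is the number of characters other than erased characters,
    L2 is the number of erased characters, and flg is 0 or 1.
    """
    l2 = word.count(ERASED)
    l1 = len(word) - l2
    return l1 + l2 * flg


# Words are separated by spaces, as in the reading result described above.
line = "ヨコハ?シ トツ!カノ ナ!ノチヨウ"
lengths = [(word_length(w, 0), word_length(w, 1)) for w in line.split()]
# lengths == [(5, 5), (4, 5), (5, 6)]
```

For a word without erased characters both settings of FLG give the same length, so only words containing "!" are affected by the switch.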

First, the input word "トツ!カノ" is an example in which the writer rewrote the character in the box following the erased one. With L₁ = 4 characters other than erased characters and L₂ = 1 erased character, word matching is performed with word length L = 4 (FLG = 0). That is, matching is performed on "トツカノ" (Totsukano), the word with "!" removed. The distance value d to the dictionary word "トツカク" (Totsukaku) is 1, and it is the unique minimum, so that word is taken as the word matching result.

Next, the word "ナ!ノチヨウ" is an example in which the writer rewrote the character outside the box of the erased character. Initially L₁ = 5 and L₂ = 1, and word matching is performed with word length L = 5 (FLG = 0), but no word W_j with a unique minimum distance exists, so the matching fails. FLG is therefore set to 1, and word matching is performed again with word length L = 6, which includes the character corrected by erasure. As a result, "ナカノチヨウ" (Nakanochiyou) is obtained as the word with the unique minimum distance and is taken as the word matching result. Through the above operation, the content of FIG. 2(c) is output as the word matching output 5 of the word matching processing unit 3.
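The two-pass procedure of this embodiment can be sketched end to end in Python. This is an illustration under assumptions rather than the patented implementation: the addition values (0, 1, and 0.5) are inferred from the worked distances, an erased character kept in the second pass is scored like an unreadable one because the text treats it as a character with no candidate, and the function names, ASCII markers, and four-word dictionary are hypothetical.

```python
UNREADABLE = "?"  # assumption: marker for a character the OCR could not read
ERASED = "!"      # assumption: marker for a character the writer erased


def distance(input_word, dict_word):
    """Per-position distance; rejected characters add 0.5, mismatches add 1."""
    d = 0.0
    for a, b in zip(input_word, dict_word):
        if a in (UNREADABLE, ERASED):
            d += 0.5  # no candidate character at this position
        elif a != b:
            d += 1.0
    return d


def unique_best(input_word, dictionary):
    """Return the same-length word with the unique minimum distance, else None."""
    scored = [(distance(input_word, w), w)
              for w in dictionary if len(w) == len(input_word)]
    if not scored:
        return None
    best = min(d for d, _ in scored)
    winners = [w for d, w in scored if d == best]
    return winners[0] if len(winners) == 1 else None


def match_word(input_word, dictionary):
    """First pass with erased characters removed (FLG = 0); if that fails,
    second pass with them kept as rejected characters (FLG = 1)."""
    result = unique_best(input_word.replace(ERASED, ""), dictionary)
    if result is not None or ERASED not in input_word:
        return result
    return unique_best(input_word, dictionary)


DICTIONARY = ["トツカク", "ヨコハマシ", "カマクラシ", "ナカノチヨウ"]
line = "ヨコハ?シ トツ!カノ ナ!ノチヨウ"
output = [match_word(w, DICTIONARY) for w in line.split()]
# output == ["ヨコハマシ", "トツカク", "ナカノチヨウ"]
```

With this small dictionary the first pass for "ナ!ノチヨウ" ties at distance 5 between the two five-character entries, so the second pass runs and selects "ナカノチヨウ" at distance 0.5.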

[Effect of the invention]

As described above, the device can handle both of the ways in which a writer corrects an entry using an erased character. This has the advantage that the writing restrictions can be relaxed while the accuracy of the word matching process is improved.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention. FIGS. 2(a), 2(b), and 2(c) are explanatory diagrams showing, respectively, the content written on the input form, the OCR reading result, and the output of the word matching processing unit. FIG. 3 is an explanatory diagram showing an example of the word dictionary. FIG. 4 is a flowchart showing the operation of the embodiment. FIG. 5 is an explanatory diagram showing the addition values used in word matching.

1 ... input form, 2 ... OCR, 3 ... word matching processing unit, 4 ... word dictionary, 5 ... word matching output, 6 ... word length index, 7 ... word table

Claims

[Claims]

1. A word matching device comprising a word dictionary unit in which word data of a group of words to be read is registered in advance, and a word matching processing unit that matches a reading result of a character string including an erased character against the word data, wherein the word matching processing unit comprises: first matching means for performing matching based on the reading result of the character string excluding the erased character; and second matching means for performing matching with the erased character regarded as a rejected character for which no candidate character was obtained.