JPH0786908B2 - Word matching device - Google Patents

Word matching device

Info

Publication number
JPH0786908B2
Authority
JP
Japan
Prior art keywords
word
character
matching
collation
collating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP61276804A
Other languages
Japanese (ja)
Other versions
JPS63131289A (en)
Inventor
直樹 中島
敏夫 堤田
隆彦 川谷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP61276804A
Publication of JPS63131289A
Publication of JPH0786908B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a device that performs word matching based on the result of reading a character string with a character reading device.

[Conventional technology]

Conventionally, there are word matching devices that take the result of reading a character string on a form with an optical character reader (hereinafter, OCR) and match it, word by word, against word data in a word dictionary unit in which the set of words to be read has been registered in advance. To search the word dictionary unit, devices of this kind use an index based on the length of the input word and on the character type at specific positions in the input word.

[Problems to be solved by the invention]

When a writer makes a mistake on a form, he or she may strike out a single character with a vertical line or the like, as shown in FIG. 2(a). In this case the struck-out position may be treated as containing a character, which causes a false match. The correct character may also be written outside the box of the struck-out character. Consequently, when the writing restrictions on struck-out characters are relaxed, the device cannot tell whether a struck-out character should be treated as a character or as no character, and correct matching cannot be obtained.

An object of the present invention is to relax the restrictions on how writers may make corrections with struck-out characters in a word matching field, while maintaining the reading accuracy of word matching.

[Means for solving the problems]

According to the present invention, in a word matching device comprising a word dictionary unit in which word data for the set of words to be read is registered in advance, and a word matching processing unit that matches the reading result of a character string containing a struck-out character against that word data, the word matching processing unit comprises first matching means that performs matching based on the reading result for the character string with the struck-out character removed, and second matching means that performs matching with the struck-out character regarded as a rejected character for which no candidate character was obtained.

[Action]

According to the present invention, correct matching is obtained by the first matching means when the struck-out character is a deletion, and by the second matching means when the struck-out character is a correction.

[Example]

FIG. 1 is a block diagram showing an embodiment of the present invention, in which 1 is an input form, 2 is an OCR that can be realized with conventional technology, 3 is a word matching processing unit embodying the present invention, 4 is a word dictionary, and 5 is the word matching output. The operation of this scheme is described below using the reading of an address as an example.

The contents written on input form 1 are as shown in FIG. 2(a), with words separated by spaces. Suppose that the reading result of OCR 2 is as shown in FIG. 2(b). In FIG. 2(b), "?" is a character that OCR 2 could not read, and "!" is a character struck out by the writer as a correction. In addition, the "ク" (ku) of "トツカク" (Totsukaku) has been misread as "ノ" (no). The contents of FIG. 2(b) are the input to the word matching processing unit 3.
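As a minimal sketch of this input stage, the following Python splits an OCR result on spaces and counts ordinary and struck-out characters per word. The function names are hypothetical, and the sample line is only an assumed reconstruction of FIG. 2(b) from the three words quoted in the description; the figure itself is not reproduced here.

```python
# Marker characters follow the notation described for FIG. 2(b).
UNREADABLE = "?"   # character the OCR could not read
STRUCK_OUT = "!"   # character struck out by the writer


def split_words(ocr_line: str) -> list[str]:
    """Split an OCR result into words, treating spaces as delimiters."""
    return ocr_line.split()


def count_characters(word: str) -> tuple[int, int]:
    """Return (characters other than struck-out ones, struck-out characters)."""
    struck = word.count(STRUCK_OUT)
    return len(word) - struck, struck


# Assumed reconstruction of the OCR output (hypothetical sample data).
ocr_line = "ヨコハ?シ トツ!カノ ナ!ノチヨウ"
for word in split_words(ocr_line):
    print(word, count_characters(word))
```

Note that an unreadable character still occupies a position and is counted as an ordinary character, whereas a struck-out character is counted separately.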

Next, FIG. 3 shows the table contents of word dictionary 4. Here, 6 is a word length index and 7 is a word table; word table 7 is stored sorted by word length. For each word length, the word length index 6 holds the start address at which words of that length are stored, and this address indicates where a search of word table 7 begins.
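The index-plus-sorted-table layout can be sketched roughly as follows. This is an illustration under assumptions, not the layout of FIG. 3: list positions stand in for start addresses, the helper names are hypothetical, and the four sample words are simply the dictionary words mentioned in the description.

```python
def build_dictionary(words):
    """Sort words by length and record where each length starts in the table."""
    table = sorted(words, key=len)      # word table, sorted by word length
    index = {}                          # word length -> start position in table
    for position, word in enumerate(table):
        index.setdefault(len(word), position)
    return index, table


def words_of_length(index, table, length):
    """Scan the table from the indexed start position until the length changes."""
    start = index.get(length)
    if start is None:
        return []
    end = start
    while end < len(table) and len(table[end]) == length:
        end += 1
    return table[start:end]


# Hypothetical dictionary contents, taken from the words quoted in the text.
index, table = build_dictionary(["ヨコハマシ", "トツカク", "カマクラシ", "ナカノチヨウ"])
print(index)
print(words_of_length(index, table, 5))
```

Because the table is sorted by length, all candidates for a given word length are contiguous, so a single start position per length is enough to locate them.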

An example of the operation of the word matching processing unit 3 is described in detail below with reference to the flowchart in FIG. 4.

First, the word matching processing unit 3 receives the address reading result of FIG. 2(b) and, treating spaces as delimiters between words, extracts the leading first word "ヨコハ?シ" (yo-ko-ha-?-shi) as the input word Wi. For this word, the number of characters other than struck-out characters is L1 = 5 and the number of struck-out characters is L2 = 0; that is, the word contains no struck-out character. From these, the word length is L = 5. The variable FLG in the formula for L is described later.

Next, the start address of the words of length 5 is obtained from the word length index 6 of FIG. 3, a word Wj is read from word table 7, the characters of Wi and Wj are compared one by one, and the addition value of FIG. 5 that corresponds to each comparison outcome is added to the distance value d. For example, when the distance value d is computed between the dictionary word "カマクラシ" (Kamakurashi) and the input word "ヨコハ?シ", the first characters "カ" and "ヨ" do not match, so 1 is added according to FIG. 5; 1 is likewise added for each of the second and third characters. The last character "シ" matches, so 0 is added. The resulting distance value d is 3.5, which implies that the unreadable fourth character "?" contributes 0.5. This process is carried out for all words of the same length, and each matched dictionary word is registered as a candidate word together with its distance value d. Once matching against all words of that length is complete, if exactly one registered candidate has the smallest distance value d, that word is output as the correct word. In this case, the distance between the input "ヨコハ?シ" and "ヨコハマシ" (Yokohamashi) in word table 7 is 0.5, the unique minimum, so this word is taken as the word matching result.
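The distance calculation and the unique-minimum rule can be sketched as below. The addition values are assumptions: 0 for a match and 1 for a mismatch are stated in the description, while 0.5 for an unreadable character is only inferred from the totals 3.5 and 0.5 given there, since the figure with the addition values is not reproduced. Function names are hypothetical.

```python
# Assumed addition values (the 0.5 is inferred from the worked totals).
MATCH, MISMATCH, REJECTED = 0.0, 1.0, 0.5


def distance(input_word, dict_word):
    """Sum the per-character addition values for two words of equal length."""
    d = 0.0
    for a, b in zip(input_word, dict_word):
        if a == "?":            # unreadable character: no candidate obtained
            d += REJECTED
        elif a == b:
            d += MATCH
        else:
            d += MISMATCH
    return d


def unique_best(input_word, candidates):
    """Return the candidate with the sole smallest distance, or None on a tie."""
    scored = [(distance(input_word, w), w)
              for w in candidates if len(w) == len(input_word)]
    if not scored:
        return None
    best = min(d for d, _ in scored)
    winners = [w for d, w in scored if d == best]
    return winners[0] if len(winners) == 1 else None


print(distance("ヨコハ?シ", "カマクラシ"))                     # 3.5
print(distance("ヨコハ?シ", "ヨコハマシ"))                     # 0.5
print(unique_best("ヨコハ?シ", ["カマクラシ", "ヨコハマシ"]))    # ヨコハマシ
```

Returning None on a tie mirrors the requirement that a result is output only when a single candidate holds the minimum distance.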

Next, the operation of the present invention is described using the second and third words, "トツ!カノ" and "ナ!ノチヨウ", which contain corrections made by striking out. The example here works as follows. First, word matching is performed with the struck-out character removed, in the same way as for the first word; if exactly one word has the minimum distance value, that word is taken as the word matching result. If no word has a unique minimum distance value, the struck-out character is not removed but is treated as a character with no candidate, word matching is performed again, and that result is adopted. Switching between the two is achieved by setting FLG to 0 or 1 in the formula for the word length, L = L1 + (L2 × FLG). Of course, other ways of handling the results of the two matching methods are readily conceivable.

First, the input word "トツ!カノ" is an example in which the correction was written in the character box following the struck-out one. With L1 = 4 characters other than struck-out characters and L2 = 1 struck-out character, word matching is performed with word length L = 4 (FLG = 0). That is, matching is performed on "トツカノ", the word with "!" removed; the distance value d to the dictionary word "トツカク" (Totsukaku) is 1 and is the unique minimum, so that word is taken as the word matching result.

Next, the word "ナ!ノチヨウ" is an example in which the correction was written outside the struck-out character box. Initially L1 = 5 and L2 = 1, and word matching is performed with word length L = 5 (FLG = 0), but no word Wj with a unique minimum value exists, so matching fails. FLG is therefore set to 1, giving a word length L = 6 that includes the struck-out character, and word matching is performed again. As a result, "ナカノチヨウ" (Nakanochiyou) is obtained as the unique minimum-distance word and is taken as the word matching result. Through the above operation, the contents of FIG. 2(c) are output as the word matching output 5 of the word matching processing unit 3.
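The two-pass procedure walked through above can be sketched as follows. This is an illustrative sketch, not the flowchart of FIG. 4: the function names and the four-word dictionary are hypothetical, and the addition value 0.5 for a rejected character is inferred from the totals in the description rather than stated there.

```python
STRUCK_OUT = "!"          # struck-out character mark, as in FIG. 2(b)
REJECTED = {"?", "!"}     # marks scored as "no candidate character"


def distance(probe, dict_word):
    """Sum addition values: 0 match, 1 mismatch, 0.5 rejected (assumed)."""
    d = 0.0
    for a, b in zip(probe, dict_word):
        if a in REJECTED:
            d += 0.5
        elif a != b:
            d += 1.0
    return d


def unique_best(probe, dictionary):
    """Return the sole minimum-distance word of the same length, else None."""
    scored = [(distance(probe, w), w) for w in dictionary if len(w) == len(probe)]
    if not scored:
        return None
    best = min(d for d, _ in scored)
    winners = [w for d, w in scored if d == best]
    return winners[0] if len(winners) == 1 else None


def match_word(ocr_word, dictionary):
    """Try FLG = 0 first, then FLG = 1; return (word, FLG) or (None, None)."""
    l2 = ocr_word.count(STRUCK_OUT)     # number of struck-out characters
    l1 = len(ocr_word) - l2             # number of other characters
    for flg in (0, 1):
        length = l1 + l2 * flg          # L = L1 + (L2 x FLG)
        # FLG = 0: drop struck-out marks; FLG = 1: keep them as rejected.
        probe = ocr_word if flg else ocr_word.replace(STRUCK_OUT, "")
        assert len(probe) == length
        result = unique_best(probe, dictionary)
        if result is not None:
            return result, flg
        if l2 == 0:
            break                       # no struck-out character, nothing to retry
    return None, None


# Hypothetical dictionary built from the words quoted in the description.
DICTIONARY = ["ヨコハマシ", "カマクラシ", "トツカク", "ナカノチヨウ"]
for word in ["ヨコハ?シ", "トツ!カノ", "ナ!ノチヨウ"]:
    print(word, match_word(word, DICTIONARY))
```

With this small dictionary the first pass for "ナ!ノチヨウ" fails because the two five-character words tie; the description states only that no unique minimum exists, so the reason for the failure in the actual embodiment may differ.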

[Effect of the invention]

As described above, because two ways of making corrections with struck-out characters can be handled, the writing restrictions can be relaxed and, at the same time, the accuracy of word matching can be improved.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention; FIGS. 2(a), 2(b), and 2(c) are explanatory diagrams showing the entries on the input form, the OCR reading result, and the output of the word matching processing unit, respectively; FIG. 3 is an explanatory diagram showing an example of the word dictionary; FIG. 4 is a flowchart showing the operation of the embodiment; and FIG. 5 is an explanatory diagram showing the addition values used in word matching.

1: input form, 2: OCR, 3: word matching processing unit, 4: word dictionary, 5: word matching output, 6: word length index, 7: word table

Claims (1)

[Claims]

1. A word matching device comprising a word dictionary in which word data for the set of words to be read is registered in advance, and a word matching processing unit that matches the reading result of a character string containing a struck-out character against the word data, wherein the word matching processing unit comprises first matching means that performs matching based on the reading result for the character string with the struck-out character removed, and second matching means that performs matching with the struck-out character regarded as a rejected character for which no candidate character was obtained.
JP61276804A 1986-11-21 1986-11-21 Word matching device Expired - Lifetime JPH0786908B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61276804A JPH0786908B2 (en) 1986-11-21 1986-11-21 Word matching device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61276804A JPH0786908B2 (en) 1986-11-21 1986-11-21 Word matching device

Publications (2)

Publication Number Publication Date
JPS63131289A JPS63131289A (en) 1988-06-03
JPH0786908B2 true JPH0786908B2 (en) 1995-09-20

Family

ID=17574614

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61276804A Expired - Lifetime JPH0786908B2 (en) 1986-11-21 1986-11-21 Word matching device

Country Status (1)

Country Link
JP (1) JPH0786908B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2723973B2 (en) * 1989-05-29 1998-03-09 沖電気工業株式会社 Character recognition device

Also Published As

Publication number Publication date
JPS63131289A (en) 1988-06-03

Similar Documents

Publication Publication Date Title
US5161245A (en) Pattern recognition system having inter-pattern spacing correction
JPS6140671A (en) Word division processing method
JPH0786908B2 (en) Word matching device
JP2815707B2 (en) Keyword search method
JPS5853791B2 (en) character recognition device
JP2982244B2 (en) Character recognition post-processing method
JPH0256086A (en) Post-processing method for character recognition
JPH0546806A (en) Character recognition method
JPH0355874B2 (en)
JPH06259481A (en) Character string matching method and device having longest matching matching function of same character type
JPH06274701A (en) Word collating device
JP2529421B2 (en) Character recognition device
JP2746345B2 (en) Post-processing method for character recognition
JP2935533B2 (en) Character processing method
JP2839515B2 (en) Character reading system
JPS63138479A (en) Character recognizing device
JPH0259513B2 (en)
JPS6061875A (en) Generation system of standard pattern
JPH03278194A (en) Character recognition processing system
JPH0338787A (en) Character recognition processor
JPS60186959A (en) Control system for dictionary data
JPH01194088A (en) Collating device for character string and word
JPH0357506B2 (en)
Benezky et al. LEXICO Guide No. 7-Headword Classification
JPH03171276A (en) Word reference device