JPH0252315B2

Info

Publication number: JPH0252315B2
Application number: JP56177719A
Authority: JP
Inventors: Masaki Komya
Original assignee: Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1981-11-05
Filing date: 1981-11-05
Publication date: 1990-11-13
Also published as: JPS5878276A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to an optical character reading device, and in particular to one with an improved dictionary selection method.

Optical character reading devices (hereinafter referred to as OCR) used for input to English word processors are currently becoming widespread. Conventionally, however, the character types that such OCR devices can read have in most cases been limited to OCR fonts, and few devices could handle the office fonts in common use. In addition, multi-font OCR devices that can read various fonts cost far more than a word processor, so the price is out of balance and they have been unsuitable as input OCR for word processors. Furthermore, when a multi-font OCR device was used, the font had to be selected by a program or a switch.

The present invention was made in view of the above circumstances. Its object is to provide an optical character reading device that switches the reading dictionary based on predetermined characters on a form and can thereby read multiple types of fonts.

Hereinafter, one embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a schematic configuration diagram of an embodiment of the present invention. In the figure, reference numeral 1 denotes a photoelectric conversion unit. The photoelectric conversion unit 1 photoelectrically converts the characters on a form and outputs them as photoelectrically converted character patterns. Reference numeral 2 denotes a recognition unit. The recognition unit 2 recognizes the photoelectrically converted character patterns. Reference numeral 3 denotes a dictionary index table. The dictionary index table 3 stores the start address of the storage area of the dictionary corresponding to each font. Reference numeral 4 denotes a multi-font dictionary. The multi-font dictionary 4 consists of independent dictionaries, one for each of a plurality of fonts. Reference numeral 5 denotes an accumulator. For the characters on the first line of the form, the accumulator 5 stores the similarity values that result from matching against the standard patterns in each font's dictionary in the multi-font dictionary 4.
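The relationship between the dictionary index table 3, the multi-font dictionary 4, and the accumulator 5 can be sketched in Python as follows. This is only an illustrative model, not the patented hardware: the class names, the `lookup` helper, the addresses, and the 4-pixel bitmaps are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field

Pattern = tuple[int, ...]  # assumed: a flattened binary character bitmap


@dataclass
class MultiFontDictionary:
    """Element 4: independent per-font dictionaries, keyed here by start address."""
    areas: dict[int, dict[str, Pattern]] = field(default_factory=dict)


@dataclass
class DictionaryIndexTable:
    """Element 3: font name -> start address of that font's dictionary area."""
    start_address: dict[str, int] = field(default_factory=dict)


@dataclass
class Accumulator:
    """Element 5: one stored similarity total per font dictionary."""
    totals: dict[str, float] = field(default_factory=dict)


def lookup(index: DictionaryIndexTable, store: MultiFontDictionary,
           font: str) -> dict[str, Pattern]:
    """Resolve a font to its dictionary by going through the index table."""
    return store.areas[index.start_address[font]]


# Hypothetical contents: addresses and patterns are made up for illustration.
store = MultiFontDictionary(areas={
    0x0000: {"A": (0, 1, 1, 0)},  # stand-in for a Courier12 dictionary
    0x0400: {"A": (1, 1, 1, 0)},  # stand-in for a Prestige Pica dictionary
})
index = DictionaryIndexTable(
    start_address={"Courier12": 0x0000, "PrestigePica": 0x0400}
)
```

Keeping the index table separate from the dictionaries mirrors the description: a font's dictionary can be added or removed by changing one table entry, without touching the other dictionaries.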

Next, the operation of the above embodiment will be described. First, on a form used with this OCR, as shown in FIG. 2, a specific predetermined character sequence, in this case the character string ABC, is printed on the first line in the font used on that form. Font types include Courier12, Courier72, Prestige Elite72, and Prestige Pica. From the second line onward, data and text may be printed freely in the same font.

The characters on such a form are photoelectrically converted by the photoelectric conversion unit 1. The specific characters printed on the first line of the form are converted first. The recognition unit 2 refers to the contents of the dictionary index table 3, reads from one dictionary in the multi-font dictionary 4 the standard patterns corresponding to the characters printed on the first line, and matches these standard patterns against the photoelectrically converted character patterns of the first line. The result of this matching, in this case the sum of the similarity values of the three characters on the first line, is stored in the accumulator 5. The same operation is repeated for each dictionary in the multi-font dictionary 4, and the sum of the three characters' similarity values for each dictionary is stored in the accumulator 5 in turn. The recognition unit 2 then compares the per-dictionary sums of similarity values stored in the accumulator 5, and the dictionary with the largest sum is selected from the multi-font dictionary 4.
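The selection procedure described above can be sketched in Python as follows. The description does not specify how a similarity value is computed, so the pixel-agreement measure below is an assumption, as are the function names, the font keys, and the toy 4-pixel patterns.

```python
def similarity(pattern, template):
    """Assumed measure: fraction of matching pixels in two equal-length bitmaps."""
    matches = sum(p == t for p, t in zip(pattern, template))
    return matches / len(template)


def select_dictionary(scanned, key_string, dictionaries):
    """Return (best_font, accumulator) for the scanned first-line patterns.

    scanned      -- character patterns from the first line, in order
    key_string   -- the predetermined string printed there, e.g. "ABC"
    dictionaries -- font name -> {character: standard pattern}
    """
    accumulator = {}
    for font, templates in dictionaries.items():
        # Sum the similarity values of the key characters for this dictionary.
        accumulator[font] = sum(
            similarity(pattern, templates[char])
            for pattern, char in zip(scanned, key_string)
        )
    # The dictionary with the largest sum is selected.
    best_font = max(accumulator, key=accumulator.get)
    return best_font, accumulator


# Hypothetical standard patterns, made up for illustration.
dictionaries = {
    "Courier12": {"A": (0, 1, 1, 0), "B": (1, 1, 0, 1), "C": (0, 1, 1, 1)},
    "PrestigePica": {"A": (1, 1, 1, 0), "B": (1, 0, 0, 1), "C": (1, 1, 1, 1)},
}
scanned = [(1, 1, 1, 0), (1, 0, 0, 1), (0, 1, 1, 1)]
best, acc = select_dictionary(scanned, "ABC", dictionaries)
print(best, acc)
```

Because the key string is known in advance, only the standard patterns for those characters are read from each dictionary, which is what keeps the selection step short.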

From the second line onward, recognition proceeds using the selected dictionary. If unrecognizable characters occur frequently in some later line, dictionary selection is performed again for that line. This reselection is based on the number of unrecognizable characters and the similarity value of each character in that line.
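One possible reading of this reselection step is sketched below in Python. The description gives neither the rejection threshold nor the exact way the two criteria are combined, so `REJECT_THRESHOLD`, `MAX_REJECTS`, the ranking rule (fewest unrecognizable characters first, then highest total similarity), and all names and patterns are assumptions.

```python
REJECT_THRESHOLD = 0.8  # assumed: below this, a character counts as unrecognizable
MAX_REJECTS = 1         # assumed: more rejects than this triggers reselection


def similarity(pattern, template):
    """Assumed measure: fraction of matching pixels in two equal-length bitmaps."""
    matches = sum(p == t for p, t in zip(pattern, template))
    return matches / len(template)


def score_line(line, templates):
    """Return (reject_count, total_similarity) for one line against one dictionary."""
    rejects, total = 0, 0.0
    for pattern in line:
        best = max(similarity(pattern, t) for t in templates.values())
        total += best
        if best < REJECT_THRESHOLD:
            rejects += 1
    return rejects, total


def reselect_if_needed(line, current_font, dictionaries):
    """Keep the current dictionary unless the line has too many rejects."""
    rejects, _ = score_line(line, dictionaries[current_font])
    if rejects <= MAX_REJECTS:
        return current_font

    def rank(font):
        r, total = score_line(line, dictionaries[font])
        return (r, -total)  # fewer rejects first, then higher total similarity

    return min(dictionaries, key=rank)


# Hypothetical standard patterns, made up for illustration.
dictionaries = {
    "Courier12": {"A": (0, 1, 1, 0), "B": (1, 1, 0, 1), "C": (0, 1, 1, 1)},
    "PrestigePica": {"A": (1, 1, 1, 0), "B": (1, 0, 0, 1), "C": (1, 1, 1, 1)},
}
pica_line = [(1, 1, 1, 0), (1, 0, 0, 1), (1, 1, 1, 1)]
courier_line = [(0, 1, 1, 0), (1, 1, 0, 1), (0, 1, 1, 1)]
print(reselect_if_needed(pica_line, "Courier12", dictionaries))
```

In this sketch the current dictionary is kept whenever the line reads cleanly, so the cost of scoring every dictionary is paid only on lines where recognition is already failing.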

Such an OCR therefore provides the following effects.

(1) Because the characters needed to select a dictionary are fixed in advance, only a limited set of standard character patterns has to be read from each dictionary, so dictionary selection takes little time.

(2) Compared with building a single mixed dictionary covering various fonts, the dictionary is easy to create: it is simply a collection of independent dictionaries, one per font, and similar characters across fonts need not be taken into account.

(3) Compared with a mixed multi-font dictionary, reading is more accurate and recognition is faster.

(4) The operator only has to print the predetermined character string on the first line of the form with the type head in use at the time, so the burden on the operator is small.

(5) The multi-font dictionary 4 can be assembled from only the font dictionaries that are needed, so individual fonts can easily be offered as options.

As described above, the present invention provides an optical character reading device that switches the reading dictionary based on predetermined characters on a form and can thereby read multiple types of fonts.

[Brief Description of the Drawings]

FIG. 1 is a schematic configuration diagram of an embodiment of the present invention, and FIG. 2 is a diagram showing a form used in that embodiment. 1: photoelectric conversion unit, 2: recognition unit, 3: dictionary index table, 4: multi-font dictionary, 5: accumulator.

Claims

[Claims]

1. An optical character reading device comprising: a photoelectric conversion unit that photoelectrically converts characters on a form; a multi-font dictionary consisting of a plurality of dictionaries, one corresponding to each font; an accumulator that stores, for each dictionary, a similarity value obtained by referring to each of the plurality of dictionaries for the photoelectrically converted character pattern of a specific character at a specific position on the form; and recognition means that evaluates the similarity values, selects from the multi-font dictionary the dictionary corresponding to the highest similarity value stored in the accumulator, and recognizes the photoelectrically converted character patterns by referring to the selected dictionary.