JPH0664624B2 - Optical character reading method

Info

Publication number: JPH0664624B2
Application number: JP59040015A
Authority: JP
Inventors: 廣洲石黒; 章夫深沢
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1984-03-02
Filing date: 1984-03-02
Publication date: 1994-08-22
Anticipated expiration: 2009-08-22
Also published as: JPS60183688A

Description

[Detailed Description of the Invention] ＜Technical Field＞ The present invention relates to an optical character reading method, and more particularly to a character reading method in which an optical character reader (hereinafter, OCR) automatically detects the character entry format and outputs the reading result.

＜Prior Art＞ In recent years, the spread of OCR has greatly improved the efficiency of computer data entry. However, for a form to be readable by a conventional OCR, its entry format, that is, the character entry positions, the number of digits, and so on, had to be supplied in detail beforehand as an OCR reading program or as parameters. The parameters therefore had to be reset every time the entry format changed, and the form itself naturally had to be changed each time as well, so coping with variety became a problem.

That is, as OCR spread, the range of its users widened, and a situation in which OCR was used or managed by specialists changed into one in which non-specialists also handle it. Coping with the variety of forms and reading programs described above has therefore become an increasingly serious problem.

What is desirable, then, is that forms can be prepared and reading programs created as simply as possible. Concrete responses have appeared, such as using PPC paper or creating the reading program merely by having the OCR read an actual form. However, none of these measures improves both the form and the reading program together, so OCR has still not necessarily been convenient for non-specialists.

In general, the ideal is an OCR that freely reads anything without the form format being specified, but this level is considered technically unattainable for the time being.

Conceding one step, having the OCR automatically read groups of characters entered as freely as possible on a general-purpose form is a concrete way to ease the above problems and approach the ideal. "As freely as possible" here means that a very simple constraint, or rule, is set and entry is free within it. If the rule is based on everyday habits, it should be readily accepted in general, which increases practicality. The rule must, however, be logically decipherable by the OCR.

The strictness of the rules and the difficulty of automatic reading by the OCR are naturally correlated, but within the range that the OCR can logically decipher, this can be treated simply as a matter of processing complexity. Proving the limit of the rules that an OCR can logically decipher might appear to be an issue. In practice, however, the limit can be set realistically within the range in which OCR users understand the rules and cooperate, so no rigorous proof is needed; the suitability of the rules is tested only by evaluating the resulting utility.

The general-purpose form provided for entry that is as free as possible may simply have entry boxes laid out uniformly in rows and columns. It can be designed and prepared in advance to suit the device and its users, and the reading program for the whole general-purpose form may be a conventional one built into the OCR beforehand.

＜Object of the Invention＞ An object of the present invention is to provide an optical character reading method that, when reading a general-purpose form filled in according to entry rules, reads all characters on the form including blanks, temporarily stores all of the character information, uses the complete character array to extract and process the entry layout information intended by the writer as it appears on the form, and automatically derives the character data groups on the form and their entry method by checking them against the rules. Without a detailed format program being prepared in advance, the method thereby detects the entry format from the character array on the general-purpose format, removes unnecessary data, and outputs the reading result the writer intended.

＜Structure of the Invention＞ According to the present invention, there is obtained an optical character reader characterized by including: storage means for temporarily storing, in a character array that matches the format of a general-purpose form, all character information obtained by reading, blanks included, the characters and the like entered on the general-purpose form according to entry rules that are set in advance and prescribe only the relative arrangement of the characters and the like to be entered; layout information extraction means for extracting, based on the entry rules, entry layout information indicating the character arrangement on the general-purpose form from all the character information read out of the storage means, and outputting it; format information extraction means for extracting, based on the entry rules, and outputting format information that indicates, for the character array contained in the entry layout information input from the layout information extraction means, the extracted data of the character data fields belonging to the same group entered on the general-purpose form and the entry method of those character data fields; and reading result output means for reading out all the character information stored in the storage means and outputting only the required data based on the format information input from the format information extraction means.

＜Embodiment＞ An embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram of one embodiment of the present invention, in which 1 is an OCR, 2 is a temporary storage device, 3 is a layout information extraction unit, 4 is a format information extraction unit, and 5 is a reading result output unit.

The reading result 11 of a general-purpose form read by an ordinary OCR 1 is stored, blanks included, in the temporary storage device 2 in a character array that matches the format of the general-purpose form. The output 21 of the temporary storage device 2 is supplied to the layout information extraction unit 3. The layout information extraction unit 3 extracts the entry layout information from the character arrangement on the form, that is, from the contents of the temporary storage device 2, and supplies it as output 31 to the format information extraction unit 4. The format information extraction unit 4 then classifies the data belonging to the same group within the character array, confirms the entry positions, and checks for right or left justification, and supplies the result as format information 41 to the reading result output unit 5. Finally, the reading result output unit 5 reads out the contents 21 of the temporary storage, uses the format information 41 to remove unnecessary data, and outputs only the necessary data the writer intended as the final output 51. In this way the data entered on the general-purpose form according to the entry rules is read.

FIG. 2 shows a general-purpose form and an example of character entry in one embodiment of the reading method according to the present invention. As shown in the figure, the general-purpose form S is A5 size in landscape orientation, with a general-purpose format of 20 characters per row and 12 rows in total. The OCR 1 is programmed so that this 20-character by 12-row general-purpose form S is read by the ordinary OCR 1 shown in FIG. 1. Such a general-purpose form can readily be read by a commercially available OCR, so a detailed description is omitted.
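As a rough illustration of the 20-character by 12-row array described above, the following Python sketch pads whatever lines an OCR returns into a fixed grid so that blanks are kept as characters. The function name `store_form` and the use of a space for an empty box are assumptions made for illustration; the patent does not specify a data structure.

```python
ROWS, COLS = 12, 20  # form size given in the text: 12 rows of 20 characters


def store_form(lines, rows=ROWS, cols=COLS):
    """Return a list of `rows` strings, each exactly `cols` characters long.

    Short or missing lines are padded with spaces so that every entry box,
    including empty ones, has a position in the stored array.
    """
    if len(lines) > rows:
        raise ValueError("more lines than rows on the form")
    grid = []
    for line in lines:
        if len(line) > cols:
            raise ValueError("line longer than the form width")
        grid.append(line.ljust(cols))
    # Rows the OCR did not report are stored as fully blank rows.
    grid.extend(" " * cols for _ in range(rows - len(lines)))
    return grid
```

Keeping every row at full width means a column number can later be used directly as a string index (minus one), which is what the later processing steps rely on.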

In FIG. 2, the numbers 1 to 20 along the top of the form are column numbers, and the numbers along the left side are row numbers. Each square is a character entry box, and the numerals and other marks inside the boxes are examples of entered characters. The entered characters are not limited to numerals and may be anything the OCR can read.

The characters in the figure are entered according to the following entry rules in this embodiment.

Rule 1: When the format changes from one row to the next, insert a blank row. (The blank rows in the figure are examples.)

Rule 2: Within a row, a blank marks a field boundary. The leftmost and rightmost columns, however, count as boundaries even without a blank.

Rule 3: Left and right justification are judged as follows.
A) If the left ends of the field are aligned, the field is left-justified.
B) If the left ends are not aligned and the right ends are aligned, the field is right-justified.

Rules 1 to 3 are almost the same as the way data is written down in daily use and are general enough to be used between people, so restricting them for OCR has little effect on everyday work. In the description of Rules 1 to 3, a field means one item of data; for example, "100" in columns 18 to 20 of a row in FIG. 2 is one field.

Rule 3 has a supplementary provision. Suppose a field is five digits long for business purposes but, when a particular form is filled in, the right-justified data happens to have only three digits. Care is then needed, with Rule 3 in mind, so that the justification is not misjudged. Take the field in columns 16 to 20 in FIG. 2. This field has five digits, but if the right-justified data "100", "210", and "350" are entered as three digits each, Rule 3 A) makes the field left-justified. If "100" is instead entered as "00100", as in the figure, it becomes five-digit data and the field is consequently judged right-justified. This is made a supplementary rule.
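The effect of the zero padding can be checked with a small Python sketch. It applies Rule 3 to the column extents of one field, one `(left, right)` pair per row. The function name `judge_justification` and the "entry error" result for a field aligned on neither side are assumptions made for illustration, not taken from the patent's flowcharts.

```python
def judge_justification(blocks):
    """Apply Rule 3 to one field.

    `blocks` holds one (left, right) pair of 1-based column numbers per row,
    giving the extent of the non-blank characters entered in that row.
    """
    lefts = {left for left, _ in blocks}
    rights = {right for _, right in blocks}
    if len(lefts) == 1:       # Rule 3 A): left ends aligned
        return "left"
    if len(rights) == 1:      # Rule 3 B): left ends differ, right ends aligned
        return "right"
    return "entry error"      # assumption: neither end aligned is an error


# Three-digit values entered as-is in columns 18 to 20: both ends line up,
# so Rule 3 A) is checked first and the field is judged left-justified.
unpadded = [(18, 20), (18, 20), (18, 20)]

# Top row entered as "00100" in columns 16 to 20, the other rows unchanged.
padded = [(16, 20), (18, 20), (18, 20)]

print(judge_justification(unpadded), judge_justification(padded))
```

Because Rule 3 A) is tested before Rule 3 B), a column of equal-length values always reads as left-justified; widening a single row is enough to break the left alignment and let the right alignment decide.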

Supplementary rule: For right-justified data, when the entered data is shorter than the field length, add as many "0"s on the left as needed to satisfy Rule 3.

(This is needed only in the topmost row.)

Next, the relationship between the entered data and the rules, and the definition of the data groups, are described in detail with reference to FIG. 3. FIG. 3 shows the entry example of FIG. 2 in more detail. In FIG. 3, three rows are entirely blank, and in accordance with Rule 1 the data is divided into three data groups D₁, D₂, and D₃, each occupying the consecutive rows between blank rows. One further row is margin.

In accordance with Rule 2, group D₁ consists of three data fields D₁₁, D₁₂, and D₁₃; in accordance with Rule 3, D₁₁ and D₁₂ are four-digit right-justified data and D₁₃ is three-digit left-justified data. Here △ in D₁₁ and D₁₂ denotes a blank. For example, in D₁₁, "△△13" and "1658" are two data items of the same kind, both four digits; "△△13" is the same as "13", with △ (blank) added for explanation. The same applies to D₁₂ and D₁₃. Data groups D₂ and D₃ are likewise as shown in the figure.

The data in FIG. 3 therefore comprises: two rows of data, each consisting of two fields of four-digit right-justified data and one field of three-digit left-justified data; three rows of data, each consisting of one field of three-digit right-justified data and one field of five-digit right-justified data; and three rows of data, each consisting of one field of 13-digit left-justified data and one field of three-digit left-justified data.

Next, the processing flow is described with reference to FIG. 4. The processing in FIG. 4 is the processing performed after the OCR has read all the characters and they have been temporarily stored. The layout information extraction processing is performed by the layout information extraction unit 3 of FIG. 1: from the contents of the temporary storage device 2 it detects the left and right ends of each block row by row, detects blank rows as well, and then detects the data groups.

The format information extraction processing of FIG. 4 is performed by the format information extraction unit 4 of FIG. 1: for each data group it detects the left and right ends of each field and checks for left or right justification. The reading result output processing of FIG. 4 is performed by the reading result output unit 5 of FIG. 1: only the necessary data is extracted from the contents of the temporary storage device 2 according to the format information and output. These processes are described in detail below.

FIG. 5 is an explanatory diagram showing an example of the layout table created during the layout information extraction of FIG. 4. For the form of FIG. 3 it shows, over the whole page, the detected left-end and right-end column numbers of the blocks, that is, the characters of fields 1, 2, and 3 of each row with blanks removed. For example, the entry "5-6" for block 1 of a row in FIG. 5 means that "5" is the left-end column and "6" is the right-end column. In the "△△13" field of D₁₁ in FIG. 3, the block with blanks removed is "13"; its left end, "1", is in column 5, which is the left end of the block, and its right end, "3", is in column 6, which is the right end of the block. The other entries are read in the same way. A block here means the part of a field that remains after blanks are removed. A △ in the layout table indicates that all columns are blank, that is, a blank row.
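The block detection described above can be sketched in Python as a single scan over one stored row. This is a minimal sketch, assuming a blank box is stored as a space and columns are numbered from 1; the name `find_blocks` is hypothetical and the code is not a transcription of the FIG. 7 flowchart.

```python
def find_blocks(row):
    """Return the (left, right) column numbers of each run of non-blank
    characters in `row`. Columns are 1-based and inclusive. An empty list
    means the whole row is blank."""
    blocks = []
    start = None  # left-end column of the block being scanned, if any
    for col, ch in enumerate(row, start=1):
        if ch != " " and start is None:
            start = col                      # a block begins here
        elif ch == " " and start is not None:
            blocks.append((start, col - 1))  # the block ended one column back
            start = None
    if start is not None:
        # The rightmost column closes a block even without a trailing blank,
        # in line with the exception in Rule 2.
        blocks.append((start, len(row)))
    return blocks


# "13" entered in columns 5 and 6 of a 20-column row gives the entry "5-6".
print(find_blocks(" " * 4 + "13" + " " * 14))
```

Applying the function to every stored row in order yields one list per row, which plays the role of the layout table; a blank row shows up as an empty list.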

The table shown in FIG. 5 is produced by the left-and-right-end detection processing for each block, whose flowchart is shown in FIG. 7. In that flowchart, "read" means reading out the contents of the temporary storage device 2, and "column" means each character position. "Set" means registering the row number and block number for building the table of FIG. 5, and "store" means writing to the table.

The processing shown in FIG. 8 follows that of FIG. 7 and uses the layout table of FIG. 5 created by the processing of FIG. 7. It reads out and checks the information for each row of the layout table and, treating blank rows as separators, registers each data group and extracts the row numbers belonging to each group, creating the left-hand part T₁ of the format table shown in FIG. 6. In other words, blank-row detection and data-group detection are performed here.
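A minimal Python sketch of this grouping step follows. It assumes the layout table is held as one list of blocks per row, with an empty list standing for a blank row; the name `find_groups` and the row positions used in the example are hypothetical.

```python
def find_groups(layout):
    """Split rows into data groups, using blank rows as separators.

    layout: one list of (left, right) blocks per row; [] marks a blank row.
    Returns a list of groups, each a list of 1-based row numbers.
    """
    groups = []
    current = []  # row numbers of the group being collected
    for row_no, blocks in enumerate(layout, start=1):
        if blocks:
            current.append(row_no)
        elif current:
            groups.append(current)  # a blank row closes the open group
            current = []
    if current:
        groups.append(current)      # group that runs to the last row
    return groups


# Hypothetical layout: two data rows, a blank row, three data rows.
layout = [[(5, 6)], [(3, 6)], [], [(1, 3)], [(1, 3)], [(1, 3)]]
print(find_groups(layout))
```

Consecutive blank rows and trailing margin rows produce no empty groups, because a group is only closed when it already contains at least one row.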

FIG. 9 is a flowchart showing the details of the format information extraction processing shown in FIG. 4. It follows the processing shown in FIG. 8 and creates the right-hand part T₂ of the format table shown as an example in FIG. 6. In part T₂ of FIG. 6, fields 1, 2, and 3 within each data group are the result extracted from the minimum left-end value and maximum right-end value of blocks 1, 2, and 3 of the layout table of FIG. 5, and consist of the left and right end values (left-right) and the result of the left/right justification judgment (left/right). For example, field 1 of data group D₁ has end values "3-6" and is "right-justified" over its two rows. This is extracted by the processing of FIG. 9 from the block 1 data "5-6" and "3-6" for those two rows in the layout table of FIG. 5: the minimum left end is "3", the maximum right end is "6", and the right ends are aligned, that is, they have the same value. The "entry error" in FIG. 9 detects violations of Rules 1 to 3; it is better to have it in practice, and the operator can be notified by a display or similar means.
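The extraction of field extents and justification can be sketched in Python as follows. The block values for field 1 come from the text ("5-6" and "3-6"); the values for fields 2 and 3 in the example, the name `extract_fields`, and the specific conditions treated as entry errors are assumptions made for illustration rather than the contents of FIG. 9.

```python
def extract_fields(rows_blocks):
    """Derive (left, right, justification) for each field of one data group.

    rows_blocks: one list of (left, right) blocks per row of the group,
    where block k of every row is taken to belong to field k.
    """
    counts = {len(blocks) for blocks in rows_blocks}
    if len(counts) != 1:
        # Assumption: rows of one group must have the same number of fields.
        raise ValueError("entry error: rows of a group differ in field count")
    fields = []
    for column_blocks in zip(*rows_blocks):  # block k of every row
        lefts = [left for left, _ in column_blocks]
        rights = [right for _, right in column_blocks]
        if len(set(lefts)) == 1:             # Rule 3 A)
            side = "left"
        elif len(set(rights)) == 1:          # Rule 3 B)
            side = "right"
        else:
            raise ValueError("entry error: field is aligned on neither side")
        # Field extent: minimum left end and maximum right end over the rows.
        fields.append((min(lefts), max(rights), side))
    return fields


# Field 1 as in the text; fields 2 and 3 are made-up values.
group_d1 = [
    [(5, 6), (12, 13), (18, 20)],
    [(3, 6), (10, 13), (18, 19)],
]
print(extract_fields(group_d1))
```

Taking the minimum left end and maximum right end gives the full width of the field even when no single row fills it, which is why short entries in a row still map to the same column range.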

After the processing of FIG. 9, the reading result output processing of FIG. 4 is performed. This processing uses the target row numbers in the format table of FIG. 6 and the format information of each field, in particular the column numbers given by the left and right end values, to take the contents out of the temporary storage device 2 and output them. For the form shown in FIG. 3, for example, in the two rows of data group D₁, columns 3 to 6 are taken out and output as the four-digit field 1, columns 10 to 13 as the four-digit field 2, and columns 18 to 20 as the three-digit field 3. Doing the same for data groups D₂ and D₃ outputs, for one page, only the necessary data the writer intended.

In this case the field 1 data of one row, for example, becomes "△△13"; the blanks are output automatically so that the digits line up with the second row, and the same applies to the other fields.
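The output step amounts to slicing each stored row by the column range of each field, as in the following Python sketch. It assumes rows are stored as full-width strings with spaces for blanks; the name `read_fields` and the row numbers in the example are hypothetical.

```python
def read_fields(grid, row_numbers, fields):
    """Cut each field out of the stored rows by column range.

    grid: list of stored row strings, blanks included
    row_numbers: 1-based row numbers belonging to one data group
    fields: (left, right) pairs of 1-based, inclusive column numbers
    """
    out = []
    for row_no in row_numbers:
        row = grid[row_no - 1]
        # Slicing by the field extent keeps leading or trailing blanks,
        # so every row yields values of the same width.
        out.append([row[left - 1:right] for left, right in fields])
    return out


grid = ["    13".ljust(20), "  1658".ljust(20)]
print(read_fields(grid, [1, 2], [(3, 6)]))
```

Here the first row yields "13" preceded by two spaces, which corresponds to the "△△13" output described in the text.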

In the above description, the tables of information extracted during the various processes were presented as examples matched to the sample form. As the flowcharts show, however, the processes themselves are general-purpose flows, so forms other than the sample can clearly be processed as well.

As stated above, the entry rules need not be those of the embodiment as long as they can be turned into an algorithm, but it is desirable to preserve the naturalness of data entry.

＜Effects of the Invention＞ With the character reading method of the present invention, forms need not be designed individually for each application, so it is extremely useful for making OCR simple to use.

[Brief description of drawings]

FIG. 1 is a block diagram of one embodiment of the character reading method according to the present invention; FIG. 2 shows an example of a general-purpose form and character entry; FIG. 3 is a schematic diagram showing the temporarily stored contents of the reading result of the form shown in FIG. 2; FIG. 4 outlines the processing flow of the present invention; FIG. 5 shows an example of the extracted layout information in table form; FIG. 6 shows an example of the detected format information in table form; FIG. 7 is a flowchart of the processing that extracts the layout information; and FIGS. 8 and 9 show, in two parts, the flowchart of the format information detection processing. Reference numeral 1 is an OCR, 2 is a temporary storage device, 3 is a layout information extraction unit, 4 is a format information extraction unit, and 5 is a reading result output unit.

Claims

[Claims]

1. An optical character reader comprising: storage means for temporarily storing, in a character array that matches the format of a general-purpose form, all character information obtained by reading, blanks included, characters and the like entered on the general-purpose form according to an entry rule that is set in advance and prescribes only the relative arrangement of the characters and the like to be entered; layout information extraction means for extracting, based on the entry rule, entry layout information indicating the character arrangement on the general-purpose form from all the character information read out of the storage means, and outputting the entry layout information; format information extraction means for extracting, based on the entry rule, and outputting format information that indicates, for the character array contained in the entry layout information input from the layout information extraction means, the extracted data of character data fields belonging to the same group entered on the general-purpose form and the entry method of the character data fields; and reading result output means for reading out all the character information stored in the storage means and outputting only required data based on the format information input from the format information extraction means.