JPH0795332B2 - Character string reader - Google Patents
Character string reader
- Publication number
- JPH0795332B2
- Authority
- JP
- Japan
- Prior art keywords
- character
- character string
- recognition
- string
- rotation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Character Input (AREA)
- Character Discrimination (AREA)
Description
DETAILED DESCRIPTION OF THE INVENTION
[Object of the Invention]
(Field of Industrial Application)
The present invention relates to a character string reader that optically reads and recognizes character strings written on forms.
(Prior Art)
The optical character reader (OCR) is a known device for reading and inputting character strings written on forms. Such a device typically reads, optically and as character images, the character strings written in the character entry boxes of a predetermined layout, recognizes each character image, and inputs the resulting information (the recognition result).
Recently, however, there has been strong demand for character readers that can also read and input character strings written freely (in free format) on forms with no fixed layout. Without a fixed layout, the direction of the character string and the orientation of each character are unknown, which makes reading and recognizing the string extremely difficult. Japanese, in particular, can be written either vertically or horizontally. Processing an optically read character string image therefore requires first determining the orientation of each character in the string, then determining the direction in which the characters are arranged, and only then recognizing each character. As a result, the procedure is not only complicated but also prone to misrecognition and recognition rejects.
(Problems to Be Solved by the Invention)
As described above, to read and input a character string from a form on which it is written in free format, the conventional approach checks the orientation of each character, recognizes that character, and then evaluates the resulting sequence of recognition results as a whole to input the string. This makes the procedure complicated.
The present invention was made in view of these circumstances, and its object is to provide a character string reader that can read and input character strings written in free format simply and efficiently.
[Structure of the Invention]
(Means for Solving the Problems)
The present invention comprises: photoelectric conversion means that reads a character string written on a form; character detection and extraction means that extracts character images one at a time, in order, from the character string image read by the photoelectric conversion means; character rotation recognition means that rotates each extracted character image in 90° steps relative to the character reading direction of the photoelectric conversion means and obtains a character recognition result for each rotation angle of each character image; output means that collects the recognized characters having the same rotation angle and outputs them as a character string candidate for each angle; a character string dictionary that stores in advance the character strings to be read; and character string identification means that obtains the identification result of the character string by comparing the character string candidates output by the output means with the target character strings stored in the character string dictionary.
(Operation)
The invention reads a character string written on a form, extracts character images one at a time, in order, from the read character string image, rotates each extracted character image in 90° steps relative to the character reading direction of the photoelectric conversion means, and obtains a character recognition result for each rotation angle. It then collects the recognized characters having the same rotation angle to obtain a character string candidate for each angle, and compares these candidates with a character string dictionary stored in advance to obtain the identification result for the string as a whole.
(Embodiment)
An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a schematic block diagram of a character string reader according to an embodiment of the present invention. Reference numeral 1 denotes a photoelectric conversion unit that optically captures the character string information written on a form as a form image (character string image). From the input form image, the character detection/extraction unit 2 computes, as shown for example in FIG. 2, the projection in the direction in which the characters are arranged and the projection in the perpendicular direction. From these it detects the character string region and the image of each character in the string, and cuts the character images out in order, for example along the arrangement direction. The character detection/extraction unit 2 stores the character images extracted from the form image, one after another, in the pattern memory 3 for subsequent reading and input.
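The projection-based segmentation described for unit 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a binarized image stored as a list of 0/1 rows, a single horizontally arranged line of text, and characters separated by at least one blank column. The names `runs` and `segment_characters` are hypothetical. A vertical line could be handled by transposing the image first.

```python
from typing import List, Tuple

# Assumed representation: a binarized image as rows of 0/1 pixels (1 = ink).
Bitmap = List[List[int]]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of consecutive non-zero entries, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(image: Bitmap) -> List[Bitmap]:
    """Cut a single horizontal text line into character bitmaps, left to right."""
    # Row sums (projection onto the vertical axis) locate the rows holding the string.
    row_spans = runs([sum(row) for row in image])
    if not row_spans:
        return []  # blank image: no characters
    top, bottom = row_spans[0]  # assumption: only one text line is present
    band = image[top:bottom]
    # Column sums (projection onto the horizontal axis) split the band at blank columns.
    col_profile = [sum(column) for column in zip(*band)]
    return [[row[left:right] for row in band] for left, right in runs(col_profile)]
```

Characters whose components are separated by blank columns would be split by this simple gap rule; a practical segmenter would need extra merging logic, which the text does not describe.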
The character rotation identification unit 4 basically recognizes the character images stored in the pattern memory 3 by referring to the character dictionary 5. In doing so, it rotates each character image by a predetermined angle relative to the reading direction of the photoelectric conversion unit 1, for example in 90° steps, on the assumption that the character may face any of four directions (up, down, left, or right), and obtains a recognition result for each direction by referring to the character dictionary 5.
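The rotate-then-recognize step of unit 4 can be sketched as below. Since the text leaves the recognition method open, a nearest-template matcher that counts differing pixels stands in for the character dictionary 5; this matcher, the clockwise rotation convention, and the names `rotate90`, `rotations`, `distance`, and `recognize_all_angles` are all assumptions made for illustration.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # assumed: rows of 0/1 pixels
ANGLES = (0, 90, 180, 270)


def rotate90(bitmap: Bitmap) -> Bitmap:
    """Rotate a bitmap 90 degrees clockwise."""
    return [list(row) for row in zip(*bitmap[::-1])]


def rotations(bitmap: Bitmap) -> Dict[int, Bitmap]:
    """Return the bitmap rotated clockwise by 0, 90, 180 and 270 degrees."""
    result: Dict[int, Bitmap] = {}
    current = bitmap
    for angle in ANGLES:
        result[angle] = current
        current = rotate90(current)
    return result


def distance(a: Bitmap, b: Bitmap) -> int:
    """Number of differing pixels; bitmaps of different shape are treated as far apart."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 10**9
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def recognize_all_angles(bitmap: Bitmap, templates: Dict[str, Bitmap],
                         k: int = 2) -> Dict[int, List[str]]:
    """For every rotation angle, return the k templates closest to the rotated image."""
    results: Dict[int, List[str]] = {}
    for angle, image in rotations(bitmap).items():
        ranked = sorted(templates, key=lambda ch: distance(image, templates[ch]))
        results[angle] = ranked[:k]
    return results
```

For a glyph captured lying on its side (clockwise), the 270° rotation restores it, so the correct template ranks first at that angle; the other angles still produce candidates, just as the unrotated image in the example below yields 「田」 and 「母」.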
Specifically, when a character string representing a personal name is given as illustrated in FIG. 2, the unit assumes that its first character may face any of the four directions (up, down, left, or right), rotates that character image in 90° steps to obtain the four character images shown in FIG. 3, and recognizes each of them. This recognition (identification) processing yields, for example, candidates such as 「田」 and 「母」 from the unrotated (0°) character image, and candidates such as 「由」 and 「申」 from the character image rotated by 90°, which reads as 「由」. Similarly, candidates such as 「田」 and 「母」 are obtained from the character image rotated by 180°, and candidates such as 「甲」 and 「用」 from the character image rotated by 270°, which reads as 「甲」.
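Collecting the recognized characters that share a rotation angle into one candidate string per angle (the output means) amounts to regrouping the per-character results. In this sketch the first character's candidates are taken from the example above, while the second character's candidates are hypothetical; treating each angle's rows as ranked candidate strings (row 0 from every character's best match) is an assumption, as is the name `candidate_strings`.

```python
from typing import Dict, List

# Assumed structure for one character: rotation angle -> ranked candidate characters.
Candidates = Dict[int, List[str]]


def candidate_strings(per_char: List[Candidates]) -> Dict[int, List[str]]:
    """Regroup per-character results into candidate strings, one list per angle.

    Row r for an angle joins the r-th candidate of every character at that angle,
    so row 0 is the string built from each character's best match.
    """
    if not per_char:
        return {}
    strings: Dict[int, List[str]] = {}
    for angle in per_char[0]:
        depth = min(len(chars[angle]) for chars in per_char)
        strings[angle] = ["".join(chars[angle][r] for chars in per_char)
                          for r in range(depth)]
    return strings


first = {0: ["田", "母"], 90: ["由", "申"], 180: ["田", "母"], 270: ["甲", "用"]}
second = {0: ["町", "脈"], 90: ["良", "艮"], 180: ["畑", "然"], 270: ["艮", "民"]}  # hypothetical
rows = candidate_strings([first, second])
```

Here `rows[90]` is `["由良", "申艮"]`: the 90° row built from each character's best match is the one that would later be looked up in the dictionary.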
The character string identification unit 6 compares the sequence of character recognition results obtained for each assumed direction with the recognition target strings (for example, personal names or words) registered in the character string dictionary 7, and determines whether a matching string exists in the dictionary 7.
Specifically, as shown in FIG. 4, the recognition result sequences obtained with 0° rotation (「田町然向」, 「母脈興何」, ...), with 90° rotation (「由良浩司」, 「田町減団」, ...), with 180° rotation (「田畑単凶」, ...), and so on are each collated with the character strings registered in the character string dictionary 7, as shown in FIG. 5, to find a matching string. In this case, the matching entry is found in the character string dictionary 7 for the sequence 「由良浩司」 obtained with 90° rotation, so the input character string is identified as 「由良浩司」 (Koji Yura).
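The collation against the character string dictionary 7 can be sketched as an exact lookup of each angle's candidate rows. The rows below are the FIG. 4 sequences quoted above; the second dictionary entry 「山田太郎」 is hypothetical, and trying angles in ascending order and returning the first hit are assumptions. This sketch compares rows exactly as given and does not apply the reversal rule for 180° and 270° described in the text.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def identify(rows_by_angle: Dict[int, List[str]],
             dictionary: Iterable[str]) -> Optional[Tuple[int, int, str]]:
    """Return (angle, rank, string) for the first candidate row found in the dictionary.

    Angles are tried in ascending order and rows in rank order; None means no match.
    """
    entries = set(dictionary)
    for angle in sorted(rows_by_angle):
        for rank, row in enumerate(rows_by_angle[angle]):
            if row in entries:
                return angle, rank, row
    return None


fig4_rows = {
    0: ["田町然向", "母脈興何"],
    90: ["由良浩司", "田町減団"],
    180: ["田畑単凶"],
}
match = identify(fig4_rows, ["由良浩司", "山田太郎"])  # second entry is hypothetical
```

`match` is `(90, 0, "由良浩司")`, mirroring the example: the best-candidate row at 90° is the one found in the dictionary.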
When comparing the recognition result sequence for each direction with the strings registered in the character string dictionary 7, the device assumes that Japanese is written top to bottom in vertical writing and left to right in horizontal writing. Under this assumption, the characters of the sequences obtained with 180° and 270° rotation are taken in reverse order before collation with the character string dictionary 7. For example, the sequence obtained with 180° rotation is collated not as 「田畑単凶」 but as 「凶単畑田」.
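The reversal rule can be sketched as a preprocessing step applied to the candidate rows before lookup. When a string appears upside down relative to the scan, its characters are extracted last-first, so flipping the order restores the reading order; the set of reversed angles follows the text, while the function name `reading_order` is hypothetical.

```python
from typing import Dict, List

# Per the text, rows obtained at these rotation angles are read in reverse order.
REVERSED_ANGLES = {180, 270}


def reading_order(rows_by_angle: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Reverse the candidate strings for 180 and 270 degrees before dictionary lookup."""
    return {
        angle: [row[::-1] for row in rows] if angle in REVERSED_ANGLES else list(rows)
        for angle, rows in rows_by_angle.items()
    }


ordered = reading_order({0: ["田町然向"], 180: ["田畑単凶"]})
```

`ordered[180]` is `["凶単畑田"]`, the form in which the 180° sequence is collated in the example.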
Thus, when reading and inputting a character string written in free format, this device assumes that each character of the string may face any of four directions (up, down, left, or right), and identifies the string by collating the recognition result sequence obtained for each direction directly with the character string dictionary 7. Unlike the conventional approach, it therefore need not determine the orientation of each character and then the direction of the string before identifying the string, which greatly simplifies the procedure. Moreover, because the string is identified as a single processing unit, it can be recognized, read, and input accurately even when the orientation of an individual character is uncertain. This effectively suppresses misrecognition and recognition rejects and makes reading and inputting character strings more efficient, which is of great practical benefit.
The present invention is not limited to the embodiment described above. For example, because the number of characters in a string is known once the individual character images have been extracted from the character string image, that information can be used to restrict the range of the character string dictionary 7 to be collated. The method used to recognize the individual character images is likewise not particularly limited. The present invention can be modified in various other ways without departing from its gist.
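Restricting the collation range by character count can be sketched by bucketing dictionary entries by length, so that only entries with as many characters as were segmented are considered. The dictionary contents below are hypothetical, and the names `index_by_length` and `entries_to_check` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def index_by_length(dictionary: Iterable[str]) -> Dict[int, Set[str]]:
    """Bucket dictionary entries by their number of characters."""
    buckets: Dict[int, Set[str]] = defaultdict(set)
    for entry in dictionary:
        buckets[len(entry)].add(entry)
    return dict(buckets)


def entries_to_check(n_chars: int, buckets: Dict[int, Set[str]]) -> Set[str]:
    """Restrict collation to entries with as many characters as were segmented."""
    return buckets.get(n_chars, set())


buckets = index_by_length(["由良浩司", "山田", "田中一郎"])  # hypothetical entries
to_check = entries_to_check(4, buckets)
```

With exact set lookups the saving is small; the restriction pays off mainly when each comparison is costly, for example when candidate rows are compared character by character against many entries.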
[Effects of the Invention]
As described above, the present invention can read character strings written in free format on a form very efficiently and accurately, whether they are written vertically or horizontally and even if the form is upside down. Compared with the conventional approach, which determines the orientation of each character and then the direction of the string before identifying the string, the processing is greatly simplified and the recognition time is shortened, which is of great practical benefit.
The drawings show an embodiment of the present invention. FIG. 1 is a schematic block diagram of the main part of the embodiment; FIG. 2 illustrates an example of an input character string image and the extraction of character images; FIG. 3 shows the concept of rotating a character image; FIG. 4 shows examples of the recognition result sequences obtained for each character orientation; and FIG. 5 shows examples of recognition target strings registered in the character string dictionary.
Description of reference numerals: 1 … photoelectric conversion unit (photoelectric conversion means); 2 … character detection/extraction unit (character detection and extraction means); 3 … pattern memory; 4 … character rotation identification unit (character rotation recognition means); 5 … character dictionary; 6 … character string identification unit (character string identification means); 7 … character string dictionary.
Continuation of the front page
(56) References cited: JP-A-57-136285 (JP, A); JP-A-58-39377 (JP, A); JP-A-53-29623 (JP, A)
Claims (1)
1. A character string reader comprising:
photoelectric conversion means for reading a character string written on a form;
character detection and extraction means for extracting character images one at a time, in order, from the character string image read by the photoelectric conversion means;
character rotation recognition means for rotating the character images extracted by the character detection and extraction means in 90° steps relative to the character reading direction of the photoelectric conversion means, and obtaining a character recognition result for each rotation angle of each character image;
means for collecting, from the recognition results obtained by the character rotation recognition means, the recognized characters having the same rotation angle and outputting them as a character string candidate for each angle;
a character string dictionary that stores in advance the character strings to be read; and
character string identification means for obtaining the identification result of the character string by comparing each character string candidate output by the output means with the target character strings stored in the character string dictionary.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP63001025A JPH0795332B2 (en) | 1988-01-06 | 1988-01-06 | Character string reader |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP63001025A JPH0795332B2 (en) | 1988-01-06 | 1988-01-06 | Character string reader |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPH01177179A (en) | 1989-07-13 |
| JPH0795332B2 (en) | 1995-10-11 |
Family ID: 11490024
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP63001025A Expired - Lifetime JPH0795332B2 (en) | 1988-01-06 | 1988-01-06 | Character string reader |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPH0795332B2 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4553241B2 (en) * | 2004-07-20 | 2010-09-29 | 株式会社リコー | Character direction identification device, document processing device, program, and storage medium |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS5329623A (en) * | 1976-09-01 | 1978-03-20 | Oki Electric Ind Co Ltd | Optical character reader |
| JPS57136285A (en) * | 1981-02-17 | 1982-08-23 | Fujitsu Ltd | Character recognizing system |
| JPS5839377A (en) * | 1981-09-02 | 1983-03-08 | Toshiba Corp | Character recognizing device |
1988
- 1988-01-06: JP63001025A (JPH0795332B2), Expired - Lifetime
Also Published As
| Publication number | Publication date |
|---|---|
| JPH01177179A (en) | 1989-07-13 |
Similar Documents
| Publication | Title |
|---|---|
| JP3445394B2 | How to compare at least two image sections |
| JP3343864B2 | How to separate words |
| JP3602596B2 | Document filing apparatus and method |
| WO2025003209A1 | Image analysis |
| JPH0795332B2 | Character string reader |
| JP3090070B2 | Form identification method and device |
| CN115437504B | Page numbering methods, assisted reading methods based on them, and their applications |
| JP4221960B2 | Form identification device and identification method thereof |
| JP2004280530A | System and method for processing form |
| JPS62166479A | Recognizing method for visiting card |
| JPH01201789A | Character reader |
| JPH1166230A | Document recognition device, document recognition method, and medium |
| JP2977244B2 | Character recognition method and character recognition device |
| JPH10124610A | Optical character reading device |
| JP3151866B2 | English character recognition method |
| CN114565750A | Method and system for processing paper test questions |
| JP2006330873A | Fingerprint verification apparatus, method and program |
| JP2600703B2 | Partial line collation device |
| JP2570311B2 | String recognition device |
| JPH01191986A | Slip format detector |
| JP2963474B2 | Similar character identification method |
| JP2002042138A | Image collating device, image collating method, and computer-readable recording medium recording program executing its method on computer |
| JPH0264882A | Address reading device |
| JP2004013188A | Business form reading device, business form reading method and program therefor |
| JPH0696277A | Alphabet recognizing device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 20081011; year of fee payment: 13 |
| | EXPY | Cancellation because of completion of term | |