JP2581809B2 - Character extraction device - Google Patents
Character extraction device
- Publication number
- JP2581809B2 JP1264733A JP26473389A
- Authority
- JP
- Japan
- Prior art keywords
- character
- line
- ruled line
- detection area
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Character Input (AREA)
Description
DETAILED DESCRIPTION OF THE INVENTION
(Field of Industrial Application) The present invention relates to a character extraction device that cuts characters out of a slanted character line accompanied by ruled lines without including the ruled lines.
(Prior Art) In recent years, to improve the efficiency of processing large volumes of documents, demand has grown for mechanical input of the information written in documents into a computer. The character recognition device used for this mechanical input separates the image of a group of printed or handwritten characters into single-character images and performs character recognition on each one.
Commonly used documents often carry ruled lines in addition to characters, and for character recognition each single-character image must be separated so that it does not include a ruled line.
Prior art for this image separation includes the optical character reader of Japanese Patent Application Laid-Open No. 60-160487 and the character separation system of Japanese Patent Application Laid-Open No. 62-217385.
In the former prior art, a row direction and a column direction are virtually set on an image memory that stores image data from a reading unit. The image data of the document is scanned in the row direction, the cumulative number of black pixels in the row direction is obtained by this scan, and a marginal distribution is created. An area in which the cumulative black pixel count of this row-direction marginal distribution is equal to or greater than a predetermined value is detected as an underline part, and an area in which it is less than the predetermined value is detected as a character part; at the same time, the positions of the edges of the character part in the column direction are detected as the character start and end positions in the column direction. Further, only the character part, excluding the underline part, is scanned; the cumulative number of black pixels in the column direction is obtained by this scan and a marginal distribution is created; and the character start and end positions in the row direction are detected from changes in this column-direction marginal distribution.
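The row-direction scan of the former prior art can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the cited publication: the image is assumed to be a list of rows of 0/1 values (1 = black pixel), and the names `row_profile` and `classify_rows`, the sample image, and the threshold value are all hypothetical.

```python
def row_profile(image):
    """Cumulative black-pixel count of each row-direction scanning line."""
    return [sum(row) for row in image]


def classify_rows(profile, threshold):
    """Label a scanning line 'underline' when its count reaches the threshold,
    and 'character' otherwise, as in the former prior art."""
    return ["underline" if count >= threshold else "character"
            for count in profile]


# Hypothetical 4 x 8 binary image: two character rows, a blank row, an underline.
image = [
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
profile = row_profile(image)
labels = classify_rows(profile, threshold=6)
```

When the true row direction is slanted relative to the scan, an underline spreads over several scanning lines and mixes with character pixels, which is the weakness of this approach discussed in the text.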
In this prior art, when ruled lines are added along the row direction, accurately detecting the character start and end positions in the column direction requires obtaining an image whose true row direction matches the predetermined row direction, and creating the marginal distribution so that characters and ruled lines do not overlap in the virtually predetermined row direction.
In the latter prior art, a plurality of local areas are set on an image containing a character line accompanied by a straight line such as an underline. The image of each local area is encoded into predetermined direction line elements by pattern matching, and the direction line elements are then classified. From among the line segments composed of direction line elements (for example, horizontal straight-line components) having the same direction as a line segment unnecessary for character recognition, such as an underline, line segments satisfying predetermined conditions (for example, length or area of existence) are detected, and the detected line segments are removed from the image as unnecessary line segments. After an image with the unnecessary line segments removed is obtained in this way, the image is scanned to create marginal distributions in the row and column directions, and the character start and end positions in the predetermined row and column directions are detected from the created marginal distributions.
(Problems to be Solved by the Invention) In the former prior art, however, when the deviation between the predetermined row direction virtually set on the image memory and the true row direction of the image becomes large, so that characters and ruled lines overlap in the predetermined row direction, or when characters and ruled lines touch, black pixels of characters and ruled lines are mixed on the same scanning line when the marginal distribution is created, and the positions of the characters and ruled lines cannot be detected accurately.
Also, in the former prior art, the character start and end positions in the column direction are made the same for all characters in a character line. When the predetermined row direction deviates from the true row direction, an incomplete character with a missing edge may therefore be cut out, and the character may not be recognizable. This clipping becomes more severe toward the end of the line.
In the latter prior art, the read image is divided into a plurality of local areas, the read image of each local area is compared by matching with a plurality of predetermined basic patterns, and each local area is converted into direction line elements on the basis of the comparison result. Such local matching processing is complex, so the configuration of a device adopting this prior art becomes complicated, which leads to a larger device and a higher device cost.
An object of the present invention is to solve the conventional problems described above by providing a character extraction device that can accurately detect character cutout positions on the basis of changes in marginal distributions, without clipping characters, even when the predetermined row direction deviates from the true row direction.
(Means for Solving the Problems) To achieve this object, the character extraction device of the present invention comprises:
an image output unit that outputs image data of a character line accompanied by ruled lines;
an image storage unit that stores the image data;
a character interval detection unit that scans, in the column direction, the image data of a character detection area set for each character of the character line, detects the cumulative number of black pixels for each scanning line to create a first marginal distribution, detects the position of the character within the character detection area from changes in the first marginal distribution, and, on the basis of this character position, sets the position of a ruled-line detection area adjacent to the character and sets the character cutout position in the row direction;
a character/ruled-line separation unit that scans the image data of the ruled-line detection area in the row direction, detects the cumulative number of black pixels for each scanning line to create a second marginal distribution, detects the position of the ruled line from changes in the second marginal distribution, sets a separation position between the character and the ruled line on the basis of this ruled-line position, and sets the character cutout position in the column direction on the basis of the separation position;
a correction value determination unit that sets a column-direction position correction value, used to set the character detection area and the ruled-line detection area for the second and subsequent characters of the character line, on the basis of the positions of the characters cut out before them; and
a control unit that outputs the character cutout positions for each character.
(Operation) In a character extraction device configured in this way, the column-direction position correction value used to set the character detection area and the ruled-line detection area for the second and subsequent characters of a character line is set on the basis of the positions of the characters cut out before them.
The position correction value represents the amount or range of displacement of the characters and ruled lines in the column direction. Even if the character line is slanted, the character detection area can therefore be set accurately so that it contains the character to be cut out, and the ruled-line detection area can be set accurately so that it contains the ruled lines adjacent to that character.
(Embodiments) Embodiments of the present invention are described below with reference to the drawings. The drawings are only schematic, to the extent needed for the invention to be understood; the configuration of each component, the input/output signals and their flow, the connections between signal lines, and the flow of operation are therefore not limited to the illustrated examples.
First Embodiment
FIG. 1 is a functional block diagram schematically showing the configuration of a first embodiment of the present invention.
The character extraction device of this embodiment comprises:
an image output unit 10 that outputs image data of a character line accompanied by ruled lines;
an image storage unit 12 that stores the image data;
a character interval detection unit 14 that scans, in the column direction, the image data of a character detection area set for each character of the character line, detects the cumulative number of black pixels for each scanning line to create a first marginal distribution, detects the position of the character within the character detection area from changes in the first marginal distribution, and, on the basis of the character position, sets the position of a ruled-line detection area adjacent to the character and sets the character cutout position in the row direction;
a character/ruled-line separation unit 16 that scans the image data of the ruled-line detection area in the row direction, detects the cumulative number of black pixels for each scanning line to create a second marginal distribution, detects the position of the ruled line from changes in the second marginal distribution, sets a separation position between the character and the ruled line on the basis of the ruled-line position, and sets the character cutout position in the column direction on the basis of the separation position;
a correction value determination unit 17 that sets a column-direction position correction value, used to set the character detection area and the ruled-line detection area for the second and subsequent characters of the character line, on the basis of the positions of the characters cut out before them; and
a control unit 18 that outputs the character cutout positions for each character.
This embodiment is described in more detail below.
(Image Output Unit) The image output unit 10 of this embodiment scans a recording medium, such as a document, on which a character line accompanied by ruled lines is recorded, converts the reflected light P from the medium into an electric signal quantized to black-and-white binary values (image data), and outputs the image data of the character line accompanied by ruled lines pixel by pixel.
(Image Storage Unit) In this embodiment, the image storage unit 12 is an image memory that stores the image data from the image output unit 10. An X-Y coordinate system is set on this memory; assuming, for example, a horizontally written character line, the X-axis direction is virtually set as the row direction and the Y-axis direction as the column direction.
To cut out characters without clipping, it is most preferable for the true row direction of the image to match the X-axis direction. Even if the true row direction deviates from the X-axis direction, however, characters can be cut out with less clipping than before, or with none.
(Correction Value Determination Unit) The correction value determination unit 17 of this embodiment receives, from a row position information detection unit (not shown), row position information representing the column-direction positions of the character detection area and the ruled-line detection area for the first character.
The row position information detection unit sets an area in the vicinity of the first character, as expected from the format information (this area may or may not contain the first character). It scans this area, obtains for each scanning line the cumulative number of black pixels on the scanning line in the row direction, and detects the column-direction position of the area containing scanning lines whose cumulative black pixel count is equal to or greater than a predetermined threshold as the column-direction position of the ruled line in the vicinity of the first character. On the basis of this ruled-line position, the row position information detection unit sets the row position information representing the column-direction position of the character detection area for the first character so that the character detection area contains the first character, and sets the row position information representing the column-direction position of the ruled-line detection area so that the ruled-line detection area contains the ruled lines adjacent to the first character.
Further, the correction value determination unit 17 obtains the column-direction position correction value, used to set the character detection area and ruled-line detection area for the second and subsequent characters, on the basis of the positions of the characters cut out before them.
In this embodiment, the column-direction position of the character detection area for the first character and the column-direction cutout positions of the characters already cut out, from the first character onward, are used to obtain the amount by which the character cutout position is displaced in the column direction from the character detection area position, and the position correction value is obtained from this displacement.
The correction value determination unit 17 then uses the position correction value together with the column-direction positions of the character detection area and ruled-line detection area for the first character to set the column-direction positions of the character detection area and ruled-line detection area for the character to be cut out next (hereinafter also called the next character). The column-direction position of the character detection area for the next character is set so that the character detection area contains the next character, and the column-direction position of the ruled-line detection area for the next character is set so that the ruled-line detection area contains the ruled lines adjacent to the next character.
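The role of the position correction value can be sketched in Python as follows. The formula for the correction value is not given at this point in the description, so the sketch makes a loudly labelled assumption: the correction is taken to be the column-direction drift between the cutout position of the first character and that of the most recently cut-out character, and all four area positions of the first character are shifted by that drift. The function name `next_areas` and all numeric values are hypothetical.

```python
def next_areas(first_areas, first_cut_top, last_cut_top):
    """Shift the first character's column-direction area positions by an
    assumed correction value (drift of the cutout position along the line)."""
    # ASSUMPTION: correction = displacement of the latest cutout position
    # relative to the first character's cutout position.
    correction = last_cut_top - first_cut_top
    return {name: position + correction
            for name, position in first_areas.items()}


# Hypothetical column-direction positions for the first character.
first_areas = {"YT1": 8, "YT2": 42, "YB2": 78, "YB1": 112}

# The latest cut-out character sits 3 pixels lower than the first one,
# as would happen on a line slanting downward to the right.
shifted = next_areas(first_areas, first_cut_top=45, last_cut_top=48)
```

Shifting the areas character by character in this way lets them follow a slanted line instead of using one fixed pair of top and bottom positions for the whole line.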
(Character Interval Detection Unit) The character interval detection unit 14 of this embodiment comprises a first marginal distribution creation unit 20, a first marginal distribution storage unit 22, a black block detection unit 24, and a ruled-line detection area position determination unit 26.
On the basis of the character position in the row direction, the first marginal distribution creation unit 20 sets the row-direction position of the character detection area so that the character detection area contains the character being cut out and at least part of the character to be cut out next.
For the first character of a character line, the row-direction position of the character detection area is set on the basis of the row-direction position of the first character detected by the control unit 18, as described later.
For characters after the first, if the character detection area is set as described above, the black block detection unit 24 detects the row-direction position of the next character along with that of the character being cut out, so the row-direction position of the character detection area is set on the basis of this row-direction position of the next character.
The first marginal distribution creation unit 20 then scans the image data of the character detection area in the column direction, detects the cumulative number of black pixels for each scanning line, and creates a column-direction marginal distribution (the first marginal distribution) made up of these per-scanning-line counts.
The first marginal distribution storage unit 22 stores the cumulative black pixel count of the first marginal distribution for each scanning position.
The black block detection unit 24 detects the character start and end positions in the row direction from changes in the cumulative black pixel count at each scanning position of the first marginal distribution. On the basis of these positions, the ruled-line detection area position determination unit 26 sets the row-direction start and end positions of the ruled-line detection area so that the area contains the ruled lines adjacent to the character being cut out.
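The first marginal distribution and the black block detection can be sketched in Python as follows. This is an illustrative sketch only: the image is assumed to be a list of rows of 0/1 values indexed as `image[y][x]`, and the function names `column_profile` and `black_blocks`, the threshold of 1, and the sample image are hypothetical rather than taken from the embodiment.

```python
def column_profile(image, x_start, x_end, y_top, y_bottom):
    """First marginal distribution: black-pixel count of each
    column-direction scanning line inside the character detection area."""
    return [
        sum(image[y][x] for y in range(y_top, y_bottom + 1))
        for x in range(x_start, x_end + 1)
    ]


def black_blocks(profile, x_start, threshold=1):
    """Return (start, end) X positions of runs whose count is >= threshold."""
    blocks, start = [], None
    for offset, count in enumerate(profile):
        if count >= threshold and start is None:
            start = x_start + offset                      # a black run begins
        elif count < threshold and start is not None:
            blocks.append((start, x_start + offset - 1))  # run ended one column earlier
            start = None
    if start is not None:                                 # run reaches the area edge
        blocks.append((start, x_start + len(profile) - 1))
    return blocks


# Hypothetical character detection area holding one character and part of the next.
image = [
    [0, 1, 1, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
]
profile = column_profile(image, x_start=0, x_end=9, y_top=0, y_bottom=2)
blocks = black_blocks(profile, x_start=0)
```

The first block gives the row-direction start and end of the character being cut out, and the second block gives the row-direction position of the next character, which is what allows the next character detection area to be placed.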
(Character/Ruled-Line Separation Unit) The character/ruled-line separation unit 16 of this embodiment comprises a second marginal distribution creation unit 28, a second marginal distribution storage unit 30, a ruled-line detection unit 32, and a ruled-line separation position determination unit 34.
The second marginal distribution creation unit 28 scans the image data of the ruled-line detection area in the row direction, detects the cumulative number of black pixels for each scanning line, and creates a row-direction marginal distribution (the second marginal distribution) made up of these per-scanning-line counts.
The second marginal distribution storage unit 30 stores the cumulative black pixel count of the second marginal distribution for each scanning position.
The ruled-line detection unit 32 detects the column-direction start and end positions of the ruled lines from changes in the cumulative black pixel count at each scanning position of the second marginal distribution. On the basis of these positions, the ruled-line separation position determination unit 34 sets a separation position for separating the characters from the ruled lines in the column direction, and determines the column-direction character cutout position from this separation position.
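The ruled-line detection and separation step can be sketched in Python as follows. The exact rule for the separation position is not stated in this passage, so the sketch assumes, purely for illustration, that the cutout starts one scanning line below the upper ruled line and ends one scanning line above the lower ruled line. The function names, the threshold, and the sample image are hypothetical.

```python
def row_profile(image, x_start, x_end, y_top, y_bottom):
    """Second marginal distribution: black-pixel count of each
    row-direction scanning line inside the ruled-line detection area."""
    return [sum(image[y][x_start:x_end + 1]) for y in range(y_top, y_bottom + 1)]


def ruled_line_runs(profile, y_top, threshold):
    """Return (start, end) Y positions of runs whose count is >= threshold."""
    runs, start = [], None
    for offset, count in enumerate(profile):
        if count >= threshold and start is None:
            start = y_top + offset
        elif count < threshold and start is not None:
            runs.append((start, y_top + offset - 1))
            start = None
    if start is not None:
        runs.append((start, y_top + len(profile) - 1))
    return runs


def cut_positions(runs):
    """ASSUMPTION: separate one line inside the upper and lower ruled lines."""
    upper, lower = runs[0], runs[-1]
    return upper[1] + 1, lower[0] - 1


# Hypothetical ruled-line detection area: ruled lines at Y = 0 and Y = 6.
image = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
profile = row_profile(image, x_start=0, x_end=5, y_top=0, y_bottom=6)
runs = ruled_line_runs(profile, y_top=0, threshold=5)
cut_top, cut_bottom = cut_positions(runs)
```

Because the area is only as wide as one character, a slanted ruled line still falls on only a few scanning lines inside it, so the threshold test remains usable.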
(Control Unit) The control unit 18 of this embodiment sets a provisional row-direction start position of the character detection area for the first character so that the area contains the first character and the margin near it. From this start position it scans the image data of the character detection area line by line, obtains for each scanning line the cumulative number of black pixels on the scanning line in the column direction, and detects the row-direction position of the scanning line at which a cumulative black pixel count exceeding a predetermined threshold is found as the start position of the first character.
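The search for the start of the first character can be sketched in Python as follows. This is a minimal sketch under assumptions: the image is a list of rows of 0/1 values indexed as `image[y][x]`, and the function name `first_character_start`, the threshold value, and the sample image are hypothetical.

```python
def first_character_start(image, x_from, y_top, y_bottom, threshold):
    """Scan column-direction scanning lines from x_from and return the X
    position of the first line whose black-pixel count exceeds the
    threshold, or None if no such line exists."""
    width = len(image[0])
    for x in range(x_from, width):
        count = sum(image[y][x] for y in range(y_top, y_bottom + 1))
        if count > threshold:   # "exceeds", as in the description
            return x
    return None


# Hypothetical area: left margin, one isolated noise pixel at X = 2, then a character.
image = [
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
start = first_character_start(image, x_from=0, y_top=0, y_bottom=2, threshold=1)
```

With a threshold of 1 the single pixel at X = 2 is ignored, so a nonzero threshold gives some tolerance to isolated noise in the margin.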
Next, to aid understanding of the invention, an example of the flow of operation of this embodiment is described, first with reference to FIGS. 2 and 3.
FIG. 2 is an operation flowchart showing an example of the flow of operation of this embodiment. FIGS. 3(A) and 4(A) show examples of image data in which one ruled line lies above and one below horizontally written characters, and FIGS. 3(B) and 4(B) show examples of the marginal distribution of the character detection area for the first and second characters. In FIGS. 3(A) and 4(A), the X and Y axes are coordinate axes set on the image memory, with the X-axis direction virtually taken as the row direction and the Y-axis direction as the column direction. In FIGS. 3(B) and 4(B), the horizontal axis is the coordinate axis corresponding to the X axis of FIGS. 3(A) and 4(A), and the vertical axis represents the cumulative number of black pixels CH(X) on the scanning line at sub-scanning position X.
The extraction device of this embodiment starts operating when a processing start signal is input (START).
When operation starts, the image output unit 10 scans a predetermined reading range イ of a document or the like and outputs black-and-white binary image data to the image storage unit 12. Meanwhile, the control unit 18 receives, from the row position information detection unit (not shown), the row position information representing the column-direction positions of the character detection area and ruled-line detection area for the first character (S1).
The operation of the row position information detection unit (not shown) is now described with reference to FIG. 3.
On the basis of format information such as the character entry range, the row position information detection unit sets, for the vicinity area ロ of the first character, a start position Xa and end position Xb in the row direction and a start position Ya and end position Yb in the column direction. The vicinity area ロ need only contain the ruled lines near the first character, and may contain the first character in addition to those ruled lines. The illustrated example shows a vicinity area ロ that contains only ruled lines.
Next, the row position information detection unit scans the inside of the vicinity area ロ with the row direction as the main scanning direction, obtains the cumulative number of black pixels on the scanning line at each sub-scanning position Y, and detects the column-direction positions LYT1, LYB1 and LYT2, LYB2 of the areas in which the cumulative black pixel count is equal to or greater than a predetermined threshold. LYT1 and LYT2 represent the column-direction positions of the upper ruled line near the first character, and LYB1 and LYB2 those of the lower ruled line near the first character.
The row position information detection unit then obtains row position information LJT2 and LJB2, representing the column-direction character detection area positions YT2(1) and YB2(1) for the first character, and row position information LJT1 and LJB1, representing the column-direction ruled-line detection area positions YT1(1) and YB1(1) for the first character. The row position information LJT2, LJB2, LJT1, and LJB1 is defined, for example, by the following equation (1).
\[
\begin{aligned}
LJT2 &= LYT2 + \alpha_1 \\
LJB2 &= LYB2 - \alpha_1 \\
LJT1 &= LYT1 - \alpha_2 \\
LJB1 &= LYB1 + \alpha_2
\end{aligned}
\tag{1}
\]
In equation (1), α1 is a constant set so that, even if the character line is slanted within a predetermined range of slant, the area LJT2 ≤ Y ≤ LJB2 does not contain the ruled lines attached to the first character, which is the character being cut out, or at least part of the ruled lines attached to the character that follows it; for example, α1 = 12.
Likewise, α2 in equation (1) is a constant set so that, even if the character line is inclined within the predetermined range of inclination, the range LJT1 ≤ Y ≤ LJB1 includes the ruled line attached to the first character to be extracted and at least a part of the ruled line attached to the character that follows it. For example, α2 = 12.
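As a minimal sketch, equation (1) can be written as a small Python function. The function name, the dictionary return form, and the sample coordinates are illustrative assumptions; only the four formulas and the example values α1 = α2 = 12 come from the text.

```python
def row_position_info(lyt1, lyt2, lyb1, lyb2, alpha1=12, alpha2=12):
    """Apply equation (1) to the detected ruled line positions.

    The names mirror the symbols in the text; the function itself is
    a hypothetical helper, not part of the described device.
    """
    return {
        "LJT2": lyt2 + alpha1,  # top of the character detection area
        "LJB2": lyb2 - alpha1,  # bottom of the character detection area
        "LJT1": lyt1 - alpha2,  # top of the ruled line detection area
        "LJB1": lyb1 + alpha2,  # bottom of the ruled line detection area
    }


# Sample ruled line positions (illustrative values only).
info = row_position_info(lyt1=100, lyt2=103, lyb1=183, lyb2=180)
```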
Subsequent to S1, the control unit 18 sets the extraction end flag to “0”, which indicates that the extraction process has not finished (S2).

Next, the control unit 18 refers to the extraction end flag and determines whether the extraction process has finished (S3).

Here, for example, “1” and “0” are used to indicate that the extraction process has finished and has not finished, respectively. In S3, the extraction device of this embodiment ends its operation (END) if the extraction end flag = 1, and performs the next process S4 if the extraction end flag = 0.
In S4, the correction value determination unit 17 sets, for the n-th character (n is a natural number), the column-direction start position YT2(n) and end position YB2(n) of the character detection area and the column-direction start position YT1(n) and end position YB1(n) of the ruled line detection area.
・S4a: when n = 1
The correction value determination unit 17 receives the row position information LYT1, LYT2, LYB1, and LYB2 from the control unit 18. For the first character, it sets the column-direction positions YT2(1) and YB2(1) of the character detection area and the column-direction positions YT1(1) and YB1(1) of the ruled line detection area according to the following equation (2), and outputs these positions YT2(1), YB2(1), YT1(1), and YB1(1) to the control unit 18.
YT2(1) = LJT2
YB2(1) = LJB2
YT1(1) = LJT1
YB1(1) = LJB1 ……(2)

・S4b: when n ≠ 1
For the n-th character after the first (n ≥ 2), the correction value determination unit 17 sets the positions YT2(n), YB2(n), YT1(n), and YB1(n) defined, for example, by the following equation (3), and outputs these positions to the control unit 18.

YT2(n) = YT2(1) + HENI
YB2(n) = YB2(1) + HENI
YT1(n) = YT1(1) + HENI
YB1(n) = YB1(1) + HENI ……(3)

Here, HENI is the position correction value obtained by the correction value determination unit 17 in S12, described later.
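Step S4 of this embodiment can be sketched as follows, assuming the four first-character positions are held in a dictionary keyed by the symbol names. The function name and data layout are hypothetical; the shift by HENI is the only logic taken from equations (2) and (3).

```python
def column_detection_areas(first, n, heni=0):
    """Return YT2(n), YB2(n), YT1(n), YB1(n) for the n-th character.

    `first` holds the positions of the first character (equation (2)).
    For n >= 2 every position is shifted by the same correction value
    HENI (equation (3)).
    """
    if n < 1:
        raise ValueError("n must be a natural number")
    if n == 1:
        return dict(first)
    return {name: pos + heni for name, pos in first.items()}


# Illustrative values only.
first_char = {"YT2": 115, "YB2": 168, "YT1": 88, "YB1": 195}
third_char = column_detection_areas(first_char, n=3, heni=-4)
```

Because all four positions receive the same offset, the heights of the character detection area and the ruled line detection area stay constant from character to character.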
Subsequent to S4, the character extraction device of this embodiment performs processing for setting the row-direction start position HS(n) and end position HE(n) of the character detection area (S5).

・S5a: when n = 1
First, the control unit 18 sets a provisional row-direction start position for the character detection area of the first character. The provisional start position is set, based on format information such as the character entry range, so that the character detection area includes the first character and the blank area near it. This provisional start position is denoted, for example, Xb.

The control unit 18 scans the inside of the first character detection area while moving the sub-scanning position X sequentially from the provisional start position toward the character, and obtains the cumulative number of black pixels on each column-direction scanning line. It then detects the sub-scanning position X of the scanning line whose cumulative number of black pixels exceeds a predetermined threshold as the row-direction start position B1s of the first character.
Next, based on the detected start position B1s, the first marginal distribution creating unit 20 sets the row-direction start position HS(1) and end position HE(1) of the character detection area of the first character, defined, for example, by the following equation (4).

HS(1) = B1s − β
HE(1) = HS(1) + HX ……(4)

・S5b: when n ≠ 1
Based on the row-direction start position Bns of the n-th character to be extracted, the first marginal distribution creating unit 20 sets the row-direction start position HS(n) and end position HE(n) defined, for example, by the following equation (5).
HS(n) = Bns − β
HE(n) = HS(n) + HX ……(5)

In equations (4) and (5), β and HX are constants that may be set to any suitable values. In this embodiment, β and HX are set so that the character detection area of the n-th character includes the n-th character and at least a part of the (n+1)-th character. It is sufficient to set β and HX so that the character detection area of the n-th character includes at least the n-th character and the blank area near it. For example, when the maximum character width is 4 mm and the resolution of the image sensor is 16 pixels/mm, β = 32 and HX = 128 may be used.
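Equations (4) and (5) have the same form, so a single helper covers both cases. This is a sketch only: the function name and sample start position are assumptions, and the defaults are the example values β = 32 and HX = 128 given above.

```python
def row_detection_area(char_start, beta=32, hx=128):
    """Return (HS(n), HE(n)) from the character start position Bns.

    Implements HS(n) = Bns - beta and HE(n) = HS(n) + HX.
    """
    hs = char_start - beta
    he = hs + hx
    return hs, he


# Illustrative start position of a character, in pixels.
hs, he = row_detection_area(200)
```

With a 16 pixels/mm sensor, a 4 mm character is 64 pixels wide, so HX = 128 spans two maximum character widths and β = 32 leaves half a character width of margin before the detected start.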
Subsequent to S5, the first marginal distribution creating unit 20 determines whether character extraction from the character line “ABCD” has finished (S6).

For example, character extraction is regarded as finished when the position HE(n) is larger than the sum of the position Xb and the row-direction length of the character entry surface defined by the format information, and as unfinished when HE(n) is smaller than that sum.

If character extraction has finished, the first marginal distribution creating unit 20 rewrites the processing end flag held in the register to 1 (S7), and then the first marginal distribution creating unit 20 and the black block detection unit 24 perform processing for detecting the row-direction position of the character in the character detection area (S8). If character extraction has not finished, the processing of S8 is performed directly after S6.
In S8, the first marginal distribution creating unit 20 scans the image data of the character detection area C of the n-th character (the area HS(n) ≤ X ≤ HE(n) and YT2(n) ≤ Y ≤ YB2(n)), based on HS(n) and HE(n) set by the unit 20 itself and on YT2(n) and YB2(n) from the control unit 18, and creates the first marginal distribution. In creating this marginal distribution, the cumulative number of black pixels CH(X) on the scanning line, obtained for each sub-scanning position X, is stored in the first marginal distribution storage unit 22. The cumulative number of black pixels CH(X) is expressed by the following equation (6).

CH(X) = \sum_{Y=YT2(n)}^{YB2(n)} P(X, Y) ……(6)

Here, P(X, Y) = 1 when the image data of the pixel at coordinates (X, Y) is a black pixel, and P(X, Y) = 0 when it is a white pixel.
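The first marginal distribution can be sketched in Python as a column-wise count of black pixels. The representation of the image as a list of rows indexed `image[y][x]` and the function name are assumptions made for illustration.

```python
def first_marginal_distribution(image, hs, he, yt2, yb2):
    """Return {X: CH(X)} for HS <= X <= HE.

    CH(X) counts the black pixels (value 1) on the column-direction
    scanning line at X, restricted to YT2 <= Y <= YB2.
    """
    return {
        x: sum(image[y][x] for y in range(yt2, yb2 + 1))
        for x in range(hs, he + 1)
    }


# A tiny binary image: 1 is a black pixel, 0 is a white pixel.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
]
ch = first_marginal_distribution(image, hs=0, he=4, yt2=1, yb2=3)
```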
After the first marginal distribution has been created in S8, the black block detection unit 24 reads the cumulative numbers of black pixels CH(X) from the first marginal distribution storage unit 22 and compares each with a threshold THLB (for example, THLB = 1) at every sub-scanning position X. It detects the row-direction start and end positions of the area in which CH(X) ≥ THLB as the row-direction character start position Bns and end position BnE of the n-th character.

If a scanning line with CH(X) ≥ THLB is called a black scanning line and a scanning line with CH(X) < THLB is called a white scanning line, then the sub-scanning position X of the black scanning line at a transition from a white scanning line to a black scanning line is taken as the row-direction character start position Bns, and the sub-scanning position X of the black scanning line at a transition from a black scanning line to a white scanning line is taken as the row-direction character end position BnE.
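The white-to-black and black-to-white transitions described above can be sketched as a single pass over CH(X). The function name, the list input with an `x0` offset, and the handling of a black run that touches either edge of the area (treated here as starting or ending at the edge) are assumptions; the text does not say how such runs are handled.

```python
def find_black_blocks(ch, x0=0, thl_b=1):
    """Return (start, end) positions of runs where CH(X) >= THLB.

    `ch` lists CH(X) in ascending X order, and `x0` is the X
    coordinate of its first element.
    """
    blocks = []
    start = None
    for i, count in enumerate(ch):
        is_black = count >= thl_b
        if is_black and start is None:
            start = x0 + i                       # white -> black transition
        elif not is_black and start is not None:
            blocks.append((start, x0 + i - 1))   # black -> white transition
            start = None
    if start is not None:                        # run reaches the area edge
        blocks.append((start, x0 + len(ch) - 1))
    return blocks


blocks = find_black_blocks([0, 0, 3, 5, 2, 0, 0, 4, 1, 0], x0=10)
```

In this sample the first block would correspond to (Bns, BnE) of the n-th character, and the start of the second block to B(n+1)s of the next character.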
In this embodiment, the black block detection unit 24 detects the character start position Bns and end position BnE of the n-th character and the row-direction character start position B(n+1)s of the next, (n+1)-th character. It outputs to the control unit 18 the character start position B(n+1)s, which is used to set the character detection area of the (n+1)-th character to be extracted next.

The ruled line detection area position determination unit 26 then determines the character extraction position of the n-th character based on its character start position Bns and end position BnE, and outputs this extraction position to the control unit 18. For example, the row-direction character start position Bns and end position BnE are used as the row-direction character extraction start position KXL(n) and end position KXR(n).
Subsequent to S8, the ruled line detection area position determination unit 26 sets the row-direction ruled line detection area positions, based on the row-direction character start position Bns and end position BnE from the black block detection unit 24, so that the areas include the ruled lines adjacent to the n-th character (S9), and outputs these positions to the control unit 18.

In this embodiment, a first ruled line detection area adjacent to the start position Bns of the n-th character and a second ruled line detection area adjacent to its end position BnE are set.

The row-direction start position BXLL(n) and end position BXLR(n) of the first ruled line detection area of the n-th character, and the row-direction start position BXRL(n) and end position BXRR(n) of the second ruled line detection area, are defined, for example, as shown in the following equation (7).
When n = 1 or KXL(n) − 1 − β > KXR(n−1) + 1:
BXLL(n) = KXL(n) − 1 − β
When n ≥ 2 and KXL(n) − 1 − β ≤ KXR(n−1) + 1:
BXLL(n) = KXR(n−1) + 1

BXLR(n) = KXL(n) − 1
BXRL(n) = KXR(n) + 1

When n = nL (nL is the number of characters in one character line) or KXR(n) + 1 + β < KXL(n+1) − 1:
BXRR(n) = KXR(n) + 1 + β
When n ≤ nL − 1 and KXR(n) + 1 + β ≥ KXL(n+1) − 1:
BXRR(n) = KXL(n+1) − 1 ……(7)

Next, a description will be given with reference to FIG. 2 and FIG. 5. FIG. 5(A) shows image data of approximately one character, the n-th character, and FIGS. 5(B) and 5(C) show the marginal distributions of the ruled line detection areas of the n-th character. In FIG. 5(A), the X and Y axes are the coordinate axes set on the image memory. In FIGS. 5(B) and 5(C), the vertical axis is the coordinate axis corresponding to the Y axis of FIG. 5(A), and the horizontal axis represents the cumulative number of black pixels BLH(Y) and BRH(Y), respectively, on the scanning line at the sub-scanning position Y.
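The case analysis of equation (7) can be sketched as follows. The function name and the use of Python lists (where `kxl[i]` and `kxr[i]` hold KXL and KXR of character i + 1) are assumptions for illustration; the conditions themselves follow equation (7), which clips each ruled line detection area so that it does not run into the neighboring character.

```python
def ruled_line_areas(kxl, kxr, n, beta=32):
    """Return (BXLL, BXLR, BXRL, BXRR) for the n-th character (1-based).

    `kxl` and `kxr` list the row-direction extraction start and end
    positions of every character in the line.
    """
    n_l = len(kxl)                      # nL: characters in the line
    left, right = kxl[n - 1], kxr[n - 1]

    bxll = left - 1 - beta
    if n >= 2 and bxll <= kxr[n - 2] + 1:
        bxll = kxr[n - 2] + 1           # clip at the previous character
    bxlr = left - 1

    bxrl = right + 1
    bxrr = right + 1 + beta
    if n <= n_l - 1 and bxrr >= kxl[n] - 1:
        bxrr = kxl[n] - 1               # clip at the next character
    return bxll, bxlr, bxrl, bxrr


# Illustrative extraction positions for a three-character line.
kxl = [100, 170, 240]
kxr = [150, 220, 290]
areas = [ruled_line_areas(kxl, kxr, n) for n in (1, 2, 3)]
```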
Subsequent to S9, the second marginal distribution creating unit 28 and the ruled line detection unit 32 perform processing for detecting the column-direction positions of the ruled lines in the first and second ruled line detection areas (S10).

In S10, the second marginal distribution creating unit 28 receives from the control unit 18 the positions BXLL(n), BXLR(n), YT1(n), and YB1(n) of the first ruled line detection area D and the positions BXRL(n), BXRR(n), YT1(n), and YB1(n) of the second ruled line detection area E. Based on these positions, it scans the first ruled line detection area D of the n-th character (the area BXLL(n) ≤ X ≤ BXLR(n) and YT1(n) ≤ Y ≤ YB1(n)) and the second ruled line detection area E (the area BXRL(n) ≤ X ≤ BXRR(n) and YT1(n) ≤ Y ≤ YB1(n)). It obtains the cumulative numbers of black pixels BLH(Y) and BRH(Y) on the scanning line at each sub-scanning position Y and stores them, for each sub-scanning position Y, in the second marginal distribution storage unit 30, thereby creating the second marginal distribution.

The cumulative numbers of black pixels BLH(Y) and BRH(Y) are defined, for example, by the following equation (8).

BLH(Y) = \sum_{X=BXLL(n)}^{BXLR(n)} P(X, Y)
BRH(Y) = \sum_{X=BXRL(n)}^{BXRR(n)} P(X, Y) ……(8)
Next, the ruled line detection unit 32 detects, for each ruled line adjacent to the n-th character, the position of the side of the ruled line closer to the character. The ruled line detection unit 32 reads the cumulative number of black pixels BLH(Y) from the second marginal distribution storage unit 30 in descending order of the sub-scanning position Y starting from YT2(n), compares it with a threshold THLL at each sub-scanning position Y, and detects the first sub-scanning position Y at which BLH(Y) ≥ THLL as the column-direction ruled line position YLT. It also reads BLH(Y) in ascending order of the sub-scanning position Y starting from YB2(n), compares it with the threshold THLL at each sub-scanning position Y, and detects the first sub-scanning position Y at which BLH(Y) ≥ THLL as the column-direction ruled line position YLB. These ruled line positions YLT and YLB are detected for the first ruled line detection area.
The threshold THLL is a constant that may be set to any suitable value and is defined, for example, by the following equation (9).

THLL = {(BXLR(n) − BXLL(n) + 1)/2} + 1 ……(9)

At the same time, the ruled line detection unit 32 reads the cumulative number of black pixels BRH(Y) from the second marginal distribution storage unit 30 in descending order of the sub-scanning position Y starting from YT2(n), compares it with a threshold THLR at each sub-scanning position Y, and detects the first sub-scanning position Y at which BRH(Y) ≥ THLR as the column-direction ruled line position YRT. It also reads BRH(Y) in ascending order of the sub-scanning position Y starting from YB2(n), compares it with the threshold THLR at each sub-scanning position Y, and detects the first sub-scanning position Y at which BRH(Y) ≥ THLR as the column-direction ruled line position YRB. These ruled line positions YRT and YRB are detected for the second ruled line detection area.
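The ruled line search in S10 can be sketched for a single ruled line detection area as follows; the same two functions would be applied to the first and the second area in turn. The function names, the `image[y][x]` layout, the use of integer division in the threshold, and returning `None` when no ruled line is found are all assumptions not stated in the text.

```python
def second_marginal_distribution(image, x_start, x_end, yt1, yb1):
    """Return {Y: count of black pixels on row Y} within the area."""
    return {
        y: sum(image[y][x] for x in range(x_start, x_end + 1))
        for y in range(yt1, yb1 + 1)
    }


def detect_ruled_line_positions(dist, x_start, x_end, yt1, yb1, yt2, yb2):
    """Search outward from the character area for the nearest ruled lines.

    The threshold is half the area width plus one (assumed integer
    division). The upper search starts at YT2 and moves toward smaller
    Y; the lower search starts at YB2 and moves toward larger Y.
    """
    threshold = (x_end - x_start + 1) // 2 + 1
    top = next((y for y in range(yt2, yt1 - 1, -1)
                if dist[y] >= threshold), None)
    bottom = next((y for y in range(yb2, yb1 + 1)
                   if dist[y] >= threshold), None)
    return top, bottom


# Sample area: 4 pixels wide, ruled lines on rows 2 and 9, noise on row 5.
image = [[0] * 4 for _ in range(12)]
image[2] = [1, 1, 1, 1]
image[9] = [1, 1, 1, 0]
image[5][1] = 1
dist = second_marginal_distribution(image, 0, 3, yt1=0, yb1=11)
positions = detect_ruled_line_positions(dist, 0, 3, yt1=0, yb1=11, yt2=4, yb2=7)
```

Requiring more than half of the area width to be black lets a partly broken ruled line (row 9 in the sample) still be detected while isolated black pixels are ignored.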
The threshold THLR is a constant that may be set to any suitable value and is defined, for example, by the following equation (10).

THLR = {(BXRR(n) − BXRL(n) + 1)/2} + 1 ……(10)

Subsequent to S10, the ruled line separation position determination unit 34 sets the separation positions YT(n) and YB(n) of the n-th character, which are used to separate the n-th character from the ruled lines adjacent to it in the column direction (S11).
The separation positions YT(n) and YB(n) are defined, for example, by the following equation (11).

When YLT ≥ YRT: YT(n) = YLT
When YLT < YRT: YT(n) = YRT
When YLB ≥ YRB: YB(n) = YRB
When YLB < YRB: YB(n) = YLB ……(11)

The ruled line separation position determination unit 34 then sets the column-direction character extraction start and end positions based on these separation positions and outputs them to the control unit 18. For example, the separation positions YT(n) and YB(n) are used as the column-direction character extraction start and end positions, and the control unit 18 outputs the row-direction and column-direction extraction positions KXL(n), KXR(n) and YT(n), YB(n) of the n-th character to a next-stage device (not shown).
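Equation (11) selects the larger of the two upper ruled line positions and the smaller of the two lower ones. A minimal Python sketch, with a hypothetical function name and sample values:

```python
def separation_positions(ylt, yrt, ylb, yrb):
    """Return (YT(n), YB(n)) according to equation (11)."""
    yt = ylt if ylt >= yrt else yrt   # equivalent to max(ylt, yrt)
    yb = yrb if ylb >= yrb else ylb   # equivalent to min(ylb, yrb)
    return yt, yb


# Illustrative ruled line positions detected on both sides of a character.
yt, yb = separation_positions(ylt=20, yrt=22, ylb=80, yrb=78)
```

Taking the inner-most position on each side keeps the extraction range between the ruled lines even when the line is inclined and the two sides of the character see the ruled lines at different heights.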
Subsequent to S11, the correction value determination unit 17 obtains a position correction value (S12).

The correction value determination unit 17 obtains the position correction value HENI, which is needed in S4 to set the positions of the character detection area and the ruled line detection area for the n-th character after the first character, for example as follows. It receives the column-direction character extraction start position YT(n−1) and end position YB(n−1) of the (n−1)-th character from the ruled line separation position determination unit 34. Using these extraction positions YT(n−1), YB(n−1) and the already set column-direction character detection area positions YT2(1), YB2(1) of the first character, it obtains HENI as defined by the following equation (12). When S12 is finished, the processing returns to S3.
When |YT(n−1) − YT2(1)| ≥ |YB(n−1) − YB2(1)|:
HENI = YB(n−1) − YB2(1)
When |YT(n−1) − YT2(1)| < |YB(n−1) − YB2(1)|:
HENI = YT(n−1) − YT2(1) ……(12)

As equation (3) shows, the column-direction character detection area positions YT2(1), YB2(1) and ruled line detection area positions YT1(1), YB1(1) of the first character are corrected by the position correction value HENI, and the corrected positions are input to the control unit 18 as the column-direction character detection area positions YT2(n), YB2(n) and ruled line detection area positions YT1(n), YB1(n) of the n-th character.
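Equation (12) picks whichever of the two displacements, top or bottom, has the smaller magnitude. A minimal Python sketch, with a hypothetical function name and illustrative values:

```python
def position_correction(yt_prev, yb_prev, yt2_first, yb2_first):
    """Return HENI according to equation (12).

    yt_prev, yb_prev: YT(n-1), YB(n-1) of the previous character.
    yt2_first, yb2_first: YT2(1), YB2(1) of the first character.
    """
    top_shift = yt_prev - yt2_first
    bottom_shift = yb_prev - yb2_first
    if abs(top_shift) >= abs(bottom_shift):
        return bottom_shift
    return top_shift


heni = position_correction(yt_prev=25, yb_prev=83, yt2_first=22, yb2_first=78)
```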
These column-direction character detection area positions and ruled line detection area positions are thus corrected in the column direction for each character.
Second Embodiment
FIG. 6 is a functional block diagram schematically showing the configuration of a second embodiment of the present invention. Components similar to those of the first embodiment are denoted by the same reference numerals.

The character extraction device of this embodiment comprises an image output unit 10, an image storage unit 12, a character interval detection unit 14, a character ruled line separation unit 16, a correction value determination unit 36, and a control unit 18.

The second embodiment is the same as the first except for the configuration and operation of the correction value determination unit 36. The following description focuses on the differences from the first embodiment, and detailed description of the points in common is omitted.

This embodiment is described in more detail below.
(Correction value determination unit)
The correction value determination unit 36 of this embodiment consists of an inclination calculation unit 36a and a correction value calculation unit 36b. The inclination calculation unit 36a obtains the inclination of the character line for each character up to the (N−1)-th character. The correction value calculation unit 36b obtains the position correction value as follows. For characters up to the (N−1)-th, it uses the inclination of the character line obtained for each character to obtain the position correction value of the character for which that inclination was obtained. For the N-th and subsequent characters, it uses the average of the character line inclinations of the characters up to the (N−1)-th character (the average inclination).

The correction value calculation unit 36b also sets the column-direction character detection area position and ruled line detection area position of the first character, and those of the character to be extracted next.
In this embodiment, the inclination of the character line is obtained from two distances. The row-direction distance a is measured from a suitably chosen row-direction reference position to the center position of a character, which is obtained from its already detected row-direction extraction positions. The column-direction distance b is measured from a suitably chosen column-direction reference position to an already detected column-direction ruled line position or character extraction position. The average inclination is obtained from these inclinations. Furthermore, the column-direction displacement, or range of displacement, of the character to be extracted next and of the ruled line is predicted using the inclination and a row-direction distance c. The distance c is measured from the row-direction reference position to the row-direction center position of the next character to be extracted, which is obtained using format information such as the character width and character pitch. The column-direction character detection area position and ruled line detection area position are then set based on this displacement or range of displacement.

For example, the distances a and c are obtained using the row-direction center position of the area near the first character as the row-direction reference position, and the distance b is obtained using the row position information LYT2 or LYB2 as the column-direction reference position.
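The passage above names the distances a, b, and c but gives no formulas. One plausible reading, assumed here purely for illustration, is that the inclination is the ratio b/a and the predicted column-direction displacement is the average inclination multiplied by c. All function names and sample numbers below are hypothetical.

```python
def inclination(a, b):
    """Assumed inclination of the character line: b / a."""
    if a == 0:
        raise ValueError("distance a must be non-zero")
    return b / a


def average_inclination(inclinations):
    """Mean of the inclinations obtained for the characters so far."""
    return sum(inclinations) / len(inclinations)


def predicted_displacement(avg_inclination, c):
    """Assumed column-direction shift at row-direction distance c."""
    return round(avg_inclination * c)


# Two already extracted characters, then a prediction for the next one.
slopes = [inclination(a=8, b=2), inclination(a=16, b=8)]
avg = average_inclination(slopes)
shift = predicted_displacement(avg, c=16)
```

Under this reading, averaging over several characters smooths out per-character detection noise before the shift for the next character is predicted.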
(Character interval detection unit)
The character interval detection unit 14 of this embodiment obtains, from the row-direction character extraction positions, the row-direction character center position used to obtain the distance a described above.

Next, to aid understanding of the invention, an example of the operation flow of this embodiment is described.

First, a description is given with reference to FIG. 7 and FIG. 3. FIG. 7 is an operation flow chart showing an example of the operation flow of this embodiment.

The extraction device of this embodiment starts operating when a processing start signal is input (START).
When operation starts, the image output unit 10 scans a predetermined reading range A of a document or the like and outputs black-and-white binary image data to the image storage unit 12. Meanwhile, the control unit 18 receives the following from a row position information detection unit (not shown): the row position information LJT2 and LJB2 representing the column-direction character detection area position of the first character, the row position information LJT1 and LJB1 representing the column-direction ruled line detection area position of the first character, the row position information LYT2 and LYB2 representing the ruled line positions near the first character, and the row-direction reference position XKC used to obtain the distance a (*S1).

The row position information detection unit detects the row position information LJT2, LJB2, LJT1, LJB1, LYT2, and LYB2 in the same manner as in the first embodiment. As the reference position XKC for obtaining the distance a, it sets, for example, the X-direction center position of the ruled line in the area Xa ≤ X ≤ Xb. This reference position XKC is expressed by the following equation.
XKC = Xa + (Xb − Xa)/2 ……(13)

Subsequent to *S1, the control unit 18 sets the extraction end flag to “0”, which indicates that the extraction process has not finished (*S2).

Next, the control unit 18 sets to 1 the character number counter, which counts the ordinal position, within the character line being processed, of the character whose extraction position is to be detected (*S3).
Next, the correction value calculation unit 36b sets, for the n-th character, the column-direction start position YT2(n) and end position YB2(n) of the character detection area and the column-direction start position YT1(n) and end position YB1(n) of the ruled line detection area (*S4).

・*S4a: when n = 1
The correction value calculation unit 36b receives the row position information LJT2, LJB2, LJT1, and LJB1 from the control unit 18 and sets the character detection area positions YT2(1), YB2(1) and the ruled line detection area positions YT1(1), YB1(1) of the first character according to the following equation (2).
YT2(1)=LJT2 YB2(1)=LJB2 YT1(1)=LJT1 YB1(1)=LJB1 ……(2) ・*S4b:n≠1の場合 補正値計算部36bは、最初の文字以降の第n番目(n
≧2)の文字については例えば次式(3)により定義さ
れる文字検出領域位置YT2(n)、YB2(n)及び罫線検
出領域位置YT1(n)、YB1(n)を設定する。YT2 (1) = LJT2 YB2 (1) = LJB2 YT1 (1) = LJT1 YB1 (1) = LJB1 (2) ** S4b: n ≠ 1 When the correction value calculation unit 36b is used, The n-th (n
For the characters of ≧ 2), for example, character detection area positions YT2 (n) and YB2 (n) and ruled line detection area positions YT1 (n) and YB1 (n) defined by the following equation (3) are set.
YT2(n)=YT2(n−1)+HENI YB2(n)=YB2(n−1)+HENI YT1(n)=YT1(n−1)+HENI YB1(n)=YB1(n−1)+HENI ……(3) 但し、HENIは補正値計算部36bが後述する*S19又は*S2
0で求めた位置補正値である。この位置補正値HENIを用
いて列方向の位置YT2(n)、YB2(n)、YT1(n)及
びYB1(n)を文字行の傾斜に応じて一文字毎に補正す
る。YT2 (n) = YT2 (n-1) + HENI YB2 (n) = YB2 (n-1) + HENI YT1 (n) = YT1 (n-1) + HENI YB1 (n) = YB1 (n-1) + HENI ... (3) However, HENI is calculated by the correction value calculation unit 36b in * S19 or * S2 described later.
This is the position correction value obtained with 0. Using the position correction value HENI, the positions YT2 (n), YB2 (n), YT1 (n) and YB1 (n) in the column direction are corrected for each character according to the inclination of the character line.
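Equations (2) and (3) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ColumnBands` container, the function names and the sample pixel values are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass
class ColumnBands:
    """Column-direction bands for one character (hypothetical container)."""
    yt2: int  # character detection area start, YT2(n)
    yb2: int  # character detection area end, YB2(n)
    yt1: int  # ruled line detection area start, YT1(n)
    yb1: int  # ruled line detection area end, YB1(n)


def first_bands(ljt2: int, ljb2: int, ljt1: int, ljb1: int) -> ColumnBands:
    """Equation (2): the first character takes the row position information as-is."""
    return ColumnBands(yt2=ljt2, yb2=ljb2, yt1=ljt1, yb1=ljb1)


def next_bands(prev: ColumnBands, heni: int) -> ColumnBands:
    """Equation (3): shift all four positions by the position correction value HENI."""
    return ColumnBands(
        yt2=prev.yt2 + heni,
        yb2=prev.yb2 + heni,
        yt1=prev.yt1 + heni,
        yb1=prev.yb1 + heni,
    )


# Invented sample values: the second character's bands move down by 2 pixels.
bands_1 = first_bands(ljt2=40, ljb2=80, ljt1=30, ljb1=90)
bands_2 = next_bands(bands_1, heni=2)
```

Because every position receives the same offset, the heights of both detection areas stay constant while the areas follow the inclined line.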
After *S4, the character extraction device of this embodiment sets the row-direction start position HS(n) and end position HE(n) of the character detection area (*S5).
・*S5a: when n = 1
First, the control unit 18 sets a provisional row-direction start position for the character detection area of the first character. It then scans the first character detection area while moving the sub-scanning position X step by step from the provisional start position toward the character, and obtains, for each scanning line, the cumulative number of black pixels on that column-direction scanning line. The sub-scanning position X of the scanning line whose cumulative number of black pixels exceeds a predetermined threshold is detected as the row-direction start position B1s of the first character.
Next, based on the detected start position B1s, the first marginal distribution creating unit 20 sets the row-direction start position HS(1) and end position HE(1) of the character detection area for the first character, defined for example by equation (4), and inputs HS(1) and HE(1) to the control unit 18.

HS(1) = B1s − β
HE(1) = HS(1) + HX ……(4)

・*S5b: when n ≠ 1
Based on the row-direction start position Bns of the n-th character to be extracted, the first marginal distribution creating unit 20 sets the row-direction start position HS(n) and end position HE(n), defined for example by equation (5), and inputs HS(n) and HE(n) to the control unit 18.

HS(n) = Bns − β
HE(n) = HS(n) + HX ……(5)

After *S5, the first marginal distribution creating unit 20 determines whether character extraction from the character line "ABCD" has finished (*S6).
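Step *S5 can be sketched as follows. The sketch assumes a binary image stored as a list of rows with `image[y][x] == 1` for a black pixel, and assumes that moving "toward the character" means increasing X; the function names are hypothetical.

```python
def find_first_start(image, x_provisional, y_top, y_bottom, threshold):
    """Return the first X at or after x_provisional whose column-direction
    black-pixel count exceeds threshold (B1s), or None if there is none."""
    width = len(image[0])
    for x in range(x_provisional, width):
        count = sum(image[y][x] for y in range(y_top, y_bottom + 1))
        if count > threshold:
            return x
    return None


def row_window(b_start, beta, hx):
    """Equations (4) and (5): HS = B - beta, HE = HS + HX."""
    hs = b_start - beta
    he = hs + hx
    return hs, he
```

The same `row_window` serves both cases, because equations (4) and (5) differ only in which start position (B1s or Bns) is passed in.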
If character extraction has finished, the first marginal distribution creating unit 20 rewrites the processing end flag held in the register to 1 (*S7), and then the first marginal distribution creating unit 20 and the black block detection unit 24 perform processing to detect the row-direction position of the character in the character detection area (*S8). If character extraction has not finished, *S8 is performed directly after *S6.
In *S8, the first marginal distribution creating unit 20 scans the image data of the character detection area C of the n-th character (the area HS(n) ≤ X ≤ HE(n) and YT2(n) ≤ Y ≤ YB2(n)) and creates the first marginal distribution. To create this marginal distribution, the cumulative number of black pixels CH(X) on the scanning line, obtained for each sub-scanning position X, is stored in the first marginal distribution storage unit 22. The cumulative number of black pixels CH(X) is given by equation (6).
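The first marginal distribution can be sketched as a column-wise black-pixel count. Equation (6) is not reproduced in this text, so the summation below follows the prose definition only; the image layout (`image[y][x] == 1` for black) and the function name are assumptions.

```python
def column_profile(image, hs, he, yt2, yb2):
    """CH(X) for HS <= X <= HE: number of black pixels on the
    column-direction scanning line at X, counted over YT2 <= Y <= YB2."""
    return {
        x: sum(image[y][x] for y in range(yt2, yb2 + 1))
        for x in range(hs, he + 1)
    }
```

Returning a mapping keyed by X keeps the profile tied to absolute image coordinates, which is how the later steps refer to positions.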
After the first marginal distribution has been created, the black block detection unit 24 reads the cumulative number of black pixels CH(X) from the first marginal distribution storage unit 22 and compares it with the threshold THLB at each sub-scanning position X. The row-direction start and end positions of the region in which CH(X) ≥ THLB are detected as the row-direction character start position Bns and end position BnE of the n-th character.
In this embodiment, the black block detection unit 24 detects the character start position Bns and end position BnE of the n-th character together with the row-direction character start position B(n+1)s of the next, (n+1)-th character, and outputs B(n+1)s to the control unit 18 for use in setting the character detection area of the (n+1)-th character to be extracted next.
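Black block detection as described above amounts to finding runs of consecutive X positions where CH(X) ≥ THLB. The following is a minimal sketch, assuming the profile is a mapping from consecutive X positions to counts; the function name is hypothetical.

```python
def black_blocks(profile, thlb):
    """Return (start, end) pairs of consecutive X positions with CH(X) >= THLB."""
    blocks = []
    start = None
    last = None
    for x in sorted(profile):
        if profile[x] >= thlb:
            if start is None:
                start = x  # a new block begins here
            last = x
        elif start is not None:
            blocks.append((start, last))  # the block ended at the previous X
            start = None
    if start is not None:
        blocks.append((start, last))  # block runs to the edge of the area
    return blocks
```

Under this reading, the first block gives (Bns, BnE) for the n-th character, and the start of the second block, when one exists, gives B(n+1)s.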
Next, the ruled line detection area position determination unit 26 determines the character extraction position of the n-th character based on its character start position Bns and end position BnE, and outputs this extraction position to the control unit 18. For example, the row-direction character start position Bns and end position BnE are used directly as the row-direction character extraction start position KXL(n) and end position KXR(n).
After *S8, the ruled line detection area position determination unit 26 sets the row-direction positions of the ruled line detection areas, based on the row-direction character start position Bns and end position BnE, so that the areas contain the ruled lines adjacent to the n-th character (*S9).
In this embodiment, a first ruled line detection area adjacent to the start position Bns of the n-th character and a second ruled line detection area adjacent to its end position BnE are set.
The row-direction start position BXLL(n) and end position BXLR(n) of the first ruled line detection area of the n-th character, and the row-direction start position BXRL(n) and end position BXRR(n) of the second ruled line detection area, are obtained, for example, according to equation (7), and BXLL(n), BXLR(n), BXRL(n) and BXRR(n) are output to the control unit 18.
When n = 1 or KXL(n) − 1 − β > KXR(n−1) + 1:
BXLL(n) = KXL(n) − 1 − β
When n ≥ 2 and KXL(n) − 1 − β ≤ KXR(n−1) + 1:
BXLL(n) = KXR(n−1) + 1
BXLR(n) = KXL(n) − 1
BXRL(n) = KXR(n) + 1
When n = nL (nL is the total number of characters in one character line) or KXR(n) + 1 + β < KXL(n+1) − 1:
BXRR(n) = KXR(n) + 1 + β
When n ≤ nL − 1 and KXR(n) + 1 + β ≥ KXL(n+1) − 1:
BXRR(n) = KXL(n+1) − 1 ……(7)

Along with setting these row-direction ruled line detection area positions, the ruled line detection area position determination unit 26 calculates, for the n-th character, the row-direction character center position XMC(n) used to obtain the distance a described above, and outputs XMC(n) to the control unit 18. The center position XMC(n) is given by equation (14).

XMC(n) = KXL(n) + (KXR(n) − KXL(n))/2 ……(14)

The description continues with reference to Figs. 7 and 5.
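Equations (7) and (14) can be sketched as follows. The sketch assumes the extraction positions are held in dictionaries keyed by character index n (starting at 1); the function names and the sample values used to exercise it are hypothetical.

```python
def ruled_line_windows(n, kxl, kxr, beta, n_last):
    """Equation (7): row-direction extents (BXLL, BXLR, BXRL, BXRR) of the
    first and second ruled line detection areas for character n."""
    left_wide = kxl[n] - 1 - beta
    if n == 1 or left_wide > kxr[n - 1] + 1:
        bxll = left_wide            # room for the full margin beta
    else:
        bxll = kxr[n - 1] + 1       # clipped at the previous character
    bxlr = kxl[n] - 1
    bxrl = kxr[n] + 1
    right_wide = kxr[n] + 1 + beta
    if n == n_last or right_wide < kxl[n + 1] - 1:
        bxrr = right_wide           # room for the full margin beta
    else:
        bxrr = kxl[n + 1] - 1       # clipped at the next character
    return bxll, bxlr, bxrl, bxrr


def center(kxl_n, kxr_n):
    """Equation (14): row-direction center position XMC(n)."""
    return kxl_n + (kxr_n - kxl_n) / 2
```

The two conditional branches keep each detection area from overlapping a neighboring character, so only the gap between characters is examined for ruled lines.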
After *S9, the second marginal distribution creating unit 28 and the ruled line detection unit 32 perform processing to detect the column-direction positions of the ruled lines in the first and second ruled line detection areas (*S10).
In *S10, the second marginal distribution creating unit 28 receives from the control unit 18 the positions of the first ruled line detection area D and the second ruled line detection area E of the n-th character, scans these areas, and obtains the cumulative numbers of black pixels BLH(Y) and BRH(Y) on the scanning line at each sub-scanning position Y. It stores BLH(Y) and BRH(Y) for each sub-scanning position Y in the second marginal distribution storage unit 30, thereby creating the second marginal distribution.

The cumulative numbers of black pixels BLH(Y) and BRH(Y) are defined, for example, by equation (8).
Next in *S10, for each ruled line adjacent to the n-th character, the position of the side of the ruled line nearer the character is detected. To do this, the ruled line detection unit 32 reads the cumulative number of black pixels BLH(Y) from the second marginal distribution storage unit 30 starting at YT2(n) in descending order of the sub-scanning position Y, compares it with the threshold THLL at each sub-scanning position Y, and detects the first sub-scanning position Y at which BLH(Y) ≥ THLL as the column-direction ruled line position YLT. It also reads BLH(Y) starting at YB2(n) in ascending order of Y, compares it with THLL at each sub-scanning position Y, and detects the first Y at which BLH(Y) ≥ THLL as the column-direction ruled line position YLB. The ruled line positions YLT and YLB are thus detected for the first ruled line detection area.
The threshold THLL is a constant that may be set as appropriate and is defined, for example, by equation (9).

THLL = {(BXLR(n) − BXLL(n) + 1)/2} + 1 ……(9)

Likewise, the ruled line detection unit 32 reads the cumulative number of black pixels BRH(Y) from the second marginal distribution storage unit 30 starting at YT2(n) in descending order of the sub-scanning position Y, compares it with the threshold THLR at each sub-scanning position Y, and detects the first Y at which BRH(Y) ≥ THLR as the column-direction ruled line position YRT. It also reads BRH(Y) starting at YB2(n) in ascending order of Y, compares it with THLR at each sub-scanning position Y, and detects the first Y at which BRH(Y) ≥ THLR as the column-direction ruled line position YRB. The ruled line positions YRT and YRB are thus detected for the second ruled line detection area.
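The ruled line search in *S10 can be sketched as below. Equation (8) is not reproduced in this text, so the row-wise count is an assumption based on the prose, as is the column-direction range over which the profile is built; integer division in the threshold of equation (9) is also an assumption. All function names are hypothetical.

```python
def row_profile(image, x_left, x_right, y_top, y_bottom):
    """Black pixels on each row-direction scanning line Y, counted within
    x_left..x_right (assumed form of BLH(Y) or BRH(Y))."""
    return {
        y: sum(image[y][x_left:x_right + 1])
        for y in range(y_top, y_bottom + 1)
    }


def threshold_for(x_left, x_right):
    """Equation (9) style threshold: half the area width plus one."""
    return (x_right - x_left + 1) // 2 + 1


def nearest_ruled_lines(profile, yt2, yb2, threshold):
    """Search from YT2 in descending Y and from YB2 in ascending Y for the
    first scanning line whose count reaches the threshold."""
    y_min, y_max = min(profile), max(profile)
    top = next(
        (y for y in range(yt2, y_min - 1, -1) if profile[y] >= threshold), None
    )
    bottom = next(
        (y for y in range(yb2, y_max + 1) if profile[y] >= threshold), None
    )
    return top, bottom
```

Starting both searches at the character detection area and moving outward means the first hit is the edge of each ruled line nearest the character, which is the position the text asks for.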
The threshold THLR is a constant that may be set as appropriate and is defined, for example, by equation (10).

THLR = {(BXRR(n) − BXRL(n) + 1)/2} + 1 ……(10)

After *S10, the ruled line separation position determination unit 34 sets the separation positions YT(n) and YB(n) of the n-th character, which separate the n-th character from its adjacent ruled lines in the column direction (*S11).
The separation positions YT(n) and YB(n) are defined, for example, by equation (11).

When YLT ≥ YRT: YT(n) = YLT
When YLT < YRT: YT(n) = YRT
When YLB ≥ YRB: YB(n) = YRB
When YLB < YRB: YB(n) = YLB ……(11)

The ruled line separation position determination unit 34 then sets the column-direction character extraction start and end positions based on these separation positions and outputs them to the control unit 18 and the inclination calculation unit 36a. For example, the separation positions YT(n) and YB(n) are used as the column-direction extraction start and end positions, and the control unit 18 outputs the row-direction and column-direction extraction positions KXL(n), KXR(n), YT(n) and YB(n) of the n-th character to a next-stage device (not shown).
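Equation (11) selects the larger of the two upper ruled line positions and the smaller of the two lower ones. A minimal sketch, with a hypothetical function name:

```python
def separation_positions(ylt, yrt, ylb, yrb):
    """Equation (11): YT(n) is the larger of YLT and YRT,
    YB(n) is the smaller of YLB and YRB."""
    yt = ylt if ylt >= yrt else yrt
    yb = yrb if ylb >= yrb else ylb
    return yt, yb
```

Taking the innermost of the positions found on each side keeps both ruled lines outside the extracted band even when the line is inclined across the width of the character.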
After *S11, the control unit 18 refers to the extraction end flag and determines whether the extraction processing has finished (*S12).
In *S12, if the extraction end flag = 1, the operation ends (END). If the extraction end flag = 0, the inclination calculation unit 36a compares the count value n of the character counter with N, which may be set as appropriate (N is a natural number satisfying 2 ≤ N ≤ nL) (*S13). If n < N in *S13, the n-th character is the (N−1)-th character or an earlier one, so the inclination calculation unit 36a calculates the inclination K(n) of the n-th character (*S14).
In *S14, the inclination calculation unit 36a receives the column-direction extraction positions YT(n) and YB(n) of the n-th character from the ruled line separation position determination unit 34, and also receives the center position XMC(n) of the n-th character and the row position information LYT2, LYB2 and XKC. It then calculates the inclination K(n) defined, for example, by equation (15), stores the calculated K(n), and outputs it to the correction value calculation unit 36b.

When |YT(n) − LYT2| ≥ |YB(n) − LYB2|:
K(n) = CS × (YB(n) − LYB2)/(XMC(n) − XKC)
When |YT(n) − LYT2| < |YB(n) − LYB2|:
K(n) = CS × (YT(n) − LYT2)/(XMC(n) − XKC) ……(15)

Here, CS is a constant that may be set as appropriate so that the inclination K(n) is calculated with good precision; for example, CS = 128.
In this embodiment, the difference between the row-direction reference position XKC and the row-direction character center position XMC(n) is taken as the distance a described above. The row position information LYT2 and LYB2 serves as the column-direction reference positions for obtaining the distance b described above, and the difference between the column-direction reference position LYT2 or LYB2 and the column-direction extraction position (ruled line position) YT(n) or YB(n) is taken as the distance b.
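Equation (15), together with the distances a and b, can be sketched as follows. Floating-point division is used here for clarity; whether the device uses integer arithmetic is not stated, so that choice and the function name are assumptions. The sketch also assumes XMC(n) ≠ XKC, since equation (15) is undefined otherwise.

```python
CS = 128  # scaling constant from the text


def inclination(yt_n, yb_n, lyt2, lyb2, xmc_n, xkc, cs=CS):
    """Equation (15): K(n) = CS * b / a, where b is the smaller in
    magnitude of the upper and lower column-direction displacements."""
    a = xmc_n - xkc  # row-direction distance a
    if abs(yt_n - lyt2) >= abs(yb_n - lyb2):
        b = yb_n - lyb2  # lower displacement is the smaller one
    else:
        b = yt_n - lyt2  # upper displacement is the smaller one
    return cs * b / a
```

With invented values YT(n) = 23, YB(n) = 62, LYT2 = 20, LYB2 = 60, XMC(n) = 200 and XKC = 100, the lower displacement b = 2 is chosen and K(n) = 128 × 2 / 100.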
After *S13, the correction value calculation unit 36b performs *S15 in order to obtain the position correction value HENI for the (n+1)-th character to be extracted next.

If n ≥ N in *S13, the inclination calculation unit 36a determines whether n = N (*S16).

If n = N in *S16, the n-th character is the N-th character, so the inclination calculation unit 36a calculates the average inclination AK used to obtain the position correction values for the N-th and subsequent characters (*S17). The average inclination AK is given by equation (16).

After *S17, the correction value calculation unit 36b performs *S15 in order to obtain the position correction value HENI for the (n+1)-th character to be extracted next.

If n ≠ N in *S16, the n-th character is the (N+1)-th character or a later one, so after *S16 the correction value calculation unit 36b performs *S15 in order to obtain the position correction value HENI for the (n+1)-th character to be extracted next.

In *S15, the correction value calculation unit 36b checks whether n < N and thereby decides whether the inclination K(n) or the average inclination AK is used to calculate the position correction value HENI for the next character.
If n < N in *S15, the (n+1)-th character to be extracted next is the N-th character or an earlier one, so the correction value calculation unit 36b uses the inclination K(n) to obtain the position correction value HENI for that character (*S18). The correction value calculation unit 36b receives K(n) from the inclination calculation unit 36a, receives the reference position XKC and the row-direction start position KXL(n+1) of the next character from the control unit 18, and calculates HENI according to, for example, equation (17).
HENI = (KXL(n+1) + CWH − XKC) × K(n)/CS ……(17)

Here, CWH is half the X-direction character width obtained from the format information. For example, with an X-direction character width of 4 mm and an image sensor resolution of 16 pixels/mm, CWH = 32. In this embodiment, XKC is the reference position for obtaining the distance c described above, the row-direction center position of the next character to be extracted, obtained using the format information, is KXL(n+1) + CWH, and the distance c is obtained as c = KXL(n+1) + CWH − XKC.
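The CWH example and equation (17) can be checked with a short sketch. The function names are hypothetical, and floating-point division is an assumption, since the text does not state the arithmetic used.

```python
def half_char_width(width_mm, pixels_per_mm):
    """CWH: half of the X-direction character width, in pixels."""
    return width_mm * pixels_per_mm // 2


def position_correction(kxl_next, cwh, xkc, k, cs=128):
    """Equation (17): HENI = c * K(n) / CS with c = KXL(n+1) + CWH - XKC."""
    c = kxl_next + cwh - xkc  # row-direction distance c
    return c * k / cs


cwh = half_char_width(4, 16)  # 4 mm at 16 pixels/mm gives 64 pixels, half is 32
```

Dividing by CS undoes the scaling applied in equation (15), so HENI comes out in pixels.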
After *S18, the control unit 18 adds 1 to the count value of the character counter (*S19), and the correction value calculation unit 36b then performs *S4.

If n ≥ N in *S15, the (n+1)-th character to be extracted next is a character after the N-th, so the correction value calculation unit 36b uses the average inclination AK to obtain the position correction value HENI for that character (*S20). The correction value calculation unit 36b receives AK from the inclination calculation unit 36a and obtains HENI according to, for example, equation (18).

HENI = (KXL(n+1) + CWH − XKC) × AK/CS ……(18)

After *S20, the control unit 18 performs *S19, and the correction value calculation unit 36b then performs *S4.
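The choice between equations (17) and (18) in *S15 can be sketched as follows. Equation (16) is not reproduced in this text; the sketch assumes AK is the arithmetic mean of K(1) through K(N−1), following the later statement that the average of the inclinations of the (N−1)-th and earlier characters is used. The function names and the list layout (K(1) at index 0) are hypothetical.

```python
def average_inclination(k_values):
    """Assumed form of AK: arithmetic mean of K(1)..K(N-1)."""
    return sum(k_values) / len(k_values)


def next_correction(n, big_n, k_history, kxl_next, cwh, xkc, cs=128):
    """HENI for character n+1: equation (17) when n < N, else equation (18).

    k_history holds K(1)..K(N-1), with K(1) at index 0.
    """
    c = kxl_next + cwh - xkc
    if n < big_n:
        slope = k_history[n - 1]                            # K(n)
    else:
        slope = average_inclination(k_history[:big_n - 1])  # AK
    return c * slope / cs
```

Once n reaches N no further per-character inclination is computed, which matches the text's point that using the average speeds up processing.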
According to the first and second embodiments described above, the character-side position of each ruled line adjacent to the character to be extracted is, in effect, set as the character extraction position, so each character can be separated from the ruled lines and extracted accurately, one character at a time.
Further, according to the second embodiment, the character detection area and the ruled line detection areas are set using the column-direction and row-direction positions and the inclinations of characters already extracted, so characters can be extracted accurately even when the character pitch is irregular. Applied to a character recognition device, this embodiment therefore yields a high-performance device that extracts characters correctly and recognizes them accurately even with an irregular character pitch. Also according to the second embodiment, the average of the inclinations obtained for the (N−1)-th and earlier characters in the character line is used to extract the N-th and subsequent characters, which speeds up processing.
This invention is not limited to the embodiments described above; the operation of each component, the flow of operations, the input and output signals, the numerical conditions of each parameter, the relative positional relationships, the number of components provided and other conditions may be changed as appropriate.
For example, in the embodiments described above the character detection area contains only the character and no ruled line, but it may be set to contain both the character and the ruled lines. Even then, the row-direction character position can be detected by setting the threshold THLB appropriately.
In the embodiments described above, the character detection area is set to contain the character being extracted and the following character, and the start position of the following character, detected from the change in the marginal distribution, is used to determine the row-direction positions of the character detection area and ruled line detection areas for the next character. Alternatively, the character detection area may be set to contain only the character being extracted and the blank area adjacent to it. The row-direction extraction position is then detected, and this position is combined with format information such as the character pitch and character width to set the row-direction character detection area position and ruled line position for the next character to be extracted.
The embodiments described above deal with extracting characters from a character line with ruled lines in horizontal writing, but the invention also applies to extracting characters from a character line with ruled lines in vertical writing, as shown in Fig. 8.
The embodiments described above deal with the case in which one ruled line runs along each of the two sides of a character line, but the invention also applies when there is a ruled line on only one side of the character line. In that case, the ruled line position near the first character is first detected in the same way as above, and this position is combined with format information such as the column-direction width of the line and the line spacing to set the column-direction character detection area position and ruled line detection area position for the first character.
The inclination of the character line and the position correction values for setting the character detection area and ruled line detection areas are likewise not limited to those of the embodiments described above and may be changed as appropriate. For example, the second embodiment obtains the inclination K(n) using the row position information LYT2 and LYB2 as reference positions, but the column-direction positions YT(1) and YB(1) of the first character may be used as reference positions instead. Further, in equation (15) of the second embodiment, the inclination K(n) is whichever is smaller of the inclination obtained from the column-direction extraction start position YT(n) and the inclination obtained from the column-direction extraction end position YB(n); instead, the average of the two may be used. Alternatively, the position correction value HENI obtained from the inclination based on YT(n) may be used to set YT1(n) and YT2(n), while the position correction value HENI obtained from the inclination based on YB(n) is used to set YB1(n) and YB2(n). Also, in the second embodiment, the inclination may be obtained from the positions of the ruled lines rather than from the column-direction character extraction positions.
(Effects of the Invention)
As is clear from the above description, the character extraction device of this invention sets the column-direction position correction value, which is used to set the character detection area and ruled line detection areas for the second and subsequent characters of a character line, based on the positions of the characters extracted before them.
The position correction value represents the amount, or the range, of displacement of the characters and ruled lines in the column direction. Consequently, even when the character line is inclined, the character detection area can be set accurately so that it contains the character to be extracted and no characters of adjacent character lines, and the ruled line detection areas can be set accurately so that they contain the ruled lines adjacent to that character.
The continuous change in character position along an inclined character line can thus be propagated to the character detection area and ruled line detection areas. When characters are extracted from an inclined character line with ruled lines, the characters of that line are therefore extracted correctly without picking up characters of adjacent character lines.
Because the character detection area and ruled line detection areas can be set in this way, characters can be extracted accurately and stably by a simple process: a marginal distribution is created within each area, compared with a predetermined threshold to examine its changes, and the positions of the character and ruled lines are detected from those changes. Since the processing is simple, the device configuration can also be simplified.
Setting the areas in this way also allows the position of the character to be extracted and the positions of the adjacent ruled lines to be detected accurately, so the character can be correctly separated from the ruled lines and extracted even when the character and a ruled line touch.
Applying the character extraction device of this invention to a character recognition device therefore makes it possible to build a compact, low-cost, high-performance character recognition device that extracts characters correctly from changes in the marginal distribution and recognizes them accurately. This holds even when a document in which other character lines lie near a character line with ruled lines is scanned in a direction deviating from the document's true line direction, or when the character line with ruled lines is printed at an inclination on the document.
FIG. 1 is a functional block diagram schematically showing the configuration of the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of the operation flow of the first embodiment.
FIGS. 3(A) and 3(B) are diagrams showing an example of image data and the peripheral distribution of a character detection area.
FIGS. 4(A) and 4(B) are diagrams showing an example of image data and the peripheral distribution of a character detection area.
FIGS. 5(A), 5(B), and 5(C) are diagrams showing an example of image data and the peripheral distribution of a ruled line detection area.
FIG. 6 is a functional block diagram schematically showing the configuration of the second embodiment of the present invention.
FIG. 7 is a diagram showing an example of the operation flow of the second embodiment.
FIG. 8 is a diagram showing an example of a vertically written character line accompanied by a ruled line.
10: image output unit; 12: image storage unit; 14: character interval detection unit; 16: character ruled line separation unit; 17, 36: correction value determination unit; 18: control unit.
Claims (1)
1. A character extraction device comprising:
an image output unit that outputs image data of a character line accompanied by a ruled line;
an image recording unit that stores the image data;
a character interval detection unit that scans, in the column direction, the image data of a character detection area set for each character of the character line, detects the cumulative number of black pixels for each scanning line to create a first peripheral distribution, detects the position of the character within the character detection area from changes in the first peripheral distribution, and, based on the position of the character, sets the position of a ruled line detection area adjacent to the character and sets a character extraction position in the line direction;
a character ruled line separation unit that scans the image data of the ruled line detection area in the line direction, detects the cumulative number of black pixels for each scanning line to create a second peripheral distribution, detects the position of the ruled line from changes in the second peripheral distribution, sets a separation position between the character and the ruled line based on the position of the ruled line, and sets a character extraction position in the column direction based on the separation position;
a correction value determination unit that sets a column-direction position correction value, used for setting the character detection area and the ruled line detection area of the second and subsequent characters of the character line, based on the positions of previously extracted characters; and
a control unit that outputs the character extraction position for each character.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP1264733A JP2581809B2 (en) | 1989-10-11 | 1989-10-11 | Character extraction device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP1264733A JP2581809B2 (en) | 1989-10-11 | 1989-10-11 | Character extraction device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPH03126186A JPH03126186A (en) | 1991-05-29 |
| JP2581809B2 true JP2581809B2 (en) | 1997-02-12 |
Family
ID=17407418
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP1264733A Expired - Lifetime JP2581809B2 (en) | 1989-10-11 | 1989-10-11 | Character extraction device |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JP2581809B2 (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2951814B2 (en) * | 1993-02-25 | 1999-09-20 | Fujitsu Limited | Image extraction method |
| JP7301529B2 (en) * | 2018-11-30 | 2023-07-03 | Canon Inc. | Image processing device, image processing method, and program |
- 1989-10-11: JP application JP1264733A, patent JP2581809B2, status: not active (Expired - Lifetime)
Also Published As
| Publication number | Publication date |
|---|---|
| JPH03126186A (en) | 1991-05-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6975762B2 (en) | | Ruled line extracting apparatus for extracting ruled line from normal document image and method thereof |
| EP0138445B1 (en) | | Method and apparatus for segmenting character images |
| JP3411472B2 (en) | | Pattern extraction device |
| KR100383858B1 (en) | | Character extracting method and device |
| JP4011646B2 (en) | | Line detection method and character recognition device |
| US5982952A (en) | | Optical character reader with tangent detection for detecting tilt of image data |
| JP2581809B2 (en) | | Character extraction device |
| JP2917427B2 (en) | | Drawing reader |
| JP3187895B2 (en) | | Character area extraction method |
| JP4731748B2 (en) | | Image processing apparatus, method, program, and storage medium |
| JPH04241074A (en) | | Automatic document clean copying device |
| JP3710164B2 (en) | | Image processing apparatus and method |
| JP2616967B2 (en) | | Tilt extraction device |
| JP2812704B2 (en) | | Character extraction device |
| JPH08171609A (en) | | High-speed character string extraction device |
| JPH05298487A (en) | | Alphabet recognizing device |
| JPH0535914A (en) | | Picture inclination detection method |
| JP2812705B2 (en) | | Character extraction device |
| JPH1040334A (en) | | Pattern extraction device and method for extracting pattern area |
| JP2001155113A (en) | | Character recognizing device and method of detecting character frame line |
| JPH05174179A (en) | | Document image processor |
| JPH05266250A (en) | | Character string detector |
| JPH0452886A (en) | | Character recognizing device |
| JPH05225381A (en) | | Optical character reader |
| JPH09238247A (en) | | Optical character reader |