JPH083827B2

JPH083827B2 - Character image processing method

Info

Publication number: JPH083827B2
Application number: JP62267227A
Authority: JP
Inventors: 秀明田中; 義弘北村; 敏昭森田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1987-10-21
Filing date: 1987-10-21
Publication date: 1996-01-17
Anticipated expiration: 2011-01-17
Also published as: JPH01108691A

Description

[Detailed Description of the Invention]

Industrial Field of Application

The present invention relates to a character image processing method suitable for use in optical readers (OCR) and similar devices.

Prior Art

Optical readers (OCR) have long been used as one type of input means for computers and the like. Such an optical reader illuminates a document with light from a light source, reads the image with an image pickup device, determines the type of each character (alphanumeric character, symbol, and so on) making up the read image, and performs the input.

In such an optical reader, character strings are extracted from the read image, and individual characters are then extracted from each character string. Each extracted character is identified by comparing it with the shape patterns of all the characters that the optical reader stores as a dictionary.

Problems to Be Solved by the Invention

Because a conventional optical reader compares each extracted character with the shape patterns of all the characters stored in its dictionary, the time required for identification grows in proportion to the number of shape patterns.

One prior art approach to this problem is disclosed, for example, in Japanese Unexamined Patent Publication No. Sho 55-112687. In this prior art, the character group is determined using a threshold calculated from the height of the characters on the form and a predetermined constant.

In general input documents, however, character fonts vary, so the relative upper and lower coordinate difference lengths measured from the character height or from a reference line also vary widely. When such prior art is applied to general documents, it therefore cannot cope with variations that exceed the threshold, and character groups can no longer be determined accurately.

An object of the present invention is to provide a character image processing method that shortens the time required for character identification and improves the identification rate.

Means for Solving the Problems

The present invention is a character image processing method characterized by:

reading a character string composed of a plurality of characters arranged along one arrangement direction and storing it in an image memory 3;

reading out the stored contents of the image memory 3 and defining an upper basic line l2 and a lower basic line l1, parallel to each other, that indicate the vertical range occupied by the main parts of the characters included in the character string;

obtaining the absolute difference distance between the upper basic line l2 and the upper extraction coordinate of each of the characters, obtaining the frequency of each absolute difference distance, and, when two groups 22, 23 of high frequency exist, defining an upper threshold line l8, parallel to the upper basic line l2, between the groups 22, 23 from the coordinate between the groups 22, 23 along the direction perpendicular to the upper and lower basic lines l2, l1 and from the coordinate of the upper basic line l2;

obtaining the absolute difference distance between the lower basic line l1 and the lower extraction coordinate of each of the characters, obtaining the frequency of each absolute difference distance, and, when two groups of high frequency exist, defining a lower threshold line l9, parallel to the lower basic line l1, between those groups from the coordinate between the groups along the direction perpendicular to the upper and lower basic lines l2, l1 and from the coordinate of the lower basic line l1; and

classifying each character into one of a plurality of character groups according to its positional relationship to the upper basic line l2, the lower basic line l1, the upper threshold line l8, and the lower threshold line l9.

Operation

According to the present invention, a plurality of characters are arranged along one arrangement direction to form a character string. This character string is read and stored in the image memory 3, and the stored contents of the image memory 3 are read out to define the upper basic line l2 and the lower basic line l1. The upper basic line l2 and the lower basic line l1 can be determined by known techniques.

The frequency of the absolute difference distance between the upper basic line l2 and the upper extraction coordinate of each character is then obtained. When two groups 22, 23 of high frequency exist, the upper threshold line l8 is determined from the coordinate between those groups 22, 23 (the valley coordinate in the embodiment described later) and the coordinate of the upper basic line l2. Similarly, the absolute difference distance between the lower basic line l1 and the lower extraction coordinate of each character is obtained to determine the lower threshold line l9. The characters are then classified into a plurality of character groups according to their positional relationships to the upper and lower basic lines l2, l1 and the upper and lower threshold lines l8, l9. Because the upper threshold line l8 and the lower threshold line l9 are determined dynamically in accordance with the heights of the characters, the method adapts to font changes in general documents and allows character groups to be determined stably and accurately.

Embodiment

First, a concept called the basic line is used to classify characters. FIG. 2 illustrates this concept. For a character string that has been read and separated row by row, virtual lines l1 and l2 are assumed that run along the character arrangement direction and indicate the vertical range occupied by the main parts of the characters. Each character is basically written aligned on the virtual line l1.

In this case, part of a letter such as "g" or "y" projects below the virtual line l1, and part of a letter such as "h" or "k" projects above the virtual line l2. The main part of each character is written between the virtual lines l1 and l2. These virtual lines l1 and l2 are called the lower basic line and the upper basic line, respectively.

FIG. 3 shows specific examples of characters classified using the lower basic line l1 and the upper basic line l2. FIG. 3(1) shows characters part of which projects above the upper basic line l2; these are called upper protruding characters. FIG. 3(2) shows characters part of which projects below the lower basic line l1; these are called lower protruding characters. FIG. 3(3) shows characters written between the lower basic line l1 and the upper basic line l2; these are called intermediate characters. FIG. 3(4) shows characters part of which projects both below the lower basic line l1 and above the upper basic line l2; these are called other characters. FIG. 3(5) shows symbols.

FIG. 1 is a block diagram showing the basic configuration of an optical reader 1 according to the present invention.

The optical reader 1 comprises: an image input device 2 using a solid-state image sensor or the like; an image memory 3 that stores the image read by the image input device 2; a character string image memory 4 that stores extracted character string images; an upper basic line coordinate memory 5 and a lower basic line coordinate memory 6 that store the coordinates of the upper basic line l2 and the lower basic line l1 extracted for each character string; a character extraction coordinate memory 7 that stores the extraction coordinates of the extracted characters; a histogram memory 8 that stores the histogram obtained from the absolute difference distances between the character extraction coordinates and the upper or lower basic line coordinate; an upper threshold coordinate memory 9 and a lower threshold coordinate memory 10 that store the upper and lower threshold coordinates detected from the histogram; a center line coordinate memory 11 that stores the coordinate of the center line obtained from the upper basic line l2 and the lower basic line l1; a character group flag memory 12 that stores the character group flags of the identified characters; and a control unit 13 that controls the operation of all of these components in a unified manner.

The coordinates referred to above represent vertical position in FIGS. 2, 3 and 7. The lines l1 to l9, which extend horizontally in FIGS. 2, 3 and 7, are parallel to one another, so each of the lines l1 to l9 can be specified by a single coordinate. The vertical direction of FIGS. 2, 3 and 7 corresponds to the horizontal direction in FIG. 6, described later.

FIG. 4 is a flowchart explaining the basic operation of the optical reader 1.

In step a1, the image read by the image input device 2 is stored in the image memory 3. In step a2, character string images are extracted from the image stored in the image memory 3 and stored in the character string image memory 4. In step a3, the upper basic line l2 and the lower basic line l1 are extracted from the character string image. A virtual vertical coordinate axis, perpendicular to the upper and lower basic lines l2, l1 of FIGS. 2, 3 and 7, is set up in the character string image memory 4, and the upper and lower basic line coordinates are obtained and stored in the upper basic line coordinate memory 5 and the lower basic line coordinate memory 6. In step a4, characters are extracted from the character string image, and their extraction coordinates on the virtual coordinate axis are stored in the character extraction coordinate memory 7.
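
The extraction coordinates of step a4 can be pictured with a small sketch. The text does not say how they are computed, so the following is only an illustration under stated assumptions: the glyph is a binary raster given top row first, the virtual axis has y increasing upward with the bottom raster row at y = 0, and the function name `vertical_extent` is hypothetical.

```python
def vertical_extent(glyph_rows):
    """Return (upper, lower) extraction coordinates of one glyph.

    glyph_rows is a list of pixel rows, top row first, each a list of 0/1.
    Coordinates use an assumed y-up axis whose origin is the bottom row.
    Returns None when the glyph contains no ink.
    """
    height = len(glyph_rows)
    # Convert each inked raster row index to a y-up coordinate.
    inked = [height - 1 - r for r, row in enumerate(glyph_rows) if any(row)]
    if not inked:
        return None
    return max(inked), min(inked)
```

With this convention the upper extraction coordinate corresponds to line l4 and the lower one to line l5 of FIG. 7.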

In step a5, the upper threshold coordinate is detected. FIG. 5 is a flowchart of upper threshold coordinate detection, and FIG. 6 shows an example of the histogram created during upper threshold coordinate detection. The method of detecting the upper threshold coordinate is described with reference to FIGS. 5 and 6.

In step b1, the absolute difference distances (the distances between the upper basic line coordinate and the upper extraction coordinate of each character) are calculated. In step b2, a histogram such as that of FIG. 6 is created from the absolute difference distances and stored in the histogram memory 8. In step b3, the valley 21 is detected between the two peaks formed in the histogram: peak 22, formed by character upper extraction coordinates lying near the upper basic line l2, and peak 23, formed by character upper extraction coordinates lying far from the upper basic line l2. In step b4, the upper threshold coordinate is obtained from the valley coordinate on the histogram and the coordinate of the upper basic line l2, and is stored in the upper threshold coordinate memory 9. As one example of how to obtain the upper threshold coordinate, the difference between the valley coordinate and the coordinate of the upper basic line l2 is taken to give an offset pixel count ΔA (see FIG. 6), and this offset pixel count ΔA is added to the coordinate of the upper basic line l2. This upper threshold coordinate is the coordinate of the upper threshold line l8 described later. The same applies to the lower threshold coordinate described next.
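
Steps b1 to b4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not specify how the valley 21 is located, so the search rule here (the bin that lies deepest below the lower of the peaks on either side, taking the middle bin of a flat valley) is an assumption, as are the y-up axis and the names `valley_offset` and `upper_threshold`. Because the histogram axis is already a distance from the upper basic line l2, the valley bin is used directly as the offset ΔA.

```python
from collections import Counter

def valley_offset(distances):
    """Bin (in pixels) of the deepest valley between two histogram peaks,
    or None when the distances form only one group."""
    if not distances:
        return None
    hist = Counter(distances)                      # step b2: histogram
    counts = [hist.get(d, 0) for d in range(max(distances) + 1)]
    best_depth, best_bins = 0, []
    for v in range(1, len(counts) - 1):            # step b3: valley search
        left_peak = max(counts[:v])
        right_peak = max(counts[v + 1:])
        depth = min(left_peak, right_peak) - counts[v]
        if depth > best_depth:
            best_depth, best_bins = depth, [v]
        elif depth == best_depth and depth > 0:
            best_bins.append(v)
    if not best_bins:
        return None
    return best_bins[len(best_bins) // 2]          # middle of a flat valley

def upper_threshold(upper_base_y, char_top_ys):
    """Coordinate of the upper threshold line l8, or None if no valley exists."""
    # Step b1: absolute difference distance for every character.
    distances = [abs(y - upper_base_y) for y in char_top_ys]
    offset = valley_offset(distances)              # offset pixel count (delta A)
    if offset is None:
        return None
    return upper_base_y + offset                   # step b4
```

Under the same assumptions, the lower threshold of step a6 would mirror this with the lower basic line l1 and the lower extraction coordinates, subtracting the offset instead of adding it.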

The upper threshold coordinate is thus set to the value indicated by line l8, and this line l8 is defined as the upper threshold line, described later, parallel to the upper and lower basic lines l2, l1.

Next, in step a6 of FIG. 4, the lower threshold coordinate is detected in the same way as in step a5 and stored in the lower threshold coordinate memory 10.

In steps a7 to a11, the character group is determined. FIG. 7 shows the relative positional relationships among the virtual x-y coordinate axes, the characters, the reference lines used in character identification, and the character extraction lines. The operation of steps a7 to a11 is described with reference to FIG. 7.

In step a7, the symbol group is determined. First, the center line l3 is obtained from the upper basic line l2 and the lower basic line l1, and the center line coordinate is stored in the center line coordinate memory 11. As shown in FIG. 7(1), when the y coordinate of the upper extraction line l4 is smaller than the y coordinate of the center line l3, that is, when the upper extraction line l4 lies below the center line l3, the character is determined to be a period (.) or a comma (,). (Likewise below, the y coordinate of the virtual coordinate axis is used to decide whether one line is above or below another.) When the lower extraction line l5 lies above the center line l3, the character is determined to be an apostrophe (') or a double quotation mark (").

To identify the symbol "・" or "-", a symbol upper threshold line l6 and a symbol lower threshold line l7 are detected using 1/4 of the distance between the upper basic line l2 and the lower basic line l1 as the threshold. When the upper extraction line l4 lies below the symbol upper threshold line l6 and the lower extraction line l5 lies above the symbol lower threshold line l7, the character is determined to be "・" or "-". When a character is determined to be a symbol in this way, the symbol group flag is set in the corresponding part of the character group flag memory 12.
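
The symbol test of step a7 can be sketched as below. The text states only that 1/4 of the distance between the basic lines is used as the threshold. Placing l6 one quarter below l2 and l7 one quarter above l1 is an assumption of this sketch, as are the y-up axis, the function name `classify_symbol`, and the returned labels.

```python
def classify_symbol(top_y, bottom_y, lower_base_y, upper_base_y):
    """Return a coarse symbol label, or None if the character is not a symbol.

    top_y / bottom_y are the extraction lines l4 / l5;
    lower_base_y / upper_base_y are the basic lines l1 / l2 (y increases upward).
    """
    center = (lower_base_y + upper_base_y) / 2       # center line l3
    quarter = (upper_base_y - lower_base_y) / 4
    symbol_upper = upper_base_y - quarter            # assumed position of l6
    symbol_lower = lower_base_y + quarter            # assumed position of l7
    if top_y < center:
        return "low_symbol"       # "." or ","
    if bottom_y > center:
        return "high_symbol"      # apostrophe or double quotation mark
    if top_y < symbol_upper and bottom_y > symbol_lower:
        return "middle_symbol"    # middle dot or hyphen
    return None
```

The tests are applied in the order the text gives them, so a mark lying entirely above the center line is reported as a high symbol before the middle-band test is reached.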

In step a8, the upper protruding character group is determined. For a character not determined to belong to the symbol group in step a7, when, as shown in FIG. 7(2), the upper extraction line l4 lies above the upper threshold line l8 and the lower extraction line l5 lies above the lower threshold line l9, the character is determined to be an upper protruding character, and the upper protruding character group flag is set in the corresponding part of the character group flag memory 12.

In step a9, the lower protruding character group is determined. For a character determined in steps a7 and a8 to belong to neither the symbol group nor the upper protruding character group, when, as shown in FIG. 7(3), the upper extraction line l4 lies below the upper threshold line l8 and the lower extraction line l5 lies below the lower threshold line l9, the character is determined to be a lower protruding character, and the lower protruding character group flag is set in the corresponding part of the character group flag memory 12.

In step a10, the intermediate character group is determined. For a character determined in steps a7, a8 and a9 to belong to none of the symbol group, the upper protruding character group, and the lower protruding character group, when, as shown in FIG. 7(4), the upper extraction line l4 lies below the upper threshold line l8 and the lower extraction line l5 lies above the lower threshold line l9, the character is determined to belong to the intermediate character group, and the intermediate character group flag is set in the corresponding part of the character group flag memory 12.

In step a11, the other character group is determined. A character not assigned to any character group in steps a7, a8, a9 and a10 (a character such as that shown in FIG. 7(5)) is determined to be an other character, and the other character group flag is set in the corresponding part of the character group flag memory 12.

In this way, steps a7 to a11 narrow each character extracted from the read image down to one of five groups: the symbol group, the upper protruding character group, the lower protruding character group, the intermediate character group, and the other character group. For symbols, it is possible not only to determine that the character belongs to the symbol group but also to determine its type to some extent.
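
The decisions of steps a8 to a11 reduce to two comparisons per character, which the following sketch illustrates. It assumes the symbol group has already been filtered out in step a7, that y increases upward, and that a line lying exactly on a threshold counts as not protruding (the text does not address equality). The name `classify_character` and the group labels are hypothetical.

```python
def classify_character(top_y, bottom_y, upper_threshold_y, lower_threshold_y):
    """Assign a non-symbol character to one of four groups.

    top_y / bottom_y are the extraction lines l4 / l5;
    upper_threshold_y / lower_threshold_y are the threshold lines l8 / l9.
    """
    above_upper = top_y > upper_threshold_y      # l4 above l8
    below_lower = bottom_y < lower_threshold_y   # l5 below l9
    if above_upper and not below_lower:
        return "upper_protruding"                # step a8, e.g. "h", "k"
    if below_lower and not above_upper:
        return "lower_protruding"                # step a9, e.g. "g", "y"
    if not above_upper and not below_lower:
        return "intermediate"                    # step a10
    return "other"                               # step a11
```

A recognizer built on this grouping could then compare a character only against the dictionary patterns of its own group, which is how the classification serves the stated aim of shortening identification time.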

Effect

As described above, according to the present invention, the upper and lower basic lines l2, l1, which indicate the vertical range occupied by the main parts of the characters making up the character string read and stored in the image memory 3, are defined; frequencies are obtained from the absolute difference distances between the extraction coordinates of the upper and lower parts of each character and the upper and lower basic lines l2, l1; the upper and lower threshold lines l8, l9 are defined between the two groups 22, 23 of high frequency; and each character is classified into one of a plurality of character groups by means of the upper and lower basic lines l2, l1 and the upper and lower threshold lines l8, l9. The method therefore adapts dynamically to font changes in general documents and allows stable, accurate classification into character groups.

[Brief description of drawings]

FIG. 1 is a block diagram showing the basic configuration of the optical reader 1; FIG. 2 is a diagram explaining the basic lines; FIG. 3 is a diagram showing specific examples of characters classified using the basic lines; FIG. 4 is a flowchart showing the basic operation of the optical reader 1; FIG. 5 is a flowchart showing the operation during threshold coordinate detection; FIG. 6 is a histogram created during threshold coordinate detection; and FIG. 7 is a diagram showing the relative positional relationships among the virtual coordinate axes, the characters, the reference lines used in character identification, and the character extraction lines.

1…optical reader, 2…image input device, 3…image memory, 4…character string image memory, 5…upper basic line coordinate memory, 6…lower basic line coordinate memory, 7…character extraction coordinate memory, 8…histogram memory, 9…upper threshold coordinate memory, 10…lower threshold coordinate memory, 11…center line coordinate memory, 12…character group flag memory, 13…control unit, l1…lower basic line, l2…upper basic line, l3…center line, l4…upper extraction line, l5…lower extraction line, l6…symbol upper threshold line, l7…symbol lower threshold line, l8…upper threshold line, l9…lower threshold line

Claims

[Claims]

1. A character image processing method characterized by: reading a character string composed of a plurality of characters arranged along one arrangement direction and storing it in an image memory 3; reading out the stored contents of the image memory 3 and defining an upper basic line l2 and a lower basic line l1, parallel to each other, that indicate the vertical range occupied by the main parts of the characters included in the character string; obtaining the absolute difference distance between the upper basic line l2 and the upper extraction coordinate of each of the characters, obtaining the frequency of each absolute difference distance, and, when two groups 22, 23 of high frequency exist, defining an upper threshold line l8, parallel to the upper basic line l2, between the groups 22, 23 from the coordinate between the groups 22, 23 along the direction perpendicular to the upper and lower basic lines l2, l1 and from the coordinate of the upper basic line l2; obtaining the absolute difference distance between the lower basic line l1 and the lower extraction coordinate of each of the characters, obtaining the frequency of each absolute difference distance, and, when two groups of high frequency exist, defining a lower threshold line l9, parallel to the lower basic line l1, between those groups from the coordinate between the groups along the direction perpendicular to the upper and lower basic lines l2, l1 and from the coordinate of the lower basic line l1; and classifying each character into one of a plurality of character groups according to its positional relationship to the upper basic line l2, the lower basic line l1, the upper threshold line l8, and the lower threshold line l9.