JPS6136680B2

Info

Publication number: JPS6136680B2
Application number: JP55120190A
Authority: JP
Inventors: Kazuo Yokoyama
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-08-30
Filing date: 1980-08-30
Publication date: 1986-08-19
Also published as: JPS5745677A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a positioning method for determining the positions at which a run of characters is separated into individual characters in character recognition.

In character recognition, a medium on which characters are printed is scanned to obtain a video signal, which is temporarily stored in a memory; the characters are then cut out one at a time and recognized according to various logic. Assuming that separation into lines has already been done, cutting out characters only requires inserting breaks within a line so that each character stands alone. The present invention concerns a method of deciding where to insert these breaks.

When the reading area is set, the host device supplies the address of the reading start point (field start point, FSP), the address of the reading end point (field end point, FEP), and the number of characters to be read between these two points. These FSP and FEP values, however, usually differ slightly from the actual start and end points on the image. If they did coincide, then, since FSP and FEP are given as a number of bits from a base point (one bit corresponding to one memory cell), the character spacing, i.e. the pitch P, would be obtained as (FEP - FSP)/n, where n is the number of characters, and each character could be cut out by stepping from FSP or FEP at intervals of the pitch P. If FSP and FEP are misaligned, however, this method may place the cut lines through the characters themselves.
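As a rough illustration of the naive scheme just described, the following Python sketch computes the pitch from the host-supplied end points and lists the resulting cut positions. The function name and the sample addresses are assumptions made for illustration; the patent gives no concrete values, and the sketch assumes the pitch comes out as a whole number of bits.

```python
def naive_cut_positions(fsp, fep, n_chars):
    """Cut positions obtained by stepping from FSP at pitch P = (FEP - FSP) / n."""
    if n_chars <= 0:
        raise ValueError("n_chars must be positive")
    span = fep - fsp
    if span % n_chars != 0:
        raise ValueError("this sketch assumes an integer pitch")
    pitch = span // n_chars
    # One cut before each character plus one after the last character.
    return pitch, [fsp + k * pitch for k in range(n_chars + 1)]


# Hypothetical values: a 36-bit field holding 6 characters.
pitch, cuts = naive_cut_positions(fsp=10, fep=46, n_chars=6)
print(pitch)  # 6
print(cuts)   # [10, 16, 22, 28, 34, 40, 46]
```

These positions are only as good as FSP and FEP themselves, which is the weakness the invention addresses.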

FIG. 1 is a diagram explaining this character cut-out method. Some characters in the line may be missing, but if the number of characters between FSP and FEP is taken as Cm, the character count supplied here is the maximum number that can fit within the interval (FEP - FSP), and therefore Cm = (FEP - FSP)/P. Text printed by a machine such as a line printer or typewriter has an accurate pitch, so P₁, P₂ and P₃ in FIG. 1 are all equal. FIG. 2 shows the offset between FSP etc. and the image: 1 is the medium on which the characters are printed and 2 to 4 are its images, where 2 has an offset E of 0 and 3 and 4 have E > 0. When E becomes large, or when the characters are packed closely so that the margin between them is small, the cut line l passes through a character.
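The effect of the offset E can be illustrated with a small Python sketch that checks whether a cut position lands on a column containing black bits. The helper name, the projection values, and the cut positions are all hypothetical; they are chosen only to model characters 4 bits wide printed at a pitch of 6.

```python
def cuts_through_ink(projection, cuts):
    """Return the cut positions that land on a column containing black bits."""
    return [c for c in cuts if 0 <= c < len(projection) and projection[c] > 0]


# Hypothetical column sums: three characters, each 4 bits wide with a 2-bit gap.
projection = [3, 5, 5, 3, 0, 0] * 3

aligned = [4, 10, 16]                # offset E = 0: cuts fall in the gaps
shifted = [c + 3 for c in aligned]   # offset E = 3: cuts drift into the characters

print(cuts_through_ink(projection, aligned))  # []
print(cuts_through_ink(projection, shifted))  # [7, 13]
```

The third shifted cut (19) lies beyond the modelled region and is simply ignored by this sketch.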

The present invention aims to cut out characters accurately by correcting FSP and FEP on the basis of measured results. It is characterized as follows. In a positioning method for cutting out each character in a line from character image information temporarily stored in the memory of a character recognition device, the character pitch is calculated from the given reading start point information, reading end point information, and the maximum number of printed characters contained between these end points. The bits of the binary information representing the characters in the line are added in the line-width direction, and the resulting group of values along the line direction is divided from the end point at the character pitch to obtain a plurality of numerical blocks. The values of the numerical blocks are accumulated across blocks for each in-block bit number, and the bit number giving the smallest accumulated value is found. The end point information is corrected by this bit number to obtain the true end point, and the positions at character-pitch intervals from the true end point are used as the cut-out positions for the characters. An embodiment is described in detail below.

First, the positioning method of the present invention is explained with reference to FIG. 3. In this figure, 10 is a memory into which the black-and-white binary image signal obtained by scanning the characters on the medium is written. The drawing shows the stored state as a picture of the actual image. Each small square frame 12 is one memory cell. Suppose FSP and FEP are given as shown (in this example they are shifted slightly to the right). To allow a margin, the point FSP′ lying P/2 outside FSP is taken as a provisional starting point, and the vertical projection is computed from there (over one extra character width). The result is the number sequence 14. In this example there is no character near FSP′, so 0 (white) continues; at the fifth position 1 black bit appears, next 6 black bits appear, and then 0 continues for a while. This sequence is divided into blocks of pitch P (P = 6 in this example), and the values of the blocks are accumulated across blocks for each in-block bit number. Here the in-block bit number of a value is its position counted from the start of its block. The result is as follows.
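The projection and block-splitting steps can be sketched in Python as follows. This is a minimal illustration, not the patented circuit: the function names and the tiny 3-row bitmap are assumptions, and `start` stands in for the provisional starting point FSP′.

```python
def vertical_projection(bitmap):
    """Number of black (1) bits in each column of a row-major binary image."""
    return [sum(column) for column in zip(*bitmap)]


def split_into_blocks(projection, start, pitch):
    """Split the projection into consecutive blocks of `pitch` values from `start`."""
    usable = projection[start:]
    n_blocks = len(usable) // pitch   # an incomplete trailing block is dropped
    return [usable[i * pitch:(i + 1) * pitch] for i in range(n_blocks)]


# Hypothetical 3-row image, 12 columns wide (1 = black, 0 = white).
bitmap = [
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0],
]
projection = vertical_projection(bitmap)
blocks = split_into_blocks(projection, start=0, pitch=6)
print(projection)  # [0, 0, 3, 2, 0, 0, 0, 0, 2, 3, 0, 0]
print(blocks)      # [[0, 0, 3, 2, 0, 0], [0, 0, 2, 3, 0, 0]]
```

Each block then lines up the same in-block bit numbers in the same list position, which is what makes the later per-position accumulation straightforward.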

     0   0   0   0   1   6
     0   0   0   0   3   3
     4   0   0   0   3   2
     5   0   0   0   3   3
     6   0   0   3   3   0
     3   3   0   0   0   0
    ----------------------
    18   3   0   3  13  14   (accumulated totals)

Between characters there is naturally no ink, so the value should be 0. Within a character, 0 does not occur if the character is a connected one, but in separated characters such as the kana "ハ" (ha) and "リ" (ri), 0 appears inside the character. The width over which such a 0 appears is, however, narrower than an inter-character gap, and the probability that every character in a line is a separated character is small, so these zeros tend to be cancelled in the accumulated totals. In this example too, the rightmost character is a separated character and a 0 appears within it, but it is cancelled by the other characters, and in the totals 0 appears only at the inter-character gap (third from the left in the totals above). It follows that the inter-character gap, i.e. the separation point, is the point marked with a dot, shifted 2 bits to the right of the points obtained by stepping from the provisional starting point FSP′ at intervals of the pitch P.
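The accumulation in this example can be checked with a few lines of Python. The six blocks are the number groups listed in the description (P = 6); the variable names are illustrative, and counting bit numbers from 0 is an assumption chosen so that the result matches the 2-bit shift stated in the text.

```python
# The six number groups of the FIG. 3 example, one block per row.
blocks = [
    [0, 0, 0, 0, 1, 6],
    [0, 0, 0, 0, 3, 3],
    [4, 0, 0, 0, 3, 2],
    [5, 0, 0, 0, 3, 3],
    [6, 0, 0, 3, 3, 0],
    [3, 3, 0, 0, 0, 0],
]

# Accumulate values that share the same in-block bit number.
totals = [sum(column) for column in zip(*blocks)]

# The smallest total marks the inter-character gap (0-based bit number).
bit_number = totals.index(min(totals))

print(totals)      # [18, 3, 0, 3, 13, 14]
print(bit_number)  # 2
```

Note that individual blocks contain several zeros, but only the third position stays at zero once all blocks are accumulated.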

A large accumulated value indicates the central part of a character. To find such character centers and inter-character gaps it is not necessary to accumulate all P number groups; accumulating a suitable subset, for example 5 or 6 of them, is sufficient. FIG. 4 shows a circuit that performs this positioning.

In FIG. 4, 20 is a subtracter that receives the start point FSP and the end point FEP and outputs their difference. 22 is a divider that receives the output of the subtracter 20 and the character count Cm, performs the division, and outputs the pitch P. 24 is a register that stores the pitch. 26 is a storage device into which image information such as that shown in FIG. 3 is written. 28 is a column (vertical direction) counter and 30 is a row (horizontal direction) counter. The counter 28 counts the clock CLL; it is a modulo-8 counter that returns to its initial value after counting the number of bits corresponding to the line width, 8 bits in the example of FIG. 3. The counter 30 receives the divide-by-8 output of the counter 28 and addresses the storage device 26. The storage device 26 thus reads out one line width of bits at a time (8 bits in this example); these are converted to a serial signal by the parallel-to-serial converter 32, which is clocked by CLL, and applied through the AND gate 34 to the enable terminals of the counters C₁ to Cn.

The output of the row counter 30 is also applied to the comparators 36 and 38. The other input of the comparator 36 is FSP - P/2; when the two inputs match, it produces an output (corresponding to the provisional starting point FSP′) that sets the flip-flop 40. From this moment the AND gate 34 opens and the output of the converter 32 reaches the counters C₁ to Cn. The other input of the comparator 38 is FEP′ = FEP + P/2; when the two inputs match, it produces an output that resets the flip-flop 40. The output of the converter 32 therefore reaches the counters C₁ to Cn only until the comparator 38 produces its output, which corresponds to a provisional end point extended to allow a margin. 42 is a divider that receives the pitch P and outputs P/2; 44 is a subtracter and 46 is an adder.

48 is a pitch counter. It counts the clock CLH, which is produced by the AND gate 54 once per column (bit); it starts counting on receiving the provisional start point pulse from the comparator 36 through the OR gate 52, and supplies its count to the decoder 50. The count is also supplied to the comparator 56, which produces a coincidence output EQ when the count equals the pitch P; EQ is fed through the OR gate 52 to the counter 48, clearing and restarting it. As a result, the count of the counter 48 repeats 1, 2, 3, ..., P, 1, 2, ..., and the decoder 50 accordingly selects the counters C₁, C₂, C₃, ..., Cn, C₁, C₂, ... in turn (where n = P).
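The interplay of the gate, the pitch counter, the decoder and the counters C₁ to Cn can be modelled in software roughly as follows. This is only a behavioural sketch under stated assumptions: the function name and sample data are hypothetical, the pitch counter is modelled 0-based rather than 1 to P, and the gate is assumed to close just before the end address is reached.

```python
def accumulate_counters(columns, pitch, start_col, end_col):
    """Behavioural model of counters C1..Cn (n = pitch) fed through the gate.

    columns[i] holds the bits read out for column address i.
    The gate is modelled as open from start_col up to, not including, end_col.
    """
    counters = [0] * pitch   # C1..Cn, stored at indices 0..pitch-1
    pitch_count = 0          # pitch counter, restarted at the provisional start
    for address in range(start_col, min(end_col, len(columns))):
        selected = pitch_count            # decoder selects one counter per column
        for bit in columns[address]:      # serialised bits, one per clock
            if bit == 1:                  # counter counts only on black bits
                counters[selected] += 1
        pitch_count = (pitch_count + 1) % pitch   # wrap when the pitch is reached
    return counters


# Hypothetical data: 7 columns of 2 bits each, pitch 3.
columns = [[0, 0], [1, 1], [1, 0], [0, 0], [0, 1], [1, 1], [0, 0]]
print(accumulate_counters(columns, pitch=3, start_col=0, end_col=6))  # [0, 3, 3]
print(accumulate_counters(columns, pitch=3, start_col=1, end_col=7))  # [3, 3, 0]
```

Shifting the start address by one column moves the zero total to a different counter, which is the information the circuit later uses to correct the end points.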

The counters C₁ to Cn are also controlled by the output of the parallel-to-serial converter 32: they count the clock CLL only while the converter output is a "1" (black) signal. Consequently, once the decoder 50 has selected the counters C₁ to Cn one time through, the counts of the counters are, in the example of FIG. 3, the first number group. The second time through they receive the second number group, and because this is added to the previous counts, the result is the sum of the two groups. The same applies thereafter, and the counters C₁ to Cn end up holding the accumulated totals 18, 3, 0, 3, 13, 14 given above. The arithmetic unit 60 reads the counts of the counters C₁ to Cn through the multiplexer 58 and outputs the bit number of the smallest one. In this example this is "2", as described above. The adders 62 and 64 add it to FSP′ and FEP″ to give the true start point TESP and the true end point TFEP, where FEP″ = FEP - P/2. Dividing at the pitch P from this true start or end point gives the separation points, so that each character can be cut out correctly.
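The final correction step can be sketched in Python as below. The function name and the FSP/FEP addresses are hypothetical; the totals are those of the FIG. 3 example. The sketch assumes an integer pitch, integer division for P/2, and a 0-based bit number, none of which the patent states explicitly.

```python
def corrected_cut_positions(fsp, fep, n_chars, totals):
    """Correct the end points by the minimum-total bit number and list the cuts."""
    pitch = (fep - fsp) // n_chars
    if len(totals) != pitch:
        raise ValueError("expected one accumulated total per in-block bit number")
    half = pitch // 2
    bit_number = totals.index(min(totals))   # 0-based position of the gap
    true_start = (fsp - half) + bit_number   # FSP' plus bit number
    true_end = (fep - half) + bit_number     # FEP'' plus bit number
    cuts = list(range(true_start, true_end + 1, pitch))
    return true_start, true_end, cuts


# Hypothetical addresses with the accumulated totals from the example.
start, end, cuts = corrected_cut_positions(
    fsp=10, fep=46, n_chars=6, totals=[18, 3, 0, 3, 13, 14]
)
print(start, end)  # 9 45
print(cuts)        # [9, 15, 21, 27, 33, 39, 45]
```

With P = 6 the corrected start lands one bit to the left of the supplied FSP, in line with the example, where FSP was said to be shifted slightly to the right.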

As described in detail above, the present invention makes it possible to cut out each character in a line accurately and thereby to improve the character recognition rate.

[Brief Description of the Drawings]

FIGS. 1 and 2 are explanatory diagrams of the designation of the character reading range, FIG. 3 is an explanatory diagram of the positioning method of the present invention, and FIG. 4 is a block diagram showing an embodiment of the present invention. In the drawings, 26 is the memory, FSP is the reading start point, FEP is the reading end point, Cm is the maximum number of characters, P is the character pitch, and TESP and TFEP are the true reading start and end points.

Claims

[Claims]

1. A positioning method for cutting out each character in a line from character image information temporarily stored in the memory of a character recognition device, characterized in that: a character pitch is calculated from given reading start point information, reading end point information, and the maximum number of printed characters contained between these end points; the bits of the binary information representing the characters in the line are added in the line-width direction; the resulting group of values along the line direction is divided from the end point at the character pitch to obtain a plurality of numerical blocks; the values of the numerical blocks are accumulated across blocks for each in-block bit number; the bit number giving the smallest accumulated value is found; the end point information is corrected by that bit number to obtain a true end point; and positions at character-pitch intervals from the true end point are used as the cut-out positions for the characters.