JPH0799535B2 - Character figure recognition method - Google Patents
- Publication number
- JPH0799535B2 (application No. JP62067968A, also listed as JP6796887A)
- Authority
- JP
- Japan
- Prior art keywords
- character
- divided
- area
- pattern
- coordinates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Character Discrimination (AREA)
Description
Detailed Description of the Invention
(Industrial Field of Application) The present invention relates to a character/figure recognition system for recognizing characters and figures on a medium.
(Prior Art) Conventionally, character/figure recognition devices have commonly extracted strokes from the character/figure pattern and recognized the pattern from the positions and lengths of those strokes, the relationships between them, and similar properties. Strokes are extracted in one of two ways:
(1) Trace the contour of the character/figure and compute the curvature along the detected sequence of contour points. Split the contour sequence at points of high curvature and combine the resulting sub-sequences into strokes.
(2) Thin the character/figure pattern into a skeleton. Trace the skeleton and its connectivity, detect points where the direction changes sharply, and extract strokes from them.
Geometric features and the like are then extracted from the strokes obtained by (1) or (2) and used for identification.
A simpler method, (3), works as follows.

First, the input character/figure pattern is scanned to obtain black-bit count distributions on two predetermined axes, for example the horizontal and vertical axes. The centroid coordinate of each distribution is determined over the range defined by the character frame.

Next, the frame range is divided at each centroid detected so far, and the centroid of each distribution is determined within each resulting sub-range. Repeating this several times yields a centroid coordinate series.

The input pattern is then divided along each axis by a division coordinate series that is associated approximately evenly with the centroid series. The length of each divided region on each axis is normalized by the character frame length along that axis. The resulting normalized divided-region length series is extracted as the feature of the input pattern and used for identification.
(Problems to Be Solved by the Invention) However, the conventional character/figure recognition methods described above have the following problems.

In method (1), the amount of processing grows as the character/figure pattern becomes larger or more complex, which lowers the processing speed.

Method (2) requires the pattern to be thinned. Thinning introduces pattern distortion, spurious branches ("whiskers"), and similar problems that complicate the subsequent processing.

Method (3) is simple to process. However, it represents a character/figure pattern, which is inherently two-dimensional, by a one-dimensional feature (the divided-region lengths). As a result, some input patterns are difficult to identify.
An object of the present invention is to solve the problems described above and to provide a character/figure recognition system that can recognize characters and figures quickly and accurately with simple processing.
(Means for Solving the Problems) To solve these problems, the present invention provides a character/figure recognition system with storage means. The storage means stores a pattern obtained by reading a character/figure on a medium, then quantizing and binarizing it; for example, character-line portions become black bits and the background becomes white bits. The system recognizes the character/figure on the basis of this pattern and comprises:
(a) first detection means that scans the pattern and detects the circumscribing frame of the character/figure;
(b) creation means that scans the pattern and creates, for each of two predetermined axes, the black-bit count distribution obtained by projecting the pattern onto that axis. The axes are, for example, the horizontal and vertical directions of two-dimensional coordinates, hereinafter called the X axis and the Y axis;
(c) second detection means that detects a centroid coordinate series for each axis. It first determines, for each of the two axes, the centroid coordinate of each black-bit count distribution within the range of the circumscribing frame. It then repeatedly divides that range at the centroids determined so far and determines the centroid of each distribution within each sub-range;
(d) determination means that determines, on the basis of a set number of divisions, a division coordinate series for each axis corresponding to the centroid coordinate series;
(e) calculation means that calculates a divided-region area matrix. For each divided region in the circumscribing frame delimited by the division coordinate series, the area of the region is normalized by the area inside the circumscribing frame; and
(f) recognition means that recognizes the character/figure of the pattern by matching its divided-region area matrix against the divided-region area matrices of standard patterns calculated in advance.
(Operation) With the character recognition system configured as described above, the technical means operate as follows.

The pattern stored in the storage means is scanned. The first detection means detects the circumscribing frame (character frame) of the character/figure, and the creation means creates the black-bit count distribution along each axis (for example, the X and Y axes). From this frame and these distributions, the second detection means detects the centroid coordinate series for each axis.

Next, the determination means uses the set number of divisions to determine the division coordinate series for each axis corresponding to the detected centroid coordinate series. The number of divisions is set, for example, according to the complexity of the character/figure.

The division coordinate series divide the circumscribing frame into regions. For each region, the calculation means computes its area normalized by the area inside the circumscribing frame, yielding the divided-region area matrix. The recognition means matches this normalized matrix against the matrices of the standard patterns, calculated in advance in the same way, and outputs the category name of the matching standard pattern as the name of the character/figure.

In short, the present invention uses the centroids of the black-bit count distributions, obtained simply by scanning the character/figure pattern, to derive a divided-region area matrix that expresses two-dimensional properties. This matrix serves as the feature information for recognition, so characters and figures can be recognized quickly and accurately with simple processing.
(Embodiment) An embodiment of the present invention will now be described with reference to FIGS. 1 to 5.
FIG. 1 is a functional block diagram of a character/figure recognition device to which the system of the present invention is applied. The character recognition device of this embodiment comprises:
- a photoelectric conversion unit 2 that photoelectrically converts the optical input 1;
- a pattern register 3;
- a character frame detection unit 4;
- a character projection creation unit 5;
- a centroid detection unit 6;
- a character frame division point determination unit 7;
- a normalized divided-region area calculation unit 8;
- an identification unit 9;
- a dictionary memory 10; and
- an output terminal 11.
Optical input 1 comes from a medium such as a form on which characters, figures, symbols, and the like (hereinafter collectively called characters) are written, and enters the photoelectric conversion unit 2. This unit photoelectrically converts optical input 1 and decomposes the area reserved for one character into 128 × 128 pixels. It then converts each pixel into a binary digital signal, hereinafter called the input character pattern. A character of average size is represented by an input character pattern of about 60 × 60 bits.

The pattern register 3 stores the input character pattern in a form from which the X and Y coordinates of each pixel in the reserved character area can be reproduced. It has a capacity of 128 × 128 bits, matching that area.
The character frame detection unit 4 detects, for example, the circumscribing frame (character frame) of the character. The frame is expressed by four coordinates in the pattern register: left end X_l, right end X_r, top end Y_t, and bottom end Y_b.
The character projection creation unit 5 projects the input character pattern in pattern register 3 onto predetermined axes, for example the X and Y axes. These are the horizontal and vertical directions, respectively, of the two-dimensional coordinates of pattern register 3. The unit counts the black bits along each axis and creates the black-bit count distributions SX(x) and SY(y):

$$SX(x) = \sum_{y=Y_t}^{Y_b} P(x, y), \qquad X_l \le x \le X_r$$

$$SY(y) = \sum_{x=X_l}^{X_r} P(x, y), \qquad Y_t \le y \le Y_b$$

The symbols are defined as follows:
- x and y are two-dimensional coordinates in pattern register 3, each ranging from 0 to 127.
- Y_t and Y_b are the top and bottom coordinates of the character frame along the Y axis.
- X_l and X_r are the left and right coordinates of the character frame along the X axis.
- P(x, y) is the bit value: P(x, y) = 1 for a black bit (significant color) and P(x, y) = 0 for a white bit (background color).
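The projection formulas can be sketched directly in Python, assuming the same hypothetical list-of-rows layout (`pattern[y][x]`, 1 = black). The function name `projections` is illustrative. Both distributions are indexed by absolute register coordinate and are zero outside the character frame.

```python
from typing import List, Tuple

Pattern = List[List[int]]  # assumed layout: pattern[y][x] == 1 for a black bit


def projections(pattern: Pattern, frame: Tuple[int, int, int, int]) -> Tuple[List[int], List[int]]:
    """Black-bit count distributions SX(x), SY(y) restricted to the character frame.

    Both lists are indexed by absolute register coordinate; entries outside the
    frame are left at 0.
    """
    xl, xr, yt, yb = frame
    height, width = len(pattern), len(pattern[0])
    sx = [0] * width
    sy = [0] * height
    for x in range(xl, xr + 1):
        sx[x] = sum(pattern[y][x] for y in range(yt, yb + 1))  # SX(x): column count
    for y in range(yt, yb + 1):
        sy[y] = sum(pattern[y][x] for x in range(xl, xr + 1))  # SY(y): row count
    return sx, sy


# Usage: a hollow 4x4 square, loosely resembling 「口」.
square = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
sx, sy = projections(square, (0, 3, 0, 3))
print(sx, sy)  # [4, 2, 2, 4] [4, 2, 2, 4]
```

Both distributions always sum to the total number of black bits inside the frame, which is a handy sanity check.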
FIG. 2(a) shows two example input character patterns: the kanji 「田」 ("ta", rice field) and 「口」 ("kuchi", mouth). FIGS. 2(b) and 2(c) show the black-bit count distributions SX(x) and SY(y) for each pattern in FIG. 2(a).
The centroid detection unit 6 obtains the centroid coordinate series X(M_p) and Y(M_q) of the black-bit count distributions SX(x) and SY(y) of the input character pattern. It first processes the full frame ranges, X_l to X_r and Y_t to Y_b. It then processes each sub-range into which those ranges are divided by the centroid coordinates detected in the preceding step. Each centroid is obtained by dividing the sum of the first moments over the range by the sum of the black bits over that range.

M_p and M_q are centroid coordinate numbers assigned in ascending order of coordinate value:
- M_p = 1 to MX, where MX is the number of centroids along the X axis.
- M_q = 1 to MY, where MY is the number of centroids along the Y axis.

It is desirable to use a relatively large number MX of X-axis centroids, about 15, which is large compared with the number of divisions. For simplicity, however, the case of detecting seven centroid coordinates X(M_p) is described here.
First, consider the X-axis range X_l to X_r of the character frame. The centroid coordinate X(4), which has the middle centroid number 4, is obtained by dividing the sum of the first moments of the black-bit count distribution SX(x) by the sum of the black bits in that range:

$$X(4) = \frac{\sum_{x=X_l}^{X_r} x \cdot SX(x)}{\sum_{x=X_l}^{X_r} SX(x)}$$

Next, X(4) divides the frame into the two ranges X_l to X(4) and X(4) to X_r. The centroid of each range gives X(2) and X(6).

Finally, X(2), X(4), and X(6) divide the frame into four ranges: X_l to X(2), X(2) to X(4), X(4) to X(6), and X(6) to X_r. The centroids of these ranges give X(1), X(3), X(5), and X(7).
The Y-axis centroid coordinates Y(M_q) are detected in the same way, with the number of centroids MY set to seven:
1. Over the character frame range Y_t to Y_b, detect the centroid Y(4) of the distribution SY(y).
2. Y(4) bisects the frame into Y_t to Y(4) and Y(4) to Y_b. Detect the centroid of SY(y) over each half, giving Y(2) and Y(6).
3. Y(2), Y(4), and Y(6) divide the frame into Y_t to Y(2), Y(2) to Y(4), Y(4) to Y(6), and Y(6) to Y_b. Detect the centroid of SY(y) over each range.

This gives seven centroid coordinates, Y(1) to Y(7), in total.
For the input character patterns of the kanji 「田」 and 「口」 (FIG. 2(a)), the centroid coordinates X(1) to X(7) and Y(1) to Y(7) are marked on the black-bit count distributions SX(x) and SY(y) in FIGS. 2(b) and 2(c).
The character frame division point determination unit 7 receives the X- and Y-axis centroid coordinate series X(M_p) and Y(M_q) from the centroid detection unit 6 and treats them as candidate division coordinates. It associates the centroid numbers M_p and M_q approximately evenly with the division coordinate numbers k_i and k_j. From this it determines the division coordinate series DX(k_i) and DY(k_j), which divide the character frame of the input character pattern into NX division unit regions along the X axis and NY along the Y axis.
In this embodiment, the number of divisions can take four values on each axis: NX = 4, 5, 6, or 8 along the X axis, and NY = 4, 5, 6, or 8 along the Y axis. The division coordinate numbers are:
- k_i = 1 to NX − 1 along the X axis;
- k_j = 1 to NY − 1 along the Y axis.

With these, the unit determines the division coordinate series DX(k_i) and DY(k_j) that divide the character frame into NX · NY division unit regions. Table 1 gives the lookup used for this: it associates the centroid coordinate numbers M_p and M_q approximately evenly with the division coordinate numbers k_i and k_j.
For given numbers of divisions NX and NY, the centroid coordinate numbers M_p and M_q are read from this table. The centroid coordinates X(M_p) and Y(M_q) with those numbers are then taken as the division coordinates DX(k_i) and DY(k_j).
Table 1 covers the case in which the centroid detection unit 6 detects MX = MY = 7 centroid coordinates. A table for the general case can be built by the same rule. The centroid coordinates are assigned so that they are distributed over the divisions along each of the X and Y axes. If centroid coordinates are left over, one extra is assigned to each region in turn, starting from the regions at both ends.
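Table 1 itself is not reproduced here, so the following is only one possible reading of the general construction rule above. It is not the patent's table or its Fig. 5 procedure. The assumed interpretation works like this:
- NX − 1 of the MX centroids become boundaries.
- The remaining centroids are spread evenly over the NX regions.
- Leftovers go one at a time to regions alternately from the left and right ends, working inward.

The function names are hypothetical. For MX = 7 and NX = 8, this reading reproduces the correspondence DX(k) = X(k) stated later for NX = NY = 8. For the other division counts, its output may differ from Table 1.

```python
from typing import List


def division_numbers(mx: int, n: int) -> List[int]:
    """Centroid numbers M_p (1-based) chosen as division coordinates k = 1 .. n-1.

    Assumed interpretation: n - 1 of the mx centroids become boundaries; the
    remaining centroids are spread evenly over the n regions, and any leftovers
    go one each to the regions at both ends, working inward.
    """
    spare = mx - (n - 1)
    if spare < 0:
        raise ValueError("need at least n - 1 centroids for n divisions")
    base, rem = divmod(spare, n)
    inside = [base] * n  # non-boundary centroids lying inside each region
    left, right = 0, n - 1
    for i in range(rem):  # alternate between the left and right ends
        if i % 2 == 0:
            inside[left] += 1
            left += 1
        else:
            inside[right] += 1
            right -= 1
    numbers, m = [], 0
    for k in range(n - 1):
        m += inside[k] + 1  # skip the centroids inside region k+1, then take the next one
        numbers.append(m)
    return numbers


def division_coordinates(centroids: List[float], n: int) -> List[float]:
    """DX(1) .. DX(n-1) taken from an ascending centroid series X(1) .. X(MX)."""
    return [centroids[m - 1] for m in division_numbers(len(centroids), n)]


for n in (4, 5, 6, 8):
    print(n, division_numbers(7, n))
# 4 [2, 4, 6]
# 5 [2, 4, 5, 6]
# 6 [2, 3, 4, 5, 6]
# 8 [1, 2, 3, 4, 5, 6, 7]
```

With the recommended MX = 15 and NX = 8, every other centroid becomes a boundary (2, 4, ..., 14), so each region has exactly one centroid inside it.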
FIG. 3 illustrates the case where NX = NY = 5 is specified. It shows the correspondence between the division coordinate series DX(k_i), DY(k_j) and the centroid coordinate series X(M_p), Y(M_q). It also shows the division unit regions (k_i, k_j) that these division coordinate series define.
The numbers of divisions NX and NY are chosen according to the complexity of the input character. Alternatively, if a character is rejected, NX and NY can be changed and character recognition performed again.
As described above, the character frame division point determination unit 7 supports four division counts on each axis: NX = 4, 5, 6, or 8 and NY = 4, 5, 6, or 8. The rest of this embodiment uses NX = NY = 8. In this case:
- along the X axis, the division coordinates DX(1) to DX(7) correspond to the centroid coordinates X(1) to X(7);
- along the Y axis, the division coordinates DY(1) to DY(7) correspond to the centroid coordinates Y(1) to Y(7).
The normalized divided-region area calculation unit 8 receives two sets of inputs:
- the character frame coordinates X_l, X_r (X axis) and Y_t, Y_b (Y axis) detected by the character frame detection unit 4;
- the division coordinates DX(1) to DX(7) (X axis) and DY(1) to DY(7) (Y axis) determined by the character frame division point determination unit 7.

The division coordinates on each axis divide the frame into regions. The unit normalizes the area of each region by the product of the lengths between the end coordinates on the two axes, giving the normalized divided-region area matrix. In equation (6), each element BES(I, J) is the area of the region bounded by DX(I − 1) and DX(I) along the X axis and by DY(J − 1) and DY(J) along the Y axis, normalized by LX · LY and scaled by the constant K. The lengths between the end coordinates are:

$$LX = X_r - X_l + 1 \qquad (7)$$

$$LY = Y_b - Y_t + 1 \qquad (8)$$

Here DX(0) = X_l, DX(8) = X_r, DY(0) = Y_t, and DY(8) = Y_b. K is a constant; in this embodiment K = 1000.
FIG. 4 shows how the division coordinates DX(0) to DX(8) and DY(0) to DY(8) correspond to the normalized divided-region area matrix {BES(I, J) | I = 1 to 8, J = 1 to 8}. FIG. 2(a), described earlier, also marks DX(0) to DX(8) and DY(0) to DY(8) on the input character patterns of the kanji 「田」 and 「口」. It shows how these division coordinates divide each input character pattern (within its character frame) into regions.
The normalized divided-region area calculation unit 8 passes the matrix f_i = {BES(I, J) | I = 1 to 8, J = 1 to 8} to the identification unit 9 as the feature information of the input character pattern.
The dictionary memory 10 holds, registered in advance, a normalized divided-region area matrix g_i for each standard pattern. These matrices are the standard patterns' feature information and are calculated in the same way as for the input character pattern.
The identification unit 9 measures the similarity between the feature information of the input character pattern and that of each standard pattern. It takes the character code of the most similar standard pattern as the name of the input character/figure pattern and outputs that code to output terminal 11.

In this embodiment, similarity is measured with the weighted Euclidean distance D of equation (9), computed between the matrix g_i of a standard pattern in dictionary memory 10 and the matrix f_i of the input character pattern. The most similar standard pattern is the one that gives the minimum D.
The weighting of the Euclidean distance D assigns a weight coefficient W_i to each divided region. In this embodiment, all weight coefficients W_i are set to 1.
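Equation (9) is not reproduced here, so the following matching step is a sketch. It assumes the usual weighted Euclidean form: the square root of the sum of W times the squared element differences. The square root does not change which standard pattern is nearest. The names `weighted_distance` and `classify` and the dictionary layout (category name mapped to matrix) are hypothetical. Omitting the weights corresponds to the embodiment's choice of W_i = 1.

```python
import math
from typing import Dict, List, Optional

Matrix = List[List[float]]


def weighted_distance(f: Matrix, g: Matrix, w: Optional[Matrix] = None) -> float:
    """Assumed form of equation (9): weighted Euclidean distance (all weights 1 if w is None)."""
    total = 0.0
    for i, (row_f, row_g) in enumerate(zip(f, g)):
        for j, (a, b) in enumerate(zip(row_f, row_g)):
            weight = 1.0 if w is None else w[i][j]  # one weight per divided region
            total += weight * (a - b) ** 2
    return math.sqrt(total)


def classify(f: Matrix, dictionary: Dict[str, Matrix], w: Optional[Matrix] = None) -> str:
    """Return the category name of the standard pattern nearest to f."""
    return min(dictionary, key=lambda name: weighted_distance(f, dictionary[name], w))


# Usage with made-up 2 x 2 matrices (real ones would be 8 x 8 BES matrices).
standards = {"A": [[250.0, 200.0], [200.0, 160.0]], "B": [[100.0, 300.0], [300.0, 100.0]]}
print(classify([[240.0, 210.0], [190.0, 170.0]], standards))  # A
```

A rejection threshold on the minimum distance could drive the "change NX, NY and retry" behaviour described earlier. That threshold is not specified in the text.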
In the embodiment above, a table associates the centroid coordinates with the division coordinates. The same association can instead be computed by a flowchart with a predetermined procedure; FIG. 5 shows that flowchart.
All division results in FIG. 5 are rounded down (fractional parts are discarded).
The FIG. 5 procedure runs as follows:
1. Step S1 computes M_α, the quotient of the number of centroids MX divided by the number of divisions NX.
2. Steps S2 and S3 compute the remainder R_1 of MX/NX and the quotient R_2 of R_1 divided by 2.
3. Step S4 computes the median k_α of the division numbers.
4. Steps S5 and S6 set the division number k_i and the centroid number M_p to 0.
5. Steps S7, S8, and S9: each time k_i is incremented by 1, R_2 is decremented by 1 and M_p is increased by M_α.
6. Step S10 checks whether R_2 is non-negative. If it is, step S11 increments the centroid number by 1, and step S12 associates M_p with k_i and determines the division coordinate DX(k_i) as X(M_p).
7. If R_2 is negative, step S13 checks whether k_i is greater than the median k_α. If it is greater, the centroid number is incremented by 1; if it is smaller, the centroid number set in step S9 is used as is. The division coordinate DX(k_i) is then determined.
8. Processing ends when step S14 detects that k_i has reached NX − 1.
The following explains why the normalized divided-region area matrix, the feature information used by this embodiment, is effective.
Consider the input character patterns 「田」 and 「口」 shown in FIG. 2(a). They differ in whether a vertical stroke and a horizontal stroke cross at the center of the pattern. This difference appears as a marked difference in the feature element BES(5, 4) of the normalized divided-region area matrix, which shows that the matrix effectively reflects differences between character/figure patterns. In this example, the same holds for BES(4, 4), BES(4, 5), and BES(5, 5).
The feature also captures two-dimensional properties. For each divided region delimited using the centroid coordinate series, it reflects the density of character lines in that region and in the regions aligned with it along the X and Y axes. It thus expresses the two-dimensional properties of the original character/figure pattern.

By contrast, prior-art method (3) uses the normalized divided-region length series, which represents an inherently two-dimensional pattern by one-dimensional properties. Compared with such features, the normalized divided-region area matrix is a stable feature, so this embodiment recognizes characters and figures more stably.
Furthermore, the area of each divided region is normalized by the area inside the character frame, which expresses the size of the character. The normalized divided-region area matrix is therefore also stable against variations in character size.
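The effect of normalizing by the frame area can be checked numerically. The sketch below rests on assumptions not stated in this section: a binary NumPy pattern, projection bins treated as uniform unit-interval masses, recursively computed barycenters, and 2**depth divisions per axis. Under that model, enlarging a glyph by an integer factor scales every barycenter and interval by that factor, and dividing by the frame area cancels the scaling. Moving the glyph on the page only shifts the frame. The glyph and all names are hypothetical.

```python
import numpy as np


def centroid(h, a, b):
    """Barycenter of h on [a, b); bin i spreads its mass over [i, i+1) (assumption)."""
    i = np.arange(len(h))
    lo, hi = np.clip(i, a, b), np.clip(i + 1, a, b)
    mass = h * (hi - lo)
    if mass.sum() == 0:
        return (a + b) / 2.0
    return float((mass * (lo + hi) / 2.0).sum() / mass.sum())


def barycentric_series(h, a, b, depth):
    """Recursive barycenters in ascending order."""
    if depth == 0:
        return []
    c = centroid(h, a, b)
    return (barycentric_series(h, a, c, depth - 1) + [c]
            + barycentric_series(h, c, b, depth - 1))


def area_matrix(p, depth=2):
    """Cell areas (rows = Y intervals) divided by the circumscribing-frame area."""
    ys, xs = np.nonzero(p)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    hx, hy = p.sum(axis=0).astype(float), p.sum(axis=1).astype(float)
    xc = np.array([x0, *barycentric_series(hx, x0, x1, depth), x1], dtype=float)
    yc = np.array([y0, *barycentric_series(hy, y0, y1, depth), y1], dtype=float)
    return np.outer(np.diff(yc), np.diff(xc)) / ((x1 - x0) * (y1 - y0))


glyph = np.array([[0, 0, 1, 0, 0],
                  [0, 0, 1, 1, 0],
                  [0, 0, 1, 0, 0],
                  [1, 1, 1, 1, 1]])                 # hypothetical "上"-like glyph
big = np.kron(glyph, np.ones((3, 3), dtype=int))   # same glyph, 3x larger
shifted = np.pad(big, ((4, 1), (2, 6)))            # placed elsewhere on the page

m_small, m_big, m_shifted = area_matrix(glyph), area_matrix(big), area_matrix(shifted)
print(np.allclose(m_small, m_big), np.allclose(m_big, m_shifted))  # → True True
```

Real scanned characters are not exact pixel replicas when their size changes, so in practice the feature is only approximately size-invariant.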
As described above, this embodiment uses the normalized divided-region area matrix as character feature information. The matrix is obtained by scanning the input character pattern and performing predetermined calculations, and it represents two-dimensional properties. Characters (including figures, symbols, and the like) can therefore be recognized quickly and accurately with simple processing, and stably against variations in character/figure size.
(Effects of the Invention) As described in detail above, the present invention does not require the complex pattern processing, such as contour tracing or thinning, that conventional character/figure recognition methods use to extract feature information. Instead, the method works as follows:
- The input character/figure pattern is simply scanned to obtain black-bit count distributions on two predetermined axes.
- A barycentric coordinate series is computed from these distributions.
- The series is used to obtain the normalized divided-region area matrix, feature information that represents two-dimensional properties.
- This matrix is used for character/figure recognition.

As a result, minute differences in character shape can be detected accurately with simple processing, while recognition remains stable against variations in character/figure size.
FIG. 1 is a functional block diagram showing an embodiment of the character/figure recognition method according to the present invention. FIGS. 2(a), (b), and (c) show examples of input character patterns and their relationship to the barycentric coordinate series and the divided coordinate series. FIG. 3 shows the correspondence between the barycentric coordinate series and the divided coordinate series. FIG. 4 shows the correspondence between the divided coordinate series and the normalized divided-region area matrix. FIG. 5 is a flowchart showing another method of determining the divided coordinate series.
1 ... Optical input, 2 ... Photoelectric conversion unit, 3 ... Pattern register, 4 ... Character frame detection unit, 5 ... Character projection creation unit, 6 ... Center-of-gravity detection unit, 7 ... Character frame division point determination unit, 8 ... Normalized divided-region area calculation unit, 9 ... Identification unit, 10 ... Dictionary memory, 11 ... Output terminal
Continuation of front page. (72) Inventor: Koji Ito, c/o Oki Electric Industry Co., Ltd., 1-7-12 Toranomon, Minato-ku, Tokyo. (56) References cited: JP-A-58-123171 (JP, A); JP-A-61-150086 (JP, A)
Claims (1)
1. A character/figure recognition method that has storage means for storing a pattern obtained by reading and quantizing a character/figure on a medium, and that recognizes the character/figure on the basis of the pattern, the method being characterized by comprising:
(a) first detection means for scanning the pattern to detect a circumscribing frame of the character/figure;
(b) creation means for scanning the pattern and projecting it onto two predetermined axes to create a black-bit count distribution along each axis;
(c) second detection means for detecting a barycentric coordinate series along each axis by repeating the following process: determine, along each of the two axes, the barycentric coordinate of each black-bit count distribution within the range of the circumscribing frame; then, for each sub-range obtained by dividing that range at the determined barycentric coordinates, determine the barycentric coordinate of the black-bit count distribution again;
(d) determination means for determining, on the basis of a set number of divisions, a divided coordinate series along each axis corresponding to the barycentric coordinate series;
(e) calculation means for calculating a divided-region area matrix in which, for each divided region within the circumscribing frame delimited by the divided coordinate series, the area of that divided region is normalized by the area within the circumscribing frame; and
(f) recognition means for recognizing the character/figure of the pattern by matching the divided-region area matrix against the divided-region area matrices of standard patterns calculated in advance.
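Means (a) to (f) of the claim can be combined in one short sketch. It is a hedged illustration, not the patented implementation, and it rests on these assumptions:
- The stored pattern is a binary NumPy array.
- Each projection bin spreads its mass over a unit interval.
- The divided coordinate series consists of the frame edges plus the recursive barycenters, so the set division number N must be a power of two.
- Recognition in (f) picks the standard pattern with the smallest city-block distance. This section does not specify the matching rule.

The two standard patterns and all names are hypothetical.

```python
import numpy as np


def detect_frame(p):
    """(a) Circumscribing frame as half-open ranges [y0, y1) x [x0, x1)."""
    ys, xs = np.nonzero(p)
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1


def project(p):
    """(b) Black-bit count distributions along the X and Y axes."""
    return p.sum(axis=0).astype(float), p.sum(axis=1).astype(float)


def centroid(h, a, b):
    """Barycenter of h on [a, b); bin i spreads its mass over [i, i+1) (assumption)."""
    i = np.arange(len(h))
    lo, hi = np.clip(i, a, b), np.clip(i + 1, a, b)
    mass = h * (hi - lo)
    if mass.sum() == 0:
        return (a + b) / 2.0       # empty range: midpoint (assumption)
    return float((mass * (lo + hi) / 2.0).sum() / mass.sum())


def barycentric_series(h, a, b, depth):
    """(c) Barycenter of [a, b), then of each sub-range, recursively (sorted)."""
    if depth == 0:
        return []
    c = centroid(h, a, b)
    return (barycentric_series(h, a, c, depth - 1) + [c]
            + barycentric_series(h, c, b, depth - 1))


def divided_coordinates(h, a, b, n_div):
    """(d) n_div + 1 cut positions; n_div must be a power of two in this sketch."""
    depth = n_div.bit_length() - 1
    if n_div < 1 or n_div != 1 << depth:
        raise ValueError("this sketch only supports power-of-two division numbers")
    return np.array([a, *barycentric_series(h, a, b, depth), b], dtype=float)


def area_matrix(p, n_div=4):
    """(e) Region areas divided by the frame area; rows = Y intervals."""
    y0, y1, x0, x1 = detect_frame(p)
    hx, hy = project(p)
    xc = divided_coordinates(hx, x0, x1, n_div)
    yc = divided_coordinates(hy, y0, y1, n_div)
    return np.outer(np.diff(yc), np.diff(xc)) / ((x1 - x0) * (y1 - y0))


def recognize(p, dictionary, n_div=4):
    """(f) Label of the standard pattern whose matrix is nearest (city-block, assumed)."""
    feature = area_matrix(p, n_div)
    return min(dictionary, key=lambda label: np.abs(dictionary[label] - feature).sum())


def outline(n=15, cross=False):
    """Hypothetical glyphs: 口 (square outline) or, with cross=True, 田."""
    p = np.zeros((n, n), dtype=int)
    p[0, :] = p[-1, :] = p[:, 0] = p[:, -1] = 1
    if cross:
        p[n // 2, :] = p[:, n // 2] = 1
    return p


dictionary = {"口": area_matrix(outline()), "田": area_matrix(outline(cross=True))}
query = np.pad(np.kron(outline(cross=True), np.ones((2, 2), dtype=int)), 3)
print(recognize(query, dictionary))  # → 田
```

The query is an enlarged and shifted copy of the 田 standard pattern. Its matrix therefore coincides with that dictionary entry, and the call returns 田.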
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP62067968A JPH0799535B2 (en) | 1987-03-24 | 1987-03-24 | Character figure recognition method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS63234374A (en) | 1988-09-29 |
| JPH0799535B2 (en) | 1995-10-25 |
Family ID: 13360284
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP62067968A Expired - Fee Related JPH0799535B2 (en) | 1987-03-24 | 1987-03-24 | Character figure recognition method |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPH0799535B2 (en) |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS58123171A (en) * | 1982-01-18 | 1983-07-22 | Oki Electric Ind Co Ltd | Character recognizing system |
Also Published As
| Publication number | Publication date |
|---|---|
| JPS63234374A (en) | 1988-09-29 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN111275049A (en) | Method and device for acquiring character image skeleton feature descriptors | |
| JPH08508128A (en) | Image classification method and apparatus using distribution map | |
| CN112200822A (en) | Table reconstruction method, device, computer equipment and storage medium | |
| CN113537216B (en) | Dot matrix font text line inclination correction method and device | |
| JPH0799535B2 (en) | Character figure recognition method | |
| JPH0656625B2 (en) | Feature extraction method | |
| JPH0799534B2 (en) | Character figure recognition method | |
| JPS6214277A (en) | Image processing method | |
| JPH0799536B2 (en) | Character figure recognition method | |
| JPH0656624B2 (en) | Feature extraction method | |
| CN112001311A (en) | Method and device for realizing handwritten number recognition based on graph edge detection | |
| JP3095470B2 (en) | Character recognition device | |
| US20250391186A1 (en) | Vehicle mileage recognition method and apparatus | |
| JPH0877293A (en) | Character recognition apparatus and method for creating dictionary for character recognition | |
| JP2616994B2 (en) | Feature extraction device | |
| JPH0147835B2 (en) | ||
| JPH0147829B2 (en) | ||
| JP2749947B2 (en) | Character recognition method | |
| JPH08101880A (en) | Character recognition device | |
| JP2974167B2 (en) | Large Classification Recognition Method for Characters | |
| JP2576491B2 (en) | Feature extraction method | |
| JP2749946B2 (en) | Character recognition method | |
| JPH0664629B2 (en) | Character recognition method | |
| CN116740727A (en) | Bill image processing method, device, electronic equipment and storage medium | |
| JPH01187684A (en) | Character recognizing device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | LAPS | Cancellation because of no payment of annual fees | |