JPS5855551B2

JPS5855551B2 - Discrimination feature extraction device

Info

Publication number: JPS5855551B2
Application number: JP54040833A
Authority: JP
Inventors: 誠一郎内藤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1979-04-04
Filing date: 1979-04-04
Publication date: 1983-12-10
Also published as: JPS55134484A

Description

[Detailed description of the invention]

The present invention relates to a discrimination feature extraction device, and in particular to a device that extracts features of the character lines constituting a character to be recognized, in a recognition system targeting characters that are complex and belong to a large number of categories, such as handwritten kanji figures.

The following devices of this type have been developed conventionally.

For handwritten kanji recognition systems, the devices developed so far include: a device that traces the character lines microscopically and extracts as features the geometric shape of the character lines, as represented by end points, intersections, and bending points; and a device that approximates the character lines with straight lines or suitable curves and extracts the approximation parameters as features.

For printed kanji recognition systems, the devices developed so far include: a device that observes the contour of the character lines through a suitable mask, determines the direction of the contour within the mask, and scans and counts this over the whole image, thereby extracting numerical values that represent the complexity of the character in the horizontal and vertical directions; a device that projects the density distribution of the character onto the vertical and horizontal axes and extracts as features the density distribution functions collapsed along each axis; a device that divides the character image into regions of suitable size and extracts the average density of each region as a feature; a device that sets four rectangular inspection regions at the top, bottom, left, and right periphery of the character and counts the amount of character line in each region, thereby extracting four kinds of codes from the regions as features; a device that thickens the character lines step by step while counting the proportion of black area in the periphery at each step, and extracts as a feature the way this value changes with thickening; and a device that extracts the outer contour stroke shapes of the character figure.

However, among the above devices, those for handwritten kanji that focus on the end points and intersections of character lines rely on microscopic tracing of the character lines. This is both their distinguishing characteristic and their weakness: the extracted features vary greatly under ordinary handwriting deformations, such as local touching or separation of character lines, and are inevitably unstable.

Consequently, a recognition system using these features must absorb this instability in its identification stage by preparing an enormous amount of dictionary data or very long identification logic, and must carry out large amounts of counting and complex decision processing.

For reference, a recognition system covering the 881 Kyōiku (education) kanji is said to require several minutes of processing on a high-speed computer.

A system that recognizes kanji by approximating them with line segments has achieved a recognition rate of only about 85% on 200 character types, and it appears difficult to extend it as it stands to a practical system covering a character set the size of the Tōyō kanji.

A further drawback common to both approaches is the following. The character line structure of kanji figures ranges from very simple to very complex, and a coarse classification exploiting this property would appear to be effective. Nevertheless, no classification based on this information is performed, and simple and complex characters alike are identified by the same complicated procedure, so efficiency is not necessarily good.

Among the above devices, recognition systems for printed kanji are approaching practical use for Tōyō kanji in a single typeface, but extension to systems covering multiple printed typefaces is still only being attempted. At present it would be unreasonable to apply the conventional techniques unchanged to handwritten kanji, which are far more varied and include large deformations.

Various feature extraction devices have also been developed for handwritten alphanumerics, kana, and the like, with considerable success.

This suggests attempting to extend those devices to handwritten kanji.

However, a merely quantitative extension is not enough; a qualitative advance capable of coping with the complexity of kanji is required.

In practice, only a small number of attempts have been made, and these were aimed at evaluating the capability of conventional devices on handwritten kanji.

In addition, the following method has been tried for recognizing printed alphabetic characters: the white background of the character is divided into regions according to the number of character lines seen from each background point in the up, down, left, and right directions; the total area of each kind of region is counted to form a vector; and this vector is used as the feature of the character.

However, this feature extraction method has drawbacks that can be summarized in the following three points.

The first concerns the kinds of regions, that is, the number of features that can occur.

The kind of feature is determined by the number of character lines seen in the up, down, left, and right directions.

Because the conventional configuration targeted the alphabet, the characters were simple.

The number of character lines was therefore restricted to three cases: 0, 1, and 2 or more.

With three cases for each of the four directions, a total of 3 × 3 × 3 × 3 = 81 kinds of features were considered.

When the recognition target is a complex character such as a kanji, the number of character lines naturally increases, and adequate recognition is not possible if two or more lines are lumped into a single case.

In kanji, ten or more character lines can lie side by side, so if counts from 0 to 10 are distinguished, for example, 11^4 = 14641 kinds of features can occur.
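The growth in the number of feature kinds described above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name `feature_kind_count` is an assumption introduced here and does not appear in the patent.

```python
def feature_kind_count(cases_per_direction: int, directions: int = 4) -> int:
    """Number of distinct region kinds when each of the four viewing
    directions (up, down, left, right) can take `cases_per_direction`
    distinct line-count cases."""
    return cases_per_direction ** directions


if __name__ == "__main__":
    # Alphabet case: counts limited to 0, 1, "2 or more" -> 3 cases.
    print(feature_kind_count(3))   # 81
    # Kanji case: counts 0 through 10 distinguished -> 11 cases.
    print(feature_kind_count(11))  # 14641
```

The count grows with the fourth power of the number of distinguishable line counts, which is why the conventional scheme scales poorly to kanji.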

In other words, as the characters to be recognized become more complex, constructing the recognition device becomes difficult extremely quickly.

The second drawback concerns the position at which features occur.

In the conventional method, areas were totaled over the whole image, regardless of where the regions lay, to form the vector.

This caused no problem for characters of simple shape, because the feature itself indicates to some extent which part of the character it was extracted from. It is not effective, however, for targets such as kanji, which are complex and include many similar categories.

That is, characters that yield the same kinds of features and differ only in where those features occur cannot be distinguished from one another.

The third drawback is that the difference between individual features is not computed.

Even when two parts of actual character shapes are very similar, once they are assigned different kinds of features they are treated no differently from two completely different shapes.

For example, consider a region with five character lines in each of the up, down, left, and right directions, and another with five lines in the up, down, and left directions and six in the right direction. Although the two hardly differ in practice, their areas are totaled as entirely different features.

Such differences arise easily from deformation, and it is unreasonable to treat them in the same way as completely different shapes.

An identification method using the conventional features therefore inevitably becomes complicated in order to absorb this variation.

The present invention was made to solve the above problems. For example, a character is viewed from the direction of the horizontal or vertical coordinate axis, the number of character lines constituting the character is counted along that coordinate axis, and the result is extracted as a vector on that coordinate axis of the character figure.

This is called the stroke density function, which expresses the complexity of the character figure with respect to that coordinate axis.

The invention is characterized in that characters are identified by matching, comparing this function element by element as a vector.

Its object is to classify and recognize kanji efficiently by exploiting the fact that kanji figures have widely varying character line density distributions, from simple structures to complex ones.

The invention is described in detail below with reference to the drawings. FIG. 1 shows an embodiment that conceptually represents the discrimination feature extraction device according to the present invention.

Reference numeral 1 denotes a register for storing the character to be recognized, with the horizontal direction as the X axis and the vertical direction as the Y axis, each with a resolution of 128 bits.

Reference numerals 2 and 3 denote character line counting devices that scan the character stored in register 1 parallel to the horizontal or vertical axis and count the number of character lines, with the position on the vertical or horizontal coordinate axis as the parameter.

Reference numerals 4 and 5 denote 128-word registers that store the number of character lines along the respective axis direction, counted by counting devices 2 and 3, that is, the character line density, as a function of the coordinate.

FIG. 2 shows in detail the counting device 2 or 3 shown in FIG. 1. Reference numeral 1′ denotes the 128 bits of data forming one row or one column, in the X-axis or Y-axis direction, of the 128-bit × 128-bit register 1 shown in FIG. 1.

Reference numeral 6 denotes a bit extraction circuit that reads the 128-bit data 1′ two bits at a time in parallel, moving one bit at a time from one end, and outputs the result to device 7 successively, 127 times in all.

Reference numeral 7 denotes a pattern extraction device that outputs the value "1" to device 8, described below, only when the right-hand bit of the two bits read by bit extraction circuit 6 is "0" and the left-hand bit is "1".

Reference numeral 8 denotes a counting device that counts the number of "1" values output by pattern extraction device 7 and outputs the result to counting device 2 or counting device 3 of FIG. 1.
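The cooperation of the bit extraction circuit 6, the pattern extraction device 7, and the counting device 8 can be illustrated in software. The following is a minimal sketch, not the patented circuit: the name `count_strokes` is hypothetical, and it assumes that the "left" bit of each 2-bit window is the bit read first along the scan.

```python
def count_strokes(line):
    """Count character-line crossings in one row or column of 0/1 pixels.

    A crossing is registered each time the sliding 2-bit window reads
    (1, 0), i.e. a run of black pixels ends. Assumption: the first bit
    of the pair plays the role of the "left" bit in the description.
    """
    count = 0
    # A line of n bits yields n - 1 overlapping windows (127 for n = 128).
    for first, second in zip(line, line[1:]):
        if first == 1 and second == 0:
            count += 1
    return count


if __name__ == "__main__":
    sample = [0, 1, 1, 0, 0, 1, 0, 0]
    print(count_strokes(sample))  # 2
```

Under this reading, a black run that extends to the very last bit of the line produces no (1, 0) window and is therefore not counted; the text does not say how such edge cases are handled.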

FIG. 3 is a block diagram of the matching section, which compares features.

The stroke density functions accumulated in storage devices 4 and 5 of FIG. 1 are concatenated and stored in the 256-word register shown as 9 in FIG. 3.

Reference numeral 10 denotes a storage device that stores the standard stroke density function, obtained separately, for each category to be identified.

Reference numeral 11 denotes an arithmetic unit that computes the square of the difference between corresponding elements on the coordinate axes of the stroke density function held in register 9 and the standard stroke density function stored in storage device 10.

Reference numeral 12 denotes an arithmetic unit that adds the squared difference computed by arithmetic unit 11 to the value of temporary register 13 and stores the result back in register 13.

Arithmetic unit 12 repeats this operation as many times as the stroke density function has dimensions.

Reference numeral 14 denotes a storage device with one entry per target category; it stores the result of matching the stroke density function held in register 9 against the feature of each category stored in storage device 10.
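The matching computation performed by units 11 to 13 amounts to accumulating squared element differences. A minimal sketch follows; the function name `matching_score` and the use of Python lists are assumptions for illustration, and the comments map each step to the reference numerals in the description.

```python
def matching_score(feature, standard):
    """Sum of squared differences between two stroke density functions.

    `feature` plays the role of register 9 and `standard` one entry of
    storage device 10; both are sequences of stroke counts of equal length.
    """
    if len(feature) != len(standard):
        raise ValueError("feature vectors must have the same dimension")
    accumulator = 0  # temporary register 13, cleared before matching
    for f, s in zip(feature, standard):
        squared_difference = (f - s) ** 2   # arithmetic unit 11
        accumulator += squared_difference   # arithmetic unit 12
    return accumulator  # one entry of storage device 14


if __name__ == "__main__":
    print(matching_score([1, 2, 3], [1, 0, 5]))  # 8
```

In the embodiment the vectors would have 256 elements (128 for each axis); the short vectors here only keep the example readable.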

To operate the device, the character data is first stored in register 1 of FIG. 1 in binary (0, 1) form. Next, starting from the left side of the character, that is, from the smaller X values, the data of one vertical column sharing the same X coordinate (a character slice, for example a) is observed two bits at a time by the bit extraction circuit 6 of FIG. 2, shifting one bit at a time from the smaller Y values.

The pattern extraction device 7 and counting device 8 of FIG. 2 then cause the number of strokes in the data of slice a to be stored at the corresponding location of register 4.

Next, the number of character strokes is counted in the same way for the vertical column of character slice data located one bit to the right on the X axis.

This operation is repeated to count the number of strokes over the entire character image. As a result, a character feature is extracted into register 4 with the X coordinate as the variable and the number of strokes as the function value.

In other words, this feature shows how many strokes the character to be recognized has within the character frame as a function of the X coordinate.

This feature is hereafter called the stroke density function in the X-axis direction.

The same processing is carried out with the Y coordinate as the variable, and the stroke density function in the Y-axis direction is obtained in register 5.
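The extraction of the two stroke density functions from a binary character image can be sketched as follows. This is an illustrative reading of the procedure, not the device itself: `stroke_density` and `count_strokes` are hypothetical names, the image is assumed to be a list of rows indexed as `bitmap[y][x]`, and a stroke is counted at each (1, 0) transition along the scan.

```python
def count_strokes(line):
    """Number of (1, 0) transitions along a row or column of 0/1 pixels."""
    return sum(1 for a, b in zip(line, line[1:]) if a == 1 and b == 0)


def stroke_density(bitmap):
    """Return (x_density, y_density) for a binary image bitmap[y][x].

    x_density[x] is the stroke count of the vertical column at X = x
    (contents of register 4); y_density[y] is the stroke count of the
    horizontal row at Y = y (contents of register 5).
    """
    height, width = len(bitmap), len(bitmap[0])
    x_density = [
        count_strokes([bitmap[y][x] for y in range(height)])
        for x in range(width)
    ]
    y_density = [count_strokes(row) for row in bitmap]
    return x_density, y_density


if __name__ == "__main__":
    # Two horizontal bars on a 5 x 5 grid.
    image = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    x_density, y_density = stroke_density(image)
    print(x_density)  # [0, 2, 2, 2, 0]
    print(y_density)  # [0, 1, 0, 1, 0]
```

Concatenating `x_density` and `y_density` gives the combined vector used for matching; with a 128 × 128 image it has 256 elements.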

For identification, register 13 is cleared to 0, the difference from the 256-dimensional stroke density function is computed for each category, and the result is stored in storage device 14.

The category with the smallest resulting value is taken as the identification result.
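Selecting the category with the smallest matching value can be sketched as below. The dictionary of standard vectors, the category labels, and the name `classify` are assumptions made for illustration; the distance is the sum of squared element differences used in the matching section.

```python
def classify(feature, standards):
    """Return (best_category, scores) for an input stroke density vector.

    `standards` maps each category label to its standard stroke density
    vector (storage device 10). `scores` corresponds to the per-category
    results held in storage device 14.
    """
    scores = {
        category: sum((f - s) ** 2 for f, s in zip(feature, standard))
        for category, standard in standards.items()
    }
    best_category = min(scores, key=scores.get)
    return best_category, scores


if __name__ == "__main__":
    standards = {"A": [0, 1, 0, 1], "B": [2, 2, 2, 2]}
    best, scores = classify([0, 1, 1, 1], standards)
    print(best)    # A
    print(scores)  # {'A': 1, 'B': 7}
```

Keeping all scores, rather than only the minimum, also allows the candidates to be ranked, which is what a coarse classification stage needs.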

The feature extracted and used by this device, the stroke density function, reflects the number of strokes of the character. Recognition targets such as kanji figures, which range from simple characters with few strokes to complex characters with more than twenty, can therefore be classified effectively according to their complexity.

Unlike the similar conventional method, in this feature extraction device the number of kinds of features is not an issue; the dimensionality of the feature is determined by the resolution used to represent the character and the resolution at which the feature is computed.

In the conventional method the dimensionality of the feature increased when the target characters were complex; the present invention has the advantage that this does not occur.

Because features are extracted and compared along the chosen coordinate axes, there is the further advantage that the position at which features occur is reflected in the identification of the character.

Moreover, because features are compared by taking differences between elements, the invention does not have the drawback of the conventional method, in which features could no longer be compared once their kinds differed.

As an example, FIG. 4 shows Tōyō kanji characters of relatively simple shape.

The stroke density functions in the X-axis and Y-axis directions are obtained for these characters.

Next, strokes are placed at equal intervals within the character frame, in the X-axis and Y-axis directions respectively, according to the number of strokes given by the stroke density function.

That is, if the number of strokes given by the stroke density function at a certain X coordinate a is "1", a dot is placed at the midpoint (the point dividing the line into halves) of the straight line x = a within the quadrilateral circumscribing the character.

If the value at a certain X coordinate b is "2", dots are placed at the two points that divide the straight line x = b into thirds, and so on.

Similarly, if the number of strokes given by the stroke density function at a certain Y coordinate m is "1", a dot is placed at the point dividing the straight line y = m into halves within the quadrilateral circumscribing the character.

If the value at a certain Y coordinate n is "2", dots are placed at the two points that divide the straight line y = n into thirds, and so on.
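The dot placement rule described above can be written compactly. The text gives the cases of one and two strokes explicitly; the sketch below assumes the same equal-division rule continues for larger counts, which is an inference from "and so on". The name `dot_fractions` is hypothetical.

```python
from fractions import Fraction


def dot_fractions(stroke_count):
    """Positions of the dots along one line inside the bounding box,
    as fractions of the line length.

    For a stroke count k, the line is divided into k + 1 equal parts and
    a dot is placed at each of the k interior division points.
    """
    return [Fraction(i, stroke_count + 1) for i in range(1, stroke_count + 1)]


if __name__ == "__main__":
    print(dot_fractions(1))  # [Fraction(1, 2)]
    print(dot_fractions(2))  # [Fraction(1, 3), Fraction(2, 3)]
```

Multiplying each fraction by the height (for a line x = a) or width (for a line y = m) of the circumscribing quadrilateral gives the actual dot coordinates.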

The figures shown in FIG. 5 are the result of this processing; a distinctive figure (character) is obtained for each of the characters shown in FIG. 4.

These can be regarded as the feature description of the kanji figures given by the stroke density function.

Next, FIG. 6 shows examples of the characters with the most complex shapes among the Tōyō kanji.

FIG. 7 shows the figures (characters) synthesized in the same way from the stroke density functions of these characters.

The examples of FIG. 5 and FIG. 7 show that the stroke density function accurately expresses the degree of complexity of the stroke structure of the character to be recognized.

Using the stroke density function obtained by this device, a coarse classification device for the Tōyō kanji can be constructed.

For example, for character data in which 2074 Tōyō kanji characters were written by 10 people, the data of 5 writers is used as standard patterns and the average of the stroke density functions is obtained.

The data of the other 5 writers is treated as unknown input kanji; their stroke density functions are obtained and classified by the Euclidean distance from the average stroke density functions obtained above.

Taking the classification rate to be the rate at which the correct answer appears within a given rank, 95% of the unknown input kanji had the correct answer within the top 10, so the coarse classification was effective.
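The classification rate used in this evaluation, counting an input as correct when the true category appears within a given rank, can be computed as in the sketch below. The function name `top_k_rate` and the toy labels are assumptions for illustration; no data from the patent's experiment is reproduced here.

```python
def top_k_rate(ranked_candidates, true_labels, k=10):
    """Fraction of inputs whose true label is among the first k candidates.

    `ranked_candidates[i]` is the list of category labels for input i,
    sorted from smallest to largest matching distance.
    """
    if not true_labels:
        raise ValueError("no inputs to evaluate")
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


if __name__ == "__main__":
    ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
    truths = ["a", "a", "a"]
    print(top_k_rate(ranked, truths, k=2))  # 2 of 3 inputs are hits
```

With k = 10, this is the measure under which the text reports 95% for the unknown input kanji.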

In a software simulation on a minicomputer, this coarse classification device had a processing time of 3 seconds per character and an average-pattern storage size of 133 KB. For a device of this kind, this represents a substantial gain in speed and reduction in size compared with what would be expected from conventional devices.

[Brief explanation of the drawing]

FIG. 1 shows an embodiment conceptually representing the discrimination feature extraction device according to the present invention; FIG. 2 shows an embodiment of the configuration of counting device 2 or 3 shown in FIG. 1; FIG. 3 is a block diagram showing the matching method for this feature; and FIGS. 4 to 7 are explanatory diagrams for explaining the present invention in detail.

1: 128 × 128-bit register for storing the character to be recognized; 2: character line counting device; 3: character line counting device; 4: stroke density function storage device; 5: stroke density function storage device; 6: bit extraction device; 7: pattern extraction device; 8: counting device; 9: stroke density function storage device; 10: standard stroke density function storage device; 11: difference and squaring unit; 12: adder; 13: temporary storage device; 14: matching result storage device.

Claims

1. A discriminating feature extraction device in a figure recognition system that extracts features by scanning figures including kanji, comprising: a two-dimensional figure storage device that stores given figure information; a counting device that counts, according to coordinate values on at least two coordinate axes, the number of intersections between a cutting line that cuts the given figure and the figure strokes of the given figure; and storage devices corresponding to the coordinate axes; wherein features of the given figure are extracted based on the contents of the storage devices, the features are compared at corresponding positions on the coordinate axes, the difference between them is determined from the difference in the number of stroke intersections, and identification is performed by summing the differences at each point on the coordinate axes.