JPH0337231B2

Info

Publication number: JPH0337231B2
Application number: JP59190963A
Authority: JP
Inventors: Hiroshi Matsumura; Tatsunosuke Iwahara
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1984-09-12
Filing date: 1984-09-12
Publication date: 1991-06-04
Also published as: JPS6168679A

Description

[Detailed Description of the Invention]

(a) Field of Industrial Application

The present invention relates to a character recognition system that recognizes handwritten kanji, and in particular to a method for determining the recognition ranking of candidate character type categories.

(b) Prior Art

In general, a character recognition system calculates the similarity between a feature pattern extracted from an input character pattern and the standard feature pattern of each character type category registered in advance in a dictionary unit, and selects the n candidate character type categories with the greatest similarity. The candidate character type category with the greatest similarity is output as the recognition result and, so that misrecognition can be corrected, the n selected candidate character type categories are assigned recognition ranks from first to n-th in descending order of similarity.
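The conventional similarity-only ranking described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `rank_by_similarity`, the category codes, and the similarity scores are hypothetical and do not come from the patent.

```python
def rank_by_similarity(similarities, n):
    """Return the n category codes with the greatest similarity, best first.

    similarities: dict mapping a character type code to its similarity
    (assumed here: a larger value means a closer match).
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:n]]


# Hypothetical similarity scores for four character type categories.
scores = {"K3": 0.75, "K1": 0.91, "K4": 0.74, "K2": 0.88}
print(rank_by_similarity(scores, 3))  # ['K1', 'K2', 'K3']
```

The first element plays the role of the recognition result, and the remaining elements are the fallback candidates used when the operator reports a misrecognition.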

However, as described above, determining the recognition ranking from similarity alone leads to frequent misrecognition. Methods have therefore been devised that, after selecting a plurality of candidate character type categories by similarity, apply some form of post-processing to determine the recognition ranking.

Post-processing schemes proposed so far include grammatical processing, as disclosed in Japanese Patent Application Laid-Open No. 59-32082, and determining whether the characters before and after the character to be recognized are kanji, katakana, or hiragana, as in Japanese Patent Application Laid-Open No. 59-27381.

(c) Problems to Be Solved by the Invention

In the prior art, performing grammatical processing as post-processing requires an enormous knowledge unit, such as a grammatical dictionary, and makes the processing itself very complex. With the method that determines whether the neighboring characters are kanji, katakana, and so on, no improvement in the recognition rate can be expected when the selected candidate character type categories consist solely of kanji or of hiragana.

To improve the recognition rate in a short processing time without requiring an enormous knowledge unit, the present applicant therefore sought to assign a priority to each character type category according to, for example, the stage of school education at which it is learned or its frequency of use, and to use this priority to determine the recognition ranking of the candidate character type categories obtained by the similarity calculation. However, the recognition rate varies with the balance struck between similarity and priority, so how best to set this balance became the problem.

(d) Means for Solving the Problems

In the present invention, a priority is set in advance for each character type category. For candidate character type categories whose similarity is greater than a predetermined value, the recognition ranks are not reordered by priority; only for candidate character type categories whose similarity is smaller than the predetermined value are the recognition ranks reordered by priority, and the recognition ranking is determined accordingly.

(e) Operation

In the present invention, candidates with high similarity are ranked by similarity alone, with priority ignored, while candidates with low similarity are ranked by priority. As a result, candidate character type categories whose feature patterns closely resemble the extracted feature pattern receive exactly the same recognition ranks as in the conventional method, while for candidates that are not very similar the priority is taken into account, which improves the recognition rate.

(f) Embodiment

Fig. 1 is a block diagram of a character recognition system to which the present invention is applied. Reference numeral 1 denotes a character observation unit that reads the characters written on an input document and outputs the result as a binary character pattern; 2 denotes a feature extraction unit that extracts a feature pattern from the input character pattern; 3 denotes a dictionary unit that stores the standard feature pattern of each character type category; and 4 denotes a pattern matching unit that matches the extracted feature pattern against the standard feature patterns and calculates the similarity between the two.

The character type categories in the dictionary unit 3 are grouped according to frequency or to the stage of school education at which they are learned, and a priority is set for each category set. For example, as shown in Fig. 2, all character type categories are divided into three category sets: category set 1 (3a) contains the character type categories learned in grades 1 to 3 of elementary school, category set 2 (3b) those learned in grades 4 to 6 of elementary school, and category set 3 (3c) those learned in junior high school or later. Priorities 0 to 2 are assigned to category sets 1 to 3 in that order.

The pattern matching unit 4 has three calculation units 4a to 4c, corresponding to category sets 1 to 3 respectively. Each calculation unit selects n candidate character type categories from its category set in descending order of similarity, and stores their character type codes and the calculated similarities in the candidate memory 5. In doing so, each calculation unit attaches the priority of its category set to the character type code and similarity, so that these three items are stored in the candidate memory 5 as the information for each candidate character type category. The candidate memory 5 thus holds n candidates from each category set, 3n candidate character type categories in total.
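The per-set selection performed by calculation units 4a to 4c can be sketched as follows. This is an illustration rather than the patented implementation: the `Candidate` tuple, the function names, and the toy feature vectors are hypothetical, and city-block distance is used as the similarity measure because the embodiment adopts it (a smaller distance means a greater similarity).

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    code: str      # character type code
    distance: int  # city-block distance; smaller means more similar
    priority: int  # priority of the category set the code belongs to


def city_block(a, b):
    """City-block (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_candidates(feature, category_sets, n):
    """Pick the n closest categories from each set and tag them with its priority.

    category_sets: list of (priority, {code: standard_feature_pattern}).
    Returns the contents of the candidate memory: n entries per set.
    """
    memory = []
    for priority, patterns in category_sets:
        scored = sorted(
            (Candidate(code, city_block(feature, std), priority)
             for code, std in patterns.items()),
            key=lambda c: c.distance,
        )
        memory.extend(scored[:n])
    return memory


# Hypothetical two-dimensional standard patterns for three category sets.
sets = [
    (0, {"A": [0, 0], "B": [1, 1], "C": [5, 5]}),
    (1, {"P": [0, 1], "Q": [2, 2]}),
    (2, {"V": [1, 0], "W": [3, 3]}),
]
for candidate in select_candidates([0, 0], sets, n=2):
    print(candidate)
```

With three category sets and n = 2, the returned list holds 3n = 6 entries, each carrying the code, distance, and priority triple that the text describes.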

In Fig. 1, reference numeral 6 denotes a knowledge unit that stores the relationship between similarity and priority, and 7 denotes a clustering control processing unit that refers to the contents of the knowledge unit 6, determines the recognition ranks of the top n of the 3n candidate character type categories stored in the candidate memory 5, and stores their character type codes in the result memory 8 in order of recognition rank. The answer output control unit 9 outputs the character type code ranked first as the recognition result to a character display device such as a word processor or personal computer, which displays that character. If the operator indicates a misrecognition, the answer output control unit 9 outputs the character type codes ranked second and lower one after another, controlling the output so that the correct recognition result is displayed.

Next, the processing of the clustering control processing unit 7 and the contents of the knowledge unit 6 are described in more detail.

In this embodiment, the city-block distance d is used as the measure of similarity: the smaller the distance, the greater the similarity. The knowledge unit 6 stores a plurality of thresholds D₁, D₂, D₃ (D₁ < D₂ < D₃) against which the city-block distance d of each candidate character type category is compared. As shown in the flowchart of Fig. 3, the clustering control processing unit 7 first selects, from the 3n candidate character type categories stored in the candidate memory 5, the n candidates M₁, M₂, ..., Mₙ with the smallest distances and stores them in the working memory 7a. It then classifies the candidates M₁ to Mₙ by comparing each of their distances d₁, d₂, ..., dₙ with the thresholds D₁, D₂, D₃; that is, it divides them into the four classes A, B, C, and D shown in Fig. 4. As Fig. 4 also shows, the knowledge unit 6 stores in advance whether reordering by priority is permitted in each class: no reordering by priority is performed in class A, and in classes B and C reordering by priority is performed only within the same class. The clustering control processing unit 7 refers to these contents of the knowledge unit 6 to determine the recognition ranking.
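The threshold comparison and the per-class permission table held in the knowledge unit can be sketched as below. Several details are assumptions made for illustration only: the numeric threshold values, the rule that a distance exactly equal to a threshold falls into the next class, and the treatment of class D as "no reordering" (the text specifies the behavior only for classes A, B, and C).

```python
from bisect import bisect_right

CLASS_NAMES = "ABCD"

# Whether reordering by priority is allowed inside each class.
# A, B, C follow the description; the entry for D is an assumption.
SWAP_ALLOWED = {"A": False, "B": True, "C": True, "D": False}


def classify(distance, thresholds):
    """Map a city-block distance to class A-D using ascending thresholds.

    thresholds: three ascending values; distances below the first threshold
    are class A, and distances at or above the last one are class D.
    """
    return CLASS_NAMES[bisect_right(thresholds, distance)]


thresholds = (10, 20, 30)  # hypothetical values
for d in (5, 10, 25, 30):
    cls = classify(d, thresholds)
    print(d, cls, SWAP_ALLOWED[cls])
```

Using `bisect_right` keeps the class boundaries in one sorted tuple, so changing the balance between similarity and priority only requires editing the threshold values.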

Here, a candidate character type category with character type code M_I, city-block distance d_I, and priority P (I = A, B, C, ...; P = 0, 1, 2) is written (M_I, d_I, P). Suppose that, as shown in Fig. 5(a), the five candidate character type categories with the greatest similarity are selected from each of category sets 1 to 3, and that their city-block distances satisfy d_V < d_P < d_A < d_Q < d_B < .... The working memory 7a of the clustering control processing unit 7 then holds the five character type codes M_V to M_B, selected and stored in the order shown in Fig. 5(b).

Suppose, for example, that the city-block distances d_V and d_P of character type codes M_V and M_P are sufficiently small, so that d_P < D₁, while D₁ < d_A and d_B < D₂. The clustering control processing unit 7 then assigns M_V and M_P to class A and M_A, M_Q, and M_B to class B. It does not reorder the class A codes M_V and M_P by priority, but does reorder the class B codes M_A, M_Q, and M_B by priority. As shown in Fig. 5(c), the highly similar codes M_V and M_P are therefore ranked first and second even though their priorities, "2" and "1", are low, while in class B, where similarity is lower, the codes M_A and M_B with the high priority "0" move up and take third and fourth place overall.

If instead no candidate has an extremely small distance, and D₁ < d_V, d_A < D₂ < d_Q, and d_B < D₃, the clustering control processing unit 7 assigns M_V, M_P, and M_A to class B and M_Q and M_B to class C, and reorders the recognition ranks by priority within each class. As shown in Fig. 5(d), the code M_A, which has the highest priority, is therefore ranked first even though its similarity is somewhat lower, and the low-priority code M_V is ranked below it.
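Both worked examples can be reproduced with a short sketch of the ranking step. The distances and thresholds below are hypothetical values chosen only to satisfy the inequalities stated in the text; the function name `rank` is likewise an assumption, and "reordering by priority" is modeled as a stable sort on priority inside each permitted class, so that candidates with equal priority keep their distance order.

```python
from bisect import bisect_right
from itertools import groupby


def rank(candidates, thresholds, n, swap_classes=("B", "C")):
    """Rank (code, distance, priority) triples as the clustering unit would.

    The n closest candidates are kept, grouped into classes A-D by distance,
    and reordered by priority only inside the classes named in swap_classes.
    """
    def class_of(candidate):
        return "ABCD"[bisect_right(thresholds, candidate[1])]

    top = sorted(candidates, key=lambda c: c[1])[:n]
    ranked = []
    # Classes are monotone in distance, so each class is one contiguous run.
    for name, group in groupby(top, key=class_of):
        members = list(group)
        if name in swap_classes:
            members.sort(key=lambda c: c[2])  # stable: ties keep distance order
        ranked.extend(members)
    return [code for code, _, _ in ranked]


# Hypothetical distances with d_V < d_P < d_A < d_Q < d_B < ...
candidates = [
    ("M_A", 30, 0), ("M_B", 34, 0), ("M_C", 60, 0),
    ("M_P", 12, 1), ("M_Q", 32, 1), ("M_R", 70, 1),
    ("M_V", 10, 2), ("M_W", 80, 2),
]

# First case: d_P < D1, and D1 < d_A, d_B < D2.
print(rank(candidates, (20, 40, 60), n=5))
# ['M_V', 'M_P', 'M_A', 'M_B', 'M_Q']

# Second case: D1 < d_V, d_A < D2 < d_Q, d_B < D3.
print(rank(candidates, (5, 31, 50), n=5))
# ['M_A', 'M_P', 'M_V', 'M_B', 'M_Q']
```

In the first case the two closest candidates keep their similarity-based ranks, while in the second case no candidate is close enough for class A, so priority decides the order within both classes.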

In this way, for character type categories with high similarity the recognition ranking is determined by similarity alone, with priority ignored, just as in the conventional method, while for character type categories with low similarity the ranking is determined with priority taken into account.

(g) Effects of the Invention

According to the present invention, candidate character type categories whose similarity is high enough for the similarity-based ranking to be fully reliable receive exactly the same recognition ranks as in the conventional method, while candidate character type categories that are ambiguous as recognition results are ranked with priority taken into account, so the recognition rate improves.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a character recognition system to which the present invention is applied; Fig. 2 is an explanatory diagram showing the contents of the category sets; Fig. 3 is a flowchart showing part of the processing performed by the clustering control processing unit; Fig. 4 is an explanatory diagram showing the contents of the knowledge unit; and Fig. 5 is an explanatory diagram showing a specific example of recognition ranking determination.

Explanation of main reference numerals: 3: dictionary unit; 4: pattern matching unit; 5: candidate memory; 6: knowledge unit; 7: clustering control unit; 8: result memory; 9: answer output control unit.

Claims

[Claims]

1. A recognition ranking determination method for a character recognition system that calculates the similarity between a feature pattern extracted from an input character pattern and a standard feature pattern of each character type category registered in advance in a dictionary unit and selects a plurality of candidate character type categories, characterized in that a priority is set for each of the character type categories and, among the plurality of candidate character type categories, the recognition ranks are not reordered by the priority for candidate character type categories whose similarity is greater than a predetermined value, and are reordered by the priority for candidate character type categories whose similarity is smaller than the predetermined value.