JP4594765B2

JP4594765B2 - Character recognition apparatus, character recognition method, and character recognition program recording medium

Info

Publication number: JP4594765B2
Application number: JP2005064386A
Authority: JP
Inventors: 良規草地; 章鈴木; 賢一荒川; 慎吾安藤
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2005-03-08
Filing date: 2005-03-08
Publication date: 2010-12-08
Anticipated expiration: 2025-03-08
Also published as: JP2006251920A

Description

The present invention relates to an image identification technique that captures a character string as an image and identifies it. One specific industrial application is, for example, a Japanese translation system for signboards.

Recognizing character strings in scenes generally involves four steps: locating the character string, locating the character regions, binarization, and character identification. With such techniques, however, illumination changes and complex backgrounds often cause string localization, character region localization, and binarization to fail, so string recognition accuracy is low.

To address this problem, one existing technique for recognizing character strings in scenes uses three steps: full-image search, narrowing down of character candidates, and character string estimation with a language model (see, for example, Non-Patent Document 1).
Non-Patent Document 1: Yoshinori Kusachi, Naomi Ito, Akira Suzuki, Kenichi Arakawa, "Text-region-free character recognition in scene images for image indexing" (in Japanese), IEICE Technical Report PRMU2004-89, Oct. 2004, pp. 37-42.

However, this method is limited by its feature definition and identification algorithm. It cannot fully distinguish background from characters, so many character candidates appear in background regions, and string estimation with the language model does not work well.

The present invention was made in view of these circumstances, and its object is to provide a character recognition technique that solves the above problem.

To solve the above problem, the invention according to claim 1 is a character recognition device that recognizes characters in an image. It comprises first-stage character recognition means and one or more different-feature character recognition means.

The first-stage character recognition means scans the input image, cuts out local images, and extracts their features. It then determines character candidates based on the similarity between those features and the features of characters registered in a dictionary.

For the determined character candidates, the different-feature character recognition means extract features different from those already used for feature extraction. They narrow down the character candidates based on the similarity between those features and the features of characters registered in a dictionary.

Any one of the first-stage character recognition means and the one or more different-feature character recognition means includes:
- line width detection means that detects the line width of strokes in a plurality of directions in the input image; and
- output means that outputs a feature vector whose feature values are the line widths in each direction.

The line width detection means includes:
- filter generation means that generates a plurality of filters corresponding to the detection directions and line widths;
- coincidence degree calculation means that calculates the degree of coincidence between the image and each of the filters; and
- line width calculation means that takes, as the line width, the product of the line width of the filter with the highest degree of coincidence and that degree of coincidence.

The invention according to claim 2 is a character recognition method in a character recognition device that recognizes characters in an image. It comprises a first-stage character recognition step and a different-feature character recognition step.

In the first-stage character recognition step, first-stage character recognition means scans the input image, cuts out local images, and extracts features. It then determines character candidates based on the similarity between those features and the features of characters registered in a dictionary.

In the different-feature character recognition step, one or more different-feature character recognition means extract, for the determined character candidates, features different from those already used for feature extraction. They narrow down the character candidates based on the similarity between those features and the features of characters registered in a dictionary.

Either the first-stage character recognition step or the different-feature character recognition step includes:
- a line width detection step, in which line width detection means detects the width of lines in a plurality of directions in the input image; and
- an output step, in which output means outputs a feature vector whose feature values are the line widths in each direction.

The line width detection step includes:
- a filter generation step, in which filter generation means generates a plurality of filters corresponding to the detection directions and line widths;
- a coincidence degree calculation step, in which coincidence degree calculation means calculates the degree of coincidence between the image and the filters; and
- a line width calculation step, in which line width calculation means takes, as the line width, the product of the line width of the filter with the highest calculated degree of coincidence and that degree of coincidence.

The invention according to claim 3 is a recording medium on which a computer-executable program is recorded. The program implements the character recognition device or character recognition method according to claim 1 or 2.

According to the inventions of claims 1 to 3, the different-feature character recognition means further narrow down the character candidates determined by the first-stage character recognition means. This reduces the number of character candidates in background regions.

Moreover, the first-stage and different-feature character recognition means can narrow down candidates using a combination of different features, which reduces the character candidates further.

Embodiments of the present invention are described below with reference to the drawings.

(Character string translation system)
First, FIG. 1 illustrates an example in which the character recognition device is applied to a character string translation system. The system comprises a portable terminal 11 such as a camera-equipped PDA, a character recognition device 12, a character string estimation device 13, and a translation device 14.

In this system, the user captures an image with the portable terminal 11 and sends it to the character recognition device 12. The character recognition device 12 extracts character candidates from the image and sends them to the character string estimation device 13. The character string estimation device 13 estimates a character string from the candidates and sends it to the translation device 14. The translation device 14 translates the string and returns the result to the portable terminal 11, where the user can view it. The system thus lets the user see a translation of a character string simply by photographing it.

(Character recognition device)
FIG. 2 shows the configuration of the character recognition device 12. It comprises first-stage character recognition means 21 and N different-feature character recognition means 22-1 to 22-N.

(First-stage character recognition means)
The first-stage character recognition means 21 extracts character candidates from an image. It can be implemented, for example, by the method described in Non-Patent Document 1: "Text-region-free character recognition in scene images for image indexing", IEICE Technical Report PRMU2004-89, Oct. 2004, pp. 37-42.

An example of the first-stage character recognition means 21 follows. To handle characters of different sizes, it generates multi-resolution images, cuts out windows of a fixed size while shifting their position, and performs a coarse-to-fine search. The result is used as an index. In image search, when a keyword is entered, only the corresponding characters are extracted from the index and their arrangement is tested for regularity. Images judged to be regular are output as results.

The coarse-to-fine search consists of pattern learning and pattern identification, each described below.

[Pattern learning] Pattern learning consists of four stages: feature extraction, creation of a category hierarchy, pattern generation by geometric deformation, and dictionary generation.

In feature extraction, an original pattern of size w × w, obtained by photographing a character from the front, is prepared and its features are extracted. The weighted direction index histogram (WDCH, also called the weighted direction code histogram) feature is used. WDCH has been used in OCR on binary images, but it extends easily to grayscale images. The algorithm is outlined below, where M and N are positive constants.
1: Compute the gradient magnitude and direction of the original pattern using Sobel operators.
2: Quantize the gradient direction into M directions.
3: Divide the original pattern into an N × N grid.
4: For each grid cell and each of the M directions, sum the gradient magnitudes.
5: Treat the result as an N × N × M feature vector and normalize its norm.
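The five WDCH steps above can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions:
- The image is a grayscale list of lists.
- Gradient directions are quantized over the full 0 to 2π range; the text does not say whether directions or orientations are used.
- Pixels are assigned to grid cells by integer division.

The names `sobel` and `wdch` are hypothetical.

```python
import math

def sobel(img, y, x):
    """Horizontal and vertical 3x3 Sobel responses at interior pixel (y, x)."""
    gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
    gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
          - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx, gy

def wdch(img, M=4, N=4):
    """Return an N*N*M feature vector, ordered cell by cell, direction by direction."""
    h, w = len(img), len(img[0])
    hist = [0.0] * (N * N * M)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx, gy = sobel(img, y, x)            # step 1: gradient
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            d = int(angle / (2 * math.pi) * M) % M   # step 2: M direction bins
            cell = (y * N // h) * N + (x * N // w)   # step 3: N x N grid cell
            hist[cell * M + d] += mag                # step 4: sum magnitudes
    norm = math.sqrt(sum(v * v for v in hist))       # step 5: L2 normalization
    return [v / norm for v in hist] if norm > 0 else hist
```

For an 8 × 8 image with a vertical step edge, every nonzero entry falls into direction bin 0, as expected for a purely horizontal gradient.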

Because WDCH is based on gradient values, it is relatively insensitive to brightness changes. Summing gradient values within each grid cell also absorbs small shape variations, such as differences between fonts.

To create the category hierarchy, categories are clustered according to the similarity of their feature vectors. Each node contains several categories, and each lowest-layer node contains a single category.

In pattern generation by geometric deformation, deformed patterns that simulate changes of viewpoint are generated for each category. The original pattern is deformed by a five-parameter affine transformation: rotation, vertical skew, horizontal skew, aspect ratio, and scaling. A generated pattern may be larger than the original, but features are extracted only from the part that lies within the original pattern's window. These feature vectors are used for dictionary generation.
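The five-parameter affine deformation can be sketched as below. The patent names the parameters but not their exact definitions or composition order. The following choices are therefore assumptions:
- the matrix forms, composed in the order rotation, vertical skew, horizontal skew, aspect ratio, uniform scale;
- nearest-neighbour inverse mapping about the window centre.

As the text describes, only the part of the warped pattern inside the original w × w window is kept. `deformation_matrix` and `warp_in_window` are hypothetical names.

```python
import math

def matmul(P, Q):
    """2x2 matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def deformation_matrix(rot=0.0, vskew=0.0, hskew=0.0, aspect=1.0, scale=1.0):
    """Compose an assumed 5-parameter affine map acting on (x, y) column vectors."""
    c, s = math.cos(rot), math.sin(rot)
    M = [[c, -s], [s, c]]                    # rotation
    for T in ([[1.0, 0.0], [vskew, 1.0]],    # vertical skew: y += vskew * x
              [[1.0, hskew], [0.0, 1.0]],    # horizontal skew: x += hskew * y
              [[aspect, 0.0], [0.0, 1.0]],   # aspect ratio: stretch x only
              [[scale, 0.0], [0.0, scale]]):  # uniform scaling
        M = matmul(T, M)
    return M

def warp_in_window(img, M, fill=0):
    """Warp a square pattern about its centre, keeping only the original window."""
    w = len(img)
    c = (w - 1) / 2
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]
    out = [[fill] * w for _ in range(w)]
    for y in range(w):
        for x in range(w):
            dx, dy = x - c, y - c
            # Inverse-map each output pixel to its source (nearest neighbour).
            sx = round(inv[0][0] * dx + inv[0][1] * dy + c)
            sy = round(inv[1][0] * dx + inv[1][1] * dy + c)
            if 0 <= sx < w and 0 <= sy < w:
                out[y][x] = img[sy][sx]
    return out
```

Sampling several parameter combinations per category yields the deformed training patterns whose features feed the dictionary.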

辞書生成では、以下の手順で各ノードの辞書を作成する。 In dictionary generation, a dictionary for each node is created in the following procedure.

In the first stage, features are compressed in each layer. Principal component analysis is applied to all feature vectors, including those of the geometrically deformed patterns. The vectors are then compressed by projecting them onto the eigenvectors with the largest eigenvalues. The compressed feature vector is written f(c, r, p), where c is the category, r is the compression rate, and p is the deformation parameter.

In the second stage, a dictionary is generated for each node. Let C be the category set of a node. Principal component analysis is applied to the vectors f(C, r, p) to obtain the subspace E_d(C, r). Here d is the subspace dimension, an integer set by the system based on the contribution ratio.

The compression rate is set lower toward the lower layers, which realizes the coarse-to-fine search. Upper layers perform fast but less accurate identification; lower layers perform slower but more accurate identification.

[Pattern identification] Small regions of size W × W are cut out while moving over the entire surface of each multi-resolution image, and pattern identification is applied to each region. Pattern identification performs a coarse-to-fine search while following multiple paths through the hierarchy. The algorithm is outlined below.
1. Feature extraction: Cut out regions at varying positions over each resolution image and extract their features. The features of all cut-out regions are computed in advance.
2. Initialization: Start from the root node of the tree.
3. Candidate node setting: Set the first-layer nodes as the candidate nodes of every cut-out region. Repeat steps 4 to 6 for each cut-out region.
4. Compression: Compress the features of the cut-out region using the compression rate of the lower layer. The result is denoted I'(r).
5. Projection distance calculation: Using the subspace of candidate node C, compute the projection distance L(C) according to the following equation.
[Equation for the projection distance L(C): not reproduced in the source text]
Here D is the subspace dimension.
6. Screening: Rank the candidate nodes by the above distance values, and update the candidate nodes using thresholds on distance and rank.
7. Peak detection: For each candidate node across all cut-out regions, compute spatial connectivity in three dimensions (vertical, horizontal, resolution) to obtain segments. Within each segment, keep only the candidate node at the minimum-distance peak and delete the others.
8. Local screening: Process each set of peaks at the same resolution as follows. Divide the space into blocks, sort the peaks in each block by distance, and keep at most a fixed number from the top. Then shift the block boundaries by half a block horizontally and vertically and repeat the same processing.
9. Peak integration for identical candidate characters: Take two peaks with the same candidate character. If their center coordinates and resolutions are close, merge them into the one with the smaller distance. Repeat until no pair of peaks remains to be merged.
10. Candidate node update: Register the nodes connected below each candidate node as the new candidate nodes.
11. Termination: If the lowest layer has been reached, output the remaining candidate nodes as the index and stop; otherwise return to step 4. The index format is (category name, position, size, similarity).
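The equation for step 5 is not available here, so the sketch below uses the conventional subspace-method projection distance as an assumption. For a compressed feature x and an orthonormal basis e_1, ..., e_D of the node subspace, L(C) = ||x||^2 - Σ_{i=1}^{D} (x · e_i)^2.

The sketch then applies the step 6 screening, keeping nodes that pass both a distance threshold and a rank threshold. Function names and threshold values are hypothetical.

```python
def projection_distance(x, basis):
    """Squared residual of x after projecting onto span(basis).

    Assumes the basis vectors are orthonormal (standard subspace method)."""
    sq_norm = sum(v * v for v in x)
    projected = sum(sum(a * b for a, b in zip(x, e)) ** 2 for e in basis)
    return sq_norm - projected

def screen_candidates(x, nodes, max_dist, max_rank):
    """Step 6 sketch: rank candidate nodes by distance and keep those
    passing both the distance and the rank thresholds.

    nodes: {node_name: orthonormal basis (list of vectors)}."""
    ranked = sorted((projection_distance(x, basis), name)
                    for name, basis in nodes.items())
    return [(name, d) for rank, (d, name) in enumerate(ranked, start=1)
            if d <= max_dist and rank <= max_rank]
```

In the full search, the survivors' child nodes would then become the next candidates (step 10).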

The spatial connectivity in step 7 need not be three-dimensional; two-dimensional connectivity (vertical and horizontal) is also possible. Steps 7 to 9 reduce the amount of processing and trade accuracy against computation. They need not be applied at every layer, only at predetermined layers.
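Step 9 (peak integration for identical candidate characters) can be sketched as a greedy merge loop. The patent only says that centers and resolutions must be "close". The per-axis tolerances `pos_tol` and `res_tol` used here, and the `Peak` record, are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical peak record: candidate character, centre (x, y), resolution level, distance.
Peak = namedtuple("Peak", "char x y res dist")

def _close(p, q, pos_tol, res_tol):
    """Assumed closeness test: per-axis tolerance on position and resolution."""
    return (abs(p.x - q.x) <= pos_tol and abs(p.y - q.y) <= pos_tol
            and abs(p.res - q.res) <= res_tol)

def integrate_peaks(peaks, pos_tol=2, res_tol=0):
    """Merge same-character peaks that are close, keeping the smaller distance,
    until no mergeable pair remains."""
    peaks = list(peaks)
    while True:
        pair = next(((i, j) for i in range(len(peaks))
                     for j in range(i + 1, len(peaks))
                     if peaks[i].char == peaks[j].char
                     and _close(peaks[i], peaks[j], pos_tol, res_tol)), None)
        if pair is None:
            return peaks
        keep = min(peaks[pair[0]], peaks[pair[1]], key=lambda p: p.dist)
        peaks = [p for k, p in enumerate(peaks) if k not in pair] + [keep]
```

Peaks for different characters at the same location are deliberately left alone, so competing hypotheses survive for the later language-model stage.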

[Image search] When a keyword is entered as a character string, the index obtained by pattern identification is searched for places where patterns are arranged with spatial regularity. Images whose index contains such a place are output as search results. The following rules are used for the spatial arrangement of patterns:
(1) The patterns are of nearly equal size.
(2) The pitch is nearly constant.
(3) The pitch is within a fixed range relative to the size of the individual patterns.
(4) The order of the patterns matches the order of the input character string, and the angle between the alignment direction of the patterns and the horizontal or vertical direction is within a fixed range.

The search algorithm first finds every place in the index where any forward-ordered pair of characters from the input string occurs. At each such place, it computes two quantities: the two-dimensional coordinates of the start position of the hypothetical input string, and the two-dimensional vector representing the character advance. It then casts a vote in the voting space spanned by these parameters. Before voting, the pair is checked against rules (1), (3), and (4) above; if it violates any of them, no vote is cast. Finally, the voting space is searched for cells whose score is at or above a threshold.
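A sketch of the pairwise voting search follows. It assumes that:
- each index entry is a hypothetical (character, x, y, size) tuple, with y pointing down;
- rule (4) accepts advance vectors near horizontal (rightward) or vertical (downward);
- the voting space is quantized into square bins of width `bin_size`.

Rule (2) is not checked per pair; a constant pitch shows up as many votes landing in the same bin. All names and tolerance values are hypothetical.

```python
import math
from collections import Counter

def vote_for_string(query, index, size_tol=0.3, pitch_range=(0.8, 2.0),
                    angle_tol=math.radians(15), bin_size=4.0):
    """Vote over (start_x, start_y, advance_x, advance_y) for every forward pair."""
    votes = Counter()
    for i in range(len(query)):
        for j in range(i + 1, len(query)):
            for pc, px, py, ps in index:
                if pc != query[i]:
                    continue
                for qc, qx, qy, qs in index:
                    if qc != query[j]:
                        continue
                    if abs(ps - qs) > size_tol * max(ps, qs):   # rule (1)
                        continue
                    vx, vy = (qx - px) / (j - i), (qy - py) / (j - i)
                    rel_pitch = math.hypot(vx, vy) / ((ps + qs) / 2)
                    if not pitch_range[0] <= rel_pitch <= pitch_range[1]:  # rule (3)
                        continue
                    ang = math.atan2(vy, vx)                   # rule (4)
                    if min(abs(ang), abs(ang - math.pi / 2)) > angle_tol:
                        continue
                    sx, sy = px - i * vx, py - i * vy          # implied string start
                    key = tuple(round(v / bin_size) for v in (sx, sy, vx, vy))
                    votes[key] += 1
    return votes

def find_strings(votes, threshold):
    """Return voting-space cells whose score reaches the threshold."""
    return [key for key, n in votes.items() if n >= threshold]
```

With three evenly spaced characters for "ABC", all three pairs vote for the same cell, while a stray "B" is rejected by rule (3). A practical version would also vote into neighbouring bins, so that votes near a bin boundary are not split.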

Because each vote involves only two categories, the algorithm stays fast even on indexes containing many false candidate characters. The voting process also makes it robust when some correct characters are missing.

In this way, the first-stage character recognition means 21 determines character candidates (candidate category, position, size, similarity) from the image.

(Different-feature character recognition means)
The different-feature character recognition means 22-1 to 22-N recognize characters from the character candidates (candidate category, position, size, similarity). They extract features from the image and then identify characters using, for example, the subspace method. The subspace method is described in detail in: E. Oja, Subspace Methods of Pattern Recognition, Research Studies Press, 1983.

Consider the following example:
- The first-stage character recognition means 21 extracts features from a captured image. Based on the similarity between these features and those of characters in the dictionary, it outputs character candidates as shown in FIG. 3.
- The different-feature character recognition means 22-A looks at the image region given by each candidate's position and size. It extracts features different from those used by means 21 and identifies the character category, outputting intermediate candidate data as shown in FIG. 4.
- Another different-feature character recognition means 22-B extracts, from this candidate data, features different from those used by 22-A and narrows down the candidates further.
- Finally, a distance threshold (e.g., 1000 or less) or a rank threshold (e.g., top 5) is applied to each candidate, giving an output such as that in FIG. 5.

Using different-feature character recognition means 22 that narrow down candidates with several different features thus reduces the number of character candidates.
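The cascade of FIGS. 3 to 5 can be sketched as successive filtering with a distance threshold and a rank threshold, using the text's example values of 1000 and top 5. Each stage is modelled as a hypothetical function that returns a distance for a (region, category) pair under that stage's own feature. Regions left with no candidates are dropped, which is how background candidates disappear. This is an illustrative sketch, not the patent's implementation.

```python
def narrow_candidates(candidates, stages, max_dist=1000.0, max_rank=5):
    """Narrow first-stage candidates through a cascade of recognizers.

    candidates: {region_id: [category, ...]} from the first stage.
    stages: callables distance_fn(region_id, category) -> distance,
            one per different-feature recognizer (hypothetical interface).
    """
    for distance_fn in stages:
        narrowed = {}
        for region, categories in candidates.items():
            ranked = sorted((distance_fn(region, c), c) for c in categories)
            kept = [c for rank, (d, c) in enumerate(ranked, start=1)
                    if d <= max_dist and rank <= max_rank]
            if kept:  # regions with no surviving candidate are dropped
                narrowed[region] = kept
        candidates = narrowed
    return candidates
```

Because every stage uses a different feature, a background region must fool every stage to survive, which is the intuition behind the candidate reduction described above.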

(First-stage and different-feature character recognition means sharing a feature extraction means)
FIG. 6 shows the case in which the first-stage character recognition means 21 and the different-feature character recognition means 22 share a feature extraction means 61. The two differ only in the parameters they pass to it.

As an example, FIGS. 7 to 9 illustrate a case in which the feature extraction means 61 uses an extended weighted direction index histogram that takes the number of blocks as a parameter.

FIG. 7 shows the processing flow of the extended weighted direction index histogram, which consists of the following steps:
S71: Obtain the direction and magnitude of the edges.
S72: Quantize the edge directions.
S73: Divide the image into blocks.
S74: For each block and each quantized direction, sum the edge magnitudes.
S75: Apply smoothing.
S76: Treat the values as elements of a single feature vector and normalize its norm.

Details of S72, S74, and S75 are given in the following reference on the weighted direction index histogram: T. Wakabayashi, S. Tsuruoka, F. Kimura, Y. Miyake, "Accuracy Improvement through Increased Feature Size in Handwritten Numeral Recognition," IEICE, Vol. J77-D-II, No. 10, pp. 2046-2053, 1994 (in Japanese).

In S71, the magnitude and direction of the edges are obtained as shown in FIG. 8. For example, they can be computed with vertical and horizontal Sobel operators.

In S73, as shown in FIG. 9, the image is divided into 8 × 8 blocks when the number of blocks is 8, and into 4 × 4 blocks when it is 4.

In S76, each value is treated as an element of one feature vector, and the norm is normalized using standard techniques.

Thus, the first-stage character recognition means 21 and the different-feature character recognition means 22 can use features that differ only in their parameters. This reduces the character candidates without lowering the identification rate.

(First-stage and different-feature character recognition means using different features)
FIG. 10 shows the case in which the first-stage character recognition means 21 has a weighted direction index histogram feature extraction means 101 and the different-feature character recognition means 22 has a line width feature extraction means 102. The roles may also be swapped, so that the first-stage character recognition means 21 has the line width feature extraction means 102 and the different-feature character recognition means 22 has the weighted direction index histogram feature extraction means 101.

(Line width feature extraction means)
FIG. 11 shows the configuration of the line width feature extraction means 102. It comprises:
- image input means 111 that inputs a group of character patterns or an image (referred to as an image pattern);
- line width detection means 112 that detects line widths in a plurality of directions at an arbitrary pixel; and
- output means 113 that outputs a feature vector whose feature values are the line widths of each pixel in each direction.

FIG. 12 shows an example of line width detection for an image of the character "ア", computing line widths in the 0-, 45-, 90-, and 135-degree directions. For example, the image is binarized locally and the length of the connected run in each direction is measured. As shown in FIG. 12, each pixel yields line widths in four directions, and these are treated as a feature vector; pixels without a value shown have a line width of 0. The dimension of the feature vector is the number of directions × the number of pixels.
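The FIG. 12 style per-pixel line width can be sketched as run-length measurement on an already binarized image; the local binarization step is omitted. Direction vectors assume image coordinates with y pointing down, and diagonal runs are counted in pixels without a √2 correction. Both are assumptions. `line_width_feature` is a hypothetical name.

```python
# Assumed unit steps (dx, dy) for each direction, with y pointing down.
DIRECTIONS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}

def run_length(img, y, x, dx, dy):
    """Length of the foreground run through (y, x) along +/-(dx, dy)."""
    h, w = len(img), len(img[0])
    n = 1
    for sign in (1, -1):
        cy, cx = y + sign * dy, x + sign * dx
        while 0 <= cy < h and 0 <= cx < w and img[cy][cx]:
            n += 1
            cy += sign * dy
            cx += sign * dx
    return n

def line_width_feature(img):
    """Per-pixel widths in each direction; background pixels contribute 0.

    Vector length = number of directions * number of pixels."""
    feat = []
    for y in range(len(img)):
        for x in range(len(img[0])):
            for dx, dy in DIRECTIONS.values():
                feat.append(run_length(img, y, x, dx, dy) if img[y][x] else 0)
    return feat
```

For a 2-pixel-wide vertical bar, a stroke pixel reports width 2 horizontally and diagonally and the full bar height vertically.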

FIG. 13 shows a line width feature extraction means 102 that includes line width histogram calculation means 133. It comprises:
- image input means 131 that inputs a group of character patterns or an image (referred to as an image pattern);
- line width detection means 132 that detects line widths in a plurality of directions at an arbitrary pixel;
- line width histogram calculation means 133 that computes a per-direction histogram of line widths within predetermined local regions; and
- output means 134 that outputs a feature vector whose feature values are the per-direction line width histograms of each local region.

FIG. 14 shows a calculation example of a line width histogram. As shown in FIG. 14, local regions are set on the image, and within each local region the line widths calculated by the line width detection means 132 of FIG. 13 are summed separately for each direction. For example, in the local region near the center for the 0-degree direction, the line widths of the pixels are 1, 1, and 0.5, which sum to 2.5. As shown in FIG. 14, a four-direction line width histogram is obtained for each region, and these histograms are regarded as a feature vector. The dimension of the feature vector is the number of directions × the number of local regions. The local regions can be obtained, for example, by simply dividing the image into a grid.
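The per-region summation can be sketched as below, assuming the per-pixel directional line widths are already available (from whichever detector is used) and that the local regions form a plain grid of `cell_h` × `cell_w` cells. The function name and parameters are hypothetical; edge cells smaller than a full cell are simply summed over the pixels they contain.

```python
def line_width_histogram(widths, cell_h, cell_w):
    """Sum per-direction line widths inside each grid cell.

    widths[y][x] is a sequence of per-direction line widths for pixel (x, y).
    Returns a flat vector of (number of cells) * (number of directions) sums,
    cells in row-major order.
    """
    h, w = len(widths), len(widths[0])
    n_dir = len(widths[0][0])
    feats = []
    for top in range(0, h, cell_h):
        for left in range(0, w, cell_w):
            hist = [0.0] * n_dir
            for y in range(top, min(top + cell_h, h)):
                for x in range(left, min(left + cell_w, w)):
                    for d in range(n_dir):
                        hist[d] += widths[y][x][d]
            feats.extend(hist)
    return feats
```

With a single 2 × 2 cell whose 0-degree widths are 1, 1, 0.5 and 0, the 0-degree bin is 2.5, matching the arithmetic in the text; splitting the same image into 1 × 1 cells gives 4 cells × 4 directions = 16 values.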

(Line width detection means)
Here, the line width detection means 112 and 132 will be described with reference to FIG. 15. As shown in FIG. 15, the line width detection means 112 and 132 comprise filter generation means 151 that generates a plurality of filters corresponding to arbitrary directions and line widths, coincidence degree calculation means 152 that calculates the degree of coincidence between a region centered on the pixel of interest and each of the filters, and line width calculation means 153 that takes, as the line width, the product of the maximum degree of coincidence and the line width of the filter showing that maximum.
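The selection performed by the line width calculation means 153 can be sketched as follows. The sketch assumes the degrees of coincidence for one pixel are already computed and keyed by (direction, filter line width), that the maximum is taken separately for each direction so that one line width per direction results, and that a negative best coincidence is clamped to 0. The text does not state these details, and the function name is hypothetical.

```python
def line_width_from_scores(scores):
    """scores maps (angle, filter_width) -> degree of coincidence at one pixel.

    For each angle, picks the filter width with the highest coincidence and
    returns width * coincidence as that direction's line width.
    """
    best = {}
    for (angle, width), score in scores.items():
        if angle not in best or score > best[angle][1]:
            best[angle] = (width, score)
    # Assumption: a negative best correlation means "no stroke", so clamp to 0.
    return {angle: width * max(score, 0.0) for angle, (width, score) in best.items()}
```

For example, if the width-3 filter at 0 degrees has the highest coincidence, 0.75, the 0-degree line width becomes 3 × 0.75 = 2.25. Weighting by the coincidence is what produces fractional widths such as the 0.5 seen in FIG. 14.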

FIG. 16 shows an example of the filters generated by the filter generation means 151. As shown in FIG. 16, these filters detect four directions and three line widths; in this example the line widths are 2, 3, and 4.
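Because FIG. 16 is not reproduced here, the filter shape below is an assumption: each filter is a square 0/1 mask containing a band whose width is measured along the given direction (y grows downward), so it responds to strokes crossing that direction. The 7 × 7 size and the names `make_filter` and `make_filter_bank` are hypothetical. The bank covers the four directions and the line widths 2, 3 and 4 from the example.

```python
import math


def make_filter(angle_deg, width, size=7):
    """Return a size x size 0/1 mask containing a band of the given width.

    Assumption: the band's width is measured along angle_deg, i.e. a pixel at
    offset (dx, dy) from the center is inside when its signed distance along
    that direction lies in [-width/2, width/2).
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    half = size // 2
    filt = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            # Rounding suppresses floating-point noise from cos/sin (e.g. cos 90°).
            d = round(dx * c - dy * s, 9)
            row.append(1 if -width / 2 <= d < width / 2 else 0)
        filt.append(row)
    return filt


def make_filter_bank(angles=(0, 45, 90, 135), widths=(2, 3, 4), size=7):
    """One filter per (angle, width) pair: 4 x 3 = 12 filters by default."""
    return {(a, w): make_filter(a, w, size) for a in angles for w in widths}
```

With the defaults, the 0-degree, width-3 filter is a vertical band three columns wide (21 ones in a 7 × 7 mask), and the 90-degree, width-4 filter is a horizontal band four rows tall (28 ones).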

(Coincidence degree calculation means)
The coincidence degree calculation means 152 calculates the degree of coincidence between each generated filter and the region centered on the pixel of interest. The configuration of the coincidence degree calculation means 152 will be described with reference to FIG. 17. As shown in FIG. 17, the coincidence degree calculation means 152 includes normalized correlation means 171. The normalized correlation means 171 receives the filters generated by the filter generation means 151 together with the character pattern group or image, calculates the normalized correlation value between each filter and the character pattern group or image, and outputs this value as the degree of coincidence. In calculating the normalized correlation value, for example, an inner product or a distance can be used.
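The text allows an inner product or a distance to be used. The sketch below picks one common option, zero-mean normalized cross-correlation, which is an assumption rather than the method the patent prescribes. `patch_at` is a hypothetical helper that cuts out the region centered on the pixel of interest, treating pixels outside the image as 0.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equal-size 2-D lists.

    Returns a value in [-1, 1], or 0.0 if either input is constant.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same number of pixels")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def patch_at(img, cx, cy, size):
    """size x size window centered on (cx, cy); pixels outside the image read as 0."""
    half = size // 2
    h, w = len(img), len(img[0])
    return [[img[y][x] if 0 <= y < h and 0 <= x < w else 0
             for x in range(cx - half, cx + half + 1)]
            for y in range(cy - half, cy + half + 1)]
```

A filter correlated with itself gives 1.0, with its inverse gives -1.0, and with a constant patch gives 0.0, so a high value means the local region looks like the stroke the filter encodes.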

Thus, the first-stage character recognition means 21 has the weighted direction index histogram feature extraction means 101, and the different feature character recognition means 22 has the line width feature extraction means 102. Because the first-stage character recognition means 21 and the different feature character recognition means 22 combine features based on entirely different principles, the number of character candidates can be reduced dramatically at the cost of only a slight decrease in the recognition rate.
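The two-stage narrowing can be sketched as follows. Assumptions not taken from the text: each dictionary entry stores two precomputed feature vectors (feature A, for example a direction histogram, and feature B, for example a line width feature), similarity is cosine similarity, and each stage keeps a fixed top-k. The text says only that candidates are determined and then narrowed down "based on the similarity". All names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def two_stage_candidates(query_a, query_b, dictionary, k_first, k_second):
    """dictionary maps character -> (feature_a, feature_b).

    Stage 1 ranks every character by feature-A similarity and keeps k_first;
    stage 2 re-ranks only those survivors by feature-B similarity and keeps
    k_second.
    """
    stage1 = sorted(dictionary,
                    key=lambda ch: cosine(query_a, dictionary[ch][0]),
                    reverse=True)[:k_first]
    return sorted(stage1,
                  key=lambda ch: cosine(query_b, dictionary[ch][1]),
                  reverse=True)[:k_second]
```

The design point mirrors the paragraph above: a candidate that looks similar under feature A but not under the unrelated feature B is dropped. The second stage also only has to score the few survivors of the first.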

(Program, etc.)
In the above embodiment, the character recognition device is realized by, for example, the CPU of a computer device constituting the character recognition device, and the required first-stage character recognition processing, different feature character recognition processing, feature extraction processing, extended weighted direction index histogram feature extraction processing, line width feature extraction processing, normalized correlation calculation processing, and the like can be installed as application programs.

In addition, data such as the processing results and calculation results of the first-stage character recognition processing, different feature character recognition processing, feature extraction processing, extended weighted direction index histogram feature extraction processing, line width feature extraction processing, normalized correlation calculation processing, and the like may be written to and read from an internal memory or an external storage device.

It is also possible to supply a system or apparatus with a recording medium on which the program code of software realizing the functions of the present embodiment is recorded, and to have the CPU (MPU) of the system or apparatus read and execute the program code stored in the recording medium. In this case, the program code itself read from the storage medium realizes the functions of the above embodiment. Examples of storage media that store this program code include CD-ROM, DVD-ROM, CD-R, CD-RW, MO, and HDD.

FIG. 1 is a configuration diagram of a character string translation system.
FIG. 2 is a configuration diagram of a character recognition device.
FIG. 3 is an output example of the first-stage character recognition means.
FIG. 4 is an output example of the different feature character recognition means.
FIG. 5 is an output example of the different feature character recognition means.
FIG. 6 is an example of sharing feature extraction means between the first-stage character recognition means and the different feature character recognition means.
FIG. 7 is a processing flow diagram of the extended weighted direction index histogram.
FIG. 8 is a diagram showing an example of obtaining the magnitude and direction of edges.
FIG. 9 is a diagram showing an example of dividing an image into blocks.
FIG. 10 is a configuration diagram of the first-stage character recognition means and the different feature character recognition means.
FIG. 11 is a configuration diagram of the line width feature extraction means.
FIG. 12 is a diagram showing a calculation example of line width detection.
FIG. 13 is a configuration diagram of the line width feature extraction means.
FIG. 14 is a diagram showing a calculation example of a line width histogram.
FIG. 15 is a configuration diagram of the line width detection means.
FIG. 16 is a diagram showing an example of filters generated by the filter generation means.
FIG. 17 is a configuration diagram of the coincidence degree calculation means.

Explanation of symbols

11 ... Portable terminal
12 ... Character recognition device
13 ... Character string estimation device
14 ... Translation device
21 ... First-stage character recognition means
22 ... Different feature character recognition means
61 ... Feature extraction means
101 ... Extended weighted direction index histogram feature extraction means
102 ... Line width feature extraction means
111 ... Image input means
112 ... Line width detection means
113 ... Output means
131 ... Image input means
132 ... Line width detection means
133 ... Line width histogram calculation means
134 ... Output means
151 ... Filter generation means
152 ... Coincidence degree calculation means
153 ... Line width calculation means
171 ... Normalized correlation means

Claims

A character recognition device for recognizing characters in an image,
First-stage character recognition means for extracting a feature by cutting out local images while scanning an input image, and determining character candidates based on the similarity between the feature and the features of characters registered in a dictionary;
One or more different feature character recognition means for extracting, for the determined character candidates, a feature different from the features already used for feature extraction, and narrowing down the character candidates based on the similarity between the feature and the features of the characters registered in the dictionary;
wherein any one of the first-stage character recognition means and the one or more different feature character recognition means includes:
Line width detection means for detecting line widths in a plurality of directions in the input image; and
Output means for outputting a feature vector having the line width of the line in each direction as a feature value;
wherein the line width detection means includes:
Filter generation means for generating a plurality of filters according to the direction in which the line width is detected and the line width of the line;
Coincidence degree calculation means for calculating degrees of coincidence between the image and the plurality of filters; and
Line width calculation means for setting, as the line width, a value obtained by multiplying the maximum degree of coincidence among the calculated degrees of coincidence by the line width of the filter showing that maximum;
A character recognition device comprising:

A character recognition method in a character recognition device for recognizing characters in an image,
A first-stage character recognition step in which first-stage character recognition means extracts a feature by cutting out local images while scanning the input image, and determines character candidates based on the similarity between the feature and the features of characters registered in a dictionary;
A different feature character recognition step in which one or more different feature character recognition means extract, for the determined character candidates, a feature different from the features already used for feature extraction, and narrow down the character candidates based on the similarity between the feature and the features of the characters registered in the dictionary;
wherein any one of the first-stage character recognition step and the different feature character recognition step includes:
A line width detection step of detecting line widths in a plurality of directions in the input image; and
An output step of outputting a feature vector having the line width of the line in each direction as a feature value;
wherein the line width detection step includes:
A filter generation step of generating a plurality of filters according to the direction in which the line width is detected and the line width of the line;
A coincidence degree calculation step in which coincidence degree calculation means calculates degrees of coincidence between the image and the plurality of filters; and
A line width calculation step in which line width calculation means sets, as the line width, a value obtained by multiplying the maximum degree of coincidence among the calculated degrees of coincidence by the line width of the filter showing that maximum;
A character recognition method characterized by comprising:

A recording medium on which is recorded a computer program capable of implementing the character recognition device according to claim 1 or the character recognition method according to claim 2.