JP5796107B2

JP5796107B2 - Method and apparatus for text detection

Info

Publication number: JP5796107B2
Application number: JP2014103652A
Authority: JP
Inventors: ウェンフォアマー; ルオツァオハイ
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-05-24
Filing date: 2014-05-19
Publication date: 2015-10-21
Anticipated expiration: 2034-05-19
Also published as: JP2014229314A

Description

The present invention relates to text detection and, in particular, to a method and apparatus for text detection and to a method and system for text information extraction.

Detecting text in images, especially natural images, is extremely important for many image recognition applications, such as computerized assistance for visually impaired people and foreigners, automatic retrieval of images and videos, and robot navigation in urban environments.

Nevertheless, text detection in natural scenes is a difficult problem. In contrast to scans of printed pages, faxes and business cards, the main challenge lies in the diversity of the text: font, size, skew angle, and distortion due to italics and slant. Environmental factors such as uneven illumination and reflections, poor lighting conditions, and complex backgrounds complicate the problem further.

In the related literature, text detection methods that detect text regions in natural scenes usually follow the flowchart shown in FIG. 1. The method 100 of FIG. 1 starts at block 110, where components are generated from an image. Here, a component may be, for example, a connected component (CC), which is a group of pixels having a similar color, grayscale, or stroke width.

Next, at block 120, various features are extracted from each component, and non-text components are filtered out based on those features, so that candidate text components are retained.

Next, at block 130, the retained candidate text components are grouped together to form text lines or words, and text regions are output as the bounding boxes of those text lines or words (a bounding box being the smallest polygon, such as a rectangle, that contains the text).

A general problem with the prior art is that it attempts to detect text using only features of the text region itself, such as edges, corners, strokes, color and texture. Context information around the text, which can be useful in most cases, is ignored. As a result, the prior art suffers from false detections caused by complex non-text regions and from missed detections caused by the wide variety of text in natural scenes. There is therefore a severe trade-off when attempting to detect text directly from text-region features alone.

Accordingly, there is a need for an improved method of text detection in images that makes use of background information around the text region.

A text detection method that uses the concept of context information is proposed in Yasuhiro Kunishige, Feng Yaokai and Seiichi Uchida, "Scenery character detection with environmental context", International Conference on Document Analysis and Recognition (ICDAR), pp. 1049-1053, 2011. Specifically, that document extracts context features from an expanded region of a target component, formed by adding a 10-pixel margin around the target component. The method further classifies each component into one of six scene component categories: "sky", "green", "signboard", "ground", "building" and "other". One problem with this method is that general categories such as "sky" and "green" are unsuitable for text detection, and "signboard" does not cover all kinds of text background regions, such as logos, tags, scrolls and posters. Another problem is that the context information is extracted from a margin region of fixed size, which cannot adapt to variations in the scene. Yet another problem is that the method cannot capture spatial relationships among CCs, for example which CCs belong to a given signboard, even though such information is useful for grouping CCs.

Accordingly, at least one of the problems described above needs to be addressed.

The present inventors noticed that most text in natural scenes is printed on a relatively uniform background region that has high contrast with the text, so that the text is easily recognized. This observation can be useful for text detection.

Accordingly, the present invention proposes a novel method and apparatus for text detection to improve the performance of text detection in images, especially natural scene images. To define the common background region that surrounds text, a new concept, the text background region (TBR), is introduced. In natural scene images, a TBR usually appears as a signboard, logo, tag, scroll, poster or the like, but is not limited to these forms. Rather than finding text regions directly, the invention first finds TBRs and then searches for text by classifying components as components inside a TBR or components outside all TBRs, that is, in the outer region (OR). Text is assumed to be more likely to appear inside a TBR than outside one. In addition, a single text line or word rarely crosses two regions (two TBRs, or a TBR and the OR). Based on these assumptions, TBR information can be used in connected component filtering and/or connected component grouping.

According to a first aspect of the present invention, there is provided a method for detecting a text region in an image that includes at least one connected component (CC). The method comprises: a TBR detection step of detecting at least one text background region (TBR) from the image; a CC filtering step of filtering the at least one CC to retain at least one candidate text CC; and a CC grouping step of grouping the at least one candidate text CC based on the TBR detected in the TBR detection step to form at least one CC group, and generating at least one text region based on the at least one CC group.

According to a second aspect of the present invention, there is provided a text detection apparatus for detecting a text region in an image that includes at least one connected component (CC). The apparatus comprises: a TBR detection unit configured to detect a text background region (TBR) from the image; a CC filtering unit configured to filter the at least one CC to retain at least one candidate text CC; and a CC grouping unit configured to group the at least one candidate text CC based on the TBR detected by the TBR detection unit to form at least one CC group, and to generate at least one text region based on the at least one CC group.

According to a third aspect of the present invention, there is provided a text information extraction method. The method comprises: detecting a text region from an input image or input video using the text detection method according to the first aspect of the present invention; extracting text from the detected text region; and recognizing the extracted text to obtain text information.

According to a fourth aspect of the present invention, there is provided a text information extraction system. The system comprises: the text detection apparatus according to the second aspect of the present invention, configured to detect a text region from an input image or input video; an extraction device configured to extract text from the detected text region; and a recognition device configured to recognize the extracted text to obtain text information.

By using these features, the method, apparatus and system according to the present invention can locate text in an image quickly and/or with high accuracy, thereby improving text detection performance.

Further features and advantages of the present invention will become apparent from the following description with reference to the drawings.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a flowchart showing a prior art method for detecting a text region in an image.
FIG. 2 is a block diagram showing an exemplary hardware configuration of a computer system that can implement embodiments of the present invention.
FIG. 3 is a flowchart showing a text detection method for detecting a text region in an image according to an embodiment of the present invention.
FIG. 4 is a diagram showing an exemplary process for determining whether a CC is a TBR based on the relationship between the CC and other TBRs according to an embodiment of the present invention.
FIG. 5A is a flowchart showing the CC filtering step of FIG. 3 according to an embodiment of the present invention.
FIG. 5B is a flowchart showing the CC filtering step of FIG. 3 according to another embodiment of the present invention.
FIG. 5C is a diagram showing exemplary processing in the CC filtering step of FIG. 3 that uses a trained classifier according to an embodiment of the present invention.
FIG. 6 is an exemplary flowchart showing how the CC grouping step of FIG. 3 is performed.
FIG. 7 is an exemplary flowchart showing how the grouping step of FIG. 6 is performed.
FIGS. 8A to 8F are exemplary images showing the respective processing results obtained with the text detection method according to the present invention.
FIG. 9 is a block diagram showing a text detection apparatus for detecting a text region in an image according to an embodiment of the present invention.
FIG. 10A is a block diagram showing the CC filtering unit of FIG. 9 according to an embodiment of the present invention.
FIG. 10B is a block diagram showing the CC filtering unit of FIG. 9 according to another embodiment of the present invention.
FIG. 11A is a block diagram showing the CC grouping unit of FIG. 9 according to an embodiment of the present invention.
FIG. 11B is a block diagram showing the grouping unit of FIG. 11A according to an embodiment of the present invention.
FIG. 12 is a flowchart showing a text information extraction method according to an embodiment of the present invention.
FIG. 13 is a block diagram showing a text information extraction system according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

Note that similar reference numerals and letters denote similar items in the figures; therefore, once an item has been defined in one figure, it need not be described again for subsequent figures.

In the present invention, terms such as "first" and "second" are used only to distinguish elements or steps and are not intended to indicate temporal order, presence or importance.

FIG. 2 is a block diagram showing the hardware configuration of a computer system 1000 that can implement embodiments of the present invention.

As shown in FIG. 2, the computer system includes a computer 1110. The computer 1110 may be, for example, a digital camera or a smartphone. The computer 1110 includes a processing unit 1120, a system memory 1130, a non-removable non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190, and an output peripheral interface 1195, which are connected via a system bus 1121.

The system memory 1130 includes a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136, and some program data 1137 reside in the RAM 1132.

A non-removable non-volatile memory 1141, such as a hard disk, is connected to the non-removable non-volatile memory interface 1140. The non-removable non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146, and some program data 1147.

One or more removable non-volatile memory drives, such as a flash drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a flash memory 1152 such as an SD card can be inserted into the flash drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155. The image to be processed can be stored in the non-volatile memory.

Input devices such as a microphone 1161 and a keyboard 1162 are connected to the user input interface 1160.

The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.

The remote computer 1180 may include a memory 1181, such as a hard disk, that stores remote application programs 1185.

The video interface 1190 is connected to a monitor 1191, which may be used to display one or more processing results according to embodiments of the present invention.

The output peripheral interface 1195 is connected to a printer 1196 and a speaker 1197.

The computer system shown in FIG. 2 is merely illustrative and is in no way intended to limit the invention, its applications, or its uses.

The computer system shown in FIG. 2 may be implemented for any of the embodiments as a stand-alone computer or as the processing system of a device, possibly with one or more unnecessary components removed or one or more additional components added.

FIG. 3 is a flowchart showing a text detection method 300 for detecting a text region in an image according to an embodiment of the present invention. The description also refers to FIGS. 8A to 8F, which show processing results obtained with the text detection method according to the present invention.

According to one embodiment, the image includes at least one connected component (CC). A CC is a cluster of pixels having similar color or grayscale values. The pixels in one cluster can be spatially connected through 4-neighborhood or 8-neighborhood connectivity. CCs can be generated from the image before the TBR detection step, for example by color clustering, adaptive binarization, morphological processing or the like. In one embodiment, CCs can be generated from a grayscale image based on the MSER method described in "Robust wide baseline stereo from maximally stable extremal regions", J. Matas, O. Chum, M. Urban and T. Pajdla, Proc. of British Machine Vision Conference, pp. 384-396, 2002, the contents of which are incorporated herein by reference. In one embodiment, in order to detect both bright text on a dark background and dark text on a bright background, CC generation can be applied to two channels, one for the original image and the other for the inverted image. However, this is not necessarily required.
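As an illustration only, the following minimal sketch labels CCs in an already binarized image with 4- or 8-neighborhood connectivity and repeats the labeling on the inverted image to obtain the second channel. It is not the MSER method referred to above; the function names `label_components`, `bounding_box` and `two_channel_components`, and the representation of the image as a list of rows of 0/1 values, are assumptions made for this example.

```python
from collections import deque


def label_components(binary, connectivity=4):
    """Return a list of CCs; each CC is a list of (row, col) foreground pixels."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr, dc in offsets:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Smallest axis-aligned rectangle (top, left, bottom, right) containing the CC."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def two_channel_components(binary, connectivity=4):
    """Label the original image and its inverse, one channel each."""
    inverted = [[1 - v for v in row] for row in binary]
    return (label_components(binary, connectivity),
            label_components(inverted, connectivity))
```

With 4-neighborhood connectivity a diagonal stroke splits into separate CCs, whereas 8-neighborhood connectivity keeps it as one, which is why the choice of neighborhood matters for thin strokes.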

As an example, FIGS. 8A and 8B show a grayscale image and the CCs generated in that image, respectively. In FIG. 8B, each black-line box indicates a CC. That is, each black-line box is the bounding box of a CC (the smallest rectangle containing the CC).

At block 310, a text background region (TBR) detection step is performed to detect at least one TBR from the input image.

According to one embodiment, the TBR detection step can be performed based on the CCs included in the image. A TBR may be a CC that has special characteristics. According to one embodiment, a TBR may be a region surrounding text in the image that has a regular boundary and a uniform color or grayscale. In natural scene images, a TBR usually appears as a signboard, logo, tag, scroll, poster or the like, but is not limited to these forms.

According to one embodiment, TBRs can be selected from the CCs included in the image. Each CC can be checked to determine whether it is a TBR. Three aspects can be considered in this determination: the features of the CC, the statistical features of the member CCs in the CC, and the relationship between the CC and other TBRs. These aspects can be used individually or in any combination for TBR detection. Here, a member CC of the current CC is a CC that is located within the boundary of the current CC and has high contrast with the current CC. Member CCs can be extracted from the channel opposite to that of the current CC. For example, a member CC of a dark CC may be a bright CC within the region of the dark CC, and a member CC of a bright CC may be a dark CC within the region of the bright CC.

FIG. 8C shows an image in which two TBRs have been detected, indicated by white-line boxes.

[Features of the CC]
A CC can be checked based on its features to determine whether it is a TBR. The features of a CC may include, for example, at least one of: the uniformity of the CC's color or grayscale, the size of the CC, the shape of the CC, the regularity of the CC's boundary, the position of the CC in the image, the average grayscale value of the CC, and the grayscale value distribution of the CC.

Note that a TBR is usually relatively large. Therefore, according to one embodiment, all CCs can be sorted by size, and TBRs can be selected from the n largest CCs.

Note also that a TBR is usually located at a prominent position in the image rather than in its margin region. Therefore, according to another embodiment, CCs located in the margin region can be excluded as non-TBR regions. For example, the margin region can be defined as an outer ring of the image having a specified width, such as 1/m of the image width or 1/m of the image height.
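The size ranking and margin exclusion described above can be sketched as follows. This is a minimal sketch under stated assumptions: each CC is a dictionary with hypothetical keys `name`, `box` (top, left, bottom, right) and `area`; a CC counts as "located in the margin region" when its bounding box never reaches the inner rectangle of the image; and the default values of `n` and `m` are placeholders, since the text does not specify them.

```python
def in_margin(box, image_w, image_h, m=10):
    """True if the box lies entirely in the outer ring of the image.

    The ring is 1/m of the image width wide at the left and right edges and
    1/m of the image height wide at the top and bottom edges.
    """
    top, left, bottom, right = box
    margin_x = image_w / m
    margin_y = image_h / m
    reaches_inner = (right >= margin_x and left < image_w - margin_x
                     and bottom >= margin_y and top < image_h - margin_y)
    return not reaches_inner


def tbr_candidates(ccs, image_w, image_h, n=5, m=10):
    """Keep the n largest CCs, then drop those located in the margin region."""
    largest = sorted(ccs, key=lambda cc: cc["area"], reverse=True)[:n]
    return [cc for cc in largest
            if not in_margin(cc["box"], image_w, image_h, m)]
```

Ranking before the margin test keeps the candidate list short, which may matter when an image contains thousands of small CCs.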

According to yet another embodiment, because a TBR usually has a regular boundary, the regularity of a CC's boundary can be considered when determining whether it is a TBR. Boundary regularity can be measured by the density of the CC (the proportion of its bounding box occupied by the CC), the border ratio (the ratio of the number of boundary pixels to the number of CC pixels) and the symmetry of the boundary (the similarity of the boundary across the four quadrants, which can be evaluated from the density differences among the four quadrants).
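The three regularity measures can be sketched as below. This is an illustrative reading rather than the patented computation: a boundary pixel is assumed to be a CC pixel with at least one 4-neighbor outside the CC, the quadrants are assumed to be the four quarters of the bounding box (skipping the middle row or column of an odd-sized box), and symmetry is summarized as the gap between the densest and sparsest quadrant. The name `regularity_features` is hypothetical.

```python
def regularity_features(pixels):
    """Compute density, border ratio and quadrant symmetry gap for one CC.

    `pixels` is an iterable of (row, col) coordinates belonging to the CC.
    """
    pixel_set = set(pixels)
    rows = [r for r, _ in pixel_set]
    cols = [c for _, c in pixel_set]
    top, left = min(rows), min(cols)
    height = max(rows) - top + 1
    width = max(cols) - left + 1

    # Density: share of the bounding box occupied by CC pixels.
    density = len(pixel_set) / (height * width)

    # Border ratio: boundary pixels divided by all CC pixels.
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    boundary = sum(
        1 for r, c in pixel_set
        if any((r + dr, c + dc) not in pixel_set for dr, dc in neighbours)
    )
    border_ratio = boundary / len(pixel_set)

    # Symmetry: compare CC density in the four quarters of the bounding box.
    half_h, half_w = height // 2, width // 2
    if half_h == 0 or half_w == 0:
        symmetry_gap = 0.0  # too thin to split into quadrants
    else:
        counts = [0, 0, 0, 0]
        for r, c in pixel_set:
            dr, dc = r - top, c - left
            if 2 * dr + 1 == height or 2 * dc + 1 == width:
                continue  # middle row/column of an odd-sized box
            quadrant = ((2 if 2 * dr + 1 > height else 0)
                        + (1 if 2 * dc + 1 > width else 0))
            counts[quadrant] += 1
        densities = [n / (half_h * half_w) for n in counts]
        symmetry_gap = max(densities) - min(densities)

    return {"density": density,
            "border_ratio": border_ratio,
            "symmetry_gap": symmetry_gap}
```

Under these assumptions a filled rectangle has density 1.0 and a symmetry gap of 0.0, while an irregular blob scores lower density and a larger gap.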

According to yet another embodiment, because a TBR should not be a region produced by flash, some features are used to distinguish TBRs from flash. Note that a flash region usually has a high average grayscale value, with grayscale values at its center that are much higher than that average. Therefore, the average grayscale value and the grayscale value distribution of a CC can be used to distinguish a TBR from flash.
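One possible way to apply this test is sketched below. It is only a sketch: the function name `looks_like_flash`, the use of the central third of the bounding box as "the center", and both threshold values are assumptions, not values given in the text.

```python
def looks_like_flash(gray, mean_threshold=180.0, centre_gap=30.0):
    """Heuristic flash test for one CC.

    `gray` maps (row, col) -> grayscale value (0..255) for the CC's pixels.
    The CC is flagged when its mean is high and its centre is much brighter
    than that mean. Both thresholds are hypothetical.
    """
    rows = [r for r, _ in gray]
    cols = [c for _, c in gray]
    top, left = min(rows), min(cols)
    height = max(rows) - top + 1
    width = max(cols) - left + 1

    mean = sum(gray.values()) / len(gray)

    # Pixels whose centres fall in the middle third of the bounding box.
    centre = [
        v for (r, c), v in gray.items()
        if height / 3 <= r - top + 0.5 <= 2 * height / 3
        and width / 3 <= c - left + 0.5 <= 2 * width / 3
    ]
    if not centre:
        return False
    centre_mean = sum(centre) / len(centre)
    return mean >= mean_threshold and centre_mean - mean >= centre_gap
```

A uniformly bright signboard passes the mean test but not the center test, so it would not be flagged under these assumptions.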

[Statistics of member CCs in the CC]
A CC can be checked based on the statistics of its member CCs to determine whether it is a TBR. Here, a member CC of the current CC is a CC that is located within the boundary of the current CC and has high contrast with the current CC. The region of a member CC lies entirely within the boundary of the current CC. Member CCs can be extracted from the channel opposite to that of the current CC. For example, a member CC of a dark CC may be a bright CC within the region of the dark CC, and a member CC of a bright CC may be a dark CC within the region of the bright CC.

The statistics of the member CCs may include, for example, at least one of: the number of member CCs in the CC, the number of seed CCs among the member CCs, the average text confidence of the member CCs in the CC, and the ratio of the total area of the member CCs in the CC to the area of the CC.

For illustration, the number of member CCs in a CC is preferably greater than a threshold.

The number of seed CCs among the member CCs is preferably greater than a threshold. Here, a seed CC is a CC that is very likely to be a text component. For example, a seed CC may be a CC whose text confidence is higher than a predefined threshold. To select seed CCs, a set of features can be extracted from each CC in order to calculate its text confidence.
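The member CC statistics and the two threshold checks can be sketched as follows. The `MemberCC` record, the function names and every threshold value are hypothetical; the text states only that the counts should exceed some threshold and that a seed CC has a text confidence above a predefined threshold.

```python
from dataclasses import dataclass


@dataclass
class MemberCC:
    area: int               # number of pixels in the member CC
    text_confidence: float  # assumed to be in the range 0..1


def member_statistics(cc_area, members, seed_threshold=0.8):
    """Collect the four member-CC statistics for one candidate TBR."""
    count = len(members)
    seeds = sum(1 for m in members if m.text_confidence > seed_threshold)
    average = (sum(m.text_confidence for m in members) / count) if count else 0.0
    area_ratio = sum(m.area for m in members) / cc_area
    return {"member_count": count,
            "seed_count": seeds,
            "average_confidence": average,
            "area_ratio": area_ratio}


def passes_member_checks(stats, min_members=3, min_seeds=1):
    """Both counts must be strictly greater than their (hypothetical) thresholds."""
    return (stats["member_count"] > min_members
            and stats["seed_count"] > min_seeds)
```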

As an example, FIG. 8D shows several seed CCs in white-line boxes.

Features commonly used to select seed CCs may include the size of the CC, the width-to-height ratio of the CC, the density of the CC (that is, the proportion of the bounding box occupied by CC pixels), statistical features of the CC's stroke width, and texture features extracted from the region of the CC. In one embodiment, these features can be used as rules for adjusting the text confidence. In another embodiment, a text classifier can be trained on a training set that includes both text CCs and non-text CCs. The classifier takes the features of a CC as input and outputs a text confidence value for the CC.

[Relationship between the CC and other TBRs]
A CC can be checked based on its relationship with other TBRs to determine whether it is a TBR. According to one embodiment, selecting a TBR from the CCs based on the relationship between at least one CC and other TBRs may include determining a CC to be a TBR in response to the CC not being a member CC of any previously determined TBR and not having a member CC in common with any previously determined TBR. In other words, TBRs usually neither overlap nor contain one another.

As an example, FIG. 4 shows a process 400 for determining whether a CC is a TBR based on the relationship between the CC and other TBRs.

At block 410, it is determined whether the current CC is a member CC of a previously determined TBR. If it is, the current CC is identified as not being a TBR. If it is not, the process 400 proceeds to block 420, where it is determined whether the current CC has a member CC in common with a previously determined TBR. If the current CC has no member CC in common with a previously determined TBR, the current CC is identified as a TBR. If it does have a member CC in common, the process 400 proceeds to block 430, where it is determined which of the current CC and the previously determined TBR is more like a TBR. If the current CC is more like a TBR, the previously determined TBR is removed from the set of TBRs (block 440) and the current CC is identified as a TBR. If the previously determined TBR is more like a TBR, the current CC is identified as not being a TBR.
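The decision flow of blocks 410 to 440 can be sketched as below. The names `update_tbrs`, `members_of` and `tbr_score` are hypothetical, and the score function stands in for whichever "more like a TBR" criterion is chosen at block 430. The text describes a single previously determined TBR; requiring the current CC to beat every conflicting TBR when there are several is an assumption of this sketch.

```python
def update_tbrs(candidate, tbrs, members_of, tbr_score):
    """Decide whether `candidate` joins the TBR set; return True if it does.

    candidate  : identifier of the current CC
    tbrs       : list of previously determined TBR identifiers (updated in place)
    members_of : dict mapping a CC identifier to the set of its member CCs
    tbr_score  : callable returning a TBR-likeness score (higher is better)
    """
    # Block 410: a member CC of an existing TBR cannot itself be a TBR.
    if any(candidate in members_of[t] for t in tbrs):
        return False

    # Block 420: look for existing TBRs sharing at least one member CC.
    rivals = [t for t in tbrs if members_of[candidate] & members_of[t]]
    if not rivals:
        tbrs.append(candidate)
        return True

    # Block 430: compare TBR-likeness with every conflicting TBR.
    if all(tbr_score(candidate) > tbr_score(t) for t in rivals):
        # Block 440: remove the weaker TBRs and accept the candidate.
        for t in rivals:
            tbrs.remove(t)
        tbrs.append(candidate)
        return True
    return False
```

For instance, if CC "A" is already a TBR and CC "B" shares a member CC with it but scores higher, calling `update_tbrs("B", ...)` replaces "A" with "B" in the list.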

There may be various ways of determining, at block 430, which is more like a TBR. For example, one or more of the criteria described above can be used, such as the regularity of the boundary, the density of the CC, and the average text confidence of the member CCs.

Referring again to FIG. 3, once TBRs have been detected from the image, the process 300 proceeds to block 320. At block 320, a CC filtering step is performed to filter the at least one CC and retain at least one candidate text CC.

特に及び好ましくは、ＣＣからＴＢＲを選択した後、残りのＣＣは、候補となるテキストＣＣ及び非テキストＣＣを含む非ＴＢＲＣＣである。このステップの後、候補となるテキストＣＣが確保されるが、非テキストＣＣは除去される。 In particular and preferably, after selecting a TBR from a CC, the remaining CCs are non-TBR CCs including candidate text CCs and non-text CCs. After this step, candidate text CCs are reserved, but non-text CCs are removed.

一実施形態によると、ＴＢＲ情報は、ＣＣをフィルタリングするために使用される。ＴＢＲの境界内のＣＣ及びあらゆるＴＢＲの境界内にないＣＣの双方は、同一の規則に基づいてフィルタリングされることが好ましい。例えば、ＴＢＲの境界内のＣＣ及びあらゆるＴＢＲの境界内にないＣＣは、ＣＣのサイズ、ＣＣの形状、ＣＣのバウンディングボックスのアスペクト比、ＣＣとそのバウンディングボックスとの面積比、周長とＣＣとの面積の比及びＣＣのテクスチャ特徴のうちの少なくとも１つに基づいてフィルタリングされる。 According to one embodiment, TBR information is used to filter the CCs. Preferably, CCs within the boundary of a TBR and CCs not within the boundary of any TBR are both filtered according to the same rules. For example, both kinds of CC are filtered based on at least one of: the size of the CC, the shape of the CC, the aspect ratio of the CC's bounding box, the ratio of the CC's area to its bounding-box area, the ratio of the CC's perimeter to its area, and texture features of the CC.

好ましくは、ＣＣをフィルタリングする場合、ＣＣのサイズは、経験値に基づいて所定の範囲にあるものとして選択されうる。テキストＣＣのアスペクト比が、通常、高すぎないため、ＣＣのバウンディングボックスのアスペクト比は、所定の閾値より低いものとして選択されうる。ＣＣとそのバウンディングボックスとの面積比は、低すぎるべきではなく、経験値に従って所定の閾値より高いものとして選択されうる。ＣＣの面積比に対する周長は、所定の閾値より短くなりうる。ノイズＣＣの面積比に対する周長が、通常、相対的に長いため、これはノイズＣＣを除去するためである。ＣＣから抽出されたウェーブレット、Ｇａｂｏｒ、ＬＢＰ等のテクスチャ特徴は、テキスト信頼度を算出するために使用されうるため、ＣＣをフィルタリングする際に組み込まれうる。 Preferably, when the CCs are filtered, the size of a CC may be required to fall within a predetermined, empirically chosen range. Because the aspect ratio of a text CC is usually not too high, the aspect ratio of the CC's bounding box may be required to be below a predetermined threshold. The ratio of the CC's area to its bounding-box area should not be too low and may be required to exceed a predetermined, empirically chosen threshold. The perimeter-to-area ratio of the CC may be required to be below a predetermined threshold; this removes noise CCs, whose perimeter-to-area ratio is usually relatively high. Texture features extracted from the CC, such as wavelet, Gabor, or LBP features, can be used to calculate a text reliability and can therefore also be incorporated into the filtering.

尚、テキストは、ＴＢＲの外よりＴＢＲ内に現れる可能性がより高い。従って、別の実施形態によると、ＴＢＲ情報は、フィルタリングの効率及び精度を更に向上させるために、ＣＣをフィルタリングする際に使用されうる。 Note that text is more likely to appear in the TBR than outside the TBR. Thus, according to another embodiment, TBR information can be used when filtering CCs to further improve the efficiency and accuracy of filtering.

特に、例えばＣＣフィルタリングステップにおいて、あらゆるＴＢＲの境界内にないＣＣに対するフィルタリングは、ＴＢＲの境界内のＣＣに対するフィルタリングより厳しくてよい。別の例の場合、あらゆるＴＢＲの境界内にないＣＣは、ＴＢＲの境界内のＣＣより多くの規則によりフィルタリングされうる。 In particular, the filtering for CCs that are not within the boundaries of any TBR, for example in the CC filtering step, may be stricter than the filtering for CCs within the boundaries of the TBR. In another example, CCs that are not within the boundaries of any TBR may be filtered by more rules than CCs that are within the boundaries of the TBR.

これは、あらゆるＴＢＲの境界内にないものとして判定されるＣＣが算出された相対的に低いテキスト信頼度を有する結果、ノイズを被りやすいためである。従って、微フィルタリングは、非テキストＣＣを除去するために実行されうる。 This is because a CC determined not to be within the boundary of any TBR has a relatively low calculated text reliability and is therefore prone to noise. Fine filtering can therefore be performed to remove non-text CCs.

あらゆるＴＢＲの境界内にないＣＣは、ストローク幅の統計及び／又はＣＣの境界線画素数とＣＣの画素数との比に更に基づいてフィルタリングされうることが好ましい。例えばストローク幅の統計は、ストローク幅の分散と平均ストローク幅との比を含みうる。 Preferably, CCs that are not within the boundaries of any TBR can be filtered further based on stroke width statistics and / or the ratio of CC border pixels to CC pixels. For example, the stroke width statistics may include a ratio of stroke width variance to average stroke width.
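As a rough illustration of rule-based filtering that is stricter outside TBRs, the sketch below applies a set of common geometric rules to every CC and one additional stroke-width rule only to CCs that lie outside every TBR. The `CCFeatures` type and all numeric thresholds are hypothetical placeholders; the description only says that such thresholds are chosen empirically.

```python
from dataclasses import dataclass


@dataclass
class CCFeatures:
    """Per-CC measurements (hypothetical container, all values positive)."""
    area: int            # number of pixels in the CC
    bbox_w: int          # bounding-box width
    bbox_h: int          # bounding-box height
    perimeter: float     # boundary length in pixels
    stroke_mean: float   # mean stroke width
    stroke_var: float    # variance of the stroke width
    inside_tbr: bool     # True if the CC lies within some TBR boundary


def passes_common_rules(f):
    """Rules applied to every CC; the thresholds are illustrative only."""
    aspect = max(f.bbox_w, f.bbox_h) / min(f.bbox_w, f.bbox_h)
    density = f.area / (f.bbox_w * f.bbox_h)
    return (
        20 <= f.area <= 50_000            # size within an empirical range
        and aspect < 10.0                 # aspect ratio not too high
        and density > 0.1                 # CC fills enough of its bounding box
        and f.perimeter / f.area < 1.0    # noise tends to have a long perimeter
    )


def passes_extra_rules(f):
    """Additional rule for CCs outside every TBR (illustrative threshold)."""
    return f.stroke_var / f.stroke_mean < 1.5


def keep_cc(f):
    """Return True if the CC is kept as a candidate text CC."""
    if not passes_common_rules(f):
        return False
    # CCs inside a TBR stop here; CCs outside must also pass the extra rule.
    return f.inside_tbr or passes_extra_rules(f)
```

The ratio of border pixels to CC pixels mentioned above could be added to `passes_extra_rules` in the same way.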

図８Ｄは、フィルタリングの結果を示す。確保されたテキスト候補ＣＣは、黒線又は白線で描画されたバウンディングボックスにより示される。ここで、白線のバウンディングボックスはシードＣＣを示し、黒線のバウンディングボックスは非シードＣＣを示す。図８Ｂと比較して、ＣＣの一部、特にＴＢＲの外側に配置されたＣＣは除去される。 FIG. 8D shows the result of filtering. The reserved text candidate CC is indicated by a bounding box drawn with a black line or a white line. Here, a white line bounding box indicates a seed CC, and a black line bounding box indicates a non-seed CC. Compared to FIG. 8B, a part of the CC, particularly the CC arranged outside the TBR, is removed.

ＣＣのフィルタリングを実行する例示的な方法を示すために、以下において２つの実施形態を説明する。 In order to illustrate an exemplary method for performing CC filtering, two embodiments are described below.

図５Ａは、本発明の一実施形態に係る図３のＣＣフィルタリングステップを示すフローチャートである。 FIG. 5A is a flowchart illustrating the CC filtering step of FIG. 3 according to an embodiment of the present invention.

図５Ａにおいて、非ＴＢＲＣＣの各々に対するテキスト信頼度は、ＣＣがあらゆるＴＢＲの境界内に配置されるかに基づいて算出される。 In FIG. 5A, the text reliability of each non-TBR CC is calculated based on whether the CC is located within the boundary of any TBR.

ブロック５１０において、各非ＴＢＲＣＣのテキスト信頼度は、ＴＢＲ情報に基づいて算出される。計算において、ＣＣＩＲはＣＣＯＲより重視される。 In block 510, the text reliability of each non-TBR CC is calculated based on the TBR information. In this calculation, CCIRs (CCs within the boundary of a TBR) are weighted more heavily than CCORs (CCs not within the boundary of any TBR).

ブロック５２０において、ＣＣのテキスト信頼度が事前定義済みの閾値Ｔより高いかを判定する。テキスト信頼度が閾値より高い場合、ＣＣはテキスト候補ＣＣとして判定される。テキスト信頼度が閾値より低い場合、ＣＣは非テキストＣＣとして判定される。 At block 520, it is determined whether the text reliability of the CC is higher than a predefined threshold T. If the text reliability is higher than the threshold, the CC is determined as a text candidate CC. If the text reliability is lower than the threshold, the CC is determined as a non-text CC.

本実施形態の特定の一例は、以下の通り提供される。現在のＣＣのテキスト信頼度は、ベイズの定理により規定されうる。 A specific example of this embodiment is provided as follows. The text reliability of the current CC can be defined by Bayes' theorem.
Ｐ（Ａ｜Ｂ）＝Ｐ（Ｂ｜Ａ）Ｐ（Ａ）／Ｐ（Ｂ） P(A|B) = P(B|A) P(A) / P(B)
式中、Ｐ（Ａ）は事前確率であり、Ｐ（Ｂ｜Ａ）は条件付き確率であり、Ｐ（Ａ｜Ｂ）は事後確率である。 Here, P(A) is the prior probability, P(B|A) is the conditional probability, and P(A|B) is the posterior probability.

ＣＣのフィルタリングの特定の例を考慮すると、Ａは、ある特定のＣＣのラベル（テキスト又は非テキスト）を示す確率変数である。Ｐ（Ａ）は、現在のＣＣのテキスト存在の事前確率を表す。Ｐ（Ａ）はＴＢＲにより判定されうる。ＣＣＩＲは、ＣＣＯＲより高いＰ（Ａ）を与えられうる。 In the specific case of CC filtering, A is a random variable indicating the label (text or non-text) of a particular CC. P(A) is the prior probability that the current CC is text. P(A) can be determined from the TBRs: a CCIR can be given a higher P(A) than a CCOR.

Ｐ（Ｂ｜Ａ）は、テキスト存在の条件付き確率である。Ｐ（Ｂ｜Ａ）は、テキスト領域が何に見えるかを説明する。従って、値は、テキスト領域自体から抽出されたテキスト特徴に基づいて算出される。Ｐ（Ｂ）は、現在のＣＣの存在確率である。ＣＣが固定される場合、Ｐ（Ｂ）は一定値である。 P(B|A) is the conditional probability of text presence. It describes what a text region looks like, so its value is calculated from text features extracted from the text region itself. P(B) is the probability of occurrence of the current CC; for a fixed CC, P(B) is constant.

Ｐ（Ａ｜Ｂ）は、現在のＣＣのテキスト信頼度である。Ｐ（Ａ｜Ｂ）は、ＣＣ自体のテキスト特徴及びＣＣに関連したＴＢＲ情報の双方による影響を受ける。事前定義済みの閾値より高いＰ（Ａ｜Ｂ）値を有するＣＣは、テキスト候補ＣＣとして確保される。 P(A|B) is the text reliability of the current CC. It is influenced both by the text features of the CC itself and by the TBR information associated with the CC. A CC whose P(A|B) exceeds a predefined threshold is retained as a text candidate CC.

この例において、ＣＣＩＲに対するＰ（Ａ）／Ｐ（Ｂ）は１として設定可能であり、ＣＣＯＲに対するＰ（Ａ）／Ｐ（Ｂ）は、（０，１）の範囲の値として設定可能である。 In this example, P(A)/P(B) can be set to 1 for a CCIR and to a value in the range (0, 1) for a CCOR.
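The Bayesian example can be written out as a small sketch. It assumes that P(B|A) is already available as a number in [0, 1] (for instance from a texture-based scorer, which is not shown), and the function names, the default ratio 0.6 for CCORs, and the default threshold 0.5 are hypothetical values chosen only for illustration.

```python
def text_confidence(likelihood, inside_tbr, outside_ratio=0.6):
    """Return P(A|B) = P(B|A) * (P(A) / P(B)) for one CC.

    likelihood    -- P(B|A), assumed to come from the CC's own text features
    inside_tbr    -- True for a CCIR, False for a CCOR
    outside_ratio -- assumed value of P(A)/P(B) for a CCOR, must lie in (0, 1)
    """
    if not 0.0 < outside_ratio < 1.0:
        raise ValueError("outside_ratio must lie strictly between 0 and 1")
    # P(A)/P(B) is set to 1 for a CCIR and to a value in (0, 1) for a CCOR.
    ratio = 1.0 if inside_tbr else outside_ratio
    return likelihood * ratio


def is_text_candidate(likelihood, inside_tbr, threshold=0.5):
    """Block 520: keep the CC if its text confidence exceeds the threshold."""
    return text_confidence(likelihood, inside_tbr) > threshold
```

With these illustrative defaults, a CC whose own features give P(B|A) = 0.7 is kept when it lies inside a TBR but rejected when it lies outside every TBR, which is the intended effect of weighting CCIRs more heavily.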

図５Ｂは、本発明の別の実施形態に係るＣＣのフィルタリングを示すフローチャートである。 FIG. 5B is a flowchart illustrating CC filtering according to another embodiment of the present invention.

図５Ｂにおいて、フィルタリングは、ＴＢＲ情報に基づいていくつかの段階、例えば２つの段階で非ＴＢＲＣＣに対して実行される。例えば２つの段階は、粗フィルタリング及び微細フィルタリングを含む。全ての非ＴＢＲＣＣは粗フィルタリングにかけられるが、あらゆるＴＢＲの境界外のＣＣのみが微細フィルタリングにかけられてもよい。単純な特徴は粗フィルタリングの際に使用可能であり、より複雑な特徴は微細フィルタリングの際に使用可能である。従って、あらゆるＴＢＲの境界内にないＣＣに対するフィルタリングは、ＴＢＲの境界内のＣＣに対するフィルタリングより厳しく行われる。従って、微細フィルタリングにかけられるＣＣの量は減少し、方法の効率は向上する。 In FIG. 5B, filtering is performed on non-TBR CCs in several stages, eg, two stages, based on TBR information. For example, the two stages include coarse filtering and fine filtering. All non-TBR CCs are subject to coarse filtering, but only CCs outside any TBR boundary may be subject to fine filtering. Simple features can be used during coarse filtering, and more complex features can be used during fine filtering. Therefore, filtering for CCs that are not within the boundaries of any TBR is more severe than filtering for CCs that are within the boundaries of the TBR. Thus, the amount of CC subjected to fine filtering is reduced and the efficiency of the method is improved.

ブロック５３０において、非ＴＢＲＣＣは、ＣＣＩＲ、すなわちあらゆるＴＢＲの境界内のＣＣと、ＣＣＯＲ、すなわちあらゆるＴＢＲの境界内にないＣＣとの２つのグループに分離される。 At block 530, the non-TBR CCs are separated into two groups: CCIRs, ie CCs within any TBR boundary, and CCORs, CCs not within any TBR boundary.

ブロック５４０において、粗フィルタリング等の第１のフィルタリングステップは、全ての非ＴＢＲＣＣに対して実行される。特に、非ＴＢＲＣＣの各々は、候補となるテキストＣＣ又は非テキストＣＣとして判定される。 At block 540, a first filtering step, such as coarse filtering, is performed for all non-TBR CCs. In particular, each non-TBR CC is determined as a candidate text CC or non-text CC.

第１のフィルタリングステップは、ＣＣが候補となるテキストＣＣであるかを判定するために、非ＴＢＲＣＣの各々の１つ以上の第１の特徴に基づいて実行されうる。第１の特徴は、ＣＣから抽出され、ＣＣのサイズ、ＣＣの形状、ＣＣのバウンディングボックスのアスペクト比、ＣＣの密度（ＣＣとそのバウンディングボックスの面積比）、ＣＣの面積比に対する周長及びＣＣのテクスチャ特徴を含むがそれらに限定されない相対的に単純な特徴であってよい。一例として、テクスチャ特徴は、ローカルバイナリパターン、エッジ方向ヒストグラム及び勾配のヒストグラムを含みうるが、それらに限定されない。 The first filtering step may be performed based on one or more first features of each non-TBR CC to determine whether the CC is a candidate text CC. The first features are extracted from the CC and may be relatively simple features, including but not limited to the size of the CC, the shape of the CC, the aspect ratio of the CC's bounding box, the density of the CC (the ratio of the CC's area to its bounding-box area), the perimeter-to-area ratio of the CC, and texture features of the CC. As an example, the texture features may include, but are not limited to, local binary patterns, edge direction histograms, and histograms of gradients.

第１の特徴は、カスケード規則として使用されうるか、あるいは訓練分類器に入力される特徴ベクトルとして組み合わされうる。カスケード規則の閾値又は分類器は、テキストサンプル及び非テキストサンプルの双方から習得されうる。カスケード規則がフィルタリングの際に使用される場合、各入力ＣＣは事前定義済みの規則によりチェック可能であり、規則の少なくとも１つを満たさないＣＣは除去される。 The first features can be used as cascade rules or combined into a feature vector that is input to a trained classifier. The thresholds of the cascade rules, or the classifier, can be learned from both text samples and non-text samples. When cascade rules are used for filtering, each input CC is checked against the predefined rules, and a CC that fails at least one of the rules is removed.
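One simple way to learn a cascade-rule threshold from text and non-text samples is an exhaustive search over the observed feature values, as sketched below. The description does not say how the thresholds are learned, so this search, the "keep if feature < bound" rule form, and the names `learn_threshold` and `passes_cascade` are all assumptions made for illustration.

```python
def learn_threshold(text_values, non_text_values):
    """Pick the upper bound t that best separates the two sample sets.

    The rule being trained is "keep the CC if feature < t". Each observed
    value is tried as t, and the one that classifies the most training
    samples correctly is returned (the smallest such t on ties).
    """
    candidates = sorted(set(text_values) | set(non_text_values))
    # Also try a bound above every sample, so "keep everything" is an option.
    candidates.append(candidates[-1] + 1.0)
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum(v < t for v in text_values)
        correct += sum(v >= t for v in non_text_values)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def passes_cascade(features, upper_bounds):
    """A CC is removed as soon as it fails one rule, so every rule must hold."""
    return all(features[name] < bound for name, bound in upper_bounds.items())
```

For example, `learn_threshold` could be run once per feature (aspect ratio, perimeter-to-area ratio, and so on) and the resulting bounds collected into the `upper_bounds` dictionary used by `passes_cascade`.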

ブロック５５０において、微細フィルタリング等の第２のフィルタリングステップは、候補となるテキストＣＣＯＲが候補となるテキストＣＣであるかを更に判定するために、候補となるテキストＣＣＯＲ、すなわち第１のフィルタリングステップにより候補となるテキストＣＣとして判定されるＣＣＯＲの各々に対して実行される。ＣＣＯＲが、ブロック５４０において候補となるテキストＣＣとして判定されたとしてもノイズを被りやすいため、第２のフィルタリングは非テキストＣＣを更に除去してよい。 In block 550, a second filtering step, such as fine filtering, is performed on each candidate text CCOR, that is, each CCOR determined to be a candidate text CC by the first filtering step, to further determine whether it is a candidate text CC. Because a CCOR is prone to noise even if it was determined to be a candidate text CC in block 540, the second filtering may remove further non-text CCs.

第２のフィルタリングステップにおいて、ブロック５４０において使用された特徴に対してより厳しい条件が採用されてよく、且つ／あるいはフィルタリングのためにいくつかの他の特徴が使用されてよい。他の特徴は、ストローク幅の統計（例えば、ストローク幅の分散と平均ストローク幅との比）及び／又は境界線画素数とＣＣ画素数との比を含みうる。 In the second filtering step, stricter conditions may be applied to the features used in block 540, and/or some other features may be used for filtering. The other features may include stroke width statistics (for example, the ratio of the stroke width variance to the mean stroke width) and/or the ratio of the number of border pixels to the number of pixels in the CC.

第１のフィルタリングステップと同様に、第２のフィルタリングに対する特徴は、カスケード規則として使用されうるか、あるいは訓練分類器に入力される特徴ベクトルとして組み合わされうる。カスケード規則の閾値又は分類器は、テキストサンプル及び非テキストサンプルの双方から習得されうる。カスケード規則がフィルタリングの際に使用される場合、各入力ＣＣは事前定義済みの規則によりチェック可能であり、規則の少なくとも１つを満たさないＣＣは除去される。 As in the first filtering step, the features for the second filtering can be used as cascade rules or combined into a feature vector that is input to a trained classifier. The thresholds of the cascade rules, or the classifier, can be learned from both text samples and non-text samples. When cascade rules are used for filtering, each input CC is checked against the predefined rules, and a CC that fails at least one of the rules is removed.

図５Ｃは、訓練分類器を使用するＣＣのフィルタリングを示すフローチャートである。図５Ｃの方法は、第１のフィルタリングステップ及び第２のフィルタリングステップの双方に適用可能である。図５Ｃにおいて、テキストサンプル及び非テキストサンプルを含む訓練サンプルは、分類器を訓練するために使用される。ＣＣは、テキスト候補ＣＣを取得するために訓練分類器により分類される。訓練及び分類のために抽出された特徴は、第１のフィルタリング及び第２のフィルタリングに関連して上述した特徴である。本発明の主題を不必要に不明確にすることを回避するために、フローチャートに関する更なる詳細については説明しない。 FIG. 5C is a flowchart illustrating CC filtering using a trained classifier. The method of FIG. 5C is applicable to both the first filtering step and the second filtering step. In FIG. 5C, training samples including text samples and non-text samples are used to train a classifier. The CCs are then classified by the trained classifier to obtain text candidate CCs. The features extracted for training and classification are the features described above in connection with the first filtering and the second filtering. To avoid unnecessarily obscuring the subject matter of the present invention, further details of the flowchart are not described.

再度、図３を参照する。候補となるテキストＣＣを取得した後、処理３００はブロック３３０に進む。ブロック３３０において、ＣＣグループ化ステップは、ＴＢＲ検出ステップにおいて検出されたＴＢＲに基づいて少なくとも１つの候補となるテキストＣＣをグループ化して少なくとも１つのＣＣグループを形成し、且つ少なくとも１つのＣＣグループに基づいて少なくとも１つのテキスト領域を生成するために実行される。 FIG. 3 will be referred to again. After obtaining the candidate text CC, the process 300 proceeds to block 330. In block 330, the CC grouping step groups at least one candidate text CC based on the TBR detected in the TBR detection step to form at least one CC group, and based on the at least one CC group Executed to generate at least one text region.

ＣＣグループ化ステップ３３０を実行する例示的なフローチャートを図６に示す。図６に示されるように、ＣＣグループ化ステップ３３０は、ステップ６１０〜６３０を含みうる。 An exemplary flowchart for performing the CC grouping step 330 is shown in FIG. As shown in FIG. 6, the CC grouping step 330 may include steps 610-630.

ＣＣグループ化ステップに対する入力は、候補となるテキストＣＣである。 The input to the CC grouping step is a candidate text CC.

ステップ６１０において、候補となるテキストＣＣは、それぞれのテキスト背景領域に割り当てられる。ＴＢＲに割り当て不可能な候補となるテキストＣＣは、外側領域に割り当てられる。 In step 610, candidate text CCs are assigned to respective text background regions. The text CC that is a candidate that cannot be assigned to the TBR is assigned to the outer area.

ステップ６１０は、図５Ｂのブロック５３０に示されるようなステップに類似する。従って、ステップ６１０についての説明は省略する。尚、ステップ３１０でＴＢＲが検出されない場合、全てのＣＣは外側領域にある。候補となるテキストＣＣを割り当てた後、ステップ６２０に進む。 Step 610 is similar to the step as shown in block 530 of FIG. 5B. Therefore, the description about step 610 is omitted. If no TBR is detected in step 310, all CCs are in the outer region. After assigning the candidate text CC, the process proceeds to step 620.

ステップ６２０において、各ＴＢＲ及び外側領域のＣＣは、それぞれ、グループ化されてＣＣグループを形成する。 In step 620, each TBR and outer region CCs are grouped together to form a CC group.

このステップにおいて、１つの領域のＣＣは、空間関係及び外観の類似性に基づいてグループ化される。図６のグループ化ステップ６２０を実行する例示的なフローチャートを図７に示す。 In this step, the CCs in one region are grouped based on spatial relationships and appearance similarities. An exemplary flowchart for performing the grouping step 620 of FIG. 6 is shown in FIG.

図７に示されるように、グループ化ステップ６２０は、ステップ６２０１〜６２０３を含みうる。 As shown in FIG. 7, the grouping step 620 may include steps 6201-6203.

ステップ６２０１において、各ＴＢＲ及び外側領域のＣＣは、暗い領域の明るいＣＣの集合及び明るい領域の暗いＣＣの集合に分割される。 In step 6201, the CCs of each TBR and of the outer region are divided into a set of bright CCs in dark regions and a set of dark CCs in bright regions.

ステップ６２０２において、ＣＣグループは、それぞれ、明るいＣＣの集合及び暗いＣＣの集合内に生成される。 In step 6202, CC groups are created in the bright CC and dark CC sets, respectively.

好ましい一実施形態によると、ＣＣグループはＣＣクラスタリングにより生成されうる。ＣＣクラスタリングは、ある特定の方向に従うＣＣの中心の位置合わせ、ＣＣのサイズの類似性、ＣＣの形状の類似性、ＣＣの色又はグレースケールの類似性、ＣＣのストローク幅の類似性及びＣＣ間の距離の制約のうちの１つ以上を使用する。 According to a preferred embodiment, CC groups can be generated by CC clustering. CC clustering is the alignment of CC centers according to a certain direction, CC size similarity, CC shape similarity, CC color or grayscale similarity, CC stroke width similarity and between CCs. Use one or more of the distance constraints.
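CC clustering under pairwise constraints can be sketched with a union-find structure: two CCs are linked when they satisfy the chosen constraints, and linked CCs end up in the same group. The `CC` fields, the use of only three of the listed constraints (size similarity, stroke width similarity, distance), and the limits `max_ratio` and `max_gap` are assumptions for illustration, not values from the description.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class CC:
    """Minimal CC description used for clustering (hypothetical type)."""
    cx: float      # center x
    cy: float      # center y
    height: float  # bounding-box height, used as the size measure
    stroke: float  # mean stroke width


def similar(a, b, max_ratio=2.0, max_gap=3.0):
    """True if two CCs satisfy the size, stroke width and distance constraints."""
    size_ok = max(a.height, b.height) <= max_ratio * min(a.height, b.height)
    stroke_ok = max(a.stroke, b.stroke) <= max_ratio * min(a.stroke, b.stroke)
    # The allowed distance scales with the CC size (an assumed convention).
    dist_ok = hypot(a.cx - b.cx, a.cy - b.cy) <= max_gap * max(a.height, b.height)
    return size_ok and stroke_ok and dist_ok


def cluster(ccs):
    """Return groups of CC indices that are linked by the constraints."""
    parent = list(range(len(ccs)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(ccs)):
        for j in range(i + 1, len(ccs)):
            if similar(ccs[i], ccs[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(ccs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Alignment of the CC centers along a direction and color or gray scale similarity could be added to `similar` as further conditions.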

別の実施形態によると、ＣＣグループは、最初にハフ変換により生成される。ハフ変換の結果、１つの行上に中心があるＣＣは共にグループ化される。次に、生成されたＣＣグループに含まれたＣＣは、上記の制約を使用してフィルタリングされる。 According to another embodiment, the CC groups are first generated by a Hough transform. As a result of the Hough transform, CCs whose centers lie on a single line are grouped together. The CCs included in the generated CC groups are then filtered using the above constraints.

ステップ６２０３において、明るいＣＣ及び暗いＣＣが、それぞれ、ステップ６２０２でグループ化された後、明るいＣＣの集合及び暗いＣＣの集合は、空間関係及び／又は外観の類似性に基づいて組み合わされる。 In step 6203, after the bright CCs and the dark CCs have each been grouped in step 6202, the set of bright CCs and the set of dark CCs are combined based on spatial relationships and/or appearance similarity.
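The Hough-transform variant can be illustrated with a small voting scheme over CC centers. Each center votes for every quantized line (theta, rho) passing through it, and centers that fall into the same well-supported bin form a group. The bin sizes `theta_step` and `rho_step`, the `min_votes` cutoff, and the greedy strongest-line-first assignment are assumptions for this sketch; the subsequent filtering of each group with the similarity constraints is not shown.

```python
from collections import defaultdict
from math import cos, radians, sin


def hough_line_groups(centers, theta_step=5, rho_step=4.0, min_votes=3):
    """Group indices of points whose (x, y) centers lie on a common line.

    A line is parameterized as rho = x*cos(theta) + y*sin(theta), with
    theta in degrees quantized by theta_step and rho quantized by rho_step.
    """
    votes = defaultdict(set)
    for idx, (x, y) in enumerate(centers):
        for theta in range(0, 180, theta_step):
            t = radians(theta)
            rho = x * cos(t) + y * sin(t)
            votes[(theta, round(rho / rho_step))].add(idx)

    groups, used = [], set()
    # Visit the strongest lines first; each point joins at most one group.
    for key in sorted(votes, key=lambda k: (-len(votes[k]), k)):
        members = votes[key] - used
        if len(members) >= min_votes:
            groups.append(sorted(members))
            used |= members
    return groups
```

For instance, four centers on the horizontal line y = 48 plus one distant outlier yield a single group containing the four collinear centers.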

ステップ６２０３でＣＣグループを組み合わせるために一般的に使用される特徴は、例えば、２つのある特定のグループのバウンディングボックスの重複率、サイズの類似性（サイズの差は、２つのグループの高さの最大値より低いことが好ましい）及び行方向の類似性（方向の差は、３０度より小さいことが好ましい）を含む。上記の特徴の１つ又はあらゆる組合せが実際に使用されてもよい。 Features commonly used to combine CC groups in step 6203 include, for example, the overlap ratio of the bounding boxes of two particular groups, size similarity (the difference in size is preferably less than the larger of the two groups' heights), and line direction similarity (the difference in direction is preferably less than 30 degrees). Any one of these features, or any combination of them, may be used in practice.
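A merge test for two CC groups based on two of these features might look like the sketch below. The `Group` type, the definition of the overlap ratio (intersection area divided by the area of the smaller box), and the `min_overlap` value are assumptions; only the 30-degree limit on the direction difference is taken from the description.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Bounding box and line direction of one CC group (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float
    angle: float  # line direction in degrees


def overlap_ratio(a, b):
    """Intersection area divided by the area of the smaller bounding box."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    if w <= 0 or h <= 0:
        return 0.0
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    return (w * h) / min(area_a, area_b)


def direction_diff(a, b):
    """Smallest angle between two undirected line directions, in degrees."""
    d = abs(a.angle - b.angle) % 180.0
    return min(d, 180.0 - d)


def should_merge(a, b, min_overlap=0.5):
    """Merge when the boxes overlap enough and the directions nearly agree."""
    return overlap_ratio(a, b) >= min_overlap and direction_diff(a, b) < 30.0
```

Directions are treated as undirected, so a line at 175 degrees and a line at 2 degrees differ by only 7 degrees.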

ステップ６２０３の完了後、各ＴＢＲ及び外側領域のＣＣグループがそれぞれ形成されている。図６のステップ６３０に進む。 After the completion of step 6203, each TBR and outer area CC group are formed. Proceed to step 630 in FIG.

ステップ６３０において、各ＴＢＲ及び外側領域間の種々の領域からのＣＣグループは、組み合わされてテキスト領域を生成する。 In step 630, CC groups from different regions, among the TBRs and the outer region, are combined to generate text regions.

このステップにおいて、種々の領域からのＣＣグループは、行方向の整合性、ＣＣの平均サイズの類似性、ＣＣの平均ストローク幅の類似性及びＣＣの平均的な色又はグレースケールの類似性のうちの少なくとも１つに基づいて組み合わされる。 In this step, CC groups from different regions are combined based on at least one of: line direction consistency, similarity of average CC size, similarity of average CC stroke width, and similarity of average CC color or gray scale.

種々の領域からのＣＣグループを組み合わせる規則は、１つの領域からのＣＣグループをグループ化する規則より厳しくてよいことが好ましい。種々の領域からのＣＣグループが互いに重複しないため、バウンディングボックスの重複率は使用されなくてもよい。 The rules for combining CC groups from different regions are preferably stricter than the rules for grouping CC groups from one region. Since CC groups from various regions do not overlap each other, the overlapping rate of bounding boxes may not be used.

ＣＣグループ化ステップ３３０は、ステップ６３０が完了する時に完了する。 CC grouping step 330 is completed when step 630 is completed.

図８Ｅは、ＣＣグループ化ステップ３３０の結果を示す。候補となるテキストＣＣは、同一の文字列に属するＣＣが白線を使用して接続されるテキスト行／単語にグループ化されていることが図８Ｅからわかるだろう。 FIG. 8E shows the result of the CC grouping step 330. As FIG. 8E shows, the candidate text CCs have been grouped into text lines/words, with CCs belonging to the same character string connected by white lines.

テキスト検出方法３００は、ＣＣグループ化ステップ３３０が完了する時に終了する。 The text detection method 300 ends when the CC grouping step 330 is completed.

図８Ｆは、テキスト検出方法３００の結果を示す。テキスト行／単語のバウンディングボックスに基づくテキスト領域は、入力画像から検出されていることが分かりうる。最後に検出されたテキスト領域は、白線のボックスで示される。 FIG. 8F shows the result of the text detection method 300. It can be seen that the text region based on the text line / word bounding box has been detected from the input image. The last detected text area is indicated by a white line box.

次に、本発明の一実施形態に係る画像においてテキスト領域を検出するテキスト検出装置９００のブロック図を示す図９を参照する。装置９００は、図３〜図７を参照して説明した方法を実現するために使用されうる。簡潔にするために、図３〜図７を参照して説明したものに類似するいくつかの詳細をここでは省略する。しかし、これらの詳細も装置９００に適用可能であってもよいことが理解されるだろう。 Reference is now made to FIG. 9, which shows a block diagram of a text detection apparatus 900 for detecting text regions in an image according to an embodiment of the present invention. The apparatus 900 can be used to implement the method described with reference to FIGS. 3 to 7. For brevity, some details similar to those described with reference to FIGS. 3 to 7 are omitted here. It will be understood, however, that these details may also apply to the apparatus 900.

図９に示されるような一実施形態によると、テキスト検出装置９００は、テキスト背景領域（ＴＢＲ）検出ユニット９１０と、ＣＣフィルタリングユニット９２０と、ＣＣグループ化ユニット９３０とを備える。 According to one embodiment as shown in FIG. 9, the text detection device 900 comprises a text background region (TBR) detection unit 910, a CC filtering unit 920, and a CC grouping unit 930.

テキスト背景領域（ＴＢＲ）検出ユニット９１０は、画像からＴＢＲを検出するように構成されうる。 Text background region (TBR) detection unit 910 may be configured to detect TBR from an image.

ＣＣフィルタリングユニット９２０は、少なくとも１つのＣＣをフィルタリングして少なくとも１つの候補となるテキストＣＣを確保するように構成されうる。 CC filtering unit 920 may be configured to filter at least one CC to ensure at least one candidate text CC.

ＣＣグループ化ユニット９３０は、ＴＢＲ検出ユニットにおいて検出されたＴＢＲに基づいて少なくとも１つの候補となるテキストＣＣをグループ化して少なくとも１つのＣＣグループを形成し、且つ少なくとも１つのＣＣグループに基づいて少なくとも１つのテキスト領域を生成するように構成されうる。 The CC grouping unit 930 groups at least one candidate text CC based on the TBR detected in the TBR detection unit to form at least one CC group, and at least one based on the at least one CC group. Can be configured to generate one text region.

本明細書において、ＴＢＲは、画像中のテキストの周囲領域として規定されてよく、規則的な境界線及び均一な色又はグレースケールを有する。 As used herein, a TBR may be defined as the surrounding area of text in an image and has a regular border and a uniform color or gray scale.

一実施形態によると、ＴＢＲ検出ユニット９１０は、少なくとも１つのＣＣの特徴、少なくとも１つのＣＣの境界内に配置されたＣＣであり、少なくとも１つのＣＣに対して高いコントラストを有する少なくとも１つのＣＣ中のメンバＣＣの統計、及び少なくとも１つのＣＣと他のＴＢＲとの間の関係のうちの少なくとも１つに基づいて、少なくとも１つのＣＣからＴＢＲを選択するように構成されうる。 According to one embodiment, the TBR detection unit 910 may be configured to select TBRs from the at least one CC based on at least one of: features of the at least one CC; statistics of the member CCs of the at least one CC, a member CC being a CC that is located within the boundary of the at least one CC and has high contrast against it; and the relationship between the at least one CC and other TBRs.

例えば、少なくとも１つのＣＣの特徴は、ＣＣの色又はグレースケールの均一性、ＣＣのサイズ、ＣＣの形状、ＣＣの境界線の規則性、画像中のＣＣの位置、ＣＣの平均グレースケール値及びＣＣのグレースケール値分布のうちの少なくとも１つを含みうる。 For example, at least one CC feature includes CC color or gray scale uniformity, CC size, CC shape, CC border regularity, CC location in the image, CC average gray scale value, and It may include at least one of a CC grayscale value distribution.

例えば、メンバＣＣの統計は、ＣＣ中のメンバＣＣの数、第１の事前定義済みの閾値より高いテキスト信頼度を有するメンバＣＣのシードＣＣの数、ＣＣ中のメンバＣＣの平均テキスト信頼度及びＣＣ中のメンバＣＣの総面積とＣＣの面積との比のうちの少なくとも１つを含みうる。 For example, the statistics of the member CCs may include at least one of: the number of member CCs in the CC; the number of seed CCs among the member CCs that have a text reliability higher than a first predefined threshold; the average text reliability of the member CCs in the CC; and the ratio of the total area of the member CCs in the CC to the area of the CC.

例えば、少なくとも１つのＣＣと他のＴＢＲとの間の関係に基づいて少なくとも１つのＣＣからＴＢＲを選択することは、前に判定されたあらゆるＴＢＲにおけるメンバＣＣではなく、且つ前に判定されたＴＢＲと同一のメンバＣＣを有さないＣＣに応答して、ＣＣをＴＢＲとして判定することを含みうる。これは、ＴＢＲ検出ユニット９１０により図４に示されたようなフローチャートを実行することで実現されうる。 For example, selecting a TBR from the at least one CC based on the relationship between the at least one CC and other TBRs may include determining a CC to be a TBR in response to the CC not being a member CC of any previously determined TBR and not sharing any member CC with a previously determined TBR. This can be realized by the TBR detection unit 910 executing the flowchart shown in FIG. 4.

一実施形態によると、ＣＣフィルタリングユニット９２０において、あらゆるＴＢＲの境界内にないＣＣに対するフィルタリングは、あらゆるＴＢＲの境界内のＣＣに対するフィルタリングより厳しくてよい。 According to one embodiment, in CC filtering unit 920, filtering for CCs that are not within the boundaries of any TBR may be stricter than filtering for CCs that are within the boundaries of any TBR.

一実施形態によると、ＣＣフィルタリングユニット９２０は、以下の条件、すなわちＣＣのサイズ、ＣＣの形状、ＣＣのバウンディングボックスのアスペクト比、ＣＣとそのバウンディングボックスの面積比、周長とＣＣの面積との比及びＣＣのテクスチャ特徴のうちの少なくとも１つに基づいて、あらゆるＴＢＲの境界内のＣＣ及びあらゆるＴＢＲの境界内にないＣＣをフィルタリングするように構成される。ＣＣフィルタリングユニット９２０は、更に以下の条件、すなわちストローク幅の統計及びＣＣの境界線画素数とＣＣの画素数との比の少なくとも一方に基づいて、あらゆるＴＢＲ内にないＣＣをフィルタリングするように構成される。 According to one embodiment, the CC filtering unit 920 is configured to filter both CCs within the boundary of a TBR and CCs not within the boundary of any TBR based on at least one of the following: the size of the CC, the shape of the CC, the aspect ratio of the CC's bounding box, the ratio of the CC's area to its bounding-box area, the ratio of the CC's perimeter to its area, and texture features of the CC. The CC filtering unit 920 is further configured to filter CCs not within any TBR based on at least one of the following: stroke width statistics, and the ratio of the number of border pixels of the CC to the number of pixels of the CC.

図１０Ａは、本発明の一実施形態に係る図９のＣＣフィルタリングユニットを示すブロック図である。 FIG. 10A is a block diagram illustrating the CC filtering unit of FIG. 9 according to an embodiment of the present invention.

図１０Ａに示されるように、一実施形態によると、ＣＣフィルタリングユニット９２０は、ＴＢＲ以外の少なくとも１つのＣＣの各々のテキスト信頼度を算出するように構成された算出ユニット１０１０であり、計算において、あらゆるＴＢＲの境界内のＣＣが他のＣＣより重視される算出ユニット１０１０と、事前定義済みの閾値より高いテキスト信頼度を有するＣＣをテキスト候補ＣＣとして判定するように構成された判定ユニット１０２０とを備える。 As shown in FIG. 10A, according to one embodiment, the CC filtering unit 920 includes a calculation unit 1010 configured to calculate the text reliability of each of the at least one CC other than the TBRs, where CCs within the boundary of a TBR are weighted more heavily than other CCs in the calculation, and a determination unit 1020 configured to determine a CC whose text reliability exceeds a predefined threshold to be a text candidate CC.

図１０Ｂは、本発明の別の実施形態に係る図９のＣＣフィルタリングユニットを示すブロック図である。 FIG. 10B is a block diagram illustrating the CC filtering unit of FIG. 9 according to another embodiment of the present invention.

図１０Ｂに示されるように、別の一実施形態によると、ＣＣフィルタリングユニット９２０は、ＴＢＲ以外の少なくとも１つのＣＣ毎に、ＣＣがあらゆるＴＢＲの境界内に配置されることに応答して、ＣＣを第１のＣＣとして識別するか、あるいはＣＣを第２のＣＣとして識別するように構成された識別ユニット１１０２と、ＣＣがテキスト候補ＣＣであるかを判定するために、第１のＣＣ及び第２のＣＣの各々に対して第１のフィルタリングステップを実行するように構成された第１のフィルタリングユニット１１０４と、ＣＣがテキスト候補ＣＣであるかを更に判定するために、第１のフィルタリングステップによりテキスト候補ＣＣとして判定される第２のＣＣの各々に対して第２のフィルタリングステップを実行するように構成された第２のフィルタリングユニット１１０６とを備える。 As shown in FIG. 10B, according to another embodiment, the CC filtering unit 920 includes: an identification unit 1102 configured to identify each of the at least one CC other than the TBRs as a first CC if the CC is located within the boundary of a TBR, and as a second CC otherwise; a first filtering unit 1104 configured to perform a first filtering step on each of the first CCs and the second CCs to determine whether the CC is a text candidate CC; and a second filtering unit 1106 configured to perform a second filtering step on each second CC determined to be a text candidate CC by the first filtering step, to further determine whether the CC is a text candidate CC.

一実施形態によると、第１のフィルタリングユニット１１０４は、ＣＣがテキスト候補ＣＣであるかを判定するように、ＣＣの１つ以上の第１の特徴に基づいて第１のフィルタリングステップを実行するように更に構成されうる。 According to one embodiment, the first filtering unit 1104 performs a first filtering step based on one or more first characteristics of the CC so as to determine whether the CC is a text candidate CC. Can be further configured.

一実施形態によると、第２のフィルタリングユニット１１０６は、ＣＣがテキスト候補ＣＣであるかを更に判定するように、ＣＣの１つ以上の第２の特徴に基づいて第２のフィルタリングステップを実行するように更に構成されうる。 According to one embodiment, the second filtering unit 1106 performs a second filtering step based on one or more second characteristics of the CC to further determine whether the CC is a text candidate CC. Can be further configured.

図１１Ａは、本発明の一実施形態に係る図９のＣＣグループ化ユニット９３０を示すブロック図である。 FIG. 11A is a block diagram illustrating the CC grouping unit 930 of FIG. 9 according to an embodiment of the present invention.

一実施形態によると、ＣＣグループ化ユニット９３０は、割り当てユニット９３０１と、グループ化ユニット９３０２と、第１の組み合わせユニット９３０３とを更に備えうる。 According to an embodiment, the CC grouping unit 930 may further comprise an allocation unit 9301, a grouping unit 9302, and a first combination unit 9303.

割り当てユニット９３０１は、候補となるテキストＣＣをそれぞれのテキスト背景領域に割り当て、且つＴＢＲに割り当て不可能な候補となるテキストＣＣを外側領域に割り当てるように構成されうる。 The assignment unit 9301 may be configured to assign candidate text CCs to respective text background regions and assign candidate text CCs that cannot be assigned to TBRs to outer regions.

グループ化ユニット９３０２は、各ＴＢＲ及び外側領域のＣＣをそれぞれグループ化してＣＣグループを形成するように構成されうる。 The grouping unit 9302 may be configured to group CCs in each TBR and outer region to form a CC group.

第１の組み合わせユニット９３０３は、各ＴＢＲ及び外側領域間の種々の領域からのＣＣグループを組み合わせて前記少なくとも１つのテキスト領域を生成するように構成されうる。 The first combination unit 9303 may be configured to combine the CC groups from various regions between each TBR and the outer region to generate the at least one text region.

一実施形態によると、第１の組み合わせユニット９３０３は、以下の条件、すなわち行方向の整合性、ＣＣグループのグループバウンディングボックスの重複率、ＣＣの平均サイズの類似性、ＣＣの平均ストローク幅の類似性及びＣＣの平均的な色又はグレースケールの類似性のうちの少なくとも１つに基づいて、種々の領域からのＣＣグループを組み合わせるように構成されうる。 According to one embodiment, the first combination unit 9303 may be configured to combine CC groups from different regions based on at least one of the following: line direction consistency, the overlap ratio of the group bounding boxes of the CC groups, similarity of average CC size, similarity of average CC stroke width, and similarity of average CC color or gray scale.

図１１Ｂは、本発明の一実施形態に係る図１１Ａのグループ化ユニット９３０２を示すブロック図である。 FIG. 11B is a block diagram illustrating the grouping unit 9302 of FIG. 11A according to one embodiment of the invention.

一実施形態によると、グループ化ユニット９３０２は、分割ユニット９３０２−１と、生成ユニット９３０２−２と、第２の組み合わせユニット９３０２−３とを更に備えうる。 According to an embodiment, the grouping unit 9302 may further comprise a split unit 9302-1, a generation unit 9302-2, and a second combination unit 9302-3.

分割ユニット９３０２−１は、各ＴＢＲ及び外側領域のＣＣを暗い領域の明るいＣＣの集合及び明るい領域の暗いＣＣの集合に分割するように構成されうる。 The division unit 9302-1 may be configured to divide the CCs of each TBR and of the outer region into a set of bright CCs in dark regions and a set of dark CCs in bright regions.

生成ユニット９３０２−２は、それぞれ明るいＣＣの集合及び暗いＣＣの集合内にＣＣグループを生成するように構成されうる。 Generation unit 9302-2 may be configured to generate CC groups within a set of bright CCs and a set of dark CCs, respectively.

第２の組み合わせユニット９３０２−３は、空間関係及び外観の類似性のうちの少なくとも１つに基づいて、明るいＣＣの集合と暗いＣＣの集合とを組み合わせるように構成されうる。 Second combination unit 9302-3 may be configured to combine a set of bright CCs and a set of dark CCs based on at least one of spatial relationships and appearance similarities.

一実施形態によると、生成ユニット９３０２−２において、ＣＣグループはＣＣクラスタリングにより生成されうる。ＣＣクラスタリングは、以下の制約、すなわちある特定の方向に従うＣＣの中心の位置合わせ、ＣＣのサイズの類似性、ＣＣの形状の類似性、ＣＣの色又はグレースケールの類似性、ＣＣのストローク幅の類似性及びＣＣ間の距離のうちの少なくとも１つを使用してよい。 According to one embodiment, in the generation unit 9302-2, the CC groups may be generated by CC clustering. The CC clustering may use at least one of the following constraints: alignment of the CC centers along a particular direction, similarity of CC size, similarity of CC shape, similarity of CC color or gray scale, similarity of CC stroke width, and the distance between CCs.

一実施形態によると、生成ユニット９３０２−２は、ハフ変換によりＣＣグループを生成し、且つ以下の制約、すなわちある特定の方向に従うＣＣの中心の位置合わせ、ＣＣのサイズの類似性、ＣＣの形状の類似性、ＣＣの色又はグレースケールの類似性、ＣＣのストローク幅の類似性及びＣＣ間の距離のうちの少なくとも１つを使用して生成されたＣＣグループに含まれたＣＣをフィルタリングするように更に構成されうる。 According to one embodiment, the generating unit 9302-2 generates a CC group by Hough transform, and the following constraints: CC center alignment according to a certain direction, CC size similarity, CC shape Filtering CCs included in a CC group generated using at least one of: similarity, CC color or grayscale similarity, CC stroke width similarity, and distance between CCs Can be further configured.

一実施形態によると、第１の組み合わせユニット９３０３において組み合わせることに対する規則は、グループ化ユニット９３０２においてグループ化することに対する規則より厳しくてよい。 According to one embodiment, the rules for combining in the first combination unit 9303 may be stricter than the rules for grouping in the grouping unit 9302.

本発明に係るテキスト検出の方法及び装置は、種々の適応例を有する。例えばそれは、カメラが取り込んだ画像又は映像からテキスト情報を自動的に抽出する際に使用されうる。 The text detection method and apparatus according to the present invention have various adaptation examples. For example, it can be used in automatically extracting text information from images or video captured by a camera.

図１２は、本発明の一実施形態に係るテキスト情報抽出方法を示す。 FIG. 12 shows a text information extraction method according to an embodiment of the present invention.

図１２に示されるように、ブロック１２１０において、入力画像又は入力映像からのテキスト領域は、図３〜図７を参照して説明したテキスト検出方法に係るテキスト検出方法を使用して検出される。 As shown in FIG. 12, in block 1210, text regions are detected from an input image or input video using the text detection method described with reference to FIGS. 3 to 7.

ブロック１２２０において、テキストは、検出されたテキスト領域から抽出されうる。選択的に、ブロック１２４０において示されるように、入力映像中のテキストは、入力映像からテキスト領域を検出する際に追跡されうる。 At block 1220, text may be extracted from the detected text region. Optionally, as shown in block 1240, text in the input video can be tracked in detecting text regions from the input video.

ブロック１２３０において、テキスト認識は、抽出されたテキストに対して実行されてテキスト情報を取得しうる。 At block 1230, text recognition may be performed on the extracted text to obtain text information.
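The flow of blocks 1210 to 1240 can be summarized by the following sketch. It only shows how the stages are chained; the function `extract_text_information` and the `detect`, `extract`, `recognize`, and `track` callables are hypothetical placeholders supplied by the caller, not interfaces defined in this disclosure.

```python
from typing import Any, Callable, Iterable, List, Optional


def extract_text_information(
    frames: Iterable[Any],
    detect: Callable[[Any], List[Any]],
    extract: Callable[[Any, Any], Any],
    recognize: Callable[[Any], str],
    track: Optional[Callable[[List[Any], List[Any]], List[Any]]] = None,
) -> List[List[str]]:
    """Run detection, optional tracking, extraction, and recognition per frame.

    A single image is handled as a one-frame sequence.
    """
    results = []
    previous_regions: List[Any] = []
    for frame in frames:
        regions = detect(frame)                       # block 1210: detect text regions
        if track is not None:                         # block 1240: optional tracking (video)
            regions = track(previous_regions, regions)
        previous_regions = regions
        texts = [extract(frame, r) for r in regions]  # block 1220: extract text
        results.append([recognize(t) for t in texts]) # block 1230: recognize text
    return results


if __name__ == "__main__":
    # Toy stand-ins: a "frame" maps a region (x, y, w, h) to the text it contains.
    frames = [{(0, 0, 5, 5): "exit"}, {(0, 0, 5, 5): "exit", (9, 9, 4, 4): "stop"}]
    out = extract_text_information(
        frames,
        detect=lambda frame: sorted(frame),
        extract=lambda frame, region: frame[region],
        recognize=str.upper,
    )
    print(out)
```

Passing the stages as callables keeps the sketch independent of any particular detector or recognizer, so that the text detection method of FIGS. 3 to 7 could be substituted for `detect` without changing the surrounding flow.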

次に、本発明の一実施形態に係るテキスト情報抽出システム１３００のブロック図を示す図１３を参照する。システム１３００は、図１２を参照して説明した方法を実現するために使用されうる。 Reference is now made to FIG. 13, which shows a block diagram of a text information extraction system 1300 according to an embodiment of the present invention. The system 1300 may be used to implement the method described with reference to FIG. 12.

図１３に示されるように、システム１３００は、テキスト検出装置１３１０と、抽出装置１３２０と、認識装置１３３０とを備える。 As illustrated in FIG. 13, the system 1300 includes a text detection device 1310, an extraction device 1320, and a recognition device 1330.

テキスト検出装置１３１０は、入力画像又は入力映像からテキスト領域を検出するように構成され、且つ図９に関連して説明した装置９１０と同一であってよい。 The text detection device 1310 is configured to detect text regions from an input image or input video, and may be the same as the device 910 described in connection with FIG. 9.

抽出装置１３２０は、検出されたテキスト領域からテキストを抽出するように構成されうる。 The extraction device 1320 can be configured to extract text from the detected text region.

認識装置１３３０は、抽出されたテキストを認識してテキスト情報を取得するように構成されうる。 The recognition device 1330 may be configured to recognize the extracted text and acquire text information.

選択的に、システム１３００は追跡装置１３４０を更に備えうる。追跡装置１３４０は、テキスト検出装置１３１０が入力映像からテキスト領域を検出するように構成される際に入力映像中のテキストを追跡するように構成されうる。 Optionally, system 1300 can further comprise a tracking device 1340. The tracking device 1340 may be configured to track text in the input video when the text detection device 1310 is configured to detect a text region from the input video.

図９〜図１１及び図１３に関連して上述したユニット及び装置は、種々のステップを実現する例示的なモジュール及び／又は好ましいモジュールであることが理解されるだろう。モジュールは、ハードウェアユニット（例えば、プロセッサ又は特定用途向け集積回路等）及び／又はソフトウェアモジュール（例えば、コンピュータプログラム）であってよい。種々のステップを実現するモジュールは、完全に上述されていない。しかし、ある特定の処理を実行するステップがある場合、同一の処理を実現する対応する機能モジュール又は機能ユニット（ハードウェア及び／又はソフトウェアにより実現された）があってもよい。上述及び後述のステップとこれらのステップに対応するユニットとの全ての組合せが構成する技術的解決法が完全で且つ適用可能である限り、それらによる技術的解決方法は本発明の開示内容に含まれる。 It will be understood that the units and devices described above in connection with FIGS. 9 to 11 and FIG. 13 are exemplary and/or preferred modules for implementing the various steps. A module may be a hardware unit (e.g., a processor or an application-specific integrated circuit) and/or a software module (e.g., a computer program). The modules that implement the various steps have not been described exhaustively above. However, where there is a step that performs a particular process, there may be a corresponding functional module or functional unit (implemented in hardware and/or software) that implements the same process. Technical solutions formed by any combination of the steps described above and below and the units corresponding to those steps are included in the disclosure of the present invention, as long as those solutions are complete and applicable.

また、種々のユニットにより構成された上述の装置及びシステムは、機能モジュールとしてコンピュータ等のハードウェアデバイスに組み込まれうる。当然、コンピュータは、これらの機能モジュールに加えて、他のハードウェアコンポーネント又はソフトウェアコンポーネントを有する。 In addition, the above-described apparatus and system configured by various units can be incorporated into a hardware device such as a computer as a functional module. Of course, the computer has other hardware or software components in addition to these functional modules.

本発明の方法、装置及びシステムは、多くの方法で実行可能である。例えば、本発明の方法及び装置は、ソフトウェア、ハードウェア、ファームウェア又はそれらのあらゆる組合せにより実行可能である。方法のステップの上述の順序は例示することのみを意図し、特に指示のない限り、本発明の方法のステップは特に上述された順序に限定されない。それに加えて、いくつかの実施形態において、本発明は、本発明に係る方法を実現する機械可読命令を含む記録媒体に記録されたプログラムとしても実施されてもよい。従って、本発明は、本発明に係る方法を実現するプログラムを格納する記録媒体も範囲に含む。 The method, apparatus and system of the present invention can be implemented in many ways. For example, the method and apparatus of the present invention can be performed by software, hardware, firmware, or any combination thereof. The above order of method steps is intended to be exemplary only, and unless otherwise indicated, the method steps of the present invention are not particularly limited to the order described above. In addition, in some embodiments, the present invention may also be implemented as a program recorded on a recording medium that includes machine-readable instructions for implementing the method according to the present invention. Therefore, the present invention also includes a recording medium that stores a program for realizing the method according to the present invention.

例を用いて本発明のいくつかの特定の実施形態を詳細に実証したが、上述の例は、本発明の範囲を限定することではなく、例示することのみを意図することが当業者により理解されるべきである。上述の実施形態は、本発明の範囲及び趣旨から逸脱することなく変更可能であることが当業者により理解されるべきである。本発明の範囲は、添付の特許請求の範囲により規定される。 Although several specific embodiments of the present invention have been demonstrated in detail by way of examples, it should be understood by those skilled in the art that the above examples are intended only to illustrate, not to limit, the scope of the present invention. It should be understood by those skilled in the art that the embodiments described above can be modified without departing from the scope and spirit of the invention. The scope of the present invention is defined by the appended claims.

Claims

A text detection method for detecting a text region in an image including at least one connected component, comprising:
a detection step of detecting a text background region from the image;
a filtering step of filtering the at least one connected component to leave connected components that are text candidates; and
a grouping step of grouping, based on the text background region detected in the detection step, the connected components that are text candidates to form at least one connected component group, and generating at least one text region based on the at least one connected component group,
wherein, in the filtering step, the filtering of connected components not within the boundary of the text background region is stricter than the filtering of connected components within the boundary of the text background region.

The method of claim 1, wherein the text background region is a surrounding region of text in the image and has a regular border and a uniform color or gray scale.

The method according to claim 1, wherein the detection step comprises a selection step of selecting the text background region from the at least one connected component based on at least one of:
a feature of the at least one connected component;
statistics of member connected components of the at least one connected component, a member connected component being a connected component that is located within the boundary of the at least one connected component and has high contrast with respect to the at least one connected component; and
a relationship between the at least one connected component and other text background regions.

The method of claim 3, wherein the feature of the at least one connected component includes at least one of:
color or grayscale uniformity of the connected component,
the size of the connected component,
the shape of the connected component,
regularity of the boundary of the connected component,
the position of the connected component in the image,
the mean grayscale value of the connected component, and
the grayscale value distribution of the connected component.

The method of claim 3, wherein the statistics of the member connected components include at least one of:
the number of member connected components in the connected component,
the number of seed connected components among the member connected components, a seed connected component having a text confidence higher than a first predefined threshold,
the average text confidence of the member connected components in the connected component, and
the ratio of the total area of the member connected components to the area of the connected component.

The method according to claim 3, wherein the selection step of selecting the text background region from the at least one connected component based on the relationship between the at least one connected component and other text background regions comprises
a determination step of determining the connected component as a text background region in response to the connected component not being a member connected component of any previously determined text background region and not having the same member connected components as any previously determined text background region.

The method of claim 1, wherein, in the filtering step, connected components within the boundary of any text background region and connected components not within the boundary of any text background region are filtered based on at least one of:
the size of the connected component,
the shape of the connected component,
the aspect ratio of the bounding box of the connected component,
the area ratio of the connected component to its bounding box,
the ratio of the perimeter to the area of the connected component, and
texture features of the connected component.

The method of claim 7, wherein connected components that are not within any text background region are further filtered based on at least one of:
statistics of stroke width, and
the ratio of the number of boundary pixels to the number of pixels of the connected component.

A text detection method for detecting a text region in an image including at least one connected component, comprising:
a detection step of detecting a text background region from the image;
a filtering step of filtering the at least one connected component to leave connected components that are text candidates; and
a grouping step of grouping, based on the text background region detected in the detection step, the connected components that are text candidates to form at least one connected component group, and generating at least one text region based on the at least one connected component group,
wherein the grouping step comprises:
a) a step of assigning the text-candidate connected components to the respective text background regions, and assigning text-candidate connected components that cannot be assigned to any text background region to an outer region;
b) a step of forming connected component groups by grouping the connected components of each text background region and of the outer region; and
c) a step of generating the at least one text region by combining connected component groups from different regions among the text background regions and the outer region.

The method of claim 9, wherein step b) comprises:
b1) a step of dividing the connected components of each text background region and of the outer region into a set of bright connected components in a dark region and a set of dark connected components in a bright region;
b2) a step of generating connected component groups in the set of bright connected components and in the set of dark connected components, respectively; and
b3) a step of combining the set of bright connected components and the set of dark connected components based on at least one of spatial relationship and similarity of appearance.

The method of claim 9, wherein step c) comprises combining connected component groups from different regions based on at least one of:
row-wise consistency,
similarity of the average size of the connected components,
similarity of the average stroke width of the connected components, and
similarity of the average color or grayscale of the connected components.

The method according to claim 10, wherein, in step b2), the connected component groups are generated by connected component clustering, and
the connected component clustering uses at least one of the following constraints:
alignment of the centers of the connected components along a certain direction,
similarity of the size of the connected components,
similarity of the shape of the connected components,
similarity of the color or grayscale of the connected components,
similarity of the stroke width of the connected components, and
the distance between connected components.

The method according to claim 10, wherein step b2) comprises:
a step of generating connected component groups by a Hough transform; and
a step of filtering the connected components included in the generated connected component groups using at least one of the following constraints:
alignment of the centers of the connected components along a certain direction,
similarity of the size of the connected components,
similarity of the shape of the connected components,
similarity of the color or grayscale of the connected components,
similarity of the stroke width of the connected components, and
the distance between connected components.

The method of claim 9, wherein the rules for combining in step c) are stricter than the rules for grouping in step b).

The method according to claim 1, wherein the filtering step comprises:
a step of calculating a text confidence for each of the at least one connected component other than the text background region,
wherein, in the calculation, connected components within the boundary of any text background region are treated as more important than other connected components; and
a step of determining connected components having a text confidence higher than a second predefined threshold as text-candidate connected components.

The method according to claim 1, wherein the filtering step comprises:
a step of identifying each of the at least one connected component other than the text background region as a first connected component or as a second connected component, according to whether the connected component is located within the boundary of any text background region;
a step of performing a first filtering process on each of the first connected components and the second connected components to determine whether the connected component is a text-candidate connected component; and
a step of performing a second filtering process on each second connected component determined to be a text-candidate connected component by the first filtering process, to further determine whether the second connected component is a text-candidate connected component.

The method of claim 16, wherein the step of performing the first filtering process on each of the first connected components and the second connected components
comprises a step of performing the first filtering process based on one or more first features of the connected component to determine whether the connected component is a text-candidate connected component.

The method of claim 16, wherein the step of performing the second filtering process on each second connected component determined to be a text-candidate connected component by the first filtering process
comprises a step of performing the second filtering process based on one or more second features of the second connected component to further determine whether the second connected component is a text-candidate connected component.

The method of claim 7, wherein the texture features include at least one of:
a local binary pattern,
an edge direction histogram, and a gradient histogram.

The method of claim 17, wherein, in the first filtering process, the first features are used as cascade rules or are combined as a feature vector input to a trained classifier.

The method of claim 18, wherein, in the second filtering process, the second features are used as cascade rules or are combined as a feature vector input to a trained classifier.

A text detection apparatus for detecting a text region in an image including at least one connected component, comprising:
detecting means for detecting a text background region from the image;
filtering means for filtering the at least one connected component to leave connected components that are text candidates; and
grouping means for grouping, based on the text background region detected by the detecting means, the connected components that are text candidates to form at least one connected component group, and generating at least one text region based on the at least one connected component group,
wherein, in the filtering means, the filtering of connected components not within the boundary of the text background region is stricter than the filtering of connected components within the boundary of the text background region.

A text detection apparatus for detecting a text region in an image including at least one connected component, comprising:
detecting means for detecting a text background region from the image;
filtering means for filtering the at least one connected component to leave connected components that are text candidates; and
grouping means for grouping, based on the text background region detected by the detecting means, the connected components that are text candidates to form at least one connected component group, and generating at least one text region based on the at least one connected component group,
wherein the grouping means further comprises:
a) means for assigning the text-candidate connected components to the respective text background regions, and assigning text-candidate connected components that cannot be assigned to any text background region to an outer region;
b) means for forming connected component groups by grouping the connected components of each text background region and of the outer region; and
c) means for generating the at least one text region by combining connected component groups from different regions among the text background regions and the outer region.