JP7526692B2 - Recognition method and device

Info

Publication number: JP7526692B2
Application number: JP2021026818A
Authority: JP
Inventors: 武志馬路; 将平長谷川; 昌利鴻田; 修一伊澤
Original assignee: Fujitsu Frontech Ltd
Current assignee: Fujitsu Frontech Ltd
Priority date: 2021-02-22
Filing date: 2021-02-22
Publication date: 2024-08-01
Anticipated expiration: 2041-02-22
Also published as: JP2022128348A

Description

The present invention relates to a recognition method and a recognition device.

Techniques are known for performing character recognition on predetermined items in an optically read image of a form.

For example, a technique is known in which candidate character strings printed in multiple-choice items are registered in advance, and a registered character string located in a region marked with a circle or check mark is recognized.

A technique is also known in which a figure used to select a character string, such as a double line, is registered in advance, and the selected character string is identified by reading the registered figure.

Japanese Unexamined Patent Application Publication No. H11-345281
Japanese Unexamined Patent Application Publication No. 2005-173673

However, the conventional techniques require character strings, figures, and the like to be registered in advance, so the selected character string on a form cannot always be recognized easily and in a general-purpose manner.

For example, consider a bank customer selecting an account type on a form when applying to open an account or requesting a deposit. Assume that "regular" (普通) and "current" (当座) have been registered in advance as the character strings to be recognized.

Suppose that, in addition to "regular" and "current," the form offers "tax preparation" (納税準備) as an account type. If "tax preparation" is circled by hand, the conventional techniques may fail to recognize the selected character string, because that string was never registered.

In one aspect, an object is to recognize the selected character string on a form easily and in a general-purpose manner.

In one aspect, a recognition method causes a computer to execute processing of extracting, from a form, a first region in which a handwritten entry has been made; extracting, from the form, second regions in which type is printed; and selecting, from among the second regions, a region whose degree of overlap with the first region satisfies a predetermined condition.

In one aspect, the selected character string on a form can be recognized easily and in a general-purpose manner.

FIG. 1 is a diagram illustrating an example of the configuration of a recognition system.
FIG. 2 is a diagram illustrating an example of the configuration of a recognition device.
FIG. 3 is a diagram showing an example of selection string information.
FIG. 4 is a diagram for explaining extraction of a handwritten region.
FIG. 5 is a diagram for explaining extraction of type regions.
FIG. 6 is a diagram showing an example of items on a form.
FIG. 7 is a diagram showing an example of overlapping regions.
FIG. 8 is a flowchart showing the flow of the recognition process.
FIG. 9 is a diagram illustrating an example of a hardware configuration.

A recognition method and a recognition device according to the present invention are described in detail below with reference to the drawings. The present invention is not limited by these embodiments. The embodiments can be combined as appropriate to the extent that no contradiction arises.

The configuration of a recognition system according to an embodiment is described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of the configuration of the recognition system. As shown in FIG. 1, the recognition system 1 includes a recognition device 10 and a scanner 20.

The recognition device 10 accepts an image of a form generated by the scanner 20 and outputs a recognition result. In the example of FIG. 1, the recognition device 10 outputs the recognition result "Gender: Male". This result means that, among the options for the "Gender" item on the form, the option "Male" was selected.

A form in this embodiment is a medium such as paper on which items and options are printed, and on which the person filling it out selects an option for each item by hand. Examples include a bank account opening application form, a deposit request form, an answer sheet for a multiple-choice test such as a mark sheet, and a questionnaire.

The recognition device 10 may be realized by a personal computer, an automatic teller machine (ATM), a smartphone, or the like.

The recognition device 10 may also be realized by a combination of a terminal and a server. In that case, the terminal transmits the image received from the scanner 20 to the server, and the server returns a recognition result based on the image to the terminal.

The scanner 20 is a device that optically reads a medium such as paper and generates an image. For example, the scanner 20 may be part of the functions of a multifunction device or of an ATM that can accept handwritten forms. The scanner 20 may also be a portable terminal with a camera, such as a smartphone.

The scanner 20 reads a form 30. The item name "Gender" is printed on the form 30, and the options "(1) Male" and "(2) Female" are printed below it. A handwritten circle has been drawn near the option "(1) Male."

In the following description, handwriting is not limited to writing characters; it means a person writing or drawing any figure by hand.

A figure here is not limited to geometric figures such as circles and rectangles; it includes figures of any kind, such as characters, check marks, and filled-in areas.

Type, by contrast, refers to characters generated by a word processor or the like and printed on a form by a printer or the like.

In the example of FIG. 1, the circle on the form 30 was entered by hand, whereas "Gender," "(1) Male," and "(2) Female" printed on the form 30 are type.

FIG. 2 is a diagram illustrating an example of the configuration of the recognition device. As shown in FIG. 2, the recognition device 10 includes an IF (interface) unit 11, a storage unit 12, and a control unit 13.

The IF unit 11 is an interface for inputting and outputting data. For example, the IF unit 11 is a NIC (Network Interface Card). The IF unit 11 can exchange data with other devices, including the scanner 20.

The IF unit 11 may also be connected to input devices such as a mouse and a keyboard, and to output devices such as a display and a speaker.

The storage unit 12 is an example of a storage device that stores data, programs executed by the control unit 13, and the like; it is, for example, a hard disk or memory. The storage unit 12 stores handwritten region extraction model information 121, type region extraction model information 122, dictionary information 123, and selection string information 124.

The handwritten region extraction model information 121 consists of parameters and the like for constructing a handwritten region extraction model. The handwritten region extraction model is, for example, an image recognition model using a neural network or the like, and is used by the extraction unit 133 described later. For example, the handwritten region extraction model information 121 consists of the weight matrices and bias values of a neural network.

The type region extraction model information 122 consists of parameters and the like for constructing a type region extraction model. The type region extraction model is, for example, an image recognition model using a neural network or the like, and is used by the extraction unit 133 described later. For example, the type region extraction model information 122 consists of the weight matrices and bias values of a neural network.

The dictionary information 123 is a set of characters used for character recognition. It may be a dictionary used by existing OCR (Optical Character Recognition) software or the like, and may include features of characters such as letters of the alphabet, kanji, hiragana, Arabic numerals, and symbols.

The selection string information 124 is a set of combinations of an item name and the character strings used as its options. FIG. 3 is a diagram showing an example of the selection string information. As shown in FIG. 3, the selection string information 124 includes, for example, the combination of the item "性別" (gender) and the options "男, 女, 男性, 女性, Male, Female, ...".

The character strings included in the item names and options of the selection string information 124 are designated in advance as strings that are frequently used as item names and options, respectively. The selection string information 124 is used to assist the character recognition process described later.
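As an illustration only, the selection string information 124 could be held as a mapping from item name to a list of option strings. The sketch below is an assumption about representation, not something the description specifies; the names `SELECTION_STRINGS` and `options_for` are hypothetical. The "性別" entry follows the FIG. 3 example above, and the single "職業" entry is taken from the recognition example given later in the description.

```python
# Hypothetical in-memory form of the selection string information 124.
# Keys are item names; values are strings frequently used as their options.
SELECTION_STRINGS: dict[str, list[str]] = {
    "性別": ["男", "女", "男性", "女性", "Male", "Female"],  # gender
    "職業": ["会社員"],  # occupation: company employee
}


def options_for(item_name: str) -> list[str]:
    """Return the registered options for an item name, or an empty list."""
    return SELECTION_STRINGS.get(item_name, [])
```

Because the mapping only assists recognition, an item name that is absent simply yields no options rather than an error.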

Returning to FIG. 2, the control unit 13 is realized by, for example, a CPU (Central Processing Unit), MPU (Micro Processing Unit), or GPU (Graphics Processing Unit) executing a program stored in an internal storage device, using RAM (Random Access Memory) as a work area. The control unit 13 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).

The control unit 13 includes a scanner control unit 131, an analysis unit 132, an extraction unit 133, a selection unit 134, a recognition unit 135, a correction unit 136, and an output control unit 137.

The scanner control unit 131 controls the scanner 20. The scanner control unit 131 instructs the scanner 20 to read a form, generate an image, and hand over the image.

The analysis unit 132 analyzes the image received from the scanner 20 and identifies the positions of the selection items on the form. For example, the analysis unit 132 can extract the items using the method described in Reference 1 (Japanese Unexamined Patent Application Publication No. 2010-3155).

The extraction unit 133 extracts handwritten regions and type regions at the positions identified by the analysis unit 132. The extraction unit 133 extracts, from the form, a handwritten region in which a handwritten entry has been made. The handwritten region is an example of the first region. The extraction unit 133 also extracts, from the form, type regions in which type is printed. A type region is an example of the second region.

The extraction unit 133 extracts the handwritten region using an image recognition model that has learned the features of handwritten figures. Specifically, it uses the handwritten region extraction model, an image recognition model constructed from the handwritten region extraction model information 121.

The handwritten region extraction model may be trained with specific figures such as circles and check marks as training data, or with handwritten figures of any kind as training data.

FIG. 4 is a diagram for explaining extraction of a handwritten region. In the example of FIG. 4, the extraction unit 133 extracts region 51, in which a handwritten circle has been drawn, as a handwritten region.

The extraction unit 133 extracts the type regions using an image recognition model that has learned the features of type. Specifically, it uses the type region extraction model, an image recognition model constructed from the type region extraction model information 122.

FIG. 5 is a diagram for explaining extraction of type regions. In the example of FIG. 5, the extraction unit 133 extracts region 52, in which "(1) Male" is printed in type, and region 53, in which "(2) Female" is printed in type, as type regions.

The selection unit 134 selects, from among the type regions, a region whose degree of overlap with the handwritten region satisfies a predetermined condition. For example, the selection unit 134 selects a type region at least part of which overlaps the handwritten region.

In the example of FIG. 5, region 51, a handwritten region, and region 52, a type region, partially overlap. Region 51 and region 53, another type region, do not overlap. The selection unit 134 therefore selects region 52 as the type region corresponding to region 51.

More than one type region may overlap the handwritten region. In that case, the selection unit 134 can select the type region whose overlap with the handwritten region has the largest area.

FIG. 6 is a diagram showing an example of items on a form. In the example of FIG. 6, the options "1 Regular," "2 Current," and "3 Tax preparation" are printed.

Consider the case where a circle is drawn by hand on the form shown in FIG. 6, at the position shown in FIG. 7. FIG. 7 is a diagram showing an example of overlapping regions.

The extraction unit 133 extracts region 54 as a handwritten region, and regions 55, 56, and 57 as type regions.

Region 58 is the region where region 54 and region 56 overlap. Region 59 is the region where region 54 and region 57 overlap. Because the area of region 59 is larger than the area of region 58, the selection unit 134 selects region 57, the type region corresponding to region 59.
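The largest-overlap rule can be sketched as follows. This is a minimal illustration that assumes each region is an axis-aligned rectangle in pixel coordinates, which the description does not state; `Box`, `overlap_area`, and `select_type_region` are hypothetical names, and the coordinates are made up to resemble the situation in FIG. 7.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned rectangle (an assumption; the text does not fix a shape)."""
    left: int
    top: int
    right: int
    bottom: int


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two boxes, or 0 if they do not overlap."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    return width * height if width > 0 and height > 0 else 0


def select_type_region(handwritten: Box, type_regions: list[Box]) -> Optional[int]:
    """Index of the type region with the largest overlap, or None if none overlaps."""
    best_index, best_area = None, 0
    for index, region in enumerate(type_regions):
        area = overlap_area(handwritten, region)
        if area > best_area:
            best_index, best_area = index, area
    return best_index


# Made-up coordinates: the circle clips the second option slightly
# and covers the third option entirely.
circle = Box(95, 0, 200, 40)
options = [Box(0, 5, 50, 35), Box(55, 5, 105, 35), Box(110, 5, 190, 35)]
selected = select_type_region(circle, options)  # index of the third option
```

Requiring a strictly positive area also covers the simpler condition of "at least partial overlap": a type region that does not touch the handwritten region is never selected.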

The recognition unit 135 recognizes the character string printed in the type region selected by the selection unit 134. In the example of FIG. 5, the recognition unit 135 recognizes the character string "(1) Male." In the example of FIG. 7, the recognition unit 135 recognizes the character string "3 Tax preparation."

The recognition unit 135 performs character recognition by referring to the dictionary information 123. The recognition unit 135 may additionally give priority to character strings that are included as options in the selection string information 124.

For example, suppose that, for an option of the item named "職業" (occupation), the recognition unit 135 calculates equal recognition confidence (probability) for "会社員" (company employee) and "会仕員" (a visually similar misreading that differs by one character).

Referring to the selection string information 124 shown in FIG. 3, "会社員" is included in the options for the item name "職業", but "会仕員" is not. In this case, the recognition unit 135 recognizes the character string as "会社員".

In this way, the recognition unit 135 recognizes combinations of an item name and an option that have been associated in advance.
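One way to picture this prioritization is a tie-break over scored candidates. The sketch below is an assumption for illustration: the description does not say how candidates or confidences are represented, and `pick_candidate` and its `tolerance` parameter are hypothetical.

```python
def pick_candidate(
    candidates: list[tuple[str, float]],
    options: list[str],
    tolerance: float = 1e-6,
) -> str:
    """Pick the best-scoring candidate, preferring registered options on ties.

    `candidates` is a non-empty list of (string, confidence) pairs from a
    hypothetical OCR engine; confidences within `tolerance` of the best
    score are treated as equal.
    """
    best_score = max(score for _, score in candidates)
    tied = [text for text, score in candidates if best_score - score <= tolerance]
    for text in tied:
        if text in options:
            return text  # a registered option wins among equally likely strings
    return tied[0]


# The example from the text: equal confidence, only one string is registered.
result = pick_candidate([("会仕員", 0.5), ("会社員", 0.5)], ["会社員"])
```

In this sketch the option list only breaks ties; a clearly higher-scoring string that is not registered is still returned, so options that were never registered remain recognizable.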

The correction unit 136 removes character strings designated in advance from the character string recognized by the recognition unit 135. For example, the correction unit 136 removes "(1)" from "(1) Male." That is, the correction unit 136 deletes information designated in advance, such as item numbers.

An administrator can enable or disable the correction function of the correction unit 136 at will.
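A correction step of this kind might look like the sketch below. The regular expression is an assumed definition of an "item number" (a leading number, optionally in half- or full-width parentheses or followed by a period); the description only says that the strings to remove are designated in advance. `ITEM_NUMBER` and `correct` are hypothetical names.

```python
import re

# Assumed pattern: optional opening parenthesis, digits, optional closing
# parenthesis, optional period, then any spaces. In Python 3, \d also
# matches full-width digits such as "１".
ITEM_NUMBER = re.compile(r"^\s*[\(（]?\d+[\)）]?[\.．]?\s*")


def correct(text: str, enabled: bool = True) -> str:
    """Strip a leading item number when the correction function is enabled."""
    return ITEM_NUMBER.sub("", text, count=1) if enabled else text
```

The `enabled` flag mirrors the administrator switch. A pattern this broad would also strip digits from an option that legitimately begins with a number, which is one reason the function may need to be turned off for some forms.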

The output control unit 137 outputs the recognition result of the recognition unit 135 in a predetermined format. For example, the output control unit 137 may output the item name combined with the recognized option, such as "Gender: Male."

FIG. 8 is a flowchart showing the flow of the recognition process. As shown in FIG. 8, the recognition device 10 first reads the form (step S101). Next, the recognition device 10 identifies the position of an item on the form (step S102).

The recognition device 10 then uses the handwritten region extraction model to extract the handwritten region around the item (step S103), and uses the type region extraction model to extract the type regions around the item (step S104).

Next, the recognition device 10 selects the type region corresponding to the handwritten region (step S105). For example, the recognition device 10 selects the type region whose overlap with the handwritten region has the largest area.

The recognition device 10 then recognizes the character string printed in the selected type region (step S106) and corrects the character string (step S107). Step S107 may be omitted depending on the settings. Finally, the recognition device 10 outputs the recognized character string (step S108).
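The flow of steps S102 to S107 can be summarized as a small driver function. This is only a sketch: every stage is passed in as a callable because the description does not define programming interfaces for the units, and all parameter names are hypothetical.

```python
from typing import Callable, Optional, Sequence


def recognize_selection(
    image: object,
    locate_item: Callable[[object], object],                            # S102
    extract_handwritten: Callable[[object, object], Sequence[object]],  # S103
    extract_type: Callable[[object, object], Sequence[object]],         # S104
    select: Callable[[object, Sequence[object]], Optional[int]],        # S105
    read_text: Callable[[object, object], str],                         # S106
    correct: Optional[Callable[[str], str]] = None,                     # S107
) -> Optional[str]:
    """Run the stages in order; return None when no option was selected."""
    position = locate_item(image)
    handwritten = extract_handwritten(image, position)
    type_regions = extract_type(image, position)
    if not handwritten or not type_regions:
        return None
    # Simplification: only the first handwritten region is considered.
    index = select(handwritten[0], type_regions)
    if index is None:
        return None
    text = read_text(image, type_regions[index])
    # S107 is skipped when no correction function is supplied.
    return correct(text) if correct else text
```

Reading the form (S101) and outputting the result (S108) are left to the caller, since they depend on the scanner and on the output format in use.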

As described above, the extraction unit 133 extracts, from the form, a handwritten region in which a handwritten entry has been made, and extracts, from the form, type regions in which type is printed. The selection unit 134 selects, from among the type regions, a region whose degree of overlap with the handwritten region satisfies a predetermined condition. The recognition device 10 can thus associate a handwritten region with a type region and select it without the character strings and figures to be recognized being designated in advance. As a result, according to this embodiment, the selected character string on a form can be recognized easily and in a general-purpose manner.

The extraction unit 133 extracts the handwritten region using an image recognition model that has learned the features of handwritten figures. Training the image recognition model on the features of handwriting makes it possible to extract regions even for handwritten figures with imperfect shapes.

The selection unit 134 selects the type region whose overlap with the handwritten region has the largest area. This allows the recognition device 10 to identify the selected option quantitatively.

The recognition unit 135 recognizes the character string printed in the region selected by the selection unit 134. The correction unit 136 removes character strings designated in advance from the character string recognized by the recognition unit 135. This allows the recognition device 10 to remove unnecessary information from the recognition result and keep only the important information.

The recognition unit 135 recognizes combinations of an item name and an option that have been associated in advance. This allows the recognition device 10 to give priority to character strings intended as options for an item.

The processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed as desired unless otherwise specified. The specific examples, distributions, numerical values, and the like described in the embodiments are merely examples and may be changed as desired.

The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to what is illustrated. All or part of them can be functionally or physically distributed or integrated in any units according to various loads, usage conditions, and the like. Furthermore, all or any part of the processing functions performed by each device can be realized by a CPU and a program analyzed and executed by that CPU, or as hardware using wired logic.

FIG. 9 is a diagram illustrating an example of a hardware configuration. As shown in FIG. 9, the recognition device 10 includes a communication interface 10a, an HDD (Hard Disk Drive) 10b, a memory 10c, and a processor 10d. The units shown in FIG. 9 are connected to one another by a bus or the like.

The communication interface 10a is a network interface card or the like, and communicates with other servers. The HDD 10b stores the programs and DBs that operate the functions shown in FIG. 2.

The processor 10d is a hardware circuit that reads, from the HDD 10b or the like, a program that executes the same processing as the processing units shown in FIG. 2, loads it into the memory 10c, and thereby runs a process that executes the functions described with reference to FIG. 2 and elsewhere. In other words, this process executes the same functions as the processing units of the recognition device 10.

Specifically, the processor 10d reads, from the HDD 10b or the like, a program having the same functions as the scanner control unit 131, the analysis unit 132, the extraction unit 133, the selection unit 134, the recognition unit 135, the correction unit 136, and the output control unit 137. The processor 10d then runs a process that executes the same processing as these units.

In this way, the recognition device 10 operates as an information processing device that executes the recognition method by reading and executing the program. The recognition device 10 can also realize the same functions as the above embodiment by reading the program from a recording medium with a medium reading device and executing the program thus read. The program referred to in this other embodiment is not limited to being executed by the recognition device 10. For example, the present invention can be applied in the same way when another computer or server executes the program, or when they execute the program in cooperation.

The program can be distributed via a network such as the Internet. The program can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO (Magneto-Optical disk), or a DVD (Digital Versatile Disc), and executed by being read from the recording medium by a computer.

Reference Signs List

10 Recognition device
11 IF unit
12 Storage unit
13 Control unit
51, 52, 53, 54, 55, 56, 57, 58, 59 Region
121 Handwritten region extraction model information
122 Type region extraction model information
123 Dictionary information
124 Selection string information
131 Scanner control unit
132 Analysis unit
133 Extraction unit
134 Selection unit
135 Recognition unit
136 Correction unit
137 Output control unit

Claims

1. A recognition method characterized in that a computer executes processing of:
extracting, from a form, a first region in which a handwritten entry has been made;
extracting, from the form, second regions in which type is printed;
selecting, from among the second regions, a region whose degree of overlap with the first region satisfies a predetermined condition; and
recognizing a character string that is printed in the region selected by the selecting and that is included in options associated in advance with an item name.

2. The recognition method according to claim 1, characterized in that the extracting of the first region extracts the first region using an image recognition model that has learned the features of handwritten figures.

3. The recognition method according to claim 1 or 2, characterized in that the selecting selects the second region whose overlap with the first region has the largest area.

4. The recognition method according to claim 1, characterized in that the computer further executes processing of excluding a character string designated in advance from the character string recognized by the recognizing.

5. A recognition device comprising:
a handwritten region extraction unit that extracts, from a form, a first region in which a handwritten entry has been made;
a type region extraction unit that extracts, from the form, second regions in which type is printed;
a selection unit that selects, from among the second regions, a region whose degree of overlap with the first region satisfies a predetermined condition; and
a recognition unit that recognizes a character string that is printed in the region selected by the selection unit and that is included in options associated in advance with an item name.