JP7516170B2

JP7516170B2 - Image processing device, image processing method, and program

Info

Publication number: JP7516170B2
Application number: JP2020148383A
Authority: JP
Inventors: 崇宮内
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-03-12
Filing date: 2020-09-03
Publication date: 2024-07-16
Anticipated expiration: 2040-09-03
Also published as: JP2021144673A

Description

本開示は、画像に含まれるインデックスを抽出する技術に関する。 This disclosure relates to a technique for extracting indexes contained in an image.

帳票等の紙文書を画像読み取り装置でスキャンすることにより得られたスキャン画像に含まれる所望の項目の文字列（以下、インデックスという）を抽出する方法がある。文書の内容からインデックスを抽出するには、ＯＣＲ処理が必要となる。しかし、スキャン画像全体に対してＯＣＲ処理を実行すると処理負荷が増し、ユーザの待ち時間の増加することがある。 There is a method for extracting character strings (hereafter referred to as indexes) of desired items contained in a scanned image obtained by scanning a paper document such as a form with an image reading device. OCR processing is required to extract an index from the contents of the document. However, performing OCR processing on the entire scanned image increases the processing load, and this can lead to increased waiting times for users.

特許文献１には、文書の種類ごとにインデックスが含まれる領域の情報を予め登録し、登録されているインデックスの領域に対して部分的にＯＣＲ処理を行い、スキャン画像からインデックスを抽出する方法が開示されている。 Patent document 1 discloses a method of registering in advance information about areas that contain indexes for each type of document, and then performing OCR processing partially on the registered index areas to extract the index from the scanned image.

特開２０１９－１２８７１５号公報JP 2019-128715 A

しかしながら、同じ種類の文書であっても、記載される内容によってインデックスが含まれる文字列領域（以下、テキストブロックという）の位置がずれていることがある。このため、登録されているインデックスの領域に対して部分的にＯＣＲ処理を行っても、インデックスの抽出に失敗してしまうことがある。 However, even for documents of the same type, the position of the character string area (hereafter referred to as the text block) containing the index may differ depending on the content written. For this reason, even if OCR processing is performed partially on the registered index area, extraction of the index may fail.

本開示の技術は、スキャン画像のテキストブロックの位置が、登録されている位置とずれている場合であっても、抽出対象のインデックスを抽出することを目的とする。 The technology disclosed herein aims to extract the index of the extraction target even if the position of the text block in the scanned image is different from the registered position.

本開示の画像処理装置は、入力画像におけるテキストブロックを検出する検出手段と、複数の登録文書の中から、前記入力画像に対応する登録文書を特定する特定手段と、前記特定された登録文書において規定されている、処理対象の項目に対応する第１のテキストブロックと前記第１のテキストブロックの近傍に存在する少なくとも１つの第２のテキストブロックとを含む部分レイアウトに基づき、前記入力画像における前記処理対象の項目に対応するテキストブロックの決定をする決定手段と、前記決定されたテキストブロックに対して文字認識処理を行うことにより、前記入力画像における前記処理対象の項目に対応する文字列を取得する取得手段と、を有することを特徴とする。 The image processing device disclosed herein is characterized by having a detection means for detecting text blocks in an input image, an identification means for identifying a registered document corresponding to the input image from among a plurality of registered documents, a determination means for determining a text block corresponding to the item to be processed in the input image based on a partial layout defined in the identified registered document, the partial layout including a first text block corresponding to the item to be processed and at least one second text block existing in the vicinity of the first text block, and an acquisition means for acquiring a character string corresponding to the item to be processed in the input image by performing character recognition processing on the determined text block.

本開示の技術によれば、スキャン画像のテキストブロックの位置が登録されている文書と異なる場合であっても、抽出対象のインデックスを抽出することができる。 According to the technology of this disclosure, the index to be extracted can be extracted even when the position of a text block in the scanned image differs from its position in the registered document.

システムの構成例を示す図である。 FIG. 1 is a diagram illustrating an example of a system configuration.
画像形成装置のハードウェア構成例を示す図である。 FIG. 2 is a diagram illustrating an example of a hardware configuration of an image forming apparatus.
画像形成装置の機能構成を示す図である。 FIG. 3 is a diagram illustrating a functional configuration of the image forming apparatus.
スキャン画像のファイル生成処理のフローチャートである。 FIG. 4 is a flowchart of a file generation process of a scanned image.
インデックス抽出処理のフローチャートである。 FIG. 5 is a flowchart of an index extraction process.
ブロックセレクション処理の例を示す図である。 FIG. 6 is a diagram illustrating an example of a block selection process.
インデックス抽出ルールの例を示す図である。 FIG. 7 is a diagram illustrating an example of an index extraction rule.
インデックスブロック推定処理のフローチャートである。 A flowchart of an index block estimation process.
ペアブロックの決定方法を説明する図である。 A diagram for explaining a method for determining pair blocks.
部分パターンの例を示す図である。 A diagram showing an example of a partial pattern.
Ｙ候補位置の決定処理を説明する図である。 A diagram illustrating a process of determining a Y candidate position.
Ｙ方向のシフト量のヒストグラムの例を示す図である。 A diagram illustrating an example of a histogram of shift amounts in the Y direction.
部分パターンの一致度の算出を説明する図である。 A diagram illustrating calculation of a degree of match of a partial pattern.
部分パターンの一致度の算出を説明する図である。 A diagram illustrating calculation of a degree of match of a partial pattern.
部分パターン範囲の決定方法を説明する図である。 A diagram for explaining a method for determining a partial pattern range.
インデックスブロック推定処理のフローチャートである。 A flowchart of an index block estimation process.
部分パターンの例を示す図である。 A diagram showing an example of a partial pattern.
ＸＹ候補位置群の例を示す図である。 A diagram showing an example of a group of XY candidate positions.
類似位置群の例を示す図である。 A diagram showing an example of a group of similar positions.
類似位置群とＸＹ候補位置群の対応付けを説明する図である。 A diagram for explaining correspondence between a group of similar positions and a group of XY candidate positions.

以下、添付図面を参照して実施形態を詳しく説明する。なお、以下の実施形態は特許請求の範囲に係る本開示の技術を限定するものでなく、また本実施形態で説明されている特徴の組み合わせの全てが本開示の技術の解決手段に必須のものとは限らない。 The following embodiments are described in detail with reference to the attached drawings. Note that the following embodiments do not limit the disclosed technology according to the claims, and not all of the combinations of features described in the embodiments are necessarily essential to the solution of the disclosed technology.

＜実施形態１＞ <Embodiment 1>
本実施形態の画像形成装置は、文書原稿をスキャンして、得られたスキャン画像の先頭ページの画像に含まれる所定の項目の文字列を組み合わせてファイル名を生成する。そして生成したファイル名をそのスキャン画像のファイル名としてユーザにレコメンドする。しかしながら、スキャン画像から所定の項目の文字列を抽出するには処理負荷が増すことがある。 The image forming apparatus of this embodiment scans a document and generates a file name by combining the character strings of predetermined items contained in the first-page image of the resulting scanned image. The generated file name is then recommended to the user as the file name of the scanned image. However, extracting the character strings of predetermined items from the scanned image can increase the processing load.

このため、文書の種類ごとに所定の項目のテキストブロックの位置情報を登録しておく。そしてスキャン画像の文書の種類を特定して、特定された文書における登録されたテキストブロックの位置に基づき、スキャン画像から所定の項目の文字列を抽出することが考えられる。しかしながらこの場合も、同じ文書の種類であっても、記載内容の変更等によりスキャンされた画像におけるテキストブロックの位置は登録されている位置と異なってしまうことがある。 For this reason, position information for text blocks of specified items is registered for each document type. It is then possible to identify the document type of the scanned image and extract character strings of specified items from the scanned image based on the positions of the registered text blocks in the identified document. However, even in this case, even for the same document type, the positions of text blocks in the scanned image may differ from the registered positions due to changes in the written content, etc.

例えば、図１１（ａ）の文書が登録されており、テキストブロック１００３の位置を示す情報が発行元会社名を示す文字列が含まれる領域の情報として登録されているものとする。一方、図１１（ｂ）は、図１１（ａ）と同じ種類の文書をスキャンして得られたスキャン画像であるが、表構造内の項目行数が増えており、抽出されるべき発行元会社名のテキストブロック１１０１が、図１１（ａ）と比較して下方向にシフトしている。このため図１１（ｂ）のスキャン画像を得るためにスキャンされた文書が図１１（ａ）と同じ種類であると特定できても、図１１（ｂ）の画像の発行元会社名を示す文字列の抽出に失敗することがある。なお、図１１（ｃ）の説明については後述する。 For example, assume that the document in FIG. 11(a) is registered, and information indicating the position of text block 1003 is registered as information on an area containing a character string indicating the issuing company name. Meanwhile, FIG. 11(b) is a scanned image obtained by scanning the same type of document as FIG. 11(a), but the number of item rows in the table structure has increased, and the text block 1101 of the issuing company name to be extracted has shifted downward compared to FIG. 11(a). For this reason, even if the document scanned to obtain the scanned image in FIG. 11(b) can be identified as being of the same type as FIG. 11(a), extraction of the character string indicating the issuing company name from the image in FIG. 11(b) may fail. Note that FIG. 11(c) will be explained later.

このため実施形態では、スキャン画像に含まれる項目のテキストブロックを抽出するために、スキャンされた文書原稿と同じ種類の文書における項目を示すテキストブロックと、それ以外の少なくとも１つのテキストブロックとのレイアウトを用いる。本実施形態では、そのレイアウトとの一致度が高い領域をスキャン画像から探索して、探索された結果に基づきスキャン画像に含まれる項目のテキストブロックを推定する方法を説明する。 Therefore, in this embodiment, in order to extract text blocks of items included in a scanned image, a layout of text blocks indicating items in a document of the same type as the scanned document manuscript and at least one other text block is used. This embodiment describes a method of searching the scanned image for an area that has a high degree of match with the layout, and estimating text blocks of items included in the scanned image based on the search results.

なお、本実施形態では、画像内の座標は例えば、原点が左上で、縦方向がＹ方向、文字列が連続する横方向がＸ方向に延びる座標系が用いられる。テキストブロックの位置は、例えば、左上座標値が夫々の位置として保持される。 In this embodiment, the coordinate system used for the image has an origin at the top left, a vertical direction in the Y direction, and a horizontal direction in which a string of characters extends in the X direction. The position of each text block is stored as the top left coordinate value, for example.
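The coordinate convention above can be illustrated with a small data structure. This is only a sketch: the `TextBlock` class, its field names, and the sample coordinates are assumptions made for illustration and do not appear in the disclosure. It also shows the kind of downward shift described for FIG. 11(b), where a fixed registered position no longer matches the detected block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """A text block in image coordinates (origin at the top left)."""
    x: int       # left edge; X grows to the right, along the text direction
    y: int       # top edge; Y grows downward
    width: int
    height: int

    @property
    def position(self):
        """Top-left coordinate, the value held as the block position."""
        return (self.x, self.y)


def y_shift(registered, detected):
    """Vertical offset of a detected block from its registered position."""
    return detected.y - registered.y


# Made-up coordinates: the same item, pushed down by extra table rows.
registered = TextBlock(x=120, y=400, width=200, height=30)
detected = TextBlock(x=120, y=460, width=200, height=30)
print(y_shift(registered, detected))  # 60
```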

［システム構成］ [System configuration]
図１は、本実施形態を適用可能なシステムの全体構成を示す図である。本実施形態のシステム１０５は、画像形成装置１００および端末１０１を有する。図１に示すように、画像形成装置１００はＬＡＮ１０２に接続され、Ｉｎｔｅｒｎｅｔ１０３等を介してＰＣなどの端末１０１等と通信可能になっている。なお、本実施形態においては、端末１０１は無くてもよく、画像形成装置１００のみの構成だけでもよい。 Fig. 1 is a diagram showing the overall configuration of a system to which this embodiment can be applied. The system 105 of this embodiment includes an image forming apparatus 100 and a terminal 101. As shown in Fig. 1, the image forming apparatus 100 is connected to a LAN 102 and can communicate with a terminal 101 such as a PC via the Internet 103 or the like. Note that in this embodiment the terminal 101 may be omitted, and the system may consist of the image forming apparatus 100 alone.

画像形成装置１００は、表示・操作部１２３（図２参照）、スキャナ部１２２（図２参照）及び、プリンタ部１２１（図２参照）等を有する複合機（ＭＦＰ）である。画像形成装置１００は、スキャナ部１２２を用いて文書原稿をスキャンするスキャン端末として利用することが可能である。また、タッチパネルやハードボタンなどの表示・操作部１２３を有し、ファイル名や格納先のレコメンド結果を表示したり、ユーザからの指示を受け付けたりするためのユーザインタフェースの表示を行う。 The image forming device 100 is a multifunction peripheral (MFP) having a display and operation unit 123 (see FIG. 2), a scanner unit 122 (see FIG. 2), a printer unit 121 (see FIG. 2), and the like. The image forming device 100 can be used as a scanning terminal that scans document manuscripts using the scanner unit 122. The image forming device 100 also has a display and operation unit 123 such as a touch panel or hard buttons, and displays a user interface for displaying recommended file names and storage destinations, and for receiving instructions from the user.

［画像形成装置のハードウェア構成］ [Hardware Configuration of Image Forming Apparatus]
図２は、画像形成装置１００のハードウェア構成を示すブロック図である。本実施形態の画像形成装置１００は、表示・操作部１２３、スキャナ部１２２、プリンタ部１２１、及び制御部１１０を有する。 Fig. 2 is a block diagram showing the hardware configuration of the image forming apparatus 100. The image forming apparatus 100 of this embodiment has a display and operation unit 123, a scanner unit 122, a printer unit 121, and a control unit 110.

制御部１１０は、ＣＰＵ１１１、記憶装置１１２（ＲＯＭ１１８，ＲＡＭ１１９，ＨＤＤ１２０）、プリンタＩ／Ｆ部１１３、ネットワークＩ／Ｆ部１１４、スキャナＩ／Ｆ部１１５、表示・操作Ｉ／Ｆ部１１６を有する。また、制御部１１０ではこの各部がシステムバス１１７を介して互いに通信可能に接続されている。制御部１１０は、画像形成装置１００全体の動作を制御する。 The control unit 110 has a CPU 111, a storage device 112 (ROM 118, RAM 119, HDD 120), a printer I/F unit 113, a network I/F unit 114, a scanner I/F unit 115, and a display and operation I/F unit 116. Furthermore, in the control unit 110, each of these units is connected to each other so that they can communicate with each other via a system bus 117. The control unit 110 controls the operation of the entire image forming apparatus 100.

ＣＰＵ１１１は、記憶装置１１２に記憶された制御プログラムを読み出し実行することにより、後述のフローチャートにおける読取制御や画像処理、表示制御などの各処理を実行する手段として機能する。 The CPU 111 reads and executes the control programs stored in the storage device 112, thereby functioning as a means for executing various processes such as reading control, image processing, and display control in the flowcharts described below.

記憶装置１１２は、制御プログラム、画像データ、メタデータ、設定データ及び、処理結果データ等を格納し保持する。記憶装置１１２には、不揮発性メモリであるＲＯＭ１１８、揮発性メモリであるＲＡＭ１１９及び、大容量記憶領域であるＨＤＤ１２０などがある。ＲＯＭ１１８は、制御プログラムなどを保持する不揮発性メモリであり、ＣＰＵ１１１はその制御プログラムを読み出し制御を行う。ＲＡＭ１１９は、ＣＰＵ１１１の主メモリ、ワークエリア等の一時記憶領域として用いられる揮発性メモリである。 The storage device 112 stores and holds control programs, image data, metadata, setting data, processing result data, and the like. The storage device 112 includes a ROM 118, which is a non-volatile memory, a RAM 119, which is a volatile memory, and an HDD 120, which is a large-capacity storage area. The ROM 118 is a non-volatile memory that holds the control programs and the like; the CPU 111 reads out these control programs and performs control according to them. The RAM 119 is a volatile memory used as a temporary storage area, such as the main memory and work area of the CPU 111.

ネットワークＩ／Ｆ部１１４は、制御部１１０（画像形成装置１００）を、システムバス１１７を介してＬＡＮ１０２に接続する。ネットワークＩ／Ｆ部１１４は、ＬＡＮ１０２上の外部装置に画像データを送信したり、ＬＡＮ１０２上の外部装置から各種情報を受信したりする。 The network I/F unit 114 connects the control unit 110 (image forming device 100) to the LAN 102 via the system bus 117. The network I/F unit 114 transmits image data to external devices on the LAN 102 and receives various information from external devices on the LAN 102.

スキャナＩ／Ｆ部１１５は、スキャナ部１２２と制御部１１０とを、システムバス１１７を介して接続する。スキャナ部１２２は、文書原稿を読み取ってスキャン画像データを生成し、スキャナＩ／Ｆ部１１５を介してスキャン画像データを制御部１１０に入力する。なお、スキャナ部１２２は、原稿フィーダを備え、トレイに置かれた複数の原稿を１枚ずつフィードして、連続的に読み取ることを可能とする。 The scanner I/F unit 115 connects the scanner unit 122 and the control unit 110 via the system bus 117. The scanner unit 122 reads a document original to generate scanned image data, and inputs the scanned image data to the control unit 110 via the scanner I/F unit 115. The scanner unit 122 is equipped with a document feeder, and is capable of feeding multiple documents placed on a tray one by one, enabling them to be read continuously.

表示・操作Ｉ／Ｆ部１１６は、表示・操作部１２３と制御部１１０とを、システムバス１１７を介して接続する。表示・操作部１２３には、タッチパネル機能を有する液晶表示部、ハードボタンなどが備えられている。 The display and operation I/F unit 116 connects the display and operation unit 123 and the control unit 110 via the system bus 117. The display and operation unit 123 is equipped with a liquid crystal display unit with a touch panel function, hard buttons, etc.

プリンタＩ／Ｆ部１１３は、プリンタ部１２１と制御部１１０とを、システムバス１１７を介して接続する。プリンタ部１２１は、ＣＰＵ１１１で生成された画像データをプリンタＩ／Ｆ部１１３を介して受信し、当該受信した画像データを用いて記録紙へのプリント処理が行われる。以上のように、本実施形態に係る画像形成装置１００では、上記のハードウェア構成によって、画像処理機能を提供することが可能である。 The printer I/F unit 113 connects the printer unit 121 and the control unit 110 via the system bus 117. The printer unit 121 receives image data generated by the CPU 111 via the printer I/F unit 113, and performs printing processing on recording paper using the received image data. As described above, the image forming device 100 according to this embodiment is capable of providing an image processing function with the above hardware configuration.

［画像形成装置の機能構成］ [Functional Configuration of Image Forming Apparatus]
図３は、画像形成装置１００の機能構成を示すブロック図である。なお、図３では画像形成装置１００が有する諸機能のうち、文書原稿をスキャンして電子化（ファイル化）し、保存を行うまでの処理に関わる機能に絞った機能を示す。 Fig. 3 is a block diagram showing the functional configuration of the image forming apparatus 100. Note that, of the various functions of the image forming apparatus 100, Fig. 3 shows only those related to the process of scanning a document original, digitizing it (making it into a file), and storing it.

表示制御部３０１は、表示・操作部１２３のタッチパネルに、各種のユーザ操作を受け付けるためのユーザインタフェース画面（ＵＩ画面）を表示する。各種のユーザ操作には、例えば、スキャン設定、スキャンの開始指示、ファイル名設定、ファイルの保存指示などがある。 The display control unit 301 displays a user interface screen (UI screen) for receiving various user operations on the touch panel of the display/operation unit 123. The various user operations include, for example, scan settings, instructions to start scanning, file name settings, and instructions to save a file.

スキャン制御部３０２は、ＵＩ画面でなされたユーザ操作（例えば「スキャン開始」ボタンの押下）に応じて、スキャン設定の情報と共にスキャン実行部３０３に対しスキャン処理の実行を指示する。スキャン実行部３０３は、スキャン制御部３０２からのスキャン処理の実行指示に従い、スキャナＩ／Ｆ部１１５を介してスキャナ部１２２に文書原稿の読み取り動作を実行させ、スキャン画像データを生成する。生成したスキャン画像データは、スキャン画像管理部３０４によってＨＤＤ１２０に保存される。 In response to a user operation performed on the UI screen (e.g., pressing the "Start Scan" button), the scan control unit 302 instructs the scan execution unit 303 to execute a scan process together with scan setting information. In accordance with the instruction to execute a scan process from the scan control unit 302, the scan execution unit 303 causes the scanner unit 122 to execute a read operation of the document original via the scanner I/F unit 115, and generates scan image data. The generated scan image data is stored in the HDD 120 by the scan image management unit 304.

画像処理部３０５は、スキャン画像データに対して、テキストブロックの検出処理、ＯＣＲ処理（文字認識処理）、類似文書の判定処理といった画像解析処理の他、回転や傾き補正といった画像加工処理を行う。画像処理部３０５によって、画像形成装置１００は画像処理装置としても機能する。スキャン画像から検出される文字列領域は「テキストブロック」とも呼ばれる。なお画像処理の詳細については後述する。 The image processing unit 305 performs image analysis processes, such as text block detection, OCR (character recognition), and similar document determination, on the scanned image data, as well as image processing processes such as rotation and tilt correction. The image processing unit 305 also allows the image forming device 100 to function as an image processing device. Character string areas detected from the scanned image are also called "text blocks." Details of the image processing will be described later.

図３の各部の機能は、画像形成装置１００のＣＰＵがＲＯＭに記憶されているプログラムコードをＲＡＭに展開し実行することにより実現される。または、図３の各部の一部または全部の機能をＡＳＩＣや電子回路等のハードウェアで実現してもよい。 The functions of each part in FIG. 3 are realized by the CPU of the image forming device 100 expanding program code stored in the ROM into the RAM and executing it. Alternatively, some or all of the functions of each part in FIG. 3 may be realized by hardware such as an ASIC or electronic circuit.

［スキャン画像のファイル生成処理のフローチャート］ [Flowchart of scanned image file generation process]
画像形成装置１００が文書原稿を読み取り、文書原稿の先頭ページのスキャン画像に対して画像処理を行い、スキャン画像に含まれる文字列を利用してファイル名を生成し、表示・操作部１２３を通じてユーザにレコメンドする処理の全体について説明する。 This section describes the entire process in which the image forming device 100 reads a document original, performs image processing on the scanned image of the first page of the document original, generates a file name using character strings contained in the scanned image, and recommends the file name to the user via the display and operation unit 123.

図４のフローチャートで示される一連の処理は、画像形成装置１００のＣＰＵがＲＯＭに記憶されているプログラムコードをＲＡＭに展開し実行することにより行われる。また、図４におけるステップの一部または全部の機能をＡＳＩＣや電子回路等のハードウェアで実現してもよい。なお、各処理の説明における記号「Ｓ」は、当該フローチャートにおけるステップであることを意味し、以後のフローチャートにおいても同様とする。 The series of processes shown in the flowchart of FIG. 4 are performed by the CPU of the image forming device 100 by expanding the program code stored in the ROM into the RAM and executing it. In addition, some or all of the functions of the steps in FIG. 4 may be realized by hardware such as an ASIC or electronic circuit. Note that the symbol "S" in the explanation of each process indicates a step in the flowchart, and the same applies to subsequent flowcharts.

Ｓ４００においてスキャン制御部３０２は、表示・操作部１２３を介してユーザのスキャン指示を受け付けると、スキャン実行部３０３に、スキャナ部１２２の原稿フィーダのトレイから複数の文書原稿を１枚ずつ読み取り（スキャン）を実行させる。そして、スキャン制御部３０２は、スキャンの結果得られた画像（スキャン画像とよぶ）の画像データを取得する。 In S400, when the scan control unit 302 receives a scan instruction from the user via the display and operation unit 123, it causes the scan execution unit 303 to read (scan) multiple document originals one by one from the document feeder tray of the scanner unit 122. Then, the scan control unit 302 obtains image data of the image obtained as a result of the scan (called a scanned image).

Ｓ４０１において画像処理部３０５は、Ｓ４００で取得した画像データを解析し、スキャン画像に含まれるインデックスを抽出する処理（インデックス抽出処理）を行う。「インデックス」とは、文書のタイトル、管理ナンバー、会社名などの所定の項目の文字列である。本実施形態ではインデックスは、スキャン画像を保存する際のファイル名またはメタデータとして使用される。本ステップのインデックス抽出処理の詳細については、図５を用いて後述する。 In S401, the image processing unit 305 analyzes the image data acquired in S400 and performs processing to extract indexes contained in the scanned image (index extraction processing). An "index" is a character string of a specific item such as a document title, management number, or company name. In this embodiment, the index is used as a file name or metadata when saving the scanned image. Details of the index extraction processing in this step will be described later with reference to FIG. 5.

インデックスの使用方法はファイル名の生成またはメタデータの抽出に限られない。フォルダパスなどの他のプロパティ情報を設定するために用いられてもよい。つまり、ファイル名およびメタデータは、スキャン画像データに関するプロパティ（属性）として設定される情報の一種である。 The use of the index is not limited to generating file names or extracting metadata. It may also be used to set other property information such as folder paths. In other words, file names and metadata are types of information that are set as properties (attributes) related to scanned image data.

Ｓ４０２において表示制御部３０１は、Ｓ４０１で抽出されたインデックスを用いてファイル名を生成し、生成されたファイル名およびメタデータを、表示・操作部１２３に表示させてユーザに提示（レコメンド）する。また、表示制御部３０１は、ユーザによる確認または提示したファイル名の修正を受け付ける。表示制御部３０１は表示・操作部１２３を介してユーザから確認または修正を受け付けると、提示したファイル名または修正された場合は修正後のファイル名がスキャン画像のファイル名として決定される。ユーザが表示・操作部１２３を介して修正した場合は、インデックス抽出ルールが更新される。インデックス抽出ルールについては後述する。 In S402, the display control unit 301 generates a file name using the indexes extracted in S401 and displays the generated file name and metadata on the display and operation unit 123 to present (recommend) them to the user. The display control unit 301 then accepts the user's confirmation or correction of the presented file name via the display and operation unit 123. If the user confirms, the presented file name is determined as the file name of the scanned image; if the user makes a correction, the corrected file name is used instead, and the index extraction rule is updated. The index extraction rule will be described later.

Ｓ４０３において画像処理部３０５は、Ｓ４００で取得した画像データからファイルを作成し、Ｓ４０２で決定されたファイル名を設定する。本実施形態では、一例として、ファイル形式としてＰＤＦ（ＰｏｒｔａｂｌｅＤｏｃｕｍｅｎｔＦｏｒｍａｔ）化してスキャン画像を保存するものとして説明する。ＰＤＦの場合には、画像データをページに分け保存することが可能であり、Ｓ４００において複数の文書原稿をスキャンした場合には、各文書原稿に対応する画像データを別々のページとして１つのファイルに保存される。 In S403, the image processing unit 305 creates a file from the image data acquired in S400, and sets the file name determined in S402. In this embodiment, as an example, the scanned image is saved in a file format of PDF (Portable Document Format). In the case of PDF, it is possible to save the image data divided into pages, and when multiple document manuscripts are scanned in S400, the image data corresponding to each document manuscript is saved as separate pages in a single file.

Ｓ４０４においてスキャン画像管理部３０４は、Ｓ４０３で作成したファイルを、ＬＡＮ１０２を通じて所定の送信先に送信する。 In S404, the scan image management unit 304 transmits the file created in S403 to a specified destination via the LAN 102.

［インデックス抽出処理（Ｓ４０１）について］ [Regarding index extraction process (S401)]
図５は、Ｓ４０１のインデックス抽出処理の詳細を示すフローチャートである。インデックス抽出処理の詳細について図５を用いて説明する。インデックス抽出処理では、画像データの１ページに対して、向きの補正を行い、文書の種類を特定し、文書の種類に応じたインデックス抽出を行う処理を行う。 Fig. 5 is a flowchart showing the details of the index extraction process in S401. The details of the index extraction process will be described with reference to Fig. 5. In the index extraction process, the orientation of one page of image data is corrected, the document type is identified, and indexes are extracted according to the document type.

Ｓ５００において画像処理部３０５は、画像データからスキャン画像の傾きの角度を検出し、検出した傾きだけ逆方向に画像を回転することでスキャン画像の傾きを補正する。傾き補正の対象となる傾きは、例えば、文書原稿のスキャン時にスキャナ部１２２の原稿フィーダ内のローラの摩耗などが原因でまっすぐに文書原稿が読み取られないことで発生する。または、スキャンされた文書原稿が印刷時にまっすぐ印刷されなかったために発生する。 In S500, the image processing unit 305 detects the angle of inclination of the scanned image from the image data, and corrects the inclination of the scanned image by rotating the image in the opposite direction by the detected inclination. The inclination that is the subject of inclination correction occurs, for example, when a document is scanned and not read straight due to wear of the rollers in the document feeder of the scanner unit 122. Or, it occurs when the scanned document is not printed straight when it is printed.

傾きの角度の検出方法として、まず、画像データ内に含まれるオブジェクトを検出し、水平方向あるいは鉛直方向に隣り合うオブジェクト群を連結する。そして、連結されたオブジェクト群の中心位置を結んだ角度が、水平方向または鉛直方向からどれだけ傾いているかを導出して傾きを求める。なお、傾きの検出方法はこの方法に限られない。他にも例えば、画像データ内に含まれるオブジェクトの中心座標を取得し、０．１度単位で中心座標群を回転させて、中心座標群が水平方向あるいは垂直方向に並ぶ割合がもっとも高い角度をスキャン画像の傾きとして求める方法でもよい。スキャン画像の傾きを補正することによって、以降に行われる、回転補正、ブロックセレクション処理、およびＯＣＲ処理のそれぞれの処理精度を上げることができる。 The method of detecting the angle of inclination is to first detect objects contained in the image data and connect adjacent objects in the horizontal or vertical direction. Then, the angle connecting the center positions of the connected objects is calculated to determine the degree of inclination from the horizontal or vertical direction. Note that the method of detecting inclination is not limited to this method. For example, a method may be used in which the center coordinates of objects contained in the image data are obtained, the center coordinates are rotated in 0.1 degree increments, and the angle at which the center coordinates are most frequently aligned horizontally or vertically is determined as the inclination of the scanned image. Correcting the inclination of the scanned image can improve the processing accuracy of the rotation correction, block selection processing, and OCR processing that are performed subsequently.
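The second detection method above (rotating the center coordinates in 0.1-degree steps and keeping the best-aligned angle) might be sketched as follows. Several details are assumptions not stated in the description: the ±5 degree search range, the 2-pixel bin width, and the use of the sum of squared bin counts as a proxy for "the proportion of centers that line up". The function returns the rotation that best aligns the centers; how its sign maps to the scanner's skew angle depends on the coordinate convention in use.

```python
import math
from collections import Counter


def rotate(points, degrees):
    """Rotate (x, y) points about the origin by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def alignment_score(points, tol=2.0):
    """Higher when many centers fall into the same row or column bin.

    Squaring the bin counts rewards a few crowded bins over many
    sparse ones (an assumed scoring rule).
    """
    rows = Counter(round(y / tol) for _, y in points)
    cols = Counter(round(x / tol) for x, _ in points)
    row_score = sum(n * n for n in rows.values())
    col_score = sum(n * n for n in cols.values())
    return max(row_score, col_score)


def estimate_deskew_angle(points, max_deg=5.0, step=0.1):
    """Try rotations in `step`-degree increments and keep the best one."""
    steps = int(round(max_deg / step))
    best_angle, best_score = 0.0, -1
    for i in range(-steps, steps + 1):
        angle = i * step
        score = alignment_score(rotate(points, angle))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic object centers: three text rows, skewed by 2 degrees.
grid = [(x, y) for y in (100, 200, 300) for x in range(0, 1000, 100)]
skewed = rotate(grid, 2.0)
print(round(estimate_deskew_angle(skewed), 1))  # -2.0
```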

Ｓ５０１において画像処理部３０５は、Ｓ５００の処理の結果得られた傾き補正後のスキャン画像に対して、画像内の文字が正立する向きになるように、９０度単位で画像を回転補正する。回転補正の方法は、例えば、傾き補正後のスキャン画像を基準画像として、基準画像と、基準画像を９０回転した画像と、基準画像を１８０度回転した画像と、基準画像を２７０度回転した画像と、の４枚の画像を用意する。そして、それぞれの画像に対し、高速処理可能な簡易的なＯＣＲ処理を実行して、一定値以上の確信度で認識された文字の数が最も多い画像を回転補正後の画像とする方法がある。ただし、回転補正の方法はこの方法に限るものではない。なお以降のスキャン画像とは、特に断りが無い限りＳ５００およびＳ５０１で補正されたスキャン画像のことを指すものとする。 In S501, the image processing unit 305 rotates the scanned image after tilt correction obtained as a result of the processing in S500 in 90 degree increments so that the characters in the image are oriented upright. For example, the method of rotation correction is to use the scanned image after tilt correction as a reference image and prepare four images: the reference image, an image rotated by 90 degrees from the reference image, an image rotated by 180 degrees from the reference image, and an image rotated by 270 degrees from the reference image. Then, a simple OCR process that can be processed at high speed is performed on each image, and the image with the largest number of characters recognized with a certain level of confidence or higher is used as the image after rotation correction. However, the method of rotation correction is not limited to this method. Note that hereinafter, the term "scanned image" refers to the scanned image corrected in S500 and S501 unless otherwise specified.
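The rotation correction described above could look roughly like the sketch below. The `recognize` callable stands in for the lightweight OCR pass and is hypothetical: it is assumed to return `(character, confidence)` pairs. The 0.8 confidence threshold and the list-of-rows image representation are likewise illustrative assumptions.

```python
def rotate90(image):
    """Rotate a 2D list-of-rows image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def pick_upright(image, recognize, min_conf=0.8):
    """Return (clockwise rotation in degrees, rotated image).

    Each of the four orientations is scored by the number of characters
    that `recognize` reports with confidence >= min_conf; the highest
    count wins, and ties keep the smaller rotation.
    """
    candidates = []
    current = image
    for quarter_turns in range(4):
        hits = sum(1 for _, conf in recognize(current) if conf >= min_conf)
        candidates.append((hits, quarter_turns, current))
        current = rotate90(current)
    best = max(candidates, key=lambda c: c[0])
    return best[1] * 90, best[2]
```

In practice `recognize` would wrap whatever fast OCR engine is available; the selection logic only needs the per-character confidences.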

Ｓ５０２において画像処理部３０５は、スキャン画像に対しブロックセレクション処理を実行する。ブロックセレクション処理とは、画像を前景領域と背景領域に分類した上で、前景領域をテキストブロックとそれ以外のブロックに分割して、テキストブロックを検出する処理である。 In S502, the image processing unit 305 performs block selection processing on the scanned image. Block selection processing is a process in which the image is classified into foreground and background regions, the foreground region is divided into text blocks and other blocks, and the text blocks are detected.

具体的には、白黒に二値化されたスキャン画像に対し輪郭線追跡を行って、黒画素輪郭で囲まれる画素の塊を抽出する。そして、面積が所定の大きさよりも大きい黒画素の塊については、内部にある白画素に対しても輪郭線追跡を行い白画素の塊を抽出し、さらに一定の大きさ以上の面積の白画素の塊の内部から再帰的に黒画素の塊を抽出する。こうして得られた黒画素の塊を前景領域と決定する。決定された前景領域は、大きさ及び形状で分類し異なる属性を持つ領域に分類する。例えば、縦横比が１に近く大きさが一定の範囲の前景領域を文字相当の画素塊とし、さらに近接する文字が整列良くグループ化され得る領域は文字列の領域（ＴＥＸＴ）と決定する。扁平な画素塊は線領域（ＬＩＮＥ）と決定する。一定大きさ以上でかつ矩形の白画素塊を整列よく内包する黒画素塊の占める範囲を表領域（ＴＡＢＬＥ）と決定する。不定形の画素塊が散在している領域を写真領域（ＰＨＯＴＯ）と決定する。そして、それ以外の形状の画素塊を図画領域（ＰＩＣＴＵＲＥ）と決定する。こうしてオブジェクトの属性毎に領域分割されたものの中から、文字属性を持つと決定された前景領域（ＴＥＸＴ）がテキストブロックとして検出される。 Specifically, the contour of the scanned image that has been binarized to black and white is traced to extract a cluster of pixels surrounded by a black pixel contour. For clusters of black pixels whose area is larger than a certain size, the contour of the white pixels inside is also traced to extract a cluster of white pixels, and then a cluster of black pixels is recursively extracted from inside the cluster of white pixels whose area is equal to or larger than a certain size. The cluster of black pixels thus obtained is determined as the foreground region. The determined foreground region is classified by size and shape into regions with different attributes. For example, a foreground region with an aspect ratio close to 1 and a certain range of size is determined as a pixel cluster corresponding to a character, and a region where adjacent characters can be grouped in good alignment is determined as a character string region (TEXT). A flat pixel cluster is determined as a line region (LINE). A region occupied by a black pixel cluster that is equal to or larger than a certain size and contains a rectangular white pixel cluster in good alignment is determined as a table region (TABLE). Areas where irregular pixel clusters are scattered are determined to be photograph areas (PHOTO). Pixel clusters of any other shape are determined to be picture areas (PICTURE). From among the areas divided according to object attributes in this way, foreground areas (TEXT) determined to have character attributes are detected as text blocks.
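Part of the classification above can be illustrated with a simplified sketch that covers only the character and line rules and the grouping of neighboring characters into TEXT blocks. The TABLE, PHOTO, and PICTURE rules are not modeled. All thresholds (`min_size`, `max_size`, `flat_ratio`, `max_gap`, `max_y_diff`) and the function names are assumptions chosen for illustration; the description does not give concrete values. Boxes are `(x, y, width, height)` tuples with positive sizes.

```python
def classify_cluster(w, h, min_size=8, max_size=64, flat_ratio=10.0):
    """Label one pixel cluster by its bounding-box shape."""
    if max(w, h) / min(w, h) >= flat_ratio:
        return "LINE"   # very flat or very thin cluster
    if min_size <= max(w, h) <= max_size and 0.5 <= w / h <= 2.0:
        return "CHAR"   # roughly square, character-sized
    return "OTHER"


def group_text_blocks(boxes, max_gap=10, max_y_diff=5):
    """Merge character-sized boxes on the same row into TEXT blocks."""
    blocks = []  # each block is a mutable [x, y, w, h]
    for x, y, w, h in sorted(boxes):  # left to right
        if classify_cluster(w, h) != "CHAR":
            continue
        for blk in blocks:
            right = blk[0] + blk[2]
            same_row = abs(blk[1] - y) <= max_y_diff
            if same_row and 0 <= x - right <= max_gap:
                top = min(blk[1], y)
                bottom = max(blk[1] + blk[3], y + h)
                blk[1], blk[2], blk[3] = top, x + w - blk[0], bottom - top
                break
        else:
            blocks.append([x, y, w, h])
    return [tuple(b) for b in blocks]


# Three adjacent characters, one ruled line, and one isolated character.
boxes = [(10, 10, 20, 20), (32, 11, 20, 20), (54, 10, 20, 20),
         (10, 100, 300, 2), (200, 10, 20, 20)]
print(group_text_blocks(boxes))
# [(10, 10, 64, 21), (200, 10, 20, 20)]
```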

図６は、ブロックセレクション処理の結果の一例を示す図である。図６（ａ）は回転補正後のスキャン画像を示す。図６（ｂ）は図６（ａ）のスキャン画像に対するブロックセレクション処理の結果を示しており、点線で示した矩形が前景領域を表している。なお、図６（ｂ）では、全ての前景領域の属性が決定されているが、属性については一部の前景領域に対してのみ表示している。本ステップで検出された各テキストブロックの情報（属性と各ブロックの位置およびサイズを示す情報）は、後続処理である、ＯＣＲ処理および類似度計算等で用いられる。 Figure 6 shows an example of the results of block selection processing. Figure 6(a) shows a scanned image after rotation correction. Figure 6(b) shows the results of block selection processing on the scanned image of Figure 6(a), where the rectangles indicated by dotted lines represent foreground regions. Note that in Figure 6(b), the attributes of all foreground regions have been determined, but the attributes are only displayed for some of the foreground regions. Information about each text block detected in this step (attributes and information indicating the position and size of each block) is used in subsequent processes such as OCR processing and similarity calculation.

本ステップのブロックセレクション処理ではテキストブロックだけを検出する。その理由は、文字列の位置はスキャン画像の構造をよく表現し、インデックス情報と密接に関連するためである。したがって、写真領域や表領域等の他の属性を持つと判定されたブロックの情報を後続の処理で利用することを排除するものではない。 The block selection process in this step detects only text blocks. This is because the positions of character strings represent the structure of a scanned image well and are closely related to index information. This does not, however, preclude using information about blocks determined to have other attributes, such as photo areas or table areas, in subsequent processing.

Ｓ５０３において画像処理部３０５は、ＨＤＤ１２０からインデックス抽出ルールを取得しＲＡＭ１１９に展開する。 In S503, the image processing unit 305 retrieves the index extraction rules from the HDD 120 and loads them into the RAM 119.

図７は、インデックス抽出ルール（以下単に、抽出ルールとよぶ）の一部を示す図である。図７は、抽出ルールに含まれる帳票ＩＤとして「０００１」が付与され登録されている抽出ルールのレコードを示している。抽出ルールでは、登録されている文書１つについて、「文書ＩＤ」と、「サムネイル」と、「文書識別情報」と、「インデックス情報」との各データが、レコード単位で対応付けられている。抽出ルールは登録済み文書の数だけこれらの組み合わせ（レコード）を保持する。文書ＩＤは、文書の種類を表すユニークなＩＤである。 Figure 7 is a diagram showing a portion of an index extraction rule (hereinafter simply referred to as an extraction rule). Figure 7 shows an extraction rule record in which "0001" has been assigned as the document ID included in the extraction rule and registered. In the extraction rule, for each registered document, the following data are associated on a record-by-record basis: "document ID", "thumbnail", "document identification information", and "index information". The extraction rule holds as many combinations (records) of these as there are registered documents. The document ID is a unique ID that indicates the type of document.

文書識別情報は、登録されている文書のスキャン画像に対してブロックセレクション処理を実行した結果得られるテキストブロックの位置およびサイズの情報である。文書識別情報は、文書の種類を特定するための情報であり後述する文書マッチングで使用される。 Document identification information is information about the position and size of text blocks obtained by performing block selection processing on a scanned image of a registered document. Document identification information is information for identifying the type of document and is used in document matching, which will be described later.

インデックス情報は、スキャン画像に含まれるインデックスを抽出するための情報である。インデックスは、ファイルに付与するファイル名またはメタデータを決定するために使用される。インデックス情報は、具体的には、登録されている文書内における、それぞれの項目の文字列（インデックス）が含まれるテキストブロックの座標およびサイズの情報が含まれる。図７の「インデックス情報」の画像７０１はそれぞれの項目が含まれるテキストブロックの位置およびサイズを画像上の座標に配置して図示したものである。また、インデックス情報にはファイル名を生成するために用いられるインデックスとその順番を示す情報、メタデータとして付与するための情報が含まれる。 Index information is information for extracting indexes contained in a scanned image. The index is used to determine the file name or metadata to be assigned to a file. Specifically, index information includes information on the coordinates and size of text blocks that contain the character strings (index) of each item in a registered document. The "index information" image 701 in Figure 7 illustrates the position and size of the text blocks that contain each item, arranged at coordinates on the image. Index information also includes information indicating the indexes used to generate a file name and their order, and information to be assigned as metadata.

インデックス情報の「ファイル名ルール」には、タイトル（title）、発行元会社名（sender）、帳票番号（number）の項目のインデックスを、セパレータであるアンダースコアでつなげてファイル名を生成することが示されている。また、「メタデータ」には合計金額（total_price）の項目のインデックスをメタデータとして利用することが示されている。つまり、所定の項目のインデックスを抽出することで、ユーザにレコメンドするファイル名の生成、およびメタデータの抽出をすることができる。 The "File name rule" in the index information indicates that the file name is generated by connecting the indexes of the fields title (title), issuing company name (sender), and document number (number) with an underscore separator. Additionally, "Metadata" indicates that the index of the field total amount (total_price) is used as metadata. In other words, by extracting the indexes of specified fields, it is possible to generate file names to be recommended to users and extract metadata.
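As a rough illustration of the file name rule and metadata rule described above, the following Python sketch joins extracted indexes with an underscore. The constant names, the function names, the sample strings, and the choice to skip a missing item are all assumptions made for illustration; the specification only states which items are used and in what order.

```python
# Hypothetical in-memory form of part of the index information for document ID "0001".
FILE_NAME_RULE = ["title", "sender", "number"]  # items and their order in the file name
METADATA_ITEMS = ["total_price"]                # items attached as metadata
SEPARATOR = "_"


def build_file_name(indexes, rule=FILE_NAME_RULE, separator=SEPARATOR):
    """Join the extracted index strings in rule order.

    Items that were not extracted are skipped (an assumption; the text does
    not say how a missing index is handled).
    """
    return separator.join(indexes[item] for item in rule if indexes.get(item))


def build_metadata(indexes, items=METADATA_ITEMS):
    """Pick the indexes that the rule designates as metadata."""
    return {item: indexes[item] for item in items if item in indexes}


# Made-up OCR results, for illustration only.
extracted = {
    "title": "Invoice",
    "sender": "Example Corp",
    "number": "INV-0042",
    "total_price": "12,300",
}
print(build_file_name(extracted))
print(build_metadata(extracted))
```

A rule for another property, such as a destination folder, could be held in the same way as one more ordered list of item names.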

なお、本実施形態では、抽出されたインデックスをファイル名またはメタデータとして利用する例を示しているが、他のプロパティ情報であるファイルの送信先のフォルダ情報を決定するためのルールを保持してもよい。その場合も、インデックスを用いて生成されたプロパティ情報がＳ４０２でユーザにレコメンドされて、Ｓ４０３でプロパティ情報がスキャン画像のファイルに設定される。 In this embodiment, an example is shown in which the extracted index is used as a file name or metadata, but rules for determining the destination folder information of the file, which is other property information, may also be held. In this case, too, the property information generated using the index is recommended to the user in S402, and the property information is set in the scanned image file in S403.

また、登録されている文書の抽出ルールとして、図７の「サムネイル」に示したように、登録された文書に対応するスキャン画像のサムネイルを一緒に保持してもよい。 In addition, as an extraction rule for registered documents, thumbnails of scanned images corresponding to the registered documents may also be stored together, as shown in "Thumbnail" in Figure 7.

Ｓ５０４において画像処理部３０５は、スキャン画像に対して文書マッチングを実行する。文書マッチングでは、スキャン画像を得るためにスキャンされた文書（入力文書）と同じ種類の文書が、抽出ルールに登録されている文書群にあるかどうかを判定する。そして、入力文書と同じ種類の文書が登録されていると判定された場合、その種類を特定する処理である。 In S504, the image processing unit 305 performs document matching on the scanned image. In document matching, it is determined whether a document of the same type as the document (input document) scanned to obtain the scanned image is present in the document group registered in the extraction rule. If it is determined that a document of the same type as the input document is registered, the process identifies the type.

本実施形態では、まず、スキャン画像と、抽出ルールに登録されている夫々の文書と、を１対１で比較し、含まれるテキストブロックの形状および配置がどれだけ類似しているかを表す類似度の算出を行う。類似度の算出の方法として、例えば、スキャン画像のテキストブロック全体と、登録されている文書のテキストブロック全体で位置合わせを行う。そして、スキャン画像の各テキストブロックと登録されている文書の各テキストブロックとが重なる面積の総和の二乗（値Ａとする）を求める。さらにスキャン画像のテキストブロックの面積の総和と登録されている文書のテキストブロックの面積の総和との積（値Ｂとする）を求める。そして、値Ａを値Ｂで割った値を類似度とする方法がある。この類似度の算出を、スキャン画像と抽出ルールに登録されている全ての文書との間で行う。 In this embodiment, first, the scanned image is compared one-to-one with each document registered in the extraction rule, and a similarity is calculated to indicate how similar the shapes and arrangements of the included text blocks are. As a method of calculating the similarity, for example, the entire text blocks of the scanned image are aligned with the entire text blocks of the registered document. Then, the square of the sum of the overlapping areas of each text block of the scanned image and each text block of the registered document (value A) is calculated. Furthermore, the product of the sum of the areas of the text blocks of the scanned image and the sum of the areas of the text blocks of the registered document (value B) is calculated. Then, the value A divided by value B is used as the similarity. This calculation of similarity is performed between the scanned image and all documents registered in the extraction rule.
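The similarity described above (value A divided by value B) can be sketched as follows. This is a minimal sketch that assumes each text block is an `(x, y, width, height)` tuple and that the two block groups have already been aligned; the tuple layout and the function names are illustrative and do not come from the specification.

```python
def area(block):
    """Area of an (x, y, width, height) block."""
    return block[2] * block[3]


def intersection_area(a, b):
    """Area of the overlap between two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def layout_similarity(scan_blocks, reg_blocks):
    """Similarity = A / B.

    A: square of the total overlapping area between scanned and registered blocks.
    B: (total area of scanned blocks) * (total area of registered blocks).
    """
    overlap = sum(intersection_area(s, r) for s in scan_blocks for r in reg_blocks)
    value_a = overlap ** 2
    value_b = sum(map(area, scan_blocks)) * sum(map(area, reg_blocks))
    return value_a / value_b if value_b else 0.0


registered = [(0, 0, 10, 5), (0, 10, 20, 5)]
print(layout_similarity(registered, registered))                   # identical layouts
print(layout_similarity([(0, 0, 10, 10)], [(5, 0, 10, 10)]))       # half overlap
```

With this definition, two identical layouts of non-overlapping blocks score 1, and layouts with no overlapping area score 0.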

そして、所定値以上の類似度であり、かつ、最も類似度が高い、抽出ルールに登録されている文書が、スキャンされた入力文書と同じ種類の文書と特定される。また、抽出ルールに、類似度が所定値以上の文書が無かった場合は、入力文書と同じ種類の文書は、抽出ルールには登録されていないと判定される。 Among the documents registered in the extraction rule, the one with the highest similarity is identified as the same type of document as the scanned input document, provided that its similarity is equal to or greater than a predetermined value. If the extraction rule contains no document with a similarity equal to or greater than the predetermined value, it is determined that no document of the same type as the input document is registered in the extraction rule.
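The decision described above can be sketched in a few lines of Python. The function name and the threshold of 0.6 are hypothetical; the specification only refers to "a predetermined value".

```python
def match_document(similarities, threshold=0.6):
    """Return the document ID with the highest similarity, or None.

    similarities: dict mapping document ID -> similarity with the scanned image.
    threshold: hypothetical stand-in for the predetermined value.
    """
    if not similarities:
        return None
    best_id = max(similarities, key=similarities.get)
    return best_id if similarities[best_id] >= threshold else None


print(match_document({"0001": 0.82, "0002": 0.40}))  # best match above the threshold
print(match_document({"0001": 0.30}))                # no registered document is similar enough
```

A `None` result corresponds to the case in which the input document is treated as not registered.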

Ｓ５０５において画像処理部３０５は、Ｓ５０４で実行した文書マッチングの結果、入力文書と同じ種類の文書が抽出ルールに登録されていたかを判定する。入力文書が登録済み文書でなかった場合（Ｓ５０５がＮＯ）、本フローチャートの処理を終了する。登録済み文書でなかった場合は、前述したように新たにＩＤが付されて、Ｓ５０２で検出したテキストブロックのレイアウト情報等が抽出ルールに登録される。この場合、Ｓ４０２ではファイル名およびメタデータのユーザにレコメンドはされずに、表示制御部３０１は、ユーザによるファイル名の入力を受け付ける。表示制御部３０１は表示・操作部１２３を介してユーザから入力を受け付けると、入力されたファイル名がスキャン画像のファイル名として決定される。 In S505, the image processing unit 305 determines whether a document of the same type as the input document has been registered in the extraction rules as a result of the document matching performed in S504. If the input document is not a registered document (NO in S505), the processing of this flowchart ends. If it is not a registered document, a new ID is assigned as described above, and the layout information of the text block detected in S502, etc. is registered in the extraction rule. In this case, in S402, the file name and metadata are not recommended to the user, and the display control unit 301 accepts input of a file name by the user. When the display control unit 301 accepts input from the user via the display and operation unit 123, the input file name is determined as the file name of the scanned image.

入力文書と同じ種類の文書が登録されている場合（Ｓ５０５がＹＥＳ）、Ｓ５０６において画像処理部３０５は、Ｓ５０４で入力文書と同じ種類と特定された抽出ルールの文書と同じ文書ＩＤを、スキャン画像に付与する。 If a document of the same type as the input document is registered (YES in S505), in S506 the image processing unit 305 assigns to the scanned image the same document ID as the document of the extraction rule identified in S504 as being of the same type as the input document.

Ｓ５０７において画像処理部３０５は、Ｓ５０６で付与された文書ＩＤに紐づいた抽出ルールに基づいて、スキャン画像内における抽出対象（処理対象）の項目のインデックスのテキストブロックを推定するインデックスブロック推定処理を実行する。タイトル、発行元会社名、帳票番号等の項目を示す文字列（インデックス）が含まれるテキストブロックをインデックスブロックと呼ぶことがある。インデックスブロック推定処理の詳細については、後述する。 In S507, the image processing unit 305 executes an index block estimation process to estimate a text block of an index of an item to be extracted (processed) in the scanned image based on the extraction rule linked to the document ID assigned in S506. A text block that includes character strings (indexes) indicating items such as a title, issuing company name, and form number is sometimes called an index block. Details of the index block estimation process will be described later.

Ｓ５０８において画像処理部３０５は、Ｓ５０７で推定された夫々の項目のインデックスブロック群に対して、部分的なＯＣＲを実行し、各インデックスブロックに対応する文字列をインデックスとして抽出する。 In S508, the image processing unit 305 performs partial OCR on the index blocks of each item estimated in S507, and extracts character strings corresponding to each index block as an index.

［インデックスブロック推定処理（Ｓ５０７）について］ [Regarding the index block estimation process (S507)]
図８は、Ｓ５０７のインデックスブロック推定処理のフローチャートである。インデックスブロック推定処理の詳細について図８を用いて説明する。なお、以下、登録文書とは、Ｓ５０３で取得した抽出ルールにおいて登録されている文書のうち、Ｓ５０６でスキャン画像に付与された文書ＩＤに対応する文書のことをいう。本フローチャートの説明では、登録文書は図７の文書ＩＤ「０００１」の文書であるものとして説明する。 Fig. 8 is a flowchart of the index block estimation process in S507. Details of the index block estimation process will be described with reference to Fig. 8. Note that, hereinafter, a registered document refers to a document that corresponds to the document ID assigned to the scanned image in S506, among the documents registered in the extraction rule acquired in S503. In the description of this flowchart, the registered document will be described as the document with document ID "0001" in Fig. 7.

Ｓ８００において画像処理部３０５は、抽出ルールから、Ｓ５０６で付与された文書ＩＤに紐づいた文書識別情報を取得する。そして、画像処理部３０５は、スキャン画像内の全体のテキストブロックと、登録文書の全体のテキストブロックとで全体の位置合わせを行う。 In S800, the image processing unit 305 obtains, from the extraction rule, the document identification information linked to the document ID assigned in S506. The image processing unit 305 then performs overall alignment between all of the text blocks in the scanned image and all of the text blocks of the registered document.

Ｓ４００で取得されたスキャン画像の入力文書は、登録文書と同じ種類の文書であり、夫々の項目は登録文書の項目と同じ座標に印刷される。しかし、印刷およびスキャンのタイミングまたは印刷時の機器による違い等により、スキャン画像上のテキストブロックの位置と登録文書のテキストブロックの位置とにズレが生じてしまうことがある。そこで、本ステップではそのズレの影響を軽減して以降の処理の精度を向上させるため、全体の位置合わせを行う。なお、本実施形態では、図５のＳ５００で傾き補正を行っているため、本ステップの全体の位置合わせでは、スキャン画像上のテキストブロック全体をシフト（平行移動）する補正のみを行う例について説明する。 The input document of the scanned image acquired in S400 is the same type of document as the registered document, and each item is printed at the same coordinates as the item in the registered document. However, due to differences in printing and scanning timing or the printing device, etc., there may be a misalignment between the position of the text block on the scanned image and the position of the text block in the registered document. Therefore, in this step, overall alignment is performed to reduce the effect of the misalignment and improve the accuracy of subsequent processing. Note that in this embodiment, since tilt correction is performed in S500 of FIG. 5, an example will be described in which the overall alignment in this step only involves correction to shift (translate) the entire text block on the scanned image.

全体の位置合わせでは、登録文書のテキストブロックに対してどれだけスキャン画像のテキストブロックがシフトしているかというシフト量を算出して、シフト量だけスキャン画像の各テキストブロックがシフトするように座標の修正を行う。 For overall alignment, the shift amount is calculated - how much the text blocks in the scanned image are shifted relative to the text blocks in the registered document - and the coordinates are corrected so that each text block in the scanned image is shifted by the shift amount.

図９は、スキャン画像のテキストブロックと登録文書のテキストブロックとを同じ座標系に描画した画像の一部分を切り出した図である。図９を用いて全体の位置合わせのためのシフト量の算出の具体的な手順を説明する。図９において、実線の矩形はスキャン画像内のテキストブロック群のうちから選択された１つのテキストブロック９００を示し、破線の矩形は、テキストブロック９００の周囲にある登録文書のテキストブロック９０１～９０３を示している。また、図９において、一点鎖線の円９０４は、スキャン画像のテキストブロック９００の左上頂点を中心に一定距離を半径とした範囲を示している。 Figure 9 is a diagram of a portion of an image in which a text block of the scanned image and a text block of a registered document are drawn in the same coordinate system. A specific procedure for calculating the shift amount for overall alignment will be explained using Figure 9. In Figure 9, the solid-line rectangle indicates one text block 900 selected from the group of text blocks in the scanned image, and the dashed-line rectangle indicates text blocks 901-903 of the registered document that surround text block 900. Also in Figure 9, the dash-dotted circle 904 indicates an area with a certain radius centered on the upper left vertex of text block 900 of the scanned image.

シフト量の算出のために、スキャン画像の各テキストブロックと対応する候補となる登録文書のテキストブロック（ペアブロックとよぶ）を決定する。ここでスキャン画像のテキストブロックのペアブロックの決定について説明する。 To calculate the shift amount, we determine the text blocks (called pair blocks) in the registered document that are candidates for each text block in the scanned image. Here we explain how to determine pair blocks for text blocks in the scanned image.

初めに、登録文書のテキストブロック９０１～９０３のうち、スキャン画像内のテキストブロック群から選択された１つのテキストブロック９００の左上頂点を中心とする円９０４の中に、左上頂点が入るテキストブロックを探す。図９では、テキストブロック９０１、９０２が該当することになる。次に、スキャン画像のテキストブロック９００と、登録文書のテキストブロック９０１、９０２それぞれとのオーバラップ率を求める。オーバラップ率は、スキャン画像のテキストブロックと登録文書のテキストブロックとの左上頂点同士を合わせて、両テキストブロックの共通部分の面積を算出する。そして、（共通部分の面積）／（両テキストブロックのうち大きい方の面積）によって値を求めてオーバラップ率とする。 First, among the text blocks 901 to 903 of the registered document, a search is made for text blocks whose upper left vertex falls within the circle 904 centered on the upper left vertex of text block 900, which is one text block selected from the text blocks in the scanned image. In FIG. 9, text blocks 901 and 902 meet this criterion. Next, the overlap rate is calculated between text block 900 of the scanned image and each of text blocks 901 and 902 of the registered document. To obtain the overlap rate, the upper left vertices of the text block of the scanned image and the text block of the registered document are brought together, and the area of the part common to both text blocks is calculated. The overlap rate is then (area of the common part) / (area of the larger of the two text blocks).
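The candidate search and the overlap rate described above can be sketched as follows, assuming `(x, y, width, height)` tuples whose `(x, y)` is the upper left vertex. The function names and the sample coordinates are illustrative assumptions.

```python
import math


def candidates_in_circle(scan_block, reg_blocks, radius):
    """Registered blocks whose upper left vertex lies within `radius`
    of the upper left vertex of the scanned block."""
    sx, sy = scan_block[0], scan_block[1]
    return [r for r in reg_blocks if math.hypot(r[0] - sx, r[1] - sy) <= radius]


def overlap_rate(scan_block, reg_block):
    """Overlap rate with the two upper left vertices brought together.

    Once the vertices coincide, the common part is a rectangle of size
    min(widths) x min(heights); it is divided by the larger block area.
    """
    _, _, sw, sh = scan_block
    _, _, rw, rh = reg_block
    common = min(sw, rw) * min(sh, rh)
    larger = max(sw * sh, rw * rh)
    return common / larger if larger else 0.0


scan = (100, 100, 100, 20)
registered = [(105, 97, 100, 20), (110, 120, 50, 20), (400, 400, 100, 20)]
for reg in candidates_in_circle(scan, registered, radius=30):
    print(reg, overlap_rate(scan, reg))
```

Because the vertices are aligned before the comparison, the rate depends only on the shapes of the two blocks, not on how far apart they are.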

オーバラップ率が、所定の条件を満たす登録文書のテキストブロックを、ペアブロックとする。所定の条件は、例えば、スキャン画像のテキストブロックとのオーバラップ率が、最大オーバラップ率に係数αを乗算した値以上であり、かつ、所定の閾値以上であることである。この場合において、係数αは最大オーバラップ率と近いオーバラップ率を持つ組合せを選択するためのもので、例えば０．５～０．８のような１．０未満の値とする。また、所定の閾値は最低ラインを規定するものであり、例えば０．３～０．７のような１．０未満の値とする。 A text block in a registered document whose overlap rate meets a specified condition is considered to be a paired block. The specified condition is, for example, that the overlap rate with the text block in the scanned image is equal to or greater than the maximum overlap rate multiplied by coefficient α, and equal to or greater than a specified threshold. In this case, coefficient α is used to select a combination with an overlap rate close to the maximum overlap rate, and is set to a value less than 1.0, such as 0.5 to 0.8. The specified threshold defines a minimum line, and is set to a value less than 1.0, such as 0.3 to 0.7.
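The example condition above can be written as a small filter. In this sketch, `alpha=0.7` and `min_rate=0.5` are picked from the ranges mentioned in the text purely for illustration, and the function name and the `(block_id, rate)` input format are assumptions.

```python
def select_pair_blocks(rates, alpha=0.7, min_rate=0.5):
    """Select pair blocks from (block_id, overlap_rate) candidates.

    A candidate is kept when its rate is at least alpha times the maximum
    rate among the candidates AND at least the absolute minimum min_rate.
    """
    if not rates:
        return []
    max_rate = max(rate for _, rate in rates)
    return [
        block_id
        for block_id, rate in rates
        if rate >= alpha * max_rate and rate >= min_rate
    ]


print(select_pair_blocks([(901, 0.9), (902, 0.4)]))  # only the block with a similar shape
print(select_pair_blocks([(901, 0.9), (905, 0.8)]))  # more than one pair block is possible
```

The relative test keeps only candidates close to the best one, while the absolute test rejects every candidate when even the best one has a poor overlap rate.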

図９では、登録文書のテキストブロック９０１、９０２のうち、スキャン画像のテキストブロック９００と形状の近い、テキストブロック９０１のみがペアブロックとして選択される。所定の条件を満たすテキストブロックが他にもあればペアブロックは複数選択されることもある。このように、スキャン画像内から選択された１つのテキストブロックに対応するペアブロック群のそれぞれに対して、スキャン画像内から選択されたテキストブロックとの左上頂点のＸ方向およびＹ方向の差分量（シフト量）を算出する。そして、差分量をシフト量ヒストグラムに投票する。この場合のヒストグラムのビンの範囲は任意でよい。 In FIG. 9, of the text blocks 901 and 902 in the registered document, only text block 901, whose shape is close to that of text block 900 in the scanned image, is selected as a pair block. If other text blocks also satisfy the predetermined condition, more than one pair block may be selected. For each pair block corresponding to the one text block selected from the scanned image, the difference (shift amount) in the X and Y directions between its upper left vertex and the upper left vertex of the selected text block is calculated. The difference is then voted into a shift amount histogram. The bin width of the histogram may be chosen freely.

図９の場合、テキストブロック９００については、登録文書のテキストブロック９０１との左上頂点のＸ方向およびＹ方向の差分量（シフト量）が算出されて、シフト量がシフト量ヒストグラムに投票される。 In the case of FIG. 9, for text block 900, the difference (shift amount) in the X and Y directions between its upper left vertex and that of text block 901 of the registered document is calculated, and the shift amount is voted into the shift amount histogram.

スキャン画像内のテキストブロックに対応するペアブロック群を決定し、シフト量ヒストグラムに投票するまでの処理を、スキャン画像の全てのテキストブロックに対してそれぞれ行う。そして、最終的に得られたシフト量ヒストグラムにおける最大のピーク点となる位置を決定する。決定された位置が示すシフト量を全体の位置合わせのシフト量とする。 The process from determining the pair blocks corresponding to a text block in the scanned image to voting into the shift amount histogram is performed for every text block in the scanned image. Then, the position of the maximum peak in the resulting shift amount histogram is determined. The shift amount indicated by that position is used as the shift amount for the overall alignment.
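Putting the steps above together, the voting procedure might look like the following Python sketch. It makes several assumptions that the text leaves open: blocks are `(x, y, width, height)` tuples, the shift is measured as registered minus scanned (so adding it to the scanned coordinates aligns them), and `radius`, `alpha`, `min_rate` and `bin_size` are hypothetical values.

```python
import math
from collections import Counter


def overlap_rate(a, b):
    """Overlap rate with upper left vertices aligned (see the definition above)."""
    common = min(a[2], b[2]) * min(a[3], b[3])
    larger = max(a[2] * a[3], b[2] * b[3])
    return common / larger if larger else 0.0


def estimate_global_shift(scan_blocks, reg_blocks, radius=50,
                          alpha=0.7, min_rate=0.5, bin_size=4):
    """Vote (dx, dy) for every pair block and return the most voted shift."""
    votes = Counter()
    for s in scan_blocks:
        near = [r for r in reg_blocks
                if math.hypot(r[0] - s[0], r[1] - s[1]) <= radius]
        rates = [(r, overlap_rate(s, r)) for r in near]
        if not rates:
            continue
        best = max(rate for _, rate in rates)
        for r, rate in rates:
            if rate >= alpha * best and rate >= min_rate:
                dx, dy = r[0] - s[0], r[1] - s[1]
                votes[(round(dx / bin_size), round(dy / bin_size))] += 1
    if not votes:
        return (0, 0)
    (bin_x, bin_y), _ = votes.most_common(1)[0]
    return (bin_x * bin_size, bin_y * bin_size)


def shift_blocks(blocks, shift):
    """Translate every block by the given (dx, dy)."""
    dx, dy = shift
    return [(x + dx, y + dy, w, h) for x, y, w, h in blocks]


registered = [(10, 10, 100, 20), (10, 50, 60, 20), (200, 300, 80, 20)]
scanned = shift_blocks(registered, (8, -12))   # simulate a print/scan offset
shift = estimate_global_shift(scanned, registered)
print(shift, shift_blocks(scanned, shift) == registered)
```

Taking the histogram peak rather than the mean makes the estimate tolerant of a few wrong pairings, since mismatched pairs scatter their votes over many bins.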

なお、ノイズの影響が懸念される場合は、生成したシフト量ヒストグラムに対してスムージングを掛けてもよい。また、最大となるピーク点以外の局所的なピーク点についても、シフト量の候補として選び、その候補の中から全体の位置合わせに用いるシフト量を選んでもよい。例えば、シフト量の各候補について、スキャン画像のテキストブロックの座標をシフトさせて、図５のＳ５０４の文書マッチングと同様の類似度算出を行い、最も類似度が高くなる候補を、最終的なシフト量として決定してもよい。 If the influence of noise is a concern, the generated shift amount histogram may be smoothed. Local peak points other than the maximum peak point may also be selected as candidates for the shift amount, and the shift amount to be used for overall alignment may be selected from the candidates. For example, for each candidate shift amount, the coordinates of the text block in the scanned image may be shifted, and a similarity calculation may be performed in the same way as in document matching in S504 of FIG. 5, and the candidate with the highest similarity may be determined as the final shift amount.

上記の手順で決定されたシフト量だけ、スキャン画像の各テキストブロックの座標をシフトすることで、位置合わせされたスキャン画像のテキストブロック群を得ることができる。なお、テキストブロックの位置合わせの方法は上記の方法に限るものではない。スキャン画像全体のシフト（平行移動）に関する補正のみを行う例について説明したが、印刷およびスキャンのズレとして、倍率に関するズレが想定される場合には、シフト量だけでなく、倍率のズレも考慮した位置合わせを行ってもよい。 By shifting the coordinates of each text block in the scanned image by the shift amount determined by the above procedure, a group of aligned text blocks in the scanned image can be obtained. Note that the method of aligning text blocks is not limited to the above method. An example has been described in which only correction is made regarding the shift (parallel movement) of the entire scanned image, but if a deviation in magnification is expected as a deviation between printing and scanning, alignment can be performed taking into account not only the shift amount but also the deviation in magnification.

なお以下のステップにおけるスキャン画像またはスキャン画像のテキストブロック群は、この全体の位置合わせされたスキャン画像またはテキストブロック群を指すものとする。 Note that in the following steps, the scanned image or text blocks of the scanned image refer to this entire aligned scanned image or text blocks.

次に、Ｓ５０６で付与された文書ＩＤに紐づいた登録文書のインデックス情報を取得する。そしてＳ８０１でインデックス情報に含まれるインデックスの項目のいずれかを処理対象に選んでＳ８０１～Ｓ８１０を繰り返す。そして、スキャン画像のテキストブロック群から、処理対象の項目のテキストブロックを推定する処理を行う。処理対象の項目に対する処理が終了すると、再度、未処理の項目の中から処理対象の項目が選択される。 Next, index information of the registered document linked to the document ID assigned in S506 is obtained. Then, in S801, one of the index items included in the index information is selected as the processing target, and S801 to S810 are repeated. Then, a process is performed to estimate the text block of the processing target item from the text block group of the scanned image. When the processing of the processing target item is completed, another processing target item is selected from the unprocessed items.

Ｓ８０１において画像処理部３０５は、登録文書のインデックス情報に登録されている項目のうち未処理のインデックスの項目を１つ選択して処理対象の項目とする。本実施形態では、図７のインデックス情報に保持されている、タイトル（title）、発行元会社名（sender）、帳票番号（number）、合計金額（total_price）の項目の何れかが処理対象として選択される。 In S801, the image processing unit 305 selects one unprocessed index item from among the items registered in the index information of the registered document, and sets it as the item to be processed. In this embodiment, one of the items held in the index information in FIG. 7, the title (title), issuing company name (sender), document number (number), and total amount (total_price), is selected as the item to be processed.

Ｓ８０２において画像処理部３０５は、処理対象の項目の「部分パターン」を取得する。部分パターンには、登録文書に含まれるテキストブロックの一部のレイアウト（部分レイアウト）の情報と、部分レイアウトを含む範囲（部分パターン範囲）の情報と、が含まれる。 In S802, the image processing unit 305 acquires a "partial pattern" of the item to be processed. The partial pattern includes information on the layout of a portion of a text block included in the registered document (partial layout) and information on the range that includes the partial layout (partial pattern range).

図１０（ａ）は、図７で文書ＩＤ「０００１」として登録されている登録文書における、それぞれの項目のインデックスブロックの位置およびサイズを図示したものである。図１０（ａ）の破線の矩形は、タイトル、帳票番号、合計金額、発行元会社名のそれぞれの項目のインデックスブロック１０００～１００３を表している。 Figure 10(a) illustrates the position and size of the index blocks for each item in the registered document registered in Figure 7 with document ID "0001." The dashed rectangles in Figure 10(a) represent index blocks 1000 to 1003 for each of the items: title, document number, total amount, and issuing company name.

図１０（ｂ）は、「発行元会社名（sender）」の項目の部分パターンを示す図である。図１０（ｂ）の一点鎖線の矩形で表される範囲は、「発行元会社名（sender）」の項目の部分パターン範囲１００６を示す。部分パターン範囲１００６は、「発行元会社名（sender）」の項目のテキストブロックであるインデックスブロック１００３を基準として予め設定された値を使って決定される。 Figure 10(b) is a diagram showing a partial pattern for the "issuing company name (sender)" item. The range indicated by the dashed-dotted rectangle in Figure 10(b) indicates the partial pattern range 1006 for the "issuing company name (sender)" item. The partial pattern range 1006 is determined using a preset value based on index block 1003, which is the text block for the "issuing company name (sender)" item.

テキストブロック１００４、１００５は、登録文書における、部分パターン範囲１００６に少なくとも一部が含まれるテキストブロックを表している。このテキストブロック１００４、１００５と、インデックスブロック１００３で表される登録文書内の部分的なレイアウトが、発行元会社名の項目の部分レイアウトである。部分レイアウトは、処理対象の項目のテキストブロックと、処理対象の項目のテキストブロック以外の少なくとも１つのテキストブロックとで表される。レイアウトとは、夫々のテキストブロックの位置情報と、夫々のテキストブロックのサイズと、を表す情報である。 Text blocks 1004 and 1005 are the text blocks in the registered document that are at least partially included in partial pattern range 1006. The partial layout within the registered document formed by text blocks 1004 and 1005 together with index block 1003 is the partial layout of the issuing company name item. A partial layout consists of the text block of the item to be processed and at least one text block other than that text block. A layout is information that represents the position and the size of each text block.

発行元会社名の項目の部分パターンに含まれる情報として、部分パターン範囲１００６と、インデックスブロック１００３とテキストブロック１００４および１００５とからなる部分レイアウトと、が決定される。このように、登録文書の夫々の項目に対応する部分パターンが決定されて記憶されている。 As information contained in the partial pattern of the issuing company name item, partial pattern range 1006 and a partial layout consisting of index block 1003 and text blocks 1004 and 1005 are determined. In this way, partial patterns corresponding to each item of the registered document are determined and stored.
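One way to build such a partial pattern is sketched below. The text only says that the range is determined from the index block using preset values, so the symmetric margins (`margin_x`, `margin_y`), the function names, the returned dictionary, and the sample coordinates are assumptions made for illustration.

```python
def expand(block, margin_x, margin_y):
    """Grow an (x, y, width, height) block by a margin on every side."""
    x, y, w, h = block
    return (x - margin_x, y - margin_y, w + 2 * margin_x, h + 2 * margin_y)


def intersects(a, b):
    """True when two rectangles share some area (partial inclusion counts)."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def build_partial_pattern(index_block, reg_blocks, margin_x=100, margin_y=40):
    """Return the partial pattern range and the partial layout for one item."""
    pattern_range = expand(index_block, margin_x, margin_y)
    neighbours = [b for b in reg_blocks
                  if b != index_block and intersects(b, pattern_range)]
    return {"range": pattern_range, "layout": [index_block] + neighbours}


sender_block = (300, 100, 120, 20)
registered = [sender_block, (250, 130, 200, 20), (300, 70, 80, 20), (10, 500, 50, 20)]
pattern = build_partial_pattern(sender_block, registered)
print(pattern["range"])
print(pattern["layout"])
```

Computing this once at registration time and storing it with the extraction rule avoids repeating the work for every scan.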

詳細は後述するが、本実施形態では、部分レイアウトと配置が類似または一致しているスキャン画像内の位置を探索して、スキャン画像内における処理対象の項目のテキストブロックを推定する。 As will be described in more detail below, in this embodiment, a position in the scanned image that is similar or matches the arrangement of the partial layout is searched for, and the text block of the item to be processed in the scanned image is estimated.

図１０（ｃ）は、「タイトル(title)」の項目の部分パターンを示す図である。タイトルについても同様に、部分パターン範囲１００７と、タイトルのインデックスブロック１０００と部分パターン範囲１００７に含まれるテキストブロック１００１、１００８～１０１３とからなる部分レイアウトと、が部分パターンとして決定されている。 Figure 10(c) shows the partial pattern for the "title" item. Similarly, for the title, a partial pattern range 1007 and a partial layout consisting of the title index block 1000 and text blocks 1001, 1008 to 1013 included in the partial pattern range 1007 have been determined as the partial pattern.

なお、部分パターン範囲１００７のサイズは、図１０（ｂ）の部分パターン範囲１００６と比べてサイズが異なる。このように項目の性質に応じて部分パターンサイズは異ならせてもよい。または、部分パターン範囲のサイズは、全ての項目で共通のサイズが用いられてもよい。部分パターン範囲のサイズの決定方法については実施形態２で説明する。 Note that the size of partial pattern range 1007 is different from the size of partial pattern range 1006 in FIG. 10(b). In this way, the partial pattern size may be made different depending on the properties of the item. Alternatively, the size of the partial pattern range may be a common size for all items. A method for determining the size of the partial pattern range will be described in embodiment 2.

なお、部分パターンは、文書原稿をスキャンした後に行われるインデックス抽出処理の実行が行われるごとに決定される必要はない。例えば、文書の登録時において、項目ごとに部分パターンを決定し、図７で示した抽出ルールの一部として予め記憶させてもよい。つまり、Ｓ８０２では、記憶されている処理対象の項目の部分パターンが取得されればよい。 Note that the partial pattern does not need to be determined each time the index extraction process is performed after scanning a document manuscript. For example, when registering a document, a partial pattern may be determined for each item and stored in advance as part of the extraction rules shown in FIG. 7. In other words, in S802, it is sufficient to obtain the partial pattern of the stored item to be processed.

次のＳ８０３およびＳ８０４では、処理対象の項目の部分レイアウトとの一致度が高い領域のある、スキャン画像内の位置（ＸＹ候補位置）を決定する。ＸＹ候補位置の決定方法としては、例えば、テンプレートマッチングのようにスキャン画像内の探索範囲に対して部分パターンを走査して一致度を算出することで候補位置を推定してもよい。本実施形態では計算量を抑制させるため、探索範囲におけるＹ方向の候補となる位置を決定してＹ方向の位置（Ｙ位置）を絞り込む。その上で、Ｙ位置の候補（Ｙ候補位置）群それぞれにおいて、Ｘ方向に部分パターンを走査してＸＹ候補位置を決定することで、計算量を抑える方法を説明する。 In the next steps S803 and S804, positions (XY candidate positions) in the scanned image that have an area that matches highly with the partial layout of the item to be processed are determined. As a method for determining the XY candidate positions, for example, the candidate positions may be estimated by scanning a partial pattern in a search range in the scanned image and calculating the degree of match, as in template matching. In this embodiment, in order to reduce the amount of calculations, candidate positions in the Y direction in the search range are determined and the Y direction positions (Y positions) are narrowed down. Then, for each group of Y position candidates (Y candidate positions), a partial pattern is scanned in the X direction to determine the XY candidate positions, thereby reducing the amount of calculations.

Ｓ８０３において画像処理部３０５は、スキャン画像のテキストブロック群から、登録文書における処理対象の項目の部分パターンのテキストブロックに類似するＹ候補位置群を決定する。 In S803, the image processing unit 305 determines a group of Y candidate positions from the group of text blocks in the scanned image that are similar to the text blocks of the partial pattern of the item to be processed in the registered document.

図１１は、Ｙ候補位置群の決定処理を説明するための図である。処理対象の項目が発行元会社名（sender）であるものとして説明を行う。 Figure 11 is a diagram for explaining the process of determining the Y candidate position group. The explanation will be given assuming that the item to be processed is the issuing company name (sender).

図１１（ａ）は、登録文書における発行元会社名（sender）の部分パターンを示す図であり図１０（ｂ）と同様の図である。図１１（ｂ）は、スキャン画像であり破線の矩形は、位置合わせがされたテキストブロック群を表している。また、図１１（ｂ）で示したスキャン画像が示す文書は、登録文書「０００１」と同じ種類の文書として判定された文書であるが、図７の登録文書に比べて表構造内の項目行数が増えている例を示している。よって、スキャン画像における推定されるべき発行元会社名（sender）のインデックスブロック１１０１が、登録文書における発行元会社名（sender）のインデックスブロック１００３の位置と比較して下方向にシフトしてしまっている。 Figure 11(a) shows the partial pattern of the issuing company name (sender) in the registered document and is the same as Figure 10(b). Figure 11(b) is a scanned image, in which the dashed rectangles represent the aligned text blocks. The document shown in the scanned image of Figure 11(b) was determined to be the same type of document as registered document "0001", but it is an example in which the table structure contains more item rows than the registered document of Figure 7. As a result, index block 1101 of the issuing company name (sender), which is the block that should be estimated in the scanned image, is shifted downward relative to the position of index block 1003 of the issuing company name (sender) in the registered document.

図１１（ｃ）は、発行元会社名の部分パターンに含まれる部分レイアウトを表すテキストブロック１００３～１００５のうちの１つのテキストブロック１００３を、スキャン画像のテキストブロック群と同じ座標系に重畳させた図である。Ｙ候補位置群の決定について、部分パターン内のテキストブロック１００３に注目して図１１（ｃ）を用いて説明する。 Figure 11(c) shows one text block 1003 out of text blocks 1003-1005 representing a partial layout included in the partial pattern of the issuing company name, superimposed on the same coordinate system as the text blocks in the scanned image. The determination of the Y candidate positions will be explained with reference to Figure 11(c), focusing on text block 1003 in the partial pattern.

図１１（ｃ）の、一点鎖線の矩形で表される探索範囲１１００は、処理対象の項目のＹ候補位置群を決定するために探索する範囲を表している。破線の矩形で表されるテキストブロック１１０１～１１０９は、図１１（ｂ）に示すスキャン画像のテキストブロックのうち、矩形の中心が探索範囲１１００の中にあるテキストブロックである。 In Figure 11(c), the search range 1100, represented by the dash-dotted rectangle, is the range searched to determine the group of Y candidate positions for the item to be processed. Text blocks 1101 to 1109, represented by dashed rectangles, are the text blocks of the scanned image shown in Figure 11(b) whose rectangle centers lie within search range 1100.

Ｙ候補位置群の決定には、はじめに、部分レイアウトに含まれる１つのテキストブロック（図１１（ｃ）ではテキストブロック１００３）が選択される。そして選択されたテキストブロックをスキャン画像のテキストブロック群と同じ座標系に重畳し、探索範囲内のスキャン画像のテキストブロック（図１１（ｃ）ではテキストブロック１１０１～１１０９）との矩形の中心のＹ位置の差分量をそれぞれ算出する。そして、算出された差分量がＹ方向のシフト量ヒストグラムに投票される。シフト量ヒストグラムのビンの範囲は任意でよい。 To determine the group of Y candidate positions, first one text block (text block 1003 in FIG. 11(c)) included in the partial layout is selected. The selected text block is then superimposed on the same coordinate system as the group of text blocks in the scanned image, and the difference in the Y position of the center of the rectangle with the text blocks in the scanned image within the search range (text blocks 1101 to 1109 in FIG. 11(c)) is calculated. The calculated difference is then voted for in the Y direction shift amount histogram. The range of the bins in the shift amount histogram can be any range.

図１２は、Ｙ方向のシフト量ヒストグラムの例を示す図である。図１２（ａ）は、図１１（ｃ）における部分パターンのテキストブロック１００３と、スキャン画像のテキストブロック１１０２とのＹ位置の差分量を投票した後のシフト量ヒストグラムである。ｈは基準からのＹ方向の探索範囲の絶対値の上限を示している。テキストブロック１００３とテキストブロック１１０２とのＹ方向の差分量に従い、位置１２００に投票が行われている。同様に、部分パターンに含まれる１つのテキストブロックと、スキャン画像の探索範囲内の全てのテキストブロックとのＹ中心の差分量に応じた投票が行われる。この投票を、部分パターン内の全テキストブロックに対して行う。つまり、部分パターンのテキストブロック１００４、１００５についても、探索範囲内のテキストブロック１１０１～１１０９とのＹ中心の差分量が算出されてシフト量ヒストグラムに投票される。そして、Ｙ方向のシフト量ヒストグラムを完成させる。なお、ノイズの影響が懸念される場合は、Ｙ方向の生成したシフト量ヒストグラムに対してスムージングを掛けてもよい。 Figure 12 shows examples of the Y-direction shift amount histogram. Figure 12(a) is the shift amount histogram after a vote has been cast for the difference in Y position between text block 1003 of the partial pattern in Figure 11(c) and text block 1102 of the scanned image. h denotes the upper limit of the absolute value of the search range in the Y direction from the reference. A vote is cast at position 1200 according to the Y-direction difference between text block 1003 and text block 1102. In the same way, votes are cast according to the difference in Y center between one text block of the partial pattern and every text block within the search range of the scanned image. This voting is performed for all text blocks in the partial pattern. That is, for text blocks 1004 and 1005 of the partial pattern as well, the differences in Y center from text blocks 1101 to 1109 within the search range are calculated and voted into the shift amount histogram. This completes the Y-direction shift amount histogram. If the influence of noise is a concern, smoothing may be applied to the generated Y-direction shift amount histogram.

図１２（ｂ）は最終的に生成されるＹ方向のシフト量ヒストグラムである。シフト量ヒストグラムの生成が完了した後、ヒストグラム内の位置１２０１～１２０６に示すようなピーク点を決定し、各ピーク点のビンに応じたＹ方向のシフト量に基づきＹ候補位置群を決定する。 Figure 12(b) shows the finally generated Y-direction shift amount histogram. After the generation of the shift amount histogram is completed, peak points are determined as shown at positions 1201 to 1206 in the histogram, and a group of Y candidate positions is determined based on the Y-direction shift amount corresponding to the bin of each peak point.
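The Y-direction voting and peak selection can be sketched as follows. This is only an illustration under stated assumptions: blocks are `(x, y, width, height)` tuples, a peak is taken to be any bin whose vote count is not smaller than that of either neighbouring bin, and `bin_size` and `max_candidates` are hypothetical parameters. Smoothing is omitted.

```python
from collections import Counter


def center(block):
    x, y, w, h = block
    return (x + w / 2, y + h / 2)


def contains(rect, point):
    x, y, w, h = rect
    return x <= point[0] <= x + w and y <= point[1] <= y + h


def y_candidate_shifts(pattern_blocks, scan_blocks, search_range,
                       bin_size=5, max_candidates=6):
    """Return candidate Y shifts, strongest peak first."""
    # Only scanned blocks whose center lies inside the search range take part.
    targets = [b for b in scan_blocks if contains(search_range, center(b))]
    hist = Counter()
    for p in pattern_blocks:
        for s in targets:
            dy = center(s)[1] - center(p)[1]
            hist[round(dy / bin_size)] += 1
    # Assumed peak definition: not smaller than either neighbouring bin.
    peaks = [b for b, votes in hist.items()
             if votes >= hist.get(b - 1, 0) and votes >= hist.get(b + 1, 0)]
    peaks.sort(key=lambda b: hist[b], reverse=True)
    return [b * bin_size for b in peaks[:max_candidates]]


pattern = [(300, 100, 120, 20), (250, 130, 200, 20), (300, 70, 80, 20)]
# Same layout moved 40 units down, plus one unrelated block.
scanned = [(x, y + 40, w, h) for x, y, w, h in pattern] + [(300, 300, 50, 20)]
print(y_candidate_shifts(pattern, scanned, search_range=(0, 0, 1000, 1000)))
```

The true shift collects one vote per block of the partial layout, while accidental pairings spread out over other bins, so the correct Y position tends to appear among the strongest peaks.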

なお、図１１（ｃ）のＹ候補位置群を決定するための探索範囲１１００は、部分パターンのインデックスブロックの位置を基準に、あらかじめ設定された値で自動決定される。なお、探索範囲のサイズについては、全ての項目で共通の範囲を使用してもよいし、処理対象の項目の属性に応じて決定してもよい。例えば、タイトルのインデックスブロックは文書内で固定の位置にあることが多い。よって、処理対象の項目がタイトルの場合、探索範囲を狭くしても探索範囲から推定されるべきインデックスブロックが外れる可能性は低いため、探索範囲を狭く設定してもよい。探索範囲を狭くすることで、計算量を抑えつつ、余計な候補位置が決定されることを防ぐことができる。一方、項目が合計金額のインデックスブロックは、文書内の表構造の項目行数の変化に応じて、位置が上下に変化することがある。このため、処理対象の項目が合計金額の場合は他の項目よりも探索範囲を上下に広く設定してもよい。 The search range 1100 for determining the Y candidate positions in FIG. 11(c) is automatically determined by a preset value based on the position of the index block of the partial pattern. The size of the search range may be a common range for all items, or may be determined according to the attributes of the item to be processed. For example, the index block of the title is often located at a fixed position in the document. Therefore, when the item to be processed is the title, the search range may be set narrower because the index block to be estimated from the search range is unlikely to be missed even if the search range is narrowed. By narrowing the search range, it is possible to prevent unnecessary candidate positions from being determined while suppressing the amount of calculation. On the other hand, the position of the index block whose item is the total amount may change up or down depending on the change in the number of item rows in the table structure in the document. Therefore, when the item to be processed is the total amount, the search range may be set wider up or down than other items.

Ｓ８０４において画像処理部３０５は、Ｓ８０３で決定された夫々のＹ候補位置を基準に、部分パターンの部分レイアウトとスキャン画像のテキストブロック群との一致度を導出する。 In S804, the image processing unit 305 derives the degree of match between the partial layout of the partial pattern and the text blocks of the scanned image based on each of the Y candidate positions determined in S803.

図１３は、スキャン画像内のある位置に処理対象の項目の部分レイアウトを重ねて置いた場合の、部分レイアウトとスキャン画像のテキストブロックのレイアウトとの重なりの状態を示した図である。図１３を用いて、部分レイアウトとスキャン画像のテキストブロック群の一致度の導出方法について説明する。 Figure 13 shows how the partial layout of the item to be processed overlaps the layout of the text blocks in the scanned image when the partial layout is placed at a certain position in the scanned image. The method of deriving the degree of match between the partial layout and the group of text blocks in the scanned image is described with reference to Figure 13.

図１３において、実線の矩形は、処理対象の項目の部分レイアウトを構成するテキストブロック１００３～１００５である。一点鎖線の矩形は、部分パターン範囲１００６を表している。破線の矩形は、スキャン画像のテキストブロック１１０１、１１０４～１１０６、１１０９を表す。斜線塗りつぶし領域１３０９、１３１０は、部分レイアウトのテキストブロック１００３～１００５とスキャン画像のテキストブロックの重なっている領域を表している。 In Figure 13, the solid rectangles are text blocks 1003 to 1005, which make up the partial layout of the item being processed. The dash-dotted rectangle represents the partial pattern range 1006. The dashed rectangles represent text blocks 1101, 1104 to 1106, and 1109 in the scanned image. The hatched areas 1309 and 1310 represent the areas where text blocks 1003 to 1005 of the partial layout overlap text blocks in the scanned image.

部分レイアウトとスキャン画像のテキストブロックとの一致度Ｓｃｏｒｅは、以下の式（１）で導出する。 The degree of match between the partial layout and the text block of the scanned image (Score) is calculated using the following formula (1):

上記式（１）において、Ｒは部分レイアウトを構成する全テキストブロックを表しており、またＮ_Rは部分レイアウトを構成するテキストブロックの総数を表す。図１３において、Ｒは、テキストブロック１００３～１００５であり、Ｎ_Rは３である。 In formula (1), R denotes the set of all text blocks that make up the partial layout, and N_R denotes the total number of those text blocks. In Figure 13, R consists of text blocks 1003 to 1005, and N_R is 3.

Correlation(r)は、部分レイアウトを構成する一つのテキストブロックｒの個別一致度である。テキストブロックｒの個別一致度Correlation(r)は、式（２）によって導出する。 Correlation(r) is the individual degree of match of a single text block r in the partial layout. It is derived using formula (2).

OverlappingQは、テキストブロックｒと重なりのあるスキャン画像のテキストブロックの集合である。OverlapArea(r,q)は、テキストブロックｒとOverlappingQのテキストブロックうちの１つのテキストブロックｑとの重なり領域の面積である。またＮ_OverlappingQはOverlappingQの総数を表す。 OverlappingQ is the set of text blocks in the scanned image that overlap text block r. OverlapArea(r,q) is the area of the region where text block r overlaps text block q, one of the text blocks in OverlappingQ. N_OverlappingQ is the number of text blocks in OverlappingQ.

図１３において、rをテキストブロック１００３とした場合、OverlappingQはテキストブロック１１０５のみでありOverlapArea(r,q)は領域１３０９である。ｒをテキストブロック１００５とした場合、OverlappingQは、テキストブロック１１０４のみでありOverlapArea(r,q)は領域１３１０が該当する。ｒをテキストブロック１００４とした場合、該当するOverlappingQは無いためＮ_OverlappingQは0であることから、Correlation(r)は0である。 In Figure 13, if r is text block 1003, OverlappingQ contains only text block 1105, and OverlapArea(r,q) is the area of region 1309. If r is text block 1005, OverlappingQ contains only text block 1104, and OverlapArea(r,q) is the area of region 1310. If r is text block 1004, no text block overlaps it, so N_OverlappingQ is 0 and Correlation(r) is 0.

Area_rはテキストブロックｒの面積であり、Area_qはテキストブロックｑの面積である。 Area_r is the area of text block r, and Area_q is the area of text block q.
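
The bodies of formulas (1) and (2) are not reproduced in this text, so the sketch below is only one reading that is consistent with the symbol definitions above. It assumes that Correlation(r) averages OverlapArea(r,q) / max(Area_r, Area_q) over the blocks in OverlappingQ, and that Score averages Correlation(r) over the N_R blocks of the partial layout. The normalisation by `max(Area_r, Area_q)` in particular is an assumption, as are the function names and the `(x, y, width, height)` block representation.

```python
def area(block):
    """Area of a block given as (x, y, width, height)."""
    return block[2] * block[3]


def overlap_area(a, b):
    """OverlapArea: area of the intersection of two blocks (0 if disjoint)."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def correlation(r, scan_blocks):
    """Individual degree of match of layout block r (assumed form of formula (2))."""
    overlapping_q = [q for q in scan_blocks if overlap_area(r, q) > 0]
    if not overlapping_q:  # N_OverlappingQ == 0 gives Correlation(r) == 0
        return 0.0
    total = sum(overlap_area(r, q) / max(area(r), area(q)) for q in overlapping_q)
    return total / len(overlapping_q)


def score(layout_blocks, scan_blocks):
    """Degree of match of the whole partial layout (assumed form of formula (1))."""
    if not layout_blocks:
        return 0.0
    return sum(correlation(r, scan_blocks) for r in layout_blocks) / len(layout_blocks)
```

With three layout blocks of which one is matched exactly, one is half covered and one has no overlapping block, this sketch gives a Score of (1.0 + 0.0 + 0.5) / 3 = 0.5.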

なお、式（１）による一致度の導出では、スキャン画像のテキストブロックの数が多く、またテキストブロックの面積が大きいほど、個別一致度Correlation(r)の値は大きく導出されてしまうことがある。そこで、一致度Ｓｃｏｒｅは、以下の式（１）’に示すようにペナルティ項PenaltyTermを追加してもよい。 Note that when the degree of match is derived using formula (1), the individual degree of match Correlation(r) may come out larger as the scanned image contains more text blocks and as those text blocks have larger areas. A penalty term PenaltyTerm may therefore be added to the degree of match Score, as shown in the following formula (1)'.

式（１）’におけるペナルティ項PenaltyTermは、式（３）によって導出する。 The penalty term PenaltyTerm in equation (1)' is derived using equation (3).

TotalArea_Rは、部分レイアウトを構成する全テキストブロックの総面積である。図１３ではテキストブロック１００３～１００５の総面積である。 TotalArea_R is the total area of all text blocks that make up the partial layout. In Figure 13, it is the total area of text blocks 1003 to 1005.

TotalArea_NonOverlappingQは、部分パターン範囲内に存在するスキャン画像のテキストブロックのうち、部分レイアウトを構成するテキストブロックの何れとも重ならないテキストブロック群の面積の総和である。図１３の場合、部分パターン範囲１００６内のテキストブロック１１０１、１１０４、１１０５、１１０６、１１０９のうちテキストブロック１００３～１００５と重ならないテキストブロック１１０１、１１０６、１１０９の面積の総和である。 TotalArea_NonOverlappingQ is the total area of those text blocks in the scanned image that lie within the partial pattern range and do not overlap any of the text blocks that make up the partial layout. In Figure 13, text blocks 1101, 1104, 1105, 1106, and 1109 lie within the partial pattern range 1006; of these, text blocks 1101, 1106, and 1109 do not overlap text blocks 1003 to 1005, so TotalArea_NonOverlappingQ is the total area of text blocks 1101, 1106, and 1109.

ペナルティ項を設けることによって、部分パターン範囲１００６内の部分レイアウトを構成するテキストブロックが存在しなかった範囲に、スキャン画像内のテキストブロックが存在する場合に一致度を減点するように調整することができる。よって、部分レイアウトを構成するテキストブロックが少ない場合であっても、部分パターン範囲内の部分レイアウトを構成するテキストブロックが存在しない領域の情報を活用して一致度を導出することができる。なお、一致度の導出方法は、上記の式による導出に限るものではなく、部分レイアウトとの一致度が決定できればよい。 By providing a penalty term, it is possible to adjust the degree of match so that points are subtracted when a text block in the scanned image is present in an area where no text block constituting the partial layout in the partial pattern area 1006 was present. Therefore, even if there are few text blocks constituting the partial layout, the degree of match can be derived by utilizing information on areas within the partial pattern area where no text block constituting the partial layout is present. Note that the method of deriving the degree of match is not limited to derivation using the above formula, and it is sufficient if the degree of match with the partial layout can be determined.
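
The penalty can be sketched in the same spirit. The bodies of formulas (1)' and (3) are not reproduced in this text, so the code assumes PenaltyTerm = TotalArea_NonOverlappingQ / TotalArea_R and that formula (1)' subtracts it from the Score of formula (1). It also assumes that a scanned block counts as being within the partial pattern range if it intersects the range at all. The function names and the `(x, y, width, height)` block representation are hypothetical.

```python
def area(block):
    """Area of a block given as (x, y, width, height)."""
    return block[2] * block[3]


def overlaps(a, b):
    """True if the two blocks share a region of non-zero area."""
    return (min(a[0] + a[2], b[0] + b[2]) > max(a[0], b[0])
            and min(a[1] + a[3], b[1] + b[3]) > max(a[1], b[1]))


def penalty_term(layout_blocks, scan_blocks, pattern_range):
    """Assumed form of formula (3): TotalArea_NonOverlappingQ / TotalArea_R."""
    total_area_r = sum(area(r) for r in layout_blocks)
    if total_area_r == 0:
        return 0.0
    # Scanned blocks inside the pattern range that touch no layout block.
    non_overlapping_q = [
        q for q in scan_blocks
        if overlaps(q, pattern_range)
        and not any(overlaps(q, r) for r in layout_blocks)
    ]
    return sum(area(q) for q in non_overlapping_q) / total_area_r


def penalized_score(base_score, layout_blocks, scan_blocks, pattern_range):
    """Assumed form of formula (1)': the base score minus the penalty term."""
    return base_score - penalty_term(layout_blocks, scan_blocks, pattern_range)
```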

Ｓ８０４において画像処理部３０５は、Ｓ８０３で決定したＹ候補位置群のうちのいずれかのＹ候補位置に、インデックスブロックが位置するように部分パターン（部分レイアウトおよび部分パターン範囲）を置く。そして、画像処理部３０５は、部分パターンをＸ方向に走査して、各位置における一致度を導出する。画像処理部３０５は、これを全てのＹ候補位置群に対して行う。 In S804, the image processing unit 305 places a partial pattern (partial layout and partial pattern range) so that the index block is located at one of the Y candidate positions among the group of Y candidate positions determined in S803. The image processing unit 305 then scans the partial pattern in the X direction to derive the degree of match at each position. The image processing unit 305 performs this for all groups of Y candidate positions.

図１４は、Ｓ８０３で決定したＹ候補位置群のうちの一つのＹ候補位置における本ステップの処理を表した図である。図１４（ａ）において、実線の矩形は、部分レイアウトを構成するテキストブロック１００３～１００５であり、一点鎖線の矩形は部分パターン範囲１００６を表している。また破線の矩形は、スキャン画像のテキストブロック１１０１、１１０５、１１０６を表し、斜線の領域は、部分レイアウトのテキストブロックとスキャン画像のテキストブロックとの重なっている領域を表している。また、図１４では、本ステップにおける処理が図１４（ａ）～（ｅ）から順に処理が進むように示されており、探索範囲内で部分パターンをＸ方向に（左から右に）走査しながら、それぞれの位置における一致度を導出する様子を示している。同様の処理が夫々のＹ候補位置において行われる。 Figure 14 shows the processing of this step at one of the Y candidate positions determined in S803. In Figure 14(a), the solid rectangles are text blocks 1003 to 1005, which make up the partial layout, and the dash-dotted rectangle represents the partial pattern range 1006. The dashed rectangles represent text blocks 1101, 1105, and 1106 in the scanned image, and the hatched areas represent the areas where text blocks of the partial layout overlap text blocks in the scanned image. Figure 14 shows the processing of this step proceeding in order from (a) to (e): the partial pattern is scanned in the X direction (from left to right) within the search range, and the degree of match is derived at each position. The same processing is performed at each Y candidate position.

Ｓ８０５において画像処理部３０５は、Ｓ８０４で導出した一致度が最大となる位置をＸＹ候補位置と決定する。例えば、図１４の場合、部分パターン（部分レイアウト）が、図１４（ｃ）に示す位置で一致度が最大となる。このため、図１４（ｃ）における部分レイアウトに含まれるインデックスブロックを示すテキストブロック１００３の位置が、ＸＹ候補位置として決定される。 In S805, the image processing unit 305 determines the position at which the degree of match derived in S804 is greatest as the XY candidate position. For example, in the case of FIG. 14, the degree of match of the partial pattern (partial layout) is greatest at the position shown in FIG. 14(c). Therefore, the position of the text block 1003 indicating the index block included in the partial layout in FIG. 14(c) is determined as the XY candidate position.
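
The scan of S804 followed by the selection of S805 amounts to an exhaustive search over X at each Y candidate. A minimal sketch, assuming the degree of match is supplied as a callable `score_fn(x, y)` and that X is sampled every `x_step` pixels (both hypothetical, as is the function name):

```python
def best_xy_candidate(y_candidates, x_range, score_fn, x_step=1):
    """Return (score, x, y) with the highest score, or None if nothing was scanned.

    y_candidates: Y positions at which the index block is placed (from S803).
    x_range: (x_min, x_max) of the search range, inclusive.
    score_fn: hypothetical callable returning the degree of match at (x, y).
    """
    best = None
    for y in y_candidates:
        # Slide the partial pattern from left to right at this Y candidate.
        for x in range(x_range[0], x_range[1] + 1, x_step):
            s = score_fn(x, y)
            if best is None or s > best[0]:
                best = (s, x, y)
    return best
```

On ties the first position encountered is kept; the source does not say how ties are resolved, so that choice is arbitrary.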

Ｓ８０６において画像処理部３０５は、Ｓ８０５で決定したＸＹ候補位置における一致度が所定の閾値以上かどうかを判定する。 In S806, the image processing unit 305 determines whether the degree of match at the XY candidate positions determined in S805 is equal to or greater than a predetermined threshold.

一致度が閾値以上の場合（Ｓ８０６がＹＥＳ）、Ｓ８０７において画像処理部３０５は、Ｓ８０５で決定したスキャン画像上のＸＹ候補位置を処理対象の項目のテキストブロック（インデックスブロック）のある位置と推定する。画像処理部３０５は、推定した位置に基づき、スキャン画像内の処理対象の項目のインデックスブロックを推定する処理を行う。 If the degree of match is equal to or greater than the threshold (YES in S806), in S807 the image processing unit 305 estimates the XY candidate position on the scanned image determined in S805 as the position of the text block (index block) of the item to be processed. The image processing unit 305 performs processing to estimate the index block of the item to be processed in the scanned image based on the estimated position.

例えば、登録文書における処理対象の項目のインデックスブロックをスキャン画像内のＸＹ候補位置にシフトさせた場合に、重なり合うスキャン画像内のテキストブロックが、所定の条件を満たすかが判定される。所定の条件とは、例えば、登録文書における処理対象のインデックスブロックとの重なり度合いを示す重なり率が所定の値以上、かつ、登録文書における処理対象のインデックスブロックとの左上座標の距離が一定の範囲内に入っているかという条件である。 For example, when the index block of an item to be processed in the registered document is shifted to a candidate XY position in the scanned image, it is determined whether the overlapping text block in the scanned image satisfies a specified condition. The specified condition is, for example, whether the overlap rate indicating the degree of overlap with the index block to be processed in the registered document is equal to or greater than a specified value, and whether the distance of the upper left coordinate with the index block to be processed in the registered document is within a certain range.
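
The example condition can be sketched as a filter over the scanned text blocks. The source does not define the overlap rate precisely, so this sketch assumes it is the intersection area divided by the area of the shifted registered index block, and it measures the upper-left distance as a Euclidean distance. The thresholds `min_rate` and `max_dist` and all function names are hypothetical.

```python
import math


def overlap_rate(ref, q):
    """Intersection area divided by the area of ref (assumed definition)."""
    w = min(ref[0] + ref[2], q[0] + q[2]) - max(ref[0], q[0])
    h = min(ref[1] + ref[3], q[1] + q[3]) - max(ref[1], q[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / (ref[2] * ref[3])


def find_index_block(shifted_index_block, scan_blocks, min_rate=0.5, max_dist=20.0):
    """Return the scanned block satisfying both conditions, or None.

    Blocks are (x, y, width, height); shifted_index_block is the registered
    index block moved to the XY candidate position.
    """
    best = None
    for q in scan_blocks:
        rate = overlap_rate(shifted_index_block, q)
        dist = math.hypot(q[0] - shifted_index_block[0],
                          q[1] - shifted_index_block[1])
        if rate >= min_rate and dist <= max_dist:
            # If several blocks qualify, keep the one with the highest rate.
            if best is None or rate > best[0]:
                best = (rate, q)
    return best[1] if best else None
```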

所定の条件を満たすテキストブロックがあると判定された場合（Ｓ８０７がＹＥＳ）、Ｓ８０８に進む。Ｓ８０８において画像処理部３０５は、Ｓ８０７で所定の条件を満たすと判定されたスキャン画像のテキストブロックを、Ｓ８０１で選択した処理対象の項目を示す文字列を含むテキストブロック（インデックスブロック）と推定する。 If it is determined that there is a text block that satisfies the specified condition (YES in S807), the process proceeds to S808. In S808, the image processing unit 305 estimates that the text block of the scanned image that is determined in S807 to satisfy the specified condition is a text block (index block) that contains a character string that indicates the item to be processed that was selected in S801.

一致度が閾値未満の場合（Ｓ８０６がＮＯ）または該当のテキストブロックがないと判定された場合（Ｓ８０７がＮＯ）、Ｓ８０９に進む。Ｓ８０９において画像処理部３０５は、Ｓ８０１で選択した処理対象の項目に対応するテキストブロックはスキャン画像内には無いと決定する。例えば、スキャン画像において処理対象の項目に対応する文字列が所定の領域に記載されていない場合、あるいは、Ｓ８０４で誤って位置を推定してしまった場合、Ｓ８０９において決定が行われる。 If the degree of match is less than the threshold (NO in S806) or if it is determined that there is no corresponding text block (NO in S807), the process proceeds to S809. In S809, the image processing unit 305 determines that there is no text block in the scanned image that corresponds to the item to be processed selected in S801. For example, if a character string corresponding to the item to be processed is not written in a specified area in the scanned image, or if the position was incorrectly estimated in S804, a determination is made in S809.

Ｓ８１０において画像処理部３０５は、登録文書のインデックス情報に登録されている全ての項目について、インデックスブロックを推定する処理を完了したかを判定する。未処理の項目があればＳ８０１に戻る。 In S810, the image processing unit 305 determines whether the process of estimating index blocks has been completed for all items registered in the index information of the registered document. If there are any unprocessed items, the process returns to S801.

全ての項目について処理が完了していれば本フローチャートの処理を終えＳ５０８に進む。Ｓ５０８において画像処理部３０５は、推定された夫々の項目のインデックスブロックにＯＣＲ処理を実行し、それぞれの項目に対応する文字列をインデックスとして抽出する。 If processing has been completed for all items, the process of this flowchart ends and proceeds to S508. In S508, the image processing unit 305 performs OCR processing on the index blocks of each estimated item and extracts character strings corresponding to each item as indexes.

以上説明したように本実施形態では、テキストブロックのレイアウトの一部を利用してスキャン画像に含まれるインデックスの抽出を行う。このため、本実施形態によれば、入力文書おける記載内容の変化等によって、スキャン画像に含まれるインデックスブロックの位置が登録文書と異なる場合であっても、インデックスを抽出することができる。また、本実施形態では、文書マッチングによって入力文書の種類を特定して、文書の種類に紐づいた抽出ルールを利用する。このため、テキストブロックの部分的なレイアウトによるインデックスブロックを推定する処理であっても、インデックスの誤抽出を抑制することができる。また、文書マッチングおよびインデックスブロック推定処理では、ＯＣＲ処理の前処理の結果として得られる前景領域のうちテキストブロックのみを使用する。このため、余計な計算コストをかけることなく、インデックス抽出処理を行うことができる。 As described above, in this embodiment, the index included in the scanned image is extracted by using a part of the layout of the text block. Therefore, according to this embodiment, even if the position of the index block included in the scanned image differs from that of the registered document due to a change in the contents of the input document, etc., the index can be extracted. In addition, in this embodiment, the type of input document is identified by document matching, and an extraction rule associated with the document type is used. Therefore, even in the process of estimating index blocks based on the partial layout of text blocks, erroneous extraction of indexes can be suppressed. Furthermore, in the document matching and index block estimation process, only the text blocks of the foreground region obtained as a result of preprocessing of the OCR process are used. Therefore, the index extraction process can be performed without incurring extra calculation costs.

＜実施形態２＞ <Embodiment 2>
実施形態１では、部分パターン範囲は、予め設定された値に基づき決定する方法について説明した。しかしながら、部分パターン範囲を広く設定しすぎると、インデックスブロックの周囲のみレイアウトが変わっているような場合、適切にインデックスブロックの位置を推定することができない。一方、部分パターン範囲が小さくなると部分レイアウトを構成するテキストブロックの数が少なく決定されることがあり、スキャン画像内の一致度の高い領域を探索するのが難しくなる。このため本実施形態では、部分パターン範囲を適切なサイズに決定する方法を説明する。なお、本実施形態については、実施形態１からの差分を中心に説明する。特に明記しない部分については実施形態１と同じ構成および処理である。 In the first embodiment, a method for determining the partial pattern range based on a preset value has been described. However, if the partial pattern range is set too wide, the position of the index block cannot be estimated appropriately when the layout changes only around the index block. On the other hand, if the partial pattern range is small, the number of text blocks constituting the partial layout may be determined to be small, making it difficult to search for an area with a high degree of match in the scanned image. For this reason, in this embodiment, a method for determining the partial pattern range to an appropriate size will be described. Note that the present embodiment will be described mainly with respect to the differences from the first embodiment. The configuration and processing are the same as those of the first embodiment unless otherwise specified.

文書の種類に応じてインデックスブロックの周囲に存在するテキストブロックの数、レイアウトは変わる。このため、本実施形態では、部分パターン範囲のサイズを決定するために、段階的に対象の項目のインデックスブロックを含む領域を広げながら、その領域にと重なるテキストブロックの数をカウントする。そして重なるテキストブロックの数が一定数以上になったときの領域を、その項目の部分パターン範囲として決定する。 The number and layout of text blocks around an index block vary depending on the type of document. For this reason, in this embodiment, to determine the size of the partial pattern range, the area including the index block of the target item is gradually expanded and the number of text blocks that overlap with that area is counted. Then, when the number of overlapping text blocks reaches a certain number or more, the area is determined to be the partial pattern range of that item.

図１５は、本実施形態における部分パターン範囲の決定方法を説明するための図である。図１５（ａ）における、実線の矩形はタイトルのインデックスブロック１０００であり、一点鎖線の矩形は、タイトルの部分パターン範囲を決定するための領域である。領域は、それぞれ、初期領域１５００、２段階目の領域１５０１、最大領域１５０２を示している。図１５（ａ）では、タイトルの項目における部分パターン範囲を決定するための領域が段階的に変更される様子を示している。初期領域から最大領域まで段階的に領域を広げながら、その領域と重なるインデックスブロックを除くテキストブロックをカウントする。そして、カウントされたテキストブロックが所定の数以上になったときの一点鎖線の矩形で示す領域を、その項目の部分パターン範囲として決定する。なお、所定の数は、１個以上であることが好ましい。本実施形態では、所定の数が５であるものとして説明する。 Figure 15 illustrates the method of determining the partial pattern range in this embodiment. In Figure 15(a), the solid rectangle is the title index block 1000, and the dash-dotted rectangles are the regions used to determine the partial pattern range for the title: the initial region 1500, the second-stage region 1501, and the maximum region 1502. Figure 15(a) shows how the region used to determine the partial pattern range for the title item is changed in stages. The region is expanded step by step from the initial region toward the maximum region, and at each stage the text blocks that overlap the region, excluding the index block, are counted. The dash-dotted region at which the count reaches a predetermined number or more is determined as the partial pattern range for that item. The predetermined number is preferably one or more. In this embodiment, the predetermined number is assumed to be 5.

本実施形態の部分パターン範囲の決定方法について具体的に説明する。はじめに、初期領域１５００と少しでも重なっているテキストブロックの数をカウントする。この場合、インデックスブロック１０００以外のテキストブロックが存在しないため、次の段階へ進む。 The method for determining the partial pattern range in this embodiment will now be described in detail. First, the number of text blocks that overlap even slightly with the initial region 1500 is counted. In this case, there are no text blocks other than the index block 1000, so the process proceeds to the next step.

次に、領域を広げて、２段階目の領域１５０１と少しでも重なっているテキストブロックをカウントする。図１５（ｂ）は、部分パターン範囲を決定するための領域を２段階目の領域１５０１とした場合の図である。図１５（ｂ）に示すように２段階目の領域１５０１とは、テキストブロック１００１、１００８～１０１３が重なる。このため２段階目の領域１５０１と重なるテキストブロックは７個とカウントされる。そして重なるテキストブロックの数が所定の数である５以上であると判定される。このため、タイトルの部分パターン範囲については２段階目の領域１５０１が示す位置およびサイズに決定される。このため部分パターン範囲に少なくとも一部が含まれるテキストブロック１００１、１００８～１０１３と、インデックスブロック１０００とからなるレイアウトが、タイトルの部分レイアウトとして決定される。 Next, the region is expanded, and text blocks that overlap even slightly with the second-stage region 1501 are counted. FIG. 15B shows the case where the region for determining the partial pattern range is the second-stage region 1501. As shown in FIG. 15B, text blocks 1001 and 1008 to 1013 overlap with the second-stage region 1501. Therefore, the number of text blocks that overlap with the second-stage region 1501 is counted as seven. It is then determined that the number of overlapping text blocks is equal to or greater than the predetermined number of five. Therefore, the position and size indicated by the second-stage region 1501 are determined as the partial pattern range of the title. Therefore, a layout consisting of the text blocks 1001 and 1008 to 1013, at least a portion of which is included in the partial pattern range, and the index block 1000 is determined as the partial layout of the title.
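
The staged expansion described above can be sketched as a loop over candidate regions ordered from the initial region to the maximum region. The function name and the `(x, y, width, height)` representation are hypothetical. The source does not state what happens if even the maximum region overlaps too few blocks; falling back to the maximum region is an assumption of this sketch.

```python
def intersects(a, b):
    """True if two (x, y, width, height) rectangles share any area."""
    return (min(a[0] + a[2], b[0] + b[2]) > max(a[0], b[0])
            and min(a[1] + a[3], b[1] + b[3]) > max(a[1], b[1]))


def choose_pattern_range(index_block, blocks, regions, min_count=5):
    """Pick the first region overlapping at least min_count non-index blocks.

    regions: non-empty list ordered from the initial to the maximum region.
    Returns (region, overlapping_blocks); the partial layout is then the
    index block plus overlapping_blocks.
    """
    hits = []
    for region in regions:
        # Count blocks that overlap the region even slightly, excluding
        # the index block itself.
        hits = [b for b in blocks if b != index_block and intersects(b, region)]
        if len(hits) >= min_count:
            return region, hits
    # Assumption: fall back to the maximum region if the count is never reached.
    return regions[-1], hits
```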

または、項目によって、周囲のテキストブロックの数は異なり、記載内容によるテキストブロックのレイアウトの変化が少ない領域は異なる。このため、例えば、項目の属性に応じて部分パターン範囲のサイズを異ならせてもよい。つまり、項目の属性に応じた部分パターンのサイズを予め設定してもよい。 Alternatively, the number of surrounding text blocks varies depending on the item, and the areas where the layout of the text blocks changes little depending on the content written are different. For this reason, for example, the size of the partial pattern range may be made different depending on the attributes of the item. In other words, the size of the partial pattern may be set in advance depending on the attributes of the item.

項目がタイトルの場合、タイトルのテキストブロックの近傍にはテキストブロックが存在しないことが多いという特徴がある。また、タイトルは、文書の記載内容の変化によるテキストブロックのレイアウトの変化が少ない文書の上部に存在するという特徴がある。このため、図１０（ｃ）の部分パターン範囲１００７に示すように、項目が文書のタイトルであれば、Ｘ方向は画像幅全体が収まり、Ｙ方向も画像の約４分の１が収まるような領域が部分パターン範囲として決定されてもよい。 When an item is a title, there is a characteristic that there are often no text blocks near the title text block. In addition, a title is characterized by being located at the top of a document where changes in the content of the document cause little change in the layout of the text block. For this reason, as shown in partial pattern range 1007 in Figure 10(c), if the item is the title of a document, an area that fits the entire image width in the X direction and approximately one-quarter of the image in the Y direction may be determined as the partial pattern range.

以上説明したように本実施形態では、文書に応じて部分パターン範囲が決定される。このため、文書に応じて適切な部分パターン範囲によって、インデックスブロック推定処理の精度を向上させることができる。 As described above, in this embodiment, the partial pattern range is determined according to the document. Therefore, the accuracy of the index block estimation process can be improved by using an appropriate partial pattern range according to the document.

＜実施形態３＞ <Embodiment 3>
実施形態１では、部分パターンを利用して導出された一致度が最大となる位置をＸＹ候補位置として決定し、ＸＹ候補位置の一致度が所定の閾値以上であれば、ＸＹ候補位置に基づき処理対象の項目のインデックスブロックのある位置を推定する方法を説明した。 In embodiment 1, a method was described in which the position at which the degree of match derived using a partial pattern is greatest is determined as the XY candidate position, and if the degree of match of the XY candidate position is equal to or greater than a predetermined threshold, the position of the index block of the item to be processed is estimated based on the XY candidate position.

しかしながら、入力文書には、登録文書の部分レイアウトと配置が類似したテキストブロックを含む領域が複数存在することがある。入力文書内に部分レイアウトと類似する領域が複数存在する場合、実施形態１の方法では、入力文書内における処理対象の項目のインデックスブロックの推定に失敗してしまうことがある。 However, an input document may contain multiple regions whose text blocks are arranged similarly to the partial layout of the registered document. When the input document contains multiple regions similar to the partial layout, the method of Embodiment 1 may fail to estimate the index block of the item to be processed in the input document.

そこで本実施形態では、処理対象の項目の部分レイアウトに類似した領域が入力文書内に複数存在する場合であっても、入力文書内のインデックスブロックの位置を適切に推定する方法について説明する。なお、本実施形態については、実施形態１からの差分を中心に説明する。特に明記しない部分については実施形態１と同じ構成および処理である。 In this embodiment, therefore, a method for appropriately estimating the position of an index block in an input document will be described, even if the input document contains multiple areas similar to the partial layout of the item to be processed. Note that this embodiment will be described mainly focusing on the differences from embodiment 1. Parts that are not specifically mentioned have the same configuration and processing as embodiment 1.

図１６は、本実施形態におけるＳ５０７のインデックスブロック推定処理を説明するためのフローチャートである。本実施形態におけるインデックスブロック推定処理の詳細について、図１６のフローチャートに従い説明する。Ｓ１６００～Ｓ１６０４はＳ８００～Ｓ８０４と同一であるため説明を省略する。 Figure 16 is a flowchart for explaining the index block estimation process of S507 in this embodiment. Details of the index block estimation process in this embodiment will be explained with reference to the flowchart in Figure 16. S1600 to S1604 are the same as S800 to S804, so the explanation will be omitted.

Ｓ１６０５において画像処理部３０５は、Ｓ１６０４で導出した一致度が所定の閾値以上となるスキャン画像内のＸＹ位置を決定する。本ステップの結果、複数のＸＹ位置が決定されない場合もあるが、便宜的に本ステップによって決定されるＸＹ位置をＸＹ候補位置群と呼ぶ。 In S1605, the image processing unit 305 determines the XY positions in the scanned image where the degree of match derived in S1604 is equal to or greater than a predetermined threshold. As a result of this step, there may be cases where multiple XY positions are not determined, but for convenience, the XY positions determined by this step are referred to as a group of XY candidate positions.

図１７は、インデックスブロックとその周囲のブロックからなる部分レイアウトと類似する領域が複数存在する登録文書の例を示す図である。図１７（ａ）は、登録文書の一例を示す図である。図１７（ｂ）は、図１７（ａ）の登録文書における「見積日付（ＱｕｏｔａｔｉｏｎＤａｔｅ）」の項目に対応する文字列を含むテキストブロック１７０５をインデックスブロックとした場合の部分パターンを示す図である。図１７（ｂ）において、一点鎖線の矩形は、「見積日付」の項目の部分パターン範囲１７００を示し、実線の矩形で表されるテキストブロック１７０１～１７０６は、「見積日付」の項目の部分レイアウトを構成するテキストブロックを示している。図１６のフローチャートの説明では、「見積日付」を処理対象の項目とした場合の処理について説明する。 Figure 17 shows an example of a registered document in which there are multiple areas that are similar to a partial layout consisting of an index block and its surrounding blocks. Figure 17(a) shows an example of a registered document. Figure 17(b) shows a partial pattern when a text block 1705 containing a character string corresponding to the "Quotation Date" item in the registered document of Figure 17(a) is used as an index block. In Figure 17(b), the dashed-dotted rectangle indicates the partial pattern range 1700 of the "Quotation Date" item, and text blocks 1701 to 1706 represented by solid-line rectangles indicate text blocks that make up the partial layout of the "Quotation Date" item. In the explanation of the flowchart in Figure 16, the processing when "Quotation Date" is the item to be processed will be explained.

図１８は、入力文書を説明するための図である。図１８（ａ）は、入力文書を示す図であり、本フローチャートの説明では、この入力文書がスキャンされた結果得られたスキャン画像に対して、インデックスブロック推定処理が行われるものとして説明する。また、Ｓ５０４の文書マッチングにより、図１８（ａ）の入力文書に類似する文書は、図１７の登録文書が特定されたものとして説明する。 Figure 18 is a diagram for explaining an input document. Figure 18(a) is a diagram showing an input document, and in the explanation of this flowchart, it is assumed that the index block estimation process is performed on the scanned image obtained as a result of scanning this input document. Also, it is assumed that the document similar to the input document in Figure 18(a) is the registered document in Figure 17, which has been identified by the document matching in S504.

図１８（ｂ）～（ｅ）は、それぞれ、図１８（ａ）の入力文書のスキャン画像に対してブロックセレクション処理を行った結果検出されたテキストブロックを表す画像に、図１７（ｂ）の「見積日付」の部分パターンを重畳した図である。図１８（ｂ）～（ｅ）の夫々の図における矩形は、部分パターンを示す。即ち、実線の矩形は、部分レイアウトを構成するテキストブロックであり、一点鎖線の矩形は部分パターン範囲である。 Figures 18(b) to (e) each show the partial pattern for "Quotation Date" from Figure 17(b) superimposed on an image of the text blocks detected by performing block selection processing on the scanned image of the input document in Figure 18(a). The rectangles drawn in each of Figures 18(b) to (e) indicate the partial pattern: the solid rectangles are the text blocks that make up the partial layout, and the dash-dotted rectangle is the partial pattern range.

図１８（ｂ）～（ｅ）で示す、部分パターンの位置は、Ｓ１６０４で導出した一致度が所定の閾値以上となったときの位置である。このため部分レイアウトを構成する実線の矩形で表したテキストブロックのうち、インデックスブロックのＸＹ位置１８０１～１８０４が、本ステップの処理の結果、ＸＹ候補位置群として決定されている。 The positions of the partial patterns shown in Figures 18(b) to (e) are the positions when the degree of match derived in S1604 is equal to or greater than a predetermined threshold. Therefore, among the text blocks represented by solid-line rectangles that make up the partial layout, the XY positions 1801 to 1804 of the index blocks are determined as a group of XY candidate positions as a result of the processing in this step.

図１８（ａ）に示す入力文書のように、単純なテキストブロックの配置が繰り返し存在する文書において、その繰り返して配置されているテキストブロックの中にインデックスブロックが存在される場合には、一致度が閾値以上となるＸＹ位置が複数決定される。このため、図１８（ａ）に示す入力文書に対して、本ステップの処理がされた結果決定されるＸＹ候補位置群の数は２以上となる。 In a document in which a simple arrangement of text blocks is repeated, such as the input document shown in Figure 18(a), if the index block lies among the repeated text blocks, multiple XY positions whose degree of match is equal to or greater than the threshold are determined. Therefore, for the input document shown in Figure 18(a), this step yields two or more XY candidate positions.

Ｓ１６０６において画像処理部３０５は、Ｓ１６０５で決定したＸＹ候補位置群の数に応じて処理を切り替える。ＸＹ候補位置群の数が１個であれば、Ｓ１６１０に進み、ＸＹ候補位置群の数が０個であれば、Ｓ１６１２に進む。Ｓ１６１２の処理はＳ８０９と同一であるため説明を省略する。 In S1606, the image processing unit 305 switches processing according to the number of XY candidate positions determined in S1605. If there is exactly one XY candidate position, the process proceeds to S1610; if there are none, the process proceeds to S1612. The processing of S1612 is the same as that of S809, so its description is omitted.

ＸＹ候補位置群の数が２個以上である場合はＳ１６０７に進む。Ｓ１６０７において画像処理部３０５は、登録文書内の位置であって、処理対象の項目の部分レイアウトとの一致度が所定の閾値以上となる位置である類似位置（群）を取得する。 If there are two or more XY candidate positions, the process proceeds to S1607. In S1607, the image processing unit 305 acquires a similar position (or group of similar positions), that is, a position in the registered document at which the degree of match with the partial layout of the item to be processed is equal to or greater than a predetermined threshold.
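
The thresholding of S1605 and the branch of S1606 can be sketched as below. The step labels come from the flowchart description; the function names and the `(x, y, score)` tuple layout are hypothetical. The sketch does not merge neighbouring positions that exceed the threshold together, which a practical implementation would probably need to do.

```python
def candidates_above_threshold(scored_positions, threshold):
    """S1605: keep every (x, y) whose degree of match reaches the threshold.

    scored_positions: iterable of (x, y, score) tuples produced by the scan.
    """
    return [(x, y) for (x, y, s) in scored_positions if s >= threshold]


def next_step(num_candidates):
    """S1606: choose the next flowchart step from the number of candidates."""
    if num_candidates == 0:
        return "S1612"  # no matching text block in the scanned image
    if num_candidates == 1:
        return "S1610"  # a single candidate, handled as in Embodiment 1
    return "S1607"      # several candidates: acquire the similar positions
```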

登録文書内の位置に、処理対象の項目の部分パターンに含まれる部分レイアウトを重畳させてテキストブロックの一致度の導出を行い、一致度が所定の閾値以上となる登録文書内のＸＹ位置が「類似位置」として決定される。登録文書内のテキストブロックと部分レイアウトのテキストブロックとの一致度の算出方法は、Ｓ１６０２～Ｓ１６０４と同様の方法で導出されればよい。即ち、入力文書を対象としていたところを、登録文書を対象として同様の手順で一致度を導出すればよい。 The degree of match of the text block is derived by superimposing a partial layout included in the partial pattern of the item being processed onto a position in the registered document, and the XY position in the registered document where the degree of match is equal to or greater than a predetermined threshold is determined to be the "similar position." The degree of match between the text block in the registered document and the text block in the partial layout can be calculated in the same manner as in S1602 to S1604. In other words, instead of deriving the degree of match for the input document, the same procedure can be used to derive the degree of match for the registered document.

図１９は、登録文書内の類似位置を説明するための図である。図１９（ａ）は、図１７（ａ）と同一の登録文書を示す図である。図１９（ｂ）～（ｅ）は、それぞれ、図１９（ａ）の登録文書のスキャン画像に対してブロックセレクション処理を行った結果検出されたテキストブロックを表す画像に、図１７（ｂ）の「見積日付」の部分パターンを重畳した図である。図１９（ｂ）～（ｅ）の夫々の図における矩形は、部分パターンを示す。即ち、実線の矩形は、部分レイアウトを構成するテキストブロックであり、一点鎖線の矩形は部分パターン範囲である。 Figure 19 illustrates similar positions within a registered document. Figure 19(a) shows the same registered document as Figure 17(a). Figures 19(b) to (e) each show the partial pattern for "Quotation Date" from Figure 17(b) superimposed on an image of the text blocks detected by performing block selection processing on the scanned image of the registered document in Figure 19(a). The rectangles drawn in each of Figures 19(b) to (e) indicate the partial pattern: the solid rectangles are the text blocks that make up the partial layout, and the dash-dotted rectangle is the partial pattern range.

図１９（ｂ）～（ｅ）の、部分パターンの位置は、導出された一致度が所定の閾値以上となったときの、それぞれの位置である。このため部分レイアウトを構成するテキストブロックのうちのインデックスブロックのＸＹ位置が、類似位置群１９０１～１９０４として決定されている。本ステップでは、処理対象の項目の類似位置群の位置情報が取得される。類似位置群１９０１～１９０４には、類似位置１９０２のように、図１７（ｂ）で示した登録時のインデックスブロック１７０５のＸＹ位置も含まれる。 The positions of the partial patterns in Figures 19(b) to (e) are the respective positions when the derived degree of match is equal to or greater than a predetermined threshold. For this reason, the XY positions of the index blocks among the text blocks that make up the partial layout are determined as similar position groups 1901 to 1904. In this step, position information of the similar position group of the item to be processed is obtained. The similar position groups 1901 to 1904 also include the XY position of the index block 1705 at the time of registration shown in Figure 17(b), as in similar position 1902.

なお、Ｓ１６０７で登録文書内の類似位置を決定する処理が行われる必要はない。例えば、文書の登録時において、項目ごとに部分パターンを決定した後に類似位置群を決定し、類似位置群の情報を図７で示した抽出ルールの一部として予め記憶させてもよい。つまり、Ｓ１６０７では、記憶されている処理対象の項目の抽出ルールの１つとして類似位置群が取得されればよい。 Note that it is not necessary to perform processing to determine similar positions within the registered document in S1607. For example, when registering a document, a partial pattern may be determined for each item, and then a group of similar positions may be determined, and information about the group of similar positions may be stored in advance as part of the extraction rules shown in FIG. 7. In other words, in S1607, it is sufficient to obtain the group of similar positions as one of the extraction rules for the stored item to be processed.

Ｓ１６０８において画像処理部３０５は、Ｓ１６０７で取得した登録文書の類似位置群と、Ｓ１６０５で決定した入力文書におけるＸＹ候補位置群との対応付けを行う。具体的には、Ｙ位置でソートされた類似位置群に対して、類似位置群と同一条件でソートされたＸＹ候補位置群を、Ｙ位置の一方の側から順番で対応付けを行い、さらにＹ位置の他方の側からの順番で対応付けを行う。 In S1608, the image processing unit 305 associates the group of similar positions in the registered document acquired in S1607 with the group of XY candidate positions in the input document determined in S1605. Specifically, for the group of similar positions sorted by Y position, the image processing unit 305 associates the group of XY candidate positions sorted under the same conditions as the group of similar positions in order starting from one side of the Y position, and then associates them in order starting from the other side of the Y position.

図２０は、本ステップの処理を説明するための図である。表中の数値は、図１８または図１９で示した文書内の位置を示す符号を示す数値である。 Figure 20 is a diagram to explain the processing of this step. The numbers in the table are the numbers indicating the symbols that indicate the positions within the document shown in Figure 18 or Figure 19.

Figure 20(a) shows the association when, as in Figures 18 and 19, the number of similar positions equals the number of XY candidate positions. Column 2001 is the similar position group sorted by Y position. Column 2002 is the XY candidate position group sorted by Y position and associated with the similar positions in column 2001 in order from the top. Column 2003 is the XY candidate position group sorted by Y position and associated with the similar positions in column 2001 in order from the bottom. In Figure 20(a), each XY candidate position is associated with the same similar position in column 2002 and in column 2003.

Figure 20(b) is a diagram for explaining the association in this step when there are fewer similar positions than XY candidate positions. It shows the association for the case where, for example, the degree of match with the registered document is below the threshold when the partial pattern is superimposed at the position in the registered document shown in Figure 19(e), so that only similar positions 1901 to 1903 are acquired in S1607. Column 2011 is the similar position group sorted by Y position. Column 2012 is the XY candidate position group associated with the similar positions in column 2011 in order from the top. Column 2013 is the XY candidate position group associated with the similar positions in column 2011 in order from the bottom. In Figure 20(b), the association from the top and the association from the bottom assign different XY candidate positions to the similar positions.

Figure 20(c) is a diagram for explaining the association in this step when there are more similar positions than XY candidate positions. It shows the association for the case where the degree of match with the input document is below the threshold when the partial pattern is superimposed at the position in the input document shown in Figure 18(e), so that only XY positions 1801 to 1803 are determined as the XY candidate position group in S1605. Column 2021 is the similar position group sorted by Y position. Column 2022 is the XY candidate position group associated with the similar positions in column 2021 in order from the top. Column 2023 is the XY candidate position group associated with the similar positions in column 2021 in order from the bottom. The two associations give different results: from the top, no XY candidate position is found for similar position 1904, and from the bottom, no XY candidate position is found for similar position 1901.
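The two-directional association of S1608 can be sketched in a few lines of Python. The function name `associate`, the dictionary-based input, and the Y values are assumptions for illustration, as is the convention that Y increases downward; the keys are the reference numerals used in Figures 18 to 20.

```python
from typing import Dict, Tuple


def associate(
    similar_y: Dict[int, float], candidate_y: Dict[int, float]
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Pair Y-sorted similar positions with Y-sorted candidates from both ends.

    Each argument maps a position label to its Y coordinate. Each returned
    dictionary maps a similar-position label to a candidate label; positions
    left over on the longer side simply have no entry.
    """
    similar = sorted(similar_y, key=similar_y.get)
    candidates = sorted(candidate_y, key=candidate_y.get)
    # zip() stops at the shorter list, so surplus positions stay unpaired.
    from_top = dict(zip(similar, candidates))
    from_bottom = dict(zip(reversed(similar), reversed(candidates)))
    return from_top, from_bottom


# Case of Figure 20(c): four similar positions, three candidates (made-up Y values).
similar_y = {1901: 100, 1902: 200, 1903: 300, 1904: 400}
candidate_y = {1801: 110, 1802: 210, 1803: 310}
from_top, from_bottom = associate(similar_y, candidate_y)
```

For this input, the association from the top leaves similar position 1904 unpaired and the association from the bottom leaves 1901 unpaired, while similar position 1902 is paired with 1802 from the top and with 1801 from the bottom.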

In S1609, the image processing unit 305 selects one XY candidate position from the XY candidate position group determined in S1605, based on the result of the association performed in S1608.

The association performed in S1608 may, as shown in Figure 20(a), give the same result from the top and from the bottom. In this case, among the XY candidate position group, the XY position associated with the similar position that indicates the position of the index block at the time of registration is determined as the one XY candidate position. In the example of Figure 20(a), XY position 1802, which is associated with similar position 1902 indicating the position of the index block, is determined as the one XY candidate position.

On the other hand, the association performed in S1608 may, as shown in Figures 20(b) and 20(c), give different results from the top and from the bottom. In this case, the XY position in the input document that is associated, by the association from the top, with the similar position indicating the position of the index block is determined first. The XY position in the input document that is associated with that similar position by the association from the bottom is then determined as well.

In the example of Figure 20(b), XY position 1802 and XY position 1803, both associated with similar position 1902 indicating the position of the index block, are determined. In the example of Figure 20(c), XY position 1802 and XY position 1801, both associated with similar position 1902, are determined. Of the two XY positions thus determined, the one with the higher degree of match derived in S1604 is selected as the one XY candidate position from the XY candidate position group. Note that one of the two XY positions may instead be selected without using the degree of match. For example, the two XY positions may be displayed and an instruction accepted from the user, and which of the two associations to use, from the top or from the bottom, may be stored for each item and reused.
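The selection in S1609 can be illustrated with a short Python sketch. The function name `choose_candidate` and the degree-of-match values are hypothetical; the two mappings are those of Figure 20(b). Returning the only available candidate when one direction yields no pairing is an assumption of this sketch and is not stated in the description.

```python
from typing import Dict, Optional


def choose_candidate(
    index_similar: int,
    from_top: Dict[int, int],
    from_bottom: Dict[int, int],
    match_degree: Dict[int, float],
) -> Optional[int]:
    """Pick the XY candidate paired with the registered index block position.

    If both associations agree, that candidate is returned. Otherwise the
    candidate with the higher degree of match wins.
    """
    top = from_top.get(index_similar)
    bottom = from_bottom.get(index_similar)
    if top == bottom:
        return top
    # Assumption: if only one direction produced a pairing, use that one.
    options = [p for p in (top, bottom) if p is not None]
    return max(options, key=lambda p: match_degree[p])


# Associations of Figure 20(b); degree-of-match values are made up.
from_top = {1901: 1801, 1902: 1802, 1903: 1803}
from_bottom = {1903: 1804, 1902: 1803, 1901: 1802}
degrees = {1801: 0.80, 1802: 0.90, 1803: 0.70, 1804: 0.75}
chosen = choose_candidate(1902, from_top, from_bottom, degrees)
```

Here similar position 1902 is paired with 1802 from the top and 1803 from the bottom, and 1802 is chosen because its assumed degree of match is higher. The user-driven alternative mentioned above would replace the `max` call with a stored per-item preference for one direction.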

Once one XY candidate position has been determined from the XY candidate position group, the process proceeds to S1610. In S1610, as in S807, the XY candidate position is taken as the estimated position of the index block to be processed, and the index block of the item to be processed is estimated from among the text blocks of the scanned image. S1611 is identical to S808 and S1613 is identical to S810, so their descriptions are omitted.

As described above, in this embodiment, when the input document contains multiple XY candidate positions at which the degree of match is equal to or greater than the threshold, one XY candidate position is determined after the candidates are associated with the similar position group of the registered document, that is, the positions at which the degree of match with the partial pattern is equal to or greater than the threshold. The accuracy of the index block estimation can therefore be improved even when the document contains multiple areas that resemble the partial layout consisting of the index block and its surrounding text blocks.

<Other embodiments>
In the above embodiments, an example was described in which the image forming apparatus 100 alone performs the processing of each step of the flowchart in Figure 4. Alternatively, all or part of this processing may be performed by another image processing apparatus on the system 105 that has the functions shown in Figure 3.

For example, the image forming apparatus 100 may execute the scan process and send the scanned image to the terminal 101 via the network. The terminal 101 may have functions equivalent to those of the image processing unit 305 and execute the index extraction process. In this case, the terminal 101 returns the index extraction result to the image forming apparatus 100, and the image forming apparatus 100 generates and sends a file based on the received index extraction result.

The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

100 Image forming apparatus
305 Image processing unit
111 CPU

Claims

1. An image processing device comprising:
a detection means for detecting text blocks in an input image;
a specifying means for specifying a registered document corresponding to the input image from among a plurality of registered documents;
a determining means for determining a text block in the input image corresponding to an item to be processed based on a partial layout defined in the specified registered document, the partial layout including a first text block corresponding to the item to be processed and at least one second text block existing in the vicinity of the first text block; and
an acquisition means for acquiring a character string corresponding to the item to be processed in the input image by performing character recognition processing on the determined text block.

2. The image processing device according to claim 1, wherein the determining means makes the determination by superimposing the partial layout at any position in a search range in the input image and deriving a degree of match based on the size of the area where a text block included in the partial layout overlaps a text block in the input image.

3. The image processing device according to claim 2, wherein the determining means derives the degree of match within the search range, the search range being a predetermined area including the position in the input image that corresponds to the position of the first text block in the specified registered document.

4. The image processing device according to claim 2, wherein the determining means derives a group of vertical positions for making the determination based on a difference in vertical position between a text block included in the partial layout and a text block within the search range.

5. The image processing device according to claim 4, wherein the determining means derives the degree of match at each position at which the partial layout is superimposed, in the horizontal direction, on the group of vertical positions within the search range.

6. The image processing device according to claim 2, wherein the determining means derives, as the position in the input image for making the determination, a position in the input image at which the degree of match is equal to or greater than a threshold and is at its maximum.

7. The image processing device according to claim 5, wherein the determining means:
derives candidate positions in the input image at which the degree of match is equal to or greater than a threshold;
when the number of candidate positions is one, derives that candidate position as the position in the input image for making the determination; and
when the number of candidate positions is two or more, acquires, as similar positions, positions in the specified registered document at which the degree of match, derived by the same method when the text blocks included in the partial layout are superimposed at any position in the specified registered document, is equal to or greater than a threshold, and derives the position in the input image for making the determination by associating the candidate positions with the similar positions.

8. The image processing device according to claim 7, wherein, when the number of candidate positions is two or more and equal to the number of similar positions, the determining means associates the candidate positions and the similar positions, arranged under the same condition, in order from one side, and derives the candidate position associated with the similar position corresponding to the first text block as the position in the input image for making the determination.

9. The image processing device according to claim 7, wherein, when the number of candidate positions is two or more and differs from the number of similar positions:
a first position is determined, the first position being indicated by the candidate position associated with the similar position corresponding to the first text block as a result of associating the candidate positions and the similar positions, arranged under the same condition, in order from one side;
a second position is determined, the second position being indicated by the candidate position associated with the similar position corresponding to the first text block as a result of associating the candidate positions and the similar positions, arranged under the same condition, in order from the other side; and
whichever of the first position and the second position satisfies a predetermined condition is derived as the position in the input image for making the determination.

10. The image processing device according to claim 6, wherein the determining means places the first text block of the specified registered document at the position in the input image derived based on the degree of match and, if a text block in the input image that overlaps the placed text block satisfies a predetermined condition, determines that the overlapping text block is the text block corresponding to the item to be processed in the input image.

11. The image processing device according to claim 10, wherein the predetermined condition is that the degree of overlap between the placed text block and the overlapping text block is equal to or greater than a predetermined value and the distance between their vertices is within a certain range.

12. The image processing device according to any one of claims 2 to 11, wherein a predetermined range is set based on the first text block in the specified registered document, and, when the partial layout is superimposed on the input image, the degree of match is adjusted so as to decrease as the area of text blocks in the input image that are included in the predetermined range and do not overlap text blocks included in the partial layout increases.

13. The image processing device according to claim 1, wherein the partial layout includes the first text block in the specified registered document and the second text block included in a predetermined range based on the first text block in the specified registered document.

14. The image processing device according to claim 12 or 13, wherein the predetermined range is a region in the specified registered document that is based on the first text block and includes a predetermined number or more of text blocks.

The identification means is
15. The image processing device according to claim 1, further comprising: a processor for processing the input image based on a layout of the input image and a layout of the text blocks in the input image; a processor for processing the input image and a layout of the text blocks in the input image;

16. The image processing device according to claim 1, further comprising a registration means for registering the input image as a new document when the specifying means cannot specify a registered document corresponding to the input image.

17. The image processing device according to claim 1, further comprising a setting means for setting a property of the input image based on the character string, acquired by the acquisition means, that corresponds to the item to be processed in the input image.

18. An image processing method comprising:
a detection step of detecting text blocks in an input image;
an identifying step of identifying a registered document corresponding to the input image from among a plurality of registered documents;
a determining step of determining a text block in the input image corresponding to an item to be processed based on a partial layout defined in the identified registered document, the partial layout including a first text block corresponding to the item to be processed and at least one second text block existing in the vicinity of the first text block; and
an acquisition step of acquiring a character string corresponding to the item to be processed in the input image by performing character recognition processing on the determined text block.

19. A program for causing a computer to function as each of the means of the image processing device according to any one of claims 1 to 17.