JP7699773B2 - Document image processing system, document image processing method, and document image processing program

Info

Publication number: JP7699773B2
Application number: JP2021105251A
Authority: JP
Inventors: 福光齊藤
Original assignee: Net Smile Inc
Current assignee: Net Smile Inc
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2025-06-30
Anticipated expiration: 2041-06-24
Also published as: JP2023003887A

Description

The present invention relates to a document image processing system, a document image processing method, and a document image processing program.

In one known form identification system, the user creates a form format table in advance. The table contains field information indicating the position, size, character type, and so on of each character recognition target area specified by the user. Based on this form format (that is, the field information), the character information (text data) in the form is obtained from the image data of the form image (see, for example, Patent Document 1).

One known image recognition device cuts out a partial image from a target image, recognizes the characters and digits in the partial image, and performs an extraction process that extracts the characters and digits satisfying a predetermined condition (see, for example, Patent Document 2). In the extraction process, the device determines, for example, whether the recognized characters include a preset bank name; if they do, it extracts those characters and the digits within a predetermined distance of them as a pair of bank name and account number.

Patent Document 1: JP 2016-48444 A
Patent Document 2: JP 2020-170264 A

However, the form identification system described above relies on template data that specifies the layout of a document such as a form (for example, the position at which each attribute is written). To process documents with different layouts, template data must be created in advance for every layout, which demands tedious preparatory work. Moreover, for a document whose layout is unknown, it is difficult with this technique to accurately detect the value of a given attribute in the document image.

The image recognition device described above does not need template data, but the character strings to be extracted (the bank names above) must be set in advance, and any string that has not been set is not extracted. The device also extracts the digits within a predetermined distance of the bank name as the account number. However, two character objects that are close together may be unrelated, and two that are far apart may be related, so the desired character string may not be extracted correctly.

FIG. 5 is a diagram showing an example of a document image including a table. As shown in FIG. 5, for example, document image 101 of a medical examination report includes table 111, which shows the results of the examination. Table 111 contains pairs of a label indicating a test item and a value that is the result for that test item. In table 111, cells are arranged two-dimensionally, with or without ruled lines, and each cell holds a label or a value corresponding to a label (a number or a non-numeric character string). However, a label and a value that are close together may be unrelated, and a label and a value that are far apart may be related.

The present invention has been made in view of the above problems, and aims to provide a document image processing system, a document image processing method, and a document image processing program that accurately identify the attributes of cells in a table without using template data.

The document image processing system according to the present invention includes: a table detection unit that detects a table in a document image, detects at least the cells in the table, and generates cell geometry data indicating the position and size of each cell; a text object detection unit that detects text objects in the document image and generates text object geometry data indicating the position and size of each text object; a character recognition processing unit that performs character recognition processing on each text object to generate text data corresponding to the text object; and a cell attribute identification unit that (a) identifies the text objects in each cell based on the cell geometry data and the text object geometry data, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, and generates a node data set including the node data corresponding to the table, and (c) performs a predetermined classification process on the node data set to identify the attribute of each cell.

The document image processing method according to the present invention includes the steps of: a computer detecting a table in a document image, detecting at least the cells in the table, and generating cell geometry data indicating the position and size of each cell; the computer detecting text objects in the document image and generating text object geometry data indicating the position and size of each text object; the computer performing character recognition processing on each text object to generate text data corresponding to the text object; and the computer (a) identifying the text objects in each cell based on the cell geometry data and the text object geometry data, (b) generating, for each cell, node data including the cell geometry data, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, and generating a node data set including the node data corresponding to the table, and (c) performing a predetermined classification process on the node data set to identify the attribute of each cell.

The document image processing program according to the present invention causes a computer to function as the table detection unit, the text object detection unit, the character recognition processing unit, and the cell attribute identification unit described above.

The present invention provides a document image processing system, a document image processing method, and a document image processing program that accurately identify the attributes of cells in a table without using template data that specifies where a given attribute is written in a document image.

The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a document image processing system according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the node data generated in the document image processing system according to the embodiment of the present invention.
FIG. 3 is a diagram explaining the classification process in the document image processing system according to the embodiment of the present invention.
FIG. 4 is a flowchart explaining the operation of the document image processing system shown in FIG. 1.
FIG. 5 is a diagram showing an example of a document image including a table.

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a document image processing system according to an embodiment of the present invention. The system shown in FIG. 1 consists of a single information processing device (a personal computer, a server, or the like), but the processing units described below may instead be distributed across multiple information processing devices that can exchange data with one another. Such devices may include a GPU (Graphics Processing Unit) that performs specific computations in parallel.

The document image processing system shown in FIG. 1 includes a storage device 1, a communication device 2, an image reading device 3, and an arithmetic processing device 4.

The storage device 1 is a non-volatile storage device such as a flash memory or a hard disk, and stores various data and programs.

Here, the storage device 1 stores an image processing program 11 and, as necessary, system setting data (such as the coefficient values of the neural networks used by the processing units described below). The image processing program 11 may be stored on a portable computer-readable recording medium such as a CD (Compact Disc). In that case, for example, the image processing program 11 is installed from the recording medium onto the storage device 1. The image processing program 11 may be a single program or a collection of multiple programs.

The communication device 2 is a device capable of data communication, such as a network interface, a peripheral device interface, or a modem, and exchanges data with other devices as necessary. The image reading device 3 optically reads a document image from a document and generates image data (such as raster image data) of the document image. The communication device 2 and the image reading device 3 are provided only when needed.

The arithmetic processing device 4 is a computer equipped with a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), and so on. It operates as various processing units by loading programs from the ROM, the storage device 1, or the like into the RAM and executing them on the CPU.

Here, by executing the image processing program 11, the arithmetic processing device 4 operates as a document image acquisition unit 21, a table detection unit 22, a text object detection unit 23, a character recognition processing unit 24, a cell attribute identification unit 25, a data output unit 26, and a machine learning processing unit 27.

The document image acquisition unit 21 acquires a document image as image data such as raster image data. The document image is an image of a document that contains, in a table, attribute labels (text such as headings) and attribute values (text such as numbers and other character strings) for one or more attributes (such as entry items). Examples include forms such as receipts (including cash-register receipts), invoices, and delivery notes; flyers such as advertisements and announcements; completed questionnaires; and medical examination reports. For example, the document image acquisition unit 21 reads a document image stored as image data in the storage device 1, acquires a document image received as image data by the communication device 2 over a communication path such as a network, or acquires a document image generated as image data by the image reading device 3.

The table detection unit 22 detects a table in the acquired document image without using template data and generates table geometry data indicating the position and size of the table. It also detects at least the cells in the table and generates cell geometry data indicating the position and size of each cell.

In this embodiment, the table detection unit 22 detects, along with the cells in the table, at least one of the rows and the columns, and generates at least one of row geometry data indicating the position and size of each row and column geometry data indicating the position and size of each column (here, both).

The table detection unit 22 detects the table in the document image and the rows, columns, and cells in that table, and generates their geometry data, according to an existing neural-network-based method.

The text object detection unit 23 detects text objects in the acquired document image without using template data, and generates text object geometry data indicating the position and size of each text object.

Specifically, the text object detection unit 23 (a) detects character objects by excluding the non-character objects in the document image (photo objects, graphic objects, ruled-line objects, and so on), and (b) extracts text objects by grouping the character objects into word units based on the position of each character object.
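Step (b) above can be illustrated with a minimal sketch. The source does not state the grouping rule, so the rule used here is an assumption: character boxes on the same line are merged when the horizontal gap between them is at most a hypothetical threshold `max_gap`. The `Box` type and the function name are also illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def group_into_words(chars, max_gap=5.0):
    """Merge character boxes into word boxes.

    Assumed rule (not from the source): two boxes belong to the same word
    when they lie on the same line and the horizontal gap is <= max_gap.
    The sketch also assumes characters on one line share a top edge, so
    sorting by (y, x) yields reading order.
    """
    words = []
    for ch in sorted(chars, key=lambda b: (b.y, b.x)):
        if words:
            last = words[-1]
            same_line = abs(ch.y - last.y) < max(ch.h, last.h) / 2
            gap = ch.x - (last.x + last.w)
            if same_line and gap <= max_gap:
                # Extend the current word box so that it covers this character.
                top = min(last.y, ch.y)
                bottom = max(last.y + last.h, ch.y + ch.h)
                right = max(last.x + last.w, ch.x + ch.w)
                words[-1] = Box(last.x, top, right - last.x, bottom - top)
                continue
        words.append(ch)
    return words
```

A production system would more likely rely on the output of a trained text detector, as the description itself suggests; the threshold rule only shows what "grouping by position" can mean.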

The text object detection unit 23 extracts the character objects in the document image using existing techniques (for example, region separation processing or a trained deep neural network).

The character recognition processing unit 24 performs character recognition processing on each detected text object (a raster image) to generate the text data (a character code string) corresponding to that text object. Existing techniques are used for this character recognition processing.

The cell attribute identification unit 25 (a) identifies the text objects in each cell based on the cell geometry data and the text object geometry data described above, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, and generates a node data set including the node data corresponding to the table, and (c) performs a predetermined classification process on the node data set to identify the attribute of each cell.

The positions of cells, rows, columns, and text objects in the node data are coordinate values in a two-dimensional coordinate system, and their sizes are the lengths along each of the two coordinate axes. Each position is given as the position of a predetermined part (one of the four corners, the center, or the like) of the rectangular area of the cell, row, column, or text object, and may be expressed relative to a predetermined part (one of the four corners, the center, or the like) of the table. This relative position is derived from the absolute position of the table in the document image (the position in the table geometry data) and the absolute position of the cell, row, column, or text object (the position in the original geometry data).
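The derivation of a table-relative position can be sketched as follows. The `(x, y, width, height)` tuple layout and the choice of the top-left corner as the reference point for both the object and the table are assumptions made for illustration; the description allows any corner or the center.

```python
def to_table_relative(obj_box, table_box):
    """Convert an absolute box to table-relative coordinates.

    Both boxes are (x, y, width, height) tuples whose (x, y) is the
    top-left corner in document-image coordinates (an assumed convention).
    The size is unchanged; only the position is shifted.
    """
    ox, oy, ow, oh = obj_box
    tx, ty, _, _ = table_box
    return (ox - tx, oy - ty, ow, oh)
```

For a table whose top-left corner is at (100, 300), a cell at (120, 340) has the relative position (20, 40). Relative coordinates make the node data independent of where the table sits on the page.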

Specifically, for each detected cell, when the area of a text object (its bounding box), as identified from the text object geometry data, lies within the area of the cell, as identified from that cell's cell geometry data, the cell attribute identification unit 25 determines that the text object is a text object in that cell.
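The containment test described above can be sketched in a few lines. The `(x, y, width, height)` tuple layout with a top-left origin is an assumption, as are the function names.

```python
def contains(cell, text):
    """Return True when the text bounding box lies entirely inside the cell.

    Boxes are (x, y, width, height) tuples with (x, y) the top-left corner
    (an assumed convention).
    """
    cx, cy, cw, ch = cell
    tx, ty, tw, th = text
    return cx <= tx and cy <= ty and tx + tw <= cx + cw and ty + th <= cy + ch


def assign_text_to_cells(cells, texts):
    """Map each cell index to the indices of the text objects it contains."""
    return {
        i: [j for j, text in enumerate(texts) if contains(cell, text)]
        for i, cell in enumerate(cells)
    }
```

Under this strict test, a text box that straddles a cell border is assigned to no cell. Whether some tolerance is applied in practice is not stated in the description.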

FIG. 2 is a diagram explaining the node data generated in the document image processing system according to the embodiment of the present invention. FIG. 3 is a diagram explaining the classification process in the document image processing system according to the embodiment of the present invention.

In this embodiment, as shown in FIG. 2, the node data of a cell includes, in addition to the cell geometry data of the cell, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, at least one of the row geometry data of the row to which the cell belongs and the column geometry data of the column to which the cell belongs (here, both).
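One possible in-memory shape for this node data is sketched below. The class name, field names, and the `as_features` helper are hypothetical; the description specifies only which pieces of data a node carries, not how they are stored or encoded.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (x, y, width, height); the tuple layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class NodeData:
    cell: Box                 # cell geometry data
    row: Box                  # geometry of the row the cell belongs to
    column: Box               # geometry of the column the cell belongs to
    text_boxes: List[Box] = field(default_factory=list)  # text object geometry
    texts: List[str] = field(default_factory=list)       # recognized text data

    def as_features(self) -> List[float]:
        """Flatten the cell, row, and column geometry into one numeric list.

        Text would need a separate encoding (for example an embedding),
        which is outside this sketch.
        """
        return [*self.cell, *self.row, *self.column]
```

A node data set for a table is then simply a list of `NodeData` instances, one per detected cell.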

In this embodiment, as shown in FIG. 3, the cell attribute identification unit 25 feeds the node data set described above as input data to a trained graph neural network (GNN) and takes the output data of the GNN as the cell attributes. An existing GNN and an existing training method can be used.

The attributes of a cell include at least its cell type, which is either label or attribute value (that is, each cell is classified as either a label cell or an attribute value cell). For each node data item, the output data of the GNN is a probability (a number in the range 0 to 1) for each value the cell type can take (here, label and attribute value). Based on these probabilities, the cell type is decided as one of the possible values, for example by thresholding.
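A threshold-based decision of this kind might look like the sketch below. The 0.5 default, the string names of the two cell types, and the choice to return `None` when neither probability reaches the threshold are all assumptions; the description mentions thresholding only as one example.

```python
def decide_cell_type(p_label, p_value, threshold=0.5):
    """Pick the cell type from the two class probabilities.

    Returns "label" or "value" (short for attribute value) for the more
    probable class, or None when even that class falls below the
    threshold (an assumed convention for "undecided").
    """
    best_type, best_p = max(
        (("label", p_label), ("value", p_value)), key=lambda pair: pair[1]
    )
    return best_type if best_p >= threshold else None
```

Returning `None` for an undecided cell leaves room for a second method to take over when the network is not confident.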

Alternatively, the cell attribute identification unit 25 may cluster the node data based on the feature values that the node data represents, and identify the cell attributes from the resulting clusters. For example, the unit may use this clustering in place of the GNN altogether, or may fall back to it only when the GNN's identification of the cell attributes has low reliability.
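The fallback arrangement can be sketched as a small dispatcher. Both predictors are passed in as callables because the description does not define their interfaces; the `(cell_type, confidence)` return shape and the `min_confidence` value are hypothetical.

```python
def identify_cell_type(node, gnn_predict, cluster_predict, min_confidence=0.8):
    """Use the GNN result when it is reliable, otherwise fall back to clustering.

    gnn_predict(node) is assumed to return (cell_type, confidence);
    cluster_predict(node) is assumed to return a cell type.
    """
    cell_type, confidence = gnn_predict(node)
    if confidence >= min_confidence:
        return cell_type
    return cluster_predict(node)
```

How reliability is measured is left open in the description; using the winning class probability as the confidence is one plausible choice.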

As the feature value, for example, a feature vector corresponding to the node data (all of it or a specific part) is used, generated by an existing method such as Word2vec (Skip-Gram model). A large number of pairs of node data and the cell type value for that node data are collected, and for each cell type value (here, label or attribute value) a center value (the mean of the feature vectors) is determined in advance as the center of the corresponding cluster. For a node data item to be classified, the cell type value whose center is closest to the position indicated by the item's feature vector is selected as the cell type of that item.
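The center-based selection just described is a nearest-centroid classifier. The sketch below uses plain Python lists as feature vectors and Euclidean distance; the distance metric and the function names are assumptions, and producing the vectors themselves (for example with Word2vec) is outside the sketch.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def fit_centroids(labelled):
    """Compute one center per cell type from (feature_vector, cell_type) pairs."""
    groups = {}
    for vector, cell_type in labelled:
        groups.setdefault(cell_type, []).append(vector)
    return {cell_type: centroid(vs) for cell_type, vs in groups.items()}


def nearest_cell_type(vector, centroids):
    """Return the cell type whose center is closest to the given vector."""
    return min(centroids, key=lambda t: math.dist(vector, centroids[t]))
```

The centers are computed once from the collected pairs; classifying a new node then costs one distance computation per cell type.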

The data output unit 26 adds to each node data item the cell attribute corresponding to it, and stores the node data set in the storage device 1 in a predetermined data format or transmits it via the communication device 2.

With this output data (the node data set), it is possible, for example, to identify the label cells and attribute value cells within a column, or the label cells and attribute value cells within a row.

The machine learning processing unit 27 performs the machine learning processing that trains the GNN used by the cell attribute identification unit 25. The machine learning processing unit 27 is not essential and may be provided as needed; for example, it can be omitted when training of the GNN in the cell attribute identification unit 25 has already been completed.

Next, the operation of the document image processing system according to this embodiment is described. FIG. 4 is a flowchart explaining the operation of the document image processing system shown in FIG. 1.

First, the document image acquisition unit 21 acquires a document image (step S1).

Next, the table detection unit 22 detects a table in the document image without using template data, detects the cells, columns, and rows in the table, and generates cell geometry data, column geometry data, and row geometry data (step S2).

In addition, the text object detection unit 23 detects the text objects in the document image without using template data and generates text object geometry data (step S2). The character recognition processing unit 24 performs character recognition processing on the detected text objects and generates the text data of those text objects.

The processing by the table detection unit 22 and the processing by the text object detection unit 23 may be performed in parallel; if they are performed sequentially, either may be performed first.
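Because the two detections do not depend on each other, they can be submitted concurrently. The sketch below uses Python's standard `concurrent.futures`; the two detector functions are placeholders that return fixed values and stand in for the real detectors, which the description does not specify.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_tables(image):
    """Placeholder for the table detector; returns fixed dummy geometry."""
    return {"cells": [(0, 0, 100, 30)]}


def detect_text_objects(image):
    """Placeholder for the text object detector; returns a fixed dummy box."""
    return [(10, 5, 40, 20)]


def run_detection(image):
    """Run both detectors concurrently and return (tables, text_objects)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        tables_future = pool.submit(detect_tables, image)
        texts_future = pool.submit(detect_text_objects, image)
        # result() blocks until the corresponding detector has finished.
        return tables_future.result(), texts_future.result()
```

Threads are used here only for brevity. Whether threads, processes, or separate devices give an actual speedup depends on how the detectors are implemented.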

Then, for each detected cell, the cell attribute identification unit 25 identifies the text objects in the cell based on the cell geometry data and the text object geometry data, and generates node data including the cell geometry data, the text object geometry data of the text objects in the cell, the text data of the text objects in the cell, and so on (step S3).

Next, for each table, the cell attribute identification unit 25 generates a node data set from the node data of all the cells detected in that table, performs the predetermined classification process on the node data set, and identifies the attribute of each cell (here, its cell type: label or attribute value) (step S4).

Then the data output unit 26, for example, adds the attribute of each cell to the node data corresponding to that cell and, for each table, stores the node data set as output data in the storage device 1 in a predetermined data format or transmits it via the communication device 2 (step S5).
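Step S5 can be sketched as follows. The description says only "a predetermined data format"; JSON is an assumption here, as are the dictionary representation of a node and the `cell_type` key.

```python
import json


def export_node_data_set(nodes, cell_types):
    """Attach the identified cell type to each node and serialize the set.

    nodes is a list of dicts and cell_types a parallel list of strings.
    JSON is an assumed output format, not one named by the source.
    """
    enriched = [
        {**node, "cell_type": cell_type}
        for node, cell_type in zip(nodes, cell_types)
    ]
    # ensure_ascii=False keeps non-ASCII text such as Japanese labels readable.
    return json.dumps(enriched, ensure_ascii=False)
```

The returned string can then be written to storage or sent over the network, matching the two output paths named in the step.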

In this way, cell-level attribute data is generated for each table in the document image.

The data output unit 26 may also accept a search request in a predetermined format, search the node data set generated as described above for the attribute values corresponding to the label specified in the request, and output each combination of label and attribute value. For example, the cell, row, and column containing the target label are first identified in the node data set, and whichever of that row and that column contains attribute value cells is then identified. If that row or column contains a single attribute value cell, its attribute value is identified as the one corresponding to the label. If it contains multiple attribute value cells, all of their attribute values are identified as corresponding to the label, and each may additionally be tagged with the label of the row or column to which its cell belongs (the column label when the attribute value cells were found in the row containing the target label, or the row label when they were found in the column containing the target label).

For example, when "height" is specified as the search target for the node data set of table 111 in FIG. 5, three attribute values in the row containing "height" are detected: "161.0", "161.2", and "161.1". "161.0" is tagged with the label "this time", "161.2" with the label "last time", and "161.1" with the label "the time before last".
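The "height" example can be reproduced with a small sketch. Nodes are reduced to dictionaries with integer `row` and `col` indices, which is a simplification of the geometry-based node data; the English label strings stand in for the Japanese text in the figure, and the function covers only the case where the values share a row with the queried label.

```python
def search(nodes, query):
    """Return (column_label, value) pairs for the values in the row of `query`.

    Simplified sketch: each node is a dict with "row", "col", "type"
    ("label" or "value"), and "text". The column-wise case described in
    the text is omitted.
    """
    results = []
    hits = [n for n in nodes if n["type"] == "label" and n["text"] == query]
    for hit in hits:
        # Attribute value cells that share the row of the queried label.
        row_values = [
            n for n in nodes if n["type"] == "value" and n["row"] == hit["row"]
        ]
        for value in row_values:
            # Label cell in the same column as the value, if there is one.
            col_labels = [
                n["text"]
                for n in nodes
                if n["type"] == "label" and n["col"] == value["col"]
            ]
            results.append((col_labels[0] if col_labels else None, value["text"]))
    return results


# Part of table 111 as described in the text (label strings translated).
nodes = [
    {"row": 0, "col": 1, "type": "label", "text": "this time"},
    {"row": 0, "col": 2, "type": "label", "text": "last time"},
    {"row": 0, "col": 3, "type": "label", "text": "the time before last"},
    {"row": 1, "col": 0, "type": "label", "text": "height"},
    {"row": 1, "col": 1, "type": "value", "text": "161.0"},
    {"row": 1, "col": 2, "type": "value", "text": "161.2"},
    {"row": 1, "col": 3, "type": "value", "text": "161.1"},
]
```

Calling `search(nodes, "height")` pairs each of the three values with the label of its column, in the order the value cells appear in `nodes`.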

A search request may also be allowed to specify two labels as search targets. In that case, attribute values are first detected for one of the two labels in the same way as above. Among them, those whose cell belongs to a row or column whose label matches the other label are determined to be the attribute values corresponding to the two labels, and are detected.
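A two-label lookup can be sketched with the same simplified node dictionaries (integer `row` and `col` indices, `type`, `text`), all of which are assumptions. The sketch uses a symmetric formulation: it keeps the value cells whose row or column contains each of the two labels. For a simple grid this gives the same result as the two-stage procedure in the text, but it is not a literal transcription of it.

```python
def values_for(nodes, first, second):
    """Return the texts of value cells headed by both `first` and `second`.

    A value cell is "headed by" a label when a label cell with that text
    shares its row or its column (a simplifying assumption).
    """
    labels = [n for n in nodes if n["type"] == "label"]
    values = [n for n in nodes if n["type"] == "value"]

    def headed_by(value, text):
        return any(
            label["text"] == text
            and (label["row"] == value["row"] or label["col"] == value["col"])
            for label in labels
        )

    return [
        v["text"] for v in values if headed_by(v, first) and headed_by(v, second)
    ]


# Part of table 111 as described in the text (label strings translated).
nodes = [
    {"row": 0, "col": 1, "type": "label", "text": "this time"},
    {"row": 0, "col": 2, "type": "label", "text": "last time"},
    {"row": 0, "col": 3, "type": "label", "text": "the time before last"},
    {"row": 1, "col": 0, "type": "label", "text": "height"},
    {"row": 1, "col": 1, "type": "value", "text": "161.0"},
    {"row": 1, "col": 2, "type": "value", "text": "161.2"},
    {"row": 1, "col": 3, "type": "value", "text": "161.1"},
]
```

With the two labels "height" and "last time", only the value cell that lies in the "height" row and the "last time" column remains.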

As described above, according to this embodiment, the table detection unit 22 detects a table in a document image, detects at least the cells in the table, and generates cell geometry data indicating the position and size of each cell. The text object detection unit 23 detects the text objects in the document image and generates text object geometry data indicating the position and size of each text object. The character recognition processing unit 24 performs character recognition processing on each text object to generate the text data corresponding to it. The cell attribute identification unit 25 (a) identifies the text objects in each cell based on the cell geometry data and the text object geometry data, (b) generates, for each cell, node data including the cell geometry data, the text object geometry data of the text objects in the cell, and the text data of the text objects in the cell, and generates a node data set including the node data corresponding to the table, and (c) performs a predetermined classification process on the node data set to identify the attribute of each cell.

As a result, the attributes of the cells in a table are accurately identified without using template data. In addition, because the attribute value corresponding to a label is detected without taking into account the distance (Euclidean distance) between the label and the attribute value, the combinations of labels and attribute values in a table are accurately identified regardless of that distance.

Various changes and modifications to the above-described embodiment will be apparent to those skilled in the art. Such changes and modifications may be made without departing from the spirit and scope of the subject matter and without diminishing its intended advantages. That is, such changes and modifications are intended to be covered by the claims.

For example, in the above embodiment, the image data of the document image may be deleted from the system immediately after the above-described processing is completed.

In addition, in the above embodiment, the number of nodes in the input data of the GNN is set to a predetermined value equal to or greater than the maximum number of node data items in a node data set (that is, the maximum number of cells in a table). When a node data set contains fewer node data items than the number of nodes in the GNN input data, fixed values are used as the node data for the missing nodes, and the output data corresponding to those nodes is discarded.
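
The padding scheme can be sketched as follows. The sketch assumes that each node has already been encoded as a fixed-length list of numbers and uses a trivial stand-in function in place of the trained GNN. The pad value of 0.0 and the names `pad_nodes`, `drop_padding_outputs`, and `stand_in_model` are hypothetical.

```python
from typing import List, Sequence


def pad_nodes(features: Sequence[Sequence[float]], max_nodes: int,
              feature_dim: int, pad_value: float = 0.0) -> List[List[float]]:
    """Extend the node list to max_nodes entries by appending fixed-value nodes."""
    if len(features) > max_nodes:
        raise ValueError("table has more cells than the model input allows")
    padded = [list(row) for row in features]
    missing = max_nodes - len(features)
    padded += [[pad_value] * feature_dim for _ in range(missing)]
    return padded


def drop_padding_outputs(outputs: Sequence[str], real_count: int) -> List[str]:
    """Discard the outputs that correspond to the padded (fixed-value) nodes."""
    return list(outputs[:real_count])


def stand_in_model(nodes: Sequence[Sequence[float]]) -> List[str]:
    """Placeholder for the GNN: one output per input node (not the real model)."""
    return ["label" if row[0] > 0.5 else "value" for row in nodes]


# A table with two cells, fed to a model whose input size is fixed at four nodes.
features = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.4]]
padded = pad_nodes(features, max_nodes=4, feature_dim=3)
outputs = stand_in_model(padded)
kept = drop_padding_outputs(outputs, real_count=len(features))
```

Fixing the input size lets tables with different numbers of cells share one model input shape, at the cost of computing outputs for the padded nodes that are then thrown away.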

In addition, in the above embodiment, the cell attribute is the cell type, and the cell type is either a label or an attribute value. The cell type may, however, take further values in addition to label and attribute value, such as header (for example, a table title) or other.

In addition, in the above embodiment, the rows and columns need not be detected, and the node data need not include row geometry data or column geometry data. Even in that case, the attribute value corresponding to a label can be detected by searching for attribute value cells in the horizontal and vertical directions from the position of the label cell, based on the cell positions in the cell geometry data.

The present invention is applicable, for example, to recognition processing of document images such as forms.

4 Arithmetic processing unit (an example of a computer)
11 Image processing program (an example of a document image processing program)
22 Table detection unit
23 Text object detection unit
24 Character recognition processing unit
25 Cell attribute identification unit

Claims

1. A document image processing system comprising:
a table detection unit that detects a table in a document image, detects at least cells in the table, and generates cell geometry data indicating positions and sizes of the cells;
a text object detection unit that detects text objects in the document image and generates text object geometry data indicating positions and sizes of the text objects;
a character recognition processing unit that performs character recognition processing on the text objects to generate text data corresponding to the text objects; and
a cell attribute identification unit that (a) identifies the text object in each cell based on the cell geometry data and the text object geometry data, (b) generates, for each of the cells, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generates a node data set including the node data corresponding to the table, and (c) performs a predetermined classification process on the node data set to identify, for each of the cells, an attribute of the cell.

2. The document image processing system according to claim 1, wherein the cell attribute identification unit inputs the node data set as input data to a machine-learned graph neural network and identifies the output data of the graph neural network as the attributes of the cells.

3. The document image processing system according to claim 1, wherein the cell attribute identification unit performs clustering on the node data based on the feature values indicated by the node data to classify the node data into clusters, and identifies the attributes of the cells based on the clusters.

4. The document image processing system according to claim 1, wherein:
the table detection unit detects at least one of rows and columns together with the cells in the table, and generates at least one of row geometry data indicating positions and sizes of the rows and column geometry data indicating positions and sizes of the columns; and
the node data of a cell further includes at least one of the row geometry data of the row to which the cell belongs and the column geometry data of the column to which the cell belongs.

5. The document image processing system according to claim 1, wherein:
the attributes of the cell include at least a cell type of the cell; and
the cell type includes a label and an attribute value.

6. A document image processing method comprising:
detecting, by a computer, a table in a document image, detecting at least cells in the table, and generating cell geometry data indicating positions and sizes of the cells;
detecting, by the computer, text objects in the document image and generating text object geometry data indicating positions and sizes of the text objects;
performing, by the computer, character recognition processing on the text objects to generate text data corresponding to the text objects; and
by the computer, (a) identifying the text object in each cell based on the cell geometry data and the text object geometry data, (b) generating, for each of the cells, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generating a node data set including the node data corresponding to the table, and (c) performing a predetermined classification process on the node data set to identify, for each of the cells, an attribute of the cell.

7. A document image processing program that causes a computer to function as:
a table detection unit that detects a table in a document image, detects at least cells in the table, and generates cell geometry data indicating positions and sizes of the cells;
a text object detection unit that detects text objects in the document image and generates text object geometry data indicating positions and sizes of the text objects;
a character recognition processing unit that performs character recognition processing on the text objects to generate text data corresponding to the text objects; and
a cell attribute identification unit that (a) identifies the text object in each cell based on the cell geometry data and the text object geometry data, (b) generates, for each of the cells, node data including the cell geometry data, the text object geometry data of the text object in the cell, and the text data of the text object in the cell, and generates a node data set including the node data corresponding to the table, and (c) performs a predetermined classification process on the node data set to identify, for each of the cells, an attribute of the cell.