JP7562292B2

JP7562292B2 - Information processing device, server, system, information processing method, and program

Info

Publication number: JP7562292B2
Application number: JP2020095330A
Authority: JP
Inventors: 満夫木村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-11-05
Filing date: 2020-06-01
Publication date: 2024-10-07
Anticipated expiration: 2040-06-01
Also published as: JP2021077332A

Description

The present disclosure relates to a technique for extracting character strings corresponding to predetermined items based on the results of OCR processing.

Some business tasks require a user to enter specific information written on paper forms into a computer, and input support systems exist to assist with such tasks. In an input support system, character recognition processing is performed on image data obtained by reading a form with a scanner, and the information required for input is extracted from the character recognition result.

Patent Document 1 describes a technique in which the user corrects a character recognition result by specifying the area on the image where the information required for input is displayed.

Patent Document 1: Japanese Patent Laid-Open No. 2000-3403 (JP 2000-3403 A)

However, with the technique of Patent Document 1, correcting a character recognition result requires the user to specify, with an input device such as a mouse, the area on the image where the information required for input is displayed, which is a cumbersome operation. In particular, when the screen of the input device is small, as with a touch panel installed in an MFP or with a mobile terminal, operations such as pinching to enlarge the screen and swiping to scroll it become necessary, making the user's operations even more cumbersome.

An object of the technology of the present disclosure is to reduce the user's operational effort when performing processing based on character recognition results.

An information processing device according to the present disclosure is characterized by comprising: an acquisition means for acquiring an extraction result that includes (a) a processing result, obtained by performing character recognition processing on image data of an image, containing character strings in the image and information on the character areas in which the character strings were recognized, and (b) information on the character string and character area in the image corresponding to each of predefined items, output by inputting the processing result into a learning model trained to output information on the character string and character area corresponding to each of the predefined items; a display control means for displaying, on a display unit, the character string corresponding to each of the predefined items based on the extraction result; a receiving means for receiving, from a user, an instruction to correct the character string corresponding to any of the predefined items displayed on the display unit; a correction means for determining, based on the corrected character string and the processing result, the character area corresponding to the item that the user instructed to be corrected, and for correcting the extraction result based on that determination; and an output means for outputting the extraction result corrected by the correction means as data for training the learning model.

According to the technology of the present disclosure, the user's operational effort when performing processing based on character recognition results can be reduced.

FIG. 1 is a diagram showing the configuration of the input support system.
FIG. 2 is a diagram showing an example of the hardware configuration of the key-value extraction server.
FIG. 3 is a diagram showing an example of the hardware configuration of the MFP.
FIG. 4 is a diagram showing an example of the functional configuration of the key-value extraction server.
FIG. 5 is a diagram showing an example of the functional configuration of the MFP.
FIG. 6 is an example of an image of a document read by a scanner.
FIG. 7 is an example of a key-value extraction result.
FIG. 8 is a sequence diagram showing processing for determining the character area corresponding to a key.
The remaining drawings are, in order:
A diagram showing a screen displayed on the operation panel.
An example of an image of a document read by a scanner.
An example of a key-value extraction result.
An example of an image of a document read by a scanner.
An example of a key-value extraction result.
An example of a corrected key-value extraction result.
A diagram illustrating transitions of the screens displayed on the operation panel.
A diagram showing an example of the functional configuration of the key-value extraction server.
A sequence diagram showing processing for determining the character area corresponding to a key.
An example of a key-value extraction result.
A diagram illustrating transitions of the screens displayed on the operation panel.
A sequence diagram showing processing for determining the character area corresponding to a key.

Hereinafter, embodiments for carrying out the present disclosure will be described with reference to the drawings. The embodiments do not limit the technology of the present disclosure, and not all configurations described in the embodiments are essential to the means for solving the problem. Although preferred embodiments are described below, the disclosure is not limited to them, and various modifications and changes are possible within the scope of its gist.

<Embodiment 1>
[System configuration]
FIG. 1 is a diagram showing the overall configuration of the input support system according to this embodiment. The input support system 100 will be described with reference to FIG. 1. In this embodiment, the input support system 100 is described as a system that provides input support by entering the required data, on behalf of the user, into an accounting server 103 that provides a service for processing a company's accounting operations (an accounting service). Input support by the input support system is not limited to accounting services; the system can be used to support general form input tasks.

The MFP (Multifunction Peripheral) 300 is an image forming device that can be connected to a network and also functions as an information processing device. The MFP 300 has a scan function that converts paper documents into image data.

The key-value extraction server 200 is a server that functions as an information processing device for extracting, from a scanned image of a paper document, information (a value) such as a character string corresponding to a predetermined item (a key). In this embodiment, the items are those required by the accounting service. For example, the key-value extraction server 200 extracts, from an image of an invoice, the character string "2019/06/12" as the value corresponding to the "Date" item.

The accounting server 103 is a server that provides an accounting service, which performs accounting processing when information from predetermined forms such as invoices is input. The MFP 300, the key-value extraction server 200, and the accounting server 103 are connected via the Internet 104.

[Hardware configuration]
FIG. 2 is a block diagram showing the basic hardware configuration of the key-value extraction server 200 and the accounting server 103.

The CPU (Central Processing Unit) 201 is a unit that executes various programs and realizes various functions. The RAM (Random Access Memory) 202 is a unit that stores various kinds of information and is also used as a temporary working storage area for the CPU 201. The ROM (Read Only Memory) 203 is a unit that stores various programs and the like. For example, the CPU 201 loads a program stored in the ROM 203 into the RAM 202 and executes it.

The external storage unit 207 is composed of, for example, a flash memory, a hard disk drive (HDD), or a solid state drive (SSD), and stores various programs. The CPU 201 executes processing based on the programs stored in the external storage unit 207. This realizes the functional configuration of the key-value extraction server 200 shown in FIG. 4 and the processing of each step in the sequences described later.

The input/output interface 204 is a unit that provides an interface to output devices such as a display device and to input devices such as a keyboard and a mouse.

The NIC (Network Interface Card) 205 is a unit for connecting the key-value extraction server 200 and the accounting server 103 to a network. The units described above are configured to be able to exchange data with one another via the bus 206.

FIG. 3 is a block diagram showing the basic hardware configuration of the MFP 300. The CPU 301, RAM 302, ROM 303, and NIC 304 in FIG. 3 are the same as the CPU 201, RAM 202, ROM 203, and NIC 205 in FIG. 2, respectively, so their description is omitted.

The scanner 305 is an input unit that converts paper documents into image data. The printer engine 306 is an output unit that prints image data. The operation panel 307 is both an input unit (input section) that accepts touch operations from the user and an output unit (display section) that displays information about the MFP 300 on a display. The units described above are configured to be able to exchange data with one another via the bus 308.

[Functional configuration of the key-value extraction server]
FIG. 4 is a diagram showing the functional configuration of the key-value extraction server 200. The functions of the key-value extraction server 200 will be described with reference to FIG. 4.

The image data acquisition unit 401 acquires, from the MFP 300 via the NIC 205, image data of a scanned image obtained by reading an original. The OCR unit 402 is a character recognition processing unit. It identifies character areas, that is, areas of the image represented by the image data that contain character strings, and performs character recognition processing (OCR processing) on those areas. As the result of the character recognition processing, the OCR unit 402 outputs at least the position information of each character area in the image and the character string contained in that area.

The key-value extraction unit 403 performs processing to extract, from the result of the character recognition processing by the OCR unit 402, the value (character string) corresponding to each key (item). The keys (items) are defined in advance according to the input support that the input support system 100 provides. The value (character string) corresponding to a key (item) is also called a key value. The key-value extraction unit 403 extracts key values using the result of the character recognition processing and the extraction rule 409.

In this embodiment, the extraction rule 409 is described as a learning model that has been machine-trained to output the character string corresponding to each item based on features of the character recognition result. These features include (1) features obtained from the recognized character string, (2) the rectangle information of the character area, and (3) features obtained from the character strings surrounding the character string.
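The three feature groups listed above could be assembled roughly as in the following sketch. It is only an illustration under assumptions: the `OcrRegion` type, the individual features (length, digit ratio, a date-like pattern), and the neighborhood distance are hypothetical and are not specified in the disclosure.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrRegion:
    """One OCR result: rectangle of the character area plus its text (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int
    text: str


def string_features(text):
    """(1) Features derived from the recognized character string itself."""
    return {
        "length": len(text),
        "digit_ratio": sum(c.isdigit() for c in text) / max(len(text), 1),
        "looks_like_date": bool(re.fullmatch(r"\d{4}/\d{1,2}/\d{1,2}", text)),
    }


def neighbor_texts(target, regions, max_distance=200):
    """(3) Strings of other regions whose top-left corner is close to the target's.

    Closeness is measured as Manhattan distance; the threshold is an assumption.
    """
    return [
        r.text
        for r in regions
        if r is not target
        and abs(r.x - target.x) + abs(r.y - target.y) <= max_distance
    ]


def build_features(target, regions):
    """Combine (1) string features, (2) rectangle info, and (3) surrounding strings."""
    features = string_features(target.text)
    features["rect"] = (target.x, target.y, target.width, target.height)
    features["neighbors"] = neighbor_texts(target, regions)
    return features
```

A real model would still need these features converted into numeric vectors; that encoding step is left out here.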

So that the key-value extraction unit 403 can extract the character string and character area corresponding to each predetermined item, the character areas output by the character recognition processing of the OCR unit 402 and the character strings recognized in those areas are input to the extraction rule 409. The extraction rule 409 has been trained to output, for such input, the character string and character area corresponding to each item, so the key-value extraction unit 403 can extract them from the output. Alternatively, the extraction rule 409 may be trained to output values related to the character string and character area corresponding to an item, and the key-value extraction unit 403 may extract the character string and character area corresponding to the item based on those values.

The extraction rule 409 is, for example, a learning model generated by associating input data with teacher data and training on them with an algorithm such as an SVM (Support Vector Machine). The algorithm is not limited to an SVM; a decision tree or a neural network may also be used.

As its extraction result, the key-value extraction unit 403 generates a key-value extraction result, which is data that associates each item with the character string and character area corresponding to it. The key-value extraction result transmission unit 405 transmits the result of the character recognition processing by the OCR unit 402 and the key-value extraction result to the MFP 300. Details of the key-value extraction result are described later.

In this embodiment, when the key-value extraction unit 403 extracts the wrong character string or character area for an item, the user can, via the MFP 300, instruct that they be corrected to the correct character string and character area for that item. The correction result acquisition unit 406 acquires from the MFP 300 the data (correction result) indicating the correct character string and character area for the item corrected according to the user's instruction.

The learning unit 407 trains the extraction rule 409 based on the correction result acquired by the correction result acquisition unit 406. In this embodiment, the learning unit 407 can further train the extraction rule 409 so that it becomes suited to the work of each user of the input support system 100.

For example, the learning unit 407 inputs the result of the character recognition processing by the OCR unit 402 into the extraction rule 409 as input data and has it output values indicating the character string and character area corresponding to a certain item. It then calculates the deviation between that output and the values indicating the actual correct character string and character area for the item, and updates the parameters and the like of the extraction rule 409 so that the deviation becomes smaller. Data indicating the actual correct character string and character area for an item is called teacher data. Updating the parameters of the extraction rule 409 using data for learning, such as teacher data, is called learning.
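The idea of computing a deviation from the teacher data and updating parameters to shrink it can be illustrated with a deliberately small sketch. It assumes a plain linear scorer trained by gradient steps on squared error; the disclosure names SVMs, decision trees, and neural networks as candidate algorithms, so this is a stand-in for the principle, not the model used. All function names are hypothetical.

```python
def predict(weights, features):
    """Score a feature vector with a linear model (a stand-in for extraction rule 409)."""
    return sum(w * f for w, f in zip(weights, features))


def train_step(weights, features, target, learning_rate=0.1):
    """One update: measure the deviation from the teacher value and move against it."""
    deviation = predict(weights, features) - target
    return [w - learning_rate * deviation * f for w, f in zip(weights, features)]


def train(weights, samples, epochs=50, learning_rate=0.1):
    """Repeat the update over all (features, teacher value) pairs."""
    for _ in range(epochs):
        for features, target in samples:
            weights = train_step(weights, features, target, learning_rate)
    return weights
```

Here a teacher value of 1.0 could mean "this region is the value for the item" and 0.0 "it is not"; that encoding is also an assumption.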

In this embodiment, the correction result acquired by the correction result acquisition unit 406, that is, data that associates the predetermined items with the character area and character string corresponding to each of them, may be used as teacher data for training by the learning unit 407. Therefore, when the extraction rule 409 outputs the wrong character string or character area for an item, it can be trained to output the correct ones.

The extraction rule 409 is not limited to a learning model. For example, the extraction rule 409 may be a table that associates, for each type of form, the coordinates and size of the character area corresponding to each item. In this case, a correction unit (not shown) corrects the coordinate and size values of the character area containing the character string for the predetermined item, based on the correction result acquired by the correction result acquisition unit 406. In either case, the rule is updated so that the character strings corresponding to the items are extracted correctly from the character strings obtained by character recognition processing.
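A table-based rule of this kind might look like the sketch below. The table layout, the use of rectangle overlap to pick a region, and every name and coordinate are assumptions made for illustration; the disclosure only states that the table associates coordinates and sizes with items per form type.

```python
def overlap_area(a, b):
    """Overlap of two rectangles given as (x, y, width, height); 0 if disjoint."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def extract_by_table(table, form_type, ocr_regions):
    """Pick, for each item, the OCR region that overlaps the expected rectangle most.

    ocr_regions is a list of ((x, y, width, height), text) pairs.
    """
    result = {}
    for item, expected in table[form_type].items():
        best = max(ocr_regions, key=lambda r: overlap_area(expected, r[0]), default=None)
        if best is not None and overlap_area(expected, best[0]) > 0:
            result[item] = best[1]
    return result


def correct_table(table, form_type, item, corrected_region):
    """Overwrite the stored rectangle with the one from a user correction."""
    table[form_type][item] = corrected_region
```

With this shape, a user correction simply replaces the stored rectangle, so the next form of the same type is read from the corrected position.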

The area determination unit 404 determines (identifies) the correct character area corresponding to an item based on the character string for the item as corrected by the user and the position in the scanned image indicated by the user, and corrects the key-value extraction result. Details are described later.

The key-value extraction result transmission unit 405 transmits the key-value extraction result to the MFP 300. The key-value data transmission unit 408 transmits each item and the character string corresponding to it to the accounting server 103 via the NIC 205.

The function of each unit in FIG. 4 is realized by the CPU of the key-value extraction server 200 loading program code stored in the ROM into the RAM and executing it. Alternatively, some or all of the functions of the units in FIG. 4 may be realized by hardware such as an ASIC or an electronic circuit.

[Functional configuration of the MFP]
FIG. 5 is a diagram showing the functional configuration of the MFP 300. The functions of the MFP 300 will be described with reference to FIG. 5.

The UI unit 501 functions as a display control unit that causes the operation panel 307 to display screens for giving predetermined notifications to the user, and as a reception unit that receives input from the user via the operation panel 307.

The scan execution unit 502 converts the data obtained by reading an original with the scanner 305 into image data. The image data transmission unit 503 transmits the image data of the original to the key-value extraction server 200 via the NIC 304.

The key-value extraction result acquisition unit 504 acquires the key-value extraction result from the key-value extraction server 200 via the NIC 304.

The area determination unit 506 acquires, via the UI unit 501, the corrected character string for the item that the user instructed to be corrected, and determines the correct character area for that item based on the corrected character string. The area determination unit 506 then corrects the key-value extraction result by reflecting the determined character area in it. Details are described later.
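One way to picture this step is the sketch below. The source defers the actual determination procedure to a later part of the description, so the matching strategy here (an exact match between the corrected string and a recognized string, accepted only when it is unique) is purely an assumption, as are the record layout and the function names.

```python
def find_regions_for_text(corrected_text, records):
    """Return the records whose recognized text equals the corrected string."""
    normalized = corrected_text.strip()
    return [r for r in records if r["text"].strip() == normalized]


def apply_correction(records, item_label, corrected_text):
    """Move the item's label to the record matching the corrected string.

    Returns True when exactly one matching record was found and relabeled,
    False when the match is missing or ambiguous (records are left untouched).
    """
    matches = find_regions_for_text(corrected_text, records)
    if len(matches) != 1:
        return False
    for record in records:
        if record.get("label") == item_label:
            record["label"] = None  # clear the wrongly labeled record
    matches[0]["label"] = item_label
    return True
```

Because the matched record already carries its rectangle from OCR, relabeling it yields the corrected character area without the user pointing at the image.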

The correction result transmission unit 505 transmits (outputs) the key-value extraction result, as corrected based on the correction the user made via the UI unit 501, to the key-value extraction server 200 via the NIC 304. The corrected key-value extraction result is used as data for training by the learning unit 407.

In this way, the MFP 300 of this embodiment also functions as an information processing device that performs predetermined processing based on the key-value extraction result transmitted from the key-value extraction server 200.

The function of each unit in FIG. 5 is realized by the CPU of the MFP 300 loading program code stored in the ROM into the RAM and executing it. Alternatively, some or all of the functions of the units in FIG. 5 may be realized by hardware such as an ASIC or an electronic circuit.

[Key-value extraction result]
FIG. 6 is a diagram showing an example of the image represented by the image data of a paper document scanned by the scan execution unit 502 of the MFP 300. As shown in FIG. 6, the image is a scanned image obtained by reading a paper document (original), such as a quotation, on which the information to be input to the accounting server 103 is written.

FIG. 7 is a diagram showing a key-value extraction result 700, which contains the character areas and character strings recognized by the character recognition processing, together with values indicating which of those character areas and character strings correspond to the predefined items.

The key-value extraction result 700 holds three columns of data associated record by record (row by row): the character area (column 701), the character recognition result indicating the character string recognized in that area (column 702), and a label indicating the name of the item (column 703). One record is generated for each character area.

Column 701 holds the position information of each character area, which is a rectangular area of the image containing a character string. The position information consists of coordinate values indicating the position of the character area and values indicating its size. Coordinates in the image use a coordinate system whose origin is at the top left, with x extending horizontally and y extending vertically. Column 701 holds, from left to right, the x coordinate, the y coordinate, the size in the x direction (width), and the size in the y direction (height).
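A record of the key-value extraction result 700 could be modeled as in the following sketch, which mirrors columns 701 to 703. The class name, field names, and the `contains` helper are hypothetical; only the column contents and the top-left-origin coordinate system come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractionRecord:
    """One row of the key-value extraction result (one row per character area)."""
    x: int                        # column 701: x coordinate, origin at the top left
    y: int                        # column 701: y coordinate, growing downward
    width: int                    # column 701: size in the x direction
    height: int                   # column 701: size in the y direction
    text: str                     # column 702: recognized character string
    label: Optional[str] = None   # column 703: item name, or None if not an item

    def contains(self, px, py):
        """True if the point lies inside the rectangle (right and bottom edges excluded)."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)
```

For example, `ExtractionRecord(x=100, y=50, width=80, height=20, text="2019/06/12", label="Date")` would describe a labeled date region; the coordinates are made up.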

Column 702 holds the character string recognized by OCR processing of the character area. The information held in columns 701 and 702 is the result of the character recognition processing performed on the image data by the OCR unit 402.

Column 703 holds a value indicating the name of an item (key). The value indicating the name of each item is called a label. The items in the accounting system are those required by the accounting server 103. In this embodiment, there are five items: the title of the scanned document, the telephone number of the billing destination, the management number of the document, the issue date of the document, and the total amount. Although this embodiment describes the key-value extraction result as including the result of the character recognition processing, the result of the character recognition processing may instead be generated as separate data.

In column 703, the record holding the character string and character area for the document title item is given the value (label) "Title". Likewise, the record for the billing destination's telephone number is given "Telephone number", the record for the document's management number is given "Number", the record for the document's issue date is given "Date", and the record for the total amount is given "Amount".

The key-value extraction unit 403 inputs the character areas in column 701 and the character strings in column 702 into the extraction rule 409 and has it output the character string and character area corresponding to each item. Based on that output, the key-value extraction unit 403 assigns, among the records holding the character strings and character areas from the character recognition result, a label indicating the item name to the record corresponding to each item. The key-value extraction result 700 is generated in this way and is transmitted to the MFP 300.
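The labeling step can be sketched as follows, assuming for illustration that the output of the extraction rule has already been reduced to a mapping from item name to the index of the chosen record. That output format, the record layout, and the function name are all hypothetical.

```python
def label_records(ocr_records, model_output):
    """Attach item labels to OCR records to form a key-value extraction result.

    ocr_records: list of dicts with "region" and "text" (columns 701 and 702).
    model_output: dict mapping an item name to the index of the record chosen for it.
    Returns new records with a "label" entry (column 703); the inputs are not modified.
    """
    labeled = [dict(record, label=None) for record in ocr_records]
    for item, index in model_output.items():
        labeled[index]["label"] = item
    return labeled
```

Records that the model did not select keep `None` as their label, which corresponds to rows of the result that are not extraction targets.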

The character strings in the character recognition results of rows 704 to 708 of the key-value extraction result 700 in FIG. 7 are the character strings extracted as values corresponding to items (keys). For this reason, column 703 of rows 704 to 708 holds labels indicating the items to be extracted.

The key-value extraction result 700 shown in FIG. 7 is an example in which the key-value extraction unit 403 extracted the key values correctly and column 703 holds the correct labels. However, a label in column 703 may be assigned to a record other than the one that actually corresponds to the item. In other words, the extraction rule 409 may output the wrong character string and character area for an item. For example, when the telephone number of the billing destination is set as an item, a telephone number other than the billing destination's may be output as the character string for that item. In this case, the user can correct the result by entering the character string for the item. The user can also indicate the correct character area for the item by touching the screen.

学習部４０７は、項目に対応する正しい文字列と、項目に対応する正しい文字領域とを含む教師データがあれば、抽出ルール４０９を学習させて抽出ルール４０９を更新させることができる。このように、抽出ルール４０９を学習させることで、正しいキーバリューが抽出されるように抽出ルール４０９を更新することができる。しかし、ＭＦＰ３００に備え付けられている操作パネル３０７の画面のサイズが小さいことがある。このため、ユーザが正しい文字領域を、操作パネル３０７を介して指示しようとすると、ユーザの操作が煩雑になる虞がある。 If there is training data including the correct character string corresponding to the item and the correct character area corresponding to the item, the learning unit 407 can have the extraction rule 409 learn and update the extraction rule 409. In this way, by having the extraction rule 409 learn, it is possible to update the extraction rule 409 so that the correct key value is extracted. However, the screen size of the operation panel 307 provided on the MFP 300 may be small. For this reason, when the user tries to indicate the correct character area via the operation panel 307, the user's operation may become cumbersome.

そこで本実施形態では、ユーザが入力した項目に対応する正しい文字列に基づき、その項目に対応する文字領域を決定して、決定された文字領域を学習のためのデータとして出力する形態を説明する。また、項目に対応する文字列が含まれる文字領域を、ユーザが操作パネル３０７をタッチして指示する場合、そのタッチ位置が正確でない場合であっても項目に対応する正しい文字領域を決定する形態を説明する。 This embodiment therefore describes a form in which the character area corresponding to an item is determined based on the correct character string that the user inputs for that item, and the determined character area is output as data for learning. It also describes a form in which, when the user touches the operation panel 307 to indicate the character area containing the character string corresponding to an item, the correct character area corresponding to the item is determined even if the touch position is not accurate.

［文字抽出処理および修正処理について］ [Character extraction and correction processing]
図８は、本実施形態の入力支援システムにおける処理を示すシーケンス図である。図８のシーケンス図で示されるそれぞれの装置における処理は、それぞれ装置のＣＰＵがＲＯＭに記憶されているプログラムコードをＲＡＭに展開し実行することにより行われる。また、図８におけるステップの一部または全部の機能をＡＳＩＣや電子回路等のハードウェアで実現してもよい。なお、各処理の説明における記号「Ｓ」は、当該フローにおけるステップであることを意味する。なお、説明の便宜上、ユーザの動作にも「Ｓ」を付して説明する。 Fig. 8 is a sequence diagram showing the processing in the input support system of this embodiment. The processing in each device shown in the sequence diagram of Fig. 8 is performed by the CPU of each device expanding the program code stored in the ROM into the RAM and executing it. In addition, some or all of the functions of the steps in Fig. 8 may be realized by hardware such as an ASIC or an electronic circuit. Note that the symbol "S" in the explanation of each process means that it is a step in the flow. For convenience of explanation, the user's actions are also explained with the letter "S" attached.

Ｓ８０１においてユーザは、ＭＦＰ３００のスキャナ３０５に原稿（紙文書）をセットして、スキャン開始の指示をする。本ステップにおけるユーザの操作をトリガーに入力支援システムにおける処理は開始される。 In S801, the user places an original (paper document) on the scanner 305 of the MFP 300 and issues an instruction to start scanning. The user's operation in this step triggers the start of processing in the input support system.

Ｓ８０２においてＭＦＰ３００のスキャン実行部５０２は、紙文書のスキャンを実行し、紙文書の画像データを生成する。画像データ送信部５０３は、キーバリュー抽出サーバ２００に生成された画像データを送信する。 In S802, the scan execution unit 502 of the MFP 300 scans the paper document and generates image data of the paper document. The image data transmission unit 503 transmits the generated image data to the key-value extraction server 200.

Ｓ８０３において画像データ取得部４０１は、ＭＦＰ３００から送信された画像データを取得し、ＯＣＲ部４０２がその画像データに対して文字認識処理を実行する。 In S803, the image data acquisition unit 401 acquires the image data sent from the MFP 300, and the OCR unit 402 performs character recognition processing on the image data.

Ｓ８０４においてキーバリュー抽出部４０３は、抽出ルール４０９に文字認識処理の処理結果を入力して出力された結果に基づき、予め定義された夫々の項目に対応する文字列および文字領域を抽出する。そして、キーバリュー抽出部４０３は、文字認識処理の処理結果に含まれるレコードのうち、項目に対応する文字領域および文字列を含むレコードには、その項目の名称を示すラベルを付与して、図７に示すキーバリュー抽出結果７００を生成する。キーバリュー抽出結果送信部４０５は、生成されたキーバリュー抽出結果７００をＭＦＰ３００に送信する。 In S804, the key-value extraction unit 403 extracts character strings and character regions corresponding to each predefined item based on the results output by inputting the results of the character recognition process to the extraction rule 409. The key-value extraction unit 403 then assigns a label indicating the name of the item to records that contain character regions and character strings corresponding to the items among the records included in the results of the character recognition process, thereby generating the key-value extraction result 700 shown in FIG. 7. The key-value extraction result transmission unit 405 transmits the generated key-value extraction result 700 to the MFP 300.

Ｓ８０５においてＭＦＰ３００のキーバリュー抽出結果取得部５０４は、文字認識処理の処理結果が含まれるキーバリュー抽出結果７００を取得する。また、ＭＦＰ３００のＵＩ部５０１は、操作パネル３０７に、取得されたキーバリュー抽出結果に基づき、夫々の項目と項目に対応する夫々の文字列とを表示する。 In S805, the key-value extraction result acquisition unit 504 of the MFP 300 acquires the key-value extraction result 700, which includes the processing result of the character recognition process. In addition, the UI unit 501 of the MFP 300 displays each item and each character string corresponding to the item on the operation panel 307 based on the acquired key-value extraction result.

図９は、本ステップによってＭＦＰ３００の操作パネル３０７に表示される画面９０１を示す図である。画面９０１は、キーバリューの抽出結果をユーザが確認および修正するために表示される。バリュー表示欄９０２～９０６は、キー表示欄９１２～９１６に表示される項目に対応する文字列を表示する表示欄である。 Figure 9 shows a screen 901 displayed on the operation panel 307 of the MFP 300 in this step. Screen 901 is displayed so that the user can confirm and modify the key value extraction results. Value display fields 902-906 are display fields that display character strings corresponding to the items displayed in the key display fields 912-916.

Ｓ８０６においてユーザは、ＭＦＰ３００の操作パネル３０７に表示される画面９０１から、それぞれの項目に対応する文字列の抽出結果を確認し、文字列に誤りがあれば修正する。 In S806, the user checks the extracted strings corresponding to each item on screen 901 displayed on the operation panel 307 of the MFP 300, and corrects any errors in the strings.

ＭＦＰ３００の操作パネル３０７に表示されている画面９０１のバリュー表示欄９０２～９０６のいずれかをユーザがタッチすると、画面９０１から編集モードに切り替わる。編集モードでは、ユーザはタッチしたバリュー表示欄の項目の正しい文字列を入力して、バリュー表示欄の文字列を修正することができる。つまり、ユーザは、修正するバリュー表示欄をタッチすることで、修正する項目を指示することができる。ユーザが修正の入力後にユーザが登録ボタン９０７を押下すると、ＵＩ部５０１が修正後の文字列で登録を受け付ける。 When the user touches any of the value display fields 902 to 906 on screen 901 displayed on the operation panel 307 of the MFP 300, screen 901 switches to edit mode. In edit mode, the user can correct the string in the value display field by inputting the correct string for the item in the value display field that the user touched. In other words, the user can specify the item to be corrected by touching the value display field to be corrected. When the user presses the register button 907 after inputting the corrections, the UI unit 501 accepts the registration of the corrected string.

例えば、画面９０１ではキー表示欄９１６に表示されている「電話番号」のバリューとして「０９８－７６５－４３２１」が抽出結果として表示されている。図６の矩形６０５に示すように正しく抽出されるべき請求先の電話番号の文字列は「０１２３－４５－６７８９」であるとする。この場合、ユーザは、バリュー表示欄９０６をタッチしてから「０１２３－４５－６７８９」を入力して、項目に対応する文字列を修正することができる。 For example, on screen 901, "098-765-4321" is displayed as the extraction result as the value of "phone number" displayed in key display field 916. As shown in rectangle 605 in FIG. 6, the character string of the billing destination phone number that should be correctly extracted is "0123-45-6789". In this case, the user can touch value display field 906 and then input "0123-45-6789" to correct the character string corresponding to the item.

このようなユーザが修正を必要とするキーバリューの抽出の誤りの原因については、項目に対応する文字領域が誤って抽出されている場合がある。または、ユーザが修正を必要とするキーバリューの抽出の誤りの原因については、項目に対応する文字領域は正しく抽出されていたがその文字領域に対する文字認識が誤っている場合がある。 A key-value extraction error that requires such a correction by the user can have one of two causes. In some cases, the character area corresponding to the item was extracted incorrectly. In other cases, the character area corresponding to the item was extracted correctly, but the character recognition performed on that character area was incorrect.

Ｓ８０７においてＭＦＰ３００の領域決定部５０６は、修正前の項目に対応する文字認識結果の文字列と、修正後の文字列との差が所定の範囲内（許容範囲内）かを判定する。本実施形態では、キーバリュー抽出部４０３が抽出した修正前の項目に対応する文字列とユーザが修正した修正後の文字列との間の編集距離（レーベンシュタイン距離）が閾値以下である場合、所定の範囲内と判定する。閾値は、例えば１である。 In S807, the area determination unit 506 of the MFP 300 determines whether the difference between the character string resulting from character recognition corresponding to the item before correction and the corrected character string is within a predetermined range (acceptable range). In this embodiment, if the edit distance (Levenshtein distance) between the character string corresponding to the item before correction extracted by the key-value extraction unit 403 and the corrected character string corrected by the user is less than or equal to a threshold, it is determined that the difference is within the predetermined range. The threshold is, for example, 1.

例えば、Ｓ８０４においてキーバリュー抽出部４０３が抽出した、ある項目に対応する修正前の文字列が「０１２３－４５－６７８８」であり、ユーザが修正した修正後の文字列が「０１２３－４５－６７８９」の場合、編集距離は１となる。このため閾値が１であれば、項目に対応する修正前の文字列と修正後の文字列との差は所定の範囲内と判定される。 For example, if the pre-correction character string corresponding to a certain item extracted by the key-value extraction unit 403 in S804 is "0123-45-6788" and the corrected character string corrected by the user is "0123-45-6789", the edit distance will be 1. Therefore, if the threshold value is 1, the difference between the pre-correction character string and the corrected character string corresponding to the item is determined to be within a predetermined range.
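The S807 check can be illustrated with a minimal Python sketch. It assumes the standard Levenshtein definition (single-character insertions, deletions, and substitutions each cost 1) and the example threshold of 1. The function names `levenshtein` and `within_tolerance` are hypothetical and do not appear in the disclosure.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


THRESHOLD = 1  # example threshold given in the description


def within_tolerance(before: str, after: str, threshold: int = THRESHOLD) -> bool:
    """S807 (sketch): True when the user's correction is small enough that
    the character area is presumed correct and only the OCR text was wrong."""
    return levenshtein(before, after) <= threshold
```

With these definitions, the pair "0123-45-6788" and "0123-45-6789" falls within the tolerance, while "098-765-4321" and "0123-45-6789" does not.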

文字列の差が所定の範囲内である場合（Ｓ８０７がＹＥＳ）、キーバリューの抽出の誤りの原因については、項目に対応する文字領域は正しく抽出されているものの、文字領域に対する文字認識処理が誤っていたためと考えられる。このため、後述する項目に対応する正しい文字領域の決定は行わずにＳ８１５に進む。 If the difference between the character strings is within the predetermined range (YES in S807), the likely cause of the key-value extraction error is that the character area corresponding to the item was extracted correctly but the character recognition process for that character area was incorrect. The process therefore proceeds to S815 without performing the determination, described later, of the correct character area corresponding to the item.

文字列の差が所定の範囲内ではない場合（Ｓ８０７がＮＯ）、キーバリューの抽出の誤りの原因については、項目に対応する文字領域が誤って抽出されたためと考えられる。この場合Ｓ８０８に進み、領域決定部５０６は、画像データの全体を文字認識処理した処理結果から、修正後の文字列と同じである文字列を決定し、その文字列が認識された文字領域を項目に対応する正しい文字領域として決定する。本実施形態では、キーバリュー抽出結果に文字認識処理した処理結果が含まれているため、領域決定部５０６は、キーバリュー抽出結果から、項目に対応する正しい文字領域を決定する。なお、ここで、同じとは、文字が全く同じものに限定されず、同じとみなせるものも含むものとする。 If the difference in the character strings is not within the specified range (NO in S807), the cause of the error in key-value extraction is believed to be that the character area corresponding to the item was erroneously extracted. In this case, the process proceeds to S808, where the area determination unit 506 determines a character string that is the same as the corrected character string from the processing results of character recognition processing on the entire image data, and determines the character area in which this character string is recognized as the correct character area corresponding to the item. In this embodiment, since the processing results of character recognition processing are included in the key-value extraction results, the area determination unit 506 determines the correct character area corresponding to the item from the key-value extraction results. Note that "same" here is not limited to characters that are exactly the same, but also includes characters that can be considered to be the same.

図１０は、Ｓ８０２においてＭＦＰ３００のスキャン実行部５０２によってスキャンされたスキャン画像であり、図６とは異なる紙文書のスキャン画像の一例を示す図である。図１１は、図１０の画像データに対して文字認識とキーバリュー抽出を行った結果を示す、図７とは別の、キーバリュー抽出結果である。図１０および図１１を用いて、Ｓ８０８の処理について説明する。 Figure 10 is a diagram showing an example of a scanned image of a paper document different from that shown in Figure 6, which is a scanned image scanned by the scan execution unit 502 of the MFP 300 in S802. Figure 11 is a key-value extraction result different from that shown in Figure 7, showing the result of performing character recognition and key-value extraction on the image data of Figure 10. The processing of S808 will be described using Figures 10 and 11.

図１１の行１１０２～１１０６は、項目を示すラベルが付与された文字認識処理の処理結果（文字領域および文字認識結果）含むレコードである。図１０の点線矩形１００１～１００５は、項目に対応する抽出された文字領域であることを示している。しかしながら、図１１のキーバリュー抽出結果にはキーバリュー抽出部４０３による抽出の誤りが含まれている。具体的には、ラベルが「電話番号」である項目に対応する文字領域および文字列として、行１１０６のレコードに電話番号のラベルが保持されているが、このラベルの付与は正しく無いものとする。このため、Ｓ８０６において「電話番号」に対応する正しい文字列である「０１２３－４５－６７８９」がユーザによって入力されたものとする。また、Ｓ８０７においてユーザが入力した修正後の文字列「０１２３－４５－６７８９」と誤って抽出された行１１０６の文字認識結果「０９８－７６５－４３２１」とは許容範囲内と判定されなかったためＳ８０８に処理が遷移しているものとする。 Rows 1102 to 1106 in FIG. 11 are records containing the results of the character recognition process (character areas and character recognition results) to which labels indicating items have been assigned. Dotted rectangles 1001 to 1005 in FIG. 10 indicate the character areas extracted as corresponding to the items. However, the key-value extraction result in FIG. 11 contains an extraction error by the key-value extraction unit 403. Specifically, the record in row 1106 holds the "telephone number" label as the character area and character string corresponding to that item, but this label assignment is assumed to be incorrect. It is therefore assumed that, in S806, the user input "0123-45-6789", the correct character string for "telephone number". It is further assumed that, in S807, the difference between the corrected character string "0123-45-6789" input by the user and the erroneously extracted character recognition result "098-765-4321" in row 1106 was determined not to be within the acceptable range, so that the process has transitioned to S808.

Ｓ８０８において領域決定部５０６は、列１１０８の文字認識結果に保持されている全ての文字列に対して、ユーザが修正した修正後の文字列との編集距離を導出する。編集距離が所定の値以下である文字認識結果の文字列が１つの場合、領域決定部５０６は、その所定の値以下の文字列と修正後の文字列とは同じであるとみなして、項目に対応する正しい文字領域が決定できたと判定する。所定の値は例えば１である。 In S808, the area determination unit 506 derives the edit distance between the corrected string corrected by the user and all character strings stored in the character recognition results in column 1108. If there is one character string in the character recognition results whose edit distance is equal to or less than a predetermined value, the area determination unit 506 considers that the character string equal to or less than the predetermined value and the corrected string are the same, and determines that the correct character area corresponding to the item has been determined. The predetermined value is, for example, 1.

図１１の列１１０１は、各行の文字認識結果の文字列とユーザによる修正後の文字列「０１２３－４５－６７８９」との編集距離を保持する列である。図１１では、列１１０１において編集距離が１以下であるのは、行１１０７に保持されている文字認識結果のみである。つまり、編集距離が所定の値である１以下の文字列が１つだけである。このため、図１１の例では、「電話番号」の項目に対応する正しい文字領域は、行１１０７の文字領域であるものとして決定できることになる。図１０の点線矩形１００６は、図１１の行１１０７の文字領域に対応する文字領域である。 Column 1101 in FIG. 11 holds the edit distance between the character recognition result in each row and the character string "0123-45-6789" corrected by the user. In FIG. 11, the only entry in column 1101 with an edit distance of 1 or less is the character recognition result held in row 1107. In other words, exactly one character string has an edit distance no greater than the predetermined value of 1. Therefore, in the example of FIG. 11, the correct character area corresponding to the "phone number" item can be determined to be the character area in row 1107. Dotted rectangle 1006 in FIG. 10 is the character area corresponding to row 1107 in FIG. 11.

このように、本実施形態では、項目に対応する文字領域が誤って抽出された場合、項目に対応する正しい文字領域に修正するために、ユーザがキーバリュー抽出結果または文字認識処理の処理結果から正しい文字領域を探して指定する必要はない。ユーザがその項目に対応する正しい文字列の入力に応じて、その修正後の文字列に基づき領域決定部５０６がその項目に対応する文字領域を決定することができる。 In this manner, in this embodiment, if a character area corresponding to an item is erroneously extracted, the user does not need to search for and specify the correct character area from the key-value extraction results or the processing results of the character recognition process in order to correct it to the correct character area corresponding to the item. When the user inputs the correct character string corresponding to the item, the area determination unit 506 can determine the character area corresponding to the item based on the corrected character string.

Ｓ８０９では、Ｓ８０８においてユーザが入力した修正後の文字列に基づき、項目に対応する文字領域が決定できたかが判定される。例えば、修正後の文字列との編集距離が所定の値以下の文字列が複数ある場合、または編集距離が所定の値以下の文字列が無かった場合は、Ｓ８０８では、ユーザが修正指示した項目（誤って抽出された項目）に対応する正しい文字領域は決定されない。この場合は、正しい文字領域が決定されないと判定される。 In S809, it is determined whether S808 was able to determine the character area corresponding to the item based on the corrected character string entered by the user. For example, if multiple character strings have an edit distance from the corrected character string that is less than or equal to the predetermined value, or if no character string does, S808 does not determine the correct character area corresponding to the item that the user instructed to be corrected (the erroneously extracted item). In this case, it is determined that the correct character area has not been determined.
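The S808 search and the S809 decision can be sketched as follows. This is only an illustration under stated assumptions: each OCR record is modeled as a dict with hypothetical keys `text` and `region` (a bounding box given as x, y, width, height), and the sample coordinates are made up. A region is returned only when exactly one recognized string lies within the allowed edit distance. Otherwise the function returns `None`, which corresponds to NO in S809.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def find_unique_region(records, corrected, max_distance=1):
    """S808/S809 (sketch): return the region of the single record whose
    text is 'the same' as the corrected string, or None when there are
    zero or several candidates."""
    candidates = [r for r in records
                  if edit_distance(r["text"], corrected) <= max_distance]
    if len(candidates) == 1:
        return candidates[0]["region"]
    return None


# Hypothetical OCR records; the coordinates are invented for illustration.
ocr_records = [
    {"text": "098-765-4321", "region": (900, 120, 210, 30)},
    {"text": "TEL:", "region": (1050, 480, 80, 30)},
    {"text": "0123-45-6789", "region": (1150, 480, 200, 30)},
]
region = find_unique_region(ocr_records, "0123-45-6789")
```

Returning `None` for the ambiguous case, rather than picking the first candidate, keeps an uncertain region out of the learning data and leaves the decision to the touch-based step.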

正しい文字領域が決定されない場合（Ｓ８０９がＮＯ）、Ｓ８１０においてＵＩ部５０１は、操作パネル３０７に、スキャンした原稿の画像データに基づきプレビュー画像を表示させる。そして、ＵＩ部５０１は、ユーザが修正指示した項目に対応する正しい文字領域を、操作パネル３０７上のプレビュー画像にタッチして指示するように、ユーザに促す。 If the correct character area has not been determined (S809 is NO), in S810, the UI unit 501 causes the operation panel 307 to display a preview image based on the image data of the scanned document. The UI unit 501 then prompts the user to touch the preview image on the operation panel 307 to indicate the correct character area that corresponds to the item that the user has instructed to be corrected.

Ｓ８１１においてユーザは、修正後の文字列が記載されている、ユーザが修正指示した項目の正しい文字領域の位置を、ＭＦＰ３００の操作パネル３０７に表示されているプレビュー画面上をタッチして指示する。 In S811, the user touches the preview screen displayed on the operation panel 307 of the MFP 300 to indicate the position of the correct character area for the item that the user has instructed to be corrected, in which the corrected character string is written.

Ｓ８１２においてＭＦＰ３００のＵＩ部５０１は、ユーザのタッチを受け付け、操作パネル３０７のタッチ位置を検知する。そして、ＵＩ部５０１は、操作パネル３０７に画面１５０５（図１５参照）を表示させ、ユーザによる登録ボタン９０７の押下を受け付ける。このとき表示される画面１５０５では、ユーザによって修正が指示された項目に対応するバリュー表示欄には修正後の文字列が表示される。 In S812, the UI unit 501 of the MFP 300 receives a touch by the user and detects the touch position on the operation panel 307. The UI unit 501 then displays screen 1505 (see FIG. 15) on the operation panel 307 and receives the user's press of the register button 907. In screen 1505 that is displayed at this time, the corrected character string is displayed in the value display field corresponding to the item that the user instructed to be corrected.

修正結果送信部５０５は、ユーザがタッチした操作パネル３０７上のタッチ位置に基づき、画像の座標にマッピングしたタッチ位置座標に変換してタッチ位置座標を決定する。修正結果送信部５０５は、Ｓ８０６でユーザが入力した修正後の文字列と、タッチ位置座標とを、キーバリュー抽出サーバ２００に送信する。 The correction result transmission unit 505 converts the touch position on the operation panel 307 touched by the user into touch position coordinates mapped to the image coordinates, and determines the touch position coordinates. The correction result transmission unit 505 transmits the corrected character string input by the user in S806 and the touch position coordinates to the key-value extraction server 200.
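The description does not state how a panel touch is mapped to image coordinates. The following sketch assumes the preview is drawn with a uniform zoom factor and a scroll offset, so that image pixel (ix, iy) appears at panel position (ix * scale - offset_x, iy * scale - offset_y). The function `panel_to_image` and its parameters are hypothetical.

```python
def panel_to_image(touch_x, touch_y, scale, offset_x=0.0, offset_y=0.0):
    """Convert a touch position on the panel into image coordinates.

    Assumes (hypothetically) a uniform zoom factor `scale` and a scroll
    offset (offset_x, offset_y) measured in panel pixels.
    """
    image_x = round((touch_x + offset_x) / scale)
    image_y = round((touch_y + offset_y) / scale)
    return image_x, image_y


# A preview shown at half size with no scrolling: a touch at (555, 246.5)
# on the panel maps to image pixel (1110, 493).
touch_in_image = panel_to_image(555, 246.5, scale=0.5)
```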

Ｓ８１３においてキーバリュー抽出サーバ２００の修正結果取得部４０６は、ＭＦＰ３００から送信されたタッチ位置座標と修正後の文字列とその項目とを取得する。キーバリュー抽出サーバ２００の領域決定部４０４は、タッチ位置座標と修正後の文字列とに基づき、ユーザが修正指示した項目に対応する正しい文字領域を決定する。 In S813, the correction result acquisition unit 406 of the key-value extraction server 200 acquires the touch position coordinates and the corrected character string and its item sent from the MFP 300. The area determination unit 404 of the key-value extraction server 200 determines the correct character area corresponding to the item instructed to be corrected by the user based on the touch position coordinates and the corrected character string.

図１２は、Ｓ８０２においてＭＦＰ３００のスキャン実行部５０２によってスキャンされたスキャン画像であり、図６および図１０とは異なる紙文書のスキャン画像の一例を示す図である。図１３は、図１２の画像データに対して文字認識とキーバリュー抽出を行った結果であり、図７および図１１とは別の、キーバリュー抽出結果を示すテーブルである。図１２および図１３を用いて、Ｓ８１０からＳ８１３の処理を説明する。 Figure 12 is a scan image scanned by the scan execution unit 502 of the MFP 300 in S802, and is a diagram showing an example of a scan image of a paper document different from those shown in Figures 6 and 10. Figure 13 is a table showing the results of performing character recognition and key-value extraction on the image data of Figure 12, and is different from those shown in Figures 7 and 11. The processing from S810 to S813 will be explained using Figures 12 and 13.

図１３の行１３０２～１３０６は、所定の項目に対応する文字列（文字領域）として項目を示すラベルが付与された文字認識処理の処理結果を含むレコードである。しかしながら、列１３１２の「電話番号」は、正しい文字列（文字領域）に付与されていないものとする。 Rows 1302 to 1306 in FIG. 13 are records containing the results of character recognition processing in which a label indicating the item is assigned as a string (character area) corresponding to a specific item. However, the "phone number" in column 1312 is not assigned to the correct string (character area).

このため、キーバリュー抽出部４０３によって「電話番号」である項目の文字列が「０９８－７６５－４３２１」として抽出されたものの、ユーザによって「電話番号」である項目の文字列が「０１２３－４５－６７８９」に修正されたものとする。図１３の列１３１１には、ユーザの入力による修正後の文字列「０１２３－４５－６７８９」と文字認識結果の文字列との編集距離が保持されている。図１３の例では、編集距離が１以下の文字列が２つあるため、Ｓ８０８において領域決定部５０６は電話番号の項目に対応する正しい文字領域が決定できなかったことになる。このため処理はＳ８１０に遷移している。 For this reason, suppose that the key-value extraction unit 403 extracted the character string in the "phone number" field as "098-765-4321", but the user corrected the character string in the "phone number" field to "0123-45-6789". Column 1311 in FIG. 13 holds the edit distance between the character string "0123-45-6789" corrected by the user's input and the character string resulting from character recognition. In the example in FIG. 13, there are two character strings with an edit distance of 1 or less, so in S808 the region determination unit 506 was unable to determine the correct character region corresponding to the phone number field. For this reason, the process transitions to S810.

図１２に示すタッチ位置１２１０は、ユーザがＳ８１１においてタッチした位置であり、Ｓ８１２においてそのタッチ位置座標は、（ｘ，ｙ）＝（１１１０，４９３）と決定されている。 Touch position 1210 shown in FIG. 12 is the position touched by the user in S811, and the touch position coordinates are determined to be (x, y) = (1110, 493) in S812.

Ｓ８１３においてキーバリュー抽出サーバ２００の領域決定部４０４は、タッチ位置座標と、キーバリュー抽出結果のテーブルのすべて行の文字領域との距離を導出する。領域決定部４０４は、その距離が所定の値以下の文字領域のうち、修正後の文字列との編集距離が最小の文字列が認識された文字領域を、ユーザが修正指示した項目に対応する文字領域と決定する。所定の値は例えば１００である。 In S813, the area determination unit 404 of the key-value extraction server 200 derives the distance between the touch position coordinates and the character areas of all rows of the table of the key-value extraction results. Among the character areas where the distance is equal to or less than a predetermined value, the area determination unit 404 determines the character area in which the character string with the smallest edit distance from the corrected character string is recognized as the character area corresponding to the item instructed to be corrected by the user. The predetermined value is, for example, 100.

図１３の列１３０１は、文字領域の位置とユーザがタッチした画像内のタッチ位置との距離を保持する列であり、タッチ位置座標が（１１１０，４９３）の場合の文字領域との距離を示している。行１３０７～１３１０は、タッチ位置座標との距離が所定の値である１００以下の文字領域を有するレコードである。このうち、領域決定部４０４は、列１３１１の編集距離が最小である行１３０８の文字領域が、ユーザが修正指示した項目である「電話番号」の項目に対応する正しい文字領域と決定する。なお、最小のものが２つ以上ある場合は、距離が一番小さいほうの文字領域を正しい文字領域として決定されてもよい。 Column 1301 in FIG. 13 holds the distance between the position of each character area and the touch position in the image touched by the user, here for touch position coordinates (1110, 493). Rows 1307 to 1310 are the records whose character areas lie within the predetermined value of 100 from the touch position coordinates. Among these, the area determination unit 404 determines that the character area in row 1308, which has the smallest edit distance in column 1311, is the correct character area corresponding to the "phone number" item that the user instructed to be corrected. Note that if two or more character areas share the smallest edit distance, the one with the smallest distance from the touch position may be determined to be the correct character area.

このように、本実施形態では、ユーザが修正指示した項目に対応する正しい文字領域をユーザが画像をタッチして指示する場合、タッチ位置座標だけでなく修正後の文字列との編集距離についても考慮して、その項目に対応する文字領域を決定する。 In this way, in this embodiment, when a user touches an image to indicate the correct character area that corresponds to an item that the user has instructed to correct, the character area that corresponds to that item is determined taking into consideration not only the touch position coordinates but also the edit distance with the corrected character string.

例えば、図１３ではタッチ位置座標との距離が一番小さい文字領域は「ＴＥＬ：」の文字列が認識された行１３０７の文字領域であるが、これは項目に対応する文字領域ではない。このように単にタッチ位置座標に基づき項目に対応する文字領域を決定すると、画面が小さい場合などユーザが正確にタッチできない場合は、正しい文字領域を決定できない虞がある。本実施形態によればこのようにユーザが正確にタッチできない場合でも、項目に対応する文字領域を決定することができる。 For example, in FIG. 13, the character area closest to the touch position coordinates is the character area in row 1307, where the character string "TEL:" was recognized, but this is not the character area that corresponds to the item. If the character area corresponding to the item were determined from the touch position coordinates alone, the correct character area might not be determined when the user cannot touch accurately, such as when the screen is small. According to this embodiment, the character area corresponding to the item can be determined even when the user cannot touch accurately.
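The S813 selection can be sketched in Python as below. Several points are assumptions rather than statements of the disclosure: the distance between the touch and a character area is measured to the nearest point of the area's bounding box, each record carries a precomputed `edit_distance` field (mirroring column 1311), and the record layout and sample coordinates are invented for illustration. Ties in edit distance are broken by the smaller touch distance, as the description permits.

```python
import math


def rect_distance(px, py, rect):
    """Distance from point (px, py) to the nearest point of the rectangle
    (x, y, width, height); 0 when the point lies inside it."""
    x, y, w, h = rect
    dx = max(x - px, 0, px - (x + w))
    dy = max(y - py, 0, py - (y + h))
    return math.hypot(dx, dy)


def pick_region(records, touch, max_touch_distance=100):
    """S813 (sketch): among records near the touch, choose the one whose
    text is closest to the corrected string; None if nothing is near."""
    near = []
    for r in records:
        d = rect_distance(touch[0], touch[1], r["region"])
        if d <= max_touch_distance:
            near.append((r["edit_distance"], d, r))
    if not near:
        return None  # corresponds to NO in S814
    # Smallest edit distance first, then smallest touch distance.
    return min(near, key=lambda t: (t[0], t[1]))[2]


# Hypothetical records; coordinates and distances are illustrative only.
records = [
    {"text": "TEL:", "region": (1050, 480, 80, 30), "edit_distance": 12},
    {"text": "0123-45-6789", "region": (1150, 480, 200, 30), "edit_distance": 0},
    {"text": "0123-45-6788", "region": (100, 100, 200, 30), "edit_distance": 1},
]
chosen = pick_region(records, (1110, 493))
```

In this example the touch falls inside the "TEL:" box, yet the neighboring telephone number is chosen because its edit distance to the corrected string is smaller.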

Ｓ８１４において領域決定部４０４は、Ｓ８１３で項目に対応する正しい文字領域が決定できたかどうかを判定する。 In S814, the area determination unit 404 determines whether the correct character area corresponding to the item was determined in S813.

Ｓ８１３で正しい文字領域が決定された場合（Ｓ８１４がＹＥＳ）、Ｓ８０８で正しい文字領域が決定された場合（Ｓ８０９がＹＥＳ）、はＳ８１５に進む。 If the correct character area is determined in S813 (YES in S814), or if the correct character area is determined in S808 (YES in S809), proceed to S815.

Ｓ８１５において学習部４０７は、領域決定部４０４、５０６によって決定された項目に対応する正しい文字領域の位置情報を示すデータを教師データとして用いて抽出ルール４０９に学習させる。抽出ルール４０９は、教師データに含まれる文字認識結果の特徴を抽出し学習することによって更新される。 In S815, the learning unit 407 uses data indicating the position information of the correct character area corresponding to the item determined by the area determination units 404 and 506 as training data to train the extraction rules 409. The extraction rules 409 are updated by extracting and learning the characteristics of the character recognition results contained in the training data.

Ｓ８０８またはＳ８１３においてユーザが修正指示した項目に対応する正しい文字領域を決定できた場合、領域決定部４０４、５０６は、正しい文字領域を含むレコードにラベルが付与されるようにキーバリュー抽出結果は修正される。 If the correct character area corresponding to the item that the user instructed to be corrected was determined in S808 or S813, the area determination units 404 and 506 correct the key-value extraction result so that the label is assigned to the record containing the correct character area.

図１４は、図１１のキーバリュー抽出結果におけるラベルが、Ｓ８０８の処理結果に基づきＭＦＰ３００にて修正された後のキーバリュー抽出結果を示す図である。領域決定部５０６は、「電話番号」が項目である文字領域は行１１０７における文字領域であると決定している。このため領域決定部５０６は、図１４に示すように列１４０１において「電話番号」が行１１０７に付与されるようにキーバリュー抽出結果を修正する。 Figure 14 is a diagram showing the key-value extraction result after the label in the key-value extraction result in Figure 11 has been modified by MFP 300 based on the processing result of S808. The area determination unit 506 has determined that the character area in which "phone number" is the item is the character area in row 1107. Therefore, the area determination unit 506 modifies the key-value extraction result so that "phone number" is assigned to row 1107 in column 1401 as shown in Figure 14.

Ｓ８１３で項目に対応する文字領域が決定された場合も同様に、領域決定部４０４は、図１３のキーバリュー抽出結果において列１３１２における「電話番号」が行１３０８に付与されるようにキーバリュー抽出結果を修正する。このように領域決定部４０４、５０６は、決定した正しい文字領域をキーバリュー抽出結果に反映させてキーバリュー抽出結果を修正する機能も有する。 Similarly, when a character area corresponding to an item is determined in S813, the area determination unit 404 corrects the key-value extraction result so that "phone number" in column 1312 in the key-value extraction result of FIG. 13 is assigned to row 1308. In this way, the area determination units 404 and 506 also have the function of correcting the key-value extraction result by reflecting the determined correct character area in the key-value extraction result.
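The label correction described above amounts to moving a label from one record to another. A minimal sketch, assuming the extraction result is a list of dicts with a hypothetical `label` key (`None` when no label is assigned); the function name `move_label` is likewise hypothetical:

```python
def move_label(records, label, correct_index):
    """Return a copy of the extraction result in which `label` is attached
    only to the record at `correct_index`.

    Any label previously held by the target record is overwritten.
    """
    updated = []
    for i, record in enumerate(records):
        record = dict(record)  # copy so the input list is left unchanged
        if record.get("label") == label:
            record["label"] = None  # remove the erroneous assignment
        if i == correct_index:
            record["label"] = label
        updated.append(record)
    return updated


extraction = [
    {"text": "098-765-4321", "label": "phone number"},  # wrongly labeled
    {"text": "0123-45-6789", "label": None},            # correct record
]
corrected = move_label(extraction, "phone number", correct_index=1)
```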

Ｓ８１５における学習では、例えば、図１４のような修正後のキーバリュー抽出結果が学習のためのデータとして用いられる。図１４の修正後のキーバリュー抽出結果をそのまま教師データとして使用して学習されてもよいし、または、修正後のキーバリュー抽出結果を加工して教師データまたは入力データとして用いて学習してもよい。 In the learning in S815, for example, the corrected key-value extraction result as shown in FIG. 14 is used as data for learning. The corrected key-value extraction result as shown in FIG. 14 may be used as is as training data for learning, or the corrected key-value extraction result may be processed and used as training data or input data for learning.

Ｓ８１４で正しい文字領域が決定されなかった場合（Ｓ８１４がＮＯ）、Ｓ８１６に進む。例えばタッチ位置座標との距離が所定の値以内の文字領域が存在しなかった場合は、Ｓ８１４で、文字領域が決定できなかったと判定される。この場合は、ユーザが修正指示した項目に対応する正しい文字領域が決定できかったため、Ｓ８１５の学習のためのステップへは進まないでＳ８１６に進む。このため、項目に対応する誤った文字領域が含まれているデータを学習のためのデータとして使われないようにすることができる。 If the correct character area is not determined in S814 (NO in S814), proceed to S816. For example, if there is no character area whose distance from the touch position coordinates is within a predetermined value, it is determined in S814 that the character area could not be determined. In this case, since the correct character area corresponding to the item specified by the user for correction could not be determined, proceed to S816 without proceeding to the learning step in S815. This makes it possible to prevent data that includes an incorrect character area corresponding to an item from being used as learning data.

Ｓ８１６においてキーバリューデータ送信部４０８は、項目と夫々の項目に対応する文字列のデータを会計サーバ１０３に送信する。文字認識結果またはラベルが修正されたキーバリュー抽出結果が会計サーバ１０３に送信されてもよいし、修正後のキーバリュー抽出結果から、ラベルが付与されているレコードを抽出して、会計サーバ１０３で利用するデータのみを送信してもよい。 In S816, the key-value data transmission unit 408 transmits data on the items and the character strings corresponding to each item to the accounting server 103. The character recognition result or the key-value extraction result with the label corrected may be transmitted to the accounting server 103, or records with labels may be extracted from the corrected key-value extraction result and only the data to be used by the accounting server 103 may be transmitted.
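The second option, sending only the labeled records, can be sketched as a simple filter. The record layout (`label` and `text` keys) and the function name `to_key_values` are assumptions made for illustration. The transport to the accounting server is not shown.

```python
def to_key_values(records):
    """Keep only labeled records and return them as {item name: string}."""
    return {r["label"]: r["text"] for r in records if r.get("label")}


result = [
    {"text": "TEL:", "label": None},
    {"text": "0123-45-6789", "label": "phone number"},
    {"text": "098-765-4321", "label": None},
]
payload = to_key_values(result)
```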

図１５は、図８のフローの各ステップにおいて、ＭＦＰ３００の操作パネル３０７に表示される画面の遷移を示している。 Figure 15 shows the transition of screens displayed on the operation panel 307 of the MFP 300 at each step of the flow in Figure 8.

画面１５０１は、Ｓ８０５の処理の結果表示される画面９０１と同じ画面である。画面１５０２は、Ｓ８０６においてユーザ操作１５０６によって、画面１５０１の電話番号の文字列が修正された後の画面を示している。 Screen 1501 is the same as screen 901 that is displayed as a result of the processing of S805. Screen 1502 shows the screen after the telephone number character string on screen 1501 has been corrected by user operation 1506 in S806.

画面１５０３は、ユーザの入力した文字列から正しい文字領域が決定できなかった場合にＳ８１０において表示される、スキャンされた原稿の画像データのプレビュー画像を示している。画面１５０４は、Ｓ８１１において、ユーザ操作１５０７によって、修正指示された項目に対応する正しい文字領域がタッチされる様子を示している。画面１５０５は、Ｓ８１１の後に表示される画面であり、ユーザ操作１５０８によって、登録ボタン９０７が押下される様子を示している。 Screen 1503 shows a preview image of the image data of a scanned document, which is displayed in S810 when the correct character area cannot be determined from the character string entered by the user. Screen 1504 shows how the correct character area corresponding to the item instructed to be corrected is touched by user operation 1507 in S811. Screen 1505 is the screen displayed after S811, and shows how the register button 907 is pressed by user operation 1508.

図１５の矢印は、画面遷移を示している。画面１５０１は画面１５０２に遷移し、Ｓ８０７またはＳ８０９でＹＥＳと判定された場合、画面１５０２から画面１５０５に遷移する。Ｓ８０７およびＳ８０９でＮＯと判定された場合、画面１５０２から画面１５０３に遷移する。画面１５０３は画面１５０４に遷移し、画面１５０４は画面１５０５に遷移する。 The arrows in FIG. 15 indicate screen transitions. Screen 1501 transitions to screen 1502, and if a YES decision is made in S807 or S809, screen 1502 transitions to screen 1505. If a NO decision is made in S807 and S809, screen 1502 transitions to screen 1503. Screen 1503 transitions to screen 1504, and screen 1504 transitions to screen 1505.
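The transitions in FIG. 15 can be summarized as a small state function. This sketch uses the screen numbers from the description as plain integers. The function name `next_screen` and its boolean parameters are hypothetical.

```python
def next_screen(current, s807_yes=False, s809_yes=False):
    """Return the screen that follows `current` in FIG. 15.

    s807_yes / s809_yes carry the outcomes of the S807 and S809 decisions
    and only matter when leaving screen 1502.
    """
    if current == 1501:
        return 1502
    if current == 1502:
        # A region is already known (or not needed): skip the preview.
        return 1505 if (s807_yes or s809_yes) else 1503
    if current == 1503:
        return 1504
    if current == 1504:
        return 1505
    raise ValueError(f"no transition defined from screen {current}")
```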

図８のシーケンス図では、ユーザによってキーバリューの修正が行われる場合についてのフローを説明した。例えばＳ８０６でユーザによってキーバリューが修正されない場合は、キーバリューは正しく抽出できたことになる。この場合であっても、キーバリュー抽出結果を学習のためのデータとして用いて抽出ルール４０９に学習させてもよい。 The sequence diagram in FIG. 8 describes the flow for the case where the user corrects a key-value. If, for example, the user does not correct any key-value in S806, this means that the key-values were extracted correctly. Even in this case, the key-value extraction result may be used as data for learning to train the extraction rule 409.

以上説明したように本実施形態によれば、ユーザが入力した項目に対応する修正後の文字列に基づいて、その項目に対応する正しい文字領域を決定することができる。このため、ユーザが項目に対応する文字領域を入力するような煩雑な操作を行う必要がないため、ユーザビリティを向上させることができる。 As described above, according to this embodiment, the correct character area corresponding to an item input by the user can be determined based on the corrected character string corresponding to that item. This eliminates the need for the user to perform the cumbersome operation of inputting a character area corresponding to the item, thereby improving usability.

また、所定の項目に対応する正しい文字領域をユーザが画面をタッチして指定する場合であっても、ユーザがタッチした文書画像上のタッチ位置に加えてユーザによる修正後の文字列にも基づいて項目に対応する正しい文字領域を決定する。このため、タッチ位置が正確でない場合であっても項目に対応する正しい文字領域を決定することが可能となる。 Even when a user touches the screen to specify the correct character area corresponding to a specific item, the correct character area corresponding to the item is determined based on the character string corrected by the user in addition to the touch position on the document image touched by the user. This makes it possible to determine the correct character area corresponding to the item even if the touch position is not accurate.

なお、ユーザが修正指示した項目に対応する正しい文字領域の決定を、Ｓ８０８ではＭＦＰ３００の領域決定部５０６が行い、Ｓ８１３ではキーバリュー抽出サーバ２００の領域決定部４０４が行うものとして説明した。他にも、どちら一方の決定領域部だけがＳ８０８とＳ８１３との両方のステップにおいて項目に対応する正しい文字領域を決定する処理を行ってもよい。例えば、ＭＦＰ３００の領域決定部５０６が、Ｓ８０８だけでなくＳ８１３において項目に対応する文字領域を決定する形態でもよい。または、キーバリュー抽出サーバ２００の領域決定部４０４が、Ｓ８１３だけでなくＳ８０８において項目に対応する文字領域を決定する形態でもよい。また、キーバリュー抽出サーバ２００の領域決定部４０４は、ＭＦＰ３００から項目に対応する正しい文字領域の位置情報を取得できない場合に項目に対応する正しい文字領域を決定する処理を行ってもよい。 The description has been given assuming that the area determination unit 506 of the MFP 300 determines the correct character area corresponding to the item instructed to be corrected by the user in S808, and the area determination unit 404 of the key-value extraction server 200 determines the correct character area corresponding to the item in S813. Alternatively, only one of the area determination units may perform the process of determining the correct character area corresponding to the item in both steps S808 and S813. For example, the area determination unit 506 of the MFP 300 may determine the character area corresponding to the item not only in S808 but also in S813. Alternatively, the area determination unit 404 of the key-value extraction server 200 may determine the character area corresponding to the item not only in S813 but also in S808. Furthermore, the area determination unit 404 of the key-value extraction server 200 may perform the process of determining the correct character area corresponding to the item when position information of the correct character area corresponding to the item cannot be obtained from the MFP 300.

本実施形態では、文字認識処理を行う画像データはスキャン画像であるものとして説明した。他にも例えば、文字認識処理を行う画像データは、電子データとして生成された文書の画像であってもよい。 In this embodiment, the image data on which character recognition processing is performed has been described as being a scanned image. In addition, for example, the image data on which character recognition processing is performed may be an image of a document generated as electronic data.

＜実施形態２＞ <Embodiment 2>
実施形態１では、ＭＦＰ３００がスキャンすることによって得られた画像データの全ての文字領域に対して文字認識処理を実行し、その結果に基づきキーバリューを抽出する形態を説明した。しかし、文字認識処理による処理負荷を軽減させる等の理由により、画像データの一部の文字領域に対してのみ文字認識処理が行われ、その文字認識処理の結果に基づきキーバリューを抽出する場合がある。このため本実施形態では、このような場合において、ユーザの修正した文字列に基づき修正対象の項目に対応する正しい文字領域を決定する方法を説明する。本実施形態は、実施形態１からの差分を中心に説明する。特に明記しない部分については実施形態１と同じ構成および処理である。 Embodiment 1 described a form in which character recognition processing is executed on all character areas of the image data obtained by scanning with the MFP 300, and key-values are extracted based on the results. However, for reasons such as reducing the processing load of character recognition, character recognition processing may be performed on only some of the character areas of the image data, with key-values extracted based on those results. This embodiment therefore describes a method for determining, in such a case, the correct character area corresponding to the item to be corrected based on the character string corrected by the user. The description focuses on the differences from Embodiment 1. Unless otherwise specified, the configuration and processing are the same as in Embodiment 1.

［キーバリュー抽出サーバの機能構成］ [Functional configuration of the key-value extraction server]
図１６は、本実施形態のキーバリュー抽出サーバ２００の機能構成を示した図である。実施形態１と同一の構成については、同じ符号を付して説明を省略する。本実施形態のキーバリュー抽出サーバ２００は、文字領域検出部１６０１をさらに有する。 FIG. 16 is a diagram showing the functional configuration of the key-value extraction server 200 of this embodiment. Components identical to those in Embodiment 1 are denoted by the same reference numerals, and their description is omitted. The key-value extraction server 200 of this embodiment further includes a character area detection unit 1601.

文字領域検出部１６０１は、画像データが示すスキャン画像内に存在する文字領域を検出する処理を行う。文字領域の検出方法は、例えば、ある閾値で２値化を行った画像から文字と推測される矩形領域を抽出する方法等、既知の方法を適用すればよい。 The character area detection unit 1601 performs processing to detect character areas present in the scanned image represented by the image data. The character area detection method may be a known method such as extracting a rectangular area assumed to be a character from an image binarized using a certain threshold value.
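The binarize-then-extract-rectangles approach mentioned above can be sketched as follows. This is only an illustrative sketch, not the method used by the character area detection unit 1601: the function name `detect_character_areas`, the threshold of 128, the grayscale input as a list of rows, and the use of 4-connected components are all assumptions made for the example.

```python
from collections import deque


def detect_character_areas(gray, threshold=128):
    """Return bounding boxes (x, y, width, height) of dark connected components.

    `gray` is a list of rows of grayscale values (0 = black, 255 = white).
    The threshold and the 4-connectivity rule are assumptions for this sketch.
    """
    height, width = len(gray), len(gray[0])
    # Binarize: True marks a dark ("ink") pixel.
    ink = [[pixel < threshold for pixel in row] for row in gray]
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not ink[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over 4-connected ink pixels.
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and ink[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes
```

A practical detector would additionally merge neighboring components into word-level or line-level rectangles. That step is omitted here.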

また、本実施形態のキーバリュー抽出部１６０２は、文字領域検出部１６０１の処理した結果得られた文字領域の位置情報を学習モデル（不図示）に入力することで出力された結果から、スキャン画像の帳票種別を推定する。学習モデルは、画像内の文字領域の位置およびサイズの特徴に基づき、その画像の帳票種別が出力されるように機械学習された学習モデルである。他にも例えば、キーバリュー抽出部４０３は、データベースに記憶されている帳票の種別ごとの文字領域のレイアウトと、文字領域検出部１６０１が処理した結果得られた文字領域のレイアウトと、の一致度を算出することにより、帳票の種別を推定してもよい。 In addition, the key-value extraction unit 1602 of this embodiment estimates the form type of the scanned image from the results output by inputting the position information of the character area obtained as a result of processing by the character area detection unit 1601 into a learning model (not shown). The learning model is a learning model that has been machine-trained to output the form type of the image based on the position and size characteristics of the character area in the image. As another example, the key-value extraction unit 403 may estimate the type of form by calculating the degree of agreement between the layout of the character area for each type of form stored in the database and the layout of the character area obtained as a result of processing by the character area detection unit 1601.
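The database-based alternative, in which the form type is estimated from the degree of agreement between stored and detected layouts, might look like the sketch below. The text does not say how the degree of agreement is computed. Averaging the best intersection-over-union (IoU) per stored area is an assumption chosen for illustration, and the names `iou`, `layout_agreement`, and `estimate_form_type` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_h = min(ay + ah, by + bh) - max(ay, by)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    inter = overlap_w * overlap_h
    return inter / (aw * ah + bw * bh - inter)


def layout_agreement(stored, detected):
    """Assumed agreement score: mean of the best IoU found for each stored area."""
    if not stored or not detected:
        return 0.0
    return sum(max(iou(s, d) for d in detected) for s in stored) / len(stored)


def estimate_form_type(layouts_by_type, detected):
    """Return the form type whose stored layout agrees best with the detected one."""
    return max(
        layouts_by_type,
        key=lambda form_type: layout_agreement(layouts_by_type[form_type], detected),
    )
```

With such a score, a scanned form that is shifted by a few pixels still matches its stored layout, while a form with a different arrangement of character areas scores close to zero.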

そして、本実施形態のキーバリュー抽出部１６０２は、帳票の種別を用いて、キー（項目）に対応する文字列であるキーバリューが含まれる文字領域を抽出する。 Then, the key-value extraction unit 1602 of this embodiment uses the type of form to extract a character area that contains a key-value, which is a character string corresponding to a key (item).

本実施形態の抽出ルール１６０３は、帳票の種類と画像内の文字領域の特徴とに基づき、夫々の項目の文字列が含まれる文字領域が特定可能な情報が出力されるように学習されている学習モデルであるものとして説明を行う。キーバリュー抽出部１６０２は、帳票の種別と文字領域検出部１６０１の処理した結果得られたスキャン画像の文字領域の位置等の情報とを、抽出ルール１６０３に入力して出力された結果に基づき、スキャン画像における夫々の項目の文字領域を抽出する。 The extraction rules 1603 of this embodiment will be described as a learning model that is trained to output information that can identify the character areas containing the character strings of each item, based on the type of form and the characteristics of the character areas in the image. The key-value extraction unit 1602 inputs the type of form and information such as the position of the character area of the scanned image obtained as a result of processing by the character area detection unit 1601 into the extraction rules 1603, and extracts the character areas of each item in the scanned image based on the output result.

または、抽出ルール１６０３は、帳票の種類ごとに、夫々の項目の文字領域の位置を示す座標およびサイズ等の位置情報が記憶されているデータベースであってもよい。その場合は例えば、キーバリュー抽出部１６０２は、抽出ルール１６０３に保存されている、ある項目の文字領域の位置と同じまたは近傍にあるスキャン画像内における文字領域を、その項目に対応する文字領域として抽出してもよい。 Alternatively, extraction rules 1603 may be a database in which position information such as coordinates indicating the position and size of the character area of each item is stored for each type of form. In that case, for example, key-value extraction unit 1602 may extract a character area in the scanned image that is located at the same position as or near the position of the character area of a certain item stored in extraction rules 1603 as the character area corresponding to that item.
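When the extraction rules are held as a database of positions, the lookup described above can be sketched as a nearest-area search. The text only says "the same position as or near" the stored position. Measuring the distance between box centers and the 50-pixel limit are assumptions for this sketch, as are the names `center` and `extract_item_areas`.

```python
import math


def center(box):
    """Center point of a box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def extract_item_areas(rule_for_form, detected, max_distance=50.0):
    """Map each item name to the detected area nearest to its stored position.

    `rule_for_form` maps an item name to the stored box for one form type.
    An item maps to None when no detected area lies within `max_distance`
    (an assumed limit, not a value taken from the description).
    """
    result = {}
    for item, stored_box in rule_for_form.items():
        sx, sy = center(stored_box)
        best, best_distance = None, max_distance
        for box in detected:
            cx, cy = center(box)
            distance = math.hypot(cx - sx, cy - sy)
            if distance <= best_distance:
                best, best_distance = box, distance
        result[item] = best
    return result
```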

または、抽出ルール１６０３は、画像内の文字領域の特徴のみに基づき、夫々の項目の文字領域を特定する情報が出力されるように学習されている学習モデルであってもよい。その場合、帳票の種別の推定は行われなくてもよい。つまり、キーバリュー抽出部１６０２は、文字領域検出部１６０１の処理した結果得られたスキャン画像の文字領域の位置等の情報を、抽出ルール１６０３に入力して出力された結果に基づき、スキャン画像における夫々の項目の文字領域を抽出してもよい。 Alternatively, extraction rule 1603 may be a learning model that is trained to output information identifying the character area of each item based only on the characteristics of the character area in the image. In this case, estimation of the type of form does not need to be performed. In other words, key-value extraction unit 1602 may input information such as the position of the character area of the scanned image obtained as a result of processing by character area detection unit 1601 into extraction rule 1603, and extract the character area of each item in the scanned image based on the output result.

そしてキーバリュー抽出部１６０２が抽出した夫々の項目の文字領域に対して、ＯＣＲ部４０２が文字認識処理を行う。そして、その結果得られた文字列を夫々の項目に対応する文字列として出力する。 The OCR unit 402 then performs character recognition processing on the character regions of each item extracted by the key-value extraction unit 1602. The resulting character strings are then output as character strings corresponding to each item.

［文字抽出処理および修正処理について］ [Character extraction and correction processing]
図１７は、本実施形態の入力支援システムにおける処理を示すシーケンス図である。Ｓ１７０１～１７０２は、Ｓ８０１～Ｓ８０２と同一であるため説明は省略する。 FIG. 17 is a sequence diagram showing the processing in the input support system of this embodiment. S1701 to S1702 are the same as S801 to S802, so their description is omitted.

Ｓ１７０３において画像データ取得部４０１は、ＭＦＰ３００から送信されたスキャン画像の画像データを取得し、文字領域検出部１６０１は、取得された画像データの文字領域を検出する。 In S1703, the image data acquisition unit 401 acquires the image data of the scanned image sent from the MFP 300, and the character area detection unit 1601 detects the character area of the acquired image data.

Ｓ１７０４においてキーバリュー抽出部１６０２は、検出された文字領域の情報に基づき、画像データの帳票種別を推定する。そしてＳ１７０５においてキーバリュー抽出部１６０２は、抽出ルール１６０３を用いて、予め定義された夫々の項目に対応する文字領域を抽出する。 In S1704, the key-value extraction unit 1602 estimates the form type of the image data based on the information on the detected character areas. Then, in S1705, the key-value extraction unit 1602 uses the extraction rules 1603 to extract the character area corresponding to each predefined item.

Ｓ１７０６においてＯＣＲ部４０２は、Ｓ１７０５において検出された、夫々の項目に対応する文字領域に対してのみ文字認識処理を実行して、夫々の項目の文字列であるキーバリューを抽出する。本実施形態では、キーバリューを抽出するための文字認識処理は一部の文字領域に対して行われる。このためキーバリュー抽出時の処理負荷を抑制することができる。 In S1706, the OCR unit 402 performs character recognition processing only on the character areas corresponding to each item detected in S1705, and extracts key values, which are character strings for each item. In this embodiment, the character recognition processing for extracting key values is performed on some of the character areas. This makes it possible to reduce the processing load when extracting key values.

そして、キーバリュー抽出部１６０２は、夫々の項目に対応する文字領域および文字列を含むレコード（行）には、その項目の名称を示すラベルを付与して、図１８に示すキーバリュー抽出結果１８００を生成する。キーバリュー抽出結果送信部４０５は、生成されたキーバリュー抽出結果１８００をＭＦＰ３００に出力する。 Then, the key-value extraction unit 1602 assigns a label indicating the name of each item to records (rows) that contain character regions and character strings corresponding to each item, and generates the key-value extraction result 1800 shown in FIG. 18. The key-value extraction result transmission unit 405 outputs the generated key-value extraction result 1800 to the MFP 300.

図１８のキーバリュー抽出結果１８００は図６の帳票画像に対してＳ１７０２～１７０６の処理が行われた結果得られるキーバリュー抽出結果１８００である。列１８０１～１８０３に保持される内容は、列７０１～７０３と同様である。行１８０４～１８０８における文字認識結果の文字列は、項目に対応する文字列として抽出された文字列（キーバリュー）である。このため行１８０４～１８０８における列１８０３には抽出対象の項目であることを示すラベルが付与される。 The key-value extraction result 1800 in FIG. 18 is the result of performing the processes of S1702 to S1706 on the form image in FIG. 6. The contents stored in columns 1801 to 1803 are the same as those in columns 701 to 703. The character strings of the character recognition results in rows 1804 to 1808 are character strings (key values) extracted as character strings corresponding to items. For this reason, a label is added to column 1803 in rows 1804 to 1808 to indicate that it is an item to be extracted.

また、図１８に示すように、本ステップでは、項目に対応する文字領域以外の文字領域には文字認識処理されていないため、行１８０４～１８０８以外の行における列１８０２の文字認識結果には文字列が無いことを示す「―」が保持されている。 Also, as shown in FIG. 18, in this step, character recognition processing is not performed on character areas other than those corresponding to items, so the character recognition results for column 1802 in rows other than rows 1804 to 1808 contain "-", indicating that there is no character string.

Ｓ１７０７～Ｓ１７０９は、Ｓ８０５～Ｓ８０７と同様であるため説明を省略する。 Steps S1707 to S1709 are similar to steps S805 to S807, so the explanation will be omitted.

Ｓ１７０９において、ユーザによる修正後の文字列と、当初のキーバリュー抽出結果における修正対象の項目の文字列と、との差が所定の範囲内である場合（Ｓ１７０９がＹＥＳ）、Ｓ１７１３に進む。つまり、この場合、項目に対応する文字領域はＳ１７０５で正しく抽出されていたことになる。このため、後述する、ユーザが修正指示した修正対象の項目に対応する正しい文字領域の決定は行わずにＳ１７１３の学習のステップに進む。 In S1709, if the difference between the character string corrected by the user and the character string of the item to be corrected in the initial key-value extraction result is within a predetermined range (YES in S1709), proceed to S1713. In other words, in this case, the character area corresponding to the item was correctly extracted in S1705. Therefore, proceed to the learning step of S1713 without determining the correct character area corresponding to the item to be corrected instructed by the user, as described below.

Ｓ１７０９において、ユーザによる修正後の文字列と、当初のキーバリュー抽出結果における修正対象の項目の文字列と、の差が所定の範囲内ではない場合（Ｓ１７０９がＮＯ）、Ｓ１７１０に進む。つまりこの場合は修正対象の項目に対応する文字領域がＳ１７０５で誤って抽出されていると考えられる。このため、修正対象の項目に対応する正しい文字領域を決定する処理が行われる。修正結果送信部５０５は、Ｓ１７０８でユーザが入力した修正後の文字列を、キーバリュー抽出サーバ２００に送信する。 In S1709, if the difference between the character string corrected by the user and the character string of the item to be corrected in the original key-value extraction result is not within a predetermined range (NO in S1709), proceed to S1710. In other words, in this case, it is considered that the character area corresponding to the item to be corrected was erroneously extracted in S1705. Therefore, a process is performed to determine the correct character area corresponding to the item to be corrected. The correction result transmission unit 505 transmits the corrected character string entered by the user in S1708 to the key-value extraction server 200.

Ｓ１７１０において修正結果取得部４０６は、ＭＦＰ３００から送信された修正結果の文字列を取得する。そして、ＯＣＲ部４０２は、Ｓ１７０３において検出されたスキャン画像の全ての文字領域に対して文字認識処理を実行する。 In S1710, the correction result acquisition unit 406 acquires the character string of the correction result sent from the MFP 300. Then, the OCR unit 402 performs character recognition processing on all character areas of the scanned image detected in S1703.

本実施形態では、Ｓ１７０９でユーザによる修正後の文字列とキーバリュー抽出結果における修正対象の項目の文字列との差が所定の範囲内ではないと判定された場合のみ、全ての文字領域に文字認識処理がされる。その結果、図１８のキーバリュー抽出結果１８００における列１８０２の文字認識結果に「―」が保持されていた箇所に、本ステップで文字認識の結果得られた文字列が保持されるように、キーバリュー抽出結果１８００が更新される。 In this embodiment, character recognition processing is performed on all character areas only if it is determined in S1709 that the difference between the character string corrected by the user and the character string of the item to be corrected in the key-value extraction result is not within a predetermined range. As a result, the key-value extraction result 1800 is updated so that the character string obtained as a result of character recognition in this step is held in the location where "-" was held in the character recognition result of column 1802 in the key-value extraction result 1800 in FIG. 18.
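The update of the key-value extraction result 1800 described above can be sketched as filling in only the rows that still hold the "―" placeholder. The row layout (a dict with `area`, `text`, and `label` keys) and the `recognize` callback standing in for the OCR unit 402 are hypothetical names introduced for this example.

```python
PLACEHOLDER = "―"  # marks a character area that has not been recognized yet


def fill_unrecognized_rows(rows, recognize):
    """Run recognition only on rows whose text is still the placeholder.

    `rows` is a list of dicts with "area", "text", and "label" keys.
    `recognize` is any callable that maps an area to its recognized string.
    Rows that already hold a recognized string are left unchanged.
    """
    for row in rows:
        if row["text"] == PLACEHOLDER:
            row["text"] = recognize(row["area"])
    return rows
```

Rows recognized during the initial key-value extraction are left as they are, so only the previously skipped character areas incur additional recognition cost.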

Ｓ１７１１においてキーバリュー抽出サーバ２００の領域決定部４０４は、Ｓ１７１０で認識された全ての文字認識結果の文字列に対して、ユーザによる修正後の文字列との編集距離を算出する。そして領域決定部４０４は、その編集距離に基づき、修正対象の項目の文字領域を決定する。例えば、編集距離が１以下の文字認識結果が一つだけだった場合、正しい文字領域が決定できたと判断される。 In S1711, the area determination unit 404 of the key-value extraction server 200 calculates the edit distance between the character string corrected by the user and each of the character strings recognized in S1710. The area determination unit 404 then determines the character area of the item to be corrected based on those edit distances. For example, if exactly one character recognition result has an edit distance of 1 or less, the correct character area is judged to have been determined.

例えば、図１０の画像がＳ１７０２においてＭＦＰ３００のスキャン実行部５０２によってスキャンすることによって得られたスキャン画像であるとする。またユーザが、修正対象の項目である電話番号の項目の文字列を「０１２３－４５－６７８９」に修正したものとする。この場合、図１０のスキャン画像から検出された全ての文字領域に対して文字認識処理された結果得られた夫々の文字列と、修正後の文字列「０１２３－４５－６７８９」と、の編集距離が算出される。 For example, suppose that the image in FIG. 10 is the scanned image obtained by scanning with the scan execution unit 502 of the MFP 300 in S1702. Also suppose that the user has corrected the character string in the telephone number field, which is the field to be corrected, to "0123-45-6789." In this case, the edit distance between each character string obtained as a result of character recognition processing of all character areas detected from the scanned image in FIG. 10 and the corrected character string "0123-45-6789" is calculated.

図１１は、全ての文字領域に対して文字認識処理が行われた結果得られた文字列が列１１０８の文字認識結果に追加され、また列１１０１には前述した編集距離が追加されたキーバリュー抽出結果である。図１１では、編集距離が１以下の文字列は、行１１０７における列１１０８の文字認識結果に保持されている「０１２３－４５－６７８９」のみである。このため図１１では、行１１０７における文字領域が正しい文字領域として決定される。 Figure 11 shows the key-value extraction result in which the character string obtained by performing character recognition processing on all character areas is added to the character recognition result in column 1108, and the edit distance described above is added to column 1101. In Figure 11, the only character string with an edit distance of 1 or less is "0123-45-6789" stored in the character recognition result in column 1108 in row 1107. Therefore, in Figure 11, the character area in row 1107 is determined to be the correct character area.
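The decision in S1711 can be sketched with a standard Levenshtein edit distance. The sketch returns an area only when exactly one recognized string is within an edit distance of 1 of the corrected string, following the example above. The row layout and the function names are hypothetical, and the sample strings in the usage note are made up rather than taken from FIG. 11.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def determine_area_by_text(rows, corrected, max_distance=1):
    """Return the area of the single row within `max_distance`, else None.

    `rows` is a list of dicts with "area" and "text" keys. None is returned
    when no row, or more than one row, is close enough to `corrected`.
    """
    candidates = [
        row for row in rows
        if edit_distance(row["text"], corrected) <= max_distance
    ]
    return candidates[0]["area"] if len(candidates) == 1 else None
```

For instance, with recognized strings "2020/05/01" and "0123-45-6789" and the corrected string "0123-45-6789", only the second string is within distance 1, so its area is returned. If two areas both contained near-identical telephone numbers, the function would return None and the flow would fall through to the NO branch of S1712.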

Ｓ１７１２において領域決定部４０４は、Ｓ１７１１で修正対象の項目に対応する正しい文字領域が決定できたかどうかを判定する。Ｓ１７１１で正しい文字領域が決定された場合（Ｓ１７１２がＹＥＳ）は、Ｓ１７１３に進む。 In S1712, the area determination unit 404 determines whether the correct character area corresponding to the item to be corrected was successfully determined in S1711. If the correct character area was determined in S1711 (YES in S1712), the process proceeds to S1713.

Ｓ１７１３において学習部４０７は、Ｓ１７１１で決定された文字領域の特徴を抽出し学習することによって学習モデルが更新される。または、抽出ルール１６０３がデータベースの場合は領域決定部４０４によって決定された文字領域の位置情報が、修正対象の項目に対応する文字領域として関連付けられるように抽出ルール１６０３を修正して更新する。 In S1713, the learning unit 407 updates the learning model by extracting and learning the characteristics of the character area determined in S1711. Alternatively, if the extraction rule 1603 is a database, the extraction rule 1603 is modified and updated so that the position information of the character area determined by the area determination unit 404 is associated with the character area corresponding to the item to be modified.

Ｓ１７１１で正しい文字領域が決定されなかった場合（Ｓ１７１２がＮＯ）、学習のためのステップであるＳ１７１３へは進まないでＳ１７１４に進む。このため、項目に対応する誤った文字領域が含まれているデータを学習のためのデータとして使われないようにすることができる。Ｓ１７１４のステップの詳細はＳ８１６と同様であるため説明を省略する。 If the correct character area is not determined in S1711 (NO in S1712), the process does not proceed to S1713, which is a learning step, but proceeds to S1714. This prevents data that contains an incorrect character area corresponding to an item from being used as learning data. The details of step S1714 are the same as those of S816, so a description thereof will be omitted.

図１９は、図１７のフローの各ステップにおいて、ＭＦＰ３００の操作パネル３０７に表示される画面の遷移を示している。画面１９０１は、Ｓ１７０７の処理の結果表示される画面である。画面１９０２は、Ｓ１７０８においてユーザ操作１９０３によって、画面１９０１の電話番号の文字列が修正された後の画面を示している。画面１９０４は、Ｓ１７０８の後に表示される画面であり、ユーザ操作１９０５によって、登録ボタンが押下される様子を示している。図１９の矢印は、画面遷移を示している。画面１９０１は画面１９０２に遷移し、画面１９０２は画面１９０４に遷移する。 Figure 19 shows the transition of screens displayed on the operation panel 307 of the MFP 300 at each step of the flow in Figure 17. Screen 1901 is the screen displayed as a result of the processing of S1707. Screen 1902 shows the screen after the telephone number character string on screen 1901 has been corrected by user operation 1903 in S1708. Screen 1904 is the screen displayed after S1708, and shows the state in which the register button is pressed by user operation 1905. The arrows in Figure 19 show the screen transitions. Screen 1901 transitions to screen 1902, and screen 1902 transitions to screen 1904.

以上説明したように本実施形態によれば、一部の文字領域に対して文字認識処理をすることによりキーバリューを抽出する場合においても、ユーザが入力した修正後の文字列に基づいて、修正対象の項目に対応する正しい文字領域を決定することができる。このため、ユーザが項目に対応する文字領域を入力するような煩雑な操作を行う必要がないため、ユーザビリティを向上させることができる。 As described above, according to this embodiment, even when extracting key values by performing character recognition processing on some character areas, it is possible to determine the correct character area corresponding to the item to be corrected based on the corrected character string entered by the user. This eliminates the need for the user to perform the cumbersome operation of inputting the character area corresponding to the item, thereby improving usability.

なお、キーバリュー抽出サーバ２００が、文字領域を検出して文字認識処理をするものとして説明したが、ＭＦＰ３００が文字領域検出部およびＯＣＲ部と同様の機能を有していてもよい。その場合、上述のシーケンス図で説明した文字領域の検出および文字認識処理は、ＭＦＰ３００で行われてもよい。 In the above description, the key-value extraction server 200 detects character areas and performs character recognition processing, but the MFP 300 may have functions similar to those of the character area detection unit and OCR unit. In that case, the character area detection and character recognition processing described in the sequence diagram above may be performed by the MFP 300.

＜実施形態３＞ <Embodiment 3>
実施形態３では実施形態２と同様に、スキャン画像の一部の文字領域にのみ実行された文字認識処理の結果に基づきキーバリューを抽出する場合における、修正対象の項目の正しい文字領域を決定する方法を説明する。ただし実施形態３では、ユーザが画面をタッチした位置も使用して、修正対象の項目に対応する正しい文字領域を決定する方法について説明する。本実施形態は、実施形態２からの差分を中心に説明する。特に明記しない部分については実施形態２と同じ構成および処理である。 As in Embodiment 2, Embodiment 3 describes a method for determining the correct character area of the item to be corrected when key-values are extracted based on the results of character recognition processing executed on only some of the character areas of a scanned image. In Embodiment 3, however, the position where the user touched the screen is also used to determine the correct character area corresponding to the item to be corrected. The description focuses on the differences from Embodiment 2. Unless otherwise specified, the configuration and processing are the same as in Embodiment 2.

図２０は、本実施形態の入力支援システムにおける処理を示すシーケンス図である。Ｓ２００１～２００９は、Ｓ１７０１～Ｓ１７０９と同一であるため説明は省略する。 Figure 20 is a sequence diagram showing the processing in the input support system of this embodiment. S2001 to S2009 are the same as S1701 to S1709, so the explanation is omitted.

本実施形態では、ユーザによる修正後の文字列とキーバリュー抽出結果における修正対象の項目の文字列との差が所定の範囲内ではない場合（Ｓ２００９がＮＯ）、Ｓ２０１０～Ｓ２０１５の処理が行われる。Ｓ２０１０～Ｓ２０１２までの処理は、Ｓ８１０～Ｓ８１２の処理と同様である。 In this embodiment, if the difference between the character string corrected by the user and the character string of the item to be corrected in the key-value extraction result is not within a predetermined range (NO in S2009), the processes of S2010 to S2015 are performed. The processes of S2010 to S2012 are the same as the processes of S810 to S812.

即ち、Ｓ２０１０においてＵＩ部５０１は、スキャンした原稿（帳票）の画像データに基づきプレビュー画像を操作パネル３０７に表示させる。そして、ＵＩ部５０１は、ユーザが修正指示した修正対象の項目に対応する正しい文字領域を、操作パネル３０７上のプレビュー画像にタッチして指示するように、ユーザに促す。 That is, in S2010, the UI unit 501 displays a preview image on the operation panel 307 based on the image data of the scanned document (form). The UI unit 501 then prompts the user to touch the preview image on the operation panel 307 to indicate the correct character area corresponding to the item that the user has instructed to be corrected.

Ｓ２０１１においてユーザは、修正後の文字列が記載されている、ユーザが修正指示した項目の正しい文字領域の位置を、ＭＦＰ３００の操作パネル３０７に表示されているプレビュー画面上をタッチして指示する。 In S2011, the user touches the preview screen displayed on the operation panel 307 of the MFP 300 to indicate the position of the correct character area for the item that the user has instructed to be corrected, in which the corrected character string is written.

Ｓ２０１２においてＭＦＰ３００のＵＩ部５０１は、ユーザのタッチを受け付け、操作パネル３０７のタッチ位置を検知する。修正結果送信部５０５は、ユーザがタッチした操作パネル３０７上のタッチ位置に基づき、画像の座標にマッピングしたタッチ位置座標に変換してタッチ位置座標を決定する。そして、ＵＩ部５０１は、操作パネル３０７に画面１５０５を表示させ、ユーザによる登録ボタンの押下を受け付ける。修正結果送信部５０５は、ユーザが入力した修正後の文字列と、タッチ位置座標とを、キーバリュー抽出サーバ２００に送信する。 In S2012, the UI unit 501 of the MFP 300 accepts a touch by the user and detects the touch position on the operation panel 307. The correction result transmission unit 505 converts the touch position on the operation panel 307 touched by the user into touch position coordinates mapped to the image coordinates and determines the touch position coordinates. The UI unit 501 then displays screen 1505 on the operation panel 307 and accepts the user pressing the register button. The correction result transmission unit 505 transmits the corrected character string entered by the user and the touch position coordinates to the key-value extraction server 200.
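The conversion from a touch position on the operation panel to coordinates in the scanned image can be sketched as below. The description does not give the mapping. This sketch assumes the preview is drawn at a uniform scale with its top-left corner at a known panel position, and the name `panel_to_image` and its parameters are hypothetical.

```python
def panel_to_image(touch, preview_origin, scale):
    """Map a panel touch position to pixel coordinates in the scanned image.

    touch:          (x, y) of the touch on the operation panel
    preview_origin: (x, y) of the preview's top-left corner on the panel
    scale:          preview size divided by image size (e.g. 0.25)
    """
    touch_x, touch_y = touch
    origin_x, origin_y = preview_origin
    return (
        round((touch_x - origin_x) / scale),
        round((touch_y - origin_y) / scale),
    )
```

If the preview can be zoomed or scrolled, the current scale and origin at the moment of the touch would have to be used instead of fixed values.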

Ｓ２０１３においてキーバリュー抽出サーバ２００のＯＣＲ部４０２は、Ｓ２００３において検出された全ての文字領域に対して文字認識処理を実行する。 In S2013, the OCR unit 402 of the key-value extraction server 200 performs character recognition processing on all character areas detected in S2003.

Ｓ２０１４の処理はＳ８１３と同様である。つまり、Ｓ２０１４においてキーバリュー抽出サーバ２００の修正結果取得部４０６は、ＭＦＰ３００から送信されたタッチ位置座標と修正後の文字列と修正対象の項目とを取得する。キーバリュー抽出サーバ２００の領域決定部４０４は、タッチ位置座標と、キーバリュー抽出結果のすべて行の文字領域との距離を導出する。領域決定部４０４は、その距離が所定の値以下の文字領域のうち、修正後の文字列との編集距離が最小の文字列が認識された文字領域を、修正対象の項目に対応する正しい文字領域として決定する。所定の値は例えば１００である。 The processing of S2014 is the same as that of S813. That is, in S2014, the correction result acquisition unit 406 of the key-value extraction server 200 acquires the touch position coordinates, the corrected character string, and the item to be corrected, all of which were sent from the MFP 300. The area determination unit 404 of the key-value extraction server 200 derives the distance between the touch position coordinates and the character area in every row of the key-value extraction result. Among the character areas whose distance is equal to or less than a predetermined value, the area determination unit 404 determines the character area in which the character string with the smallest edit distance from the corrected character string was recognized to be the correct character area corresponding to the item to be corrected. The predetermined value is, for example, 100.

例えば、図１２に示すタッチ位置１２１０は、ユーザがＳ２０１１においてタッチした位置であり、Ｓ２０１２においてそのタッチ位置座標は、（ｘ，ｙ）＝（１１１０，４９３）と決定されたものとする。 For example, touch position 1210 shown in FIG. 12 is the position touched by the user in S2011, and the touch position coordinates are determined to be (x, y) = (1110, 493) in S2012.

図１３は、図１２の画像データに対して、文字領域検出と文字認識処理とキーバリュー抽出とを行った結果である。列１３１２の「電話番号」のラベルは、正しい文字領域に付与されていないものとする。図１３の列１３１１には、ユーザの入力による修正後の文字列「０１２３－４５－６７８９」と図１３の文字認識結果の文字列との編集距離が保持されている。 Figure 13 shows the results of performing character area detection, character recognition processing, and key-value extraction on the image data in Figure 12. It is assumed that the label "phone number" in column 1312 is not assigned to the correct character area. Column 1311 in Figure 13 holds the edit distance between the character string "0123-45-6789" corrected by the user's input and the character string resulting from character recognition in Figure 13.

図１３の列１３０１は、文字領域の位置とユーザがタッチした画像内のタッチ位置との距離を保持する列であり、タッチ位置座標が（１１１０，４９３）の場合の文字領域との距離を示している。行１３０７～１３１０は、タッチ位置座標との距離が所定の値である１００以下の文字領域を有するレコードである。行１３０７～１３１０のうち、領域決定部４０４は、列１３１１の編集距離が最小である行１３０８の文字領域が、ユーザが修正指示した項目である「電話番号」の項目に対応する正しい文字領域と決定する。 Column 1301 in FIG. 13 is a column that holds the distance between the position of a character area and the touch position in the image touched by the user, and shows the distance from the character area when the touch position coordinates are (1110, 493). Rows 1307 to 1310 are records that have character areas whose distance from the touch position coordinates is a predetermined value of 100 or less. Of rows 1307 to 1310, the area determination unit 404 determines that the character area in row 1308, which has the smallest edit distance in column 1311, is the correct character area that corresponds to the "phone number" field, which is the field that the user has instructed to correct.
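The selection in S2014 combines the two signals: keep only the character areas within the predetermined distance (100 in the example) of the touch position, then pick the one whose recognized string has the smallest edit distance from the corrected string. The sketch below assumes the distance is measured from the touch point to the nearest point of the area's rectangle, which this passage does not specify. The row layout and function names are hypothetical.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def distance_to_area(point, area):
    """Distance from a point to the nearest point of a box (0 if inside).

    The choice of "nearest point of the rectangle" is an assumption.
    """
    px, py = point
    x, y, w, h = area
    dx = max(x - px, 0, px - (x + w))
    dy = max(y - py, 0, py - (y + h))
    return math.hypot(dx, dy)


def determine_area_by_touch(rows, touch, corrected, max_touch_distance=100):
    """Among areas near the touch, return the one whose text best matches."""
    nearby = [
        row for row in rows
        if distance_to_area(touch, row["area"]) <= max_touch_distance
    ]
    if not nearby:
        return None
    best = min(nearby, key=lambda row: edit_distance(row["text"], corrected))
    return best["area"]
```

Because the distance filter is applied first, an area far from the touch is never selected even if its text matches the corrected string exactly, and an imprecise touch is tolerated as long as the intended area falls within the distance limit.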

Ｓ２０１５～２０１７は、Ｓ１７１２～Ｓ１７１４と同様の処理であるため説明は省略する。 Steps S2015 to S2017 are the same as steps S1712 to S1714, so the explanation will be omitted.

なお、本実施形態に実施形態２の方法を組み合わせて適用してもよい。例えば、実施形態２の図１７のシーケンス図において、Ｓ１７１１で正しい文字領域を決定できなかった場合、Ｓ２０１０～Ｓ２０１２が行われて、Ｓ２０１３の文字認識処理はスキップして、Ｓ２０１４～Ｓ２０１７の処理が行われてもよい。 Note that this embodiment may be combined with the method of embodiment 2. For example, in the sequence diagram of FIG. 17 for embodiment 2, if the correct character area cannot be determined in S1711, steps S2010 to S2012 may be performed, the character recognition process of S2013 may be skipped, and the processes of S2014 to S2017 may be performed.

以上説明したように本実施形態によれば、一部の文字領域に対して文字認識処理をすることによりキーバリューを抽出する場合においても、ユーザが入力した修正後の文字列に基づいて、項目に対応する正しい文字領域を決定することができる。このため、ユーザが項目に対応する文字領域を入力するような煩雑な操作を行う必要がないため、ユーザビリティを向上させることができる。 As described above, according to this embodiment, even when extracting key values by performing character recognition processing on some character areas, the correct character area corresponding to an item can be determined based on the corrected character string entered by the user. This eliminates the need for the user to perform the cumbersome operation of entering a character area corresponding to an item, thereby improving usability.

＜その他の実施形態＞ <Other embodiments>
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program implementing one or more of the functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (e.g., an ASIC) that implements one or more of the functions.

３００ ＭＦＰ
５０１ ＵＩ部
５０４ キーバリュー抽出結果取得部
５０６ 領域決定部
５０５ 修正結果送信部

300 MFP
501 UI unit
504 Key-value extraction result acquisition unit
506 Area determination unit
505 Correction result transmission unit

Claims

an acquisition means for acquiring an extraction result including a processing result including information on a character string in an image and a character area in which the character string is recognized, obtained by subjecting image data of an image to character recognition processing, and information on a character string and a character area in the image corresponding to each of the predetermined items, the information being output by inputting the processing result into a learning model trained to output information on a character string and a character area corresponding to each of the predetermined items;
a display control means for displaying character strings corresponding to the respective predetermined items on a display unit based on the extraction result;
a receiving means for receiving, from a user, an instruction to correct a character string corresponding to any one of the predetermined items displayed on the display unit;
a correction means for determining a character area corresponding to an item for which a user has instructed correction based on the corrected character string and the processing result, and correcting the extraction result based on the determination;
an output means for outputting the extraction result corrected by the correction means as data for learning the learning model;
13. An information processing device comprising:

The correction means is
The information processing device according to claim 1, characterized in that, among the character areas in the image recognized by the character recognition process, a character area in which a character string that is the same as the corrected character string is recognized is determined to be the character area corresponding to the item instructed by the user to be corrected.

The correction means is
The information processing device according to claim 1 or 2, characterized in that, when there is one character string among the character strings recognized in the character recognition process, the edit distance from the corrected character string being equal to or less than a predetermined value, the character region in which the character string having an edit distance equal to or less than the predetermined value is recognized among the character regions in the image recognized in the character recognition process is determined to be the character region corresponding to the item instructed to be corrected by the user.

The information processing device according to claim 3, characterized in that the predetermined value is 1.
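
A hedged sketch of the edit-distance determination follows. The claims do not say which edit distance is meant, so the Levenshtein distance is assumed here. The function names are hypothetical, and "there is one character string" is read as "exactly one candidate", which is an interpretation rather than something the claims state.

```python
from typing import Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed metric): minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def find_unique_near_match(ocr_texts: Sequence[str], corrected: str,
                           max_distance: int = 1) -> Optional[int]:
    """Return the index of the only OCR string within max_distance of the
    corrected string, or None if there are zero or several candidates."""
    candidates = [i for i, text in enumerate(ocr_texts)
                  if edit_distance(text, corrected) <= max_distance]
    return candidates[0] if len(candidates) == 1 else None
```

With the predetermined value set to 1, an OCR string such as "INV-1O23" (letter O) would be matched to the corrected string "INV-1023", while two equally close candidates would leave the area undetermined.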

The information processing device according to claim 1, wherein, when a difference between the character string extracted based on the output result of the learning model for the item that the user has instructed to be corrected and the corrected character string is within a predetermined range, the correction means does not correct the character area in the extraction result corresponding to that item.

The information processing device according to claim 5, wherein the difference is within the predetermined range when the edit distance between the character string extracted based on the output result of the learning model and the corrected character string is 1 or less.
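
The check for "edit distance of 1 or less" can be done without computing a full distance table. The sketch below is illustrative only: the function names are hypothetical and the Levenshtein distance is assumed as the edit distance. One plausible reading, offered as an interpretation, is that such a small difference indicates a misread character in an otherwise correctly located area.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a can be turned into b with at most one insertion,
    deletion, or substitution of a single character."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substitution
    return a[i:] == b[i + 1:]          # at most one insertion into a


def keep_existing_area(model_text: str, corrected_text: str) -> bool:
    """Decide whether the character area chosen by the learning model is kept.

    Kept when the model's string and the user's corrected string differ by
    an edit distance of 1 or less.
    """
    return within_one_edit(model_text, corrected_text)
```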

The information processing device according to any one of claims 1 to 6, further comprising:
a second receiving means for receiving a touch position where the user touches the display unit as a position in the image corresponding to the item that the user has instructed to be corrected; and
a second correction means for determining the character area corresponding to the item that the user has instructed to be corrected based on the touch position, the corrected character string, and the processing result, and for correcting the extraction result based on the determination.

The information processing device according to claim 7, wherein the second correction means derives a distance between a position in the image based on the touch position and the position of each character area in the image recognized by the character recognition process, and determines, among the character areas whose distance is within a predetermined range, the character area in which the character recognition process recognized the character string having the smallest edit distance from the corrected character string to be the character area corresponding to the item that the user has instructed to be corrected.
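
The touch-based determination can be sketched as below. Several details are assumptions not found in the claims: the position of a character area is taken to be the center of its bounding box, the distance is Euclidean, the edit distance is the Levenshtein distance, and all names (`CharArea`, `find_area_near_touch`, `max_pixels`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class CharArea:
    text: str
    x: float
    y: float
    width: float
    height: float

    def center(self) -> Tuple[float, float]:
        # Assumption: an area's position is the center of its bounding box.
        return (self.x + self.width / 2, self.y + self.height / 2)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def find_area_near_touch(areas: Sequence[CharArea], touch_xy: Tuple[float, float],
                         corrected: str, max_pixels: float) -> Optional[CharArea]:
    """Among areas within max_pixels of the touch position, return the one whose
    text has the smallest edit distance to the corrected string (None if no
    area is close enough)."""
    nearby = [a for a in areas if math.dist(a.center(), touch_xy) <= max_pixels]
    if not nearby:
        return None
    return min(nearby, key=lambda a: edit_distance(a.text, corrected))
```

The touch position is assumed to be already converted from display coordinates to image coordinates before this function is called.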

The information processing device according to claim 7 or 8, wherein, when the correction means cannot determine the character area corresponding to the item that the user has instructed to be corrected, the second correction means determines that character area, and, when the second correction means determines the character area, the output means outputs the extraction result corrected based on the determination by the second correction means as data for learning by the learning model.

The information processing device according to claim 9, wherein, when the second correction means is unable to determine the character area corresponding to the item that the user has instructed to be corrected, the output means does not output the data for learning.
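
The fallback order described in the two claims above (first the correction means, then the second correction means, and no learning data if both fail) can be sketched as a chain of locator functions. The names, the `Area` tuple layout, and the dictionary record format are hypothetical and serve only to show the control flow.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical bounding box: (x, y, width, height).
Area = Tuple[int, int, int, int]
# A locator tries to find the character area and returns None on failure.
Locator = Callable[[], Optional[Area]]


def build_training_record(item: str, corrected: str,
                          locators: Sequence[Locator]) -> Optional[Dict[str, object]]:
    """Try each locator in order and build a learning record from the first
    area found. Returns None, meaning no data for learning is output, when
    every locator fails."""
    for locate in locators:
        area = locate()
        if area is not None:
            return {"item": item, "text": corrected, "area": area}
    return None
```

A caller would pass the string-based determination first and the touch-based determination second, so the touch position is consulted only when the string-based determination fails.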

The information processing device according to claim 1, further comprising an image data transmission means for transmitting image data of the image to a server that performs the character recognition process, the image being a scanned image obtained by scanning an original document.

The information processing device according to claim 1, wherein the extraction result acquired by the acquisition means is a table that associates with one another the character strings in the image obtained by performing the character recognition process on the image data of the image, the character areas in the image in which the character strings are recognized, and values indicating which of the character strings corresponds to each of the predetermined items, and wherein, when the table is modified, the output means outputs the table reflecting the modification as the data for learning.
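
One possible shape for such a table, and for a modification to it, is sketched below. The `Row` type, the use of an item name (or `None`) as the value indicating the item, and the choice to overwrite the recognized text with the corrected string are all assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Row:
    text: str                        # character string recognized by OCR
    area: Tuple[int, int, int, int]  # hypothetical (x, y, width, height)
    item: Optional[str]              # predetermined item this row maps to, if any


def reassign_item(table: Sequence[Row], item: str, new_index: int,
                  corrected_text: str) -> List[Row]:
    """Return a new table in which `item` points at row `new_index`.

    Any row previously labeled with `item` is unlabeled, and the newly
    labeled row stores the corrected string (an assumption of this sketch).
    """
    updated = []
    for i, row in enumerate(table):
        if i == new_index:
            updated.append(replace(row, text=corrected_text, item=item))
        elif row.item == item:
            updated.append(replace(row, item=None))
        else:
            updated.append(row)
    return updated
```

The input table is left unchanged, so the original extraction result and the modified table output as data for learning can be kept side by side.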

A server comprising:
a character recognition means for performing character recognition processing on image data of an image;
a transmitting means for transmitting, to the information processing device according to any one of claims 1 to 12, the extraction result that includes the processing result, which includes information on a character string in the image obtained by the character recognition processing performed by the character recognition means and on the character area in which the character string is recognized, and information on a character string and a character area in the image corresponding to each of the predetermined items, output by inputting the processing result into the learning model; and
a learning means for acquiring the data for learning for the learning model from the information processing device and training the learning model based on the data for learning.

The server according to claim 13, further comprising:
a correction result acquisition means for acquiring the corrected character string for the item that the user has instructed to be corrected, and a touch position where the user touched the display unit as a position in the image corresponding to that item; and
a third correction means for determining the character area corresponding to the item that the user has instructed to be corrected based on the touch position, the corrected character string, and the processing result, and for correcting the extraction result based on the determination,
wherein, when the data for learning is not output from the information processing device, the learning means trains the learning model based on the extraction result corrected by the third correction means.

A system including an information processing device and a server, wherein
the server has:
a character recognition means for performing character recognition processing on image data of an image; and
a transmitting means for transmitting, to the information processing device, an extraction result that includes (i) a processing result including information on a character string in the image obtained by the character recognition processing performed by the character recognition means and on the character area in which the character string is recognized, and (ii) information on a character string and a character area in the image corresponding to each of predetermined items defined in advance, the information being output by inputting the processing result into a learning model trained to output information on a character string and a character area corresponding to each of the predetermined items;
the information processing device has:
an acquisition means for acquiring the extraction result transmitted from the transmitting means;
a display control means for displaying, on a display unit, the character strings corresponding to the respective predetermined items based on the extraction result;
a receiving means for receiving, from a user, an instruction to correct a character string corresponding to any one of the predetermined items displayed on the display unit;
a correction means for determining, based on the corrected character string and the processing result, the character area corresponding to the item that the user has instructed to be corrected, and for correcting the extraction result based on the determination; and
an output means for outputting the extraction result corrected by the correction means as data for learning by the learning model; and
the server further has a learning means for training the learning model based on the data for learning output from the output means of the information processing device.

An information processing method comprising:
an acquisition step of acquiring an extraction result that includes (i) a processing result obtained by performing character recognition processing on image data of an image, the processing result including information on a character string in the image and on the character area in which the character string is recognized, and (ii) information on a character string and on a character area in which the character string is recognized, in the image, corresponding to each of predetermined items, the information being output by inputting the processing result into a learning model trained to output information on a character string and a character area corresponding to each of the predetermined items;
a display control step of displaying, on a display unit, the character strings corresponding to the respective predetermined items based on the extraction result;
a receiving step of receiving, from a user, an instruction to correct a character string corresponding to any one of the predetermined items displayed on the display unit;
a correction step of determining, based on the corrected character string and the processing result, the character area corresponding to the item that the user has instructed to be corrected, and correcting the extraction result based on the determination; and
an output step of outputting the extraction result corrected in the correction step as data for learning by the learning model.

A program for causing a computer to function as each of the means of the information processing device according to any one of claims 1 to 12.