JP7714930B2

JP7714930B2 - Character matching device and program

Info

Publication number: JP7714930B2
Application number: JP2021102492A
Authority: JP
Inventors: 侑吾西川
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2025-07-30
Anticipated expiration: 2041-06-21
Also published as: JP2023001649A

Description

The present invention relates to a character matching device and program.

Conventionally, information processing devices have been disclosed that identify character strings representing personal names, company names, store names, and the like from among the character strings written on a specific medium (for example, Patent Document 1). When printed characters are read using an optical character recognition (OCR) device, misrecognition can cause incorrect text to be output. Patent Document 1 presents a method for matching names and the like using information from email addresses and network addresses.

Patent Document 1: Japanese Patent No. 4991407

For example, in the review process for various operations such as account opening, card issuance, and insurance contracts, it is confirmed that the name and other details written on the application documents match those on a personal identification document such as a driver's license. Performing these checks by human visual inspection increases the workload and can in turn lead to errors. For this reason, in recent years computers have been used to compare texts and determine whether they match. However, even when the comparison result is a mismatch, the mismatch may be acceptable under certain conditions, and it has not been possible to make that distinction.

The present invention therefore aims to provide a character matching device and program that, when a character matching result is a mismatch, issue a notification based on the content of the mismatch.

The present invention solves the above problems by the following means.
A first invention is a character matching device comprising: a comparison means for comparing first text for a predetermined item obtained from a first document with second text for the predetermined item obtained from a second document; a determination means for determining the content of the mismatch when the comparison by the comparison means results in a mismatch; and a notification means for issuing a notification corresponding to the content of the mismatch determined by the determination means.
A second invention is the character matching device of the first invention, wherein the determination means determines that the mismatch content is a variant character when the mismatched pair of characters is registered in a variant character dictionary that stores variant characters, and the notification means notifies the user that the mismatch is a variant character.
A third invention is the character matching device of the first or second invention, wherein the determination means calculates a similarity between the mismatched pair of characters and, when the calculated similarity is equal to or greater than a threshold, determines that the mismatch content is a variant character or a misconversion, and the notification means notifies the user that the mismatch is a variant character or a misconversion.
A fourth invention is the character matching device of the third invention, wherein the determination means divides an image area showing the mismatched pair of characters into a plurality of small areas and compares the small areas at the same position to calculate the similarity.
A fifth invention is the character matching device of any one of the first to fourth inventions, further comprising a character count confirmation means for confirming the number of characters in the first text and the second text, wherein the comparison means, when the character count confirmation means confirms that the numbers of characters match, compares the first text and the second text one character at a time in order.
A sixth invention is the character matching device of the fifth invention, further comprising a similarity calculation means for, when the character count confirmation means confirms that the numbers of characters do not match, identifying each mismatched character and calculating a similarity between a combination of the identified characters and the paired character, wherein the notification means notifies the user of a misconversion when the similarity calculated by the similarity calculation means is equal to or greater than a threshold.
A seventh invention is the character matching device of any one of the first to sixth inventions, wherein the predetermined item includes at least one of a name and an address.
An eighth invention is a character matching device according to any one of the first to seventh inventions, comprising: first data acquisition means for acquiring first data obtained by imaging the first document; second data acquisition means for acquiring second data obtained by imaging the second document; first text acquisition means for acquiring the predetermined items from the first data acquired by the first data acquisition means, and acquiring values corresponding to the acquired predetermined items as the first text; and second text acquisition means for acquiring the predetermined items from the second data acquired by the second data acquisition means, and acquiring values corresponding to the acquired predetermined items as the second text.
A ninth invention is the character matching device of the eighth invention, wherein the first data and/or the second data is image data of a document including handwritten characters, and the first text and/or the second text is text obtained by performing character recognition processing on the image data.
A tenth invention is a program for causing a computer to function as the character matching device of any one of the first to ninth inventions.

According to the present invention, it is possible to provide a character matching device and program that, when a character matching result is a mismatch, issue a notification based on the content of the mismatch.

FIG. 1 is a diagram showing the overall configuration of a character matching system according to this embodiment and the functional blocks of a character matching server.
FIG. 2 is a diagram showing an example of a variant character DB according to this embodiment.
FIG. 3 is a flowchart showing the character discrimination process of the character matching server according to this embodiment.
FIG. 4 is a flowchart showing the matching target acquisition process of the character matching server according to this embodiment.
FIG. 5 is a flowchart showing the variant character discrimination process of the character matching server according to this embodiment.
FIG. 6 is a diagram showing an example of a method for calculating the similarity in the variant character discrimination process according to this embodiment.
FIG. 7 is a flowchart showing the erroneous conversion determination process of the character matching server according to this embodiment.
FIG. 8 is a diagram showing specific examples of character discrimination in the character matching system according to this embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that this is merely an example, and the technical scope of the present invention is not limited to it.
(Embodiment)
<Overall Configuration of Character Matching System 100>
FIG. 1 is a diagram showing the overall configuration of a character matching system 100 according to this embodiment and the functional blocks of a character matching server 1.
FIG. 2 is a diagram showing an example of the variant character DB 4 according to this embodiment.

The character matching system 100 shown in FIG. 1 is a system that matches the contents of the items to be matched (predetermined items) between, for example, an application document (first document) and a personal identification document (second document), and outputs the matching results. The character matching system 100 compares the texts that the two documents contain for each item to be matched and, if they do not match, issues a notification according to the content of the mismatch. In this way, the character matching system 100 can reduce the visual inspection workload of the user who checks the documents (hereinafter, the person who checks the documents is also referred to as the operator).
The character matching system 100 includes a character matching server 1 (character matching device), a variant character DB (database) 4 (variant character dictionary), and a terminal 5. The character matching server 1, the variant character DB 4, and the terminal 5 are communicatively connected via a communication network N.

<Character Matching Server 1>
The character matching server 1 is a server that can be used for various screening operations. For example, the character matching server 1 matches application data (first data) obtained by imaging an application document against personal identification data (second data) obtained by imaging the personal identification document used in screening that application document.
The character matching server 1 includes a control unit 10, a storage unit 20, and a communication interface unit 29.
The control unit 10 is a CPU (Central Processing Unit) that controls the entire character matching server 1. The control unit 10 executes various functions in cooperation with the above-mentioned hardware by appropriately reading and executing the OS (Operating System) and various application programs stored in the storage unit 20.

The control unit 10 includes an application data acquisition unit 11 (first data acquisition means), a first text acquisition unit 12 (first text acquisition means), a personal identification data acquisition unit 13 (second data acquisition means), a second text acquisition unit 14 (second text acquisition means), a comparison unit 15 (character count confirmation means, comparison means), a determination unit 16 (determination means, similarity calculation means), and a notification processing unit 17 (notification means).

The application data acquisition unit 11 acquires application data, which is an imaged application document. Application documents are, for example, an application form for opening a bank account or a contract application form for various types of insurance such as life insurance. An application document may partly contain handwritten characters. For example, when a requester submits documents via the web, the application data acquisition unit 11 receives the application data from a requester terminal (not shown). When a requester submits documents by mail, the application data acquisition unit 11 receives the application data produced by imaging the paper application document. The application data acquisition unit 11 then stores the acquired application data in the document data storage unit 22.

The first text acquisition unit 12 performs OCR (Optical Character Recognition) processing (character recognition processing) on the application data to obtain text data. It then acquires the items to be matched from the text data and acquires the value corresponding to each acquired item as the first text. Here, the items to be matched are the items checked in the screening process, for example, name (family name and given name) and address. The first text acquisition unit 12 then stores the acquired items and their values (first text) in the comparison data storage unit 23.
In addition, when documents are submitted via the web, if the application data acquired by the application data acquisition unit 11 includes text data, the process of acquiring text data by the first text acquisition unit 12 is not necessary.

The personal identification data acquisition unit 13 acquires personal identification data, which is an imaged personal identification document. A personal identification document is used to verify the applicant and is, for example, a driver's license, an insurance card, or a passport. A personal identification document may partly contain handwritten characters. For example, when a requester submits documents via the web, the personal identification data acquisition unit 13 receives the personal identification data from a requester terminal (not shown). When a requester submits documents by mail, the personal identification data acquisition unit 13 receives the personal identification data produced by imaging the mailed copy of the personal identification document. The personal identification data acquisition unit 13 then stores the acquired personal identification data in the document data storage unit 22 in association with the application data that was received with it and is to be matched against it.

The second text acquisition unit 14 performs OCR processing on the personal identification data to obtain text data. It then acquires the items to be matched from the text data and acquires the value corresponding to each acquired item as the second text. The second text acquisition unit 14 then stores the acquired values (second text) in the comparison data storage unit 23 in association with the items and values (first text) obtained from the application data to be matched.
In addition, in the case of submitting documents via the Web, if the personal identification data acquired by the personal identification data acquisition unit 13 includes text data, the process of acquiring the text data by the second text acquisition unit 14 is not necessary.

The comparison unit 15 checks the number of characters in the first text and the second text for each item to be matched stored in the comparison data storage unit 23.
Furthermore, when the comparison unit 15 confirms that the number of characters matches, it compares the first text with the second text character by character.
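The two steps performed by the comparison unit 15 can be sketched as follows. This is only a minimal illustration, not the patented implementation: the names `Comparison` and `compare_texts` are hypothetical, and the sketch assumes that one character corresponds to one Unicode code point (text containing variation selectors or combining marks would need extra handling).

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Result of comparing two texts (hypothetical structure)."""
    same_length: bool
    # (position, character in first text, character in second text)
    mismatches: list[tuple[int, str, str]]


def compare_texts(first_text: str, second_text: str) -> Comparison:
    """Check the character counts, then compare one character at a time."""
    if len(first_text) != len(second_text):
        # Counts differ, so the position-by-position comparison is skipped.
        return Comparison(same_length=False, mismatches=[])
    mismatches = [
        (i, a, b)
        for i, (a, b) in enumerate(zip(first_text, second_text))
        if a != b
    ]
    return Comparison(same_length=True, mismatches=mismatches)
```

An empty `mismatches` list with `same_length=True` corresponds to a complete match; any entries are the mismatched pairs handed on for determination.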

When the comparison unit 15 confirms that the numbers of characters do not match, the determination unit 16 identifies the mismatched characters and calculates the similarity between the combination of the identified characters and the paired character.
Furthermore, if the comparison of the characters by the comparison unit 15 results in a mismatch, the determination unit 16 determines the content of the mismatch.
More specifically, the determination unit 16 determines whether or not both characters of the mismatched pair are registered in the variant character DB 4. When the mismatched pair of characters is registered in the variant character DB 4, the determination unit 16 determines that the mismatch content is a variant character.
The determination unit 16 may also calculate the similarity between the mismatched pair of characters. When the calculated similarity is equal to or greater than a threshold, the determination unit 16 determines that the mismatch content is a variant character or a misconversion. Here, the determination unit 16 calculates the similarity by, for example, dividing an image area showing the mismatched pair of characters into multiple small areas and comparing the small areas at the same position.
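One possible way to compare small areas at the same position is sketched below. The text above states only that the image area is divided into small areas that are compared position by position; everything else here is an assumption made for illustration: the character images are same-sized binary bitmaps, a small area counts as matching when the ink ratios differ by at most `tolerance`, and the similarity is the fraction of matching areas. The names `grid_similarity` and `cell_ink_ratio` are hypothetical.

```python
Bitmap = list[list[int]]  # rows of 0 (background) / 1 (ink)


def cell_ink_ratio(img: Bitmap, top: int, bottom: int, left: int, right: int) -> float:
    """Fraction of ink pixels inside one small area."""
    total = (bottom - top) * (right - left)
    ink = sum(img[y][x] for y in range(top, bottom) for x in range(left, right))
    return ink / total


def grid_similarity(a: Bitmap, b: Bitmap, rows: int = 4, cols: int = 4,
                    tolerance: float = 0.2) -> float:
    """Split both bitmaps into rows x cols small areas and compare them pairwise.

    Returns the fraction of small areas whose ink ratios are within
    `tolerance` of each other (an assumed metric, not taken from the source).
    """
    height, width = len(a), len(a[0])
    if (len(b), len(b[0])) != (height, width):
        raise ValueError("bitmaps must have the same size")
    if height < rows or width < cols:
        raise ValueError("bitmap is too small for the requested grid")
    matching = 0
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            diff = abs(cell_ink_ratio(a, top, bottom, left, right)
                       - cell_ink_ratio(b, top, bottom, left, right))
            if diff <= tolerance:
                matching += 1
    return matching / (rows * cols)
```

The returned value would then be compared against the threshold; the grid size, tolerance, and threshold are tuning parameters that the source does not specify.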

When the determination unit 16 has determined the mismatch content, the notification processing unit 17 issues a notification corresponding to it. More specifically, when the determination unit 16 determines that the mismatch content is a variant character, the notification processing unit 17 notifies the user that it is a variant character. When the determination unit 16 determines that the mismatch content is a variant character or a misconversion, the notification processing unit 17 notifies the user accordingly. Furthermore, when the similarity calculated by the determination unit 16 between a combination of multiple mismatched characters and the paired character is equal to or greater than a threshold, the notification processing unit 17 notifies the user of a misconversion.
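The correspondence between the determination result and the notification can be sketched as a small mapping. This is an illustrative sketch only: the enum `MismatchKind`, the function names, the message wording, and the default threshold of 0.8 are all assumptions, and the handling of a mismatch that falls into none of the three cases is left open here.

```python
from enum import Enum
from typing import Optional


class MismatchKind(Enum):
    # Message texts are hypothetical examples.
    VARIANT = "The characters differ but are registered as variant characters."
    VARIANT_OR_MISCONVERSION = "The characters differ but look similar (variant character or misconversion)."
    MISCONVERSION = "The character counts differ; a misconversion is likely."


def classify_pair(in_variant_db: bool, similarity: float,
                  threshold: float = 0.8) -> Optional[MismatchKind]:
    """Classify a mismatched pair of characters when the character counts match."""
    if in_variant_db:
        return MismatchKind.VARIANT
    if similarity >= threshold:
        return MismatchKind.VARIANT_OR_MISCONVERSION
    return None  # none of the cases described above applies


def classify_combination(similarity: float,
                         threshold: float = 0.8) -> Optional[MismatchKind]:
    """Classify a count mismatch from the combined-character similarity."""
    return MismatchKind.MISCONVERSION if similarity >= threshold else None
```

The notification processing would then send `kind.value` (or an equivalent message) to the operator's terminal.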

The storage unit 20 is a storage device, such as a hard disk or semiconductor memory element, for storing the programs, data, and the like required for the operation of the character matching server 1.
The storage unit 20 includes a program storage unit 21, a document data storage unit 22, and a comparison data storage unit 23.
The program storage unit 21 is a storage area for storing programs, and stores a character matching program 21a (program).
The character matching program 21a is a program for executing each function of the control unit 10.

The document data storage unit 22 is a storage area that stores application data in association with the personal identification data to be matched against it.
The comparison data storage unit 23 stores the items to be matched and their values (first text) obtained from the application data, in association with the values (second text) corresponding to the items obtained from the personal identification data.
It should be noted that document data storage unit 22 and comparison data storage unit 23 may be used temporarily when performing character matching processing.
The communication interface unit 29 is an interface unit for communicating with the variant character DB 4 and the terminal 5 via the communication network N.

Note that a computer refers to an information processing device that includes a control unit, a storage device, and the like; the character matching server 1 is an information processing device that includes the control unit 10, the storage unit 20, and the like, and is therefore included in the concept of a computer.
Furthermore, there is no limit to the number of pieces of hardware that make up the character matching server 1. It may be configured with one or more pieces of hardware as needed. The hardware of the character matching server 1 may include various servers such as a web server, a DB (database) server, and an application server as needed, and these may be implemented on a single server or on separate servers. The character matching server 1 may also be implemented, for example, in the cloud.

<Variant Character DB 4>
The variant character DB 4 is a database that stores variant characters.
As shown in FIG. 2, the variant character DB 4 stores characters and variant characters in association with each other. A character here is a character in its standard form. A variant character is a kanji whose form differs from the standard form but which has the same meaning and pronunciation and is accepted as interchangeable. The variant character DB 4 may store multiple variant characters for one character.
The variant character DB 4 is created and registered in advance, and may be manually extended and updated as needed.
Although not shown, the variant character DB 4 includes a control unit, a storage unit, and a communication interface unit. Alternatively, the variant character DB 4 may have no control unit of its own, in which case the control unit 10 of the character matching server 1 may control the variant character DB 4.
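A record structure of this kind, together with the check that both characters of a mismatched pair belong to the same record, might be sketched as follows. The entries shown are illustrative examples of real variant forms and are not a reproduction of FIG. 2; the names `VARIANT_RECORDS` and `is_registered_variant_pair` are hypothetical, and an in-memory dictionary stands in for the database.

```python
# Standard-form character -> its registered variant characters (illustrative data).
VARIANT_RECORDS: dict[str, frozenset[str]] = {
    "高": frozenset({"髙"}),
    "辺": frozenset({"邊", "邉"}),
}


def is_registered_variant_pair(a: str, b: str,
                               records: dict[str, frozenset[str]] = VARIANT_RECORDS) -> bool:
    """Return True when both characters appear in the same record.

    A pair may consist of the standard form and one variant, or of two
    variants of the same standard form.
    """
    if a == b:
        return False  # identical characters are not a mismatch
    for standard, variants in records.items():
        group = variants | {standard}
        if a in group and b in group:
            return True
    return False
```

For example, `is_registered_variant_pair("邊", "邉")` is true because both are variants in the "辺" record, while a pair drawn from two different records is not treated as a variant pair.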

<Terminal 5>
The terminal 5 is a terminal used by an operator engaged in the screening work. The terminal 5 can be, for example, a personal computer (PC) or a tablet terminal. The terminal 5 may be dedicated to this work or shared with other work. Although not shown, the terminal 5 includes a control unit, a storage unit, an input unit, a display unit, a communication interface unit, and the like.

<Processing Description>
Next, the processing performed by the character matching server 1 will be described.
FIG. 3 is a flowchart showing the character discrimination process of the character matching server 1 according to this embodiment.
FIG. 4 is a flowchart showing the matching target acquisition process of the character matching server 1 according to this embodiment.

In step S11 of FIG. 3 (hereinafter, "step S" is simply written as "S"), the control unit 10 of the character matching server 1 performs a matching target acquisition process.
Here, the matching target acquisition process will be described with reference to FIG. 4.
In S21 of FIG. 4, the control unit 10 (application data acquisition unit 11) acquires application data, which is an imaged application document. The control unit 10 then stores the acquired application data in the document data storage unit 22.
In S22, the control unit 10 (first text acquisition unit 12) performs OCR processing on the application data to acquire items to be matched and values (first text) corresponding to the items from the character string (text data). Here, the items to be matched may be multiple items, and in that case, the control unit 10 acquires each value corresponding to each item to be matched.

In S23, the control unit 10 (personal identification data acquisition unit 13) acquires personal identification data, which is an image of the personal identification document to be matched against the application document. The control unit 10 then stores the acquired personal identification data in the document data storage unit 22 in association with the application data to be matched.
In S24, the control unit 10 (second text acquisition unit 14) performs OCR processing on the personal identification data, and acquires the items to be matched and the values (second text) corresponding to those items from the character string (text data).
In S25, the control unit 10 stores each item to be matched in the comparison data storage unit 23 in association with the first text and second text corresponding to that item. The control unit 10 then proceeds to S12 in FIG. 3.

In S12 of FIG. 3, the control unit 10 (comparison unit 15) acquires the texts corresponding to an item to be matched from the comparison data storage unit 23 and compares them with each other.
In S13, the control unit 10 (comparison unit 15) determines whether the number of characters matches. If the number of characters matches (S13: YES), the control unit 10 proceeds to S14. On the other hand, if the number of characters does not match (S13: NO), the control unit 10 proceeds to S16. Note that a case where the number of characters does not match includes a case where there is a possibility of a conversion error when converting image data to text.

In S14, the control unit 10 (comparison unit 15) determines whether the characters match completely. Here, a complete match means that the text of the item to be matched written on the application document matches the text of the same item written on the personal identification document without a single character differing. If the characters match completely (S14: YES), the control unit 10 proceeds to S17. If they do not match completely (S14: NO), the control unit 10 proceeds to S15.

In S15, the control unit 10 performs the variant character discrimination process, which will be described later with reference to FIG. 5. The control unit 10 then proceeds to S17.
In S16, the control unit 10 performs an erroneous conversion determination process, which will be described later with reference to Fig. 7. After that, the control unit 10 moves the process to S17.

In S17, the control unit 10 determines whether all matching targets have been processed. When all data stored in the comparison data storage unit 23 has been processed, the control unit 10 can determine that all matching targets have been processed. If all matching targets have been processed (S17: YES), the control unit 10 proceeds to S18. If some matching targets have not yet been processed (S17: NO), the control unit 10 returns to S12 and processes the remaining matching targets stored in the comparison data storage unit 23.

In S18, the control unit 10 (notification processing unit 17) performs a result output process. Specifically, if a message has been generated in the processing so far, the control unit 10 outputs that message to the terminal 5. If no message has been generated, the control unit 10 outputs to the terminal 5 a message indicating that there is no problem with the matching result. Note that when the texts being matched are an exact match, no message has been generated by the time S18 is reached. The control unit 10 then ends this process.
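As a rough illustration of the loop formed by S14, S17 and S18, the following Python sketch may help. It is not part of the disclosed embodiment: the function name run_matching, the tuple layout, the message wording, and the judge_mismatch callback (standing in for the determination processes of S15 and S16) are all assumptions made for this example.

```python
from typing import Callable

def run_matching(pairs: list[tuple[str, str, str]],
                 judge_mismatch: Callable[[str, str], str]) -> list[str]:
    """pairs holds (item, first_text, second_text) for each matching target."""
    messages = []
    for item, first_text, second_text in pairs:   # S17: repeat until all targets are done
        if first_text == second_text:             # S14: exact match, no message is generated
            continue
        # S15 / S16: the mismatch is judged elsewhere; only the message is collected here
        messages.append(f"{item}: {judge_mismatch(first_text, second_text)}")
    # S18: output the generated messages, or a single "no problem" message
    return messages or ["No problem found in the matching result."]
```

With this shape, an input in which every pair matches exactly yields only the "no problem" message, which corresponds to the note on S18 above.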

Next, the variant character determination process will be described.
FIG. 5 is a flowchart showing the variant character determination process of the character matching server 1 according to this embodiment.
FIG. 6 is a diagram showing an example of a method for calculating the similarity in the variant character determination process according to this embodiment.

In S31 of FIG. 5, the control unit 10 (determination unit 16) searches the variant character DB 4 for the mismatched pair of characters.
In S32, the control unit 10 (determination unit 16) determines whether the mismatched pair of characters is registered in the variant character DB 4. If both characters of the mismatched pair are registered, either as the character or as a variant character, in the same record of the variant character DB 4, the control unit 10 determines that the pair is registered in the variant character DB 4.
For example, in the case of "高" shown in FIG. 2, if the mismatched pair consists of the character and its variant character, the control unit 10 determines that the pair is registered in the variant character DB 4. In the case of "辺" shown in FIG. 2, the mismatched pair may consist of the character and one of its variant characters, or both characters of the pair may be variant characters.
If the mismatched pair of characters is registered in the variant character DB 4 (S32: YES), the control unit 10 proceeds to S33. If the pair is not registered (S32: NO), the control unit 10 proceeds to S34.
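The same-record check of S32 could be sketched in Python as follows. The record layout and the concrete variant characters listed here are assumptions for illustration (FIG. 2 is not reproduced in this text), and is_registered_pair is a hypothetical name.

```python
# Hypothetical records: one base character plus its variant characters per record.
VARIANT_DB = [
    {"char": "高", "variants": ["髙"]},
    {"char": "辺", "variants": ["邊", "邉"]},
]

def is_registered_pair(a: str, b: str, db=VARIANT_DB) -> bool:
    """S32: True when both mismatched characters belong to the same record."""
    for record in db:
        members = {record["char"], *record["variants"]}
        if a != b and a in members and b in members:
            return True
    return False
```

Under these assumed records, a pair made of two variant characters of "辺" is treated as registered, while a pair drawn from two different records is not.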

In S33, the control unit 10 (determination unit 16) generates a notification message indicating that the character is a variant character, and proceeds to S17 of FIG. 3.
In S34, the control unit 10 (determination unit 16) calculates the similarity between the mismatched characters. As one example of a method for calculating the similarity between characters, the control unit 10 divides the image area representing each character into a plurality of small areas and calculates the similarity by comparing the small areas at the same position.

Here, a method for calculating the similarity between characters will be described with reference to FIG. 6.
FIG. 6(A) illustrates calculating the similarity between the character shown in an image area 61 and the character shown in an image area 62. The control unit 10 divides the image area 61 into n × n small areas. The control unit 10 then vectorizes the image area 61, representing each small area with a value of 0 or 1. The control unit 10 vectorizes the image area 62 in the same way.
The control unit 10 then uses the vectorized values to calculate the SSD (sum of squared differences). The SSD expresses the degree of difference between the image area 61 and the image area 62.
When the vectorized values of the image area 61 are denoted x(a, b) and the vectorized values of the image area 62 are denoted y(a, b), the SSD can be calculated by the following formula.
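The formula itself is not reproduced in this text. Assuming the standard definition of the sum of squared differences, with a and b taken to index the rows and columns of the n × n small areas, (Equation 1) would read as follows. The index ranges are an assumption.

```latex
\mathrm{SSD} = \sum_{a=1}^{n} \sum_{b=1}^{n} \bigl( x(a,b) - y(a,b) \bigr)^{2} \tag{Equation 1}
```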

According to (Equation 1), the number of small areas whose values differ at the same position can be calculated.
Next, the control unit 10 uses the SSD to calculate Sim, which indicates the similarity between the image area 61 and the image area 62, by the following formula.

Here, the explanation continues using FIG. 6(B), which is easier to follow.
When calculating the similarity between an image area 63 and an image area 64, the control unit 10 calculates the SSD by substituting the values 63a, obtained by vectorizing the image area 63, and the values 64a, obtained by vectorizing the image area 64, into (Equation 1).

Next, the control unit 10 substitutes the calculated SSD into (Equation 2) to calculate Sim, and uses the calculated value of Sim as the similarity between the image area 63 and the image area 64.
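The two-step calculation can be sketched in Python as below. The ssd function follows the description of (Equation 1). The sim function is only an assumption: (Equation 2) is not reproduced in this text, so the sketch normalizes the SSD by the number of small areas to obtain a value between 0 and 1. The grids are illustrative and are not the values 63a and 64a of FIG. 6(B).

```python
def ssd(x, y):
    """Sum of squared differences over small areas at the same position."""
    return sum((xv - yv) ** 2
               for x_row, y_row in zip(x, y)
               for xv, yv in zip(x_row, y_row))

def sim(x, y):
    """Assumed similarity: 1 - SSD / (number of small areas). Not taken from the source."""
    cells = len(x) * len(x[0])
    return 1 - ssd(x, y) / cells

# Illustrative 4 x 4 binary grids that differ in exactly one small area.
grid_a = [[1, 1, 1, 1],
          [1, 0, 0, 1],
          [1, 0, 0, 1],
          [1, 1, 1, 1]]
grid_b = [[1, 1, 1, 1],
          [1, 0, 0, 1],
          [1, 1, 0, 1],
          [1, 1, 1, 1]]

print(ssd(grid_a, grid_b))  # 1
print(sim(grid_a, grid_b))  # 0.9375
```

Because every value is 0 or 1, each squared difference is 1 exactly where the two grids disagree, so the SSD equals the count of differing small areas.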

In S35 of FIG. 5, the control unit 10 (determination unit 16) determines whether the calculated similarity is equal to or greater than a threshold. If the similarity is equal to or greater than the threshold (S35: YES), the control unit 10 proceeds to S36. If the similarity is below the threshold (S35: NO), the control unit 10 proceeds to S37.
For example, if the threshold is 0.8, then in the case of FIG. 6(B) the control unit 10 determines that the similarity is equal to or greater than the threshold.

In S36, the control unit 10 (determination unit 16) generates a notification message indicating that the mismatch is a variant character or an erroneous conversion, and proceeds to S17 of FIG. 3. For example, in the case of FIG. 6(A), the process proceeds to S36 if the calculated similarity is equal to or greater than the threshold, and in that example the character is a variant character. In actual processing, however, the characters illustrated in FIG. 6(A) are registered in the variant character DB 4, so S32 yields YES. A variant character therefore reaches S36 only when it is not registered in the variant character DB 4.
Examples of erroneous conversions include "月" and "目", "入" and "人", and "ヶ" and "ケ".
In S37, the control unit 10 (determination unit 16) generates a warning message indicating a mismatch, and proceeds to S17 of FIG. 3.
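The branching of S32 through S37 can be summarized in a short Python sketch. The function name, the message strings, and the use of a callback for the similarity calculation are assumptions for illustration. The default threshold of 0.8 is the example value given above.

```python
from typing import Callable

def judge_variant_or_similar(registered: bool,
                             similarity_fn: Callable[[], float],
                             threshold: float = 0.8) -> str:
    if registered:                       # S32: YES -> S33
        return "notice: variant character"
    similarity = similarity_fn()         # S34 runs only when S32 is NO
    if similarity >= threshold:          # S35: YES -> S36
        return "notice: variant character or erroneous conversion"
    return "warning: mismatch"           # S35: NO -> S37
```

Passing the similarity as a callback mirrors the flowchart, in which the similarity is calculated only for pairs that are not found in the variant character DB 4.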

Next, the erroneous conversion determination process will be described.
FIG. 7 is a flowchart showing the erroneous conversion determination process of the character matching server 1 according to this embodiment.
In S41 of FIG. 7, the control unit 10 (determination unit 16) identifies the mismatched characters.
In S42, the control unit 10 (determination unit 16) calculates the similarity between the identified combination of multiple characters and the single character paired with it. This similarity can be calculated by the same process as the character-to-character similarity described with reference to FIG. 6. However, the area occupied by the combination of characters is twice as wide as the area of the paired character. Therefore, for example, when dividing the area of the combination into small areas, the control unit 10 makes each small area twice as wide. This yields the same number of small areas as for the paired character, so the processing from vectorization onward can be performed in the same way.
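The effect of doubling the width of each small area can be sketched in Python as follows. The to_grid helper is hypothetical, and two points are assumptions not stated in the source: a small area is given the value 1 if any pixel inside it is set, and the image height and width are exact multiples of n.

```python
def to_grid(pixels, n):
    """Split a binary pixel image into n x n small areas (1 if any pixel in the area is set)."""
    height, width = len(pixels), len(pixels[0])
    cell_h, cell_w = height // n, width // n   # a twice-as-wide image gives twice-as-wide areas
    return [[int(any(pixels[r][c]
                     for r in range(a * cell_h, (a + 1) * cell_h)
                     for c in range(b * cell_w, (b + 1) * cell_w)))
             for b in range(n)]
            for a in range(n)]

# A single character (2 x 2 pixels) and a two-character combination (2 x 4 pixels).
single = [[1, 0],
          [0, 1]]
combined = [[1, 1, 0, 0],
            [0, 0, 1, 1]]

print(to_grid(single, 2))    # [[1, 0], [0, 1]]
print(to_grid(combined, 2))  # [[1, 0], [0, 1]]
```

Both images produce a 2 × 2 grid, so the same vectorized comparison can be applied to either.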

In S43, the control unit 10 (determination unit 16) determines whether the similarity is equal to or greater than a threshold. If the similarity is equal to or greater than the threshold (S43: YES), the control unit 10 proceeds to S44. If the similarity is below the threshold (S43: NO), the control unit 10 proceeds to S45.
In S44, the control unit 10 (determination unit 16) generates a notification message indicating an erroneous conversion, and proceeds to S17 of FIG. 3. Examples of cases that proceed to S44 include the single character "好" being erroneously converted into the two characters "女" and "子", the single character "明" being converted into "日" and "月", and the single character "飯" being converted into "食" and "反".
In S45, the control unit 10 (determination unit 16) generates a warning message indicating a mismatch, and proceeds to S17 of FIG. 3.

Next, the overall flow of processing will be described with a specific example.
FIG. 8 is a diagram showing a specific example of character determination in the character matching system 100 according to this embodiment.
First, application data obtained by imaging the application document 7 and identification data obtained by imaging the identification document 8 are input to the character matching server 1.
In the character matching server 1, the control unit 10 extracts the matching target items and their values from the application data and stores them in the comparison data storage unit 23. The control unit 10 also extracts the same items and their values from the identification data and stores them in the comparison data storage unit 23.
Next, the control unit 10 of the character matching server 1 reads the value of the name, as a matching target item, from the comparison data storage unit 23 and performs the comparison.
If the comparison finds a mismatch that is determined to be a variant character, the control unit 10 of the character matching server 1 generates a message to that effect and outputs it to the terminal 5 as a pop-up message 51.

As described above, the character matching server 1 according to this embodiment provides the following effects.
(1) A first text, which is the value of a matching target item obtained from the application document, is compared with a second text, which is the value of the same item obtained from the identification document. If the comparison results in a mismatch, the content of the mismatch is determined and a notification corresponding to the determined content is issued.
Therefore, when matching the two texts results in a mismatch, a notification corresponding to the content of the mismatch can be issued. As a result, a notification that is easy to understand can be output to the operator who checks the matching result.

(2) The variant character DB 4 is referenced, and when the mismatched pair of characters is registered in the variant character DB 4, the content of the mismatch is determined to be a variant character and a notification to that effect is issued.
Therefore, whether a character is a variant character is determined based on the variant character DB 4, and when it is, notifying the operator to that effect draws the operator's attention.

(3) The similarity between the characters of the mismatched pair is calculated, and when the calculated similarity is equal to or greater than a threshold, the content of the mismatch is determined to be a variant character or an erroneous conversion and a notification to that effect is issued.
Therefore, even when a variant character is not registered in the variant character DB 4, the notification can draw the operator's attention. In addition, because similar-looking characters may indicate an erroneous conversion, the notification can draw the operator's attention in that case as well.
(4) The image areas showing the mismatched pair of characters are divided into a plurality of small areas, and the small areas at the same position are compared to calculate the similarity, so the similarity can be calculated by simple processing.

(5) The numbers of characters in the first text and the second text are checked, and the characters are compared one by one in order once the numbers are confirmed to match. Excluding cases where the numbers of characters differ beforehand makes the processing efficient.
(6) When the numbers of characters do not match, the mismatched characters are identified, the similarity between the identified combination of multiple characters and the character paired with it is calculated, and the mismatch is determined to be an erroneous conversion if the calculated similarity is equal to or greater than a threshold. Therefore, the operator's attention can also be drawn to erroneous conversions in which one character has been converted into two characters.

(7) Because the predetermined items include at least one of a name and an address, the device can be applied to items whose values are likely to contain variant characters.
(8) The matching target items and their values are obtained from application data, which is an image of the application document, the values corresponding to the same matching target items are obtained from identification data, which is an image of the identification document, and the values are compared. Therefore, character matching for the matching target items can be performed automatically based on the imaged data.

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. The effects described in the embodiment are merely a list of the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiment. The embodiment described above and the modifications described below may also be combined as appropriate, but a detailed description of such combinations is omitted.

(Modifications)
(1) In this embodiment, the variant character DB is described as being communicably connected to the character matching server, but this is not limiting. The character matching server may itself include the variant character DB.
(2) In this embodiment, an example is described in which the method using the variant character DB and the method calculating the similarity between the characters of a mismatched pair are used together, but this is not limiting. Only one of the two methods may be used. However, using both methods enables more reliable matching.

(3) In this embodiment, an example is described in which the character matching server determines whether a mismatch is a variant character or an erroneous conversion, but this is not limiting. The processing performed by the character matching server may be incorporated into a device that performs application processing, and the function may be used as part of the various processes for an application.

(4) In this embodiment, the variant character DB is described as storing the variant characters themselves, but this is not limiting. For example, a variant character string storage unit (not shown) that stores character strings containing variant characters commonly used in personal names and place names may be used. In that case, the character matching server decides which character string to match based on whether the characters before and after the variant character are stored in the variant character string storage unit. The character matching server then determines whether the character is a variant character by checking whether the character string, together with its paired character string, is stored in the variant character string storage unit. Such processing further improves the reliability of matching.

1 Character matching server
4 Variant character DB
5 Terminal
7 Application document
8 Identification document
10 Control unit
11 Application data acquisition unit
12 First text acquisition unit
13 Identification data acquisition unit
14 Second text acquisition unit
15 Comparison unit
16 Determination unit
17 Notification processing unit
20 Storage unit
21a Character matching program
23 Comparison data storage unit
51 Pop-up message
100 Character matching system
N Communication network

Claims

1. A character matching device comprising:
a comparison means for comparing a first text for a predetermined item obtained from a first document with a second text for the predetermined item obtained from a second document;
a determination means for determining the content of the mismatch, including variant characters and erroneous conversions, when the comparison by the comparison means results in a mismatch; and
a notification means for notifying the content of the mismatch determined by the determination means.

2. The character matching device according to claim 1, wherein
the determination means determines that the content of the mismatch is a variant character when the mismatched pair of characters is registered in a variant character dictionary that stores variant characters, and
the notification means notifies that the character is a variant character.

3. The character matching device according to claim 1, wherein
the determination means calculates a similarity between the characters of a mismatched pair and, when the calculated similarity is equal to or greater than a threshold, determines that the content of the mismatch is a variant character or an erroneous conversion, and
the notification means notifies that the mismatch is a variant character or an erroneous conversion.

4. The character matching device according to claim 3, wherein
the determination means divides the image areas showing the mismatched pair of characters into a plurality of small areas and compares the small areas at the same position to calculate the similarity.

5. The character matching device according to claim 1, further comprising
a character number confirmation means for confirming the numbers of characters of the first text and the second text, wherein
the comparison means compares the characters one by one when the character number confirmation means confirms that the numbers of characters match.

6. The character matching device according to claim 5, further comprising
a similarity calculation means for, when the character number confirmation means confirms that the numbers of characters do not match, identifying a mismatched combination of a plurality of characters and the character paired with it by having the comparison means compare the first text with the second text, and calculating a similarity by dividing the image area showing the identified combination of characters and the image area showing the identified paired character into the same number of small areas and comparing the divided areas, wherein
the notification means notifies that the mismatch is an erroneous conversion when the similarity calculated by the similarity calculation means is equal to or greater than a threshold.

7. The character matching device according to claim 1, wherein
the predetermined item includes at least one of a name and an address.

8. The character matching device according to claim 1, further comprising:
a first data acquisition means for acquiring first data obtained by imaging the first document;
a second data acquisition means for acquiring second data obtained by imaging the second document;
a first text acquisition means for acquiring the predetermined item from the first data acquired by the first data acquisition means, and acquiring a value corresponding to the acquired predetermined item as the first text; and
a second text acquisition means for acquiring the predetermined item from the second data acquired by the second data acquisition means, and acquiring a value corresponding to the acquired predetermined item as the second text.

9. The character matching device according to claim 8, wherein
the first data and/or the second data are image data of a document including handwritten characters, and
the first text and/or the second text is text obtained by performing character recognition processing on the image data.

10. A program for causing a computer to function as the character matching device according to any one of claims 1 to 9.