JP7704045B2

JP7704045B2 - Document correction support program, document correction support method, and information processing device

Info

Publication number: JP7704045B2
Application number: JP2022023005A
Authority: JP
Inventors: 一穂前田; 進遠藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2022-02-17
Filing date: 2022-02-17
Publication date: 2025-07-08
Anticipated expiration: 2042-02-17
Also published as: JP2023119888A

Description

Embodiments of the present invention relate to a document correction support program, a document correction support method, and an information processing device.

Conventionally, various documents (hereinafter also referred to as materials) submitted at service counters and the like are checked for consistency with one another, and staff correct any inconsistent items. For example, a large number of tax filing documents are submitted to tax counters every year. Staff check the submitted documents for errors by comparing them with residents' basic information and with documents submitted by their employers.

FIG. 8 is an explanatory diagram illustrating an example of correcting document deficiencies. As shown in FIG. 8, resident H1 submits a final tax return D1 and a resident tax return D2 to the city hall. Workplaces K1 and K2 of resident H1 submit salary payment reports D3 and D4 concerning resident H1, and pension organization K3 submits a pension payment report D5 concerning resident H1. City hall employee H2 compares the entries for each item in the submitted final tax return D1, resident tax return D2, salary payment reports D3 and D4, and pension payment report D5. Employee H2 then detects inconsistent items and corrects the data for those items.

As support for such document correction work, there is a conventional technique that checks, according to existing rules, the contents of invoices and statements received from the servers of medical institutions. Another conventional technique judges the validity of item content by whether the semantic vector of each item's content in a document deviates from the centroid of the semantic vectors of the corresponding item in pre-registered reference documents. Another conventional technique displays the evaluation result obtained by inputting material data to be evaluated into a trained model obtained by machine learning on corrected material data. Yet another conventional technique updates a model by using a new document set as training data.

JP 2007-241986 A
JP 2020-140442 A
JP 2018-147280 A
JP 2021-89473 A

The submitted documents may include multiple documents of the same type, such as salary payment reports D3 and D4. In such cases, city hall employee H2 takes the relationships between the documents into account and decides which of the same-type documents to correct.

However, the conventional techniques described above do not take into account the relationships among the multiple submitted documents, so even when an inconsistent item can be identified, it is difficult to identify which document should be corrected.

In one aspect, an object is to provide a document correction support program, a document correction support method, and an information processing device that make it easy to identify the document to be corrected.

In one proposal, the document correction support program causes a computer to execute an input process and an output process. The input process inputs explanatory variables generated from each document to be judged into a machine-learned model. The model is trained on learning data of a plurality of cases, each including the correction history of its documents. For each case, the training uses (i) explanatory variables including the difference between a value calculated, according to a predefined restoration calculation, from the value of an item in a specific first document and the value of an item in a specific second document, and the value of the item in the first document, and (ii) objective variables including values corresponding to the correction history of each document in the case. The output process outputs, based on the output from the model, the presence or absence of correction for each document to be judged.

Documents to be corrected can be easily identified.

FIG. 1 is a block diagram illustrating an example of the functional configuration of an information processing device according to an embodiment.
FIG. 2 is an explanatory diagram illustrating an example of restoration calculation definition information.
FIG. 3 is a flowchart illustrating an example of the operation of the information processing device according to the embodiment.
FIG. 4 is a flowchart illustrating an example of the learning process of the information processing device according to the embodiment.
FIG. 5 is an explanatory diagram illustrating an example of learning data.
FIG. 6 is a flowchart illustrating an example of the determination process of the information processing device according to the embodiment.
FIG. 7 is a block diagram illustrating an example of a computer configuration.
FIG. 8 is an explanatory diagram illustrating an example of correcting document deficiencies.

The document correction support program, document correction support method, and information processing device according to the embodiments are described below with reference to the drawings. Components having the same function in the embodiments are given the same reference numerals, and duplicate descriptions are omitted. The document correction support program, document correction support method, and information processing device described in the following embodiments are merely examples and do not limit the embodiments. The following embodiments may also be combined as appropriate to the extent that they do not contradict one another.

FIG. 1 is a block diagram illustrating an example of the functional configuration of an information processing device according to an embodiment. As shown in FIG. 1, the information processing device 1 has an input unit 10, an objective variable generation unit 20, an explanatory variable generation unit 30, a storage unit 40, a learning unit 50, and a determination unit 60.

The information processing device 1 is an example of a learning device that generates a learning model 71 by machine learning with corrected material data 11 as learning data. The information processing device 1 is also an example of a determination device that uses the generated learning model 71 to determine, and output, the presence or absence of correction for each document included in the uncorrected material data 12 to be judged.

The learning device and the determination device described above may be realized by a single information processing device 1 or may be realized separately. For example, the information processing device 1 may be a learning device having the input unit 10, the objective variable generation unit 20, the explanatory variable generation unit 30, the storage unit 40, and the learning unit 50. The information processing device 1 may also be a determination device having the input unit 10, the objective variable generation unit 20, the explanatory variable generation unit 30, the storage unit 40, and the determination unit 60.

The input unit 10 is a processing unit that accepts input of various data such as the corrected material data 11 and the uncorrected material data 12.

The corrected material data 11 is data describing the corrected content of each document in each case. For example, for each case, the corrected material data 11 includes the content of each document (material), such as the value entered in each item, and the presence or absence of correction in each document, such as the deficient item and the corrected value for that item. As an example, the corrected material data 11 includes the item values of each document submitted by resident H1 and corrected by employee H2 in tax work, together with the presence or absence of correction.

In machine learning, based on the corrected material data 11, feature values corresponding to the content of each document are generated as explanatory variables, and values corresponding to the presence or absence of correction are generated as objective variables, for each case. The parameters of a model such as a neural network are then determined so that, when the generated explanatory variables are input to the input layer of the model, the objective variables are obtained from its output layer.

The uncorrected material data 12 is data describing the content of each document for the case to be judged. For example, the uncorrected material data 12 includes the content of each document (material) in the case to be judged, such as the value entered in each item. As an example, the uncorrected material data 12 includes the item values of each document submitted by resident H1 in tax work.

The objective variable generation unit 20 is a processing unit that generates objective variables based on the corrected material data 11. Specifically, during machine learning, for each case included in the corrected material data 11, the objective variable generation unit 20 generates objective variables that express numerically the presence or absence of correction in each document of the case.
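The objective variable described above can be illustrated with a minimal sketch. The source does not specify how the presence or absence of correction is converted to a number, so the 0/1 encoding, the dictionary layout, the item name `salary`, and the function name `correction_label` are all assumptions made for illustration.

```python
from typing import Mapping


def correction_label(submitted: Mapping[str, float],
                     corrected: Mapping[str, float]) -> int:
    """Return 1 if any item value differs between the two versions, else 0.

    Assumed encoding: 1 = document was corrected, 0 = document was not corrected.
    """
    items = set(submitted) | set(corrected)
    return int(any(submitted.get(i) != corrected.get(i) for i in items))


# Hypothetical case: two salary payment reports, of which only D4 was corrected.
submitted = {"D3": {"salary": 300}, "D4": {"salary": 500}}
corrected = {"D3": {"salary": 300}, "D4": {"salary": 450}}

labels = {doc: correction_label(submitted[doc], corrected[doc]) for doc in submitted}
print(labels)
```

Each document in a case receives its own label, which matches the per-document objective variables stored in the learning data.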

The explanatory variable generation unit 30 is a processing unit that generates explanatory variables based on the corrected material data 11 or the uncorrected material data 12. Specifically, for each case in the corrected material data 11 or the uncorrected material data 12, the explanatory variable generation unit 30 generates explanatory variables that express numerically the content of each document (material), such as the value entered in each item.

During machine learning, for each case in the corrected material data 11, the information processing device 1 combines the objective variables generated for each document by the objective variable generation unit 20 with the explanatory variables generated by the explanatory variable generation unit 30, for example as an array with one row per document, and uses the result as learning data 70 for machine learning. During determination using the learning model 71, the information processing device 1 arranges the explanatory variables generated for each document by the explanatory variable generation unit 30 from the uncorrected material data 12, for example as an array with one row per document, and uses the result as evaluation data 80.

The explanatory variables generated by the explanatory variable generation unit 30 include the difference between (a) a value calculated, based on a predefined restoration calculation, from the values of items in a specific document and in another specific document separate from it (called the other document) and (b) the value of the item in the specific document. In other words, they include a value corresponding to the relationship between the specific document and the other document.

Here, the specific document and the other document are designated in advance in the definition of the restoration calculation. For example, designating a document type in the definition designates documents of the same type. If the definition of the restoration calculation designates salary payment reports, the multiple salary payment reports submitted in each case (for example, salary payment reports D3 and D4) are treated as the specific document and the other document.

The restoration calculation defines a typical "way the materials are split" between the specific document and the other document, and restores the value from that split. As an example, for each material type, the restoration calculation is the calculation that employee H2 performs mentally when estimating the correct value of an item across the multiple documents submitted by resident H1 (for example, salary payment reports D3 and D4). That is, the result of the restoration calculation is a candidate for the correct value for each material type. The result also represents a characteristic of the materials of that type as a whole (for example, salary payment reports D3 and D4). Therefore, the difference between the calculation result and the item value of a specific material indirectly expresses the relationship between that material and the other materials.

To generate the explanatory variables described above, the explanatory variable generation unit 30 has a restoration calculation unit 31 and a difference calculation unit 32.

The restoration calculation unit 31 is a processing unit that, based on the restoration calculation definition information 41 stored in the storage unit 40, performs the restoration calculation from the item values of a specific document included in each case of the corrected material data 11 or the uncorrected material data 12 and of the other documents separate from that document.

The difference calculation unit 32 is a processing unit that, based on the restoration calculation definition information 41, calculates the difference between the result of the restoration calculation by the restoration calculation unit 31 and the value of the item in the specific document.

FIG. 2 is an explanatory diagram illustrating an example of the restoration calculation definition information 41. As shown in FIG. 2, the restoration calculation definition information 41 defines typical "ways the materials are split" between a specific document and other documents, and shows the "restoration calculation" that restores each of them. The restoration calculation definition information 41 is prepared for each document type, such as the final tax return, D2, and D3, and defines, for each document type, the content of the restoration calculation for each item across documents. In the following examples, the material values handled by the restoration calculation are limited to zero and positive values, but the values handled by the restoration calculation are not limited to these.

For example, one "way the materials are split" defined in the restoration calculation definition information 41 is the case where the value in one material (for example, salary payment report D3) is included in the value in another material (for example, salary payment report D4). The "restoration calculation" for this case takes the maximum of the values across the materials. The difference between the result of this restoration calculation and the value in a given material therefore means the following: if it is 0, this material holds the maximum value; if it is non-zero, one of the other materials holds the maximum value.

Another "way the materials are split" defined in the restoration calculation definition information 41 is the case where values are recorded in separate materials and do not overlap between materials. The "restoration calculation" for this case takes the sum of the values in all materials. The difference between the result of this restoration calculation and the value in a given material therefore means the following: if it is 0, only this material has a positive value; if it is non-zero, other materials also have positive values.

Another "way the materials are split" defined in the restoration calculation definition information 41 is the case where the value of a different item is mistakenly entered in an item of a specific material. The "restoration calculation" for this case subtracts the value in the material concerned from the sum of the values in all materials. The difference between the result of this restoration calculation and the value in a given material therefore means the following: if it is 0, only this material and the material concerned have positive values; if it is non-zero, either other materials also have positive values, or this material is the material concerned, or its value is 0.
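The three restoration calculations described above, and the difference taken against each material, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation in the embodiment: the function names are hypothetical, the sign convention (restored value minus material value) is assumed, and the index of the "material concerned" in the third case is passed in explicitly because the source does not say how it is chosen.

```python
from typing import List, Sequence


def restore_max(values: Sequence[float]) -> float:
    """Case 1: one material's value is included in another's, so take the maximum."""
    return max(values)


def restore_sum(values: Sequence[float]) -> float:
    """Case 2: values are recorded separately without overlap, so take the sum."""
    return sum(values)


def restore_sum_excluding(values: Sequence[float], concerned: int) -> float:
    """Case 3: an item was mis-entered in one material (index `concerned`, assumed
    to be known), so subtract that material's value from the sum."""
    return sum(values) - values[concerned]


def differences(values: Sequence[float], restored: float) -> List[float]:
    """Difference between the restored value and each material's own value."""
    return [restored - v for v in values]


# Hypothetical item values taken from two salary payment reports (D3, D4).
pair = [300, 500]
diff_max = differences(pair, restore_max(pair))  # 0 marks the material holding the maximum
diff_sum = differences(pair, restore_sum(pair))  # non-zero: the other material is also positive

# Hypothetical three-material case where the material at index 2 is the one concerned.
triple = [120, 0, 80]
diff_excl = differences(triple, restore_sum_excluding(triple, concerned=2))

print(diff_max, diff_sum, diff_excl)
```

In the three-material example, the first material gets a difference of 0 because only it and the material concerned hold positive values, which matches the interpretation given in the text.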

The storage unit 40 is a storage device such as a memory or an HDD (Hard Disk Drive), and stores the restoration calculation definition information 41.

The learning unit 50 is a processing unit that generates the learning model 71 by performing known machine learning on each case of the learning data 70, based on the explanatory variables generated by the explanatory variable generation unit 30 and the objective variables generated by the objective variable generation unit 20. Examples of the machine learning performed by the learning unit 50 include decision trees, random forests, and deep learning. In the case of deep learning, for example, the learning unit 50 generates the learning model 71 by determining the hidden-layer parameters so that, when the explanatory variables are input, an output corresponding to the objective variables is produced.

The determination unit 60 is a processing unit that inputs evaluation data 80, which includes the explanatory variables generated from the uncorrected material data 12 for the case to be judged, into the learning model 71 and obtains the determination result for that case. Specifically, the determination unit 60 reads the parameters obtained through machine learning by the learning unit 50 and constructs the learning model 71. Next, the determination unit 60 inputs into the learning model 71 the explanatory variables generated from the uncorrected material data 12, that is, the feature values of each document to be judged. The determination unit 60 then obtains, from the output of the learning model 71, a confidence (evaluation value) indicating the presence or absence of correction for each document to be judged. Based on the obtained confidence, for example by comparing it with a predetermined threshold, the determination unit 60 outputs a determination result 81 indicating the presence or absence of correction for each document.
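The final thresholding step described above can be sketched as follows. The source only says that the confidence is compared with a predetermined threshold, so the threshold value of 0.5, the use of `>=`, the example confidence values, and the function name `judge_corrections` are assumptions for illustration.

```python
from typing import Dict


def judge_corrections(confidences: Dict[str, float],
                      threshold: float = 0.5) -> Dict[str, bool]:
    """Map each document's confidence to True (correction indicated) or False.

    The default threshold and the inclusive comparison are assumed, not taken
    from the source.
    """
    return {doc: score >= threshold for doc, score in confidences.items()}


# Hypothetical model outputs for two salary payment reports.
confidences = {"D3": 0.12, "D4": 0.87}
result = judge_corrections(confidences)
print(result)
```

Raising the threshold would flag fewer documents for correction, and lowering it would flag more, so in practice the value would need to be chosen for the workload at hand.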

FIG. 3 is a flowchart illustrating an example of the operation of the information processing device according to the embodiment. As shown in FIG. 3, the information processing device 1 uses the corrected material data 11 as learning data and performs a learning process in which the results of the restoration calculation based on the restoration calculation definition information 41 are included in the explanatory variables (S1), thereby generating the learning model 71.

FIG. 4 is a flowchart illustrating an example of the learning process of the information processing device 1 according to the embodiment. As shown in FIG. 4, when the learning process starts, the restoration calculation unit 31 acquires the values of each submitted document (material) for each case (resident) included in the corrected material data 11. Next, based on the definitions in the restoration calculation definition information 41, the restoration calculation unit 31 performs the restoration calculation from the item values of a specific document and of the other documents separate from that document (S10).

Next, the difference calculation unit 32 calculates the difference between the restoration calculation result from the restoration calculation unit 31 and the value in the material (S11), and stores the result in the learning data 70 as an explanatory variable (S12).

Next, the explanatory variable generation unit 30 determines whether explanatory variables have been generated for all materials included in the case (S13). If processing has not been completed for all materials included in the case (S13: No), the explanatory variable generation unit 30 returns the process to S10.

If processing has been completed for all materials included in the case (S13: Yes), the explanatory variable generation unit 30 determines whether all restoration calculations defined in the restoration calculation definition information 41 have been processed (S14). If not all restoration calculations have been processed (S14: No), the explanatory variable generation unit 30 returns the process to S10.

If all restoration calculations have been processed (S14: Yes), the explanatory variable generation unit 30 determines whether explanatory variables have been generated for all items of the materials (S15). If not all items have been processed (S15: No), the explanatory variable generation unit 30 returns the process to S10.

If all items have been processed (S15: Yes), the explanatory variable generation unit 30 determines whether all residents (cases) have been processed (S16). If not all residents have been processed (S16: No), the explanatory variable generation unit 30 returns the process to S10.

If all residents have been processed (S16: Yes), the objective variable generation unit 20 calculates, for each case (resident) included in the corrected material data 11, the presence or absence of correction for each submitted document (material) (S17). Next, the objective variable generation unit 20 stores the calculated presence or absence of correction for each material in the learning data 70 as an objective variable (S18).

Next, the objective variable generation unit 20 determines whether objective variables have been generated for all materials (S19). If not all materials have been processed (S19: No), the objective variable generation unit 20 returns the process to S17.

If all materials have been processed (S19: Yes), the learning unit 50 performs machine learning based on the learning data 70 to generate the learning model 71 (S20), and ends the process.
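The explanatory-variable part of this flow (S10 to S16) amounts to nested loops over residents, items, restoration calculations, and materials. The sketch below illustrates that loop order only. The data layout, the feature naming `item:operation`, the function name `build_explanatory_rows`, and the simplification that all materials in a case are of the same type are assumptions; Python's built-in `max` and `sum` stand in for two of the restoration calculations.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Case = Dict[str, Dict[str, float]]           # material id -> {item name: value}
Operation = Callable[[Sequence[float]], float]


def build_explanatory_rows(
    cases: Dict[str, Case],
    operations: Dict[str, Operation],
) -> List[Tuple[str, str, Dict[str, float]]]:
    """Return one (resident, material, features) row per material."""
    rows = []
    for resident, materials in cases.items():                  # loop closed by S16
        items = sorted({i for m in materials.values() for i in m})
        features: Dict[str, Dict[str, float]] = {mid: {} for mid in materials}
        for item in items:                                     # loop closed by S15
            # Assumption: a missing entry is treated as 0.
            values = {mid: m.get(item, 0) for mid, m in materials.items()}
            for op_name, op in operations.items():             # loop closed by S14
                restored = op(list(values.values()))           # S10
                for mid, value in values.items():              # loop closed by S13
                    # S11 and S12: difference stored as an explanatory variable.
                    features[mid][f"{item}:{op_name}"] = restored - value
        rows.extend((resident, mid, f) for mid, f in features.items())
    return rows


# Hypothetical learning input: resident A with two salary payment reports.
cases = {"A": {"D3": {"salary": 300}, "D4": {"salary": 500}}}
rows = build_explanatory_rows(cases, {"max": max, "sum": sum})
for row in rows:
    print(row)
```

The objective variables of S17 to S19 would then be attached to the same per-material rows before the model is trained in S20.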

FIG. 5 is an explanatory diagram illustrating an example of the learning data 70. As shown in FIG. 5, when machine learning is performed, learning data 70 is generated for each case (resident A, resident B) included in the corrected material data 11, based on the corrected materials (1), (2), (3), and so on, with feature values corresponding to the content of each material as explanatory variables and values corresponding to the presence or absence of correction as objective variables.

Here, the explanatory variables in the learning data 70 include the difference between a value calculated, based on the restoration calculation defined in the restoration calculation definition information 41, from the item values of a specific document and of the other documents separate from it, and the value of the item in the specific document. In other words, they include a value corresponding to the relationship between the specific document and the other documents.

For example, the explanatory variables for material (1) of resident A include, for the items of material (1), the differences from the values obtained by restoration calculations a, b, c, and so on, that is, values corresponding to the relationship between material (1) and the other documents (for example, materials (2), (3), and so on). Similarly, the explanatory variables for material (2) include, for the items of material (2), the differences from the values obtained by restoration calculations a, b, c, and so on, that is, values corresponding to the relationship between material (2) and the other documents (for example, materials (1), (3), and so on). The explanatory variables for material (3) likewise include, for the items of material (3), the differences from the values obtained by restoration calculations a, b, c, and so on, that is, values corresponding to the relationship between material (3) and the other documents (for example, materials (1), (2), and so on). The same applies to the other cases (resident B and so on).

By performing machine learning on learning data 70 that includes such explanatory variables, the information processing device 1 can generate a learning model 71 that has learned the relationships between documents.

Returning to FIG. 3, following S1, the information processing device 1 applies the generated learning model 71 to the uncorrected material data 12 and performs a determination process that determines the presence or absence of correction for each document included in the uncorrected material data 12 (S2). The information processing device 1 thereby obtains the determination result 81 indicating the presence or absence of correction for each document, and ends the process.

FIG. 6 is a flowchart showing an example of the judgment process of the information processing device 1 according to the embodiment. As shown in FIG. 6, when the judgment process starts, the restoration calculation unit 31 acquires the values of each submitted document (material) for the case (resident) to be judged that is included in the uncorrected material data 12. Next, based on the definitions in the restoration calculation definition information 41, the restoration calculation unit 31 performs a restoration operation using the item values of a specific document and of another document separate from that document (S30).

Next, the difference calculation unit 32 calculates the difference between the restoration operation result obtained by the restoration calculation unit 31 and the value in the document (S31), and stores the result in the evaluation data 80 as an explanatory variable (S32).

Next, the explanatory variable generation unit 30 determines whether the generation of explanatory variables has been completed for all documents included in the case to be judged (S33). If it has not been completed for all documents (S33: No), the explanatory variable generation unit 30 returns the process to S30.

If processing has been completed for all documents included in the case to be judged (S33: Yes), the explanatory variable generation unit 30 determines whether all restoration operations defined in the restoration calculation definition information 41 have been processed (S34). If not all restoration operations have been processed (S34: No), the explanatory variable generation unit 30 returns the process to S30.

If all restoration operations have been processed (S34: Yes), the explanatory variable generation unit 30 determines whether explanatory variables have been generated for all items of the documents (S35). If not all items have been processed (S35: No), the explanatory variable generation unit 30 returns the process to S30.

If all items have been processed (S35: Yes), the explanatory variable generation unit 30 determines whether all residents (cases) to be judged have been processed (S36). If not all residents have been processed (S36: No), the explanatory variable generation unit 30 returns the process to S30.

When all residents have been processed (S36: Yes), the judgment unit 60 inputs the evaluation data 80 into the learning model 71 to judge whether each document of the case to be judged needs correction, obtains the judgment result 81 (S37), and ends the process.
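The nested loops of this flow can be sketched as below. This is a rough illustration under stated assumptions: the data layout (`Case`), the function names `build_evaluation_data` and `judge`, the sample values, and `placeholder_model` are hypothetical. In particular, `placeholder_model` is an arbitrary threshold rule standing in for the learning model 71, not a trained model. The restoration result is computed once per item and operation and then compared with every document, which reorders the loops of the flowchart without changing the resulting differences.

```python
from typing import Callable, Dict, List, Sequence

# item name -> {document id -> value}; this layout is an assumption for illustration.
Case = Dict[str, Dict[str, float]]


def build_evaluation_data(
    cases: Dict[str, Case],
    ops: Sequence[Callable[[List[float]], float]],
) -> Dict[str, Dict[str, List[float]]]:
    """Collect, per resident and document, the differences for every item and operation."""
    evaluation: Dict[str, Dict[str, List[float]]] = {}
    for resident, items in cases.items():                  # loop closed by S36
        per_doc: Dict[str, List[float]] = {}
        for doc_values in items.values():                  # loop closed by S35
            values = list(doc_values.values())
            for op in ops:                                 # loop closed by S34
                restored = op(values)                      # restoration operation (S30)
                for doc_id, value in doc_values.items():   # loop closed by S33
                    # difference (S31), stored as an explanatory variable (S32)
                    per_doc.setdefault(doc_id, []).append(restored - value)
        evaluation[resident] = per_doc
    return evaluation


def judge(
    evaluation: Dict[str, Dict[str, List[float]]],
    model: Callable[[List[float]], bool],
) -> Dict[str, Dict[str, bool]]:
    """Apply the model to each document's explanatory variables (S37)."""
    return {
        resident: {doc_id: model(feats) for doc_id, feats in docs.items()}
        for resident, docs in evaluation.items()
    }


def placeholder_model(features: List[float]) -> bool:
    # Arbitrary stand-in rule, NOT the learning model 71.
    return sum(features) > 350.0


cases = {
    "resident_A": {
        "salary": {"doc1": 300.0, "doc2": 300.0},
        "pension": {"doc1": 0.0, "doc2": 50.0},
    }
}
evaluation = build_evaluation_data(cases, [max, sum])
result = judge(evaluation, placeholder_model)
```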

As described above, based on the corrected material data 11 of a plurality of cases including the correction history of each document, the information processing device 1 generates, for each case, explanatory variables that include the difference between (i) a value calculated, based on a predefined restoration operation, from the value of an item included in a specific first document and the value of an item included in a specific second document and (ii) the value of the item included in the first document. The information processing device 1 then generates the learning model 71 by machine learning based on the generated explanatory variables and an objective variable including a value corresponding to the correction history of each document of the case. The information processing device 1 inputs to this learning model 71 explanatory variables generated from each document to be judged in the same manner as during learning and, based on the output from the learning model 71, outputs the presence or absence of a correction in each document to be judged.
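The pairing of explanatory variables with an objective variable derived from the correction history, and the reuse of the same feature form at judgment time, might look like the sketch below. The text does not name a learning algorithm, so a 1-nearest-neighbour rule is swapped in purely as a stand-in; `build_training_rows`, `predict_1nn`, and all sample values are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Row = Tuple[List[float], int]  # (explanatory variables, objective variable)


def build_training_rows(
    features: Dict[str, List[float]],
    corrected_docs: Set[str],
) -> List[Row]:
    """Label each document 1 if its correction history shows a correction, else 0."""
    return [
        (feats, 1 if doc_id in corrected_docs else 0)
        for doc_id, feats in sorted(features.items())
    ]


def predict_1nn(rows: List[Row], query: List[float]) -> int:
    """Stand-in model: return the label of the closest training row."""
    nearest = min(rows, key=lambda row: math.dist(row[0], query))
    return nearest[1]


# Hypothetical learning data: doc2 was corrected by staff, doc1 was not.
rows = build_training_rows(
    {"doc1": [0.0, 200.0], "doc2": [100.0, 300.0]},
    corrected_docs={"doc2"},
)
# Explanatory variables of documents to be judged, generated the same way as in learning.
needs_fix = predict_1nn(rows, [90.0, 310.0])
no_fix = predict_1nn(rows, [0.0, 190.0])
```

Any classifier that accepts the same rows could replace the stand-in; the point is only that learning and judgment share one feature definition.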

In this way, the explanatory variables input to the learning model 71 used for the judgment capture the relationship between documents, expressed as the difference between the value restored from the item values of a specific first document and a specific second document and the item value of the first document. Because the information processing device 1 uses this learning model 71 to obtain the presence or absence of a correction in each document to be judged, it can identify the documents that should be corrected in accordance with the relationships between the documents.

Further, the restoration operation in the information processing device 1 is an operation that finds the maximum of the item values across different specific documents. From the difference between the result of this restoration operation and the value of the first document, the information processing device 1 can include in the explanatory variables a feature indicating a relationship between documents: if the difference is 0, the value of the first document is the maximum; if it is non-zero, the value of one of the other documents is the maximum. For example, one typical pattern of how values are split across documents is the case in which the value of one document (e.g., the first document) is subsumed in the value of another document (e.g., the second document). With the above restoration operation, the information processing device 1 can include a feature for such a case in the explanatory variables.
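A small sketch of the maximum-based feature follows. The function name `max_restoration_diff`, the sample amounts, and the sign convention (restoration result minus the first document's value) are assumptions for illustration.

```python
from typing import List


def max_restoration_diff(first_value: float, other_values: List[float]) -> float:
    """Maximum over all documents minus the first document's value."""
    restored = max([first_value, *other_values])
    return restored - first_value


# The first document holds the largest value, so the difference is 0.
first_is_max = max_restoration_diff(500.0, [300.0, 200.0])
# Another document holds a larger value, so the difference is non-zero.
other_is_max = max_restoration_diff(300.0, [500.0])
```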

Further, the restoration operation in the information processing device 1 is an operation that finds the sum of the item values across different specific documents. From the difference between the result of this restoration operation and the value of the first document, the information processing device 1 can include in the explanatory variables a feature indicating a relationship between documents: if the difference is 0, only the first document has a positive value; if it is non-zero, other documents also have positive values. For example, one typical pattern of how values are split across documents is the case in which values are recorded in separate documents and do not overlap. With the above restoration operation, the information processing device 1 can include a feature for such a case in the explanatory variables.
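The sum-based feature can be illustrated in the same way. The function name `sum_restoration_diff` and the amounts are hypothetical, the values are assumed to be non-negative, and the difference is again taken as the restoration result minus the first document's value.

```python
from typing import List


def sum_restoration_diff(first_value: float, other_values: List[float]) -> float:
    """Sum over all documents minus the first document's value."""
    restored = first_value + sum(other_values)
    return restored - first_value


# Only the first document has a positive value, so the difference is 0.
only_first = sum_restoration_diff(400.0, [0.0, 0.0])
# Another document also has a positive value, so the difference is non-zero.
split_values = sum_restoration_diff(400.0, [150.0, 0.0])
```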

Further, the restoration operation in the information processing device 1 is an operation that subtracts, from the sum of the item values across different specific documents, the item value of a specific third document included in either the first document or the second document. From the difference between the result of this restoration operation and the value of the first document, the information processing device 1 can include in the explanatory variables a feature indicating a relationship between documents: if the difference is 0, only the first document and the subtracted third document have positive values; if it is non-zero, either other documents also have positive values, or the document in question is itself the third document, or its value is 0. For example, one typical pattern of how values are split across documents is the case in which the value of a different item is entered in a specific document by mistake. With the above restoration operation, the information processing device 1 can include a feature for such a case in the explanatory variables.
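The three outcomes described above can be checked with a short sketch. The function name `sum_minus_third_diff`, the document ids, and the amounts are hypothetical, and the difference is taken as the restoration result minus the value of the document being examined.

```python
from typing import Dict


def sum_minus_third_diff(values: Dict[str, float], first: str, third: str) -> float:
    """(Sum over all documents - third document's value) - examined document's value."""
    restored = sum(values.values()) - values[third]
    return restored - values[first]


# Only the first and third documents have positive values: difference is 0.
only_first_and_third = sum_minus_third_diff(
    {"doc1": 400.0, "doc2": 0.0, "doc3": 120.0}, first="doc1", third="doc3"
)
# Another document also has a positive value: difference is non-zero.
other_positive = sum_minus_third_diff(
    {"doc1": 400.0, "doc2": 50.0, "doc3": 120.0}, first="doc1", third="doc3"
)
# The examined document is itself the third document: difference is non-zero.
examined_is_third = sum_minus_third_diff(
    {"doc1": 400.0, "doc2": 0.0, "doc3": 120.0}, first="doc3", third="doc3"
)
```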

Further, the first document and the second document in the information processing device 1 are documents of the same type. This allows the information processing device 1 to identify the document to be corrected from the relationships between documents of the same type.

(Others)
Note that the components of each illustrated device do not necessarily have to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to that illustrated, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, in the information processing device 1, the configuration that generates the learning model 71 and the configuration that makes judgments based on the generated learning model 71 may be separated.

Further, all or any part of the various processing functions of the information processing device 1 (the input unit 10, the objective variable generation unit 20, the explanatory variable generation unit 30, the learning unit 50, and the judgment unit 60) may be executed on a CPU (or a microcomputer such as an MPU or an MCU (Micro Controller Unit)). Needless to say, all or any part of the various processing functions may also be executed by a program analyzed and executed by a CPU (or a microcomputer such as an MPU or an MCU), or by hardware using wired logic. The various processing functions of the information processing device 1 may also be executed by a plurality of computers working together through cloud computing.

(Example of computer configuration)
The various processes described in the above embodiment can be realized by executing a program prepared in advance on a computer. An example of a computer configuration (hardware) that executes a program having the same functions as the above embodiment is therefore described below. FIG. 7 is a block diagram showing an example of the computer configuration.

As shown in FIG. 7, the computer 200 has a CPU 201 that executes various arithmetic processes, an input device 202 that accepts data input, a monitor 203, and a speaker 204. The computer 200 also has a medium reading device 205 that reads programs and the like from a storage medium, an interface device 206 for connecting to various devices, and a communication device 207 for wired or wireless communication with external devices. The information processing device 1 also has a RAM 208 that temporarily stores various information, and a hard disk device 209. The units (201 to 209) in the computer 200 are connected to a bus 210.

The hard disk device 209 stores a program 211 for executing the various processes of the functional configuration described in the above embodiment (e.g., the input unit 10, the objective variable generation unit 20, the explanatory variable generation unit 30, the learning unit 50, and the judgment unit 60). The hard disk device 209 also stores various data 212 referenced by the program 211. The input device 202 accepts, for example, input of operation information from an operator. The monitor 203 displays, for example, various screens operated by the operator. The interface device 206 is connected to, for example, a printing device. The communication device 207 is connected to a communication network such as a LAN (Local Area Network) and exchanges various information with external devices via the communication network.

The CPU 201 reads the program 211 stored in the hard disk device 209, loads it into the RAM 208, and executes it, thereby performing the various processes of the above functional configuration (e.g., the input unit 10, the objective variable generation unit 20, the explanatory variable generation unit 30, the learning unit 50, and the judgment unit 60). The program 211 does not have to be stored in the hard disk device 209. For example, the computer 200 may read and execute the program 211 stored in a storage medium readable by the computer 200. Storage media readable by the computer 200 include, for example, portable recording media such as CD-ROMs, DVD disks, and USB (Universal Serial Bus) memories, semiconductor memories such as flash memories, and hard disk drives. Alternatively, the program 211 may be stored in a device connected to a public line, the Internet, a LAN, or the like, and the computer 200 may read the program 211 from that device and execute it.

The following supplementary notes are further disclosed with respect to the above embodiment.

(Supplementary Note 1) A document correction support program that causes a computer to execute a process comprising:
inputting explanatory variables generated based on each document to be judged to a model that has been machine-learned, based on learning data of a plurality of cases including the correction history of each document, from (i) explanatory variables that include, for each of the cases, the difference between a value calculated, based on a predefined restoration operation, from the value of an item included in a specific first document and the value of an item included in a specific second document and the value of the item included in the first document, and (ii) an objective variable including a value corresponding to the correction history of each document of the case; and
outputting, based on the output from the model, the presence or absence of a correction in each document to be judged.

(Supplementary Note 2) The document correction support program according to Supplementary Note 1, wherein the restoration operation is an operation that finds the maximum of the item values across different specific documents.

(Supplementary Note 3) The document correction support program according to Supplementary Note 1, wherein the restoration operation is an operation that finds the sum of the item values across different specific documents.

(Supplementary Note 4) The document correction support program according to Supplementary Note 1, wherein the restoration operation is an operation that subtracts, from the sum of the item values across different specific documents, the item value of a specific third document included in either the first document or the second document.

(Supplementary Note 5) The document correction support program according to any one of Supplementary Notes 1 to 4, wherein the first document and the second document are documents of the same type.

(Supplementary Note 6) A document correction support method in which a computer executes a process comprising:
inputting explanatory variables generated based on each document to be judged to a model that has been machine-learned, based on learning data of a plurality of cases including the correction history of each document, from (i) explanatory variables that include, for each of the cases, the difference between a value calculated, based on a predefined restoration operation, from the value of an item included in a specific first document and the value of an item included in a specific second document and the value of the item included in the first document, and (ii) an objective variable including a value corresponding to the correction history of each document of the case; and
outputting, based on the output from the model, the presence or absence of a correction in each document to be judged.

(Supplementary Note 7) The document correction support method according to Supplementary Note 6, wherein the restoration operation is an operation that finds the maximum of the item values across different specific documents.

(Supplementary Note 8) The document correction support method according to Supplementary Note 6, wherein the restoration operation is an operation that finds the sum of the item values across different specific documents.

(Supplementary Note 9) The document correction support method according to Supplementary Note 6, wherein the restoration operation is an operation that subtracts, from the sum of the item values across different specific documents, the item value of a specific third document included in either the first document or the second document.

(Supplementary Note 10) The document correction support method according to any one of Supplementary Notes 6 to 9, wherein the first document and the second document are documents of the same type.

(Supplementary Note 11) An information processing device in which a control unit executes a process comprising:
inputting explanatory variables generated based on each document to be judged to a model that has been machine-learned, based on learning data of a plurality of cases including the correction history of each document, from (i) explanatory variables that include, for each of the cases, the difference between a value calculated, based on a predefined restoration operation, from the value of an item included in a specific first document and the value of an item included in a specific second document and the value of the item included in the first document, and (ii) an objective variable including a value corresponding to the correction history of each document of the case; and
outputting, based on the output from the model, the presence or absence of a correction in each document to be judged.

(Supplementary Note 12) The information processing device according to Supplementary Note 11, wherein the restoration operation is an operation that finds the maximum of the item values across different specific documents.

(Supplementary Note 13) The information processing device according to Supplementary Note 11, wherein the restoration operation is an operation that finds the sum of the item values across different specific documents.

(Supplementary Note 14) The information processing device according to Supplementary Note 11, wherein the restoration operation is an operation that subtracts, from the sum of the item values across different specific documents, the item value of a specific third document included in either the first document or the second document.

(Supplementary Note 15) The information processing device according to any one of Supplementary Notes 11 to 14, wherein the first document and the second document are documents of the same type.

1 ... Information processing device
10 ... Input unit
11 ... Corrected material data
12 ... Uncorrected material data
20 ... Objective variable generation unit
30 ... Explanatory variable generation unit
31 ... Restoration calculation unit
32 ... Difference calculation unit
40 ... Storage unit
41 ... Restoration calculation definition information
50 ... Learning unit
60 ... Judgment unit
70 ... Learning data
71 ... Learning model
80 ... Evaluation data
81 ... Judgment result
200 ... Computer
201 ... CPU
202 ... Input device
203 ... Monitor
204 ... Speaker
205 ... Medium reading device
206 ... Interface device
207 ... Communication device
208 ... RAM
209 ... Hard disk device
210 ... Bus
211 ... Program
212 ... Various data
D1 ... Final tax return
D2 ... Resident tax return
D3, D4 ... Salary payment report
D5 ... Pension payment report
H1 ... Resident
H2 ... Employee
K1, K2 ... Employer
K3 ... Pension service

Claims

1. A document correction support program that causes a computer to execute a process comprising:
inputting explanatory variables generated based on each document to be judged to a model that has been machine-learned, based on learning data of a plurality of cases including the correction history of each document, from (i) explanatory variables that include, for each of the cases, the difference between a value calculated, based on a predefined restoration operation, from the value of an item included in a specific first document and the value of an item included in a specific second document and the value of the item included in the first document, and (ii) an objective variable including a value corresponding to the correction history of each document of the case; and
outputting, based on the output from the model, the presence or absence of a correction in each document to be judged.

2. The document correction support program according to claim 1, wherein the restoration operation is an operation that finds the maximum of the item values among different specific documents.

3. The document correction support program according to claim 1, wherein the restoration operation is an operation that finds the sum of the item values among different specific documents.

4. The document correction support program according to claim 1, wherein the restoration operation is an operation that subtracts, from the sum of the item values among different specific documents, the item value of a specific third document included in either the first document or the second document.

5. The document correction support program according to claim 1, wherein the first document and the second document are documents of the same type.

6. A document correction support method in which a computer executes a process comprising:
inputting explanatory variables generated based on each document to be judged to a model that has been machine-learned, based on learning data of a plurality of cases including the correction history of each document, from (i) explanatory variables that include, for each of the cases, the difference between a value calculated, based on a predefined restoration operation, from the value of an item included in a specific first document and the value of an item included in a specific second document and the value of the item included in the first document, and (ii) an objective variable including a value corresponding to the correction history of each document of the case; and
outputting, based on the output from the model, the presence or absence of a correction in each document to be judged.

7. An information processing device in which a control unit executes a process comprising:
inputting explanatory variables generated based on each document to be judged to a model that has been machine-learned, based on learning data of a plurality of cases including the correction history of each document, from (i) explanatory variables that include, for each of the cases, the difference between a value calculated, based on a predefined restoration operation, from the value of an item included in a specific first document and the value of an item included in a specific second document and the value of the item included in the first document, and (ii) an objective variable including a value corresponding to the correction history of each document of the case; and
outputting, based on the output from the model, the presence or absence of a correction in each document to be judged.