JP6979294B2

JP6979294B2 - Calibration support device, calibration support method and calibration support program

Info

Publication number: JP6979294B2
Application number: JP2017132713A
Authority: JP
Inventors: 雄太人見; 秀明田森; 健太郎乾; 直観岡崎
Original assignee: 株式会社朝日新聞社
Priority date: 2017-07-06
Filing date: 2017-07-06
Publication date: 2021-12-08
Anticipated expiration: 2037-07-06
Also published as: JP2019016140A

Description

The present invention relates to a proofreading support device, a proofreading support method, and a proofreading support program.

With the recent development of artificial intelligence (AI), natural language processing using distributed representations has attracted attention, and, for example, proofreading support devices that apply natural language processing have been developed.

Non-Patent Document 1 discloses a technique in which one position in a sentence is left blank and candidate words for the blank are predicted using the distributed representations of the text before and after the blank.

"context2vec: Learning Generic Context Embedding with Bidirectional LSTM", Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL), pages 51-61, Berlin, Germany, August 7-12, 2016

The present inventors recognized that the technique of Non-Patent Document 1 might be applicable to a proofreading support device and conducted various studies. However, that technique basically predicts multiple paraphrase candidates, some of which are inappropriate for proofreading. For example, in the sentence 「宮崎駅の西口から延びる」 ("extending from the west exit of Miyazaki Station"), the paraphrase candidates for 「の」 include 「東口」 ("east exit") and 「南口」 ("south exit"). Consequently, applying the technique of Non-Patent Document 1 to a proofreading support device as-is results in insufficient proofreading accuracy and unnatural Japanese.

The present invention has been made in view of this situation. Its object is to provide a proofreading support device, a proofreading support method, and a proofreading support program that can select appropriate candidates from the proofreading candidates predicted using distributed representations and thereby support accurate proofreading.

The proofreading support device of the present invention is characterized by comprising:
a segmented sentence generation unit that divides a sentence to be proofread into processing units to generate a segmented sentence;
a proofreading target determination unit that determines, among the processing units constituting the segmented sentence, a processing unit that matches a headword in a proofreading history corpus to be a proofreading target;
a proofreading candidate prediction unit that predicts proofreading candidates using a vector of a processing unit or processing unit group located before the proofreading target, after it, or both; and
an appropriate candidate determination unit that determines, among the proofreading candidates, a proofreading candidate that matches a proofreading result corresponding to the headword in the proofreading history corpus to be an appropriate candidate.

The proofreading support method of the present invention is a proofreading support method executed by a computer, characterized by comprising:
a segmented sentence generation step of dividing a sentence to be proofread into processing units to generate a segmented sentence;
a proofreading target determination step of determining, among the processing units constituting the segmented sentence, a processing unit that matches a headword in a proofreading history corpus to be a proofreading target;
a proofreading candidate prediction step of predicting proofreading candidates using a vector of a processing unit or processing unit group located before the proofreading target, after it, or both; and
an appropriate candidate determination step of determining, among the proofreading candidates, a proofreading candidate that matches a proofreading result corresponding to the headword in the proofreading history corpus to be an appropriate candidate.

The proofreading support program of the present invention is characterized by causing a computer to execute:
a segmented sentence generation step of dividing a sentence to be proofread into processing units to generate a segmented sentence;
a proofreading target determination step of determining, among the processing units constituting the segmented sentence, a processing unit that matches a headword in a proofreading history corpus to be a proofreading target;
a proofreading candidate prediction step of predicting proofreading candidates using a vector of a processing unit or processing unit group located before the proofreading target, after it, or both; and
an appropriate candidate determination step of determining, among the proofreading candidates, a proofreading candidate that matches a proofreading result corresponding to the headword in the proofreading history corpus to be an appropriate candidate.

According to the present invention, even if the proofreading candidates predicted using distributed representations include candidates that are inappropriate for proofreading, a more appropriate candidate can be selected, enabling more accurate proofreading.

FIG. 1 is a block diagram showing an example of the configuration of the proofreading support device of the first embodiment.
FIG. 2 is a flowchart showing an example of the proofreading support method of the first embodiment.
FIG. 3 is a block diagram showing an example of the configuration of the proofreading history corpus generation unit of the first embodiment.
FIG. 4 is a block diagram showing an example of the configuration of the trained vector model generation unit of the first embodiment.
FIG. 5 is a block diagram showing an example of the configuration of the proofreading support device of the second embodiment.
FIG. 6 is a flowchart showing an example of the proofreading support method of the second embodiment.

Embodiments of the proofreading support device and proofreading support method of the present invention are described in detail below with reference to the drawings. The proofreading support device and proofreading support method of the present invention are not limited to the embodiments described below.

1. First Embodiment
≪Proofreading support device≫
FIG. 1 is a block diagram showing an example of the configuration of the proofreading support device of the present embodiment. In FIG. 1, 1 is a sentence input unit, 2 is a segmented sentence generation unit, 3 is a proofreading target determination unit, 4 is a proofreading candidate prediction unit, 5 is an appropriate candidate determination unit, 6 is an output unit, 8 is a proofreading history corpus, 9 is a morphological analysis dictionary, and 10 is a trained vector model.

＜Sentence input unit 1＞
First, a sentence to be proofread is input to the sentence input unit 1. The input method is not particularly limited; examples include keyboard input and handwriting input. The sentence input unit 1 outputs the sentence to be proofread to the segmented sentence generation unit 2.

＜Segmented sentence generation unit 2＞
The segmented sentence generation unit 2 divides the sentence to be proofread into processing units to generate a segmented sentence. For example, it performs morphological analysis of the sentence using the morphological analysis dictionary 9, which is generated in advance, and divides the sentence into morphemes and proper nouns. A proper noun that is contained in the morphological analysis dictionary 9 is preferably not divided any further. For example, ordinary morphological analysis splits a person's full name into family name and given name, yielding "family name/given name" (hereinafter, "/" indicates a boundary between processing units). However, if the morphological analysis dictionary 9 contains, for example, the full name of a well-known person, that full name is judged to be a proper noun and is treated as a single processing unit without being split into family name and given name. In this embodiment, a processing unit is in principle a morpheme or a proper noun, but a morpheme group formed by joining several consecutive morphemes (for example, 「伸びる」, formed by joining the two adjacent morphemes 「伸び／る」) may also be used as a processing unit.

The segmented sentence generation unit 2 divides the sentence to be proofread into processing units and further places a beginning-of-sentence symbol such as <bos> at the start and an end-of-sentence symbol such as <eos> at the end to generate the segmented sentence. The segmented sentence generation unit 2 outputs the generated segmented sentence to the proofreading target determination unit 3.

The segmented sentence may also be generated by a method other than morphological analysis. For example, as in the method called "SentencePiece", the sentence may be divided into processing units from a probabilistic viewpoint or with a view to ease of subsequent processing. The processing unit may also be chosen according to the language being proofread and the purpose of the proofreading: for example, a sentence in a language that uses spaces may be split at the spaces, and a sentence in a language that does not use spaces may be split into individual characters.
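The simpler segmentation options described above (splitting at spaces, or character by character) together with the boundary symbols can be sketched as follows. This is a minimal illustration only: the function name `segment` and the `uses_spaces` flag are hypothetical, and the sketch does not perform morphological analysis or SentencePiece-style segmentation.

```python
BOS, EOS = "<bos>", "<eos>"

def segment(sentence, uses_spaces):
    """Split a sentence into processing units and add boundary symbols.

    uses_spaces=True  -> split at whitespace (space-delimited languages)
    uses_spaces=False -> one processing unit per character
    """
    if uses_spaces:
        units = sentence.split()
    else:
        units = [ch for ch in sentence if not ch.isspace()]
    return [BOS] + units + [EOS]

english_units = segment("the cat sat", uses_spaces=True)
japanese_units = segment("西口から", uses_spaces=False)
```

In a morpheme-based setup, the body of `segment` would instead call a morphological analyzer backed by the morphological analysis dictionary 9.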

＜Proofreading target determination unit 3＞
The proofreading target determination unit 3 compares each processing unit of the segmented sentence with the headwords of the proofreading history corpus 8, which is generated in advance, and determines any processing unit that matches a headword to be a proofreading target. The order in which the processing units are compared is not particularly limited. The proofreading history corpus 8 is a database of accumulated past proofreading history. For example, as shown in Table 1, it holds records in which a processing unit before proofreading serves as the headword and is associated with the proofreading result, the proofreading attribute (insertion, deletion, or replacement), and the number of times the correction was made (the number of past occurrences). The proofreading target determination unit 3 therefore determines as proofreading targets those processing units of the segmented sentence that have been corrected in the past. Details of the proofreading history corpus 8 are described later.
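The headword lookup described above can be sketched as a dictionary lookup. The record layout, field names, and counts below are assumptions made for illustration; only the headword/result pairs 「２例」→「２例目」 and 「まもなく」→ deletion are taken from the examples this description attributes to Table 1.

```python
# Hypothetical in-memory form of the proofreading history corpus:
# headword -> list of past correction records (counts are made up).
HISTORY = {
    "２例": [{"result": "２例目", "attribute": "insert", "count": 3}],
    "まもなく": [{"result": "<del>まもなく</del>", "attribute": "delete", "count": 1}],
}

def find_targets(units, history):
    """Return (index, unit) for every processing unit that matches a headword."""
    return [(i, unit) for i, unit in enumerate(units) if unit in history]

units = ["<bos>", "国内", "２例", "の", "感染", "<eos>"]
targets = find_targets(units, HISTORY)
```

Units that never appear as a headword, including the boundary symbols, are simply skipped and are never offered for correction.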

The proofreading target determination unit 3 may compare all processing units with the headwords of the proofreading history corpus 8, or, as shown in the second embodiment, may compare only those processing units that an automatic error location detection unit has estimated to be error locations.

The proofreading target determination unit 3 outputs the determined proofreading targets to the proofreading candidate prediction unit 4. When there are multiple proofreading targets, they may be output one at a time or all at once. When multiple proofreading targets are output at once, a proofreading target storage unit may be provided between the proofreading target determination unit 3 and the proofreading candidate prediction unit 4 to hold the targets temporarily and pass them to the proofreading candidate prediction unit 4 one at a time.

＜Proofreading candidate prediction unit 4＞
Treating one proofreading target as a blank, the proofreading candidate prediction unit 4 predicts the processing units that could fill the blank, that is, the proofreading candidates. For this it uses the vectors (distributed representations) of the processing units or processing unit groups located before the proofreading target (the blank), after it, or preferably both; examples are character vectors, word vectors, and sentence vectors. The vector of a processing unit can be obtained, for example, from the trained vector model 10, which is generated in advance. The vector of a processing unit group can be computed from the vectors of its processing units obtained from the trained vector model 10. The trained vector model 10 is built by machine-learning vectors (distributed representations) such as word vectors for each processing unit from post-proofreading sentences proofread in the past and accumulating them as a trained model. Details of the trained vector model 10 are described later.

Examples of the prediction method include the following. Cosine similarity, for example, can be used to compute the similarity.
(1) The word vectors of several morphemes before and after the blank are obtained from the trained vector model 10, and their mean vector is computed. Morphemes whose word vectors are highly similar to the mean vector are retrieved from the trained vector model 10 and predicted as proofreading candidates.
(2) The word vectors of the morphemes contained in the morpheme groups before and after the blank are obtained from the trained vector model 10, and a sentence vector for the context around the blank is computed using, for example, "context2vec". Morphemes whose word vectors are highly similar to the sentence vector are retrieved from the trained vector model 10 and predicted as proofreading candidates.
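Method (1) above can be sketched in a few lines. The two-dimensional toy vectors, the `window` and `top_k` parameters, and the function names are all assumptions for illustration; a real trained vector model would hold much higher-dimensional vectors learned from post-proofreading text.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict_candidates(units, blank_index, vectors, window=1, top_k=2):
    """Average the context vectors around the blank and rank the vocabulary."""
    lo = max(0, blank_index - window)
    context = units[lo:blank_index] + units[blank_index + 1:blank_index + 1 + window]
    context_vecs = [vectors[u] for u in context if u in vectors]
    if not context_vecs:
        return []
    dim = len(context_vecs[0])
    mean = [sum(v[d] for v in context_vecs) / len(context_vecs) for d in range(dim)]
    scored = [(unit, cosine(mean, vec)) for unit, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy stand-in for the trained vector model 10 (values are made up).
VECTORS = {
    "駅": [1.0, 0.0],
    "から": [0.0, 1.0],
    "西口": [0.7, 0.7],
    "東口": [0.6, 0.8],
    "走る": [-1.0, 0.2],
}
units = ["<bos>", "駅", "の", "から", "<eos>"]
candidates = predict_candidates(units, blank_index=2, vectors=VECTORS)
```

With these toy values the two highest-ranked candidates are 「西口」 and 「東口」, which mirrors the problem stated earlier: similarity alone returns several plausible fillers, so a further filter against the proofreading history is needed.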

When the processing unit at the beginning or end of the sentence to be proofread is the proofreading target, the vectors of the beginning-of-sentence symbol <bos> or the end-of-sentence symbol <eos> may be used, or only the vectors of the processing units or processing unit groups after the target, or only those before it, may be used.

When predicting proofreading candidates, the proofreading candidate prediction unit 4 may also replace at least one of the other proofreading targets with one of its appropriate candidates. Replacing it with the optimum candidate is particularly preferable because it improves the accuracy of the prediction.

The proofreading candidate prediction unit 4 outputs the predicted proofreading candidates to the appropriate candidate determination unit 5. When there are multiple proofreading candidates, they may be output one at a time or all at once. When multiple proofreading candidates are output at once, a proofreading candidate storage unit may be provided between the proofreading candidate prediction unit 4 and the appropriate candidate determination unit 5 to hold the candidates temporarily and pass them to the appropriate candidate determination unit 5 one at a time. The proofreading candidate prediction unit 4 may also output the similarity of each proofreading candidate together with the candidate.

＜Appropriate candidate determination unit 5＞
Among the proofreading candidates, the appropriate candidate determination unit 5 determines as appropriate candidates those that match a proofreading result corresponding to the headword in the proofreading history corpus 8 (the headword that matches the proofreading target). The appropriate candidate determination unit 5 may determine appropriate candidates for all proofreading targets, or only for those proofreading targets whose proofreading candidates do not include the proofreading target itself.

The appropriate candidate determination unit 5 may also determine one of the appropriate candidates to be the optimum candidate. When there is only one appropriate candidate, that candidate is determined to be the optimum candidate. When there are multiple appropriate candidates, the method of determining the optimum candidate is not particularly limited. Examples include determining it in consideration of the similarity of the proofreading candidate obtained from the proofreading candidate prediction unit 4, the number of corrections recorded in the proofreading history corpus 8 for the proofreading result that matches the candidate, the part of speech of the blanked proofreading target, and the like, or determining it by breadth-first search as shown in the second embodiment.

The appropriate candidate determination unit 5 outputs the determined appropriate candidates to the output unit 6.
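The filtering performed by the appropriate candidate determination unit 5 can be sketched as below. The corpus layout and the candidate scores are made up, and the tie-breaking rule in `optimum_candidate` (highest similarity first, then highest correction count) is only one hypothetical way of combining the factors mentioned above; the description does not fix a particular rule.

```python
def appropriate_candidates(target, candidates, history):
    """Keep predicted candidates that match a past proofreading result of the target.

    candidates: list of (candidate, similarity) pairs from the predictor.
    Returns a list of (candidate, similarity, past_count) triples.
    """
    past = {rec["result"]: rec["count"] for rec in history.get(target, [])}
    return [(cand, sim, past[cand]) for cand, sim in candidates if cand in past]

def optimum_candidate(appropriate):
    """Hypothetical rule: highest similarity, ties broken by correction count."""
    if not appropriate:
        return None
    return max(appropriate, key=lambda c: (c[1], c[2]))[0]

history = {"２例": [{"result": "２例目", "attribute": "insert", "count": 3}]}
predicted = [("３例", 0.90), ("２例目", 0.82), ("２人", 0.75)]

appropriate = appropriate_candidates("２例", predicted, history)
best = optimum_candidate(appropriate)
```

Here 「３例」 has the highest similarity but is discarded because it never appeared as a past correction of 「２例」, which is the point of consulting the proofreading history.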

＜Output unit 6＞
The output unit 6 outputs the appropriate candidates in association with the sentence to be proofread. The method of association is not particularly limited; examples include the following.
(1) Output the sentence to be proofread, the proofreading target, and the appropriate candidates in association with one another.
(2) Output the sentence to be proofread, the proofreading target, and the appropriate candidates together with their degrees of appropriateness in association with one another.
(3) Output the sentence to be proofread, the proofreading target, and the optimum candidate in association with one another.
(4) Output a proofread sentence in which the proofreading target has been replaced with the optimum candidate. In this case, if the optimum candidate has the form "<del>...</del>", the processing unit is deleted. In the example of Table 1, if the optimum candidate is "<del>まもなく</del>", 「まもなく」 is deleted. If the optimum candidate consists of more morphemes than the proofreading target, the result is an insertion before or after the processing unit. In the example of Table 1, if the proofreading target is 「２例」 and the optimum candidate is 「２例目」, 「目」 is inserted after 「２例」.
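Output method (4) can be sketched as a small rewrite of the unit list. The helper names and the sample sentence are hypothetical; the `<del>...</del>` convention and the two corrections come from the examples above.

```python
import re

DEL_PATTERN = re.compile(r"^<del>(.*)</del>$")

def apply_correction(units, index, optimum):
    """Return a new unit list with units[index] replaced by the optimum
    candidate, or removed when the candidate is wrapped in <del>...</del>."""
    corrected = list(units)
    if DEL_PATTERN.match(optimum):
        del corrected[index]
    else:
        corrected[index] = optimum
    return corrected

def render(units):
    """Join processing units back into a sentence, dropping boundary symbols."""
    return "".join(u for u in units if u not in ("<bos>", "<eos>"))

units = ["<bos>", "まもなく", "２例", "が", "出る", "<eos>"]
deleted = render(apply_correction(units, 1, "<del>まもなく</del>"))
inserted = render(apply_correction(units, 2, "２例目"))
```

Because an insertion is recorded as a longer replacement string (「２例」→「２例目」), the same replace operation covers both the replacement and the insertion attributes.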

When there is no proofreading target, or when there is a proofreading target but no appropriate candidate, the output unit 6 may output only the sentence to be proofread, or may output it together with a notice that there is no proofreading target or no appropriate candidate.

The output method is not particularly limited; examples include display on a screen and printing.

≪Proofreading support method≫
FIG. 2 is a flowchart showing an example of the proofreading support method of the present embodiment.

＜Step 1 (S1)＞
When a sentence to be proofread is input to the sentence input unit 1, the segmented sentence generation unit 2 divides it into processing units to generate a segmented sentence. For example, it performs morphological analysis of the sentence using the morphological analysis dictionary 9, which is generated in advance, and divides the sentence into morphemes and proper nouns. The segmented sentence generation unit 2 further places a beginning-of-sentence symbol such as <bos> at the start and an end-of-sentence symbol such as <eos> at the end to generate the segmented sentence.

＜Steps 2 and 3 (S2, S3)＞
The proofreading target determination unit 3 compares each processing unit of the segmented sentence with the headwords of the proofreading history corpus 8, which is generated in advance. It may compare all processing units, or, as shown in the second embodiment, only those processing units that an automatic error location detection unit, which estimates error locations by machine learning, has estimated to be error locations. The order in which the processing units are compared is not particularly limited. If a processing unit does not match any headword of the proofreading history corpus 8, the proofreading target determination unit 3 moves on to compare the next processing unit. If a processing unit matches a headword, it is determined to be a proofreading target and the process proceeds to step 4.

＜Step 4 (S4)＞
Treating one proofreading target as a blank, the proofreading candidate prediction unit 4 predicts the processing units that could fill the blank, that is, the proofreading candidates, using the vectors (distributed representations) of the processing units or processing unit groups located before the proofreading target (the blank), after it, or preferably both; examples are character vectors, word vectors, and sentence vectors. How to obtain the vectors of processing units or processing unit groups, and specific examples of prediction methods using them, are as described in the section "＜Proofreading candidate prediction unit 4＞".

＜Steps 5 and 6 (S5, S6)＞
The appropriate candidate determination unit 5 compares each of the proofreading candidates with the proofreading results corresponding to the headword in the proofreading history corpus 8 (the headword that matches the proofreading target). If a proofreading candidate does not match a proofreading result, the unit moves on to compare the next proofreading candidate. If a proofreading candidate matches a proofreading result, it is determined to be an appropriate candidate. When all proofreading candidates have been examined, the process returns to step 2. The appropriate candidate determination unit 5 may determine one of the appropriate candidates to be the optimum candidate. In that case, replacing the proofreading target with the optimum candidate before returning to step 2 is preferable because it improves the accuracy of the candidate prediction in step 4.

The appropriate candidate determination unit 5 may perform this comparison with the proofreading results of the proofreading history corpus 8 for all proofreading targets, or only for those proofreading targets whose proofreading candidates do not include the proofreading target itself. In the latter case, the appropriate candidate determination unit 5 first checks whether the proofreading candidates include the proofreading target itself, and if they do, the process returns to step 2.

＜Step 7 (S7)＞
After steps 2 to 6 have been repeated until all processing units have been processed, the output unit 6 outputs the appropriate candidates in association with the sentence to be proofread.

In the method shown in FIG. 2, each time a proofreading target is determined in step 2, the process proceeds through steps 3 to 6 to decide whether that target needs correction and then returns to step 2 to examine the next processing unit. Alternatively, steps 4 to 6 may be repeated until all processing units have been processed. When processing speed is a priority, the need for correction may be judged for all proofreading targets simultaneously, without repeating steps 2 to 6.
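The loop of FIG. 2 (steps 2 to 7) can be outlined as follows. This is a sketch under stated assumptions: the predictor is passed in as a plain function, `stub_predict` is a hypothetical stand-in that returns fixed candidates, and the corpus layout is the same illustrative dictionary used nowhere in the original text.

```python
def proofread(units, history, predict):
    """Scan the segmented sentence and collect appropriate candidates per target."""
    results = []
    for index, unit in enumerate(units):            # S2: take the next processing unit
        if unit not in history:                     # S3: no matching headword, skip
            continue
        candidates = predict(units, index)          # S4: predict fillers for the blank
        past = {rec["result"] for rec in history[unit]}
        appropriate = [c for c in candidates if c in past]   # S5, S6: filter by history
        results.append((index, unit, appropriate))
    return results                                  # S7: passed on for output

def stub_predict(units, index):
    """Hypothetical predictor returning fixed candidates, for illustration only."""
    return ["３例", "２例目"]

history = {"２例": [{"result": "２例目", "attribute": "insert", "count": 3}]}
units = ["<bos>", "国内", "２例", "が", "出る", "<eos>"]
report = proofread(units, history, stub_predict)
```

The per-target iterations do not depend on one another in this sketch, which is what makes the simultaneous variant mentioned above possible; the variant that substitutes the optimum candidate before returning to step 2 would instead update `units` inside the loop.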

≪Generation of the proofreading history corpus≫
FIG. 3 is a block diagram showing an example of the configuration of the proofreading history corpus generation unit that generates the proofreading history corpus 8 used in the present embodiment. In FIG. 3, 13 is a proofread text database, 14 is a sentence pair acquisition unit, 15 is a sentence pair segmented sentence generation unit, and 16 is a proofreading history acquisition unit.

<Proofread database 13>
The proofread database 13 stores pre-proofreading texts 12 and post-proofreading texts 11 that were proofread in the past. The texts stored in the proofread database 13 are preferably in the same field as the proofreading target sentence, or in a related field. For example, when the proofreading target sentence is a newspaper article, the texts stored in the proofread database 13 are preferably newspaper articles.

<Sentence pair acquisition unit 14>
The sentence pair acquisition unit 14 acquires pairs of sentences before and after proofreading from the proofread database 13. Specifically, the sentence pair acquisition unit 14 divides each of the pre-proofreading text 12 and the post-proofreading text 11 stored in the proofread database 13 into sentences. Sentences may be delimited, for example, at the Japanese full stop (kuten, 「。」) when the proofreading target is Japanese and at the period when it is English, but the method is not limited to these and may be selected as appropriate for the language and the purpose of the proofreading. The resulting sentences are then compared to obtain sentence pairs before and after proofreading. The method of obtaining sentence pairs is not particularly limited. One example is to acquire word vectors from the vector trained model 10, use them to calculate the similarity between each pre-proofreading sentence and each post-proofreading sentence, and obtain the sentence pairs based on the calculated similarity.
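The pairing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says that similarity is computed from word vectors, so averaging the word vectors into a sentence vector, using cosine similarity, and the `threshold` value are all assumptions, and the function names are hypothetical.

```python
import math

def split_sentences(text, delimiter="。"):
    """Split text into sentences, keeping the delimiter at the end of each."""
    sentences, current = [], ""
    for ch in text:
        current += ch
        if ch == delimiter:
            sentences.append(current)
            current = ""
    if current.strip():
        sentences.append(current)
    return sentences

def sentence_vector(tokens, word_vectors):
    """Average the vectors of the tokens that have one (assumed pooling)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pair_sentences(before, after, word_vectors, threshold=0.8):
    """Pair each pre-proofreading sentence (a token list) with the most
    similar post-proofreading sentence, if it reaches the threshold."""
    pairs = []
    for b in before:
        vb = sentence_vector(b, word_vectors)
        if vb is None:
            continue
        best, best_sim = None, threshold
        for a in after:
            va = sentence_vector(a, word_vectors)
            if va is None:
                continue
            sim = cosine(vb, va)
            if sim >= best_sim:
                best, best_sim = a, sim
        if best is not None:
            pairs.append((b, best))
    return pairs

# Toy two-dimensional vectors, purely illustrative.
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.1]}
pairs = pair_sentences([["a"], ["b"]], [["b"], ["c"]], toy_vectors)
```

In practice the token lists would come from morphological analysis, and the threshold would need tuning on real data.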

When the pairs of sentences before and after proofreading are obtained manually, the sentence pair acquisition unit 14 need not be provided.

<Sentence-pair word-segmented sentence generation unit 15>
The sentence-pair word-segmented sentence generation unit 15 divides both the pre-proofreading sentence and the post-proofreading sentence of each sentence pair into processing units to generate word-segmented sentences. The method of generating a word-segmented sentence is as described above for the word-segmented sentence generation unit 2.

<Proofreading history acquisition unit 16>
The proofreading history acquisition unit 16 uses the word-segmented sentences to compare the sentences of each pair before and after proofreading and acquires the proofreading history. The method of acquiring the proofreading history is not particularly limited. One example is to use an edit graph to automatically compute the locations changed between the pre-proofreading and post-proofreading sentences.

The proofreading history is acquired as records, each of which associates a pre-proofreading processing unit with a proofreading result and a proofreading attribute (insertion, deletion, or replacement). When the attribute is replacement, the proofreading result is the processing unit that replaced the original. When the attribute is deletion, the proofreading result is the deleted processing unit enclosed in deletion symbols, for example "<del>" and "</del>". When the attribute is insertion, the proofreading result is a string that contains the pre-proofreading processing unit.
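As a rough illustration of how such records could be derived from a sentence pair, the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in for the edit graph mentioned above (it is a different matching algorithm, chosen here only because it is in the standard library). The function name `extract_edits`, the attribute labels, and the rule of attaching an insertion to the preceding unit are assumptions made for the example.

```python
from difflib import SequenceMatcher

def extract_edits(before, after):
    """Compare two token lists and return (heading, result, attribute) records."""
    records = []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            records.append(("".join(before[i1:i2]),
                            "".join(after[j1:j2]), "replacement"))
        elif tag == "delete":
            removed = "".join(before[i1:i2])
            records.append((removed, f"<del>{removed}</del>", "deletion"))
        elif tag == "insert":
            inserted = "".join(after[j1:j2])
            if i1 > 0:
                # Assumption: attach the inserted text to the preceding unit,
                # so the result contains the pre-proofreading unit.
                anchor = before[i1 - 1]
                records.append((anchor, anchor + inserted, "insertion"))
            elif before:
                # Insertion at the very start: attach to the following unit.
                anchor = before[0]
                records.append((anchor, inserted + anchor, "insertion"))
    return records

replaced = extract_edits(["１００人", "体制", "の"], ["１００人", "態勢", "の"])
deleted = extract_edits(["運ば", "れ", "、", "まもなく", "死亡"],
                        ["運ば", "れ", "、", "死亡"])
inserted = extract_edits(["県内", "で", "は", "２例", "の", "把握"],
                         ["県内", "で", "は", "２例", "目", "の", "把握"])
```

The three calls mirror the replacement, deletion, and insertion cases that appear in the specific examples.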

When the proofreading history is acquired manually, the proofreading history acquisition unit 16 need not be provided.

<Proofreading history corpus 8>
The proofreading history acquired by the proofreading history acquisition unit 16 is stored in the proofreading history corpus 8. As described above, and as shown for example in Table 1, each entry is stored as a record that uses the pre-proofreading processing unit as its heading and associates it with a proofreading result, a proofreading attribute (insertion, deletion, or replacement), and a proofreading count (the number of past occurrences). In the example of Table 1, each heading is associated with a single combination of proofreading result, attribute, and count, but a heading may of course be associated with a plurality of such combinations. From the viewpoint of proofreading accuracy, the proofreading history corpus 8 is preferably updated whenever new proofreading history data is obtained.
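One way to picture this record layout is a mapping from each heading to a counter of (result, attribute) combinations. The class below is only a sketch under that assumption. The class and method names are hypothetical and the sample records are illustrative, not the contents of Table 1.

```python
from collections import Counter, defaultdict

class ProofreadingHistoryCorpus:
    """Heading -> {(result, attribute): count}. A hypothetical in-memory layout."""

    def __init__(self):
        self._records = defaultdict(Counter)

    def add(self, heading, result, attribute):
        # Each newly acquired history entry increments the proofreading count.
        self._records[heading][(result, attribute)] += 1

    def is_target(self, unit):
        # A processing unit is a proofreading target if it matches a heading.
        return unit in self._records

    def results(self, heading):
        return {result for result, _ in self._records.get(heading, {})}

    def count(self, heading, result):
        return sum(n for (r, _), n in self._records.get(heading, {}).items()
                   if r == result)

corpus = ProofreadingHistoryCorpus()
corpus.add("まもなく", "<del>まもなく</del>", "deletion")
corpus.add("まもなく", "<del>まもなく</del>", "deletion")
corpus.add("２例", "２例目", "insertion")
```

Keeping a counter per heading lets one heading carry several results, as the text allows, and makes incremental updates a single increment.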

≪Generation of vector trained model 10≫
FIG. 4 is a block diagram showing an example of the configuration of the vector trained model generation unit that generates the vector trained model 10 used in the present embodiment. In FIG. 4, 17 is a vector calculation unit.

The vector calculation unit 17 acquires the post-proofreading text 11 from the proofread database 13 and divides it into sentences. Sentences may be delimited, for example, at the Japanese full stop (kuten, 「。」) when the proofreading target is Japanese and at the period when it is English, but the method is not limited to these and may be selected as appropriate for the language and the purpose of the proofreading.

The vector calculation unit 17 divides each of the resulting sentences into processing units to generate word-segmented sentences. The method of generating a word-segmented sentence is as described above for the word-segmented sentence generation unit 2. To enable proofreading that deletes a processing unit of the proofreading target sentence, one possible method is to also treat a deleted processing unit enclosed in the deletion symbols "<del>" and "</del>" as a processing unit. For that purpose, the vector calculation unit 17 preferably acquires pairs of sentences before and after proofreading from the proofread database 13, acquires the proofreading history, and determines the processing units based on the acquired proofreading history. The method of acquiring the proofreading history is as described in the section "≪Proofreading history corpus generation≫".

Further, to enable proofreading that inserts text before or after a processing unit of the proofreading target sentence, or that replaces or deletes a plurality of consecutive processing units, one possible method is to also treat "n-grams" (strings formed by joining n adjacent processing units) as processing units. Specifically, when a sentence obtained by dividing the post-proofreading text 11 is 「県警によると、県内では２例目の把握となる。」 ("According to the prefectural police, this is the second case identified in the prefecture."), word-segmented sentences such as those shown below are generated, so that the processing units include not only the individual morphemes but also strings formed by joining several neighboring morphemes. However, because the number of processing units increases as n becomes larger, n is preferably set to an appropriate value according to the available computer resources.
1-gram: 「<bos>／県警／に／よる／と／、／県内／で／は／２例／目／の／把握／と／なる／。／<eos>」
2-gram: 「<bos>県警／県警に／による／よると／と、／、県内／県内で／では／は２例／２例目／目の／の把握／把握と／となる／なる。／。<eos>」
3-gram: 「<bos>県警に／県警による／によると／よると、／と、県内／、県内で／県内では／では２例／は２例目／２例目の／目の把握／の把握と／把握となる／となる。／なる。<eos>」
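The n-gram lists above can be reproduced with a few lines of code. The sketch below assumes the sentence has already been segmented into morphemes. The helper name `ngram_units` is hypothetical.

```python
def ngram_units(tokens, n):
    """Join every run of n adjacent processing units into one unit."""
    return ["".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["<bos>", "県警", "に", "よる", "と", "、", "県内", "で", "は",
          "２例", "目", "の", "把握", "と", "なる", "。", "<eos>"]

bigrams = ngram_units(tokens, 2)
trigrams = ngram_units(tokens, 3)

# The full set of processing units for n = 1..3.
all_units = set()
for n in (1, 2, 3):
    all_units.update(ngram_units(tokens, n))
```

Because 「２例目」 appears among the 2-grams, a history record whose result is 「２例目」 can be matched even though the morphological analyzer splits it into 「２例」 and 「目」.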

From the generated word-segmented sentences, the vector calculation unit 17 statistically learns, by machine learning, properties such as how often each processing unit appears, which processing units occur near one another, and in what contexts each processing unit is used, and thereby obtains the vector (distributed representation) of each processing unit, such as a word vector. Vectors are preferably also obtained for the various symbols, such as the sentence-start symbol, the sentence-end symbol, and the deletion symbols. Tools such as "word2vec" and "GloVe" can be used to obtain the vectors.

≪Specific examples≫
The present embodiment is described further below using specific examples.

<Specific example 1>
This example replaces one part of the proofreading target sentence. Specifically, 「体制」 (taisei, "system, organization") in the proofreading target sentence 「県警は８日、熊谷署に１００人体制の捜査本部を設置。」 ("On the 8th, the prefectural police set up a 100-person investigation headquarters at the Kumagaya police station.") is replaced with its homophone 「態勢」 (taisei, "readiness, posture"). The proofreading history corpus 8 used in this example stores the records shown in Table 2.

(1) Step 1
The word-segmented sentence generation unit 2 performs morphological analysis on the proofreading target sentence output from the sentence input unit 1, using the morphological analysis dictionary 9, and divides it into morphemes, which are the processing units. It then places the sentence-start symbol <bos> at the beginning of the sentence and the sentence-end symbol <eos> at the end to generate the following word-segmented sentence.
「<bos>／県警／は／８日／、／熊谷署／に／１００人／体制／の／捜査本部／を／設置／。／<eos>」

(2) Steps 2 and 3
The proofreading target determination unit 3 compares each morpheme of the above word-segmented sentence, in order from the beginning of the sentence, with the headings of the proofreading history corpus 8. Because 「体制」 matches a heading of the proofreading history corpus 8, it is determined to be a proofreading target, and processing proceeds to step 4.

(3) Step 4
The proofreading candidate prediction unit 4 predicts, as proofreading candidates, the morphemes that would fill the blank if the proofreading target 「体制」 were blank. Specifically, it acquires from the vector trained model 10 the word vector of each morpheme in the morpheme groups before and after the proofreading target, 「<bos>／県警／は／８日／、／熊谷署／に／１００人」 and 「の／捜査本部／を／設置／。／<eos>」, and uses "context2vec" to calculate a sentence vector for each group. It then acquires from the vector trained model 10 the morphemes whose word vectors are highly similar to the calculated sentence vectors and predicts those morphemes as proofreading candidates. The results are shown in Table 3.
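The idea of ranking fill-in candidates by their similarity to a context vector can be illustrated with a toy example. The sketch below is deliberately not context2vec, which encodes the left and right contexts with a bidirectional LSTM. Here the context vector is simply the average of the surrounding word vectors, and the two-dimensional vectors are invented for illustration. The function names are hypothetical as well.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def context_vector(left, right, vectors):
    """Average the vectors of the context units (a stand-in for context2vec)."""
    context = [vectors[t] for t in left + right if t in vectors]
    if not context:
        return None
    dim = len(context[0])
    return [sum(v[i] for v in context) / len(context) for i in range(dim)]

def predict_candidates(left, right, vectors, top_k=2):
    """Return the top_k units whose vectors are most similar to the context."""
    ctx = context_vector(left, right, vectors)
    if ctx is None:
        return []
    scored = sorted(((cosine(ctx, v), unit) for unit, v in vectors.items()),
                    reverse=True)
    return [unit for _, unit in scored[:top_k]]

# Invented toy vectors; a real model would be trained on proofread text.
toy_vectors = {
    "１００人": [1.0, 0.0],
    "の": [0.8, 0.2],
    "態勢": [0.9, 0.1],
    "体制": [0.85, 0.15],
    "東口": [0.0, 1.0],
}
candidates = predict_candidates(["１００人"], ["の"], toy_vectors)
```

With these toy vectors the two homophones rank highest and the unrelated 「東口」 ranks last, which is the kind of candidate list that the later comparison with the proofreading history is meant to filter.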

(4) Steps 5 and 6
The appropriate candidate determination unit 5 compares each of the proofreading candidates with the proofreading results shown in Table 2. Because only 「態勢」 matches a proofreading result, it is determined to be the optimum candidate, the proofreading target 「体制」 is replaced with the optimum candidate 「態勢」, and processing returns to step 2.

(5) Step 2
The proofreading target determination unit 3 takes the word-segmented sentence in which 「体制」 has been replaced with 「態勢」 and compares the morphemes after 「態勢」, in order, with the headings of the proofreading history corpus 8. Because none of them matches a heading, processing proceeds to step 7.

(6) Step 7
The output unit 6 displays on the display the proofread sentence 「県警は８日、熊谷署に１００人態勢の捜査本部を設置。」, in which the proofreading target 「体制」 has been replaced with the optimum candidate 「態勢」.
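Steps 5 to 7 of this example amount to intersecting the predicted candidates with the proofreading results recorded for the target and substituting the match. A minimal sketch follows. The candidate list is hypothetical because Table 3 is not reproduced here, and the helper names are invented for the example.

```python
def select_appropriate(candidates, history_results):
    """Keep only the candidates that also appear as past proofreading results."""
    return [c for c in candidates if c in history_results]

def apply_replacement(tokens, index, replacement):
    """Return a copy of the token list with one processing unit replaced."""
    return tokens[:index] + [replacement] + tokens[index + 1:]

tokens = ["<bos>", "県警", "は", "８日", "、", "熊谷署", "に", "１００人",
          "体制", "の", "捜査本部", "を", "設置", "。", "<eos>"]

history_results = {"態勢"}                    # results recorded for 「体制」
hypothetical_candidates = ["態勢", "規模", "以上"]  # stand-in for Table 3

matches = select_appropriate(hypothetical_candidates, history_results)
target_index = tokens.index("体制")
proofread_tokens = apply_replacement(tokens, target_index, matches[0])

# Drop the sentence-start and sentence-end symbols for display.
corrected = "".join(t for t in proofread_tokens if t not in ("<bos>", "<eos>"))
```

The history acts as a filter on the prediction: candidates that fit the context but have never been an actual correction of 「体制」 are discarded.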

<Specific example 2>
This example deletes one part of the proofreading target sentence. Specifically, 「まもなく」 ("soon") is deleted from the proofreading target sentence 「神通川第二ダムを超えると、まもなく木造の建物が見えてきた。」 ("After passing the Jinzugawa No. 2 Dam, a wooden building soon came into view."). The proofreading history corpus 8 used in this example stores the records shown in Table 4.

(1) Step 1
The following word-segmented sentence is generated in the same manner as in Specific Example 1.
「<bos>／神通川第二ダム／を／超える／と／、／まもなく／木造／の／建物／が／見え／て／きた／。／<eos>」

(2) Steps 2 and 3
In the same manner as in Specific Example 1, 「まもなく」 is determined to be a proofreading target, and processing proceeds to step 4.

(3) Step 4
In the same manner as in Specific Example 1, the proofreading candidate prediction unit 4 predicts, as proofreading candidates, the morphemes that would fill the blank if the proofreading target 「まもなく」 were blank. The results are shown in Table 5.

(4) Steps 5 and 6
The appropriate candidate determination unit 5 compares each of the proofreading candidates with the proofreading results shown in Table 4. Because only 「<del>まもなく</del>」 matches a proofreading result, it is determined to be the optimum candidate, the proofreading target 「まもなく」 is replaced with the optimum candidate 「<del>まもなく</del>」, and processing returns to step 2.

(5) Step 2
The proofreading target determination unit 3 takes the word-segmented sentence in which 「まもなく」 has been replaced with 「<del>まもなく</del>」 and compares the morphemes after 「<del>まもなく</del>」, in order, with the headings of the proofreading history corpus 8. Because none of them matches a heading, processing proceeds to step 7.

(6) Step 7
The output unit 6 displays on the display the proofread sentence 「神通川第二ダムを超えると、木造の建物が見えてきた。」, in which the proofreading target 「まもなく」 has been replaced with the optimum candidate 「<del>まもなく</del>」, that is, in which 「まもなく」 has been deleted.
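Treating a deletion as a replacement by a marked-up unit means the final display step has to remove the marked span. The sketch below shows one way this could be done. The `render` helper and its regular expression are assumptions for illustration, not something the text specifies.

```python
import re

DELETION = re.compile(r"<del>.*?</del>")

def render(tokens):
    """Join the processing units, dropping boundary symbols and deleted spans."""
    body = "".join(t for t in tokens if t not in ("<bos>", "<eos>"))
    return DELETION.sub("", body)

proofread_tokens = ["<bos>", "神通川第二ダム", "を", "超える", "と", "、",
                    "<del>まもなく</del>", "木造", "の", "建物", "が",
                    "見え", "て", "きた", "。", "<eos>"]
displayed = render(proofread_tokens)
```

Keeping the deleted text inside the markers until display time also leaves room to show the deletion to the user, for example as strikethrough, instead of silently dropping it.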

<Specific example 3>
This example inserts text at one location in the proofreading target sentence. Specifically, 「目」 is inserted after 「２例」 ("two cases") in the proofreading target sentence 「県警によると、県内では２例の把握となる。」, yielding 「２例目」 ("the second case"). The proofreading history corpus 8 used in this example stores the records shown in Table 6.

(1) Step 1
The following word-segmented sentence is generated in the same manner as in Specific Example 1.
「<bos>／県警／に／よる／と／、／県内／で／は／２例／の／把握／と／なる／。／<eos>」

(2) Steps 2 and 3
In the same manner as in Specific Example 1, 「２例」 is determined to be a proofreading target, and processing proceeds to step 4.

(3) Step 4
In the same manner as in Specific Example 1, the proofreading candidate prediction unit 4 predicts, as proofreading candidates, the morphemes that would fill the blank if the proofreading target 「２例」 were blank. The results are shown in Table 7.

(4) Steps 5 and 6
The appropriate candidate determination unit 5 compares each of the proofreading candidates with the proofreading results shown in Table 6. Because only 「２例目」 matches a proofreading result, it is determined to be the optimum candidate, the proofreading target 「２例」 is replaced with the optimum candidate 「２例目」, and processing returns to step 2.

(5) Step 2
The proofreading target determination unit 3 takes the word-segmented sentence in which 「２例」 has been replaced with 「２例目」 and compares the morphemes after 「２例目」, in order, with the headings of the proofreading history corpus 8. Because none of them matches a heading, processing proceeds to step 7.

(6) Step 7
The output unit 6 displays on the display the proofread sentence 「県警によると、県内では２例目の把握となる。」, in which the proofreading target 「２例」 has been replaced with the optimum candidate 「２例目」, that is, in which 「目」 has been inserted after 「２例」.

<Specific example 4>
This example identifies one part of the proofreading target sentence as a proofreading target but does not proofread it. Specifically, 「県警」 ("prefectural police") in the proofreading target sentence 「県警は８日、熊谷署に１００人態勢の捜査本部を設置。」 is determined to be a proofreading target, but no appropriate candidate is determined and no proofreading is performed. The proofreading history corpus 8 used in this example stores the records shown in Table 8.

(1) Step 1
The following word-segmented sentence is generated in the same manner as in Specific Example 1.
「<bos>／県警／は／８日／、／熊谷署／に／１００人／態勢／の／捜査本部／を／設置／。／<eos>」

(2) Steps 2 and 3
In the same manner as in Specific Example 1, 「県警」 is determined to be a proofreading target, and processing proceeds to step 4.

(3) Step 4
In the same manner as in Specific Example 1, the proofreading candidate prediction unit 4 predicts, as proofreading candidates, the morphemes that would fill the blank if the proofreading target 「県警」 were blank. The results are shown in Table 9.

(4) Step 5
The appropriate candidate determination unit 5 determines whether the proofreading candidates include the proofreading target 「県警」. In this example, the proofreading candidates shown in Table 9 include 「県警」, so processing returns to step 2.

(5) Step 2
The proofreading target determination unit 3 compares the morphemes after 「県警」 in the above word-segmented sentence, in order, with the headings of the proofreading history corpus 8. Because none of them matches a heading, processing proceeds to step 9.

(6) Step 9
The output unit 6 displays on the display the proofreading target sentence 「県警は８日、熊谷署に１００人態勢の捜査本部を設置。」 unchanged.
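The early exit in step 5 of this example can be expressed as a guard placed before the comparison with the history. The sketch below is illustrative only. The function name, the flag, and the candidate and history values are hypothetical, since Tables 8 and 9 are not reproduced here.

```python
def choose_replacement(target, candidates, history_results,
                       skip_if_self_predicted=True):
    """Return the replacement for a proofreading target, or None to leave it."""
    # If the model predicts the target itself for the blank, the original
    # wording already fits the context, so the target is left untouched.
    if skip_if_self_predicted and target in candidates:
        return None
    matches = [c for c in candidates if c in history_results]
    return matches[0] if matches else None

# Hypothetical stand-ins for Table 9 (candidates) and Table 8 (history).
kept = choose_replacement("県警", ["県警", "警察", "同署"], {"警察"})
forced = choose_replacement("県警", ["県警", "警察", "同署"], {"警察"},
                            skip_if_self_predicted=False)
```

The guard is what prevents over-correction: a unit that has been corrected in the past is still left alone when the context predicts the unit itself.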

<Specific example 5>
This example proofreads a plurality of locations in the proofreading target sentence. Specifically, the proofreading target sentence 「東京（品川）と名古屋の間は２０２７年に開業、名古屋から大阪までは４５年にも伸びる予定だ。」 is proofread to 「東京（品川）と名古屋の間は２０２７年に開業し、名古屋から大阪までは４５年に延びる予定だ。」. Three changes are made: 「、」 becomes 「し、」, 「も」 is deleted, and 「伸びる」 (nobiru, "to stretch, to grow") becomes its homophone 「延びる」 (nobiru, "to be extended"). The proofreading history corpus 8 used in this example stores the records shown in Table 10.

(1) Step 1
The following word-segmented sentence is generated in the same manner as in Specific Example 1.
「<bos>／東京／（／品川／）／と／名古屋／の／間／は／２０２７年／に／開業／、／名古屋／から／大阪／まで／は／４５年／に／も／伸びる／予定／だ／。／<eos>」

(2) Steps 2 and 3
In the same manner as in Specific Example 1, 「、」 (the Japanese comma) is determined to be a proofreading target, and processing proceeds to step 4.

(3) Step 4
In the same manner as in Specific Example 1, the proofreading candidate prediction unit 4 predicts, as proofreading candidates, the morphemes that would fill the blank if the proofreading target 「、」 were blank. The results are shown in Table 11.

(4) Steps 5 and 6
The appropriate candidate determination unit 5 compares each of the proofreading candidates with the proofreading results shown in Table 10. Because only 「し、」 matches a proofreading result, it is determined to be the optimum candidate, the proofreading target 「、」 is replaced with the optimum candidate 「し、」, and processing returns to step 2.

(5) Steps 2 and 3
The proofreading target determination unit 3 takes the word-segmented sentence in which 「、」 has been replaced with 「し、」 and compares the morphemes after 「し、」, in order, with the headings of the proofreading history corpus 8. 「も」 is determined to be a proofreading target, and processing proceeds to step 4.

(6) Step 4
In the same manner as in Specific Example 1, the proofreading candidate prediction unit 4 predicts, as proofreading candidates, the morphemes that would fill the blank if the proofreading target 「も」 were blank. The results are shown in Table 12.

(7) Steps 5 and 6
The appropriate candidate determination unit 5 compares each of the proofreading candidates with the proofreading results shown in Table 10. Because only 「<del>も</del>」 matches a proofreading result, it is determined to be the optimum candidate, the proofreading target 「も」 is replaced with the optimum candidate 「<del>も</del>」, and processing returns to step 2.

(8) Steps 2 and 3
The proofreading target determination unit 3 takes the word-segmented sentence in which 「、」 has been replaced with 「し、」 and 「も」 with 「<del>も</del>」, and compares the morphemes after 「<del>も</del>」, in order, with the headings of the proofreading history corpus 8. 「伸びる」 is determined to be a proofreading target, and processing proceeds to step 4.

(9) Step 4
In the same manner as in Specific Example 1, the proofreading candidate prediction unit 4 predicts, as proofreading candidates, the morphemes that would fill the blank if the proofreading target 「伸びる」 were blank. The results are shown in Table 13.

(10) Steps 5 and 6
The appropriate candidate determination unit 5 compares each of the proofreading candidates with the proofreading results shown in Table 10. Because only 「延びる」 matches a proofreading result, it is determined to be the optimum candidate, the proofreading target 「伸びる」 is replaced with the optimum candidate 「延びる」, and processing returns to step 2.

(11) Step 2
The proofreading target determination unit 3 takes the word-segmented sentence in which 「、」 has been replaced with 「し、」, 「も」 with 「<del>も</del>」, and 「伸びる」 with 「延びる」, and compares the morphemes after 「延びる」, in order, with the headings of the proofreading history corpus 8. Because none of them matches a heading, processing proceeds to step 7.

(12) Step 7
The output unit 6 displays on the display the proofread sentence 「東京（品川）と名古屋の間は２０２７年に開業し、名古屋から大阪までは４５年に延びる予定だ。」.
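The repeated pass through steps 2 to 6 in this example can be sketched as a single left-to-right loop over the processing units. The code below is a simplified illustration. The candidate lists stand in for Tables 11 to 13, which are not reproduced here, so they are hypothetical, as are the function names and the choice of taking the first match when several candidates match.

```python
import re

def proofread(tokens, history, predict):
    """Scan left to right; replace each target whose predicted candidates
    include a result recorded in the history (steps 2 to 6)."""
    sentence = list(tokens)
    for i in range(len(sentence)):
        unit = sentence[i]
        if unit not in history:             # steps 2-3: is it a heading?
            continue
        candidates = predict(sentence, i)   # step 4: fill-in prediction
        matches = [c for c in candidates if c in history[unit]]  # step 5
        if matches:
            sentence[i] = matches[0]        # step 6: apply the correction
    return sentence

def render(sentence):
    """Step 7: drop boundary symbols and deleted spans for display."""
    text = "".join(t for t in sentence if t not in ("<bos>", "<eos>"))
    return re.sub(r"<del>.*?</del>", "", text)

history = {"、": {"し、"}, "も": {"<del>も</del>"}, "伸びる": {"延びる"}}

# Hypothetical candidate lists standing in for Tables 11 to 13.
HYPOTHETICAL_CANDIDATES = {
    "、": ["し、", "して、", "。"],
    "も": ["<del>も</del>", "は"],
    "伸びる": ["延びる", "開業"],
}

def stub_predict(sentence, index):
    return HYPOTHETICAL_CANDIDATES.get(sentence[index], [])

tokens = ["<bos>", "東京", "（", "品川", "）", "と", "名古屋", "の", "間", "は",
          "２０２７年", "に", "開業", "、", "名古屋", "から", "大阪", "まで", "は",
          "４５年", "に", "も", "伸びる", "予定", "だ", "。", "<eos>"]
result = render(proofread(tokens, history, stub_predict))
```

Because the loop edits the working sentence in place, each later prediction sees the corrections already made to its left, which matches the order of the steps in this example.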

<Specific example 6>
This example describes how the proofreading history corpora 8 used in Specific Examples 1 to 3 are generated.

[Specific Example 6-1 (proofreading history corpus 8 used in Specific Example 1)]
The sentence pair acquisition unit 14 delimits each of the pre-proofreading text 12 and the post-proofreading text 11 stored in the proofread database 13 at the Japanese full stop (kuten) and divides them into sentences. It then acquires word vectors from the vector trained model 10, uses them to calculate the similarity between each pre-proofreading sentence and each post-proofreading sentence, and, based on the calculated similarity, acquires the following pair of sentences before and after proofreading.
Sentence before proofreading: 「平日は４０分間隔で１頭１車両体制、土日祝日は３０分間隔で２頭２車両体制。」
Sentence after proofreading: 「平日は４０分間隔で１頭１車両態勢、土日祝日は３０分間隔で２頭２車両態勢。」

The sentence-pair word-segmented sentence generation unit 15 performs morphological analysis on each of the pre-proofreading sentence and the post-proofreading sentence using the morphological analysis dictionary 9, divides them into morphemes, which are the processing units, and generates the sentence-pair word-segmented sentences. The proofreading history acquisition unit 16 uses an edit graph to compare the word-segmented sentences before and after proofreading and acquires the proofreading history that 「体制」 was replaced once with 「態勢」. Then, as shown in Table 2, it stores in the proofreading history corpus 8 a record that uses the pre-proofreading processing unit 「体制」 as its heading and associates it with the proofreading result 「態勢」, the proofreading attribute "insertion", and the proofreading count. The proofreading count is updated each time the proofreading history that 「体制」 was replaced with 「態勢」 is acquired.

[Specific Example 6-2 (proofreading history corpus 8 used in Specific Example 2)]
The following pair of sentences before and after proofreading is acquired in the same manner as in Specific Example 6-1.
Sentence before proofreading: 「火は約４時間半後に消し止められたが、全身にやけどを負って病院に運ばれ、まもなく死亡した。」 ("The fire was put out about four and a half hours later, but the victim suffered burns over the whole body, was taken to hospital, and died soon afterwards.")
Sentence after proofreading: 「火は約４時間半後に消し止められたが、全身にやけどを負って病院に運ばれ、<del>まもなく</del>死亡した。」

The sentence-pair word-separated sentence generation unit 15 generates sentence-pair word-separated sentences in the same manner as in Specific Example 6-1. The proofreading history acquisition unit 16 likewise acquires the proofreading history that "まもなく" was replaced once with "<del>まもなく</del>", that is, that "まもなく" was deleted, and stores it in the proofreading history corpus 8 as the record shown in Table 4.

[Specific Example 6-3 (proofreading history corpus 8 used in Specific Example 3)]
In the same manner as in Specific Example 6-1, the following pair of sentences before and after proofreading is acquired.
Sentence before proofreading: "県警によると、県内では２例の把握となる。" ("According to the prefectural police, two cases have been identified in the prefecture.")
Sentence after proofreading: "県警によると、県内では２例目の把握となる。" ("According to the prefectural police, this is the second case identified in the prefecture.")

The sentence-pair word-separated sentence generation unit 15 generates sentence-pair word-separated sentences in the same manner as in Specific Example 6-1. The proofreading history acquisition unit 16 likewise acquires the proofreading history that "２例" was replaced once with "２例目", that is, that "目" was inserted after "２例", and stores it in the proofreading history corpus 8 as the record shown in Table 6.
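The history acquisition in Specific Examples 6-1 to 6-3 can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's `difflib.SequenceMatcher` stands in for the edit graph, the function name `acquire_history`, the attribute names ("replace", "delete", "insert"), and the record layout are hypothetical, and the token list is a hand-made segmentation rather than real output of the morphological analysis dictionary 9.

```python
from collections import Counter
from difflib import SequenceMatcher


def acquire_history(before, after, corpus=None):
    """Diff two token lists and count (heading, result, attribute) records."""
    corpus = Counter() if corpus is None else corpus
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        removed = "".join(before[i1:i2])
        added = "".join(after[j1:j2])
        if tag == "replace":
            record = (removed, added, "replace")
        elif tag == "delete":
            # Deletions are stored with the <del>...</del> notation used in the text.
            record = (removed, f"<del>{removed}</del>", "delete")
        elif i1 > 0:
            # Insertion: use the preceding unit as the heading (assumption).
            anchor = before[i1 - 1]
            record = (anchor, anchor + added, "insert")
        else:
            # Insertion at the start of the sentence: use the following unit.
            anchor = before[0] if before else ""
            record = (anchor, added + anchor, "insert")
        corpus[record] += 1  # the count is updated on every acquisition
    return corpus


# Hand-segmented fragment of the sentence pair in Specific Example 6-3.
before = ["県内", "で", "は", "２例", "の", "把握"]
after = ["県内", "で", "は", "２例", "目", "の", "把握"]
corpus = acquire_history(before, after)
```

Passing the same `corpus` object to repeated calls accumulates the counts, which mirrors the updating of the number of proofreadings described above.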

2. Second Embodiment
≪Proofreading support device≫
FIG. 5 is a block diagram showing an example of the configuration of the proofreading support device of the present embodiment. In FIG. 5, components identical to those of the first embodiment are given the same reference numerals and, unless otherwise stated, are the same as in the first embodiment. In FIG. 5, reference numeral 7 denotes the automatic error location detection unit.

<Automatic error location detection unit 7>
The automatic error location detection unit 7 uses machine learning to estimate, for each of the processing units constituting the word-separated sentence, whether it is an error location (a location that is grammatically incorrect and should be corrected).

The method of estimating error locations is not particularly limited. A known grammatical error detection technique can be used, such as the method described in Liu, Zhuoran, and Yang Liu, "Exploiting Unlabeled Data for Neural Grammatical Error Detection," arXiv preprint arXiv:1611.08987 (2016).

Specifically, one example is a method that uses a neural network consisting of the following three layers and, when a sentence is input, outputs an array as long as the input sentence whose elements are labels 0 to 4 (0: no change, 1: replace, 2: delete, 3: insert (before that word)).
Input layer: a vector of the input sentence (a sentence that may contain errors)
Intermediate layer: a vector obtained by mapping the input sentence to a lower dimension
Output layer: a vector expressing the difference between the correct sentence (the input sentence proofread by hand) and the input sentence, as in the example below

More specifically, when the word-separated sentence "<bos>/犯人/は/フェンス/を/乗り/越えて/から/侵入/した/と/み/られる/。/<eos>" ("The culprit is believed to have intruded after climbing over the fence.") is input, the vector "0,0,0,1,0,1,1,2,3,0,0,1,1,0,0" is output. This output means that "フェンス", "乗り", "越えて", "み", and "られる" may need to be replaced, that "から" may need to be deleted, and that something may need to be inserted before "侵入".
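The correspondence between the tokens and the label vector in this example can be checked with a short sketch. The label names in `LABELS` and the helper `flagged_units` are illustrative choices, not names from the patent; only labels 0 to 3 are mapped because those are the ones the text defines.

```python
# Label meanings as listed in the text (names are illustrative).
LABELS = {0: "keep", 1: "replace", 2: "delete", 3: "insert_before"}


def flagged_units(tokens, labels):
    """Return (position, token, action) for every token with a non-zero label."""
    if len(tokens) != len(labels):
        raise ValueError("one label is expected per processing unit")
    return [
        (i, tok, LABELS[lab])
        for i, (tok, lab) in enumerate(zip(tokens, labels))
        if lab != 0
    ]


tokens = "<bos>/犯人/は/フェンス/を/乗り/越えて/から/侵入/した/と/み/られる/。/<eos>".split("/")
labels = [0, 0, 0, 1, 0, 1, 1, 2, 3, 0, 0, 1, 1, 0, 0]

for position, token, action in flagged_units(tokens, labels):
    print(position, token, action)
```

The sentence has 15 processing units including `<bos>` and `<eos>`, which matches the 15 elements of the output vector.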

The error with respect to the output may be computed from the output-layer vector using a loss function based on the softmax function, and the optimum parameters of the neural network may be estimated from that error by backpropagation. The kinds of label are not particularly limited. For example, two labels, 0 and 1 (indicating whether or not the unit is an error location), may be used.
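A softmax-based loss of the kind mentioned here is commonly the per-token cross-entropy. The following is a minimal sketch under that assumption. The patent does not specify the exact loss, and the function names and the averaging over tokens are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's label scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_loss(logits_per_token, gold_labels):
    """Average cross-entropy between predicted label scores and gold labels."""
    losses = [
        -math.log(softmax(logits)[gold])
        for logits, gold in zip(logits_per_token, gold_labels)
    ]
    return sum(losses) / len(losses)


# Two tokens, four labels each; the scores are made-up numbers for illustration.
example_loss = sequence_loss([[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]], [0, 2])
```

In training, the gradient of this loss with respect to the network parameters would be obtained by backpropagation, which an automatic differentiation library would normally handle.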

The automatic error location detection unit 7 outputs each processing unit estimated to be an error location to the proofreading target determination unit 3, preferably together with its label. When several processing units are estimated to be error locations, they may be output one at a time or all at once. When they are output all at once, a processing unit storage unit may be provided between the automatic error location detection unit 7 and the proofreading target determination unit 3, so that the processing units output from the automatic error location detection unit 7 are held temporarily in the processing unit storage unit and passed to the proofreading target determination unit 3 one at a time.

<Proofreading target determination unit 3>
The proofreading target determination unit 3 compares only the processing units that the automatic error location detection unit 7 has estimated to be error locations with the headings of the proofreading history corpus 8, and determines the proofreading targets. Processing speed may therefore improve compared with determining proofreading targets for every processing unit in the word-separated sentence.

The proofreading target determination unit 3 may also use the labels output by the automatic error location detection unit 7 to concatenate consecutive processing units and treat them as a single processing unit (n-gram). In the example above, "乗り" and "越えて", and likewise "み" and "られる", carry consecutive labels "1" (replace) from the automatic error location detection unit 7, so each pair may be concatenated into a single unit (2-gram), "乗り越えて" and "みられる", and treated as a processing unit.
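The concatenation of consecutive units and the comparison with corpus headings might look like the sketch below. It assumes that any run of identical non-zero labels is merged (the text only shows this for label 1), and the names `merge_runs`, `select_targets`, and the `headings` set are hypothetical stand-ins for the contents of the proofreading history corpus 8.

```python
from itertools import groupby


def merge_runs(tokens, labels):
    """Join consecutive tokens that share the same non-zero label into one unit."""
    units = []
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label == 0:
            continue  # unflagged units are never proofreading targets
        units.append(("".join(tok for tok, _ in group), label))
    return units


def select_targets(units, headings):
    """Keep only the flagged units that match a corpus heading."""
    return [text for text, _ in units if text in headings]


tokens = "<bos>/犯人/は/フェンス/を/乗り/越えて/から/侵入/した/と/み/られる/。/<eos>".split("/")
labels = [0, 0, 0, 1, 0, 1, 1, 2, 3, 0, 0, 1, 1, 0, 0]
headings = {"フェンス", "乗り越えて", "から", "侵入"}  # illustrative heading set

units = merge_runs(tokens, labels)
targets = select_targets(units, headings)
```

Only five merged units are compared with the headings instead of all fifteen processing units, which is the source of the possible speed-up mentioned above.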

<Appropriate candidate determination unit 5>
The appropriate candidate determination unit 5 may use the labels output by the automatic error location detection unit 7 as a basis for judging the optimum candidate. In the example above, when there are several appropriate candidates for "から", it may take the label "2" (delete) output by the automatic error location detection unit 7 into account and determine "<del>から</del>" as the optimum candidate.
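One way to take the label into account is sketched below. The helper `pick_with_label` is hypothetical, and falling back to the first-ranked candidate when the label gives no hint is an assumption, not something the text prescribes. The alternative candidate "へ" is the one that appears alongside "<del>から</del>" in Specific Example 7.

```python
def pick_with_label(appropriate, label):
    """Prefer a deletion candidate when the detector label is 2 (delete)."""
    if label == 2:
        for candidate in appropriate:
            if candidate.startswith("<del>") and candidate.endswith("</del>"):
                return candidate
    # Assumed fallback: the first appropriate candidate in rank order.
    return appropriate[0] if appropriate else None


optimum = pick_with_label(["へ", "<del>から</del>"], 2)
```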

≪Proofreading support method≫
FIG. 6 is a flowchart showing an example of the proofreading support method of the present embodiment.

<Step 11 (S11)>
The word-separated sentence generation unit 2 generates a word-separated sentence in the same manner as in Step 1 of the first embodiment.

<Steps 12 to 14 (S12 to S14)>
The proofreading target determination unit 3 determines the proofreading targets in the same manner as in Steps 2 and 3 of the first embodiment, with two differences: it determines proofreading targets only for the processing units that the automatic error location detection unit 7 has estimated to be error locations, and it proceeds to the next step only after determining the proofreading targets for all such processing units.

<Steps 15 to 19 (S15 to S19)>
In this example, the optimum candidate is determined by incorporating a probabilistic model that takes context into account. The following describes the determination of the optimum candidate by combining a breadth-first search, such as beam search, with the proofreading history corpus. Breadth-first beam search is classified as an informed search: while performing a breadth-first search, it retains up to a beam width of nodes with high evaluation values and, when the number of nodes exceeds the beam width, prunes the branches with low evaluation values.
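A generic breadth-first beam search over proofreading targets can be sketched as follows. This is not the patent's implementation: `propose` is a hypothetical callback that returns (candidate, score) pairs for one target given the candidates already chosen for earlier targets, and a lower total is treated as a better evaluation, following the rule in Step 19 of choosing the smallest total score. The table of candidates and scores is toy data.

```python
def beam_search(targets, propose, beam_width):
    """Expand targets in order, keeping at most beam_width partial combinations."""
    beams = [((), 0.0)]  # (candidates chosen so far, total score)
    for target in targets:
        expanded = [
            (chosen + (candidate,), total + score)
            for chosen, total in beams
            for candidate, score in propose(target, chosen)
        ]
        expanded.sort(key=lambda item: item[1])  # smaller total = better
        beams = expanded[:beam_width]            # prune the remaining branches
    return beams


# Toy data: candidate names and scores are invented for illustration only.
toy_table = {
    "A": [("a1", 1.0), ("a2", 2.0)],
    "B": [("b1", 0.5), ("b2", 3.0)],
}
result = beam_search(["A", "B"], lambda target, chosen: toy_table[target], beam_width=2)
```

In the method described here the scores for a later target depend on which candidates replaced the earlier targets, which is why `propose` receives `chosen`. The toy callback ignores it for brevity.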

In the following, the proofreading targets are called, in order from the beginning of the sentence, the first proofreading target, the second proofreading target, ..., the N-th proofreading target (N is an integer of 2 or more). The appropriate candidates for any one proofreading target are called the first appropriate candidate, the second appropriate candidate, ..., the M-th appropriate candidate (M is an integer of 1 or more).

First, the appropriate candidates for the first proofreading target are determined by the following procedure.

[Step 15]
The proofreading candidate prediction unit 4 predicts the proofreading candidates for the first proofreading target with a predetermined search width, in the same manner as in Step 4 of the first embodiment. In doing so, it obtains as a score, for example, the similarity obtained with "Context2Vec" or a log-likelihood derived from that similarity.
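The text does not say how the log-likelihood is derived from the similarity. The sketch below assumes one plausible choice: cosine similarity between a context vector and each vocabulary vector, turned into a negative log-likelihood through a softmax so that a smaller score is better. All names (`cosine`, `predict_candidates`, `vocab_vecs`) and the tiny vectors are hypothetical, and no real Context2Vec model is involved.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_candidates(context_vec, vocab_vecs, search_width):
    """Return the search_width best (word, score) pairs, lowest score first."""
    if not vocab_vecs:
        return []
    sims = {word: cosine(context_vec, vec) for word, vec in vocab_vecs.items()}
    log_z = math.log(sum(math.exp(s) for s in sims.values()))
    # Assumed score: negative log of the softmax-normalised similarity.
    scored = [(word, log_z - s) for word, s in sims.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:search_width]


# Made-up two-dimensional vectors, for illustration only.
vocab_vecs = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [-1.0, 0.0]}
top = predict_candidates([1.0, 0.1], vocab_vecs, search_width=2)
```

The `search_width` argument plays the role of the predetermined search width: only that many candidates are passed on to the appropriate candidate determination.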

[Steps 16 and 17]
The appropriate candidate determination unit 5 determines the appropriate candidates for the first proofreading target in the same manner as in Steps 5 and 6 of the first embodiment.

Next, for the word-separated sentence in which the first proofreading target has been replaced with its first appropriate candidate, the appropriate candidates for the second proofreading target are determined by the following procedure.

[Step 18]
The appropriate candidate determination unit 5 replaces the first proofreading target with its first appropriate candidate and returns to Step 15.

[Step 15]
The proofreading candidate prediction unit 4 predicts the proofreading candidates for the second proofreading target with a predetermined search width and obtains their scores, in the same manner as in Step 4 of the first embodiment.

[Steps 16 and 17]
The appropriate candidate determination unit 5 determines the appropriate candidates for the second proofreading target in the same manner as in Steps 5 and 6 of the first embodiment.

Next, for the word-separated sentence in which the first proofreading target has been replaced with its first appropriate candidate and the second proofreading target with its first appropriate candidate, the appropriate candidates for the third proofreading target are determined by the following procedure.

[Step 18]
The appropriate candidate determination unit 5 replaces the first proofreading target with its first appropriate candidate, replaces the second proofreading target with its first appropriate candidate, and returns to Step 15.

[Step 15]
The proofreading candidate prediction unit 4 predicts the proofreading candidates for the third proofreading target with a predetermined search width and obtains their scores, in the same manner as in Step 4 of the first embodiment.

[Steps 16 and 17]
The appropriate candidate determination unit 5 determines the appropriate candidates for the third proofreading target in the same manner as in Steps 5 and 6 of the first embodiment.

Thereafter, Steps 15 to 18 are repeated in the same manner. For the n-th proofreading target (n is an integer from 2 to N), the first to (n-1)-th proofreading targets are each replaced with one of their appropriate candidates (first to M-th appropriate candidate) and the appropriate candidates are determined, until all combinations of appropriate candidates for the first to N-th proofreading targets have been obtained.

[Step 19]
For each combination of appropriate candidates for the first to N-th proofreading targets, the appropriate candidate determination unit 5 sums the scores of the appropriate candidates that make up the combination, and determines the appropriate candidates of the combination with the smallest total score to be the optimum candidates.

<Step 20 (S20)>
The output unit 6 outputs the optimum candidates in association with the sentence to be proofread, for example by outputting a proofread sentence in which each proofreading target has been replaced with its optimum candidate. The output unit 6 may also output the combinations of appropriate candidates for the first to N-th proofreading targets in order of total score.

<Specific Example 7>
This example proofreads several locations in one sentence. Specifically, the sentence to be proofread, "犯人はフェンスを乗り越えてから侵入したとみられる。" ("The culprit is believed to have intruded after climbing over the fence."), is proofread to "犯人は柵を乗り越え、侵入したとみられる。" ("The culprit is believed to have climbed over the fence and intruded."). The proofreading history corpus 8 used in this example stores the records shown in Table 14.

(1) Step 11
The following word-separated sentence is generated in the same manner as in Step 1 of Specific Example 1.
"<bos>/犯人/は/フェンス/を/乗り/越えて/から/侵入/した/と/み/られる/。/<eos>"

(2) Step 12
The automatic error location detection unit 7 estimates the error locations in the word-separated sentence by machine learning and outputs the vector "0,0,0,1,0,1,1,2,3,0,0,1,1,0,0" as the estimation result. That is, the automatic error location detection unit 7 estimates that "フェンス", "乗り", "越えて", "み", and "られる" may need to be replaced, that "から" may need to be deleted, and that something may need to be inserted before "侵入".

(3) Steps 13 and 14
Of the morphemes constituting the word-separated sentence above, the proofreading target determination unit 3 compares those estimated by the automatic error location detection unit 7 to be error locations, namely "フェンス", "乗り", "越えて", "から", "侵入", "み", and "られる", with the headings of the proofreading history corpus 8 in order from the beginning of the sentence. Because "乗り" and "越えて" carry consecutive labels "1" (replace) from the automatic error location detection unit 7, they are processed as the single concatenated unit (2-gram) "乗り越えて". "み" and "られる" are likewise processed as "みられる". As a result, "フェンス", "乗り越えて", "から", and "侵入" match headings of the proofreading history corpus 8, so they are determined to be the first, second, third, and fourth proofreading targets, respectively, and the process proceeds to Step 15.

(4) Step 15
In the same manner as in Step 4 of Specific Example 1, the proofreading candidate prediction unit 4 treats the first proofreading target "フェンス" as a blank and predicts morphemes that fill the blank as proofreading candidates. Here, the search width is set to 5, and the log-likelihood derived from the similarity obtained with "Context2Vec" is used as the score. The results are shown in Table 15.

(5) Steps 16 and 17
The appropriate candidate determination unit 5 compares each proofreading candidate with the proofreading results shown in Table 14. As a result, "柵" (fence), "堀" (moat), and "フェンス" (fence), the last being the proofreading target itself, match proofreading results, so they are determined to be the first, second, and third appropriate candidates, respectively.
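The comparison in this step amounts to filtering the predicted candidates by the proofreading results recorded under the target's heading. A minimal sketch follows, assuming the corpus is held as a mapping from heading to a set of results. The function name, the mapping layout, and the two non-matching candidates "壁" and "門" are invented placeholders, since Tables 14 and 15 are not reproduced here.

```python
def appropriate_candidates(heading, candidates, corpus):
    """Keep, in rank order, the candidates recorded as results for this heading."""
    results = corpus.get(heading, set())
    return [candidate for candidate in candidates if candidate in results]


# Stand-in for the Table 14 records under the heading "フェンス".
corpus = {"フェンス": {"柵", "堀", "フェンス"}}
# Search width 5; "壁" and "門" are placeholders for the non-matching candidates.
candidates = ["柵", "堀", "フェンス", "壁", "門"]

appropriate = appropriate_candidates("フェンス", candidates, corpus)
```

Because the list order is preserved, the first, second, and third appropriate candidates follow the ranking of the predicted candidates.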

(6) Step 18
The appropriate candidate determination unit 5 replaces the first proofreading target "フェンス" with the first appropriate candidate "柵" and returns to Step 15.

(7) Step 15
In the same manner as in "(4) Step 15", the proofreading candidate prediction unit 4 treats the second proofreading target "乗り越えて" as a blank and predicts morphemes that fill the blank as proofreading candidates. The results are shown in Table 16.

(8) Steps 16 and 17
The appropriate candidate determination unit 5 compares each proofreading candidate with the proofreading results shown in Table 14. As a result, "乗り越え", "越え", and "飛び越え" match proofreading results, so they are determined to be the first, second, and third appropriate candidates, respectively.

(9) Step 18
The appropriate candidate determination unit 5 replaces the first proofreading target "フェンス" with the first appropriate candidate "柵", replaces the second proofreading target "乗り越えて" with the first appropriate candidate "乗り越え", and returns to Step 15.

(10) Step 15
In the same manner as in "(4) Step 15", the proofreading candidate prediction unit 4 treats the third proofreading target "から" as a blank and predicts morphemes that fill the blank as proofreading candidates. The results are shown in Table 17.

(11) Steps 16 and 17
The appropriate candidate determination unit 5 compares each proofreading candidate with the proofreading results shown in Table 14. As a result, "へ" and "<del>から</del>" match proofreading results, so they are determined to be the first and second appropriate candidates, respectively. In addition, the appropriate candidate determination unit 5 takes into account the label "2" (delete) output by the automatic error location detection unit 7 and determines the second appropriate candidate "<del>から</del>" to be the optimum candidate.

(12) Step 18
The appropriate candidate determination unit 5 replaces the first proofreading target "フェンス" with the first appropriate candidate "柵", replaces the second proofreading target "乗り越えて" with the first appropriate candidate "乗り越え", replaces the third proofreading target "から" with the optimum candidate "<del>から</del>", and returns to Step 15.

(13) Step 15
In the same manner as in "(4) Step 15", the proofreading candidate prediction unit 4 treats the fourth proofreading target "侵入" as a blank and predicts morphemes that fill the blank as proofreading candidates. The results are shown in Table 18.

(14) Steps 16 and 17
The appropriate candidate determination unit 5 compares each proofreading candidate with the proofreading results shown in Table 14. As a result, "、侵入" and "不法侵入" match proofreading results, so they are determined to be the first and second appropriate candidates, respectively.

(15) Step 18
The appropriate candidate determination unit 5 replaces the first proofreading target "フェンス" with the first appropriate candidate "柵", replaces the second proofreading target "乗り越えて" with the second appropriate candidate "越え", and returns to Step 15.

(16) Steps 15 to 18
Thereafter, all combinations of appropriate candidates are obtained in the same manner. The results are shown in Table 19.

(17) Step 19
The appropriate candidate determination unit 5 obtains the total score for each combination of appropriate candidates and determines the appropriate candidates of the combination with the smallest total score to be the optimum candidates. In this example, as shown in Table 19, the smallest total is obtained by the combination that replaces the first proofreading target "フェンス" with the first appropriate candidate "柵", the second proofreading target "乗り越えて" with the first appropriate candidate "乗り越え", the third proofreading target "から" with the optimum candidate "<del>から</del>", and the fourth proofreading target "侵入" with the first appropriate candidate "、侵入". Its total score is 1.21 + 1.59 + 1.62 + 1.58 = 6.00. Accordingly, the first appropriate candidate "柵" is determined to be the optimum candidate for the first proofreading target "フェンス", the first appropriate candidate "乗り越え" for the second proofreading target "乗り越えて", the optimum candidate "<del>から</del>" for the third proofreading target "から", and the first appropriate candidate "、侵入" for the fourth proofreading target "侵入".
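The total for the winning combination can be reproduced with a small ranking helper. The function `rank_combinations` is a hypothetical illustration. Only the four scores quoted in this step are used, because the remaining rows of Table 19 are not reproduced here.

```python
def rank_combinations(combinations):
    """Return (candidates, total score) pairs sorted by ascending total."""
    ranked = [
        (tuple(cand for cand, _ in combo), sum(score for _, score in combo))
        for combo in combinations
    ]
    return sorted(ranked, key=lambda item: item[1])


# The combination and per-target scores quoted in Step 19 of this example.
best = [("柵", 1.21), ("乗り越え", 1.59), ("<del>から</del>", 1.62), ("、侵入", 1.58)]

ranked = rank_combinations([best])
print(f"{ranked[0][1]:.2f}")  # prints 6.00
```

Passing every combination from Table 19 to the same function would give the ascending order in which the output unit 6 can list the alternatives.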

(18) Step 20
The output unit 6 displays on the display the proofread sentence "犯人は柵を乗り越え、侵入したとみられる。" ("The culprit is believed to have climbed over the fence and intruded."), in which each proofreading target has been replaced with its optimum candidate. The output unit 6 also displays the other combinations of appropriate candidates in ascending order of total score.

1: sentence input unit, 2: word-separated sentence generation unit, 3: proofreading target determination unit, 4: proofreading candidate prediction unit, 5: appropriate candidate determination unit, 6: output unit, 7: automatic error location detection unit, 8: proofreading history corpus, 9: morphological analysis dictionary, 10: trained vector model, 11: post-proofreading text, 12: pre-proofreading text, 13: proofread database, 14: sentence pair acquisition unit, 15: sentence-pair word-separated sentence generation unit, 16: proofreading history acquisition unit, 17: vector calculation unit

Claims

A proofreading support device comprising:
a word-separated sentence generation unit that divides a sentence to be proofread into processing units and generates a word-separated sentence;
a proofreading target determination unit that determines, among the processing units constituting the word-separated sentence, a processing unit that matches a heading in a proofreading history corpus to be a proofreading target;
a proofreading candidate prediction unit that predicts proofreading candidates using a vector of at least one processing unit or processing unit group before and after the proofreading target; and
an appropriate candidate determination unit that determines, among the proofreading candidates, a proofreading candidate that matches a proofreading result corresponding to the heading in the proofreading history corpus to be an appropriate candidate.

The proofreading support device according to claim 1, further comprising a trained vector model, wherein the proofreading candidate prediction unit predicts the proofreading candidates using a vector of a processing unit acquired from the trained vector model, or a vector of a processing unit group calculated using the vectors of processing units.

The proofreading support device according to claim 1 or 2, further comprising an automatic error location detection unit that estimates error locations by machine learning, wherein the proofreading target determination unit determines the proofreading target only among the processing units estimated to be error locations by the automatic error location detection unit.

The proofreading support device according to any one of claims 1 to 3, wherein the appropriate candidate determination unit determines one of the appropriate candidates to be an optimum candidate.

The proofreading support device according to claim 4, wherein the appropriate candidate determination unit determines the optimum candidate using breadth-first search.

The proofreading support device according to any one of claims 1 to 5, wherein, when predicting proofreading candidates for one proofreading target, the proofreading candidate prediction unit predicts them after replacing at least one other proofreading target with one of the appropriate candidates for that other proofreading target.

The proofreading support device according to any one of claims 1 to 6, wherein the appropriate candidate determination unit determines the appropriate candidate when the proofreading candidates do not include the proofreading target.

The proofreading support device according to any one of claims 1 to 7, further comprising a proofreading history corpus generation unit that generates the proofreading history corpus,
wherein the proofreading history corpus generation unit comprises:
a sentence pair acquisition unit that acquires pairs of sentences before and after proofreading from a proofread database that stores sentences before and after proofreading;
a sentence-pair word-separated sentence generation unit that divides each of the pre-proofreading sentence and the post-proofreading sentence into processing units to generate word-separated sentences; and
a proofreading history acquisition unit that acquires a proofreading history by comparing the sentences of each pair before and after proofreading using the word-separated sentences.

A proofreading support method performed by a computer, comprising:
a word-separated sentence generation step of dividing a sentence to be proofread into processing units and generating a word-separated sentence;
a proofreading target determination step of determining, among the processing units constituting the word-separated sentence, a processing unit that matches a heading in a proofreading history corpus to be a proofreading target;
a proofreading candidate prediction step of predicting proofreading candidates using a vector of at least one processing unit or processing unit group before and after the proofreading target; and
an appropriate candidate determination step of determining, among the proofreading candidates, a proofreading candidate that matches a proofreading result corresponding to the heading in the proofreading history corpus to be an appropriate candidate.

A proofreading support program that causes a computer to execute:
a word-separated sentence generation step of dividing a sentence to be proofread into processing units and generating a word-separated sentence;
a proofreading target determination step of determining, among the processing units constituting the word-separated sentence, a processing unit that matches a heading in a proofreading history corpus to be a proofreading target;
a proofreading candidate prediction step of predicting proofreading candidates using a vector of at least one processing unit or processing unit group before and after the proofreading target; and
an appropriate candidate determination step of determining, among the proofreading candidates, a proofreading candidate that matches a proofreading result corresponding to the heading in the proofreading history corpus to be an appropriate candidate.