JP5530469B2

JP5530469B2 - Natural language analysis processing apparatus, method, and program

Info

Publication number: JP5530469B2
Application number: JP2012050749A
Authority: JP
Inventors: 潤鈴木
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2012-03-07
Filing date: 2012-03-07
Publication date: 2014-06-25
Anticipated expiration: 2032-03-07
Also published as: JP2013186656A

Description

本発明は、自然言語解析処理装置、方法、及びプログラムに係り、特に、入力文書に対して複数段階の言語解析処理を含む自然言語解析処理を行う自然言語解析処理装置、方法、及びプログラムに関する。 The present invention relates to a natural language analysis processing device, method, and program, and more particularly, to a natural language analysis processing device, method, and program for performing natural language analysis processing including a plurality of stages of language analysis processing on an input document.

自然言語とは、日本語や英語など人間が通常使う言語のことである。自然言語で記述された文書を文法や意味的に解析する技術は、例えば、その言語の成り立ちや構造を理解するという言語学的な観点で学術的に重要な意義がある。また近年では、人間が生成した文書を、文法・意味的に自動で解析し、その結果を利用して提供する様々なサービスが、主にweb上で展開されるようになってきた。例えば、翻訳サイト、人や商品の評判分析サイト、ある特定の事象に対する要約サイトなどのサービスが、これに相当する。これらのサービスでは、人間が生成した電子化文書を、システム内部で文法、意味的に解析し、それらを利用する形で実際のサービスを提供する形になる。その意味で、自然言語を文法、意味的に解析する技術は、これらサービスの根幹の技術であり、情報処理分野でも非常に重要な位置を占めるようになった。 Natural language is a language normally used by humans, such as Japanese and English. A technique for grammatically and semantically analyzing a document described in a natural language has an academic significance from a linguistic viewpoint, for example, to understand the origin and structure of the language. In recent years, various services have been developed mainly on the web that automatically analyze grammatically and semantically documents generated by humans and use the results. For example, services such as a translation site, a reputation analysis site for people and products, and a summary site for a specific event correspond to this. In these services, digitized documents generated by humans are grammatically and semantically analyzed in the system, and the actual services are provided by using them. In this sense, grammatical and semantic analysis of natural language is the core technology of these services, and has come to occupy a very important position in the information processing field.

一概に自然言語を解析するといっても、単語区切りや品詞推定といった表層的な解析から、語や節間の係り受け関係の推定といったより高度な解析を行うものまで含まれる。例えば、文書から文の区切りを推定する「文区切り」、単語の区切りを推定する「単語区切り」、単語の品詞を推定する品詞付与、単語や節間の関係を推定する係り受け解析などがある。これらの例を図８に示す。 Natural language analysis includes everything from surface analysis such as word breaks and part-of-speech estimation to more advanced analysis such as estimation of dependency relationships between words and clauses. For example, there are “sentence breaks” that estimate sentence breaks from documents, “word breaks” that estimate word breaks, part-of-speech assignment that estimates word part-of-speech, and dependency analysis that estimates the relationship between words and clauses. . Examples of these are shown in FIG.

品詞付与や係り受け解析といった言語解析問題は、「相互依存性」がある問題に分類される。これは、例えば、単語の品詞を決定する問題では、ある単語の品詞を決定する際には、前後や周辺の品詞によって対象とする単語に付与すべき品詞が決まる関係があり、また、その前後や周辺の品詞も、その前後や周辺の品詞に依存して決まる問題である。仮に、ある品詞Aの後に、品詞Bは出現してはいけないという規則が存在する場合、個々の単語の品詞を独立に推定してしまうと、この規則を満たした結果が得られる保証が無い。つまり、個々単語の品詞を独立に推定した場合はこういった全体としての整合性がとれるとは限らないため、文書全体として最も良い品詞列を探索する問題になっている。特に、文区切り、単語区切り、品詞推定、係り受け解析といった問題を一括して解くことを考えると、とても複雑に絡みあった問題となっているため、最もよい出力を探索する問題は非常に難しい問題であると言える。 Language analysis problems such as part-of-speech assignment and dependency analysis are classified as problems with “interdependence”. This is because, for example, in the problem of determining the part of speech of a word, when determining the part of speech of a word, there is a relationship in which the part of speech to be given to the target word is determined by the surrounding parts of speech and the surrounding parts of speech. And surrounding parts of speech are also problems that depend on the part of speech before and after that. If there is a rule that part of speech B should not appear after a part of speech A, there is no guarantee that a result satisfying this rule will be obtained if the part of speech of each word is estimated independently. In other words, when the part-of-speech of individual words is estimated independently, it is not always possible to obtain such consistency as a whole. In particular, considering problems such as sentence breaks, word breaks, part-of-speech estimation, and dependency analysis in a batch, the problem of searching for the best output is very difficult because it is a very complicated problem. It can be said that it is a problem.

これらの問題を解くために、履歴ベースの逐次解析法(いわゆるstack decoding)、動的計画法、整数計画法などのアルゴリズムを使って出力を推定する方法が広く使われている（非特許文献１〜非特許文献３）。 In order to solve these problems, a method of estimating an output using an algorithm such as history-based sequential analysis (so-called stack decoding), dynamic programming, integer programming, etc. is widely used (Non-Patent Document 1). -Non-patent document 3).

履歴ベースの逐次解析法は、直前までの推定結果を利用して次の推定を行う方法である（図９参照）。独立に問題を解く方法に比べて、出力の整合性を保つことが可能である。ただし、直前までの解析結果を信じて次の解析結果を決定する方法であるため、直前の解析結果に誤りがあった場合には、その誤りに従った解析を続けてしまい、間違いの伝播が起こる、という問題が知られている。 The history-based sequential analysis method is a method for performing the next estimation using the estimation results obtained immediately before (see FIG. 9). Compared with the method of solving the problem independently, the output consistency can be maintained. However, since this is a method of determining the next analysis result by believing in the previous analysis result, if there is an error in the previous analysis result, the analysis will continue according to the error, and the propagation of the error will continue. The problem of happening is known.

その問題を解決する方法として、動的計画法に基づく方法（図１０参照）や整数計画法に基づく方法が使われるようになった。これは、全体として最もよい出力を選ぶ方法となっているため、履歴ベースの逐次解析法に比べて途中の解析誤りといった問題が発生しない。一方で、全体として最もよい出力を選ぶための計算量は、逐次解析法に比べると非常に大きくなるといった欠点もある。 As a method for solving this problem, a method based on dynamic programming (see FIG. 10) and a method based on integer programming have come to be used. Since this is a method for selecting the best output as a whole, there is no problem of an analysis error in the middle compared to the history-based sequential analysis method. On the other hand, the amount of calculation for selecting the best output as a whole has a drawback that it is very large compared to the sequential analysis method.

Joakim Nivre. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 2008.Joakim Nivre. Algorithms for deterministic incremental dependency parsing.Computational Linguistics, 2008. F. Sha and F. Pereira. Shallow Parsing with Conditional Random Fields. Proc. of HLT/NAACL, 2003.F. Sha and F. Pereira. Shallow Parsing with Conditional Random Fields. Proc. Of HLT / NAACL, 2003. Dan Roth, Wen-tau Yih Global Inference for Entity and Relation Identification via a Linear Programming Formulation Introduction to Statistical Relational Learning, 2007.Dan Roth, Wen-tau Yih Global Inference for Entity and Relation Identification via a Linear Programming Formulation Introduction to Statistical Relational Learning, 2007.

例えば、日本語の文書に対して、単語区切り、文節区切り、文区切り、品詞付与（辞書引き）、文節間係り受け関係付与を一括して行うシステムを想定する。こういった文書の文法・意味的な解析結果を使ったサービスとして、web上での人物や商品の評判を分析するサイトや、翻訳サイトなどが考えられる。 For example, a system is assumed in which a word break, phrase break, sentence break, part of speech assignment (dictionary lookup), and inter-phrase dependency relation assignment are collectively applied to a Japanese document. As a service that uses the grammatical and semantic analysis results of such documents, a site that analyzes the reputation of people and products on the web, a translation site, etc. can be considered.

こういったシステムでは、通常全ての処理を一括で行うのは複雑であるため、従来は、1、文区切り、2、単語区切り+品詞付与(一般に形態素解析と呼ばれる)、3、文節区切り、4、係り受け解析、というような順番で処理を段階的に適用して処理を行う。つまり、各段階の処理は、前段階の処理結果を入力として受け取り処理を行う。こういった逐次的に処理を行う方法を一般的に「パイプライン方式」とよぶ。よって、ここでも各段階の処理を次の段階の処理の入力として扱う従来法を「パイプライン方式」と呼ぶ。パイプライン方式は、前段階の処理結果に誤りがあった場合、その誤りを修正する枠組みは存在しないため、その誤りにより後の解析を間違えるという誤りの伝播が発生しやすい、という問題がある。また、各処理の段階にまたがって利用できる情報を考慮して解析システムを運用することは困難である。実際にこういった複合的な情報を利用することで、解析精度を向上させることが可能であることが多いため、パイプライン方式の欠点と考えられる。これらは、一つの段階において履歴ベースの逐次解析法で見られる欠点と同じ原因である。 In such a system, it is usually complicated to perform all processing at once, so conventionally, 1, sentence break, 2, word break + part of speech assignment (commonly called morphological analysis), 3, phrase break, 4 The process is applied step by step in the order of dependency analysis. That is, the process at each stage receives the process result of the previous stage as input and performs the process. Such a sequential processing method is generally called a “pipeline method”. Therefore, here again, the conventional method of handling each stage of processing as an input of the next stage of processing is called a “pipeline method”. The pipeline method has a problem that when there is an error in the processing result of the previous stage, there is no framework for correcting the error, and therefore error propagation is likely to occur because the error causes a later analysis. In addition, it is difficult to operate an analysis system in consideration of information that can be used across stages of processing. Actually, it is often possible to improve the analysis accuracy by using such complex information, so it is considered a drawback of the pipeline method. These are the same causes as the disadvantages found in history-based sequential analysis in one stage.

つまり、個々の段階の処理に動的計画法や整数計画法のような全体最適化法を利用していたとしても、結局のところパイプライン処理による部分で履歴ベースの逐次解析法と同じ欠点が発生してしまう。 In other words, even if global optimization methods such as dynamic programming and integer programming are used for processing at each stage, after all the same disadvantages as the history-based sequential analysis method in terms of pipeline processing. Will occur.

もちろん動的計画法や整数計画法のような全体最適化を行う方法を、このような複数の解析を同時に行うシステムに適用することは可能である。ただし、あまりにも計算量が大きくなりすぎて、実用上はそのまま適用するのは困難となる。 Of course, it is possible to apply a method of performing overall optimization such as dynamic programming or integer programming to a system that performs such a plurality of analyzes simultaneously. However, the calculation amount becomes too large, and it is difficult to apply as it is in practice.

本発明は、上記の事情を鑑みてなされたもので、計算量の増大を抑制して、精度よく自然言語解析処理を行うことができる自然言語解析処理装置、方法、及びプログラムを提供することを目的とする。 The present invention has been made in view of the above circumstances, and provides a natural language analysis processing apparatus, method, and program capable of accurately performing natural language analysis processing while suppressing an increase in the amount of calculation. Objective.

上記の目的を達成するために第１の発明に係る自然言語解析処理装置は、少なくとも一文を含む入力文書に対して複数種類の自然言語解析処理を含む自然言語解析処理を行う自然言語解析処理装置であって、前記複数種類の自然言語解析処理の各々について、前記入力文書に対して前記自然言語解析処理を行う問題を、前記自然言語解析処理について予め定義した文字単位以上の単位の問題である部分問題に分解して、前記部分問題の集合を生成する部分問題生成手段と、前記自然言語解析処理を行うＳ個（Ｓは２以上の自然数である）の計算ノードと、前記部分問題生成手段によって生成された前記部分問題の集合について、各部分問題が少なくとも２以上のグループに属するように、前記部分問題の全集合に対する任意の部分集合で構成されるＳ個のグループを作成して前記Ｓ個の計算ノードに割り当てるグループ作成手段と、を含み、前記Ｓ個の計算ノードの各々は、前記グループ作成手段によって割り当てられた前記グループの前記部分問題の全集合に対する部分集合について、前記２以上のグループに属する各部分問題の解が一致する制約条件に基づいて、前記部分集合の各部分問題の解を更新する部分解更新手段と、前記部分解更新手段によって更新された前記部分集合の各部分問題の解を、他の計算ノードに通知すると共に、前記他の計算ノードから通知された前記部分集合の各部分問題の解を取得する同期手段と、前記同期手段によって取得した前記他の計算ノードの前記部分集合の各部分問題の解と、前記部分解更新手段によって更新された前記部分集合の各部分問題の解とに基づいて、各部分問題について、前記制約条件に従って前記部分問題の解を一致させるときの解である制約条件パラメータを更新する制約条件更新手段と、前記制約条件パラメータの値が収束したか否かを判定し、前記制約条件パラメータの値が収束したと判定するまで、前記部分解更新手段による更新、前記同期手段による通知及び取得、並びに前記制約条件更新手段による更新を繰り返す収束判定手段とを含んで構成されている。 In order to achieve the above object, a natural language analysis processing apparatus according to a first invention performs a natural language analysis process including a plurality of types of natural language analysis processes on an input document including at least one sentence. In each of the plurality of types of natural language analysis processes, the problem of performing the natural language analysis process on the input document is a problem of units greater than or equal to a character unit previously defined for the natural language analysis process. Sub-problem generating means for decomposing into sub-problems and generating the set of sub-problems, S (N is a natural number of 2 or more) calculation nodes for performing the natural language analysis processing, and the partial problem generating means For the set of subproblems generated by the above, each subproblem is composed of an arbitrary subset for the whole set of subproblems so that they belong to at least two groups Creating S groups and assigning them to the S computation nodes, wherein each of the S computation nodes is for the sub-problem of the group assigned by the group creation means. A partial decomposition updating means for updating a solution of each partial problem of the subset based on a constraint condition in which the solutions of the partial problems belonging to the two or more groups match with respect to the subset of the entire set; and the partial decomposition update Synchronizing means for notifying other sub-problems solutions of the sub-set updated by the means to other computing nodes, and obtaining the solutions of the sub-problems of the sub-set notified from the other computing nodes; Solution of each partial problem of the subset of the other computation node acquired by the synchronization means, and each part of the subset updated by the partial decomposition update means Based on the solution of the problem, for each subproblem, the constraint condition update means for updating the constraint parameter that is a solution when matching the solution of the subproblem according to the constraint condition, and the value of the constraint parameter converges Until it is determined whether or not the value of the constraint parameter has converged, and the convergence determination that repeats the update by the partial update means, the notification and acquisition by the synchronization means, and the update by the constraint condition update means Means.

第２の発明に係る自然言語解析処理方法は、部分問題生成手段、Ｓ個（Ｓは２以上の自然数である）の計算ノード、及びグループ作成手段を含み、少なくとも一文を含む入力文書に対して複数種類の自然言語解析処理を含む自然言語解析処理を行う自然言語解析処理装置における自然言語解析処理方法であって、前記部分問題生成手段によって、前記複数種類の自然言語解析処理の各々について、前記入力文書に対して前記自然言語解析処理を行う問題を、前記自然言語解析処理について予め定義した文字単位以上の単位の問題である部分問題に分解して、前記部分問題の集合を生成し、前記グループ作成手段によって、前記部分問題生成手段によって生成された前記部分問題の集合について、各部分問題が少なくとも２以上のグループに属するように、前記部分問題の全集合に対する任意の部分集合で構成されるＳ個のグループを作成して前記Ｓ個の計算ノードに割り当て、前記Ｓ個の計算ノードによって、前記自然言語解析処理を行い、前記Ｓ個の計算ノードの各々によって前記自然言語解析処理を行うことは、部分解更新手段によって、前記グループ作成手段によって割り当てられた前記グループの前記部分問題の全集合に対する部分集合について、前記２以上のグループに属する各部分問題の解が一致する制約条件に基づいて、前記部分集合の各部分問題の解を更新し、同期手段によって、前記部分解更新手段によって更新された前記部分集合の各部分問題の解を、他の計算ノードに通知すると共に、前記他の計算ノードから通知された前記部分集合の各部分問題の解を取得し、制約条件更新手段によって、前記同期手段によって取得した前記他の計算ノードの前記部分集合の各部分問題の解と、前記部分解更新手段によって更新された前記部分集合の各部分問題の解とに基づいて、各部分問題について、前記制約条件に従って前記部分問題の解を一致させるときの解である制約条件パラメータを更新し、収束判定手段によって、前記制約条件パラメータの値が収束したか否かを判定し、前記制約条件パラメータの値が収束したと判定するまで、前記部分解更新手段による更新、前記同期手段による通知及び取得、並びに前記制約条件更新手段による更新を繰り返すことを含む。 A natural language analysis processing method according to a second invention includes a sub-problem generating means, S calculation nodes (S is a natural number of 2 or more), and a group creating means for an input document including at least one sentence. a natural language analysis processing method in natural language analysis processing apparatus for natural language analysis processing including a plurality of types of natural language analysis processing, by the partial problems generating means, for each of the natural language analysis processing of the plurality of types, wherein The problem of performing the natural language analysis processing on the input document is decomposed into sub-problems that are problems of units greater than or equal to a character unit predefined for the natural language analysis processing, and the set of the partial problems is generated, With respect to the set of partial problems generated by the partial problem generating means by the group creating means, each partial problem belongs to at least two or more groups. As described above, S groups composed of arbitrary subsets of the entire subset of the subproblems are created and assigned to the S computation nodes, and the natural language analysis processing is performed by the S computation nodes. And performing the natural language analysis processing by each of the S computation nodes is performed on the subset of the partial problem of the group assigned by the group creating unit by the partial decomposition updating unit. The solution of each partial problem of the subset is updated based on a constraint condition that the solutions of the partial problems belonging to two or more groups match, and the subset of the subset updated by the partial decomposition update means is synchronized by a synchronization means Not only the solution of each subproblem is notified to other computing nodes, but also the solution of each subproblem of the subset notified from the other computing node is obtained. A solution for each subproblem of the subset of the other computation node acquired by the synchronization means by the constraint condition updating means, and a solution for each subproblem of the subset updated by the partial decomposition update means, For each subproblem, update the constraint parameter that is a solution when matching the solution of the subproblem according to the constraint, and whether or not the value of the constraint parameter has converged by the convergence determination means Until the value of the constraint parameter is converged, the update by the partial decomposition update unit, the notification and acquisition by the synchronization unit, and the update by the constraint condition update unit are repeated.

このように、第１の発明及び第２の発明によれば、複数種類の自然言語解析処理を行う問題を部分問題の集合に分解してグループを作成し、２以上のグループに属する部分問題の解が一致する制約条件に基づいて、各計算ノードにおいて、割り当てられた部分問題の部分集合について解を更新し、他の計算ノードから取得した部分問題の解を用いて、制約条件パラメータを更新することを収束するまで繰り返すことにより、計算量の増大を抑制して、精度よく自然言語解析処理を行うことができる。 As described above, according to the first and second inventions, the problem of performing a plurality of types of natural language analysis processing is divided into a set of subproblems to create a group, and subproblems belonging to two or more groups Based on the constraint conditions with which the solutions match, in each computation node, the solution is updated for a subset of the assigned subproblem, and the constraint parameter is updated using the solution of the subproblem obtained from another computation node. By repeating this until convergence, an increase in the amount of calculation can be suppressed and natural language analysis processing can be performed with high accuracy.

第３の発明に係る自然言語解析処理装置は、少なくとも一文を含む入力文書に対して複数種類の自然言語解析処理を含む自然言語解析処理を行う自然言語解析処理装置であって、前記複数種類の自然言語解析処理の各々について、前記入力文書に対して前記自然言語解析処理を行う問題を、前記自然言語解析処理について予め定義した文字単位以上の単位の問題である部分問題に分解して、前記部分問題の集合を生成する部分問題生成手段と、前記自然言語解析処理を行うＳ個（Ｓは２以上の自然数である）の計算ノードと、前記部分問題生成手段によって生成された前記部分問題の集合について、各部分問題が少なくとも２以上のグループに属するように、前記部分問題の全集合に対する任意の部分集合で構成されるＳ個のグループを作成して前記Ｓ個の計算ノードに割り当てるグループ作成手段と、制約条件更新手段と、収束判定手段とを含み、前記Ｓ個の計算ノードの各々は、前記グループ作成手段によって割り当てられた前記グループの前記部分問題の全集合に対する部分集合について、前記２以上のグループに属する各部分問題の解が一致する制約条件に基づいて、前記部分集合の各部分問題の解を更新する部分解更新手段と、を含み、前記制約条件更新手段は、各計算ノードの前記部分解更新手段によって更新された前記部分集合の各部分問題の解に基づいて、各部分問題について、前記制約条件に従って前記部分問題の解を一致させるときの解である制約条件パラメータを更新し、前記収束判定手段は、前記制約条件パラメータの値が収束したか否かを判定し、前記制約条件パラメータの値が収束したと判定するまで、各計算ノードの前記部分解更新手段による更新、及び前記制約条件更新手段による更新を繰り返す。 A natural language analysis processing apparatus according to a third invention is a natural language analysis processing apparatus that performs a natural language analysis process including a plurality of types of natural language analysis processes on an input document including at least one sentence, the plurality of types of for each of the natural language analysis processing, the problem of performing natural language analysis processing, and decompose the subproblem a predefined character units or more units problem for the natural language analysis processing on the input document, the A sub-problem generating means for generating a set of sub-problems, S calculation nodes (S is a natural number of 2 or more) for performing the natural language analysis process, and the sub-problem generated by the sub-problem generating means Create S groups of arbitrary subsets for the entire set of subproblems so that each subproblem belongs to at least two groups. Group creating means for assigning to the S computing nodes, constraint condition updating means, and convergence judging means, each of the S computing nodes being the portion of the group assigned by the group creating means Partial decomposition updating means for updating a solution of each subproblem of the subset based on a constraint condition that the solutions of the subproblems belonging to the two or more groups match with respect to a subset of the whole set of problems. The constraint condition update means matches the solutions of the subproblems according to the constraint conditions for each partial problem based on the solutions of the partial problems of the subset updated by the partial decomposition update means of each computation node. Updating the constraint parameter, which is a solution when performing the determination, the convergence determination means determines whether or not the value of the constraint parameter has converged, To a value of approximately condition parameter is determined to have converged, updating by the portion degradation updating means of each computing node, and repeating the updating by the constraint condition update unit.

第４の発明に係る自然言語解析処理方法は、部分問題生成手段、Ｓ個（Ｓは２以上の自然数である）の計算ノード、グループ作成手段、制約条件更新手段、及び収束判定手段を含み、少なくとも一文を含む入力文書に対して複数種類の自然言語解析処理を含む自然言語解析処理を行う自然言語解析処理装置における自然言語解析処理方法であって、前記部分問題生成手段によって、前記複数種類の自然言語解析処理の各々について、前記入力文書に対して前記自然言語解析処理を行う問題を、前記自然言語解析処理について予め定義した文字単位以上の単位の問題である部分問題に分解して、前記部分問題の集合を生成し、前記グループ作成手段によって、前記部分問題生成手段によって生成された前記部分問題の集合について、各部分問題が少なくとも２以上のグループに属するように、前記部分問題の全集合に対する任意の部分集合で構成されるＳ個のグループを作成して前記Ｓ個の計算ノードに割り当て、前記Ｓ個の計算ノードによって、前記自然言語解析処理を行い、前記制約条件更新手段によって更新し、前記収束判定手段によって判定することを含み、前記Ｓ個の計算ノードの各々によって前記自然言語解析処理を行うことは、部分解更新手段によって、前記グループ作成手段によって割り当てられた前記グループの前記部分問題の全集合に対する部分集合について、前記２以上のグループに属する各部分問題の解が一致する制約条件に基づいて、前記部分集合の各部分問題の解を更新することを含み、前記制約条件更新手段は、各計算ノードの前記部分解更新手段によって更新された前記部分集合の各部分問題の解に基づいて、各部分問題について、前記制約条件に従って前記部分問題の解を一致させるときの解である制約条件パラメータを更新し、前記収束判定手段は、前記制約条件パラメータの値が収束したか否かを判定し、前記制約条件パラメータの値が収束したと判定するまで、各計算ノードの前記部分解更新手段による更新、及び前記制約条件更新手段による更新を繰り返す。 A natural language analysis processing method according to a fourth invention includes a sub-problem generating means, S calculation nodes (S is a natural number of 2 or more), a group creating means, a constraint condition updating means, and a convergence determining means, A natural language analysis processing method in a natural language analysis processing apparatus for performing natural language analysis processing including a plurality of types of natural language analysis processing on an input document including at least one sentence, wherein the plurality of types for each of the natural language analysis processing, the problem of performing natural language analysis processing, and decompose the subproblem a predefined character units or more units problem for the natural language analysis processing on the input document, the A set of subproblems is generated, and each subproblem is generated for the set of subproblems generated by the subproblem generating means by the group creating means. Create S groups composed of arbitrary subsets for the entire set of subproblems so as to belong to at least two groups and assign them to the S computation nodes, and by the S computation nodes, Performing the natural language analysis process, updating by the constraint condition updating unit, and determining by the convergence determination unit, and performing the natural language analysis process by each of the S computation nodes includes partial decomposition update Means for a subset of the subset assigned to the group by the group creation means based on a constraint condition that the solutions of the subproblems belonging to the two or more groups match. Updating the solution of each subproblem, wherein the constraint condition updating means is connected to the partial decomposition updating means of each computation node. Based on the solution of each subproblem of the subset updated in this way, for each subproblem, update the constraint parameter, which is a solution when matching the solution of the subproblem according to the constraint, and determine the convergence The means determines whether or not the value of the constraint parameter has converged, and updates by the partial decomposition update means for each calculation node until it is determined that the value of the constraint parameter has converged, and the constraint condition update Repeat update by means.

このように、第３の発明及び第４の発明によれば、複数種類の自然言語解析処理を行う問題を部分問題の集合に分解してグループを作成し、２以上のグループに属する部分問題の解が一致する制約条件に基づいて、各計算ノードにおいて、割り当てられた部分問題の部分集合について解を更新し、各計算ノードで更新された部分問題の解を用いて、制約条件パラメータを更新することを収束するまで繰り返すことにより、計算量の増大を抑制して、精度よく自然言語解析処理を行うことができる。 As described above, according to the third and fourth aspects of the present invention, the problem of performing a plurality of types of natural language analysis processing is decomposed into a set of subproblems to create a group, and subproblems belonging to two or more groups Based on the constraint conditions with which the solutions match, at each computation node, the solution is updated for a subset of the assigned subproblem, and the constraint parameter is updated using the subproblem solution updated at each computation node. By repeating this until convergence, an increase in the amount of calculation can be suppressed and natural language analysis processing can be performed with high accuracy.

第５の発明に係るプログラムは、コンピュータを、上記の自然言語解析処理装置の各手段として機能させるためのプログラムである。 A program according to a fifth aspect is a program for causing a computer to function as each means of the natural language analysis processing apparatus.

以上説明したように、本発明の自然言語解析処理装置、方法、及びプログラムによれば、計算量の増大を抑制して、精度よく自然言語解析処理を行うことができる、という効果が得られる。 As described above, according to the natural language analysis processing device, method, and program of the present invention, it is possible to obtain an effect that natural language analysis processing can be performed with high accuracy while suppressing an increase in calculation amount.

極小部分問題のグループを作成する方法を説明するための図である。It is a figure for demonstrating the method to produce the group of a minimum subproblem. 本発明の第１の実施の形態に係る自然言語解析処理装置の構成を示す概略図である。It is the schematic which shows the structure of the natural language analysis processing apparatus which concerns on the 1st Embodiment of this invention. 本発明の第１の実施の形態に係る自然言語解析処理装置の計算ノードにおける自然言語解析処理ルーチンの内容を示すフローチャートである。It is a flowchart which shows the content of the natural language analysis processing routine in the calculation node of the natural language analysis processing apparatus which concerns on the 1st Embodiment of this invention. 本発明の第１の実施の形態に係る自然言語解析処理装置の構成の他の例を示す概略図である。It is the schematic which shows the other example of a structure of the natural language analysis processing apparatus which concerns on the 1st Embodiment of this invention. 本発明の第３の実施の形態に係る自然言語解析処理システムの構成を示す概略図である。It is the schematic which shows the structure of the natural language analysis processing system which concerns on the 3rd Embodiment of this invention. 本発明の第３の実施の形態に係る言語解析制御装置の構成を示す概略図である。It is the schematic which shows the structure of the language analysis control apparatus which concerns on the 3rd Embodiment of this invention. 本発明の第３の実施の形態に係る言語解析装置の構成を示す概略図である。It is the schematic which shows the structure of the language analyzer which concerns on the 3rd Embodiment of this invention. 単語区切り、品詞タグ付け、固有表現抽出、文節区切り、及び文節間の係り受けの例を示す図である。It is a figure which shows the example of word break, part-of-speech tagging, specific expression extraction, phrase break, and dependency between phrases. 従来技術における履歴ベースの逐次解析法を示す図である。It is a figure which shows the history-based sequential analysis method in a prior art. 従来技術における動的計画法に基づく解析の例を示す図である。It is a figure which shows the example of the analysis based on the dynamic programming in a prior art.

以下、図面を参照して本発明の実施の形態を詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

〔第１の実施の形態〕
＜発明の概要＞
本発明では、非特許文献４（Andre F. T. Martins, Mario A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, Eric P. Xing An Augmented Lagrangian Approach to Constrained MAP Inference Proc. of ICML, 2011.）で提案されているDD-ADMM法を用いて、全体最適化の課題に対処する。DD-ADMMでは、実際に解きたい大きな問題を小さな部分問題に分割して解く方法である。ポイントは、分割した個々の問題は、他と独立に解くことができる点であり、相対的に計算が容易になるといった利点がある。このとき、問題を分割する際に各問題で重複する部分を作っておき、その重複部分の解析結果は全て一致しなくてはいけないという制約を付与することで全体の整合性を保つ。これによって、分割した小さな部分問題の集合を解くことで、実際に解きたい大きな問題の解を得ることが可能となる。 [First Embodiment]
<Outline of the invention>
In the present invention, it is proposed in Non-Patent Document 4 (Andre FT Martins, Mario AT Figueiredo, Pedro MQ Aguiar, Noah A. Smith, Eric P. Xing An Augmented Lagrangian Approach to Constrained MAP Inference Proc. Of ICML, 2011.). The DD-ADMM method is used to address the issue of global optimization. DD-ADMM is a method of solving a large problem that you want to solve by dividing it into small subproblems. The point is that each divided problem can be solved independently of each other, and there is an advantage that calculation is relatively easy. At this time, when the problem is divided, an overlapping part is created in each problem, and the whole consistency is maintained by giving a restriction that the analysis results of the overlapping part must all match. This makes it possible to obtain a solution to a large problem that is actually desired to be solved by solving a set of divided sub-problems.

実施例として、入力文書（日本語）に対して、単語区切り、文節区切り、文区切り、品詞付与（辞書引き）、文節間係り受け関係付与を行うシステムを想定する。従来は、1,文区切り、2,単語区切り+品詞付与(一般に形態素解析と呼ばれる)、3,文節区切り、4,係り受け解析、というように、各段階の処理は、前段階の処理結果を入力として受け取り処理を行う。つまり、処理は以下のように行われる。 As an example, a system is assumed that performs word break, phrase break, sentence break, part of speech assignment (dictionary lookup), and inter-phrase dependency relation assignment for an input document (Japanese). Conventionally, the processing at each stage, such as 1, sentence break, 2, word break + part-of-speech assignment (generally called morphological analysis), 3, clause break, 4, dependency analysis, etc. Receive as input. That is, processing is performed as follows.

・入力:システムに文書が入力される
・処理1[入力]:システムに入力された文書を受け取る
・処理1:入力文書に対して文区切りを推定する
・処理1[出力]:入力文書に対して文区切りを付与した文書を出力する
・処理2[入力]:処理1の出力した文書を受け取る
・処理2:単語区切りと同時に品詞を推定する
・処理2[出力]:入力文書に対して文区切り、単語区切り、品詞を付与した文書を出力する
・処理3[入力]:処理2の出力した文書を受け取る
・処理3:文節区切りを推定する
・処理3[出力]:入力文書に対して文区切り、単語区切り、文節区切り、品詞を付与した文書を出力する
・処理4[入力]: 処理3の出力した文書を受け取る
・処理4:文節間の係り受け関係を推定する
・処理4[出力]: 入力文書に対して文区切り、単語区切り、文節区切り、品詞、文節間係り受け関係を付与した文書を出力する
・出力:入力文書に対して文区切り、単語区切り、文節区切り、品詞、文節間係り受け関係を付与した文書を出力する。・ Input: A document is input to the system ・ Process 1 [Input]: Receives the document input to the system ・ Process 1: Estimate sentence breaks for the input document ・ Process 1 [Output]: For the input document・ Process 2 [Input]: Receive the document output from Process 1 ・ Process 2: Estimate part of speech at the same time as word break ・ Process 2 [Output]: Sentence for input document Outputs a document with delimiters, word breaks, and parts of speech ・ Process 3 [Input]: Receives the document output from Process 2 ・ Process 3: Estimate phrase break ・ Process 3 [Output]: Sentence for input document Output a document with delimiters, word breaks, phrase breaks, and parts of speech ・ Process 4 [Input]: Receive the document output from Process 3 ・ Process 4: Estimate the dependency relationship between phrases ・ Process 4 [Output] : Sentence breaks, word breaks, phrase breaks, part of speech, and dependency relations between phrases for the input document・ Output: Outputs a document with sentence breaks, word breaks, phrase breaks, part of speech, and dependency relations between phrases for the input document.

それに対し、本発明では、上記の例における5種類の処理を一括して一つの枠組みで処理を行う。 On the other hand, in the present invention, the five types of processing in the above example are performed in a single framework.

・入力:システムに文書が入力される
・処理[入力]:システムに入力された文書を受け取る
・処理:文区切り、単語区切り、文節区切り、品詞、文節間係り受け関係を一括して推定する
・処理[出力]:入力文書に対して文区切り、単語区切り、文節区切り、品詞、文節間係り受け関係を付与した文書を出力する
・出力:入力文書に対して文区切り、単語区切り、文節区切り、品詞、文節間係り受け関係を付与した文書を出力する・ Input: A document is input to the system ・ Process [Input]: Receives a document input to the system ・ Process: Sentence delimiter, word delimiter, phrase delimiter, part of speech, and dependency relation between phrases are estimated collectively Process [Output]: Outputs a document with sentence breaks, word breaks, phrase breaks, part of speech, and dependency relations between phrases for the input document.Output: sentence breaks, word breaks, phrase breaks for input documents, Output a document with part-of-speech and sentence dependency

本発明の特徴として、事前に定義された極小部分問題に対し、いくつかの極小部分問題をひとまとめにしたグループを作成する処理があること、各グループで「独立」に、担当する極小部分問題の解を決定する処理のくり返しにより全体の処理を行うこと、各グループで独立に処理された結果の整合性を保持するような処理があること、最終的な解を各グループの解が一致した時に得る処理があること、などが挙げられる。これらの特徴を以下で説明する。 As a feature of the present invention, there is a process of creating a group of several minimum sub-problems together for a pre-defined minimum sub-problem. The entire process is performed by repeating the process of determining the solution, there is a process that maintains the consistency of the results processed independently in each group, and when the solution of each group matches the final solution There is a process to obtain. These features are described below.

まず、従来のパイプライン方式では、以下の（１）式で表される最適化問題をくり返し解くことになる。 First, in the conventional pipeline method, the optimization problem expressed by the following equation (1) is repeatedly solved.

上記（１）式は、自然言語解析処理に例えばI個の段階があるとした時のi∈{1,...,I}番目の言語解析処理に関する最適化問題を表している。このとき、X_iをi番目の言語解析処理の問題の入力、Y_iをi番目の言語解析処理の問題の出力とする。パイプライン方式の場合は、i−1番目の言語解析処理で得られたy_i-1を、i番目の言語解析処理の入力x_iの一部として利用する方式と言える。 The above expression (1) represents an optimization problem related to the i∈ {1,..., I} -th language analysis process when there are, for example, I stages in the natural language analysis process. At this time, let X _{i be} the input of the i-th language analysis processing problem, and Y _{i be} the output of the i-th language analysis processing problem. In the case of the pipeline method, it can be said that y _i-1 obtained by the _i− 1th language analysis processing is used as a part of the input x _i of the _ith language analysis processing.

本発明では、基本的な考え方として処理の最小単位を事前に決定する。処理の最小単位とは、それぞれの段階の処理において、問題を定義する上で最も小さい問題として定義できるものに相当し、各段階によってそれぞれ異なり、文字単位以上の単位である。例えば、単語区切りであれば、各文字間が区切れるか区切れないかと推定する問題が問題の最小単位となる。同様に、文節区切りや文区切りも、各文字間が文節、または、文の区切りになるかどうかを推定する問題が問題の最小単位と考えることができる。また、品詞付与の問題であれば、一つの文字に適した品詞を選択する問題が最小単位となる。最小単位の問題は、解きたい問題の部分に相当することから、ここでは、最小単位の問題を「極小部分問題」と呼ぶ。ここでは、極小部分問題のインデックスをrで表す。また、全ての極小部分問題のインデックスの集合をRで表す事とする。よって、解きたい問題は｜R｜個の極小部分問題で構成されている。 In the present invention, the minimum unit of processing is determined in advance as a basic idea. The minimum unit of processing corresponds to what can be defined as the smallest problem in defining a problem in each stage of processing, and is different for each stage and is a unit of character units or more. For example, in the case of word delimiters, the problem of estimating whether each character is delimited or not is the smallest problem unit. Similarly, in the case of clause breaks and sentence breaks, the problem of estimating whether each character becomes a clause or a sentence break can be considered as the smallest unit of the problem. In addition, if it is a problem of part-of-speech assignment, the problem of selecting a part-of-speech suitable for one character is the minimum unit. Since the problem of the minimum unit corresponds to the part of the problem to be solved, the problem of the minimum unit is referred to as “minimal subproblem”. Here, the index of the minimal subproblem is represented by r. Also, let R denote the set of indices for all the minimal subproblems. Therefore, the problem to be solved is composed of | R |

次に、一つの文書を一つの入力と考えると、単語区切りや品詞タグ付けは、極小部分問題の集合により定義される。よって、これらの問題の解空間をベクトルの集合Yで表すとする。入力が与えられた時に得られる一つの出力をベクトルz∈Yで表す。ここでは、極小部分問題の解はベクトルの一つの要素z_iに相当する。 Next, considering one document as one input, word breaks and part-of-speech tagging are defined by a set of minimal subproblems. Therefore, the solution space of these problems is represented by a set Y of vectors. One output obtained when an input is given is represented by a vector z∈Y. Here, the solution of the minimal sub-problem corresponds to one element z _i of the vector.

次に、本発明では、いくつかの極小部分問題でグループを構成し、グループの集合で問題を表現する。この時、各グループは極小部分問題を重複して含むことが可能である。グループの作り方は任意であるが、以下の2つの点を満たすようにする。１点目は、解きたい問題全域をカバーする形でグループを構成することであり、２点目は、各極小部分問題は複数のグループに属することである。ここでは、作成した一つのグループを全体の問題に対する一つの部分問題と考える。よって、一つのグループで一つの最適化問題として解くことになる。なお、グループに属する極小部分問題の部分集合は、複数段階の言語解析処理に関するものを含むことが可能である。 Next, in the present invention, a group is composed of several minimal subproblems, and the problem is expressed by a set of groups. At this time, each group can include the minimum subproblem in an overlapping manner. How to create a group is arbitrary, but it should satisfy the following two points. The first point is to form a group so as to cover the entire problem to be solved, and the second point is that each local subproblem belongs to a plurality of groups. Here, one created group is considered as one partial problem for the whole problem. Therefore, one group solves as one optimization problem. It should be noted that the subset of the minimal subproblems belonging to the group can include those relating to multiple stages of language analysis processing.

ここで、グループをS個作成したとする。s番目のグループに含まれる解空間の部分空間をY_sとする。つまりY_s⊆Yである。このとき、ｓ番目のグループに含まれるr番目の極小部分問題の解をz_s(r)とする。 Here, it is assumed that S groups are created. Let Y _s be a subspace of the solution space included in the sth group. That is, Y _s ⊆Y. At this time, let z _s (r) be the solution of the rth minimal subproblem included in the sth group.

グループを作成する例を図１に示す。 An example of creating a group is shown in FIG.

R_sをs番目のグループに含まれる極小部分問題のインデックスの集合とする。次に、R_sをR⊆∪_s=1 ^S R′_s、および、R_s=R′_s∩Rとなる集合とする。これは、極小部分問題ではないが全体の問題を解く上で有効な補助的な部分問題があるので、それらの部分問題を含めてグループを構成したい場合に対応するための拡張である。つまり、ここではR′_s＼Rに相当する集合に含まれるインデックスは、極小部分問題ではないが問題を解く上での有効な補助的な部分問題のインデックスに相当する。 Let R _s be the set of indices of the minimal subproblems included in the sth group. Next, R⊆∪ the _{_{^{R s s = 1 S R '}}} s, and, R _s = R' be the set to be _s .andgate.R. Although this is not a minimal subproblem, there is an auxiliary subproblem that is effective in solving the entire problem. That is, here, the index included in the set corresponding to R ′ _s \ R is not a minimal subproblem but corresponds to an index of an auxiliary subproblem effective in solving the problem.

s番目のグループで得られる出力に対して、出力の良さを計算する関数をf_sとする。ここでは、f_s(z_s)の値が大きいほど、出力として良いものであるという仮定のもとに計算される値とする。このとき、自然言語解析処理を行う問題全体は以下の（２）式に示す最適化問題として定式化できる。 For the output obtained in the sth group, let f _s be the function that calculates the goodness of the output. Here, the value calculated under the assumption that the larger the value of f _s (z _s ) is, the better the output is. At this time, the whole problem for performing the natural language analysis processing can be formulated as an optimization problem shown in the following equation (2).

ここで、z_sはグループsに含まれる極小部分問題の解の集合とする。また、zは、z_s(r)の全集合とする。この最適化問題は、S個のグループが独立に、各グループが担当する極小部分問題に対する出力z_s(r)で最も良いと思われる部分解を選択する問題を、各グループの部分解はu(r)のベクトルと一致することを制約条件として解いていることに相当する。この定式化のポイントは、制約条件z_s(r)=u(r)の部分にあり、各グループsで、r番目の極小部分問題を含む場合、z_s(r)の値は全てu(r)と一致することを解の条件とする、ということを意味している。これは、つまり、個々のグループでは、解z_s(r)を独立に決定するが、制約条件z_s(r)=u(r)により全体として整合性が取れる解に限定することができる。この制約条件の式とラグランジュ緩和とを用いて、最適化のための以下の（３）式に示す目的関数を得る。 Here, z _s is a set of solutions of the minimal subproblem included in the group s. Z is the total set of z _s (r). This optimization problem into S groups independently a problem of selecting the best a partial solution that seems at output z _s (r) for the minimum subproblem each group is responsible, partial solution of each group u This corresponds to solving as a constraint condition that it matches the vector of (r). The point of this formulation lies in the part of the constraint condition z _s (r) = u (r), and in each group s, the value of z _s (r) is all u ( This means that the condition for the solution is to match r). In other words, in each group, the solution z _s (r) is determined independently, but can be limited to a solution that can be consistent as a whole by the constraint condition z _s (r) = u (r). Using this constraint equation and Lagrangian relaxation, the objective function shown in the following equation (3) for optimization is obtained.

次に、非特許文献５（Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011.）に記載されているように、augmented Lagrangianの項P/2(z_s(r)−u(r))²を追加して、以下の（４）式のように、問題を2次式の形に変形することで問題をより解きやすい形とする。 Next, it is described in Non-Patent Document 5 (Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011.) As shown, the augmented Lagrangian term P / 2 (z _s (r) −u (r)) ² is added and the problem is transformed into a quadratic form as shown in equation (4) below. To make the problem easier to solve.

これは、最適値でもとの問題と一致する。また、以降の式展開を簡単にするためにλ_s(r)=−ρα_s(r)とし、以下の（５）式を得る。 This is consistent with the problem with the optimal value. Further, in order to simplify the subsequent expression expansion, λ _s (r) = − ρα _s (r) is set, and the following expression (5) is obtained.

＜システム構成＞
本発明の第１の実施の形態に係る自然言語解析処理装置１００は、自然言語解析処理の対象として与えられた文書を入力として受け取り、複数段階の言語解析処理を含む自然言語解析処理の結果を出力する。この自然言語解析処理装置１００は、ＣＰＵと、ＲＡＭと、後述する自然言語解析処理ルーチンを実行するためのプログラムを記憶したＲＯＭとを備えたコンピュータで構成され、機能的には次に示すように構成されている。図２に示すように、自然言語解析処理装置１００は、入力部１０と、演算部２０と、出力部３０とを備えている。 <System configuration>
The natural language analysis processing apparatus 100 according to the first embodiment of the present invention receives a document given as an object of the natural language analysis process as an input, and receives the result of the natural language analysis process including a plurality of stages of language analysis processes. Output. The natural language analysis processing apparatus 100 is configured by a computer including a CPU, a RAM, and a ROM that stores a program for executing a natural language analysis processing routine to be described later, and functionally as described below. It is configured. As shown in FIG. 2, the natural language analysis processing apparatus 100 includes an input unit 10, a calculation unit 20, and an output unit 30.

入力部１０は、入力された文書を受け付ける。また、入力部１０は、人手により入力された計算ノード数Sとパラメータρを受け付ける。 The input unit 10 receives an input document. The input unit 10 receives the number of calculation nodes S and the parameter ρ input manually.

演算部２０は、極小部分問題生成部２１、グループ作成部２２、及びS個の計算ノード２３₁〜２３_Sを備えている。なお、計算ノード２３₁〜２３_Sのうちの任意の計算ノードを示す場合には、計算ノード２３と称することとする。 The calculation unit 20 includes a minimal subproblem generation unit 21, a group creation unit 22, and S calculation nodes 23 ₁ to 23 _S. In addition, when showing arbitrary calculation nodes among the calculation nodes 23 ₁ to 23 _S , they are referred to as calculation nodes 23.

極小部分問題生成部２１は、言語解析処理の段階毎に、入力部１０により入力された入力文書に対して当該段階の言語解析処理を行う問題を、当該段階の言語解析処理について予め定義された極小部分問題に分解し、全段階における極小部分問題の集合を生成する。 For each stage of the language analysis process, the minimal partial problem generation unit 21 defines a problem for performing the language analysis process at the stage for the input document input by the input unit 10 in advance for the language analysis process at the stage. Break down into minimal subproblems and generate a set of minimal subproblems at all stages

グループ作成部２２は、極小部分問題生成部２１によって生成された極小部分問題の集合について、各極小部分問題が少なくとも２以上のグループに属するように、部分問題の部分集合からなるＳ個のグループを作成し、S個の計算ノード２３₁〜２３_Sに割り当てる。また、グループ作成部２２は、入力されたパラメータρを、S個の計算ノード２３₁〜２３_Sの各々に通知する。 The group creation unit 22 selects S groups of subsets of the subproblems so that each local subproblem belongs to at least two groups with respect to the set of local subproblems generated by the local subproblem generation unit 21. It is created and assigned to _S computing nodes 23 ₁ to 23 _S. In addition, the group creation unit 22 notifies the input parameter ρ to each of the _S computation nodes 23 ₁ to 23 _S.

S個の計算ノード２３₁〜２３_Sの各々は、パラメータ更新部３１、部分解更新部３２、同期部３３、制約条件更新部３４、及び収束判定部３５を備えている。各S個のパラメータ更新部３１、部分解更新部３２、同期部３３、制約条件更新部３４、及び収束判定部３５が存在することになるが、同様の機能を有す処理部は同じ番号で表わしている。 Each of the _S calculation nodes 23 ₁ to 23 _S includes a parameter update unit 31, a partial decomposition update unit 32, a synchronization unit 33, a constraint condition update unit 34, and a convergence determination unit 35. Each S parameter update unit 31, partial decomposition update unit 32, synchronization unit 33, constraint condition update unit 34, and convergence determination unit 35 exist, but the processing units having similar functions are the same number. It represents.

パラメータ更新部３１は、グループ作成部２２によって割り当てられたグループに含まれる極小部分問題の部分集合について、以下に説明するように、ラグランジュ未定乗数α_s（ｒ）を更新する。 The parameter updating unit 31 updates the Lagrange undetermined multiplier α _s (r) for the subset of the minimal subproblems included in the group assigned by the group creating unit 22 as described below.

（ラグランジュ未定乗数α_s（ｒ）の更新）
最初の処理として、パラメータ更新部３１は、当該計算ノード２３_sに割り当てられたグループに含まれる各極小部分問題ｒに対するラグランジュ未定乗数α_s（ｒ）を更新する。 (Update of Lagrange multiplier _s α (r))
As the first process, the parameter updating unit 31 updates the Lagrange undetermined multiplier α _s (r) for each minimal subproblem r included in the group assigned to the computation node 23 _s .

ｚとｕを固定したときα_s（ｒ）の最適値の方向は、以下の（６）式に示すように、上記（５）式に示す目的関数L_ρ(ｚ,ｕ,α)のα_s（ｒ）に関する勾配方向である。 When z and u are fixed, the direction of the optimum value of α _s (r) is α of the objective function L _ρ (z, u, α) shown in the above equation (5) as shown in the following equation (6). _s (r) is the gradient direction.

上記（６）式の関係から、以下の（７)式に示す更新式を得る。 From the relationship of the above equation (6), the update equation shown in the following equation (7) is obtained.

パラメータ更新部３１は、各極小部分問題ｒに対して、上記（７）式に従って、ラグランジュ未定乗数α_s（ｒ）を更新する。上記（７）式の更新式は、各計算ノード２３で独立に計算できるため、他の計算ノード２３と通信などを行う必要がない。 The parameter updating unit 31 updates the Lagrange undetermined multiplier α _s (r) according to the above equation (7) for each minimal subproblem r. Since the update formula (7) can be calculated independently by each calculation node 23, it is not necessary to communicate with other calculation nodes 23.

部分解更新部３２は、グループ作成部２２によって割り当てられたグループに含まれる極小部分問題ｒの部分集合について、以下に説明するように、当該計算ノード２３_sにおける各極小部分問題の解ｚ_s（ｒ）を更新する。 As will be described below, the partial decomposition update unit 32 solves each minimal subproblem solution z _s (in the calculation node 23 _s) for a subset of the minimal subproblem r included in the group assigned by the group creation unit 22. Update r).

（各極小部分問題の解ｚ_s（ｒ）の更新） (Update of solution z _s (r) for each minimal subproblem)

反復計算k（kは繰り返し回数を管理する変数）の時点で、αとｕを固定したとき各ｚ_s（ｒ）の最適解は、以下の（８)式に示すように、上記（５）式に示す目的関数L(ｚ,ｕ,α)を最大にするｚ_s（ｒ）を見つける問題である。 At the time of the iterative calculation k (k is a variable for managing the number of iterations), when α and u are fixed, the optimal solution for each z _s (r) is the above (5) as shown in the following equation (8). The problem is to find z _s (r) that maximizes the objective function L (z, u, α) shown in the equation.

ここで、f_sは以下の（９）式で表されると仮定する。 Here, it is assumed that f _s is expressed by the following equation (9).

ここでθ_s(r)は、グループsでのrに関するスコアを表しており、この値は外部から与えられるものとする。つまり、この最適化の中では定数として扱われる。 Here, θ _s (r) represents a score regarding r in the group s, and this value is given from the outside. In other words, it is treated as a constant in this optimization.

定義に従って、上記（５）式から解ｚに関係する項のみを取り出すと、以下の（１０）式が得られる。 If only the term related to the solution z is extracted from the above equation (5) according to the definition, the following equation (10) is obtained.

上記（８）式の計算も各グループs毎に計算することが可能である。つまり、計算ノード２３毎に、部分解更新部３２は、割り当てられたグループに含まれる各極小部分問題ｒについて、以下の（１２）式に示す最大化問題を解けばよい。 The calculation of equation (8) can also be calculated for each group s. That is, for each calculation node 23, the partial decomposition update unit 32 may solve the maximization problem expressed by the following equation (12) for each minimal subproblem r included in the assigned group.

同期部３３は、当該計算ノード２３で今回更新されたα_s（ｒ）とｚ_s（ｒ）を、自分以外の全ての計算ノード２３_iへ通知する。また、同期部３３は、他の計算ノード２３_i全てから通知された、今回更新されたα_s（ｒ）とｚ_s（ｒ）を受け取る。この処理によって、個々の計算ノード２３は全ての計算ノード２３_sの持つα_s（ｒ）とｚ_s（ｒ）の値を取得することができる。 The synchronization unit 33 notifies α _s (r) and z _s (r) updated this time in the computation node 23 to all computation nodes 23 _i other than itself. Further, the synchronization unit 33 receives α _s (r) and z _s (r) updated this time notified from all the other computation nodes 23 _i . By this processing, each computation node 23 can acquire the values of α _s (r) and z _s (r) possessed by all the computation nodes 23 _s .

制約条件更新部３４は、他の計算ノード２３_s全てから受け取ったα_s（ｒ）とｚ_s（ｒ）を使って、以下のように、制約条件に従って極小部分問題の解を一致させるときの解を示すパラメータｕ（ｒ）を更新する。 The constraint condition update unit 34 uses α _s (r) and z _s (r) received from all the other computation nodes 23 _s to match the solution of the minimal subproblem according to the constraint condition as follows. The parameter u (r) indicating the solution is updated.

ｚとαを固定したときｕの最適解は、上記（５）式に示す目的関数L(ｚ,ｕ,α)に対してu_s(r)に関する偏微分の値が0になる点である。その関係から以下の（１３）式、（１４）式に示す関係式が得られる。 When z and α are fixed, the optimal solution for u is that the partial differential value for u _s (r) becomes 0 with respect to the objective function L (z, u, α) shown in the above equation (5). . From the relationship, the following relational expressions (13) and (14) are obtained.

つまり、uはz_s(r)の平均とα_s(r)の平均を足したものを意味し、パラメータｕ（ｒ）の更新には、全てのα_s（ｒ）とｚ_s（ｒ）が必要である。 That is, u means the sum of the average of z _s (r) and the average of α _s (r), and all α _s (r) and z _s (r) are used to update the parameter u (r). is necessary.

そこで、制約条件更新部３４は、取得した全ての計算ノード２３_sの持つα_s（ｒ）とｚ_s（ｒ）の値を用いて、上記（１４）式に従って、全ての極小部分問題ｒの各々に対するパラメータｕ（ｒ）を更新する。 Therefore, the constraint condition update unit 34 uses the values of α _s (r) and z _s (r) possessed by all of the acquired calculation nodes 23 _s to determine all the minimal subproblems r according to the above equation (14). Update the parameter u (r) for each.

個々の計算ノード２３で独立にｕ（ｒ）を求める。ここでの注意点として、個々の計算ノード２３は独立でｕ（ｒ）を求めるが、得られるｕ（ｒ）は全ての計算ノード２３で一致する。処理方法としては、任意のひとつの計算ノード２３でｕ（ｒ）を計算し、その後に各計算ノード２３に通知するといった処理を行うようにしてもよい。ただし、その場合には、選択された計算ノード２３の計算が終了し、結果が通知されるまで、それ以外の計算ノードは待機する必要がある。本実施の形態では、個々の計算ノード２３で同じ計算を行う方式をとった場合を例に説明する。 U (r) is obtained independently at each computation node 23. It should be noted that each computation node 23 independently obtains u (r), but the obtained u (r) matches in all computation nodes 23. As a processing method, a process may be performed in which u (r) is calculated by one arbitrary calculation node 23 and then notified to each calculation node 23. However, in that case, it is necessary to wait for the other calculation nodes until the calculation of the selected calculation node 23 ends and the result is notified. In the present embodiment, a case will be described as an example in which the same calculation is performed in each calculation node 23.

収束判定部３５は、得られた制約条件のパラメータｕが収束して最適値になっているか判定する。 The convergence determination unit 35 determines whether the obtained parameter u of the constraint condition has converged to an optimum value.

二つの小さな正の実数ε₁、ε₂をあたえ、以下の（１５)式、（１６）式を満たした際に最適値に収束したと判定する（非特許文献５を参照）。 Two small positive real numbers ε ₁ and ε ₂ are given, and it is determined that they converge to the optimum value when the following equations (15) and (16) are satisfied (see Non-Patent Document 5).

収束判定で、最適値に収束していなかった場合は、ｋ=ｋ+1として、パラメータ更新部３１による処理に戻る。最適値に収束していると判定された場合は、繰り返し処理を終了する。 In the convergence determination, if it has not converged to the optimum value, k = k + 1 is set and the process returns to the parameter updating unit 31. If it is determined that the value has converged to the optimum value, the iterative process is terminated.

この収束判定の処理もｕ（ｒ）と同様に任意のひとつの計算ノード２３で行い、その結果を全体に通知するようにしてもよい。しかし、同期処理が必要となるため、本実施の形態では、収束判定も全ての計算ノード２３で個別に行い、収束と判定されれば処理を終了する場合を例に説明する。この収束判定も、全ての計算ノードで結果が必ず一致するため、個々に判定をおこなっても結果は同じになる。 Similarly to u (r), the convergence determination process may be performed by any one calculation node 23 and the result may be notified to the whole. However, since synchronous processing is required, in this embodiment, a case will be described as an example where convergence is also individually determined in all the computation nodes 23, and the processing is terminated if it is determined that convergence has occurred. The result of this convergence determination is always the same at all the computation nodes, so the result is the same even if the determination is made individually.

収束判定部３５は、最適値に収束したと判定されたときに得られた各極小部分問題ｒの解ｕ（ｒ）を組み合わせて、複数段階の言語解析処理の結果を生成し、自然言語解析処理の結果として出力部３０により出力する。なお、本実施の形態では、任意の一つの計算ノード２３から、自然言語解析処理の結果が出力される場合を例に説明したが、全ての計算ノード２３から、自然言語解析処理の結果が出力されてもよい。また、実際のランキングを生成する際には、従来法と同様に上記（２）式を用いて検索スコアを計算する。 The convergence determination unit 35 combines the solutions u (r) of the respective minimal sub-problems r obtained when it is determined that the convergence has been achieved to the optimum value, and generates a result of a plurality of stages of language analysis processing. As a result of processing, the output unit 30 outputs the result. In this embodiment, the case where the result of the natural language analysis process is output from any one calculation node 23 has been described as an example. However, the result of the natural language analysis process is output from all the calculation nodes 23. May be. When generating the actual ranking, the search score is calculated using the above equation (2) as in the conventional method.

＜自然言語解析処理装置の作用＞
次に、本実施の形態に係る自然言語解析処理装置１００の作用について説明する。まず、自然言語解析処理の対象となる入力文書が自然言語解析処理装置１００に入力されると、自然言語解析処理装置１００において、極小部分問題生成部２１によって、入力文書に対する自然言語解析処理を行う問題を、言語解析処理の段階毎の極小部分問題に分解して、全段階における極小部分問題の集合を生成する。そして、グループ作成部２２によって、生成された極小部分問題の集合について、S個のグループを作成し、S個の計算ノード２３₁〜２３_Sに割り当てる。 <Operation of natural language analysis processing device>
Next, the operation of the natural language analysis processing apparatus 100 according to the present embodiment will be described. First, when an input document to be subjected to natural language analysis processing is input to the natural language analysis processing device 100, the natural language analysis processing device 100 performs natural language analysis processing on the input document by the minimal partial problem generation unit 21. The problem is decomposed into minimal subproblems at each stage of the language analysis processing, and a set of minimal subproblems at all stages is generated. Then, the group creation unit 22 creates S groups for the generated set of minimal subproblems and assigns them to _S computation nodes 23 ₁ to 23 _S.

そして、自然言語解析処理装置１００の各計算ノード２３によって、図３に示す自然言語解析処理ルーチンが実行される。なお、以下では、計算ノード２３_sによって実行した場合について説明する。 Then, the natural language analysis processing routine shown in FIG. 3 is executed by each calculation node 23 of the natural language analysis processing apparatus 100. In the following, a case where it is executed by the computation node 23 _s will be described.

まず、ステップＳ１０１において、繰り返し回数を示す変数ｋに初期値１を設定すると共に、割り当てられたグループｓの極小部分問題ｒの部分集合に対する各ラグランジュ未定乗数α_s ⁰（ｒ）、各極小部分問題の解ｚ_s ⁰（ｒ）、及び制約条件のパラメータｕ_s ⁰（ｒ）の各々に、適当な値（例えば、０）を与えて初期化する。 First, in step S101, an initial value 1 is set for a variable k indicating the number of iterations, and each Lagrange undetermined multiplier α _s ⁰ (r) for each subset of the minimal subproblem r of the assigned group s is assigned to each minimal subproblem. An appropriate value (for example, 0) is given to each of the solution z _s ⁰ (r) and the parameter u _s ⁰ (r) of the constraint condition for initialization.

そして、ステップＳ１０２において、パラメータ更新部３１によって、上記ステップＳ１０１で初期化されたラグランジュ未定乗数α_s ⁰（ｒ）、各極小部分問題の解ｚ_s ⁰（ｒ）、及び制約条件のパラメータｕ_s ⁰（ｒ）、又は前回更新されたラグランジュ未定乗数α_s ^k-1（ｒ）、各極小部分問題の解ｚ_s ^k-1（ｒ）、及び制約条件のパラメータｕ_s ^k-1（ｒ）に基づいて、上記（７)式に従って、グループｓ内の極小部分問題ｒの各々に対するラグランジュ未定乗数α_s ^k（ｒ）を更新する。 In step S102, the parameter update unit 31 performs Lagrange undetermined multiplier α _s ⁰ (r) initialized in step S101, the solution z _s ⁰ (r) of each local subproblem, and the parameter u _{s of the} constraint condition. ⁰ (r), or Lagrange undetermined multiplier α _s ^k-1 (r) updated last time, solution z _s ^k-1 (r) of each local subproblem, and constraint parameter u _s ^k-1 (r) Based on the above, the Lagrange undetermined multiplier α _s ^k (r) for each of the minimal subproblems r in the group s is updated according to the above equation (7).

次のステップＳ１０３では、部分解更新部３２によって、上記ステップＳ１０２で更新されたラグランジュ未定乗数α_s ^k（ｒ）と、上記ステップＳ１０１で初期化された各極小部分問題の解ｚ_s ⁰（ｒ）、又は前回更新された各極小部分問題の解ｚ_s ^k-1（ｒ）に基づいて、上記（１２）式に従って、グループｓ内の極小部分問題ｒの各々に対する解ｚ_s ^k（ｒ）を更新する。 In the next step S103, the Lagrangian undetermined multiplier α _s ^k (r) updated in step S102 by the partial decomposition update unit 32 and the solution z _s ⁰ (r) of each local subproblem initialized in step S101. ), or based on previous solutions updated each minimum subproblem z _s ^k-1 (r), (12) in accordance with equation, the solution to each of the minimum partial problems r in the group s z _s ^k (r) Update.

そして、ステップＳ１０４では、同期部３３によって、上記ステップＳ１０２で更新されたラグランジュ未定乗数α_s ^k（ｒ）及び上記ステップＳ１０３で更新された各極小部分問題の解ｚ_s ^k（ｒ）を他の計算ノード２３に通知すると共に、他の計算ノード２３_i全てから、更新されたラグランジュ未定乗数α_i ^k（ｒ）及び各極小部分問題の解ｚ_i ^k（ｒ）を取得する（ｉ＝１,・・・,s−１，s＋１，・・・,S）。 In step S104, the synchronization unit 33 calculates the Lagrange undetermined multiplier α _s ^k (r) updated in step S102 and the solution z _s ^k (r) of each minimal subproblem updated in step S103 to other values. In addition to notifying the computation node 23, the updated Lagrange undetermined multiplier α _i ^k (r) and the solution z _i ^k (r) of each local subproblem are obtained from all the other computation nodes 23 _i (i = 1, ..., s-1, s + 1, ..., S).

ステップＳ１０５では、制約条件更新部３４によって、上記ステップＳ１０２で更新されたラグランジュ未定乗数α_s ^k（ｒ）及び上記ステップＳ１０３で更新された各極小部分問題の解ｚ_s ^k（ｒ）と、上記ステップＳ１０４で他の計算ノード２３_i全てから取得したラグランジュ未定乗数α_i ^k（ｒ）及び各極小部分問題の解ｚ_i ^k（ｒ）とに基づいて、上記（１４）式に従って、全ての極小部分問題ｒの各々に対する制約条件のパラメータｕ^k（ｒ）を更新する。 In step S105, the constraint condition update unit 34 updates the Lagrange undetermined multiplier α _s ^k (r) updated in step S102 and the solution z _s ^k (r) of each local subproblem updated in step S103, and Based on the Lagrange undetermined multipliers α _i ^k (r) and solutions z _i ^k (r) of the respective local subproblems obtained from all the other computation nodes 23 _{i in} step S104, all the local minima are determined according to the above equation (14). Update the constraint parameters u ^k (r) for each of the subproblems r.

次のステップＳ１０６では、更新された全てのグループｓ内の極小部分問題ｒの各々の解ｚ_s ^k（ｒ）と、更新された制約条件のパラメータｕ^k（ｒ）とに基づいて、上記（１５）式、(１６)式に従って、制約条件のパラメータｕ^k（ｒ）の全てが最適値に収束したか否かを判定する。上記（１５）式および(１６)式を満たさない場合には、収束していないと判断し、変数ｋを１インクリメントして、上記ステップＳ１０２へ戻る。一方、上記（１５）式及び(１６）式を満たした場合には、収束したと判断し、上記ステップＳ１０７へ移行する。 In the next step S106, based on the solution z _s ^k (r) of each updated minimal subproblem r in all the groups s and the updated parameter u ^k (r) of the constraint condition ( In accordance with the equations (15) and (16), it is determined whether or not all the parameters u ^k (r) of the constraint conditions have converged to the optimum values. If the above equations (15) and (16) are not satisfied, it is determined that they have not converged, the variable k is incremented by 1, and the process returns to step S102. On the other hand, if the above equations (15) and (16) are satisfied, it is determined that convergence has occurred, and the process proceeds to step S107.

ステップＳ１０７では、上記ステップＳ１０５で最終的に更新された制約条件のパラメータｕ^k（ｒ）を用いて、各段階の言語解析処理の結果を生成し、自然言語解析処理の結果として出力部３０により出力して、自然言語解析処理ルーチンを終了する。 In step S107, the result of the language analysis process at each stage is generated using the constraint parameter u ^k (r) finally updated in step S105, and the result of the natural language analysis process is output by the output unit 30. Output and end the natural language analysis processing routine.

以上説明したように、第１の実施の形態に係る自然言語解析処理装置によれば、複数段階の言語解析処理を行う問題を極小部分問題の集合に分解してグループを作成し、２以上のグループに属する極小部分問題の解が一致する制約条件に基づいて、各計算ノードにおいて、割り当てられた極小部分問題の部分集合についてラグランジュ未定乗数及び極小部分問題の解を更新し、他の計算ノードから取得したラグランジュ未定乗数及び極小部分問題の解を用いて、全ての極小部分問題に対する制約条件パラメータを更新することを収束するまで繰り返すことにより、複数段階の言語解析処理を一括して行うことができ、計算量の増大を抑制して、精度よく自然言語解析処理を行うことができる。 As described above, according to the natural language analysis processing apparatus according to the first embodiment, the problem of performing the multi-level language analysis processing is decomposed into a set of minimal subproblems, and a group is created. Update the Lagrange undetermined multiplier and the solution of the minimal subproblem for each subset of the minimal subproblems assigned to each computation node based on the constraints that the solutions of the minimal subproblems belonging to the group match. Using the acquired Lagrange undetermined multiplier and the solution of the minimal subproblem, it is possible to perform multi-level language analysis processing in a batch by repeating updating the constraint parameter for all the minimal subproblems until convergence. The natural language analysis process can be performed with high accuracy while suppressing an increase in the amount of calculation.

また、本発明を用いることにより、複数の段階で構成される言語解析処理を統合して一括して行うことができるようになる。これによって、従来のカスケード方式で解析するよりも、前の段階の解析結果の間違いの影響をその後の段階の解析に出にくくすることが可能となり、結果的に、全体の解析精度を向上させることが可能となる。また、複雑な言語解析問題を各段階に分割せずに全体最適化を少ない計算量で実現できる。 Further, by using the present invention, it becomes possible to integrate and perform language analysis processing composed of a plurality of stages. As a result, it is possible to make it harder for the analysis of the previous stage to be affected by the error in the previous stage analysis than in the conventional cascade method, and as a result, improve the overall analysis accuracy. Is possible. In addition, it is possible to achieve total optimization with a small amount of calculation without dividing a complicated language analysis problem into each stage.

また、様々な観点を統合して解析を行うことができる。例えば、辞書などの既存の情報や、部分的に解析済みのデータなども容易に統合して利用することが可能である。また、大域的な特徴も容易に統合可能である。また、問題を小さい単位に分割して問題を処理するため、並列処理による高速化が容易である。 In addition, various viewpoints can be integrated for analysis. For example, existing information such as a dictionary or partially analyzed data can be easily integrated and used. Global features can also be easily integrated. Further, since the problem is processed by dividing the problem into small units, it is easy to increase the speed by parallel processing.

なお、上記の実施の形態では、各計算ノード２３が、制約条件更新部３４及び収束判定部３５を備えている場合を例に説明したが、これに限定されるものではない。例えば、図４に示すように、各計算ノード２３は、パラメータ更新部３１及び部分解更新部３２を備え、演算部２０が、制約条件更新部３４、及び収束判定部３５を１つずつ備えるように構成してもよい。この場合には、制約条件更新部３４は、全ての計算ノード２３_sで得られたラグランジュ未定乗数α_s ^k（ｒ）と各極小部分問題の解ｚ_s ^k（ｒ）を使って、制約条件のパラメータｕ^k（ｒ）を更新し、収束判定部３５は、得られた制約条件のパラメータｕ^k（ｒ）が最適値に収束しているか判定するようにする。収束判定で、収束していなかった場合は、各計算ノード２３に得られた制約条件のパラメータｕ^k（ｒ）を通知してパラメータ更新部３１による処理に戻るようにする。 In the above embodiment, the case where each calculation node 23 includes the constraint condition update unit 34 and the convergence determination unit 35 has been described as an example. However, the present invention is not limited to this. For example, as illustrated in FIG. 4, each calculation node 23 includes a parameter update unit 31 and a partial decomposition update unit 32, and the calculation unit 20 includes one constraint condition update unit 34 and one convergence determination unit 35. You may comprise. In this case, the constraint condition update unit 34 uses the Lagrange undetermined multiplier α _s ^k (r) obtained at all the computation nodes 23 _s and the solution z _s ^k (r) of each minimal subproblem to The parameter u ^k (r) is updated, and the convergence determination unit 35 determines whether the parameter u ^k (r) of the obtained constraint condition has converged to the optimum value. If the convergence is not determined in the convergence determination, the parameter u ^k (r) of the constraint condition obtained to each computation node 23 is notified, and the process returns to the process by the parameter update unit 31.

〔第２の実施の形態〕
次に、第２の実施の形態について説明する。なお、第１の実施の形態と同様の構成となる部分については、同一符号を付して説明を省略する。 [Second Embodiment]
Next, a second embodiment will be described. In addition, about the part which becomes the structure similar to 1st Embodiment, the same code | symbol is attached | subjected and description is abbreviate | omitted.

第２の実施の形態では、ネットワークで接続された複数の言語解析装置を備えた分散並列計算環境において、複数の言語解析装置による分散並列計算で、パラメータ更新を行っている点が、第１の実施の形態と異なっている。 In the second embodiment, in the distributed parallel computing environment provided with a plurality of language analysis devices connected by a network, the parameter is updated by the distributed parallel computation by the plurality of language analysis devices. This is different from the embodiment.

＜システム構成＞
図５に示すように、第２の実施の形態に係る自然言語解析処理システム２００は、言語解析制御装置２０１、及びS個の言語解析装置２０２₁〜２０２_Sを備えている。言語解析制御装置２０１及びS個の言語解析装置２０２₁〜２０２_Sは、ネットワーク２０３を介して接続されている。なお、言語解析装置２０２₁〜２０２_Sのうちの任意の言語解析装置を示す場合には、言語解析装置２０２と称することとする。 <System configuration>
As shown in FIG. 5, the natural language analysis processing system 200 according to the second embodiment includes a language analysis control device 201 and S language analysis devices 202 _{1 to} 202 _S. The language analysis control device 201 and the S language analysis devices 202 _{1 to} 202 _S are connected via a network 203. In addition, when referring to any language analysis device among the language analysis devices 202 _{1 to} 202 _S , the language analysis device 202 is referred to.

図６に示すように、言語解析制御装置２０１は、入力部１０、演算部２２０、及び出力部２３０を備えている。 As shown in FIG. 6, the language analysis control device 201 includes an input unit 10, a calculation unit 220, and an output unit 230.

演算部２２０は、極小部分問題生成部２１及びグループ作成部２２を備えている。 The calculation unit 220 includes a minimal partial problem generation unit 21 and a group creation unit 22.

グループ作成部２２は、極小部分問題生成部２１によって生成された極小部分問題の集合について、極小部分集合の部分集合からなるＳ個のグループを作成し、ネットワーク２０３を介してS個の言語解析装置２０２₁〜２０２_Sに送信する。また、グループ作成部２２は、入力されたパラメータρを、ネットワーク２０３を介してS個の言語解析装置２０２₁〜２０２_Sの各々に送信する。 The group creation unit 22 creates S groups composed of subsets of the minimal subsets for the set of minimal subproblems generated by the minimal subproblem generation unit 21, and S language analysis devices via the network 203 to send to the 202 ₁ ~202 _S. Further, the group creation unit 22 transmits the input parameter ρ to each of the _S language analysis devices 202 _{1 to} 202 S via the network 203.

S個の言語解析装置２０２₁〜２０２_Sの各々は、図７に示すように、入力部２４０、演算部２５０、及び出力部２６０を備えている。 Each of the S language analyzers 202 _{1 to} 202 _S includes an input unit 240, a calculation unit 250, and an output unit 260, as shown in FIG.

入力部２４０は、言語解析制御装置２０１から送信されたグループｓに含まれる極小部分問題の部分集合を受け付ける。また、入力部２４０は、他の言語解析装置２０２からネットワーク２０３を介して送信された情報を受け付ける。 The input unit 240 receives a subset of minimal subproblems included in the group s transmitted from the language analysis control device 201. The input unit 240 receives information transmitted from another language analysis apparatus 202 via the network 203.

演算部２５０は、パラメータ更新部３１、部分解更新部３２、同期部３３、制約条件更新部３４、及び収束判定部３５を備えている。 The calculation unit 250 includes a parameter update unit 31, a partial decomposition update unit 32, a synchronization unit 33, a constraint condition update unit 34, and a convergence determination unit 35.

パラメータ更新部３１は、言語解析装置２０２に送信されたグループｓの極小部分問題の部分集合について、各極小部分問題ｒに対するラグランジュ未定乗数α_s（ｒ）を更新する。 The parameter updating unit 31 updates the Lagrange undetermined multiplier α _s (r) for each minimal subproblem r for the subset of the minimal subproblems of the group s transmitted to the language analysis device 202.

同期部３３は、当該言語解析装置２０２_sで今回更新されたα_s（ｒ）とｚ_s（ｒ）を、自分以外の全ての言語解析装置２０２へネットワーク２０３を介して送信する。また、同期部３３は、他の言語解析装置２０２_i全てから送信された、今回更新されたα_i（ｒ）とｚ_i（ｒ）を受け取る。この処理によって、個々の言語解析装置２０２は全ての言語解析装置２０２_sの持つα_s（ｒ）とｚ_s（ｒ）の値を取得することができる。 The synchronization unit 33 transmits α _s (r) and z _s (r) updated this time by the language analysis device 202 _s to all the language analysis devices 202 other than itself through the network 203. Further, the synchronization unit 33 receives α _i (r) and z _i (r) updated this time, which are transmitted from all the other language analysis devices 202 _i . By this processing, each language analysis device 202 can acquire the values of α _s (r) and z _s (r) of all the language analysis devices 202 _s .

制約条件更新部３４は、他の言語解析装置２０２_i全てから受け取ったα_i（ｒ）とｚ_i（ｒ）を使って、上記（１４）式に従って、各極小部分問題ｒに対する制約条件のパラメータｕ（ｒ）を更新する。 The constraint condition update unit 34 uses the parameters α _i (r) and z _i (r) received from all the other language analysis devices 202 _i according to the above equation (14), and parameters for the constraint conditions for each minimal subproblem r. Update u (r).

収束判定部３５は、得られた制約条件のパラメータｕ（ｒ）が収束して最適値になっているか判定し、収束したと判定されたときに得られた制約条件のパラメータｕ（ｒ）を組み合わせて、複数段階の言語解析処理の結果を生成し、自然言語解析処理の結果として、出力部２６０により言語解析制御装置２０１へ送信する。 The convergence determination unit 35 determines whether the obtained constraint condition parameter u (r) has converged to an optimum value, and uses the constraint condition parameter u (r) obtained when it is determined that the convergence has been achieved. In combination, a result of a multi-stage language analysis process is generated, and the result of the natural language analysis process is transmitted to the language analysis control apparatus 201 by the output unit 260.

＜自然言語解析処理システムの作用＞
次に、本実施の形態に係る自然言語解析処理システム２００の作用について説明する。まず、自然言語解析処理の対象となる文書が言語解析制御装置２０１に入力されると、言語解析制御装置２０１において、極小部分問題生成部２１によって、入力文書に対する自然言語解析処理を行う問題を、言語解析処理の段階毎の極小部分問題に分解して、極小部分問題の集合を生成する。そして、グループ作成部２２によって、生成された極小部分問題の集合について、S個のグループを作成し、ネットワーク２０３を介してS個の言語解析装置２０２へ送信して、S個の言語解析装置２０２に割り当てる。 <Operation of natural language analysis processing system>
Next, the operation of the natural language analysis processing system 200 according to the present embodiment will be described. First, when a document to be subjected to natural language analysis processing is input to the language analysis control device 201, the language analysis control device 201 uses the minimal partial problem generation unit 21 to perform a problem of performing natural language analysis processing on the input document. A set of minimal subproblems is generated by decomposing the minimal subproblems at each stage of the language analysis processing. Then, the group creation unit 22 creates S groups for the generated set of minimal subproblems, transmits them to the S language analysis devices 202 via the network 203, and the S language analysis devices 202. Assign to.

そして、各言語解析装置２０２によって、上記図３に示す自然言語解析処理ルーチンが実行される。 Then, the natural language analysis processing routine shown in FIG. 3 is executed by each language analysis device 202.

少なくとも１つの言語解析装置２０２によって、最終的に更新された制約条件のパラメータｕ（ｒ）を組み合わせて生成された自然言語解析処理の結果が、ネットワーク２０３を介して言語解析制御装置２０１へ送信される。言語解析制御装置２０１は、言語解析装置２０２により受信した自然言語解析処理の結果を出力部２３０により出力する。 The result of the natural language analysis process generated by combining the finally updated constraint parameter u (r) by at least one language analysis device 202 is transmitted to the language analysis control device 201 via the network 203. The The language analysis control device 201 outputs the result of the natural language analysis processing received by the language analysis device 202 by the output unit 230.

以上説明したように、第２の実施の形態に係る自然言語解析処理システムによれば、ネットワークを介して接続された複数の言語解析装置によって、分散並列処理による自然言語解析処理を行うため、処理を高速化できる。 As described above, according to the natural language analysis processing system according to the second embodiment, a plurality of language analysis devices connected via a network perform natural language analysis processing by distributed parallel processing. Can be speeded up.

なお、本発明は、上述した実施形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 Note that the present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the gist of the present invention.

例えば、複数段階の言語解析処理を含む自然言語解析処理を行う問題を、段階毎に予め定義された最小の単位である極小部分問題に分解する場合を例に説明したが、これに限定されるものではなく、最小の単位より大きい単位の部分問題を定義して、当該部分問題に分解するようにしてもよい。例えば、最小の単位である極小部分問題を複数組み合わせた部分問題に分解するようにしてもよい。 For example, although the case where the problem of performing the natural language analysis processing including the language analysis processing of a plurality of stages is decomposed into the minimum sub-problem which is the minimum unit predefined for each stage has been described as an example, the present invention is limited to this. Instead, a subproblem with a unit larger than the smallest unit may be defined and decomposed into the subproblem. For example, you may make it decompose | disassemble into the partial problem which combined several minimum partial problems which are the minimum units.

また、上述の自然言語解析処理装置、言語解析制御装置、言語解析装置は、内部にコンピュータシステムを有しているが、「コンピュータシステム」は、ＷＷＷシステムを利用している場合であれば、ホームページ提供環境（あるいは表示環境）も含むものとする。 Further, the above-described natural language analysis processing device, language analysis control device, and language analysis device have a computer system therein, but if the “computer system” uses a WWW system, a homepage can be used. It also includes the provision environment (or display environment).

また、本願明細書中において、プログラムが予めインストールされている実施形態として説明したが、当該プログラムを、コンピュータ読み取り可能な記録媒体に格納して提供することも可能である。 In the present specification, the embodiment has been described in which the program is installed in advance. However, the program can be provided by being stored in a computer-readable recording medium.

１０入力部
２０演算部
２１極小部分問題生成部
２２グループ作成部
２３計算ノード
３１パラメータ更新部
３２部分解更新部
３３同期部
３４制約条件更新部
３５収束判定部
１００自然言語解析処理装置
２００自然言語解析処理システム
２０１言語解析制御装置
２０２言語解析装置 DESCRIPTION OF SYMBOLS 10 Input part 20 Calculation part 21 Minimal subproblem generation part 22 Group preparation part 23 Calculation node 31 Parameter update part 32 Partial decomposition update part 33 Synchronization part 34 Restriction condition update part 35 Convergence determination part 100 Natural language analysis processing apparatus 200 Natural language analysis Processing System 201 Language Analysis Control Device 202 Language Analysis Device

Claims

A natural language analysis processing device that performs natural language analysis processing including a plurality of types of natural language analysis processing on an input document including at least one sentence,
For each of the plurality of types of natural language analysis processing, the problem of performing the natural language analysis processing on the input document is decomposed into partial problems that are problems in units of characters or more that are predefined for the natural language analysis processing. And subproblem generating means for generating the set of subproblems,
S computation nodes (S is a natural number of 2 or more) for performing the natural language analysis processing;
With respect to the set of subproblems generated by the subproblem generating means, S groups composed of arbitrary subsets with respect to the whole set of subproblems so that each subproblem belongs to at least two groups. Creating a group and assigning it to the S computation nodes;
Each of the S compute nodes is
For each subset of the subsets assigned to the group by the group creating means, each subset problem of the subset is based on a constraint condition in which the solutions of the subset problems belonging to the two or more groups match. Partial decomposition updating means for updating the solution of
Notifying the other calculation nodes of the solutions of the respective partial problems updated by the partial decomposition updating means, and obtaining the solutions of the respective partial problems of the subset notified from the other calculation nodes Synchronization means;
Based on the solution of each subproblem of the subset of the other computation node obtained by the synchronization means and the solution of each subproblem of the subset updated by the partial decomposition update means, , Constraint condition updating means for updating a constraint parameter that is a solution when matching the solution of the partial problem according to the constraint condition;
It is determined whether or not the value of the constraint parameter has converged, and until it is determined that the value of the constraint parameter has converged, the update by the partial decomposition update unit, the notification and acquisition by the synchronization unit, and the constraint condition A natural language analysis processing apparatus comprising: a convergence determination unit that repeats updating by the updating unit.

The partial decomposition updating means, for the subset of the partial problem of the group assigned by the group creating means, the Lagrange undetermined multiplier updated last time, the solution of each partial problem of the subset, and the constraint parameter To update the Lagrangian undetermined multiplier so as to optimize a predetermined objective function value, and to use the updated Lagrange undetermined multiplier to optimize the value of the objective function, Update the solution of each subproblem of the subset,
The synchronizing means notifies the Lagrange undetermined multiplier updated by the partial decomposition updating means and the solution of each subproblem of the subset to other calculation nodes, and the Lagrange notified from the other calculation nodes. Obtain the undetermined multiplier and the solution of each subproblem of the subset,
The constraint condition update means includes the Lagrange undetermined multiplier and the solution of each partial problem of the subset obtained by the synchronization means, the Lagrange undetermined multiplier updated by the partial decomposition update means, and the The natural language analysis processing apparatus according to claim 1, wherein the constraint parameter is updated for each subproblem so that the value of the objective function is optimized based on a solution of each subproblem of the subset.

A natural language analysis processing device that performs natural language analysis processing including a plurality of types of natural language analysis processing on an input document including at least one sentence,
For each of the plurality of types of natural language analysis processing, the problem of performing the natural language analysis processing on the input document is decomposed into partial problems that are problems in units of characters or more that are predefined for the natural language analysis processing. And subproblem generating means for generating the set of subproblems,
S computation nodes (S is a natural number of 2 or more) for performing the natural language analysis processing;
With respect to the set of subproblems generated by the subproblem generating means, S groups composed of arbitrary subsets with respect to the whole set of subproblems so that each subproblem belongs to at least two groups. Group creating means for creating and assigning to the S computing nodes;
Including constraint update means and convergence determination means,
Each of the S compute nodes is
For each subset of the subsets assigned to the group by the group creating means, each subset problem of the subset is based on a constraint condition in which the solutions of the subset problems belonging to the two or more groups match. A partial decomposition updating means for updating the solution of
The constraint condition update means matches the solution of the partial problem according to the constraint condition for each partial problem based on the solution of each partial problem of the subset updated by the partial decomposition update means of each computation node. Update the constraint parameter, which is the solution when
The convergence determination means determines whether or not the value of the constraint parameter has converged, and updates by the partial decomposition update means of each calculation node until determining that the value of the constraint parameter has converged, and A natural language analysis processing device that repeats updating by means of constraint condition updating.

The partial decomposition updating means, for the subset of the partial problem of the group assigned by the group creating means, the Lagrange undetermined multiplier updated last time, the solution of each partial problem of the subset, and the constraint parameter To update the Lagrangian undetermined multiplier so as to optimize a predetermined objective function value, and to use the updated Lagrange undetermined multiplier to optimize the value of the objective function, Update the solution of each subproblem of the subset,
The constraint condition update means optimizes the value of the objective function based on the Lagrange undetermined multiplier updated by the partial decomposition update means of each computation node and the solution of each partial problem of the subset. The natural language analysis processing apparatus according to claim 3, wherein the constraint parameter is updated for each partial problem.

A natural language analysis process including a plurality of types of natural language analysis processes for an input document including at least one sentence, including a subproblem generation unit, S (S is a natural number of 2 or more) calculation nodes, and a group creation unit A natural language analysis processing method in a natural language analysis processing apparatus for performing
For each of the plurality of types of natural language analysis processing, the partial problem generating means determines a problem of performing the natural language analysis processing on the input document in units of characters or units that are predefined for the natural language analysis processing. Break down into subproblems that are problems to generate the set of subproblems,
The set of subproblems generated by the subproblem generation means by the group creating means is composed of an arbitrary subset of the whole set of subproblems so that each subproblem belongs to at least two groups. Create S groups and assign them to the S compute nodes,
The natural language analysis processing is performed by the S calculation nodes,
Performing the natural language analysis process by each of the S computation nodes is as follows:
Based on the constraint condition that the solutions of the partial problems belonging to the two or more groups match with respect to a subset of the partial problem set of the group assigned by the group creation means by the partial decomposition update means, Update the solution of each subproblem in the subset,
The synchronization means notifies the solution of each subproblem of the subset updated by the subdivision updating means to other calculation nodes, and also notifies each subproblem of the subset notified from the other calculation node. Get the solution,
Based on the solution of each partial problem of the subset of the other computation node acquired by the synchronization means by the constraint update means and the solution of each partial problem of the subset updated by the partial decomposition update means For each subproblem, update the constraint parameter that is a solution when matching the solution of the subproblem according to the constraint,
The convergence determination means determines whether or not the value of the constraint parameter has converged, and updates by the partial decomposition update means, notification and acquisition by the synchronization means until it is determined that the value of the constraint parameter has converged And repeating the update by the constraint condition update means
Natural language analysis processing method.

The partial decomposition updating means, for the subset of the partial problem of the group assigned by the group creating means, the Lagrange undetermined multiplier updated last time, the solution of each partial problem of the subset, and the constraint parameter To update the Lagrangian undetermined multiplier so as to optimize a predetermined objective function value, and to use the updated Lagrange undetermined multiplier to optimize the value of the objective function, Update the solution of each subproblem of the subset,
The synchronizing means notifies the Lagrange undetermined multiplier updated by the partial decomposition updating means and the solution of each subproblem of the subset to other calculation nodes, and the Lagrange notified from the other calculation nodes. Obtain the undetermined multiplier and the solution of each subproblem of the subset,
The constraint condition update means includes the Lagrange undetermined multiplier and the solution of each partial problem of the subset obtained by the synchronization means, the Lagrange undetermined multiplier updated by the partial decomposition update means, and the The natural language analysis processing method according to claim 5, wherein the constraint parameter is updated for each subproblem so as to optimize the value of the objective function based on a solution of each subproblem of the subset.

It includes a sub-problem generating means, S calculation nodes (S is a natural number of 2 or more), a group creating means, a constraint condition updating means, and a convergence determining means, and a plurality of types of natural for an input document including at least one sentence A natural language analysis processing method in a natural language analysis processing apparatus for performing natural language analysis processing including language analysis processing,
For each of the plurality of types of natural language analysis processing, the partial problem generating means determines a problem of performing the natural language analysis processing on the input document in units of characters or units that are predefined for the natural language analysis processing. Break down into subproblems that are problems to generate the set of subproblems,
The set of subproblems generated by the subproblem generation means by the group creating means is composed of an arbitrary subset of the whole set of subproblems so that each subproblem belongs to at least two groups. Create S groups and assign them to the S compute nodes,
The natural language analysis processing is performed by the S calculation nodes,
Updated by the constraint condition update means,
Determining by the convergence determining means,
Performing the natural language analysis process by each of the S computation nodes is as follows:
Based on the constraint condition that the solutions of the partial problems belonging to the two or more groups match with respect to a subset of the partial problem set of the group assigned by the group creation means by the partial decomposition update means, Updating the solution of each subproblem of the subset,
The constraint condition update means matches the solution of the partial problem according to the constraint condition for each partial problem based on the solution of each partial problem of the subset updated by the partial decomposition update means of each computation node. Update the constraint parameter, which is the solution when
The convergence determination means determines whether or not the value of the constraint parameter has converged, and updates by the partial decomposition update means of each calculation node until determining that the value of the constraint parameter has converged, and Natural language analysis processing method that repeats the update by the constraint update means.

The program for functioning a computer as each means of the natural language analysis processing apparatus of any one of Claims 1-4.