JP6499537B2 - Connection expression structure analysis apparatus, method, and program - Google Patents
- Publication number: JP6499537B2 (application JP2015141649A)
- Authority: Japan (JP)
- Prior art keywords
- connection
- sentence
- expression
- term
- connection expression
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Machine Translation (AREA)
Description
The present invention relates to a connection expression term structure analysis apparatus, method, and program, and more particularly to a connection expression term structure analysis apparatus, method, and program for extracting, from a given document, the terms connected by a connection expression.
Conventionally, there is a technique for extracting from a given document a connection expression, its semantic class, and the two text spans (portions of text in the document) that serve as the terms connected by that connection expression. This technique is called connection expression-term structure analysis. For example, in the following sentence the connection expression "because" links term 1, "He caught a cold", and term 2, "he got soaked in the rain", in a causal relationship.
He caught a cold because he got soaked in the rain.
Collecting large numbers of such semantically linked pairs of text spans for each semantic class, and using them as a knowledge source, can improve the quality of various natural language processing tasks (entailment recognition, document summarization, machine translation, and so on).
The relationship between a connection expression and its terms is broadly either explicit or implicit. In the explicit case the connection expression itself appears in the text, for example when two spans are linked by "because", which expresses a causal relationship, or "after", which expresses a temporal transition. On the other hand, there are sentence pairs that semantically express a causal relationship or a temporal transition even though no connection expression appears. For example, a causal relationship holds between the following two sentences.
It had been raining since morning.
The baseball game was canceled as well.
The conventional connection expression-term structure analysis technique (Non-Patent Document 1) extracts connection expressions, semantic classes, and terms from a document using the configuration of the connection expression term structure analysis apparatus shown in FIG. 8.
In the conventional connection expression term structure analysis apparatus, explicit connection expression-term structures are extracted by the following steps (1) to (4).
(1) Every connection expression stored in the connection expression candidate dictionary is extracted from the input document, and each one is judged as to whether it is a connection expression that takes terms (connection expression extraction unit).
(2) If it is judged to be a connection expression that takes terms, its semantic class is determined using a classifier.
(3) It is then determined whether the two terms of the connection expression (term 1 and term 2) appear in the same sentence (SS) or whether term 1 appears in the preceding sentence (PS).
(4) Depending on the case, the two terms corresponding to the connection expression are extracted using the intra-sentence term extraction unit or the inter-sentence term extraction unit.
Implicit connection expression-term structures are extracted by the following steps (1) and (2).
(1) Pairs of adjacent sentences in the document are extracted (term 1 is taken from the earlier sentence and term 2 from the later one), and a semantic class is assigned using the semantic class classification unit. Multiple semantic classes are assumed to be defined in advance; Non-Patent Document 1 uses four: Expansion, Contingency, Temporal, and Comparison. The semantic class classification unit outputs one of the semantic classes when a connection relationship holds between the two sentences, and outputs that there is no connection relationship otherwise. In other words, it classifies the relationship between two sentences into (number of semantic classes + 1) classes.
(2) For each sentence pair that was assigned a semantic class, the terms are extracted using the inter-sentence term extraction unit.
In this way, the conventional connection expression term structure analysis apparatus extracts explicit and implicit connection expression-term structures from a document.
The conventional method extracts terms without considering the discourse structure of the document or its sentences. For implicit connection expression-term extraction, only two adjacent sentences are considered. Even for explicit connection expression-term extraction, when term 1 and term 2 do not both appear in the sentence containing the connection expression, term 1 is extracted from the sentence immediately preceding that sentence. However, connection relationships that cross sentence boundaries are not limited to adjacent sentences, so some connection expression-term structures that should be extracted are missed.
The present invention has been made to solve the above problem, and its object is to provide a connection expression term structure analysis apparatus, method, and program capable of extracting terms connected by a connection expression even from non-adjacent sentences.
To achieve the above object, a connection expression term structure analysis apparatus according to a first invention includes: a discourse structure analysis unit that, based on an input document, generates a discourse structure tree in which each sentence of the document is represented by a node, based on the rhetorical structure of those sentences; a syntax analysis unit that parses each sentence in the document to generate a syntax tree; a connection expression extraction unit that extracts connection expressions that take terms, based on the syntax tree generated for each sentence by the syntax analysis unit; a term positional relationship determination unit that determines, for each connection expression extracted by the connection expression extraction unit, whether the two terms connected by the connection expression appear within the sentence containing it; an intra-sentence term extraction unit that, when the term positional relationship determination unit determines that the two terms appear within the sentence containing the connection expression, extracts the two terms from that sentence; an inter-sentence term extraction unit that, when the term positional relationship determination unit determines that the two terms do not both appear within that sentence, extracts one of the two terms from the sentence containing the connection expression and extracts the other from a sentence corresponding to a parent node or sibling node of that sentence in the discourse structure tree generated by the discourse structure analysis unit; and a semantic class classification unit that classifies the semantic class of the connection expression based on the connection expression extracted by the connection expression extraction unit.
A connection expression term structure analysis apparatus according to a second invention includes: a discourse structure analysis unit that, based on an input document, generates a discourse structure tree in which each sentence of the document is represented by a node, based on the rhetorical structure of those sentences; a related sentence pair extraction unit that, based on the discourse structure tree generated by the discourse structure analysis unit, takes sentence pairs corresponding to parent-child nodes and sentence pairs corresponding to sibling nodes as candidate sentence pairs having a connection relationship, and determines for each candidate whether a connection relationship holds; an inter-sentence term extraction unit that, for each candidate pair determined by the related sentence pair extraction unit to have a connection relationship, extracts from the pair the two terms connected by an implicit connection expression; and a semantic class classification unit that, for each candidate pair determined to have a connection relationship, classifies the semantic class of the implicit connection expression based on the pair.
A connection expression term structure analysis method according to a third invention includes the steps of: a discourse structure analysis unit generating, based on an input document, a discourse structure tree in which each sentence of the document is represented by a node, based on the rhetorical structure of those sentences; a syntax analysis unit parsing each sentence in the document to generate a syntax tree; a connection expression extraction unit extracting connection expressions that take terms, based on the syntax tree generated for each sentence by the syntax analysis unit; a term positional relationship determination unit determining, for each connection expression extracted by the connection expression extraction unit, whether the two terms connected by the connection expression appear within the sentence containing it; an intra-sentence term extraction unit extracting the two terms from the sentence containing the connection expression when the term positional relationship determination unit determines that the two terms appear within that sentence; an inter-sentence term extraction unit, when the term positional relationship determination unit determines that the two terms do not both appear within that sentence, extracting one of the two terms from the sentence containing the connection expression and extracting the other from a sentence corresponding to a parent node or sibling node of that sentence in the discourse structure tree generated by the discourse structure analysis unit; and a semantic class classification unit classifying the semantic class of the connection expression based on the connection expression extracted by the connection expression extraction unit.
A connection expression term structure analysis method according to a fourth invention includes the steps of: a discourse structure analysis unit generating, based on an input document, a discourse structure tree in which each sentence of the document is represented by a node, based on the rhetorical structure of those sentences; a related sentence pair extraction unit taking, based on the discourse structure tree generated by the discourse structure analysis unit, sentence pairs corresponding to parent-child nodes and sentence pairs corresponding to sibling nodes as candidate sentence pairs having a connection relationship, and determining for each candidate whether a connection relationship holds; an inter-sentence term extraction unit extracting, for each candidate pair determined by the related sentence pair extraction unit to have a connection relationship, the two terms connected by an implicit connection expression from the pair; and a semantic class classification unit classifying, for each candidate pair determined to have a connection relationship, the semantic class of the implicit connection expression based on the pair.
A program according to a fifth invention is a program for causing a computer to function as each unit constituting the connection expression term structure analysis apparatus according to the first or second invention.
The connection expression term structure analysis apparatus, method, and program of the present invention have the effect that terms connected by a connection expression can be extracted even from non-adjacent sentences.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
<Configuration of the connection expression term structure analysis apparatus according to the first embodiment of the present invention>
First, the configuration of the connection expression term structure analysis apparatus according to the first embodiment of the present invention will be described. The apparatus according to the first embodiment extracts, for explicit connection expressions, the connection expressions, their terms, and their semantic labels from a document.
As shown in FIG. 1, the connection expression term structure analysis apparatus 100 according to the first embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the connection expression term structure analysis processing routine described later. Functionally, as shown in FIG. 1, the apparatus 100 includes an input unit 10, a computation unit 20, and an output unit 50.
The input unit 10 receives the document to be analyzed.
The computation unit 20 includes a sentence division unit 30, a discourse structure analysis unit 32, a syntax analysis unit 34, a connection expression extraction unit 36, a term positional relationship determination unit 38, an intra-sentence term extraction unit 40, an inter-sentence term extraction unit 42, and a semantic class classification unit 44.
The sentence division unit 30 obtains the document received by the input unit 10 and marks the sentence boundaries in it. Sentence boundaries are identified with an existing sentence splitter; alternatively, sentence-final punctuation alone may be used as the cue. A document that has already been split into sentences may also be received by the input unit 10, in which case the processing of the sentence division unit 30 may be omitted.
The discourse structure analysis unit 32 generates, based on the document whose sentence boundaries were marked by the sentence division unit 30, a discourse structure tree in which each sentence of the document is represented by a node, based on the rhetorical structure of those sentences. The discourse structure tree expresses parent-child relationships between sentence nodes. It can be obtained by generating an RST tree with a rhetorical structure parser such as that of Non-Patent Document 2 and then applying the rules described in Non-Patent Document 3 to determine the parent-child relationships between sentence nodes. Generating an RST tree is not strictly required: the parent-child relationships between sentence nodes can also be obtained with a parser trained on rhetorical structure tree data that directly encodes those relationships.
[Non-Patent Document 2]: duVerle, D. and Prendinger, H. 'A Novel Discourse Parser Based on Support Vector Machine Classification'. Proc. of the 47th ACL, pp. 665-675 (2009).
[Non-Patent Document 3]: Tsutomu Hirao, Yasuhisa Yoshida, Masaaki Nishino, Norihito Yasuda and Masaaki Nagata. 'Single-Document Summarization as a Tree Knapsack Problem'. Proc. of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1515-1520 (2013).
The syntax analysis unit 34 parses each sentence in the document whose sentence boundaries were marked by the sentence division unit 30 and generates a syntax tree. A variety of parsing software is available, so existing software may be used to generate the syntax tree of each sentence.
The connection expression extraction unit 36 extracts connection expressions that take terms, based on the syntax tree generated for each sentence by the syntax analysis unit 34. Specifically, it first looks up the words appearing in the document in a manually prepared connection expression candidate dictionary (not shown) and extracts the words that match a dictionary entry. It then judges whether each matched word is a connection expression that takes terms, using a binary classifier such as an SVM or logistic regression trained on data annotated with whether each dictionary entry occurrence takes terms, and extracts those that do. Features such as the following (1) to (5) may be used for this judgment.
(1) The dictionary entry expression and its part of speech
(2) The five words before and after the dictionary entry expression and their parts of speech
(3) The depth of the dictionary entry expression in the syntax tree
(4) The parent, left sibling, and right sibling of the dictionary entry expression in the syntax tree
(5) The path from the dictionary entry expression to the root of the syntax tree
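Features (1) to (5) can be illustrated with a small sketch. This is not the patent's implementation: the `Node` class, the `pre` helper, the function name `connective_features`, the dictionary keys, and the choice to treat the part-of-speech node as "the node of the connection expression" are all assumptions made for illustration, and only single-word connection expressions are handled. The example tree is a guess at the FIG. 2 parse of "I like him because he is honest".

```python
class Node:
    """Minimal constituency-tree node (hypothetical); leaves carry a word."""
    def __init__(self, label, children=(), word=None):
        self.label, self.word = label, word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def leaves(self):
        if self.word is not None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def pre(tag, word):
    """Build a part-of-speech node over a single word leaf."""
    return Node(tag, [Node(word, word=word)])


def connective_features(leaf, window=5):
    """Collect features (1)-(5) for a one-word connection expression."""
    pos_node = leaf.parent                      # node holding the POS tag
    path, node = [], pos_node
    while node is not None:                     # (5) labels up to the root
        path.append(node.label)
        node = node.parent
    root = leaf
    while root.parent is not None:
        root = root.parent
    leaves = root.leaves()
    i = next(k for k, l in enumerate(leaves) if l is leaf)
    context = leaves[max(0, i - window):i] + leaves[i + 1:i + 1 + window]
    siblings = pos_node.parent.children if pos_node.parent else [pos_node]
    j = next(k for k, s in enumerate(siblings) if s is pos_node)
    return {
        "conn": leaf.word.lower(),                                   # (1)
        "conn_pos": pos_node.label,                                  # (1)
        "context": [(l.word, l.parent.label) for l in context],      # (2)
        "depth": len(path) - 1,                                      # (3)
        "parent": pos_node.parent.label if pos_node.parent else None,  # (4)
        "left_sibling": siblings[j - 1].label if j > 0 else None,
        "right_sibling": siblings[j + 1].label if j + 1 < len(siblings) else None,
        "path_to_root": path,                                        # (5)
    }


# Assumed parse of the FIG. 2 sentence.
because = Node("because", word="because")
tree = Node("S", [
    Node("NP", [pre("PRP", "I")]),
    Node("VP", [
        pre("VBP", "like"),
        Node("NP", [pre("PRP", "him")]),
        Node("SBAR", [
            Node("IN", [because]),
            Node("S", [
                Node("NP", [pre("PRP", "he")]),
                Node("VP", [pre("VBZ", "is"),
                            Node("ADJP", [pre("JJ", "honest")])]),
            ]),
        ]),
    ]),
])
feats = connective_features(because)
```

The resulting dictionary would then be vectorized and passed to whichever binary classifier is used; that step is omitted here.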
The term positional relationship determination unit 38 determines, for each connection expression extracted by the connection expression extraction unit 36, whether the two terms connected by the connection expression appear within the sentence containing it. Specifically, as with the connection expression extraction unit 36, it makes this determination using a binary classifier such as an SVM or logistic regression trained in advance on training data. The features used are features (1) to (5) above, as used by the connection expression extraction unit 36, plus the position at which the connection expression appears in the sentence (beginning, middle, end, and so on).
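The extra position feature could be computed along the following lines. The source only says that the position of the connection expression in the sentence is used; splitting the sentence into thirds, the bucket names, and the function name `position_bucket` are assumptions for this sketch.

```python
def position_bucket(conn_index, sentence_length):
    """Map a 0-based token index to 'initial', 'middle', or 'final' by thirds."""
    if not 0 <= conn_index < sentence_length:
        raise ValueError("conn_index must lie inside the sentence")
    return ("initial", "middle", "final")[conn_index * 3 // sentence_length]


tokens = "I like him because he is honest".split()
bucket = position_bucket(tokens.index("because"), len(tokens))
```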
When the term positional relationship determination unit 38 determines that the two terms connected by the connection expression do not both appear within the sentence containing it, it designates the sentence containing the connection expression as the sentence from which term 2 is to be extracted, and designates the sentence corresponding to the parent node or a sibling node of that sentence in the discourse structure tree generated by the discourse structure analysis unit 32 as the sentence from which term 1 is to be extracted.
When the term positional relationship determination unit 38 determines that the two terms connected by the connection expression appear within the sentence containing it, the intra-sentence term extraction unit 40 extracts term 1 and term 2 from that sentence and outputs them to the output unit 50.
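The parent/sibling lookup can be sketched as follows, assuming the discourse structure tree is stored as a mapping from each sentence index to the index of its parent sentence (`None` for the root). The mapping format and the function name `term1_candidate_sentences` are hypothetical, and the source does not state how a single sentence is chosen when several candidates exist, so the sketch simply returns all of them.

```python
def term1_candidate_sentences(parent_of, sent_id):
    """Return the parent sentence (if any) followed by the sibling sentences."""
    parent = parent_of[sent_id]
    candidates = [] if parent is None else [parent]
    candidates += [s for s, p in parent_of.items()
                   if p == parent and s != sent_id]
    return candidates


# Hypothetical four-sentence document: sentences 1 and 2 depend on sentence 0,
# and sentence 3 depends on sentence 2.
parent_of = {0: None, 1: 0, 2: 0, 3: 2}
candidates_for_2 = term1_candidate_sentences(parent_of, 2)
candidates_for_3 = term1_candidate_sentences(parent_of, 3)
```

For sentence 2 the candidates include sentence 0, which is not adjacent to it; a lookup limited to the immediately preceding sentence would only ever consider sentence 1.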
Specifically, the intra-sentence term extraction unit 40 receives, from among the syntax trees generated by the syntax analysis unit 34, the syntax tree of the sentence containing the connection expression, and extracts term 1 and term 2 by applying the rules below according to whether the connection expression is a subordinating conjunction or a coordinating conjunction. The mapping from each connection expression to subordinating or coordinating is given manually in advance.
First, the method of extracting term 1 and term 2 when the connection expression is a subordinating conjunction will be described.
When the connection expression is a subordinating conjunction, term 2 is extracted by the following steps (1) and (2).
(1) Assign the node representing the last word of the target connection expression to a node variable x, which denotes a node of the syntax tree.
(2) Assign the parent node of x to x. Repeat this operation until the node assigned to x has the label SBAR or S; the text span dominated by x at that point is term 2.
FIG. 2 shows an example of the extraction. First, "because" is assigned to x. Since "because" is neither S nor SBAR, its parent node IN is assigned to x. Since IN is neither S nor SBAR, its parent node SBAR is assigned to x. Because x is now SBAR, the procedure ends, and the span dominated by that SBAR, "because he is honest", is term 2.
Next, when the connection expression is a subordinating conjunction, term 1 is extracted by the following steps (1) and (2). Here x carries over the value it had when the procedure for term 2 ended.
(1) Assign the parent node of x to x.
(2) Repeat (1) until the node assigned to x has the label SBAR or S. Take the text span dominated by x at that point and remove the span of term 2 from it; the remainder is term 1.
In the example of FIG. 2, when term 2 has been determined, x holds the SBAR that dominates "because he is honest", so its parent node VP is assigned to x. Since VP is neither S nor SBAR, its parent node S is then assigned to x. Because x is now S, the procedure ends. The span dominated by S, "I like him because he is honest", is taken, and removing the term 2 span "because he is honest" from it leaves "I like him", which is term 1.
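The two climbs for the subordinating case can be sketched together as follows. The `Node` class, the `pre` helper, and the function names are hypothetical, and the tree is an assumed reconstruction of the FIG. 2 parse based on the walk-through in the text (because → IN → SBAR, then SBAR → VP → S).

```python
CLAUSE_LABELS = {"S", "SBAR"}


class Node:
    """Minimal constituency-tree node (hypothetical); leaves carry a word."""
    def __init__(self, label, children=(), word=None):
        self.label, self.word = label, word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def leaves(self):
        if self.word is not None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]


def pre(tag, word):
    """Build a part-of-speech node over a single word leaf."""
    return Node(tag, [Node(word, word=word)])


def climb_to_clause(node):
    """Move to the parent repeatedly until an S or SBAR node is reached."""
    x = node.parent
    while x is not None and x.label not in CLAUSE_LABELS:
        x = x.parent
    return x


def extract_subordinate_terms(conn_last_leaf):
    """Return (term 1, term 2) for a subordinating connection expression."""
    term2_node = climb_to_clause(conn_last_leaf)
    term1_node = climb_to_clause(term2_node) if term2_node is not None else None
    if term1_node is None:
        raise ValueError("no enclosing S/SBAR found for both terms")
    term2_leaves = term2_node.leaves()
    in_term2 = {id(leaf) for leaf in term2_leaves}
    # Term 1 is the larger clause with the term 2 span removed.
    term1_words = [l.word for l in term1_node.leaves() if id(l) not in in_term2]
    return " ".join(term1_words), " ".join(l.word for l in term2_leaves)


because = Node("because", word="because")
tree = Node("S", [
    Node("NP", [pre("PRP", "I")]),
    Node("VP", [
        pre("VBP", "like"),
        Node("NP", [pre("PRP", "him")]),
        Node("SBAR", [
            Node("IN", [because]),
            Node("S", [
                Node("NP", [pre("PRP", "he")]),
                Node("VP", [pre("VBZ", "is"),
                            Node("ADJP", [pre("JJ", "honest")])]),
            ]),
        ]),
    ]),
])
term1, term2 = extract_subordinate_terms(because)
```

As in the text, term 2 here includes the connection expression itself.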
Next, the method of extracting term 1 and term 2 when the connection expression is a coordinating conjunction will be described.
When the connection expression is a coordinating conjunction, term 2 is extracted by the following steps (1) to (3).
(1) Assign the node representing the last word of the target connection expression to a node variable x, and assign the parent node of x to a node variable y.
(2) Assign to x and y their respective parent nodes.
(3) Repeat (2) until the leftmost words of span(x) and span(y), the spans dominated by x and y, no longer match. At that point, term 2 is the part of the span dominated by y that runs from the word immediately after the connection expression to the last word of the span.
FIG. 3 shows a first example of a structure tree from which terms are extracted. In the example of FIG. 3, "and" is first assigned to x and CC to y. Since the leftmost words of span(x) and span(y) are both "and", CC is assigned to x and S to y. The leftmost word of span(x) is now "and" while that of span(y) is "He"; they do not match, so the process ends. Within span(y), that is, "He became a student and he received a grant", the span "he received a grant" that starts immediately after "and" is taken as term 2.
FIG. 4 shows a second example of a structure tree from which terms are extracted. In the example of FIG. 4, "but" is first assigned to x and CC to y. Since the leftmost words of span(x) and span(y) are both "but", CC is assigned to x and VP to y. The leftmost words of span(x) and span(y) are now "but" and "were" respectively; they do not match, so the process ends. Within the span dominated by y, the span "were not adjusted for inflation" that starts immediately after "but" is taken as term 2.
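The term 2 procedure for coordinating conjunctions can be sketched as below. The `Node` class, the function name `term2_coordinate`, and the simplified trees standing in for FIG. 3 and FIG. 4 (words attached directly under their phrase nodes, the comma of FIG. 4 left out) are assumptions for illustration only.

```python
class Node:
    """Minimal constituency-tree node; a node without children is a word."""

    def __init__(self, label, *children):
        self.label = label
        self.parent = None
        self.children = [c if isinstance(c, Node) else Node(c) for c in children]
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def term2_coordinate(conn_last):
    """Return (term 2 text, x, y) for the connective whose last word is conn_last."""
    x, y = conn_last, conn_last.parent            # step (1)
    while x.leaves()[0] is y.leaves()[0]:         # test of step (3)
        if y.parent is None:                      # reached the root: no term 2
            return None, x, y
        x, y = x.parent, y.parent                 # step (2)
    span = y.leaves()
    start = next(i for i, leaf in enumerate(span) if leaf is conn_last) + 1
    return " ".join(leaf.label for leaf in span[start:]), x, y


# Simplified stand-in for FIG. 3.
and_word = Node("and")
fig3 = Node("S",
            Node("S", "He", "became", "a", "student"),
            Node("CC", and_word),
            Node("S", "he", "received", "a", "grant"))

# Simplified stand-in for FIG. 4.
but_word = Node("but")
fig4 = Node("S",
            Node("NP", "The", "figures"),
            Node("VP",
                 Node("VP", "were", "adjusted", "for", "deflation"),
                 Node("CC", but_word),
                 Node("VP", "were", "not", "adjusted", "for", "inflation")))

term2_a, x_a, y_a = term2_coordinate(and_word)
term2_b, x_b, y_b = term2_coordinate(but_word)
print(term2_a)  # he received a grant
print(term2_b)  # were not adjusted for inflation
```

The function also returns the final x and y, because the term 1 procedure for coordinating conjunctions starts from those values.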
Next, when the connection expression is a coordinating conjunction, term 1 is extracted by the following steps (1) and (2). Here x and y carry over the values they had when the term 2 procedure ended.
(1) If an S or SBAR exists among the child nodes of y to the left of x (the rightmost one is selected if there are several), the span dominated by that node is term 1.
(2) If (1) does not apply, assign the parent of y to y and walk up the syntax tree until y has the label SBAR or S. Term 1 is the span dominated by y at that point, with the connection expression and term 2 removed.
In the example of FIG. 3, x is CC and y is S when term 2 has been determined. Since there is an S among the child nodes of y to the left of x, the span "He became a student" dominated by that S is taken as term 1.
In the example of FIG. 4, x is CC and y is VP when term 2 has been determined. Since neither S nor SBAR exists among the child nodes of y to the left of x, the parent of y is assigned to y. y is then S, so the process ends. Term 1 is "The figures were adjusted for deflation", obtained by removing "but were not adjusted for inflation" from the span "The figures were adjusted for deflation, but were not adjusted for inflation" dominated by y.
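A minimal sketch of the term 1 procedure for coordinating conjunctions follows. It takes x and y as they stand at the end of the term 2 procedure, plus the word nodes to remove in step (2). The `Node` class, the function name `term1_coordinate`, the `drop` parameter, and the simplified trees (comma of FIG. 4 omitted) are assumptions for illustration.

```python
class Node:
    """Minimal constituency-tree node; a node without children is a word."""

    def __init__(self, label, *children):
        self.label = label
        self.parent = None
        self.children = [c if isinstance(c, Node) else Node(c) for c in children]
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def term1_coordinate(x, y, drop):
    """x is a child of y; drop lists the word nodes of the connective and term 2."""
    left = y.children[: y.children.index(x)]
    clauses = [c for c in left if c.children and c.label in ("S", "SBAR")]
    if clauses:                                   # step (1): rightmost S/SBAR
        return " ".join(leaf.label for leaf in clauses[-1].leaves())
    while True:                                   # step (2): climb to S/SBAR
        y = y.parent
        if y is None:
            return None
        if y.label in ("S", "SBAR"):
            break
    dropped = {id(leaf) for leaf in drop}
    return " ".join(leaf.label for leaf in y.leaves() if id(leaf) not in dropped)


# FIG. 3 stand-in: term 2 extraction ended with x = CC, y = S.
cc3 = Node("CC", "and")
fig3 = Node("S", Node("S", "He", "became", "a", "student"), cc3,
            Node("S", "he", "received", "a", "grant"))
print(term1_coordinate(cc3, fig3, drop=[]))  # He became a student

# FIG. 4 stand-in: term 2 extraction ended with x = CC, y = VP.
cc4 = Node("CC", "but")
second = Node("VP", "were", "not", "adjusted", "for", "inflation")
vp = Node("VP", Node("VP", "were", "adjusted", "for", "deflation"), cc4, second)
fig4 = Node("S", Node("NP", "The", "figures"), vp)
term1_fig4 = term1_coordinate(cc4, vp, drop=cc4.leaves() + second.leaves())
print(term1_fig4)  # The figures were adjusted for deflation
```

With a real parser the comma before "but" would be a leaf of the tree as well; the text does not say how such punctuation is handled, so it is left out of the stand-in tree here.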
When the term positional relationship determination unit 38 determines that the two terms connected by the connection expression do not both appear in the sentence containing the connection expression, the inter-sentence term extraction unit 42 extracts term 2 from the sentence containing the connection expression, which the term positional relationship determination unit 38 has designated as the sentence for extracting term 2. It extracts term 1 from the sentence corresponding to the parent node or a sibling node of that sentence, which the term positional relationship determination unit 38 has designated as the sentence for extracting term 1, and outputs the two extracted terms to the output unit 50.
The semantic class classification unit 44 classifies the semantic class of the connection expression extracted by the connection expression extraction unit 36, and outputs the connection expression and its semantic class to the output unit 50. Specifically, it takes as input the connection expression extracted by the connection expression extraction unit 36 and the words surrounding it, and determines the semantic class of the connection expression with a multi-class classifier trained in advance on training data. Because this is a multi-class classification problem, the training data is resampled so that its class distribution is as uniform as possible.
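The text does not say how the resampling is performed. One common option is random oversampling of the smaller classes up to the size of the largest one, sketched below; the function name `resample_uniform`, the `(features, label)` data layout, and the class names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def resample_uniform(examples, seed=0):
    """Oversample minority classes so every class has as many
    examples as the largest class (one possible resampling scheme)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in examples:
        by_class[label].append((features, label))
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        # Draw extra copies at random until the class reaches the target size.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical training set: 6 "Cause", 2 "Contrast", 1 "Condition" examples.
data = ([(i, "Cause") for i in range(6)]
        + [(i, "Contrast") for i in range(2)]
        + [(0, "Condition")])
balanced = resample_uniform(data)
counts = Counter(label for _, label in balanced)
print(counts["Cause"], counts["Contrast"], counts["Condition"])  # 6 6 6
```

Undersampling the larger classes would serve the same goal of a more uniform distribution at the cost of discarding training data.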
<Operation of the connection expression term structure analysis apparatus according to the first embodiment of the present invention>
Next, the operation of the connection expression term structure analysis apparatus 100 according to the first embodiment of the present invention is described. When the input unit 10 receives a document, the connection expression term structure analysis apparatus 100 executes the connection expression term structure analysis processing routine shown in FIG. 5.
First, in step S100, the document received by the input unit 10 is acquired and split into sentences.
Next, in step S102, a discourse structure tree in which each node represents a sentence is generated from the sentence-split document of step S100, based on the rhetorical structure of the sentences in the document.
In step S104, each sentence of the sentence-split document of step S100 is parsed to generate a syntax tree.
In step S106, connection expressions that have terms are extracted based on the syntax trees generated in step S104 for the individual sentences.
In step S108, for each connection expression extracted in step S106, it is determined whether the two terms connected by the connection expression both appear in the sentence containing the connection expression. If it is determined that they do not, the sentence containing the connection expression is designated as the sentence for extracting term 2, and the sentence corresponding to the parent node or a sibling node of that sentence in the discourse structure tree generated in step S102 is designated as the sentence for extracting term 1.
In step S110, if it was determined in step S108 that the two terms appear in the sentence containing the connection expression, term 1 and term 2 connected by the connection expression are extracted from that sentence and output to the output unit 50.
In step S112, if it was determined in step S108 that the two terms do not both appear in the sentence containing the connection expression, term 2 is extracted from the sentence designated in step S108 for extracting term 2, that is, the sentence containing the connection expression; term 1 is extracted from the sentence designated in step S108 for extracting term 1, that is, the sentence corresponding to the parent node or a sibling node of the sentence containing the connection expression; and the two extracted terms are output to the output unit 50.
In step S114, the semantic class of the connection expression is classified based on the connection expression extracted in step S106, the connection expression and its semantic class are output to the output unit 50, and the connection expression term structure analysis processing routine ends.
As described above, the connection expression term structure analysis apparatus according to the first embodiment generates a discourse structure tree based on the rhetorical structure of the sentences in a document, parses the sentences to generate syntax trees, extracts connection expressions that have terms, and determines whether the two terms connected by each connection expression appear in the sentence containing it. If the two terms are determined to appear, they are extracted from the sentence containing the connection expression. If not, term 2 is extracted from the sentence containing the connection expression and term 1 is extracted from the sentence corresponding to the parent node or a sibling node of that sentence in the discourse structure tree. The semantic class of the connection expression is also classified. In this way, terms connected by a connection expression can be extracted even from non-adjacent sentences.
<Configuration of the connection expression term structure analysis apparatus according to the second embodiment of the present invention>
Next, the configuration of the connection expression term structure analysis apparatus according to the second embodiment of the present invention is described. Parts configured in the same way as in the first embodiment are given the same reference signs, and their description is omitted. The connection expression term structure analysis apparatus according to the second embodiment extracts, from a document, the terms and semantic labels for implicit connection expressions.
As shown in FIG. 6, the connection expression term structure analysis apparatus 200 according to the second embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the connection expression term structure analysis processing routine described later. Functionally, the connection expression term structure analysis apparatus 200 includes an input unit 10, a calculation unit 220, and an output unit 50, as shown in FIG. 6.
The calculation unit 220 includes a sentence division unit 30, a discourse structure analysis unit 32, a related sentence pair extraction unit 238, an inter-sentence term extraction unit 242, and a semantic class classification unit 244.
By the same processing as in the first embodiment, the discourse structure analysis unit 32 generates, from the document split into sentences by the sentence division unit 30, a discourse structure tree in which each node represents a sentence, based on the rhetorical structure of the sentences in the document.
Based on the discourse structure tree generated by the discourse structure analysis unit 32, the related sentence pair extraction unit 238 takes the sentence pairs corresponding to parent-child nodes and the sentence pairs corresponding to sibling nodes as candidate sentence pairs having a connection relationship, and determines for each candidate whether a connection relationship actually exists.
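As a rough sketch of the candidate generation, assume the discourse structure tree is given as a mapping from a sentence number to the sentence numbers of its child nodes. This representation, the function name `candidate_pairs`, and the choice to pair every two siblings (rather than only neighbouring ones) are assumptions, not details stated in the text.

```python
from itertools import combinations


def candidate_pairs(children):
    """List (sentence, sentence, relation) candidates from a discourse tree
    given as {parent sentence number: [child sentence numbers]}."""
    pairs = []
    for parent, kids in children.items():
        pairs.extend((parent, kid, "parent-child") for kid in kids)
        pairs.extend((a, b, "sibling") for a, b in combinations(kids, 2))
    return pairs


# Hypothetical tree: sentence 1 is the root with children 2 and 4;
# sentence 3 is a child of sentence 2.
pairs = candidate_pairs({1: [2, 4], 2: [3]})
for pair in pairs:
    print(pair)
```

Candidates such as (1, 4) and (2, 4) are pairs of sentences that are not adjacent in the document, which is the case the discourse structure tree is meant to cover.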
Specifically, the related sentence pair extraction unit 238 receives the discourse structure tree as input, takes the sentence pairs that form parent-child nodes and sibling nodes of the tree as candidate sentence pairs having a connection relationship, and decides for each candidate whether the pair has a connection relationship by applying a binary classifier trained in advance. To train the binary classifier, sentences S_i and S_j are prepared as a sentence pair of the training data, and the classifier is trained using the following features (1) to (5).
(1) The first word of sentence S_i and of sentence S_j
(2) The last word of sentence S_i and of sentence S_j
(3) The first three words of sentence S_i and of sentence S_j
(4) All pairs of a word in sentence S_i and a word in sentence S_j
(5) All pairs of the semantic class of a word in sentence S_i and the semantic class of a word in sentence S_j
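Features (1) to (5) might be computed along the following lines. The feature-string format, whitespace tokenization, the function name `pair_features`, and the tiny word-to-class dictionary standing in for a thesaurus are all assumptions; in particular, features (1) to (3) are emitted separately for each sentence here, which is one possible reading of the text.

```python
from itertools import product


def pair_features(s_i, s_j, word_class):
    """Build a set of string features for a pair of non-empty sentences."""
    w_i, w_j = s_i.split(), s_j.split()
    feats = set()
    for tag, words in (("i", w_i), ("j", w_j)):
        feats.add(f"first_{tag}={words[0]}")                 # feature (1)
        feats.add(f"last_{tag}={words[-1]}")                 # feature (2)
        feats.add(f"first3_{tag}={' '.join(words[:3])}")     # feature (3)
    for a, b in product(w_i, w_j):                           # feature (4)
        feats.add(f"pair={a}|{b}")
    c_i = [word_class[w] for w in w_i if w in word_class]
    c_j = [word_class[w] for w in w_j if w in word_class]
    for a, b in product(c_i, c_j):                           # feature (5)
        feats.add(f"class_pair={a}|{b}")
    return feats


# Hypothetical semantic classes, e.g. looked up in a thesaurus.
classes = {"rained": "WEATHER", "cancelled": "EVENT"}
feats = pair_features("It rained heavily", "The game was cancelled", classes)
print(sorted(f for f in feats if not f.startswith("pair=")))
```

Feature (4) grows with the product of the two sentence lengths, so a set of sparse string features like this is a natural fit for a linear classifier.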
The semantic class of a word used in feature (5) can be obtained from an existing thesaurus or from the result of word clustering. In addition, for each candidate sentence pair determined to have a connection relationship, the related sentence pair extraction unit 238 uses the modifier-modified relationship expressed by the discourse structure tree to decide which sentence term 1 is extracted from and which sentence term 2 is extracted from. For example, if sentence S_i is a child node of sentence S_j, S_i is the sentence for extracting term 2 and S_j is the sentence for extracting term 1. If S_i and S_j are sibling nodes, the one with the smaller sentence number is the sentence for extracting term 1 and the one with the larger sentence number is the sentence for extracting term 2.
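The role assignment rule above can be written as a small function. The `parent_of` mapping (sentence number to parent sentence number) and the function name `assign_roles` are hypothetical names chosen for this sketch.

```python
def assign_roles(i, j, parent_of):
    """Return (term 1 sentence, term 2 sentence) for a related pair (i, j)."""
    p_i, p_j = parent_of.get(i), parent_of.get(j)
    if p_i == j:                      # i is a child of j
        return j, i
    if p_j == i:                      # j is a child of i
        return i, j
    if p_i is not None and p_i == p_j:
        return min(i, j), max(i, j)   # siblings: smaller number gives term 1
    raise ValueError("sentences are neither parent-child nor siblings")


# Hypothetical discourse tree: 1 is the root, 2 and 4 are its children,
# and 3 is a child of 2.
parent_of = {2: 1, 4: 1, 3: 2}
print(assign_roles(3, 2, parent_of))  # (2, 3)
print(assign_roles(4, 2, parent_of))  # (2, 4)
```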
For each candidate sentence pair that the related sentence pair extraction unit 238 has determined to have a connection relationship, the inter-sentence term extraction unit 242 extracts from the pair the two terms connected by an implicit connection expression. Since the related sentence pair extraction unit 238 has already decided which sentence term 1 and term 2 are each extracted from, only the terms themselves are taken out here, by the following operations (1) and (2).
(1) Of the symbols in the sentence, delete the sentence-final marks "。", "!", and "?".
(2) Delete bracketing expressions such as quotation marks (“ ”) at the beginning and end of the sentence.
The inter-sentence term extraction unit 242 repeats operations (1) and (2) until the text no longer changes, and outputs the two terms having an implicit connection relationship to the output unit 50.
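A minimal sketch of this cleanup loop is shown below. The exact character sets in `END_MARKS` and `QUOTES` and the function name `clean_term` are assumptions; the text only names the sentence-final marks and gives quotation marks as one example of bracketing expressions.

```python
END_MARKS = ".!?。!?"          # sentence-final marks, ASCII and full-width
QUOTES = "\"“”‘’「」『』"         # assumed set of quoting/bracketing characters


def clean_term(sentence):
    """Apply operations (1) and (2) repeatedly until nothing changes."""
    text = sentence.strip()
    while True:
        before = text
        text = text.rstrip(END_MARKS).rstrip()   # operation (1)
        text = text.strip(QUOTES).strip()        # operation (2)
        if text == before:
            return text


print(clean_term("“It rained heavily.”"))  # It rained heavily
print(clean_term("Why?!"))                 # Why
```

The first example shows why the operations are repeated: the period only becomes sentence-final after the closing quotation mark has been removed.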
For each candidate sentence pair that the related sentence pair extraction unit 238 has determined to have a connection relationship, the semantic class classification unit 244 classifies the semantic class of the implicit connection expression based on the pair and outputs it to the output unit 50. Taking each such sentence pair as input, it determines the semantic class of the connection relationship linking the two sentences with a multi-class classifier trained in advance on training data. The features used for training and classification are features (1) to (5) used by the related sentence pair extraction unit 238. As before, because this is a multi-class classification problem, the training data is resampled so that its class distribution is as uniform as possible.
<Operation of the connection expression term structure analysis apparatus according to the second embodiment of the present invention>
Next, the operation of the connection expression term structure analysis apparatus 200 according to the second embodiment of the present invention is described. When the input unit 10 receives a document, the connection expression term structure analysis apparatus 200 executes the connection expression term structure analysis processing routine shown in FIG. 7. Steps that operate in the same way as in the first embodiment are given the same reference signs, and their description is omitted.
In step S200, based on the discourse structure tree generated in step S102, the sentence pairs corresponding to parent-child nodes and the sentence pairs corresponding to sibling nodes are taken as candidate sentence pairs having a connection relationship, and it is determined for each candidate whether a connection relationship exists. Also in step S200, for each candidate determined to have a connection relationship, the sentence for extracting term 2 and the sentence for extracting term 1 are decided using the modifier-modified relationship expressed by the discourse structure tree.
Next, in step S202, for each candidate sentence pair determined in step S200 to have a connection relationship, the two terms connected by an implicit connection expression are extracted from the pair and output to the output unit 50.
Then, in step S204, for each candidate sentence pair determined in step S200 to have a connection relationship, the semantic class of the implicit connection expression is classified based on the pair and output to the output unit 50, and the connection expression term structure analysis processing routine ends.
The other configurations and operations of the connection expression term structure analysis apparatus 200 according to the second embodiment are the same as in the first embodiment, so their description is omitted.
As described above, the connection expression term structure analysis apparatus according to the second embodiment generates, from a document, a discourse structure tree in which each node represents a sentence, based on rhetorical structure. Based on the discourse structure tree, it takes the sentence pairs corresponding to parent-child nodes and to sibling nodes as candidate sentence pairs having a connection relationship and determines for each candidate whether a connection relationship exists. It then extracts the two terms connected by an implicit connection expression from the sentence pairs that have a connection relationship, and classifies the semantic class of the implicit connection expression. In this way, semantically linked terms that have a connection relationship can be extracted even from non-adjacent sentences.
The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.
For example, the embodiments above describe the case in which the apparatus according to the first embodiment extracts from a document the connection expressions, terms, and semantic labels for explicit connection expressions, and the apparatus according to the second embodiment extracts from a document the connection expressions, terms, and semantic labels for implicit connection expressions. The invention is not limited to this: a single connection expression term structure analysis apparatus may extract from a document both the connection expressions, terms, and semantic labels for explicit connection expressions and the terms and semantic labels for implicit connection expressions.
DESCRIPTION OF SYMBOLS
10 Input unit
20, 220 Calculation unit
30 Sentence division unit
32 Discourse structure analysis unit
34 Syntax analysis unit
36 Connection expression extraction unit
38 Term positional relationship determination unit
40 Intra-sentence term extraction unit
42, 242 Inter-sentence term extraction unit
44, 244 Semantic class classification unit
46 Inter-sentence term extraction unit
50 Output unit
100, 200 Connection expression term structure analysis apparatus
238 Related sentence pair extraction unit
Claims (5)
A connection expression term structure analysis apparatus comprising:
a discourse structure analysis unit that generates, from an input document, a discourse structure tree in which each node represents a sentence, based on the rhetorical structure of the sentences included in the document;
a syntax analysis unit that parses each sentence included in the document to generate a syntax tree;
a connection expression extraction unit that, based on a predetermined connection expression candidate dictionary and the syntax tree, extracts from each sentence included in the document connection expression candidates and, for each connection expression candidate, a feature amount that includes the position at which the candidate appears, and that extracts connection expressions having terms from among the connection expression candidates using a first classifier trained in advance to take the extracted feature amount as input and output a first determination result indicating whether the connection expression corresponding to the feature amount has terms;
a term positional relationship determination unit that obtains, for each connection expression extracted by the connection expression extraction unit, a second determination result using a second classifier trained in advance to take the extracted feature amount as input and output the second determination result, which indicates whether the two terms connected by the connection expression appear in the sentence containing the connection expression corresponding to the feature amount, and that, when the second determination result indicates that the two terms do not appear, determines the sentence containing the connection expression to be the sentence for extracting one of the two terms and determines the sentence corresponding to the parent node or a sibling node, in the discourse structure tree, of the sentence containing the connection expression to be the sentence for extracting the other of the two terms;
an inter-sentence term extraction unit that extracts the two terms connected by the connection expression based on the determination by the term positional relationship determination unit;
an intra-sentence term extraction unit that, when the second determination result indicates that the two terms appear, extracts the two terms connected by the connection expression from the sentence containing the connection expression; and
a semantic class classification unit that classifies the semantic class of the connection expression based on the connection expression extracted by the connection expression extraction unit.
A connection expression term structure analysis apparatus comprising:
a discourse structure analysis unit that generates, from an input document, a discourse structure tree in which each node represents a sentence, based on the rhetorical structure of the sentences included in the document;
a related sentence pair extraction unit that, based on the discourse structure tree generated by the discourse structure analysis unit, takes the sentence pairs corresponding to parent-child nodes and the sentence pairs corresponding to sibling nodes as candidate sentence pairs having a connection relationship, extracts a feature amount for each candidate sentence pair, and determines for each candidate sentence pair whether a connection relationship exists, using a classifier trained in advance to take the extracted feature amount as input and output a determination result indicating whether the sentence pair corresponding to the feature amount has a connection relationship;
an inter-sentence term extraction unit that, for each candidate sentence pair determined by the related sentence pair extraction unit to have a connection relationship, extracts from the candidate sentence pair the two terms connected by an implicit connection expression; and
a semantic class classification unit that, for each candidate sentence pair determined by the related sentence pair extraction unit to have a connection relationship, classifies the semantic class of the implicit connection expression based on the candidate sentence pair.
構文解析部が、前記文書に含まれる文の各々について、構文解析を行って構文木を生成するステップと、
接続表現抽出部が、予め定められた接続表現候補辞書及び前記構文木に基づき、前記文書に含まれる文の各々から接続表現候補、及び当該接続表現候補に対応する特徴量であって、当該接続表現候補の出現位置を含む特徴量を抽出し、前記抽出した特徴量を入力として、当該特徴量に対応する接続表現が項をもつか否かを示す第1の判定結果を出力するよう予め学習された第1の分類器を用いて、項を持つ接続表現を前記接続表現候補の中から抽出するステップと、
項位置関係決定部が、前記抽出した特徴量を入力とし、当該特徴量に対応する接続表現を含む文内に接続表現で結ばれた2つの項が出現するか否かを示す第2の判定結果を出力するよう予め学習された第2の分類器を用いて、前記接続表現抽出部が抽出した接続表現毎に当該接続表現に対応する第2の判定結果を求め、
前記第2の判定結果を求めた接続表現について、前記第2の判定結果が2つの項が出現しないことを示す場合に、当該接続表現を含む文を、前記2つの項のうちの一方の項を抽出するための文と決定し、前記談話構造木において、当該接続表現を含む文の親ノード又は兄弟ノードに対応する文を前記2つの項のうちの他方の項を抽出するための文と決定するステップと、
文間項抽出部が、前記第2の判定結果を求めた接続表現について、前記第2の判定結果が2つの項が出現することを示す場合、当該接続表現を含む文から、当該接続表現によって結ばれた2つの項を抽出するステップと、
意味クラス分類部が、前記接続表現抽出部によって抽出された前記接続表現に基づいて、前記接続表現の意味クラスを分類するステップと、
を含む接続表現項構造解析方法。 A step in which a discourse structure analysis unit, based on the rhetorical structure of each sentence included in an input document, generates a discourse structure tree in which each of the sentences is represented by a node;
A step in which a syntax analysis unit parses each sentence included in the document to generate a syntax tree;
A step in which a connection expression extraction unit extracts, based on a predetermined connection expression candidate dictionary and the syntax tree, connection expression candidates from each sentence included in the document together with a feature amount corresponding to each connection expression candidate, the feature amount including the appearance position of the connection expression candidate, and extracts connection expressions having terms from among the connection expression candidates using a first classifier trained in advance to take the extracted feature amount as input and output a first determination result indicating whether or not the connection expression corresponding to the feature amount has terms;
A step in which a term positional relationship determination unit obtains, for each connection expression extracted by the connection expression extraction unit, a second determination result corresponding to that connection expression, using a second classifier trained in advance to take the extracted feature amount as input and output a second determination result indicating whether or not the two terms connected by the connection expression appear within the sentence containing the connection expression corresponding to the feature amount,
and, when the second determination result obtained for a connection expression indicates that the two terms do not appear, determines the sentence containing the connection expression to be the sentence from which one of the two terms is extracted, and determines the sentence corresponding to the parent node or a sibling node, in the discourse structure tree, of the sentence containing the connection expression to be the sentence from which the other of the two terms is extracted;
A step in which an inter-sentence term extraction unit, when the second determination result obtained for a connection expression indicates that the two terms appear, extracts the two terms connected by the connection expression from the sentence containing the connection expression;
A step in which a semantic class classification unit classifies the semantic class of the connection expression based on the connection expression extracted by the connection expression extraction unit;
A connection expression term structure analysis method including the steps above.
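The branching on the second determination result in the method above can be sketched as follows. This is a hedged illustration only: `Node`, `other_term_candidates`, and `term_source_sentences` are hypothetical names, the boolean `both_terms_in_sentence` stands in for the output of the second classifier, and the claim does not say how a single sentence is chosen among the parent and sibling nodes, so the sketch simply returns all of them as candidates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """One sentence in a hypothetical discourse structure tree."""
    sentence: str
    parent: Node | None = None
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def other_term_candidates(node: Node) -> list[str]:
    """Sentences of the parent node and sibling nodes, parent first."""
    if node.parent is None:
        return []
    siblings = [n.sentence for n in node.parent.children if n is not node]
    return [node.parent.sentence] + siblings


def term_source_sentences(
    node: Node, both_terms_in_sentence: bool
) -> tuple[str, list[str]]:
    """Return the sentence for one term and candidate sentences for the other.

    `both_terms_in_sentence` stands in for the second determination result.
    """
    if both_terms_in_sentence:
        # Both terms are extracted from the sentence containing the expression.
        return node.sentence, [node.sentence]
    # One term comes from this sentence, the other from a parent or sibling.
    return node.sentence, other_term_candidates(node)
```

For a node whose sentence contains the connection expression, a `False` determination result yields the parent and sibling sentences as the places to look for the other term, while `True` keeps both terms in the same sentence.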
関連文ペア抽出部が、前記談話構造解析部によって生成された前記談話構造木に基づいて、親子ノードに対応する文のペア、及び兄弟ノードに対応する文のペアを、接続関係を持つ文のペアの候補とし、前記文のペアの候補に対応する特徴量を抽出し、前記抽出した特徴量を入力として、当該特徴量に対応する文のペアについて接続関係があるか否かを示す判定結果を出力するよう予め学習された分類器を用いて、前記接続関係を持つ文のペアの候補の各々について、接続関係があるか否かを判定するステップと、
文間項抽出部が、前記関連文ペア抽出部によって接続関係があると判定された前記接続関係を持つ文のペアの候補の各々について、前記接続関係を持つ文のペアの候補から、暗示的な接続表現によって結ばれる2つの項を抽出するステップと、
意味クラス分類部が、前記関連文ペア抽出部によって接続関係があると判定された前記接続関係を持つ文のペアの候補の各々について、前記接続関係を持つ文のペアの候補に基づいて、前記暗示的な接続表現の意味クラスを分類するステップと、
を含む接続表現項構造解析方法。 A step in which a discourse structure analysis unit, based on the rhetorical structure of each sentence included in an input document, generates a discourse structure tree in which each of the sentences is represented by a node;
A step in which a related sentence pair extraction unit, based on the discourse structure tree generated by the discourse structure analysis unit, takes sentence pairs corresponding to parent-child nodes and sentence pairs corresponding to sibling nodes as candidate sentence pairs having a connection relationship, extracts a feature amount corresponding to each candidate sentence pair, and determines, for each candidate sentence pair, whether or not there is a connection relationship, using a classifier trained in advance to take the extracted feature amount as input and output a determination result indicating whether or not the sentence pair corresponding to the feature amount has a connection relationship;
A step in which an inter-sentence term extraction unit, for each candidate sentence pair determined by the related sentence pair extraction unit to have a connection relationship, extracts from the candidate sentence pair the two terms connected by an implicit connection expression;
A step in which a semantic class classification unit, for each candidate sentence pair determined by the related sentence pair extraction unit to have a connection relationship, classifies the semantic class of the implicit connection expression based on the candidate sentence pair;
A connection expression term structure analysis method including the steps above.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2015141649A JP6499537B2 (en) | 2015-07-15 | 2015-07-15 | Connection expression structure analysis apparatus, method, and program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2015141649A JP6499537B2 (en) | 2015-07-15 | 2015-07-15 | Connection expression structure analysis apparatus, method, and program |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JP2017027111A (en) | 2017-02-02 |
| JP6499537B2 (en) | 2019-04-10 |
Family
ID=57946570
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2015141649A Active JP6499537B2 (en) | 2015-07-15 | 2015-07-15 | Connection expression structure analysis apparatus, method, and program |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JP6499537B2 (en) |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0795323B2 (en) * | 1989-06-20 | 1995-10-11 | 工業技術院長 | Natural language processor |
| US6112168A (en) * | 1997-10-20 | 2000-08-29 | Microsoft Corporation | Automatically recognizing the discourse structure of a body of text |
| US7127208B2 (en) * | 2002-01-23 | 2006-10-24 | Educational Testing Service | Automated annotation |
| JP2005228075A (en) * | 2004-02-13 | 2005-08-25 | Institute Of Physical & Chemical Research | Daily language program processing system, method and rhetorical structure analysis method |
| JP2010271819A (en) * | 2009-05-20 | 2010-12-02 | Nec Corp | Device, method, and program for extracting phrase relation |
| US9400778B2 (en) * | 2011-02-01 | 2016-07-26 | Accenture Global Services Limited | System for identifying textual relationships |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2017027111A (en) | 2017-02-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7223785B2 (en) | TIME-SERIES KNOWLEDGE GRAPH GENERATION METHOD, APPARATUS, DEVICE AND MEDIUM | |
| CN111444330B (en) | Method, device, equipment and storage medium for extracting short text keywords | |
| CN109791569B (en) | Causality identification device and storage medium | |
| CN111783394A (en) | Training method of event extraction model, event extraction method, system and equipment | |
| US20150074112A1 (en) | Multimedia Question Answering System and Method | |
| CN104503998B (en) | For the kind identification method and device of user query sentence | |
| WO2017038657A1 (en) | Question answering system training device and computer program therefor | |
| CN111046656A (en) | Text processing method and device, electronic equipment and readable storage medium | |
| CN109062904B (en) | Logic predicate extraction method and device | |
| CN109933778A (en) | Segmenting method, device and computer readable storage medium | |
| CN118643050A (en) | A natural language to SQL conversion method based on large language model | |
| CN111177375A (en) | Electronic document classification method and device | |
| CN114860942A (en) | Text intention classification method, device, equipment and storage medium | |
| CN106372053B (en) | Syntactic analysis method and device | |
| CN112883736A (en) | Medical entity relationship extraction method and device | |
| CN113590650B (en) | Structured query statement identification method and device based on feature expression | |
| JP6499537B2 (en) | Connection expression structure analysis apparatus, method, and program | |
| JP6021079B2 (en) | Document summarization apparatus, method, and program | |
| CN113901780A (en) | File comparison method and device, electronic equipment and storage medium | |
| CN110413749B (en) | Method and device for determining standard problems | |
| KR20130113000A (en) | Apparatus for language processing and method thereof | |
| KR102474042B1 (en) | Method for analyzing association of diseases using data mining | |
| Xinyi et al. | Using sequential pattern mining and interactive recommendation to assist pipe-like mashup development | |
| JP6665029B2 (en) | Language analysis device, language analysis method, and program | |
| CN110069780B (en) | Specific field text-based emotion word recognition method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20170822 |
| | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20180615 |
| | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20180724 |
| | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20180925 |
| | TRDD | Decision of grant or rejection written | |
| | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20190219 |
| | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20190315 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 6499537; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |
| | S533 | Written request for registration of change of name | Free format text: JAPANESE INTERMEDIATE CODE: R313533 |
| | R350 | Written notification of registration of transfer | Free format text: JAPANESE INTERMEDIATE CODE: R350 |