JP7227705B2

JP7227705B2 - Natural language processing device, search device, natural language processing method, search method and program

Info

Publication number: JP7227705B2
Application number: JP2018093299A
Authority: JP
Inventors: 裕樹太田; 真澄野村
Original assignee: Mitsubishi Heavy Industries Ltd
Current assignee: Mitsubishi Heavy Industries Ltd
Priority date: 2018-05-14
Filing date: 2018-05-14
Publication date: 2023-02-22
Anticipated expiration: 2038-05-14
Also published as: JP2019200488A

Description

The present invention relates to a natural language processing device, a search device, a natural language processing method, a search method and a program.

Natural language processing is the technology of processing natural language by computer. It typically applies several kinds of analysis to each sentence in sequence, such as morphological, syntactic, semantic and contextual analysis. Morphological analysis splits a sentence into words. It is needed for languages such as Japanese, which, unlike English, do not separate words with spaces. Syntactic analysis identifies hierarchical positional relationships, such as dependencies, between the words or groups of words that make up a sentence, based on simple grammatical rules. Semantic analysis uses word meanings to choose one positional relationship when syntactic analysis yields several competing candidates. It may rely on, for example, a concept dictionary that defines the relationships between semantic concepts (classes of semantic attributes) (see, for example, Patent Document 1).

Syntactic analysis generally works in units of "constituents". It identifies the grammatical category of each constituent and the hierarchical positional relationships between constituents, that is, their parent-child and sibling relationships. A constituent is a word that makes up a sentence, or a group of such words. A grammatical category is a type of word or phrase; more precisely, it is a classification of constituents by their grammatical characteristics.

One view holds that grammatical categories comprise two kinds of classification:
- part-of-speech classes, such as "noun", "verb", "noun phrase", "verb phrase" and "prepositional phrase";
- grammatical function classes (hereinafter "grammatical function classification"), namely "subject", "predicate", "object" and "complement".

Syntactic analysis identifies only the part-of-speech class as the grammatical category of each constituent. The hierarchical positional relationships represent the connections between constituents as a hierarchy and also preserve their linear order. The result of syntactic analysis can be shown for each sentence as a syntax tree (a hierarchical tree diagram, or phrase structure tree), as multiply nested parentheses, or in a similar form.

Several analysis tools (programs) for morphological and syntactic analysis are publicly available on the Internet:
- Japanese morphological analyzers: JUMAN, ChaSen and MeCab.
- Japanese parsers: KNP and CaboCha.
- English parsers: the Berkeley Parser and the Stanford Parser.

Patent Document 1: JP-A-2003-263428

When searching multiple documents for a specified word or sentence, the desired results may be easier to obtain if the grammatical function class of that word or sentence is also specified. For example, a search can state that the keyword must be the subject, or that it must be the object. This reduces the proportion of results that do not match the user's intent and so improves precision. However, syntactic analysis generally identifies only part-of-speech classes, not grammatical function classes. Additional processing of the syntactic analysis result is therefore needed to determine them. Extracting grammatical function classes through semantic analysis, for example, would require a dictionary like the one described in Patent Document 1, which complicates the system configuration.

The present invention was made in view of the above circumstances. Its object is to provide a natural language processing device, a search device, a natural language processing method, a search method and a program that identify a predetermined grammatical function class for constituents by applying simple processing to the result of syntactic analysis.

According to one aspect of the present invention, a natural language processing device identifies grammatical function classes. A grammatical function class indicates the grammatical function of a constituent, which is a group of one or more words forming a structural unit of a sentence. The device includes:
- a syntactic analysis unit that parses an input sentence by sentence and generates a syntactic analysis result, which represents a plurality of constituents and the constituent type of each in terms of their hierarchical positional relationships; and
- an extraction unit that extracts, from the plurality of constituents, those belonging to a predetermined grammatical function class, based on each constituent's type and the hierarchical positional relationships in the syntactic analysis result.

According to one aspect of the present invention, the extraction unit works from the syntactic analysis result in three steps. It starts from a first constituent of a predetermined constituent type. Tracing from it, it identifies a second constituent of a predetermined constituent type. It then extracts a third constituent, which has a predetermined positional relationship with the second, as a constituent belonging to one of the grammatical function classes.

According to one aspect of the present invention, the extraction unit traces downward from a first constituent of type "sentence" to a second constituent of type "verb phrase". It then extracts a third constituent of type "verb" located below that second constituent as a constituent belonging to "predicate", one of the grammatical function classes.
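The Python sketch below illustrates the downward trace. It parses the bracketed parse of the FIG. 2 sentence and walks from the S node to its verb phrase. The source does not say which verb to choose when verb phrases are nested, as in "is illustrating". This sketch therefore assumes the trace follows nested VPs to the deepest one and takes the first verb (any Penn Treebank VB* tag) directly beneath it. Dropping the ROOT wrapper is also an assumption. `Node`, `parse` and `find_predicate` are hypothetical names, not the patent's.

```python
import re


class Node:
    """A constituent: its type label (e.g. "S", "VP", "VBG"), an optional word, and tree links."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word
        self.children = []
        self.parent = None


def parse(bracketed):
    """Read Penn-style brackets into Node objects, dropping the ROOT wrapper (assumed to be a tag)."""
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:  # a bare token is the word of a word-level constituent
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    tree = read()
    if tree.label == "ROOT" and len(tree.children) == 1:
        tree = tree.children[0]
        tree.parent = None
    return tree


def first_child(node, label):
    """First direct child with the given type label, or None."""
    return next((c for c in node.children if c.label == label), None)


def find_predicate(sentence):
    """Trace S -> VP downward, follow nested VPs, and return the first VB* node under the deepest VP."""
    if sentence.label != "S":
        return None
    vp = first_child(sentence, "VP")
    if vp is None:
        return None
    while first_child(vp, "VP") is not None:
        vp = first_child(vp, "VP")
    return next((c for c in vp.children if c.label.startswith("VB")), None)


tree = parse("(ROOT (S (NP (DT This) (NN tree)) (VP (VBZ is) "
             "(VP (VBG illustrating) (NP (DT the) (NN constituency) (NN relation))))))")
print(find_predicate(tree).word)  # illustrating
```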

According to one aspect of the present invention, the extraction unit traces upward from the constituent extracted as a "predicate" and identifies the uppermost constituent of type "verb phrase". It then extracts a constituent of type "noun phrase" in the same layer as that verb phrase as a constituent belonging to "subject", one of the grammatical function classes.
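The upward trace for the subject can be sketched with the same kind of hypothetical tree model. Two readings here are assumptions:
- "Uppermost verb phrase" means the top of the unbroken chain of VP ancestors above the predicate.
- "Same layer" means the other children of that VP's parent.

`find_word`, `text` and `find_subject` are illustrative helper names.

```python
import re


class Node:
    """A constituent: type label, optional word, parent and children."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word
        self.children = []
        self.parent = None


def parse(bracketed):
    """Read Penn-style brackets into Node objects, dropping the ROOT wrapper (assumed to be a tag)."""
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    tree = read()
    if tree.label == "ROOT" and len(tree.children) == 1:
        tree = tree.children[0]
        tree.parent = None
    return tree


def find_word(node, word):
    """First word-level node with the given word, searching depth-first left to right."""
    if node.word == word:
        return node
    for child in node.children:
        hit = find_word(child, word)
        if hit is not None:
            return hit
    return None


def text(node):
    """Words covered by a constituent, in sentence order."""
    if node.word is not None:
        return node.word
    return " ".join(text(c) for c in node.children)


def find_subject(predicate):
    """Climb through consecutive VP ancestors to the uppermost one, then return the first NP in its layer."""
    top_vp = None
    node = predicate.parent
    while node is not None and node.label == "VP":
        top_vp = node
        node = node.parent
    if top_vp is None or top_vp.parent is None:
        return None
    return next((c for c in top_vp.parent.children if c.label == "NP"), None)


tree = parse("(ROOT (S (NP (DT This) (NN tree)) (VP (VBZ is) "
             "(VP (VBG illustrating) (NP (DT the) (NN constituency) (NN relation))))))")
print(text(find_subject(find_word(tree, "illustrating"))))  # This tree
```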

According to one aspect of the present invention, the extraction unit starts from the constituent of type "verb phrase" connected to the constituent extracted as a "predicate" and traces downward. It extracts the constituent of type "noun phrase" found in the lowest layer as a constituent belonging to "object or complement", one of the grammatical function classes.
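The object/complement rule can be sketched the same way. The source says only "the noun phrase in the lowest layer" and gives no rule for ties or nested noun phrases. This hypothetical sketch therefore takes the deepest NP under the VP that directly contains the predicate, and picks the leftmost one on ties. Both choices are assumptions.

```python
import re


class Node:
    """A constituent: type label, optional word, parent and children."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word
        self.children = []
        self.parent = None


def parse(bracketed):
    """Read Penn-style brackets into Node objects, dropping the ROOT wrapper (assumed to be a tag)."""
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    tree = read()
    if tree.label == "ROOT" and len(tree.children) == 1:
        tree = tree.children[0]
        tree.parent = None
    return tree


def find_word(node, word):
    """First word-level node with the given word, searching depth-first left to right."""
    if node.word == word:
        return node
    for child in node.children:
        hit = find_word(child, word)
        if hit is not None:
            return hit
    return None


def text(node):
    """Words covered by a constituent, in sentence order."""
    if node.word is not None:
        return node.word
    return " ".join(text(c) for c in node.children)


def find_object_or_complement(predicate):
    """Deepest NP below the VP that contains the predicate; the preorder walk keeps the leftmost on ties."""
    vp = predicate.parent
    if vp is None or vp.label != "VP":
        return None
    best, best_depth = None, 0

    def walk(node, depth):
        nonlocal best, best_depth
        for child in node.children:
            if child.label == "NP" and depth > best_depth:
                best, best_depth = child, depth
            walk(child, depth + 1)

    walk(vp, 1)
    return best


tree = parse("(ROOT (S (NP (DT This) (NN tree)) (VP (VBZ is) "
             "(VP (VBG illustrating) (NP (DT the) (NN constituency) (NN relation))))))")
print(text(find_object_or_complement(find_word(tree, "illustrating"))))  # the constituency relation
```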

According to one aspect of the present invention, the extraction unit extracts the constituents belonging to the predetermined grammatical function class from the plurality of constituents when the constituent types and the hierarchical positional relationships between the constituents match a predetermined pattern.

The extraction unit extracts the first constituent as a constituent belonging to "predicate", one of the grammatical function classes, when the following pattern is matched:
- the second constituent, one layer above the first constituent, is of type "verb phrase";
- a third constituent lies before the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent, and it is of type "auxiliary verb";
- the fourth constituent, three layers above the first constituent, is of type "sentence"; and
- no constituent exists four layers above the first constituent.
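The pattern above can be written as a test applied to each word-level node. Penn-style parsers do not emit a dedicated auxiliary-verb tag: they tag modals MD but tag "is", "has" or "does" like ordinary verbs. This sketch therefore assumes a constituent counts as an auxiliary if it is tagged MD or if its word appears in a hypothetical list, `AUXILIARY_WORDS`. It also assumes the ROOT wrapper is not a constituent, so that "no constituent four layers above" holds for a simple sentence.

```python
import re

# Hypothetical stand-in for an "auxiliary verb" constituent type (MD tag or one of these words).
AUXILIARY_WORDS = {"am", "is", "are", "was", "were", "be", "been", "being",
                   "have", "has", "had", "do", "does", "did"}


class Node:
    """A constituent: type label, optional word, parent and children."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word
        self.children = []
        self.parent = None


def parse(bracketed):
    """Read Penn-style brackets into Node objects, dropping the ROOT wrapper (assumed to be a tag)."""
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    tree = read()
    if tree.label == "ROOT" and len(tree.children) == 1:
        tree = tree.children[0]
        tree.parent = None
    return tree


def up(node, levels):
    """Ancestor `levels` layers above `node`, or None if the tree is not that deep."""
    for _ in range(levels):
        if node is None:
            return None
        node = node.parent
    return node


def leaves(node):
    """Word-level constituents in sentence order."""
    if node.word is not None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def is_auxiliary(node):
    return node.label == "MD" or (node.word or "").lower() in AUXILIARY_WORDS


def matches_auxiliary_pattern(first):
    """VP one layer up, an auxiliary sibling before that VP, S three layers up, nothing four layers up."""
    second = up(first, 1)
    if second is None or second.label != "VP" or second.parent is None:
        return False
    siblings = second.parent.children  # the second constituent's layer, under the same upper constituent
    aux_before = any(is_auxiliary(s) for s in siblings[:siblings.index(second)])
    fourth = up(first, 3)
    return (aux_before and fourth is not None and fourth.label == "S"
            and up(first, 4) is None)


def extract_predicates(tree, rule):
    return [leaf.word for leaf in leaves(tree) if rule(leaf)]


tree = parse("(ROOT (S (NP (DT This) (NN tree)) (VP (VBZ is) "
             "(VP (VBG illustrating) (NP (DT the) (NN constituency) (NN relation))))))")
print(extract_predicates(tree, matches_auxiliary_pattern))  # ['illustrating']
```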

According to one aspect of the present invention, when the input sentence contains no auxiliary verb, the extraction unit extracts the first constituent as a constituent belonging to "predicate", one of the grammatical function classes, if the following pattern is matched:
- the second constituent, one layer above the first constituent, is of type "verb phrase";
- the third constituent, two layers above the first constituent, is of type "sentence"; and
- no constituent exists three layers above the first constituent.
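Here is a sketch of the auxiliary-free pattern using the same hypothetical tree model, again assuming the ROOT wrapper is dropped. The source applies this rule only to sentences without an auxiliary verb. This sketch leaves that check to the caller. Applied to "This tree is illustrating the constituency relation.", the rule would match "is".

```python
import re


class Node:
    """A constituent: type label, optional word, parent and children."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word
        self.children = []
        self.parent = None


def parse(bracketed):
    """Read Penn-style brackets into Node objects, dropping the ROOT wrapper (assumed to be a tag)."""
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    tree = read()
    if tree.label == "ROOT" and len(tree.children) == 1:
        tree = tree.children[0]
        tree.parent = None
    return tree


def up(node, levels):
    """Ancestor `levels` layers above `node`, or None if the tree is not that deep."""
    for _ in range(levels):
        if node is None:
            return None
        node = node.parent
    return node


def leaves(node):
    """Word-level constituents in sentence order."""
    if node.word is not None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def matches_plain_pattern(first):
    """VP one layer up, S two layers up, and nothing three layers up."""
    second, third = up(first, 1), up(first, 2)
    return (second is not None and second.label == "VP"
            and third is not None and third.label == "S"
            and up(first, 3) is None)


def extract_predicates(tree, rule):
    return [leaf.word for leaf in leaves(tree) if rule(leaf)]


plain = parse("(ROOT (S (NP (DT This) (NN tree)) (VP (VBZ illustrates) "
              "(NP (DT the) (NN constituency) (NN relation)))))")
print(extract_predicates(plain, matches_plain_pattern))  # ['illustrates']
```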

According to one aspect of the present invention, when the input sentence is a compound sentence, the extraction unit extracts the first constituent as a constituent belonging to "predicate", one of the grammatical function classes, if the following pattern is matched:
- the second constituent, one layer above the first constituent, is of type "verb phrase";
- the third constituent, three layers above the first constituent, is of type "sentence";
- a constituent lies before the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent, and it is of type "auxiliary verb"; and
- the constituent four layers above the first constituent is of type "sentence".
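For a compound sentence, the same check is made one clause deeper. This sketch assumes the compound sentence is parsed as an S whose children are coordinated S clauses, the usual Penn Treebank shape. It reuses the hypothetical auxiliary test (MD tag or a listed word) from the earlier pattern. The example sentence and the helper names are illustrative.

```python
import re

# Hypothetical stand-in for an "auxiliary verb" constituent type (MD tag or one of these words).
AUXILIARY_WORDS = {"am", "is", "are", "was", "were", "be", "been", "being",
                   "have", "has", "had", "do", "does", "did"}


class Node:
    """A constituent: type label, optional word, parent and children."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word
        self.children = []
        self.parent = None


def parse(bracketed):
    """Read Penn-style brackets into Node objects, dropping the ROOT wrapper (assumed to be a tag)."""
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    tree = read()
    if tree.label == "ROOT" and len(tree.children) == 1:
        tree = tree.children[0]
        tree.parent = None
    return tree


def up(node, levels):
    """Ancestor `levels` layers above `node`, or None if the tree is not that deep."""
    for _ in range(levels):
        if node is None:
            return None
        node = node.parent
    return node


def leaves(node):
    """Word-level constituents in sentence order."""
    if node.word is not None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def is_auxiliary(node):
    return node.label == "MD" or (node.word or "").lower() in AUXILIARY_WORDS


def matches_compound_pattern(first):
    """VP one layer up with an auxiliary sibling before it, S three layers up, S four layers up."""
    second = up(first, 1)
    if second is None or second.label != "VP" or second.parent is None:
        return False
    siblings = second.parent.children
    aux_before = any(is_auxiliary(s) for s in siblings[:siblings.index(second)])
    third, fourth = up(first, 3), up(first, 4)
    return (aux_before and third is not None and third.label == "S"
            and fourth is not None and fourth.label == "S")


def extract_predicates(tree, rule):
    return [leaf.word for leaf in leaves(tree) if rule(leaf)]


compound = parse("(ROOT (S (S (NP (PRP He)) (VP (MD will) (VP (VB stay)))) (CC and) "
                 "(S (NP (PRP she)) (VP (MD will) (VP (VB go))))))")
print(extract_predicates(compound, matches_compound_pattern))  # ['stay', 'go']
```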

According to one aspect of the present invention, a search device performs searches using grammatical function classes. A grammatical function class indicates the grammatical function of a constituent, which is a group of one or more words forming a structural unit of a sentence. The search device includes:
- a syntactic analysis unit that parses an input sentence by sentence and generates a syntactic analysis result, which represents a plurality of constituents and the constituent type of each in terms of their hierarchical positional relationships;
- an extraction unit that extracts, from the plurality of constituents, those belonging to a predetermined grammatical function class, based on each constituent's type and the hierarchical positional relationships in the syntactic analysis result; and
- a search processing unit that searches the grammatical function classes extracted by the extraction unit for sentences containing a constituent that corresponds to a specified grammatical function class and keyword.
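The following sketch shows what the search processing unit adds over plain keyword search. It assumes each stored sentence carries its extraction result as a mapping from grammatical function class symbols (such as F_S, F_V and F_OC in FIG. 4) to constituent strings. The record layout, the `search` function and the second sentence are hypothetical illustration data, not output of the device.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexedSentence:
    """A sentence and its grammatical function extraction result (hypothetical layout)."""
    text: str
    functions: Dict[str, List[str]] = field(default_factory=dict)


def search(index, function_class, keyword):
    """Sentences in which some constituent of `function_class` contains `keyword` as a whole word."""
    kw = keyword.lower()
    hits = []
    for sentence in index:
        for phrase in sentence.functions.get(function_class, []):
            if kw in phrase.lower().split():
                hits.append(sentence.text)
                break
    return hits


index = [
    IndexedSentence("This tree is illustrating the constituency relation.",
                    {"F_S": ["This tree"], "F_V": ["illustrating"],
                     "F_OC": ["the constituency relation"]}),
    IndexedSentence("The gardener planted a tree.",
                    {"F_S": ["The gardener"], "F_V": ["planted"], "F_OC": ["a tree"]}),
]

print(search(index, "F_S", "tree"))   # ['This tree is illustrating the constituency relation.']
print(search(index, "F_OC", "tree"))  # ['The gardener planted a tree.']
```

A plain keyword search for "tree" would return both sentences. Specifying the class narrows the result to the intended reading.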

According to one aspect of the present invention, a natural language processing method identifies grammatical function classes. A grammatical function class indicates the grammatical function of a constituent, which is a group of one or more words forming a structural unit of a sentence. The method includes:
- a step of parsing an input sentence by sentence and generating a syntactic analysis result, which represents a plurality of constituents and the constituent type of each in terms of their hierarchical positional relationships; and
- a step of extracting, from the plurality of constituents, those belonging to a predetermined grammatical function class, based on each constituent's type and the hierarchical positional relationships in the syntactic analysis result.

According to one aspect of the present invention, a search method performs searches using grammatical function classes. A grammatical function class indicates the grammatical function of a constituent, which is a group of one or more words forming a structural unit of a sentence. The method includes:
- a step of parsing an input sentence by sentence and generating a syntactic analysis result, which represents a plurality of constituents and the constituent type of each in terms of their hierarchical positional relationships;
- a step of extracting, from the plurality of constituents, those belonging to a predetermined grammatical function class, based on each constituent's type and the hierarchical positional relationships in the syntactic analysis result; and
- a step of searching the extracted grammatical function classes for sentences containing a constituent that corresponds to a specified grammatical function class and keyword.

According to one aspect of the present invention, a program causes a computer to execute a natural language processing method that identifies grammatical function classes. A grammatical function class indicates the grammatical function of a constituent, which is a group of one or more words forming a structural unit of a sentence. The program causes the computer to execute:
- a step of parsing an input sentence by sentence and generating a syntactic analysis result, which represents a plurality of constituents and the constituent type of each in terms of their hierarchical positional relationships; and
- a step of extracting, from the plurality of constituents, those belonging to a predetermined grammatical function class, based on each constituent's type and the hierarchical positional relationships in the syntactic analysis result.

In each of the embodiments described above, a predetermined grammatical function class can be identified for a constituent by applying simple processing to the syntactic analysis result.

FIG. 1 is a system diagram showing a configuration example of the natural language processing device according to the embodiment.
FIG. 2 is a schematic diagram for explaining an operation example of the syntactic analysis unit according to the embodiment.
FIG. 3 is a diagram showing an example of the constituents contained in a syntactic analysis result according to the embodiment.
FIG. 4 is a diagram showing the list of grammatical function classes extracted by the grammatical function classification extraction unit according to the embodiment.
FIG. 5 is a flowchart showing a first operation example of the grammatical function classification extraction unit according to the embodiment.
FIG. 6 is an explanatory diagram for explaining the first operation example of the grammatical function classification extraction unit according to the embodiment.
FIG. 7 is a schematic diagram for explaining the first operation example of the grammatical function classification extraction unit according to the embodiment.
FIG. 8 is a diagram showing a configuration example of a grammatical function classification extraction result according to the embodiment.
FIG. 9 is a flowchart showing a second operation example of the grammatical function classification extraction unit according to the embodiment.
FIG. 10 is a schematic diagram for explaining the second operation example of the grammatical function classification extraction unit according to the embodiment.
FIG. 11 is a schematic diagram for explaining the second operation example of the grammatical function classification extraction unit according to the embodiment.
FIG. 12 is a schematic diagram for explaining the second operation example of the grammatical function classification extraction unit according to the embodiment.
FIG. 13 is a schematic block diagram showing the configuration of a computer according to the embodiment.

Embodiments are described below with reference to the drawings.

<Embodiment>
FIG. 1 is a system diagram showing a configuration example of a natural language processing device 1 according to one embodiment.
The natural language processing device 1 shown in FIG. 1 is built from one or more computers. Each computer includes a CPU (central processing unit), main storage, auxiliary storage, input/output devices, a communication device and so on. It performs predetermined processing when the CPU executes a predetermined program stored in auxiliary storage.
The natural language processing device 1 shown in FIG. 1 includes a syntactic analysis unit 2, a grammatical function classification extraction unit 3, a storage unit 4 and a search unit 5. These units are implemented by a combination of the hardware and software of the computers that make up the device. Other configurations are also possible, for example:
- only the grammatical function classification extraction unit 3 and the storage unit 4, built from one or more computers (without the search unit 5 and so on);
- the syntactic analysis unit 2 and the grammatical function classification extraction unit 3 (without the storage unit 4 and the search unit 5);
- the syntactic analysis unit 2, the grammatical function classification extraction unit 3 and the search unit 5 (without the storage unit 4); or
- a device consisting solely of the grammatical function classification extraction unit 3 or the search unit 5.

The following describes the case in which the natural language processing device 1 of this embodiment processes English text. The grammatical function classification extraction unit 3 is an example of the extraction unit.

The storage unit 4 stores multiple sets, each consisting of a document 41, a syntactic analysis result 42 and a grammatical function classification extraction result 43 associated with one another.
- The document 41 is information representing a document to be searched by the search unit 5. It is, for example, input from outside the natural language processing device 1 via a communication line or a storage medium and then stored in the storage unit 4.
- The syntactic analysis result 42 represents the results produced by the syntactic analysis unit 2 for each sentence in the document 41.
- The grammatical function classification extraction result 43 represents the results produced by the grammatical function classification extraction unit 3 from the syntactic analysis results in the syntactic analysis result 42.

The syntactic analysis unit 2 uses an existing parsing method to parse the natural-language sentences in the document 41 stored in the storage unit 4. Parsing proceeds sentence by sentence and is based on a formal grammar. The unit stores the result in the storage unit 4 as the syntactic analysis result 42. In other words, the syntactic analysis unit 2 parses an input sentence by sentence and generates a syntactic analysis result that represents a plurality of constituents, and the constituent type of each, in terms of their hierarchical positional relationships. FIG. 2 shows an example of the syntactic analysis result 42.

FIG. 2 is a schematic diagram for explaining an operation example of the syntactic analysis unit 2 shown in FIG. 1.
In FIG. 2, the syntactic analysis unit 2 receives one sentence 410 contained in the document 41 stored in the storage unit 4. It outputs a syntax tree 420 as the result of parsing that sentence, and the syntax tree 420 forms part of the syntactic analysis result 42. In this example, the sentence 410 is "This tree is illustrating the constituency relation."

The syntax tree 420 consists of two groups of nodes connected by a plurality of branches 4201:
- nodes 4202 to 4208, corresponding to the constituents "This", "tree", "is", "illustrating", "the", "constituency" and "relation" that make up the sentence 410; and
- nodes 4209 to 4213, corresponding to constituents that group nodes 4202 to 4208 hierarchically.

Node 4202 carries a tag 4200 reading "ROOT", which indicates a node of the highest level. Each of the nodes 4202 to 4213 is labeled with a symbol indicating the type of its constituent (constituent type). FIG. 3 shows examples of these symbols.

FIG. 3 is a diagram showing, for several types of constituents, examples of the correspondence between each type's symbol, its content and its constituent level.
For example:
- "S" denotes a sentence, at the clause level.
- "SBAR" denotes a subordinate clause, at the clause level.
- "NP" denotes a noun phrase, at the phrase level.
- "VP" denotes a verb phrase, at the phrase level.
- "PP" denotes a prepositional phrase, at the phrase level.
- "NN" denotes a noun, at the word level.
- "VB" denotes a verb, at the word level.

In the syntax tree 420 shown in FIG. 2, for example:
- node 4206 corresponds to the determiner (DT) constituent "the";
- node 4207 corresponds to the noun (NN) constituent "constituency";
- node 4208 corresponds to the noun (NN) constituent "relation"; and
- node 4209 corresponds to a noun phrase (NP) constituent.

Node 4209 lies one layer above nodes 4206 to 4208, which are all in the same layer. Node 4209 (parent) and nodes 4206 to 4208 (children) are therefore in a parent-child relationship, and nodes 4206 to 4208 are siblings of one another. A constituent in a higher layer contains the one or more constituents connected to it through one or more branches 4201 in the lower layers. The horizontal positions of nodes 4202 to 4213 in the figure indicate the order of the corresponding constituents:
- for the word-level nodes 4202 to 4208, left-to-right position gives their order in the sentence regardless of layer;
- for the phrase- and clause-level nodes 4209 to 4213, left-to-right position gives their order within each layer.

The syntactic analysis unit 2 may also output its result in another format, for example as multiply nested parentheses. For the example in FIG. 2, the result can be written as "(ROOT (S (NP (DT This) (NN tree)) (VP (VBZ is) (VP (VBG illustrating) (NP (DT the) (NN constituency) (NN relation))))))".
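Below is a minimal reader for this nested-parentheses format, offered as a sketch. It builds a tree of hypothetical `Node` objects that record parent and child links, which is the information the hierarchical-position rules rely on, and it can write the tree back out in the same notation. Here the ROOT wrapper is kept as an ordinary node.

```python
import re


class Node:
    """One constituent: type label, optional word (word-level nodes only), parent and children."""

    def __init__(self, label, word=None):
        self.label = label
        self.word = word
        self.children = []
        self.parent = None


def parse(bracketed):
    """Build a Node tree from nested parentheses such as "(NP (DT the) (NN relation))"."""
    tokens = re.findall(r"[()]|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1  # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child = read()
                child.parent = node
                node.children.append(child)
            else:  # a bare token is the word of a word-level constituent
                node.word = tokens[pos]
                pos += 1
        pos += 1  # skip ")"
        return node

    return read()


def leaves(node):
    """Word-level constituents in sentence order."""
    if node.word is not None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def to_brackets(node):
    """Serialize a tree back into the same nested-parentheses notation."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    return f"({node.label} " + " ".join(to_brackets(c) for c in node.children) + ")"


SAMPLE = ("(ROOT (S (NP (DT This) (NN tree)) (VP (VBZ is) (VP (VBG illustrating) "
          "(NP (DT the) (NN constituency) (NN relation))))))")
tree = parse(SAMPLE)
print(" ".join(leaf.word for leaf in leaves(tree)))  # This tree is illustrating the constituency relation
noun_phrase = leaves(tree)[-1].parent  # the NP one layer above "the", "constituency", "relation"
print(noun_phrase.label, [c.label for c in noun_phrase.children])  # NP ['DT', 'NN', 'NN']
print(to_brackets(tree) == SAMPLE)  # True
```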

The grammatical function classification extraction unit 3 shown in FIG. 1 refers to the syntactic analysis result 42 stored in the storage unit 4. Working sentence by sentence, it extracts from the plurality of constituents those belonging to a predetermined grammatical function class. The extraction is based on each constituent's type and the hierarchical positional relationships between constituents in the syntactic analysis result 42. The unit then stores the result in the storage unit 4 as the grammatical function classification extraction result 43.

In this embodiment, the grammatical function classes comprise two groups:
- "subject", "predicate", "object" and "complement"; and
- classes for constituents that have a predetermined positional relationship to a constituent classified as a subject, predicate, object or complement, such as "subordinate clause at the beginning" and "noun phrase after the predicate".

See FIG. 4 for details.

FIG. 4 lists the grammatical function classes extracted by the grammatical function classification extraction unit 3 shown in FIG. 1. For each class, it gives the symbol, the meaning of the class, and the position condition that a constituent of the class must satisfy:
- F_SBAR1, subordinate clause at the beginning: located before the clause or phrase containing the subject, predicate, object or complement.
- F_PP1, prepositional phrase at the beginning: located before the clause or phrase containing the subject, predicate, object or complement.
- F_S, subject: located within a clause or phrase not contained in a subordinate clause.
- F_V, predicate (predicate verb): located within a clause or phrase not contained in a subordinate clause.
- F_OC, object or complement: located within a clause or phrase not contained in a subordinate clause.
- F_NP, noun phrase following the predicate: located within a clause or phrase not contained in a subordinate clause.
- F_ADJP, adjective phrase following the predicate: located within a clause or phrase not contained in a subordinate clause.
- F_to, to-infinitive following the predicate: located within a clause or phrase not contained in a subordinate clause.
- F_PP2, prepositional phrase following the predicate: located after the clause or phrase containing the subject, predicate, object or complement.
- F_SBAR2, subordinate clause following the predicate: located after the clause or phrase containing the subject, predicate, object or complement.
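For illustration, the FIG. 4 list can be held as a lookup table. In this Python sketch, the `region` values ("before", "core", "after") are our own shorthand for the three position conditions. "core" stands for "within a clause or phrase not contained in a subordinate clause".

```python
from typing import Dict, List, NamedTuple

class FunctionClass(NamedTuple):
    meaning: str
    region: str   # "before", "core" or "after" the clause/phrase holding S, V, O/C

# FIG. 4 as data: symbol -> meaning and position condition.
FIG4: Dict[str, FunctionClass] = {
    "F_SBAR1": FunctionClass("subordinate clause at the beginning", "before"),
    "F_PP1": FunctionClass("prepositional phrase at the beginning", "before"),
    "F_S": FunctionClass("subject", "core"),
    "F_V": FunctionClass("predicate (predicate verb)", "core"),
    "F_OC": FunctionClass("object or complement", "core"),
    "F_NP": FunctionClass("noun phrase following the predicate", "core"),
    "F_ADJP": FunctionClass("adjective phrase following the predicate", "core"),
    "F_to": FunctionClass("to-infinitive following the predicate", "core"),
    "F_PP2": FunctionClass("prepositional phrase following the predicate", "after"),
    "F_SBAR2": FunctionClass("subordinate clause following the predicate", "after"),
}

def classes_in(region: str) -> List[str]:
    """Symbols whose position condition is the given region, in FIG. 4 order."""
    return [symbol for symbol, fc in FIG4.items() if fc.region == region]

print(classes_in("before"))   # ['F_SBAR1', 'F_PP1']
```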

When extracting constituents belonging to the grammatical function classes shown in FIG. 4, the grammatical function classification extraction unit 3 treats simple, complex and compound sentences differently. This applies to the constituents corresponding to the subject, the predicate, and the object or complement, which are handled as follows:

- Simple sentence: the sentence contains only one clause and therefore only one subject-predicate pair. The unit 3 assigns the subject, predicate, and object or complement of that sentence to the "F_S", "F_V" and "F_OC" classes.
- Complex sentence: the sentence contains a main clause and one or more subordinate clauses, and therefore several subject-predicate pairs. The unit 3 assigns only the subject, predicate, and object or complement of the main clause to "F_S", "F_V" and "F_OC".
- Compound sentence: the sentence consists of several coordinated sentences, and the unit 3 handles each of them in turn. If a coordinated sentence is simple, its subject, predicate, and object or complement are assigned to "F_S", "F_V" and "F_OC". If it is complex, those of its main clause are assigned.
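A minimal Python sketch of this clause selection follows. It rests on two assumptions of our own:

- A compound sentence is recognized as two or more S nodes directly under the top S.
- Clauses nested inside SBAR are subordinate, so they are never returned.

The bracketed parses are hand-built for illustration and are not taken from the figures. The compound one is a shortened version of the FIG. 12 sentence.

```python
import re

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def kids(node):
    """Child constituents of a node (words are skipped)."""
    return [c for c in node[1:] if isinstance(c, list)]

def leaves(node):
    """Words under a node, in sentence order."""
    out = []
    for child in node[1:]:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out

def main_clauses(tree):
    """Clauses whose subject, predicate and object/complement go to F_S, F_V, F_OC.

    Simple or complex sentence: the top-level S itself (clauses inside SBAR are
    subordinate and are not returned). Compound sentence: each S conjunct.
    """
    top = tree[1] if tree[0] == "ROOT" else tree
    conjuncts = [c for c in kids(top) if c[0] == "S"]
    return conjuncts if len(conjuncts) >= 2 else [top]

COMPLEX = ("(ROOT (S (SBAR (WHADVP (WRB When)) (S (NP (PRP she)) (VP (VBD woke) (PRT (RP up))))) "
           "(, ,) (NP (PRP I)) (VP (VBD was) (VP (VBG having) (NP (NN breakfast)))) (. .)))")
COMPOUND = (
    "(ROOT (S "
    "(S (NP (DT the) (NN student)) (VP (MD should) (VP (VB read) (NP (JJ good) (NNS workbooks))))) "
    "(CC and) "
    "(S (NP (DT the) (NN Librarian)) (VP (MD should) (VP (VB ensure) (SBAR (IN that) "
    "(S (NP (DT all) (DT the) (NN book)) (VP (VBZ is) (ADJP (JJ vandal) (JJ resistant)))))))) "
    "(. .)))"
)

for s in (COMPLEX, COMPOUND):
    print([" ".join(leaves(c)) for c in main_clauses(parse(s))])
```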

A predicate verb is, for example in English, the verb that forms the core of the predicate. It expresses the action or state of the subject and is generally placed immediately after the subject.

Next, a first operation example of the grammatical function classification extraction unit 3 shown in FIG. 1 will be described with reference to FIGS. 5 to 8.

- FIG. 5 is a flowchart of the first operation example.
- FIG. 6 is an explanatory diagram of the first operation example.
- FIG. 7 is a schematic diagram of the first operation example. It shows an example parsing result to be processed: the syntax tree 422 for the input sentence "When she woke up, I was having breakfast." The layers of constituents are numbered 0 to 5. Layer 0 is the highest layer (also called the top layer), and layer 5 is the lowest layer (also called the bottom layer).
- FIG. 8 shows an example configuration of the grammatical function classification extraction result 43 shown in FIG. 1.

In the first operation example, when activated, the grammatical function classification extraction unit 3 first determines whether there is a document to be processed (step S10). A document is to be processed if the unit 3 has not yet processed it among the documents 41 stored in the storage unit 4. If there is such a document ("YES" in step S10), the unit 3 selects that document 41 (step S11). If there is none ("NO" in step S10), the unit 3 ends the process.

After selecting the document 41 in step S11, the grammatical function classification extraction unit 3 selects an unprocessed sentence from it (step S12). It then refers to the parsing result of that sentence, which is contained in the syntactic analysis result 42 corresponding to the selected document 41 (step S13).

Next, the grammatical function classification extraction unit 3 identifies the predicate of the selected sentence (step S14). This is shown under "(S14) Identifying the verb (predicate verb)" in FIG. 6. The unit 3 starts at the "S" (sentence) node in the top layer (the smallest layer number) and traces the syntax tree down through the "VP" (verb phrase) nodes. It identifies the verb at the lower left of each "VP", that is, the verb at the front of the verb phrase, as a predicate verb.

In the syntax tree 422 shown in FIG. 7, the unit 3 first traces from the "S" node 4221 to the "VP" node 4222 and then to the "VP" node 4223. It then identifies two words as predicate verbs:

- "was", corresponding to the "VBD" (verb) node 4224 at the lower left of the "VP" node 4222;
- "having", corresponding to the "VBG" (verb) node 4225 at the lower left of the "VP" node 4223.
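Step S14 can be sketched as a walk down the VP chain. The Python below is a minimal illustration that rests on three assumptions of our own:

- The FIG. 7 tree is hand-written here in Penn Treebank bracket notation; the source shows it only as a figure.
- The parser's ROOT wrapper is dropped, so that S is layer 0.
- "Lower left" is read as "first child".

```python
import re

VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}   # Penn Treebank verb tags

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def kids(node):
    """Child constituents of a node (words are skipped)."""
    return [c for c in node[1:] if isinstance(c, list)]

def vp_chain(clause):
    """Descend from the clause through VP children: S -> VP -> VP -> ..."""
    chain, node = [], clause
    while True:
        vp = next((c for c in kids(node) if c[0] == "VP"), None)
        if vp is None:
            return chain
        chain.append(vp)
        node = vp

def predicate_verbs(tree):
    """Step S14 sketch: the lower-left (first) child of each VP on the chain, if it is a verb."""
    clause = tree[1] if tree[0] == "ROOT" else tree    # layer 0 is the S node
    verbs = []
    for vp in vp_chain(clause):
        lower_left = kids(vp)[0]
        if lower_left[0] in VERB_TAGS:
            verbs.append(lower_left[1])
    return verbs

FIG7 = ("(ROOT (S (SBAR (WHADVP (WRB When)) (S (NP (PRP she)) (VP (VBD woke) (PRT (RP up))))) "
        "(, ,) (NP (PRP I)) (VP (VBD was) (VP (VBG having) (NP (NN breakfast)))) (. .)))")

print(predicate_verbs(parse(FIG7)))   # ['was', 'having']
```

The walk only follows VP children of the top-level S. The verb "woke" inside the subordinate clause is therefore never reached.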

Next, the grammatical function classification extraction unit 3 identifies the subject of the selected sentence (step S15). This is shown under "(S15) Identifying the subject" in FIG. 6. Among the nodes traced in step S14, the unit 3 takes the topmost "VP containing the predicate verb" (the one with the smallest layer number). It identifies the "NP" (noun phrase) in the same layer as that VP as the subject.

In the syntax tree 422 shown in FIG. 7, the "NP" node 4226 is in the same layer as the "VP" node 4222. The unit 3 identifies the word "I", corresponding to the "PRP" (personal pronoun) node 4227 connected below the "NP" node 4226, as the subject.
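Step S15 follows directly from the same tree. This sketch makes the same assumptions as the step S14 sketch: a hand-built FIG. 7 bracketing, our own helper names, and S as layer 0. The topmost VP on the chain is a child of the layer-0 S. The NP "in the same layer" is therefore simply an NP among the S node's children.

```python
import re

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def kids(node):
    """Child constituents of a node (words are skipped)."""
    return [c for c in node[1:] if isinstance(c, list)]

def leaves(node):
    """Words under a node, in sentence order."""
    out = []
    for child in node[1:]:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out

def subject(tree):
    """Step S15 sketch: the NP in the same layer as the topmost VP containing the predicate verb."""
    clause = tree[1] if tree[0] == "ROOT" else tree     # layer 0 is the S node
    layer1 = kids(clause)                               # the topmost VP and its siblings
    if not any(c[0] == "VP" for c in layer1):
        return []                                       # no predicate, so no subject
    np = next((c for c in layer1 if c[0] == "NP"), None)
    return leaves(np) if np is not None else []

FIG7 = ("(ROOT (S (SBAR (WHADVP (WRB When)) (S (NP (PRP she)) (VP (VBD woke) (PRT (RP up))))) "
        "(, ,) (NP (PRP I)) (VP (VBD was) (VP (VBG having) (NP (NN breakfast)))) (. .)))")

print(subject(parse(FIG7)))   # ['I']
```

The pronoun "she" is not picked up. Its NP sits inside the SBAR at layer 3, not among the layer-1 siblings of the topmost VP.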

Next, the grammatical function classification extraction unit 3 identifies the object or complement of the selected sentence (step S16). This is shown under "(S16) Identifying the object/complement" in FIG. 6. Among the nodes traced in step S14, the unit 3 takes the lowest "VP containing the predicate verb" (the one with the largest layer number). It identifies the "NP" in the same layer as the verb at the lower left of that VP as the object or complement.

In the syntax tree 422 shown in FIG. 7, the "VBG" node 4225 is at the lower left of the "VP" node 4223. The unit 3 identifies the word "breakfast", corresponding to the "NN" (noun) node 4228 connected below the "NP" node in the same layer as node 4225, as the object or complement.
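Step S16 looks at the other end of the VP chain. The assumptions are the same as in the step S14 sketch. The children of the lowest VP share a layer with its lower-left verb, so the object or complement is an NP among those children.

```python
import re

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def kids(node):
    """Child constituents of a node (words are skipped)."""
    return [c for c in node[1:] if isinstance(c, list)]

def leaves(node):
    """Words under a node, in sentence order."""
    out = []
    for child in node[1:]:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out

def vp_chain(clause):
    """Descend from the clause through VP children: S -> VP -> VP -> ..."""
    chain, node = [], clause
    while True:
        vp = next((c for c in kids(node) if c[0] == "VP"), None)
        if vp is None:
            return chain
        chain.append(vp)
        node = vp

def object_or_complement(tree):
    """Step S16 sketch: the NP in the same layer as the lower-left verb of the lowest VP."""
    clause = tree[1] if tree[0] == "ROOT" else tree
    chain = vp_chain(clause)
    if not chain:
        return []
    lowest_vp = chain[-1]                                   # largest layer number
    # Children of lowest_vp share a layer with its lower-left verb.
    np = next((c for c in kids(lowest_vp) if c[0] == "NP"), None)
    return leaves(np) if np is not None else []

FIG7 = ("(ROOT (S (SBAR (WHADVP (WRB When)) (S (NP (PRP she)) (VP (VBD woke) (PRT (RP up))))) "
        "(, ,) (NP (PRP I)) (VP (VBD was) (VP (VBG having) (NP (NN breakfast)))) (. .)))")

print(object_or_complement(parse(FIG7)))   # ['breakfast']
```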

Next, the grammatical function classification extraction unit 3 identifies the grammatical function classes of constituents located before and after the clause or phrase containing the subject, predicate, object or complement (step S17). In step S17, the unit 3 identifies the constituents corresponding to four classes:

- "F_SBAR1" (subordinate clause at the beginning);
- "F_PP1" (prepositional phrase at the beginning);
- "F_PP2" (prepositional phrase following the predicate);
- "F_SBAR2" (subordinate clause following the predicate).

In this embodiment, each of these four classes is defined by a constituent type and by its position relative to the clause or phrase containing the subject, predicate, object or complement:

- "F_SBAR1" (subordinate clause at the beginning): a constituent of type "SBAR" (subordinate clause) located before that clause or phrase.
- "F_PP1" (prepositional phrase at the beginning): a constituent of type "PP" (prepositional phrase) located before it.
- "F_PP2" (prepositional phrase following the predicate): a constituent of type "PP" located after it.
- "F_SBAR2" (subordinate clause following the predicate): a constituent of type "SBAR" located after it.

In the syntax tree 422 shown in FIG. 7, the "SBAR" node 4229 lies before the "NP" node 4226 that contains the subject. In step S17, the grammatical function classification extraction unit 3 identifies the words "When", "she", "woke" and "up", which make up the constituent of node 4229, as the constituent of class "F_SBAR1" (subordinate clause at the beginning).
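A Python sketch of step S17 follows. It makes simplifying assumptions that the text does not state:

- Only the direct children of the top S are examined.
- The "clause or phrase containing the subject, predicate, object or complement" is taken to be the span from the first NP or VP child to the last one.

As a result, prepositional phrases attached inside the VP are not covered. The second test tree is hand-built.

```python
import re

# (constituent type, side of the core span) -> FIG. 4 symbol
PERIPHERAL = {("SBAR", "before"): "F_SBAR1", ("PP", "before"): "F_PP1",
              ("PP", "after"): "F_PP2", ("SBAR", "after"): "F_SBAR2"}

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def kids(node):
    """Child constituents of a node (words are skipped)."""
    return [c for c in node[1:] if isinstance(c, list)]

def leaves(node):
    """Words under a node, in sentence order."""
    out = []
    for child in node[1:]:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out

def peripheral_classes(tree):
    """Step S17 sketch: classify SBAR/PP children of the top S by side of the core span."""
    clause = tree[1] if tree[0] == "ROOT" else tree
    children = kids(clause)
    core = [i for i, c in enumerate(children) if c[0] in ("NP", "VP")]
    result = {}
    if not core:
        return result
    for i, c in enumerate(children):
        side = "before" if i < core[0] else "after" if i > core[-1] else "inside"
        symbol = PERIPHERAL.get((c[0], side))
        if symbol:
            result.setdefault(symbol, []).append(" ".join(leaves(c)))
    return result

FIG7 = ("(ROOT (S (SBAR (WHADVP (WRB When)) (S (NP (PRP she)) (VP (VBD woke) (PRT (RP up))))) "
        "(, ,) (NP (PRP I)) (VP (VBD was) (VP (VBG having) (NP (NN breakfast)))) (. .)))")
TRAILING_PP = "(ROOT (S (NP (PRP I)) (VP (VBD slept)) (PP (IN in) (NP (DT the) (NN car))) (. .)))"

print(peripheral_classes(parse(FIG7)))          # {'F_SBAR1': ['When she woke up']}
print(peripheral_classes(parse(TRAILING_PP)))   # {'F_PP2': ['in the car']}
```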

Next, the grammatical function classification extraction unit 3 saves the identification results of steps S14 to S17 in the storage unit 4 as the grammatical function classification extraction result 43 (step S18).

FIG. 8 shows a portion 430 as an example configuration of the extraction result 43. The portion 430 is the extraction result 43 for one document 41 that contains the sentence "When she woke up, I was having breakfast." used in FIG. 7. It records the symbols of the identified grammatical function classes and the words of the corresponding constituents. These are associated with two pieces of identification information:

- "document identification information A", which identifies the document 41. It is, for example, the file name or URI (Uniform Resource Identifier) of the document 41.
- "sentence identification information B", which identifies the sentence "When she woke up, I was having breakfast." within the document 41. It can be, for example, positional information expressed by page, line, character count and so on.

For example, the search unit 5 shown in FIG. 1 can refer to the document 41 and to a specific sentence in it by specifying the document identification information and the sentence identification information.
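The record layout of FIG. 8 can be mirrored by a small keyed store. This Python sketch is hypothetical in several respects: the class names, the (page, line, character offset) form of the sentence identifier, and the example URI are all illustrative choices. The word lists are the results identified in steps S14 to S17 above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class SentencePosition:
    """Hypothetical 'sentence identification information': position in the document."""
    page: int
    line: int
    char_offset: int

@dataclass
class SentenceRecord:
    document_id: str                  # 'document identification information', e.g. a file name or URI
    sentence_id: SentencePosition     # 'sentence identification information'
    classes: Dict[str, List[str]] = field(default_factory=dict)   # symbol -> words

class ExtractionResultStore:
    """In-memory stand-in for extraction result 43 (sketch only)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, SentencePosition], SentenceRecord] = {}

    def save(self, record: SentenceRecord) -> None:          # step S18
        self._records[(record.document_id, record.sentence_id)] = record

    def lookup(self, document_id: str,
               sentence_id: SentencePosition) -> Optional[SentenceRecord]:
        """What a search unit could call with document and sentence identifiers."""
        return self._records.get((document_id, sentence_id))

store = ExtractionResultStore()
pos = SentencePosition(page=1, line=3, char_offset=0)       # illustrative values
store.save(SentenceRecord(
    document_id="file:///docs/example.txt",                  # illustrative URI
    sentence_id=pos,
    classes={"F_SBAR1": ["When", "she", "woke", "up"], "F_S": ["I"],
             "F_V": ["was", "having"], "F_OC": ["breakfast"]},
))
print(store.lookup("file:///docs/example.txt", pos).classes["F_V"])   # ['was', 'having']
```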

Next, the grammatical function classification extraction unit 3 determines whether the selected document 41 contains an unprocessed sentence (step S19). If it does ("YES" in step S19), the unit 3 repeats the processing from step S12. If it does not ("NO" in step S19), the unit 3 repeats the processing from step S10.
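The control flow of steps S10 to S19 amounts to two nested loops. In this hypothetical Python sketch, three names stand in for parts of the device:

- `parses_by_doc` stands in for the syntactic analysis result 42.
- `results` stands in for the extraction result 43.
- `extract` stands in for steps S14 to S17.

The toy extractor in the usage lines only echoes its input.

```python
from typing import Callable, Dict, List, Set, Tuple

Classes = Dict[str, List[str]]   # grammatical function symbol -> words

def run_extraction(parses_by_doc: Dict[str, List[str]],
                   extract: Callable[[str], Classes],
                   results: Dict[Tuple[str, int], Classes],
                   processed: Set[str]) -> None:
    """Steps S10 to S19 of FIG. 5 as two nested loops (sketch)."""
    for doc_id, sentence_parses in parses_by_doc.items():
        if doc_id in processed:                  # S10: only documents not yet processed
            continue
        # S11: this document is selected
        for sent_idx, parse_str in enumerate(sentence_parses):
            # S12/S13: next unprocessed sentence and its parse
            # S14 to S17: extract classes; S18: save them
            results[(doc_id, sent_idx)] = extract(parse_str)
        processed.add(doc_id)                    # S19 "NO": back to S10
    # S10 "NO": no document left, so the process ends

def toy_extract(parse_str: str) -> Classes:
    """Placeholder for steps S14 to S17: just records the parse it was given."""
    return {"parse": [parse_str]}

results: Dict[Tuple[str, int], Classes] = {}
processed: Set[str] = set()
run_extraction({"doc-A": ["(S a)", "(S b)"], "doc-B": ["(S c)"]}, toy_extract, results, processed)
print(sorted(results))   # [('doc-A', 0), ('doc-A', 1), ('doc-B', 0)]
```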

As described above, the first operation example works on the parsing result in three steps:

1. Start from a constituent of a predetermined constituent type (the first constituent).
2. Trace from it to a constituent of a predetermined constituent type (the second constituent).
3. Extract a constituent that has a predetermined positional relationship with the second constituent (the third constituent) as a constituent belonging to one of the grammatical function classes.

A predetermined grammatical function class can thus be assigned to a constituent by simple processing of the parsing result, without using a concept dictionary or the like.

Specifically, in the first operation example, the grammatical function classification extraction unit 3 traces downward from a constituent of type "sentence" (the first constituent) to constituents of type "verb phrase" (the second constituent). It extracts a constituent of type "verb" located below a traced "verb phrase" constituent (the third constituent) as a constituent belonging to the grammatical function class "predicate". In this way, the "predicate" class can be assigned to a constituent by simple processing of the parsing result.

Also in the first operation example, the grammatical function classification extraction unit 3 traces upward from the constituent extracted as belonging to the "predicate" class and identifies the "verb phrase" constituent in the highest layer. It extracts the "noun phrase" constituent in the same layer as that "verb phrase" constituent as a constituent belonging to the grammatical function class "subject". In this way, the "subject" class can be assigned to a constituent by simple processing of the parsing result.

Also in the first operation example, the grammatical function classification extraction unit 3 traces downward through the "verb phrase" constituents connected to the constituent extracted as belonging to the "predicate" class. It extracts the "noun phrase" constituent in the lowest layer as a constituent belonging to the grammatical function class "object or complement". In this way, the "object or complement" class can be assigned to a constituent by simple processing of the parsing result.

Next, a second operation example of the grammatical function classification extraction unit 3 shown in FIG. 1 will be described with reference to FIGS. 9 to 12. FIG. 9 is a flowchart of the second operation example, and FIG. 10 is a schematic diagram for explaining it.
- FIG. 10: part (a) shows an example of a pattern used by the unit 3. Part (b) shows an example parsing result to be processed: the syntax tree 423 for the input sentence "The Student should read an interesting book." This input sentence contains the auxiliary verb "should".
- FIG. 11: part (a) shows another example pattern. Part (b) shows the syntax tree 424 for the input sentence "Members are loved by a big dog that eats sugar toasts and have a good amount of muscle." This input sentence contains no auxiliary verb.
- FIG. 12: part (a) shows another example pattern. Part (b) shows the syntax tree 425 for the input sentence "In library the student should read good workbooks and the Librarian should ensure that all the book is vandal resistant." This input sentence is a compound sentence, and each of its coordinated sentences contains the auxiliary verb "should".
In the second operation example, as in the first, the unit 3 extracts the grammatical function classes shown in FIG. 4.

In the second operation example, when activated, the grammatical function classification extraction unit 3 first determines whether there is a document to be processed (step S20). A document is to be processed if the unit 3 has not yet processed it among the documents 41 stored in the storage unit 4. If there is such a document ("YES" in step S20), the unit 3 selects that document 41 (step S21). If there is none ("NO" in step S20), the unit 3 ends the process.

After selecting the document 41 in step S21, the grammatical function classification extraction unit 3 selects an unprocessed sentence from it (step S22). It then refers to the parsing result of that sentence, which is contained in the syntactic analysis result 42 corresponding to the selected document 41 (step S23).

Next, the grammatical function classification extraction unit 3 selects one or more patterns corresponding to the type of the selected sentence (step S24). The terms used in this step are as follows:

- A pattern corresponding to a sentence type is a form of parsing result (syntax tree) set in advance for each predetermined sentence type and each grammatical function class.
- The sentence type classifies a sentence, for example by whether it is a compound sentence and by whether it contains an auxiliary verb.
- A pattern describes a form of connection among several nodes (constituents) and is set for each grammatical function class. For a given sentence type, a pattern can be defined around a node belonging to some grammatical function class. It states what constituent types the other nodes have and how they are connected to that node.

Each pattern can be created, for example, by trial and error on a number of example sentences.
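The text does not say how the sentence type is detected. One plausible reading, sketched here in Python as an assumption, makes two checks:

- Does the top S coordinate two or more S clauses with a CC? If so, the sentence is compound.
- Does an MD node occur anywhere in the tree? If so, the sentence contains an auxiliary verb.

The sketch then looks up the matching pattern. The pattern names in `F_V_PATTERNS` are hypothetical labels for the FIG. 10, 11 and 12 patterns. The parses are hand-built, shortened versions of the example sentences.

```python
import re

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def kids(node):
    """Child constituents of a node (words are skipped)."""
    return [c for c in node[1:] if isinstance(c, list)]

def contains(node, tag):
    """True if the node or any descendant has the given label."""
    return node[0] == tag or any(contains(c, tag) for c in kids(node))

def sentence_type(tree):
    """Return (is_compound, has_auxiliary) under the assumptions stated above."""
    top = tree[1] if tree[0] == "ROOT" else tree
    conjuncts = [c for c in kids(top) if c[0] == "S"]
    is_compound = len(conjuncts) >= 2 and any(c[0] == "CC" for c in kids(top))
    return is_compound, contains(top, "MD")

# Hypothetical registry: F_V pattern per sentence type (FIGS. 10 to 12).
F_V_PATTERNS = {
    (False, True): "FIG10_with_MD",
    (False, False): "FIG11_without_MD",
    (True, True): "FIG12_compound_with_MD",
}

def select_patterns(tree):
    """Step S24 sketch: patterns to try for this sentence (empty if none is defined)."""
    name = F_V_PATTERNS.get(sentence_type(tree))
    return [name] if name else []

WITH_MD = ("(ROOT (S (NP (DT The) (NNP Student)) (VP (MD should) (VP (VB read) "
           "(NP (DT an) (JJ interesting) (NN book)))) (. .)))")
NO_MD = ("(ROOT (S (NP (NNS Members)) (VP (VBP are) (VP (VBN loved) "
         "(PP (IN by) (NP (DT a) (JJ big) (NN dog))))) (. .)))")
COMPOUND = (
    "(ROOT (S "
    "(S (NP (DT the) (NN student)) (VP (MD should) (VP (VB read) (NP (JJ good) (NNS workbooks))))) "
    "(CC and) "
    "(S (NP (DT the) (NN Librarian)) (VP (MD should) (VP (VB ensure) (SBAR (IN that) "
    "(S (NP (DT all) (DT the) (NN book)) (VP (VBZ is) (ADJP (JJ vandal) (JJ resistant)))))))) "
    "(. .)))"
)

for s in (WITH_MD, NO_MD, COMPOUND):
    print(select_patterns(parse(s)))
```

Sentence types for which the text defines no pattern, such as a compound sentence without auxiliaries, simply yield an empty list here.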

FIG. 10(a) shows a pattern for identifying the grammatical function class "F_V" (predicate) in a sentence that contains an "MD" (auxiliary verb) constituent. The pattern consists of rules 1 to 4. When all of them are satisfied, the grammatical function classification extraction unit 3 determines that the grammatical function class of the node in question is "F_V" (predicate). Here the k-th parent means the ancestor k layers above the node:

- Rule 1: the first parent is VP (verb phrase).
- Rule 2: the left sibling of the first parent is MD.
- Rule 3: the third parent is S (sentence).
- Rule 4: there is no fourth parent.

FIG. 10(b) shows the syntax tree 423, the parsing result of a sentence containing an auxiliary verb. Consider node 4231, whose constituent type is "verb (VB)":

- Rule 1 is satisfied: the constituent type of node 4232, the first parent of node 4231, is "VP".
- Rule 2 is satisfied: the constituent type of node 4233, the left sibling of the first parent node 4232, is "MD".
- Rule 3 is satisfied: node 4235 is the parent of node 4234, which is the second parent of node 4231, so node 4235 is the third parent. Its constituent type is "S (sentence)".
- Rule 4 is satisfied: the third parent node 4235 is the root ("ROOT") of the tree.

Therefore, based on the syntax tree 423 shown in FIG. 10(b), the grammatical function classification extraction unit 3 can identify the word "provide" corresponding to node 4231 as belonging to the grammatical function class "F_V" (predicate).
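The FIG. 10(a) rules translate directly into checks on a node's ancestors. The Python below is a minimal sketch that rests on three assumptions of our own:

- The bracketing of the FIG. 10(b) sentence is hand-built.
- The parser's ROOT wrapper is dropped, so that S is the tree root. This matches the text, where the third parent is the root.
- Candidate nodes are those whose tag starts with "VB".

Because the parse is built from the quoted input sentence, the word it finds is that sentence's main verb.

```python
import re

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def walk(node, path=()):
    """Yield (node, ancestors) for every constituent; ancestors[-1] is the first parent."""
    yield node, path
    for child in node[1:]:
        if isinstance(child, list):
            yield from walk(child, path + (node,))

def label(node):
    return node[0] if node is not None else None

def parent(path, k):
    """The k-th parent (ancestor k layers up), or None if it does not exist."""
    return path[-k] if len(path) >= k else None

def left_sibling_of_parent(path, k):
    """Left sibling of the k-th parent, or None."""
    if len(path) < k + 1:
        return None
    p, grand = path[-k], path[-k - 1]
    i = next(j for j, c in enumerate(grand) if c is p)
    return grand[i - 1] if i > 1 else None        # grand[0] is the label, not a sibling

def fig10_rules(path):
    """FIG. 10(a) pattern for F_V in a sentence containing MD."""
    return (label(parent(path, 1)) == "VP"                          # rule 1
            and label(left_sibling_of_parent(path, 1)) == "MD"      # rule 2
            and label(parent(path, 3)) == "S"                       # rule 3
            and parent(path, 4) is None)                            # rule 4

def find_f_v(tree, rules):
    """Words of verb nodes whose ancestors satisfy every rule."""
    root = tree[1] if tree[0] == "ROOT" else tree    # S is the tree root, as in the text
    return [node[1] for node, path in walk(root)
            if node[0].startswith("VB") and rules(path)]

FIG10B = ("(ROOT (S (NP (DT The) (NNP Student)) (VP (MD should) (VP (VB read) "
          "(NP (DT an) (JJ interesting) (NN book)))) (. .)))")

print(find_f_v(parse(FIG10B), fig10_rules))   # ['read']
```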

FIG. 11(a) shows a pattern for identifying the grammatical function class "F_V" (predicate) in a sentence that contains no "MD" (auxiliary verb) constituent. The pattern consists of rules 1 to 3. When all of them are satisfied, the grammatical function classification extraction unit 3 determines that the grammatical function class of the node in question is "F_V" (predicate):

- Rule 1: the first parent is VP (verb phrase).
- Rule 2: the second parent is S (sentence).
- Rule 3: there is no third parent.

FIG. 11(b) shows the syntax tree 424, the parsing result of a sentence that contains no auxiliary verb. Consider node 4241, whose constituent type is "verb (VBP)":

- Rule 1 is satisfied: the constituent type of node 4242, the first parent of node 4241, is "VP".
- Rule 2 is satisfied: the constituent type of node 4243, the second parent of node 4241, is "S (sentence)".
- Rule 3 is satisfied: the second parent node 4243 is the root ("ROOT").

Therefore, based on the syntax tree 424 shown in FIG. 11(b), the grammatical function classification extraction unit 3 can identify the word "are" corresponding to node 4241 as belonging to the grammatical function class "F_V" (predicate).
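The FIG. 11(a) pattern needs only the parent chain. This sketch makes the same assumptions as for the FIG. 10 rules. The test tree is a hand-built parse of a shortened form of the FIG. 11(b) sentence: "Members are loved by a big dog."

```python
import re

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def walk(node, path=()):
    """Yield (node, ancestors) for every constituent; ancestors[-1] is the first parent."""
    yield node, path
    for child in node[1:]:
        if isinstance(child, list):
            yield from walk(child, path + (node,))

def label(node):
    return node[0] if node is not None else None

def parent(path, k):
    """The k-th parent (ancestor k layers up), or None if it does not exist."""
    return path[-k] if len(path) >= k else None

def fig11_rules(path):
    """FIG. 11(a) pattern for F_V in a sentence without MD."""
    return (label(parent(path, 1)) == "VP"      # rule 1
            and label(parent(path, 2)) == "S"   # rule 2
            and parent(path, 3) is None)        # rule 3

def find_f_v(tree, rules):
    """Words of verb nodes whose ancestors satisfy every rule."""
    root = tree[1] if tree[0] == "ROOT" else tree    # S is the tree root, as in the text
    return [node[1] for node, path in walk(root)
            if node[0].startswith("VB") and rules(path)]

FIG11B_SHORT = ("(ROOT (S (NP (NNS Members)) (VP (VBP are) (VP (VBN loved) "
                "(PP (IN by) (NP (DT a) (JJ big) (NN dog))))) (. .)))")

print(find_f_v(parse(FIG11B_SHORT), fig11_rules))   # ['are']
```

The passive participle "loved" fails rule 2 because its second parent is another VP, not S.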

FIG. 12(a) shows a pattern for identifying the grammatical function class "F_V" (predicate) in a compound sentence whose coordinated sentences each contain an auxiliary verb. The pattern consists of rules 1 to 4. When all of them are satisfied, the grammatical function classification extraction unit 3 determines that the grammatical function class of the node in question is "F_V" (predicate):

- Rule 1: the first parent is VP (verb phrase).
- Rule 2: the third parent is S (sentence).
- Rule 3: the left sibling of the first parent is MD (auxiliary verb).
- Rule 4: the fourth parent is S (sentence).

FIG. 12(b) shows the syntax tree 425, the parsing result of a compound sentence whose coordinated sentences each contain an auxiliary verb. Consider node 4251, whose constituent type is "verb (VB)":

- Rule 1 is satisfied: the constituent type of node 4252, the first parent of node 4251, is "VP".
- Rule 2 is satisfied: the constituent type of node 4253, the third parent of node 4251, is "S (sentence)".
- Rule 3 is satisfied: the constituent type of node 4254, the left sibling of the first parent node 4252, is "MD".
- Rule 4 is satisfied: the constituent type of node 4255, the fourth parent of node 4251, is "S (sentence)".

Therefore, based on the syntax tree 425 shown in FIG. 12(b), the grammatical function classification extraction unit 3 can identify the word "enable" corresponding to node 4251 as belonging to the grammatical function class "F_V" (predicate).

Similarly, in the syntax tree 425, consider node 4256, whose constituent type is "verb (VB)":

- Rule 1 is satisfied: the constituent type of node 4257, the first parent of node 4256, is "VP".
- Rule 2 is satisfied: the constituent type of node 4258, the third parent of node 4256, is "S (sentence)".
- Rule 3 is satisfied: the constituent type of node 4259, the left sibling of the first parent node 4257, is "MD".
- Rule 4 is satisfied: the constituent type of node 4255, the fourth parent of node 4256, is "S (sentence)".

Therefore, based on the syntax tree 425, the grammatical function classification extraction unit 3 can identify the word "enable" corresponding to node 4256 as belonging to the grammatical function class "F_V" (predicate).
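The FIG. 12(a) rules can be checked the same way. This sketch rests on two assumptions:

- It uses a hand-built parse of the FIG. 12(b) sentence, with the opening "In library" omitted.
- The outer S that coordinates the two clauses is treated as the root.

On this parse, the two matching nodes are the VB verbs under the two "should" auxiliaries. The VBZ verb inside the subordinate clause fails rule 2.

```python
import re

def parse(s):
    """Penn Treebank brackets -> nested lists [label, child, ...]; words stay strings."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", s):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0][0]

def walk(node, path=()):
    """Yield (node, ancestors) for every constituent; ancestors[-1] is the first parent."""
    yield node, path
    for child in node[1:]:
        if isinstance(child, list):
            yield from walk(child, path + (node,))

def label(node):
    return node[0] if node is not None else None

def parent(path, k):
    """The k-th parent (ancestor k layers up), or None if it does not exist."""
    return path[-k] if len(path) >= k else None

def left_sibling_of_parent(path, k):
    """Left sibling of the k-th parent, or None."""
    if len(path) < k + 1:
        return None
    p, grand = path[-k], path[-k - 1]
    i = next(j for j, c in enumerate(grand) if c is p)
    return grand[i - 1] if i > 1 else None        # grand[0] is the label, not a sibling

def fig12_rules(path):
    """FIG. 12(a) pattern for F_V in a compound sentence whose clauses contain MD."""
    return (label(parent(path, 1)) == "VP"                          # rule 1
            and label(parent(path, 3)) == "S"                       # rule 2
            and label(left_sibling_of_parent(path, 1)) == "MD"      # rule 3
            and label(parent(path, 4)) == "S")                      # rule 4

def find_f_v(tree, rules):
    """Words of verb nodes whose ancestors satisfy every rule."""
    root = tree[1] if tree[0] == "ROOT" else tree
    return [node[1] for node, path in walk(root)
            if node[0].startswith("VB") and rules(path)]

COMPOUND = (
    "(ROOT (S "
    "(S (NP (DT the) (NN student)) (VP (MD should) (VP (VB read) (NP (JJ good) (NNS workbooks))))) "
    "(CC and) "
    "(S (NP (DT the) (NN Librarian)) (VP (MD should) (VP (VB ensure) (SBAR (IN that) "
    "(S (NP (DT all) (DT the) (NN book)) (VP (VBZ is) (ADJP (JJ vandal) (JJ resistant)))))))) "
    "(. .)))"
)

print(find_f_v(parse(COMPOUND), fig12_rules))   # ['read', 'ensure']
```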

After the pattern corresponding to the sentence type has been selected in step S24, if the parsing result referenced in step S23 matches each pattern selected in step S24, the grammatical function classification extraction unit 3 identifies each constituent belonging to a predetermined grammatical function classification, as described with reference to FIGS. 10 to 12 (step S25).

Next, the grammatical function classification extraction unit 3 stores the identification result of step S25 in the storage unit 4 as the grammatical function classification extraction result 43, in the same manner as in the first operation example (step S26).

Next, the grammatical function classification extraction unit 3 determines whether the selected document 41 contains an unprocessed sentence (step S27). If there is an unprocessed sentence ("YES" in step S27), the grammatical function classification extraction unit 3 executes the processing from step S22 onward again. If there is no unprocessed sentence ("NO" in step S27), the grammatical function classification extraction unit 3 executes the processing from step S20 onward again.

As described above, in the second operation example, the grammatical function classification extraction unit 3 extracts, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern. Therefore, a predetermined grammatical function classification can be identified for a constituent by applying simple processing to the parsing result, without using a concept dictionary or the like.

Further, in the second operation example, the grammatical function classification extraction unit 3 uses the following as a predetermined pattern: for a given constituent (the first constituent), the constituent type of the constituent one layer above it (the second constituent) is "verb phrase"; the constituent type of a constituent (the third constituent) that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb"; the constituent type of the constituent three layers above the first constituent (the fourth constituent) is "sentence"; and no constituent exists four layers above the first constituent. When the constituent type of each constituent and the hierarchical positional relationships between the constituents match this pattern, the grammatical function classification extraction unit 3 extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications. That is, according to the second operation example, the "predicate" grammatical function classification can be identified for a constituent by applying simple processing to the parsing result when the input sentence contains an auxiliary verb.

Further, in the second operation example, when the input sentence contains no auxiliary verb, the grammatical function classification extraction unit 3 uses the following as a predetermined pattern: for a given constituent (the first constituent), the constituent type of the constituent one layer above it (the second constituent) is "verb phrase"; the constituent type of the constituent two layers above the first constituent (the third constituent) is "sentence"; and no constituent exists three layers above the first constituent. When the constituent type of each constituent and the hierarchical positional relationships between the constituents match this pattern, the grammatical function classification extraction unit 3 extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications. That is, according to the second operation example, the "predicate" grammatical function classification can be identified for a constituent by applying simple processing to the parsing result when the input sentence contains no auxiliary verb.

Further, in the second operation example, when the input sentence is a compound sentence, the grammatical function classification extraction unit 3 uses the following as a predetermined pattern: for a given constituent (the first constituent), the constituent type of the constituent one layer above it (the second constituent) is "verb phrase"; the constituent type of the constituent three layers above the first constituent (the third constituent) is "sentence"; the constituent type of a constituent (the fifth constituent) that precedes the second constituent in the same layer and, like the second constituent, is connected in the upper layer to another constituent (the fourth constituent) is "auxiliary verb"; and the constituent type of the constituent four layers above the first constituent is "sentence". When the constituent type of each constituent and the hierarchical positional relationships between the constituents match this pattern, the grammatical function classification extraction unit 3 extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications. That is, according to the second operation example, the "predicate" grammatical function classification can be identified for a constituent by applying simple processing to the parsing result when the input sentence is a compound sentence whose clauses contain auxiliary verbs.
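The three patterns above can also be held as data and chosen by sentence type, roughly mirroring steps S24 and S25. The sketch below is one possible encoding under stated assumptions, not the patent's implementation: `PATTERNS`, `sentence_type`, and the other names are hypothetical; the first constituent is restricted to verb tags (VB, VBZ, and so on), following the FIG. 12 walk-through, because without that restriction a noun phrase under the same verb phrase could satisfy the ancestor conditions; and `sentence_type` is a crude heuristic, not the selection method of the patent.

```python
class Node:
    """Minimal constituent node: a constituent type with parent/child links."""

    def __init__(self, label, *children, word=None):
        self.label = label
        self.word = word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def ancestor(node, n):
    """n-th parent of `node` (1 = parent), or None if the tree is not that deep."""
    for _ in range(n):
        if node is None:
            return None
        node = node.parent
    return node


# Pattern = ({ancestor depth: required constituent type, or None for "must not exist"},
#            whether the 1st parent (the "verb phrase") needs an "MD" sibling to its left)
PATTERNS = {
    "with_aux":    ({1: "VP", 3: "S", 4: None}, True),
    "without_aux": ({1: "VP", 2: "S", 3: None}, False),
    "compound":    ({1: "VP", 3: "S", 4: "S"}, True),
}


def matches(first, pattern):
    depths, needs_md = pattern
    for depth, label in depths.items():
        node = ancestor(first, depth)
        if label is None and node is not None:
            return False
        if label is not None and (node is None or node.label != label):
            return False
    if needs_md:
        second = first.parent  # exists: every pattern requires depth 1 to be "VP"
        up = second.parent
        if up is None:
            return False
        left = up.children[: up.children.index(second)]
        if not any(s.label == "MD" for s in left):
            return False
    return True


def sentence_type(root):
    """Crude, hypothetical stand-in for the sentence-type selection of step S24."""
    if any(child.label == "S" for child in root.children):
        return "compound"
    return "with_aux" if any(n.label == "MD" for n in walk(root)) else "without_aux"


def extract_predicates(root):
    """Steps S24-S25 in miniature: pick a pattern, then test every verb node."""
    pattern = PATTERNS[sentence_type(root)]
    return [n.word for n in walk(root)
            if n.label.startswith("VB") and matches(n, pattern)]


def aux_clause(subject, obj):
    # Builds a fresh subtree each call, so parent links are never shared between trees.
    return Node("S", Node("NP", Node("NN", word=subject)),
                Node("VP", Node("MD", word="can"),
                     Node("VP", Node("VB", word="enable"),
                          Node("NP", Node("NN", word=obj)))))


no_aux = Node("S", Node("NP", Node("NN", word="valve")),
              Node("VP", Node("VBZ", word="enables"),
                   Node("NP", Node("NN", word="flow"))))
compound = Node("S", aux_clause("valve", "flow"), Node("CC", word="and"),
                aux_clause("pump", "pressure"))

for t in (aux_clause("valve", "flow"), no_aux, compound):
    print(sentence_type(t), extract_predicates(t))
# with_aux ['enable']
# without_aux ['enables']
# compound ['enable', 'enable']
```

Keeping the patterns in a table means a further sentence type needs only a new entry. A compound sentence without auxiliary verbs is not covered here, since none of the three patterns describes it.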

On the other hand, the search unit 5 (search device) shown in FIG. 1 includes a search processing unit 51. Using the grammatical function classification extraction result 43 as the search target, the search processing unit 51 searches for sentences containing a constituent that corresponds to a specified grammatical function classification 52 and keyword 53, and outputs a search result 54. For example, the search unit 5 extracts sentences containing one or more words that were extracted under the specified grammatical function classification 52 and match the specified keyword 53, and outputs them as the search result 54. In other words, the search target of the search processing unit 51 is the grammatical function classification extraction result 43, which is obtained when the grammatical function classification extraction unit 3 refers to the parsing result 42, representing an input sentence by a plurality of constituents (the constituent units of the sentence), the constituent type of each constituent, and the hierarchical positional relationships between the constituents, and extracts from those constituents the ones belonging to a predetermined grammatical function classification based on their constituent types and hierarchical positional relationships.
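A minimal sketch of this search, assuming each record of the extraction result 43 pairs a sentence with the words extracted under each grammatical function classification. The record layout, the classification keys ("subject", "predicate", "object"), the exact case-insensitive word match, and the sample sentences are all hypothetical; the patent itself labels the predicate classification "F_V" and does not specify how keywords are matched. A plain substring search is included for comparison.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExtractionRecord:
    """One sentence of a hypothetical extraction result 43."""
    sentence: str
    functions: dict[str, list[str]]  # classification -> words extracted under it


def search(records, classification, keyword):
    """Sentences whose words under `classification` include `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [r.sentence for r in records
            if any(w.lower() == kw for w in r.functions.get(classification, []))]


def naive_search(records, keyword):
    """Plain substring search over the raw sentence text, for comparison."""
    kw = keyword.lower()
    return [r.sentence for r in records if kw in r.sentence.lower()]


records = [
    ExtractionRecord("The pump feeds the valve.",
                     {"subject": ["pump"], "predicate": ["feeds"], "object": ["valve"]}),
    ExtractionRecord("The valve controls the pump.",
                     {"subject": ["valve"], "predicate": ["controls"], "object": ["pump"]}),
]

print(naive_search(records, "pump"))       # both sentences
print(search(records, "subject", "pump"))  # only "The pump feeds the valve."
```

The plain keyword search returns both sentences for "pump", whereas restricting the keyword to the subject classification returns only the first, which illustrates how taking the grammatical function classification into account can narrow the results.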

According to the search unit 5, taking the grammatical function classification into account makes it possible to extract the intended sentences with higher precision than a simple keyword search that ignores the grammatical function classification.

The search unit 5 may receive a single grammatical function classification 52 and a single keyword 53, or several of each. When several are given, they can be combined using, for example, logical OR, logical AND, and negation. For example, the search unit 5 can search for sentences that contain one or more words matching a first keyword 53 under a first specified grammatical function classification 52 and that contain no word matching a second keyword 53 under a second specified grammatical function classification 52.
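The OR/AND/NOT combinations can be sketched with small condition combinators. The function names (`has`, `all_of`, `any_of`, `negate`), the `(sentence, functions)` tuple layout, and the sample data are hypothetical, chosen only to illustrate the example query: a first keyword under a first classification, combined with no match for a second keyword under a second classification.

```python
def has(classification, keyword):
    """Condition: some word extracted under `classification` equals `keyword`."""
    kw = keyword.lower()
    return lambda functions: any(w.lower() == kw for w in functions.get(classification, []))


def all_of(*conds):
    """Logical AND of conditions."""
    return lambda functions: all(c(functions) for c in conds)


def any_of(*conds):
    """Logical OR of conditions."""
    return lambda functions: any(c(functions) for c in conds)


def negate(cond):
    """Logical NOT of a condition."""
    return lambda functions: not cond(functions)


def search(records, cond):
    """Return the sentences whose extracted classifications satisfy `cond`."""
    return [sentence for sentence, functions in records if cond(functions)]


records = [
    ("The pump feeds water.", {"subject": ["pump"], "predicate": ["feeds"], "object": ["water"]}),
    ("The pump feeds oil.", {"subject": ["pump"], "predicate": ["feeds"], "object": ["oil"]}),
    ("The valve controls the pump.",
     {"subject": ["valve"], "predicate": ["controls"], "object": ["pump"]}),
]

# Subject is "pump" AND object is NOT "oil": the first sentence only
q1 = all_of(has("subject", "pump"), negate(has("object", "oil")))
# "pump" as subject OR as object: all three sentences
q2 = any_of(has("subject", "pump"), has("object", "pump"))
print(search(records, q1))
print(search(records, q2))
```

Building queries from composable conditions keeps each single-classification test simple while allowing arbitrary nesting of AND, OR, and NOT.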

As described above, according to the present embodiment, a predetermined grammatical function classification can be identified for a constituent by applying simple processing to the parsing result. In addition, according to the present embodiment, taking the grammatical function classification into account makes it possible to extract the intended sentences with higher precision than a simple keyword search that ignores the grammatical function classification.

Note that the natural language processing device 1 of the present embodiment can also process natural languages other than English, such as Japanese. In that case, the natural language processing device 1 may include a component that performs morphological analysis. The content of the documents 41 is not limited; for example, in the field of product design and development, documents such as customer requests for proposal, instructions, specifications, domestic laws and regulations, foreign laws and regulations, and international standards can be processed as the documents 41. In this case, when equipment names, part names, material names, and their attributes are used as keywords, using grammatical function classifications such as subject and object enables more detailed searches and is expected to improve search accuracy. Also, since measurement conditions and the like are often written at the beginning, it may be effective, for example, to run a search that excludes words matching a keyword written at the beginning. Further, the grammatical function classification extraction unit 3 may extract (identify) grammatical function classifications by combining the first and second operation examples described above.

FIG. 13 is a schematic block diagram showing the configuration of a computer according to the above-described embodiment.
The computer 9 includes a CPU 91, a main storage device 92, an auxiliary storage device 93, and an interface 94.
The natural language processing device 1 described above includes the computer 9. The operation of each processing unit described above is stored in the auxiliary storage device 93 in the form of a program. The CPU 91 reads the program from the auxiliary storage device 93, loads it into the main storage device 92, and executes the above processing according to the program. For example, the CPU 91 may function as the syntax analysis unit 2, the grammatical function classification extraction unit 3, and the search unit 5 (search processing unit 51) described above.
Further, the CPU 91 may, according to the program, allocate a storage area corresponding to the storage unit 4 described above in the main storage device 92 or the auxiliary storage device 93.
Some or all of the programs and data constituting the natural language processing device 1, the syntax analysis unit 2, the grammatical function classification extraction unit 3, the search unit 5, the grammatical function classification extraction result 43, and the like can be distributed via a computer-readable recording medium or a communication line. When the natural language processing device 1 is composed of a plurality of computers, the computers may be distributed across a network.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes design changes and the like that do not depart from the gist of the present invention.

1 Natural language processing device
2 Syntax analysis unit
3 Grammatical function classification extraction unit (extraction unit)
4 Storage unit
5 Search unit
51 Search processing unit

Claims

A natural language processing device that identifies a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the natural language processing device comprising:
a syntax analysis unit that parses an input sentence sentence by sentence and generates a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents; and
an extraction unit that extracts, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result, wherein
the predetermined grammatical function classification has divisions that distinguish at least whether a constituent whose constituent type is "subordinate clause" or "prepositional phrase" is located at the beginning of the sentence or after the predicate of the sentence,
the extraction unit extracts, from the plurality of constituents, the constituents belonging to the predetermined grammatical function classification when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb",
the constituent type of a fourth constituent three layers above the first constituent is "sentence", and
no constituent exists four layers above the first constituent,
the extraction unit extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications.

The natural language processing device according to claim 1, wherein the extraction unit
identifies, based on the parsing result, a second constituent of a predetermined constituent type reached by tracing from a first constituent of a predetermined constituent type, and
extracts a third constituent having a predetermined positional relationship with the second constituent as a constituent belonging to one of the grammatical function classifications.

The natural language processing device according to claim 2, wherein the extraction unit
traces downward from the first constituent, whose constituent type is "sentence", to the second constituent, whose constituent type is "verb phrase", and
extracts the third constituent, whose constituent type is "verb" and which lies under the second constituent, as a constituent belonging to "predicate", which is one of the grammatical function classifications.

The natural language processing device according to claim 3, wherein the extraction unit
traces upward from the constituent extracted as having the grammatical function classification "predicate" and identifies the constituent of constituent type "verb phrase" in the highest layer, and
extracts the constituent of constituent type "noun phrase" that lies in the same layer as the identified "verb phrase" constituent as a constituent belonging to "subject", which is one of the grammatical function classifications.

The natural language processing device according to claim 3 or 4, wherein the extraction unit traces downward through the constituents of constituent type "verb phrase" connected to the constituent extracted as having the grammatical function classification "predicate", and
extracts the constituent of constituent type "noun phrase" in the lowest layer as a constituent belonging to "object or complement", which is one of the grammatical function classifications.

A search device that performs a search using a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the search device comprising:
a syntax analysis unit that parses an input sentence sentence by sentence and generates a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents;
an extraction unit that extracts, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result; and
a search processing unit that searches the grammatical function classification results extracted by the extraction unit, as the search target, for the sentences containing a constituent corresponding to a specified grammatical function classification and keyword, wherein
the predetermined grammatical function classification has divisions that distinguish at least whether a constituent whose constituent type is "subordinate clause" or "prepositional phrase" is located at the beginning of the sentence or after the predicate of the sentence,
the extraction unit extracts, from the plurality of constituents, the constituents belonging to the predetermined grammatical function classification when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb",
the constituent type of a fourth constituent three layers above the first constituent is "sentence", and
no constituent exists four layers above the first constituent,
the extraction unit extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications.

A natural language processing method for identifying a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the method comprising:
a step of parsing an input sentence sentence by sentence and generating a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents; and
a step of extracting, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result, wherein
the predetermined grammatical function classification has divisions that distinguish at least whether a constituent whose constituent type is "subordinate clause" or "prepositional phrase" is located at the beginning of the sentence or after the predicate of the sentence,
in the extracting step, the constituents belonging to the predetermined grammatical function classification are extracted from the plurality of constituents when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb",
the constituent type of a fourth constituent three layers above the first constituent is "sentence", and
no constituent exists four layers above the first constituent,
in the extracting step, the first constituent is extracted as a constituent belonging to "predicate", which is one of the grammatical function classifications.

A search method for performing a search using a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the method comprising:
a step of parsing an input sentence sentence by sentence and generating a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents;
a step of extracting, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result; and
a step of searching the extracted grammatical function classification results, as the search target, for the sentences containing a constituent corresponding to a specified grammatical function classification and keyword, wherein
the predetermined grammatical function classification has divisions that distinguish at least whether a constituent whose constituent type is "subordinate clause" or "prepositional phrase" is located at the beginning of the sentence or after the predicate of the sentence,
in the extracting step, the constituents belonging to the predetermined grammatical function classification are extracted from the plurality of constituents when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb",
the constituent type of a fourth constituent three layers above the first constituent is "sentence", and
no constituent exists four layers above the first constituent,
in the extracting step, the first constituent is extracted as a constituent belonging to "predicate", which is one of the grammatical function classifications.

A program for causing a computer to execute a natural language processing method for identifying a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the program causing the computer to execute:
a step of parsing an input sentence sentence by sentence and generating a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents; and
a step of extracting, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result, wherein
the predetermined grammatical function classification has divisions that distinguish at least whether a constituent whose constituent type is "subordinate clause" or "prepositional phrase" is located at the beginning of the sentence or after the predicate of the sentence,
in the extracting step, the constituents belonging to the predetermined grammatical function classification are extracted from the plurality of constituents when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb",
the constituent type of a fourth constituent three layers above the first constituent is "sentence", and
no constituent exists four layers above the first constituent,
in the extracting step, the first constituent is extracted as a constituent belonging to "predicate", which is one of the grammatical function classifications.

A natural language processing device that identifies a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the natural language processing device comprising:
a syntax analysis unit that parses an input sentence sentence by sentence and generates a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents; and
an extraction unit that extracts, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result, wherein
the extraction unit extracts, from the plurality of constituents, the constituents belonging to the predetermined grammatical function classification when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb",
the constituent type of a fourth constituent three layers above the first constituent is "sentence", and
no constituent exists four layers above the first constituent,
the extraction unit extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications.

A natural language processing device that identifies a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the natural language processing device comprising:
a syntax analysis unit that parses an input sentence sentence by sentence and generates a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents; and
an extraction unit that extracts, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result, wherein
the extraction unit extracts, from the plurality of constituents, the constituents belonging to the predetermined grammatical function classification when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when the input sentence contains no auxiliary verb and they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent two layers above the first constituent is "sentence", and
no constituent exists three layers above the first constituent,
the extraction unit extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications.

A natural language processing device that identifies a grammatical function classification, which indicates the classification of the grammatical function of a constituent consisting of a group of one or more words that forms a constituent unit of a sentence, the natural language processing device comprising:
a syntax analysis unit that parses an input sentence sentence by sentence and generates a parsing result representing the sentence by a plurality of the constituents, the constituent type of each constituent, and the hierarchical positional relationships between the constituents; and
an extraction unit that extracts, from the plurality of constituents, the constituents belonging to a predetermined grammatical function classification based on the constituent type of each constituent and the hierarchical positional relationships in the parsing result, wherein
the extraction unit extracts, from the plurality of constituents, the constituents belonging to the predetermined grammatical function classification when the constituent type of each constituent and the hierarchical positional relationships between the constituents match a predetermined pattern, and
when the input sentence is a compound sentence and they match the pattern in which
the constituent type of a second constituent one layer above a first constituent is "verb phrase",
the constituent type of a third constituent three layers above the first constituent is "sentence",
the constituent type of a constituent that precedes the second constituent in the same layer and is connected in the upper layer to the same constituent as the second constituent is "auxiliary verb", and
the constituent type of the constituent four layers above the first constituent is "sentence",
the extraction unit extracts the first constituent as a constituent belonging to "predicate", which is one of the grammatical function classifications.