JPH077418B2 - Natural language context processor

Info

Publication number: JPH077418B2
Application number: JP1052824A
Authority: JP
Inventors: 輝彦浮田; 顕司小野; 真家天野
Original assignee: Director-General of the Agency of Industrial Science and Technology (工業技術院長)
Priority date: 1989-03-07
Filing date: 1989-03-07
Publication date: 1995-01-30
Anticipated expiration: 2010-01-30
Also published as: JPH02289059A

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a context processing device that can effectively determine the context structure of a text in natural language processing.

(Prior Art) Research on natural language processing has advanced on many fronts in recent years, and examining the context structure of a text and analyzing its meaning has come into use as a means of extracting the content of a text, for example for summarization. Conventionally, this kind of context structure has been determined exclusively by examining the nouns and verbs that appear in the text and tracing how they connect, with reference to a dictionary that registers information such as the dependency (modification) conditions and part-of-speech connection relations of those words. In other words, the nouns and verbs appearing in the text serve as the clues from which the context structure is derived.

However, it is practically impossible to register in a dictionary, in advance, information about every noun, verb, and other word that may appear in a text. Consequently, when a word that is not registered in the dictionary appears in a text, determining the context structure becomes very difficult. Moreover, even when such dictionary information is available, determining the context structure requires an enormous amount of knowledge, and the processing load is very heavy.

Attempts have also been made to obtain the context structure by taking connective expressions, such as conjunctions appearing in the sentences, as clues and consulting connection rules to find which sequences of connective relations between sentences are permitted. However, not every sentence of a text contains a connective expression such as a conjunction, so even when connective expressions are used, several problems remain in determining the context structure accurately.

(Problems to be Solved by the Invention) As described above, conventional approaches merely examine the context structure using the nouns and verbs that appear in the text as clues, and even when connective expressions are used as clues, many sentences contain no expression indicating a connective relation, so it is very difficult to determine the context structure uniquely. Furthermore, analyzing the context structure by delving into the semantic content of the sentences requires an enormous amount of knowledge, as well as advanced knowledge processing such as inference.

The present invention has been made in view of these circumstances, and its object is to provide a highly practical natural-language context processing device that can determine the context structure of a text well even when information about individual nouns and verbs is insufficient.

[Configuration of the Invention] (Means for Solving the Problems) The natural-language context processing device according to the present invention is a context processing device that analyzes a natural-language text and extracts the structure of the text as a whole, and is characterized by comprising: sentence analysis means for performing morphological analysis on each sentence constituting an input text; connective-relation extraction means for detecting rhetorical expressions in the sentences using the morphological analysis results obtained by the sentence analysis means, and obtaining a connective-relation sequence in which each sentence containing a detected rhetorical expression is joined to the sentence before it by the connective relation corresponding to that expression; rhetorical-relation determination means for obtaining the candidate rhetorical structures that the sentences, including the connective-relation sequence obtained by the connective-relation extraction means, can take, and excluding from them the candidates that are inappropriate as text structures; sentence-relevance determination means for extracting from the morphological analysis results, using particles as the condition, the words that serve as topic-presenting expressions in each sentence, checking whether each such word appears in a sentence preceding the sentence from which it was extracted, and, if so, associating the extracted sentence with that preceding sentence and outputting the association as inter-sentence relevance information; and context-structure creation means for obtaining, for each rhetorical-structure candidate remaining after the rhetorical-relation determination means has excluded the inappropriate candidates, connection-structure candidates in which each extracted sentence in which no rhetorical expression was detected is joined, in accordance with the inter-sentence relevance information, to the sentence before it, and for excluding from these connection-structure candidates those in which the structure between the extracted sentence and the preceding sentence is inappropriate.

(Operation) According to the present invention, attention is paid to the rhetorical expressions in a text: the connection structure of the connective relations between text segments is examined from these expressions to determine the rhetorical structure. In addition, even for sentences in which no rhetorical expression is explicitly present, the topic-presenting expressions in the sentence are used to examine how the sentences are linked. The context structure can therefore be determined effectively even when information about nouns and verbs is too scarce to analyze sentence structure fully, and even for sentences that lack any rhetorical expression indicating their connection to other sentences. Moreover, because the context structure is examined by exploiting both the rhetorical structure between text segments and the topic expressions, appropriate context processing can be performed and the accuracy of extracting the structure of the whole text can be improved.

(Embodiment) A context processing device according to an embodiment of the present invention will now be described with reference to the drawings.

FIG. 1 is a schematic diagram of the main part of the device of the embodiment. Reference numeral 1 denotes a text input unit for inputting a text to be analyzed, such as document data or recorded data. A natural-language text input via the text input unit 1 is supplied to a sentence analysis unit 2, where it undergoes format-structure analysis and morphological analysis. A format-structure analysis section 2a of the sentence analysis unit 2 analyzes the formal structure of the whole text from formatting information, such as chapters and sections, contained in the text supplied by the text input unit 1, and determines that structure. A morphological analysis section 2b of the sentence analysis unit 2 then performs morphological analysis, for example paragraph by paragraph, on the sentences of the text whose formal structure has been determined by the format-structure analysis section 2a. This morphological analysis yields the individual morphemes, such as nouns and verbs, of the text.

The processing functions of the sentence analysis unit 2 described above are the same as those of conventional context processing devices and can be realized with various previously proposed techniques.

The distinguishing feature of the present device is that, in addition to a sentence-relevance determination unit 5 described later, it has a connective-relation extraction unit 3 and a rhetorical-relation determination unit 4, which use the format-structure and morphological analysis results obtained by the sentence analysis unit 2 to find the connective relations between the text segments (for example, paragraphs) constituting the text, taking the rhetorical expressions in the text as clues.

The connective-relation extraction unit 3 examines the rhetorical expressions in the text by referring to a relation-name table storage unit 6, which stores a relation-name table such as the one shown in FIG. 2, listing rhetorical expressions such as conjunctions together with the connective relations they indicate, and obtains the connective relations between the paragraphs (or other segments) joined by those expressions. For example, when the rhetorical expression 『但し』 ("however", "provided that") is found, the unit obtains a connective relation indicating that the paragraph preceding the expression and the paragraph bearing it stand in a "supplement" (補足) relation. The connective-relation extraction unit 3 finds, one after another, all the rhetorical expressions appearing in the text and obtains individually the connective relations between the text segments that they indicate.
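The table lookup performed by unit 3 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the table holds only the three expression/relation pairs named in this description (the full FIG. 2 table is not reproduced), sentences are assumed to arrive as lists of morphemes from section 2b, and the name `extract_relations` is hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of a relation-name table (cf. FIG. 2). Only these three
# expression/relation pairs are mentioned in the description.
RELATION_TABLE: Dict[str, str] = {
    "但し": "supplement",      # 補足
    "この様な": "sequential",  # 順接
    "また": "contrast",        # 対比
}


def extract_relations(sentences: List[List[str]]) -> List[Optional[str]]:
    """Return, for sentences 2..n, the relation joining each one to its predecessor.

    Each sentence is a list of morphemes. None means no rhetorical expression
    was found, so the relation of that link is still unknown.
    """
    relations: List[Optional[str]] = []
    for morphemes in sentences[1:]:
        relation = next(
            (RELATION_TABLE[m] for m in morphemes if m in RELATION_TABLE), None
        )
        relations.append(relation)
    return relations


# Five sentences shaped like the FIG. 4 example (placeholder morphemes).
text = [["s1"], ["s2"], ["この様な", "s3"], ["s4"], ["また", "s5"]]
print(extract_relations(text))  # [None, 'sequential', None, 'contrast']
```

A None entry marks a link whose relation cannot be read off any rhetorical expression; these are the links that the topic-based processing of unit 5 has to resolve.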

The rhetorical-relation determination unit 4 refers to a prohibition-rule storage unit 7, which stores, as relation tables such as those shown in FIG. 3, prohibition rules on how the connective relations indicated by rhetorical expressions may be combined, and obtains the rhetorical structure within and between paragraphs from the sequence of connective relations between text segments obtained by the connective-relation extraction unit 3.

The relation tables stored in the prohibition-rule storage unit 7 register information on structural connection-prohibition rules for the connective relations indicated by rhetorical expressions, as shown in FIG. 3(a), and information on negative connection rules for those relations, as shown in FIG. 3(b); together they express, as rules, the structures between connective relations that are inappropriate as the rhetorical structure of a text. In particular, the connection-prohibition rules of FIG. 3(a) describe combination structures that never actually occur among the connective relations between paragraphs joined by rhetorical expressions, while the negative connection rules of FIG. 3(b) describe combination structures that, although not excluded as strictly as those covered by the prohibition rules, are seldom used in general.

The rhetorical-relation determination unit 4 refers to the prohibition-rule storage unit 7 holding this information and determines the rhetorical structure of the text by building combination structures from the sequence of connective relations between paragraphs that the connective-relation extraction unit 3 obtained from the rhetorical expressions, while eliminating combinations that are inappropriate as the rhetorical structure of the text.

If the rhetorical structure of the text is not determined uniquely, for example when several rhetorical-structure candidates are obtained, the decision is left to the context-structure creation unit 8 that follows.
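The two-tier use of the FIG. 3 tables by unit 4 might look like the sketch below. It assumes that prohibition rules (FIG. 3(a)) exclude a candidate outright while negative rules (FIG. 3(b)) only count against it; the description does not say how negative rules are applied, so the penalty ordering, the plain-string candidates, and the name `screen_candidates` are all assumptions.

```python
from typing import Callable, List, Sequence

# A rule is a predicate over a candidate rhetorical structure; candidates are
# plain strings here only to keep the sketch small.
Rule = Callable[[str], bool]


def screen_candidates(candidates: Sequence[str],
                      prohibited: Sequence[Rule],
                      negative: Sequence[Rule]) -> List[str]:
    """Remove candidates hit by any prohibition rule and order the rest by the
    number of negative rules they hit (fewest first)."""
    allowed = [c for c in candidates if not any(rule(c) for rule in prohibited)]
    # sorted() is stable, so equally penalised candidates keep their input order.
    return sorted(allowed, key=lambda c: sum(rule(c) for rule in negative))


survivors = screen_candidates(
    ["structure-A", "structure-B", "structure-C"],
    prohibited=[lambda c: c.endswith("B")],
    negative=[lambda c: c.endswith("A")],
)
print(survivors)  # ['structure-C', 'structure-A']
```

Because more than one candidate survives here, a device following the description would not force a choice at this stage but would pass both on to the context-structure creation unit 8.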

The sentence-relevance determination unit 5, on the other hand, examines the words appearing in the text according to the analysis results obtained by the sentence analysis unit 2, and determines the relevance between sentences from the relationships among the words appearing in each sentence. That is, the sentence-relevance determination unit 5 examines, for each sentence of the input text, the words that appear in it, and its topic-expression extraction section 5a in particular looks for words serving as topic-presenting expressions. Such words are extracted on conditions such as the particle attached to the word being 「は」 (wa) or 「も」 (mo). Specifically, when the expression 『Ｘは』 ("as for X") occurs, the word "X" is extracted as a topic-presenting word; when the expression 『ＸのＹは』 ("as for the Y of X") occurs, both "X" and "Y" are extracted as topic-presenting words.

Not every word extracted in this way actually presents a topic. Some of the words extracted with the particle as a clue, for example in 『最近では』 ("recently"), 『第Ｘ図は』 ("FIG. X shows"), or 『〜の場合には』 ("in the case of ..."), cannot be called topic-presenting expressions and give no clue to the connective relations between sentences. Such words can be defined in advance as unnecessary words and stored in an unnecessary-word table storage section 5b, which is consulted to exclude them from the topic-presenting words.
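A minimal sketch of topic-expression extraction by section 5a, under stated assumptions: the input is a list of (surface, part-of-speech) pairs with hypothetical tag names "noun", "pronoun" and "particle"; a word counts as topic-presenting when it is followed by は or も, optionally with one intervening で or に; 『ＸのＹは』 yields both X and Y; and the hypothetical `STOP_WORDS` set plays the role of table 5b (最近 and 場合 come from the examples above, 以下 is an added illustration).

```python
from typing import List, Set, Tuple

Morpheme = Tuple[str, str]  # (surface form, part of speech); tag names are assumptions

TOPIC_PARTICLES = {"は", "も"}  # particles that mark a topic-presenting word
CASE_PARTICLES = {"で", "に"}   # lets "Xでは" / "Xには" reach X
# Hypothetical unnecessary-word table (cf. storage section 5b).
STOP_WORDS: Set[str] = {"最近", "場合", "以下"}


def topic_words(morphemes: List[Morpheme],
                stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Extract topic-presenting words from one morphologically analysed sentence."""
    found: List[str] = []
    for i, (surface, pos) in enumerate(morphemes):
        if pos != "particle" or surface not in TOPIC_PARTICLES:
            continue
        j = i - 1
        # Step back over one case particle ("では", "には").
        if j >= 0 and morphemes[j][1] == "particle" and morphemes[j][0] in CASE_PARTICLES:
            j -= 1
        if j < 0 or morphemes[j][1] not in ("noun", "pronoun"):
            continue
        words = [morphemes[j][0]]
        # "XのYは": take X as well as Y.
        if j >= 2 and morphemes[j - 1] == ("の", "particle") and morphemes[j - 2][1] == "noun":
            words.insert(0, morphemes[j - 2][0])
        found.extend(w for w in words if w not in stop_words)
    return found


# "以下では図面を参考にする。" analysed into morphemes.
s = [("以下", "noun"), ("で", "particle"), ("は", "particle"), ("図面", "noun"),
     ("を", "particle"), ("参考", "noun"), ("に", "particle"), ("する", "verb")]
print(topic_words(s))         # []
print(topic_words(s, set()))  # ['以下']
```

Without the stop-word table, 以下 would be picked up purely because of the particle, which is exactly the kind of false hit that table 5b is meant to remove.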

The word-repetition extraction section 5c of the sentence-relevance determination unit 5 basically checks whether the same word is used repeatedly across several sentences. In particular, when a topic-presenting word has been found in a sentence, it checks whether that word is also used in other sentences and, if so, obtains information identifying those sentences (sentence management numbers).
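Section 5c's bookkeeping can be sketched as an index from each topic word to the other sentences that use it. The words of each sentence are assumed to be available as sets, sentences are numbered from 1 to stand in for sentence management numbers, and the data are toy values shaped like the FIG. 4 example ("x" and "y" are placeholders), not the actual text.

```python
from typing import Dict, List, Set


def repetitions(words: List[Set[str]],
                topics: Dict[int, List[str]]) -> Dict[int, Dict[str, List[int]]]:
    """For every topic-presenting word of every sentence, list the numbers
    (1-based) of the other sentences in which the same word occurs."""
    table: Dict[int, Dict[str, List[int]]] = {}
    for sent_no, topic_list in topics.items():
        table[sent_no] = {
            w: [k for k, ws in enumerate(words, start=1) if k != sent_no and w in ws]
            for w in topic_list
        }
    return table


words = [{"規則合成方式", "素片"}, {"素片"}, {"x"}, {"素片"}, {"y"}]
topics = {1: ["規則合成方式"], 2: ["素片"], 4: ["素片"]}
print(repetitions(words, topics))
# {1: {'規則合成方式': []}, 2: {'素片': [1, 4]}, 4: {'素片': [1, 2]}}
```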

A sentence-relevance determination processing section 5d determines the relevance between the sentences of the input text from the topic-presenting words obtained by the topic-expression extraction section 5a and the word-occurrence information obtained by the word-repetition extraction section 5c. Specifically, when section 5a extracts a topic-presenting word from a sentence, section 5d looks up the occurrence information that section 5c obtained for that word, finds the sentences preceding the one from which the word was extracted in which the same word appears, and associates those sentences with the sentence from which the topic-presenting expression was extracted. When the word extracted as a topic-presenting expression is a demonstrative or a pronoun, section 5d instead searches the text preceding the sentence containing that demonstrative or pronoun for a sentence that has a topic-presenting word, and associates the sentence containing the demonstrative or pronoun with the text from that sentence onward.
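The association rules of section 5d might be sketched as below. Ordinary topic words link a sentence to every earlier sentence containing the same word; for a demonstrative or pronoun the sketch follows one reading of the description, linking the sentence to the nearest earlier sentence that has an ordinary topic word and to every sentence after it. The `DEMONSTRATIVES` set and the name `relate` are hypothetical, and the data are toy values shaped like the FIG. 4 example.

```python
from typing import Dict, List, Set

# Hypothetical list of demonstratives/pronouns that can head a topic expression.
DEMONSTRATIVES: Set[str] = {"これ", "それ", "あれ", "彼", "彼女"}


def relate(words: List[Set[str]], topics: Dict[int, List[str]]) -> Dict[int, List[int]]:
    """Map each sentence number to the earlier sentences it is associated with."""
    links: Dict[int, List[int]] = {}
    for sent_no in sorted(topics):
        related: Set[int] = set()
        for w in topics[sent_no]:
            if w in DEMONSTRATIVES:
                # Nearest earlier sentence with an ordinary topic word, then every
                # sentence from there up to the current one.
                anchors = [k for k in topics if k < sent_no
                           and any(t not in DEMONSTRATIVES for t in topics[k])]
                if anchors:
                    related.update(range(max(anchors), sent_no))
            else:
                # Every earlier sentence in which the same word occurs.
                related.update(k for k in range(1, sent_no) if w in words[k - 1])
        if related:
            links[sent_no] = sorted(related)
    return links


words = [{"規則合成方式", "素片"}, {"素片"}, {"x"}, {"素片"}, {"y"}]
topics = {1: ["規則合成方式"], 2: ["素片"], 4: ["素片"]}
print(relate(words, topics))  # {2: [1], 4: [1, 2]}
```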

Through this determination process, the relations among the sentences of the input text are examined using the words appearing in each sentence as clues.

The context-structure creation unit 8 examines the text structure and performs context analysis on the basis of the inter-sentence relevance information obtained by the sentence-relevance determination unit 5 and the rhetorical structure of the input text obtained by the connective-relation extraction unit 3 and the rhetorical-relation determination unit 4, while referring to knowledge of text structure stored in a knowledge storage unit 9. The context-structure information of the input text obtained in this way is supplied to an application unit 10 and used, for example, in processing that creates summaries.

More specifically, the context-structure creation unit 8 uses the knowledge stored in the knowledge storage unit 9, such as information on the field to which the content of the text belongs and knowledge about the connective relations and dependency relations of the parts of speech appearing in the text, and determines the context structure either by following the rhetorical structure of the text described above or by itself determining the rhetorical structure of the text. The context structure is thus basically determined in the same way as in a conventional context processing device, except that here it is determined according to the rhetorical structure analyzed from the rhetorical expressions and the inter-sentence relevance information obtained from the topic-presenting words.

This completes the description of the configuration and functions of the present device.

First, the process of determining the rhetorical structure from the rhetorical expressions will be explained in more detail with reference to FIGS. 4 and 5.

FIG. 4 shows an example of a text to be subjected to context processing. When such a text is input, the format-structure analysis section 2a divides it into segments such as sentences and paragraphs on the basis of formatting information such as indentation and punctuation (commas, periods, and so on), and obtains its formal structure. Here the input text of FIG. 4 is divided into several sentences, namely sentences ①, ②, ..., ⑤. The morphological analysis section 2b then performs morphological analysis on each of these sentences.
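A minimal sketch of the splitting step in section 2a, assuming plain text in which paragraphs are separated by a blank line or begin with an indent (ASCII or ideographic space U+3000) and sentences end with 。, ．, ！ or ？. The real section 2a also uses richer formatting information such as chapters and sections; the regular expressions and the name `split_text` are illustrative only.

```python
import re
from typing import List

SENTENCE = re.compile(r"[^。．！？!?]+[。．！？!?]?")
# Paragraph break: a blank line, or a line break followed by an indent.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n|\n(?=[ \u3000])")


def split_text(text: str) -> List[List[str]]:
    """Split a text into paragraphs, each a list of sentences."""
    result: List[List[str]] = []
    for para in PARAGRAPH_BREAK.split(text.strip()):
        sentences = [s.strip() for s in SENTENCE.findall(para) if s.strip()]
        if sentences:
            result.append(sentences)
    return result


print(split_text("一文目。二文目。\n\n\u3000三文目。"))
# [['一文目。', '二文目。'], ['三文目。']]
```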

For this morphological analysis, the method described in, for example, Makoto Nagao (supervising ed.), "Japanese Information Processing" (日本語情報処理), the Institute of Electronics and Communication Engineers of Japan (1984), can be used. With such a method for analyzing Japanese sentences, the sentence 「以下では図面を参考にする。」 ("Below, the drawings are referred to.") yields the morphological analysis result 以下 (noun) + で (particle) + は (particle) + 図面 (noun) + を (particle) + 参考 (noun) + に (particle) + する (verb).

The connective-relation extraction unit 3 then, on the basis of the morphological analysis result obtained for each sentence of the input text, refers to the relation-name table storage unit 6 to find the rhetorical expressions in each sentence, and for each rhetorical expression detected obtains from the storage unit 6 the connective relation that the expression signifies.

In the example of FIG. 4, as shown by the shading in the figure, the rhetorical expression 『この様な』 ("such") is found in sentence ③ and its connective relation is found to be "sequential" (順接); likewise, the rhetorical expression 『また』 ("also", "in addition") and its connective relation "contrast" (対比) are found in sentence ⑤.

From these connective relations, the connective-relation extraction unit 3 determines that the sentences ①, ②, ..., ⑤ of the text in FIG. 4 are linked in turn by the connective relations [① − ② sequential ③ − ④ contrast ⑤], as shown in A of FIG. 5, where "−" marks a link whose relation is not yet known.

However, such a connective-relation sequence merely indicates the individual connective relations between pairs of sentences; it does not show how these relations combine into a rhetorical structure.

Considering the combination structures that this connective-relation sequence can take as a rhetorical structure, there are theoretically 14 possible combinations when five such segments are present, as shown in B of FIG. 5. With two sentences the combination structure can take only one form, with three sentences two forms, and with four sentences five forms; the number of forms that the combination structure can take is thus determined theoretically by the number of sentences.
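The counts 1, 2, 5 and 14 are the Catalan numbers: n sentences joined by n − 1 links can be bracketed in C(n−1) = (2n−2)! / (n!(n−1)!) binary ways. The enumeration below sketches how the candidates of B of FIG. 5 could be generated; the tuple representation and the names `bracketings` and `render` are assumptions, and "-" stands for a link whose relation is unknown.

```python
from typing import List, Tuple, Union

# A combination structure: a sentence number, or (left, relation, right).
Tree = Union[int, Tuple["Tree", str, "Tree"]]


def bracketings(relations: Tuple[str, ...], start: int = 1) -> List[Tree]:
    """All binary combination structures over sentences start..start+len(relations);
    relations[k] is the link between sentence start+k and sentence start+k+1."""
    if not relations:
        return [start]
    trees: List[Tree] = []
    for k in range(len(relations)):  # relations[k] becomes the outermost link
        for left in bracketings(relations[:k], start):
            for right in bracketings(relations[k + 1:], start + k + 1):
                trees.append((left, relations[k], right))
    return trees


def render(tree: Tree) -> str:
    """Write a structure in the bracket notation used in the text."""
    if isinstance(tree, int):
        return str(tree)
    left, rel, right = tree
    return f"[{render(left)} {rel} {render(right)}]"


for n in range(2, 6):
    print(n, len(bracketings(("-",) * (n - 1))))  # 2 1 / 3 2 / 4 5 / 5 14
```

With the FIG. 5 A sequence ("-", "sequential", "-", "contrast"), one of the 14 rendered structures is "[1 - [[2 sequential 3] - [4 contrast 5]]]", the example discussed next.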

The combination structure taken by a connective-relation sequence indicates where, and by what relation, each sentence of the text is attached. For example, the rhetorical structure (combination structure) [① − [[② sequential ③] − [④ contrast ⑤]]] indicates that the first sentence ① bears some relation to the whole of the second to fifth sentences. Within sentences ② to ⑤, the second sentence ② and the third sentence ③ stand in a "sequential" relation, and the fourth sentence ④ and the fifth sentence ⑤ stand in a "contrast" relation; furthermore, the pair ②, ③ in the "sequential" relation is connected in some way to the pair ④, ⑤ in the "contrast" relation.

Such rhetorical-structure expressions show the connection structure among several sentences. However, the combination structures shown in B merely enumerate the possible structural forms of the combinations, and some of them are inappropriate as rhetorical structures of a text. The rhetorical-relation determination unit 4 therefore refers to the prohibition-rule storage unit 7 described above, finds the combination structures that involve inappropriate rhetorical relations, and excludes them from the rhetorical-structure candidates.

For example, according to the connection-prohibition rule "sequential [ … [ X sequential" (X: a sentence) in the relation table of FIG. 3(a), any combination structure that contains a structure of the form "sequential [ … [ X sequential" is judged inappropriate as a rhetorical structure and excluded from the candidates.
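One way to check a rule of the form "outer [ … [ X inner" is sketched below. In a linear sequence of links, the innermost bracket holding the leftmost sentence of a compound operand is joined by the relation that immediately follows `outer`, so under this reading the FIG. 3(a) rule quoted above forbids right-nesting two consecutive sequential links. This interpretation, the tuple representation and the function names are assumptions, not taken from the patent.

```python
from typing import Optional, Tuple, Union

# A combination structure: a sentence number, or (left, relation, right).
Tree = Union[int, Tuple["Tree", str, "Tree"]]


def first_relation(tree: Tuple["Tree", str, "Tree"]) -> str:
    """Relation of the innermost bracket that holds the leftmost sentence."""
    left, rel, _ = tree
    return rel if isinstance(left, int) else first_relation(left)


def violates(tree: Tree, outer: str, inner: Optional[str] = None) -> bool:
    """True if the structure contains 'outer [ … [X inner': a node with relation
    `outer` whose right operand is a compound whose first relation is `inner`
    (inner=None matches any relation)."""
    if isinstance(tree, int):
        return False
    left, rel, right = tree
    if rel == outer and not isinstance(right, int):
        if inner is None or first_relation(right) == inner:
            return True
    return violates(left, outer, inner) or violates(right, outer, inner)


left_nested = ((1, "sequential", 2), "sequential", 3)
right_nested = (1, "sequential", (2, "sequential", 3))
print(violates(left_nested, "sequential", "sequential"))   # False
print(violates(right_nested, "sequential", "sequential"))  # True
```

In the FIG. 4 sequence there is only one sequential link, so under this reading the rule removes none of the 14 structures, which fits the observation that follows that the rhetorical expressions alone cannot narrow the candidates down.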

As the rhetorical-structure candidates of FIG. 5 make clear, however, in the example input text of FIG. 4 no rhetorical expression can be found in sentences ② and ④, so their connective relations remain unknown. The rhetorical-structure candidates obtained as above are therefore not sufficient on their own to narrow the candidates down to the appropriate rhetorical structure, and this narrowing is ultimately left to the context-structure creation unit 8.

The present device therefore uses the inter-sentence relevance information obtained by the sentence-relevance determination unit 5 to obtain the context structure of the input text, as follows.

As described above, the sentence-relevance determination unit 5 extracts the topic-presenting words in each sentence and the repeated occurrences of the same words. From the input text of FIG. 4, as underlined in the figure, it extracts the topic-presenting expression 「規則合成方式は」 ("the synthesis-by-rule method") from sentence ①, the topic-presenting expression 「素片は」 ("the segments") from sentence ②, and the topic-presenting expression 「素片は」 from sentence ④. For extracting topic-presenting words, topic-presenting expressions such as those described in 永野賢 (Nagano), "文章論総説［文法論的考察］" (An Outline of Text Theory: A Grammatical Study), Asakura Shoten (1986), may be extracted from the input text.

The word-repetition extraction section 5c then checks, for each topic-presenting word extracted from each sentence in this way, whether the word also appears repeatedly in other sentences, and obtains information such as that shown in FIG. 6. This information indicates, for each sentence, its topic-presenting words and their repeated occurrences. Specifically, it shows that sentence ① of the text has the topic-presenting expression 『規則合成方式』 (synthesis-by-rule method); that sentence ② has the topic-presenting expression 『素片』 (segment), a word that also appears in sentence ①; and that sentence ④ likewise has the topic-presenting expression 『素片』, a word that appears in sentences ① and ②.

The sentence relevance determination processing unit 5d then checks, for each topic-presentation word obtained above, whether the same word appears in a preceding sentence. If it does, the unit associates that preceding sentence with the sentence from which the topic-presentation expression was extracted. For example, for the topic-presentation word "segment" (素片) obtained from one sentence, it checks whether the same word appears in the sentence that precedes it. From the information of FIG. 6 it finds that the word does appear there, and concludes that the two sentences stand in a "supplement" relation. In the same way, the topic-presentation word "segment" of a later sentence occurs in two preceding sentences, so that later sentence is found to stand in a "supplement" relation to both of them.
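The linking rule of unit 5d can be sketched as follows. Each sentence that has a topic-presentation word is linked to every earlier sentence that contains that word. The function name, its inputs (plain sentence strings plus a precomputed index-to-topic map) and the fixed "supplement" label are illustrative assumptions that follow the worked example, not a specification from the patent.

```python
from typing import Dict, List, Tuple


def relate_sentences(texts: List[str], topics: Dict[int, str]) -> List[Tuple[int, int, str]]:
    """Link each topic-bearing sentence to every preceding sentence containing its word.

    texts:  the sentences of the text, in order.
    topics: sentence index -> its topic-presentation word.
    Following the worked example, every link is labelled "supplement":
    the later sentence supplements the earlier one.
    """
    links = []
    for i in sorted(topics):
        for k in range(i):  # only sentences *before* sentence i are considered
            if topics[i] in texts[k]:
                links.append((k, i, "supplement"))
    return links


texts = [
    "規則合成方式は素片を接続して音声を作る。",
    "素片は音声の単位である。",
    "素片の選び方が重要である。",
    "素片は短いほど扱いやすい。",
]
topics = {0: "規則合成方式", 1: "素片", 3: "素片"}
print(relate_sentences(texts, topics))
# [(0, 1, 'supplement'), (0, 3, 'supplement'), (1, 3, 'supplement'), (2, 3, 'supplement')]
```

Looking only backwards matches the description: a sentence is related to what it follows, never to what comes after it.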

By referring to this information on the relations between sentences, the context structure creation unit 8 determines that the sentences making up the input text are connected in sequence by the relations [supplement, sequential (順接), supplement, contrast], as shown in C of FIG. 7.

Once this sequence of connection relations has been obtained, the context structure creation unit 8 derives the connection structure candidates shown in D of FIG. 7. If the five sentences are numbered ① to ⑤ in order, these candidates are the possible bracketings of "① supplement ② sequential ③ supplement ④ contrast ⑤". According to the knowledge storage unit 9, the "supplement" relation means that the later sentence supplements the preceding one. The context structure creation unit 8 therefore knows that a connection structure of the form "... [X supplement [Y r ..." (X, Y: sentences; r: another relation) is not allowed. Applying this knowledge about prohibited connections, it judges that candidate 1) in D of FIG. 7, [① supplement [② sequential [③ supplement [④ contrast ⑤]]]], is inappropriate as a sentence structure and removes it from the candidates. This check narrows the candidates down to the three shown in E of FIG. 7:
[[① supplement ②] sequential [[③ supplement ④] contrast ⑤]]
[[[① supplement ②] sequential [③ supplement ④]] contrast ⑤]
[[[[① supplement ②] sequential ③] supplement ④] contrast ⑤]
Looking further at these candidates, the sentences that ④ supplements are the preceding sentences in which the topic-presentation word discussed above appears, and ④ does not supplement any sentence before them. Hence no other connection structure may come between ④ and the sentences it supplements. Candidates such as [... supplement ... [... supplement ...], in which a "[" opens between them, are therefore excluded. In the end, only the sentence connection structure [[[[① supplement ②] sequential ③] supplement ④] contrast ⑤], shown in F of FIG. 7, is obtained as the context structure of the input text.
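The narrowing from D to F in FIG. 7 can be reproduced by enumerating every binary bracketing of the relation sequence and then applying two filters. This is a sketch under stated assumptions. The prohibition "... [X supplement [Y r ..." is modelled as "a supplement node whose right-hand side is not a single sentence", which leaves exactly the three candidates listed for E. The no-intervening-structure rule is modelled as a link (a, b): the node attaching sentence b must reach back to sentence a. The link (2, 4) used below is our reading of the lost sentence numbers, not something the text states.

```python
from typing import Iterator, List, Tuple, Union

# A leaf is a sentence number; a node is (left, relation, right).
Tree = Union[int, Tuple["Tree", str, "Tree"]]


def bracketings(lo: int, hi: int, rels: Tuple[str, ...]) -> List[Tree]:
    """All binary bracketings of sentences lo..hi (1-based, inclusive).

    rels[k - 1] is the relation between sentence k and sentence k + 1.
    """
    if lo == hi:
        return [lo]
    out: List[Tree] = []
    for k in range(lo, hi):  # top-level split between sentence k and k + 1
        for left in bracketings(lo, k, rels):
            for right in bracketings(k + 1, hi, rels):
                out.append((left, rels[k - 1], right))
    return out


def nodes(t: Tree) -> Iterator[Tuple[Tree, str, Tree]]:
    """Yield every internal node of the tree."""
    if not isinstance(t, int):
        yield t
        yield from nodes(t[0])
        yield from nodes(t[2])


def first(t: Tree) -> int:
    """Number of the first sentence covered by a subtree."""
    return t if isinstance(t, int) else first(t[0])


def violates_supplement_rule(t: Tree) -> bool:
    # Prohibited pattern "... [X supplement [Y r ...": a supplement node whose
    # right-hand side is a compound structure rather than one sentence.
    return any(rel == "supplement" and not isinstance(right, int)
               for _, rel, right in nodes(t))


def respects_links(t: Tree, links: List[Tuple[int, int]]) -> bool:
    # For a link (a, b), the node that attaches sentence b must have a left
    # part reaching back to sentence a, i.e. no "[" opens between a and b.
    return all(first(left) <= a
               for a, b in links
               for left, _, right in nodes(t) if first(right) == b)


def show(t: Tree) -> str:
    if isinstance(t, int):
        return str(t)
    left, rel, right = t
    return f"[{show(left)} {rel} {show(right)}]"


rels = ("supplement", "sequential", "supplement", "contrast")
candidates = bracketings(1, 5, rels)                       # 14 bracketings
step_e = [t for t in candidates if not violates_supplement_rule(t)]
step_f = [t for t in step_e if respects_links(t, [(2, 4)])]
for t in step_f:
    print(show(t))  # [[[[1 supplement 2] sequential 3] supplement 4] contrast 5]
```

Enumerating all bracketings is exponential in the number of sentences (the Catalan numbers). That is acceptable for a short passage, but a longer text would call for pruning during generation.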

In this way, the context structure creation unit 8 combines two sequences of connection relations: the one obtained by the connection relation extraction unit 3 and the one obtained by the sentence relevance determination unit 5. From them it derives sentence connection structure candidates such as those in FIG. 7. Among all the rhetorical-structure combinations the sequence could theoretically take, it detects and removes those that are inappropriate as rhetorical structures by consulting the prohibition rule storage unit 7 and the knowledge storage unit 9, thereby narrowing down the rhetorical-structure candidates. If this narrowing yields a unique connection structure, that structure is adopted. If several candidates remain, the choice is left to a selection process in the application unit 10. The final context structure is then determined by taking into account the rhetorical structure and connection structure obtained in this way, together with the structure of the text content and so on.
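The final decision step (adopt a unique survivor, otherwise defer) can be sketched as a small function. The `select` callback is a hypothetical stand-in for the selection process of the application unit 10. Its interface is an assumption, since the text does not describe how that unit chooses.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def decide(candidates: Sequence[T],
           select: Optional[Callable[[Sequence[T]], T]] = None) -> T:
    """Adopt a unique surviving candidate, otherwise defer to `select`.

    `select` is a hypothetical stand-in for the application unit 10.
    """
    if not candidates:
        raise ValueError("every candidate was excluded")
    if len(candidates) == 1:
        return candidates[0]          # structure is uniquely determined
    if select is None:
        raise ValueError(f"{len(candidates)} candidates remain; a selector is required")
    return select(candidates)         # ambiguity is left to the application


print(decide(["[[[[1 supplement 2] sequential 3] supplement 4] contrast 5]"]))
```

Raising an error when nothing survives, rather than returning a default, makes an over-strict rule set visible instead of silently producing a structure.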

With the device configured as above, the analysis of a text is not limited to its nouns and verbs. The rhetorical structure indicated by the rhetorical expressions appearing in the text is also analyzed, and the context structure of the text is determined with that rhetorical structure taken into account. Consequently, even when information about a noun or verb in the text is not registered in the dictionary, appropriate context processing can be performed without being affected by the unregistered word. Moreover, because context processing uses the rhetorical structure analyzed from the rhetorical expressions, rhetorical structures such as premise and conclusion, or judgment and illustration, can be captured accurately. Language processing can therefore be carried out with an accurate grasp of the context.

Even when some sentences contain no rhetorical expression, the connection relations between sentences are determined using clues such as the words of the topic-presentation expressions that appear in them. The sentence connection structure can therefore still be analyzed effectively, and the candidate connection structures narrowed down, so that highly accurate context processing is possible.

The present invention is not limited to the embodiment described above. Format structure analysis, morphological analysis and the like may be carried out with any of the various algorithms proposed to date, and the information registered in the relation name table and the prohibition rule table is not limited to the examples given above.

In the embodiment, relations between sentences were found by focusing on the words of topic-presentation expressions. They can also be found by focusing on pronouns and demonstratives such as 「これらの」 ("these") or 「このxxは」 ("this xx ..."). In that case, one examines the sentences preceding the sentence from which the pronoun or demonstrative was extracted and finds one that contains a topic-presentation expression. The sentence containing the pronoun or demonstrative is then judged to be related to one of the sentences from that sentence onward. It is also possible to dispense with topic-presentation expressions altogether: one examines how often words recur across sentences, and judges that a sentence sharing many repeated occurrences of the same words with the sentence being processed is highly related to it.
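The frequency-based alternative in the last sentence can be sketched as follows. Each preceding sentence is scored by how many word occurrences it shares with the sentence being processed, and the highest-scoring one is chosen. The scoring rule (the size of a multiset intersection via `collections.Counter`), the tie-break toward the nearest sentence, and the input format (lists of content-word lemmas) are all illustrative assumptions. The text does not specify them.

```python
from collections import Counter
from typing import FrozenSet, List, Optional


def most_related(sentences: List[List[str]], i: int,
                 stop_words: FrozenSet[str] = frozenset()) -> Optional[int]:
    """Index of the preceding sentence sharing the most repeated words with sentence i.

    Score = size of the multiset intersection of the two word lists.
    Ties go to the nearest preceding sentence; returns None if nothing is shared.
    """
    target = Counter(w for w in sentences[i] if w not in stop_words)
    best, best_score = None, 0
    for k in range(i - 1, -1, -1):  # scan backwards so ties favour the nearest sentence
        shared = target & Counter(w for w in sentences[k] if w not in stop_words)
        score = sum(shared.values())
        if score > best_score:
            best, best_score = k, score
    return best


sents = [
    ["規則", "合成", "方式", "音声"],
    ["素片", "単位", "音声"],
    ["素片", "接続", "素片"],
    ["素片", "長さ", "素片"],
]
print(most_related(sents, 3), most_related(sents, 1), most_related(sents, 0))  # 2 0 None
```

Counting shared occurrences rather than distinct shared words rewards sentences in which the same word recurs, which is the cue the text describes.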

Although Japanese-language processing has been described here, the invention applies equally to context processing for other languages such as English. The present invention may also be modified in various other ways without departing from its gist.

[Effects of the Invention] As described above, the present invention obtains the rhetorical structure of a text by focusing on the rhetorical expressions that appear in it. It also obtains the relations between sentences by focusing on words that serve as topic-presentation expressions, and it analyzes the context structure of the text using both kinds of information. The context structure can therefore be determined effectively and accurately even when the text contains unregistered words, or when no rhetorical (connective) expression is clearly detected in a sentence. This is of great practical benefit.

[Brief description of drawings]

The drawings show a context processing device according to an embodiment of the present invention. FIG. 1 is a schematic configuration diagram of the main part of the embodiment; FIG. 2 shows an example configuration of the relation name table; FIG. 3 shows an example configuration of the prohibition rule table; FIG. 4 shows an example of an input text subjected to context processing; FIG. 5 schematically shows the rhetorical structure analysis of the text; FIG. 6 shows the topic-presentation expressions obtained by the sentence relevance determination unit and information on their repeated occurrences; and FIG. 7 schematically shows connection structure analysis using sentence relevance information.

1 ... text input unit, 2 ... sentence analysis unit, 2a ... format structure analysis unit, 2b ... morphological analysis unit, 3 ... connection relation extraction unit, 4 ... rhetorical relation determination unit, 5 ... sentence relevance determination unit, 5a ... topic expression extraction unit, 5b ... unnecessary-word table storage unit, 5c ... word repetition extraction unit, 5d ... sentence relevance determination processing unit, 6 ... relation name table storage unit, 7 ... prohibition rule storage unit, 8 ... context structure creation unit, 9 ... knowledge storage unit, 10 ... application unit.

Claims


What is claimed is: 1. A natural language context processing device that analyzes a natural-language text to extract the structure of the text as a whole, comprising:
sentence analysis means for performing morphological analysis on each sentence constituting an input text;
connection relation extraction means for detecting rhetorical expressions in the sentences using the morphological analysis result of each sentence obtained by the sentence analysis means, and for obtaining a connection relation sequence in which each sentence where a rhetorical expression was detected is connected to the sentence preceding it by the connection relation corresponding to that rhetorical expression;
rhetorical relation determination means for finding the possible rhetorical-structure candidates for the sentences, including the connection relation sequence obtained by the connection relation extraction means, and for excluding from these candidates those that are inappropriate as a sentence structure;
sentence relevance determination means for extracting, from the morphological analysis result of each sentence obtained by the sentence analysis means and using a particle as a condition, a word serving as a topic-presentation expression in the sentence; for checking whether that word appears in a sentence preceding the sentence from which it was extracted; and, when it does, for associating the extracted sentence with the preceding sentence and outputting the association as information on the relation between sentences; and
context structure creation means for finding, for each rhetorical-structure candidate that remains after the rhetorical relation determination means has excluded the inappropriate candidates, connection structure candidates in which an extracted sentence where no rhetorical expression was detected is connected to the preceding sentence according to the information on the relation between sentences, and for excluding from these connection structure candidates those in which the inter-sentence structure between the extracted sentence and the preceding sentence is inappropriate.