JP5488249B2 - Program and information processing apparatus

Info

Publication number: JP5488249B2
Application number: JP2010142670A
Authority: JP
Inventors: 大悟杉原; 宏梅基; 智子大熊; 博増市; 昌嗣外池
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2010-06-23
Filing date: 2010-06-23
Publication date: 2014-05-14
Anticipated expiration: 2030-06-23
Also published as: JP2012008701A

Description

The present invention relates to a program and an information processing apparatus.

In the field of natural language processing, techniques are known for identifying synonymous expressions, which have the same meaning as a given expression, and paraphrase expressions, which restate a given expression in other words.

For example, the technique described in Non-Patent Document 1 builds, from the dependency relations among the words in a sentence, a bipartite graph connecting each word to its modifiers or to the words it modifies, and uses this graph to identify words with similar dependency relations as paraphrases of each other.

Patent Document 1 discloses a technique for identifying synonymous or paraphrase expressions of a given expression according to preset rules. It gives the example of a rule that treats two expressions as paraphrases of each other when a bracket symbol (「」, （）, and so on) lies between them and one of the expressions is enclosed in the brackets.

Non-Patent Document 2 discloses a technique in which rules for extracting relations such as synonymy are defined in terms of syntactic patterns of words in a sentence, and words appearing in a pattern defined by such a rule are identified as words having the corresponding relation.

Patent Document 1: JP 2006-293731 A

Non-Patent Document 1: Kazuhide Yamamoto, "Acquisition of lexical paraphrase knowledge from text", The 8th Annual Meeting of the Association for Natural Language Processing, pp. 639-642, March 2002
Non-Patent Document 2: Marti A. Hearst, "Automatic Acquisition of Hyponyms from Large Text Corpora", Proceedings of 14th International Conference of Computational Linguistics, 1992

In expressions that evaluate some object, however, phrases with different meanings may appear in sentences with the same or similar syntax, and the same word may appear before or after phrases with different meanings. Consequently, if candidates for expressions highly related to an evaluative expression are identified by referring to the syntax of the sentence containing an expression or to the words before and after it, expressions with different meanings may be identified as candidates for highly related expressions.

An object of the present invention is to provide a program and an information processing apparatus that identify candidates for expressions highly related to an evaluative expression more accurately than techniques that identify such candidates by referring to the syntax of the sentence containing an expression or to the words before and after it.

The invention according to claim 1 is a program for causing a computer to execute: a specifying step of referring to evaluation expression storage means, which stores evaluation expressions, each including a word representing an evaluation item for evaluating an evaluation object and a word representing an evaluation value for that evaluation item, in association with a polarity indicating whether or not the evaluation expression is a positive expression, extracting from a character string to be processed the evaluation expressions contained in it, and specifying the polarity of each extracted evaluation expression; a classification step of classifying the extracted evaluation expressions into one or more groups based on whether or not the evaluation object and the polarity are common among the extracted evaluation expressions; and an output step of, when a plurality of evaluation expressions have been classified into the same group in the classification step, associating the plurality of evaluation expressions with one another and outputting them to related expression storage means.

The invention according to claim 2 is the invention according to claim 1, wherein, in the classification step, when the extracted evaluation expressions are arranged in their order of appearance in the character string to be processed, consecutive evaluation expressions that share the evaluation object and the polarity are classified into the same group.

The invention according to claim 3 is the invention according to claim 1 or 2, wherein, in the output step, evaluation expressions among the plurality of evaluation expressions whose words representing the evaluation item belong to a common semantic class are not associated with one another.

The invention according to claim 4 is the invention according to any one of claims 1 to 3, wherein the program further causes the computer to execute an extraction step of extracting from the character string to be processed, for each of the plurality of evaluation expressions to be output in the output step, a character string containing that evaluation expression; the character string extracted in the extraction step includes a phrase that, in the character string to be processed, has a dependency relation with the word representing the evaluation item or the word representing the evaluation value of the evaluation expression; and, in the output step, the character strings extracted for the respective evaluation expressions are associated with one another and output to the related expression storage means.

The invention according to claim 5 is the invention according to claim 4, wherein, in the output step, character strings among those extracted for the respective evaluation expressions that contain words belonging to a common semantic class are not associated with one another.

The invention according to claim 6 is the invention according to any one of claims 1 to 5, wherein the program further causes the computer to execute a second specifying step of: referring to reference value information storage means, which stores each word representing an evaluation item whose evaluation value can be expressed numerically, among the evaluation items of the evaluation expressions stored in the evaluation expression storage means, in association with a reference value for the evaluation value of that evaluation item; extracting from the character string to be processed a word representing an evaluation item stored in the reference value information storage means and a numerical value corresponding to the evaluation value of that evaluation item; and specifying, in the evaluation expression storage means, the evaluation expression corresponding to the combination of the evaluation item and the numerical value, together with its polarity, based on the result of comparing the extracted numerical value with the reference value associated with the extracted evaluation item; and wherein the evaluation expression specified in the second specifying step is also subjected to the classification step.

The invention according to claim 7 is an information processing apparatus comprising: specifying means for referring to evaluation expression storage means, which stores evaluation expressions, each including a word representing an evaluation item for evaluating an evaluation object and a word representing an evaluation value for that evaluation item, in association with a polarity indicating whether or not the evaluation expression is a positive expression, extracting from a character string to be processed the evaluation expressions contained in it, and specifying the polarity of each extracted evaluation expression; classification means for classifying the extracted evaluation expressions into one or more groups based on whether or not the evaluation object and the polarity are common among the evaluation expressions extracted by the specifying means; and output means for, when a plurality of evaluation expressions have been classified into the same group by the classification means, associating the plurality of evaluation expressions with one another and outputting them to related expression storage means.

According to the invention of claim 1 or 7, candidates for expressions highly related to an evaluative expression can be identified more accurately than with techniques that identify such candidates by referring to the syntax of the sentence containing an expression or to the words before and after it.

According to the invention of claim 2, evaluation expressions that appear consecutively in the character string to be processed and express a positive evaluation, or a negative evaluation, of the same evaluation object can be associated with one another and output.

According to the invention of claim 3, evaluation expressions whose words representing the evaluation item belong to a common semantic class can be kept from being associated with one another.

According to the invention of claim 4, for each of a plurality of mutually associated evaluation expressions, a character string containing the evaluation expression and a phrase that has a dependency relation with the word representing its evaluation item or the word representing its evaluation value can be extracted from the character string to be processed, and the extracted character strings can be associated with one another and output.

According to the invention of claim 5, character strings containing words that belong to a common semantic class can be kept from being associated with one another.

According to the invention of claim 6, candidates for highly related expressions can be identified for evaluation expressions containing an evaluation item whose evaluation is expressed numerically.

A block diagram showing an example outline of the internal configuration of the information processing apparatus.
A diagram showing an example of the data contents of the evaluation expression dictionary.
A diagram showing an example of the text to be processed.
A diagram for explaining an example of grouping the evaluation expressions extracted from the text to be processed.
A diagram showing an example of the data contents of the related expression storage unit.
A flowchart showing an example of the procedure of the processing performed by the information processing apparatus.
A diagram for explaining another example of grouping the evaluation expressions extracted from the text to be processed.
A diagram showing another example of the text to be processed.
A diagram showing an example of reference value information.
A block diagram showing an example of the hardware configuration of a computer.

In the example embodiment of the present invention, the text to be processed is analyzed to identify candidates for paraphrases of expressions that evaluate some evaluation object. Here, a "paraphrase expression" of a given expression is an expression that can replace the given expression because it is related to it in some way. For example, it may be an expression with the same meaning as the given expression or one with a similar meaning. In the example of this embodiment, one expression may also be identified as a paraphrase candidate for another even when, depending on a person's knowledge and way of thinking, the two would not necessarily be regarded as interchangeable. For example, when rewording an evaluation from a specialized field can help a person without knowledge of that field understand the evaluation, expressions that an expert in the field would not necessarily regard as interchangeable may still be treated as paraphrase candidates. In the example embodiment, therefore, a paraphrase candidate for a given word is understood as an expression that is highly related to that word. Hereinafter, an expression that is a paraphrase candidate for a given expression, that is, an expression highly related to the given expression, is called a "related expression".

FIG. 1 is a block diagram outlining the internal configuration of an information processing apparatus according to an example embodiment of the present invention. The information processing apparatus 10 includes a reference data storage unit 110, a corpus analysis unit 120, an evaluation expression extraction unit 130, an evaluation expression classification unit 140, a related expression generation unit 150, a semantic class determination unit 160, an output processing unit 170, and a related expression storage unit 180.

The reference data storage unit 110 stores the various data used in analyzing the text to be processed. It includes an analysis dictionary 112, an evaluation expression dictionary 114, and a semantic dictionary 116.

The analysis dictionary 112 stores words in association with information such as each word's grammatical role, and also stores the grammar rules of the language in which the text to be processed is written (Japanese in this example). In addition to general words, technical terms used in various specialized fields may be registered in the analysis dictionary 112. For example, when text from a particular specialized field is to be processed, the technical terms of that field are registered in the analysis dictionary 112. In the example of this embodiment, the information processing apparatus 10 processes text from the medical field, and medical terms are registered in the analysis dictionary 112 together with general words.

The evaluation expression dictionary 114 stores evaluation expressions, each of which expresses an evaluation of some evaluation object, in association with a polarity indicating whether the evaluation is positive or negative. In the example of this embodiment, the evaluation expression dictionary 114 stores evaluation expressions from the medical field and their polarities. FIG. 2 shows an example of the data contents of the evaluation expression dictionary 114.

FIG. 2 shows examples of evaluation expressions that evaluate a change in the state of an evaluation object. In the example of FIG. 2, the evaluation expression dictionary 114 registers a polarity in association with each evaluation expression, which consists of an "object", an "attribute", and an "evaluation value". The "object" of an evaluation expression is the evaluation object of that expression; in all the evaluation expressions illustrated in FIG. 2, the evaluation object is "liver". The "attribute" is a word representing an evaluation item for evaluating the evaluation object; FIG. 2 shows evaluation expressions whose attributes are "liver function", "GOP", and "GPT". The "evaluation value" is a word indicating how good or bad the evaluation of the corresponding attribute is; FIG. 2 shows evaluation expressions whose evaluation values are "improvement" (改善), "deterioration" (悪化), "decline" (低下), "decrease" (減少), "increase" (増加), and "rise" (上昇). All the evaluation values illustrated in FIG. 2 are words representing a change in the state of the evaluation object. The "polarity" associated with each evaluation expression indicates whether the expression is positive or negative: in the table of FIG. 2, the value "positive" means that the evaluation expression is positive, and the value "negative" means that it is negative.
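As an illustration of the data layout described for FIG. 2, the following is a minimal Python sketch of a mapping from (object, attribute, evaluation value) triples to a polarity. The names `EvaluationExpression`, `EVALUATION_DICTIONARY`, and `lookup_polarity` are hypothetical and do not appear in the patent. Only the four entries whose polarity is stated in the worked example of FIG. 3 and FIG. 4 are included, since the full contents of FIG. 2 are not reproduced in the text.

```python
from typing import Dict, NamedTuple, Optional


class EvaluationExpression(NamedTuple):
    """One (object, attribute, evaluation value) triple (hypothetical type)."""
    eval_object: str   # evaluation object, e.g. "liver"
    attribute: str     # word for the evaluation item, e.g. "GPT"
    value: str         # word for the evaluation value, e.g. "rise"


# Assumed contents: only the entries whose polarity the description states.
EVALUATION_DICTIONARY: Dict[EvaluationExpression, str] = {
    EvaluationExpression("liver", "liver function", "deterioration"): "negative",
    EvaluationExpression("liver", "GOP", "increase"): "negative",
    EvaluationExpression("liver", "GPT", "rise"): "negative",
    EvaluationExpression("liver", "liver function", "improvement"): "positive",
}


def lookup_polarity(expr: EvaluationExpression) -> Optional[str]:
    """Return the registered polarity, or None if the expression is not registered."""
    return EVALUATION_DICTIONARY.get(expr)
```

Keying the mapping on the whole triple rather than on the evaluation value alone reflects that the same value word can carry a different polarity for a different attribute.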

The data contents of the evaluation expression dictionary 114 are generated and registered in advance, before the elements of the information processing apparatus 10 described below start processing. For example, evaluation expressions, each including an evaluation object, an attribute, and an evaluation value, are extracted from a number of texts in the field related to the text to be processed (the medical field in this example), a polarity is determined for each, and each evaluation expression is registered in the evaluation expression dictionary 114 in association with its polarity. The extraction of evaluation expressions and the determination of polarity may use known techniques, such as those described in JP 2005-235014 A and in Reference 1 (Nozomi Kobayashi, Kentaro Inui, Yuji Matsumoto, Kenji Tateishi, and Shunichi Fukushima, "Collecting Evaluation Expressions for Opinion Extraction", Natural Language Processing, 12(2), 2005).

Referring again to FIG. 1, the semantic dictionary 116 stores information about the meanings of general words and technical terms. In the example of this embodiment, the semantic dictionary 116 includes a thesaurus of general words and technical terms. A thesaurus is a dictionary that classifies and organizes words by conceptual relations such as hypernym/hyponym relations, part/whole relations, synonymy, and near-synonymy, and it has a hierarchical structure that follows this classification. The semantic dictionary 116 in this example further includes a synonym dictionary that lists the synonyms of each word.

Returning to FIG. 1, the corpus analysis unit 120 refers to the analysis dictionary 112 and performs morphological analysis and syntactic analysis on the text to be processed. These analyses may use methods already known in natural language processing. The analysis by the corpus analysis unit 120 is preprocessing for the processing performed by the units described below.

The text to be processed is obtained, for example, from a database (not shown) that stores documents. In the example of this embodiment, the text is obtained from documents in a database that stores documents written by doctors, nurses, staff of medical institutions, and the like (for example, academic papers and reports on patients' conditions). The text to be processed may be all the text in every document in the database, or all the text in the documents that the user selects from the database. Alternatively, it may be the text in one or more parts that the user designates within one or more documents in the database.

The evaluation expression extraction unit 130 refers to the evaluation expression dictionary 114 and extracts evaluation expressions from the text analyzed by the corpus analysis unit 120. The extraction is described below using, as the text to be processed, the passage shown in FIG. 3: "Deterioration of liver function, predominantly GPT, was observed. Increase in GOP, marked rise in GPT. Fluid replacement and other measures were carried out, and a prompt improvement of liver function was observed. ..." In this example, the data contents illustrated in FIG. 2 are assumed to be registered in the evaluation expression dictionary 114.

The evaluation expression extraction unit 130 locates, in the text to be processed, the words representing attributes and the words representing evaluation values that are contained in the evaluation expressions registered in the evaluation expression dictionary 114. In FIG. 3, the words enclosed in broken-line boxes are examples of the attribute and evaluation value words located by the evaluation expression extraction unit 130. The unit then identifies the attribute that pairs with each located evaluation value. The attribute corresponding to an evaluation value may be identified from the dependency relations in the text to be processed, or by a machine learning method such as the one described in Reference 2 (Ryu Iida, Nozomi Kobayashi, Kentaro Inui, Yuji Matsumoto, Kenji Tateishi, and Shunichi Fukushima, "Attribute-Evaluation Value Pair Identification by Machine Learning for Opinion Extraction", IPSJ SIG on Natural Language Processing, 2005-NL-165). The method of Reference 2 divides the problem of extracting from the text an evaluation expression represented by a triple <object, attribute, evaluation value> (called an "opinion" in that reference) into (1) the problem of identifying <attribute, evaluation value> pairs and (2) the problem of deciding whether an identified pair is an opinion, that is, whether it satisfies the conditions defining what kind of statement counts as an opinion. Each problem is analyzed with a method based on machine learning to extract the evaluation expressions. In FIG. 3, each paired attribute and evaluation value are connected by a broken-line arrow. The evaluation expression extraction unit 130 extracts the pairs of attribute word and evaluation value word identified in the text to be processed. In the example of FIG. 3, the extracted (attribute, evaluation value) pairs are (liver function, deterioration), (GOP, increase), (GPT, rise), and (liver function, improvement).
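The pairing step can be sketched as follows. This is a deliberately naive stand-in, not the method of the patent: the description pairs words by dependency relations or by machine learning, whereas this sketch pairs each evaluation value word with the nearest preceding attribute word in the same sentence. The token lists are hand-made English glosses given in the word order of the Japanese original (attribute before evaluation value), and the names `ATTRIBUTE_WORDS`, `VALUE_WORDS`, and `extract_pairs` are hypothetical.

```python
from typing import List, Optional, Tuple

# Attribute and evaluation value words named in the description of FIG. 2.
ATTRIBUTE_WORDS = {"liver function", "GOP", "GPT"}
VALUE_WORDS = {"improvement", "deterioration", "decline",
               "decrease", "increase", "rise"}


def extract_pairs(sentences: List[List[str]]) -> List[Tuple[str, str]]:
    """Pair each value word with the nearest preceding attribute word
    in the same sentence (a simplifying assumption)."""
    pairs: List[Tuple[str, str]] = []
    for tokens in sentences:
        last_attribute: Optional[str] = None  # reset at each sentence boundary
        for token in tokens:
            if token in ATTRIBUTE_WORDS:
                last_attribute = token
            elif token in VALUE_WORDS and last_attribute is not None:
                pairs.append((last_attribute, token))
    return pairs


# Hand-tokenized glosses of the FIG. 3 text, in Japanese word order (assumed).
SENTENCES = [
    ["GPT", "dominant", "liver function", "deterioration", "observed"],
    ["GOP", "increase", "GPT", "marked", "rise"],
    ["fluid replacement", "prompt", "liver function", "improvement", "observed"],
]

pairs = extract_pairs(SENTENCES)
```

On this input the heuristic yields the same four pairs as the description, but it would fail on sentences where the attribute follows the value or where an unrelated attribute intervenes, which is why the description relies on dependency relations or a learned model.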

In this example based on FIG. 3, the evaluation expression extraction unit 130 obtains the evaluation object of the evaluation expressions separately from the text to be processed. For example, the unit obtains from the database the document containing the text to be processed, analyzes it, identifies a character string that represents the subject of the document, and extracts from that string a word representing an evaluation object stored in the evaluation expression dictionary 114. The character string representing the subject of the document may be, for example, the name of the document. When the document is a report on a patient's condition, the string stating the name of the patient's disease may be used as the string representing the subject. Instead of a string representing the subject of the document, the unit may identify the string corresponding to the title of the part of the document (chapter, section, item, and so on) that contains the text to be processed, and extract the word representing the evaluation object from that string. In these examples, information specifying which character strings in a document the evaluation object word is to be extracted from, and where in the document those strings can be found, is set in advance and stored in the reference data storage unit 110; the evaluation expression extraction unit 130 refers to this information when analyzing the document and extracting the evaluation object. In another example, the unit may accept from the user a designation of the evaluation object together with the designation of the text to be processed. The evaluation expression extraction unit 130 associates the evaluation object obtained in one of these ways with each attribute and evaluation value pair extracted from the text to be processed, forming evaluation expressions that are (object, attribute, evaluation value) triples. In the example of FIG. 3, the evaluation object "liver" is obtained, giving the evaluation expressions (liver, liver function, deterioration), (liver, GOP, increase), (liver, GPT, rise), and (liver, liver function, improvement).
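A rough sketch of this step, assuming the subject string is already available and that the evaluation object is found by simple substring matching against the object words known to the dictionary. The function names, the `KNOWN_OBJECTS` list, and the document title used here are hypothetical illustrations, not details given in the patent.

```python
from typing import List, Optional, Sequence, Tuple

# Evaluation object words assumed to be registered in the dictionary.
KNOWN_OBJECTS = ["liver"]


def find_evaluation_object(subject_text: str,
                           known_objects: Sequence[str]) -> Optional[str]:
    """Return the first listed object word that occurs in the subject string."""
    for word in known_objects:
        if word in subject_text:
            return word
    return None


def to_triples(eval_object: str,
               pairs: Sequence[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Attach the evaluation object to each (attribute, evaluation value) pair."""
    return [(eval_object, attribute, value) for attribute, value in pairs]


title = "Progress report: liver disorder"  # hypothetical document title
pairs = [
    ("liver function", "deterioration"),
    ("GOP", "increase"),
    ("GPT", "rise"),
    ("liver function", "improvement"),
]

eval_object = find_evaluation_object(title, KNOWN_OBJECTS)
triples = to_triples(eval_object, pairs) if eval_object is not None else []
```

When no known object word occurs in the subject string, the sketch produces no triples; the description's alternative of asking the user for the evaluation object would cover that case.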

Further, for each extracted evaluation expression, the evaluation expression extraction unit 130 acquires the polarity value registered in the evaluation expression dictionary 114 in association with that evaluation expression. FIG. 4 illustrates the evaluation expressions (target, attribute, evaluation value) extracted from the text to be processed in the example of FIG. 3, together with the polarity value acquired for each. The polarity value shown for each evaluation expression in the table of FIG. 4 is the polarity value associated with that evaluation expression in the evaluation expression dictionary 114 illustrated in FIG. 2.

Returning to the description of FIG. 1, the evaluation expression classification unit 140 classifies the extracted evaluation expressions into one or more groups based on whether the word representing the attribute and the polarity are common among them. Evaluation expressions classified into the same group are candidates for related expressions of one another.

A specific example of classification by the evaluation expression classification unit 140 is described with reference to FIG. 4. The table of FIG. 4 lists the evaluation expressions extracted from the text to be processed in the example of FIG. 3, with their polarities, from the top row to the bottom row in the order in which they appear in that text. In this example, the evaluation expression classification unit 140 places in the same group those evaluation expressions that appear consecutively in the text to be processed and share both evaluation target and polarity. When evaluation expressions of the same polarity are written consecutively about the same evaluation target, the writer's attitude toward that target can be regarded as consistent. Such evaluation expressions may therefore be paraphrases of a similar evaluation of the same target. For this reason, the evaluation expression classification unit 140 classifies them into the same group as candidates for related expressions. In FIG. 4, the first three evaluation expressions, which consecutively have the evaluation target "liver" and the polarity "negative", are classified into group 1. The fourth evaluation expression has the same evaluation target "liver" as group 1 but a different polarity, "positive", and is therefore classified into group 2.
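The grouping just described can be sketched as a run-length grouping over the expressions in order of appearance. This is only an illustrative sketch, not the patented implementation: the `Evaluation` type and the `group_consecutive` function are hypothetical names, and the English words stand in for the Japanese terms of FIG. 4.

```python
from itertools import groupby
from typing import List, NamedTuple


class Evaluation(NamedTuple):
    """Hypothetical record for one evaluation expression plus its polarity."""
    target: str
    attribute: str
    value: str
    polarity: str  # "positive" or "negative"


def group_consecutive(expressions: List[Evaluation]) -> List[List[Evaluation]]:
    """Split the sequence into runs that share evaluation target and polarity.

    groupby only merges adjacent items, which matches the requirement that
    the expressions appear consecutively in the text.
    """
    return [
        list(run)
        for _, run in groupby(expressions, key=lambda e: (e.target, e.polarity))
    ]


# Evaluation expressions of FIG. 4, in order of appearance.
fig4 = [
    Evaluation("liver", "liver function", "deterioration", "negative"),
    Evaluation("liver", "GOP", "increase", "negative"),
    Evaluation("liver", "GPT", "rise", "negative"),
    Evaluation("liver", "liver function", "improvement", "positive"),
]
groups = group_consecutive(fig4)
```

With this input, the first three expressions form one group and the fourth forms a group of its own, mirroring groups 1 and 2 in the text.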

Referring again to FIG. 1, the related expression generation unit 150 generates sets of related expressions using the result of classification by the evaluation expression classification unit 140. For example, for each group that contains a plurality of evaluation expressions, the related expression generation unit 150 extracts from the text to be processed character strings that contain each of the evaluation expressions in the group. The character string extracted for an evaluation expression includes phrases that, in the text to be processed, have a dependency relationship with the word representing the attribute or the word representing the evaluation value. For example, for the evaluation expression (liver, liver function, deterioration) in group 1 of FIG. 4, character strings containing the attribute "liver function" and the evaluation value "deterioration", such as "deterioration of liver function" and "GPT-predominant deterioration of liver function", are extracted from the text to be processed (FIG. 3). For the evaluation expression (liver, GOP, increase), the character string "increase in GOP" is extracted; for the evaluation expression (liver, GPT, rise), the character strings "rise in GPT" and "marked rise in GPT" are extracted. Alternatively, of the character strings that contain the attribute and evaluation value of an evaluation expression, the related expression generation unit 150 may extract only the smallest unit that is syntactically valid as a phrase or sentence (the character string obtained from the shortest syntactic path). In that case, the character strings "deterioration of liver function", "increase in GOP", and "rise in GPT" are extracted from the text to be processed.

The related expression generation unit 150 takes the character strings extracted for the evaluation expressions in a group two at a time and makes each pair a set of related expressions. However, character strings whose evaluation expressions have attribute words judged to belong to a common semantic class are not made a set of related expressions. Likewise, among the character strings extracted from the text for the evaluation expressions, character strings containing words judged to belong to a common semantic class are not paired. Here, a semantic class is a set of words grouped under a common semantic primitive in a thesaurus or similar dictionary, and a semantic primitive is the basic unit of meaning used when a computer handles meaning. For example, if the semantic primitive is "animal", the semantic class is a set of words such as "dog", "horse", and "monkey". The related expression generation unit 150 asks the semantic class determination unit 160 to determine whether any of the attribute words of the evaluation expressions in a group are in a common semantic class, and whether any of the character strings extracted for the evaluation expressions contain words that are in a common semantic class.

On request from the related expression generation unit 150, the semantic class determination unit 160 determines which words are in a common semantic class. When a plurality of words denoting different matters appear in evaluation expressions that evaluate a given evaluation target, these words can be regarded as having a common semantic class. For example, in the example above referring to FIGS. 3 and 4, the attributes "GOP" and "GPT" denote different test items. In this example, therefore, "GOP" and "GPT" are determined to be words in a common semantic class. The semantic class determination unit 160 of this embodiment refers to the semantic dictionary 116 to determine whether words are in a common semantic class. More specifically, words that have the same semantic class in the thesaurus included in the semantic dictionary 116 and that are not registered as synonyms in the synonym dictionary included in the semantic dictionary 116 are determined to be in a common semantic class. Whether a plurality of words have the same semantic class may be determined according to the concept hierarchy of the thesaurus. For example, the semantic classes may be determined to be the same when, in the thesaurus hierarchy, the concepts (entries) corresponding to the words have the same immediately superordinate concept. In other words, in the tree structure corresponding to the thesaurus hierarchy, the semantic classes may be determined to be the same when the nodes corresponding to the words have the same parent node. In the case of "GOP" and "GPT" above, both words are immediate subordinates of the same concept (semantic class) "test value" in the thesaurus hierarchy and are not synonyms, so they are determined to be in a common semantic class.
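The determination rule (same parent node in the thesaurus, and not registered as synonyms) might be sketched as follows. The dictionary layout and the function name `in_common_semantic_class` are assumptions made for illustration. Only the "test value" parent of "GOP" and "GPT" comes from the text; the "organ function" parent and the "ALT" synonym entry are hypothetical sample data.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical thesaurus: each word maps to its immediate parent concept.
PARENT: Dict[str, str] = {
    "GOP": "test value",
    "GPT": "test value",
    "ALT": "test value",                 # illustrative entry, not in the source
    "liver function": "organ function",  # illustrative parent, not in the source
}

# Hypothetical synonym dictionary: unordered pairs registered as synonyms.
SYNONYMS: Set[FrozenSet[str]] = {frozenset(("GPT", "ALT"))}


def in_common_semantic_class(a: str, b: str) -> bool:
    """True if a and b share a parent concept and are not synonyms."""
    if a == b:
        return False  # the same word is not "different matters"
    parent_a, parent_b = PARENT.get(a), PARENT.get(b)
    if parent_a is None or parent_a != parent_b:
        return False
    return frozenset((a, b)) not in SYNONYMS
```

Under this sample data, "GOP" and "GPT" are in a common semantic class, while "GPT" and "ALT" are excluded as synonyms and "liver function" and "GOP" are excluded because their parents differ.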

The output processing unit 170 performs processing to output the sets of related expressions generated by the related expression generation unit 150. For example, it outputs the generated sets of related expressions to the related expression storage unit 180, whereby they are registered in the related expression storage unit 180. The output processing unit 170 may also cause a display device (not shown) to display the sets of related expressions.

The related expression storage unit 180 stores the sets of related expressions generated by the related expression generation unit 150 and output by the output processing unit 170. FIG. 5 shows an example of the data contents of the related expression storage unit 180, namely the sets of related expressions that the related expression generation unit 150 generates when the evaluation expressions of FIG. 4 are extracted from the text to be processed of FIG. 3. The two expressions in each row of the table of FIG. 5 constitute one set of related expressions. In FIG. 5, the expression "deterioration of liver function", based on the evaluation expression (liver, liver function, deterioration), and the expression "increase in GOP", based on the evaluation expression (liver, GOP, increase), are registered as a set of related expressions. "Deterioration of liver function" is also registered in association with each of the expressions "rise in GPT" and "marked rise in GPT", which are based on the evaluation expression (liver, GPT, rise). Further, the expression "GPT-predominant deterioration of liver function", based on the evaluation expression (liver, liver function, deterioration), is registered in association with each of "rise in GPT" and "GPT". Expressions based on the evaluation expressions (liver, GOP, increase) and (liver, GPT, rise), whose attribute words are determined by the semantic class determination unit 160 to be in a common semantic class, are not registered with each other as a set of related expressions. In addition, because the expression "GPT-predominant deterioration of liver function" contains the word "GPT", it is not associated with "increase in GOP", which contains the word "GOP" in a common semantic class with "GPT".
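The pairing rule can be sketched as below. This is a sketch under stated assumptions rather than the patented procedure: it pairs only character strings that come from different evaluation expressions (an assumption inferred from the rows described for FIG. 5), it detects words by simple substring matching, and the names `conflicts` and `related_pairs` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def conflicts(s1: str, s2: str, class_mates: Sequence[Pair]) -> bool:
    """True if s1 and s2 contain different words of a common semantic class."""
    return any(
        (a in s1 and b in s2) or (b in s1 and a in s2)
        for a, b in class_mates
    )


def related_pairs(
    strings_by_expression: Sequence[Sequence[str]],
    class_mates: Sequence[Pair],
) -> List[Pair]:
    """Pair strings from different expressions, skipping conflicting pairs."""
    pairs: List[Pair] = []
    for first, second in combinations(strings_by_expression, 2):
        for s1 in first:
            for s2 in second:
                if not conflicts(s1, s2, class_mates):
                    pairs.append((s1, s2))
    return pairs


# Character strings extracted for the three evaluation expressions of group 1.
group1 = [
    ["deterioration of liver function",
     "GPT-predominant deterioration of liver function"],
    ["increase in GOP"],
    ["rise in GPT", "marked rise in GPT"],
]
pairs = related_pairs(group1, class_mates=[("GOP", "GPT")])
```

For this input the sketch keeps "deterioration of liver function" paired with each of the GOP and GPT strings, keeps the GPT-predominant string paired only with the GPT strings, and drops every pair that would combine a GOP string with a GPT string.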

An example of the procedure performed by the information processing apparatus 10 is described below with reference to FIG. 6. The information processing apparatus 10 starts the procedure of FIG. 6 when, for example, the user instructs it to start processing using an input device (not shown).

First, the information processing apparatus 10 acquires the text to be processed (step S10). In this example, the text to be processed is acquired from a document in a database that stores documents in the medical field.

The corpus analysis unit 120 of the information processing apparatus 10 performs morphological analysis and syntactic analysis on the text to be processed (step S12). The analysis dictionary 112 is referred to in step S12.

After step S12, the evaluation expression extraction unit 130 extracts evaluation expressions from the text to be processed (step S14). As in the example described above with reference to FIG. 3, the evaluation expression extraction unit 130 refers to the evaluation expression dictionary 114 and extracts from the text to be processed pairs of a word representing an attribute and a word representing an evaluation value. It then acquires the evaluation target of the evaluation expressions in the text, either from the document containing the text to be processed or from a designation by the user, and associates the acquired evaluation target with each extracted pair of attribute and evaluation value to obtain evaluation expressions of the form (evaluation target, attribute, evaluation value).

The evaluation expression extraction unit 130 identifies the polarity of each evaluation expression extracted in step S14 (step S16). In this example, for each extracted evaluation expression, the evaluation expression extraction unit 130 acquires the polarity value registered in the evaluation expression dictionary 114 in association with that evaluation expression. The evaluation expression extraction unit 130 passes the results of steps S14 and S16 to the evaluation expression classification unit 140. FIG. 4, described above, is an example of the evaluation expressions and polarities obtained as a result of steps S14 and S16.

The evaluation expression classification unit 140 classifies the evaluation expressions extracted in step S14 into one or more groups based on whether they share an evaluation target and a polarity (step S18). As in the example described above with reference to FIG. 4, when evaluation expressions that are consecutive in order of appearance in the text to be processed share an evaluation target and a polarity, the evaluation expression classification unit 140 classifies these consecutive evaluation expressions into the same group. In the example of FIG. 4, the evaluation target is "liver" in every extracted evaluation expression, so evaluation expressions are classified into the same group whenever the same polarity continues.

As another example of the classification result of step S18, FIG. 7 shows a case in which the extracted evaluation expressions involve different evaluation targets. Referring to the table of FIG. 7, suppose that three evaluation expressions about target A, four about target B, and one about target C were extracted in step S14, and that these evaluation expressions appeared in the text to be processed in order from the top row of the table. In this case, the two consecutive evaluation expressions about target A with polarity "positive" are classified into group a, and the next evaluation expression about target A (polarity "negative") is classified into group b by itself. Further, the two consecutive evaluation expressions about target B with polarity "negative" are classified into group c, the two consecutive evaluation expressions about target B with polarity "positive" are classified into group d, and the remaining evaluation expression about target C is classified into yet another group, group e.

The evaluation expression classification unit 140 passes the result of the classification in step S18 to the related expression generation unit 150.

Next, for each group, the related expression generation unit 150 has the semantic class determination unit 160 determine whether any of the words representing the attributes of the evaluation expressions are in a common semantic class (step S20). For example, for a group containing a plurality of evaluation expressions, the related expression generation unit 150 has the semantic class determination unit 160 determine whether any of the attribute words of the evaluation expressions in that group are in a common semantic class with one another. For group 1 of FIG. 4, the semantic class determination unit 160 is asked to make this determination for three combinations: "liver function" and "GOP", "liver function" and "GPT", and "GOP" and "GPT". In this example, as explained above, the semantic class determination unit 160 determines that "GOP" and "GPT" are in a common semantic class, and that neither "liver function" and "GOP" nor "liver function" and "GPT" are in a common semantic class.

After this determination, the related expression generation unit 150 extracts from the text to be processed the character strings containing each evaluation expression of each group that contains a plurality of evaluation expressions (step S22). For group 1 of FIG. 4, as described above, "deterioration of liver function", "GPT-predominant deterioration of liver function", "increase in GOP", "rise in GPT", and "marked rise in GPT" are extracted in step S22 from the text to be processed of FIG. 3. The related expression generation unit 150 takes the character strings extracted for the evaluation expressions of the same group two at a time and makes each pair a set of related expressions. However, character strings containing words determined in step S20 to be in a common semantic class are not paired as related expressions. The related expression generation unit 150 passes the generated sets of related expressions to the output processing unit 170.

After step S22, the output processing unit 170 registers the sets of related expressions received from the related expression generation unit 150 in the related expression storage unit 180 (step S24). FIG. 5, described above, is an example of the result of registration in step S24. After step S24, the procedure of FIG. 6 ends.

Through the procedure of FIG. 6, sets of related expressions, that is, sets of candidate expressions that can paraphrase one another (that are highly related to one another), are registered in the related expression storage unit 180. The related expression storage unit 180 is used, for example, in text search processing. One conceivable use is to acquire from the related expression storage unit 180 the related expressions of an expression contained in a search query entered by the user, and to search documents using as search keys not only the expression in the query but also its related expressions.
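The search use case mentioned here could look roughly like the following. This is a minimal sketch that assumes the stored pairs are held in memory as an undirected lookup table; `build_index` and `expand_query` are hypothetical helper names, and the actual storage format of the related expression storage unit 180 is not specified in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index each registered pair in both directions."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def expand_query(expression: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return the query expression together with its related expressions."""
    # .get avoids creating an empty entry for unknown expressions.
    return {expression} | index.get(expression, set())


# Sample rows in the style of FIG. 5.
index = build_index([
    ("deterioration of liver function", "increase in GOP"),
    ("deterioration of liver function", "rise in GPT"),
])
keys = expand_query("deterioration of liver function", index)
```

Each element of `keys` would then be used as a search key, so a query for "deterioration of liver function" also retrieves documents that mention "increase in GOP" or "rise in GPT".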

In the example described above with reference to FIGS. 3 to 5, both the attribute and the evaluation value of each evaluation expression are contained in a single sentence of the text to be processed. In other texts, the attribute and the evaluation value of an evaluation expression may appear in different sentences. For example, referring to FIG. 8, suppose that the text to be processed is "Platelet test results were obtained. A marked decline is observed. An increase in HHV-6 viral load is observed. Reactivation of HHV-6 may also have been involved." As in FIG. 3, the broken-line boxes and broken-line arrows in FIG. 8 indicate pairs of a word representing an attribute of an evaluation expression and a word representing its evaluation value. In FIG. 8, (platelet, decline), (HHV-6 viral load, increase), and (HHV-6, reactivation) are extracted as pairs of attribute and evaluation value, and these three evaluation expressions are assumed to be classified into the same group in step S18 of FIG. 6. In the pair (platelet, decline), the attribute word "platelet" and the evaluation value word "decline" appear in two different sentences. In such a case, character strings containing (platelet, decline), for example "Platelet test results were obtained. A decline is observed." and "Platelet test results were obtained. A marked decline is observed.", are extracted and may be registered in the related expression storage unit 180 as related expressions of character strings such as "increase in HHV-6 viral load" and "reactivation of HHV-6", which are based on (HHV-6 viral load, increase) and (HHV-6, reactivation).

The embodiment described above is merely one example of an embodiment of the present invention, and various modifications are possible.

In one modification, for an attribute whose evaluation value can be expressed numerically, when the text to be processed states a numerical value representing the evaluation value of that attribute, the evaluation expression extraction unit 130 may extract from the text the pair of the word representing the attribute and the numerical value. In this modification, the evaluation expression dictionary 114 or the semantic dictionary 116 additionally stores information representing the reference value of the evaluation value for each attribute (evaluation item) whose evaluation value can be expressed numerically. For example, the various test items used in medicine can be attributes of evaluation expressions, and their evaluation values are expressed numerically. The evaluation expression dictionary 114 or the semantic dictionary 116 of this modification therefore stores, for those attributes that represent test items, the word representing the test item in association with information representing the reference value of its test value. FIG. 9 shows an example of such reference value information. In the table of FIG. 9, each word representing a test item whose evaluation value can be expressed numerically is associated with the reference value of that evaluation value. Reference value information such as that illustrated in FIG. 9 is generated in advance, for example from the test items of tests that may be performed at medical institutions and their reference values, and is registered in the evaluation expression dictionary 114 or the semantic dictionary 116.

Suppose that the reference value information of FIG. 9 is registered in the evaluation expression dictionary 114 in addition to the data contents of FIG. 2, and that the text to be processed is the sentence "γ-GTP is 110, and a decline in liver function is observed." In this case, the evaluation expression extraction unit 130 extracts (γ-GTP, 110) and (liver function, decline) as pairs of attribute and evaluation value. "Liver" is acquired as the evaluation target. The evaluation expression (liver, liver function, decline) extracted from the text to be processed is itself an evaluation expression registered in the evaluation expression dictionary 114, and its polarity is "negative". For the extracted pair of attribute and numerical value (γ-GTP, 110), the evaluation expression extraction unit 130 refers to the reference value information of FIG. 9 and compares the reference value of "γ-GTP", "50 or less", with the extracted numerical value "110". Because the extracted value "110" is larger than the reference value, the unit selects, from among the evaluation expressions containing the attribute "γ-GTP", one whose evaluation value represents a change toward a larger number, "increase" or "rise", that is, (liver, γ-GTP, increase) or (liver, γ-GTP, rise), as the evaluation expression corresponding to (γ-GTP, 110). The polarity of the selected evaluation expression is then acquired from the evaluation expression dictionary 114, and this evaluation expression and its polarity are subjected to classification by the evaluation expression classification unit 140 together with the other evaluation expression extracted from the text, (liver, liver function, decline), and its polarity. When extracting a character string from the text to be processed for an evaluation expression that corresponds to a pair of attribute and numerical value, the related expression generation unit 150 may extract a character string containing the attribute and the numerical value (for example, "γ-GTP is 110"), or may obtain a character string in which the numerical value is replaced with the evaluation value word of the corresponding evaluation expression (for example, "γ-GTP is increase").

In this modification described with reference to FIG. 9, the evaluation expression corresponding to a pair of an attribute and a numerical value extracted from the text to be processed is identified based on the result of comparing the extracted numerical value with the reference value associated with that attribute. For example, if the extracted numerical value is larger than the reference value, the corresponding evaluation expression is one that includes, as its evaluation value, a word representing a change toward a larger value (increase, rise, etc.) and that includes the attribute extracted as a pair with that numerical value. Conversely, if the extracted numerical value is smaller than the reference value, the corresponding evaluation expression is one that includes, as its evaluation value, a word representing a change toward a smaller value (decrease, decline, etc.) and that includes the attribute extracted as a pair with that numerical value. If the extracted numerical value is equal to the reference value, it may be treated as having no corresponding evaluation expression. When the reference value of an attribute is expressed as a numerical range, the corresponding evaluation expression is one that includes a word representing a change toward a larger value if the extracted numerical value exceeds the upper limit of the range, and one that represents a change toward a smaller value if the extracted numerical value falls below the lower limit of the range.
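The comparison rule in this modification can be sketched roughly as follows. This is only an illustrative Python sketch, not the implementation of the embodiment: the function names, the direction labels "up" and "down", the dictionary entries, and the reference range used in the usage note are all hypothetical.

```python
from typing import List, Optional, Tuple, Union

# A reference is either a single value or a (lower, upper) range (assumed representation).
Reference = Union[float, Tuple[float, float]]
Expression = Tuple[str, str, str]  # (evaluation target, attribute, evaluation value)

def change_direction(value: float, reference: Reference) -> Optional[str]:
    """Return "up", "down", or None when no evaluation expression corresponds."""
    if isinstance(reference, tuple):
        lower, upper = reference
    else:
        lower = upper = reference
    if value > upper:
        return "up"    # value exceeds the reference (or the upper limit of the range)
    if value < lower:
        return "down"  # value is below the reference (or the lower limit of the range)
    return None        # equal to the reference, or inside the range

# Hypothetical dictionary entries, each tagged with the direction its evaluation value expresses.
EXPRESSIONS: List[Tuple[Expression, str]] = [
    (("liver", "γ-GTP", "rise"), "up"),
    (("liver", "γ-GTP", "increase"), "up"),
    (("liver", "γ-GTP", "decrease"), "down"),
]

def matching_expressions(attribute: str, value: float, reference: Reference) -> List[Expression]:
    """Select the evaluation expressions that correspond to an (attribute, value) pair."""
    direction = change_direction(value, reference)
    if direction is None:
        return []
    return [expr for expr, d in EXPRESSIONS if expr[1] == attribute and d == direction]
```

With an assumed reference range of (0, 50) for γ-GTP, `matching_expressions("γ-GTP", 110, (0, 50))` selects the two expressions whose evaluation value expresses an upward change, while a value inside the range selects none.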

In the embodiment and modifications described above, evaluation expressions that evaluate a change in the state of the evaluation target (that is, expressions whose evaluation values represent a change, such as rise, increase, decline, and decrease) are registered in the evaluation expression dictionary 114. In another modification, evaluation expressions that evaluate the state of the evaluation target itself may be registered in the evaluation expression dictionary in addition to those that evaluate a change in state. For example, evaluation expressions whose evaluation values are words that simply describe a state, such as “good”, “bad”, “high”, “low”, “large”, and “small”, rather than words that represent a change over time, may additionally be registered in the evaluation expression dictionary 114. In this case, in the above-described modification that handles evaluation expressions including attributes whose evaluation values are expressed numerically, an evaluation expression whose evaluation value is a word corresponding to the magnitude relationship between the numerical value extracted from the text to be processed and the reference value associated with the corresponding attribute may be treated as the evaluation expression corresponding to the extracted pair of numerical value and attribute. For example, an evaluation expression including the word “high” is treated as the evaluation expression corresponding to the pair of numerical value and attribute when the extracted numerical value is larger than the reference value, and an evaluation expression including the word “low” when the extracted numerical value is smaller than the reference value. If the extracted numerical value is equal to the reference value, it is determined that no evaluation expression corresponds to that pair of numerical value and attribute.

In the above description, the evaluation expression classification unit 140 classifies into the same group those evaluation expressions that are consecutive in order of appearance in the text to be processed and that share the same evaluation target and polarity. In one modification, the evaluation expression classification unit 140 may simply classify evaluation expressions that share the same evaluation target and polarity into the same group, without considering the order of appearance in the text to be processed. In this case, evaluation expressions are classified into the same group as long as they share the same evaluation target and polarity, even if they are not consecutive in order of appearance in the text to be processed. In yet another example, conjunctions and modality in the text to be processed may additionally be used for the classification, or the classification may be performed as a clustering problem based on machine learning.
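The difference between the order-sensitive classification and the order-independent modification can be illustrated with a short Python sketch. The data layout (a tuple of an evaluation expression and a polarity string) and the function names are assumptions made for illustration only; the kidney entry in the usage note is likewise hypothetical.

```python
from collections import defaultdict
from itertools import groupby

# Each item is ((evaluation target, attribute, evaluation value), polarity), in order of appearance.

def group_key(item):
    """Evaluation expressions are compared on evaluation target and polarity only."""
    (target, _attribute, _value), polarity = item
    return (target, polarity)

def group_consecutive(items):
    """Group only runs of consecutive expressions that share target and polarity."""
    return [list(run) for _, run in groupby(items, key=group_key)]

def group_ignoring_order(items):
    """Group all expressions that share target and polarity, regardless of position."""
    groups = defaultdict(list)
    for item in items:
        groups[group_key(item)].append(item)
    return list(groups.values())
```

For a sequence such as liver/negative, liver/negative, kidney/positive, liver/negative, `group_consecutive` yields three groups because the last liver expression is separated from the first two, whereas `group_ignoring_order` yields two groups.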

In the above description, the semantic class determination unit 160 determines that words that have the same semantic class in the thesaurus but are not synonyms of each other are different words in a common semantic class. In one modification, a conventionally known syntactic parsing technique may be used to detect that phrases containing words representing attributes form a juxtaposed structure in the text to be processed (for example, in “an increase in GOP, a marked rise in GPT”, two phrases are juxtaposed), and this detection result may be combined with the semantic class determination based on the thesaurus to determine whether different words in a common semantic class are present.

In the above description, the related expression generation unit 150 extracts character strings containing evaluation expressions from the text to be processed, associates the extracted character strings with each other, and registers them in the related expression storage unit 180. In one modification, the extracted evaluation expressions themselves may additionally be registered in the related expression storage unit 180. For example, referring to FIG. 4, of the three evaluation expressions classified into the same group 1, namely (liver, liver function, deterioration), (liver, GOP, increase), and (liver, GPT, rise), the pairs whose attribute words are not in a common semantic class, that is, (liver, liver function, deterioration) with (liver, GOP, increase) and (liver, liver function, deterioration) with (liver, GPT, rise), may be registered in the related expression storage unit 180. When the evaluation expressions themselves are registered in the related expression storage unit 180, the process of extracting character strings containing the evaluation expressions from the text to be processed and registering the extracted character strings in association with each other in the related expression storage unit 180 may be omitted.

Note that the related expression generation unit 150 does not necessarily have to generate “pairs” of related expressions explicitly. It suffices to register, in association with each other in the related expression storage unit 180, a plurality of evaluation expressions that have been classified into the same group by the evaluation expression classification unit 140 and that do not include words in a common semantic class (and the character strings extracted from the text to be processed based on each of those evaluation expressions).
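The selection of evaluation expressions to be associated within one group can be sketched as below, using the group 1 example from FIG. 4. This is a simplified Python illustration under stated assumptions: the thesaurus is reduced to a hypothetical word-to-class table with made-up class labels, the synonym check performed by the semantic class determination unit 160 is left out, and explicit pairs are generated even though the text notes this is not required.

```python
from itertools import combinations

# Hypothetical stand-in for the thesaurus: attribute word -> semantic class label.
SEMANTIC_CLASS = {
    "liver function": "organ function",
    "GOP": "lab test item",
    "GPT": "lab test item",
}

def related_pairs(group):
    """Pair up expressions of one group whose attribute words are not in a common semantic class."""
    pairs = []
    for a, b in combinations(group, 2):
        class_a = SEMANTIC_CLASS.get(a[1])  # index 1 is the attribute word
        class_b = SEMANTIC_CLASS.get(b[1])
        if class_a is not None and class_a == class_b:
            continue  # attributes share a semantic class: do not associate
        pairs.append((a, b))
    return pairs

group1 = [
    ("liver", "liver function", "deterioration"),
    ("liver", "GOP", "increase"),
    ("liver", "GPT", "rise"),
]
```

Applied to `group1`, the sketch associates the liver function expression with each of the GOP and GPT expressions, and leaves GOP and GPT unassociated with each other because they fall in the same assumed class.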

In the above, evaluation expressions each consisting of a set of three kinds of values, namely a target, an attribute, and an evaluation value, are registered in the evaluation expression dictionary 114. In one modification, evaluation expressions each consisting of a pair of a word representing an attribute and a word representing an evaluation value, without an evaluation target, may be registered in the evaluation expression dictionary 114, and the processing of the above embodiment and the various modifications may be performed in the same way, treating each pair of an attribute and an evaluation value extracted from the text to be processed as an evaluation expression. In this case, the evaluation target may be identified from the word representing the attribute. For example, information that associates each evaluation target with the words representing its attributes is registered in the evaluation expression dictionary 114 or the semantic dictionary 116, and the evaluation target associated with a word representing an attribute extracted from the text to be processed is identified as the evaluation target of the evaluation expression that includes that attribute.
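Identifying the evaluation target from the attribute word amounts to a table lookup. The following Python sketch assumes a hypothetical association table and function name; where the table actually lives (the evaluation expression dictionary 114 or the semantic dictionary 116) is not modeled.

```python
from typing import Optional, Tuple

# Hypothetical association between attribute words and their evaluation targets.
TARGET_OF_ATTRIBUTE = {
    "liver function": "liver",
    "GOP": "liver",
    "GPT": "liver",
}

def complete_expression(attribute: str, value: str) -> Optional[Tuple[str, str, str]]:
    """Turn an extracted (attribute, evaluation value) pair into a full evaluation expression."""
    target = TARGET_OF_ATTRIBUTE.get(attribute)
    if target is None:
        return None  # no evaluation target is registered for this attribute word
    return (target, attribute, value)
```

Once the target is filled in this way, the resulting triples can be handled by the same classification and association processing as the three-value evaluation expressions.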

The above description has dealt with an example in which text from the medical field is the text to be processed. Naturally, the processing of the example of the present embodiment may be performed in the same manner when text from another specialized field is the text to be processed. Alternatively, general text that is not limited to a specialized field may be used as the text to be processed.

The information processing apparatus 10 exemplified above is typically realized by executing, on a general-purpose computer, a program that describes the functions or processing contents of each unit of the information processing apparatus 10. As hardware, the computer has, for example, a circuit configuration in which a CPU (central processing unit) 80, a memory (primary storage) 82, various I/O (input/output) interfaces 84, and the like are connected via a bus 86, as shown in FIG. 10. In addition, a hard disk drive (HDD) 88 and a disk drive 90 for reading portable non-volatile recording media of various standards, such as CDs, DVDs, and flash memory, are connected to the bus 86, for example via the I/O interface 84. The drive 88 or 90 functions as an external storage device for the memory. A program describing the processing contents of the embodiment is stored in a fixed storage device such as the HDD 88 via a recording medium such as a CD or DVD, or via a network, and is installed in the computer. The processing of the embodiment is realized when the program stored in the fixed storage device is read into the memory and executed by the CPU.

The above description has dealt with an embodiment in which the information processing apparatus 10 is realized by a single computer. However, the functions of the various examples of the information processing apparatus 10 described above may instead be distributed across a plurality of computers.

DESCRIPTION OF SYMBOLS: 10 information processing apparatus, 80 CPU, 82 memory, 84 I/O interface, 86 bus, 88 HDD, 90 disk drive, 110 reference data storage unit, 112 analysis dictionary, 114 evaluation expression dictionary, 116 semantic dictionary, 120 corpus analysis unit, 130 evaluation expression extraction unit, 140 evaluation expression classification unit, 150 related expression generation unit, 160 semantic class determination unit, 170 output processing unit, 180 related expression storage unit.

Claims

A program that causes a computer to execute:
a specifying step of referring to evaluation expression storage means that stores, in association with each other, an evaluation expression including a word representing an evaluation item for evaluating an evaluation object and a word representing an evaluation value for the evaluation item, and a polarity indicating whether or not the evaluation expression is a positive expression, extracting, from a character string to be processed, the evaluation expressions included in the character string to be processed, and specifying the polarity of each of the extracted evaluation expressions;
a classification step of classifying the extracted evaluation expressions into one or more groups based on whether the evaluation object and the polarity are common among the extracted evaluation expressions; and
an output step of, when a plurality of evaluation expressions are classified into the same group in the classification step, associating the plurality of evaluation expressions with each other and outputting them to related expression storage means.

The program according to claim 1, wherein, in the classification step, when the extracted evaluation expressions are arranged in the order of appearance in the character string to be processed and the evaluation object and the polarity are common among consecutive evaluation expressions, the consecutive evaluation expressions are classified into the same group.

The program according to claim 1 or 2, wherein, in the output step, evaluation expressions among the plurality of evaluation expressions whose words representing the evaluation items are in a common semantic class are not associated with each other.

The program according to any one of claims 1 to 3, wherein:
the computer is further caused to execute an extraction step of extracting, for each of the plurality of evaluation expressions to be output in the output step, a character string including the evaluation expression from the character string to be processed;
the character string extracted in the extraction step includes a word that, in the character string to be processed, has a dependency relationship with the word representing the evaluation item or the word representing the evaluation value of the evaluation expression; and
in the output step, the character strings extracted for each of the plurality of evaluation expressions are associated with each other and output to the related expression storage means.

The program according to claim 4, wherein, in the output step, character strings that include words in a common semantic class, among the character strings extracted for each of the plurality of evaluation expressions, are not associated with each other.

The program according to any one of claims 1 to 5, wherein:
the computer is further caused to execute a second specifying step of referring to reference value information storage means that stores, in association with each other, each word representing an evaluation item whose evaluation value can be expressed numerically, among the evaluation items in the evaluation expressions stored in the evaluation expression storage means, and a reference value for the evaluation value of that evaluation item, extracting, from the character string to be processed, a word representing an evaluation item stored in the reference value information storage means and a numerical value corresponding to the evaluation value of that evaluation item, and specifying, in the evaluation expression storage means, the evaluation expression corresponding to the combination of the evaluation item and the numerical value, and its polarity, based on the result of comparing the reference value associated with the extracted evaluation item with the extracted numerical value; and
the evaluation expression specified in the second specifying step is further processed in the classification step.

An information processing apparatus comprising:
specifying means for referring to evaluation expression storage means that stores, in association with each other, an evaluation expression including a word representing an evaluation item for evaluating an evaluation object and a word representing an evaluation value for the evaluation item, and a polarity indicating whether or not the evaluation expression is a positive expression, extracting, from a character string to be processed, the evaluation expressions included in the character string to be processed, and specifying the polarity of each of the extracted evaluation expressions;
classification means for classifying the extracted evaluation expressions into one or more groups based on whether the evaluation object and the polarity are common among the evaluation expressions extracted by the specifying means; and
output means for, when a plurality of evaluation expressions are classified into the same group by the classification means, associating the plurality of evaluation expressions with each other and outputting them to related expression storage means.