JP5452563B2

JP5452563B2 - Method and apparatus for extracting evaluation information

Info

Publication number: JP5452563B2
Application number: JP2011230054A
Authority: JP
Inventors: ダーリャンワン; ホンジーキュ; カイザオ; リクンチュ; チェンジエンフー
Original assignee: NEC China Co Ltd
Current assignee: NEC China Co Ltd
Priority date: 2011-01-21
Filing date: 2011-10-19
Publication date: 2014-03-26
Anticipated expiration: 2031-10-19
Also published as: CN102609424A; CN102609424B; JP2012155699A

Description

The present invention relates to the field of data mining, and more particularly to a method and apparatus for extracting evaluation information.

With the development of the Internet, people are no longer satisfied with passively receiving information online; more and more of them want to express their opinions or publish personal information on the network. Such comments and opinions are typically published on network media such as shopping websites, electronic bulletin boards, personal blogs, and microblogs, and they include many user comments on products and many reader opinions on events and policies. Content containing such text is analyzed and mined with opinion mining technology, which helps individuals and organizations grasp the general opinions and attitudes of the public toward products, events, policies, and the like. Opinion mining technology can therefore be expected to have very high practical value.

Opinion mining is mainly concerned with automatically acquiring useful evaluation information and related knowledge from subjective text (text such as words, phrases, clauses, and sentences that contain sentiment expressions). Its main purpose is to identify evaluation information in text so that appropriate analysis can be performed. Current methods of acquiring evaluation information fall broadly into the following three categories.

The first method extracts evaluation information semi-automatically on the basis of co-occurrence templates. Evaluation information is defined as a triple (object, attribute, evaluation), and each element is regarded as a slot value of a co-occurrence template. Text matching the three elements is extracted from sentiment text and analyzed against the template. For example, if the co-occurrence template is "<object>'s <attribute> is <evaluation>" and the sentiment text to be analyzed is "This camera's photo is great," matching the sentence against the template yields the triple <this camera>, <photo>, <great>. This method requires building three dictionaries ("object," "attribute," and "evaluation"), manually selecting the words that make up each dictionary, and selecting several co-occurrence templates that appear relatively frequently. To that end, "attributes" and "evaluations" are first generated, the generated candidates are then filtered by hand, and the appropriate ones that survive filtering are placed in their respective dictionaries.
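The slot filling described for this prior-art method can be sketched as follows. This is only an illustration: the regular expression is an assumed English rendering of the template "<object>'s <attribute> is <evaluation>", and the three tiny dictionaries are hypothetical stand-ins for the hand-built ones the method requires.

```python
import re

# Hypothetical hand-built dictionaries (the prior-art method needs all three).
OBJECTS = {"this camera"}
ATTRIBUTES = {"photo"}
EVALUATIONS = {"great"}

# Assumed English rendering of the template "<object>'s <attribute> is <evaluation>".
TEMPLATE = re.compile(r"^(?P<object>.+)'s (?P<attribute>\w+) is (?P<evaluation>\w+)$")

def match_template(text):
    """Return (object, attribute, evaluation) if the text fits the template
    and every slot value is found in its dictionary; otherwise return None."""
    m = TEMPLATE.match(text.strip().rstrip(".").lower())
    if m is None:
        return None
    triple = (m["object"], m["attribute"], m["evaluation"])
    if triple[0] in OBJECTS and triple[1] in ATTRIBUTES and triple[2] in EVALUATIONS:
        return triple
    return None
```

A sentence that does not fit the single template returns None, which illustrates why a small set of templates covers only a narrow range of text.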

The second method extracts evaluation information on the basis of a collocation dictionary. Evaluation information is obtained by examining pairs (opinion word, target feature). Three dictionaries must be built: an opinion word dictionary, a target feature dictionary obtained by manual search, and a link insertion dictionary that mainly records syntactic relations. To acquire evaluation information, the opinion words and target features contained in the sentiment text are first marked using the opinion word dictionary and the target feature dictionary. The link insertion dictionary is then used to check the syntactic relation of each opinion word and target feature pair and to judge whether the pair is correctly matched.

The third method extracts evaluation information on the basis of a syntax path dictionary (see Patent Document 1, Chinese Patent Application No. 200910082342.1, "Methods, Devices and Systems for Obtaining Evaluation Unit and Establishing Syntax Path Dictionary"). In this method, the relationship between a product feature and an opinion word is described by a syntax path. A syntax path dictionary must be built first. All product feature words and opinion words in a sentiment text corpus are recognized, and the syntax paths between all product features and opinion words are generated. These syntax paths are generalized, and the frequency of appearance of each generalized syntax path is calculated. A syntax path whose frequency exceeds a given threshold is regarded as a reference syntax path and is inserted into the syntax path dictionary. Once the dictionary has been obtained, the product features and opinion words in the input sentiment text are recognized, syntactic analysis is performed, and the corresponding syntax tree is built. By querying the syntax tree for paths that match a reference syntax path in the syntax path dictionary, the product feature and opinion word linked by such a path can be regarded as an evaluation unit (that is, evaluation information).
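The dictionary-building step of this prior-art method amounts to counting generalized paths and keeping those above a threshold. The sketch below assumes that generalized syntax paths are already available as strings; the path notation and the threshold value are hypothetical and are not taken from Patent Document 1.

```python
from collections import Counter

def build_path_dictionary(generalized_paths, threshold):
    """Keep the generalized syntax paths whose frequency exceeds the threshold."""
    counts = Counter(generalized_paths)
    return {path for path, n in counts.items() if n > threshold}

# Hypothetical generalized paths between a product feature (NN) and an opinion word (JJ).
observed = ["NN/nsubj/JJ", "NN/nsubj/JJ", "NN/nsubj/JJ", "NN/dobj/VB/advmod/JJ"]
path_dictionary = build_path_dictionary(observed, threshold=2)
```

Paths that fall below the threshold are dropped from the dictionary, so any relation they express can no longer be matched at extraction time.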

Patent Document 1: Chinese Patent Application No. 200910082342.1

The problems with the first method are that the co-occurrence templates are simple in type and cover only a narrow range, so recall is low, and that the selection procedure requires manual filtering.

The problems with the second method are that some types of syntactic relation are not covered, so recall is low; that the dictionaries must be built by hand; and that the dictionaries are not readily portable.

The problems with the third method are that it depends on many resources such as dictionaries and a syntactic analyzer, that the system is highly complex, that recall is low, and that extensibility inevitably suffers because some syntactic relations may be missing from the generated syntax path dictionary.

In view of the above problems, the present invention provides a solution for extracting evaluation information that is less complex than conventional methods, depends less on dictionaries, and is inherently efficient.

According to a first aspect of the present invention, there is provided an evaluation information extraction method comprising: acquiring an opinion word set and a target feature set from a corpus; optimizing the opinion word set and the target feature set based on the relationship between the two sets, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set; and extracting evaluation information based on the optimized opinion word set and the optimized target feature set.

According to a second aspect of the present invention, there is provided an evaluation information extraction apparatus comprising: acquisition means configured to acquire an opinion word set and a target feature set from a corpus; optimization means configured to optimize the opinion word set and the target feature set based on the relationship between the two sets, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set; and extraction means configured to extract evaluation information based on the optimized opinion word set and the optimized target feature set.

The present invention provides a machine learning method that is lower in cost than conventional methods and requires no supervision. The method depends little on dictionaries and, unlike the prior art, does not need multiple dictionaries.

Other features and advantages of the present invention will become apparent from the following description of preferred embodiments and from the drawings, which illustrate the principles of the invention.

The following description, given with reference to the drawings, provides a fuller understanding of the present invention and makes its other objects and effects clearer and easier to understand.

The drawings show, in order: a flowchart of an evaluation information extraction method according to one embodiment of the present invention; three flowcharts of evaluation information extraction methods according to other embodiments of the present invention; and a block diagram of an evaluation information extraction apparatus according to one embodiment of the present invention.

In all of the drawings, the same reference numbers denote the same, similar, or corresponding features or functions.

The present invention is described and illustrated in more detail below with reference to the drawings. It should be understood that the drawings and embodiments are given for illustration only and are not intended to limit the scope of protection of the present invention.

For clarity, the terms used in the present invention are explained first.

1. Corpus

In the present invention, a corpus may consist of a plurality of text files. A text file may be preprocessed at a predetermined processing granularity to obtain a plurality of text units. A text unit is the smallest linguistic unit in which a target feature and an opinion word co-occur. It may be any free text, such as a word, a phrase, a sentence, a paragraph, an entire article, or a combination of these.

2. Evaluation information

In the present invention, evaluation information may include a target feature and an opinion word.

An opinion word is a word or phrase, contained in a text unit, that evaluates something in order to express an opinion. Examples of opinion words include "good," "high," "beautiful," "elegant," and "convenient."

A target feature is the object of evaluation, contained in a text unit, that an opinion word modifies; it may be a product, a service, or the like. Examples of target features include "fuel consumption rate," "appearance," "price," "safety," and "maneuverability."

Evaluation information is a pair consisting of an opinion word contained in a text unit and the corresponding target feature, and it has a definite opinion polarity (positive, negative, or neutral). Evaluation information may be written as a pair [target feature, opinion word]. For example, the three text units "The fuel consumption rate of high-emission vehicles is very high," "The price of this model of mobile phone is high," and "The maneuverability of this brand of car is very good" yield three pieces of evaluation information: [fuel consumption rate, high], [price, high], and [maneuverability, good].

3. Relationship between the opinion word set and the target feature set

When an opinion word and a target feature are detected in the same text unit, the two are regarded as related. By searching the many text units obtained from a corpus for opinion words, target features, and the relationships between them, the frequency of appearance of each opinion word, the frequency of appearance of each target feature, and the frequency of each relationship can be obtained.

In the present invention, the relationships between the individual opinion words in the opinion word set and the individual target features in the target feature set, together with the frequencies of those relationships, are called the relationship between the opinion word set and the target feature set.

For example, suppose there are four text units: "The fuel consumption rate of high-emission vehicles is extremely high," "The price of this model of mobile phone is high," "The maneuverability of this brand of car is very good," and "The price of the printer I bought is far too high." The opinion word set then contains two opinion words, "good" and "high"; "good" appears once and "high" appears three times. The target feature set contains three target features, "fuel consumption rate," "price," and "maneuverability," which appear once, twice, and once, respectively. The opinion word "good" and the target feature "fuel consumption rate" never appear in the same text unit (appearing in the same text unit is hereinafter called "co-occurrence"), so the two are unrelated and the frequency of their relationship is recorded as 0. In the same way, "good" never co-occurs with "price" (frequency 0) and co-occurs once with "maneuverability" (frequency 1). The relationships of the opinion word "high" are obtained likewise: it co-occurs once with "fuel consumption rate," twice with "price," and never with "maneuverability." These frequencies constitute the relationship between the opinion word set (denoted O), which contains the two opinion words "good" and "high," and the target feature set (denoted F), which contains the three target features "fuel consumption rate," "price," and "maneuverability."
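The frequencies in this example can be reproduced with a few counters. The sketch below assumes each of the four text units has already been reduced to its (target feature, opinion word) pair; the variable names are illustrative.

```python
from collections import Counter

# The four example text units, reduced to the (target feature, opinion word)
# pair found in each one.
text_units = [
    ("fuel consumption rate", "high"),
    ("price", "high"),
    ("maneuverability", "good"),
    ("price", "high"),
]

opinion_freq = Counter(opinion for _, opinion in text_units)
feature_freq = Counter(feature for feature, _ in text_units)
relation_freq = Counter(text_units)  # keyed by (target feature, opinion word)
```

A Counter returns 0 for a pair that never co-occurs, which matches the way unrelated pairs are recorded with frequency 0.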

In the present invention, the relationship may be expressed in matrix form to facilitate calculation.

Based on the relationship between the opinion word set and the target feature set, a transformation from the opinion word set to the target feature set (denoted T_O-F) can be obtained. Since "good" appears once and "high" appears three times in the opinion word set, each entry of T_O-F is the co-occurrence frequency of an opinion word and a target feature over the frequency of appearance of that opinion word. For the target features "fuel consumption rate," "price," and "maneuverability," in that order, the column O("good") contains 0/1, 0/1, and 1/1, and the column O("high") contains 1/3, 2/3, and 0/3.

In T_O-F, "/" denotes a ratio. For example, "1/3" in the column O("high") indicates that, over all text units, the co-occurrence frequency of the opinion word "high" and the target feature "fuel consumption rate" is one third of the frequency of appearance of "high"; "2/3" indicates that the co-occurrence frequency of "high" and the target feature "price" is two thirds of the frequency of appearance of "high"; and "0/3" indicates that "high" appears three times but never co-occurs with the target feature "maneuverability."
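The ratios described for T_O-F can be checked with exact fractions. This is a minimal sketch that assumes the co-occurrence counts of the four example text units; the nested-dictionary layout and the function name are illustrative choices, not part of the patent.

```python
from fractions import Fraction

features = ["fuel consumption rate", "price", "maneuverability"]

# Co-occurrence counts from the four example text units: cooc[feature][opinion word].
cooc = {
    "fuel consumption rate": {"good": 0, "high": 1},
    "price": {"good": 0, "high": 2},
    "maneuverability": {"good": 1, "high": 0},
}
opinion_freq = {"good": 1, "high": 3}

def transform_o_to_f(cooc, opinion_freq):
    """T_O-F[f][o] = co-occurrence count of (f, o) / appearance frequency of o."""
    return {
        f: {o: Fraction(n, opinion_freq[o]) for o, n in row.items()}
        for f, row in cooc.items()
    }

T_OF = transform_o_to_f(cooc, opinion_freq)
column_high = [T_OF[f]["high"] for f in features]  # 1/3, 2/3, 0/3
```

Note that Fraction reduces 0/3 to 0, so the denominator is visible only for the non-zero entries.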

Furthermore, based on the relationship between the opinion word set and the target feature set, a transformation from the target feature set to the opinion word set (denoted T_F-O) can be obtained. Since "fuel consumption rate," "price," and "maneuverability" appear once, twice, and once, respectively, in the target feature set, each entry of T_F-O is the co-occurrence frequency of a target feature and an opinion word over the frequency of appearance of that target feature. For the opinion words "good" and "high," in that order, the column F("fuel consumption rate") contains 0/1 and 1/1, the column F("price") contains 0/2 and 2/2, and the column F("maneuverability") contains 1/1 and 0/1.

In T_F-O, "/" denotes a ratio. For example, "0/2" in the column F("price") indicates that, over all text units, the target feature "price" appears twice but never co-occurs with the opinion word "good." "2/2" indicates that "price" appears twice and co-occurs with the opinion word "high" both times. This reflects the fact that, when the target feature "price" appears in a text unit, the opinion word "high" is likely to appear and the opinion word "good" is unlikely to appear.
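The reverse transformation can be sketched the same way, this time normalizing by the frequency of each target feature. The count matrix below is derived from the four example text units; the list-of-lists layout is an illustrative assumption.

```python
from fractions import Fraction

features = ["fuel consumption rate", "price", "maneuverability"]
opinion_words = ["good", "high"]

# counts[i][j]: co-occurrences of opinion_words[i] with features[j] in the example.
counts = [
    [0, 0, 1],  # good
    [1, 2, 0],  # high
]
feature_freq = [1, 2, 1]

# T_F-O[i][j] = counts[i][j] / appearance frequency of features[j].
T_FO = [
    [Fraction(c, feature_freq[j]) for j, c in enumerate(row)]
    for row in counts
]

price = features.index("price")
column_price = {o: T_FO[i][price] for i, o in enumerate(opinion_words)}
```

Here the column for "price" gives 0 for "good" and 1 for "high," which is the 0/2 and 2/2 reading in reduced form.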

In the present invention, both transformations T_O-F and T_F-O may be expressed in matrix form to facilitate calculation.

4. Similarity of the elements of the target feature set and similarity of the elements of the opinion word set

The similarity of the elements of the target feature set is the combination of the similarities between all of the target features contained in the set. For example, suppose the target feature set contains the three target features "fuel consumption rate," "price," and "maneuverability," and similarity calculation gives 0.3 for "fuel consumption rate" and "price," 0.2 for "fuel consumption rate" and "maneuverability," and 0.01 for "price" and "maneuverability." These three pairwise values together constitute the similarity of the elements of the target feature set.

The similarity of the elements of the opinion word set is obtained in the same way.

In the present invention, the similarities of the elements of both sets may be expressed in matrix form to facilitate calculation.
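One way to hold the pairwise similarities in matrix form is a symmetric matrix indexed by set element. The sketch below uses the three values given for the target feature set; setting the diagonal (self-similarity) to 1.0 is an assumption, since the text does not state it.

```python
features = ["fuel consumption rate", "price", "maneuverability"]
pairwise = {
    ("fuel consumption rate", "price"): 0.3,
    ("fuel consumption rate", "maneuverability"): 0.2,
    ("price", "maneuverability"): 0.01,
}

def similarity_matrix(items, pairwise, self_similarity=1.0):
    """Build a symmetric matrix from pairwise values.
    The diagonal value is an assumption, not given in the text."""
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    matrix = [[self_similarity if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), value in pairwise.items():
        matrix[index[a]][index[b]] = value
        matrix[index[b]][index[a]] = value
    return matrix

S_F = similarity_matrix(features, pairwise)
```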

The similarity between any two elements of a set can be calculated by various existing methods. One option is a method based on a semantic dictionary: the similarity between any two elements is calculated from an external dictionary such as Synonym Dictionary or HowNet by examining the path length between the two words in the semantic structure tree. Another option is a method based on machine learning: the similarity between any two elements is calculated with an algorithm such as principal component analysis, latent semantic analysis, or context vector similarity.
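Of the options named above, context vector similarity is the simplest to illustrate without external resources. The sketch below is one common formulation (bag-of-words context windows compared by cosine similarity); the window size and the tokenized input format are assumptions, and the patent does not prescribe this particular form.

```python
import math
from collections import Counter

def context_vector(word, sentences, window=2):
    """Count the tokens that appear within `window` positions of `word`."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                vec.update(tokens[lo:i] + tokens[i + 1:hi])
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Two words that occur in identical contexts, such as "price" and "cost" in "the price is high" and "the cost is high," receive a similarity of approximately 1.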

The present invention relates to a method of extracting evaluation information. The method may comprise: acquiring an opinion word set and a target feature set from a corpus; optimizing the opinion word set and the target feature set based on the relationship between the two sets, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set; and extracting evaluation information based on the optimized opinion word set and the optimized target feature set.

Based on the principle that "information co-occurrence resides in the relationships and similarities among objects of the same category," the present invention adjusts the ranking of the target feature set and the opinion word set, gradually optimizes both sets through repeated iteration, extracts the highly ranked target features and opinion words from the final optimized ranking, and uses the relationship between target features and opinion words to construct evaluation information containing a target feature and an opinion word.

The present invention provides a machine learning method that is lower in cost than conventional methods and requires no supervision. Unlike the prior art, the method does not need multiple dictionaries. Indeed, the invention can achieve its object without using any dictionary at all and can be implemented using only degree adverbs. Furthermore, because the invention can dynamically filter and expand the target feature set and the opinion word set and can dynamically construct the modification relationships between target features and opinion words, it can extract evaluation information from a corpus quickly and effectively.

FIG. 1 is a flowchart of an evaluation information extraction method according to one embodiment of the present invention.

In step S101, an opinion word set and a target feature set are acquired from a corpus.

In this step, the corpus may first be preprocessed to obtain text units; then, based on the text units obtained, the opinion word set may be acquired according to opinion word extraction rules and the target feature set may be acquired according to target feature extraction rules.

Preprocessing of the corpus may include, for example, clause splitting, word segmentation, part-of-speech tagging, and traditional/simplified character conversion of the text files in the corpus. In one embodiment, a single sentence is taken as the text unit and is treated as the smallest linguistic unit in which a target feature and an opinion word co-occur. For example, the text files in a text corpus are split into clauses by replacing seven punctuation marks (including the full stop "。", the comma "，", the semicolon "；", the exclamation mark "！", the question mark "？", and the enumeration comma "、") with line breaks. Word segmentation, part-of-speech tagging, traditional/simplified character conversion, and similar processing are then applied to the resulting sentences as needed. Finally, based on the preprocessed sentences (that is, the text units), the opinion word set is acquired according to the opinion word extraction rules and the target feature set is acquired according to the target feature extraction rules.
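The clause-splitting part of this preprocessing can be sketched with a single substitution. The delimiter set below contains only the six marks named in the text, and the function name is illustrative; word segmentation, part-of-speech tagging, and character conversion would need external tools and are not shown.

```python
import re

# The six punctuation marks named in the text (full-width forms).
DELIMITER_PATTERN = re.compile("[。，；！？、]")

def split_into_clauses(text):
    """Replace each delimiter with a line break, then return the non-empty lines."""
    broken = DELIMITER_PATTERN.sub("\n", text)
    return [line.strip() for line in broken.splitlines() if line.strip()]
```

For example, `split_into_clauses("価格は高い。操縦性は非常に良い！")` yields two clauses, each of which then serves as one text unit.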

The opinion word extraction rules may, for example, extract from a text unit, as opinion words, one or more of the following: a phrase immediately following a degree adverb; an adjective; a phrase containing no function word; a phrase no longer than the longest opinion word; and a phrase that appears more frequently than the opinion word with the lowest frequency of appearance.

In one embodiment, each text unit obtained from the corpus is searched according to the opinion word extraction rules to detect every phrase that may be an opinion word. For example, given the text unit "The fuel consumption rate of high-emission vehicles is very high" and the opinion word extraction rule "extract from the text unit, as an opinion word, the phrase immediately following a degree adverb," the text unit contains only one degree adverb, "very," so "high," which immediately follows "very," is extracted as an opinion word. Applying this processing to each text unit obtained from the corpus yields a set of opinion word candidates.
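The degree-adverb rule in this example can be sketched over a tokenized text unit. The adverb list below is hypothetical, and whitespace tokenization stands in for the word segmentation step; only the single rule "take the token right after a degree adverb" is implemented.

```python
# Hypothetical list of degree adverbs; a real list would be language-specific.
DEGREE_ADVERBS = {"very", "extremely", "too"}

def extract_opinion_candidates(tokens):
    """Return each token that immediately follows a degree adverb."""
    return [
        nxt
        for cur, nxt in zip(tokens, tokens[1:])
        if cur in DEGREE_ADVERBS and nxt not in DEGREE_ADVERBS
    ]

unit = "the fuel consumption rate of high-emission vehicles is very high".split()
candidates = extract_opinion_candidates(unit)
```

Running the function over every text unit and collecting the results gives the candidate opinion word set.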

The target feature extraction rules may, for example, extract from a text unit, as target features, one or more of the following: a base noun phrase; a combination of base noun phrases; a combination of a base noun phrase and a noun or gerund; a combination of a base noun phrase and a determiner; a combination of a determiner and a noun or gerund; a phrase containing no function word; a phrase no longer than the longest target feature; and a phrase that appears more frequently than the target feature with the lowest frequency of appearance.

In one embodiment, each text unit taken from the corpus is searched according to the opinion word extraction rules to detect every phrase that may be an opinion word; then, within each text unit in which an opinion word was detected, target features are detected from the context according to the target feature extraction rules. When a new target feature is detected, it is added to the target feature set. During this procedure, statistics are compiled on how often target features and opinion words appear as pairs. Knowing which opinion words in the opinion word set and which target features in the target feature set appeared in the same text unit, and how often, gives the relationship between the opinion word set and the target feature set.

ここで注目すべきは、当業者はここで開示した方法に限定されることなく、従来技術の適切な方法を使用して、意見語抽出規則と対象特徴抽出規則を定義できることである。 It should be noted here that a person skilled in the art can define opinion word extraction rules and target feature extraction rules using an appropriate method of the prior art without being limited to the method disclosed herein.

ステップＳ１０２において、意見語集合と対象特徴集合の関係、意見語集合の要素の類似度、および対象特徴集合の要素の類似度に基づいて、意見語集合と対象特徴集合が最適化される。 In step S102, the opinion word set and the target feature set are optimized based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set.

１つの実施例においては、ステップＳ１０２は以下の方法で実施される。１）意見語集合と対象特徴集合のうちの第１の集合のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、意見語集合と対象特徴集合のうちの第２の集合のスコアが計算され、２）第２の集合の要素の類似度を使って第２の集合のスコアが調整され、３）第２の集合の調整後のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、第１の集合のスコアが計算され、４）第１の集合の要素の類似度を使って第１の集合のスコアが調整され、第１の集合の調整後のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、第２の集合のスコアが計算され、５）第１の集合の調整前後のスコアの差が既定の要件を満たしているか、または、第２の集合の調整前後のスコアの差が既定の要件を満たしている場合には、第１の集合の調整後のスコアに従って第１の集合の要素がランク付けされ、かつ、第２の集合の調整後のスコアに従って第２の集合の要素がランク付けされる。次に、図２を参照して、この実施例について説明する。 In one embodiment, step S102 is performed as follows. 1) The score of the second set (of the opinion word set and the target feature set) is calculated from the score of the first set, based on the relationship between the opinion word set and the target feature set. 2) The score of the second set is adjusted using the similarity of the elements of the second set. 3) The score of the first set is calculated from the adjusted score of the second set, based on the relationship between the opinion word set and the target feature set. 4) The score of the first set is adjusted using the similarity of the elements of the first set, and the score of the second set is calculated from the adjusted score of the first set, based on the relationship between the opinion word set and the target feature set. 5) If the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement, or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement, the elements of the first set are ranked according to the adjusted score of the first set, and the elements of the second set are ranked according to the adjusted score of the second set. This embodiment is described next with reference to FIG. 2.

他の実施例では、ステップＳ１０２は以下の方法で実施される。１）意見語集合と対象特徴集合のうちの第１の集合のスコアは第１の集合の要素の類似度を使って調整され、２）第１の集合の調整後のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、意見語集合と対象特徴集合のうちの第２の集合のスコアが計算され、３）第２の集合の要素の類似度を使って第２の集合のスコアが調整され、第２の集合の調整後のスコアに従い、意見語集合と対象特徴集合との関係に基づいて、第１の集合のスコアが計算され、４）第１の集合の調整前後のスコアの差が既定の要件を満たしているか、または、第２の集合の調整前後のスコアの差が既定の要件を満たしている場合には、第１の集合の調整後のスコアに従って、第１の集合の要素がランク付けされ、かつ、第２の集合の調整後のスコアに従って、第２の集合の要素がランク付けされる。次に、図３を参照して、本実施例について説明する。 In another embodiment, step S102 is performed as follows. 1) The score of the first set (of the opinion word set and the target feature set) is adjusted using the similarity of the elements of the first set. 2) The score of the second set is calculated from the adjusted score of the first set, based on the relationship between the opinion word set and the target feature set. 3) The score of the second set is adjusted using the similarity of the elements of the second set, and the score of the first set is calculated from the adjusted score of the second set, based on the relationship between the opinion word set and the target feature set. 4) If the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement, or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement, the elements of the first set are ranked according to the adjusted score of the first set, and the elements of the second set are ranked according to the adjusted score of the second set. This embodiment is described next with reference to FIG. 3.

本発明においては、説明の便宜上、意見語集合と対象特徴集合のうちの１つを第１の集合とし、この集合とは異なる別の集合は第２の集合としている。第１の集合は意見語集合か対象特徴集合のいずれかであり、第２の集合も意見語集合と対象特徴集合のいずれかであるが、第１の集合は第２の集合と同じではない。つまり、第１の集合が意見語集合の場合には、第２の集合は対象特徴集合であり、第１の集合が対象特徴集合の場合には、第２の集合は意見語集合である。 In the present invention, for convenience of explanation, one of the opinion word set and the target feature set is called the first set, and the other is called the second set. The first set is either the opinion word set or the target feature set, and the second set is likewise either the opinion word set or the target feature set, but the first set is not the same as the second set. That is, when the first set is the opinion word set, the second set is the target feature set, and when the first set is the target feature set, the second set is the opinion word set.

ステップＳ１０３において、最適化された意見語集合と最適化された対象特徴集合に基づいて、評価情報が抽出される。 In step S103, evaluation information is extracted based on the optimized opinion word set and the optimized target feature set.

１つの実施例においては、ステップＳ１０３は以下の方法で実施される。 In one embodiment, step S103 is performed in the following manner.

まず、最適化された意見語集合から意見語の既定のしきい値に従って高ランクの意見語が抽出され、最適化された対象特徴集合から対象特徴の既定のしきい値に従って高ランクの対象特徴が抽出され、次に、意見語集合と対象特徴集合の関係に基づいて、高ランクの意見語と高ランクの対象特徴から評価情報が取得される。 First, high rank opinion words are extracted from the optimized opinion word set according to the predetermined threshold value of the opinion word, and high rank target features are extracted from the optimized target feature set according to the predetermined threshold value of the target feature. Next, based on the relationship between the opinion word set and the target feature set, evaluation information is acquired from the high rank opinion word and the high rank target feature.

意見語の既定のしきい値や対象特徴の既定のしきい値は、様々な方法で求めることができる。例えば、経験値に従って指定されるか、従来技術に従って求められるか、数学モデルに従って計算される適切な事前設定値でよく、または、当業者により設定された既定の適切な値であってもよい。 The predetermined threshold value of the opinion word and the predetermined threshold value of the target feature can be obtained by various methods. For example, it may be an appropriate preset value specified according to experience values, determined according to the prior art, calculated according to a mathematical model, or a predetermined appropriate value set by a person skilled in the art.

本実施例では、意見語集合の中で事前に設定された特定の順位しきい値より前にランク付けされた意見語が、高い順位の意見語として抽出される。例えば、意見語集合に１０，０００個の意見語が含まれている場合に、順位しきい値として５，０００が設定されていれば、上位５，０００個の意見語が抽出される。同様に、上記のような方法で、対象特徴集合の中で、事前に設定された特定の順位しきい値（例えば４，０００）よりも高い順位の対象特徴が抽出される。 In the present embodiment, opinion words ranked before a specific ranking threshold set in advance in the opinion word set are extracted as high ranking opinion words. For example, if 10,000 opinion words are included in the opinion word set and the ranking threshold is set to 5,000, the top 5,000 opinion words are extracted. Similarly, by the method as described above, a target feature having a higher rank than a specific ranking threshold (for example, 4,000) set in advance is extracted from the target feature set.

意見語集合と対象特徴集合の関係は、コーパス内のテキスト単位に従って取得されるため、上位５，０００個の意見語と上位４，０００個の対象特徴は、この関係に従ってペアにされる。こうして、同一のテキスト単位内で関係を有する意見語と対象特徴が２つ組のペアにされることにより、評価情報が取得される。 Since the relationship between the opinion word set and the target feature set is acquired according to the text unit in the corpus, the top 5,000 opinion words and the top 4,000 target features are paired according to this relationship. In this way, evaluation information is acquired by making a pair of opinion words and target features having a relationship within the same text unit.
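The thresholding and pairing described for step S103 can be sketched as follows. The sketch assumes both sets are already ranked from best to worst and that the relationship is available as pair counts per text unit; `extract_evaluations` and the sample data are hypothetical.

```python
def extract_evaluations(ranked_opinions, ranked_features, pair_counts, top_o, top_f):
    """Pair high-ranked opinion words with high-ranked target features.

    Only pairs that co-occurred in at least one text unit are kept.
    """
    top_opinions = set(ranked_opinions[:top_o])
    top_features = set(ranked_features[:top_f])
    return sorted(
        (opinion, feature)
        for (opinion, feature), count in pair_counts.items()
        if count > 0 and opinion in top_opinions and feature in top_features
    )

# Illustrative data; in the source the thresholds are 5,000 and 4,000.
ranked_opinions = ["high", "good", "blue"]
ranked_features = ["price", "maneuverability", "box"]
pair_counts = {
    ("high", "price"): 2,
    ("good", "maneuverability"): 1,
    ("blue", "box"): 1,
    ("high", "box"): 1,
}
evaluations = extract_evaluations(ranked_opinions, ranked_features, pair_counts, 2, 2)
```

With both thresholds set to 2, the pairs involving the low-ranked "blue" and "box" are dropped, and the remaining two-element pairs are the evaluation information.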

ここで注目すべきは、当業者は、ここで開示される方法に限らず、従来技術の適切な方法により、最適化された意見語集合と最適化された対象特徴集合に基づいて評価情報を抽出できることである。 It should be noted that a person skilled in the art is not limited to the method disclosed here, but the evaluation information based on the optimized opinion word set and the optimized target feature set by an appropriate method of the prior art. It can be extracted.

図１のフローはこれで終了する。 This is the end of the flow of FIG.

図２は、本発明の他の実施例による評価情報抽出方法のフローチャートである。 FIG. 2 is a flowchart of an evaluation information extraction method according to another embodiment of the present invention.

ステップＳ２０１において、コーパスから意見語集合と対象特徴集合が取得される。 In step S201, an opinion word set and a target feature set are acquired from the corpus.

このステップはステップＳ１０１と類似しているため、詳細は省略する。 Since this step is similar to step S101, the details are omitted.

ステップＳ２０２では、意見語集合と対象特徴集合のうちの第１の集合のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、意見語集合と対象特徴集合のうちの第２の集合のスコアが計算される。 In step S202, according to the score of the first set of the opinion word set and the target feature set, the second set of the opinion word set and the target feature set is determined based on the relationship between the opinion word set and the target feature set. A score is calculated.

第１の集合のスコアの初期値は多数の方法によって求めることができる。１つの実施例においては、まず、コーパスの第１の集合の各要素の頻度情報について統計が作成され、次に、事前に定義された方針に基づいて第１の集合のスコアが取得される。例えば、コーパスの第１の集合の各要素の頻度情報は、そのまま第１の集合のスコアとしてもよいが、事前に設定された重み付け係数を使って頻度情報に重み付けを行った上で第１の集合のスコアとすることもできる。具体的には、第１の集合が意見語集合である場合には、意見語が程度副詞辞書の程度副詞と共に使われているか否かと、そのような使われ方をしている頻度とを判定し、コーパスの第１の集合の各要素の頻度情報を修正することによって、第１の集合のスコアが取得される。 The initial value of the score of the first set can be obtained in a number of ways. In one embodiment, statistics are first compiled on the frequency of each element of the first set in the corpus, and the score of the first set is then obtained according to a predefined policy. For example, the frequency of each element of the first set in the corpus may be used directly as the score of the first set, or the frequency may first be weighted with a preset weighting factor. Specifically, when the first set is the opinion word set, the score of the first set is obtained by determining whether each opinion word is used together with a degree adverb from a degree adverb dictionary, and how often, and correcting the frequency of each element of the first set accordingly.
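A minimal sketch of such an initialization follows. The source does not give the correction formula, so the additive bonus used here (frequency plus a weighted count of uses with a degree adverb) is purely an assumption, as are the names `initial_scores`, `adverb_freq` and `weight`.

```python
def initial_scores(freq, adverb_freq=None, weight=1.0):
    """Initialize scores from corpus frequencies.

    freq:        element -> frequency in the corpus
    adverb_freq: element -> number of uses together with a degree adverb
                 (only meaningful when the first set is the opinion word set)
    weight:      preset weighting factor (assumed additive correction)
    """
    adverb_freq = adverb_freq or {}
    return {
        element: count + weight * adverb_freq.get(element, 0)
        for element, count in freq.items()
    }

# Frequencies only: the score is the raw frequency.
plain = initial_scores({"good": 1, "high": 3})

# Frequencies corrected by how often each word follows a degree adverb.
corrected = initial_scores({"good": 1, "high": 3}, {"good": 1, "high": 2}, weight=0.5)
```

Other policies, such as multiplying the frequency by the weighting factor, would fit the description equally well.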

前述のように、意見語集合から対象特徴集合への変換関係Ｔ_Ｏ−Ｆ、および対象特徴集合から意見語集合への変換関係Ｔ_Ｆ−Ｏは、意見語集合と対象特徴集合との関係に従って求められる。 As described above, the conversion relationship T _O-F from the opinion word set to the target feature set and the conversion relationship T _F-O from the target feature set to the opinion word set are in accordance with the relationship between the opinion word set and the target feature set. Desired.

第１の集合が意見語集合である場合、対象特徴集合のスコアは、変換関係Ｔ_Ｏ−Ｆを使い、意見語集合のスコアに従って計算される。第１の集合が対象特徴集合である場合、意見語集合のスコアは、変換関係Ｔ_Ｆ−Ｏを使い、対象特徴集合のスコアに従って計算される。この２つの計算手順は対称的であり、両者ともそのままベクトル行列乗算を実行するか、重み付け係数で重み付けを行った後でベクトル行列乗算を実行してもよく、また当業者によって実行可能なその他の適切な方法によって実行することもできる。 When the first set is the opinion word set, the score of the target feature set is calculated from the score of the opinion word set using the conversion relation T_O-F. When the first set is the target feature set, the score of the opinion word set is calculated from the score of the target feature set using the conversion relation T_F-O. The two calculations are symmetric: each may be carried out as a plain vector-matrix multiplication, as a vector-matrix multiplication after weighting with a weighting factor, or by any other suitable method available to those skilled in the art.
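The plain vector-matrix multiplication mentioned above can be sketched as follows. The matrix layout (rows are opinion words, columns are target features, entries are co-occurrence counts) and the choice of taking T_F-O as the transpose of T_O-F are assumptions made for illustration; the source does not specify either.

```python
def propagate(scores, T):
    """Row vector times matrix: result[j] = sum_i scores[i] * T[i][j]."""
    return [
        sum(score * row[j] for score, row in zip(scores, T))
        for j in range(len(T[0]))
    ]

def transpose(T):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*T)]

# Rows: opinion words ["good", "high"].
# Columns: target features ["fuel consumption rate", "price", "maneuverability"].
T_OF = [
    [0, 0, 1],
    [1, 2, 0],
]
T_FO = transpose(T_OF)  # assumed form of the reverse conversion relation

opinion_scores = [1, 3]
feature_scores = propagate(opinion_scores, T_OF)    # opinion words -> target features
back_to_opinions = propagate(feature_scores, T_FO)  # target features -> opinion words
```

A weighted variant would scale `scores` or the entries of `T` by the weighting factor before the multiplication.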

ステップＳ２０３において、第２の集合の要素の類似度を使って第２の集合のスコアが調整される。 In step S203, the score of the second set is adjusted using the similarity of the elements of the second set.

１つの実施例では、第２の集合の要素の類似度を使うことによって行われる第２の集合のスコアの調整は、第２の集合の要素の類似度と先験的な信頼度に基づき、第２の集合のスコアを調整することによって調整スコアを求め、調整スコアと第２の集合のスコアの差が既定の要件を満たしている場合には、調整スコアを第２の集合の調整後スコアと判定して第２の集合のスコアの調整を停止し、調整スコアと第２の集合のスコアとの差が既定の要件を満たしていない場合には、第２の集合のスコアを調整スコアで更新する、という方法で行われる。 In one embodiment, the score of the second set is adjusted using the similarity of its elements as follows. An adjustment score is obtained by adjusting the score of the second set based on the similarity of the elements of the second set and an a priori confidence. If the difference between the adjustment score and the score of the second set satisfies a predetermined requirement, the adjustment score is taken as the adjusted score of the second set and the adjustment stops. If the difference does not satisfy the predetermined requirement, the score of the second set is updated with the adjustment score.
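The adjustment loop described above can be sketched as a fixed-point iteration. The passage does not state the update formula, so the mixing rule used here, `alpha * (similarity x score) + (1 - alpha) * prior`, is an assumption chosen only to show the control flow; `adjust_scores`, `alpha`, `tol` and `max_iter` are hypothetical names.

```python
def adjust_scores(scores, similarity, prior, alpha=0.5, tol=1e-6, max_iter=100):
    """Iteratively adjust scores using element similarity and a prior confidence.

    Stops when the adjustment score differs from the current score by less
    than `tol` (L1 distance) or after `max_iter` updates.
    """
    current = list(scores)
    n = len(current)
    for _ in range(max_iter):
        adjusted = [
            alpha * sum(similarity[i][j] * current[j] for j in range(n))
            + (1 - alpha) * prior[i]
            for i in range(n)
        ]
        if sum(abs(a - c) for a, c in zip(adjusted, current)) < tol:
            return adjusted  # requirement met: this is the adjusted score
        current = adjusted   # requirement not met: update and repeat
    return current

# Two elements that are fully similar to each other (rows sum to 1).
similarity = [[0.5, 0.5], [0.5, 0.5]]
prior = [1.0, 0.0]
adjusted = adjust_scores([1.0, 0.0], similarity, prior)
```

In this toy case the second element inherits part of the first element's score through the similarity term, while the prior keeps the first element ahead.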

ステップＳ２０４において、第２の集合の調整後のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、第１の集合のスコアが計算される。 In step S204, according to the adjusted score of the second set, the score of the first set is calculated based on the relationship between the opinion word set and the target feature set.

前述のように、意見語集合から対象特徴集合への変換関係Ｔ_Ｏ−Ｆ、および対象特徴集合から意見語集合への変換関係Ｔ_Ｆ−Ｏは、意見語集合と対象特徴集合の関係に従って取得される。 As described above, the conversion relationship T _O-F from the opinion word set to the target feature set and the conversion relationship T _F-O from the target feature set to the opinion word set are acquired according to the relationship between the opinion word set and the target feature set. Is done.

第２の集合が意見語集合である場合には、意見語集合のスコアに従い、変換関係Ｔ_Ｏ−Ｆを使って、対象特徴集合のスコアが計算される。第２の集合が対象特徴集合である場合には、対象特徴集合のスコアに従い、変換関係Ｔ_Ｆ−Ｏを使って、意見語集合のスコアが計算される。この２つの計算手順は対称的であり、両者ともそのままベクトル行列乗算を実行するか、重み付け係数で重み付けを行った後でベクトル行列乗算を実行してもよく、また当業者によって実行可能なその他の適切な方法によって実行することもできる。 When the second set is the opinion word set, the score of the target feature set is calculated from the score of the opinion word set using the conversion relation T_O-F. When the second set is the target feature set, the score of the opinion word set is calculated from the score of the target feature set using the conversion relation T_F-O. The two calculations are symmetric: each may be carried out as a plain vector-matrix multiplication, as a vector-matrix multiplication after weighting with a weighting factor, or by any other suitable method available to those skilled in the art.

ステップＳ２０５において、第１の集合の要素の類似度を使って第１の集合のスコアが調整される。 In step S205, the score of the first set is adjusted using the similarity of the elements of the first set.

１つの実施例においては、第１の集合の要素の類似度を使った第１の集合のスコアの調整は、第１の集合の要素の類似度と先験的な信頼度に基づき、第１の集合のスコアを調整することによって調整スコアを求め、調整スコアと第１の集合のスコアとの差が既定の要件を満たしている場合には、調整スコアを第１の集合の調整後スコアと判定して第１の集合のスコアの調整を停止し、調整スコアと第１の集合のスコアとの差が既定の要件を満たしていない場合には、第１の集合のスコアを調整スコアで更新する、という方法で行われる。 In one embodiment, the score of the first set is adjusted using the similarity of its elements as follows. An adjustment score is obtained by adjusting the score of the first set based on the similarity of the elements of the first set and an a priori confidence. If the difference between the adjustment score and the score of the first set satisfies a predetermined requirement, the adjustment score is taken as the adjusted score of the first set and the adjustment stops. If the difference does not satisfy the predetermined requirement, the score of the first set is updated with the adjustment score.

この実施例においては、調整スコアと第１の集合のスコアとの差が既定の要件を満たしているか否かの判定は、様々な方法で行うことができる。例えば、実行された調整の回数をカウントしてもよい。カウント結果が事前に設定された反復回数を超えた場合には、調整スコアと第１の集合のスコアとの差は既定の要件を満たしているとみなされる。もう１つの例を挙げれば、調整スコアと第１の集合のスコアとの差が取得され、その差が既定のしきい値よりも小さい場合には、両者の差が既定要件を満たしているとみなされる。さらにもう１つの例を挙げれば、調整スコアと第１の集合のスコアとの角度コサイン値が計算され、角度コサイン値が既定のしきい値よりも小さい場合には、両者の差は既定の要件を満たしているとみなされる。さらに、当業者はここで開示した方法に限定されず、従来技術の適切な方法でこの判定を行うことができる。 In this embodiment, whether the difference between the adjustment score and the score of the first set satisfies the predetermined requirement can be determined in various ways. For example, the number of adjustments performed may be counted; if the count exceeds a preset number of iterations, the difference between the adjustment score and the score of the first set is regarded as satisfying the predetermined requirement. As another example, the difference between the adjustment score and the score of the first set is obtained, and if it is smaller than a predetermined threshold, the difference is regarded as satisfying the predetermined requirement. As yet another example, the angle cosine value between the adjustment score and the score of the first set is calculated, and if the angle cosine value is smaller than a predetermined threshold, the difference is regarded as satisfying the predetermined requirement. Those skilled in the art are not limited to the methods disclosed here and may make this determination by any suitable prior-art method.

ステップＳ２０６において、第１の集合の調整前後のスコアの差が既定の要件を満たしているか否かが判定される。 In step S206, it is determined whether the difference between the scores before and after adjustment of the first set satisfies a predetermined requirement.

ステップＳ２０６は多数の方法で実行することができる。例えば、第１の集合のスコアが調整される回数をカウントしてもよい。カウント結果が事前に設定された反復回数を超えた場合には、第１の集合の調整前後のスコアの差は既定の要件を満たしているとみなされる。もう１つの例を挙げれば、第１の集合の調整前後のスコアの差が取得され、スコアの差が既定のしきい値よりも小さい場合には、両者の差は既定の要件を満たしているとみなされる。さらにもう１つの例を挙げれば、第１の集合の調整前後のスコアの角度コサイン値を計算し、角度コサイン値が既定のしきい値よりも小さい場合には、両者の差は既定の要件を満たしているとみなされる。さらに、当業者はここで開示した方法に限定されず、従来技術の適切な方法でこの判定を行うことができる。 Step S206 can be performed in a number of ways. For example, the number of times the score of the first set is adjusted may be counted. If the count result exceeds a preset number of iterations, the difference in score before and after adjustment of the first set is considered to satisfy the predetermined requirement. As another example, if the difference between the scores before and after the first set of adjustments is obtained and the score difference is smaller than a predetermined threshold, the difference between the two satisfies the predetermined requirement. Is considered. As yet another example, if the angle cosine value of the score before and after the first set of adjustments is calculated and the angle cosine value is smaller than a predetermined threshold, the difference between the two will satisfy the predetermined requirement. It is considered to satisfy. Further, those skilled in the art are not limited to the methods disclosed herein, and can make this determination using any suitable method of the prior art.
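The three example criteria can be sketched as small predicates. The source words the third criterion as the angle cosine value being below a threshold; the sketch compares the angle itself instead, on the assumption that a small angle (cosine close to 1) is what indicates that the scores have stopped changing. All function names are hypothetical.

```python
import math

def reached_iteration_limit(count, max_iterations):
    """Criterion 1: the number of adjustments exceeds a preset limit."""
    return count > max_iterations

def difference_below(before, after, threshold):
    """Criterion 2: the L1 difference between the score vectors is small."""
    return sum(abs(a - b) for a, b in zip(after, before)) < threshold

def angle_below(before, after, threshold):
    """Criterion 3 (assumed reading): the angle between the vectors is small."""
    dot = sum(a * b for a, b in zip(after, before))
    norm = math.sqrt(sum(b * b for b in before)) * math.sqrt(sum(a * a for a in after))
    if norm == 0:
        return False  # the angle is undefined for a zero vector
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cosine) < threshold
```

Any one of the predicates, or a combination such as "small difference or iteration limit reached", could serve as the predetermined requirement in step S206.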

本発明の他の実施例では、ステップＳ２０６で述べたステップが、第２の集合の調整前後のスコアの差が既定の要件を満たしているか否かの判定に置き換えられる。 In another embodiment of the present invention, the step described in step S206 is replaced with a determination as to whether or not the difference between the scores before and after the adjustment of the second set satisfies a predetermined requirement.

Ｓ２０６において、第１の集合の調整前後のスコアの差が既定の要件を満たしていると判定された場合、このフローはさらにステップＳ２０７へと進むが、このスコアの差が既定の要件を満たしていないと判定された場合にはステップＳ２０２に戻る。 If it is determined in S206 that the difference between the scores before and after the adjustment of the first set satisfies the predetermined requirement, the flow further proceeds to step S207, but the difference in score satisfies the predetermined requirement. If it is determined that there is no, the process returns to step S202.

ステップＳ２０７において、第１の集合の要素は第１の集合の調整後のスコアに従ってランク付けされ、第２の集合の要素は第２の集合の調整後のスコアに従ってランク付けされる。 In step S207, the elements of the first set are ranked according to the adjusted score of the first set, and the elements of the second set are ranked according to the adjusted score of the second set.

第１の集合の調整後のスコアと第２の集合の調整後のスコアは、いずれもＳ２０２からＳ２０６までのステップを何度も繰り返すことによって求められる。 The adjusted score of the first set and the adjusted score of the second set are both obtained by repeating the steps from S202 to S206 many times.

第１の集合の調整後のスコアは、第１の集合の各要素に対応するスコアを値とするベクトルである。したがって、第１の集合の要素は、第１の集合の要素がそれぞれの対応するスコア値の大きさの順にランクされるように再度ランク付けされる。これにより得られた第１の集合は、最適化された第１の集合である。 The adjusted score of the first set is a vector whose value is a score corresponding to each element of the first set. Accordingly, the elements of the first set are re-ranked so that the elements of the first set are ranked in order of their corresponding score values. The first set thus obtained is the optimized first set.

同様に、第２の集合の調整後のスコアは、第２の集合の各要素に対応するスコアを値とするベクトルである。したがって、第２の集合の要素は、第２の集合の調整後のスコアに従って、第２の集合の要素がそれぞれの対応するスコア値の大きさの順にランクされるように再度ランク付けされる。これにより得られた第２の集合は、最適化された第２の集合である。 Similarly, the adjusted score of the second set is a vector whose value is a score corresponding to each element of the second set. Thus, the elements of the second set are re-ranked so that the elements of the second set are ranked in order of their corresponding score values according to the adjusted score of the second set. The second set thus obtained is the optimized second set.
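Ranking a set by its adjusted score vector amounts to a sort. The sketch below assumes that "in order of score value" means descending order, so the highest-scoring element comes first; `rank_by_score` is a hypothetical name.

```python
def rank_by_score(elements, scores):
    """Return the elements ordered from highest to lowest score."""
    order = sorted(range(len(elements)), key=lambda i: scores[i], reverse=True)
    return [elements[i] for i in order]

ranked_opinions = rank_by_score(["good", "high"], [1.0, 3.0])
ranked_features = rank_by_score(
    ["fuel consumption rate", "price", "maneuverability"], [3.0, 6.0, 1.0]
)
```

The same function applies to either set, since each adjusted score is a vector with one entry per element.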

ステップＳ２０８において、最適化された意見語集合と最適化された対象特徴集合に基づいて、評価情報が抽出される。 In step S208, evaluation information is extracted based on the optimized opinion word set and the optimized target feature set.

このステップはステップＳ１０３と類似しているため、ここでは詳細を省略する。 Since this step is similar to step S103, details are omitted here.

図２のフローはこれで終了する。 This is the end of the flow of FIG.

ここで注目すべきは、本発明の１つの実施例において、図２に示す実施例は、コーパスの第１の集合の各要素の頻度情報に基づき、事前に定義された方針に従って、第１の集合のスコアを初期化するというステップをさらに備えることである。事前に定義された方針に従って行う初期化は、コーパスの第１の集合の各要素の頻度情報をそのまま使って第１の集合のスコアを初期化する方法や、事前に設定された重み付け係数を使って頻度情報に重み付けを行うことにより第１の集合のスコアを初期化する方法で実施することができる。具体的には、第１の集合が意見語集合である場合には、意見語が程度副詞辞書の程度副詞と共に使われているか否かと、そのような使われ方をしている頻度とを判定して、コーパスの第１の集合の各要素の頻度情報が修正され、その修正後の頻度情報を使用して第１の集合のスコアが初期化される。 It should be noted that, in one embodiment of the present invention, the embodiment shown in FIG. 2 further comprises a step of initializing the score of the first set according to a predefined policy, based on the frequency of each element of the first set in the corpus. The initialization may use the frequency of each element of the first set in the corpus directly as the score of the first set, or may weight the frequency with a preset weighting factor. Specifically, when the first set is the opinion word set, it is determined whether each opinion word is used together with a degree adverb from a degree adverb dictionary, and how often; the frequency of each element of the first set is corrected accordingly, and the corrected frequency is used to initialize the score of the first set.

本発明においては、上述したように、第１の集合は意見語集合か対象特徴集合のいずれかであり、第２の集合も意見語集合と対象特徴集合のいずれかであるが、第１の集合は第２の集合と同一ではない。 In the present invention, as described above, the first set is either the opinion word set or the target feature set, and the second set is either the opinion word set or the target feature set. The set is not the same as the second set.

Ｓ２０１からＳ２０８までのステップを実施するための本発明の１つの実施例において、第１の集合は対象特徴集合であり、第２の集合は意見語集合である。ステップＳ２０１でコーパスから意見語集合と対象特徴集合が取得された後、ステップＳ２０２において、対象特徴集合のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、意見語集合のスコアが計算される。次に、ステップＳ２０３において、ステップＳ２０２から取得された意見語集合のスコアが、意見語集合の要素の類似度を使って調整される。その後、ステップＳ２０４において、ステップＳ２０３で調整された意見語集合のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、対象特徴集合のスコアが計算される。次に、ステップＳ２０５において、対象特徴集合の要素の類似度を使って、対象特徴集合のスコアが調整される。その後、ステップＳ２０６において、ステップＳ２０５で調整された対象特徴集合のスコアと調整前のスコアの差が既定の要件を満たしているか否かが判定される。スコアの差が既定の要件を満たしていない場合には、フローはステップＳ２０２に戻るが、スコアの差が既定の要件を満たしている場合には、フローはさらにステップＳ２０７へと進み、ステップＳ２０５で調整された対象特徴集合のスコアに従って対象特徴集合の要素をランク付けすることにより、最適化された対象特徴集合が取得され、かつ、ステップＳ２０３で調整された意見語集合のスコアに従って意見語集合の要素をランク付けすることにより、最適化された意見語集合が取得される。最後に、ステップＳ２０８において、最適化された意見語集合と最適化された対象特徴集合に基づいて評価情報が抽出される。 In one embodiment of the present invention for performing the steps from S201 to S208, the first set is a target feature set and the second set is an opinion word set. After the opinion word set and the target feature set are acquired from the corpus in step S201, the score of the opinion word set is calculated in step S202 based on the relationship between the opinion word set and the target feature set according to the score of the target feature set. The Next, in step S203, the score of the opinion word set acquired from step S202 is adjusted using the similarity of the elements of the opinion word set. Thereafter, in step S204, the score of the target feature set is calculated based on the relationship between the opinion word set and the target feature set according to the score of the opinion word set adjusted in step S203. Next, in step S205, the score of the target feature set is adjusted using the similarity of the elements of the target feature set. Thereafter, in step S206, it is determined whether or not the difference between the score of the target feature set adjusted in step S205 and the score before adjustment satisfies a predetermined requirement. 
If the difference in score does not satisfy the predetermined requirement, the flow returns to step S202, but if the difference in score satisfies the predetermined requirement, the flow further proceeds to step S207, and in step S205. By ranking the elements of the target feature set according to the adjusted score of the target feature set, an optimized target feature set is obtained, and the opinion word set is scored according to the score of the opinion word set adjusted in step S203. By ranking the elements, an optimized opinion word set is obtained. Finally, in step S208, evaluation information is extracted based on the optimized opinion word set and the optimized target feature set.

Ｓ２０１からＳ２０８までのステップを実施するための本発明の他の実施例において、第１の集合は意見語集合であり、第２の集合は対象特徴集合である。ステップＳ２０１でコーパスから意見語集合と対象特徴集合が取得された後、ステップＳ２０２において、意見語集合のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、対象特徴集合のスコアが計算される。次に、ステップＳ２０３において、ステップＳ２０２から取得された対象特徴集合のスコアは、対象特徴集合の要素の類似度を使って調整される。その後、ステップＳ２０４において、ステップＳ２０３で調整された対象特徴集合のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、意見語集合のスコアが計算される。次に、ステップＳ２０５において、意見語集合の要素の類似度を使って、意見語集合のスコアが調整される。その後、ステップＳ２０６において、ステップＳ２０５で調整された意見語集合のスコアと調整前のスコアの差が既定の要件を満たしているか否かが判定される。スコアの差が既定の要件を満たしていない場合には、フローはステップＳ２０２に戻るが、スコアの差が既定の要件を満たしている場合には、フローはさらにステップＳ２０７へと進み、ステップＳ２０５で調整された意見語集合のスコアに従って意見語集合の要素をランク付けすることにより、最適化された意見語集合が取得され、かつ、ステップＳ２０３で調整された対象特徴集合のスコアに従って対象特徴集合の要素をランク付けすることにより、最適化された対象特徴集合が取得される。最後に、ステップＳ２０８において、最適化された意見語集合と最適化された対象特徴集合に基づいて、評価情報が抽出される。 In another embodiment of the present invention for carrying out the steps from S201 to S208, the first set is an opinion word set and the second set is a target feature set. After the opinion word set and the target feature set are acquired from the corpus in step S201, in step S202, the score of the target feature set is calculated based on the relationship between the opinion word set and the target feature set according to the score of the opinion word set. The Next, in step S203, the score of the target feature set acquired from step S202 is adjusted using the similarity of the elements of the target feature set. Thereafter, in step S204, the score of the opinion word set is calculated based on the relationship between the opinion word set and the target feature set according to the score of the target feature set adjusted in step S203. Next, in step S205, the score of the opinion word set is adjusted using the similarity of the elements of the opinion word set. Thereafter, in step S206, it is determined whether or not the difference between the score of the opinion word set adjusted in step S205 and the score before adjustment satisfies a predetermined requirement. 
If the difference in score does not satisfy the predetermined requirement, the flow returns to step S202, but if the difference in score satisfies the predetermined requirement, the flow further proceeds to step S207, and in step S205. By ranking the elements of the opinion word set according to the score of the adjusted opinion word set, an optimized opinion word set is obtained, and the target feature set according to the score of the target feature set adjusted in step S203 is obtained. By ranking the elements, an optimized target feature set is obtained. Finally, in step S208, evaluation information is extracted based on the optimized opinion word set and the optimized target feature set.
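The loop over steps S202 to S207 can be sketched end to end as follows. Several choices here are assumptions made only so that the example runs: scores are normalized to sum to 1 after each step, the reverse conversion relation is the transpose of the forward one, the adjustment is a single mixing step with a uniform prior, and the S206 check compares the adjusted first-set score with the one from the previous round. All names are hypothetical.

```python
def normalize(v):
    total = sum(v)
    return [x / total for x in v] if total else list(v)

def propagate(scores, T):
    # Row vector times matrix (steps S202 and S204).
    return [sum(s * row[j] for s, row in zip(scores, T)) for j in range(len(T[0]))]

def adjust(scores, sim, prior, alpha=0.8):
    # One assumed adjustment step using similarity and a prior (S203, S205).
    n = len(scores)
    mixed = [alpha * sum(sim[i][j] * scores[j] for j in range(n))
             + (1 - alpha) * prior[i] for i in range(n)]
    return normalize(mixed)

def optimize(s1, T_12, sim_1, sim_2, prior_1, prior_2, tol=1e-9, max_rounds=100):
    T_21 = [list(col) for col in zip(*T_12)]  # assumed reverse relation
    s1, s2 = normalize(s1), []
    for _ in range(max_rounds):
        s2 = adjust(normalize(propagate(s1, T_12)), sim_2, prior_2)      # S202, S203
        new_s1 = adjust(normalize(propagate(s2, T_21)), sim_1, prior_1)  # S204, S205
        done = sum(abs(a - b) for a, b in zip(new_s1, s1)) < tol         # S206
        s1 = new_s1
        if done:
            break
    return s1, s2

def rank(elements, scores):
    # Step S207: highest score first.
    return [e for _, e in sorted(zip(scores, elements), key=lambda p: p[0], reverse=True)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# First set: opinion words; second set: target features (illustrative data).
opinions = ["good", "high"]
features = ["fuel consumption rate", "price", "maneuverability"]
T_OF = [[0, 0, 1], [1, 2, 0]]
o_scores, f_scores = optimize([1, 3], T_OF, identity(2), identity(3),
                              [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3])
ranked_opinions = rank(opinions, o_scores)
ranked_features = rank(features, f_scores)
```

Swapping the roles of the two sets, with the target feature scores as the starting vector and the transposed matrix as the forward relation, gives the other embodiment described above.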

図３は、本発明の他の実施例による評価情報抽出方法のフローチャートである。 FIG. 3 is a flowchart of an evaluation information extraction method according to another embodiment of the present invention.

ステップＳ３０１において、コーパスから意見語集合と対象特徴集合が取得される。 In step S301, an opinion word set and a target feature set are acquired from the corpus.

ステップＳ３０２において、第１の集合の要素の類似度を使って、意見語集合と対象特徴集合のうちの第１の集合のスコアが調整される。 In step S302, the score of the first set of the opinion word set and the target feature set is adjusted using the similarity of the elements of the first set.

このステップの他の部分は、ステップＳ２０５に類似しているため、詳細は省略する。 Since the other part of this step is similar to step S205, details are omitted.

ステップＳ３０３において、第１の集合の調整後のスコアに従い、意見語集合と対象特徴集合の関係に基づいて、意見語集合と対象特徴集合のうちの第２の集合のスコアが計算される。このステップはステップＳ２０２と類似している。 In step S303, according to the adjusted score of the first set, the score of the second set of the opinion word set and the target feature set is calculated based on the relationship between the opinion word set and the target feature set. This step is similar to step S202.

ステップＳ３０４において、第２の集合の要素の類似度を使って第２の集合のスコアが調整される。このステップはステップＳ２０３と類似している。 In step S304, the score of the second set is adjusted using the similarity of the elements of the second set. This step is similar to step S203.

ステップＳ３０５において、第２の集合の調整前後のスコアの差が既定の要件を満たしているか否かが判定される。このステップはステップＳ２０６と類似している。 In step S305, it is determined whether or not the difference between the scores before and after adjustment of the second set satisfies a predetermined requirement. This step is similar to step S206.

本発明の他の実施例では、ステップＳ３０５で述べたステップが、第１の集合の調整前後のスコアの差が既定の要件を満たしているか否かの判定に置き換えられる。 In another embodiment of the present invention, the step described in step S305 is replaced with a determination as to whether or not the difference between the scores before and after the adjustment of the first set satisfies a predetermined requirement.

ステップＳ３０５での判定結果が「はい」の場合には、フローはさらにステップＳ３０６へと進み、判定結果が「いいえ」の場合には、フローはステップＳ３０２に戻る。 If the determination result in step S305 is “Yes”, the flow further proceeds to step S306, and if the determination result is “No”, the flow returns to step S302.

ステップＳ３０６では、第１の集合の要素は、第１の集合の調整後のスコアに従ってランク付けされ、第２の集合の要素は、第２の集合の調整後のスコアに従ってランク付けされる。このステップはステップＳ２０７と類似している。 In step S306, the elements of the first set are ranked according to the adjusted score of the first set, and the elements of the second set are ranked according to the adjusted score of the second set. This step is similar to step S207.

ステップＳ３０７では、最適化された意見語集合と最適化された対象特徴集合に基づいて、評価情報が抽出される。 In step S307, evaluation information is extracted based on the optimized opinion word set and the optimized target feature set.

図３のフローはこれで終了する。 The flow in FIG. 3 ends here.

ここで注目すべきは、本発明の１つの実施例において、図３に示す実施例は、コーパスの第１の集合の各要素の頻度情報に基づき、事前に定義された方針に従って、第１の集合のスコアを初期化するというステップをさらに備えることである。事前に定義された方針に従って行う初期化は、コーパスの第１の集合の各要素の頻度情報をそのまま使うことによって第１の集合のスコアが初期化される方法や、事前設定された重み付け係数を使って頻度情報に重み付けをすることによって、第１の集合のスコアが初期化されるという方法で実施することができる。具体的には、第１の集合が意見語集合である場合には、意見語が程度副詞辞書の程度副詞と共に使われているか否かと、そのような使われ方をしている頻度とを判定することによって、コーパスの第１の集合の各要素の頻度情報が修正され、その修正後の頻度情報が第１の集合のスコアを初期化するために使用される。 It should be noted that, in one embodiment of the present invention, the embodiment shown in FIG. 3 further comprises a step of initializing the score of the first set according to a predefined policy, based on the frequency of each element of the first set in the corpus. The initialization may use the frequency of each element of the first set in the corpus directly as the score of the first set, or may weight the frequency with a preset weighting factor. Specifically, when the first set is the opinion word set, it is determined whether each opinion word is used together with a degree adverb from a degree adverb dictionary, and how often; the frequency of each element of the first set is corrected accordingly, and the corrected frequency is used to initialize the score of the first set.

本発明においては、上述したように、第１の集合は意見語集合か対象特徴集合のいずれかであり、第２の集合も意見語集合と対象特徴集合のいずれかであるが、第１の集合は第２の集合と同一ではない。 In the present invention, as described above, the first set is either the opinion word set or the target feature set, and the second set is likewise either the opinion word set or the target feature set, but the first set is not the same as the second set.

Ｓ３０１からＳ３０７までのステップを実施するための本発明の１つの実施例において、第１の集合は対象特徴集合であり、第２の集合は意見語集合である。Ｓ３０１からＳ３０７までのステップを実施するための本発明の他の実施例において、第１の集合は意見語集合であり、第２の集合は対象特徴集合である。 In one embodiment of the present invention for performing the steps from S301 to S307, the first set is a target feature set and the second set is an opinion word set. In another embodiment of the present invention for carrying out the steps from S301 to S307, the first set is an opinion word set and the second set is a target feature set.

FIG. 4 is a flowchart of an evaluation information extraction method according to another embodiment of the present invention. FIG. 4 illustrates the embodiment shown in FIG. 2 for the case where the first set is the opinion word set and the second set is the target feature set.

In step S401, an opinion word set and a target feature set are acquired from the corpus.

In this embodiment, assume that preprocessing the corpus yields the following four text units: "The fuel consumption rate of high-emission vehicles is very high", "The price of this mobile phone model is high", "The maneuverability of this brand of car is very good", and "The price of the printer I bought is too high".

The opinion word set is acquired according to an opinion word extraction rule, and the target feature set is acquired according to a target feature extraction rule.

Based on the above four text units of the corpus, applying the opinion word extraction rule yields an opinion word set containing the two opinion words "good" and "high", and the frequency of occurrence of "good" is determined to be 1 and that of "high" to be 3.

Based on the above four text units of the corpus, applying the target feature extraction rule yields a target feature set containing the three target features "fuel consumption rate", "price", and "maneuverability", and the frequency of occurrence of "fuel consumption rate" is determined to be 1, that of "price" to be 2, and that of "maneuverability" to be 1.
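The frequency counts above can be reproduced with a short sketch. It assumes, purely for illustration, that each text unit has already been reduced to a tuple of (target feature, degree adverb or None, opinion word); the extraction rules that would produce such tuples are not modelled here, and the name `TEXT_UNITS` is hypothetical.

```python
from collections import Counter

# Hypothetical pre-segmented form of the four text units of this embodiment:
# (target feature, degree adverb or None, opinion word)
TEXT_UNITS = [
    ("fuel consumption rate", "very", "high"),
    ("price", None, "high"),
    ("maneuverability", "very", "good"),
    ("price", "too", "high"),
]

# Frequency of occurrence of each opinion word and each target feature.
opinion_freq = Counter(opinion for _, _, opinion in TEXT_UNITS)
feature_freq = Counter(feature for feature, _, _ in TEXT_UNITS)
```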

In step S402, the score of the target feature set is calculated from the score of the opinion word set, based on the relationship between the opinion word set and the target feature set.

The relationship between the two sets can be determined from the opinion word set and the target feature set acquired in step S401, as shown in Table 1. This relationship, denoted C, is expressed in the following matrix form.
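One plausible way to build such a relationship is to count co-occurrences within text units. The sketch below makes that assumption; the layout (rows for target features, columns for opinion words) and the helper name `cooccurrence` are choices made for this example and are not taken from Table 1.

```python
FEATURES = ["fuel consumption rate", "price", "maneuverability"]
OPINIONS = ["good", "high"]

# Hypothetical (target feature, opinion word) pairs, one per text unit.
PAIRS = [
    ("fuel consumption rate", "high"),
    ("price", "high"),
    ("maneuverability", "good"),
    ("price", "high"),
]

def cooccurrence(pairs, features, opinions):
    """Count how often each feature and opinion word share a text unit."""
    matrix = [[0] * len(opinions) for _ in features]
    for feature, opinion in pairs:
        matrix[features.index(feature)][opinions.index(opinion)] += 1
    return matrix

C = cooccurrence(PAIRS, FEATURES, OPINIONS)
# Rows: fuel consumption rate, price, maneuverability; columns: good, high.
# C == [[0, 1], [0, 2], [1, 0]]
```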

As described for step S202, the initial value of the score of an opinion word can be obtained in a number of ways. For example, in one embodiment of step S402, a predefined degree adverb dictionary is used. If an opinion word is used once together with a degree adverb listed in the degree adverb dictionary, the score of that opinion word is 1 (the initial value of the score of an opinion word is set to 0). In this way, the score of an opinion word can be obtained by counting how often the opinion word is used together with a degree adverb. The score of each opinion word is obtained by doing this for every opinion word in the opinion word set. In the present invention, the scores of the individual opinion words in the opinion word set are collectively referred to as the "score of the opinion word set".

The degree adverb dictionary can contain one or more adverbs expressing degree, such as "very (很)", "most (最)", "extremely (極)", "too much (太)", "very (非常)", "fully (十分)", "even (更)", "even more (更加)", "more (越)", "excessively (過)", "still more (越出)", "exceedingly (極其)", "exceptionally (格外)", "especially (分外)", "a little (有点儿)", "specially (偏)", "somewhat (稍)", "just a little (稍微)", "almost (几乎)", "slightly (略微)", "too (過于)", and "particularly (尤其)".

In the text units acquired from the corpus, the frequency of occurrence of the opinion word "good" is 1 and that of "high" is 3. Because "very" and "too" are degree adverbs defined in the degree adverb dictionary, the opinion word "good" follows a degree adverb once and the opinion word "high" follows a degree adverb twice. As a result, the score of the opinion word set, denoted O_{score}, is as follows.

O_{score} = (1, 2)^{T}    (2)
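The counting that leads to formula (2) can be sketched as below. The two-entry adverb set and the (degree adverb, opinion word) tuples are simplifying assumptions for this example; an actual dictionary would hold the full list of degree adverbs.

```python
# Hypothetical, reduced degree adverb dictionary for the example corpus.
DEGREE_ADVERBS = {"very", "too"}
OPINIONS = ["good", "high"]

# (degree adverb or None, opinion word) for each of the four text units.
UNITS = [("very", "high"), (None, "high"), ("very", "good"), ("too", "high")]

def adverb_scores(units, opinions, adverbs):
    """Score each opinion word by how often it follows a degree adverb."""
    scores = dict.fromkeys(opinions, 0)   # initial value of every score is 0
    for adverb, opinion in units:
        if adverb in adverbs:
            scores[opinion] += 1
    return [scores[opinion] for opinion in opinions]

O_score = adverb_scores(UNITS, OPINIONS, DEGREE_ADVERBS)
# O_score == [1, 2], in the order ("good", "high"), matching formula (2)
```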

In the present invention, the higher the score of an opinion word in the opinion word set, the more frequently that opinion word is used in the corpus and the more useful it is for extracting evaluation information. Similarly, the higher the score of a target feature in the target feature set, the more frequently that target feature is used in the corpus and the more useful it is for extracting evaluation information.

In the later step S412, if a certain condition is satisfied, the flow proceeds to step S413, where the score of the opinion word set is updated with the adjusted score of the opinion word set. The flow then returns from step S413 to step S402. In step S402, the adjusted score of the opinion word set obtained in step S411 may then be used in place of the score of the opinion word set obtained by counting the frequency information of the opinion words in the corpus.

The score of the target feature set, denoted F_{score}, can be calculated in several ways from the score O_{score} of the opinion word set, based on the relationship C between the opinion word set and the target feature set.

For example, a transformation T_{O-F} from the opinion word set to the target feature set is obtained, as shown in Table 2, from the relationship C between the opinion word set and the target feature set. This transformation T_{O-F} can be expressed in the following matrix form.

In one example, the product of the transformation T_{O-F} and the score O_{score} of the opinion word set can be taken as the score F_{score} of the target feature set.

F_{score} = T_{O-F} · O_{score}    (4)

In another example, the result of formula (4) above can be weighted by a weighting coefficient (a scalar, a vector, or a matrix), and the weighted result is taken as the score F_{score} of the target feature set.
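The product described above can be sketched in a few lines. Because the matrix of Table 2 is not reproduced in this text, the sketch assumes, hypothetically, that the transformation is simply the co-occurrence count matrix with rows for target features and columns for opinion words; the weighting coefficient 0.5 is likewise an arbitrary illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical transformation: co-occurrence counts used directly.
# Rows: fuel consumption rate, price, maneuverability; columns: good, high.
T_OF = [[0, 1], [0, 2], [1, 0]]
O_score = [1, 2]                      # formula (2)

F_score = matvec(T_OF, O_score)       # product of T_{O-F} and O_{score}
# F_score == [2, 4, 1]

# Optional weighting of the result by a scalar coefficient.
F_weighted = [0.5 * f for f in F_score]
```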

In step S403, the score of the target feature set is adjusted based on the similarity between the elements of the target feature set and on an a priori confidence, and an adjustment score is obtained.

Table 4 shows the similarity between the elements of the target feature set, which may be expressed in the following form and denoted S_F.

Step S403 can be performed in any of the ways described for step S203. For example, the score of the target feature set is adjusted based on the similarity S_F between the elements of the target feature set and an a priori confidence, denoted α, to obtain an adjustment score X.

X = S_F · F_{score} + α · F^{1}_{score}    (6)

In formula (6), F^{1}_{score} is the initial value of the score of the target feature set (for example, the score of the target feature set first obtained from the score of the opinion word set in step S402). The a priori confidence α may be specified in advance from empirical values, determined according to the prior art, calculated from a mathematical model, or set by a person skilled in the art to any suitable preset value.
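Formula (6) can be evaluated directly once S_F, α, and the two score vectors are known. In the sketch below the similarity matrix and the value α = 0.5 are hypothetical placeholders, since Table 4 is not reproduced in this text; only the shape of the computation follows formula (6).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def adjust(similarity, score, initial_score, prior):
    """Formula (6): X = S_F * F_score + alpha * F1_score."""
    propagated = matvec(similarity, score)
    return [p + prior * s0 for p, s0 in zip(propagated, initial_score)]

# Hypothetical similarity between fuel consumption rate, price, maneuverability.
S_F = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
F1_score = [2, 4, 1]      # hypothetical initial score of the target feature set
alpha = 0.5               # hypothetical a priori confidence

X = adjust(S_F, F1_score, F1_score, alpha)
# X == [5.0, 7.0, 1.5]
```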

In step S404, it is determined whether the difference between the adjustment score and the score of the target feature set satisfies a predefined requirement.

As in step S203, this determination can be made in a number of ways. For example, whether the difference between the adjustment score and the score of the target feature set satisfies the predefined requirement may be determined by counting the number of iterations, by comparing the difference between the adjustment score and the score of the target feature set with a threshold, or by evaluating the cosine of the angle between the adjustment score and the score of the target feature set. Furthermore, a person skilled in the art is not limited to the methods disclosed here and can make this determination by any suitable method of the prior art.
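The three kinds of check listed above can be sketched as small helper functions. Their names, and the use of the largest element-wise difference as the "difference" between two score vectors, are assumptions for this example; the cosine helper only computes the value and leaves the comparison with a threshold to the caller.

```python
import math

def iteration_limit_exceeded(iteration, limit):
    """Check based on counting the number of iterations."""
    return iteration > limit

def difference_below(adjusted, current, threshold):
    """Check based on the largest element-wise difference (an assumed metric)."""
    return max(abs(a - c) for a, c in zip(adjusted, current)) < threshold

def angle_cosine(adjusted, current):
    """Cosine of the angle between two score vectors (0.0 if either is zero)."""
    dot = sum(a * c for a, c in zip(adjusted, current))
    norm = math.sqrt(sum(a * a for a in adjusted)) * math.sqrt(sum(c * c for c in current))
    return dot / norm if norm else 0.0
```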

If it is determined in step S404 that the difference between the adjustment score and the score of the target feature set satisfies the predefined requirement, the flow proceeds to step S406; if it is determined that the difference does not satisfy the predefined requirement, the flow proceeds to step S405.

In step S405, the score of the target feature set is updated with the adjustment score.

Because the score of the target feature set is replaced with the adjustment score, the next pass through step S403 adjusts this updated score rather than the original score of the target feature set, which facilitates iterative refinement.

In step S406, the adjustment score is taken as the adjusted score of the target feature set.
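Steps S403 to S406 together form a loop, which can be sketched as follows. The similarity values, α, the threshold, and the round limit are all hypothetical; in particular, the similarity matrix is deliberately scaled so that the repeated adjustment settles, which is an assumption of this sketch and not something the specification states.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def adjust(similarity, score, initial_score, prior):
    """Formula (6): X = S_F * F_score + alpha * F1_score."""
    return [p + prior * s0
            for p, s0 in zip(matvec(similarity, score), initial_score)]

def refine(similarity, initial_score, prior, threshold=1e-6, max_rounds=100):
    score = list(initial_score)
    for _ in range(max_rounds):
        adjusted = adjust(similarity, score, initial_score, prior)        # S403
        if max(abs(a - s) for a, s in zip(adjusted, score)) < threshold:  # S404
            return adjusted                                               # S406
        score = adjusted                                                  # S405
    return score  # round limit reached; use the latest score

# Hypothetical, scaled similarity between the three target features.
S_F = [[0.5, 0.2, 0.0],
       [0.2, 0.5, 0.0],
       [0.0, 0.0, 0.5]]
F_adjusted = refine(S_F, [2, 4, 1], prior=0.5)
```

With these placeholder values the loop settles near (4.29, 5.71, 1.0), so "price" keeps the highest score.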

In this embodiment, the adjusted score of the target feature set is as follows.

In step S407, the score of the opinion word set is calculated from the adjusted score of the target feature set, based on the relationship between the opinion word set and the target feature set.

The score O_{score} of the opinion word set can be calculated in a number of ways from the score F_{score} of the target feature set, based on the relationship C between the opinion word set and the target feature set.

For example, a transformation T_{F-O} from the target feature set to the opinion word set is obtained from the relationship C, as shown in Table 3. This transformation T_{F-O} is expressed in the following matrix form.

In one example, the product of the transformation T_{F-O} and the score F_{score} of the target feature set obtained in step S406 is taken as the score O_{score} of the opinion word set.

O_{score} = T_{F-O} · F_{score}    (8)

In another example, the result of formula (8) above can be weighted by a weighting coefficient (a scalar, a vector, or a matrix), and the weighted result is taken as the score of the opinion word set.
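The reverse transformation can be sketched in the same way as the forward one. Since Table 3 is not reproduced in this text, the sketch assumes, hypothetically, that T_{F-O} is the transpose of a co-occurrence count matrix; the score vector used for the target feature set is also a placeholder.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

# Hypothetical co-occurrence counts (rows: features, columns: good, high).
C = [[0, 1], [0, 2], [1, 0]]
T_FO = transpose(C)                   # rows: good, high; columns: features

F_score = [2, 4, 1]                   # hypothetical adjusted feature scores
O_score = matvec(T_FO, F_score)       # product of T_{F-O} and F_{score}
# O_score == [1, 10], in the order ("good", "high")
```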

In step S408, the score of the opinion word set is adjusted based on the similarity between the elements of the opinion word set and on an a priori confidence, and an adjustment score is obtained.

Table 5 shows the similarity between the elements of the opinion word set, which may be expressed in the following form and denoted S_O.

Step S408 can be performed in any of the ways described for step S205. For example, the score of the opinion word set is adjusted based on the similarity S_O between the elements of the opinion word set and an a priori confidence, denoted β, to obtain an adjustment score Y.

Y = S_O · O_{score} + β · O^{1}_{score}    (11)

In formula (11), O^{1}_{score} is the initial value of the score of the opinion word set. The a priori confidence β may be specified in advance from empirical values, determined according to the prior art, calculated from a mathematical model, or set by a person skilled in the art to any suitable preset value.

In step S409, it is determined whether the difference between the adjustment score and the score of the opinion word set satisfies a predefined requirement.

As in step S205, this determination can be made in a number of ways. For example, the number of adjustments performed is counted, and if the count exceeds a preset number of iterations, the difference between the adjustment score and the score of the opinion word set is considered to satisfy the predefined requirement. As another example, the difference between the adjustment score and the score of the opinion word set is computed, and if the difference is smaller than a predefined threshold, the difference is considered to satisfy the predefined requirement. As yet another example, the cosine of the angle between the adjustment score and the score of the opinion word set is calculated, and if the cosine value is smaller than a predefined threshold, the difference is considered to satisfy the predefined requirement. Furthermore, a person skilled in the art is not limited to the methods disclosed here and can make this determination by any suitable method of the prior art.

If it is determined in step S409 that the difference between the adjustment score and the score of the opinion word set satisfies the predefined requirement, the flow proceeds to step S411; if it is determined that the difference does not satisfy the predefined requirement, the flow proceeds to step S410.

In step S410, the score of the opinion word set is updated with the adjustment score.

In step S411, the adjustment score is taken as the adjusted score of the opinion word set.

In step S412, it is determined whether the difference between the scores of the opinion word set before and after adjustment satisfies a predefined requirement.

This determination can be made, for example, by counting the number of times the score of the opinion word set has been adjusted. If the count exceeds a preset number of iterations, the difference between the scores of the opinion word set before and after adjustment is considered to satisfy the predefined requirement. As another example, the difference between the scores of the opinion word set before and after adjustment is computed, and if the difference is smaller than a predefined threshold, it is considered to satisfy the predefined requirement. As yet another example, the cosine of the angle between the scores of the opinion word set before and after adjustment is calculated, and if the cosine value is smaller than a predefined threshold, the difference is considered to satisfy the predefined requirement. Furthermore, a person skilled in the art is not limited to the methods disclosed here and can make this determination by any suitable method of the prior art.

In another embodiment of the present invention, step S412 is replaced with a determination of whether the difference between the scores of the target feature set before and after adjustment satisfies a predefined requirement. The present invention can be carried out in this case as well, and its effects are achieved.

If it is determined in step S412 that the difference between the scores of the opinion word set before and after adjustment satisfies the predefined requirement, the flow proceeds to step S414; if it is determined that the difference does not satisfy the predefined requirement, the flow proceeds to step S413.

In step S413, the score of the opinion word set is updated with the adjusted score of the opinion word set.

In step S414, the elements of the opinion word set are ranked according to the adjusted score of the opinion word set, and the elements of the target feature set are ranked according to the adjusted score of the target feature set.
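The overall flow from step S402 to step S414 can be sketched as an alternating update of the two score vectors. Several simplifications are assumed here and are not taken from the specification: the transformations are plain co-occurrence counts and their transpose, each adjustment is applied once per round instead of being iterated to its own stopping condition, scores are rescaled to sum to 1 after every step to keep them bounded, and all similarity values and confidences are placeholders.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def normalise(vector):
    total = sum(vector)
    return [v / total for v in vector] if total else list(vector)

def adjust(similarity, score, prior):
    # Shape of formulas (6) and (11), with the incoming score also used
    # as the initial score for this single pass (a simplification).
    return [p + prior * s for p, s in zip(matvec(similarity, score), score)]

def mutual_reinforcement(C, o_init, S_F, S_O, alpha, beta, rounds=20, tol=1e-9):
    C_T = [list(column) for column in zip(*C)]
    o, f = normalise(o_init), []
    for _ in range(rounds):
        f = normalise(adjust(S_F, normalise(matvec(C, o)), alpha))       # S402-S406
        o_new = normalise(adjust(S_O, normalise(matvec(C_T, f)), beta))  # S407-S411
        converged = max(abs(a - b) for a, b in zip(o_new, o)) < tol      # S412
        o = o_new                                                        # S413
        if converged:
            break
    return o, f

def rank(names, scores):                                                 # S414
    ordered = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ordered]

FEATURES = ["fuel consumption rate", "price", "maneuverability"]
OPINIONS = ["good", "high"]
C = [[0, 1], [0, 2], [1, 0]]                             # hypothetical counts
S_F = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical
S_O = [[1.0, 0.0], [0.0, 1.0]]                           # hypothetical

o, f = mutual_reinforcement(C, [1, 2], S_F, S_O, alpha=0.5, beta=0.5)
ranked_opinions = rank(OPINIONS, o)
ranked_features = rank(FEATURES, f)
```

Under these assumptions the sketch ranks "high" above "good" and orders the target features as "price", "fuel consumption rate", "maneuverability".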

In this embodiment, assume that the latest score of the opinion word set obtained from step S412 is O_{score} = (2, 4)^{T}. In the opinion word set containing the opinion words "good" and "high", "high" is then ranked above "good"; that is, the opinion words are ranked in the order "high", "good". Here, the opinion word set whose ranking has been completed is referred to as the optimized opinion word set.
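Ranking by score amounts to a descending sort. A minimal sketch, assuming the score vector lists "good" first and "high" second as in formula (2):

```python
OPINIONS = ["good", "high"]
O_score = [2, 4]   # latest score of the opinion word set in this embodiment

# Sort opinion words by score, highest first.
optimized_opinions = [
    word
    for word, _ in sorted(zip(OPINIONS, O_score), key=lambda p: p[1], reverse=True)
]
# optimized_opinions == ["high", "good"]
```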

Assume that the adjusted score of the target feature set is as follows.

In the target feature set containing the three target features "fuel consumption rate", "price", and "maneuverability", the target features are ranked in the order "price", "fuel consumption rate", "maneuverability". Here, the target feature set whose ranking has been completed is referred to as the optimized target feature set.

In step S415, evaluation information is extracted based on the optimized opinion word set and the optimized target feature set.

This step is similar to step S103.

In this embodiment, the predefined threshold for opinion words is set to 2, and the opinion words ranked above the second-ranked opinion word in the opinion word set, that is, "high", are extracted.

Likewise, the predefined threshold for target features is set to 3, and the target features ranked above the third-ranked target feature in the target feature set, that is, "price" and "fuel consumption rate", are extracted.

It is then determined, according to the relationship between the opinion word set and the target feature set, whether the extracted opinion word "high" is related to the extracted target features "price" and "fuel consumption rate" (for example, whether these words co-occur within the same text unit). In this embodiment, the evaluation information consists of those combinations of an extracted opinion word and an extracted target feature that co-occur at least once within the same text unit. Because "price ... high" and "fuel consumption rate ... high" both appear in the four text units of this embodiment, [fuel consumption rate, high] and [price, high] are obtained as evaluation information.
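The extraction in step S415 can be sketched as a cut-off on each ranked list followed by a co-occurrence filter. The function name `extract` and the set-of-pairs representation of the relationship are assumptions made for this example.

```python
ranked_opinions = ["high", "good"]
ranked_features = ["price", "fuel consumption rate", "maneuverability"]

# (target feature, opinion word) pairs that co-occur in at least one text unit.
COOCCURRING = {
    ("fuel consumption rate", "high"),
    ("price", "high"),
    ("maneuverability", "good"),
}

def extract(opinions, features, pairs, opinion_threshold, feature_threshold):
    """Keep elements ranked above the threshold rank, then pair co-occurring ones."""
    top_opinions = opinions[:opinion_threshold - 1]   # above the 2nd-ranked word
    top_features = features[:feature_threshold - 1]   # above the 3rd-ranked feature
    return [(f, o) for f in top_features for o in top_opinions if (f, o) in pairs]

evaluation_info = extract(ranked_opinions, ranked_features, COOCCURRING, 2, 3)
# evaluation_info == [("price", "high"), ("fuel consumption rate", "high")]
```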

This completes the flow of FIG. 4.

FIG. 5 is a block diagram of an evaluation information extraction apparatus 500 according to one embodiment of the present invention. The evaluation information extraction apparatus 500 includes acquisition means 501, optimization means 502, and extraction means 503.

The acquisition means 501 is configured to acquire an opinion word set and a target feature set from a corpus. The optimization means 502 is configured to optimize the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity between the elements of the opinion word set, and the similarity between the elements of the target feature set. The extraction means 503 is configured to extract evaluation information based on the optimized opinion word set and the optimized target feature set.
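In software, the three means could be composed as a simple pipeline. The sketch below is one hypothetical arrangement: the class name, the callable signatures, and the stub functions in the usage lines are all illustrative and are not part of the apparatus described here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pair = Tuple[str, str]

@dataclass
class EvaluationExtractor:
    # Stand-ins for acquisition means 501, optimization means 502,
    # and extraction means 503, each supplied as a callable.
    acquire: Callable[[List[str]], Tuple[List[str], List[str]]]
    optimise: Callable[[List[str], List[str]], Tuple[List[str], List[str]]]
    extract: Callable[[List[str], List[str]], List[Pair]]

    def run(self, corpus: List[str]) -> List[Pair]:
        opinions, features = self.acquire(corpus)
        opinions, features = self.optimise(opinions, features)
        return self.extract(opinions, features)

# Usage with trivial stubs, only to show how the parts connect.
extractor = EvaluationExtractor(
    acquire=lambda corpus: (["good", "high"], ["price"]),
    optimise=lambda ops, feats: (sorted(ops, reverse=True), feats),
    extract=lambda ops, feats: [(feats[0], ops[0])],
)
result = extractor.run(["The price is high"])
```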

In one embodiment, the acquisition means 501 includes means configured to preprocess the corpus to obtain text units, means configured to acquire the opinion word set from the obtained text units according to an opinion word extraction rule, and means configured to acquire the target feature set from the obtained text units according to a target feature extraction rule.

In one embodiment, the opinion word extraction rule may be: "extract from the text units, as opinion words, one or more of the following: a segment immediately following a degree adverb; an adjective; a segment containing no function word; a segment whose length is less than or equal to that of the longest opinion word; and a segment whose frequency of occurrence is greater than that of the least frequent opinion word".

In one embodiment, the target feature extraction rule may be: "extract from the text units, as target features, one or more of the following: a base noun phrase; a combination of base noun phrases; a combination of a base noun phrase and a noun or gerund; a combination of a base noun phrase and a qualifier; a combination of a qualifier and a noun or gerund; a segment containing no function word; a segment whose length is less than or equal to that of the longest target feature; and a segment whose frequency of occurrence is greater than that of the least frequent target feature".

In one embodiment, the optimization means 502 includes: first transformation calculation means configured to calculate, based on the relationship, the score of a second set of the opinion word set and the target feature set from the score of a first set of the opinion word set and the target feature set; second adjustment means configured to adjust the score of the second set using the similarity between the elements of the second set; second transformation calculation means configured to calculate, based on the relationship, the score of the first set from the adjusted score of the second set; first adjustment means configured to adjust the score of the first set using the similarity between the elements of the first set, so that the score of the second set can be calculated, based on the relationship, from the adjusted score of the first set; and ranking means configured to rank the elements of the first set according to the adjusted score of the first set and to rank the elements of the second set according to the adjusted score of the second set when the difference between the scores of the first set before and after adjustment satisfies a predefined requirement, or when the difference between the scores of the second set before and after adjustment satisfies a predefined requirement.

In one example, the optimization means 502 includes means configured to initialize the score of the first set according to a predefined policy, based on the frequency information of each element of the first set in the corpus.

In another example, the first adjustment means of the optimization means 502 includes: means configured to obtain an adjustment score by adjusting the score of the first set based on the similarity between the elements of the first set and an a priori confidence; means configured to take the adjustment score as the adjusted score of the first set and stop adjusting the score of the first set when the difference between the adjustment score and the score of the first set satisfies a predefined requirement; and means configured to update the score of the first set with the adjustment score when the difference between the adjustment score and the score of the first set does not satisfy the predefined requirement.

In another example, the second adjustment means of the optimization means 502 includes: means configured to obtain an adjustment score by adjusting the score of the second set based on the similarity between the elements of the second set and an a priori confidence; means configured to take the adjustment score as the adjusted score of the second set and stop adjusting the score of the second set when the difference between the adjustment score and the score of the second set satisfies a predefined requirement; and means configured to update the score of the second set with the adjustment score when the difference between the adjustment score and the score of the second set does not satisfy the predefined requirement.

In one embodiment, the optimization means 502 includes: first adjustment means configured to adjust the score of a first set of the opinion word set and the target feature set using the similarity between the elements of the first set; transformation calculation means configured to calculate, based on the relationship, the score of a second set of the opinion word set and the target feature set from the adjusted score of the first set; second adjustment means configured to adjust the score of the second set using the similarity between the elements of the second set, so that the score of the first set can be calculated, based on the relationship, from the adjusted score of the second set; and ranking means configured to rank the elements of the first set according to the adjusted score of the first set and to rank the elements of the second set according to the adjusted score of the second set when the difference between the scores of the first set before and after adjustment satisfies a predefined requirement, or when the difference between the scores of the second set before and after adjustment satisfies a predefined requirement.

In one example, the optimization means 502 includes means configured to initialize the score of the first set according to a predefined policy, based on the frequency information of each element of the first set in the corpus.

In another example, the first adjustment means of the optimization means 502 includes: means configured to obtain an adjustment score by adjusting the score of the first set based on the similarity between the elements of the first set and an a priori confidence; means configured to take the adjustment score as the adjusted score of the first set and stop adjusting the score of the first set when the difference between the adjustment score and the score of the first set satisfies a predefined requirement; and means configured to update the score of the first set with the adjustment score when the difference between the adjustment score and the score of the first set does not satisfy the predefined requirement.

In another example, the second adjustment means of the optimization means 502 includes: means configured to obtain an adjustment score by adjusting the score of the second set based on the similarity between the elements of the second set and an a priori confidence; means configured to take the adjustment score as the adjusted score of the second set and stop adjusting the score of the second set when the difference between the adjustment score and the score of the second set satisfies a predefined requirement; and means configured to update the score of the second set with the adjustment score when the difference between the adjustment score and the score of the second set does not satisfy the predefined requirement.

１つの実施例においては、抽出手段５０３は、最適化された意見語集合から意見語の既定のしきい値に従って高い順位の意見語を抽出するように構成された手段と、最適化された対象特徴集合から対象特徴の既定のしきい値に従って高い順位の対象特徴を抽出するように構成された手段と、意見語集合と対象特徴集合の関係に基づいて、高い順位の意見語と高い順位の対象特徴から評価情報を取得するように構成された手段とを備える。 In one embodiment, the extraction means 503 comprises: means configured to extract high-ranking opinion words from the optimized opinion word set according to a predetermined threshold for opinion words; means configured to extract high-ranking target features from the optimized target feature set according to a predetermined threshold for target features; and means configured to obtain evaluation information from the high-ranking opinion words and the high-ranking target features based on the relationship between the opinion word set and the target feature set.

ここで注目すべきは、本発明は中国語の処理に限定されたものではなく、英語、フランス語、ドイツ語など、数多くの種類の言語に対する処理に利用できることである。 It should be noted here that the present invention is not limited to the processing of Chinese, but can be used for processing of many kinds of languages such as English, French, and German.

本発明による方法は、ソフトウェア、ハードウェア、またはソフトウェアとハードウェアの組み合わせとして実施することもできる。ハードウェア部分は専用の論理回路を使用して実装でき、ソフトウェア部分はメモリに格納して、マイクロプロセッサ、パーソナルコンピュータ（ＰＣ）、メインフレームなどの適切な命令実行システムによって実行することができる。 The method according to the invention can also be implemented as software, hardware or a combination of software and hardware. The hardware portion can be implemented using dedicated logic circuitry, and the software portion can be stored in memory and executed by a suitable instruction execution system such as a microprocessor, personal computer (PC), mainframe or the like.

以上、好ましい実施の形態をあげて本発明を説明したが、本発明は必ずしも、上記実施の形態に限定されるものでなく、その技術的思想の範囲内において様々に変形して実施することができる。 The present invention has been described above with reference to preferred embodiments. However, the present invention is not necessarily limited to the above embodiments, and can be implemented with various modifications within the scope of its technical idea.

さらに、上記実施形態の一部又は全部は、以下の付記のようにも記載されうるが、これに限定されない。 Further, a part or all of the above-described embodiment can be described as in the following supplementary notes, but is not limited thereto.

（付記１）
コーパスから意見語集合と対象特徴集合を取得するステップと、
意見語集合と対象特徴集合との関係、意見語集合の要素の類似度、および対象特徴集合の要素の類似度に基づいて、意見語集合と対象特徴集合とを最適化するステップと、
最適化された意見語集合および最適化された対象特徴集合に基づいて、評価情報を抽出するステップと
を含むことを特徴とする評価情報抽出方法。 (Appendix 1)
An evaluation information extraction method comprising:
Obtaining an opinion word set and a target feature set from a corpus;
Optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set; and
Extracting evaluation information based on the optimized opinion word set and the optimized target feature set.

（付記２）
コーパスから意見語集合と対象特徴集合を取得する前記ステップが、
テキスト単位を取得するためにコーパスの前処理を行うステップと、
取得されたテキスト単位に基づき、意見語抽出規則に従って意見語集合を取得するステップと、
取得されたテキスト単位に基づき、対象特徴抽出規則に従って対象特徴集合を取得するステップと
を含むことを特徴とする付記１に記載の評価情報抽出方法。 (Appendix 2)
The evaluation information extraction method according to Appendix 1, wherein the step of obtaining the opinion word set and the target feature set from the corpus comprises:
Preprocessing the corpus to obtain text units;
Obtaining the opinion word set according to an opinion word extraction rule based on the obtained text units; and
Obtaining the target feature set according to a target feature extraction rule based on the obtained text units.

（付記３）
前記意見語抽出規則が、程度副詞のすぐ後に続く文節、形容詞、機能語を含まない文節、最長の意見語よりも長さが短いか等しい文節、および最小出現頻度の意見語よりも出現頻度が多い文節を、テキスト単位から１つ以上抽出して意見語とするものであることを特徴とする付記２に記載の評価情報抽出方法。 (Appendix 3)
The evaluation information extraction method according to Appendix 2, wherein the opinion word extraction rule extracts from the text units, as opinion words, one or more of: a phrase that immediately follows a degree adverb, an adjective, a phrase that contains no function word, a phrase whose length is less than or equal to that of the longest opinion word, and a phrase whose frequency of occurrence is higher than that of the opinion word with the minimum frequency of occurrence.
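The opinion word extraction rule of Appendix 3 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the toy lexicons, the function name extract_opinion_words, the parameters max_len and min_freq, and in particular the way the listed criteria are combined (a candidate must follow a degree adverb or be an adjective, and must also pass the function-word, length, and frequency filters). The patent text does not fix these details, and a real system would rely on a part-of-speech tagger rather than word lists.

```python
from collections import Counter

# Hypothetical toy lexicons (assumptions, not taken from the patent).
DEGREE_ADVERBS = {"very", "extremely", "quite"}
FUNCTION_WORDS = {"the", "of", "and", "is", "a"}
ADJECTIVES = {"sharp", "bright", "slow", "heavy"}


def extract_opinion_words(text_units, max_len=10, min_freq=1):
    """Return candidate opinion words from tokenized text units.

    max_len  -- assumed maximum length (in characters) of an opinion word
    min_freq -- assumed minimum frequency; candidates must occur more often
    """
    counts = Counter()
    for tokens in text_units:
        for i, tok in enumerate(tokens):
            follows_degree_adverb = i > 0 and tokens[i - 1] in DEGREE_ADVERBS
            if follows_degree_adverb or tok in ADJECTIVES:
                counts[tok] += 1
    # Apply the function-word, length, and frequency filters.
    return {
        word
        for word, count in counts.items()
        if word not in FUNCTION_WORDS and len(word) <= max_len and count > min_freq
    }


units = [
    ["the", "screen", "is", "very", "sharp"],
    ["very", "sharp", "and", "bright"],
    ["quite", "the", "camera"],
    ["bright", "screen"],
]
opinion_words = extract_opinion_words(units)
```

In this toy input, "the" follows a degree adverb once but is discarded by both the function-word filter and the frequency filter, which shows why the filters are applied after candidate collection.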

（付記４）
前記対象特徴抽出規則が、基本名詞句、基本名詞句同士の組み合わせ、基本名詞句と名詞／動名詞の組み合わせ、基本名詞句と限定詞の組み合わせ、および限定詞と名詞／動名詞の組み合わせ、機能語を含まない文節、最長の対象特徴よりも長さが短いか等しい文節、および出現頻度が最少の対象特徴よりも出現頻度が多い文節を、テキスト単位から１つ以上抽出して対象特徴とするものであることを特徴とする付記２に記載の評価情報抽出方法。 (Appendix 4)
The evaluation information extraction method according to Appendix 2, wherein the target feature extraction rule extracts from the text units, as target features, one or more of: a basic noun phrase, a combination of basic noun phrases, a combination of a basic noun phrase and a noun/verbal noun, a combination of a basic noun phrase and a qualifier, a combination of a qualifier and a noun/verbal noun, a phrase that contains no function word, a phrase whose length is less than or equal to that of the longest target feature, and a phrase whose frequency of occurrence is higher than that of the target feature with the lowest frequency of occurrence.

（付記５）
意見語集合と対象特徴集合との関係、意見語集合の要素の類似度、および対象特徴集合の要素の類似度に基づいて、意見語集合と対象特徴集合とを最適化する前記ステップが、
意見語集合と対象特徴集合のうちの第１の集合のスコアに従って、意見語集合と対象特徴集合のうちの第２の集合のスコアを、関係に基づいて計算するステップと、
第２の集合の要素の類似度を使って、第２の集合のスコアを調整するステップと、
第２の集合の調整後のスコアに従って第１の集合のスコアを、関係に基づいて計算するステップと、
第１の集合の要素の類似度を使って第１の集合のスコアを調整し、第１の集合の調整後のスコアに従って第２の集合のスコアを、関係に基づいて計算するステップと、
第１の集合の調整前後のスコアの差が既定の要件を満たしているか、または第２の集合の調整前後のスコアの差が既定の要件を満たしている場合に、第１の集合の調整後のスコアに従って第１の集合の要素をランク付けし、かつ第２の集合の調整後のスコアに従って第２の集合の要素をランク付けするステップと
を含むことを特徴とする付記１に記載の評価情報抽出方法。 (Appendix 5)
The evaluation information extraction method according to Appendix 1, wherein the step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set comprises:
Calculating, based on the relationship, the score of a second set of the opinion word set and the target feature set according to the score of a first set of the opinion word set and the target feature set;
Adjusting the score of the second set using the similarity of the elements of the second set;
Calculating, based on the relationship, the score of the first set according to the adjusted score of the second set;
Adjusting the score of the first set using the similarity of the elements of the first set, and calculating, based on the relationship, the score of the second set according to the adjusted score of the first set; and
Ranking the elements of the first set according to the adjusted score of the first set and ranking the elements of the second set according to the adjusted score of the second set, when the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement.
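The alternating procedure of Appendix 5 can be sketched in Python as follows. This is only one possible reading, under stated assumptions: the relationship is represented as a co-occurrence matrix rel (rows are first-set elements, columns are second-set elements), the similarity-based adjustment is a simple blend with similarity-weighted neighbour scores controlled by a hypothetical weight alpha, scores are normalized to sum to 1, and the "predetermined requirement" is taken to be a maximum change below a tolerance tol. None of these choices is specified by the patent text.

```python
def normalize(v):
    """Scale scores so that they sum to 1 (left unchanged if all zero)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)


def propagate(rel, scores):
    """Score each column item as the relation-weighted sum of row-item scores."""
    rows, cols = len(rel), len(rel[0])
    return normalize(
        [sum(rel[i][j] * scores[i] for i in range(rows)) for j in range(cols)]
    )


def transpose(m):
    return [list(row) for row in zip(*m)]


def smooth(sim, scores, alpha=0.5):
    """Blend each score with the similarity-weighted average of all scores."""
    n = len(scores)
    out = []
    for i in range(n):
        row_total = sum(sim[i])
        neighbour = (
            sum(sim[i][j] * scores[j] for j in range(n)) / row_total
            if row_total else 0.0
        )
        out.append((1 - alpha) * scores[i] + alpha * neighbour)
    return normalize(out)


def rank(scores):
    """Indices of elements ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def optimize(rel, sim_first, sim_second, first, tol=1e-6, max_iter=100):
    rel_t = transpose(rel)
    first = normalize(first)
    second = propagate(rel, first)               # second set from first set
    for _ in range(max_iter):
        second = smooth(sim_second, second)      # adjust second set
        new_first = smooth(sim_first, propagate(rel_t, second))  # first set, then adjust
        delta = max(abs(a - b) for a, b in zip(new_first, first))
        first = new_first
        if delta < tol:                          # assumed stopping requirement
            break
        second = propagate(rel, first)           # second set from adjusted first set
    return first, second, rank(first), rank(second)


# Toy data (hypothetical): opinion words [good, great, slow], features [screen, battery].
rel = [[3, 1], [2, 0], [0, 2]]
sim_first = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
sim_second = [[1.0, 0.1], [0.1, 1.0]]
first, second, rank_first, rank_second = optimize(rel, sim_first, sim_second, [1.0, 1.0, 1.0])
```

The max_iter bound is a safeguard added for the sketch so that the loop terminates even if the tolerance is never reached.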

（付記６）
意見語集合と対象特徴集合との関係、意見語集合の要素の類似度、および対象特徴集合の要素の類似度に基づいて、意見語集合と対象特徴集合とを最適化する前記ステップが、
第１の集合の要素の類似度を使って、意見語集合と対象特徴集合のうちの第１の集合のスコアを調整するステップと、
第１の集合の調整後のスコアに従って、意見語集合と対象特徴集合のうちの第２の集合のスコアを、関係に基づいて計算するステップと、
第１の集合のスコアを第２の集合の調整後のスコアに従って、関係に基づいて計算するために、第２の集合の要素の類似度を使って第２の集合のスコアを調整するステップと、
第１の集合の調整前後のスコアの差が既定の要件を満たしているか、または第２の集合の調整前後のスコアの差が既定の要件を満たしている場合には、第１の集合の調整後のスコアに従って第１の集合の要素をランク付けし、かつ第２の集合の調整後のスコアに従って第２の集合の要素をランク付けするステップと
を含むことを特徴とする付記１に記載の評価情報抽出方法。 (Appendix 6)
The evaluation information extraction method according to Appendix 1, wherein the step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set comprises:
Adjusting the score of a first set of the opinion word set and the target feature set using the similarity of the elements of the first set;
Calculating, based on the relationship, the score of a second set of the opinion word set and the target feature set according to the adjusted score of the first set;
Adjusting the score of the second set using the similarity of the elements of the second set, so that the score of the first set can be calculated, based on the relationship, according to the adjusted score of the second set; and
Ranking the elements of the first set according to the adjusted score of the first set and ranking the elements of the second set according to the adjusted score of the second set, when the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement.

（付記７）
コーパスの第１の集合の各要素の頻度情報に基づき、事前に定義された方針に従って、第１の集合のスコアを初期化するステップをさらに含むことを特徴とする付記５又は付記６に記載の評価情報抽出方法。 (Appendix 7)
The evaluation information extraction method according to Appendix 5 or Appendix 6, further comprising initializing the score of the first set according to a predefined policy, based on the frequency information of each element of the first set in the corpus.
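As a minimal Python sketch of the initialization in Appendix 7, the "predefined policy" is assumed here to be relative frequency, with a uniform fallback when no element occurs in the corpus. The function name init_scores and this particular policy are illustrative assumptions; the patent text leaves the policy open.

```python
from collections import Counter


def init_scores(elements, corpus_tokens):
    """Initialize one score per element from its relative corpus frequency."""
    if not elements:
        return []
    counts = Counter(corpus_tokens)
    freqs = [counts[e] for e in elements]
    total = sum(freqs)
    if total == 0:
        # Assumed fallback: uniform scores when no element occurs at all.
        return [1.0 / len(elements)] * len(elements)
    return [f / total for f in freqs]


initial = init_scores(["good", "slow"], ["good", "good", "slow", "screen"])
```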

（付記８）
第１の集合の要素の類似度を使って第１の集合のスコアを調整する前記ステップが、
第１の集合の要素の類似度および先験的な信頼度とに基づき、第１の集合のスコアを調整することによって調整スコアを求めるステップと、
調整スコアと第１の集合のスコアとの差が既定の要件を満たしている場合には、調整スコアを第１の集合の調整後のスコアと判定し、かつ第１の集合のスコアの調整を停止するステップと、
調整スコアと第１の集合のスコアとの差が既定の要件を満たしていない場合には、第１の集合のスコアを調整スコアで更新するステップとを含み、
第２の集合の要素の類似度を使って、第２の集合のスコアを調整する前記ステップが、
第２の集合の要素の類似度および先験的な信頼度に基づき、第２の集合のスコアを調整することによって調整スコアを求めるステップと、
調整スコアと第２の集合のスコアとの差が既定の要件を満たしている場合には、調整スコアを第２の集合の調整後のスコアと判定して第２の集合のスコアの調整を停止するステップと、
調整スコアと第２の集合のスコアとの差が既定の要件を満たしていない場合には、第２の集合のスコアを調整スコアで更新するステップとを含むことを特徴とする付記５又は付記６に記載の評価情報抽出方法。 (Appendix 8)
The evaluation information extraction method according to Appendix 5 or Appendix 6, wherein the step of adjusting the score of the first set using the similarity of the elements of the first set comprises:
Determining an adjusted score by adjusting the score of the first set based on the similarity of the elements of the first set and an a priori confidence;
When the difference between the adjusted score and the score of the first set satisfies a predetermined requirement, determining the adjusted score to be the score of the first set after adjustment and stopping the adjustment of the score of the first set; and
When the difference between the adjusted score and the score of the first set does not satisfy the predetermined requirement, updating the score of the first set with the adjusted score,
And wherein the step of adjusting the score of the second set using the similarity of the elements of the second set comprises:
Determining an adjusted score by adjusting the score of the second set based on the similarity of the elements of the second set and an a priori confidence;
When the difference between the adjusted score and the score of the second set satisfies a predetermined requirement, determining the adjusted score to be the score of the second set after adjustment and stopping the adjustment of the score of the second set; and
When the difference between the adjusted score and the score of the second set does not satisfy the predetermined requirement, updating the score of the second set with the adjusted score.
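The inner adjustment loop of Appendix 8 can be sketched in Python as below. The update rule is an assumption chosen for illustration: each adjusted score is a damped, similarity-weighted average of the current scores plus a share of the a priori confidence, in the style of a random walk with restart. The names adjust_scores, damping, tol, and max_iter are hypothetical, and the "predetermined requirement" is read as the largest per-element change falling below tol.

```python
def adjust_scores(scores, sim, prior, damping=0.85, tol=1e-8, max_iter=1000):
    """Iteratively adjust scores using element similarity and prior confidence."""
    n = len(scores)
    # Row sums for normalizing the similarity matrix; guard against empty rows.
    row_sums = [sum(row) or 1.0 for row in sim]
    current = list(scores)
    for _ in range(max_iter):
        adjusted = [
            damping * sum(sim[i][j] * current[j] for j in range(n)) / row_sums[i]
            + (1 - damping) * prior[i]
            for i in range(n)
        ]
        # Requirement satisfied: accept the adjusted score and stop.
        if max(abs(a - c) for a, c in zip(adjusted, current)) < tol:
            return adjusted
        # Requirement not satisfied: update the score and adjust again.
        current = adjusted
    return current


identity = [[1.0, 0.0], [0.0, 1.0]]
result = adjust_scores([0.2, 0.8], identity, prior=[0.5, 0.5])
```

With an identity similarity matrix no element influences another, so under this assumed rule the scores settle at the prior values, which makes a convenient sanity check.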

（付記９）
最適化された意見語集合および最適化された対象特徴集合に基づいて、評価情報を抽出する前記ステップが、
最適化された意見語集合から意見語の既定のしきい値に従って高い順位の意見語を抽出するステップと、
最適化された対象特徴集合から対象特徴の既定のしきい値に従って高い順位の対象特徴を抽出するステップと、
意見語集合と対象特徴集合の関係に基づいて、高い順位の意見語と高い順位の対象特徴から評価情報を取得するステップと
を含むことを特徴とする付記１に記載の評価情報抽出方法。 (Appendix 9)
The evaluation information extraction method according to Appendix 1, wherein the step of extracting the evaluation information based on the optimized opinion word set and the optimized target feature set comprises:
Extracting high-ranking opinion words from the optimized opinion word set according to a predetermined threshold for opinion words;
Extracting high-ranking target features from the optimized target feature set according to a predetermined threshold for target features; and
Obtaining the evaluation information from the high-ranking opinion words and the high-ranking target features based on the relationship between the opinion word set and the target feature set.
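The extraction step of Appendix 9 can be sketched in Python as follows. The sketch assumes that the thresholds are minimum scores (they could equally be rank cut-offs), that the relationship is a dictionary of co-occurrence weights keyed by (opinion word, target feature), and that evaluation information is emitted as (target feature, opinion word) pairs. These representations and the name extract_evaluations are illustrative assumptions only.

```python
def extract_evaluations(opinion_scores, feature_scores, relation,
                        opinion_threshold, feature_threshold):
    """Pair high-ranking opinion words with high-ranking target features."""
    top_opinions = {w for w, s in opinion_scores.items() if s >= opinion_threshold}
    top_features = {f for f, s in feature_scores.items() if s >= feature_threshold}
    # Keep only related pairs whose members both passed their threshold.
    return sorted(
        (feature, word)
        for (word, feature), weight in relation.items()
        if weight > 0 and word in top_opinions and feature in top_features
    )


opinion_scores = {"sharp": 0.6, "slow": 0.3, "odd": 0.1}
feature_scores = {"screen": 0.7, "battery": 0.25, "box": 0.05}
relation = {
    ("sharp", "screen"): 3,
    ("slow", "battery"): 2,
    ("odd", "box"): 1,
    ("sharp", "box"): 1,
}
pairs = extract_evaluations(opinion_scores, feature_scores, relation, 0.2, 0.2)
```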

（付記１０）
コーパスから意見語集合と対象特徴集合とを取得するように構成された取得手段と、
意見語集合と対象特徴集合との関係、意見語集合の要素の類似度、および対象特徴集合の要素の類似度に基づいて、意見語集合と対象特徴集合とを最適化するように構成された最適化手段と、
最適化された意見語集合および最適化された対象特徴集合に基づいて、評価情報を抽出するように構成された抽出手段と
を備えることを特徴とする評価情報抽出装置。 (Appendix 10)
An evaluation information extraction apparatus comprising:
Acquisition means configured to acquire an opinion word set and a target feature set from a corpus;
Optimization means configured to optimize the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set; and
Extraction means configured to extract evaluation information based on the optimized opinion word set and the optimized target feature set.

（付記１１）
前記取得手段は、
テキスト単位を取得するためにコーパスの前処理を行うように構成された手段と、
取得されたテキスト単位に基づき、意見語抽出規則に従って意見語集合を取得するように構成された手段と、
取得されたテキスト単位に基づき、対象特徴抽出規則に従って対象特徴集合を取得するように構成された手段
を備えることを特徴とする付記１０に記載の評価情報抽出装置。 (Appendix 11)
The evaluation information extraction apparatus according to Appendix 10, wherein the acquisition means comprises:
Means configured to preprocess the corpus to obtain text units;
Means configured to obtain the opinion word set according to an opinion word extraction rule based on the obtained text units; and
Means configured to obtain the target feature set according to a target feature extraction rule based on the obtained text units.

（付記１２）
前記意見語抽出規則が、程度副詞のすぐ後に続く文節、形容詞、機能語を含まない文節、最長の意見語よりも長さが短いか等しい文節、および最小出現頻度の意見語よりも出現頻度が多い文節を、テキスト単位から１つ以上抽出して意見語とするものであることを特徴とする付記１１に記載の評価情報抽出装置。 (Appendix 12)
The evaluation information extraction apparatus according to Appendix 11, wherein the opinion word extraction rule extracts from the text units, as opinion words, one or more of: a phrase that immediately follows a degree adverb, an adjective, a phrase that contains no function word, a phrase whose length is less than or equal to that of the longest opinion word, and a phrase whose frequency of occurrence is higher than that of the opinion word with the minimum frequency of occurrence.

（付記１３）
前記対象特徴抽出規則が、基本名詞句、基本名詞句同士の組み合わせ、基本名詞句と名詞／動名詞の組み合わせ、基本名詞句と限定詞の組み合わせ、および限定詞と名詞／動名詞の組み合わせ、機能語を含まない文節、最長の対象特徴よりも長さが短いか等しい文節、および出現頻度が最少の対象特徴よりも出現頻度が多い文節を、テキスト単位から１つ以上抽出して対象特徴とするものであることを特徴とする付記１１に記載の評価情報抽出装置。 (Appendix 13)
The evaluation information extraction apparatus according to Appendix 11, wherein the target feature extraction rule extracts from the text units, as target features, one or more of: a basic noun phrase, a combination of basic noun phrases, a combination of a basic noun phrase and a noun/verbal noun, a combination of a basic noun phrase and a qualifier, a combination of a qualifier and a noun/verbal noun, a phrase that contains no function word, a phrase whose length is less than or equal to that of the longest target feature, and a phrase whose frequency of occurrence is higher than that of the target feature with the lowest frequency of occurrence.

（付記１４）
前記最適化手段は、
意見語集合と対象特徴集合のうちの第１の集合のスコアに従って、意見語集合と対象特徴集合のうちの第２の集合のスコアを、関係に基づいて計算するように構成された第１変換計算手段と、
第２の集合の要素の類似度を使って、第２の集合のスコアを調整するように構成された第２調整手段と、
第２の集合の調整後のスコアに従って第１の集合のスコアを、関係に基づいて計算するように構成された第２変換計算手段と、
第１の集合の要素の類似度を使って第１の集合のスコアを調整し、第１の集合の調整後のスコアに従って第２の集合のスコアを、関係に基づいて計算するように構成された第１調整手段と、
第１の集合の調整前後のスコアの差が既定の要件を満たしているか、または第２の集合の調整前後のスコアの差が既定の要件を満たしている場合に、第１の集合の調整後のスコアに従って第１の集合の要素をランク付けし、かつ第２の集合の調整後のスコアに従って第２の集合の要素をランク付けするように構成されたランク付け手段と
を備えることを特徴とする付記１０に記載の評価情報抽出装置。 (Appendix 14)
The evaluation information extraction apparatus according to Appendix 10, wherein the optimization means comprises:
First conversion calculation means configured to calculate, based on the relationship, the score of a second set of the opinion word set and the target feature set according to the score of a first set of the opinion word set and the target feature set;
Second adjustment means configured to adjust the score of the second set using the similarity of the elements of the second set;
Second conversion calculation means configured to calculate, based on the relationship, the score of the first set according to the adjusted score of the second set;
First adjustment means configured to adjust the score of the first set using the similarity of the elements of the first set, and to calculate, based on the relationship, the score of the second set according to the adjusted score of the first set; and
Ranking means configured to rank the elements of the first set according to the adjusted score of the first set and to rank the elements of the second set according to the adjusted score of the second set, when the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement.

（付記１５）
前記最適化手段は、
第１の集合の要素の類似度を使って、意見語集合と対象特徴集合のうちの第１の集合のスコアを調整するように構成された第１調整手段と、
第１の集合の調整後のスコアに従って、意見語集合と対象特徴集合のうちの第２の集合のスコアを、関係に基づいて計算するように構成された変換計算手段と、
第１の集合のスコアを第２の集合の調整後のスコアに従って、関係に基づいて計算するために、第２の集合の要素の類似度を使って第２の集合のスコアを調整するように構成された第２調整手段と、
第１の集合の調整前後のスコアの差が既定の要件を満たしているか、または第２の集合の調整前後のスコアの差が既定の要件を満たしている場合には、第１の集合の調整後のスコアに従って第１の集合の要素をランク付けし、かつ第２の集合の調整後のスコアに従って第２の集合の要素をランク付けするように構成されたランク付け手段と
を備えることを特徴とする付記１０に記載の評価情報抽出装置。 (Appendix 15)
The evaluation information extraction apparatus according to Appendix 10, wherein the optimization means comprises:
First adjustment means configured to adjust the score of a first set of the opinion word set and the target feature set using the similarity of the elements of the first set;
Conversion calculation means configured to calculate, based on the relationship, the score of a second set of the opinion word set and the target feature set according to the adjusted score of the first set;
Second adjustment means configured to adjust the score of the second set using the similarity of the elements of the second set, so that the score of the first set can be calculated, based on the relationship, according to the adjusted score of the second set; and
Ranking means configured to rank the elements of the first set according to the adjusted score of the first set and to rank the elements of the second set according to the adjusted score of the second set, when the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement.

（付記１６）
前記最適化手段は、
コーパスの第１の集合の各要素の頻度情報に基づき、事前に定義された方針に従って、第１の集合のスコアを初期化するように構成された手段を備えることを特徴とする付記１４又は付記１５に記載の評価情報抽出装置。 (Appendix 16)
The evaluation information extraction apparatus according to Appendix 14 or Appendix 15, wherein the optimization means comprises means configured to initialize the score of the first set according to a predefined policy, based on the frequency information of each element of the first set in the corpus.

（付記１７）
前記第１調整手段は、
第１の集合の要素の類似度および先験的な信頼度とに基づき、第１の集合のスコアを調整することによって調整スコアを求めるように構成された手段と、
調整スコアと第１の集合のスコアとの差が既定の要件を満たしている場合には、調整スコアを第１の集合の調整後のスコアと判定し、かつ第１の集合のスコアの調整を停止するように構成された手段と、
調整スコアと第１の集合のスコアとの差が既定の要件を満たしていない場合には、第１の集合のスコアを調整スコアで更新するように構成された手段とを備え、
前記第２調整手段は、
第２の集合の要素の類似度および先験的な信頼度に基づき、第２の集合のスコアを調整することによって調整スコアを求めるように構成された手段と、
調整スコアと第２の集合のスコアとの差が既定の要件を満たしている場合には、調整スコアを第２の集合の調整後のスコアと判定して第２の集合のスコアの調整を停止するように構成された手段と、
調整スコアと第２の集合のスコアとの差が既定の要件を満たしていない場合には、第２の集合のスコアを調整スコアで更新するように構成された手段とを備えることを特徴とする付記１４又は付記１５に記載の評価情報抽出装置。 (Appendix 17)
The evaluation information extraction apparatus according to Appendix 14 or Appendix 15, wherein the first adjustment means comprises:
Means configured to determine an adjusted score by adjusting the score of the first set based on the similarity of the elements of the first set and an a priori confidence;
Means configured to, when the difference between the adjusted score and the score of the first set satisfies a predetermined requirement, determine the adjusted score to be the score of the first set after adjustment and stop adjusting the score of the first set; and
Means configured to, when the difference between the adjusted score and the score of the first set does not satisfy the predetermined requirement, update the score of the first set with the adjusted score,
And wherein the second adjustment means comprises:
Means configured to determine an adjusted score by adjusting the score of the second set based on the similarity of the elements of the second set and an a priori confidence;
Means configured to, when the difference between the adjusted score and the score of the second set satisfies a predetermined requirement, determine the adjusted score to be the score of the second set after adjustment and stop adjusting the score of the second set; and
Means configured to, when the difference between the adjusted score and the score of the second set does not satisfy the predetermined requirement, update the score of the second set with the adjusted score.

（付記１８）
前記抽出手段は、
最適化された意見語集合から意見語の既定のしきい値に従って高い順位の意見語を抽出するように構成された手段と、
最適化された対象特徴集合から対象特徴の既定のしきい値に従って高い順位の対象特徴を抽出するように構成された手段と、
意見語集合と対象特徴集合の関係に基づいて、高い順位の意見語と高い順位の対象特徴から評価情報を取得するように構成された手段と
を備えることを特徴とする付記１０に記載の評価情報抽出装置。 (Appendix 18)
The evaluation information extraction apparatus according to Appendix 10, wherein the extraction means comprises:
Means configured to extract high-ranking opinion words from the optimized opinion word set according to a predetermined threshold for opinion words;
Means configured to extract high-ranking target features from the optimized target feature set according to a predetermined threshold for target features; and
Means configured to obtain evaluation information from the high-ranking opinion words and the high-ranking target features based on the relationship between the opinion word set and the target feature set.

５０１：取得手段
５０２：最適化手段
５０３：抽出手段
501: Acquisition means
502: Optimization means
503: Extraction means

Claims

An evaluation information extraction method by an evaluation information extraction device,
The acquisition means of the evaluation information extraction device executes a step of obtaining an opinion word set and a target feature set from a corpus,
The optimization means of the evaluation information extraction device executes a step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set,
The extraction means of the evaluation information extraction device executes a step of extracting evaluation information based on the optimized opinion word set and the optimized target feature set,
Obtaining the opinion word set and the target feature set from the corpus,
Performing corpus preprocessing to obtain text units;
Obtaining a set of opinion words according to opinion word extraction rules based on the obtained text units;
Obtaining a target feature set according to a target feature extraction rule based on the acquired text unit,
The opinion word extraction rule extracts from the text units, as opinion words, one or more of: a phrase that immediately follows a degree adverb, an adjective, a phrase that contains no function word, a phrase whose length is less than or equal to that of the longest opinion word, and a phrase whose frequency of occurrence is higher than that of the opinion word with the minimum frequency of occurrence.

An evaluation information extraction method by an evaluation information extraction device,
The acquisition means of the evaluation information extraction device executes a step of obtaining an opinion word set and a target feature set from a corpus,
The optimization means of the evaluation information extraction device executes a step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set,
The extraction means of the evaluation information extraction device executes a step of extracting evaluation information based on the optimized opinion word set and the optimized target feature set,
Obtaining the opinion word set and the target feature set from the corpus,
Performing corpus preprocessing to obtain text units;
Obtaining a set of opinion words according to opinion word extraction rules based on the obtained text units;
Obtaining a target feature set according to a target feature extraction rule based on the acquired text unit,
The target feature extraction rule extracts from the text units, as target features, one or more of: a basic noun phrase, a combination of basic noun phrases, a combination of a basic noun phrase and a noun/verbal noun, a combination of a basic noun phrase and a qualifier, a combination of a qualifier and a noun/verbal noun, a phrase that contains no function word, a phrase whose length is less than or equal to that of the longest target feature, and a phrase whose frequency of occurrence is higher than that of the target feature with the lowest frequency of occurrence.

An evaluation information extraction method by an evaluation information extraction device,
The acquisition means of the evaluation information extraction device executes a step of obtaining an opinion word set and a target feature set from a corpus,
The optimization means of the evaluation information extraction device executes a step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set,
The extraction means of the evaluation information extraction device executes a step of extracting evaluation information based on the optimized opinion word set and the optimized target feature set,
The step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set,
Calculating a score of the second set of the opinion word set and the target feature set based on the relationship according to the score of the first set of the opinion word set and the target feature set;
Adjusting the score of the second set using the similarity of the elements of the second set;
Calculating a score of the first set based on the relationship according to the adjusted score of the second set;
Adjusting the score of the first set using the similarity of the elements of the first set and calculating the score of the second set based on the relationship according to the adjusted score of the first set;
Ranking the elements of the first set according to the adjusted score of the first set and ranking the elements of the second set according to the adjusted score of the second set, when the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement;
The evaluation information extraction method characterized by including the above steps.

An evaluation information extraction method by an evaluation information extraction device,
The acquisition means of the evaluation information extraction device executes a step of obtaining an opinion word set and a target feature set from a corpus,
The optimization means of the evaluation information extraction device executes a step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set,
The extraction means of the evaluation information extraction device executes a step of extracting evaluation information based on the optimized opinion word set and the optimized target feature set,
The step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set,
Adjusting the score of the first set of the opinion word set and the target feature set using the similarity of the elements of the first set;
Calculating a score of the second set of the opinion word set and the target feature set according to the adjusted score of the first set based on the relationship;
Adjusting the score of the second set using the similarity of the elements of the second set, so that the score of the first set can be calculated, based on the relationship, according to the adjusted score of the second set;
Ranking the elements of the first set according to the adjusted score of the first set and ranking the elements of the second set according to the adjusted score of the second set, when the difference between the scores of the first set before and after adjustment satisfies a predetermined requirement or the difference between the scores of the second set before and after adjustment satisfies a predetermined requirement;
The evaluation information extraction method characterized by including the above steps.

The evaluation information extraction method according to claim 3 or 4, wherein the optimization means of the evaluation information extraction device executes a step of initializing the score of the first set according to a predefined policy, based on the frequency information of each element of the first set in the corpus.

The step of adjusting the score of the first set using the similarity of the elements of the first set includes:
Determining an adjusted score by adjusting the score of the first set based on the similarity of the elements of the first set and an a priori confidence;
When the difference between the adjusted score and the score of the first set satisfies a predetermined requirement, determining the adjusted score to be the score of the first set after adjustment and stopping the adjustment of the score of the first set;
Updating the score of the first set with the adjusted score if the difference between the adjusted score and the score of the first set does not satisfy the predetermined requirement;
The step of adjusting the score of the second set using the similarity of the elements of the second set includes:
Determining an adjusted score by adjusting the score of the second set based on the similarity of the elements of the second set and an a priori confidence;
When the difference between the adjusted score and the score of the second set satisfies a predetermined requirement, determining the adjusted score to be the score of the second set after adjustment and stopping the adjustment of the score of the second set;
Updating the second set of scores with the adjusted score if the difference between the adjusted score and the score of the second set does not satisfy a predetermined requirement. Item 5. The evaluation information extraction method according to Item 4 .
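
The inner adjustment loop for one set can be sketched as below. The claim does not give the update formula, so the formula used here is an assumption: each score becomes a mix of the element's a priori confidence and the similarity-weighted mean of the current scores. The names `adjust_scores`, `beta`, and `tol` are hypothetical, and the predetermined requirement is assumed to be a maximum absolute change below `tol`.

```python
def adjust_scores(scores, sim, prior, beta=0.3, tol=1e-8, max_iter=1000):
    """Repeat the similarity-based adjustment until the scores stop changing.

    scores: current scores of the set
    sim:    square similarity matrix over the set's elements
    prior:  a priori confidence of each element
    beta:   assumed weight given to the a priori confidence
    """
    current = list(scores)
    for _ in range(max_iter):
        adjusted = []
        for i, row in enumerate(sim):
            weight = sum(row)
            spread = (sum(w * s for w, s in zip(row, current)) / weight
                      if weight else current[i])
            adjusted.append(beta * prior[i] + (1 - beta) * spread)
        # Requirement satisfied: accept the adjusted scores and stop.
        if max(abs(a - c) for a, c in zip(adjusted, current)) < tol:
            return adjusted
        # Requirement not satisfied: update the scores and adjust again.
        current = adjusted
    return current


# Two elements that are fully similar to each other; only the first is trusted a priori.
print(adjust_scores([0.5, 0.5], [[1, 1], [1, 1]], [1.0, 0.0]))
```

With `beta` strictly between 0 and 1 and non-negative similarities, each round shrinks the remaining change by a factor of at most `1 - beta`, so the loop terminates for any positive `tol`.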

An evaluation information extraction method performed by an evaluation information extraction device, wherein
The acquisition means of the evaluation information extraction device executes a step of acquiring an opinion word set and a target feature set from a corpus,
The optimization means of the evaluation information extraction device executes a step of optimizing the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set,
The extraction means of the evaluation information extraction device executes a step of extracting evaluation information based on the optimized opinion word set and the optimized target feature set, and
The step of extracting the evaluation information based on the optimized opinion word set and the optimized target feature set includes:
Extracting high-ranking opinion words from the optimized opinion word set according to a predetermined opinion word threshold;
Extracting high-ranking target features from the optimized target feature set according to a predetermined target feature threshold;
Obtaining the evaluation information from the high-ranking opinion words and the high-ranking target features based on the relationship between the opinion word set and the target feature set;
The evaluation information extraction method characterized by including the above steps.
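
The extraction step in the claim above can be illustrated with the following sketch. It assumes that each threshold is a minimum score (the claim would equally allow a top-N cutoff), that the relationship is a table of co-occurrence counts keyed by (opinion word, target feature), and that a piece of evaluation information is a (target feature, opinion word) pair. The function names and the sample data are hypothetical.

```python
def top_elements(scores, threshold):
    """Return the elements whose score reaches the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [element for element, score in ranked if score >= threshold]


def extract_evaluations(opinion_scores, feature_scores, relation,
                        opinion_threshold, feature_threshold):
    """Pair high-ranking target features with related high-ranking opinion words."""
    top_opinions = set(top_elements(opinion_scores, opinion_threshold))
    top_features = set(top_elements(feature_scores, feature_threshold))
    return sorted(
        (feature, opinion)
        for (opinion, feature), count in relation.items()
        if count > 0 and opinion in top_opinions and feature in top_features
    )


# Hypothetical optimized scores and co-occurrence counts.
opinion_scores = {"sharp": 0.5, "great": 0.4, "the": 0.1}
feature_scores = {"screen": 0.6, "battery": 0.3, "thing": 0.1}
relation = {("sharp", "screen"): 4, ("great", "battery"): 2,
            ("the", "screen"): 5, ("sharp", "thing"): 1}

print(extract_evaluations(opinion_scores, feature_scores, relation, 0.3, 0.2))
```

Low-scoring candidates such as "the" and "thing" are dropped before pairing, so a frequent but uninformative co-occurrence does not produce evaluation information.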

An acquisition means configured to acquire an opinion word set and a target feature set from a corpus;
Optimization means configured to optimize the opinion word set and the target feature set based on the relationship between the opinion word set and the target feature set, the similarity of the elements of the opinion word set, and the similarity of the elements of the target feature set; and
Extraction means configured to extract evaluation information based on the optimized opinion word set and the optimized target feature set,
The acquisition means includes:
Means configured to pre-process the corpus to obtain text units;
Means configured to acquire the opinion word set according to an opinion word extraction rule based on the obtained text units; and
Means configured to acquire the target feature set according to a target feature extraction rule based on the obtained text units;
The opinion word extraction rule extracts from the text units, as opinion words, phrases that satisfy one or more of the following: immediately following a degree adverb; being an adjective; containing no function word; having a length less than or equal to that of the longest opinion word; and having at least a minimum frequency of appearance;
An evaluation information extraction apparatus characterized by the above.
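
A rough sketch of how several of the opinion word extraction conditions might be combined is given below. It is an assumption-laden illustration: the word lists `DEGREE_ADVERBS` and `FUNCTION_WORDS`, the parameters `max_len` and `min_freq`, and the decision to require all of the implemented conditions together are hypothetical. The adjective condition is omitted because it would need a part-of-speech tagger. Text units are assumed to be lists of tokens produced by pre-processing.

```python
from collections import Counter

DEGREE_ADVERBS = {"very", "extremely", "quite"}        # hypothetical list
FUNCTION_WORDS = {"the", "a", "of", "is", "and", "to"}  # hypothetical list


def opinion_candidates(text_units, max_len=2, min_freq=2):
    """Collect phrases that follow a degree adverb and pass the other filters.

    A phrase is kept when it immediately follows a degree adverb, contains no
    function word, has at most max_len tokens, and occurs at least min_freq times.
    """
    counts = Counter()
    for tokens in text_units:
        for i, token in enumerate(tokens[:-1]):
            if token not in DEGREE_ADVERBS:
                continue
            for n in range(1, max_len + 1):
                phrase = tokens[i + 1:i + 1 + n]
                # Stop growing the phrase at the unit's end or at a function word.
                if len(phrase) < n or any(w in FUNCTION_WORDS for w in phrase):
                    break
                counts[" ".join(phrase)] += 1
    return {phrase for phrase, count in counts.items() if count >= min_freq}


# Hypothetical pre-processed text units.
units = [
    ["screen", "is", "very", "sharp"],
    ["very", "sharp", "and", "bright"],
    ["quite", "slow", "to", "start"],
]
print(opinion_candidates(units))
```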