JP6971104B2

JP6971104B2 - Information processing equipment, information processing methods, and programs

Info

Publication number: JP6971104B2
Application number: JP2017180066A
Authority: JP
Inventors: 智輝齋藤; 樹生豊田; 真也夜久; 宏希岩澤; 健萩原
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2021-11-24
Anticipated expiration: 2037-09-20
Also published as: JP2019057022A

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Conventionally, crawlers are known that collect data (documents, images, and the like) from the Web and automatically build a database from the collected data (see Patent Document 1). A crawler follows the links in web pages and collects data from web pages at various IP addresses. The data collected by the crawler is accumulated in a web information database.

Meanwhile, when a search engine receives a search word input by a user, it acquires from the database information on web pages and application pages related to the received search word (for example, a URL: Uniform Resource Locator), and outputs a list of the acquired information as a search result.

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-69171

To improve user satisfaction, a search engine may be operated so as to output data related to the search word (query) input by the user together with the search result. To achieve this, a database that stores text data and entities in association with each other may be used. In such a database, when text data associated with only one specific entity is input as a query, that specific entity is inevitably output together with the search result. However, if this text data is incorrect or inappropriate as information about the entity, inappropriate information may be output together with the search result.

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing apparatus capable of suppressing the output of inappropriate information together with a search result.

One aspect of the present invention is an information processing apparatus including: a selection unit that selects arbitrary target text data from content information in which entities and text data are associated with each other; a collection unit that collects one or more related web pages related to the target text data selected by the selection unit; and an evaluation unit that, for each entity associated with the target text data in the content information, evaluates that entity as the entity corresponding to the target text data, based on at least one of a first probability and a second probability, and on a third probability. The first probability is the probability that, in a first type of related web page among the one or more related web pages collected by the collection unit, the target text data is included as text indicating a link destination. The second probability is the probability that, in a second type of related web page among the one or more related web pages, a link destination that includes as text the entity associated with the target text data in the content information was selected. The third probability is the probability that the web page at the link destination indicated by the text in the first type or the second type of related web page is the web page of the entity associated with the target text data in the content information.

According to one aspect of the present invention, the output of inappropriate information together with a search result can be suppressed.

A diagram showing the usage environment and configuration of the knowledge data server according to the present embodiment.
A diagram showing an example of the knowledge graph according to the present embodiment.
A diagram showing an example of the web page of "baseball player A" according to the present embodiment.
A diagram showing an example of the web page of "baseball team B" according to the present embodiment.
A diagram showing an example of the content information according to the present embodiment.
A diagram showing an example of the search result web page according to the present embodiment.
A diagram showing another example of the content information according to the present embodiment.
A diagram showing another example of the search result web page according to the present embodiment.
A diagram showing an example of a related web page collected by the collection unit according to the present embodiment.
A diagram showing another example of a related web page collected by the collection unit according to the present embodiment.
A diagram showing an example of identical entities according to the present embodiment.
A diagram showing another example of the web page of "baseball player A" according to the present embodiment.
A diagram showing an example of the content information and scores according to the present embodiment.
A flowchart showing an example of the operation by which the knowledge data server of the present embodiment calculates scores.
A flowchart showing an example of the operation of the knowledge data server of the present embodiment based on the scores.

Hereinafter, embodiments of the information processing apparatus, information processing method, and program of the present invention will be described with reference to the drawings. In the present embodiment, the information processing apparatus is described as forming part of a knowledge data server. The knowledge data server is, for example, a server that collects data to be collected (for example, images and text data) and generates a knowledge graph based on the collected data. In the present embodiment, the knowledge graph is data in which entities, classes, and properties are associated with the location information of web pages that describe data related to the entities. An entity may represent, for example, a concrete instance of a target thing (for example, an object that exists in the real world), or it may represent a concept of a target thing (for example, a concept defined in the real world or in a virtual world). For example, when the target thing is the concept "building", an entity may represent a concrete instance such as "XX Tower" or "XX Building". Likewise, when the target thing is the concept "economics", an entity may represent an abstract concept with no physical substance, such as "microeconomics" or "macroeconomics". The embodiment is described in detail below.

<Embodiment>
<1-1. Usage environment of the knowledge data server>
FIG. 1 is a diagram showing the usage environment and configuration of the knowledge data server 100 according to the present embodiment. The knowledge data server 100 is communicably connected, via the network NW, to the terminal device 200, the web server 300, and the crawl target device DV. The network NW means the World Wide Web, and is a system that uses HTML documents and the like as standardly used on the Internet and intranets. The network NW may further include radio base stations, provider devices, dedicated lines, and the like.

The terminal device 200 is a device used by a user, for example, a mobile phone such as a smartphone, a tablet computer, a notebook computer, or a desktop computer. The web server 300 is a server that uses a search engine to generate a web page for displaying search results and provides it to the terminal device 200.

The knowledge data server 100 includes, for example, a control unit 110 and a storage unit 120. The storage unit 120 is realized by, for example, a RAM (Random Access Memory), an HDD (Hard Disk Drive), a flash memory, or a hybrid storage device combining two or more of these. Part or all of the storage unit 120 may be an external device that the knowledge data server 100 can access, such as a NAS (Network Attached Storage) or an external storage server. The storage unit 120 stores a knowledge graph (hereinafter, knowledge graph D1) and content information D2.

FIG. 2 is a diagram showing an example of the knowledge graph D1 according to the present embodiment. The things described in the knowledge graph D1 are defined by an ontology. An ontology defines the classes and properties of things and collects the constraints that hold between classes and properties.

A class is, in an ontology, a group of things that share the same nature. What nature a thing has, that is, which class it belongs to, is determined by the properties described below.

For example, a thing that has a beak, is an oviparous vertebrate, and has forelimbs that are wings is classified into the class "bird". Within the class "bird", a thing that cannot fly is classified into a lower class such as "penguin" or "ostrich". In this way, the class system may be a hierarchy of upper and lower classes. The nature of an upper class is inherited by its lower classes. In the example above, the nature of the class "bird" (has a beak, is an oviparous vertebrate, and has forelimbs that are wings) is also part of the nature of the lower classes "penguin" and "ostrich". The class name that identifies a class may be a meaningful character string; for example, the class "bird" may be named by the character string "bird". A class name does not necessarily have to carry meaning, however: even the class "bird" may be assigned a character string that is mere identification information, such as "information 1" or "C1". Each of the entities described above, that is, each thing, belongs to one of the classes in the class system defined by the ontology.
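The inheritance of an upper class's nature by its lower classes can be illustrated with a minimal Python sketch. The type `OntologyClass` and its method `all_properties` are hypothetical names introduced only for this illustration; the patent does not prescribe any data structure.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class OntologyClass:
    """A class in the ontology (hypothetical type for illustration)."""
    name: str  # may be meaningful ("bird") or a bare identifier ("C1")
    properties: Set[str] = field(default_factory=set)
    parent: Optional["OntologyClass"] = None

    def all_properties(self) -> Set[str]:
        # A lower class inherits every property of its upper classes.
        inherited = self.parent.all_properties() if self.parent else set()
        return inherited | self.properties


bird = OntologyClass(
    "bird", {"has a beak", "oviparous vertebrate", "forelimbs are wings"}
)
penguin = OntologyClass("penguin", {"cannot fly"}, parent=bird)
```

Here `penguin.all_properties()` contains the three properties of "bird" plus "cannot fly", while `bird.all_properties()` does not contain "cannot fly".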

A property is an attribute that describes the nature or characteristics of a thing, or a relationship between classes. For example, a property may be an attribute expressing a nature such as "has ... as a component of its body" or "lives in ...", or an attribute expressing the upper-lower relationship between classes, such as "one class is the upper class and another is the lower class". As with class names, the property name that identifies a property may or may not carry meaning in itself.

The knowledge graph D1 is a directed graph in which the classes described above are represented as nodes and the properties described above are represented as labeled, directed edges. With this graph structure, information about things can be identified from the nodes, and relationships between things can be identified from the edges.
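As a rough illustration of such a graph, the following Python sketch stores labeled, directed edges in an adjacency list. The class `LabeledDigraph`, the edge labels "subclass of" and "lives in", and the node "Antarctica" are assumptions made for the example and do not appear in the patent.

```python
from collections import defaultdict


class LabeledDigraph:
    """Directed graph whose edges carry a property name as a label."""

    def __init__(self):
        # source node -> list of (label, target node)
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def targets(self, source, label):
        # Follow only the edges leaving `source` that carry `label`.
        return [t for (l, t) in self._out.get(source, []) if l == label]


graph = LabeledDigraph()
graph.add_edge("penguin", "subclass of", "bird")      # upper-lower relation
graph.add_edge("penguin", "lives in", "Antarctica")   # a nature of the class
```

Because edges are directed, `graph.targets("penguin", "subclass of")` returns `["bird"]`, while the same lookup starting from "bird" returns an empty list.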

In the example knowledge graph D1 shown in FIG. 2, entity E1 and entity E2 belong to a class with the property name "affiliated team". In the present embodiment, each entity is associated with information that identifies the entity (hereinafter, entity identification information EID), an entity name, and the location information of the entity's web page. Location information is information for specifying a position on the Web, for example, a URL (Uniform Resource Locator). The web page indicated by the location information describes data related to the entity. In the knowledge graph D1 shown in FIG. 2, entity E1 is associated with the entity identification information EID "0001", the entity name "baseball player A", and the URL "http://百科事典ウェブページ/野球選手Ａ" (an encyclopedia web page for baseball player A), which is the location information of the web page describing data related to that entity. Entity E2 is associated with the entity identification information EID "0002", the entity name "baseball team B", and the URL "http://百科事典ウェブページ/野球チームＢ" (an encyclopedia web page for baseball team B), which is the location information of the web page describing data related to that entity. In the following description, data related to an entity is referred to as the related data of the entity, and a web page describing the related data of an entity is also referred to as the web page of the entity.

FIG. 3 is a diagram showing an example of the web page of "baseball player A" according to the present embodiment. The web page of "baseball player A" describes the related data of "baseball player A". FIG. 4 is a diagram showing an example of the web page of "baseball team B" according to the present embodiment. The web page of "baseball team B" describes the related data of "baseball team B".

FIG. 5 is a diagram showing an example of the content information D2 according to the present embodiment. The content information D2 is information in which entities and text data are associated with each other. The content information D2 is generated based on the result of collecting related data contained in web pages, or on the search log of a search engine. For example, the content information D2 is information in which text data collected from the web page of an entity is associated with that entity. As another example, the content information D2 is information in which, when an entity is input as a query, the text data input together with the entity is associated with that entity. In the example content information D2 shown in FIG. 5, the entity "baseball player A" is associated with the text data "3000 hits", and the entity "baseball team B" is associated with the text data "2017 championship".
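The search-log route for generating content information can be sketched as follows. This is a simplified illustration under stated assumptions: the function `build_content_info`, the substring matching of entity names, and the sample queries are all hypothetical, and the patent does not specify how entity names are detected within a query.

```python
from collections import defaultdict


def build_content_info(queries, entity_names):
    """Associate each known entity with the text typed alongside it in a query."""
    content_info = defaultdict(set)  # entity name -> associated text data
    for query in queries:
        for name in entity_names:
            if name in query:
                # Whatever remains after removing the entity name is
                # treated as the text data input together with the entity.
                rest = query.replace(name, "").strip()
                if rest:
                    content_info[name].add(rest)
    return dict(content_info)


queries = [
    "baseball player A 3000 hits",
    "baseball team B 2017 championship",
    "baseball player A",  # entity alone: no text data to associate
]
content_info = build_content_info(
    queries, ["baseball player A", "baseball team B"]
)
```

With these sample queries the result matches the associations of FIG. 5: "baseball player A" with "3000 hits" and "baseball team B" with "2017 championship".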

<1-2. Search result web page>
FIG. 6 is a diagram showing an example of a search result web page according to the present embodiment. As shown in FIG. 6, the display unit 210 displays a search result web page that includes a query input area 211, a knowledge panel 220, and search results 230. The knowledge panel 220 includes an entity 221 corresponding to the query, an image 222 related to the entity 221, and related information 223 related to the entity 221. Using the terminal device 200, the user inputs a query on the search page displayed on the display unit 210 of the terminal device 200. A query is a single search word or a combination of multiple search words. The terminal device 200 transmits the query input by the user to the web server 300.

The web server 300 transmits the query received from the terminal device 200 to the knowledge data server 100. The knowledge data server 100 transmits information related to the entity corresponding to the received query to the web server 300. For example, the knowledge data server 100 determines, based on the content information D2, the entity associated with the received query (text data). The knowledge data server 100 then transmits, based on the knowledge graph D1, the location information associated with the determined entity to the web server 300. Based on the received location information, the web server 300 extracts the related data of the entity from the web page indicated by that location information. The web server 300 generates a search result web page (for example, FIG. 6) that includes the extracted related data of the entity (in this example, the information shown in the knowledge panel 220).
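The two-step lookup performed by the knowledge data server 100 (query to entity via D2, then entity to location information via D1) might look like the following Python sketch. The dictionaries and the function `resolve_location` are hypothetical stand-ins; the values follow the examples of FIG. 2 and FIG. 5, and mapping each text to a single entity is a simplification.

```python
from typing import Optional

# Content information D2: text data -> entity identification information (EID).
CONTENT_INFO = {
    "3000 hits": "0001",
    "2017 championship": "0002",
}

# Knowledge graph D1 entries: EID -> (entity name, location information).
KNOWLEDGE_GRAPH = {
    "0001": ("baseball player A", "http://百科事典ウェブページ/野球選手Ａ"),
    "0002": ("baseball team B", "http://百科事典ウェブページ/野球チームＢ"),
}


def resolve_location(query: str) -> Optional[str]:
    """Return the location information of the entity tied to the query, if any."""
    entity_id = CONTENT_INFO.get(query)       # step 1: D2 lookup
    if entity_id is None:
        return None
    _name, url = KNOWLEDGE_GRAPH[entity_id]   # step 2: D1 lookup
    return url
```

A query with no associated entity yields `None`, in which case the web server would presumably render the ordinary search results without a knowledge panel.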

In the example shown in FIG. 6, "3000 hits" is input as the query, and the knowledge panel 220 displays an image and various information related to "3000 hits". For example, the knowledge panel 220 displays an image 222 of "baseball player A", who achieved "3000 hits", and related information 223 such as the date of birth and birthplace of "baseball player A". By displaying the knowledge panel 220 on the display unit 210 in addition to the search results 230, user satisfaction with the search results can be improved.

<1-3. Content information D2 associated with minority text data>
FIG. 7 is a diagram showing content information D2a, which is another example of the content information D2 according to the present embodiment. In the content information D2, the same text data may be associated with a plurality of different entities. For example, an entity representing a person is associated with text data indicating the person's gender, such as "male" or "female". Consequently, even entities that represent different persons are associated with the same text data (for example, "male" or "female"). In contrast, an entity may be associated with text data that is rarely associated with other entities. FIG. 7 shows content information D2 in which such minority text data is associated with an entity (hereinafter, content information D2a).
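The distinction between text data shared by many entities and text data tied to few entities can be made concrete by inverting the content information. The sketch below is only an illustration: the functions `entities_per_text` and `rare_texts`, the threshold `max_entities`, and the entity "baseball player C" are assumptions, not part of the patent.

```python
from collections import defaultdict


def entities_per_text(content_info):
    """Invert entity -> texts into text -> set of entities."""
    index = defaultdict(set)
    for entity, texts in content_info.items():
        for text in texts:
            index[text].add(entity)
    return dict(index)


def rare_texts(content_info, max_entities=1):
    """Texts associated with at most `max_entities` entities, sorted by name."""
    index = entities_per_text(content_info)
    return sorted(t for t, es in index.items() if len(es) <= max_entities)


content_info = {
    "baseball player A": {"male", "violent man"},
    "baseball player C": {"male"},  # hypothetical second entity
}
```

Here `rare_texts(content_info)` returns `["violent man"]`. Rarity alone does not show that an association is inappropriate, since accurate text data such as "3000 hits" would be flagged in the same way.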

In the example shown in FIG. 7, the content information D2a associates the minority text data "violent man" with the entity "baseball player A". At the time the content information D2 is generated, it is not verified whether the text data associated with an entity reflects a general (majority) opinion or a minority opinion. For example, if a web page describing minority content about "baseball player A" (for example, "violent man") exists on the network NW, content information D2a such as that shown in FIG. 7 is generated based on that web page.

<1-4. Knowledge panel 220 based on a minority opinion>
FIG. 8 is a diagram showing another example of a search result web page according to the present embodiment. When text data representing a minority opinion is input to the query input area 211, the display unit 210 displays a search result web page in which information on the entity associated with that text data is shown, for example, in the knowledge panel 220. In the example shown in FIG. 8, when the text data "violent man" is input to the query input area 211, the knowledge data server 100 determines that the entity associated with "violent man" is "baseball player A". The knowledge data server 100 then supplies the location information of the web page of "baseball player A" to the web server 300 as the information on the entity corresponding to the query. The web server 300 generates a search result web page that displays a knowledge panel 220 containing, as related data for "violent man", the image 222 of "baseball player A" and the related information 223 such as the date of birth and birthplace of "baseball player A". In this case, a user who views the search result web page gets the impression that "baseball player A" is a "violent man". Moreover, a malicious user who wants to create the bad impression that "baseball player A" is a "violent man" can write that claim on a web page, thereby inducing this relationship to be registered in the knowledge graph D1 and causing a search result web page such as that shown in FIG. 8 to be displayed.

To suppress such processing, it is preferable to obtain, for text data associated with an entity, an index indicating whether that text data represents a majority or a minority view. The knowledge data server 100 of the present embodiment derives an index (hereinafter, score) indicating whether the text data associated with an entity is appropriate. The detailed configuration of the knowledge data server 100 is described below.

<1-5. Configuration of the knowledge data server 100>
Returning to FIG. 1, the control unit 110 is realized by, for example, a hardware processor such as a CPU (Central Processing Unit) executing a program (software). The control unit 110 includes, for example, a selection unit 111, a collection unit 112, a score calculation unit 113, a location information acquisition unit 114, and a communication I/F (Interface) 115. Some or all of these components (excluding any storage they contain) may be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation between software and hardware. The communication I/F 115 mediates communication, via the network NW, between each functional unit and other devices (for example, the device to be crawled (hereinafter, crawl target device DV), the terminal device 200, and the web server 300).

The selection unit 111 selects, from the content information D2, arbitrary text data for which a score is to be calculated (hereinafter, target text data). The collection unit 112 collects, via the network NW, web pages related to the target text data selected by the selection unit 111 (hereinafter, related web pages). The web pages collected by the collection unit 112 are, for example, web pages of an encyclopedia that users can edit, and search result web pages obtained when the target text data is used as a query. Hereinafter, a web page of a user-editable encyclopedia is referred to as an encyclopedia web page. The encyclopedia web page is an example of the first type of related web page, and the search result web page is an example of the second type of related web page.

FIG. 9 is a diagram showing an example of an encyclopedia web page collected by the collection unit 112. In the example shown in FIG. 9, the collection unit 112 collects, from among the encyclopedia web pages, a related web page that contains the target text data "3000 hits", which the content information D2 associates with the entity "baseball player A". The collection unit 112 collects, for example, data to be collected, including the HTML (HyperText Markup Language) data of the encyclopedia web page, from the crawl target device DV via the network NW. The collection unit 112 is realized by a so-called crawler program. The data to be collected resides on the network NW (in the storage area of the crawl target device DV) and can be viewed in a browser. The data to be collected is not limited to data viewable in a browser and may be data reproduced by an application program. The network NW means the World Wide Web, and is a system that uses HTML documents and the like as standardly used on the Internet and intranets. The data to be collected is, for example, a related web page that includes text data representing HTML source. When the encyclopedia web pages include related web pages that contain the target text data, the collection unit 112 collects all of them.

FIG. 10 is a diagram showing an example of a search result web page collected by the collection unit 112. In the example shown in FIG. 10, the search result web page shows the search results obtained when the query "3000 hits" (text TX21 in the figure) is input, among which the entity "baseball player A" indicated by the content information D2 is displayed as a search result. The query is an example of target text data.

Returning to FIG. 1, the score calculation unit 113 calculates a score for each pair of target text data and entity based on the related web pages collected by the collection unit 112. The score calculation unit 113 calculates, for example, the value of P(e|s) given by equation (1) as the score. The elements of equation (1) are explained below together with equations (2) and (3), separately for each type of related web page.

The following first describes the details of the score calculation process when the related web page is an encyclopedia web page, and then the details when the related web page is a search result web page.

<1-6. If the related web page is an encyclopedia web page>
The score calculation unit 113 calculates the probability that, among the text data contained in the encyclopedia web pages, text data indicating the entity associated with the target text data appears as text indicating a link destination (anchor text). A related web page contains text data (texts TX12 to TX14 in the figure) indicating the entity (in this example, "baseball player A") associated with the target text data (in this example, "3000 hits", text TX11 in the figure). Of texts TX12 to TX14, text TX12 is anchor text. The link to another web page attached to text TX12 is the location information that the knowledge graph D1 associates with the entity linked to the target text data (in this example, "baseball player A").

Here, the score calculation unit 113 uses equation (2) to calculate the probability (hereinafter, the anchor text probability) given by the number of occurrences of the target text data that are anchor text relative to the total number of occurrences of the target text in all encyclopedia web pages containing the target text data. In equation (2), a_s denotes the anchor text probability. The anchor text probability is an example of the first probability.

The score calculation unit 113 also uses equation (3) to calculate the probability (hereinafter, the first entity probability) that the link destination of the anchor text is the web page of the entity corresponding to the location information associated with the target text data in the knowledge graph D1 (hereinafter, the specific web page). In equation (3), e denotes the first entity probability. The first entity probability is an example of the third probability.
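Equations (1) to (3) appear as images in the original publication and are not present in this text. Based only on the verbal definitions above, one plausible reading is sketched below. The count symbols N(s), N_anchor(s), and N_anchor→e(s), and the product form of equation (1), are assumptions introduced for illustration, not the patent's own notation.

```latex
% Assumed reconstruction from the surrounding prose; not the original equations.
% N(s): occurrences of target text s in the collected encyclopedia pages
% N_anchor(s): those occurrences that are anchor text
% N_{anchor -> e}(s): anchors whose link destination is the web page of entity e
\begin{align}
P(e \mid s)   &= P(a_s \mid s)\, P(e \mid a_s) \tag{1} \\
P(a_s \mid s) &= \frac{N_{\mathrm{anchor}}(s)}{N(s)} \tag{2} \\
P(e \mid a_s) &= \frac{N_{\mathrm{anchor} \to e}(s)}{N_{\mathrm{anchor}}(s)} \tag{3}
\end{align}
```

For search result web pages the same shapes would apply with click counts in place of occurrence counts.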

When the only related web page collected by the collection unit 112 is the web page shown in FIG. 9, the score calculation unit 113 calculates the probability that the target text data is anchor text, among all target text data contained in the related web pages (that is, the anchor text probability), as 1/3. This is because, of the target text data shown in the figure, only one occurrence carries a link (an occurrence carrying two or more links may still be counted as one). The score calculation unit 113 also calculates the probability that the link destination of the anchor text is the specific web page of the entity associated with the target text data (the first entity probability) as 1.
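The FIG. 9 arithmetic can be reproduced with a minimal sketch. The `Occurrence` structure, the function names, and the example URL are hypothetical; the sketch assumes three occurrences (TX12 to TX14) of which only TX12 carries a link, as the description states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Occurrence:
    """One occurrence found in an encyclopedia page (hypothetical structure)."""
    label: str
    link_url: Optional[str] = None  # None when the occurrence is plain text


def anchor_text_probability(occurrences: List[Occurrence]) -> float:
    """Share of occurrences that are anchor text (carry a link)."""
    if not occurrences:
        return 0.0
    linked = sum(1 for o in occurrences if o.link_url is not None)
    return linked / len(occurrences)


def first_entity_probability(occurrences: List[Occurrence], entity_url: str) -> float:
    """Share of anchors whose link destination is the entity's web page."""
    linked = [o for o in occurrences if o.link_url is not None]
    if not linked:
        return 0.0
    return sum(1 for o in linked if o.link_url == entity_url) / len(linked)


# Made-up URL standing in for the location information held in knowledge graph D1.
ENTITY_URL = "https://example.org/wiki/baseball_player_A"
fig9 = [Occurrence("TX12", ENTITY_URL), Occurrence("TX13"), Occurrence("TX14")]

print(anchor_text_probability(fig9))               # one linked occurrence out of three
print(first_entity_probability(fig9, ENTITY_URL))  # the single anchor points at the entity page
```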

<1-7. If the related web page is a search result web page>
The score calculation unit 113 calculates the probability that a link destination whose text includes the entity associated with the target text data contained in the search result web page was selected. The target text data is "3000 hits" (text TX21 in the figure), and the entity associated with the target text data is "baseball player A". In the following description, text relating to the entity associated with the target text data is referred to as target entity text data. In the example of FIG. 10, the related web page contains text data (texts TX22 to TX24 in the figure) to which links to web pages related to the target text data are attached. Of these, text TX22 contains the target entity text data (in this example, "baseball player A"). The link destination attached to text TX22 is, for example, the location information that the knowledge graph D1 associates with the entity linked to the target text data (in this example, "baseball player A"). In other words, the link carried by text TX22 points to the specific web page, which is, as one example, an encyclopedia web page. The search result web page may also contain text TX25, which indicates the entity but carries no link.

The score calculation unit 113 uses equation (2) to calculate the probability (hereinafter, the entity text click probability) given by the ratio of the number of clicks on link destinations that include the target entity text data to the number of clicks on all link destinations contained in all search result web pages. A "click" is one example of selection; the selection may also be made by touch, tap, or other means, and is referred to collectively as a click in the following description. In equation (2), a_s denotes the entity text click probability. The entity text click probability is an example of the second probability.

The score calculation unit 113 also uses equation (3) to calculate the probability (hereinafter, the second entity probability) that a link destination that includes the target entity text data is the specific web page for the entity. In equation (3), e denotes the second entity probability. The second entity probability is another example of the third probability.
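The two click-based probabilities can be illustrated with a small sketch over a click log. The log format (a list of link text and destination URL pairs), the function names, the URLs, and the click data are all made up for illustration; the description does not specify how clicks are recorded.

```python
from typing import List, Tuple

Click = Tuple[str, str]  # (link text, destination URL); hypothetical log format


def entity_text_click_probability(clicks: List[Click], entity_text: str) -> float:
    """Clicks on links whose text includes the entity, over all clicks."""
    if not clicks:
        return 0.0
    return sum(1 for text, _ in clicks if entity_text in text) / len(clicks)


def second_entity_probability(clicks: List[Click], entity_text: str, entity_url: str) -> float:
    """Among clicks on entity-text links, the share that led to the entity's page."""
    destinations = [url for text, url in clicks if entity_text in text]
    if not destinations:
        return 0.0
    return sum(1 for url in destinations if url == entity_url) / len(destinations)


ENTITY_URL = "https://example.org/wiki/baseball_player_A"  # made-up location
clicks = [
    ("baseball player A", ENTITY_URL),                    # TX22
    ("baseball player A", ENTITY_URL),                    # TX22
    ("3000 hits club", "https://example.org/3000-hits"),  # TX23 (invented text)
    ("hit records", "https://example.org/records"),       # TX24 (invented text)
]

print(entity_text_click_probability(clicks, "baseball player A"))          # 2 of 4 clicks
print(second_entity_probability(clicks, "baseball player A", ENTITY_URL))  # both reach the entity page
```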

<1-8. When identically worded entities refer to different things>
An entity name may refer to different things even when the wording is identical. FIG. 11 is a diagram showing an example in which the same wording links to the web pages of different entities. In the example shown in FIG. 11, text TX22 links to the web page of "baseball player A" (for example, FIG. 3). Text TX26, in contrast, links to the web page of a different baseball player who has the same full name as "baseball player A" but belongs to baseball team C, a team other than that of "baseball player A". In equation (2), the score calculation unit 113 also counts clicks on text TX26 toward the entity text click probability.

FIG. 12 is a diagram showing an example of a web page about a different entity that is linked from wording indicating the entity. The "baseball player A" web page shown in FIG. 12 contains related data about a baseball player named "baseball player A" who is different from the "baseball player A" associated with the text data "3000 hits" (the "baseball player A" shown in FIG. 3).

When calculating the first entity probability, the score calculation unit 113 does not count the web page shown in FIG. 12 as the entity's web page if it is the link destination.
Likewise, when calculating the second entity probability, the score calculation unit 113 does not count the web page shown in FIG. 12 as the entity's web page if it is the link destination.
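The effect of a same-name link such as TX26 can be sketched as follows: its clicks raise the entity text click probability but are excluded when counting destinations that are the entity's own page. The aggregated-count format, the function name, the URLs, and the click numbers are hypothetical.

```python
from typing import Dict, Tuple

LinkKey = Tuple[str, str]  # (link text, destination URL); hypothetical key


def click_probabilities(click_counts: Dict[LinkKey, int],
                        entity_text: str,
                        entity_url: str) -> Tuple[float, float]:
    """Return (entity text click probability, second entity probability)."""
    total = sum(click_counts.values())
    # Every link whose text includes the entity wording counts here, TX26 included.
    worded = {k: v for k, v in click_counts.items() if entity_text in k[0]}
    worded_total = sum(worded.values())
    # Only destinations equal to the entity's own page count here, so TX26 is excluded.
    on_entity_page = sum(v for (_, url), v in worded.items() if url == entity_url)
    click_prob = worded_total / total if total else 0.0
    entity_prob = on_entity_page / worded_total if worded_total else 0.0
    return click_prob, entity_prob


ENTITY_URL = "https://example.org/wiki/baseball_player_A"        # FIG. 3 page (made-up URL)
NAMESAKE_URL = "https://example.org/wiki/baseball_player_A_(C)"  # FIG. 12 page (made-up URL)
counts = {
    ("baseball player A", ENTITY_URL): 6,               # TX22
    ("baseball player A (team C)", NAMESAKE_URL): 2,    # TX26, same name, different person
    ("hit records", "https://example.org/records"): 2,  # unrelated link (invented)
}

click_prob, entity_prob = click_probabilities(counts, "baseball player A", ENTITY_URL)
print(click_prob, entity_prob)
```

With these invented counts, 8 of 10 clicks land on entity-worded links, but only 6 of those 8 reach the intended entity's page.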

<1-9. Weighting for each related web page>
The score calculation unit 113 may further multiply the product of the anchor text probability and the first entity probability, calculated when the related web page is an encyclopedia web page, by equation (4). Likewise, the score calculation unit 113 may further multiply the product of the entity text click probability and the second entity probability, calculated when the related web page is a search result web page, by equation (4). The score calculation unit 113 then calculates the score by, for example, adding the two results and taking the logarithm.

Equation (4) gives the ratio between the number of samples used to evaluate equations (2) and (3) with encyclopedia web pages and the number of samples used to evaluate equations (2) and (3) with search result web pages.

The score calculation unit 113 weights the score as follows: it multiplies the product of the anchor text probability and the first entity probability by the proportion of samples for which equations (2) and (3) were evaluated using encyclopedia web pages, multiplies the product of the entity text click probability and the second entity probability by the proportion of samples for which equations (2) and (3) were evaluated using search result web pages, and adds the two results.
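A minimal sketch of this weighting follows. It assumes the weights are each source's share of the total sample count and that the logarithm mentioned as an example in section 1-9 is the natural logarithm; equation (4) itself is not reproduced in the text, so both points are assumptions, as are the function name and the input numbers.

```python
import math


def weighted_score(anchor_prob: float, first_entity_prob: float, n_encyclopedia: int,
                   click_prob: float, second_entity_prob: float, n_search: int) -> float:
    """Sample-weighted sum of the two per-source products, returned as a log score."""
    total = n_encyclopedia + n_search
    if total == 0:
        return float("-inf")  # no evidence at all
    w_encyclopedia = n_encyclopedia / total  # assumed form of the first weight
    w_search = n_search / total              # assumed form of the second weight
    combined = (w_encyclopedia * anchor_prob * first_entity_prob
                + w_search * click_prob * second_entity_prob)
    return math.log(combined) if combined > 0 else float("-inf")


# Illustrative inputs: 3 encyclopedia samples, 10 search-result samples.
score = weighted_score(1 / 3, 1.0, 3, 0.8, 0.75, 10)
print(score)
```

Returning negative infinity for an empty or zero-probability case is a design choice of this sketch so that such text data always falls below any finite threshold.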

The score calculation unit 113 stores the calculated score in association with the content information D2. FIG. 13 is a diagram showing an example of the content information D2 and scores according to the present embodiment.

Returning to FIG. 1, the location information acquisition unit 114 refers to the content information D2 stored in the storage unit 120 and acquires the entity corresponding to a query received from the web server 300. When the text data matching the received query has a score equal to or higher than a predetermined threshold, the location information acquisition unit 114 determines that the entity associated with that text data is the entity corresponding to the query. Based on the knowledge graph D1, the location information acquisition unit 114 acquires the location information associated with the determined entity and then transmits it to the web server 300. The location information acquisition unit 114 is an example of a determination unit.

In this way, when the location information acquisition unit 114 receives a query from the web server 300, it identifies the text data corresponding to the query. Text data may be regarded as corresponding to the query not only when the two match exactly but also when they match partially or differ only to an extent that they can be regarded as semantically the same.
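As a rough illustration of exact and partial matching, the sketch below normalizes both strings and then tries an exact match followed by a substring match. The normalization steps (Unicode NFKC, case folding) and the substring rule are assumptions; the description does not say how partial or semantic matches are decided, and semantic matching is not attempted here.

```python
import unicodedata
from typing import Iterable, Optional


def normalize(text: str) -> str:
    """Fold full-width/half-width and case differences (an assumed normalization)."""
    return unicodedata.normalize("NFKC", text).casefold().strip()


def match_text(query: str, known_texts: Iterable[str]) -> Optional[str]:
    """Return the stored text data regarded as corresponding to the query, if any."""
    q = normalize(query)
    if not q:
        return None
    texts = list(known_texts)
    for t in texts:            # exact match after normalization
        if normalize(t) == q:
            return t
    for t in texts:            # partial match: one string contains the other
        n = normalize(t)
        if n and (n in q or q in n):
            return t
    return None


print(match_text("３０００本安打", ["3000本安打"]))    # full-width digits fold to ASCII digits
print(match_text("3000 hits record", ["3000 hits"]))  # partial match
```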

<1-10. Operation of the knowledge data server 100>
The operation of the knowledge data server 100 is described below. FIG. 14 is a flowchart showing an example of the operation by which the knowledge data server 100 of the present embodiment calculates scores. The selection unit 111 selects, from the content information D2, the target text data for which a score is to be calculated (S110). Next, the collection unit 112 collects related web pages (encyclopedia web pages and search result web pages) based on the target text data (S120). Next, the score calculation unit 113 calculates the anchor text probability and the entity text click probability (S130).

Next, the score calculation unit 113 calculates the first entity probability and the second entity probability (S140). The score calculation unit 113 then calculates, as the score, the sum of the product of the anchor text probability, the first entity probability, and the first content ratio based on encyclopedia web pages and the product of the entity text click probability, the second entity probability, and the second content ratio based on search result web pages (S150), and stores the calculated score in the storage unit 120 in association with the content information D2 (S160).

FIG. 15 is a flowchart showing an example of the score-based operation of the knowledge data server 100 of the present embodiment. The location information acquisition unit 114 receives a query from the web server 300 (S210). Next, it reads from the content information D2 the score associated with the text data matching the query (S220) and determines whether the score is equal to or higher than a predetermined threshold (S230). If the score is equal to or higher than the threshold, the location information acquisition unit 114 transmits to the web server 300 the location information of the entity associated with the text data in the knowledge graph D1 (S240). If the score is below the threshold, it does not transmit location information to the web server 300 (S250).
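The flow of steps S210 to S250 can be sketched as a single lookup function. The dictionary shapes used for the content information D2 and the knowledge graph D1, the function name, the threshold, and the sample data are hypothetical; returning `None` stands in for "do not transmit".

```python
from typing import Dict, Optional, Tuple

ContentInfo = Dict[str, Tuple[str, float]]  # text data -> (entity, score); assumed shape of D2
KnowledgeGraph = Dict[str, str]             # entity -> location information; assumed shape of D1


def handle_query(query: str, content_info: ContentInfo,
                 knowledge_graph: KnowledgeGraph, threshold: float) -> Optional[str]:
    """Return the location information to send, or None when nothing is sent."""
    record = content_info.get(query)    # S220: read the score for the matching text data
    if record is None:
        return None                     # no matching text data
    entity, score = record
    if score < threshold:               # S230: compare with the threshold
        return None                     # S250: do not transmit
    return knowledge_graph.get(entity)  # S240: transmit the entity's location information


content_info = {
    "3000 hits": ("baseball player A", -0.6),
    "odd rumor": ("baseball player A", -5.0),
}
knowledge_graph = {"baseball player A": "https://example.org/wiki/baseball_player_A"}

print(handle_query("3000 hits", content_info, knowledge_graph, threshold=-2.0))
print(handle_query("odd rumor", content_info, knowledge_graph, threshold=-2.0))
```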

<1-11. Variations in score calculation>
The above describes the case where the score calculation unit 113 calculates the score based on the anchor text probability, the entity text click probability, the first entity probability, the second entity probability, the first content ratio, and the second content ratio, but the calculation is not limited to this. For example, when few encyclopedia web pages have been collected among the related web pages, the values derived from encyclopedia web pages (for example, the anchor text probability, the first entity probability, and the first content ratio) may have little effect on the score. In that case, the score calculation unit 113 need not use the anchor text probability, the first entity probability, or the first content ratio in calculating the score. Similarly, when few search result web pages have been collected, the values derived from search result web pages (for example, the entity text click probability, the second entity probability, and the second content ratio) may have little effect on the score, and the score calculation unit 113 need not use them in calculating the score.

<1-12. Evaluation methods other than the score>
The above describes the case where the text data of the content information D2 is evaluated by the score calculated by the score calculation unit 113, but the evaluation is not limited to this. For example, the score calculation unit 113 may be configured to rate the target text data on a two-level scale of "appropriate" and "inappropriate" based on the calculated score. In this case, the score calculation unit 113 associates information indicating "appropriate" with the target text data when the calculated score is equal to or higher than a predetermined threshold, and information indicating "inappropriate" when the calculated score is below the threshold. When an acquired query matches text data marked "inappropriate", the location information acquisition unit 114 determines that supplying the location information of the entity associated with that text data is inappropriate and does not supply it to the web server 300. Instead of a two-level rating, the score calculation unit 113 may rate the target text data on three or more levels.
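The two-level rating, and one possible way to extend it to three or more levels, can be sketched as below. The cut points, the middle label "uncertain", and the function names are made up; the description only says that three or more levels may be used.

```python
from bisect import bisect_right
from typing import Sequence


def rate_two_level(score: float, threshold: float) -> str:
    """'appropriate' at or above the threshold, otherwise 'inappropriate'."""
    return "appropriate" if score >= threshold else "inappropriate"


def rate_multi_level(score: float, cut_points: Sequence[float], labels: Sequence[str]) -> str:
    """Map a score to one of len(cut_points) + 1 labels; cut_points must be ascending."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    # bisect_right places a score equal to a cut point in the higher band,
    # matching the 'equal to or higher than' rule of the two-level case.
    return labels[bisect_right(cut_points, score)]


print(rate_two_level(-0.6, threshold=-2.0))
print(rate_multi_level(-2.5, [-3.0, -1.0], ["inappropriate", "uncertain", "appropriate"]))
```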

As described above, the knowledge data server 100 of the present embodiment includes: a selection unit 111 that selects target text data from the content information D2, in which entities and text data are associated with each other; a collection unit 112 that collects one or more related web pages related to the target text data selected by the selection unit 111; and an evaluation unit (the score calculation unit 113) that, for each entity associated with the target text data in the content information D2, evaluates the entity as the entity corresponding to the target text data based on (a) at least one of a first probability that the target text data is included as text indicating a link destination in a first type of related web page among the one or more related web pages collected by the collection unit 112 and a second probability that a link destination whose text includes the entity associated with the target text data in the content information D2 is selected in a second type of related web page among the one or more related web pages, and (b) a third probability that the web page at the link destination indicated by text in the first type or second type of related web page is the web page of the entity associated with the target text data in the content information D2. As a result, the knowledge data server 100 of the present embodiment can suppress the output of inappropriate information together with search results.

Web pages on the network NW may include pages that give an explanation that is not appropriate as a word related to an entity. However, such pages are likely to be fewer than pages that give an appropriate explanation. When the text data of the content information D2 has a low probability of appearing in web pages on the network NW (encyclopedia web pages and search result web pages), the knowledge data server 100 of the present embodiment gives that text data a low score; when the probability is high, it gives the text data a high score. The knowledge data server 100 of the present embodiment can thus calculate the score as an indicator of whether text is appropriate as a word related to the entity.

In the knowledge data server 100 of the present embodiment, the score calculation unit 113 weights the first probability and the second probability according to the ratio between the occurrence probability of transitions from search result web pages to the web page of the entity associated with the target text data in the content information D2 and that of transitions from encyclopedia web pages to the same entity web page. This allows the knowledge data server 100 of the present embodiment to calculate the score more accurately as an indicator of whether text is appropriate as a word related to the entity.

In the knowledge data server 100 of the present embodiment, the collection unit 112 collects related web pages (in this example, encyclopedia web pages) at least from an encyclopedia that users can edit. The collection unit 112 also collects, as related web pages, at least the web pages that display the search results obtained when the target text data is used as a query (in this example, search result web pages). If content on the network NW were searched without limiting the search scope, the processing load of the search could become enormous. Because the knowledge data server 100 of the present embodiment collects encyclopedia web pages and search result web pages as related web pages, it can collect related web pages with simple processing.

In the knowledge data server 100 of the present embodiment, the location information acquisition unit 114 refers to the evaluation result produced by the score calculation unit 113 (in this example, the score or the two-level rating) for an input query and determines whether entity information should be added to the search results and output. Specifically, the evaluation unit (in this example, the score calculation unit 113) calculates the score of the entity corresponding to the target text data, and the determination unit (in this example, the location information acquisition unit 114) causes the information of the entity associated with the score to be output when the score is equal to or higher than a predetermined threshold and does not cause it to be output when the score is below the threshold. As a result, the knowledge data server 100 of the present embodiment outputs the entity-related data to the web server 300 when the query is appropriate as text data associated with the entity. Because the knowledge panel 220 is then displayed on the display unit 210, the knowledge data server 100 of the present embodiment can provide an appropriate knowledge panel 220 while improving user satisfaction with the search results.

<1-13. Blacklist of text data>
The above describes the case where the knowledge data server 100 transmits the location information of an entity to the web server 300 when the score is equal to or higher than a predetermined threshold, but the operation is not limited to this. A query for which displaying the knowledge panel 220 is unsuitable may be entered in the query input area 211. Such queries include, for example, words that defame a person, words that offend public order and morals, and words that damage the reputation of a product. To handle this, the score calculation unit 113 may be configured to generate information (hereinafter, the text data blacklist) indicating the content information D2 for which entity location information is not to be provided to the web server 300. For example, the score calculation unit 113 refers to a dictionary of words that defame a person, offend public order and morals, or damage the reputation of a product, and adds to the text data blacklist any text data in the content information D2 that matches a word in the dictionary. When text data matching the query is on the text data blacklist, the location information acquisition unit 114 does not transmit the entity's location information to the web server 300, regardless of the score of that text data. Because the web server 300 then receives no entity location information when a blacklisted query is entered in the query input area 211, it does not display the knowledge panel 220.
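The blacklist check can be sketched as follows. Matching by substring containment is an assumption (the description only says the text data "matches" a dictionary word), and the function names, dictionary entries, and text data are invented placeholders.

```python
from typing import Iterable, Set


def build_blacklist(text_data: Iterable[str], banned_words: Iterable[str]) -> Set[str]:
    """Collect text data that contains any dictionary word (assumed matching rule)."""
    banned = [w for w in banned_words if w]
    return {t for t in text_data if any(w in t for w in banned)}


def should_send_location(query_text: str, score: float,
                         threshold: float, blacklist: Set[str]) -> bool:
    """A blacklisted text is never sent, regardless of its score."""
    if query_text in blacklist:
        return False
    return score >= threshold


texts = ["3000 hits", "player A scandal"]
blacklist = build_blacklist(texts, banned_words=["scandal"])

print(blacklist)
print(should_send_location("player A scandal", score=-0.1, threshold=-2.0, blacklist=blacklist))
print(should_send_location("3000 hits", score=-0.6, threshold=-2.0, blacklist=blacklist))
```

Checking the blacklist before the threshold keeps the "regardless of the score" rule explicit in the control flow.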

As described above, in the knowledge data server 100 of the present embodiment, the score calculation unit 113 generates non-output content information (in this example, the text data blacklist) indicating target text data for which entity information (in this example, location information) is not to be output and the entity associated with that text data. The knowledge data server 100 also includes a determination unit (in this example, the location information acquisition unit 114) that refers to the text data blacklist and determines whether entity information should be added to the search results for an input query and output. The knowledge data server 100 of the present embodiment can thereby prevent the knowledge panel 220 from being displayed in response to a query for which displaying it is unsuitable.

The modes for carrying out the present invention have been described above using embodiments, but the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

100 ... Knowledge data server
110 ... Control unit
111 ... Selection unit
112 ... Collection unit
113 ... Score calculation unit
114 ... Location information acquisition unit
120 ... Storage unit
200 ... Terminal device
210 ... Display unit
211 ... Query input area
220 ... Knowledge panel
221 ... Entity
222 ... Image
223 ... Related information
230 ... Search result
300 ... Web server
D1 ... Knowledge graph
D2, D2a ... Content information

Claims

An information processing apparatus comprising:
a selection unit that selects target text data from among the text data in content information in which entities and text data are associated with each other;
a collection unit that collects one or more related web pages related to the target text data selected by the selection unit; and
an evaluation unit that evaluates each pair of the target text data and the entity in the content information based on
at least one of a first probability that, in a first type of related web page among the one or more related web pages collected by the collection unit, the target text data is included as anchor text indicating a link destination, and
a second probability that, in a second type of related web page among the one or more related web pages, a link destination that includes, as the anchor text, the entity associated with the target text data in the content information is selected, and
a third probability that, in the first type of related web page or the second type of related web page, the web page at the link destination indicated by the anchor text is the web page of the entity associated with the target text data in the content information.

The evaluation unit takes, as a total, the sum of the number of first samples used when the third probability is obtained based on the first probability and the number of second samples used when the third probability is obtained based on the second probability; multiplies the value obtained by dividing the number of first samples by the total by the product of the first probability and the third probability based on the first probability; multiplies the value obtained by dividing the number of second samples by the total by the product of the second probability and the third probability based on the second probability; and obtains the sum of the two results as the evaluation value.
The information processing apparatus according to claim 1.

The first type of related web page is an encyclopedia web page that can be edited by the user.
The information processing apparatus according to claim 1 or 2.

The second type of related web page is a web page that displays search results when the target text data is used as a query.
The information processing apparatus according to any one of claims 1 to 3.

The apparatus further comprises a determination unit that, for an input query, refers to the evaluation result of the evaluation unit and determines whether information about the entity should be added to the search results and output.
The information processing apparatus according to any one of claims 1 to 4.

The evaluation unit calculates a score for the entity corresponding to the target text data.
When the score is equal to or higher than a predetermined threshold value, the determination unit decides to output the information on the entity with which the score is associated, and when the score is less than the predetermined threshold value, it decides not to output the information associated with the entity.
The information processing apparatus according to claim 5.

The determination unit refers to non-output content information, generated in advance, that indicates the target text data for which entity information is not output and the entity associated with that text data, and determines that an entity included in the non-output content information is not added to the search results for the input query even when the score is equal to or higher than the predetermined threshold value.
The information processing apparatus according to claim 5 or 6.

An information processing method in which a computer:
selects target text data from among the text data in content information in which entities and text data are associated with each other;
collects one or more related web pages related to the selected target text data; and
evaluates each pair of the target text data and the entity in the content information as an entity corresponding to the target text data, based on
at least one of a first probability that, in a first type of related web page among the one or more collected related web pages, the target text data is included as anchor text indicating a link destination, and
a second probability that, in a second type of related web page among the one or more related web pages, a link destination that includes, as the anchor text, the entity associated with the target text data in the content information is selected, and
a third probability that, in the first type of related web page or the second type of related web page, the web page at the link destination indicated by the anchor text is the web page of the entity associated with the target text data in the content information.

A program that causes a computer to:
select target text data from among the text data in content information in which entities and text data are associated with each other;
collect one or more related web pages related to the selected target text data; and
evaluate each pair of the target text data and the entity in the content information as an entity corresponding to the target text data, based on
at least one of a first probability that, in a first type of related web page among the one or more collected related web pages, the target text data is included as anchor text indicating a link destination, and
a second probability that, in a second type of related web page among the one or more related web pages, a link destination that includes, as the anchor text, the entity associated with the target text data in the content information is selected, and
a third probability that, in the first type of related web page or the second type of related web page, the web page at the link destination indicated by the anchor text is the web page of the entity associated with the target text data in the content information.
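
For illustration only, the sample-weighted evaluation value recited in claim 2 and the threshold decision recited in claim 6 can be sketched in Python as follows. The function and parameter names are hypothetical, the numeric inputs are made up, and the sketch assumes both probability estimates are available; it is not the implementation of the embodiment.

```python
def evaluation_value(p1: float, p3_first: float, n1: int,
                     p2: float, p3_second: float, n2: int) -> float:
    """Sample-weighted sum of the two probability products.

    p1, p2     : first and second probabilities
    p3_first   : third probability obtained based on the first probability
    p3_second  : third probability obtained based on the second probability
    n1, n2     : numbers of first and second samples behind those estimates
    """
    total = n1 + n2
    if total <= 0:
        raise ValueError("at least one sample is required")
    # Each product is weighted by its share of the total sample count.
    return (n1 / total) * p1 * p3_first + (n2 / total) * p2 * p3_second


def should_output_entity(score: float, threshold: float) -> bool:
    """Output entity information only when the score reaches the threshold."""
    return score >= threshold


# Example with made-up numbers: 30 first samples and 10 second samples.
score = evaluation_value(p1=0.5, p3_first=0.8, n1=30,
                         p2=0.4, p3_second=0.5, n2=10)
```

Weighting by sample count means that the estimate backed by more observations contributes more to the score, which is one plausible reading of why the claim divides each sample count by the total.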