JP6956133B2

JP6956133B2 - model

Info

Publication number: JP6956133B2
Application number: JP2019072876A
Authority: JP
Inventors: 賢太郎西
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-09-20
Filing date: 2019-04-05
Publication date: 2021-10-27
Anticipated expiration: 2037-09-20
Also published as: JP2019139790A

Description

The present invention relates to a model.

Conventionally, a technique called a knowledge base is known, in which concepts such as persons and events are treated as entities and the relationships between the entities are structured. To create such a knowledge database efficiently, a technique is also known that clusters the entities and updates the relationships between entities based on the clustering result.

Japanese Patent No. 6088091

"On Emerging Entity Detection", Michael Farber, Achim Rettinger, Boulos El Asmar

However, the techniques described above cannot always add new entities efficiently.

For example, one conceivable method is to extract new entities (hereinafter referred to as "new entities") from content posted on the Internet, such as news articles. However, it is difficult to estimate which content contains which new entity.

The present application has been made in view of the above, and its object is to make the addition of new entities more efficient.

The model according to the present application includes: an input layer into which posted information regarding a predetermined element is input; an output layer; a first element belonging to any layer from the input layer to the output layer other than the output layer; and a second element whose value is calculated based on the first element and a weight of the first element. The model causes a computer to function so that, for the information input to the input layer, an operation based on the first element and the weight of the first element is performed with each element belonging to each layer other than the output layer taken as the first element, whereby a value indicating whether or not the posted information is posted information regarding a new element is output from the output layer.
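The layered computation described above can be illustrated with a minimal sketch. The source only states that each value is computed from the elements of the preceding layer and their weights; the bias terms, the sigmoid activation, the layer sizes, the numeric weights, and the 0.5 decision threshold below are all assumptions made for illustration.

```python
import math

def forward(features, layers):
    """Propagate an input vector through fully connected layers.

    `layers` is a list of (weights, biases) pairs; weights[j][i] is the
    weight applied to element i of the previous layer ("first element")
    when computing element j of the next layer ("second element").
    The sigmoid activation and the biases are illustrative assumptions.
    """
    values = features
    for weights, biases in layers:
        values = [
            1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, values)) + b)))
            for row, b in zip(weights, biases)
        ]
    return values

# Hypothetical network: 3 input features, 2 hidden elements, 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
score = forward([1.0, 0.0, 1.0], layers)[0]
# Assumed decision rule: a score of 0.5 or more means "about a new element".
is_new_element_post = score >= 0.5
```

In practice the input vector would be some encoding of the posted information; how the text is encoded is not specified in this passage.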

According to one aspect of the embodiment, the addition of new entities can be made more efficient.

FIG. 1 is a diagram showing an example of processing executed by the information providing device according to the embodiment.
FIG. 2 is a diagram showing a configuration example of the information providing device according to the embodiment.
FIG. 3 is a diagram showing an example of information registered in the entity database according to the embodiment.
FIG. 4 is a diagram showing an example of information registered in the triple database according to the embodiment.
FIG. 5 is a diagram showing an example of information registered in the search log database according to the embodiment.
FIG. 6 is a diagram showing an example of information registered in the posted information database according to the embodiment.
FIG. 7 is a diagram showing an example of information registered in the learning data database according to the embodiment.
FIG. 8 is a diagram showing an example of information registered in the model database according to the embodiment.
FIG. 9 is a diagram showing an example of processing in which the information providing device according to the embodiment selects the element candidates for which learning data is to be created.
FIG. 10 is a flowchart showing an example of the flow of processing executed by the information providing device according to the embodiment.
FIG. 11 is a diagram showing an example of a hardware configuration.

Hereinafter, modes for implementing the model according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. The model according to the present application is not limited by these embodiments. In each of the following embodiments, the same parts are designated by the same reference numerals, and duplicate descriptions are omitted.

[Embodiment]
[1. Processing provided by the information providing device]
First, an example of the selection process executed by the information providing device, which is an example of a selection device, will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of processing executed by the information providing device according to the embodiment. The following description covers, as processing executed by the information providing device 10, an example of a selection process that selects the entities serving as the source of learning data. The learning data is used to create a model that detects new entities to be registered in a knowledge database, in which systematized and organized knowledge is registered.

[1-1. Overview of the information providing device]
The information providing device 10 is an information processing device capable of communicating with the log server 100 and the explanatory content server 200 via a predetermined network N such as the Internet (see, for example, FIG. 2), and is realized by, for example, a server device or a cloud system. The information providing device 10 may be able to communicate with any number of log servers 100 and explanatory content servers 200 via the network N. The information providing device 10 also has a function of providing various information registered in the knowledge database in response to, for example, a request from a user terminal (not shown) used by a user.

For example, the user terminal transmits to the information providing device 10 a search query for searching the knowledge database. In such a case, the information providing device 10 searches the knowledge database for information corresponding to the search query and provides the search result to the user terminal.

Here, the knowledge database searched by the information providing device 10 will be described. Various kinds of knowledge are registered in the knowledge database in a systematized and organized state. For example, the knowledge database holds entities, which are the elements to be registered (hereinafter sometimes referred to as "elements"), and information indicating the relationships between entities (hereinafter referred to as "relationship information"). An entity is information corresponding to something in the world: things that can serve as a subject, such as persons, objects, and buildings in the real world; attributes such as occupation and nationality; and various states and events. Relationship information indicates the relationship between two entities. The elements in the knowledge database of the information providing device 10 (that is, the entities registered in the knowledge database) may correspond to any thing or event.
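As an illustration of the entity and relationship information described above, the following minimal sketch stores each relationship as a (subject, relation, object) triple. The class names, field names, and sample entities are hypothetical; the source does not specify a storage schema in this passage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """One piece of relationship information between two entities."""
    subject: str
    relation: str
    obj: str

class KnowledgeBase:
    """Toy in-memory knowledge database (illustrative only)."""

    def __init__(self):
        self.entities = set()
        self.triples = set()

    def add(self, subject, relation, obj):
        # Registering a relationship also registers both entities.
        self.entities.update((subject, obj))
        self.triples.add(Triple(subject, relation, obj))

    def related(self, entity):
        # Return (relation, other entity) pairs whose subject is `entity`.
        return sorted(
            (t.relation, t.obj) for t in self.triples if t.subject == entity
        )

kb = KnowledgeBase()
kb.add("Movie A", "directed_by", "Person B")
kb.add("Person B", "nationality", "Japan")
```

A query for an entity then amounts to looking up the triples in which it appears, for example `kb.related("Movie A")`.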

The log server 100 is an information processing device that holds various histories, and is realized by a server device, a cloud system, or the like. For example, the log server 100 holds a log of the search queries entered by users when performing various searches via the Internet. The log may cover any kind of search, such as web search, route search, search for items traded in an online shopping mall, map search, and content search.

The log server 100 also holds posted information, that is, various kinds of content posted on the Internet. For example, the log server 100 holds news, blogs, SNS (Social Networking Service) posts, and the like posted on websites. The log server 100 may be realized by a server device that distributes news, a server device that manages blogs, a server device that manages an SNS, or the like.

The explanatory content server 200 is an information processing device that manages and distributes explanatory content describing various elements such as persons, objects, buildings, content, and events, and is realized by a server device, a cloud system, or the like. For example, the explanatory content server 200 manages and distributes explanatory content, such as Wikipedia (registered trademark) or an online dictionary service, that describes the various subjects that can become elements in the knowledge database.

[1-2. Selection process]
Here, consider the process of registering an element representing a new matter in the knowledge database. Registering a new element (hereinafter referred to as a "new element") in such a knowledge database requires not only the character string denoting the new element but also relationship information indicating how the new element is related to other elements. To add such new elements and relationship information to the knowledge database efficiently, one conceivable method is to extract new elements from content posted on the Internet, such as news articles. However, it is difficult to determine which content contains which new element.

The information providing device 10 therefore executes the following selection process. First, the information providing device 10 identifies the search history and the posted information relating to element candidates, which are candidates for new elements. Next, based on the identified search history and posted information, the information providing device 10 selects, from among the element candidates, those to be used for generating the learning data with which a model learns the features of information about new elements.

For example, when a certain word is taken as an element candidate, how the word has been searched for can be estimated from the search history containing it, and how the word has been posted can be estimated from the posts containing it. These search and posting patterns can serve as indicators of whether the word denotes a new element. For example, if the word denotes a new element, searches for and posts about the word are likely to begin abruptly on a particular day. The information providing device 10 therefore selects element candidates that are likely to be new elements based on the search pattern indicated by the search history and the posting pattern indicated by the posted information.

For example, the information providing device 10 selects the element candidates for generating learning data based on the number of searches indicated by the search history for an element candidate and the number of posts indicated by the posting history for that candidate. For example, the selection is based on how the number of searches changes and how the number of posts changes. More specifically, the information providing device 10 selects, as an element candidate for generating learning data, a candidate for which searches began on a certain day and posts also began on that same day.

The information providing device 10 also executes a learning process that trains models based on the element candidates selected by the selection process. That is, the information providing device 10 uses information about the selected element candidates to make a model learn the features of information about new elements. For example, the information providing device 10 generates, as learning data, a set consisting of a selected element candidate, the posted information about that candidate, and relationship information indicating the relationship between that candidate and other elements. The information providing device 10 then trains the model using the learning data.

More specifically, the information providing device 10 uses the posted information about the selected element candidates to train a determination model that determines whether or not posted information is a post about a new element. The information providing device 10 also uses the selected element candidates and the posted information about them to train an element extraction model that extracts the new element contained in posted information. Further, the information providing device 10 uses the posted information about the selected element candidates and the relationships between those candidates and other elements to train a relationship estimation model that extracts, from posted information, relationship information indicating the relationship between the new element and other elements. For example, the information providing device 10 trains a plurality of models, each of which extracts from the posted information other elements that have a predetermined relationship with the new attribute, with each model handling a different relationship.

The information providing device 10 further executes an update process: using the models trained by the learning process, it extracts new elements and relationship information from posted information and updates the knowledge database based on what was extracted.

[1-3. Example of processing executed by the information providing device]
Hereinafter, an example of the selection process, learning process, and update process (hereinafter sometimes collectively referred to as "each process") executed by the information providing device 10 will be described with reference to FIG. 1. First, the information providing device 10 executes the selection process. In the selection process, the information providing device 10 first acquires, from the explanatory content server 200, explanatory content created within a predetermined period (step S1).

For example, when explanatory content is newly registered for a certain matter, that matter is likely to be a new element. The information providing device 10 therefore acquires from the explanatory content server 200 the explanatory content for which the time elapsed since registration falls within a predetermined period (for example, several months).
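Step S1 can be sketched as a simple date filter. The record layout (`content_id`, `registered_at`), the function name, and the 90-day default are assumptions for illustration; the source only says "a predetermined period (for example, several months)".

```python
from datetime import datetime, timedelta

def recently_registered(contents, now, max_age=timedelta(days=90)):
    """Keep explanatory contents registered within `max_age` before `now`."""
    return [
        c for c in contents
        if timedelta(0) <= now - c["registered_at"] <= max_age
    ]

now = datetime(2017, 9, 20)
contents = [
    {"content_id": "content ID#1", "registered_at": datetime(2017, 8, 1)},
    {"content_id": "content ID#2", "registered_at": datetime(2015, 1, 1)},
]
recent = recently_registered(contents, now)
```

Only the entry registered about seven weeks before `now` survives the filter; the 2015 entry is treated as too old to describe a new matter.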

Next, the information providing device 10 extracts from the explanatory content an element that is a candidate for a new element (hereinafter referred to as an "element candidate") and relationship information indicating the relationship between the element candidate and other elements (step S2). In the example shown in FIG. 1, the information providing device 10 acquires explanatory content "explanatory content #1", whose explanatory content ID (Identifier) is "content ID #1" and whose registration date and time is "date and time #1". In such a case, the information providing device 10 extracts the character string presumed to denote the subject of the explanatory content as the element candidate, and estimates the relationship information between the element candidate and other elements from the body of the explanatory content. That is, when explanatory content is newly registered, the information providing device 10 obtains an element candidate from it.

Here, the information providing device 10 may extract element candidates and relationship information using various text analysis techniques such as morphological analysis and semantic estimation. In addition, explanatory content may include information (for example, what is called an infobox) indicating which matter is the subject of the content and how that subject relates to other matters. When the explanatory content includes such an infobox, the information providing device 10 may extract the element candidate and relationship information from the infobox.

Further, for example, when an infobox is added within a predetermined period (for example, one month) after the explanatory content was registered, the information providing device 10 may adopt the relationship information registered in that infobox as the relationship information of the element candidate. An infobox may also include classification information indicating a classification, such as the category or class of the element candidate. When the infobox includes such classification information, the information providing device 10 may extract the classification information of the element candidate from the infobox.

As a result of this processing, the information providing device 10 extracts, as candidate data, a set consisting of an element candidate and the relationship information associated with it. For example, suppose explanatory content #1 includes element candidate #1, relationship information #1-1 indicating the relationship between element candidate #1 and a first other element, and relationship information #1-2 indicating the relationship between element candidate #1 and a second other element. In that case, the information providing device 10 generates from explanatory content #1 candidate data #1, which associates element candidate #1 with relationship information group #1 (relationship information #1-1 and relationship information #1-2). The information providing device 10 may also include in candidate data #1 the classification information associated with the element candidate.
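The candidate data described above can be pictured as a small record built from an infobox-like structure. Everything in the sketch below is a hypothetical illustration: the `CandidateData` fields, the dictionary layout of the explanatory content, and the use of a `category` key for classification information are assumptions, not the format used in the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class CandidateData:
    element_candidate: str
    # Relationship information group: (relation, other element) pairs.
    relations: List[Tuple[str, str]] = field(default_factory=list)
    classification: Optional[str] = None

def build_candidate(content):
    """Build candidate data from explanatory content with an infobox."""
    infobox = content.get("infobox", {})
    relations = [(k, v) for k, v in infobox.items() if k != "category"]
    return CandidateData(content["title"], relations, infobox.get("category"))

explanatory_content_1 = {
    "title": "element candidate #1",
    "infobox": {
        "category": "film",
        "directed_by": "other element #1",
        "based_on": "other element #2",
    },
}
candidate_data_1 = build_candidate(explanatory_content_1)
```

Content without an infobox would yield a record with an empty relationship group, which is where text analysis of the body would have to take over.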

Here, newly registered explanatory content is not limited to content about new elements. It may also include content that does not concern a new matter, such as explanatory content produced by translating already existing content in another language into Japanese. The information providing device 10 therefore selects, from the candidate data, the candidate data of element candidates that are highly likely to be new elements. More specifically, the information providing device 10 acquires the search history of each element candidate and the posted information about it (step S3). Then, based on the number of identified search history entries and the number of posts, the information providing device 10 selects the element candidates to be treated as new elements.

For example, the information providing device 10 identifies element candidates for which the increase in search history and posting history at a given date and time satisfies a predetermined condition, and generates learning data based on the identified candidates (step S4). More specifically, from among the element candidates extracted from the explanatory content, the information providing device 10 selects those for which, at a given date and time, the increase in the number of search history entries exceeds a predetermined threshold and the increase in the number of posts also exceeds a predetermined threshold.

For example, consider the case where a new movie is the new element. When a new movie is announced, the number of searches using its title as the query is expected to rise sharply after the announcement compared with before. Likewise, the number of news items and SNS posts containing the title is expected to rise sharply after the announcement. The number of searches and the number of posts that involve a new element are therefore expected to jump at a particular date and time. Explanatory content for a new movie, on the other hand, is typically created only after some time has passed since the announcement. Moreover, when a novel is adapted into a movie, the element candidate is likely to appear in search queries and posts before the movie's announcement date. And if the increase in search queries containing the title is not linked to the increase in posts containing the title, it is likely that an identical character string denoting a different matter is being searched for or posted.

The information providing device 10 therefore takes the date and time at which the explanatory content yielding the element candidate was registered as the reference date and time, and acquires the search history and posted information from a predetermined period preceding it. For example, the information providing device 10 acquires the search queries containing the element candidate that were entered during the year before the reference date and time, and the posts containing the element candidate that were posted during the same year. Then, from among the element candidates extracted from the explanatory content, the information providing device 10 selects those for which no search history or posted information exists before a given date and time and for which, at that date and time, the number of search history entries exceeds a predetermined threshold and the number of posts exceeds a predetermined threshold.
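The selection rule in this passage can be sketched as follows, assuming the search and posting histories have already been aggregated into per-day counts. The function names, the dictionary representation, the 365-day window, and the threshold values are illustrative assumptions; the source leaves the thresholds and the exact period unspecified.

```python
from datetime import date, timedelta

def window(counts, reference, days=365):
    """Keep only counts from the `days` before the reference date."""
    start = reference - timedelta(days=days)
    return {d: n for d, n in counts.items() if start <= d <= reference}

def onset_day(counts):
    """Return the earliest day with a non-zero count, or None."""
    days = [d for d, n in counts.items() if n > 0]
    return min(days) if days else None

def is_likely_new(searches, posts, reference, search_threshold=0, post_threshold=0):
    """True if searches and posts first appear on the same day and both
    counts on that day exceed their thresholds."""
    searches = window(searches, reference)
    posts = window(posts, reference)
    s_day, p_day = onset_day(searches), onset_day(posts)
    if s_day is None or s_day != p_day:
        return False
    return searches[s_day] > search_threshold and posts[p_day] > post_threshold

reference = date(2017, 9, 20)   # registration date of the explanatory content
release = date(2017, 7, 1)      # hypothetical announcement date
new_movie = is_likely_new(
    {release: 40, release + timedelta(days=1): 90}, {release: 12},
    reference, search_threshold=10, post_threshold=5)
translated_page = is_likely_new(
    {date(2017, 1, 5): 3, release: 40}, {release: 12},
    reference, search_threshold=10, post_threshold=5)
```

Here `new_movie` is selected, while `translated_page` is rejected because searches already existed months before any post appeared. Note that this sketch cannot see activity older than the window, so a longer window gives a stricter test.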

For example, in FIG. 1, the number of search queries containing element candidate #1 is plotted per date as a dotted line, and the number of posts containing element candidate #1 is plotted per date as a solid line. As shown at (A) in FIG. 1, the number of search queries and the number of posts are both "0" up to a certain date and time, and as shown at (B) in FIG. 1, both numbers rise sharply at that date and time. When the number of search queries and the number of posts containing element candidate #1 both change from "0" to "1" or more at the same date and time in this way, element candidate #1 is likely to be a new element. In such a case, the information providing device 10 selects element candidate #1 as a learning target and uses candidate data #1 of element candidate #1 as learning data #1.

The information providing device 10 also extracts as learning data, from the posts containing the element candidate, those posted at the date and time when the number of posts rose sharply. In the example shown in FIG. 1, from the posts containing element candidate #1, the information providing device 10 extracts posted information group #1, which was posted at the date and time when the number of posts changed from "0" to "1" or more, and includes it in learning data #1. As a result of this selection process, the information providing device 10 obtains as learning data an element candidate that is likely to be a new element, the relationship information indicating the relationship between that candidate and other elements, and the posts containing that candidate.
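Extracting the posted information group for the onset date can be sketched as below. The post records (`text`, `date`), the substring match used to decide whether a post "contains" the candidate, and the sample data are assumptions for illustration.

```python
from datetime import date

def posts_at_onset(posts, candidate):
    """Return the posts mentioning `candidate` on the first day it appears."""
    matching = [p for p in posts if candidate in p["text"]]
    if not matching:
        return []
    onset = min(p["date"] for p in matching)
    return [p for p in matching if p["date"] == onset]

posts = [
    {"text": "Studio announces new film Alpha", "date": date(2017, 7, 1)},
    {"text": "Alpha trailer released today", "date": date(2017, 7, 1)},
    {"text": "Saw Alpha last night", "date": date(2017, 9, 2)},
    {"text": "Unrelated weather report", "date": date(2017, 6, 1)},
]
posted_info_group_1 = posts_at_onset(posts, "Alpha")
```

The two posts from the first day form the group that would be added to the learning data; the later post still mentions the candidate but is left out.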

In the selection process described above, when the number of search queries and the number of posts change from "0" to "1" or more at the same date and time, the posts made at that date and time are extracted as learning data. Posts made at the date and time when these numbers change from "0" to "1" or more are considered to be posts that describe the element candidate as a new element.

Posted information that describes an element candidate as a new element in this way is considered to contain text whose features indicate that the element candidate is a new element. Therefore, by learning the features of such posted information, it should be possible to create a model that estimates whether or not posted information contains a new element.

Further, posted information that describes an element candidate as a new element is considered to contain the character string of the element candidate that is the new element. Therefore, by learning the features of such posted information, it should be possible to create a model that extracts, from posted information, a character string presumed to be a new element.

Further, posted information that describes an element candidate as a new element is considered to contain a character string describing the element candidate that is the new element, that is, a character string indicating the relationship between that element candidate and other elements. Therefore, by learning the features of such posted information, it should be possible to create a model that estimates, from posted information, relationship information indicating the relationship between a new element and other elements.

Therefore, using the learning data generated by the selection processing, the information providing device 10 trains a determination model that determines whether or not posted information includes a new element, an extraction model that extracts a new element from posted information, and a relationship estimation model that estimates, from posted information, relationship information indicating the relationship between a new element and other elements (step S5). For example, the information providing device 10 uses posted information group #1 included in learning data #1 as correct answer data and causes a model to learn the features of the correct answer data, thereby generating a determination model that determines whether or not posted information is posted information indicating a new element. To give a more specific example, the information providing device 10 trains the determination model so that, when posted information group #1 included in learning data #1 is input, the model outputs information indicating that the input posted information includes a new element, and so that, when posted information that includes element candidate #1 but is not included in posted information group #1 is input, the model outputs information indicating that the input posted information does not include a new element.

Further, the information providing device 10 uses element candidate #1 and posted information group #1 as correct answer data and causes a model to learn the features of the correct answer data, thereby generating an extraction model that extracts a new element from posted information. To give a more specific example, the information providing device 10 trains the extraction model so that, when each piece of posted information included in posted information group #1 is input, the model outputs information indicating element candidate #1. That is, the information providing device 10 trains the extraction model so that the output data is element candidate #1 when posted information group #1 is the input data.

Further, the information providing device 10 uses posted information group #1 and relationship information group #1 as correct answer data and causes a model to learn the features of the correct answer data, thereby generating a relationship estimation model that estimates the relationship information of a new element from posted information. To give a more specific example, the information providing device 10 trains the relationship estimation model so that, when each piece of posted information included in posted information group #1 is input, the model estimates the relationship information included in relationship information group #1.

For example, as the relationship estimation model, the information providing device 10 trains a plurality of models, each of which extracts from posted information another element having a predetermined relationship with the new element, and each of which extracts other elements having a different relationship. For example, the information providing device 10 uses an arbitrary text analysis technique to extract, from each piece of posted information included in learning data #1, other elements having a predetermined relationship with the element candidate.

For example, when the posted information is a description of a new movie, the information providing device 10 extracts the character strings of the performers, the character string of the director, the character string of the distribution company, and the like. Then, for example, the information providing device 10 trains a first relationship estimation model to extract the character strings of the performers when posted information is input, trains a second relationship estimation model to extract the character string of the director when posted information is input, and trains a third relationship estimation model to extract the character string of the distribution company when posted information is input.

Here, the information providing device 10 may train the relationship estimation model using an arbitrary prediction model. For example, the information providing device 10 may train the relationship estimation model based on the entities and relationship information already registered in the knowledge base. To give a more specific example, the information providing device 10 exploits the fact that another element that is included in the posted information and has a predetermined relationship with the element candidate is an entity in the knowledge database, and identifies from the knowledge database whether or not that "other element" is a human and, if so, its gender and occupation. The information providing device 10 may then cause the relationship estimation model to learn the identified relationship information and the relationship between the element candidate and the element. That is, the information providing device 10 may train a relationship estimation model that predicts the relationship between each element included in the posted information and the element candidate.

The information providing device 10 then executes update processing using each model. That is, the information providing device 10 identifies posted information that includes a new element, extracts the new element and relationship information from the identified posted information, and updates the knowledge database using the extracted new element and relationship information (step S6). For example, when new posted information #N1 and posted information #N2 exist, the information providing device 10 uses the determination model to determine whether or not each of posted information #N1 and #N2 is posted information that includes a new element (hereinafter sometimes referred to as a "new post"). When posted information #N1 is determined to be a new post, the information providing device 10 uses the extraction model and the relationship estimation model to extract the new element and relationship information from posted information #N1.
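
The flow of step S6 up to the extraction can be sketched as below. This is an illustrative outline only: `find_new_elements`, the score threshold of 0.5, and the three `toy_*` functions are hypothetical stand-ins for the trained determination, extraction, and relationship estimation models, whose actual interfaces the description does not specify.

```python
def find_new_elements(posts, judge, extract, relation_models, threshold=0.5):
    """Apply the three kinds of models to incoming posts.

    judge(post) returns a score, extract(post) returns a string, and each
    relation model returns a string or None. All are assumed interfaces.
    """
    found = []
    for post in posts:
        if judge(post) < threshold:
            continue  # not judged to be a "new post"
        new_element = extract(post)
        relations = {}
        for relation, model in relation_models.items():
            other = model(post)
            if other is not None:
                relations[relation] = other
        found.append((new_element, relations))
    return found


# Toy stand-ins for trained models, for illustration only.
def toy_judge(post):
    return 1.0 if "new movie" in post else 0.0


def toy_extract(post):
    return post.split('"')[1]


def toy_director(post):
    marker = "directed by "
    return post.split(marker)[1].rstrip(".") if marker in post else None


posts = [
    'The new movie "Title X" is directed by Director Y.',
    "Ticket prices rose again this year.",
]
result = find_new_elements(posts, toy_judge, toy_extract, {"director": toy_director})
```

Keeping one model per relationship type in a dictionary mirrors the first, second, and third relationship estimation models described earlier, each responsible for a single relationship.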

More specifically, the information providing device 10 extracts the new element from posted information #N1 and also extracts other elements each having a predetermined relationship with the new element. The information providing device 10 then updates the knowledge database using the extracted new element and relationship information. For example, the information providing device 10 registers the new element in the knowledge database and sets a triple that associates the new element, the extracted other element, and relationship information indicating the relationship between the new element and the other element. For example, when the relationship estimation model trained to extract the character string of the director from input posted information extracts the character string of a director from posted information #N1, the information providing device 10 registers information called a triple that associates the new element extracted by the extraction model, the character string extracted by the relationship estimation model (that is, the other element), and the relationship information "director" indicating the relationship between the elements.

That is, in the knowledge database updated by the information providing device 10, various kinds of knowledge are systematized by registering information called a triple, which consists of two entities and relationship information indicating the relationship between those entities. In other words, in the knowledge database, various kinds of knowledge are represented systematically by triples each consisting of an entity serving as the subject (S), an entity serving as the object (O), and relationship information serving as the predicate (P).
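
As a rough illustration of the triple structure, the sketch below stores subject, predicate, and object triples in memory. The class names `Triple` and `KnowledgeBase`, their methods, and the sample values are assumptions made for this example; the description does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str    # entity serving as the subject (S)
    predicate: str  # relationship information serving as the predicate (P)
    obj: str        # entity serving as the object (O)


class KnowledgeBase:
    """Minimal in-memory stand-in for the knowledge database."""

    def __init__(self):
        self.entities = set()
        self.triples = set()

    def register(self, new_element, relations):
        """Register a new element and one triple per extracted relationship."""
        self.entities.add(new_element)
        for predicate, other in relations.items():
            self.triples.add(Triple(new_element, predicate, other))

    def objects_of(self, subject, predicate):
        """Return the objects linked to a subject by the given predicate."""
        return sorted(
            t.obj for t in self.triples
            if t.subject == subject and t.predicate == predicate
        )


kb = KnowledgeBase()
kb.register("Title X", {"director": "Director Y", "distributor": "Company Z"})
```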

In this way, the information providing device 10 identifies the search history regarding an element candidate, which is a candidate for a new element, and the posted information regarding the element candidate. Then, based on the number of identified search histories and the number of identified pieces of posted information, the information providing device 10 selects, from among the element candidates, the element candidate to be used for generating learning data with which a model learns the features of information regarding new elements. The information providing device 10 can therefore make the addition of new elements using a model more efficient.

[1-4. About the Model]
Here, the information providing device 10 may train a determination model, an extraction model, and a relationship estimation model of any form (hereinafter sometimes referred to as "each model"). For example, the information providing device 10 may adopt any model, such as a regression model, bag-of-words, SVM (Support Vector Machine), DNN (Deep Neural Network), CRF (Conditional Random Fields), or LSTM (Long Short-Term Memory).

For example, the determination model trained by the learning processing described above is a model that includes an input layer to which posted information regarding a predetermined element is input, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element. The model causes a computer to function so as to perform, on the information input to the input layer, an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element, and thereby to output from the output layer a value indicating whether or not the posted information is posted information regarding a new element.

Further, for example, the extraction model trained by the learning processing described above is a model that includes an input layer to which posted information regarding a new element is input, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element. The model causes a computer to function so as to perform, on the information input to the input layer, an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element, and thereby to output from the output layer a value indicating which of the information included in the posted information represents the new element.

Further, for example, the relationship estimation model trained by the learning processing described above is a model that includes an input layer to which posted information regarding a new element is input, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element. The model causes a computer to function so as to perform, on the information input to the input layer, an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element, and thereby to output from the output layer a value indicating the relationship between the new element and other elements among the information included in the posted information.

Here, suppose that each model is realized by a regression model expressed as "y = a1*x1 + a2*x2 + ... + ai*xi". In this case, the first element included in each model corresponds to input data (xi) such as x1 and x2, and the weight of the first element corresponds to the coefficient ai associated with xi. The regression model can be regarded as a simple perceptron having an input layer and an output layer. When each model is regarded as a simple perceptron, the first element corresponds to one of the nodes of the input layer, and the second element can be regarded as the node of the output layer.
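
The correspondence between the regression model and a simple perceptron can be made concrete with a short sketch. The function name `regression_output` and the sample coefficients are assumptions for illustration: each input x_i plays the role of a first element, each coefficient a_i is its weight, and the returned y is the value of the second element (the output node).

```python
def regression_output(coefficients, inputs):
    """Compute y = a1*x1 + a2*x2 + ... + ai*xi."""
    if len(coefficients) != len(inputs):
        raise ValueError("coefficients and inputs must have the same length")
    return sum(a * x for a, x in zip(coefficients, inputs))


# Hypothetical coefficients and input features.
y = regression_output([0.5, -1.0, 2.0], [2.0, 1.0, 0.5])
```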

Further, suppose that each model is realized by a neural network having one or more intermediate layers, such as a DNN (Deep Neural Network). In this case, the first element included in each model corresponds to one of the nodes of the input layer or an intermediate layer. The second element corresponds to a node in the next stage, that is, a node to which a value is transmitted from the node corresponding to the first element. The weight of the first element corresponds to a connection coefficient, which is the weight applied to the value transmitted from the node corresponding to the first element to the node corresponding to the second element.
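
A minimal forward pass shows how the first element, second element, and connection coefficient relate in a layered network. The sigmoid activation, the layer sizes, and the weight values are assumptions chosen for this sketch; the description does not specify an activation function.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def forward(layers, inputs):
    """Propagate inputs through fully connected layers.

    layers[k][j][i] is the connection coefficient from node i of one layer
    (a first element) to node j of the next layer (a second element).
    """
    values = list(inputs)
    for weights in layers:
        values = [
            sigmoid(sum(w * v for w, v in zip(row, values)))
            for row in weights
        ]
    return values


# Hypothetical weights: 3 input nodes, 2 intermediate nodes, 1 output node.
layers = [
    [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]],
    [[1.5, -2.0]],
]
score = forward(layers, [1.0, 0.0, 1.0])[0]
```

With a sigmoid on the output node, the result lies strictly between 0 and 1, so it could serve as the score that the determination model outputs.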

The information providing device 10 acquires new elements and relationship information using each model, which may have any structure such as the regression model or neural network described above. Specifically, the various parameters (for example, connection coefficients) of the determination model are set so that, when posted information is input, the model outputs a score indicating whether or not the posted information includes a new element. The various parameters of the extraction model are set so that, when posted information is input, the model outputs information indicating a character string or the like in the posted information that is highly likely to be a new element. The various parameters of the relationship estimation model are set so that, when posted information is input, the model outputs information indicating other elements having a predetermined relationship with the new element indicated by the posted information.

Each model according to the embodiment may be a model generated based on results obtained by repeatedly inputting data to and outputting data from a predetermined model. Further, when the information providing device 10 performs learning processing using GAN (Generative Adversarial Networks), each model may be a model forming part of the GAN.

For example, the information providing device 10 generates each model by performing the learning described above using learning data that includes an element candidate selected, from among the element candidates that are candidates for new elements, based on the number of search histories regarding the element candidate and the number of pieces of posted information regarding the element candidate, the search history regarding the element candidate, and information indicating the relationship between the element candidate and other elements. The information providing device 10 may generate each model using any learning algorithm. For example, the information providing device 10 generates each model using a learning algorithm such as a neural network, a support vector machine, clustering, or reinforcement learning. As an example, when the information providing device 10 generates each model using a neural network, each model has an input layer including one or more neurons, an intermediate layer including one or more neurons, and an output layer including one or more neurons.

The learning data is data for causing the information providing device 10 to operate as the various models described above. That is, the learning data is data that includes an element candidate selected, from among the element candidates that are candidates for new elements, based on the number of search histories regarding the element candidate and the number of pieces of posted information regarding the element candidate, the search history regarding the element candidate, and information indicating the relationship between the element candidate and other elements, and that causes a computer to function as the various models described above.

[1-5. About the Learning Data]
In the above description, the information providing device 10 uses, as learning data, the posted information posted at the date and time when the number of search queries and the number of pieces of posted information changed from "0" to "1" or more. However, the embodiment is not limited to this. For example, the information providing device 10 may use, as learning data, the posted information posted during the day on which the number of search queries and the number of pieces of posted information changed from "0" to "1" or more, or may use, as learning data, the posted information posted between that day and the time when a predetermined period has elapsed.

Further, for example, as long as the information providing device 10 selects the element candidates to be used as learning data based on the search status indicated by the search history and the posting status indicated by the posting history, it may select element candidates based on any such status. For example, when there is a period of at least a predetermined length during which the number of search histories and posting histories is equal to or less than a predetermined threshold, and the number of search histories and posting histories becomes equal to or greater than a predetermined threshold from a certain day onward, the information providing device 10 may select the corresponding element candidate as learning data. The information providing device 10 may also select element candidates based on any other status, such as the statistical status of searches and posts indicated by the search history and posting history.
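
The threshold-based variation described above can be sketched as follows. The function name `crosses_threshold` and the default values (`quiet_max=0`, `active_min=10`, `min_quiet_days=7`) are hypothetical; the description only states that predetermined thresholds are used.

```python
def crosses_threshold(daily_counts, quiet_max=0, active_min=10, min_quiet_days=7):
    """Return True when a sufficiently long quiet run is followed by a busy day.

    daily_counts is a chronologically ordered list of per-day counts of
    search histories or posting histories for one element candidate.
    """
    quiet_run = 0
    for count in daily_counts:
        if count <= quiet_max:
            quiet_run += 1  # still within the quiet period
        elif count >= active_min and quiet_run >= min_quiet_days:
            return True  # quiet period was long enough, then activity jumped
        else:
            quiet_run = 0  # activity broke the quiet period without qualifying
    return False
```

For example, `crosses_threshold([0] * 7 + [12])` evaluates to `True`, while a quiet period of only three days before the same jump does not qualify under these assumed defaults.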

Further, the information providing device 10 may generate learning data serving as negative examples as well as positive examples. For example, for a given element candidate, the information providing device 10 treats as positive examples the posted information about that element candidate posted on the day when the number of search queries and the number of pieces of posted information changed from "0" to "1" or more, and treats as negative examples the posted information about that element candidate posted after a predetermined period has elapsed from that day. The information providing device 10 may then train the determination model using the positive examples and the negative examples.
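
The positive and negative labelling described above might look like the sketch below. The function name `label_posts`, the 30-day delay, and the sample posts are assumptions for illustration; the description leaves the length of the predetermined period open. Posts falling between the emergence day and the cutoff are left unlabelled in this sketch.

```python
from datetime import date, timedelta


def label_posts(posts, emergence_day, delay_days=30):
    """Label (day, text) posts about one element candidate.

    Posts made on the emergence day become positive examples (label 1);
    posts made after the delay has elapsed become negative examples (label 0).
    """
    cutoff = emergence_day + timedelta(days=delay_days)
    labeled = []
    for posted_day, text in posts:
        if posted_day == emergence_day:
            labeled.append((text, 1))
        elif posted_day > cutoff:
            labeled.append((text, 0))
    return labeled


# Hypothetical posts about one element candidate.
posts = [
    (date(2017, 9, 3), "announcement"),
    (date(2017, 9, 10), "follow-up"),
    (date(2017, 11, 1), "retrospective"),
]
labeled = label_posts(posts, date(2017, 9, 3))
```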

When creating the learning data, the information providing device 10 may acquire posted information registered in any period. For example, the information providing device 10 may acquire the posted information and search queries of the past year and use them to determine whether or not an element candidate is a new element.

Further, the period from when a new element arises until explanatory content for the new element or an infobox is generated is expected to fall within a predetermined range (for example, 40 days on average). Therefore, the information providing device 10 may acquire the posted information from 40 days before the date and time when the explanatory content or the infobox was generated, and use the acquired posted information as learning data.

[2. Configuration of the Information Providing Device]
Hereinafter, an example of the functional configuration of the information providing device 10 described above will be described. FIG. 2 is a diagram showing a configuration example of the information providing device according to the embodiment. As shown in FIG. 2, the information providing device 10 includes a communication unit 20, a storage unit 30, and a control unit 40.

The communication unit 20 is realized by, for example, a NIC (Network Interface Card) or the like. The communication unit 20 is connected to the network N by wire or wirelessly, and transmits and receives information to and from the log server 100 and the explanatory content server 200.

The storage unit 30 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 30 stores an entity database 31, a triple database 32, a search log database 33, a posted information database 34, a learning data database 35, and a model database 36 (hereinafter sometimes collectively referred to as "databases 31 to 36").

Hereinafter, examples of the information registered in databases 31 to 36 will be described with reference to FIGS. 3 to 8. Information about entities is registered in the entity database 31. For example, FIG. 3 is a diagram showing an example of information registered in the entity database according to the embodiment. As shown in FIG. 3, information having items such as "entity ID", "entity type", "node ID", "node type", and "data" is registered in the entity database 31.

Here, the "entity ID" is an identifier of an entity. The "entity type" is information indicating the type of the entity indicated by the associated "entity ID", for example, whether the entity represents a "person" or an "occupation". The "node ID" is an identifier of a node related to the entity indicated by the associated "entity ID". The "node type" is information indicating the type of the node indicated by the associated "node ID", such as whether the node represents a name, a photograph, or an occupation. The "data" is the data of the node indicated by the associated "node ID".

For example, in the example shown in FIG. 3, the entity ID "E11", the entity type "person", the node ID "I111", the node type "name", and the data "name #1" are registered in association with each other. Such information indicates that the entity indicated by the entity ID "E11" (that is, entity E11) is an entity representing a "person", that the node ID "I111" is registered as the node indicating the "name" of the person represented by that entity, and that the name is "name #1". Although conceptual values such as "name #1" and "photo #1" are shown in the example of FIG. 3, in practice the entity database 31 stores various kinds of information indicating the name, photograph, date of birth, and the like of the person corresponding to the associated entity.
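
The record layout of the entity database can be illustrated with a small sketch. The class name `EntityRow`, the helper `nodes_of`, and the second row (including the node ID "I112") are assumptions added for illustration; only the first row reflects the values quoted from FIG. 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRow:
    entity_id: str
    entity_type: str
    node_id: str
    node_type: str
    data: str


def nodes_of(rows, entity_id):
    """Collect the node data of one entity, keyed by node type."""
    return {row.node_type: row.data for row in rows if row.entity_id == entity_id}


rows = [
    EntityRow("E11", "person", "I111", "name", "name #1"),
    # The node ID "I112" is hypothetical; it is not taken from the description.
    EntityRow("E11", "person", "I112", "photo", "photo #1"),
]
profile = nodes_of(rows, "E11")
```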

トリプルデータベース３２には、トリプルを示す情報が登録される。例えば、図４は、実施形態に係るトリプルデータベースに登録される情報の一例を示す図である。図４に示す例では、トリプルデータベース３２には、「トリプルＩＤ」、「関係情報ＩＤ」、「種別」、「第１要素」、および「第２要素」といった項目を有する情報が登録される。 Information indicating the triple is registered in the triple database 32. For example, FIG. 4 is a diagram showing an example of information registered in the triple database according to the embodiment. In the example shown in FIG. 4, information having items such as "triple ID", "relationship information ID", "type", "first element", and "second element" is registered in the triple database 32.

ここで、「トリプルＩＤ」とは、トリプルを識別する識別子である。また、「関係情報ＩＤ」とは、トリプルに含まれる関係情報を識別する識別子である。また、「種別」とは、トリプルに含まれるエンティティ間の関係性を示す情報である。また、「第１要素」および「第２要素」とは、対応付けられた「トリプルＩＤ」が示すトリプルに含まれるエンティティのエンティティＩＤである。 Here, the "triple ID" is an identifier that identifies the triple. Further, the "relationship information ID" is an identifier that identifies the relational information included in the triple. Further, the "type" is information indicating the relationship between the entities included in the triple. Further, the "first element" and the "second element" are entity IDs of entities included in the triple indicated by the associated "triple ID".

例えば、図４に示す例では、トリプルＩＤ「トリプル＃１」、関係情報ＩＤ「Ｃ１」、種別「職業」、第１要素「Ｅ１１」、および第２要素「Ｅ２１」が対応付けて登録されている。このような情報は、トリプルＩＤ「トリプル＃１」が示すトリプルとして、エンティティＥ１１とエンティティＥ２１と関係情報Ｃ１とが対応付けて登録されており、エンティティＥ２１がエンティティＥ１１の職業である旨を示す。 For example, in the example shown in FIG. 4, the triple ID "triple #1", the relationship information ID "C1", the type "occupation", the first element "E11", and the second element "E21" are registered in association with each other. Such information indicates that the entity E11, the entity E21, and the relationship information C1 are registered in association with each other as the triple indicated by the triple ID "triple #1", and that the entity E21 is the occupation of the entity E11.
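
The records of the entity database 31 and the triple database 32 described above can be pictured as two small record types. The following is a minimal sketch in Python, not the implementation of the embodiment; the class names, field names, and the `describe` helper are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityNode:
    """One row of the entity database (field names are hypothetical)."""
    entity_id: str    # e.g. "E11"
    entity_type: str  # e.g. "person"
    node_id: str      # e.g. "I111"
    node_type: str    # e.g. "name"
    data: str         # e.g. "name #1"


@dataclass(frozen=True)
class Triple:
    """One row of the triple database (field names are hypothetical)."""
    triple_id: str      # e.g. "triple #1"
    relation_id: str    # e.g. "C1"
    relation_type: str  # e.g. "occupation"
    first: str          # entity ID of the first element
    second: str         # entity ID of the second element


def describe(triple: Triple) -> str:
    """Read a triple as '<second> is the <type> of <first>'."""
    return f"{triple.second} is the {triple.relation_type} of {triple.first}"


node = EntityNode("E11", "person", "I111", "name", "name #1")
triple = Triple("triple #1", "C1", "occupation", "E11", "E21")
print(describe(triple))
```

Keeping the relationship type on the triple rather than on either entity mirrors the description, in which the "type" item states how the two entities are related.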

検索ログデータベース３３には、検索クエリの履歴、すなわち、検索履歴が登録される。例えば、図５は、実施形態に係る検索ログデータベースに登録される情報の一例を示す図である。図５に示すように、検索ログデータベース３３には、検索ログを識別する識別子である「検索ログＩＤ」、入力された検索クエリを示す「検索クエリ」、および検索が行われた日時を示す「検索日時」といった項目を有する情報が登録される。 The history of search queries, that is, the search history, is registered in the search log database 33. For example, FIG. 5 is a diagram showing an example of information registered in the search log database according to the embodiment. As shown in FIG. 5, information having items such as a "search log ID", which is an identifier that identifies a search log, a "search query", which indicates the entered search query, and a "search date and time", which indicates the date and time when the search was performed, is registered in the search log database 33.

例えば、図５に示す例では、検索ログデータベース３３には、検索ログＩＤ「検索ログ＃１」、検索クエリ「検索クエリ＃１」、および検索日時「検索日時＃１」といった情報が対応付けて登録されている。このような情報は、検索ログＩＤ「検索ログ＃１」が示す検索ログの検索クエリが「検索クエリ＃１」であり、検索日時が「検索日時＃１」であった旨を示す。なお、図５に示す例では、「検索ログ＃１」、「検索クエリ＃１」、「検索日時＃１」といった概念的な値について記載したが、実際には、検索ログデータベース３３には、検索ログを識別する数値や文字列、検索クエリとして入力された文字列、検索日時を示す数値等が登録される。 For example, in the example shown in FIG. 5, the search log database 33 is associated with information such as the search log ID "search log # 1", the search query "search query # 1", and the search date and time "search date and time # 1". It is registered. Such information indicates that the search query of the search log indicated by the search log ID "search log # 1" is "search query # 1" and the search date and time is "search date and time # 1". In the example shown in FIG. 5, conceptual values such as "search log # 1", "search query # 1", and "search date and time # 1" are described, but in reality, the search log database 33 contains Numerical values and character strings that identify the search log, character strings entered as search queries, numerical values that indicate the search date and time, etc. are registered.

投稿情報データベース３４には、投稿情報が登録される。例えば、図６は、実施形態に係る投稿情報データベースに登録される情報の一例を示す図である。図６に示すように、投稿情報データベース３４には、投稿情報を識別する識別子である「投稿ログＩＤ」、投稿情報の内容を示す「投稿情報」、および投稿情報が投稿された日時を示す「投稿日時」といった項目を有する情報が登録される。 Posted information is registered in the posted information database 34. For example, FIG. 6 is a diagram showing an example of information registered in the posted information database according to the embodiment. As shown in FIG. 6, information having items such as a "post log ID", which is an identifier that identifies the posted information, "posted information", which indicates the content of the posted information, and a "posting date and time", which indicates the date and time when the posted information was posted, is registered in the posted information database 34.

例えば、図６に示す例では、投稿情報データベース３４には、投稿ログＩＤ「投稿ログ＃１」、投稿情報「投稿情報＃１」、および投稿日時「投稿日時＃１」といった情報が対応付けて登録されている。このような情報は、投稿ログＩＤ「投稿ログ＃１」が示す投稿情報が「投稿情報＃１」であり、投稿日時が「投稿日時＃１」であった旨を示す。なお、図６に示す例では、「投稿ログ＃１」、「投稿情報＃１」、「投稿日時＃１」といった概念的な値について記載したが、実際には、投稿情報データベース３４には、投稿情報を識別する数値や文字列、投稿情報として入力された文字列、投稿日時を示す数値等が登録される。 For example, in the example shown in FIG. 6, the post information database 34 is associated with information such as the post log ID “post log # 1”, the post information “post information # 1”, and the post date and time “post date and time # 1”. It is registered. Such information indicates that the posting information indicated by the posting log ID "posting log # 1" is "posting information # 1" and the posting date and time is "posting date and time # 1". In the example shown in FIG. 6, conceptual values such as "post log # 1", "post information # 1", and "post date / time # 1" are described, but in reality, the post information database 34 contains Numerical values and character strings that identify the posted information, character strings entered as the posted information, numerical values that indicate the posting date and time, etc. are registered.

学習データデータベース３５には、学習データが登録される。例えば、図７は、実施形態に係る学習データデータベースに登録される情報の一例を示す図である。図７に示すように、学習データデータベース３５には、学習データを識別する識別子である「学習データＩＤ」、学習データに含まれる要素候補である「要素候補」、学習データに含まれる関係情報である「関係情報」、および学習データに含まれる投稿情報である「投稿情報」といった項目を有する情報が登録される。 Learning data is registered in the learning data database 35. For example, FIG. 7 is a diagram showing an example of information registered in the learning data database according to the embodiment. As shown in FIG. 7, information having items such as a "learning data ID", which is an identifier that identifies the learning data, an "element candidate", which is the element candidate included in the learning data, "relationship information", which is the relationship information included in the learning data, and "posted information", which is the posted information included in the learning data, is registered in the learning data database 35.

例えば、図７に示す例では、学習データデータベース３５には、学習データＩＤ「学習データ＃１」、要素候補「要素候補＃１」、関係情報「関係情報群＃１」、投稿情報「投稿情報群＃１」が対応付けて登録されている。このような情報は、学習データＩＤ「学習データ＃１」が示す学習データとして、「要素候補＃１」、「関係情報群＃１」、および「投稿情報群＃１」が対応付けて登録されている旨を示す。また、このような情報は、「関係情報群＃１」として、「要素候補＃１」と所定の関係を有する他の要素が「対象要素＃１−１」であり、その関係が「関係＃１−１」である旨が登録されている旨を示す。また、このような情報は、「投稿情報群＃１」として「投稿情報＃１−１」や「投稿情報＃１−２」が登録されている旨を示す。 For example, in the example shown in FIG. 7, the learning data database 35 has the learning data ID “learning data # 1”, the element candidate “element candidate # 1”, the relational information “relationship information group # 1”, and the posting information “posting information”. Group # 1 ”is registered in association with each other. Such information is registered in association with "element candidate # 1", "relationship information group # 1", and "posted information group # 1" as learning data indicated by the learning data ID "learning data # 1". Indicates that. Further, in such information, as "relationship information group # 1", another element having a predetermined relationship with "element candidate # 1" is "target element # 1-1", and the relationship is "relationship # 1". It indicates that the fact that it is "1-1" is registered. Further, such information indicates that "posted information # 1-1" and "posted information # 1-2" are registered as "posted information group # 1".

なお、図７に示す例では、「学習データ＃１」、「要素候補＃１」、「関係情報＃１−１」、「関係＃１−１」、「投稿情報＃１−１」といった概念的な値について記載したが、実際には、学習データデータベース３５には、学習データを識別する数値や文字列、要素候補、関係情報、関係、投稿情報等となる文字列等が登録される。 In the example shown in FIG. 7, concepts such as "learning data # 1", "element candidate # 1", "relationship information # 1-1", "relationship # 1-1", and "posted information # 1-1" However, in reality, a numerical value or a character string for identifying the training data, an element candidate, a relationship information, a relationship, a character string or the like as posting information or the like is registered in the training data database 35.

モデルデータベース３６には、各モデルのデータが登録される。例えば、図８は、実施形態に係るモデルデータベースに登録される情報の一例を示す図である。図８に示すように、モデルデータベース３６には、モデルの種別を示す「モデル種別」およびモデルの情報である「モデルデータ」とが対応付けて登録される。 Data of each model is registered in the model database 36. For example, FIG. 8 is a diagram showing an example of information registered in the model database according to the embodiment. As shown in FIG. 8, the "model type" indicating the model type and the "model data" which is the model information are registered in the model database 36 in association with each other.

例えば、図８に示す例では、モデルデータベース３６には、モデル種別「判定モデル」およびモデルデータ「モデルデータ＃１」が対応付けて登録されている。このような情報は、「判定モデル」であるモデルの各種パラメータが「モデルデータ＃１」である旨を示す。なお、図８に示す例では、「モデルデータ＃１」といった概念的な値を記載したが、実際には、ノード間の接続関係や接続係数等といったモデルを構成するために必要な各種の情報が登録される。 For example, in the example shown in FIG. 8, the model type “determination model” and the model data “model data # 1” are registered in association with each other in the model database 36. Such information indicates that the various parameters of the model that is the "determination model" are "model data # 1". In the example shown in FIG. 8, conceptual values such as "model data # 1" are described, but in reality, various information necessary for constructing a model such as connection relationships between nodes and connection coefficients. Is registered.

図２に戻り、説明を続ける。制御部４０は、コントローラ（controller）であり、例えば、ＣＰＵ（Central Processing Unit）、ＭＰＵ（Micro Processing Unit）等のプロセッサによって、情報提供装置１０内部の記憶装置に記憶されている各種プログラムがＲＡＭ等を作業領域として実行されることにより実現される。また、制御部４０は、コントローラ（controller）であり、例えば、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等の集積回路により実現されてもよい。 Returning to FIG. 2, the explanation will be continued. The control unit 40 is a controller and is realized, for example, by a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs stored in a storage device inside the information providing device 10, using a RAM or the like as a work area. The control unit 40 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

図２に示すように、制御部４０は、取得部４１、特定部４２、選択部４３、抽出部４４、学習部４５、および更新部４６を有する。取得部４１は、要素候補を説明する説明コンテンツが登録された場合は、説明コンテンツから要素候補を取得する。例えば、取得部４１は、所定の時間間隔で説明コンテンツサーバ２００を参照し、所定の期間内に新たに登録された説明コンテンツを取得する。このような場合、取得部４１は、各種の文字解析技術を用いて、説明コンテンツが主題する物事を示すテキストを抽出し、抽出したテキストを要素候補とする。 As shown in FIG. 2, the control unit 40 includes an acquisition unit 41, an identification unit 42, a selection unit 43, an extraction unit 44, a learning unit 45, and an update unit 46. When explanatory content that explains an element candidate is registered, the acquisition unit 41 acquires the element candidate from the explanatory content. For example, the acquisition unit 41 refers to the explanatory content server 200 at predetermined time intervals and acquires the explanatory content newly registered within a predetermined period. In such a case, the acquisition unit 41 uses various text analysis techniques to extract text indicating the subject of the explanatory content, and uses the extracted text as an element candidate.

特定部４２は、新たな要素の候補である要素候補に関する検索履歴と、要素候補に関する投稿情報とを特定する。より具体的には、特定部４２は、取得部４１により説明コンテンツから取得された要素候補の検索履歴と、要素候補に関する投稿情報とを特定する。例えば、特定部４２は、要素候補の投稿情報であって、説明コンテンツが登録されるよりも前の所定の期間内に投稿された投稿情報を特定する。 The identification unit 42 identifies the search history regarding an element candidate, which is a candidate for a new element, and the posted information regarding the element candidate. More specifically, the identification unit 42 identifies the search history of the element candidate acquired from the explanatory content by the acquisition unit 41 and the posted information regarding the element candidate. For example, the identification unit 42 identifies posted information about the element candidate that was posted within a predetermined period before the explanatory content was registered.

例えば、特定部４２は、所定の時間間隔で、ログサーバ１００から各種の検索履歴や投稿情報を取得する。そして、特定部４２は、取得した検索履歴を検索ログデータベース３３に登録し、取得した投稿情報を、投稿情報データベース３４に登録しておく。また、特定部４２は、取得部４１によって要素候補が取得された場合は、検索ログデータベース３３を参照し、要素候補を検索クエリとして含む検索ログを特定する。また、特定部４２は、取得された要素候補を含む投稿情報を投稿情報データベース３４から特定する。 For example, the identification unit 42 acquires various search histories and posted information from the log server 100 at predetermined time intervals. The identification unit 42 then registers the acquired search history in the search log database 33 and the acquired posted information in the posted information database 34. When an element candidate is acquired by the acquisition unit 41, the identification unit 42 refers to the search log database 33 and identifies the search logs whose search query includes the element candidate. The identification unit 42 also identifies, from the posted information database 34, the posted information that includes the acquired element candidate.

ここで、特定部４２は、検索履歴や投稿情報を特定する際、検索日時や投稿日時を考慮してもよい。例えば、特定部４２は、取得された要素候補の説明コンテンツが登録された日時を特定し、検索履歴や投稿情報のうち、特定した日時よりも前の所定の期間内に検索或いは投稿された検索履歴や投稿情報を特定してもよい。例えば、特定部４２は、説明コンテンツの登録日前４０日間の投稿情報を特定してもよい。 Here, the identification unit 42 may take the search date and time and the posting date and time into account when identifying the search history and the posted information. For example, the identification unit 42 may determine the date and time when the explanatory content of the acquired element candidate was registered, and identify, among the search history and the posted information, the entries that were searched or posted within a predetermined period before that date and time. For example, the identification unit 42 may identify the posted information from the 40 days before the registration date of the explanatory content.
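
The time-window restriction described above can be sketched as a simple filter. This is only an illustration under stated assumptions: posts are represented as dictionaries with a hypothetical `posted_at` key, and the function name `posts_in_window` is not taken from the source. The 40-day default follows the example in the text.

```python
from datetime import datetime, timedelta


def posts_in_window(posts, registered_at, days=40):
    """Keep posts made within `days` days before the content was registered.

    `posts` is a list of dicts with a "posted_at" datetime (assumed layout).
    The registration instant itself is excluded, since only earlier posts
    are of interest.
    """
    start = registered_at - timedelta(days=days)
    return [p for p in posts if start <= p["posted_at"] < registered_at]


registered = datetime(2017, 9, 20)
sample_posts = [
    {"id": 1, "posted_at": datetime(2017, 9, 1)},   # inside the window
    {"id": 2, "posted_at": datetime(2017, 7, 1)},   # too old
    {"id": 3, "posted_at": datetime(2017, 9, 21)},  # after registration
]
print([p["id"] for p in posts_in_window(sample_posts, registered)])
```

The same filter could be applied to search log records by swapping the date key.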

選択部４３は、特定された検索履歴と、特定された投稿情報とに基づいて、要素候補のうち、新たな要素に関する情報の特徴をモデルに学習させるための学習データを生成するための要素候補を選択する。例えば、選択部４３は、複数の要素候補のうち、所定の日時において、検索履歴の数の増加量が所定の閾値を超え、かつ、投稿情報の数の増加量が所定の閾値を超えた要素候補を選択する。例えば、選択部４３は、複数の要素候補のうち、所定の日時よりも前における検索履歴および投稿情報が存在せず、かつ、所定の日時において検索履歴の数が所定の閾値を超え、かつ、投稿情報の数が所定の閾値を超えた要素候補を選択する。 The selection unit 43 is an element candidate for generating learning data for making a model learn the characteristics of information about a new element among the element candidates based on the specified search history and the specified post information. Select. For example, in the selection unit 43, among a plurality of element candidates, an element in which the amount of increase in the number of search histories exceeds a predetermined threshold value and the amount of increase in the number of posted information exceeds a predetermined threshold value at a predetermined date and time. Select a candidate. For example, in the selection unit 43, among the plurality of element candidates, the search history and posting information before the predetermined date and time do not exist, the number of search histories exceeds the predetermined threshold value at the predetermined date and time, and Select element candidates for which the number of posted information exceeds a predetermined threshold.

例えば、図９は、実施形態に係る情報提供装置が学習データの作成対象とする要素候補を選択する処理の一例を示す図である。なお、図９中（Ａ）には、ある要素候補＃１を含む検索クエリの数の変遷を各日時ごとに点線でプロットし、要素候補＃１を含む投稿情報の数の変遷を各日付ごとに直線でプロットした。また、図９中（Ｂ）には、ある要素候補＃２を含む検索クエリの数の変遷を各日時ごとに点線でプロットし、要素候補＃２を含む投稿情報の数の変遷を各日付ごとに直線でプロットした。 For example, FIG. 9 is a diagram showing an example of a process in which the information providing device according to the embodiment selects an element candidate for which the learning data is to be created. In FIG. 9 (A), the transition of the number of search queries including a certain element candidate # 1 is plotted with a dotted line for each date and time, and the transition of the number of posted information including the element candidate # 1 is plotted for each date. Was plotted as a straight line. Further, in FIG. 9B, the transition of the number of search queries including a certain element candidate # 2 is plotted with a dotted line for each date and time, and the transition of the number of posted information including the element candidate # 2 is plotted for each date. Was plotted as a straight line.

例えば、図９中（ａ）に示す日時において、要素候補＃１を含む検索履歴や投稿情報が急に生じており、図９中（ａ）よりも前の図９中（ｂ）に示す期間においては、要素候補＃１を含む検索履歴や投稿情報が存在しない。検索履歴の数や投稿情報の数がこのような変遷を辿る場合、要素候補＃１は、新出要素である可能性が高い。そこで、選択部４３は、検索履歴および投稿情報が存在しない状態から、所定の日時において検索履歴の数が所定の閾値を超え、かつ、投稿情報の数が所定の閾値を超えた要素候補＃１を、学習データの作成対象として選択する。 For example, at the date and time shown in FIG. 9 (a), the search history and posted information including the element candidate # 1 suddenly occur, and the period shown in FIG. 9 (b) before FIG. 9 (a). In, there is no search history or posted information including element candidate # 1. When the number of search histories and the number of posted information follow such a transition, the element candidate # 1 is likely to be a new element. Therefore, in the selection unit 43, the element candidate # 1 in which the number of search histories exceeds a predetermined threshold value and the number of posted information exceeds a predetermined threshold value at a predetermined date and time from the state where the search history and the posted information do not exist. Is selected as the target for creating training data.

一方、図９中（Ｂ）に示すように、要素候補＃２を含む検索履歴や投稿情報の数は、増減を繰り返しながらも一定量が存在している。ここで、図９中（ｃ）に示す日時において、要素候補＃２を含む検索履歴や投稿情報が急増しているものの、図９中（ｄ）に示すように、過去にも要素候補＃２を含む検索履歴や投稿情報の数が急増する日時が存在する。検索履歴の数や投稿情報の数がこのような変遷を辿る場合、要素候補＃２は、新出要素ではない可能性が高い。具体的な例を挙げると、要素候補＃２が小説のタイトルである場合、図９中（ｄ）に示す日時において要素候補＃２が話題となり、図９中（ｃ）に示す日時において要素候補＃２の小説が映画化されたといった態様が考えられる。このような要素候補＃２を学習データとした場合、真に新出要素が含まれる投稿情報を選択することができず、学習データの確度が低下する結果、新出要素の抽出精度が低下する恐れがある。そこで、情報提供装置１０は、要素候補＃２を学習データの作成対象から除外する。 On the other hand, as shown in FIG. 9 (B), the number of search histories and pieces of posted information including the element candidate #2 rises and falls repeatedly, but a certain amount is always present. At the date and time shown in FIG. 9 (c), the search history and posted information including the element candidate #2 increase sharply; however, as shown in FIG. 9 (d), there is also an earlier date and time at which the number of search histories and pieces of posted information including the element candidate #2 increased sharply. When the counts follow such a transition, the element candidate #2 is unlikely to be a new element. As a specific example, when the element candidate #2 is the title of a novel, the novel may have become a topic at the date and time shown in FIG. 9 (d) and may have been made into a movie at the date and time shown in FIG. 9 (c). If such an element candidate #2 were used as learning data, posted information that truly contains a new element could not be selected, the accuracy of the learning data would fall, and the extraction accuracy for new elements could fall as a result. Therefore, the information providing device 10 excludes the element candidate #2 from the targets for creating learning data.
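
The contrast between the element candidate #1 and the element candidate #2 can be expressed as a small predicate. The sketch below assumes daily counts held in plain lists and hypothetical parameter names; the threshold values and the sample counts are illustrative and do not come from the source.

```python
def is_new_element_candidate(search_counts, post_counts, day,
                             search_threshold, post_threshold):
    """Return True when a candidate looks like a new element on `day`.

    `search_counts` and `post_counts` are lists of daily counts (assumed
    layout), and `day` is the index of the date being examined. A candidate
    qualifies only if no searches or posts exist before `day` and both
    counts exceed their thresholds on `day`.
    """
    no_prior_history = (all(c == 0 for c in search_counts[:day])
                        and all(c == 0 for c in post_counts[:day]))
    spike = (search_counts[day] > search_threshold
             and post_counts[day] > post_threshold)
    return no_prior_history and spike


# Candidate #1: nothing before the spike, so it is selected.
candidate_1 = is_new_element_candidate([0, 0, 0, 50], [0, 0, 0, 30], 3, 10, 10)
# Candidate #2: an earlier spike exists, so it is excluded.
candidate_2 = is_new_element_candidate([5, 40, 6, 60], [3, 20, 4, 35], 3, 10, 10)
print(candidate_1, candidate_2)
```

Requiring an empty history before the spike is what filters out cases such as a novel that was discussed earlier and later adapted into a movie.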

図２に戻り、説明を続ける。抽出部４４は、選択部４３により選択された要素候補に関する投稿情報のうち、投稿情報や検索情報の数が急増した日時に投稿された投稿情報を学習データとして抽出する。また、抽出部４４は、要素候補を説明する説明コンテンツから、要素候補と他の要素との関係性を学習データとして抽出する。例えば、抽出部４４は、選択部４３が学習データの作成対象とする要素候補を選択した場合、選択した要素候補を含む投稿情報であって、投稿情報や検索情報の数が「０」から「１」以上に増加した日に投稿された投稿情報を抽出する。そして、抽出部４４は、要素候補と抽出した投稿情報とを対応付けて学習データデータベース３５に登録する。 Returning to FIG. 2, the explanation will be continued. From the posted information regarding the element candidate selected by the selection unit 43, the extraction unit 44 extracts, as learning data, the posted information posted on the date and time when the number of pieces of posted information and search information increased sharply. The extraction unit 44 also extracts, as learning data, the relationship between the element candidate and other elements from the explanatory content that explains the element candidate. For example, when the selection unit 43 selects an element candidate for which learning data is to be created, the extraction unit 44 extracts the posted information that includes the selected element candidate and that was posted on the day when the number of pieces of posted information and search information increased from "0" to "1" or more. The extraction unit 44 then registers the element candidate and the extracted posted information in the learning data database 35 in association with each other.
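
One way to read the extraction step is "take the posts from the first day on which the candidate appears". The sketch below makes that reading concrete under simplifying assumptions: only posted information is considered (search counts are ignored), matching is plain substring containment, and the record layout and the function name are hypothetical.

```python
from datetime import date


def first_appearance_posts(candidate, posts):
    """Return the posts mentioning `candidate` on its first day of appearance.

    `posts` is a list of dicts with "text" and "posted_on" keys (assumed
    layout). The first day is the day the count of matching posts goes
    from 0 to 1 or more.
    """
    matching = [p for p in posts if candidate in p["text"]]
    if not matching:
        return []
    first_day = min(p["posted_on"] for p in matching)
    return [p for p in matching if p["posted_on"] == first_day]


sample = [
    {"text": "foo debuts", "posted_on": date(2017, 9, 2)},
    {"text": "foo again", "posted_on": date(2017, 9, 3)},
    {"text": "bar news", "posted_on": date(2017, 9, 1)},
    {"text": "who is foo", "posted_on": date(2017, 9, 2)},
]
first_posts = first_appearance_posts("foo", sample)
print([p["text"] for p in first_posts])
```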

また、抽出部４４は、要素候補の説明コンテンツにインフォボックスが登録されている場合、かかるインフォボックスから要素候補と他の要素との間の関係性を示す関係情報を抽出する。なお、抽出部４４は、文字解析技術や、例えば、説明コンテンツに設定された他の説明コンテンツへのリンク関係等から、要素候補と他の要素との間の関係性を示す関係情報を特定してもよい。そして、抽出部４４は、抽出した関係情報を要素候補と対応付けて学習データデータベース３５に登録する。 Further, when the infobox is registered in the explanation content of the element candidate, the extraction unit 44 extracts the relationship information indicating the relationship between the element candidate and the other element from the infobox. The extraction unit 44 identifies the relationship information indicating the relationship between the element candidate and the other element from the character analysis technique, for example, the link relationship to the other explanatory content set in the explanatory content, and the like. You may. Then, the extraction unit 44 associates the extracted relationship information with the element candidates and registers them in the learning data database 35.

学習部４５は、選択された要素候補に関する情報を用いて、新たな要素に関する情報の特徴をモデルに学習させる。すなわち、学習部４５は、選択部４３により選択された要素候補に関する学習データを用いて、各モデルの学習を行う。例えば、学習部４５は、選択された要素候補、要素候補に関する投稿情報、および要素候補と他の要素との関係性を学習データとして、学習データが有する各種の特徴を各モデルに学習させる。 The learning unit 45 causes the model to learn the characteristics of the information about the new element by using the information about the selected element candidate. That is, the learning unit 45 learns each model using the learning data regarding the element candidates selected by the selection unit 43. For example, the learning unit 45 causes each model to learn various features of the learning data by using the selected element candidate, the posted information about the element candidate, and the relationship between the element candidate and other elements as learning data.

例えば、学習部４５は、学習データの投稿情報を用いて、投稿情報が新たな要素に関する投稿であるか否かを判定する判定モデルの学習を行う。例えば、学習部４５は、学習データに含まれる投稿情報を入力した際に、入力された投稿情報に新出要素が含まれる旨を示す情報を出力し、他の投稿情報を入力した際に、入力された投稿情報に新出要素が含まれない旨を示す情報を出力するように、判定モデルの学習を行う。 For example, the learning unit 45 uses the posted information of the learning data to learn a determination model for determining whether or not the posted information is a post related to a new element. For example, when the learning unit 45 inputs the posted information included in the learning data, the learning unit 45 outputs information indicating that the input posted information includes a new element, and when other posted information is input, the learning unit 45 outputs the information. The judgment model is trained so as to output information indicating that the input posted information does not include new elements.
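
The training objective for the determination model can be illustrated with a very small classifier. The source does not specify the architecture beyond weighted elements arranged in layers, so the sketch below substitutes a single-layer logistic regression over bag-of-words features, trained with stochastic gradient descent. The function names, hyperparameters, and example sentences are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def features(text):
    """Binary bag-of-words features: the set of lowercase tokens."""
    return set(text.lower().split())


def predict(weights, bias, text):
    """Probability that the post concerns a new element."""
    z = bias + sum(weights[w] for w in features(text))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit weights so label-1 posts score high and label-0 posts score low."""
    weights, bias = defaultdict(float), 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = label - predict(weights, bias, text)
            for w in features(text):
                weights[w] += lr * error
            bias += lr * error
    return weights, bias


# Label 1: posts taken from the learning data; label 0: other posts.
EXAMPLES = [
    ("who is foo the new singer", 1),
    ("foo debut announced today", 1),
    ("weather is nice today", 0),
    ("lunch was good", 0),
]
weights, bias = train(EXAMPLES)
for text, label in EXAMPLES:
    print(label, round(predict(weights, bias, text), 2))
```

A multi-layer network would follow the same pattern, with the output value still read as an indication of whether the posted information relates to a new element.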

また、例えば、学習部４５は、学習データの要素候補と、要素候補に関する投稿情報を用いて、投稿情報に含まれる新たな要素を抽出する要素抽出モデルの学習を行う。例えば、学習部４５は、ある学習データの投稿情報を入力した際に、その学習データの要素候補を示す情報を出力するように、抽出モデルの学習を行う。 Further, for example, the learning unit 45 learns an element extraction model that extracts a new element included in the posted information by using the element candidate of the learning data and the posted information about the element candidate. For example, the learning unit 45 learns the extraction model so as to output information indicating element candidates of the learning data when the posted information of a certain learning data is input.

また、例えば、学習部４５は、学習データの投稿情報と、その学習データの要素候補と他の要素との関係性とを用いて、投稿情報に含まれる新たな要素と他の要素との関係性を抽出する関係推定モデルの学習を行う。より具体的には、学習部４５は、投稿情報から、新たな要素との間に所定の関係性を有する他の要素を抽出する複数のモデルであって、それぞれ異なる関係性を有する他の要素を抽出する複数のモデルを学習する。 Further, for example, the learning unit 45 uses the posted information of the learning data and the relationship between the element candidates of the learning data and other elements to relate the new element included in the posted information to the other elements. Learn the relationship estimation model that extracts sex. More specifically, the learning unit 45 is a plurality of models for extracting other elements having a predetermined relationship with a new element from the posted information, and the other elements having different relationships with each other. Learn multiple models to extract.

例えば、学習部４５は、関係推定モデルとして、それぞれ異なる関係性と対応付けた複数のモデルを準備する。また、学習部４５は、例えば、第１の関係性と対応付けたモデルを学習する場合、学習データデータベース３５を参照し、要素候補とその要素候補との間に第１の関係性を有する他の要素とを特定する。そして、学習部４５は、第１の関係性と対応付けたモデルに対して投稿情報を入力した際に、特定した他の要素を示す情報出力するように、そのモデルの学習を行う。このような処理を各関係推定モデルについて実行することで、学習部４５は、投稿情報から、要素候補と所定の関係性を有する他の要素を抽出するモデル、すなわち、要素候補と他の要素との間の関係性を推定するための関係推定モデルを学習することができる。 For example, the learning unit 45 prepares a plurality of models associated with different relationships as the relationship estimation model. Further, for example, when learning a model associated with the first relationship, the learning unit 45 refers to the learning data database 35 and has a first relationship between the element candidate and the element candidate. Identify the elements of. Then, when the posted information is input to the model associated with the first relationship, the learning unit 45 learns the model so as to output information indicating the specified other elements. By executing such processing for each relation estimation model, the learning unit 45 extracts another element having a predetermined relationship with the element candidate from the posted information, that is, the element candidate and the other element. You can learn a relationship estimation model for estimating the relationship between.
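
Preparing one model per relationship amounts to splitting the learning data by relationship type before training. The following sketch shows only that grouping step, with a hypothetical record layout (`relations` as pairs of relationship and target element, `posts` as a list of strings); how each per-relationship model is then trained is left open, as in the source.

```python
def build_relation_training_sets(learning_data):
    """Group (post, target element) pairs by relationship type.

    Each record is a dict with "relations", a list of
    (relationship, target element) pairs, and "posts", a list of posted
    texts (assumed layout). The result maps each relationship to the
    training pairs for the model associated with that relationship.
    """
    training_sets = {}
    for record in learning_data:
        for relation, target in record["relations"]:
            for post in record["posts"]:
                training_sets.setdefault(relation, []).append((post, target))
    return training_sets


learning_data = [
    {
        "candidate": "element candidate #1",
        "relations": [("occupation", "singer")],
        "posts": ["post 1-1", "post 1-2"],
    }
]
relation_sets = build_relation_training_sets(learning_data)
print(relation_sets)
```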

更新部４６は、学習部４５によって学習が行われた各モデルを用いて、ナレッジデータベースの更新を行う。例えば、更新部４６は、投稿されてから所定の時間が経過していない投稿情報をログサーバ１００から取得する。そして、更新部４６は、判定モデルを用いて、各投稿情報が新出要素を含むか否かを判定する。 The update unit 46 updates the knowledge database using each model trained by the learning unit 45. For example, the update unit 46 acquires post information from the log server 100 for which a predetermined time has not passed since the post was posted. Then, the update unit 46 determines whether or not each posted information includes a new element by using the determination model.

また、更新部４６は、ある投稿情報に新出要素が含まれると判定モデルが判定した場合は、抽出モデルを用いて、その投稿情報から新出要素を抽出する。すなわち、更新部４６は、抽出モデルを用いて、新出要素が含まれると判定された投稿情報から新出要素を示す可能性が高い文字列の抽出を行う。また、更新部４６は、関係推定モデルを用いて、新出要素が含まれると判定された投稿情報から、新出要素と所定の関係性を有すると推定される他の要素を抽出する。 Further, when the determination model determines that a certain posted information includes a new element, the update unit 46 extracts the new element from the posted information by using the extraction model. That is, the update unit 46 uses the extraction model to extract a character string that is likely to indicate a new element from the posted information determined to include the new element. In addition, the update unit 46 uses the relationship estimation model to extract other elements presumed to have a predetermined relationship with the new element from the posted information determined to include the new element.

そして、更新部４６は、抽出した新出要素と、新出要素と他の要素との関係性を用いて、ナレッジデータベースの更新を行う。例えば、更新部４６は、新出要素を示すエンティティをエンティティデータベース３１に登録する。また、更新部４６は、新出要素と所定の関係性を有する他の要素のエンティティをエンティティデータベース３１から特定する。そして、更新部４６は、トリプルとして、新出要素のエンティティと、特定した他の要素のエンティティと、その要素を抽出した関係推定モデルと対応する関係（すなわち、「種別」）との組をトリプルとして、トリプルデータベース３２に登録する。 Then, the update unit 46 updates the knowledge database using the extracted new element and the relationship between the new element and other elements. For example, the update unit 46 registers an entity indicating the new element in the entity database 31. The update unit 46 also identifies, from the entity database 31, the entity of another element that has a predetermined relationship with the new element. The update unit 46 then registers in the triple database 32, as a triple, the set of the entity of the new element, the entity of the identified other element, and the relationship (that is, the "type") corresponding to the relationship estimation model that extracted that other element.
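
The update step can be sketched with in-memory stand-ins for the two databases. Everything named here is an assumption for illustration: the entity store is a dictionary from element name to entity ID, triples are plain tuples, and the `NEW<n>` ID scheme is a placeholder, not the identifier format of the embodiment.

```python
def register_new_element(entities, triples, name, related):
    """Register a new element and its triples in in-memory stores.

    `entities` maps an element name to an entity ID, `triples` is a list of
    (first entity ID, relationship, second entity ID) tuples, and `related`
    is a list of (relationship, other element name) pairs produced by the
    relationship estimation models. Other elements that are not already
    registered are skipped.
    """
    new_id = entities.setdefault(name, f"NEW{len(entities) + 1}")
    for relation, other_name in related:
        other_id = entities.get(other_name)
        if other_id is None:
            continue  # the other element is not in the entity store
        triples.append((new_id, relation, other_id))
    return new_id


entities = {"singer": "E21"}
triples = []
new_id = register_new_element(
    entities, triples, "name #9",
    [("occupation", "singer"), ("occupation", "unknown element")],
)
print(new_id, triples)
```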

〔３．情報提供装置が実行する処理の流れの一例〕 [3. An example of the flow of processing executed by the information providing device]
続いて、図１０を用いて、情報提供装置１０が実行する処理の流れについて説明する。図１０は、実施形態に係る情報提供装置が実行する処理の流れの一例を示すフローチャートである。 Subsequently, the flow of processing executed by the information providing device 10 will be described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the flow of processing executed by the information providing device according to the embodiment.

まず、情報提供装置１０は、所定の期間内に作成された説明コンテンツを取得する（ステップＳ１０１）。続いて、情報提供装置１０は、説明コンテンツから要素候補を抽出する（ステップＳ１０２）。また、情報提供装置１０は、抽出した要素候補に関する検索履歴と投稿履歴とを取得する（ステップＳ１０３）。そして、情報提供装置１０は、要素候補のうち、所定の日時における検索履歴と投稿履歴との増加量が所定の閾値を超える要素候補を選択する（ステップＳ１０４）。 First, the information providing device 10 acquires the explanatory content created within a predetermined period (step S101). Subsequently, the information providing device 10 extracts element candidates from the explanatory content (step S102). Further, the information providing device 10 acquires the search history and the posting history of the extracted element candidates (step S103). Then, the information providing device 10 selects an element candidate whose increase amount between the search history and the posting history at a predetermined date and time exceeds a predetermined threshold value among the element candidates (step S104).

また、情報提供装置１０は、選択した要素候補と、所定の日時に投稿された投稿履歴と、関係情報とを学習データとする（ステップＳ１０５）。そして、情報提供装置１０は、学習データを用いて、各モデルを学習する（ステップＳ１０６）。また、情報提供装置１０は、判定モデルを用いて、新出要素を含む投稿情報を特定し（ステップＳ１０７）、特定した投稿情報から、抽出モデルと関係推定モデルとを用いて、新出要素と関係情報とを抽出する（ステップＳ１０８）。そして、情報提供装置１０は、ナレッジデータベースに新出要素と関係情報とを登録し（ステップＳ１０９）、処理を終了する。 Further, the information providing device 10 uses the selected element candidate, the posting history posted on a predetermined date and time, and the related information as learning data (step S105). Then, the information providing device 10 learns each model using the learning data (step S106). Further, the information providing device 10 identifies the posted information including the new element by using the determination model (step S107), and uses the extraction model and the relationship estimation model from the identified posted information to obtain the new element. The relationship information is extracted (step S108). Then, the information providing device 10 registers the new element and the related information in the knowledge database (step S109), and ends the process.
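
The order of steps S101 to S109 can be summarized as a single driver function. The sketch below only fixes the order and the data handed from step to step; each step is passed in as a callable under a hypothetical key name, and the stub lambdas exist solely to make the example runnable.

```python
def run_pipeline(steps):
    """Run steps S101 to S109 in order; `steps` maps a step name to a callable."""
    contents = steps["fetch_contents"]()                     # S101
    candidates = steps["extract_candidates"](contents)       # S102
    history = steps["fetch_history"](candidates)             # S103
    selected = steps["select"](candidates, history)          # S104
    data = steps["build_learning_data"](selected, history)   # S105
    models = steps["train"](data)                            # S106
    new_posts = steps["judge"](models)                       # S107
    extracted = steps["extract"](models, new_posts)          # S108
    return steps["register"](extracted)                      # S109


# Stub steps with made-up return values, used only to exercise the order.
stub_steps = {
    "fetch_contents": lambda: ["explanatory content"],
    "extract_candidates": lambda contents: ["candidate"],
    "fetch_history": lambda candidates: {"candidate": [0, 5]},
    "select": lambda candidates, history: candidates,
    "build_learning_data": lambda selected, history: [("candidate", "post")],
    "train": lambda data: {"examples": len(data)},
    "judge": lambda models: ["new post"],
    "extract": lambda models, posts: [("new element", "occupation", "other")],
    "register": lambda extracted: len(extracted),
}
registered_count = run_pipeline(stub_steps)
print(registered_count)
```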

〔４．変形例〕 [4. Modification example]
上記では、情報提供装置１０による選択処理、学習処理および更新処理の一例について説明した。しかしながら、実施形態は、これに限定されるものではない。以下、情報提供装置１０が実行する提供処理や選択処理のバリエーションについて説明する。 In the above, an example of the selection process, the learning process, and the update process by the information providing device 10 has been described. However, the embodiment is not limited to this. Hereinafter, variations of the provision process and the selection process executed by the information providing device 10 will be described.

〔４−１．エンティティの種別について〕 [4-1. About the type of entity]
上述した例では、ナレッジデータベースの一例として、人物と人物の職業とを関連付けたトリプルが登録されるナレッジデータベースを示した。しかしながら、実施形態は、これに限定されるものではない。すなわち、情報提供装置１０は、任意の物事を示すナレッジデータベースの更新を行ってよい。より具体的には、情報提供装置１０は、任意の物事を新出要素として選択し、学習データの生成を行ってよく、任意の物事を新出要素として抽出して良い。 In the above example, a knowledge database in which triples associating a person with the person's occupation are registered was shown as an example of the knowledge database. However, the embodiment is not limited to this. That is, the information providing device 10 may update a knowledge database that represents arbitrary things. More specifically, the information providing device 10 may select arbitrary things as new elements and generate learning data, and may extract arbitrary things as new elements.

〔４−２．装置構成〕 [4-2. Device configuration]
情報提供装置１０は、自装置でナレッジデータベースを管理せずともよい。例えば、記憶部３０に登録された各データベース３１〜３６は、外部のストレージサーバに保持されていてもよい。また、情報提供装置１０は、検索処理を実現するフロントエンドサーバと、選択処理を実現するバックエンドサーバとで実現されてもよい。このような場合、バックエンドサーバには、図２に示す各４１〜４６が配置され、フロントエンドサーバには、ナレッジデータベースに検索を行う機能を発揮するための機能構成が配置される。 The information providing device 10 does not have to manage the knowledge database itself. For example, the databases 31 to 36 registered in the storage unit 30 may be held in an external storage server. Further, the information providing device 10 may be realized by a front-end server that performs the search process and a back-end server that performs the selection process. In such a case, the units 41 to 46 shown in FIG. 2 are arranged on the back-end server, and a functional configuration for searching the knowledge database is arranged on the front-end server.

〔４−３．その他〕 [4-3. Others]
また、上記実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、逆に、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 Further, among the processes described in the above embodiment, all or part of the processes described as being performed automatically can also be performed manually, and conversely, all or part of the processes described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be changed arbitrarily unless otherwise specified. For example, the various kinds of information shown in each figure are not limited to the illustrated information.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Further, each component of each illustrated device is a functional concept, and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

また、上記してきた各実施形態は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 In addition, the above-described embodiments can be appropriately combined as long as the processing contents do not contradict each other.

〔４−４．プログラム〕 [4-4. Program]
また、上述した実施形態に係る情報提供装置１０は、例えば図１１に示すような構成のコンピュータ１０００によって実現される。図１１は、ハードウェア構成の一例を示す図である。コンピュータ１０００は、出力装置１０１０、入力装置１０２０と接続され、演算装置１０３０、一次記憶装置１０４０、二次記憶装置１０５０、出力ＩＦ（Interface）１０６０、入力ＩＦ１０７０、ネットワークＩＦ１０８０がバス１０９０により接続された形態を有する。 Further, the information providing device 10 according to the above-described embodiment is realized by, for example, a computer 1000 having the configuration shown in FIG. 11. FIG. 11 is a diagram showing an example of a hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a form in which an arithmetic unit 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 are connected by a bus 1090.

演算装置１０３０は、一次記憶装置１０４０や二次記憶装置１０５０に格納されたプログラムや入力装置１０２０から読み出したプログラム等に基づいて動作し、各種の処理を実行する。一次記憶装置１０４０は、ＲＡＭ等、演算装置１０３０が各種の演算に用いるデータを一次的に記憶するメモリ装置である。また、二次記憶装置１０５０は、演算装置１０３０が各種の演算に用いるデータや、各種のデータベースが登録される記憶装置であり、ＲＯＭ(Read Only Memory)、ＨＤＤ（Hard Disk Drive）、フラッシュメモリ等により実現される。 The arithmetic unit 1030 operates based on a program stored in the primary storage device 1040 or the secondary storage device 1050, a program read from the input device 1020, or the like, and executes various processes. The primary storage device 1040 is a memory device, such as a RAM, that temporarily stores data used by the arithmetic unit 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic unit 1030 for various calculations and various databases are registered, and is realized by a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, or the like.

出力ＩＦ１０６０は、モニタやプリンタといった各種の情報を出力する出力装置１０１０に対し、出力対象となる情報を送信するためのインタフェースであり、例えば、ＵＳＢ（Universal Serial Bus）やＤＶＩ（Digital Visual Interface）、ＨＤＭＩ（登録商標）（High Definition Multimedia Interface）といった規格のコネクタにより実現される。また、入力ＩＦ１０７０は、マウス、キーボード、およびスキャナ等といった各種の入力装置１０２０から情報を受信するためのインタフェースであり、例えば、ＵＳＢ等により実現される。 The output IF 1060 is an interface for transmitting information to be output to an output device 1010 that outputs various kinds of information, such as a monitor or a printer, and is realized by, for example, a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, and a scanner, and is realized by, for example, USB.

なお、入力装置１０２０は、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等から情報を読み出す装置であってもよい。また、入力装置１０２０は、ＵＳＢメモリ等の外付け記憶媒体であってもよい。 The input device 1020 may be a device that reads information from, for example, an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory. Further, the input device 1020 may be an external storage medium such as a USB memory.

ネットワークＩＦ１０８０は、ネットワークＮを介して他の機器からデータを受信して演算装置１０３０へ送り、また、ネットワークＮを介して演算装置１０３０が生成したデータを他の機器へ送信する。 The network IF1080 receives data from another device via the network N and sends it to the arithmetic unit 1030, and also transmits the data generated by the arithmetic unit 1030 to the other device via the network N.

演算装置１０３０は、出力ＩＦ１０６０や入力ＩＦ１０７０を介して、出力装置１０１０や入力装置１０２０の制御を行う。例えば、演算装置１０３０は、入力装置１０２０や二次記憶装置１０５０からプログラムを一次記憶装置１０４０上にロードし、ロードしたプログラムを実行する。 The arithmetic unit 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic unit 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040, and executes the loaded program.

例えば、コンピュータ１０００が情報提供装置１０として機能する場合、コンピュータ１０００の演算装置１０３０は、一次記憶装置１０４０上にロードされたプログラムを実行することにより、制御部４０の機能を実現する。 For example, when the computer 1000 functions as the information providing device 10, the arithmetic unit 1030 of the computer 1000 realizes the function of the control unit 40 by executing the program loaded on the primary storage device 1040.

〔５．効果〕 [5. Effects]
上述したように、情報提供装置１０は、新たな要素の候補である要素候補に関する検索履歴と、その要素候補に関する投稿情報とを特定する。そして、情報提供装置１０は、特定された検索履歴と、特定された投稿情報とに基づいて、要素候補のうち、新たな要素に関する情報の特徴をモデルに学習させるための学習データを生成するための要素候補を選択する。この結果、情報提供装置１０は、新出要素に関する情報の特徴をモデルに学習させることができる結果、モデルを用いた新出要素の抽出を実現し、新出エンティティの追加を効率化することができる。 As described above, the information providing device 10 identifies the search history for element candidates, which are candidates for new elements, and the posted information about those element candidates. Then, based on the identified search history and the identified posted information, the information providing device 10 selects, from among the element candidates, the element candidates used to generate learning data for making a model learn the characteristics of information about new elements. As a result, the information providing device 10 can make the model learn the characteristics of information about new elements, and can therefore realize the extraction of new elements using the model and make the addition of new entities more efficient.

例えば、情報提供装置１０は、検索履歴が示す検索の状況と、投稿情報が示す投稿の状況とに基づいて、学習データを生成するための要素候補を選択する。また、例えば、情報提供装置１０は、検索履歴が示す検索の数と、投稿情報が示す投稿の数とに基づいて、学習データを生成するための要素候補を選択する。このため、情報提供装置１０は、新出要素である可能性が高い要素候補を学習データを生成するための要素候補として選択することができる。 For example, the information providing device 10 selects an element candidate for generating learning data based on the search status indicated by the search history and the posting status indicated by the posting information. Further, for example, the information providing device 10 selects element candidates for generating learning data based on the number of searches indicated by the search history and the number of posts indicated by the posted information. Therefore, the information providing device 10 can select an element candidate that is likely to be a new element as an element candidate for generating learning data.

また、情報提供装置１０は、要素候補を説明する説明コンテンツが登録された場合は、その説明コンテンツから要素候補を取得する。そして、情報提供装置１０は、要素候補の検索履歴と、その要素候補に関する投稿情報とを特定する。例えば、情報提供装置１０は、要素候補の投稿情報であって、説明コンテンツが登録されるよりも前の所定の期間内に投稿された投稿情報を特定する。このため、情報提供装置１０は、新出要素の可能性が高い要素候補の中から、学習データの生成対象を選択するので、学習データの確度を向上させることができる。 Further, when the explanatory content for explaining the element candidate is registered, the information providing device 10 acquires the element candidate from the explanatory content. Then, the information providing device 10 specifies the search history of the element candidate and the posted information about the element candidate. For example, the information providing device 10 identifies the posted information of the element candidate, which is posted within a predetermined period before the explanatory content is registered. Therefore, since the information providing device 10 selects the learning data generation target from the element candidates having a high possibility of new elements, the accuracy of the learning data can be improved.
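
The restriction to posted information from a predetermined period before the explanatory content was registered can be sketched as a simple time-window filter. This is an illustration only: the function name, the `(timestamp, text)` tuple layout, and the half-open interval are assumptions that do not come from this document.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Post = Tuple[datetime, str]  # (posting time, text); hypothetical layout


def posts_before_registration(
    posts: List[Post], registered_at: datetime, window: timedelta
) -> List[Post]:
    """Keep posts made in [registered_at - window, registered_at)."""
    start = registered_at - window
    return [p for p in posts if start <= p[0] < registered_at]
```

Excluding posts made at or after registration reflects the idea that posts written before an explanation existed are more likely to treat the element as new.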

また、情報提供装置１０は、複数の要素候補のうち、所定の日時において、検索履歴の数の増加量が所定の閾値を超え、かつ、投稿情報の数の増加量が所定の閾値を超えた要素候補を選択する。例えば、情報提供装置１０は、複数の要素候補のうち、所定の日時よりも前における検索履歴および投稿情報が存在せず、かつ、その所定の日時において検索履歴の数が所定の閾値を超え、かつ、投稿情報の数が所定の閾値を超えた要素候補を選択する。このため、情報提供装置１０は、学習データの確度を向上させることができる。 Further, the information providing device 10 selects, from among the plurality of element candidates, an element candidate for which, at a predetermined date and time, the increase in the number of search histories exceeds a predetermined threshold and the increase in the number of pieces of posted information exceeds a predetermined threshold. For example, the information providing device 10 selects, from among the plurality of element candidates, an element candidate for which no search history or posted information exists before the predetermined date and time, and for which, at the predetermined date and time, the number of search histories exceeds a predetermined threshold and the number of pieces of posted information exceeds a predetermined threshold. Therefore, the information providing device 10 can improve the accuracy of the learning data.
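
The threshold-based selection can be sketched as follows. The sketch assumes that the "increase" is the count at the predetermined date and time minus the count in the preceding period; the document does not define the baseline, so this is only one possible reading. `ActivityCounts` and `select_candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ActivityCounts:
    searches_before: int  # searches before the predetermined date and time
    posts_before: int     # posts before the predetermined date and time
    searches_on_day: int  # searches at the predetermined date and time
    posts_on_day: int     # posts at the predetermined date and time


def select_candidates(
    counts: Dict[str, ActivityCounts],
    search_threshold: int,
    post_threshold: int,
    require_no_prior_activity: bool = False,
) -> List[str]:
    selected = []
    for name, c in counts.items():
        # Stricter variant: the candidate must not appear at all beforehand.
        if require_no_prior_activity and (c.searches_before > 0 or c.posts_before > 0):
            continue
        search_increase = c.searches_on_day - c.searches_before
        post_increase = c.posts_on_day - c.posts_before
        # Both increases must exceed their thresholds.
        if search_increase > search_threshold and post_increase > post_threshold:
            selected.append(name)
    return selected
```

With `require_no_prior_activity=True` the increase equals the count at the predetermined date and time, which corresponds to the second variant in the paragraph above.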

また、情報提供装置１０は、選択された要素候補に関する投稿情報のうち、所定の日時に投稿された投稿情報を学習データとして抽出する。また、情報提供装置１０は、選択された要素候補を説明する説明コンテンツから、要素候補と他の要素との関係性を学習データとして抽出する。このため、情報提供装置１０は、新出要素を検出するためのモデルの学習を実現する学習データを生成できる。 Further, the information providing device 10 extracts the posted information posted on a predetermined date and time from the posted information related to the selected element candidates as learning data. Further, the information providing device 10 extracts the relationship between the element candidate and other elements as learning data from the explanatory content explaining the selected element candidate. Therefore, the information providing device 10 can generate learning data that realizes learning of the model for detecting the new element.

また、情報提供装置１０は、選択された要素候補に関する情報を用いて、新たな要素に関する情報の特徴をモデルに学習させる。例えば、情報提供装置１０は、要素候補、その要素候補に関する投稿情報、およびその要素候補と他の要素との関係性を学習データとしてモデルに学習させる。例えば、情報提供装置１０は、選択された要素候補に関する投稿情報を用いて、投稿情報が新たな要素に関する投稿であるか否かを判定する判定モデルの学習を行う。また、例えば、情報提供装置１０は、選択された要素候補と、その要素候補に関する投稿情報を用いて、投稿情報に含まれる新たな要素を抽出する要素抽出モデルの学習を行う。また、例えば、情報提供装置１０は、要素候補に関する投稿情報と、その要素候補と他の要素との関係性とを用いて、投稿情報に含まれる新たな要素と他の要素との関係性を抽出する関係推定モデルの学習を行う。より具体的には、情報提供装置１０は、関係推定モデルとして、投稿情報から、新たな要素との間に所定の関係性を有する他の要素を抽出する複数のモデルであって、それぞれ異なる関係性を有する他の要素を抽出する複数のモデルを学習する。 Further, the information providing device 10 uses the information about the selected element candidates to make the model learn the characteristics of information about new elements. For example, the information providing device 10 makes the model learn, as learning data, an element candidate, the posted information about that element candidate, and the relationship between that element candidate and other elements. For example, the information providing device 10 uses the posted information about the selected element candidate to train a determination model that determines whether or not posted information is a post about a new element. Further, for example, the information providing device 10 uses the selected element candidate and the posted information about that element candidate to train an element extraction model that extracts a new element included in posted information. Further, for example, the information providing device 10 uses the posted information about the element candidate and the relationship between that element candidate and other elements to train a relationship estimation model that extracts the relationship between a new element included in posted information and other elements. More specifically, the information providing device 10 trains, as the relationship estimation model, a plurality of models that each extract from posted information another element having a predetermined relationship with the new element, each model extracting other elements having a different relationship.
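
The determination model is characterized elsewhere in this document as a layered model in which the value of each element is calculated from the elements of the preceding layer and their weights. A minimal forward pass of that kind might look like the sketch below. The sigmoid activation, the bias terms, and the assumption that the post has already been converted into a numeric feature vector are illustrative choices that the document does not specify.

```python
import math
from typing import List, Sequence, Tuple

# One layer: (weights, biases). weights[j][i] is the weight applied to
# element i of the previous layer when computing element j of this layer.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(features: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Propagate a feature vector through the layers and return the output layer."""
    values = list(features)
    for weights, biases in layers:
        # Each new value is a weighted sum of the previous layer's values.
        values = [
            sigmoid(sum(w * v for w, v in zip(row, values)) + b)
            for row, b in zip(weights, biases)
        ]
    return values
```

With a single output element, the returned value lies between 0 and 1 and could be read as a score for whether the post concerns a new element; the weights would come from training on the selected learning data.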

このような処理の結果、情報提供装置１０は、各種の投稿情報から、新出要素を精度良く検出するとともに、新出要素と他の要素との間の関係性を推定することができる。この結果、情報提供装置１０は、例えば、ナレッジデータベースが有するトリプルの自動的な更新を実現できる。 As a result of such processing, the information providing device 10 can accurately detect the new element from various posted information and estimate the relationship between the new element and the other element. As a result, the information providing device 10 can realize, for example, automatic updating of triples possessed by the knowledge database.

以上、本願の実施形態のいくつかを図面に基づいて詳細に説明したが、これらは例示であり、発明の開示の欄に記載の態様を始めとして、当業者の知識に基づいて種々の変形、改良を施した他の形態で本発明を実施することが可能である。 Although some of the embodiments of the present application have been described in detail with reference to the drawings, these are examples, and various modifications are made based on the knowledge of those skilled in the art, including the embodiments described in the disclosure column of the invention. It is possible to practice the present invention in other improved forms.

また、上記してきた「部（section、module、unit）」は、「手段」や「回路」などに読み替えることができる。例えば、付与部は、特定手段や特定回路に読み替えることができる。 Further, the above-mentioned "section, module, unit" can be read as "means" or "circuit". For example, the imparting unit can be read as a specific means or a specific circuit.

１０情報提供装置 10 Information providing device
２０通信部 20 Communication unit
３０記憶部 30 Storage unit
３１エンティティデータベース 31 Entity database
３２トリプルデータベース 32 Triple database
３３検索ログデータベース 33 Search log database
３４投稿情報データベース 34 Posted information database
３５学習データデータベース 35 Learning data database
３６モデルデータベース 36 Model database
４０制御部 40 Control unit
４１取得部 41 Acquisition unit
４２特定部 42 Specific unit
４３選択部 43 Selection unit
４４抽出部 44 Extraction unit
４５学習部 45 Learning unit
４６更新部 46 Update unit
１００ログサーバ 100 Log server
２００説明コンテンツサーバ 200 Explanatory content server

Claims

A model for causing a computer to function, the model comprising:
an input layer into which posted information about a predetermined element is input;
an output layer;
a first element belonging to any layer from the input layer to the output layer other than the output layer; and
a second element whose value is calculated based on the first element and a weight of the first element,
wherein the model causes the computer to perform, on the information input to the input layer, an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element, so as to output from the output layer a value indicating whether or not the posted information is posted information about a new element, and
wherein the weight of the first element is based on learning that uses learning data based on an element candidate selected, from among element candidates that are candidates for the new element, based on a search status indicated by a search history for the element candidate and a posting status indicated by posted information about the element candidate.

A model for causing a computer to function, the model comprising:
an input layer into which posted information about a new element is input;
an output layer;
a first element belonging to any layer from the input layer to the output layer other than the output layer; and
a second element whose value is calculated based on the first element and a weight of the first element,
wherein the model causes the computer to perform, on the information input to the input layer, an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element, so as to output from the output layer a value indicating, among the information contained in the posted information, the information that indicates the new element, and
wherein the weight of the first element is based on learning that uses learning data based on an element candidate selected, from among element candidates that are candidates for the new element, based on a search status indicated by a search history for the element candidate and a posting status indicated by posted information about the element candidate.

A model for causing a computer to function, the model comprising:
an input layer into which posted information about a new element is input;
an output layer;
a first element belonging to any layer from the input layer to the output layer other than the output layer; and
a second element whose value is calculated based on the first element and a weight of the first element,
wherein the model causes the computer to perform, on the information input to the input layer, an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element, so as to output from the output layer a value indicating, among the information contained in the posted information, the relationship between the new element and another element, and
wherein the weight of the first element is based on learning that uses learning data based on an element candidate selected, from among element candidates that are candidates for the new element, based on a search status indicated by a search history for the element candidate and a posting status indicated by posted information about the element candidate.