JP7147380B2 - Type estimation method, information processing device and type estimation program

Info

Publication number: JP7147380B2
Application number: JP2018163169A
Authority: JP
Inventors: 裕章森川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2022-10-05
Anticipated expiration: 2038-08-31
Also published as: JP2020035332A

Description

The present invention relates to a type estimation method and the like.

A knowledge utilization system is known that estimates the intention of a table's creator from the table's schema information and generates the table by automatically generating a query against a knowledge graph. To automatically generate a query that fills as much of the table as possible, such a system estimates the type (class) of the table from within the knowledge graph.

One method of estimating the type of a table uses the tf-idf (term frequency-inverse document frequency) method (see, for example, Non-Patent Document 1). In this method, the appearance frequency is calculated for the instances of each type obtained from the table's schema information, the appearance frequency (tf) is weighted by the tf-idf method, and a score is calculated for each type. The type with the highest score is then estimated as the type of the table. Specifically, each type is regarded as a document and each predicate held by the instances assigned that type is regarded as a word; a tf-idf score is calculated for each word, and from these a score is calculated for each type. In Non-Patent Document 1, DBpedia and Freebase (registered trademark) are used as the information incorporated into the knowledge graph.

Japanese National Publication of International Patent Application No. 2017-531240
Japanese Laid-open Patent Publication No. 2013-254421

Duan S., Fokoue A., Hassanzadeh O., Kementsietsidis A., Srinivas K., Ward M.J. (2012) Instance-Based Matching of Large Ontologies Using Locality-Sensitive Hashing. In: Cudre-Mauroux P. et al. (eds) The Semantic Web - ISWC 2012. ISWC 2012. Lecture Notes in Computer Science, vol 7649. Springer, Berlin, Heidelberg

However, with the conventional technique it is difficult to estimate the table type appropriately. The technique regards a type as a document and the predicates held by the instances assigned that type as words, calculates a tf-idf score for each word, and then calculates a score for each type; under this scheme, general types tend to receive high scores. As a result, a general type is estimated as the table type. The conventional technique therefore has difficulty estimating the table type appropriately.

In one aspect, an object of the present invention is to appropriately estimate the type of a target table.

In one aspect, a type estimation method causes a computer to execute a process of, when estimating the type of a table in which column names are associated with the columns for those column names: identifying candidates for the type of the table based on the types included in graph data and the appearance frequency of the predicates of the instances of each type; and estimating the type of the table from among the identified candidates based on information about an ontology that defines the conceptual relationships between words included in the graph data.

According to one embodiment, the type of a target table can be appropriately estimated.

FIG. 1 is a block diagram showing the functional configuration of the type estimation device according to the embodiment.
FIG. 2 is a diagram showing an overview of the type estimation method according to the embodiment.
FIG. 3 is a diagram showing an example of the queries used for statistical score calculation.
FIG. 4 is a diagram showing an example of the data structure of the statistical information DB.
FIG. 5 is a diagram showing an example of the schema of the deduction rule DB according to the embodiment.
FIG. 6 is a diagram showing an example of type estimation according to the embodiment.
FIG. 7 is a diagram showing an example of a flowchart of the score calculation process according to the embodiment.
FIG. 8 is a diagram showing an example of a flowchart of the deduction rule extraction process according to the embodiment.
FIG. 9 is a diagram showing an example of a flowchart of the type estimation process according to the embodiment.
FIG. 10A is a diagram (1) showing an example of an application of type estimation according to the embodiment.
FIG. 10B is a diagram (2) showing an example of an application of type estimation according to the embodiment.
FIG. 11 is a diagram showing an example of a computer that executes the type estimation program.
FIG. 12 is a diagram showing a reference example of the type estimation method.

Embodiments of the type estimation method, information processing device, and type estimation program disclosed in the present application are described in detail below with reference to the drawings. The present invention is not limited by the embodiments.

First, a reference example of the type estimation method is described with reference to FIG. 12. The case described here is one in which, in order to automatically generate a query that fills as much of a target table T100 as possible, the schema information of the table T100 is input and the type of the table is estimated from within a knowledge graph. DBpedia, for example, is used as the information incorporated into the knowledge graph.

FIG. 12 is a diagram showing a reference example of the type estimation method. As shown in FIG. 12, the schema information of the target table T100 is assumed to be "name, location, capital, business description, related organization".

When an application that performs type estimation (hereinafter abbreviated as "app") receives the schema information as input (S100), it calculates, from the knowledge graph, statistical scores corresponding to the schema information (S110). As one example, the statistical scores are calculated using the tf-idf method.

As an example of the score calculation, the app regards each type as a document and each predicate held by the instances assigned that type (here, each attribute name included in the schema information) as a word. From the knowledge graph, the app calculates the appearance frequency of each word within each document and weights the appearance frequency (tf) by tf-idf. The idf reflects how many documents share a given word: a word used in many documents is unimportant and receives a low score. The app then sums the word scores for each type to obtain a score for each type. In this example, the type "company" has 214,734 instances. Actual company names are examples of instances (entities). The predicates held by the instances include, for example, "name", "location", "capital", "business description", and "related organization". The score is 5.85 for the predicate "name", 3.08 for "location", 44.68 for "capital", 44.1 for "business description", and 13.44 for "related organization". The app therefore calculates the score for the type "company" as 111.15. In the same way, it calculates the score for the type "organization" as 146.91.

The app then estimates the type with the highest score as the type of the table T100 (S120). Here, the score of 146.91 for the type "organization" is higher than the score of 111.15 for the type "company". The type of the table T100 is therefore estimated to be "organization".
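The arithmetic of this reference example can be retraced with a short sketch. The per-predicate scores and the two type totals are the figures quoted above; the variable names are illustrative only, and the per-predicate breakdown for "organization" is not given in the text, so only its total is used.

```python
# Per-predicate tf-idf scores for the type "company", as quoted in the reference example.
company_scores = {
    "name": 5.85,
    "location": 3.08,
    "capital": 44.68,
    "business description": 44.1,
    "related organization": 13.44,
}

# Type-level scores: "company" is the sum of its predicate scores.
# Only the total is stated for "organization", so it is entered directly.
type_scores = {
    "company": round(sum(company_scores.values()), 2),
    "organization": 146.91,
}

# The statistics-only approach selects the type with the highest score.
estimated_type = max(type_scores, key=type_scores.get)
print(type_scores["company"], estimated_type)
```

The sketch selects the more general type "organization", which is the behavior the reference example is meant to illustrate.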

The app then uses the estimated type to automatically generate a query that fills as much of the target table T100 as possible. Using the generated query, the app sets information from the knowledge graph in those columns of the target table T100 for which no information has been set (S130). Here, because the type is "organization", the formal name for the organization name "FLE" is set to "Farm Leader Education Organization", a fictitious organization (c100). In addition, the capital is set to "10 billion yen" (c110), while no related organization is set (c120).

In other words, when the input schema information is "name, location, capital, business description, related organization", "company" is a more plausible type than "organization", so it is desirable to estimate "company" as the type. However, when the app relies on statistical scores alone, it calculates a higher score for the general (upper-level) type "organization". The app may therefore fail to estimate the type of the target table T100 appropriately.

The following therefore describes a type estimation device that appropriately estimates the type of a target table.

[Functional configuration of type estimation device]
FIG. 1 is a block diagram showing the functional configuration of the type estimation device according to the embodiment. As shown in FIG. 1, when estimating from a knowledge graph the type of a table in which column names are associated with the columns for those column names, the type estimation device 1 uses, in addition to scores obtained statistically from the entire knowledge graph, the ontology attached to the knowledge graph to estimate a more appropriate type. An ontology defines the relationships between the concepts included in a knowledge graph. DBpedia is one example of information incorporated into the knowledge graph. The type estimation device 1 is an example of an information processing device.

An overview of the type estimation method executed by the type estimation device 1 according to the embodiment is now described with reference to FIG. 2. FIG. 2 is a diagram showing an overview of the type estimation method according to the embodiment. As shown in FIG. 2, the type estimation device 1 estimates the type of a target table from the knowledge graph 21 by combining a statistical method with a deductive method. The schema information of the target table is assumed to be "name, location, capital, business description, related organization".

The statistical method is the same as the type estimation method shown in FIG. 12. That is, the statistical method calculates, from the knowledge graph 21, statistical scores corresponding to the schema information. The calculated statistical scores are shown on the left of the figure. A statistical score is a score for each predicate held by the instances assigned a type, and is set for each type. The statistical scores correspond to the statistical information DB (database) 22 described later. The statistical method then sums the predicate scores for each type to calculate a statistical score for each type.

The deductive method uses the ontology 23' attached to the knowledge graph 21 to extract deduction rules. A deduction rule here is a rule, defined in the ontology 23', that expresses a hierarchical relationship between concepts (types). The ontology 23' is shown on the right of the figure. It shows the hierarchical relationships between types, descending from the concept (type) located at the root. The deduction rules correspond to the deduction rule DB 23 described later.

The type estimation device 1 estimates the type of the target table by applying the deduction rules based on the ontology 23' in addition to the statistical score calculated for each type by the statistical method. By using the knowledge organized in the knowledge graph 21 as well as the statistical method, the type estimation device 1 can estimate the type of the target table appropriately.

Returning to FIG. 1, the type estimation device 1 includes a statistical score calculation unit 11, a deduction rule extraction unit 12, a statistical estimation unit 13, a deduction rule application unit 14, and a type determination unit 15. These functional units are included in a control unit (not shown). The control unit corresponds to an electronic circuit such as a CPU (Central Processing Unit). The control unit has an internal memory for storing control data and programs that define various processing procedures, and executes various processes using them. The statistical estimation unit 13 is an example of an identification unit. The deduction rule application unit 14 and the type determination unit 15 are an example of an estimation unit.

The type estimation device 1 also has the knowledge graph 21, the statistical information DB 22, and the deduction rule DB 23. These functional units are included in a storage unit (not shown). The storage unit is, for example, a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk.

The knowledge graph 21 is data obtained by converting information such as DBpedia into the LOD (Linked Open Data) format. The knowledge graph 21 is publicly available, for example through a cloud system, and can be used freely.

The statistical information DB 22 stores the statistical information calculated by the statistical method. The statistical information DB 22 is generated by the statistical score calculation unit 11 described later.

The deduction rule DB 23 stores the deduction rules generated by the deductive method. The deduction rule DB 23 is generated by the deduction rule extraction unit 12 described later.

The statistical score calculation unit 11 builds, from the knowledge graph 21, the statistical information DB 22 used to estimate types statistically.

For example, when the type of a target table is to be estimated, the statistical score calculation unit 11 regards each type included in the knowledge graph 21 as a document and each predicate held by the instances assigned that type as a word. The statistical score calculation unit 11 then calculates, from the knowledge graph 21, the appearance frequency (tf) of each word within each document. The types and the appearance frequencies can each be obtained by issuing queries.

The statistical score calculation unit 11 then calculates scores by weighting the appearance frequency of each word by tf-idf. That is, it weights the appearance frequency of each word by a value indicating how unique the word is across the documents. A word (predicate) used in many documents is thus treated as unimportant and is weighted toward a low score. The statistical score calculation unit 11 stores the calculated score in the statistical information DB 22 for each type (document) and each predicate (word) held by the instances assigned that type.
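The weighting described above can be sketched as follows. The text does not give the exact tf-idf formula, so this sketch assumes a common variant: tf is the relative frequency of a predicate within a type, and idf is the logarithm of the number of types divided by the number of types in which the predicate occurs. The function name `tf_idf_scores` and the toy counts are hypothetical.

```python
import math

def tf_idf_scores(freq):
    """freq maps type -> {predicate: occurrence count among instances of that type}.

    ASSUMPTION: tf = count / total count in the type, idf = log(N / df).
    The source does not specify which tf-idf variant is used.
    """
    n_types = len(freq)
    # Document frequency: in how many types (documents) each predicate (word) appears.
    df = {}
    for predicates in freq.values():
        for predicate in predicates:
            df[predicate] = df.get(predicate, 0) + 1
    scores = {}
    for type_name, predicates in freq.items():
        total = sum(predicates.values())
        scores[type_name] = {
            p: (count / total) * math.log(n_types / df[p])
            for p, count in predicates.items()
        }
    return scores

# Hypothetical toy counts, not taken from any real knowledge graph.
toy_freq = {
    "company": {"name": 10, "capital": 8},
    "person": {"name": 10, "birth date": 9},
    "place": {"name": 10, "area": 7},
}
toy_scores = tf_idf_scores(toy_freq)
print(toy_scores["company"])
```

In the toy data, "name" occurs in every type, so its idf is zero and it contributes nothing, whereas "capital" occurs only under "company" and receives a positive score.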

An example of the queries used by the statistical score calculation unit 11 is now described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the queries used for statistical score calculation. The query in the upper part of FIG. 3 retrieves all types from the knowledge graph 21. The query in the lower part of FIG. 3 retrieves, for every type in the knowledge graph 21, the appearance frequency of the predicates held by the instances assigned that type.
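The queries of FIG. 3 are not reproduced in the text. As a rough stand-in, the following sketch performs the two lookups over an in-memory list of (subject, predicate, object) triples. The triple data, the use of "rdf:type" as the type-assigning predicate, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical miniature knowledge graph as (subject, predicate, object) triples.
triples = [
    ("ex:CompanyA", "rdf:type", "company"),
    ("ex:CompanyA", "name", "Company A"),
    ("ex:CompanyA", "capital", "1 billion yen"),
    ("ex:CompanyB", "rdf:type", "company"),
    ("ex:CompanyB", "name", "Company B"),
    ("ex:LegislatureA", "rdf:type", "legislature"),
    ("ex:LegislatureA", "name", "Legislature A"),
]

def all_types(triples):
    """Counterpart of the first query: every type that appears in the graph."""
    return {obj for _, pred, obj in triples if pred == "rdf:type"}

def predicate_frequency(triples):
    """Counterpart of the second query: per type, how often each predicate
    appears on the instances assigned that type."""
    types_of = defaultdict(set)
    for subj, pred, obj in triples:
        if pred == "rdf:type":
            types_of[subj].add(obj)
    freq = defaultdict(Counter)
    for subj, pred, _ in triples:
        if pred == "rdf:type":
            continue
        for type_name in types_of[subj]:
            freq[type_name][pred] += 1
    return freq

types = all_types(triples)
freq = predicate_frequency(triples)
print(sorted(types), dict(freq["company"]))
```

Against a real knowledge graph, the same two lookups would more likely be issued as queries to the graph store rather than computed in memory.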

An example of the data structure of the statistical information DB 22 is now described with reference to FIG. 4. FIG. 4 is a diagram showing an example of the data structure of the statistical information DB. As shown in FIG. 4, the statistical information DB 22 stores a type 22a, a predicate 22b, and a score 22c in association with one another. The type 22a is a concept included in the knowledge graph 21. The predicate 22b is a word held by the instances (entities) assigned the type. The score 22c is the statistical score calculated by tf-idf.

As an example, for the type 22a "company", the statistical information DB 22 stores the predicate 22b "name" with the score 22c "5.85", the predicate 22b "location" with the score 22c "3.08", the predicate 22b "capital" with the score 22c "44.68", the predicate 22b "business description" with the score 22c "44.1", and the predicate 22b "related organization" with the score 22c "13.44".

Returning to FIG. 1, the deduction rule extraction unit 12 extracts deduction rules from the knowledge graph 21. For example, it extracts deduction rules from the ontology that defines the conceptual relationships between words included in the knowledge graph 21, and stores the extracted deduction rules in the deduction rule DB 23. Deduction rules can be extracted from the ontology by issuing queries. Such queries include, for example, a query that extracts sibling deduction rules and a query that extracts parent-child deduction rules.

The statistical estimation unit 13 estimates, from the statistical scores, the type candidates corresponding to a set of column names.

For example, upon receiving a set of column names from the type determination unit 15 described later, the statistical estimation unit 13 extracts from the statistical information DB 22 the types 22a, predicates 22b, and scores 22c corresponding to the set of column names. For each type 22a, the statistical estimation unit 13 sums the scores 22c associated with the predicates 22b and takes the total as the statistical score. It then estimates a plurality of types with high statistical scores as type candidates. In other words, when estimating the type of a target table, the statistical estimation unit 13 estimates the type candidates so that words highly correlated with the column names of the table are prioritized as types. The statistical estimation unit 13 outputs the statistical score of each estimated type candidate to the type determination unit 15.
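A minimal sketch of this lookup-and-sum step follows. The "company" rows are the values given for FIG. 4; the "person" rows, the function name `estimate_candidates`, and the parameter `k` (how many candidates to keep, which the text leaves open as "a plurality") are hypothetical.

```python
from collections import defaultdict

# Rows of the statistical information DB as (type, predicate, score).
STAT_DB = [
    ("company", "name", 5.85),
    ("company", "location", 3.08),
    ("company", "capital", 44.68),
    ("company", "business description", 44.1),
    ("company", "related organization", 13.44),
    # Hypothetical rows for a second type, for illustration only.
    ("person", "name", 4.0),
    ("person", "location", 2.0),
]

def estimate_candidates(column_names, stat_db, k=2):
    """Sum the scores of the matching predicates per type and
    return the k highest-scoring types as (type, statistical score) pairs."""
    wanted = set(column_names)
    totals = defaultdict(float)
    for type_name, predicate, score in stat_db:
        if predicate in wanted:
            totals[type_name] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

columns = ["name", "location", "capital", "business description", "related organization"]
candidates = estimate_candidates(columns, STAT_DB, k=2)
print(candidates)
```

Returning several candidates rather than a single winner is what allows the later deductive step to re-rank them.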

The deduction rule application unit 14 applies the deduction rules to obtain a correction score for each type candidate.

For example, upon receiving type candidates from the type determination unit 15 described later, the deduction rule application unit 14 obtains the correction score corresponding to each type candidate based on the deduction rule DB 23. That is, it obtains, as the correction score, the depth in the ontology of each type candidate. The greater the depth, the more specific the concept (type) is. The deduction rule application unit 14 then outputs the correction score obtained for each type candidate to the type determination unit 15.

The type determination unit 15 determines the type of the target table using the statistical estimation unit 13 and the deduction rule application unit 14.

For example, the type determination unit 15 receives as input the set of column names of the target table. The set of column names of a table here refers to, for example, its schema information. The type determination unit 15 outputs the input set of column names to the statistical estimation unit 13 so that type candidates corresponding to the set of column names are estimated based on the statistical scores. The type determination unit 15 also outputs the type candidates estimated by the statistical estimation unit 13 to the deduction rule application unit 14 so that the statistical scores of the type candidates can be adjusted. The type determination unit 15 then calculates a score for each type candidate based on the statistical score of the candidate and the correction score that the deduction rule application unit 14 obtains for the candidate from the deduction rules. The type determination unit 15 determines the type from the type candidates so that a candidate with a higher calculated score is given higher priority. In other words, the type determination unit 15 estimates (determines) the type of the target table from the type candidates estimated by the statistical estimation unit 13 so that a word (predicate) denoting a relatively more specific type candidate is prioritized as the type. The type determination unit 15 then outputs the estimated (determined) type.

[Example of the schema of the deduction rule DB]
FIG. 5 is a diagram showing an example of the schema of the deduction rule DB according to the embodiment. The left side of FIG. 5 shows the ontology attached to the knowledge graph 21. The ontology shows the hierarchical relationships between the types that lie below "Thing", the concept (type) located at the root. Taking the type "organization" as an example, "agent" is a subclass of "Thing"; "organization" and "person" are subclasses of "agent"; and "company" and "legislature" are subclasses of "organization". The relationship between "Thing" and "agent", for example, is a parent-child relationship, as is the relationship between "organization" and "company". The relationship between "organization" and "person", for example, is a sibling relationship, as is the relationship between "company" and "legislature". The depth of the ontology is 1 for "Thing" and increases by 1 with each level below it.

The deduction rule DB 23 generated from such an ontology stores information that associates a source 23a, a relation 23b, and a target 23c. The source 23a indicates the class (type) above the target 23c described below. The target 23c indicates a class (type) below the source 23a. The relation 23b indicates the relationship between the upper and lower classes. For example, a relation 23b of "subClassOf" indicates that the lower class is a subclass of the upper class.

As an example, for the source 23a "Thing", the deduction rule DB 23 stores "subClassOf" as the relation 23b and "agent" as the target 23c. For the source 23a "agent", it stores "subClassOf" as the relation 23b and "organization" and "person" as the targets 23c. For the source 23a "organization", it stores "subClassOf" as the relation 23b and "company" and "legislature" as the targets 23c.
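The rows above are enough to derive both the ontology depth and the sibling relationships. The sketch below stores one (source, relation, target) row per parent-child pair, which is an assumed flattening of the multi-target rows described in the text; the function names `depth` and `are_siblings` are hypothetical.

```python
# Deduction rules as (source, relation, target) rows, following the example above:
# the source is the upper class and the target is the lower class.
RULES = [
    ("Thing", "subClassOf", "agent"),
    ("agent", "subClassOf", "organization"),
    ("agent", "subClassOf", "person"),
    ("organization", "subClassOf", "company"),
    ("organization", "subClassOf", "legislature"),
]

def parent_map(rules):
    """Map each lower class (target) to its upper class (source)."""
    return {target: source for source, _, target in rules}

def depth(type_name, rules, root="Thing"):
    """Depth in the ontology, counting the root as 1.
    Raises KeyError if the type is not reachable from the root."""
    parents = parent_map(rules)
    d = 1
    current = type_name
    while current != root:
        current = parents[current]
        d += 1
    return d

def are_siblings(a, b, rules):
    """Two distinct types are siblings when they share the same parent."""
    parents = parent_map(rules)
    return a != b and a in parents and parents.get(a) == parents.get(b)

print(depth("company", RULES), depth("organization", RULES))
print(are_siblings("company", "legislature", RULES))
```

With these rows, "company" has depth 4 and "organization" has depth 3, consistent with counting "Thing" as depth 1.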

[Example of type estimation]
FIG. 6 is a diagram showing an example of type estimation according to the embodiment. In FIG. 6, the statistical information DB 22 is assumed to have been generated by the statistical score calculation unit 11. The deduction rule DB 23 is likewise assumed to have been generated by the deduction rule extraction unit 12, and the ontology 23', which has the same content as the deduction rule DB 23, is shown.

Under these conditions, when the type determination unit 15 receives the set of column names of the target table, "name, location, capital, business description, related organization", it outputs the set of column names to the statistical estimation unit 13.

Upon receiving the set of column names from the type determination unit 15, the statistical estimation unit 13 extracts from the statistical information DB 22 the types 22a, predicates 22b, and scores 22c corresponding to the set of column names. Here, the predicates 22b and scores 22c for the type 22a "company" and the predicates 22b and scores 22c for the type 22a "organization" are extracted.

Then, for each type 22a, the statistical estimation unit 13 sums the scores 22c associated with the predicates 22b to calculate a statistical score. Here, the statistical score is calculated as 111.15 for the type 22a "company" and as 146.91 for the type 22a "organization". The statistical estimation unit 13 estimates a plurality of types with high statistical scores as type candidates. Here, it is assumed that "company" and "organization" are estimated as type candidates. The statistical estimation unit 13 outputs the statistical score of each estimated type candidate to the type determination unit 15. Here, the statistical score "111.15" of the type candidate "company" and the statistical score "146.91" of the type candidate "organization" are output to the type determination unit 15.
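The summation performed by the statistical estimation unit 13 can be illustrated with a minimal Python sketch. The rows in STAT_DB, their score values, the function names, and the assumption that column names match predicates exactly are all hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict

# Hypothetical rows of the statistical information DB 22: (type, predicate, score).
# The score values are invented for illustration only.
STAT_DB = [
    ("company", "name", 1.2),
    ("company", "location", 2.0),
    ("company", "capital", 5.5),
    ("organization", "name", 1.5),
    ("organization", "location", 2.5),
    ("person", "birthplace", 4.0),
]

def statistical_scores(column_names, stat_db):
    """Sum, for each type, the scores of the predicates that match a column name."""
    wanted = set(column_names)
    totals = defaultdict(float)
    for type_name, predicate, score in stat_db:
        if predicate in wanted:
            totals[type_name] += score
    return dict(totals)

def top_candidates(totals, k=2):
    """Return the k (type, statistical score) pairs with the highest scores."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]

stat_totals = statistical_scores(["name", "location", "capital"], STAT_DB)
candidates = top_candidates(stat_totals)
```

In this sketch, a type with no matching predicate (here "person") never enters the totals and therefore cannot become a candidate.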

The type determination unit 15 then outputs the type candidates estimated by the statistical estimation unit 13 to the deduction rule application unit 14.

Upon receiving the type candidates from the type determination unit 15, the deduction rule application unit 14 acquires, based on the deduction rule DB 23 (ontology 23'), the depth in the ontology 23' as the correction score of each type candidate. Here, the deduction rule application unit 14 acquires the depth "4" in the ontology 23' as the correction score of the type candidate "company", and the depth "3" as the correction score of the type candidate "organization". The deeper a concept (type) lies in the ontology 23', the more specific it is. In other words, "company" is more specific than "organization".
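One way to obtain such a depth from the stored rules is sketched below in Python. The rows follow the source/relation/target example described for the deduction rule DB 23; the function name and the convention that the root "Thing" has depth 1 (chosen so that "company" comes out as 4 and "organization" as 3) are assumptions of this sketch.

```python
# Deduction rules as (source, relation, target) rows, following the example
# given for the deduction rule DB 23.
RULES = [
    ("Thing", "subClassOf", "agent"),
    ("agent", "subClassOf", "organization"),
    ("agent", "subClassOf", "person"),
    ("organization", "subClassOf", "company"),
    ("organization", "subClassOf", "legislature"),
]

def ontology_depth(type_name, rules, root="Thing"):
    """Return the depth of type_name, counting the root as depth 1 (assumed)."""
    parent_of = {target: source for source, _relation, target in rules}
    depth = 1
    node = type_name
    while node != root:
        node = parent_of[node]  # raises KeyError if the type is not under the root
        depth += 1
    return depth

company_depth = ontology_depth("company", RULES)
organization_depth = ontology_depth("organization", RULES)
```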

The deduction rule application unit 14 then outputs the correction score of each type candidate to the type determination unit 15. Here, the correction score "4" of the type candidate "company" and the correction score "3" of the type candidate "organization" are output to the type determination unit 15.

Then, the type determination unit 15 calculates a score for each type candidate based on the statistical score of each type candidate estimated by the statistical estimation unit 13 and the correction score of each type candidate obtained by the deduction rule application unit 14. The calculation method here is an example in which the statistical score and the correction score have the same weight: the statistical score is multiplied by the depth. Specifically, for the type candidate "company", the score is "444.60", the product of the statistical score "111.15" and the depth "4". For the type candidate "organization", the score is "440.73", the product of the statistical score "146.91" and the depth "3". The type determination unit 15 then estimates (determines) the type from the type candidates so that a candidate with a higher calculated score is given higher priority. Here, the score of the type candidate "company" (444.60) is higher than the score of the type candidate "organization" (440.73). Therefore, the type candidate "company" is estimated (determined) as the type.
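The arithmetic of this example can be checked with a short Python sketch. The statistical scores and depths are the values stated in the example; the function name and the rounding to two decimal places are choices made only for this illustration.

```python
def final_scores(stat_scores, depths):
    """Multiply each candidate's statistical score by its ontology depth."""
    return {t: round(stat_scores[t] * depths[t], 2) for t in stat_scores}

stat_scores = {"company": 111.15, "organization": 146.91}
depths = {"company": 4, "organization": 3}

combined = final_scores(stat_scores, depths)
best_type = max(combined, key=combined.get)  # candidate with the highest score
```

Although "organization" has the higher statistical score, the depth factor moves "company" ahead, which is the behavior the example describes.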

The type determination unit 15 then outputs the determined type. Here, "company" is output as the type.

After this, the type estimation device 1 uses the determined type to automatically generate a query that can fill the target table as much as possible. Then, using the generated query, the type estimation device 1 sets information from the knowledge graph 21 in the columns of the target table for which no information has been set. Here, since the type is "company", the official name of the organization named "FLE" is set to "F Laboratories of Europe" (c10). In addition, the capital is set to "" (c11), and the related organization is set to "F" (c12).

That is, when the input set of column names is "name, location, capital, business description, related organization", the more specific (lower-level) type "company" is more plausible than "organization". By estimating (determining) the type as "company", the type estimation device 1 can fill the target table with correct information.

Although the type estimation device 1 has been described as using the type with the highest score to automatically generate the query, the embodiment is not limited to this. The type estimation device 1 may output the type candidates in order of priority and use the type selected by the user from the output candidates.

[Flowchart of score calculation processing]
FIG. 7 is a diagram illustrating an example of a flowchart of the score calculation processing according to the embodiment. It is assumed that the statistical score calculation unit 11 has received, for example, a request to calculate statistical scores.

As shown in FIG. 7, upon receiving the request to calculate statistical scores, the statistical score calculation unit 11 acquires all types in the knowledge graph 21 (step S11). For example, the statistical score calculation unit 11 acquires all types in the knowledge graph 21 using a query such as the one shown in the upper part of FIG. 3 (a query that asks the knowledge graph 21 for all types).

The statistical score calculation unit 11 stores all types in the knowledge graph 21 as types 22a in the statistical information DB 22 (step S12). The statistical score calculation unit 11 regards each type as a document and each predicate of the instances to which the type is assigned as a word, and calculates the appearance frequency of each word in the document (step S13). For example, using a query such as the one shown in the lower part of FIG. 3 (a query that asks for the appearance frequency of the predicates of the instances to which a type is assigned), the statistical score calculation unit 11 calculates the appearance frequency of the predicates of the instances to which each acquired type is assigned.

The statistical score calculation unit 11 calculates tf-idf using the types and the appearance frequencies of the predicates (step S14). That is, it calculates a score by weighting the appearance frequency of each word (predicate) in a document (type) by a value indicating how unique the word is across the documents.

The statistical score calculation unit 11 stores the calculation results in the statistical information DB 22 (step S15). For example, it stores the calculated score in the statistical information DB 22 for each type (document) and for each predicate (word) of the instances to which the type is assigned. The statistical score calculation unit 11 then ends the statistical score calculation processing.
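Steps S13 to S15 can be illustrated with the following Python sketch, in which each type is treated as a document and each predicate as a word. The observations are invented, and the particular weighting (raw count as tf, log(N/df) as idf) is only one common tf-idf variant assumed here; the embodiment does not specify which variant it uses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (instance, type, predicate) observations from a knowledge graph.
OBSERVATIONS = [
    ("c1", "company", "capital"),
    ("c1", "company", "name"),
    ("c2", "company", "capital"),
    ("l1", "legislature", "name"),
    ("l1", "legislature", "seats"),
]

def tf_idf_by_type(observations):
    """Return {(type, predicate): score} with score = tf * log(N / df)."""
    tf = defaultdict(Counter)  # tf[type][predicate] = number of occurrences
    for _instance, type_name, predicate in observations:
        tf[type_name][predicate] += 1
    n_types = len(tf)          # N: number of documents (types)
    df = Counter()             # df[predicate] = number of types using the predicate
    for counts in tf.values():
        df.update(counts.keys())
    return {
        (type_name, predicate): count * math.log(n_types / df[predicate])
        for type_name, counts in tf.items()
        for predicate, count in counts.items()
    }

tfidf_scores = tf_idf_by_type(OBSERVATIONS)
```

With these observations, "name" occurs under both types and therefore receives a weight of zero, while "capital" is characteristic of "company" and receives a positive score.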

[Flowchart of deduction rule extraction processing]
FIG. 8 is a diagram illustrating an example of a flowchart of the deduction rule extraction processing according to the embodiment. It is assumed that the deduction rule extraction unit 12 has received, for example, a request to extract deduction rules.

As shown in FIG. 8, upon receiving the request to extract deduction rules, the deduction rule extraction unit 12 initializes a queue (Q) (step S21). Then, it puts root (Thing), the top node of the ontology assigned to the knowledge graph 21, into Q (step S22).

The deduction rule extraction unit 12 determines whether Q is empty (step S23). If Q is not empty (step S23; No), it takes a node (v) out of Q (step S24). It then acquires the sibling node list (Sibling) of the extracted node (v) (step S25). For example, the deduction rule extraction unit 12 may use a query that extracts sibling deduction rules.

The deduction rule extraction unit 12 registers all sibling relationships in Sibling in the deduction rule DB 23 (step S26). As an example, in FIG. 5, when the selected node is "organization", "organization" and "person" are siblings. When the selected node is "company", "company" and "legislature" are siblings.

The deduction rule extraction unit 12 determines whether all nodes (v') in Sibling are stored in Q (step S27). If not all of them are stored in Q (step S27; No), it takes out a sibling node (v') in Sibling that is not stored in Q and stores v' and v in the deduction rule DB 23 as a parent-child relationship (step S28).

Then, the deduction rule extraction unit 12 stores v' in Q (step S29) and returns to step S27 so that all nodes (v') in Sibling are stored in Q.

On the other hand, if all of them are stored in Q (step S27; Yes), the deduction rule extraction unit 12 returns to step S23 to determine whether Q is empty.

If Q is determined to be empty in step S23 (step S23; Yes), the deduction rule extraction unit 12 ends the deduction rule extraction processing.
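The queue-based traversal of steps S21 to S29 resembles a breadth-first search and can be sketched in Python as follows. This sketch rests on several assumptions: the sibling list obtained for a node v is taken to be the list of v's direct subclasses (which are siblings of one another), a "seen" set stands in for the check of whether a node has already been stored in Q, and the SUBCLASSES mapping replaces the query against the knowledge graph. None of these details is fixed by the flowchart description.

```python
from collections import deque

# Hypothetical ontology: each type maps to its direct subclasses.
SUBCLASSES = {
    "Thing": ["agent"],
    "agent": ["organization", "person"],
    "organization": ["company", "legislature"],
}

def extract_deduction_rules(subclasses, root="Thing"):
    """Traverse the ontology breadth-first and collect rules."""
    rules = []            # (parent, "subClassOf", child) rows
    sibling_groups = []   # groups of nodes that share a parent
    queue = deque([root])                     # S21, S22
    seen = {root}
    while queue:                              # S23
        v = queue.popleft()                   # S24
        siblings = subclasses.get(v, [])      # S25
        if len(siblings) > 1:
            sibling_groups.append(tuple(siblings))    # S26
        for child in siblings:                # S27
            if child in seen:
                continue
            rules.append((v, "subClassOf", child))    # S28
            queue.append(child)               # S29
            seen.add(child)
    return rules, sibling_groups

rules, sibling_groups = extract_deduction_rules(SUBCLASSES)
```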

[Flowchart of type estimation processing]
FIG. 9 is a diagram illustrating an example of a flowchart of the type estimation processing according to the embodiment. It is assumed that the type determination unit 15 has received, for example, a type determination request.

As shown in FIG. 9, upon receiving the type determination request, the type determination unit 15 receives the set of column names of the target table (step S31). It outputs the received set of column names to the statistical estimation unit 13.

Subsequently, the statistical estimation unit 13 reads the types corresponding to the set of column names from the statistical information DB 22 (step S32). It determines whether the number of records read is zero (step S33). If the number of records read is zero (step S33; Yes), the statistical estimation unit 13 ends the type estimation processing.

On the other hand, if the number of records read is not zero (step S33; No), the statistical estimation unit 13 calculates a statistical score for each type read and estimates types from the statistical scores (step S34). For example, for each type 22a read from the statistical information DB 22, it sums the scores 22c associated with the predicates 22b to calculate the statistical score. It estimates a plurality of types with high statistical scores as type candidates. The statistical estimation unit 13 then associates each estimated type with its statistical score (step S35).

Subsequently, the deduction rule application unit 14 applies the deduction rules to the estimated types to obtain correction scores (step S36). For example, based on the deduction rule DB 23, it acquires the correction score corresponding to each type candidate estimated by the statistical estimation unit 13. As an example, it acquires the ontology depth of each type candidate as the correction score.

Then, the type determination unit 15 calculates a final score for each estimated type using the statistical score and the correction score (step S37). For example, for each type candidate estimated by the statistical estimation unit 13, it calculates the final score based on the statistical score and the correction score acquired by the deduction rule application unit 14.

Then, the type determination unit 15 outputs the type with a high score to the screen (step S38). The type determination unit 15 then ends the type estimation processing.
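The control flow of steps S31 to S38, including the early exit when nothing is read from the statistical information DB 22, might be sketched in Python as below. The DB rows, the depth table, the candidate count k, and the function name are hypothetical; the sketch returns a ranked list rather than printing to a screen.

```python
# Hypothetical (type, predicate, score) rows and ontology depths.
DB_ROWS = [
    ("company", "capital", 10.0),
    ("company", "name", 5.0),
    ("organization", "capital", 9.0),
    ("organization", "name", 8.0),
]
DEPTH_OF = {"company": 4, "organization": 3}

def estimate_type(column_names, stat_db, depth_of, k=2):
    """Return (type, final score) pairs in descending order, or [] if nothing matches."""
    wanted = set(column_names)                               # S31
    rows = [row for row in stat_db if row[1] in wanted]      # S32
    if not rows:                                             # S33: zero records read
        return []
    totals = {}
    for type_name, _predicate, score in rows:                # S34
        totals[type_name] = totals.get(type_name, 0.0) + score
    top = sorted(totals, key=totals.get, reverse=True)[:k]   # S35
    ranked = [(t, totals[t] * depth_of[t]) for t in top]     # S36, S37
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked                                            # S38

ranked = estimate_type(["name", "capital"], DB_ROWS, DEPTH_OF)
no_match = estimate_type(["unknown column"], DB_ROWS, DEPTH_OF)
```

Returning the whole ranked list, rather than only the first entry, also accommodates the variation in which the user selects a type from the prioritized candidates.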

[Example applications of type estimation]
FIGS. 10A and 10B are diagrams illustrating example applications of type estimation according to the embodiment.

FIG. 10A shows a case in which the set of column names of the target table is "company name, corporate name, location, capital, main business, office, president, head office, shareholder". In this case, the type determination unit 15 outputs "company" as the type.

FIG. 10B shows a case in which the set of column names of the target table is "store name, address, access, business hours, closed days, fee, budget, telephone number, parking lot, homepage". In this case, the type determination unit 15 outputs "restaurant" as the type.

In the embodiment, the type determination unit 15 calculates the score of each type candidate based on the statistical score of each type candidate estimated by the statistical estimation unit 13 and the correction score of each type candidate obtained by the deduction rule application unit 14. The score calculation method was described as multiplying the statistical score by the correction score, for the case in which the two have the same weight. However, the score calculation method is not limited to this, and the statistical score and the correction score may have different weights. For example, the final score f may be calculated by the following formula (1), where α is a real number greater than 0.0 and less than 1.0.
f = α × statistical score × (1 − α) × correction score ... Formula (1)
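Formula (1) can be evaluated directly, as in the following Python sketch. The function name and the explicit range check on α are additions for illustration; the statistical scores and depths are those of the earlier "company" and "organization" example, and α = 0.3 is an arbitrary choice.

```python
def final_score(stat_score, correction_score, alpha):
    """Evaluate formula (1): f = alpha * stat_score * (1 - alpha) * correction_score."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be greater than 0.0 and less than 1.0")
    return alpha * stat_score * (1.0 - alpha) * correction_score

f_company = final_score(111.15, 4, alpha=0.3)
f_organization = final_score(146.91, 3, alpha=0.3)
```

Taken literally, formula (1) multiplies every candidate's product of statistical score and correction score by the same positive factor α × (1 − α), so in this sketch the order of the candidates is the same as with the plain product.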

In the embodiment, for the type candidates estimated by the statistical estimation unit 13 using the statistical scores, the type determination unit 15 computes final scores using the statistical score and the correction score obtained by the deduction rule application unit 14. However, the type determination unit 15 is not limited to this and may use the deduction rules to selectively exclude (filter out) type candidates estimated by the statistical estimation unit 13. For example, it may use the deduction rules to filter out type candidates whose depth is less than a predetermined value. The predetermined value may be determined, for example, by the input set of column names of the target table.
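The filtering variation can be illustrated with a few lines of Python. The function name and the min_depth parameter are hypothetical; how the threshold would be derived from the column names is not specified, so it is passed in directly here.

```python
def filter_by_depth(candidates, depth_of, min_depth):
    """Keep only the candidates whose ontology depth is at least min_depth."""
    return [t for t in candidates if depth_of[t] >= min_depth]

depth_table = {"company": 4, "organization": 3}
kept = filter_by_depth(["organization", "company"], depth_table, min_depth=4)
```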

In the embodiment, the hierarchical relationship of concepts (types) is used as the deduction rules stored in the deduction rule DB 23. However, the deduction rules are not limited to this and may use ontology information other than the concept (type) hierarchy. Furthermore, the ontology information used is not limited to that of the ontology 23' used in the embodiment and may be that of any ontology.

[Effects of the embodiment]
According to the above embodiment, the type estimation device 1 performs the following processing when estimating the type of a table in which column names are associated with the columns for those column names. Based on the types included in the knowledge graph 21 and the appearance frequency of the predicates of the instances of each type, the type estimation device 1 identifies candidates for the type of the table. Based on information on the depth in an ontology that defines the relationships between the types included in the knowledge graph 21, the type estimation device 1 estimates the type of the table from among the identified candidates. With this configuration, by using information on ontology depth in addition to the appearance frequency of the instance predicates of each type, the type estimation device 1 can estimate the optimum type of the table.

Further, according to the above embodiment, the type estimation device 1 estimates the type of the table from among the identified candidates so that a greater ontology depth is given higher priority. With this configuration, the type estimation device 1 can give priority to types that represent more specific (lower-level) concepts while taking into account the appearance frequency of the instance predicates of each type.

Further, according to the above embodiment, the type estimation device 1 uses the estimated type to set information from the knowledge graph 21 in the columns of the table for which no information has been entered. With this configuration, by using a type estimated from information on ontology depth in addition to the appearance frequency of the instance predicates of each type, the type estimation device 1 improves the accuracy of the information set in those columns.

[Others]
Each component of the illustrated type estimation device 1 does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of the type estimation device 1 is not limited to the illustrated one, and all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the statistical estimation unit 13, the deduction rule application unit 14, and the type determination unit 15 may be integrated into one unit. The type determination unit 15 may also be separated into an input/output unit that performs input and output and a determination unit that determines the type. In addition, a storage unit (not shown) may be connected via a network as an external device of the type estimation device 1.

The various kinds of processing described in the above embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a type estimation program implementing the same functions as the type estimation device 1 shown in FIG. 1 is therefore described below. FIG. 11 is a diagram showing an example of a computer that executes the type estimation program.

As shown in FIG. 11, the computer 200 has a CPU 203 that executes various kinds of arithmetic processing, an input device 215 that receives data input from the user, and a display control unit 207 that controls a display device 209. The computer 200 also has a drive device 213 that reads programs and the like from a storage medium, and a communication control unit 217 that exchanges data with other computers via a network. The computer 200 further has a memory 201 that temporarily stores various kinds of information and an HDD (Hard Disk Drive) 205. The memory 201, CPU 203, HDD 205, display control unit 207, drive device 213, input device 215, and communication control unit 217 are connected by a bus 219.

The drive device 213 is, for example, a device for a removable disk 210. The HDD 205 stores a type estimation program 205a and type estimation processing related information 205b.

The CPU 203 reads the type estimation program 205a, loads it into the memory 201, and executes it as a process. This process corresponds to the functional units of the type estimation device 1. The type estimation processing related information 205b corresponds to the knowledge graph 21, the statistical information DB 22, and the deduction rule DB 23. For example, the removable disk 210 stores information such as the type estimation program 205a.

Note that the type estimation program 205a does not necessarily have to be stored in the HDD 205 from the beginning. For example, the program may be stored on a "portable physical medium" inserted into the computer 200, such as a flexible disk (FD), a CD-ROM (Compact Disk Read Only Memory), a DVD (Digital Versatile Disk), a magneto-optical disk, or an IC (Integrated Circuit) card. The computer 200 may then read the type estimation program 205a from such a medium and execute it.

1 Type estimation device
11 Statistical score calculation unit
12 Deduction rule extraction unit
13 Statistical estimation unit
14 Deduction rule application unit
15 Type determination unit
21 Knowledge graph
22 Statistical information DB
23 Deduction rule DB

Claims

1. A type estimation method in which a computer executes a process of: identifying, when estimating the type of a table in which column names are associated with the columns for those column names, candidates for the type of the table based on the types included in graph data and the appearance frequency of the predicates of the instances of each type; and
estimating the type of the table from among the identified type candidates based on information on the depth in an ontology that defines the relationships between the types included in the graph data.

2. The type estimation method according to claim 1, wherein, in the estimating, the computer estimates the type of the table from among the identified type candidates so that a greater ontology depth is given higher priority.

3. The type estimation method according to claim 1 or 2, wherein the computer further uses the estimated type to set information from the graph data in the columns of the table for which no information has been entered.

4. An information processing device comprising:
an identification unit that, when estimating the type of a table in which column names are associated with the columns for those column names, identifies candidates for the type of the table based on the types included in graph data and the appearance frequency of the predicates of the instances of each type; and
an estimation unit that estimates the type of the table from among the identified type candidates based on information on the depth in an ontology that defines the relationships between the types included in the graph data.

5. A type estimation program that causes a computer to execute a process of: identifying, when estimating the type of a table in which column names are associated with the columns for those column names, candidates for the type of the table based on the types included in graph data and the appearance frequency of the predicates of the instances of each type; and
estimating the type of the table from among the identified type candidates based on information on the depth in an ontology that defines the relationships between the types included in the graph data.

6. A type estimation method in which a computer executes a process of: identifying, when estimating the type of a table in which column names are associated with the columns for those column names, candidates for the type so that words highly correlated with each of the column names in the table are prioritized as the type; and
estimating the type of the table so that, from among the identified type candidates, a word that represents a relatively more specific concept is prioritized as the type.

7. An information processing device comprising:
an identification unit that, when estimating the type of a table in which column names are associated with the columns for those column names, identifies candidates for the type so that words highly correlated with each of the column names in the table are prioritized as the type; and
an estimation unit that estimates the type of the table so that, from among the type candidates identified by the identification unit, a word that represents a relatively more specific concept is prioritized as the type.

8. A type estimation program that causes a computer to execute a process of: identifying, when estimating the type of a table in which column names are associated with the columns for those column names, candidates for the type so that words highly correlated with each of the column names in the table are prioritized as the type; and
estimating the type of the table so that, from among the identified type candidates, a word that represents a relatively more specific concept is prioritized as the type.