JP7625201B2

JP7625201B2 - Knowledge model creation support device

Info

Publication number: JP7625201B2
Application number: JP2021028205A
Authority: JP
Inventors: 裕佐々木; 陸稲熊; 大小島; 隼樹酒井; 孝幸東; 雄二佐々木
Original assignee: Toyota School Foundation; JTEKT Corp
Current assignee: Toyota School Foundation; JTEKT Corp
Priority date: 2021-02-25
Filing date: 2021-02-25
Publication date: 2025-02-03
Anticipated expiration: 2041-02-25
Also published as: JP2022129515A

Description

本発明は、知識モデル作成支援装置に関するものである。 The present invention relates to a knowledge model creation support device.

特許文献１には、工学、医学、薬学、農学、生物学等の諸分野における熟練者による知識を記述した知識モデルの構築システムが記載されている。知識モデルは、当該分野のキーワードとなる用語を因子として、因子をネットワーク形態で相互接続することによって当該分野の用語とその関係性とを表現したものである。特許文献２には、機械加工分野における知識モデルに関する記載がされている。 Patent Document 1 describes a system for constructing knowledge models that describe knowledge held by experts in various fields, such as engineering, medicine, pharmacy, agriculture, and biology. The knowledge model uses key terms in the field as factors and expresses the terms in the field and their relationships by interconnecting the factors in a network form. Patent Document 2 describes a knowledge model in the field of machining.

特許文献１：特開２０１８－１４７３５１号公報 Patent Document 1: JP 2018-147351 A
特許文献２：特開２０２０－０４９６０６号公報 Patent Document 2: JP 2020-049606 A

作業者は、知識モデルの作成において、因子を定義する用語を抽出した上で、当該用語同士の関係性を把握する必要がある。しかし、用語の抽出、抽出した用語同士の関係性の把握は、人手による設定、特に熟練者による設定が必要であるため、知識モデルの作成が容易ではない。 When creating a knowledge model, workers need to extract terms that define factors and then understand the relationships between those terms. However, extracting terms and understanding the relationships between the extracted terms requires manual setup, particularly by an experienced person, so creating a knowledge model is not easy.

本発明は、知識モデルを容易に作成することができる知識モデル作成支援装置を提供することを目的とする。 The present invention aims to provide a knowledge model creation support device that can easily create knowledge models.

本発明の一態様は、用語により定義された複数の因子と前記因子同士の関係性情報とにより構成された知識モデルの作成を支援する装置であって、
複数の用語と前記複数の用語の関係性とを含む用語－関係性データベースと、
用語－関係性抽出モデルを用いて文書データから前記用語及び前記関係性を抽出することにより、前記用語－関係性データベースを作成する用語－関係性データベース作成部と、
前記用語－関係性データベースに基づいて前記知識モデルを作成する知識モデル作成部と、を備え、
前記用語－関係性抽出モデルは、
前記文書データを取得する文書データ取得部と、
前記文書データより用語の抽出を行う用語抽出部と、
２用語間の関係性を導くトリガワードを記憶するトリガワード記憶部と、
前記用語抽出部にて抽出された用語から用語のペア情報を作成するペア抽出部と、
前記文書データに前記トリガワードが含まれる場合、前記用語抽出部により抽出された２つの用語の関係性を前記トリガワード及び前記ペア情報に基づいて抽出する関係性抽出部と、を備え、
前記文書データは、複数の用語を内包する入れ子構造を含み、
前記入れ子構造は、複数の内包用語と前記複数の内包用語を結合した入れ子外部用語とにより構成される構造であり、
前記用語抽出部は、前記文書データに前記入れ子構造が含まれる場合、前記文書データより、前記入れ子構造において前記内包用語及び前記入れ子外部用語の抽出を行い、
前記ペア抽出部は、前記文書データに前記入れ子構造が含まれる場合、前記入れ子構造において前記内包用語を利用せず前記入れ子外部用語のみを利用して前記ペア情報を作成し、
前記関係性抽出部は、前記文書データに前記入れ子構造が含まれる場合、前記入れ子構造における前記入れ子外部用語を関係性抽出対象とし、前記内包用語を関係性抽出対象としない、知識モデル作成支援装置にある。 One aspect of the present invention is a device that supports the creation of a knowledge model composed of a plurality of factors defined by terms and relationship information between the factors, the device comprising:
a term-relationship database including a plurality of terms and the relationships among the plurality of terms;
a term-relationship database creation unit that creates the term-relationship database by extracting the terms and the relationships from document data using a term-relationship extraction model; and
a knowledge model creation unit that creates the knowledge model based on the term-relationship database,
wherein the term-relationship extraction model comprises:
a document data acquisition unit that acquires the document data;
a term extraction unit that extracts terms from the document data;
a trigger word storage unit that stores trigger words, each of which indicates a relationship between two terms;
a pair extraction unit that creates term pair information from the terms extracted by the term extraction unit; and
a relationship extraction unit that, when the document data contains a trigger word, extracts the relationship between two terms extracted by the term extraction unit based on the trigger word and the pair information,
wherein the document data includes a nested structure that contains a plurality of terms,
the nested structure is composed of a plurality of contained terms and a nested outer term formed by combining the plurality of contained terms,
when the document data includes the nested structure, the term extraction unit extracts both the contained terms and the nested outer term of the nested structure from the document data,
when the document data includes the nested structure, the pair extraction unit creates the pair information using only the nested outer term of the nested structure, without using the contained terms, and
when the document data includes the nested structure, the relationship extraction unit treats the nested outer term of the nested structure as a relationship extraction target and does not treat the contained terms as relationship extraction targets.

本発明の他の態様は、用語により定義された複数の因子と前記因子同士の関係性情報とにより構成された知識モデルの作成を支援する装置であって、Another aspect of the present invention is an apparatus for supporting creation of a knowledge model that is configured by a plurality of factors defined by terms and relationship information between the factors, the apparatus comprising:
複数の用語と前記複数の用語の関係性とを含む用語－関係性データベースと、a term-relationship database including a plurality of terms and relationships of the plurality of terms;
用語－関係性抽出モデルを用いて文書データから前記用語及び前記関係性を抽出することにより、前記用語－関係性データベースを作成する用語－関係性データベース作成部と、a term-relationship database creation unit that creates the term-relationship database by extracting the terms and the relationships from document data using a term-relationship extraction model;
前記用語－関係性データベースに基づいて前記知識モデルを作成する知識モデル作成部と、を備え、a knowledge model creation unit that creates the knowledge model based on the term-relationship database,
前記用語－関係性抽出モデルは、wherein the term-relationship extraction model comprises:
前記文書データを取得する文書データ取得部と、a document data acquisition unit for acquiring the document data;
前記文書データ取得部により取得された前記文書データを構成する各トークンに基づいて前記文書データの特徴をベクトルにて表現した特徴表現を生成する特徴表現生成部と、a feature representation generating unit that generates a feature representation in which a feature of the document data is expressed as a vector based on each token constituting the document data acquired by the document data acquiring unit;
前記文書データより用語の抽出を行う用語抽出部と、a term extraction unit that extracts terms from the document data;
２用語間の関係性を導くトリガワードを記憶するトリガワード記憶部と、a trigger word storage unit for storing a trigger word that leads to a relationship between two terms;
前記文書データに前記トリガワードが含まれる場合、前記用語抽出部により抽出された２つの用語の関係性を前記トリガワードに基づいて抽出する関係性抽出部と、を備え、a relationship extraction unit that extracts a relationship between two terms extracted by the term extraction unit based on the trigger word when the trigger word is included in the document data,
前記用語抽出部及び前記関係性抽出部は、前記特徴表現生成部により生成された前記特徴表現を共有して、前記用語の抽出及び前記関係性の抽出を行う、知識モデル作成支援装置にある。wherein the term extraction unit and the relationship extraction unit share the feature representation generated by the feature representation generating unit to extract the terms and the relationships.

上記知識モデル作成支援装置によれば、文書データから用語及び用語同士の関係性を抽出する用語－関係性抽出モデルを用いて、用語－関係性データベースを作成する。まず、用語－関係性抽出モデルは、文書データから用語を自動的に抽出する。さらに、用語－関係性抽出モデルは、２用語間の関係性を導くトリガワードを予め記憶しておき、文書データに当該トリガワードが含まれる場合に、２つの用語の関係性をトリガワードに基づいて抽出する。このように、用語－関係性抽出モデルは、文書データから用語を自動的に抽出することができると共に、予め設定されたトリガワードを考慮して、文書データに含まれる用語同士の関係性を自動的に抽出することができる。従って、人手によらず、用語－関係性データベースを作成することができる。そして、用語－関係性データベースが作成できれば、用語－関係性データベースを参照して知識モデルを作成することができるため、知識モデルを容易に作成することができる。 According to the above knowledge model creation support device, a term-relationship database is created using a term-relationship extraction model that extracts terms and relationships between terms from document data. First, the term-relationship extraction model automatically extracts terms from document data. Furthermore, the term-relationship extraction model pre-stores trigger words that lead to relationships between two terms, and when the document data contains the trigger words, extracts the relationship between the two terms based on the trigger words. In this way, the term-relationship extraction model can automatically extract terms from document data, and can automatically extract relationships between terms contained in the document data by taking into account the preset trigger words. Therefore, the term-relationship database can be created without manual work. Then, once the term-relationship database has been created, a knowledge model can be created by referring to the term-relationship database, and therefore the knowledge model can be easily created.

FIG. 1: 知識ネットワーク図を示す。 A knowledge network diagram.
FIG. 2: 知識モデル作成支援装置の全体構成を示す機能ブロック図である。 A functional block diagram showing the overall configuration of the knowledge model creation support device.
FIG. 3: 知識モデル作成支援装置を構成する第一データベース（用語－関係性データベース）を示す図である。 A diagram showing the first database (term-relationship database) of the knowledge model creation support device.
FIG. 4: 図３に示す第一データベースにおける関係性に関する関係ラベルを説明する図である。 A diagram explaining the relationship labels used in the first database shown in FIG. 3.
FIG. 5: 知識モデル作成支援装置を構成する第一データベース作成部における用語－関係性抽出モデルを示す機能ブロック図である。 A functional block diagram showing the term-relationship extraction model in the first database creation unit of the knowledge model creation support device.
FIG. 6: 用語抽出部にて適用する入れ子構造を説明する図である。 A diagram explaining the nested structure applied in the term extraction unit.
FIG. 7: トリガワード記憶部に記憶されるトリガワードを示す図である。 A diagram showing the trigger words stored in the trigger word storage unit.
FIG. 8: 用語－関係性抽出モデルの詳細構成を示す図である。 A diagram showing the detailed configuration of the term-relationship extraction model.
FIG. 9: 第一学習フェーズにおける用語－関係性抽出モデルの機能ブロック図である。 A functional block diagram of the term-relationship extraction model in the first learning phase.
FIG. 10: 第二学習フェーズにおける用語－関係性抽出モデルの機能ブロック図である。 A functional block diagram of the term-relationship extraction model in the second learning phase.
FIG. 11: 知識モデル作成支援装置を構成する知識モデル作成部の適用例を示す描画ＧＵＩウィンドウを示す図である。 A diagram showing a drawing GUI window as an application example of the knowledge model creation unit of the knowledge model creation support device.
FIG. 12: 知識モデル作成部の他の適用例を示す描画ＧＵＩウィンドウを示す図である。 A diagram showing a drawing GUI window as another application example of the knowledge model creation unit.
FIG. 13: 知識モデル作成部の他の適用例を示す描画ＧＵＩウィンドウを示す図である。 A diagram showing a drawing GUI window as another application example of the knowledge model creation unit.
FIG. 14: 知識モデル作成部の他の適用例を示す描画ＧＵＩウィンドウを示す図である。 A diagram showing a drawing GUI window as another application example of the knowledge model creation unit.

（１．知識モデルの概要）
知識モデルは、任意分野の情報に係る知識を所定の形式で記述して格納するものである。即ち、知識モデルは、主として、当該分野の用語により定義された複数の因子と、因子同士の関係性情報とにより構成される。例えば、知識モデルは、技術分野に関する知識とする。 (1. Overview of the knowledge model)
A knowledge model describes and stores knowledge related to information in a given field in a specified format. That is, a knowledge model is mainly composed of multiple factors defined by the terminology of the field and information on the relationships between the factors. For example, the knowledge model may be knowledge related to a technical field.

知識モデルは、例えば、因子に関する情報を有するデータと、関係性情報を有するデータとにより表される。知識モデルは、概念としては、複数の因子がネットワーク形態で相互に繋がれることによって因子同士の関係性が表現される。つまり、知識モデルは、各種技術分野における知識（ノウハウを含む）を形式知として格納しており、更新も可能である。 A knowledge model is represented, for example, by data containing information about factors and data containing relationship information. Conceptually, a knowledge model expresses the relationships between multiple factors by connecting them to each other in the form of a network. In other words, a knowledge model stores knowledge (including know-how) in various technical fields as explicit knowledge, and can also be updated.

技術分野としては、工学、医学、薬学、農学、生物学等の諸分野を対象とすることができる。特に、技術分野には、工学分野に含まれる機械加工分野を挙げることができる。ここで、機械加工分野には、例えば、切削加工や研削加工が含まれる。又、知識モデルは、特に、各技術分野における熟練者による技術情報に関する知識を記述することが有用である。 Technical fields include engineering, medicine, pharmacy, agriculture, biology, and other fields. In particular, technical fields include the field of machining, which is included in the field of engineering. Here, the field of machining includes, for example, cutting and grinding. Furthermore, it is particularly useful for the knowledge model to describe knowledge about technical information held by experts in each technical field.

例えば、機械加工分野において、作業者は、工作物の材質、工具の材質、加工精度、加工サイクルタイム等の種々の情報を考慮して、加工条件としての切削速度、切込量等を決定する。この場合、作業者が、工作物の材質、工具の材質、加工精度、加工サイクルタイム等の種々の情報を入力情報として、加工条件としての切削速度、切込量等を決定するに際して、作業者の思考過程をモデル化したものが、知識モデルである。 For example, in the field of machining, an operator determines the cutting speed, depth of cut, and other machining conditions by taking into account various information such as the material of the workpiece, the material of the tool, the machining accuracy, and the machining cycle time. In this case, the knowledge model is a model of the operator's thought process when the operator determines the cutting speed, depth of cut, and other machining conditions by inputting various information such as the material of the workpiece, the material of the tool, the machining accuracy, and the machining cycle time.

つまり、知識モデルは、工作物の材質、工具の材質、加工要件（加工精度や加工サイクルタイム等）、切削速度、切込量等に加えて、思考過程において登場する技術要素がそれぞれ因子として定義され、因子同士の関係性が定義されている。 In other words, in addition to the workpiece material, tool material, machining requirements (machining accuracy, machining cycle time, etc.), cutting speed, cutting depth, etc., the knowledge model defines each of the technical elements that appear in the thought process as factors, and defines the relationships between the factors.

知識モデルは、例えば、以下のように利用される。作業者が、知識モデルにおいて工作物の材質、工具の材質、加工要件（加工精度や加工サイクルタイム等）を入力因子として、当該入力因子について必要な情報を入力した場合に、出力因子としての切削速度及び切込量等に関する情報が出力される。 For example, the knowledge model is used as follows: When an operator inputs the necessary information about the workpiece material, tool material, and machining requirements (machining accuracy, machining cycle time, etc.) as input factors in the knowledge model, information about the cutting speed, cutting depth, etc. as output factors is output.

（２．知識ネットワーク図１００の例）
知識モデルは、上述したように、概念としては、ネットワーク形態で表現される。知識モデルをネットワーク図で表現した知識ネットワーク図１００の例について、図１を参照して説明する。本例では、機械加工分野における知識モデルに関する知識ネットワーク図１００を例に挙げる。 (2. Example of knowledge network diagram 100)
As described above, the knowledge model is conceptually expressed in the form of a network. An example of a knowledge network diagram 100 in which a knowledge model is expressed in the form of a network diagram will be described with reference to Fig. 1. In this example, a knowledge network diagram 100 relating to a knowledge model in the field of machining will be taken as an example.

図１に示すように、知識ネットワーク図１００は、複数のノード図形１１０と、ノード図形１１０同士を繋ぐリンク図形１２０とを備える。ノード図形１１０は、ボックス等の任意の図形、テキストを含む図形、アイコン等で表される。ノード図形１１０は、知識モデルにおける因子を表す。リンク図形１２０は、直線、曲線、カギ線等で表される。本例では、リンク図形１２０は、関係性に関する方向性を規定するために矢印線にて表す。リンク図形１２０は、知識モデルにおける因子同士を繋ぐ関係性を表す。なお、図１に示す知識ネットワーク図１００においては、ノード図形１１０は、全てテキストが記述可能なボックスにて表しており、リンク図形１２０は、矢印線にて表している。 As shown in FIG. 1, the knowledge network diagram 100 comprises a plurality of node figures 110 and link figures 120 that connect the node figures 110 to each other. The node figures 110 are represented by any figure such as a box, a figure including text, an icon, etc. The node figures 110 represent factors in the knowledge model. The link figures 120 are represented by straight lines, curves, hook lines, etc. In this example, the link figures 120 are represented by arrow lines to specify the directionality of the relationships. The link figures 120 represent relationships that connect factors in the knowledge model. In the knowledge network diagram 100 shown in FIG. 1, the node figures 110 are all represented by boxes in which text can be written, and the link figures 120 are represented by arrow lines.

ここで、因子は、技術用語により定義されている。そして、複数の因子は、技術的な包含関係（上下関係、親子関係、主従関係とも称する）を有する場合、技術的な異種関係を有する場合がある。つまり、因子同士の関係性は、上記の２種類に分類される。 Here, factors are defined by technical terms. In addition, multiple factors may have technical inclusion relationships (also called hierarchical relationships, parent-child relationships, or master-slave relationships) or may have technical heterogeneous relationships. In other words, the relationships between factors are classified into the two types mentioned above.

例えば、被削材諸元に、被削材熱特性、被削材硬度、被削材伸び等を包含する関係にある。つまり、技術的な包含関係を有する因子として、被削材諸元を上位概念因子とし、被削材熱特性、被削材硬度、被削材伸び等を下位概念因子とする。例えば、技術的に異種関係を有する因子として、被削材熱特性と要求工具耐熱性等である。以下において、技術的な包含関係を有する２つの因子の関係性を、単に包含関係と称し、技術的な異種関係を有する２つの因子の関係性を、単に異種関係と称する。 For example, the workpiece specifications include the thermal properties of the workpiece, hardness of the workpiece, elongation of the workpiece, etc. In other words, as factors having a technical inclusion relationship, the workpiece specifications are the higher-level conceptual factors, and the thermal properties of the workpiece, hardness of the workpiece, elongation of the workpiece, etc. are the lower-level conceptual factors. For example, factors having a technically heterogeneous relationship include the thermal properties of the workpiece and the required heat resistance of the tool. In what follows, the relationship between two factors having a technical inclusion relationship will be referred to simply as an inclusion relationship, and the relationship between two factors having a technically heterogeneous relationship will be referred to simply as a heterogeneous relationship.

そして、リンク図形１２０については、包含関係を表す第一リンク図形１２１と、異種関係を表す第二リンク図形１２２とを、区別して表示する。つまり、第一リンク図形１２１と第二リンク図形１２２とは、異なる表示方法にて表示される。 For the link figures 120, a first link figure 121 representing an inclusion relationship and a second link figure 122 representing a heterogeneous relationship are displayed so that they can be told apart. In other words, the first link figure 121 and the second link figure 122 are drawn in different styles.

図１では、包含関係を表す第一リンク図形１２１は、上位概念因子の領域を示す枠線で表しており、下位概念因子が、第一リンク図形１２１を表す枠線の中に配置される。なお、第一リンク図形１２１は、枠線の他に、上下に近接して配置され左右に僅かにずらして配置されたノード図形１１０間を繋ぐＬ字形で表しても良い。この場合、第一リンク図形１２１にて繋がれた２つのノード図形１１０において、上に位置するノード図形１１０が、上位概念因子に相当する。 In FIG. 1, the first link figure 121 representing an inclusion relationship is drawn as a frame that marks the region of the superordinate conceptual factor, and the subordinate conceptual factors are placed inside that frame. Instead of a frame, the first link figure 121 may be drawn as an L-shaped line connecting node figures 110 that are placed close together vertically and slightly offset horizontally. In that case, of the two node figures 110 connected by the first link figure 121, the upper one corresponds to the superordinate conceptual factor.

又、図１では、異種関係を表す第二リンク図形１２２は、任意の位置（上下左右）に離れて配置されたノード図形１１０間を繋ぐ、直線、折れ線等で表す。第二リンク図形１２２は、因子同士の定義の方向性を表す矢印線にて示す。 Also in FIG. 1, the second link figure 122 representing a heterogeneous relationship is drawn as a straight line, a polyline, or the like connecting node figures 110 placed apart from each other in any direction (above, below, left, or right). The second link figure 122 is shown as an arrow indicating the direction of the definition between the factors.
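The network described in this section can be sketched as a small data structure. This is a minimal illustration only, not the representation used by the device: the class and method names are hypothetical, and the direction chosen for the heterogeneous link in the example is an assumption, since the text does not state which way that arrow points.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class LinkKind(Enum):
    INCLUSION = "inclusion"          # corresponds to the first link figure 121
    HETEROGENEOUS = "heterogeneous"  # corresponds to the second link figure 122


@dataclass
class KnowledgeNetwork:
    """Hypothetical container: factors are nodes, links are typed directed edges."""
    factors: set[str] = field(default_factory=set)
    # Each link is (source, target, kind). For INCLUSION links the source
    # is the superordinate factor and the target is the subordinate factor.
    links: list[tuple[str, str, LinkKind]] = field(default_factory=list)

    def add_link(self, source: str, target: str, kind: LinkKind) -> None:
        self.factors.update((source, target))
        self.links.append((source, target, kind))

    def children(self, parent: str) -> list[str]:
        """Return the subordinate factors included in `parent`."""
        return [t for s, t, k in self.links
                if s == parent and k is LinkKind.INCLUSION]


network = KnowledgeNetwork()
for sub in ("workpiece thermal properties", "workpiece hardness", "workpiece elongation"):
    network.add_link("workpiece specifications", sub, LinkKind.INCLUSION)
# Assumed direction: the thermal properties inform the required tool heat resistance.
network.add_link("workpiece thermal properties", "required tool heat resistance",
                 LinkKind.HETEROGENEOUS)
```

Keeping the link kind on the edge rather than on the node mirrors the text, where the same factor can take part in both inclusion and heterogeneous relationships.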

（３．知識モデル作成支援装置１の構成）
知識モデル作成支援装置１は、上述した知識モデルを作成するための支援装置である。知識モデル作成支援装置１の構成について図２を参照して説明する。 (3. Configuration of knowledge model creation support device 1)
The knowledge model creation support device 1 is a support device for creating the knowledge model described above. Its configuration will be described with reference to FIG. 2.

知識モデル作成支援装置１は、第一ＤＢ作成部２、第一ＤＢ３、第二ＤＢ作成部４、第二ＤＢ５、第三ＤＢ作成部６、第三ＤＢ７、知識モデル作成部８を備える。ＤＢは、データベースの略語である。知識モデル作成支援装置１は、３つのＤＢ作成部２，４，６及び３つのＤＢ３，５，７を備える構成としたが、１つずつとしても良いし、２つずつとしても良いし、４以上ずつとしても良い。 The knowledge model creation support device 1 includes a first DB creation unit 2, a first DB 3, a second DB creation unit 4, a second DB 5, a third DB creation unit 6, a third DB 7, and a knowledge model creation unit 8. DB is an abbreviation for database. The knowledge model creation support device 1 is configured to include three DB creation units 2, 4, 6 and three DBs 3, 5, 7, but it may also be one of each, two of each, or four or more of each.

第一ＤＢ作成部２は、後述する用語－関係性抽出モデルを用いて文書データ（テキストデータ）から、複数の用語及び用語同士の関係性を抽出することにより、用語－関係性ＤＢである第一ＤＢ３を作成する。つまり、第一ＤＢ作成部２では、文書データ（テキストデータ）が入力されると、用語－関係性抽出モデルが実行されることにより、自動的に、複数の用語及び用語同士の関係性が抽出される。第一ＤＢ３は、第一ＤＢ作成部２により作成された用語及び関係性を記憶する。ここでの用語とは、上述した知識モデルにおける因子を作成するために利用可能な情報であって、関係性とは、上述した知識モデルにおける因子同士の関係性情報を作成するために利用可能な情報である。 The first DB creation unit 2 creates the first DB 3, which is a term-relationship DB, by extracting multiple terms and relationships between the terms from document data (text data) using a term-relationship extraction model described below. In other words, when document data (text data) is input to the first DB creation unit 2, the term-relationship extraction model is executed, and multiple terms and relationships between the terms are automatically extracted. The first DB 3 stores the terms and relationships created by the first DB creation unit 2. The terms here refer to information that can be used to create factors in the above-mentioned knowledge model, and the relationships refer to information that can be used to create relationship information between factors in the above-mentioned knowledge model.

第二ＤＢ作成部４は、第一ＤＢ作成部２を構成する用語－関係性抽出モデルとは異なるモデル、例えば、word2vecにより構成される。第二ＤＢ作成部４は、word2vecを用いて、文書データから、複数の用語間の関係性を抽出することにより、第二ＤＢ５を作成する。第二ＤＢ５は、第二ＤＢ作成部４により作成された関係性を記憶する。 The second DB creation unit 4 is configured using a model, such as word2vec, that is different from the term-relationship extraction model that configures the first DB creation unit 2. The second DB creation unit 4 creates the second DB 5 by extracting relationships between multiple terms from document data using word2vec. The second DB 5 stores the relationships created by the second DB creation unit 4.
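The text does not say how the word2vec output is turned into stored relationships. One common reading, offered here purely as an assumption, is that terms whose embedding vectors are close are treated as related. The sketch below uses cosine similarity over hand-written toy vectors; the vectors, the term list, and the function names are all illustrative and are not taken from the source or from any trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy 3-dimensional vectors standing in for word2vec embeddings (made up).
embeddings = {
    "cutting speed": [0.9, 0.1, 0.0],
    "cutting temperature": [0.8, 0.3, 0.1],
    "coolant": [0.1, 0.9, 0.2],
}


def related_terms(term, k=2):
    """Return up to k other terms ranked by similarity to `term`."""
    scores = [(other, cosine(embeddings[term], vec))
              for other, vec in embeddings.items() if other != term]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
```

A ranking like this carries no relationship label, which fits the text's description of the second DB as auxiliary to the labeled relationships in the first DB.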

第三ＤＢ作成部６は、第一ＤＢ作成部２及び第二ＤＢ作成部４を構成するモデルとは異なるモデル、例えば、公知の知識グラフにより構成される。第三ＤＢ作成部６は、知識グラフを用いて、文書データから、複数の用語間の関係性を抽出することにより、第三ＤＢ７を作成する。第三ＤＢ７は、第三ＤＢ作成部６により作成された関係性を記憶する。 The third DB creation unit 6 is configured by a model different from the models configuring the first DB creation unit 2 and the second DB creation unit 4, for example, a known knowledge graph. The third DB creation unit 6 creates the third DB 7 by extracting relationships between multiple terms from the document data using the knowledge graph. The third DB 7 stores the relationships created by the third DB creation unit 6.

知識モデル作成部８は、第一ＤＢ３に記憶されている用語と用語間の関係性を用い、さらに第二ＤＢ５、第三ＤＢ７の各々に記憶されている用語間の関係性を補助として用いて、知識モデルを作成する。例えば、知識モデル作成部８は、第一ＤＢ３に記憶されている複数の用語を因子候補とする。そして、知識モデル作成部８は、各ＤＢ３，５，７を利用して、登録する因子を配置する場所の候補を挙げたり、着目因子と関係性を有する因子候補を挙げたり、既に作成された知識モデルの検証等をしたりする。知識モデル作成部８は、一部において自動的に行うことができ、他の一部は人が行う。ただし、知識モデル作成部８において、知識モデル作成を完全自動化することも可能である。 The knowledge model creation unit 8 creates a knowledge model using the terms and the relationships between the terms stored in the first DB 3, and further using the relationships between the terms stored in each of the second DB 5 and the third DB 7 as auxiliary data. For example, the knowledge model creation unit 8 sets multiple terms stored in the first DB 3 as factor candidates. Then, the knowledge model creation unit 8 uses each DB 3, 5, and 7 to provide candidates for locations to place the factors to be registered, provide factor candidates that have a relationship with the factor of interest, and verify knowledge models that have already been created. The knowledge model creation unit 8 can perform some of its work automatically, and other parts manually. However, it is also possible for the knowledge model creation unit 8 to fully automate the creation of knowledge models.

（４．第一ＤＢ３の例）
第一ＤＢ（用語－関係性ＤＢ）の例について、図３及び図４を参照して説明する。図３に示すように、第一ＤＢ３は、参照用語Ａと、参照用語Ａと関係性を有する関係用語Ｂと、参照用語Ａと関係用語Ｂとの関係性を表す関係ラベルとを記憶する。例えば、参照用語ＡとしてＷ１は、Ｗ２，Ｗ４，Ｗ１０と関係性を有しており、それぞれの関係性（関係ラベル）は、Positive、Negative、Positiveである。 (4. Example of first DB3)
An example of the first DB (term-relationship DB) will be described with reference to Figures 3 and 4. As shown in Figure 3, the first DB 3 stores a reference term A, a related term B having a relationship with the reference term A, and a relationship label indicating the relationship between the reference term A and the related term B. For example, W1 as the reference term A has relationships with W2, W4, and W10, and the respective relationships (relationship labels) are Positive, Negative, and Positive.

関係ラベルは、図４に示すように、例えば、Positive、Negative、Sub、Relationの４種類を定義する。Positiveは、参照用語Ａが大きくなれば、関係用語Ｂが大きくなる関係を表す。Negativeは、参照用語Ａが大きくなれば、関係用語Ｂが小さくなる関係を表す。Subは、参照用語Ａが関係用語Ｂの一種である関係を表す。Relationは、参照用語Ａが関係用語Ｂと何らかの定性的な関係があることを表す。なお、関係ラベルは、上記４種類に限るものではなく、他の種類を含むようにしても良く、自由に設定可能である。 As shown in FIG. 4, four types of relationship labels are defined: Positive, Negative, Sub, and Relation. Positive represents a relationship in which the larger the reference term A, the larger the related term B. Negative represents a relationship in which the larger the reference term A, the smaller the related term B. Sub represents a relationship in which the reference term A is a type of related term B. Relation represents a qualitative relationship between the reference term A and the related term B. Note that the relationship labels are not limited to the above four types, and may include other types and can be freely set.
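The table in FIG. 3 and the labels in FIG. 4 can be pictured as a mapping from a reference term A to its related terms B and their labels. The following is a minimal sketch under that reading; the dictionary layout and the names `RelationLabel`, `first_db`, and `related_terms_of` are hypothetical, and only the W1 row given in the text is filled in.

```python
from enum import Enum


class RelationLabel(Enum):
    POSITIVE = "Positive"  # B grows as A grows
    NEGATIVE = "Negative"  # B shrinks as A grows
    SUB = "Sub"            # A is a kind of B
    RELATION = "Relation"  # some qualitative relationship


# reference term A -> {related term B: label}
first_db = {
    "W1": {
        "W2": RelationLabel.POSITIVE,
        "W4": RelationLabel.NEGATIVE,
        "W10": RelationLabel.POSITIVE,
    },
}


def related_terms_of(db, reference_term):
    """List (related term, label name) pairs for a reference term."""
    return [(term, label.value) for term, label in db.get(reference_term, {}).items()]
```

Because the text notes that the label set can be extended freely, an enumeration is only one option; a plain string column would work equally well.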

（５．第一ＤＢ作成部２の構成）
第一ＤＢ作成部２の構成について図５－図７を参照して説明する。第一ＤＢ作成部２は、用語－関係性抽出モデルにより構成される。第一ＤＢ作成部２は、図５に示すように、文書データ取得部１１、特徴表現生成部１２、用語抽出部１３、用語出力部１４、トリガワード記憶部１５、ペア抽出部１６、関係性抽出部１７、関係性出力部１８を備える。 (5. Configuration of First DB Creation Unit 2)
The configuration of the first DB creation unit 2 will be described with reference to FIGS. 5 to 7. The first DB creation unit 2 is configured by a term-relationship extraction model. As shown in FIG. 5, the first DB creation unit 2 includes a document data acquisition unit 11, a feature representation generation unit 12, a term extraction unit 13, a term output unit 14, a trigger word storage unit 15, a pair extraction unit 16, a relationship extraction unit 17, and a relationship output unit 18.

文書データ取得部１１は、文書データをテキストデータとして取得する。特徴表現生成部１２は、文書データ取得部１１にて取得した文書データを構成する各トークンに基づいて、文書データの特徴をベクトルにて表現した特徴表現を生成する。用語抽出部１３は、文書データより用語の抽出を行う。詳細には、用語抽出部１３は、特徴表現生成部１２により生成された特徴表現を用いて、文書データに含まれる用語の抽出を行う。用語出力部１４は、用語抽出部１３にて抽出された用語を出力する。 The document data acquisition unit 11 acquires document data as text data. The feature representation generation unit 12 generates a feature representation that expresses the features of the document data as vectors, based on the tokens that make up the document data acquired by the document data acquisition unit 11. The term extraction unit 13 extracts terms from the document data. Specifically, it uses the feature representation generated by the feature representation generation unit 12 to extract the terms contained in the document data. The term output unit 14 outputs the terms extracted by the term extraction unit 13.

ここで、用語抽出部１３は、入れ子構造２０を考慮した用語の抽出を行う。入れ子構造について、図６を参照して説明する。入れ子構造２０とは、用語が用語を内包する構造のことである。入れ子構造２０は、複数の内包用語２２，２３と、複数の内包用語２２，２３を結合した入れ子外部用語２１とにより構成される。 Here, the term extraction unit 13 extracts terms with the nested structure 20 taken into account. The nested structure will be described with reference to FIG. 6. The nested structure 20 is a structure in which a term contains other terms. It is composed of a plurality of contained terms 22, 23 and a nested outer term 21 formed by combining the contained terms 22, 23.

例えば、「切削加工」という機械加工用語は、「切削」と「加工」機械加工用語を内包している。この場合、「切削加工」が入れ子外部用語２１であり、「切削」、「加工」が内包用語２２，２３である。入れ子構造を構成する入れ子外部用語２１と内包用語２２，２３とは、例えば、上位下位の関係、属性関係、主述関係等を有する。 For example, the machining term "切削加工" (cutting work) contains the machining terms "切削" (cutting) and "加工" (machining). In this case, "切削加工" is the nested outer term 21, and "切削" and "加工" are the contained terms 22, 23. The nested outer term 21 and the contained terms 22, 23 that make up a nested structure have, for example, a superordinate-subordinate relationship, an attribute relationship, or a subject-predicate relationship.

取得された文書データの一文の例として、「切削速度が増加すると切削温度が増す。」について説明する。当該例文において、用語抽出部１３は、「切削速度」、「切削」、「速度」、「増加」、「切削温度」、「温度」、「増す」の用語が抽出される。つまり、入れ子外部用語２１としての「切削速度」及び「切削温度」が抽出されると共に、内包用語２２，２３としての「切削」、「速度」、「温度」が抽出される。 As an example sentence from acquired document data, consider "切削速度が増加すると切削温度が増す。" ("When the cutting speed increases, the cutting temperature increases."). From this sentence, the term extraction unit 13 extracts the terms "切削速度" (cutting speed), "切削" (cutting), "速度" (speed), "増加" (increase), "切削温度" (cutting temperature), "温度" (temperature), and "増す" (to increase). In other words, "切削速度" and "切削温度" are extracted as nested outer terms 21, and "切削", "速度", and "温度" are extracted as contained terms 22, 23.
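The split between nested outer terms and contained terms can be illustrated with a simple span check. This is a sketch only: the device described here extracts terms with a learned model, whereas the code below assumes a fixed vocabulary and plain substring matching. The function names and the vocabulary are hypothetical.

```python
def find_terms(sentence, vocabulary):
    """Return sorted (start, end, term) spans for every vocabulary hit."""
    spans = []
    for term in vocabulary:
        start = sentence.find(term)
        while start != -1:
            spans.append((start, start + len(term), term))
            start = sentence.find(term, start + 1)
    return sorted(spans)


def split_nested(spans):
    """Separate outermost terms from terms lying inside a longer span."""
    outer, contained = [], []
    for start, end, term in spans:
        inside_other = any(
            o_start <= start and end <= o_end and (o_start, o_end) != (start, end)
            for o_start, o_end, _ in spans
        )
        (contained if inside_other else outer).append(term)
    return outer, contained


sentence = "切削速度が増加すると切削温度が増す。"
vocabulary = {"切削速度", "切削", "速度", "増加", "切削温度", "温度", "増す"}
outer_terms, contained_terms = split_nested(find_terms(sentence, vocabulary))
```

In this sketch "切削" shows up twice among the contained terms because it occurs inside both "切削速度" and "切削温度"; the text lists each contained term once.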

トリガワード記憶部１５は、予め設定されたトリガワードを記憶する。トリガワードは、２用語間の関係性を導くキーワードである。トリガワードは、物理量を表す用語の変化を表すキーワード等である。トリガワード記憶部１５は、例えば、図７に示すように、「増加する」、「減少する」、「増す」、「減る」、「上がる」、「下がる」、「含む」等である。 The trigger word storage unit 15 stores preset trigger words. A trigger word is a keyword that indicates a relationship between two terms, such as a keyword expressing a change in a term that represents a physical quantity. As shown in FIG. 7, the trigger word storage unit 15 stores, for example, "増加する" (increase), "減少する" (decrease), "増す" (grow), "減る" (diminish), "上がる" (rise), "下がる" (fall), and "含む" (contain).

取得された文書データの一文の例として、「切削速度が増加すると切削温度が増す。」について説明する。当該例文において、トリガワードは、「増加する」、「増す」である。そして、当該トリガワードは、２用語としての「切削速度」と「切削温度」とが、一方が大きくなれば、他方が大きくなるという関係（Positiveの関係ラベル）を導くことができるキーワードである（図４参照）。 Consider again the example sentence "切削速度が増加すると切削温度が増す。" ("When the cutting speed increases, the cutting temperature increases."). The trigger words in this sentence are "増加する" and "増す". These trigger words indicate that the two terms "cutting speed" and "cutting temperature" are related such that when one becomes larger, the other also becomes larger (the Positive relationship label; see FIG. 4).
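How a pair of trigger words could lead to a Positive or Negative label can be shown with a small lookup. The rule used here (same direction gives Positive, opposite directions give Negative) is an assumption made for illustration; the device described in this document derives the label with a learned model, and the table and function names below are hypothetical. The trigger word "含む" is left out because the text does not say which label it maps to.

```python
# Assumed direction of change for each trigger word: +1 grows, -1 shrinks.
TRIGGER_DIRECTION = {
    "増加する": +1, "増す": +1, "上がる": +1,
    "減少する": -1, "減る": -1, "下がる": -1,
}


def find_triggers(sentence):
    """Return the trigger words found in the sentence, in order of appearance."""
    hits = []
    for word in TRIGGER_DIRECTION:
        pos = sentence.find(word)
        while pos != -1:
            hits.append((pos, word))
            pos = sentence.find(word, pos + 1)
    return [word for _, word in sorted(hits)]


def label_from_triggers(triggers):
    """Map exactly two trigger words to a label; return None otherwise."""
    if len(triggers) != 2:
        return None
    first, second = (TRIGGER_DIRECTION[t] for t in triggers)
    return "Positive" if first == second else "Negative"


example = "切削速度が増加すると切削温度が増す。"
label = label_from_triggers(find_triggers(example))
```

A rule table like this cannot handle negation or sentences with more than two trigger words, which is one reason a learned extractor is the more plausible design for real documents.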

ペア抽出部１６は、用語抽出部１３にて抽出された用語からペアを作成し、後述する関係性抽出部１７にて利用されるデータに整形する。用語抽出部１３においては入れ子構造２０を構成する場合には、入れ子外部用語２１と内包用語２２，２３とを抽出したが、ペア抽出部１６においては、最も大きな入れ子外部用語２１のみを利用し、内包用語２２，２３は利用しない。ペア抽出部１６においては、ペア作成対象の用語の数がｎ個の場合、${}_{n}C_{2}$個のペアが作成される。 The pair extraction unit 16 creates pairs from the terms extracted by the term extraction unit 13 and formats them into the data used by the relationship extraction unit 17 described later. When a nested structure 20 is present, the term extraction unit 13 extracts both the nested outer term 21 and the contained terms 22, 23, but the pair extraction unit 16 uses only the largest nested outer term 21 and does not use the contained terms 22, 23. When there are n terms to be paired, the pair extraction unit 16 creates ${}_{n}C_{2} = n(n-1)/2$ pairs.
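The pair count stated above is the number of 2-element combinations, which the Python standard library can enumerate directly. This is a minimal sketch; the function name is hypothetical, and "tool wear" is an extra term added only to make the count visible.

```python
from itertools import combinations
from math import comb


def make_pairs(outer_terms):
    """Build every unordered pair from the outermost terms, without duplicates."""
    unique = list(dict.fromkeys(outer_terms))  # drop repeats, keep first-seen order
    return list(combinations(unique, 2))


# "tool wear" is a made-up third term for illustration.
pairs = make_pairs(["cutting speed", "cutting temperature", "tool wear"])
expected_count = comb(3, 2)  # n(n-1)/2 with n = 3
```

Because contained terms are excluded before pairing, n stays small; pairing every extracted term, including "cutting", "speed", and "temperature", would grow the candidate set quadratically with little benefit.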

関係性抽出部１７は、文書データにトリガワードが含まれる場合、用語抽出部１３により抽出された２つの用語の関係性を、トリガワードに基づいて抽出する。関係性抽出部１７は、関係性抽出において、トリガワード記憶部１５に記憶されたトリガワード、及び、ペア抽出部１６にて作成された用語のペア情報を用いる。従って、関係性抽出部１７は、文書データに入れ子構造２０（図６に示す）が含まれる場合、入れ子構造２０における入れ子外部用語２１を関係性抽出対象とし、内包用語２２，２３を関係性抽出対象としないこととなる。 When the document data contains a trigger word, the relationship extraction unit 17 extracts the relationship between two terms extracted by the term extraction unit 13 based on that trigger word. For this extraction, it uses the trigger words stored in the trigger word storage unit 15 and the term pair information created by the pair extraction unit 16. Consequently, when the document data contains a nested structure 20 (see FIG. 6), the relationship extraction unit 17 treats the nested outer term 21 of the nested structure 20 as a relationship extraction target and does not treat the contained terms 22, 23 as targets.

さらに、関係性抽出部１７は、特徴表現生成部１２により生成された特徴表現を用いて、２つの用語の関係性を抽出する。つまり、上述した用語抽出部１３と当該関係性抽出部１７とは、特徴表現生成部１２により生成された特徴表現を共有して、用語の抽出及び関係性の抽出を行う。関係性出力部１８は、関係性抽出部１７により抽出された用語同士の関係性を出力する。 Furthermore, the relationship extraction unit 17 extracts the relationship between two terms using the feature representation generated by the feature representation generation unit 12. In other words, the term extraction unit 13 and the relationship extraction unit 17 share the feature representation generated by the feature representation generation unit 12 when extracting terms and relationships. The relationship output unit 18 outputs the relationships between terms extracted by the relationship extraction unit 17.

(6. Detailed configuration of term-relationship extraction model)
The detailed configuration of the term-relationship extraction model constituting the first DB creation unit 2 will be described with reference to FIG. 8. FIG. 8 shows the feature representation generation unit 12, the term extraction unit 13, the pair extraction unit 16, and the relationship extraction unit 17. The description uses the example sentence 「切削速度が増加すると切削温度が増し、・・」 ("When the cutting speed increases, the cutting temperature increases, ...").

(6-1. Feature Representation Generation Unit 12)
The feature representation generation unit 12 constitutes a shared part used by both the term extraction unit 13 and the relationship extraction unit 17. The feature representation generation unit 12 divides the acquired document data into tokens and obtains a token sequence x. For example, 「切」 ("cut"), 「削」 ("shave"), and 「速度」 ("speed") are each one token. Next, a token representation H1 is obtained from the token sequence x. The token representation H1 is expressed as a sequence of vectors corresponding to the token sequence x. A pre-trained model is applied to the token sequence x to obtain the token representation H1. For example, the token representation H1 is obtained as shown in formula (1) by using BERT (Bidirectional Encoder Representations from Transformers), which is one such pre-trained model.

Next, the obtained token representation H1 is input to a convolutional neural network (CNN) to obtain an intermediate representation H2, as shown in formula (2). In this example, the intermediate representation H2 corresponds to the feature representation generated by the feature representation generation unit 12.

The intermediate representation H2 is generated using set parameters. Here, the intermediate representation H2 is used for both term extraction and relationship extraction. Therefore, as described later, learning in the learning phase is based on the term extraction loss that arises when the term extraction unit 13 extracts terms and the relationship extraction loss that arises when the relationship extraction unit 17 extracts relationships. In other words, the parameters for generating the intermediate representation H2 are updated using the losses from both term extraction and relationship extraction.

(6-2. Term Extraction Unit 13)
The term extraction unit 13 exploits the fact that the nested outer term 21 and the inclusive terms 22, 23 of a nested structure 20 differ in the number of tokens that make up the term. It is a model that produces a binary output for each token count (span length in tokens).

First, taking the intermediate representation H2 as input, a convolution operation is performed to obtain representations of token sequences of length i, as shown in formula (3). In formula (3), spanCNNi denotes a convolution operation using a filter with kernel size i, and n_entity corresponds to the number of tokens that make up a term and is a hyperparameter given before training. The resulting intermediate representations Vi are n_entity intermediate representations corresponding to token counts 1, 2, 3, ..., n.

Next, as shown in formula (4), a common fully connected layer is applied to each of the n_entity intermediate representations Vi, one per token count, to generate intermediate representations Li.

The output of the fully connected layer is, for each of the n_entity kernel sizes, a sequence whose length equals the number of tokens. As shown in FIG. 8, the output has a 1 at the start position of each term whose token count equals the kernel size. For example, the term 「切削速度」 ("cutting speed") consists of three tokens and starts at the token 「切」, so a 1 (black circle) is set at span length 3 and the token position of 「切」. Likewise, the term 「速度」 ("speed") consists of one token and starts at the token 「速度」, so a 1 (black circle) is set at span length 1 and the token position of 「速度」.
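As a non-limiting illustration, the output layout described above can be sketched in plain Python. The function name `span_start_labels`, the `(start, length)` encoding of terms, and the tokens after 「速度」 are assumptions made for this sketch; the model itself produces these values through formulas (3) and (4), which are not reproduced here.

```python
def span_start_labels(tokens, terms, n_entity):
    """Build the target layout: labels[k - 1][i] == 1 when a term made of
    k tokens starts at token position i, and 0 otherwise."""
    labels = [[0] * len(tokens) for _ in range(n_entity)]
    for start, length in terms:
        if 1 <= length <= n_entity and start + length <= len(tokens):
            labels[length - 1][start] = 1
    return labels


# Hypothetical tokenization of the start of the example sentence.
tokens = ["切", "削", "速度", "が", "増加"]
# (start position, token count): 切削速度, 切削, 速度
terms = [(0, 3), (0, 2), (2, 1)]
labels = span_start_labels(tokens, terms, n_entity=3)

for length, row in enumerate(labels, start=1):
    print(length, row)
```

Each row corresponds to one kernel size, so the nested outer term and its inclusive terms land in different rows and never collide.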

(6-3. Pair Extraction Unit 16)
As described above, the pair extraction unit 16 creates pairs of the terms extracted by the term extraction unit 13. However, when the extracted terms form a nested structure 20, the pair extraction unit 16 treats only the largest nested outer term 21 as a relationship extraction target. For example, as shown in FIG. 8, the term extraction unit 13 has extracted 「切削速度」 ("cutting speed"), 「切削」 ("cutting"), and 「速度」 ("speed"); in this case, only 「切削速度」 is treated as a relationship extraction target. Accordingly, in FIG. 8, the 1 at span length 3 and the token position of 「切」 is kept, and the positions corresponding to the inclusive terms 22, 23, 「切削」 and 「速度」, are set to 0. The same applies to 「切削温度」 ("cutting temperature"). In other words, the pair 「切削速度」 and 「切削温度」 is created here. When n terms are extracted from one sentence of the document data, pairs are formed by selecting two terms, so nC2 = n(n-1)/2 pairs are created.
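As a non-limiting illustration, the filtering of inclusive terms and the creation of nC2 pairs can be sketched as follows. The `(start, end)` span encoding (end exclusive), the function names, and the token positions assumed for the example sentence are all hypothetical choices for this sketch.

```python
from itertools import combinations


def drop_inclusive_terms(spans):
    """Keep only spans that are not contained in a different, larger span."""
    return [
        (s, e)
        for (s, e) in spans
        if not any(
            o_s <= s and e <= o_e and (o_s, o_e) != (s, e)
            for (o_s, o_e) in spans
        )
    ]


def make_pairs(spans):
    """Create every unordered pair of the remaining (outermost) spans."""
    outer = sorted(drop_inclusive_terms(spans))
    return list(combinations(outer, 2))


# Assumed positions: 切削速度 = tokens 0..2, 切削温度 = tokens 7..9,
# each with its inclusive terms 切削 and 速度 / 温度.
spans = [(0, 3), (0, 2), (2, 3), (7, 10), (7, 9), (9, 10)]
pairs = make_pairs(spans)
print(pairs)
```

With two outermost terms, exactly one pair is produced; four non-nested terms would give 4C2 = 6 pairs.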

Next, a relation label (see FIG. 4) is assigned to each created pair by referring to the annotation file, and a triplet of "relation label, term A, term B" is created. The pair extraction unit 16 also creates term position vectors PEi for term A and term B. A term position vector has a length equal to the number of tokens in the sentence, with 1 at positions where the term is present and 0 elsewhere. In FIG. 8, for one term of the pair, 「切削速度」, the positions of 「切」, 「削」, and 「速度」 are 1 and the remaining positions are 0. For the other term of the pair, 「切削温度」, the positions of 「切」, 「削」, and 「温度」 are 1 and the remaining positions are 0. In this way, the pair extraction unit 16 creates the term position vectors PEi.
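As a non-limiting illustration, a term position vector and a triplet can be sketched as follows. The tokenization of the full example sentence, the function name, and the label string `positive_correlation` are hypothetical; the actual relation labels are those of FIG. 4, which is not reproduced here.

```python
def term_position_vector(seq_len, start, end):
    """Return a 0/1 vector of length seq_len with 1 on positions start..end-1."""
    return [1 if start <= i < end else 0 for i in range(seq_len)]


# Hypothetical tokenization of the example sentence.
tokens = ["切", "削", "速度", "が", "増加", "する", "と",
          "切", "削", "温度", "が", "増し"]

pe_a = term_position_vector(len(tokens), 0, 3)   # 切削速度
pe_b = term_position_vector(len(tokens), 7, 10)  # 切削温度

# Triplet of (relation label, term A, term B); the label is a placeholder.
triplet = ("positive_correlation", "切削速度", "切削温度")

print(pe_a)
print(pe_b)
```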

(6-4. Relationship Extraction Unit 17)
As described above, the relationship extraction unit 17 is a model that takes trigger words into account. First, trigger word information is added to the intermediate representation H2 generated by the feature representation generation unit 12, which serves as the shared part, and the result is used as input to a Multi Head Attention layer. To begin, Vtrig is generated by referring to the trigger word storage unit 15 (shown in FIG. 5). Vtrig is a vector whose length is the number of tokens in the sentence (SeqLen) and whose dimension is 1. Each element is 1 if the corresponding token is a trigger word and 0 otherwise. Then, as shown in formula (5), an embedding process is applied to Vtrig using a matrix whose weights are initialized from a normal distribution with mean 0 and variance 1.
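As a non-limiting illustration, the construction of Vtrig and a possible embedding step can be sketched as follows. The trigger words 「増加」 and 「増し」, the two-row lookup table, the embedding width of 4, and the name `h_trig` are assumptions for this sketch; the stored trigger words are those of FIG. 5 and the actual embedding is defined by formula (5), neither of which is reproduced here.

```python
import random


def trigger_vector(tokens, trigger_words):
    """Vtrig: 1 where the token is a trigger word, 0 elsewhere."""
    return [1 if tok in trigger_words else 0 for tok in tokens]


def init_embedding(rows, dim, seed=0):
    """Embedding table with weights drawn from N(mean 0, variance 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def embed(v_trig, table):
    """Look up one embedding row per token (row 0: non-trigger, row 1: trigger)."""
    return [table[flag] for flag in v_trig]


# Hypothetical tokenization and trigger words.
tokens = ["切", "削", "速度", "が", "増加", "する", "と",
          "切", "削", "温度", "が", "増し"]
v_trig = trigger_vector(tokens, {"増加", "増し"})
h_trig = embed(v_trig, init_embedding(rows=2, dim=4))

print(v_trig)
```

A lookup table is only one way to realize the embedding; in training, its weights would be updated together with the other parameters.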

Next, as shown in formula (6), the intermediate representation H2 is concatenated with Htrig along the feature dimension, and the result is input to a fully connected layer to generate Q (query).

K (key) is generated according to formula (7), and V (value) is generated according to formula (8).

Next, the generated Q, K, and V are input to the Multi Head Attention layer to generate intermediate representations H3 and W. In this attention layer, Q (query) is the search source (target), K (key) is the search destination (source), and V (value) is the score.

Here, the Multi Head Attention uses a Q (query) to which trigger word information has been added. Because the information is added to the query, the trigger words are supplied as "the items whose relevance within the sentence is to be determined." Considering also that K (key) and V (value) contain the information of the intermediate representation H2, the resulting intermediate representation H3 can be interpreted as "a representation in which strong attention is paid to those tokens in the sentence that are related to the trigger words."

Next, as shown in formulas (10) and (11), the intermediate representation H3 is passed through two skip connections and two convolutional layers to generate an intermediate representation H4.

Next, as shown in formula (12), the term position vectors PEi created by the pair extraction unit 16 and the intermediate representation H4 are used to extract the representations E1 and E2 of the two terms used for relationship extraction. Here, PEi denotes the i-th term position vector, and * denotes the element-wise product.

Next, as shown in formula (13), a max pooling operation is applied to each of the obtained E1 and E2 to shape each into a single vector. Here, E1' represents a term composed of multiple tokens.
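As a non-limiting illustration, the element-wise product with a term position vector followed by max pooling can be sketched as follows. The function name, the toy values of H4, and the choice to pool over the token axis are assumptions for this sketch; formulas (12) and (13) themselves are not reproduced here.

```python
def term_representation(h4, pe):
    """Mask H4 with a term position vector, then max-pool over tokens.

    h4: list of per-token feature vectors (SeqLen x Dim).
    pe: term position vector of length SeqLen (1 on the term, 0 elsewhere).
    Returns one vector of length Dim.
    """
    masked = [[p * x for x in row] for row, p in zip(h4, pe)]
    # zip(*masked) iterates over feature columns; max pools over tokens.
    return [max(column) for column in zip(*masked)]


# Toy values: 4 tokens, 2 features; the term covers the first two tokens.
h4 = [[0.2, -1.0], [0.5, 0.3], [-0.4, 0.9], [0.7, 0.1]]
pe = [1, 1, 0, 0]
e1_pooled = term_representation(h4, pe)
print(e1_pooled)
```

Note that, taken literally, masked positions contribute zeros to the pooling, so a feature that is negative on every token of the term pools to zero.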

Next, as shown in formula (14), each E1' is input to a fully connected layer to generate the term-level representation E1".

Next, as shown in formula (15), the vectors E1" and E2" are summed, and then, as shown in formula (16), the sum is passed through a fully connected layer to output the probability of each relationship that may exist between the terms. That is, Lrel is a vector whose length is the number of relationship types, the value of its i-th element represents the probability of the i-th relationship type, and a relationship with a high probability in Lrel represents the relationship between the two terms.
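As a non-limiting illustration, the summation of the two term representations followed by a fully connected layer can be sketched as follows. The use of softmax to turn the layer output into probabilities, the toy weights, and the function name are assumptions for this sketch; formulas (15) and (16) are not reproduced here and may normalize differently.

```python
import math


def relation_probabilities(e1, e2, weight, bias):
    """Sum two term vectors, apply a linear layer, and normalize with softmax.

    weight: one row of length Dim per relationship type.
    bias: one value per relationship type.
    """
    summed = [a + b for a, b in zip(e1, e2)]
    logits = [
        sum(w * x for w, x in zip(row, summed)) + b
        for row, b in zip(weight, bias)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]


# Toy values: 2 features, 3 hypothetical relationship types.
l_rel = relation_probabilities(
    e1=[0.5, 0.3],
    e2=[0.1, 0.2],
    weight=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    bias=[0.0, 0.0, 0.0],
)
best = l_rel.index(max(l_rel))
print(l_rel, best)
```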

(7. Learning Phase)
(7-1. Overview)
The term-relationship extraction model described above must be trained by machine learning on a large amount of document data. In particular, in this example, the parameters of the feature representation generation unit 12, which is shared by the term extraction unit 13 and the relationship extraction unit 17, are learned.

(7-2. First learning phase)
The first learning phase, which is the first stage of learning, will be described with reference to FIG. 9. In the first learning phase, the first stage of learning is performed by updating the generation parameters of the intermediate representation H2 in the feature representation generation unit 12 using the term extraction loss Lentity that arises when the term extraction unit 13 extracts terms. That is, the first learning phase is a single-task learning process that updates the generation parameters based on the term extraction loss Lentity. In FIG. 9, functions not used in the first learning phase are drawn with dashed lines.

Here, for example, cross entropy loss is used as the term extraction loss Lentity. In this case, the term extraction loss Lentity is expressed by formula (17). n denotes the number of model outputs, that is, the number of filters prepared, and l denotes the number of tokens in one sentence. For each token, the difference between the predicted value and the correct label y is measured by cross entropy, and the sum over the outputs corresponding to the respective filters gives the term extraction loss Lentity. Instead of cross entropy loss, the term extraction loss Lentity may also be computed by another, similar loss formula.
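As a non-limiting illustration, a cross entropy loss summed over tokens and over filters can be sketched as follows. The binary form of the cross entropy, the clipping constant, and the toy values are assumptions for this sketch; formula (17) is not reproduced here and may differ, for example in normalization.

```python
import math


def entity_loss(pred, gold, eps=1e-12):
    """Binary cross entropy summed over every filter row and token position.

    pred: predicted probabilities, one row per filter (kernel size).
    gold: correct 0/1 labels with the same shape.
    """
    loss = 0.0
    for pred_row, gold_row in zip(pred, gold):
        for p, y in zip(pred_row, gold_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss


# Toy values: 2 filters, 3 tokens.
gold = [[0, 0, 1], [1, 0, 0]]
pred_good = [[0.1, 0.1, 0.9], [0.9, 0.1, 0.1]]
pred_bad = [[0.9, 0.9, 0.1], [0.1, 0.9, 0.9]]

loss_good = entity_loss(pred_good, gold)
loss_bad = entity_loss(pred_bad, gold)
print(loss_good, loss_bad)
```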

(7-3. Second learning phase)
Following the first learning phase, the second learning phase, which is the second stage of learning, will be described with reference to FIG. 10. The second learning phase uses the term extraction loss Lentity that arises when the term extraction unit 13 extracts terms and the relationship extraction loss Lrelation that arises when the relationship extraction unit 17 extracts relationships. In the second learning phase, the second stage of learning is performed by updating the generation parameters of the intermediate representation H2 in the feature representation generation unit 12 based on both the term extraction loss Lentity and the relationship extraction loss Lrelation. That is, the second learning phase is a multi-task learning process that updates the generation parameters based on the term extraction loss Lentity and the relationship extraction loss Lrelation.

Here, the relationship extraction loss Lrelation is expressed by formula (18). r is the number of relation label types (see FIG. 4); because a loss arises for each relation, the sum of all of them is taken as the relationship extraction loss Lrelation. Nentity is the number of constituent terms of the nested structure 20 targeted by term extraction.

Then, the overall loss Lall is computed, as shown in formula (19), from the term extraction loss Lentity of formula (17) and the relationship extraction loss Lrelation of formula (18).

Here, as in the first learning phase, cross entropy loss, for example, can be used to express the loss. The cross entropy loss is expressed by formula (20). Class denotes the label targeted by relationship extraction.
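As a non-limiting illustration, the switch from single-task to multi-task learning can be sketched as a choice of which loss drives the parameter update. The unweighted sum used for the second phase, the epoch-based switching rule, and the function names are assumptions for this sketch; formula (19) is not reproduced here and may combine the losses differently.

```python
def phase_for_epoch(epoch, first_phase_epochs):
    """Return 1 during the first learning phase and 2 afterwards."""
    return 1 if epoch < first_phase_epochs else 2


def training_loss(phase, l_entity, l_relation):
    """Loss used to update the shared generation parameters in each phase."""
    if phase == 1:
        return l_entity                # single-task: term extraction only
    if phase == 2:
        return l_entity + l_relation   # multi-task: assumed unweighted sum
    raise ValueError(f"unknown phase: {phase}")


# Hypothetical schedule: 2 epochs of phase 1, then phase 2.
schedule = [phase_for_epoch(epoch, first_phase_epochs=2) for epoch in range(5)]
print(schedule)
print(training_loss(1, 0.5, 0.25), training_loss(2, 0.5, 0.25))
```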

(8. Example of Processing by Knowledge Model Creation Unit 8)
Next, examples of processing by the knowledge model creation unit 8 will be described with reference to FIGS. 11 to 14. The processing by the knowledge model creation unit 8 is not limited to these examples, and various other kinds of processing are possible. The knowledge model creation unit 8 can also be fully automated.

(8-1. First Example)
The processing of the first example will be described with reference to FIG. 11. The knowledge model creation unit 8 draws, in a drawing GUI window 30, a knowledge network diagram composed of node figures representing factors and link figures representing the relationships between factors. The drawing operation can be performed by an operator.

When drawing a knowledge network diagram in the drawing GUI window 30, the operator first decides on a factor of interest. The knowledge model creation unit 8 then displays, in the drawing GUI window 30, placement candidates for the node figures of terms related to the factor of interest, based on the relationship information in the DBs 3, 5, and 7. Multiple placement candidates may be displayed at this time. In particular, using the multiple DBs 3, 5, and 7 makes it possible to output placement candidates from multiple perspectives. By choosing a placement candidate, the operator can then draw (place) the factor of interest in the knowledge network diagram in the drawing GUI window 30.

(8-2. Second Example)
The processing of the second example will be described with reference to FIG. 12. In the drawing GUI window 30, the operator selects a factor of interest that has already been drawn. The knowledge model creation unit 8 then displays terms related to the factor of interest as factor candidates, based on the relationship information in the DBs 3, 5, and 7. When the operator selects one of the factor candidates, the selected factor candidate is placed in the drawing GUI window 30 in a state associated with the factor of interest.
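As a non-limiting illustration, looking up factor candidates for a factor of interest across several databases can be sketched as follows. The representation of each DB as a list of (relation label, term A, term B) triplets, the function name, and all example entries are hypothetical and are not taken from the DBs 3, 5, and 7.

```python
def factor_candidates(factor, databases):
    """Collect terms related to `factor` from each database, in order,
    returning (term, relation label) and skipping terms already found."""
    candidates = []
    seen = set()
    for db in databases:
        for label, term_a, term_b in db:
            if term_a == factor:
                other = term_b
            elif term_b == factor:
                other = term_a
            else:
                continue
            if other not in seen:
                seen.add(other)
                candidates.append((other, label))
    return candidates


# Hypothetical contents of two databases.
first_db = [("positive_correlation", "cutting speed", "cutting temperature")]
second_db = [
    ("related", "tool wear", "cutting speed"),
    ("related", "cutting temperature", "cutting speed"),
]
result = factor_candidates("cutting speed", [first_db, second_db])
print(result)
```

Querying several databases in turn is one simple way to surface candidates from more than one perspective; a real implementation might instead keep every source's label for the same term.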

(8-3. Third Example)
The processing of the third example will be described with reference to FIG. 13. The operator inputs document data into the DB creation units 2, 4, and 6. The knowledge model creation unit 8 then displays, in the GUI window 30, a plurality of terms obtained from the first DB 3 (term-relationship DB). When the operator selects one of the displayed terms, the knowledge model creation unit 8 displays, in the GUI window 30, terms related to the selected term, based on the relationship information in the DBs 3, 5, and 7.

In other words, the GUI window 30 shows the term selected by the operator together with the terms related to it. When the operator then selects several of the displayed terms, the knowledge model creation unit 8 draws them in the GUI window 30 as a knowledge network diagram.

(8-4. Fourth Example)
The processing of the fourth example will be described with reference to FIG. 14. The operator inputs document data into the DB creation units 2, 4, and 6. The knowledge model creation unit 8 then displays existing similar knowledge model candidates in the GUI window 30. When there are multiple similar knowledge model candidates, the candidate selected by the operator is displayed in the GUI window 30.

Next, based on the input document data, the knowledge model creation unit 8 displays, in the GUI window 30, a plurality of terms obtained from the first DB 3 (term-relationship DB). When the operator selects one of the displayed terms, the knowledge model creation unit 8 displays, in the GUI window 30, terms related to the selected term, based on the relationship information in the DBs 3, 5, and 7.

In other words, the GUI window 30 shows the similar knowledge model together with the term selected by the operator and the terms related to it. At this time, with the term selected by the operator taken as the factor of interest, the knowledge model creation unit 8 marks, for example with an X, the link figures in the displayed similar knowledge model that have no relationship with the factor of interest. Furthermore, when the term selected by the operator is not included in the similar knowledge model, the knowledge model creation unit 8 may newly draw the selected term.

(9. Effects)
According to the knowledge model creation support device 1, the DB 3 is created using a term-relationship extraction model that extracts terms and the relationships between terms from document data. First, the term-relationship extraction model automatically extracts terms from the document data. In addition, the term-relationship extraction model stores in advance trigger words that lead to a relationship between two terms and, when the document data contains such a trigger word, extracts the relationship between the two terms based on the trigger word. In this way, the term-relationship extraction model can automatically extract terms from document data and, by taking the preset trigger words into account, can also automatically extract the relationships between the terms contained in the document data. The DB 3 can therefore be created without manual work. Once the DB 3 has been created, a knowledge model can be created by referring to the DB 3, so the knowledge model can be created easily.

1: Knowledge model creation support device, 2: First database creation unit, 3: First database, 4: Second database creation unit, 5: Second database, 6: Third database creation unit, 7: Third database, 8: Knowledge model creation unit, 11: Document data acquisition unit, 12: Feature representation generation unit, 13: Term extraction unit, 14: Term output unit, 15: Trigger word storage unit, 16: Pair extraction unit, 17: Relationship extraction unit, 18: Relationship output unit, 20: Nested structure, 21: Nested outer term, 22, 23: Inclusive terms, 100: Knowledge network diagram, 110: Node figure, 120: Link figure, 121: First link figure, 122: Second link figure, Lentity: Term extraction loss, Lrelation: Relationship extraction loss

Claims

1. A knowledge model creation support device that supports the creation of a knowledge model composed of a plurality of factors defined by terms and relationship information between the factors, the device comprising:
a term-relationship database including a plurality of terms and relationships among the plurality of terms;
a term-relationship database creation unit that creates the term-relationship database by extracting the terms and the relationships from document data using a term-relationship extraction model; and
a knowledge model creation unit that creates the knowledge model based on the term-relationship database,
wherein the term-relationship extraction model comprises:
a document data acquisition unit that acquires the document data;
a term extraction unit that extracts terms from the document data;
a trigger word storage unit that stores a trigger word that leads to a relationship between two terms;
a pair extraction unit that creates term pair information from the terms extracted by the term extraction unit; and
a relationship extraction unit that, when the trigger word is included in the document data, extracts a relationship between two terms extracted by the term extraction unit based on the trigger word and the pair information,
wherein the document data includes a nested structure including a plurality of terms,
the nested structure is composed of a plurality of inclusive terms and a nested outer term that combines the plurality of inclusive terms,
when the document data includes the nested structure, the term extraction unit extracts the inclusive terms and the nested outer term of the nested structure from the document data,
when the document data includes the nested structure, the pair extraction unit creates the pair information using only the nested outer term of the nested structure, without using the inclusive terms of the nested structure, and
when the document data includes the nested structure, the relationship extraction unit treats the nested outer term of the nested structure as a relationship extraction target and does not treat the inclusive terms as relationship extraction targets.

2. The knowledge model creation support device according to claim 1, wherein the trigger word is a keyword that represents a change in a term representing a physical quantity.

3. The knowledge model creation support device according to claim 1, wherein the term-relationship extraction model further comprises a feature representation generation unit that generates a feature representation in which features of the document data are expressed as vectors based on the tokens constituting the document data acquired by the document data acquisition unit, and
the term extraction unit and the relationship extraction unit share the feature representation generated by the feature representation generation unit to extract the terms and the relationships.

4. A knowledge model creation support device that supports the creation of a knowledge model composed of a plurality of factors defined by terms and relationship information between the factors, the device comprising:
a term-relationship database including a plurality of terms and relationships among the plurality of terms;
a term-relationship database creation unit that creates the term-relationship database by extracting the terms and the relationships from document data using a term-relationship extraction model; and
a knowledge model creation unit that creates the knowledge model based on the term-relationship database,
wherein the term-relationship extraction model comprises:
a document data acquisition unit that acquires the document data;
a feature representation generation unit that generates a feature representation in which features of the document data are expressed as vectors based on the tokens constituting the document data acquired by the document data acquisition unit;
a term extraction unit that extracts terms from the document data;
a trigger word storage unit that stores a trigger word that leads to a relationship between two terms; and
a relationship extraction unit that, when the trigger word is included in the document data, extracts a relationship between two terms extracted by the term extraction unit based on the trigger word,
wherein the term extraction unit and the relationship extraction unit share the feature representation generated by the feature representation generation unit to extract the terms and the relationships.

5. The knowledge model creation support device according to claim 3, wherein, as a learning phase, the term-relationship extraction model is trained by a multi-task learning process that updates generation parameters of the feature representation in the feature representation generation unit based on a term extraction loss that arises when the term extraction unit extracts the terms and a relationship extraction loss that arises when the relationship extraction unit extracts the relationships.

6. The knowledge model creation support device according to claim 5, wherein the term-relationship extraction model is trained, as a first learning phase, by a single-task learning process that updates the generation parameters based on the term extraction loss, and
is trained, as a second learning phase following the first learning phase, by the multi-task learning process.

7. The knowledge model creation support device according to claim 1, comprising:
a first database, which is the term-relationship database created using the term-relationship extraction model; and
a second database that is created using a model different from the term-relationship extraction model and includes at least relationships among a plurality of terms,
wherein the knowledge model creation unit creates the knowledge model based on the first database and the second database.

8. The knowledge model creation support device according to any one of claims 1 to 7, wherein the knowledge model creation unit draws, in a drawing GUI window, a knowledge network diagram composed of node figures representing the factors and link figures representing relationships between the factors, and
when drawing the knowledge network diagram, performs at least one of the following: displaying placement candidates for the node figures of terms that have a relationship with a factor of interest based on the term-relationship database; displaying, as factor candidates, terms that have a relationship with the factor of interest based on the term-relationship database; and displaying link figures that have no relationship with the factor of interest based on the term-relationship database.