JP7783482B2

JP7783482B2 - Model generation device and model generation method

Info

Publication number: JP7783482B2
Application number: JP2021197943A
Authority: JP
Inventors: 充森本; 輝巳竹原
Original assignee: Nissin Electric Co Ltd
Current assignee: Nissin Electric Co Ltd
Priority date: 2021-12-06
Filing date: 2021-12-06
Publication date: 2025-12-10
Anticipated expiration: 2041-12-06
Also published as: JP2023083931A

Description

One aspect of the present invention relates to a model generation device that generates a learning model for performing drawing searches.

For example, in the field of plant engineering, large numbers of drawings (e.g., past drawings) must be handled. For this reason, techniques have been proposed for efficiently searching for a desired drawing among a large number of drawings.

As an example, Patent Document 1 discloses a technique aimed at improving user convenience in drawing searches. Specifically, the technique of Patent Document 1 uses machine learning to generate a learning model for performing drawing searches.

Patent Document 1: Japanese Patent Application Laid-Open No. 2021-12413

One aspect of the present invention aims to improve the quality of a learning model for performing drawing searches beyond that of conventional techniques.

To solve the above problem, a model generation device according to one aspect of the present invention generates a learning model for searching for at least one drawing corresponding to a target drawing from among a plurality of search target drawings. The model generation device includes: an acquisition unit that analyzes the plurality of search target drawings to acquire a content parameter set including a plurality of content parameters related to the content described in each of the plurality of search target drawings; a determination unit that (i) calculates, for each combination pattern of two different content parameters among the plurality of content parameters included in the content parameter set, a multicollinearity evaluation value between the two content parameters, and (ii) determines, based on the multicollinearity evaluation values, a deletion target content parameter to be deleted from among the plurality of content parameters; and a learning unit that generates the learning model based on a pruned content parameter set obtained by deleting the deletion target content parameter from the content parameter set.

A model generation method according to one aspect of the present invention generates a learning model for searching for at least one drawing corresponding to a target drawing from among a plurality of search target drawings. The model generation method includes: an acquisition step of analyzing the plurality of search target drawings to acquire a content parameter set including a plurality of content parameters related to the content described in each of the plurality of search target drawings; a determination step of (i) calculating, for each combination pattern of two different content parameters among the plurality of content parameters included in the content parameter set, a multicollinearity evaluation value between the two content parameters, and (ii) determining, based on the multicollinearity evaluation values, a deletion target content parameter to be deleted from among the plurality of content parameters; and a learning step of generating the learning model based on a pruned content parameter set obtained by deleting the deletion target content parameter from the content parameter set.
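The determination and pruning steps described above can be illustrated with a minimal sketch. Several details here are assumptions that this passage does not specify: the squared Pearson correlation is used as the pairwise multicollinearity evaluation value, 0.9 is used as a hypothetical "high risk" threshold, and the parameter involved in the most high-risk pairs is deleted first (ties resolved by insertion order). The names `r_squared` and `prune` and the sample values are illustrative only.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two parameter columns (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear relationship
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def prune(params, threshold=0.9):
    """Repeatedly delete the parameter that appears in the most high-risk pairs."""
    kept = dict(params)
    while kept:
        risk = {name: 0 for name in kept}
        for p, q in combinations(kept, 2):
            if r_squared(kept[p], kept[q]) >= threshold:
                risk[p] += 1
                risk[q] += 1
        worst = max(risk, key=risk.get)  # first maximum wins on ties
        if risk[worst] == 0:
            break  # no high-risk pair remains
        del kept[worst]
    return kept

# Hypothetical content parameter set: one column per parameter, one row per drawing.
params = {
    "A1": [90, 100, 110, 120],
    "A2": [9, 10, 11, 12],   # exactly proportional to A1
    "A3": [1, 0, 1, 0],
}
pruned = prune(params)
print(list(pruned))
```

In this toy data A1 and A2 are perfectly collinear, so one of them is removed and the pruned set keeps a single representative alongside A3.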

According to one aspect of the present invention, the quality of a learning model for performing drawing searches can be improved beyond that of conventional techniques.

FIG. 1 is a block diagram showing the configuration of the main part of an information processing system in the reference embodiment.
FIG. 2 is a diagram showing an example of a past property/drawing correspondence table in the reference embodiment.
FIG. 3 is a diagram showing an example of a past drawing content parameter table in the reference embodiment.
FIG. 4 is a diagram showing an example of a content parameter setting table in its initial state in the reference embodiment.
FIG. 5 is a diagram showing an example of the number of content parameters after preprocessing in each dataset in the reference embodiment.
FIG. 6 is a diagram schematically illustrating the data structure of dataset 1 in the reference embodiment.
FIG. 7 is a diagram showing an example of a normalization data table and a standardization data table in the reference embodiment.
FIG. 8 is a diagram showing an example of a dataset/preprocessing method correspondence table in the reference embodiment.
FIG. 9 is a diagram illustrating some of the plurality of datasets generated by the learning preprocessing unit in the reference embodiment.
FIG. 10 is a diagram showing an example of an evaluation result table in the reference embodiment.
FIG. 11 is a diagram showing an example of an updated content parameter setting table in the reference embodiment.
FIG. 12 is a diagram showing an example of a new drawing content parameter table in the reference embodiment.
FIG. 13 is a diagram showing an example of a post-preprocessing new drawing content parameter table in the reference embodiment.
FIG. 14 is a diagram illustrating some of a plurality of raw value/label value conversion tables in the reference embodiment.
FIG. 15 is a diagram showing an example of a raw value/label value conversion integrated table in the reference embodiment.
FIG. 16 is a block diagram showing the configuration of the main part of an information processing system in Embodiment 1.
FIG. 17 is a diagram showing an example of a coefficient of determination/high-risk coefficient of determination count table in its initial state in Embodiment 1.
FIG. 18 is a diagram showing an example of a coefficient of determination/high-risk coefficient of determination count table obtained by updating the table of FIG. 17.
FIG. 19 is a diagram showing an example of a coefficient of determination/high-risk coefficient of determination count table obtained by updating the table of FIG. 18.
FIG. 20 is a diagram showing an example of the final coefficient of determination/high-risk coefficient of determination count table obtained as a result of repeated updates.
FIG. 21 is a diagram showing an example of pruned dataset 1 corresponding to the coefficient of determination/high-risk coefficient of determination count table of FIG. 20.
FIG. 22 is a block diagram showing the configuration of the main part of an information processing system in Embodiment 2.

[Reference embodiment]
Before describing the information processing system 100 of Embodiment 1, an information processing system 100s is described as a reference embodiment. For convenience, in each subsequent embodiment, components having the same functions as components described in the reference embodiment are given the same reference numerals, and their description is not repeated. For brevity, descriptions of matters similar to known techniques are also omitted as appropriate.

Note that the numerical values given below in this specification are merely examples. In this specification, the expression "A to B" for two numbers A and B means "A or more and B or less" unless otherwise specified.

(Overview of information processing system 100s)
FIG. 1 is a block diagram showing the configuration of the main part of the information processing system 100s. The information processing system 100s includes an information processing device 1s, a past property drawing DB (database) 91, and a new property drawing DB 92.

The information processing device 1s includes a control device 10s, an input unit 71, a display unit 72, and a storage unit 80. The control device 10s includes a learning device 11s and a drawing search device 12. In this specification, "property" means, for example, a "site" in plant engineering.

The information processing device 1s only needs to be communicably connected to the past property drawing DB 91 and the new property drawing DB 92. Therefore, unlike the example of FIG. 1, at least one of the past property drawing DB 91 and the new property drawing DB 92 may be provided inside the information processing device 1s.

The drawings searched by the information processing device 1s (more specifically, by the control device 10s), referred to as search target drawings, include, for example, specification drawings, design drawings, and production drawings. Search target drawings may also include specifications, design documents, and estimates. Thus, search target drawings are not limited to documents whose name contains the word "drawing." As an example, search target drawings include any type of document related to the planning of a project in the plant engineering field.

However, as will be clear to those skilled in the art, the information processing device according to one aspect of the present invention is also applicable to searching for drawings in fields other than plant engineering. A drawing according to one aspect of the present invention may be any drawing from which the information processing device can acquire content parameters.

The control device 10s comprehensively controls each unit of the information processing device 1s. The storage unit 80 stores various data and programs used in the processing of the control device 10s. As described below, the control device 10s uses machine learning to search a plurality of search target drawings (e.g., past drawings a1 to MN) for at least one drawing corresponding to a target drawing (e.g., drawing ND).

The input unit 71 accepts user operations. The display unit 72 displays various data. As an example, the display unit 72 may display data indicating the results of a search by the control device 10s. The input unit 71 and the display unit 72 may be provided as a single unit; for example, they can be integrated by using a touch panel.

(Past property drawing DB 91)
The past property drawing DB 91 stores the drawings (strictly speaking, drawing data) relating to each past property (each existing property). In the following description, "the drawing data of drawing A (a certain drawing)" is abbreviated simply as "drawing A" where appropriate. Likewise, "the drawing number of drawing A" is abbreviated simply as "drawing A" where appropriate.

In the reference embodiment, the past property drawing DB 91 stores drawings for each of M different properties. M is an integer of 1 or more. Hereinafter, the j-th property is also referred to as "property j." j is an integer of 1 or more and M or less.

The past property drawing DB 91 also stores N different drawings (N types) for each of properties 1 to M. N is an integer of 1 or more. Hereinafter, the i-th drawing of property j is also referred to as "drawing (i, j)." The i-th drawing (i-th type) of each property is also referred to generically as drawing i. i is an integer of 1 or more and N or less.

As described above, the past property drawing DB 91 stores a total of T drawings. In the example of the reference embodiment, T = M × N. T is assumed to be an integer of 2 or more. In other words, at least one of M and N is assumed to be 1 or more.

Specifically, in the past property drawing DB 91, as in Patent Document 1, the drawing numbers of each type are listed by property number in the form of the past property/drawing correspondence table TB1 shown in FIG. 2. Hereinafter, the past property/drawing correspondence table TB1 is also abbreviated as "TB1." Other elements are abbreviated in the same way where appropriate. The cell in row i, column j of TB1 indicates the drawing number of drawing (i, j).

In the example of FIG. 2, for convenience, properties 1 to 3 are also written as properties A to C, respectively. In the example of FIG. 2, the first type of drawing (drawing 1) is an outline drawing, the second type (drawing 2) is an assembly drawing, and the third type (drawing 3) is a foundation drawing. The N-th type of drawing (drawing N) is a configuration drawing.

In the following, for simplicity, drawings (1, 1) to (N, 1) in the example of FIG. 2 (that is, each of the N types of drawings of property A) are also written as drawings a1 to aN. For example, drawings a1 to aN in the example of FIG. 2 refer to the outline drawing through the configuration drawing of property A, respectively. The drawings of the other properties are written in the same way.
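As a small illustration of this indexing, the following sketch builds a TB1-like table in which row i is the drawing type and column j is the property, and confirms that T = M × N. The values M = 3 and N = 4 and the helper names `drawing_label` and `build_tb1` are hypothetical; lowercase property letters are used only to mimic the "a1 to aN" labels and limit this toy version to 26 properties.

```python
import string

def drawing_label(i, j):
    """Label of drawing (i, j): property letter for j, then type index i."""
    return f"{string.ascii_lowercase[j - 1]}{i}"

def build_tb1(m, n):
    """Return TB1 as a list of N rows (drawing types), each with M columns (properties)."""
    return [[drawing_label(i, j) for j in range(1, m + 1)]
            for i in range(1, n + 1)]

M, N = 3, 4                       # hypothetical sizes
tb1 = build_tb1(M, N)
T = sum(len(row) for row in tb1)  # total number of stored drawings

for row in tb1:
    print(row)
print("T =", T)
```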

As described above, the past property drawing DB 91 stores a total of T drawings, from drawing a1 to drawing MN. Hereinafter, drawings a1 to MN are collectively referred to as past drawings. The past drawings are an example of search target drawings. For this reason, drawings a1 to MN are also referred to as the search target drawing group.

In this specification, any one of the plurality of past drawings (search target drawings) is also referred to as a candidate drawing. The following description mainly illustrates each process for the case where drawing a1 (the outline drawing of property A) is the candidate drawing. Descriptions of the processing for the other drawings are omitted where appropriate, but the processing is the same as for drawing a1.

(New property drawing DB 92)
The new property drawing DB 92 stores, as a new property dataset, the drawings (hereinafter collectively referred to as new drawings) relating to new properties (e.g., at least one property that is scheduled to be constructed). This specification describes one new property (property T) included in the new property dataset.

As an example, the new property drawing DB 92 stores, for property T, N different drawings of the same types as those of the past properties (outline drawing through configuration drawing). In this specification, the outline drawing of property T is used as an example of a new drawing. Hereinafter, the outline drawing of property T is referred to as drawing ND. Drawing ND in the reference embodiment is assumed to be the same as in Patent Document 1.

(Learning device 11s)
The learning device 11s includes a past drawing data acquisition unit 111, a past drawing content parameter acquisition unit 112 (also called the candidate drawing content parameter acquisition unit, the search target drawing content parameter acquisition unit, or the acquisition unit), a learning preprocessing unit 114 (preprocessing unit), and a learning model generation unit 113s. Based on drawings a1 to MN, the learning device 11s generates a learning model for drawing searches by the drawing search device 12s. For this reason, the learning device 11s may be referred to as a model generation device. An example of the processing flow of the learning device 11s is described below.

(Acquisition of candidate drawings)
In the past property drawing DB 91, drawings a1 to MN are sorted in advance by drawing type according to TB1. Therefore, for example, the past drawing data acquisition unit 111 acquires the outline drawings from the past property drawing DB 91 in the order "drawing a1 → b1 → ... → M1." Next, the past drawing data acquisition unit 111 acquires the assembly drawings from the past property drawing DB 91 in the order "drawing a2 → b2 → ... → M2." Finally, the past drawing data acquisition unit 111 acquires the configuration drawings from the past property drawing DB 91 in the order "drawing aN → bN → ... → MN." These drawings a1 to MN in the reference embodiment are assumed to be the same as those in Patent Document 1.

In the above example, the past drawing data acquisition unit 111 first refers to the cell in row 1, column 1 of TB1. The past drawing data acquisition unit 111 then acquires drawing (1, 1) corresponding to that cell, that is, drawing a1, from the past property drawing DB 91. The past drawing data acquisition unit 111 supplies the acquired drawing a1 to the past drawing content parameter acquisition unit 112.
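The acquisition order described here amounts to a row-major walk over TB1: all properties for drawing type 1, then all properties for drawing type 2, and so on. A minimal sketch follows, assuming TB1 is held as a list of rows; the two-row table and the name `acquisition_order` are illustrative, not taken from the specification.

```python
def acquisition_order(tb1):
    """Yield drawing identifiers type by type (rows), property by property (columns)."""
    for row in tb1:          # drawing type i
        for drawing in row:  # property j
            yield drawing

# Hypothetical TB1 with two drawing types and three properties.
tb1 = [
    ["a1", "b1", "c1"],  # outline drawings
    ["a2", "b2", "c2"],  # assembly drawings
]

order = list(acquisition_order(tb1))
print(" -> ".join(order))
```

Each yielded identifier would then be handed to the content parameter acquisition step in turn.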

(Acquisition of the content parameter set corresponding to a candidate drawing)
As in Patent Document 1, the past drawing content parameter acquisition unit 112 analyzes drawing a1 (more specifically, it performs syntax analysis on the OCR-processed drawing a1, focusing on the k-th specific character string described below) and thereby acquires the content parameter set corresponding to drawing a1. The content parameter set is a dataset indicating the k-th content parameter (hereinafter, Ak) associated with the k-th specific character string. The first to L-th content parameters are also referred to collectively as content parameters.

In this specification, a specific character string that is preset for each drawing is referred to as a specific character string. In the reference embodiment, L different specific character strings (L being an integer of 2 or more) are assumed to be preset. Hereinafter, the k-th of these is referred to as the k-th specific character string. k is an integer of 1 or more and L or less. The following description illustrates the case where the first specific character string is "voltage value," the second is "current value," the third is "OR," and the L-th is "open."

A content parameter is a quantity associated with the content described in a drawing (specifically, the described content relating to a specific character string). A content parameter can therefore be regarded as one form of data that expresses the described content numerically (quantifies it). For this reason, content parameters are used as indicators of the content described in a drawing.

To distinguish them from the target drawing content parameters described later, the content parameters of a search target drawing (past drawing) are also referred to as search target drawing content parameters. Likewise, the k-th content parameter of a search target drawing is also referred to as the search target drawing k-th content parameter. In the following description, however, unless otherwise specified, "content parameter" refers to a past drawing content parameter, and "content parameter set" refers to a past drawing content parameter set.

In this specification, Ak of drawing (i, j) is also written as Ak(i, j). As described above, the past drawing content parameter acquisition unit 112 sets Ak(i, j) based on the result of analyzing drawing (i, j) (more specifically, the result of detecting the specific character strings in drawing (i, j)).

In this way, the past drawing content parameter acquisition unit 112 sets A1 to AL for drawing a1. That is, the past drawing content parameter acquisition unit 112 acquires the content parameters of a candidate drawing by analyzing that candidate drawing. For this reason, the past drawing content parameter acquisition unit 112 is also called the candidate drawing content parameter acquisition unit.

The past drawing content parameter acquisition unit 112 performs the same processing on the other past drawings. That is, it sets A1 to AL for each of drawings a1 to MN.

The past drawing content parameter acquisition unit 112 then generates a past drawing content parameter table TB2 that indicates A1 to AL of each of drawings a1 to MN, that is, A1(1,1) to AL(M,N).

TB2 contains i-th sub-tables TB2-i of the past drawing content parameter table. FIG. 3 shows TB2-1 as an example. TB2-i is a table indicating A1 to AL of each drawing of type i. TB2-1 indicates A1 to AL of each of drawings a1 to M1 (the outline drawings of properties A to M). TB2 consists of a set of N sub-tables, TB2-1 to TB2-N. Thus, in the reference embodiment, one sub-table is created per drawing type. These sub-tables in the reference embodiment are assumed to be the same as those in Patent Document 1.
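One way to picture the sub-table structure is as a mapping from drawing type i to a table keyed by property j, each row holding that drawing's A1 to AL. The sketch below is only an assumed in-memory layout; the record format, the values, and the name `build_tb2` are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat records: (drawing type i, property j, [A1, A2, A3, AL]).
records = [
    (1, 1, [90, 5, 2, 1]),
    (1, 2, [100, 6, 0, 1]),
    (2, 1, [90, 5, 1, 0]),
    (2, 2, [110, 7, 3, 2]),
]

def build_tb2(records):
    """Group parameter rows into one sub-table TB2-i per drawing type i."""
    tb2 = defaultdict(dict)
    for i, j, params in records:
        tb2[i][j] = list(params)  # row for drawing (i, j)
    return dict(tb2)

tb2 = build_tb2(records)
tb2_1 = tb2[1]  # sub-table for drawing type 1 (e.g., outline drawings)
for j, row in tb2_1.items():
    print(f"drawing (1, {j}):", row)
```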

TB2 shows the correspondence between the number (an example of an identifier) of each of the plurality of past drawings (e.g., drawing a1) and A1 to AL of each of those past drawings. The learning model generation unit 113s therefore acquires TB2 as training data. As an example, the learning model generation unit 113s may generate the learning model by performing multinomial logistic regression using this training data. However, as will also be clear from the description below, the machine learning algorithm according to one aspect of the present invention is not limited to this example, and other known algorithms may be applied. The number (identifier) of each past drawing shown in TB2 is used as the ground-truth label in the machine learning.
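To make the training setup concrete, the following is a minimal sketch of multinomial logistic regression (softmax regression) fitted by plain gradient descent, with each past drawing's identifier as its class label. It is a toy written for illustration, not the implementation used in the specification: the feature rows are hypothetical values already scaled to [0, 1], and the learning rate, epoch count, and function names are assumptions.

```python
import math

def softmax(z):
    m = max(z)                                  # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def scores(W, b, x):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(b))]

def fit(X, y, classes, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    n, d, c = len(X), len(X[0]), len(classes)
    W = [[0.0] * d for _ in range(c)]
    b = [0.0] * c
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(c)]
        gb = [0.0] * c
        for x, label in zip(X, y):
            p = softmax(scores(W, b, x))
            for k in range(c):
                err = p[k] - (1.0 if classes[k] == label else 0.0)
                gb[k] += err / n
                for f in range(d):
                    gW[k][f] += err * x[f] / n
        for k in range(c):
            b[k] -= lr * gb[k]
            for f in range(d):
                W[k][f] -= lr * gW[k][f]
    return W, b

def predict(W, b, classes, x):
    s = scores(W, b, x)
    return classes[s.index(max(s))]

# Hypothetical training data: one scaled parameter row per past drawing.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = ["a1", "b1", "c1"]            # drawing identifiers used as labels
classes = ["a1", "b1", "c1"]

W, b = fit(X, y, classes)
print([predict(W, b, classes, x) for x in X])
```

At search time, the parameters of a target drawing would be passed to `predict` (or the class probabilities inspected) to find the most similar past drawing.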

In the reference embodiment, the past drawing content parameter acquisition unit 112 acquires the k-th content parameter and additionally acquires information indicating the variable type (data type) of that parameter (k-th content parameter variable type information). Specifically, the k-th content parameter variable type information indicates whether the k-th content parameter is a qualitative variable (hereinafter written as VL) or a quantitative variable (hereinafter written as VN). Hereinafter, the first to L-th content parameter variable type information is referred to collectively as content parameter variable type information.

As an example, the past drawing content parameter acquisition unit 112 may acquire the content parameter variable type information based on the result of the syntax analysis described above. For example, as a result of the syntax analysis, the past drawing content parameter acquisition unit 112 determines that the number "90" contained in the character string "90V" following the first specific character string "voltage value" is the magnitude of the voltage value. In this case, the past drawing content parameter acquisition unit 112 determines that the first content parameter is VN. Similarly, it determines that the second content parameter is VN.

As a result of the syntax analysis, the past drawing content parameter acquisition unit 112 also determines that the third specific character string "OR" is not followed by characters corresponding to a quantitative variable. In this case, the past drawing content parameter acquisition unit 112 determines that the third content parameter is VL. Similarly, it determines that the L-th content parameter is VL.
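The variable type decision described above can be sketched with regular expressions: if a specific character string is followed by a number, the parameter is treated as quantitative (VN) and the number is taken as its value; otherwise it is treated as qualitative (VL). The handling of VL values is not specified in this passage, so recording the occurrence count is an assumption made for illustration, as are the sample OCR text and the name `extract_parameters`.

```python
import re

SPECIFIC_STRINGS = ["voltage value", "current value", "OR", "open"]
NUMBER = r"\s*(\d+(?:\.\d+)?)"   # optional whitespace, then an integer or decimal

def extract_parameters(ocr_text, specific_strings=SPECIFIC_STRINGS):
    """Return {specific string: (variable type, value)} for one drawing's OCR text."""
    result = {}
    for s in specific_strings:
        word = r"\b" + re.escape(s) + r"\b"
        m = re.search(word + NUMBER, ocr_text)
        if m:
            # A number follows the string: quantitative variable (VN).
            result[s] = ("VN", float(m.group(1)))
        else:
            # No number follows: qualitative variable (VL); count is an assumed encoding.
            result[s] = ("VL", len(re.findall(word, ocr_text)))
    return result

# Hypothetical OCR output of one drawing.
params = extract_parameters("voltage value 90V current value 5A OR OR open")
for name, (kind, value) in params.items():
    print(name, kind, value)
```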

(Example of processing in the learning preprocessing unit 114)
The learning preprocessing unit 114 acquires, from the past drawing content parameter acquisition unit 112, the content parameter sets corresponding to drawings a1 to MN (the first to L-th content parameters of each of drawings a1 to MN) and the content parameter variable type information. The learning preprocessing unit 114 then generates a content parameter setting table TB3 according to the acquired content parameter variable type information.

以下の説明では、図面ａ１～Ｍ１（外形図）の内容パラメータセット（図面ａ１～Ｍ１のそれぞれの第１～第Ｌ内容パラメータ）に基づく各処理について主に述べる。従って、以下に述べる各図におけるテーブルおよびデータは、外形図の内容パラメータセットに基づいて生成されている。これらのテーブルおよびデータは、図２に示されるその他の種類の図面（例：組立図、基礎図、および構成図）についても、外形図に関する以下の説明と同様の処理の流れによって生成されることに留意されたい。このように、参考形態では、これらのテーブルおよびデータは、図２に示されている図面の種類毎に生成される。 The following explanation mainly focuses on the processes based on the content parameter sets (the first to Lth content parameters of drawings a1 to M1, respectively) for drawings a1 to M1 (outline drawings). Therefore, the tables and data for each drawing described below are generated based on the content parameter sets for the outline drawings. Please note that these tables and data are also generated for other types of drawings shown in Figure 2 (e.g., assembly drawings, foundation drawings, and configuration drawings) using a process similar to that described below for outline drawings. Thus, in the reference embodiment, these tables and data are generated for each type of drawing shown in Figure 2.

図４には、初期状態におけるＴＢ３の一例が示されている。図４に示す通り、ＴＢ３は、（ｉ）第１～第Ｌ内容パラメータのそれぞれの変数種類（データ種類）と、（ｉｉ）当該第１～第Ｌ内容パラメータのそれぞれに適用すべき前処理手法と、の対応関係を示す表である。学習用前処理部１１４は、取得した内容パラメータ変数種類情報を、ＴＢ３の「変数」の項目に記録する。なお、第１～第Ｌ内容パラメータのそれぞれに適用すべき前処理手法は、現段階では未決定である。このため、初期状態のＴＢ３では、「前処理手法」の項目は全てブランク項目として設定されている。本明細書では、初期状態のＴＢ３を、ＴＢ３ｉｎｉｔと称する。 Figure 4 shows an example of TB3 in its initial state. As shown in Figure 4, TB3 is a table showing the correspondence between (i) the variable type (data type) of each of the first through Lth content parameters and (ii) the preprocessing method to be applied to each of the first through Lth content parameters. The learning preprocessing unit 114 records the acquired content parameter variable type information in the "Variable" field of TB3. Note that the preprocessing method to be applied to each of the first through Lth content parameters has not yet been determined at this stage. For this reason, in the initial state of TB3, all of the "Preprocessing Method" fields are set to blank. In this specification, TB3 in its initial state is referred to as TB3init.

続いて、学習用前処理部１１４は、第ｋ内容パラメータ変数種類情報に応じて、第ｋ内容パラメータに複数種類の前処理手法を適用する。具体的には、参考形態では、学習用前処理部１１４は、第ｋ内容パラメータがＶＬである場合には、第ｋパラメータに対し、以下の（ｉ）～（ｉｖ）、
（ｉ）生値（Raw値）をそのまま用いる処理（恒等処理）（以下、［Ｒ］と表記）；
（ｉｉ）ワンホットエンコーディング（One hot Encoding）（以下、［Ｏ］と表記）；
（ｉｉｉ）正規化（Normalization）（以下、［Ｎ］と表記）；
（ｉｖ）標準化（Standardization）（以下、［Ｓ］と表記）；
という４種類の前処理手法を適用する。 Next, the learning preprocessing unit 114 applies a plurality of types of preprocessing methods to the k-th content parameter according to the k-th content parameter variable type information. Specifically, in the reference embodiment, when the k-th content parameter is VL, the learning preprocessing unit 114 applies the following four types of preprocessing methods (i) to (iv) to the k-th content parameter:
(i) processing that uses the raw value as is (identity processing) (hereinafter denoted [R]);
(ii) one-hot encoding (hereinafter denoted [O]);
(iii) normalization (hereinafter denoted [N]);
(iv) standardization (hereinafter denoted [S]).

なお、第ｋ内容パラメータがＶＬである場合には、当該第ｋ内容パラメータは、第ｋ特定文字列が過去図面内容パラメータ取得部１１２によってラベルエンコーディング（Label Encoding）（以下、［Ｌ］と表記）されることによって導出された値であると理解することもできる。このため、第ｋ内容パラメータがＶＬである場合には、［Ｒ］は［Ｌ］に読み替えることができる。このように、第ｋ内容パラメータがＶＬである場合には、［Ｒ］と［Ｌ］とは、等価な前処理手法である。なお、前処理手法［Ｌ］の一例については、後述する。 When the kth content parameter is VL, the kth content parameter can also be understood as a value derived by label encoding (hereinafter referred to as [L]) of the kth specific character string by the past drawing content parameter acquisition unit 112. Therefore, when the kth content parameter is VL, [R] can be read as [L]. In this way, when the kth content parameter is VL, [R] and [L] are equivalent preprocessing methods. An example of the preprocessing method [L] will be described later.

他方、学習用前処理部１１４は、第ｋ内容パラメータがＶＮである場合には、第ｋパラメータに対し、以下の（ｉ）～（ｖ）、
（ｉ）［Ｒ］；
（ｉｉ）［Ｌ］；
（ｉｉｉ）［Ｏ］；
（ｉｖ）［Ｎ］；
（ｖ）［Ｓ］；
という５種類の前処理手法を適用する。但し、当業者であれば明らかである通り、本発明の一態様に係る前処理手法は、これらの例に限定されない。本発明の一態様に係る前処理手法は、ＶＬまたはＶＮに適用可能な任意の前処理手法であってよい。 On the other hand, when the k-th content parameter is VN, the learning preprocessing unit 114 applies the following five types of preprocessing methods (i) to (v) to the k-th content parameter:
(i) [R];
(ii) [L];
(iii) [O];
(iv) [N];
(v) [S].
However, as will be apparent to those skilled in the art, the preprocessing method according to one aspect of the present invention is not limited to these examples. It may be any preprocessing method applicable to VL or VN.

以上の通り、学習用前処理部１１４は、内容パラメータセットに含まれる各内容パラメータに対し、第ｋ内容パラメータ変数種類情報に応じた複数種類の前処理手法を適用することにより、当該内容パラメータセットを拡張（水増し）（data augmentation）する。以下、内容パラメータセットに含まれているＶＬおよびＶＮの個数を、ＦＬおよびＦＮとそれぞれ表記する。 As described above, the learning preprocessing unit 114 expands (augments) the content parameter set by applying, to each content parameter included in the set, the multiple types of preprocessing methods that correspond to the k-th content parameter variable type information (data augmentation). Hereinafter, the numbers of VLs and VNs included in the content parameter set are denoted FL and FN, respectively.

上記の説明から明らかである通り、参考形態の例では、内容パラメータセットに対する前処理のパターンの組み合わせの総数は、４^ＦＬ×５^ＦＮ通りである。従って、学習用前処理部１１４は、１つの内容パラメータセットを、４^ＦＬ×５^ＦＮ個の内容パラメータセットへと拡張する。以下、当該４^ＦＬ×５^ＦＮ個の内容パラメータセットを、総称的に拡張後内容パラメータセットと称する。そして、当該４^ＦＬ×５^ＦＮ個の内容パラメータセットのそれぞれを、データセット１、データセット２、…、データセット４^ＦＬ×５^ＦＮと称する。拡張後内容パラメータセットは、前処理後内容パラメータセットと称されてもよい。なお、例えば、データセット１は、データセットＮｏ．１と称されてもよい。 As is clear from the above description, in the example of the reference embodiment, the total number of combinations of preprocessing patterns for the content parameter set is 4^FL × 5^FN. Therefore, the learning preprocessing unit 114 expands one content parameter set into 4^FL × 5^FN content parameter sets. Hereinafter, these 4^FL × 5^FN content parameter sets are collectively referred to as the expanded content parameter sets, and they are individually referred to as Dataset 1, Dataset 2, ..., Dataset 4^FL × 5^FN. The expanded content parameter sets may also be referred to as preprocessed content parameter sets. Note that, for example, Dataset 1 may also be referred to as Dataset No. 1.
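The count 4^FL × 5^FN can be checked by enumerating the combinations directly. The following is a minimal sketch under stated assumptions, not the implementation described in this specification: the function name `enumerate_preprocessing_patterns` and the example list of variable types are hypothetical.

```python
from itertools import product

# Candidate preprocessing methods per variable type, as listed above:
# VL parameters get [R], [O], [N], [S]; VN parameters get [R], [L], [O], [N], [S].
METHODS = {
    "VL": ["R", "O", "N", "S"],
    "VN": ["R", "L", "O", "N", "S"],
}

def enumerate_preprocessing_patterns(variable_types):
    """Return every assignment of one preprocessing method per content parameter."""
    choices = [METHODS[t] for t in variable_types]
    return list(product(*choices))

# Hypothetical content parameter set: two VN parameters followed by two VL parameters.
variable_types = ["VN", "VN", "VL", "VL"]
patterns = enumerate_preprocessing_patterns(variable_types)

FL = variable_types.count("VL")
FN = variable_types.count("VN")
print(len(patterns), 4**FL * 5**FN)  # the two counts agree
```

Each tuple in `patterns` corresponds to one expanded dataset. Because the count grows exponentially with the number of content parameters, a practical system would likely need to limit or sample the patterns.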

なお、過去図面内容パラメータセットは、検索対象図面内容パラメータセットと称されてもよい。従って、拡張後内容パラメータセットは、過去図面拡張後内容パラメータセット（あるいは、検索対象図面拡張後内容パラメータセット）と称されてもよい。同様に、前処理後内容パラメータセットは、前処理後過去図面内容パラメータセット（あるいは、前処理後検索対象図面内容パラメータセット）と称されてもよい。なお、前処理後内容パラメータセットに含まれる各データは、前処理後内容パラメータ（より詳細には、前処理後検索対象図面内容パラメータ）と称されてもよい。 The past drawing content parameter set may also be referred to as the search target drawing content parameter set. Therefore, the expanded content parameter set may also be referred to as the past drawing expanded content parameter set (or the search target drawing expanded content parameter set). Similarly, the pre-processed content parameter set may also be referred to as the pre-processed past drawing content parameter set (or the pre-processed search target drawing content parameter set). Each piece of data included in the pre-processed content parameter set may also be referred to as a pre-processed content parameter (more specifically, a pre-processed search target drawing content parameter).

図５には、各データセットにおける前処理後内容パラメータの数（以下、前処理後内容パラメータ数と称する）の一例が示されている。１つのデータセット（例：データセット１）における前処理後内容パラメータ数は、同データセットに含まれるデータの数とも表現できる。従って、例えば、データセット１における前処理後内容パラメータ数は、データセット１の次元数（要素数）と称されてもよい。以下では、前処理後の第ｋ内容パラメータを、前処理後第ｋ内容パラメータと称する。 Figure 5 shows an example of the number of post-preprocessing content parameters in each dataset (hereinafter referred to as the number of post-preprocessing content parameters). The number of post-preprocessing content parameters in one dataset (e.g., dataset 1) can also be expressed as the number of data contained in that dataset. Therefore, for example, the number of post-preprocessing content parameters in dataset 1 may be referred to as the number of dimensions (number of elements) of dataset 1. Below, the kth content parameter after preprocessing will be referred to as the kth post-preprocessing content parameter.

以下に述べる図６からも明らかである通り、前処理後内容パラメータ数は、内容パラメータセットに含まれる各内容パラメータに対して適用される前処理手法に応じて変化しうる。例えば、より多くの内容パラメータに対して［Ｏ］が適用されるほど、前処理後内容パラメータ数が増加する傾向がある（後述の図８も参照）。 As is clear from Figure 6 below, the number of preprocessed content parameters can vary depending on the preprocessing method applied to each content parameter included in the content parameter set. For example, the more content parameters to which [O] is applied, the more likely the number of preprocessed content parameters will increase (see also Figure 8 below).

図６は、データセット１のデータ構造を模式的に例示する図である。図６の例におけるデータセット１は、（ｉ）第１内容パラメータに対して［Ｓ］が、（ｉｉ）第２内容パラメータに対して［Ｓ］が、（ｉｉｉ）第３内容パラメータに対して［Ｏ］が、（ｉｖ）第Ｌ内容パラメータに対して［Ｌ］が、それぞれ施されることにより、生成されたデータ構造（例：データフレーム）である（後述の図８も参照）。 Figure 6 schematically illustrates the data structure of Dataset 1. In the example of Figure 6, Dataset 1 is a data structure (e.g., a data frame) generated by applying (i) [S] to the first content parameter, (ii) [S] to the second content parameter, (iii) [O] to the third content parameter, and (iv) [L] to the Lth content parameter (see also Figure 8, described below).

図６の例では、第３内容パラメータ（第３特定文字列「ＯＲ」に対応する内容パラメータ）がワンホットエンコーディングされることにより、「ＯＲ＿０」、「ＯＲ＿１」、「ＯＲ＿２」、および「ＯＲ＿３」という、当該第３内容パラメータに対応する４つの前処理後内容パラメータが生成されている。 In the example of Figure 6, the third content parameter (the content parameter corresponding to the third specific string "OR") is one-hot encoded to generate four pre-processed content parameters corresponding to the third content parameter: "OR_0", "OR_1", "OR_2", and "OR_3".

説明の便宜上の一例として、内容パラメータセットに含まれる第３内容パラメータ（Ａ３）の最小値が０であり、最大値が３である場合を考える。すなわち、内容パラメータセットにおいて、Ａ３が０から３までの４通りの離散値をとっている場合を考える。この場合、Ａ３のそれぞれの値は、４次元のワンホットベクトル（より具体的には、４ビットのワンホットベクトル）によって表現可能である。 As an example for ease of explanation, consider a case where the minimum value of the third content parameter (A3) included in the content parameter set is 0 and the maximum value is 3. In other words, consider a case where A3 takes on four discrete values from 0 to 3 in the content parameter set. In this case, each value of A3 can be represented by a four-dimensional one-hot vector (more specifically, a four-bit one-hot vector).

例えば、Ａ３＝０である場合、
ＯＲ＿０＝（１，０，０，０）；
ＯＲ＿１＝（０，０，０，０）；
ＯＲ＿２＝（０，０，０，０）；
ＯＲ＿３＝（０，０，０，０）；
である。 For example, if A3=0, then:
OR_0=(1,0,0,0);
OR_1=(0,0,0,0);
OR_2=(0,0,0,0);
OR_3=(0,0,0,0).

また、Ａ３＝１である場合、
ＯＲ＿０＝（０，０，０，０）；
ＯＲ＿１＝（０，１，０，０）；
ＯＲ＿２＝（０，０，０，０）；
ＯＲ＿３＝（０，０，０，０）；
である。 Also, if A3=1, then:
OR_0=(0,0,0,0);
OR_1=(0,1,0,0);
OR_2=(0,0,0,0);
OR_3=(0,0,0,0).

また、Ａ３＝２である場合、
ＯＲ＿０＝（０，０，０，０）；
ＯＲ＿１＝（０，０，０，０）；
ＯＲ＿２＝（０，０，１，０）；
ＯＲ＿３＝（０，０，０，０）；
である。 Also, if A3=2, then:
OR_0=(0,0,0,0);
OR_1=(0,0,0,0);
OR_2=(0,0,1,0);
OR_3=(0,0,0,0).

また、Ａ３＝３である場合、
ＯＲ＿０＝（０，０，０，０）；
ＯＲ＿１＝（０，０，０，０）；
ＯＲ＿２＝（０，０，０，０）；
ＯＲ＿３＝（０，０，０，１）；
である。以上の通り、Ａ３＝ｉ（この説明におけるｉは、０から３までの任意の自然数）であることは、ｉ番目の要素のみに成分「１」を有するワンホットベクトルＯＲ＿ｉによって表現される。 Also, if A3=3, then:
OR_0=(0,0,0,0);
OR_1=(0,0,0,0);
OR_2=(0,0,0,0);
OR_3=(0,0,0,1).
As described above, A3=i (i in this description is any natural number from 0 to 3) is expressed by a one-hot vector OR_i that has the component "1" only in the i-th element.
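The mapping from A3 to the columns OR_0 to OR_3 can be sketched as follows. This is a minimal illustration rather than the implementation described in this specification; the helper `one_hot` and the list-based row layout (one column per category) are assumptions.

```python
def one_hot(value, num_categories):
    """Return a list with a single 1 at index `value` and 0 elsewhere."""
    if not 0 <= value < num_categories:
        raise ValueError("value is outside the range of known categories")
    return [1 if i == value else 0 for i in range(num_categories)]

# A3 takes the discrete values 0..3, which yields the columns OR_0..OR_3.
column_names = [f"OR_{i}" for i in range(4)]
for a3 in range(4):
    row = dict(zip(column_names, one_hot(a3, 4)))
    print(a3, row)
```

One label-encoded column becomes four columns here, which is why applying [O] to more content parameters increases the number of dimensions of a dataset.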

続いて、前処理手法［Ｎ］の一例について説明する。以下では、学習用前処理部１１４が、第ｋ内容パラメータ（Ａｋ）を正規化する場合を例示する。まず、学習用前処理部１１４は、内容パラメータセットから、Ａｋの最大値（Ａｋｍａｘ）および最小値（Ａｋｍｉｎ）を取得する。 Next, an example of preprocessing method [N] will be described. Below, we will explain an example in which the learning preprocessing unit 114 normalizes the k-th content parameter (Ak). First, the learning preprocessing unit 114 obtains the maximum value (Akmax) and minimum value (Akmin) of Ak from the content parameter set.

そして、学習用前処理部１１４は、
Ａｋ＿Ｎｏｒｍａｌｉｚｅｄ＝（Ａｋ－Ａｋｍｉｎ）／（Ａｋｍａｘ－Ａｋｍｉｎ）
…（１）
の通り、Ａｋ＿Ｎｏｒｍａｌｉｚｅｄを算出する。Ａｋ＿Ｎｏｒｍａｌｉｚｅｄは、正規化後第ｋ内容パラメータ（より詳細には、正規化後過去図面第ｋ内容パラメータ）と称される。また、正規化後第１～第Ｌ内容パラメータを総称的に、正規化後内容パラメータ（より詳細には、正規化後過去図面内容パラメータ）と称する。正規化後内容パラメータは、前処理後内容パラメータの一例である。 Then, the learning preprocessing unit 114 calculates Ak_Normalized as follows:
Ak_Normalized = (Ak - Akmin) / (Akmax - Akmin) …(1)
Ak_Normalized is referred to as the normalized kth content parameter (more specifically, the normalized kth past drawing content parameter). The normalized first to Lth content parameters are collectively referred to as the normalized content parameters (more specifically, the normalized past drawing content parameters). The normalized content parameters are an example of preprocessed content parameters.

以上の通り、学習用前処理部１１４は、式（１）に従って、ＡｋをＡｋ＿Ｎｏｒｍａｌｉｚｅｄへと正規化する。正規化は、Ｍｉｎ－Ｍａｘスケーリングとも称される。式（１）から理解される通り、［Ｎ］によって生成されたデータセットでは、正規化後内容パラメータの最小値は０であり、最大値は１である。 As described above, the learning preprocessing unit 114 normalizes Ak to Ak_Normalized according to equation (1). Normalization is also called min-max scaling. As can be seen from equation (1), in the dataset generated by [N], the minimum value of the content parameter after normalization is 0, and the maximum value is 1.
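Equation (1) can be sketched in a few lines. The function name `normalize`, the sample values, and the handling of a constant column (where equation (1) is undefined because Akmax equals Akmin) are assumptions made for illustration.

```python
def normalize(values):
    """Min-max scale a list of numbers to [0, 1] following equation (1).

    Returns the scaled values together with Akmin and Akmax so that the
    statistics can be recorded for later use.
    """
    ak_min, ak_max = min(values), max(values)
    if ak_max == ak_min:
        # Assumption of this sketch: map a constant column to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values], ak_min, ak_max
    scaled = [(v - ak_min) / (ak_max - ak_min) for v in values]
    return scaled, ak_min, ak_max

voltages = [60.0, 75.0, 90.0]  # hypothetical values of one VN content parameter
scaled, ak_min, ak_max = normalize(voltages)
print(scaled, ak_min, ak_max)
```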

さらに、前処理手法［Ｓ］の一例について説明する。以下では、学習用前処理部１１４が、Ａｋを標準化する場合を例示する。まず、学習用前処理部１１４は、内容パラメータセットから、Ａｋの平均値（Ａｋｍｅａｎ）および標準偏差（Ａｋｓｄ）を導出する。 Next, an example of the preprocessing method [S] will be described. Below, we will explain an example in which the learning preprocessing unit 114 standardizes Ak. First, the learning preprocessing unit 114 derives the mean value (Akmean) and standard deviation (Aksd) of Ak from the content parameter set.

続いて、学習用前処理部１１４は、
Ａｋ＿Ｓｔａｎｄａｒｄｉｚｅｄ＝（Ａｋ－Ａｋｍｅａｎ）／Ａｋｓｄ …（２）の通り、Ａｋ＿Ｓｔａｎｄａｒｄｉｚｅｄを算出する。Ａｋ＿Ｓｔａｎｄａｒｄｉｚｅｄは、標準化後第ｋ内容パラメータ（より詳細には、標準化後過去図面第ｋ内容パラメータ）と称される。また、標準化後第１～第Ｌ内容パラメータを総称的に、標準化後内容パラメータ（より詳細には、標準化後過去図面内容パラメータ）と称する。標準化後内容パラメータは、前処理後内容パラメータの一例である。 Next, the learning preprocessing unit 114 calculates Ak_Standardized as follows:
Ak_Standardized = (Ak - Akmean) / Aksd …(2)
Ak_Standardized is referred to as the standardized kth content parameter (more specifically, the standardized kth past drawing content parameter). The standardized first to Lth content parameters are collectively referred to as the standardized content parameters (more specifically, the standardized past drawing content parameters). The standardized content parameters are an example of preprocessed content parameters.

以上の通り、学習用前処理部１１４は、式（２）に従って、ＡｋをＡｋ＿Ｓｔａｎｄａｒｄｉｚｅｄへと標準化する。式（２）から理解される通り、［Ｓ］によって生成されたデータセットでは、標準化後内容パラメータの平均値は０であり、標準偏差は１である。 As described above, the learning preprocessing unit 114 standardizes Ak to Ak_Standardized according to equation (2). As can be seen from equation (2), in the dataset generated by [S], the mean value of the standardized content parameters is 0 and the standard deviation is 1.
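Equation (2) can be sketched in the same way. The specification does not state whether Aksd is the population or the sample standard deviation; this sketch assumes the population form (`statistics.pstdev`). The function name `standardize` and the sample values are also assumptions.

```python
from statistics import mean, pstdev

def standardize(values):
    """Standardize a list of numbers following equation (2).

    Returns the standardized values together with Akmean and Aksd.
    """
    ak_mean = mean(values)
    ak_sd = pstdev(values)  # population standard deviation (assumption)
    if ak_sd == 0:
        # Assumption of this sketch: map a constant column to zeros.
        return [0.0 for _ in values], ak_mean, ak_sd
    return [(v - ak_mean) / ak_sd for v in values], ak_mean, ak_sd

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
standardized, ak_mean, ak_sd = standardize(values)
print(standardized, ak_mean, ak_sd)
```

With the population standard deviation, the standardized values have mean 0 and standard deviation 1, matching the property stated above.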

学習用前処理部１１４は、［Ｎ］の過程にて取得したＡｋｍａｘおよびＡｋｍｉｎを記録した表（正規化用データ表）を生成してもよい。同様に、学習用前処理部１１４は、［Ｓ］の過程にて取得したＡｋｍｅａｎおよびＡｋｓｄを記録した表（標準化用データ表）を生成してもよい。 The learning preprocessing unit 114 may generate a table (normalization data table) that records the Akmax and Akmin obtained in the [N] process. Similarly, the learning preprocessing unit 114 may generate a table (standardization data table) that records the Akmean and Aksd obtained in the [S] process.

図７の正規化用データ表７００Ａおよび標準化用データ表７００Ｂはそれぞれ、参考形態における正規化用データ表および標準化用データ表の一例である。具体的には、正規化用データ表７００Ａおよび標準化用データ表７００Ｂはそれぞれ、データセット１の生成に伴って生成された表である。 Normalization data table 700A and standardization data table 700B in Figure 7 are examples of a normalization data table and a standardization data table, respectively, in the reference embodiment. Specifically, both tables were generated in conjunction with the generation of Dataset 1.

上述の説明から理解される通り、［Ｎ］が施されない第ｋ内容パラメータについては、ＡｋｍａｘおよびＡｋｍｉｎがそもそも取得されない。このため、正規化用データ表７００Ａでは、［Ｎ］が施されない第ｋ内容パラメータについては、ＡｋｍａｘおよびＡｋｍｉｎに、ダミー値（例：０）が割り当てられる。 As can be understood from the above explanation, Akmax and Akmin are not obtained for the kth content parameter to which [N] is not applied. For this reason, in the normalization data table 700A, dummy values (e.g., 0) are assigned to Akmax and Akmin for the kth content parameter to which [N] is not applied.

上述の例では、データセット１における前処理後第１～第３内容パラメータおよび前処理後第Ｌ内容パラメータはいずれも、［Ｎ］以外の前処理手法によって導出されている。このため、図７に示される通り、正規化用データ表７００Ａでは、前処理後第１～第３内容パラメータおよび前処理後第Ｌ内容パラメータについては、ＡｋｍａｘおよびＡｋｍｉｎとして、ダミー値である０が記録される。 In the above example, the first to third post-preprocessing content parameters and the Lth post-preprocessing content parameter in Dataset 1 were all derived using a preprocessing method other than [N]. Therefore, as shown in Figure 7, in normalization data table 700A, the dummy value 0 is recorded as Akmax and Akmin for the first to third post-preprocessing content parameters and the Lth post-preprocessing content parameter.

参考形態では、学習用前処理部１１４によって、Ａ１ｍｅａｎ＝７１．９、Ａ１ｓｄ＝１０．５、Ａ２ｍｅａｎ＝２．４、Ａ２ｓｄ＝０．６が取得された場合を例示する。この場合、学習用前処理部１１４は、標準化用データ表７００Ｂの第１内容パラメータおよび第２内容パラメータのそれぞれの項目に、これらの値を記録する。 In the reference embodiment, the learning preprocessing unit 114 obtains A1mean = 71.9, A1sd = 10.5, A2mean = 2.4, and A2sd = 0.6. In this case, the learning preprocessing unit 114 records these values in the first content parameter and second content parameter fields of the standardization data table 700B.

なお、［Ｓ］が施されない第ｋ内容パラメータについては、ＡｋｍｅａｎおよびＡｋｓｄがそもそも取得されない。このため、標準化用データ表７００Ｂでは、［Ｓ］が施されない第ｋ内容パラメータについては、ＡｋｍｅａｎおよびＡｋｓｄに、ダミー値（例：０）が割り当てられる。図７の例では、前処理後第３内容パラメータおよび前処理後第Ｌ内容パラメータはいずれも、［Ｓ］以外の前処理手法によって導出されている。このため、標準化用データ表７００Ｂでは、前処理後第３内容パラメータおよび前処理後第Ｌ内容パラメータについては、ＡｋｍｅａｎおよびＡｋｓｄとして、ダミー値である０が記録される。 Note that Akmean and Aksd are not obtained in the first place for a kth content parameter to which [S] is not applied. For this reason, in standardization data table 700B, dummy values (e.g., 0) are assigned to Akmean and Aksd for such a parameter. In the example of Figure 7, the third post-preprocessing content parameter and the Lth post-preprocessing content parameter are both derived using a preprocessing method other than [S]. Therefore, in standardization data table 700B, the dummy value 0 is recorded as Akmean and Aksd for the third post-preprocessing content parameter and the Lth post-preprocessing content parameter.
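The two data tables for Dataset 1 can be sketched as plain dictionaries. This is an illustrative sketch only: the function name `build_data_tables`, the parameter names `A1`, `A2`, `A3`, `AL`, and the dictionary layout are assumptions, while the statistics are the values quoted above for the reference embodiment.

```python
DUMMY = 0  # placeholder recorded when a statistic was never computed

def build_data_tables(methods, computed_stats):
    """Build (normalization table, standardization table) for one dataset.

    methods:        parameter name -> preprocessing code ("R", "L", "O", "N" or "S")
    computed_stats: parameter name -> statistics obtained during preprocessing
    """
    normalization_table, standardization_table = {}, {}
    for name, method in methods.items():
        stats = computed_stats.get(name, {})
        if method == "N":
            normalization_table[name] = {"max": stats["max"], "min": stats["min"]}
        else:
            normalization_table[name] = {"max": DUMMY, "min": DUMMY}
        if method == "S":
            standardization_table[name] = {"mean": stats["mean"], "sd": stats["sd"]}
        else:
            standardization_table[name] = {"mean": DUMMY, "sd": DUMMY}
    return normalization_table, standardization_table

# Dataset 1: [S] on the first and second parameters, [O] on the third, [L] on the Lth.
methods = {"A1": "S", "A2": "S", "A3": "O", "AL": "L"}
computed_stats = {"A1": {"mean": 71.9, "sd": 10.5}, "A2": {"mean": 2.4, "sd": 0.6}}
table_700a, table_700b = build_data_tables(methods, computed_stats)
print(table_700a)
print(table_700b)
```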

学習用前処理部１１４は、各データセットと各前処理手法との対応関係を示すテーブル（データセット・前処理手法対応テーブル）を生成してよい。図８のテーブルＴＢ４は、データセット・前処理手法対応テーブルの一例である。なお、図８の例において、データセット１に対応する前処理手法にハッチングが付されている趣旨については後述する。 The learning preprocessing unit 114 may generate a table (dataset/preprocessing method correspondence table) that shows the correspondence between each dataset and each preprocessing method. Table TB4 in Figure 8 is an example of a dataset/preprocessing method correspondence table. Note that the reason why the preprocessing method corresponding to dataset 1 in the example in Figure 8 is hatched will be explained later.

図９には、学習用前処理部１１４によって生成された複数のデータセットの内の一部が例示されている。図９において、符号９００Ａはデータセット１を、符号９００Ｂはデータセット２を、符号９００Ｃはデータセット４^ＦＬ×５^ＦＮを、それぞれ表す。上述の説明から理解される通り、データセット１における前処理後第１～第２内容パラメータはそれぞれ、標準化後第１～第２内容パラメータである。標準化後第１内容パラメータおよび標準化後第２内容パラメータはそれぞれ、上述の式（２）に従って第１内容パラメータおよび第２内容パラメータが標準化された値である。 Figure 9 illustrates some of the multiple datasets generated by the learning preprocessing unit 114. In Figure 9, reference numeral 900A represents Dataset 1, reference numeral 900B represents Dataset 2, and reference numeral 900C represents Dataset 4^FL × 5^FN. As can be understood from the above description, the first and second post-preprocessing content parameters in Dataset 1 are the standardized first and second content parameters, respectively. These are the values obtained by standardizing the first and second content parameters according to equation (2) above.

上述の図８から理解される通り、図９の例におけるデータセット２は、データセット１とは異なり、第３内容パラメータに対して［Ｌ］が適用されることにより生成されている。その他の内容パラメータに対する前処理手法については、データセット１の例と同様である。 As can be seen from Figure 8 above, Dataset 2 in the example of Figure 9 differs from Dataset 1 in that it is generated by applying [L] to the third content parameter. The preprocessing methods for the other content parameters are the same as those used in the example of Dataset 1.

以上の通り、データセット２では、データセット１とは異なり、第３内容パラメータに対して［Ｏ］が適用されていない。それゆえ、データセット２の次元数は、データセット１の次元数よりも小さい。具体的には、上述の図５に示す通り、データセット１の次元数は５０であり、データセット２の次元数は２８である。 As described above, unlike Dataset 1, Dataset 2 does not apply [O] to the third content parameter. Therefore, the number of dimensions of Dataset 2 is smaller than the number of dimensions of Dataset 1. Specifically, as shown in Figure 5 above, the number of dimensions of Dataset 1 is 50, and the number of dimensions of Dataset 2 is 28.

また、上述の図８から理解される通り、図９の例におけるデータセット４^ＦＬ×５^ＦＮは、データセット１・２とは異なり、全ての内容パラメータに対して［Ｏ］が適用されることにより生成されている。このため、データセット４^ＦＬ×５^ＦＮの次元数は、データセット１・２の次元数に比べて大きい。具体的には、図５に示す通り、データセット４^ＦＬ×５^ＦＮの次元数は１５１である。 Furthermore, as can be seen from Figure 8 described above, Dataset 4^FL × 5^FN in the example of Figure 9 is generated, unlike Datasets 1 and 2, by applying [O] to all content parameters. Therefore, the number of dimensions of Dataset 4^FL × 5^FN is greater than that of Datasets 1 and 2. Specifically, as shown in Figure 5, the number of dimensions of Dataset 4^FL × 5^FN is 151.

（学習モデル生成部１１３ｓにおける学習フェーズ） (Learning phase in the learning model generation unit 113s)
学習モデル生成部１１３ｓにおける処理は、学習フェーズと検証フェーズとに大別できる。まず、学習フェーズについて述べる。学習モデル生成部１１３ｓは、学習用前処理部１１４から拡張後内容パラメータセット（データセット１～データセット４^ＦＬ×５^ＦＮ）を取得する。そして、学習モデル生成部１１３ｓは、データセット１～データセット４^ＦＬ×５^ＦＮのそれぞれを、訓練データと検証データとに分割する。 The processing in the learning model generation unit 113s can be broadly divided into a learning phase and a verification phase. First, the learning phase will be described. The learning model generation unit 113s acquires the expanded content parameter sets (Dataset 1 to Dataset 4^FL × 5^FN) from the learning preprocessing unit 114. Then, the learning model generation unit 113s divides each of Datasets 1 to 4^FL × 5^FN into training data and verification data.
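The split into training data and verification data can be sketched as below. The specification does not state how the split is performed; the shuffled hold-out split, the 20% verification ratio, the fixed seed, and the function name `split_dataset` are all assumptions of this sketch.

```python
import random

def split_dataset(rows, validation_ratio=0.2, seed=0):
    """Shuffle rows reproducibly and return (training rows, verification rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row for verification.
    n_validation = max(1, int(len(shuffled) * validation_ratio))
    return shuffled[n_validation:], shuffled[:n_validation]

rows = list(range(10))  # stands in for the rows of one expanded dataset
training, validation = split_dataset(rows)
print(len(training), len(validation))
```

Using the same seed for every expanded dataset keeps the same drawings in the verification data across datasets, which makes the index values of different datasets easier to compare.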

学習モデル生成部１１３ｓは、データセット１～４^ＦＬ×５^ＦＮの内の任意の１つのデータセット（便宜上、注目データセットと称する）に対し、所定の複数種類の機械学習アルゴリズムのそれぞれを適用することによって、複数の学習モデルを生成する。具体的には、学習モデル生成部１１３ｓは、所定の複数種類の機械学習アルゴリズムのそれぞれを適用することによって、注目データセットの訓練データを用いて、複数の学習モデルを生成する。 The learning model generation unit 113s generates multiple learning models by applying each of multiple predetermined types of machine learning algorithms to any one of Datasets 1 to 4^FL × 5^FN (for convenience, referred to as the dataset of interest). Specifically, the learning model generation unit 113s applies each of the predetermined machine learning algorithms to the training data of the dataset of interest to generate the multiple learning models.

一例として、学習モデル生成部１１３ｓは、所定の複数種類の機械学習アルゴリズムのそれぞれを適用することによって、データセット１の訓練データを用いて、データセット１に対応する複数の学習モデルを生成する。このように、学習モデル生成部１１３ｓは、注目データセットに対応する複数の学習モデルを生成する。 As an example, the learning model generation unit 113s generates multiple learning models corresponding to Dataset 1 using the training data of Dataset 1 by applying each of multiple predetermined types of machine learning algorithms. In this way, the learning model generation unit 113s generates multiple learning models corresponding to the dataset of interest.

参考形態の例では、複数種類の機械学習アルゴリズムには、勾配ベースの機械学習アルゴリズムと距離ベースの機械学習アルゴリズムとが含まれる。勾配ベースの機械学習アルゴリズムとは、勾配降下法を利用した機械学習アルゴリズムである。勾配ベースの機械学習アルゴリズムの具体例としては、ＤＴ（Decision Tree，決定木）、ＬＲ（Logistic Regression，ロジスティック回帰）、およびＮＮ（Neural Network，ニューラルネットワーク）を挙げることができる。上述の多項ロジスティック回帰は、ＬＲの一例である。 In the example of the reference embodiment, the multiple types of machine learning algorithms include gradient-based machine learning algorithms and distance-based machine learning algorithms. A gradient-based machine learning algorithm is a machine learning algorithm that uses gradient descent. Specific examples of gradient-based machine learning algorithms include DT (Decision Tree), LR (Logistic Regression), and NN (Neural Network). The above-mentioned multinomial logistic regression is an example of LR.

本発明の一態様に係るＤＴとは、より厳密には、ＧＢＤＴ（Gradient Boosting DT，勾配ブースティング決定木）を意味する。従って、本発明の一態様に係るＤＴの例としては、ＸＧＢｏｏｓｔ（eXtreme Gradient Boosting）およびＬｉｇｈｔＧＢＭ（Light Gradient Boosting Machine）を挙げることができる。 The DT in one aspect of the present invention, more strictly speaking, refers to a GBDT (Gradient Boosting DT, gradient boosting decision tree). Therefore, examples of the DT in one aspect of the present invention include XGBoost (eXtreme Gradient Boosting) and LightGBM (Light Gradient Boosting Machine).

距離ベースの機械学習アルゴリズムとは、各入力データに含まれる１つ以上のパラメータの分布を示す距離空間における各入力データ間の距離に基づいて、各入力データを評価（例：分類）する機械学習アルゴリズムである。距離ベースの機械学習アルゴリズムの具体例としては、ＳＶＭ（Support Vector Machine，サポートベクターマシン）および重回帰を挙げることができる。 A distance-based machine learning algorithm is a machine learning algorithm that evaluates (e.g., classifies) each piece of input data based on the distance between each piece of input data in a metric space that represents the distribution of one or more parameters contained in each piece of input data. Specific examples of distance-based machine learning algorithms include SVM (Support Vector Machine) and multiple regression.

以上の通り、学習モデル生成部１１３ｓは、データセット１～４^ＦＬ×５^ＦＮのそれぞれに対して各機械学習アルゴリズムを網羅的に（総当たりで）適用することにより、複数の学習モデルを生成してよい。これにより、以下に述べる検証フェーズにおいて評価（検証）の対象となる学習モデルを、十分な数だけ生成できる。 As described above, the learning model generation unit 113s may generate multiple learning models by exhaustively applying each machine learning algorithm to each of Datasets 1 to 4^FL × 5^FN. This makes it possible to generate a sufficient number of learning models to be evaluated (verified) in the verification phase described below.

加えて、学習モデル生成部１１３ｓは、ある機械学習アルゴリズムの各ハイパーパラメータセットを変更してもよい。この場合、学習モデル生成部１１３ｓは、変更後のハイパーパラメータセットを用いて、同機械学習アルゴリズムを適用して学習モデルをさらに生成する。このように、学習モデル生成部１１３ｓは、ハイパーパラメータセットをも網羅的に適用することにより、複数の学習モデルを生成してもよい。これにより、さらに多くの学習モデルを生成できる（後述の図１０を参照）。 In addition, the learning model generation unit 113s may change each hyperparameter set of a certain machine learning algorithm. In this case, the learning model generation unit 113s uses the changed hyperparameter set to apply the same machine learning algorithm and generate a further learning model. In this way, the learning model generation unit 113s may generate multiple learning models by comprehensively applying hyperparameter sets as well. This allows for the generation of even more learning models (see Figure 10, described below).

（学習モデル生成部１１３ｓにおける検証フェーズ） (Verification phase in the learning model generation unit 113s)
続いて、検証フェーズについて述べる。学習モデル生成部１１３ｓは、学習フェーズにおいて生成された複数の学習モデルのそれぞれの品質を、データセット１～４^ＦＬ×５^ＦＮのそれぞれを用いて（より具体的には、データセット１～４^ＦＬ×５^ＦＮのそれぞれの検証データを用いて）評価する。 Next, the verification phase will be described. The learning model generation unit 113s evaluates the quality of each of the multiple learning models generated in the learning phase using each of Datasets 1 to 4^FL × 5^FN (more specifically, using the verification data of each of Datasets 1 to 4^FL × 5^FN).

一例として、注目データセットとしてデータセットｊを考える。ｊは、後述するＴＢ５（図１０を参照）の列番号を示す添字である。学習モデル生成部１１３ｓは、データセットｊに対応する複数の学習モデルのそれぞれについて、データセットｊの検証データを用いて、当該複数の学習モデルのそれぞれの予測精度（判定精度）を示す指標値を取得する。例えば、学習モデル生成部１１３ｓは、後述するモデル（ｉ，ｊ）にデータセットｊの検証データを入力することにより、上記指標値をモデル（ｉ，ｊ）に出力（導出）させる。上記指標値は、モデル（ｉ，ｊ）の品質を示す指標値とも表現できる。 As an example, consider dataset j as the dataset of interest. Here, j is a subscript indicating the column number of TB5 (see Figure 10), which will be described later. For each of the multiple learning models corresponding to dataset j, the learning model generation unit 113s uses the verification data of dataset j to obtain an index value indicating the prediction accuracy (determination accuracy) of that learning model. For example, the learning model generation unit 113s inputs the verification data of dataset j into model (i, j), which will be described later, and thereby causes model (i, j) to output (derive) the index value. The index value can also be described as an index value indicating the quality of model (i, j).

参考形態では、学習モデル生成部１１３ｓは、上記指標値として、Accuracy（正解率）を取得する。このことから、参考形態における予測精度は、検索精度（より詳細には、過去図面の検索精度）と称されてもよい（図１０を参照）。但し、当業者であれば明らかである通り、本発明の一態様に係る指標値は上記の例に限定されず、機械学習分野における公知のその他の指標値が用いられてもよい。従って、例えば、学習モデル生成部１１３ｓは、指標値として、Precision（適合率）またはRecall（再現率）を取得してもよい。あるいは、学習モデル生成部１１３ｓは、指標値として、Ｆスコア（F-score）を取得してもよい。周知の通り、Ｆスコアは、PrecisionとRecallとの調和平均である。 In the reference embodiment, the learning model generation unit 113s acquires Accuracy (the rate of correct answers) as the index value. Therefore, the prediction accuracy in the reference embodiment may also be referred to as search accuracy (more specifically, search accuracy of past drawings) (see Figure 10). However, as will be clear to those skilled in the art, the index value according to one aspect of the present invention is not limited to the above example, and other index values known in the field of machine learning may also be used. Therefore, for example, the learning model generation unit 113s may acquire Precision (the precision rate) or Recall (the recall rate) as the index value. Alternatively, the learning model generation unit 113s may acquire the F-score as the index value. As is well known, the F-score is the harmonic mean of Precision and Recall.
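The index values named above follow directly from counts of correct and incorrect predictions. The sketch below is a simplified binary illustration, not the evaluation code of the reference embodiment: the function name `classification_metrics` and the label lists are hypothetical, and a multi-class drawing search would need an averaging scheme that this specification does not specify.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # The F-score is the harmonic mean of precision and recall.
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Hypothetical verification labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```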

学習モデル生成部１１３ｓは、取得した複数の指標値に応じて、ベスト学習モデルを選択する。参考形態の例では、学習モデル生成部１１３ｓは、複数の指標値の内の最大値（最大指標値）を特定する。そして、学習モデル生成部１１３ｓは、最大指標値を有する学習モデルを、ベスト学習モデルとして選択する。 The learning model generation unit 113s selects the best learning model based on the multiple index values obtained. In the reference example, the learning model generation unit 113s identifies the maximum value (maximum index value) among the multiple index values. Then, the learning model generation unit 113s selects the learning model with the maximum index value as the best learning model.

学習モデル生成部１１３ｓは、検証フェーズにおける評価結果を示すテーブル（評価結果テーブル）を生成してよい。図１０のテーブルＴＢ５は、評価結果テーブルの一例である。ＴＢ５では、１つのデータセットと１つの機械学習アルゴリズムと１つのハイパーパラメータセット（例：Ｐａｒａ１）と１対１に対応するように、１つの指標値が記録されている。 The learning model generation unit 113s may generate a table (evaluation result table) that shows the evaluation results of the verification phase. Table TB5 in Figure 10 is an example of an evaluation result table. In TB5, one index value is recorded for each combination of one dataset, one machine learning algorithm, and one hyperparameter set (e.g., Para1).

図１０の例におけるＰａｒａ１およびＰａｒａ２はそれぞれ、ある１つの機械学習アルゴリズム（例：ＤＴ）に適用されるハイパーパラメータセット（一連のハイパーパラメータ）を示す。図１０の例では、
・ＤＴのＰａｒａ１：データ分割方法＝"gini"、最大深度＝３、…
・ＤＴのＰａｒａ２：データ分割方法＝"entropy"、最大深度＝３、…
・ＬＲのＰａｒａ１：正則化の種類＝"l2"、正則化項の係数＝１．０、…
・ＬＲのＰａｒａ２：正則化の種類＝"l2"、正則化項の係数＝０．５、…
・ＮＮのＰａｒａ１：バッチサイズ＝２５６、最大学習回数＝１０００、…
・ＮＮのＰａｒａ２：バッチサイズ＝１２８、最大学習回数＝１０００、…
・ＳＶＭのＰａｒａ１：カーネルの種類＝"rbf"、正則化項の係数＝１．０、…
・ＳＶＭのＰａｒａ２：カーネルの種類＝"rbf"、正則化項の係数＝０．５、…
の通りである。なお、当業者であれば明らかである通り、ハイパーパラメータセットの数は２つに限定されない。例えば、Ｐａｒａ１～Ｐａｒａ５までの５つのハイパーパラメータセットが、各機械学習アルゴリズムに対して割り当てられてもよい。 In the example of Figure 10, Para1 and Para2 each represent a hyperparameter set (a series of hyperparameters) applied to one machine learning algorithm (e.g., DT). In the example of Figure 10, the hyperparameter sets are as follows:
・DT Para1: data split criterion = "gini", maximum depth = 3, ...
・DT Para2: data split criterion = "entropy", maximum depth = 3, ...
・LR Para1: regularization type = "l2", regularization coefficient = 1.0, ...
・LR Para2: regularization type = "l2", regularization coefficient = 0.5, ...
・NN Para1: batch size = 256, maximum number of training iterations = 1000, ...
・NN Para2: batch size = 128, maximum number of training iterations = 1000, ...
・SVM Para1: kernel type = "rbf", regularization coefficient = 1.0, ...
・SVM Para2: kernel type = "rbf", regularization coefficient = 0.5, ...
As will be apparent to those skilled in the art, the number of hyperparameter sets is not limited to two. For example, five hyperparameter sets, Para1 to Para5, may be assigned to each machine learning algorithm.
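The hyperparameter sets listed above, together with the exhaustive pairing of algorithm, hyperparameter set, and dataset, can be sketched as a plain configuration. The key names (`split_criterion`, `max_depth`, and so on), the function `enumerate_models`, and the three dataset IDs are assumptions for illustration. The entries elided with "…" in the list are left out rather than guessed, and "gini" denotes the Gini impurity criterion.

```python
# Hyperparameter sets transcribed from the list above; key names are illustrative.
HYPERPARAMETER_SETS = {
    "DT": {
        "Para1": {"split_criterion": "gini", "max_depth": 3},
        "Para2": {"split_criterion": "entropy", "max_depth": 3},
    },
    "LR": {
        "Para1": {"regularization": "l2", "regularization_coefficient": 1.0},
        "Para2": {"regularization": "l2", "regularization_coefficient": 0.5},
    },
    "NN": {
        "Para1": {"batch_size": 256, "max_iterations": 1000},
        "Para2": {"batch_size": 128, "max_iterations": 1000},
    },
    "SVM": {
        "Para1": {"kernel": "rbf", "regularization_coefficient": 1.0},
        "Para2": {"kernel": "rbf", "regularization_coefficient": 0.5},
    },
}

def enumerate_models(dataset_ids):
    """Yield one (algorithm, hyperparameter set name, dataset id) per model to train."""
    for algorithm, para_sets in HYPERPARAMETER_SETS.items():
        for para_name in para_sets:
            for dataset_id in dataset_ids:
                yield algorithm, para_name, dataset_id

# Three dataset IDs stand in for datasets 1 to 4^FL x 5^FN.
jobs = list(enumerate_models(dataset_ids=[1, 2, 3]))
print(len(jobs))  # 4 algorithms x 2 hyperparameter sets x 3 datasets
```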

以下では、ＴＢ５のｉ行ｊ列目の成分を、ＴＢ５（ｉ，ｊ）と表記する。また、ＴＢ５（ｉ，ｊ）に対応する学習モデルを、モデル（ｉ，ｊ）と称する。ＴＢ５では、行方向（ｉ方向）に、機械学習アルゴリズムおよび当該機械学習アルゴリズムのハイパーパラメータセットが配列されている。そして、列方向（ｊ方向）に、データセットが配列されている。従って、一例として、図１０におけるＴＢ５（１，１）は、ＤＴにおいてＰａｒａ１が適用された場合に得られた検索精度である。図１０の例では、ＴＢ５（１，１）＝４２．１％である。上述の通り、ＴＢ５（１，１）は、Ｐａｒａ１が適用されたＤＴによって生成された学習モデル、すなわちモデル（１，１）の品質を示す指標値とも言える。 In the following, the component in the i-th row and j-th column of TB5 is denoted TB5(i,j), and the learning model corresponding to TB5(i,j) is referred to as model (i,j). In TB5, the machine learning algorithms and their hyperparameter sets are arranged in the row direction (i direction), and the datasets are arranged in the column direction (j direction). Therefore, as an example, TB5(1,1) in Figure 10 is the search accuracy obtained when Para1 is applied to DT. In the example of Figure 10, TB5(1,1) = 42.1%. As mentioned above, TB5(1,1) can also be regarded as an index value indicating the quality of the learning model generated by DT with Para1 applied, i.e., model (1,1).

説明の便宜上、図１０の例において、ＮＮのＰａｒａ１に対応する行番号を、ｉｍと表記する。図１０の例では、ＴＢ５（ｉｍ，１）＝８１．６％が、各ＴＢ５（ｉ，ｊ）の内の最大値である（ＴＢ５においてハッチングが付されているセルを参照）。 For ease of explanation, in the example in Figure 10, the row number corresponding to Para1 in the NN will be denoted as im. In the example in Figure 10, TB5(im,1) = 81.6% is the maximum value among each TB5(i,j) (see the hatched cells in TB5).
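The lookup of the maximum entry of TB5 can be sketched as follows. This is an illustrative assumption about how such a table might be scanned, not the implementation of the described device: only the values 42.1 and 81.6 come from the text, every other value is made up, the table is shortened to four rows, and `None` stands in for a cell in which no model was generated.

```python
# Shortened, illustrative stand-in for TB5 (values in percent).
# Only 42.1 and 81.6 appear in the text; all other entries are invented.
TB5 = [
    [42.1, 40.0, 39.5],   # DT, Para1
    [45.0, 44.2, 41.0],   # DT, Para2
    [81.6, 70.3, 66.8],   # NN, Para1 (row position shortened for illustration)
    [None, 58.9, None],   # SVM, Para1; None = no model generated
]


def best_cell(table):
    """Return (i, j, value) of the largest entry, using 1-based indices."""
    best = None
    for i, row in enumerate(table, start=1):
        for j, value in enumerate(row, start=1):
            if value is None:
                continue  # skipped cell, nothing to compare
            if best is None or value > best[2]:
                best = (i, j, value)
    return best


i_m, j_best, max_index_value = best_cell(TB5)
```

With this toy table, `i_m` plays the role of the row number im and `max_index_value` the role of TB5(im,1).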

以上の通り、学習装置１１ｓは、図面ａ１～Ｍ１（外形図）の内容パラメータセット（便宜上、第１図面種類内容パラメータセットと称する）に基づいて、複数の学習モデルを生成する。そして、学習装置１１ｓは、第１図面種類内容パラメータセットに基づいて生成した当該複数の学習モデルのそれぞれの品質を評価する（より具体的には、生成した複数の学習モデルのそれぞれの指標値を導出する）。 As described above, the learning device 11s generates multiple learning models based on the content parameter set (for convenience, referred to as the first drawing type content parameter set) for drawings a1 to M1 (outline drawings). The learning device 11s then evaluates the quality of each of the multiple learning models generated based on the first drawing type content parameter set (more specifically, it derives an index value for each of the generated learning models).

外形図についての上記の例と同様に、学習装置１１ｓは、図面の種類毎に、当該図面の内容パラメータセットに基づいて、複数の学習モデルを生成する。そして、学習装置１１ｓは、当該内容パラメータセットに基づいて生成した複数の学習モデルのそれぞれの品質を評価する。 Similar to the example above for the outline drawing, the learning device 11s generates multiple learning models for each type of drawing based on the content parameter set for that drawing. The learning device 11s then evaluates the quality of each of the multiple learning models generated based on the content parameter set.

一例として、学習装置１１ｓは、図面ａ２～Ｍ２（組立図）の内容パラメータセット（便宜上、第２図面種類内容パラメータセットと称する）に基づいて、複数の学習モデルを生成する。そして、学習装置１１ｓは、第２図面種類内容パラメータセットに基づいて生成した複数の学習モデルのそれぞれの品質を評価する。別の例として、学習装置１１ｓは、図面ａＮ～ＭＮ（構成図）の内容パラメータセット（便宜上、第Ｎ図面種類内容パラメータセットと称する）に基づいて、複数の学習モデルを生成する。そして、学習装置１１ｓは、第Ｎ図面種類内容パラメータセットに基づいて生成した複数の学習モデルのそれぞれの品質を評価する。 As one example, the learning device 11s generates multiple learning models based on the content parameter set (for convenience, referred to as the second drawing type content parameter set) for drawings a2 to M2 (assembly drawings). The learning device 11s then evaluates the quality of each of the multiple learning models generated based on the second drawing type content parameter set. As another example, the learning device 11s generates multiple learning models based on the content parameter set (for convenience, referred to as the Nth drawing type content parameter set) for drawings aN to MN (configuration drawings). The learning device 11s then evaluates the quality of each of the multiple learning models generated based on the Nth drawing type content parameter set.

参考形態では、以上の通り第１図面種類内容パラメータセット～第Ｎ図面種類内容パラメータセットに基づいて導出された全ての指標値の内、ＴＢ５（ｉｍ，１）が、最大値であるものとする。従って、参考形態では、学習モデル生成部１１３ｓは、ＴＢ５（ｉｍ，１）を最大指標値として特定する。そして、学習モデル生成部１１３ｓは、最大指標値を有する学習モデル、すなわちモデル（ｉｍ，１）を、ベスト学習モデルとして選択する。以上の通り、参考形態の例では、学習モデル生成部１１３ｓは、学習フェーズにおいて生成された複数の学習モデルの内、最も高品質な学習モデルを、ベスト学習モデルとして選択する。なお、本明細書では、ベスト学習モデルに対応する機械学習アルゴリズムを、ベスト機械学習アルゴリズムと称する。図１０の例におけるベスト機械学習アルゴリズムは、ＮＮである。 In the reference embodiment, as described above, of all the index values derived based on the first drawing type content parameter set through the Nth drawing type content parameter set, TB5(im,1) is assumed to be the maximum value. Therefore, in the reference embodiment, the learning model generation unit 113s identifies TB5(im,1) as the maximum index value. The learning model generation unit 113s then selects the learning model with the maximum index value, i.e., model (im,1), as the best learning model. As described above, in the example of the reference embodiment, the learning model generation unit 113s selects the highest quality learning model as the best learning model from among the multiple learning models generated in the learning phase. Note that in this specification, the machine learning algorithm corresponding to the best learning model is referred to as the best machine learning algorithm. In the example of Figure 10, the best machine learning algorithm is NN.

なお、当業者であれば明らかである通り、ベスト学習モデルの選択手法は上記の例に限定されない。学習モデル生成部１１３ｓは、複数の指標値に基づいて、複数の学習モデルの内から、ベスト学習モデルを選択できればよい。例えば、学習モデル生成部１１３ｓは、複数の指標値に基づいて統計値を導出し、当該統計値に基づいてベスト学習モデルを選択してよい。参考形態における最大指標値は、統計値の一例である。 As will be clear to those skilled in the art, the method for selecting the best learning model is not limited to the above example. It is sufficient that the learning model generation unit 113s can select the best learning model from among the multiple learning models based on the multiple index values. For example, the learning model generation unit 113s may derive a statistical value from the multiple index values and select the best learning model based on that statistical value. The maximum index value in the reference embodiment is one example of such a statistical value.
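One way to picture selection by a statistical value is a selector whose statistic is pluggable. The sketch below assumes, purely for illustration, that several index values are available per model (for example from repeated evaluation runs); that setup, the function name `select_best_model`, and all numbers other than 42.1 and 81.6 are hypothetical.

```python
from statistics import mean


def select_best_model(index_values, statistic=mean):
    """Return the model key whose index values give the largest statistic."""
    return max(index_values, key=lambda model: statistic(index_values[model]))


# Keys are (i, j) model identifiers; values are illustrative index values in percent.
scores = {
    (1, 1): [42.1, 44.0, 40.3],
    (5, 1): [81.6, 79.9, 80.4],
    (7, 2): [60.2, 85.0, 55.1],
}

best_by_mean = select_best_model(scores)                 # compares averages
best_by_peak = select_best_model(scores, statistic=max)  # compares single best run
```

The two statistics can disagree: here the average favors model (5, 1), while the single highest run favors model (7, 2), which is why the choice of statistic is left open.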

（学習フェーズについての補足）
ところで、距離ベースの機械学習アルゴリズムは、勾配ベースの機械学習アルゴリズムとは異なり、いわゆる「次元の呪い」の影響を受けることが知られている。このため、注目データセットの次元数が多い場合、距離ベースの機械学習アルゴリズムによって生成された学習モデル（以下、距離ベース学習モデルと称する）は、勾配ベースの機械学習アルゴリズムによって生成された学習モデル（以下、勾配ベース学習モデルと称する）に比べて、低品質な学習モデルとなる傾向にある。このことから、注目データセットの次元数が多い場合、当該注目データセットを用いて生成された距離ベース学習モデルがベスト学習モデルとして選択される可能性はそもそも低いと考えられる。 (Additional information about the learning phase)
Incidentally, unlike gradient-based machine learning algorithms, distance-based machine learning algorithms are known to be affected by the so-called "curse of dimensionality." Therefore, when the dataset of interest has a large number of dimensions, a learning model generated by a distance-based machine learning algorithm (hereinafter referred to as a distance-based learning model) tends to be of lower quality than a learning model generated by a gradient-based machine learning algorithm (hereinafter referred to as a gradient-based learning model). For this reason, when the dataset of interest has a large number of dimensions, a distance-based learning model generated from that dataset is unlikely to be selected as the best learning model in the first place.

そこで、参考形態では、学習モデル生成部１１３ｓは、注目データセットの次元数が所定の次元数閾値Ｄｔｈ以上である場合には、当該注目データセットを用いて距離ベースの機械学習アルゴリズムによって学習モデルを生成することを停止することが好ましい。これにより、品質が低いと予期される学習モデルが生成されることを未然に防止することができるので、学習フェーズにおける演算コストを低減できる。加えて、後続する評価フェーズにおける演算コストを低減することもできる。 Therefore, in the reference embodiment, when the number of dimensions of the dataset of interest is equal to or greater than a predetermined dimensionality threshold Dth, the learning model generation unit 113s preferably refrains from generating a learning model from that dataset with a distance-based machine learning algorithm. This prevents the generation of learning models that are expected to be of low quality, thereby reducing the computational cost of the learning phase. It also reduces the computational cost of the subsequent evaluation phase.

機械学習分野では、データセットの次元数が３０以上の場合、距離ベース学習モデルの品質が低下する傾向が高くなることが経験的に知られている。そこで、例えば、Ｄｔｈは３０以上の所定の値として設定されてよい。参考形態では、Ｄｔｈ＝３０に設定されている場合を例示する。 In the field of machine learning, it is empirically known that when the number of dimensions of a dataset is 30 or more, the quality of distance-based learning models tends to decline. Therefore, for example, Dth may be set to a predetermined value of 30 or more. In the reference embodiment, a case where Dth is set to 30 is illustrated.

上述の通り、参考形態の例では、データセット１の次元数は５０であり、データセット４^ＦＬ×５^ＦＮの次元数は１５１である。このため、図１０の例では、学習モデル生成部１１３ｓは、データセット１およびデータセット４^ＦＬ×５^ＦＮに対しては、距離ベースの機械学習アルゴリズム（例：ＳＶＭ）による学習モデルの生成を行わない（ＴＢ５において「×」マークが付されているセルを参照）。 As described above, in the example of the reference embodiment, the number of dimensions of Dataset 1 is 50, and the number of dimensions of Dataset 4^FL × 5^FN is 151. Therefore, in the example of Fig. 10, the learning model generation unit 113s does not generate learning models using a distance-based machine learning algorithm (e.g., SVM) for Dataset 1 and Dataset 4^FL × 5^FN (see the cells marked with "×" in TB5).

以上のことから、図１０の例では、学習モデル生成部１１３ｓは、データセット１およびデータセット４^ＦＬ×５^ＦＮに対しては、勾配ベースの機械学習アルゴリズム（例：ＤＴ、ＬＲ、およびＮＮ）のみを適用して、学習モデルを生成する。このように、学習モデル生成部１１３ｓは、データセット１およびデータセット４^ＦＬ×５^ＦＮに対しては、距離ベース学習モデルを生成することなく、勾配ベース学習モデルのみを生成する。 From the above, in the example of Fig. 10, the learning model generation unit 113s generates learning models by applying only gradient-based machine learning algorithms (e.g., DT, LR, and NN) to Dataset 1 and Dataset 4^FL × 5^FN. In this way, the learning model generation unit 113s generates only gradient-based learning models for Dataset 1 and Dataset 4^FL × 5^FN, without generating distance-based learning models.

他方、学習モデル生成部１１３ｓは、注目データセットの次元数がＤｔｈ未満である場合には、距離ベースの機械学習アルゴリズムを適用して、当該データセットを用いて学習モデルを生成してもよい。参考形態の例では、データセット２の次元数は２８である。このため、図１０の例では、学習モデル生成部１１３ｓは、データセット２に対しては、勾配ベース学習モデルを生成するとともに、距離ベース学習モデルをさらに生成する。 On the other hand, if the number of dimensions of the dataset of interest is less than Dth, the learning model generation unit 113s may apply a distance-based machine learning algorithm to generate a learning model from that dataset. In the example of the reference embodiment, the number of dimensions of Dataset 2 is 28. Therefore, in the example of Figure 10, the learning model generation unit 113s generates gradient-based learning models for Dataset 2 and additionally generates a distance-based learning model.
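The dimensionality gate described above can be sketched as a small function. The grouping of DT, LR, and NN as gradient-based and SVM as distance-based follows this description; the function name `algorithms_for` and the tuple representation are assumptions made for illustration.

```python
D_TH = 30  # dimensionality threshold Dth used in the reference embodiment

GRADIENT_BASED = ("DT", "LR", "NN")  # grouping as stated in this description
DISTANCE_BASED = ("SVM",)


def algorithms_for(n_dimensions, d_th=D_TH):
    """Return the algorithms to run for a dataset with n_dimensions dimensions."""
    if n_dimensions >= d_th:
        # Distance-based models are expected to be of low quality: skip them.
        return GRADIENT_BASED
    return GRADIENT_BASED + DISTANCE_BASED


for_dataset_1 = algorithms_for(50)   # Dataset 1: 50 dimensions
for_dataset_2 = algorithms_for(28)   # Dataset 2: 28 dimensions
```

Because the condition is "equal to or greater than Dth", a dataset with exactly 30 dimensions is also excluded from distance-based training.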

（学習モデル生成部１１３ｓにおける検証フェーズ後の処理）
学習モデル生成部１１３ｓは、ＴＢ５に含まれている各データセットのうち、ベスト学習モデルに対応する１つのデータセットを、ベストデータセットとして選択する。図１０の例では、学習モデル生成部１１３ｓは、データセット１をベストデータセットとして選択する。 (Processing after the verification phase in the learning model generation unit 113s)
The learning model generation unit 113s selects, as the best dataset, one dataset corresponding to the best learning model from among the datasets included in TB5. In the example of Fig. 10, the learning model generation unit 113s selects Dataset 1 as the best dataset.

続いて、学習モデル生成部１１３ｓは、ベストデータセットに対応する前処理手法を、ベスト前処理手法として選択する。参考形態の例では、学習モデル生成部１１３ｓは、上述のＴＢ４を参照し、データセット１に対応する前処理手法を、ベスト前処理手法として読み出す（図８のハッチング箇所を参照）。以上の説明から理解される通り、学習モデル生成部１１３ｓは、ベスト学習モデルに対応する前処理手法を、ベスト前処理手法として選択する。 The learning model generation unit 113s then selects the preprocessing method corresponding to the best dataset as the best preprocessing method. In the example of the reference embodiment, the learning model generation unit 113s refers to the above-mentioned TB4 and reads out the preprocessing method corresponding to Dataset 1 as the best preprocessing method (see the hatched area in Figure 8). As can be understood from the above explanation, the learning model generation unit 113s selects the preprocessing method corresponding to the best learning model as the best preprocessing method.

続いて、学習モデル生成部１１３ｓは、上述のＴＢ３ｉｎｉｔにおける「前処理手法」の項目に、ベスト前処理手法を記録することにより、ＴＢ３ｉｎｉｔを更新する。本明細書では、更新後の内容パラメータ設定初期テーブルを、ＴＢ３ｎｅｗと称する。図１１には、ＴＢ３ｎｅｗの一例が示されている。図１１の例では、データセット１に対応する前処理手法（換言すれば、ベスト学習モデルに対応する前処理手法）が、ベスト前処理手法として、「前処理手法」の項目に記録されている。 The learning model generation unit 113s then updates TB3init by recording the best preprocessing method in the "Preprocessing Method" field in the above-mentioned TB3init. In this specification, the updated content parameter setting initial table is referred to as TB3new. Figure 11 shows an example of TB3new. In the example of Figure 11, the preprocessing method corresponding to Dataset 1 (in other words, the preprocessing method corresponding to the best learning model) is recorded as the best preprocessing method in the "Preprocessing Method" field.
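The update from TB3init to TB3new can be sketched with plain dictionaries. The table layouts, the key names (`param_1`, `preprocessing_method`, `dataset_1`), and the contents of the second dataset row are hypothetical; only the assignment of [S], [S], [O], and [L] to the first, second, third, and Lth parameters is taken from the example of the reference embodiment.

```python
import copy


def update_setting_table(tb3_init, tb4, best_dataset):
    """Return a copy of tb3_init with the best preprocessing method recorded."""
    best_methods = tb4[best_dataset]       # preprocessing method per parameter
    tb3_new = copy.deepcopy(tb3_init)      # leave the initial table untouched
    for parameter, method in best_methods.items():
        tb3_new[parameter]["preprocessing_method"] = method
    return tb3_new


# Hypothetical layouts: one entry per content parameter.
tb3_init = {name: {"preprocessing_method": None}
            for name in ("param_1", "param_2", "param_3", "param_L")}
tb4 = {
    "dataset_1": {"param_1": "S", "param_2": "S", "param_3": "O", "param_L": "L"},
    "dataset_2": {"param_1": "N", "param_2": "S", "param_3": "O", "param_L": "L"},
}

tb3_new = update_setting_table(tb3_init, tb4, best_dataset="dataset_1")
```

Returning a copy keeps the initial table available, which matches the description's distinction between TB3init and TB3new.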

（図面検索装置１２）
続いて、図面検索装置１２について述べる。図面検索装置１２は、新規図面データ取得部１２１、新規図面内容パラメータ取得部１２２（ターゲット図面内容パラメータ取得部）、検索用前処理部１２５、および検索部１２６を備える。 (Drawing search device 12)
Next, a description will be given of the drawing search device 12. The drawing search device 12 includes a new drawing data acquisition unit 121, a new drawing content parameter acquisition unit 122 (target drawing content parameter acquisition unit), a search preprocessing unit 125, and a search unit 126.

図面検索装置１２は、学習装置１１ｓによって生成された学習モデル（参考形態の例では、ベスト学習モデル）を用いて、ターゲット図面を複数の検索対象図面のそれぞれと照合することにより、少なくとも１つの図面を検索する。参考形態の図面ＮＤは、ターゲット図面の一例である。以下に述べるように、図面検索装置１２では、上記学習モデルを用いて、図面ＮＤに対し、図面ａ１～ＭＮのそれぞれとの照合が行われる。 The drawing search device 12 searches for at least one drawing by matching the target drawing with each of the multiple search target drawings using the learning model (in the example of the reference embodiment, the best learning model) generated by the learning device 11s. Drawing ND in the reference embodiment is an example of a target drawing. As described below, the drawing search device 12 uses the learning model to match drawing ND with each of drawings a1 to MN.

（新規図面の取得）
新規図面データ取得部１２１は、過去図面データ取得部１１１と対になる機能部である。一例として、新規図面データ取得部１２１は、入力部７１が所定のユーザ操作を受け付けたことを契機として、新規物件図面ＤＢ９２の新規物件データセットに含まれている、所定の新規図面（例：図面ＮＤ）を取得する。新規図面データ取得部１２１は、取得した図面ＮＤを、新規図面内容パラメータ取得部１２２に供給する。 (Acquisition of new drawings)
The new drawing data acquisition unit 121 is a functional unit paired with the past drawing data acquisition unit 111. As an example, the new drawing data acquisition unit 121 acquires a predetermined new drawing (e.g., drawing ND) included in the new property data set of the new property drawing DB 92 when the input unit 71 receives a predetermined user operation. The new drawing data acquisition unit 121 supplies the acquired drawing ND to the new drawing content parameter acquisition unit 122.

（新規図面に対応する内容パラメータセットの取得）
新規図面内容パラメータ取得部１２２は、過去図面内容パラメータ取得部１１２と対になる機能部である。新規図面内容パラメータ取得部１２２は、過去図面内容パラメータ取得部１１２と同様の処理により、図面ＮＤに対応する内容パラメータセットを取得する。すなわち、新規図面内容パラメータ取得部１２２は、過去図面内容パラメータ取得部１１２と同じ解析手法によって図面ＮＤを解析することにより、当該図面ＮＤの内容パラメータを取得する。 (Getting the content parameter set corresponding to the new drawing)
The new drawing content parameter acquisition unit 122 is a functional unit paired with the past drawing content parameter acquisition unit 112. The new drawing content parameter acquisition unit 122 acquires a content parameter set corresponding to the drawing ND by the same processing as the past drawing content parameter acquisition unit 112. That is, the new drawing content parameter acquisition unit 122 acquires the content parameters of the drawing ND by analyzing the drawing ND with the same analysis method as the past drawing content parameter acquisition unit 112.

以下、図面ＮＤの第ｋ内容パラメータを、Ｃｋとも称する。なお、上述の検索対象図面内容パラメータとの区別のため、ターゲット図面（図面ＮＤ）の内容パラメータを、ターゲット図面内容パラメータとも称する。また、ターゲット図面の第ｋ内容パラメータを、ターゲット図面第ｋ内容パラメータとも称する。ターゲット図面内容パラメータは、新規図面内容パラメータと称されてもよい。このため、ターゲット図面第ｋ内容パラメータは、新規図面第ｋ内容パラメータと称されてもよい。 Hereinafter, the kth content parameter of drawing ND will also be referred to as Ck. Note that, to distinguish it from the search target drawing content parameters described above, the content parameter of the target drawing (drawing ND) will also be referred to as the target drawing content parameter. The kth content parameter of the target drawing will also be referred to as the target drawing kth content parameter. The target drawing content parameter may also be referred to as the new drawing content parameter. Therefore, the target drawing kth content parameter may also be referred to as the new drawing kth content parameter.

以上のように、新規図面内容パラメータ取得部１２２は、図面ＮＤに対し過去図面内容パラメータ取得部１１２と同様の処理を行うことにより、Ｃ１～ＣＬを設定する。その後、新規図面内容パラメータ取得部１２２は、Ｃ１～ＣＬを示す新規図面内容パラメータテーブルＴＢ－ＮＤを生成してよい。図１２には、ＴＢ－ＮＤの一例が示されている。 As described above, the new drawing content parameter acquisition unit 122 sets C1 to CL by performing the same processing on the drawing ND as the past drawing content parameter acquisition unit 112. The new drawing content parameter acquisition unit 122 may then generate a new drawing content parameter table TB-ND that indicates C1 to CL. Figure 12 shows an example of TB-ND.

（検索用前処理部１２５における処理の一例）
検索用前処理部１２５は、新規図面内容パラメータ取得部１２２から、図面ＮＤに対応する内容パラメータセット（便宜上、新規図面内容パラメータセットと称する）を取得する。具体的には、新規図面内容パラメータセットとは、図面ＮＤの第１～第Ｌ内容パラメータ（Ｃ１～ＣＬ）を含むデータセットを意味する。一例として、検索用前処理部１２５は、新規図面内容パラメータ取得部１２２から、上述のＴＢ－ＮＤを取得する。 (Example of processing in search preprocessing unit 125)
The search preprocessing unit 125 acquires a content parameter set (for convenience, referred to as a new drawing content parameter set) corresponding to the drawing ND from the new drawing content parameter acquisition unit 122. Specifically, the new drawing content parameter set refers to a data set including the first to Lth content parameters (C1 to CL) of the drawing ND. As an example, the search preprocessing unit 125 acquires the above-mentioned TB-ND from the new drawing content parameter acquisition unit 122.

また、検索用前処理部１２５は、学習モデル生成部１１３ｓから、ベスト前処理手法を取得する。一例として、検索用前処理部１２５は、学習モデル生成部１１３ｓからＴＢ３を取得し、ＴＢ３からベスト前処理手法を読み出す。 The search preprocessing unit 125 also obtains the best preprocessing method from the learning model generation unit 113s. As an example, the search preprocessing unit 125 obtains TB3 from the learning model generation unit 113s and reads the best preprocessing method from TB3.

続いて、検索用前処理部１２５は、ベスト前処理手法に従って、新規図面内容パラメータセットに対して前処理を施すことにより、前処理後新規図面内容パラメータセットを生成する。すなわち、検索用前処理部１２５は、ベスト前処理手法と同じ前処理手法をＣ１～ＣＬのそれぞれに施すことにより、前処理後新規図面内容パラメータセットを生成する。参考形態の例では、検索用前処理部１２５は、（ｉ）Ｃ１に［Ｓ］を施し、（ｉｉ）Ｃ２に［Ｓ］を施し、（ｉｉｉ）Ｃ３に［Ｏ］を施し、かつ、（ｉｖ）ＣＬに［Ｌ］を施す。 Next, the search preprocessing unit 125 generates a preprocessed new drawing content parameter set by preprocessing the new drawing content parameter set according to the best preprocessing method. That is, the search preprocessing unit 125 generates a preprocessed new drawing content parameter set by applying the same preprocessing method as the best preprocessing method to each of C1 to CL. In the example of the reference embodiment, the search preprocessing unit 125 (i) applies [S] to C1, (ii) applies [S] to C2, (iii) applies [O] to C3, and (iv) applies [L] to CL.

なお、新規図面内容パラメータセットは、ターゲット図面内容パラメータセットと称されてもよい。従って、前処理後新規図面内容パラメータセットは、前処理後ターゲット図面内容パラメータセットと称されてもよい。 Note that the new drawing content parameter set may also be referred to as the target drawing content parameter set. Therefore, the pre-processed new drawing content parameter set may also be referred to as the pre-processed target drawing content parameter set.

検索用前処理部１２５は、前処理後新規図面内容パラメータセットを示すテーブル（前処理後新規図面内容パラメータテーブル）を生成してよい。図１３に示されているＴＢ－ＮＤＰは、参考形態における前処理後新規図面内容パラメータテーブルの一例である。上述の説明から明らかである通り、前処理後新規図面内容パラメータセットは、データセット１と同じデータ構造を有している（上述の図６も参照）。 The search preprocessing unit 125 may generate a table indicating the preprocessed new drawing content parameter set (a preprocessed new drawing content parameter table). TB-NDP shown in Figure 13 is an example of the preprocessed new drawing content parameter table in the reference embodiment. As is clear from the above explanation, the preprocessed new drawing content parameter set has the same data structure as Dataset 1 (see also Figure 6 above).

なお、検索用前処理部１２５における前処理手法［Ｓ］の一例について説明すれば、次の通りである。以下では、検索用前処理部１２５によって、Ｃｋ（図面ＮＤの第ｋ内容パラメータ）を標準化する場合について述べる。 An example of the preprocessing method [S] in the search preprocessing unit 125 is as follows. The following describes the case where Ck (the kth content parameter of the drawing ND) is standardized by the search preprocessing unit 125.

まず、検索用前処理部１２５は、上述の標準化用データ表７００Ｂを参照し、ＡｋｍｅａｎおよびＡｋｓｄを取得する。続いて、検索用前処理部１２５は、
Ｃｋ＿Ｓｔａｎｄａｒｄｉｚｅｄ＝（Ｃｋ－Ａｋｍｅａｎ）／Ａｋｓｄ …（３）
の通り、Ｃｋ＿Ｓｔａｎｄａｒｄｉｚｅｄを算出する。Ｃｋ＿Ｓｔａｎｄａｒｄｉｚｅｄは、図面ＮＤにおける標準化後の第ｋ内容パラメータである。Ｃｋ＿Ｓｔａｎｄａｒｄｉｚｅｄは、標準化後新規図面第ｋ内容パラメータとも称される。標準化後新規図面第ｋ内容パラメータは、前処理後新規図面第ｋ内容パラメータの一例である。 First, the search preprocessing unit 125 refers to the standardization data table 700B described above and obtains Akmean and Aksd. Next, the search preprocessing unit 125 calculates Ck_Standardized as follows:
Ck_Standardized = (Ck - Akmean) / Aksd ... (3)
Ck_Standardized is the standardized k-th content parameter of the drawing ND. Ck_Standardized is also referred to as the standardized new drawing k-th content parameter. The standardized new drawing k-th content parameter is an example of the preprocessed new drawing k-th content parameter.
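Equation (3) can be written as a short function. The point it illustrates is that the mean and standard deviation come from the past drawings (Akmean and Aksd), not from the new drawing itself. The function name and the zero-deviation guard are assumptions added for the sketch, and the numbers are made up.

```python
def standardize(c_k, a_k_mean, a_k_sd):
    """Equation (3): standardize Ck using statistics of the past drawings."""
    if a_k_sd == 0:
        # Assumed guard: a parameter that is constant across the past
        # drawings cannot be standardized.
        raise ValueError("Aksd is zero; the k-th parameter has no spread")
    return (c_k - a_k_mean) / a_k_sd


# Illustrative numbers only: Ck = 12, Akmean = 10, Aksd = 4.
ck_standardized = standardize(12.0, 10.0, 4.0)  # (12 - 10) / 4 = 0.5
```

Reusing the stored statistics puts the new drawing on the same scale as the data the learning model was trained on.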

また、検索用前処理部１２５における前処理手法［Ｎ］の一例について説明すれば、次の通りである。まず、検索用前処理部１２５は、上述の正規化用データ表７００Ａを参照し、ＡｋｍａｘおよびＡｋｍｉｎを取得する。続いて、検索用前処理部１２５は、
Ｃｋ＿Ｎｏｒｍａｌｉｚｅｄ＝（Ｃｋ－Ａｋｍｉｎ）／（Ａｋｍａｘ－Ａｋｍｉｎ） …（４）
の通り、Ｃｋ＿Ｎｏｒｍａｌｉｚｅｄを算出する。Ｃｋ＿Ｎｏｒｍａｌｉｚｅｄは、図面ＮＤにおける正規化後の第ｋ内容パラメータである。Ｃｋ＿Ｎｏｒｍａｌｉｚｅｄは、正規化後新規図面第ｋ内容パラメータとも称される。正規化後新規図面第ｋ内容パラメータは、前処理後新規図面第ｋ内容パラメータの別の例である。 An example of the preprocessing method [N] in the search preprocessing unit 125 is as follows. First, the search preprocessing unit 125 refers to the normalization data table 700A described above and obtains Akmax and Akmin. Next, the search preprocessing unit 125 calculates Ck_Normalized as follows:
Ck_Normalized = (Ck - Akmin) / (Akmax - Akmin) ... (4)
Ck_Normalized is the normalized k-th content parameter of the drawing ND. Ck_Normalized is also referred to as the normalized new drawing k-th content parameter. The normalized new drawing k-th content parameter is another example of the preprocessed new drawing k-th content parameter.
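Equation (4) can be sketched in the same way. The function name and the guard against a zero range are assumptions, and the numbers are made up. Because Akmax and Akmin are taken from the past drawings, a new drawing whose value lies outside that range yields a result outside the interval from 0 to 1.

```python
def normalize(c_k, a_k_min, a_k_max):
    """Equation (4): min-max normalize Ck using the range of the past drawings."""
    value_range = a_k_max - a_k_min
    if value_range == 0:
        # Assumed guard: a parameter that is constant across the past
        # drawings has no range to scale by.
        raise ValueError("Akmax equals Akmin; the k-th parameter has no range")
    return (c_k - a_k_min) / value_range


inside = normalize(15.0, 10.0, 20.0)   # (15 - 10) / (20 - 10) = 0.5
outside = normalize(25.0, 10.0, 20.0)  # (25 - 10) / (20 - 10) = 1.5
```

Whether values outside the range should be clipped is not stated in this description, so the sketch leaves them as computed.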

（検索部１２６における検索フェーズ）
検索部１２６は、検索用前処理部１２５から、前処理後新規図面内容パラメータセットを取得する。また、検索部１２６は、学習モデル生成部１１３ｓから、ベスト学習モデルを取得する。検索部１２６は、前処理後新規図面内容パラメータセットをベスト学習モデルに入力する。そして、検索部１２６は、前処理後新規図面内容パラメータセットに応じたベスト学習モデルの出力を、ベスト学習モデルから取得する。 (Search Phase in Search Unit 126)
The search unit 126 acquires the preprocessed new drawing content parameter set from the search preprocessing unit 125. The search unit 126 also acquires the best learning model from the learning model generation unit 113s. The search unit 126 inputs the preprocessed new drawing content parameter set into the best learning model and then obtains from the best learning model the output corresponding to that parameter set.

一例として、参考形態における各学習モデルが、図面ＮＤに対する各過去図面（図面ａ１～ＭＮ）の関連性の高さを示すスコア（指標）である関連性スコアを出力（導出）するように訓練された学習モデルである場合を考える。関連性スコアの導出方法の例については、特許文献１を参照されたい。 As an example, consider a case where each learning model in the reference embodiment is a learning model trained to output (derive) a relevance score, which is a score (index) indicating the degree of relevance of each past drawing (drawings a1 to MN) to drawing ND. For an example of a method for deriving a relevance score, see Patent Document 1.

この場合、検索部１２６は、ベスト学習モデルに前処理後新規図面内容パラメータセットを入力することにより、当該前処理後新規図面内容パラメータセットに応じた関連性スコアを、ベスト学習モデルに出力させる。そして、検索部１２６は、ベスト学習モデルの出力（例：関連性スコア）に基づいて、図面ＮＤに対応する少なくとも１つの過去図面を検索する。関連性スコアに基づく当該過去図面の検索手法の例については、特許文献１を参照されたい。検索部１２６は、特許文献１と同様に、検索結果を示すデータを、表示部７２に表示させてよい。 In this case, the search unit 126 inputs the preprocessed new drawing content parameter set into the best learning model, causing the best learning model to output a relevance score corresponding to the preprocessed new drawing content parameter set. Then, the search unit 126 searches for at least one past drawing corresponding to the drawing ND based on the output (e.g., relevance score) of the best learning model. For an example of a method for searching for past drawings based on a relevance score, see Patent Document 1. As with Patent Document 1, the search unit 126 may display data indicating the search results on the display unit 72.
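As a rough sketch of the last step, the relevance scores output by the model can be ranked to pick the past drawings to return. How the scores are actually derived and used is left to Patent Document 1; the top-k ranking rule, the function name, and the score values below are assumptions for illustration only.

```python
def search_past_drawings(relevance_scores, top_k=1):
    """Return the identifiers of the top_k past drawings by relevance score."""
    ranked = sorted(relevance_scores.items(), key=lambda item: item[1], reverse=True)
    return [drawing_id for drawing_id, _score in ranked[:top_k]]


# Hypothetical model output: one relevance score per past drawing.
relevance_scores = {"a1": 0.12, "b1": 0.87, "M2": 0.55}

top_two = search_past_drawings(relevance_scores, top_k=2)
```

Setting `top_k=1` corresponds to retrieving the minimum of "at least one" past drawing.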

（参考形態の効果）
参考形態における情報処理システム１００ｓ（情報処理装置１ｓ）によれば、上記先行技術（特許文献１の技術）と同様に、図面検索におけるユーザの利便性を従来よりも高めることが可能となる。加えて、学習装置１１ｓによれば、上記先行技術とは異なり、検索対象図面内容パラメータセット（過去図面内容パラメータセット）に対して複数種類の前処理手法が網羅的に施されることにより、当該検索対象図面内容パラメータセットが拡張される。すなわち、複数の前処理後検索対象図面内容パラメータセットが生成される。 (Effects of the reference form)
The information processing system 100s (information processing device 1s) in the reference embodiment, like the prior art (the technology of Patent Document 1), can improve user convenience in drawing searches compared to conventional methods. In addition, unlike the prior art, the learning device 11s comprehensively applies multiple types of preprocessing methods to a search target drawing content parameter set (past drawing content parameter set), thereby expanding the search target drawing content parameter set. In other words, multiple preprocessed search target drawing content parameter sets are generated.

続いて、複数の機械学習アルゴリズムを適用することにより、複数の前処理後検索対象図面内容パラメータセット（例：データセット１～４^ＦＬ×５^ＦＮ）を用いて、複数の学習モデルが生成される。そして、複数の学習モデルのそれぞれの品質を示す指標値（例：過去図面の検索精度）に基づいて、当該複数の学習モデルの内から、ベスト学習モデルが選択される。言い換えれば、上記指標値に基づいて、複数の機械学習アルゴリズムの内から、ベスト機械学習アルゴリズムが選択される。続いて、ベスト学習モデルに対応するベスト前処理手法が選択される。 Next, by applying multiple machine learning algorithms, multiple learning models are generated using the multiple preprocessed search target drawing content parameter sets (e.g., Datasets 1 to 4^FL × 5^FN). Then, the best learning model is selected from among the multiple learning models based on an index value indicating the quality of each learning model (e.g., the search accuracy for past drawings). In other words, the best machine learning algorithm is selected from among the multiple machine learning algorithms based on the index value. The best preprocessing method corresponding to the best learning model is then selected.

一般的に、機械学習アルゴリズムによって生成される学習モデルの品質は、学習用データ（例：検索対象図面内容パラメータセット）に適用される前処理手法に応じて変化しうる。加えて、学習モデルの品質は、前処理後の学習用データに適用される機械学習アルゴリズムの種類に応じても変化しうる。 In general, the quality of a learning model generated by a machine learning algorithm may vary depending on the preprocessing method applied to the learning data (e.g., the set of drawing content parameters to be searched). In addition, the quality of the learning model may also vary depending on the type of machine learning algorithm applied to the preprocessed learning data.

適切な前処理手法および機械学習アルゴリズムが選択された場合には、高品質な学習モデルを生成することが可能である。但し、機械学習分野において適用可能な前処理手法および機械学習アルゴリズムの種類は、多岐に亘っている。このため、学習モデルの品質向上に好適な（理想的には最適な）前処理手法および機械学習アルゴリズムの組み合わせを、ユーザが人為的に選択することは必ずしも容易ではない。 When appropriate preprocessing methods and machine learning algorithms are selected, it is possible to generate high-quality learning models. However, a wide variety of preprocessing methods and machine learning algorithms are applicable in the field of machine learning. For this reason, it is not always easy for a user to manually select a combination of preprocessing method and machine learning algorithm that is suitable (ideally, optimal) for improving the quality of a learning model.

そこで、学習装置１１ｓでは、上述の通り、生成された複数の学習モデルのそれぞれを、上記指標値に基づいて網羅的に評価することにより、ベスト学習モデルおよびベスト前処理手法が選択される。すなわち、生成された複数の学習モデルに対してグリッドサーチを行うことにより、ベスト学習モデルおよびベスト前処理手法が選択される。 Therefore, as described above, the learning device 11s comprehensively evaluates each of the multiple generated learning models based on the above index values to select the best learning model and the best preprocessing method. In other words, the best learning model and the best preprocessing method are selected by performing a grid search on the multiple generated learning models.
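The exhaustive evaluation described above can be sketched as a loop over every combination of dataset (that is, preprocessing result), algorithm, and hyperparameter set. The function names and the toy lookup that stands in for training and validation are hypothetical; only the values 42.1 and 81.6 come from the text.

```python
from itertools import product


def grid_search(dataset_names, candidates, evaluate):
    """Evaluate every (dataset, algorithm, hyperparameter set) combination.

    evaluate(...) returns an index value, or None for a skipped combination.
    Returns (index_value, dataset_name, algorithm, para_name) of the best one.
    """
    best = None
    for dataset_name, (algorithm, para_name) in product(dataset_names, candidates):
        value = evaluate(dataset_name, algorithm, para_name)
        if value is None:
            continue  # e.g. a distance-based model skipped for a large dataset
        if best is None or value > best[0]:
            best = (value, dataset_name, algorithm, para_name)
    return best


# Toy stand-in for training plus validation; all values but 42.1 and 81.6 are made up.
TOY_RESULTS = {
    ("dataset_1", "DT", "Para1"): 42.1,
    ("dataset_1", "NN", "Para1"): 81.6,
    ("dataset_2", "NN", "Para1"): 70.3,
    ("dataset_2", "SVM", "Para1"): 58.9,
}


def toy_evaluate(dataset_name, algorithm, para_name):
    return TOY_RESULTS.get((dataset_name, algorithm, para_name))


best = grid_search(
    ["dataset_1", "dataset_2"],
    [("DT", "Para1"), ("NN", "Para1"), ("SVM", "Para1")],
    toy_evaluate,
)
```

The winning tuple names both the best learning model and, through its dataset, the preprocessing method that produced that dataset.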

上記の構成によれば、ユーザの人為的な選択を経ることなく、ベスト学習モデルおよびベスト前処理手法を特定することができる。すなわち、学習モデルの品質向上に最適である（少なくとも好適である）と期待される前処理手法および機械学習アルゴリズムの組み合わせを、学習装置１１ｓによって自動的に選択できる。 The above configuration makes it possible to identify the best learning model and the best preprocessing method without manual selection by the user. In other words, the learning device 11s can automatically select a combination of preprocessing method and machine learning algorithm that is expected to be optimal (or at least suitable) for improving the quality of the learning model.

その後、図面検索装置１２では、学習装置１１ｓによって選択されたベスト前処理手法に従って、ターゲット図面内容パラメータセット（新規図面内容パラメータセット）に対して前処理が施される。すなわち、ベスト前処理手法に従って、前処理後ターゲット図面内容パラメータセット（前処理後新規図面内容パラメータセット）が生成される。 Then, the drawing search device 12 applies preprocessing to the target drawing content parameter set (new drawing content parameter set) according to the best preprocessing method selected by the learning device 11s. In other words, a preprocessed target drawing content parameter set (preprocessed new drawing content parameter set) is generated according to the best preprocessing method.

上記の構成によれば、ベスト学習モデルに適したデータ構造を有する入力データセットとして、前処理後ターゲット図面内容パラメータセットが生成される。このため、当該前処理後ターゲット図面内容パラメータセットをベスト学習モデルに入力することにより、上記先行技術に比べてさらに高精度な学習モデルの出力（例：ベスト学習モデルによって導出された関連性スコア）を得ることができる。 With the above configuration, a preprocessed target drawing content parameter set is generated as an input dataset having a data structure suitable for the best learning model. Therefore, by inputting this preprocessed target drawing content parameter set into the best learning model, it is possible to obtain a learning model output (e.g., a relevance score derived by the best learning model) with even higher accuracy than the prior art described above.

以上の通り、情報処理システム１００ｓでは、（ｉ）学習装置１１ｓによって予め選択されたベスト前処理手法、および、（ｉｉ）学習装置１１ｓによって予め生成されたベスト学習モデルを用いて、図面検索装置１２に検索を行わせることができる。その結果、情報処理システム１００ｓによれば、上記先行技術に比べてさらに高い検索精度を実現できる。 As described above, the information processing system 100s can cause the drawing search device 12 to perform a search using (i) the best preprocessing method pre-selected by the learning device 11s and (ii) the best learning model pre-generated by the learning device 11s. As a result, the information processing system 100s can achieve even higher search accuracy than the prior art.

（参考形態における補足）
前処理手法［Ｌ］の一例について、以下に説明する。参考形態では、学習用前処理部１１４は、生値・ラベル値変換テーブルに従って、生値（過去図面内容パラメータ取得部１１２によって取得された内容パラメータ）をラベルエンコーディングしてよい。具体的には、学習用前処理部１１４は、ある生値と当該生値に対応するラベル値の対応関係を示すテーブル（以下、生値・ラベル値変換テーブルと称する）に従って、生値をラベル値へと変換してよい。 (Supplementary notes on the reference embodiment)
An example of the preprocessing method [L] will be described below. In a reference embodiment, the learning preprocessing unit 114 may label-encode raw values (content parameters acquired by the past drawing content parameter acquisition unit 112) according to a raw value-label value conversion table. Specifically, the learning preprocessing unit 114 may convert raw values into label values according to a table (hereinafter referred to as the raw value-label value conversion table) that indicates the correspondence between a certain raw value and the label value corresponding to the raw value.

一例として、情報処理システム１００ｓでは、第１～第Ｌ内容パラメータのそれぞれについて、個別の生値・ラベル値変換テーブルが予め設定されている。以下、第ｋ内容パラメータに対応する生値・ラベル値変換テーブルを、第ｋ生値・ラベル値変換テーブルと称する。 As an example, in the information processing system 100s, individual raw value-to-label value conversion tables are pre-set for each of the first through Lth content parameters. Hereinafter, the raw value-to-label value conversion table corresponding to the kth content parameter will be referred to as the kth raw value-to-label value conversion table.

図１４には、複数の生値・ラベル値変換テーブルの内の一部が例示されている。図１４において、（ｉ）符号１４００－１は第１生値・ラベル値変換テーブルを、（ｉｉ）符号１４００－２は第２生値・ラベル値変換テーブルを、（ｉｉｉ）符号１４００－３は第３生値・ラベル値変換テーブルを、（ｉｖ）符号１４００－Ｌは第Ｌ生値・ラベル値変換テーブルを、それぞれ表す。 Figure 14 illustrates some of the multiple raw value-to-label value conversion tables. In Figure 14, (i) symbol 1400-1 represents the first raw value-to-label value conversion table, (ii) symbol 1400-2 represents the second raw value-to-label value conversion table, (iii) symbol 1400-3 represents the third raw value-to-label value conversion table, and (iv) symbol 1400-L represents the Lth raw value-to-label value conversion table.

図１４の例では、学習用前処理部１１４は、第ｋ生値・ラベル値変換テーブルに従って、第ｋ内容パラメータをラベルエンコーディングしてよい。例えば、学習用前処理部１１４は、第１生値・ラベル値変換テーブルに従って、第１内容パラメータをラベルエンコーディングする。また、学習用前処理部１１４は、第Ｌ生値・ラベル値変換テーブルに従って、第Ｌ内容パラメータをラベルエンコーディングする。 In the example of FIG. 14, the learning preprocessing unit 114 may label encode the kth content parameter according to the kth raw value-to-label value conversion table. For example, the learning preprocessing unit 114 label encodes the first content parameter according to the first raw value-to-label value conversion table. Also, the learning preprocessing unit 114 label encodes the Lth content parameter according to the Lth raw value-to-label value conversion table.

以上の通り、第ｋ内容パラメータの変数種類によらず（すなわち、第ｋ内容パラメータがＶＬまたはＶＮのいずれであっても）、当該第ｋ内容パラメータに対し、前処理手法［Ｌ］が施されてよい。なお、上述の通り、第ｋ内容パラメータがＶＬである場合には、［Ｌ］は、［Ｒ］と等価な前処理手法であると言える。 As described above, regardless of the variable type of the kth content parameter (i.e., whether the kth content parameter is VL or VN), the preprocessing method [L] may be applied to the kth content parameter. Note that, as mentioned above, when the kth content parameter is VL, [L] can be said to be a preprocessing method equivalent to [R].

別の例として、情報処理システム１００ｓでは、第１～第Ｌ生値・ラベル値変換テーブルを統合したテーブル（以下、生値・ラベル値変換統合テーブルと称する）が予め作成されていてもよい。図１５における符号１５００は、生値・ラベル値変換統合テーブルの一例を表す。 As another example, the information processing system 100s may have created in advance a table that combines the first to Lth raw value-label value conversion tables (hereinafter referred to as the integrated raw value-label value conversion table). Reference numeral 1500 in Figure 15 represents an example of an integrated raw value-label value conversion table.

図１５の例において、生値・ラベル値変換統合テーブルのｋ行目は、第ｋ生値・ラベル値変換テーブルに対応する。従って、学習用前処理部１１４は、生値・ラベル値変換統合テーブルのｋ行目を参照し、第ｋ内容パラメータをラベルエンコーディングしてもよい。例えば、学習用前処理部１１４は、生値・ラベル値変換統合テーブルの２行目を参照し、第２内容パラメータをラベルエンコーディングする。 In the example of Figure 15, the kth row of the integrated raw value/label value conversion table corresponds to the kth raw value/label value conversion table. Therefore, the learning preprocessing unit 114 may refer to the kth row of the integrated raw value/label value conversion table and label encode the kth content parameter. For example, the learning preprocessing unit 114 refers to the second row of the integrated raw value/label value conversion table and label encodes the second content parameter.
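Label encoding by conversion table amounts to a dictionary lookup. In the sketch below, a mapping keyed by k plays the role of the integrated table, with entry k corresponding to the kth raw value-label value conversion table. The raw values, label values, and names are invented for illustration and do not come from Fig. 14 or Fig. 15.

```python
# Hypothetical integrated table: entry k is the kth raw value-label value table.
INTEGRATED_TABLE = {
    1: {"indoor": 0, "outdoor": 1},
    2: {"steel": 0, "aluminum": 1, "resin": 2},
}


def label_encode(k, raw_value, integrated_table=INTEGRATED_TABLE):
    """Convert the raw value of the kth content parameter into its label value."""
    return integrated_table[k][raw_value]


def label_encode_all(raw_values, integrated_table=INTEGRATED_TABLE):
    """Encode a mapping {k: raw value} parameter by parameter."""
    return {k: label_encode(k, raw, integrated_table) for k, raw in raw_values.items()}


encoded = label_encode_all({1: "outdoor", 2: "resin"})
```

A raw value missing from its table raises `KeyError` here; how unseen values should be handled is not covered by this description.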

〔実施形態１〕 [Embodiment 1]
図１６は、実施形態１の情報処理システム１００の要部の構成を示すブロック図である。情報処理システム１００の情報処理装置を、情報処理装置１と称する。情報処理装置１の制御装置を、制御装置１０と称する。制御装置１０の学習装置を、学習装置１１（モデル生成装置）と称する。 FIG. 16 is a block diagram showing the configuration of the main parts of the information processing system 100 of embodiment 1. The information processing device of the information processing system 100 is referred to as the information processing device 1. The control device of the information processing device 1 is referred to as the control device 10. The learning device of the control device 10 is referred to as the learning device 11 (model generation device).

学習装置１１は、学習装置１１ｓとは異なり、決定部１１５をさらに備える。また、学習装置１１は、学習装置１１ｓの学習モデル生成部１１３ｓに替えて、学習モデル生成部１１３（学習部）を備える。学習装置１１の各部の動作の説明に先立ち、参考形態において改善可能な点について以下に述べる。 Unlike learning device 11s, learning device 11 further includes a determination unit 115. Furthermore, learning device 11 includes a learning model generation unit 113 (learning unit) instead of the learning model generation unit 113s of learning device 11s. Before explaining the operation of each unit of learning device 11, possible improvements in the reference embodiment will be described below.

上述の通り、参考形態では、第ｋ内容パラメータ変数種類情報に応じた複数種類の前処理手法を第ｋ内容パラメータに適用することにより、内容パラメータセットが拡張される。次いで、拡張後内容パラメータセットを用いた学習モデルの生成および検証（評価）を通じて、ベスト学習モデルおよびベスト前処理手法が選択される。 As described above, in the reference embodiment, the content parameter set is expanded by applying multiple preprocessing methods to the kth content parameter according to the kth content parameter variable type information. Next, the best learning model and best preprocessing method are selected through the generation and verification (evaluation) of a learning model using the expanded content parameter set.

しかしながら、参考形態では、第ｋ内容パラメータに適用される前処理手法次第では、拡張後内容パラメータセットに含まれる複数の説明変数間（複数の前処理後内容パラメータ間）において、多重共線性（multicollinearity）が発生しうる。当業者であれば理解できる通り、多重共線性が発生している複数の説明変数を用いて学習モデルを生成した場合には、当該学習モデルの品質が低下しうる。 However, in the reference embodiment, depending on the preprocessing method applied to the kth content parameter, multicollinearity may occur between the multiple explanatory variables included in the expanded content parameter set (between the multiple preprocessed content parameters). As will be understood by those skilled in the art, if a learning model is generated using multiple explanatory variables in which multicollinearity occurs, the quality of the learning model may deteriorate.

また、当業者であれば理解できる通り、多重共線性の発生リスクは、ある学習モデルを生成するための学習用データの次元数（参考形態の例では、前処理後内容パラメータ数）が大きくなるにつれて高くなる。上述の通り、複数種類の前処理手法のうちの１つは、［Ｏ］（ワンホットエンコーディング）でありうる。参考形態における説明から理解できる通り、［Ｏ］は、前処理後内容パラメータ数の増加をもたらす前処理手法の典型例であると言える。 Furthermore, as will be understood by those skilled in the art, the risk of multicollinearity increases as the number of dimensions of the training data used to generate a certain training model (in the example of the reference embodiment, the number of post-preprocessing content parameters) increases. As described above, one of the multiple types of preprocessing methods can be [O] (one-hot encoding). As can be understood from the explanation in the reference embodiment, [O] can be said to be a typical example of a preprocessing method that results in an increase in the number of post-preprocessing content parameters.

このことから、より多くの第ｋ内容パラメータに［Ｏ］が適用されるほど、多重共線性の発生リスクが高まると懸念される。従って、学習モデルの品質をさらに高めるためには（例：より高品質なベスト学習モデルを得るためには）、多重共線性を排除するための方策を導入することが好ましいと考えられる。実施形態１の学習装置１１は、この考え方に基づき、本願の発明者らによって新たに創作された。 Consequently, there is concern that the risk of multicollinearity increases as [O] is applied to more of the k-th content parameters. To further improve the quality of the learning model (e.g., to obtain a higher-quality best learning model), it is therefore considered preferable to introduce measures to eliminate multicollinearity. The learning device 11 of embodiment 1 was newly created by the inventors of the present application based on this idea.
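The concern about [O] can be illustrated with a small sketch. It assumes a hypothetical two-category raw parameter (the values "AC" and "DC" and the helper names `pearson` and `one_hot` are illustrative and do not appear in the specification). One-hot encoding such a parameter yields two columns that are exact complements, so their squared correlation is 1.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def one_hot(values):
    """One-hot encode a categorical series into one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1.0 if v == c else 0.0 for v in values] for c in categories}

# Hypothetical raw values of one categorical content parameter.
raw = ["AC", "DC", "AC", "AC", "DC", "DC"]
columns = one_hot(raw)

# The two encoded columns are complements of each other, so they are
# perfectly (negatively) correlated and the squared correlation is 1.
r = pearson(columns["AC"], columns["DC"])
r_squared = r * r
```

With more than two categories the encoded columns are no longer pairwise complements, but they still sum to 1 for every item, which is the linear dependence that motivates pruning.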

（決定部１１５の処理の一例） (Example of processing by the determination unit 115)
以下、内容パラメータセットの拡張によって得られたある１つのデータセット（すなわち、上述の注目データセット）に対する、決定部１１５の一連の処理について説明する。以下では、上述の図９における符号９００Ａに示されているデータセット１が、注目データセットである場合について説明する。決定部１１５は、データセット１に含まれている異なる２つの前処理後内容パラメータセットのそれぞれについて、決定係数を算出する。 The following describes a series of processes performed by the determination unit 115 on one data set obtained by expanding the content parameter set (i.e., the above-mentioned data set of interest). The description below assumes that data set 1, indicated by reference numeral 900A in FIG. 9, is the data set of interest. The determination unit 115 calculates a coefficient of determination for each pair of two different preprocessed content parameters included in data set 1.

具体的には、決定部１１５は、
…（５）
の通り、決定係数Ｒｉｊを算出する。決定係数Ｒｉｊは、以下に述べるＩｋとＪｋとの間の決定係数である。実施形態１の説明において、ｉは１≦ｉ＜Ｐを満たす整数であり、ｊはｉ＜ｊ≦Ｐを満たす整数である。Ｐは、注目データセットの次元数（行数）である。Ｐは、注目データセットの系列数と称されてもよい。実施形態１の例では、Ｐ＝５０である（上述の図５を参照）。
Specifically, the determination unit 115 calculates the coefficient of determination Rij according to
…(5)
The coefficient of determination Rij is the coefficient of determination between Ik and Jk described below. In the description of embodiment 1, i is an integer that satisfies 1≦i<P, and j is an integer that satisfies i<j≦P. P is the number of dimensions (number of rows) of the data set of interest. P may also be referred to as the number of series of the data set of interest. In the example of embodiment 1, P=50 (see FIG. 5 above).

式（５）におけるＩｋは、データセット１に含まれているＰ個の前処理後内容パラメータの内、ｉ番目の前処理後内容パラメータ（ある１つの前処理後内容パラメータ）である。Ｊｋは、データセット１に含まれているＰ個の前処理後内容パラメータの内、ｊ番目の前処理後内容パラメータ（別の１つの前処理後内容パラメータ）である。Ｑは、注目データセットの項目数（列数）である。実施形態１の例では、Ｑ＝Ｍである。Ｉａｖｅは、Ｉ１～ＩＱの平均値である。Ｊａｖｅは、Ｊ１～ＪＱの平均値である。 In equation (5), Ik is the i-th preprocessed content parameter (one preprocessed content parameter) of the P preprocessed content parameters included in dataset 1. Jk is the j-th preprocessed content parameter (another preprocessed content parameter) of the P preprocessed content parameters included in dataset 1. Q is the number of items (number of columns) in the dataset of interest. In the example of embodiment 1, Q = M. Iave is the average value of I1 to IQ. Jave is the average value of J1 to JQ.
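Equation (5) itself is not reproduced in this text. One form that is consistent with the definitions of Ik, Jk, Q, Iave, and Jave given here, and with the statement in this description that Rij equals the square of the correlation coefficient sij between Ik and Jk, would be the squared Pearson correlation. The following is a reconstruction under that assumption, not the published equation:

```latex
R_{ij} = s_{ij}^{2}
       = \frac{\left[\sum_{k=1}^{Q} (I_k - I_{\mathrm{ave}})(J_k - J_{\mathrm{ave}})\right]^{2}}
              {\sum_{k=1}^{Q} (I_k - I_{\mathrm{ave}})^{2}\;\sum_{k=1}^{Q} (J_k - J_{\mathrm{ave}})^{2}}
```

Under this form, Rij lies between 0 and 1 and is unchanged if Ik and Jk are swapped.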

以上の通り、決定部１１５は、注目データセットに含まれる任意の２つの異なる前処理後内容パラメータ間の決定係数を算出してよい。実施形態１では、決定部１１５は、式（５）に従って、ₚＣ₂通りのＲｉｊを算出する。すなわち、決定部１１５は、注目データセットに含まれる異なる２つの検索対象図面内容パラメータの組み合わせパターンのそれぞれについて、Ｒｉｊを算出する。 As described above, the determination unit 115 may calculate the coefficient of determination between any two different preprocessed content parameters included in the data set of interest. In embodiment 1, the determination unit 115 calculates ₚC₂ values of Rij (one for each way of choosing 2 of the P parameters) according to equation (5). That is, the determination unit 115 calculates Rij for each combination pattern of two different search target drawing content parameters included in the data set of interest.

式（５）に示されている通り、Ｒｉｊは、ＩｋとＪｋとの間の相関係数ｓｉｊの２乗値として表すことができる。従って、Ｒｉｊを、ＩｋとＪｋとの間における多重共線性の程度（強さ）を示す評価値（多重共線性評価値）として用いることができる。実施形態１におけるＲｉｊは、多重共線性評価値の一例である。 As shown in equation (5), Rij can be expressed as the square of the correlation coefficient sij between Ik and Jk. Therefore, Rij can be used as an evaluation value (multicollinearity evaluation value) that indicates the degree (strength) of multicollinearity between Ik and Jk. Rij in embodiment 1 is an example of a multicollinearity evaluation value.
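The pairwise calculation can be sketched as follows, assuming Rij is the squared Pearson correlation (as the text states it equals the square of sij). The parameter labels, the sample values, and the handling of constant series are illustrative assumptions, not taken from the specification.

```python
from itertools import combinations

def coefficient_of_determination(series_i, series_j):
    """Squared correlation between two series of Q items each."""
    q = len(series_i)
    i_ave = sum(series_i) / q
    j_ave = sum(series_j) / q
    cov = sum((a - i_ave) * (b - j_ave) for a, b in zip(series_i, series_j))
    var_i = sum((a - i_ave) ** 2 for a in series_i)
    var_j = sum((b - j_ave) ** 2 for b in series_j)
    if var_i == 0 or var_j == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov * cov / (var_i * var_j)

def pairwise_r(dataset):
    """Return {(name_i, name_j): Rij} for every pair with i < j."""
    names = list(dataset)
    return {
        (a, b): coefficient_of_determination(dataset[a], dataset[b])
        for a, b in combinations(names, 2)
    }

# Hypothetical data set of interest: P = 4 rows (parameters), Q = 5 items.
dataset = {
    "voltage": [1.0, 2.0, 3.0, 4.0, 5.0],
    "current": [2.0, 4.0, 6.0, 8.0, 10.0],
    "OR_1": [0.0, 1.0, 0.0, 1.0, 0.0],
    "OR_2": [1.0, 0.0, 1.0, 0.0, 1.0],
}
r = pairwise_r(dataset)
```

For P parameters the dictionary holds P(P−1)/2 entries, so the P = 50 example in the text would give 1225 pairs.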

決定部１１５は、算出した各決定係数を、所定の閾値Ｒｔｈ（決定係数閾値）と比較する。決定係数閾値は、多重共線性評価値に対する閾値（多重共線性閾値）の一例である。本明細書では、「多重共線性評価値が多重共線性閾値以上である」という条件を満たしている多重共線性評価値を、高リスク多重共線性評価値と称する。従って、実施形態１では、「決定係数がＲｔｈ以上である」という条件を満たしている決定係数を、高リスク決定係数と称する。 The determination unit 115 compares each calculated coefficient of determination with a predetermined threshold Rth (coefficient of determination threshold). The coefficient of determination threshold is an example of a threshold for the multicollinearity assessment value (multicollinearity threshold). In this specification, a multicollinearity assessment value that satisfies the condition that "the multicollinearity assessment value is equal to or greater than the multicollinearity threshold" is referred to as a high-risk multicollinearity assessment value. Therefore, in embodiment 1, a coefficient of determination that satisfies the condition that "the coefficient of determination is equal to or greater than Rth" is referred to as a high-risk coefficient of determination.

実施形態１では、決定部１１５は、各前処理後内容パラメータについて、各決定係数をＲｔｈと比較し、各決定係数の内から高リスク決定係数を抽出する。そして、決定部１１５は、各前処理後内容パラメータについて、抽出した高リスク決定係数の個数を計上（カウントアップ）する。 In embodiment 1, the determination unit 115 compares each coefficient of determination with Rth for each preprocessed content parameter and extracts high-risk coefficients of determination from among the coefficients of determination. The determination unit 115 then counts up the number of high-risk coefficients of determination extracted for each preprocessed content parameter.

データサイエンス分野では、２つの説明変数間の相関係数の絶対値（以下、相関係数絶対値とも称する）が０．７以上である場合に、当該２つの説明変数間に強い相関（あるいは、やや強い相関）が存在していると評価されることが多い。このことから、一例として、Ｒｔｈは、０．４９（＝０．７^２）以上かつ１以下の所定の値として設定されることが好ましい。実施形態１では、Ｒｔｈが０．４９に設定されている場合を例示する。但し、当業者であれば明らかである通り、Ｒｔｈは上記の例に限定されない。 In the field of data science, when the absolute value of the correlation coefficient between two explanatory variables (hereinafter also referred to as the correlation coefficient absolute value) is 0.7 or more, the two explanatory variables are often evaluated as having a strong (or moderately strong) correlation. For this reason, as an example, Rth is preferably set to a predetermined value of 0.49 (= 0.7^2) or more and 1 or less. Embodiment 1 illustrates the case where Rth is set to 0.49. However, as will be clear to those skilled in the art, Rth is not limited to the above example.
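The thresholding and counting step can be sketched as below. The sketch assumes that each high-risk pair is counted once for each of its two parameters (as a symmetric table would imply); the labels and Rij values are hypothetical.

```python
R_TH = 0.49  # coefficient of determination threshold used in the example

def count_high_risk(r_values, names, r_th=R_TH):
    """Count, per parameter, the coefficients of determination that are >= r_th."""
    counts = {name: 0 for name in names}
    for (a, b), r in r_values.items():
        if r >= r_th:
            # Assumption: a high-risk pair counts for both of its parameters.
            counts[a] += 1
            counts[b] += 1
    return counts

# Hypothetical Rij values for four preprocessed content parameters.
names = ["voltage", "current", "OR_1", "OR_2"]
r_values = {
    ("voltage", "current"): 0.81,
    ("voltage", "OR_1"): 0.10,
    ("voltage", "OR_2"): 0.55,
    ("current", "OR_1"): 0.02,
    ("current", "OR_2"): 0.49,
    ("OR_1", "OR_2"): 0.30,
}
counts = count_high_risk(r_values, names)
```

Because the condition is "equal to or greater than", the pair with a value of exactly 0.49 is counted as high risk here.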

決定部１１５は、各Ｒｉｊと高リスク決定係数の個数との対応関係を示すテーブル（決定係数・高リスク決定係数個数テーブル）を生成する。図１７のＴＢ６は、決定係数・高リスク決定係数個数テーブルの一例である。なお、以下に述べる通り、ＴＢ６は、決定部１１５における一連の処理を通じて更新される。このことから、本明細書では、初期状態のＴＢ６を、ＴＢ６ｉｎｉｔとも称する。 The determination unit 115 generates a table (coefficient of determination/number of high-risk coefficients of determination table) that shows the correspondence between each Rij and the number of high-risk coefficients of determination. TB6 in Figure 17 is an example of a coefficient of determination/number of high-risk coefficients of determination table. As described below, TB6 is updated through a series of processes in the determination unit 115. For this reason, in this specification, TB6 in its initial state is also referred to as TB6init.

図１７には、ＴＢ６ｉｎｉｔの一例が示されている。ＴＢ６ｉｎｉｔのｉ行ｊ列目には、ｉ番目の各前処理後内容パラメータ（すなわちＩｋ）とｊ番目の各前処理後内容パラメータ（すなわちＪｋ）との間の決定係数Ｒｉｊが記録されている。そして、ＴＢ６ｉｎｉｔの最右端列（Ｐ＋１列目）には、Ｉｋに対応する高リスク決定係数の個数（説明の便宜上、「Ｉｋが有する高リスク決定係数の個数」とも称する）が記録されている。図１７の例では、高リスク決定係数にハッチングが付されている。 Figure 17 shows an example of TB6init. The coefficient of determination Rij between the i-th preprocessed content parameter (i.e., Ik) and the j-th preprocessed content parameter (i.e., Jk) is recorded in the i-th row and j-th column of TB6init. The rightmost column (P+1 column) of TB6init records the number of high-risk coefficients of determination corresponding to Ik (for ease of explanation, also referred to as "the number of high-risk coefficients of determination possessed by Ik"). In the example of Figure 17, high-risk coefficients of determination are hatched.

一例として、決定部１１５は、ＴＢ６内（図１７の例では、ＴＢ６ｉｎｉｔ内）の各前処理後内容パラメータの内、最も多い高リスク決定係数の個数（最多高リスク決定係数個数）を有する前処理後内容パラメータを、削除対象となる削除対象前処理後内容パラメータとして決定してよい。図１７の例では、ＴＢ６ｉｎｉｔに含まれる各前処理後内容パラメータの内、５行目の前処理後内容パラメータ「ＯＲ＿２」が、最多高リスク決定係数個数（６個）を有している（図１７中の「高リスク決定係数の個数」における、点線によって図示された矩形部を参照）。このため、決定部１１５は、当該前処理後内容パラメータ「ＯＲ＿２」を、削除対象前処理後内容パラメータとして決定する。そして、決定部１１５は、ＴＢ６ｉｎｉｔから、「ＯＲ＿２」に対応する行および列（すなわち、第５行および第５列）を削除することにより、ＴＢ６を更新する。 As an example, the determination unit 115 may determine, from among the preprocessed content parameters in TB6 (TB6init in the example of Figure 17), the preprocessed content parameter having the highest number of high-risk coefficients of determination as the preprocessed content parameter to be deleted. In the example of Figure 17, among the preprocessed content parameters included in TB6init, the preprocessed content parameter "OR_2" in the fifth row has the highest number of high-risk coefficients of determination (6) (see the dotted rectangle under "Number of high-risk coefficients of determination" in Figure 17). The determination unit 115 therefore determines "OR_2" as the preprocessed content parameter to be deleted. The determination unit 115 then updates TB6 by deleting the row and column corresponding to "OR_2" (i.e., the fifth row and fifth column) from TB6init.

図１８のＴＢ６ａは、ＴＢ６ｉｎｉｔが上述の通り更新されることによって得られたＴＢ６の一例である。決定部１１５は、ＴＢ６ｉｎｉｔから、「ＯＲ＿２」に対応する行および列を削除した後、ＴＢ６ｉｎｉｔの最右端列に記録されていた、Ｉｋに対応する高リスク決定係数の個数を更新する。従って、ＴＢ６ａの最右端列には、更新後の高リスク決定係数の個数が記録されている。 TB6a in Figure 18 is an example of TB6 obtained by updating TB6init as described above. After deleting the row and column corresponding to "OR_2" from TB6init, the determination unit 115 updates the number of high-risk coefficients of determination corresponding to Ik that was recorded in the rightmost column of TB6init. Therefore, the rightmost column of TB6a records the updated numbers of high-risk coefficients of determination.

図１８の例では、ＴＢ６ａに含まれる各前処理後内容パラメータの内、２行目の前処理後内容パラメータ「電流値」および４行目の前処理後内容パラメータ「ＯＲ＿１」がそれぞれ、最多高リスク決定係数個数（３個）を有している（図１８中の「高リスク決定係数の個数」における、点線によって図示された矩形部を参照）。このように、ＴＢ６において、複数の前処理後内容パラメータが、同数の最多高リスク決定係数個数を有する場合も考えられる。 In the example of Figure 18, among the preprocessed content parameters included in TB6a, the preprocessed content parameter "current value" in the second row and the preprocessed content parameter "OR_1" in the fourth row each have the highest number of high-risk coefficients of determination (3) (see the dotted rectangle under "Number of high-risk coefficients of determination" in Figure 18). Thus, multiple preprocessed content parameters in TB6 may be tied for the highest number of high-risk coefficients of determination.

そこで、このような場合、決定部１１５は、最多高リスク決定係数個数を有する複数の前処理後内容パラメータを、削除対象候補前処理後内容パラメータとして決定してよい。次いで、決定部１１５は、複数の削除対象候補前処理後内容パラメータの内、最も大きい決定数を有する削除対象候補前処理後内容パラメータを、削除対象前処理後内容パラメータとして決定してよい。なお、削除対象前処理後内容パラメータは、学習対象外前処理後内容パラメータと称されてもよい。同様に、削除対象候補前処理後内容パラメータは、学習対象外候補前処理後内容パラメータと称されてもよい。 In such a case, the determination unit 115 may determine the multiple preprocessed content parameters having the highest number of high-risk coefficients of determination as candidate preprocessed content parameters for deletion. Next, the determination unit 115 may determine, from among these candidates, the one having the largest coefficient of determination as the preprocessed content parameter to be deleted. Note that the preprocessed content parameter to be deleted may also be referred to as a preprocessed content parameter excluded from learning. Similarly, a candidate preprocessed content parameter for deletion may also be referred to as a candidate preprocessed content parameter for exclusion from learning.

図１８の例では、２行目の前処理後内容パラメータ「電流値」における決定係数の最大値が、４行目の前処理後内容パラメータ「ＯＲ＿１」における決定係数の最大値よりも大きいものとする。このため、決定部１１５は、２行目の前処理後内容パラメータ「電流値」を、削除対象前処理後内容パラメータとして決定する（図１８中の「高リスク決定係数の個数」における、ハッチング付の上記矩形部を参照）。そして、決定部１１５は、ＴＢ６ａから、「電流値」に対応する行および列（すなわち、第２行および第２列）を削除することにより、ＴＢ６を更新する。 In the example of Figure 18, the maximum value of the coefficient of determination for the preprocessed content parameter "current value" in the second row is greater than the maximum value of the coefficient of determination for the preprocessed content parameter "OR_1" in the fourth row. Therefore, the determination unit 115 determines the preprocessed content parameter "current value" in the second row as the preprocessed content parameter to be deleted (see the hatched rectangle in "Number of high-risk determination coefficients" in Figure 18). The determination unit 115 then updates TB6 by deleting the row and column corresponding to "current value" (i.e., the second row and second column) from TB6a.

図１９のＴＢ６ｂは、ＴＢ６ａが上述の通り更新されることによって得られたＴＢ６の一例である。決定部１１５は、ＴＢ６ａから、「電流値」に対応する行および列を削除した後、ＴＢ６ａの最右端列に記録されていた、Ｉｋに対応する高リスク決定係数の個数を更新する。従って、ＴＢ６ｂの最右端列には、更新後の高リスク決定係数の個数が記録されている。以降、決定部１１５は、ＴＢ６における全ての高リスク決定係数の個数が０になるまで、上述の通りＴＢ６の更新を繰り返す。なお、図１９の例では、３行目の前処理後内容パラメータ「ＯＲ＿１」が削除対象候補前処理後内容パラメータとして決定されている（図１９中の「高リスク決定係数の個数」における、点線によって図示された矩形部を参照）。但し、図１９の例では、不図示の前処理後内容パラメータが、削除対象前処理後内容パラメータとして決定されている。 TB6b in Figure 19 is an example of TB6 obtained by updating TB6a as described above. After deleting the row and column corresponding to "current value" from TB6a, the determination unit 115 updates the number of high-risk coefficients of determination corresponding to Ik that was recorded in the rightmost column of TB6a. Therefore, the rightmost column of TB6b records the updated numbers of high-risk coefficients of determination. Thereafter, the determination unit 115 repeats the update of TB6 as described above until every number of high-risk coefficients of determination in TB6 becomes 0. Note that in the example of Figure 19, the preprocessed content parameter "OR_1" in the third row has been determined as a candidate preprocessed content parameter for deletion (see the dotted rectangle under "Number of high-risk coefficients of determination" in Figure 19). However, in the example of Figure 19, a preprocessed content parameter that is not shown has been determined as the preprocessed content parameter to be deleted.
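The repeated update of TB6 can be sketched as a loop. This is a minimal sketch under stated assumptions: the Rij values are held in a dictionary keyed by parameter pairs, ties in the count are broken by the largest coefficient of determination, and any remaining tie is resolved by taking the first parameter in list order (the specification does not say how such a second tie is handled). The function name `prune_parameters` and all labels are hypothetical.

```python
def prune_parameters(names, r_values, r_th=0.49):
    """Delete parameters one at a time until no high-risk pair remains.

    names: ordered parameter labels; r_values: {(a, b): Rij} with one entry per pair.
    Returns (remaining, deleted).
    """
    def r(a, b):
        return r_values[(a, b)] if (a, b) in r_values else r_values[(b, a)]

    remaining = list(names)
    deleted = []
    while True:
        # Recount the high-risk coefficients of each remaining parameter.
        counts = {
            a: sum(1 for b in remaining if b != a and r(a, b) >= r_th)
            for a in remaining
        }
        highest = max(counts.values(), default=0)
        if highest == 0:
            break  # every count is 0: corresponds to the final table
        candidates = [a for a in remaining if counts[a] == highest]
        # Tie-break: the candidate whose largest Rij is the greatest.
        target = max(
            candidates,
            key=lambda a: max(r(a, b) for b in remaining if b != a),
        )
        remaining.remove(target)
        deleted.append(target)
    return remaining, deleted

# Hypothetical example: "A" has two high-risk pairs and is deleted first.
names = ["A", "B", "C", "D"]
r_values = {
    ("A", "B"): 0.90, ("A", "C"): 0.60, ("A", "D"): 0.10,
    ("B", "C"): 0.20, ("B", "D"): 0.10, ("C", "D"): 0.05,
}
remaining, deleted = prune_parameters(names, r_values)
```

Recounting after every deletion matters: removing one parameter can lower the counts of several others, so fewer parameters are deleted than if every parameter with a nonzero initial count were dropped at once.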

図２０のＴＢ６ｅｎｄは、上述の更新の繰り返しの結果として得られた、最終的なＴＢ６の一例である。図２０に示されている通り、ＴＢ６ｅｎｄでは、全ての高リスク決定係数の個数が０である。ＴＢ６ｅｎｄに含まれている前処理後内容パラメータは、ＴＢ６ｉｎｉｔに含まれていた複数の前処理後内容パラメータから、ＴＢ６の更新過程において見出された各削除対象前処理後内容パラメータ（実施形態１の例において図示されている範囲では、「ＯＲ＿２」および「電流値」）を除いた前処理後内容パラメータである。このことから、ＴＢ６ｅｎｄに含まれている前処理後内容パラメータは、残余前処理後内容パラメータと称されてもよい。 TB6end in Figure 20 is an example of the final TB6 obtained as a result of the repeated updates described above. As shown in Figure 20, every number of high-risk coefficients of determination in TB6end is 0. The preprocessed content parameters included in TB6end are those that remain after excluding, from the multiple preprocessed content parameters included in TB6init, each preprocessed content parameter to be deleted that was found during the TB6 update process (within the range illustrated in the example of embodiment 1, "OR_2" and "current value"). For this reason, the preprocessed content parameters included in TB6end may also be referred to as residual preprocessed content parameters.

上述の説明から明らかである通り、複数の残余前処理後内容パラメータ間においては、高リスク決定係数は生じない。すなわち、複数の残余前処理後内容パラメータ間における、多重共線性の発生リスクが十分に低減されている。そこで、決定部１１５は、残余前処理後内容パラメータのみが学習フェーズにおける説明変数として用いられるように、データセット１を処理（より具体的には、剪定）してよい。 As is clear from the above explanation, no high-risk coefficients of determination occur between the multiple residual preprocessed content parameters. In other words, the risk of multicollinearity occurring between the multiple residual preprocessed content parameters is sufficiently reduced. Therefore, the determination unit 115 may process (more specifically, prune) dataset 1 so that only the residual preprocessed content parameters are used as explanatory variables in the learning phase.

具体的には、決定部１１５は、データセット１から削除対象前処理後内容パラメータを削除することにより、剪定後データセット１（データセット１に対応する剪定後データセット）を生成してよい。図２１における符号９００ＡＰは、剪定後データセット１の一例を表す。上述の説明から明らかである通り、図２１の例における剪定後データセット１は、データセット１から「ＯＲ＿２」および「電流値」が削除されることによって生成される。 Specifically, the determination unit 115 may generate pruned dataset 1 (a pruned dataset corresponding to dataset 1) by deleting the post-preprocessing content parameters to be deleted from dataset 1. Reference numeral 900AP in FIG. 21 represents an example of pruned dataset 1. As is clear from the above explanation, pruned dataset 1 in the example of FIG. 21 is generated by deleting "OR_2" and "current value" from dataset 1.
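Generating the pruned dataset then amounts to dropping the rows of the parameters to be deleted. The sketch below assumes the dataset is held as a mapping from parameter label to its series of values; the labels other than "OR_2" and "current value", and all numeric values, are hypothetical.

```python
def prune_dataset(dataset, to_delete):
    """Return a copy of the dataset without the parameters listed in to_delete."""
    drop = set(to_delete)
    return {name: values for name, values in dataset.items() if name not in drop}

# Hypothetical dataset 1: one row per preprocessed content parameter.
dataset_1 = {
    "voltage": [6.6, 6.6, 22.0],
    "current value": [100.0, 120.0, 400.0],
    "OR_1": [1.0, 0.0, 0.0],
    "OR_2": [0.0, 1.0, 1.0],
}
pruned_dataset_1 = prune_dataset(dataset_1, ["OR_2", "current value"])
```

The original mapping is left unchanged, so the same helper can be applied independently to each dataset obtained from the different preprocessing patterns.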

以上の通り、決定部１１５は、注目データセットにおける削除対象前処理後内容パラメータを当該注目データセットから削除することにより、当該注目データセットに対応する剪定後注目データセットを生成してよい。従って、例えば、決定部１１５は、データセット１に関する上記の例と同様にして、データセット２における削除対象前処理後内容パラメータをデータセット２から削除することにより、剪定後データセット２を生成してよい。また、決定部１１５は、データセット４^ＦＬ×５^ＦＮにおける削除対象前処理後内容パラメータをデータセット４^ＦＬ×５^ＦＮから削除することにより、剪定後データセット４^ＦＬ×５^ＦＮを生成してよい。 As described above, the determination unit 115 may generate a pruned data set of interest corresponding to the data set of interest by deleting, from the data set of interest, the post-preprocessing content parameters to be deleted in that data set. Therefore, for example, the determination unit 115 may generate pruned dataset 2 by deleting, from dataset 2, the post-preprocessing content parameters to be deleted in dataset 2, in the same manner as in the above example regarding dataset 1. Furthermore, the determination unit 115 may generate pruned dataset 4^FL×5^FN by deleting, from dataset 4^FL×5^FN, the post-preprocessing content parameters to be deleted in dataset 4^FL×5^FN.

このように、決定部１１５は、前処理後第１図面種類内容パラメータセット（第１図面種類内容パラメータセットに複数パターンの前処理のそれぞれが施されることによって得られたデータセット）における削除対象前処理後内容パラメータを、当該前処理後第１図面種類内容パラメータセットから削除することにより、剪定後前処理後第１図面種類内容パラメータセット（剪定後データセット１～剪定後データセット４^ＦＬ×５^ＦＮから成るセット）を生成してよい。 In this way, the determination unit 115 may generate a pruned post-preprocessing first drawing type content parameter set (a set consisting of pruned dataset 1 to pruned dataset 4^FL×5^FN) by deleting the post-preprocessing content parameters to be deleted in the post-preprocessing first drawing type content parameter set (a data set obtained by applying each of multiple patterns of preprocessing to the first drawing type content parameter set) from that post-preprocessing first drawing type content parameter set.

第１図面種類内容パラメータセットについての上記の例と同様に、決定部１１５は、前処理後第２図面種類内容パラメータセット（第２図面種類内容パラメータセットに複数パターンの前処理のそれぞれが施されることによって得られたデータセット）における削除対象前処理後内容パラメータを、当該前処理後第２図面種類内容パラメータセットから削除することにより、剪定後前処理後第２図面種類内容パラメータセットを生成してよい。また、決定部１１５は、前処理第Ｎ図面種類内容パラメータセット（第Ｎ図面種類内容パラメータセットに複数パターンの前処理のそれぞれが施されることによって得られたデータセット）における削除対象前処理後内容パラメータを、当該前処理第Ｎ図面種類内容パラメータセットから削除することにより、剪定後前処理第Ｎ図面種類内容パラメータセットを生成してよい。 Similar to the above example for the first drawing type content parameter set, the determination unit 115 may generate the pruned post-preprocessing second drawing type content parameter set by deleting, from the post-preprocessing second drawing type content parameter set, the post-preprocessing content parameters to be deleted in that set (a data set obtained by applying each of multiple patterns of preprocessing to the second drawing type content parameter set). Likewise, the determination unit 115 may generate the pruned post-preprocessing Nth drawing type content parameter set by deleting, from the post-preprocessing Nth drawing type content parameter set, the post-preprocessing content parameters to be deleted in that set (a data set obtained by applying each of multiple patterns of preprocessing to the Nth drawing type content parameter set).

（学習モデル生成部１１３における処理の一例） (Example of processing in the learning model generation unit 113)
実施形態１では、学習モデル生成部１１３は、決定部１１５から剪定後前処理後第１図面種類内容パラメータセットを取得する。学習モデル生成部１１３は、参考形態と同様にして、剪定後前処理後第１図面種類内容パラメータセットに基づいて、剪定後前処理後第１図面種類内容パラメータセットに対応する複数の学習モデルを生成する。次いで、学習モデル生成部１１３は、当該複数の学習モデルのそれぞれの品質を評価する。 In embodiment 1, the learning model generation unit 113 acquires the pruned post-preprocessing first drawing type content parameter set from the determination unit 115. As in the reference embodiment, the learning model generation unit 113 generates, based on the pruned post-preprocessing first drawing type content parameter set, a plurality of learning models corresponding to that set. Next, the learning model generation unit 113 evaluates the quality of each of the plurality of learning models.

具体的には、学習モデル生成部１１３は、剪定後前処理後第１図面種類内容パラメータセットに含まれる各データセットを、訓練データと検証データとに分割する。そして、学習モデル生成部１１３は、当該訓練データを用いて、当該訓練データに対応する複数の学習モデルを生成する。次いで、学習モデル生成部１１３は、当該検証データを用いて、当該複数の学習モデルのそれぞれの品質を評価する。より具体的には、学習モデル生成部１１３は、当該検証データを用いて、当該複数の学習モデルのそれぞれの指標値を導出する。 Specifically, the learning model generation unit 113 divides each data set included in the post-pruning and post-preprocessing first drawing type content parameter set into training data and validation data. Then, the learning model generation unit 113 uses the training data to generate multiple learning models corresponding to the training data. Next, the learning model generation unit 113 uses the validation data to evaluate the quality of each of the multiple learning models. More specifically, the learning model generation unit 113 uses the validation data to derive index values for each of the multiple learning models.

同様に、学習モデル生成部１１３は、決定部１１５から剪定後前処理後第２図面種類内容パラメータセットを取得する。学習モデル生成部１１３は、剪定後前処理後第２図面種類内容パラメータセットに基づいて、当該剪定後前処理後第２図面種類内容パラメータセットに対応する複数の学習モデルを生成する。次いで、学習モデル生成部１１３は、当該複数の学習モデルのそれぞれの品質を評価する。また、学習モデル生成部１１３は、決定部１１５から剪定後前処理後第Ｎ図面種類内容パラメータセットを取得する。学習モデル生成部１１３は、剪定後前処理後第Ｎ図面種類内容パラメータセットに基づいて、当該剪定後前処理後第Ｎ図面種類内容パラメータセットに対応する複数の学習モデルを生成する。次いで、学習モデル生成部１１３は、当該複数の学習モデルのそれぞれの品質を評価する。 Similarly, the learning model generation unit 113 obtains the post-pruning, post-preprocessing, second drawing type content parameter set from the determination unit 115. The learning model generation unit 113 generates a plurality of learning models corresponding to the post-pruning, post-preprocessing, second drawing type content parameter set based on the post-pruning, post-preprocessing, second drawing type content parameter set. Next, the learning model generation unit 113 evaluates the quality of each of the plurality of learning models. Furthermore, the learning model generation unit 113 obtains the post-pruning, post-preprocessing, Nth drawing type content parameter set from the determination unit 115. The learning model generation unit 113 generates a plurality of learning models corresponding to the post-pruning, post-preprocessing, Nth drawing type content parameter set based on the post-pruning, post-preprocessing, Nth drawing type content parameter set. Next, the learning model generation unit 113 evaluates the quality of each of the plurality of learning models.

実施形態１では、学習モデル生成部１１３は、剪定後前処理後第１図面種類内容パラメータセット～剪定後前処理後第Ｎ図面種類内容パラメータセットに基づいて導出された指標値に基づき、生成された複数の学習モデルの内から、ベスト学習モデルを選択する。例えば、学習モデル生成部１１３は、当該複数の学習モデルの内、最大指標値を有する学習モデルを、ベスト学習モデルとして選択する。続いて、学習モデル生成部１１３は、参考形態と同様にして、ベスト学習モデルに対応する前処理手法を、ベスト前処理手法として選択する。実施形態１における以降の処理については、参考形態と同様である。 In embodiment 1, the learning model generation unit 113 selects the best learning model from among multiple generated learning models based on index values derived based on the pruned, post-preprocessing first drawing type content parameter set through the pruned, post-preprocessing Nth drawing type content parameter set. For example, the learning model generation unit 113 selects the learning model with the largest index value from among the multiple learning models as the best learning model. Next, the learning model generation unit 113 selects the preprocessing method corresponding to the best learning model as the best preprocessing method, similar to the reference embodiment. Subsequent processing in embodiment 1 is similar to the reference embodiment.
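The selection step can be sketched as picking the record with the largest index value and reading off the preprocessing pattern that produced it. The record layout, the labels, and the index values below are hypothetical; the specification does not fix a particular index or model type in this passage.

```python
from typing import NamedTuple

class EvaluationResult(NamedTuple):
    dataset_label: str   # identifies the pruned dataset, i.e. the preprocessing pattern
    model_label: str     # identifies the learning model generated from that dataset
    index_value: float   # quality index derived from the validation data

def select_best(results):
    """Return (best model label, corresponding preprocessing pattern label)."""
    best = max(results, key=lambda result: result.index_value)
    return best.model_label, best.dataset_label

# Hypothetical evaluation results across pruned datasets.
results = [
    EvaluationResult("pruned dataset 1", "model A", 0.82),
    EvaluationResult("pruned dataset 1", "model B", 0.79),
    EvaluationResult("pruned dataset 2", "model A", 0.88),
]
best_model, best_preprocessing = select_best(results)
```

If two records share the largest index value, `max` keeps the first one encountered; how such ties should be resolved is left open here.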

（実施形態１の効果） (Effects of Embodiment 1)
実施形態１における学習装置１１によれば、複数の前処理後内容パラメータから、削除対象前処理後内容パラメータ（多重共線性を生じさせるリスクが高いと懸念される前処理後内容パラメータ）を排除した上で、複数の学習モデルを生成できる。その上で、複数の学習モデルの内から、ベスト学習モデルを見出すことができる。 The learning device 11 of embodiment 1 can generate multiple learning models after excluding, from the multiple preprocessed content parameters, the preprocessed content parameters to be deleted (preprocessed content parameters that are feared to carry a high risk of causing multicollinearity). It can then find the best learning model from among the multiple learning models.

本明細書では、ある前処理後内容パラメータセット（より詳細には、前処理後検索対象図面内容パラメータセット）のそれぞれから、当該内容パラメータセットにおける削除対象前処理後内容パラメータを削除することによって得られるデータセットを、剪定後前処理後内容パラメータセット（より詳細には、剪定後前処理後検索対象図面内容パラメータセット）と称する。上述の剪定後前処理後第１図面種類内容パラメータセット～剪定後前処理後第Ｎ図面種類内容パラメータセットはいずれも、剪定後前処理後内容パラメータセットの例である。 In this specification, the data set obtained by deleting the preprocessed content parameters to be deleted in a preprocessed content parameter set (more specifically, the preprocessed drawing content parameter set to be searched) from that content parameter set is referred to as the pruned preprocessed content parameter set (more specifically, the pruned preprocessed drawing content parameter set to be searched). The pruned preprocessed first drawing type content parameter set through the pruned preprocessed Nth drawing type content parameter set described above are all examples of pruned preprocessed content parameter sets.

以上の通り、学習装置１１によれば、複数の剪定後前処理後内容パラメータセットに基づき、複数の学習モデルを生成できる。その上で、複数の学習モデルの内から、ベスト学習モデルを見出すことができる。このため、学習装置１１によれば、多重共線性の影響が排除された学習モデルを、ベスト学習モデルとして得ることができる。それゆえ、参考形態に比べてさらに高品質なベスト学習モデルを得ることができる。このように、学習装置１１によれば、図面検索を行うための学習モデルの品質を従来よりも向上させることができる。 As described above, the learning device 11 can generate multiple learning models based on multiple pruned post-preprocessing content parameter sets, and can then find the best learning model from among them. The learning device 11 can thus obtain, as the best learning model, a learning model from which the effects of multicollinearity have been eliminated. Consequently, a best learning model of even higher quality than in the reference embodiment can be obtained. In this way, the learning device 11 can improve the quality of learning models for drawing searches compared to conventional methods.

また、学習装置１１によれば、残余前処理後内容パラメータのみが学習フェーズにおける説明変数として用いられるように、複数の学習モデルを生成できる。すなわち、参考形態に比べ、各データセットの次元数を削減させた上で、複数の学習モデルを生成できる。それゆえ、学習フェーズに要する計算コストを、参考形態に比べて低減させることもできる。例えば、参考形態よりも短い計算時間によって、高品質なベスト学習モデルを得ることができる。 Furthermore, the learning device 11 can generate multiple learning models so that only the residual preprocessed content parameters are used as explanatory variables in the learning phase. In other words, multiple learning models can be generated after reducing the number of dimensions of each dataset compared to the reference embodiment. The computational cost required for the learning phase can therefore also be reduced compared to the reference embodiment. For example, a high-quality best learning model can be obtained in a shorter computation time than in the reference embodiment.

また、学習装置１１によれば、多重共線性の影響が排除されたベスト学習モデルに対応する前処理手法を、ベスト前処理手法として選択することもできる。それゆえ、図面検索の精度向上のためにより一層有効性の高い前処理手法を、ベスト前処理手法として選択できることも期待される。 Furthermore, the learning device 11 can select the preprocessing method corresponding to the best learning model, from which the effects of multicollinearity have been eliminated, as the best preprocessing method. Therefore, it is expected that a more effective preprocessing method for improving the accuracy of drawing search can be selected as the best preprocessing method.

そして、実施形態１における図面検索装置１２では、上述の通り学習装置１１によって選択されたベスト学習モデルおよびベスト前処理手法を用いて、図面検索を行うことができる。その結果、参考形態に比べて、さらに高い検索精度によって図面検索を行うことができる。 The drawing search device 12 in embodiment 1 can perform drawing searches using the best learning model and best preprocessing method selected by the learning device 11 as described above. As a result, drawing searches can be performed with even higher search accuracy than in the reference embodiment.

〔変形例〕 [Modification]
実施形態１では、多重共線性評価値として決定係数（Ｒｉｊ）が用いられる場合が例示されていた。但し、当業者であれば明らかである通り、多重共線性評価値は、上記の例に限定されない。例えば、学習装置１１において、多重共線性評価値として、上述の相関係数絶対値が用いられてもよい。 Embodiment 1 illustrated the case where the coefficient of determination (Rij) is used as the multicollinearity evaluation value. However, as will be apparent to those skilled in the art, the multicollinearity evaluation value is not limited to the above example. For example, the learning device 11 may use the above-mentioned correlation coefficient absolute value as the multicollinearity evaluation value.

この場合、決定部１１５は、実施形態１と同様にして、注目データセットに含まれる任意の２つの異なる前処理後内容パラメータ間の相関係数絶対値を算出してよい。本変形例では、決定部１１５は、ₚＣ₂通りの相関係数絶対値を算出する。このように、決定部１１５は、各前処理後内容パラメータについて、相関係数絶対値（すなわち、｜ｓｉｊ｜）を算出する。 In this case, the determination unit 115 may calculate the correlation coefficient absolute value between any two different preprocessed content parameters included in the data set of interest, as in embodiment 1. In this modification, the determination unit 115 calculates ₚC₂ correlation coefficient absolute values. In this way, the determination unit 115 calculates the correlation coefficient absolute value (i.e., |sij|) for each preprocessed content parameter.

そして、決定部１１５は、算出した各相関係数絶対値を、所定の閾値ｓｔｈ（相関係数絶対値閾値）と比較する。相関係数絶対値閾値は、多重共線性閾値の別の例である。本明細書では、「相関係数絶対値がｓｔｈ以上である」という条件を満たしている相関係数絶対値を、高リスク相関係数絶対値と称する。 The determination unit 115 then compares each calculated correlation coefficient absolute value with a predetermined threshold sth (correlation coefficient absolute value threshold). The correlation coefficient absolute value threshold is another example of a multicollinearity threshold. In this specification, correlation coefficient absolute values that satisfy the condition that "the correlation coefficient absolute value is equal to or greater than sth" are referred to as high-risk correlation coefficient absolute values.

本変形例では、決定部１１５は、各前処理後内容パラメータについて、各相関係数絶対値をｓｔｈと比較し、各相関係数絶対値の内から高リスク相関係数絶対値を抽出する。そして、決定部１１５は、各前処理後内容パラメータについて、抽出した高リスク相関係数絶対値の個数を計上する。 In this modified example, the determination unit 115 compares each correlation coefficient absolute value with sth for each preprocessed content parameter, and extracts high-risk correlation coefficient absolute values from among the correlation coefficient absolute values. Then, the determination unit 115 counts the number of high-risk correlation coefficient absolute values extracted for each preprocessed content parameter.
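The threshold comparison and counting step can be sketched as below. The function name `count_high_risk` and the sample |sij| values are hypothetical; the only rule taken from the text is that a value is high-risk when it is equal to or greater than sth.

```python
def count_high_risk(abs_corr, s_th):
    """For each parameter, count how many of its |s_ij| values are >= s_th."""
    counts = {}
    for (i, j), value in abs_corr.items():
        # Make sure every parameter appears, even with a count of zero.
        counts.setdefault(i, 0)
        counts.setdefault(j, 0)
        if value >= s_th:
            # A high-risk pair is counted once for each of its two parameters.
            counts[i] += 1
            counts[j] += 1
    return counts


# Hypothetical |s_ij| values for three preprocessed content parameters.
abs_corr = {("A", "B"): 0.95, ("A", "C"): 0.80, ("B", "C"): 0.30}
counts = count_high_risk(abs_corr, s_th=0.7)
print(counts)
```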

上述の式（５）との対応性から明らかである通り、例えば、
$s_{th} = R_{th}^{1/2}$ …（６）
として、ｓｔｈが設定されてよい。従って、ｓｔｈは、０．７以上かつ１以下の所定の値として設定されることが好ましい。一例として、ｓｔｈは０．７に設定されてよい。但し、当業者であれば明らかである通り、ｓｔｈは上記の例に限定されない。
As is clear from the correspondence with formula (5) above, sth may be set, for example, as
$s_{th} = R_{th}^{1/2}$ ... (6)
Therefore, sth is preferably set to a predetermined value equal to or greater than 0.7 and equal to or less than 1. As an example, sth may be set to 0.7. However, as will be apparent to those skilled in the art, sth is not limited to the above example.
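The numerical range above can be checked with a short worked derivation. It assumes that formula (5) is the usual identity for a regression with a single explanatory variable, in which the coefficient of determination equals the squared correlation coefficient, and it uses 0.49 for Rth, the lower bound recited for the coefficient of determination threshold in the claims.

```latex
R_{ij} = s_{ij}^{2}
\quad\Longrightarrow\quad
\bigl( R_{ij} \ge R_{th} \iff |s_{ij}| \ge R_{th}^{1/2} = s_{th} \bigr),
\qquad
R_{th} = 0.49 \;\Rightarrow\; s_{th} = \sqrt{0.49} = 0.7 .
```

Under that assumption, thresholding the correlation coefficient absolute value at sth selects exactly the same pairs as thresholding the coefficient of determination at Rth.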

本変形例において、決定部１１５は、各相関係数絶対値と高リスク相関係数絶対値の個数との対応関係を示すテーブル（相関係数絶対値・高リスク相関係数絶対値個数テーブル）を生成する。当該相関係数絶対値・高リスク相関係数絶対値個数テーブルは、上述のＴＢ６に対応する。そして、決定部１１５は、実施形態１と同様にして、相関係数絶対値・高リスク相関係数絶対値個数テーブルにおける全ての高リスク相関係数絶対値の個数が０になるまで、当該相関係数絶対値・高リスク相関係数絶対値個数テーブルの更新を繰り返す。以降の処理については、実施形態１と同様である。 In this modified example, the determination unit 115 generates a table (a correlation coefficient absolute value/high risk correlation coefficient absolute value count table) that indicates the correspondence between each correlation coefficient absolute value and the number of high risk correlation coefficient absolute values. This correlation coefficient absolute value/high risk correlation coefficient absolute value count table corresponds to TB6 described above. Then, as in embodiment 1, the determination unit 115 repeatedly updates the correlation coefficient absolute value/high risk correlation coefficient absolute value count table until the number of all high risk correlation coefficient absolute values in the correlation coefficient absolute value/high risk correlation coefficient absolute value count table becomes 0. Subsequent processing is the same as in embodiment 1.

以上の説明から明らかである通り、本発明の一態様において、決定部１１５は、各前処理後内容パラメータについて、多重共線性評価値を算出する。そして、決定部１１５は、算出した各多重共線性評価値を、所定の多重共線性閾値と比較する。具体的には、決定部１１５は、各多重共線性評価値を多重共線性閾値と比較し、各多重共線性評価値の内から高リスク多重共線性評価値を抽出する。決定部１１５は、各前処理後内容パラメータについて、抽出した高リスク多重共線性評価値の個数を計上する。 As is clear from the above explanation, in one aspect of the present invention, the determination unit 115 calculates a multicollinearity assessment value for each preprocessed content parameter. Then, the determination unit 115 compares each calculated multicollinearity assessment value with a predetermined multicollinearity threshold. Specifically, the determination unit 115 compares each multicollinearity assessment value with the multicollinearity threshold and extracts high-risk multicollinearity assessment values from among the multicollinearity assessment values. The determination unit 115 counts the number of high-risk multicollinearity assessment values extracted for each preprocessed content parameter.

次いで、決定部１１５は、各多重共線性評価値と高リスク多重共線性評価値の個数との対応関係を示すテーブル（多重共線性評価値・高リスク多重共線性評価値個数テーブル）を生成する。当該多重共線性評価値・高リスク多重共線性評価値個数テーブルは、上述のＴＢ６に対応する。そして、上述の通り、決定部１１５は、多重共線性評価値・高リスク多重共線性評価値個数テーブルにおける全ての高リスク多重共線性評価値の個数が０になるまで、当該多重共線性評価値・高リスク多重共線性評価値個数テーブルの更新を繰り返す。 Next, the determination unit 115 generates a table (multicollinearity evaluation value/high-risk multicollinearity evaluation value count table) that shows the correspondence between each multicollinearity evaluation value and the number of high-risk multicollinearity evaluation values. This multicollinearity evaluation value/high-risk multicollinearity evaluation value count table corresponds to TB6 described above. Then, as described above, the determination unit 115 repeatedly updates the multicollinearity evaluation value/high-risk multicollinearity evaluation value count table until the counts of all high-risk multicollinearity evaluation values in the multicollinearity evaluation value/high-risk multicollinearity evaluation value count table become zero.
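The repeated table update can be sketched as a loop that stops once every count is zero. The update rule itself is defined in the first embodiment and is not restated in this section, so the rule used here is an assumption: at each pass, the parameter with the largest number of high-risk values is deleted (ties broken by name order), and the counts are then recomputed. The name `prune_until_clean` and the sample values are hypothetical.

```python
def prune_until_clean(evaluation_values, threshold):
    """Delete parameters until no high-risk evaluation value remains.

    evaluation_values maps (name_i, name_j) to a multicollinearity
    evaluation value. Returns (kept, deleted) as lists of names.
    """
    remaining = sorted({name for pair in evaluation_values for name in pair})
    deleted = []
    while True:
        # Rebuild the count column of the table for the remaining parameters.
        counts = {name: 0 for name in remaining}
        for (i, j), value in evaluation_values.items():
            if i in counts and j in counts and value >= threshold:
                counts[i] += 1
                counts[j] += 1
        if all(count == 0 for count in counts.values()):
            return remaining, deleted
        # ASSUMPTION: delete the parameter with the most high-risk values.
        worst = max(remaining, key=lambda name: counts[name])
        remaining.remove(worst)
        deleted.append(worst)


# Hypothetical evaluation values for four parameters.
values = {
    ("A", "B"): 0.95, ("A", "C"): 0.80, ("A", "D"): 0.10,
    ("B", "C"): 0.30, ("B", "D"): 0.20, ("C", "D"): 0.75,
}
kept, deleted = prune_until_clean(values, threshold=0.7)
print(kept, deleted)
```

Deleting the most-connected parameter first tends to keep more parameters overall than deleting one parameter per high-risk pair, but other selection rules satisfy the same stopping condition.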

以上の通り、決定部１１５は、複数の前処理後内容パラメータに対して導出された複数の多重共線性評価値に基づいて、当該複数の前処理後内容パラメータの内から削除対象前処理後内容パラメータを決定できるように設定されていればよい。これにより、決定部１１５によって、当該複数の前処理後内容パラメータから削除対象前処理後内容パラメータを削除することが可能となる。その結果、上述の通り、多重共線性の影響が排除された学習モデルを、ベスト学習モデルとして得ることができる。 As described above, the determination unit 115 may be configured to determine preprocessed content parameters to be deleted from among a plurality of preprocessed content parameters based on a plurality of multicollinearity evaluation values derived for the plurality of preprocessed content parameters. This enables the determination unit 115 to delete preprocessed content parameters to be deleted from the plurality of preprocessed content parameters. As a result, as described above, a learning model from which the effects of multicollinearity have been eliminated can be obtained as the best learning model.

なお、当業者であれば明らかである通り、削除対象前処理後内容パラメータの選択方法は、実施形態１の例に限定されない。決定部１１５は、異なる２つの前処理後内容パラメータ間の多重共線性評価値が多重共線性閾値以上である場合に、当該２つの前処理後内容パラメータの内の一方を、削除対象前処理後内容パラメータとして決定すればよい。 As will be clear to those skilled in the art, the method for selecting the preprocessed content parameters to be deleted is not limited to the example in embodiment 1. It is sufficient that, when the multicollinearity evaluation value between two different preprocessed content parameters is equal to or greater than the multicollinearity threshold, the determination unit 115 determines one of those two preprocessed content parameters as the preprocessed content parameter to be deleted.

〔実施形態２〕 [Embodiment 2]
図２２は、実施形態２の情報処理システム１００Ｖの要部の構成を示すブロック図である。情報処理システム１００Ｖの情報処理装置を、情報処理装置１Ｖと称する。情報処理装置１Ｖの制御装置を、制御装置１０Ｖと称する。制御装置１０Ｖの学習装置および図面検索装置をそれぞれ、学習装置１１Ｖ（モデル生成装置）および図面検索装置１２Ｖと称する。 FIG. 22 is a block diagram showing the configuration of the main parts of an information processing system 100V according to the second embodiment. The information processing device of the information processing system 100V is referred to as an information processing device 1V. The control device of the information processing device 1V is referred to as a control device 10V. The learning device and drawing search device of the control device 10V are referred to as a learning device 11V (model generation device) and a drawing search device 12V, respectively.

学習装置１１Ｖは、実施形態１の学習装置１１とは異なり、学習用前処理部１１４を有していない。学習装置１１Ｖの決定部および学習モデル生成部をそれぞれ、決定部１１５Ｖおよび学習モデル生成部１１３Ｖ（学習部）と称する。また、図面検索装置１２Ｖは、実施形態１の図面検索装置１２とは異なり、検索用前処理部１２５を有していない。図面検索装置１２Ｖの検索部を、検索部１２６Ｖと称する。 Unlike the learning device 11 of embodiment 1, the learning device 11V does not have a learning preprocessing unit 114. The determination unit and learning model generation unit of the learning device 11V are referred to as the determination unit 115V and the learning model generation unit 113V (learning unit), respectively. Furthermore, unlike the drawing search device 12 of embodiment 1, the drawing search device 12V does not have a search preprocessing unit 125. The search unit of the drawing search device 12V is referred to as the search unit 126V.

図２２から明らかである通り、学習装置１１Ｖでは、学習装置１１とは異なり、前処理による各内容パラメータセットの拡張が行われない。このため、実施形態２では、複数の説明変数間（実施形態２の例では、複数の内容パラメータ間）における多重共線性の発生リスクは、実施形態１に比べて低いと期待される。但し、例えば、各過去図面の記載内容次第では、複数の内容パラメータ間において多重共線性が発生することも考えられる。そこで、実施形態２では、多重共線性の影響を排除するために、決定部１１５Ｖが設けられている。 As is clear from FIG. 22, unlike the learning device 11, the learning device 11V does not expand each content parameter set through preprocessing. For this reason, in the second embodiment, the risk of multicollinearity occurring between multiple explanatory variables (between multiple content parameters in the example of the second embodiment) is expected to be lower than in the first embodiment. However, depending on the content of each past drawing, for example, it is possible that multicollinearity may occur between multiple content parameters. Therefore, in the second embodiment, a determination unit 115V is provided to eliminate the effects of multicollinearity.

（学習装置１１Ｖにおける処理の一例） (Example of processing in learning device 11V)
実施形態２では、決定部１１５Ｖは、過去図面内容パラメータ取得部１１２から、第１図面種類内容パラメータセット～第Ｎ図面種類内容パラメータセットを取得する。決定部１１５Ｖは、第１図面種類内容パラメータセットに含まれる任意の２つの異なる内容パラメータ間の多重共線性評価値を算出してよい。 In the second embodiment, the determination unit 115V acquires the first to Nth drawing type content parameter sets from the past drawing content parameter acquisition unit 112. The determination unit 115V may calculate a multicollinearity evaluation value between any two different content parameters included in the first drawing type content parameter set.

実施形態２では、決定部１１５Ｖは、第１図面種類内容パラメータセットに含まれる各内容パラメータ（第１図面種類内容パラメータセット内の第１～第Ｌ内容パラメータ）について、多重共線性評価値を算出する。すなわち、実施形態２では、決定部１１５Ｖは、第１図面種類内容パラメータセットについて、${}_{L}C_{2}$通りの多重共線性評価値を算出する。このように、決定部１１５Ｖは、第１図面種類内容パラメータセットに含まれる異なる２つの検索対象図面内容パラメータの組み合わせパターンのそれぞれについて、多重共線性評価値を算出する。 In the second embodiment, the determination unit 115V calculates a multicollinearity evaluation value for each content parameter included in the first drawing type content parameter set (the first to Lth content parameters in the first drawing type content parameter set). That is, in the second embodiment, the determination unit 115V calculates ${}_{L}C_{2}$ multicollinearity evaluation values for the first drawing type content parameter set. In this way, the determination unit 115V calculates a multicollinearity evaluation value for each combination pattern of two different search target drawing content parameters included in the first drawing type content parameter set.

次いで、決定部１１５Ｖは、算出した各多重共線性評価値を多重共線性閾値と比較し、各多重共線性評価値の内から高リスク多重共線性評価値を抽出する。そして、決定部１１５Ｖは、各内容パラメータについて、抽出した高リスク多重共線性評価値の個数を計上する。 The determination unit 115V then compares each calculated multicollinearity evaluation value with a multicollinearity threshold and extracts high-risk multicollinearity evaluation values from among the multicollinearity evaluation values. The determination unit 115V then counts the number of extracted high-risk multicollinearity evaluation values for each content parameter.

次いで、決定部１１５Ｖは、多重共線性評価値・高リスク多重共線性評価値個数テーブルを生成する。そして、決定部１１５Ｖは、多重共線性評価値・高リスク多重共線性評価値個数テーブルにおける全ての高リスク多重共線性評価値の個数が０になるまで、当該多重共線性評価値・高リスク多重共線性評価値個数テーブルの更新を繰り返す。 The determination unit 115V then generates a table of multicollinearity assessment values and the number of high-risk multicollinearity assessment values. The determination unit 115V then repeatedly updates the table of multicollinearity assessment values and the number of high-risk multicollinearity assessment values until the number of all high-risk multicollinearity assessment values in the table of multicollinearity assessment values and the number of high-risk multicollinearity assessment values becomes 0.

このように、決定部１１５Ｖは、多重共線性評価値・高リスク多重共線性評価値個数テーブルの更新を繰り返すことにより、第１図面種類内容パラメータセット内の第１～第Ｌ内容パラメータの内から、削除対象となる削除対象内容パラメータを特定する。削除対象内容パラメータは、多重共線性を生じさせるリスクが高いと懸念される内容パラメータと言える。なお、削除対象内容パラメータは、学習対象外内容パラメータと称されてもよい。 In this way, the determination unit 115V repeatedly updates the multicollinearity evaluation value/high-risk multicollinearity evaluation value count table to identify content parameters to be deleted from among the first through Lth content parameters in the first drawing type content parameter set. Content parameters to be deleted can be said to be content parameters that are feared to pose a high risk of causing multicollinearity. Note that content parameters to be deleted may also be referred to as content parameters not to be learned.

上述の各説明から明らかである通り、決定部１１５Ｖは、多重共線性評価値が所定の多重共線性閾値以上である場合に、上記２つの検索対象図面内容パラメータの内の一方を、削除対象内容パラメータとして決定してよい。そして、決定部１１５Ｖは、第１図面種類内容パラメータセット内の第１～第Ｌ内容パラメータの内から、削除対象内容パラメータを削除することにより、剪定後第１図面種類内容パラメータセットを生成する。 As is clear from the above explanations, the determination unit 115V may determine one of the two search target drawing content parameters as a content parameter to be deleted if the multicollinearity evaluation value is equal to or greater than a predetermined multicollinearity threshold. The determination unit 115V then generates a pruned first drawing type content parameter set by deleting the content parameter to be deleted from the first through Lth content parameters in the first drawing type content parameter set.

一例として、第１図面種類内容パラメータセット内の第１～第Ｌ内容パラメータの内、第２内容パラメータ「電流値」が、削除対象内容パラメータとして特定された場合を考える。この場合、決定部１１５Ｖは、第１図面種類内容パラメータセットから、第２内容パラメータ「電流値」に対応する系列（図３の例における２行目）を削除することにより、剪定後第１図面種類内容パラメータセットを生成する。 As an example, consider the case where, of the first through Lth content parameters in the first drawing type content parameter set, the second content parameter "current value" is identified as the content parameter to be deleted. In this case, the determination unit 115V generates a pruned first drawing type content parameter set by deleting the series corresponding to the second content parameter "current value" (the second row in the example of Figure 3) from the first drawing type content parameter set.
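The deletion of a series from a content parameter set can be sketched as follows. The dictionary layout (one series per content parameter, one value per past drawing) is an assumption about how the rows of the parameter set might be held in memory; the function name `prune_parameter_set`, the parameter names other than "current value", and all numeric values are hypothetical.

```python
def prune_parameter_set(param_set, deletion_targets):
    """Return a new parameter set without the series named in deletion_targets."""
    targets = set(deletion_targets)
    return {
        name: series
        for name, series in param_set.items()
        if name not in targets
    }


# Hypothetical first drawing type content parameter set (rows = parameters).
first_type_set = {
    "voltage value": [6.6, 6.6, 22.0],
    "current value": [600.0, 600.0, 2000.0],
    "number of circuits": [2, 4, 3],
}
pruned_first_type_set = prune_parameter_set(first_type_set, ["current value"])
print(list(pruned_first_type_set))
```

The original set is left unchanged, which keeps the unpruned data available if a different threshold is tried later.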

同様にして、決定部１１５Ｖは、第２図面種類内容パラメータセットに含まれる各内容パラメータ（第２図面種類内容パラメータセット内の第１～第Ｌ内容パラメータ）について、多重共線性評価値を算出する。そして、決定部１１５Ｖは、各多重共線性評価値に基づいて第２図面種類内容パラメータセットを剪定することにより、剪定後第２図面種類内容パラメータセットを生成する。 Similarly, the determination unit 115V calculates a multicollinearity evaluation value for each content parameter included in the second drawing type content parameter set (the first to Lth content parameters in the second drawing type content parameter set). The determination unit 115V then prunes the second drawing type content parameter set based on each multicollinearity evaluation value, thereby generating a pruned second drawing type content parameter set.

また、決定部１１５Ｖは、第Ｎ図面種類内容パラメータセットに含まれる各内容パラメータ（第Ｎ図面種類内容パラメータセット内の第１～第Ｌ内容パラメータ）について、多重共線性評価値を算出する。そして、決定部１１５Ｖは、各多重共線性評価値に基づいて第Ｎ図面種類内容パラメータセットを剪定することにより、剪定後第Ｎ図面種類内容パラメータセットを生成する。決定部１１５Ｖは、以上の通り生成した剪定後第１図面種類内容パラメータセット～剪定後第Ｎ図面種類内容パラメータセットを、学習モデル生成部１１３Ｖに供給する。 The determination unit 115V also calculates a multicollinearity evaluation value for each content parameter included in the Nth drawing type content parameter set (the 1st to Lth content parameters in the Nth drawing type content parameter set). The determination unit 115V then generates a pruned Nth drawing type content parameter set by pruning the Nth drawing type content parameter set based on each multicollinearity evaluation value. The determination unit 115V supplies the pruned 1st to Nth drawing type content parameter sets generated as described above to the learning model generation unit 113V.

実施形態２では、学習モデル生成部１１３Ｖは、剪定後第１図面種類内容パラメータセット～剪定後第Ｎ図面種類内容パラメータセットに基づき、学習モデルを生成する。具体的には、学習モデル生成部１１３Ｖは、所定の機械学習アルゴリズムを実行することにより、剪定後第１図面種類内容パラメータセット～剪定後第Ｎ図面種類内容パラメータセットに基づき、学習モデルを生成する。学習モデル生成部１１３Ｖは、生成した学習モデルを、図面検索装置１２Ｖ（より具体的には、検索部１２６Ｖ）に供給する。 In embodiment 2, the learning model generation unit 113V generates a learning model based on the pruned first drawing type content parameter set through the pruned Nth drawing type content parameter set. Specifically, the learning model generation unit 113V executes a predetermined machine learning algorithm to generate a learning model based on the pruned first drawing type content parameter set through the pruned Nth drawing type content parameter set. The learning model generation unit 113V supplies the generated learning model to the drawing search device 12V (more specifically, the search unit 126V).

実施形態２における機械学習アルゴリズムは、実施形態１において例示した複数種類の機械学習アルゴリズムのうちの任意の１つであってよい。従って、一例として、学習モデル生成部１１３Ｖは、多項ロジスティック回帰によって、剪定後第１図面種類内容パラメータセット～剪定後第Ｎ図面種類内容パラメータセットに基づき、学習モデルを生成してよい。 The machine learning algorithm in embodiment 2 may be any one of the multiple types of machine learning algorithms exemplified in embodiment 1. Therefore, as an example, the learning model generation unit 113V may generate a learning model based on the pruned first drawing type content parameter set through the pruned Nth drawing type content parameter set using multinomial logistic regression.
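As a rough illustration of multinomial logistic regression on a pruned parameter set, the sketch below trains a softmax classifier by plain stochastic gradient descent. Several points are assumptions made for illustration: the class label is taken to be the drawing type index, each training row holds the pruned content parameters of one past drawing, and the hand-written training loop stands in for whatever library or solver an actual implementation would use. All names and values are hypothetical.

```python
from math import exp


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict_proba(W, b, x):
    """Class probabilities for one feature vector x."""
    return softmax([sum(w * v for w, v in zip(Wk, x)) + bk for Wk, bk in zip(W, b)])


def train(X, y, n_classes, lr=0.5, epochs=300):
    """Fit weights W and biases b by per-sample gradient descent on cross-entropy."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = predict_proba(W, b, x)
            for k in range(n_classes):
                grad = p[k] - (1.0 if k == label else 0.0)
                for f in range(n_features):
                    W[k][f] -= lr * grad * x[f]
                b[k] -= lr * grad
    return W, b


def predict(W, b, x):
    """Index of the most probable class."""
    p = predict_proba(W, b, x)
    return max(range(len(p)), key=lambda k: p[k])


# Hypothetical pruned parameters (two per drawing) and drawing type labels.
X = [[0.0, 0.0], [0.1, 0.1], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1, 2, 2]
W, b = train(X, y, n_classes=3)
print([predict(W, b, x) for x in X])
```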

上述の通り、実施形態２では、前処理による各内容パラメータセットの拡張が行われない。このため、実施形態２では、ベスト学習モデルおよびベスト前処理手法は決定されない。このことから明らかである通り、本発明の一態様において、複数の学習モデルを生成し、かつ、当該複数の学習モデルの内からベスト学習モデルを選択する工程は必須ではない。同様に、本発明の一態様において、複数の前処理手法の内からベスト前処理手法を決定する工程も必須ではない。 As described above, in embodiment 2, each content parameter set is not expanded by preprocessing. Therefore, in embodiment 2, the best learning model and best preprocessing method are not determined. As is clear from this, in one aspect of the present invention, the steps of generating multiple learning models and selecting the best learning model from among the multiple learning models are not required. Similarly, in one aspect of the present invention, the step of determining the best preprocessing method from among multiple preprocessing methods is not required.

（図面検索装置１２Ｖにおける処理の一例） (Example of processing in the drawing search device 12V)
検索部１２６Ｖは、（ｉ）新規図面内容パラメータ取得部１２２から新規図面内容パラメータセットを取得するとともに、（ｉｉ）学習モデル生成部１１３Ｖによって生成された学習モデルを、学習装置１１Ｖから取得する。 The search unit 126V (i) acquires a new drawing content parameter set from the new drawing content parameter acquisition unit 122, and (ii) acquires the learning model generated by the learning model generation unit 113V from the learning device 11V.

検索部１２６Ｖは、新規図面内容パラメータセットを学習モデルに入力する。そして、検索部１２６Ｖは、新規図面内容パラメータセットに応じた学習モデルの出力を、当該学習モデルから取得する。 The search unit 126V inputs the new drawing content parameter set into the learning model. Then, the search unit 126V obtains the output of the learning model corresponding to the new drawing content parameter set from the learning model.

一例として、検索部１２６Ｖは、学習モデルに新規図面内容パラメータセットを入力することにより、当該新規図面内容パラメータセットに応じた関連性スコアを、当該学習モデルに出力させる。そして、検索部１２６Ｖは、学習モデルの出力（例：関連性スコア）に基づいて、図面ＮＤに対応する少なくとも１つの過去図面を検索する。 As an example, the search unit 126V inputs a new drawing content parameter set into a learning model, causing the learning model to output a relevance score corresponding to the new drawing content parameter set. Then, the search unit 126V searches for at least one past drawing corresponding to the drawing ND based on the output of the learning model (e.g., the relevance score).
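The final search step can be sketched as a ranking over relevance scores. This assumes that the learning model's output can be arranged as one relevance score per past drawing; the function name `search_drawings`, the drawing identifiers, and the scores are hypothetical.

```python
def search_drawings(relevance_scores, top_k=1):
    """Return the identifiers of the top_k past drawings by relevance score."""
    ranked = sorted(
        relevance_scores.items(),
        key=lambda item: item[1],
        reverse=True,  # highest relevance first
    )
    return [drawing_id for drawing_id, _ in ranked[:top_k]]


# Hypothetical model output for the new drawing ND.
scores = {"PD-001": 0.12, "PD-002": 0.81, "PD-003": 0.55}
hits = search_drawings(scores, top_k=2)
print(hits)
```

Returning more than one candidate matches the wording "at least one past drawing" and lets the user pick among the closest matches.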

（実施形態２の効果） (Effects of the Second Embodiment)
実施形態２における学習装置１１Ｖによれば、複数の内容パラメータから、削除対象内容パラメータ（多重共線性を生じさせるリスクが高いと懸念される内容パラメータ）を排除した上で、学習モデルを生成できる。すなわち、学習装置１１Ｖによれば、削除対象内容パラメータを除いた複数の内容パラメータに基づき、学習モデルを生成できる。 The learning device 11V in the second embodiment can generate a learning model after excluding content parameters to be deleted (content parameters that are feared to have a high risk of causing multicollinearity) from a plurality of content parameters. That is, the learning device 11V can generate a learning model based on a plurality of content parameters excluding the content parameters to be deleted.

本明細書では、内容パラメータセット（より詳細には、検索対象図面内容パラメータセット）から削除対象内容パラメータを削除することによって得られるデータセットを、剪定後内容パラメータセット（より詳細には、剪定後検索対象図面内容パラメータセット）とも称する。以上の通り、学習装置１１Ｖによれば、剪定後内容パラメータセットに基づき、学習モデルを生成できる。このため、学習装置１１Ｖによれば、多重共線性の影響が排除された学習モデルを得ることができる。それゆえ、従来（例：特許文献１の技術）に比べて、学習モデルの品質を従来よりも向上させることができる。 In this specification, the data set obtained by deleting the deletion target content parameters from the content parameter set (more specifically, the search target drawing content parameter set) is also referred to as the pruned content parameter set (more specifically, the pruned search target drawing content parameter set). As described above, the learning device 11V can generate a learning model based on the pruned content parameter set. Therefore, the learning device 11V can obtain a learning model in which the effects of multicollinearity are eliminated. This makes it possible to improve the quality of the learning model compared to conventional techniques (e.g., the technology of Patent Document 1).

そして、実施形態２における図面検索装置１２Ｖでは、上述の通り学習装置１１Ｖによって生成された学習モデルを用いて、図面検索を行うことができる。その結果、従来に比べて、さらに高い検索精度によって図面検索を行うことができる。 The drawing search device 12V in embodiment 2 can perform drawing searches using the learning model generated by the learning device 11V as described above. As a result, drawing searches can be performed with even higher search accuracy than before.

上述の通り、実施形態２では、実施形態１とは異なり、ベスト学習モデルおよびベスト前処理手法の選択は実行されない。このため、実施形態２によれば、図面検索装置の運用時に使用される最終的な学習モデル（運用モデル）を、実施形態１に比べて短時間で得ることができる。 As described above, in embodiment 2, unlike embodiment 1, the selection of the best learning model and the best preprocessing method is not performed. Therefore, according to embodiment 2, the final learning model (operational model) used when operating the drawing search device can be obtained in a shorter time than in embodiment 1.

但し、上述の説明から明らかである通り、実施形態１によれば、運用モデルとしてベスト学習モデルを選択できる。すなわち、実施形態１によれば、実施形態２に比べてさらに高品質な運用モデルを得ることができる。加えて、実施形態１によれば、図面検索装置の運用時に使用される前処理手法（運用時前処理手法）として、ベスト前処理手法を選択できる。運用時にベスト前処理手法を適用することにより、図面検索装置における図面検索精度をより一層向上させることができる。従って、実施形態１または２のいずれの情報処理装置の構成を採用するかについては、例えば、当該情報処理装置に要求される仕様に応じて、当該情報処理装置の設計者によって適宜決定されてよい。 However, as is clear from the above explanation, according to embodiment 1, the best learning model can be selected as the operational model. That is, according to embodiment 1, an operational model of even higher quality can be obtained compared to embodiment 2. In addition, according to embodiment 1, the best preprocessing method can be selected as the preprocessing method to be used when the drawing search device is in operation (operational preprocessing method). By applying the best preprocessing method during operation, the drawing search accuracy of the drawing search device can be further improved. Therefore, whether the configuration of the information processing device of embodiment 1 or 2 is to be adopted may be decided appropriately by the designer of the information processing device, for example, depending on the specifications required for the information processing device.

〔ソフトウェアによる実現例〕 [Software implementation example]
情報処理システム１００ｓ・１００・１００Ｖ（以下では、便宜上「装置」と呼ぶ）の機能は、当該装置としてコンピュータを機能させるためのプログラムであって、当該装置の各制御ブロック（特に、制御装置１０ｓ・１０・１０Ｖに含まれる各部）としてコンピュータを機能させるためのプログラムにより実現することができる。 The functions of the information processing systems 100s, 100, and 100V (hereinafter referred to as the "device" for convenience) can be realized by a program for causing a computer to function as the device, specifically a program for causing the computer to function as each control block of the device (in particular, each unit included in the control devices 10s, 10, and 10V).

この場合、上記装置は、上記プログラムを実行するためのハードウェアとして、少なくとも１つの制御装置（例えばプロセッサ）と少なくとも１つの記憶装置（例えばメモリ）を有するコンピュータを備えている。この制御装置と記憶装置により上記プログラムを実行することにより、上記各実施形態で説明した各機能が実現される。 In this case, the device includes a computer having at least one control device (e.g., a processor) and at least one storage device (e.g., a memory) as hardware for executing the program. The functions described in each of the above embodiments are realized by executing the program using this control device and storage device.

上記プログラムは、一時的ではなく、コンピュータ読み取り可能な、１または複数の記録媒体に記録されていてもよい。この記録媒体は、上記装置が備えていてもよいし、備えていなくてもよい。後者の場合、上記プログラムは、有線または無線の任意の伝送媒体を介して上記装置に供給されてもよい。 The program may be recorded on one or more non-transitory computer-readable recording media. The recording media may or may not be included in the device. In the latter case, the program may be supplied to the device via any wired or wireless transmission medium.

また、上記各制御ブロックの機能の一部または全部は、論理回路により実現することも可能である。例えば、上記各制御ブロックとして機能する論理回路が形成された集積回路も本発明の一態様の範疇に含まれる。この他にも、例えば量子コンピュータにより上記各制御ブロックの機能を実現することも可能である。 Furthermore, some or all of the functions of each of the above control blocks can be realized by logic circuits. For example, an integrated circuit incorporating logic circuits that function as each of the above control blocks also falls within the scope of one aspect of the present invention. In addition, the functions of each of the above control blocks can also be realized by, for example, a quantum computer.

上述の各説明から明らかである通り、上記各実施形態で説明した各処理は、ＡＩ（Artificial Intelligence：人工知能）に実行させることができる。この場合、ＡＩは上記制御装置で動作するものであってもよいし、他の装置（例えばエッジコンピュータまたはクラウドサーバ等）で動作するものであってもよい。 As is clear from the above explanations, the processes described in the above embodiments can be executed by AI (Artificial Intelligence). In this case, the AI may run on the control device, or on another device (such as an edge computer or cloud server).

〔付記事項〕 [Additional Notes]
本発明の一態様は上述した各実施形態に限定されるものではなく、請求項に示した範囲で種々の変更が可能であり、異なる実施形態にそれぞれ開示された技術的手段を適宜組み合わせて得られる実施形態についても本発明の一態様の技術的範囲に含まれる。 One aspect of the present invention is not limited to the above-described embodiments, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining the technical means disclosed in different embodiments are also included in the technical scope of one aspect of the present invention.

１，１Ｖ 情報処理装置 1, 1V Information processing device
１０，１０Ｖ 制御装置 10, 10V Control device
１１，１１Ｖ 学習装置（モデル生成装置） 11, 11V Learning device (model generation device)
１００，１００Ｖ 情報処理システム 100, 100V Information processing system
１１１ 過去図面データ取得部 111 Past drawing data acquisition unit
１１２ 過去図面内容パラメータ取得部（取得部） 112 Past drawing content parameter acquisition unit (acquisition unit)
１１３，１１３Ｖ 学習モデル生成部（学習部） 113, 113V Learning model generation unit (learning unit)
１１４ 学習用前処理部（前処理部） 114 Learning preprocessing unit (preprocessing unit)
１１５，１１５Ｖ 決定部 115, 115V Determination unit

Claims

1. A model generation device that generates a learning model for searching for at least one drawing corresponding to a target drawing from among a plurality of search target drawings, the model generation device comprising:
an acquisition unit that analyzes the plurality of search target drawings to acquire a content parameter set including a plurality of content parameters related to the description content of each of the plurality of search target drawings;
a determination unit that (i) calculates a multicollinearity evaluation value between two different content parameters for each combination pattern of two different content parameters among the plurality of content parameters included in the content parameter set, and (ii) determines, based on the multicollinearity evaluation value, a content parameter to be deleted from among the plurality of content parameters; and
a learning unit that generates the learning model based on a pruned content parameter set obtained by deleting the content parameter to be deleted from the content parameter set,
wherein the model generation device further comprises a preprocessing unit that generates a plurality of preprocessed content parameter sets, each including a plurality of preprocessed content parameters, by preprocessing each of the plurality of content parameters included in the content parameter set according to a combination of a plurality of predetermined types of preprocessing methods,
the determination unit (i) calculates a multicollinearity evaluation value between two different preprocessed content parameters for each combination pattern of two different preprocessed content parameters among the plurality of preprocessed content parameters included in each of the plurality of preprocessed content parameter sets, and (ii) determines, based on the multicollinearity evaluation value, a preprocessed content parameter to be deleted from among the plurality of preprocessed content parameters included in each of the plurality of preprocessed content parameter sets,
the learning unit:
generates a plurality of learning models by applying each of a plurality of predetermined types of machine learning algorithms to a plurality of pruned preprocessed content parameter sets obtained by deleting the preprocessed content parameter to be deleted from each of the plurality of preprocessed content parameter sets;
selects a best learning model from among the plurality of learning models based on a plurality of index values indicating the quality of each of the plurality of learning models, the index values being obtained by verifying each of the plurality of learning models using each of the plurality of pruned preprocessed content parameter sets; and
selects, from among the plurality of types of preprocessing methods, the preprocessing method corresponding to the best learning model as a best preprocessing method, and
one of the plurality of types of preprocessing methods is one-hot encoding.

2. The model generation device according to claim 1, wherein the learning unit selects, from the plurality of learning models, a learning model having the highest index value as the best learning model.

3. The model generating device according to claim 1, wherein the determination unit determines one of the two preprocessed content parameters included in each of the plurality of preprocessed content parameter sets as the preprocessed content parameter to be deleted when the multicollinearity evaluation value is equal to or greater than a multicollinearity threshold.

4. The model generating device according to claim 3, wherein the determination unit:
calculates a coefficient of determination between the two preprocessed content parameters included in each of the plurality of preprocessed content parameter sets as the multicollinearity evaluation value; and
when the coefficient of determination is equal to or greater than a coefficient of determination threshold serving as the multicollinearity threshold, determines one of the two preprocessed content parameters included in each of the plurality of preprocessed content parameter sets to be the preprocessed content parameter to be deleted.

5. The model generating device according to claim 4, wherein the coefficient of determination threshold is set to a predetermined value not less than 0.49 and not more than 1.

6. The model generating device according to claim 3, wherein the determination unit:
calculates, as the multicollinearity evaluation value, an absolute value of a correlation coefficient between the two preprocessed content parameters included in each of the plurality of preprocessed content parameter sets; and
when the absolute value of the correlation coefficient is equal to or greater than a correlation coefficient absolute value threshold serving as the multicollinearity threshold, determines one of the two preprocessed content parameters included in each of the plurality of preprocessed content parameter sets to be the preprocessed content parameter to be deleted.

7. The model generating device according to claim 6, wherein the correlation coefficient absolute value threshold is set to a predetermined value not less than 0.7 and not more than 1.

A model generation method in which a model generation device generates a learning model for searching at least one drawing corresponding to a target drawing from among a plurality of search target drawings, the method comprising:
The above model generation method is as follows:
an acquisition step in which the model generation device analyzes the plurality of search target drawings to acquire a content parameter set including a plurality of content parameters related to the description content of each of the plurality of search target drawings;
the model generating device performs a determination step of: (i) calculating a multicollinearity evaluation value between two different content parameters for each combination pattern of the plurality of content parameters included in the content parameter set; and (ii) determining a deletion target content parameter to be deleted from the plurality of content parameters based on the multicollinearity evaluation value;
a learning step in which the model generation device generates the learning model based on a pruned content parameter set obtained by deleting the deletion target content parameters from the content parameter set ,
The above model generation method is as follows:
a preprocessing step of generating a plurality of preprocessed content parameter sets each including a plurality of preprocessed content parameters by preprocessing each of the plurality of content parameters included in the content parameter set according to a combination of a plurality of predetermined types of preprocessing techniques;
the determination step further includes a step in which the model generation device (i) calculates, for each combination pattern of two different preprocessed content parameters among the plurality of preprocessed content parameters included in each of the plurality of preprocessed content parameter sets, a multicollinearity evaluation value between the two preprocessed content parameters, and (ii) determines, based on the multicollinearity evaluation value, a preprocessed content parameter to be deleted from among the plurality of preprocessed content parameters included in each of the plurality of preprocessed content parameter sets;
the learning step includes:
a step in which the model generation device generates a plurality of learning models by applying each of a plurality of predetermined types of machine learning algorithms to a plurality of pruned preprocessed content parameter sets obtained by deleting the preprocessed content parameter to be deleted from each of the plurality of preprocessed content parameter sets; and
a step in which the model generation device selects a best learning model from among the plurality of learning models based on a plurality of index values indicating the quality of each of the plurality of learning models, the index values being obtained by verifying each of the plurality of learning models using each of the plurality of pruned preprocessed content parameter sets;
the model generation method further includes a step in which the model generation device selects, from among the plurality of types of preprocessing techniques, a preprocessing technique corresponding to the best learning model as a best preprocessing technique; and
one of the plurality of types of preprocessing techniques is one-hot encoding.
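
The flow of the method claim above (preprocess under several combinations of techniques, prune each resulting parameter set, train one model per algorithm, then pick the best model and its preprocessing) can be sketched as follows. This is only an illustration under stated assumptions. The synthetic data, the category labels, the two scaling techniques paired with one-hot encoding, the two toy classifiers, the even/odd hold-out split, and accuracy as the single index value are all hypothetical and do not come from the claim.

```python
import numpy as np

def one_hot(labels):
    # One-hot encoding: one 0/1 column per distinct category.
    cats = sorted(set(labels))
    return np.array([[float(v == c) for c in cats] for v in labels])

def standardize(X):
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)

def min_max(X):
    span = X.max(axis=0) - X.min(axis=0)
    return (X - X.min(axis=0)) / np.where(span == 0, 1.0, span)

def prune(X, threshold=0.7):
    # Keep a column only if its |correlation| with every kept column is below threshold.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(not corr[i, j] >= threshold for i in keep):
            keep.append(j)
    return X[:, keep]

def nearest_centroid(X_train, y_train, X_test):
    classes = np.unique(y_train)
    cents = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def one_nearest_neighbor(X_train, y_train, X_test):
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[dist.argmin(axis=1)]

# Hypothetical content parameters for 40 drawings.
rng = np.random.default_rng(0)
n = 40
numeric = rng.normal(size=(n, 2))
# Third column nearly duplicates the first, so pruning should remove it.
numeric = np.column_stack([numeric, 2.0 * numeric[:, 0] + 0.01 * rng.normal(size=n)])
kind = [("valve", "pump", "tank")[i % 3] for i in range(n)]  # categorical parameter
y = (numeric[:, 0] + numeric[:, 1] > 0).astype(int)          # hypothetical target label

# Each entry is one combination of preprocessing techniques (both use one-hot encoding).
preprocessors = {
    "one_hot+standardize": lambda: np.hstack([standardize(numeric), one_hot(kind)]),
    "one_hot+min_max": lambda: np.hstack([min_max(numeric), one_hot(kind)]),
}
algorithms = {"nearest_centroid": nearest_centroid, "1nn": one_nearest_neighbor}

train, test = np.arange(0, n, 2), np.arange(1, n, 2)  # simple hold-out split
scores = {}
for pre_name, pre in preprocessors.items():
    X = prune(pre())  # pruned preprocessed content parameter set
    for algo_name, algo in algorithms.items():
        pred = algo(X[train], y[train], X[test])
        scores[(pre_name, algo_name)] = float((pred == y[test]).mean())

# Best learning model and the preprocessing combination that produced it.
best_pre, best_algo = max(scores, key=scores.get)
```

The sketch uses one index value per model for brevity, whereas the claim refers to a plurality of index values. A fuller version might combine several measures before selecting the best model.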