JP7702799B2

JP7702799B2 - DATA EXTRACTION DEVICE, LEARNING MODEL CONSTRUCTION DEVICE, DATA EXTRACTION METHOD, LEARNING MODEL CONSTRUCTION METHOD, AND PROGRAM

Info

Publication number: JP7702799B2
Application number: JP2021062980A
Authority: JP
Inventors: 勝彦大塚; 克明森田; 清彦壺内; 裕之住田; 昇平大槻
Original assignee: Mitsubishi Heavy Industries Ltd
Current assignee: Mitsubishi Heavy Industries Ltd
Priority date: 2021-04-01
Filing date: 2021-04-01
Publication date: 2025-07-04
Anticipated expiration: 2041-04-01
Also published as: JP2022158224A

Description

The present disclosure relates to a data extraction device, a learning model construction device, a data extraction method, a learning model construction method, and a program.

Deep learning has come into increasingly wide use in recent years. For example, Patent Document 1 discloses a method for improving detection accuracy when detecting targets in images using deep learning. Deep learning may also achieve high accuracy on multi-class classification problems for time-series data, where the application of convolutional neural networks (CNNs) is expected. However, training and evaluating a CNN takes a great deal of time. For example, when constructing a discrimination model that determines the operating state of a plant from the plant's process data, constructing a separate discrimination model for each plant requires an enormous amount of time for training and evaluation.

Patent Document 1: Chinese Patent Application Publication No. 106934346

There is a need for a method of reducing the time required to construct a learning model.

The present disclosure provides a data extraction device, a learning model construction device, a data extraction method, a learning model construction method, and a program that can solve the above problem.

The data extraction device of the present disclosure extracts a subset of learning data from a plurality of pieces of learning data cut out of time-series data for each time window. The device includes a data acquisition unit that acquires the plurality of pieces of learning data; a similarity calculation unit that calculates the similarity between pieces of learning data based on features of the learning data from which differences in temporal position within the time window have been excluded; and a data reduction unit that separates out pieces of learning data that are highly similar to one another and deletes some of them. The similarity calculation unit may calculate the similarity as follows: taking, for each piece of learning data, the region enclosed between its waveform and the time axis, the two pieces of learning data are shifted relative to each other along the time axis, and the similarity is based on the area, per unit time, over which the two regions overlap. The similarity calculation unit may also calculate the similarity between the shapes of the frequency distributions obtained by frequency analysis of the two pieces of learning data. The data extraction device may further include a classification unit that groups mutually similar pieces of learning data into the same group according to a predetermined evaluation measure, in which case the similarity calculation unit may calculate the similarity between pieces of learning data belonging to the same group.

The learning model construction device of the present disclosure includes the data extraction device described above and a learning unit that trains on the learning data to construct a learning model. The learning unit first constructs the learning model by training on the learning data extracted by the data extraction device, and then adjusts the parameters of the learning model using the learning data as it was before extraction.

The data extraction method of the present disclosure extracts a subset of learning data from a plurality of pieces of learning data cut out of time-series data for each time window. The method includes a step of acquiring the plurality of pieces of learning data; a step of calculating the similarity between pieces of learning data based on features of the learning data from which differences in temporal position within the time window have been excluded; and a step of separating out pieces of learning data that are highly similar to one another and deleting some of them. In the step of calculating the similarity, the similarity may be calculated as follows: taking, for each piece of learning data, the region enclosed between its waveform and the time axis, the two pieces of learning data are shifted relative to each other along the time axis, and the similarity is based on the area, per unit time, over which the two regions overlap. In the step of calculating the similarity, the similarity between the shapes of the frequency distributions obtained by frequency analysis of the two pieces of learning data may also be calculated. The method may further include a step of grouping mutually similar pieces of learning data into the same group according to a predetermined evaluation measure, in which case the step of calculating the similarity may calculate the similarity between pieces of learning data belonging to the same group.

The learning model construction method of the present disclosure constructs a learning model by training on the learning data extracted by the data extraction method described above, and then adjusts the parameters of the learning model using the learning data as it was before extraction.
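The two-stage schedule described above (train on the extracted subset first, then adjust the parameters on the full data) can be sketched as follows. This is only an illustration under stated assumptions: a NumPy logistic regression stands in for the deep learning model, and the names `gradient_steps` and `two_stage_fit`, the epoch counts, the learning rate, and the synthetic data are all hypothetical.

```python
import numpy as np

def gradient_steps(w, X, y, epochs, lr):
    """Run full-batch gradient descent on the logistic loss and return the weights."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def two_stage_fit(X_full, y_full, subset_idx, coarse_epochs=200, fine_epochs=20, lr=0.5):
    """Stage 1: train on the extracted subset. Stage 2: adjust on all data."""
    w = np.zeros(X_full.shape[1])
    w = gradient_steps(w, X_full[subset_idx], y_full[subset_idx], coarse_epochs, lr)
    w = gradient_steps(w, X_full, y_full, fine_epochs, lr)
    return w

# Hypothetical two-class data; every fifth sample plays the role of the extracted subset.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])  # last column is a bias term
y = (X[:, 0] + X[:, 1] > 0).astype(float)
subset = np.arange(0, 200, 5)
w = two_stage_fit(X, y, subset)
accuracy = np.mean(((X @ w) > 0) == (y == 1))
```

Most of the iterations run on the small subset, so the expensive pass over the full data is limited to a short adjustment stage.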

The program of the present disclosure causes a computer to execute a data extraction process that extracts a subset of learning data from a plurality of pieces of learning data cut out of time-series data for each time window. The data extraction process includes a step of acquiring the plurality of pieces of learning data; a step of calculating the similarity between pieces of learning data based on features of the learning data from which differences in temporal position within the time window have been excluded; and a step of separating out pieces of learning data that are highly similar to one another and deleting some of them. In the step of calculating the similarity, the similarity may be calculated as follows: taking, for each piece of learning data, the region enclosed between its waveform and the time axis, the two pieces of learning data are shifted relative to each other along the time axis, and the similarity is based on the area, per unit time, over which the two regions overlap. In the step of calculating the similarity, the similarity between the shapes of the frequency distributions obtained by frequency analysis of the two pieces of learning data may also be calculated. The data extraction process executed by the program may further include a step of grouping mutually similar pieces of learning data into the same group according to a predetermined evaluation measure, in which case the step of calculating the similarity may calculate the similarity between pieces of learning data belonging to the same group.

The program of the present disclosure causes a computer to execute a process of constructing a learning model by training on the learning data extracted by the data extraction process described above, and then adjusting the parameters of the learning model using the learning data as it was before extraction.

With the data extraction device, learning model construction device, data extraction method, learning model construction method, and program described above, the time required to construct a learning model can be reduced by reducing the amount of learning data.

FIG. 1 is a block diagram showing an example of a learning model construction device according to a first embodiment.
FIG. 2 is a diagram outlining the data extraction method of the first embodiment.
FIG. 3 is a diagram illustrating how similarity is calculated using normalized cross-correlation in the first embodiment.
FIG. 4 is a diagram illustrating how similarity is calculated using frequency distribution in the first embodiment.
FIG. 5A is a first diagram illustrating the data reduction method of the first embodiment.
FIG. 5B is a second diagram illustrating the data reduction method of the first embodiment.
FIG. 5C is a third diagram illustrating the data reduction method of the first embodiment.
FIG. 5D is a fourth diagram illustrating the data reduction method of the first embodiment.
FIG. 6 is a flowchart showing an example of the data extraction process of the first embodiment.
FIG. 7 is a flowchart showing an example of the learning model construction process of the first embodiment.
FIG. 8 is a block diagram showing an example of a learning model construction device according to a second embodiment.
FIG. 9 is a flowchart showing an example of the data extraction process of the second embodiment.
FIG. 10 is a diagram showing an example of the hardware configuration of the learning model construction device according to the embodiments.

<First Embodiment>
A data extraction method and a learning model construction method according to a first embodiment of the present disclosure are described below with reference to FIGS. 1 to 7.
(Configuration)
FIG. 1 is a block diagram showing an example of the learning model construction device according to the embodiment.
As shown in the figure, the learning model construction device 10 includes a data acquisition unit 11, a data extraction unit 12, a learning unit 13, an input unit 14, an output unit 15, and a storage unit 16.
The data acquisition unit 11 acquires the learning data used to construct a learning model. The learning data is, for example, data cut out, for each time window of a predetermined width, from time-series data such as temperature, pressure, and flow rate collected from monitored equipment or a plant. Refer to FIG. 2. The upper diagram 201 of FIG. 2 shows an example of the process of cutting learning data out of time-series data (random sampling). Variables A to C are time-series data of parameters such as temperature, pressure, and flow rate. In random sampling, one piece of learning data is cut out of the time-series data of each parameter using a time window with a common start time and end time, and the time window is then shifted arbitrarily to cut out further pieces of learning data. The data acquisition unit 11 acquires the data cut out with a common time window for each of variables A to C. This data is used as learning data for constructing the learning model. Variables A to C are used as an example here, but the number of variables is not limited to three; it may be two or fewer, or four or more.
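The random sampling described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation in the disclosure: the function name `sample_windows`, its parameters, the uniformly random start positions, and the synthetic three-variable data are assumptions made for the example.

```python
import numpy as np

def sample_windows(series, window_len, n_windows, seed=None):
    """Cut n_windows windows of window_len samples out of multivariate time-series data.

    series: array of shape (n_vars, n_samples), one row per variable (A, B, C, ...).
    Every variable is cut with the same start and end index, so each window
    has shape (n_vars, window_len).
    """
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    # Start positions are drawn uniformly at random (an assumption of this sketch).
    starts = rng.integers(0, series.shape[1] - window_len + 1, size=n_windows)
    windows = np.stack([series[:, s:s + window_len] for s in starts])
    return starts, windows

# Hypothetical data: three variables with 1000 samples each.
t = np.arange(1000)
data = np.vstack([np.sin(t / 20.0), np.cos(t / 35.0), 0.01 * t])
starts, windows = sample_windows(data, window_len=100, n_windows=9, seed=0)
```

Keeping the variables stacked in one array guarantees that each window shares its start and end across all variables, which the similarity calculation relies on.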

The data extraction unit 12 extracts data from the learning data acquired by the data acquisition unit 11 in such a way that learning accuracy does not decrease. Refer to the lower diagram 202 of FIG. 2. Of the learning data cut out for each of the time frames (time windows) W1 to W9, the data extraction unit 12 extracts only representative data (e.g., W1) as learning data, so that data judged to have the same features in time-series terms (e.g., W1 and W4) is not included more than once. The data extraction unit 12 includes a similarity calculation unit 121 and a data reduction unit 122. The similarity calculation unit 121 calculates the similarity of data based on, for example, (1) normalized cross-correlation or (2) frequency distribution. The similarity calculation methods are described later with reference to FIGS. 3 and 4. The data reduction unit 122 removes, from the learning data, data whose features overlap in time-series terms, based on the similarity between the data. The reduction method is described later with reference to FIGS. 5A to 5D.
Here, "data whose features overlap in time-series terms" covers not only data whose features are similar when the data cut out for each time frame is compared as is, but also data whose features become similar when the cut-out data is shifted along the time axis (the time frame itself is not shifted; the data is shifted after being cut out with a common time frame). When constructing a learning model that is trained with the time-series information of the learning data (information on what comes earlier or later in time) excluded, as in deep learning with a CNN, it is considered that even if data with overlapping time-series features is removed, a learning model that handles a given feature can still be constructed as long as a few pieces of learning data with that feature remain. Learning time is shortened by excluding such data from the learning data and training on a small number of pieces of learning data, narrowed down from the whole, that have the features needed to construct the learning model.

The learning unit 13 trains on the learning data by deep learning and constructs a learning model.
The input unit 14 is configured using input devices such as a keyboard, mouse, touch panel, and buttons. The input unit 14 accepts information entered with the input devices and outputs it to the data extraction unit 12, the learning unit 13, and so on.
The output unit 15 outputs various data to an electronic file or a display device. For example, the output unit 15 may output the learning data extracted by the data extraction unit 12 as an electronic file.
The storage unit 16 stores the data needed for extracting learning data and constructing a learning model.

[Similarity calculation]
The similarity calculation unit 121 calculates the similarity between pieces of learning data using either the normalized cross-correlation or the frequency component distribution described below.

(1) Similarity based on normalized cross-correlation
Next, the method of calculating similarity based on normalized cross-correlation is described with reference to FIGS. 2 and 3. Waveform data L1 and waveform data L2 in FIG. 3 are, respectively, the variable A data of time frame W1 and the variable A data of time frame W4 cut out of the time-series data, each normalized so that its values have a mean of 0 and a standard deviation of 1. The similarity calculation unit 121, for example, shifts waveform data L2 along the time axis relative to waveform data L1 and calculates the area P1 of the portion where the two waveforms overlap. The similarity calculation unit 121 then divides the area P1 by the duration T1 of the overlapping portion to obtain the unit area of the overlapping portion (its area per unit time).
In the same way as for variable A, the similarity calculation unit 121 calculates normalized waveform data for the time frame W1 data and the time frame W4 data of variables B and C, and, with time frame W4 shifted along the time axis by the same amount as for variable A, calculates the unit area of the portion where the waveform data for time frame W1 and the waveform data for time frame W4 overlap. Let the unit area for variable B be P2 and the unit area for variable C be P3. While adjusting the amount by which the time frame W4 data is shifted along the time axis, the similarity calculation unit 121 calculates the average of the unit areas P1 to P3 (P1 here being the unit area for variable A) and searches for the shift amount that maximizes this average. The similarity calculation unit 121 takes the maximum average as the similarity between the time frame W1 data and the time frame W4 data. With this calculation, even waveforms that are out of phase receive a high similarity value if they contain periods with similar trends. Using the same method, the similarity calculation unit 121 calculates the similarity between time frame W1 and each of time frames W2 to W9, between time frame W2 and each of time frames W3 to W9, and so on, covering every pair of time frames. If all the time-series values of a piece of learning data are identical, normalization is impossible, so the similarity for that variable is set to 1.
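The shift search described above can be sketched as follows. The text does not give a formula for the overlap area, so this sketch assumes one interpretation: where the two normalized waveforms lie on the same side of the time axis, the regions overlap up to the smaller magnitude, and elsewhere they do not overlap. The function names, the `min_overlap` lower bound on the overlap length, and the sine-wave data are hypothetical.

```python
import numpy as np

def znorm(x):
    """Normalize to mean 0 and standard deviation 1; return None for a constant series."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return None if std == 0 else (x - x.mean()) / std

def unit_overlap_area(a, b):
    """Overlap area per unit time of the regions between each waveform and the time axis."""
    same_side = np.sign(a) == np.sign(b)
    overlap = np.where(same_side, np.minimum(np.abs(a), np.abs(b)), 0.0)
    return overlap.mean()  # per-sample areas summed, divided by the overlap duration

def window_similarity(win1, win2, min_overlap=None):
    """Similarity of two windows of shape (n_vars, n_samples), maximized over the shift."""
    win1, win2 = np.asarray(win1, dtype=float), np.asarray(win2, dtype=float)
    n = win1.shape[1]
    if min_overlap is None:
        min_overlap = n // 2  # hypothetical bound so tiny overlaps are not considered
    norm1 = [znorm(v) for v in win1]
    norm2 = [znorm(v) for v in win2]
    best = -np.inf
    max_shift = n - min_overlap
    for shift in range(-max_shift, max_shift + 1):
        # The same shift is applied to every variable.
        if shift >= 0:
            s1, s2 = slice(shift, n), slice(0, n - shift)
        else:
            s1, s2 = slice(0, n + shift), slice(-shift, n)
        scores = []
        for a, b in zip(norm1, norm2):
            if a is None or b is None:
                scores.append(1.0)  # constant series: similarity fixed at 1, as in the text
            else:
                scores.append(unit_overlap_area(a[s1], b[s2]))
        best = max(best, float(np.mean(scores)))
    return best

# Hypothetical example: the same sine wave cut out with a phase offset.
t = np.arange(200)
w1 = np.sin(2 * np.pi * t / 50)[None, :]
w4 = np.sin(2 * np.pi * (t + 12) / 50)[None, :]
sim_with_search = window_similarity(w1, w4)
sim_without_shift = window_similarity(w1, w4, min_overlap=200)
```

In this example the shift search recovers the alignment between the two windows, so `sim_with_search` comes out well above `sim_without_shift`, which mirrors the remark that out-of-phase waveforms with similar trends receive a high similarity.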

(2) Similarity based on frequency component distribution
Next, the method of calculating similarity based on the distribution of frequency components is described with reference to FIG. 4. For example, the similarity calculation unit 121 performs frequency analysis by fast Fourier transform on the normalized variable A data cut out in time frame W1 and on the normalized variable A data cut out in time frame W4. Based on the data after frequency analysis, the similarity calculation unit 121 calculates a similarity using the frequency distribution. FIG. 4 shows data Q1 to Q3 obtained by frequency analysis of data from different time frames. The vertical axis of the graph in FIG. 4 is the magnitude of the frequency component and the horizontal axis is frequency. For example, the vertical-axis value of data Q1 is the average magnitude of each frequency component in time frame W1 (averaged along the time axis).
Comparing data Q1 to Q3, the shapes of the curves are similar even though the magnitudes of the frequency components differ. For example, in all of data Q1 to Q3, the magnitude of the frequency components rises with increasing frequency in range f1 and stays roughly constant as frequency changes in range f2. When the trends of the frequency distributions are similar in this way, the similarity between data Q1 to Q3 is high; when the trends differ, the similarity is low. Specifically, the similarity calculation unit 121 calculates, for example, the cosine similarity of vectors V1 and V3 formed from the frequency components of data Q1 and data Q3 in frequency range f3. The similarity calculation unit 121 likewise calculates, for other frequency ranges, the cosine similarity of the vectors formed from the frequency components in each range (e.g., V1a and V3a). For example, the similarity calculation unit 121 takes the average of the cosine similarities calculated over a predetermined range for each of variables A to C as the similarity between data Q1 and data Q3. More specifically, the similarity calculation using the fast Fourier transform proceeds as follows.
(Step 1) Apply the fast Fourier transform to the two pieces of data and calculate the real and imaginary parts of each frequency component.
(Step 2) Calculate, as the similarity, the cosine similarity that uses the real and imaginary coefficients calculated above as features.

The predetermined range is, for example, the range of frequencies that remains after excluding the range containing much of the noise (the high-frequency range) when the data contains a lot of noise (mainly high-frequency components). When the frequency component values in some frequency range are zero or small, that range may also be excluded from the cosine similarity calculation. In addition, depending on the operation mode of the target equipment or plant (starting up, rated operation, partial-load operation, stopped, and so on), the similarity may be calculated using only the frequency range associated with each operation mode. For example, in a transient operating state (starting up, stopping, or changing operation mode), the similarity may be calculated using data in the low-frequency range. In this way, the similarity calculation unit 121 calculates the frequency-distribution-based similarity between time frame W1 and each of time frames W2 to W9, between time frame W2 and each of time frames W3 to W9, and so on, covering every pair of time frames. Note that in the similarity based on frequency component distribution, the features (the frequency component value at each frequency) and the similarity are calculated with the before-and-after ordering along the time axis already excluded.
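A frequency-based similarity of this kind can be sketched with NumPy's real FFT. This is an illustration under assumptions, not the disclosed implementation: the function name `spectrum_similarity`, the `band` parameter given as FFT bin indices, and the `use_phase` switch are hypothetical. With `use_phase=False` the sketch compares magnitude spectra, as in the description of FIG. 4; with `use_phase=True` it uses the real and imaginary coefficients as features, as in Steps 1 and 2.

```python
import numpy as np

def spectrum_similarity(x, y, band=None, use_phase=False):
    """Cosine similarity between the frequency content of two equally long series.

    band: optional (low, high) pair of FFT bin indices to keep, for example to
          drop a noisy high-frequency range.
    use_phase: False compares magnitudes; True uses real and imaginary parts.
    """
    feats = []
    for v in (x, y):
        v = np.asarray(v, dtype=float)
        std = v.std()
        v = (v - v.mean()) / std if std > 0 else v - v.mean()  # normalize before the FFT
        spec = np.fft.rfft(v)
        if band is not None:
            spec = spec[band[0]:band[1]]
        feats.append(np.concatenate([spec.real, spec.imag]) if use_phase else np.abs(spec))
    denom = np.linalg.norm(feats[0]) * np.linalg.norm(feats[1])
    return float(feats[0] @ feats[1] / denom) if denom > 0 else 0.0

# Hypothetical signals: the same tone rescaled, time-shifted, and a different tone.
t = np.arange(256)
x = np.sin(2 * np.pi * 5 * t / 256)
same_shape = spectrum_similarity(x, 3 * x)
shifted = spectrum_similarity(x, np.roll(x, 17))
other_tone = spectrum_similarity(x, np.sin(2 * np.pi * 20 * t / 256))
shifted_with_phase = spectrum_similarity(x, np.roll(x, 17), use_phase=True)
```

Comparing magnitudes is insensitive to a circular time shift of the window, whereas the real and imaginary coefficients also reflect phase, so the choice of features affects how strongly temporal position influences the result.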

Working from the idea that the features of a given piece of data can be represented by other data with a high similarity to it, the data extraction unit 12 uses the data reduction unit 122 to remove data with overlapping time-series features, based on the similarity calculated by the similarity calculation unit 121, and extracts the subset of data that has the features needed to construct the learning model. The data reduction process is described next with reference to FIGS. 5A to 5D.

[Data reduction process]
Table 500 in FIG. 5A summarizes the similarities between pieces of data A to I, each cut out in its own time frame, calculated with similarity calculation method (1) or (2) above. For example, the similarity between data A and data B is 0.9, between data A and data C is 0.6, and between data A and data I is 0.2. The data reduction unit 122 separates, for each piece of data, the similar data from the dissimilar data based on a predetermined threshold (e.g., 0.9). Table 501 in FIG. 5B shows the resulting groups of similar data. FIG. 5C shows a feature-based distribution in which the similarity between pieces of data is represented as the distance between them. Data A to I belong to class 1, and data A' belongs to class 2. For example, class 1 is the set of data for variables such as pressure and flow rate when the plant state is normal, and class 2 is the set of data when the plant state is abnormal. Alternatively, class 1 is the set of data for each variable when the plant is in rated operation, and class 2 is the set of data during startup. The data reduction unit 122 deletes learning data within each class. Deletion within class 1 is described here.

The data reduction unit 122 selects the piece of data that has many highly similar pieces of data and deletes the selected piece. Referring to table 501 in FIG. 5B, for example, data A has one similar piece of class 1 data (B), data C has three (G, H, I), and data H has five (C, E, F, G, I). FIG. 5C likewise shows that data C, E, F, G, and I, each with a similarity of 0.9 or more, are distributed around data H. In this case, the data reduction unit 122 deletes data H. Table 502 in FIG. 5B shows the relationships between the data after the deletion. Referring to table 502, data B has two similar pieces of data (A, C), data C has two (B, I), and every other piece of data has at most one. The data reduction unit 122 repeats the process of deleting the piece of data with many highly similar pieces on the data group of table 502. For example, the data reduction unit 122 deletes either data B or data C (either will do). Suppose data B is deleted here. FIG. 5D shows the relationships between the data after data B is deleted. In FIG. 5D, every piece of data has at most one similar piece. Repeating this process progressively deletes data with overlapping time-series features. The data reduction unit 122 ends the reduction once the number of pieces of learning data has been reduced sufficiently while the features of the original learning data group are preserved. For example, the data reduction unit 122 stops deleting when the total number of pieces of data falls to a predetermined number or below, or when the number of pieces similar to each piece of data falls to a predetermined number or below. The data extraction unit 12 extracts the data remaining after this reduction as the learning data used to learn the rough structure of the learning model.
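The repeated deletion described above amounts to a greedy loop over a thresholded similarity matrix. The following sketch assumes NumPy; the function name `reduce_by_similarity`, the parameters `max_neighbors` and `target_count`, and the 5-by-5 similarity matrix are hypothetical and do not reproduce the values in tables 500 to 502.

```python
import numpy as np

def reduce_by_similarity(sim, threshold=0.9, max_neighbors=1, target_count=1):
    """Greedily delete the item that has the most neighbors with similarity >= threshold.

    Stops when every remaining item has at most max_neighbors similar items,
    or when only target_count items remain. Returns (kept, removed) index lists.
    """
    adjacent = np.asarray(sim, dtype=float) >= threshold
    np.fill_diagonal(adjacent, False)  # an item is not its own neighbor
    kept = list(range(adjacent.shape[0]))
    removed = []
    while len(kept) > target_count:
        counts = adjacent[np.ix_(kept, kept)].sum(axis=1)
        worst = int(counts.argmax())  # ties resolve to the first item; any would do
        if counts[worst] <= max_neighbors:
            break
        removed.append(kept.pop(worst))
    return kept, removed

# Hypothetical similarities for five windows within one class; item 2 is a hub.
sim = np.array([
    [1.00, 0.95, 0.30, 0.30, 0.30],
    [0.95, 1.00, 0.92, 0.30, 0.30],
    [0.30, 0.92, 1.00, 0.93, 0.91],
    [0.30, 0.30, 0.93, 1.00, 0.30],
    [0.30, 0.30, 0.91, 0.30, 1.00],
])
kept, removed = reduce_by_similarity(sim, threshold=0.9, max_neighbors=1)
```

Here item 2 has three similar neighbors and is deleted first, after which no remaining item has more than one similar neighbor, so the loop stops. In line with the text, the function would be called once per class.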

（動作）
[データ抽出処理]
次に図６を参照して、本実施形態のデータ抽出処理の流れを説明する。
図６は、第一実施形態のデータ抽出処理の一例を示すフローチャートである。
まず、データ取得部１１が、複数のパラメータ（例えば、圧力、流量、・・・など）の時系列データの各々から時間窓ごとに切り出された複数の学習データを取得する（ステップＳ１０）。データ取得部１１が、時系列データを取得して、時間窓ごとにデータを切り出して学習データを取得してもよい。データ取得部１１は、時間枠ごとに切り出された複数の学習データを記憶部１６へ記録する。 (operation)
[Data extraction process]
Next, the flow of the data extraction process of this embodiment will be described with reference to FIG. 6.
FIG. 6 is a flowchart illustrating an example of the data extraction process of the first embodiment.
First, the data acquisition unit 11 acquires a plurality of pieces of learning data cut out for each time window from each of the time-series data of a plurality of parameters (e.g., pressure, flow rate, etc.) (step S10). Alternatively, the data acquisition unit 11 may acquire the time-series data itself and cut out the data for each time window to obtain the learning data. The data acquisition unit 11 records the plurality of pieces of learning data cut out for each time window in the storage unit 16.
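Step S10 can be illustrated with a short sketch of window cutting. The function name `cut_windows`, the `stride` parameter, and the sample values are assumptions; this section does not state whether adjacent windows overlap.

```python
def cut_windows(series, window, stride=None):
    """Cut one parameter's time series into fixed-length windows.

    Returns (start_index, values) pairs. `stride` defaults to the window
    length (non-overlapping windows), which is an assumption.
    """
    if stride is None:
        stride = window
    last_start = len(series) - window
    return [(start, series[start:start + window])
            for start in range(0, last_start + 1, stride)]


# Hypothetical process data for two parameters.
pressure = [1.0, 1.1, 1.2, 1.1, 1.0, 0.9, 0.8, 0.9]
flow = [5.0, 5.0, 5.1, 5.3, 5.2, 5.0, 4.9, 4.8]
windows = {name: cut_windows(values, window=4, stride=2)
           for name, values in {"pressure": pressure, "flow": flow}.items()}
```

Keeping the start index with each window makes it possible to match windows of different parameters that cover the same time span.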

次にデータ抽出部１２が学習データをクラスに分類する（ステップＳ２０）。例えば、学習データをプラントが正常な運転状態のときのデータ（クラス１）と、異常なときのデータ（クラス２）とに分けたい場合、ユーザが、正常な運転状態の時間帯、異常な運転状態のときの時間帯を学習モデル構築装置１０へ入力する。入力部１４は、入力された時間帯の情報をデータ抽出部１２へ出力する。データ抽出部１２は、ユーザによって設定された時間帯に応じて、ステップＳ１０で取得した学習データをクラス１、クラス２の何れかに分類する。 Next, the data extraction unit 12 classifies the learning data into classes (step S20). For example, if the user wishes to separate the learning data into data when the plant is in a normal operating state (class 1) and data when the plant is in an abnormal operating state (class 2), the user inputs the time periods when the plant is in a normal operating state and the time periods when the plant is in an abnormal operating state to the learning model construction device 10. The input unit 14 outputs the input information about the time periods to the data extraction unit 12. The data extraction unit 12 classifies the learning data acquired in step S10 into either class 1 or class 2, depending on the time period set by the user.
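The class assignment in step S20 can be sketched as a lookup against the time periods entered by the user. The function name `assign_class`, the period format, and the choice to leave a window unlabeled when it straddles a period boundary are assumptions, not something the text specifies.

```python
def assign_class(window_start, window_end, periods):
    """Return the class whose period fully contains the window, else None.

    `periods` is a list of (class_label, (period_start, period_end)) pairs
    entered by the user. Windows that straddle a boundary stay unlabeled,
    which is an assumption of this sketch.
    """
    for label, (t0, t1) in periods:
        if t0 <= window_start and window_end <= t1:
            return label
    return None


# Hypothetical periods: class 1 = normal operation, class 2 = abnormal.
periods = [(1, (0, 100)), (2, (100, 160)), (1, (160, 300))]
windows = [(0, 50), (80, 130), (100, 150), (200, 250)]
labels = [assign_class(start, end, periods) for start, end in windows]
```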

次にデータ抽出部１２が、複数の学習データから時系列的な特徴が重複したデータを除外し、学習モデルの大まかな構造を精度よく構築するために必要な少数の学習データだけを抽出する処理（ステップＳ３０～Ｓ６０）を行う。データ抽出部１２は、この処理をクラスごとに実行する。 Then, the data extraction unit 12 performs a process (steps S30 to S60) of excluding data with overlapping time-series features from the multiple learning data sets and extracting only the small amount of learning data necessary to accurately construct the general structure of the learning model. The data extraction unit 12 performs this process for each class.

まず、類似度計算部１２１が、（１）正規化相互相関に基づく類似度、または（２）周波数分布に基づく類似度を計算する（ステップＳ３０）。類似度計算部１２１は、ステップＳ２０で分類した同じクラスに属する学習データから２つを選択して組み合わせる全ての組合せパターンについて、（１）や（２）の方法で類似度を計算し、各学習データ間の類似度を記憶部１６に記録する。これにより、図５Ａの表５００のようなデータが得られる。類似度計算部１２１は、（１）と（２）の両方の方法で類似度を計算し、それらの加重平均を類似度としてもよい。 First, the similarity calculation unit 121 calculates (1) a similarity based on normalized cross-correlation, or (2) a similarity based on frequency distribution (step S30). The similarity calculation unit 121 calculates the similarity by method (1) or (2) for all combination patterns in which two pieces of learning data belonging to the same class classified in step S20 are selected and combined, and records the similarity between each piece of learning data in the memory unit 16. This results in data such as table 500 in FIG. 5A. The similarity calculation unit 121 may calculate the similarity by both methods (1) and (2) and use the weighted average of these as the similarity.
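As a rough sketch of step S30, the two similarity measures and their weighted average might be computed as below. The exact definitions used by the similarity calculation unit 121 are given elsewhere in the specification; here, the maximum of the normalized cross-correlation over circular shifts and the cosine similarity of magnitude spectra are assumed variants, and all function names are hypothetical.

```python
import cmath
import math
from itertools import combinations


def ncc_similarity(a, b):
    """Maximum normalized cross-correlation over all circular shifts."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    a = [v - mean_a for v in a]
    b = [v - mean_b for v in b]
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0.0:
        return 0.0
    n = len(a)
    best = max(sum(a[i] * b[(i + lag) % n] for i in range(n))
               for lag in range(n))
    return best / norm


def spectrum(x):
    """Magnitudes of DFT bins 1..n//2 (the DC bin is dropped)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(1, n // 2 + 1)]


def spectrum_similarity(a, b):
    """Cosine similarity between the two magnitude spectra."""
    sa, sb = spectrum(a), spectrum(b)
    norm = math.sqrt(sum(v * v for v in sa) * sum(v * v for v in sb))
    if norm == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(sa, sb)) / norm


def combined_similarity(a, b, weight=0.5):
    """Weighted average of methods (1) and (2)."""
    return (weight * ncc_similarity(a, b)
            + (1.0 - weight) * spectrum_similarity(a, b))


def pairwise_similarity(samples, weight=0.5):
    """Similarity for every pair of same-class windows (equal lengths)."""
    return {(i, j): combined_similarity(samples[i], samples[j], weight)
            for i, j in combinations(sorted(samples), 2)}


# Hypothetical windows: a wave, a time-shifted copy, and a faster wave.
wave = [math.sin(2 * math.pi * t / 8) for t in range(16)]
shifted = wave[3:] + wave[:3]
other = [math.sin(2 * math.pi * t / 4) for t in range(16)]
table = pairwise_similarity({"A": wave, "B": shifted, "C": other})
```

Both measures ignore where in the window a feature occurs, so the shifted copy scores close to 1 against the original while the faster wave scores close to 0.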

次に、データ削減部１２２が、類似度の高い学習データの組合せを集計する（ステップＳ４０）。データ削減部１２２は、各学習データに対して、所定の閾値以上（例えば、０．９以上）の他の学習データを対応付け、類似度が高いデータ同士に分別する。データ削減部１２２は、この分別結果を記憶部１６に記録する。この処理によって、図５Ｂの表５０１に例示するようなデータが得られる。 Next, the data reduction unit 122 tabulates combinations of learning data with high similarity (step S40). The data reduction unit 122 associates each learning data with other learning data that is equal to or greater than a predetermined threshold (e.g., 0.9 or greater), and separates the data into those with high similarity. The data reduction unit 122 records the separation results in the storage unit 16. This process results in data such as that shown in table 501 of FIG. 5B.
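The tabulation in step S40 amounts to thresholding the pairwise similarities. A minimal sketch, assuming the similarities are held in a dictionary keyed by pairs; the function name `tabulate_neighbors` and the similarity values are hypothetical.

```python
def tabulate_neighbors(similarity, threshold=0.9):
    """Map each item to the set of items at or above the threshold."""
    neighbors = {}
    for (i, j), value in similarity.items():
        # Register both items even if they end up with no neighbors.
        neighbors.setdefault(i, set())
        neighbors.setdefault(j, set())
        if value >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


# Hypothetical pairwise similarities within one class.
similarity = {("A", "B"): 0.95, ("A", "C"): 0.40, ("B", "C"): 0.92}
table = tabulate_neighbors(similarity)
```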

次に、データ削減部１２２が、類似度が高いデータを多く有する学習データを削除する（ステップＳ５０）。図５Ｃで例示したような学習データが得られた場合、データ削減部１２２は、データＨを削除する。このようにしてデータを削除することで、学習データ群が有する特徴を保持したまま（特徴が失われることを抑制しながら）、学習データの数を削減することができる。図５Ｃで例示したデータ分布の場合、データＣ、Ｉ、Ｇ、Ｆなどを残して、データＨを削除することで、クラス１の境界付近に点在する学習データを残しつつ、データ量を削減することができる。学習データを削減する他の方法として、類似度が高いデータを多く有する学習データを選択し、選択したデータを残して、類似する他の学習データを削除するようにしてもよい。例えば、図５Ｂの表５０１の場合、データＨに関して、データＨを残してデータＣ、Ｅ、Ｆ、Ｇ、Ｉを削除する。これにより、図５Ｃの例の場合であれば、より多くのデータを削減することができる。 Next, the data reduction unit 122 deletes the training data that has many data with high similarity (step S50). When the training data as shown in FIG. 5C is obtained, the data reduction unit 122 deletes data H. By deleting data in this way, it is possible to reduce the number of training data while retaining the features of the training data group (while suppressing the loss of features). In the case of the data distribution shown in FIG. 5C, the amount of data can be reduced while leaving the training data scattered around the boundary of class 1 by leaving data C, I, G, F, etc. and deleting data H. As another method of reducing the training data, it is possible to select training data that has many data with high similarity, leave the selected data, and delete other similar training data. For example, in the case of table 501 in FIG. 5B, with respect to data H, data H is left and data C, E, F, G, and I are deleted. As a result, in the case of the example in FIG. 5C, more data can be reduced.
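The alternative strategy mentioned at the end of this paragraph (keep the most connected item and delete its similar neighbors) can be sketched as follows. The function name `keep_hub_delete_neighbors` and the neighbor table are hypothetical; the table is only loosely modeled on the example around data H and is not a copy of table 501.

```python
def keep_hub_delete_neighbors(neighbors):
    """Keep the most connected item and delete all of its neighbors."""
    hub = max(sorted(neighbors), key=lambda k: len(neighbors[k]))
    doomed = set(neighbors[hub])  # copy, because the table is mutated below
    for item in doomed:
        for other in neighbors.pop(item):
            if other in neighbors:  # skip neighbors that were already deleted
                neighbors[other].discard(item)
    return hub, doomed


# Hypothetical neighbor table.
table = {
    "A": {"B"},
    "B": {"A"},
    "C": {"G", "H", "I"},
    "E": {"H"},
    "F": {"H"},
    "G": {"C", "H"},
    "H": {"C", "E", "F", "G", "I"},
    "I": {"C", "H"},
}
hub, doomed = keep_hub_delete_neighbors(table)
```

In this example a single step removes five items, whereas deleting only the hub would remove one. The trade-off is that items near the edge of the distribution are discarded.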

次に、データ削減部１２２が、学習データの削除を完了するか否かを判定する（ステップＳ６０）。例えば、データ削減部１２２は、各データに類似するデータ数が所定数以下となると、学習データの削除を完了すると判定する。あるいは、データ削減部１２２は、残った学習データの数が所定数以下となると、又は、削除したデータ数の累積が所定数に達すると、学習データの削除を完了すると判定してもよい。学習データの削除を完了しない場合（ステップＳ６０；Ｎｏ）、ステップＳ４０からの処理を繰り返す。学習データの削除を完了する場合（ステップＳ６０；Ｙｅｓ）、データ抽出部１２は、抽出した学習データを記憶部１６に記録する（ステップＳ７０）。データ抽出部１２は、削除されずに残った学習データを抽出された学習データとして記憶部１６に記録する。 Next, the data reduction unit 122 determines whether or not the deletion of the learning data is complete (step S60). For example, the data reduction unit 122 determines that the deletion of the learning data is complete when the number of pieces of data similar to each piece of data is equal to or less than a predetermined number. Alternatively, the data reduction unit 122 may determine that the deletion of the learning data is complete when the number of pieces of remaining learning data is equal to or less than a predetermined number, or when the cumulative number of pieces of deleted data reaches a predetermined number. If the deletion of the learning data is not complete (step S60; No), the process from step S40 is repeated. If the deletion of the learning data is complete (step S60; Yes), the data extraction unit 12 records the extracted learning data in the storage unit 16 (step S70). The data extraction unit 12 records the remaining learning data that has not been deleted as extracted learning data in the storage unit 16.
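The loop over steps S40 to S60 with the three completion conditions can be sketched as below. The function and parameter names are hypothetical. The sketch also assumes that re-tabulating in step S40 is equivalent to removing the deleted item from the neighbor sets, since the similarities between the remaining items do not change.

```python
def reduce_dataset(neighbors, max_neighbors=1, min_remaining=None,
                   max_deleted=None):
    """Delete the most connected item until a completion condition holds.

    Completion conditions (step S60):
      * every item has at most `max_neighbors` similar items, or
      * the number of remaining items is `min_remaining` or fewer, or
      * the number of deleted items has reached `max_deleted`.
    """
    deleted = []
    while neighbors:
        most = max(sorted(neighbors), key=lambda k: len(neighbors[k]))
        done = (
            len(neighbors[most]) <= max_neighbors
            or (min_remaining is not None and len(neighbors) <= min_remaining)
            or (max_deleted is not None and len(deleted) >= max_deleted)
        )
        if done:  # step S60: Yes
            break
        for other in neighbors.pop(most):  # step S50
            neighbors[other].discard(most)
        deleted.append(most)
    return deleted


# Hypothetical neighbor table with one hub (item 3).
table = {1: {2}, 2: {1, 3}, 3: {2, 4, 5, 6}, 4: {3}, 5: {3}, 6: {3}}
deleted = reduce_dataset(table)
```

The items left in `table` after the call correspond to the extracted learning data recorded in step S70.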

[学習モデル構築処理]
次に学習モデルを構築する処理について説明する。
図７は、第一実施形態の学習モデル構築処理の一例を示すフローチャートである。
まず、データ取得部１１が、複数のパラメータの時系列データから時間枠ごとに切り出された学習データを取得する（ステップＳ１）。
次に、データ抽出部１２が、学習データを抽出する（ステップＳ２）。データ抽出部１２は、図６で説明した処理によって学習データを抽出する。 [Learning model construction process]
Next, the process of constructing a learning model will be described.
FIG. 7 is a flowchart illustrating an example of a learning model construction process according to the first embodiment.
First, the data acquisition unit 11 acquires learning data cut out for each time window from the time-series data of a plurality of parameters (step S1).
Next, the data extraction unit 12 extracts learning data (step S2). The data extraction unit 12 extracts the learning data by the process described with reference to FIG. 6.

次に、学習部１３が、データ抽出部１２によって抽出されたデータを学習する（ステップＳ３）。学習部１３は、抽出された学習データを記憶部１６から読み出して、例えば、ＣＮＮなどの深層学習により学習モデルを構築する。例えば、学習部１３は、共通する時間窓の複数パラメータの学習データに対して、その時間窓の時間帯におけるプラント状態を示す情報（正常、異常、事象ＸＸが発生など）をラベル付けした教師データを学習してプラントに生じる事象を判別するための判別モデルを構築する。これにより、判別モデルのハイパーパラメータのおおよその値が決定される。このステップでは、学習データが絞り込まれているので、全ての学習データを用いて学習を行う場合に比べて、短時間で学習、評価を実施することができる。また、図６の処理により、もともとの学習データ群が有している特徴を保持したデータが抽出されているので、精度を落とさずに学習モデル（判別モデル）を構築することができる。 Next, the learning unit 13 learns the data extracted by the data extraction unit 12 (step S3). The learning unit 13 reads the extracted learning data from the storage unit 16 and constructs a learning model by deep learning such as CNN. For example, the learning unit 13 learns teacher data labeled with information indicating the plant state during the time window (normal, abnormal, event XX has occurred, etc.) for the learning data of multiple parameters of a common time window, and constructs a discrimination model for discriminating events occurring in the plant. This determines approximate values of the hyperparameters of the discrimination model. In this step, since the learning data is narrowed down, learning and evaluation can be performed in a short time compared to the case where learning is performed using all the learning data. In addition, since data that retains the characteristics of the original learning data group is extracted by the process of FIG. 6, a learning model (discrimination model) can be constructed without reducing accuracy.

次に、学習部１３が、全ての学習データを学習する（ステップＳ４）。学習部１３は、ステップＳ３で構築された学習モデルに対して、多数の学習データを用いて学習を進め、決定したハイパーパラメータの値を調整する。既にステップＳ３でハイパーパラメータの値が決定されているので、全ての学習データを用いて学習を行っても、比較的短時間で学習、評価を実施することができる。また、多数のデータを用いて学習を行うことにより、ステップＳ３で構築した学習モデルの精度を向上することができる。図７の処理によれば、全ての学習データを用いて学習データを構築する前に、ステップＳ３の処理を行うことで、最初から全ての学習データを用いて学習モデルを構築する場合と比較して、短時間に学習モデルを構築することができる。 Next, the learning unit 13 learns all the learning data (step S4). The learning unit 13 continues training the learning model constructed in step S3 using the large set of learning data and adjusts the hyperparameter values determined in that step. Since the hyperparameter values have already been determined in step S3, learning and evaluation can be performed in a relatively short time even when all the learning data is used. Furthermore, learning with a large amount of data improves the accuracy of the learning model constructed in step S3. According to the process of FIG. 7, performing step S3 before constructing the learning model with all the learning data allows the learning model to be constructed in a shorter time than constructing it with all the learning data from the beginning.
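The two-stage flow of steps S3 and S4 can be illustrated with a deliberately small stand-in. The sketch below is not a CNN and does not tune hyperparameters: it substitutes a logistic regression trained by full-batch gradient descent, first on a reduced subset and then, starting from the resulting weights, on all the data. The data, the function names, and the learning rate are assumptions.

```python
import math


def log_loss(weights, data):
    """Mean logistic loss of a linear model on (features, label) pairs."""
    total = 0.0
    for x, y in data:
        z = sum(w * v for w, v in zip(weights, x))
        total += math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))
    return total / len(data)


def train(weights, data, epochs, lr=0.1):
    """Full-batch gradient descent, starting from the given weights."""
    weights = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in data:
            z = sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for k, v in enumerate(x):
                grad[k] += (p - y) * v / len(data)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical data: a bias term plus one feature, label 1 if feature > 0.
full = [((1.0, f / 10), 1 if f > 0 else 0) for f in range(-20, 21) if f != 0]
reduced = full[::5]  # stand-in for the extracted learning data

stage1 = train([0.0, 0.0], reduced, epochs=200)  # step S3: reduced data
stage2 = train(stage1, full, epochs=50)          # step S4: all data
```

Most of the iterations run on the small subset, and the pass over the full data only refines a model that is already close to usable.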

（効果）
以上説明したように、本実施形態によれば、もともとの学習データ群が有する特徴を保持したまま、学習データの数を削減することができる。精度を保ったまま、学習データを削減することで、学習モデル構築に要する学習時間を短縮することができる。深層学習モデルの構造は、学習データ量ではなく、学習データの特徴量のバラツキに依存していると仮定すると、本実施形態の学習データ抽出方法を用いて所望する精度が得られるモデル構造を特定した後に、大量の学習データを用いて、各パラメータを微調整しながらモデルの高精度化を図ることにより、効率的に学習モデルの構築を行うことができる。 (effect)
As described above, according to this embodiment, the number of pieces of learning data can be reduced while the features of the original learning data group are retained. Reducing the learning data while maintaining accuracy shortens the learning time required to construct a learning model. If the structure of a deep learning model is assumed to depend on the variation in the features of the learning data rather than on the amount of learning data, a learning model can be constructed efficiently as follows: first, the learning data extraction method of this embodiment is used to identify a model structure that achieves the desired accuracy; then, a large amount of learning data is used to fine-tune each parameter and increase the accuracy of the model.

本実施形態の学習データの抽出方法は、学習可能なモデル、または、高精度なモデル構造のあたりがつけられず、何度かモデル構造の変更を行いながら評価する必要がある場合などに有効である。また、本実施形態の学習データの抽出方法は、時系列データを学習するにあたり、学習データが有する特徴の時間的な前後関係を無視して（時系列的な情報を排除して）、学習する手法（深層学習など）に有効である。また、例えば、プラントの異常事象を判別する判別モデルのように、時系列データの多値分類問題を扱ったシステムに適用することができる。なお、実施形態では、時間窓から切出したデータが、時系列的に同じ特徴となるデータか判断するため、正規化相互相関、周波数分布を用いて類似度を計算する例を挙げたが、類似度の計算は、他の方法によって行ってもよい。また、上記の実施形態では、教師ありの深層学習を例に説明を行ったが、教師なしの深層学習についても適用することができる。 The learning data extraction method of this embodiment is effective when it is not possible to find a model that can be learned or a highly accurate model structure, and it is necessary to make several changes to the model structure before evaluating it. In addition, the learning data extraction method of this embodiment is effective for a method (such as deep learning) in which the time sequence of the features of the learning data is ignored (time sequence information is excluded) when learning time series data. In addition, it can be applied to a system that handles a multi-value classification problem of time series data, such as a discrimination model that discriminates abnormal events in a plant. In the embodiment, an example is given in which similarity is calculated using normalized cross-correlation and frequency distribution to determine whether data extracted from a time window has the same characteristics in a time series, but the similarity may be calculated by other methods. In addition, in the above embodiment, supervised deep learning is used as an example, but it can also be applied to unsupervised deep learning.

＜第二実施形態＞
以下、本発明の第二実施形態による学習モデル構築装置１０Ａについて図８～図９を参照して説明する。
第一実施形態では、全ての学習データの組合せについて類似度の計算を行った。この方法は、学習データの数が膨大になると、計算負荷や計算時間が膨大になる可能性がある。そこで、第二実施形態では、類似度が低いデータ間の類似度の計算を省略し、計算量の削減を図る。 Second Embodiment
A learning model construction device 10A according to a second embodiment of the present invention will be described below with reference to FIGS. 8 and 9.
In the first embodiment, the similarity is calculated for all combinations of learning data. With this method, when the number of learning data becomes huge, the calculation load and calculation time may become huge. Therefore, in the second embodiment, the calculation of the similarity between data with low similarity is omitted to reduce the amount of calculation.

（構成）
図８は、第二実施形態の学習モデル構築装置の一例を示すブロック図である。
本発明の第二実施形態に係る学習モデル構築装置１０Ａの構成のうち、本発明の第一実施形態に係る学習モデル構築装置１０を構成する機能部と同じものには同じ符号を付し、それらの説明を省略する。第二実施形態に係る学習モデル構築装置１０Ａは、第一実施形態のデータ抽出部１２に代えてデータ抽出部１２Ａを備える。データ抽出部１２Ａは、第一実施形態と同様の類似度計算部１２１とデータ削減部１２２に加え、分類部１２３を備える。 (configuration)
FIG. 8 is a block diagram illustrating an example of a learning model construction device according to the second embodiment.
Among the components of the learning model construction device 10A according to the second embodiment of the present invention, the same functional units as those constituting the learning model construction device 10 according to the first embodiment of the present invention are denoted by the same reference numerals, and their description will be omitted. The learning model construction device 10A according to the second embodiment includes a data extraction unit 12A instead of the data extraction unit 12 of the first embodiment. The data extraction unit 12A includes a classification unit 123 in addition to a similarity calculation unit 121 and a data reduction unit 122 similar to those of the first embodiment.

分類部１２３は、同じクラスの学習データを類似するもの同士が集まったグループ（クラスタ）に分類する。例えば、分類部１２３は、第一実施形態で説明した（１）正規化相互相関、又は（２）周波数成分分布のトレンドが類似する学習データ同士を任意のクラスタリング手法により同一クラスタへ分類する。類似度計算部１２１は、同じクラスタに属する学習データの組合せについて類似度を計算する。 The classification unit 123 classifies the learning data of the same class into groups (clusters) of mutually similar data. For example, the classification unit 123 uses any clustering method to place in the same cluster the learning data whose trends in (1) normalized cross-correlation or (2) frequency component distribution, both described in the first embodiment, are similar. The similarity calculation unit 121 calculates the similarity for combinations of learning data belonging to the same cluster.

（動作）
次に第二実施形態のデータ抽出処理について説明を行う。
図９は、第二実施形態のデータ抽出処理の一例を示すフローチャートである。
図６を用いて説明した第一実施形態の処理と同じ処理については同じ符号を付し、説明を省略する。まず、データ取得部１１が、複数の学習データを取得する（ステップＳ１０）。次にデータ抽出部１２が学習データをクラスに分類する（ステップＳ２０）。次に分類部１２３が、同一クラス内の学習データについてクラスタリングを行う（ステップＳ２５）。分類部１２３は、クラス１に属する学習データについて、任意のクラスタリング手法（例えば、ｋｍｅａｎｓ法）を用いて、所定の評価尺度（例えば、学習データ間の時系列的な前後関係を考慮しない特徴量の類似度）に基づいて、類似する学習データを同一のクラスタへ分類する。例えば、分類部１２３は、（１）正規化相互相関又は（２）周波数成分分布に基づいて類似度を計算し、類似度が高いもの同士を複数のクラスタのうちの何れかへ分類してゆく。分類部１２３は、各クラスについてクラスタリングを行うが、あるクラスの学習データ数が閾値未満の場合には、そのクラスについてはステップＳ３１の処理を実施しなくてもよい。分類部１２３は、クラスタリングを行った全ての学習データに対して、その学習データが属するクラスタの識別情報を記憶部１６に記録する。 (operation)
Next, the data extraction process of the second embodiment will be described.
FIG. 9 is a flowchart illustrating an example of the data extraction process according to the second embodiment.
The same processes as those in the first embodiment described with reference to FIG. 6 are denoted by the same reference numerals, and their description is omitted. First, the data acquisition unit 11 acquires a plurality of pieces of learning data (step S10). Next, the data extraction unit 12 classifies the learning data into classes (step S20). Next, the classification unit 123 performs clustering on the learning data within the same class (step S25). For the learning data belonging to class 1, the classification unit 123 uses any clustering method (for example, the k-means method) to classify similar learning data into the same cluster based on a predetermined evaluation measure (for example, the similarity of features that does not consider the chronological relationship between the learning data). For example, the classification unit 123 calculates the similarity based on (1) normalized cross-correlation or (2) frequency component distribution, and assigns pieces of data with high mutual similarity to one of a plurality of clusters. The classification unit 123 performs clustering for each class; however, if the number of pieces of learning data in a class is less than a threshold, the processing of step S31 need not be performed for that class. The classification unit 123 records, in the storage unit 16, the identification information of the cluster to which each clustered piece of learning data belongs.
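The clustering in step S25 can be sketched with a plain implementation of Lloyd's algorithm for k-means. The feature vectors, the function names, and the seeding with the first `k` points are assumptions; the text allows any clustering method and any order-insensitive feature.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels


# Hypothetical order-insensitive feature vectors of class 1 windows.
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
centroids, labels = kmeans(features, k=2)
```

Once the centroids are fitted, `nearest` can assign further windows to a cluster without refitting.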

次に類似度計算部１２１が、同じクラスタに分類された学習データの中から２つを選択して組み合わせる全ての組合せパターンについて、（１）や（２）の方法で類似度を計算し（ステップＳ３１）、各学習データ間の類似度を記憶部１６に記録する。ステップＳ２５にて同様の方法で類似度を計算している場合、その計算結果を用いることができる。類似度の計算を同一クラスの同一クラスタの範囲内に限定することで、第一実施形態のステップＳ３０に比べ、計算コストを低減することができる。 Next, the similarity calculation unit 121 calculates the similarity using methods (1) or (2) for all combination patterns in which two pieces of learning data classified into the same cluster are selected and combined (step S31), and records the similarity between each piece of learning data in the memory unit 16. If the similarity is calculated using a similar method in step S25, the calculation result can be used. By limiting the calculation of the similarity to within the same cluster of the same class, the calculation cost can be reduced compared to step S30 in the first embodiment.
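The saving in step S31 comes from enumerating pairs only inside each cluster. A minimal sketch, with a hypothetical cluster assignment and a hypothetical function name:

```python
from itertools import combinations


def pairs_within_clusters(cluster_of):
    """List the pairs of items that share a cluster."""
    groups = {}
    for item, cluster in cluster_of.items():
        groups.setdefault(cluster, []).append(item)
    return [pair
            for members in groups.values()
            for pair in combinations(sorted(members), 2)]


# Hypothetical cluster assignment recorded in step S25.
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1, "F": 2}
pairs = pairs_within_clusters(cluster_of)
all_pairs = list(combinations(sorted(cluster_of), 2))
```

For these six items, step S30 of the first embodiment would evaluate 15 pairs, while the restriction to clusters leaves 4.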

次に、データ削減部１２２が、類似度の高い学習データの組合せを集計する（ステップＳ４０）。ステップＳ２５において、時系列的な前後関係を考慮しない特徴量に基づいて類似する学習データを分類しているので、ステップＳ３１にて、類似度計算の対象範囲を限定しても、ステップＳ４０の集計結果に影響（例えば、図５Ｂの表５０１にて、類似度が高いデータ同士を分別できない等）は無く、学習データの抽出精度が低下することは無いと考えられる。次に、データ削減部１２２が、類似度が高いデータを多く有する学習データを削除する（ステップＳ５０）。データ削減部１２２は、所定の完了条件を満たすまで（ステップＳ６０）、データ削減を繰り返し行う。データ抽出部１２は、削除されずに残った学習データを抽出された学習データとして記憶部１６に記録する（ステップＳ７０）。抽出した学習データを用いて学習モデルを構築する処理については第一実施形態（図７）と同様である。 Next, the data reduction unit 122 tabulates combinations of learning data with high similarity (step S40). In step S25, similar learning data is classified based on features that do not take into account chronological context. Therefore, even if the range of similarity calculation is limited in step S31, it is considered that the tabulation result in step S40 is not affected (for example, data with high similarity cannot be distinguished from each other in table 501 in FIG. 5B) and the accuracy of extraction of learning data is not reduced. Next, the data reduction unit 122 deletes learning data that has many data with high similarity (step S50). The data reduction unit 122 repeats data reduction until a predetermined completion condition is satisfied (step S60). The data extraction unit 12 records the remaining learning data that has not been deleted as extracted learning data in the storage unit 16 (step S70). The process of constructing a learning model using the extracted learning data is the same as that in the first embodiment (FIG. 7).

第二実施形態によれば、学習データ間の組合せが膨大となる場合、クラスタリングを行い、類似度が低いと考えられる他クラスタに属する学習データとの類似度計算を省略することで類似度計算に要する時間を削減する。これにより、第一実施形態の効果に加え、データ抽出処理の時間を短縮することができる。
なお、ｋｍｅａｎｓ法によるクラスタリングでは、各学習データが所属するクラスタを推定するため、ｋｍｅａｎｓモデルを構築する必要がある。ｋｍｅａｎｓモデルを構築する場合は、全学習データを利用してもよいが、モデル構築に時間を要する場合は、代表的な学習データを用いてｋｍｅａｎｓモデルを構築してもよい。 According to the second embodiment, when the number of combinations between learning data is enormous, clustering is performed and the time required for similarity calculation is reduced by omitting similarity calculation with learning data belonging to other clusters that are considered to have low similarity. This makes it possible to reduce the time required for data extraction processing in addition to the effects of the first embodiment.
Note that clustering by the k-means method requires constructing a k-means model in order to estimate the cluster to which each piece of learning data belongs. All the learning data may be used to construct the k-means model; however, if constructing the model takes too long, the k-means model may be constructed using representative learning data.

図１０は、各実施形態に係る学習モデル構築装置のハードウェア構成の一例を示す図である。
コンピュータ９００は、ＣＰＵ９０１、主記憶装置９０２、補助記憶装置９０３、入出力インタフェース９０４、通信インタフェース９０５を備える。
上述の学習モデル構築装置１０、１０Ａは、コンピュータ９００に実装される。そして、上述した各機能は、プログラムの形式で補助記憶装置９０３に記憶されている。ＣＰＵ９０１は、プログラムを補助記憶装置９０３から読み出して主記憶装置９０２に展開し、当該プログラムに従って上記処理を実行する。また、ＣＰＵ９０１は、プログラムに従って、記憶領域を主記憶装置９０２に確保する。また、ＣＰＵ９０１は、プログラムに従って、処理中のデータを記憶する記憶領域を補助記憶装置９０３に確保する。 FIG. 10 is a diagram illustrating an example of the hardware configuration of a learning model construction device according to each embodiment.
The computer 900 includes a CPU 901, a main storage device 902, an auxiliary storage device 903, an input/output interface 904, and a communication interface 905.
The above-mentioned learning model construction devices 10 and 10A are implemented in a computer 900. The above-mentioned functions are stored in the auxiliary storage device 903 in the form of a program. The CPU 901 reads the program from the auxiliary storage device 903, expands it in the main storage device 902, and executes the above-mentioned processing according to the program. The CPU 901 also secures a memory area in the main storage device 902 according to the program. The CPU 901 also secures a memory area in the auxiliary storage device 903 for storing data being processed according to the program.

なお、学習モデル構築装置１０、１０Ａの全部または一部の機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより各機能部による処理を行ってもよい。ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータシステム」は、ＷＷＷシステムを利用している場合であれば、ホームページ提供環境（あるいは表示環境）も含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、ＣＤ、ＤＶＤ、ＵＳＢ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。また、このプログラムが通信回線によってコンピュータ９００に配信される場合、配信を受けたコンピュータ９００が当該プログラムを主記憶装置９０２に展開し、上記処理を実行しても良い。また、上記プログラムは、前述した機能の一部を実現するためのものであっても良く、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよい。
なお、学習モデル構築装置１０、１０Ａは、複数のコンピュータ９００によって構成されていても良い。 In addition, a program for realizing all or part of the functions of the learning model construction device 10, 10A may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed to perform processing by each functional unit. The term "computer system" as used herein includes hardware such as an OS and peripheral devices. In addition, if a WWW system is used, the term "computer system" also includes a homepage providing environment (or display environment). In addition, the term "computer-readable recording medium" refers to portable media such as CDs, DVDs, and USBs, and storage devices such as hard disks built into a computer system. In addition, when the program is distributed to a computer 900 via a communication line, the computer 900 that receives the program may expand the program into the main storage device 902 and execute the above processing. In addition, the above program may be for realizing part of the above-mentioned functions, and may further be capable of realizing the above-mentioned functions in combination with a program already recorded in the computer system.
In addition, the learning model construction device 10, 10A may be composed of multiple computers 900.

以上のとおり、本開示に係るいくつかの実施形態を説明したが、これら全ての実施形態は、例として提示したものであり、発明の範囲を限定することを意図していない。これらの実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で種々の省略、置き換え、変更を行うことができる。これらの実施形態及びその変形は、発明の範囲や要旨に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 As described above, several embodiments of the present disclosure have been described, but all of these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, substitutions, and modifications can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope of the invention and its equivalents as described in the claims, as well as in the scope and gist of the invention.

＜付記＞
各実施形態に記載のデータ抽出装置、学習モデル構築装置、データ抽出方法、学習モデル構築方法およびプログラムは、例えば以下のように把握される。 <Additional Notes>
The data extraction device, the learning model construction device, the data extraction method, the learning model construction method, and the program described in each embodiment can be understood, for example, as follows.

（１）第１の態様に係るデータ抽出装置（学習モデル構築装置１０）は、時系列データから時間窓ごとに切り出された複数の学習データの中から、一部の学習データを抽出するデータ抽出装置であって、複数の前記学習データを取得するデータ取得部１１と、前記時間窓における時間的な前後の違いを除外して、前記学習データが有する特徴に基づいて、前記学習データ間の類似度を計算する類似度計算部１２１と、前記類似度が高い前記学習データ同士を分別し、その一部を削除するデータ削減部１２２と、を備える。
これにより、学習データが有する特徴の時間的な前後関係を無視して（時系列的な情報を排除して）、学習する手法（深層学習など）において、もとの学習データ群が有する特徴を保ったまま、効果的に学習データの数を減らすことができる。 (1) A data extraction device (learning model construction device 10) according to a first aspect is a data extraction device that extracts a portion of learning data from a plurality of learning data cut out for each time window from time-series data, and includes a data acquisition unit 11 that acquires the plurality of learning data, a similarity calculation unit 121 that calculates a similarity between the learning data based on features of the learning data, excluding temporal differences in the time window, and a data reduction unit 122 that separates the learning data with high similarity from each other and deletes a portion of the learning data.
This allows learning methods (such as deep learning) to ignore the temporal context of the features of the training data (eliminating chronological information), effectively reducing the amount of training data while preserving the features of the original training data group.

（２）第２の態様に係るデータ抽出装置（学習モデル構築装置１０）は、（１）のデータ抽出装置であって、前記データ削減部１２２は、それぞれの前記学習データについて、前記類似度が閾値以上の他の前記学習データを対応付け、他の前記学習データが最も多く対応付けられた前記学習データを削除する。
これにより、学習データ群が有する特徴を損なうことなく、学習データを削減していくことができる。 (2) A data extraction device (learning model construction device 10) according to a second aspect is the data extraction device of (1), in which the data reduction unit 122 associates, for each of the learning data, other learning data whose similarity is equal to or greater than a threshold, and deletes the learning data with which the other learning data is most frequently associated.
This makes it possible to reduce the training data without compromising the characteristics of the training data set.

（３）第３の態様に係るデータ抽出装置（学習モデル構築装置１０）は、（１）～（２）のデータ抽出装置であって、前記類似度計算部１２１は、２つの前記学習データを時間軸方向にずらしたときに、それぞれの学習データが示す波形データと前記時間軸とで挟まれた領域に関して、２つの前記学習データに係る前記領域が重なり合う面積の単位時間あたりの大きさに基づいて前記類似度を計算する。
これにより、時間的な前後の違いを除外した特徴量に基づいて学習データ間の類似度を計算することができる。 (3) A data extraction device (learning model construction device 10) according to a third aspect is a data extraction device according to (1) to (2), wherein the similarity calculation unit 121 calculates the similarity based on the size per unit time of the area of overlap between the areas relating to the two learning data when the two learning data are shifted in the time axis direction, for the area sandwiched between the waveform data indicated by each learning data and the time axis.
This makes it possible to calculate the similarity between training data based on features that exclude temporal differences.

（４）第４の態様に係るデータ抽出装置（学習モデル構築装置１０）は、（１）～（３）のデータ抽出装置であって、前記類似度計算部１２１は、２つの前記学習データを周波数分析して得られるそれぞれの周波数分布の形状に基づいて、２つの前記形状の間の類似度を計算する。
これにより、時間に依存しない特徴量（時間に的な前後の違いを除外した特徴量）に基づいて学習データ間の類似度を計算することができる。 (4) A data extraction device (learning model construction device 10) according to a fourth aspect is a data extraction device according to any one of (1) to (3), wherein the similarity calculation unit 121 calculates the similarity between the two shapes based on the shapes of respective frequency distributions obtained by frequency analysis of the two pieces of learning data.
This makes it possible to calculate the similarity between training data based on time-independent features (features that exclude differences before and after in terms of time).

（５）第５の態様に係るデータ抽出装置（学習モデル構築装置１０）は、（１）～（４）のデータ抽出装置であって、複数の前記学習データを所定の評価尺度（例えば、学習データ間の時系列的な前後関係を考慮しない特徴量の類似度）に基づいて類似するもの同士を同じグループ（クラスタ）へ分類する分類部１２３をさらに備え、前記類似度計算部１２１は、同じ前記グループ（クラスタ）に属する前記学習データ同士の前記類似度を計算する。
これにより、類似度計算の計算コストが膨大になることを抑制することができる。 (5) A data extraction device (learning model construction device 10) according to a fifth aspect is a data extraction device according to any one of (1) to (4), further comprising a classification unit 123 that classifies similar pieces of learning data into the same group (cluster) based on a predetermined evaluation measure (e.g., similarity of features without taking into account the chronological relationship between the learning data), and the similarity calculation unit 121 calculates the similarity between the learning data belonging to the same group (cluster).
This makes it possible to prevent the computational cost of similarity calculation from becoming enormous.

（６）第６の態様に係るデータ抽出装置（学習モデル構築装置１０）は、（１）～（５）のデータ抽出装置であって、前記学習データは、深層学習に用いる学習データである。
これにより、深層学習の学習データ量を削減することができる。削減後の学習データは、深層学習モデルの大まかな構造を決めるために用いることができる。 (6) A data extraction device (learning model construction device 10) according to a sixth aspect is a data extraction device according to any one of (1) to (5), in which the learning data is learning data used for deep learning.
This allows the amount of training data for deep learning to be reduced. The reduced training data can be used to determine the rough structure of a deep learning model.

（７）第７の態様に係る学習モデル構築装置１０は、（１）～（６）のデータ抽出装置と、前記学習データを学習して学習モデルを構築する学習部と、を備え、前記学習部は、前記データ抽出装置によって抽出された学習データで前記学習モデルを構築した後、抽出前の前記学習データを用いて前記学習モデルのパラメータの調整を行う。
これにより、学習モデルの構築に要する時間を削減することができる。また、全データを用いて学習モデルのパラメータの調整を行うことで、学習モデルの精度を担保することができる。 (7) A learning model construction device 10 according to a seventh aspect includes a data extraction device as described in (1) to (6) and a learning unit that learns the learning data and constructs a learning model. The learning unit constructs the learning model using the learning data extracted by the data extraction device, and then adjusts parameters of the learning model using the learning data before extraction.
This reduces the time required to build a learning model. In addition, the accuracy of the learning model can be ensured by adjusting the parameters of the learning model using all data.

（８）第８の態様に係るデータ抽出方法は、時系列データから時間窓ごとに切り出された複数の学習データの中から、一部の学習データを抽出するデータ抽出方法であって、複数の前記学習データを取得するステップと、前記時間窓における時間的な前後の違いを除外して、前記学習データが有する特徴に基づいて、前記学習データ間の類似度を計算するステップと、前記類似度が高い前記学習データ同士を分別し、その一部を削除するステップと、を有する。 (8) The data extraction method according to the eighth aspect is a data extraction method for extracting a portion of learning data from a plurality of learning data cut out for each time window from time series data, and includes the steps of acquiring the plurality of learning data, excluding temporal differences in the time window and calculating the similarity between the learning data based on the characteristics of the learning data, and separating the learning data with high similarity from each other and deleting a portion of the data.

（９）第９の態様に係る学習モデル構築方法は、（８）のデータ抽出方法によって抽出された前記学習データを学習して前記学習モデルを構築した後、抽出前の前記学習データを用いて前記学習モデルのパラメータの調整を行う。 (9) A learning model construction method according to a ninth aspect learns the learning data extracted by the data extraction method of (8) to construct the learning model, and then adjusts the parameters of the learning model using the learning data before extraction.

(10) A program according to a tenth aspect causes a computer 900 to execute a data extraction process for extracting a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, the data extraction process including the steps of: acquiring the plurality of pieces of learning data; calculating a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and sorting out pieces of learning data having high similarity to one another and deleting some of them.

(11) A program according to an eleventh aspect causes the computer 900 to execute a process of constructing the learning model by learning the learning data extracted by the data extraction process of (8), and then adjusting the parameters of the learning model using the learning data before extraction.

10, 10A ... Learning model construction device
11 ... Data acquisition unit
12, 12A ... Data extraction unit
121 ... Similarity calculation unit
122 ... Data reduction unit
123 ... Classification unit
13 ... Learning unit
14 ... Input unit
15 ... Output unit
16 ... Storage unit
900 ... Computer
901 ... CPU
902 ... Main storage device
903 ... Auxiliary storage device
904 ... Input/output interface
905 ... Communication interface

Claims

A data extraction device that extracts a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, comprising:
a data acquisition unit that acquires the plurality of pieces of learning data;
a similarity calculation unit that calculates a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
a data reduction unit that sorts out pieces of learning data having high similarity to one another and deletes some of them,
wherein the similarity calculation unit calculates the similarity based on the size per unit time of the area over which two regions overlap when two pieces of learning data are shifted in the time-axis direction, each region being the region enclosed between the waveform data indicated by one of the two pieces of learning data and the time axis.
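One possible reading of this overlap-based similarity is sketched below. Several details are assumptions made for illustration and are not stated in the claim: the windows are sampled at a fixed interval `dt`, shifts are tried up to `max_shift` samples, the best shift is kept, and the per-unit-time overlap is divided by the per-unit-time union of the two regions so that the score falls between 0 and 1. The function name `overlap_similarity` is hypothetical.

```python
def overlap_similarity(a, b, max_shift=2, dt=1.0):
    """Best normalized overlap of the regions between each waveform and the time axis."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    best = 0.0
    for shift in range(-max_shift, max_shift + 1):
        # Pair a[i] with b[i + shift] wherever both indices are valid.
        pairs = [(a[i], b[i + shift]) for i in range(len(a)) if 0 <= i + shift < len(b)]
        if not pairs:
            continue
        duration = len(pairs) * dt
        # The two regions overlap only where both samples lie on the same side of the axis.
        overlap = sum(min(abs(x), abs(y)) for x, y in pairs if x * y > 0) * dt
        union = sum(
            max(abs(x), abs(y)) if x * y > 0 else abs(x) + abs(y) for x, y in pairs
        ) * dt
        overlap_per_time = overlap / duration
        union_per_time = union / duration
        if union_per_time > 0:
            best = max(best, overlap_per_time / union_per_time)
    return best


# Assumed example: b is the same pulse as a, delayed by one sample.
a = [0, 1, 2, 1, 0, 0]
b = [0, 0, 1, 2, 1, 0]
```

With `max_shift=0` the delayed pulse only partially overlaps the original, whereas allowing a one-sample shift aligns the two regions completely, which is the sense in which the temporal difference is excluded.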

A data extraction device that extracts a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, comprising:
a data acquisition unit that acquires the plurality of pieces of learning data;
a similarity calculation unit that calculates a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
a data reduction unit that sorts out pieces of learning data having high similarity to one another and deletes some of them,
wherein the similarity calculation unit calculates the similarity between two pieces of learning data based on the shapes of the respective frequency distributions obtained by performing a frequency analysis on the two pieces of learning data.
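A minimal sketch of a frequency-based similarity follows. The claim states only that the shapes of the frequency distributions are compared; the use of a discrete Fourier transform magnitude spectrum and of cosine similarity as the shape comparison are assumptions made here for illustration, and the function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of the discrete Fourier transform for frequencies 0 .. n/2."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_similarity(a, b):
    """Cosine similarity between the magnitude spectra of two windows."""
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    dot = sum(p * q for p, q in zip(sa, sb))
    norm = math.sqrt(sum(p * p for p in sa)) * math.sqrt(sum(q * q for q in sb))
    return dot / norm if norm > 0 else 0.0


# Assumed example: a sine wave, a time-shifted copy, and a sine of another frequency.
n = 32
wave = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
shifted = wave[5:] + wave[:5]
other = [math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
```

The magnitude spectrum discards phase, so a window and its circularly shifted copy produce the same distribution and score close to 1, while a window with a different dominant frequency scores close to 0.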

A data extraction device that extracts a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, comprising:
a data acquisition unit that acquires the plurality of pieces of learning data;
a classification unit that classifies the plurality of pieces of learning data such that pieces that are similar under a predetermined evaluation criterion fall into the same group;
a similarity calculation unit that calculates a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
a data reduction unit that sorts out pieces of learning data having high similarity to one another and deletes some of them,
wherein the similarity calculation unit calculates the similarity between pieces of learning data belonging to the same group.
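The effect of restricting the similarity calculation to members of the same group can be sketched as follows. The grouping criterion used in the example (a coarse mean-level label) and the function names are assumptions made for illustration; the claim leaves the predetermined evaluation criterion open.

```python
from collections import defaultdict
from itertools import combinations


def group_windows(windows, criterion):
    """Map each group label to the indices of the windows assigned to it."""
    groups = defaultdict(list)
    for index, window in enumerate(windows):
        groups[criterion(window)].append(index)
    return dict(groups)


def within_group_similarities(windows, criterion, similarity):
    """Compute the similarity only for pairs of windows in the same group."""
    result = {}
    for members in group_windows(windows, criterion).values():
        for i, j in combinations(members, 2):
            result[(i, j)] = similarity(windows[i], windows[j])
    return result


# Assumed example: five windows, grouped by whether their mean level exceeds 5.
windows = [[1, 1, 1], [1, 1, 2], [9, 9, 9], [9, 8, 9], [1, 2, 1]]
level = lambda w: "high" if sum(w) / len(w) > 5 else "low"
closeness = lambda a, b: -sum(abs(x - y) for x, y in zip(a, b))
scores = within_group_similarities(windows, level, closeness)
```

In this example only 4 of the 10 possible pairs are evaluated, because pairs that straddle the two groups are never compared.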

The data extraction device according to any one of claims 1 to 3, wherein the data reduction unit associates, with each piece of learning data, other learning data whose similarity to it is equal to or greater than a threshold, and deletes the piece of learning data with which the largest number of other learning data is associated.
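The association and deletion rule of this claim can be sketched as below. Repeating the deletion until no associations remain, and breaking ties in favor of the lowest index, are assumptions made for illustration; the claim itself states only that the learning data with the most associated other learning data is deleted. The function name `reduce_by_association` is hypothetical.

```python
def reduce_by_association(n_items, similarity, threshold):
    """Repeatedly delete the item associated with the most other items."""
    # neighbors[i] holds the indices of other items whose similarity to i
    # is equal to or greater than the threshold.
    neighbors = {i: set() for i in range(n_items)}
    for i in range(n_items):
        for j in range(i + 1, n_items):
            if similarity(i, j) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    kept = set(range(n_items))
    while any(neighbors[i] for i in kept):
        # Pick the item with the most associations (lowest index on ties).
        victim = max(sorted(kept), key=lambda i: len(neighbors[i]))
        kept.remove(victim)
        for j in neighbors.pop(victim):
            neighbors[j].discard(victim)
    return sorted(kept)


# Assumed example: four scalar "windows"; similarity decays with distance.
values = [0.0, 0.1, 0.2, 5.0]
sim = lambda i, j: 1.0 / (1.0 + abs(values[i] - values[j]))
kept = reduce_by_association(len(values), sim, threshold=0.85)
```

Item 1 is associated with both item 0 and item 2, so it is deleted first; after that no associations remain and items 0, 2, and 3 are kept.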

The data extraction device according to any one of claims 1 to 4, wherein the learning data is learning data used for deep learning.

A learning model construction device comprising:
the data extraction device according to any one of claims 1 to 5; and
a learning unit that learns the learning data and constructs a learning model,
wherein the learning unit constructs the learning model by learning the learning data extracted by the data extraction device, and then adjusts the parameters of the learning model using the learning data before extraction.

A data extraction method for extracting a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, comprising the steps of:
acquiring the plurality of pieces of learning data;
calculating a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
sorting out pieces of learning data having high similarity to one another and deleting some of them,
wherein, in the step of calculating the similarity, the similarity is calculated based on the size per unit time of the area over which two regions overlap when two pieces of learning data are shifted in the time-axis direction, each region being the region enclosed between the waveform data indicated by one of the two pieces of learning data and the time axis.

A data extraction method for extracting a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, comprising the steps of:
acquiring the plurality of pieces of learning data;
calculating a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
sorting out pieces of learning data having high similarity to one another and deleting some of them,
wherein, in the step of calculating the similarity, the similarity between two pieces of learning data is calculated based on the shapes of the respective frequency distributions obtained by performing a frequency analysis on the two pieces of learning data.

A data extraction method for extracting a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, comprising the steps of:
acquiring the plurality of pieces of learning data;
classifying the plurality of pieces of learning data such that pieces that are similar under a predetermined evaluation criterion fall into the same group;
calculating a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
sorting out pieces of learning data having high similarity to one another and deleting some of them,
wherein, in the step of calculating the similarity, the similarity is calculated between pieces of learning data belonging to the same group.

A learning model construction method comprising constructing a learning model by learning the learning data extracted by the data extraction method according to any one of claims 7 to 9, and then adjusting the parameters of the learning model using the learning data before extraction.

A program that causes a computer to execute a data extraction process for extracting a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, the data extraction process comprising the steps of:
acquiring the plurality of pieces of learning data;
calculating a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
sorting out pieces of learning data having high similarity to one another and deleting some of them,
wherein, in the step of calculating the similarity, the similarity is calculated based on the size per unit time of the area over which two regions overlap when two pieces of learning data are shifted in the time-axis direction, each region being the region enclosed between the waveform data indicated by one of the two pieces of learning data and the time axis.

A program that causes a computer to execute a data extraction process for extracting a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, the data extraction process comprising the steps of:
acquiring the plurality of pieces of learning data;
calculating a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
sorting out pieces of learning data having high similarity to one another and deleting some of them,
wherein, in the step of calculating the similarity, the similarity between two pieces of learning data is calculated based on the shapes of the respective frequency distributions obtained by performing a frequency analysis on the two pieces of learning data.

A program that causes a computer to execute a data extraction process for extracting a portion of learning data from a plurality of pieces of learning data cut out for each time window from time-series data, the data extraction process comprising the steps of:
acquiring the plurality of pieces of learning data;
classifying the plurality of pieces of learning data such that pieces that are similar under a predetermined evaluation criterion fall into the same group;
calculating a similarity between the pieces of learning data based on features of the learning data, excluding temporal differences within the time window; and
sorting out pieces of learning data having high similarity to one another and deleting some of them,
wherein, in the step of calculating the similarity, the similarity is calculated between pieces of learning data belonging to the same group.

A program that causes a computer to execute a process of constructing a learning model by learning the learning data extracted by the program according to any one of claims 11 to 13, and then adjusting the parameters of the learning model using the learning data before extraction.