JP7491382B2

JP7491382B2 - Importance calculation device, importance calculation method, and importance calculation program

Info

Publication number: JP7491382B2
Application number: JP2022543201A
Authority: JP
Inventors: 一樹足立; 哲哉塩田; 真智子豊田
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2020-08-19
Filing date: 2020-08-19
Publication date: 2024-05-28
Anticipated expiration: 2040-08-19
Also published as: WO2022038722A1; JPWO2022038722A1

Description

The present invention relates to an importance calculation device, an importance calculation method, and an importance calculation program.

Feature selection is an important data-preprocessing step in machine learning. For example, if the data used to train a model contains unnecessary features, undesirable effects may occur, such as reduced model accuracy, increased training cost (computation and time), a more complex model, and overfitting.


However, conventional techniques cannot always perform feature selection efficiently. First, manual feature selection requires expertise in statistics and machine learning. On the other hand, techniques for automatic feature selection exist, but they require that the purpose (task) for which the model will be used be clearly defined, and they also require supervised data labeled with correct answers. Labeling data with correct answers is generally costly.

To solve the above problems and achieve the object, the importance calculation device includes a latent representation calculation unit that calculates a latent representation of a data sample using the encoder portion of a trained autoencoder, and an importance calculation unit that calculates the importance of each feature contained in the data sample based on the gradient of the latent representation with respect to the data sample.

According to the present invention, feature selection can be performed efficiently.

FIG. 1 is a diagram showing an example configuration of the importance calculation device according to the first embodiment.
FIG. 2 is a diagram showing an example of data samples.
FIG. 3 is a diagram illustrating the gradient.
FIG. 4 is a diagram illustrating feature selection.
FIG. 5 is a flowchart showing the processing flow of the importance calculation device according to the first embodiment.
FIG. 6 is a diagram showing an example of the relationships between features.
FIG. 7 is a diagram showing an example of the importance of each feature.
FIG. 8 is a diagram showing an example of the importance of each feature.
FIG. 9 is a diagram showing an example of the importance of each feature.
FIG. 10 is a diagram showing an example of the importance of each feature.
FIG. 11 is a diagram showing an example of importance values sorted in descending order.
FIG. 12 is a diagram showing an example of importance values sorted in descending order.
FIG. 13 is a diagram showing an example of a computer that executes the importance calculation program.

Embodiments of the importance calculation device, importance calculation method, and importance calculation program according to the present application are described in detail below with reference to the drawings. The present invention is not limited to the embodiments described below.

[Configuration of the First Embodiment]
First, the configuration of the importance calculation device according to the first embodiment is described with reference to FIG. 1. FIG. 1 is a diagram showing an example configuration of the importance calculation device according to the first embodiment. As shown in FIG. 1, the importance calculation device 10 receives data as input, calculates importance, and outputs the importance of each feature of the data. The importance calculation device 10 includes an interface unit 11, a storage unit 12, and a control unit 13.

The interface unit 11 is an interface for inputting and outputting data. For example, the interface unit 11 is a NIC (Network Interface Card). The interface unit 11 may also be connected to input devices such as a mouse and a keyboard and to output devices such as a display.

The storage unit 12 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disk. The storage unit 12 may instead be a rewritable semiconductor memory such as a RAM (Random Access Memory), a flash memory, or an NVSRAM (Non Volatile Static Random Access Memory). The storage unit 12 stores the OS (Operating System) and various programs executed by the importance calculation device 10. The storage unit 12 also stores model information 121.

The model information 121 is information for constructing a model. In this embodiment, an autoencoder is constructed from the model information 121. The autoencoder is assumed to have already been trained, for example by minimizing the mean squared error between its input and output.
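The example training criterion mentioned above, the mean squared error between an input and its reconstruction, can be written as a small function. This is a minimal sketch in plain Python. The function name `reconstruction_mse` is hypothetical. Beyond naming this example criterion, the text does not say how training is carried out.

```python
from typing import Sequence

def reconstruction_mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input sample x and the autoencoder output x_hat."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

# A perfect reconstruction has zero loss; training drives the average of this
# value over the training samples toward its minimum.
print(reconstruction_mse([1.0, 2.0], [1.0, 4.0]))  # 2.0
```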

For example, the model information 121 consists of the weights, biases, and similar parameters of each node of the multilayer neural networks that implement the encoder and decoder of the autoencoder. Of the encoder and decoder, the model information 121 need only include the information for constructing the encoder.

The control unit 13 controls the entire importance calculation device 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 has an internal memory for storing control data and programs that define various processing procedures, and executes each process using that internal memory. The control unit 13 functions as various processing units when various programs run. For example, the control unit 13 includes a latent representation calculation unit 131, an influence calculation unit 132, an importance calculation unit 133, and a feature selection unit 134.

The latent representation calculation unit 131 calculates the latent representation of a data sample using the encoder portion of the trained autoencoder. It accepts a data sample as input and outputs the latent representation. Here, data input to the importance calculation device 10 is called a data sample. Multiple data samples may be input to the importance calculation device 10. The data samples may also include samples that were used to train the autoencoder.
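As a rough illustration, the encoder can be rebuilt from stored weights and biases (the model information 121) and applied to a data sample. The sketch below makes the following assumptions, none of which the patent specifies:

- Layers are fully connected and stored as `(W, b)` pairs.
- Hidden layers use a tanh activation, and the final layer is linear.
- The function name `encode` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One layer: weight matrix W (one row per output unit) and bias vector b.
Layer = Tuple[List[List[float]], List[float]]

def encode(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Map a data sample x (length d) to its latent representation z (length d')."""
    h = list(x)
    for idx, (W, b) in enumerate(layers):
        pre = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        is_last = idx == len(layers) - 1
        # Hidden layers use tanh; the final (latent) layer is kept linear.
        h = pre if is_last else [math.tanh(p) for p in pre]
    return h

# Hypothetical encoder: d = 3 inputs, one 3-unit hidden layer, d' = 2 latent dimensions.
encoder_layers = [
    ([[0.5, -1.0, 0.2], [0.3, 0.8, -0.4], [1.0, 0.0, 0.0]], [0.1, 0.0, -0.1]),
    ([[1.0, -2.0, 0.5], [0.0, 1.0, 1.0]], [0.0, 0.0]),
]
z = encode([1.0, 0.5, -2.0], encoder_layers)
print(len(z))  # 2
```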

FIG. 2 is a diagram showing an example of data samples. As shown in FIG. 2, the data samples form, for example, tabular data: each data sample has items such as "Name", "Math", and "English", and the value of each item corresponds to a feature of the data sample. If the number of features is $d$, a data sample is expressed as $x = [x_1, \ldots, x_d]$.

For each feature of a data sample, the influence calculation unit 132 calculates a single influence value on the latent representation. The influence calculation unit 132 accepts the latent representation as input and outputs the influence values. The encoder calculates the latent representation from the data sample. If the latent representation has $d'$ dimensions, it is expressed as $z = [z_1, \ldots, z_{d'}]$. FIG. 3 is a diagram illustrating the gradient. As shown in FIG. 3, each feature of the data sample is related to every dimension of the latent representation.

For example, the influence calculation unit 132 calculates the influence of each feature by summing the absolute values of the gradients of every latent dimension with respect to that feature. The gradient of latent dimension $z_i$ ($i \in \{1, \ldots, d'\}$) with respect to feature $x_k$ ($k \in \{1, \ldots, d\}$) can be written as $\partial z_i / \partial x_k$. The influence calculation unit 132 therefore calculates the influence $\mathrm{influence}_k$ of feature $x_k$ as in equation (1):

$$\mathrm{influence}_k = \sum_{i=1}^{d'} \left| \frac{\partial z_i}{\partial x_k} \right| \qquad (1)$$
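Equation (1) can be sketched in a few lines. Here the partial derivatives $\partial z_i / \partial x_k$ are approximated with central finite differences, which works for any encoder given as a Python callable. This numerical technique is swapped in for illustration; in practice, automatic differentiation in a deep-learning framework would normally be used. The names `jacobian_fd` and `influence` are hypothetical.

```python
from typing import Callable, List, Sequence

Encoder = Callable[[Sequence[float]], List[float]]

def jacobian_fd(encoder: Encoder, x: Sequence[float], eps: float = 1e-5) -> List[List[float]]:
    """Approximate J[i][k] = dz_i/dx_k with central finite differences."""
    x = list(x)
    d_latent = len(encoder(x))
    jac = [[0.0] * len(x) for _ in range(d_latent)]
    for k in range(len(x)):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[k] += eps
        x_minus[k] -= eps
        z_plus, z_minus = encoder(x_plus), encoder(x_minus)
        for i in range(d_latent):
            jac[i][k] = (z_plus[i] - z_minus[i]) / (2.0 * eps)
    return jac

def influence(encoder: Encoder, x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Equation (1): influence_k = sum over latent dims i of |dz_i/dx_k|."""
    jac = jacobian_fd(encoder, x, eps)
    return [sum(abs(row[k]) for row in jac) for k in range(len(x))]

# Toy linear encoder z = W x, whose exact Jacobian is W itself.
W = [[1.0, -2.0, 0.0], [3.0, 4.0, 0.0]]

def linear_encoder(v: Sequence[float]) -> List[float]:
    return [sum(w * a for w, a in zip(row, v)) for row in W]

print([round(val, 6) for val in influence(linear_encoder, [0.5, -1.0, 2.0])])  # [4.0, 6.0, 0.0]
```

For a linear encoder, each feature's influence is simply the column sum of $|W|$. The third feature, which this toy encoder ignores, receives zero influence.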

In this way, when a data sample has $d$ features (that is, when the data sample is $d$-dimensional), the influence calculation unit 132 obtains $d$ influence values, each a scalar. Input dimensions and influence values therefore correspond one to one.

There are $d \times d'$ gradients: the product of the number of features of the data sample (the input dimensionality) and the number of dimensions of the latent representation. As in equation (1), summing the absolute values of the gradients over all latent dimensions yields a single scalar influence value per feature. A scalar is also easy for users to interpret. Moreover, a gradient with a larger absolute value indicates a larger influence, regardless of its sign. Taking absolute values before summing, as in equation (1), therefore prevents positive and negative gradients from canceling each other out.
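A two-number example makes the cancellation point concrete. Suppose, purely hypothetically, that one feature has gradients of $+0.8$ and $-0.8$ with respect to two latent dimensions. A signed sum reports no influence, while the sum of absolute values in equation (1) reports 1.6.

```python
# Hypothetical gradients dz_1/dx_k and dz_2/dx_k for a single feature x_k.
grads = [0.8, -0.8]

signed_sum = sum(grads)               # opposite signs cancel
abs_sum = sum(abs(g) for g in grads)  # equation (1): no cancellation

print(signed_sum, abs_sum)  # 0.0 1.6
```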

The importance calculation unit 133 calculates the importance of each feature contained in a data sample based on the gradient of the latent representation with respect to the data sample. The importance calculation unit 133 accepts the influence values as input and outputs the importance. For example, it uses a predetermined statistic of the influence values computed for multiple data samples, such as their median or mean, as the importance.
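Aggregating per-sample influence values into one importance value per feature can be sketched with the standard `statistics` module. The sketch assumes the influence values are arranged with one row per data sample and one column per feature. The function name `importance_per_feature` and the numbers are hypothetical.

```python
import statistics
from typing import Callable, List, Sequence

def importance_per_feature(
    influences: Sequence[Sequence[float]],
    stat: Callable[[Sequence[float]], float] = statistics.median,
) -> List[float]:
    """Reduce an (n_samples x d) table of influence values to d importance values."""
    # zip(*rows) turns the per-sample rows into per-feature columns.
    return [stat(list(column)) for column in zip(*influences)]

# Three data samples, two features (made-up influence values).
table = [[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]]
print(importance_per_feature(table))                   # [2.0, 20.0]
print(importance_per_feature(table, statistics.mean))  # [4.0, 20.0]
```

The median is less sensitive than the mean to a few samples with extreme influence, as the first feature's outlier value of 9.0 shows.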

The importance calculation unit 133 also calculates an evaluation value for assessing the importance. For example, it calculates the median of the influence values as the importance. It then calculates an evaluation value by multiplying that median by a value representing the magnitude of variation of the influence values.

Specifically, the importance calculation unit 133 first calculates $\mathrm{influence}_k$ for multiple data samples. A feature whose influence is large on average across the data samples is considered important. The importance calculation unit 133 therefore calculates the evaluation value as

$$\text{evaluation value}_k = \operatorname{median}(\mathrm{influence}_k) \times \bigl(P_{75}(\mathrm{influence}_k) - P_{25}(\mathrm{influence}_k)\bigr)$$

where $P_q$ denotes the $q$th percentile over the data samples. The median of the influence corresponds to the importance. The difference between the 75th and 25th percentiles (the interquartile range) is one example of a value representing the magnitude of variation.
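The evaluation value can be sketched as follows. Percentile conventions differ between libraries. This sketch assumes the linear-interpolation ("inclusive") quartiles of Python's `statistics.quantiles`, whose middle cut point equals the median. The function name `evaluation_value` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple

def evaluation_value(influences: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, IQR, median * IQR) for one feature's influence values."""
    q25, q50, q75 = statistics.quantiles(influences, n=4, method="inclusive")
    iqr = q75 - q25  # spread of the influence across data samples
    return q50, iqr, q50 * iqr

print(evaluation_value([1.0, 2.0, 3.0, 4.0, 5.0]))  # (3.0, 2.0, 6.0)
print(evaluation_value([0.0, 0.0, 0.0, 0.0, 0.0]))  # (0.0, 0.0, 0.0)
```

A feature whose influence is always zero gets an evaluation value of zero, marking it as a candidate for removal.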

A feature with a small evaluation value has low importance for every data sample and can therefore be judged to be of low importance overall. In addition, when the data samples have a nonlinear structure, the important features are likely to differ from one data sample to another. The above evaluation value makes it possible to assess how much each feature's influence varies across data samples.

The feature selection unit 134 selects features based on the importance or the evaluation value. It accepts the importance or evaluation value of each feature as input and outputs the selected features. For example, the feature selection unit 134 may select features whose importance or evaluation value is equal to or greater than a predetermined threshold. Alternatively, it may select a predetermined number of features in descending order of importance or evaluation value. The importance calculation device 10 may output either the importance or evaluation values, or the features selected by the feature selection unit 134.
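Both selection rules can be sketched in plain Python. The function names and feature names are hypothetical. In the top-k rule, ties are broken by the original feature order because Python's sort is stable; this is an assumption, not something the text specifies.

```python
from typing import List, Sequence

def select_by_threshold(names: Sequence[str], scores: Sequence[float], threshold: float) -> List[str]:
    """Keep features whose importance (or evaluation value) is >= threshold."""
    return [name for name, score in zip(names, scores) if score >= threshold]

def select_top_k(names: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Keep the k features with the largest importance (or evaluation value)."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

names = ["f1", "f2", "f3"]
scores = [0.5, 2.0, 1.0]
print(select_by_threshold(names, scores, 1.0))  # ['f2', 'f3']
print(select_top_k(names, scores, 1))           # ['f2']
```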

FIG. 4 is a diagram illustrating feature selection. As shown in FIG. 4, feature selection according to this embodiment first reduces dimensionality with an autoencoder. It then automatically selects important (or unnecessary) features based on their contribution to the latent representation. Because the latent representation of an autoencoder captures the essential information needed to represent the data, this embodiment regards features that contribute strongly to the latent representation as important.

[Processing of the First Embodiment]
FIG. 5 is a flowchart showing the processing flow of the importance calculation device according to the first embodiment. As shown in FIG. 5, the latent representation calculation unit 131 first calculates the latent representation of a data sample using the encoder (step S101). Next, the influence calculation unit 132 calculates the influence of each feature on the latent representation (step S102).

The importance calculation unit 133 then calculates the importance of each feature based on the influence values (step S103). The feature selection unit 134 selects features whose importance is equal to or greater than a predetermined value (step S104). Finally, the importance calculation device 10 outputs the importance or the selected features (step S105).
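Steps S101 to S105 can be strung together in one sketch. To keep it self-contained, it computes the Jacobian $\partial z_i / \partial x_k$ exactly by the chain rule. The following choices are assumptions for illustration, not details taken from the patent:

- The encoder is a small fully connected network with tanh hidden layers and a linear latent layer.
- The median is used as the statistic.
- All function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (W, b); W has one row per output unit

def encode_with_jacobian(x: Sequence[float], layers: Sequence[Layer]):
    """S101: latent z and Jacobian J[i][k] = dz_i/dx_k, via the chain rule."""
    d = len(x)
    h = list(x)
    jac = [[1.0 if r == k else 0.0 for k in range(d)] for r in range(d)]  # dh/dx = I
    for idx, (W, b) in enumerate(layers):
        pre = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        # d(pre)/dx = W @ (dh/dx)
        dpre = [[sum(row[m] * jac[m][k] for m in range(len(h))) for k in range(d)] for row in W]
        if idx < len(layers) - 1:
            h = [math.tanh(p) for p in pre]
            # tanh'(p) = 1 - tanh(p)^2 scales each row of the Jacobian.
            jac = [[(1.0 - hr * hr) * g for g in grow] for hr, grow in zip(h, dpre)]
        else:
            h, jac = pre, dpre  # linear latent layer
    return h, jac

def run_pipeline(samples, layers, names, threshold):
    """S101-S105 for a batch of samples; returns (importance, selected names)."""
    per_sample = []
    for x in samples:
        _, jac = encode_with_jacobian(x, layers)                        # S101
        per_sample.append([sum(abs(row[k]) for row in jac)              # S102, eq. (1)
                           for k in range(len(x))])
    importance = [statistics.median(col) for col in zip(*per_sample)]    # S103
    selected = [n for n, s in zip(names, importance) if s >= threshold]  # S104
    return importance, selected                                          # S105

# Toy linear encoder (d = 3, d' = 2) that ignores the third feature.
layers = [([[1.0, -2.0, 0.0], [3.0, 4.0, 0.0]], [0.0, 0.0])]
samples = [[0.1, 0.2, 0.0], [1.0, -1.0, 0.0], [2.0, 0.5, 0.0]]
print(run_pipeline(samples, layers, ["x1", "x2", "n1"], threshold=1.0))
# ([4.0, 6.0, 0.0], ['x1', 'x2'])
```

An analytic Jacobian avoids choosing a finite-difference step size. For a trained network of realistic size, a framework's automatic differentiation would serve the same role.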

[Effects of the First Embodiment]
As described above, the latent representation calculation unit 131 calculates the latent representation of a data sample using the encoder portion of the trained autoencoder. The importance calculation unit 133 calculates the importance of each feature contained in the data sample based on the gradient of the latent representation with respect to the data sample. In this way, the importance calculation device 10 can calculate importance automatically, regardless of whether the data samples are labeled with correct answers. As a result, this embodiment allows feature selection to be performed efficiently.

An analyst may be given data samples without a clear statement of the purpose and content of the task. According to this embodiment, the importance of the features of such data samples can be calculated even when the purpose and content of the task that uses them are unclear. The importance calculated in this embodiment, and feature selection based on it, can of course be used for training autoencoders. They can also be used for training models other than autoencoders.

The influence calculation unit 132 calculates, for each feature of a data sample, a single influence value on the latent representation. The importance calculation unit 133 uses a predetermined statistic of the influence values calculated for multiple data samples as the importance. Summarizing the per-sample influence values into statistics such as the median or mean in this way gives an overall view of the importance of each feature.

The influence calculation unit 132 calculates the influence of each feature by summing the absolute values of the gradients of every latent dimension with respect to that feature. This prevents gradients, which may be positive or negative, from canceling each other out, and makes the influence a more meaningful indicator.

The importance calculation unit 133 calculates the median of the influence values as the importance. It then calculates an evaluation value by multiplying the median by a value representing the magnitude of variation of the influence values. This makes it possible to evaluate the importance of each feature in terms of its variation.

[First Experiment]
An experiment in which the first embodiment was applied to data samples is described below. In the first experiment, 50,000 nine-dimensional data samples $x = [z_1, z_2, z_3, x_1, x_2, x_3, x_4, x_5, n_1]$ were generated according to formula (2), and the influence and importance were calculated.

As shown in formula (2), $z_1$, $z_2$, and $z_3$ are features generated from independent distributions. $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ are features generated from distributions that depend on $z_1$, $z_2$, and $z_3$. $n_1$ is always 0; it is a completely unnecessary feature that should ideally be removed by feature selection.

The encoder and the decoder of the autoencoder are both four-layer neural networks. From the input side, the encoder layers have 9, 20, 30, 50, and 5 dimensions. In this case, $d = 9$ and $d' = 5$.
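As a quick sanity check on this architecture, the sketch below lists the weight-matrix shapes and the parameter count of the first experiment's encoder. It assumes fully connected layers with one bias per output unit. The mention of per-node weights and biases in the model information suggests this, but the text does not state it. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def dense_layer_shapes(sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """(outputs, inputs) shape of each weight matrix between consecutive layers."""
    return [(n_out, n_in) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def dense_param_count(sizes: Sequence[int]) -> int:
    """Weights plus one bias per output unit, summed over all layers."""
    return sum(n_out * n_in + n_out for n_out, n_in in dense_layer_shapes(sizes))

encoder_sizes = [9, 20, 30, 50, 5]  # d = 9 inputs, d' = 5 latent dimensions
print(dense_layer_shapes(encoder_sizes))  # [(20, 9), (30, 20), (50, 30), (5, 50)]
print(dense_param_count(encoder_sizes))   # 2635
```

The four weight matrices correspond to the four layers of the encoder. Under the same assumption, the second experiment's 82-100-20 encoder has 10,320 parameters.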

FIG. 6 is a diagram showing an example of the relationships between features. In FIG. 6, $z_1$, $z_2$, and $z_3$ tend to be independent of one another, while $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ show the influence of $z_1$, $z_2$, and $z_3$. On the other hand, $n_1$ is not influenced by any other feature.

FIG. 7 is a diagram showing an example of the importance of each feature. For each feature, FIG. 7 shows the importance (the median of the influence) together with the upper and lower limits of the influence. As shown in FIG. 7, $n_1$ consistently has low importance. Some features show a large variance in influence and others a small one, which is thought to result from the nonlinearity of the data samples. The important features are therefore likely to differ from one data sample to another. For example, the importance of $x_2$ is lower than that of $x_4$, yet for some data samples the influence of $x_2$ may exceed that of $x_4$. Displaying the importance of each feature side by side, as in FIG. 7, also makes it possible to identify important features visually.

[Second Experiment]
In the second experiment, the first embodiment was applied to data samples obtained from actual server logs. The data samples are 82-dimensional server log data. Their features are numerical values, such as CPU usage and memory usage, standardized per dimension. The number of data samples is 299,465.

The encoder and the decoder of the autoencoder are both two-layer neural networks. From the input side, the encoder layers have 82, 100, and 20 dimensions. In this case, $d = 82$ and $d' = 20$.

FIGS. 8, 9, and 10 are diagrams showing examples of the importance of each feature. As shown in FIGS. 8 to 10, the variance of the influence distribution differs from feature to feature. Features such as "used_mem_1" and "free_mem_1" in FIG. 8 have very large variance. For features with such large variance, the influence probably differs depending on the data sample. In such cases, extracting data samples that share the same influence may also yield new insights.

In contrast, features such as "used_swap_1" in FIG. 8, "read_request_1_sda1" in FIG. 9, and "disk_size_1" in FIG. 10 have very small variance. Features with such small variance can be said to have a similar degree of influence on every data sample.

FIGS. 11 and 12 are diagrams showing examples of importance values sorted in descending order. "cached_mem_1" and "used_mem_1" in FIG. 10 rank high in importance and, as shown in FIG. 8, have a large variance in influence. On the other hand, "disk_size_1" in FIG. 11 ranks low in importance and, as shown in FIG. 10, has a small variance in influence. This indicates that a feature with low importance has consistently low influence in most data samples.

[System Configuration, etc.]
Each component of each device shown in the figures is functionally conceptual and need not be physically configured as illustrated. That is, the specific form in which each device is distributed or integrated is not limited to the illustrated one. All or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device can be implemented by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU, or as hardware using wired logic. The program may be executed not only by a CPU but also by another processor such as a GPU.

Of the processes described in this embodiment, all or part of those described as being performed automatically can also be performed manually. Likewise, all or part of those described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified.

[Program]
In one embodiment, the importance calculation device 10 can be implemented by installing, on a desired computer, an importance calculation program that executes the above importance calculation processing as packaged software or online software. For example, an information processing device can be made to function as the importance calculation device 10 by having it execute the above importance calculation program. The information processing device referred to here includes desktop and notebook personal computers. It also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants).

The importance calculation device 10 can also be implemented as an importance calculation server device. The server device treats a terminal device used by a user as a client and provides that client with services related to the above importance calculation processing. For example, the importance calculation server device provides an importance calculation service that takes data as input and outputs the importance of each feature of the data. In this case, the importance calculation server device may be implemented as a Web server, or as a cloud that provides services related to the above importance calculation processing through outsourcing.

FIG. 13 is a diagram showing an example of a computer that executes the importance calculation program. The computer 1000 includes, for example, a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100, into which a removable storage medium such as a magnetic disk or optical disk is inserted. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process of the importance calculation device 10 is implemented as a program module 1093 containing computer-executable code. The program module 1093 is stored, for example, in the hard disk drive 1090; for instance, a program module 1093 that executes processes equivalent to the functional configuration of the importance calculation device 10 is stored there. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The setting data used in the processing of the embodiment described above is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and the program data 1094 from the memory 1010 or the hard disk drive 1090 into the RAM 1012 as needed and executes the processing of the embodiment described above.

The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may, for example, be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network), and read by the CPU 1020 from that computer via the network interface 1070.

REFERENCE SIGNS LIST
10 Importance calculation device
11 Interface unit
12 Storage unit
13 Control unit
121 Model information
131 Latent representation calculation unit
132 Influence calculation unit
133 Importance calculation unit
134 Feature selection unit

Claims

1. An importance calculation device comprising:
a latent representation calculation unit that calculates latent representations of data samples using the encoder part of a trained autoencoder;
an importance calculation unit that calculates an importance of each feature included in the data samples based on the gradient of the latent representation with respect to the data sample; and
a feature selection unit that selects, based on the importance calculated by the importance calculation unit, the features whose importance is equal to or greater than a predetermined threshold.
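The three units of claim 1 can be illustrated with a minimal pure-Python sketch. It makes several assumptions that the claim does not: a toy one-layer encoder z = tanh(Wx + b) stands in for the trained autoencoder's encoder so that the gradient can be written by hand; "gradient" is read as the Jacobian dz/dx, since only the encoder is used; and per-sample influences are aggregated with the sum of absolute gradients and the median described in claims 3 and 4. The names `encode`, `jacobian`, `influences` and `select_features` are hypothetical. A real system would obtain the gradient by automatic differentiation of the trained network.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def encode(x: Sequence[float], W: Matrix, b: Sequence[float]) -> List[float]:
    """Toy encoder (assumption): z = tanh(W x + b); W has shape (d_latent, n_features)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def jacobian(x: Sequence[float], W: Matrix, b: Sequence[float]) -> Matrix:
    """J[j][i] = dz_j / dx_i = (1 - z_j**2) * W[j][i] for the tanh encoder above."""
    z = encode(x, W, b)
    return [[(1.0 - zj ** 2) * w for w in row] for row, zj in zip(W, z)]


def influences(x: Sequence[float], W: Matrix, b: Sequence[float]) -> List[float]:
    """Influence of feature i: sum over latent dimensions j of |dz_j / dx_i|."""
    J = jacobian(x, W, b)
    return [sum(abs(J_row[i]) for J_row in J) for i in range(len(x))]


def select_features(samples: Sequence[Sequence[float]], W: Matrix, b: Sequence[float],
                    threshold: float) -> Tuple[List[float], List[int]]:
    """Importance = median influence over samples; keep features with importance >= threshold."""
    per_sample = [influences(x, W, b) for x in samples]
    importance = [statistics.median(row[i] for row in per_sample)
                  for i in range(len(samples[0]))]
    selected = [i for i, s in enumerate(importance) if s >= threshold]
    return importance, selected
```

For example, with `W = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.0]]`, `b = [0.0, 0.0]` and a single all-zero sample, the influences are `[1.0, 2.0, 0.5]`, so a threshold of 0.8 keeps features 0 and 1.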

2. The importance calculation device according to claim 1, further comprising an influence calculation unit that calculates, for each feature of the data sample, an influence on the latent representation,
wherein the importance calculation unit calculates, as the importance, a predetermined statistic of the influences calculated for a plurality of the data samples.

3. The importance calculation device according to claim 2, wherein the influence calculation unit calculates the influence of each feature of the data sample by summing the absolute values of the gradients over the dimensions of the latent representation.
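Written as formulas, claims 2 and 3 can be read as follows. The notation is assumed and does not appear in the claims: x_n is the n-th of N data samples with D features, z_n = f_enc(x_n) is its d-dimensional latent representation, a_{n,i} is the influence of feature i in sample n, and g is the predetermined statistic (the median in claim 4).

```latex
a_{n,i} = \sum_{j=1}^{d} \left| \frac{\partial z_{n,j}}{\partial x_{n,i}} \right|,
\qquad
s_i = g\left(a_{1,i}, a_{2,i}, \dots, a_{N,i}\right),
\qquad i = 1, \dots, D
```

Here s_i is the importance of feature i, which the feature selection unit compares with the predetermined threshold.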

4. The importance calculation device according to claim 2, wherein the importance calculation unit calculates the median of the influences as the importance, and further calculates an evaluation value by multiplying the median by a value representing the degree of variation in the importance, and
the feature selection unit selects the features based on the importance or the evaluation value calculated by the importance calculation unit.
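Claim 4 can be sketched as follows, starting from a precomputed matrix of influences in which rows are data samples and columns are features. The claim does not say how the degree of variation is measured. This sketch assumes the population standard deviation of each feature's influences across samples, computed with `statistics.pstdev`. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def importance_and_evaluation(
        influence: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """influence[n][i] is the influence of feature i in sample n.

    Importance = median over samples (claim 4).
    Evaluation value = median * variation, where variation is ASSUMED to be
    the population standard deviation over samples.
    """
    n_features = len(influence[0])
    columns = [[row[i] for row in influence] for i in range(n_features)]
    importance = [statistics.median(col) for col in columns]
    evaluation = [med * statistics.pstdev(col) for med, col in zip(importance, columns)]
    return importance, evaluation


def select(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of features whose score (importance or evaluation value) is >= threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]
```

For influences `[[1.0, 3.0], [3.0, 3.0]]`, the importances are `[2.0, 3.0]` and the evaluation values are `[2.0, 0.0]`. Under the standard-deviation assumption, a feature whose influence is the same in every sample gets an evaluation value of zero regardless of its median, so selecting by importance and selecting by evaluation value can keep different features.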

5. An importance calculation method executed by an importance calculation device, the method comprising:
a latent representation calculation step of calculating latent representations of data samples using the encoder part of a trained autoencoder;
an importance calculation step of calculating an importance of each feature included in the data samples based on the gradient of the latent representation with respect to the data sample; and
a feature selection step of selecting, based on the importance calculated in the importance calculation step, the features whose importance is equal to or greater than a predetermined threshold.

6. An importance calculation program for causing a computer to function as the importance calculation device according to any one of claims 1 to 4.