JP7614611B2

JP7614611B2 - Interpretation method, interpretation device, and program

Info

Publication number: JP7614611B2
Application number: JP2021145237A
Authority: JP
Inventors: 具治岩田; 友也吉川
Original assignee: Nippon Telegraph and Telephone Corp; Chiba Institute of Technology; NTT Inc USA
Current assignee: Chiba Institute of Technology; NTT Inc; NTT Inc USA
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2025-01-16
Anticipated expiration: 2041-09-07
Also published as: JP2023038481A

Description

The present invention relates to an interpretation method, an interpretation device, and a program.

In recent years, advances in deep learning have made it possible to build high-performance machine learning models (hereinafter also simply called "models"). In general, however, predicting complex phenomena requires complex models, and it is difficult for humans to interpret how such a model arrived at its predictions. To interpret a complex model, a method that approximates the model with a locally linear model has been proposed (see, for example, Non-Patent Document 1).

Non-Patent Document 1: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).

However, when approximation by a locally linear model is difficult, the model cannot be interpreted properly.

One embodiment of the present invention has been made in view of the above, and aims to obtain an interpretable model.

To achieve the above object, in an interpretation method according to one embodiment, a computer executes: a first learning procedure of learning parameters of a first model and parameters of a second model, using a learning dataset composed of learning data that includes feature values representing one or more features and correct labels for the feature values, so that the predictive performance of the first model becomes high and the output of the first model is the same as the output of the second model, which has higher interpretability than the first model; and a second learning procedure of learning the parameters of the second model, using the learning dataset, so that the output of the first model after learning in the first learning procedure is the same as the output of the second model.

An interpretable model can be obtained.

FIG. 1 is a diagram illustrating an example of the hardware configuration of the interpretation device according to the present embodiment.
FIG. 2 is a diagram illustrating an example of the functional configuration of the interpretation device according to the present embodiment.

One embodiment of the present invention is described below. This embodiment describes an interpretation device 10 that, given an arbitrary model (hereinafter also called the "original model"), its learning dataset, and a highly interpretable model (hereinafter the "explainable model"), can obtain an explainable model whose predictive performance is equivalent to that of the original model.

The interpretation device 10 according to this embodiment is given a learning dataset D = {(x_n, y_n) | n = 1, ..., N}, an original model f(·; θ), and an explainable model g(·; φ). Here, x_n is a vector representing the features of the n-th case, y_n is a scalar value representing its label (correct label), and N is the number of cases. The original model f(·; θ) and the explainable model g(·; φ) are functions that output a predicted label when a feature vector x is input. That is, writing the predicted label as ŷ, ŷ = f(x; θ) and ŷ = g(x; φ). Furthermore, θ and φ are the parameters of the original model and the explainable model, respectively. Each element of the feature vector x represents the value of the corresponding feature.

Any model can be used as the original model and as the explainable model, but the explainable model is assumed to be a highly interpretable model such as a locally linear model or a decision tree. The original model and the explainable model may take different feature vectors as input, but in that case the feature vector of the original model must be convertible to the feature vector of the explainable model.

<Hardware Configuration of Interpretation Device 10>
FIG. 1 shows the hardware configuration of the interpretation device 10 according to this embodiment. As shown in FIG. 1, the interpretation device 10 is realized by the hardware configuration of a general computer or computer system, and has an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106. These pieces of hardware are communicably connected to one another via a bus 107.

The input device 101 is, for example, a keyboard, a mouse, a touch panel, or various physical buttons. The display device 102 is, for example, a display or a display panel. Note that the interpretation device 10 may lack at least one of the input device 101 and the display device 102.

The external I/F 103 is an interface with an external device such as a recording medium 103a. The interpretation device 10 can read from and write to the recording medium 103a via the external I/F 103. Examples of the recording medium 103a include a CD (Compact Disc), a DVD (Digital Versatile Disc), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.

The communication I/F 104 is an interface for connecting the interpretation device 10 to a communication network. The processor 105 is, for example, any of various arithmetic devices such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The memory device 106 is, for example, any of various storage devices such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, a RAM (Random Access Memory), or a ROM (Read Only Memory).

With the hardware configuration shown in FIG. 1, the interpretation device 10 according to this embodiment can carry out the various processes described below. Note that the hardware configuration shown in FIG. 1 is only an example: the interpretation device 10 may have multiple processors 105, multiple memory devices 106, or various hardware other than that shown in the figure.

<Functional Configuration of Interpretation Device 10>
FIG. 2 shows the functional configuration of the interpretation device 10 according to this embodiment. As shown in FIG. 2, the interpretation device 10 has a learning unit 201, an interpretation unit 202, and a storage unit 203. The learning unit 201 and the interpretation unit 202 are realized, for example, by processing that one or more programs installed in the interpretation device 10 cause the processor 105 to execute. The storage unit 203 is realized, for example, by the memory device 106. However, the storage unit 203 may instead be realized by a storage device connected to the interpretation device 10 via a communication network (for example, a NAS (Network Attached Storage) or a database server).

The learning unit 201 uses the learning dataset D to learn the parameters θ and φ of the original model f(·; θ) and the explainable model g(·; φ) so that the predictive performance of the original model f(·; θ) becomes high and the outputs of the original model f(·; θ) and the explainable model g(·; φ) are the same.

For example, when a locally linear model is used as the explainable model, the locally linear model can be expressed as a linear function of the feature vector x, where φ = w is its parameter. In this case, the learning unit 201 may learn the parameters θ and φ so as to minimize the loss E(θ) given by Equation (1) below.
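
The equation images for the locally linear model and Equation (1) are not reproduced in this text. As an assumption only, one form consistent with the symbol definitions in this description is sketched below. The squared difference in the second term and the pairing of each x_n with M noise-added samples x'_m are guesses made for illustration, not the published formula.

```latex
g(x; \phi) = w^{\top} x

E(\theta) = \sum_{n=1}^{N} \Bigl[ l\bigl(y_n, f(x_n; \theta)\bigr)
  + \lambda \sum_{m=1}^{M} \pi(x_n, x'_m)\,
    \bigl( f(x'_m; \theta) - w(x_n; f(\cdot; \theta))^{\top} x'_m \bigr)^{2} \Bigr]
```

Under this reading, the first term is the prediction error of the original model and the second term penalizes disagreement between the original model and its local linear approximation around each training case.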

Here, π(x_n, x_m') is the similarity between cases, x_m' is a feature vector generated by adding noise to x_m, and w(x_n; f(·; θ)) is the parameter obtained when the locally linear model is fitted to the original model f(·; θ) in the vicinity of x_n (that is, when the locally linear model is made to approximate the original model near x_n). In addition, l (lowercase L) is the error (loss) function corresponding to the original model f(·; θ), M is a hyperparameter specifying how many noise-added vectors x' are generated from x, and λ is a hyperparameter specifying the relative weight of the first and second terms. Any similarity measure, such as cosine similarity, can be used as the similarity.
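
The two ingredients named above, noise-added feature vectors and a similarity between cases, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names `cosine_similarity` and `perturb`, the Gaussian noise, and the `noise_scale` parameter are all assumptions introduced here.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def perturb(x, num_samples, noise_scale=0.1, rng=None):
    """Return num_samples noisy copies of x (Gaussian noise is an assumption)."""
    rng = rng or random.Random(0)
    return [[xi + rng.gauss(0.0, noise_scale) for xi in x] for _ in range(num_samples)]


# Hypothetical case x_n with M = 5 noise-added samples and their similarities.
x_n = [1.0, 2.0, 3.0]
samples = perturb(x_n, num_samples=5)
weights = [cosine_similarity(x_n, x_prime) for x_prime in samples]
```

Cosine similarity can be negative, so a non-negative kernel may be preferable when the similarities are used as weights in a least-squares fit.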

The first term of Equation (1) above serves to improve prediction accuracy, and the second term serves to make the outputs of the original model f(·; θ) and the explainable model g(·; φ) the same.

On the other hand, when a non-local model such as a decision tree is used as the explainable model, the learning unit 201 may learn the parameters θ and φ so as to minimize the loss E(θ, φ) given by Equation (2) below.
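
The image of Equation (2) is likewise not reproduced in this text. A form consistent with the description of its two terms, offered only as an assumption, is shown below. The discrepancy measure l' between the two model outputs and the reuse of the weight λ from Equation (1) are hypothetical choices.

```latex
E(\theta, \phi) = \sum_{n=1}^{N} \Bigl[ l\bigl(y_n, f(x_n; \theta)\bigr)
  + \lambda\, l'\bigl(f(x_n; \theta), g(x_n; \phi)\bigr) \Bigr]
```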

As with Equation (1), the first term of Equation (2) serves to improve prediction accuracy, and the second term serves to make the outputs of the original model f(·; θ) and the explainable model g(·; φ) the same.

The interpretation unit 202 uses the learning dataset D to learn the parameter φ of the explainable model g(·; φ) so that the output of the trained original model f(·; θ) is the same as the output of the explainable model g(·; φ). In other words, the interpretation unit 202 fixes the parameter θ of the original model f(·; θ) and learns only the parameter φ of the explainable model g(·; φ).

For example, when a locally linear model is used as the explainable model, the interpretation unit 202 calculates w(x; f(·; θ)) given by Equation (3) below.
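
The image of Equation (3) is not reproduced in this text. Assuming a similarity-weighted least-squares fit with a squared L2 regularizer (the description states only that η is a regularization hyperparameter, so the norm is a guess), one consistent form is:

```latex
w(x; f(\cdot; \theta)) = \arg\min_{w} \sum_{m=1}^{M'} \pi(x, x'_m)\,
  \bigl( f(x'_m; \theta) - w^{\top} x'_m \bigr)^{2} + \eta\, \lVert w \rVert^{2}
```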

Here, M' is a hyperparameter specifying how many noise-added vectors x' are generated from x, and η is a regularization hyperparameter that controls the complexity of the explainable model.
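
The local fit described here can be sketched as a similarity-weighted ridge regression solved in closed form. This is an illustration under stated assumptions rather than the patent's procedure: the squared L2 penalty, the Gaussian noise, the Gaussian-kernel similarity, and the names `local_linear_weights`, `gaussian_similarity`, and `solve_linear_system` are all introduced here. The original model `f` is passed in as a plain function with its parameters already fixed.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    w = [0.0] * n
    for row in range(n - 1, -1, -1):
        residual = aug[row][n] - sum(aug[row][k] * w[k] for k in range(row + 1, n))
        w[row] = residual / aug[row][row]
    return w


def gaussian_similarity(a, b):
    """Non-negative similarity exp(-||a - b||^2); the kernel choice is an assumption."""
    return math.exp(-sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def local_linear_weights(f, x, similarity, num_samples=200, noise_scale=0.5,
                         eta=1e-3, seed=0):
    """Fit w so that w . x' approximates f(x') for noisy copies x' of x.

    Minimizes sum_m pi_m * (f(x'_m) - w . x'_m)^2 + eta * ||w||^2 by solving
    the normal equations (sum_m pi_m x'_m x'_m^T + eta I) w = sum_m pi_m f(x'_m) x'_m.
    """
    rng = random.Random(seed)
    d = len(x)
    a = [[eta if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for _ in range(num_samples):  # num_samples plays the role of M'
        x_prime = [xi + rng.gauss(0.0, noise_scale) for xi in x]
        pi = similarity(x, x_prime)
        target = f(x_prime)
        for i in range(d):
            b[i] += pi * target * x_prime[i]
            for j in range(d):
                a[i][j] += pi * x_prime[i] * x_prime[j]
    return solve_linear_system(a, b)


def toy_original_model(v):
    """Hypothetical stand-in for a trained f(.; theta): linear, third feature unused."""
    return 3.0 * v[0] - 2.0 * v[1]


w = local_linear_weights(toy_original_model, [1.0, 2.0, 3.0], gaussian_similarity)
```

Because the toy model is itself linear, the fitted `w` comes out close to `[3, -2, 0]`; for a nonlinear original model, `w` would vary with the point `x` at which the fit is made.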

The vector w(x; f(·; θ)) given by Equation (3) above is the interpretation. That is, when the explainable model g(·; φ) outputs the predicted label ŷ from the feature vector x, the feature corresponding to the element of w(x; f(·; θ)) with the largest absolute value is the feature that had a large influence on the prediction. For example, if the r-th element of w(x; f(·; θ)) has the largest absolute value, the r-th feature is the feature that had a large influence on the prediction.

On the other hand, when a non-local model such as a decision tree is used as the explainable model, the interpretation unit 202 calculates φ(f(·; θ)) given by Equation (4) below.
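
The image of Equation (4) is not reproduced in this text. Since the interpretation unit fixes θ and fits only φ to the outputs of the original model, one consistent form, given here as an assumption with a hypothetical discrepancy measure l', is:

```latex
\phi(f(\cdot; \theta)) = \arg\min_{\phi} \sum_{n=1}^{N}
  l'\bigl( f(x_n; \theta), g(x_n; \phi) \bigr)
```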

This φ(f(·; θ)) is the interpretation.

The storage unit 203 stores the learning dataset, the original model f(·; θ), and the explainable model g(·; φ) given to the interpretation device 10.

<Processing Flow>
During learning, the interpretation device 10 according to this embodiment executes the following Step 1-1 and Step 1-2.

Step 1-1: First, the learning unit 201 uses the learning dataset D to learn the parameters θ and φ of the original model f(·; θ) and the explainable model g(·; φ) so that the predictive performance of the original model f(·; θ) becomes high and the outputs of the original model f(·; θ) and the explainable model g(·; φ) are the same. The parameters θ and φ may be learned using Equation (1) above when a locally linear model is used as the explainable model g(·; φ), and using Equation (2) above when a non-local model such as a decision tree is used.

Step 1-2: Next, the interpretation unit 202 uses the learning dataset D to learn the parameter φ of the explainable model g(·; φ) so that the output of the trained original model f(·; θ) is the same as the output of the explainable model g(·; φ). The parameter φ may be learned using Equation (3) above when a locally linear model is used as the explainable model g(·; φ), and using Equation (4) above when a non-local model such as a decision tree is used.

Step 1-1 and Step 1-2 above yield an explainable model g(·; φ) that is highly interpretable and has predictive performance equivalent to that of the original model f(·; θ).
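
The two-stage flow of Step 1-1 and Step 1-2 can be illustrated on a toy problem. Everything in this sketch is an assumption made for illustration: the original model is a quadratic in one scalar feature, the explainable model is a single global line (swapped in for a decision tree because it is differentiable), both error terms are squared errors, and the names `step_1_1`, `step_1_2`, `joint_loss`, and `fidelity_gap` are hypothetical.

```python
def f(x, theta):
    """Stand-in original model: a quadratic in a scalar feature."""
    return theta[0] + theta[1] * x + theta[2] * x * x


def g(x, phi):
    """Stand-in explainable model: a single global line."""
    return phi[0] + phi[1] * x


def joint_loss(xs, ys, theta, phi, lam):
    """Mean of (y - f)^2 + lam * (f - g)^2 over the dataset."""
    return sum((y - f(x, theta)) ** 2 + lam * (f(x, theta) - g(x, phi)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def step_1_1(xs, ys, lam=1.0, lr=0.1, steps=2000):
    """Jointly update theta and phi by gradient descent on joint_loss."""
    theta, phi = [0.0, 0.0, 0.0], [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        grad_theta, grad_phi = [0.0] * 3, [0.0] * 2
        for x, y in zip(xs, ys):
            fx, gx = f(x, theta), g(x, phi)
            # d/d theta_k of both terms shares the factor 2 * ((f - y) + lam * (f - g)).
            common = 2.0 * ((fx - y) + lam * (fx - gx)) / n
            for k, basis in enumerate((1.0, x, x * x)):
                grad_theta[k] += common * basis
            # Only the agreement term depends on phi.
            for k, basis in enumerate((1.0, x)):
                grad_phi[k] += -2.0 * lam * (fx - gx) * basis / n
        theta = [t - lr * gt for t, gt in zip(theta, grad_theta)]
        phi = [p - lr * gp for p, gp in zip(phi, grad_phi)]
    return theta, phi


def step_1_2(xs, theta):
    """With theta frozen, refit phi by least squares to the outputs of f."""
    targets = [f(x, theta) for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_t = sum(targets) / len(targets)
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, targets))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return [mean_t - slope * mean_x, slope]


def fidelity_gap(xs, theta, phi):
    """Mean squared difference between the outputs of f and g."""
    return sum((f(x, theta) - g(x, phi)) ** 2 for x in xs) / len(xs)


# Toy dataset: 21 points on [-1, 1] with a mildly curved target.
xs = [i / 10.0 - 1.0 for i in range(21)]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]

theta, phi_joint = step_1_1(xs, ys)   # Step 1-1: learn theta and phi together
phi_final = step_1_2(xs, theta)       # Step 1-2: freeze theta, refit phi only
```

In this toy setting the agreement term pulls the original model toward functions the line can imitate, and the second stage then gives the best line for the frozen original model on the training points.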

Next, during inference (prediction), the interpretation device 10 according to this embodiment executes the following Step 2-1 and Step 2-2.

Step 2-1: First, the interpretation unit 202 calculates the predicted label ŷ = g(x; φ) using the feature vector x of the prediction target.

Step 2-2: Next, the interpretation unit 202 identifies the features that had a large influence on the prediction of the predicted label ŷ. To do so, it uses w(x; f(·; θ)) given by Equation (3) above when a locally linear model is used as the explainable model g(·; φ), or φ(f(·; θ)) given by Equation (4) above when a non-local model such as a decision tree is used, and identifies the feature corresponding to the element with the largest absolute value as the feature that had a large influence on the prediction. Alternatively, the features corresponding to the top S elements by absolute value (S being a predetermined natural number) may be identified as the features that had a large influence on the prediction.
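
For the locally linear case, Step 2-1 and Step 2-2 amount to a dot product followed by a ranking of the weight vector by absolute value. The sketch below uses hypothetical names (`predict_linear`, `top_influential_features`) and a made-up weight vector; indices are zero-based, so index 1 corresponds to the second feature.

```python
def predict_linear(w, x):
    """Step 2-1 for a locally linear explainable model: y_hat = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def top_influential_features(w, s=1):
    """Step 2-2: zero-based indices of the s largest |w_r|, largest first."""
    order = sorted(range(len(w)), key=lambda r: abs(w[r]), reverse=True)
    return order[:s]


# Hypothetical local weights w(x; f(.; theta)) and a feature vector to explain.
w = [0.2, -1.5, 0.7, 0.05]
x = [1.0, 2.0, 3.0, 4.0]

y_hat = predict_linear(w, x)
top = top_influential_features(w, s=2)  # [1, 2]
```

Ranking by absolute value treats a strongly negative weight as just as influential as a strongly positive one, which matches the description above.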

In this embodiment, the same interpretation device 10 performs both learning and inference, but learning and inference may be performed by different devices.

<Evaluation>
An evaluation of the interpretation device 10 according to this embodiment on two datasets is described below.

The evaluation results (mean and standard error) obtained with the Digits dataset are shown in Table 1 below.

Here, accuracy represents predictive performance; higher is better. The reliability score represents how closely the original model and the explainable model behave alike; higher is better. The stability score represents how similar the interpretations of similar feature vectors are; higher is better. For the Digits dataset, see, for example, Reference 1.

The evaluation results (mean and standard error) obtained with the Boston dataset are shown in Table 2 below.

Here, mean squared error represents predictive performance; lower is better. The reliability score represents how closely the original model and the explainable model behave alike; lower is better. The stability score represents how similar the interpretations of similar feature vectors are; lower is better. For the Boston dataset, see, for example, Reference 2.

As Tables 1 and 2 above show, the interpretation device 10 according to this embodiment improves the reliability and stability of interpretation compared with existing techniques while achieving equivalent predictive performance. In other words, the explainable model obtained by the interpretation device 10 according to this embodiment has predictive performance equivalent to that of the original model and is highly interpretable.

[References]
Reference 1: C. Kaynak (1995). Methods of Combining Multiple Classifiers and Their Applications to Handwritten Digit Recognition. MSc Thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University.
Reference 2: Harrison, D. and Rubinfeld, D. L. (1978). Hedonic prices and the demand for clean air. J. Environ. Economics & Management, vol. 5, pp. 81-102.

<Summary of the Embodiment>
This specification discloses at least the interpretation method, interpretation device, and program described in the following additional notes.
(Additional Note 1)
An interpretation method in which a computer executes:
a first learning procedure of learning parameters of a first model and parameters of a second model, using a learning dataset composed of learning data that includes feature values representing one or more features and correct labels for the feature values, so that the predictive performance of the first model becomes high and the output of the first model is the same as the output of the second model, the second model having higher interpretability than the first model; and
a second learning procedure of learning the parameters of the second model, using the learning dataset, so that the output of the first model after learning in the first learning procedure is the same as the output of the second model.
(Additional Note 2)
The interpretation method according to Additional Note 1, in which the computer further executes:
a prediction procedure of calculating, using feature values of an inference target, a predicted label for those feature values with the second model after learning in the second learning procedure; and
an interpretation procedure of interpreting the influence on the calculation of the predicted label, using the parameters of the second model after learning in the second learning procedure.
(Additional Note 3)
The interpretation method according to Additional Note 2, in which the interpretation procedure interprets, among the elements of the parameters of the second model after learning in the second learning procedure, the features corresponding to a predetermined number of elements with the largest absolute values when the predicted label was calculated as the features having a large influence on the calculation of the predicted label.
(Additional Note 4)
The interpretation method according to any one of Additional Notes 1 to 3, in which
the second model is a locally linear model, and
the first learning procedure learns the parameters of the first model and the parameters of the second model based on:
an error between a label predicted by the first model using the feature values and the correct label for the feature values;
a similarity between the feature values and noise-added feature values obtained by adding noise to the feature values; and
a difference between a label predicted by the second model using the feature values, where the second model approximates the first model in the vicinity of the feature values, and a label predicted by the first model using the noise-added feature values.
(Additional Note 5)
The interpretation method according to any one of Additional Notes 1 to 3, in which
the second model is a non-local model, including a decision tree, and
the first learning procedure learns the parameters of the first model and the parameters of the second model based on:
an error between a label predicted by the first model using the feature values and the correct label for the feature values; and
an error between a label predicted by the first model using the feature values and a label predicted by the second model using the feature values.
(Additional Note 6)
The interpretation method according to any one of Additional Notes 1 to 5, in which the second learning procedure fixes the parameters of the first model after learning in the first learning procedure, and then learns the parameters of the second model so that the output of the first model after learning in the first learning procedure is the same as the output of the second model.
(Additional Note 7)
An interpretation device having:
a first learning unit that learns parameters of a first model and parameters of a second model, using a learning dataset composed of learning data that includes feature values representing one or more features and correct labels for the feature values, so that the predictive performance of the first model becomes high and the output of the first model is the same as the output of the second model, the second model having higher interpretability than the first model; and
a second learning unit that learns the parameters of the second model, using the learning dataset, so that the output of the first model after learning by the first learning unit is the same as the output of the second model.
(Additional Note 8)
A program that causes a computer to execute the interpretation method according to any one of Additional Notes 1 to 6.

The present invention is not limited to the specifically disclosed embodiment above; various modifications, changes, and combinations with known techniques are possible without departing from the scope of the claims.

10 Interpretation device
101 Input device
102 Display device
103 External I/F
103a Recording medium
104 Communication I/F
105 Processor
106 Memory device
107 Bus
201 Learning unit
202 Interpretation unit
203 Storage unit

Claims

An interpretation method in which a computer executes:
a first learning procedure of learning parameters of a first model and parameters of a second model, using a learning dataset composed of learning data that includes feature values representing one or more features and correct labels for the feature values, so that the predictive performance of the first model becomes high and the output of the first model is the same as the output of the second model, the second model having higher interpretability than the first model;
a second learning procedure of learning the parameters of the second model, using the learning dataset, so that the output of the first model after learning in the first learning procedure is the same as the output of the second model;
a prediction procedure of calculating, using feature values of an inference target, a predicted label for those feature values with the second model after learning in the second learning procedure; and
an interpretation procedure of interpreting the influence on the calculation of the predicted label, using the parameters of the second model after learning in the second learning procedure,
wherein the interpretation procedure interprets, among the elements of the parameters of the second model after learning in the second learning procedure, the features corresponding to a predetermined number of elements with the largest absolute values when the predicted label was calculated as the features having a large influence on the calculation of the predicted label.

An interpretation method in which a computer executes:
a first learning procedure of learning parameters of a first model and parameters of a second model, using a learning dataset composed of learning data that includes feature values representing one or more features and correct labels for the feature values, so that the predictive performance of the first model becomes high and the output of the first model is the same as the output of the second model, the second model having higher interpretability than the first model; and
a second learning procedure of learning the parameters of the second model, using the learning dataset, so that the output of the first model after learning in the first learning procedure is the same as the output of the second model,
wherein the second model is a locally linear model, and
the first learning procedure learns the parameters of the first model and the parameters of the second model based on:
an error between a label predicted by the first model using the feature values and the correct label for the feature values;
a similarity between the feature values and noise-added feature values obtained by adding noise to the feature values; and
a difference between a label predicted by the second model using the feature values, where the second model approximates the first model in the vicinity of the feature values, and a label predicted by the first model using the noise-added feature values.

An interpretation method in which a computer executes:
a first learning procedure of learning parameters of a first model and parameters of a second model, using a learning dataset composed of learning data that includes feature values representing one or more features and correct labels for the feature values, so that the predictive performance of the first model becomes high and the output of the first model is the same as the output of the second model, the second model having higher interpretability than the first model; and
a second learning procedure of learning the parameters of the second model, using the learning dataset, so that the output of the first model after learning in the first learning procedure is the same as the output of the second model,
wherein the second model is a non-local model, including a decision tree, and
the first learning procedure learns the parameters of the first model and the parameters of the second model based on:
an error between a label predicted by the first model using the feature values and the correct label for the feature values; and
an error between a label predicted by the first model using the feature values and a label predicted by the second model using the feature values.

The interpretation method according to any one of claims 1 to 3, wherein the second learning procedure fixes the parameters of the first model after learning in the first learning procedure and learns the parameters of the second model so that the output of the first model after learning in the first learning procedure matches the output of the second model.
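The second learning procedure described here, in which the first model is held fixed and only the second model is fitted to its outputs, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the first model is a stand-in function with hypothetical parameters `a` and `c`, the second model is a single linear function rather than a locally linear model or a decision tree, and plain gradient descent on the mean squared difference is an assumed training method. The names `first_model` and `fit_second_model` are hypothetical.

```python
def first_model(x, params):
    """Stand-in for the already trained first model (assumed form)."""
    return params["a"] * x + params["c"]


def fit_second_model(xs, first_params, lr=0.1, steps=500):
    """Fit second-model parameters (w, b) to the frozen first model's outputs."""
    # The first model is only evaluated, never updated: its parameters stay fixed.
    targets = [first_model(x, first_params) for x in xs]
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared difference between the two outputs.
        grad_w = sum(2.0 * (w * x + b - t) * x for x, t in zip(xs, targets)) / n
        grad_b = sum(2.0 * (w * x + b - t) for x, t in zip(xs, targets)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


frozen = {"a": 2.0, "c": 1.0}
w, b = fit_second_model([-1.0, 0.0, 1.0, 2.0], frozen)
```

In this toy setting the second model can reproduce the first model exactly, so `w` and `b` approach the frozen values of `a` and `c`; with a more complex first model the fit would generally be approximate.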

An interpretation method in which a computer executes:
a first learning procedure of simultaneously learning parameters of a first model and parameters of a second model having higher interpretability than the first model, using a learning dataset composed of learning data including feature amounts representing one or more features and correct labels for the feature amounts, so that the predictive performance of the first model is improved and the output of the first model matches the output of the second model; and
a second learning procedure of learning the parameters of the second model using the learning dataset so that the output of the first model after learning in the first learning procedure matches the output of the second model.

The interpretation method according to claim 5, wherein the computer further executes:
a prediction procedure of calculating, by the second model learned in the second learning procedure, a predicted label for a feature amount of an inference target using the feature amount of the inference target; and
an interpretation procedure of interpreting an effect on the calculation of the predicted label by using the parameters of the second model learned in the second learning procedure.

An interpretation device comprising:
a first learning unit that learns parameters of a first model and parameters of a second model having higher interpretability than the first model, using a learning dataset composed of learning data including feature amounts representing one or more features and correct labels for the feature amounts, so that the predictive performance of the first model is improved and the output of the first model matches the output of the second model;
a second learning unit that learns the parameters of the second model using the learning dataset so that the output of the first model after learning by the first learning unit matches the output of the second model;
a prediction unit that uses a feature amount of an inference target to calculate, by the second model learned by the second learning unit, a predicted label for the feature amount of the inference target; and
an interpretation unit that interprets an effect on the calculation of the predicted label by using the parameters of the second model learned by the second learning unit,
wherein the interpretation unit interprets, as features having a large influence on the calculation of the predicted label, the features corresponding to a predetermined number of elements having the largest absolute values at the time the predicted label is calculated, among the elements of the parameters of the second model after learning by the second learning unit.
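The selection performed by the interpretation unit, namely picking the features whose parameter elements have the largest absolute values, can be sketched in a few lines of Python. The function name `top_influential_features` and the feature names and weights in the usage lines are hypothetical and chosen only for illustration; the parameters are assumed to be a flat list of per-feature weights of the second model.

```python
def top_influential_features(weights, feature_names, k):
    """Return the k (feature, weight) pairs with the largest |weight|."""
    # Rank feature indices by the absolute value of their weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(feature_names[i], weights[i]) for i in ranked[:k]]


# Hypothetical second-model weights used when one predicted label was calculated.
names = ["age", "income", "height", "weight"]
coefficients = [0.1, -2.0, 0.5, 1.5]
top2 = top_influential_features(coefficients, names, k=2)
# top2 == [("income", -2.0), ("weight", 1.5)]
```

The sign of each weight is kept in the result so that a reader can see whether a feature pushed the predicted label up or down, while the ranking itself depends only on magnitude.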

An interpretation device comprising:
a first learning unit that learns parameters of a first model and parameters of a second model having higher interpretability than the first model, using a learning dataset composed of learning data including feature amounts representing one or more features and correct labels for the feature amounts, so that the predictive performance of the first model is improved and the output of the first model matches the output of the second model; and
a second learning unit that learns the parameters of the second model using the learning dataset so that the output of the first model after learning by the first learning unit matches the output of the second model,
wherein the second model is a locally linear model, and
the first learning unit learns the parameters of the first model and the parameters of the second model based on:
an error between a label predicted by the first model using the feature amount and the correct label for the feature amount;
a similarity between the feature amount and a noise-added feature amount obtained by adding noise to the feature amount; and
a difference between a label predicted by the second model using the feature amount when the first model is approximated by the second model in the vicinity of the feature amount and a label predicted by the first model using the noise-added feature amount.

An interpretation device comprising:
a first learning unit that learns parameters of a first model and parameters of a second model having higher interpretability than the first model, using a learning dataset composed of learning data including feature amounts representing one or more features and correct labels for the feature amounts, so that the predictive performance of the first model is improved and the output of the first model matches the output of the second model; and
a second learning unit that learns the parameters of the second model using the learning dataset so that the output of the first model after learning by the first learning unit matches the output of the second model,
wherein the second model is a non-local model that includes a decision tree, and
the first learning unit learns the parameters of the first model and the parameters of the second model based on:
an error between a label predicted by the first model using the feature amount and the correct label for the feature amount; and
an error between the label predicted by the first model using the feature amount and a label predicted by the second model using the feature amount.

An interpretation device comprising:
a first learning unit that simultaneously learns parameters of a first model and parameters of a second model having higher interpretability than the first model, using a learning dataset composed of learning data including feature amounts representing one or more features and correct labels for the feature amounts, so that the predictive performance of the first model is improved and the output of the first model matches the output of the second model; and
a second learning unit that learns the parameters of the second model using the learning dataset so that the output of the first model after learning by the first learning unit matches the output of the second model.

11. The interpretation device according to claim 10, further comprising:
a prediction unit that uses a feature amount of an inference target to calculate, by the second model learned by the second learning unit, a predicted label for the feature amount of the inference target; and
an interpretation unit that interprets an effect on the calculation of the predicted label by using the parameters of the second model learned by the second learning unit.

A program for causing a computer to execute the interpretation method according to any one of claims 1 to 6.