JP6465440B2

JP6465440B2 - Analysis apparatus, method, and program

Info

Publication number: JP6465440B2
Application number: JP2016036106A
Authority: JP
Inventors: 孝竹内; 具治岩田; 吉伸河原
Original assignee: Nippon Telegraph and Telephone Corp; Osaka University NUC; NTT Inc USA
Current assignee: University of Osaka NUC; NTT Inc; NTT Inc USA
Priority date: 2016-02-26
Filing date: 2016-02-26
Publication date: 2019-02-06
Anticipated expiration: 2036-02-26
Also published as: JP2017151904A

Description

The present invention relates to an analysis apparatus, method, and program, and more particularly to an analysis apparatus, method, and program that analyze data using a regularization technique.

Supervised learning is a technique that, given training data consisting of pairs of observation data and teacher data, estimates parameters for predicting the teacher data from the observation data. Parameter estimation is formulated as the problem of minimizing the error incurred when the teacher data is predicted from the observation data. The performance of the estimated parameters is evaluated by the error (generalization error) incurred when the teacher data is predicted from unknown observation data not contained in the training data, called test data; the smaller the generalization error, the better the parameters are considered to be.

In recent years, as observation data has grown in dimensionality, the supervised learning research field has paid increasing attention to the problem that, when the number of training data points is smaller than the dimensionality of the observation data, the generalization error of the estimated parameters grows and estimation accuracy deteriorates. This is caused by the phenomenon known as overfitting. Regularization techniques have therefore been studied that introduce prior knowledge about the data as a regularization term, constrain the values the parameters can take, and thereby avoid overfitting. Among regularization techniques, those that exploit the structure of the parameters are called structured regularization and have been actively studied in recent years.

Non-Patent Document 1 proposes a structured regularization technique called fused regularization (Fused Lasso). Fused regularization introduces, as a regularization term, the prior knowledge that one parameter and another parameter share the same true value because they are adjacent. Non-Patent Document 1 proposes a method for minimizing the fused regularization term and reports experiments in which, on data whose parameters have adjacency relations, it achieves higher performance than regularization techniques that do not use structure.

Non-Patent Document 2 proposes a technique called generalized fused regularization (Generalized Fused Lasso), which extends the adjacency relation of fused regularization to arbitrary adjacency relations. Non-Patent Document 2 shows that generalized fused regularization coincides with the Lovász extension of a submodular function in discrete optimization (see Non-Patent Document 3) and, by exploiting this property, proposes a method for minimizing the generalized fused regularization term based on Non-Patent Document 4 and Non-Patent Document 7.
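The stated agreement between a generalized fused penalty and the Lovász extension of a submodular function can be checked numerically on a small example. The sketch below assumes the submodular function is a weighted graph cut and the penalty is the weighted sum of absolute differences over edges; the graph, weights, and all function names are illustrative and are not taken from the cited documents.

```python
import numpy as np

def cut_value(subset, edges):
    """Total weight of edges with exactly one endpoint in `subset` (a submodular set function)."""
    return sum(w for i, j, w in edges if (i in subset) != (j in subset))

def lovasz_extension(beta, set_fn):
    """Lovasz extension of set_fn (with set_fn(empty set) = 0), evaluated at beta."""
    order = np.argsort(-np.asarray(beta, dtype=float))  # indices by decreasing value
    value, prefix, prev = 0.0, set(), 0.0
    for idx in order:
        prefix.add(int(idx))
        cur = set_fn(prefix)
        value += float(beta[idx]) * (cur - prev)  # marginal gain weighted by beta
        prev = cur
    return value

def generalized_fused_penalty(beta, edges):
    """Weighted sum of |beta_i - beta_j| over an arbitrary edge set."""
    return sum(w * abs(beta[i] - beta[j]) for i, j, w in edges)

# Hypothetical 4-node graph given as (i, j, weight) triples.
edges = [(0, 1, 1.0), (1, 2, 0.5), (0, 3, 2.0), (2, 3, 1.5)]
beta = np.array([0.3, -1.2, 0.8, 0.1])

lov = lovasz_extension(beta, lambda s: cut_value(s, edges))
gfl = generalized_fused_penalty(beta, edges)
print(lov, gfl)
```

Both quantities agree for this input, which is the property that lets cut-based (maximum flow) algorithms be used to minimize the penalty.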

Unsupervised learning is a technique for analyzing data when the training data and test data contain observation data but no teacher data; techniques such as matrix factorization have been proposed. In unsupervised learning, too, overfitting has become a problem as data dimensionality increases; research on applying regularization techniques to avoid this problem has been conducted, and improvements in analysis accuracy have been reported.

Non-Patent Document 1: Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(1), 91-108 (2005)
Non-Patent Document 2: Xin, B., Kawahara, Y., Wang, Y., Gao, W.: Efficient generalized fused lasso with its application to the diagnosis of Alzheimer's disease. In: Proc. of AAAI. pp. 2163-2169 (2014)
Non-Patent Document 3: Fujishige, S.: Submodular functions and optimization, vol. 58. Elsevier (2005)
Non-Patent Document 4: Gallo, G., Grigoriadis, M.D., Tarjan, R.E.: A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing 18(1), 30-55 (1989)
Non-Patent Document 5: Kohli, P., Ladicky, L., Torr, P.H.S.: Robust higher order potentials for enforcing label consistency. International Journal of Computer Vision 82(3), 302-324 (2009)
Non-Patent Document 6: Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Modeling & Simulation 4(4), 1168-1200 (2005)
Non-Patent Document 7: Nagano, K., Kawahara, Y., Aihara, K.: Size-constrained submodular minimization through minimum norm base. In: Proc. of ICML. pp. 977-984 (2011)
Non-Patent Document 8: Liu, J., Ji, S., Ye, J.: SLEP: Sparse Learning with Efficient Projections. Arizona State University (2009)

Suppose that one wishes to use, as prior knowledge about the parameters, a higher-order structure for regularization, that is, information such as the fact that a certain set of parameters belongs to the same group. Because the prior knowledge usable by generalized fused regularization is limited to adjacency relations, such higher-order structure cannot be used. As a result, sufficient performance may not be obtained on data whose parameters have a group structure.

The present invention has been made to solve the above problem, and its object is to provide an analysis apparatus, method, and program capable of accurately analyzing data having a group structure.

To achieve the above object, an analysis apparatus according to a first invention is an analysis apparatus that predicts a real scalar value y for observation data x, and comprises: a data input unit that receives training data, which is a set of data points each consisting of a pair of observation data x and a real scalar value y, test data consisting of observation data x, parameter structure data representing the degree to which each dimension of a parameter β, a d-dimensional vector for predicting the real scalar value y for observation data x, belongs to each group, the gradient ∇l of an error term l for each of the data points of the training data in a loss function for estimating the parameter β, and functions ∇Ω_r that minimize each of R regularization terms Ω_r in the loss function; a parameter estimation unit that, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, estimates the parameter β so as to minimize the loss function, which includes the error term l for each of the data points of the training data, the R regularization terms Ω_r, and, for each group, a higher-order fused regularization term expressed using the differences between the values of the elements of the parameter β in the dimensions belonging to that group; and a prediction unit that predicts the real scalar value y for the test data based on the parameter β estimated by the parameter estimation unit.

In the analysis apparatus according to the first invention, the parameter structure data may further include similarities between pairs of dimensions of the parameter β, and the parameter estimation unit may, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, estimate the parameter β so as to minimize the loss function, which includes the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a generalized higher-order fused regularization term that includes the higher-order fused regularization term and a generalized fused regularization term expressed using the similarities between pairs of dimensions of the parameter β.

In the analysis apparatus according to the first invention, the parameter estimation unit may include: an error term parameter estimation unit that estimates the parameter β based on the gradient ∇l of the error term l; a regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the error term parameter estimation unit and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r; and a higher-order fused regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the regularization term parameter estimation unit and the higher-order fused regularization term. The estimation by the error term parameter estimation unit, the estimation by the regularization term parameter estimation unit, and the estimation by the higher-order fused regularization term parameter estimation unit may each be performed at least once.

In the analysis apparatus according to the first invention, the parameter estimation unit may include: an error term parameter estimation unit that estimates the parameter β based on the gradient ∇l of the error term l; a regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the error term parameter estimation unit and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r; and a generalized higher-order fused regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the regularization term parameter estimation unit and the generalized higher-order fused regularization term. The estimation by the error term parameter estimation unit, the estimation by the regularization term parameter estimation unit, and the estimation by the generalized higher-order fused regularization term parameter estimation unit may each be performed at least once.

In the analysis apparatus according to the first invention, the higher-order fused parameter estimation unit may estimate the parameter β according to a parametric maximum flow algorithm, based on the parameter β estimated by the regularization term parameter estimation unit and an s/t graph predetermined in correspondence with the higher-order fused regularization term.

In the analysis apparatus according to the first invention, the generalized higher-order fused regularization term parameter estimation unit may estimate the parameter β according to a parametric maximum flow algorithm, based on the parameter β estimated by the regularization term parameter estimation unit and an s/t graph predetermined in correspondence with the generalized higher-order fused regularization term.

An analysis apparatus according to a second invention comprises: a data input unit that receives training data, which is a set of data points consisting of observation data x, parameter structure data representing the degree to which each dimension of a parameter β, a d-dimensional vector for analyzing the observation data x, belongs to each group, the gradient ∇l of an error term l for each of the data points of the training data in a loss function for estimating the parameter β, and functions ∇Ω_r that minimize each of R regularization terms Ω_r in the loss function; and a parameter estimation unit that, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, estimates the parameter β so as to minimize the loss function, which includes the error term l for each of the data points of the training data, the R regularization terms Ω_r, and, for each group, a higher-order fused regularization term expressed using the differences between the values of the elements of the parameter β in the dimensions belonging to that group.

In the analysis apparatus according to the second invention, the parameter structure data may further include similarities between pairs of dimensions of the parameter β, and the parameter estimation unit may, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, estimate the parameter β so as to minimize the loss function, which includes the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a generalized higher-order fused regularization term that includes the higher-order fused regularization term and a generalized fused regularization term expressed using the similarities between pairs of dimensions of the parameter β.

An analysis method according to a third invention is an analysis method in an analysis apparatus that predicts a real scalar value y for observation data x, and comprises: a step in which a data input unit receives training data, which is a set of data points each consisting of a pair of observation data x and a real scalar value y, test data consisting of observation data x, parameter structure data representing the degree to which each dimension of a parameter β, a d-dimensional vector for predicting the real scalar value y for observation data x, belongs to each group, the gradient ∇l of an error term l for each of the data points of the training data in a loss function for estimating the parameter β, and functions ∇Ω_r that minimize each of R regularization terms Ω_r in the loss function; a step in which a parameter estimation unit, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, estimates the parameter β so as to minimize the loss function, which includes the error term l for each of the data points of the training data, the R regularization terms Ω_r, and, for each group, a higher-order fused regularization term expressed using the differences between the values of the elements of the parameter β in the dimensions belonging to that group; and a step in which a prediction unit predicts the real scalar value y for the test data based on the parameter β estimated by the parameter estimation unit.

In the analysis method according to the third invention, the parameter structure data may further include similarities between pairs of dimensions of the parameter β, and in the step of estimating by the parameter estimation unit, the parameter β may be estimated, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, so as to minimize the loss function, which includes the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a generalized higher-order fused regularization term that includes the higher-order fused regularization term and a generalized fused regularization term expressed using the similarities between pairs of dimensions of the parameter β.

In the analysis method according to the third invention, the step of estimating by the parameter estimation unit may include: a step in which an error term parameter estimation unit estimates the parameter β based on the gradient ∇l of the error term l; a step in which a regularization term parameter estimation unit estimates the parameter β based on the parameter β estimated by the error term parameter estimation unit and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r; and a step in which a higher-order fused regularization term parameter estimation unit estimates the parameter β based on the parameter β estimated by the regularization term parameter estimation unit and the higher-order fused regularization term. The estimation by the error term parameter estimation unit, the estimation by the regularization term parameter estimation unit, and the estimation by the higher-order fused regularization term parameter estimation unit may each be performed at least once.

In the analysis method according to the third invention, in the step of estimating by the higher-order fused parameter estimation unit, the parameter β may be estimated according to a parametric maximum flow algorithm, based on the parameter β estimated by the regularization term parameter estimation unit and an s/t graph predetermined in correspondence with the higher-order fused regularization term.

A program according to a fourth invention is a program for causing a computer to function as each unit of the analysis apparatus according to the first or second invention.

According to the analysis apparatus, method, and program of the present invention, the parameter β is estimated, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, so as to minimize a loss function that includes the error term l for each of the data points of the training data, the R regularization terms Ω_r, and, for each group, a higher-order fused regularization term expressed using the differences between the values of the elements of the parameter β in the dimensions belonging to that group. This yields the effect that data having a group structure can be analyzed accurately.

A block diagram showing the configuration of the analysis apparatus according to the first embodiment of the present invention.
A diagram showing an example of a constructed s/t graph.
A flowchart showing the analysis processing routine in the analysis apparatus according to the first embodiment of the present invention.
A block diagram showing the configuration of the analysis apparatus according to the second embodiment of the present invention.
A diagram showing an example of a constructed s/t graph.
A block diagram showing the configuration of the analysis apparatus according to the third embodiment of the present invention.
A block diagram showing the configuration of the analysis apparatus according to the fourth embodiment of the present invention.
A diagram showing an example of the results under the non-overlapping condition in the experimental example.
A diagram showing an example of the results under the overlapping condition in the experimental example.
A diagram showing an example of the true values, observed values, and estimated values in the experimental example.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Principle of the embodiments of the present invention>

First, the principle underlying the embodiments of the present invention will be described.

The embodiments of the present invention propose a higher-order fused regularization term and an analysis apparatus that applies a method for minimizing it. The higher-order fused regularization term is proposed as the Lovász extension of the robust P^n model described in Non-Patent Document 5. Furthermore, an efficient method for minimizing the higher-order fused regularization term is shown by exploiting the fact that the robust P^n model is a submodular function.

Let the observation data be an M-dimensional real vector x and the teacher data be a real scalar value y. Let the training data be a set of N data points {(x_n, y_n)}_{n=1}^{N}, and let the test data be a set of N' data points {x_n'}_{n'=1}^{N'}. The loss function of regularized supervised learning is formulated, using a d-dimensional real vector β as the parameter, as in the following equation (1).

Here, l(x_n, y_n) is the error term for each data point and Ω(β) is the regularization term. The regularization term Ω(β) is expressed as

where Ω_r (r = 1, ..., R) are the R regularization terms, λ_r is the hyperparameter for the regularization term Ω_r, Ω_ho(β) is the higher-order fused regularization term described later, and λ_ho is the hyperparameter for the higher-order fused regularization term Ω_ho.

The loss function of regularized unsupervised learning is formulated without y_n, as in the following equation (2).

Regularized supervised learning and regularized unsupervised learning are both problems of finding the parameter β^* that minimizes the loss function, as shown in the following equation (3).
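The equation images for (1) to (3) and for the decomposition of Ω(β) are not reproduced in this text. A reconstruction consistent with the surrounding definitions might read as follows; the symbol \mathcal{L} for the loss, the explicit dependence of l on β, and the absence of any normalizing factor are assumptions rather than the patent's confirmed notation.

```latex
\begin{align}
  \mathcal{L}(\beta) &= \sum_{n=1}^{N} l(\beta; x_n, y_n) + \Omega(\beta), \tag{1} \\
  \Omega(\beta) &= \sum_{r=1}^{R} \lambda_r \, \Omega_r(\beta)
                   + \lambda_{ho} \, \Omega_{ho}(\beta), \notag \\
  \mathcal{L}(\beta) &= \sum_{n=1}^{N} l(\beta; x_n) + \Omega(\beta), \tag{2} \\
  \beta^{*} &= \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{d}} \mathcal{L}(\beta). \tag{3}
\end{align}
```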

The parameter β^* is estimated by alternating between minimization of the error term and minimization of the regularization term (see Non-Patent Document 6). In supervised learning, the predicted values {y_n'}_{n'=1}^{N'} for the test data are computed with the prediction function f(β^*; x_n').
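The alternation between an error-term step and a regularization-term step can be illustrated with a small forward-backward splitting loop. This is a minimal sketch under stated assumptions, not the patent's procedure: the error term is taken to be a squared error, the only regularizer is an L1 norm standing in for Ω, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the regularization-term step for an L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||X b - y||^2 + lam * ||b||_1 by alternating the two steps."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the error-term gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)        # gradient of the error term
        beta_hat = beta - grad / L         # forward (error-term) step
        beta = soft_threshold(beta_hat, lam / L)  # backward (regularization-term) step
    return beta

# Synthetic, noise-free data with a sparse ground truth (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true

beta_hat = forward_backward(X, y, lam=0.1)
y_pred = X @ beta_hat                      # a linear prediction function f(beta; x) = x . beta
```

With a different regularizer, only `soft_threshold` would change; the error-term step stays the same.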

The embodiments of the present invention propose, as a new structured regularization term, the higher-order fused regularization term Ω_ho(β), which is one component of Ω(β), together with a method for estimating the parameter that minimizes the higher-order fused regularization term, and apply them to machine learning problems.

<Configuration of the analysis apparatus according to the first embodiment of the present invention>

Next, the configuration of the analysis apparatus according to the first embodiment of the present invention will be described. As shown in FIG. 1, the analysis apparatus 100 according to the first embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the analysis processing routine described later. Functionally, as shown in FIG. 1, the analysis apparatus 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 reads the training data, the test data, the parameter structure data, the error term l, the gradient ∇l of the error term l for each of the data points of the training data in the loss function for estimating the parameter β, the Lipschitz constant L of the error term, the R regularization terms Ω_r (r = 1, ..., R), the functions ∇Ω_r (r = 1, ..., R) that minimize each of the R regularization terms Ω_r in the loss function, the prediction function f, the number of iterations P, the hyperparameter α, and the hyperparameter γ.

The training data is a set of N data points {(x_n, y_n)}_{n=1}^{N}, each consisting of a pair of observation data x and a real scalar value y. The test data is a set of N' observation data {x_n}_{n=1}^{N'}.

The parameter structure data are c_0^k and c_1^k, shown in the following equation (4), which represent the degree to which each dimension of the parameter β, a d-dimensional vector for predicting the real scalar value y for the observation data x, belongs to each group k.

The parameter structure data are c_0^k and c_1^k for each of the K groups k; c_0^k and c_1^k for group k represent the degree to which the value of the element in each dimension of the parameter β belonging to group k is the same as the true value in group k. Further, g_k is the set of dimensions of the parameter β that belong to the k-th group. The parameter structure data further includes parameters θ_0^k, θ_1^k, and θ_max^k for controlling the number of elements having the same value as the true value in group k.

The calculation unit 20 includes a parameter estimation unit 30 and a prediction unit 40.

Based on the training data, the gradient ∇l of the error term l, the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, the Lipschitz constant L of the error term, and the hyperparameters α and γ, the parameter estimation unit 30 estimates the parameter β so as to minimize a loss function that includes the error term l for each data point of the training data, the R regularization terms Ω_r, and, for each group k, a higher-order coupled regularization term expressed using the differences between the values of the elements of the parameter β in the dimensions belonging to group k.

Specifically, the parameter estimation unit 30 includes an initialization unit 32, an error term parameter estimation unit 34, a regularization term parameter estimation unit 36, and a higher-order coupled regularization term parameter estimation unit 38, each described below. In the parameter estimation unit 30, the error term parameter estimation unit 34, the regularization term parameter estimation unit 36, and the higher-order coupled regularization term parameter estimation unit 38 each perform estimation at least once. In the present embodiment, the estimation is repeated P times, where P is the number of iterations received by the input unit 10, and the resulting parameter β* is output to the prediction unit 40.

The initialization unit 32 generates the initial value β_0 of the parameter β from uniform random numbers.

The error term parameter estimation unit 34 estimates the parameter β̂_p based on the gradient ∇l of the error term l, the Lipschitz constant L of the error term, and either the initial value β_0 of the parameter β or the (p−1)-th parameter estimate β_{p−1} estimated by the higher-order coupled regularization term parameter estimation unit 38. Here, following the method described in Non-Patent Document 6, the estimate β̂_p for the error term is obtained from the initial value β_0 or the (p−1)-th parameter estimate β_{p−1}, as shown in equation (5) below.
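Equation (5) is not reproduced in this text. As a minimal sketch, assuming it takes the usual form of a gradient step with step size 1/L (as in proximal gradient methods) and assuming a squared-error term purely for illustration, the update could look like the following. The function names and the choice of error term are hypothetical and are not taken from the embodiment.

```python
import numpy as np

def squared_error_grad(beta, X, y):
    """Gradient of the assumed error term l(beta) = 0.5 * ||X @ beta - y||^2."""
    return X.T @ (X @ beta - y)

def lipschitz_constant(X):
    """Lipschitz constant of that gradient: squared largest singular value of X."""
    return np.linalg.norm(X, 2) ** 2

def error_term_step(beta_prev, X, y, L):
    """Assumed form of the update: beta_hat = beta_prev - (1 / L) * grad l(beta_prev)."""
    return beta_prev - squared_error_grad(beta_prev, X, y) / L

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.ones(5)
beta0 = rng.uniform(-1.0, 1.0, size=5)  # initial value from uniform random numbers
L = lipschitz_constant(X)
beta_hat = error_term_step(beta0, X, y, L)
```

With a step size of 1/L, one such step cannot increase the value of this error term, which is why the Lipschitz constant is among the inputs.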

The regularization term parameter estimation unit 36 estimates the parameter β̃_p based on the parameter β̂_p estimated by the error term parameter estimation unit 34 and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r. Here, the estimate β̃_p(1) is obtained from the β̂_p produced by the error term parameter estimation unit 34 and the function ∇Ω_1 for the first regularization term, and the estimate β̃_p(2) is obtained from β̃_p(1) and the function ∇Ω_2 for the second regularization term. In the same way, the estimate β̃_p(r) is obtained from β̃_p(r−1) and the function ∇Ω_r for the r-th regularization term. That is, with β̃_p(0) = β̂_p, the operation shown in equation (6) below is repeated R times.

The regularization term parameter estimation unit 36 then takes the result of the R repetitions as β̃_p = β̃_p(R).
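The R-fold update can be sketched as a loop over R maps applied in sequence. Equation (6) is not reproduced in this text, so the sketch assumes only that each ∇Ω_r takes the current estimate and returns a new one. The two maps used here (soft-thresholding and clipping to a box) are hypothetical examples of such functions, not the regularization terms of the embodiment.

```python
import numpy as np

def soft_threshold(beta, t=0.1):
    """Proximal map of t * ||beta||_1 (illustrative choice of regularizer)."""
    return np.sign(beta) * np.maximum(np.abs(beta) - t, 0.0)

def clip_to_box(beta, lo=-1.0, hi=1.0):
    """Projection onto the box [lo, hi]^d (illustrative choice of regularizer)."""
    return np.clip(beta, lo, hi)

def regularization_step(beta_hat, maps):
    """Start from beta_tilde(0) = beta_hat and apply the R maps in order."""
    beta_tilde = beta_hat
    for apply_map in maps:
        beta_tilde = apply_map(beta_tilde)
    return beta_tilde  # beta_tilde(R)

beta_hat = np.array([1.5, 0.05, -0.3, -2.0])
beta_tilde = regularization_step(beta_hat, [soft_threshold, clip_to_box])
```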

The higher-order coupled regularization term parameter estimation unit 38 estimates the parameter β_p based on the parameter β̃_p estimated by the regularization term parameter estimation unit 36 and the higher-order coupled regularization term Ω_ho(β).

Specifically, the higher-order coupled regularization term is first formulated as in equation (7) below. For simplicity, β = β̃_p is assumed in what follows.

Here, j_s^k and j_t^k are defined as follows.

Here, the higher-order coupled regularization term corresponds to the Lovász extension of the robust P^n model of Non-Patent Document 5, which is a submodular function. Accordingly, as in Non-Patent Document 2, the parameter β is estimated with respect to the higher-order coupled regularization term by the methods described in Non-Patent Documents 4 and 7.

Next, to estimate the parameter β with respect to the higher-order coupled regularization term, an s/t graph defined in correspondence with the higher-order coupled regularization term Ω_ho(β) is constructed as shown in FIG. 2. In the s/t graph, s is the start node, t is the end node, {v_1, …, v_d} are the nodes corresponding to the dimensions of the parameter β, and U_s = {u_s^1, …, u_s^K} and U_t = {u_t^1, …, u_t^K} are the hypernodes corresponding to the groups k.

The higher-order coupled regularization term parameter estimation unit 38 estimates the parameter β_p based on the parameter β̃_p estimated by the regularization term parameter estimation unit 36 and the s/t graph. It does so by using the parametric maximum flow algorithm described in Non-Patent Document 4 to search, while varying the value of the hyperparameter α, for the path that maximizes the flow from the start node to the end node.

パラメータ推定部３０は、誤差項パラメータ推定部３４、正則化項パラメータ推定部３６、及び高階結合正則化項パラメータ推定部３８による推定をＰ回繰り返して得られたパラメータの推定値β_ｐを、上記（３）式によって定式化されたパラメータβ^＊として予測部４０に出力する。 The parameter estimation unit 30 calculates the estimated value β _p of the parameter obtained by repeating the estimation by the error term parameter estimation unit 34, the regularization term parameter estimation unit 36, and the higher-order coupled regularization term parameter estimation unit 38 P times. The parameter β ^* formulated by the equation (3) is output to the prediction unit 40.

The prediction unit 40 predicts the real scalar value y for the test data based on the parameter β* estimated by the parameter estimation unit 30 and the prediction function f. Here, as shown in equation (8) below, the predicted values {ŷ_n}_{n=1}^{N′} are computed from the test data {x_n}_{n=1}^{N′} using the parameter β* and the prediction function f, and the result is output to the output unit 50.

＜本発明の第１の実施の形態に係る解析装置の作用＞ <Operation of the analyzing apparatus according to the first embodiment of the present invention>

次に、本発明の第１の実施の形態に係る解析装置１００の作用について説明する。入力部１０において訓練データ、テストデータ、パラメータ構造データ、誤差項ｌ、誤差項ｌの勾配∇ｌ、誤差項のリプシッツ定数Ｌ、Ｒ個の正則化項Ω_ｒ、損失関数におけるＲ個の正則化項Ω_ｒの各々を最小化する関数∇Ω_ｒ、予測関数ｆ、繰り返し演算数Ｐ、ハイパーパラメータα、及びハイパーパラメータγを読み込むと、解析装置１００は、図３に示す解析処理ルーチンを実行する。 Next, the operation of the analysis apparatus 100 according to the first embodiment of the present invention will be described. In the input unit 10, training data, test data, parameter structure data, error term l, error term l slope ∇l, error term Lipschitz constant L, R regularization terms Ω _r , R regularization in loss function When the function ∇Ω _r that minimizes each of the terms Ω _r , the prediction function f, the number of iterations P, the hyper parameter α, and the hyper parameter γ are read, the analysis apparatus 100 executes the analysis processing routine shown in FIG. .

First, in step S100, p is set to 1, and the initial value β_0 of the parameter β (the value for p = 0) is generated from uniform random numbers.

Next, in step S102, the parameter β̂_p is estimated according to equation (5) above, based on the gradient ∇l of the error term l, the Lipschitz constant L of the error term, and either the initial value β_0 of the parameter β or the (p−1)-th parameter estimate β_{p−1} estimated in step S106 described later.

In step S104, the parameter β̃_p is estimated according to equation (6) above, based on the parameter β̂_p estimated in step S102 and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r.

In step S106, the parameter β_p is estimated based on the parameter β̃_p estimated in step S104 and the s/t graph defined in correspondence with the higher-order coupled regularization term Ω_ho(β) formulated in equation (7) above. The estimation uses the parametric maximum flow algorithm described in Non-Patent Document 4 to search, while varying the value of the hyperparameter α, for the path that maximizes the flow from the start node to the end node.

ステップＳ１０８では、ステップＳ１０２〜Ｓ１０６の処理をＰ回繰り返したかを判定し、Ｐ回繰り返していればステップＳ１１２へ移行し、Ｐ回繰り返していなければステップＳ１１０へ移行する。 In step S108, it is determined whether the processes in steps S102 to S106 have been repeated P times. If the process has been repeated P times, the process proceeds to step S112. If the process has not been repeated P times, the process proceeds to step S110.

ステップＳ１１０では、ｐ＝ｐ＋１として、ステップＳ１０２へ戻ってステップＳ１０２〜Ｓ１０６の処理を繰り返す。 In step S110, it sets p = p + 1, returns to step S102, and repeats the process of steps S102-S106.

ステップＳ１１２では、ステップＳ１０２〜Ｓ１０８の結果得られたパラメータの推定値β_ｐを、上記（３）式によって定式化されたパラメータβ^＊として予測部４０に出力する。 In step S112, the estimated value β _p of the parameter obtained as a result of steps S102 to S108 is output to the prediction unit 40 as the parameter β ^* formulated by the above equation (3).

In step S114, the real scalar value y for the test data is predicted based on the parameter β* obtained in step S112 and the prediction function f, the prediction result is output to the output unit 50, and the analysis processing routine ends.
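The control flow of steps S100 to S114 can be sketched as follows. This is only an outline under stated assumptions: the three step functions are passed in as hypothetical callables, the error-term step assumes a squared-error term with a 1/L gradient step, the regularization and higher-order steps are identity placeholders (the parametric maximum flow computation of step S106 is not implemented here), and the prediction function f is assumed to be linear.

```python
import numpy as np

def run_analysis(beta0, P, error_step, reg_step, higher_order_step):
    """Repeat the three estimation steps P times and return the final estimate beta*."""
    beta = beta0
    for p in range(1, P + 1):                  # S108/S110: loop counter p = 1, ..., P
        beta_hat = error_step(beta)            # S102: error-term update
        beta_tilde = reg_step(beta_hat)        # S104: R regularization-term updates
        beta = higher_order_step(beta_tilde)   # S106: higher-order term update
    return beta                                # S112: output as beta*

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
beta_true = np.array([1.0, 1.0, -2.0, -2.0])
y = X @ beta_true
L = np.linalg.norm(X, 2) ** 2                  # Lipschitz constant of the assumed gradient

beta_star = run_analysis(
    beta0=rng.uniform(-1.0, 1.0, size=4),      # S100: uniform random initial value
    P=500,
    error_step=lambda b: b - X.T @ (X @ b - y) / L,
    reg_step=lambda b: b,                      # placeholder, not the embodiment's step
    higher_order_step=lambda b: b,             # placeholder, not the embodiment's step
)
y_pred = X @ beta_star                         # S114 with an assumed linear f
```

With the placeholders in place the loop reduces to plain gradient descent on noiseless data, so `beta_star` approaches `beta_true`; the structure of the loop is the point of the sketch, not the numbers.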

以上説明したように、第１の実施の形態に係る解析装置によれば、訓練データと、誤差項ｌの勾配∇ｌと、Ｒ個の正則化項Ω_ｒの各々を最小化する関数∇Ω_ｒとに基づいて、訓練データのデータ点の各々についての誤差項ｌと、Ｒ個の正則化項Ω_ｒと、各グループに対し、パラメータβにおける、グループに所属する次元間の要素の値の差を用いて表される高階結合正則化項とを含む損失関数を最小化するように、パラメータβを推定し、パラメータβに基づいて、テストデータに対する実数スカラー値ｙを予測することにより、グループ構造を持つデータを精度よく解析して、テストデータに対する実数スカラー値ｙを予測することができる。 As described above, according to the analysis apparatus according to the first embodiment, the training data, the gradient ∇l of the error term l, and the function ∇Ω that minimizes each of the R regularization terms Ω _r. based on _r , the error term l for each of the data points of the training data, R regularization terms Ω _r, and for each group, the value of the element between the dimensions belonging to the group in the parameter β By estimating the parameter β so as to minimize a loss function including a higher-order coupled regularization term expressed using the difference, and predicting a real scalar value y for the test data based on the parameter β, the group It is possible to predict the real scalar value y for the test data by accurately analyzing the data having a structure.

＜本発明の第２の実施の形態に係る解析装置の構成＞ <Configuration of analysis apparatus according to second embodiment of the present invention>

Next, the configuration of the analysis apparatus according to the second embodiment of the present invention will be described. The second embodiment differs from the first embodiment in that it uses a loss function including a generalized higher-order coupled regularization term. Parts that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

図４に示すように、本発明の第２の実施の形態に係る解析装置２００は、入力部２１０と、演算部２２０と、出力部５０とを備えている。 As shown in FIG. 4, the analysis device 200 according to the second embodiment of the present invention includes an input unit 210, a calculation unit 220, and an output unit 50.

The input unit 210 reads the training data, the parameter structure data, the error term l, the gradient ∇l of the error term l for each data point of the training data in the loss function used to estimate the parameter β, the Lipschitz constant L of the error term, the R regularization terms Ω_r (r = 1, …, R), the functions ∇Ω_r (r = 1, …, R) that minimize each of the R regularization terms Ω_r in the loss function, the number of iterations P, the hyperparameter α, and the hyperparameter γ. The points in which each item of data differs from the first embodiment are described below.

パラメータ構造データは、上記（４）式に示す、観測データｘを解析するためのｄ次元のベクトルであるパラメータβの各次元の、各グループｋへの所属度を表すｃ_０ ^ｋ、及びｃ_１ ^ｋである。パラメータ構造データは、更に、パラメータβの次元をノード、次元対に関する類似度をエッジに持つグラフ行列Ｗを含む。 The parameter structure data includes c ₀ ^k and c ₁ representing the degree of affiliation of each dimension of the parameter β, which is a d-dimensional vector for analyzing the observation data x shown in the above equation (4), to each group k. ^k . The parameter structure data further includes a graph matrix W having a dimension of the parameter β as a node and a similarity with respect to a dimension pair as an edge.

演算部２２０は、パラメータ推定部２３０と予測部４０とを含んで構成されている。 The calculation unit 220 includes a parameter estimation unit 230 and a prediction unit 40.

Based on the training data, the gradient ∇l of the error term l, the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, the Lipschitz constant L of the error term, and the hyperparameters α and γ, the parameter estimation unit 230 estimates the parameter β so as to minimize a loss function that includes the error term l for each data point of the training data, the R regularization terms Ω_r, and a generalized higher-order coupled regularization term. The generalized higher-order coupled regularization term comprises the higher-order coupled regularization term and a generalized coupled regularization term expressed using the similarities between pairs of dimensions of the parameter β. The parameter estimation unit 230 includes the initialization unit 32, the error term parameter estimation unit 34, the regularization term parameter estimation unit 36, and a generalized higher-order coupled regularization term parameter estimation unit 238. In the present embodiment, the parameter estimation unit 230 estimates the parameter β for analyzing the observation data x included in the training data and outputs it to the output unit 50.

The generalized higher-order coupled regularization term parameter estimation unit 238 estimates the parameter β_p based on the parameter β̃_p estimated by the regularization term parameter estimation unit 36 and the generalized higher-order coupled regularization term Ω_GFL(β).

Specifically, let Ω_GFL denote the generalized coupled regularization term and Ω_HO the higher-order coupled regularization term; the generalized higher-order coupled regularization term is then formulated as follows. For simplicity, β = β̃_p is assumed in what follows.

According to Non-Patent Document 2, generalized coupled regularization corresponds to the Lovász extension of the cut function, which is a submodular function. As in the first embodiment, the higher-order coupled regularization term corresponds to the Lovász extension of the robust P^n model of Non-Patent Document 4, which is a submodular function. Therefore, generalized higher-order coupled regularization coincides with the Lovász extension of the submodular function formed by the sum of the cut function and the robust P^n model.
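The correspondence between the cut function and its Lovász extension (ロバシュ拡張) can be checked numerically on a small graph. The sketch below uses the standard definition of the Lovász extension (sort the entries of β in decreasing order and weight the marginal gains of the set function) and compares it with the weighted sum of absolute differences Σ_{i<j} W_ij |β_i − β_j|. The matrix W and the vector β are made-up illustrative values, and the function names are hypothetical.

```python
import numpy as np

def cut_value(W, subset):
    """Cut function: total weight of edges with exactly one endpoint in `subset`."""
    mask = np.zeros(W.shape[0], dtype=bool)
    for i in subset:
        mask[i] = True
    return float(W[mask][:, ~mask].sum())

def lovasz_extension(set_function, beta):
    """Lovász extension of a set function with F(empty set) = 0, evaluated at beta."""
    order = np.argsort(-beta)                   # indices by decreasing beta
    chosen, prev, value = set(), set_function(set()), 0.0
    for j in order:
        chosen.add(int(j))
        current = set_function(chosen)
        value += beta[j] * (current - prev)     # beta_j times the marginal gain
        prev = current
    return value

def fused_penalty(W, beta):
    """Sum over pairs i < j of W[i, j] * |beta_i - beta_j|."""
    i, j = np.triu_indices(W.shape[0], k=1)
    return float(np.sum(W[i, j] * np.abs(beta[i] - beta[j])))

# Symmetric similarity matrix with zero diagonal (illustrative values).
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
beta = np.array([0.5, -1.0, 2.0])

via_lovasz = lovasz_extension(lambda S: cut_value(W, S), beta)
via_penalty = fused_penalty(W, beta)
```

For these values both quantities evaluate to 7.5, consistent with the statement that the pairwise-difference penalty is the Lovász extension of the cut function.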

Accordingly, as in Non-Patent Document 2, the parameter is estimated with respect to the generalized higher-order coupled regularization term by the methods of Non-Patent Documents 4 and 7.

この際、s/tグラフを図５のように構築する。グラフにおいて、sは始点ノード、tは終点ノード、｛ｖ_１，…，ｖ_ｄ｝はパラメータの各次元に対応するノード、Ｕ_ｓ＝｛ｕ_ｓ ^１，…，ｕ_ｓ ^Ｋ｝，Ｕ_ｔ＝｛ｕ_ｔ ^１，…，ｕ_ｔ ^Ｋ｝はグループに対応するハイパーノードである。 At this time, the s / t graph is constructed as shown in FIG. In the graph, s is a start node, t is an end node, {v ₁ ,..., V _d } is a node corresponding to each dimension of the parameter, U _s = {u _s ¹ ,..., U _s ^K }, U _t = {U _t ¹ ,..., U _t ^K } are hypernodes corresponding to groups.

The generalized higher-order coupled regularization term parameter estimation unit 238 estimates the parameter β_p based on the parameter β̃_p estimated by the regularization term parameter estimation unit 36 and the s/t graph. It does so by using the parametric maximum flow algorithm described in Non-Patent Document 4 to search, while varying the value of the hyperparameter α, for the path that maximizes the flow from the start node to the end node.

パラメータ推定部２３０は、誤差項パラメータ推定部３４、正則化項パラメータ推定部３６、及び一般化高階結合正則化項パラメータ推定部２３８による推定をＰ回繰り返して得られたパラメータの推定値β_ｐを、上記（３）式によって定式化されたパラメータβ^＊として予測部４０に出力する。 The parameter estimation unit 230 obtains an estimated value β _p of the parameter obtained by repeating the estimation by the error term parameter estimation unit 34, the regularization term parameter estimation unit 36, and the generalized higher-order coupled regularization term parameter estimation unit 238 P times. The parameter β ^* formulated by the above equation (3) is output to the prediction unit 40.

The other configurations and operations of the analysis apparatus according to the second embodiment are the same as in the first embodiment, and their description is omitted.

As described above, the analysis apparatus according to the second embodiment estimates the parameter β, based on the training data, the gradient ∇l of the error term l, and the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, so as to minimize a loss function that includes the error term l for each data point of the training data, the R regularization terms Ω_r, and a generalized higher-order coupled regularization term comprising the higher-order coupled regularization term and a generalized coupled regularization term expressed using the similarities between pairs of dimensions of the parameter β. By predicting the real scalar value y for the test data based on the parameter β, the apparatus can accurately analyze data having a group structure and predict the real scalar value y for the test data.

また、パラメータに関する隣接、高階の事前情報を一般化高階結合正則化項として利用することにより、教師あり学習の定量的な性能向上が可能になる。 In addition, by using prior information on adjacent and higher orders regarding parameters as generalized higher order combined regularization terms, it is possible to improve the quantitative performance of supervised learning.

Furthermore, when two kinds of prior knowledge about the parameter, namely an adjacency structure and a higher-order structure, are to be used for regularization, the parameter β can be estimated so as to minimize a loss function including the generalized higher-order coupled regularization term that exploits this prior knowledge, and the real scalar value y for the test data can be predicted based on the parameter β.

Furthermore, because generalized higher-order coupled regularization is the Lovász extension of a submodular function, the parameter β can be estimated by an efficient minimization method.

＜本発明の第３の実施の形態に係る解析装置の構成＞ <Configuration of analysis apparatus according to third embodiment of the present invention>

Next, the configuration of the analysis apparatus according to the third embodiment of the present invention will be described. The third embodiment differs from the first embodiment in that it performs unsupervised learning, which does not use teacher data. Parts that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

図６に示すように、本発明の第３の実施の形態に係る解析装置３００は、ＣＰＵと、ＲＡＭと、後述する解析処理ルーチンを実行するためのプログラムや各種データを記憶したＲＯＭと、を含むコンピュータで構成することが出来る。この解析装置３００は、機能的には図６に示すように入力部３１０と、演算部３２０と、出力部５０とを備えている。 As shown in FIG. 6, an analysis apparatus 300 according to the third embodiment of the present invention includes a CPU, a RAM, and a ROM that stores a program and various data for executing an analysis processing routine described later. It can be configured with a computer including. Functionally, the analysis apparatus 300 includes an input unit 310, a calculation unit 320, and an output unit 50 as shown in FIG.

The input unit 310 reads the training data, the parameter structure data, the error term l, the gradient ∇l of the error term l for each data point of the training data in the loss function used to estimate the parameter β, the Lipschitz constant L of the error term, the R regularization terms Ω_r (r = 1, …, R), the functions ∇Ω_r (r = 1, …, R) that minimize each of the R regularization terms Ω_r in the loss function, the number of iterations P, the hyperparameter α, and the hyperparameter γ. The points in which each item of data differs from the first embodiment are described below.

訓練データは、観測データｘからなるＮ個のデータ点の集合｛（ｘ_ｎ）｝_ｎ＝１ ^Ｎである。本実施の形態は、教師なし学習であるため、実数スカラー値｛（ｙ_ｎ）｝_ｎ＝１ ^Ｎを読み込まない。 The training data is a set of N data points consisting of observation data x {(x _n )} _{n = 1} ^N. Since this embodiment is unsupervised learning, a real scalar value {(y _n )} _{n = 1} ^N is not read.

パラメータ構造データは、上記（４）式に示す、観測データｘを解析するためのｄ次元のベクトルであるパラメータβの各次元の、各グループｋへの所属度を表すｃ_０ ^ｋ、及びｃ_１ ^ｋである。 The parameter structure data includes c ₀ ^k and c ₁ representing the degree of affiliation of each dimension of the parameter β, which is a d-dimensional vector for analyzing the observation data x shown in the above equation (4), to each group k. ^k .

また、本実施の形態では、上記（２）式に示すように、損失関数における誤差項ｌが、第１の実施の形態と異なっている。 In the present embodiment, as shown in the above equation (2), the error term l in the loss function is different from that in the first embodiment.

演算部３２０は、パラメータ推定部３０を含んで構成されている。 The calculation unit 320 includes the parameter estimation unit 30.

As in the first embodiment, the parameter estimation unit 30 estimates the parameter β, based on the training data, the gradient ∇l of the error term l, the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, the Lipschitz constant L of the error term, and the hyperparameters α and γ, so as to minimize a loss function that includes the error term l for each data point of the training data, the R regularization terms Ω_r, and, for each group k, a higher-order coupled regularization term expressed using the differences between the values of the elements of the parameter β in the dimensions belonging to group k. Also as in the first embodiment, the parameter estimation unit 30 includes the initialization unit 32, the error term parameter estimation unit 34, the regularization term parameter estimation unit 36, and the higher-order coupled regularization term parameter estimation unit 38. In the present embodiment, the parameter estimation unit 30 estimates the parameter β for analyzing the observation data x included in the training data and outputs it to the output unit 50.

The other configurations and operations of the third embodiment are the same as in the first embodiment, and their description is omitted.

＜本発明の第４の実施の形態に係る解析装置の構成＞ <Configuration of Analysis Device According to Fourth Embodiment of the Present Invention>

Next, the configuration of the analysis apparatus according to the fourth embodiment of the present invention will be described. The fourth embodiment differs from the second embodiment in that it performs unsupervised learning, which does not use teacher data. Parts that are the same as in the second embodiment are given the same reference numerals, and their description is omitted.

図７に示すように、本発明の第４の実施の形態に係る解析装置４００は、ＣＰＵと、ＲＡＭと、後述する解析処理ルーチンを実行するためのプログラムや各種データを記憶したＲＯＭと、を含むコンピュータで構成することが出来る。この解析装置４００は、機能的には図７に示すように入力部４１０と、演算部４２０と、出力部５０とを備えている。 As shown in FIG. 7, an analysis apparatus 400 according to the fourth embodiment of the present invention includes a CPU, a RAM, and a ROM that stores a program and various data for executing an analysis processing routine to be described later. It can be configured with a computer including. Functionally, the analysis apparatus 400 includes an input unit 410, a calculation unit 420, and an output unit 50 as shown in FIG.

The input unit 410 reads the training data, the parameter structure data, the error term l, the gradient ∇l of the error term l for each data point of the training data in the loss function used to estimate the parameter β, the Lipschitz constant L of the error term, the R regularization terms Ω_r (r = 1, …, R), the functions ∇Ω_r (r = 1, …, R) that minimize each of the R regularization terms Ω_r in the loss function, the number of iterations P, the hyperparameter α, and the hyperparameter γ. The points in which each item of data differs from the second embodiment are described below.

パラメータ構造データは、上記（４）式に示す、観測データｘを解析するためのｄ次元のベクトルであるパラメータβの各次元の、各グループｋへの所属度を表すｃ_０ ^ｋ、及びｃ_１ ^ｋである。パラメータ構造データは、更に、更に、パラメータβの次元をノード、次元対に関する類似度をエッジに持つグラフ行列Ｗを含む。 The parameter structure data includes c ₀ ^k and c ₁ representing the degree of affiliation of each dimension of the parameter β, which is a d-dimensional vector for analyzing the observation data x shown in the above equation (4), to each group k. ^k . The parameter structure data further includes a graph matrix W having a dimension of the parameter β as a node and a similarity with respect to a dimension pair as an edge.

また、本実施の形態では、上記（２）式に示すように、損失関数における誤差項ｌが、第２の実施の形態と異なっている。 In the present embodiment, as shown in the above equation (2), the error term l in the loss function is different from that in the second embodiment.

演算部４２０は、パラメータ推定部２３０を含んで構成されている。 The calculation unit 420 includes a parameter estimation unit 230.

As in the second embodiment, the parameter estimation unit 230 estimates the parameter β, based on the training data, the gradient ∇l of the error term l, the functions ∇Ω_r that minimize each of the R regularization terms Ω_r, the Lipschitz constant L of the error term, and the hyperparameters α and γ, so as to minimize a loss function that includes the error term l for each data point of the training data, the R regularization terms Ω_r, and the generalized higher-order coupled regularization term. Also as in the second embodiment, the parameter estimation unit 230 includes the initialization unit 32, the error term parameter estimation unit 34, the regularization term parameter estimation unit 36, and the generalized higher-order coupled regularization term parameter estimation unit 238. In the present embodiment, the parameter estimation unit 230 estimates the parameter β for analyzing the observation data x included in the training data and outputs it to the output unit 50.

The other configurations and operations of the fourth embodiment are the same as in the second embodiment, and their description is omitted.

このように、パラメータに関する隣接、高階の事前情報を一般化高階結合正則化項として利用することにより、教師なし学習の定量的な性能向上が可能になる。 In this way, quantitative performance improvement of unsupervised learning can be performed by using adjacent information and higher-order prior information regarding parameters as generalized higher-order coupled regularization terms.

<Experimental example>

To show the effect of the technique according to the first embodiment of the present invention, linear regression analysis, which is one form of supervised learning, was performed on an artificially generated data set and on data sets published on the Internet, and performance was evaluated quantitatively using the mean squared error shown in equation (9) below.
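Equation (9) is not reproduced in this text. The sketch below assumes it is the usual mean squared error, the average of the squared differences between the true values and the predictions; the function name `mean_squared_error` is chosen here for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true values and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# One prediction is off by 2, so the squared errors are 0, 0, 4.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```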

For the artificially generated data, the dimension of the data and of the parameter was set to M = d = 100, and data sets of N = 30, 50, 70, 100, and 150 points were generated. Each y_n was generated artificially by taking the linear sum of a randomly generated x_n and a parameter designed in advance, and then adding Gaussian noise. The mean squared error on the test data was measured by 10 rounds of cross-validation. SGL (Non-Patent Document 8), GFL (Non-Patent Document 2), Lasso (Non-Patent Document 1), and OLS (Non-Patent Document 1) were used as comparison methods.

Two conditions were used for designing the parameter. The first condition is called non-overlapping. Under this condition, the parameter has five groups in total, and each parameter element belongs to exactly one group, so that the groups do not overlap. Elements belonging to the same group were set to have the same true value. The second condition is called overlapping. Under this condition, unlike the first, some elements belong to two groups, so that the groups overlap. An element belonging to two groups was given the same true value as one of those two groups. The other settings were the same as in the non-overlapping condition. The values of c_{0,i}^k and c_{1,i}^k were set to 1.0 or 0.0, θ_0^k and θ_1^k to 0.0, and θ_max^k to 1.0.
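The data generation described above can be sketched as follows. This is an illustration under assumptions, not the generator used in the experiment: the text does not state the distribution of x_n, the group values, the noise variance, or how wide the overlap is, so the standard normal inputs, the uniform group values, `noise_std`, and the overlap of a quarter of a group are all hypothetical choices, as are the function names.

```python
import random


def make_group_parameter(d=100, n_groups=5, overlapping=False, seed=0):
    """Build a d-dimensional true parameter with contiguous groups of equal size."""
    if d % n_groups != 0:
        raise ValueError("d must be divisible by n_groups in this sketch")
    rng = random.Random(seed)
    size = d // n_groups
    group_values = [rng.uniform(-1.0, 1.0) for _ in range(n_groups)]  # assumed values
    groups = [list(range(g * size, (g + 1) * size)) for g in range(n_groups)]
    beta = [0.0] * d
    for g, members in enumerate(groups):
        for i in members:
            beta[i] = group_values[g]  # same true value within a group
    if overlapping:
        overlap = size // 4  # assumed overlap width
        for g in range(n_groups - 1):
            shared = groups[g + 1][:overlap]
            groups[g] = groups[g] + shared  # these indices now belong to two groups
            for i in shared:
                beta[i] = rng.choice([group_values[g], group_values[g + 1]])
    return beta, groups


def make_regression_data(beta, n_points, noise_std=1.0, seed=0):
    """Generate y_n as the linear sum of random x_n and beta, plus Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_points):
        x = [rng.gauss(0.0, 1.0) for _ in range(len(beta))]
        X.append(x)
        y.append(sum(xi * bi for xi, bi in zip(x, beta)) + rng.gauss(0.0, noise_std))
    return X, y


beta_true, groups = make_group_parameter(overlapping=True)
X, y = make_regression_data(beta_true, n_points=30)
```

Calling `make_regression_data` with `n_points` set to 30, 50, 70, 100, and 150 would mirror the five sample sizes in the experiment.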

Experiments were conducted for N ranging from 30 to 150, and the mean squared error on the resulting test data is shown in Table 1 below. (a) shows the results under the non-overlapping condition, and (b) the results under the overlapping condition. Bold indicates a statistically significant difference from the mean squared error of the other methods.

Under the non-overlapping condition, the present invention and GFL show good performance. Under the overlapping condition, the present invention was confirmed to improve the mean squared error on the test data. The present invention showed good performance even when N is smaller than d and overfitting occurs (N = 70, 50, 30). This is because the higher-order coupled regularization term made it possible to avoid overfitting.

Next, the parameters estimated in the experiment (N = 30) are shown in FIGS. 8 and 9. The line in each figure is the true value of the parameter, and the open circles are the values estimated by each method. Under the non-overlapping condition of FIG. 8, the present invention and GFL were able to estimate parameters close to the true values. Under the overlapping condition of FIG. 9, only the technique according to the embodiment of the present invention was able to estimate parameters close to the true values.

These results show that the technique according to the first embodiment of the present invention is effective both when each parameter element belongs to a single group and when elements belong to a plurality of groups.

Next, the results of experiments using data sets published on the Internet are shown. In this experiment, the MovieLens100k, EachMovie, and Book-Crossing data sets (http://grouplens.org) were used, and the rating values given to movies and books were predicted from the history of movies and books viewed by the users included in the data sets. A summary of the data sets is given in Table 2 below.

Table 3 shows the mean squared error on the test data measured by 10 rounds of cross-validation. In all settings, the technique according to the first embodiment of the present invention performs roughly as well as or better than the existing methods.

This shows that the technique according to the first embodiment of the present invention is also effective for data recorded in the real world.

Next, to show the effect of the technique according to the second embodiment of the present invention, linear regression analysis, which is one form of supervised learning, was performed on an artificially generated data set and on data sets published on the Internet, and performance was evaluated quantitatively using the mean squared error shown in equation (9) above.

For the artificially generated data, the matrix data with 50 rows and 50 columns shown in FIG. 10(a) is generated. The matrix is divided into 12 regions (eight rectangles and three figures, namely a star, a circle, and a diamond, were created), and all elements within each region take one of the integer values from 1 to 12. These integer values are used as the test data. Next, noise sampled from a Gaussian distribution with mean 0 and variance 1 is added to the integer values, and element values are then removed at random to obtain the training data. Matrix completion is the problem of estimating the true values of the missing elements from the observed data.

In the experiment, the matrix is handled in vectorized form, so the dimension of the parameter is d = 50² = 2500. Data were generated with the proportion of missing values set to p = 0.1, 0.5, 0.7, and 0.9. The mean squared error on the test data was measured by 10 rounds of cross-validation. GFL (Non-Patent Document 2), HOFL (the first embodiment), and the mean value were used as comparison methods.
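The construction of the matrix completion data can be sketched as below. The region layout of FIG. 10(a) is not reproduced in this text, so the sketch substitutes a hypothetical 3 x 4 grid of rectangular blocks for the 12 regions and assumes one constant value per region; the function name and the use of `None` for a missing entry are likewise illustrative choices.

```python
import random


def make_matrix_completion_data(size=50, p_missing=0.5, seed=0):
    """Return (truth_vec, observed_vec), both of length size * size, row-major."""
    rng = random.Random(seed)
    # Assumed layout: 3 row bands x 4 column bands, region values 1..12.
    truth = [
        [(r * 3 // size) * 4 + (c * 4 // size) + 1 for c in range(size)]
        for r in range(size)
    ]
    # Add N(0, 1) noise, then drop each entry independently with probability p_missing.
    observed = [
        [None if rng.random() < p_missing else v + rng.gauss(0.0, 1.0) for v in row]
        for row in truth
    ]
    # Vectorize both matrices so that the parameter dimension is size ** 2.
    truth_vec = [v for row in truth for v in row]
    observed_vec = [v for row in observed for v in row]
    return truth_vec, observed_vec


truth_vec, observed_vec = make_matrix_completion_data(p_missing=0.5)
```

Varying `p_missing` over 0.1, 0.5, 0.7, and 0.9 would correspond to the missing-value proportions evaluated in the experiment.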

The values of w_{i,j}, c_{0,i}^k, and c_{1,i}^k were set to 1.0 or 0.0, θ_0^k and θ_1^k to 0.0, and θ_max^k to 1.0. The adjacency structure was given the information on which parameters are adjacent in the matrix data, and the higher-order structure was given the eight rectangles and the star. That is, the circle and the diamond were treated as unknown structures. The mean squared error between the estimates obtained in the experiment and the actual observed values is shown in the table below. Bold indicates the lowest mean squared error.

Under the condition p = 0.1, the technique according to the second embodiment of the present invention (prop.) and GFL show good performance. The technique according to the second embodiment also showed good performance under the other conditions (p = 0.5, 0.7, 0.9), where many values are missing and overfitting occurs. This is because the generalized higher-order coupled regularization term made it possible to avoid overfitting.

Next, the parameters estimated in the experiment are shown in FIG. 10. FIG. 10(a) is the true matrix, FIG. 10(b) is the observed values with entries randomly missing and noise added, FIG. 10(c) is the estimate by the technique according to the second embodiment of the present invention, FIG. 10(d) is the estimate by generalized coupled regularization, and FIG. 10(e) is the estimate by higher-order coupled regularization. It can be confirmed that the technique according to the second embodiment of the present invention estimated parameters closer to the true values than the other methods did.

The above demonstrates the effectiveness of the technique according to the second embodiment of the present invention.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

10, 210, 310, 410 Input unit
20, 220, 320, 420 Calculation unit
30, 230 Parameter estimation unit
32 Initialization unit
34 Error term parameter estimation unit
36 Regularization term parameter estimation unit
38 Higher-order coupled regularization term parameter estimation unit
238 Generalized higher-order coupled regularization term parameter estimation unit
40 Prediction unit
50 Output unit
100, 200, 300, 400 Analysis apparatus

Claims

An analysis apparatus for predicting a real scalar value y for observation data x, the analysis apparatus comprising:
a data input unit that receives training data that is a set of data points each consisting of a pair of observation data x and a real scalar value y, test data consisting of observation data x, parameter structure data representing the degree to which each dimension of a parameter β, which is a d-dimensional vector for predicting the real scalar value y for observation data x, belongs to each group, a gradient ∇l of an error term l for each of the data points of the training data in a loss function for estimating the parameter β, and a function ∇Ω_r that minimizes each of R regularization terms Ω_r in the loss function;
a parameter estimation unit that estimates the parameter β, based on the training data, the gradient ∇l of the error term l, and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r, so as to minimize the loss function including the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a higher-order coupled regularization term expressed, for each group, using differences in element values between the dimensions of the parameter β that belong to the group; and
a prediction unit that predicts a real scalar value y for the test data based on the parameter β estimated by the parameter estimation unit.

The analysis apparatus according to claim 1, wherein
the parameter structure data further includes a similarity for each pair of dimensions of the parameter β, and
the parameter estimation unit estimates the parameter β, based on the training data, the gradient ∇l of the error term l, and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r, so as to minimize the loss function including the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a generalized higher-order coupled regularization term that includes the higher-order coupled regularization term and a generalized coupled regularization term expressed using the similarity of the dimension pairs of the parameter β.

The analysis apparatus according to claim 1, wherein the parameter estimation unit includes:
an error term parameter estimation unit that estimates the parameter β based on the gradient ∇l of the error term l;
a regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the error term parameter estimation unit and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r; and
a higher-order coupled regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the regularization term parameter estimation unit and the higher-order coupled regularization term,
and wherein the estimation by the error term parameter estimation unit, the estimation by the regularization term parameter estimation unit, and the estimation by the higher-order coupled regularization term parameter estimation unit are each performed at least once.

The analysis apparatus according to claim 2, wherein the parameter estimation unit includes:
an error term parameter estimation unit that estimates the parameter β based on the gradient ∇l of the error term l;
a regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the error term parameter estimation unit and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r; and
a generalized higher-order coupled regularization term parameter estimation unit that estimates the parameter β based on the parameter β estimated by the regularization term parameter estimation unit and the generalized higher-order coupled regularization term,
and wherein the estimation by the error term parameter estimation unit, the estimation by the regularization term parameter estimation unit, and the estimation by the generalized higher-order coupled regularization term parameter estimation unit are each performed at least once.

The analysis apparatus according to claim 3, wherein the higher-order coupled regularization term parameter estimation unit estimates the parameter β in accordance with a parametric maximum flow algorithm, based on the parameter β estimated by the regularization term parameter estimation unit and an s/t graph determined in advance in correspondence with the higher-order coupled regularization term.

The analysis apparatus according to claim 4, wherein the generalized higher-order coupled regularization term parameter estimation unit estimates the parameter β in accordance with a parametric maximum flow algorithm, based on the parameter β estimated by the regularization term parameter estimation unit and an s/t graph determined in advance in correspondence with the generalized higher-order coupled regularization term.

An analysis apparatus comprising:
a data input unit that receives training data that is a set of data points each consisting of observation data x, parameter structure data representing the degree to which each dimension of a parameter β, which is a d-dimensional vector for analyzing the observation data x, belongs to each group, a gradient ∇l of an error term l for each of the data points of the training data in a loss function for estimating the parameter β, and a function ∇Ω_r that minimizes each of R regularization terms Ω_r in the loss function; and
a parameter estimation unit that estimates the parameter β, based on the training data, the gradient ∇l of the error term l, and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r, so as to minimize the loss function including the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a higher-order coupled regularization term expressed, for each group, using differences in element values between the dimensions of the parameter β that belong to the group.

The analysis apparatus according to claim 7, wherein
the parameter structure data further includes a similarity for each pair of dimensions of the parameter β, and
the parameter estimation unit estimates the parameter β, based on the training data, the gradient ∇l of the error term l, and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r, so as to minimize the loss function including the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a generalized higher-order coupled regularization term that includes the higher-order coupled regularization term and a generalized coupled regularization term expressed using the similarity of the dimension pairs of the parameter β.

An analysis method in an analysis apparatus for predicting a real scalar value y for observation data x, the analysis method comprising:
a step in which a data input unit receives training data that is a set of data points each consisting of a pair of observation data x and a real scalar value y, test data consisting of observation data x, parameter structure data representing the degree to which each dimension of a parameter β, which is a d-dimensional vector for predicting the real scalar value y for observation data x, belongs to each group, a gradient ∇l of an error term l for each of the data points of the training data in a loss function for estimating the parameter β, and a function ∇Ω_r that minimizes each of R regularization terms Ω_r in the loss function;
a step in which a parameter estimation unit estimates the parameter β, based on the training data, the gradient ∇l of the error term l, and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r, so as to minimize the loss function including the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a higher-order coupled regularization term expressed, for each group, using differences in element values between the dimensions of the parameter β that belong to the group; and
a step in which a prediction unit predicts a real scalar value y for the test data based on the parameter β estimated by the parameter estimation unit.

The analysis method according to claim 9, wherein
the parameter structure data further includes a similarity for each pair of dimensions of the parameter β, and
in the step in which the parameter estimation unit estimates, the parameter β is estimated, based on the training data, the gradient ∇l of the error term l, and the function ∇Ω_r that minimizes each of the R regularization terms Ω_r, so as to minimize the loss function including the error term l for each of the data points of the training data, the R regularization terms Ω_r, and a generalized higher-order coupled regularization term that includes the higher-order coupled regularization term and a generalized coupled regularization term expressed using the similarity of the dimension pairs of the parameter β.

A program for causing a computer to function as each unit of the analysis apparatus according to any one of claims 1 to 8.