JP7635854B2

JP7635854B2 - Data analysis device, method and program

Info

Publication number: JP7635854B2
Application number: JP2023555934A
Authority: JP
Inventors: 匡宏幸島; 隆二山本
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2021-10-26
Filing date: 2021-10-26
Publication date: 2025-02-26
Anticipated expiration: 2041-10-26
Also published as: WO2023073812A1; JPWO2023073812A1

Description

本発明の実施形態は、データ解析装置、方法およびプログラムに関する。 Embodiments of the present invention relate to data analysis devices, methods and programs.

近年のデータ(data)分析において、行列分解(Matrix Factorization, MF)またはテンソル分解(Tensor Factorization, TF)と呼ばれる手法が広く利用されている（例えば非特許文献１参照）。テンソルは３軸以上の高次の配列である。In recent data analysis, a method called matrix factorization (MF) or tensor factorization (TF) has been widely used (see, for example, Non-Patent Document 1). A tensor is a high-dimensional array with three or more axes.

分析対象のデータが、文書、購買履歴、ユーザ(user)属性情報、または映画の評価履歴などの、行列またはテンソルとして表現することができるデータである場合、行列表現されたデータをMFによって複数の行列の積へ因子分解したり、テンソル表現されたデータをTFによって複数の行列の積へ因子分解したりすることで、データ中のパターン(pattern)を自動で抽出したり、データの欠損値を補完したりすることが可能となる。 When the data to be analyzed can be expressed as a matrix or tensor, such as documents, purchasing history, user attribute information, or movie rating history, it is possible to automatically extract patterns in the data or complement missing values in the data by factoring matrix-represented data into the product of multiple matrices using MF, or factoring tensor-represented data into the product of multiple matrices using TF.

鹿島久嗣(Hisashi Kashima). 関係データの機械学習(Machine Learning for Relational Data). 日本ロボット学会誌(JRSJ:Journal of the Robotics Society of Japan), Vol.33, No.2, pp.72-76,2015.Hisashi Kashima. Machine Learning for Relational Data. Journal of the Robotics Society of Japan, Vol.33, No.2, pp.72-76,2015.

しかしながら、近年のデータ分析においては、データ収集の方法およびプライバシ(privacy)保護のための処理などの結果により作成される、行列またはテンソルの要素のインデックス(index)と、その要素の値との対応が取れないデータを分析しなければならない場合がある。
例えば、ユーザの映画の評価傾向のパターンを抽出する例において、通常のMFでは、行列表現された映画の評価履歴 However, in recent data analysis, it is sometimes necessary to analyze data in which the index of a matrix or tensor element does not correspond to the value of that element, which is created as a result of the data collection method and processing for privacy protection.
For example, in the case of extracting a pattern of a user's movie rating tendency, in a normal MF, the movie rating history is expressed as a matrix.

が利用可能であると仮定し、これを入力としてMFを適用することでパターンの抽出が行なわれる。行列表現された映画の評価履歴は、例えば、行のインデックスiがユーザに、列のインデックスjが映画にそれぞれ対応し、行列要素の値x_ijがユーザiによる映画jの評価値を示す行列である。 Assuming that the matrix is available, patterns are extracted by applying MF to the input. For example, the movie rating history represented in a matrix is a matrix in which row index i corresponds to a user, column index j corresponds to a movie, and the value of the matrix element x _ij indicates the rating value of movie j by user i.

しかし、このような行列表現された映画の評価履歴は、必ずしも利用可能であるとは限らない。例えば、ユーザiが映画jを評価した、または視聴したことは記録されて利用され得るが、プライバシ保護などの観点から、その評価値は、映画jに紐付く形で記録または公開されておらず、ユーザiによる映画jの評価値の分布、例えば評価値を「1」とした映画がz₁本存在し、評価値を「2」とした映画がz₂本存在する…など、の形式が利用できる場合がある。
この場合、ユーザiによる映画jの評価値は観測できないため、行列要素のインデックス(i, j)と、その要素の値との対応が取れず、すなわちx_ijの値が何であるかは分からないため、行列表現ができないデータとして表現される。 However, such a matrix representation of movie rating history is not necessarily usable. For example, the fact that user i rated or watched movie j may be recorded and used, but the rating value is not recorded or made public in a form linked to movie j from the viewpoint of privacy protection, and the distribution of ratings of movie j by user i, for example, there is _one movie with a rating value of "1", there are z _two movies with a rating value of "2", etc., may be available.
In this case, the rating value of movie j by user i cannot be observed, so there is no correspondence between the matrix element index (i, j) and the value of that element. In other words, it is not known what the value of x _ij is, so the data is represented as data that cannot be expressed in a matrix.

同様に、行のインデックスiが体重および年収などの各属性の項目に対応し、列のインデックスjがユーザに対応するユーザ属性情報などを扱う場合も、プライバシ保護などの観点から、その評価値はユーザjに紐付く形で記録または公開されておらず、項目iの値の分布、例えば体重が60kgであるとしたユーザがz₁人存在し、体重が61kgであるとした人がz₂人存在する…など、の形式で利用できる場合がある。この場合も、映画の評価履歴の場合と同様、項目iに関するユーザjの情報は観測できないため、行列要素のインデックスと、その要素の値との対応が取れないことから、行列表現のできないデータとなる。 Similarly, when dealing with user attribute information where row index i corresponds to each attribute item such as weight and annual income, and column index j corresponds to a user, from the perspective of privacy protection, the rating value is not recorded or published in a form linked to user j, and may be available in the form of a distribution of the value of item i, for example, there is _one user z whose weight is 60 kg, there are _two users z whose weight is 61 kg, etc. In this case, as in the case of movie rating history, the information of user j regarding item i cannot be observed, and therefore it is not possible to establish a correspondence between the index of a matrix element and the value of that element, resulting in data that cannot be expressed in a matrix.

この発明は、上記事情に着目してなされたもので、その目的とするところは、解析対象データのインデックスと要素との対応の有無によらずに当該解析対象データを適切に解析することができるようにしたデータ解析装置、方法およびプログラムを提供することにある。This invention has been made in light of the above-mentioned circumstances, and its purpose is to provide a data analysis device, method, and program that are capable of appropriately analyzing the data to be analyzed regardless of whether there is a correspondence between the index and elements of the data to be analyzed.

本発明の一態様に係るデータ解析装置は、解析対象データのインデックスの集合であるインデックスデータ、前記解析対象データの要素の集合、および前記解析対象データの要素の大小関係が示される大小比較データの入力を受け付ける入力部と、前記解析対象データの因子の推定に用いられる損失関数と前記損失関数の近似との誤差の上界を最小化するパラメタを、前記入力部により入力した前記要素の集合に基づいて推定する第１の推定部と、前記入力部により入力した前記インデックスデータ、前記大小比較データ、および前記第１の推定部により推定されたパラメタに基づいて、前記解析対象データの因子を推定する第２の推定部と、を備える。A data analysis device according to one embodiment of the present invention comprises an input unit that accepts input of index data, which is a set of indexes of data to be analyzed, a set of elements of the data to be analyzed, and size comparison data indicating the size relationship of the elements of the data to be analyzed, a first estimation unit that estimates parameters that minimize an upper bound of the error between a loss function used to estimate a factor of the data to be analyzed and an approximation of the loss function based on the set of elements input by the input unit, and a second estimation unit that estimates a factor of the data to be analyzed based on the index data, the size comparison data, and the parameters estimated by the first estimation unit input by the input unit.

本発明の一態様に係るデータ解析方法は、データ解析装置により行なわれる方法であって、解析対象データのインデックスの集合であるインデックスデータ、前記解析対象データの要素の集合、および前記解析対象データの要素の大小関係が示される大小比較データの入力を受け付けることと、前記解析対象データの因子の推定に用いられる損失関数と前記損失関数の近似との誤差の上界を最小化するパラメタを、前記入力した前記要素の集合に基づいて推定することと、前記入力した前記インデックスデータ、前記大小比較データ、および前記推定されたパラメタに基づいて、前記解析対象データの因子を推定することと、を備える。 A data analysis method according to one aspect of the present invention is a method performed by a data analysis device, and includes accepting input of index data, which is a set of indexes of data to be analyzed, a set of elements of the data to be analyzed, and size comparison data indicating the size relationship of the elements of the data to be analyzed; estimating parameters that minimize an upper bound of the error between a loss function used to estimate a factor of the data to be analyzed and an approximation of the loss function based on the input set of elements; and estimating the factor of the data to be analyzed based on the input index data, the size comparison data, and the estimated parameters.

本発明によれば、解析対象データのインデックスと要素との対応の有無によらずに当該解析対象データを適切に解析することができる。 According to the present invention, the data to be analyzed can be appropriately analyzed regardless of whether there is a correspondence between the index and elements of the data to be analyzed.

図１は、本発明の一実施形態に係るデータ解析装置の適用例を示す図である。FIG. 1 is a diagram showing an application example of a data analysis device according to an embodiment of the present invention. 図２は、本発明の一実施形態に係るデータ解析装置の動作の概要の一例を示すフローチャート(flow chart)である。FIG. 2 is a flow chart showing an example of an outline of the operation of the data analysis device according to one embodiment of the present invention. 図３は、ハイパーパラメタ(hyper-parameter)推定部による動作の一例を示すフローチャートである。FIG. 3 is a flowchart showing an example of an operation of the hyper-parameter estimation unit. 図４は、パラメタ推定部による動作の一例を示すフローチャートである。FIG. 4 is a flowchart showing an example of the operation of the parameter estimation unit. 図５は、本発明の一実施形態に係るデータ解析装置のハードウエア（hardware）構成の一例を示すブロック図（block diagram）である。FIG. 5 is a block diagram showing an example of a hardware configuration of a data analysis apparatus according to an embodiment of the present invention.

以下、図面を参照しながら、この発明に係わる一実施形態を説明する。
本実施形態では、行列またはテンソルの要素のインデックスと、その要素の値との対応が取れない形式で与えられるデータであってもMFまたはTFのように、データ中のパターンを自動で抽出したり、行列要素の値を推定したりすることが可能な手法について説明する。 Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
In this embodiment, we will explain a method that can automatically extract patterns in data or estimate values of matrix elements, such as MF or TF, even if the data is given in a format in which the index of a matrix or tensor element cannot correspond to the value of that element.

従来技術のMF(TF)は、上述のような行列(テンソル)要素のインデックスと、その要素の値との対応が取れないデータを分析することができない。なぜなら、以下のように、MFでは、入力となるデータが行列表現されていることが仮定されているためである。 Conventional MF (TF) technology cannot analyze data in which the indices of matrix (tensor) elements do not correspond to the values of those elements, as described above. This is because MF assumes that the input data is expressed as a matrix, as shown below.

MFは、入力となる行列MF is the input matrix

を低ランク近似(low rank approximation)する、i行r列の因子行列およびr行j列の因子行列にそれぞれ対応する、因子行列 The factor matrices that correspond to the factor matrix with i rows and r columns and the factor matrix with r rows and j columns, respectively, are used to perform low rank approximation of

と、 and,

とを因子分解の結果として推定することでパターンの抽出を行なう。 Patterns are extracted by estimating as the result of factorization.

因子行列の推定は、行列The estimation of the factor matrix is done by the matrix

と、因子行列の積 and the product of the factor matrices

とが類似した値となるように行われる。以後、 and are similar values.

の(i, j)要素はThe (i, j) element of

と記載される。上記の類似の尺度として用いられる損失関数(loss function)、すなわち値が小さいほど２つの行列が類似していることを表す関数には、以下のL₂ダイバージェンス(divergence) The loss function used as a measure of similarity above, that is, a function that indicates that the smaller the value, the more similar the two matrices are, is the following L ₂ divergence:

または、以下のブレグマンダイバージェンス(Bregman Divergence, BD) Or the Bregman Divergence (BD) below:

などが用いられる（例えば「TamaraG Kolda and BrettW Bader. Tensor decompositions and applications. SIAM review, Vol.51, No.3, pp. 455-500, 2009.」（参考文献１）、「RussR Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in neural information processing systems, pp. 1257-1264, 2008.」（参考文献２）、および「AjitP Singh and GeoffreyJ Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 650-658, 2008.」（参考文献４）参照）。 etc. are used (see, for example, "TamaraG Kolda and BrettW Bader. Tensor decompositions and applications. SIAM review, Vol.51, No.3, pp. 455-500, 2009." (Reference 1), "RussR Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in neural information processing systems, pp. 1257-1264, 2008." (Reference 2), and "AjitP Singh and GeoffreyJ Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 650-658, 2008." (Reference 4)).

なお、 In addition,

は、ある凸関数(convex function)であり、 is a convex function,

は、その微分である。関数 is its derivative. The function

を変えることでBDは多様な関数を表現できる。例えばBy changing , BD can express various functions. For example,

であるときは2乗誤差に対応し、 corresponds to the squared error,

であるときはロジスティック(logistic)損失に対応し、 corresponds to logistic loss,

であるときはI-divergence(一般化KL(Kullback Leibler) divergenceとも呼ばれる)に対応し、 When this is the case, it corresponds to I-divergence (also called generalized KL (Kullback-Leibler) divergence),

であるときは板倉斎藤擬距離(Itakura-Saito divergence)に対応する。利用する損失関数を定めることは、行列要素を生成する確率分布(probability distribution)に仮定をおくことと等価である。具体的には、2乗誤差、I-divergence、または板倉斎藤擬距離を利用することは、それぞれ正規分布(normal distribution)、ポアソン分布(poisson distribution)、または指数分布(exponential distribution)に従って行列要素が生成されるという仮定に１対１で対応する。 When , it corresponds to the Itakura-Saito pseudo-distance. Determining the loss function to use is equivalent to making an assumption on the probability distribution that generates the matrix elements. Specifically, using squared error, I-divergence, or Itakura-Saito pseudo-distance corresponds one-to-one to the assumption that the matrix elements are generated according to normal distribution, poisson distribution, or exponential distribution, respectively.

上記に代表される損失関数を最小化することで因子行列が推定される。この損失関数の最適化には確率的勾配降下法(Stochastic Gradient Descent, SGD)、または(擬)ニュートン法（Newton’s method）などの様々な手法が利用できる。上記の参考文献４では、ニュートン法に基づき因子行列を交互に更新を繰り返すアルゴリズム(algorithm)が導かれることが記載される。The factor matrix is estimated by minimizing the loss function shown above. Various methods can be used to optimize this loss function, such as Stochastic Gradient Descent (SGD) or Newton's method. Reference 4 above describes the derivation of an algorithm that repeatedly updates the factor matrix in an alternating manner based on Newton's method.

なお、TFを適用する場合は行列ではなくテンソルの低ランク近似を考えればよく、例えば3次のテンソル When applying TF, we can consider low-rank approximations of tensors rather than matrices. For example, a third-order tensor

をCP分解(Canonical Polyadic decomposition)により近似する、すなわち基底数Ｒ個の因子に分解する場合は、テンソルの要素x_ijkと、その因子行列 is approximated by CP decomposition (Canonical Polyadic decomposition), that is, decomposed into a basis number R of factors, the elements x _ijk of the tensor and its factor matrix

および and

による近似値Approximation by

とが類似した値になるように推定すれば良い。 It is sufficient to estimate that and are similar values.

上記のMF(TF)では、今回分析の対象となるデータのように行列(テンソル)としての表現が不足するデータに対して適用することができない。データが行列(テンソル)表現されていないため、損失関数の値を計算することもできなければ、上記ニュートンの更新式を計算することもできないためである。そこで、本実施形態では以下の新しい手法を示す。 The above MF (TF) cannot be applied to data that lacks matrix (tensor) expression, such as the data that is the subject of this analysis. This is because, because the data is not expressed as a matrix (tensor), it is not possible to calculate the value of the loss function, nor is it possible to calculate the above Newton's update formula. Therefore, in this embodiment, the following new method is presented.

本実施形態での手法とは異なるが関連する問題を扱う手法として、「Z.Shen, L.Du, X.Shen, and Y.Shen. Interval-valued matrix factorization with applications. In ICDM, pp. 1037-1042. IEEE, 2010.」（参考文献３）に記載された手法と、「Yuto Yamaguchi and Kohei Hayashi. Tensor decomposition with missing indices. In International Joint Conference on Artificial Intelligence, pp.3217-3223, 2017.」（参考文献６）に記載された手法が存在する。参考文献３に記載された手法では、行列要素の値そのものは観測できないが、値の、ある区間の情報、例えば、x_ijの値は1以上3以下であるなど、という情報が利用できるときに、その情報を利用してMFを行なう手法が提案されている。 As a method for dealing with a problem related to the method of this embodiment, but different from the method of this embodiment, there is a method described in "Z.Shen, L.Du, X.Shen, and Y.Shen. Interval-valued matrix factorization with applications. In ICDM, pp. 1037-1042. IEEE, 2010." (Reference 3) and a method described in "Yuto Yamaguchi and Kohei Hayashi. Tensor decomposition with missing indices. In International Joint Conference on Artificial Intelligence, pp.3217-3223, 2017." (Reference 6). In the method described in Reference 3, the values of the matrix elements themselves cannot be observed, but when information on a certain interval of values, for example, information that the value of x _ij is 1 to 3, is available, a method is proposed in which MF is performed using the information.

本実施形態で考える問題は、値の、ある区間の情報が与えられる問題とは異なるため、この手法を適用することはできない。上記の参考文献６では、テンソルのインデックスに欠損がある場合であってもテンソル分解を行なうことができる手法が提案される。この手法では、テンソルのインデックスと要素との組(i, j, k, x_ijk)のうちインデックスの一部、例えばjが欠損した場合であっても、欠損したインデックスの値を推定することでテンソル分解を適用している。 The problem considered in this embodiment is different from the problem where information on a certain interval of values is given, so this method cannot be applied. The above-mentioned reference 6 proposes a method that can perform tensor decomposition even if the tensor index is missing. In this method, even if some of the indexes, for example j, of the tensor index and element pair (i, j, k, x _ijk ) are missing, tensor decomposition is applied by estimating the value of the missing index.

本実施形態で考える手法は、上記の参考文献６に記載された手法とは異なり、欠損の値を推定してインデックスと行列要素との対応を取ることを必要とせずに、データを分析することができる手法である。The method considered in this embodiment differs from the method described in Reference 6 above in that it is a method that can analyze data without the need to estimate missing values and establish correspondence between indices and matrix elements.

なお、本実施形態でパラメタ推定に用いられる目的関数の導出に当たっては、「Liyuan Xu, Gang Niu, Junya Honda, and Masashi Sugiyama. Uncoupled regression from pairwise comparison data. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 3992-4002, 2019.」（参考文献５）に記載された回帰手法での、目的関数の導出に用いられる近似と同種の近似を利用した。ただし、ここで参考文献５に記載された手法で取り組んだ問題は、入力xと出力yとの関係y=f(x)を定める関数fを推定する回帰問題であり、本実施形態で考える行列分解またはテンソル分解とは異なる問題である。In this embodiment, the objective function used for parameter estimation was derived using the same approximation as that used to derive the objective function in the regression method described in "Liyuan Xu, Gang Niu, Junya Honda, and Masashi Sugiyama. Uncoupled regression from pairwise comparison data. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 3992-4002, 2019." (Reference 5). However, the problem addressed by the method described in Reference 5 here is a regression problem of estimating a function f that determines the relationship y=f(x) between an input x and an output y, which is a different problem from the matrix decomposition or tensor decomposition considered in this embodiment.

本実施形態では、行列またはテンソルの要素のインデックスと、その要素の値との対応が取れず、行列またはテンソルとしての表現が不足するデータであっても、行列分解またはテンソル分解を行なうことのできる手法について説明する。以後、本手法をアンカップル行列分解(Uncoupled Matrix Factorization, UMF)およびアンカップルテンソル分解(Uncoupled Tensor Factorization, UTF)と称して説明する。In this embodiment, a method is described that can perform matrix decomposition or tensor decomposition even for data in which the index of a matrix or tensor element does not correspond to the value of that element, and the data is insufficient to be expressed as a matrix or tensor. Hereinafter, this method is referred to as uncoupled matrix factorization (UMF) and uncoupled tensor factorization (UTF).

（定式化）
次に、本実施形態で説明する手法の問題設定である定式化について説明する。以下では、UMFの定式化について記すが、UTFの定式化も、UMFの定式化に基づいて行列をテンソルに置き換えれば、ほぼ同様である。行列のサイズ(size)はn_I×n_Jであるとして記述する。前述の映画の評価履歴のデータの場合、n_Iはユーザの数、n_Jは映画の数をそれぞれ表す。行列のインデックスの集合は (Formulation)
Next, we will explain the formulation, which is the problem setting of the method described in this embodiment. Below, we will describe the UMF formulation, but the UTF formulation is almost the same if matrices are replaced with tensors based on the UMF formulation. The size of the matrix is described as _nI × _nJ . In the case of the movie rating history data mentioned above, _nI represents the number of users and _nJ represents the number of movies. The set of matrix indices is

と記載され、以後インデックスデータと呼ぶ。n_Uは、このデータ中のインデックスの数を表す。前述の映画の評価履歴のデータの場合、 Hereinafter, this will be referred to as index data. n _U represents the number of indexes in this data. In the case of the movie rating history data mentioned above,

はユーザi_mが映画j_mを評価した、または視聴したことを記録したデータの集合に対応する。 corresponds to the set of data that records that user i _m has rated or watched movie j _m .

さらに行列の各行の要素の分布の集合、すなわち確率分布の集合は Furthermore, the set of distributions for the elements of each row of the matrix, i.e., the set of probability distributions, is

と記載される。上記の各行の要素の分布は、行iで条件付けられ(または固定され)かつ列jに関して周辺化(marginalization)された行列要素の確率密度関数(Probability Density Function, PDF)である。The distribution of elements in each row above is the Probability Density Function (PDF) of the matrix elements conditioned (or fixed) on row i and marginalized with respect to column j.

前述の映画の評価履歴のデータの場合、f_X|I=i(x)が、あるユーザiによる映画の評価値xの確率密度関数を表すものであり、事前知識から作成したもの、評価「1」とした映画がz₁本存在し、評価「2」とした映画がz₂本存在するという集計データから推定したもの、または行列の各行の要素の値の集合 In the case of the movie rating history data mentioned above, f _X|I=i (x) represents the probability density function of movie rating value x by a certain user i, and can be created from prior knowledge, estimated from aggregate data that shows that there is z ₁ movie rated "1" and z ₂ movies rated "2", or the set of values of the elements of each row of the matrix.

から推定したもの、などが利用できる。n_Vは、このデータ中の要素の数を表す。i_mは、行のインデックスを表し、x_mは、第i_m行目のある列の、どの列かは分からない要素の値を表す。なお、後述するが、集計データまたは各行の要素の値の集合を利用する場合には、明示的には、この確率密度関数を推定することなく手法を適用することもできる。 n _V represents the number of elements in the data. i _m represents the row index, and x _m represents the value of an element in a column in the i _m- th row, although the column is unknown. As will be described later, when using aggregated data or a set of element values in each row, the method can be applied without explicitly estimating this probability density function.

上記に加えて、行列要素の値の大小関係が示されるデータはIn addition to the above, the data that shows the magnitude relationship of the matrix elements is

と記載され、以後は大小比較データと呼ぶ。i_mは行のインデックスであり、 Hereafter, we will call it the size comparison data. _{i m} is the row index,

と、 and,

とは、共に列のインデックスであり、その要素の値 and are both column indexes and the element value

の値の方が、 value is,

の値よりも大きいことを表す。なお、要素の値It means that the value of the element is greater than the value of

そのものが観測されている訳ではない。n_Rは、この大小比較された行列要素の組の数を表す。このようなデータは、例えばユーザが、ある映画jを評価するときに、それとは別の映画j′よりも良かったかどうかを判断してもらい、この判断の結果を記録することで得ることが可能である。このようなデータは、1点、2点などというスコア(score)を付けることと比べてユーザが回答することが容易であり、データを収集することが容易である場合が多い。 The data itself is not observed. n _R represents the number of sets of matrix elements compared. For example, when a user evaluates a movie j, they can be asked to judge whether it is better than another movie j' and record the result of this judgment, and this kind of data can be obtained. Compared to giving a score such as 1 or 2, it is easy for users to answer and it is often easy to collect the data.

また、行列の要素が、映画の評価値ではなく、店舗の訪問回数を表す行列などであるときは、訪問回数そのものを入力するよりも、ある店舗に、それとは別の店舗より多く訪問する、とだけ回答する方が、訪問回数が非常に多い、行きつけの店舗があることを隠すことができ、自身の情報を減らして回答することが可能となる。UMFは、インデックスデータ Also, when the elements of a matrix are not movie ratings but rather the number of times a store has been visited, answering that you visit one store more than another store rather than entering the actual number of visits will hide the fact that you have a favorite store that you visit very frequently, and will allow you to answer with less information about yourself. UMF is an index data

、確率分布の集合 , a set of probability distributions

、および大小比較データ , and size comparison data

でなる3つのデータを入力として、因子行列 The three data are input, and the factor matrix

を推定する手法である。以後この因子行列は、まとめて is a method to estimate the factor matrix.

と記載され、因子行列を用いた各要素の推定値は and the estimates for each element using the factor matrix are

と記載される。 It is written.

（損失関数）
次に、上記の推定のための損失関数を導くことについて説明する。損失関数を定義するにあたり、実現値が行のインデックスに対応する確率変数(random variable)はIと記載され、列のインデックスに対応する確率変数はJと記載される。
さらに実現値が行列要素の値に対応する確率変数はXと記載される。これらの確率変数が従う確率分布はP_I,J,Xと記載される。
さらにI=iで条件付けられた際の条件付きの確率分布と、この確率分布の確率密度関数は、それぞれP_J,X|I=i, f_J,X|I=iと記載されるとする。
この条件付きの分布を、さらに上記のJまたはXに関して周辺化した分布は、それぞれP_X|I=i, P_J|I=iと記載される。確率分布P_X|I=iの確率密度関数はf_X|I=i、累積密度関数はF_X|I=iとそれぞれ記載される。定義より、 (Loss function)
Next, we will explain how to derive the loss function for the above estimation. In defining the loss function, the random variable whose realization corresponds to the row index is written as I, and the random variable whose realization corresponds to the column index is written as J.
Furthermore, the random variables whose realizations correspond to the values of the matrix elements are denoted as X. The probability distributions that these random variables follow are denoted as P _I,J,X .
Furthermore, the conditional probability distribution when conditioned on I=i and the probability density function of this probability distribution are written as P _J,X|I=i and f _J,X|I=i, respectively.
The distributions obtained by further marginalizing this conditional distribution with respect to J or X above are written as P _X|I=i and P _J|I=i , respectively. The probability density function of the probability distribution P _X|I=i is written as f _X|I=i , and the cumulative density function is written as F _X|I=i . By definition,

である。 is.

上記の確率変数を用いてパラメタθの推定に用いる損失関数は、上記のブレグマンダイバージェンスを用いて以下の式（１）のように定義される。 The loss function used to estimate the parameter θ using the above random variables is defined as the following equation (1) using the above Bregman divergence.

ただし、式（１）のHowever, in equation (1),

は、確率分布P_I,J,Xによる期待値(expected value)操作であり、 is the expected value operation based on the probability distribution P _I,J,X ,

は以下の式（２）で定義される。 is defined by the following equation (2).

ただし、式（２）のHowever, in equation (2),

は、それぞれ確率分布P_J,X|I=i, P_J|I=iによる期待値操作を表す。この損失関数において困難さを生じているのは式（２）の右辺の最終項 represent the expected value operations based on the probability distributions P _J,X|I=i and P _J|I=i, respectively. The difficulty in this loss function is caused by the last term on the right-hand side of equation (2)

である。なぜなら、この項は列のインデックスJと要素Xとに関する同時分布によって定義されているが、本実施形態で考える問題設定では、インデックスと要素とは対応付けされず同時に観測できないため、例えば標本近似したとしても計算することができない。よって以降で、この項を近似的に評価することを考える。 This is because this term is defined by the joint distribution of column index J and element X, but in the problem setting considered in this embodiment, the index and the element are not associated and cannot be observed simultaneously, so it cannot be calculated even if, for example, a sample approximation is performed. Therefore, hereafter, we consider approximately evaluating this term.

ここでは新たに、実現値が列のインデックスに対応する確率変数の組(J⁺, J^-)を導入する。この組は、あるiを固定した条件で、要素(i, J⁺)の値が、(i, J^-)の値よりも大きいことを示し、以下のように定義される。 We introduce a new set of random variables (J ⁺ , J ^- ) whose realizations correspond to the index of the columns. This set indicates that for a fixed i, the value of element (i, J ⁺ ) is greater than the value of (i, J ^- ), and is defined as follows:

ただし、(J, X)と(J′, X′)とは、共に確率P_J,X|I=iに従う独立の確率変数である。この定義から、前述した大小比較データ Here, (J, X) and (J′, X′) are both independent random variables with probability P _J,X|I=i . From this definition, the above-mentioned size comparison data

は、この確率変数の実現値であると見做せることを後ほど利用する。以後、あるiを固定した条件でのJ⁺とJ^-が従う確率密度関数は、それぞれ We will later use the fact that we can consider J+ and J- to be the realization of this random variable. Hereafter, the probability density functions that J ⁺ and ^J- follow under the condition that i is fixed are respectively

と and

と記載され、さらにJ⁺とJ^-の出方に関する期待値操作は、 The expected value operation for the appearance of J ⁺ and J ^- is as follows:

と記載される。この確率変数を用いると、以下の式（３）および（４）が成り立つことが示せる。Using this random variable, it can be shown that the following equations (3) and (4) hold.

ただし、F_X|I=iは、行iを固定した条件での要素の値に関する累積密度関数 where F _X|I=i is the cumulative density function for the element values with row i fixed.

である。 is.

式（３）の証明について説明する。J⁺の定義により、f_J+|I=iは、次のように展開できる。 The proof of formula (3) will be explained. According to the definition of J ⁺ , fJ _+|I=i can be expanded as follows.

ただし、Zは正規化定数であり、次のように定義される。 where Z is a normalization constant defined as follows:

部分積分(intgration by parts)により、Z=1/2であると導かれる。この結果を用いれば、次のように式（３）を導くことができる。 By integration by parts, we can derive that Z = 1/2. Using this result, we can derive equation (3) as follows:

式（４）の導出も同様である。以上で証明を終わる。 The derivation of equation (4) is similar. This completes the proof.

式（３）および（４）を用いれば、確率変数Xの条件付き周辺分布が[0, 1]上の一様分布(すなわちF_X|I=i(x)=x)であるとき、式（２）の右辺の最終項は Using equations (3) and (4), when the conditional marginal distribution of the random variable X is a uniform distribution on [0, 1] (i.e., F _X|I=i(x) =x), the last term on the right-hand side of equation (2) is

と変形できる。この事実を考慮すれば、あるハイパーパラメタ Taking this fact into consideration, a certain hyperparameter

を用いて、以下の式（５）で示される近似形を利用することが有望と考えられる。 It seems promising to use the approximation form shown in the following equation (5).

ただし、式（５）では右辺の値が左辺の値を近似していることを表す。上記の[0, 1]上の一様分布の場合は、(w_i1, w_i2)=(1/2, 0)と定義すれば、この近似は精確である。これは確率変数Xの条件付き周辺分布が[a, b]上の一様分布である場合(F_X|I=i(x)=(x-a)/(b-a) for all x ∈ [a, b])にも一般化でき、(w_i1, w_i2)=(b/2, a/2)と定義すれば良い。一様分布ではない、より一般的な分布を考える場合には、汎化損失(generalization error) However, in equation (5), the value on the right side indicates that the value on the left side is an approximation. In the above uniform distribution on [0, 1], this approximation is accurate if (w _i1 , w _i2 )=(1/2, 0). This can also be generalized to the case where the conditional marginal distribution of the random variable X is a uniform distribution on [a, b] (F _X|I=i (x)=(xa)/(ba) for all x ∈ [a, b]), where (w _i1 , w _i2 )=(b/2, a/2) can be defined. When considering a more general distribution other than a uniform distribution, the generalization loss is

の上界を最小化するようにハイパーパラメタ(w_i1, w_i2)を決定できる。この決定については後述する。 We can determine the hyperparameters (w _i1 , w _i2 ) so as to minimize the upper bound of . This determination will be described later.

式（３）と（４）の和をとると、以下の式が導かれる。 By adding up equations (3) and (4), we obtain the following equation.

この式を用いると、ある定数λ_iを用いて、上記の式（５）を以下の式（６）のように変形できる。 Using this formula, the above formula (5) can be transformed into the following formula (6) using a certain constant λ _i .

λ_iの設定に関しては自由度があり、λ_i=0またはλ_i=(w_i1+w_i2)/2などと、任意に設定できる。ただし、基本的にはλ_iの値はゼロ(zero)以上であることが望ましく、以後の説明ではλ_i≧0であるとする。上記の式（６）を用いれば、式（１）の汎化損失 There is a degree of freedom in setting λ _i , and it can be set arbitrarily, such as λ _i =0 or λ _i =(w _i1 +w _i2 )/2. However, it is basically desirable for the value of λ _i to be zero or greater, and in the following explanation, it is assumed that λ _i ≧0. Using the above formula (6), the generalization loss of formula (1) can be calculated as follows:

の近似Approximation of

は、下記のように得られる。 is obtained as follows:

ただし、 however,

である。上記のC_iはパラメタθに依存しない定数である。よって確率変数I, J, J⁺およびJ^-に関する期待値を標本平均(sample average)で置き換えることで、以下の式（７）の通り経験損失(empilical risk) The above C _i is a constant that does not depend on the parameter θ. Therefore, by replacing the expected values for the random variables I, J, J ⁺ , and J ^- with the sample average, the empirical risk is expressed as the following equation (7).

が得られる。 is obtained.

ただし、Cはパラメタθに依存しない定数である。この量は、定数Cを除き、データから計算できる量であるため、パラメタθの推定のための目的関数として利用できる。よって、これ以後、経験損失 where C is a constant that does not depend on the parameter θ. Since these quantities, except for the constant C, can be calculated from data, they can be used as the objective function for estimating the parameter θ. Therefore, from now on, the empirical loss

から定数Cを除き、さらにパラメタに関する正則化項を加えてなる、下記の式（８）で示される目的関数 The objective function is expressed by the following equation (8), which is obtained by removing the constant C from

を最適化することについて説明する。 This article explains how to optimize

式（８）のΩ_A,Ω_Bは、それぞれ因子行列 In equation (8), Ω _A and Ω _B are factor matrices,

の正則化項を表し、L₁ノルム(norm)またはL₂ノルムなど任意の値が利用できる。次節では経験損失 represents the regularization term of the L ₁ norm or L ₂ norm, and any value can be used. In the next section, we will

を最小化するパラメタ推定の方法、すなわち推定アルゴリズムについて述べる。 We describe a parameter estimation method, i.e., an estimation algorithm, that minimizes .

（推定アルゴリズム）
パラメタθの推定には、確率的勾配降下法、(擬)ニュートン法、交互最小二乗法(Alternating Least Squares method, ALS)などの任意の最適化手法が利用できる。以後は、目的関数は微分可能である場合を考えるが、正則化項としてL₁ノルムを利用した場合など、目的関数に微分可能でない点がある場合も、劣勾配法を利用すれば同様に最適化可能である。目的関数 (Estimation Algorithm)
Any optimization method can be used to estimate the parameter θ, such as stochastic gradient descent, (quasi-)Newton method, or alternating least squares method (ALS). In the following, we consider the case where the objective function is differentiable, but even if the objective function has points that are not differentiable, such as when the _L1 norm is used as a regularization term, it can be optimized in the same way by using the subgradient method. Objective function

の偏微分(partial derivative)は以下の式で得られる。 The partial derivative of is given by the following formula:

ただし、上記のHowever, the above

は、インデックスデータ is index data

から、i_m=iであるデータの全てを抽出したものであり、上記の All data for which i _m =i is extracted from the above

は、インデックスデータ is index data

から、j_m=jであるデータの全てを抽出したものであり、上記の All data for which j _m =j is extracted from the above

は、大小比較データ is size comparison data

から、i_m=iであるデータ全てを抽出したものであり、上記の All data for which i _m =i is extracted from the above.

は、大小比較データ is size comparison data

から、 from,

であるデータの全てを抽出したものであり、上記のAll of the data listed above is extracted.

は、大小比較データ is size comparison data

から、 from,

であるデータの全てを抽出した値である。これらの値を用いれば、例えば次のように推定のための更新式を導ける。 This is the value obtained by extracting all the data. Using these values, we can derive an update equation for estimation, for example, as follows:

上記の勾配降下法について説明する。この勾配降下法では、最急降下方向にパラメタθを更新する。更新式は具体的には以下の式（９）および（１０）のように書き下せる。 We will now explain the gradient descent method. In this gradient descent method, the parameter θ is updated in the direction of steepest descent. Specifically, the update equations can be written as the following equations (9) and (10).

ただし、ηi, γjは学習率である。これらの式（９）および（１０）に従いパラメタθを更新することを繰り返すことでパラメタθの推定が可能となる。なお、確率的勾配効果法の様に、更新の際に全ての行iおよび列jのパラメタを毎回更新せず、確率的に選んだ行iに関するパラメタ where ηi and γj are learning rates. By repeatedly updating the parameter θ according to these equations (9) and (10), it is possible to estimate the parameter θ. Note that, unlike the stochastic gradient effect method, the parameters for all rows i and columns j are not updated every time, but the parameters for row i selected stochastically are

と、列jに関するパラメタ and the parameters for column j

とだけを更新するようにしても良い。 You can also just update it.

上記のニュートン法について説明する。このニュートン法では、2回偏微分まで用いて更新式を導く。一般にニュートン法では2回偏微分により求めるヘッセ行列(Hession matrix)の計算に要する時間の長さが問題となるが、i≠i′であるならば、 We will now explain the Newton method. In this Newton method, the update equation is derived using up to two partial derivatives. In general, the length of time required to calculate the Hessian matrix obtained by two partial derivatives is an issue with the Newton method. However, if i ≠ i′, then

であることと、j≠j′であるならば、 and j ≠ j′, then

であることと、を利用すると効率的な計算が可能となる。なぜなら、正則化項として同様の性質、すなわち、i≠i′またはj≠j′ならば2回偏微分がゼロである性質を持つことを利用し、片方の因子行列を固定した条件での、もう一方の因子行列を更新する交互更新的なアプローチ(approach)を用いれば、更新対象である因子行列のパラメタθの更新は、行または列ごとに独立に行なうことができ、更新対象のパラメタθのヘッセ行列は、サイズがR×Rの行列にできるからである。これにより、効率的にパラメタθを更新することができる。パラメタ Efficient calculations are possible by utilizing this and the fact that it has a similar property as a regularization term, namely, that if i ≠ i′ or j ≠ j′, then by using an alternating update approach in which one factor matrix is fixed and the other factor matrix is updated, the parameter θ of the factor matrix to be updated can be updated independently for each row or column, and the Hessian matrix of the parameter θ to be updated can be a matrix of size R × R. This allows the parameter θ to be updated efficiently. The parameter

が更新されるときの、サイズがR×Rであるヘッセ行列Hessian matrix of size R×R when is updated

、および , and

が更新されるときのヘッセ行列Hessian matrix when is updated

の各要素は、それぞれ以下のように得られる。 Each element is obtained as follows:

これを用いて以下の式（１１）および（１２）のようにパラメタθを更新すれば良い。 Using this, we can update the parameter θ as shown in equations (11) and (12) below.

ただし、ηi, γjは学習率である。これらの式（１１）および（１２）に従いパラメタθを更新することを繰り返すことで、目的関数の局所最適解に到達することができる。また、確率的勾配降下法などと同様に、パラメタθの更新の際に、全ての行iまたは列jのパラメタθを毎回更新せず、確率的に選択した行iおよび列jに関するパラメタ where ηi and γj are learning rates. By repeatedly updating the parameter θ according to these equations (11) and (12), a local optimum solution for the objective function can be reached. Also, like the stochastic gradient descent method, when updating the parameter θ, the parameter θ for all rows i or columns j is not updated every time, but the parameter θ for the row i and column j selected stochastically is updated every time.

だけを更新するようにしても良い。 It is also possible to update only.

前述した様にBDは利用する凸関数As mentioned above, BD is a convex function that

を変えることで多様な損失関数を表現できるから、上記の勾配降下法またはニュートン法の更新式を利用することで、上記の2乗誤差、ロジスティック損失、またはI-divergence、または板倉斎藤擬距離などの、BDで表現できる任意の損失関数を用いる場合のパラメタθの更新式を導くことができる。 By changing this, various loss functions can be expressed, so by using the update formula for the gradient descent method or Newton method above, it is possible to derive the update formula for the parameter θ when using any loss function that can be expressed in BD, such as the above squared error, logistic loss, I-divergence, or Itakura-Saito pseudo distance.

（ハイパーパラメタの推定）
最後にハイパーパラメタ{w_i1, w_i2}の推定について述べる。このハイパーパラメタは、下記の式（１３）のように関数Err_iを最小化することで決定できる。 (Hyperparameter estimation)
Finally, we will describe the estimation of the hyper-parameters {w _i1 , w _i2 }. These hyper-parameters can be determined by minimizing the function _Erri as shown in the following equation (13).

この関数は、関数This function is the function

の of

による近似の誤差の上界になっている。正確に述べると、任意の因子行列 is an upper bound on the error of the approximation by

と、成分(i, j)とに関して and for component (i, j)

であるという条件のもとで、 under the condition that

が成り立つ。よって、上記の関数Err_iを最小化することで近似の誤差を小さくすることができる。関数Err_iは積分計算を含むが、仮定により、分布f_X|I=i(x)は既知であるため、離散近似(discrete approximation)として計算できる。具体的には、例えばxの取り得る値全体を離散的に Therefore, the approximation error can be reduced by minimizing the above function _Erri . The function _Erri includes integral calculations, but since the distribution f _X|I=i (x) is known by assumption, it can be calculated as a discrete approximation. Specifically, for example, all possible values of x can be calculated discretely.

と分割して、例えば、xの0.01分位点を and divide it into, for example, the 0.01 quantile of x

として、0.99分位点を and the 0.99th quantile is

として、 As

と設定することで近似し、下記の式（１４）の右辺の最小化を考えれば、任意の最適化手法で推定できる。 This can be approximated by setting and minimizing the right-hand side of equation (14) below, and can be estimated using any optimization method.

この式（１４）は絶対値の和で定義されているため、劣勾配法などの、目的関数中に微分不可能な点が存在しても扱える手法、または線形計画法(linear programming)などを利用することができる。 Since equation (14) is defined as the sum of absolute values, it is possible to use methods that can handle the presence of non-differentiable points in the objective function, such as the subgradient method, or linear programming.

劣勾配法を利用する場合、関数When using the subgradient method, the function

のw_i=(w_i1, wi2)における劣勾配の集合 The set of subgradients at w _i =(w _i1 , wi2) of

に属する任意のベクトル(vector)gを用いて、以下の式（１５）であるUsing any vector g belonging to , the following equation (15) is

に従ってパラメタθを更新することを繰り返せば良い。ただし、γ′は学習率である。 Then we repeatedly update the parameter θ according to the following, where γ′ is the learning rate.

なお、上記では確率分布f_X|I=i(x)が既知である場合のハイパーパラメタの推定方法について示したが、行列の各行の要素の値の集合 In the above, we have shown how to estimate hyperparameters when the probability distribution f _X|I=i (x) is known.

を用いてハイパーパラメタを推定することも可能である。具体的には、 It is also possible to estimate hyperparameters using

からi_m=iである値を抜き出したものが The value where i _m =i is extracted from

と記載されるとすると、関数Err_iの以下の式（１６）のような近似を考えれば良い。 Then, the function _Erri can be approximated as in the following equation (16):

ただし、 however,

は、累積密度関数F_X|I=iの経験近似 is the empirical approximation of the cumulative density function F _X|I=i

である。 is.

は teeth

のデータ数を表す。集計データを用いる場合も同様に関数Err_iの近似が可能である。上記の式（１６）の最適化には、上記の式（１４）の最適化と同様に任意の最適化法が利用でき、劣勾配法などが利用できる。 represents the number of data points. When aggregated data is used, the function _Erri can be similarly approximated. For optimizing the above formula (16), any optimization method can be used, as in the case of optimizing the above formula (14), and the subgradient method can be used.

本実施形態によって、行列またはテンソルの要素のインデックスと、その要素の値との対応が取れず、行列またはテンソルとしての表現が不足するデータであっても、行列分解またはテンソル分解を行なうことができるようになる。これにより、行列またはテンソルとしての表現が不足するデータから潜在的なパターンを抽出したり、行列要素の値を推定したりすることが可能となる。 This embodiment makes it possible to perform matrix decomposition or tensor decomposition even for data in which the index of a matrix or tensor element does not correspond to the value of that element, and which lacks a matrix or tensor representation. This makes it possible to extract latent patterns from data in which the representation as a matrix or tensor is lacking, and to estimate the values of matrix elements.

（動作の説明）
以下、本実施形態に係るデータ解析装置１の構成および各部の動作について説明する。
図１は、本発明の一実施形態に係るデータ解析装置の適用例を示す図である。
図１に示されるように、本発明の一実施形態に係るデータ解析装置１は、入力データ処理部１０、ハイパーパラメタ推定部２０、パラメタ推定部３０、パラメタ処理部４０、記録部５０、および入出力部６０を備える。入出力部６０は、外部装置２との間でデータの入出力が可能である。
記録部５０は、入力データ記録部５１、ハイパーパラメタ記録部５２、およびパラメタ記録部５３を有する。 (Explanation of operation)
The configuration of the data analysis device 1 according to this embodiment and the operation of each unit will be described below.
FIG. 1 is a diagram showing an application example of a data analysis device according to an embodiment of the present invention.
1, a data analysis device 1 according to an embodiment of the present invention includes an input data processing unit 10, a hyperparameter estimating unit 20, a parameter estimating unit 30, a parameter processing unit 40, a recording unit 50, and an input/output unit 60. The input/output unit 60 is capable of inputting and outputting data to and from an external device 2.
The recording unit 50 includes an input data recording unit 51 , a hyper-parameter recording unit 52 , and a parameter recording unit 53 .

次に、データ解析装置１の動作の概要を説明する。図２は、本発明の一実施形態に係るデータ解析装置の動作の概要の一例を示すフローチャートである。
まず、データ解析装置１の入力データ処理部１０は、インデックスデータ Next, an overview of the operation of the data analysis device 1 will be described below. Fig. 2 is a flowchart showing an example of the overview of the operation of the data analysis device according to an embodiment of the present invention.
First, the input data processing unit 10 of the data analysis device 1 receives index data

、確率分布の集合 , a set of probability distributions

、および大小比較データ , and size comparison data

を外部装置２から入出力部６０を介して入力する（ステップ(step)Ｓ１）。 is input from the external device 2 via the input/output unit 60 (step S1).

次に、ハイパーパラメタ推定部２０は、ハイパーパラメタNext, the hyper-parameter estimation unit 20 estimates the hyper-parameter

を推定する（ステップＳ２）。
次に、パラメタ推定部３０は、パラメタθを推定する（ステップＳ３）。
次に、パラメタ処理部４０は、推定されたパラメタθを入出力部６０を介して外部装置２に出力する（ステップＳ４）。 is estimated (step S2).
Next, the parameter estimation unit 30 estimates the parameter θ (step S3).
Next, the parameter processing unit 40 outputs the estimated parameter θ to the external device 2 via the input/output unit 60 (step S4).

以下に、上記のＳ１～Ｓ４の具体的な動作を説明する。
まず、Ｓ１の具体例を説明する。入力データ処理部１０は、インデックスデータ The specific operations of S1 to S4 above will be described below.
First, a specific example of S1 will be described. The input data processing unit 10 receives index data

、確率分布の集合 , a set of probability distributions

、および大小比較データ , and size comparison data

を入力して、記録部５０の入力データ記録部５１に格納する。 and store it in the input data recording section 51 of the recording unit 50.

次に、Ｓ２の具体例を説明する。ハイパーパラメタ推定部２０は、入力データ記録部５１に格納された確率分布の集合Next, a specific example of S2 will be described. The hyperparameter estimation unit 20 estimates the set of probability distributions stored in the input data recording unit 51.

を入力し、以下に示す方法（ａ）～（ｄ）によって上記の式（１４）の目的関数を最小化することで、ハイパーパラメタwを推定する。ハイパーパラメタ推定部２０は、推定したハイパーパラメタwを記録部５０のハイパーパラメタ記録部５２に格納する。図３は、ハイパーパラメタ推定部による動作の一例を示すフローチャートである。図３では、劣勾配法を用いる場合のハイパーパラメタw_iを更新する例が示される。この更新をi＝1, …, n_I毎に行なうことで推定対象のハイパーパラメタwが得られる。 and estimates the hyper-parameter w by minimizing the objective function of the above formula (14) using the following methods (a) to (d). The hyper-parameter estimation unit 20 stores the estimated hyper-parameter w in the hyper-parameter recording unit 52 of the recording unit 50. FIG. 3 is a flowchart showing an example of the operation of the hyper-parameter estimation unit. FIG. 3 shows an example of updating the hyper-parameter w _i when the subgradient method is used. By performing this update for each i=1, ..., n _I, the hyper-parameter w to be estimated is obtained.

（ａ）ハイパーパラメタ推定部２０は、パラメタw_iを初期化する（ステップＳ２１）。ステップＳ２に係る処理の終了条件に用いる変数として、ハイパーパラメタ推定部２０は、ハイパーパラメタw_iの更新量の最大変化幅を示す変数δを同様に初期化し、上記終了条件の閾値εおよび計算の最大繰り返し回数、ここでは変数δの更新に係る計算の繰り返し回数の最大値をそれぞれ設定する（ステップＳ２２）。 (a) The hyper-parameter estimation unit 20 initializes the parameter w _i (step S21). As a variable used as a termination condition for the process in step S2, the hyper-parameter estimation unit 20 similarly initializes a variable δ indicating the maximum change range of the update amount of the hyper-parameter w _i , and sets a threshold ε and a maximum number of calculation iterations for the termination condition, here the maximum value of the number of calculation iterations for updating the variable δ (step S22).

（ｂ）ハイパーパラメタ推定部２０は、パラメタwを上記の式（１５）に従い更新する。このとき、ハイパーパラメタ推定部２０は、更新前のハイパーパラメタw_iと更新後のハイパーパラメタw_iとの差の絶対値の最大値 (b) The hyper-parameter estimation unit 20 updates the parameter w in accordance with the above formula (15). At this time, the hyper-parameter estimation unit 20 calculates the maximum absolute value of the difference between the hyper-parameter w _i before the update and the hyper-parameter w _i after the update.

を変数δに記録する（ステップＳ２３）。ただし、ここでは、更新前のハイパーパラメタw_iの要素は is recorded in the variable δ (step S23). Here, the elements of the hyper-parameter w _i before the update are

と記述され、更新後のハイパーパラメタw_iの要素は The elements of the updated hyperparameter w _i are

と記述される。 It is described as follows.

（ｃ）ステップＳ２３に応じて、ハイパーパラメタ推定部２０は、計算の繰り返し回数のカウント値(count value)を更新する（ステップＳ２４）。
（ｄ）計算繰り返し回数のカウント値が、上記あらかじめ定めた繰り返し回数の最大値を超える、または更新量の最大変化幅を示す変数δが、上記あらかじめ定めた閾値εより小さいときは（ステップＳ２５のＹｅｓ）、ハイパーパラメタ推定部２０は、ステップＳ２に係る処理を終了する。そうでなければ（ステップＳ２５のＮｏ）、ハイパーパラメタ推定部２０は、変数δをδ←0としてステップＳ２２に戻る。 (c) In response to step S23, the hyper-parameter estimation unit 20 updates the count value of the number of times the calculation is repeated (step S24).
(d) When the count value of the number of calculation iterations exceeds the predetermined maximum number of iterations or the variable δ indicating the maximum change width of the update amount is smaller than the predetermined threshold ε (Yes in step S25), the hyper-parameter estimation unit 20 ends the process related to step S2. Otherwise (No in step S25), the hyper-parameter estimation unit 20 sets the variable δ to δ←0 and returns to step S22.

次に、Ｓ３の具体例を説明する。パラメタ推定部３０は、入力データ記録部５１に格納されたインデックスデータNext, a specific example of S3 will be described. The parameter estimation unit 30 uses the index data stored in the input data recording unit 51

、大小比較データ , size comparison data

、およびハイパーパラメタ記録部５２に格納されたハイパーパラメタwを入力し、以下に示す方法（ａ）～（ｅ）によって上記の式（８）の目的関数を最小化することで、パラメタθを推定する。パラメタ推定部３０は、推定したパラメタθを記録部５０のパラメタ記録部５３に格納する。図４は、パラメタ推定部による動作の一例を示すフローチャートである。図４では、ニュートン法を用いる場合のパラメタθを更新する例が示される。 and the hyper-parameter w stored in the hyper-parameter recording unit 52, and estimates the parameter θ by minimizing the objective function of the above equation (8) using the following methods (a) to (e). The parameter estimation unit 30 stores the estimated parameter θ in the parameter recording unit 53 of the recording unit 50. Fig. 4 is a flowchart showing an example of the operation of the parameter estimation unit. Fig. 4 shows an example of updating the parameter θ when using the Newton method.

（ａ）パラメタ推定部３０は、パラメタθを初期化する（ステップＳ３１）。ステップＳ３に係る処理の終了条件に用いる変数として、パラメタ推定部３０は、パラメタθの更新量の最大変化幅を示す変数δを同様に初期化し、上記終了条件の閾値εおよび計算の最大繰り返し回数、ここでは変数δの更新に係る計算の繰り返し回数の最大値をそれぞれ設定する（ステップＳ３２）。(a) The parameter estimation unit 30 initializes the parameter θ (step S31). As a variable used as the termination condition for the process related to step S3, the parameter estimation unit 30 similarly initializes a variable δ indicating the maximum change range of the update amount of the parameter θ, and sets the threshold ε of the termination condition and the maximum number of calculation iterations, here the maximum number of calculation iterations related to the update of the variable δ (step S32).

（ｂ）パラメタ推定部３０は、因子行列(b) The parameter estimation unit 30 estimates the factor matrix

を上記の式（１１）に従い更新する。このとき、パラメタ推定部３０は、更新前後の因子行列 is updated according to the above formula (11). At this time, the parameter estimation unit 30 updates the factor matrix before and after the update

の差の絶対値の最大値Maximum absolute value of the difference

が現在の変数δより大きければ、パラメタ推定部３０は、この変数δをIf is greater than the current variable δ, the parameter estimation unit 30 sets this variable δ to

と更新する（ステップＳ３３）。ただし、更新前の因子行列 (step S33). However, the factor matrix before the update

の要素はThe elements of

と記述され、更新後の上記要素は and the above elements after the update are

と記述される。 It is described as follows.

（ｃ）パラメタ推定部３０は、因子行列(c) The parameter estimation unit 30 estimates the factor matrix

を上記の式（１２）に従い更新する。この時、パラメタ推定部３０は、更新前後の因子行列 is updated according to the above formula (12). At this time, the parameter estimation unit 30 calculates the factor matrix before and after the update

の差の絶対値の最大値Maximum absolute value of the difference

が現在の変数δより大きければ、この変数δをIf is greater than the current variable δ, set this variable δ to

と更新する（ステップＳ３４）。ただし、更新前の因子行列 (step S34). However, the factor matrix before the update

の要素はThe elements of

と記述される。 It is described as follows.

（ｄ）ステップＳ３４に応じて、パラメタ推定部３０は、計算の繰り返し回数のカウント値を更新する（ステップＳ３５）。(d) In response to step S34, the parameter estimation unit 30 updates the count value of the number of calculation iterations (step S35).

（ｅ）計算繰り返し回数のカウント値が、上記あらかじめ定めた繰り返し回数の最大値を超える、または更新量の最大変化幅を示す変数δが、上記あらかじめ定めた閾値εより小さいときは（ステップＳ３６のＹｅｓ）、パラメタ推定部３０は、ステップＳ３に係る処理を終了し、そうでなければ（ステップＳ３６のＮｏ）、パラメタ推定部３０は、変数δをδ←0としてステップＳ３２に戻る。(e) If the count value of the number of calculation iterations exceeds the maximum number of iterations determined above, or if the variable δ indicating the maximum change in the update amount is smaller than the predetermined threshold value ε (Yes in step S36), the parameter estimation unit 30 terminates the processing related to step S3; otherwise (No in step S36), the parameter estimation unit 30 sets the variable δ to δ←0 and returns to step S32.

次に、Ｓ４の具体例を説明する。パラメタ処理部４０は、パラメタ記録部５３に格納されたパラメタθを入出力部６０を介して外部装置２に出力する。
上記の実施形態では、上記式（８）の目的関数を最小化するパラメタθを推定するためにニュートン法に基づくアルゴリズムを用いているが、他のいかなる方法、例えば上記の勾配降下法を用いても良い。また、上記の実施形態では、ハイパーパラメタの推定に確率分布の集合 Next, a specific example of S4 will be described. The parameter processing unit 40 outputs the parameter θ stored in the parameter recording unit 53 to the external device 2 via the input/output unit 60.
In the above embodiment, an algorithm based on Newton's method is used to estimate the parameter θ that minimizes the objective function of the above equation (8), but any other method, such as the gradient descent method, may be used. Also, in the above embodiment, a set of probability distributions is used to estimate the hyperparameters.

を利用しているが、これの代わりに行列の各行の要素の値の集合 is used, but instead, a set of element values for each row of the matrix

や集計データを利用しても良い。この場合には上記の式（１４）の目的関数ではなく、上記の式（１６）の最小化（最適化）によりハイパーパラメタを推定すれば良い。この最適化には任意の方法が利用できる。また、上記では行列としての表現が不足するデータに対して行列分解を行うUMFを適用した例を示しているが、同様の枠組みで、テンソルとしての表現が不足するデータをテンソル分解するUTFを適用することも可能である。 or aggregated data. In this case, hyperparameters can be estimated by minimizing (optimizing) the above formula (16) instead of the objective function of the above formula (14). Any method can be used for this optimization. In addition, while the above shows an example of applying UMF, which performs matrix decomposition on data that lacks a matrix representation, it is also possible to apply UTF, which performs tensor decomposition on data that lacks a tensor representation, in a similar framework.

図５は、本発明の一実施形態に係るデータ解析装置のハードウエア構成の一例を示すブロック図である。
図５に示された例では、上記の実施形態に係るデータ解析装置１は、例えばサーバコンピュータ（server computer）またはパーソナルコンピュータ（personal computer）により構成され、ＣＰＵ（Central Processing Unit）等のハードウエアプロセッサ（hardware processor）１１１Ａを有する。そして、このハードウエアプロセッサ１１１Ａに対し、プログラムメモリ（program memory）１１１Ｂ、データメモリ（data memory）１１２、入出力インタフェース（interface）１１３及び通信インタフェース１１４が、バス（bus）１１５を介して接続される。 FIG. 5 is a block diagram showing an example of a hardware configuration of a data analysis device according to an embodiment of the present invention.
5, the data analysis device 1 according to the embodiment is configured, for example, by a server computer or a personal computer, and has a hardware processor 111A such as a CPU (Central Processing Unit). A program memory 111B, a data memory 112, an input/output interface 113, and a communication interface 114 are connected to the hardware processor 111A via a bus 115.

通信インタフェース１１４は、例えば１つ以上の無線の通信インタフェースユニットを含んでおり、通信ネットワーク（network）ＮＷとの間で情報の送受信を可能にする。無線インタフェースとしては、例えば無線ＬＡＮ（Local Area Network）などの小電力無線データ通信規格が採用されたインタフェースが使用される。The communication interface 114 includes, for example, one or more wireless communication interface units, and enables transmission and reception of information to and from a communication network NW. As a wireless interface, for example, an interface that adopts a low-power wireless data communication standard such as a wireless LAN (Local Area Network) is used.

入出力インタフェース１１３には、データ解析装置１に付設される、利用者などにより用いられる入力デバイス（device）２００および出力デバイス３００が接続される。入力デバイス２００および出力デバイス３００は、図１に示される外部装置２に対応する。
入出力インタフェース１１３は、キーボード、タッチパネル（touch panel）、タッチパッド（touchpad）、マウス（mouse）等の入力デバイス２００を通じて利用者などにより入力された操作データを取り込むとともに、出力データを液晶または有機ＥＬ（Electro Luminescence）等が用いられた表示デバイスを含む出力デバイス３００へ出力して表示させる処理を行なう。なお、入力デバイス２００および出力デバイス３００には、データ解析装置１に内蔵されたデバイスが使用されてもよく、また、ネットワークＮＷを介してデータ解析装置１と通信可能である他の情報端末の入力デバイスおよび出力デバイスが使用されてもよい。 An input device 200 and an output device 300 used by a user or the like, which are attached to the data analysis apparatus 1, are connected to the input/output interface 113. The input device 200 and the output device 300 correspond to the external device 2 shown in FIG.
The input/output interface 113 takes in operation data input by a user or the like through an input device 200 such as a keyboard, a touch panel, a touchpad, a mouse, or the like, and outputs output data to an output device 300 including a display device using liquid crystal or organic EL (Electro Luminescence), etc., for display. The input device 200 and the output device 300 may be devices built into the data analysis device 1, or may be input devices and output devices of other information terminals that can communicate with the data analysis device 1 via the network NW.

プログラムメモリ１１１Ｂは、非一時的な有形の記憶媒体として、例えば、ＨＤＤ（Hard Disk Drive）またはＳＳＤ（Solid State Drive）等の随時書込みおよび読出しが可能な不揮発性メモリ（non-volatile memory）と、ＲＯＭ（Read Only Memory）等の不揮発性メモリとが組み合わせて使用されたもので、一実施形態に係る各種制御処理等を実行する為に必要なプログラムが格納されている。 The program memory 111B is a non-transient tangible storage medium that is a combination of a non-volatile memory that can be written to and read from at any time, such as a hard disk drive (HDD) or solid state drive (SSD), and a non-volatile memory such as a read only memory (ROM), and stores the programs necessary to execute various control processes, etc. according to one embodiment.

データメモリ１１２は、有形の記憶媒体として、例えば、上記の不揮発性メモリと、ＲＡＭ（Random Access Memory）等の揮発性メモリ（volatile memory）とが組み合わせて使用されたもので、各種処理が行なわれる過程で取得および作成された各種データが記憶される為に用いられる。 Data memory 112 is a tangible storage medium that is, for example, a combination of the above-mentioned non-volatile memory and a volatile memory such as RAM (Random Access Memory), and is used to store various data acquired and created during various processing steps.

本発明の一実施形態に係るデータ解析装置１は、ソフトウエア（software）による処理機能部として、図１に示される入力データ処理部１０、ハイパーパラメタ推定部２０、パラメタ推定部３０、パラメタ処理部４０、および入出力部６０を有するデータ処理装置として構成され得る。A data analysis device 1 according to one embodiment of the present invention can be configured as a data processing device having an input data processing unit 10, a hyperparameter estimation unit 20, a parameter estimation unit 30, a parameter processing unit 40, and an input/output unit 60 as software-based processing function units, as shown in FIG. 1.

データ解析装置１の各部によるワークメモリ（working memory）などとして用いられる各情報記憶部および記録部５０は、図５に示されたデータメモリ１１２が用いられることで構成され得る。ただし、これらの構成される記憶領域はデータ解析装置１内に必須の構成ではなく、例えば、ＵＳＢ（Universal Serial Bus）メモリなどの外付け記憶媒体、又はクラウド（cloud）に配置されたデータベースサーバ（database server）等の記憶装置に設けられた領域であってもよい。Each information storage unit and recording unit 50 used as a working memory by each unit of the data analysis device 1 can be configured by using the data memory 112 shown in Fig. 5. However, these configured storage areas are not essential components within the data analysis device 1, and may be areas provided in a storage device such as an external storage medium such as a Universal Serial Bus (USB) memory, or a database server located in the cloud.

上記の入力データ処理部１０、ハイパーパラメタ推定部２０、パラメタ推定部３０、パラメタ処理部４０、および入出力部６０の各部における処理機能部は、いずれも、プログラムメモリ１１１Ｂに格納されたプログラムを上記ハードウエアプロセッサ１１１Ａにより読み出させて実行させることにより実現され得る。なお、これらの処理機能部の一部または全部は、特定用途向け集積回路（ＡＳＩＣ（Application Specific Integrated Circuit））またはＦＰＧＡ（Field-Programmable Gate Array）などの集積回路を含む、他の多様な形式によって実現されてもよい。The processing function units in the input data processing unit 10, the hyper parameter estimation unit 20, the parameter estimation unit 30, the parameter processing unit 40, and the input/output unit 60 can all be realized by reading and executing a program stored in the program memory 111B by the hardware processor 111A. Note that some or all of these processing function units may be realized in various other forms, including integrated circuits such as application specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs).

また、各実施形態に記載された手法は、計算機（コンピュータ）に実行させることができるプログラム（ソフトウエア手段）として、例えば磁気ディスク（フロッピー（登録商標）ディスク（Floppy disk）、ハードディスク（hard disk）等）、光ディスク（optical disc）（ＣＤ－ＲＯＭ、ＤＶＤ、ＭＯ等）、半導体メモリ（ＲＯＭ、ＲＡＭ、フラッシュメモリ（Flash memory）等）等の記録媒体に格納し、また通信媒体により伝送して頒布され得る。なお、媒体側に格納されるプログラムには、計算機に実行させるソフトウエア手段（実行プログラムのみならずテーブル（table）、データ構造も含む）を計算機内に構成させる設定プログラムをも含む。本装置を実現する計算機は、記録媒体に記録されたプログラムを読み込み、また場合により設定プログラムによりソフトウエア手段を構築し、このソフトウエア手段によって動作が制御されることにより上述した処理を実行する。なお、本明細書でいう記録媒体は、頒布用に限らず、計算機内部あるいはネットワークを介して接続される機器に設けられた磁気ディスク、半導体メモリ等の記憶媒体を含むものである。 The methods described in each embodiment can be stored as a program (software means) that can be executed by a computer on a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), optical disk (CD-ROM, DVD, MO, etc.), semiconductor memory (ROM, RAM, flash memory, etc.), and can be distributed by transmitting it via a communication medium. The programs stored on the medium also include a setting program that configures the software means (including not only execution programs but also tables and data structures) that the computer executes. The computer that realizes this device reads the program recorded on the recording medium, and in some cases, constructs the software means using the setting program, and executes the above-mentioned processing by controlling the operation of the software means. Note that the recording medium referred to in this specification is not limited to a recording medium for distribution, but also includes a storage medium such as a magnetic disk or semiconductor memory provided inside the computer or in a device connected via a network.

なお、本発明は、上記実施形態に限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で種々に変形することが可能である。また、各実施形態は適宜組み合わせて実施してもよく、その場合組み合わせた効果が得られる。更に、上記実施形態には種々の発明が含まれており、開示される複数の構成要件から選択された組み合わせにより種々の発明が抽出され得る。例えば、実施形態に示される全構成要件からいくつかの構成要件が削除されても、課題が解決でき、効果が得られる場合には、この構成要件が削除された構成が発明として抽出され得る。 Note that the present invention is not limited to the above-described embodiments, and can be modified in various ways in the implementation stage without departing from the gist of the invention. The embodiments may also be implemented in appropriate combination, in which case the combined effects can be obtained. Furthermore, the above-described embodiments include various inventions, and various inventions can be extracted by combinations selected from the multiple constituent elements disclosed. For example, if the problem can be solved and an effect can be obtained even if some constituent elements are deleted from all the constituent elements shown in the embodiments, the configuration from which these constituent elements are deleted can be extracted as an invention.

１…データ解析装置
２…外部装置
１０…入力データ処理部
２０…ハイパーパラメタ推定部
３０…パラメタ推定部
４０…パラメタ処理部
５０…記録部
５１…入力データ記録部
５２…ハイパーパラメタ記録部
５３…パラメタ記録部
６０…入出力部 REFERENCE SIGNS LIST 1 data analysis device 2 external device 10 input data processing unit 20 hyperparameter estimation unit 30 parameter estimation unit 40 parameter processing unit 50 recording unit 51 input data recording unit 52 hyperparameter recording unit 53 parameter recording unit 60 input/output unit

Claims

an input unit that receives input of index data, which is a set of indexes of analysis target data, a set of elements of the analysis target data, and size comparison data indicating the size relationship of the elements of the analysis target data;
a first estimation unit that estimates a parameter that minimizes an upper bound of an error between a loss function used to estimate a factor of the analysis target data and an approximation of the loss function, based on the set of elements input by the input unit;
a second estimation unit that estimates a factor of the analysis target data based on the index data input by the input unit, the magnitude comparison data, and the parameter estimated by the first estimation unit;
A data analysis device comprising:

The index data is
is a set of matrix or tensor indices,
The elements of the analysis target data are:
are the elements of each row of said matrix or tensor,
The magnitude comparison data is
The data indicates the magnitude relationship of the elements of the matrix or tensor.
The data analysis device according to claim 1 .

The second estimation unit is
optimizing a function based on a regularization term related to a factor of the analysis target data, a value based on the index data input by the input unit, and a value based on the magnitude comparison data input by the input unit and the estimated parameter, thereby estimating the factor of the analysis target data;
The data analysis device according to claim 1 .

The first estimation unit
estimating the parameters that minimize a function based on a probability density function of elements of the analysis target data;
The data analysis device according to claim 1 .

A method performed by a data analysis device, comprising:
receiving input of index data which is a set of indexes of analysis target data, a set of elements of the analysis target data, and size comparison data which indicates the size relationship of the elements of the analysis target data;
estimating a parameter that minimizes an upper bound of an error between a loss function used to estimate a factor of the analysis target data and an approximation of the loss function based on the input set of elements;
estimating a factor of the analysis target data based on the input index data, the magnitude comparison data, and the estimated parameter;
A data analysis method comprising:

The index data is
is a set of matrix or tensor indices,
The elements of the analysis target data are:
are the elements of each row of said matrix or tensor,
The magnitude comparison data is
The data indicates the magnitude relationship of the elements of the matrix or tensor.
The data analysis method according to claim 5 .

Estimating the factors includes:
and optimizing a function based on a regularization term related to a factor of the analysis target data, a value based on the input index data, and a value based on the input magnitude comparison data and the estimated parameter, thereby estimating the factor of the analysis target data.
The data analysis method according to claim 5 .

A data analysis processing program that causes a processor to function as each of the components of the data analysis device described in any one of claims 1 to 4.