JP6433858B2

JP6433858B2 - Probability density function estimation device, continuous value prediction device, method, and program

Info

Publication number: JP6433858B2
Application number: JP2015141650A
Authority: JP
Inventors: Hideaki Kim (金 秀明); Hiroshi Sawada (澤田 宏)
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2015-07-15
Filing date: 2015-07-15
Publication date: 2018-12-05
Anticipated expiration: 2035-07-15
Also published as: JP2017027112A

Description

The present invention relates to a probability density function estimation device, a continuous value prediction device, and corresponding methods and programs. More particularly, it relates to a probability density function estimation device, continuous value prediction device, method, and program for predicting continuous values generated by an individual.

In general, consider an individual that behaves stochastically and generates continuous values, which we observe as data. Examples include:
- a consumer (individual) making a purchase of a certain amount (continuous value);
- a consumer (individual) making purchases at certain time intervals (continuous value);
- a traveler (individual) moving to the next place of stay after a certain time interval (continuous value);
- a traveler (individual) moving to a location a certain distance (continuous value) away;
- a word in a document (individual) appearing at a certain time (continuous value);
- a device (individual) failing at a certain time (continuous value);
- a nerve cell (individual) emitting neural spike signals at certain time intervals (continuous value);
- earthquakes (individual) occurring at certain time intervals (continuous value).

An individual generates continuous values according to a probabilistic rule, namely its probability density function. If this function can be estimated from data, the continuous values the individual will generate in the future can be predicted. In the examples above, this means predicting a consumer's future purchase times or a traveler's future places of stay. Methods for estimating a probability density function fall broadly into two types: parametric density estimation and nonparametric density estimation. Parametric density estimation expresses the probability density function as a distribution function with parameters and determines the parameter values from data. Nonparametric density estimation instead estimates the density from data without making strong assumptions about its shape. The histogram method is a representative nonparametric method, and the present invention relates to it.
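For reference, the following minimal Python sketch builds a plain equal-width histogram density estimate on a fixed domain. It is the generic textbook construction, not the method claimed in this document, and the function name `histogram_density` is hypothetical. The height of each bin is its count divided by N times the bin width, so the estimate integrates to 1 over the domain.

```python
def histogram_density(samples, t0, t1, n_bins):
    """Equal-width histogram density estimate on the domain [t0, t1].

    The density is constant within each bin: bin l gets height
    count_l / (N * width), so the estimate integrates to 1 over [t0, t1].
    Samples outside [t0, t1] are ignored.
    """
    width = (t1 - t0) / n_bins

    def bin_index(t):
        # 0-based bin of t; the right edge t1 is assigned to the last bin.
        return min(int((t - t0) / width), n_bins - 1)

    counts = [0] * n_bins
    for t in samples:
        if t0 <= t <= t1:
            counts[bin_index(t)] += 1
    n = sum(counts)
    if n == 0:
        raise ValueError("no samples inside [t0, t1]")
    heights = [c / (n * width) for c in counts]

    def p(t):
        # Piecewise-constant density; zero outside the domain.
        return heights[bin_index(t)] if t0 <= t <= t1 else 0.0

    return p
```

For example, `p = histogram_density(xs, 0.0, 10.0, 20)` returns a callable density. When `xs` contains only a handful of values, most of the 20 bins are empty, which illustrates why the histogram method is fragile on sparse data.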

Non-Patent Document 1: J. Rissanen, "Density estimation by stochastic complexity", IEEE Transactions on Information Theory, Vol. 38, pp. 315-323, 1992.
Non-Patent Document 2: C. Bishop, "Pattern Recognition and Machine Learning", Springer, New York, 2006.
Non-Patent Document 3: K. H. Knuth, "Optimal data-based binning for Histograms", arXiv preprint physics/0605197, 2006.
Non-Patent Document 4: L. Davies, U. Gather, D. Nordman, H. Weinert, "A comparison of automatic histogram constructions", ESAIM: Probability and Statistics, Vol. 13, pp. 181-196, 2009.

When enough data have been observed from an individual, the histogram method is more suitable than parametric density estimation. The histogram method can reproduce a wider variety of shapes, whereas parametric density estimation restricts the shape of the probability density function (low degrees of freedom). In many cases, however, only a small amount of data is available for each individual. The histogram method then cannot be applied, and parametric density estimation is used instead. If the shape of the distribution function assumed in parametric density estimation differs from the true shape, the correct probability density function cannot be estimated, and prediction accuracy drops as a result. Table 1 shows the advantages and disadvantages of the histogram method and parametric density estimation.

We cannot know the true probability density function underlying an individual, so the histogram method should be used whenever possible. The problem the present invention addresses is the weakness of the histogram method when only a small amount of data is available for each individual (sparse data).

The present invention has been made to solve the above problem. An object of the invention is to provide a probability density function estimation device, method, and program capable of estimating a probability density function for accurately predicting continuous values generated by an individual.

Another object of the present invention is to provide a continuous value prediction device, method, and program capable of accurately predicting continuous values generated by an individual.

To achieve the above objects, a probability density function estimation device according to a first aspect of the present invention includes a topic estimation unit and a probability density function estimation unit.

The topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals. Each datum is tagged with the individual ID of the individual that generated it. The topic estimation unit then repeats the following two operations:
- For each continuous value datum in the set, it computes the topic to which the datum belongs, according to the posterior probability that the datum belongs to each topic. This posterior probability is based on (a) for each individual and each topic, the number of continuous value data of that individual belonging to the topic, and (b) for each topic and each interval (bin) of continuous values in the histogram corresponding to that topic, the number of continuous value data belonging to the topic that fall in the bin.
- Based on these two counts, as obtained from the computed topics, it updates a parameter representing the distribution of the weights on the histograms corresponding to the topics, and a parameter representing the distribution of the histograms.

The probability density function estimation unit estimates two quantities: the shape of each bin of the histogram corresponding to each topic, and, for each individual, the weight on each topic's histogram. It bases these estimates on the weight-distribution parameter and the histogram-distribution parameter updated by the topic estimation unit, together with the two counts obtained at each iteration. For each of the plurality of individuals, it then estimates a probability density function expressed as a linear sum of the topic histograms, using the weights estimated for that individual and the estimated bin shapes.
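The two counts that the first aspect relies on can be computed in a single pass over the data. The following is an illustrative Python sketch, not the patented implementation. The names `topic_counts` and `bin_of` are hypothetical, and the sketch makes two further assumptions: topics and bins use 0-based indices, and a value equal to the upper end of the domain falls in the last bin.

```python
from collections import defaultdict

def bin_of(t, k, t0, t1, n_bins):
    """0-based bin of t in topic k's equal-width histogram with n_bins[k] bins."""
    w = n_bins[k]
    return min(int(w * (t - t0) / (t1 - t0)), w - 1)

def topic_counts(ts, us, zs, n_bins, t0, t1):
    """Count statistics used by the topic estimation unit.

    ts: continuous values t_j; us: individual IDs u_j; zs: topics z_j (0..K-1).
    Returns
      n_ku[(k, u)]: number of data of individual u assigned to topic k
      n_kl[k][l]:   number of data of topic k falling in bin l of histogram k
      n_k[k]:       total number of data assigned to topic k
    """
    K = len(n_bins)
    n_ku = defaultdict(int)
    n_kl = [[0] * n_bins[k] for k in range(K)]
    n_k = [0] * K
    for t, u, k in zip(ts, us, zs):
        n_ku[(k, u)] += 1
        n_kl[k][bin_of(t, k, t0, t1, n_bins)] += 1
        n_k[k] += 1
    return n_ku, n_kl, n_k
```

Because each topic may have its own number of bins, the same value t can fall in different bin indices in different topics' histograms. This is why `bin_of` takes the topic index.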

In the probability density function estimation device according to the first aspect, the topic estimation unit may also estimate the number of bins. In that case, it repeats the following three operations:
- For each continuous value datum in the set, it computes the topic to which the datum belongs, according to the posterior probability that the datum belongs to each topic.
- For each topic's histogram, it computes the number of bins (divisions) according to the posterior probability of the number of bins. This posterior is based on two counts obtained from the computed topics: the number of continuous value data of each individual belonging to the topic, and the number of continuous value data falling in each bin of the topic's histogram.
- It updates the weight-distribution parameter and the histogram-distribution parameter. The update is based on the per-individual, per-topic counts, the per-bin counts, and the number of bins computed for each topic's histogram.

The probability density function estimation unit then estimates the shape of each bin of each topic's histogram and each individual's weight on each topic's histogram. It bases these estimates on the updated weight-distribution parameter, the updated histogram-distribution parameter, the number of bins computed for each topic's histogram, and the per-individual and per-bin counts obtained at each iteration.
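This section does not spell out the posterior over the number of bins. One common choice, assumed here, scores a candidate bin count W by the marginal likelihood of an equal-width histogram whose bin probabilities have a symmetric Dirichlet(β) prior and are integrated out. This is the criterion of Knuth's optimal binning (K. H. Knuth, cited above), which uses β = 1/2. The sketch scores candidate values of W for the data assigned to one topic and samples W under a uniform prior. The names `log_marginal` and `sample_n_bins` are hypothetical, and the patent's actual posterior may also involve the per-individual counts.

```python
import math
import random

def log_marginal(ts, n_bins, t0, t1, beta):
    """log p(ts | W) for an equal-width W-bin histogram on [t0, t1].

    The bin probabilities have a symmetric Dirichlet(beta) prior and are
    integrated out:
      N log(W / (t1 - t0)) + log G(W beta) - log G(W beta + N)
      + sum_l [log G(beta + n_l) - log G(beta)]
    """
    counts = [0] * n_bins
    for t in ts:
        counts[min(int(n_bins * (t - t0) / (t1 - t0)), n_bins - 1)] += 1
    n = len(ts)
    out = n * math.log(n_bins / (t1 - t0))
    out += math.lgamma(n_bins * beta) - math.lgamma(n_bins * beta + n)
    out += sum(math.lgamma(beta + c) - math.lgamma(beta) for c in counts)
    return out

def sample_n_bins(ts, candidates, t0, t1, beta, rng):
    """Draw W from its posterior over `candidates`, assuming a uniform prior."""
    logs = [log_marginal(ts, w, t0, t1, beta) for w in candidates]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]  # subtract the max for stability
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a single bin the Gamma terms cancel and the score reduces to -N log(T_1 - T_0), the log-likelihood of a uniform density. Data concentrated in a small part of the domain therefore favor larger W.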

A continuous value prediction device according to a second aspect of the present invention predicts continuous value data of a target individual. It includes a continuous value prediction unit, which calculates the expected value of the continuous values that the target individual will generate in the future. The calculation uses a probability density function expressed as a linear sum of the histograms corresponding to the topics. This function is determined by the weights on the topic histograms, estimated in advance for the target individual, and by the shape of each bin (interval of continuous values) of each topic's histogram.
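Under the linear-sum-of-histograms representation, the expected value has a closed form. The density is uniform within each bin, so each bin contributes its probability mass times its midpoint, and the topic means are then weighted by the individual's topic weights. A minimal sketch follows. It assumes the bin shapes are given as per-bin probability masses over equal-width bins, and the name `expected_value` is hypothetical.

```python
def expected_value(theta_u, phi, t0, t1):
    """Expected value of t under p_u(t) = sum_k theta_u[k] * h_k(t).

    theta_u[k]: the individual's weight on topic k (sums to 1).
    phi[k][l]:  probability mass of bin l in topic k's histogram
                (equal-width bins over [t0, t1]; sums to 1 over l).
    Within a bin the density is uniform, so each bin contributes its
    probability mass times its midpoint.
    """
    total = 0.0
    for weight, masses in zip(theta_u, phi):
        width = (t1 - t0) / len(masses)
        mean_k = sum(m * (t0 + (l + 0.5) * width) for l, m in enumerate(masses))
        total += weight * mean_k
    return total
```

For example, with equal weights on one topic concentrated at the midpoint 2 and another concentrated at the midpoint 7, the predicted value is 4.5.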

A probability density function estimation method according to a third aspect of the present invention includes two steps.

In the first step, a topic estimation unit receives as input a set of continuous value data observed for a plurality of individuals, each datum tagged with the individual ID of the individual that generated it. The unit then repeats the following two operations:
- For each continuous value datum in the set, it computes the topic to which the datum belongs, according to the posterior probability that the datum belongs to each topic. This posterior probability is based on (a) for each individual and each topic, the number of continuous value data of that individual belonging to the topic, and (b) for each topic and each bin of the histogram corresponding to that topic, the number of continuous value data belonging to the topic that fall in the bin.
- Based on these counts, as obtained from the computed topics, it updates a parameter representing the distribution of the weights on the topic histograms and a parameter representing the distribution of the histograms.

In the second step, a probability density function estimation unit estimates the shape of each bin of each topic's histogram and, for each individual, the weight on each topic's histogram. It bases these estimates on the weight-distribution parameter and histogram-distribution parameter updated by the topic estimation unit, together with the two counts obtained at each iteration. For each of the plurality of individuals, it then estimates a probability density function expressed as a linear sum of the topic histograms, using the weights estimated for that individual and the estimated bin shapes.

In the probability density function estimation method according to the third aspect, the step performed by the topic estimation unit may repeat the following three operations:
- For each continuous value datum in the set, it computes the topic to which the datum belongs, according to the posterior probability that the datum belongs to each topic.
- For each topic's histogram, it computes the number of bins according to the posterior probability of the number of bins. This posterior is based on two counts obtained from the computed topics: the number of continuous value data of each individual belonging to the topic, and the number of continuous value data falling in each bin of the topic's histogram.
- It updates the weight-distribution parameter and the histogram-distribution parameter. The update is based on the per-individual, per-topic counts, the per-bin counts, and the number of bins computed for each topic's histogram.

The step performed by the probability density function estimation unit may then estimate the shape of each bin of each topic's histogram and each individual's weight on each topic's histogram. It bases these estimates on the updated weight-distribution parameter, the updated histogram-distribution parameter, the number of bins computed for each topic's histogram, and the per-individual and per-bin counts obtained at each iteration.

A continuous value prediction method according to a fourth aspect of the present invention is performed in a continuous value prediction device that predicts continuous value data of a target individual. The method includes a step in which a continuous value prediction unit calculates the expected value of the continuous values that the target individual will generate in the future. The calculation uses a probability density function expressed as a linear sum of the histograms corresponding to the topics. This function is determined by the weights on the topic histograms, estimated in advance for the target individual, and by the shape of each bin of each topic's histogram.

A program according to a fifth aspect of the present invention causes a computer to function as each unit of the probability density function estimation device according to the first aspect.

A program according to a sixth aspect of the present invention causes a computer to function as each unit of the continuous value prediction device according to the second aspect.

The probability density function estimation device, method, and program of the present invention proceed as follows:
- The topic to which each continuous value datum belongs is computed according to its posterior probability.
- The weight-distribution parameter and the histogram-distribution parameter are repeatedly updated, based on counts obtained from the topic assignments: the number of data of each individual belonging to each topic, and the number of data of each topic falling in each bin of that topic's histogram.
- The shape of each bin of each topic's histogram, and each individual's weight on each topic's histogram, are estimated from the updated parameters and the counts obtained at each iteration.
- A probability density function is estimated from those weights and bin shapes.

This yields a probability density function that allows the continuous values generated by an individual to be predicted accurately.

The continuous value prediction device, method, and program of the present invention calculate the expected value of the continuous values that the target individual will generate in the future. The calculation uses a probability density function based on the weights on the topic histograms, estimated in advance for that individual, and on the shape of each bin of each histogram. This makes it possible to predict the continuous values generated by an individual accurately.

FIG. 1 is a block diagram showing the configuration of a probability density function estimation device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a continuous value prediction device according to an embodiment of the present invention.
FIG. 3 is a flowchart showing a probability density function estimation processing routine in the probability density function estimation device according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing a continuous value prediction processing routine in the probability density function estimation device according to the first embodiment of the present invention.
FIG. 5 is a flowchart showing a probability density function estimation processing routine in the probability density function estimation device according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Outline of the Embodiments of the Present Invention>

First, an outline of the embodiments of the present invention is given.

In many cases, only a small amount of data can be obtained from each individual, but data can be obtained from a large number of individuals. For example, on an e-commerce site, most customers have made only a few purchases, but the number of registered customers is extremely large. Each individual has its own probability density function, but there should be groups of individuals whose probability density functions resemble one another. We therefore make two assumptions: (i) the probability density functions of all individuals are expressed as linear sums of a small number of shared histograms, and (ii) differences between individuals' probability density functions are expressed as differences in the weights of those linear sums. Under these assumptions, the method keeps the high degree of freedom that is the strength of the histogram method while avoiding its weakness, data sparsity. A device that expresses an individual's probability density function as a linear sum of histograms already exists (see Non-Patent Document 1 above). However, it handles a single individual, so it cannot avoid data sparsity.

<Configuration of the Probability Density Function Estimation Device According to the First Embodiment of the Present Invention>

Next, the configuration of the probability density function estimation device according to the first embodiment of the present invention is described. As shown in FIG. 1, the probability density function estimation device 100 according to the first embodiment can be implemented by a computer that includes a CPU, a RAM, and a ROM. The ROM stores various data and a program for executing the probability density function estimation processing routine described later. Functionally, as shown in FIG. 1, the probability density function estimation device 100 includes an input unit 10, a computation unit 20, and an output unit 50.

The input unit 10 receives a set of continuous value data observed for the plurality of individuals to be analyzed, each datum tagged with the individual ID of the individual that generated it. The input unit 10 also receives the number K of histograms used to express the individuals' probability density functions. It further receives the domain of the probability density function, T ≡ [T_0, T_1], which is the observation period of the continuous value data (the range of values a continuous value can take). Hereinafter, the K histograms are denoted k = 1, 2, 3, ..., K, and the k-th histogram corresponds to topic k. The continuous value data consist of:
- the individual ID, denoted u;
- the number of continuous values observed for individual u, denoted N_u;
- the total number of continuous values observed over all individuals, denoted N;
- the set of continuous values observed over all individuals, denoted {t_j} ≡ (t_1, t_2, ..., t_N);
- the set of IDs of the individuals that generated each continuous value, denoted {u_j} ≡ (u_1, u_2, ..., u_N).

The principle of the generative model used in this embodiment is described next.

Probability density estimation with a histogram requires the number of divisions (bins) of the domain. This embodiment does not estimate the number of divisions, so the number of divisions for each topic, W ≡ (W_1, W_2, ..., W_K), is given as an input.
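As an illustration of the inputs listed above, the following Python sketch collects them in one container and derives N and N_u from {t_j} and {u_j}. The class name `EstimatorInput` and its field names are hypothetical. The validation rules (equal lengths, values inside [T_0, T_1], one positive bin count per topic) are assumptions that follow from the definitions rather than requirements stated in the text.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class EstimatorInput:
    """Inputs to the input unit 10 (class and field names are illustrative).

    ts[j]: continuous value t_j; us[j]: ID u_j of the individual that produced it;
    K: number of histograms (topics); [t0, t1]: domain T = [T_0, T_1];
    n_bins[k]: number of equal-width bins W_k of topic k's histogram.
    """
    ts: list
    us: list
    K: int
    t0: float
    t1: float
    n_bins: list
    n_u: Counter = field(init=False)  # N_u for each individual u

    def __post_init__(self):
        if len(self.ts) != len(self.us):
            raise ValueError("ts and us must have the same length")
        if not self.t0 < self.t1:
            raise ValueError("the domain must satisfy t0 < t1")
        if any(not self.t0 <= t <= self.t1 for t in self.ts):
            raise ValueError("every t_j must lie in [t0, t1]")
        if len(self.n_bins) != self.K or min(self.n_bins) < 1:
            raise ValueError("n_bins must hold K positive bin counts")
        self.n_u = Counter(self.us)

    @property
    def N(self):
        """Total number of continuous values over all individuals."""
        return len(self.ts)
```

Storing the data as two parallel lists (values and IDs) mirrors the {t_j}, {u_j} notation. Per-individual quantities such as N_u are derived from these lists rather than stored separately.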

First, assume that each individual generates its continuous value data according to the generative model in equation (1) below. The individual u_j that generated the j-th continuous value t_j draws t_j from a probability density function p(t | φ) expressed as a linear sum of the K histograms.

Here, z_j is a latent variable representing the topic (1 to K) to which the j-th continuous value belongs. The model also contains, for individual u_j, a weight on each topic k. p(t_j | z_j = k, φ_·k) is the distribution of topic k, expressed by the histogram in equation (2). φ_lk is the parameter that determines the shape of the histogram in bin l. w_j^k is the integer (1 ≤ w_j^k ≤ W_k) defined by equation (3).

The function int[x] in equation (3) returns the integer part of the real number x. The histogram-based probability density function divides the domain [T_0, T_1] of the measurement period into W_k equal intervals and represents the probability density in each interval l (1 to W_k) by a constant value φ_lk; w_j^k indicates which interval contains the continuous value t_j.
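The interval lookup of equation (3) and the histogram density of equation (2) can be sketched in Python. This is a minimal sketch under two assumptions that the text above does not spell out: equation (3) is taken to be w = int[W_k (t − T_0) / (T_1 − T_0)] + 1, with t = T_1 clamped into the last interval, and φ_lk is treated as the probability mass of interval l (consistent with the Dirichlet prior on φ), so the density inside that interval is φ_lk W_k / (T_1 − T_0). If φ_lk is instead read as the density value itself, the division by the interval width drops out. The function names are hypothetical.

```python
def bin_index(t: float, T0: float, T1: float, W: int) -> int:
    """Return the 1-based interval w (1 <= w <= W) of [T0, T1] that contains t.

    Assumes equation (3) has the form int[W * (t - T0) / (T1 - T0)] + 1;
    t == T1 is clamped into the last interval.
    """
    if not T0 <= t <= T1:
        raise ValueError("t must lie in the domain [T0, T1]")
    w = int(W * (t - T0) / (T1 - T0)) + 1
    return min(w, W)


def histogram_density(t: float, phi_k: list[float], T0: float, T1: float) -> float:
    """Density of topic k at t, assuming phi_k[l - 1] is the probability mass
    of interval l (sum(phi_k) == 1) and every interval has width (T1 - T0) / W."""
    W = len(phi_k)
    width = (T1 - T0) / W
    return phi_k[bin_index(t, T0, T1, W) - 1] / width
```

For example, on the domain [0, 10] with W = 4, the value t = 2.5 falls in interval 2, and t = 10 is clamped into interval 4.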

Next, a Dirichlet distribution, which is the conjugate prior, is assumed for each of the two kinds of parameters appearing in the generative model of equations (1) to (3), namely the weights θ_ku and the histogram shapes φ_lk, as in equation (4).
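To make the generative model of equations (1) to (4) concrete, the following Python sketch draws synthetic data from it. It assumes that θ_·u ~ Dirichlet(α) for each individual, φ_·k ~ Dirichlet(β, ..., β) over the W_k intervals of each topic (the symmetric form adopted for equation (5)), z_j ~ Categorical(θ_·u_j), the interval of t_j is drawn from φ_·k, and t_j is uniform within that interval, which is what a piecewise-constant histogram density implies. Function and argument names are hypothetical.

```python
import random


def dirichlet(params, rng):
    """Draw from Dirichlet(params) by normalising independent Gamma(a, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    s = sum(g)
    return [x / s for x in g]


def sample_data(n_per_individual, alpha, beta, W, T0, T1, seed=0):
    """Return ([(u_j, z_j, t_j), ...], phi) drawn from the assumed generative model."""
    rng = random.Random(seed)
    K = len(alpha)
    span = T1 - T0
    # One histogram per topic: phi[k] ~ Dirichlet(beta, ..., beta) over W[k] intervals.
    phi = [dirichlet([beta] * W[k], rng) for k in range(K)]
    data = []
    for u, n_u in enumerate(n_per_individual):
        theta_u = dirichlet(alpha, rng)  # topic weights of individual u
        for _ in range(n_u):
            k = rng.choices(range(K), weights=theta_u)[0]    # topic z_j
            l = rng.choices(range(W[k]), weights=phi[k])[0]  # 0-based interval
            t = T0 + (l + rng.random()) * span / W[k]        # uniform inside it
            data.append((u, k, t))
    return data, phi
```

Data drawn this way can serve as a check for an estimator: the recovered histograms should approach the sampled `phi` as the number of observations grows.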

From equations (1) to (4), the joint probability of the topic assignments z ≡ (z_1, z_2, ..., z_N) and the observations t ≡ (t_1, t_2, ..., t_N) is obtained as equation (5). Here, N_ku is the number of continuous values of individual u assigned to topic k, N_kl is the number of continuous values assigned to topic k that fall in the l-th interval of the histogram, and N_k is the total number of continuous values assigned to topic k. To keep the model simple, all elements of the Dirichlet parameter β are set equal (β_1 = β_2 = ... = β).
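The sufficient statistics N_ku, N_kl and N_k can be tallied from a topic assignment as below. The sketch also evaluates a log joint probability under the assumption that equation (5) is the standard collapsed form obtained by integrating θ and φ out of the Dirichlet-multinomial model, with one factor W_k / (T_1 − T_0) per observation converting interval probabilities into densities. Equation (5) itself is not reproduced here, so treat `log_joint` as illustrative. Indices are 0-based, and all names are hypothetical.

```python
from math import lgamma, log


def count_statistics(u, z, t, K, n_individuals, W, T0, T1):
    """Tally N_ku[k][u], N_kl[k][l] (0-based interval l) and N_k[k]."""
    N_ku = [[0] * n_individuals for _ in range(K)]
    N_kl = [[0] * W[k] for k in range(K)]
    N_k = [0] * K
    for u_j, z_j, t_j in zip(u, z, t):
        l = min(int(W[z_j] * (t_j - T0) / (T1 - T0)), W[z_j] - 1)
        N_ku[z_j][u_j] += 1
        N_kl[z_j][l] += 1
        N_k[z_j] += 1
    return N_ku, N_kl, N_k


def log_joint(N_ku, N_kl, N_k, alpha, beta, W, T0, T1):
    """Assumed collapsed log p(z, t | alpha, beta, W)."""
    K, A = len(alpha), sum(alpha)
    lp = 0.0
    for u in range(len(N_ku[0])):
        N_u = sum(N_ku[k][u] for k in range(K))
        lp += lgamma(A) - lgamma(N_u + A)
        lp += sum(lgamma(N_ku[k][u] + alpha[k]) - lgamma(alpha[k]) for k in range(K))
    for k in range(K):
        lp += lgamma(W[k] * beta) - lgamma(N_k[k] + W[k] * beta)
        lp += sum(lgamma(n + beta) - lgamma(beta) for n in N_kl[k])
        lp += N_k[k] * log(W[k] / (T1 - T0))  # interval probability -> density
    return lp
```

As a sanity check, a single observation on [0, 1] with one topic and one interval has joint density 1, so `log_joint` returns 0.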

Equations (1) to (5) define the generative model of this embodiment. This concludes the description of its principle.

The calculation unit 20 includes a topic estimation unit 30 and a probability density function estimation unit 32.

The topic estimation unit 30 takes as input the set of ID-tagged continuous-value data received by the input unit 10. For each continuous value in the set, it computes the topic k to which that value belongs according to the posterior probability that the value belongs to each topic. This posterior depends on N_ku, the number of continuous values of individual u assigned to topic k (for each individual u and each topic k), and on N_kl, the number of continuous values assigned to topic k that fall in interval l of the histogram for topic k (for each interval l). From the resulting assignments, the unit recomputes N_ku and N_kl and uses them to update the parameter α, which represents the distribution of the weights on the histograms corresponding to the topics, and the parameter β, which represents the distribution of the histograms. The topic estimation unit 30 repeats this topic computation and parameter update.

The specific computation performed by the topic estimation unit 30 is described in detail below. The topic estimation unit 30 and the probability density function estimation unit 32 (described later) can estimate all of the unknown parameters α, β, θ, and φ by computing the posterior probability p(z | t, α, β, W) of the topic assignments z ≡ (z_1, z_2, ..., z_N).

Because the posterior probability is difficult to handle analytically, the topic estimation unit 30 first uses equation (6), the formula for Gibbs sampling the topic z_j of each continuous value t_j, derived from the joint probability of equation (5). Based on α, β, N_ku, N_kl, and N_k, it generates and retains P samples (z^(1), z^(2), ..., z^(P)) of z ≡ (z_1, z_2, ..., z_N). The superscript −j in equation (6) denotes the subset of all N continuous values with the j-th value excluded. In practice, the unit retains not the samples of z themselves but their sufficient statistics (N_ku^(1), N_ku^(2), ..., N_ku^(P)) and (N_kl^(1), N_kl^(2), ..., N_kl^(P)).
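A single sweep of the collapsed Gibbs sampler can be sketched as follows. Equation (6) is not reproduced above, so this assumes the form that follows from the collapsed model in the same way as for latent Dirichlet allocation: p(z_j = k | z_−j, t) ∝ (N_{k u_j}^−j + α_k) · (N_kl^−j + β) / (N_k^−j + W_k β) · W_k / (T_1 − T_0), where l is the interval of topic k's histogram that contains t_j. The last factor matters because the topics may use different division numbers W_k. Counts are updated in place, which matches the remark that only N_ku and N_kl, not z, need to be retained. Names are hypothetical.

```python
import random


def interval(t_j, W_k, T0, T1):
    """0-based interval of a W_k-interval histogram on [T0, T1] containing t_j."""
    return min(int(W_k * (t_j - T0) / (T1 - T0)), W_k - 1)


def gibbs_sweep(u, t, z, N_ku, N_kl, N_k, alpha, beta, W, T0, T1, rng):
    """Resample every z_j once; z, N_ku, N_kl and N_k are modified in place."""
    K = len(alpha)
    span = T1 - T0
    for j in range(len(t)):
        # Remove observation j from the counts (the "-j" statistics).
        k_old = z[j]
        N_ku[k_old][u[j]] -= 1
        N_kl[k_old][interval(t[j], W[k_old], T0, T1)] -= 1
        N_k[k_old] -= 1
        # Unnormalised conditional probability of each topic.
        weights = []
        for k in range(K):
            l = interval(t[j], W[k], T0, T1)
            weights.append((N_ku[k][u[j]] + alpha[k])
                           * (N_kl[k][l] + beta) / (N_k[k] + W[k] * beta)
                           * W[k] / span)
        k_new = rng.choices(range(K), weights=weights)[0]
        # Add observation j back under its newly sampled topic.
        z[j] = k_new
        N_ku[k_new][u[j]] += 1
        N_kl[k_new][interval(t[j], W[k_new], T0, T1)] += 1
        N_k[k_new] += 1
    return z
```

Running P such sweeps and copying N_ku and N_kl after each one yields the retained statistics (N_ku^(p), N_kl^(p)).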

While generating the P samples, the topic estimation unit 30 also updates the unknown parameters α and β P times using the stochastic EM method of equation (7) below (see Non-Patent Document 2 above).

Specifically, the topic estimation unit 30 updates α and β P times according to the update equations (8) below, based on α, β, N_ku, N_kl, the total number N_u of continuous values of individual u, and N_k.

The topic estimation unit 30 then takes the parameter values α^(P) and β^(P) after the P-th update as the estimates of α and β. The values N_ku^(p), N_kl^(p), and N_k are also updated at every update.
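Equation (8) is not reproduced above. A common choice for the M-step of a stochastic EM scheme with Dirichlet priors is Minka's fixed-point iteration, which uses exactly the quantities listed for equation (8): α, β, N_ku, N_kl, N_u and N_k. The sketch below assumes that form, so it may differ from the patent's own update. A small digamma function is included so the block needs only the standard library. Names are hypothetical.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: recurrence up to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def update_alpha(alpha, N_ku, N_u):
    """One fixed-point step for the (asymmetric) Dirichlet parameter alpha.

    N_ku[k][u]: number of individual u's values in topic k; N_u[u]: total for u.
    """
    A = sum(alpha)
    denom = sum(digamma(n_u + A) - digamma(A) for n_u in N_u)
    return [a * sum(digamma(n + a) - digamma(a) for n in N_ku[k]) / denom
            for k, a in enumerate(alpha)]


def update_beta(beta, N_kl, N_k, W):
    """One fixed-point step for the symmetric Dirichlet parameter beta."""
    num = sum(digamma(n + beta) - digamma(beta) for row in N_kl for n in row)
    den = sum(W_k * (digamma(n_k + W_k * beta) - digamma(W_k * beta))
              for W_k, n_k in zip(W, N_k))
    return beta * num / den
```

If a topic receives no data, this update drives its α_k to zero, so an implementation would typically clip α_k and β to a small positive floor before the next sweep.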

The probability density function estimation unit 32 estimates the shape φ̃_lk of each interval l of each histogram corresponding to topic k, and the weight θ̃_ku of each such histogram in each individual u. It does so from the parameter α (the distribution of the weights) and the parameter β (the distribution of the histograms) updated by the topic estimation unit 30, together with the counts N_ku (for each individual u and topic k) and N_kl (for each interval l of each topic's histogram) obtained at each iteration. Then, for each of the individuals u, it estimates the probability density function p̃_u(t | φ̃, θ̃_·u), expressed as a linear sum of the histograms corresponding to the topics, from the weights θ̃_ku estimated for that individual and the estimated shapes φ̃_lk.

The specific computation performed by the probability density function estimation unit 32 is described in detail below. The unit estimates the probability density function of each individual u using α, β, (N_ku^(1), N_ku^(2), ..., N_ku^(P)), and (N_kl^(1), N_kl^(2), ..., N_kl^(P)) obtained by the topic estimation unit 30.

First, the parameter φ_lk, which represents the shape of interval l of the histogram for topic k (1 ≤ k ≤ K), is estimated by its posterior mean, equation (9), based on β, N_kl, and N_k. Next, the weight θ_ku of topic k in each individual u is estimated by its posterior mean, equation (10), based on α, N_ku, and the total number N_u of continuous values of individual u. Finally, using the results of equations (9) and (10), the estimate p̃_u(t | φ̃, θ̃_·u) of the probability density function of each individual u is computed by equation (11).
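Equations (9) and (10) are posterior means averaged over the P retained samples. Under the Dirichlet-multinomial model, the posterior mean of φ_·k given one sample is (N_kl + β) / (N_k + W_k β), and that of θ_·u is (N_ku + α_k) / (N_u + Σ_k α_k); the sketch below averages these over samples. This is an assumed reading of equations (9) and (10), which are not reproduced above. Names are hypothetical, and indices are 0-based.

```python
def estimate_phi(N_kl_samples, N_k_samples, beta, W):
    """phi[k][l]: average over samples p of (N_kl^(p) + beta) / (N_k^(p) + W_k * beta)."""
    P = len(N_kl_samples)
    return [[sum((N_kl_samples[p][k][l] + beta) / (N_k_samples[p][k] + W_k * beta)
                 for p in range(P)) / P
             for l in range(W_k)]
            for k, W_k in enumerate(W)]


def estimate_theta(N_ku_samples, alpha, N_u):
    """theta[k][u]: average over samples p of (N_ku^(p) + alpha_k) / (N_u + sum(alpha))."""
    P = len(N_ku_samples)
    A = sum(alpha)
    return [[sum((N_ku_samples[p][k][u] + a) / (N_u[u] + A) for p in range(P)) / P
             for u in range(len(N_u))]
            for k, a in enumerate(alpha)]
```

Each row of `estimate_phi` sums to 1 over the intervals, and each individual's weights from `estimate_theta` sum to 1 over the topics, so the mixture in equation (11) is itself a normalised density.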

Here, the histogram p(t | z_j = k, φ̃_k, W_k) is defined by equation (2).
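Equation (11) evaluates an individual's density as the θ̃-weighted sum of the topic histograms. A minimal sketch, again assuming φ̃_lk is the probability mass of interval l, so that each histogram contributes θ̃_ku φ̃_lk W_k / (T_1 − T_0) at a point t in interval l. The function name and the 0-based list layout are hypothetical.

```python
def individual_density(t, theta_u, phi, T0, T1):
    """p_u(t) = sum_k theta_u[k] * (density of topic k's histogram at t).

    theta_u[k]: weight of topic k for this individual;
    phi[k][l]: probability mass of interval l (0-based) of topic k's histogram.
    """
    if not T0 <= t <= T1:
        return 0.0  # outside the common domain T
    span = T1 - T0
    total = 0.0
    for weight, phi_k in zip(theta_u, phi):
        W_k = len(phi_k)
        l = min(int(W_k * (t - T0) / span), W_k - 1)
        total += weight * phi_k[l] * W_k / span
    return total
```

For instance, with θ̃_·u = (0.5, 0.5), a one-interval histogram and a two-interval histogram with masses (0, 1) on [0, 2], the density is 0.25 on [0, 1) and 0.75 on [1, 2], which integrates to 1. The two topics have different W_k, which is the situation the second embodiment targets.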

In the probability density function estimation unit 32, the probability density function of individual u is represented by θ̃_ku, φ̃_lk, W_k, and T. The unit outputs to the output unit 50 the weights θ̃_ku (1 ≤ k ≤ K) for each individual u; the division number W_k and the shapes φ̃_lk (1 ≤ l ≤ W_k) for each topic k; and the common domain T ≡ [T_0, T_1]. Table 2 shows an example output for K = 3 topics.

<Configuration of the continuous value prediction apparatus according to the first embodiment of the present invention>

Next, the configuration of the continuous value prediction apparatus according to the first embodiment of the present invention is described. As shown in FIG. 2, the continuous value prediction apparatus 200 according to the first embodiment can be implemented on a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the continuous value prediction processing routine described later. Functionally, as shown in FIG. 2, the apparatus 200 includes an input unit 210, a calculation unit 220, and an output unit 250.

The input unit 210 receives the probability density function p̃_u(t | φ̃, θ̃_·u) of the individual u to be predicted, estimated in advance by the probability density function estimation apparatus 100. This function is the linear sum of the histograms corresponding to the topics k, built from the weights θ̃_ku of those histograms and the shapes φ̃_lk of their intervals l.

The calculation unit 220 includes a continuous value prediction unit 230.

Based on the probability density function p̃_u(t | φ̃, θ̃_·u) received by the input unit 210, the continuous value prediction unit 230 computes, by equation (12) below, the expected value E[t^u] of the continuous values that the target individual u will generate in the future, and outputs the result for individual u to the output unit 250.

Here, ΔT ≡ T_1 − T_0.
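Because each histogram is constant within an interval, the mean of interval l (1-based) is its midpoint, T_0 + (l − 1/2) ΔT / W_k. Assuming equation (12) is the resulting closed form, E[t^u] = Σ_k θ̃_ku Σ_l φ̃_lk (T_0 + (l − 1/2) ΔT / W_k) with φ̃_lk read as interval probabilities, the expectation can be computed as follows. The function name is hypothetical.

```python
def expected_value(theta_u, phi, T0, T1):
    """E[t^u] = sum_k theta_u[k] * sum_l phi[k][l] * (midpoint of interval l)."""
    dT = T1 - T0
    total = 0.0
    for weight, phi_k in zip(theta_u, phi):
        W_k = len(phi_k)
        # With 0-based l, the midpoint of interval l is T0 + (l + 0.5) * dT / W_k.
        total += weight * sum(p * (T0 + (l + 0.5) * dT / W_k)
                              for l, p in enumerate(phi_k))
    return total
```

With θ̃_·u = (0.5, 0.5), a one-interval histogram and a two-interval histogram with masses (0, 1) on [0, 2], this returns 1.25, matching 0.25 · 0.5 + 0.75 · 1.5 obtained by integrating t p̃_u(t) directly.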

<Operation of the probability density function estimation apparatus according to the first embodiment of the present invention>

Next, the operation of the probability density function estimation apparatus 100 according to the first embodiment of the present invention is described. When the input unit 10 receives a set of continuous-value data tagged with individual IDs, the apparatus 100 executes the probability density function estimation processing routine shown in FIG. 3.

First, in step S100, the set of continuous-value data received by the input unit 10 is acquired.

Next, in step S102, for each continuous value in the set acquired in step S100, the topic k to which it belongs is computed according to the posterior probability of equation (6). This posterior depends on N_ku, the number of continuous values of individual u assigned to topic k (for each individual u and topic k), and on N_kl, the number of continuous values in interval l of the histogram for topic k. On the first pass, initial values of N_ku and N_kl are used; on later passes, the values obtained in step S104, described below, are used.

In step S104, the parameter α, which represents the distribution of the weights on the histograms corresponding to the topics, and the parameter β, which represents the distribution of the histograms, are updated by equation (8), using the counts N_ku and N_kl obtained from the topic assignments computed in step S102.

In step S105, it is determined whether steps S102 to S104 have been repeated a predetermined number of times. If not, the process returns to step S102 and repeats steps S102 to S104; if so, it proceeds to step S106.

In step S106, the shape φ̃_lk of each interval l of each histogram corresponding to topic k, and the weight θ̃_ku of each such histogram in each individual u, are estimated by equations (9) and (10), based on the parameters α and β updated in step S104 and the counts N_ku and N_kl obtained at each iteration.

In step S108, for each of the individuals u, the probability density function p̃_u(t | φ̃, θ̃_·u), expressed as a linear sum of the histograms corresponding to the topics, is estimated by equation (11) from the weights θ̃_ku and the shapes φ̃_lk estimated in step S106. The result is output to the output unit 50, and the probability density function estimation processing routine ends.

As described above, the probability density function estimation apparatus according to the first embodiment repeatedly (i) computes the topic k of each continuous value according to the posterior probability of equation (6), and (ii) updates the weight-distribution parameter α and the histogram-distribution parameter β by equation (8), using the counts N_ku (for each individual and topic) and N_kl (for each interval of each topic's histogram) derived from the topic assignments. It then estimates the interval shapes φ̃_lk and the weights θ̃_ku by equations (9) and (10) from the updated α and β and the counts obtained at each iteration, and estimates the probability density function p̃_u(t | φ̃, θ̃_·u) by equation (11). In this way, the apparatus can estimate a probability density function for accurately predicting the continuous values generated by an individual.

<Operation of the continuous value prediction apparatus according to the embodiment of the present invention>

Next, the operation of the continuous value prediction apparatus 200 according to the embodiment of the present invention is described. When the input unit 210 receives the probability density function estimated by the probability density function estimation apparatus 100 for the target individual u, the apparatus 200 executes the continuous value prediction processing routine shown in FIG. 4.

In step S200, the expected value E[t^u] of the continuous values that the target individual u will generate in the future is computed by equation (12), based on the probability density function received by the input unit 210.

In step S202, the computed expected value E[t^u] for individual u is output to the output unit 250, and the continuous value prediction processing routine ends.

As described above, the prediction apparatus according to the first embodiment computes, by equation (12), the expected value of the continuous values that the target individual will generate in the future. It uses the probability density function built from the weights of the topic histograms and the shapes of their intervals l, both estimated in advance for that individual, and can thereby accurately predict the continuous values generated by the individual.

<Configuration of the probability density function estimation apparatus according to the second embodiment of the present invention>

Next, the configuration of the probability density function estimation apparatus according to the second embodiment of the present invention is described. Parts identical to those of the first embodiment are given the same reference numerals, and their description is omitted.

As shown in FIG. 1, the probability density function estimation apparatus 100 according to the second embodiment can be implemented on a computer that includes a CPU, a RAM, and a ROM storing various data and a program for executing the probability density function estimation processing routine described later. Functionally, as shown in FIG. 1, the apparatus 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 according to the second embodiment receives a set of continuous-value data, each tagged with an individual ID, observed from the multiple individuals being analyzed. It also receives the domain of the probability density function (the range of values the continuous values can take), T ≡ [T_0, T_1].

The calculation unit 20 according to the second embodiment includes a topic estimation unit 30 and a probability density function estimation unit 32.

The topic estimation unit 30 according to the second embodiment computes, for each continuous value in the set received by the input unit 10, the topic k to which it belongs, according to the posterior probability that the value belongs to each topic k. Next, for each histogram corresponding to a topic k, it computes the histogram division number W according to the posterior probability of W, which depends on the counts N_ku (the number of continuous values of individual u assigned to topic k) and N_kl (the number of continuous values in each interval l of the histogram for topic k) obtained from the topic assignments. It then updates the parameter α, which represents the distribution of the weights on the histograms, and the parameter β, which represents the distribution of the histograms, based on N_ku, N_kl, and the division numbers W computed for the histograms. The topic estimation unit 30 repeats the topic computation, the computation of the division numbers W, and the parameter update.

The topic estimation unit 30 computes the topic k in the same way as in the first embodiment.

Next, the method by which the topic estimation unit 30 estimates the histogram division numbers W and updates the parameters α and β is described.

The topic estimation unit 30 estimates from the continuous-value data the division number of each topic's histogram, W ≡ (W_1, W_2, ..., W_K), which was treated as known in the first embodiment. Assuming a uniform distribution, equation (13), as a non-informative prior on the division numbers, it performs Gibbs sampling not only of the topic assignments z by equation (6) but also of the division numbers W by equation (14) below, based on β, N_kl, and N_k.

Empirically, the distribution of the division number W is sharply peaked, so the division numbers obtained in the last sampling, W^(P) ≡ (W_1^(P), W_2^(P), ..., W_K^(P)), are used as the estimate.
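Sampling W_k under the uniform prior of equation (13) requires the marginal likelihood of topic k's data for each candidate W_k, with φ_·k integrated out. The sketch below assumes equation (14) has the Dirichlet-multinomial form Γ(W_k β) / Γ(N_k + W_k β) · Π_l Γ(N_kl + β) / Γ(β) · (W_k / ΔT)^N_k, where the counts N_kl are recomputed for each candidate binning. The upper limit `W_max` of the uniform prior is a hypothetical parameter, since the text does not state one, and the function names are hypothetical as well.

```python
import math
import random


def log_marginal(ts, W, beta, T0, T1):
    """Assumed log p(topic-k data | W_k = W) with phi_k integrated out."""
    span = T1 - T0
    counts = [0] * W
    for t in ts:
        counts[min(int(W * (t - T0) / span), W - 1)] += 1
    n = len(ts)
    lp = math.lgamma(W * beta) - math.lgamma(n + W * beta)
    lp += sum(math.lgamma(c + beta) - math.lgamma(beta) for c in counts)
    lp += n * math.log(W / span)  # interval probability -> density
    return lp


def sample_W(ts, beta, T0, T1, W_max, rng: random.Random):
    """Draw W_k from its posterior under a uniform prior on 1..W_max."""
    candidates = range(1, W_max + 1)
    logs = [log_marginal(ts, W, beta, T0, T1) for W in candidates]
    m = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(x - m) for x in logs]
    return rng.choices(candidates, weights=weights)[0]
```

Here `ts` holds the continuous values currently assigned to topic k. The Γ-ratio terms penalise needlessly fine histograms, while the (W_k / ΔT)^N_k term rewards concentrating mass where the data lie, which is what keeps the posterior over W_k sharply peaked in practice.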

As in the first embodiment, the topic estimation unit 30 updates the unknown parameters α and β P times using the stochastic EM method, according to equation (15) below.

In the prior art, several methods for determining the number of divisions from data have been proposed, but only for the histogram method applied to a single individual (see Non-Patent Documents 1, 3, and 4 above). No method has yet been proposed that determines the division number W of each histogram from data under the two assumptions of the second embodiment: (i) the probability density functions of all individuals are expressed as linear sums of K (≠ 1) histograms, and (ii) differences between individuals' probability density functions are expressed as differences in the weights of the linear sum. This capability is unique to the probability density function estimation apparatus of this embodiment.

The probability density function estimation unit 32 according to the second embodiment estimates the shape $\tilde{\phi}_{lk}$ of each section $l$ of the histogram corresponding to each topic $k$, and the weight $\tilde{\theta}_{ku}$ of the histogram corresponding to each topic $k$ in each individual $u$. The estimation uses the following inputs: the parameter $\alpha$ representing the weight distribution and the parameter $\beta$ representing the histogram distribution, both updated by the topic estimation unit 30; the number of divisions $W_k$ calculated for the histogram corresponding to each topic $k$; the number $N_{ku}$ of continuous value data of individual $u$ belonging to topic $k$, obtained at each iteration for each individual $u$ and each topic $k$; and the number $N_{kl}$ of continuous value data included in section $l$ of the histogram corresponding to topic $k$, obtained at each iteration for each section $l$. Then, for each of the plurality of individuals $u$, the probability density function estimation unit 32 estimates the probability density function $\tilde{p}_u(t \mid \tilde{\phi}, \tilde{\theta}_{\cdot u})$, expressed as the linear sum of the histograms corresponding to the topics $k$, from the weights $\tilde{\theta}_{ku}$ estimated for that individual and the estimated shapes $\tilde{\phi}_{lk}$. The specific method for calculating the probability density function in the probability density function estimation unit 32 according to the second embodiment is the same as in the first embodiment.
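The linear-sum density described above can be sketched as follows. This is a minimal illustration, not a reproduction of the patent's own formula: it assumes (hypothetically) that every histogram covers a common range [t_min, t_max) with equal-width sections, so that histogram $k$ contributes the density $\tilde{\phi}_{lk}$ divided by the section width at any $t$ in section $l$. The function name `histogram_mixture_density` and its arguments are illustrative only.

```python
from typing import Sequence


def histogram_mixture_density(
    t: float,
    theta_u: Sequence[float],        # theta~_{ku} for one individual u, one entry per topic k
    phi: Sequence[Sequence[float]],  # phi[k][l] = phi~_{lk}, shape of section l of histogram k
    t_min: float,
    t_max: float,
) -> float:
    """Evaluate p_u(t) = sum_k theta_ku * phi_lk / width_k for the section l containing t.

    Assumption (hypothetical): each histogram k has len(phi[k]) equal-width
    sections on [t_min, t_max); W_k may differ between topics.
    """
    if not (t_min <= t < t_max):
        return 0.0  # outside the assumed support
    density = 0.0
    for theta_k, phi_k in zip(theta_u, phi):
        w = len(phi_k)                              # number of divisions W_k
        width = (t_max - t_min) / w                 # width of each section
        l = min(int((t - t_min) / width), w - 1)    # section index containing t
        density += theta_k * phi_k[l] / width       # histogram k's density at t, weighted
    return density
```

Because each row of `phi` sums to 1 and each `theta_u` sums to 1, the resulting function integrates to 1 over [t_min, t_max).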

The remaining configuration of the probability density function estimation device 100 according to the second embodiment is the same as that of the first embodiment, so its description is omitted.

<Configuration of Continuous Value Prediction Device According to Second Embodiment of the Present Invention>

The configuration of the continuous value prediction device according to the second embodiment of the present invention is the same as that of the continuous value prediction device according to the first embodiment, so a detailed description is omitted.

<Operation of Probability Density Function Estimation Device According to Second Embodiment of the Present Invention>

Next, the operation of the probability density function estimation device 100 according to the second embodiment of the present invention will be described. Steps that operate in the same way as in the first embodiment are given the same reference signs, and their description is omitted.

When the input unit 10 receives a set of continuous value data, each datum assigned the individual ID of its individual, the probability density function estimation device 100 executes the probability density function estimation processing routine shown in FIG. 5.

In step S300, for each topic $k$, the number of divisions $W_k$ of the histogram corresponding to topic $k$ is calculated according to the posterior probability of $W_k$ given in equation (14) above. This posterior probability is based on the number $N_{ku}$ of continuous value data of each individual $u$ belonging to topic $k$, and the number $N_{kl}$ of continuous value data included in each section $l$ of the histogram corresponding to topic $k$. Both counts are obtained from the topic calculation result of step S102.
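Equation (14) is not reproduced in this section, so the following is only a sketch of how a posterior over the number of divisions can be formed for one topic. It substitutes a standard Bayesian-histogram marginal likelihood, with the section shapes integrated out under a symmetric Dirichlet prior with parameter `beta` (in the style of Knuth's optimal-binning rule). It also assumes equal-width sections on a common range [t_min, t_max) and a uniform prior over a hypothetical list of candidate values. Because the counts $N_{kl}$ change with $W$, it takes the raw values currently assigned to topic $k$ and re-bins them for each candidate. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def bin_counts(values: Sequence[float], w: int, t_min: float, t_max: float) -> List[int]:
    """Count how many values fall into each of w equal-width sections on [t_min, t_max)."""
    counts = [0] * w
    width = (t_max - t_min) / w
    for t in values:
        counts[min(int((t - t_min) / width), w - 1)] += 1
    return counts


def log_marginal_w(values: Sequence[float], w: int, t_min: float, t_max: float, beta: float) -> float:
    """log p(values | W = w), section shapes integrated out under Dirichlet(beta, ..., beta)."""
    n = len(values)
    counts = bin_counts(values, w, t_min, t_max)
    return (
        n * math.log(w / (t_max - t_min))            # each value's density carries 1 / width
        + math.lgamma(w * beta) - w * math.lgamma(beta)
        - math.lgamma(n + w * beta)
        + sum(math.lgamma(c + beta) for c in counts)
    )


def sample_num_bins(values: Sequence[float], candidates: Sequence[int], t_min: float,
                    t_max: float, beta: float, rng: random.Random) -> int:
    """Draw W from p(W | values), assuming a uniform prior over the candidate values."""
    logs = [log_marginal_w(values, w, t_min, t_max, beta) for w in candidates]
    top = max(logs)
    weights = [math.exp(x - top) for x in logs]  # subtract the max for numerical stability
    return rng.choices(list(candidates), weights=weights, k=1)[0]
```

With strongly bimodal data, finer binnings receive a higher marginal likelihood than a single section. The `n * log(w / range)` term rewards resolution, and the Gamma-function terms penalize spreading few values over many sections.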

In step S302, it is determined whether the processing of steps S102, S300, and S104 has been repeated a predetermined number of times. If not, the process returns to step S102 and steps S102, S300, and S104 are repeated. If so, the process proceeds to step S304.
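The repetition controlled by step S302 can be sketched as a collapsed Gibbs sampler over the topic assignments. This sketch rests on several assumptions and is not the patent's equation (6). With datum $i$ removed from the counts, it draws the topic of the datum with probability proportional to $(N_{ku}+\alpha)\,\frac{N_{kl}+\beta}{N_k+W_k\beta}\,\frac{W_k}{t_{\max}-t_{\min}}$. This is the usual collapsed form for a Dirichlet-multinomial topic model with equal-width histogram sections. The hyperparameters $\alpha$, $\beta$ and the numbers of divisions $W_k$ are held fixed, and steps S300 and S104 are only marked by a comment. Names such as `gibbs_sweep` and `run_sampler` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Datum = Tuple[int, float]  # (individual index u, continuous value t)


def bin_index(t: float, w: int, t_min: float, t_max: float) -> int:
    """Index of the equal-width section (one of w on [t_min, t_max)) containing t."""
    width = (t_max - t_min) / w
    return min(int((t - t_min) / width), w - 1)


def init_counts(data: Sequence[Datum], z: Sequence[int], num_individuals: int,
                num_bins: Sequence[int], t_min: float, t_max: float):
    """Build the count tables N_ku, N_kl and N_k from the topic assignments z."""
    num_topics = len(num_bins)
    n_ku = [[0] * num_individuals for _ in range(num_topics)]
    n_kl = [[0] * w for w in num_bins]
    n_k = [0] * num_topics
    for (u, t), k in zip(data, z):
        n_ku[k][u] += 1
        n_kl[k][bin_index(t, num_bins[k], t_min, t_max)] += 1
        n_k[k] += 1
    return n_ku, n_kl, n_k


def gibbs_sweep(data, z, n_ku, n_kl, n_k, num_bins, t_min, t_max, alpha, beta, rng):
    """Resample the topic of every datum once; z and the counts are updated in place."""
    span = t_max - t_min
    topics = range(len(num_bins))
    for i, (u, t) in enumerate(data):
        k_old = z[i]
        # Remove datum i so the counts describe all other data.
        n_ku[k_old][u] -= 1
        n_kl[k_old][bin_index(t, num_bins[k_old], t_min, t_max)] -= 1
        n_k[k_old] -= 1
        weights = []
        for k in topics:
            w = num_bins[k]
            l = bin_index(t, w, t_min, t_max)
            topic_term = n_ku[k][u] + alpha
            value_term = (n_kl[k][l] + beta) / (n_k[k] + w * beta) * (w / span)
            weights.append(topic_term * value_term)
        k_new = rng.choices(topics, weights=weights, k=1)[0]
        # Add datum i back under its newly sampled topic.
        z[i] = k_new
        n_ku[k_new][u] += 1
        n_kl[k_new][bin_index(t, num_bins[k_new], t_min, t_max)] += 1
        n_k[k_new] += 1


def run_sampler(data, num_individuals, num_bins, t_min, t_max, alpha, beta, iters, seed=0):
    """Random initialisation followed by a fixed number of sweeps."""
    rng = random.Random(seed)
    num_topics = len(num_bins)
    z: List[int] = [rng.randrange(num_topics) for _ in data]
    n_ku, n_kl, n_k = init_counts(data, z, num_individuals, num_bins, t_min, t_max)
    for _ in range(iters):  # "a predetermined number of times" (step S302)
        gibbs_sweep(data, z, n_ku, n_kl, n_k, num_bins, t_min, t_max, alpha, beta, rng)
        # Step S300 (resample each W_k, then rebuild n_kl) and step S104
        # (update alpha and beta) would go here; both are omitted in this sketch.
    return z, n_ku, n_kl, n_k
```

Keeping the count tables incrementally updated, rather than rebuilding them for every datum, is what makes each sweep linear in the number of data times the number of topics.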

In step S304, the shape $\tilde{\phi}_{lk}$ of each section $l$ of the histogram corresponding to each topic $k$, and the weight $\tilde{\theta}_{ku}$ of the histogram corresponding to each topic $k$ in each individual $u$, are estimated according to equations (9) and (10) above. The estimation uses the following inputs: the parameter $\alpha$ representing the distribution of weights for the histograms corresponding to the topics, and the parameter $\beta$ representing the distribution of the histograms, both updated in step S104; the number of divisions $W_k$ calculated in step S300 for the histogram corresponding to each topic $k$; the number $N_{ku}$ of continuous value data of individual $u$ belonging to topic $k$, obtained at each iteration; and the number $N_{kl}$ of continuous value data included in section $l$ of the histogram, obtained at each iteration.
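Equations (9) and (10) are not reproduced in this section. As a hedged sketch, the following assumes they take the common posterior-mean form for Dirichlet-multinomial models, $\tilde{\phi}_{lk} = (N_{kl}+\beta)/(N_k + W_k\beta)$ and $\tilde{\theta}_{ku} = (N_{ku}+\alpha)/(N_u + K\alpha)$, where $N_k$ and $N_u$ are the corresponding row and column totals. It computes them from a single snapshot of the counts. The device described above uses the counts obtained at every iteration, which this sketch does not model. The function names are hypothetical.

```python
from typing import List, Sequence


def estimate_phi(n_kl: Sequence[Sequence[int]], beta: float) -> List[List[float]]:
    """phi~_{lk}: smoothed share of topic k's data in section l (each row sums to 1).

    W_k is taken as len(n_kl[k]), so topics may use different numbers of divisions.
    """
    phi = []
    for counts in n_kl:
        total = sum(counts) + len(counts) * beta   # N_k + W_k * beta
        phi.append([(c + beta) / total for c in counts])
    return phi


def estimate_theta(n_ku: Sequence[Sequence[int]], alpha: float) -> List[List[float]]:
    """theta~_{ku}: smoothed share of individual u's data assigned to topic k.

    Returned as theta[u][k] so each individual's weights sum to 1.
    """
    num_topics = len(n_ku)
    num_individuals = len(n_ku[0])
    theta = []
    for u in range(num_individuals):
        n_u = sum(n_ku[k][u] for k in range(num_topics))   # N_u
        theta.append([(n_ku[k][u] + alpha) / (n_u + num_topics * alpha)
                      for k in range(num_topics)])
    return theta
```

The outputs have exactly the shape expected by a linear-sum density over histograms: one weight vector per individual and one normalized section-shape vector per topic.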

The other operations of the probability density function estimation device 100 according to the second embodiment are the same as those of the first embodiment, so their description is omitted.

As described above, the probability density function estimation device according to the second embodiment repeats three steps. First, it calculates the topic $k$ to which each continuous value datum belongs according to the posterior probability shown in equation (6) above. Second, it calculates the number of divisions $W$ of each histogram according to the posterior probability of $W$ shown in equation (14) above. This calculation is based on the number $N_{ku}$ of continuous value data of individual $u$ belonging to topic $k$ and the number $N_{kl}$ of continuous value data included in each section $l$ of the histogram corresponding to topic $k$. Third, it updates the parameter $\alpha$ representing the weight distribution and the parameter $\beta$ representing the histogram distribution according to equation (15) above, based on the counts $N_{ku}$ and $N_{kl}$ obtained from the topic calculation result. After the repetitions, the device uses the updated parameters $\alpha$ and $\beta$, the numbers of divisions $W$, and the counts $N_{ku}$ and $N_{kl}$ obtained at each iteration to estimate, according to equations (9) and (10) above, the shape $\tilde{\phi}_{lk}$ of each section of the histogram corresponding to each topic $k$ and the weight $\tilde{\theta}_{ku}$ of the histogram corresponding to each topic $k$ in each individual. Finally, it estimates the probability density function $\tilde{p}_u(t \mid \tilde{\phi}, \tilde{\theta}_{\cdot u})$ according to equation (11) above from the weights $\tilde{\theta}_{ku}$ and the shapes $\tilde{\phi}_{lk}$. In this way, the device can estimate a probability density function for accurately predicting the continuous values generated by an individual.

<Operation of Continuous Value Prediction Device According to Second Embodiment of the Present Invention>

The operation of the continuous value prediction device according to the second embodiment of the present invention is the same as that of the continuous value prediction device according to the first embodiment, so a detailed description is omitted.
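In the claims below, the continuous value prediction unit calculates the expected value of future continuous values from the linear-sum density. A minimal sketch follows. It assumes (hypothetically) that every histogram uses equal-width sections on a common range [t_min, t_max) and that density is uniform within each section. Under these assumptions, each section contributes its midpoint weighted by $\tilde{\phi}_{lk}$, and the histograms are mixed with the individual's weights $\tilde{\theta}_{ku}$. The function name is illustrative only.

```python
from typing import Sequence


def expected_value(
    theta_u: Sequence[float],        # theta~_{ku} for the individual to be predicted
    phi: Sequence[Sequence[float]],  # phi[k][l] = phi~_{lk}
    t_min: float,
    t_max: float,
) -> float:
    """E[t] = sum_k theta_ku * sum_l phi_lk * (midpoint of section l of histogram k)."""
    mean = 0.0
    for theta_k, phi_k in zip(theta_u, phi):
        w = len(phi_k)                    # number of divisions W_k
        width = (t_max - t_min) / w
        # Mean of a piecewise-uniform histogram: probability-weighted section midpoints.
        hist_mean = sum(p * (t_min + (l + 0.5) * width) for l, p in enumerate(phi_k))
        mean += theta_k * hist_mean
    return mean
```

For example, a single histogram with two equally weighted sections on [0, 2) has midpoints 0.5 and 1.5, so the expected value is 1.0.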

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

10, 210 Input unit
20, 220 Calculation unit
30 Topic estimation unit
32 Probability density function estimation unit
50, 250 Output unit
100 Probability density function estimation device
200 Continuous value prediction device
230 Continuous value prediction unit

Claims

1. A probability density function estimation device comprising:
a topic estimation unit that takes as input a set of continuous value data observed for a plurality of individuals, each datum being assigned the individual ID of its individual, and that repeatedly (a) calculates, for each continuous value datum included in the set, the topic to which the datum belongs according to the posterior probability that the datum belongs to each topic, and (b) updates a parameter representing the distribution of weights for each of the histograms corresponding to the topics and a parameter representing the distribution of the histograms. The posterior probability in (a) is based on the number of continuous value data of each individual belonging to each topic and, for each section of continuous values in the histogram corresponding to a topic, the number of continuous value data belonging to that topic that are included in the section. The update in (b) is based on the number of continuous value data of each individual belonging to each topic and the number of continuous value data included in each section of the histogram corresponding to each topic, both obtained from the topic calculation result; and
a probability density function estimation unit that estimates the shape of each section of each of the histograms corresponding to the topics and the weight for each of those histograms in each individual, based on the weight-distribution parameter and the histogram-distribution parameter updated by the topic estimation unit, the number of continuous value data of each individual belonging to each topic obtained at each iteration, and the number of continuous value data included in each section of the histogram corresponding to each topic obtained at each iteration, and that estimates, for each of the plurality of individuals, a probability density function expressed as a linear sum of the histograms corresponding to the topics, based on the weights estimated for that individual and the estimated shape of each section of each histogram.

2. The probability density function estimation device according to claim 1, wherein
the topic estimation unit repeatedly calculates, for each continuous value datum included in the set of continuous value data, the topic to which the datum belongs according to the posterior probability that the datum belongs to each topic;
calculates, for each of the histograms corresponding to the topics, the number of divisions of the histogram according to the posterior probability of the number of divisions, based on the number of continuous value data of each individual belonging to the topic and the number of continuous value data included in each section of the histogram corresponding to the topic, both obtained from the topic calculation result; and
updates the parameter representing the distribution of weights for each of the histograms corresponding to the topics and the parameter representing the distribution of the histograms, based on the number of continuous value data of each individual belonging to each topic and the number of continuous value data included in each section of the histogram corresponding to each topic, both obtained from the topic calculation result, and on the number of divisions calculated for each histogram; and
the probability density function estimation unit estimates the shape of each section of each of the histograms corresponding to the topics and the weight for each of those histograms in each individual, based on the weight-distribution parameter and the histogram-distribution parameter updated by the topic estimation unit, the number of divisions calculated for each histogram, the number of continuous value data of each individual belonging to each topic obtained at each iteration, and the number of continuous value data included in each section of the histogram corresponding to each topic obtained at each iteration.

3. A continuous value prediction device for predicting continuous value data of an individual to be predicted, comprising:
a continuous value prediction unit that calculates the expected value of continuous values to be generated in the future by the individual to be predicted, using a probability density function expressed as a linear sum of the histograms corresponding to the topics. The probability density function is determined by the weights of the histograms corresponding to the topics for the individual to be predicted and by the shape of each section of continuous values in each of those histograms, both estimated in advance by the probability density function estimation device according to claim 1 or 2.

4. A probability density function estimation method including:
a step in which a topic estimation unit takes as input a set of continuous value data observed for a plurality of individuals, each datum being assigned the individual ID of its individual, and repeatedly (a) calculates, for each continuous value datum included in the set, the topic to which the datum belongs according to the posterior probability that the datum belongs to each topic, and (b) updates a parameter representing the distribution of weights for each of the histograms corresponding to the topics and a parameter representing the distribution of the histograms. The posterior probability in (a) is based on the number of continuous value data of each individual belonging to each topic and, for each section of continuous values in the histogram corresponding to a topic, the number of continuous value data belonging to that topic that are included in the section. The update in (b) is based on the number of continuous value data of each individual belonging to each topic and the number of continuous value data included in each section of the histogram corresponding to each topic, both obtained from the topic calculation result; and
a step in which a probability density function estimation unit estimates the shape of each section of each of the histograms corresponding to the topics and the weight for each of those histograms in each individual, based on the weight-distribution parameter and the histogram-distribution parameter updated by the topic estimation unit, the number of continuous value data of each individual belonging to each topic obtained at each iteration, and the number of continuous value data included in each section of the histogram corresponding to each topic obtained at each iteration, and estimates, for each of the plurality of individuals, a probability density function expressed as a linear sum of the histograms corresponding to the topics, based on the weights estimated for that individual and the estimated shape of each section of each histogram.

5. The probability density function estimation method according to claim 4, wherein
in the step performed by the topic estimation unit, the topic estimation unit repeatedly calculates, for each continuous value datum included in the set of continuous value data, the topic to which the datum belongs according to the posterior probability that the datum belongs to each topic; calculates, for each of the histograms corresponding to the topics, the number of divisions of the histogram according to the posterior probability of the number of divisions, based on the number of continuous value data of each individual belonging to the topic and the number of continuous value data included in each section of the histogram corresponding to the topic, both obtained from the topic calculation result; and updates the parameter representing the distribution of weights for each of the histograms corresponding to the topics and the parameter representing the distribution of the histograms, based on those counts and on the number of divisions calculated for each histogram; and
in the step performed by the probability density function estimation unit, the shape of each section of each of the histograms corresponding to the topics and the weight for each of those histograms in each individual are estimated based on the weight-distribution parameter and the histogram-distribution parameter updated by the topic estimation unit, the number of divisions calculated for each histogram, the number of continuous value data of each individual belonging to each topic obtained at each iteration, and the number of continuous value data included in each section of the histogram corresponding to each topic obtained at each iteration.

6. A continuous value prediction method in a continuous value prediction device for predicting continuous value data of an individual to be predicted, the method including:
a step in which a continuous value prediction unit calculates the expected value of continuous values to be generated in the future by the individual to be predicted, using a probability density function expressed as a linear sum of the histograms corresponding to the topics. The probability density function is determined by the weights of the histograms corresponding to the topics for the individual to be predicted and by the shape of each section of continuous values in each of those histograms, both estimated in advance by the probability density function estimation method according to claim 4 or 5.

7. A program for causing a computer to function as each unit of the probability density function estimation device according to claim 1 or 2.

8. A program for causing a computer to function as each unit of the continuous value prediction device according to claim 3.