JP7708031B2 - Information processing device, information processing method, and program

Info

Publication number: JP7708031B2
Application number: JP2022129482A
Authority: JP
Inventors: 靖弘生田; 敦史川本
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2022-08-16
Filing date: 2022-08-16
Publication date: 2025-07-15
Anticipated expiration: 2042-08-16
Also published as: JP2024026931A

Description

The present disclosure relates to an information processing device, an information processing method, and a program, and in particular to an information processing device, an information processing method, and a program that calculate a predicted value.

In technical fields such as causal analysis, which analyzes causal relationships between data, a predicted value (estimated value) is sometimes calculated based on the relative distance between data. In this connection, the causal relationship learning device disclosed in Patent Document 1 includes a correlation determination unit that determines the correlation between measurement values obtained by two sensors, and an estimation unit that, when the correlation is lower than a predetermined standard, determines the causal relationship between the two sensors by estimating the measurement value that is the cause from the measurement value that is the result. As a related technique, Non-Patent Document 1 discloses causal analysis using the NPMR (Non-Parametric Multiplicative Regression) model.

Patent Document 1: International Publication No. 2018/198267

Non-Patent Document 1: Nicolaou N et al., “A Nonlinear Causality Estimator Based on Non-Parametric Multiplicative Regression”, Frontiers in Neuroinformatics, Volume 10, June 2016.

In the prediction formula used in Patent Document 1, the formula for calculating weights based on the relative distance between data is relatively simple, so the formula has limited expressive power as a nonlinear model. To solve this problem, one could, for example, use a prediction formula with a larger number of weights, or use a product operation in calculating the weights in the prediction formula. However, with such a prediction formula, the amount of calculation involved in the weights increases, and a long calculation time is required.

The present disclosure has been made against this background, and aims to provide an information processing device, an information processing method, and a program that can achieve high-speed processing in prediction processing using a prediction formula that includes calculation of the relative distance between data.

One aspect of the present disclosure for achieving the above object is an information processing device including: a data acquisition unit that acquires explanatory variable data expressed as a matrix of T rows (where T is an integer of 2 or more) and m columns (where m is an integer of 1 or more), and objective variable data expressed as a vector having T components; and a prediction processing unit that calculates a predicted value for the objective variable using the data acquired by the data acquisition unit and a predetermined prediction formula that includes calculation of the relative distance between the data of a row of interest and the data of each of the other rows of the explanatory variable data. The prediction processing unit calculates the predicted value using the prediction formula in which the relative distances to the other rows are expressed by a vector, or by thinning out, from the calculation, the data of those other rows whose relative distance is equal to or greater than a predetermined threshold.

Another aspect of the present disclosure for achieving the above object is an information processing method in which an information processing device acquires explanatory variable data expressed as a matrix of T rows (where T is an integer of 2 or more) and m columns (where m is an integer of 1 or more), and objective variable data expressed as a vector having T components, and calculates a predicted value for the objective variable using the acquired data and a predetermined prediction formula that includes calculation of the relative distance between the data of a row of interest and the data of each of the other rows of the explanatory variable data. In the step of calculating the predicted value, the predicted value is calculated using the prediction formula in which the relative distances to the other rows are expressed by a vector, or by thinning out, from the calculation, the data of those other rows whose relative distance is equal to or greater than a predetermined threshold.

Another aspect of the present disclosure for achieving the above object is a program that causes a computer to execute: a data acquisition step of acquiring explanatory variable data expressed as a matrix of T rows (where T is an integer of 2 or more) and m columns (where m is an integer of 1 or more), and objective variable data expressed as a vector having T components; and a prediction processing step of calculating a predicted value for the objective variable using the data acquired in the data acquisition step and a predetermined prediction formula that includes calculation of the relative distance between the data of a row of interest and the data of each of the other rows of the explanatory variable data. In the prediction processing step, the predicted value is calculated using the prediction formula in which the relative distances to the other rows are expressed by a vector, or by thinning out, from the calculation, the data of those other rows whose relative distance is equal to or greater than a predetermined threshold.

According to the present disclosure, it is possible to provide an information processing device, an information processing method, and a program that can achieve high-speed processing in prediction processing using a prediction formula that includes calculation of the relative distance between data.

FIG. 1 is a block diagram showing an example of the configuration of an information processing device according to an embodiment.
FIG. 2 is a diagram showing an example of program source code for implementing, as a computer program, the calculation represented by the original NPMR prediction formula.
FIG. 3 is a diagram showing an example of program source code for implementing, as a computer program, the calculation represented by the vectorized prediction formula.
FIG. 4 is a diagram showing an example of a recurrence plot.
FIG. 5 is a block diagram showing an example of the hardware configuration of the information processing device according to the embodiment.
FIG. 6 is a flowchart showing an example of the operation of the information processing device according to the embodiment.
FIG. 7 is a graph showing the results of a first experiment.
FIG. 8 is a graph showing the results of a second experiment.
FIG. 9 is a graph showing the results of a third experiment.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing an example of the configuration of an information processing device 10 according to the embodiment. The information processing device 10 performs a process of predicting objective variable data using explanatory variable data and objective variable data, for example in order to perform causal analysis that analyzes the causal relationship between the explanatory variable data and the objective variable data. The prediction process described later may, however, be performed for any other purpose. As an example, the information processing device 10 has a data acquisition unit 101 and a prediction processing unit 102, as shown in FIG. 1.

The data acquisition unit 101 accepts input of the data used in the prediction process, namely a data series of explanatory variables and a data series of the objective variable. In this embodiment, both the explanatory variable data and the objective variable data are time-series data, but they do not necessarily have to be. In particular, the data acquisition unit 101 acquires explanatory variable data expressed as a matrix with T rows and m columns, and objective variable data expressed as a vector having T components.

Specifically, the data acquisition unit 101 acquires the explanatory variable data shown in the following formula (1).

<Formula (1)>

The data acquisition unit 101 also acquires the objective variable data shown in the following formula (2).

<Formula (2)>

Here, T is an integer of 2 or more, and m is an integer of 1 or more. m is the number of types of data used as explanatory variables; that is, m predictors are used in the prediction process. T is the number of data points in each data series (each time series). As a concrete example, the first column of X is a time series of temperature, the second column is a time series of precipitation, and Y is a time series of electricity usage. In this case, the prediction processing unit 102 described later calculates a predicted value of electricity usage based on the time series of temperature, precipitation, and electricity usage. These are merely examples, and the specific data in each data series is not limited to them.
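Formulas (1) and (2) are not reproduced in this text. Based only on the definitions above (a matrix X with T rows and m columns whose components are x_i,j, and a vector Y with T components y_t), they presumably take the following form; the exact typesetting in the original is an assumption.

```latex
X =
\begin{pmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\
\vdots  & \vdots  & \ddots & \vdots  \\
x_{T,1} & x_{T,2} & \cdots & x_{T,m}
\end{pmatrix},
\qquad
Y =
\begin{pmatrix}
y_{1} \\ y_{2} \\ \vdots \\ y_{T}
\end{pmatrix}
```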

The data acquisition unit 101 may set, from the input data series, the explanatory variable data expressed as a matrix with T rows and m columns and the objective variable data expressed as a vector having T components. That is, the data acquisition unit 101 may accept input that specifies the value of each component x_i,j of the matrix X (where i is an integer from 1 to T and j is an integer from 1 to m) and each component y_t of the vector Y (where t is an integer from 1 to T), or it may set these values from the input data, for example according to user instructions. The data acquisition unit 101 may also normalize the data in advance so that the standard deviation σ_j described later takes a predetermined value (for example, 1).
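The optional normalization mentioned above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `column_std` and `standardize_columns` are hypothetical, the data is held as a plain list of rows, and the population standard deviation is assumed (the text does not say which estimator is used).

```python
import math


def column_std(X, j):
    """Population standard deviation of column j of X (a list of T rows)."""
    col = [row[j] for row in X]
    mean = sum(col) / len(col)
    return math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))


def standardize_columns(X):
    """Return a copy of X in which every column has standard deviation 1.

    Only the scale is changed; the mean is left alone because relative
    distances between rows do not depend on it.
    """
    m = len(X[0])
    sigmas = [column_std(X, j) for j in range(m)]
    if any(s == 0.0 for s in sigmas):
        raise ValueError("a constant column cannot be scaled to unit deviation")
    return [[row[j] / sigmas[j] for j in range(m)] for row in X]


# Example: two explanatory variables on very different scales.
X = [[1.0, 10.0], [2.0, 30.0], [3.0, 50.0]]
Z = standardize_columns(X)
```

After this step σ_j can be treated as 1 for every column, so the division by σ_j in the weight calculation becomes unnecessary.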

The prediction processing unit 102 calculates a predicted value for the objective variable using the data (X and Y) acquired by the data acquisition unit 101 and a predetermined prediction formula that includes calculation of the relative distance (difference) between the data of the row of interest (row t) and the data of each of the other rows (each row other than row t) of the explanatory variable data. More specifically, the predetermined prediction formula includes a weight calculation based on this relative distance. In this prediction formula, the weights are set so that the larger the relative distance, the smaller the weight applied to the corresponding value of the objective variable.

In this embodiment, the prediction processing unit 102 calculates the predicted value using, as the predetermined prediction formula, a prediction formula based on the prediction formula of the NPMR model disclosed in Non-Patent Document 1 (that is, a modified version of the NPMR prediction formula). The prediction formula of the NPMR model disclosed in Non-Patent Document 1 is expressed as the following formula (3). In formula (3), the weight w_i,j applied to the values of the objective variable is expressed by formula (4). σ_j is the standard deviation of the data in the j-th column of the explanatory variable X. In the present disclosure, the variable representing a predicted value, which appears for example on the left side of formula (3), may be written as y_t^.

<Formula (3)>

<Formula (4)>
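Formulas (3) and (4) are not reproduced in this text. The following is a reconstruction under the assumption that they match the usual leave-one-out NPMR local-mean estimator with Gaussian weights, using the symbols defined above (y_t^ written as \hat{y}_t); it should be checked against the original publication.

```latex
\hat{y}_t =
\frac{\displaystyle\sum_{i=1,\; i \neq t}^{T} y_i \prod_{j=1}^{m} w_{i,j}}
     {\displaystyle\sum_{i=1,\; i \neq t}^{T} \prod_{j=1}^{m} w_{i,j}}
\qquad
w_{i,j} = \exp\!\left[ -\frac{1}{2} \left( \frac{x_{i,j} - x_{t,j}}{\sigma_j} \right)^{2} \right]
```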

In this way, NPMR calculates the predicted value y_t^ of y_t, the t-th component of the data series of the objective variable Y, by weighting each component of that data series other than the t-th. As is clear from the above prediction formula, in the NPMR model, when all components of the objective variable Y, that is, y_1 to y_T, are real data (that is, none of them is a null value), all of y_1^ to y_T^ can be calculated by setting t to each value from 1 to T. Comparing each of the real data y_1 to y_T with the corresponding predicted value y_1^ to y_T^ therefore shows the accuracy of prediction using the explanatory variable X. On the other hand, y_t, the t-th component of the data series of the objective variable Y, is not used in calculating the predicted value y_t^. Consequently, if only one of y_1^ to y_T^ is to be calculated (for example, the prediction for an unknown y_t), not all T components of the objective variable Y need to be real data. That is, the data acquisition unit 101 may acquire, as the data series of the objective variable Y, a vector Y that includes one component that is a null value (missing value). In other words, a predetermined t-th component of the vector having T components acquired by the data acquisition unit 101 may be a null value.
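The leave-one-out behavior described above can be illustrated with a loop-based sketch. It assumes Gaussian weights of the form exp(-((x_i,j - x_t,j)/σ_j)^2 / 2), which is how the NPMR weight is commonly written; the function name `npmr_predict` and the zero-based row index `t` are choices made for this example only.

```python
import math


def npmr_predict(X, y, sigma, t):
    """Predict y[t] from every row except row t (zero-based index).

    X     : list of T rows, each a list of m explanatory values
    y     : list of T objective values; y[t] itself is never read,
            so it may be None (a missing value)
    sigma : list of m per-column standard deviations
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(len(X)):
        if i == t:
            continue  # the row of interest is excluded
        weight = 1.0
        for j in range(len(sigma)):
            z = (X[i][j] - X[t][j]) / sigma[j]
            weight *= math.exp(-0.5 * z * z)  # product over the m predictors
        numerator += weight * y[i]
        denominator += weight
    return numerator / denominator


# The middle value is unknown; its two neighbours are equally distant.
X = [[0.0], [1.0], [2.0]]
y = [1.0, None, 3.0]
prediction = npmr_predict(X, y, sigma=[1.0], t=1)
```

Because both neighbours receive the same weight, the prediction here is their plain average. The nested loops over rows and predictors are the cost that the vectorized formulation aims to reduce.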

As can be seen from formula (4), the weights are obtained by calculating the relative distance (difference) between the data of the row of interest (row t) and the data of each of the other rows (each row other than row t) of the explanatory variable data. Although the NPMR model can express nonlinearity far better than models such as the one shown in Patent Document 1, the amount of calculation involved in the weights is large, and calculating the predicted value takes a long time. In this embodiment, therefore, to achieve high-speed processing, the prediction processing unit 102 calculates the predicted value using a prediction formula in which the relative distances are expressed by a vector, and by thinning out the data of rows whose relative distance is equal to or greater than a predetermined threshold. In other words, the calculation of the predicted value by the prediction processing unit 102 has two features: vectorization of the relative distance and thinning of the data. This embodiment has both features, but the prediction processing unit 102 may calculate the predicted value by adopting only one of them.

First, the vectorization of the relative distance will be described. The inventors have found that the prediction formula of the NPMR model can be transformed into the following prediction formula (formula (5)), in which the relative distances are expressed by a vector. In this disclosure, the prediction formula transformed in this way is referred to as the vectorized prediction formula.

<Formula (5)>

Here, the subscript -t means that the t-th component (t-th row) has been removed. For example, for the vector y shown in the following formula (6), y_-t is defined as shown in formula (6).

<Formula (6)>

Therefore, in formula (5), the vector y_-t is a vector having T-1 components, obtained by removing the t-th component from the data series of the objective variable expressed as a vector having T components.
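Formula (6) is not reproduced in this text. From the description of the subscript -t, it presumably reads as follows; the column-vector layout is an assumption.

```latex
y = \left( y_1, \ldots, y_{t-1},\, y_t,\, y_{t+1}, \ldots, y_T \right)^{\top},
\qquad
y_{-t} = \left( y_1, \ldots, y_{t-1},\, y_{t+1}, \ldots, y_T \right)^{\top}
```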

In formula (5), the vector Δx_-t is a vector having T-1 components, each of which is the relative distance between the data set of a row other than row t and the data set of row t in the explanatory variable data. In other words, Δx_-t is the vector with T-1 components obtained by removing the t-th component from the vector with T components whose i-th component is the relative distance between the data set of row i and the data set of row t. The relative distance between the data set of a row other than row t and the data set of row t can also be described as the relative distance between the vector formed by the m components (x_i,1, x_i,2, ..., x_i,m) of row i (a row other than row t) and the vector formed by the m components (x_t,1, x_t,2, ..., x_t,m) of row t.

As a simple example, when m = 1, the vector Δx_-t is expressed, for example, as follows.

<Formula (7)>

In the formulas shown in this disclosure, such as formula (5), the operator "・" denotes the inner product (inner product operator), and the operator "o" denotes the Hadamard product (Hadamard product operator). Also, in these formulas, the function "exp" is the exponential function, and the function "Exp" is the exponential function acting on each component of a vector. That is, "Exp" means that "exp" is applied to each component individually.
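To see why a vector form is possible, the following worked step assumes Gaussian weights w_{i,j} = exp(-((x_{i,j} - x_{t,j})/σ_j)^2 / 2) and takes the relative distance d_i to be the σ-scaled Euclidean distance between row i and row t. The final expression is one form consistent with the operators defined above; since formula (5) is not reproduced in this text, it is not claimed to be identical to it. The all-ones vector \mathbf{1} is introduced here for illustration.

```latex
\prod_{j=1}^{m} \exp\!\left[ -\frac{1}{2} \left( \frac{x_{i,j} - x_{t,j}}{\sigma_j} \right)^{2} \right]
= \exp\!\left[ -\frac{1}{2} \sum_{j=1}^{m} \left( \frac{x_{i,j} - x_{t,j}}{\sigma_j} \right)^{2} \right]
= \exp\!\left( -\frac{1}{2}\, d_i^{\,2} \right)

\hat{y}_t =
\frac{ y_{-t} \cdot \mathrm{Exp}\!\left( -\tfrac{1}{2}\, \Delta x_{-t} \circ \Delta x_{-t} \right) }
     { \mathbf{1} \cdot \mathrm{Exp}\!\left( -\tfrac{1}{2}\, \Delta x_{-t} \circ \Delta x_{-t} \right) },
\qquad
\Delta x_{-t} = \left( d_1, \ldots, d_{t-1},\, d_{t+1}, \ldots, d_T \right)^{\top}
```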

FIG. 2 is a diagram showing an example of program source code for implementing, as a computer program, the calculation represented by formula (3), the original NPMR prediction formula. FIG. 3 is a diagram showing an example of program source code for implementing, as a computer program, the calculation represented by formula (5), the vectorized prediction formula. The source code in FIGS. 2 and 3 uses Julia as the programming language, but the prediction formula may be implemented in any other programming language. As shown in FIG. 3, the vectorized prediction formula can be implemented with code that performs vector operations, so the predicted value can be calculated with code that executes fewer loop iterations than code implementing the prediction formula before vectorization (the original NPMR prediction formula). Using the vectorized prediction formula therefore achieves high-speed processing in the prediction process. This speed-up is also confirmed experimentally by the results shown in FIG. 7 and other figures described later.
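The Julia code of FIGS. 2 and 3 is not reproduced in this text. As a rough illustration of the vectorized structure only, the sketch below expresses the prediction with whole-vector helpers for the inner product, the Hadamard product, and the component-wise exponential. All names are hypothetical, the weights are assumed to be Gaussian, and plain Python lists are used so that the example has no dependencies; an array library would execute the same three operations without explicit Python loops.

```python
import math


def hadamard(a, b):
    """Component-wise (Hadamard) product of two equal-length vectors."""
    return [p * q for p, q in zip(a, b)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def Exp(v):
    """Exponential function applied to each component of a vector."""
    return [math.exp(c) for c in v]


def relative_distances(X, sigma, t):
    """Vector of T-1 scaled Euclidean distances from row t to every other row."""
    return [
        math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(X[i], X[t], sigma)))
        for i in range(len(X))
        if i != t
    ]


def npmr_predict_vectorized(X, y, sigma, t):
    """Leave-one-out prediction for row t (zero-based) written with vector operations."""
    dx = relative_distances(X, sigma, t)               # plays the role of Δx_-t
    y_minus_t = [y[i] for i in range(len(y)) if i != t]  # plays the role of y_-t
    weights = Exp([-0.5 * c for c in hadamard(dx, dx)])
    return dot(y_minus_t, weights) / sum(weights)


X = [[0.0], [1.0], [2.0]]
y = [1.0, None, 3.0]
prediction = npmr_predict_vectorized(X, y, sigma=[1.0], t=1)
```

Under these assumptions the result agrees with a per-row, per-predictor loop up to rounding, because a product of Gaussian factors equals a single Gaussian of the squared distance.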

Next, the thinning of data will be described. As can be seen from formulas (3) and (4), in a prediction formula whose weights are calculated from relative distances, such as the NPMR prediction formula, a component of the objective variable Y that corresponds to a row with a large relative distance from row t of the explanatory variable X contributes little to the calculation of the predicted value y_t^. In this embodiment, therefore, to further speed up the calculation, the prediction processing unit 102 calculates the predicted value by thinning out the data of rows whose relative distance is equal to or greater than a predetermined threshold. Specifically, the prediction processing unit 102 calculates the predicted value y_t^ using the prediction formula shown in the following formula (8). Formula (8) is a further modification of the vectorized prediction formula shown in formula (5). The predicted value y_t^ calculated by formula (8) is an approximation of the predicted value calculated by the prediction formulas shown in formulas (3) and (5).

<Formula (8)>

In the above formula, Δx_-t^rec is defined by the following formula (9), and y_-t^rec is defined by the following formula (10).

<Formula (9)>

<Formula (10)>
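Formulas (9) and (10) are not reproduced in this text. The description characterizes both quantities as Hadamard products of the filter vector R(t) with Δx_-t and y_-t respectively, so they presumably read as follows; the order of the factors is an assumption.

```latex
\Delta x_{-t}^{\mathrm{rec}} = R(t) \circ \Delta x_{-t},
\qquad
y_{-t}^{\mathrm{rec}} = R(t) \circ y_{-t}
```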

Here, R(t) is a vector having T-1 components that is generated based on the relative distance between the data set of each row other than row t and the data set of row t in the explanatory variable data. R(t) is used as a filter for extracting the components whose relative distance is less than the predetermined threshold, that is, the components whose contribution to the calculation of the predicted value is equal to or greater than a predetermined standard. The prediction processing unit 102 generates this filter R(t) based on the relative distances and uses it to thin out the data of rows whose relative distance is equal to or greater than the threshold. R(t) corresponds to the points plotted in a recurrence plot (see FIG. 4), which represents the relative distances between the explanatory variable data, and is defined by the following formulas (11) and (12).

<Formula (11)>

<Formula (12)>

In formula (12), k = 1, 2, ..., t-1, t+1, ..., T. That is, k ranges over the integers from 1 to T excluding t. Also in formula (12), d(x_k, x_t) is the relative distance between the data set of row k and the data set of row t in the data of the explanatory variable X, and δ is the predetermined threshold mentioned above. As can be seen from the relationship between R(t) and the recurrence plot shown in FIG. 4, R(t) in formula (11) is a vector that expresses the distribution of the recurrence plot at time t using 0s and 1s.
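The filter R(t) can be sketched as follows. The text states only that a component is 1 when d(x_k, x_t) is less than δ and 0 otherwise; the use of the Euclidean distance for d, the function name `recurrence_filter`, and the zero-based row index are assumptions made for this example.

```python
import math


def recurrence_filter(X, t, delta):
    """Return R(t): a list of T-1 zeros and ones for the rows k != t.

    A component is 1 when row k lies closer than delta to row t,
    and 0 when the distance is delta or more.
    """
    return [
        1 if math.dist(X[k], X[t]) < delta else 0
        for k in range(len(X))
        if k != t
    ]


# Row 0 is the row of interest; rows at distance 0.4 and 0.9 are kept,
# the row at distance 2.0 is flagged for thinning.
X = [[0.0], [0.4], [2.0], [0.9]]
R = recurrence_filter(X, t=0, delta=1.0)
```

Computing R(t) for every t and stacking the results (with the diagonal restored) gives the binary matrix that a recurrence plot displays.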

As described above, each component of R(t) is 0 when the relative distance from the data set of row t in the data of the explanatory variable X is equal to or greater than the threshold δ, and 1 when the relative distance is less than δ. As shown in formula (9), Δx_-t^rec is defined as the Hadamard product of Δx_-t and R(t), which has T-1 components each taking the value 0 or 1. That is, Δx_-t^rec is the vector obtained from Δx_-t by changing to 0 the value of each component that corresponds to a row whose relative distance from row t in the explanatory variable data is equal to or greater than the threshold. In other words, Δx_-t^rec is Δx_-t with every component whose relative distance is equal to or greater than the threshold set to 0.
Similarly, as shown in formula (10), y_-t^rec is defined as the Hadamard product of y_-t and R(t). That is, y_-t^rec is the vector obtained from y_-t by changing to 0 the value of each component that corresponds to a row whose relative distance from row t in the explanatory variable data is equal to or greater than the threshold. In other words, y_-t^rec is y_-t with the components set to 0 at the same indices (element numbers) as the components set to 0 in Δx_-t^rec. Therefore, in contrast to the prediction formula shown in formula (5), the prediction formula shown in formula (8) calculates the predicted value using a vector in which the components of Δx_-t whose relative distance is equal to or greater than the threshold, that is, the components that do not substantially contribute to the predicted value, are set to 0, together with a vector in which the components of y_-t that do not substantially contribute to the predicted value are set to 0. By thinning out the data of rows whose relative distance is equal to or greater than the predetermined threshold in this way, in other words by excluding such data from the calculation, the prediction processing unit 102 can calculate the predicted value with less computation and thus achieve high-speed prediction processing. Memory consumption can be reduced accordingly.
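The effect of thinning can be sketched as below. Formula (8) is not reproduced in this text, so this is not its literal form: instead of multiplying by zeros, the sketch skips thinned rows outright, which also keeps them out of the normalizing sum (a zeroed distance would otherwise give exp(0) = 1). The function name `npmr_predict_thinned` is hypothetical, columns are assumed to be pre-scaled so that σ_j = 1, and d is assumed to be the Euclidean distance.

```python
import math


def npmr_predict_thinned(X, y, t, delta):
    """Approximate leave-one-out prediction for row t (zero-based).

    Rows whose distance from row t is delta or more are thinned out,
    so no exponential is evaluated for them.
    """
    numerator = 0.0
    denominator = 0.0
    for k in range(len(X)):
        if k == t:
            continue
        d = math.dist(X[k], X[t])
        if d >= delta:
            continue  # negligible contribution: skip this row entirely
        weight = math.exp(-0.5 * d * d)
        numerator += weight * y[k]
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no row lies within delta of row t")
    return numerator / denominator


# The outlying row (distance 99) is dropped and barely matters anyway.
X = [[0.0], [1.0], [2.0], [100.0]]
y = [1.0, None, 3.0, 1000.0]
prediction = npmr_predict_thinned(X, y, t=1, delta=5.0)
```

The choice of δ trades accuracy for speed: a smaller δ skips more rows but moves the result further from the unthinned prediction.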

次に、情報処理装置１０のハードウェア構成の一例について説明する。図５は、情報処理装置１０のハードウェア構成の一例を示すブロック図である。図５に示すように、情報処理装置１０は、入出力インタフェース１５１、メモリ１５２、及びプロセッサ１５３を含む。 Next, an example of the hardware configuration of the information processing device 10 will be described. FIG. 5 is a block diagram showing an example of the hardware configuration of the information processing device 10. As shown in FIG. 5, the information processing device 10 includes an input/output interface 151, a memory 152, and a processor 153.

入出力インタフェース１５１は、必要に応じて他の装置（例えば、入力装置又は出力装置など）と通信可能に接続するためのインタフェースである。例えば、入出力インタフェース１５１は、データ取得部１０１がデータを取得するために用いられてもよいし、予測処理部１０２が予測結果を出力するために用いられてもよい。 The input/output interface 151 is an interface for communicatively connecting to other devices (e.g., an input device or an output device) as necessary. For example, the input/output interface 151 may be used by the data acquisition unit 101 to acquire data, or by the prediction processing unit 102 to output prediction results.

メモリ１５２は、例えば、揮発性メモリ及び不揮発性メモリの組み合わせによって構成される。メモリ１５２は、プロセッサ１５３により実行される、１以上の命令を含むソフトウェア（コンピュータプログラム）、及び各種処理に用いるデータなどを格納するために使用される。 Memory 152 is composed of, for example, a combination of volatile memory and non-volatile memory. Memory 152 is used to store software (computer programs) including one or more instructions executed by processor 153, data used for various processes, etc.

The processor 153 performs the processing of each component shown in FIG. 1 by reading software (computer programs) from the memory 152 and executing it. The processor 153 may be, for example, a microprocessor, an MPU (Micro Processor Unit), or a CPU (Central Processing Unit). The processor 153 may include multiple processors.
In this manner, the information processing device 10 has the functions of a computer.

プログラムは、コンピュータに読み込まれた場合に、実施形態で説明される１又はそれ以上の機能をコンピュータに行わせるための命令群（又はソフトウェアコード）を含む。プログラムは、非一時的なコンピュータ可読媒体又は実体のある記憶媒体に格納されてもよい。限定ではなく例として、コンピュータ可読媒体又は実体のある記憶媒体は、random-access memory（RAM）、read-only memory（ROM）、フラッシュメモリ、solid-state drive（SSD）又はその他のメモリ技術、CD-ROM、digital versatile disc（DVD）、Blu-ray（登録商標）ディスク又はその他の光ディスクストレージ、磁気カセット、磁気テープ、磁気ディスクストレージ又はその他の磁気ストレージデバイスを含む。プログラムは、一時的なコンピュータ可読媒体又は通信媒体上で送信されてもよい。限定ではなく例として、一時的なコンピュータ可読媒体又は通信媒体は、電気的、光学的、音響的、またはその他の形式の伝搬信号を含む。 The program includes instructions (or software code) that, when loaded into a computer, cause the computer to perform one or more functions described in the embodiments. The program may be stored on a non-transitory computer-readable medium or a tangible storage medium. By way of example and not limitation, computer-readable media or tangible storage media include random-access memory (RAM), read-only memory (ROM), flash memory, solid-state drive (SSD) or other memory technology, CD-ROM, digital versatile disc (DVD), Blu-ray (registered trademark) disk or other optical disk storage, magnetic cassette, magnetic tape, magnetic disk storage or other magnetic storage device. The program may be transmitted on a transitory computer-readable medium or communication medium. By way of example and not limitation, a transitory computer-readable medium or communication medium includes electrical, optical, acoustic, or other forms of propagated signals.

次に、情報処理装置１０の処理について、フローチャートを参照しつつ説明する。図６は、本実施の形態にかかる情報処理装置１０の動作の一例を示すフローチャートである。 Next, the processing of the information processing device 10 will be described with reference to a flowchart. FIG. 6 is a flowchart showing an example of the operation of the information processing device 10 according to this embodiment.

In step S100, the data acquisition unit 101 sets the data of the explanatory variables and the objective variable. Specifically, in this embodiment, the data acquisition unit 101 sets the explanatory variable data, expressed as a matrix with T rows and m columns, and the objective variable data, expressed as a vector having T components, that are used to calculate the predicted value. Next, in step S101, the prediction processing unit 102 calculates the predicted value using a prediction formula designed for speed. In this embodiment, the prediction processing unit 102 uses a vectorized prediction formula that has been modified so that data can be thinned out, such as the prediction formula shown in formula (8). However, the prediction processing unit 102 may instead adopt only one of the two features, vectorization of the relative distance or thinning out of data, when calculating the predicted value. Next, in step S102, the prediction processing unit 102 outputs the predicted value calculated in step S101, that is, the prediction result. The prediction processing unit 102 may output the result to an output device such as a display, or to a storage device that stores the prediction result.

The embodiment has been described above. According to the information processing device 10, the prediction processing unit 102 calculates the predicted value either by using a prediction formula in which the relative distances between the data of the row of interest and the data of each of the other rows of the explanatory variable data are expressed as a vector, or by thinning out the data of those other rows whose relative distance is equal to or greater than a predetermined threshold. In the former case, fewer loops are executed than when a computer program evaluates a prediction formula in which the relative distances are not expressed as a vector, so high-speed processing can be realized. In the latter case, data that does not substantially contribute to the calculation of the predicted value is thinned out before processing, so high-speed processing can be realized and memory consumption can also be reduced. In particular, because the prediction processing unit 102 uses a prediction formula based on the prediction formula of the NPMR model, prediction processing that sufficiently reflects nonlinearity can be realized at high speed.
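The reduction in loop processing mentioned above can be sketched by computing the same relative distances twice, once with explicit loops and once as a single vector expression. This is a minimal illustration, not the patent's formula: the function names, the random test data, and the Euclidean distance are assumptions.

```python
import numpy as np

def distances_loop(X, t):
    """Relative distances from row t to every other row, using explicit loops."""
    T, m = X.shape
    out = []
    for i in range(T):
        if i == t:
            continue                      # skip the row of interest itself
        s = 0.0
        for j in range(m):
            s += (X[i, j] - X[t, j]) ** 2
        out.append(s ** 0.5)
    return np.array(out)

def distances_vectorized(X, t):
    """The same T-1 distances, expressed as one vector operation."""
    others = np.delete(X, t, axis=0)      # (T-1, m) matrix of the other rows
    return np.sqrt(((others - X[t]) ** 2).sum(axis=1))

rng = np.random.default_rng(0)            # fixed seed for reproducibility
X = rng.normal(size=(200, 3))             # hypothetical data: T = 200, m = 3
d_loop = distances_loop(X, 5)
d_vec = distances_vectorized(X, 5)
```

Both functions return the same vector of T-1 distances; the vectorized form delegates the per-row and per-column loops to NumPy's compiled routines, which is one common way such a formulation runs faster in practice.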

Next, experimental results regarding the high-speed processing by the prediction processing unit 102 are shown. The measurement environment used for the experiments is as follows.
Operating system: macOS Catalina
Processor: 2.8 GHz, quad-core, Intel Core i7
Memory: 16 GB, 2133 MHz, LPDDR3

In the first experiment, two time series defined as follows were used as test data, and the explanatory variable X and the objective variable Y were set from these time series. In formula (13), e_1(t) and e_2(t) are white noise. In the test data shown in formula (13), there is a nonlinear causal relationship from x_2 to x_1.

＜式（１３）＞
<Formula (13)>

In the second experiment, three time series defined as follows were used as test data, and the explanatory variable X and the objective variable Y were set from these time series. In formula (14), e_1(t), e_2(t), and e_3(t) are white noise. In the test data shown in formula (14), there is a nonlinear causal relationship from x_1 to x_2, a nonlinear causal relationship from x_1 to x_3, and a linear causal relationship from x_2 to x_3.

＜式（１４）＞
<Formula (14)>

FIG. 7 is a graph showing the results of the first experiment, and FIG. 8 is a graph showing the results of the second experiment. Each figure shows the time required for analysis using the original NPMR (the graph labeled "NPMR"), using NPMR with vectorized relative distances (the graph labeled "Fast NPMR-1"), and using NPMR that combines vectorized relative distances with data thinning (the graph labeled "Fast NPMR-2"). As FIGS. 7 and 8 show, the technique described in this embodiment speeds up processing. The speedup is particularly noticeable as the amount of data increases and as the analysis target becomes more complex.

また、第３の実験では、上述した予測式とは別の予測式を用いて、データの間引きによる効果について確認した。ここでは、カーネル法を用いた予測式について検討した。具体的には、予測処理部１０２は、以下の式（１５）で示される予測式を高速に処理するために、以下の式（１６）で示される予測式を用いて処理を行なう。 In addition, in the third experiment, a prediction formula other than the above-mentioned prediction formula was used to confirm the effect of thinning out the data. Here, a prediction formula using the kernel method was examined. Specifically, the prediction processing unit 102 performs processing using the prediction formula shown in the following formula (16) in order to quickly process the prediction formula shown in the following formula (15).

＜式（１５）＞
<Formula (15)>

＜式（１６）＞
<Formula (16)>

In formula (16) as well, the data used in the calculation is thinned out using the filter described above, that is, a filter that extracts the components whose relative distance is less than a predetermined threshold (components whose degree of contribution to the calculation of the predicted value is equal to or greater than a predetermined standard). Here, R(t) in formula (16) is defined as follows and represents the set of indices (row numbers) of the components whose relative distance is less than the threshold δ. In formula (17), D(t_i) denotes the data of row i of the explanatory variable X set by the data acquisition unit 101; the relative distance between D(t_i) and the data D(t) of row t of the explanatory variable X is calculated and compared with the threshold.

＜式（１７）＞
<Formula (17)>

予測処理部１０２は、Ｒ（ｔ）を生成し、式（１６）に示される予測式の演算を実行する。図９は、第３の実験における実験結果を示すグラフである。なお、第３の実験において用いたテストデータは、第１の実験と同様である。図９では、相対距離に基づくデータの間引きを行なわないで解析する場合の所要時間のグラフ（ラベル「通常版」が付されたグラフ）と、相対距離に基づくデータの間引きを行なって解析する場合の所要時間のグラフ（ラベル「高速版」が付されたグラフ）とを示している。図９に示されるように、フィルタを用いたデータの間引きが行なわれることにより、処理が高速化されることがわかる。 The prediction processing unit 102 generates R(t) and performs the calculation of the prediction formula shown in equation (16). Figure 9 is a graph showing the experimental results of the third experiment. Note that the test data used in the third experiment is the same as that in the first experiment. Figure 9 shows a graph of the time required for analysis without thinning out data based on relative distance (graph labeled "normal version") and a graph of the time required for analysis after thinning out data based on relative distance (graph labeled "fast version"). As shown in Figure 9, it can be seen that the processing speed is increased by thinning out data using a filter.
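The idea of restricting a kernel-based prediction to an index set R(t) can be sketched as follows. Formulas (15) to (17) are not reproduced in this text, so the sketch substitutes a Gaussian-kernel weighted average (a Nadaraya-Watson-style estimator) as a stand-in; the function names, the bandwidth `sigma`, the exclusion of row t from the index set, and the toy data are all assumptions.

```python
import numpy as np

def neighbor_indices(X, t, delta):
    """Index set of rows whose distance to row t is below delta (row t excluded)."""
    d = np.linalg.norm(X - X[t], axis=1)   # Euclidean distance: an assumption
    idx = np.flatnonzero(d < delta)
    return idx[idx != t]

def kernel_predict(X, y, t, idx, sigma=1.0):
    """Gaussian-kernel weighted average of y over the rows in idx.

    Stand-in estimator for illustration; idx must be non-empty.
    """
    d2 = ((X[idx] - X[t]) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))   # kernel weights
    return float(w @ y[idx] / w.sum())

# Toy data: row 2 is far from row 0 and contributes almost nothing.
X = np.array([[0.0], [0.1], [5.0], [-0.1]])
y = np.array([0.0, 1.0, 100.0, 3.0])
t = 0

all_others = np.array([1, 2, 3])               # no thinning
near = neighbor_indices(X, t, delta=1.0)       # thinned index set

full = kernel_predict(X, y, t, all_others, sigma=0.5)
fast = kernel_predict(X, y, t, near, sigma=0.5)
```

With these values the distant row receives a weight on the order of exp(-50), so the thinned prediction agrees with the full one to within numerical noise while summing over fewer rows. How closely the two agree in general depends on the kernel, the bandwidth, and the threshold δ.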

なお、本発明は上記実施の形態に限られたものではなく、趣旨を逸脱しない範囲で適宜変更することが可能である。例えば、上記実施の形態では、情報処理装置１０が、データ取得部１０１及び予測処理部１０２の機能を有したが、これらの一部又は全ての機能が、他の装置（例えば、サーバなど）により実装されてもよい。すなわち、１以上の装置から構成されるシステムにより、上記実施の形態で説明された処理が実現されてもよい。 The present invention is not limited to the above-described embodiment, and can be modified as appropriate without departing from the spirit of the present invention. For example, in the above-described embodiment, the information processing device 10 has the functions of the data acquisition unit 101 and the prediction processing unit 102, but some or all of these functions may be implemented by another device (e.g., a server, etc.). In other words, the processing described in the above-described embodiment may be realized by a system composed of one or more devices.

10 Information processing device
101 Data acquisition unit
102 Prediction processing unit
151 Input/output interface
152 Memory
153 Processor

Claims

An information processing device comprising:
a data acquisition unit that acquires data on explanatory variables represented as a matrix with T rows and m columns (where T is an integer of 2 or more) and data on an objective variable represented as a vector having T components; and
a prediction processing unit that calculates a predicted value for the objective variable using the data acquired by the data acquisition unit and a predetermined prediction formula that includes a calculation of the relative distance between the data of a row of interest and the data of each of the other rows among the explanatory variable data,
wherein the prediction processing unit calculates the predicted value using the prediction formula in which the relative distance to each of the other rows is expressed by a vector, or by thinning out data of rows among the other rows whose relative distance is equal to or greater than a predetermined threshold.

The information processing device according to claim 1 , wherein the prediction processing unit generates a filter for extracting data of the row in which the relative distance is less than the threshold value based on the relative distance, and thins out data of the row in which the relative distance is equal to or greater than the threshold value using the filter.

The information processing apparatus according to claim 1 , wherein the predetermined prediction formula is a prediction formula based on a prediction formula of a Non-Parametric Multiplicative Regression (NPMR) model.

The information processing device according to claim 3 , wherein the prediction processing unit calculates the predicted value using the following formula as the prediction formula:
however,
In the following formula:
is the predicted value,
t corresponds to the row number of the row of interest,
Δx ^rec _-t is a vector obtained by changing the values of the components of a vector having T-1 components whose components are the relative distance between a dataset of rows other than row t in the explanatory variable data and the dataset of row t, and the values of each component corresponding to the row whose relative distance is equal to or greater than the threshold are changed to 0;
y ^rec _-t is a vector obtained by changing the values of the components of a vector having T-1 components excluding the t-th component from the objective variable represented as a vector having T components, in which the values of the components corresponding to the rows in which the relative distance is equal to or greater than the threshold value are changed to 0;
exp is the exponential function,
Exp is the exponential function acting on each component of the vector.

The information processing apparatus according to claim 1 , wherein the data of the explanatory variables and the data of the objective variables are time-series data.

An information processing method in which an information processing device:
acquires data on explanatory variables represented as a matrix with T rows and m columns (where T is an integer of 2 or more) and data on an objective variable represented as a vector having T components; and
calculates a predicted value for the objective variable using the acquired data and a predetermined prediction formula that includes a calculation of the relative distance between the data of a row of interest and the data of each of the other rows among the explanatory variable data,
wherein, in calculating the predicted value, the predicted value is calculated using the prediction formula in which the relative distance to each of the other rows is expressed by a vector, or by thinning out data of rows among the other rows whose relative distance is equal to or greater than a predetermined threshold.

A program that causes a computer to execute:
a data acquisition step of acquiring data on explanatory variables represented as a matrix with T rows and m columns (where T is an integer of 2 or more) and data on an objective variable represented as a vector having T components; and
a prediction processing step of calculating a predicted value for the objective variable using the data acquired in the data acquisition step and a predetermined prediction formula that includes a calculation of the relative distance between the data of a row of interest and the data of each of the other rows among the explanatory variable data,
wherein, in the prediction processing step, the predicted value is calculated using the prediction formula in which the relative distance to each of the other rows is expressed by a vector, or by thinning out data of rows among the other rows whose relative distance is equal to or greater than a predetermined threshold.