JP6937359B2

JP6937359B2 - Cluster division evaluation device, cluster division evaluation method and cluster division evaluation program

Info

Publication number: JP6937359B2
Application number: JP2019233838A
Authority: JP
Inventors: 佳秀太田; 北村　慎吾
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2021-09-22
Anticipated expiration: 2039-12-25
Also published as: JP2021103398A

Description

The present invention relates to a cluster division evaluation device, a cluster division evaluation method, and a cluster division evaluation program.

For example, a company planning to open a new store predicts how much sales it can expect depending on where the store is located and how large it is. Another company may predict a customer's purchase amount (total purchasing power) from data on customers who bought a certain product. Marketing thus offers many occasions to predict a buyer's purchase amount or a seller's sales. Regression analysis using past data on stores or customers is a well-known method for such prediction. The characteristic value prediction device of Patent Document 1 likewise uses regression analysis to predict characteristic values (objective variables) in a wide range of fields, including sales management.

The characteristic value prediction device of Patent Document 1 creates, from $n$ candidate explanatory variables, every combination of one variable, of two variables, of three variables, ..., up to all $n$ variables. The total number of these combinations is ${}_nC_1 + {}_nC_2 + {}_nC_3 + \cdots + {}_nC_n$, that is, $2^n - 1$. For each combination, the device builds a prediction model from the measured values of the explanatory variables in that combination and the measured values of the objective variable at the same points in time. It then computes the differences (errors) between the predicted values of the objective variable output by the model and the measured values, and takes the logarithm of the error variance with its sign negated as the "reference value". The device calculates reference values in order, starting from the combinations with the fewest explanatory variables; the reference value rises as the number of explanatory variables increases. The device adopts, as the explanatory variables for predicting the objective variable, the combination evaluated immediately before the degree of this increase falls below a predetermined threshold.
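The selection rule of Patent Document 1 can be sketched in Python as below. This is an illustrative reading, not that document's implementation: the function names are hypothetical, the use of population variance is an assumption (the text only says "variance"), and the caller is assumed to supply reference values already in evaluation order.

```python
import itertools
import math
import statistics

def candidate_combinations(names):
    """Return every non-empty subset of the candidate explanatory variables,
    smallest subsets first: nC1 + nC2 + ... + nCn = 2**n - 1 subsets in total."""
    combos = []
    for r in range(1, len(names) + 1):
        combos.extend(itertools.combinations(names, r))
    return combos

def reference_value(errors):
    """Reference value: negative logarithm of the error variance.
    Population variance is an assumption; the errors must not all be equal."""
    return -math.log(statistics.pvariance(errors))

def select_combination(ordered, threshold):
    """ordered: (combination, reference_value) pairs in evaluation order.
    Return the combination evaluated just before the increase in the
    reference value first falls below `threshold`."""
    for (prev_combo, prev_ref), (_, ref) in zip(ordered, ordered[1:]):
        if ref - prev_ref < threshold:
            return prev_combo
    return ordered[-1][0]  # the increase never fell below the threshold

print(len(candidate_combinations(["x1", "x2", "x3"])))  # 7 == 2**3 - 1
```

The exhaustive enumeration is what makes this approach costly: the number of models to fit doubles with every additional candidate variable.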

Patent Document 1: Japanese Unexamined Patent Publication No. 7-93284

When forecasting sales or purchase amounts, the data may be divided into multiple categories according to its characteristics. For example, the categories used to forecast the sales of large-scale retail stores include "city center", "residential area" and "suburbs". However, ordinary business knowledge tends to overlook some categories. For example, the area northwest of the Imperial Palace in Tokyo lies in the "city center", yet it is also an "office district" and a "luxury residential district". The waterfront area of Chiba Prefecture is "suburban", yet it is also an "office district" and a "high-rise condominium district". In neither case can the general categories "city center", "residential area" and "suburbs" be used.

The characteristic value prediction device of Patent Document 1 aims to create an optimal prediction model, but it concentrates on selecting explanatory variables and lacks the idea of switching between prediction models according to the characteristics of the data.
Therefore, an object of the present invention is to divide a multidimensional space of variables so that multiple highly accurate prediction models can be used according to the characteristics of the data.

The cluster division evaluation device of the present invention includes: a survey value acquisition unit that acquires survey values of the attributes of trading entities and survey values of the economic power of the trading entities; a clustering unit that divides points, each representing the survey values of multiple attributes and the survey value of the economic power of a trading entity, into multiple clusters in a multidimensional space; and a regression analysis unit that, based on the attribute survey values and economic-power survey values of each resulting cluster, creates a prediction model having the attributes of the trading entity as explanatory variables and its economic power as the objective variable, optimizes the parameters of the prediction model for each cluster, evaluates, for each number of clusters, the difference between the economic power predicted by the optimized prediction model and the surveyed economic power, and determines the number of clusters based on the evaluation result.
Other means are described in the description of the embodiments.

According to the present invention, a multidimensional space of variables can be divided so that multiple highly accurate prediction models can be used according to the characteristics of the data.

FIG. 1 is a diagram explaining the configuration and related elements of the cluster division evaluation device.
FIG. 2 is an example of survey value information.
FIG. 3 is an example of cluster information.
FIG. 4 is a diagram explaining the relationship between clusters and prediction models.
FIG. 5 is a diagram explaining the relationship between clusters and prediction models.
FIG. 6 is a diagram explaining the relationship between clusters and prediction models.
FIG. 7 is a diagram explaining the relationship between clusters and prediction models.
FIG. 8 is a diagram explaining the error.
FIG. 9 is an example of error information.
FIG. 10 is a flowchart of the processing procedure.
FIG. 11 is an example of survey value information.

Hereinafter, a mode for carrying out the present invention (referred to as "the present embodiment") will be described in detail with reference to the drawings. The present embodiment comprises a first embodiment and a second embodiment. The first embodiment predicts the annual sales of a store, and the second embodiment predicts the annual purchase amount of a customer. These are only examples; more generally, the present invention is applicable to forecasting quantities related to marketing.

<First Embodiment>
First, the first embodiment will be described.

(Cluster division evaluation device)
The configuration of the cluster division evaluation device 1 and related elements will be described with reference to FIG. 1. The cluster division evaluation device 1 is a general-purpose computer comprising a central control device 11, an input device 12 such as a mouse and keyboard, an output device 13 such as a display, a main storage device 14, an auxiliary storage device 15, and a communication device 16, which are interconnected by a bus. The auxiliary storage device 15 stores the prediction model 31, survey value information 32, cluster information 33 and error information 34 (described in detail later).

The survey value acquisition unit 21, clustering unit 22, regression analysis unit 23 and display processing unit 24 in the main storage device 14 are programs. The central control device 11 implements the function of each program (described in detail later) by reading it from the auxiliary storage device 15 and loading it into the main storage device 14. The auxiliary storage device 15 may be configured independently of the cluster division evaluation device 1.

A store server 3 and a card company server 4 are connected to the cluster division evaluation device 1 via a network 2. The cluster division evaluation device 1 can acquire various data on stores and customers from the store server 3 and the card company server 4.

(Prediction model)
The prediction model 31 of this embodiment is a linear expression such as Equation 1 below.
$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 \qquad \text{(Equation 1)}$$

Here, $y$ is the annual sales of the store, $x_1$ is its sales floor area, $x_2$ is the property tax road price of the land on which the store stands, and $x_3$ is the number of parking spaces at the store. $a_0$, $a_1$, $a_2$ and $a_3$ are constants (parameters). Equation 1 is thus a function with $x_1$, $x_2$ and $x_3$ as explanatory variables and $y$ as the objective variable, and varying the values of $a_0$, $a_1$, $a_2$ and $a_3$ changes the shape and position of the prediction model 31 in four-dimensional space. Using four variables is only an example; there may be more variables, that is, the prediction model may have more dimensions.

Now suppose there are many combinations $[Y, X_1, X_2, X_3]$ of past survey values of annual sales, sales floor area, property tax road price and number of parking spaces. $Y$, $X_1$, $X_2$ and $X_3$ denote the same kinds of quantities as $y$, $x_1$, $x_2$ and $x_3$, respectively; for convenience, actually observed survey values are written in uppercase and the variables of the prediction model in lowercase. The output (objective variable) $y$ of the prediction model is the "predicted value", and $Y - y$ is called the error. Using the combinations of survey values, the cluster division evaluation device 1 determines the parameter combination $[a_0, a_1, a_2, a_3]$ that minimizes the sum of squared errors $\sum (Y - y)^2$ (described in detail later).
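The quantity being minimized can be made concrete with a short Python sketch. The helper names and the numbers in the usage lines are hypothetical; records are assumed to be laid out as `[Y, X1, X2, X3]`, matching the notation above.

```python
def predict(params, xs):
    """Equation 1 generalised: y = a0 + a1*x1 + a2*x2 + ... for any number of x's."""
    a0, *coefs = params
    if len(coefs) != len(xs):
        raise ValueError("need one coefficient per explanatory variable")
    return a0 + sum(a * x for a, x in zip(coefs, xs))

def sum_squared_error(params, records):
    """Sum of squared errors, Σ(Y - y)^2, over survey records [Y, X1, X2, X3]."""
    total = 0.0
    for Y, *X in records:
        total += (Y - predict(params, X)) ** 2
    return total

# Hypothetical parameters [a0, a1, a2, a3] and two hypothetical survey records.
params = (1.0, 2.0, 1.0, 1.0)
records = [[10.0, 1.0, 2.0, 3.0], [20.0, 2.0, 2.0, 3.0]]
print(sum_squared_error(params, records))  # 104.0
```

Here the predictions are 8 and 10, so the errors are 2 and 10 and the sum of squares is 4 + 100 = 104.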

(Survey value information)
FIG. 2 is an example of the survey value information 32. In the survey value information 32, the survey value of the objective variable is stored in the objective variable column 102 and the survey values of the explanatory variables in the explanatory variable column 103, in association with the store ID stored in the store ID column 101.
The store ID in the store ID column 101 is an identifier that uniquely identifies a store, which is the trading entity.

The survey value of the objective variable in the objective variable column 102 is the annual sales of the store. "Annual" is only an example; the survey value of the objective variable may instead be monthly sales, weekly sales, or sales over some other period. "#" stands for an omitted value that differs from entry to entry (likewise below).

The survey values of the explanatory variables in the explanatory variable column 103 are the sales floor area (column 103a), the property tax road price (column 103b) and the number of parking spaces (column 103c).
The sales floor area is the part of the store's total floor area used directly for selling products to customers.
The property tax road price is the road price on which the calculation of the property tax levied on the land where the store stands is based.
The number of parking spaces is the number of vehicles that the parking lot available to the store's customers can accommodate.
The annual sales corresponds to the "economic power of the trading entity", and the sales floor area, property tax road price and number of parking spaces correspond to the "attributes of the trading entity".
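One row of the survey value information can be modelled as a small record type. The sketch below is a hypothetical Python data structure, not the device's storage format; the field names are assumptions, and `point` builds the $[Y, X\ldots]$ point used for clustering from whichever explanatory variables the user selected.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SurveyRecord:
    """One row of the survey value information (FIG. 2); field names are hypothetical."""
    store_id: str
    annual_sales: float   # objective variable Y (economic power of the trading entity)
    floor_area: float     # X1: sales floor area
    road_price: float     # X2: property tax road price
    parking_spaces: int   # X3: number of parking spaces

    def point(self, attributes):
        """Return the point [Y, X...] for the selected explanatory variables."""
        return [self.annual_sales] + [getattr(self, name) for name in attributes]

row = SurveyRecord("M01", 100.0, 50.0, 3.0, 10)
print(row.point(["floor_area"]))  # [100.0, 50.0]
```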

(Clustering)
Grouping many points plotted in a multidimensional space so that nearby points fall into the same group is generally called "clustering". The "k-means method" is a well-known clustering technique. The cluster division evaluation device 1 also uses the k-means method, as described in (1) to (5) below.

(1) The cluster division evaluation device 1 arbitrarily assigns each of the points to one of $k$ groups.
(2) For each group, the cluster division evaluation device 1 calculates $d_i$, the sum of squared distances from the group's centroid to the points belonging to that group, where $i$ is the group number ($i = 1, 2, \ldots, k$).

(3) The cluster division evaluation device 1 moves one point from its group to another group and then calculates $D_k = \sum_{i=1}^{k} d_i$, the total of $d_i$ over the $k$ groups. The device repeats this for every combination of point to move and new destination group.
(4) The cluster division evaluation device 1 determines the assignment of the points that minimizes $D_k$.
(5) The cluster division evaluation device 1 repeats steps (1) to (4) while varying $k$ over 1, 2, 3, ....
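As written, steps (1) to (5) read as a point-exchange search rather than the textbook Lloyd iteration (alternating centroid update and reassignment). The Python sketch below implements that exchange search under stated assumptions: Euclidean distance on tuples of numbers, a round-robin starting assignment, and hypothetical names such as `exchange_clustering`. Trying every single-point move per round only guarantees a local minimum of $D_k$, not the global one; `min_size` mirrors the minimum number of points per cluster entered in step S203.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    n = len(points)
    return [sum(c) / n for c in zip(*points)]

def total_cost(points, labels, k):
    """D_k: over all groups, the sum of squared distances to the group centroid (d_i)."""
    cost = 0.0
    for g in range(k):
        members = [p for p, lab in zip(points, labels) if lab == g]
        if members:
            c = centroid(members)
            cost += sum(sq_dist(p, c) for p in members)
    return cost

def exchange_clustering(points, k, min_size=1):
    """Steps (1)-(4): start from an arbitrary assignment, then repeatedly apply the
    single (point, destination group) move that lowers D_k the most."""
    if len(points) < k * min_size:
        raise ValueError("not enough points for k groups of min_size")
    labels = [i % k for i in range(len(points))]  # (1) arbitrary initial assignment
    best = total_cost(points, labels, k)
    while True:
        best_move = None
        for i in range(len(points)):
            for g in range(k):
                # Skip no-op moves and moves that would shrink a group below min_size.
                if g == labels[i] or labels.count(labels[i]) <= min_size:
                    continue
                old = labels[i]
                labels[i] = g
                cost = total_cost(points, labels, k)  # (2)-(3) recompute D_k
                labels[i] = old
                if cost < best - 1e-12:
                    best, best_move = cost, (i, g)
        if best_move is None:  # (4) no single move improves D_k any further
            return labels, best
        i, g = best_move
        labels[i] = g

def cluster_for_each_k(points, k_min, k_max, min_size=1):
    """Step (5): repeat the clustering for k = k_min, ..., k_max."""
    return {k: exchange_clustering(points, k, min_size) for k in range(k_min, k_max + 1)}
```

Each accepted move strictly lowers $D_k$ and there are finitely many assignments, so the loop always terminates.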

(Cluster information)
FIG. 3 is an example of the cluster information 33. In the cluster information 33, in association with the store ID stored in the store ID column 111, the survey value of the objective variable is stored in the objective variable column 112, the survey value of the explanatory variable in the explanatory variable column 113, and the ID of the cluster to which the store belongs in the belonging cluster ID column 114.

The store ID in the store ID column 111 is the same as the store ID in FIG. 2.
The survey value of the objective variable in the objective variable column 112 is the same as the survey value of the objective variable in FIG. 2.
The survey value of the explanatory variable in the explanatory variable column 113 is the sales floor area (column 103a), one of the explanatory-variable survey values in FIG. 2. To simplify the explanation, FIG. 3 uses "sales floor area" as its only explanatory variable.

The belonging cluster ID column 114 is divided by number of clusters into a one-cluster column 114a, a two-cluster column 114b, a three-cluster column 114c, a four-cluster column 114d, and so on, and each of these columns stores cluster IDs. A cluster ID is an identifier that uniquely identifies a cluster. Each cluster corresponds to the regional characteristics of the stores. In general, a company planning to open a new store determines various figures (store-opening patterns), including annual sales, according to regional characteristics. Even when clusters end up containing the same points, as with "c3" and "c10", clusters for different numbers of clusters receive different cluster IDs (because $D_k$ is recalculated for each $k$).
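The IDs that appear in the text (c1 for one cluster, c2 and c3 for two clusters, c7 to c10 for four clusters) are consistent with numbering clusters consecutively across cluster counts. The Python sketch below assumes exactly that scheme; the function name is hypothetical and the source does not spell out the rule.

```python
def cluster_ids(k_max):
    """Assign IDs c1, c2, ... consecutively across cluster counts (assumed scheme):
    k=1 -> [c1], k=2 -> [c2, c3], k=3 -> [c4, c5, c6], k=4 -> [c7, ..., c10]."""
    ids, next_id = {}, 1
    for k in range(1, k_max + 1):
        ids[k] = [f"c{next_id + j}" for j in range(k)]
        next_id += k
    return ids

print(cluster_ids(4)[4])  # ['c7', 'c8', 'c9', 'c10']
```

Because every cluster count gets fresh IDs, a cluster such as c10 can contain exactly the same stores as c3 and still be recorded separately.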

FIGS. 4 to 7 are diagrams explaining the relationship between clusters and the prediction model 31. FIG. 4 corresponds to the one-cluster column 114a of FIG. 3. The horizontal axis of the coordinate plane in FIG. 4 is the explanatory variable (sales floor area), and the vertical axis is the objective variable (annual sales). Twenty points ● corresponding to stores M01 to M20 are plotted on the coordinate plane (likewise in FIGS. 5 to 7). Circle c1 represents cluster c1. Straight line 31a represents the prediction model 31 (FIG. 1). The method of creating the prediction model is described later.

FIG. 5 corresponds to the two-cluster column 114b of FIG. 3. Circle c2 represents cluster c2, and circle c3 represents cluster c3. Straight lines 31b and 31c each represent the prediction model 31 (FIG. 1).
FIG. 6 corresponds to the three-cluster column 114c of FIG. 3, and FIG. 7 to the four-cluster column 114d of FIG. 3. FIGS. 6 and 7 are read in the same way as FIG. 5.
In FIGS. 4 to 7, owing to drawing constraints, the center of circle c1 and the like does not coincide with the centroid of cluster c1 and the like (the mean of the coordinates of all its points ●).

In FIGS. 4 to 7, the cluster division evaluation device 1 creates a prediction model for each cluster using only the survey values ● belonging to that cluster. The cluster division evaluation device 1 creates the prediction model $y = a_0 + a_1 x_1$ by steps (11) to (17) below.

(11) The cluster division evaluation device 1 substitutes randomly generated values of the parameters $a_0$ and $a_1$ into $a_0$ and $a_1$ of the prediction model.
(12) The cluster division evaluation device 1 substitutes the survey value $X$ into $x_1$ of the prediction model and calculates $y$.
(13) The cluster division evaluation device 1 calculates the error $Y - y$.
(14) The cluster division evaluation device 1 repeats (12) and (13) with the $[X, Y]$ values of each store in turn.

(15) The cluster division evaluation device 1 calculates $\sum (Y - y)^2$, the sum of $(Y - y)^2$ over the stores.
(16) The cluster division evaluation device 1 substitutes other randomly generated values of $a_0$ and $a_1$ into the prediction model and repeats (12) to (15) a sufficiently large number of times.
(17) The cluster division evaluation device 1 determines the parameter values $a_{0S}$ and $a_{1S}$ that minimize $\sum (Y - y)^2$. The subscript "S" indicates that the value is "optimized".
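Steps (11) to (17) amount to a random search over $(a_0, a_1)$. The Python sketch below follows them directly; the uniform sampling range, the trial count, the fixed seed and the function names are all assumptions, since the text only says "randomly generated" and "a sufficiently large number of times".

```python
import random

def sse(a0, a1, data):
    """Σ(Y - y)^2 for the model y = a0 + a1*x1 over a cluster's (X, Y) pairs."""
    return sum((Y - (a0 + a1 * X)) ** 2 for X, Y in data)

def random_search(data, trials=10_000, low=-100.0, high=100.0, seed=0):
    """Draw random (a0, a1), evaluate Σ(Y - y)^2, and keep the best pair.
    The sampling range [low, high] and the number of trials are assumptions."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        a0 = rng.uniform(low, high)  # (11)/(16) random parameter values
        a1 = rng.uniform(low, high)
        err = sse(a0, a1, data)      # (12)-(15) errors and their sum of squares
        if best is None or err < best[2]:
            best = (a0, a1, err)
    return best                      # (17) a0S, a1S and the minimal Σ(Y - y)^2
```

For a linear model such as $y = a_0 + a_1 x_1$, ordinary least squares gives the exact minimizer in closed form; random search trades that exactness for generality, since it works unchanged for the higher-order or nonlinear model forms allowed in step S206.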

(Error)
FIG. 8 is a diagram explaining the error. The horizontal axis of the coordinate plane in FIG. 8 is the sales floor area, and the vertical axis is the annual sales. The 20 points ● represent the survey value combinations $[X, Y]$ of FIG. 3. Straight line 31a is the prediction model 31 (FIG. 1), given by $y = a_{0S} + a_{1S} x_1$. An error $Y - y$ is defined for each point ●. Although $\sum (Y - y)^2$ has been minimized as described above, the individual points ● include some with almost no error and some with relatively large errors.

(Error information)
FIG. 9 is an example of the error information 34. In the error information 34, the error is stored in the error column 122 and the error evaluation value in the error evaluation value column 123, in association with the number of clusters stored in the cluster count column 121.
The cluster count in the cluster count column 121 is the number of clusters.
The error in the error column 122 is $\sqrt{\sum (Y - y)^2 / n}$, where $n$ is the number of points ● in the cluster; that is, it is the square root of the mean of the squared errors of FIG. 8. The cluster ID is given in parentheses after each "#".
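The per-cluster error is a root-mean-square error. A minimal Python sketch, with a hypothetical function name and the cluster's observed and predicted values passed as parallel lists:

```python
import math

def cluster_rmse(observed, predicted):
    """Error column 122: sqrt(Σ(Y - y)^2 / n) over the n points of one cluster."""
    n = len(observed)
    return math.sqrt(sum((Y - y) ** 2 for Y, y in zip(observed, predicted)) / n)
```

For example, observed values [3, 5] against predictions [1, 5] give squared errors 4 and 0, a mean of 2, and an error of $\sqrt{2}$.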

The error evaluation value in the error evaluation value column 123 is any value derived from the errors; the smaller it is, the more highly the corresponding number of clusters is rated. Examples include the mean, the minimum, or the variance of the errors contained in a record (row) of the error information 34. Depending on how the error evaluation value is defined, a larger value may instead indicate a higher rating for the number of clusters.
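The three example definitions can be sketched as a single Python function that reduces one row's per-cluster errors to one value. The function name and method labels are hypothetical, and population variance is an assumption; for all three, smaller means a better-rated cluster count.

```python
import statistics

def error_evaluation(errors, method="mean"):
    """Reduce the per-cluster errors of one row of the error information to a
    single error evaluation value (smaller is better for all three options)."""
    if method == "mean":
        return statistics.mean(errors)
    if method == "min":
        return min(errors)
    if method == "variance":
        return statistics.pvariance(errors)  # population variance (assumption)
    raise ValueError(f"unknown method: {method}")
```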

(Processing procedure)
FIG. 10 is a flowchart of the processing procedure. As a precondition for starting it, the survey value information 32 (FIG. 2) is assumed to be stored, in complete form, in the auxiliary storage device 15.
In step S201, the survey value acquisition unit 21 of the cluster division evaluation device 1 acquires survey values. Specifically, the survey value acquisition unit 21 acquires the survey value information 32 (FIG. 2) from the auxiliary storage device 15.

In step S202, the clustering unit 22 of the cluster division evaluation device 1 accepts variables. Specifically, the clustering unit 22 accepts the user's selection, via the input device 12, of some or all of the explanatory variables. For example, if the user expects the parameter $a_1$ for sales floor area to be the largest of all parameters other than $a_0$, that is, expects sales floor area to have the greatest influence on the objective variable, the user may select "sales floor area". Here, assume the user selects "sales floor area".

In step S203, the clustering unit 22 accepts the maximum number of clusters and related settings. Specifically, the clustering unit 22 accepts the user's input, via the input device 12, of the minimum and maximum numbers of clusters and the minimum number of points ● (records in the cluster information 33) per cluster. Here, assume the user enters 1 as the minimum number of clusters, 4 as the maximum number of clusters, and 4 as the minimum number of points ● per cluster.

In step S204, the clustering unit 22 performs clustering. Specifically, first, the clustering unit 22 deletes the columns of explanatory variables other than "sales floor area" from the survey value information 32 (FIG. 2).
Second, using the k-means method described above, the clustering unit 22 divides the 20 points ● $[X, Y] =$ [sales floor area, annual sales] of the survey value information 32 (FIG. 2) into $k$ clusters ($k = 1, 2, 3, 4$). In doing so, the clustering unit 22 ensures that every cluster contains at least four points ●.

In step S205, the clustering unit 22 creates the cluster information 33 (FIG. 3). Specifically, the clustering unit 22 creates the cluster information 33 from the clustering result obtained in the "second" part of step S204.

In step S206, the regression analysis unit 23 of the cluster division evaluation device 1 creates the prediction model 31. Specifically, the regression analysis unit 23 accepts a prediction-model formula written by the user on screen, or displays templates of common prediction models and accepts the user's choice. The prediction model 31 created here need not be a linear expression like Equation 1; it may be a higher-order expression or a nonlinear expression involving exponentials, logarithms and the like. However, the prediction model 31 must include a parameter (whose value is still unknown at this stage) for each variable accepted in step S202.

In step S207, the regression analysis unit 23 optimizes the parameters for each cluster. Specifically, the regression analysis unit 23 determines the parameters of the prediction model for each cluster by the method described above. That is, from the survey values of stores M01 to M20 in the survey value information 32 (FIG. 2), the regression analysis unit 23 uses those belonging to the cluster being processed and determines the parameters that minimize $\sum (Y - y)^2$.

In step S208, the regression analysis unit 23 creates the error information 34 (FIG. 9). Specifically, first, the regression analysis unit 23 creates the error information 34 with four records whose cluster count column 121 holds "1", "2", "3" and "4"; the error column 122 and the error evaluation value column 123 are left blank.
Second, using the $\sum (Y - y)^2$ minimized in step S207, the regression analysis unit 23 calculates the error $\sqrt{\sum (Y - y)^2 / n}$ and stores it in the error column 122.
Third, the regression analysis unit 23 calculates an error evaluation value from the errors of each record and stores it in the error evaluation value column 123.

In step S209, the regression analysis unit 23 determines the number of clusters based on the error evaluation value. Specifically, the regression analysis unit 23 determines the number of clusters of the record with the smallest error evaluation value. Choosing the "smallest" is only an example; the regression analysis unit 23 may instead determine several numbers of clusters whose error evaluation values are small enough to satisfy a predetermined criterion.
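Step S209 can be sketched in Python as follows. The `tolerance` argument is a hypothetical stand-in for the unspecified "predetermined criterion"; without it, the function returns the single cluster count with the smallest error evaluation value.

```python
def choose_cluster_counts(evaluations, tolerance=None):
    """evaluations: {number_of_clusters: error_evaluation_value}, smaller is better.
    Without a tolerance, return the single best count (step S209); with one,
    return every count within `tolerance` of the best (hypothetical criterion)."""
    best = min(evaluations.values())
    if tolerance is None:
        return [min(evaluations, key=evaluations.get)]
    return sorted(k for k, v in evaluations.items() if v <= best + tolerance)

# Hypothetical error evaluation values for 1 to 4 clusters.
print(choose_cluster_counts({1: 9.0, 2: 6.0, 3: 5.2, 4: 5.0}))  # [4]
```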

In step S210, the display processing unit 24 of the cluster division evaluation device 1 displays the determined number of clusters and its error evaluation value. Specifically, first, the display processing unit 24 displays on the output device 13 the number of clusters determined in step S209 and the error evaluation value for that number of clusters. Here, assume that “number of clusters = 4” is displayed.
Second, the display processing unit 24 stores the prediction models 31g, 31h, 31i and 31j (FIG. 7), which correspond to the four clusters c7, c8, c9 and c10, in the auxiliary storage device 15. The processing procedure then ends.

(Use of the prediction models)
Assuming that “number of clusters = 4” was displayed in the “first” part of step S210, the following describes how the prediction models are used afterward. Cluster c7 corresponds to stores M01 to M04 in FIG. 3. Stores M01 to M04 are, for example, stores located in a specific area. When predicting the annual sales of a store newly opened in that area, the regression analysis unit 23 uses the prediction model 31g. Cluster c10 corresponds to stores M17 to M20 in FIG. 3. Stores M17 to M20 are, for example, stores located in another specific area. When predicting the annual sales of a store newly opened in that other area, the regression analysis unit 23 uses the prediction model 31j. The same applies to the other clusters.
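The routing described above can be sketched as a lookup from area to cluster to stored model. The pairings c7 → 31g and c10 → 31j come from the text. The area names, the `(a, b)` parameter format and the linear form Y = a·x + b (with sales floor area as x) are hypothetical.

```python
# Hypothetical sketch: route a new store to the prediction model of its cluster.
# The cluster-to-model pairs follow the text; everything else is illustrative.
CLUSTER_TO_MODEL = {"c7": "31g", "c10": "31j"}


def predict_for_new_store(area, floor_area, area_to_cluster, model_params):
    """Return predicted annual sales for a new store opened in `area`.

    area_to_cluster: {area name: cluster id}   (hypothetical mapping)
    model_params:    {model id: (a, b)}        (fitted per cluster in step S207)
    """
    cluster = area_to_cluster[area]
    a, b = model_params[CLUSTER_TO_MODEL[cluster]]
    return a * floor_area + b  # assumed linear model Y = a*x + b
```

Keeping the stored models keyed by cluster means that adding a new store needs only a table lookup and no refit.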

(Modified example of the processing procedure)
In the procedure above, the clustering unit 22 performs clustering for every number of clusters, and the regression analysis unit 23 calculates the error evaluation value for every number of clusters (exhaustive processing). Alternatively, for the numbers of clusters k = 1, 2, 3, 4 taken in ascending or descending order, the clustering unit 22 may perform clustering and the regression analysis unit 23 may calculate the error evaluation value, one k at a time. In this case, the clustering unit 22 and the regression analysis unit 23 repeat the processing until the error evaluation value reaches a predetermined threshold (target), or until the decrease in the error evaluation value from the previous iteration becomes equal to or less than a predetermined threshold.
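The iterative variant replaces the exhaustive loop with an early-stopping search. In this minimal sketch, `evaluate(k)` is a hypothetical callable standing for “cluster into k clusters, fit, and return the error evaluation value”. Lower values are assumed to be better. The text does not say which k is adopted when the search stops, so the sketch returns the full history.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def search_cluster_count(ks: Iterable[int],
                         evaluate: Callable[[int], float],
                         target: Optional[float] = None,
                         min_improvement: Optional[float] = None
                         ) -> List[Tuple[int, float]]:
    """Evaluate k in the given order and stop early.

    Stops when the value reaches `target`, or when the decrease from the
    previous iteration is <= `min_improvement`. Returns [(k, value), ...].
    """
    history: List[Tuple[int, float]] = []
    prev: Optional[float] = None
    for k in ks:
        value = evaluate(k)
        history.append((k, value))
        if target is not None and value <= target:
            break  # error evaluation value reached the target
        if (min_improvement is not None and prev is not None
                and prev - value <= min_improvement):
            break  # improvement over the previous k is too small
        prev = value
    return history
```

Pass `range(1, 5)` for ascending order or `range(4, 0, -1)` for descending order. In the descending case, a threshold on the decrease may need a different sign convention.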

<Second embodiment>
Next, the second embodiment will be described. The second embodiment differs from the first embodiment in that it uses the survey value information 32b (FIG. 11) in place of the survey value information 32 (FIG. 2) used in the first embodiment.

FIG. 11 is an example of the survey value information 32b. In the survey value information 32b, the survey value of the objective variable is stored in the objective variable column 132, and the survey values of the explanatory variables are stored in the explanatory variable column 133, in association with the customer ID stored in the customer ID column 131.
The customer ID in the customer ID column 131 is an identifier that uniquely identifies a customer who is a trading entity.

The survey value of the objective variable in the objective variable column 132 is the customer's annual purchase amount. The annual purchase amount here is the total price of the products purchased at all stores. “Annual” is merely an example: the survey value of the objective variable may instead be a monthly purchase amount or a purchase amount for any other period. “#” stands in for differing values that are omitted (the same applies below).
The survey values of the explanatory variables in the explanatory variable column 133 are age (column 133a), gender (column 133b), home ownership (column 133c), annual income (column 133d), payment method (column 133e), food purchase amount (column 133f) and clothing purchase amount (column 133g).

Of these, age is the age of the customer.
Gender is the gender of the customer. In FIG. 11, for clarity, gender is either “male” or “female”. When gender is assigned to an explanatory-variable axis of the multidimensional space, it is converted to a number, for example “male = 0, female = 1” (the same applies to home ownership and payment method, described below).

Home ownership is either “yes”, indicating that the customer owns the home in which the customer lives, or “no”, indicating that the customer does not. When home ownership is “yes”, the area or market value of the land, for example, may be stored in this column.
Annual income is the customer's yearly income. The customer buys goods and the like with the amount left after subtracting savings or loan repayments from the annual income.
Payment method is either “cash”, indicating that the price of the goods was paid in cash, or “card”, indicating that it was paid by card.
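Categorical attributes have to be numeric before they can serve as axes of the multidimensional space. Only “male = 0, female = 1” comes from the text. The codes for home ownership and payment method, the dictionary keys and the function name `encode_customer` are illustrative assumptions.

```python
# Numeric codes for categorical attributes. GENDER follows the example in the
# text; HOME_OWNERSHIP and PAYMENT are assumed codings for illustration.
GENDER = {"male": 0, "female": 1}
HOME_OWNERSHIP = {"no": 0, "yes": 1}
PAYMENT = {"cash": 0, "card": 1}


def encode_customer(record):
    """Map one customer's survey values to a numeric explanatory-variable vector."""
    return [
        float(record["age"]),
        float(GENDER[record["gender"]]),
        float(HOME_OWNERSHIP[record["home_ownership"]]),
        float(record["annual_income"]),
        float(PAYMENT[record["payment_method"]]),
    ]
```

Binary 0/1 codes keep each categorical axis on a fixed scale. In practice, continuous attributes such as annual income would usually be standardized before distance-based clustering, which is an additional assumption not stated in the text.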

The food purchase amount is the annual total spent on food among the products purchased at all stores.
The clothing purchase amount is the annual total spent on clothing among the products purchased at all stores.
“Food” and “clothing” are merely examples. As a breakdown of how the annual purchase amount is spent, the user can select the purchase amount of any specific item that is likely to correlate with the products or services being analyzed. For example, it is well known that the ratio of the food purchase amount to the annual purchase amount (the Engel coefficient) is roughly constant within each social class.
The annual purchase amount corresponds to the “economic power of the trading entity”, and age, gender, home ownership, annual income, payment method, food purchase amount and clothing purchase amount correspond to the “attributes of the trading entity”.
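The per-customer amounts above are annual totals over all stores. The sketch below derives them from individual purchase lines. The input format `(customer_id, category, amount)` and the category labels are hypothetical, and `food_share` computes the Engel-style ratio mentioned in the text.

```python
from collections import defaultdict


def aggregate_purchases(transactions):
    """Sum one year's purchase lines per customer.

    transactions: iterable of (customer_id, category, amount), a hypothetical
    format. Returns {customer_id: {"annual": ..., "food": ..., "clothing": ...}}.
    """
    totals = defaultdict(lambda: {"annual": 0, "food": 0, "clothing": 0})
    for customer_id, category, amount in transactions:
        t = totals[customer_id]
        t["annual"] += amount  # objective variable: annual purchase amount
        if category in ("food", "clothing"):
            t[category] += amount  # item-specific explanatory variables
    return dict(totals)


def food_share(t):
    """Ratio of food purchases to the annual purchase amount (Engel-style)."""
    return t["food"] / t["annual"] if t["annual"] else 0.0
```

Other items can be tracked by extending the category tuple. This matches the text's point that food and clothing are only examples.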

The records of the survey value information 32b are basically stored per customer. However, each record can also correspond to one purchase occasion (payment unit), with several records allowed for the same customer. When records are stored per customer, a single record usually reflects several purchase occasions of that customer, so “card” and “cash” may both occur as payment methods. In that case, whichever payment method was used more often, or for the larger amount, may be adopted as the representative value. Alternatively, a two-dimensional value such as “card = # times, cash = # times” or “card = # yen, cash = # yen” may be adopted.
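Both options for a customer whose purchases mix card and cash can be sketched briefly. The input format `(method, amount)` and the output keys are hypothetical. On a tie, `Counter.most_common` returns the method encountered first, which is an arbitrary choice that the text does not specify.

```python
from collections import Counter


def representative_payment(payments, by="count"):
    """Pick the representative payment method for one customer.

    payments: list of (method, amount) per purchase occasion (hypothetical format).
    by="count" picks the method used most often; by="amount" the largest total.
    """
    weights = Counter()
    for method, amount in payments:
        weights[method] += 1 if by == "count" else amount
    return weights.most_common(1)[0][0]


def two_dimensional_payment(payments):
    """Keep both methods as separate values: times used and yen paid."""
    counts = Counter(method for method, _ in payments)
    amounts = Counter()
    for method, amount in payments:
        amounts[method] += amount
    return {"card_times": counts["card"], "cash_times": counts["cash"],
            "card_yen": amounts["card"], "cash_yen": amounts["cash"]}
```

The two-dimensional form adds axes to the multidimensional space but keeps information that a single representative value discards.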

The specific processing of the second embodiment is the same as that of the first embodiment, and the description of FIG. 1 and FIGS. 3 to 10 applies to the second embodiment as is. However, in FIG. 3, “store ID” (column 111) is read as “customer ID”, “annual sales” (column 112) is read as “annual purchase amount”, and “sales floor area” (column 113) is read as any one of columns 133a to 133g in FIG. 11 (for example, “annual income”).

In the second embodiment, as in the first, the following describes how the prediction models are used afterward, assuming that “number of clusters = 4” was displayed in the “first” part of step S210. Cluster c7 corresponds to customers P01 to P04 in FIG. 3 after the above substitution. Customers P01 to P04 are, for example, customers living in a specific area. When predicting the annual purchase amount of a customer living in that area, the regression analysis unit 23 uses the prediction model 31g. Cluster c10 corresponds to customers P17 to P20 in FIG. 3. Customers P17 to P20 are, for example, customers living in another specific area. When predicting the annual purchase amount of a customer living in that other area, the regression analysis unit 23 uses the prediction model 31j. The same applies to the other clusters.

The first and second embodiments have used stores and customers, the parties to commodity transactions, as examples. However, as is clear from the above, the present invention can also be applied to trading entities other than customers as buyers of goods and stores as sellers of goods. Trading entities include, for example, lessors and lessees in goods rental transactions, creditors and debtors in capital transactions, and providers and recipients of services in transportation, logistics, accommodation, medical care, education, nursing care and the like.

The first and second embodiments have described examples of predicting annual sales and annual purchase amounts. However, as is clear from the above, the present invention can be applied to predicting the economic power of trading entities in general. Economic power includes loan balances, borrowing balances, quantities of goods or services provided or received, market share and the like.

The first and second embodiments have described examples in which the explanatory variables of stores and customers, the parties to commodity transactions, are data obtainable from the store server 3 or the card company server 4. However, as is clear from the above, the present invention can be applied to any attribute of a trading entity that can be quantified.

(Effects of the embodiments)
The cluster division evaluation device of the embodiments has the following effects.
(1) The cluster division evaluation device can create a highly accurate prediction model for each group of trading entities that share similar attributes.
(2) The cluster division evaluation device can display the achievable error evaluation value and the number of clusters suited to the attributes of the trading entities.
(3) The cluster division evaluation device allows the user to specify the number and size of the clusters.
(4) The cluster division evaluation device can be applied to forecasting the sales of a store or the purchase amount of a customer.
(5) The cluster division evaluation device can use attributes of trading entities that are generally easy to obtain.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one that includes all of the described configurations. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. Furthermore, other configurations can be added to, deleted from, or substituted for part of the configuration of each embodiment.

Each of the configurations, functions, processing units, processing means and the like described above may be implemented partly or entirely in hardware, for example by designing them as integrated circuits. Each of the configurations, functions and the like described above may also be implemented in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables and files that realize each function can be stored in a memory, in a recording device such as a hard disk or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card or a DVD.
The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines of an actual product are necessarily shown. In practice, almost all components may be considered to be interconnected.

1 Cluster division evaluation device
2 Network
3 Store server
4 Card company server
11 Central control device
12 Input device
13 Output device
14 Main storage device
15 Auxiliary storage device
16 Communication device
21 Survey value acquisition unit
22 Clustering unit
23 Regression analysis unit
24 Display processing unit
31 Prediction model
32, 32b Survey value information
33 Cluster information
34 Error information

Claims

A cluster division evaluation device comprising:
a survey value acquisition unit that acquires survey values of attributes of a trading entity and survey values of the economic power of the trading entity;
a clustering unit that divides the survey values into a plurality of clusters in a multidimensional space based on the survey values of a plurality of attributes of the trading entity and the survey values of the economic power of the trading entity; and
a regression analysis unit that, for each of the divided clusters, creates a prediction model having the attributes of the trading entity as explanatory variables and the economic power of the trading entity as the objective variable based on the survey values of the attributes and the survey values of the economic power, and optimizes the parameters of the prediction model for each of the plurality of clusters;
evaluates, for each number of clusters, the difference between the predicted value of the economic power of the trading entity output by the prediction model with the optimized parameters and the survey value of the economic power of the trading entity;
and determines the number of clusters based on the result of the evaluation.

The cluster division evaluation device according to claim 1, wherein:
when the trading entity is a store,
the attributes include any of the sales floor area, the roadside land price used for property tax, or the number of parking spaces; and
when the trading entity is a customer,
the attributes include any of home ownership, annual income, payment method, or the purchase amount of a specific item.

The cluster division evaluation device according to claim 2, wherein
the clustering unit
accepts the user's selection of some or all of the plurality of explanatory variables.

The cluster division evaluation device according to claim 3, wherein
the regression analysis unit
accepts a mathematical formula of the prediction model written by the user on a screen, or displays candidate forms of the prediction model on the screen and accepts the user's selection.

The cluster division evaluation device according to claim 1, further comprising
a display processing unit that displays the determined number of clusters and the evaluation result corresponding to that number of clusters.

The cluster division evaluation device according to claim 5, wherein
the clustering unit
accepts the user's input of minimum and maximum values for the number of clusters and minimum and maximum values for the number of survey values contained in each cluster.

The cluster division evaluation device according to claim 6, wherein:
when the trading entity is a store,
the economic power is sales; and
when the trading entity is a customer,
the economic power is the purchase amount.

A cluster division evaluation method performed by a cluster division evaluation device, wherein:
a survey value acquisition unit of the cluster division evaluation device
acquires survey values of attributes of a trading entity and survey values of the economic power of the trading entity;
a clustering unit of the cluster division evaluation device
divides the survey values into a plurality of clusters in a multidimensional space based on the survey values of a plurality of attributes of the trading entity and the survey values of the economic power of the trading entity; and
a regression analysis unit of the cluster division evaluation device
creates, for each of the divided clusters, a prediction model having the attributes of the trading entity as explanatory variables and the economic power of the trading entity as the objective variable based on the survey values of the attributes and the survey values of the economic power, and optimizes the parameters of the prediction model for each of the plurality of clusters;
evaluates, for each number of clusters, the difference between the predicted value of the economic power of the trading entity output by the prediction model with the optimized parameters and the survey value of the economic power of the trading entity;
and determines the number of clusters based on the result of the evaluation.

A cluster division evaluation program for causing a computer to function as:
a survey value acquisition unit that acquires survey values of attributes of a trading entity and survey values of the economic power of the trading entity;
a clustering unit that divides the survey values into a plurality of clusters in a multidimensional space based on the survey values of a plurality of attributes of the trading entity and the survey values of the economic power of the trading entity; and
a regression analysis unit that, for each of the divided clusters, creates a prediction model having the attributes of the trading entity as explanatory variables and the economic power of the trading entity as the objective variable based on the survey values of the attributes and the survey values of the economic power, and optimizes the parameters of the prediction model for each of the plurality of clusters;
evaluates, for each number of clusters, the difference between the predicted value of the economic power of the trading entity output by the prediction model with the optimized parameters and the survey value of the economic power of the trading entity;
and determines the number of clusters based on the result of the evaluation.