JP6557192B2

JP6557192B2 - Clustering apparatus and clustering method

Info

Publication number: JP6557192B2
Application number: JP2016160855A
Authority: JP
Inventors: 旭史; 真智子豊田; 可織飯盛
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2016-08-18
Filing date: 2016-08-18
Publication date: 2019-08-07
Anticipated expiration: 2036-08-18
Also published as: JP2018028823A

Description

本発明は、クラスタリング装置およびクラスタリング方法に関する。 The present invention relates to a clustering apparatus and a clustering method.

従来、移動体データを対象とした分析技術が知られている。従来の分析技術は、例えば、個々の移動体にフォーカスした技術と複数の移動体全体にフォーカスした技術とに分類される。また、従来の分析技術は、移動体が移動する場所にフォーカスした技術と移動体の詳細な軌跡にフォーカスした技術とに分類される。 Conventionally, an analysis technique for moving object data is known. Conventional analysis techniques are classified into, for example, a technique that focuses on individual moving bodies and a technique that focuses on a plurality of moving bodies as a whole. Conventional analysis techniques are classified into a technique focused on a place where the moving body moves and a technique focused on a detailed trajectory of the moving body.

個々の移動体および場所にフォーカスした技術として、例えば、移動体の各通過点での滞留時間とＰＯＩ（Point of interest）情報を用いて滞留場所を推定する技術が知られている。また、移動体全体および場所にフォーカスした技術として、例えば、複数の移動体が通過した密度の高い場所を求める技術が知られている。 As a technique focusing on individual moving bodies and places, for example, a technique for estimating a staying place using a staying time and POI (Point of interest) information at each passing point of the moving body is known. As a technique focused on the entire moving body and the place, for example, a technique for obtaining a high-density place through which a plurality of moving bodies have passed is known.

また、個々の移動体および軌跡にフォーカスした技術として、例えば、マップをメッシュに分け、詳細軌跡に基づくメッシュ間の遷移確率から目的地を予測する技術が知られている。また、移動体全体および軌跡にフォーカスした技術として、例えば、移動体の軌跡を重回帰混合正規分布モデルに当てはめ、同じモデルに含まれる軌跡を共通移動ルートとして検出する技術が知られている。 Further, as a technique focusing on individual moving bodies and trajectories, for example, a technique is known in which a map is divided into meshes and a destination is predicted from transition probabilities between meshes based on detailed trajectories. As a technique focusing on the entire moving body and the trajectory, for example, a technique is known in which the trajectory of the moving body is applied to a multiple regression mixed normal distribution model and the trajectory included in the same model is detected as a common movement route.

Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu: “A density-based algorithm for discovering clusters in large spatial databases with noise,” KDD, pp.226-231, 1996.Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu: “A density-based algorithm for discovering clusters in large spatial databases with noise,” KDD, pp.226-231, 1996. 熊谷雄介、今井良太、松林達史: “非負値複合テンソル因子分解を用いた訪日外国人観光客の回遊行動分析,” IBISML, 2015.Yusuke Kumagai, Ryota Imai, Tatsushi Matsubayashi: “Analysis of migratory behavior of foreign tourists using non-negative compound tensor factorization,” IBISML, 2015. Scott Gaffney, and Padhraic Smyth: “Trajectory Clustering with Mixtures of Regression Models,” KDD, pp.63-72, 1999.Scott Gaffney, and Padhraic Smyth: “Trajectory Clustering with Mixtures of Regression Models,” KDD, pp.63-72, 1999. Jae-Gil Lee, Jiawei Han, Kyu-Young Whang: “Trajectory clustering: a partition-and-group framework,” SIGMOD, pp.593-604, 2007.Jae-Gil Lee, Jiawei Han, Kyu-Young Whang: “Trajectory clustering: a partition-and-group framework,” SIGMOD, pp.593-604, 2007.

しかしながら、従来の技術には、移動体データから、場所および軌跡の両方にフォーカスした分析結果を取得することができない場合があるという問題があった。例えば、点の密度が高い場所を求める技術では、軌跡にフォーカスした分析結果を取得することができない場合があった。また、例えば、同じ重回帰混合正規分布モデルに含まれる軌跡を共通移動ルートとして検出する技術では、場所にフォーカスした分析結果を取得することができない場合があった。 However, the conventional technique has a problem in that it may not be possible to acquire analysis results focused on both the location and the locus from the moving body data. For example, in a technique for obtaining a place where the density of points is high, there is a case where an analysis result focused on a locus cannot be obtained. In addition, for example, in the technique of detecting a trajectory included in the same multiple regression mixed normal distribution model as a common movement route, there is a case where an analysis result focused on a place cannot be acquired.

本発明のクラスタリング装置は、所定の区域を、複数のパターンで、複数の区域に分割する分割部と、前記複数のパターンごとに、前記複数の区域のそれぞれについて、区域内におけるデータ点の移動方向間の角度を基に、区域内のデータ点が交差する傾向の大きさを示す値を計算し、前記複数の区域のうち前記値が所定値以上である区域を第１の区域に分類し、前記複数の区域のうち前記値が所定値以上でない区域を第２の区域に分類する分類部と、前記第１の区域に分類された区域ごとに、前記データ点の位置間の距離および前記データ点の移動方向間の角度の両方に基づく第１の尺度を用いて、区域内のデータ点のクラスタリングを行い、前記第２の区域に分類された区域ごとに、前記データ点の位置間の距離および前記データ点の移動方向間の角度の両方に基づく第２の尺度を用いて、区域内のデータ点のクラスタリングを行うクラスタリング部と、前記複数のパターンごとに、前記クラスタリング部によるクラスタリングの結果に基づいて、区域の数、区域の大きさ、区域内のデータ点の数、クラスタの数およびいずれのクラスタにも含まれなかったデータ点の数に比例する第１のコストを計算する計算部と、前記複数のパターンのうち、前記第１のコストが最も小さいパターンを最適解として選択する選択部と、を有することを特徴とする。 The clustering device of the present invention includes a dividing unit that divides a predetermined area into a plurality of areas by a plurality of patterns, and a moving direction of data points in the area for each of the plurality of areas for each of the plurality of patterns. Based on the angle between, calculate a value indicating the magnitude of the tendency of the data points in the area to intersect, classify the area where the value is greater than or equal to a predetermined value among the plurality of areas as a first area, A classifying unit that classifies a zone whose value is not equal to or greater than a predetermined value among the plurality of zones as a second zone, and a distance between the positions of the data points and the data for each zone classified as the first zone Clustering data points within an area using a first measure based on both angles between the direction of movement of the points, and for each of the areas classified as the second area, the distance between the positions of the data points And transfer of the data points A clustering unit for clustering data points in a zone using a second measure based on both angles between directions, and for each of the plurality of patterns, the number of zones based on the result of clustering by the clustering unit A calculator for calculating a first cost proportional to the size of the area, the number of data points in the area, the number of clusters and the number of data points not included in any cluster, and And a selection unit that selects a pattern having the smallest first cost as an optimal solution.

本発明のクラスタリング方法は、クラスタリング装置で実行されるクラスタリング方法であって、所定の区域を、複数のパターンで、複数の区域に分割する分割工程と、前記複数のパターンごとに、前記複数の区域のそれぞれについて、区域内におけるデータ点の移動方向間の角度を基に、区域内のデータ点が交差する傾向の大きさを示す値を計算し、前記複数の区域のうち前記値が所定値以上である区域を第１の区域に分類し、前記複数の区域のうち前記値が所定値以上でない区域を第２の区域に分類する分類工程と、前記第１の区域に分類された区域ごとに、前記データ点の位置間の距離および前記データ点の移動方向間の角度の両方に基づく第１の尺度を用いて、区域内のデータ点のクラスタリングを行い、前記第２の区域に分類された区域ごとに、前記データ点の位置間の距離および前記データ点の移動方向間の角度の両方に基づく第２の尺度を用いて、区域内のデータ点のクラスタリングを行うクラスタリング工程と、前記複数のパターンごとに、前記クラスタリング工程によるクラスタリングの結果に基づいて、区域の数、区域の大きさ、区域内のデータ点の数、クラスタの数およびいずれのクラスタにも含まれなかったデータ点の数に比例する第１のコストを計算する計算工程と、前記複数のパターンのうち、前記第１のコストが最も小さいパターンを最適解として選択する選択工程と、を含んだことを特徴とする。 The clustering method of the present invention is a clustering method executed by a clustering apparatus, wherein a predetermined area is divided into a plurality of areas by a plurality of patterns, and the plurality of areas for each of the plurality of patterns. For each of the above, a value indicating the magnitude of the tendency of the data points in the area to intersect is calculated based on the angle between the movement directions of the data points in the area. For each area classified into the first area, a classification process for classifying an area that is a first area, and classifying an area in which the value is not equal to or greater than a predetermined value among the plurality of areas as a second area; Clustering the data points in the area using a first measure based on both the distance between the positions of the data points and the angle between the movement directions of the data points, and are classified into the second area For each zone, a clustering step for clustering the data points in the zone using a second measure based on both the distance between the positions of the data points and the angle between the movement directions of the data points; For each pattern, based on the result of clustering in the clustering step, the number of areas, the size of the area, the number of data points in the area, the number of clusters and the number of data points not included in any cluster. The method includes a calculation step of calculating a proportional first cost, and a selection step of selecting a pattern having the smallest first cost among the plurality of patterns as an optimal solution.

本発明によれば、移動体データから、場所および軌跡の両方にフォーカスした分析結果を取得することができる。 According to the present invention, it is possible to acquire an analysis result focusing on both a place and a locus from moving body data.

図１は、第１の実施形態に係るクラスタリング装置の構成の一例を示す図である。FIG. 1 is a diagram illustrating an example of the configuration of the clustering apparatus according to the first embodiment. 図２は、クラスタリング装置に入力される移動体データの一例を示す図である。FIG. 2 is a diagram illustrating an example of mobile object data input to the clustering apparatus. 図３は、方向の算出方法を説明するための図である。FIG. 3 is a diagram for explaining a direction calculation method. 図４は、変換後の移動体データの一例を示す図である。FIG. 4 is a diagram illustrating an example of the moving body data after conversion. 図５は、移動体データを基に作成される図の一例である。FIG. 5 is an example of a diagram created based on mobile object data. 図６は、クラスタリング装置の処理について説明するための図である。FIG. 6 is a diagram for explaining processing of the clustering apparatus. 図７は、分割のパターンの一例を示す図である。FIG. 7 is a diagram illustrating an example of a division pattern. 図８は、密集エリアクラスタの傾向を説明するための図である。FIG. 8 is a diagram for explaining a tendency of a dense area cluster. 図９は、共通移動軌跡クラスタの傾向を説明するための図である。FIG. 9 is a diagram for explaining the tendency of the common movement locus cluster. 図１０は、クラスタリング結果の一例を示す図である。FIG. 10 is a diagram illustrating an example of the clustering result. 図１１は、第１の実施形態に係るクラスタリング装置の処理の流れを示すフローチャートである。FIG. 11 is a flowchart showing the flow of processing of the clustering apparatus according to the first embodiment. 図１２は、プログラムが実行されることによりクラスタリング装置が実現されるコンピュータの一例を示す図である。FIG. 12 is a diagram illustrating an example of a computer in which a clustering apparatus is realized by executing a program.

［第１の実施形態の構成］
以下に、本願に係るクラスタリング装置およびクラスタリング方法の実施形態を図面に基づいて詳細に説明する。なお、この実施形態により本発明が限定されるものではない。まず、図１を用いて、第１の実施形態に係るクラスタリング装置の構成について説明する。図１は、第１の実施形態に係るクラスタリング装置の構成の一例を示す図である。図１に示すように、クラスタリング装置１０は、入力部１１、出力部１２、制御部１３および記憶部１４を有する。 [Configuration of First Embodiment]
Hereinafter, embodiments of a clustering apparatus and a clustering method according to the present application will be described in detail with reference to the drawings. In addition, this invention is not limited by this embodiment. First, the configuration of the clustering apparatus according to the first embodiment will be described with reference to FIG. FIG. 1 is a diagram illustrating an example of the configuration of the clustering apparatus according to the first embodiment. As illustrated in FIG. 1, the clustering apparatus 10 includes an input unit 11, an output unit 12, a control unit 13, and a storage unit 14.

入力部１１は、例えば、移動体データの入力を受け付ける。入力部１１は、例えば、マウスやキーボード等の入力装置である。出力部１２は、画面の表示等により、データを出力する。出力部１２は、例えば、クラスタリングの結果を表示するディスプレイ等の表示装置である。 The input unit 11 receives, for example, input of moving body data. The input unit 11 is an input device such as a mouse or a keyboard, for example. The output unit 12 outputs data by screen display or the like. The output unit 12 is a display device such as a display for displaying the result of clustering, for example.

制御部１３は、クラスタリング装置１０全体を制御する。制御部１３は、例えば、ＣＰＵ（Central Processing Unit）、ＭＰＵ（Micro Processing Unit）等の電子回路や、ＡＳＩＣ（Application Specific Integrated Circuit）、ＦＰＧＡ（Field Programmable Gate Array）等の集積回路である。また、制御部１３は、各種の処理手順を規定したプログラムや制御データを格納するための内部メモリを有し、内部メモリを用いて各処理を実行する。また、制御部１３は、各種のプログラムが動作することにより各種の処理部として機能する。例えば、制御部１３は、変換部１３１、分割部１３２、分類部１３３、クラスタリング部１３４、計算部１３５および選択部１３６を有する。 The control unit 13 controls the entire clustering apparatus 10. The control unit 13 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or MPU (Micro Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The control unit 13 has an internal memory for storing programs and control data defining various processing procedures, and executes each process using the internal memory. The control unit 13 functions as various processing units when various programs are operated. For example, the control unit 13 includes a conversion unit 131, a division unit 132, a classification unit 133, a clustering unit 134, a calculation unit 135, and a selection unit 136.

記憶部１４は、ＨＤＤ（Hard Disk Drive）、ＳＳＤ（Solid State Drive）、光ディスク等の記憶装置である。なお、記憶部１４は、ＲＡＭ（Random Access Memory）、フラッシュメモリ、ＮＶＳＲＡＭ（Non Volatile Static Random Access Memory）等のデータを書き換え可能な半導体メモリであってもよい。記憶部１４は、クラスタリング装置１０で実行されるＯＳ（Operating System）や各種プログラムを記憶する。さらに、記憶部１４は、プログラムの実行で用いられる各種情報を記憶する。 The storage unit 14 is a storage device such as a hard disk drive (HDD), a solid state drive (SSD), and an optical disk. The storage unit 14 may be a semiconductor memory capable of rewriting data, such as a RAM (Random Access Memory), a flash memory, and an NVSRAM (Non Volatile Static Random Access Memory). The storage unit 14 stores an OS (Operating System) executed by the clustering apparatus 10 and various programs. Furthermore, the memory | storage part 14 memorize | stores the various information used by execution of a program.

図２を用いて、移動体データについて説明する。図２は、クラスタリング装置に入力される移動体データの一例を示す図である。移動体データは、移動体の位置および移動方向に関する情報を含んだデータである。なお、図２に示す移動体データは、東京大学の人の流れプロジェクトの人流データ（参考ＵＲＬ：http://bdm.change-jp.com/?p=1705）の一部である。人流データは、国や地方自治体等で実施するパーソントリップ調査データから推定した各都市圏周辺の人の流れデータである。人流データには、日別に、各時刻における各ユーザの位置の経度、緯度等の属性が含まれる。また、この場合、ユーザが移動体である。 The moving body data will be described with reference to FIG. FIG. 2 is a diagram illustrating an example of mobile object data input to the clustering apparatus. The moving object data is data including information on the position and moving direction of the moving object. The mobile data shown in FIG. 2 is a part of human flow data (reference URL: http://bdm.change-jp.com/?p=1705) of the human flow project at the University of Tokyo. Human flow data is flow data of people around each metropolitan area estimated from person trip survey data conducted in the national or local government. The human flow data includes attributes such as longitude and latitude of the position of each user at each time for each day. In this case, the user is a moving body.

例えば、図２の１行目のデータは、ユーザＩＤが１であるユーザが、２０１３／７／１９：４５：００に、緯度が３５．６８３８４５１３、経度が１３９．８９９１３３７である位置にいたことを示している。なお、ユーザＩＤはユーザを識別するＩＤである。 For example, in the data on the first line in FIG. 2, the user whose user ID is 1 was at a position where the latitude was 35.68384513 and the longitude was 139.88991337 at 2013/7/1 9:45:00. Is shown. The user ID is an ID for identifying the user.

変換部１３１は、クラスタリング装置１０に入力された移動体データを、所定の形式のデータに変換する。例えば、図３に示すように、変換部１３１は、入力された移動体データを、ユーザごとに時系列に並べ、緯度および経度の変化に基づいて方向を算出する。図３は、方向の算出方法を説明するための図である。 The conversion unit 131 converts the moving body data input to the clustering apparatus 10 into data in a predetermined format. For example, as illustrated in FIG. 3, the conversion unit 131 arranges input mobile body data in time series for each user, and calculates a direction based on changes in latitude and longitude. FIG. 3 is a diagram for explaining a direction calculation method.

ここで、点ｉｄは、ユーザの各時刻における位置を識別するＩＤである。例えば、点ｉｄが１であるデータ点の位置の緯度は３５．６８３８４５１３、経度は１３９．８９９１３３７である。また、例えば、点ｉｄが２であるデータ点の位置の緯度は３５．６８２４３２８７、経度は１３９．７７３５７２８である。また、位置が同じであっても、ユーザが異なる場合は、異なる点ｉｄが設定されるようにしてもよい。 Here, the point id is an ID for identifying the position of the user at each time. For example, the latitude of the position of the data point with the point id 1 is 35.68384513 and the longitude is 139.88991337. Further, for example, the latitude of the position of the data point with the point id 2 is 35.683243287 and the longitude is 139.7735728. Even if the positions are the same, different points may be set when the users are different.

例えば、変換部１３１は、図３の１行目および２行目のデータを用いて、ユーザＩＤが１であるユーザの、点ｉｄが１であるデータ点の位置における方向を算出する。また、図３の３行目のデータには、後に続く位置のデータが存在しないため、変換部１３１は、図３の３行目のデータを削除し、ユーザＩＤが１であるユーザの、点ｉｄが３であるデータ点の位置における方向を算出しない。 For example, the conversion unit 131 calculates the direction at the position of the data point with the point id 1 for the user with the user ID 1 using the data in the first and second rows in FIG. 3. In addition, since the data in the subsequent position does not exist in the data on the third line in FIG. 3, the conversion unit 131 deletes the data on the third line in FIG. The direction at the position of the data point whose id is 3 is not calculated.

図４は、変換後の移動体データの一例を示す図である。図４に示すように変換後の移動体データには、点ｉｄ、緯度、経度および方向が含まれる。また、例えば、図４に示すように、方向は角度で表される。例えば、点ｉｄが１であるデータ点の位置の緯度は３５．６８３８４５１３、経度は１３９．８９９１３３７、方向はπ／３である。 FIG. 4 is a diagram illustrating an example of the moving body data after conversion. As shown in FIG. 4, the converted moving body data includes a point id, a latitude, a longitude, and a direction. Further, for example, as shown in FIG. 4, the direction is represented by an angle. For example, the latitude of the position of the data point with the point id 1 is 35.68384513, the longitude is 139.88991337, and the direction is π / 3.

図５は、移動体データを基に作成される図の一例である。図５は、図４に示す変換後の移動体データに含まれる各点ｉｄに対応したデータを、地図上にプロットしたものである。図５に示すように、移動体の各点ｉｄに対応したデータは、矢印で表される。図５の矢印の根本の位置は各データの緯度および経度に対応し、矢印の向きは各データの方向に対応している。 FIG. 5 is an example of a diagram created based on mobile object data. FIG. 5 is a plot of data corresponding to each point id included in the converted moving body data shown in FIG. 4 on a map. As shown in FIG. 5, the data corresponding to each point id of the moving object is represented by an arrow. The root position of the arrow in FIG. 5 corresponds to the latitude and longitude of each data, and the direction of the arrow corresponds to the direction of each data.

なお、移動体データの変換は、クラスタリング装置１０以外の装置によって行われるようにしてもよい。また、クラスタリング装置１０によるクラスタリングに適合されるデータが入力される場合、変換部１３１による変換処理は行われなくてもよい。 Note that the mobile data conversion may be performed by a device other than the clustering device 10. When data suitable for clustering by the clustering device 10 is input, the conversion process by the conversion unit 131 may not be performed.

以降の処理を、図６を用いて説明する。図６は、クラスタリング装置の処理について説明するための図である。クラスタリング装置１０は、イテレーションによって処理を行う。まず、初回のイテレーションの処理について説明し、その後、２回目以降の処理について説明する。なお、クラスタリング装置１０は、初回の処理のみを行うこととしてもよい。 The subsequent processing will be described with reference to FIG. FIG. 6 is a diagram for explaining processing of the clustering apparatus. The clustering apparatus 10 performs processing by iteration. First, the first iteration process will be described, and then the second and subsequent processes will be described. Note that the clustering apparatus 10 may perform only the first process.

［初回の処理］
まず、図６の（１−１）に示すように、分割部１３２は、移動体データの対象である領域のうち全領域を分析対象の領域として選択する。以降、分割部１３２が選択した領域を区域とよぶ。次に、図６の（１−２）に示すように、分割部１３２は、区域を、複数のパターンで、複数の区域に分割する。分割部１３２は、例えば、図７に示すように、全領域を所定のサイズのセルに分割しメッシュ化したうえで、所定の緯度を縦線、所定の経度を横線として区域を分割する。図７は、分割のパターンの一例を示す図である。図７に示すように、分割部１３２は、例えば、３×３のセルからなる区域を、４つのパターンで分割する。 [First time processing]
First, as illustrated in (1-1) in FIG. 6, the dividing unit 132 selects all the regions as regions to be analyzed among regions that are targets of moving body data. Hereinafter, the area selected by the dividing unit 132 is referred to as an area. Next, as illustrated in (1-2) of FIG. 6, the dividing unit 132 divides the area into a plurality of areas with a plurality of patterns. For example, as shown in FIG. 7, the dividing unit 132 divides the entire area into cells of a predetermined size and meshes it, and then divides an area with a predetermined latitude as a vertical line and a predetermined longitude as a horizontal line. FIG. 7 is a diagram illustrating an example of a division pattern. As illustrated in FIG. 7, the dividing unit 132 divides, for example, an area composed of 3 × 3 cells into four patterns.

ところで、密集エリアは、例えば、駅やショッピングモール等の多くの人が集まる施設が存在するエリアである。一方、共通移動軌跡は、例えば、幹線道路等の、多くの人が利用するルートである。 By the way, a crowded area is an area where there are facilities where many people gather, such as stations and shopping malls. On the other hand, the common movement locus is a route used by many people, such as a main road.

従来、密集エリアは、例えば移動体の各通過点での密度に基づいて決定されていた。また、従来、共通移動軌跡は、移動体の移動方向等を考慮した軌跡の類似度に基づいて決定されていた。このように、密集エリアおよび共通移動軌跡は、密度や方向といった、それぞれ異なる尺度で決定されていた。第１の実施形態では、クラスタリング装置１０は、密集エリアおよび共通移動軌跡をクラスタとして捉え、同一の尺度を用いて、区域内に密集エリアクラスタまたは共通移動軌跡クラスタを生成する。 Conventionally, the dense area has been determined based on, for example, the density at each passing point of the moving body. Conventionally, the common movement trajectory has been determined based on the similarity of the trajectory considering the moving direction of the moving object. As described above, the dense area and the common movement trajectory are determined on different scales such as density and direction. In the first embodiment, the clustering apparatus 10 regards a dense area and a common movement trajectory as a cluster, and generates a dense area cluster or a common movement trajectory cluster in a section using the same scale.

図８に示すように、密集エリアクラスタでは、データ点が交差する傾向、すなわち、データ点間の距離が近く、データ点の移動方向が様々である傾向が大きい。図８は、密集エリアクラスタの傾向を説明するための図である。また、共通移動軌跡クラスタでは、データ点が交差する傾向が小さい。つまり、図９に示すように、共通移動軌跡クラスタでは、データ点が一定の方向に移動する傾向が大きい。図９は、共通移動軌跡クラスタの傾向を説明するための図である。なお、データ点とは、移動体の位置および移動方向が定義されたデータである。データ点の詳細については後述する。 As shown in FIG. 8, in a dense area cluster, the data points tend to intersect, that is, the distance between the data points is close and the data points move in various directions. FIG. 8 is a diagram for explaining a tendency of a dense area cluster. In the common movement locus cluster, the tendency of data points to intersect is small. That is, as shown in FIG. 9, in the common movement trajectory cluster, the data points tend to move in a certain direction. FIG. 9 is a diagram for explaining the tendency of the common movement locus cluster. The data point is data in which the position and moving direction of the moving object are defined. Details of the data points will be described later.

これより、データ点間の距離およびデータ点の移動方向は、密集エリアクラスタおよび共通移動軌跡クラスタの両方に共通した尺度であることがいえる。また、移動方向傾向の違いにより、密集エリアおよび共通移動軌跡クラスタを区別できることがいえる。そこで、分類部１３３は、データ点の移動方向を基に、分割部１３２による分割のパターンごとに区域を分類する。具体的には、分類部１３３は、複数のパターンごとに、複数の区域のそれぞれについて、区域内におけるデータ点の移動方向間の角度を基に、区域内のデータ点が交差する傾向の大きさを示す値を計算し、複数の区域のうち値が所定値以上である区域を第１の区域に分類し、複数の区域のうち値が所定値以上でない区域を第２の区域に分類する。ここで、第１の区域は、密集エリア区域である。また、第２の区域は、共通移動軌跡区域である。 Thus, it can be said that the distance between the data points and the moving direction of the data points are a scale common to both the dense area cluster and the common moving locus cluster. Further, it can be said that the dense area and the common movement trajectory cluster can be distinguished by the difference in movement direction tendency. Therefore, the classification unit 133 classifies the areas for each division pattern by the division unit 132 based on the moving direction of the data points. Specifically, for each of a plurality of patterns, the classification unit 133 determines the magnitude of the tendency of the data points in the area to intersect based on the angle between the movement directions of the data points in the area. Is calculated, and a zone having a value greater than or equal to a predetermined value among the plurality of zones is classified as a first zone, and a zone having a value not greater than or equal to the predetermined value among the plurality of zones is classified as a second zone. Here, the first area is a dense area area. The second area is a common movement locus area.

ここで、移動体データのうち、あるユーザについての移動体データを、それぞれの要素が緯度と経度とで表される位置座標、および角度で表される移動方向を持つ系列｛ｐ_１，ｐ_２，…，ｐ_ｎ｝で表す。ここで、データ点は、当該系列の各要素である。また、データ点ｐ_ｉの移動方向は、ｐ_ｉ→ｐ_ｉ＋１と表す。データ点は点ｉｄで識別されるので、図４の点ｉｄが１であるデータ点をｐ_１とすると、ｐ_１の位置座標は（３５．６８３８４５１３，１３９．８９９１３３７）であり、ｐ_１の移動方向ｐ_１→ｐ_２はπ／３である。 Here, of the moving body data, the moving body data for a certain user is represented by a series {p ₁ , p ₂ , each element having a position coordinate represented by latitude and longitude, and a moving direction represented by an angle. ,..., _Pn }. Here, the data point is each element of the series. The moving direction of the data point p _i is expressed as p _i → p _{i + 1} . Since the data points are identified at point id, when the data point is 1 id point 4 and _{p 1,} the position coordinates of _{p 1} is (35.68384513,139.8991337), movement of _{p 1} The direction p ₁ → p ₂ is π / 3.

分類部１３３は、区域内の全てのデータ点のペアについて、移動方向によって生成される角度θを算出する。θの範囲は０≦θ≦πとする。例えば、分類部１３３は、ｐ_ｉとｐ_ｊの移動方向によって生成される角度θを、ｐ_ｉ→ｐ_ｉ＋１とｐ_ｊ→ｐ_ｊ＋１との差を０≦θ≦πの範囲で表したものとすることができる。例えば、ｐ_ｉ→ｐ_ｉ＋１−ｐ_ｊ→ｐ_ｊ＋１＝π／２である場合、分類部１３３はθ＝π／２とする。また、例えば、ｐ_ｉ→ｐ_ｉ＋１−ｐ_ｊ→ｐ_ｊ＋１＝３π／２である場合、分類部１３３はθ＝π／２とする。また、例えば、ｐ_ｉ→ｐ_ｉ＋１−ｐ_ｊ→ｐ_ｊ＋１＝−π／３である場合、分類部１３３はθ＝π／３とする。 The classification unit 133 calculates the angle θ generated by the moving direction for all pairs of data points in the area. The range of θ is 0 ≦ θ ≦ π. For example, the classification unit 133 represents the angle θ generated by the moving direction of p _i and p _j and the difference between p _i → p _{i + 1} and p _j → p _{j + 1 in} the range of 0 ≦ θ ≦ π. can do. For example, when p _i → p _{i + 1} −p _j → p _{j + 1} = π / 2, the classification unit 133 sets θ = π / 2. For example, when p _i → p _{i + 1} −p _j → p _{j + 1} = 3π / 2, the classification unit 133 sets θ = π / 2. For example, when p _i → p _{i + 1} −p _j → p _{j + 1} = −π / 3, the classification unit 133 sets θ = π / 3.

そして、分類部１３３は、算出したθのうち、π／４≦θ≦３π／４となるθの数が、０≦θ＜π／４∨３π／４＜θ≦πとなるθの数より多い区域を第１の区域、すなわち密集エリア区域に分類する。また、分類部１３３は、算出したθのうち、０≦θ＜π／４∨３π／４＜θ≦πとなるθの数が、π／４≦θ≦３π／４となるθの数以上である区域を第２の区域、すなわち共通移動軌跡区域に分類する。 Then, the classification unit 133 determines that, among the calculated θ, the number of θ that satisfies π / 4 ≦ θ ≦ 3π / 4 is greater than the number of θ that satisfies 0 ≦ θ <π / 4∨3π / 4 <θ ≦ π. A large area is classified as a first area, that is, a dense area. Further, the classification unit 133 includes the number of θs satisfying 0 ≦ θ <π / 4 ／ 3π / 4 <θ ≦ π among the calculated θs equal to or greater than the number of θs satisfying π / 4 ≦ θ ≦ 3π / 4. Is classified as a second area, that is, a common movement trajectory area.

つまり、分類部１３３は、θとπ／２との差が４／π以下であるペアの数の、θとπ／２との差が４／πより大きいデータ点の組み合わせの数に対する比が１以上である場合、区域を密集エリア区域に分類し、比が１より小さい場合、区域を共通移動軌跡区域に分類する。 In other words, the classification unit 133 has a ratio of the number of pairs in which the difference between θ and π / 2 is 4 / π or less to the number of combinations of data points in which the difference between θ and π / 2 is larger than 4 / π. If it is greater than or equal to 1, the area is classified as a dense area area, and if the ratio is less than 1, the area is classified as a common movement trajectory area.

このように、分類部１３３は、π／４≦θ≦３π／４となるθの数と０≦θ＜π／４∨３π／４＜θ≦πとなるθの数との比を、区域内のデータ点が交差する傾向の大きさを示す値としている。つまり、密集エリア区域、すなわちπ／４≦θ≦３π／４となるθの数が多い区域には、交差するデータ点が多いことになる。 In this way, the classification unit 133 calculates the ratio between the number of θs satisfying π / 4 ≦ θ ≦ 3π / 4 and the number of θs satisfying 0 ≦ θ <π / 4∨3π / 4 <θ ≦ π. The value indicates the magnitude of the tendency of the data points to intersect. That is, there are many intersecting data points in a dense area area, that is, an area where the number of θs satisfying π / 4 ≦ θ ≦ 3π / 4 is large.

一方、共通移動軌跡区域、すなわち０≦θ＜π／４∨３π／４＜θ≦πとなるθの数が多い区域には、交差するデータ点が少なく、移動方向が一致するデータ点、または移動方向が正反対であるデータ点が多いことになる。このような場合、当該区域には、上り方面と下り方面を有するルートが存在することが考えられる。 On the other hand, in a common movement trajectory area, that is, an area where the number of θ satisfying 0 ≦ θ <π / 4π3π / 4 <θ ≦ π is small, there are few data points intersecting and data points having the same moving direction, or There will be many data points with opposite directions of movement. In such a case, it is possible that a route having an up direction and a down direction exists in the area.

クラスタリング部１３４は、第１の区域に分類された区域ごとに、データ点の位置間の距離およびデータ点の移動方向間の角度の両方に基づく第１の尺度を用いて、区域内のデータ点のクラスタリングを行い、第２の区域に分類された区域ごとに、データ点の位置間の距離およびデータ点の移動方向間の角度の両方に基づく第２の尺度を用いて、区域内のデータ点のクラスタリングを行う。 The clustering unit 134 uses, for each area classified as the first area, a data point in the area using a first measure based on both the distance between the data point positions and the angle between the movement directions of the data points. For each zone classified as a second zone, using a second measure based on both the distance between the positions of the data points and the angle between the direction of movement of the data points. Perform clustering.

クラスタリング部１３４は、第１の区域、すなわち密集エリア区域に分類された区域について、式（１）によってデータ点ｐ_ｉとデータ点ｐ_ｊとの関係を表す第１の尺度ｄｉｓｔ_Ｄを算出し、ｄｉｓｔ_Ｄに基づいて区域内の各データ点のクラスタリングを行う。以降、第１の尺度ｄｉｓｔ_Ｄを密集エリア距離とよぶ。ただし、δをスケール因子、θをｐ_ｉ→ｐ_ｉ＋１とｐ_ｊ→ｐ_ｊ＋１によって生成される角度、ＥＤ（ｐ_ｉ，ｐ_ｊ）をデータ点ｐ_ｉとデータ点ｐ_ｊの位置座標間のユークリッド距離とし、θの範囲を０≦θ≦πとする。 The clustering unit 134 calculates a first scale dist _D representing the relationship between the data points p _i and the data points p _j by the equation (1) for the first area, that is, the area classified as the dense area area, Based on dist _D , each data point in the area is clustered. Hereinafter, the first scale dist _D is referred to as a dense area distance. Where δ is a scale factor, θ is an angle generated by p _i → p _{i + 1} and p _j → p _{j + 1} , and ED (p _i , p _j ) is a Euclidean between the position coordinates of the data point p _i and the data point p _j. The distance is set, and the range of θ is 0 ≦ θ ≦ π.

また、クラスタリング部１３４は、第２の区域、すなわち共通移動軌跡区域に分類された区域について、式（２）によってデータ点ｐ_ｉとデータ点ｐ_ｊとの関係を表す第２の尺度ｄｉｓｔ_Ｔを算出し、ｄｉｓｔ_Ｔに基づいて区域内の各データ点のクラスタリングを行う。以降、第２の尺度ｄｉｓｔ_Ｔを共通移動軌跡距離とよぶ。ただし、式（１）と同様に、δをスケール因子、θをｐ_ｉ→ｐ_ｉ＋１とｐ_ｊ→ｐ_ｊ＋１によって生成される角度、ＥＤ（ｐ_ｉ，ｐ_ｊ）をデータ点ｐ_ｉとデータ点ｐ_ｊの位置座標間のユークリッド距離とする。また、式（２）ではθの範囲を０≦θ≦π／２とする。 In addition, the clustering unit 134 calculates a second scale dist _T representing the relationship between the data point p _i and the data point p _j by the expression (2) for the second area, that is, the area classified as the common movement locus area. Compute and cluster each data point in the area based on dist _T. Hereinafter, the second scale dist _T is referred to as a common movement trajectory distance. However, as in the equation (1), δ is a scale factor, θ is an angle generated by p _i → p _{i + 1} and p _j → p _{j + 1} , and ED (p _i , p _j ) is a data point p _i and a data point. Let Euclidean distance between the position coordinates of p _j . In the formula (2), the range of θ is set to 0 ≦ θ ≦ π / 2.

式（１）より、２点の位置座標のユークリッド距離が０に近いほど、また、移動方向によって生成される角度がπ／２に近いほど、密集エリア距離の値は最大値である１に近くなる。つまり、密集エリア距離は、２つのデータ点の位置間の距離の短さ、および移動方向間の角度の正弦値に比例する尺度である。 From equation (1), the closer the Euclidean distance between the position coordinates of the two points is to 0 and the closer the angle generated by the moving direction is to π / 2, the closer the value of the dense area distance is to 1, which is the maximum value. Become. That is, the dense area distance is a measure proportional to the short distance between the positions of the two data points and the sine value of the angle between the moving directions.

一方、式（２）より、２点の位置座標のユークリッド距離が０に近いほど、また、移動方向によって生成される角度が０に近いほど、共通移動軌跡距離の値は最大値である１に近くなる。つまり、共通移動軌跡距離は、２つのデータ点の位置間の距離の短さ、および移動方向間の角度の余弦値に比例する尺度である。 On the other hand, as the Euclidean distance between the position coordinates of the two points is closer to 0 and the angle generated according to the moving direction is closer to 0, the value of the common movement trajectory distance is set to 1 which is the maximum value. Get closer. That is, the common movement trajectory distance is a measure proportional to the short distance between the positions of the two data points and the cosine value of the angle between the movement directions.

例えば、クラスタリング部１３４は、最小距離の閾値Ｍｉｎ_ｄｉｓｔ、および最小密度の閾値Ｍｉｎ_{ｄｅｎｓｉｔｙ}が与えられるとき、密集エリア区域については密集エリア距離を用いて、共通移動軌跡区域については共通移動軌跡距離を用いて、非特許文献１に記載された密度ベースのクラスタリング手法によってクラスタリングを行うことができる。なお、クラスタリング部１３４のクラスタリングの手法は、上記手法に限られず、データ点の位置間の距離およびデータ点の移動方向間の角度の両方に基づく尺度を用いた任意のクラスタリング手法とすることができる。また、クラスタリング部１３４によるクラスタリングの結果は、各データ点がどのクラスタに含まれるかによって表される。このとき、いずれのクラスタにも含まれないデータ点が存在する場合があり、そのようなデータ点をノイズとよぶ。 For example, when the minimum distance threshold Min _dist and the minimum density threshold Min _density are given, the clustering unit 134 uses the dense area distance for the dense area area and uses the common movement locus distance for the common movement locus area. Thus, clustering can be performed by the density-based clustering method described in Non-Patent Document 1. Note that the clustering method of the clustering unit 134 is not limited to the above method, and can be any clustering method using a scale based on both the distance between the positions of the data points and the angle between the movement directions of the data points. . Further, the result of clustering by the clustering unit 134 is represented by which cluster each data point is included. At this time, there may be data points that are not included in any cluster, and such data points are referred to as noise.

図６の（１−３）は、クラスタリング部１３４による分割のパターンごとのクラスタリング結果を表している。図６の（１−３）の曲線で囲まれた領域は、密集エリア区域のクラスタを示している。また、図６の（１−３）の矢印は、共通移動軌跡区域のクラスタを示している。つまり、図６の（１−３）の各分割のパターンにおける太線で囲まれた区域内の、曲線で囲まれたデータ点の塊は、密集エリアクラスタである。また、図６の（１−３）の各分割のパターンにおける太線で囲まれた区域内の、矢印で囲まれたデータ点の塊は、共通移動軌跡クラスタである。 (1-3) in FIG. 6 represents a clustering result for each division pattern by the clustering unit 134. A region surrounded by a curve (1-3) in FIG. 6 indicates a cluster of a dense area. Moreover, the arrow of (1-3) of FIG. 6 has shown the cluster of the common movement locus | trajectory area. That is, the cluster of data points surrounded by the curves in the area surrounded by the thick line in each division pattern of (1-3) in FIG. 6 is a dense area cluster. A cluster of data points surrounded by arrows in the area surrounded by a thick line in each division pattern of (1-3) in FIG. 6 is a common movement locus cluster.

計算部１３５は、複数のパターンごとに、クラスタリング部１３４によるクラスタリングの結果に基づいて、区域の数、区域の大きさ、区域内のデータ点の数、クラスタの数およびいずれのクラスタにも含まれなかったデータ点の数に比例する第１のコストを計算する。 The calculation unit 135 is included in the number of areas, the size of the area, the number of data points in the area, the number of clusters, and the number of clusters for each of the plurality of patterns based on the result of clustering by the clustering unit 134. Calculate a first cost proportional to the number of missing data points.

計算部１３５は、各分割のパターンについて、第１のコストとしてＭＤＬコストを計算する。ＭＤＬコストとは、モデル選択基準であるＭＤＬ（minimum description length）を基にした方法によって計算される。また、ＭＤＬコストは、各分割のパターンのクラスタリング結果を評価するスコアである。なお、ＭＤＬは、式（３）に示すように、２つのコスト関数の組み合わせにより表現される。 The calculation unit 135 calculates the MDL cost as the first cost for each division pattern. The MDL cost is calculated by a method based on MDL (minimum description length) which is a model selection criterion. The MDL cost is a score for evaluating the clustering result of each divided pattern. Note that MDL is expressed by a combination of two cost functions as shown in Equation (3).

例えば、データの圧縮モデルを選択する際の基準としてＭＤＬが用いられる場合、Ｃｏｓｔ（Ｈ）は圧縮モデルＨを表現するコストである。このとき、圧縮モデルがシンプルであるほど、Ｃｏｓｔ（Ｈ）は小さくなる。一方、Ｃｏｓｔ（Ｄ│Ｈ）は圧縮の質、すなわちモデルＨによりデータＤを表現するためのコストである。つまり、データと、データに割り当てられたモデルとの適合度が大きいほど、Ｃｏｓｔ（Ｄ│Ｈ）は小さくなる。ＭＤＬでは、この二つのコストの総計が一番小さなものを選び、少ない符号でデータを表現する最適なデータ圧縮を達成する。 For example, when MDL is used as a reference when selecting a compression model of data, Cost (H) is a cost for expressing the compression model H. At this time, the simpler the compression model, the smaller Cost (H). On the other hand, Cost (D | H) is the quality of the compression, that is, the cost for expressing the data D by the model H. That is, the higher the degree of matching between the data and the model assigned to the data, the smaller Cost (D | H). In MDL, the sum of these two costs is selected and the optimum data compression is achieved in which data is expressed with a small number of codes.

まず、計算部１３５による計算に用いられるＣｏｓｔ（Ｈ）に含まれる値は、下記の通り定義される。
生成された区域の個数ｎ：ｌｏｇ^＊（ｎ）ビット
各区域 ni におけるセルの数ｎ_ｃ：Σ^ｎ _ｉ＝１ｌｏｇ^＊（ｎ_ｃ）ビット
各区域 ni におけるデータ点の数ｎ_ｐ：Σ^ｎ _ｉ＝１ｌｏｇ^＊（ｎ_ｐ）ビット
ここで、ｌｏｇ^＊（ｘ）はユニバーサル符号長であり、ｌｏｇ^＊（ｘ）＝ｌｏｇ_２（２．８６５）＋ｌｏｇ_２（ｘ）＋ｌｏｇ_２ｌｏｇ_２（ｘ）＋…と表すことができる。 First, values included in Cost (H) used for calculation by the calculation unit 135 are defined as follows.
The number of the generated zone n: ^log * (n) number of cells in the bit each zone _{^{_{ni n c: Σ n i =}}} 1 log * (n c) the number of bits _{n p} data points in each zone ni: Σ ⁿ _{i = 1} log ^* (n _p ) bits where log ^* (x) is the universal code length, log ^* (x) = log ₂ (2.865) + log ₂ (x) + log ₂ log ₂ (x) + …It can be expressed as.

クラスタリング部１３４によるクラスタリングによって、各区域に、クラスタの集合Ｃ＝｛ｃ_１，ｃ_２，…ｃ_ｍ｝とＫ個のノイズとが生成されるとして、まず、計算部１３５は、式（４）に示すようにエントロピー符号を計算する。 Assuming that a set of clusters C = {c ₁ , c ₂ ,..., C _m } and K noises are generated in each section by clustering by the clustering unit 134, first, the calculation unit 135 first calculates the expression (4). The entropy code is calculated as shown in FIG.

ここで、式（４）のｑ_ｉは、事象ｉの生起確率を表す。データ集合に表した事象の数が多いほど、また、各事象の生起確率が均等であるほど、データを表現するための最低限の符号が長くなり、エントロピー符号が大きくなる。一方、データ集合に表した事象の数が少ないほど、また、各事象の生起確率に偏りがあるほど、データを表現するための最低限の符号が短くなり、エントロピー符号が小さくなる。 Here, q _i in Equation (4) represents the occurrence probability of event i. As the number of events represented in the data set is larger and the probability of occurrence of each event is more equal, the minimum code for representing data becomes longer and the entropy code becomes larger. On the other hand, the smaller the number of events represented in the data set and the bias in the occurrence probability of each event, the shorter the minimum code for representing the data and the smaller the entropy code.

そこで、計算部１３５は、生成される同種類のクラスタと個々のノイズを単独の事象とみなし、各区域内のデータ点ｐ_ｉを表現するための最低限の符号長コストを式（５）により計算する。 Therefore, calculation unit 135, the same type of clusters and individual noise generated regarded as a single event, the minimum code length cost for representing data points p _i in each area by the equation (5) calculate.

ここで、｜ｃ_ｉ｜はクラスタｃ_ｉに含まれるデータ点の数、ｍはクラスタの数、ｋはノイズの数とする。区域内に生成されるクラスタの数やノイズとして判断されるデータ点の数が少ないほど、Ｅ_ｐｉの値は小さくなる。このことは、各区域に含まれるデータ点をノイズの少ない１個のクラスタで表すことに寄与する。 Here, | c _i | is the number of data points included in cluster c _i , m is the number of clusters, and k is the number of noises. The smaller the number of clusters generated in an area or the number of data points judged as noise, the smaller the value of E _pi . This contributes to representing the data points included in each area with one cluster with less noise.

そして、計算部１３５は、区域内の全てのデータ点を表現する符号長コストＥ_ｎｉを、Ｅ_ｎｉ＝ｎ_ｐ・Ｅ_ｐｉ、分割された区域の個数をｎとして、データ領域における全てのデータ点を表現するためのコストＣｏｓｔ（Ｄ｜Ｈ）を、Ｃｏｓｔ（Ｄ｜Ｈ）＝Σ^ｎ _ｉ＝１Ｅ_ｎｉとして計算する。 Then, the calculation unit 135 sets the code length cost E _ni representing all the data points in the area to E _ni = n _p · E _pi , the number of the divided areas as n, and sets all the data points in the data area. The cost Cost (D | H) for expressing is calculated as Cost (D | H) = Σ ⁿ _{i = 1} E _ni .

各区域内の全てのデータ点を表現するコストＥ_ｎｉは、その区域に対する再分割の必要性を示している。Ｅ_ｎｉが大きい区域は、推定した種類のクラスタによって、区域における全てのデータ点を表現するのは適切ではないことを意味し、区域の再分割と細分された区域におけるデータ点に対する種類の再推定を行うことによって、より質の高いクラスタリング結果を得ることができる。 The cost E _ni representing all the data points in each area indicates the need for subdivision for that area. An area with a large E _ni means that it is not appropriate to represent all the data points in the area by the estimated kind of cluster, and re-estimate the kind for the data points in the subdivision and subdivision areas By performing the above, a higher quality clustering result can be obtained.

以上より、計算部１３５は、候補解Ｚ＝｛｛Ｃ，ｋ｝｜｛ｎ，｛ｎ_ｃ｝，｛ｎ_ｐ｝｝が与えられたときの符号長コストＣｏｓｔ（Ｄ，Ｈ）を式（６）のように計算する。 As described above, the calculation unit 135 calculates the code length cost Cost (D, H) when the candidate solution Z = {{C, k} | {n, {n _c }, {n _p }} is given by the formula ( Calculate as in 6).

選択部１３６は、複数のパターンのうち、第１のコストが最も小さいパターンを最適解として選択する。すなわち、図６の（１−４）に示すように、Ｃｏｓｔ_ｔ−１（Ｄ，Ｈ）を、第１のコストが最も小さい分割のパターンのコストとする。図６の（１−４）の破線で囲まれたパターンは、Ｃｏｓｔ_ｔ−１（Ｄ，Ｈ）決定の一例である。なお、ｔ−１は何回目のイテレーションであるかを示している。ここで説明しているのは初回のイテレーション処理なので、ｔ−１は、例えば１とすることができる。 The selection unit 136 selects a pattern having the smallest first cost among the plurality of patterns as an optimal solution. That is, as shown in (1-4) of FIG. 6, Cost _t-1 (D, H) is set as the cost of the division pattern having the smallest first cost. A pattern surrounded by a broken line in (1-4) of FIG. 6 is an example of Cost _t-1 (D, H) determination. Note that t-1 indicates the number of iterations. Since what is described here is an initial iteration process, t−1 can be set to 1, for example.

［２回目以降の処理］
ここまで、イテレーションの初回の処理について説明した。ここで、２回目以降のイテレーションの処理について説明する。計算部１３５は、選択部１３６によって最適解として選択されたパターンである前回最適パターンに含まれる区域ごとに、クラスタの数およびいずれのクラスタにも含まれなかったデータ点の数に比例する第２のコストを比較する。第２のコストは、前述のＥ_ｎｉである。例えば、２回目のイテレーションの処理では、計算部１３５は、初回の処理で選択されたパターンの区域ごとに、Ｅ_ｎｉを比較する。 [Second and subsequent processing]
So far, the first iteration process has been described. Here, the second and subsequent iteration processing will be described. For each area included in the previous optimal pattern, which is the pattern selected as the optimal solution by the selection unit 136, the calculation unit 135 is proportional to the number of clusters and the number of data points not included in any cluster. Compare costs. The second cost is the aforementioned E _ni . For example, in the second iteration process, the calculation unit 135 compares E _ni for each area of the pattern selected in the first process.

分割部１３２は、第２のコストが最も大きい区域を、複数のパターンで、複数の区域にさらに分割する。例えば、２回目のイテレーションの処理では、図６の（２−２）に示すように、分割部１３２は、Ｅ_ｎｉが最も大きかった区域、すなわち、図６の（２−１）の左側の区域をさらに分割する。 The dividing unit 132 further divides the area having the largest second cost into a plurality of areas with a plurality of patterns. For example, in the second iteration process, as shown in (2-2) of FIG. 6, the dividing unit 132 determines that the area where E _ni is the largest, that is, the left area of (2-1) of FIG. 6. Is further divided.

また、例えば、イテレーションの２回目の処理では、図６の（２−３）に示すように、クラスタリング部１３４はさらにクラスタリングを行う。そして、計算部１３５は、さらにＣｏｓｔ（Ｄ，Ｈ）を計算する。 For example, in the second iteration process, the clustering unit 134 further performs clustering as shown in (2-3) of FIG. Then, the calculation unit 135 further calculates Cost (D, H).

選択部１３６は、複数のパターンのうち第１のコストが最も小さいパターンである今回最適パターンと前回最適パターンとから、第１のコストが小さい方のパターンを最適解として選択し、今回最適パターンを最適解として選択した場合、分割部１３２に今回最適パターンを与え、さらに処理を実行させる。以降、各イテレーションにおける最適解をローカル最適解とよぶ。例えば、イテレーションの２回目の処理では、選択部１３６は、図６の（２−４）に示すように、Ｃｏｓｔ_ｔ（Ｄ，Ｈ）が、ｔ回目のローカル最適解の第１のコストとする。図６の（２−４）の破線で囲まれたパターンは、Ｃｏｓｔ_ｔ（Ｄ，Ｈ）決定の一例である。 The selection unit 136 selects the pattern having the smaller first cost as the optimum solution from the present optimum pattern and the previous optimum pattern which are the patterns having the smallest first cost among the plurality of patterns, and selects the optimum pattern this time. When the optimal solution is selected, the optimal pattern is given to the dividing unit 132 this time, and further processing is executed. Hereinafter, the optimal solution in each iteration is referred to as a local optimal solution. For example, in the second iteration process, the selection unit 136 sets Cost _t (D, H) as the first cost of the t-th local optimum solution as shown in (2-4) of FIG. . A pattern surrounded by a broken line in (2-4) of FIG. 6 is an example of Cost _t (D, H) determination.

そして、Ｃｏｓｔ_ｔ（Ｄ，Ｈ）がＣｏｓｔ_ｔ−１（Ｄ，Ｈ）より大きい場合、選択部１３６は、ｔ−１回目のローカル最適解を選択する。そして、クラスタリング装置１０は、選択部１３６によって選択されたパターンを、最終的な最適解として、出力部１２に出力させる。 When Cost _t (D, H) is greater than Cost _t−1 (D, H), the selection unit 136 selects the t−1th local optimum solution. Then, the clustering apparatus 10 causes the output unit 12 to output the pattern selected by the selection unit 136 as the final optimal solution.

一方、Ｃｏｓｔ_ｔ（Ｄ，Ｈ）がＣｏｓｔ_ｔ−１（Ｄ，Ｈ）より小さい場合、選択部１３６は、ｔ回目のローカル最適解を選択する。そして、選択部１３６は、分割部１３２に選択したパターンを与え、さらに処理を実行させる。 On the other hand, when Cost _t (D, H) is smaller than Cost _t−1 (D, H), the selection unit 136 selects the t-th local optimum solution. Then, the selection unit 136 gives the selected pattern to the division unit 132 and further executes processing.

初回および２回目を例に挙げて説明した上記処理は、ｔ−１回目とｔ回目に拡張することができる。つまり、クラスタリング装置１０は、ｔ回目のイテレーションとｔ−１回目のイテレーションで選ばれたローカル最適解のＭＤＬコストを比較する。そして、ｔ回目のイテレーションのコストがｔ−１回目のイテレーションのコストより小さい場合、クラスタリング装置１０は、ｔ回目のイテレーションの結果を保留し、ｔ＋１回目のイテレーションを実行する。 The above-described processing described by taking the first and second times as an example can be extended to the (t−1) th and tth times. That is, the clustering apparatus 10 compares the MDL cost of the local optimum solution selected in the t-th iteration and the t−1 iteration. When the cost of the t-th iteration is smaller than the cost of the (t-1) -th iteration, the clustering apparatus 10 holds the result of the t-th iteration and executes the (t + 1) -th iteration.

一方、ｔ回目のイテレーションのコストがｔ−１回目のイテレーションのコストより大きい場合、クラスタリング装置１０は、ｔ−１回目のイテレーションの結果を最終的な最適解とする。 On the other hand, when the cost of the t-th iteration is greater than the cost of the t-1 iteration, the clustering apparatus 10 sets the result of the t-1 iteration as the final optimal solution.

図１０を用いて、クラスタリング装置１０によるクラスタリング結果について説明する。図１０は、クラスタリング結果の一例を示す図である。図１０の例では、前述の人流データを用い、セルサイズを２００ｍ×２００ｍ、最小距離の閾値Ｍｉｎ_ｄｉｓｔを、区域内全てのデータ点のうち自身との距離が上位２５個であるデータ点との距離の平均値の平均、最小密度の閾値Ｍｉｎ_{ｄｅｎｓｉｔｙ}を、区域内全てのデータ点の密度の平均に設定しクラスタリングを行った際のクラスタリング結果である。 A clustering result by the clustering apparatus 10 will be described with reference to FIG. FIG. 10 is a diagram illustrating an example of the clustering result. In the example of FIG. 10, the above-described human flow data is used, the cell size is 200 m × 200 m, and the minimum distance threshold Min _dist is the distance between the data points of the top 25 among all the data points in the area. This is a clustering result when clustering is performed by setting the average of the average distance and the minimum density threshold Min _density to the average of the densities of all data points in the area.

図１０の四角形で囲まれた領域は、分割部１３２によって分割された区域である。また、曲線で囲まれたデータ点の塊は、密集エリアクラスタである。破線の四角形で囲まれた区域は、密集エリア区域である。また、白抜きの矢印で囲まれたデータ点の塊は、共通移動軌跡クラスタである。実線の四角形で囲まれた区域は、共通移動軌跡区域である。また、灰色の矢印はノイズである。 A region surrounded by a rectangle in FIG. 10 is an area divided by the dividing unit 132. A cluster of data points surrounded by a curve is a dense area cluster. The area surrounded by the dashed rectangle is a dense area. A cluster of data points surrounded by white arrows is a common movement trajectory cluster. An area surrounded by a solid rectangle is a common movement locus area. The gray arrow is noise.

図１０に示すように、密集エリアクラスタはランドマーク（東京ドーム、飯田橋駅）とマッチしていた。また、共通移動軌跡クラスタは道路（国道等）とマッチしていた。図１０のように出力を行うことで、所定の日時に所定の領域にいた人々が、どこに集まっているか、どこから集まっているか、といったことを確認することができる。 As shown in FIG. 10, the dense area cluster matched the landmark (Tokyo Dome, Iidabashi Station). In addition, the common movement trajectory cluster matched a road (such as a national road). By performing the output as shown in FIG. 10, it is possible to confirm where the people who were in the predetermined area at the predetermined date and time are gathered and where.

［第１の実施形態の処理］
図１１を用いて、クラスタリング装置１０の処理の流れについて説明する。図１１は、第１の実施形態に係るクラスタリング装置の処理の流れを示すフローチャートである。図１１に示すように、分割部１３２は、セルサイズを決定し、データ領域をメッシュ化する（ステップＳ１０１）。 [Process of First Embodiment]
A processing flow of the clustering apparatus 10 will be described with reference to FIG. FIG. 11 is a flowchart showing the flow of processing of the clustering apparatus according to the first embodiment. As shown in FIG. 11, the dividing unit 132 determines the cell size and meshes the data area (step S101).

次に、分割部１３２は、分析対象である区域を選択し、複数のパターンで区域の分割を行う（ステップＳ１０２）。分類部１３３は、データ点の移動方向の角度を基に、区域を密集エリア区域と共通移動軌跡区域とに分類する（ステップＳ１０３）。クラスタリング部１３４は、密集エリア区域および共通移動軌跡区域のいずれにおいても、データ点の位置間の距離およびデータ点間の移動方向の角度を基に、データ点に対するクラスタリングを行う（ステップＳ１０４）。 Next, the dividing unit 132 selects an area to be analyzed, and divides the area with a plurality of patterns (step S102). The classifying unit 133 classifies the areas into a dense area area and a common movement locus area based on the angle of the data points in the moving direction (step S103). The clustering unit 134 performs clustering on the data points based on the distance between the positions of the data points and the angle of the movement direction between the data points in both the dense area area and the common movement trajectory area (step S104).

計算部１３５は、クラスタリング部１３４によるクラスタリングの結果を基にＭＤＬコストを計算し、ローカル最適解のコストＣｏｓｔ_ｔ（Ｄ，Ｈ）を決定する（ステップＳ１０５）。ここで、Ｃｏｓｔ_ｔ（Ｄ，Ｈ）≦Ｃｏｓｔ_ｔ−１（Ｄ，Ｈ）である場合（ステップＳ１０６、Ｙｅｓ）、クラスタリング装置１０は、ステップＳ１０２に処理を戻し、さらに処理を繰り返す。また、Ｃｏｓｔ_ｔ（Ｄ，Ｈ）≦Ｃｏｓｔ_ｔ−１（Ｄ，Ｈ）でない場合（ステップＳ１０６、Ｎｏ）、クラスタリング装置１０は、クラスタリング結果を出力し（ステップＳ１０７）、処理を終了する。 The calculation unit 135 calculates the MDL cost based on the result of clustering by the clustering unit 134, and determines the cost Cost _t (D, H) of the local optimum solution (step S105). Here, if Cost _t (D, H) ≦ Cost _t−1 (D, H) (step S106, Yes), the clustering apparatus 10 returns the process to step S102 and repeats the process. If Cost _t (D, H) ≦ Cost _t−1 (D, H) is not satisfied (step S106, No), the clustering apparatus 10 outputs the clustering result (step S107) and ends the process.

［第１の実施形態の効果］
分割部１３２は、所定の区域を、複数のパターンで、複数の区域に分割する。分類部１３３は、複数のパターンごとに、複数の区域のそれぞれについて、区域内におけるデータ点の移動方向間の角度を基に、区域内のデータ点が交差する傾向の大きさを示す値を計算し、複数の区域のうち値が所定値以上である区域を第１の区域に分類し、複数の区域のうち値が所定値以上でない区域を第２の区域に分類する。また、クラスタリング部１３４は、第１の区域に分類された区域ごとに、データ点の位置間の距離およびデータ点の移動方向間の角度の両方に基づく第１の尺度を用いて、区域内のデータ点のクラスタリングを行い、第２の区域に分類された区域ごとに、データ点の位置間の距離およびデータ点の移動方向間の角度の両方に基づく第２の尺度を用いて、区域内のデータ点のクラスタリングを行う。また、計算部１３５は、複数のパターンごとに、クラスタリング部１３４によるクラスタリングの結果に基づいて、区域の数、区域の大きさ、区域内のデータ点の数、クラスタの数およびいずれのクラスタにも含まれなかったデータ点の数に比例する第１のコストを計算する。また、選択部１３６は、複数のパターンのうち、第１のコストが最も小さいパターンを最適解として選択する。 [Effect of the first embodiment]
The dividing unit 132 divides the predetermined area into a plurality of areas with a plurality of patterns. For each of a plurality of patterns, the classification unit 133 calculates a value indicating the magnitude of the tendency of the data points in the area to intersect based on the angle between the movement directions of the data points in the area. Then, an area having a value greater than or equal to a predetermined value among the plurality of areas is classified as a first area, and an area having a value not greater than the predetermined value among the plurality of areas is classified as a second area. In addition, the clustering unit 134 uses, for each of the areas classified as the first area, a first scale based on both the distance between the positions of the data points and the angle between the movement directions of the data points. Perform clustering of data points, and for each zone classified as a second zone, use a second measure based on both the distance between the data point locations and the angle between the data point movement directions, Cluster data points. Further, the calculation unit 135 calculates the number of areas, the size of the area, the number of data points in the area, the number of clusters, and any clusters based on the result of clustering by the clustering unit 134 for each of the plurality of patterns. A first cost proportional to the number of data points not included is calculated. In addition, the selection unit 136 selects a pattern having the smallest first cost among the plurality of patterns as an optimal solution.

このように、第１の実施形態では、密集エリア区域および共通移動軌跡区域を、データ点の位置間の距離およびデータ点の移動方向間の角度という同一の尺度を用いて評価しているため、密集エリアクラスタおよび共通移動軌跡クラスタの同時検出が可能となり、移動体データから、場所および軌跡の両方にフォーカスした分析結果を取得することができるようになる。 Thus, in the first embodiment, the dense area area and the common movement trajectory area are evaluated using the same measure of the distance between the positions of the data points and the angle between the movement directions of the data points. The dense area cluster and the common movement trajectory cluster can be simultaneously detected, and the analysis result focusing on both the place and the trajectory can be acquired from the moving body data.

計算部１３５は、選択部１３６によって今回のローカル最適解として選択されたパターンに含まれる区域ごとに、クラスタの数およびいずれのクラスタにも含まれなかったデータ点の数に比例する第２のコストを比較する。分割部１３２は、第２のコストが最も大きい区域を、複数のパターンで、複数の区域にさらに分割する。選択部１３６は、今回のローカル最適解と前回のローカル最適解とから、第１のコストが小さい方のパターンを最適解として選択し、今回のローカル最適解を最適解として選択した場合、分割部１３２に今回のローカル最適解与え、さらに処理を実行させる。このように、最適解が見つかるまで処理を繰り返すことで、より精度の高い分析結果を得ることが可能になる。 The calculation unit 135 calculates a second cost proportional to the number of clusters and the number of data points not included in any cluster for each area included in the pattern selected as the current local optimal solution by the selection unit 136. Compare The dividing unit 132 further divides the area having the largest second cost into a plurality of areas with a plurality of patterns. When the selection unit 136 selects the pattern having the smaller first cost as the optimal solution from the current local optimal solution and the previous local optimal solution, and selects the current local optimal solution as the optimal solution, the dividing unit The current local optimal solution is given to 132, and further processing is executed. In this way, it is possible to obtain a more accurate analysis result by repeating the processing until an optimal solution is found.

分類部１３３は、移動方向間の角度とπ／２との差が所定値以下であるデータ点の組み合わせの数の、移動方向間の角度とπ／２との差が所定値より大きいデータ点の組み合わせの数に対する比が１以上である場合、区域を第１の区域に分類し、比が１より小さい場合、区域を第２の区域に分類する。これにより、密集エリア区域と共通移動軌跡区域におけるデータ点の移動傾向の違いをクラスタリングに反映させることができる。 The classification unit 133 determines the number of data points in which the difference between the angle between the moving directions and π / 2 is equal to or smaller than a predetermined value, and the data point where the difference between the angle between the moving directions and π / 2 is larger than the predetermined value. If the ratio to the number of combinations is greater than or equal to 1, the area is classified as a first area, and if the ratio is less than 1, the area is classified as a second area. Thereby, the difference in the movement tendency of the data points in the dense area and the common movement locus area can be reflected in the clustering.

クラスタリング部１３４は、２つのデータ点の位置間の距離の短さ、および移動方向間の角度の正弦値に比例する尺度を第１の尺度とし、２つのデータ点の位置間の距離の短さ、および移動方向間の角度の余弦値に比例する尺度を第２の尺度としてクラスタリングを行う。これにより、データ点の位置間の距離およびデータ点の移動方向の角度という同一の尺度を用いつつ、密集エリア区域および共通移動軌跡区域のそれぞれに属するデータ点に適合した方法でクラスタリングを行うことが可能となる。 The clustering unit 134 uses, as a first scale, a distance proportional to the sine value of the angle between the two data points and the angle between the movement directions, and the distance between the two data points. , And a scale proportional to the cosine value of the angle between the moving directions is used as a second scale for clustering. This makes it possible to perform clustering in a manner adapted to the data points belonging to the dense area area and the common movement trajectory area while using the same measure of the distance between the positions of the data points and the angle of the movement direction of the data points. It becomes possible.

［システム構成等］
また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況等に応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。さらに、各装置にて行われる各処理機能は、その全部または任意の一部が、ＣＰＵおよび当該ＣＰＵにて解析実行されるプログラムにて実現され、あるいは、ワイヤードロジックによるハードウェアとして実現され得る。 [System configuration, etc.]
Further, each component of each illustrated apparatus is functionally conceptual, and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution / integration of each device is not limited to that shown in the figure, and all or a part thereof may be functionally or physically distributed or arbitrarily distributed in arbitrary units according to various loads or usage conditions. Can be integrated and configured. Furthermore, all or a part of each processing function performed in each device may be realized by a CPU and a program that is analyzed and executed by the CPU, or may be realized as hardware by wired logic.

また、本実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、制御手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。 Also, among the processes described in this embodiment, all or part of the processes described as being performed automatically can be performed manually, or the processes described as being performed manually can be performed. All or a part can be automatically performed by a known method. In addition, the processing procedure, control procedure, specific name, and information including various data and parameters shown in the above-described document and drawings can be arbitrarily changed unless otherwise specified.

［プログラム］
一実施形態として、クラスタリング装置１０は、パッケージソフトウェアやオンラインソフトウェアとして上記のクラスタリングを実行するクラスタリングプログラムを所望のコンピュータにインストールさせることによって実装できる。例えば、上記のクラスタリングプログラムを情報処理装置に実行させることにより、情報処理装置をクラスタリング装置１０として機能させることができる。ここで言う情報処理装置には、デスクトップ型またはノート型のパーソナルコンピュータが含まれる。また、その他にも、情報処理装置にはスマートフォン、携帯電話機やＰＨＳ（Personal Handyphone System）等の移動体通信端末、さらには、ＰＤＡ（Personal Digital Assistant）等のスレート端末等がその範疇に含まれる。 [program]
As an embodiment, the clustering apparatus 10 can be implemented by installing a clustering program for executing the above clustering as package software or online software on a desired computer. For example, the information processing apparatus can function as the clustering apparatus 10 by causing the information processing apparatus to execute the above clustering program. The information processing apparatus referred to here includes a desktop or notebook personal computer. In addition, the information processing apparatus includes mobile communication terminals such as smartphones, mobile phones and PHS (Personal Handyphone System), and slate terminals such as PDA (Personal Digital Assistant).

また、クラスタリング装置１０は、ユーザが使用する端末装置をクライアントとし、当該クライアントに上記のクラスタリングに関するサービスを提供するクラスタリングサーバ装置として実装することもできる。例えば、クラスタリングサーバ装置は、移動体データを入力とし、クラスタリング結果を出力とするクラスタリングサービスを提供するサーバ装置として実装される。この場合、クラスタリングサーバ装置は、Ｗｅｂサーバとして実装することとしてもよいし、アウトソーシングによって上記のクラスタリングに関するサービスを提供するクラウドとして実装することとしてもかまわない。 The clustering apparatus 10 can also be implemented as a clustering server apparatus that uses a terminal device used by a user as a client and provides the client with the above-described service related to clustering. For example, the clustering server device is implemented as a server device that provides a clustering service that receives mobile data and outputs a clustering result. In this case, the clustering server device may be implemented as a Web server, or may be implemented as a cloud that provides the above-described clustering service by outsourcing.

図１２は、プログラムが実行されることによりクラスタリング装置が実現されるコンピュータの一例を示す図である。コンピュータ１０００は、例えば、メモリ１０１０、ＣＰＵ１０２０を有する。また、コンピュータ１０００は、ハードディスクドライブインタフェース１０３０、ディスクドライブインタフェース１０４０、シリアルポートインタフェース１０５０、ビデオアダプタ１０６０、ネットワークインタフェース１０７０を有する。これらの各部は、バス１０８０によって接続される。 FIG. 12 is a diagram illustrating an example of a computer in which a clustering apparatus is realized by executing a program. The computer 1000 includes a memory 1010 and a CPU 1020, for example. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These units are connected by a bus 1080.

メモリ１０１０は、ＲＯＭ（Read Only Memory）１０１１およびＲＡＭ１０１２を含む。ＲＯＭ１０１１は、例えば、ＢＩＯＳ（Basic Input Output System）等のブートプログラムを記憶する。ハードディスクドライブインタフェース１０３０は、ハードディスクドライブ１０９０に接続される。ディスクドライブインタフェース１０４０は、ディスクドライブ１１００に接続される。例えば磁気ディスクや光ディスク等の着脱可能な記憶媒体が、ディスクドライブ１１００に挿入される。シリアルポートインタフェース１０５０は、例えばマウス１１１０、キーボード１１２０に接続される。ビデオアダプタ１０６０は、例えばディスプレイ１１３０に接続される。 The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores a boot program such as BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to a mouse 1110 and a keyboard 1120, for example. The video adapter 1060 is connected to the display 1130, for example.

ハードディスクドライブ１０９０は、例えば、ＯＳ１０９１、アプリケーションプログラム１０９２、プログラムモジュール１０９３、プログラムデータ１０９４を記憶する。すなわち、クラスタリング装置１０の各処理を規定するプログラムは、コンピュータにより実行可能なコードが記述されたプログラムモジュール１０９３として実装される。プログラムモジュール１０９３は、例えばハードディスクドライブ１０９０に記憶される。例えば、クラスタリング装置１０における機能構成と同様の処理を実行するためのプログラムモジュール１０９３が、ハードディスクドライブ１０９０に記憶される。なお、ハードディスクドライブ１０９０は、ＳＳＤにより代替されてもよい。 The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each process of the clustering apparatus 10 is implemented as a program module 1093 in which a code executable by a computer is described. The program module 1093 is stored in the hard disk drive 1090, for example. For example, a program module 1093 for executing processing similar to the functional configuration in the clustering apparatus 10 is stored in the hard disk drive 1090. Note that the hard disk drive 1090 may be replaced by an SSD.

また、上述した実施形態の処理で用いられる設定データは、プログラムデータ１０９４として、例えばメモリ１０１０やハードディスクドライブ１０９０に記憶される。そして、ＣＰＵ１０２０が、メモリ１０１０やハードディスクドライブ１０９０に記憶されたプログラムモジュール１０９３やプログラムデータ１０９４を必要に応じてＲＡＭ１０１２に読み出して実行する。 The setting data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 and executes them as necessary.

なお、プログラムモジュール１０９３やプログラムデータ１０９４は、ハードディスクドライブ１０９０に記憶される場合に限らず、例えば着脱可能な記憶媒体に記憶され、ディスクドライブ１１００等を介してＣＰＵ１０２０によって読み出されてもよい。あるいは、プログラムモジュール１０９３およびプログラムデータ１０９４は、ネットワーク（ＬＡＮ（Local Area Network）、ＷＡＮ（Wide Area Network）等）を介して接続された他のコンピュータに記憶されてもよい。そして、プログラムモジュール１０９３およびプログラムデータ１０９４は、他のコンピュータから、ネットワークインタフェース１０７０を介してＣＰＵ１０２０によって読み出されてもよい。 The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in, for example, a removable storage medium and read out by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). The program module 1093 and the program data 1094 may be read by the CPU 1020 from another computer via the network interface 1070.

１０クラスタリング装置
１１入力部
１２出力部
１３制御部
１４記憶部
１３１変換部
１３２分割部
１３３分類部
１３４クラスタリング部
１３５計算部
１３６選択部 DESCRIPTION OF SYMBOLS 10 Clustering apparatus 11 Input part 12 Output part 13 Control part 14 Storage part 131 Conversion part 132 Division part 133 Classification part 134 Clustering part 135 Calculation part 136 Selection part

Claims

A dividing unit that divides a predetermined area into a plurality of areas in a plurality of patterns;
For each of the plurality of patterns, for each of the plurality of areas, a value indicating a magnitude of a tendency of data points in the area to intersect is calculated based on an angle between moving directions of the data points in the area, A classifying unit that classifies a zone where the value is equal to or greater than a predetermined value among a plurality of zones as a first zone, and classifies a zone where the value is not equal to or greater than a predetermined value among the plurality of zones as a second zone;
For each zone classified as the first zone, clustering of data points in the zone using a first measure based on both the distance between the positions of the data points and the angle between the movement directions of the data points. For each zone classified as the second zone, using a second measure based on both the distance between the positions of the data points and the angle between the movement directions of the data points, A clustering unit for clustering points;
For each of the plurality of patterns, based on the result of clustering by the clustering unit, the number of areas, the size of the area, the number of data points in the area, the number of clusters, and the data points not included in any cluster A calculator for calculating a first cost proportional to the number of
A selection unit that selects a pattern having the smallest first cost among the plurality of patterns as an optimal solution;
A clustering apparatus characterized by comprising:

For each of the areas included in the previous optimal pattern, which is the pattern selected as the optimal solution by the selection unit, the calculation unit is proportional to the number of clusters and the number of data points not included in any cluster. Compare the costs of
The division unit further divides the area with the second highest cost into a plurality of areas in a plurality of patterns,
The selection unit selects, as an optimal solution, a pattern having a smaller first cost from the current optimal pattern and the previous optimal pattern that are the patterns having the smallest first cost among the plurality of patterns, The clustering device according to claim 1, wherein, when the current optimal pattern is selected as an optimal solution, the current optimal pattern is given to the dividing unit and further processing is executed.

The classification unit is data having a difference between the angle between the moving direction and π / 2 that is greater than the predetermined value as the number of combinations of data points where the difference between the angle between the moving directions and π / 2 is a predetermined value or less. If the ratio to the number of point combinations is 1 or greater, classify the zone as the first zone, and if the ratio is less than 1, classify the zone as the second zone;
The clustering unit uses, as the first scale, a scale that is proportional to the short distance between the positions of the two data points and the sine value of the angle between the moving directions, and the short distance between the positions of the two data points. The clustering apparatus according to claim 1, wherein clustering is performed by using a scale that is proportional to a cosine value of an angle between movement directions as the second scale.

A clustering method executed by a clustering apparatus,
A dividing step of dividing a predetermined area into a plurality of areas in a plurality of patterns;
For each of the plurality of patterns, for each of the plurality of areas, a value indicating a magnitude of a tendency of data points in the area to intersect is calculated based on an angle between moving directions of the data points in the area, A classifying step of classifying a zone where the value is equal to or greater than a predetermined value among a plurality of zones as a first zone, and classifying a zone where the value is not equal to or greater than a predetermined value among the plurality of zones as a second zone;
For each zone classified as the first zone, clustering of data points in the zone using a first measure based on both the distance between the positions of the data points and the angle between the movement directions of the data points. For each zone classified as the second zone, using a second measure based on both the distance between the positions of the data points and the angle between the movement directions of the data points, A clustering step for clustering points;
For each of the plurality of patterns, based on the result of clustering by the clustering step, the number of areas, the size of the area, the number of data points in the area, the number of clusters, and the data points not included in any cluster Calculating a first cost proportional to the number of
A selection step of selecting, as an optimal solution, a pattern having the smallest first cost among the plurality of patterns;
A clustering method characterized by including