JP5953262B2

JP5953262B2 - DATA INDEX DEVICE, DATA INDEX METHOD, AND PROGRAM

Info

Publication number: JP5953262B2
Application number: JP2013099231A
Authority: JP
Inventors: 豊荒川; 中村　隆幸; 隆幸中村; 伸彦松浦
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2013-05-09
Filing date: 2013-05-09
Publication date: 2016-07-20
Anticipated expiration: 2033-05-09
Also published as: JP2014219865A

Description

この発明は、木構造索引を用いてデータの挿入及び検索を行うデータ索引装置、データ索引方法及びプログラムに関する。 The present invention relates to a data index device, a data index method, and a program for inserting and retrieving data using a tree structure index.

従来、データベース技術分野において、検索を高速化するための技術としてB-TreeやR-Treeなど木構造をもつ索引技術が種々提案されている。その１つとして、センサの計測データのような情報、すなわちセンサ情報などの多様な属性と属性値の組（key-value pairとも呼ばれる）を１つ以上含むデータに対する索引技術として、UBI-Tree（特許文献１）がある。UBI-Treeを使用すると、例えば「センサＩＤ＝１、時間＝３、温度＝２０」のようにセンサIDと時間と温度という３つの属性を含むデータと、「センサＩＤ＝３、時間＝７、照度＝６、音量＝４」のようにセンサＩＤと時間と照度と音量という４つの属性を含むデータを１つの索引に効率的に索引付けすることができる。 Conventionally, various index techniques having a tree structure such as B-Tree and R-Tree have been proposed in the database technical field as techniques for speeding up the search. For example, UBI-Tree (indexing technology for data including one or more pairs of attribute and attribute values (also called key-value pairs) such as sensor measurement data, that is, sensor information, etc. There exists patent document 1). When UBI-Tree is used, for example, “sensor ID = 1, time = 3, temperature = 20”, which includes data including three attributes of sensor ID, time and temperature, “sensor ID = 3, time = 7, Data including four attributes such as sensor ID, time, illuminance, and volume, such as “illuminance = 6, volume = 4”, can be efficiently indexed into one index.

ところで、UBI-Treeにおいては、新規データの索引付けを行う際、つまりUBI-Treeの木構造に対して新規データの挿入を行う際に、他の多くの木構造の索引技術と同様に、ルートノードから、挿入先ノード選択アルゴリズムに基づいて似たデータを保持する子ノードを選択しながらノードを辿り、辿り着いたリーフノードに新規データを保持させる。ここで、似たデータとは、新規データとの間で属性の種類やその属性値が互いに近いデータを指す。 By the way, in UBI-Tree, when indexing new data, that is, when inserting new data into the UBI-Tree tree structure, the root is the same as in many other tree structure index technologies. The node is traced while selecting a child node that holds similar data based on the insertion destination node selection algorithm from the node, and new data is held in the arrived leaf node. Here, similar data refers to data that is close to each other in attribute type and attribute value with new data.

また、UBI-Treeの木構造に新規データを挿入することで、保持するデータの数、あるいは保持する子ノードの数が上限を越えるノードが発生した場合には、ノード分割アルゴリズムに基づいてノードを２つに分割し、この分割された２つのノードにデータまたは子ノードを分配する。このとき、似たデータ同士、あるいは似たデータを下位に保持する子ノード同士を同じノードに分配する。このようにすることで、互いに似たデータを同じリーフノード、あるいは同じ部分木に分類して保持することが可能となる。これにより、検索では類似したデータを取得するための条件が設定されることが多いため、検索条件に合致するデータが１つのリーフノード、あるいは１つの部分木に集まって存在することとなり、データが大量に存在する場合でも、特定のノードだけを辿り、効率的に検索条件に合致するデータを検索することができる。 In addition, if new data is inserted into the tree structure of UBI-Tree and the number of data to be retained or the number of child nodes to be retained exceeds the upper limit, the node is determined based on the node division algorithm. The data is divided into two and data or child nodes are distributed to the two divided nodes. At this time, similar data or child nodes holding similar data in the lower order are distributed to the same node. In this way, similar data can be classified and held in the same leaf node or the same subtree. As a result, conditions for acquiring similar data are often set in the search, and therefore, data that matches the search conditions exists in one leaf node or one subtree, and the data is Even when a large amount exists, only specific nodes can be traced and data matching the search conditions can be efficiently searched.

上記挿入先ノードの選択アルゴリズムとノード分割アルゴリズムは、R-TreeにおいてはそれぞれChooseSubtree、Splitと呼ばれる。また、両アルゴリズムともペナルティと呼ばれる指標を用いる。挿入先ノード選択アルゴリズムは、新規データを挿入した際に、ペナルティの増分が最も少ないノードを選択するアルゴリズムである。またノード分割アルゴリズムは、分割後の２つのノードのペナルティの和が最も小さくなる分割方法を行うアルゴリズムである。 The above insertion destination node selection algorithm and node split algorithm are called ChooseSubtree and Split in the R-Tree, respectively. Both algorithms use an index called a penalty. The insertion destination node selection algorithm is an algorithm for selecting a node having the smallest penalty increment when new data is inserted. The node division algorithm is an algorithm that performs a division method that minimizes the sum of the penalty of two nodes after division.

UBI-Treeにおいては、ペナルティとして「当該ノードが検索される確率」または「当該ノードに含まれる属性種類数」、あるいはその組合せが用いられる。「当該ノードが検索される確率」をペナルティとして用いた場合、挿入先ノード選択アルゴリズムまたはノード分割アルゴリズムにおけるペナルティを小さく抑えることで、検索時に辿るべきノード数を小さくし、検索効率を高めることができる。また、「当該ノードに含まれる属性種類数」をペナルティとして用いた場合、同様にペナルティを小さく抑えることで、検索条件に指定された属性の種類に基づき検索時に辿るべきノード数を小さくし、検索効率を高めることができる。「当該ノードが検索される確率」と「当該ノードに含まれる属性種類数」の組合せをペナルティに用いた場合、両方の効果が期待できる。「当該ノードが検索される確率」は、それまでに挿入されたデータの統計情報から算出する。該統計情報は属性表に記録されており、該属性表には各属性について、挿入したデータ内での出現頻度や対応する属性値の最大値、最小値が記録されている。新規データを挿入するたびに該属性表は更新される。「当該ノードが検索される確率」は、該属性表を用いて、次式で算出される。
当該ノードが検索される確率＝ΣXi Yi （全属性集合Aに対し、i∈A）
i
但し、Xi は属性i で検索された場合に当該ノードが検索される確率、Yi は検索条件に属性i が用いられる確率を示す。 In the UBI-Tree, “probability that the node is searched” or “the number of attribute types included in the node” or a combination thereof is used as a penalty. When the “probability that the node is searched” is used as a penalty, the number of nodes to be traced can be reduced and search efficiency can be improved by suppressing the penalty in the insertion destination node selection algorithm or the node division algorithm. . In addition, when “the number of attribute types included in the node” is used as a penalty, the number of nodes to be traced at the time of search based on the attribute type specified in the search condition can be reduced by similarly reducing the penalty. Efficiency can be increased. When the combination of “probability of searching for the node” and “number of attribute types included in the node” is used as a penalty, both effects can be expected. The “probability that the node is searched” is calculated from the statistical information of the data inserted so far. The statistical information is recorded in an attribute table. For each attribute, the appearance frequency in the inserted data and the maximum and minimum values of the corresponding attribute value are recorded. Each time new data is inserted, the attribute table is updated. The “probability that the node is searched” is calculated by the following equation using the attribute table.
Probability that the node is searched = ΣXi Yi (i∈A for all attribute sets A)
i
However, Xi indicates the probability that the node is searched when the attribute i is searched, and Yi indicates the probability that the attribute i is used as a search condition.

また、「当該ノードが検索される確率」と「当該ノードに含まれる属性種類数」の組合せがペナルティに用いられる場合には、例えば先ず「当該ノードに含まれる属性種類数」をペナルティとして挿入先ノード選択アルゴリズムを適用し、それでも挿入先ノードを１つに絞り込むことができない場合に、さらに「当該ノードが検索される確率」をペナルティとして挿入先ノード選択アルゴリズムを適用し、複数の候補の中から挿入先ノードを１つに絞り込めば良い。 In addition, when a combination of “probability of searching for the node” and “number of attribute types included in the node” is used as a penalty, for example, first, “number of attribute types included in the node” is used as a penalty. When the node selection algorithm is applied and the insertion destination node cannot be narrowed down to one, the insertion destination node selection algorithm is further applied with the “probability that the node is searched” as a penalty, and a plurality of candidates are selected. It is sufficient to narrow down the insertion destination node to one.

特開２０１１−１７０４６１号公報JP 2011-170461 A

ところが、従来のUBI-Treeにおいては以下のような解決すべき課題があった。
すなわち、従来のUBI-Treeは、ペナルティとして用いる「当該ノードが検索される確率」を算出する際に、UBI-Treeに挿入されたデータのすべての属性を考慮して計算を行っている。このため、すべての属性について計算することはそれだけ計算量が大きくなり、これによりデータ挿入処理にそれだけ多くの時間を必要とする。 However, the conventional UBI-Tree has the following problems to be solved.
That is, the conventional UBI-Tree calculates in consideration of all attributes of data inserted into the UBI-Tree when calculating the “probability that the node is searched” used as a penalty. For this reason, calculation for all the attributes requires a large amount of calculation, which requires much time for data insertion processing.

また、従来のUBI-Treeは、それまでに挿入されたデータの統計情報から「当該ノードが検索される確率」の予測値を算出している。このため、予測値は必ずしも真の値と同じではなく、予測値と真の値の間に誤差がある場合には、検索効率の劣化が発生してしまう。 Also, the conventional UBI-Tree calculates a predicted value of “probability that the node is searched” from statistical information of data inserted so far. For this reason, the predicted value is not necessarily the same as the true value. If there is an error between the predicted value and the true value, the search efficiency is deteriorated.

さらに、従来のUBI-Treeでは、ペナルティとして「当該ノードに含まれる属性種類数」と「当該ノードが検索される確率」が組み合わされて用いられる場合に、挿入されたデータの多くが含む属性を検索条件とする検索に対して「当該ノードに含まれる属性種類数」をペナルティとして各ノードに含まれる属性種類数を小さくしても、検索条件に指定された属性の種類に基づき検索時に辿るべきノード数を小さくすることができず、検索効率が劣化する。 Furthermore, in the conventional UBI-Tree, when a combination of “number of attribute types included in the node” and “probability that the node is searched” is used as a penalty, attributes included in most of the inserted data are included. Even if the number of attribute types included in each node is reduced with the “number of attribute types included in the node” as a penalty for the search as a search condition, the search should be followed based on the type of attribute specified in the search condition. The number of nodes cannot be reduced and search efficiency deteriorates.

この発明は上記事情に着目してなされたもので、その目的とするところは、ペナルティの算出に必要な計算量を低減して新規データの挿入処理を高速に行えるようにし、かつ検索時に辿るべきノード数を抑制して検索効率を高めたデータ索引装置、データ索引方法及びプログラムを提供することにある。 The present invention has been made paying attention to the above circumstances, and the object of the present invention is to reduce the amount of calculation necessary for calculating the penalty so that the new data can be inserted at high speed and to be followed at the time of retrieval. An object of the present invention is to provide a data indexing device, a data indexing method, and a program in which the number of nodes is suppressed and search efficiency is increased.

上記目的を達成するためにこの発明の第１の観点は、属性とその属性値とからなる組を少なくとも１つ有するデータの集合を、UBI-Tree構造により記憶し管理するデータ索引装置にあって、上記データ集合と、上記UBI-Tree構造と、ペナルティの算出に使用する属性を予め定義した着目属性リストとを少なくとも記憶しておき、上記UBI-Tree構造中に新規データを挿入する際に、当該新規データが有する属性のうち、上記着目属性リストに定義された属性に該当する属性のみを用いてペナルティを算出し、上記新規データの挿入を行うようにしたものである。 To achieve the above object, according to a first aspect of the present invention, there is provided a data indexing apparatus for storing and managing a data set having at least one set of attributes and attribute values by a UBI-Tree structure. The data set, the UBI-Tree structure, and at least the attribute list of interest in which attributes used for calculating the penalty are stored in advance, and when inserting new data into the UBI-Tree structure, Among the attributes of the new data, a penalty is calculated using only the attributes corresponding to the attributes defined in the target attribute list, and the new data is inserted.

また、この発明の第２の観点は、対象ノードに含まれる属性種類数をペナルティとして用いるか否かを指定する次元抑制ペナルティ無効化フラグをさらに記憶しておき、上記UBI-Tree構造中に新規データを挿入する際に、上記次元抑制ペナルティ無効化フラグを参照し、当該フラグがオンに設定されている場合には、対象ノードに含まれる属性種類数と、対象ノードが検索される確率のうち、対象ノードが検索される確率のみをペナルティとして用いてその値を算出し、上記フラグがオフに設定されている場合には上記対象ノードに含まれる属性種類数と上記対象ノードが検索される確率の両方をペナルティとして用いてその値を算出するようにしたものである。 The second aspect of the present invention further stores a dimension suppression penalty invalidation flag for designating whether or not to use the number of attribute types included in the target node as a penalty, and adds a new one to the UBI-Tree structure. When inserting data, refer to the dimension suppression penalty invalidation flag, and if the flag is set to ON, the number of attribute types included in the target node and the probability that the target node will be searched The value is calculated using only the probability that the target node is searched as a penalty, and the number of attribute types included in the target node and the probability that the target node is searched when the flag is set to off. Both are used as penalties and their values are calculated.

この発明の第１の観点によれば、UBI-Tree構造中に新規データを挿入する際に、当該新規データが有する属性のうち、予め記憶しておいた着目属性リストに定義された属性に該当する属性のみを用いてペナルティが算出される。一般に、検索条件に用いる属性集合は事前に分かっていることが多い。したがって、このような場合に検索条件に用いる属性集合を着目属性リストに定義しておけば、検索精度を維持した上で、新規データが有するすべての属性についてペナルティを算出する場合に比べペナルティの計算量を減らすことができ、これにより新規データの挿入に要する時間を短縮すると共に装置の処理負荷を軽減することが可能となる。 According to the first aspect of the present invention, when new data is inserted into the UBI-Tree structure, it corresponds to an attribute defined in the attention attribute list stored in advance among the attributes of the new data. Penalties are calculated using only the attributes to be processed. In general, the attribute set used for the search condition is often known in advance. Therefore, if the attribute set used for the search condition in such a case is defined in the attribute list of interest, the penalty calculation is performed compared with the case where the penalty is calculated for all the attributes of the new data while maintaining the search accuracy. The amount can be reduced, thereby reducing the time required to insert new data and reducing the processing load on the apparatus.

この発明の第２の観点によれば、UBI-Tree構造中に新規データを挿入する際に、予め記憶しておいた次元抑制ペナルティ無効化フラグがオンに設定されている場合には、対象ノードに含まれる属性種類数をペナルティとして用いずに、対象ノードが検索される確率のみをペナルティとして用いてペナルティの計算が行われる。先に述べたように、検索条件に用いる属性集合は事前に分かっていることが多い。そしてその属性が、挿入対象のデータの多くが含む属性である場合には、対象ノードに含まれる属性種類数を次元抑制ペナルティ無効化フラグにより事前に無効化しておけば、検索精度を維持した上でペナルティの計算量を減らすことができ、これにより新規データの挿入に要する時間を短縮すると共に装置の処理負荷を軽減することが可能となる。 According to the second aspect of the present invention, when new dimension insertion penalty invalidation flag stored in advance is set to ON when inserting new data into the UBI-Tree structure, the target node The penalty calculation is performed using only the probability that the target node is searched as a penalty without using the number of attribute types included in the as a penalty. As described above, the attribute set used for the search condition is often known in advance. If the attribute is an attribute that is included in most of the data to be inserted, the number of attribute types included in the target node can be invalidated in advance using the dimension suppression penalty invalidation flag to maintain search accuracy. Thus, it is possible to reduce the amount of penalty calculation, thereby reducing the time required for inserting new data and reducing the processing load of the apparatus.

すなわちこの発明の第１及び第２の観点によれば、ペナルティの算出に必要な計算量が低減されて新規データの挿入処理を高速に行えるようになり、かつ検索時に辿るべきノード数が抑制されて検索効率を高めることが可能なデータ索引装置、データ索引方法及びプログラムを提供することができる。 In other words, according to the first and second aspects of the present invention, the amount of calculation required for calculating the penalty is reduced, the new data insertion process can be performed at high speed, and the number of nodes to be traced at the time of retrieval is suppressed. Thus, it is possible to provide a data indexing device, a data indexing method, and a program capable of improving the search efficiency.

この発明の一実施形態に係るデータ索引装置の機能構成を示すブロック図。The block diagram which shows the function structure of the data index apparatus which concerns on one Embodiment of this invention. 図１に示したデータ索引装置の記憶ユニットに記憶されるデータの一例を示す図。The figure which shows an example of the data memorize | stored in the memory | storage unit of the data index apparatus shown in FIG. 図１に示したデータ索引装置の記憶ユニットに記憶される木構造情報の一例を示す図。The figure which shows an example of the tree structure information memorize | stored in the memory | storage unit of the data index apparatus shown in FIG. 図１に示したデータ索引装置の記憶ユニットに記憶される属性表情報の一例を示す図。The figure which shows an example of the attribute table information memorize | stored in the memory | storage unit of the data index apparatus shown in FIG. 図１に示したデータ索引装置の記憶ユニットに記憶される着目属性リスト情報の一例を示す図。The figure which shows an example of attention attribute list information memorize | stored in the memory | storage unit of the data index apparatus shown in FIG. 図１に示したデータ索引装置の記憶ユニットに記憶される次元抑制ペナルティフラグの一例を示す図。The figure which shows an example of the dimension suppression penalty flag memorize | stored in the memory | storage unit of the data index apparatus shown in FIG. 新規挿入対象のデータの一例を示す図。The figure which shows an example of the data of new insertion object. データの新規挿入に伴い更新された属性表情報の一例を示す図。The figure which shows an example of the attribute table information updated with the new insertion of data.

以下、図面を参照してこの発明に係わる実施形態を説明する。
［一実施形態］
図１は、この発明の一実施形態に係るデータ索引装置の機能構成を示すブロック図である。
本実施形態のデータ索引装置は、例えばデータベースサーバからなり、制御ユニット１と、通信インタフェースユニット２と、記憶ユニット３を備えている。通信インタフェースユニット２は、ネットワークＮＷで規定される通信プロトコルに従い、例えば図示しないセンサ群との間でその計測データを受信する機能を有する。 Embodiments according to the present invention will be described below with reference to the drawings.
[One Embodiment]
FIG. 1 is a block diagram showing a functional configuration of a data indexing apparatus according to an embodiment of the present invention.
The data index device according to the present embodiment includes, for example, a database server, and includes a control unit 1, a communication interface unit 2, and a storage unit 3. The communication interface unit 2 has a function of receiving measurement data with a sensor group (not shown), for example, according to a communication protocol defined by the network NW.

記憶ユニット３は、記憶媒体として例えばＨＤＤ（Hard Disc Drive）またはＳＳＤ（Solid State Drive）を使用したもので、この発明の一実施形態を実現する上で必要な木構造情報を記憶する記憶領域として、データ集合記憶部３１と、木構造記憶部３２と、属性表記憶部３３と、着目属性リスト記憶部３４と、次元抑制ペナルティフラグ記憶部３５を備える。 The storage unit 3 uses, for example, an HDD (Hard Disc Drive) or an SSD (Solid State Drive) as a storage medium, and serves as a storage area for storing tree structure information necessary for realizing an embodiment of the present invention. A data set storage unit 31, a tree structure storage unit 32, an attribute table storage unit 33, a target attribute list storage unit 34, and a dimension suppression penalty flag storage unit 35.

データ集合記憶部３１には、上記通信インタフェースユニット２により受信された計測データの集合が記憶される。個々の計測データは、属性とその属性値とからなる組を少なくとも１つ有する。図２の２０１〜２０５は計測データの一例を示したものである。例えば、計測データ２０１は属性として「センサＩＤ」、「時間」、「温度」及び「湿度」を有し、その属性値としてそれぞれ「１」、「３」、「２０．３」及び「６０」を有する。また計測データ２０２は、属性として「センサＩＤ」、「時間」及び「照度」を有し、その属性値としてそれぞれ「３」、「７」及び「６」を有する。すなわち、計測データは、各々１つ以上の属性と属性値との組を持ち、また属性の数と種類はデータ間で異なり得る。 The data set storage unit 31 stores a set of measurement data received by the communication interface unit 2. Each measurement data has at least one set of attributes and attribute values. Reference numerals 201 to 205 in FIG. 2 show examples of measurement data. For example, the measurement data 201 has “sensor ID”, “time”, “temperature”, and “humidity” as attributes, and “1”, “3”, “20.3”, and “60” as attribute values, respectively. Have The measurement data 202 has “sensor ID”, “time”, and “illuminance” as attributes, and has “3”, “7”, and “6” as attribute values. That is, each measurement data has a set of one or more attributes and attribute values, and the number and types of attributes may differ between the data.

木構造記憶部３２には、上記データ集合をUBI-Treeにより索引付けした状態で管理するための木構造が記憶される。UBI-Treeの木構造は、「当該ノードが検索される確率」と「当該ノードに含まれる属性種類数」の組合せをペナルティとして用いる。図３はUBI-Treeの木構造の一例を示すもので、この例では複数のノード３０１〜３０６と、複数の計測データ２０１〜２０５とから構成される。計測データ２０１〜２０５は、図２に示したデータに対応する。 The tree structure storage unit 32 stores a tree structure for managing the data set in an indexed state by UBI-Tree. The tree structure of UBI-Tree uses a combination of “probability that the node is searched” and “the number of attribute types included in the node” as a penalty. FIG. 3 shows an example of a UBI-Tree tree structure. In this example, the tree structure includes a plurality of nodes 301 to 306 and a plurality of measurement data 201 to 205. The measurement data 201 to 205 correspond to the data shown in FIG.

木の根元に位置するノード３０１はルートノードと呼ばれ、木の葉にあたるノード３０４〜３０６はリーフノードと呼ばれる。各ノード３０１〜３０６は他のノードあるいはデータへのポインタを保持する。例えば、ルートノード３０１はノード３０２へのポインタとノード３０３へのポインタを保持する。このことを単にノード３０１は子ノード３０２を保持すると表現する。ポインタにより接続されたノード間には親子関係があると見なされる。例えばルートノード３０１に対してノード３０２は子ノードと呼ばれる。逆に、ノード３０２に対してルートノード３０１は親ノードと呼ばれる。また、各ノードが保持する子ノードあるいはデータの数の上限値は「２」に設定されている。 The node 301 located at the root of the tree is called a root node, and the nodes 304 to 306 corresponding to the leaves of the tree are called leaf nodes. Each node 301 to 306 holds a pointer to another node or data. For example, the root node 301 holds a pointer to the node 302 and a pointer to the node 303. This is simply expressed as the node 301 holding the child node 302. It is assumed that there is a parent-child relationship between the nodes connected by the pointer. For example, for the root node 301, the node 302 is called a child node. Conversely, the root node 301 is called a parent node with respect to the node 302. The upper limit value of the number of child nodes or data held by each node is set to “2”.

なお、図３では図示を省略しているが、親ノードは各子ノードについて、当該子ノードの下位に存在するデータに関する範囲を表す情報を保持している。例えば、ルートノード３０１は、子ノード３０２以下に対し「センサＩＤ」が１〜１０、「時間」が３〜１０、「温度」が２０．３〜２１．１、「湿度」が６０〜９０のデータが存在している、という情報を保持している。同様に、ルートノード３０１は子ノード３０３以下に存在しているデータの範囲情報も保持している。検索を行う際には、当該範囲情報を用いて、検索条件を満たすノードだけを辿ることにより、検索条件を満たすデータを効率的に発見することが可能である。 Although not shown in FIG. 3, for each child node, the parent node holds information indicating a range related to data existing under the child node. For example, the root node 301 has a “sensor ID” of 1 to 10, a “time” of 3 to 10, a “temperature” of 20.3 to 21.1, and a “humidity” of 60 to 90 with respect to the child nodes 302 and below. It holds information that data exists. Similarly, the root node 301 also holds data range information existing below the child node 303. When performing a search, it is possible to efficiently find data satisfying the search condition by tracing only nodes satisfying the search condition using the range information.

属性表記憶部３３には、上記データ集合記憶部３１に記憶された計測データの集合が持つ属性の出現頻度、最小値及び最大値を表す情報が記憶される。図４はその一例を示すもので、図２に示したデータ集合の属性を表に表した場合を例示している。
着目属性リスト記憶部３４には、上記データ集合記憶部３１に新規データを挿入する際に、「当該ノードが検索される確率」をペナルティとして算出する場合の、算出対象とする属性のリストが予め記憶されている。図５はその一例を示すもので、この例では「時間」及び「温度」を着目属性として定義した場合を示している。 The attribute table storage unit 33 stores information representing the appearance frequency, minimum value, and maximum value of the attributes of the measurement data set stored in the data set storage unit 31. FIG. 4 shows an example of this, and illustrates the case where the attributes of the data set shown in FIG. 2 are represented in a table.
The attention attribute list storage unit 34 stores in advance a list of attributes to be calculated when calculating “probability that the node is searched” as a penalty when inserting new data into the data set storage unit 31. It is remembered. FIG. 5 shows an example, and in this example, “time” and “temperature” are defined as the attributes of interest.

次元抑制ペナルティ無効化フラグ記憶部３５には、次元抑制ペナルティ無効化フラグの状態（ＯＮ／ＯＦＦ）が予め記憶される。次元抑制ペナルティ無効化フラグとは、「当該ノードに含まれる属性種類数」のことであり、これを無効化するとは「当該ノードが検索される確率」と「当該ノードに含まれる属性種類数」の組合せをペナルティとしている場合に、「当該ノードが検索される確率」だけをペナルティとすることを意味する。図６は、次元抑制ペナルティ無効化フラグがＯＮに設定された状態を例示している。 The dimension suppression penalty invalidation flag storage unit 35 stores in advance the state (ON / OFF) of the dimension suppression penalty invalidation flag. The dimension suppression penalty invalidation flag is “the number of attribute types included in the node”. To invalidate this, “the probability that the node is searched” and “the number of attribute types included in the node” This means that only the “probability that the node is searched” is taken as a penalty. FIG. 6 illustrates a state in which the dimension suppression penalty invalidation flag is set to ON.

制御ユニット１は、例えばＣＰＵ（Central Processing Unit）を備え、この発明の一実施形態を実現するために必要な制御機能として、データ検索部１１と、データ挿入部１２を有している。なお、これらのデータ検索部１１及びデータ挿入部１２は、何れも図示しないプログラムメモリに格納されたプログラムを上記ＣＰＵに実行させることにより実現される。 The control unit 1 includes, for example, a CPU (Central Processing Unit), and includes a data search unit 11 and a data insertion unit 12 as control functions necessary for realizing one embodiment of the present invention. The data search unit 11 and the data insertion unit 12 are realized by causing the CPU to execute a program stored in a program memory (not shown).

データ検索部１１は、図示しない検索元の装置から送信されたデータ検索要求がネットワークＮＷを介して通信インタフェースユニット２で受信された場合に、当該受信されたデータ検索要求により指定された検索条件と、上記木構造記憶部３２に記憶された木構造情報、及び属性表記憶部３３に記憶された属性表情報に基づいて、ルートノード３０１から子ノード３０２，３０３及びリーフノード３０４〜３０６を順次辿って計測データを特定する。そして、この特定された計測データをデータ集合記憶部３１から読み出し、この読み出された計測データを通信インタフェース２からネットワークＮＷを介して要求元の装置へ返送する処理を行う。 When a data search request transmitted from a search source device (not shown) is received by the communication interface unit 2 via the network NW, the data search unit 11 includes a search condition specified by the received data search request, Based on the tree structure information stored in the tree structure storage unit 32 and the attribute table information stored in the attribute table storage unit 33, the child node 302, 303 and the leaf nodes 304 to 306 are sequentially traced from the root node 301. To identify measurement data. Then, the specified measurement data is read from the data set storage unit 31, and the read measurement data is returned from the communication interface 2 to the requesting device via the network NW.

データ挿入部１２は、以下のような処理機能を有する。
(1) 図示しないセンサから送信された新規計測データがネットワークＮＷを介して通信インタフェースユニット２で受信された場合に、当該受信された新規計測データが持つ属性の種類のうち、着目属性リスト記憶部３４に記憶された着目属性リストに含まれる属性のみからペナルティを計算する。そして、この計算されたペナルティと、上記木構造記憶部３２に記憶された木構造情報に基づいてノードを辿り、上記新規計測データを挿入すべきノードを特定する処理。 The data insertion unit 12 has the following processing functions.
(1) When new measurement data transmitted from a sensor (not shown) is received by the communication interface unit 2 via the network NW, among the types of attributes of the received new measurement data, the target attribute list storage unit A penalty is calculated only from the attributes included in the attribute list of interest stored in 34. Then, a process of tracing a node based on the calculated penalty and the tree structure information stored in the tree structure storage unit 32 and specifying the node into which the new measurement data is to be inserted.

(2) 上記ペナルティを算出する際に、次元抑制ペナルティ無効化フラグ記憶部３５に記憶されている次元抑制ペナルティ無効化フラグを参照し、当該フラグがＯＮであれば「当該ノードに含まれる属性種類数」をペナルティとして用いずに「当該ノードが検索される確率」のみをペナルティとして計算する処理。 (2) When calculating the penalty, the dimension suppression penalty invalidation flag stored in the dimension suppression penalty invalidation flag storage unit 35 is referred to, and if the flag is ON, “attribute type included in the node” A process of calculating only “probability that the node is searched” as a penalty without using “number” as a penalty.

次に、以上のように構成されたデータ索引装置ＩＵの動作を説明する。
ここでは、次元抑制ペナルティ無効化フラグが「ＯＮ」に設定され、さらに着目属性リストには「時間」と「温度」が記述されているものとする。なお、記憶ユニット３の木構造記憶部３２には図３に示すUBI-Tree構造が、また属性表記憶部３３には図４に示す属性表がそれぞれ記憶されているものとして説明を行う。 Next, the operation of the data indexing device IU configured as described above will be described.
Here, it is assumed that the dimension suppression penalty invalidation flag is set to “ON”, and “time” and “temperature” are described in the target attribute list. In the following description, it is assumed that the tree structure storage unit 32 of the storage unit 3 stores the UBI-Tree structure shown in FIG. 3, and the attribute table storage unit 33 stores the attribute table shown in FIG.

いま、図示しないセンサから新たな計測データが送信され、この新規計測データがネットワークＮＷを介して通信インタフェースユニット２で受信されたとする。図７に、このとき受信された新規計測データを示す。 Now, it is assumed that new measurement data is transmitted from a sensor (not shown) and this new measurement data is received by the communication interface unit 2 via the network NW. FIG. 7 shows the new measurement data received at this time.

上記新規計測データが受信されると、制御ユニットＩＵはデータ挿入部１２を起動し、このデータ挿入部１２の制御の下で以下のようなデータ挿入処理を実行する。すなわち、先ず上記受信された新規計測データに基づいて、属性表記憶部３３に記憶された属性表を更新する。例えば、受信された新規計測データは属性として「センサＩＤ」、「時間」、「湿度」、「照度」、「音量」を含むため、図４に示す更新前の属性表の該当する行の出現頻度がそれぞれ“１”増加され、さらに「時間」、「照度」、「音量」の値は既存のデータ値よりも大きいため、対応する属性の最大値がそれぞれ更新される。図８はこの更新後の属性表を示す。 When the new measurement data is received, the control unit IU activates the data insertion unit 12 and executes the following data insertion process under the control of the data insertion unit 12. That is, first, the attribute table stored in the attribute table storage unit 33 is updated based on the received new measurement data. For example, since the received new measurement data includes “sensor ID”, “time”, “humidity”, “illuminance”, and “volume” as attributes, the appearance of the corresponding row in the pre-update attribute table shown in FIG. Since the frequency is increased by “1” and the values of “time”, “illuminance”, and “volume” are larger than the existing data values, the maximum value of the corresponding attribute is updated. FIG. 8 shows the updated attribute table.

次に、木構造のルートノード３０１から、挿入先ノード選択アルゴリズムを用いながら挿入先のノードを選択していく。このとき、次元抑制ペナルティ無効化フラグ記憶部３５に記憶された次元抑制ペナルティ無効化フラグを調べる。そして、当該フラグがＯＮに設定されているかＯＦＦに設定されているかを判定し、この判定結果に応じて「当該ノードに含まれる属性種類数」と「当該ノードが検索される確率」の両方をペナルティとするか、或いは「当該ノードが検索される確率」だけをペナルティとするかを決定する。また、「当該ノードが検索される確率」を算出する際には、上記受信された新規計測データに含まれる属性のうち、上記着目属性リスト記憶部３３に記憶されている着目属性リストに記述された着目属性についてのみペナルティを算出する。 Next, an insertion destination node is selected from the tree-structured root node 301 using an insertion destination node selection algorithm. At this time, the dimension suppression penalty invalidation flag stored in the dimension suppression penalty invalidation flag storage unit 35 is examined. Then, it is determined whether the flag is set to ON or OFF, and both “the number of attribute types included in the node” and “probability that the node is searched” are determined according to the determination result. It is determined whether to use a penalty or only a “probability that the node is searched”. Further, when calculating the “probability that the node is searched”, among the attributes included in the received new measurement data, it is described in the target attribute list stored in the target attribute list storage unit 33. Penalties are calculated only for the attribute of interest.

例えば、いま先に述べたように次元抑制ペナルティ無効化フラグがＯＮに設定され、着目属性リストには「時間」と「温度」が記述されているとする。この場合、「当該ノードに含まれる属性種類数」はペナルティから除外され、「当該ノードが検索される確率」のみがペナルティとして用いられる。また、「当該ノードが検索される確率」を計算する際に、上記受信された新規計測データには属性として「時間」、「湿度」、「照度」、「音量」が含まれているものの、着目属性リストには上記したように「時間」と「温度」のみが記述されているため、「時間」についてのみ「当該ノードが検索される確率」がペナルティとして算出される。このときの「当該ノードが検索される確率」Xと、検索条件として「時間」を用いたときの確率Yは、次式で算出される。
当該ノードが検索される確率＝XY
すなわち、右辺はすべての属性について足し合わせるのではなく、ペナルティの算出に用いる属性である「時間」についてのみ足し合わせたものとなる。 For example, as described above, it is assumed that the dimension suppression penalty invalidation flag is set to ON and “time” and “temperature” are described in the target attribute list. In this case, “the number of attribute types included in the node” is excluded from the penalty, and only the “probability that the node is searched” is used as the penalty. In addition, when calculating the “probability that the node is searched”, the received new measurement data includes “time”, “humidity”, “illuminance”, and “volume” as attributes, Since only “time” and “temperature” are described in the target attribute list, “probability that the node is searched” is calculated as a penalty only for “time”. The “probability that the node is searched” X and the probability Y when “time” is used as a search condition are calculated by the following equations.
Probability of searching for the node = XY
That is, the right side is not the sum of all attributes, but only the “time” that is the attribute used for calculating the penalty.

データ挿入部１２は、ルートノード３０１において、挿入先ノード選択アルゴリズムを用いて挿入先のノードを以下のように選択する。すなわち、上記新規計測データをノード３０３に挿入する場合を仮定すると、ノード３０３に含まれる「時間」の値が元々「７〜９」であったものが、「７〜１１」に変化する。すなわち、値の範囲は「２」だけ広がる。これに対し、新規計測データをノード３０２に挿入する場合を仮定すると、ノード３０２に含まれる「時間」の値が元々「３〜１０」であったものが、「３〜１１」に変化する。すなわち、値の範囲は「１」だけ広がる。ここで、「当該ノードが検索される確率」をペナルティとする場合には、値の範囲の広がりが小さい方が挿入先として選択される。したがって、この場合には「時間」の値の範囲の広がりが小さいノード３０２が挿入先として選択される。 In the root node 301, the data insertion unit 12 selects an insertion destination node using an insertion destination node selection algorithm as follows. That is, assuming that the new measurement data is inserted into the node 303, the value of “time” included in the node 303 is originally “7-9” and changes to “7-11”. That is, the value range is expanded by “2”. On the other hand, assuming that new measurement data is inserted into the node 302, the value of “time” included in the node 302 is originally “3 to 10” and changes to “3 to 11”. That is, the range of values is expanded by “1”. Here, when the “probability that the node is searched” is used as a penalty, the one with the smaller spread of the value range is selected as the insertion destination. Therefore, in this case, the node 302 having a small range of the “time” value is selected as the insertion destination.

次に、ノード３０２において、同様に挿入先ノード選択アルゴリズムを用いて挿入先のノードを選択する。この場合、「時間」の値の範囲の広がりはノード３０４よりもノード３０５の方が小さい。このため、ノード３０５が挿入先のノードとして選択され、このノード３０５に上記新規計測データが挿入される。 Next, in the node 302, the insertion destination node is similarly selected using the insertion destination node selection algorithm. In this case, the spread of the “time” value range is smaller in the node 305 than in the node 304. Therefore, the node 305 is selected as the insertion destination node, and the new measurement data is inserted into this node 305.

但し、上記ノード３０５に上記新規計測データを挿入すると、ノード３０５が保持するデータ数は「３」となってしまい、各ノードが保持する子ノードあるいはデータの数の上限値「２」を超えてしまう。そこで、ノード分割アルゴリズムによりノード３０５を分割する。なお、ノードを分割する際にも、先に述べた挿入先ノード選択アルゴリズムと同様に、次元抑制ペナルティ無効化フラグを調べ、かつ「当該ノードが検索される確率」を算出する際に、着目属性リストに記述されている属性のみを算出対象とする。 However, when the new measurement data is inserted into the node 305, the number of data held by the node 305 becomes “3”, exceeding the upper limit “2” of the number of child nodes or data held by each node. End up. Therefore, the node 305 is divided by a node division algorithm. Note that when dividing a node, as in the insertion destination node selection algorithm described above, the attribute of interest is used when checking the dimension suppression penalty invalidation flag and calculating the “probability that the node is searched”. Only the attributes described in the list are subject to calculation.

一方、上記新規計測データを挿入する際に、次元抑制ペナルティ無効化フラグがＯＦＦに設定されていたとすると、「当該ノードに含まれる属性種類数」がペナルティとして使用される。そして、挿入先ノード選択アルゴリズムにより、ルートノード３０１に接続された２つのノード３０２，３０２ついて上記「当該ノードに含まれる属性種類数」が計算される。 On the other hand, if the dimension suppression penalty invalidation flag is set to OFF when the new measurement data is inserted, “the number of attribute types included in the node” is used as a penalty. Then, the “number of attribute types included in the node” is calculated for the two nodes 302 and 302 connected to the root node 301 by the insertion destination node selection algorithm.

例えば、いま新規計測データをノード３０３に挿入するものと仮定すると、ノード３０３に含まれる属性は元々「センサＩＤ」、「時間」、「照度」、「音量」の４種類であったものが、「センサＩＤ」、「時間」、「照度」、「音量」、「湿度」の５種類となり、属性種類数が「１」だけ増加する。これに対し、新規計測データをノード３０２に挿入するものと仮定すると、ノード３０２に含まれる属性が元々「センサＩＤ」、「時間」、「温度」、「湿度」の４種類であったものが、「センサＩＤ」、「時間」、「温度」、「湿度」、「照度」、「音量」の６種類となり、属性種類数が「２」増加する。このため、この場合には挿入先として属性の種類数の増加幅が少ないノード３０３が選択される。 For example, assuming that new measurement data is to be inserted into the node 303, the attributes included in the node 303 are originally four types of “sensor ID”, “time”, “illuminance”, and “volume”. There are five types of “sensor ID”, “time”, “illuminance”, “volume”, and “humidity”, and the number of attribute types increases by “1”. On the other hand, assuming that new measurement data is to be inserted into the node 302, the attributes included in the node 302 were originally “sensor ID”, “time”, “temperature”, and “humidity”. , “Sensor ID”, “time”, “temperature”, “humidity”, “illuminance”, “volume”, and the number of attribute types increases by “2”. Therefore, in this case, the node 303 with a small increase in the number of attribute types is selected as the insertion destination.

このように、上記挿入先として選択されたノード３０３に連なるノードに新規計測データを挿入すれば、「音量」の属性をもつデータはノード３０３に連なるノードに絞られるため、「音量」を検索条件に用いた場合にはノード３０３以下の部分木のみを検索すれば良くなり、検索効率が向上する。 As described above, if new measurement data is inserted into a node connected to the node 303 selected as the insertion destination, data having the “volume” attribute is narrowed down to nodes connected to the node 303, and therefore “volume” is set as a search condition. When this is used, only the subtree below the node 303 needs to be searched, and the search efficiency is improved.

これに対し、挿入先としてノード３０２を選択した場合、「音量」の属性を持つ計測データがノード３０２に連なるノードとノード３０３に連なるノードに分散してしまうため、「音量＝０〜１０」を検索条件に用いた場合に、ノード３０２に連なる部分木とノード３０３に連なる部分木の両方を検索しなければならず、検索効率は低下する。 On the other hand, when the node 302 is selected as the insertion destination, measurement data having the attribute of “volume” is distributed to the node connected to the node 302 and the node connected to the node 303. Therefore, “volume = 0 to 10” is set. When used as a search condition, both the subtree connected to the node 302 and the subtree connected to the node 303 must be searched, and the search efficiency decreases.

しかし、挿入先としてノード３０３を選択した場合、ノード３０２は「時間」の値が「３〜１０」の計測データを傘下のノードに保持し、ノード３０３は「時間」の値が「７〜１１」の計測データを傘下のノードに保持することになる。このため、例えば「時間＝１０」を検索条件に用いた場合、両方の部分木を検索する必要が生じる。これに対し、挿入先としてノード３０２を選択した場合には、「時間＝１０」という検索条件に対してノード３０２以下の部分木のみを検索すれば良い。このように、「音量＝０〜１０」のように、挿入されたすべての測定データのうち、一部のデータだけが保持する属性を検索条件とする場合には、「当該ノードに含まれる属性種類数」をペナルティとすることで検索効率が上がる。しかし、「時間」のように、多くのデータが保持する属性を検索条件とする場合には、「当該ノードに含まれる属性種類数」をペナルティとすることで検索効率が低下してしまう。したがって、多くのデータが保持する属性を検索条件とすることが多いことが事前に分かっている場合、本実施形態のように次元抑制ペナルティ無効化フラグをＯＮに設定しておき、「当該ノードに含まれる属性種類数」をペナルティとして使用しないようにすることで、検索効率を高めることが可能となる。 However, when the node 303 is selected as the insertion destination, the node 302 holds the measurement data having the “time” value “3 to 10” in the subordinate node, and the node 303 has the “time” value “7 to 11”. ”Is stored in a subordinate node. For this reason, for example, when “time = 10” is used as a search condition, it is necessary to search both subtrees. On the other hand, when the node 302 is selected as the insertion destination, only the subtree below the node 302 needs to be searched for the search condition “time = 10”. As described above, when the search condition is an attribute held by only a part of the inserted measurement data, such as “volume = 0 to 10,” “the attribute included in the node”. Search efficiency is improved by taking the number of types as a penalty. However, in the case where an attribute held by a lot of data such as “time” is used as a search condition, the search efficiency is lowered by setting “the number of attribute types included in the node” as a penalty. Therefore, when it is known in advance that the attribute held by a lot of data is often used as a search condition, the dimension suppression penalty invalidation flag is set to ON as in this embodiment, By not using the “number of included attribute types” as a penalty, search efficiency can be improved.

また、新規計測データを挿入する際に、ノード３０２において挿入先ノード選択アルゴリズムにより新規計測データに含まれるすべての属性を用いて「当該ノードが検索される確率」を算出すると、ノード３０５ではなくノード３０４が挿入先として選択される。そして、この選択されたノード３０４に新規計測データを挿入すると、「時間」の値の範囲は「３〜３」から「３〜１１」に増加して拡大幅が「８」になるものの、「センサＩＤ」は「１〜１」から「１〜２」へ増加するだけで拡大幅は「１」で済むことになる。また、「湿度」については値の範囲は増加しないで済む。逆に挿入先としてノード３０５を選択した場合、「時間」の値の範囲は「８〜１０」から「８〜１１」へ増加するだけで拡大幅は「１」で済むが、「センサＩＤ」や「湿度」の値の範囲の増加が大きくなる。 Further, when new measurement data is inserted, if the “probability that the node is searched” is calculated using all the attributes included in the new measurement data by the insertion destination node selection algorithm at the node 302, the node 302 is not the node 305. 304 is selected as the insertion destination. Then, when new measurement data is inserted into the selected node 304, the range of the “time” value increases from “3 to 3” to “3 to 11” and the expansion width becomes “8”. The sensor ID ”increases from“ 1-1 ”to“ 1-2 ”, and the expansion width is only“ 1 ”. Further, the value range of “humidity” does not need to be increased. Conversely, when the node 305 is selected as the insertion destination, the range of the “time” value only increases from “8 to 10” to “8 to 11”, and the expansion width can be “1”, but “sensor ID”. And “humidity” value range increases.

ルートノード３０１における挿入先ノードの選択アルゴリズムと同様に考えると、例えば「センサＩＤ」で検索する場合には、「センサＩＤ」の値の拡大幅が小さくなるように挿入先としてノード３０４を選択することで検索効率が向上する。一方、「時間」で検索する場合には、「時間」の値の拡大幅が小さくなるように挿入先としてノード３０５を選択することで検索効率が向上する。 Considering the same as the selection algorithm of the insertion destination node in the root node 301, for example, when searching by “sensor ID”, the node 304 is selected as the insertion destination so that the expansion width of the value of “sensor ID” becomes small. This improves search efficiency. On the other hand, when searching by “time”, the search efficiency is improved by selecting the node 305 as the insertion destination so that the expansion range of the value of “time” is small.

したがって、検索条件に用いる属性が事前に分かっている場合には、本実施形態のように着目属性リスト記憶部３４に検索条件に用いる属性の種類を予め記憶しておき、新規計測データに含まれる属性のうち、上記着目属性リストに記述された属性だけを用いて「当該ノードが検索される確率」をペナルティとして算出することで、検索効率を高めることが可能となる。 Therefore, when the attribute used for the search condition is known in advance, the attribute type used for the search condition is stored in advance in the focused attribute list storage unit 34 as in the present embodiment, and is included in the new measurement data. It is possible to increase the search efficiency by calculating “probability that the node is searched” as a penalty using only the attributes described in the attribute list of interest among the attributes.

［他の実施形態］
なお、この発明は上記実施形態に限定されるものではない。例えば、上記一実施形態では、各ノードが保持する子ノードあるいはデータの数の上限値は「２」に設定されているとしたが、これに限らず他の正数値に設定しても良い。 [Other Embodiments]
The present invention is not limited to the above embodiment. For example, in the above-described embodiment, the upper limit value of the number of child nodes or data held by each node is set to “2”. However, the present invention is not limited to this and may be set to another positive value.

また、前記一実施形態では、ペナルティを計算する際に、着目属性リストを参照して、「当該ノードが検索される確率」の算出に用いる属性を選択したが、これに限らず、属性表を更新する際に、着目属性リストを参照して、当該着目属性リストに記述されていない属性は属性表に追記しない、というようにしても良い。このようにすることで、ペナルティを計算する際には、属性表に存在しない属性は着目属性リストに記述されていないため、属性表を参照するだけで「当該ノードが検索される確率」の算出に用いる属性を選択することが可能となる。また属性表に、１カラムを増やし、着目属性リストに記述されているかどうかを示すフラグを記憶させることとしても良い。このようにすることにより、属性表を参照するだけで「当該ノードが検索される確率」の算出に用いる属性を選択することが可能となる。 In the above embodiment, when calculating the penalty, the attribute used for calculating the “probability that the node is searched” is selected with reference to the attribute list of interest. When updating, it is also possible to refer to the target attribute list and not add an attribute not described in the target attribute list to the attribute table. In this way, when calculating the penalty, since the attribute that does not exist in the attribute table is not described in the attribute list of interest, the “probability that the node is searched” is calculated simply by referring to the attribute table. It is possible to select an attribute to be used for. Further, the attribute table may be increased by one column and a flag indicating whether or not it is described in the target attribute list may be stored. By doing so, it is possible to select an attribute used for calculating the “probability that the node is searched” simply by referring to the attribute table.

さらに、前記一実施形態では、挿入する新規計測データに含まれる属性のうち、着目属性リストに記述された属性のみを用いてペナルティを算出するようにしたが、挿入する計測データに含まれる属性のうち、着目属性リストに記述された属性以外の属性を用いてペナルティを算出するようにしても良い。このようにすると、「この属性では検索しない」ことが事前に分かっている場合に、着目属性リストに属性を記述しておくことでペナルティの算出に用いないようにすることが可能となる。 Furthermore, in the one embodiment, the penalty is calculated using only the attribute described in the target attribute list among the attributes included in the new measurement data to be inserted. However, the attribute included in the measurement data to be inserted Of these, the penalty may be calculated using an attribute other than the attribute described in the attribute list of interest. In this way, when it is known in advance that “do not search with this attribute”, it is possible to prevent the penalty from being used for calculating the penalty by describing the attribute in the attribute list of interest.

さらに、前記一実施形態では、次元抑制ペナルティ無効化フラグ及び着目属性リストを事前にそれぞれ次元抑制ペナルティ無効化フラグ記憶部３５及び着目属性記憶部３４に記憶させておくものとしたが、外部の保守端末等から書き換え可能としても良い。これにより、検索要求の変化に応じて、これらの設定を動的に変化させることが可能となる。 Further, in the embodiment, the dimension suppression penalty invalidation flag and the target attribute list are stored in advance in the dimension suppression penalty invalidation flag storage unit 35 and the target attribute storage unit 34, respectively. It may be rewritable from a terminal or the like. This makes it possible to dynamically change these settings in accordance with changes in search requests.

また、前記一実施形態ではデータとしてセンサにより得られた温度、湿度、照度、音量等の計測データを取り扱う場合を例にとって説明した。しかし、それに限らず電流や電圧値、流体の流量、物質の濃度、明度、騒音レベル、位置、加速度などの計測データを取り扱ってよく、さらにはセンサ以外の例えばＷｅｂやインターネットを経由して取得した情報であってもよい。また、それら値に加えて、センサの特性や状態、計測日時等を示すメタデータを含む情報を取り扱ってもよい。その他、データ索引装置の構成やデータ挿入処理の手順と処理内容等についても、この発明の要旨を逸脱しない範囲で種々変形して実施可能である。 In the above-described embodiment, the case where measurement data such as temperature, humidity, illuminance, and volume obtained by a sensor is handled as data has been described as an example. However, the measurement data such as current, voltage value, fluid flow rate, substance concentration, brightness, noise level, position, acceleration, etc. may be dealt with, and it is obtained via other than sensors such as the Web or the Internet. It may be information. In addition to these values, information including metadata indicating sensor characteristics and states, measurement date and time, and the like may be handled. In addition, the configuration of the data indexing device, the procedure and processing contents of the data insertion processing, and the like can be implemented with various modifications without departing from the gist of the present invention.

要するにこの発明は、上記実施形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化できる。また、上記実施形態に開示されている複数の構成要素の適宜な組み合せにより種々の発明を形成できる。例えば、実施形態に示される全構成要素から幾つかの構成要素を削除してもよい。さらに、異なる実施形態に亘る構成要素を適宜組み合せてもよい。 In short, the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying the constituent elements without departing from the scope of the invention in the implementation stage. Further, various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the embodiment. For example, some components may be deleted from all the components shown in the embodiment. Furthermore, you may combine suitably the component covering different embodiment.

ＩＵ…データ索引装置、ＮＷ…ネットワーク、１…制御ユニット、２…通信インタフェースユニット、３…記憶ユニット、１１…データ検索部、１２…データ挿入部、３１…データ集合記憶部、３２…木構造記憶部、３３…属性表記憶部、３４…着目属性リスト記憶部、３５…次元抑制ペナルティ無効化フラグ記憶部。 IU ... Data indexing device, NW ... Network, 1 ... Control unit, 2 ... Communication interface unit, 3 ... Storage unit, 11 ... Data search unit, 12 ... Data insertion unit, 31 ... Data set storage unit, 32 ... Tree structure storage , 33 ... attribute table storage unit, 34 ... attention attribute list storage unit, 35 ... dimension suppression penalty invalidation flag storage unit.

Claims

A data indexing apparatus for storing and managing a set of data having at least one set of attributes and attribute values by a UBI-Tree structure,
Tree structure information storage means for storing at least the data set, the UBI-Tree structure, and an attribute list of interest in which attributes used for calculating penalties are defined in advance;
When inserting new data into the UBI-Tree structure, a penalty is calculated using only attributes corresponding to the attributes defined in the attribute list of interest among the attributes of the new data, and the new data A data indexing device comprising: data insertion means for performing insertion.

The tree structure information storage means further stores a dimension suppression penalty invalidation flag that specifies whether or not to use the number of attribute types included in the target node as a penalty,
The data insertion means refers to the dimension suppression penalty invalidation flag when inserting new data into the UBI-Tree structure, and is included in the target node if the flag is set to ON. Among the number of attribute types and the probability that the target node is searched, the value is calculated using only the probability that the target node is searched as a penalty, and if the flag is set to off, the target node The data indexing device according to claim 1, wherein the value is calculated by using both the number of included attribute types and the probability that the target node is searched as a penalty.

A data indexing method executed by a device that stores and manages a set of data having at least one set of attributes and attribute values by a UBI-Tree structure,
Storing the data set, the UBI-Tree structure, and an attribute list of interest in which attributes used for calculating penalties are defined in advance in at least a tree structure information storage unit;
When inserting new data into the UBI-Tree structure, the target attribute list is read, and only attributes corresponding to the attributes defined in the read target attribute list among the attributes of the new data are read. A data indexing method comprising: calculating a penalty using the data, and inserting the new data based on the calculation result.

The storing step further stores a dimension suppression penalty invalidation flag that specifies whether or not to use the number of attribute types included in the target node as a penalty,
The process of inserting new data refers to the dimension suppression penalty invalidation flag when inserting new data into the UBI-Tree structure, and when the flag is set to ON, When the number of attribute types included in the node and the probability of searching for the target node is calculated using only the probability of searching for the target node as a penalty, and the flag is set to off The data indexing method according to claim 3, wherein the value is calculated using both the number of attribute types included in the target node and the probability that the target node is searched as a penalty.

A program for causing a computer included in the data indexing apparatus to execute processing by the data insertion unit included in the data indexing apparatus according to claim 1 or 2.