JP6300982B2

JP6300982B2 - SEARCH DATA MANAGEMENT DEVICE, SEARCH DATA MANAGEMENT METHOD, AND SEARCH DATA MANAGEMENT PROGRAM

Info

Publication number: JP6300982B2
Application number: JP2017075781A
Authority: JP
Inventors: 岩崎　雅二郎; 雅二郎岩崎
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-08-17
Filing date: 2017-04-06
Publication date: 2018-03-28
Anticipated expiration: 2037-04-06
Also published as: JP2018032374A

Description

本発明は、検索データ管理装置、検索データ管理方法、および検索データ管理プログラムに関する。 The present invention relates to a search data management device, a search data management method, and a search data management program.

画像検索などのデータ検索の分野において、検索対象のデータ同士を仮想的に接続要素によって接続したグラフ構造のデータを構築しておき、接続要素を辿ることで、データ検索が行われている。接続要素とは、二つのデータが接続されていることを示す情報であり、エッジ、リンクなどと称される（以下ではエッジと称する）。エッジの設定は、まず検索対象のデータのオブジェクトを求め、オブジェクトが近いデータ同士を接続するといった規則に従って行われる。オブジェクトとは、距離が定義可能なデータである。オブジェクトは、典型的には多次元ベクトルデータで表され、オブジェクトが近いとは、例えば、多次元ベクトルデータ間の距離が短いことと定義される。多次元ベクトルデータ同士の距離としては、例えば、ベクトル要素間の差分についてＬｐノルムを求めたものが使用される（ｐ＝１、２、…）。 In the field of data search such as image search, data search is performed by constructing data having a graph structure in which search target data is virtually connected by connection elements and tracing the connection elements. The connection element is information indicating that two pieces of data are connected, and is referred to as an edge or a link (hereinafter referred to as an edge). The setting of the edge is performed according to a rule such that an object of data to be searched is first obtained and data close to each other is connected. An object is data whose distance can be defined. An object is typically represented by multidimensional vector data, and being close to an object is defined, for example, as a short distance between multidimensional vector data. As the distance between the multidimensional vector data, for example, the Lp norm obtained for the difference between the vector elements is used (p = 1, 2,...).

また、エッジには、有向エッジと無向エッジが存在する。有向エッジとは、一方向にしかデータを辿れないエッジであり、無向エッジとは、双方向にデータを辿ることができるエッジである。無向エッジによって表現されたグラフ構造のデータ（無向グラフ）を用いて検索を行う技術が開示されている（特許文献１参照）。 In addition, there are directed edges and undirected edges. A directed edge is an edge that can trace data in only one direction, and an undirected edge is an edge that can trace data in both directions. A technique for performing a search using graph structure data (undirected graph) expressed by an undirected edge is disclosed (see Patent Document 1).

特許第５２０８００１号公報Japanese Patent No. 5208001

無向グラフを使用する場合、検索経路のパターンが多くなるため、より網羅的な検索を行うことができる。反面、無駄な経路で検索を行ってしまう機会も多くなるため、時間辺りの検索精度が、有向グラフを用いた場合よりも高くなるとは限らない。 When an undirected graph is used, since there are many patterns of search paths, a more comprehensive search can be performed. On the other hand, since there are many opportunities to perform a search using a useless route, the search accuracy over time is not always higher than when a directed graph is used.

一方で、有向グラフを使用する場合、そのグラフの作り方によっては、他のデータから参照されないデータが出現する可能性がある。この場合、検索処理において、そのデータに辿り着かない可能性が高くなり、網羅的な検索を行うことができなくなる場合がある。 On the other hand, when a directed graph is used, data that is not referred to by other data may appear depending on how the graph is created. In this case, in the search process, there is a high possibility that the data cannot be reached, and an exhaustive search may not be performed.

本発明は、このような事情を考慮してなされたものであり、高速かつ高精度な検索を可能にする有向グラフを生成することができる検索データ管理装置、検索データ管理方法、および検索データ管理プログラムを提供することを目的の一つとする。 The present invention has been made in consideration of such circumstances, and a search data management device, a search data management method, and a search data management program capable of generating a directed graph that enables high-speed and high-precision search Is one of the purposes.

本発明の一態様は、データ検索の対象となる複数のノードを記憶する記憶部と、前記記憶部に記憶された各ノードから近傍ノードに対して有向エッジが設定されたグラフ構造のデータである一次グラフを生成する一次グラフ生成部と、前記記憶部に記憶された一次グラフにおける前記有向エッジの向きを逆にしたグラフ構造のデータである転置グラフを、データ検索のためのデータとして生成する転置グラフ生成部と、を備える検索データ管理装置である。 One aspect of the present invention is a storage unit that stores a plurality of nodes to be searched for data, and data having a graph structure in which a directed edge is set for each neighboring node from each node stored in the storage unit. A primary graph generation unit that generates a certain primary graph and a transposed graph that is data of a graph structure in which the direction of the directed edge in the primary graph stored in the storage unit is reversed is generated as data for data search And a transposed graph generation unit.

本発明の一態様によれば、高速かつ高精度な検索を可能にする有向グラフを生成することができる検索データ管理装置、検索データ管理方法、および検索データ管理プログラムを提供することができる。 According to one embodiment of the present invention, it is possible to provide a search data management device, a search data management method, and a search data management program that can generate a directed graph that enables high-speed and highly accurate search.

実施形態の検索データ管理装置１を中心とした構成図である。It is a block diagram centering on the search data management apparatus 1 of embodiment. 制御部２０のハードウェア構成の一例を示す図である。2 is a diagram illustrating an example of a hardware configuration of a control unit 20. FIG. 既存データオブジェクト４２の内容の一例を示す図である。It is a figure which shows an example of the content of the existing data object. グラフインデックス４４の内容の一例を示す図である。4 is a diagram illustrating an example of the contents of a graph index 44. FIG. 一次グラフ生成部２６Ａによる処理の流れの一例を示すフローチャートである。It is a flowchart which shows an example of the flow of a process by 26 A of primary graph production | generation parts. 近傍オブジェクトが更新される様子を示す図である。It is a figure which shows a mode that a near object is updated. 一次グラフＧに基づいて転置グラフＧｒが生成される様子を示す図である。It is a figure which shows a mode that the transposed graph Gr is produced | generated based on the primary graph G. FIG. 転置グラフＧｒに基づいて転置グラフＧｒ^＋が生成される様子を示す図である。It is a figure which shows a mode that the transposition graph Gr ⁺ is produced | generated based on the transposition graph Gr. 転置グラフＧｒ^＋に基づいて転置グラフＧｒ^＋＋が生成される様子を示す図である。It is a diagram showing how the transposition graph Gr ⁺⁺ is generated based on the transposed graph Gr ^+. 転置グラフ生成部２６Ｂにより実行される処理の流れの一例を示すフローチャートである。It is a flowchart which shows an example of the flow of the process performed by the transposed graph production | generation part 26B. 各グラフを用いて検索を行うシミュレーションの結果を示す図である。It is a figure which shows the result of the simulation which searches using each graph. 無向グラフＧａを生成する処理の一例を示すフローチャートである。It is a flowchart which shows an example of the process which produces | generates the undirected graph Ga. 無向グラフＧａを生成する処理の一例を示すフローチャートである。It is a flowchart which shows an example of the process which produces | generates the undirected graph Ga. 第２実施形態に係るインデックス検索部２４の処理の内容を説明するための図である。It is a figure for demonstrating the content of the process of the index search part 24 which concerns on 2nd Embodiment.

以下、図面を参照し、本発明の検索データ管理装置、検索データ管理方法、および検索データ管理プログラムの実施形態について説明する。 Hereinafter, embodiments of a search data management device, a search data management method, and a search data management program according to the present invention will be described with reference to the drawings.

＜第１実施形態＞
以下、第１実施形態について説明する。
［全体構成］
図１は、実施形態の検索データ管理装置１を中心とした構成図である。検索データ管理装置１は、一以上のクライアント端末ＣＬとネットワークＮＷを介して接続される。クライアント端末は、パーソナルコンピュータ、スマートフォンなどの携帯電話、タブレット端末、その他の端末装置である。ネットワークＮＷは、無線基地局、公衆回線、専用回線、プロバイダ端末、インターネットなどを含む。検索データ管理装置１は、クライアント端末ＣＬからクエリデータを受信すると、クエリデータに類似するデータを検索し、検索結果をクライアント端末ＣＬに返信する。検索データ管理装置１が返信するデータは、データそのもの（例えばｊｐｇなどで生成された画像データ）であってもよいし、データを参照するための識別子（ＵＲＬなど）であってもよい。また、クエリデータや検索対象のデータは、画像、音声、テキストデータなど、如何なる種類のデータであってもよい。以下の説明では、検索データ管理装置１が画像を検索するものとして説明する。 <First Embodiment>
The first embodiment will be described below.
[overall structure]
FIG. 1 is a configuration diagram centering on a search data management apparatus 1 according to the embodiment. The search data management device 1 is connected to one or more client terminals CL via a network NW. The client terminal is a personal computer, a mobile phone such as a smartphone, a tablet terminal, or another terminal device. The network NW includes a wireless base station, a public line, a dedicated line, a provider terminal, the Internet, and the like. When the search data management device 1 receives the query data from the client terminal CL, the search data management device 1 searches for data similar to the query data and returns the search result to the client terminal CL. The data returned by the search data management device 1 may be the data itself (for example, image data generated by jpg or the like) or an identifier (URL or the like) for referring to the data. Further, the query data and the search target data may be any kind of data such as images, sounds, text data, and the like. In the following description, it is assumed that the search data management device 1 searches for images.

検索データ管理装置１は、例えば、ネットワークインターフェース１０と、制御部２０と、入出力装置３０と、データサーバ４０とを備える。ネットワークインターフェース１０は、例えば、ＮＩＣ（Network Interface Card）である。 The search data management device 1 includes, for example, a network interface 10, a control unit 20, an input / output device 30, and a data server 40. The network interface 10 is, for example, a NIC (Network Interface Card).

制御部２０は、例えば、データオブジェクト生成部２２と、インデックス検索部２４と、グラフインデックス編集部２６とを備える。グラフインデックス編集部２６は、一次グラフ生成部２６Ａと、転置グラフ生成部２６Ｂとを備える。これらの制御部２０の構成要素は、例えば、ＣＰＵ（Central Processing Unit）などのプロセッサがプログラムを実行することにより実現される。また、これらの機能部のうち一部または全部は、ＬＳＩ（Large Scale Integration）やＡＳＩＣ（Application Specific Integrated Circuit）、ＦＰＧＡ（Field-Programmable Gate Array）などのハードウェアによって実現されてもよいし、ソフトウェアとハードウェアが協働して実現されてもよい。これらの機能部による処理の内容については後述する。 The control unit 20 includes, for example, a data object generation unit 22, an index search unit 24, and a graph index editing unit 26. The graph index editing unit 26 includes a primary graph generation unit 26A and a transposed graph generation unit 26B. These components of the control unit 20 are realized, for example, when a processor such as a CPU (Central Processing Unit) executes a program. Some or all of these functional units may be realized by hardware such as LSI (Large Scale Integration), ASIC (Application Specific Integrated Circuit), FPGA (Field-Programmable Gate Array), or software. And hardware may be implemented in cooperation. The contents of processing by these functional units will be described later.

図２は、制御部２０のハードウェア構成の一例を示す図である。制御部２０は、例えば、ＣＰＵ２０Ａと、ＲＡＭ（Random Access Memory）２０Ｂと、プログラムメモリ２０Ｃとが内部バスによって接続された構成を有する。内部バスには、ネットワークインターフェース１０、入出力インターフェース、メモリインターフェースなどの各種インターフェースが接続される。ＲＡＭ２０Ｂには、ＣＰＵ２０Ａによる処理対象のデータ、或いは処理結果のデータなどが格納される。プログラムメモリ２０Ｃは、ＲＯＭ（Read Only Memory）、ＨＤＤ（Hard Disk Drive）、フラッシュメモリなどの記憶装置のうち一部または全部を含む。プログラムメモリ２０Ｃには、データオブジェクト生成プログラム２０Ｃａ、インデックス検索プログラム２０Ｃｂ、グラフインデックス編集プログラム２０Ｃｃなどのプログラムが格納される。これらのプログラムは明確に分離されている必要はなく、一部または全部が共通化されてもよい。 FIG. 2 is a diagram illustrating an example of a hardware configuration of the control unit 20. The control unit 20 has, for example, a configuration in which a CPU 20A, a RAM (Random Access Memory) 20B, and a program memory 20C are connected by an internal bus. Various interfaces such as a network interface 10, an input / output interface, and a memory interface are connected to the internal bus. The RAM 20B stores data to be processed by the CPU 20A, data of processing results, and the like. The program memory 20C includes a part or all of a storage device such as a ROM (Read Only Memory), a HDD (Hard Disk Drive), and a flash memory. The program memory 20C stores programs such as a data object generation program 20Ca, an index search program 20Cb, and a graph index editing program 20Cc. These programs do not need to be clearly separated, and some or all of them may be shared.

図１に戻り、入出力装置３０は、検索データ管理装置１の管理者による入力操作を受け付けるマウスやキーボード、タッチパネルなどの入力装置と、ＬＣＤ（Liquid Crystal Display）や有機ＥＬ（Electroluminescence）表示装置、スピーカなどの出力装置とを含む。 Returning to FIG. 1, the input / output device 30 includes an input device such as a mouse, a keyboard, and a touch panel that accepts an input operation by an administrator of the search data management device 1, an LCD (Liquid Crystal Display), an organic EL (Electroluminescence) display device, Output devices such as speakers.

データサーバ４０は、ＨＤＤやフラッシュメモリなどの記憶装置により実現される。データサーバ４０は、制御部２０のプログラムメモリ２０Ｃと共通化されてもよいし、プログラムメモリ２０Ｃとは別体の記憶装置により実現されてもよい。また、データサーバ４０は、検索データ管理装置１から各種ネットワークを介してアクセス可能なＮＡＳ（Network Attached Storage）装置などの外部記憶装置により実現されてもよい。この場合、特許請求の範囲における「記憶部」は、データサーバ４０から取得したデータが一時的に格納されるＲＡＭ２０Ｂを指すものと解釈してもよい。 The data server 40 is realized by a storage device such as an HDD or a flash memory. The data server 40 may be shared with the program memory 20C of the control unit 20, or may be realized by a storage device separate from the program memory 20C. The data server 40 may be realized by an external storage device such as a NAS (Network Attached Storage) device that can be accessed from the search data management device 1 via various networks. In this case, the “storage unit” in the claims may be interpreted to indicate the RAM 20B in which data acquired from the data server 40 is temporarily stored.

［制御部］
以下、制御部２０の各機能部による処理の内容について説明する。なお、制御部２０は、一つのプロセッサにより実現される必要はなく、機能ごとに分散処理を行ってもよい。例えば、データオブジェクト生成部２２およびインデックス検索部２４と、グラフインデックス編集部２６とは、それぞれ別体のプロセッサにより実現されてもよい。 [Control unit]
Hereinafter, the contents of processing by each functional unit of the control unit 20 will be described. Note that the control unit 20 does not need to be realized by a single processor, and may perform distributed processing for each function. For example, the data object generation unit 22, the index search unit 24, and the graph index editing unit 26 may be realized by separate processors.

データオブジェクト生成部２２は、クライアント端末ＣＬなどから受信した入力データに基づいて、入力データのオブジェクトを生成する。入力データが画像である場合、オブジェクトは、例えば、局所特徴量やＢｏＦ（Bag of Features）、色のヒストグラムなど、或いはこれらの組み合わせである。また、入力データがテキストデータである場合、オブジェクトは、例えば、単語の出現頻度のベクトルデータ、ニューラルネットワークを用いて抽出した意味ベクトル、或いは、他の単語と共に出現する頻度その他の相対関係に基づくベクトルなどである。オブジェクトは、データ検索の対象となる「ノード」の一例である。 The data object generation unit 22 generates an input data object based on the input data received from the client terminal CL or the like. When the input data is an image, the object is, for example, a local feature amount, BoF (Bag of Features), a color histogram, or a combination thereof. In addition, when the input data is text data, the object is, for example, vector data on the appearance frequency of words, a semantic vector extracted using a neural network, or a vector based on the frequency or other relative relation that appears together with other words. Etc. An object is an example of a “node” that is a target of data search.

インデックス検索部２４は、データオブジェクト生成部２２により生成されたオブジェクト（以下、新規オブジェクトと称する）をクエリデータとしてデータサーバ４０を検索し、新規オブジェクトに類似するオブジェクトを、既存データオブジェクト４２から抽出する。データサーバ４０には、例えば、既存データオブジェクト４２と、グラフインデックス４４とが格納される。 The index search unit 24 searches the data server 40 using the object (hereinafter referred to as a new object) generated by the data object generation unit 22 as query data, and extracts an object similar to the new object from the existing data object 42. . For example, an existing data object 42 and a graph index 44 are stored in the data server 40.

既存データオブジェクト４２は、検索対象となるデータから予め生成されたオブジェクトのデータである。既存データオブジェクト４２には、その既存データオブジェクト４２を生成する元となったデータ、或いはデータを参照する識別子が対応付けられている。検索データ管理装置１は、新規オブジェクトをクエリとして得られた既存オブジェクト、すなわち新規オブジェクトに類似するオブジェクトに対応付けられたデータ、或いはそのデータを参照する識別子をクライアント端末ＣＬに送信する。図３は、既存データオブジェクト４２の内容の一例を示す図である。図示するように、既存データオブジェクト４２は、オブジェクトの識別情報（図中、オブジェクトＩＤ）に対して、オブジェクト（図ではベクトルデータ）、およびそのオブジェクトを生成する元となったデータ、或いはデータを参照する識別子が対応付けられたデータである。 The existing data object 42 is data of an object generated in advance from data to be searched. The existing data object 42 is associated with data used to generate the existing data object 42 or an identifier that refers to the data. The search data management device 1 transmits data associated with an existing object obtained by using a new object as a query, that is, data associated with an object similar to the new object, or an identifier referring to the data to the client terminal CL. FIG. 3 is a diagram illustrating an example of the contents of the existing data object 42. As shown in the figure, the existing data object 42 refers to the object identification information (object ID in the figure), the object (vector data in the figure), and the data or data from which the object is generated. Data associated with the identifier to be associated.

グラフインデックス４４は、複数のオブジェクトを接続するエッジに関する情報であり、参照元のオブジェクトと参照先のオブジェクトとを接続する有向エッジの集合により形成されるグラフ構造のデータである。図４は、グラフインデックス４４の内容の一例を示す図である。 The graph index 44 is information regarding edges connecting a plurality of objects, and is data having a graph structure formed by a set of directed edges connecting a reference source object and a reference destination object. FIG. 4 is a diagram illustrating an example of the contents of the graph index 44.

グラフインデックス４４は、例えば、参照元のオブジェクトの識別情報である参照元オブジェクトＩＤに対し、参照先のオブジェクトの識別情報である参照先オブジェクトＩＤ、それらを接続する有向エッジの識別情報であるエッジＩＤ、および参照元のオブジェクトと参照先のオブジェクトとの距離などの情報が対応付けられたものである。オブジェクトがベクトルである場合、それらの距離は、例えば、ベクトル要素間の差分についてＬｐノルムを求めたものと定義される（ｐ＝１、２、…）。なお、図３および図４に示すデータ構造は、あくまで一例であり、他のデータ構造が採用されてもよい。例えば、グラフインデックス４４は、エッジＩＤに対してその他の情報が対応付けられたものであってもよい。すなわち、本実施形態においてデータ構造の形式は本質的な問題ではなく、距離が定義されている限り、如何なるデータ構造のデータがデータサーバ４０に格納されてもよい。なお、本来距離とは距離の公理を満たすことを意味するが、距離の公理を満たさない擬似的な距離であっても構わない。擬似的な距離の場合には検索性能が低下するが、処理上扱いやすいなどのメリットがあれば疑似的な距離を使用してもよい。 The graph index 44 is, for example, a reference source object ID that is identification information of a reference source object, a reference destination object ID that is identification information of a reference destination object, and an edge that is identification information of a directed edge connecting them Information such as the ID and the distance between the reference source object and the reference destination object is associated. If the object is a vector, their distance is defined as, for example, the Lp norm for the difference between vector elements (p = 1, 2,...). Note that the data structures shown in FIGS. 3 and 4 are merely examples, and other data structures may be employed. For example, the graph index 44 may be one in which other information is associated with the edge ID. In other words, the format of the data structure is not an essential problem in the present embodiment, and data of any data structure may be stored in the data server 40 as long as the distance is defined. The distance originally means that the distance axiom is satisfied, but may be a pseudo distance that does not satisfy the distance axiom. In the case of a pseudo distance, the search performance is lowered, but the pseudo distance may be used if there is a merit such as easy handling.

インデックス検索部２４は、グラフインデックス４４により規定されたエッジによって、参照元のオブジェクトから参照先のオブジェクトへの向きに従って辿ることのできるオブジェクトのうち、新規オブジェクトに対して既定の距離以内にあるオブジェクトを、新規オブジェクトに類似するオブジェクトとして抽出する。また、インデックス検索部２４は、グラフインデックス４４により規定されたエッジによって、参照元のオブジェクトから参照先のオブジェクトへの向きに従って辿ることのできるオブジェクトのうち、新規オブジェクトに対する距離が短いものから順に所定数のオブジェクトを、新規オブジェクトに類似するオブジェクトとして抽出してもよい。そして、インデックス検索部２４は、既存データオブジェクト４２において、抽出したオブジェクトに対応付けられているデータまたはその識別子を、ネットワークインターフェース１０およびネットワークＮＷを介してクライアント端末ＣＬに送信する。 Of the objects that can be traced according to the direction from the reference source object to the reference destination object by the edge defined by the graph index 44, the index search unit 24 selects an object within a predetermined distance from the new object. Extract as an object similar to a new object. In addition, the index search unit 24 uses a predetermined number in order from the shortest distance to the new object among the objects that can be traced according to the direction from the reference source object to the reference destination object by the edge defined by the graph index 44. These objects may be extracted as objects similar to the new object. Then, the index search unit 24 transmits the data associated with the extracted object in the existing data object 42 or its identifier to the client terminal CL via the network interface 10 and the network NW.

なお、インデックス検索部２４による検索処理は、例えば、後述する図１３のフローチャートの処理を用いて実現される。図１３のフローチャートにおいて、新規オブジェクトをオブジェクトｙとして得られるオブジェクト集合Ｒが、インデックス検索部２４による検索結果の一例である。この場合、所定数ｋｓを所望の値に調整することで、検索結果に含まれるオブジェクトの数を調整することができる。 Note that the search processing by the index search unit 24 is realized using, for example, the processing of the flowchart of FIG. In the flowchart of FIG. 13, an object set R obtained from a new object as an object y is an example of a search result by the index search unit 24. In this case, the number of objects included in the search result can be adjusted by adjusting the predetermined number ks to a desired value.

データオブジェクト生成部２２およびインデックス検索部２４の機能によって、クライアント端末ＣＬは、送信したデータに類似するデータを検索データ管理装置１から取得することができる。なお、検索データ管理装置１の役割は、新規オブジェクトに類似するオブジェクトを抽出するところまでであってもよく、その場合、抽出したオブジェクトに対応したデータ、或いはデータを参照する識別子をクライアント端末ＣＬに送信する機能は、検索データ管理装置１とは異なる装置が有してもよい。 With the functions of the data object generation unit 22 and the index search unit 24, the client terminal CL can acquire data similar to the transmitted data from the search data management device 1. Note that the role of the search data management apparatus 1 may be up to extracting an object similar to a new object. In this case, data corresponding to the extracted object or an identifier referring to the data is given to the client terminal CL. The transmission function may be provided by a device different from the search data management device 1.

グラフインデックス編集部２６の一次グラフ生成部２６Ａは、グラフインデックス４４の元となる有向グラフ（一次グラフ）Ｇを生成する。一次グラフ生成部２６Ａは、一次グラフをＲＡＭ２０Ｂまたはデータサーバ４０などの記憶部に記憶させる。なお、以下の説明において「オブジェクト間の距離が短い」と「オブジェクト同士が近い」、および「オブジェクト間の距離が長い」と「オブジェクトが遠い」は同じ意味である。 The primary graph generation unit 26 </ b> A of the graph index editing unit 26 generates a directed graph (primary graph) G that is the basis of the graph index 44. The primary graph generation unit 26A stores the primary graph in a storage unit such as the RAM 20B or the data server 40. In the following description, “the distance between the objects is short” and “the objects are close”, and “the distance between the objects is long” and “the object is far” have the same meaning.

図５は、一次グラフ生成部２６Ａによる処理の流れの一例を示すフローチャートである。本フローチャートの処理の前提として、既存データオブジェクト４２に含まれるオブジェクトには、１からｎまでのオブジェクトＩＤが付与されているものとする。フローチャートの中で使用する引数ｉおよびｊは、１からｎまで順に選択されて処理が実行される。すなわち、一次グラフ生成部２６Ａは、まず、引数ｉを１に設定すると共に、引数ｊを１から一つずつインクリメントしながら引数ｊがｎに至るまで繰り返しＳ１００〜Ｓ１０６の処理を行う。次に、一次グラフ生成部２６Ａは、引数ｉを２に設定すると共に、引数ｊを１から一つずつインクリメントしながら引数ｊがｎに至るまで繰り返しＳ１００〜Ｓ１０６の処理を行う。これを、引数ｉがｎに至るまで繰り返し実行する。 FIG. 5 is a flowchart illustrating an example of a flow of processing by the primary graph generation unit 26A. As a premise of the processing of this flowchart, it is assumed that object IDs 1 to n are assigned to objects included in the existing data object 42. Arguments i and j used in the flowchart are sequentially selected from 1 to n and executed. That is, the primary graph generation unit 26A first sets the argument i to 1, and repeatedly performs the processing of S100 to S106 until the argument j reaches n while incrementing the argument j one by one. Next, the primary graph generation unit 26A sets the argument i to 2, and repeats the processes of S100 to S106 until the argument j reaches n while incrementing the argument j one by one. This is repeated until the argument i reaches n.

まず、一次グラフ生成部２６Ａは、オブジェクトｉ（オブジェクトＩＤがｉであるオブジェクト；以下同様）が、オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素である、またはｉ＝ｊであるか否かを判定する（Ｓ１００）。オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）とは、オブジェクトｊ（着目ノードの一例）から見て距離が短い方から順に所定数ｋｐだけ選択されるオブジェクトの集合であり、オブジェクトｊを参照元とする参照先のオブジェクトの集合である。以下、オブジェクトｊを参照元とし、近傍オブジェクト集合Ｎ（Ｇ，ｊ）に含まれるオブジェクトを参照先とするエッジを、「オブジェクトｊに付与されたエッジ」と称することがある。「Ｇ」は、グラフ構造のデータを示す符号である。オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素であるオブジェクトのそれぞれと、オブジェクトｊとの間は、有向エッジによって接続される。従って、少なくともオブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素であるオブジェクトは、オブジェクトｊから一つのエッジを介して辿ることのできるオブジェクトである。なお、オブジェクトｊが、オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素でない他のオブジェクト（例えばｕ）の近傍オブジェクト集合Ｎ（Ｇ，ｕ）の要素である場合もある。以下、このことを前提として説明する。オブジェクトｉがオブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素である場合、またはｉ＝ｊである場合、Ｓ１０２〜Ｓ１０６の処理がスキップされる。 First, the primary graph generation unit 26A determines whether the object i (the object whose object ID is i; hereinafter the same) is an element of the neighborhood object set N (G, j) of the object j or i = j. Is determined (S100). The neighborhood object set N (G, j) of the object j is a set of objects that are selected by a predetermined number kp in order from the shortest distance when viewed from the object j (an example of the target node). Is a set of reference destination objects. Hereinafter, an edge having the object j as a reference source and an object included in the neighborhood object set N (G, j) as a reference destination may be referred to as an “edge given to the object j”. “G” is a code indicating data having a graph structure. Each of the objects that are elements of the neighborhood object set N (G, j) of the object j and the object j are connected by a directed edge. Therefore, at least an object that is an element of the neighborhood object set N (G, j) of the object j is an object that can be traced from the object j through one edge. Note that the object j may be an element of a neighboring object set N (G, u) of another object (for example, u) that is not an element of the neighboring object set N (G, j) of the object j. Hereinafter, this will be described on the assumption. When the object i is an element of the neighborhood object set N (G, j) of the object j, or when i = j, the processing of S102 to S106 is skipped.

オブジェクトｉがオブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素でなく、且つｉ＝ｊでない場合、一次グラフ生成部２６Ａは、オブジェクトｉをオブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素に追加する（Ｓ１０２）。次に、一次グラフ生成部２６Ａは、オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素であるオブジェクトの数が所定数ｋｐを超えるか否かを判定する（Ｓ１０４）。所定数ｋｐは、任意に定められる自然数である。例えば、ｋｐ＝４０である。オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）の要素であるオブジェクトの数が所定数ｋｐを超える場合、一次グラフ生成部２６Ａは、オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）から、オブジェクトｊとの距離が最も長い（オブジェクトｊから最も遠い）オブジェクトを除外する（Ｓ１０６）。 If the object i is not an element of the neighboring object set N (G, j) of the object j and i = j is not satisfied, the primary graph generation unit 26A sets the object i to the neighboring object set N (G, j) of the object j. It adds to an element (S102). Next, the primary graph generation unit 26A determines whether or not the number of objects that are elements of the neighborhood object set N (G, j) of the object j exceeds a predetermined number kp (S104). The predetermined number kp is an arbitrary natural number. For example, kp = 40. When the number of objects that are elements of the neighborhood object set N (G, j) of the object j exceeds the predetermined number kp, the primary graph generation unit 26A determines that the object j from the neighborhood object set N (G, j) of the object j Are excluded (S106).

図６は、近傍オブジェクト集合が更新される様子を示す図である。図示の例では、元々、オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）にオブジェクト（１）〜（３）が設定されていたところ、オブジェクトｉが追加されると共にオブジェクト（３）が除外された。この結果、オブジェクトｊの近傍オブジェクト集合Ｎ（Ｇ，ｊ）がＮ（Ｇ，ｊ）＊に更新された。このような処理を繰り返すことで、オブジェクトｘ＝１〜ｎの全てについて、近傍オブジェクト集合Ｎ（Ｇ，ｘ）が設定される。これによって、グラフインデックス４４の元となる有向グラフ（一次グラフ）Ｇが生成される。 FIG. 6 is a diagram illustrating a state in which the neighborhood object set is updated. In the illustrated example, when the objects (1) to (3) were originally set in the neighborhood object set N (G, j) of the object j, the object i was added and the object (3) was excluded. . As a result, the neighborhood object set N (G, j) of the object j is updated to N (G, j) *. By repeating such processing, the neighborhood object set N (G, x) is set for all of the objects x = 1 to n. As a result, a directed graph (primary graph) G that is the basis of the graph index 44 is generated.

転置グラフ生成部２６Ｂは、一次グラフ生成部２６Ａにより生成された一次グラフＧに対して、エッジの向きを逆にした（すなわち参照元のデータと参照先のデータを入れ替えた）転置グラフＧｒを生成する。図７は、一次グラフＧに基づいて転置グラフＧｒが生成される様子を示す図である。 The transposed graph generation unit 26B generates a transposed graph Gr in which the edge direction is reversed (that is, the reference source data and the reference destination data are switched) with respect to the primary graph G generated by the primary graph generation unit 26A. To do. FIG. 7 is a diagram illustrating how the transposed graph Gr is generated based on the primary graph G.

ここで、一次グラフＧにおいては、「他のデータから参照されない、或いは参照元のデータが極めて少ないデータ」が存在する可能性がある。これに対し、転置グラフＧｒにおいては、各データは、所定数ｋｐのデータから参照されることが保証されている。この結果、検索漏れを無くすことができる。 Here, in the primary graph G, there is a possibility that “data that is not referred to by other data or that has very little reference source data” exists. On the other hand, in the transposed graph Gr, each data is guaranteed to be referred to from a predetermined number kp of data. As a result, search omissions can be eliminated.

また、転置グラフ生成部２６Ｂは、転置グラフに対し、各オブジェクトに付与されたエッジに対して、逆向きのエッジを所定数ｋｒ（または所定割合）加えてもよい。所定数ｋｒは、所定数ｋｐよりも小さい数であり、一例として、ｋｒ＝２０である。以下、逆向きのエッジを所定数ｋｒ加えたものを転置グラフＧｒ^＋と称する。図８は、転置グラフＧｒに基づいて転置グラフＧｒ^＋が生成される様子を示す図である。図中、ＡＥは転置グラフＧｒに対して追加されたエッジである。 Further, the transposed graph generation unit 26B may add a predetermined number kr (or a predetermined ratio) of edges in the opposite direction to the edges given to each object in the transposed graph. The predetermined number kr is a number smaller than the predetermined number kp, and as an example, kr = 20. Hereinafter, a result obtained by adding a predetermined number kr of opposite edges is referred to as a transposed graph Gr ⁺ . FIG. 8 is a diagram illustrating a state in which a transposed graph Gr ⁺ is generated based on the transposed graph Gr. In the figure, AE is an edge added to the transposed graph Gr.

更に、転置グラフ生成部２６Ｂは、逆向きのエッジを所定数ｋｒ加えた転置グラフＧｒ^＋に対し、各オブジェクトに付与されたエッジを長さ順にソートし、短い順に所定数ｋｍ（または所定割合）のエッジを残して長いエッジを削除してもよい。一例として、ｋｍ＝６０である。以下、短い順に所定数ｋｍのエッジを残して長いエッジを削除したものを転置グラフＧｒ^＋＋と称する。図９は、転置グラフＧｒ^＋に基づいて転置グラフＧｒ^＋＋が生成される様子を示す図である。図中、破線で示す矢印は、転置グラフＧｒ^＋から削除されたエッジである。なお、転置グラフ生成部２６Ｂは、転置グラフＧｒ^＋を生成せず、転置グラフＧｒにおいて、各オブジェクトに付与されたエッジを長さ順にソートし、短い順に所定数ｋｍのエッジを残して長いエッジを削除して転置グラフＧｒ^＋＋を生成してもよい。 Further, the transposed graph generation unit 26B sorts the edges given to each object in the order of length with respect to the transposed graph Gr ^{+ obtained by} adding a predetermined number kr of opposite edges, and the predetermined number km (or a predetermined ratio) in short order The long edge may be deleted while leaving the edge. As an example, km = 60. Hereinafter, a graph obtained by deleting a long edge while leaving a predetermined number of km edges in short order is referred to as a transposed graph Gr ⁺⁺ . Figure 9 is a diagram showing how the transposition graph Gr ⁺⁺ is generated based on the transposed graph Gr ^+. In the figure, an arrow indicated by a broken line is an edge deleted from the transposed graph Gr ⁺ . Note that the transposed graph generation unit 26B does not generate the transposed graph Gr ⁺ and sorts the edges assigned to the objects in the transposed graph Gr in the order of length, and leaves the predetermined number km of edges in the shortest order to generate long edges The transposed graph Gr ⁺⁺ may be generated by deleting.

図１に示すグラフインデックス４４は、例えば、転置グラフＧｒ、転置グラフＧｒ^＋、または転置グラフＧｒ^＋＋のうちいずれかである。 The graph index 44 illustrated in FIG. 1 is, for example, one of a transposed graph Gr, a transposed graph Gr ⁺ , and a transposed graph Gr ⁺⁺ .

図１０は、転置グラフ生成部２６Ｂにより実行される処理の流れの一例を示すフローチャートである。なお、本フローチャートの処理は、図８に示す処理と、図９に示す処理とを明確に分離せず、これらの処理をオブジェクト毎の処理に包含させたものであるが、結果として得られる転置グラフＧｒ^＋＋は同じになる。 FIG. 10 is a flowchart illustrating an example of a flow of processing executed by the transposed graph generation unit 26B. The process of this flowchart does not clearly separate the process shown in FIG. 8 and the process shown in FIG. 9 and includes these processes in the process for each object. The graph Gr ⁺⁺ is the same.

まず、転置グラフ生成部２６Ｂは、ＲＡＭ２０Ｂまたはデータサーバ４０に記憶された一次グラフからエッジを一つ選択し（Ｓ２００）、参照元と参照先を入れ替える（Ｓ２０２）。次に、転置グラフ生成部２６Ｂは、Ｓ２００において全てのエッジを選択したか否かを判定する（Ｓ２０４）。全てのエッジを選択していない場合、Ｓ２００に戻り、次のエッジを選択する。全てのエッジを選択した場合、Ｓ２０６に処理が進められる。これによって、転置グラフＧｒが生成される。転置グラフＧは、ＲＡＭ２０Ｂまたはデータサーバ４０に格納される。 First, the transposed graph generation unit 26B selects one edge from the primary graph stored in the RAM 20B or the data server 40 (S200), and switches the reference source and the reference destination (S202). Next, the transposed graph generation unit 26B determines whether or not all edges have been selected in S200 (S204). If all the edges have not been selected, the process returns to S200 and the next edge is selected. If all edges have been selected, the process proceeds to S206. Thereby, a transposed graph Gr is generated. The transposed graph G is stored in the RAM 20B or the data server 40.

転置グラフ生成部２６Ｂは、転置グラフＧにおけるオブジェクトを一つ選択する（Ｓ２０６）。以下、Ｓ２０６で選択されたオブジェクトを選択オブジェクトと称する。次に、転置グラフ生成部２６Ｂは、選択オブジェクトに付与されたエッジが無いかどうかを判定する（Ｓ２０８）。選択オブジェクトに付与されたエッジとは、前述したように、選択オブジェクトを参照元とするエッジである。選択オブジェクトに付与されたエッジが無い場合、転置グラフ生成部２６Ｂは、選択オブジェクトに所定数ｋａのエッジを付与する（Ｓ２１０）。例えば、転置グラフ生成部２６Ｂは、選択オブジェクトを参照先とするエッジのうち短い方から順に所定数ｋａのエッジを選択し、これらと逆向きのエッジを選択オブジェクトに付与する。なお、Ｓ２０８およびＳ２１０の処理は必須でなく、付与されたエッジの無いオブジェクトが生じる可能性が低い場合、省略されてもよい。 The transposed graph generation unit 26B selects one object in the transposed graph G (S206). Hereinafter, the object selected in S206 is referred to as a selected object. Next, the transposed graph generation unit 26B determines whether or not there is an edge given to the selected object (S208). The edge given to the selected object is an edge having the selected object as a reference source, as described above. If there is no edge given to the selected object, the transposed graph generation unit 26B gives a predetermined number of edges to the selected object (S210). For example, the transposed graph generation unit 26B selects a predetermined number of edges in order from the shortest edge among the edges having the selected object as a reference destination, and assigns edges opposite to these to the selected object. Note that the processing of S208 and S210 is not essential, and may be omitted when there is a low possibility that an object without an added edge will occur.

次に、転置グラフ生成部２６Ｂは、選択オブジェクトに付与されたエッジを長さ順にソートし（Ｓ２１２）、最も短いエッジから順に所定数ｋｒ分、逆向きのエッジを転置グラフＧに追加する（Ｓ２１４）。また、転置グラフ生成部２６Ｂは、最も長いエッジから順に、所定数ｋｍのエッジを残すように削除を行う（Ｓ２１６）。 Next, the transposed graph generation unit 26B sorts the edges given to the selected object in the order of length (S212), and adds reverse edges to the transposed graph G by a predetermined number kr in order from the shortest edge (S214). ). In addition, the transposed graph generation unit 26B performs deletion so as to leave a predetermined number of km edges in order from the longest edge (S216).

そして、転置グラフ生成部２６Ｂは、Ｓ２０６において全てのオブジェクトを選択したか否かを判定する（Ｓ２１８）。全てのオブジェクトを選択していない場合、Ｓ２０６に戻り、次のオブジェクトを選択する。全てのオブジェクトを選択した場合、本フローチャートの処理が終了する。これによって、転置グラフＧｒ^＋＋が生成される。転置グラフＧｒ^＋＋は、グラフインデックス４４としてデータサーバ４０に格納される。 Then, the transposed graph generation unit 26B determines whether all objects have been selected in S206 (S218). If all the objects have not been selected, the process returns to S206 and the next object is selected. When all the objects have been selected, the process of this flowchart ends. As a result, a transposed graph Gr ⁺⁺ is generated. The transposed graph Gr ⁺⁺ is stored in the data server 40 as the graph index 44.

図１１は、上記各グラフを用いて検索を行うシミュレーションの結果を示す図である。本シミュレーションは、画像の局所特徴量ＳＩＦＴをオブジェクトとし、上記説明した各種グラフを用いて、ＳＩＦＴ１０００万件から２０件の近傍オブジェクトを検索するシミュレーションである。このシミュレーションは、前述のように、ｋｐ＝４０、ｋｒ＝２０、ｋｍ＝６０として行った。図１１は、そのシミュレーション結果としての検索時間に対する精度の関係を示している。図中、縦軸は精度（２０件中の正解数の割合）を、横軸は検索時間を示している。このため、特性線が左上に偏しているほどシミュレーション結果が良好であったことになる。図示するように、転置グラフＧｒ^＋＋を用いた場合に最も良好なシミュレーション結果が得られ、次いで転置グラフＧｒ^＋、転置グラフＧｒ、一次グラフＧの順であった。 FIG. 11 is a diagram showing the results of a simulation for performing a search using each of the above graphs. This simulation is a simulation in which the local feature amount SIFT of an image is an object, and 20 neighboring objects are searched from 10 million SIFTs using the various graphs described above. As described above, this simulation was performed with kp = 40, kr = 20, and km = 60. FIG. 11 shows the relationship of accuracy with respect to search time as the simulation result. In the figure, the vertical axis represents accuracy (ratio of the number of correct answers in 20 cases), and the horizontal axis represents search time. For this reason, the simulation result is better as the characteristic line is biased to the upper left. As shown in the figure, the best simulation result was obtained when the transposed graph Gr ⁺⁺ was used, and then the transposed graph Gr ⁺ , the transposed graph Gr, and the primary graph G were in this order.

以上説明した第１実施形態の検索データ管理装置１によれば、一次グラフＧにおける有向エッジの向きを逆にしたグラフ構造のデータである転置グラフＧｒを、データ検索のためのデータとして生成するため、高速かつ高精度な検索を可能にする有向グラフを生成することができる。 According to the search data management apparatus 1 of the first embodiment described above, the transposed graph Gr, which is graph structure data in which the direction of the directed edge in the primary graph G is reversed, is generated as data for data search. Therefore, it is possible to generate a directed graph that enables a high-speed and highly accurate search.

また、第１実施形態の検索データ管理装置１によれば、転置グラフＧｒにおいて、各オブジェクトに付与されたエッジに対して、逆向きのエッジを所定数ｋｒ加えた転置グラフＧｒ^＋を生成することにより、より高速かつ高精度な検索を可能にする有向グラフを生成することができる。 Further, according to the search data management device 1 of the first embodiment, in the transposed graph Gr, the transposed graph Gr ⁺ is generated by adding a predetermined number kr of opposite edges to the edges given to each object. Thus, a directed graph that enables a faster and more accurate search can be generated.

また、第１実施形態の検索データ管理装置１によれば、転置グラフＧｒ^＋において、各オブジェクトに付与されたエッジを長さ順にソートし、短い順に所定数ｋｍのエッジを残して長いエッジを削除した転置グラフＧｒ^＋＋を生成することにより、更に高速かつ高精度な検索を可能にする有向グラフを生成することができる。 Further, according to the search data management device 1 of the first embodiment, in the transposed graph Gr ⁺ , the edges assigned to each object are sorted in length order, and long edges are deleted while leaving a predetermined number of kilometers in short order. By generating the transposed graph Gr ⁺⁺ , it is possible to generate a directed graph that enables a faster and more accurate search.

第１実施形態において（或いは後述する第２実施形態において）、一次グラフＧは、図５で説明したフローチャートの処理ではなく、以下に説明する処理によって生成される無向グラフＧａに対し、無向エッジを二本の有向エッジとみなし、各オブジェクトに付与されるエッジに上限を設けて、上限を超える分を距離の長いエッジから削減したものであってもよい。 In the first embodiment (or in the second embodiment described later), the primary graph G is not directed to the undirected graph Ga generated by the processing described below, instead of the processing of the flowchart described in FIG. The edge may be regarded as two directed edges, and an upper limit is provided for the edge given to each object, and the portion exceeding the upper limit is reduced from the edge having a long distance.

図１２および図１３は、無向グラフＧａを生成する処理の一例を示すフローチャートである。本フローチャートの処理の前提として、既存データオブジェクト４２において、１からｍまでのオブジェクトＩＤが付与されているものとする。フローチャートの中で使用する引数ｙは、１からｍまで順に選択されて処理が実行される。ここでは、近傍オブジェクト集合Ｎ（Ｇ，ｙ）は、オブジェクトｙに対して相互に参照し合うオブジェクトの集合である。 12 and 13 are flowcharts illustrating an example of processing for generating the undirected graph Ga. As a premise of the processing of this flowchart, it is assumed that object IDs 1 to m are assigned to the existing data object 42. The argument y used in the flowchart is selected in order from 1 to m and the process is executed. Here, the neighborhood object set N (G, y) is a set of objects that mutually refer to the object y.

一次グラフ生成部２６Ａは、Ｋ近傍検索処理を実行する（Ｓ３００）。Ｋ近傍検索処理については図１３を用いて説明する。図１３に示すように、一次グラフ生成部２６Ａは、超球の半径ｒを∞（無限大）に設定し（Ｓ４００）、既存のオブジェクト集合から部分集合Ｓを抽出する（Ｓ４０２）。超球とは、検索範囲（その範囲に入っていれば入力オブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）の要素に含められる可能性がある範囲）を示す仮想的な球である。なお、Ｓ４０２で抽出されたオブジェクト集合Ｓに含まれるオブジェクトは、同時にオブジェクト集合Ｒの初期集合にも含められてもよい。 The primary graph generation unit 26A executes a K neighborhood search process (S300). The K neighborhood search process will be described with reference to FIG. As shown in FIG. 13, the primary graph generation unit 26A sets the radius r of the hypersphere to ∞ (infinity) (S400), and extracts the subset S from the existing object set (S402). The hypersphere is a virtual sphere indicating a search range (a range that can be included in the elements of the neighborhood object set N (G, y) of the input object y if the range is included). Note that the objects included in the object set S extracted in S402 may be included in the initial set of the object set R at the same time.

次に、一次グラフ生成部２６Ａは、オブジェクト集合Ｓに含まれるオブジェクトの中で、オブジェクトｙとの距離が最も短いオブジェクトを抽出し、オブジェクトｓとする（Ｓ４０４）。次に、一次グラフ生成部２６Ａは、オブジェクトｓをオブジェクト集合Ｓから除外する（Ｓ４０６）。 Next, the primary graph generation unit 26A extracts the object having the shortest distance from the object y from the objects included in the object set S and sets it as the object s (S404). Next, the primary graph generation unit 26A excludes the object s from the object set S (S406).

次に、一次グラフ生成部２６Ａは、オブジェクトｓとオブジェクトｙとの距離ｄ（ｓ，ｙ）がｒ（１＋ε）を超えるか否かを判定する（Ｓ４０８）。ここで、εは拡張要素であり、ｒ（１＋ε）は、探索範囲（その範囲にあるオブジェクトの近傍オブジェクトについて、入力オブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）の要素に含められるか否かを判定する範囲）の半径を示す値である。オブジェクトｓとオブジェクトｙとの距離ｄ（ｓ，ｙ）がｒ（１＋ε）を超える場合、一次グラフ生成部２６Ａは、オブジェクト集合Ｒをオブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）として出力し（Ｓ４１０）、図１３のフローチャートの処理を終了する。 Next, the primary graph generation unit 26A determines whether or not the distance d (s, y) between the object s and the object y exceeds r (1 + ε) (S408). Here, ε is an expansion element, and r (1 + ε) is a search range (whether it is included in the elements of the neighborhood object set N (G, y) of the input object y for the neighborhood objects of the objects in the range) This is a value indicating the radius of the range in which to determine. When the distance d (s, y) between the object s and the object y exceeds r (1 + ε), the primary graph generation unit 26A outputs the object set R as a neighborhood object set N (G, y) of the object y ( S410), the process of the flowchart of FIG.

オブジェクトｓとオブジェクトｙとの距離ｄ（ｓ，ｙ）がｒ（１＋ε）を超えない場合、一次グラフ生成部２６Ａは、オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトの中からオブジェクト集合Ｃに含まれないオブジェクトを一つ選択し、選択したオブジェクトｕを、オブジェクト集合Ｃに格納する（Ｓ４１２）。オブジェクト集合Ｃは、重複検索を回避するために便宜上設けられるものであり、図１３のフローチャートが開始されるときにリセットされて空集合とされる。 When the distance d (s, y) between the object s and the object y does not exceed r (1 + ε), the primary graph generation unit 26A selects the object that is an element of the neighboring object set N (G, s) of the object s. Then, one object not included in the object set C is selected, and the selected object u is stored in the object set C (S412). The object set C is provided for convenience in order to avoid duplicate search, and is reset to an empty set when the flowchart of FIG. 13 is started.

次に、一次グラフ生成部２６Ａは、オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ（１＋ε）以下であるか否かを判定する（Ｓ４１４）。オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ（１＋ε）以下である場合、一次グラフ生成部２６Ａは、オブジェクトｕをオブジェクト集合Ｓに追加する（Ｓ４１６）。 Next, the primary graph generation unit 26A determines whether or not the distance d (u, y) between the object u and the object y is equal to or less than r (1 + ε) (S414). When the distance d (u, y) between the object u and the object y is equal to or less than r (1 + ε), the primary graph generation unit 26A adds the object u to the object set S (S416).

次に、一次グラフ生成部２６Ａは、オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ以下であるか否かを判定する（Ｓ４１８）。オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒを超える場合、Ｓ４３０に処理が進められる。 Next, the primary graph generation unit 26A determines whether or not the distance d (u, y) between the object u and the object y is equal to or less than r (S418). If the distance d (u, y) between the object u and the object y exceeds r, the process proceeds to S430.

オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ以下である場合、一次グラフ生成部２６Ａは、オブジェクトｕをオブジェクト集合Ｒに追加する（Ｓ４２０）。 When the distance d (u, y) between the object u and the object y is equal to or less than r, the primary graph generation unit 26A adds the object u to the object set R (S420).

そして、一次グラフ生成部２６Ａは、オブジェクト集合Ｒに含まれるオブジェクト数がｋｓを超えるか否かを判定する（Ｓ４２２）。所定数ｋｓは、任意に定められる自然数である。例えば、ｋｓ＝３である。 Then, the primary graph generation unit 26A determines whether or not the number of objects included in the object set R exceeds ks (S422). The predetermined number ks is a natural number that is arbitrarily determined. For example, ks = 3.

オブジェクト集合Ｒに含まれるオブジェクト数がｋｓを超える場合、一次グラフ生成部２６Ａは、オブジェクト集合Ｒに含まれるオブジェクトの中でオブジェクトｙとの距離が最も長いオブジェクトを、オブジェクト集合Ｒから除外する（Ｓ４２４）。 When the number of objects included in the object set R exceeds ks, the primary graph generation unit 26A excludes the object having the longest distance from the object y from the object set R among the objects included in the object set R (S424). ).

次に、一次グラフ生成部２６Ａは、オブジェクト集合Ｒに含まれるオブジェクト数がｋｓと一致するか否かを判定する（Ｓ４２６）。オブジェクト集合Ｒに含まれるオブジェクト数がｋｓと一致する場合、一次グラフ生成部２６Ａは、オブジェクト集合Ｒに含まれるオブジェクトの中でオブジェクトｙとの距離が最も長いオブジェクトと、オブジェクトｙとの距離を、新たなｒに設定する（Ｓ４２８）。 Next, the primary graph generation unit 26A determines whether or not the number of objects included in the object set R matches ks (S426). When the number of objects included in the object set R coincides with ks, the primary graph generation unit 26A determines the distance between the object with the longest distance from the object y and the object y among the objects included in the object set R. A new r is set (S428).

そして、一次グラフ生成部２６Ａは、オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトから全てのオブジェクトを選択してオブジェクト集合Ｃに格納し終えたか否かを判定する（Ｓ４３０）。オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトから全てのオブジェクトを選択してオブジェクト集合Ｃに格納し終えていない場合、Ｓ４１２に処理が戻される。 Then, the primary graph generation unit 26A determines whether all objects have been selected from the objects that are elements of the neighboring object set N (G, s) of the object s and stored in the object set C (S430). . If all objects have not been selected from the objects that are elements of the neighboring object set N (G, s) of the object s and stored in the object set C, the process returns to S412.

オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトから全てのオブジェクトを選択してオブジェクト集合Ｃに格納し終えた場合、一次グラフ生成部２６Ａは、オブジェクト集合Ｓが空集合であるか否かを判定する（Ｓ４３２）。オブジェクト集合Ｓが空集合でない場合、Ｓ４０４に処理が戻され、オブジェクト集合Ｓが空集合である場合、一次グラフ生成部２６Ａは、オブジェクト集合Ｒをオブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）として出力し、図１３のフローチャートの処理を終了する（Ｓ４３４）。 When all the objects are selected from the objects that are elements of the neighboring object set N (G, s) of the object s and stored in the object set C, the primary graph generation unit 26A determines that the object set S is an empty set. Is determined (S432). When the object set S is not an empty set, the process is returned to S404. When the object set S is an empty set, the primary graph generation unit 26A sets the object set R as a neighboring object set N (G, y) of the object y. To output the processing of the flowchart in FIG. 13 (S434).

図１２に戻る。オブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）が求められると、一次グラフ生成部２６Ａは、オブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）の要素であるオブジェクトを一つ選択し、オブジェクトｚとする（Ｓ３０２）。次に、一次グラフ生成部２６Ａは、オブジェクトｚの近傍オブジェクト集合Ｎ（Ｇ，ｚ）の要素としてオブジェクトｙを追加する（Ｓ３０４）。 Returning to FIG. When the neighborhood object set N (G, y) of the object y is obtained, the primary graph generation unit 26A selects one object that is an element of the neighborhood object set N (G, y) of the object y, and the object z and (S302). Next, the primary graph generation unit 26A adds the object y as an element of the neighborhood object set N (G, z) of the object z (S304).

次に、一次グラフ生成部２６Ａは、オブジェクトｚの近傍オブジェクト集合Ｎ（Ｇ，ｚ）の要素であるオブジェクトの数がｋｐを超えるか否かを判定する（Ｓ３０６）。オブジェクトｚの近傍オブジェクト集合Ｎ（Ｇ，ｚ）の要素であるオブジェクトの数が所定数ｋｐを超える場合、一次グラフ生成部２６Ａは、オブジェクトｚの近傍オブジェクト集合Ｎ（Ｇ，ｚ）から、オブジェクトｚとの距離が最も長い（オブジェクトｊから最も遠い）オブジェクトを除外する（Ｓ３０８）。 Next, the primary graph generation unit 26A determines whether or not the number of objects that are elements of the neighborhood object set N (G, z) of the object z exceeds kp (S306). When the number of objects that are elements of the neighborhood object set N (G, z) of the object z exceeds the predetermined number kp, the primary graph generation unit 26A determines from the neighborhood object set N (G, z) of the object z that the object z Are excluded (S308).

そして、一次グラフ生成部２６Ａは、オブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）の要素であるオブジェクトｚを全て選択したか否かを判定する（Ｓ３１０）。オブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）の要素であるオブジェクトｚを全て選択していない場合はＳ３０２に処理が戻され、オブジェクトｙの近傍オブジェクト集合Ｎ（Ｇ，ｙ）の要素であるオブジェクトｚを全て選択した場合はＳ３００〜Ｓ３０８のループ処理の１回分が終了する。これによって、無向グラフＧａが生成される。 Then, the primary graph generation unit 26A determines whether or not all the objects z that are elements of the neighborhood object set N (G, y) of the object y have been selected (S310). If all the objects z that are elements of the neighboring object set N (G, y) of the object y have not been selected, the process returns to S302, and the objects that are elements of the neighboring object set N (G, y) of the object y If all z are selected, one loop of S300 to S308 is completed. Thereby, the undirected graph Ga is generated.

＜第２実施形態＞
以下、第２実施形態について説明する。第２実施形態において、インデックス検索部２４は、下記のような処理を行うことで、検索対象のオブジェクトの数を低減し、より高速な検索処理を実現する。 Second Embodiment
Hereinafter, a second embodiment will be described. In the second embodiment, the index search unit 24 performs the following process to reduce the number of objects to be searched and realize a faster search process.

図１４は、第２実施形態に係るインデックス検索部２４の処理の内容を説明するための図である。これについて、インデックス検索部２４の処理としても採用される図１３の処理を参照しながら説明する。図中Ｑはクエリデータに対応するオブジェクトであり、Ｏ１は、検索処理（図１３に示す処理）の中のＳ４１２で選択されたオブジェクトｕに相当する。オブジェクトｕとオブジェクトＱの距離Ｄ１は、Ｓ４１２の時点で既に算出されている。一方、オブジェクトＯ２、Ｏ３、およびＯ４は、オブジェクトｕの近傍オブジェクト集合Ｎ（Ｇ，ｕ）の要素である。また、Ｄ０はオブジェクトＱを基準とした検索範囲の仮想的な半径（Ｓ４１８のｒ）、または探索範囲の仮想的な半径（Ｓ４１４のｒ（１＋ε））である。図１３の説明と同様に、検索範囲とは、その範囲内に無ければクエリデータと類似しないと判定される範囲であり、探索範囲とは、オブジェクトを辿る過程で、その範囲内に無いオブジェクトは参照されない範囲である。本来、オブジェクトＱと、オブジェクトＯ２、Ｏ３、およびＯ４のそれぞれとの間の距離を算出して、Ｓ４１４またはＳ４１８の判定処理を行うのであるが、この距離を算出せずにＳ４１４またはＳ４１８の判定処理を行うことが可能な状態が存在する。 FIG. 14 is a diagram for explaining the processing contents of the index search unit 24 according to the second embodiment. This will be described with reference to the process of FIG. 13 that is also adopted as the process of the index search unit 24. In the figure, Q is an object corresponding to the query data, and O1 corresponds to the object u selected in S412 in the search process (process shown in FIG. 13). The distance D1 between the object u and the object Q has already been calculated at the time of S412. On the other hand, the objects O2, O3, and O4 are elements of the neighborhood object set N (G, u) of the object u. D0 is a virtual radius of the search range (r in S418) or a virtual radius of the search range (r (1 + ε) in S414) with the object Q as a reference. Similar to the description of FIG. 13, the search range is a range that is determined not to be similar to the query data unless it is within the range, and the search range is an object that is not within the range in the process of tracing the object. It is a range that is not referenced. Originally, the distance between the object Q and each of the objects O2, O3, and O4 is calculated and the determination process of S414 or S418 is performed. However, the determination process of S414 or S418 is not performed without calculating this distance. There is a state where it is possible to perform.

図１４に示す状態において、まず、オブジェクトＱとオブジェクトＯ１との距離が算出される。距離Ｄ１＞距離Ｄ０であるので、オブジェクトＯ１で表されるデータは、オブジェクトＱで表されるデータとは類似しないと判定される。ここで、オブジェクトＯ２、Ｏ３、Ｏ４のそれぞれについて、オブジェクトＱとの距離は未だ算出されていないが、オブジェクトＯ１とオブジェクトＱとの距離は算出されている。 In the state shown in FIG. 14, first, the distance between the object Q and the object O1 is calculated. Since distance D1> distance D0, it is determined that the data represented by the object O1 is not similar to the data represented by the object Q. Here, for each of the objects O2, O3, and O4, the distance from the object Q has not yet been calculated, but the distance between the object O1 and the object Q has been calculated.

この場合、オブジェクトＯ２に関しては、［Ｄ２＜（Ｄ１−Ｄ０）］であるため、検索範囲または探索範囲の中には入らないことが明らかである。従って、オブジェクトＱとオブジェクトＯ２の距離を算出するまでもなく、オブジェクトＯ２で表されるデータは、オブジェクトＱで表されるクエリデータと類似しないと判定される。 In this case, it is clear that the object O2 does not fall within the search range or the search range because [D2 <(D1-D0)]. Therefore, it is not necessary to calculate the distance between the object Q and the object O2, and it is determined that the data represented by the object O2 is not similar to the query data represented by the object Q.

オブジェクトＯ３に関しては、［Ｄ３＞（Ｄ１−Ｄ０）］であるため、検索範囲または探索範囲の中に入る可能性がある。従って、オブジェクトＱとオブジェクトＯ３の距離が算出され、オブジェクトＯ３で表されるデータが、オブジェクトＱで表されるクエリデータと類似するか否かが判定される。 Since the object O3 is [D3> (D1-D0)], there is a possibility of entering the search range or the search range. Accordingly, the distance between the object Q and the object O3 is calculated, and it is determined whether or not the data represented by the object O3 is similar to the query data represented by the object Q.

オブジェクトＯ４に関しては、［Ｄ４＞（Ｄ１＋Ｄ０）］であるため、検索範囲または探索範囲の中には入らないことが明らかである。従って、オブジェクトＱとオブジェクトＯ４の距離を算出するまでもなく、オブジェクトＯ４で表されるデータは、オブジェクトＱで表されるクエリデータと類似しないと判定される。 It is clear that the object O4 does not fall within the search range or the search range because [D4> (D1 + D0)]. Accordingly, without calculating the distance between the object Q and the object O4, it is determined that the data represented by the object O4 is not similar to the query data represented by the object Q.

このように、第２実施形態に係るインデックス検索部２４は、あるオブジェクト（図１４ではＯ１）についてクエリデータを表すオブジェクトとの距離を算出し、検索範囲または探索範囲の中には入らないことが判明した場合には、図中の範囲Ａ０またはＡ１に含まれるオブジェクトのみを距離算出の対象とし、Ａ２またはＡ３に含まれることが明らかなオブジェクトに関しては、距離算出の対象とせず棄却する。これによって、より高速な検索処理を実現することができる。 As described above, the index search unit 24 according to the second embodiment calculates the distance of an object (O1 in FIG. 14) from the object representing the query data, and may not enter the search range or the search range. If it is found, only objects included in the range A0 or A1 in the figure are subject to distance calculation, and objects that are clearly included in A2 or A3 are rejected without being subject to distance calculation. As a result, a higher-speed search process can be realized.

以上説明した第２実施形態の検索データ管理装置１によれば、より高速な検索処理を実現することができる。 According to the search data management device 1 of the second embodiment described above, a higher-speed search process can be realized.

なお、第２実施形態の処理を実行する場合、転置グラフＧｒ^＋を求めることで、第１実施形態における転置グラフＧｒ^＋＋と同程度の性能を発揮することができる。 In the case of executing the processing of the second embodiment, by obtaining the transposed graph Gr ^+, it can exert transposed graph Gr ⁺⁺ and comparable performance in the first embodiment.

以上、本発明を実施するための形態について実施形態を用いて説明したが、本発明はこうした実施形態に何等限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々の変形及び置換を加えることができる。 As mentioned above, although the form for implementing this invention was demonstrated using embodiment, this invention is not limited to such embodiment at all, In the range which does not deviate from the summary of this invention, various deformation | transformation and substitution Can be added.

１…検索データ管理装置、１０…ネットワークインターフェース、２０…制御部、２２…データオブジェクト生成部、２４…インデックス検索部、２６…グラフインデックス編集部、２６Ａ…一次グラフ生成部、２６Ｂ…転置グラフ生成部、３０…入出力装置、４０…データサーバ、４２…既存データオブジェクト、４４…グラフインデックス DESCRIPTION OF SYMBOLS 1 ... Search data management apparatus, 10 ... Network interface, 20 ... Control part, 22 ... Data object generation part, 24 ... Index search part, 26 ... Graph index edit part, 26A ... Primary graph generation part, 26B ... Transposed graph generation part , 30 ... I / O device, 40 ... Data server, 42 ... Existing data object, 44 ... Graph index

Claims

A storage unit for storing a plurality of nodes to be searched;
A primary graph generating unit that generates a primary graph that is data of a graph structure in which a directed edge is set with respect to neighboring nodes from each node stored in the storage unit;
A transposed graph generating unit that generates a transposed graph, which is data of a graph structure in which the direction of the directed edge in the primary graph stored in the storage unit is reversed, as data for data search;
A search data management device comprising:

The primary graph generation unit selects data of a graph structure in which a predetermined number of nodes are selected in order of short distance from the target node, and directed edges are set for the predetermined number of selected nodes from the target node. Generating as the primary graph,
The search data management device according to claim 1.

The primary graph generation unit regards undirected edges as two directed edges for data having a graph structure in which undirected edges are set for a plurality of nodes, and determines the edges from each node toward another node . Each of the numbers has an upper limit, and the linear graph is generated by reducing the amount exceeding the upper limit in order from the edge having a long distance,
The search data management device according to claim 1.

The transposed graph generation unit generates a graph obtained by adding a predetermined number or a predetermined ratio of edges opposite to the edge for each node to which an edge is given as the transposed graph that is data for data search. ,
The search data management device according to any one of claims 1 to 3.

The transposition graph generation unit for each node that has been granted edges, the graph deleting the granted edges as Tokoro constant or a predetermined ratio of the edge remains in the ascending order is the data for data retrieval Generating as the transposed graph ,
The search data management device according to any one of claims 1 to 4.

A search unit that searches for data similar to the query data by calculating a distance between a query object obtained from query data and an object corresponding to a node included in the transposed graph;
A search range based on the query object or a virtual radius of the search range; a first distance between the first object corresponding to the first node included in the transposed graph and the query object; The similarity between the second node and the query data is determined based on the relationship between the second object and the second distance corresponding to the second node included in the transposed graph. A search unit for determining whether or not to
The search data management device according to any one of claims 1 to 5.

In the case where the first object is not similar to the query object, the search unit determines that the second distance when the distance obtained by subtracting the virtual radius from the first distance is larger than the second distance. Decide not to determine the similarity between the node and the query data;
The search data management apparatus according to claim 6.

In the case where the first object is not similar to the query object, the search unit adds the first radius to the first distance and the second distance is smaller than the second distance. Decide not to determine the similarity between the node and the query data;
The search data management device according to claim 6 or 7.

Computer
Generate a primary graph that is data of a graph structure in which a directed edge is set for a neighboring node from each node included in a plurality of nodes to be searched for data,
Generating a transposed graph, which is data of a graph structure in which the direction of the directed edge in the generated primary graph is reversed, as data for data search;
Search data management method.

On the computer,
Generate a primary graph that is data of a graph structure in which a directed edge is set for a neighboring node from each node included in a plurality of nodes to be searched for data,
Generating a transposed graph, which is data of a graph structure in which the direction of the directed edge in the generated primary graph is reversed, as data for data search;
Search data management program.