JP7601220B2

JP7601220B2 - Name data matching device, name data matching method, and name data matching program

Info

Publication number: JP7601220B2
Application number: JP2023527147A
Authority: JP
Inventors: まな美小川; 正崇佐藤
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2021-06-07
Filing date: 2021-06-07
Publication date: 2024-12-17
Anticipated expiration: 2041-06-07
Also published as: WO2022259303A1; JPWO2022259303A1

Description

Embodiments of the present invention relate to a name data matching device, a name data matching method, and a name data matching program.

In database-driven work, databases under different management are sometimes integrated so that the stored name data can be used side by side, enabling more multifaceted and comprehensive analysis. This requires a task known as "name matching", in which name data representing the same thing across the databases being integrated is consolidated, for example by assigning it the same identification information.

However, how name data is entered depends on who manages the database. As a result, the same thing is often written differently across the databases being integrated (notation variation). If databases containing notation variation are integrated as they are, the analysis described above will be missing the information attached to the variant notations of an item.

As techniques for dealing with such notation variation, Non-Patent Documents 1 and 2 propose methods that find the most similar character string by quantitatively calculating the similarity between the character strings being searched. Non-Patent Document 3 proposes a method that creates a search dictionary in order to find character strings representing the same thing accurately and efficiently. Patent Document 1 discloses a linking method that uses information surrounding the data to be matched.

Non-Patent Document 1: Hiroshi Nakagawa et al., "Terminology Extraction Based on Frequency of Occurrence and Concatenation" (「出現頻度と連接頻度に基づく専門用語抽出」), Natural Language Processing (自然言語処理), 2003, Vol. 10, No. 1, pp. 27-45.
Non-Patent Document 2: Hiroaki Tabuchi et al., "Example-Based Parallel Text Retrieval Method Based on N-grams" (「N-gram に基づく用例対訳検索手法」), IEICE Technical Report, Artificial Intelligence and Knowledge Processing Research Group, Vol. 108, No. 441, pp. 43-48, 2009.
Non-Patent Document 3: Surajit Chaudhuri, Venkatesh Ganti, and Dong Xin. 2009. "Mining document collections to facilitate accurate approximate entity matching". Proc. VLDB Endow. 2, 1 (August 2009), 395-406. DOI: https://doi.org/10.14778/1687627.1687673

Patent Document 1: Japanese Patent Application Publication No. 2020-123210

Notation varies in two ways: abbreviated forms of the registered data name, and names (common names) that users adopt under their own local rules.

Methods such as those disclosed in Non-Patent Documents 1 and 2 are popular and effective when the only notation variations present are abbreviations of the former kind. When common names of the latter kind are mixed in, however, each common name is linked to whichever name is textually similar to it, so erroneous results are likely. This is because a common name is often far removed from the name to which it should be linked.

Moreover, even when only abbreviations are handled, the methods disclosed in Non-Patent Documents 1 and 2 were designed for Japanese, so their scope of application is limited. The characteristics of abbreviations in Japanese do not all coincide with those in other languages, so these methods cannot necessarily be applied without problems to name data entered in other languages.

For common names, therefore, creating a dictionary as disclosed in Non-Patent Document 3 is considered the most suitable approach. However, the dictionary must be extended as the number of databases to be integrated grows, so it takes time before notation variation can be handled.

Patent Document 1 therefore proposes a technique that matches common names without relying on a dictionary, using information surrounding the data to be matched (for example, that data A and data B in the same database are connected). However, a technique such as that disclosed in Patent Document 1 requires the graphs that can be constructed from the name data of each database to satisfy a kind of inclusion relationship: for every edge in one graph, a corresponding edge must exist in the other graph. Consequently, name data that yields graphs whose structure does not preserve this inclusion relationship is difficult to match, and even when matching is possible a large number of candidate names is produced.

This invention seeks to provide a technique that can accurately match synonymous name data whose notation varies between the databases being integrated, without human labor.

To solve the above problem, a name data matching device according to one aspect of the present invention matches synonymous name data that is written differently between a first database and a second database. The first database holds a plurality of name data and adjacency information indicating logical or physical adjacency relationships among that name data. The second database holds a plurality of name data, adjacency information for that name data, and path identification information indicating the paths to which the name data belongs. The device includes a common data extraction unit, a path creation unit, and a matching unit. The common data extraction unit extracts, as common data, name data that has the same notation in the first database and the second database. The path creation unit extracts, from the paths represented by the path identification information held by the second database, partial paths whose endpoints are common data extracted by the common data extraction unit and whose vertices between the endpoints are non-common data. Based on the information held by the first database, it then creates, for each partial path, paths that have the same common data as endpoints and whose length is equal to or greater than that of the partial path. For each partial path extracted by the path creation unit, the matching unit searches for combinations of the vertices on the partial path and the vertices on the paths created by the path creation unit, thereby matching the name data held by the first database with the name data held by the second database.

According to one aspect of the present invention, a technique can be provided that accurately matches synonymous name data whose notation varies between the databases being integrated, without human labor.

FIG. 1 is a block diagram showing an example of the configuration of a name data matching device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of the name data matching device.
FIG. 3 is a diagram showing an example of the information held by the basic database stored in the basic database storage unit.
FIG. 4 is a diagram showing an example of the information held by the derived database stored in the derived database storage unit.
FIG. 5 is a flowchart showing an example of the processing operations for matching name data in the name data matching device.
FIG. 6 is a schematic diagram for explaining the method of matching names.
FIG. 7 is a diagram showing an example of the information held by the basic database in the operation example.
FIG. 8 is a diagram showing an example of the information held by the derived database in the operation example.
FIG. 9 is a schematic diagram showing an example of the cycle graph created by the graph creation unit from the information held by the basic database in the operation example.
FIG. 10 is a schematic diagram showing an example of a path generated from the cycle graph that the graph creation unit created from the information held by the derived database in the operation example.
FIG. 11 is a diagram showing an example of the output information stored in the output information storage unit in the operation example.

An embodiment of the present invention is described below with reference to the drawings.

In this embodiment, a plurality of databases hold synonymous name data with different notations, and the data columns whose name data is to be matched across these databases are assumed to be known. Each data column can contain name data together with other data associated with that name data, such as measured values, measurement dates and times, sales dates and times, and sales amounts. Each database is also assumed to hold logical or physical adjacency information indicating the adjacency relationships among the name data. Here, adjacency information means information on how items of data are connected to each other, such as personal connections (person A and person B are acquaintances) or connections on a network (building A and building B are connected by a cable). The invention places no particular limit on the number of databases whose name data is matched, but for simplicity this embodiment assumes two target databases. It is also assumed that the name data within each database is related by connections on a network. Specifically, each database has columns named "upper building" and "lower building", and the name data stored in "upper building" is adjacent, on some network, to the name data stored in "lower building". In addition, at least one of the databases is assumed to hold, besides the adjacency information, path identification information indicating the path to which each item of name data belongs.

(Configuration example)
FIG. 1 is a block diagram showing an example of the configuration of a name data matching device according to an embodiment of the present invention. The name data matching device has a basic database 1 (abbreviated as DB in the figure), a derived database 2, a graph creation unit 3, a common data extraction unit 4, a path information extraction unit 5, a path creation unit 6, a matching unit 7, and a data output unit 8.

The basic database 1 is the first database, which holds a plurality of name data and adjacency information indicating the adjacency relationships among that name data. The derived database 2 is the second database, which holds a plurality of name data, adjacency information for that name data, and path identification information indicating the paths to which the name data belongs.

The graph creation unit 3 creates undirected graphs whose vertices are name data, based on the information held by the basic database 1 and the derived database 2.

The common data extraction unit 4 extracts, as common data, name data that has the same notation in the basic database 1 and the derived database 2.
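The extraction of common data can be sketched as a set intersection over the names in both databases. This is a minimal illustration, not the patented implementation: the row layout (pairs of "upper building" and "lower building") follows the embodiment, while the function name `extract_common_names` and the sample building names other than those quoted in the description are hypothetical.

```python
def extract_common_names(base_rows, derived_rows):
    """Return the names written identically in both databases (the common data)."""
    base_names = {name for row in base_rows for name in row}
    derived_names = {name for row in derived_rows for name in row}
    return base_names & derived_names


# Hypothetical (upper building, lower building) adjacency rows.
base_rows = [
    ("Shinjuku Building", "Minami-Shinjuku"),
    ("Minami-Shinjuku", "Gaien Building"),
]
derived_rows = [
    ("Shinjuku Building", "Minami-Shinjuku Building"),
    ("Minami-Shinjuku Building", "Gaien Building"),
]

common = extract_common_names(base_rows, derived_rows)
```

Only exact string equality is used here, so a name that differs in notation, such as "Minami-Shinjuku" versus "Minami-Shinjuku Building", is deliberately left out of the common data.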

Based on the path identification information held by the derived database 2, the path information extraction unit 5 generates all paths that start from one item of the common data extracted by the common data extraction unit 4 and whose vertices are name data held by the derived database 2. The end point of a path may be the same common data as the start point, or different common data. For each of these paths, the path information extraction unit 5 then extracts path information that includes the number of vertices, the name data of the vertices on the path, and their positions on the path. For example, the path information extraction unit 5 can extract the path information from the undirected graph created by the graph creation unit 3 and the path identification information held by the derived database 2.

For each path indicated by the path information extracted by the path information extraction unit 5, the path creation unit 6 extracts partial paths whose endpoints, that is, whose start and end points, are common data. Then, based on the information held by the basic database 1, the path creation unit 6 enumerates all paths of a specified length whose endpoints are the same common data as those of each partial path. For example, the path creation unit 6 can enumerate the paths using the undirected graph that the graph creation unit 3 created from the basic database 1.
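The two steps of the path creation unit can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the dictionary-of-sets graph representation, the `max_edges` bound, and the basic-database building names are all hypothetical, and "specified length" is read here as "at least as long as the partial path", following the summary of the invention.

```python
def extract_partial_paths(path, common):
    """Split a vertex sequence into segments whose endpoints are common names
    and whose interior vertices are all non-common names."""
    segments = []
    start = None
    for i, name in enumerate(path):
        if name in common:
            # Keep only segments with at least one interior vertex to match.
            if start is not None and i - start > 1:
                segments.append(path[start:i + 1])
            start = i
    return segments


def enumerate_paths(graph, source, target, min_edges, max_edges):
    """Enumerate paths from source to target with min_edges..max_edges edges.
    Interior vertices are never repeated; max_edges is an assumed search bound."""
    found = []

    def walk(path, visited):
        edges = len(path)  # edge count once the next vertex is appended
        if edges > max_edges:
            return
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt == target:
                # For a closed path, require 3+ edges so no edge is reused.
                if edges >= min_edges and (source != target or edges >= 3):
                    found.append(path + [nxt])
                continue
            if nxt not in visited:
                walk(path + [nxt], visited | {nxt})

    walk([source], {source})
    return found


common = {"Shinjuku Building", "Gaien Building"}
derived_path = [
    "Shinjuku Building", "Minami-Shinjuku Building", "Gaien Building",
    "Yotsuya Building", "Shinjuku Building",
]
# Hypothetical undirected graph built from the basic database.
base_graph = {
    "Shinjuku Building": {"Minami-Shinjuku", "Yoyogi"},
    "Minami-Shinjuku": {"Shinjuku Building", "Gaien Building"},
    "Yoyogi": {"Shinjuku Building", "Sendagaya"},
    "Sendagaya": {"Yoyogi", "Gaien Building"},
    "Gaien Building": {"Minami-Shinjuku", "Sendagaya"},
}

segments = extract_partial_paths(derived_path, common)
first = segments[0]
candidates = enumerate_paths(
    base_graph, first[0], first[-1],
    min_edges=len(first) - 1, max_edges=len(first),
)
```

Allowing candidate paths longer than the partial path reflects that the basic database may contain intermediate vertices the derived database omits; how far beyond the partial path length to search is a tuning choice this sketch leaves to `max_edges`.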

The matching unit 7 searches for combinations of vertex name data between the partial paths extracted by the path creation unit 6 and the paths it enumerated, based on, for example, a string similarity such as edit distance. Based on the combinations found, the matching unit 7 then matches the name data held by the basic database 1 with the name data held by the derived database 2.
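One way the combination search could work is sketched below. The description names edit distance only as an example similarity, so everything else here is an assumption made for illustration: interior vertices are aligned in path order, the total edit distance is minimised, and the names `edit_distance` and `best_alignment` as well as the basic-database building names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def best_alignment(partial, candidates):
    """Pick the order-preserving assignment of the partial path's interior
    vertices to a candidate path's interior vertices with the lowest total
    edit distance. Returns (cost, mapping) or None if nothing fits."""
    inner = partial[1:-1]
    best = None
    for cand in candidates:
        cand_inner = cand[1:-1]
        for picks in combinations(range(len(cand_inner)), len(inner)):
            cost = sum(edit_distance(inner[k], cand_inner[p])
                       for k, p in enumerate(picks))
            if best is None or cost < best[0]:
                mapping = {inner[k]: cand_inner[p] for k, p in enumerate(picks)}
                best = (cost, mapping)
    return best


partial = ["Shinjuku Building", "Minami-Shinjuku Building", "Gaien Building"]
candidates = [
    ["Shinjuku Building", "Minami-Shinjuku", "Gaien Building"],
    ["Shinjuku Building", "Yoyogi", "Sendagaya", "Gaien Building"],
]
best = best_alignment(partial, candidates)
```

Because the search is confined to vertices lying between the same pair of common endpoints, even a coarse similarity measure only has to choose among a few candidates rather than among every name in the basic database.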

The data output unit 8 generates output information based on the matching result from the matching unit 7 and outputs it. For example, the data output unit 8 can generate, as output information, a correspondence table showing how the name data corresponds. Alternatively, the data output unit 8 may convert the name data in the information held by the basic database 1 according to the matching result, create a new database, and use that as the output information. As a further alternative, the data output unit 8 may integrate the information held by the basic database 1 and the derived database 2 according to the matching result, create a new database, and use that as the output information.
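Two of the output forms described above can be sketched briefly. This is an illustration only: the CSV layout, the column headers, the direction of the mapping (derived name to basic name), and the function names are all assumptions, since the description does not fix a format.

```python
import csv
import io


def correspondence_table(mapping):
    """Render a derived-name -> basic-name mapping as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["derived_name", "basic_name"])
    for derived_name in sorted(mapping):
        writer.writerow([derived_name, mapping[derived_name]])
    return buffer.getvalue()


def rename_basic_rows(rows, mapping):
    """Rewrite basic-database rows so matched names use the derived notation."""
    to_derived = {basic: derived for derived, basic in mapping.items()}
    return [tuple(to_derived.get(name, name) for name in row) for row in rows]


mapping = {"Minami-Shinjuku Building": "Minami-Shinjuku"}
table = correspondence_table(mapping)
renamed = rename_basic_rows(
    [("Shinjuku Building", "Minami-Shinjuku")], mapping
)
```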

FIG. 2 is a diagram showing an example of the hardware configuration of the name data matching device.

As shown in FIG. 2, the name data matching device is configured as a computer such as a server computer or a personal computer, and has a hardware processor 101 such as a CPU (Central Processing Unit). In the name data matching device, a program memory 102, a data memory 103, a communication interface 104, and an input/output interface 105 (labelled "input/output IF" in FIG. 2) are connected to the processor 101 via a bus 106.

The communication interface 104 can include, for example, one or more wired or wireless communication modules. When the basic database 1 and/or the derived database 2 reside on a data server connected via a network such as a LAN (Local Area Network) or the Internet, the communication interface 104 can communicate with that data server and acquire data from it. The communication interface 104 can also communicate with an external data processing device, receiving requests from that device and returning the results of the requested data processing to it.

An input unit 107 and a display unit 108 are connected to the input/output interface 105. The input unit 107 and the display unit 108 can be implemented as a so-called tablet-type input and display device, in which an input detection sheet using an electrostatic or pressure-sensitive method is placed over the display screen of a display device using, for example, liquid crystal or organic EL (Electro Luminescence). The input unit 107 and the display unit 108 may also be separate devices. The input/output interface 105 passes operation information entered at the input unit 107 to the processor 101, and causes the display unit 108 to display the display information generated by the processor 101.

The input unit 107 and the display unit 108 need not be connected to the input/output interface 105. If they are provided with a communication unit for connecting to the communication interface 104 directly or via a network, they can exchange information with the processor 101 through it.

The input/output interface 105 may also have a function for reading from and writing to a recording medium such as a flash memory or other semiconductor memory, or a function for connecting to a reader/writer that can read from and write to such a recording medium. This allows a recording medium that is detachable from the name data matching device to serve as a database holding name data. The input/output interface 105 may further have functions for connecting to other devices.

The program memory 102 is a non-transitory tangible computer-readable storage medium that combines, for example, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), with a non-volatile memory such as a ROM. The program memory 102 stores the programs the processor 101 needs in order to execute the various control processes according to the embodiment. That is, the processing functions of the graph creation unit 3, the common data extraction unit 4, the path information extraction unit 5, the path creation unit 6, the matching unit 7, and the data output unit 8 can each be realized by having the processor 101 read and execute a program stored in the program memory 102. Some or all of these processing functions may instead be realized in various other forms, including integrated circuits such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).

The data memory 103 is a tangible computer-readable storage medium that combines, for example, the non-volatile memory described above with a volatile memory such as a RAM (Random Access Memory). The data memory 103 is used to store the various data acquired and created while the various processes are performed; areas for storing such data are secured in the data memory 103 as needed during processing. As such areas, the data memory 103 can be provided with, for example, a basic database storage unit 1031, a derived database storage unit 1032, a temporary storage unit 1033, and an output information storage unit 1034.

The basic database storage unit 1031 stores the information of the basic database 1, and the derived database storage unit 1032 stores the information of the derived database 2. In other words, the basic database 1 and the derived database 2 can be implemented in the basic database storage unit 1031 and the derived database storage unit 1032.

FIG. 3 is a diagram showing an example of the information held by the basic database 1 stored in the basic database storage unit 1031, and FIG. 4 is a diagram showing an example of the information held by the derived database 2 stored in the derived database storage unit 1032. The example shown uses building names as the name data. In the basic database 1 stored in the basic database storage unit 1031, an upper building and its lower building are adjacent. In the derived database 2 stored in the derived database storage unit 1032, the buildings that share the same path identifier (abbreviated as ID in the figure) together form one path (Shinjuku Building → Minami-Shinjuku Building → Gaien Building → Yotsuya Building → Shinjuku Building). Hereinafter, the building names in the derived database 2 are denoted c_i (i ∈ {1, 2, ..., n}) and the building names in the basic database 1 are denoted d_j (j ∈ {1, 2, ..., m}), where n and m are the numbers of building names in the respective databases.
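How rows sharing a path identifier form one path can be sketched as follows. FIG. 4 is not reproduced in this text, so the row layout (path ID, upper building, lower building), the identifier value "P1", and the assumption that the rows of each path are listed in traversal order are all hypothetical; only the building sequence is taken from the description.

```python
from collections import defaultdict


def paths_by_id(rows):
    """Group (path_id, upper, lower) rows and chain them into vertex sequences.
    Assumes the rows of each path appear in traversal order."""
    grouped = defaultdict(list)
    for path_id, upper, lower in rows:
        grouped[path_id].append((upper, lower))

    paths = {}
    for path_id, edges in grouped.items():
        vertices = [edges[0][0]]
        for upper, lower in edges:
            if upper != vertices[-1]:
                raise ValueError(f"path {path_id!r} is not contiguous at {upper!r}")
            vertices.append(lower)
        paths[path_id] = vertices
    return paths


derived_rows = [
    ("P1", "Shinjuku Building", "Minami-Shinjuku Building"),
    ("P1", "Minami-Shinjuku Building", "Gaien Building"),
    ("P1", "Gaien Building", "Yotsuya Building"),
    ("P1", "Yotsuya Building", "Shinjuku Building"),
]
paths = paths_by_id(derived_rows)
```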

The information stored in the basic database storage unit 1031 and the derived database storage unit 1032 can be, for example, information of the basic database 1 and the derived database 2 entered at the input unit 107 and received by the processor 101 via the input/output interface 105. That is, the basic database 1 and the derived database 2 can be built in the data memory 103. Alternatively, all or part of the information held by a basic database 1 and a derived database 2 built on an external data server may be stored in the basic database storage unit 1031 and the derived database storage unit 1032. In that case, for example, the processor 101 acquires the information accumulated on the database server via the communication interface 104 in response to a user instruction entered at the input unit 107, and stores it in the storage units 1031 and 1032. The processor 101 may instead acquire information recorded on a recording medium via the input/output interface 105. The processor 101 may also receive the information of the basic database 1 and the derived database 2, together with a request to match the name data, from an external data processing device via the communication interface 104, and store the received database information in the storage units 1031 and 1032 as the information to be processed.

The temporary storage unit 1033 stores, among other things, the undirected graphs the processor 101 creates when operating as the graph creation unit 3, the common data it extracts when operating as the common data extraction unit 4, the path information for all paths it extracts when operating as the path information extraction unit 5, the partial paths it extracts and the paths it enumerates when operating as the path creation unit 6, and the name data matching results it obtains when operating as the matching unit 7.

The output information storage unit 1034 stores the output information obtained when the processor 101 operates as the data output unit 8.

(Operation)
Next, the operation of the name data matching device is described.

FIG. 5 is a flowchart showing an example of the processing operations for matching name data in the name data matching device. It is assumed here that the information of the basic database 1 is already stored in the basic database storage unit 1031 and the information of the derived database 2 is already stored in the derived database storage unit 1032. When instructed to perform name data matching, either from the input unit 107 via the input/output interface 105 or from an external data processing device via the communication interface 104, the processor 101 of the name data matching device starts the operations shown in this flowchart.

First, the processor 101 operates as the graph creation unit 3. That is, for the information of the derived database 2 stored in the derived database storage unit 1032 and the information of the basic database 1 stored in the basic database storage unit 1031, the processor 101 uses the adjacency information to generate cycle graphs G_c and G_d, respectively, whose vertices are name data (step S1). The generated cycle graphs G_c and G_d are stored in the temporary storage unit 1033 of the data memory 103.

Taking each building name c_i in the derived database 2 and each building name d_j in the basic database 1 as a vertex, and interpreting adjacent vertices as being joined by an edge, the undirected cycle graphs G_c and G_d can be constructed as follows. Here, a cycle is a subgraph of the cycle graph G_c, namely a path whose start point and end point are the same vertex.

E_d: the set of edges obtained from the adjacency information of basic database 1
g_d: E_d → P(V_d), a mapping that assigns a subset of the vertex set V_d to each element of E_d, where P(V_d) is the power set of the vertex set V_d
G_d := (g_d, V_d, E_d)

E_c: the set of edges obtained from the adjacency information of derived database 2
g_c: E_c → P(V_c), a mapping that assigns a subset of the vertex set V_c to each element of E_c, where P(V_c) is the power set of the vertex set V_c
G_c := (g_c, V_c, E_c)
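The triple (g, V, E) defined above can be sketched in code as follows. This is only an illustration under assumptions: the function name `make_graph`, the edge labels `e0`, `e1`, ... and the placeholder vertex names are hypothetical and do not appear in the specification.

```python
def make_graph(adjacency_pairs):
    """Build (g, V, E) for an undirected graph from adjacency pairs.

    V is the vertex set, E the list of edge labels, and g maps each edge
    label to the subset of V (here a 2-element frozenset) that it joins.
    """
    V = set()
    E = []
    g = {}
    for index, (a, b) in enumerate(adjacency_pairs):
        edge = f"e{index}"            # hypothetical edge label
        E.append(edge)
        g[edge] = frozenset((a, b))   # an unordered pair, so the edge is undirected
        V.update((a, b))
    return g, V, E


# Placeholder adjacency information forming a single cycle A-B-C-A.
g, V, E = make_graph([("A", "B"), ("B", "C"), ("C", "A")])
```

Storing each edge as a frozenset keeps the graph undirected without recording every pair twice.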

The processor 101 of the name data matching device also operates as the common data extraction unit 4. That is, the processor 101 extracts the name data that is common to the information of basic database 1 stored in the basic database storage unit 1031 and the information of derived database 2 stored in the derived database storage unit 1032 (step S2). The extracted common name data is stored in the temporary storage unit 1033 of the data memory 103.
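Step S2 amounts to a set intersection over identically written names. A minimal sketch, assuming the names are available as plain strings; the function name `extract_common` and the sample lists are illustrative only.

```python
def extract_common(names_d, names_c):
    """Return the names written identically in both databases."""
    return set(names_d) & set(names_c)


# Illustrative name lists (romanized, "Building" omitted for brevity).
V_d = ["Fukui Fujita", "Kuwabara", "Hoshina", "Osorezan"]
V_c = ["Fujita", "Kuwabara", "Hoshina", "Osorezan"]
S = extract_common(V_d, V_c)
```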

Next, the processor 101 operates as the path information extraction unit 5. That is, based on the common name data and the path identification information stored in derived database 2, the processor 101 extracts paths Γ_k (k ∈ {1, 2, ..., K}, where K is the total number of paths in the cycle graph G_c) from the cycle graph G_c of derived database 2 (step S3). Path information indicating each extracted path Γ_k is stored in the temporary storage unit 1033 of the data memory 103. The path information can include the number of vertices of the extracted path Γ_k, the name data of the vertices it contains, and their positions on the path.

Here, the path Γ_k is the k-th path in the cycle graph G_c of derived database 2, starting at the vertex s_k ∈ V_c.
Γ_k[l]: the l-th vertex (l-th element) among the vertices that make up Γ_k
|Γ_k|: the length of path Γ_k (the number of vertices that make up path Γ_k)
Γ_k = (s_k, ..., t_k),
(Γ_k[l], Γ_k[l+1]) ∈ E_c,
l ∈ {1, 2, ..., |Γ_k|}

The cycle graph G_c may contain any number of paths, but every path must satisfy the following three conditions.
1. For every s_k, there exist d_j ∈ V_d and d_l such that s_k = d_j and t_k = d_l.
2. Every edge that makes up the path Γ_k exists in E_c.
3. Every c_i ∈ V_c belongs to at least one path Γ_k.

Here, let S := {c_i ∈ V_c | ∃ d_j ∈ V_d s.t. c_i = d_j} be the set of building names, extracted in step S2, that are written identically in V_c and V_d. For each c_i and d_j that is not an element of this set S, the name data matching device performs matching using the cycle graphs G_c and G_d as follows.

Here, one of the paths that make up the cycle graph G_c is denoted Γ_k, and the array of those vertices of Γ_k that are contained in the set S is denoted I_k, defined as follows.
I_k := (Γ_k[i] | Γ_k[i] ∈ S, Γ_k[i] ≠ s_k, i = 1, 2, ..., |Γ_k|)
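The array I_k can be read as a filtered copy of the path that preserves path order. A minimal sketch, assuming a path is a list of name strings; `common_sequence` and the sample path are hypothetical.

```python
def common_sequence(path, common):
    """Vertices of `path` that belong to `common`, in path order,
    excluding the start vertex s_k (path[0])."""
    start = path[0]
    return [v for v in path if v in common and v != start]


# Illustrative path: only "X" and "Y" are common names other than the start.
gamma = ["s", "a", "X", "b", "Y", "t"]
I = common_sequence(gamma, {"s", "X", "Y"})
```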

Next, the processor 101 operates as the path creation unit 6.
That is, based on the path information extracted in step S3, the processor 101 first extracts, for one path Γ_k, the partial paths L_k^i of the cycle graph G_c whose end points are elements of the set S (step S4). Here, L_k^i is the partial path of Γ_k from vertex I_k[i] to vertex I_k[i+1], and I_k[i] is the i-th element of the array I_k.
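Step S4 can be sketched as slicing the path between consecutive common names. The code below is an illustration, not the patented implementation: `partial_paths` and its `closed` flag are assumptions, each name is assumed to occur once on the path (apart from the repeated start vertex of a closed path), and the sample data follows the derived-database path of the operation example with "Building" omitted.

```python
def partial_paths(path, anchors, closed=False):
    """Split `path` into sub-paths between consecutive anchors.

    `anchors` lists the common names in path order. When `closed` is True
    the path is a cycle (path[0] == path[-1]) and one more sub-path is
    added that wraps from the last anchor back to the first.
    """
    pos = [path.index(a) for a in anchors]
    parts = [path[p:q + 1] for p, q in zip(pos, pos[1:])]
    if closed and pos:
        # Skip path[0] when wrapping: it repeats the final vertex.
        parts.append(path[pos[-1]:] + path[1:pos[0] + 1])
    return parts


gamma_1 = ["Hanazono", "Date", "Kuwabara", "Fujita", "Yanagawa",
           "Hoshina", "Osorezan", "Tsukidate", "Kawamata", "Hanazono"]
L = partial_paths(gamma_1, ["Kuwabara", "Hoshina", "Osorezan"], closed=True)
```

With these inputs the three sub-paths agree with the partial paths listed in the operation example.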

Next, based on the extracted partial paths, the processor 101 enumerates all paths in the cycle graph G_d of basic database 1 that start at I_k[i], end at I_k[i+1], and have a length of at least |L_k^i| (i = 1, 2, ..., |Γ_k|) and at most |L_k^i| + x (step S5). Here, x is an integer of 0 or more specified by the user. When these paths are enumerated, no vertex or edge is traversed twice. The set of enumerated paths is denoted A_k^i.
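The enumeration in step S5 can be sketched as a depth-first search over simple paths with a length window. Several points are assumptions of this sketch rather than statements of the specification: length is counted in edges, as the operation example does; the `blocked` argument, which keeps other common names out of the interior of a path, is not described in the text; and the function and variable names are hypothetical. The edge list is the basic-database adjacency information from the operation example, with "Building" omitted.

```python
def enumerate_paths(graph, start, goal, min_len, max_len, blocked=frozenset()):
    """All simple paths from start to goal with min_len..max_len edges."""
    found = []

    def dfs(vertex, trail):
        edges = len(trail) - 1
        if vertex == goal:
            if edges >= min_len:
                found.append(list(trail))
            return
        if edges == max_len:
            return  # cannot extend without exceeding the upper bound
        for nxt in sorted(graph[vertex]):
            if nxt not in trail and nxt not in blocked:
                trail.append(nxt)
                dfs(nxt, trail)
                trail.pop()

    dfs(start, [start])
    return found


EDGES_D = [
    ("Fukuoka Hanazono", "Tatsukoyama"), ("Tatsukoyama", "Fukuyama Date"),
    ("Fukuyama Date", "Kuwabara"), ("Kuwabara", "Fukui Fujita"),
    ("Fukui Fujita", "Fukuchi Yanagawa"), ("Fukuchi Yanagawa", "Hoshina"),
    ("Hoshina", "Osorezan"), ("Hoshina", "Fukuoka Hanazono"),
    ("Osorezan", "Fukuoka Hanazono"), ("Osorezan", "Tsukidate"),
    ("Tsukidate", "Fukushima Kawamata"),
    ("Fukushima Kawamata", "Fukuoka Hanazono"),
]
G_d = {}
for a, b in EDGES_D:
    G_d.setdefault(a, set()).add(b)   # undirected: store both directions
    G_d.setdefault(b, set()).add(a)

# x = 1: window 3..4 for Kuwabara-Hoshina, 5..6 for Osorezan-Kuwabara.
paths_1 = enumerate_paths(G_d, "Kuwabara", "Hoshina", 3, 4, blocked={"Osorezan"})
paths_3 = enumerate_paths(G_d, "Osorezan", "Kuwabara", 5, 6, blocked={"Hoshina"})
```

Under these assumptions, `paths_1` holds the length-3 and length-4 paths and `paths_3` the single length-6 path listed in the operation example.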

Next, the processor 101 operates as the matching unit 7.
That is, if the set of paths A_k^i enumerated in step S5 contains a path of length |L_k^i|, the processor 101 first denotes that path by α. It then selects the following combinations of names (step S6).
(L_k^i[j], α[j]), j = 1, 2, ..., |L_k^i|
Here, L_k^i[j] and α[j] are the j-th vertices of the respective paths.
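When the two paths have the same length, step S6 pairs vertices by position. A minimal sketch; `pair_by_position` is a hypothetical name and the sample paths are taken from the operation example with "Building" omitted.

```python
def pair_by_position(partial_path, alpha):
    """Pair the j-th vertex of the partial path with the j-th vertex of alpha."""
    if len(partial_path) != len(alpha):
        raise ValueError("paths must contain the same number of vertices")
    return list(zip(partial_path, alpha))


L_1_1 = ["Kuwabara", "Fujita", "Yanagawa", "Hoshina"]
alpha = ["Kuwabara", "Fukui Fujita", "Fukuchi Yanagawa", "Hoshina"]
pairs = pair_by_position(L_1_1, alpha)
```

The end points pair with themselves, so only the interior pairs carry new matching information.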

Furthermore, if the enumerated paths include one whose length is greater than |L_k^i| and at most |L_k^i| + x, the processor 101 searches for and matches combinations of name data using a name search technique based on string similarity (for example, edit distance), as shown in Fig. 6, and stores the result in the temporary storage unit 1033 of the data memory 103 (step S7). Edit distance is described, for example, in D. Gusfield, "Algorithms on strings, trees and sequences: computer science and computational biology," Cambridge University Press, 1997.

Fig. 6 is a schematic diagram for explaining the name matching method. Suppose there is name data for buildings BL_d stored in basic database 1 (Building A, Building B, ..., Building n) and names of buildings BL_c stored in derived database 2 (Building α, Building β, ..., Building ν), and that, as indicated by the dash-dot lines in the figure, Building A and Building α, and Building n and Building ν, have already been matched, either by identical names or through another path. In such a case, the processor 101 can search for combinations of name data by the following procedure.

1. Initialize x = 0.
2. Enumerate the paths of length |L_k^i| + x.
3. From the vertices of one enumerated path, remove the names that have already been matched.
4. If the length of the resulting path is greater than |L_k^i|, then for each building BL_c of derived database 2 that has not yet been matched (Building γ), find the building BL_d of basic database 1 with the shortest edit distance. For example, Building C can be found as the building with the shortest edit distance from Building γ, as indicated by the solid arrow, and matched with it.
5. Set x = x + 1 and repeat steps 2 to 4 until x reaches the upper limit specified in advance by the user.
When searching for the building BL_d with the shortest edit distance, the search starts from the building following the already matched building BL_d, as indicated by the dashed arrow.
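Steps 3 and 4, together with the rule that the search resumes after the last matched building, can be sketched for a single enumerated path as follows. This is an illustration under assumptions: `edit_distance` is the standard Levenshtein distance, `match_in_order` and its arguments are hypothetical, ties are broken in favor of the earlier building, and the outer loop over x is left to the caller. The sample names are the partial path L_1^3 and the length-6 path from the operation example, kept in the original notation because the variant 月館 / 月舘 is only visible there.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match_in_order(partial_path, candidate_path, matched):
    """Match unmatched derived names to the nearest basic name on the path.

    `matched` maps derived names to basic names that are already paired.
    The search for each name starts after the most recently paired basic name.
    """
    result = {}
    start = 0
    for c in partial_path:
        if c in matched:
            start = candidate_path.index(matched[c]) + 1
            continue
        pool = [d for d in candidate_path[start:] if d not in matched.values()]
        if not pool:
            break
        best = min(pool, key=lambda d: edit_distance(c, d))
        result[c] = best
        start = candidate_path.index(best) + 1
    return result


L_1_3 = ["恐山ビル", "月館ビル", "川俣ビル", "花園ビル", "伊達ビル", "桑原ビル"]
path_6 = ["恐山ビル", "月舘ビル", "福島川俣ビル", "福岡花園ビル",
          "立子山ビル", "福山伊達ビル", "桑原ビル"]
found = match_in_order(L_1_3, path_6,
                       {"恐山ビル": "恐山ビル", "桑原ビル": "桑原ビル"})
```

Resuming the search after the last matched building matters here: without it, "川俣ビル" is equally close to "月舘ビル" and "福島川俣ビル" (distance 2 each).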

As described above, a name for which only one combination exists is output as the result as is. For the other names, name data for which an output result has already been obtained is excluded from the candidates, and of the candidates that remain, those that are consistent are kept as the matching result. Consistency here means the following. Suppose a name A has several candidate names and one of them, name B, has been excluded by the operation above. There is then always a path P that was the basis for outputting the combination (A, B) of name A and the excluded name B. This path P also yields a name combination (C, D) for a name C other than name A. Because the combination (A, B) has been excluded, the combination (C, D) is excluded as well. A more specific example is given later as an operation example.

When the processing for one path Γ_k has been completed in this way, the processor 101 determines whether all paths Γ_k based on the path information extracted in step S3 have been processed (step S8). That is, it determines whether processing has finished for all vertices of all paths Γ_k. If it determines that an unprocessed path Γ_k remains, the processor 101 updates k, returns to step S4, and repeats steps S4 to S7.

When it is determined in step S8 that all paths Γ_k have been processed, the processor 101 operates as the data output unit 8 and outputs the name data matching information (step S9). That is, the processor 101 generates output information, in the form instructed from the input unit 107 or from the external data processing device, from the matching results stored in the temporary storage unit 1033 of the data memory 103, and stores the generated output information in the output information storage unit 1034 of the data memory 103. The processor 101 can then display this output information on the display unit 108 via the input/output interface 105, or transmit it to the external data processing device via the communication interface 104.

In the name data matching device according to the embodiment described above, the path creation unit 6 extracts partial paths whose end points are common data and whose intermediate vertices are non-common data and, for each partial path, creates paths that have the same common-data end points and a length equal to or greater than that of the partial path. For each partial path, the matching unit 7 then searches for combinations of the vertices on the partial path and the vertices on the created paths, thereby matching the name data held by basic database 1 with the name data held by derived database 2. As a result, synonymous name data whose notation varies between the databases to be integrated can be matched accurately and without manual effort, even when the separate character string data corresponding to the name data has no correspondence between the databases. This makes it possible to gather information on a given matter across different databases without omissions. The reduction in manual effort can also be expected to improve operational efficiency.

In the name data matching device according to the embodiment, the graph creation unit 3 creates the cycle graphs G_d and G_c, which are undirected graphs of basic database 1 and derived database 2 with name data as vertices. The path information extraction unit 5 generates all paths Γ_k whose end points are common data and whose vertices are name data held by derived database 2, and extracts, for each path Γ_k, path information including the number of vertices, the name data of the vertices it contains, and their positions on the path. For one of these paths Γ_k, the path creation unit 6 then extracts partial paths from the cycle graph G_c based on the path information and, for each partial path, creates from the cycle graph G_d paths that have the same common-data end points as the partial path and contain at least as many vertices as the partial path. It is therefore possible to create paths that contain those vertices of the name data held by basic database 1 that may be matched with name data held by derived database 2, in other words, paths that exclude vertices with no possibility of being matched with name data held by derived database 2.

Here, the path creation unit 6 further creates paths whose number of vertices is at least the number of vertices of the path Γ_k and exceeds it by no more than a number specified by the user. Limiting the number of vertices a path may contain in this way shortens the processing time.

In the name data matching device according to the embodiment, for each vertex on a path created by the path creation unit 6, when its position on the path corresponds to a vertex on the partial path, the matching unit 7 matches the name data corresponding to that vertex, among the name data held by basic database 1, with the name data of the vertex on the partial path, among the name data held by derived database 2. When the position on the path does not correspond to a vertex on the partial path, the matching unit 7 matches the name data corresponding to the vertex on the path with the name data of a vertex on the partial path based on the string similarity between the name data. The name data held by basic database 1 can therefore be matched easily with the name data held by derived database 2.

The name data matching device according to the embodiment also repeats the processing of the path creation unit 6 and the matching unit 7 until all paths Γ_k generated by the path information extraction unit 5 have been processed. This reduces the probability that name data held by derived database 2 fails to be matched with name data held by basic database 1.

In the name data matching device according to the embodiment, the data output unit 8 also generates output information that includes a name data correspondence table based on the result of matching the name data. This output information can therefore be used to carry out database integration. The name data matching device according to the embodiment may also generate the information of the integrated database as the output information.

[Operation example]
As an operation example of this embodiment, an overview of the name data used and the results obtained are described below.

Fig. 7 is a diagram showing an example of the information held by basic database 1 and stored in the basic database storage unit 1031 in the operation example. The adjacency information obtained from this basic database is as follows. Here, the notation (A, B) indicates that data name A and data name B are connected.
・(Fukuoka Hanazono Building, Tatsukoyama Building)
・(Tatsukoyama Building, Fukuyama Date Building)
・(Fukuyama Date Building, Kuwabara Building)
・(Kuwabara Building, Fukui Fujita Building)
・(Fukui Fujita Building, Fukuchi Yanagawa Building)
・(Fukuchi Yanagawa Building, Hoshina Building)
・(Hoshina Building, Osorezan Building)
・(Hoshina Building, Fukuoka Hanazono Building)
・(Osorezan Building, Fukuoka Hanazono Building)
・(Osorezan Building, Tsukidate Building (月舘))
・(Tsukidate Building (月舘), Fukushima Kawamata Building)
・(Fukushima Kawamata Building, Fukuoka Hanazono Building)

Fig. 8 is a diagram showing an example of the information held by derived database 2 and stored in the derived database storage unit 1032 in the operation example. The adjacency information obtained from this derived database is as follows. The path given by this adjacency information is Γ_k, and since only one path is handled in this operation example, k = 1. Here, the notation (A → B) indicates that there is a path from data name A to data name B.
・(Hanazono Building → Date Building)
・(Date Building → Kuwabara Building)
・(Kuwabara Building → Fujita Building)
・(Fujita Building → Yanagawa Building)
・(Yanagawa Building → Hoshina Building)
・(Hoshina Building → Osorezan Building)
・(Osorezan Building → Tsukidate Building (月館))
・(Tsukidate Building (月館) → Kawamata Building)
・(Kawamata Building → Hanazono Building)

In this operation example, for the name data with path ID = 2, the vertex sets V_d (basic database 1) and V_c (derived database 2) are as follows.
V_d = {Fukuoka Hanazono Building, Tatsukoyama Building, Fukuyama Date Building, Kuwabara Building, Fukui Fujita Building, Fukuchi Yanagawa Building, Hoshina Building, Osorezan Building, Tsukidate Building (月舘), Fukushima Kawamata Building}
V_c = {Hanazono Building, Date Building, Kuwabara Building, Fujita Building, Yanagawa Building, Hoshina Building, Osorezan Building, Tsukidate Building (月館), Kawamata Building}

In this operation example, the correct combinations of notations of the name data, that is, the correct matching of the name data, are as follows.
{(Tsukidate Building (月舘), Tsukidate Building (月館)), (Fukushima Kawamata Building, Kawamata Building), (Fukuoka Hanazono Building, Hanazono Building), (Fukuyama Date Building, Date Building), (Fukui Fujita Building, Fujita Building), (Fukuchi Yanagawa Building, Yanagawa Building)}

It was checked whether the name data matching device according to the embodiment can perform this matching correctly.

In step S1, the processor 101 of the name data matching device operates as the graph creation unit 3 and creates the cycle graphs. Fig. 9 is a schematic diagram showing an example of the cycle graph G_d created from the information held by basic database 1 in the operation example.

In step S2, the processor 101 operates as the common data extraction unit 4 and extracts the name data common to the cycle graphs G_c and G_d. The name data with identical notation, that is, the building name set S, is as follows.
S := {Kuwabara Building, Hoshina Building, Osorezan Building}

Then, in step S3, the processor 101 operates as the path information extraction unit 5 and extracts path information from derived database 2, and in step S4 it operates as the path creation unit 6 and extracts partial paths. Fig. 10 is a schematic diagram showing an example of the path Γ_1 generated from the cycle graph G_c created from the information held by derived database 2 in the operation example. The processor 101 extracts from the cycle graph G_c the following partial paths L_1^i, whose end points are elements of the building name set S.
L_1^1 := (Kuwabara Building, Fujita Building, Yanagawa Building, Hoshina Building)
L_1^2 := (Hoshina Building, Osorezan Building)
L_1^3 := (Osorezan Building, Tsukidate Building (月館), Kawamata Building, Hanazono Building, Date Building, Kuwabara Building)

Then, in step S5, for the partial path L_1^1, the processor 101 enumerates the paths on the cycle graph G_d that have "Kuwabara Building" and "Hoshina Building" as end points and a length of at least 3 and at most 3 + x.

For this operation example, the parameter is set to x = 1. This gives:
Length 3: (Kuwabara Building, Fukui Fujita Building, Fukuchi Yanagawa Building, Hoshina Building)
Length 4: (Kuwabara Building, Fukuyama Date Building, Tatsukoyama Building, Fukuoka Hanazono Building, Hoshina Building)

For the length-3 case, combining the corresponding vertex names of the enumerated path and the partial path L_1^1 yields the candidates
(Fukui Fujita Building, Fujita Building), (Fukuchi Yanagawa Building, Yanagawa Building)

For the length-4 case, every combination has an edit distance of 1, so the following candidates are possible.
Candidates for "Fujita Building": "Fukuyama Date Building", "Tatsukoyama Building", "Fukui Fujita Building"
Candidates for "Yanagawa Building": "Fukuoka Hanazono Building", "Tatsukoyama Building", "Fukuchi Yanagawa Building"

The partial path L_1^2 has length 1 and is therefore omitted.

For the partial path L_1^3, the processor 101 enumerates the paths on the cycle graph G_d that have "Osorezan Building" and "Kuwabara Building" as end points and a length of at least 5 and at most 5 + x = 6. This gives:
Length 5: none
Length 6: (Osorezan Building, Tsukidate Building (月舘), Fukushima Kawamata Building, Fukuoka Hanazono Building, Tatsukoyama Building, Fukuyama Date Building, Kuwabara Building)

Selecting, from the length-6 path, the vertex with the shortest edit distance to each vertex of the partial path L_1^3 yields the candidates
(Tsukidate Building (月舘), Tsukidate Building (月館)), (Fukushima Kawamata Building, Kawamata Building), (Fukuoka Hanazono Building, Hanazono Building), (Fukuyama Date Building, Date Building)

Since a combination with exactly one candidate is taken as an answer, the following are answers.
(Tsukidate Building (月舘), Tsukidate Building (月館)), (Fukushima Kawamata Building, Kawamata Building), (Fukuoka Hanazono Building, Hanazono Building), (Fukuyama Date Building, Date Building)

Given the answers for "Hanazono Building" and "Date Building", the candidates for "Fujita Building" and "Yanagawa Building" become:
Candidates for "Fujita Building": "Tatsukoyama Building", "Fukui Fujita Building"
Candidates for "Yanagawa Building": "Tatsukoyama Building", "Fukuchi Yanagawa Building"
Because "Fukuoka Hanazono Building" and "Fukuyama Date Building" are no longer candidates, the path
(Kuwabara Building, Fukuyama Date Building, Tatsukoyama Building, Fukuoka Hanazono Building, Hoshina Building)
cannot be the path in the cycle graph G_d that corresponds to the partial path L_1^1. "Tatsukoyama Building" is therefore also excluded from the candidates for "Fujita Building" and "Yanagawa Building", so that
(Fukui Fujita Building, Fujita Building), (Fukuchi Yanagawa Building, Yanagawa Building)
are obtained as answers.
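The elimination reasoning above can be sketched as a small fixed-point loop. This is only one possible reading of the consistency rule, under assumptions: each candidate records a single supporting path, the path labels "P3", "P4" and "P6" and the function name `resolve` are hypothetical, and names are romanized with "Building" omitted.

```python
def resolve(candidates):
    """candidates: {derived_name: {basic_name: supporting_path_id}}.

    Returns {derived_name: basic_name} for every name that ends up
    with exactly one consistent candidate.
    """
    cands = {c: dict(opts) for c, opts in candidates.items()}
    answers = {}
    changed = True
    while changed:
        changed = False
        # 1) A name with a single remaining candidate is an answer.
        for c, opts in cands.items():
            if c not in answers and len(opts) == 1:
                answers[c] = next(iter(opts))
                changed = True
        used = set(answers.values())
        # 2) Remove answered basic names from the other candidate lists
        #    and remember which paths supported the removed pairs.
        dead_paths = set()
        for c, opts in cands.items():
            if c in answers:
                continue
            for d in [d for d in opts if d in used]:
                dead_paths.add(opts.pop(d))
                changed = True
        # 3) A discarded path cannot support any other pair either.
        for c, opts in cands.items():
            if c in answers:
                continue
            for d in [d for d, p in opts.items() if p in dead_paths]:
                del opts[d]
                changed = True
    return answers


answers = resolve({
    "Fujita": {"Fukui Fujita": "P3", "Fukuyama Date": "P4", "Tatsukoyama": "P4"},
    "Yanagawa": {"Fukuchi Yanagawa": "P3", "Fukuoka Hanazono": "P4",
                 "Tatsukoyama": "P4"},
    "Hanazono": {"Fukuoka Hanazono": "P6"},
    "Date": {"Fukuyama Date": "P6"},
})
```

Here the answers for "Hanazono" and "Date" invalidate the length-4 path "P4", which in turn removes "Tatsukoyama" and leaves one candidate each for "Fujita" and "Yanagawa".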

Thereafter, the processor 101 generates output information based on the matching results stored in the temporary storage unit 1033 of the data memory 103, and stores it in the output information storage unit 1034 of the data memory 103. Fig. 11 is a diagram showing an example of the output information stored in the output information storage unit 1034. Although the output information is shown here as a correspondence table representing the correspondence between name data, it is of course not limited to this form.

The above verifies that the name data matching device can match name data accurately by using partial paths.

[Other embodiments]
In the embodiment described above, the case of two target databases was described as an example, but there may be three or more. That is, as long as at least one of the three or more databases holds path identification information, name data can be matched between it and the remaining two or more databases.

The embodiment described above used a path as an example, but a cycle (a path whose start point and end point are the same vertex) can of course be handled as well.

In the embodiment described above, processing proceeds with all or part of the information held by basic database 1 and derived database 2 stored in the basic database storage unit 1031 and the derived database storage unit 1032 of the data memory 103, but this is not a limitation. The processor 101 may access an external data server as needed via the communication interface 104, proceed using the information accumulated in basic database 1 and derived database 2 built on that server, and store only the processing results of each step in the temporary storage unit 1033. This reduces the capacity required of the data memory 103 in the name data matching device, so that the device can be configured at low cost.

In the embodiment described above, output information is generated and output to the display unit 108 or to an external data processing device, but the matching results stored in the temporary storage unit 1033 may instead be output without generating output information. This reduces the capacity required of the data memory 103 in the name data matching device, so that the device can be configured at low cost. It also makes it possible to offer a service that performs only the matching of name data to a data processing device that carries out database integration.

The methods described in each embodiment can be stored, as a program (software means) that a computer can execute, on a recording medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.), and can also be distributed by transmission over a communication medium. The program stored on the medium also includes a setting program that configures, within the computer, the software means (including not only an executable program but also tables and data structures) to be executed by the computer. A computer that implements this device reads the program recorded on the recording medium, constructs the software means by the setting program where necessary, and executes the processing described above with its operation controlled by the software means. The recording medium referred to in this specification is not limited to media for distribution, and includes storage media such as magnetic disks and semiconductor memories provided inside the computer or in equipment connected via a network.

In short, this invention is not limited to the embodiments described above, and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate wherever possible, in which case the combined effects are obtained. Furthermore, the embodiments described above include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements.

REFERENCE SIGNS LIST
1 … basic database
2 … derived database
3 … graph creation unit
4 … common data extraction unit
5 … path information extraction unit
6 … path creation unit
7 … matching unit
8 … data output unit
101 … processor
102 … program memory
103 … data memory
104 … communication interface
105 … input/output interface
106 … bus
107 … input unit
108 … display unit
1031 … basic database storage unit
1032 … derived database storage unit
1033 … temporary storage unit
1034 … output information storage unit

Claims

1. A name data matching device for matching synonymous name data having different notations between a first database that holds a plurality of name data and adjacency information indicating a logical or physical adjacency relationship between the name data, and a second database that holds a plurality of name data, adjacency information for the name data, and path identification information indicating a path to which the name data belongs, the device comprising:
a common data extraction unit that extracts, as common data, name data having the same notation in the first database and the second database;
a path creation unit that extracts, from the paths represented by the path identification information held in the second database, partial paths that have the common data extracted by the common data extraction unit as endpoints and non-common data as the vertices between the endpoints, and creates, for each of the partial paths, based on the information held in the first database, a path that has the same common data endpoints as the partial path and a length equal to or greater than the length of the partial path; and
a matching unit that, for each of the partial paths extracted by the path creation unit, associates the name data held in the first database with the name data held in the second database by searching for combinations of each vertex on the partial path and a vertex on the path created by the path creation unit.
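
As a non-authoritative illustration of the common data extraction and partial path extraction recited above, the following minimal Python sketch may help. The data layout (name data as plain strings, a path as an ordered list of names) and the function names `extract_common_data` and `extract_partial_paths` are assumptions made for this example and do not appear in the specification.

```python
def extract_common_data(first_names, second_names):
    """Return the name data whose notation is identical in both databases."""
    return set(first_names) & set(second_names)


def extract_partial_paths(path, common):
    """Split one path of the second database into partial paths.

    Each partial path starts and ends at common data and contains only
    non-common data (at least one vertex) between its endpoints.
    """
    partial_paths = []
    start = None  # index of the most recently seen common-data vertex
    for i, name in enumerate(path):
        if name not in common:
            continue
        # Keep the span only if non-common vertices lie between the endpoints.
        if start is not None and i - start > 1:
            partial_paths.append(path[start:i + 1])
        start = i
    return partial_paths


# Hypothetical sample data, invented for illustration only.
first_db_names = ["Tokyo", "Yokohama Bldg.", "Kawasaki-1", "Osaka"]
second_db_path = ["Tokyo", "Yokohama Building", "Kawasaki 1", "Osaka"]

common = extract_common_data(first_db_names, second_db_path)
partials = extract_partial_paths(second_db_path, common)
print(partials)
```

In this sample, only "Tokyo" and "Osaka" share the same notation, so the whole path becomes a single partial path whose interior vertices still need to be matched.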

2. The name data matching device according to claim 1, further comprising:
a graph creation unit that creates, based on the information held in the first database and the second database, an undirected graph of each of the first database and the second database with the name data as vertices; and
a path information extraction unit that generates, based on the undirected graph of the second database created by the graph creation unit and the path identification information held in the second database, all paths that have the common data extracted by the common data extraction unit as endpoints and the name data held in the second database as vertices, and extracts, for each of the paths, path information including the number of vertices, the name data of the vertices included, and their positions on the path,
wherein the path creation unit extracts the partial paths from the undirected graph of the second database created by the graph creation unit, based on the path information for one of the paths generated by the path information extraction unit, and creates, for each of the partial paths, from the undirected graph of the first database, a path that has the same common data endpoints as the partial path and includes at least as many vertices as the partial path has.
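
The graph creation and path information extraction described above can be sketched roughly as follows. This is only a sketch under stated assumptions: adjacency information is taken to be a list of name pairs, the vertices sharing one path identifier are assumed to form a simple open path in the graph, and the names `build_undirected_graph`, `order_path`, and `extract_path_info` are hypothetical.

```python
from collections import defaultdict


def build_undirected_graph(adjacency_pairs):
    """Build an undirected graph (name -> set of adjacent names)."""
    graph = defaultdict(set)
    for a, b in adjacency_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def order_path(graph, members):
    """Order the vertices of one path by walking the induced subgraph.

    Assumption: the members form a simple open path, so exactly the two
    end vertices have a single neighbour inside the member set.
    """
    members = set(members)
    induced = {v: graph.get(v, set()) & members for v in members}
    ends = sorted(v for v, nbrs in induced.items() if len(nbrs) <= 1)
    current, previous, ordered = ends[0], None, []
    while current is not None:
        ordered.append(current)
        nxt = [n for n in induced[current] if n != previous]
        previous, current = current, (nxt[0] if nxt else None)
    return ordered


def extract_path_info(path_id, ordered_names):
    """Collect the vertex count, vertex names, and positions on the path."""
    return {
        "path_id": path_id,
        "num_vertices": len(ordered_names),
        "positions": {name: pos for pos, name in enumerate(ordered_names)},
    }


# Hypothetical adjacency information and path membership.
pairs = [("A", "x"), ("x", "y"), ("y", "B"), ("B", "q")]
graph = build_undirected_graph(pairs)
ordered = order_path(graph, {"A", "x", "y", "B"})
info = extract_path_info("P1", ordered)
print(info)
```

Starting the walk from the lexicographically smaller end vertex is an arbitrary choice made here so that the result is deterministic.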

3. The name data matching device according to claim 2, wherein the path creation unit creates a path whose number of vertices is equal to or greater than the number of vertices of the partial path and equal to or less than a number specified by a user with respect to that number of vertices.
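
One possible way to create candidate paths with a bounded number of vertices is a depth-first enumeration of simple paths, sketched below. The function name `create_paths` and the choice to pass the upper bound directly as `max_vertices` are assumptions for illustration; the claim does not fix how the user-specified number is applied.

```python
def create_paths(graph, start, goal, min_vertices, max_vertices):
    """Enumerate simple paths from start to goal in an undirected graph.

    Only paths whose vertex count lies between min_vertices and
    max_vertices (both inclusive) are returned.
    """
    results = []

    def dfs(vertex, path):
        if len(path) > max_vertices:
            return  # prune: the path is already too long
        if vertex == goal:
            if len(path) >= min_vertices:
                results.append(list(path))
            return
        for nxt in sorted(graph.get(vertex, ())):
            if nxt not in path:  # keep the path simple (no repeated vertex)
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return results


# Hypothetical first-database graph: two routes between A and B.
first_graph = {
    "A": {"x", "p"},
    "x": {"A", "B"},
    "p": {"A", "q"},
    "q": {"p", "B"},
    "B": {"x", "q"},
}

# A partial path with 3 vertices, allowing up to 4 vertices.
candidates = create_paths(first_graph, "A", "B", 3, 4)
print(candidates)
```

Raising `max_vertices` widens the search to longer detours in the first database at the cost of more candidate paths to examine.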

4. The name data matching device according to claim 1, wherein the matching unit, for each of the vertices on the path created by the path creation unit:
if the position of the vertex on the path corresponds to a vertex on the partial path, associates the name data corresponding to the vertex on the path, among the name data held in the first database, with the name data of that vertex on the partial path, among the name data held in the second database; and
if the position of the vertex on the path does not correspond to a vertex on the partial path, associates, based on a character string similarity between the name data, the name data corresponding to the vertex on the path, among the name data held in the first database, with the name data of a vertex on the partial path, among the name data held in the second database.
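
The position-based matching with a fallback to character string similarity can be illustrated with the sketch below. Several points are assumptions rather than the claimed method: positions are treated as corresponding only when both paths have the same number of vertices, similarity is measured with `difflib.SequenceMatcher` from the Python standard library, the created path is assumed to be at least as long as the partial path, and `match_vertices` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def match_vertices(partial_path, created_path):
    """Map interior vertices of a partial path (second database)
    to interior vertices of a created path (first database)."""
    inner_partial = partial_path[1:-1]
    inner_created = created_path[1:-1]

    # Same number of vertices: positions correspond one to one.
    if len(inner_partial) == len(inner_created):
        return dict(zip(inner_partial, inner_created))

    # Otherwise fall back to string similarity, using each
    # first-database vertex at most once.
    matches = {}
    unused = list(inner_created)
    for name in inner_partial:
        best = max(unused, key=lambda cand: similarity(name, cand))
        matches[name] = best
        unused.remove(best)
    return matches


# Hypothetical sample paths sharing the endpoints "Tokyo" and "Osaka".
partial = ["Tokyo", "Yokohama Building", "Osaka"]
same_length = ["Tokyo", "Yokohama Bldg.", "Osaka"]
longer = ["Tokyo", "Relay Hut", "Yokohama Bldg.", "Osaka"]

by_position = match_vertices(partial, same_length)
by_similarity = match_vertices(partial, longer)
print(by_position, by_similarity)
```

The greedy choice of the most similar unused vertex is the simplest option; a real implementation might instead search over all combinations, as the wording of claim 1 suggests.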

5. The name data matching device according to claim 2 or 3, wherein the path creation unit and the matching unit repeat their processing until all of the paths generated by the path information extraction unit have been processed.

6. The name data matching device according to any one of claims 1 to 5, further comprising an output unit that generates output information including a name data matching table based on the result of the matching by the matching unit.

7. A name data matching method performed by a name data matching device comprising a processor and a memory that stores a first database holding a plurality of name data and adjacency information indicating a logical or physical adjacency relationship between the name data, and a second database holding a plurality of name data, adjacency information for the name data, and path identification information indicating a path to which the name data belongs, the method matching synonymous name data having different notations between the first database and the second database and comprising:
extracting, by the processor, as common data, name data having the same notation in the first database and the second database stored in the memory;
extracting, by the processor, from the paths represented by the path identification information held in the second database stored in the memory, partial paths that have the extracted common data as endpoints and non-common data as the vertices between the endpoints;
creating, by the processor, for each of the extracted partial paths, based on the information held in the first database stored in the memory, a path that has the same common data endpoints as the partial path and a length equal to or greater than the length of the partial path; and
associating, by the processor, for each of the extracted partial paths, the name data held in the first database stored in the memory with the name data held in the second database stored in the memory by searching for combinations of each vertex on the partial path and a vertex on the created path.

8. A name data matching program that causes a processor to function as each of the units of the name data matching device according to claim 1.