JP3635341B2

JP3635341B2 - How to join databases

Info

Publication number: JP3635341B2
Application number: JP11725193A
Authority: JP
Inventors: 裕明唐沢
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 1993-05-19
Filing date: 1993-05-19
Publication date: 2005-04-06
Anticipated expiration: 2020-04-06
Also published as: JPH06332766A

Description

[0001]
[Industrial application fields]
The present invention relates to a database combining method, and in particular to a method for accurately and automatically creating a combined database in which individual records of arbitrary, different, independent databases, whose information differs in reliability or in representation format, are associated with one another so that they can be cross-referenced.
[0002]
[Prior art]
Conventionally, when individual records in arbitrary, different, independent databases need to be associated and combined, and those databases have been created under uniform rules, the records can easily be combined on a common field using, for example, the JOIN operation provided by a DBMS (database management system).
[0003]
The JOIN operation can be carried out in two ways. In one, both tables are sorted on the join item (or an index is used) and the rows with the same key are retrieved from both tables. In the other, one table is taken as the reference, the other table is scanned exhaustively, and the rows that satisfy the condition are retrieved.
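The two conventional strategies described above can be sketched roughly as follows. This is only an illustrative sketch: the function names, the representation of a table as a list of dictionaries, and the `field` parameter are assumptions made here, not taken from the specification or from any particular DBMS.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def nested_loop_join(left: List[Row], right: List[Row], field: str) -> List[Tuple[Row, Row]]:
    """Take each left row as the reference and scan every right row."""
    return [(l, r) for l in left for r in right if l[field] == r[field]]


def sort_merge_join(left: List[Row], right: List[Row], field: str) -> List[Tuple[Row, Row]]:
    """Sort both tables on the join field, then walk them with two cursors."""
    ls = sorted(left, key=lambda row: row[field])
    rs = sorted(right, key=lambda row: row[field])
    out: List[Tuple[Row, Row]] = []
    i = j = 0
    while i < len(ls) and j < len(rs):
        if ls[i][field] < rs[j][field]:
            i += 1
        elif ls[i][field] > rs[j][field]:
            j += 1
        else:
            # Find the run of right rows sharing this key, then pair it
            # with every left row that has the same key.
            current = ls[i][field]
            j_end = j
            while j_end < len(rs) and rs[j_end][field] == current:
                j_end += 1
            while i < len(ls) and ls[i][field] == current:
                for jj in range(j, j_end):
                    out.append((ls[i], rs[jj]))
                i += 1
            j = j_end
    return out
```

Both functions require the key values to be exactly equal, which is the limitation the following paragraphs discuss.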
[0004]
[Problems to be solved by the invention]
However, when arbitrary, different, independent databases have each been created under their own rules, the representation of information is not uniform. Even when entries in a common field refer to the same thing, their wording often fluctuates, as with 「事務所」 (jimusho) and 「オフィス」 (ofisu), which both mean "office". In addition, common field information can differ in reliability, for example between the prefecture part and the block part of an address. A comparison that simply checks whether common fields match therefore joins only a small number of records, and depending on the field of interest the result is very unreliable. Consequently, creating an accurate combined database requires a method such as the following:
(1) Taking one database as the reference, manually rebuild a new combined database from scratch, incorporating the contents of the other database;
(2) Taking one database as the reference, manually find, for each record, the record to be combined with it among the contents of the other database.
Both methods have the drawback that combining cannot be achieved without an enormous investment of manual labor, working time, and cost.
[0005]
The present invention has been made in view of the above points. Its object is to solve the above conventional problems and to provide a database combining method that, for arbitrary, different, independent databases, can accurately and automatically create, without manual work, a combined database in which individual records whose information differs in reliability or in representation format are associated with one another so that they can be cross-referenced.
[0006]
[Means for Solving the Problems]
The present invention is a database combining method that individually associates, by using common field information, records of databases which have a common field and in which the degree of reliability (that is, the certainty) of the common field information or its representation format differs, wherein:
for common field information whose representation formats differ, an analysis process that unifies the representation format is performed, and common field information after the analysis process is created;
the records of the databases are compared common field by common field; scores are set in advance in a scoring table on the basis of whether the comparison matches, the processing content of the analysis process, and the degree of reliability of the common field information; the scores are assigned to the respective records according to the degree of reliability; and the individual records to be associated and combined across the different databases are determined on the basis of the scores.
[0007]
[Action]
According to the present invention, between arbitrary, different databases that have a common field, even if the reliability of the common field information is low or its representation format differs, processing by various analysis algorithms that unify the representation format is applied to the common field information whose representation formats differ, and common field information after the analysis process is created. The records are then compared common field by common field, and each record is given a score from a scoring table that is created from whether the field data match, the processing content of the analysis process, and the degree of reliability of the common field information. On the basis of these scores, a combined database that associates the individual records of the different databases and enables cross-reference can be created accurately and automatically, without manual work.
[0008]
[Example]
Embodiments of the database combining method of the present invention will be described below with reference to the drawings.
[0009]
FIG. 1 shows a block diagram of a database combining method according to an embodiment of the present invention.
[0010]
The system shown in FIG. 1 comprises: a plurality of join target databases 1, such as a telephone directory database (DB1) and a map database (DB2); a combined database (DB3) 2 that holds the result of the combining process; a database control unit 3 that retrieves records from the join target databases 1 and updates the combined database 2; a temporary recording unit 4 that instructs the database control unit 3 to fetch or store records and holds the records concerned; a general character string processing unit 5 that applies conversion processing to the character strings recorded in the temporary recording unit 4; a comparison and collation unit 6 that compares the records held in the temporary recording unit 4 field by field, scores them on the basis of the comparison result, writes the scores back to the temporary recording unit 4, and determines the record to be combined on the basis of the scores accumulated there; and a scoring table 7 that, in response to an inquiry from the comparison and collation unit 6, returns the score corresponding to a collation result.
[0011]
FIG. 2 is a flowchart showing the operation of one embodiment of the present invention.
[0012]
In the following description of the processing shown in the flowchart of FIG. 2, DB1 is the telephone directory database and DB2 is the map database.
[0013]
FIG. 3 shows a telephone directory database (DB1) according to an embodiment of the present invention, and FIG. 4 shows a map database (DB2) according to an embodiment of the present invention. FIG. 5 shows an example of a combined database according to an embodiment of the present invention.
[0014]
Each record in the databases of FIGS. 3 and 4 has a field structure. In this embodiment, DB1 and DB2 are combined to construct the combined database DB3 shown in FIG. 5.
[0015]
Step 1) Check whether there is an unprocessed record in the telephone directory database (DB1) in the database 1 to be joined. If there is no DB1 record to be extracted, the process proceeds to the end process of Step 10. On the other hand, if there remains a record to be retrieved from DB1, the process proceeds to step 2.
[0016]
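Step 2) The database control unit 3 takes one record with the field structure shown in FIG. 3 out of DB1, applies character string conversion processing such as voiced-sound removal (seion-ka), yōon/hatsuon normalization, corporate name deletion, occupation suffix deletion, and unique information extraction, and records the converted data from the database control unit 3 in the temporary recording unit 4.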
[0017]
Step 3) As long as the map database (DB2) among the join target databases 1 contains a record that has not yet been compared with the one record taken from DB1, Steps 4 to 6 are repeated. When the one record taken from DB1 has been compared with all records of DB2, the process proceeds to Step 7.
[0018]
Step 4) The database control unit 3 takes out one record with the DB2 field structure shown in FIG. 4, applies the same character string conversion processing as in Step 2, and records the converted data from the database control unit 3 in the temporary recording unit 4.
[0019]
Step 5) The comparison and collation unit 6 compares and collates the DB1 record and the DB2 record held in the temporary recording unit 4 on their address codes and on their name and building name fields, to which the character string conversion processing has been applied, and obtains a score by checking the collation result against the scoring table 7.
[0020]
Step 6) The score obtained in Step 5 is stored together with the record in the temporary recording unit 4, and the process returns to Step 3.
[0021]
Step 7) When all records of DB2 have been compared, the comparison and collation unit 6 identifies the record with the highest score among the records held in the temporary recording unit 4 and attaches a flag indicating that it is to be combined.
[0022]
Step 8) A combined record with the field structure shown in FIG. 5 is generated from the one DB1 record and the DB2 record to which the flag indicating combining has been attached, and is written from the database control unit 3 to the combined database 2 (DB3).
[0023]
Step 9) The contents of the temporary recording unit 4 are erased, and the process proceeds to Step 1.
[0024]
Step 10) The process is terminated.
[0025]
The combined database 2 can be created by the method as described above.
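Steps 1 to 10 amount to a nested loop that keeps, for each DB1 record, the highest-scoring DB2 record. The following is a minimal sketch of that control flow under stated assumptions: records are plain dictionaries, and `normalize`, `score`, and `build` are hypothetical callables standing in for the character string conversion, the scoring table lookup, and the construction of the combined record. The flowchart does not say how ties or an empty DB2 are handled; here the first highest-scoring record wins and a DB1 record with no candidates is skipped.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def combine_databases(
    db1: List[Record],
    db2: List[Record],
    normalize: Callable[[Record], Record],
    score: Callable[[Record, Record], int],
    build: Callable[[Record, Record], Record],
) -> List[Record]:
    db3: List[Record] = []
    for raw1 in db1:                       # Step 1: any unprocessed DB1 record?
        rec1 = normalize(raw1)             # Step 2: convert and hold the DB1 record
        scratch: List[Tuple[int, Record]] = []  # plays the role of temporary recording unit 4
        for raw2 in db2:                   # Step 3: loop over all DB2 records
            rec2 = normalize(raw2)         # Step 4: convert the DB2 record
            points = score(rec1, rec2)     # Step 5: collate and look up the score
            scratch.append((points, rec2)) # Step 6: keep the score with the record
        if scratch:
            # Step 7: pick the highest score (first one wins on a tie).
            _, best = max(scratch, key=lambda pair: pair[0])
            db3.append(build(rec1, best))  # Step 8: write the combined record
        scratch.clear()                    # Step 9: clear the temporary store
    return db3                             # Step 10: done
```

Because every DB1 record is compared with every DB2 record, the cost grows with the product of the two database sizes.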
[0026]
The flowchart process shown above will be specifically described.
[0027]
First, as shown in FIG. 3, the fields of the telephone directory database DB1 are an “address code” consisting of a “prefecture code”, “city code”, “town code”, and “chome” code, a “name”, a “building name”, a “phone number”, and an “occupation code”. As shown in FIG. 4, the fields of the map database DB2 are an “address code” consisting of a “prefecture code”, “city code”, “town code”, and “chome” code, a “name”, a “building name”, “map coordinates”, a “building ID”, and the like.
[0028]
In DB1 and DB2, the address code, which is a common field, is made up of hierarchical codes that follow the address hierarchy: prefecture code, city code, town code, and chome code. Within the address code data, the prefecture, city, and town codes are highly reliable, while the chome code is less reliable than they are, so the certainty of the data varies with the characteristics of each field. The name and the building name, on the other hand, are written as character strings, and their wording fluctuates.
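One way to make use of this hierarchy is to distinguish a complete match from a match that fails only on the least reliable, last component. The sketch below is an illustration only: the `AddressCode` type, the string-valued components, and the labels `"exact"`, `"tail_differs"`, and `"mismatch"` are assumptions introduced here, since the concrete code format appears only in the figures.

```python
from typing import NamedTuple


class AddressCode(NamedTuple):
    """Hierarchical address code, ordered from most to least reliable."""
    prefecture: str
    city: str
    town: str
    chome: str


def classify_address_match(a: AddressCode, b: AddressCode) -> str:
    """Return how two address codes agree.

    "exact"        : all four components are equal
    "tail_differs" : prefecture, city, and town agree; only chome differs
    "mismatch"     : a more reliable component differs
    """
    if a == b:
        return "exact"
    if a[:3] == b[:3]:
        return "tail_differs"
    return "mismatch"
```

Treating a chome-only difference as a weaker match rather than a failure reflects the lower reliability of that component.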
[0029]
Character conversion processing is performed in order to unify information having fluctuations in the expression of the character string.
[0030]
FIG. 6 shows an example of character conversion processing according to an embodiment of the present invention.
[0031]
The figure shows the contents of conversion processing for information expressed by character strings having fluctuations in expression such as names or building names which are common fields in DB1 and DB2.
[0032]
First, the voiced-sound removal (seion-ka) process removes the voicing marks of voiced and semi-voiced sounds; for example, 「ボタン書房」 becomes 「ホタン書房」. The yōon/hatsuon normalization process converts a small character such as 「っ」 into the full-size character 「つ」; for example, 「キャッシュ金融」 becomes 「キヤツシユ金融」. The corporate name deletion process deletes words that denote a corporation, such as 「株式会社」 (kabushiki kaisha, stock company); for example, 「電信株式会社」 becomes 「電信」. The occupation suffix deletion process deletes occupation suffixes such as 「駅」 (station) and 「寺」 (temple); for example, 「ＪＲ新宿駅」 (JR Shinjuku Station) becomes 「ＪＲ新宿」. The unique information extraction process extracts, by some logic, only the words that express the proper name; for example, 「田中特許事務所」 (Tanaka Patent Office) becomes 「田中」 (Tanaka).
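Four of these conversions can be approximated with the Python standard library, as sketched below. The function names are hypothetical. The voicing marks are removed by Unicode canonical decomposition, which is a technique chosen here rather than one named in the specification. The small-kana table goes beyond the single 「っ」 mentioned in the text and is an assumption based on the 「キャッシュ」 example, and the word lists contain only the entries the text names. The unique information extraction is not sketched because its logic is not specified.

```python
import unicodedata

# Combining dakuten and handakuten produced by canonical decomposition.
_VOICING_MARKS = {"\u3099", "\u309a"}

# Small kana and their full-size counterparts (table contents assumed).
_SMALL_TO_LARGE = str.maketrans(
    "ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮ",
    "あいうえおつやゆよわアイウエオツヤユヨワ",
)

CORPORATE_WORDS = ("株式会社",)      # only the word named in the text
OCCUPATION_SUFFIXES = ("駅", "寺")   # only the suffixes named in the text


def remove_voicing(text: str) -> str:
    """Voiced-sound removal: drop dakuten/handakuten (ボ becomes ホ)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if ch not in _VOICING_MARKS)
    return unicodedata.normalize("NFC", stripped)


def enlarge_small_kana(text: str) -> str:
    """Replace small kana with full-size kana (ッ becomes ツ)."""
    return text.translate(_SMALL_TO_LARGE)


def delete_corporate_words(text: str) -> str:
    """Delete words that denote a corporation."""
    for word in CORPORATE_WORDS:
        text = text.replace(word, "")
    return text


def delete_occupation_suffix(text: str) -> str:
    """Delete one trailing occupation suffix, if present."""
    for suffix in OCCUPATION_SUFFIXES:
        if text.endswith(suffix) and len(text) > len(suffix):
            return text[: -len(suffix)]
    return text
```

Applied to the examples in the text, each function reproduces the conversion shown there, for instance `remove_voicing("ボタン書房")` gives 「ホタン書房」.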
[0033]
FIG. 7 shows a scoring table of one embodiment of the present invention. The scoring table shown in the figure describes the score processing determined by the processing contents of the character string conversion processing and the degree of matching of the address codes.
[0034]
The results of the above character string conversion processing and the address code comparison are scored according to whether the match is complete or differs only at the end. For example, when the address codes match completely with no character string conversion processing applied, the score is 20 points; when they differ only at the end, the score is 10 points; and when there is a complete match with the address code after occupation suffix deletion, the score is 13 points.
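The three scores given above can be held in a small lookup table. In this sketch the key names and the `look_up_score` function are hypothetical, only the three entries stated in the text are filled in, and returning 0 for any combination not listed is an assumption, since the remaining cells of FIG. 7 are not reproduced here.

```python
from typing import Dict, Tuple

# (conversion applied, kind of address code match) -> points.
# Only the three entries stated in the text are included.
SCORING_TABLE: Dict[Tuple[str, str], int] = {
    ("none", "exact"): 20,
    ("none", "tail_differs"): 10,
    ("occupation_suffix_deleted", "exact"): 13,
}


def look_up_score(conversion: str, match_kind: str) -> int:
    """Return the points for a collation result (0 if not listed: assumed)."""
    return SCORING_TABLE.get((conversion, match_kind), 0)
```

Keeping the scores in data rather than in code makes it straightforward to tune them per field reliability without changing the comparison logic.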
[0035]
A case in which the combined database 2 is generated from the join target databases 1, DB1 and DB2, described above will now be explained.
[0036]
(1) First, the database control unit 3 takes out one record of the phone book database (DB1) which is a database to be combined, and performs a character string conversion process for unifying the character strings by the general character string processing unit 5, The record of DB1 is stored in the temporary recording unit 4.
[0037]
(2) The database control unit 3 takes out one record of the map database (DB2) that is the database to be combined, performs character string conversion processing in the same manner as (1), and stores the record of DB2 in the temporary recording unit 4.
[0038]
(3) In this embodiment the degree of agreement of the address codes is referred to, so the comparison and collation unit 6 uses the scoring table 7 to compare the address codes of the DB1 record and the DB2 record held in the temporary recording unit 4, and stores the compared DB2 record in the temporary recording unit 4 together with its score. When all DB2 records have been compared and collated in this way, the DB2 record with the highest score among the scored DB2 records held in the temporary recording unit 4 is combined with the DB1 record stored in the temporary recording unit 4 in (1). In this embodiment, the address code of the DB2 record shown in FIG. 4 matches the DB1 record of FIG. 3 completely with no character string conversion processing applied, so it is given the highest score of 20 points.
[0039]
(4) In this embodiment, the fields of the combined database 2 are “phone number”, “occupation code”, “map coordinates”, and “building ID”. Accordingly, the telephone number “03-3123-4567” and the occupation code “10242002” are transferred from DB1, the map coordinates “24-65-72-41” and the building ID “54202” are transferred from DB2, and these data are written to the combined database 2.
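The transfer in (4) can be illustrated with the values given in the text. The `CombinedRecord` type, the dictionary keys, and the `build_combined_record` function are hypothetical names chosen for this sketch; only the four field values come from the embodiment.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CombinedRecord:
    phone_number: str      # from DB1
    occupation_code: str   # from DB1
    map_coordinates: str   # from DB2
    building_id: str       # from DB2


def build_combined_record(db1_record: Dict[str, str], db2_record: Dict[str, str]) -> CombinedRecord:
    """Copy the DB1 and DB2 fields that make up one DB3 record."""
    return CombinedRecord(
        phone_number=db1_record["phone_number"],
        occupation_code=db1_record["occupation_code"],
        map_coordinates=db2_record["map_coordinates"],
        building_id=db2_record["building_id"],
    )


# Values taken from the embodiment.
db1_record = {"phone_number": "03-3123-4567", "occupation_code": "10242002"}
db2_record = {"map_coordinates": "24-65-72-41", "building_id": "54202"}
combined = build_combined_record(db1_record, db2_record)
```

The combined record carries no name or address field, so it serves purely as a cross-reference between the two source databases.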
[0040]
(5) When one record has been written to the combined database 2, the temporary recording unit 4 is cleared.
[0041]
As the above embodiment shows, using field data whose representation does not fluctuate, such as the address code, to narrow down the records to be compared and collated makes it possible to create the combined database 2 at high speed without loss of reliability.
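One possible way to realize this narrowing is to index DB2 by the reliable leading part of the address code, so that each DB1 record is collated only against DB2 records in the same town. The specification does not describe how the narrowing is implemented; the sketch below is an assumption in which each record stores its address code as a 4-tuple under the hypothetical key `"address"`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Prefix = Tuple[str, str, str]


def build_address_index(db2: List[Record]) -> Dict[Prefix, List[Record]]:
    """Group DB2 records by (prefecture, city, town)."""
    index: Dict[Prefix, List[Record]] = defaultdict(list)
    for record in db2:
        index[record["address"][:3]].append(record)
    return index


def candidates_for(index: Dict[Prefix, List[Record]], db1_record: Record) -> List[Record]:
    """Return only the DB2 records that share the reliable address prefix."""
    return index.get(db1_record["address"][:3], [])
```

Records whose chome code differs still fall into the same group, so a match that differs only at the end can still be scored.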
[0042]
In the above embodiment, the combined database is generated by associating the database records using scores from the scoring table 7, which is created from the degree of reliability of the information; alternatively, the table may be provided inside the software. For common field information that differs only in representation format, various analysis algorithms that unify the representation format of the common field information may be used.
[0043]
[Effects of the invention]
As described above, according to the present invention, a combined database that associates individual records whose information differs in reliability or in representation format across arbitrary, different, independent databases, and that enables cross-reference, can be created automatically, without loss of reliability and without manual work.
[Brief description of the drawings]
FIG. 1 is a block diagram of a database combining method according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of an embodiment of the present invention.
FIG. 3 is a diagram showing a telephone directory database (DB1) according to an embodiment of the present invention.
FIG. 4 is a diagram showing a map database (DB2) according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of a combined database according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of character conversion processing according to an embodiment of the present invention.
FIG. 7 is a scoring table according to an embodiment of the present invention.
[Explanation of symbols]
1 Join target databases
2 Combined database
3 Database control unit
4 Temporary recording unit
5 General character string processing unit
6 Comparison and collation unit
7 Scoring table

Claims

A database combining method for individually associating, by using common field information, records of databases that have a common field and in which the degree of reliability (that is, the certainty) of the common field information or its representation format differs, characterized in that:
for common field information whose representation formats differ, an analysis process that unifies the representation format is performed, and common field information after the analysis process is created; and
the records of the databases in which the degree of reliability or the representation format of the common field information differs are compared common field by common field, scores are set in advance in a scoring table on the basis of whether the comparison matches, the processing content of the analysis process, and the degree of reliability of the common field information, the scores are assigned to the respective records according to the degree of reliability, and the different databases are associated and the individual records to be combined are determined on the basis of the scores.