JP6949449B2 - Data search system and data search program

Info

Publication number: JP6949449B2
Application number: JP2018171603A
Authority: JP
Inventors: 恵哉生田
Original assignee: Toshiba Information Systems Japan Corp
Current assignee: Toshiba Information Systems Japan Corp
Priority date: 2018-09-13
Filing date: 2018-09-13
Publication date: 2021-10-13
Anticipated expiration: 2038-09-13
Also published as: JP2020042722A

Description

The present invention relates to a data search system and a data search program.

Conventionally, performing a fuzzy search in a database system that holds a large amount of data, such as a relational database, has taken an extremely long time.

Patent Document 1 discloses a document search device that shortens search time by using a full-text search engine together with an RDB (relational database) and by using state information on high-speed search preprocessing.

Specifically, the device provides a pattern matching unit and a full-text search unit for a plurality of documents and folders, determines the type and state of the search target, and performs the search with one of the two units. The invention of Patent Document 1 recognizes that the pattern matching unit can search immediately but searches slowly, whereas the full-text search unit takes time to register documents but searches quickly, and it switches between the two so that each is used where it is advantageous.

Patent Document 2 discloses a full-text search engine that can communicate with a plurality of clients. At document registration, the engine stores, together with the document, a character string formed by adding a control character (for example, a delimiter) to the identification code (user ID) of each client that has access rights to the document. When access to a document is requested, the engine appends to the search term a character string formed by adding the control character to the identification code of the requesting client, and then executes the full-text search.

Further, in the invention of Patent Document 2, the full-text search engine has a column that stores attribute values of a document separately from the document body. At document registration, the engine stores the character string consisting of the client identification code and the control character in that column as an attribute value of the document. When access to a document is requested, it performs a full-text search using, as the search term for that column, the character string formed by adding the control character to the identification code of the requesting client. Access control can thus be performed at the same time as the full-text search.

Patent Document 3 discloses, as a financial information retrieval system, a search engine that includes a crawler which crawls, at predetermined timing, a document DB holding document data describing each stock issue and creates a document index for full-text search. The search engine further has a search processing unit that, in response to a search request received from a sales terminal, returns as the search result a predetermined number of top-ranked records among the records of document data matched from the document index. It also has a DB search unit that searches the document DB directly, instead of searching through the search engine, when no keyword is specified in the search request received from the sales terminal.

According to the invention of Patent Document 3, even when no keyword is specified for a search by the search engine, data that could fall within the predetermined number of top-ranked results under the sort condition is displayed without omission.

Further, Patent Document 4 discloses a search target column determination device. It includes a data type analysis unit that analyzes the data-structure characteristics (data type and the like) of a user-specified search keyword entered through an input unit, and a search target column detection unit that detects, among the columns of the table to be searched stored in a relational database, the columns matching the analyzed characteristics as the search target columns.

The invention of Patent Document 4 improves response performance in full-text search by dynamically narrowing the columns to be searched based on the data-structure characteristics of the search keyword.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-79423
Patent Document 2: Japanese Unexamined Patent Publication No. 2009-169736
Patent Document 3: Japanese Unexamined Patent Publication No. 2015-185013
Patent Document 4: Japanese Unexamined Patent Publication No. 2010-67213

An object of the present invention is to provide a data search system and a data search program that enable faster searches than the search systems described above.

The data search system according to the present invention comprises: a database in which a plurality of tables are stored, each table holding one unit of managed data; first crawler collecting means that searches all of the plurality of tables in the database using search information, identifies all tables that are units of the managed data, creates, for every identified table, a first index table in which the table's unique value serves as primary key information and the search target information in the table's content data is attached to that primary key information as attribute information, and generates a first index file that collects the first index tables; primary key information acquisition means that, when given a keyword to be searched, searches the first index file, detects the first index table containing data corresponding to the keyword, and obtains the primary key information of that first index table; database search means that searches the database based on the primary key information obtained by the primary key information acquisition means and extracts data corresponding to the keyword from the resulting table; display means that displays information; and display control means that causes the display means to present a display based on the data extracted by the database search means.

FIG. 1 is a block diagram showing the configuration of an embodiment of the data search system according to the present invention.
FIG. 2 is a diagram showing an example of the contents of the database used in the embodiment.
FIG. 3 is a diagram showing an example of the process of creating a first index table in the first index file from a database table in the embodiment.
FIG. 4 is a diagram showing an example of the contents of an attached file stored in the file device used in the embodiment.
FIG. 5 is a diagram showing an example of the process of creating a second index table in the second index file from an attached file in the file device in the embodiment.
FIG. 6 is a flowchart showing the operation of the embodiment.

Embodiments of the data search system and the data search program according to the present invention are described below with reference to the accompanying drawings. In the drawings, identical components are given identical reference numerals, and duplicate description is omitted. FIG. 1 shows the configuration of an embodiment of the data search system according to the present invention. The data search system of the embodiment includes a database 300 in which various data are stored and a full-text search engine 500 that performs full-text search over the database 300. A relational database, for example, can be used as the database 300.

Here, the managed data stored in the database 300 is product catalog data. As shown in FIG. 2, for example, tables D11, D12, D13, ..., D1n are stored, each holding one unit of managed data. Each table has a structure in which a plurality of required items are arranged against a unique value. In this embodiment, the unique value is the product number placed at the head of the table, and each of the tables D11, D12, D13, ..., D1n holds item data such as "product name", "product name kana", "packaging", "handling start date", "handling end date", and so on. The order of the items is merely an example.

The full-text search engine 500 includes first crawler collecting means 503. The first crawler collecting means 503 searches the database 300 and identifies the tables that are units of the managed data. For every identified table, it creates a first index table in which the table's unique value serves as primary key information and the search target information in the table's content data is attached to that primary key information as attribute information. It then generates a first index file 501 that collects the first index tables.

As already described, table D11 of the database 300 stores the product number and, as item data, "product name", "product name kana", "packaging", "handling start date", "handling end date", and so on. From this data, the first crawler collecting means 503 retrieves the information in the character-string items (columns) that correspond to the search target information specified in SQL (Structured Query Language), attaches it to the primary key information, and creates the first index table.

FIG. 3 shows the process of creating the first index table D41 from table D11. In this example, "product number", "product name", and "product kana" correspond to the search target information specified by SQL. Tables D12 to D1n are searched in the same way, and an index table is created for each table whose items (columns) contain information corresponding to the search target information specified by SQL. Index tables are therefore not necessarily created for all of the tables D11 to D1n. For example, table D12 does not contain the "product number", "product name", and "product kana" specified by SQL as search target information, so no index table is created for it. All of the index tables created in this way are collected into one file, the first index file 501.
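The index-building step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionaries standing in for tables D11 and D12, the column names, and the function `build_first_index` are all hypothetical, and the rule "skip a table unless it contains every search target column" is one assumed reading of the text.

```python
# Hypothetical in-memory stand-ins for tables D11 and D12 of database 300.
TABLES = [
    {"product_number": "A001", "product_name": "Green Tea",
     "product_kana": "グリーンティー", "packaging": "box"},
    {"packaging": "bag"},  # like D12: lacks the search target columns
]

# Assumed search target information (in the patent this is specified by SQL).
SEARCH_TARGET_COLUMNS = ("product_number", "product_name", "product_kana")


def build_first_index(tables, target_columns=SEARCH_TARGET_COLUMNS,
                      key="product_number"):
    """Return a list of index tables: primary key plus search target attributes."""
    index_file = []
    for table in tables:
        # Assumption: a table is indexed only if it has every target column.
        if not all(col in table for col in target_columns):
            continue
        attributes = {col: table[col] for col in target_columns if col != key}
        index_file.append({"primary_key": table[key], "attributes": attributes})
    return index_file


first_index_file = build_first_index(TABLES)
```

Only the first table yields an index table here; columns outside the search target information, such as `packaging`, are left out of the index.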

In this embodiment, a file device 400 is provided that stores attached files placed under the directories of the data in the database 300. For example, one attached file D21 in the file device 400 is as shown in FIG. 4 and is an attached file under the directory of table D11 shown in FIG. 2. The unique value of attached file D21 consists of the same "product number" as table D11, which indicates that the file belongs under the directory of table D11, with a "serial number" unique to attached file D21 appended. In addition to this unique value, "product number_serial number", attached file D21 stores data such as pamphlets and instruction manuals associated with the product number. The file device 400 stores a plurality of attached files like D21, each holding its unique "product number_serial number" together with pamphlet, instruction manual, or similar data associated with the product number. Each attached file is stored under the directory of one of the tables D11, D12, D13, ..., D1n held in the database 300. Attached files are not necessarily associated with every one of the tables D11, D12, D13, ..., D1n; some tables have no attached file. Also, as the term "serial number" in the identifier "product number_serial number" suggests, a single table in the database 300 may have a plurality of attached files, in which case the "serial number" part is "01", "02", and so on.
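The "product number_serial number" identifier can be illustrated with a short sketch. The helper names `attachment_id` and `product_number_of` are hypothetical, and the ASCII underscore and two-digit zero padding are assumptions based on the "01", "02" examples in the text.

```python
def attachment_id(product_number: str, serial: int) -> str:
    """Build an identifier such as 'A001_01' (separator and padding are assumed)."""
    return f"{product_number}_{serial:02d}"


def product_number_of(identifier: str) -> str:
    """Recover the product number part by splitting at the last underscore."""
    return identifier.rsplit("_", 1)[0]


first = attachment_id("A001", 1)
second = attachment_id("A001", 2)
owner = product_number_of(second)
```

Splitting at the last underscore keeps the sketch working even if a product number itself contains an underscore.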

The full-text search engine 500 also includes second crawler collecting means 504. The second crawler collecting means 504 searches the file device 400 and creates second index tables, in each of which the unique value serves as identification information and the required data of the corresponding attached file is attached to it. It then generates a second index file 502 that collects the second index tables.

As already described, the file device 400 stores attached files D21 and so on, each holding its unique "product number_serial number" together with pamphlet, instruction manual, or similar data associated with the product number. From the pamphlet, instruction manual, or similar data associated with the "product number_serial number", the second crawler collecting means 504 takes the character-string data corresponding to the search target information specified in SQL (Structured Query Language), attaches it to the identification information, and creates the second index table. The search target information specified in SQL here may differ from that used when creating the first index table. The first crawler collecting means 503 and the second crawler collecting means 504 can be configured to perform their searches using either morphological analysis or N-gram.
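As a rough illustration of the N-gram option mentioned above, the following sketch splits text into overlapping character bigrams and uses them as a candidate filter. The functions `ngrams` and `may_contain` are hypothetical names; the patent does not specify the value of N or how the engine matches terms, so both are assumptions here.

```python
def ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character n-grams."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def may_contain(keyword: str, text: str, n: int = 2) -> bool:
    """Candidate filter: every n-gram of the keyword appears in the text.

    This can report false positives, so a real engine would still verify
    the candidate against the original text.
    """
    return set(ngrams(keyword, n)) <= set(ngrams(text, n))


grams = ngrams("取扱説明書")
hit = may_contain("説明", "取扱説明書")
miss = may_contain("保証", "取扱説明書")
```

N-gram indexing needs no dictionary, which is one common reason to prefer it over morphological analysis for product names and codes.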

FIG. 5 shows the process of creating the second index table D42 from attached file D21 in the file device 400. In this example, pamphlet, instruction manual, and similar data correspond to the search target information specified by SQL. The second index table D42 uses the unique value of attached file D21, "product number_serial number", as identification information, and attaches to it the pamphlet, instruction manual, or similar data of that attached file that corresponds to the search target information specified by SQL. The other attached files in the file device 400 (not shown) are searched in the same way. When data corresponding to the search target information specified by SQL is found, an index table is created for the identification information "product number_serial number" of that attached file; when no such data is found, no index table is created. Index tables are therefore not necessarily created for all attached files in the file device 400. All of the index tables created in this way are collected into one file, the second index file 502.
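The second index can be sketched in the same spirit. Everything below is illustrative: the attachment records, the `kind` field used to decide whether a file matches the search target information, and the function `build_second_index` are assumptions, not details from the patent.

```python
# Hypothetical attached files in file device 400.
ATTACHMENTS = [
    {"identifier": "A001_01", "kind": "pamphlet", "text": "Green tea pamphlet"},
    {"identifier": "A001_02", "kind": "invoice", "text": "Internal invoice"},
    {"identifier": "B002_01", "kind": "manual", "text": "Kettle instruction manual"},
]

# Assumed search target information: only these kinds of data are indexed.
TARGET_KINDS = {"pamphlet", "manual"}


def build_second_index(attachments, target_kinds=TARGET_KINDS):
    """Return index tables keyed by 'product number_serial number'."""
    index_file = []
    for attachment in attachments:
        # No index table is created when the file has no matching data.
        if attachment["kind"] not in target_kinds:
            continue
        index_file.append({"identifier": attachment["identifier"],
                           "content": attachment["text"]})
    return index_file


second_index_file = build_second_index(ATTACHMENTS)
```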

In this embodiment, a main search device 200 is provided. The word "main" indicates that, because the full-text search engine 500 also performs searches, the search carried out after an actual search request arrives is performed in this main search device 200. The main search device 200 receives a search request and a keyword from a search terminal 101. The search terminal 101 may be a personal computer or workstation connected via a network or the like, or a mobile terminal such as a mobile phone or smartphone.

The main search device 200 includes primary key information acquisition means 205 and database search means 201. When given a keyword to be searched, the primary key information acquisition means 205 searches the first index file 501, detects the first index tables containing data corresponding to the keyword, and obtains the primary key information of those first index tables. Specifically, the primary key information acquisition means 205 issues a search request to the full-text search engine 500, has it search the first index file 501, and obtains the primary key information. The first index file 501 holds a plurality of first index tables, each with a "product number" as primary key information and its associated attribute information. All of these first index tables are searched with the keyword, the first index tables containing the keyword are found, and their primary key information, the "product number", is obtained. When the full-text search engine 500 finishes searching the first index file 501, therefore, either one or more "product numbers" have been obtained as primary key information, or no "product number" has been obtained because no attribute information matched the keyword. This result is sent to the primary key information acquisition means 205.
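The lookup performed through the first index file can be sketched as below. Plain substring matching stands in for the full-text search engine 500, and the sample index data and the function `find_primary_keys` are hypothetical.

```python
# Hypothetical first index file 501: one entry per first index table.
FIRST_INDEX_FILE = [
    {"primary_key": "A001", "attributes": {"product_name": "Green Tea"}},
    {"primary_key": "B002", "attributes": {"product_name": "Kettle"}},
    {"primary_key": "C003", "attributes": {"product_name": "Tea Cup"}},
]


def find_primary_keys(index_file, keyword):
    """Return the primary keys of every index table that contains the keyword."""
    hits = []
    for index_table in index_file:
        values = [index_table["primary_key"], *index_table["attributes"].values()]
        # Substring matching is an assumed stand-in for full-text search.
        if any(keyword in value for value in values):
            hits.append(index_table["primary_key"])
    return hits


tea_hits = find_primary_keys(FIRST_INDEX_FILE, "Tea")
no_hits = find_primary_keys(FIRST_INDEX_FILE, "Lamp")
```

An empty list corresponds to the case in the text where no "product number" is obtained.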

The database search means 201 searches the database 300 based on the primary key information obtained by the primary key information acquisition means 205 and extracts the data corresponding to the keyword from the resulting tables. Because the database search means 201 searches the database 300 by primary key information, it reaches the table in the database 300 that holds the relevant data quickly and reliably, and the desired data corresponding to the keyword can be taken from that table.
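A primary-key lookup of this kind might look like the following sketch using Python's standard `sqlite3` module. Note the modeling assumption: the patent speaks of one table per product, whereas this sketch stores each product as a row in a single hypothetical `products` table. The schema and the function `fetch_by_primary_keys` are illustrative only.

```python
import sqlite3

# Hypothetical stand-in for database 300.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "product_number TEXT PRIMARY KEY, product_name TEXT, packaging TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("A001", "Green Tea", "box"), ("B002", "Kettle", "carton"),
     ("C003", "Tea Cup", "box")],
)


def fetch_by_primary_keys(conn, keys):
    """Fetch rows by exact primary key match, which can use the key's index."""
    if not keys:
        return []
    placeholders = ", ".join("?" for _ in keys)
    sql = (
        "SELECT product_number, product_name, packaging FROM products "
        f"WHERE product_number IN ({placeholders}) ORDER BY product_number"
    )
    return conn.execute(sql, list(keys)).fetchall()


rows = fetch_by_primary_keys(conn, ["A001", "C003"])
```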

The main search device 200 is provided with display control means 206. The data obtained by the database search means 201 is sent to the display control means 206. The display control means 206 sends a display based on the data extracted by the database search means 201 to the search terminal 101, so that it is presented on the display means of the search terminal 101.

The display control means 206 includes display data processing means 203 and display processing means 204. The display data processing means 203 processes the data obtained by the database search means 201 (the data that matched the keyword) into data to be listed on the search terminal 101. The display processing means 204 sends the processed data as display data that can be shown on the display means (a display such as an LED display) of the search terminal 101.

The main search device 200 further includes identification information acquisition means 207 and attached file search means 202. When given a keyword to be searched, the identification information acquisition means 207 searches the second index file 502, detects the second index tables containing data corresponding to the keyword, and obtains the identification information of those second index tables. Specifically, the identification information acquisition means 207 issues a search request to the full-text search engine 500, has it search the second index file 502, and obtains the identification information. The second index file 502 holds a plurality of second index tables, each with a "product number_serial number" as identification information and the associated pamphlet, instruction manual, or similar data. All of these second index tables are searched with the keyword, the second index tables containing the keyword are found, and their identification information, the "product number_serial number", is obtained. When the full-text search engine 500 finishes searching the second index file 502, therefore, either one or more "product number_serial number" values have been obtained as identification information, or none has been obtained because no second index table held data matching the keyword. This result is sent to the identification information acquisition means 207.

The attached file search means 202 searches the file device 400 based on the identification information obtained by the identification information acquisition means 207 and extracts the data corresponding to the keyword from the resulting attached file. Because the attached file search means 202 searches the file device 400 by the obtained identification information, it reaches the attached file in the file device 400 that holds the relevant data quickly and reliably, and the desired data corresponding to the keyword can be taken from that file.

The data extracted in this way is sent to the display control means 206, which sends a display based on the data extracted by the database search means 201 to the search terminal 101 so that it is presented on the display means of the search terminal 101.

The display data processing means 203 processes the data extracted by the attached file search means 202 together with the data obtained by the database search means 201. For example, it processes the data so that the pamphlet, instruction manual, or similar data extracted by the attached file search means 202 is included, minimally and item by item, in the data to be listed on the search terminal 101. The display processing means 204 sends the processed data as display data that can be shown on the display means (a display such as an LED display) of the search terminal 101.

In the configuration described above, the first crawler collecting means 503 can run at any time. For example, it can run at one-minute intervals during the six hours starting at midnight. The second crawler collecting means 504 can likewise run at any time.
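The example schedule above (one-minute intervals during the six hours from midnight) works out to 360 runs per day, assuming the 06:00 endpoint itself is excluded. A small sketch with a hypothetical helper `crawl_times` makes the arithmetic explicit.

```python
from datetime import datetime, timedelta


def crawl_times(day, start_hour=0, window_hours=6, interval_minutes=1):
    """List crawl start times within the window (end of window excluded)."""
    start = datetime(day.year, day.month, day.day, start_hour)
    count = window_hours * 60 // interval_minutes
    return [start + timedelta(minutes=i * interval_minutes) for i in range(count)]


times = crawl_times(datetime(2018, 9, 13))
```

How the crawler is actually triggered (cron, a scheduler service, or otherwise) is not stated in the text.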

As described above, the first crawler collecting means 503 has collected the first index tables in the first index file 501, and the second crawler collecting means 504 has collected the second index tables in the second index file 502. When a keyword search request is then issued from the search terminal 101, the processing shown in the flowchart of FIG. 6 is performed.

When a keyword search request is issued from the search terminal 101, search processing starts. The first index file 501 is searched with the keyword given by the search terminal 101, the "product number" serving as primary key information in each matching first index table is obtained, and it is stored in internal memory table A (FIG. 1) (S11).

Next, the second index file 502 is searched with the keyword given by the search terminal 101, the "product number_serial number" serving as identification information in each matching second index table is obtained, and only its "product number" part is stored in internal memory table B (FIG. 1) (S12).

Next, the logical OR (union) of the primary key information in internal memory table A and the contents of internal memory table B is created and stored in internal memory table C (FIG. 1) (S13). That is, identical "product numbers" are merged into one. In this way, a fuzzy search is performed over the database 300 and the file device 400, and the matching primary key information ("product number") is obtained at high speed. Next, the database 300 is accessed using only the primary key information in internal memory table C, and the data corresponding to the keyword is extracted from the resulting tables (S14: database search means 201).
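Steps S11 to S13 amount to taking the union of two key lists. A minimal sketch, assuming internal memory tables A, B, and C are plain Python lists and that identifiers use an ASCII underscore; the sample values and the function `merge_hits` are hypothetical.

```python
def merge_hits(table_a, table_b):
    """Union of two key lists; duplicates collapse, first-seen order is kept."""
    return list(dict.fromkeys([*table_a, *table_b]))


# S11: product numbers matched in the first index file (sample values).
table_a = ["A001", "C003"]

# S12: identifiers matched in the second index file, reduced to product numbers.
identifiers = ["A001_01", "B002_01", "B002_02"]
table_b = [identifier.rsplit("_", 1)[0] for identifier in identifiers]

# S13: logical OR of tables A and B.
table_c = merge_hits(table_a, table_b)
```

Table C then holds each matching product number exactly once, ready for the primary-key access of step S14.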

The data extracted above is processed (S15: display data processing means 203), sent to the search terminal 101, and displayed on the display means (S16: display processing means 204).

When there is no data extracted by the attached file search means 202, the display control means 206 can cause the display means to perform display based only on the data extracted by the database search means 201. Further, when there is neither data extracted by the database search means 201 nor data extracted by the attached file search means 202, the display control means 206 can cause the display means to indicate that no search result was obtained.
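The branching performed by the display control means 206 can be illustrated with a short sketch. The function name build_display, the returned dictionary layout, and the message text are assumptions made for illustration only; the embodiment does not specify how display data is structured or how the two result sets are combined.

```python
def build_display(db_rows, attachment_rows):
    """Choose what to show, following the two cases described above.

    Assumption: results are plain lists, and when both are present they
    are simply concatenated for display.
    """
    if not db_rows and not attachment_rows:
        # Neither search produced data: report that nothing was found.
        return {"message": "No search results were found.", "items": []}
    if not attachment_rows:
        # No attached-file data: display database results only.
        return {"message": None, "items": list(db_rows)}
    return {"message": None, "items": list(db_rows) + list(attachment_rows)}
```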

In the present embodiment, the file device 400 is not searched for data using the identification information "product number_serial number". However, the attached file search means 202 may search the file device 400 using the information in internal memory table B, and the display data processing means 203 may process the obtained data and compile it as information for list display. This makes it possible to access the file device 400 using this information and to download and display the corresponding data, such as pamphlets and instruction manuals.

In contrast to the above configuration, the present embodiment does not adopt a configuration in which the database search means 201, instead of using the primary key information that is the search result of the first index file 501, searches the database 300 directly by SQL without the processing of the primary key information acquisition means 205. With such a direct configuration, the column items "product number", "product name", and "product name kana" would each have to be searched with the keyword from the search terminal 101 under prefix, suffix, and middle (partial) match conditions, which places a heavy load on the database system and results in poor response. Moreover, when a search by the search keyword spans multiple tables in the database 300, the SQL for searching the database 300 directly becomes very complicated, which is a further reason why that configuration is not adopted in the present embodiment.
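The contrast between the two configurations can be illustrated with a small sketch using Python's standard sqlite3 module. The table name products, its three columns, the sample rows, and the function names are hypothetical; the actual schema of the database 300 is not given in this description. The first query is the kind of direct partial-match search the embodiment avoids, and the second is a lookup by primary key only, as in step S14.

```python
import sqlite3


def make_db():
    """Create a hypothetical single-table stand-in for the database 300."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "product_number TEXT PRIMARY KEY, "
        "product_name TEXT, product_name_kana TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [
            ("P001", "Desk Lamp", "デスクランプ"),
            ("P002", "Floor Lamp", "フロアランプ"),
            ("P003", "Office Chair", "オフィスチェア"),
        ],
    )
    return conn


def direct_like_search(conn, keyword):
    """Configuration not adopted: partial match on every searchable column."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT product_number, product_name FROM products "
        "WHERE product_number LIKE ? OR product_name LIKE ? "
        "OR product_name_kana LIKE ? ORDER BY product_number",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()


def primary_key_lookup(conn, product_numbers):
    """Adopted configuration: access the table by primary key only."""
    if not product_numbers:
        return []
    placeholders = ", ".join("?" for _ in product_numbers)
    rows = conn.execute(
        "SELECT product_number, product_name FROM products "
        f"WHERE product_number IN ({placeholders}) ORDER BY product_number",
        list(product_numbers),
    )
    return rows.fetchall()
```

In the direct query, a pattern with a leading wildcard must be tested against every searchable column, and the WHERE clause grows with each added column or joined table. The primary key lookup keeps the same short form however many columns were indexed.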

As described above, according to the present embodiment, the target data can be searched efficiently and at high speed by ambiguous processing in which items (columns) distributed across the tables of the database are used as search target items, without applying any special modification or processing to the database or the full-text search engine (hereinafter referred to as the effect of the present embodiment).

Although the above embodiment is a configuration for the case where attached files exist, a configuration that has no attached files and does not include the file device 400 can also be adopted. In this case, in addition to the file device 400, the second crawler collecting means 504, the second index file 502, the identification information acquisition means 207, and the attached file search means 202 are unnecessary. Even in this case, the same effect as that of the above embodiment can be obtained.

101 Search terminal
200 Search device
201 Database search means
202 Attached file search means
203 Display data processing means
204 Display processing means
205 Primary key information acquisition means
206 Display control means
207 Identification information acquisition means
300 Database
400 File device
500 Full-text search engine
501 First index file
502 Second index file
503 First crawler collecting means
504 Second crawler collecting means

Claims

A data search system comprising:
a database in which a plurality of tables, each being one unit of managed data, are accumulated;
first crawler collecting means that searches all of the plurality of tables in the database by search information, identifies every table that is one unit of the managed data, takes a unique value for each identified table as primary key information, creates a first index table in which the search target information in the content data of the corresponding table is assigned to the primary key information as attribute information, and generates a first index file that collects the first index tables;
primary key information acquisition means that, when a keyword to be searched is given, searches the first index file, detects the first index table having data corresponding to the keyword, and obtains the primary key information of that first index table;
database search means that searches the database based on the primary key information obtained by the primary key information acquisition means and extracts data corresponding to the keyword from the obtained corresponding table;
display means for displaying information; and
display control means for displaying, on the display means, the data extracted by the database search means.

The data search system according to claim 1, further comprising:
a file device in which attached files under the data directory of the tables of the database are stored;
second crawler collecting means that searches the file device, collects required data of the attached files by a crawler, creates a second index table in which the required data of an attached file is assigned to identification information having a unique value, and generates a second index file that collects the second index tables;
identification information acquisition means that, when a keyword to be searched is given, searches the second index file, detects the second index table having data corresponding to the keyword, and obtains the identification information of that second index table; and
attached file search means that searches the file device based on the identification information obtained by the identification information acquisition means and extracts data corresponding to the keyword from the obtained corresponding attached file,
wherein the display control means causes the display means to perform display based on the data extracted by the attached file search means.

The data search system according to claim 2, wherein, when there is no data extracted by the attached file search means, the display control means causes the display means to perform display based only on the data extracted by the database search means.

The data search system according to claim 2, wherein, when there is neither data extracted by the database search means nor data extracted by the attached file search means, the display control means causes the display means to indicate that no search result was obtained.

The data search system according to claim 2, wherein the first crawler collecting means and the second crawler collecting means perform a search by either morphological analysis or N-Gram.

The data search system according to claim 1, wherein the first crawler collecting means performs processing at an arbitrary time.

The data search system according to claim 2, wherein the first crawler collecting means and the second crawler collecting means perform processing at an arbitrary time.

A data search program causing a computer of a data search system, which searches for data in a database in which a plurality of tables, each being one unit of managed data, are accumulated, to function as:
first crawler collecting means that searches all of the plurality of tables in the database by search information, identifies every table that is one unit of the managed data, takes a unique value for each identified table as primary key information, creates a first index table in which the search target information in the content data of the corresponding table is assigned to the primary key information as attribute information, and generates a first index file that collects the first index tables;
primary key information acquisition means that, when a keyword to be searched is given, searches the first index file, detects the first index table having data corresponding to the keyword, and obtains the primary key information of that first index table;
database search means that searches the database based on the primary key information obtained by the primary key information acquisition means and extracts data corresponding to the keyword from the obtained corresponding table; and
display control means for causing display means to perform display based on the data extracted by the database search means.

The data search program according to claim 8, further causing the computer to function as:
second crawler collecting means that searches a file device in which attached files under the data directory of the tables of the database are stored, collects required data of the attached files by a crawler, creates a second index table in which the required data of an attached file is assigned to identification information having a unique value, and generates a second index file that collects the second index tables;
identification information acquisition means that, when a keyword to be searched is given, searches the second index file, detects the second index table having data corresponding to the keyword, and obtains the identification information of that second index table; and
attached file search means that searches the file device based on the identification information obtained by the identification information acquisition means and extracts data corresponding to the keyword from the obtained corresponding attached file,
wherein the program causes the computer, as the display control means, to cause the display means to perform display based on the data extracted by the attached file search means.

The data search program according to claim 9, wherein the program causes the computer, as the display control means, to cause the display means to perform display based only on the data extracted by the database search means when there is no data extracted by the attached file search means.

The data search program according to claim 9, wherein the program causes the computer, as the display control means, to cause the display means to indicate that no search result was obtained when there is neither data extracted by the database search means nor data extracted by the attached file search means.

The data search program according to claim 9, wherein the computer functions as the first crawler collecting means and the second crawler collecting means so as to perform a search by either morphological analysis or N-Gram.

The data search program according to claim 8, wherein the computer functions as the first crawler collecting means so as to perform processing at an arbitrary time.

The data search program according to claim 9, wherein the computer functions as the first crawler collecting means and the second crawler collecting means so as to perform processing at an arbitrary time.