JP2962287B2

JP2962287B2 - Structured document search device and machine-readable recording medium recording program

Info

Publication number: JP2962287B2
Application number: JP9220233A
Authority: JP
Inventors: 享赤峯
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1997-07-31
Filing date: 1997-07-31
Publication date: 1999-10-12
Anticipated expiration: 2017-07-31
Also published as: JPH1153400A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention: The present invention relates to retrieval techniques for structured documents in which each document is composed of a plurality of logical structures (zones), and more particularly to a technique for searching structured documents using only the zones designated by a user as the search target.

[0002]

2. Description of the Related Art: In recent years, structured documents in which a single document is composed of a plurality of zones, typified by SGML (Standard Generalized Markup Language), have come into frequent use. Accordingly, searching only a specific zone of a structured document (zone search) has become an important function for achieving highly accurate retrieval.

[0003] One conventional technique for zone search sets the search condition to "documents containing the keyword between the start tag and the end tag of the zone to be searched" and finds the documents that satisfy it by performing character string matching over the entire text. Because this method matches character strings over the whole text, however, the search time becomes extremely long. To solve this problem, a technique has been proposed that skips the unnecessary parts of the text (zones other than those designated as the search target) during the search (for example, JP-A-8-16600). Because this technique narrows the range over which character string matching is performed, it shortens the search time compared with the preceding conventional example. For gigabyte-class data, however, even if skipping unnecessary parts reduces the matching range to about one tenth, character string matching must still be performed over text on the order of a hundred megabytes, so fast retrieval cannot be expected.

[0004] Since it is thus difficult to search quickly by zone search based on character string matching, one might instead perform zone search with a full-text index, which is built to enable fast retrieval of large-scale data. A full-text index consists of a key information part, in which key character strings are stored as key information, and a position information part, in which, for each key character string stored in the key information part, the document identifier of every document containing that string and the string's position within the document are stored as position information. The following three methods, (A) to (C), can be considered for performing zone search with such a full-text index.

[0005]
(A) A method in which the key information part holds information about zones.
(B) A method in which the position information part holds information about zones.
(C) A method in which an index for zones is created separately from the full-text index.

[0006] In method (A), the key information part of the full-text index stores key information consisting of a pair of a key character string and the name of the zone in which that string occurs. A key character string that occurs in several zones is stored once per zone, as a separate (key character string, zone name) pair. The position information part stores, for each such key, the document identifiers of the matching documents and the positions within those documents. At search time, the full-text index is searched using the zone name and keyword specified by the user as the key, which finds the documents that contain the keyword in the zone with that name.

[0007] In method (B), the position information stored in the position information part holds the zone name in addition to the document identifier and the position within the document. At search time, the full-text index is first searched with the user-specified keyword as the key to obtain all position information for documents containing the keyword; zone search is then performed by selecting, from that position information, the entries that contain the zone name specified by the user.

[0008] In method (C), a zone index that stores the start and end positions of every zone of every document to be searched is created separately from the full-text index. At search time, the full-text index is first searched to obtain the position information of the documents containing the user-specified keyword. The zone index is then searched to obtain, for each of those documents, the start and end positions of the zone specified by the user. Finally, zone search is performed by checking, from the in-document position in the position information and the obtained start and end positions, whether the user-specified keyword lies within the user-specified zone of that document (see, for example, JP-A-8-314966).
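As a rough illustration of why method (C) costs an extra lookup per hit, the sketch below checks each full-text hit against a per-document zone index. It is a minimal sketch under stated assumptions, not the method of JP-A-8-314966: the dictionary layouts, the function name `zone_search_method_c`, and the sample data are all hypothetical.

```python
# Hypothetical sketch of prior-art method (C): a full-text index plus a
# separate per-document zone index. Data layouts are assumptions.

def zone_search_method_c(full_text_index, zone_index, zone, keyword):
    """Return sorted IDs of documents in which `keyword` starts inside `zone`.

    full_text_index: dict keyword -> list of (doc_id, position) pairs
    zone_index: dict (doc_id, zone_name) -> (start, end), 1-based, inclusive
    """
    hits = set()
    for doc_id, pos in full_text_index.get(keyword, []):
        # Extra per-document lookup: the overhead criticised in [0011].
        span = zone_index.get((doc_id, zone))
        if span is not None and span[0] <= pos <= span[1]:
            hits.add(doc_id)
    return sorted(hits)


# Illustrative data: zone boundaries differ from document to document.
full_text_index = {"search": [(1, 12), (2, 340)]}
zone_index = {(1, "Title"): (1, 30), (2, "Title"): (1, 25),
              (2, "Abstract"): (26, 600)}
print(zone_search_method_c(full_text_index, zone_index, "Title", "search"))
```

Because zone boundaries differ for every document, the zone index grows with the collection and must be consulted once per hit, which is the cost the invention sets out to avoid.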

[0009]

Problems to be Solved by the Invention: However, method (A) described above has the problem that the number of key information entries grows in proportion to the number of zones, which enlarges the full-text index. The larger index in turn lowers the search speed.

[0010] Method (B) has the problem that the position information part of the full-text index becomes larger, which enlarges the full-text index. It also lowers the search speed, because more data must be read from the position information part.

[0011] Method (C) requires a zone index storing the start and end positions of every zone of every document to be searched, so the total index size becomes large. Furthermore, for each document found by searching the full-text index, the zone index must be searched to check whether the user-specified keyword lies within the user-specified zone; this processing becomes overhead that lowers the search speed.

[0012] An object of the present invention is therefore to reduce the index size and increase the search speed in zone search using a full-text index.

[0013]

Means for Solving the Problems: To achieve the above object, the structured document search device of the present invention comprises: document storage means storing a plurality of structured documents, each composed of a plurality of zones; a zone information table storing information that indicates the position of each zone in a zone position conversion document; in-document position conversion means for creating a zone position conversion document in which each zone of a structured document stored in the document storage means is moved to the position indicated by the contents of the zone information table; index creation means for creating, from the zone position conversion document created by the in-document position conversion means, an index in which a key character string, the document identifier of the structured document containing that key character string, and the position of that key character string within the zone position conversion document are stored in association with one another; search condition input means for accepting a search condition expression containing the zone name of a zone to be searched and a keyword; and keyword search means for searching the index using the keyword in the search condition expression accepted by the search condition input means as the key and, from the document identifiers and in-document positions of the structured documents containing the keyword obtained as a result and the contents of the zone information table, obtaining the document identifiers of the structured documents in which the keyword occurs in the zone indicated by the zone name in the search condition expression.

[0014] In this configuration, as preparation for zone search, the in-document position conversion means creates, for each structured document stored in the document storage means, a zone position conversion document in which each zone is moved to the position indicated by the contents of the zone information table. The index creation means then creates, from that zone position conversion document, an index in which each key character string, the document identifier of the structured document containing it, and its position within the zone position conversion document are stored in association with one another.

[0015] At zone search time, when the user inputs a search condition expression containing the zone name of the zone to be searched and a keyword, the search condition input means accepts it, and the keyword search means searches the index using the keyword in the expression as the key. From the resulting document identifiers and in-document positions of the structured documents containing the keyword, together with the contents of the zone information table, the keyword search means obtains the document identifiers of the structured documents in which the keyword occurs in the zone indicated by the zone name in the expression.

[0016]

Embodiments of the Invention: Embodiments of the present invention will now be described in detail with reference to the drawings.

[0017] FIG. 1 is a block diagram of an embodiment of the present invention. The embodiment comprises document storage means 1, in-document position conversion means 2, a zone information table 3, index creation means 4, an index 5, keyword search means 6, search condition input means 7, keyword search result storage means 8, logical condition analysis means 9, and search result output means 10.

[0018] The document storage means 1 stores a plurality of structured documents to be searched. FIG. 2 shows an example of the contents of the document storage means 1. To simplify the explanation, this embodiment assumes that the document storage means 1 stores two structured documents, 21 and 22, with document identifiers ID1 and ID2. Each of the structured documents 21 and 22 consists of zones named "Whole", "Title of Invention", "Abstract", "Purpose", and "Constitution", and the zones are delimited by zone start tags, zone end tags, and the like.

[0019] The zone information table 3 stores information indicating the position of each zone in the zone position conversion documents created by the in-document position conversion means 2. FIG. 3 shows an example of the contents of the zone information table 3: each entry pairs a zone name with zone position information indicating where in the zone position conversion document the zone with that name is to be placed. In the example of FIG. 3, the zones are placed at the following character positions of the zone position conversion document:

"Whole": 1st to 2000th character
"Title of Invention": 1st to 500th character
"Abstract": 501st to 2000th character
"Purpose": 501st to 1000th character
"Constitution": 1001st to 2000th character
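Because every zone position conversion document uses the same layout, a character position alone identifies the zones it falls in. A minimal Python sketch of the FIG. 3 table follows, assuming the English zone names used here and 1-based inclusive ranges; `ZONE_TABLE` and `zones_at` are illustrative names, not part of the patent.

```python
# Zone information table from FIG. 3: zone name -> (first, last) character
# positions in every zone position conversion document (1-based, inclusive).
ZONE_TABLE = {
    "Whole": (1, 2000),
    "Title of Invention": (1, 500),
    "Abstract": (501, 2000),
    "Purpose": (501, 1000),
    "Constitution": (1001, 2000),
}


def zones_at(position, table=ZONE_TABLE):
    """Return the names of all zones whose range contains `position`.

    Zones may nest (e.g. "Purpose" lies inside "Abstract"), so several
    names can be returned for one position, in table order.
    """
    return [name for name, (first, last) in table.items()
            if first <= position <= last]


print(zones_at(504))  # ['Whole', 'Abstract', 'Purpose']
```

The nesting is visible in the table itself: "Purpose" and "Constitution" together cover exactly the range of "Abstract".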

[0020] The in-document position conversion means 2 refers to the zone information table 3 and, for each structured document stored in the document storage means 1, creates a zone position conversion document in which the character string of each zone is moved to the position indicated by the zone position information in the zone information table 3. Consequently, the character strings of a given zone, which occupy different positions in the individual structured documents, occupy the same range in every zone position conversion document created by the in-document position conversion means 2.
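The conversion can be sketched as filling a fixed-length buffer and copying each zone's text to the start of its slot. This sketch makes assumptions not stated in the source: only the leaf zones ("Title of Invention", "Purpose", "Constitution") are copied, since "Whole" and "Abstract" are covered by their ranges; unused positions are padded with a space, which an indexer that skips blanks would ignore; and a zone too long for its slot raises an error. The names `convert`, `LEAF_ZONES`, `DOC_LENGTH`, and `PAD` are hypothetical.

```python
# Leaf zones only: "Whole" and "Abstract" are made up of these ranges.
LEAF_ZONES = {
    "Title of Invention": (1, 500),
    "Purpose": (501, 1000),
    "Constitution": (1001, 2000),
}
DOC_LENGTH = 2000
PAD = " "  # assumed filler for unused positions


def convert(zones, layout=LEAF_ZONES, length=DOC_LENGTH):
    """Build a zone position conversion document.

    zones: dict zone name -> text of that zone in the original document.
    Returns a string in which 1-based position k is at index k - 1.
    """
    buf = [PAD] * length
    for name, text in zones.items():
        first, last = layout[name]
        if len(text) > last - first + 1:
            # The source does not say how overflow is handled; raising
            # is an assumption made for this sketch.
            raise ValueError(f"zone {name!r} does not fit its slot")
        buf[first - 1:first - 1 + len(text)] = list(text)
    return "".join(buf)


# Structured document 21 from [0030].
doc21 = convert({
    "Title of Invention": "文書検索装置",
    "Purpose": "高速に文書を検索する。",
    "Constitution": "インデックス作成手段と……。",
})
print(doc21[503])  # 1-based position 504 -> 文
```

Position 504 holds 「文」 because the "Purpose" text starts at 501 and 「文」 is its fourth character, which matches the entry "1-504" in FIG. 4.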

[0021] The index creation means 4 creates the index 5 from the zone position conversion documents, created by the in-document position conversion means 2, that correspond to the structured documents. As shown in FIG. 4, the index 5 has a key information part 51 and a position information part 52. The key information part 51 stores key information such as N-character strings or words, and the position information part 52 stores the document identifier of each structured document containing the key information together with the position of the key information in the corresponding zone position conversion document. Here, position information "i-j" denotes the j-th character of the zone position conversion document corresponding to the structured document with document identifier IDi. Thus the first entry in the example of FIG. 4 indicates that the character 「文」 occurs in the structured document 21 with identifier ID1, at the 1st and 504th characters of the corresponding zone position conversion document 21', and in the structured document 22 with identifier ID2, at the 1st and 501st characters of the corresponding zone position conversion document 22'.

[0022] The search condition input means 7 accepts a search condition expression input by the user, decomposes it into search items using the logical condition analysis means 9, and passes the search items to the keyword search means 6. A search condition expression input by the user contains one or more search items, each consisting of a pair of a zone name to be searched and a keyword; when it contains several search items, they are joined by logical operators such as AND and OR. FIG. 5 shows an example of a search condition expression input by the user, in which two search items are joined by the logical operator AND. This expression requests the structured documents that contain the keyword 「検索」 ("search") in the zone named "Title of Invention" and the keyword 「インデックス」 ("index") in the zone named "Abstract".
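The decomposition into search items can be sketched with two regular expressions. The concrete syntax is an assumption: the source shows FIG. 5 only as "(zone = keyword) AND (zone = keyword)", so this sketch accepts ASCII parentheses, `=`, and the operators `AND`/`OR`; `parse_condition` is a hypothetical name.

```python
import re

# Assumed concrete syntax: items "(zone=keyword)" joined by AND / OR.
ITEM = re.compile(r"\s*\(\s*([^=()]+?)\s*=\s*([^()]+?)\s*\)")
OP = re.compile(r"\s*(AND|OR)\s*")


def parse_condition(expr):
    """Split a search condition expression into items and operators.

    Returns (items, ops): items is a list of (zone, keyword) pairs and
    ops[i] is the operator joining items[i] and items[i + 1].
    """
    items, ops, pos = [], [], 0
    end = len(expr.rstrip())
    while True:
        m = ITEM.match(expr, pos)
        if not m:
            raise ValueError(f"expected '(zone=keyword)' at offset {pos}")
        items.append((m.group(1), m.group(2)))
        pos = m.end()
        if pos == end:
            return items, ops
        m = OP.match(expr, pos)
        if not m:
            raise ValueError(f"expected AND or OR at offset {pos}")
        ops.append(m.group(1))
        pos = m.end()


print(parse_condition("(Title of Invention=検索) AND (Abstract=インデックス)"))
```

Returning items and operators as parallel lists keeps the item numbering explicit, which the keyword search step uses when it records which item a hit belongs to.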

[0023] The keyword search means 6 has the following functions: searching the index 5 using the keyword in each search item passed from the search condition input means 7 as the key, to obtain, for each search item, all document identifiers and in-document positions at which the item's keyword appears; referring to the zone information table 3 to obtain, for each search item, the zone position of the zone indicated by the zone name in the item; and, for each search item, if any of its search results (document identifier, in-document position) indicates a position within that zone position, storing in the keyword search result storage means 8 the document identifier of that result paired with information indicating which search item (by number) it belongs to.
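The patent's example indexes single characters, so a multi-character keyword such as 「検索」 has to be assembled from adjacent postings. The sketch below assumes that approach (consecutive positions in the same document) and also assumes that an occurrence counts as inside a zone when its first character falls in the zone's range; the source does not spell out either detail. `keyword_positions`, `search_item`, and `keyword_search` are hypothetical names, and the sample index is illustrative.

```python
ZONE_TABLE = {"Title of Invention": (1, 500), "Abstract": (501, 2000)}


def keyword_positions(index, keyword):
    """Yield (doc_id, start) for each occurrence of `keyword`.

    Assumes a single-character index (as in FIG. 4): an occurrence starts
    at (d, p) when keyword[i] is indexed at (d, p + i) for every i.
    """
    if not keyword:
        return
    postings = [set(index.get(ch, ())) for ch in keyword]
    for doc_id, p in postings[0]:
        if all((doc_id, p + i) in postings[i] for i in range(1, len(keyword))):
            yield doc_id, p


def search_item(index, zone, keyword, table=ZONE_TABLE):
    """Return IDs of documents where `keyword` starts inside `zone`."""
    first, last = table[zone]
    return {d for d, p in keyword_positions(index, keyword)
            if first <= p <= last}


def keyword_search(index, items, table=ZONE_TABLE):
    """Return sorted (doc_id, item_number) pairs, item numbers 1-based."""
    return sorted((d, n) for n, (zone, kw) in enumerate(items, start=1)
                  for d in search_item(index, zone, kw, table))


# Illustrative postings: 「検索」 in the title of ID1, in the body of ID2.
index = {"検": [(1, 3), (2, 1001)], "索": [(1, 4), (2, 1002)]}
print(keyword_search(index, [("Title of Invention", "検索"),
                             ("Abstract", "検索")]))
```

No per-document zone lookup is needed: the shared layout lets a single range check on the position decide zone membership.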

[0024] The logical condition analysis means 9 obtains the document identifiers of the structured documents that satisfy the search condition expression, based on the per-search-item results (document identifiers) stored in the keyword search result storage means 8 and the logical operators joining the search items in the expression accepted by the search condition input means 7.
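A minimal sketch of this combining step follows, assuming the per-item results arrive as (document identifier, item number) pairs as described in [0023] and that operators are applied strictly left to right with no precedence or nesting; the source does not specify operator precedence. `combine` is a hypothetical name.

```python
def combine(pairs, n_items, ops):
    """Evaluate per-item results joined by AND/OR, strictly left to right.

    pairs: (doc_id, item_number) pairs, item numbers 1-based.
    ops: list of n_items - 1 operators, each "AND" or "OR".
    """
    per_item = [set() for _ in range(n_items)]
    for doc_id, n in pairs:
        per_item[n - 1].add(doc_id)
    result = per_item[0]
    for op, docs in zip(ops, per_item[1:]):
        if op == "AND":
            result = result & docs   # documents satisfying both sides
        elif op == "OR":
            result = result | docs   # documents satisfying either side
        else:
            raise ValueError(f"unknown operator {op!r}")
    return sorted(result)


print(combine([(1, 1), (1, 2), (2, 2)], 2, ["AND"]))  # [1]
```

Working on document-identifier sets keeps this step independent of positions: by this point the zone check has already been done per item.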

[0025] The search result output means 10 retrieves from the document storage means 1 the structured documents having the document identifiers obtained by the logical condition analysis means 9 and outputs them to an output device (not shown) such as a printer or CRT.

[0026] FIG. 6 is a flowchart showing an example of the processing of the in-document position conversion means 2; FIG. 7, of the index creation means 4; FIG. 8, of the search condition input means 7; FIG. 9, of the logical condition analysis means 9 when a search condition expression is passed from the search condition input means 7; FIG. 10, of the logical condition analysis means 9 when an end notification is sent from the keyword search means 6; and FIG. 11, of the keyword search means 6. The operation of this embodiment is described below with reference to these figures.

[0027] First, the operation when the index 5 is created is described.

[0028] When the index 5 is created, the in-document position conversion means 2, as shown in the flowchart of FIG. 6, reads one unprocessed structured document from the document storage means 1 (S61). It then creates a zone position conversion document in which the character string in each zone of that document is moved to the position indicated by the zone position information in the zone information table 3 (S63), and passes the created zone position conversion document, together with the document identifier of the structured document read in S61, to the index creation means 4 (S64). This processing is repeated until no unprocessed structured documents remain (S62: NO).

[0029] In this embodiment, the document storage means 1 stores the structured documents 21 and 22 with document identifiers ID1 and ID2 shown in FIG. 2, and the zone information table 3 has the contents shown in FIG. 3. The in-document position conversion means 2 therefore creates the zone position conversion documents 21' and 22' shown in FIG. 12 in turn and passes them to the index creation means 4.

[0030] Specifically, the in-document position conversion means 2 refers to the contents of the zone information table 3 shown in FIG. 3 and, in accordance with them, creates the zone position conversion document 21' shown in FIG. 12 by moving the character string 「文書検索装置」 ("document retrieval device") in the zone "Title of Invention" of the structured document 21 to the position starting at the 1st character, the character string 「高速に文書を検索する。」 ("Search documents at high speed.") in the zone "Purpose" to the position starting at the 501st character, and the character string 「インデックス作成手段と……。」 ("index creation means and ......") in the zone "Constitution" to the position starting at the 1001st character. It then passes the document 21' to the index creation means 4 together with the document identifier ID1. Similarly, it creates the zone position conversion document 22' shown in FIG. 12 by moving the character string 「文書処理装置」 ("document processing device") in the zone "Title of Invention" of the structured document 22 to the position starting at the 1st character, the character string 「文書を……。」 ("documents ......") in the zone "Purpose" to the position starting at the 501st character, and the character string 「検索手段と……。」 ("search means and ......") in the zone "Constitution" to the position starting at the 1001st character, and passes it to the index creation means 4 together with the document identifier ID2.

[0031] As FIG. 12 shows, in each of the zone position conversion documents 21' and 22' produced by the conversion, the zones named "Whole", "Title of Invention", "Abstract", "Purpose", and "Constitution" are always located at the positions indicated by the zone position information in the zone information table 3.

[0032] When the index creation means 4 receives a zone position conversion document and a document identifier from the in-document position conversion means 2, it first takes the head position of the zone position conversion document as the position of interest, as shown in the flowchart of FIG. 7 (S71). It then determines whether the position of interest holds a character that should be stored in the key information part 51 of the index 5 (S72). This is decided, for example, by defining in advance the characters that need not be stored, such as blanks and punctuation marks, and checking whether the character at the position of interest is some other character.

[0033] If it is determined in S72 that a character to be stored is at the position of interest (S72: YES), the index creation means 4 determines whether that character has already been stored (S73). If it has not (S73: NO), the character at the position of interest is stored in the key information part 51 of the index 5, and position information consisting of the document identifier passed from the in-document position conversion means 2 and the in-document position (the current position of interest) is stored in the position information part 52 (S74). If it has (S73: YES), the position information consisting of that document identifier and in-document position is added to the entry for that character in the position information part 52 (S75).

[0034] When S74 or S75 is completed, the index creation means 4 moves the position of interest to the next position (S76) and performs the same processing as above. S76 is also performed when it is determined in S72 that no character to be stored is at the position of interest.

[0035] The above processing is repeated until the end of the zone position conversion document passed from the in-document position conversion means 2 is reached (S77: YES).
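The S71 to S77 loop can be sketched as a single pass over the converted document that appends (document identifier, position) pairs to a per-character posting list. The set of unindexed characters in `SKIP` is an assumption (the source only says "blanks, punctuation marks, and the like"), and `add_document` is a hypothetical name; the sample documents are built from the strings in [0030], padded with spaces to the FIG. 3 slot sizes.

```python
SKIP = set(" \t\n、。，．,.")  # assumed set of characters not indexed (S72)


def add_document(index, doc_id, converted):
    """Add one zone position conversion document to a character index.

    index: dict char -> list of (doc_id, position), positions 1-based.
    """
    for pos, ch in enumerate(converted, start=1):   # S71, S76, S77
        if ch in SKIP:                               # S72: NO
            continue
        if ch not in index:                          # S73: NO -> S74
            index[ch] = []
        index[ch].append((doc_id, pos))              # S74 / S75


# Converted documents 21' and 22' (slots of 500, 500 and 1000 characters).
doc1 = ("文書検索装置".ljust(500) + "高速に文書を検索する。".ljust(500)
        + "インデックス作成手段と……。".ljust(1000))
doc2 = ("文書処理装置".ljust(500) + "文書を……。".ljust(500)
        + "検索手段と……。".ljust(1000))

index = {}
add_document(index, 1, doc1)
add_document(index, 2, doc2)
print(index["文"])  # [(1, 1), (1, 504), (2, 1), (2, 501)]
```

The printed postings reproduce the first entry of FIG. 4 ("1-1", "1-504", "2-1", "2-501"), and the padding spaces never enter the index because they fall under S72.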

[0036] In this embodiment, the index creation means 4 receives the zone position conversion document 21' with document identifier ID1 and the zone position conversion document 22' with document identifier ID2 shown in FIG. 12, and therefore performs the following processing.

[0037] When the zone position conversion document 21' and the document identifier ID1 are passed from the in-document position conversion means 2, the index creation means 4 first examines the head position: it stores the character "文" in the key information section 51 and the position information "1-1" (document ID1, position 1) in the position information section 52 (S71, S74). When it then examines the next position (the second character) of the zone position conversion document 21', it stores the character "書" found there in the key information section 51 and the position information "1-2" in the position information section 52 (S74). When, for example, the position of interest reaches the 504th character of the zone position conversion document 21', the character "文" found there has already been stored, so the position information "1-504" is added to the entry for "文" in the position information section 52 (S75). This processing continues to the end of the zone position conversion document 21'. When the zone position conversion document 22' and its document identifier ID2 are passed, the index creation means 4 performs the same processing. As a result, the contents of the index 5 are as shown in FIG. 4.
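
The scan described in [0035] to [0037] amounts to building a single-character inverted index. The sketch below is a hedged illustration, assuming each zone position conversion document is given as a dict from 1-based position to character with empty positions omitted; postings are `(document identifier, position)` tuples standing in for the patent's "1-504" notation, and `build_index` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Posting = Tuple[str, int]  # (document identifier, 1-based position)


def build_index(converted_docs: Dict[str, Dict[int, str]]) -> Dict[str, List[Posting]]:
    """Single-character inverted index over zone position conversion documents.

    converted_docs maps a document identifier to {position: character};
    positions that hold no character are absent and are therefore skipped.
    """
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id, chars in converted_docs.items():
        for pos in sorted(chars):  # scan from the head of the document
            # The first posting for a character creates its key entry;
            # later ones are appended to the existing entry.
            index[chars[pos]].append((doc_id, pos))
    return dict(index)


# Hypothetical fragments of two converted documents.
docs = {
    "ID1": {1: "文", 2: "書", 504: "文"},
    "ID2": {1001: "検", 1002: "索"},
}
index = build_index(docs)
print(index["文"])  # [('ID1', 1), ('ID1', 504)]
```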

[0038] For simplicity of description, the character strings stored in the key information section 51 are one character long here, but the invention is not limited to this: the keys may also be N-character strings (N of 2 or more) or words.
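
For the N-character variant, keys can be taken from every run of N consecutive occupied positions. This is a minimal sketch, assuming the same position-to-character representation of a converted document; `ngram_keys` is a hypothetical name, and the rule of emitting no key across an empty position is a design choice of the sketch rather than something the text specifies.

```python
from typing import Dict, Iterator, Tuple


def ngram_keys(chars: Dict[int, str], n: int = 2) -> Iterator[Tuple[str, int]]:
    """Yield (key, start position) for each run of n occupied consecutive positions.

    chars is a zone position conversion document as {position: character};
    runs that reach an empty position (for example the unused tail of a zone
    slot) produce no key, so keys never straddle a gap.
    """
    for pos in sorted(chars):
        run = [chars.get(pos + i) for i in range(n)]
        if all(ch is not None for ch in run):
            yield "".join(run), pos


doc = {1: "文", 2: "書", 3: "検", 4: "索", 501: "構"}
print(list(ngram_keys(doc)))  # [('文書', 1), ('書検', 2), ('検索', 3)]
```

Each yielded pair would be stored exactly like a single-character key: the string in the key information section and the document identifier plus start position in the position information section.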

[0039] Next, the operation during a zone search will be described.

[0040] To perform a zone search, the user inputs into the search condition input means 7 a search condition expression containing one or more search items, each consisting of the zone name of a zone to be searched paired with a keyword. As described above, when a search condition expression contains a plurality of search items, the items are joined by logical operation symbols such as AND and OR.

[0041] Suppose, for example, that the user inputs into the search condition input means 7 the search condition expression shown in FIG. 5, "(発明の名称 = 検索) AND (要約 = インデックス)", that is, "(Title of Invention = search) AND (Abstract = index)". As described above, this expression requests the structured documents in which the character string "検索" ("search") appears in the zone named "Title of Invention" and the character string "インデックス" ("index") appears in the zone named "Abstract".

[0042] When the user inputs the search condition expression "(Title of Invention = search) AND (Abstract = index)", the search condition input means 7 accepts it and passes it to the logical condition analysis means 9, as shown in the flowchart of FIG. 8 (S81, S82).

[0043] When the search condition expression "(Title of Invention = search) AND (Abstract = index)" is passed to it, the logical condition analysis means 9 splits the expression into two search items, the first "Title of Invention = search" and the second "Abstract = index", and returns them to the search condition input means 7, as shown in the flowchart of FIG. 9 (S91, S92).
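
The splitting step S91 can be sketched with two regular expressions. This is a hedged illustration only: it assumes a flat expression of parenthesised `zone name = keyword` items joined by AND or OR, with no nesting, and the names `ITEM`, `OPERATOR` and `split_condition` are hypothetical; the patent does not define a concrete syntax beyond FIG. 5.

```python
import re
from typing import List, Tuple

# One search item: "(zone name = keyword)"; spaces around "=" are optional.
ITEM = re.compile(r"\(\s*([^=()]+?)\s*=\s*([^()]+?)\s*\)")
# An operator sitting between two parenthesised items.
OPERATOR = re.compile(r"\)\s*(AND|OR)\s*\(")


def split_condition(expr: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a flat search condition expression into items and operators."""
    items = ITEM.findall(expr)            # [(zone name, keyword), ...]
    operators = OPERATOR.findall(expr)    # ["AND", "OR", ...]
    if not items or len(operators) != len(items) - 1:
        raise ValueError(f"unsupported search condition: {expr!r}")
    return items, operators


items, ops = split_condition("(発明の名称 = 検索) AND (要約 = インデックス)")
print(items, ops)  # [('発明の名称', '検索'), ('要約', 'インデックス')] ['AND']
```

Keeping the operators in a separate list lets the later logical analysis combine per-item results without re-parsing the expression.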

[0044] On receiving the first and second search items "Title of Invention = search" and "Abstract = index" from the logical condition analysis means 9, the search condition input means 7 passes them to the keyword search means 6 (S83, S84 in FIG. 8).

[0045] When the first and second search items "Title of Invention = search" and "Abstract = index" are passed from the search condition input means 7, the keyword search means 6 selects one of them, as shown in the flowchart of FIG. 11 (S111).

[0046] Suppose, for example, that the first search item "Title of Invention = search" is selected. The keyword search means 6 first searches the index 5 using the keyword "検索" ("search") of that item as a key, and obtains the document identifiers of the zone position conversion documents in which the keyword appears together with its positions within those documents (S113). In this embodiment the contents of the index 5 are as shown in FIG. 4, so S113 finds that the keyword "検索" appears at the 3rd to 4th characters of the zone position conversion document 21' (document identifier ID1) and at the 1001st to 1002nd characters of the zone position conversion document 22' (document identifier ID2).
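
With one-character keys, a two-character keyword such as "検索" has to be located by checking that its characters are posted at consecutive positions of the same document. The text does not spell out this step; the sketch below is one straightforward way to do it, assuming postings are `(document identifier, position)` tuples. `find_keyword` is hypothetical, and the sample postings simply mirror the positions stated in [0046].

```python
from typing import Dict, List, Set, Tuple

Posting = Tuple[str, int]  # (document identifier, 1-based position)


def find_keyword(index: Dict[str, List[Posting]], keyword: str) -> List[Tuple[str, int, int]]:
    """Return (doc_id, first, last) for every occurrence of keyword.

    An occurrence starts at position p of document d when the i-th keyword
    character is posted at (d, p + i) for every i; only the index is read.
    """
    if not keyword:
        return []
    postings: List[Set[Posting]] = [set(index.get(ch, [])) for ch in keyword]
    hits = []
    for doc_id, start in sorted(postings[0]):
        if all((doc_id, start + i) in postings[i] for i in range(1, len(keyword))):
            hits.append((doc_id, start, start + len(keyword) - 1))
    return hits


# Postings for the two characters of 検索, mirroring the positions in [0046].
index = {"検": [("ID1", 3), ("ID2", 1001)],
         "索": [("ID1", 4), ("ID2", 1002), ("ID2", 7)]}
print(find_keyword(index, "検索"))  # [('ID1', 3, 4), ('ID2', 1001, 1002)]
```

The stray posting ("ID2", 7) for "索" is ignored because no "検" precedes it at position 6.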

[0047] The keyword search means 6 then refers to the zone information table 3 and obtains the zone position of the zone named "Title of Invention" in the first search item (S114). In this embodiment, the 1st to 500th characters are obtained as the zone position (see FIG. 3).

[0048] Next, if any of the results of S113 indicates a position within the zone position obtained in S114, the keyword search means 6 stores the document identifier of that result in the keyword search result storage means 8, paired with information indicating that it belongs to the first search item (S115). In this example, the results of S113 are "3rd to 4th characters of the zone position conversion document 21' (document identifier ID1)" and "1001st to 1002nd characters of the zone position conversion document 22' (document identifier ID2)", and the zone position obtained in S114 is "1st to 500th characters". Only the first result lies inside that range, so the keyword search means 6 stores the document identifier ID1 in the keyword search result storage means 8, paired with information indicating that it belongs to the first search item.
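
Steps S114 and S115 reduce to a range check against the zone information table. The sketch below assumes the table maps zone names to 1-based (first, last) ranges, with hypothetical contents taken from the ranges given in [0047] and [0049]; it keeps an occurrence only if both its first and last characters fall inside the zone. `docs_with_keyword_in_zone` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical zone information table (ranges as given in [0047] and [0049]).
ZONE_TABLE: Dict[str, Tuple[int, int]] = {
    "発明の名称": (1, 500),
    "要約": (501, 2000),
}


def docs_with_keyword_in_zone(occurrences: List[Tuple[str, int, int]],
                              zone_name: str,
                              table: Dict[str, Tuple[int, int]]) -> Set[str]:
    """Keep the document identifiers whose occurrence lies inside the zone.

    Because every converted document places a zone at the same range, one
    table lookup is enough to filter the occurrences of all documents.
    """
    first, last = table[zone_name]
    return {doc_id for doc_id, start, end in occurrences
            if first <= start and end <= last}


# Occurrences of 検索 as stated in [0048].
occurrences = [("ID1", 3, 4), ("ID2", 1001, 1002)]
print(docs_with_keyword_in_zone(occurrences, "発明の名称", ZONE_TABLE))  # {'ID1'}
```

No per-document zone boundaries are read here, which is the property [0056] credits for the small index and fast search.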

[0049] The keyword search means 6 then selects the second search item "Abstract = index" (S111) and performs the same processing as above (S113 to S115). For the second search item, the six-character keyword "インデックス" ("index") appears at the 1001st to 1006th characters of the zone position conversion document 21' (document identifier ID1), and the zone named "Abstract" occupies the 501st to 2000th characters, so the keyword search means 6 stores the document identifier ID1 in the keyword search result storage means 8, paired with information indicating that it belongs to the second search item.

[0050] When the above processing has been performed for all the search items passed from the search condition input means 7 (S112: NO), the keyword search means 6 sends an end notification to the logical condition analysis means 9 (S116).

[0051] On receiving the end notification from the keyword search means 6, the logical condition analysis means 9, as shown in the flowchart of FIG. 10, determines the document identifiers of the structured documents that satisfy the search condition expression, based on the search results for each search item stored in the keyword search result storage means 8 and on the logical operation symbols in the search condition expression passed from the search condition input means 7, and passes them to the search result output means 10 (S101, S102).
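
Once each search item's document identifiers are stored, the combination in S101 reduces to set operations. A minimal sketch, assuming per-item results are sets of document identifiers and that operators are applied strictly left to right (the text does not say how a mixed AND/OR chain is grouped); `combine` is a hypothetical name.

```python
from typing import List, Set


def combine(results: List[Set[str]], operators: List[str]) -> Set[str]:
    """Fold per-item document identifier sets left to right with AND / OR."""
    if len(operators) != len(results) - 1:
        raise ValueError("need exactly one operator between adjacent items")
    combined = set(results[0])
    for op, ids in zip(operators, results[1:]):
        if op == "AND":
            combined &= ids          # documents satisfying both sides
        elif op == "OR":
            combined |= ids          # documents satisfying either side
        else:
            raise ValueError(f"unsupported operator {op!r}")
    return combined


# Both search items in this example yield document ID1.
print(combine([{"ID1"}, {"ID1"}], ["AND"]))  # {'ID1'}
```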

[0052] In this example, the keyword search result storage means 8 holds the document identifier "ID1" as the result of the first search item and "ID1" as the result of the second search item. Because the first and second search items are joined by "AND" in the search condition expression, the document identifier present in both results, "ID1", is passed to the search result output means 10.

[0053] When the document identifier "ID1" is passed to it, the search result output means 10 reads the structured document 21 with document identifier "ID1" from the document storage means 1 and outputs it to an output device (not shown) such as a printer or CRT.

[0054] FIG. 13 is a block diagram showing an example of the hardware configuration of the structured document search device shown in FIG. 1, which consists of a computer 131, a recording medium 132, and a storage device 133. The recording medium 132 is a magnetic disk, a semiconductor memory, or another recording medium, and records a program that causes the computer 131 to function as the structured document search device.

[0055] The program recorded on the recording medium 132 is read by the computer 131 and, by controlling the operation of the computer 131, implements on it the in-document position conversion means 2, index creation means 4, keyword search means 6, search condition input means 7, logical condition analysis means 9, and search result output means 10 shown in FIG. 1. The document storage means 1, zone information table 3, index 5, and keyword search result storage means 8 are configured on the storage device 133.

[0056]

EFFECTS OF THE INVENTION As described above, the structured document search device of the present invention allows a smaller index and faster searches than the conventional technique of performing zone searches with a full-text index. The reason is that the index is created so that the position of each zone is common to all structured documents; at search time, a zone search therefore only needs to consult a very small zone information table.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the present invention.

FIG. 2 is a diagram showing an example of the contents of the document storage means 1.

FIG. 3 is a diagram showing an example of the contents of the zone information table 3.

FIG. 4 is a diagram showing an example of the contents of the index 5.

FIG. 5 is a diagram showing an example of a search condition expression.

FIG. 6 is a flowchart showing an example of the processing of the in-document position conversion means 2.

FIG. 7 is a flowchart showing an example of the processing of the index creation means 4.

FIG. 8 is a flowchart showing an example of the processing of the search condition input means 7.

FIG. 9 is a flowchart showing an example of the processing of the logical condition analysis means 9 when a search condition expression is passed from the search condition input means 7.

FIG. 10 is a flowchart showing an example of the processing of the logical condition analysis means 9 when an end notification is sent from the keyword search means 6.

FIG. 11 is a flowchart showing an example of the processing of the keyword search means 6.

FIG. 12 is a diagram showing an example of a zone position conversion document created by the in-document position conversion means 2.

FIG. 13 is a block diagram showing an example of the hardware configuration of the structured document search device shown in FIG. 1.

[Explanation of symbols]

DESCRIPTION OF SYMBOLS
1 ... Document storage means
2 ... In-document position conversion means
3 ... Zone information table
4 ... Index creation means
5 ... Index
6 ... Keyword search means
7 ... Search condition input means
8 ... Keyword search result storage means
9 ... Logical condition analysis means
10 ... Search result output means
21, 22 ... Structured documents
21', 22' ... Zone position conversion documents
51 ... Key information section
52 ... Position information section
131 ... Computer
132 ... Recording medium
133 ... Storage device

Continuation of the front page
(56) References cited: JP-A-8-329116 (JP, A); JP-A-8-241332 (JP, A); "The Power of High-Speed Full-Text Search," Nikkei Byte, October 1996, pp. 142-167 (Heisei 8-9-22); Fukushima and Akamine, "Flexible Character String Inversion Method for High-Speed Full-Text Search (1): Method Outline," Information Processing, Vol. 38, No. 4, 1997, pp. 334-335 (Heisei 9-4-15)
(58) Fields searched (Int. Cl.⁶, DB name): G06F 17/30; JICST file (JOIS)

Claims

(57) [Claims]

1. A structured document search device comprising: document storage means in which a plurality of structured documents, each composed of a plurality of zones, are stored; a zone information table in which information indicating the position of each zone in a zone position conversion document is stored; in-document position conversion means for creating a zone position conversion document in which each zone of a structured document stored in the document storage means is moved to the position indicated by the contents of the zone information table; index creation means for creating, based on the zone position conversion documents created by the in-document position conversion means, an index in which a key character string, the document identifier of a structured document in which the key character string exists, and the in-document position of the key character string in the zone position conversion document are stored in association with one another; search condition input means for receiving a search condition expression including the zone name of a zone to be searched and a keyword; and keyword search means for searching the index using the keyword in the search condition expression received by the search condition input means as a key, and for obtaining, based on the resulting document identifiers and in-document positions of the structured documents in which the keyword exists and on the contents of the zone information table, the document identifier of a structured document in which the keyword exists in the zone indicated by the zone name in the search condition expression.

2. The structured document search device according to claim 1, wherein the zone information table stores a zone name and zone position information indicating the position, in the zone position conversion document, of the zone having that zone name.

3. The structured document search device according to claim 2, wherein the search condition expression has a format in which a plurality of search items, each composed of the zone name of a zone to be searched and a keyword, are joined by logical operation symbols; the keyword search means, for each search item of the search condition expression received by the search condition input means, searches the index using the keyword in that search item as a key and obtains, based on the search result and the contents of the zone information table, the document identifier of a structured document in which the keyword exists in the zone indicated by the zone name in that search item; and the device further comprises logical condition analysis means for obtaining the document identifier of a structured document satisfying the search condition expression, based on the document identifiers obtained by the keyword search means for each search item and on the logical operation symbols joining the search items in the search condition expression received by the search condition input means.

4. The structured document search device according to claim 3, further comprising search result output means for reading from the document storage means, and outputting, the structured document having the document identifier obtained by the logical condition analysis means.

5. A machine-readable recording medium recording a program for causing a computer, which has document storage means storing a plurality of structured documents each composed of a plurality of zones and a zone information table storing zone position information indicating the position of each zone in a zone position conversion document, to function as: in-document position conversion means for creating a zone position conversion document in which each zone of a structured document stored in the document storage means is moved to the position indicated by the contents of the zone information table; index creation means for creating, based on the zone position conversion documents created by the in-document position conversion means, an index in which a key character string, the document identifier of a structured document in which the key character string exists, and the in-document position of the key character string are stored in association with one another; search condition input means for receiving a search condition expression including the zone name of a zone to be searched and a keyword; and keyword search means for searching the index using the keyword in the search condition expression received by the search condition input means as a key, and for obtaining, based on the resulting document identifiers and in-document positions of the documents containing the keyword and on the contents of the zone information table, the document identifier of a structured document in which the keyword exists in the zone indicated by the zone name in the search condition expression.