JP3413866B2

JP3413866B2 - Information retrieval device

Info

Publication number: JP3413866B2
Application number: JP05894193A
Authority: JP
Inventors: 敦畠山; 寛次加藤; 悟志浅川; 川口　　久光
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-03-19
Filing date: 1993-03-18
Publication date: 2003-06-09
Anticipated expiration: 2018-06-09
Also published as: JPH0660121A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

【産業上の利用分野】本発明は、文書データベースを、
文字列を指定して文書の全文を対象として探索する機能
を複数のユーザが共用して使う情報検索装置に係り、特
に各ユーザからの検索要求が同時に送られてくる場合
に、待ち時間を最小限に抑えるのに好適な装置に関す
る。BACKGROUND OF THE INVENTION The present invention relates to a document database,
It relates to an information retrieval device that uses the function of specifying a character string to search the entire text of a document as a target, and that minimizes waiting time especially when search requests from each user are sent simultaneously. The present invention relates to a device suitable for minimizing the limit.

【０００２】[0002]

【従来の技術】従来より、文書の登録の際にキーワード
付けを行う必要の無いフルテキストサーチ方式が特開平
０３−１７４６５２号公報で提案されている。この方式
は、文書を単語単位に圧縮した凝縮本文と、使用文字を
一文字単位で登録した文字成分表を用いて、フルテキス
トサーチを実用レベルで高速に行うことを目的としてい
る。2. Description of the Related Art Conventionally, a full-text search method that does not need to add a keyword when registering a document has been proposed in Japanese Patent Application Laid-Open No. 03-174652. This method aims to perform a full-text search at a practical level at high speed using a condensed text obtained by compressing a document in word units and a character component table in which characters to be used are registered in character units.

【０００３】しかしながら、凝縮されているとはいえ、
テキストデータを一文字ずつ探索する方式であることに
違いは無い。そのため、ＣＰＵがデータスキャンの間中
文字の照合動作を行っており、他の処理を行う時間的余
裕が無いことになる。このことは、複数のユーザにサー
ビスを時分割に提供することが困難であるということを
意味する。[0003] However, although condensed,
There is no difference in the method of searching text data one character at a time. Therefore, the CPU performs the character collating operation during the data scan, and there is no time to perform other processing. This means that it is difficult to provide services to a plurality of users in a time-sharing manner.

【０００４】すなわち、フルテキストサーチ処理を行う
検索装置に複数個の端末が接続し、その各々の端末から
頻繁に検索要求が与えられる場合、テキストスキャン中
のため他の処理をＣＰＵが行うことができず、ＣＰＵの
文字照合動作が一通り終了するまで他の要求が待たされ
ることになる。That is, when a plurality of terminals are connected to a search device that performs a full-text search process, and a search request is frequently given from each of the terminals, the CPU may perform another process during text scanning. Therefore, another request is waited until the character collation operation of the CPU is completely completed.

【０００５】また、処理の終了待ちとなっている状態
で、次の処理要求が来た場合、あるいは、次々と各端末
から検索要求が来た場合には、検索装置は要求が来た順
番に処理をするために後から来た要求であるほど待ち時
間が多くなるという問題があった。[0005] When the next processing request arrives in the state of waiting for the end of the processing, or when a search request comes from each terminal one after another, the search device operates in the order in which the requests came. There is a problem in that the later the request for processing, the longer the waiting time.

【０００６】[0006]

【発明が解決しようとする課題】本発明の目的は、複数
個の端末を持つフルテキストサーチの検索装置につい
て、各端末からの検索要求が次々と送られてきても、見
かけ上シングルユーザの場合と変らない応答時間で端末
に結果を帰すことのできる装置を提供することにある。SUMMARY OF THE INVENTION An object of the present invention is to provide a full-text search apparatus having a plurality of terminals, even if search requests from each terminal are sent one after another. It is an object of the present invention to provide a device capable of returning a result to a terminal with a response time that is not different from that of the terminal.

【０００７】[0007]

【課題を解決するための手段】前記課題を解決するため
に、検索処理中に他の検索端末からの検索要求を受け付
けた場合、その検索要求を格納する手段と、検索要求格
納手段内の検索要求の個数を監視して、予め設定された
個数ｎ個以上の複数の検索要求が格納されたとき、格納
された複数の検索要求を一括して検索処理する手段とを
設けたことにある。Means for Solving the Problems To solve the above problems, when a search request from another search terminal is received during a search process, means for storing the search request, and search in the search request storage means. Means is provided for monitoring the number of requests and, when a plurality of search requests of a predetermined number n or more are stored, means for performing a batch search process on the stored plurality of search requests.

【０００８】具体的には、以下の手段を有する装置を構
成する。（１）文字列の照合手段（２）テキストデータの格納と読出し手段（３）複数の検索端末より要求を受け付ける手段（４）検索要求の待ちキューバッファ（前記検索要求格
納手段に相当する）（５）検索要求元識別手段（６）キューバッファに溜った検索要求を要求元の識別
子を付加して一つにまとめる検索要求を統一する手段（７）要求元別に検索結果の文書集合を格納する手段（８）要求元別に結果情報を一時保存し、要求元へ転送
する手段（９）統一された検索要求に対して一時に検索処理を行
い、その結果を要求元別に振り分ける手段以下これらの手段を用いた検索処理の概要を説明する。Specifically, an apparatus having the following means is configured. (1) Character string collating means (2) Text data storing and reading means (3) Means for receiving requests from a plurality of search terminals (4) Search request waiting queue buffer (corresponding to the search request storing means) 5) Search request source identification means (6) Means for unifying search requests accumulated in a queue buffer by adding a request source identifier to each other (7) Store a document set of search results for each request source Means (8) means for temporarily storing the result information for each requesting source, and means for transferring the information to the requesting source; (9) means for temporarily performing a search process for a unified search request and distributing the result for each requesting source. An outline of the search processing using the search will be described.

【０００９】検索処理中でも、他の端末からの要求を受
け付けられるようにするには、図１のようにキューバッ
ファを設け、検索中に受け付けた要求を逐次待ちキュー
として登録していく方法が考えられる。しかし、この方
法では、後から登録した検索要求はその前に登録してあ
る検索要求の検索処理が全て実行されるまで待たされる
ことになる。そこで、図２に示すように検索要求をバッ
ファから取り出す時に、バッファに既に入っている検索
要求の数を見て、１個の要求であればそのまま検索処理
を行い、複数個溜まっている場合には検索要求を一つに
まとめる検索要求統一処理を行って検索することで、前
記の検索要求登録順に係らず一回の検索処理で全ての要
求元に検索結果を送出することができる。以下、この検
索要求統一処理と要求元への検索結果送出方法について
概略を説明する。In order to be able to accept requests from other terminals even during the search process, a method of providing a queue buffer as shown in FIG. 1 and sequentially registering the requests accepted during the search as a waiting queue is considered. Can be However, in this method, a search request registered later is kept waiting until all the search processing of the search request registered before is executed. Therefore, as shown in FIG. 2, when a search request is taken out from the buffer, the number of search requests already contained in the buffer is checked.
It was carried out, in one of the search request if you have accumulated a plurality
Search request unified processing performed by by searching can send a search result to all requesting a search process once regardless of the search request registration order of the summarized. Hereinafter, an outline of the search request unifying process and a method of sending a search result to a request source will be briefly described.

【００１０】この検索要求統一処理は、例えば「“計算
機”という単語が現われる文書を探せ」という要求と、
「“バイオ技術”という単語が現われる文書を探せ」と
いう要求と「“学習型ユーザインタフェース”という単
語が現われる文書を探せ」という要求がキューバッファ
に格納されていた場合に、これらをまとめて「“計算
機”または“バイオ技術”または“学習型ユーザインタ
フェース”のいずれかの単語が現われる文書を探せ」と
いう新たな要求として検索処理を実行する。The search request unification process includes, for example, a request to “find a document in which the word“ computer ”appears”
If a request to "search for a document in which the word" biotechnology "appears" and a request to "search for a document in which the word" learning user interface "appears" are stored in the queue buffer, they are combined into " The search process is executed as a new request of "find a document in which any of the words" computer "," biotechnology ", or" learning user interface "appears.

【００１１】この検索要求を管理する待ちキューは、検
索要求のあった端末の識別子を条件式と共に格納するも
のである。すなわち、前記例での検索要求は以下のよう
に表せる。The waiting queue for managing the search request stores the identifier of the terminal that has made the search request together with the conditional expression. That is, the search request in the above example can be expressed as follows.

【００１２】最初の段階でキュー１＝ｕ１：“計算機” キュー２＝ｕ２：“バイオ技術” キュー３＝ｕ３：“学習型ユーザインタフェース” とキューバッファに入っていた場合、これらの検索要求
を統一してキュー１＝ｕ１：“計算機” ｏｒｕ２：“バイオ技術” ｏｒｕ３：“学習型ユーザインタフェース” のように新しい検索要求として検索処理する。ここで、
ｕ１，ｕ２，ｕ３は要求元の識別子である。In the first stage, queue 1 = u1: “computer” queue 2 = u2: “biotechnology” queue 3 = u3: if “learning type user interface” is in the queue buffer, these search requests are unified. Then, queue 1 = u1: “computer” or u2: “biotechnology” or u3: “learning type user interface”, and search processing is performed as a new search request. here,
u1, u2, and u3 are requester identifiers.

【００１３】このようにして統一された検索要求で、検
索処理してすべての要求に対していずれかの条件式を充
たす文書を検索する。すなわち、前記の例では、“計算
機”，“バイオ技術”，“学習型ユーザインタフェー
ス”のいずれかの語を含む文書を検索し、もしこれらの
語を含む文書がヒットすれば、文書を特定する文書ＩＤ
と共に要求元の識別子を出力する。例えば、ｄ．．を文
書ＩＤとして、ｄ１，ｕ１，ｕ３，ｕ５ｄ３，ｕ２ｄ１０，ｕ３，ｕ４ｄ２５，ｕ２，ｕ５ｄ３７，ｕ５のように出力する。この例では、例えばｄ１の文書につ
いて、ｕ１，ｕ３，ｕ５の要求した条件式が充たされて
いることを示す。With the search request unified in this manner, search processing is performed to search for a document that satisfies one of the conditional expressions for all requests. That is, in the above example, a document containing any of the words “computer”, “biotechnology”, and “learning-type user interface” is searched, and if a document containing these words is hit, the document is specified. Document ID
Together with the requester identifier. For example, d. . Is output as the document ID as d1, u1, u3, u5 d3, u2 d10, u3, u4 d25, u2, u5 d37, u5. This example shows that, for example, the conditional expression requested by u1, u3, and u5 is satisfied for the document d1.

【００１４】次にこの出力結果を要求元へ振り分ける処
理を行う。この振り分け処理は、要求元識別子をもとに
して行うことができる。すなわち、前記例では、ｕ．．
によってそれぞれの要求元への検索結果集合として格納
し各要求元へ出力する。Next, processing for distributing the output result to the request source is performed. This distribution process can be performed based on the request source identifier. That is, in the above example, u. .
And stores the result set as a search result set for each request source and outputs it to each request source.

【００１５】例えば、ｕ１：（ｄ１）ｕ２：（ｄ３，ｄ２５）ｕ３：（ｄ１，ｄ１０）ｕ４：（ｄ１０）ｕ５：（ｄ１，ｄ２５，ｄ３７）のようになる。これは、例えばｕ１の要求した条件に合
致する文書がｄ１であったことを示している。最後にこ
のようにして得られた要求元毎の検索結果を各要求元へ
出力して処理を終える。For example, u1: (d1) u2: (d3, d25) u3: (d1, d10) u4: (d10) u5: (d1, d25, d37) This indicates that, for example, the document meeting the condition requested by u1 was d1. Finally, the search result obtained for each request source obtained in this way is output to each request source, and the process is completed.

【００１６】このようにして、複数の検索端末の要求を
ひとつにまとめて検索処理することにより、検索装置の
レスポンスを向上することができる。例えば、前記例で
は、５個の検索要求を統一して検索処理しているので、
本来５回テキストをスキャンしなければならないところ
を１回のスキャンで処理を終えることができる。[0016] In this way, the response of the search device can be improved by performing the search processing by collecting the requests from the plurality of search terminals into one. For example, in the above example, since five search requests are unified and the search processing is performed,
Processing where text must be scanned five times can be completed by one scan.

【００１７】次に、検索装置にさらに高度な機能を持た
せた場合の処理の概要について、説明する。フルテキス
トサーチにおいては、テキストをスキャンするというそ
の方式上の問題から、できるだけスキャンする量を減ら
した方が結果として処理時間が短縮するという特質があ
る。また、検索の手順を考えた場合、前回の検索結果集
合をもとに、さらに条件を付加して絞り込みの検索をす
ることが圧倒的に多い。すなわち、一般的な検索では、
既に絞られた特定の集合に対してキーワードを変えなが
らさらに絞り込んでいくことが多く、常にＤＢ全体（原
集合）に対しての検索を行うわけではない。従って、絞
り込みの検索を行うのであれば、すべてのデータをスキ
ャンするよりも、前回の検索結果集合のデータのみにつ
いてスキャンする方が、処理時間を短縮することができ
る。この様な、前回の検索結果である限られた文書集合
を次の検索対象として検索処理していき、検索処理毎に
その探索範囲を狭めていく検索方法をここではハイアラ
ーキ検索と呼ぶ。Next, an outline of the processing when the search device is provided with more advanced functions will be described. In the full text search, there is a characteristic that the processing time is shortened as a result of reducing the amount of scanning as much as possible due to the problem of the method of scanning text. Also, when considering a search procedure, it is overwhelmingly common to perform a narrowing-down search by adding further conditions based on the previous set of search results. That is, in a general search,
In many cases, a specific set that has already been narrowed down is further narrowed down while changing keywords, and a search is not always performed on the entire DB (original set). Therefore, when performing a narrowing-down search, it is possible to reduce the processing time by scanning only the data of the previous set of search results, rather than by scanning all the data. Such a search method in which a limited document set, which is the previous search result, is subjected to search processing as the next search object and the search range is narrowed for each search processing is referred to as a hierarchical search.

【００１８】ハイアラーキ検索機能を複数端末接続のシ
ステムで実現するには、まず検索対象をどの様に管理
し、複数要求の受付け時どの様に設定するかが問題とな
る。この問題を図３を用いて具体的に説明する。図３で
は、時刻ｔ１にユーザａが“音声”という検索タームが
ある文書集合ＢＡＳＥ＿Ａ１をつくり、時刻ｔ３にＢＡ
ＳＥ＿Ａ１に対して、更に“合成”という検索タームで
絞り込んでいる。一方、時刻ｔ２にはユーザＢが“画
像”という検索タームがある文書集合ＢＡＳＥ＿Ｂ１を
つくり、時刻ｔ４にＢＡＳＥ＿Ｂ１に対して、更に“認
識”という検索タームで絞り込んでいる。それぞれのユ
ーザの第２回目の検索では、ＢＡＳＥ＿Ａ１，ＢＡＳＥ
＿Ｂ１という異なる文書集合を対象に別々の検索処理を
行う必要があり、複数個の検索要求を一括して処理する
ことが困難となる。つまり、時刻ｔ３とｔ４での文字列
検索処理を同時に行うことができない。これまで説明し
てきたように、文字列検索処理は非常に時間のかかる処
理なので、より多くの時間を検索サービスにに要するよ
うになってしまう。In order to realize the hierarchy search function in a system connected to a plurality of terminals, there is a problem how to manage search targets and how to set a plurality of requests when receiving a plurality of requests. This problem will be specifically described with reference to FIG. In FIG. 3, the user a creates a document set BASE_A1 having a search term “voice” at time t1, and creates a document set BASE_A1 at time t3.
SE_A1 is further narrowed down by a search term “synthesis”. On the other hand, at time t2, the user B creates a document set BASE_B1 having a search term “image”, and narrows down the BASE_B1 at time t4 by a search term “recognition”. In the second search of each user, BASE_A1, BASE
It is necessary to perform a separate search process for a different set of documents called _B1, which makes it difficult to process a plurality of search requests collectively. That is, character string search processing at times t3 and t4 cannot be performed simultaneously. As described above, since the character string search process is a very time-consuming process, more time is required for the search service.

【００１９】この問題を解決するには、別々の検索対象
の文集合を一つにまとめ、一度の検索処理で複数個の検
索要求を処理する必要がある。これは例えば、図４に示
すように常にデータベースの全体を検索対象として、検
索処理を行い検索処理後に各要求元の検索対象との積集
合を求めることで解決することができる。この方法で
は、常にＤＢ全体（原集合）ＢＡＳＥ＿０を対象に検索
処理するので、ユーザＡ、ユーザＢの２回目の検索要求
も時刻ｔ３の一度の文字列検索処理で終了することがで
きる。すなわち、まず第１に時刻ｔ３においてＢＡＳＥ
＿０に対してユーザＡとユーザＢの検索要求である“合
成”と“認識”のいずれかを含む文書を文字列検索し、
中間の結果集合ＢＡＳＥ＿ＡＢ２を作成する。次に得ら
れた集合ＢＡＳＥ＿ＡＢ２を、ユーザＡとユーザＢの検
索結果集合すなわちＢＡＳＥ＿ＡＢ２ＡとＢＡＳＥ＿Ａ
Ｂ２Ｂに分配し、それぞれのユーザの母集合ＢＡＳＥ＿
Ａ１、ＢＡＳＥ＿Ｂ１と積集合を求めることで、求める
検索結果集合ＢＡＳＥ＿Ａ２、ＢＡＳＥ＿Ｂ２を得るこ
とができる。この方法では、図３のｔ３及びｔ４の文字
列検索処理を一度に行うことができるが、常に全データ
ベースＢＡＳＥ＿０のテキストデータをスキャンしなけ
ればならないため、探索量が増えて処理時間が多量にか
かるという欠点がある。In order to solve this problem, it is necessary to combine separate sentence sets to be searched into one and process a plurality of search requests in one search process. For example, as shown in FIG. 4, this can be solved by always performing a search process with the entire database as a search target, and obtaining an intersection of each request source and the search target after the search process. In this method, since the search process is always performed on the entire DB (original set) BASE_0, the second search request of the user A and the user B can be completed by one character string search process at time t3. That is, first, at time t3, BASE
A character string search is performed for a document including any of “combination” and “recognition”, which are search requests of the user A and the user B for _0,
Create an intermediate result set BASE_AB2. Then got
The collected set BASE_AB2 is detected by the user A and the user B.
Search result sets, BASE_AB2A and BASE_A
And distributed to the B2B, mother set of each of the user BASE_
By obtaining the intersection of A1 and BASE_B1, the search result sets BASE_A2 and BASE_B2 to be obtained can be obtained. In this method, the character string search processing at t3 and t4 in FIG. 3 can be performed at one time, but since the text data of the entire database BASE_0 must be constantly scanned, the search amount increases and the processing time increases. There is a disadvantage that.

【００２０】これに対し、図５のようにＢＡＳＥ＿Ａ１
とＢＡＳＥ＿Ｂ１の和集合を検索対象にして一括検索す
る方法も考えられる。確かに検索対象は少なくなるので
探索量は減少するが、単純にＢＡＳＥ＿Ａ１とＢＡＳＥ
＿Ｂ１の和集合ＢＡＳＥ＿ＡＢ１の検索結果をユーザＡ
とユーザＢに分配するだけでは、求める正しい結果集合
が得られない。すなわち、図に示す交差斜線の部分（ユ
ーザＡの例では、ＢＡＳＥ＿Ｂ１でＢＡＳＥ＿Ａ１以外
であり、かつ“合成”を含む文書）がノイズとなる。こ
のように、マルチユーザ環境で検索装置を運用する場
合、検索対象ＤＢが同一でも各ユーザの検索文字列がそ
れぞれ異なり、また絞り込み回数もそれぞれ異なるため、
通常ある時点での各ユーザの検索対象集合は相互に違っ
たものになる。したがって、各ユーザの検索要求（検索
文字列）を一括して検索しようとしても対象とする集合
が異なるため、単にそれぞれの検索対象集合の論理和
（ＯＲ）を取り、それに対して検索を行ったとしても、
正しい検索結果集合が得られない。On the other hand, as shown in FIG. 5, BASE_A1
A method is also conceivable in which the union of BASE_B1 and BASE_B1 is set as a search target and a batch search is performed. Certainly, since the number of search objects is reduced, the amount of search is reduced, but simply BASE_A1 and BASE_A1
The search result of the union BASE_AB1 of _B1
And only distributed to the user B is not correct result set is obtained to determine. That is, the cross-hatched portion shown in the figure (in the example of the user A, BASE_B1 other than BASE_A1 and a document including “composite”) becomes noise. As described above, when the search apparatus is operated in a multi-user environment, even if the search target DB is the same, the search character strings of each user are different, and the number of times of narrowing is also different.
Usually, the set of search targets of each user at a certain time is different from each other. Therefore, even if the search requests (search character strings) of the respective users are to be searched collectively, the set to be searched is different. As
A correct search result set cannot be obtained.

【００２１】これを解決するには、図６のように一括検
索処理した後、それぞれの要求元の検索結果集合との積
集合を求め、これを検索結果とする。すなわち、時刻ｔ
３での検索結果集合ＢＡＳＥ＿ＡＢ２をユーザＡの結果
集合ＢＡＳＥ＿ＡＢ２ＡとユーザＢの結果集合ＢＡＳＥ
＿ＡＢ２Ｂに分配すると共に、各ユーザの検索対象であ
るＢＡＳＥ＿Ａ１とＢＡＳＥ＿Ｂ１との積集合を取るの
である。こうすれば、要求元単位で、検索ノイズのない
正しい検索結果が得られることになる。すなわち、ＢＡ
ＳＥ＿ＡＢ２ＡとＢＡＳＥ＿Ａ１の積集合を求めること
で、正しい結果集合ＢＡＳＥ＿３Ａが求まる。同様に、Ｂ
ＡＳＥ＿ＡＢ２ＢとＢＡＳＥ＿Ｂ１の積集合を求めるこ
とでＢＡＳＥ＿３Ｂが求められる。 In order to solve this, after performing a collective search process as shown in FIG. 6, a product set with a search result set of each request source is obtained, and this is set as a search result. That is, the time t
Result of the user A search result set BASE_AB2 of 3
Set BASE_AB2A and result set BASE of user B
_AB2B , and the intersection of BASE_A1 and BASE_B1, which are search targets of each user, is obtained. In this way, a correct search result without search noise can be obtained for each request source. That is, BA
Finding the intersection of SE_AB2A and BASE_A1
Thus, a correct result set BASE_3A is obtained. Similarly , B
Find the intersection of ASE_AB2B and BASE_B1
And BASE_3B is obtained.

【００２２】以下、図６の方式でえられたユーザＡの検
索結果ＢＡＳＥ＿ＡＢ３Ａが求める集合ＢＡＳＥ＿Ａ２
に等しいことを図７を用いて説明する。ここでは、検索
対象を定めて文字列検索した結果をＢＡＳＥ＿ｘｘ
（“文字列”）のように表すものとする。この式は、Ｂ
ＡＳＥ＿ｘｘの集合を対象に“文字列”のある文書を検
索した結果集合を表す。例えば、ＢＡＳＥ＿０（“音
声”）はＢＡＳＥ＿０を対象に“音声”で検索した結果
集合なので、ＢＡＳＥ＿Ａ１と同一集合を表すことにな
る。図６の検索結果集合ＢＡＳＥ＿ＡＢ３Ａはこの式を
用いて以下のように展開できる。Hereinafter, a set BASE_A2 obtained from the search result BASE_AB3A of the user A obtained by the method of FIG.
This will be described with reference to FIG. Here, the result of the character string search with the search target determined is BASE_xx
(“Character string”). This equation gives B
It represents a result set obtained by searching for a document having a “character string” for the set of ASE_xx. For example, BASE_0 (“voice”) is a result set obtained by searching for “voice” for BASE_0, and thus represents the same set as BASE_A1. The search result set BASE_AB3A of FIG. 6 can be expanded as follows using this expression.

【００２３】 BASE_AB3A = BASE_A1 AND BASE_AB2A = BASE_A1 AND BASE_AB2("合成") = BASE_A1 AND BASE_AB1(("合成" OR "認識") AND "合成") = BASE_A1 AND BASE_AB1("合成") = BASE_A1 AND (BASE_A1("合成") OR BASE_B1("合成")) = (BASE_A1 AND BASE_A1("合成")) OR (BASE_A1 AND BASE_B1("合成")) = BASE_A1("合成") OR (BASE_A1 AND BASE_B1)("合成”）図７は以上の検索結果集合間の関係を示すものである。
図に示すように、ＢＡＳＥ＿Ａ１とBASE_B1の積集合(BA
SE_A1 AND BASE_B1)はBASE_A1に完全に包含される。従
って、(BASE_A1 AND BASE_B1)("合成")も、BASE_A1("合
成")に完全に包含される。ゆえに、 BASE_AB3A = BASE_A1("合成") = BASE_A2 が成り立つ。BASE_AB3A = BASE_A1 AND BASE_AB2A = BASE_A1 AND BASE_AB2 ("synthesis") = BASE_A1 AND BASE_AB1 (("synthesis" OR "recognition") AND "synthesis") = BASE_A1 AND BASE_AB1 ("synthesis") = BASE_A1 AND (BASE_A1 ("Synthesis") OR BASE_B1 ("synthesis")) = (BASE_A1 AND BASE_A1 ("synthesis")) OR (BASE_A1 AND BASE_B1 ("synthesis")) = BASE_A1 ("synthesis") OR (BASE_A1 AND BASE_B1) (" FIG. 7 shows the relationship between the above retrieval result sets.
As shown in the figure, the intersection of BASE_A1 and BASE_B1 (BA
SE_A1 AND BASE_B1) is completely contained in BASE_A1. Therefore, (BASE_A1 AND BASE_B1) ("composite") is also completely included in BASE_A1 ("composite"). Therefore, BASE_AB3A = BASE_A1 ("composite") = BASE_A2 holds.

【００２４】このような、集合間の論理条件の演算は、
文字列検索に較べ極めて短時間に処理できるため、文字
列検索処理の対象をできるだけ減少させている本方式は
有効で、複数人の検索要求を一括して短時間で処理する
ことができるといえる。The operation of the logical condition between the sets is as follows.
Since the processing can be performed in a very short time as compared with the character string search, the method of reducing the target of the character string search processing as much as possible is effective, and it can be said that search requests of a plurality of persons can be collectively processed in a short time. .

【００２５】このようなハイアラーキ検索機能を有する
検索装置において、前記の複数要求がキューバッファに
溜っている場合の処理についてもう少し具体的に説明す
る。In the search apparatus having such a hierarchy search function, the processing when the plurality of requests are accumulated in the queue buffer will be described more specifically.

【００２６】例えば各検索端末の検索要求により、現在
までに次のように結果集合が得られているものとする。For example, it is assumed that a result set as described below has been obtained by a search request from each search terminal.

【００２７】ｕ１：（ｄ１，ｄ５，ｄ１２，ｄ１５，ｄ１８，ｄ２
７）ｕ２：（ｄ２，ｄ５，ｄ７，ｄ１２，ｄ１８）ｕ３：（ｄ１，ｄ５，ｄ１８，ｄ３０）ｕ４：（ｄ３，ｄ７，ｄ１２，ｄ３０，ｄ４２，ｄ５
０，ｄ５２）ｕ５：（ｄ２，ｄ５，ｄ８，ｄ１２，ｄ１８，ｄ４２，
ｄ５２）この時、キューには次のように処理すべき要求が入って
いたものとする。U1: (d1, d5, d12, d15, d18, d2
7) u2: (d2, d5, d7, d12, d18) u3: (d1, d5, d18, d30) u4: (d3, d7, d12, d30, d42, d5)
0, d52) u5: (d2, d5, d8, d12, d18, d42,
d52) At this time, it is assumed that the queue contains a request to be processed as follows.

【００２８】キュー１＝ｕ１：“計算機” キュー２＝ｕ２：“バイオ技術” キュー３＝ｕ３：“学習型ユーザインタフェース” 検索要求のあるのはｕ１，ｕ２，ｕ３の端末からである
ので、前記検索結果集合のうち、ｕ１，ｕ２，ｕ３の結
果集合のＯＲ集合を求めて、この集合内の文書のテキス
トデータのみをスキャンして“計算機”“バイオ技術”
“学習型ユーザインタフェース”のいずれかの語を含む
文書を探索して、ヒットした文書について、要求元の識
別子と共に文書ＩＤを出力する。すなわち、下記の結果
集合ｕ１ｏｒｕ２ｏｒｕ３：（ｄ１，ｄ２，ｄ５，
ｄ７，ｄ１２，ｄ１５，ｄ１８，ｄ２７，ｄ３０）に含まれる文書のテキストデータをスキャンし、例えば
次のような検索結果を得る。Queue 1 = u1: “computer” Queue 2 = u2: “biotechnology” Queue 3 = u3: “learning type user interface” Since the search request is from the terminals u1, u2 and u3, Among the search result sets, an OR set of the result sets of u1, u2, and u3 is obtained, and only the text data of the documents in this set are scanned to obtain “computer” “biotechnology”
A document containing any of the words in the "learning type user interface" is searched, and for the hit document, a document ID is output together with the identifier of the request source. That is, the following result set u1 or u2 or u3: (d1, d2, d5,
d7, d12, d15, d18, d27, d30) are scanned, and the following search results are obtained, for example.

【００２９】ｄ１，ｕ１，ｕ３ｄ５，ｕ２ｄ１２，ｕ２，ｕ３ｄ１５，ｕ１ｄ３０，ｕ２この後、要求元別にこの出力結果を振り分けるが、ただ
振り分けただけでは、要求元別の検索結果集合に含まれ
ない文書が出力されるため、振り分けた後の集合と前記
の最初にＯＲ集合を求めた結果集合とのＡＮＤ集合を求
め最終結果とする。すなわち、一旦下記のように要求元
別に検索結果を振り分けた後、ｕ１：（ｄ１，ｄ１５）ｕ２：（ｄ５，ｄ１２，ｄ３０）ｕ３：（ｄ１，ｄ１２）前記、最初にＯＲ集合を求めた結果集合とのＡＮＤ集合
を求め、ｕ１：（ｄ１，ｄ１５）ｕ２：（ｄ５，ｄ１２）ｕ３：（ｄ１）とすると、正しい結果が得られることになる。D1, u1, u3 d5, u2 d12, u2, u3 d15, u1 d30, u2 After that, the output results are sorted by request source, but if they are just sorted, they are included in the search result set for each request source. since the documents that are not is output, the a set of after sorting
At the beginning, an AND set with the result set obtained from the OR set is obtained as the final result. That is, once the search results are sorted by request source as follows, u1: (d1, d15) u2: (d5, d12, d30) u3: (d1, d12) The result of the first OR set obtained When an AND set with the set is determined, and u1: (d1, d15) u2: (d5, d12) u3: (d1), a correct result will be obtained.

【００３０】このように、検索結果集合をもとにスキャ
ンすべき文書をあらかじめ絞っておき、検索要求をＯＲ
で検索処理した後、各要求元別の結果集合をＡＮＤする
ことにより、レスポンスがよく、なおかつハイアラーキ
検索という機能を提供することが可能となる。As described above, documents to be scanned are narrowed down in advance based on the search result set, and the search request is ORed.
After performing the search processing by AND, it is possible to provide a function of good response and a hierarchy search by ANDing the result set for each request source.

【００３１】[0031]

【作用】以上の手段を用いて、複数の検索端末から検索
要求をキューバッファに溜め込み、検索要求を取り出す
ときにキューバッファに入っている要求の数を見て、要
求数が１個であればそのまま通常の検索処理を行ない、
複数個の要求が溜まっている場合には、検索要求を一つ
に統一して処理することにより、待ち時間の少ない検索
装置を提供することができる。統一された検索要求に
は、要求元の識別子がそれぞれの検索要求に付加し、文字
列の照合結果にも要求元識別子を出力するので、一度に
検索処理した後、結果集合を要求元別に振り分けること
ができる。By using the above means, search requests from a plurality of search terminals are stored in a queue buffer and the search requests are taken out.
Look at the number of requests in the queue buffer at the time, required
If the number of requests is one, normal search processing is performed as it is ,
If multiple requests are accumulated by processing unified into one search request, it is possible to provide a low latency retrieval device. In the unified search request, the requester identifier is added to each search request, and the requester identifier is output also in the result of character string collation, so after performing the search processing at a time, the result set is sorted by requester be able to.

【００３２】また、要求元別に過去の検索結果集合を格
納する手段を有し、絞り込み検索のモードの時、統一され
た検索要求に対しては、まず検索要求元毎の結果集合の
ＯＲ集合を検索の対象として、テキストデータをスキャ
ンすることにより、データスキャン量を少なくして、検索
処理時間を短縮することができる。このとき、検索結果
に対して、前記ＯＲ集合をとった元の結果集合ともう一
度今度はＡＮＤを求めることで、各要求元別に正しい絞
り込み結果が得ることができる。Further, there is provided a means for storing a set of past search results for each request source. In the case of the refined search mode, for a unified search request, first, an OR set of a result set for each search request source is set. By scanning text data as a search target, the amount of data scan can be reduced and the search processing time can be reduced. At this time, a correct narrowing result can be obtained for each request source by obtaining AND again with respect to the original result set obtained by taking the OR set with respect to the search result.

【００３３】[0033]

【実施例】以下、本発明の実施例について詳細に説明す
る。Embodiments of the present invention will be described below in detail.

【００３４】図８は、本実施例の構成を示す図である。
本実施例は、検索端末１００、検索処理を実行するＣＰ
Ｕ１１０、データベースを格納する磁気ディスク１２
０、各種のプログラムとデータバッファを格納するメモ
リ１３０からなっている。メモリ１３０には、検索処理
全体を制御する制御プログラム、データベース中の特定
文字列を探索する文字列探索プログラム、データを前記
磁気ディスクから読み出し文字列探索プログラムにデー
タを送るファイル読み出しプログラム、文字列探索プロ
グラムの出力から検索文字列間のＡＮＤ，ＯＲを判定処
理する論理条件判定プログラム、要求が複数の端末から
送られたものである場合、各要求元へ検索結果を分配し
て送出する照合結果分配プログラムの各プログラムが格
納してある。また、同メモリのほかの部分には文字列探
索及び論理条件判定の結果を一時的に蓄える照合結果出
力バッファ、各要求元別に過去の検索結果集合を管理蓄
積するセッションｎ用出力バッファ、各要求元へ検索結
果を出力するためのセッションｎ用出力バッファ、複数
の端末から同時に要求が来たときに検索要求を一時格納
しておく待ちキューバッファとしても使っている。FIG. 8 is a diagram showing the configuration of this embodiment.
In this embodiment, a search terminal 100 and a CP that executes a search process
U110, magnetic disk 12 for storing database
0, a memory 130 for storing various programs and data buffers. The memory 130 includes a control program for controlling the entire search process, a character string search program for searching for a specific character string in a database, a file reading program for reading data from the magnetic disk and sending data to the character string search program, and a character string search. A logical condition judgment program for judging AND and OR between search character strings from the output of the program. If the request is sent from a plurality of terminals, a collation result distribution for distributing and sending the search result to each request source Each program of the program is stored. In another part of the memory, a collation result output buffer for temporarily storing the results of the character string search and the logical condition determination, a session n output buffer for managing and storing past retrieval result sets for each request source, It is also used as an output buffer for session n for outputting search results to the origin, and as a waiting queue buffer for temporarily storing search requests when requests are received simultaneously from a plurality of terminals.

【００３５】これより、検索処理の流れに従って、各プ
ログラムの処理の詳細を説明する。まず、端末側での検
索処理の流れについて説明する。図９は本実施例での検
索端末で実行する業務の流れを示すＰＡＤ図である。本
実施例では、検索端末を検索装置に接続する接続要求を
行なった後、条件式を端末使用者から読み込み、検索要
求を装置に送り結果を受け取る処理を繰り返すものとす
る。受け取った検索結果により、場合によっては文書デ
ータを検索装置より取得して表示することも行なう。す
べての検索業務が終了した後、検索装置との接続を開放
するものとする。The details of the processing of each program will now be described according to the flow of the search processing. First, the flow of the search process on the terminal side will be described. FIG. 9 is a PAD diagram illustrating a flow of a task executed by the search terminal according to the present embodiment. In the present embodiment, it is assumed that after making a connection request for connecting the search terminal to the search device, the processing of reading the conditional expression from the terminal user, sending the search request to the device, and receiving the result is repeated. Depending on the received search result, document data may be obtained from the search device and displayed in some cases. After all the search operations are completed, the connection with the search device is released.

【００３６】図１０は制御プログラムの処理をまとめた
ＰＡＤ図である。FIG. 10 is a PAD diagram summarizing the processing of the control program.

【００３７】まず検索端末より接続要求があると、制御
プログラムはセッションｎ用結果集合管理テーブルとセ
ッションｎ用出力バッファを確保する。このｎは、接続
要求のあった端末ごとに更新される１からの通し番号で
ある。この番号を以後端末識別子と呼ぶ。例えば、既に
２個の端末からの接続要求が来ていて、新たに３個目の
端末から接続要求が来た場合、この端末の識別子を３と
し、セッション３用結果集合管理テーブル及びセッショ
ン３用出力バッファの領域を確保する。First, when there is a connection request from the search terminal, the control program secures a result set management table for session n and an output buffer for session n. This n is a serial number from 1 that is updated for each terminal that has made a connection request. This number is hereinafter referred to as a terminal identifier. For example, if a connection request has already been received from two terminals and a connection request has newly been received from a third terminal, the identifier of this terminal is set to 3, the session 3 result set management table and the session 3 Allocate output buffer area.

【００３８】接続の開放要求があった場合には、上記端
末識別子の結果集合管理テーブルと出力バッファのメモ
リ領域を開放する。When there is a connection release request, the result set management table of the terminal identifier and the memory area of the output buffer are released.

【００３９】文書データの表示要求の場合は、要求のあ
った文書をファイルから読みだす命令をファイル読み出
しプログラムに対して送り、該プログラムがセッション
ｎ用出力バッファに出力する文書データを要求元へ送出
する。In the case of a document data display request, an instruction to read the requested document from a file is sent to the file reading program, and the program sends the document data output to the session n output buffer to the request source. I do.

【００４０】制御プログラムの処理の説明の最後に、本
発明の特徴である検索要求が端末より送られたときの処
理について詳細に説明する。At the end of the description of the processing of the control program, the processing when a search request is sent from a terminal, which is a feature of the present invention, will be described in detail.

【００４１】検索要求を受け付けた場合、制御プログラ
ムは新たに待ちキューを生成してキューバッファに積み
上げる処理を行なう。このキューバッファは図１１に示
すように、要求元の端末識別子と検索条件式を対にして
積み上げた構造となっている。キューが一つ積み上がる
毎にキューカウンタの数をインクリメントする。後に説
明する文字列探索プログラムがこのキューバッファから
検索要求をとりだし、その条件式にしたがって検索処理
を行ない、結果を要求元へ送る。従って、制御プログラ
ムではキューバッファに検索要求を溜める処理だけを行
ない、文字列探索プログラムではキューバッファから検
索要求を取り出す処理のみを行なうことになる。文字列
探索処理が終わらないまま、制御プログラムが次々と検
索要求を受け付けた場合には、キューバッファに待ちキ
ューがどんどん積み上げられていくことになる。When the search request is received, the control program performs a process of generating a new waiting queue and stacking the waiting queue in the queue buffer. As shown in FIG. 11, the queue buffer has a structure in which a terminal identifier of a request source and a search condition expression are paired and stacked. Each time one queue is stacked, the number of the queue counter is incremented. A character string search program described later fetches a search request from this queue buffer, performs a search process in accordance with the conditional expression, and sends the result to the request source. Therefore, the control program performs only the processing for storing the search requests in the queue buffer, and the character string search program performs only the processing for extracting the search requests from the queue buffer. If the control program accepts search requests one after another without completing the character string search process, the waiting queue is accumulated in the queue buffer.

【００４２】続いて文字列探索処理の説明をする。この
文字列探索プログラムは前述の制御プログラムとは時分
割（タイムシェアリング）で動作しているものとする。
すなわち、制御プログラムは端末から検索要求が来るの
を監視し、文字列探索プログラムは制御プログラムがキ
ューを積むのを待っている。キューバッファに検索要求
が入ると文字列探索プログラムは、検索処理を始める
が、検索処理の実行中はキューバッファは参照しないの
で、次々と検索要求が来たときには検索要求がバッファ
に溜りつづけることになる。この検索要求は文字列探索
プログラムの検索処理の実行終了後、処理されることに
なる。Next, the character string search processing will be described. It is assumed that this character string search program operates in a time sharing manner (time sharing) with the above-described control program.
That is, the control program monitors for a search request from the terminal, and the character string search program is waiting for the control program to load the queue. When a search request is entered in the queue buffer, the character string search program starts the search process, but does not refer to the queue buffer during the execution of the search process. Become. This search request is processed after the execution of the search processing of the character string search program.

【００４３】図１２に、この文字列探索プログラムの処
理の流れを表すＰＡＤ図を示す。文字列探索プログラム
は、最初検索要求が入ってくるのを待っている。もし、
検索要求が入ってくれば検索処理を実行する。この時、
待ちキューバッファに入っている検索要求の個数をチェ
ックし、もし複数個の検索要求が入っている場合には、こ
れらの検索要求を同時に取り出して一括して検索処理を
行う。検索要求が１個のみであれば、通常の検索処理を
行なう。次にキューの中に入っている条件式を解析し、
検索タームにタームＩＤを振る処理を行う。以後の処理
ではこのタームＩＤを用いて行うことになる。すなわ
ち、文字列探索処理において、テキスト中に該当する検
索タームが有った場合、上記タームＩＤを出力する。そ
して、ターム間論理条件判定処理で、検索ターム間の論
理的な条件すなわち、ＡＮＤ、ＯＲの条件に合致するも
のだけを出力する。最後に今までの結果を要求端末毎に
分配して、送出して一連の検索処理が終了し、再び検索
要求待ちの状態へ移る。FIG. 12 is a PAD diagram showing the flow of processing of the character string search program. The string search program first waits for a search request to come in. if,
When a search request comes in, search processing is executed. At this time,
The number of search requests in the waiting queue buffer is checked, and if a plurality of search requests are included, these search requests are taken out simultaneously and search processing is performed collectively. If there is only one search request , normal search processing
Do. Next, analyze the conditional expression in the queue,
A process of assigning a term ID to the search term is performed. Subsequent processing will be performed using this term ID. That is, in the character string search processing, if there is a corresponding search term in the text, the term ID is output. Then, in the inter-term logical condition determination processing, only those that match the logical conditions between the search terms, that is, AND and OR conditions, are output. Finally, the results so far are distributed to the requesting terminals and transmitted, and a series of search processing is completed, and the process returns to a state of waiting for a search request.

【００４４】以上の文字列探索処理プログラムの中で、
タームＩＤの付与処理、論理条件の判定処理、結果の分
配処理について更に詳細に説明する。In the above character string search processing program,
The term ID assignment processing, the logical condition determination processing, and the result distribution processing will be described in further detail.

【００４５】タームＩＤの付与処理とは、図１３に示す
ように、単一あるいは複数個のキューに蓄えられている
条件式の検索タームにユニークな番号を付与することが
主な処理となる。その他、条件式間で同じ検索タームが
使用されていないか、ターム間のＡＮＤ条件は存在する
か、各検索タームはどの検索端末からの要求によるもの
かを識別し、処理する。タームＩＤ付与処理では、図１
３に示すように端末識別子と条件式を格納した検索要求
を処理して、タームＩＤテーブル、多重照合タームＩＤ
テーブル、ＡＮＤ条件判定テーブル、及び出力ＩＤリス
トの４個のテーブルあるいはリストを生成する。ターム
ＩＤテーブルは、実際の文字列と対応するタームＩＤを
示したものである。このとき、ターム間にＡＮＤ条件が
あれば、ＡＮＤ条件の項に後述するＡＮＤ条件判定テー
ブルの条件ＩＤを登録する。この条件ＩＤは、１から始
まる整数値である。As shown in FIG. 13, the main process of assigning a term ID is to assign a unique number to a search term of a conditional expression stored in a single or a plurality of queues. In addition, it identifies and processes whether the same search term is not used among the conditional expressions, whether there is an AND condition between the terms, and which search terminal requests each search term. In the term ID assignment process, FIG.
The search request storing the terminal identifier and the conditional expression is processed as shown in FIG.
A table, an AND condition determination table, and four tables or lists of an output ID list are generated. The term ID table shows a term ID corresponding to an actual character string. At this time, if there is an AND condition between terms, a condition ID of an AND condition determination table described later is registered in the AND condition item. This condition ID is an integer value starting from 1.

【００４６】タームＩＤテーブルに登録する処理は、検
索要求ごとに行われ、それぞれの検索要求内で指定され
る検索文字列にユニークな番号を付与していく。図１３
の例では、キュー１の検索条件により“計算機”にター
ムＩＤの１が“事務処理”にタームＩＤの２がつけられ
ることになる。以下、検索タームの登録ごとに順番にユ
ニークなＩＤ番号がつけられていく。The process of registering in the term ID table is performed for each search request, and a unique number is assigned to a search character string specified in each search request. FIG.
In the example, the term ID 1 is assigned to “computer” and the term ID 2 is assigned to “business processing” according to the search condition of the queue 1. Hereinafter, a unique ID number is sequentially assigned for each registration of a search term.

【００４７】このようにタームＩＤテーブルにＩＤをつ
けていくとき、既に他の条件式で同じ文字列をタームと
して用いていた場合、例えば図１３の例でキュー３の処
理の場合には、多重照合タームＩＤという特殊なＩＤを
タームＩＤテーブルに登録する。図１３の例では、１０
０１から始まる整数値となっている。そして、多重照合
タームＩＤテーブルに２件分のタームＩＤを登録してお
く。すなわち、図１３の例で具体的に説明すると、ター
ムＩＤテーブルに元々登録されていた“計算機”をあら
わすＩＤ番号１を１００１で置き換え、更に多重照合タ
ームＩＤテーブルに２件分のタームＩＤである１と６を
登録する。この多重照合タームＩＤを付与された文字列
が文字列検索中ヒットした場合には、多重照合タームＩ
Ｄテーブルを参照してそこに登録された複数個のターム
ＩＤを検索結果として出力する。図１３の例では、“計
算機”という文字列が文書中に有ると、タームＩＤ１０
０１が出力されるがこれは多重照合タームＩＤなので、
多重照合タームＩＤテーブルを参照して、結局タームＩ
Ｄの１と６を出力することになる。As described above, when assigning IDs to the term ID table, if the same character string is already used as a term in another conditional expression, for example, in the case of the processing of the queue 3 in the example of FIG. A special ID called a collation term ID is registered in the term ID table. In the example of FIG.
It is an integer value starting from 01. Then, two term IDs are registered in the multiple matching term ID table. In other words, specifically explaining with reference to the example of FIG. 13, the ID number 1 representing “computer” originally registered in the term ID table is replaced with 1001, and two term IDs are included in the multiple matching term ID table. Register 1 and 6. If the character string provided with the multiple matching term ID hits during the character string search, the multiple matching term I
With reference to the D table, a plurality of term IDs registered therein are output as search results. In the example of FIG. 13, if the character string “computer” is present in the document, the term ID 10
01 is output, but since this is a multiple matching term ID,
Referring to the multiple matching term ID table, the term I
D 1 and 6 will be output.

【００４８】ＡＮＤ条件判定テーブルは、ターム間にＡ
ＮＤ条件が有ったときに登録される。例えば、図１３の
キュー１の検索要求を処理するとき、タームＩＤテーブ
ルにタームＩＤとターム文字列を登録し、かつＡＮＤ条
件の項目に条件ＩＤの１を登録する。この条件ＩＤも各
ＡＮＤ条件ごとにユニークな１から始まる整数値であ
る。また、ＡＮＤ条件テーブルには、条件ＩＤとターム
ＩＤ及び出力ＩＤを登録する。出力ＩＤは、タームＩＤ
を含めたユニークな番号となる。図１３の例でキュー１
の処理をしていて、これまでタームＩＤは、１と２が付
与されているので、ＡＮＤ条件テーブルの出力ＩＤには
３が登録されることになる。また、ここでＩＤ番号３が
使用されるため、キュー２の処理では検索ターム“環境
問題”にタームＩＤの４が付与されることになる。The AND condition judgment table indicates that A
Registered when there is an ND condition. For example, when processing the search request of the queue 1 in FIG. 13, the term ID and the term character string are registered in the term ID table, and the condition ID 1 is registered in the AND condition item. This condition ID is also an integer value starting from 1 which is unique for each AND condition. The condition ID, the term ID, and the output ID are registered in the AND condition table. Output ID is term ID
And unique numbers. Queue 1 in the example of FIG.
Since the term IDs 1 and 2 have been assigned so far, 3 is registered in the output ID of the AND condition table. Further, since the ID number 3 is used here, in the processing of the queue 2, the term ID 4 is added to the search term “environmental problem”.

【００４９】ＡＮＤ条件判定テーブルは、図に示すとお
りＡＮＤ条件を持つタームＩＤとそのタームが文書中に
あったことを示すヒットフラグ、及び条件が成立したと
きに出力する出力ＩＤとを項目としている。ヒットフラ
グは最初０に初期化されており、文字列検索処理におい
て、タームが文書中にあったときに１となる。一つの条
件ＩＤに対して全てのヒットフラグが１の場合に出力Ｉ
Ｄを出力する。また、文書単位でヒットフラグは０にリ
セットされる。図１３の例では、“計算機”（多重照合
により、タームＩＤは１及び６）と“事務処理”（ター
ムＩＤは２）の両方の文字列が文書中にあったときに条
件ＩＤが１の全てのヒットフラグが１になるのでＩＤの
３を出力する。As shown in the figure, the AND condition determination table has a term ID having an AND condition, a hit flag indicating that the term is present in a document, and an output ID output when the condition is satisfied. . The hit flag is initially initialized to 0, and becomes 1 when a term is present in a document in the character string search process. Output I when all hit flags are 1 for one condition ID
D is output. The hit flag is reset to 0 for each document. In the example of FIG. 13, when both character strings of “computer” (term IDs are 1 and 6 due to multiple matching) and “business processing” (term ID is 2) are present in the document, the condition ID of 1 Since all the hit flags become 1, ID 3 is output.

【００５０】出力ＩＤリストは、最終結果を出力すると
きのタームＩＤまたは条件式ＩＤを登録したもので、こ
こに登録されなかったＩＤは検索結果として出力しな
い。また、同時に出力するＩＤと該タームあるいは条件
を要求した端末の識別子の対応をとっていて、照合結果
分配プログラムにおいては、この対応をもとに検索結果
を端末毎の出力バッファに振り分ける処理を行う。The output ID list is a list of term IDs or conditional expression IDs used when outputting the final result. IDs not registered here are not output as search results. In addition, the IDs to be output at the same time correspond to the identifiers of the terminals requesting the terms or conditions, and the collation result distribution program performs processing for distributing the search results to the output buffer of each terminal based on this correspondence. .

【００５１】以上のタームＩＤ付与処理のアルゴリズム
をＰＡＤ図を用いて詳細に説明する。図１４に示すよう
に、この処理は待ちキューバッファより取り出した検索
要求の個数分同一の処理を繰り返す。最初に検索要求を
出した端末識別子をキューより取得する。次に、条件式
の種別に従ってそれぞれの処理を行う。条件式の種別と
は、ターム間がどの様な論理条件で結ばれているかを区
別するもので、本実施例ではＡＮＤ，ＯＲ及び検索ター
ムが単一で用いられていることを示す単純条件の３種類
を用いている。ＡＮＤ条件のときは、条件式中のそれぞ
れのタームについて後述のタームＩＤ登録、タームＩＤ
テーブルへのＡＮＤ条件ＩＤの設定、及びＡＮＤ条件テ
ーブルの設定を行なった後、出力ＩＤリストに端末識別
子と出力ＩＤを登録する。ＯＲ条件のときは、それぞれ
のタームについてタームＩＤ登録をした後に、各ターム
ＩＤを出力ＩＤリストに端末識別子とともに登録する。
単純条件のときは、タームＩＤ登録と出力ＩＤリストへ
の登録を行なう。The algorithm of the above-described term ID assignment processing will be described in detail with reference to a PAD diagram. As shown in FIG. 14, this process repeats the same process for the number of search requests retrieved from the waiting queue buffer. First, the terminal identifier that issued the search request is obtained from the queue. Next, each process is performed according to the type of the conditional expression. The type of conditional expression is used to distinguish what logical condition is connected between terms. In the present embodiment, a simple condition indicating that AND, OR, and a search term are used alone is used. Three types are used. In the case of an AND condition, for each term in the conditional expression, a term ID registration,
After setting the AND condition ID in the table and setting the AND condition table, the terminal identifier and the output ID are registered in the output ID list. In the case of the OR condition, after registering the term ID for each term, each term ID is registered in the output ID list together with the terminal identifier.
In the case of simple conditions, term ID registration and registration in the output ID list are performed.

【００５２】次にタームＩＤの登録アルゴリズムの説明
をする。この処理は検索タームの文字列をタームＩＤテ
ーブルに登録するのが主な処理であるが、この時登録し
ようとする文字列が既に登録されているかチェックし、
登録されている場合には多重照合タームＩＤテーブルへ
の設定を行なう。すなわち、図１５に示すように、まず
登録しようとする検索タームの文字列と同一の文字列が
タームＩＤテーブルに登録されているかチェックする。
もし、新規であれば新しいタームＩＤとともに文字列を
タームＩＤテーブルに登録する。既に同じ文字列が存在
する場合には、既存タームのタームＩＤを多重照合ター
ムＩＤに変更し、多重照合タームＩＤテーブルに新規タ
ームとともにタームＩＤを設定する。このとき、既登録
のタームにＡＮＤ条件が設定されていた場合、タームＩ
Ｄテーブルに設定されていたＡＮＤ条件ＩＤを０にクリ
アし、多重照合タームＩＤテーブルの方へ設定する。図
１３の例では、最初“計算機”はタームＩＤテーブルに
ＡＮＤ条件ＩＤとともに登録されていたが、キュー３の
処理のとき、多重照合タームＩＤ１００１に変更し、こ
の時同時にＡＮＤ条件ＩＤも多重照合タームＩＤテーブ
ルの方へ設定し直している。Next, the term ID registration algorithm will be described. This processing is mainly to register the character string of the search term in the term ID table. At this time, it is checked whether the character string to be registered is already registered,
If it is registered, the setting is made in the multiple matching term ID table. That is, as shown in FIG. 15, first, it is checked whether the same character string as the character string of the search term to be registered is registered in the term ID table.
If new, the character string is registered in the term ID table together with the new term ID. If the same character string already exists, the term ID of the existing term is changed to the multiple matching term ID, and the term ID is set in the multiple matching term ID table together with the new term. At this time, if an AND condition is set for a registered term, term I
The AND condition ID set in the D table is cleared to 0, and is set to the multiple matching term ID table. In the example of FIG. 13, “computer” is initially registered in the term ID table together with the AND condition ID. The ID table has been reset.

【００５３】最後に検索結果の分配処理について詳細に
説明する。図１６は、本実施例で用いるテキストデータ
の形態と論理条件判定後のデータの形態及びセッション
ｎ用出力バッファに格納されるデータの形態を示してい
る。テキストデータは、文書を特定するための文書ＩＤ
のｄｘｘｘを先頭として複数文書が列になって格納され
ている。図１６の例では、ｄ００１からｄ００４までの
テキストデータが並んでいることを示している。このテ
キストデータを文字列判定処理、論理条件判定処理を行
なって文字列の有った文書番号とヒットしたタームＩＤ
を出力する。この時は、各要求元から来た条件は全て一
緒に出力される。この論理条件判定処理後のデータを図
１３に示した出力ＩＤリストをもとに各要求元に対応す
るセッションｎ用出力バッファへ結果の文書ＩＤを振り
分ける。すなわち、論理条件判定の出力するＩＤの２と
３の出現した文書はセッション３用出力バッファへ、Ｉ
Ｄ５はセッション５用出力バッファへ、ＩＤ６はセッシ
ョン１用出力バッファへそれぞれ文書ＩＤを出力するれ
ばよい。Lastly, the process of distributing search results will be described in detail. FIG. 16 shows the form of text data used in this embodiment, the form of data after logical condition determination, and the form of data stored in the session n output buffer. The text data is a document ID for identifying the document
A plurality of documents are stored in a row starting with dxxx. The example in FIG. 16 shows that text data from d001 to d004 are arranged. This text data is subjected to character string judgment processing and logical condition judgment processing, and the document ID having the character string and the term ID that has been hit
Is output. At this time, all the conditions from each request source are output together. Based on the output ID list shown in FIG. 13, the data after the logical condition determination processing is assigned the resulting document ID to the session n output buffer corresponding to each request source. That is, the document in which the IDs 2 and 3 output by the logical condition determination appear in the session 3 output buffer,
D5 may be output to the session 5 output buffer, and ID6 may be output to the session 1 output buffer.

【００５４】以上、ＣＰＵ１１０の処理について詳細に
説明した。次に、磁気ディスク１２０に複数のデータベ
ースが格納されている時の処理について説明する。デー
タベースは図１７に示すように、磁気ディスク１２０上
にデータベース毎にまとまって格納されているものとす
る。The processing of the CPU 110 has been described above in detail. Next, processing when a plurality of databases are stored on the magnetic disk 120 will be described. It is assumed that the databases are collectively stored on the magnetic disk 120 for each database as shown in FIG.

【００５５】この様に複数個のデータベースを格納し
て、各要求元が対象とするデータベースを指定して検索
要求を出す場合、制御プログラムがキューバッファに溜
った検索要求を一括して処理しても処理時間の短縮につ
ながらない場合がある。つまり、各要求が別々のデータ
ベースに対するものであった時である。例えば、要求元
１がデータベース１を検索対称に指定し、要求元２がデ
ータベース２を検索対象に指定した検索要求がキューバ
ッファに溜っていた時、これらの検索要求を統一して、
データベース１と２の両方を一括して探索しても、別々
に探索してもトータルのスキャン量は変わらない。すな
わち、図１７の例では磁気ディスク上の論理アドレス０
から３５００番地までをスキャンする量は変らないので
ある。When a plurality of databases are stored in this way and each requester issues a search request by specifying a target database, the control program collectively processes the search requests accumulated in the queue buffer. May not lead to a reduction in processing time. That is, when each request was to a separate database. For example, when the request source 1 specifies the database 1 as a search target, and the request source 2 specifies the database 2 as a search target, the search requests are accumulated in the queue buffer.
The total scan amount does not change when both the databases 1 and 2 are searched collectively or separately. That is, in the example of FIG.
The amount of scanning from address 3500 to address 3500 does not change.

【００５６】これに対し、要求元１と２の両方が同じデ
ータベース２を対象に指定しているときには、検索要求
を統一して一括して探索するとスキャンする量が半分に
なることがわかる。すなわち、検索要求を一つずつ別々
に処理すると、論理アドレス２１００番地から３５００
番地を２度スキャンするのに対して、統一して処理すれ
ば１度のスキャンで探索処理が終了できることがわか
る。On the other hand, when both the request sources 1 and 2 specify the same database 2 as a target, it can be seen that if the search requests are unified and searched collectively, the amount of scanning becomes half. That is, when the search requests are processed individually one by one, the logical addresses 2100 to 3500
It can be seen that the search process can be completed by one scan if the address is scanned twice, but the process is performed in a unified manner.

【００５７】この様に、情報検索装置が複数のデータベ
ースを格納する場合には、各要求の検索対象データベー
スを考慮し、探索範囲を同じくする要求にたいして、検
索要求を一括処理することによって、スキャン量を減ら
し、待ち時間の少ない処理を実現することができる。As described above, when the information search device stores a plurality of databases, the search amount is processed in a lump in response to requests having the same search range in consideration of the search target database of each request. And a process with a short waiting time can be realized.

【００５８】このために、検索装置側では各検索端末か
ら送られてくる検索要求をキューバッファに格納する際
に、どのデータベースが対象かを識別するためのデータ
ベースｉｄも同時にキューの中に格納する。この時のキ
ューバッファの構造は、図１８のようになる。すなわ
ち、端末識別子、データベースｉｄ、条件式をキューと
して格納することになる。そして、このキューバッファ
から、溜っている検索要求を取り出して統合処理すると
きに、データベースｉｄが同一の検索要求のみを選択す
ることで、前述の効果的な一括検索処理が可能となる。For this reason, when the search device stores a search request sent from each search terminal in the queue buffer, a database id for identifying which database is the target is also stored in the queue at the same time. . The structure of the queue buffer at this time is as shown in FIG. That is, the terminal identifier, the database id, and the conditional expression are stored as a queue. Then, when the accumulated search requests are taken out from the queue buffer and integrated, the database id selects only the same search request, thereby enabling the above-described effective batch search process.

【００５９】このデータベース選択の検索処理を具体的
に説明する。図１９は図１２で説明した文字列探索プロ
グラムの複数データベースを有するときの処理を示すＰ
ＡＤ図である。処理の違いはキューバッファからのキュ
ーの取り出しで、それ以外の処理は既に説明したので省
略する。今、図１８の様にキューバッファに検索要求が
溜っている場合、まずキュー１をバッファから取り出し
て、データベースｉｄを参照し、バッファ内のデータベ
ースｉｄが２であるキュー、すなわちキュー１と対象が
同一であるものだけをキューバッファから取り出す。こ
の例では、キュー３とキュー４が該当するため、キュー
２だけをバッファに残し、残りの検索要求は統合され一
括処理される。この様にして、検索要求を統合し、デー
タベース２を一度スキャンした後、端末識別子１，２，
５にそれぞれ検索結果を分配し検索処理を終える。この
時、キューバッファにはまだキュー２が残っているの
で、次にこの検索要求がバッファから取り出され、デー
タベース３について検索処理を始めることになる。The search process for selecting a database will be specifically described. FIG. 19 is a flowchart showing processing when the character string search program described in FIG. 12 has a plurality of databases.
It is an AD diagram. The difference between the processes is the removal of the queue from the queue buffer, and the other processes have already been described and will not be described. When search requests are accumulated in the queue buffer as shown in FIG. 18, first, the queue 1 is taken out from the buffer and the database id is referred to, and the queue whose database id is 2 in the buffer, that is, the queue 1 and the target are Retrieve only the identical ones from the queue buffer. In this example, since queue 3 and queue 4 correspond, only queue 2 is left in the buffer, and the remaining search requests are integrated and processed collectively. In this way, the search requests are integrated, the database 2 is scanned once, and then the terminal identifiers 1, 2,
5 and the search results are distributed, and the search process is completed. At this time, since the queue 2 still remains in the queue buffer, the search request is fetched from the buffer and the search process for the database 3 is started.

【００６０】この様に、複数のデータベースに対して要
求が送られてくる場合でも、複数の検索要求を一括して
処理することが可能で、待ち時間の少ない情報検索装置
を提供することができる。As described above, even when a request is sent to a plurality of databases, a plurality of search requests can be processed collectively, and an information search apparatus with a short waiting time can be provided. .

【００６１】以上、単一あるいは複数の検索要求が有る
場合の文字列検索処理について説明した。このようにし
て得られた検索結果である文書の集合は、図２０のよう
にして格納され、以後の検索の際にベース集合として用
いたり、あるいは集合間のＯＲ演算をしたりして用いら
れる。図２０の例では、検索結果集合である文書ＩＤの
列を文書ＩＤの位置を１とし、ヒットしていない文書Ｉ
Ｄの位置を０とした検索結果ビットリストで表現してい
る。ビットリストで表現した理由は、集合間のＡＮＤ，
ＯＲ処理を行なうのにビット間のＡＮＤ，ＯＲ演算で処
理できるという利点が有るからである。このビットリス
トを検索回数分蓄えて管理しているのが、結果集合管理
テーブルである。この結果集合管理テーブルは、接続し
ている端末の個数分用意され、それぞれに各端末の要求
毎の結果集合が格納されている。以下、この結果集合管
理テーブルを用いたハイアラーキ検索の処理について説
明する。The character string search processing when there is a single or a plurality of search requests has been described above. The set of documents as search results obtained in this way is stored as shown in FIG. 20, and is used as a base set in the subsequent search, or used by performing an OR operation between the sets. . In the example of FIG. 20, the position of the document ID is set to 1 in the column of the document ID which is the search result set,
It is represented by a search result bit list in which the position of D is 0. The reason expressed by the bit list is that AND between sets,
This is because there is an advantage that the OR processing can be performed by AND and OR operations between bits. The result set management table stores and manages this bit list for the number of searches. This result set management table is prepared for the number of connected terminals, and the result set for each request of each terminal is stored in each of the tables. Hereinafter, the process of the hierarchy search using the result set management table will be described.

【００６２】ハイアラーキ検索とは、図２１に示すよう
に、結果集合管理テーブルで管理する過去の結果集合を
検索範囲として、検索する処理のことで、絞り込み検索と
も呼ぶ。すなわち、過去（通常直前の検索結果）の検索
結果である文書ＩＤの列を結果集合管理テーブルのビッ
トリストから求め（これをベースと呼ぶ）、新たな条件
式（図２１の例では、“学習型ユーザインタフェー
ス”）でテキストデータを探索する。この時、テキスト
データのベース以外のデータは読み飛ばす処理を行な
う。例えば、図２１の例ではベース以外のテキストデー
タｄ００１〜ｄ００２及びｄ００４〜ｄ００７などは不
要テキストとして、文字列の探索処理を行なわない。す
なわち、ｄ００３、ｄ００８、ｄ０１５などのテキスト
データのみを“学習型インターフェイス”で文字列探索
を行なう。結果として、ベースのテキストデータに条件
の検索タームが存在した文書ＩＤｄ００８、ｄ０１６
が出力される。これが絞り込み処理である。複数の検索
要求が待ちキューバッファに溜っている場合にこのハイ
アラーキ検索を実現する方法を次に説明する。検索要求
のあった端末が異なる場合、当然ベースも異なることに
なる。従ってベースをどの様に設定するか、また結果集
合をどのように分配するかが問題となってくる。文字列
の照合動作は、テキストデータをスキャンするという正
確上できるだけ少なくすることが望まれる。そこで、図
２２のように、各要求元の結果集合管理テーブルからベ
ースのビットリストを読みだし、それらのビットＯＲを
複数要求受付時のベースとする。そして、絞り込んだ後
の結果集合をまたビットリストに戻し、各要求元のベー
スビットリストとＡＮＤ演算する。最後のＡＮＤ演算
は、要求元のベース集合以外の他の要求元のベースでの
条件式の照合によるノイズを取り去るために必要とな
る。As shown in FIG. 21, the hierarchy search is a process of searching using a past result set managed by a result set management table as a search range, and is also called a narrowing search. That is, a column of document IDs, which are past (normally immediately preceding search results), is obtained from the bit list of the result set management table (this is called a base), and a new conditional expression (in the example of FIG. Search for text data in the type user interface "). At this time, data other than the base of the text data is skipped. For example, in the example of FIG. 21, the text data d001 to d002 and d004 to d007 other than the base are regarded as unnecessary texts and the character string search processing is not performed. You
That is, text such as d003, d008, d015
Character string search for data only with “learning interface”
Perform As a result, the document IDs d008 and d016 in which the search term of the condition exists in the base text data
Is output. This is the narrowing down process. Next, a method for implementing the hierarchy search when a plurality of search requests are accumulated in the waiting queue buffer will be described. If the terminal requesting the search is different, the base is naturally different. Therefore, how to set the base and how to distribute the result set become problems. It is desirable that the collation operation of the character string be as small as possible in terms of scanning the text data. Therefore, as shown in FIG. 22, the base bit list is read from the result set management table of each request source, and those bit ORs are used as the base when accepting a plurality of requests. Then, the result set after the narrowing down is returned to the bit list, and the AND operation is performed on the base bit list of each request source. The final AND operation is necessary to remove noise due to the matching of the conditional expression on the base of the requester other than the base set of the requester.

【００６３】以上ハイアラーキ検索機能の実現処理を含
め、第一の実施例の説明をした。本実施例は、特別なハ
ードウエアを必要とせず、単一のＣＰＵで複数のフルテ
キストサーチ要求を処理するのに効果がある。The first embodiment has been described above, including the processing for realizing the hierarchy search function. This embodiment does not require special hardware, and is effective in processing a plurality of full-text search requests with a single CPU.

【００６４】本実施例では、各端末からユーザがオンラ
インで検索要求を発信することを仮定に説明したが、オ
フラインで指定された検索要求に対しても同様の処理で
複数の検索処理を一括して処理することができる。これ
は、例えばバッチジョブのように何件もの検索要求を一
台の検索装置で処理する場合に、従来は一件ずつＤＢの
テキストデータをスキャンして結果を出力していたもの
が、一回のスキャンですべての要求を処理できることを
意味する。つまり、本実施例によればオンライン、オフ
ラインを問わず高速な検索要求の処理が可能となる。Although the present embodiment has been described on the assumption that the user sends a search request online from each terminal, a plurality of search processes are collectively performed in the same manner for a search request specified offline. Can be processed. This is because, when many search requests are processed by a single search device, such as a batch job, the conventional technique is to scan the text data of the DB one by one and output the result. Means that all requests can be processed in a single scan. That is, according to the present embodiment, it is possible to process a search request at high speed regardless of online or offline.

【００６５】これより、第二の実施例の説明を行なう。
本実施例は、文字列探索処理のための特別なハードウエ
アあるいは探索処理専門のＣＰＵを有する複数個のＣＰ
Ｕで実現することに特徴がある。A description will now be given of the second embodiment.
In this embodiment, a plurality of CPs having special hardware for character string search processing or a CPU dedicated to search processing are used.
The feature is that it is realized by U.

【００６６】図２３は第二の実施例の構成を示す図であ
る。本実施例は、検索端末１３００、検索処理を実行す
るＣＰＵ１３１０、制御プログラムと各種のデータバッ
ファを格納するメモリ１３２０、ＣＰＵ１３１０に接続
し実際の文字列探索処理を制御するためのＣＰＵ１３３
０、文字列照合処理を専門に処理する文字列照合部１３
４０、データベースを格納する磁気ディスクの書込み読
出しを制御するディスク制御部１３５０、磁気ディスク
１３６０、文字列照合処理を制御するプログラムと各種
データエリアを格納するメモリ１３７０とからなってい
る。１３３０〜１３７０はＴＳＭという一つの処理ブロ
ックを形成してＣＰＵ１３１０に複数個接続する。メモ
リ１３２０には、ＴＳＭ及び端末を制御する制御プログ
ラムが格納されている。また、同メモリは、各ＴＳＭが
持つデータベースを管理するＤＢ−ＴＳＭ対応テーブ
ル、各ＴＳＭ用の待ちキューバッファ、接続された端末
用のセッションｎ用出力バッファとしても使っている。
メモリ１３７０には、文字列照合処理を制御する制御プ
ログラム、ターム間のＡＮＤ，ＯＲ条件を判定する論理
条件判定プログラム、複数の端末からの検索要求を同時
に受け付けたときに検索結果集合を分配する照合結果分
配プログラムが格納してある。また、同メモリは照合結
果を一時蓄える照合結果出力バッファ、各接続端末毎に
結果集合間利テーブルの格納領域としても用いる。これ
より、検索処理の流れに従って、各プログラムの処理の
詳細を説明する。FIG. 23 is a diagram showing the configuration of the second embodiment. In this embodiment, a search terminal 1300, a CPU 1310 for executing a search process, a memory 1320 for storing a control program and various data buffers, and a CPU 133 connected to the CPU 1310 for controlling an actual character string search process
0, character string matching unit 13 that specializes in character string matching processing
40, a disk controller 1350 for controlling writing and reading of a magnetic disk for storing a database, a magnetic disk 1360, a program for controlling a character string collation process, and a memory 1370 for storing various data areas. 1330 to 1370 form one processing block called TSM and are connected to a plurality of CPUs 1310. The memory 1320 stores a control program for controlling the TSM and the terminal. The memory is also used as a DB-TSM correspondence table for managing a database of each TSM, a waiting queue buffer for each TSM, and an output buffer for session n for a connected terminal.
The memory 1370 includes a control program for controlling a character string collation process, a logical condition determination program for determining AND and OR conditions between terms, and a collation for distributing a search result set when search requests from a plurality of terminals are simultaneously received. A result distribution program is stored. The memory is also used as a collation result output buffer for temporarily storing collation results, and as a storage area for a result set inter-use table for each connection terminal. Hereinafter, the processing of each program will be described in detail according to the flow of the search processing.

【００６７】端末側での検索処理の流れは、第一の実施
例と同一である。第一の実施例と異なる点は、ＣＰＵ１
３１０，１３３０の制御プログラムの流れであるので、
これらについて詳しく説明する。図２４はＣＰＵ１３１
０の処理の流れを示すＰＡＤ図である。基本的にＣＰＵ
１３１０は端末からの要求を待っているが、割込みによ
りＴＳＭからの検索処理終了信号も処理する。まず、端
末からの要求の処理について説明し、次にＴＳＭからの
終了信号の処理について説明する。The flow of the search process on the terminal side is the same as in the first embodiment. The difference from the first embodiment is that the CPU 1
Since the flow of the control program is 310, 1330,
These will be described in detail. FIG. 24 shows the CPU 131.
FIG. 11 is a PAD diagram showing the flow of processing of No. 0. Basically CPU
1310 is waiting for a request from the terminal, but also processes a search processing end signal from the TSM by interruption. First, processing of a request from a terminal will be described, and then processing of an end signal from the TSM will be described.

【００６８】端末から接続要求があった場合には、要求
の検索対象となるデータベースからＤＢ−ＴＳＭ対応テ
ーブルを参照し、対応するＴＳＭへ接続要求を発行す
る。その後、セッションｎ用出力バッファの領域をメモ
リ１３２０に確保する。解放要求の場合は、ＴＳＭへ解
放要求を出し、接続時に確保したセッションｎ用出力バ
ッファの領域を解放する。検索要求の場合は、対応ＴＳ
Ｍの待ちキューバッファへ端末識別子と共に検索要求を
登録する。文書データ表示要求の場合は、対応ＴＳＭへ
データ表示要求を出し、データが出力バッファに格納さ
れるのを待って要求元端末へデータを送出する。ＴＳＭ
からの検索処理終了割込みを受け付けた場合、検索要求
元の端末が指定したデータベースを担当する全てのＴＳ
Ｍの処理が終わったかを判定し、全てのＴＳＭの検索処
理が終了している場合にかぎり出力バッファからの検索
データの送出を行う。それ以外の場合は、終了割込みが
あっても、これを無視して全てのＴＳＭの処理が終了す
るまで待っている。When there is a connection request from the terminal, the connection request is issued to the corresponding TSM by referring to the DB-TSM correspondence table from the database to be searched for the request. After that, the area of the output buffer for session n is secured in the memory 1320. In the case of a release request, a release request is issued to the TSM to release the area of the output buffer for session n secured at the time of connection. For a search request, the corresponding TS
The search request is registered together with the terminal identifier in the M waiting queue buffer. In the case of a document data display request, a data display request is issued to the corresponding TSM, and the data is sent to the requesting terminal after the data is stored in the output buffer. TSM
When the search processing end interrupt is received from the server, all the TSs in charge of the database specified by the search requesting terminal are received.
It is determined whether or not the processing of M has been completed, and the search data is transmitted from the output buffer only when the search processing of all the TSMs has been completed. In other cases, even if there is a termination interrupt, it ignores the interruption and waits until all TSM processes are completed.

【００６９】ここで用いるＤＢ−ＴＳＭ対応テーブルは
図２５のような構造をしている。項目として、ＤＢ名称
とそのＤＢを格納しているＴＳＭの番号の対応表となっ
ていて、接続要求時、端末からの検索対象データベース
名称からこのテーブルを参照して、対応するＴＳＭへ接
続要求を出す。例えば、新聞記事を検索対象とする端末
からの接続要求が来た場合には、ＴＳＭ１と２に接続要
求を出し、以後該端末からの検索要求はＴＳＭ１と２に
対して行われる。The DB-TSM correspondence table used here has a structure as shown in FIG. An item is a correspondence table of the DB name and the number of the TSM storing the DB. When a connection is requested, the terminal refers to this table from the name of the search target database from the terminal, and issues a connection request to the corresponding TSM. put out. For example, when a connection request is received from a terminal whose search target is a newspaper article, a connection request is issued to TSM1 and TSM2. Thereafter, a search request from the terminal is made to TSM1 and TSM2.

【００７０】キューバッファの構造は第一の実施例と同
様であるが、ＴＳＭの数だけメモリ１３２０に用意され
ており、端末からの検索要求にしたがって対応するＴＳ
Ｍ用のキューバッファに端末識別子と共に登録される。
例えば、新聞記事ＤＢを検索対象とする端末からの検索
要求では、ＴＳＭ１と２用のキューバッファにそれぞれ
検索要求が登録される。The structure of the queue buffer is the same as that of the first embodiment, but the same number of TSMs are provided in the memory 1320, and the corresponding TSs are provided according to a search request from the terminal.
It is registered in the M queue buffer together with the terminal identifier.
For example, in a search request from a terminal that searches the newspaper article DB, the search request is registered in the queue buffers for TSM1 and TSM2.

【００７１】次にＣＰＵ１３３０の処理について説明す
る。このＴＳＭに設けられたＣＰＵでは、上位のＣＰＵ
１３１０からの要求を待っている。図２６に示すよう
に、接続の要求があったときには、接続された端末用の
結果集合管理テーブルの領域を確保する。また、開放要
求の場合には、対応する端末用の結果集合管理テーブル
の領域を開放する。文書データ表示の要求では、送られ
て来た文書ＩＤに対応するテキストデータをディスク制
御部を駆動して、磁気ディスクより取り出し、上位ＣＰ
Ｕ１３１０の管理するセッションｎ用出力バッファへデ
ータを送出する。この時の、送り先であるセッションｎ
用出力バッファとは、勿論データ表示要求元の端末に対
応する出力バッファを示す。Next, the processing of the CPU 1330 will be described. In the CPU provided in this TSM, the upper CPU
Waiting for request from 1310. As shown in FIG. 26, when there is a connection request, an area of the result set management table for the connected terminal is secured. In the case of a release request, the area of the result set management table for the corresponding terminal is released. In response to the document data display request, the disk control unit drives the disk control unit to retrieve the text data corresponding to the sent document ID from the magnetic disk.
The data is sent to the session n output buffer managed by U1310. At this time, the session n which is the destination
The output buffer for use is, of course, an output buffer corresponding to the terminal that has requested data display.

【００７２】検索要求の場合は、第一の実施例と同様
に、まず待ちキューバッファに蓄えられた検索要求の数
をチェックし、１個なら単一キュー取り出し、複数個な
らばキューを同時に取り出して、タームＩＤ付与処理を
行う。この後、文字列照合部とディスク制御部を駆動
し、条件式中の検索タームを探索する文字列探索を行
う。そして、ターム間の論理条件判定を行って条件式に
合致する文書ＩＤを照合結果出力バッファに出力した
後、端末識別子を基に照合結果を各要求元別に分配し
て、上位ＣＰＵの管理するセッションｎ用出力バッファ
へ出力する。この時、結果集合は要求元に対応する結果
集合管理テーブルへも格納し、ハイアラーキサーチ等で
次の検索のために利用する。最後に、上位ＣＰＵ１３１
０へ担当分の検索処理が終わったことを報告して、全て
の検索処理が終わり、また上位ＣＰＵからの要求を待
つ。In the case of a search request, as in the first embodiment, first, the number of search requests stored in the waiting queue buffer is checked. Then, a term ID assignment process is performed. Thereafter, the character string collating unit and the disk control unit are driven to perform a character string search for searching for a search term in the conditional expression. Then, after performing a logical condition determination between terms and outputting a document ID matching the conditional expression to the matching result output buffer, the matching result is distributed to each request source based on the terminal identifier, and a session managed by the upper CPU is performed. Output to the output buffer for n. At this time, the result set is also stored in the result set management table corresponding to the request source, and is used for the next search by hierarchy search or the like. Finally, the upper CPU 131
It reports to 0 that the search process for the charge is completed, all search processes are completed, and the system waits for a request from the upper CPU.

【００７３】以上、第二の実施例を説明した。本実施例
によれば、文字列探索処理のための特別なハードウエア
あるいは探索処理専門のＣＰＵを有するため、複数のＴ
ＳＭで並列にフルテキストサーチを実行することができ
る。The second embodiment has been described above. According to this embodiment, since special hardware for a character string search process or a CPU dedicated to a search process is provided, a plurality of T
A full-text search can be performed in parallel with the SM.

【００７４】また、この第二の実施例では、処理ブロッ
クＴＳＭが複数個ＣＰＵに接続し、同ＣＰＵが検索端末
のあるＬＡＮに接続する構成としたが、図２７に示すよ
うにＴＳＭを検索端末と同一のＬＡＮに接続する構成で
も同じ処理を行うことが可能である。Further, in the second embodiment, the processing block TSM is connected to a plurality of CPUs, and the CPU is connected to a LAN having a search terminal. However, as shown in FIG. The same processing can be performed in a configuration in which the same LAN is connected.

【００７５】さらにまた、検索端末からの接続要求によ
り確保されるセッションｎ用結果集合管理テーブルとセ
ッションｎ用出力バッファを、同一端末からの別々の接
続要求で個々に確保することで、同一端末上の異なる検
索セッションをそれぞれ管理することもできる。Furthermore, the result set management table for session n and the output buffer for session n secured by the connection request from the search terminal are individually secured by separate connection requests from the same terminal, so that the same terminal can be used. You can also manage different search sessions.

【００７６】[0076]

【発明の効果】本発明によれば、ＣＰＵ占有時間の多い
フルテキストサーチ処理を複数の端末をつなげて処理す
ることができる。すなわち、複数端末から同時に検索処
理要求が来たときでも、後から要求のあった端末を待た
せることなく、一回のテキストスキャンで複数個の条件
式を一括で処理することが可能となる。According to the present invention, a full-text search process with a long CPU occupation time can be processed by connecting a plurality of terminals. That is, even when a search processing request is received from a plurality of terminals at the same time, a plurality of conditional expressions can be collectively processed by one text scan without having to wait for the terminal that has made the request later.

【００７７】待ちキューバッファを設け、端末から来た
要求を逐次登録しておくことにより、検索処理中に入っ
て来た要求は、現在の検索処理終了後に、複数の要求を
まとめて処理することができるようになる。By providing a waiting queue buffer and sequentially registering requests from terminals, a request coming in during a search process can be processed as a group of requests after the current search process is completed. Will be able to

【００７８】まとめて処理された検索結果は、照合結果
分配プログラムと、セッションｎ用出力バッファにより
要求元別に分配され、それぞれの要求元へ送出すること
ができる。The search results processed together are distributed for each request source by the collation result distribution program and the output buffer for session n, and can be sent to each request source.

【００７９】オフラインで複数件の検索要求を処理する
場合においても、同様の処理を行うことで、複数の検索
要求を一括して処理することができるので、一件単位で
検索要求を処理するよりも効率的に検索要求を消化する
ことができるようになる。Even when a plurality of search requests are processed offline, a plurality of search requests can be processed collectively by performing the same processing. It is also possible to efficiently digest search requests.

【００８０】また、要求元別に結果集合を蓄える結果集
合管理テーブルを持ち、ハイアラーキサーチのときに、
第一に要求元分のベース集合のＯＲ集合を検索対象とし
て検索処理し、第二に得られた照合結果を要求元別に分
配し、第三に分配した結果集合に対して各要求元のベー
ス集合とＡＮＤすることで、複数の要求元から来る検索
要求を一括して処理することが可能となる。Further, a result set management table for storing a result set for each request source is provided.
First, search processing is performed using the OR set of the base set for the request source as a search target, second, the obtained collation results are distributed for each request source, and third, the base of each request source is assigned to the distributed result set. By ANDing with a set, it becomes possible to collectively process search requests coming from a plurality of request sources.

【００８１】さらに、端末からの要求を受け付ける上位
ＣＰＵとそれにつながる複数個の文字列照合を担当する
下位ＣＰＵとを有することにより、テキストスキャン処
理を複数台で並列に行うことができ、高速なフルテキス
トサーチが可能となる。Further, by having an upper CPU that receives a request from the terminal and a lower CPU that is in charge of collating a plurality of character strings connected to the terminal, text scan processing can be performed in parallel by a plurality of units, and high-speed full scan can be performed. Text search becomes possible.

[Brief description of the drawings]

【図１】キューバッファを用いた複数要求受け付け処理
を示す概念図である。FIG. 1 is a conceptual diagram showing a multiple request acceptance process using a queue buffer.

【図２】複数の検索要求の一括処理を示す概念図であ
る。FIG. 2 is a conceptual diagram illustrating batch processing of a plurality of search requests.

【図３】マルチユーザでのハイアラーキ検索の処理を示
す説明図である。FIG. 3 is an explanatory diagram showing a process of a hierarchical search by a multi-user.

【図４】マルチユーザでのハイアラーキ検索の効率的な
処理を示す説明図である。FIG. 4 is an explanatory diagram showing efficient processing of a hierarchical search by a multi-user.

【図５】一括検索処理において検索ノイズが発生するこ
とを示す説明図である。FIG. 5 is an explanatory diagram showing that search noise occurs in a batch search process.

【図６】一括検索後の分配処理を示す概念図である。FIG. 6 is a conceptual diagram showing a distribution process after a batch search.

【図７】一括検索後の分配処理が正しい結果を出力する
ことを示す集合関係図である。FIG. 7 is a set relationship diagram showing that distribution processing after a batch search outputs correct results.

【図８】第一の実施例の構成図である。FIG. 8 is a configuration diagram of the first embodiment.

【図９】本実施例における検索端末側の検索業務の流れ
を示すＰＡＤ図である。FIG. 9 is a PAD diagram showing a flow of a search operation on the search terminal side in the embodiment.

【図１０】検索装置の制御プログラムの流れを示すＰＡ
Ｄ図である。FIG. 10 is a PA showing a flow of a control program of the search device.
FIG.

【図１１】待ちキューバッファの構造を示す概念図であ
る。FIG. 11 is a conceptual diagram showing the structure of a wait queue buffer.

【図１２】文字列探索プログラムの処理を示すＰＡＤ図
である。FIG. 12 is a PAD diagram showing processing of a character string search program.

【図１３】タームＩＤ付与処理を示す概念図である。FIG. 13 is a conceptual diagram showing a term ID assignment process.

【図１４】タームＩＤ付与処理を示すＰＡＤ図である。FIG. 14 is a PAD diagram showing a term ID assignment process.

【図１５】タームＩＤの登録処理を示すＰＡＤ図であ
る。FIG. 15 is a PAD diagram showing a term ID registration process.

【図１６】検索結果の分配処理を示す概念図である。FIG. 16 is a conceptual diagram showing a search result distribution process.

【図１７】データベースの格納形式を示す概念図であ
る。FIG. 17 is a conceptual diagram showing a storage format of a database.

【図１８】待ちキューバッファの構造を示す概念図であ
る。FIG. 18 is a conceptual diagram showing the structure of a waiting queue buffer.

【図１９】複数のデータベースを有するときの文字列探
索プログラムの処理を示すＰＡＤ図である。FIG. 19 is a PAD diagram showing processing of a character string search program when there are a plurality of databases.

【図２０】検索結果格納方法を示す概念図である。FIG. 20 is a conceptual diagram showing a search result storage method.

【図２１】ハイアラーキ検索の原理を示す概念図であ
る。FIG. 21 is a conceptual diagram showing the principle of a hierarchy search.

【図２２】複数要求受付け時のハイアラーキ検索の方法
を示す概念図である。FIG. 22 is a conceptual diagram showing a hierarchical search method when a plurality of requests are received.

【図２３】第二の実施例の構成図である。FIG. 23 is a configuration diagram of a second embodiment.

【図２４】上位ＣＰＵの制御プログラムの処理を示すＰ
ＡＤ図である。FIG. 24 is a diagram showing P indicating the processing of the control program of the upper CPU.
It is an AD diagram.

【図２５】ＤＢ−ＴＳＭ対応テーブルの形態を示す図で
ある。FIG. 25 is a diagram showing a form of a DB-TSM correspondence table.

【図２６】下位ＣＰＵの制御プログラムの処理を示すＰ
ＡＤ図である。FIG. 26 is a diagram illustrating a process P of the control program of the lower CPU.
It is an AD diagram.

【図２７】検索端末と検索処理ブロックを同一のＬＡＮ
で接続した構成図である。FIG. 27: Search terminal and search processing block on the same LAN
FIG.

[Explanation of symbols]

１００…検索端末、１１０…ＣＰＵ，１２０…磁気ディ
スク、１３０…メモリ。100: search terminal; 110: CPU; 120: magnetic disk; 130: memory.

───────────────────────────────────────────────────── フロントページの続き (72)発明者川口久光東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内 (56)参考文献特開平１−149125（ＪＰ，Ａ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) G06F 17/30 340 G06F 9/46 340 ＪＩＣＳＴファイル（ＪＯＩＳ)────────────────────────────────────────────────── ─── Continuation of the front page (72) Inventor Hisamitsu Kawaguchi 1-280 Higashi Koigakubo, Kokubunji-shi, Tokyo Inside Central Research Laboratory, Hitachi, Ltd. (56) References JP-A-1-149125 (JP, A) (58) Survey Field (Int.Cl. ⁷ , DB name) G06F 17/30 340 G06F 9/46 340 JICST file (JOIS)

Claims

(57) [Claims]

1. A plurality of search terminals are connected and a specific character
In an information search apparatus that searches for a document including a column, when another search request is received during a search process, the search request is stored together with identification information indicating which search request,
After the end of the character string search process, a search request storage unit that sorts and outputs a search result to each search request source based on the identification information, and a plurality of search requests in the search request storage unit are stored.
When the plurality of stored search requests are collectively searched,
When management, search performs a search process for all of the data
Store past search results as a set for each processing means and search request source.
Can, after sorting the search results to the search request source, it
The past result set corresponding to the search request source and the new result
Performing AND processing between sets with the result set to narrow down
Result set managing means for performing management, to characterized in that a city
Information retrieval device.

2. A plurality of search terminals are connected and a specific character
In the information retrieval apparatus for searching a document containing a column, when receiving another search request during the search process, which search
Together with identification information indicating whether the requested, storing the search request,
After the character string search processing ends, the search binding based on the identification information
Search request storage unit that sorts and outputs results to each search request source
When a plurality of search requests in the search request storage unit are stored, and when the stored plurality of search requests are collectively searched, an OR set of a result set corresponding to each search request source is determined. Search processing means that performs search processing based on search results, and stores past search results as a set for each search request source.
Can, after sorting the search results to the search request source, it
Past results corresponding to the search source that obtained the OR set
Between a set and a new result set, rows set between AND processing
Result set management means for performing the narrowing process I, provided the capital
An information retrieval apparatus characterized by the above-mentioned.