JP3260428B2

JP3260428B2 - Information retrieval processor

Info

Publication number: JP3260428B2
Application number: JP20335192A
Authority: JP
Inventors: 忠一菊池; 伸一伊藤
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1992-07-30
Filing date: 1992-07-30
Publication date: 2002-02-25
Anticipated expiration: 2017-02-25
Also published as: JPH0652222A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an information retrieval processing apparatus, and more particularly to one suited to searching for character strings in a database of documents or the like and to correcting those character strings.

[0002]

[Prior Art] Conventionally, when an information processing system searches a database for target data such as characters (a keyword), it generally uses a sequential search method that scans all of the previously stored data, that is, the text. FIG. 12 is a block diagram of a conventional arrangement that uses sequential keyword search. In FIG. 12, the keyword, which is the character data to be searched for, is input to a matching unit 51. The matching unit 51 is connected to a text memory 52 that stores the text, that is, previously stored data such as characters. The matching unit 51 reads the characters of the document one at a time and determines whether they match the keyword.
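The sequential method of FIG. 12 can be sketched as follows. This is only an illustrative sketch: the function name `sequential_search` and the 0-based positions are assumptions made here, not part of the patent.

```python
def sequential_search(text: str, keyword: str) -> list[int]:
    """Return every 0-based position at which keyword occurs in text.

    Mirrors the matching unit 51: the text is read one character
    position at a time and compared against the keyword.
    """
    hits = []
    k = len(keyword)
    if k == 0:
        return hits  # an empty keyword is treated as "no match"
    for start in range(len(text) - k + 1):
        if text[start:start + k] == keyword:
            hits.append(start)
    return hits
```

Every character position of the text is visited, which is why the search time grows with the size of the text.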

[0003] However, because the conventional sequential search method compares every character in the text with the keyword in turn to determine whether the keyword is present, the search takes time. In particular, the search time grows with the number of characters in the text, so searching a large text takes a long time. Japanese Patent Application Laid-Open No. 64-35627 is one example of a technique that shortens the search in order to solve this problem.

[0004] FIG. 13 is a functional block diagram of an apparatus that adopts the newer search method. In FIG. 13, a keyword consisting of a plurality of characters is input to a character chain extraction means 53, which extracts from the keyword character strings of a specific number of characters together with their positions in the keyword. A character chain index 54 stores, for each run of a specific number of consecutive characters in the text to be searched, the positions in the text at which that run occurs. An index search means 55 searches the character chain index 54 using the character strings extracted by the character chain extraction means 53, and finds the position of the keyword in the text from the positional relationship among those character strings.

[0005]

[Problem to Be Solved by the Invention] Although this prior art searches faster than the sequential search method, it has the problem that, when the text is corrected, every pointer representing a character position after the corrected location must be updated. That is, a correction changes the character positions of all character chains that follow the corrected location, so among the pointers stored in the character chain index 54, all those corresponding to character chains after the corrected location must be updated.
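The cost described here can be made concrete with a small sketch. It assumes a flat index that maps each three-character chain to its 0-based positions in the whole text; the names `build_flat_index` and `pointers_to_update` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_flat_index(text: str, n: int = 3) -> dict[str, list[int]]:
    """Map every n-character chain to its 0-based positions in the whole text."""
    index = defaultdict(list)
    for pos in range(len(text) - n + 1):
        index[text[pos:pos + n]].append(pos)
    return dict(index)


def pointers_to_update(text: str, insert_at: int, n: int = 3) -> int:
    """Count the stored pointers that shift if characters are inserted at insert_at."""
    index = build_flat_index(text, n)
    return sum(1 for positions in index.values()
               for p in positions if p >= insert_at)
```

With a flat index, an insertion near the start of the text shifts nearly every pointer, whereas one near the end shifts only a few.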

[0006] An object of the present invention is to provide an information retrieval processing apparatus that can both speed up retrieval and shorten the time needed to correct a document.

[0007]

[Means for Solving the Problem] To achieve the above object, the present invention is an information retrieval processing apparatus comprising: a text file that stores character string data composed of a plurality of characters or the like together with its position data; text generation means that divides the data of the text file at each specific character or specific character string to generate a plurality of subtexts; character chain extraction means that extracts, from a subtext, character chains each composed of a specific number of characters together with character chain position data indicating the position of each chain in the subtext; character chain storage means that stores the character chains and character chain position data extracted by the character chain extraction means in association with positions in the text; and index search means that, during search processing with an input keyword, searches the character chain storage means using the character chains and character chain position data that the character chain extraction means extracted from the keyword, and extracts the character chains identical to the extracted ones together with their position data. The apparatus further comprises character chain correction means that corrects a correction target character string in a subtext into a corrected character string. During correction processing, the character chain extraction means extracts, from the correction target character string, character chains and character chain position data giving their positions in the correction target character string. The index search means searches the character chain storage means using those character chains and position data, and extracts the position data of the correction target character string in the text. The text generation means extracts from the text file the subtext corresponding to the position data extracted by the index search means. The character chain correction means uses that position data and the corrected character string to correct the extracted subtext and so create a corrected subtext. The character chain extraction means extracts, from the corrected subtext, character chains and character chain position data representing their positions in the corrected subtext. Finally, the character chain correction means stores the character chains and character chain position data of the corrected subtext in the character chain storage means, whereby the subtext is corrected.

[0008]

[0009]

[Operation] First, as registration processing, once the character string data and its position data have been stored in the text file, the data of the text file is input to the subtext generation means, which divides it at each specific character or specific character string to generate a plurality of subtexts. The data of each subtext is input to the character chain extraction means. The character chain extraction means receives the character string data of each subtext and extracts from it character chains composed of a specific number of characters, together with character chain positions indicating where each chain lies in the subtext. For example, it obtains the three-character chains and their positions in the subtext. The character chain position data of each extracted chain is then converted into position data in the text, and the chain data with its converted position is stored in the character chain storage means in association with its position in the text.
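The registration steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the terminator is the period "。", a position in the text is represented as a (subtext number, 1-based position in the subtext) pair, and the names `split_subtexts`, `extract_chains`, and `register` are hypothetical.

```python
from collections import defaultdict


def split_subtexts(text: str, terminator: str = "。") -> list[str]:
    """Divide the text after each terminator; a trailing remainder is kept."""
    subtexts, start = [], 0
    while True:
        end = text.find(terminator, start)
        if end == -1:
            break
        subtexts.append(text[start:end + len(terminator)])
        start = end + len(terminator)
    if start < len(text):
        subtexts.append(text[start:])
    return subtexts


def extract_chains(subtext: str, n: int = 3) -> list[tuple[str, int]]:
    """Return (chain, 1-based position in the subtext) for every n-character chain."""
    return [(subtext[i:i + n], i + 1) for i in range(len(subtext) - n + 1)]


def register(text: str, n: int = 3) -> dict[str, list[tuple[int, int]]]:
    """Build chain -> [(subtext number, position in subtext), ...]."""
    index = defaultdict(list)
    for number, subtext in enumerate(split_subtexts(text), start=1):
        for chain, pos in extract_chains(subtext, n):
            index[chain].append((number, pos))
    return dict(index)
```

Because chains are extracted per subtext, no chain spans a subtext boundary, and each stored pointer depends only on its own subtext.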

[0010] Next, as search processing, when the keyword output means outputs a keyword containing a designated character chain and data on that chain's position in the text, the character chain extraction means extracts from the keyword character chains composed of a specific number of characters and data on their chain positions. For example, it obtains the three-character chains and their positions within the keyword. The character chain storage means holds the character strings present in the text and their position data, and the positions in the text of the character strings supplied by the character chain extraction means are read from it. When the strings extracted from the text are, for example, three-character units, they are the first three characters, then the three consecutive characters starting one character later, then the three starting two characters later, and so on. Because the strings are three-character units shifted one character at a time, when the keyword occurs in the text, the position data read from the character chain storage means shows the same continuity as the position data of the strings within the keyword. Therefore, by determining this continuity from the positional relationship, the index search means can retrieve the position data of the string in the text that is identical to the keyword.
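The continuity check can be illustrated with a small sketch. It assumes a flat index with 1-based text positions and checks continuity by subtracting each chain's offset within the keyword and intersecting the resulting candidate start positions. The set intersection is a simplification chosen here, not the flowchart procedure of the embodiment, and the names `build_index` and `find_keyword` are hypothetical.

```python
from collections import defaultdict


def build_index(text: str, n: int = 3) -> dict[str, list[int]]:
    """Map each n-character chain to its 1-based positions in the text."""
    index = defaultdict(list)
    for pos in range(1, len(text) - n + 2):
        index[text[pos - 1:pos - 1 + n]].append(pos)
    return dict(index)


def find_keyword(index: dict[str, list[int]], keyword: str, n: int = 3) -> list[int]:
    """Return the 1-based start positions where all keyword chains line up."""
    chains = [(keyword[k:k + n], k) for k in range(len(keyword) - n + 1)]
    if not chains:
        return []  # keywords shorter than n are not handled in this sketch
    candidates = None
    for chain, offset in chains:
        # A chain at text position pos implies the keyword starts at pos - offset.
        starts = {pos - offset for pos in index.get(chain, [])}
        candidates = starts if candidates is None else candidates & starts
    return sorted(candidates)
```

A start position survives only if every chain of the keyword is found at the expected offset from it, which is the continuity condition described above.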

[0011] Correction processing is described next. When the correction target subtexts, obtained by dividing the data of the text file at each specific character or specific character string, are input, the correction target character string is broken into strings of a specific number of consecutive characters, for example three-character chains, and the positions of those strings within the correction target character string are obtained. These data are stored in the character chain storage means in association with positions in the subtext. Then, as in search processing, once the index search means has found the positions of those strings within the correction target character string and the position data of the string in the text that is identical to the correction target character string, the subtext number of the subtext containing the correction target character string is input to the character chain correction means. When the correction target subtext generation means extracts from the text file the subtext that matches this subtext number, the data of that subtext is transferred to the character chain extraction means, which obtains the strings of a specific number of consecutive characters in the subtext, for example three-character chains, and their positions in the subtext. All position data in the character chain storage means that carries this subtext number is then erased.

Next, when the corrected character string is input, its data is input to the character chain extraction means. The corrected character string is broken into strings of a specific number of consecutive characters, for example three-character chains, their positions in the subtext are obtained, and these data are output to the character chain correction means. The character chain correction means generates the corrected subtext from the position of the correction target character string obtained earlier, the subtext, and the corrected character string. The strings of a specific number of characters from the subtext and their position data in the text are then transferred to the index search means, which stores them in the character chain storage means.

The correction stays within the subtext. That is, only the portion of the text file data delimited by the specific character or specific character string needs to be corrected, and the subtexts after the corrected location are unaffected, so the text can be corrected in a short time. Insertion and deletion of character strings can likewise be done quickly, by deleting or adding position data within the relevant subtext.
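The effect of confining a correction to one subtext can be sketched as follows. This sketch assumes the subtext number has already been found by the index search, that pointers are (subtext number, 1-based position) pairs, and that the corrected string contains no terminator. The class `SubtextIndex` and its method names are hypothetical.

```python
from collections import defaultdict

TERMINATOR = "。"  # assumed subtext terminator


def split_subtexts(text: str) -> list[str]:
    parts = text.split(TERMINATOR)
    tail = [parts[-1]] if parts[-1] else []
    return [p + TERMINATOR for p in parts[:-1]] + tail


def chains(subtext: str, n: int = 3) -> list[tuple[str, int]]:
    return [(subtext[i:i + n], i + 1) for i in range(len(subtext) - n + 1)]


class SubtextIndex:
    def __init__(self, text: str, n: int = 3):
        self.n = n
        self.subtexts = dict(enumerate(split_subtexts(text), start=1))
        self.index = defaultdict(set)
        for number, subtext in self.subtexts.items():
            self._add(number, subtext)

    def _add(self, number: int, subtext: str) -> None:
        for chain, pos in chains(subtext, self.n):
            self.index[chain].add((number, pos))

    def _remove(self, number: int, subtext: str) -> None:
        # Re-extract the old chains and erase only this subtext's pointers.
        for chain, pos in chains(subtext, self.n):
            self.index[chain].discard((number, pos))

    def modify(self, number: int, target: str, corrected: str) -> None:
        """Replace the first occurrence of target in one subtext and re-index it."""
        old = self.subtexts[number]
        self._remove(number, old)
        new = old.replace(target, corrected, 1)
        self.subtexts[number] = new
        self._add(number, new)
```

Only the pointers of the corrected subtext are erased and rewritten. Pointers into later subtexts stay valid even when the correction changes the subtext's length.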

[0012]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is an overall block diagram of an information retrieval processing apparatus according to one embodiment of the present invention, FIG. 2 is a block diagram showing the configuration used in an example of the registration processing, and FIG. 3 is a diagram explaining the operation of FIG. 2. In FIGS. 1 and 2, reference numeral 12 denotes a text file that stores character string data composed of a plurality of characters or the like together with its position data. When the data of the text file 12 is input to the subtext generation unit 13, the text data is divided into a plurality of subtexts at each specific character or specific character string. The data of each subtext is transferred to the character chain extraction unit 14, which extracts from each subtext character chains composed of a specific number of characters and data on the character chain positions indicating where each chain lies in the subtext. When the extracted data is transferred to the character chain correction unit 17, the character chain position data of each chain is converted into position data in the text. That is, the character chain correction unit 17 serves as position data conversion means. The character data whose position data has been converted by the character chain correction unit 17 is stored in the character chain index 15 in association with its position in the text. That is, the character chain index 15 serves as the character chain storage means.

When a keyword containing a designated character chain and data on that chain's position in the text is input to the character chain extraction unit 14, the unit extracts from the keyword character chains composed of a specific number of characters and data on their chain positions. That is, the character chain extraction unit 14 constitutes both a first character chain extraction means and a second character chain extraction means. When the data extracted by the character chain extraction unit 14 is transferred to the index search unit 16, the character chains identical to those extracted by the character chain extraction unit 14 are extracted from the character chain index 15 together with their position data. That is, the index search unit 16 serves as the index search means.

[0013] The registration processing of this embodiment in the above configuration is described with reference to FIGS. 2 and 3. First, when the text to be registered is input from the text file 12 to the subtext generation unit 13, the subtext generation unit 13 searches for a preset character or character string that marks the end of a subtext. In FIG. 3, the period "。" marks the end of a subtext. Each time the subtext generation unit 13 detects a period "。", the string made up of all the characters scanned up to that detection becomes a subtext, and the order in which the subtext appears in the text becomes its subtext number. For example, as in FIG. 3, when the text is (...。...あいうえお...。...かきくけこ....。...), the subtext numbers are (..., i₁, i₂, ...), and the subtexts with subtext numbers i₁ and i₂ are (...あいうえお...。) and (...かきくけこ....。), respectively.

[0014] After the registration process ends, processing to generate the plurality of subtexts is performed as shown in FIG. 4. First, when the text data is input from the text file 12 to the subtext generation unit 13, an initial set (step OP1) is performed: the subtext number cnt is set to "1", the character position i from the head of the text to "2", the subtext terminator string to p₁ to pₖ, and the number of comparisons j made before p₁ to pₖ is detected to "0". Step OP2 then determines whether a k-character string taken from the text (k being the k of pₖ) matches the subtext terminator string p₁ to pₖ. If they do not match, step OP3 determines whether the comparison of the text against the terminator string p₁ to pₖ has finished. If the comparison is to continue, step OP4 increments the character position i and the comparison count j by 1. The processing from step OP2 onward is then repeated, checking up to the end of the text whether each k-character string taken from the text matches the terminator string p₁ to pₖ.

When, in step OP2, the k-character string taken from the text matches the terminator string p₁ to pₖ, step OP5 records the subtext "C_{i-j} to C_{i+k-1}", made up of the characters covered by the comparisons so far, together with the subtext number cnt, as the division result so far. Then, to search for the next subtext, step OP6 increments the subtext number cnt by 1 and sets the comparison count j to "-1". When step OP3 confirms that the comparison has reached the end of the text, step OP7 outputs the subtexts and subtext numbers created in step OP5 as the result of the subtext division.
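The scan of FIG. 4 can be approximated by the sketch below. It follows the same compare-and-advance idea but uses 0-based indices and a start marker instead of the flowchart's counters i and j, so it does not reproduce the exact counter values of steps OP1 to OP7. The name `split_by_terminator` is hypothetical.

```python
def split_by_terminator(text: str, terminator: str = "。") -> list[tuple[int, str]]:
    """Return (subtext number, subtext) pairs, each subtext ending with the terminator."""
    k = len(terminator)
    if k == 0:
        raise ValueError("terminator must not be empty")
    results = []
    cnt = 1     # subtext number, as in step OP1
    start = 0   # 0-based index of the first character of the current subtext
    i = 0       # 0-based index of the k-character window being compared
    while i + k <= len(text):            # end-of-text check, cf. step OP3
        if text[i:i + k] == terminator:  # comparison, cf. step OP2
            results.append((cnt, text[start:i + k]))  # cf. step OP5
            cnt += 1                     # cf. step OP6
            i += k
            start = i
        else:
            i += 1                       # cf. step OP4
    return results                       # cf. step OP7
```

As in the flowchart, a subtext is emitted only when a terminator is found, so any trailing characters after the last terminator are not returned by this sketch.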

[0015] Further, as shown in FIG. 3, when the subtexts and subtext numbers are transferred from the subtext generation unit 13 to the character chain extraction unit 14, the character chain extraction unit 14 obtains from each subtext the character chains composed of a specific number of characters and the character chain position data representing their positions in the subtext. For example, when three-character chains are obtained in FIG. 3, the chains and their positions are ..., <あいう, i₁-pt₁>, <いうえ, i₁-pt₂>, <うえお, i₁-pt₃>, ..., <かきく, i₂-pt₁'>, <きくけ, i₂-pt₂'>, <くけこ, i₂-pt₃'>, .... Here pt₁, pt₁' and so on denote the positions of the character chains.

[0016] Next, the operation of the character chain extraction processing is described with reference to FIG. 5. When the input S, that is, a subtext, is input to the character chain extraction unit 14, an initial set (step OP8) is performed: the number of characters a in a chain is set to "3", and the pointer i indicating the head of a character chain in the input S is set to 1. Step OP9 then determines whether chain extraction has reached its end. Because the pointer i, which changes through the operation described below, points to the head of a character chain, step OP9 in effect determines whether the input still contains a consecutive characters starting at, and including, the position that i points to. When the subtext has 10 characters (...あいうえお...。), n is 10, so i + a - 1 > n is not satisfied (NO) up to i = 8, and step OP10 stores the three characters starting at the position indicated by the pointer i in a register X. The register X is a matrix register. With j running from 1 to a, that is, 1, 2, 3, it first stores three characters of the input S(1) to S(n). When i = 1, X(1,1) ← S(1), X(1,2) ← S(2), and X(1,3) ← S(3). This operation stores the first three characters in the register X as one character chain. Step OP11 then increments the pointer i so that it points to the next character, and step OP9 repeats the determination. Through this repetition, X(2,1) ← S(2), X(2,2) ← S(3), and X(2,3) ← S(4) are performed next.

[0017] When the subtext has 10 characters and the pointer i reaches 9, i + a - 1 = 11 and n = 10, so i + a - 1 > n is satisfied (YES). Step OP12 then outputs X to the index search unit 16, and the whole operation ends.
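The loop of FIG. 5 can be sketched as follows. The sketch keeps the 1-based pointer i and the test i + a - 1 > n from the description, and models the matrix register X as a dictionary keyed by i. The function name and the dictionary representation are assumptions made for illustration.

```python
def extract_character_chains(s: str, a: int = 3) -> dict[int, tuple[str, ...]]:
    """Return {i: (X(i,1), ..., X(i,a))} for every 1-based chain start i."""
    n = len(s)
    x = {}
    i = 1                                  # step OP8: pointer to the chain head
    while not (i + a - 1 > n):             # step OP9: fewer than a characters left?
        # step OP10: X(i, j) <- S(i + j - 1), with 1-based S mapped to Python indices
        x[i] = tuple(s[i + j - 2] for j in range(1, a + 1))
        i += 1                             # step OP11: point to the next character
    return x                               # step OP12: output X
```

For a 10-character input and a = 3, the loop stores chains for i = 1 to 8 and stops at i = 9, matching the worked example in the description.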

[0018] Through the above operation, the character chains and their position data are stored in the character chain index 15. The value of i in the matrix register X(i, j) represents the position of the character chain. As shown in FIG. 3, the character chain index 15 stores three-character chains such as (..., あいう, いうえ, ..., きくけ, くけこ, ...) together with pointers representing their positions in the text (..., i₁-pt₁, i₁-pt₂, ..., i₂-pt₂', i₂-pt₃', ...). For example, i₁-pt₁, i₁-pt₂, i₂-pt₂', and i₂-pt₃' are stored for あいう, いうえ, きくけ, and くけこ, respectively. In addition to three-character chains, the character chain index 15 also stores one-character and two-character chains and their pointers so that such chains can be handled as well.

[0019] Next, the search processing is described with reference to FIGS. 6 and 7. First, when a keyword is input from the keyword output means to the character chain extraction unit 14, the unit extracts from the keyword the character chains composed of a specific number of characters and the character chain position data representing their positions in the keyword. For example, when three-character chains are obtained in FIG. 7, the chains and their position data are <あいう, 1> and <いうえ, 2>, where "1" and "2" are the character chain position data. The chain extraction operation is the same processing as in FIG. 5. When the character chains and character chain position data are passed from the character chain extraction unit 14 to the index search unit 16, the index search unit 16 uses these chains (あいう, いうえ) to obtain the pointer lists of "あいう" and "いうえ" in the text, here called P and Q. The pointer lists P and Q generally consist of a plurality of pointers: P is the set p(1) to p(l), and Q is the set q(1) to q(m). These pointer lists represent the positions in the text at which the chains "あいう" and "いうえ" occur, and when the keyword "あいうえ" is present in the text, there are pointers that satisfy p(l) - q(m) = -1. The index search unit 16 also extracts these pointers.

[0020] FIG. 8 is a flowchart showing the pointer extraction operation. Once the index search unit 16 has obtained the pointer lists P and Q for the character chains by searching the character chain index 15, step OP13 sets i, j, and h. Here i and j are pointers that select entries in the pointer lists p(1) to p(l) and q(1) to q(m), respectively, and h is a pointer into a result register Y(h) that stores the chain position whenever the target string, that is, a character chain identical to the keyword, is found. After the initial set in step OP13, step OP14 tests whether i > l or j > m. Here l and m denote the last pointers of the lists P and Q. When neither condition holds, step OP15 determines whether the pointers p(i) and q(j) selected by i and j satisfy the condition for a consecutive series of character chains in the keyword. That is, using the character chain position data ax and bx extracted from the keyword, the character positions from the pointer lists P and Q are adjusted for their offsets as ao ← p(i) - ax and bo ← q(j) - bx.

When steps OP16 and OP18 find that the values of the registers ao and bo are equal, the chains form a consecutive series, so step OP20 stores the current ao in the result Y(h) and increments the pointer h of Y(h) by 1. When step OP16 finds that ao is greater than bo, the pointer p(i) points further ahead in the text than the pointer q(j), so step OP17 increments the pointer j of the list Q by 1. Conversely, when step OP18 finds that bo is greater than ao, the pointer q(j) points further ahead in the text than the pointer p(i), so step OP19 increments the pointer i of the list P by 1. When ao and bo are equal, the pointers p(i) and q(j) indicate the same series of character chains and both should advance, so step OP21 increments both i and j by 1. By repeating this operation, the positions of the character chains identical to the keyword are found, and their position data is stored in the register Y.

[0021] On the other hand, if either i > l or j > m is satisfied in step OP14, the pointer string for which the condition holds has no further pointers, so step OP22 outputs the processing result Y. This ends the processing of the index search unit 16.
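The loop of FIG. 8 (steps OP13 to OP22) amounts to a merge-style intersection of two ascending pointer lists. The following Python sketch is one possible reading, under the assumptions that pointers are plain integer text positions (the patent's pointers also carry a subtext number) and that both lists are sorted in ascending order; the function name `match_positions` is hypothetical.

```python
def match_positions(P: list[int], Q: list[int], ax: int, bx: int) -> list[int]:
    """Intersect two ascending pointer lists after removing keyword offsets.

    P, Q   : ascending text positions of the first and second character chain.
    ax, bx : positions of those chains inside the keyword.
    Returns Y, the list of offset-corrected positions where both chains line up.
    """
    i, j, Y = 0, 0, []                    # OP13: initial set (0-based indices here)
    while i < len(P) and j < len(Q):      # OP14: stop when either list is exhausted
        ao = P[i] - ax                    # OP15: remove the in-keyword offsets
        bo = Q[j] - bx
        if ao > bo:                       # OP16 -> OP17: p(i) is ahead, advance Q
            j += 1
        elif bo > ao:                     # OP18 -> OP19: q(j) is ahead, advance P
            i += 1
        else:                             # OP20: consecutive chains, record ao
            Y.append(ao)
            i += 1                        # OP21: advance both pointers
            j += 1
    return Y                              # OP22: output the processing result


# "あいう" found at positions 3, 10, 20 and "いうえ" at 4, 15, 21 (made-up data).
print(match_positions([3, 10, 20], [4, 15, 21], ax=1, bx=2))  # [2, 19]
```

Each list is traversed once, so the cost is proportional to l + m rather than to the length of the text.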

[0022] Next, the correction processing is described. FIG. 9 is a block diagram showing an embodiment of the correction processing of the present invention, and FIG. 10 is an explanatory diagram of its operation. The subtext generation unit 13 constitutes the correction target subtext generation means, and the character chain correction unit 17 constitutes the first and second correction target character string data generation means and the data correction means. The correction target character string is the character string that is about to be corrected, and the correction character string is the character string that replaces it. Insertion is the case in which there is no correction target character string, and deletion is the case in which there is no correction character string. When the correction target character string is input to the character chain extraction unit 14, the unit obtains from it character chains each consisting of a specific number of characters, together with character chain position data indicating their positions within the correction target character string. For example, when three-character chains are obtained in FIG. 10, the character chains and their character chain position data are <あいう, 1> and <いうえ, 2>.

[0023] When these character chains and character chain position data are input to the index search unit 16, the unit searches the character chain index 15, as in the search processing, for pointers indicating where the correction target character string occurs in the text. In FIG. 10 these pointers are i₁-p_t1 and i₁-p_t2. When the character chain correction unit 17 receives these pointers, the subtext number i₁ is transferred to the subtext generation unit 13. The subtext generation unit 13 extracts the subtext corresponding to the subtext number i₁ from the text file 12 and outputs it to the character chain correction unit 17. In this example the corresponding subtext is (...あいうえお...。).

[0024] When the character chain correction unit 17 receives the data of the correction target subtext, the data are transferred to the character chain extraction unit 14. From the data of the correction target subtext, the character chain extraction unit 14 obtains character chains each consisting of a specific number of characters, together with character chain position data indicating their positions, and transfers these data to the character chain correction unit 17. For example, when three-character chains are obtained in FIG. 10, the character chains and their character chain position data are ..., <あいう, i₁-p_t1>, <いうえ, i₁-p_t2>, .... Based on these pre-correction character chains and character chain position data, the character chain correction unit 17 deletes the pre-correction character chains and character chain position data stored in the character chain index 15.

[0025] Next, using the pointers i₁-p_t1 and i₁-p_t2 of the correction target character string received earlier from the index search unit 16, the character chain correction unit 17 deletes the correction target character string from the correction target subtext. It then inserts the separately input correction character string at the pointer position of the correction target character string to create a corrected subtext, and transfers the corrected subtext to the character chain extraction unit 14. In FIG. 10, for example, the correction character string is abc, so the corrected subtext is (...abcお...). From the corrected subtext, the character chain extraction unit 14 obtains character chains each consisting of a specific number of characters, together with character chain position data indicating their positions in the corrected subtext. For example, when three-character chains are obtained in FIG. 10, the character chains and their position data are ..., <abc, i₁-p_t1>, <bcお, i₁-p_t2>, .... When the character chain correction unit 17 then receives the character chains and character chain position data of the corrected subtext from the character chain extraction unit 14, the corrected character chain position data are stored in the pointer strings of the character chain index 15 that correspond to the corrected character chains, each pointer string is sorted into ascending order, and the corrected character chain index is thereby created.
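The correction sequence (delete the old index entries, rewrite the subtext, re-extract the chains, and store them in ascending order) can be sketched in Python as follows. This is an illustrative sketch only: the index is assumed to be a dictionary mapping each chain to an ascending list of (subtext number, position) pairs, and the names `chains`, `build_index`, and `correct_subtext` are hypothetical.

```python
from bisect import insort


def chains(text, n=3):
    """All n-character chains of `text` with 1-based positions."""
    return [(text[k:k + n], k + 1) for k in range(len(text) - n + 1)]


def build_index(subtexts, n=3):
    """Map each chain to an ascending list of (subtext number, position)."""
    index = {}
    for sub_no, subtext in enumerate(subtexts, start=1):
        for chain, pos in chains(subtext, n):
            insort(index.setdefault(chain, []), (sub_no, pos))
    return index


def correct_subtext(index, sub_no, subtext, start, target, replacement, n=3):
    """Replace `target` at 1-based `start` in one subtext and update the index."""
    # Delete the pre-correction chains of this subtext from the index.
    for chain, pos in chains(subtext, n):
        index[chain].remove((sub_no, pos))
        if not index[chain]:
            del index[chain]
    # Remove the correction target string and insert the correction string.
    begin = start - 1
    if subtext[begin:begin + len(target)] != target:
        raise ValueError("correction target not found at the given position")
    corrected = subtext[:begin] + replacement + subtext[begin + len(target):]
    # Store the chains of the corrected subtext, keeping each list ascending.
    for chain, pos in chains(corrected, n):
        insort(index.setdefault(chain, []), (sub_no, pos))
    return corrected


subtexts = ["かきくけこ。", "あいうえお。"]
index = build_index(subtexts)
subtexts[1] = correct_subtext(index, 2, subtexts[1], 1, "あいうえ", "abc")
print(subtexts[1])  # abcお。
```

Only the entries of the edited subtext are touched; positions recorded for every other subtext remain valid, which is the property the per-subtext position management is meant to provide.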

[0026] FIG. 11 is a flowchart showing the operation of extracting the correction target subtext in the correction processing of the present invention. In FIG. 11, when text data is input from the text file 12, initial setting is first performed in step OP23: the subtext number cnt is set to "1", the character position i from the beginning of the text is set to "1", the subtext end character string is set to p₁ to p_k, the count j of processing iterations until p₁ to p_k is detected is set to "0", and the extraction subtext number I input from the character chain correction unit 17 is set. Step OP24 then determines whether the k-character string taken from the text (k being the same k as in p_k) matches the subtext end character string p₁ to p_k. If they do not match, step OP25 determines whether the matching of the text against the subtext end character string p₁ to p_k has finished. If matching is to continue, step OP26 increments the character position i and the matching count j by 1. The processing from step OP24 onward is then repeated, testing until the end of the text whether the k-character string taken from the text matches the subtext end character string p₁ to p_k.

[0027] When the k-character string taken from the text matches the subtext end character string p₁ to p_k in step OP24, step OP27 determines whether the number of this subtext matches the extraction subtext number I. If it does not match, step OP28 increments the subtext number cnt by 1 and sets the matching count j to "-1" in order to search for the next subtext. Step OP25 then determines whether the matching of the text against the subtext end character string p₁ to p_k has finished. The processing from step OP24 onward is repeated, and string matching continues until the end of the text.

[0028] When the number cnt of this subtext matches the extraction subtext number I in step OP27, step OP29 outputs the matching subtext "C_{i-j} to C_{i+k-1}", and the extraction of the correction target subtext ends. If the matching of the text against the subtext end character string p₁ to p_k finishes in step OP25, step OP30 outputs the error message "no match" and the processing of this routine ends.
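The scan of FIG. 11 can be approximated by the following Python sketch. It is a simplified reading, not the flowchart itself: it tracks the start of the current subtext directly instead of the count j, it skips past the whole terminator after a match (which coincides with the flowchart when the terminator is a single character), and it returns None in place of the error message. The function name `extract_subtext` is an assumption.

```python
def extract_subtext(text: str, wanted: int, terminator: str = "。"):
    """Return the `wanted`-th (1-based) subtext, including its terminator.

    Returns None when the text ends before that subtext is found (OP30).
    """
    k = len(terminator)
    cnt = 1                                   # OP23: current subtext number
    start = 0                                 # index where the current subtext begins
    i = 0
    while i + k <= len(text):                 # OP25: text not yet exhausted
        if text[i:i + k] == terminator:       # OP24: terminator found at i
            if cnt == wanted:                 # OP27: is this the requested subtext?
                return text[start:i + k]      # OP29: output the subtext
            cnt += 1                          # OP28: move on to the next subtext
            i += k
            start = i
            continue
        i += 1                                # OP26: advance one character
    return None


text = "かきくけこ。あいうえお。さしすせそ。"
print(extract_subtext(text, 2))  # あいうえお。
```

Passing a different `terminator`, such as a full stop followed by a space, changes the granularity of the subtexts without changing the scan.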

[0029] In this embodiment the subtext end character string is the full stop "。", so that one subtext is formed per sentence. If a full stop followed by a space is used as the subtext end character string instead, one subtext can be formed per paragraph. Furthermore, a usage-unit delimiter character string can be inserted into the registered text, for example according to the intended unit of use, so that one subtext is formed per usage unit.

[0030] The above embodiment described the case in which the character chains of the character chain index have three characters or fewer, but character chains of four or more characters can be searched by the same processing. At search time, the character chains can also be extracted from the keyword so that their overlap is minimized. For example, when the keyword is "あいうえお", the character chains can be "あいう" and "うえお".
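One way to pick chains with little overlap, offered here only as a sketch, is to step through the keyword in strides of the chain length and add a final chain aligned to the end of the keyword when the stride does not land there. The patent does not state how the minimum-overlap chains are chosen, so the function `min_overlap_chains` and its strategy are assumptions.

```python
def min_overlap_chains(keyword: str, n: int = 3) -> list[tuple[str, int]]:
    """Cover `keyword` with as few n-character chains as possible.

    Returns (chain, 1-based position) pairs; chains overlap only where the
    keyword length is not a multiple of n.
    """
    if len(keyword) < n:
        return []
    last = len(keyword) - n
    starts = list(range(0, last + 1, n))   # non-overlapping strides
    if starts[-1] != last:                 # make sure the tail is covered
        starts.append(last)
    return [(keyword[s:s + n], s + 1) for s in starts]


print(min_overlap_chains("あいうえお"))  # [('あいう', 1), ('うえお', 3)]
```

Fewer chains mean fewer pointer strings to look up and intersect, at the cost of relying on the position offsets to confirm that the chains are contiguous.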

[0031] Furthermore, the present invention is not limited to character chains and can be applied in the same way to general data.

【００３２】[0032]

[Effects of the Invention] As is apparent from the above embodiment, the present invention divides a text into a plurality of subtexts at a preset subtext end character string and manages character positions separately for each subtext. When text is corrected, inserted, or deleted, the changes to the character positions that follow the edited location are therefore confined to the subtext, which shortens the processing needed to correct, insert, or delete text. In addition, in proximity search processing that requires all keywords to be present in the same sentence or the same paragraph, it suffices to check whether the subtext numbers of the pointers obtained for all keywords agree, which also helps speed up proximity search processing.
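The proximity check described above reduces to intersecting the subtext numbers found for each keyword. The Python sketch below assumes that each pointer is a (subtext number, position) pair and that one pointer list per keyword has already been obtained from the index; the function name `same_subtext_hits` is hypothetical.

```python
def same_subtext_hits(pointer_lists):
    """Return the subtext numbers that contain every keyword.

    pointer_lists: one list of (subtext number, position) pointers per keyword.
    """
    if not pointer_lists:
        return []
    # Positions inside the subtext are ignored; only the subtext numbers matter.
    number_sets = [{sub_no for sub_no, _ in ptrs} for ptrs in pointer_lists]
    return sorted(set.intersection(*number_sets))


# Keyword 1 occurs in subtexts 1 and 3, keyword 2 in subtexts 3 and 4 (made-up data).
print(same_subtext_hits([[(1, 2), (3, 5)], [(3, 9), (4, 1)]]))  # [3]
```

No character distances need to be computed, because the subtext boundary already encodes "same sentence" or "same paragraph".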

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram showing an embodiment of the present invention.

FIG. 2 is a configuration diagram showing an example of the registration processing of the present invention.

FIG. 3 is an operation explanatory diagram for explaining the operation of the registration processing example of the present invention.

FIG. 4 is a flowchart for explaining the subtext division processing.

FIG. 5 is a flowchart for explaining the operation of character chain extraction.

FIG. 6 is a configuration diagram showing the search processing of the present invention.

FIG. 7 is an operation explanatory diagram for explaining the operation of the search processing of the present invention.

FIG. 8 is a flowchart for explaining the operation of pointer extraction.

FIG. 9 is a configuration diagram showing an example of the correction processing of the present invention.

FIG. 10 is an operation explanatory diagram for explaining the operation of the correction processing example of the present invention.

FIG. 11 is a flowchart for explaining the operation of extracting the correction target subtext.

FIG. 12 is a configuration diagram of a conventional example.

FIG. 13 is a configuration diagram of another conventional example.

[Explanation of symbols]

12 Text file
13 Subtext generation unit
14 Character chain extraction unit
15 Character chain index
16 Index search unit
17 Character chain correction unit

Claims

(57) [Claims]

1. An information retrieval processing apparatus comprising:
a text file that stores data of character strings, each consisting of a plurality of characters or the like, together with their position data;
text generation means for generating a plurality of subtexts by dividing the data of the text file at each specific character or specific character string;
character chain extraction means for extracting, from a subtext, character chains each consisting of a specific number of characters and character chain position data indicating the position of each character chain in the subtext;
character chain storage means for storing, in association with positions in the text, the character chains and character chain position data extracted by the character chain extraction means; and
index search means for, during search processing with an input keyword, searching the character chain storage means using the character chains and character chain position data extracted from the keyword by the character chain extraction means, and extracting, together with their position data, character chains identical to the character chains extracted by the character chain extraction means,
the apparatus further comprising character chain correction means for performing correction processing that corrects a correction target character string of a subtext to a correction character string, wherein:
the character chain extraction means extracts, from the correction target character string, character chains and character chain position data indicating positions in the correction target character string;
the index search means searches the character chain storage means using the character chains and character chain position data of the correction target character string extracted by the character chain extraction means, and extracts position data of the correction target character string in the text;
the text generation means extracts, from the text file, the subtext corresponding to the position data extracted by the index search means;
the character chain correction means corrects the subtext extracted by the text generation means, using the position data extracted by the index search means and the correction character string, to create a corrected subtext;
the character chain extraction means extracts, from the created corrected subtext, character chains and character chain position data indicating positions in the corrected subtext; and
the character chain correction means corrects the subtext by storing, in the character chain storage means, the character chains and character chain position data of the corrected subtext extracted by the character chain extraction means.