JP2837006B2

JP2837006B2 - Text compression device

Info

Publication number: JP2837006B2
Application number: JP3287864A
Authority: JP
Inventors: 明濱田; 等鈴木; 広勝秋山
Original assignee: Consejo Superior de Investigaciones Cientificas CSIC
Current assignee: Consejo Superior de Investigaciones Cientificas CSIC
Priority date: 1991-11-01
Filing date: 1991-11-01
Publication date: 1998-12-14
Anticipated expiration: 2013-12-14
Also published as: JPH05128102A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a text compression device for word processors and digitized books.

[0002]

[Prior Art] A conventional text compression device works in units of words or of character strings with a fixed number of characters. Any character string that occurs more than once is registered in a dictionary, and each occurrence of that string in the text is replaced by a code identifying the registered string, thereby compressing the text.

[0003] FIG. 9 shows the structure of the dictionary, used in the conventional text compression device described above, in which word-unit character strings are registered.

[0004] A conventional text compression device using the dictionary of FIG. 9 compresses, for example, a sentence such as 「また、風の力を源とするエレメンタル魔法を使います。」 ("They also use elemental magic that draws on the power of the wind.") into 「また、風の力を源とする[word 01411][word 12959]を使います。」.

[0005] [word 01411], [word 12959], and the like in this compressed sentence denote coded words. The original character string can be restored by looking up the word number contained in each code in the dictionary of FIG. 9.
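The conventional scheme of paragraphs [0004] and [0005] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pairing of word numbers to words (01411 with エレメンタル, 12959 with 魔法) is assumed from the order of the codes in the example, and the ASCII `[word NNNNN]` notation stands in for whatever binary code the device actually uses.

```python
import re

# Hypothetical word dictionary (word number -> word); the pairing is an assumption.
WORD_DICT = {1411: "エレメンタル", 12959: "魔法"}

def compress_words(text, word_dict):
    """Replace every registered word with its [word NNNNN] code."""
    # Longer words first, so a short word never splits a longer one.
    for number, word in sorted(word_dict.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(word, f"[word {number:05d}]")
    return text

def restore_words(text, word_dict):
    """Look up each word number in the dictionary to restore the original text."""
    return re.sub(r"\[word (\d{5})\]", lambda m: word_dict[int(m.group(1))], text)

original = "また、風の力を源とするエレメンタル魔法を使います。"
compressed = compress_words(original, WORD_DICT)
```

Each code replaces only one short word, which is the limitation the invention sets out to address.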

[0006]

[Problems to Be Solved by the Invention] However, in the conventional text compression device described above, 2 bytes are needed for the identification code once more than 256 distinct character strings are registered in the dictionary. Many Sino-Japanese words (kango) consist of only two characters, so the words being replaced are only a few characters long, and the compression ratio of the text as a whole cannot be made large.

[0007] In view of the above problems in the conventional text compression device, the present invention provides a text compression device that can increase the compression ratio of text by coding character strings.

[0008]

[Means for Solving the Problems] The present invention is achieved by a text compression device comprising: registration means for registering, from text that contains a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings; and conversion means, connected to the registration means, for replacing a similar character string in the text with identification data of the specific character string registered in the registration means and with difference data that includes the start position of the character string to be deleted from the specific character string, the number of characters to be deleted, and the character string to be inserted in place of the deleted character string. The device is configured so that the specific character string is divided such that the start position falls within a first predetermined number from the head of the specific character string, and so that the number of deleted characters is adjusted to fall within a second predetermined number.

[0009]

[Operation] In the text compression device of the present invention, the registration means registers, from text that contains a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings. The conversion means, which is connected to the registration means, replaces a similar character string in the text with identification data of the specific character string registered in the registration means and with difference data that includes the start position of the character string to be deleted from the specific character string, the number of characters to be deleted, and the character string to be inserted in place of the deleted character string. The specific character string is divided so that the start position of the character string to be deleted falls within a first predetermined number from the head of the specific character string, and the number of deleted characters is adjusted to fall within a second predetermined number.

[0010]

[Embodiments] An embodiment of the text compression device of the present invention is described below with reference to the drawings.

[0011] FIG. 1 is a block diagram showing the configuration of one embodiment of the text compression device of the present invention.

[0012] The text compression device of FIG. 1 comprises: a control unit 11, part of the conversion means, which controls restoration of compressed text data and display of character strings; an input unit 12, connected to the control unit 11, which specifies the text to be restored; a display unit 13, connected to the control unit 11 and consisting of a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, which displays the restored text; a dictionary 14, connected to the control unit 11 and part of the registration means, which stores the character strings to be restored from data in which identical strings within the long character strings of the original text (see FIG. 2) have been coded (such codes are hereinafter called sentence codes); a dictionary 15, connected to the control unit 11 and part of the registration means, which stores the character strings to be restored from word codes; a database 16, connected to the control unit 11 and part of the conversion means, which holds an organically linked collection of text coded with sentence codes and compressed with difference data; a buffer 17, connected to the control unit 11 and part of the conversion means, which accumulates the final restored result; a buffer 18, connected to the control unit 11 and part of the conversion means, which temporarily holds a character string taken from the dictionary 14; and a memory 19, connected to the control unit 11 and part of the conversion means, which stores character positions within the buffers 17 and 18 and within the compressed data.

[0013] Each of the above components is described next.

[0014] The control unit 11 controls restoration of compressed text data and display of character strings, and also controls the operation of the other parts.

[0015] The input unit 12 is a keyboard used to specify the text to be restored.

[0016] The display unit 13 is a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and displays the restored text.

[0017] As shown in FIG. 3, the dictionary 14 consists of an index 141, whose entries correspond one-to-one to sentence numbers, and a body 142, which stores variable-length character strings.

[0018] Each entry of the index 141 points to the head of the corresponding character string in the body 142. The length of that character string is obtained from the difference between this position and the character position pointed to by the next entry of the index 141.
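The index and body layout of paragraphs [0017] and [0018] can be sketched as an offset table over one concatenated string. The class and method names are hypothetical, and the extra sentinel entry at the end of the index is an assumption made here so that the last string also has a "next entry" from which its length can be derived.

```python
class StringDictionary:
    """Sketch of dictionary 14: an offset index (141) over a concatenated body (142)."""

    def __init__(self, strings):
        self.body = "".join(strings)      # body 142: variable-length strings, back to back
        self.index = []                   # index 141: start offset of each string
        offset = 0
        for s in strings:
            self.index.append(offset)
            offset += len(s)
        self.index.append(offset)         # assumed sentinel: end of the last string

    def lookup(self, sentence_number):
        start = self.index[sentence_number]
        end = self.index[sentence_number + 1]   # length = next offset - this offset
        return self.body[start:end]

demo = StringDictionary(["abc", "de", "fghi"])
```

Storing only offsets keeps the index entries fixed-size even though the registered strings vary in length.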

[0019] The dictionary 15 has a structure whose entries correspond one-to-one to word numbers, as shown in FIG. 9. The database 16 holds an organically linked collection of text that has been coded and compressed with sentence codes and difference data, and is arranged so that the required portion can be retrieved on instruction from the input unit 12.

[0020] The buffer 17 accumulates the final restored result, and the buffer 18 temporarily holds a character string taken from the dictionary 14.

[0021] The memory 19 consists of a pointer 191, which indicates the processing position in the text database 16; a pointer 192, which indicates the tail position of the restoration buffer 17; a pointer 193, which indicates the tail position of the restoration buffer 18; and a pointer 194, which indicates the insertion position in the restoration buffer 18.

[0022] FIG. 2 shows part of an original text to be compressed by the text compression device of this embodiment. FIG. 3 shows an example configuration of the dictionary in which long character strings are registered, and FIG. 4 shows an example in which long character strings have been coded by the device.

[0023] The text compression device of this embodiment registers long character strings such as sentences, clauses, or phrases in the dictionary 14 and codes identical character strings (sentence codes). When a character string does not exactly match a string registered in the dictionary 14, the device appends difference data. In other words, the device uses a text compression method that compresses an original text such as that of FIG. 2 into the form of FIG. 4 using the dictionary of FIG. 3.
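The text does not say how the device finds the differing span between a registered string and a similar one. One simple possibility, shown here purely as an assumption, is to strip the common prefix and common suffix and treat what remains as a single replaced region. The function name and the sample registered string are hypothetical.

```python
def make_difference(registered, similar):
    """Return (start_position, delete_count, inserted_text) turning `registered`
    into `similar`, assuming one contiguous differing region.
    start_position is 1-based, counted from the head of `registered`."""
    limit = min(len(registered), len(similar))
    prefix = 0
    while prefix < limit and registered[prefix] == similar[prefix]:
        prefix += 1
    suffix = 0
    # The suffix may not overlap the prefix already matched.
    while suffix < limit - prefix and registered[-1 - suffix] == similar[-1 - suffix]:
        suffix += 1
    delete_count = len(registered) - prefix - suffix
    inserted = similar[prefix:len(similar) - suffix]
    return prefix + 1, delete_count, inserted

# Hypothetical registered sentence and a similar sentence from the text.
registered = "また、風の力を源とする魔法を使います。"
similar = "また、彼らは大地の力を源とする魔法を使います。"
difference = make_difference(registered, similar)
```

For this pair the result is one deleted character at position 4 with the replacement text 彼らは大地, which matches the shape of the difference data the description goes on to explain.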

[0024] Each part of the text of FIG. 4, as compressed by the text compression device of FIG. 1, is described in detail next.

[0025] [sentence] in FIG. 4 is a separator that separates sentence codes and difference data from ordinary character codes. [00023] in FIG. 4 is a sentence code: the original character string is restored by looking up the sentence number contained in the code in the dictionary of FIG. 3.

[0026] [01-04] in FIG. 4 is deletion data applied to the character string restored from the sentence code. The number after the hyphen indicates the start position of the characters to be deleted, and the number before the hyphen indicates how many characters to delete. This deletion data, together with the insertion string that follows it, constitutes the difference data.

[0027] FIG. 5 is a schematic diagram showing an example in which both long character strings and word-unit character strings have been coded by the text compression device of this embodiment. FIG. 6 is a diagram explaining the code data structure used in the device.

[0028] The content shown in FIG. 5 is the latter half of FIG. 4 further compressed using the word dictionary of FIG. 9. In the actual codes, an ordinary character is normally represented in 2 bytes using the JIS kanji code (most significant byte (MSB) "0"), and the other compressed codes are structured as shown in FIG. 6. To fit this structure, when a difference would fall at or beyond the 256th character from the head, the sentence to be registered is divided so that the position is within 255 characters; and when a difference would be 8 characters or longer, it is adjusted to be 7 characters or fewer.
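The two limits in paragraph [0028] can be expressed as a simple range check. The limits 255 and 7 come from the text; they would fit in 8-bit and 3-bit fields respectively, but the actual field layout is in FIG. 6, which is not reproduced here, so no bit packing is attempted. The fixed-size chunking used to divide an over-long sentence is an assumption, as are the function names.

```python
MAX_POSITION = 255   # a difference must start within the first 255 characters
MAX_DELETE = 7       # at most 7 characters may be deleted by one difference

def fits_code_fields(position, delete_count):
    """True if a difference can be expressed within the stated limits."""
    return 1 <= position <= MAX_POSITION and 0 <= delete_count <= MAX_DELETE

def split_for_registration(sentence, max_len=MAX_POSITION):
    """Divide a long sentence into pieces so that every character position
    within a piece is at most max_len (assumed strategy: fixed-size chunks)."""
    return [sentence[i:i + max_len] for i in range(0, len(sentence), max_len)]

pieces = split_for_registration("x" * 600)
```

A real encoder would more likely split at clause boundaries so that each piece remains a reusable phrase; fixed-size chunks are used here only to keep the sketch short.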

[0029] The operation of the text compression device of FIG. 1, and in particular the procedure for restoring a compressed document specified through the input unit 12 (see FIG. 1), is described next with reference to the flowcharts of FIGS. 7 and 8.

[0030] Take the first line of the text of FIG. 5. First, the pointers are initialized so that the pointer 191 points to the head of the specified compressed data, the pointer 192 to the head of the buffer 17, and the pointer 193 to the head of the buffer 18 (step S1). The type of the data pointed to by the pointer 191 is then determined (step S2). In the case of FIG. 5 the separator [sentence] comes first, so processing proceeds to step S3, which enters the subroutine of FIG. 8 for restoring sentence-code data.

[0031] Because a sentence code immediately follows the sentence separator, the sentence number [00023] is extracted (step S31). The body 142 is searched through the index 141 using this sentence number, and the character string 「また、風の…」 ("Also, ... of the wind ...") is found (step S32). The string found is copied to the head of the buffer 18 and the pointer 193 is set to point to its tail (step S33). The pointer 191 is advanced (step S34), and the type of the data it points to is determined (step S35). Here the next data is the deletion data [01-04], so processing proceeds to step S36.

[0032] The position and length of the characters to be deleted from the buffer 18 are extracted, and the position is saved in the pointer 194 (step S36). In this case a single character, the fourth from the head (「風」), is deleted from the buffer 18, and the pointer 193 is updated to point to the tail of the string (step S37). Processing returns to step S34, and the determination in step S35 finds that the next data is the ordinary character 「彼」, so processing proceeds to step S39.

[0033] This character is inserted into the buffer 18 at the position indicated by the pointer 194, and the pointers 193 and 194 are advanced by the number of inserted characters so that they point to the tail position and the next insertion position, respectively (step S39). Steps S34, S35, and S39 are repeated to insert 「ら」 and 「は」 in the same way. The determination in step S35 then finds that the next data is the word code [word 08394], so processing proceeds to step S38.

[0034] The dictionary 15 is searched using the word number of the word code, and the corresponding character string, here 「大地」 ("earth"), is retrieved (step S38). In step S39 the retrieved 「大地」 is inserted into the buffer 18, and processing returns to step S34.

[0035] In step S35 the next data is found to be a sentence separator, so processing returns to the main routine of FIG. 7.

[0036] The character string prepared in the buffer 18 is appended to the buffer 17 at the position pointed to by the pointer 192.

[0037] The pointer 193 is reset to point to the head of the buffer 18 (step S5), the pointer 191 is advanced (step S6), and a check for the end of the data is made (step S7). Here the data ends, so processing proceeds to step S8, where the character string accumulated in the buffer 17 is displayed on the display unit 13 and processing ends (step S8).

[0038] The third line of the text of FIG. 5 is described next.

[0039] First, after the initialization of step S1 is performed as for the first line, the determination in step S2 finds that the first character is the ordinary character 「た」, so processing proceeds to step S5 and this character is placed in the buffer 17 (step S5). The pointer 191 is advanced (step S6) and a check for the end of the data is made (step S7). Data still remains, so processing returns to step S2. Steps S2, S5, S6, and S7 are then repeated to append 「だ」, 「し」, and 「、」 to the buffer 17 in the same way, after which the determination in step S2 finds that the next data is the word code [word 00808], so processing proceeds to step S4.

[0040] The dictionary 15 is searched using the word number of the word code, and the corresponding character string, here 「石つぶて」 ("thrown stone"), is retrieved (step S4). In step S5 this character string is appended to the buffer 17. Ordinary characters are then appended by the same loop, and finally the sentence code [00025] is restored in the same way as for the first line and appended to the buffer 17.

[0041] The end of the data is detected in step S7, the restored result is displayed in step S8, and processing ends.

[0042] As described above, the original character strings are restored from text data that was compressed by coding character strings identical or similar to the long character strings, such as sentences, clauses, and phrases, registered in the dictionary.

[0043]

[Effects of the Invention] The text compression device of the present invention comprises: registration means for registering, from text that contains a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings; and conversion means, connected to the registration means, for replacing a similar character string in the text with identification data of the specific character string registered in the registration means and with difference data that includes the start position of the character string to be deleted from the specific character string, the number of characters to be deleted, and the character string to be inserted in place of the deleted character string. The device is configured so that the specific character string is divided such that the start position falls within a first predetermined number from the head of the specific character string, and so that the number of deleted characters is adjusted to fall within a second predetermined number. As a result, a character string that is similar to a specific character string registered as a unit but differs from it only in part is efficiently converted into difference data, so the compression ratio of text in which similar phrasings appear frequently can be improved. In addition, by adjusting the number of characters in the specific character string to be registered and in the character string to be deleted, the compressed data can be stored efficiently in a form suited to the storage device used.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the text compression device of the present invention.

FIG. 2 is an explanatory diagram showing part of an original text, used to explain an example of compression by the text compression device of FIG. 1.

FIG. 3 is an explanatory diagram showing an example configuration of the dictionary in which long character strings are registered in the text compression device of FIG. 1.

FIG. 4 is an explanatory diagram showing an example text in which long character strings have been coded by the text compression device of FIG. 1.

FIG. 5 is a schematic diagram showing an example in which both long character strings and word-unit character strings have been coded by the text compression device of FIG. 1.

FIG. 6 is an explanatory diagram showing the structure of the code data in the text compression device of FIG. 1.

FIG. 7 is a flowchart of the main routine, used to explain the operation of the text compression device of FIG. 1.

FIG. 8 is a flowchart of the subroutine, used to explain the operation of the text compression device of FIG. 1.

FIG. 9 is an explanatory diagram showing the structure of the dictionary in which word-unit character strings are registered in a conventional text compression device.

[Explanation of symbols]

11 Control unit
12 Input unit
13 Display unit
14 Character-string dictionary
15 Word dictionary
16 Compressed-text database
17 Restored-text buffer
18 Character-string dictionary buffer
19 Memory set
141 Character-string dictionary index
142 Character-string dictionary body
191 to 194 Memory pointers

Continuation of the front page

(56) References cited: JP-A-3-206533 (JP, A); JP-A-2-297280 (JP, A); JP-A-64-66755 (JP, A)
(58) Fields searched (Int. Cl.⁶, DB name): G06F 17/21 - 17/26; G06F 12/00

Claims

(57) [Claims]

1. A text compression device comprising: registration means for registering, from text that contains a plurality of identical or similar character strings each formed of a plurality of words, a specific character string among the similar character strings; and conversion means, connected to the registration means, for replacing a similar character string in the text with identification data of the specific character string registered in the registration means and with difference data that includes the start position of the character string to be deleted from the specific character string, the number of characters to be deleted, and the character string to be inserted in place of the deleted character string; wherein the device is configured so that the specific character string is divided such that the start position falls within a first predetermined number from the head of the specific character string, and so that the number of deleted characters is adjusted to fall within a second predetermined number.