JPH0827804B2

JPH0827804B2 - Japanese dictionary data management method

Info

Publication number: JPH0827804B2
Application number: JP1271855A
Authority: JP
Inventors: 雅博奥; 伸一郎高木; 浩司松岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1989-10-20
Filing date: 1989-10-20
Publication date: 1996-03-21
Anticipated expiration: 2011-03-21
Also published as: JPH03134773A

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Field of Industrial Application) The present invention relates to a Japanese dictionary data management method for managing the data of a Japanese dictionary used to extract words from Japanese text in computer-based Japanese language processing. More specifically, it relates to a Japanese dictionary data management method that shortens word search time by dividing the Japanese dictionary into a plurality of dictionaries according to word length.

(Prior Art) The conventional way of dividing Japanese text into words is to register all words in a single Japanese dictionary and to use that dictionary to segment the text.

FIG. 4 illustrates a conventional method for extracting words of two or more characters from Japanese text. In the figure, 1 is the target character string for word candidate extraction, 2 is the sequence of word candidates exhaustively extracted from the target character string, 3 is the heading key part of the Japanese dictionary, whose field length equals the length of the longest word (n characters), and 4 is the data part.

FIG. 5 is a flowchart outlining word extraction in the method of FIG. 4. As FIGS. 4 and 5 show, extraction is exhaustive: candidates are taken from the target character string 1, starting at two characters and growing by one character at a time, and each candidate is matched against the dictionary headings. In this conventional method every candidate, whatever its length, must be brought to the n-character length of the heading key part 3 before matching, so candidates shorter than n characters need extra processing such as padding with blanks. In addition, matching generally takes longer as the key length increases.

In FIG. 4, to search the Japanese dictionary with the candidate 特許 ("patent") from the word candidate sequence 2, (n-2) blanks are appended to 特許 to reach the dictionary's heading key length of n characters, and the dictionary is searched with the resulting n-character key. Only when this search succeeds is 特許 known to be a word. Next, to test whether 特許出, which is one character longer than 特許, is a word, (n-3) blanks are appended in the same way and the dictionary is searched again. 特許出 is not registered in the Japanese dictionary, so it is found not to be a word. The same processing is then applied to 許出, 出願 ("application"), and 出願人 ("applicant"). As a result, the three candidates 特許, 出願, and 出願人 are found to be registered in the Japanese dictionary and are recognized as words.
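
The conventional lookup described above can be sketched as follows. This is a minimal illustration, not the implementation behind FIG. 4 or FIG. 5: the function names, the use of a Python dict as the dictionary, the full-width space as the blank, and the key length N = 6 are all assumptions made for the example.

```python
# Sketch of the conventional method: every candidate is padded to the
# fixed key length n before it is looked up in the single dictionary.
BLANK = "\u3000"  # assumed padding character (full-width space)


def pad_key(candidate: str, n: int) -> str:
    """Append blanks so the candidate fills the n-character key field."""
    if len(candidate) > n:
        raise ValueError("candidate is longer than the key field")
    return candidate + BLANK * (n - len(candidate))


def extract_words_conventional(text: str, dictionary: dict, n: int) -> list:
    """Exhaustively test every substring of 2..n characters."""
    found = []
    for i in range(len(text)):
        for j in range(i + 2, min(i + n, len(text)) + 1):
            candidate = text[i:j]
            # Even a two-character candidate is compared as an n-character key.
            if pad_key(candidate, n) in dictionary:
                found.append(candidate)
    return found


N = 6  # hypothetical longest heading key length
dictionary = {pad_key(w, N): None for w in ("特許", "出願", "出願人")}
print(extract_words_conventional("特許出願人", dictionary, N))
```

Every stored key occupies N characters regardless of the word's actual length, which is the wasted space and the extra comparison cost that the following section identifies as the problem.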

(Problem to be Solved by the Invention) As described above, the conventional method requires even a short word candidate to be padded to the longest heading key length in the Japanese dictionary. Consequently, a short candidate incurs the long search time associated with the longest heading key, and the key part contains a large amount of wasted blank space, which is inefficient.

The present invention has been made in view of the above, and its object is to provide a Japanese dictionary data management method that wastes little space and has a short search time.

[Structure of Invention]

(Means for Solving the Problem) To achieve the above object, the Japanese dictionary data management method of the present invention comprises: classification registration means for classifying and registering each word in one of a plurality of word dictionaries according to the number of characters in the word; additional registration means for, when a word that is not present in a word dictionary for shorter words appears as the leading characters of a word present in a word dictionary for longer words, additionally registering that absent word in the word dictionary for shorter words; and next search dictionary type information setting means for, when a word that is present in a word dictionary for shorter words appears as the leading characters of a word present in a word dictionary for longer words, setting in the word dictionary for shorter words next search dictionary type information that indicates, among those word dictionaries for longer words, the one with the smallest number of characters.

(Operation) In the Japanese dictionary data management method of the present invention, each word is classified and registered in one of a plurality of word dictionaries according to its number of characters. When a word that is not present in a word dictionary for shorter words appears as the leading characters of a word present in a word dictionary for longer words, that word is additionally registered in the word dictionary for shorter words. When a word that is present in a word dictionary for shorter words appears as the leading characters of a word present in a word dictionary for longer words, next search dictionary type information indicating, among those word dictionaries for longer words, the one with the smallest number of characters is set in the word dictionary for shorter words.

(Embodiments) Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 illustrates a Japanese dictionary data management method according to one embodiment of the present invention, and FIG. 2 is a flowchart showing its operation. In FIG. 1, as before, 1 is the target character string for word candidate extraction and 2 is the sequence of word candidates exhaustively extracted from it. In this embodiment the word dictionary is divided according to word length: as illustrated, it consists of a two-character word dictionary 10 holding two-character words and a three-or-more-character word dictionary 11 holding words of three or more characters.

In the two-character word dictionary 10, 5 is the heading key part, which is two characters long. 6 is the next search dictionary type information: it is set to "1" if the two characters in the corresponding heading key part 5 are the first two characters of a word in the three-or-more-character word dictionary 11, and to "0" otherwise. 7 is the data part of the two-character word dictionary 10.

In the three-or-more-character word dictionary 11, 8 is the heading key part and 9 is the data part.
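
As a rough illustration of the two record layouts just described, the sketch below models dictionary 10 and dictionary 11 as Python dicts. The class name `TwoCharRecord`, the helper `next_search_flag`, and the "noun" data values are hypothetical; the source does not say what the data parts 7 and 9 contain.

```python
from dataclasses import dataclass


@dataclass
class TwoCharRecord:
    next_flag: int  # item 6: 1 if a longer word starts with this key, else 0
    data: str       # item 7: data part (contents assumed for illustration)


def next_search_flag(key: str, three_plus_dict: dict) -> int:
    """Return 1 if any word in dictionary 11 begins with the two-character key."""
    return 1 if any(word.startswith(key) for word in three_plus_dict) else 0


# Dictionary 11: heading key (item 8) -> data part (item 9)
three_plus_dict = {"出願人": "noun"}

# Dictionary 10: two-character heading key (item 5) -> record (items 6 and 7)
two_char_dict = {
    key: TwoCharRecord(next_search_flag(key, three_plus_dict), "noun")
    for key in ("特許", "出願")
}
print(two_char_dict)
```

With these entries 出願 carries the flag 1 because 出願人 begins with it, while 特許 carries 0.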

Next, the operation is described with reference to the flowchart in FIG. 2.

In FIG. 2, the target character string 1 is denoted by the letter l, the starting character position i of the string to be searched within l is set to 1 (i = 1), and the search key length is set to 2 (steps 110 and 120). The i-th through j-th characters of the target character string 1 are then taken as the first search key (step 130). j is initially set to 2; in the next step 140 it is checked whether j is greater than 2 (j > 2), and depending on the result either the two-character word dictionary 10 or the three-or-more-character word dictionary 11 is searched.

Specifically, in the case shown in FIG. 1, the first two characters of the target character string 1, 特許 ("patent"), running from the first position (i = 1) to the second (j = 2), are extracted as a word candidate 2. Since j is not greater than 2 in this case, processing proceeds to step 150, where the two-character word dictionary 10 is searched with 特許 as the search key. Because the key length of the two-character word dictionary 10 is two characters, the search can use 特許 as is, and there is no need to append blanks as in the conventional method.

When the two-character word dictionary 10 is searched with 特許 ("patent") as the search key, an exact match is found because 特許 is registered as a word (step 160), and all exactly matching words are then retrieved by the next search (step 170). It is then checked whether the next search dictionary type information 6 of the retrieved word is "1", that is, whether the three-or-more-character word dictionary 11 contains a word whose first two characters are 特許 (step 180). The next search dictionary type information 6 of 特許 is "0", which shows that dictionary 11 contains no word beginning with 特許. It follows that the candidate 特許出 in the word candidate sequence 2 is not in dictionary 11, so no dictionary search is needed for it. The next candidate, 許出, is found not to be registered as soon as the two-character word dictionary 10 is searched, so it is clearly not a word.

Next, the same search process finds that 出願 ("application") is registered in the two-character word dictionary 10 and that its next search dictionary type information 6 is "1", which means that the three-or-more-character word dictionary 11 contains a word whose first two characters are 出願. The search key length is therefore set to 3, and dictionary 11 is searched with 出願人 ("applicant"), formed by adding one character to 出願, as the search key (step 190 onward). Since 出願人 is registered in dictionary 11, it is recognized as a word.

In this way, the three strings 特許, 出願, and 出願人 are recognized as words. The two words 特許 and 出願 are looked up with two-character keys only, so the search time is much shorter than with the conventional method.
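
The search walked through above can be sketched as follows. This is an approximation of the FIG. 2 procedure rather than a step-for-step transcription of the flowchart: the `(is_word, next_flag)` record layout, the `max_len` bound on candidate length, and the use of a Python set for dictionary 11 are assumptions introduced for the example.

```python
def extract_words(text: str, two_char_dict: dict, three_plus_dict: set,
                  max_len: int) -> list:
    """Look up two-character keys first; consult the longer-word
    dictionary only when the next search flag of the key is 1."""
    found = []
    for i in range(len(text) - 1):
        key = text[i:i + 2]                 # exact two-character key, no padding
        record = two_char_dict.get(key)
        if record is None:
            continue                        # e.g. 許出: nothing starts with this key
        is_word, next_flag = record
        if is_word:
            found.append(key)
        if next_flag == 0:
            continue                        # e.g. 特許: skip all longer candidates
        # Flag is 1: extend the key one character at a time.
        for j in range(i + 3, min(i + max_len, len(text)) + 1):
            candidate = text[i:j]
            if candidate in three_plus_dict:
                found.append(candidate)
    return found


two_char_dict = {"特許": (True, 0), "出願": (True, 1)}
three_plus_dict = {"出願人"}
print(extract_words("特許出願人", two_char_dict, three_plus_dict, max_len=3))
```

For the target string 特許出願人 only one lookup reaches dictionary 11 (出願人); the candidate 特許出 is never searched because the flag for 特許 is 0.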

To confirm the effectiveness of the Japanese dictionary data management method of this embodiment, it was applied to a system that performs morphological analysis of Japanese text using five dictionaries: a kanji term dictionary (words of two or more kanji), a single-kanji dictionary, a katakana dictionary, a hiragana dictionary, and a mixed-script dictionary. The conventional kanji term dictionary was split into a two-character word dictionary and a three-or-more-character word dictionary, and dictionary size and morphological analysis time were compared before and after the split. The results are as follows.

(1) Comparison of dictionary size
Kanji term dictionary before the split: 16,896,000 bytes
Total of the two dictionaries after the split: 15,633,408 bytes

(2) Comparison of morphological analysis time
The comparison used 20 sentences taken arbitrarily from a manual, with an average length of 39 characters per sentence. The results are as follows.

Morphological analysis time before the split: 177.7 seconds
Morphological analysis time after the split: 148.7 seconds

These results show that using this Japanese dictionary data management method for Japanese morphological analysis reduces the dictionary size by about 8% and the processing time required for morphological analysis by about 16%.
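
The reductions can be checked directly from the reported figures. The short calculation below uses only the four numbers given above; the variable names are illustrative.

```python
# Figures reported in the comparison above.
size_before = 16_896_000   # bytes, kanji term dictionary before the split
size_after = 15_633_408    # bytes, total of the two dictionaries after the split
time_before = 177.7        # seconds, morphological analysis before the split
time_after = 148.7         # seconds, morphological analysis after the split

size_reduction = (size_before - size_after) / size_before * 100
time_reduction = (time_before - time_after) / time_before * 100

print(f"dictionary size reduction: {size_reduction:.1f}%")
print(f"analysis time reduction:   {time_reduction:.1f}%")
```

The figures work out to roughly 7.5% for dictionary size and 16.3% for analysis time, which the text reports as about 8% and about 16%.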

FIG. 3 illustrates a Japanese dictionary data management method according to another embodiment of the present invention. This embodiment describes the processing for registering the nine words shown in the figure in a two-character word dictionary, a three-character word dictionary, and a four-or-more-character word dictionary.

In FIG. 3, 12 is the heading key part of the two-character word dictionary and 13 is its next search dictionary type information, which points to the three-character word dictionary or the four-or-more-character word dictionary. It is set to "3" if the two characters in the heading key part 12 are the first two characters of a word in the three-character word dictionary, to "4" if they are the first two characters of a word in the four-or-more-character word dictionary, and to "0" if they are in neither. 14 is the data part of the two-character word dictionary.

In the three-character word dictionary, 15 is the heading key part and 16 is the next search dictionary type information pointing to the four-or-more-character word dictionary. It is set to "4" if the three characters in the heading key part 15 are the first three characters of a word in the four-or-more-character word dictionary, and to "0" otherwise. 17 is the data part of the three-character word dictionary.

In the four-or-more-character word dictionary, 18 is the heading key part and 19 is the data part.

The nine words to be registered, namely AA, AAB, AB, ABA, ABAC, ABACDE, AE, AFE, and AFED, are divided by surface length and each registered in the two-character word dictionary, the three-character word dictionary, or the four-or-more-character word dictionary. At this stage, the next search dictionary type information of the two-character and three-character word dictionaries is not yet set.

Next, any string is detected that forms the first two characters of a word registered in the three-character or four-or-more-character word dictionary but is not itself registered in the two-character word dictionary. In FIG. 3, the string "AF" is such a string. "AF" is therefore registered in the heading key part of the two-character word dictionary as a dummy record.

Finally, the next search dictionary type information is set. For example, "AF" in the two-character word dictionary is a prefix of both "AFE" in the three-character word dictionary and "AFED" in the four-or-more-character word dictionary, so its next search dictionary type information 13 is set to "3", indicating the three-character word dictionary, which has the shorter word length. "AA" in the two-character word dictionary is a prefix of "AAB" in the three-character word dictionary, so its next search dictionary type information 13 is also set to "3". The same holds for "AB". "AE" in the two-character word dictionary is not a prefix of any word in the other dictionaries, so its next search dictionary type information 13 is set to "0". In the three-character word dictionary, "AAB" is not a prefix of any word in the four-or-more-character word dictionary, so its next search dictionary type information 16 is set to "0". "ABA" is a prefix of "ABAC" and "ABACDE" in the four-or-more-character word dictionary, so its next search dictionary type information 16 is set to "4". "AFE" is a prefix of "AFED" in the four-or-more-character word dictionary, so its next search dictionary type information 16 is set to "4".

The two-character, three-character, and four-or-more-character word dictionaries shown in FIG. 3 are created in this way.
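
The three construction steps of this embodiment (classify, add dummy records, set the next search dictionary type information) can be sketched as below. The function name `build_dictionaries` and the record fields `is_word` and `next` are hypothetical. Following the FIG. 3 walk-through, the sketch adds dummy records only to the two-character dictionary; a more general implementation of the claimed method might also need them in the three-character dictionary.

```python
def build_dictionaries(words):
    """Build the 2-, 3- and 4-or-more-character dictionaries of FIG. 3."""
    two, three, four_plus = {}, {}, {}

    # Step 1: classify each word by length; next-search info is not set yet.
    for word in words:
        if len(word) < 2:
            raise ValueError("words must have at least two characters")
        if len(word) == 2:
            two[word] = {"is_word": True, "next": 0}
        elif len(word) == 3:
            three[word] = {"is_word": True, "next": 0}
        else:
            four_plus[word] = {"is_word": True}

    # Step 2: register missing two-character prefixes as dummy records.
    for word in list(three) + list(four_plus):
        two.setdefault(word[:2], {"is_word": False, "next": 0})

    # Step 3: set the next search dictionary type information,
    # preferring the dictionary with the shorter word length.
    for key, record in two.items():
        if any(w.startswith(key) for w in three):
            record["next"] = 3
        elif any(w.startswith(key) for w in four_plus):
            record["next"] = 4
    for key, record in three.items():
        if any(w.startswith(key) for w in four_plus):
            record["next"] = 4
    return two, three, four_plus


words = ["AA", "AAB", "AB", "ABA", "ABAC", "ABACDE", "AE", "AFE", "AFED"]
two, three, four_plus = build_dictionaries(words)
for name, table in (("2", two), ("3", three), ("4+", four_plus)):
    print(name, table)
```

For the nine example words this yields the dummy record "AF" with next value 3 in the two-character dictionary, and next values of 0, 4, and 4 for "AAB", "ABA", and "AFE" in the three-character dictionary, matching the values described above.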

[Effect of the Invention]

As described above, according to the present invention, each word is classified and registered in one of a plurality of word dictionaries according to its number of characters. When a word that is not present in a word dictionary for shorter words appears as the leading characters of a word present in a word dictionary for longer words, that word is additionally registered in the word dictionary for shorter words. When a word that is present in a word dictionary for shorter words appears as the leading characters of a word present in a word dictionary for longer words, next search dictionary type information indicating, among those word dictionaries for longer words, the one with the smallest number of characters is set in the word dictionary for shorter words. As a result, a word being searched for does not have to be padded to the heading key length of the word dictionary, and the key part of the word dictionary contains no blank space, so storage is used more efficiently and search time is shortened.

[Brief description of drawings]

FIG. 1 illustrates a Japanese dictionary data management method according to one embodiment of the present invention, FIG. 2 is a flowchart showing the operation of the method of FIG. 1, FIG. 3 illustrates a Japanese dictionary data management method according to another embodiment of the present invention, FIG. 4 illustrates word extraction by the conventional method, and FIG. 5 is a flowchart showing the operation of FIG. 4.
1: target character string; 2: word candidate sequence; 5, 8: heading key part; 6: next search dictionary type information; 7, 9: data part; 10: two-character word dictionary; 11: three-or-more-character word dictionary.

Continuation of front page: (51) Int.Cl.⁶ classification table; office reference number 9288-5L; FI G06F 15/20 520B.

Claims

[Claims]

1. A Japanese dictionary data management method comprising: classification registration means for classifying and registering each word in one of a plurality of word dictionaries according to the number of characters in the word; additional registration means for, when a word that is not present in a word dictionary for shorter words among the plurality of word dictionaries appears as the leading characters of a word present in a word dictionary for longer words, additionally registering that absent word in the word dictionary for shorter words; and next search dictionary type information setting means for, when a word that is present in a word dictionary for shorter words among the plurality of word dictionaries appears as the leading characters of a word present in a word dictionary for longer words, setting in the word dictionary for shorter words next search dictionary type information indicating, among those word dictionaries for longer words, the one with the smallest number of characters.