JP2579304B2

JP2579304B2 - Contracted word processor

Info

Publication number: JP2579304B2
Application number: JP61252601A
Authority: JP
Inventors: 壽彦横川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-10-23
Filing date: 1986-10-23
Publication date: 1997-02-05
Anticipated expiration: 2012-02-05
Also published as: JPS63106075A

Description

Technical field: The present invention relates to a language analysis device and, more specifically, to a contracted word processing device for language analysis in machine translation devices and the like.

Prior art: When contracted forms of pronouns, relatives, interrogatives and the like with auxiliary verbs (it'll, who'd, what's, etc.) are handled as they stand, grammar rules must be written solely for those contracted forms and new grammatical categories become necessary, so consistency is easily lost. Because such handling does not follow the grammatical system, it also tends to make it harder for a person to trace the analysis. In addition, as noted above, handling contractions of pronouns, relatives, interrogatives and the like with be-verbs, have-verbs, and auxiliary verbs (it's, what's, etc.) as they stand requires new grammar symbols and rules for analysis. This not only demands extensive rework but also impairs the uniformity of the grammar system, which in turn raises the likelihood of erroneous analysis.

Objective: The present invention was made in view of the circumstances described above. Its object is to provide a contracted word processing device that handles a contracted word not in its contracted form but restored to the two words it stood for before contraction, thereby eliminating the need to modify the grammar and reducing the possibility of the erroneous analysis described above.

Configuration: The invention is characterized by comprising a dictionary search unit that receives a target-language sentence, performs a dictionary search on the words of the input sentence, and determines whether an entry exists in the dictionary; a dictionary registered word processing unit that, when the dictionary search unit determines that an entry exists, retrieves the pre-contraction form of the contracted word from the dictionary and processes it; and a dictionary unregistered word processing unit that, when the dictionary search unit determines that no entry exists, identifies the type of the contracted form and performs processing specific to each type to restore the contracted word to its pre-contraction form. The invention is described below on the basis of embodiments.

FIG. 14 illustrates the overall configuration of an English-to-Japanese automatic translation device to which the contracted word processing method of the present invention is applied. Needless to say, the invention can be applied effectively not only to an English-to-Japanese translation device but also to any language analysis device that analyzes sentences of the input language when one language is translated into another.

In FIG. 14, 1 is an input unit, 2 is English text, 3 is a pre-editing unit, 4 is a morphological analysis unit, 5 is a syntactic analysis I unit, 6 is a syntactic analysis II unit, 7 is an operation display unit, 8 is a word dictionary, 9 is an analysis rule file, 10 is a control unit, 11 is a structure conversion unit, 12 is a translation generation unit, 13 is a post-editing unit, 14 is an output unit, and 15 is a Japanese sentence. As shown in the figure, the translation device has an input unit 1 through which the English text 2 to be translated into Japanese is entered. The input unit 1 may include, for example, a keyboard with character keys (such as alphanumeric keys) and function keys, an optical character reader (OCR) that reads English text recorded on paper, and/or a file storage device that reads English text recorded on a medium such as a magnetic disk.

The English text entered through the input unit 1 is read into the pre-editing unit 3, where preprocessing for translation is performed. This stage mainly identifies sentences and handles unknown words, and it functions as part of morphological analysis.

The pre-edited English data is transferred to the morphological analysis unit 4 together with the information obtained in pre-editing. The morphological analysis unit 4 consults the word dictionary 8, divides the text into sentences, and analyzes the morphemes of the English text. It handles unknown words, groups together items such as proper nouns, expressions of time, and expressions of number, and performs sentence-level processing such as recognizing tag questions and apposition. The morphological analysis rules are stored in the analysis rule file 9.

The morphologically analyzed English data is transferred to the syntactic analysis I unit 5 together with the dictionary information obtained in morphological analysis. The syntactic analysis I unit 5 applies grammar rules to the English data, analyzes the surface structure of each sentence, and finds every syntactic possibility.

The English data parsed by the syntactic analysis I unit 5 is sent, together with its analysis information, to the syntactic analysis II unit 6. This unit applies structural descriptions to the surface-level results of syntactic analysis I and selects a solution, producing a plausible parse tree for the English text and building its structure. These structural analysis rules are likewise stored in the analysis rule file 9.

The parsed English data is transferred to the structure conversion unit 11 as parse tree data. The structure conversion unit 11 builds the corresponding Japanese syntax tree from the syntax tree that serves as the intermediate structure of the English sentence, converting it into a Japanese base structure from which a Japanese sentence is easy to generate.

The syntax data representing the converted Japanese base structure is sent to the translation generation unit 12, which generates the translation. This unit generates a Japanese sentence from the tree structure of the Japanese syntax tree.

The generated Japanese sentence data, that is, the translation data, is sent to the post-editing unit 13. Using information gathered during translation, the post-editing unit 13 consults the dictionary 8 and revises the translation data to produce more natural Japanese. The resulting Japanese sentence data is transferred to the output unit 14 and output from it as the translated Japanese sentence 15. The output unit 14 includes, for example, a printer, a display, and/or a file storage device such as a magnetic disk.
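The stages described so far, from pre-editing through post-editing, form a linear pipeline. The following is only a rough Python sketch of that ordering: the stage names are taken from the description, but `make_stage`, `run_pipeline`, and the dictionary-shaped data record are assumptions made for illustration, and each stage is a stub that does nothing except record that it ran.

```python
from functools import reduce

# Stage order as described for FIG. 14 (units 3, 4, 5, 6, 11, 12, 13).
STAGES = ["pre-editing", "morphological analysis", "syntactic analysis I",
          "syntactic analysis II", "structure conversion",
          "translation generation", "post-editing"]


def make_stage(name):
    """Return a stub stage (hypothetical) that only records that it ran."""
    def stage(data):
        return {**data, "trace": data["trace"] + [name]}
    return stage


def run_pipeline(english_text):
    """Pass the text through every stage in the described order."""
    initial = {"text": english_text, "trace": []}
    return reduce(lambda data, stage: stage(data),
                  map(make_stage, STAGES), initial)
```

In this sketch each stage receives the accumulated data of all earlier stages, which mirrors the way the description has each unit forward its results "together with" the information it obtained.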

This series of translation steps is controlled by the control unit 10, which supervises the device as a whole. In the illustrated example, the word dictionary 8 stores dictionary data for English and Japanese words, describing not only vocabulary but also various information such as dependency (co-occurrence) relations, meaning, number (singular or plural), and part of speech. The analysis rule file 9 stores the rule data for morphological analysis and for analysis of English sentences.

The operation display unit 7 is connected to the control unit 10. Through the operation display unit 7 the operator gives various instructions to the device. It has operation keys, such as a translation instruction key and cursor keys, and displays and indicators that show the input English text, the Japanese translation result, intermediate data such as dictionary information, and various instructions to the operator. Many of these operation and display functions may be incorporated into the keyboard when the input unit 1 has one, and into the display when the output unit 14 has one.

In a machine translation device such as the one described above, the present invention processes any contracted word found in the English text 2 without modifying the grammar and without erroneous analysis. Table 1 shows an example of the correspondence between contracted forms (I) and their pre-contraction forms.

Table 1

In Table 1, 1 denotes contracted forms of the first type (contraction of one word), 2 denotes contracted forms of the second type, and 3 denotes contracted forms of the third type (two words contracted into one word).

FIG. 1 is an overall block diagram illustrating an embodiment of contracted word processing according to the present invention, and FIG. 2 is a flowchart explaining the operation of the circuit in FIG. 1. In the figures, 20 is an input processing unit, 21 is a unit extraction unit, 22 is a delimiter table, 23 is a dictionary search unit, 24 is a reference dictionary, 25 is a dictionary registered word processing unit, 26 is a dictionary unregistered word processing unit, and 27 is a dictionary information storage table. When English text is entered into the input processing unit 20 (step 1), the unit extraction unit 21 cuts out dictionary lookup units from the text with reference to the delimiter table 22 (step 2). If the end of the input has not been reached (step 3), the dictionary search unit 23 searches the reference dictionary 24 (step 4). If an entry exists (step 5), the dictionary registered word processing unit 25 processes the registered word (step 6); if no entry exists (step 5), the dictionary unregistered word processing unit 26 processes the unregistered word (step 7). If the end has been reached at step 3, the result is output (step 8).
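The control flow of FIG. 2 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the delimiter set, the toy dictionary contents, and the names `cut_units` and `process_sentence` are all invented for the example, and the two branches are simplified stand-ins for units 25 and 26 that only label each unit.

```python
# Hypothetical sketch of the FIG. 2 loop; all names and data are illustrative.
DELIMITERS = {" ", ".", ",", "?", "!"}      # stand-in for delimiter table 22
REFERENCE_DICT = {"i": "pronoun", "a": "article", "worm": "noun",
                  "ain't": ["am", "not"]}   # stand-in for reference dictionary 24


def cut_units(sentence):
    """Yield (unit, start, end) with 1-based inclusive positions (step 2)."""
    start = None
    for i, ch in enumerate(sentence, start=1):
        if ch in DELIMITERS:
            if start is not None:
                yield sentence[start - 1:i - 1], start, i - 1
                start = None
        elif start is None:
            start = i
    if start is not None:                   # unit running to the end of input
        yield sentence[start - 1:], start, len(sentence)


def process_sentence(sentence):
    """Route each unit to the registered or unregistered branch (steps 3-7)."""
    table = []                              # stand-in for storage table 27
    for unit, start, end in cut_units(sentence):
        if unit.lower() in REFERENCE_DICT:  # steps 4 and 5: entry found
            table.append(("registered", unit, start, end))
        else:                               # step 7: no entry
            table.append(("unregistered", unit, start, end))
    return table                            # step 8: output at end of input
```

The apostrophe is deliberately left out of the delimiter set here so that a contraction such as ain't reaches the dictionary lookup as a single unit.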

Table 2 shows the reference dictionary. According to this dictionary, for example, ain't is a contraction made up of am and not.

Table 3 shows that, for the sentence I ain't a worm., ain't consists of am and not: am has start position 3 (the a of am) and end position 4 (the m of am), and not has start position 5 (the n of not) and end position 7 (the t of not). These start and end positions are determined by a formula applied during registered word processing (non-integer values are truncated).

Table 4 shows an example for I'm a worm. I wouldn't ...; from Table 1, I'm is a contracted form of the first type and wouldn't is a contracted form of the second type.

In Table 4, the I of I'm occupies positions 1 to 1 and 'm (= am) occupies positions 2 to 3. For wouldn't, would has start position 15 (w) and end position 19 (d), and n't (= not) has start position 20 (n) and end position 22 (t).
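The positions quoted for Table 4 are consistent with 1-based, inclusive character offsets, provided a single space follows the period in the example text (an assumption; the source prints the two sentences run together). A small check in Python, using a hypothetical helper `span_of`:

```python
SENTENCE = "I'm a worm. I wouldn't"   # assumes one space after the period


def span_of(sentence, fragment):
    """Return the 1-based inclusive (start, end) of the first occurrence."""
    idx = sentence.index(fragment)    # 0-based offset of the fragment
    return idx + 1, idx + len(fragment)
```

With this convention `span_of(SENTENCE, "would")` gives (15, 19) and `span_of(SENTENCE, "n't")` gives (20, 22), matching the values stated for Table 4.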

FIG. 3 is a block diagram of the dictionary registered word processing unit in FIG. 1, and FIG. 4 is a flowchart explaining the operation of FIG. 3. In the figures, 30 is a contracted word determination unit, 31 is a unit that reads each pre-contraction word form, 32 is a dictionary search unit for the pre-contraction word forms, 33 is a position information assigning unit, and 34 is a dictionary information storage table. When the contracted word determination unit 30 determines that the word is a contracted word (step 1) and the contracted word is registered in the dictionary, the word form reading unit 31 reads each pre-contraction word form from the dictionary information (step 2), and the dictionary search unit 32 performs a dictionary search on each of those word forms (step 3). Next, the position information assigning unit 33 attaches position information to the word forms so that they do not overlap (step 4), and each word form is recorded with its dictionary information in the dictionary information storage table 34 (step 5). If step 1 determines that the word is not a contracted word, it is recorded in the dictionary information storage table 27 (step 6). Regarding Note 1 in FIG. 4, contracted forms need not be registered in the dictionary at all; they may instead all be registered in the third-type contracted form table. Regarding Note 2, when the dictionary search is performed on the pre-contraction word forms, only constituent word forms that are registered in the dictionary are permitted; however, the entry may instead hold the location of each word form in the dictionary, or (a copy of) the dictionary information of each word form. The processing inside the dotted line in FIG. 4 takes a as the start position, b as the end position, and c as the number of constituent words, and sets the start position of the n-th constituent to [formula missing] and its end position to [formula missing].
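The formulas for dividing a span among the constituents are not legible in this text, so the following Python sketch substitutes one possible rule. It is an assumption, chosen only because it uses truncation and reproduces the worked example for ain't (am at 3 to 4, not at 5 to 7); it should not be read as the patent's actual formula. The names `split_positions` and `expand_registered` are hypothetical.

```python
def split_positions(a, b, c):
    """Divide the span a..b (1-based, inclusive) among c constituents.

    Assumed rule: with L = b - a + 1, the n-th constituent starts at
    a + floor(L * (n - 1) / c) and ends at a + floor(L * n / c) - 1.
    This is a guess that matches the worked example, nothing more.
    """
    length = b - a + 1
    return [(a + length * (n - 1) // c, a + length * n // c - 1)
            for n in range(1, c + 1)]


def expand_registered(entry_parts, a, b):
    """Pair each pre-contraction word form with a non-overlapping span."""
    return list(zip(entry_parts, split_positions(a, b, len(entry_parts))))
```

Under this rule consecutive spans are contiguous and never overlap, since each end position is one less than the next start position. The sketch assumes the span holds at least as many characters as there are constituents.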

FIG. 5 is a block diagram illustrating an example of the dictionary unregistered word processing unit in FIG. 1, and FIG. 6 is a flowchart explaining the operation of FIG. 5. In the figures, 40 is a whole-word matching unit, 41 is a third-type contracted form table, 42 is an end-portion matching unit, 43 is a second-type contracted form table, 44 is an end-portion matching unit, 45 is a first-type contracted form table, 46 is a first-type processing unit, 47 is a second-type processing unit, 48 is a third-type processing unit, and 49 is a dictionary information storage table. The first-type to third-type contracted form tables correspond to contracted forms (I) 1 to 3 in Table 1, respectively. The whole-word matching unit 40 matches the word as a whole against the third-type contracted form table 41 (step 1) and determines whether there is a match (step 2). If there is, the third-type processing unit 48 performs third-type processing (step 3). If step 2 finds no match, the end-portion matching unit 42 matches the end of the word against the second-type contracted form table 43 (step 4); if there is a match (step 5), the second-type processing unit 47 performs second-type processing (step 6). If step 5 finds no match, the end-portion matching unit 44 matches the end of the word against the first-type contracted form table 45 (step 7); if there is a match (step 8), the first-type processing unit 46 performs first-type processing (step 9). If step 8 finds no match, the word is recorded in the dictionary information storage table 49 as a dictionary unregistered word.
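The matching order of FIG. 6 (whole word against the third-type table, then the word ending against the second-type and first-type tables) can be sketched as below. Table 1 is not reproduced in this text, so the table contents here are hypothetical examples, and `classify` is an invented name; only the order of the checks follows the description.

```python
# Hypothetical table contents; the actual entries of Table 1 are not shown.
TYPE3_TABLE = {"won't": ["will", "not"], "shan't": ["shall", "not"]}  # table 41
TYPE2_SUFFIXES = ["n't"]                                              # table 43
TYPE1_SUFFIXES = ["'ll", "'m", "'d", "'s", "'re", "'ve"]              # table 45


def classify(word):
    """Return 3, 2, or 1 for the contraction type, or None if nothing matches."""
    w = word.lower()
    if w in TYPE3_TABLE:                              # steps 1-3: whole word
        return 3
    if any(w.endswith(s) for s in TYPE2_SUFFIXES):    # steps 4-6: word ending
        return 2
    if any(w.endswith(s) for s in TYPE1_SUFFIXES):    # steps 7-9: word ending
        return 1
    return None                                       # unregistered word
```

Checking the whole-word table first matters in this sketch: an irregular form such as won't also ends in n't, and would otherwise be split into the non-word "wo" plus not.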

FIG. 7 is a block diagram illustrating an example of the processing unit for first-type contracted words, and FIG. 8 is a flowchart explaining the operation of FIG. 7. In the figures, 50 is a search unit that searches using the form with the contracted part removed, 51 is a contractible word table (the supplementary table for the first type in Table 1), 52 is a search unit that performs a dictionary search using the form with the contracted part removed, 53 is an extraction unit that extracts the pre-contraction form, 54 is a pre-contraction form table (the pre-contraction form (II) table for the first type in Table 1), 55 is a search unit that performs a dictionary search using the pre-contraction form, and 56 is a dictionary information storage table. This processing applies to contraction types in which the words the contracted part can attach to are limited to a certain set. First, the search unit 50 searches the contractible word table 51 using the form with the contracted part removed (step 1) and determines whether it is a contractible word (step 2). If not, the whole word is recorded in the dictionary information storage table 56 as an unregistered word (step 3). If so, the search unit 52 performs a dictionary search using the form with the contracted part removed (step 4) and records that part in the dictionary information storage table 56 (step 5). Next, the extraction unit 53 retrieves the pre-contraction form or forms of the contracted part from the table 54 (step 6), the search unit 55 performs a dictionary search on each pre-contraction form (step 7), and each pre-contraction form is recorded in the dictionary information storage table 56 (step 8). Regarding Note 1 in FIG. 8, words registered in the dictionary are not permitted in these tables; naturally, however, the tables themselves may store the location of the information in the dictionary or (a copy of) the dictionary information itself. Regarding Note 2, processing such as handling of the apostrophe 's may also be performed here. Regarding Note 3, the position information is the start and end positions of the part with the contracted part removed; regarding Note 4, the position information is, for every pre-contraction form (even when there are several), the start and end positions of the contracted part.
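The first-type steps of FIG. 8 can be sketched as follows. This is an illustrative Python outline under stated assumptions: the contents of `CONTRACTIBLE`, `EXPANSIONS_1`, and `DICTIONARY` are invented stand-ins for tables 51 and 54 and the word dictionary, and `process_type1` with its row layout (form, start, end, dictionary information) is hypothetical.

```python
# Hypothetical stand-ins for the tables; the real contents are not shown.
CONTRACTIBLE = {"'m": {"i"},
                "'ll": {"i", "you", "he", "she", "it", "we", "they", "who"}}
EXPANSIONS_1 = {"'m": ["am"], "'ll": ["will", "shall"]}
DICTIONARY = {"i": "pronoun", "it": "pronoun", "am": "verb",
              "will": "aux", "shall": "aux"}


def process_type1(word, start, suffix):
    """Return storage-table rows (form, start, end, info) for one word."""
    end = start + len(word) - 1
    stem = word[:-len(suffix)]
    # Steps 1-3: if the stem cannot take this contraction, the whole word
    # is recorded as unregistered.
    if stem.lower() not in CONTRACTIBLE.get(suffix, set()):
        return [(word, start, end, None)]
    # Steps 4-5: the stem keeps the positions of the uncontracted part.
    stem_end = start + len(stem) - 1
    rows = [(stem, start, stem_end, DICTIONARY.get(stem.lower()))]
    # Steps 6-8: every candidate expansion shares the contracted part's span.
    for form in EXPANSIONS_1[suffix]:
        rows.append((form, stem_end + 1, end, DICTIONARY.get(form)))
    return rows
```

For I'm starting at position 1, this yields I at 1 to 1 and am at 2 to 3, in line with the Table 4 values. When a contracted part has several possible expansions, each one is recorded over the same span and the choice is left to later analysis.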

FIG. 9 is a block diagram illustrating an example of processing for second-type contracted words, and FIG. 10 is a flowchart explaining the operation of FIG. 9. In the figures, 60 is a search unit that performs a dictionary search using the form with the contracted part removed, 61 is a dictionary information storage table, 62 is an extraction unit that extracts the pre-contraction form, 63 is a pre-contraction form table (the pre-contraction form table for the second type in Table 1), and 64 is a search unit that performs a dictionary search using the pre-contraction form. This processing applies to contractions in which the words the contracted part can attach to are not restricted. First, the search unit 60 performs a dictionary search using the form with the contracted part removed (step 1) and determines whether an entry exists (step 2). If not, the form with the contracted part removed is recorded in the dictionary information storage table 61 as an unregistered word (step 3); if so, that form is recorded in the dictionary information storage table 61 (step 4). Next, the extraction unit 62 retrieves the pre-contraction form or forms of the contracted part from the table 63 (step 5), the search unit 64 performs a dictionary search on each pre-contraction form (step 6), and each form is recorded in the dictionary information storage table 61 (step 7). Regarding Note 1 in FIG. 10, the whole word may instead be treated as unregistered, in which case processing goes directly to the end. Regarding Note 2, the position information is the start and end positions of the part with the contracted part removed; regarding Note 3, the position information is, in every case, the start and end positions of the contracted part. Note 4 is the same as Note 1 of the first-type contracted word processing described with reference to FIG. 8.
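The second-type steps of FIG. 10 differ from the first type mainly in that the stem is not checked against a list of permitted words. A rough Python sketch follows, with invented table contents standing in for table 63 and the word dictionary and a hypothetical function name `process_type2`; rows have the layout (form, start, end, dictionary information).

```python
# Hypothetical stand-ins; the real table contents are not shown.
EXPANSIONS_2 = {"n't": ["not"]}
DICTIONARY = {"would": "aux", "not": "adverb"}


def process_type2(word, start, suffix):
    """Return storage-table rows (form, start, end, info) for one word."""
    end = start + len(word) - 1
    stem = word[:-len(suffix)]
    stem_end = start + len(stem) - 1
    # Steps 1-4: the stem is recorded either way; its info is None when the
    # dictionary has no entry, i.e. it is stored as an unregistered word.
    rows = [(stem, start, stem_end, DICTIONARY.get(stem.lower()))]
    # Steps 5-7: each expansion is recorded over the contracted part's span.
    for form in EXPANSIONS_2[suffix]:
        rows.append((form, stem_end + 1, end, DICTIONARY.get(form)))
    return rows
```

For wouldn't starting at position 15, this gives would at 15 to 19 and not at 20 to 22, matching the Table 4 values. The alternative mentioned in Note 1, treating the whole word as unregistered when the stem has no entry, is not implemented in this sketch.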

FIG. 11 is a block diagram illustrating an example of the processing unit for third-type contracted words, and FIG. 12 is a flowchart explaining the operation of FIG. 11. In the figures, 70 is an extraction unit that extracts each pre-contraction word form, 71 is a contracted form table (the third-type contracted form table in Table 1), 72 is a dictionary search unit, 73 is a position information assigning unit, and 74 is a dictionary information storage table. Third-type contracted forms are contracted in irregular ways, so the whole form must be registered in a table. In the present invention their processing is the same as that of contracted words registered in the dictionary. There is therefore no essential difference between treating a form as a third-type contracted form and registering it in the dictionary; it is also possible to permit only one of the two and omit the processing for the other. First, the extraction unit 70 retrieves each pre-contraction word form from the contracted form table 71 (step 1), and the dictionary search unit 72 performs a dictionary search on each word form (step 2). Next, the position information assigning unit 73 assigns position information to each word form (step 3), and each word form is recorded in the dictionary information storage table 74 (step 4). Regarding Note 1 in FIG. 12, only word forms registered in the dictionary are permitted; regarding Note 2, the details are the same as in the processing of contracted forms registered in the dictionary.

FIG. 13 is a detailed block diagram showing an embodiment of the contracted word processing unit according to the present invention. Each block in the figure has already been described with reference to FIGS. 1 to 12, so the blocks carry the same reference numbers used in those figures and their detailed description is omitted.

Effect: As is clear from the above description, according to the present invention a contracted word is processed not in its contracted form but restored to the two words it stood for before contraction. No modification of the grammar or the like is therefore needed, and processing can be carried out with a reduced possibility of erroneous analysis.

[Brief description of the drawings]

FIG. 1 is an overall block diagram illustrating an embodiment of the contracted word processing device according to the present invention. FIG. 2 is a flowchart explaining the operation of FIG. 1. FIG. 3 is a block diagram illustrating an example of the dictionary registered word processing unit. FIG. 4 is a flowchart explaining the operation of FIG. 3. FIG. 5 is a block diagram illustrating an example of the dictionary unregistered word processing unit. FIG. 6 is a flowchart explaining the operation of FIG. 5. FIG. 7 is a block diagram illustrating an example of the processing unit for first-type contracted words. FIG. 8 is a flowchart explaining the operation of FIG. 7. FIG. 9 is a block diagram illustrating the processing unit for second-type contracted words. FIG. 10 is a flowchart explaining the operation of FIG. 9. FIG. 11 is a block diagram illustrating an example of the processing unit for third-type contracted words. FIG. 12 is a flowchart explaining the operation of FIG. 11. FIG. 13 is a detailed block diagram of the contracted word processing unit according to the present invention. FIG. 14 is a diagram showing the overall configuration of an automatic translation device to which the contracted word processing of the present invention is applied.

20: input processing unit; 21: unit extraction unit; 22: delimiter table; 23: dictionary search unit; 24: reference dictionary; 25: dictionary registered word processing unit; 26: dictionary unregistered word processing unit; 27: dictionary information storage table; 30: contracted word determination unit; 31: unit that reads each pre-contraction word form; 32: dictionary search unit for pre-contraction word forms; 33: position information assigning unit; 34: dictionary information storage table; 40: whole-word matching unit; 41: third-type contracted form table; 42: end-portion matching unit; 43: second-type contracted form table; 44: end-portion matching unit; 45: first-type contracted form table; 46: first-type processing unit; 47: second-type processing unit; 48: third-type processing unit; 49: dictionary information storage table; 50: search unit that searches using the form with the contracted part removed; 51: contractible word table; 52: search unit that performs a dictionary search using the form with the contracted part removed; 53: extraction unit that extracts the pre-contraction form; 54: pre-contraction form table; 55: search unit that performs a dictionary search using the pre-contraction form; 56: dictionary information storage table; 60: search unit that performs a dictionary search using the form with the contracted part removed; 61: dictionary information storage table; 62: extraction unit that extracts the pre-contraction form; 63: pre-contraction form table; 64: search unit that performs a dictionary search using the pre-contraction form; 70: extraction unit that extracts each pre-contraction word form; 71: contracted form table; 72: dictionary search unit; 73: position information assigning unit; 74: dictionary information storage table.

Claims

(57) [Claims]

1. A contracted word processing device comprising: a dictionary search unit that receives a target-language sentence, performs a dictionary search on the words of the input sentence, and determines whether an entry exists in the dictionary; a dictionary registered word processing unit that, when the dictionary search unit determines that an entry exists, reads each pre-contraction word form, performs a dictionary search on each word form read, attaches position information to the pre-contraction word forms so that they do not overlap, and records them in a dictionary storage table; and a dictionary unregistered word processing unit that has a plurality of contracted form tables for different contraction types and that, when the dictionary search unit determines that no entry exists, compares the whole or the end portion of the contracted form of the unregistered word against the contracted forms in the contracted form table for each type, identifies the matching type as the type of the given contracted form, restores the pre-contraction word forms by processing specific to that type, attaches position information to the pre-contraction word forms so that they do not overlap, and records them in the dictionary storage table.