JPH0412505B2

Info

Publication number: JPH0412505B2
Application number: JP58232527A
Authority: JP
Inventors: Hiroshi Kushima; Shigeru Hirose
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-12-09
Filing date: 1983-12-09
Publication date: 1992-03-04
Also published as: JPS60124782A

Description

Detailed Description of the Invention

Technical Field of the Invention

The present invention relates to a machine translation device that performs foreign language translation, such as from English to Japanese.

Prior Art and Problems

To have a device, rather than a person, perform translation (that is, conversion between different languages such as Japanese, English, and French), a single translation device is generally provided. A sentence in one language is input, and the device outputs a sentence in another language converted according to predetermined conversion rules built into the device, with no human intervention in between. However, the same sentence can carry different meanings depending on whether it appears in a title, in a table of contents, or in the body text. For example, sentence A,

The door opened for us

can be interpreted either as a noun phrase ("the door that was opened for us") or as a full sentence ("the door opened for us"). The first reading takes "opened" as a transitive verb and the second takes it as an intransitive verb. From this sentence alone the transitive and intransitive uses cannot be distinguished, so it is impossible to decide between the two readings. If sentence A appears in a title, however, the noun-phrase reading is the more probable one, because titles are usually nominalized. If it appears in body text, the full-sentence reading is the more probable one, because sentences in body text usually consist of a subject-predicate relationship.

The ambiguity of sentence A arises because an English word can belong to more than one part of speech, but ambiguity naturally arises from various other causes as well. For example, an English noun usually corresponds to several Japanese nouns, and depending on which one is chosen, two or three different interpretations result.

Purpose of the Invention

The present invention addresses the ambiguity problem in machine translation. It aims to make accurate translation possible by extracting the sentence type from information about the format of the text and making effective use of it.

Structure of the Invention

The present invention is a machine translation device that performs foreign language translation, such as from English to Japanese, characterized by comprising: sentence type detection means that divides the input sentence data to be translated into individual sentences and determines, from the format of each sentence, its sentence type, such as title, bulleted item, or body text; a plurality of translation mechanisms, provided in correspondence with the sentence types output by the sentence type detection means, each of which performs translation processing suited to its sentence type; and translation mechanism selection means that inputs each piece of sentence data to the translation mechanism suited to it, in accordance with the sentence type signal that the sentence type detection means outputs for each sentence. The invention is described below with reference to the drawings.

Embodiment of the Invention

FIG. 1 shows an outline of a machine translation device according to the present invention. As illustrated, the device consists of a sentence type detection unit 10, which divides the input text into individual sentences and determines the type of each; several translation mechanisms (three in this example: 14a, 14b, and 14c), each of which performs translation processing suited to a sentence type; and a translation mechanism selection unit 12, which selects the translation mechanism 14a to 14c to be used according to the sentence type detected by the detection unit 10. Providing several types of translation mechanism rather than one is a major feature of the present invention. In this example, 14a is for titles, 14b is for body text, and 14c is for imperative sentences, but the mechanisms are best divided and chosen according to the characteristics of the language being translated. As for the imperative-sentence translation mechanism 14c, an imperative sentence is generally liable to be mistranslated unless it is known to be imperative. For example, SET DIAL means "set the dial" as an imperative sentence, but would mean "dial for setting" as an ordinary sentence. Imperative sentences of this kind appear frequently in machine maintenance manuals and similar documents. If the text to be translated is expected to contain imperative sentences, translation mechanism 14c should therefore be provided so that it can be selected. Conversely, if there is no chance of such sentences appearing in the text to be translated, translation mechanism 14c need not be provided.

A translation mechanism receives text written in a source language as a character string and converts it into a specific other language (the target language). It generally has a dictionary and conversion rules: it splits the input sentence into words, uses the dictionary to assign grammatical attributes and translation equivalents to each word, applies the conversion rules to perform processing such as changing the word order, and outputs the result in the target language. Many translation mechanisms of this kind are already known. A suitable one may be adopted and modified as necessary to serve as translation mechanisms 14a to 14c.
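As a rough illustration of how a dictionary plus conversion rules can yield different output for the same input, the following Python sketch handles only the SET DIAL example from the embodiment. The names `DICTIONARY`, `translate_imperative`, and `translate_noun_phrase` are hypothetical, and the two-word dictionary and the fixed word-order rules are assumptions made for this sketch; the patent does not specify the internals of the translation mechanisms.

```python
# Hypothetical dictionary: each English word maps to grammatical attributes
# (here, parts of speech) and a Japanese equivalent for each attribute.
DICTIONARY = {
    "SET": {"verb": "セット", "noun": "セット"},
    "DIAL": {"noun": "ダイヤル"},
}


def translate_imperative(sentence: str) -> str:
    """Treat a two-word input as verb + object (imperative reading)."""
    verb, obj = sentence.upper().split()
    # Conversion rule (assumed): English verb-object order becomes
    # Japanese object-verb order, followed by an imperative ending.
    return f"{DICTIONARY[obj]['noun']}を{DICTIONARY[verb]['verb']}せよ"


def translate_noun_phrase(sentence: str) -> str:
    """Treat a two-word input as modifier + head noun (noun-phrase reading)."""
    modifier, head = sentence.upper().split()
    # Conversion rule (assumed): modifier + "用" ("for") + head noun.
    return f"{DICTIONARY[modifier]['noun']}用{DICTIONARY[head]['noun']}"


print(translate_imperative("SET DIAL"))   # ダイヤルをセットせよ
print(translate_noun_phrase("SET DIAL"))  # セット用ダイヤル
```

The same input string produces two different translations depending solely on which mechanism receives it, which is why the sentence type has to be known before translation starts.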

In this device, the input text to be translated, IS, is input to the sentence type detection unit 10, which cuts it into individual sentences and detects the type of each sentence, for example whether it is a title, a figure caption, or part of a table of contents. Each sentence S is then sent to the selection unit 12 together with a sentence type signal KS. The selection unit 12 selects one of the translation mechanisms 14a to 14c according to the sentence type signal KS and sends the sentence S received from the detection unit 10 to the selected mechanism, which is the one best suited to translating it. For example, if sentence S is a title it is sent to translation mechanism 14a, and if it is body text it is sent to translation mechanism 14b. TS denotes the output text translated in this way, that is, the translation.
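The routing performed by the selection unit 12 can be pictured as a lookup from sentence type to translation mechanism. The Python sketch below is a minimal illustration under stated assumptions: the `SentenceType` enum, the `MECHANISMS` table, and the placeholder functions standing in for mechanisms 14a to 14c are all hypothetical names, and the placeholders only tag their input instead of translating it.

```python
from enum import Enum
from typing import Callable, Dict


class SentenceType(Enum):
    """Stand-in for the sentence type signal KS (values are assumptions)."""
    TITLE = "title"
    BODY = "body"
    IMPERATIVE = "imperative"


# Placeholder mechanisms standing in for 14a, 14b, and 14c.
def mechanism_14a(sentence: str) -> str:
    return f"[title] {sentence}"


def mechanism_14b(sentence: str) -> str:
    return f"[body] {sentence}"


def mechanism_14c(sentence: str) -> str:
    return f"[imperative] {sentence}"


# Role of the selection unit 12: map each sentence type to one mechanism.
MECHANISMS: Dict[SentenceType, Callable[[str], str]] = {
    SentenceType.TITLE: mechanism_14a,
    SentenceType.BODY: mechanism_14b,
    SentenceType.IMPERATIVE: mechanism_14c,
}


def select_and_translate(sentence: str, kind: SentenceType) -> str:
    """Send sentence S to the mechanism chosen by its type signal KS."""
    return MECHANISMS[kind](sentence)


print(select_and_translate("The door opened for us", SentenceType.TITLE))
```

Because the table is keyed by sentence type, several types can share one mechanism simply by mapping them to the same function.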

The sentence type detection unit 10 separates the input text into individual sentences and analyzes the format of each sentence to determine its sentence type. The input is assumed here to be English and the output Japanese. "Format" refers to features such as whether there is an indentation marking the beginning of a passage, whether the text ends with a period, whether a blank line follows, and whether each word begins with a capital letter. "Sentence type" refers to categories such as title, bulleted item, table of contents, body text, figure, and table, but coarser classifications are also useful, such as body text versus everything else, or body text, imperative sentences, and everything else. The translation mechanisms are organized by sentence type, but a mechanism need not be limited to a single type. Titles and table of contents entries, for example, are usually nominalized (written as noun phrases), so the same translation mechanism can handle both. The procedure for extracting each sentence from the input text and determining the type of the extracted sentence is explained next with reference to FIG. 2.

In FIG. 2, reference numeral 20 denotes the document to be translated. The text in this document is read photoelectrically and passed through character recognition to become the text to be translated, that is, the input text IS described above. The text need not be input photoelectrically; it may instead be read from a recording on magnetic tape, punched cards, or the like. However, because the present invention is fundamentally concerned with translating what is written in a document, each line must be identifiable whatever the input means. In the present invention, sentence extraction always refers to the next line as well. Reference numeral 22 denotes the current-line buffer and 24 the next-line buffer, which receive the character data of one line of the input text and of the line that follows it. Sentence extraction and sentence type discrimination are both performed on the character data in these buffers, and the two operations are interrelated. For example, a sentence usually ends with a period, but some end with an exclamation mark. The device therefore uses the following criteria for sentence extraction.

(1) A period, exclamation mark, or question mark is followed by two or more spaces.

(2) The next line is in a specific format.

Here, "specific format" means any of the following: blank line, title, bulleted item, title with section number, table of contents, paragraph.

Does a period always mark the end of a sentence? Of course not. For example, when several nouns are listed by their initial letters alone, they are written as "S.H." and the like, so a period by itself is an unreliable indicator of a sentence boundary. In English text, however, the convention is to leave two spaces after the period, exclamation mark, or question mark that ends a sentence, so criterion (1) above can detect the end of a sentence. A period attached to an initial is followed by no space or by one space, never by two. Some sentences also end without a period or other terminator. In a bulleted list of the form (1) ... (2) ..., for instance, the items may have no period at the end, and if a period were required to detect a sentence end, all the items in the list would be run together. The sentence that introduces a bulleted list (the sentence preceding it) may also lack a period, and it too would be joined to the list items if a period were the condition for detecting a sentence end. Adding criterion (2) to the sentence extraction criteria solves these problems. Among the specific formats, a "blank line" is a line with nothing written on it; a current line whose next line is blank may be taken to end there. Similarly, since a title is an independent sentence in its own right, the current line may be taken to end even if it does not satisfy condition (1), provided the next line is a title. The remaining formats are treated in the same way.
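Criterion (1) can be expressed as a short regular expression. The Python sketch below is illustrative only: the name `split_by_criterion_1` is hypothetical, and the sketch works on a single string, ignoring line boundaries and criterion (2).

```python
import re
from typing import List

# A sentence terminator (period, exclamation mark, or question mark)
# followed by two or more spaces marks a sentence boundary.
SENTENCE_END = re.compile(r"(?<=[.!?]) {2,}")


def split_by_criterion_1(text: str) -> List[str]:
    """Split text wherever a terminator is followed by two or more spaces."""
    return [part for part in SENTENCE_END.split(text) if part]


sample = "Mr. S.H. Smith arrived.  He sat down!  Why?"
print(split_by_criterion_1(sample))
# ['Mr. S.H. Smith arrived.', 'He sat down!', 'Why?']
```

The periods in "Mr." and "S.H." are followed by at most one space, so they do not trigger a split, which matches the reasoning given above for initials.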

The specific formats are detected as follows. A blank line is detected by the complete absence of character data on the line. A title is detected when the text ends without satisfying condition (1) above and the next line is in a specific format such as a blank line or a bulleted item. A bulleted item is detected by text that follows a number and a period, or a number in parentheses. A title with a section number is detected by text that follows a number or a repeated sequence of numbers and periods (for example, 1 or 1.1.2). A table of contents entry is detected by a title, or a title with a section number, followed by a repeated symbol and a number. A paragraph is detected when the line starts further to the right than the preceding line, that is, when its beginning is indented.
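Several of these detection rules translate naturally into pattern matching on a single line. The following Python sketch is an illustration under stated assumptions: the function name `classify_line`, the returned labels, the exact regular expressions, the set of leader symbols (period, hyphen, underscore), and the order in which the checks run are all choices made for this sketch, not details given in the patent. Title detection is omitted because it depends on the sentence-end test and on the next line.

```python
import re

# Number followed by a period, or a number in parentheses, then text.
BULLET = re.compile(r"^\s*(?:\d+\.|\(\d+\))\s+\S")
# A number, or numbers separated by periods (1 or 1.1.2), then text.
SECTION_HEADING = re.compile(r"^\s*\d+(?:\.\d+)*\s+\S")
# Text, then the same symbol repeated at least three times, then a number.
TOC_ENTRY = re.compile(r"^.*\S\s*([.\-_])\1{2,}\s*\d+\s*$")


def indent_of(line: str) -> int:
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


def classify_line(line: str, previous_line: str = "") -> str:
    """Return an assumed label for the format of one line."""
    if not line.strip():
        return "blank"
    if TOC_ENTRY.match(line):
        return "toc"
    if BULLET.match(line):
        return "bullet"
    if SECTION_HEADING.match(line):
        return "section_heading"
    if indent_of(line) > indent_of(previous_line):
        return "paragraph"
    return "other"


for text in ["", "(1) Remove the cover", "1.1.2 Scope", "Introduction ........ 3"]:
    print(repr(text), "->", classify_line(text))
```

The table of contents check runs first because such an entry may itself begin with a section number; a different ordering would be equally defensible.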

Depending on the document to be translated, some of the specific formats may be meaningless; the document may have no table of contents, for example. The format selection table 26 handles this. For each document to be translated, it specifies which of the specific formats listed above the sentence type discrimination means 28 is to use, that is, which are enabled and which are disabled. The operator makes this specification. As described above, the sentence type discrimination means 28 and the sentence extraction means 30 use the character data in the current-line buffer 22 and the next-line buffer 24 to discriminate the sentence type and extract sentences. Each extracted sentence is input to the translation mechanism selection unit 12 through the sentence buffer 32 as the sentence S, and its sentence type is input through the sentence type buffer 34 as the sentence type signal KS.
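The interplay of the two line buffers, the two extraction criteria, and the format selection table can be sketched as a single loop. This Python sketch rests on several assumptions: the names `extract_sentences`, `next_line_format`, and `enabled` are hypothetical, only the blank-line and bulleted-item formats are recognized, a terminator at the very end of a line is treated as satisfying criterion (1), and the end of the input is treated like a blank line. None of these details is stated in the patent.

```python
import re
from typing import List, Set

TERMINATOR = re.compile(r"(?<=[.!?]) {2,}")         # criterion (1)
BULLET = re.compile(r"^\s*(?:\d+\.|\(\d+\))\s+\S")  # simplified bullet test


def next_line_format(line: str) -> str:
    """Classify the next-line buffer (only two formats in this sketch)."""
    if not line.strip():
        return "blank"
    if BULLET.match(line):
        return "bullet"
    return "other"


def extract_sentences(lines: List[str], enabled: Set[str]) -> List[str]:
    """Extract sentences using a current line and a lookahead line.

    `enabled` plays the role of the format selection table: only the
    formats it contains can end a sentence under criterion (2).
    """
    sentences: List[str] = []
    pending = ""  # text carried over from earlier lines
    for i, current in enumerate(lines):
        upcoming = lines[i + 1] if i + 1 < len(lines) else ""
        if not current.strip():
            continue
        # Pad with two spaces so a terminator at the end of the line
        # also counts as a boundary (an assumption of this sketch).
        text = (pending + " " + current.strip()).strip() + "  "
        pieces = TERMINATOR.split(text)
        sentences.extend(pieces[:-1])
        pending = pieces[-1].strip()
        # Criterion (2): the next line is in an enabled specific format.
        if pending and next_line_format(upcoming) in enabled:
            sentences.append(pending)
            pending = ""
    if pending:
        sentences.append(pending)
    return sentences


doc = [
    "Check the following items",
    "(1) Oil level",
    "(2) Belt tension",
    "",
    "Start the engine.  Wait until it",
    "warms up.",
]
print(extract_sentences(doc, {"blank", "bullet"}))
print(extract_sentences(doc, {"blank"}))
```

With the bulleted-item format disabled, the introductory line and both list items come out as one run-together sentence, which is the failure that criterion (2) is meant to prevent.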

Effects of the Invention

As described above, the present invention provides several types of translation mechanism, each corresponding (suited) to a sentence type such as body text or title. The sentence type of each sentence to be translated is determined from its format, for example whether the sentence lacks a period, and the sentence is input to the corresponding translation mechanism and translated there. Sentences that admit two or three interpretations can therefore be rendered into appropriate translations, which contributes greatly to improving the performance of machine translation devices.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, and FIG. 2 is a block diagram showing part of FIG. 1 in detail. In the drawings, 10 is the sentence type detection means, 14a to 14c are the plurality of translation mechanisms, 12 is the translation mechanism selection means, IS is the input sentence data, S is the sentence data, and KS is the sentence type signal.

Claims

1. A machine translation device that performs foreign language translation, such as from English to Japanese, characterized by comprising: sentence type detection means that divides input sentence data to be translated into individual sentences and determines, from the format of each sentence, its sentence type, such as title, bulleted item, or body text; a plurality of translation mechanisms, provided in correspondence with the sentence types output by the sentence type detection means, each of which performs translation processing suited to its sentence type; and translation mechanism selection means that inputs each piece of sentence data to the translation mechanism suited to it, in accordance with the sentence type signal that the sentence type detection means outputs for each sentence.