JP7845470B2 - Generation apparatus, generation method, and program

Info

Publication number: JP7845470B2
Application number: JP2024530243A
Authority: JP
Inventors: 克己帖佐; 睦森下; 昌明永田
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2026-04-14
Anticipated expiration: 2042-06-30
Also published as: JPWO2024004184A1; WO2024004184A1

Description

The present invention relates to the technical field of machine translation.

Lexically constrained machine translation is translation that converts a sentence in one domain into another domain (e.g., another language) under the constraint that all specified words or phrases (constraint phrases) must be included. Because it makes it possible to unify the translation of specific terms, lexically constrained machine translation is particularly important for translating patents, legal documents, technical documents, and other texts in which terminological consistency is required.

Chen, G., Chen, Y., and Li, V. O. (2021). "Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance." Proceedings of the AAAI Conference on Artificial Intelligence.
Post, M. and Vilar, D. (2018). "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1314-1324, New Orleans, Louisiana. Association for Computational Linguistics.

Lexically constrained machine translation methods generate a translation that includes the given constraint phrases. However, whether the constraint phrases are extracted automatically or manually, a set of multiple constraint phrases may contain inappropriate ones (noise).

If a lexically constrained machine translation method is applied with all of the constraint phrases, noisy ones included, as lexical constraints, incorrect phrases will be forced into the translation and translation accuracy is expected to drop. This problem is not limited to machine translation; it can arise in any field that performs sequence transformation using constraint information.

The present invention has been made in view of the above, and its object is to provide a technique for accurately performing sequence transformation that uses constraint information.

According to the disclosed technology, there is provided a generation device that generates, from constraint information and a first sequence (a sequence of information), a second sequence (another sequence of information), the generation device comprising:
an input generation unit that receives a constraint information list as input and outputs, as a lexical constraint, each subset of the plurality of pieces of constraint information included in the constraint information list;
a sequence generation unit that, for each of the plurality of lexical constraints output from the input generation unit, generates one or more candidates for the second sequence using the first sequence and that lexical constraint; and
a reranking unit that calculates, for each of the one or more candidates obtained for each of the plurality of lexical constraints, a score indicating its appropriateness as the second sequence.

The disclosed technology provides a technique for accurately performing sequence transformation that uses constraint information.

Figure 1 is a diagram showing an example of lexically constrained machine translation.
Figure 2 is a diagram showing an example configuration of the generation device 100.
Figure 3 is a flowchart illustrating the operation of the generation device 100.
Figure 4 is a diagram showing an example configuration of the extraction unit 120.
Figure 5 is a diagram showing another example configuration of the extraction unit 120.
Figure 6 is a diagram showing another example configuration of the generation device 100.
Figure 7 is a diagram showing an example configuration of the sequence generation unit 140.
Figure 8 is a diagram showing an example configuration of a machine translation model.
Figure 9 is a diagram showing another example configuration of the sequence generation unit 140.
Figure 10 is a diagram showing an image displayed by the display unit 500.
Figure 11 is a diagram showing another example configuration of the generation device 100.
Figure 12 is a diagram showing the base settings and hyperparameters for each setting used in the experiments.
Figure 13 is a diagram showing the evaluation results.
Figure 14 is a diagram showing an example hardware configuration of the device.

Embodiments of the present invention (the present embodiment) are described below with reference to the drawings. The embodiments described below are merely examples, and the embodiments to which the present invention can be applied are not limited to them.

The embodiments below apply the present invention to machine translation, but the invention is applicable to sequence transformation in any field, provided that the transformation uses constraint information. For example, it can also be used for summarization, utterance generation, and image captioning (generating descriptive text for images).

Although the unit of translation in the embodiments below is a sentence, any unit may be used.

The generation device 100 described below provides a specific improvement over conventional techniques for constrained sequence transformation and thus represents an advance in that technical field. Likewise, the extraction device described below provides a specific improvement over conventional techniques for extracting constraint information and represents an advance in the technical field of constraint-information extraction.

(Problems to be solved)
Before the configuration and operation of the present embodiment are described in detail, the prior art and its problems are described. Note that the following description of the problems is not itself publicly known; the problems described below are those addressed by the technology of the embodiment.

As explained above, lexically constrained machine translation converts a sentence in one domain into another domain (e.g., another language) under the constraint that all specified words or phrases be included. For reference, Figure 1 shows an example of the input and output of lexically constrained machine translation.

In the example of Figure 1, for the source sentence 「光線一致に基づく定常波の幾何光学的理論を展開した。」 ("A geometrical optical theory of standing waves based on ray coincidence was developed."), the output of ordinary machine translation (MT Output), the constraint phrases (Constraints), and the output of lexically constrained machine translation (Constrained MT Output) are shown. The underlined portions indicate the constraint phrases.

As prior art for lexically constrained machine translation, Non-Patent Document 1 (Chen, G., Chen, Y., and Li, V. O. (2021). "Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance." Proceedings of the AAAI Conference on Artificial Intelligence) discloses a lexically constrained machine translation method for manually created constraint phrases. This method is also called the soft method. It does not guarantee that the constraint phrases will always appear in the translation.

Non-Patent Document 2 (Post, M. and Vilar, D. (2018). "Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1314-1324, New Orleans, Louisiana. Association for Computational Linguistics) and Reference 1 (Chousa, K. and Morishita, M. (2021). "Input Augmentation Improves Constrained Beam Search for Neural Machine Translation: NTT at WAT 2021." In Proceedings of the 8th Workshop on Asian Translation (WAT), pp. 53-61, Online. Association for Computational Linguistics) also disclose lexically constrained machine translation methods for manually created constraint phrases. These methods guarantee that the constraint phrases are always included in the translation, and are also called hard methods.

In some use cases, constraint phrases are created automatically rather than manually. For example, translation of documents in domains rich in proper nouns, such as patents and scientific and technical papers, often relies on translation memories and bilingual dictionaries built from past translations. A conceivable use case is therefore to perform lexically constrained machine translation with constraint phrases extracted automatically from such a bilingual dictionary.

On the other hand, when constraint phrases are extracted automatically, the extracted phrases may include noisy ones. Noise can also be introduced when constraint phrases are extracted manually.

Conventional lexically constrained machine translation methods, such as those disclosed in Non-Patent Documents 1 and 2, assume that the given constraint phrases appear in the reference translation. Consequently, when such a method is applied with extracted constraint phrases as lexical constraints, incorrect phrases may be included in the translation, and translation accuracy is expected to decrease.

In view of the above, the following describes a technique for extracting constraint phrases appropriately while reducing noise, and a technique for performing accurate lexically constrained machine translation even when the set of constraint phrases may contain noise.

(Example device configuration and overall operation)
Figure 2 shows an example configuration of the generation device 100 in the present embodiment. As shown in Figure 2, the generation device 100 includes an input unit 110, an extraction unit 120, an input generation unit 130, a sequence generation unit 140, a reranking unit 150, and an output unit 160.

A bilingual dictionary DB 200 and a model DB 300 are also provided. The bilingual dictionary DB 200 stores a bilingual dictionary, and the model DB 300 stores a trained machine translation model. The bilingual dictionary DB 200 and the model DB 300 may be located outside the generation device 100 (as in the example of Figure 2) or inside it.

The overall flow of operation of the generation device 100 is described with reference to the flowchart of Figure 3. In S101, the input unit 110 receives a source-language sentence. In S102, the extraction unit 120 automatically extracts constraint phrases on the basis of the source-language sentence (input sentence) received by the input unit 110 and the bilingual dictionary read from the bilingual dictionary DB 200.

In S103, the input generation unit 130 generates a plurality of inputs (lexical constraints) from arbitrary combinations of the constraint phrases. In S104, the sequence generation unit 140 translates the input sentence using the inputs generated in S103 and the machine translation model read from the model DB 300, obtaining a translation result for each of those inputs. In other words, the sequence generation unit 140 uses one sequence and a lexical constraint to generate one or more candidates for another sequence on the basis of a pre-trained sequence transformation model.

In S105, the reranking unit 150 uses the input sentence to predict a reranking score for each translation result. In S106, the output unit 160 outputs the translation result (target-language sentence) with the highest score. The configuration and operation of the main functional units are described in detail below.
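Steps S104 to S106 can be sketched as a loop over the lexical constraints followed by an argmax over all candidates. The sketch below is illustrative only: the lexical constraints from S103 are assumed to be given, and `translate` and `score` are hypothetical stand-ins for the constrained machine translation model and the reranking model, whose actual form is not specified here.

```python
from typing import Callable, List, Sequence, Tuple

Constraint = Tuple[str, ...]  # one lexical constraint = a tuple of constraint phrases


def rerank_and_select(
    source: str,
    lexical_constraints: Sequence[Constraint],
    translate: Callable[[str, Constraint], List[str]],
    score: Callable[[str, str], float],
) -> str:
    """Sketch of S104-S106: translate under every lexical constraint,
    score each candidate against the source, and return the best one."""
    candidates: List[str] = []
    for constraint in lexical_constraints:
        # S104: one or more candidates per lexical constraint
        candidates.extend(translate(source, constraint))
    if not candidates:
        raise ValueError("no candidates were generated")
    # S105 + S106: score every candidate and output the highest-scoring one
    return max(candidates, key=lambda hyp: score(source, hyp))


# Hypothetical toy stand-ins: the "translator" just emits the constraint
# phrases, and the "reranker" rewards correct terms and penalises a noisy one.
GOOD, BAD = {"standing", "waves"}, {"beam"}


def toy_translate(src: str, constraint: Constraint) -> List[str]:
    return [" ".join(constraint)]


def toy_score(src: str, hyp: str) -> float:
    words = hyp.split()
    return sum(w in GOOD for w in words) - sum(w in BAD for w in words)


constraints = [(), ("standing waves",), ("beam",), ("standing waves", "beam")]
best = rerank_and_select("光線一致に基づく定常波の幾何光学的理論を展開した。",
                         constraints, toy_translate, toy_score)
print(best)  # standing waves
```

Because every subset is tried, a candidate produced without the noisy phrase ("beam") is available for the reranker to prefer, which is the point of generating several lexical constraints instead of one.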

(Extraction unit 120)
First, the extraction unit 120 is described. The extraction unit 120 receives a source-language sentence and the bilingual dictionary as input and outputs the source-language sentence and a constraint phrase list. Outputting the source-language sentence is optional.

Figure 4 shows the configuration of the extraction unit 120. As shown in Figure 4, the extraction unit 120 includes a filtering unit 121, a splitting unit 122, and a constraint phrase extraction unit 123, and refers to the bilingual dictionary DB 200. The extraction unit 120 may also be configured without the filtering unit 121.

The bilingual dictionary DB 200 stores a set of pairs of phrases that are to be put in correspondence when a sequence is transformed. Specifically, in the present embodiment, which targets translation, the bilingual dictionary DB 200 stores a set of <source-language phrase, target-language phrase> pairs, where each phrase may consist of multiple words. In the present embodiment, one <source-language phrase, target-language phrase> pair is called a "translation pair". The source-language phrase and the target-language phrase may also be called the source-language term and the target-language term, respectively.

When the bilingual dictionary DB 200 is used for tasks other than translation, its contents are not limited to sets of <source-language phrase, target-language phrase> pairs.

The filtering unit 121 removes noisy translation pairs from the bilingual dictionary. The filtered dictionary is stored in the bilingual dictionary DB 200, and the splitting unit 122 and the constraint phrase extraction unit 123 refer to the filtered dictionary.

The splitting unit 122 performs morphological analysis on the source-language sentence and on the source-language phrases of the bilingual dictionary; that is, it splits them into units of information. The constraint phrase extraction unit 123 extracts the translation pairs that correspond to phrases contained in the source-language sentence (an example of the units obtained by splitting) and creates a constraint phrase list. The processing of each unit is described in more detail below.

<Extraction unit 120: filtering unit 121>
The filtering unit 121 deletes from the bilingual dictionary the translation pairs, or the phrases within translation pairs, that fall under (A) to (C) below. The filtering unit 121 need not apply all of (A) to (C); it may apply at least one of them, and it may also apply filtering other than (A) to (C). In particular, in Modifications 1 and 2 described later, processing (C) may be skipped.

(A) Translation pairs that include words other than nouns or noun phrases (verbs are removed because they are inflected)
(B) Translation pairs consisting of phrases of length 1
Single-character entries such as units are examples of (B). For example, the pair "target language: C, source language: 度 (degrees)" falls under (B).

(C) Pairs whose correspondence between the source and target languages is not unique (for example, a single source-language phrase with multiple translations)
Translation pairs that fall under (C) are deleted. Alternatively, one of the multiple translations is kept and the others are deleted, so that source-language and target-language phrases correspond one to one. Any method may be used to choose the translation to keep; for example, the translation listed first or the one that occurs most frequently may be kept.

For example, the pair "source language: computer, target language: 計算機, コンピュータ" (two Japanese renderings of "computer") falls under (C). In this case, the pair is either deleted or reduced to "source language: computer, target language: 計算機" so that source-language and target-language phrases correspond one to one.
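Rules (A) to (C) can be sketched as a single pass over (source, target) pairs. This is a minimal sketch under stated assumptions: rule (A) needs part-of-speech information, so it is represented by a hypothetical `is_noun_phrase` predicate that a real system would derive from a morphological analyzer; rule (B) is interpreted here as "either side is a single character"; and rule (C) offers the three resolutions mentioned above (drop, keep the first-listed, keep the most frequent), with frequencies assumed to be supplied by the caller.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def filter_dictionary(
    entries: List[Tuple[str, str]],
    is_noun_phrase: Optional[Callable[[str], bool]] = None,
    policy: str = "first",
    freq: Optional[Counter] = None,
) -> Dict[str, str]:
    """Apply rules (A)-(C) to (source, target) translation pairs and
    return a one-to-one source -> target dictionary."""
    if policy not in ("drop", "first", "most_frequent"):
        raise ValueError(f"unknown policy: {policy}")
    freq = freq if freq is not None else Counter()
    by_src: Dict[str, List[str]] = {}
    for src, tgt in entries:
        # (A) keep only pairs whose both sides are nouns / noun phrases
        #     (hypothetical predicate; skipped when not supplied)
        if is_noun_phrase and not (is_noun_phrase(src) and is_noun_phrase(tgt)):
            continue
        # (B) drop pairs where either side is a single character (assumption)
        if len(src) <= 1 or len(tgt) <= 1:
            continue
        targets = by_src.setdefault(src, [])
        if tgt not in targets:  # ignore exact duplicates
            targets.append(tgt)

    # (C) enforce a one-to-one source -> target correspondence
    result: Dict[str, str] = {}
    for src, tgts in by_src.items():
        if len(tgts) == 1 or policy == "first":
            result[src] = tgts[0]  # unique, or keep the first-listed translation
        elif policy == "most_frequent":
            result[src] = max(tgts, key=lambda t: freq[t])
        # policy == "drop": ambiguous sources are discarded entirely
    return result


entries = [("computer", "計算機"), ("computer", "コンピュータ"),
           ("度", "C"), ("standing wave", "定常波")]
print(filter_dictionary(entries))                 # keeps 計算機 for "computer"
print(filter_dictionary(entries, policy="drop"))  # drops "computer" entirely
```

Only ambiguity on the source side is handled here; a target phrase shared by several source phrases would need an analogous check if that case is also treated as non-unique.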

<Extraction unit 120: splitting unit 122>
The splitting unit 122 splits (tokenizes) the source-language sentence and the source-language terms of the bilingual dictionary into morphemes and inserts a predetermined symbol (e.g., a space or "／") at each morpheme boundary. This unit of splitting may differ from the one used in the splitting performed later during translation.

For example, if the source-language sentence is "その限りではない" ("this is not necessarily the case"), the sentence after processing by the splitting unit 122 becomes "その／限り／で／は／ない".

<Extraction unit 120: constraint phrase extraction unit 123>
The constraint phrase extraction unit 123 extracts the translation pairs that correspond to phrases contained in the source-language sentence and creates a constraint phrase list from them. A concrete example of the extraction method is described below. The dictionary format, search method, and so on are not limited to those described below; any method that can extract constraint phrases corresponding to phrases in the source-language sentence may be used.

In this example, the bilingual dictionary is represented as a trie built character by character over the source-language terms.

Starting from the beginning of the source-language sentence, the constraint phrase extraction unit 123 performs a prefix-match search against the set of source-language terms in the bilingual dictionary. When it finds a translation pair whose source-language term matches a phrase in the sentence, it extracts the paired target-language term as a constraint phrase. When several terms match, the pair with the longest source-language term is selected.

For example, suppose that morphological analysis by the splitting unit 122 splits the source-language sentence into three phrases, "ABC/GHI/XYZ", where A, B, C, and so on denote characters. When the constraint phrase extraction unit 123 searches the source-language terms of the dictionary with "ABC/GHI/XYZ", it finds the terms that match from the beginning (front) of "ABC/GHI/XYZ".

Suppose the search matches four terms: "AB", "ABC", "ABCG", and "ABC/GHI". As explained below, "AB" and "ABCG" can be excluded because they do not align with the morpheme units. Of the remaining "ABC" and "ABC/GHI", the target-language term paired with "ABC/GHI", whose source-language term is the longest, is extracted as a constraint phrase. The same processing is then repeated on "XYZ", the part following "ABC/GHI".

Because the splitting unit 122 splits both the source-language sentence and the source-language terms of the dictionary into morphemes (an example of units of information) in advance, and the search takes morpheme boundaries into account, phrases whose split units do not align are not erroneously extracted. This is particularly effective when the source language, like Japanese, is written without spaces between words. For example, the source-language term "はな" (花, "flower") is prevented from matching the source-language sentence "その／限り／で／は／ない"; that is, "はな" can no longer match the characters "は／な", which straddle a morpheme boundary.
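The boundary-aware longest-match search can be sketched with a character-level trie whose keys are the pre-split source-language terms, separator included. This is an illustrative sketch, not the embodiment's implementation: the ASCII "/" separator, the class and function names, and the rule "a match counts only if it ends at a morpheme boundary" are assumptions chosen to reproduce the "ABC/GHI/XYZ" and "はな" examples above.

```python
from typing import Dict, List, Optional

SEP = "/"  # morpheme-boundary symbol inserted by the splitting step (assumed ASCII "/")


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.target: Optional[str] = None  # set if a dictionary entry ends here


def build_trie(dictionary: Dict[str, str]) -> TrieNode:
    """Character-level trie over pre-split source-language terms (SEP included)."""
    root = TrieNode()
    for src, tgt in dictionary.items():
        node = root
        for ch in src:
            node = node.children.setdefault(ch, TrieNode())
        node.target = tgt
    return root


def extract_constraints(sentence: str, root: TrieNode) -> List[str]:
    """Left-to-right longest-match search that only accepts matches
    starting and ending on morpheme boundaries."""
    constraints: List[str] = []
    i, n = 0, len(sentence)
    while i < n:
        node, j = root, i
        best_end: Optional[int] = None
        best_tgt: Optional[str] = None
        while j < n and sentence[j] in node.children:
            node = node.children[sentence[j]]
            j += 1
            at_boundary = j == n or sentence[j] == SEP
            if node.target is not None and at_boundary:
                best_end, best_tgt = j, node.target  # longest valid match so far
        if best_tgt is not None:
            constraints.append(best_tgt)
            i = best_end
        else:
            # No entry starts here: jump to the next morpheme boundary.
            nxt = sentence.find(SEP, i)
            i = n if nxt == -1 else nxt
        if i < n and sentence[i] == SEP:
            i += 1  # step over the boundary symbol
    return constraints


trie = build_trie({"AB": "t_AB", "ABC": "t_ABC", "ABCG": "t_ABCG",
                   "ABC/GHI": "t_ABC/GHI", "XYZ": "t_XYZ"})
print(extract_constraints("ABC/GHI/XYZ", trie))  # ['t_ABC/GHI', 't_XYZ']

hana = build_trie({"はな": "flower"})
print(extract_constraints("その/限り/で/は/ない", hana))  # []
print("はな" in "その限りではない")  # True: a naive substring search would match
```

Including the separator in the trie keys is what blocks "はな": after "は" the sentence continues with "/", which the entry does not contain, so no match is reported, whereas searching the unsplit string would find it.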

The prefix matching, longest matching, and word segmentation described here are examples of means for extracting constraint phrases with less noise and less ambiguity; other means of resolving ambiguity may also be used.

For example, when the splitting unit 122 performs morphological analysis, it can attach to each split phrase the information needed for disambiguation, such as part of speech, base form, stem, conjugation form, and reading (pronunciation), and matching can then use this information as well. By matching on such attached information and not only on the character string, ambiguity can be resolved in situations where, for example, the string "in" in the source-language sentence would match both the preposition "in" and the noun "inn" among the dictionary's source-language terms. Resolving ambiguity during matching is an important factor in improving translation accuracy.

<Other configuration example of the extraction unit 120>
The extraction unit 120 may have the configuration shown in Figure 5 instead of that shown in Figure 4. In the configuration of Figure 5, instead of the bilingual dictionary being filtered, the filtering unit 121 filters the constraint phrases extracted by the constraint phrase extraction unit 123.

The filtering is the same as that performed by the filtering unit 121 described above, with "translation pair" read as "constraint phrase". Specifically, the filtering unit 121 deletes the constraint phrases that fall under (A) to (C) below from the extraction results of the constraint phrase extraction unit 123. The filtering unit 121 need not apply all of (A) to (C); it may apply at least one of them, and rules other than (A) to (C) may also be used. In particular, when Modifications 1 and 2 described later are carried out, processing (C) may be skipped.

(A) Constraint phrases that include words other than nouns or noun phrases (verbs are removed because they are inflected)
(B) Constraint phrases consisting of phrases of length 1
(C) Constraint phrases whose correspondence between the source and target languages is not unique (for example, multiple constraint phrases for a single source-language phrase)
When (C) is applied and there are multiple constraint phrases for a single source-language phrase, either all of them are deleted, or one is kept and the others are deleted, so that source-language and target-language phrases correspond one to one.

(Other configuration examples of the extraction unit 120 and the generation device 100)
The extraction unit 120 may be a standalone device independent of the generation device 100; such a device may be called an extraction device. The extraction unit 120 included in the generation device 100 may also be called an extraction device, as may the generation device 100 that includes the extraction unit 120. Both the extraction unit 120 and the extraction device may further include either or both of the display information generation unit 170 and the correction unit 180 described in a later example.

When the extraction unit 120 is a standalone device independent of the generation device 100, the generation device 100 need not include the extraction unit 120. Figure 6 shows the configuration of the generation device 100 in this case. In the configuration of Figure 6, the constraint phrase list generated by the extraction device is input to the generation device 100. However, a constraint phrase list not generated by the extraction device (e.g., one containing a lot of noise) may also be input to the generation device 100.

The input generation unit 130, the sequence generation unit 140, and the reranking unit 150 in Figure 6 operate in the same way as the corresponding units in Figure 2.

(Input generation unit 130)
Next, the input generation unit 130 will be described. The input generation unit 130 receives a constraint phrase list as input and treats each subset of the phrases in the list (that is, every element of its power set) as a lexical constraint. Alternatively, only some of these subsets may be used as lexical constraints.

Finally, the input generation unit 130 outputs these lexical constraints as the lexical constraints for the source language sentence input to the extraction unit 120. A specific example follows.

Assume that the input generation unit 130 receives {A, B, C} as the constraint phrase list, where A, B, and C are constraint phrases.

The input generation unit 130 extracts the subsets {}, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, and {A, B, C} of {A, B, C} and outputs each of them as a lexical constraint.

Here, {{}, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}} is the lexical constraint set, and each {...} within it is one lexical constraint.

The number of lexical constraints created from the subsets of the constraint phrase list C is 2^|C|, and, as described later, multiple translation candidates are obtained from each of these lexical constraints.
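The subset enumeration can be sketched in Python with `itertools.combinations`. The function name `lexical_constraints` is hypothetical, and representing each lexical constraint as a tuple of phrases is an assumption made for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def lexical_constraints(phrases: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every subset of the constraint phrase list, smallest subsets first."""
    subsets: List[Tuple[str, ...]] = []
    for size in range(len(phrases) + 1):
        subsets.extend(combinations(phrases, size))
    return subsets


constraints = lexical_constraints(["A", "B", "C"])
print(constraints)
# [(), ('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
```

Because the count is 2^|C|, the number of constrained translation runs doubles with every additional constraint phrase.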

(Sequence generation unit 140)
Next, the sequence generation unit 140 will be described. Assume that the sequence generation unit 140 holds a trained machine translation model read from the model DB 300. The sequence generation unit 140 repeats the following process a number of times equal to the number of lexical constraints (the number of elements in the lexical constraint set). For example, if the lexical constraint set is {{}, {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C}}, then it will be repeated 8 times.

The sequence generation unit 140 receives an input sentence (source language sentence) and a lexical constraint as input, and generates translated sentences (target language sentences) with the machine translation model by applying an existing lexically constrained machine translation method. Here, multiple translated sentences are generated as translation candidates (target language sentence candidates), and each candidate is assigned a score as a translation.

Any existing lexically constrained machine translation method may be used; for example, LeCA or LeCA+LCD. LeCA is disclosed in Non-Patent Document 1 and is also called the soft method. LeCA+LCD is disclosed in the aforementioned Reference Document 1 and is also called the hard method.

The sequence generation unit 140 outputs the multiple generated translation candidates. For example, it outputs a predetermined number of translation candidates in descending order of score. The predetermined number may be 1, in which case only the highest-scoring translation is output. Here, for example, 30 translation candidates are output per lexical constraint.
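The per-constraint loop of the sequence generation unit 140 might be sketched as follows. `toy_translate` is a hypothetical stand-in for a real lexically constrained NMT system such as LeCA or LeCA+LCD, and the names `generate_candidates` and `n_best` are illustrative only.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Tuple[str, float]   # (translation, model score)
Constraint = Tuple[str, ...]    # one lexical constraint, e.g. ("A", "B")


def generate_candidates(
    source: str,
    constraints: Sequence[Constraint],
    translate: Callable[[str, Constraint], List[Candidate]],
    n_best: int = 30,
) -> Dict[Constraint, List[Candidate]]:
    """Run constrained translation once per lexical constraint; keep the n best."""
    results: Dict[Constraint, List[Candidate]] = {}
    for constraint in constraints:
        hypotheses = translate(source, constraint)
        # Keep the n highest-scoring candidates, in descending order of score.
        results[constraint] = heapq.nlargest(n_best, hypotheses, key=lambda c: c[1])
    return results


def toy_translate(source: str, constraint: Constraint) -> List[Candidate]:
    """Hypothetical stand-in: 50 fake hypotheses with decreasing scores."""
    base = " ".join(constraint) or "<none>"
    return [(f"{base} #{i}", -0.5 * i) for i in range(50)]


out = generate_candidates("src", [(), ("A",), ("A", "B")], toy_translate, n_best=3)
print([text for text, _ in out[("A",)]])  # ['A #0', 'A #1', 'A #2']
```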

<Example of the configuration of the sequence generation unit 140>
Figure 7 shows an example of the configuration of the sequence generation unit 140. As shown in Figure 7, the sequence generation unit 140 has a sequence conversion unit 141 and a search unit 142.

When the soft method is used to generate translations, the sequence conversion unit 141 uses the lexical constraint information. When a hard method is used, the sequence conversion unit 141 may or may not use the lexical constraints, depending on the type of hard method. The arrow carrying the lexical constraints into the sequence conversion unit 141 is therefore drawn as a dotted line. Among the hard methods, LeCA+LCD, mentioned above, uses the lexical constraint information in the sequence conversion unit 141. The following describes the configuration and operation assuming LeCA+LCD.

In the sequence conversion unit 141, as shown in Figure 8, a model based on a general encoder-decoder model (e.g., Transformer) having an encoder and a decoder can be used as the machine translation model. However, the present invention can also be implemented using models other than encoder-decoder models.

The sequence conversion unit 141 receives the source language sentence and the lexical constraints as input. It first extends the source language sentence with the lexical constraints to create an input sequence carrying the lexical constraint information, and then uses this sequence as the input to the machine translation model.

More specifically, in this extension, the sequence conversion unit 141 creates a lexically constrained input sequence by concatenating the source language sentence X with each constraint phrase C_i, using the special delimiter string <sep> between them, as shown below. <eos> is the string that marks the end of the sentence.

[X, <sep>, C_1, <sep>, C_2, <sep>, …, <sep>, C_N, <eos>]
The sequence conversion unit 141 takes the expanded input sequence as input to the machine translation model and generates a sentence. More specifically, it outputs the probability of each word in the set of words that can constitute the output sequence.
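A minimal sketch of building the extended input sequence is shown below. It assumes the text is already tokenized and that `<sep>` and `<eos>` are reserved tokens in the model vocabulary; the function name `build_input_sequence` is hypothetical.

```python
from typing import List, Sequence

SEP, EOS = "<sep>", "<eos>"


def build_input_sequence(source_tokens: Sequence[str],
                         constraints: Sequence[Sequence[str]]) -> List[str]:
    """Return [X, <sep>, C_1, <sep>, C_2, ..., <sep>, C_N, <eos>] as a token list."""
    seq = list(source_tokens)
    for phrase in constraints:  # a constraint phrase may span several tokens
        seq.append(SEP)
        seq.extend(phrase)
    seq.append(EOS)
    return seq


print(build_input_sequence(["X1", "X2"], [["C1"], ["C2a", "C2b"]]))
# ['X1', 'X2', '<sep>', 'C1', '<sep>', 'C2a', 'C2b', '<eos>']
```

For the empty lexical constraint {}, the sequence reduces to the source tokens followed by `<eos>`.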

The search unit 142 uses the output probabilities of the decoder in the machine translation model to search for the output sequence (or an approximation of it) that maximizes the generation probability given the input sequence. By using grid beam search, an extension of beam search, the search unit 142 can guarantee that the output sequence contains all of the constraint phrases.
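The following is a toy sketch of grid beam search, not the implementation used in the source. It assumes single-token, pairwise-distinct constraints and a hypothetical context-free scoring function `toy_log_prob` in place of the NMT decoder. Hypotheses are kept in rows indexed by how many constraints they cover, each row is pruned to its own beam, and only hypotheses in the last row may end with `</s>`, which is what guarantees that every output contains all constraints.

```python
import math
from typing import Callable, FrozenSet, List, Sequence, Tuple

# (cumulative log-probability, tokens so far, constraints already covered)
Hyp = Tuple[float, Tuple[str, ...], FrozenSet[str]]


def grid_beam_search(
    vocab: Sequence[str],
    constraints: Sequence[str],
    log_prob: Callable[[Tuple[str, ...], str], float],
    beam: int = 4,
    max_len: int = 10,
    eos: str = "</s>",
) -> Tuple[str, ...]:
    """Toy grid beam search over single-token, pairwise-distinct constraints."""
    n = len(constraints)
    grid: List[List[Hyp]] = [[(0.0, (), frozenset())]] + [[] for _ in range(n)]
    finished: List[Hyp] = []
    for _ in range(max_len):
        new_grid: List[List[Hyp]] = [[] for _ in range(n + 1)]
        for c, row in enumerate(grid):
            for lp, toks, covered in row:
                # (a) Free generation: stay in row c.
                for tok in vocab:
                    if tok in constraints:
                        continue
                    score = lp + log_prob(toks, tok)
                    if tok == eos:
                        if c == n:  # only fully constrained hypotheses may finish
                            finished.append((score, toks, covered))
                    else:
                        new_grid[c].append((score, toks + (tok,), covered))
                # (b) Constraint generation: move to row c + 1.
                for con in constraints:
                    if con not in covered:
                        score = lp + log_prob(toks, con)
                        new_grid[c + 1].append((score, toks + (con,), covered | {con}))
        # Prune each row independently to the beam size.
        grid = [sorted(r, key=lambda h: h[0], reverse=True)[:beam] for r in new_grid]
    if not finished:
        raise ValueError("no hypothesis covered all constraints within max_len")
    return max(finished, key=lambda h: h[0])[1]


# Hypothetical context-free toy model; a real system would query the decoder.
PROBS = {"the": 0.4, "cat": 0.3, "sat": 0.2, "</s>": 0.1, "mat": 0.01, "dog": 0.01}


def toy_log_prob(prefix: Tuple[str, ...], token: str) -> float:
    return math.log(PROBS[token])


print(grid_beam_search(["the", "cat", "sat", "</s>"], ["mat"], toy_log_prob))  # ('mat',)
```

A practical implementation would also handle multi-token constraint phrases and batch the decoder calls; this sketch only shows the row-per-coverage bookkeeping.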

Note that grid beam search is only one example of what the search unit 142 may use. Any lexically constrained search method that makes the output include the constraint phrases may be used.

(Reranking unit 150)
Next, the reranking unit 150 will be explained. The reranking unit 150 receives as input the one or more translation candidates generated by the sequence generation unit 140. For example, if the sequence generation unit 140 generates 30 translation candidates per lexical constraint and there are eight lexical constraints, the reranking unit 150 receives 30 translation candidates for each of the eight lexical constraints (240 in total).

Next, the reranking unit 150 calculates a score for each translation candidate using the input sentence (source language sentence) and outputs the candidate with the best score as the final translation. Instead of narrowing the output down to the highest-scoring candidate, it may output all (or some) of the translations together with their scores. This allows the output unit 160 to present the translations to the user in ranked order.

The reranking unit 150 may use any method that can score a translation; for example, the methods in Examples 1 and 2 below.

Example 1:
The reranking unit 150 uses, as the score, the likelihood that the machine translation model used for translation in the sequence generation unit 140 assigns to each translation candidate.

Example 2:
The reranking unit 150 uses, as its reranking model, a machine translation model obtained by training a Transformer encoder-decoder model on a right-to-left translation task, that is, one that generates the translation from the end of the sentence toward its beginning. The score is the likelihood obtained when this reranking model is forced to output each translation candidate. Forcing the model to output a translation candidate can also be described as forced decoding with that candidate.

In other words, the source language sentence is input to the encoder of the reranking model, and the words of the translation candidate whose score (likelihood) is to be evaluated are fed to the decoder of the reranking model one by one.
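Forced decoding can be sketched as below. The interface `next_token_probs(source, prefix)` is a hypothetical stand-in for the reranking model's decoder: it returns a probability for each possible next token given the source and the target prefix. A right-to-left model reads the candidate reversed, and the log-probabilities of the forced tokens (plus the end-of-sentence token) are summed.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical model interface: (source tokens, target prefix) -> next-token probabilities.
NextTokenProbs = Callable[[Sequence[str], Tuple[str, ...]], Dict[str, float]]


def forced_decoding_log_likelihood(
    source: Sequence[str],
    candidate: Sequence[str],
    next_token_probs: NextTokenProbs,
    right_to_left: bool = True,
    eos: str = "</s>",
) -> float:
    """Score a fixed candidate by feeding its tokens to the decoder one by one."""
    # A right-to-left model reads the candidate from its last token to its first.
    target = list(reversed(candidate)) if right_to_left else list(candidate)
    prefix: Tuple[str, ...] = ()
    total = 0.0
    for tok in target + [eos]:
        p = next_token_probs(source, prefix).get(tok, 0.0)
        if p <= 0.0:
            return float("-inf")  # the model cannot produce this candidate at all
        total += math.log(p)
        prefix = prefix + (tok,)
    return total


def toy_model(source: Sequence[str], prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical position-dependent toy distribution."""
    table = [{"b": 0.9, "a": 0.1}, {"a": 0.9, "b": 0.1}, {"</s>": 1.0}]
    return table[len(prefix)] if len(prefix) < len(table) else {}


print(forced_decoding_log_likelihood(["s"], ["a", "b"], toy_model))  # 2 * log(0.9)
```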

In Examples 1 and 2, the likelihood output by the machine translation model may be any value that indicates plausibility; it may be a probability or some other value.

The reranking unit 150 may also calculate the reranking score using both the likelihood of Example 1 and the likelihood of Example 2; for example, their average may be used as the reranking score.
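Combining the two likelihoods and ranking the candidates can be sketched as follows. The function name `rerank` is hypothetical; the sketch assumes both score dictionaries cover the same candidate strings, so identical translations produced under different lexical constraints are merged into one entry.

```python
from typing import Dict, List, Tuple


def rerank(l2r: Dict[str, float], r2l: Dict[str, float]) -> List[Tuple[str, float]]:
    """Average the Example 1 and Example 2 log-likelihoods and sort best-first."""
    combined = {cand: (l2r[cand] + r2l[cand]) / 2.0 for cand in l2r}
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


ranking = rerank({"x": -1.0, "y": -3.0}, {"x": -4.0, "y": -1.0})
best_translation = ranking[0][0]
print(ranking)  # [('y', -2.0), ('x', -2.5)]
```

Returning the whole sorted list rather than only `best_translation` matches the option of presenting translations to the user in ranked order.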

(Modification 1)
Next, Modification 1 will be described. In Modification 1, the constraint phrase list generated by the extraction unit 120 may be one in which multiple target language phrases correspond to a single source language phrase. Such a list may be called a constraint phrase list that allows multiple translations. For example, such a list may be generated when the filtering unit of the extraction unit 120 does not perform procedure (C).

For example, suppose a given source language phrase has multiple target language phrases A and A', and the extraction unit 120 generates "A, A', B, C", which includes these two together with B and C, as multiple elements of the constraint phrase list. For example, if the source language phrase is "computer" and the target language phrases are 計算機 and コンピュータ (two Japanese words for "computer"), then A and A' correspond to 計算機 and コンピュータ.

Here, such a constraint phrase list with multiple elements is written as {{A, A'}, {B}, {C}}.

Having received {{A, A'}, {B}, {C}} from the extraction unit 120, the input generation unit 130 generates {A'}, {A', B}, {A', C}, and {A', B, C} as lexical constraints in addition to {}, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, and {A, B, C}.

The input generation unit 130 inputs each of the generated lexical constraints to the sequence generation unit 140.

The sequence generation unit 140 uses each of the 12 lexical constraints {}, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}, {A'}, {A', B}, {A', C}, and {A', B, C} in turn, performing 12 lexically constrained machine translations to obtain translation candidates. For example, if one translation candidate is generated per lexical constraint, 12 translation candidates are obtained.
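The 12 lexical constraints of Modification 1 can be reproduced with `itertools.product`: each group contributes either nothing or exactly one of its alternatives, giving 3 × 2 × 2 = 12 combinations for {{A, A'}, {B}, {C}}. The function name `expand_alternatives` is hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def expand_alternatives(groups: Sequence[Sequence[str]]) -> List[Tuple[str, ...]]:
    """For each group pick nothing or exactly one alternative; return all combinations."""
    options: List[List[Optional[str]]] = [[None, *group] for group in groups]
    return [tuple(p for p in choice if p is not None) for choice in product(*options)]


constraints = expand_alternatives([["A", "A'"], ["B"], ["C"]])
print(len(constraints))  # 12
```

Note that no generated constraint contains both A and A', since the two are alternative translations of the same source phrase.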

After the lexically constrained machine translation, the reranking unit 150 performs reranking by the method described above and outputs, for example, the translation candidate with the highest score as the final translation.

(Modification 2)
Next, Modification 2 will be described. In Modification 2 as well, the constraint phrase list generated by the extraction unit 120 may be one in which multiple target language phrases correspond to a single source language phrase.

In Modification 2, the translation search performed by the search unit 142 of the sequence generation unit 140 may allow a single constraint phrase to have multiple surface forms. In other words, the search may be performed so that, for each constraint phrase, the output contains one of its candidates. Specifically, this works as follows.

In Modification 2 as well, assume that A and A' are multiple target language phrases for a given source language phrase, and that the extraction unit 120 generates "A, A', B, C", which includes these together with B and C, as multiple elements of the constraint phrase list. Here, the constraint phrase list {A, B, C} is generated, and information indicating that A may also be A' is input from the extraction unit 120 to the input generation unit 130. Alternatively, the constraint phrase list {A, A', B, C} may be generated, and information indicating that either A or A' is acceptable may be input from the extraction unit 120 to the input generation unit 130. The above example shows a format that allows two alternatives for one constraint phrase, but a format that allows three or more alternatives for one constraint phrase may also be used.

For example, if three alternatives are allowed for A, the constraint phrase list {A, B, C} is generated, and information indicating that A may also be A' or A'' is input from the extraction unit 120 to the input generation unit 130. Alternatively, the constraint phrase list {A, A', A'', B, C} may be generated, and information indicating that any of A, A', and A'' is acceptable may be input from the extraction unit 120 to the input generation unit 130.

For the constraint phrase list {A, B, C}, if A may also be A', the input generation unit 130 generates eight lexical candidate constraints: {}, {{A, A'}}, {{B}}, {{C}}, {{A, A'}, {B}}, {{A, A'}, {C}}, {{B}, {C}}, and {{A, A'}, {B}, {C}}. In Modification 2, because a source language phrase may correspond to multiple target language phrases (e.g., A and A'), the translation is ambiguous and the phrase to be used as a constraint is not fixed; such a constraint is therefore called a "lexical candidate constraint" rather than a lexical constraint. In other words, a lexical candidate constraint is a lexical constraint that retains ambiguity. The notation above is only one example; any other notation may be used as long as it can express that either A or A' is acceptable.

As shown in Figure 9, the sequence generation unit 140 receives the source language sentence together with the lexical candidate constraints as input. Using each of the eight lexical candidate constraints above in turn, the sequence generation unit 140 performs eight lexically constrained machine translations and obtains translation candidates. For example, if one translation candidate is generated per lexical candidate constraint, eight translation candidates are obtained.
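A sketch of generating lexical candidate constraints: unlike Modification 1, each group keeps all of its alternatives, and the constraints are the subsets of the groups. For the three groups {A, A'}, {B}, and {C}, the full power set has 2^3 = 8 elements, including {{B}, {C}}. The name `candidate_constraints` and the tuple representation are assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Group = Tuple[str, ...]  # interchangeable alternatives, e.g. ("A", "A'")


def candidate_constraints(groups: Sequence[Group]) -> List[Tuple[Group, ...]]:
    """Every subset of the constraint groups; each group keeps all its alternatives."""
    result: List[Tuple[Group, ...]] = []
    for size in range(len(groups) + 1):
        result.extend(combinations(groups, size))
    return result


cands = candidate_constraints([("A", "A'"), ("B",), ("C",)])
print(len(cands))  # 8
for c in cands:
    print(c)
```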

When a lexical candidate constraint containing {A, A'} is used, the search unit 142 of the sequence generation unit 140 performs the search treating the word A as replaceable by A'; that is, it performs a search that takes the ambiguity into account. For this search, the method of Reference 2, "Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2017. Guided Open Vocabulary Image Captioning with Constrained Beam Search. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 936-945, Copenhagen, Denmark. Association for Computational Linguistics," available at https://aclanthology.org/D17-1098/, can be used, for example. This method is one example of a search that takes ambiguity into account.

The method disclosed in Reference 2 performs a lexically constrained beam search that takes into account the translation ambiguity, under which either A or A' is acceptable. In other words, the ambiguity between A and A' is resolved during the beam search.
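The acceptance test behind such a search can be sketched as follows: a group of alternatives counts as satisfied when any one of its alternatives appears in the output. Reference 2 tracks this coverage incrementally with a finite-state machine during decoding; this simplified sketch instead recomputes coverage for a complete token sequence, and the helper names are hypothetical.

```python
from typing import Sequence, Set

Phrase = Sequence[str]   # one alternative, possibly multi-token
Group = Sequence[Phrase]  # alternatives for one constraint, e.g. A or A'


def contains(tokens: Sequence[str], phrase: Phrase) -> bool:
    """True if phrase occurs as a contiguous token span in tokens."""
    k = len(phrase)
    return any(list(tokens[i:i + k]) == list(phrase) for i in range(len(tokens) - k + 1))


def satisfied_groups(tokens: Sequence[str], groups: Sequence[Group]) -> Set[int]:
    """Indices of the groups for which at least one alternative appears."""
    return {gi for gi, alts in enumerate(groups) if any(contains(tokens, alt) for alt in alts)}


def all_satisfied(tokens: Sequence[str], groups: Sequence[Group]) -> bool:
    """Acceptance test: every group is covered by one of its alternatives."""
    return len(satisfied_groups(tokens, groups)) == len(groups)


groups = [[["計算機"], ["コンピュータ"]], [["B"]]]  # {{A, A'}, {B}}
print(all_satisfied(["コンピュータ", "を", "B"], groups))  # True
print(all_satisfied(["B"], groups))  # False
```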

Note that the method disclosed in Reference 2 is a language generation (image captioning) method, not a translation technique, and no prior art applies it to the search performed during translation decoding.

Furthermore, in the descriptions of the embodiment above and of Modifications 1 and 2, the multiple target language phrases (e.g., A and A') for a given source language phrase need not be synonyms such as 計算機 and コンピュータ (both meaning "computer"); they may also be non-synonymous phrases, such as 車のトランク (car trunk), 象の鼻 (elephant's trunk), 幹 (tree trunk), and 幹線 (trunk line) for "trunk". Because the search unit 142 does not consider word meanings during the search, A and A' may even be completely unrelated phrases. Also, when the technology according to the present invention is used for tasks other than translation, which items in the converted sequence correspond to a word in the original sequence may be determined by any criterion.

Furthermore, in the descriptions of the embodiment above and of Modifications 1 and 2, words that are not in their base form may be converted to their base form during morphological analysis, to account for inflection (plural forms, tense changes, etc.).

As an example, suppose the bilingual dictionary is English-Japanese and contains the entry "corn: トウモロコシ (maize), 魚の目 (a corn on the foot)". Assume that the source language sentence "We roasted corns over the charcoal." is input to the generation device 100. When the extraction unit 120 performs matching in units of morphemes, "corns" matches the dictionary because it contains "corn" as a morpheme. However, if, for example, the dictionary entry is "foot" and the word in the input sentence is "feet", they do not match. This problem can be solved by converting "feet" to its base form "foot" before matching.
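Base-form matching can be sketched as below. The irregular-form table and the plural-stripping rule are hypothetical and deliberately crude; a real system would take base forms from its morphological analyzer. The dictionary contents mirror the example above.

```python
from typing import Dict, List, Sequence

# Hypothetical irregular-form table; a real system would use the analyzer's base forms.
IRREGULAR = {"feet": "foot", "teeth": "tooth", "mice": "mouse"}


def base_form(word: str) -> str:
    """Very crude lemmatizer: irregular table first, then strip a plural -s."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def lookup(tokens: Sequence[str], dictionary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each input token whose base form is a dictionary headword to its translations."""
    hits: Dict[str, List[str]] = {}
    for tok in tokens:
        lemma = base_form(tok)
        if lemma in dictionary:
            hits[tok] = dictionary[lemma]
    return hits


EJ_DICT = {"corn": ["トウモロコシ", "魚の目"], "foot": ["足"]}
print(lookup("We roasted corns over the charcoal .".split(), EJ_DICT))
# {'corns': ['トウモロコシ', '魚の目']}
print(lookup(["two", "feet"], EJ_DICT))  # {'feet': ['足']}
```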

(Example)
Next, as a more specific example, an embodiment using the technology described above will be explained. In this embodiment, constraint phrases can be edited (modified or added) on the display unit 500 described later (a device capable of display and input operations), and after each edit, the target language sentence (translation) generated with the current constraint phrases can be checked.

<Display Image>
First, the display image on the display unit 500 will be explained with reference to Figure 10. In the example shown in Figure 10, the user enters 「分路巻線のみに補助巻線を持つ超電導単相単巻変圧器を試作した。」 ("We prototyped a superconducting single-phase autotransformer with auxiliary windings only on the shunt winding.") as the source language sentence and presses "Send".

The display unit 500 displays multiple constraint phrases (the constraint phrase list) for the input source language sentence. The phrases displayed here as constraint phrases are those remaining after filtering by the filtering unit 121. To their right, the phrases removed by filtering are displayed with the prompt "Add?".

By checking checkboxes, users can select the constraint phrases they wish to modify (or delete) or add, and then modify (or delete) or add the selected phrases by pressing the corresponding buttons. Users can also add constraint phrases they have created themselves.

Pressing "Update" on the display shows the target language sentence generated with the constraint phrases at that point.

<Device configuration and operation>
Figure 11 shows an example configuration of the generation device 100 for realizing the display described above. As shown in Figure 11, the generation device 100 in this embodiment includes an extraction unit 120, a display information generation unit 170, a modification unit 180, a generation unit 190, a bilingual dictionary DB 200, and a constraint phrase list DB 400. The modification unit 180 may be included in the display information generation unit 170.

The bilingual dictionary DB 200 and the constraint phrase list DB 400 may be provided outside the generation device 100. The generation unit 190 may also be provided outside the generation device 100 (for example, on a separate server). The generation device 100 may also be used solely to display the constraint phrase list on the display unit 500; in that case, it may include only the extraction unit 120 and the display information generation unit 170 among the functional units shown in Figure 11. The generation device 100 may also be called an extraction device. The functions of each unit are as follows.

The extraction unit 120 is the extraction unit 120 shown in Figure 4 or Figure 5. It takes the source language sentence as input and outputs a constraint phrase list. The output constraint phrase list is stored in the constraint phrase list DB 400 and is also input to the display information generation unit 170. The extraction unit 120 may also output the constraint phrases removed by filtering as a filtered phrase list, which is likewise input to the display information generation unit 170.

The display information generation unit 170 generates information for displaying the constraint phrase list on the display unit 500 (referred to as constraint phrase list presentation information). The constraint phrase list presentation information includes the constraint phrase list, and may also include the filtered phrase list, presented as deleted phrases, filtering candidates, or addition candidates. The constraint phrase list presentation information is transmitted from the display information generation unit 170 to the display unit 500. The display information generation unit 170 may also generate display information for showing the constraint phrases in an editable form together with the target language sentence (translation) generated using them. Furthermore, when the generation device 100 receives added or modified constraint phrases from the display unit 500, the display information generation unit 170 may acquire the target language sentence (translation) generated from the received constraint phrases and generate display information for showing it.

The display information generation unit 170 may also generate "modification support information" for the user to consult when reviewing the constraint phrase list, and transmit it to the display unit 500. The modification support information includes at least one of the source language sentence entered by the user, the extracted constraint phrase list, and the target language sentence generated from the extracted constraint phrase list.

The modification unit 180 receives from the display unit 500, as information on the user's modifications to the presented constraint phrase list, at least one of added constraint phrases and modified constraint phrases.

The modification unit 180 modifies the information stored in the constraint phrase list DB 400 based on the received information. When the constraint phrase list is modified, the target language sentence may be regenerated by lexically constrained machine translation using the modified list; the display information generation unit 170 then generates modification support information containing that target language sentence and transmits it to the display unit 500 for display.
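The update performed by the modification unit 180 might look like the following sketch, which applies deletions, modifications, and additions to a constraint phrase list while avoiding duplicate constraints. The function `apply_edits` and its parameters are hypothetical; regeneration of the target language sentence would then call the generation unit 190 with the returned list.

```python
from typing import Dict, Iterable, List, Optional


def apply_edits(
    constraints: List[str],
    additions: Iterable[str] = (),
    modifications: Optional[Dict[str, str]] = None,
    deletions: Iterable[str] = (),
) -> List[str]:
    """Return a new constraint phrase list with the user's edits applied."""
    mods = modifications or {}
    removed = set(deletions)
    updated: List[str] = []
    for phrase in constraints:
        if phrase in removed:
            continue
        new_phrase = mods.get(phrase, phrase)
        if new_phrase not in updated:  # avoid duplicate constraints
            updated.append(new_phrase)
    for phrase in additions:
        if phrase not in updated:
            updated.append(phrase)
    return updated


print(apply_edits(["A", "B", "C"], additions=["D"], modifications={"B": "B2"}, deletions=["C"]))
# ['A', 'B2', 'D']
```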

The generation unit 190 includes the input generation unit 130, the sequence generation unit 140, and the reranking unit 150. As described above, using these functional units, the generation unit 190 generates a target language sentence (translation) that takes the lexical constraints into account, based on the constraint phrase list read from the constraint phrase list DB 400 and the source language sentence received from the display unit 500, and inputs the generated target language sentence to the display information generation unit 170.

The display unit 500 is, for example, a computer (terminal) with a display, connected to the generation device 100 via a network. As explained with reference to Figure 10, the display unit 500 accepts the source language sentence from the user and displays the constraint phrase list and related information. It also accepts instructions to add or modify constraint phrases and the source language sentence, and can output the source language sentence, the final target language sentence, and the final constraint phrase list together as a set.

With the generation device 100 of the above embodiment, the user can interactively repeat the cycle of checking the result of lexically constrained machine translation and modifying the constraint phrase list, and thereby obtain a target language sentence (translation) closer to the one the user has in mind.

(Experimental results)
In the following description of the experimental results, "the generation device 100 of this embodiment" is referred to as the proposed method or the proposed system.

To confirm the effectiveness of the proposed lexically constrained machine translation method, which reranks translation candidates generated under automatically extracted lexical constraints, we evaluated the accuracy of lexically constrained machine translation on Japanese-to-English translation, using lexical constraints automatically extracted from bilingual dictionaries.

<Bilingual dictionaries>
As the bilingual dictionaries from which lexical constraints were extracted, we used the general-purpose EDR Japanese-English Bilingual Dictionary (EDR-JE) and the bilingual dictionary of the Japanese-English translation system ALT-J/E.

<Models>
The following translation models were used for evaluation.

• Transformer
• LeCA + {EDR-JE, ALT-J/E}
• LeCA+LCD + {EDR-JE, ALT-J/E}

ASPEC was used as the parallel corpus for training and evaluating the translation models. The detailed settings and hyperparameters of each model are shown in Figure 12.

For the constraints extracted from the dictionaries, we collected the top 30 generated sentences for each of the 2^|C| lexical constraints (one for each subset of the extracted constraint set C). In the Reranker setting, the score used to rerank the translation candidates is the score that the reranking model computes from the source language sentence and each translation candidate.
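The size of the resulting candidate pool follows directly from this setting. Assuming each of the 2^|C| constrained decoding runs returns exactly k = 30 hypotheses, and counting hypotheses that coincide across subsets separately (|C| = 3 below is an illustrative value, not one reported in the experiments):

```latex
N_{\text{cand}} = k \cdot 2^{|C|}, \qquad k = 30
\quad\Longrightarrow\quad
|C| = 3:\; N_{\text{cand}} = 30 \cdot 2^{3} = 240
```

Because the pool doubles with every additional constraint phrase, the cost of decoding and reranking grows exponentially in |C|.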

The reranking model was a Transformer (big) trained on a Right-to-Left translation task, that is, generating the translation from the end of the sentence toward the beginning. The reranking score was the likelihood of each input translation candidate under forced decoding. Each method was evaluated with BLEU, an automatic evaluation metric for translation accuracy.
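The Right-to-Left forced-decoding score can be illustrated with a deliberately tiny model. In this sketch, which is a stand-in rather than the Transformer (big) reranker used in the experiments, an add-one-smoothed bigram language model is trained on reversed target sentences, and a candidate's score is the sum of the log-probabilities of its own tokens read from the end of the sentence to the beginning (teacher forcing, no search). Unlike the real reranker, it ignores the source sentence; the class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

BOS, EOS = "<s>", "</s>"


class ToyR2LBigram:
    """Hypothetical stand-in for the Right-to-Left reranker: an add-one-smoothed
    bigram LM trained on *reversed* target sentences."""

    def __init__(self, corpus: List[str]):
        self.bigrams: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = {EOS}
        for sent in corpus:
            toks = [BOS] + sent.split()[::-1] + [EOS]   # read right to left
            self.vocab.update(toks[1:])
            for prev, cur in zip(toks, toks[1:]):
                self.bigrams[prev][cur] += 1

    def logprob(self, prev: str, cur: str) -> float:
        counts = self.bigrams[prev]
        return math.log((counts[cur] + 1) / (sum(counts.values()) + len(self.vocab)))

    def forced_decode_score(self, candidate: str) -> float:
        """Log-likelihood of the given candidate, forced token by token, right to left."""
        toks = [BOS] + candidate.split()[::-1] + [EOS]
        return sum(self.logprob(p, c) for p, c in zip(toks, toks[1:]))
```

Reranking then amounts to sorting candidates by `forced_decode_score` in descending order. Because a raw log-likelihood sums negative terms, it tends to favor shorter candidates; a practical reranker may need length normalization, and the text does not say whether one was applied.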

<Experimental results>
Figure 13 shows the translation accuracy of each method when using lexical constraints automatically extracted from the bilingual dictionaries. With the Reranker, which uses the reranking model's score, LeCA and LeCA+LCD improve translation accuracy over the baseline (Transformer). Figure 13 also shows that this high accuracy is obtained regardless of which dictionary is used.

(Example hardware configuration)
Each of the devices described in this embodiment (the generation device 100 and the extraction device) can be realized, for example, by causing a computer to execute a program. The computer may be a physical computer or a virtual machine in the cloud.

In other words, each device can be realized by executing a program corresponding to its processing using hardware resources, such as the CPU and memory, built into a computer. The program can be recorded on a computer-readable recording medium (such as a portable memory) for storage and distribution, or provided over a network such as the Internet or e-mail.

Figure 14 shows an example hardware configuration of such a computer. The computer in Figure 14 includes a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a CPU 1004, an interface device 1005, a display device 1006, an input device 1007, and an output device 1008, which are interconnected by a bus BS. The computer may further include a GPU.

The program that implements the processing on the computer is provided, for example, on a recording medium 1001 such as a CD-ROM or a memory card. When the recording medium 1001 storing the program is set in the drive device 1000, the program is installed from the recording medium 1001 into the auxiliary storage device 1002 via the drive device 1000. The program need not be installed from the recording medium 1001, however; it may instead be downloaded from another computer via a network. The auxiliary storage device 1002 stores the installed program together with necessary files, data, and so on.

When an instruction to start the program is given, the memory device 1003 reads the program from the auxiliary storage device 1002 and stores it. The CPU 1004 implements the functions of the generation device 100 according to the program stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network or the like. The display device 1006 displays a GUI (Graphical User Interface) or the like provided by the program. The input device 1007 consists of a keyboard and mouse, buttons, a touch panel, or the like, and is used to input various operation instructions. The output device 1008 outputs computation results.

(Summary of the embodiment, effects, etc.)
As described above, the technology of this embodiment makes it possible to automatically extract, with little noise, appropriate constraint phrases for use in lexically constrained machine translation. It also enables accurate translation in lexically constrained machine translation.

With respect to the above embodiments, Appendix 1 and Appendix 2 below are further disclosed.

<Appendix 1>
(Note 1)
An extraction device comprising:
a memory; and
at least one processor connected to the memory,
wherein the processor:
divides into unit information both the first information in a dictionary, which is a set of pairs of first information and second information, and a first sequence; and
extracts, from the dictionary, the second information corresponding to the first information that matches the unit information of the first sequence, as constraint information used to generate a second sequence based on the first sequence.
(Note 2)
The extraction device according to Note 1, wherein the processor deletes, from the dictionary, pairs that match a predetermined rule and uses the dictionary on which the deletion has been performed.
(Note 3)
The extraction device according to Note 2, wherein a pair that matches the predetermined rule is at least one of: a pair containing a phrase other than a noun or a noun phrase; a pair consisting of phrases of length 1; and a pair in which the correspondence between the first information and the second information is not unique.
(Note 4)
The extraction device according to any one of Notes 1 to 3, wherein the processor performs the matching between the unit information of the first sequence and the first information so as to resolve ambiguity.
(Note 5)
The extraction device according to any one of Notes 1 to 4, wherein the processor generates display information for transmitting the constraint information to a display unit, and receives constraint information obtained by adding to or modifying the constraint information displayed on the display unit.
(Note 6)
A generation device comprising:
a memory; and
at least one processor connected to the memory,
wherein the processor:
receives a first sequence as input and extracts constraint information based on the first sequence and a dictionary, which is a set of pairs of first information and second information;
generates a second sequence based on the constraint information and the first sequence; and
generates display information for displaying the constraint information, in a modifiable form, together with the second sequence.
(Note 7)
The generation device according to Note 6, wherein, upon receiving constraint information that has been added or modified, the processor obtains a sequence generated based on the received constraint information and generates display information for displaying that sequence.
(Note 8)
The generation device according to Note 6 or 7, wherein the processor generates display information for displaying, as additional candidates, constraint information filtered based on a predetermined rule.
(Note 9)
An extraction method executed by a computer, comprising:
a division step of dividing into unit information both the first information in a dictionary, which is a set of pairs of first information and second information, and a first sequence; and
a constraint information extraction step of extracting, from the dictionary, the second information corresponding to the first information that matches the unit information of the first sequence, as constraint information used to generate a second sequence based on the first sequence.
(Note 10)
A generation method executed by a computer, comprising:
an extraction step of receiving a first sequence as input and extracting constraint information based on the first sequence and a dictionary, which is a set of pairs of first information and second information;
a generation step of generating a second sequence based on the constraint information and the first sequence; and
a display information generation step of generating display information for displaying the constraint information, in a modifiable form, together with the second sequence.
(Note 11)
A non-transitory storage medium storing a program for causing a computer to function as the extraction device according to any one of Notes 1 to 5.
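The extraction in Notes 1 to 4 of Appendix 1 can be sketched as follows. This is a minimal illustration under several assumptions not stated in the text: the source sentence and dictionary entries are already segmented into whitespace-separated units (e.g., by a morphological analyzer), "length 1" is read as a single character, a correspondence is treated as non-unique when a phrase has more than one partner in either direction, and ambiguity is resolved by greedy left-to-right longest match. The noun/noun-phrase rule of Note 3 is omitted because it requires a part-of-speech tagger. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def filter_dictionary(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Drop pairs matching two of the Note 3 rules (assumed readings):
    a one-character source phrase, or a non-unique correspondence."""
    src_to_tgts = defaultdict(set)
    tgt_to_srcs = defaultdict(set)
    for src, tgt in pairs:
        src_to_tgts[src].add(tgt)
        tgt_to_srcs[tgt].add(src)
    kept: Dict[str, str] = {}
    for src, tgt in pairs:
        if len(src.replace(" ", "")) <= 1:
            continue  # length-1 phrase (assumed: counted in characters)
        if len(src_to_tgts[src]) > 1 or len(tgt_to_srcs[tgt]) > 1:
            continue  # correspondence is not one-to-one
        kept[src] = tgt
    return kept


def extract_constraints(sentence: str, dictionary: Dict[str, str]) -> List[str]:
    """Split the sentence into units, then match dictionary entries by greedy longest match."""
    units = sentence.split()                       # assumed pre-segmented input
    entries = {tuple(src.split()): tgt for src, tgt in dictionary.items()}
    max_len = max((len(key) for key in entries), default=0)
    constraints: List[str] = []
    i = 0
    while i < len(units):
        for n in range(min(max_len, len(units) - i), 0, -1):
            key = tuple(units[i:i + n])
            if key in entries:
                constraints.append(entries[key])   # target side becomes a constraint
                i += n                             # skip matched units: no overlaps
                break
        else:
            i += 1                                 # no entry starts at this unit
    return constraints
```

For example, with pairs for 光 ファイバ, 光, two competing translations of レーザ, and 増幅 器, the filter keeps only 光 ファイバ and 増幅 器, and the sentence "光 ファイバ 増幅 器 を 用いる" yields the constraints "optical fiber" and "amplifier". Longest match is only one way to resolve overlapping matches; the text leaves the disambiguation method open.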

<Appendix 2>
(Note 1)
A generation device for generating, from constraint information and a first sequence that is a sequence of information, a second sequence that is another sequence of information, the generation device comprising:
a memory; and
at least one processor connected to the memory,
wherein the processor:
receives a constraint information list as input and outputs, as lexical constraints, each element of one or more subsets of the constraint information included in the constraint information list;
generates one or more candidates for the second sequence using the first sequence and the lexical constraints; and
calculates, for each of the one or more candidates, a score indicating its appropriateness as the second sequence.
(Note 2)
The generation device according to Note 1, wherein the processor calculates the score based on at least one of the likelihood output by the model used to generate the candidates and the likelihood obtained from the candidates by a reranking model.
(Note 3)
The generation device according to Note 1 or 2, wherein, when the constraint information list includes ambiguous constraint information, the processor generates the one or more candidates by performing a lexically constrained beam search that takes the ambiguity into account.
(Note 4)
The generation device according to any one of Notes 1 to 3, wherein at least one piece of constraint information is input to the processor in a form that allows two or more alternatives (ambiguity), and the processor generates the lexical constraints while retaining that ambiguity.
(Note 5)
A generation method executed by a computer for generating, from constraint information and a first sequence that is a sequence of information, a second sequence that is another sequence of information, the method comprising:
an input generation step of receiving a constraint information list as input and outputting, as lexical constraints, each element of one or more subsets of the constraint information included in the constraint information list;
a sequence generation step of generating one or more candidates for the second sequence using the first sequence and the lexical constraints; and
a reranking step of calculating, for each of the one or more candidates, a score indicating its appropriateness as the second sequence.
(Note 6)
A non-transitory storage medium storing a program for causing a computer to function as each unit of the generation device according to any one of Notes 1 to 4.
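Notes 3 and 4 of Appendix 2 allow one piece of constraint information to carry several alternative renderings, for example multiple target phrases for one source phrase. The bookkeeping such an ambiguity-aware search needs can be sketched as follows. This is not the beam search itself, only the check that a constraint counts as met when any one of its alternatives appears in a hypothesis. Whitespace tokenization and all function names are assumptions. The grouping helper sorts hypotheses by the number of constraints already met, in the spirit of the bank-based dynamic beam allocation of Post and Vilar (2018) cited earlier.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Alternatives = List[str]  # one ambiguous constraint: any of these phrases satisfies it


def contains(tokens: Sequence[str], phrase: Sequence[str]) -> bool:
    """True if `phrase` occurs as a contiguous token span of `tokens`."""
    n = len(phrase)
    return any(list(tokens[i:i + n]) == list(phrase)
               for i in range(len(tokens) - n + 1))


def unmet_constraints(hypothesis: str, constraints: List[Alternatives]) -> int:
    """Number of constraints none of whose alternatives appears in the hypothesis."""
    toks = hypothesis.split()
    return sum(not any(contains(toks, alt.split()) for alt in alts)
               for alts in constraints)


def group_by_met(hypotheses: List[str],
                 constraints: List[Alternatives]) -> Dict[int, List[str]]:
    """Group hypotheses by how many constraints they already satisfy."""
    banks: Dict[int, List[str]] = defaultdict(list)
    for hyp in hypotheses:
        banks[len(constraints) - unmet_constraints(hyp, constraints)].append(hyp)
    return dict(banks)
```

With the constraints `[["laser", "LASER"], ["optical fiber", "fibre"]]`, the hypothesis "the laser is coupled into the fibre" satisfies both, whereas "the beam is coupled into the fiber" satisfies neither, because "fiber" alone matches none of the listed alternatives.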

Although this embodiment has been described above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the invention described in the claims.

100 Generation device
110 Input unit
120 Extraction unit
121 Filtering unit
122 Splitting unit
123 Constraint phrase extraction unit
130 Input generation unit
140 Sequence generation unit
141 Sequence conversion unit
142 Search unit
150 Reranking unit
160 Output unit
170 Display information generation unit
180 Modification unit
190 Generation unit
200 Bilingual dictionary DB
300 Model DB
400 Constraint phrase list DB
500 Display unit
1000 Drive device
1001 Recording medium
1002 Auxiliary storage device
1003 Memory device
1004 CPU
1005 Interface device
1006 Display device
1007 Input device
1008 Output device

Claims

1. A generation apparatus for generating, from constraint information and a first sequence that is a sequence of information, a second sequence that is another sequence of information, the generation apparatus comprising:
an input generation unit that receives a constraint information list as input and outputs, as lexical constraints, each element of subsets of the plurality of pieces of constraint information included in the constraint information list;
a sequence generation unit that, for each of the plurality of lexical constraints output from the input generation unit, generates one or more candidates for the second sequence using the first sequence and that lexical constraint; and
a reranking unit that calculates, for each of the one or more candidates obtained for each of the plurality of lexical constraints, a score indicating its appropriateness as the second sequence.

2. The generation apparatus according to claim 1, wherein the reranking unit calculates the score based on at least one of the likelihood output by the model used by the sequence generation unit to generate the candidates and the likelihood obtained from the candidates by a reranking model.

3. The generation apparatus according to claim 1, wherein, when the constraint information list includes ambiguous constraint information, the sequence generation unit generates the one or more candidates by performing a lexically constrained beam search that takes the ambiguity into account.

4. The generation apparatus according to claim 1, wherein at least one piece of constraint information is input to the input generation unit in a form that allows two or more alternatives (ambiguity), and the input generation unit generates the lexical constraints while retaining that ambiguity.

5. The generation apparatus according to claim 1, wherein the constraint information is a constraint phrase, and the constraint information list is a constraint phrase list.

6. The generation apparatus according to claim 5, wherein the constraint information list includes a constraint phrase list in which a plurality of target language phrases correspond to one source language phrase.

7. A generation method executed by a generation apparatus for generating, from constraint information and a first sequence that is a sequence of information, a second sequence that is another sequence of information, the generation method comprising:
an input generation step of receiving a constraint information list as input and outputting, as lexical constraints, each element of subsets of the plurality of pieces of constraint information included in the constraint information list;
a sequence generation step of generating, for each of the plurality of lexical constraints output in the input generation step, one or more candidates for the second sequence using the first sequence and that lexical constraint; and
a reranking step of calculating, for each of the one or more candidates obtained for each of the plurality of lexical constraints, a score indicating its appropriateness as the second sequence.

8. A program for causing a computer to function as each unit of the generation apparatus according to any one of claims 1 to 6.