JP7687589B2

JP7687589B2 - Question creation device, question creation method, and program

Info

Publication number: JP7687589B2
Application number: JP2021081813A
Authority: JP
Inventors: 倫太今井; 匠哉松森; 哲平吉野; 遼一柴田
Original assignee: Keio University
Current assignee: Keio University
Priority date: 2021-05-13
Filing date: 2021-05-13
Publication date: 2025-06-03
Anticipated expiration: 2041-05-13
Also published as: JP2022175437A

Description

This disclosure relates to a question creation device, a question creation method, and a program.

With recent advances in information technology, IT is being applied in many fields. For example, various learning-support tools that make use of IT have been developed. In one such tool, a learner uses a learning-support app on a smartphone, tablet, personal computer, or similar device and studies English or another subject by solving the problems the app displays.

One question format offered by such apps is the fill-in-the-blank question, and question creation methods for generating such questions automatically have been proposed. In a fill-in-the-blank question, the learner infers from the surrounding context the word or term that belongs in a blank, or selects it from a list of options.

JP 2018-17904 A

In recent years, artificial intelligence has been applied to fields such as natural language processing. Various natural language processing models, such as Google's BERT (Bidirectional Encoder Representations from Transformers), have been developed, and their high performance on natural language processing tasks has attracted attention.

An object of the present disclosure is to provide a technique for creating fill-in-the-blank questions using a natural language processing model.

To achieve this object, one aspect of the present invention relates to a question creation device comprising: a sentence acquisition unit that acquires, from a dataset, a first sentence containing a word to be learned; a confidence calculation unit that masks the word in the first sentence and uses an inference model to calculate the confidence of the word at the masked position of the first sentence and the confidence of a candidate word, that is, another word that could fill the masked position; and a question output unit that outputs the masked sentence as a fill-in-the-blank question when the confidence of the word exceeds the confidence of the candidate word by at least a predetermined value.

The present disclosure thus provides a technique for creating fill-in-the-blank questions using a natural language processing model.

FIG. 1 is a schematic diagram of a question creation device according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of a learner studying with a learning support system according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of a question creation process according to an embodiment of the present disclosure.
FIG. 4 is a block diagram of the hardware configuration of the question creation device according to an embodiment of the present disclosure.
FIG. 5 is a block diagram of the functional configuration of the question creation device according to an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of a question creation process according to another embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a question creation process according to another embodiment of the present disclosure.
FIG. 8 is a block diagram of the functional configuration of a question creation device according to another embodiment of the present disclosure.
FIG. 9 is a flowchart of a question creation process according to an embodiment of the present disclosure.

<Terminology>
In this specification, "confidence" means the probability that a model assigns to a word in a mask prediction (word fill-in-the-blank) task, and the confidence of a word is the likelihood that the word belongs at the corresponding position in the sentence. The confidence of a word is calculated from one or more of the other words in the same sentence, and may additionally be calculated from one or more words in the sentences before and after it. A higher value means that the word is more strongly implied by the surrounding words. Any method of calculating the confidence may be used; this embodiment is described using the confidence computed by BERT.

The following embodiments disclose a question creation device for fill-in-the-blank questions.

As shown in FIG. 1, a question creation device 100 according to an embodiment of the present disclosure receives as input a word to be learned and a sentence containing that word, and uses any suitable natural language processing model, such as BERT, to output a fill-in-the-blank question for the input word. For example, as shown in FIG. 2, when learner A studies English with a learning app run by a learning support system, the system obtains information about the English words learner A has already mastered, for instance from learner A's learning history, and creates fill-in-the-blank questions for the words learner A has not yet mastered, using English content obtained from a database or the web.

Specifically, as shown in FIG. 3, the question creation device 100 masks the English word to be learned (S13 in FIG. 3; S33 in FIG. 7) in a sentence (S12 in FIG. 3; S32 in FIG. 7) acquired from a dataset or similar source (S11 in FIG. 3; S31 in FIG. 7, described later), inputs the sentence containing the masked position into a natural language processing model such as BERT (S14 in FIG. 3; S34 in FIG. 7), and estimates candidate English words for the masked position. According to the present disclosure, the question creation device 100 obtains the confidence of each candidate word estimated by the model. When the highest (first) confidence, which is that of the masked target word, exceeds the second-highest (second) confidence by at least a predetermined value, the device outputs the sentence containing the masked position as a fill-in-the-blank question for the word with the first confidence (S15 in FIG. 3; S35 in FIG. 7). The device therefore outputs as questions only those sentences in which the target word fits the blank clearly better than any other candidate, so that the word being studied is the only correct answer.
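The acceptance test can be stated compactly as follows. The notation is introduced here for illustration and is not the patent's: $c(w)$ is the confidence of word $w$ at the masked position (with BERT, this would be the softmax probability of $w$ at the mask), $w^\ast$ is the masked target word, $V$ is the set of candidate words returned by the model, and $\delta$ is the predetermined value.

$$
\text{output the question} \iff
w^\ast = \operatorname*{arg\,max}_{w \in V} c(w)
\;\text{ and }\;
c(w^\ast) - \max_{w \in V \setminus \{w^\ast\}} c(w) \ge \delta
$$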

As described below, the question creation device 100 may be a server that delivers questions to a learner's smartphone, tablet, personal computer, or the like via a learning app. For example, it may be a server of the company or vendor that operates the app; it acquires the sentences to be processed from a database that stores, as a dataset, sentences held by that company or vendor, and creates questions as described below. The dataset preferably consists of copyright-managed sentences.

As shown in FIG. 4, the question creation device 100 may have a hardware configuration comprising, for example, a processor 101 such as a CPU (Central Processing Unit), a memory 102 such as RAM (Random Access Memory) or flash memory, a storage 103 such as a hard disk, and an input/output (I/O) interface 104.

The processor 101 executes the various processes of the question creation device 100 described below.

The memory 102 stores various data and programs used in the question creation device 100 and, in particular, serves as working memory for working data and running programs. Specifically, the memory 102 holds programs, loaded from the storage 103, for executing and controlling the processes described below, and serves as working memory while the processor 101 executes them.

The storage 103 stores various data and programs used in the question creation device 100.

The I/O interface 104 receives commands and input data from a user, displays or plays back output results, and exchanges data with external devices. For example, the I/O interface 104 may be a device for inputting and outputting various kinds of data, such as a USB (Universal Serial Bus) port, a communication line, a keyboard, a mouse, a display, a microphone, or a speaker.

However, the question creation device 100 according to the present disclosure is not limited to the hardware configuration described above and may have any other suitable hardware configuration. For example, one or more of its processes may be implemented by processing circuitry or electronic circuitry hard-wired to perform them.

Next, the question creation device 100 according to an embodiment of the present disclosure is described in more detail with reference to FIGS. 5 to 9. The question creation device 100 receives as input a word to be learned (for example, an English word, a term, a person's name, or a place name) and a sentence, and outputs a fill-in-the-blank question in which that word is blanked out. In the following embodiment, the word to be learned is an English word and the output is a fill-in-the-blank question with that English word masked; however, the present disclosure is not limited to this and may be applied to learning terms in any subject other than English.

FIG. 5 is a block diagram of the functional configuration of the question creation device 100 according to an embodiment of the present disclosure. As shown in FIG. 5, the question creation device 100 includes a sentence acquisition unit 110, a confidence calculation unit 120, and a question output unit 130.

The sentence acquisition unit 110 acquires, from a dataset, sentences containing the word to be learned. In this specification, a sentence is a sequence of one or more words ending with a Japanese full stop "。" or a period ".", and a passage is a series of consecutive sentences that together carry a coherent meaning. The sentences or passages to be acquired may be obtained, for example, from a database or the web, and are preferably extracted from a dataset in a copyright-managed database.
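A minimal Python sketch of the sentence definition above, assuming plain-text input: a sentence is any run of characters ending in "。" or ".", and the target word is matched as a whole word, ignoring case. The function names are hypothetical. Under this literal definition, text ending in "!" or "?" is not split off and abbreviations such as "Mr." are, so a production system would likely need a proper sentence segmenter.

```python
import re

# A sentence = one or more non-terminator characters followed by "。" or ".".
# A trailing fragment without a terminator is dropped, since by the definition
# above it is not a sentence.
_SENTENCE = re.compile(r"[^。.]+[。.]")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences that end with '。' or '.'."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def sentences_containing(text: str, word: str) -> list[str]:
    """Return the sentences that contain `word` as a whole word (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]
```

For example, from "I apologize if my actions hurt your pride. The weather is nice. Nothing can hurt us." the call `sentences_containing(text, "hurt")` keeps the first and third sentences.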

The confidence calculation unit 120 masks the word to be learned in the acquired sentence and uses an inference model to calculate the confidence of that word at the masked position and the confidence of candidate words, that is, other words that could fill the masked position. Specifically, the confidence calculation unit 120 masks the word in the sentence or passage containing it, turning the word's position into a blank. It then inputs the sentence or passage containing the masked position into any suitable natural language processing model, such as BERT. As shown in FIG. 3, BERT outputs candidate words for the masked position together with the confidence of each candidate.
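The masking step might look like the following Python sketch. It assumes a BERT-style model whose mask token is "[MASK]" and a target word that the model's tokenizer treats as a single token; a word split into several subword pieces would need more than one mask, which the source does not discuss. `mask_target` is a hypothetical helper name.

```python
import re

MASK_TOKEN = "[MASK]"  # BERT's mask token; RoBERTa-style models use "<mask>"


def mask_target(sentence: str, word: str, mask_token: str = MASK_TOKEN) -> str:
    """Replace the first whole-word occurrence of `word` with the mask token.

    Inflected forms (e.g. "continues" for "continue") are not matched because
    the pattern requires a word boundary on both sides.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    # A function replacement avoids any backslash handling in `mask_token`.
    masked, count = pattern.subn(lambda _m: mask_token, sentence, count=1)
    if count == 0:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return masked
```

The masked string could then be passed to a fill-mask model; for example, the Hugging Face transformers `pipeline("fill-mask", model="bert-base-uncased")` returns, for each candidate, a dictionary that includes `token_str` and `score`. The source names BERT but no specific library, so this pairing is only one possibility.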

In the illustrated example, for the masked word "continue", BERT takes the input sentence "This city will cease to be and will ( ) to grow!" and outputs the candidate words and confidences "continue 0.9373...", "cease 0.0506...", and "begin 0.0025..." for the masked position. Here, the confidence indicates the probability that the word belongs at the masked position. Typically the masked word has the highest confidence, although this is not always the case.

Similarly, for the masked word "hurt", BERT takes the input sentence "I apologize if my actions ( ) your pride." and outputs the candidate words and confidences "hurt 0.9465...", "wounded 0.0167...", and "offended 0.0062..." for the masked position.
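As a worked check of the two examples, take the four-digit values shown in FIG. 3 (the remaining digits are elided in the source) and subtract the runner-up's confidence from the target's:

$$
\begin{aligned}
\text{continue: } & 0.9373 - 0.0506 = 0.8867,\\
\text{hurt: } & 0.9465 - 0.0167 = 0.9298.
\end{aligned}
$$

Both margins would clear a threshold of, say, $\delta = 0.5$; that value is only an illustration, since the source does not state which predetermined value it uses.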

If the masked word (for example, "continue" or "hurt" in the examples above) is not predicted as a candidate for the masked position, the question creation device 100 treats this as an error and acquires another sentence.

In the embodiment above, a single sentence is input to BERT; however, the present disclosure is not limited to this, and in one embodiment a passage consisting of several consecutive sentences may be input instead. For example, as shown in FIG. 6, the confidence calculation unit 120 may take a passage consisting of the sentence containing the word to be learned and the sentences before and after it (S21 and S22 in FIG. 6), mask the word (S23 in FIG. 6), and input the passage containing the masked position to BERT (S24 in FIG. 6). BERT then outputs the candidate words and their confidences for the masked position in the same way (S25 in FIG. 6).
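For the multi-sentence variant of FIG. 6, the model input could be assembled as in this Python sketch. It assumes the dataset has already been split into an ordered list of sentences; `build_masked_passage` and its `before`/`after` window sizes are hypothetical. Only the target sentence is masked; its neighbors are passed unchanged as extra context.

```python
import re


def build_masked_passage(sentences: list[str], index: int, word: str,
                         before: int = 1, after: int = 1,
                         mask_token: str = "[MASK]") -> str:
    """Join sentences[index-before .. index+after], masking `word` only in
    the central sentence sentences[index]."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    start = max(0, index - before)                 # clip window at the start
    end = min(len(sentences), index + after + 1)   # clip window at the end
    parts = []
    for i in range(start, end):
        s = sentences[i]
        if i == index:
            # Mask the first occurrence of the target in the central sentence.
            s = pattern.sub(lambda _m: mask_token, s, count=1)
        parts.append(s)
    return " ".join(parts)
```

Clipping the window at the ends of the list means a target in the first or last sentence simply receives context from one side only.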

The question output unit 130 outputs the masked sentence as a fill-in-the-blank question when the difference between the confidence of the word and the confidence of the candidate word is equal to or greater than a predetermined value. Specifically, the question output unit 130 compares the confidences of the candidate words for the masked position and calculates the difference between the highest confidence (for example, that of the masked word) and the next-highest confidence. If this difference is equal to or greater than a predetermined threshold, the question output unit 130 determines that the input sentence is suitable as a fill-in-the-blank question for the word and outputs the masked sentence as such a question. If the difference is below the threshold, it determines that the input sentence is unsuitable and does not output the masked sentence.
Without a significant difference between these confidences, the input sentence does not pin down a unique correct answer, so it is unsuitable as a fill-in-the-blank question. Likewise, if the highest confidence is not that of the masked word, the input sentence is unsuitable. In other words, the question output unit 130 outputs the masked sentence as a fill-in-the-blank question only if the masked word has the highest confidence and its confidence exceeds that of each candidate word by at least the predetermined value.
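The decision rule of the question output unit 130, including the error case in which the target word is not among the model's candidates, can be sketched in Python as below. The candidate dictionary stands in for the model output; the function name, the returned labels, and treating a missing runner-up as confidence 0.0 are assumptions made for illustration.

```python
def evaluate_question(target: str, candidates: dict[str, float],
                      threshold: float) -> str:
    """Decide whether a masked sentence is usable as a question.

    Returns "accept", "target_missing" (error case: fetch another sentence),
    "target_not_top" (another candidate outranks the target), or
    "margin_too_small" (target ranks first, but its lead is below threshold).
    """
    if target not in candidates:
        return "target_missing"
    # Best confidence among the other candidates; 0.0 if the target is alone.
    runner_up = max((c for w, c in candidates.items() if w != target),
                    default=0.0)
    if runner_up > candidates[target]:
        return "target_not_top"
    # A tie with the runner-up gives a margin of 0, rejected for any threshold > 0.
    if candidates[target] - runner_up >= threshold:
        return "accept"
    return "margin_too_small"


print(evaluate_question("continue",
                        {"continue": 0.9373, "cease": 0.0506, "begin": 0.0025},
                        threshold=0.5))  # accept
```

Returning a reason rather than a bare boolean lets a caller distinguish the error case, which calls for a different sentence, from a small margin, which the next embodiment addresses by widening the context.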

In one embodiment, if the difference between the confidence of the word and that of the candidate word is less than the predetermined value, the sentence acquisition unit 110 acquires a sentence adjacent to the acquired sentence (that is, the sentence from the dataset containing the word to be learned); the confidence calculation unit 120 uses the inference model to calculate the confidence of the word and of the candidate words at the masked position of the passage formed by the acquired sentence and its adjacent sentence; and the question output unit 130 may output the masked passage as a fill-in-the-blank question for the word if the confidence of the word exceeds that of each candidate word by at least the predetermined value.
For example, if the sentence input to BERT is unsuitable as a fill-in-the-blank question, the sentences before and after it may be retrieved, and the passage formed by the input sentence and those neighbors may be input to BERT. In general, widening the context is expected to narrow down the candidate words for the masked position and thus to enlarge the gap between the confidence of the target word and the next-highest confidence. Therefore, when the input sentence is unsuitable, the sentence acquisition unit 110 may extract the sentences before and after it from the dataset, and the confidence calculation unit 120 may mask the word to be learned in the extracted passage, input the passage containing the masked position to BERT, and calculate the candidate words and their confidences. If the target word then has the highest confidence and leads the next-highest confidence by at least the predetermined value, the question creation device 100 may output the passage as a fill-in-the-blank question.
That is, if, for the n-th sentence (n being an integer of 1 or more), the difference between the confidence of the word and that of the candidate word is less than the predetermined value, the sentence acquisition unit 110 acquires the (n+1)-th sentence adjacent to the n-th sentence; the confidence calculation unit 120 uses the inference model to calculate the confidence of the word to be learned and of the candidate words at the masked position of the passage formed by the n-th and (n+1)-th sentences; and the question output unit 130 can output the masked passage as a fill-in-the-blank question for the word if the confidence of the word exceeds that of each candidate word by at least the predetermined value. The acquisition of adjacent sentences may be repeated, for example, up to a specified number of times or until that difference reaches the predetermined value.
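The iterative widening described above (the n-th sentence, then the n-th plus the (n+1)-th, and so on) could be sketched as follows. `score` is a hypothetical callable standing in for the inference model (for example BERT) that maps a masked passage to a dictionary of candidate words and confidences, and `max_extra` plays the role of the "specified number of times". This sketch extends the context forward only, as in the claims, and does not mask later occurrences of the target word, which could give the answer away.

```python
import re
from typing import Callable, Optional

Scorer = Callable[[str], dict[str, float]]  # masked text -> {candidate: confidence}


def lead(target: str, candidates: dict[str, float]) -> Optional[float]:
    """Target confidence minus the best competing confidence (None if absent)."""
    if target not in candidates:
        return None
    best_other = max((c for w, c in candidates.items() if w != target),
                     default=0.0)
    return candidates[target] - best_other


def expand_until_unique(sentences: list[str], index: int, word: str,
                        score: Scorer, threshold: float, max_extra: int = 3,
                        mask_token: str = "[MASK]") -> Optional[str]:
    """Return the shortest masked passage sentences[index .. index+k]
    (k <= max_extra) in which `word` leads all other candidates by at least
    `threshold`, or None if no such passage exists."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    # Mask the target in the n-th sentence only.
    parts = [pattern.sub(lambda _m: mask_token, sentences[index], count=1)]
    for k in range(max_extra + 1):
        passage = " ".join(parts)
        margin = lead(word, score(passage))
        if margin is not None and margin >= threshold:
            return passage
        nxt = index + k + 1  # the next adjacent sentence: (n+1)-th, (n+2)-th, ...
        if nxt >= len(sentences):
            break  # no more sentences to add
        parts.append(sentences[nxt])
    return None
```

With a toy scorer that only becomes confident about "continue" once the passage mentions "grow", the call on ["The city will continue.", "It has many plans.", "It wants to grow."] returns the three-sentence masked passage, and with `max_extra=1` it returns None. Widening in both directions, as in FIG. 6, would be a similarly small change.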

In one embodiment, the question output unit 130 may also output, as answer options for the fill-in-the-blank question, the word together with selection candidate words associated with it; that is, it may generate the options automatically. For example, the question output unit 130 may use as options for the blank the word to be learned together with its derivatives or related words in other parts of speech. Specifically, when the word to be learned is a verb, the question output unit 130 may consult a dictionary database or the like and choose the corresponding noun, adjective, adverb, and so on as options.
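Option generation could be sketched like this in Python. `DERIVED_FORMS` is a tiny hypothetical stand-in for the dictionary database mentioned above, and `make_options` is an illustrative name; the optional `seed` only makes the shuffle reproducible.

```python
import random
from typing import Optional

# Hypothetical stand-in for a dictionary database: headword -> derived forms
# in other parts of speech (adjective, adverb, noun, ...).
DERIVED_FORMS: dict[str, list[str]] = {
    "continue": ["continuous", "continuously", "continuation"],
    "hurt": ["hurtful", "hurtfully"],
}


def make_options(word: str, lexicon: Optional[dict[str, list[str]]] = None,
                 k: int = 3, seed: Optional[int] = None) -> list[str]:
    """Return the correct word plus up to `k` derived forms, in shuffled order."""
    lexicon = DERIVED_FORMS if lexicon is None else lexicon
    options = [word] + lexicon.get(word, [])[:k]
    random.Random(seed).shuffle(options)  # so the answer is not always first
    return options
```

Derived forms make plausible distractors because they share the stem, so the learner has to choose by part of speech rather than by meaning alone.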

In one embodiment, the question output unit 130 may display one or more letters of the word in the fill-in-the-blank question. For example, it may show the first letter of the correct word in the blank as a hint (for instance, "c" when the correct word is "continue").
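A first-letter hint is a one-line string operation. In this sketch, the hypothetical `hint_blank` reveals the first `reveal` letters and replaces each remaining letter with an underscore. Note that this design also reveals the word's length; a fixed-width blank would not.

```python
def hint_blank(word: str, reveal: int = 1, fill: str = "_") -> str:
    """Show the first `reveal` letters of `word` and one `fill` per hidden letter."""
    reveal = max(0, min(reveal, len(word)))  # clamp to the word length
    return word[:reveal] + fill * (len(word) - reveal)
```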

In one embodiment, the question output unit 130 may display, in the fill-in-the-blank question, the sentence together with an accompanying sentence. For example, as shown in FIG. 7, it may display an English sentence together with its translation in an English fill-in-the-blank question.

In one embodiment, the question output unit 130 may display, in the fill-in-the-blank question, the sentence together with an accompanying image. For example, for the question "Carbon dioxide consists of one carbon atom bonded to two ( ).", a diagram of one carbon atom bonded to two oxygen atoms, "O=C=O", may be displayed alongside it.

In one embodiment, as shown in FIG. 8, the question creation device 100 may further include a word provision unit 140 that provides the words to be learned. For example, the word provision unit 140 may determine the words to be learned based on the learner's learning history and proficiency level, and provide them to the sentence acquisition unit 110, the confidence calculation unit 120, and/or the question output unit 130.
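A minimal sketch of how the word provision unit 140 might select words, assuming the learning history is simply a set of words the learner has already mastered and the proficiency level is represented by a pre-selected vocabulary list. `words_to_study` and `max_words` are hypothetical; the source does not specify how history and level are combined.

```python
def words_to_study(level_words: list[str], learned: set[str],
                   max_words: int = 10) -> list[str]:
    """Return up to `max_words` words from `level_words` not yet learned.

    Comparison ignores case; the order of `level_words` is preserved, so a list
    sorted by priority (e.g. frequency) yields the highest-priority gaps first.
    """
    known = {w.lower() for w in learned}
    return [w for w in level_words if w.lower() not in known][:max_words]
```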

The embodiments described above may be combined as appropriate.

FIG. 9 is a flowchart of a question creation process according to an embodiment of the present disclosure. The process is executed by the question creation device 100 described above and, in particular, may be implemented by the processor of the question creation device 100 executing a program.

As shown in FIG. 9, in step S101, the question creation device 100 acquires from the dataset sentences containing the word to be learned. For example, it extracts sentences or passages containing the word from a copyright-managed database. Specifically, if the words to be learned are English words for high school entrance exams, it extracts English sentences containing those words from the database.

In step S102, the question creation device 100 masks the word to be learned in the acquired sentence; that is, it blanks out the position of the word in the sentence.

In step S103, the question creation device 100 uses the inference model to calculate the confidence of the word and of the candidate words at the masked position. For example, it inputs the sentence containing the masked position into BERT and obtains the candidate words that could fill that position and the confidence of each. Typically, the masked word has the highest confidence.

In step S104, if the confidence of the masked word exceeds that of every other candidate word by at least the predetermined value, the question creation device 100 outputs the masked sentence as a fill-in-the-blank question. The predetermined value may be set to any suitable value that significantly distinguishes the masked word from the candidate words.
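The four steps of FIG. 9 can be strung together in a short loop. This is a sketch under stated assumptions, not the patented implementation: `create_questions` and the `score` callable are hypothetical names, and `score` stands in for a masked language model such as BERT that returns candidate words with confidences for the single masked position.

```python
import re
from typing import Callable, Iterable


def create_questions(dataset: Iterable[str], word: str,
                     score: Callable[[str], dict[str, float]],
                     threshold: float, mask_token: str = "[MASK]") -> list[str]:
    """Return masked sentences from `dataset` that pass the S101-S104 checks."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    questions = []
    for sentence in dataset:
        # S101: keep only sentences that contain the target word.
        if not pattern.search(sentence):
            continue
        # S102: blank out the first occurrence of the target word.
        masked = pattern.sub(lambda _m: mask_token, sentence, count=1)
        # S103: confidences of the target and the competing candidates.
        candidates = score(masked)
        if word not in candidates:
            continue  # target not predicted at all: treat as an error, move on
        best_other = max((c for w, c in candidates.items() if w != word),
                         default=0.0)
        # S104: output only if the target leads every other candidate by >= threshold.
        if candidates[word] - best_other >= threshold:
            questions.append(masked)
    return questions
```

Passing the model in as a callable keeps the selection logic testable without loading a real model; in practice `score` would wrap whatever fill-mask model is chosen.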

Although embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the invention as set forth in the claims.

100 Question creation device
110 Sentence acquisition unit
120 Confidence calculation unit
130 Question output unit
140 Word provision unit

Claims

1. A question creation device comprising:
a sentence acquisition unit that acquires, from a dataset, a first sentence including a word to be learned;
a confidence calculation unit that masks the word in the first sentence and uses an inference model to calculate the confidence of the word at the masked portion of the first sentence and the confidence of a candidate word, which is another candidate that may fill the masked portion; and
a question output unit that sets a predetermined value for distinguishing the word from the candidate word, and outputs the masked sentence as a fill-in-the-blank question when the confidence of the word exceeds the confidence of the candidate word by at least the predetermined value.

The question creation device according to claim 1, wherein, if the difference between the confidence of the word and the confidence of the candidate word is less than the predetermined value:
the sentence acquisition unit acquires a second sentence adjacent to the first sentence in the text of the dataset;
the confidence calculation unit uses the inference model to calculate the confidence of the word and the confidence of the candidate word at the masked position of the text composed of the first sentence and the second sentence; and
the question output unit outputs the masked sentence as a fill-in-the-blank question for the word when the confidence of the word exceeds the confidence of the candidate word by at least the predetermined value.

The question creation device according to claim 1, wherein, if the difference between the confidence of the word and the confidence of the candidate word is less than the predetermined value:
the sentence acquisition unit acquires an (n+1)-th sentence adjacent to an n-th sentence in the text of the dataset, where n is an integer equal to or greater than 1;
the confidence calculation unit uses the inference model to calculate the confidence of the word and the confidence of the candidate word at the masked position of the text composed of the n-th sentence and the (n+1)-th sentence; and
the question output unit outputs the masked sentence as a fill-in-the-blank question for the word when the confidence of the word exceeds the confidence of the candidate word by at least the predetermined value.
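
The context-widening fallback of claims 2 and 3 can be sketched as a loop. This is a minimal sketch under several assumptions not stated in the claims: the "adjacent" sentence is taken to be the following one, the context accumulates (sentence n, then sentences n and n+1, and so on), and `predict` is a hypothetical stand-in for the inference model that maps masked text to candidate confidences. The names `create_question`, `Predictor`, and `max_sentences` are illustrative.

```python
import re
from typing import Callable, Optional, Sequence

MASK_TOKEN = "[MASK]"  # BERT-style mask token (assumed)

# Hypothetical interface: masked text -> {candidate word: confidence}.
Predictor = Callable[[str], dict[str, float]]


def create_question(
    sentences: Sequence[str],
    start: int,
    answer: str,
    predict: Predictor,
    threshold: float,
    max_sentences: int = 3,
) -> Optional[str]:
    """Widen the context one adjacent sentence at a time until the answer is unambiguous.

    Returns the masked text, or None if the margin is never reached within
    `max_sentences` sentences or the dataset runs out of following sentences.
    """
    pattern = r"\b" + re.escape(answer) + r"\b"
    first, count = re.subn(pattern, MASK_TOKEN, sentences[start], count=1)
    if count == 0:
        raise ValueError(f"{answer!r} does not occur in sentences[{start}]")
    context = [first]
    nxt = start + 1
    while True:
        text = " ".join(context)
        conf = predict(text)
        runner_up = max((c for w, c in conf.items() if w != answer), default=0.0)
        if conf.get(answer, 0.0) - runner_up >= threshold:
            return text  # unambiguous: output as a fill-in-the-blank question
        if nxt >= len(sentences) or len(context) >= max_sentences:
            return None  # no more context available (or allowed)
        context.append(sentences[nxt])  # add the adjacent (following) sentence
        nxt += 1


if __name__ == "__main__":
    def toy_predict(text: str) -> dict[str, float]:
        # Stand-in for a masked LM: the second sentence makes "review" clear.
        if "sign it" in text:
            return {"review": 0.8, "read": 0.1}
        return {"review": 0.45, "read": 0.40}

    doc = ["Please review the contract.", "Then sign it at the bottom.", "Return it by Friday."]
    print(create_question(doc, 0, "review", toy_predict, threshold=0.3))
    # Please [MASK] the contract. Then sign it at the bottom.
```

Capping the context with `max_sentences` is a design choice of this sketch: it keeps questions short and bounds the number of model calls per word.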

The question creation device according to any one of claims 1 to 3, wherein the question output unit outputs the word and a selection candidate word associated with the word as options in the fill-in-the-blank question.
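
Presenting the word together with its associated selection candidates as options can be sketched as below. The claim does not say where the associated words come from; this sketch simply takes them as a hypothetical `distractors` list and shuffles them with the answer. The function name `build_options` and the `seed` parameter are illustrative.

```python
import random
from typing import Optional, Sequence


def build_options(answer: str, distractors: Sequence[str], seed: Optional[int] = None) -> list[str]:
    """Return the answer plus its associated distractor words in shuffled order."""
    # dict.fromkeys removes duplicates while keeping first-seen order;
    # a distractor identical to the answer is dropped so the answer appears once.
    unique = [w for w in dict.fromkeys(distractors) if w != answer]
    options = [answer, *unique]
    random.Random(seed).shuffle(options)  # a fixed seed makes the order reproducible
    return options


if __name__ == "__main__":
    print(build_options("attend", ["join", "skip", "join"], seed=1))
```

Passing `seed=None` gives a fresh order on each call, which keeps learners from memorizing the position of the correct option.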

The question creation device according to any one of claims 1 to 4, wherein the question output unit displays one or more characters of the word in the fill-in-the-blank question.

The question creation device according to any one of claims 1 to 5, wherein the question output unit displays the masked sentence and a sentence associated with the masked sentence in the fill-in-the-blank question.

The question creation device according to any one of claims 1 to 5, wherein the question output unit displays the masked sentence and an image associated with the sentence in the fill-in-the-blank question.

The question creation device according to any one of claims 1 to 7, further comprising a word providing unit that provides the words to be studied.

A question creation method executed by a computer, comprising:
obtaining, from a dataset, a first sentence including a word to be learned;
masking the word in the first sentence and using an inference model to calculate the confidence of the word at the masked position of the first sentence and the confidence of other candidate words that could fill the masked position; and
setting a predetermined value for distinguishing the word from the candidate words, and outputting the masked sentence as a fill-in-the-blank question if the confidence of the word exceeds the confidence of the candidate words by at least the predetermined value.

A program causing a computer to execute:
a process of obtaining, from a dataset, a first sentence including a word to be learned;
a process of masking the word in the first sentence and using an inference model to calculate the confidence of the word at the masked position of the first sentence and the confidence of other candidate words that could fill the masked position; and
a process of setting a predetermined value for distinguishing the word from the candidate words, and outputting the masked sentence as a fill-in-the-blank question if the confidence of the word exceeds the confidence of the candidate words by at least the predetermined value.