JP6913706B2

JP6913706B2 - Exam question prediction system and exam question prediction method

Info

Publication number: JP6913706B2
Application number: JP2019080130A
Authority: JP
Inventors: 政人鬼頭
Original assignee: 株式会社サイトビジット
Priority date: 2019-04-19
Filing date: 2019-04-19
Publication date: 2021-08-04
Anticipated expiration: 2039-04-19
Also published as: JP2020177507A; JP7303243B2; JP2021106062A

Description

The present invention relates to a system for predicting examination questions, and in particular to an examination question prediction system and an examination question prediction method that classify the question text of past questions into categories, predict the categories that will be asked, and predict examination questions on that basis.

[Conventional technology]
Conventionally, there are systems that create examination questions using a computer and provide them via a network.
However, no method has been established for classifying the question text of past examinations and accurately predicting this year's examination questions.

[Related technology]
As related prior art, there is Japanese Patent Application Laid-Open No. 2007-248605, "Examination Question Creation Method, Examination Question Creation System and Examination Question Creation Program" (Patent Document 1).
Patent Document 1 shows that examination questions satisfying evaluation criteria appropriate to the type of examination can be created.

Patent Document 1: JP-A-2007-248605

Conventional examination question creation methods do not use artificial intelligence to classify the text of past examination questions efficiently, analyze the question categories from the classified content, and predict the questions of the next examination. They therefore cannot predict examination questions efficiently and accurately.

In particular, as the volume of past examination questions becomes enormous, the work of classifying them into question categories grows, and it becomes difficult to analyze the question categories efficiently and accurately and to predict the questions of the next examination.
Patent Document 1 likewise does not create examination questions by predicting the next question categories from the content of past questions.

The present invention has been made in view of the above circumstances. Its object is to provide an examination question prediction system and an examination question prediction method that classify previously asked question text into categories, predict the next question categories from the classification results, and use the prediction results to easily create predicted questions for examination preparation based on past questions.

To solve the problems of the conventional examples described above, the present invention is an examination question prediction system including a processing device that predicts examination questions, wherein the processing device: extracts words from the question text of past examination questions; performs vectorization, which generates vector data by assigning a numerical value to each vector element for each extracted word; classifies the examination questions into categories based on the vector data; arranges the number of questions classified into each category for each past year in chronological order; and predicts, by machine learning, the number of questions per category in the next examination using a difference series made up of multiple arrays generated stepwise by shifting that arrangement one year into the past.

In the above examination question prediction system, the processing device trains a machine learning model on the question text of training data so that the desired vector data is obtained, and performs the vectorization using the trained model.

In the above examination question prediction system, the processing device trains a machine learning model on the vector data of training data so that questions are classified into the desired categories, and uses the trained model to classify examination questions into categories from the vectorized vector data.

In the above examination question prediction system, the processing device uses a recurrent neural network for the machine learning.

In the above examination question prediction system, the processing device extracts questions from past examination questions according to the predicted number of questions per category and creates the predicted questions for the next examination.

The present invention is also an examination question prediction method that causes a processing device to predict examination questions. The method causes the processing device to: extract words from the question text of past examination questions; perform vectorization, which generates vector data by assigning a numerical value to each vector element for each extracted word; classify the examination questions into categories based on the vector data; arrange the number of questions classified into each category for each past year in chronological order; and predict, by machine learning, the number of questions per category in the next examination using a difference series made up of multiple arrays generated stepwise by shifting that arrangement one year into the past.

According to the present invention, the processing device extracts words from the question text of past examination questions, performs vectorization that generates vector data by assigning a numerical value to each vector element for each extracted word, classifies the examination questions into categories based on the vector data, arranges the number of questions classified into each category for each past year in chronological order, and predicts by machine learning the number of questions per category in the next examination using a difference series made up of multiple arrays generated stepwise by shifting that arrangement one year into the past. As a result, the question text of past examination questions can be classified into categories, and the number of questions per category in the next examination can be properly predicted from the classification results.

FIG. 1 is a schematic diagram of this system.
FIG. 2 is a schematic diagram of this implementation engine.
FIG. 3 is a schematic diagram of the sentence feature extraction engine.
FIG. 4 is a schematic diagram of the sentence classification engine.
FIG. 5 is a schematic diagram showing an example of using Self-Attention with an RNN.
FIG. 6 is a schematic diagram of the question trend prediction engine.
FIG. 7 is a schematic diagram of the prediction and question creation processing.
FIG. 8 is a schematic diagram of the neural network model in this device.
FIG. 9 is a diagram showing the number of questions per category by year.
FIG. 10 is a diagram showing how to create the difference data.

Embodiments of the present invention will be described with reference to the drawings.
[Outline of Embodiment]
In the examination question prediction system according to an embodiment of the present invention (this system), a prediction question creation processing device (this device) extracts words from the question text of past examination questions, performs vectorization that generates vector data by assigning a numerical value to each vector element for each extracted word, classifies the examination questions into categories based on the vector data, and predicts the number of questions per category in the next examination from the number of questions classified into each category for each past year. The question text of past examination questions is thereby classified into categories, and the number of questions per category in the next examination can be properly predicted from the classification results.

Furthermore, in this system the processing device extracts questions from past examination questions according to the predicted number of questions per category and creates the predicted questions for the next examination. Predicted questions can thus be created accurately for categories whose question counts have been properly predicted, and examinees can take highly accurate predicted questions as examination preparation.

[This system: Fig. 1]
This system will be described with reference to FIG. 1. FIG. 1 is a schematic diagram of this system.
As shown in FIG. 1, this system basically comprises a prediction question creation processing device 1, a prediction question providing server 2, the Internet 3, and an examinee computer (PC) 4.
The devices are connected via the Internet 3, and in practice multiple examinee PCs 4 are connected.

[Prediction question creation processing device 1]
The prediction question creation processing device 1 classifies past question text into categories, analyzes the question categories, predicts the questions of the next examination, and creates examination questions.
The prediction question creation processing device 1 includes a control unit 11, a storage unit 12, and an interface unit 13. A display unit 14 and an input unit 15 are connected to the interface unit 13, which is also connected to the Internet 3.

The control unit 11 reads the processing program stored in the storage unit 12 and executes the processing described later.
The storage unit 12 stores the processing program and the past examination questions.
The display unit 14 displays the information needed to create predicted questions.
The input unit 15 accepts the input needed to create predicted questions.

[Prediction question providing server 2]
The prediction question providing server 2 is a computer that has a control unit and a storage unit and is connected to the Internet 3. It receives and stores the predicted questions created by the prediction question creation processing device 1 and delivers them to the examinee PCs 4. In the claims it is referred to as the "prediction question providing device".
In FIG. 1, the prediction question creation processing device 1 and the prediction question providing server 2 are connected via the Internet 3, but they may instead be connected via an in-house network.
The prediction question creation processing device 1 and the prediction question providing server 2 may also be configured as a single device.

[Examinee PC 4]
The examinee PC 4 is a computer that can access the prediction question providing server 2 via the Internet 3 and receive the predicted questions it provides. In the claims it is referred to as the "examinee device"; it need not be a computer and may be a terminal device such as a tablet or smartphone.

When an examinee answers the provided predicted questions, the examinee PC 4 may send the answer data to the prediction question providing server 2. In that case, the prediction question providing server 2 scores the examinee's answer data, gives advice for passing, and encourages the examinee to take lectures on weak areas that are suited to his or her ability.

[This implementation engine: Fig. 2]
Next, the implementation engine (this implementation engine) realized by a program (software) running on the prediction question creation processing device 1 will be described with reference to FIG. 2. FIG. 2 is a schematic diagram of this implementation engine.
As shown in FIG. 2, this implementation engine 10 includes an examination text classification engine 100 and a question trend prediction engine 200.
The examination text classification engine 100 includes a sentence feature extraction engine and a sentence classification engine.

The sentence feature extraction engine of the examination text classification engine 100 holds a trained AI (artificial intelligence) deep learning model that has read a large volume of question text and vectorizes text. It extracts words from an input question and outputs vector data for those words.

The sentence classification engine of the examination text classification engine 100 classifies the category of the examination question (question text) based on that vector data, using a separate trained model.
The question trend prediction engine 200 predicts the number of questions per category in the next examination from the past trend in the number of questions per category, the trend of recent questions, and the most recent question counts.
Each engine is described in detail below.

[Sentence feature extraction engine: Fig. 3]
Next, the sentence feature extraction engine will be described with reference to FIG. 3. FIG. 3 is a schematic diagram of the sentence feature extraction engine.
As shown in FIG. 3, the sentence feature extraction engine reads text data from examination questions, for example the answer choices of a bar examination, into the trained model 110, extracts keywords from the text, and converts them into vector data that is easy to classify.

In the sentence feature extraction engine, the elements (directions) that are key to category classification are set in advance as vectors v1, v2, v3, ....
The vector data quantifies how strongly the words extracted from the text data of an examination question (for example "today", "fine", "application") relate to these elements.
In other words, the vector data has a direction indicating an element and a magnitude indicating the strength of the relationship to that direction, and serves as information representing the features of each word (a feature vector).
Morphological analysis is used to extract words from the text data. Morphological analysis breaks the text down by part of speech and removes unnecessary words, thereby extracting the important words.
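The keyword extraction step can be pictured with a minimal sketch. It assumes the text has already been split into tokens with part-of-speech tags; the token list, the tag names, and the rule "keep nouns only" are hypothetical illustrations and are not taken from the specification, and a real system would obtain the tags from a morphological analyzer.

```python
# Hypothetical pre-tagged tokens: (surface form, part of speech).
# A real system would obtain these from a morphological analyzer.
TAGGED_TOKENS = [
    ("today", "noun"),
    ("the", "determiner"),
    ("fine", "noun"),
    ("of", "particle"),
    ("application", "noun"),
    ("is", "verb"),
]

# Assumption for illustration: only nouns are treated as important words.
KEEP_POS = {"noun"}


def extract_keywords(tagged_tokens, keep_pos=KEEP_POS):
    """Return the surface forms whose part of speech is in keep_pos, in order."""
    return [surface for surface, pos in tagged_tokens if pos in keep_pos]


keywords = extract_keywords(TAGGED_TOKENS)
print(keywords)
```

The filter rule is the only design decision here; which parts of speech count as "unnecessary" would depend on the examination domain.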

The trained model 110 has been trained on a large volume of training-data question text so that, for each specific extracted word, the magnitude of each element of the vectors v1, v2, v3, ... takes an optimal value.
The training data for the trained model 110 consists of, first, data pairing question text alone with classification labels; second, data comprising question text, answer choices, and the correct answers; and third, text data from books.

Specifically, let vector v1 be the element "peace" related to category a, vector v2 the element "imprisonment" related to category b, and vector v3 the element "corporation" related to category c.
If the trained model 110 of the sentence feature extraction engine extracts the words "today", "imprisonment", and "corporation" from the input question text, it converts those words into vector data so that category classification becomes easy.

In the example of FIG. 3, the word "today" is assigned v1 = 0.8, v2 = 0, and v3 = 0.1; the word "fine" is assigned v1 = 0, v2 = 0.5, and v3 = 0.3; and the word "application" is assigned v1 = 0.05, v2 = 0, and v3 = 0.72.
Note that if the question text differs, the same word "today" will not necessarily receive the same vector values.
Also, because the input question text differs each time, the extracted words differ each time as well.

The vectors are important factors set up for classifying the text of examination questions into categories. A high value for a vector element for a word in the text indicates that the word is highly relevant to the category corresponding to that vector.

The relationship between all the words of the input text and a vector is then given by the full column of values for that vector, that is, the values read vertically across all words.
In this way, the feature value for each vector is extracted for the input text as a whole (question text or answer choices) rather than for individual words.
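A minimal sketch of this column-wise reading follows, using the word values given for FIG. 3. The specification does not state how a column is reduced to a single feature value, so summing each column is an assumption made here for illustration; `WORD_VECTORS` and `sentence_features` are hypothetical names.

```python
# Word-level values for (v1, v2, v3), taken from the FIG. 3 example.
WORD_VECTORS = {
    "today": [0.8, 0.0, 0.1],
    "fine": [0.0, 0.5, 0.3],
    "application": [0.05, 0.0, 0.72],
}


def sentence_features(words, word_vectors):
    """Reduce each vector column over all words of the text.

    Assumption: the column is reduced by summation; the source only says
    that the whole column expresses the relation between text and vector.
    """
    rows = [word_vectors[word] for word in words]
    return [sum(column) for column in zip(*rows)]


features = sentence_features(["today", "fine", "application"], WORD_VECTORS)
print(features)  # one value per vector element v1, v2, v3
```

A mean or a maximum over each column would serve equally well as a reduction; the point is only that the text is represented by one value per vector element.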

The trained model 110 of the sentence feature extraction engine may instead be trained on the task of predicting the words that appear before and after a single input word, and the intermediate layer obtained from that training may be used as the word feature vector.
That is, for an input word, the vector values may be obtained by way of the intermediate layer from the predicted probabilities of the words that appear before or after it.

[Sentence classification engine: Fig. 4]
Next, the sentence classification engine will be described with reference to FIG. 4. FIG. 4 is a schematic diagram of the sentence classification engine.
As shown in FIG. 4, the sentence classification engine is first trained by inputting a large volume of the vector data output from the trained model 110 of FIG. 3 into the trained model 120 as training data, so that the data is classified into the desired categories.

Vector data obtained by running the sentence feature extraction engine of FIG. 3 on question text that is not training data is then input to the trained model 120 of the sentence classification engine of FIG. 4 and classified into a category.
In FIG. 4, a bar examination is assumed, so the resulting category is "Civil Code, Part 8 Relatives, Chapter 2 Family". The "chapter" is the medium category and the "section" is the small category; classification may go down to the medium category or down to the small category.

[Example of using RNN: Fig. 5]
The sentence classification engine uses a recurrent neural network (RNN) and learns the context of sequence data by computing in order from the beginning.
FIG. 5 introduces an example that uses a "Bidirectional" model, which predicts information by propagating it in both directions, from past to future and from future to past, together with "Self-Attention", which weights the importance of each unit differently for each input. FIG. 5 is a schematic diagram showing an example of using Self-Attention with an RNN.

As shown in FIG. 5, across the multiple units (memories) 121 to 126, the model calculates which units are important for the learned words, and it uses the same information as the input to do so.
FIG. 5 shows that units 121, 123, and 125 carry more weight than units 122, 124, and 126, and that unit 126 carries more weight than units 122 and 124.
The example of FIG. 5 is the learning model used in the sentence classification engine. Because it can attend only to closely related words (for example "imprisonment" and "year"), it is well suited to language processing.
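The weighting of units can be illustrated with a small self-attention pooling sketch. It is not the model of FIG. 5 itself: the unit outputs, the scoring vector `W`, and the dot-product scoring rule are hypothetical values and choices made for illustration, picked so that the resulting weights follow the ordering described for units 121 to 126.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(unit_outputs, w):
    """Weight each unit by a score computed from its own output.

    The scores come from the same information that is being pooled,
    which is what makes this a form of self-attention.
    """
    scores = [sum(h_k * w_k for h_k, w_k in zip(h, w)) for h in unit_outputs]
    weights = softmax(scores)
    dim = len(unit_outputs[0])
    pooled = [
        sum(a * h[k] for a, h in zip(weights, unit_outputs)) for k in range(dim)
    ]
    return weights, pooled


# Hypothetical outputs of six units, standing in for units 121 to 126.
UNIT_OUTPUTS = [
    [1.0, 0.2], [0.1, 0.1], [0.9, 0.3],
    [0.0, 0.2], [1.1, 0.1], [0.5, 0.4],
]
W = [2.0, 0.0]  # hypothetical scoring vector (learned in a real model)

weights, pooled = self_attention_pool(UNIT_OUTPUTS, W)
print(weights)
```

In a trained network the scoring parameters are learned together with the recurrent layers, so the weights change with every input sentence.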

[Question trend prediction engine: Fig. 6]
Next, the question trend prediction engine implemented in the prediction question creation processing device 1 will be described with reference to FIG. 6. FIG. 6 is a schematic diagram of the question trend prediction engine.
The number of questions (numerical data) classified by the examination text classification engine 100 for each past year and each category is input to the trained model 210, which predicts the number of questions (numerical data) per category for the next year.
Models such as multiple regression and LSTM (Long Short-Term Memory network) are used for the trained model 210.
The details of question trend prediction and question creation are described below.

[Prediction and question creation processing: Fig. 7]
The processing by which this device creates predicted questions (this processing) will be described with reference to FIG. 7. FIG. 7 is a schematic diagram of the prediction and question creation processing.
As shown in FIG. 7, this processing has a small category prediction step and a question creation step. The small category prediction step reads the examination questions of a specific past period (a continuous period going back from the previous examination) and predicts the small categories of the next examination based on data in which those questions have been classified into predetermined small categories (sometimes simply called "categories") according to their content. The question creation step creates predicted questions using the small category prediction results.

A small category is a field defined in advance according to the content of the questions, and the work of classifying questions into small categories is done manually.
The data classified into small categories is the number of questions per category, for example 2 questions for category 1 and 3 questions for category 2.

The example of FIG. 7 predicts categories from examination questions that have already been classified and creates predicted questions, whereas the processing described with FIGS. 2 to 5 classifies question text that has not yet been classified into categories.
Accordingly, unclassified question text is classified into categories by the processing of FIGS. 2 to 5, after which the prediction program of FIG. 6 onward performs category prediction and the predicted questions are created.

The small category prediction step and the question creation step of this processing are described in detail below.
[Small category prediction step]
The control unit 11 of this device reads the processing program from the storage unit 12 and executes the small category prediction step. It takes in the examination questions of a specific past period (past questions) from outside in CSV (Comma Separated Values) format, tallies the number of questions per small category for each examination (the processing up to this point may be performed by the examination text classification engine 100), and runs the prediction program on those per-category counts to output the small category prediction result. The small category prediction result is the predicted number of questions per small category in the next examination.
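The tallying part of this step can be sketched as follows. The CSV column names (`year`, `category`, `question`) and the sample rows are hypothetical, since the specification only says that past questions are input in CSV format; the prediction program itself is not shown.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical CSV layout: one row per past question.
CSV_TEXT = """year,category,question
H27,1,Question text A
H27,1,Question text B
H27,2,Question text C
H28,2,Question text D
H28,2,Question text E
H28,2,Question text F
"""


def count_by_exam_and_category(csv_file):
    """Tally the number of questions per small category for each examination."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["year"]][row["category"]] += 1
    return counts


counts = count_by_exam_and_category(io.StringIO(CSV_TEXT))
for year in sorted(counts):
    print(year, dict(counts[year]))
```

The resulting table of counts per examination and category is the kind of numerical input that the prediction program would consume.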

The prediction program is an AI program that uses a recurrent neural network (RNN) capable of handling time-series data, with the units of the intermediate (hidden) layer replaced by LSTM blocks. RNNs are suited to learning time-series information and past data.
When the examination consists of multiple subjects, the number of questions per subject is fixed, so that allocation is handled by rules.

[Question creation step]
The control unit 11 of this device reads the processing program from the storage unit 12 and executes the question creation step. It takes the small category prediction result obtained in the small category prediction step (the predicted number of questions per small category) as input, runs the question creation program to extract questions at random from the past questions, and outputs the question prediction result.
That is, according to the predicted number of questions for each small category, questions are extracted at random from the past questions belonging to that small category, and the predicted questions as a whole are assembled.
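The question creation step can be sketched in a few lines. The category names, question identifiers, and predicted counts below are hypothetical, and capping the draw at the size of the available pool is an assumption added so that the sketch does not fail when a category has too few past questions.

```python
import random

# Hypothetical pool of past questions, grouped by small category.
PAST_QUESTIONS = {
    "cat1": ["q1a", "q1b", "q1c"],
    "cat2": ["q2a", "q2b", "q2c", "q2d"],
}
# Hypothetical output of the small category prediction step.
PREDICTED_COUNTS = {"cat1": 2, "cat2": 3}


def build_predicted_exam(predicted_counts, past_questions, rng):
    """Draw the predicted number of past questions at random per category."""
    exam = []
    for category, n_questions in predicted_counts.items():
        pool = past_questions.get(category, [])
        # Assumption: never ask for more questions than the pool holds.
        exam.extend(rng.sample(pool, min(n_questions, len(pool))))
    return exam


exam = build_predicted_exam(PREDICTED_COUNTS, PAST_QUESTIONS, random.Random(0))
print(exam)
```

Passing the random generator in as an argument keeps the draw reproducible during testing while still allowing a fresh draw for each examinee.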

In this question creation step, past questions are extracted at random, but the selection may be weighted so that questions asked in the previous examination or the one before it are less likely to be chosen.
In addition, slightly modified versions of past questions may be prepared in advance and included in the pool from which questions are extracted.
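One way to realize such a weighted selection is sketched below. The weight values, the `recency_weight` rule, and the question identifiers are hypothetical; the specification only says that recently asked questions may be made less likely to be chosen.

```python
import random


def recency_weight(exams_ago, penalty=0.2):
    """Hypothetical rule: down-weight questions from the last two examinations."""
    return penalty if exams_ago <= 2 else 1.0


def weighted_sample(questions, weights, k, rng):
    """Draw up to k distinct questions; a higher weight is drawn more often."""
    pool = list(zip(questions, weights))
    chosen = []
    for _ in range(min(k, len(pool))):
        index = rng.choices(range(len(pool)), weights=[w for _, w in pool])[0]
        chosen.append(pool.pop(index)[0])
    return chosen


# Hypothetical past questions with the number of examinations since each was asked.
QUESTIONS = [("q_old1", 5), ("q_old2", 4), ("q_prev2", 2), ("q_prev1", 1)]
names = [name for name, _ in QUESTIONS]
weights = [recency_weight(exams_ago) for _, exams_ago in QUESTIONS]

picked = weighted_sample(names, weights, 2, random.Random(0))
print(picked)
```

Modified versions of past questions could be added to `QUESTIONS` as extra entries with their own weights, which leaves the sampling routine unchanged.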

[Neural network model: Fig. 8]
Next, the neural network model used in this device will be described with reference to FIG. 8. FIG. 8 is a schematic diagram of the neural network model in this device.
As shown in FIG. 8, in the neural network of this device, input data x is input to the input layer 111 and passes through the LSTM layers 112 and 113 of the intermediate layer to the fully connected layer 114, where full connection is performed. The result is output from the output layer 115 as output data y, the predicted value.

The LSTM layers 112 and 113 of the intermediate layer hold time-series information in memory units for the duration of a fixed time window and learn from the information within that window.
The LSTM layer mitigates the limitation of an RNN, which can realize only short-term memory. Some memory units receive input from the input layer and others receive output fed back from the intermediate layer, which enables long-term memory of time-series information.
By using the model of FIG. 8, this device can improve the accuracy of the small category prediction results.
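The layer order of FIG. 8 (input, two LSTM layers, fully connected layer, output) can be illustrated with a small forward pass written in plain NumPy. This is only a sketch with untrained random weights: the hidden size of 8, the standard LSTM gate equations, and the choice to feed the final time step into the fully connected layer are assumptions, since the description does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(x, W, U, b):
    """Run one LSTM layer over x of shape (batch, time, features).

    W: (features, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,).
    Returns the hidden state at every time step: (batch, time, hidden).
    """
    batch, steps, _ = x.shape
    hidden = U.shape[0]
    h = np.zeros((batch, hidden))
    c = np.zeros((batch, hidden))  # memory cell state
    outputs = []
    for t in range(steps):
        gates = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # keep or overwrite memory
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1)

def init_lstm(rng, n_in, n_hidden):
    return (rng.normal(0, 0.1, (n_in, 4 * n_hidden)),
            rng.normal(0, 0.1, (n_hidden, 4 * n_hidden)),
            np.zeros(4 * n_hidden))

def forward(x, params):
    h1 = lstm_layer(x, *params["lstm1"])   # LSTM layer 112
    h2 = lstm_layer(h1, *params["lstm2"])  # LSTM layer 113
    W_fc, b_fc = params["fc"]
    return h2[:, -1, :] @ W_fc + b_fc      # fully connected 114 -> output 115

rng = np.random.default_rng(0)
hidden = 8  # assumed size, not given in the description
params = {
    "lstm1": init_lstm(rng, 13, hidden),
    "lstm2": init_lstm(rng, hidden, hidden),
    "fc": (rng.normal(0, 0.1, (hidden, 1)), np.zeros(1)),
}
x = rng.random((93, 14, 13))  # placeholder input: 93 categories, 14 steps, 13 values
y = forward(x, params)        # one (untrained) predicted value per category
```

In practice such a model would be built and trained with a deep learning framework; the sketch only shows how data flows through the layers named in FIG. 8.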

[Number of questions in each category for each year: Fig. 9]
Next, the number of questions in each category (small category) for each year will be described with reference to FIG. 9. FIG. 9 is a diagram showing the number of questions in each category for each year.
As shown in FIG. 9, the questions are classified into, for example, 93 categories, and the number of questions in each category is obtained for each year from Heisei 1 (H1, 1989) to Heisei 29 (H29, 2017).

The number of questions in a category is denoted n_ij, where i is the category number and j is the year.
The method of creating the difference data described below uses the row data of FIG. 9 for H1 to H29, one row per category (per category number). For example, for category number 2, the row data n₂₁, n₂₂, n₂₃, ... is used as one unit.

[How to create the difference data: Fig. 10]
Next, how to create the difference data for each small category will be described with reference to FIG. 10. FIG. 10 is a diagram showing how to create the difference data.
Simply inputting the number of past questions in each small category into the AI program is not sufficient to obtain accurate small category prediction results. Therefore, the difference data described below is created, and the values of the matrix obtained from the difference data are input to the AI program to improve the prediction accuracy.

As shown in FIG. 10, (1) to (29) are the original time-series data for the years H1 to H29. For example, the row data n₂₁, n₂₂, n₂₃, ... for category number 2 shown in FIG. 9 corresponds to this time-series data.
Values shown in parentheses are the numbers of questions in the actual exams, and values shown in square brackets are predicted values.

Then, an array obtained by shifting the time-series data one year to the left is placed beneath it, and this is repeated to build a multi-row array; for example, difference data consisting of 13 rows is generated.
The value [14], the number of questions in this category for H14, is then predicted from the values (1) to (13) in the vertical H1 column.

Next, the predicted value [15] for H15 is obtained from the data of the vertical H1 column together with the values (2) to (14) of the vertical H2 column. Here, the values of a matrix of 13 rows × 2 columns are used as input data to obtain the predicted value.
Similarly, the predicted value [26] for H26 is obtained by using the values of the matrix of 13 rows × 14 columns of H1 to H13 as input data.

This device uses 14 years of time-series data: to obtain the predicted value for H30, the values of the matrix of 13 rows × 14 columns spanning H4 to H17 are input to the AI program of the small category prediction step, which yields the predicted value [30].
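The construction of FIG. 10 can be sketched for a single category as follows. This is one reading of the description, under the assumption that row r of the array is the original series shifted left by r years and that only columns filled in every row are kept; the function names `shifted_rows` and `input_window` are hypothetical.

```python
import numpy as np

def shifted_rows(series, n_rows=13):
    """Stack the series with copies shifted left by 1, 2, ... years.

    Row r, column c holds series[r + c]. Only the columns that are
    populated in every row are kept.
    """
    series = np.asarray(series)
    n_cols = len(series) - n_rows + 1
    return np.stack([series[r:r + n_cols] for r in range(n_rows)])

def input_window(series, first_col, n_cols=14, n_rows=13):
    """Cut an n_rows x n_cols input matrix starting at column first_col."""
    table = shifted_rows(series, n_rows)
    return table[:, first_col:first_col + n_cols]

# Stand-in for the values (1)..(29) of years H1..H29.
years = np.arange(1, 30)
table = shifted_rows(years)                 # 13 rows, columns H1..H17
window = input_window(years, first_col=3)   # columns H4..H17
```

With 29 years and 13 rows, the H1 column holds (1) to (13) and the last complete column, H17, holds (17) to (29), so the H4 to H17 window is the most recent 13 × 14 block available for predicting H30.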

Since there are 93 small categories, difference data like that shown in FIG. 10 is prepared for each of the 93 categories, and an array of shape (93, 14, 13) is input to the AI program to obtain small category prediction results for all small categories.
The small category prediction results obtained in this way are highly accurate, which in turn improves the prediction accuracy of the questions assembled in the question creation step.
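Assembling the (93, 14, 13) array for all categories at once can be sketched with NumPy's `sliding_window_view`. The counts here are random placeholder values, the function name `build_model_input` is hypothetical, and the axis order (category, year column, shifted row) is an assumption inferred from the stated shape.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def build_model_input(counts, n_rows=13, n_cols=14):
    """Turn a (categories, years) count table into model input.

    The result has shape (categories, n_cols, n_rows), where
    result[i, c, r] is the count of category i in year (start + c + r)
    and start is chosen so that the most recent columns are used.
    """
    counts = np.asarray(counts)
    # windows[i, c, r] == counts[i, c + r]
    windows = sliding_window_view(counts, n_rows, axis=1)
    return np.ascontiguousarray(windows[:, -n_cols:, :])

rng = np.random.default_rng(0)
counts = rng.integers(0, 4, size=(93, 29))  # placeholder for the FIG. 9 table
x = build_model_input(counts)               # shape (93, 14, 13)
```

Each slice `x[i]` is the transpose of the 13 × 14 matrix of columns H4 to H17 for category i, so every category contributes one sequence of 14 steps with 13 values per step.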

[Processing of the prediction question providing server 2]
Next, the processing performed by the prediction question providing server 2 of this system will be described.
The prediction question providing server 2 receives the prediction question data created by and uploaded from the prediction question creation processing device 1, stores the data in its storage unit, and provides the data in response to access from the examinee PC 4.

The prediction question providing server 2 may receive answer data for the prediction questions from the examinee PC 4, score the answers, and notify the examinee PC 4 of the scoring result.
Beyond simply reporting the scoring result, the server may also provide an analysis of the answers given by all respondents to the prediction questions, point out the individual examinee's weaknesses, and offer advice on effective study methods for passing the exam. Such advice includes introducing online lectures that are useful for study and guiding the examinee to those lectures.

[Effects of the embodiment]
According to this system and the test question prediction method, the device 1 extracts words from the question sentences of past test questions, performs vectorization that generates vector data by assigning a numerical value to each vector element for the extracted words, classifies the test questions into categories based on the vector data, and predicts the number of questions for each category of the next test from the number of questions classified into each category for each past year. This has the effect that the question sentences of past test questions can be classified into categories and the number of questions for each category of the next test can be properly predicted from the classification results.

Furthermore, in this system, the processing device extracts questions from the past test questions according to the predicted number of questions for each category and creates the next set of predicted questions. This has the effect that predicted questions can be created accurately for categories whose number of questions has been properly predicted, and examinees can take highly accurate predicted questions as exam preparation.

The present invention is suitable for a test question prediction system and a test question prediction method that predict the next question categories from the contents of questions asked in the past and use the prediction result to easily create predicted questions for exam preparation based on the past questions.

1: prediction question creation processing device, 2: prediction question providing server, 3: Internet, 4: examinee PC, 10: implementation engine, 11: control unit, 12: storage unit, 13: interface unit, 14: display unit, 15: input unit, 100: test sentence classification engine, 111: input layer, 112, 113: intermediate layers, 114: fully connected layer, 115: output layer

Claims

A test question prediction system comprising a processing device that predicts test questions, wherein the processing device extracts words from the question sentences of past test questions, performs vectorization that generates vector data by assigning a numerical value to each vector element for the extracted words, classifies the test questions into categories based on the vector data, arranges the number of questions classified into each category for each past year in chronological order, generates in stages a plurality of sequences in which the arrangement is shifted one year into the past, and predicts the number of questions for each category of the next test by machine learning using the resulting difference series.

The test question prediction system according to claim 1, wherein the processing device trains a machine learning model, using the question sentences of teacher data, so that desired vector data is obtained, and performs the vectorization using the trained learning model.

The test question prediction system according to claim 1 or 2, wherein the processing device trains a machine learning model, using the vector data of teacher data, so that the data is classified into desired categories, and uses the trained learning model to classify the test questions into categories from the vectorized vector data.

The test question prediction system according to any one of claims 1 to 3, wherein the processing device uses a recurrent neural network for machine learning.

The test question prediction system according to any one of claims 1 to 4, wherein the processing device extracts questions from past test questions according to the predicted number of questions for each category and creates the next predicted questions.

A test question prediction method in which a processing device predicts test questions, the method comprising: extracting words from the question sentences of past test questions; performing vectorization that generates vector data by assigning a numerical value to each vector element for the extracted words; classifying the test questions into categories based on the vector data; arranging the number of questions classified into each category for each past year in chronological order; and predicting the number of questions for each category of the next test by machine learning using a difference series in which a plurality of sequences, each shifting the arrangement one year into the past, are generated in stages.