JP7800673B2 - Acquisition probability acquisition device, method, and program

Info

Publication number: JP7800673B2
Application number: JP2024522830A
Authority: JP
Inventors: 早苗藤田; 哲生小林; 正嗣服部
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2022-05-26
Filing date: 2022-05-26
Publication date: 2026-01-16
Anticipated expiration: 2042-05-26
Also published as: WO2023228361A1; JPWO2023228361A1

Description

Article 30, paragraph 2 of the Patent Act applies. (1) Date of website publication: March 10, 2022. Website address: https://www.fukui-c.ed.jp/▲～▼fec/wp-content/uploads/2022/03/2021_03NTT%E8%97%A4%E7%94%B0%E6%B0%8F%E7%89%B9%E5%88%A5%E5%AF%84%E7%A8%BF.pdf

The disclosed technology relates to a technique for obtaining the acquisition probability of words.

The total number of words a person knows is called that person's vocabulary size. A vocabulary size estimation test estimates this vocabulary size accurately in a short time (see, for example, Non-Patent Document 1). An overview of the estimation procedure is given below.

(1) The word list in the word familiarity database (DB) is sorted in order of familiarity, and test words are selected at approximately regular intervals (for example, one word for every 1,000 words). Familiarity (word familiarity) is a numerical representation of how familiar a word is. The higher the familiarity, the more familiar the word.

(2) The test words are presented to the user, who answers whether or not they know each word.

(3) A logistic regression analysis is performed to find the model that best explains these combinations of test words and answers. In this analysis, the independent variable x is the total number of words in the word familiarity DB whose familiarity is equal to or higher than that of each test word, and the dependent variable y is the probability (e.g., 0 or 1) that the user answers that they know each word. The analysis yields a logistic model (or logistic regression equation). An example of a logistic model is shown in FIG. 12.

(4) In the logistic model obtained, the value of x corresponding to y = 0.5 is calculated and used as the estimated vocabulary size. The estimated vocabulary size is the value estimated to be the user's vocabulary size.

By using the word familiarity DB, this method can accurately estimate a user's vocabulary size simply by testing whether or not the user knows the selected test words.
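The independent variable x in step (3) can be illustrated with a short sketch. It assumes the word familiarity DB is available as a plain list of familiarity scores; the toy values and the function name `words_at_least_as_familiar` are hypothetical and do not come from the source.

```python
from bisect import bisect_left

def words_at_least_as_familiar(sorted_familiarities, test_familiarity):
    """Count DB words whose familiarity is >= test_familiarity.

    sorted_familiarities must be in ascending order.
    """
    first = bisect_left(sorted_familiarities, test_familiarity)
    return len(sorted_familiarities) - first

# Hypothetical toy DB: familiarity scores on a 1-7 scale (assumption).
toy_db = sorted([6.8, 6.5, 6.1, 5.4, 4.9, 4.2, 3.3, 2.7, 1.9, 1.2])

# x for three hypothetical test words with familiarity 6.1, 4.2 and 1.9.
x_values = [words_at_least_as_familiar(toy_db, f) for f in (6.1, 4.2, 1.9)]
print(x_values)
```

Each x value is the rank of the test word counted from the most familiar word, which is why the x at y = 0.5 can be read directly as a word count.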

Non-Patent Document 1: Kobayashi Tetsuo, Amano Shigeaki, Masataka Nobuo, "The Current State and Future of the Mobile Society," 2007, NTT Publishing, pp. 127-128.

No technique has previously been proposed that uses the generated logistic model to obtain the probability that a person has acquired each word contained in a text.

The disclosed technology aims to obtain the probability that a person has acquired each word contained in a text.

One aspect of the disclosed technology is an acquisition probability acquisition device, where familiarity is an index representing how familiar a word is, the device comprising: a storage unit storing a word familiarity DB that holds a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words; a familiarity acquisition unit that acquires, from the word familiarity DB stored in the storage unit, the familiarity corresponding to each word included in input text; a model storage unit storing a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word; and an acquisition probability acquisition unit that acquires the acquisition probability, which is the probability that the certain person has acquired each word, using at least the acquired familiarity corresponding to each word and the model stored in the model storage unit.

The disclosed technology makes it possible to obtain the probability that a person has acquired each word contained in a text.

FIG. 1 is a diagram illustrating an example of the functional configuration of a model generation device and a word selection device.
FIG. 2 is a diagram showing an example of the processing procedure of the model generation method and the word selection method.
FIG. 3 is a diagram showing an example of a logistic regression model.
FIG. 4 is a diagram illustrating an example of the functional configuration of the acquisition probability acquisition device.
FIG. 5 is a diagram showing an example of the processing procedure of the acquisition probability acquisition method.
FIG. 6 is a diagram for explaining an example of the generation of acquired word information.
FIG. 7 is a diagram illustrating an example of the functional configuration of the device for extracting recommended words to learn.
FIG. 8 is a diagram showing an example of the processing procedure of the method for extracting recommended words to learn.
FIG. 9 is a diagram showing examples of recommended words to learn.
FIG. 10 is a diagram illustrating an example of the functional configuration of a computer.
FIG. 11 is a diagram showing an example of the correspondence between familiarity and number of words.
FIG. 12 is a diagram for explaining the background art.

Embodiments of the disclosed technology are described below with reference to the drawings.

[First embodiment]
First, a first embodiment will be described. The first embodiment is a model generation device and method, and a word generation device and method.

As illustrated in FIG. 1, the model generation device 1 of this embodiment includes a storage unit 11, a word selection unit 12, a presentation unit 13, an answer receiving unit 14, a model generation unit 15, and a vocabulary size estimation unit 16. The model generation device 1 does not necessarily have to include the word selection unit 12, the presentation unit 13, the answer receiving unit 14, the storage unit 11, and the vocabulary size estimation unit 16.

As shown by the dashed lines in FIG. 1, the word generation device A1 is composed of the storage unit 11 and the word selection unit 12. The word generation device A1 may also include the presentation unit 13 and the answer receiving unit 14.

<Storage unit 11>
A word familiarity database (DB) is stored in advance in the storage unit 11. The word familiarity DB is a database that stores pairs of M words (a plurality of words) and a predetermined familiarity (word familiarity) for each of those words. In other words, the storage unit 11 stores a word familiarity DB that holds a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words.

The M words in the word familiarity DB are ranked in an order based on familiarity (e.g., in order of familiarity). M is an integer equal to or greater than 2 representing the number of words contained in the word familiarity DB. There is no limit on the value of M, but, for example, M is desirably 70,000 or greater when measuring vocabulary size in a native language, and 10,000 or greater when measuring vocabulary size in a second language (e.g., English for native Japanese speakers). This is because the vocabulary size of a Japanese adult is said to be approximately 40,000 to 50,000 words, so about 70,000 words can cover the vocabulary of most people, individual differences included. In a second language, by contrast, a person's vocabulary is often smaller than in their native language, so a smaller M than for the native language is considered sufficient to cover the vocabulary of most people. However, vocabulary size varies greatly depending on how words are counted, for example how orthographic variants and derived words are handled. Depending on the counting method, M may therefore need to be 100,000 or greater for a native language. In addition, the estimated vocabulary size is capped by the number of words contained in the reference word familiarity DB. When vocabulary is also to be estimated for people with outlier-level large vocabularies, a larger value of M is therefore desirable.

Familiarity (word familiarity) is an index that represents how familiar a word is. Examples of such an index include an index of the familiarity of a word (for example, the numerical representation of word familiarity introduced in Non-Patent Document 1), an index of how often a word is seen or heard, an index of how well a word is known, an index of how well a word can be written, and an index of how well a person can speak using the word.

For example, the higher the familiarity of a word, the more familiar the word. In this embodiment, a larger familiarity value represents a higher familiarity. However, this does not limit the present invention.

The storage unit 11 receives read requests from the word selection unit 12 and the model generation unit 15 as input, and outputs the requested words and their familiarities.

<Word selection unit 12>
Input: Question generation request from the user or the system
Output: N test words to be used in the vocabulary size estimation test
When the word selection unit 12 receives a question generation request from the user or the system, it selects and outputs a plurality of test words w(1), ..., w(N) to be used in the vocabulary size estimation test from the plurality of ordered words contained in the word familiarity DB in the storage unit 11.

For example, the word selection unit 12 uses the word familiarity DB stored in the storage unit 11 to select a plurality of test words w(1), ..., w(N) from the plurality of words so that the familiarities corresponding to the test words are spaced at constant intervals (step S12).

For example, the word selection unit 12 selects N words evenly from all words in the word familiarity DB of the storage unit 11 so that the familiarities of the selected words are spaced at approximately regular intervals, and outputs the selected N words as test words w(1), ..., w(N).

For example, the word selection unit 12 selects words so that the familiarity interval is 0.1. For example, the word selection unit 12 may select a total of 61 words: word w(1) with a familiarity of 1, word w(2) with a familiarity of 1.1, ..., word w(60) with a familiarity of 6.9, and word w(61) with a familiarity of 7.

The familiarities of the test words w(1), ..., w(N) need not be at strictly regular intervals; it is sufficient that they are selected evenly. For instance, when past surveys predict the familiarity around the boundary between words the user knows and does not know, more words may be selected near the familiarity to be examined closely. In other words, the familiarity values of the series of test words w(1), ..., w(N) may be denser in some ranges than in others.
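The regular-interval selection of step S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the word familiarity DB is a dictionary mapping words to familiarity scores and picks, for each target familiarity, the closest word not yet chosen. The function name `select_test_words`, its parameters, and the toy data are hypothetical.

```python
def select_test_words(word_db, low=1.0, high=7.0, step=0.1):
    """Pick, for each target familiarity, the closest word not yet chosen.

    word_db maps word -> familiarity. With the defaults there are 61 targets
    (1.0, 1.1, ..., 7.0), matching the 0.1-interval example in the text.
    """
    n_targets = int(round((high - low) / step)) + 1
    targets = [low + i * step for i in range(n_targets)]
    remaining = dict(word_db)  # copy so the caller's DB is left untouched
    selected = []
    for target in targets:
        if not remaining:
            break  # DB is smaller than the number of targets
        word = min(remaining, key=lambda w: abs(remaining[w] - target))
        selected.append((word, remaining.pop(word)))
    return selected

# Hypothetical toy DB; a coarse step of 1.0 keeps the example short.
toy_db = {"a": 1.0, "b": 2.1, "c": 2.9, "d": 4.2,
          "e": 5.0, "f": 6.1, "g": 6.9, "h": 3.05}
picked = select_test_words(toy_db, low=1.0, high=7.0, step=1.0)
print([word for word, _ in picked])
```

To sample more densely near an expected know/do-not-know boundary, the list of targets could simply be made denser in that range; the nearest-word rule itself would not need to change.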

There is no limitation on the order of the test words w(1), ..., w(N) output from the word selection unit 12; for example, the word selection unit 12 outputs them in descending order of familiarity (most familiar first).

The number of test words N may be specified in the question generation request or may be predetermined. There is no limit on the value of N, but, for example, approximately 50≦N≦100 is desirable. For sufficient estimation, N≧25 is desirable. A larger N allows more accurate estimation but increases the burden on the user (subject) (step S12).

To reduce the burden on the user while improving accuracy, a test of, for example, 50 words may be administered multiple times (e.g., three times), with the vocabulary size estimated for each test and then re-estimated from the pooled answers of all tests. In this case, each test uses fewer words, which reduces the burden on the user, and showing the result after each test helps maintain the user's motivation to answer. Furthermore, performing the final vocabulary size estimation on the combined words from all tests improves estimation accuracy.

Selecting the test words so that their familiarities are spaced at constant intervals suppresses variation in familiarity, which makes it easier for the logistic curve to converge.

<Presentation unit 13>
Input: N test words
Output: Instruction sentence and N test words
The presentation unit 13 receives the N test words w(1), ..., w(N) output from the word selection unit 12. The presentation unit 13 presents the test words w(1), ..., w(N) to the user 100 (subject) in accordance with a preset display format (step S13).

For example, in accordance with a preset display format, the presentation unit 13 presents to the user 100, in a format for a vocabulary size estimation test, a predetermined instruction sentence prompting the user 100 to input answers regarding their knowledge of the test words, together with the N test words w(1), ..., w(N).

There is no limitation on the presentation format. This information may be presented as visual information such as text or images, as auditory information such as speech, or as tactile information such as Braille.

For example, the presentation unit 13 may electronically display the instruction sentence and test words on the display screen of a terminal device such as a PC (personal computer), tablet, or smartphone. In other words, the presentation unit 13 may generate screen information for presentation on a display or the like and output it to the display.

Alternatively, the presentation unit 13 may be a printing device that prints the instruction sentence and test words on paper or the like. The presentation unit 13 may also be a speaker of a terminal device that outputs the instruction sentence and test words as speech, or a Braille display that presents the instruction sentence and test words in Braille.

The answer of the user 100 regarding knowledge of a test word may indicate either "know" or "do not know" (an answer that the user knows or does not know the test word of each rank), or may indicate one of three or more options that include "know" and "do not know." Examples of other options are "not sure (whether I know it)" and "I know it as a word but do not know its meaning." However, having the user 100 choose from three or more options does not always improve vocabulary size estimation accuracy compared with having them answer only "know" or "do not know." For example, if the user 100 chooses from the three options "know," "do not know," and "not sure," whether "not sure" is selected depends on the personality of the user 100. In such cases, adding options does not improve estimation accuracy. It is therefore usually preferable to have the user 100 answer with a two-way choice such as "know" or "do not know."

However, instead of asking whether the user "knows" or "does not know" a word, answers may be requested from another viewpoint, such as whether the user "can make an example sentence (using the test word)" or "cannot make an example sentence," or whether the user "can explain the meaning (of the test word)" or "cannot explain the meaning." Making the viewpoint explicit changes which vocabulary size is estimated. For example, asking whether the user "can make an example sentence" estimates the number of words the user believes they can use.

The following describes an example in which the user 100 answers either "know" or "do not know" for each test word.

Also, for example, the test words are presented in descending order of familiarity (most familiar first), but the presentation order is not limited to this, and the test words may be presented in random order.

<Answer receiving unit 14>
Input: The user's answers regarding knowledge of the test words
Output: The user's answers regarding knowledge of the test words
The user 100, having been presented with the instruction sentence and test words, inputs answers regarding their knowledge of the test words into the answer receiving unit 14 (step S14).

For example, the answer receiving unit 14 is a touch panel of a terminal device such as a PC, tablet, or smartphone, and the user 100 inputs answers on the touch panel. The answer receiving unit 14 may also be a microphone of the terminal device, in which case the user 100 inputs answers by voice into the microphone.

The user 100 may also input answers into the answer receiving unit 14 by clicking with a mouse or the like.

The answer receiving unit 14 accepts the input answers regarding knowledge of the test words (for example, an answer that the user knows a test word or an answer that the user does not know it) and outputs the answers as electronic data. The answer receiving unit 14 may output an answer for each test word, may output the answers for one test together, or may output the answers for multiple tests together.

For example, when the answer receiving unit 14 receives an answer that the user 100 knows a test word, it assigns the value 1 to the answer regarding knowledge of that test word. When it receives an answer that the user 100 does not know a test word, it assigns the value 0 to the answer regarding knowledge of that test word. These values are output to the model generation unit 15.

<Model generation unit 15>
Input: The user's answers regarding knowledge of the test words
Output: Model
The answers of the user 100 regarding knowledge of the test words, output from the answer receiving unit 14, are input to the model generation unit 15.

The model generation unit 15 uses the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit 11 to obtain a model that represents the relationship between a value based on the familiarity corresponding to a test word and a value based on the probability that the user 100 answers that they know the test word (step S15). The obtained model is output to the vocabulary size estimation unit 16.

The value based on the familiarity corresponding to a test word may be the familiarity itself, or may be the value of a monotonically non-decreasing function (e.g., a monotonically increasing function) of that familiarity. For simplicity, the following illustrates the case where the value based on the familiarity corresponding to a test word is the familiarity itself.

The value based on the probability that the user 100 answers that they know a test word may be that probability itself, or may be the value of a monotonically non-decreasing function (e.g., a monotonically increasing function) of that probability. For simplicity, the following illustrates the case where the value based on the probability that the user 100 answers that they know a test word is the probability itself.

Although there is no limitation on the model, one example is a logistic regression model (logistic model). For simplicity, the following illustrates the case where the model is a logistic curve y = f(x, Ψ), with the familiarity corresponding to a test word as the independent variable x and the probability that the user 100 answers that they know each word as the dependent variable y. Ψ denotes the model parameters.
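The source leaves the functional form of f open. One common parameterization, assumed here purely for illustration with Ψ = (a, b), is the two-parameter logistic function:

```latex
y = f(x, \Psi) = \frac{1}{1 + \exp\bigl(-(a x + b)\bigr)}, \qquad \Psi = (a, b)
```

Under this assumption, with a > 0 the probability y increases with familiarity x and crosses 0.5 at x = -b/a.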

The model generation unit 15 refers to the word familiarity DB stored in the storage unit 11, obtains the familiarity corresponding to a test word w(n) that the user 100 answered that they know, and denotes the obtained familiarity as x(n). This familiarity x(n) is the familiarity corresponding to the test word w(n).

For a test word w(n) that the user 100 answered that they know, the model generation unit 15 sets the point (x, y) = (x(n), 1), where the probability y that the user 100 answers that they know the test word w(n) is 1 (i.e., 100%) and the familiarity corresponding to the test word w(n) is x(n).

For a test word w(n) that the user 100 answered that they do not know (or did not answer that they know), the model generation unit 15 sets the point (x, y) = (x(n), 0), where the probability y that the user 100 answers that they know the test word w(n) is 0 (i.e., 0%) and the familiarity corresponding to the test word w(n) is x(n).

The model generation unit 15 fits a logistic curve to the points (x, y) = (x(n), 1) or (x(n), 0) for n = 1, ..., N, and obtains, as the model, the logistic curve y = f(x, Ψ) that minimizes the error. In other words, the model generation unit 15 obtains, as the model, the logistic curve y = f(x, Ψ) that minimizes the error with respect to the points (x, y) = (x(n), 1) or (x(n), 0) for n = 1, ..., N.
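The curve fitting of step S15 can be sketched with the standard library alone. The source does not say which error is minimized or how; this sketch assumes the two-parameter form y = 1 / (1 + exp(-(a*x + b))) and maximizes the Bernoulli log-likelihood with Newton's method. The function names and the toy answers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iterations=25):
    """Fit y = sigmoid(a*x + b) by Newton's method on the log-likelihood.

    Assumes the data are not perfectly separable; otherwise the maximum
    likelihood estimate does not exist and the parameters grow without bound.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            g_a += (y - p) * x      # gradient of the log-likelihood w.r.t. a
            g_b += (y - p)          # gradient of the log-likelihood w.r.t. b
            h_aa += w * x * x       # entries of the 2x2 information matrix
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if det < 1e-12:
            break                   # matrix is numerically singular; stop
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Hypothetical answers: 1 = "know", 0 = "do not know", by test-word familiarity.
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
ys = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
a, b = fit_logistic(xs, ys)
print(f"a={a:.3f}, b={b:.3f}, familiarity at y=0.5: {-b / a:.2f}")
```

The toy answers deliberately overlap (a "know" at 3.0, a "do not know" at 5.5) so that the fit has a finite solution; a library routine such as a regularized logistic regression would be a reasonable substitute in practice.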

FIG. 3 shows an example of a model given by the logistic curve y = f(x, Ψ). In FIG. 3, the horizontal axis represents familiarity and the vertical axis represents the probability (y) of answering that the word is known. The circles represent the points (x, y) = (x(n), 1) for test words w(n) that the user 100 answered that they know, and the points (x, y) = (x(n), 0) for test words w(n) that the user 100 answered that they do not know (or did not answer that they know). "AIC" in FIG. 3 denotes the Akaike information criterion; a smaller value indicates a better model fit. "n" in FIG. 3 denotes the number of test words.
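For reference, the AIC reported in FIG. 3 follows the standard definition AIC = 2k - 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The sketch below evaluates it for a two-parameter logistic curve; the logistic form, the function name `logistic_aic`, and the toy data are assumptions made for illustration.

```python
import math

def logistic_aic(xs, ys, a, b, n_params=2):
    """AIC = 2k - 2 ln L for the Bernoulli likelihood of y = sigmoid(a*x + b)."""
    eps = 1e-12  # guards against log(0) for extreme probabilities
    log_likelihood = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(a * x + b)))
        p = min(max(p, eps), 1.0 - eps)
        log_likelihood += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return 2 * n_params - 2 * log_likelihood

# Hypothetical data: a flat curve (a = b = 0) versus a curve that tracks the answers.
toy_xs, toy_ys = [1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1]
aic_flat = logistic_aic(toy_xs, toy_ys, a=0.0, b=0.0)
aic_steep = logistic_aic(toy_xs, toy_ys, a=2.0, b=-8.0)
print(aic_flat, aic_steep)
```

The curve that follows the answers yields the smaller AIC, which is the sense in which a smaller value indicates a better fit.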

Here, "generation" may also be read as "creation" or "construction." Accordingly, the model generation unit 15 may be a model creation unit 15 or a model construction unit 15, and the model may be created or constructed.

<Vocabulary size estimation unit 16>
Input: Model
Output: Vocabulary size of the user 100
The vocabulary size estimation unit 16 estimates the vocabulary size of the user 100 based on the model (step S16).

Estimation methods 1 to 3 are described below as examples of how the vocabulary size estimation unit 16 estimates the vocabulary size of the user 100.

(Estimation method 1)
The vocabulary size estimation unit 16 obtains a predetermined-value acquisition familiarity, which is the familiarity at which, in the model, the value based on the probability that the user 100 answers that they know a word equals or is close to a predetermined value. Examples of the predetermined value are 0.5 and 0.8. Of course, the predetermined value may be any other value greater than 0 and less than 1.

Then, the vocabulary size estimation unit 16 refers to the word familiarity DB stored in the storage unit 11, obtains the number of words whose familiarity is equal to or greater than the predetermined-value acquisition familiarity, and uses that number as the vocabulary size of the user 100.
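Estimation method 1 can be sketched as follows, assuming the model has the two-parameter logistic form y = 1 / (1 + exp(-(a*x + b))) with a > 0, so that the familiarity at a given probability p can be solved in closed form. The function names, parameter values, and toy DB are hypothetical.

```python
import math

def threshold_familiarity(a, b, p=0.5):
    """Familiarity x at which sigmoid(a*x + b) equals p (requires a != 0, 0 < p < 1)."""
    return (math.log(p / (1.0 - p)) - b) / a

def vocabulary_size_method1(db_familiarities, a, b, p=0.5):
    """Count DB words whose familiarity is at or above the threshold familiarity."""
    cutoff = threshold_familiarity(a, b, p)
    return sum(1 for f in db_familiarities if f >= cutoff)

# Hypothetical model parameters and toy DB (assumptions for illustration).
toy_db = [6.8, 6.5, 6.1, 5.4, 4.9, 4.2, 3.3, 2.7, 1.9, 1.2]
size_at_half = vocabulary_size_method1(toy_db, a=1.0, b=-4.0, p=0.5)
size_at_eighty = vocabulary_size_method1(toy_db, a=1.0, b=-4.0, p=0.8)
print(size_at_half, size_at_eighty)
```

A stricter predetermined value such as 0.8 raises the cutoff familiarity and therefore gives a smaller, more conservative estimate than 0.5.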

(Estimation method 2)
The vocabulary size estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11, and obtains the output value y(m) produced when the familiarity x(m) corresponding to a word w(m) in the word familiarity DB is input to the model. In other words, the vocabulary size estimation unit 16 calculates the value of y that the model gives for the familiarity x(m) corresponding to the word w(m), and uses the calculated value as the output value y(m). The vocabulary size estimation unit 16 performs this process for each word w(m) (m = 1, ..., M) in the word familiarity DB, thereby obtaining the output values y(m) (m = 1, ..., M).

そして、語彙数推定部１６は、$\sum_{m=1}^{M} y(m)$ を計算して、この計算値を利用者１００の語彙数とする。 Then, the vocabulary size estimation unit 16 calculates $\sum_{m=1}^{M} y(m)$ and sets this calculated value as the vocabulary size of the user 100.

その際、単語ｗ（ｍ）が、テスト単語であり、テスト単語ｗ（ｍ）の知識に関する回答が得られている場合には、語彙数推定部１６は、そのテスト単語ｗ（ｍ）の知識に関する回答を考慮して、利用者１００の語彙数を推定してもよい。 In this case, if the word w(m) is a test word and an answer regarding knowledge of the test word w(m) has been obtained, the vocabulary size estimation unit 16 may estimate the vocabulary size of the user 100 by taking into account the answer regarding knowledge of the test word w(m).

例えば、語彙数推定部１６は、テスト単語ｗ（ｍ）の知識に関する回答が知っているという回答である場合にはｙ（ｍ）＝１とし、テスト単語ｗ（ｍ）の知識に関する回答が知らないという回答である場合にはｙ（ｍ）＝０とする。テスト単語以外の単語のｙ（ｍ）は、上記のようにモデルから得られた出力値ｙ（ｍ）を用いる。 For example, the vocabulary size estimation unit 16 sets y(m) = 1 if the user answered that they know the test word w(m), and sets y(m) = 0 if the user answered that they do not know the test word w(m). For words other than test words, the output value y(m) obtained from the model as described above is used.

そして、語彙数推定部１６は、これらのｙ（ｍ）を用いて、$\sum_{m=1}^{M} y(m)$ を計算して、この計算値を利用者１００の語彙数とする。 Then, the vocabulary size estimation unit 16 uses these y(m) to calculate $\sum_{m=1}^{M} y(m)$, and sets this calculated value as the vocabulary size of the user 100.

テスト単語の知識に関する回答を考慮することで、より適切な語彙数を推定することができる。 By taking into account answers regarding knowledge of test words, a more appropriate vocabulary size can be estimated.
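A sketch of estimation method 2, including the optional override for test words that the user actually answered, might look like this. It assumes a logistic curve y = 1/(1+exp(-(a·x+b))); the names `logistic`, `vocabulary_size_method2`, `a`, `b`, and the dictionary layout are hypothetical choices made for the example.

```python
import math


def logistic(x: float, a: float, b: float) -> float:
    """Assumed model: probability of answering 'know' at familiarity x."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))


def vocabulary_size_method2(db, a: float, b: float, answers=None) -> float:
    """Sum y(m) over every word in the DB.

    db      : mapping word -> familiarity x(m)
    answers : optional mapping test word -> True (knows) / False (does not know);
              answered words contribute exactly 1 or 0 instead of the model output.
    """
    answers = answers or {}
    total = 0.0
    for word, x in db.items():
        if word in answers:
            total += 1.0 if answers[word] else 0.0
        else:
            total += logistic(x, a, b)
    return total
```

Because each word contributes a probability rather than a hard 0 or 1, the estimate changes smoothly as the fitted curve shifts.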

利用者１００が当該テスト単語を知っていると回答する確率であるｙと、当該テスト単語の親密度ｘから推定したロジスティックモデルに基づき語彙数を推定することで、語彙数を直接ｘとする場合よりモデルが収束しやすくなりロバストに語彙数を推定できる。また、各親密度に対応する語数の分布が大きく異なる場合であっても、急激な推定語彙数の変化を抑えることができる。 By estimating the vocabulary size from a logistic model fitted to y, the probability that the user 100 answers that they know a test word, and x, the familiarity of that test word, the model converges more easily than when the vocabulary size itself is used directly as x, so the vocabulary size can be estimated robustly. Furthermore, even if the number of words at each familiarity level varies widely, sudden changes in the estimated vocabulary size can be suppressed.

（推定方法３） (Estimation method 3)
語彙数推定部１６は、モデルと、記憶部１１に記憶されている単語親密度ＤＢとを参照して、単語親密度ＤＢに含まれている親密度ｘ（ｉ）をモデルに入力した場合の出力値ｙ（ｉ）を得る。言い換えれば、語彙数推定部１６は、モデルにおける、親密度ｘ（ｉ）に対応するｙの値を計算して、その計算値を出力値ｙ（ｉ）とする。また、語彙数推定部１６は、記憶部１１に記憶されている単語親密度ＤＢを参照して、単語親密度ＤＢに含まれている、親密度ｘ（ｉ）に対応する単語の数ｎ（ｉ）を得る。語彙数推定部１６は、これらの処理を単語親密度ＤＢに含まれる各親密度ｘ（ｉ）（ｉ＝１，…，Ｉ）に対して行うことで、出力値ｙ（ｉ）（ｉ＝１，…，Ｉ）、単語の数ｎ（ｉ）（ｉ＝１，…，Ｉ）を得る。Ｉは、親密度の種類の数である。 The vocabulary size estimation unit 16 refers to the model and the word familiarity DB stored in the storage unit 11 to obtain the output value y(i) produced when a familiarity x(i) included in the word familiarity DB is input to the model. In other words, the vocabulary size estimation unit 16 calculates the value of y that the model gives for the familiarity x(i) and sets the calculated value as the output value y(i). The vocabulary size estimation unit 16 also refers to the word familiarity DB stored in the storage unit 11 to obtain the number n(i) of words in the word familiarity DB that have the familiarity x(i). The vocabulary size estimation unit 16 performs these processes for each familiarity x(i) (i = 1, ..., I) included in the word familiarity DB to obtain the output values y(i) (i = 1, ..., I) and the word counts n(i) (i = 1, ..., I), where I is the number of distinct familiarity values.

そして、語彙数推定部１６は、$\sum_{i=1}^{I} y(i) \times n(i)$ を計算して、この計算値を利用者１００の語彙数とする。 Then, the vocabulary size estimation unit 16 calculates $\sum_{i=1}^{I} y(i) \times n(i)$ and sets this calculated value as the vocabulary size of the user 100.

親密度が同じであれば対応するｙの値は同じである。また、同じ親密度の単語があり得る。このため、推定方法２ではなく、推定方法３のように、親密度ごとに計算をすることで、語彙数推定の計算を早く行うことができる。 If two words have the same familiarity, the corresponding y values are the same, and several words may share the same familiarity. For this reason, by performing the calculation once per distinct familiarity value, as in estimation method 3 rather than estimation method 2, the vocabulary size can be estimated more quickly.
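The grouping idea behind estimation method 3 can be sketched as below: the model is evaluated once per distinct familiarity value and weighted by the number of words n(i) that share it. The logistic form and the names `a`, `b`, and `vocabulary_size_method3` are assumptions made for illustration.

```python
import math
from collections import Counter


def vocabulary_size_method3(familiarities, a: float, b: float) -> float:
    """Compute sum_i y(i) * n(i) over the distinct familiarity values."""
    counts = Counter(familiarities)  # n(i) for each distinct x(i)
    return sum(
        n / (1.0 + math.exp(-(a * x + b)))  # y(i) * n(i) under the assumed curve
        for x, n in counts.items()
    )
```

The result equals the word-by-word sum of estimation method 2 (without answer overrides), but the model is evaluated I times instead of M times.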

＜第一実施形態の変形例＞ <Modification of the first embodiment>
単語選択部１２は、テスト単語に対応する親密度の間隔が一定間隔になるようにではなく、複数の単語から複数のテスト単語ｗ（１），…，ｗ（Ｎ）を単に選択してもよい。 The word selection unit 12 may simply select a plurality of test words w(1), ..., w(N) from a plurality of words, rather than selecting test words such that the intervals between the familiarity values corresponding to the test words are constant.

また、モデル生成部１５は、非提示単語の知識に関する回答を仮定して、テスト単語及び非提示単語に対応する親密度に基づく値と、利用者１００がテスト単語及び非提示単語を知っていると回答する確率あるいは仮定に基づく値と、の関係を表すモデルを得てもよい。 In addition, the model generation unit 15 may hypothesize answers regarding knowledge of non-presented words and obtain a model representing the relationship between values based on the familiarity corresponding to the test words and non-presented words and the probability or hypothetical value that the user 100 will answer that he or she knows the test words and non-presented words.

ここで、非提示単語とは、複数の単語の中の複数のテスト単語以外の単語である。ロジスティックモデルが収束しやすくなるよう、テスト単語に用いなかった非提示単語の回答を仮定してモデルの作成に利用する。親密度の上限に近い単語は、多くの人が知っている単語であり、下限に近い単語は多くの人が知らない単語である。そのため、テスト単語で最も親密度の高い単語を利用者１００が知っていると回答した場合には、その親密度以上の親密度の非提示単語も知っていると仮定する。逆に、テスト単語で最も親密度の低い語を利用者が知らないと回答した場合には、その親密度以下の親密度の非提示単語も知らないと仮定する。 Here, non-presented words are the words, among the plurality of words, other than the test words. To help the logistic model converge, answers are assumed for non-presented words that were not used as test words, and these assumed answers are used in creating the model. Words close to the upper limit of familiarity are words that many people know, and words close to the lower limit are words that many people do not know. Therefore, if the user 100 answers that they know the test word with the highest familiarity, it is assumed that they also know non-presented words whose familiarity is equal to or higher than that familiarity. Conversely, if the user answers that they do not know the test word with the lowest familiarity, it is assumed that they also do not know non-presented words whose familiarity is equal to or lower than that familiarity.

言い換えれば、非提示単語を利用者１００に提示したと仮定した場合の利用者１００の非提示単語の知識に関する回答は、テスト単語の親密度の最大値よりも高い親密度の単語に対しては知っているという回答であり、テスト単語の親密度の最小値よりも低い親密度の単語に対しては知らないという回答であるとする。 In other words, if the non-presented words were presented to the user 100, the user's answers regarding knowledge of those words are assumed to be "know" for words whose familiarity is higher than the maximum familiarity among the test words, and "do not know" for words whose familiarity is lower than the minimum familiarity among the test words.

例えば、利用者が親密度６．５のテスト単語を知っていると回答した場合には、親密度６．７や６．９の非提示単語は知っているという回答であると仮定する。また、利用者が親密度２のテスト単語を知らないと回答した場合には、親密度１．８や１．６の非提示単語は知らないという回答であると仮定する。 For example, if a user answers that they know a test word with a familiarity of 6.5, it is assumed that they also know non-presented words with familiarities of 6.7 and 6.9. Likewise, if a user answers that they do not know a test word with a familiarity of 2, it is assumed that they do not know non-presented words with familiarities of 1.8 and 1.6.

このように、利用者１００に提示しなかった単語である非提示単語及び非提示単語の知識に関する回答を追加して、モデルを推定することで、モデルが収束しやすくなり、より適切なモデルを生成することができる。これにより、例えば、利用者１００がテスト単語のほとんどを知っていると回答した場合、または、利用者１００がテスト単語のほとんどを知らないと回答した場合であっても、モデルが収束しやすくなり、より適切なモデルを生成することができる。 In this way, by adding non-presented words, which are words that were not presented to the user 100, together with assumed answers regarding knowledge of them, and then estimating the model, the model converges more easily and a more appropriate model can be generated. This holds even if, for example, the user 100 answers that they know most of the test words, or that they do not know most of the test words.
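The augmentation described above can be sketched as a small preprocessing step that builds the training pairs for the model. This is an illustrative reading, not the specification's implementation: the function name and data layout are hypothetical, and the boundary handling (familiarity equal to or above the top test word, equal to or below the bottom test word) follows the "equal to or higher / lower" wording of the text.

```python
def augment_with_assumed_answers(db, answers):
    """Return (familiarity, label) pairs: real answers plus assumed ones.

    db      : mapping word -> familiarity (the word familiarity DB)
    answers : mapping test word -> True (knows) / False (does not know)
    """
    # Real answers for the presented test words.
    data = [(db[w], 1 if known else 0) for w, known in answers.items()]

    top = max(answers, key=lambda w: db[w])     # test word with highest familiarity
    bottom = min(answers, key=lambda w: db[w])  # test word with lowest familiarity

    for word, x in db.items():
        if word in answers:
            continue  # only non-presented words receive assumed answers
        if answers[top] and x >= db[top]:
            data.append((x, 1))  # assumed "know"
        elif not answers[bottom] and x <= db[bottom]:
            data.append((x, 0))  # assumed "do not know"
    return data
```

With the figures from the example in the text (a known test word at 6.5 and an unknown one at 2), non-presented words at 6.7 and 6.9 are added with label 1 and those at 1.8 and 1.6 with label 0, which gives the fit examples of both labels even when the real answers are almost all the same.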

［第二実施形態］ [Second embodiment]
第二実施形態を説明する。第二実施形態は、獲得確率取得装置及び方法である。 A second embodiment will now be described. The second embodiment is an acquisition probability acquisition device and method.

以下では、第一実施形態及び第一実施形態の変形例との相違点を中心に説明する。既に説明した事項については説明を省略することがある。 The following will focus on the differences between the first embodiment and the modified version of the first embodiment. Explanations of matters that have already been explained may be omitted.

図４に例示するように、本実施形態の獲得確率取得装置２は、記憶部１１、モデル記憶部２１、単語抽出部２２、親密度取得部２３、獲得確率取得部２４及び獲得語情報生成部２５を備えている。獲得確率取得装置２は、単語抽出部２２及び獲得語情報生成部２５を備えていなくてもよい。 As illustrated in Figure 4, the acquisition probability acquisition device 2 of this embodiment includes a storage unit 11, a model storage unit 21, a word extraction unit 22, a familiarity acquisition unit 23, an acquisition probability acquisition unit 24, and an acquired word information generation unit 25. The acquisition probability acquisition device 2 does not necessarily have to include the word extraction unit 22 and the acquired word information generation unit 25.

＜記憶部１１＞ <Storage unit 11>
記憶部１１については、第一実施形態の記憶部１１と同じである。 The storage unit 11 is the same as the storage unit 11 in the first embodiment.

記憶部１１には、複数の単語と、複数の単語にそれぞれ対応する複数の親密度とを格納した単語親密度ＤＢが記憶されている。ここで、親密度は単語に対する親密さを表す指標である。 The storage unit 11 stores a word familiarity DB that holds a plurality of words and the familiarity corresponding to each of those words. Here, familiarity is an index that represents how familiar a word is.

＜モデル記憶部２１＞ <Model storage unit 21>
モデル記憶部２１には、各単語に対応する親密度に基づく値と、ある者が各単語を獲得している確率に基づく値と、の関係を表すモデルが記憶されている。ここで、「ある者」とは、獲得確率の取得となる者である。「ある者」は、利用者１００であってもよい。 The model storage unit 21 stores a model that represents the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word. Here, the "certain person" is the person for whom the acquisition probability is acquired. The "certain person" may be the user 100.

ここで、単語を獲得とは、言い換えれば、単語を知っている、単語を使うことができる、単語をわかっている、又は、単語を説明できることである。 Here, acquiring a word means, in other words, knowing a word, being able to use a word, understanding a word, or being able to explain a word.

このモデルの例は、第一実施形態及び第一実施形態の変形例のモデル生成装置１で生成されたモデルである。 This example model is a model generated by the model generation device 1 of the first embodiment and a variant of the first embodiment.

図４に破線で示すように、獲得確率取得装置２は、モデル記憶部２１に記憶されるモデルを生成するためのモデル生成装置１を更に備えていてもよい。 As shown by the dashed line in Figure 4, the acquisition probability acquisition device 2 may further include a model generation device 1 for generating the model to be stored in the model storage unit 21.

すなわち、獲得確率取得装置２は、（１）複数の単語から複数のテスト単語を選択する単語選択部１２と、（２）テスト単語を利用者に提示する提示部１３と、（３）利用者のテスト単語の知識に関する回答を受け付ける回答受付部１４と、（４）テスト単語の知識に関する回答と、記憶部１１に記憶されている単語親密度ＤＢとを用いて、テスト単語に対応する親密度に基づく値と、利用者がテスト単語を知っていると回答する確率に基づく値と、の関係を表すモデルを得て、得られたモデルをモデル記憶部に記憶されているモデルとするモデル生成部１５と、を更に備えていてもよい。 That is, the acquisition probability acquisition device 2 may further include (1) a word selection unit 12 that selects multiple test words from multiple words, (2) a presentation unit 13 that presents the test words to the user, (3) an answer acceptance unit 14 that accepts the user's answers regarding their knowledge of the test words, and (4) a model generation unit 15 that uses the answers regarding their knowledge of the test words and a word familiarity DB stored in the memory unit 11 to obtain a model that represents the relationship between a value based on the familiarity corresponding to the test word and a value based on the probability that the user will answer that they know the test word, and sets the obtained model as the model stored in the model memory unit.

＜単語抽出部２２＞ <Word extraction unit 22>
入力：テキスト Input: Text
出力：単語 Output: Words
単語抽出部２２は、入力されたテキストに含まれる各単語を抽出する（ステップＳ２２）。 The word extraction unit 22 extracts each word contained in the input text (step S22).

抽出された各単語は、親密度取得部２３に出力される。 Each extracted word is output to the familiarity acquisition unit 23.

単語抽出部２２に入力されるテキストは、情報処理装置である単語抽出部２２が可読なテキストであればどのようなテキストであってもよい。テキストの例は、教科書や小説等の本、新聞や雑誌、Ｗｅｂページに掲載されたテキストである。 The text input to the word extraction unit 22 may be any text that is readable by the word extraction unit 22, which is an information processing device. Examples of text include texts published in books such as textbooks and novels, newspapers and magazines, and web pages.

単語抽出部２２は、例えば入力されたテキストについて形態素解析をすることにより、テキストに含まれる各単語を抽出する。 The word extraction unit 22 extracts each word contained in the text, for example, by performing morphological analysis on the input text.

＜親密度取得部２３＞ <Familiarity acquisition unit 23>
入力：単語 Input: Word
出力：単語、親密度 Output: Word, familiarity
親密度取得部２３には、単語抽出部２２が抽出した各単語が入力される。親密度取得部２３は、各単語に対応する親密度を記憶部１１に記憶されている単語親密度ＤＢから単語から取得する（ステップＳ２３）。 Each word extracted by the word extraction unit 22 is input to the familiarity acquisition unit 23. The familiarity acquisition unit 23 acquires the familiarity corresponding to each word from the word familiarity DB stored in the storage unit 11 (step S23).

獲得確率取得装置２が単語抽出部２２を備えていない場合には、テキストに含まれる各単語が入力される。この場合、親密度取得部２３は、テキストに含まれる各単語に対応する親密度を記憶部１１に記憶されている単語親密度ＤＢから単語から取得する（ステップＳ２３）。 If the acquisition probability acquisition device 2 does not include the word extraction unit 22, each word contained in the text is input directly. In this case, the familiarity acquisition unit 23 acquires the familiarity corresponding to each word contained in the text from the word familiarity DB stored in the storage unit 11 (step S23).

各単語及び各単語に対応する親密度は、獲得確率取得部２４に出力される。 Each word and the corresponding familiarity are output to the acquisition probability acquisition unit 24.

なお、親密度取得部２３は、単語抽出部２２は、固有名詞、数詞、助詞等の機能語である単語については親密度を取得しなくてもよい。言い換えれば、単語抽出部２２は、内容語である単語のみについて親密度を取得してもよい。 Note that familiarity need not be acquired for words such as proper nouns, numerals, and function words such as particles. In other words, familiarity may be acquired only for words that are content words.

数詞、助詞等の機能語は、多くの人が知っている単語である。このため、これらの機能語である単語について親密度を取得することで、言い換えればこれらの機能語である単語を処理の対象とすることで、獲得語情報生成部２５により計算される、テキスト中の推定獲得語の割合を高くすることができる。反対に、これらの機能語である単語について親密度を取得しないことで、言い換えればこれらの機能語である単語を処理の対象としないことで、獲得語情報生成部２５により計算される、テキスト中の推定獲得語の割合を低くすることができる。 Function words such as numerals and particles are words that many people know. For this reason, by obtaining the familiarity of these function words, in other words, by processing these function words, the proportion of estimated acquired words in the text calculated by the acquired word information generation unit 25 can be increased. Conversely, by not obtaining the familiarity of these function words, in other words, by not processing these function words, the proportion of estimated acquired words in the text calculated by the acquired word information generation unit 25 can be lowered.

また、親密度取得部２３は、単語親密度ＤＢに含まれていない単語については親密度を取得せずに無視してもよい。これにより、形態素解析が誤っている場合であっても、獲得確率取得の処理を適切に行うことができる。 The familiarity acquisition unit 23 may also ignore words that are not included in the word familiarity DB, without acquiring a familiarity for them. This allows the acquisition probability to be obtained appropriately even when the morphological analysis contains errors.
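The lookup behaviour of the familiarity acquisition unit described above can be sketched as a simple filter. The token format (surface form plus part-of-speech tag), the tag names, and the function name are hypothetical; a real morphological analyzer would supply its own tag set.

```python
# Hypothetical part-of-speech tags for words whose familiarity is not looked up.
EXCLUDED_POS = {"proper_noun", "numeral", "particle"}


def lookup_familiarity(tokens, db, skip_pos=EXCLUDED_POS):
    """Return (word, familiarity) pairs for the tokens that should be processed.

    tokens : iterable of (surface, pos) pairs, e.g. from morphological analysis
    db     : mapping word -> familiarity (the word familiarity DB)
    """
    result = []
    for surface, pos in tokens:
        if pos in skip_pos:
            continue  # proper nouns, numerals, function words are optional to skip
        if surface not in db:
            continue  # ignore words missing from the DB (e.g. segmentation errors)
        result.append((surface, db[surface]))
    return result
```

Passing an empty set as `skip_pos` corresponds to the alternative in which function words are also processed, which tends to raise the proportion of estimated acquired words.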

＜獲得確率取得部２４＞ <Acquisition probability acquisition unit 24>
入力：単語、親密度 Input: Word, familiarity
出力：単語、獲得確率 Output: Word, acquisition probability
獲得確率取得部２４は、各単語に対応する親密度と、モデル記憶部２１に記憶されているモデルとを少なくとも用いて、各単語をある者が獲得している確率である獲得確率を取得する（ステップＳ２４）。 The acquisition probability acquisition unit 24 acquires the acquisition probability, which is the probability that the person has acquired each word, using at least the familiarity corresponding to each word and the model stored in the model storage unit 21 (step S24).

獲得確率取得部２４は、各単語に対応する親密度をモデルに入力した場合の出力値を得て、得られた出力値をその各単語に対応する獲得確率とする。言い換えれば、獲得確率取得部２４は、モデルにおける、各単語に対応する親密度ｘに対応するｙの値を計算して、その計算値をその各単語に対応する獲得確率とする。 The acquisition probability acquisition unit 24 obtains an output value when the familiarity corresponding to each word is input into the model, and sets the obtained output value as the acquisition probability corresponding to each word. In other words, the acquisition probability acquisition unit 24 calculates the value of y corresponding to the familiarity x corresponding to each word in the model, and sets the calculated value as the acquisition probability corresponding to each word.

モデル記憶部２１に記憶されているモデルが、単語に対応する親密度を独立変数ｘとし、ある者が各単語を知っていると回答する確率を従属変数ｙとしたロジスティック曲線ｙ＝ｆ（ｘ，Ψ）であるがモデルである場合には、獲得確率取得部２４は、各単語に対応する親密度ｘに対応するｙ＝ｆ（ｘ，Ψ）の値を計算して、その計算値をその各単語に対応する獲得確率とする。 If the model stored in the model memory unit 21 is a logistic curve y = f(x, Ψ) in which the independent variable x is the familiarity corresponding to a word and the dependent variable y is the probability that a person will respond that they know each word, the acquisition probability acquisition unit 24 calculates the value of y = f(x, Ψ) corresponding to the familiarity x corresponding to each word, and sets the calculated value as the acquisition probability corresponding to each word.

獲得確率取得部２４は、品詞、単語の長さ等を考慮して獲得確率を取得してもよい。例えば、獲得確率取得部２４は、品詞、単語の長さ等も説明変数として用いて獲得確率を取得してもよい。 The acquisition probability acquisition unit 24 may acquire the acquisition probability by taking into account parts of speech, word length, etc. For example, the acquisition probability acquisition unit 24 may also acquire the acquisition probability by using parts of speech, word length, etc. as explanatory variables.
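As a rough sketch of how an acquisition probability could be computed, the function below evaluates a logistic curve on a feature vector. With a single feature (the familiarity) it reduces to a one-variable curve y = f(x, Ψ); additional explanatory variables such as word length or a part-of-speech indicator can be appended. The linear-logit form, the parameter names `weights` and `bias`, and the feature encoding are assumptions, since the text does not fix them.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def acquisition_probability(features, weights, bias: float) -> float:
    """Probability that the person has acquired a word.

    features : explanatory variables for the word, e.g. [familiarity] or
               [familiarity, word_length]
    weights  : one coefficient per feature (hypothetical fitted parameters)
    bias     : intercept of the assumed model
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)
```

For instance, `acquisition_probability([x], [a], b)` evaluates the single-variable case for a word of familiarity `x`.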

各単語及び各単語に対応する獲得確率は、獲得語情報生成部２５に出力される。 Each word and its corresponding acquisition probability are output to the acquired word information generation unit 25.

＜獲得語情報生成部２５＞ <Acquired word information generation unit 25>
入力：単語、獲得確率 Input: Word, acquisition probability
出力：獲得語情報 Output: Acquired word information
獲得語情報生成部２５は、各単語に対応する獲得確率を用いて、テキストに含まれる単語の獲得に関する情報である獲得語情報を生成する（ステップＳ２５）。 The acquired word information generation unit 25 generates acquired word information, which is information on the acquisition of the words included in the text, using the acquisition probability corresponding to each word (step S25).

獲得語情報の例は、テキスト中の推定獲得語、テキスト中の推定獲得語の数、テキスト中の推定獲得語の割合の少なくとも１つである。 Examples of acquired word information include at least one of estimated acquired words in the text, the number of estimated acquired words in the text, and the percentage of estimated acquired words in the text.

以下、テキスト中の推定獲得語、テキスト中の推定獲得語の数、テキスト中の推定獲得語の割合のそれぞれの求めた方の例について説明する。 Below, we will explain examples of how to calculate the estimated acquired words in the text, the number of estimated acquired words in the text, and the proportion of estimated acquired words in the text.

（テキスト中の推定獲得語） (Estimated acquired words in the text)
まず、獲得語情報生成部２５は、ある者の語彙数を推定する。語彙数の推定は、第一実施形態の語彙数推定部１６で説明した方法により行うことができる。語彙数を推定するために、図４に一点鎖線で示すように、記憶部１１から単語親密度ＤＢ及びモデル記憶部２１からモデルが獲得語情報生成部２５に入力されてもよい。 First, the acquired word information generation unit 25 estimates the vocabulary size of the person. The vocabulary size can be estimated by the methods described for the vocabulary size estimation unit 16 of the first embodiment. To estimate the vocabulary size, the word familiarity DB from the storage unit 11 and the model from the model storage unit 21 may be input to the acquired word information generation unit 25, as shown by the dash-dotted line in Figure 4.

次に、獲得語情報生成部２５は、入力された各単語ｗ（ｋ）に対応する親密度以上の親密度の単語の数ＧＯＩＳＵ（ｋ）を得る。ＧＯＩＳＵ（ｋ）を得るために、図４に一点鎖線で示すように、記憶部１１から単語親密度ＤＢが獲得語情報生成部２５に入力されてもよい。 Next, for each input word w(k), the acquired word information generation unit 25 obtains GOISU(k), the number of words whose familiarity is equal to or higher than the familiarity corresponding to w(k). To obtain GOISU(k), the word familiarity DB may be input from the storage unit 11 to the acquired word information generation unit 25, as shown by the dash-dotted line in Figure 4.

そして、獲得語情報生成部２５は、ＧＯＩＳＵ（ｋ）がある者の語彙数以下である単語を、テキスト中の推定獲得語とする。一般に、親密度が高い語ほどＧＯＩＳＵ（ｋ）は小さくなる。このため、ある者は、そのある者の語彙数以下のＧＯＩＳＵ（ｋ）の単語を知っていると仮定できる。 The acquired word information generation unit 25 then determines words in the text whose GOISU(k) is equal to or less than the person's vocabulary size as estimated acquired words. Generally, the higher the familiarity of a word, the smaller the GOISU(k). Therefore, it can be assumed that a person knows words with a GOISU(k) equal to or less than the person's vocabulary size.

図６に、ＧＯＩＳＵ（ｋ）の例を示す。 Figure 6 shows an example of GOISU(k).
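The GOISU(k) rule can be sketched as follows: GOISU is computed for every word in the DB from the sorted familiarity values, and a word in the text counts as an estimated acquired word when its GOISU does not exceed the person's estimated vocabulary size. Function names and the data layout are hypothetical.

```python
from bisect import bisect_left


def goisu_table(db):
    """Map each DB word to the number of DB words with familiarity >= its own."""
    ordered = sorted(db.values())
    return {w: len(ordered) - bisect_left(ordered, x) for w, x in db.items()}


def estimated_acquired_words(text_words, db, vocabulary_size):
    """Words of the text whose GOISU(k) is <= the estimated vocabulary size."""
    table = goisu_table(db)
    return [w for w in text_words if w in table and table[w] <= vocabulary_size]
```

The length of the returned list gives the number of estimated acquired words in the text under the same assumptions.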

（テキスト中の推定獲得語の数） (Number of estimated acquired words in the text)
まず、獲得語情報生成部２５は、ある者の語彙数を推定する。語彙数の推定は、第一実施形態の語彙数推定部１６で説明した方法により行うことができる。語彙数を推定するために、図４に一点鎖線で示すように、記憶部１１から単語親密度ＤＢ及びモデル記憶部２１からモデルが獲得語情報生成部２５に入力されてもよい。 First, the acquired word information generation unit 25 estimates the vocabulary size of the person. The vocabulary size can be estimated by the methods described for the vocabulary size estimation unit 16 of the first embodiment. To estimate the vocabulary size, the word familiarity DB from the storage unit 11 and the model from the model storage unit 21 may be input to the acquired word information generation unit 25, as shown by the dash-dotted line in Figure 4.

そして、獲得語情報生成部２５は、ＧＯＩＳＵ（ｋ）がある者の語彙数以下である単語の数を、テキスト中の推定獲得語の数とする。 Then, the acquired word information generation unit 25 determines the number of words whose GOISU(k) is less than or equal to the vocabulary size of a person as the number of estimated acquired words in the text.

（テキスト中の推定獲得語の割合） (Proportion of estimated acquired words in the text)
獲得語情報生成部２５は、例えば、以下の式（１）又は式（２）により定まる値を計算して、その計算値をテキスト中の推定獲得語の割合とする。 The acquired word information generation unit 25 calculates, for example, the value determined by the following formula (1) or formula (2), and sets the calculated value as the proportion of estimated acquired words in the text.

$$\frac{\sum_{k=1}^{K} y(k)\,\mathrm{FREQ}(k)}{\sum_{k=1}^{K} \mathrm{FREQ}(k)} \qquad (1)$$
$$\frac{\sum_{k=1}^{K} y(k)\,\mathrm{DIFF}(k)}{\sum_{k=1}^{K} \mathrm{DIFF}(k)} \qquad (2)$$
ここで、ＦＲＥＱ（ｋ）は、単語ｗ（ｋ）がテキストに現れた回数である。テキストが複数のパートに分かれているとして、ＤＩＦＦ（ｋ）は、単語ｗ（ｋ）が現れたパートの数である。パートの例は、単元、章、節等のテキストを構成する所定の単位である。テキスト全体を単位としてもよい。Ｋは、テキストに含まれ、獲得確率取得部２４によって獲得確率が取得された単語の総数である。 Here, FREQ(k) is the number of times the word w(k) appears in the text. Assuming the text is divided into multiple parts, DIFF(k) is the number of parts in which the word w(k) appears. Examples of parts are predetermined units that make up the text, such as a unit, chapter, or section. The entire text may also be treated as a single unit. K is the total number of words that are included in the text and for which the acquisition probability has been acquired by the acquisition probability acquisition unit 24.

獲得語情報生成部２５は、ＦＲＥＱ（ｋ）及びＤＩＦＦ（ｋ）を、入力された単語に基づいてカウントする。獲得語情報生成部２５は、カウントすることで求まったＦＲＥＱ（ｋ）及びＤＩＦＦ（ｋ）を用いて、式（１）又は式（２）により定まる値を計算する。 The acquired word information generation unit 25 counts FREQ(k) and DIFF(k) based on the input words. The acquired word information generation unit 25 then uses the counted FREQ(k) and DIFF(k) to calculate the value determined by formula (1) or formula (2).

図６に、ＦＲＥＱ（ｋ）及びＤＩＦＦ（ｋ）の例を示す。 Figure 6 shows an example of FREQ(k) and DIFF(k).

一般的に、多くの人が知っている単語ほど出現頻度は高くなり、多くの人が知らない単語ほど出現頻度は低くなる。したがって、珍しい語のテキスト中での出現回数は、よく知られている語のテキスト内での出現回数より少なくなる。 Generally, the more familiar a word is, the more frequently it appears, and the less familiar a word is, the less frequently it appears. Therefore, rare words will appear less frequently in a text than familiar words will appear in a text.

このため、ＦＲＥＱ（ｋ）を用いた式（１）によって求まるテキスト中の推定獲得語の割合は、ＤＩＦＦ（ｋ）を用いた式（２）によって求まるテキスト中の推定獲得語の割合よりも高くなると考えられる。式（１）及び（２）のどちらを用いるのかは、獲得語情報としてどのような情報が必要か等に応じて適宜定められる。 For this reason, the proportion of estimated acquired words in a text calculated using formula (1) using FREQ(k) is expected to be higher than the proportion of estimated acquired words in a text calculated using formula (2) using DIFF(k). Whether formula (1) or (2) is used is determined appropriately depending on factors such as the type of information needed as acquired word information.
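Formulas (1) and (2) differ only in the weight given to each word, so both can be sketched with one function. The text is assumed to be supplied as a list of parts, each a list of word tokens; the function name, the `use_diff` flag, and the return value of 0.0 for an empty input are choices made for this example.

```python
from collections import Counter


def acquired_ratio(parts, prob, use_diff: bool = False) -> float:
    """Weighted mean of acquisition probabilities over the words of a text.

    parts    : list of parts (e.g. chapters), each a list of word tokens
    prob     : mapping word -> acquisition probability y(k)
    use_diff : False -> weight by FREQ(k), formula (1);
               True  -> weight by DIFF(k), formula (2)
    """
    weight = Counter()
    for part in parts:
        # For DIFF(k), each word counts at most once per part.
        tokens = set(part) if use_diff else part
        weight.update(w for w in tokens if w in prob)
    denom = sum(weight.values())
    if denom == 0:
        return 0.0
    return sum(prob[w] * n for w, n in weight.items()) / denom
```

When well-known words are repeated many times within a part, the FREQ-weighted value is pulled upward relative to the DIFF-weighted one, which matches the tendency described in the text.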

なお、獲得語情報生成部２５は、テキスト中の推定獲得語の数／Ｋをテキスト中の推定獲得語の割合としてもよい。テキスト中の推定獲得語の数は、（テキスト中の推定獲得語の数）で説明した方法に求めることができる。 Note that the acquired word information generation unit 25 may instead use (number of estimated acquired words in the text) / K as the proportion of estimated acquired words in the text. The number of estimated acquired words in the text can be obtained by the method described under (Number of estimated acquired words in the text).

［第三実施形態］ [Third embodiment]
第三実施形態を説明する。第三実施形態は、学習推奨語抽出装置及び方法である。 A third embodiment will now be described. The third embodiment is a recommended learning word extraction device and method.

図７に例示するように、本実施形態の学習推奨語抽出装置３は、記憶部１１、モデル記憶部３１、獲得確率取得部３２及び学習推奨語抽出部３３を備えている。 As illustrated in Figure 7, the learning recommendation word extraction device 3 of this embodiment includes a memory unit 11, a model memory unit 31, an acquisition probability acquisition unit 32, and a learning recommendation word extraction unit 33.

＜モデル記憶部３１＞ <Model storage unit 31>
モデル記憶部３１には、各単語に対応する親密度に基づく値と、ある者が各単語を獲得している確率に基づく値と、の関係を表すモデルが記憶されている。ここで、「ある者」とは、学習推奨語の抽出の対象となる者である。「ある者」は、利用者１００であってもよい。 The model storage unit 31 stores a model that represents the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word. Here, the "certain person" is the person for whom recommended learning words are extracted. The "certain person" may be the user 100.

図７に破線で示すように、学習推奨語抽出装置３は、モデル記憶部３１に記憶されるモデルを生成するためのモデル生成装置１を更に備えていてもよい。 As shown by the dashed line in Figure 7, the recommended learning word extraction device 3 may further include a model generation device 1 for generating the model to be stored in the model storage unit 31.

すなわち、学習推奨語抽出装置３は、（１）複数の単語から複数のテスト単語を選択する単語選択部１２と、（２）テスト単語を利用者に提示する提示部１３と、（３）利用者のテスト単語の知識に関する回答を受け付ける回答受付部１４と、（４）テスト単語の知識に関する回答と、記憶部１１に記憶されている単語親密度ＤＢとを用いて、テスト単語に対応する親密度に基づく値と、利用者がテスト単語を知っていると回答する確率に基づく値と、の関係を表すモデルを得て、得られたモデルをモデル記憶部に記憶されているモデルとするモデル生成部１５と、を更に備えていてもよい。 That is, the learning recommendation word extraction device 3 may further include (1) a word selection unit 12 that selects multiple test words from multiple words, (2) a presentation unit 13 that presents the test words to the user, (3) an answer acceptance unit 14 that accepts the user's answers regarding their knowledge of the test words, and (4) a model generation unit 15 that uses the answers regarding their knowledge of the test words and a word familiarity DB stored in the memory unit 11 to obtain a model that represents the relationship between a value based on the familiarity corresponding to the test word and a value based on the probability that the user will answer that they know the test word, and sets the obtained model as the model stored in the model memory unit.

＜獲得確率取得部３２＞ <Acquisition probability acquisition unit 32>
入力：単語 Input: Word
出力：単語、獲得確率 Output: Word, acquisition probability
獲得確率取得部３２には、学習推奨語の候補となる複数の単語からなる単語集合が入力される。 A word set consisting of a plurality of words that are candidates for recommended learning words is input to the acquisition probability acquisition unit 32.

獲得確率取得部３２は、記憶部１１に記憶されている単語親密度ＤＢと、モデル記憶部３１に記憶されているモデルとを少なくとも用いて、入力された単語集合に含まれる各単語をある者が獲得している確率である獲得確率を取得する（ステップＳ３２）。 The acquisition probability acquisition unit 32 uses at least the word familiarity DB stored in the memory unit 11 and the model stored in the model memory unit 31 to acquire the acquisition probability, which is the probability that a person has acquired each word included in the input word set (step S32).

獲得確率取得部３２は、各単語に対応する親密度をモデルに入力した場合の出力値を得て、得られた出力値をその各単語に対応する獲得確率とする。言い換えれば、獲得確率取得部３２は、モデルにおける、各単語に対応する親密度ｘに対応するｙの値を計算して、その計算値をその各単語に対応する獲得確率とする。 The acquisition probability acquisition unit 32 obtains an output value when the familiarity corresponding to each word is input into the model, and sets the obtained output value as the acquisition probability corresponding to each word. In other words, the acquisition probability acquisition unit 32 calculates the value of y corresponding to the familiarity x corresponding to each word in the model, and sets the calculated value as the acquisition probability corresponding to each word.

モデル記憶部３１に記憶されているモデルが、単語に対応する親密度を独立変数ｘとし、ある者が各単語を知っていると回答する確率を従属変数ｙとしたロジスティック曲線ｙ＝ｆ（ｘ，Ψ）であるがモデルである場合には、獲得確率取得部３２は、各単語に対応する親密度ｘに対応するｙ＝ｆ（ｘ，Ψ）の値を計算して、その計算値をその各単語に対応する獲得確率とする。 If the model stored in the model memory unit 31 is a logistic curve y = f(x, Ψ) in which the independent variable x is the familiarity corresponding to a word and the dependent variable y is the probability that a person will respond that they know each word, the acquisition probability acquisition unit 32 calculates the value of y = f(x, Ψ) corresponding to the familiarity x corresponding to each word, and sets the calculated value as the acquisition probability corresponding to each word.

獲得確率取得部３２は、品詞、単語の長さ等を考慮して獲得確率を取得してもよい。例えば、獲得確率取得部３２は、品詞、単語の長さ等も説明変数として用いて獲得確率を取得してもよい。 The acquisition probability acquisition unit 32 may acquire the acquisition probability by taking into account parts of speech, word length, etc. For example, the acquisition probability acquisition unit 32 may acquire the acquisition probability by using parts of speech, word length, etc. as explanatory variables.

＜学習推奨語抽出部３３＞ <Recommended learning word extraction unit 33>
入力：単語、獲得確率 Input: Word, acquisition probability
出力：学習推奨語 Output: Recommended learning words
学習推奨語抽出部３３は、取得された獲得確率に基づいて前記単語集合から学習推奨語を抽出する（ステップＳ３３）。 The recommended learning word extraction unit 33 extracts recommended learning words from the word set based on the acquired acquisition probabilities (step S33).

例えば、学習推奨語抽出部３３は、取得された獲得確率が所定の確率に近い単語を、学習推奨語として抽出してもよい。 For example, the study recommendation word extraction unit 33 may extract words whose acquisition probability is close to a predetermined probability as study recommendation words.

所定の確率は、０より大きく１より小さい数である。所定の確率の例は、０．５である。 The predetermined probability is a number greater than 0 and less than 1. An example of a predetermined probability is 0.5.

学習推奨語抽出部３３は、所定の確率に近い所定の個数の単語を、学習推奨語として抽出してもよい。 The recommended learning word extraction unit 33 may extract a predetermined number of words that are close to a predetermined probability as recommended learning words.

所定の確率が０．５であり、所定の個数が７である場合には、例えば図９に示す７個の単語が学習推奨語として抽出される。図９において、ENTRYは単語の表記であり、PSYは親密度であり、Probは獲得確率であり、YNはこれらの単語について利用者１００から知っているという回答又は知らないという回答が得られている場合にはその回答についての情報であり、Distance50はこの場合の所定の確率である０．５とProbとの差の大きさである。 If the predetermined probability is 0.5 and the predetermined number is 7, the seven words shown in Figure 9, for example, are extracted as recommended learning words. In Figure 9, ENTRY is the written form of the word, PSY is the familiarity, Prob is the acquisition probability, YN holds the answer when the user 100 has answered that they know or do not know the word, and Distance50 is the magnitude of the difference between Prob and the predetermined probability, which is 0.5 in this case.

この例では、利用者１００から知っているという回答又は知らないという回答が得られていないため、YNに”-”を表示している。利用者１００から単語を知っているという回答が得られている場合にはYNに”1”が表示され、利用者１００から単語を知らないという回答が得られている場合にはYNに”0”が表示される。 In this example, since no answer was received from user 100 indicating that they knew or did not know the word, a "-" is displayed in YN. If user 100 responded that they knew the word, a "1" is displayed in YN, and if user 100 responded that they did not know the word, a "0" is displayed in YN.

学習推奨語は、学習推奨語の抽出の対象となる者に提示される。学習推奨語は、図９に示す表の形式で、学習推奨語の抽出の対象となる者に提示されてもよい。 The recommended learning words are presented to the person for whom they were extracted. They may be presented to that person in the form of a table such as the one shown in Figure 9.

学習推奨語抽出部３３は、所定の確率を含む所定の範囲内に含まれる単語を、学習推奨語として抽出してもよい。 The study recommendation word extraction unit 33 may extract words that fall within a specified range that includes a specified probability as study recommendation words.
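The selection rules described above (the words nearest to a predetermined probability, optionally limited to a fixed count or to a range around that probability) can be sketched as follows. The function name, the keyword arguments `count` and `within`, and the returned tuple layout are hypothetical; the distance column mirrors Distance50 when the target is 0.5.

```python
def recommend_words(probs, target: float = 0.5, count=None, within=None):
    """Rank words by closeness of their acquisition probability to `target`.

    probs  : mapping word -> acquisition probability
    count  : if given, keep only this many closest words
    within : if given, a (low, high) range that contains `target`;
             only words whose probability lies in the range are kept
    Returns a list of (word, probability, distance) tuples, closest first.
    """
    ranked = sorted(probs.items(), key=lambda kv: abs(kv[1] - target))
    if within is not None:
        low, high = within
        ranked = [(w, p) for w, p in ranked if low <= p <= high]
    if count is not None:
        ranked = ranked[:count]
    return [(w, p, abs(p - target)) for w, p in ranked]
```

Calling `recommend_words(probs, target=0.5, count=7)` would correspond to the seven-word case described for Figure 9, under these assumptions.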

The recommended learning word extraction unit 33 may extract, as recommended learning words, words of a predetermined part of speech whose acquisition probability is close to the predetermined probability. Examples of the predetermined part of speech are verbs, nouns, and adjectives. The predetermined part of speech may be two or more parts of speech. In that case, the recommended learning word extraction unit 33 may extract, from the words of each of the two or more parts of speech, the words whose acquisition probability is close to the predetermined probability.

The part-of-speech information may be stored in the word familiarity DB. In that case, the recommended learning word extraction unit 33 can refer to the word familiarity DB to obtain the part of speech of each word and perform the above processing.

Alternatively, the recommended learning word extraction unit 33 may obtain the part of speech of each word by referring to a dictionary, stored in a storage unit not shown in the figures, that records words together with their parts of speech, and then perform the above processing.
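The part-of-speech variant can be sketched as follows. This is a minimal Python illustration, assuming the part-of-speech information is available as a plain mapping from word to part of speech (whether it comes from the word familiarity DB or from a separate dictionary). The function name `extract_by_pos`, the part-of-speech labels, and the choice to skip words missing from the mapping are assumptions made for this example.

```python
from collections import defaultdict


def extract_by_pos(word_probs, pos_of, wanted_pos, target=0.5, count=2):
    """For each wanted part of speech, pick up to `count` words nearest to `target`."""
    groups = defaultdict(list)
    for word, prob in word_probs:
        pos = pos_of.get(word)  # assumption: words without a known part of speech are skipped
        if pos in wanted_pos:
            groups[pos].append((word, prob))
    return {
        pos: [w for w, _ in sorted(items, key=lambda wp: abs(wp[1] - target))[:count]]
        for pos, items in groups.items()
    }


# Hypothetical dictionary and acquisition probabilities.
pos_of = {"run": "verb", "eat": "verb", "cat": "noun", "big": "adjective"}
word_probs = [("run", 0.9), ("eat", 0.55), ("cat", 0.4), ("big", 0.5), ("zzz", 0.5)]
print(extract_by_pos(word_probs, pos_of, {"verb", "noun"}, count=1))
```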

<Modification of the third embodiment>
The word set input to the acquisition probability acquisition unit 32, consisting of a plurality of words that are candidates for recommended learning words, may be the words contained in a predetermined text. For this purpose, the recommended learning word extraction device 3 may include the word extraction unit 34 described below.

<Word extraction unit 34>
Input: text
Output: words
The word extraction unit 34 extracts each word contained in the input text (step S34).

The extracted words are output to the acquisition probability acquisition unit 32 as the word set of candidates for recommended learning words.

The text input to the word extraction unit 34 may be any text that is readable by the word extraction unit 22, which is an information processing device. Examples of the text include the text of books such as textbooks and novels, of newspapers and magazines, and of web pages.

The word extraction unit 34 extracts each word contained in the input text by, for example, performing morphological analysis on the text.
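The word extraction step can be sketched in Python as follows. This is only an illustration: `simple_analyzer` is a hypothetical stand-in that splits lowercase alphabetic tokens and is not a morphological analyzer. For Japanese text a real morphological analyzer would have to be passed in as `analyzer`. Removing duplicate words is also an assumption of this sketch.

```python
import re


def simple_analyzer(text):
    """Stand-in for a morphological analyzer: returns lowercase alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def extract_words(text, analyzer=simple_analyzer):
    """Return the distinct words in `text`, in order of first appearance."""
    seen = set()
    words = []
    for token in analyzer(text):
        if token not in seen:
            seen.add(token)
            words.append(token)
    return words


print(extract_words("The cat saw the other cat."))
```

Keeping the analyzer as a parameter lets the same word set construction be reused with whatever analysis tool suits the language of the input text.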

[Variations]
The present disclosure is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the present disclosure.

The various processes described in the embodiments need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capacity of the device that executes them or as needed.

For example, data may be exchanged between the components of the model generation device 1, the acquisition probability acquisition device 2, and the recommended learning word extraction device 3 either directly or via a storage unit not shown in the figures.

[Program, recording medium]
The processing of each unit of each device described above may be implemented by a computer, in which case the processing content of the functions that each device should have is described by a program. By loading this program into the storage unit 1020 of the computer 1000 shown in Figure 10 and causing the arithmetic processing unit 1010, input unit 1030, output unit 1040, display unit 1060, and so on to operate, the various processing functions of each device are implemented on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium is, for example, a non-transitory recording medium, specifically a magnetic recording device, an optical disc, or the like.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium, or the program transferred from the server computer, in the auxiliary recording unit 1050, which is its own non-transitory storage device. When executing processing, the computer loads the program stored in the auxiliary recording unit 1050 into the storage unit 1020 and executes processing in accordance with the loaded program. As another execution form, the computer may load the program directly from the portable recording medium into the storage unit 1020 and execute processing in accordance with it, or it may execute processing in accordance with the received program sequentially, each time a program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, which implements the processing functions solely through execution instructions and result retrieval, without transferring the program from the server computer to the computer. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (for example, data that is not a direct instruction to the computer but has the property of defining the computer's processing).

In this embodiment, the device is configured by executing a predetermined program on a computer, but at least part of the processing content may instead be implemented in hardware.

For example, the word selection unit 12, presentation unit 13, answer acceptance unit 14, model generation unit 15, vocabulary size estimation unit 16, word extraction unit 22, familiarity acquisition unit 23, acquisition probability acquisition unit 24, acquired word information generation unit 25, acquisition probability acquisition unit 32, recommended learning word extraction unit 33, and word extraction unit 34 may be configured by processing circuitry.

The storage unit 11, the model storage unit 21, and the model storage unit 31 may be configured by memory.

The following supplementary notes are further disclosed with respect to the above embodiments.

(Supplementary Note 1)
A word selection device including:
a memory; and
at least one processor coupled to the memory,
wherein the memory stores a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word, and
the processor selects a plurality of test words from the plurality of words, using the word familiarity DB stored in the memory, such that the familiarities corresponding to the test words are spaced at constant intervals.
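One way to read "familiarities spaced at constant intervals" is sketched below in Python. This is a hypothetical illustration, not the procedure fixed by the note. It assumes the word familiarity DB is a mapping from word to familiarity, lays evenly spaced target values between the highest and lowest familiarity, and picks the not-yet-selected word nearest to each target. The function name and sample values are invented for the example.

```python
def select_test_words(familiarity_db, num_words):
    """Pick `num_words` words whose familiarities are spaced as evenly as the DB allows."""
    if num_words < 2 or num_words > len(familiarity_db):
        raise ValueError("num_words must be between 2 and the DB size")
    low = min(familiarity_db.values())
    high = max(familiarity_db.values())
    step = (high - low) / (num_words - 1)
    remaining = dict(familiarity_db)
    selected = []
    for i in range(num_words):
        target = high - i * step
        # Nearest remaining word to the target familiarity.
        word = min(remaining, key=lambda w: abs(remaining[w] - target))
        selected.append(word)
        del remaining[word]
    return selected


# Hypothetical familiarity values.
db = {"a": 7.0, "b": 6.1, "c": 5.0, "d": 4.2, "e": 3.0, "f": 1.0}
print(select_test_words(db, 4))
```

Because real familiarity values are not perfectly evenly distributed, the spacing achieved this way is only approximately constant.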

(Supplementary Note 2)
A non-transitory storage medium storing a program executable by a computer to perform word selection processing,
wherein the word selection processing selects a plurality of test words from a plurality of words, using a word familiarity DB storing the plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, such that the familiarities corresponding to the test words are spaced at constant intervals, familiarity being an index representing familiarity with a word.

(Supplementary Note 3)
A model generation device including:
a memory; and
at least one processor coupled to the memory,
wherein the memory stores a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word, and
the processor receives, as input, a plurality of test words and answers regarding knowledge of the test words given by a user to whom the plurality of test words were presented, and obtains, using the answers regarding knowledge of the test words and the word familiarity DB stored in the memory, a model representing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that they know the test word.
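As an illustration of obtaining such a model, the following Python sketch fits a logistic curve to the user's answers. The logistic form, the gradient-descent fitting, the learning rate, and the sample data are all assumptions made for this example. The note itself only requires a model relating a familiarity-based value to a value based on the probability of a "know" answer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(familiarities, answers, lr=0.1, epochs=5000):
    """Fit P(answer is "know") = sigmoid(a * familiarity + b) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(familiarities)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(familiarities, answers):
            p = sigmoid(a * x + b)
            grad_a += (p - y) * x  # gradient of the log loss with respect to a
            grad_b += p - y        # gradient of the log loss with respect to b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical test: familiarity of each test word and the answer (1 = know, 0 = do not know).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0, 0, 0, 1, 0, 1, 1]
a, b = fit_logistic(xs, ys)
print(sigmoid(a * 7.0 + b) > sigmoid(a * 1.0 + b))
```

The fitted pair (a, b) plays the role of the model: it maps any familiarity value to an estimated probability that the user knows the word.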

(Supplementary Note 4)
A non-transitory storage medium storing a program executable by a computer to perform model generation processing,
wherein the model generation processing receives, as input, a plurality of test words and answers regarding knowledge of the test words given by a user to whom the plurality of test words were presented, and obtains, using the answers regarding knowledge of the test words and a word familiarity DB, a model representing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that they know the test word, and
the familiarity is an index representing familiarity with a word, and the word familiarity DB stores a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words.

(Supplementary Note 5)
An acquisition probability acquisition device including:
a memory; and
at least one processor coupled to the memory,
wherein the memory stores:
a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and
a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word, and
the processor:
acquires the familiarity corresponding to each word included in input text from the word familiarity DB stored in the memory; and
acquires an acquisition probability, which is the probability that the certain person has acquired each word, using at least the acquired familiarity corresponding to each word and the model stored in the memory.
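The two processor steps in this note (look up each word's familiarity, then apply the model) can be sketched in Python as follows. The sketch assumes the model is a logistic curve with parameters `a` and `b` and that the word familiarity DB is a plain mapping. Both are assumptions of this example, as is the choice to skip words that are absent from the DB. All names and values are hypothetical.

```python
import math


def acquisition_probability(words, familiarity_db, a, b):
    """Map each word found in the DB to sigmoid(a * familiarity + b)."""
    probs = {}
    for word in words:
        if word in familiarity_db:  # assumption: words absent from the DB are skipped
            x = familiarity_db[word]
            probs[word] = 1.0 / (1.0 + math.exp(-(a * x + b)))
    return probs


# Hypothetical DB entries and model parameters.
familiarity_db = {"cat": 6.0, "zeugma": 2.0}
probs = acquisition_probability(["cat", "zeugma", "unlisted"], familiarity_db, a=1.0, b=-4.0)
print(probs)
```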

(Supplementary Note 6)
A non-transitory storage medium storing a program executable by a computer to perform acquisition probability acquisition processing,
wherein the acquisition probability acquisition processing:
acquires the familiarity corresponding to each word included in input text from a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and
acquires an acquisition probability, which is the probability that a certain person has acquired each word, using at least the acquired familiarity corresponding to each word and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that the certain person has acquired each word.

(Supplementary Note 7)
A recommended learning word extraction device including:
a memory; and
at least one processor coupled to the memory,
wherein the memory stores:
a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and
a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word, and
the processor:
acquires an acquisition probability, which is the probability that the certain person has acquired each word included in an input word set, using at least the word familiarity DB stored in the memory and the model stored in the memory; and
extracts recommended learning words from the word set based on the acquired acquisition probability.

(Supplementary Note 8)
A non-transitory storage medium storing a program executable by a computer to perform recommended learning word extraction processing,
wherein the recommended learning word extraction processing:
acquires an acquisition probability, which is the probability that a certain person has acquired each word included in an input word set, using at least a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word, and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that the certain person has acquired each word; and
extracts recommended learning words from the word set based on the acquired acquisition probability.

All publications, patent applications, and technical standards mentioned in this specification are incorporated herein by reference to the same extent as if each individual publication, patent application, and technical standard were specifically and individually indicated to be incorporated by reference.

Claims

1. An acquisition probability acquisition device including:
a storage unit that stores a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word;
a familiarity acquisition unit that acquires the familiarity corresponding to each word included in input text from the word familiarity DB stored in the storage unit;
a model storage unit that stores a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that a certain person has acquired each word; and
an acquisition probability acquisition unit that acquires an acquisition probability, which is the probability that the certain person has acquired each word, using at least the acquired familiarity corresponding to each word and the model stored in the model storage unit.

2. The acquisition probability acquisition device according to claim 1, further including:
(1) a word selection unit that selects a plurality of test words from a plurality of words;
(2) a presentation unit that presents the test words to a user;
(3) an answer acceptance unit that accepts the user's answers regarding knowledge of the test words; and
(4) a model generation unit that obtains, using the answers regarding knowledge of the test words and the word familiarity DB stored in the storage unit, a model representing the relationship between a value based on the familiarity corresponding to each test word and a value based on the probability that the user answers that they know the test word, and sets the obtained model as the model stored in the model storage unit.

3. The acquisition probability acquisition device according to claim 1 or 2, further including
an acquired word information generation unit that generates acquired word information, which is information regarding acquisition of the words included in the text, using the acquired acquisition probability corresponding to each word.

4. An acquisition probability acquisition method including:
a familiarity acquisition step in which a familiarity acquisition unit acquires the familiarity corresponding to each word included in input text from a word familiarity DB storing a plurality of words and a plurality of familiarities respectively corresponding to the plurality of words, familiarity being an index representing familiarity with a word; and
an acquisition probability acquisition step in which an acquisition probability acquisition unit acquires an acquisition probability, which is the probability that a certain person has acquired each word, using at least the acquired familiarity corresponding to each word and a model representing the relationship between a value based on the familiarity corresponding to each word and a value based on the probability that the certain person has acquired each word.

5. A program for causing a computer to function as each unit of the acquisition probability acquisition device according to claim 1.