JP7280705B2

JP7280705B2 - Machine learning device, program and machine learning method

Info

Publication number: JP7280705B2
Application number: JP2019021083A
Authority: JP
Inventors: 浩史近藤
Original assignee: Japan Research Institute Ltd
Current assignee: Japan Research Institute Ltd
Priority date: 2019-02-07
Filing date: 2019-02-07
Publication date: 2023-05-24
Anticipated expiration: 2039-02-07
Also published as: JP2020129232A

Description

The present invention relates to a machine learning device, a program, and a machine learning method.

A classification system that uses a learning model to assign in-house classifications to publications is known (see, for example, Patent Document 1).
[Prior Art Documents]
[Patent Documents]
[Patent Document 1] JP 2018-026119 A

In the classification system of Patent Document 1, the types of sample data input to the classification system are specified in advance, and an in-house classification is assigned to every piece of input sample data. Consequently, when the sample data mixes data that should be classified with data that should not be classified, problems arise such as reduced classification accuracy and reduced utilization efficiency of system resources.

In a first aspect of the present invention, a machine learning device is provided. The machine learning device includes, for example, a first model construction unit that constructs a first learning model for determining whether or not an input sentence is a sentence indicating an evaluation of an evaluation target, the state of the evaluation target, or the reason for the evaluation. The first model construction unit uses, as teacher data, one or more explanatory sentences included in evaluation information in which (i) an evaluation regarding the evaluation target and (ii) one or more explanatory sentences indicating the state of the evaluation target or the reason for the evaluation are associated with each other. The machine learning device includes, for example, an extraction unit that uses the first learning model constructed by the first model construction unit to extract sentences related to the evaluation target from among one or more sentences included in text data.

In the machine learning device described above, the first learning model may include a sentence classifier that classifies an input sentence as either a sentence indicating the state of the evaluation target or the reason for the evaluation, or a sentence that does not indicate the state of the evaluation target or the reason for the evaluation. In the machine learning device described above, the first learning model may include a plurality of sentence classifiers. In the machine learning device described above, each of the plurality of sentence classifiers may output a score indicating the likelihood that the input sentence is a sentence indicating the state of the evaluation target or the reason for the evaluation. In the machine learning device described above, the extraction unit may extract the input sentence as a sentence related to the evaluation target when the sum of the scores output by the plurality of sentence classifiers is greater than a predetermined threshold.
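The threshold rule above, in which the scores of several sentence classifiers are summed and compared with a predetermined threshold, can be sketched as follows. This is only an illustrative sketch: the function names, the toy scorers, and the threshold values are assumptions made for the example and do not appear in the specification, and trained classifiers would take the place of the toy scorers.

```python
from typing import Callable, Iterable, List, Sequence

# A scorer stands in for one trained sentence classifier (hypothetical):
# it maps a sentence to a score for "indicates state or reason".
Scorer = Callable[[str], float]


def extract_by_total_score(
    sentences: Iterable[str],
    scorers: Sequence[Scorer],
    threshold: float,
) -> List[str]:
    """Keep sentences whose summed classifier score exceeds the threshold."""
    extracted = []
    for sentence in sentences:
        total = sum(scorer(sentence) for scorer in scorers)
        if total > threshold:  # strictly greater than, as in the text
            extracted.append(sentence)
    return extracted


# Toy scorers for illustration only; they are not part of the specification.
def mentions_sales(sentence: str) -> float:
    return 1.0 if "sales" in sentence.lower() else 0.0


def mentions_change(sentence: str) -> float:
    words = ("increased", "decreased", "rose", "fell")
    return 1.0 if any(w in sentence.lower() for w in words) else 0.0
```

With a threshold of 1.0, a sentence is kept only when both toy scorers fire, so the threshold effectively controls how many classifiers must agree.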

In the machine learning device described above, the extraction unit may have a first extraction unit that inputs at least some of the one or more sentences included in the text data to the first learning model and extracts, as sentences related to the evaluation target, the sentences that the first learning model determines to be sentences indicating the state of the evaluation target or the reason for the evaluation. In the machine learning device described above, the extraction unit may have a condition acquisition unit that acquires information indicating a keyword or key phrase related to the evaluation target. In the machine learning device described above, the extraction unit may have a second extraction unit that extracts, from among the one or more sentences included in the text data, at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase, either as a sentence related to the evaluation target or as a candidate for a sentence related to the evaluation target. In the machine learning device described above, the first extraction unit may input to the first learning model the sentences that the second extraction unit extracted as candidates for sentences related to the evaluation target. In the machine learning device described above, the first extraction unit may extract, as sentences related to the evaluation target, the sentences that the first learning model determines to be sentences indicating the state of the evaluation target or the reason for the evaluation.
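The cooperation between the second extraction unit (keyword matching) and the first extraction unit (learning model) described above might be sketched as a two-stage filter. The helper names and the `is_relevant` callable, which stands in for the first learning model, are hypothetical. For brevity the sketch covers only exact keyword containment, not similar words or key phrases.

```python
from typing import Callable, Iterable, List


def keyword_candidates(sentences: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Second-extraction-style step: keep sentences containing any keyword."""
    lowered = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in lowered)]


def two_stage_extract(
    sentences: Iterable[str],
    keywords: Iterable[str],
    is_relevant: Callable[[str], bool],
) -> List[str]:
    """Pass keyword candidates to a model-like predicate and keep its positives."""
    candidates = keyword_candidates(sentences, keywords)
    # First-extraction-style step: only candidates are shown to the model.
    return [s for s in candidates if is_relevant(s)]
```

Filtering by keyword first reduces the number of sentences the model has to score, which is one plausible reading of why the candidates are passed on rather than the whole text.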

In the machine learning device described above, the second extraction unit may extract, as a candidate for a sentence related to the evaluation target, a passage that consists of two or more consecutive sentences and contains at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase. The machine learning device described above may include a type information acquisition unit that acquires type information for distinguishing the type of the text data. In the machine learning device described above, the second extraction unit may determine, based on the type of text data indicated by the type information, whether to extract at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase, from among the one or more sentences included in the text data, as a sentence related to the evaluation target or as a candidate for a sentence related to the evaluation target.
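The type-dependent decision above can be illustrated with a small routing function. The type labels ("questionnaire", "sns") and all names are assumptions made for this sketch. The specification does not state which types are treated as final results and which as candidates.

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical set of text types whose keyword hits are accepted as final.
DIRECT_TYPES: FrozenSet[str] = frozenset({"questionnaire"})


def route_keyword_hits(
    hits: Iterable[str],
    text_type: str,
    is_relevant: Callable[[str], bool],
    direct_types: FrozenSet[str] = DIRECT_TYPES,
) -> List[str]:
    """Treat keyword hits as final results or as candidates, by text type."""
    if text_type in direct_types:
        # Hits are extracted directly as sentences related to the target.
        return list(hits)
    # Otherwise hits are only candidates and must pass the model-like check.
    return [s for s in hits if is_relevant(s)]
```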

The machine learning device described above may include a second model construction unit that uses the evaluation information as teacher data to construct a second learning model for assigning an evaluation regarding the evaluation target to an input sentence. The machine learning device described above may include an evaluation assigning unit that uses the second learning model constructed by the second model construction unit to assign an evaluation regarding the evaluation target to the sentences extracted by the extraction unit.

The machine learning device described above may include an index calculation unit that calculates, based on the evaluations assigned by the evaluation assigning unit, an index indicating the state or trend of the evaluation target in a specific period. The machine learning device described above may include a period information acquisition unit that acquires information indicating the specific period. The machine learning device described above may include a text data acquisition unit that acquires each of a plurality of pieces of text data in association with time information indicating the time to which the content of that text data relates, the time at which that text data was recorded, or the time at which an electronic file containing that text data was created or updated. In the machine learning device described above, the extraction unit may extract sentences related to the evaluation target from among the sentences included in those pieces of text data whose associated time information indicates a time within the specific period. In the machine learning device described above, the evaluation assigning unit may assign an evaluation regarding the evaluation target to each of the plurality of sentences that the extraction unit extracted from at least some of the plurality of pieces of text data. In the machine learning device described above, the index calculation unit may calculate the index based on the evaluations that the evaluation assigning unit assigned to the respective sentences.
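The flow above (select text data by period, extract sentences, assign evaluations, calculate an index) can be sketched as below. The `TextRecord` type, the callables `extract` and `evaluate`, and the use of a simple mean as the index are all assumptions made for illustration. The passage above does not specify how the index is computed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Optional


@dataclass
class TextRecord:
    text: str
    recorded_on: date  # time information associated with the text data


def period_index(
    records: Iterable[TextRecord],
    start: date,
    end: date,
    extract: Callable[[str], List[str]],
    evaluate: Callable[[str], float],
) -> Optional[float]:
    """Average the evaluations of sentences extracted from in-period records."""
    scores: List[float] = []
    for record in records:
        if not (start <= record.recorded_on <= end):
            continue  # outside the specific period
        for sentence in extract(record.text):
            scores.append(evaluate(sentence))
    # Assumed aggregation: arithmetic mean; None when nothing was extracted.
    return sum(scores) / len(scores) if scores else None
```

Returning `None` for an empty period keeps "no data" distinguishable from a genuinely low index value.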

In the machine learning device described above, the text data may include information presented through an information provider's utterances or gestures, or information perceived by the information provider. The machine learning device described above may include an index calculation unit that calculates, based on the evaluations assigned by the evaluation assigning unit, an index indicating the state or trend of the evaluation target in a specific period. The machine learning device described above may include a text data acquisition unit that acquires each of a plurality of pieces of text data in association with attribute information indicating an attribute of the information provider of that text data. In the machine learning device described above, the evaluation assigning unit may assign an evaluation regarding the evaluation target to each of the plurality of sentences that the extraction unit extracted from at least some of the plurality of pieces of text data, based on the attribute of the information provider indicated by the attribute information corresponding to the text data that contained the sentence. In the machine learning device described above, the index calculation unit may calculate the index based on the evaluations that the evaluation assigning unit assigned to the respective sentences.

In a second aspect of the present invention, a machine learning method is provided. The machine learning method has, for example, a first model construction stage of constructing a first learning model for determining whether or not an input sentence is a sentence indicating the state of an evaluation target, an evaluation of the evaluation target, or the reason for the evaluation. The first model construction stage uses, as teacher data, one or more explanatory sentences included in evaluation information in which (i) an evaluation regarding the evaluation target and (ii) one or more explanatory sentences indicating the state of the evaluation target or the reason for the evaluation are associated with each other. The machine learning method has, for example, an extraction stage of using the first learning model constructed in the first model construction stage to extract sentences related to the evaluation target from among one or more sentences included in text data.

In a third aspect of the present invention, a program is provided. A non-transitory computer-readable medium storing the program may also be provided. The program is, for example, a program for causing a computer to function as the machine learning device according to the first aspect. The program may be a program for causing a computer to execute the machine learning method according to the second aspect.

Note that the above summary of the invention does not enumerate all of the necessary features of the present invention. Subcombinations of these feature groups may also constitute inventions.

FIG. 1 schematically shows an example of the system configuration of the index estimation system 100.
An example of the internal configuration of the storage unit 126 is schematically shown.
An example of the data table 300 is schematically shown.
An example of the data table 400 is schematically shown.
An example of the internal configuration of the model construction unit 144 is schematically shown.
An example of the internal configuration of the index estimation unit 166 is schematically shown.
An example of the internal configuration of the machine learning type extraction unit 634 is schematically shown.
An example of information processing in the evaluation target extraction unit 630 is schematically shown.
An example of information processing in the evaluation target extraction unit 630 is schematically shown.
An example of information processing in the evaluation target extraction unit 630 is schematically shown.
An example of the data table 1100 is schematically shown.
An example of the system configuration of the computer 3000 is schematically shown.

The present invention is described below through embodiments of the invention, but the following embodiments do not limit the invention according to the claims. In addition, not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the invention. In the drawings, identical or similar parts may be given the same reference numerals, and duplicate descriptions may be omitted.

[Overview of the index estimation system 100]
FIG. 1 schematically shows an example of the system configuration of the index estimation system 100. In the present embodiment, the index estimation system 100 (i) acquires information registered on various social networking services (SNS), information registered on various online bulletin boards, information reported in various news media, information collected through various questionnaires on business sentiment, information obtained in negotiations with customers and the like, information written in daily sales reports or daily work reports, and so on, and (ii) outputs an estimated value of an economic indicator. This allows a user of the index estimation system 100 to grasp the state or trend of economic activity with accuracy comparable to that of economic indicators published by public institutions such as governments and central banks, and earlier than those indicators are published.

Economic indicators published by public institutions such as governments and central banks appear, for example, in the results of surveys on economic activity published by those institutions (sometimes referred to as official survey results). Examples of such economic indicators include the various indices in the Bank of Japan's "Short-term Economic Survey of Enterprises," the various indices in the Cabinet Office's "Economy Watchers Survey," and the various indices in the Ministry of Economy, Trade and Industry's "Current Production Statistics Survey." An example of an index in the "Short-term Economic Survey of Enterprises" is the "Business Conditions Index" (sometimes referred to as the Bank of Japan Tankan). An example of an index in the "Economy Watchers Survey" is the "current conditions assessment." Examples of indices in the "Current Production Statistics Survey" are the various "mining and manufacturing indices." Another example of an economic indicator is the economic trend index published by the Cabinet Office.

In the present embodiment, the index estimation system 100 includes a communication unit 122, an input/output unit 124, a storage unit 126, a request reception unit 128, a teacher data acquisition unit 142, a model construction unit 144, a sample data acquisition unit 162, a text data generation unit 164, and an index estimation unit 166. In the present embodiment, the index estimation system 100 can send and receive information to and from at least one of the user terminal 12, the teacher data providing server 14, and the sample data providing server 16 via the communication network 10.

In the present embodiment, the communication network 10 conveys information between the index estimation system 100 and at least one of the user terminal 12, the teacher data providing server 14, and the sample data providing server 16. The communication network 10 may be a wired transmission path, a wireless transmission path, or a combination of wireless and wired transmission paths.

The communication network 10 may include a wireless communication network, the Internet, a P2P network, a leased line, a VPN, a power line communication line, and the like. The communication scheme in the wireless communication network may be (i) a mobile communication scheme such as 3G, LTE, 4G, or 5G, or (ii) a wireless data communication scheme such as a short-range wireless scheme like Bluetooth (registered trademark), Zigbee (registered trademark), or NFC (Near Field Communication), a wireless LAN scheme like WiFi (registered trademark), a wireless MAN scheme like WiMAX (registered trademark), or a wireless WAN scheme.

In the present embodiment, the user terminal 12 may be any information processing terminal capable of sending and receiving information to and from the index estimation system 100 via the communication network 10, and its details are not particularly limited. The user terminal 12 can be used by a user of the index estimation system 100 as a user interface of the index estimation system 100. Examples of the user terminal 12 include a personal computer and a mobile terminal. Examples of the mobile terminal include a mobile phone, a smartphone, a PDA, a tablet, a notebook or laptop computer, and a wearable computer.

In the present embodiment, the teacher data providing server 14 manages the data of official survey results. For example, for each of a plurality of surveys, the teacher data providing server 14 stores information indicating the type of the survey, information indicating the period covered by the survey, and the survey result data in association with one another. In response to a request from the index estimation system 100, the teacher data providing server 14 sends the requested survey result data to the index estimation system 100.

Examples of information indicating the type of a survey include the name of the survey, the type of survey target, the purpose of the survey, and the attributes of the survey's interviewees. Examples of survey names include the Short-term Economic Survey of Enterprises, the Economy Watchers Survey, and the Current Production Statistics Survey. Examples of interviewee attributes include age, gender, and the degree of relevance to the survey target. Examples of the degree of relevance to the survey target include the industry of the organization to which the interviewee belongs, the size of that organization, the region in which that organization operates, the interviewee's title in that organization, and the interviewee's occupation. Examples of the interviewee's title include department and position. Other examples of the degree of relevance to the survey target include the interviewee's prediction accuracy regarding the survey target and the number of years of experience in work or positions related to the survey target.

The degree of relevance between an interviewee and the survey target may be expressed as a continuous numerical value or as a stepwise category. For example, the degree of relevance is set so that its numerical value becomes larger as the interviewee's prediction accuracy regarding the survey target becomes better. The degree of relevance may be set so that its numerical value becomes larger as the number of years of experience in work or positions related to the survey target becomes longer. The degree of relevance may be set so that its numerical value becomes larger as the rank of the position becomes higher.

For example, when the survey target is economic activity, progressively smaller values of the degree of relevance are assigned in the following order: (a) the interviewee predicted that the economy in the next quarter would be better than at present, and the economy actually improved; (b) the interviewee predicted that the economy in the next quarter would be better than at present, but the economic trend did not actually change; and (c) the interviewee predicted that the economy in the next quarter would be better than at present, but the economy actually worsened. The degree of relevance in case (a) may be 1 or more, and the degrees of relevance in cases (b) and (c) may be less than 1.
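The ordering of relevance values for an "improvement" forecast can be sketched as a lookup table. The concrete values 1.0, 0.5, and 0.1 are assumptions chosen only to satisfy the stated constraints (1 or more when the forecast was right, less than 1 otherwise, in decreasing order). The weighted mean is likewise a hypothetical way of applying such weights and is not stated in the text above.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Outcome(Enum):
    IMPROVED = "improved"
    UNCHANGED = "unchanged"
    WORSENED = "worsened"


# Hypothetical relevance values for an interviewee who forecast improvement.
RELEVANCE_AFTER_IMPROVEMENT_FORECAST = {
    Outcome.IMPROVED: 1.0,   # forecast was right: 1 or more
    Outcome.UNCHANGED: 0.5,  # no actual change: less than 1
    Outcome.WORSENED: 0.1,   # opposite of the forecast: smallest value
}


def relevance_for_improvement_forecast(actual: Outcome) -> float:
    """Look up the relevance value for the actual economic outcome."""
    return RELEVANCE_AFTER_IMPROVEMENT_FORECAST[actual]


def weighted_mean(scored: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Average (score, relevance) pairs, weighting each score by relevance."""
    pairs = list(scored)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        return None  # no usable weights
    return sum(score * weight for score, weight in pairs) / total_weight
```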

In the present embodiment, the sample data providing server 16 manages various data to be analyzed by the index estimation system 100 (sometimes referred to as sample data). Sample data may be managed per creator or updater, or per combination of creator or updater and creation or update date. For example, when the sample data is a daily sales report, a daily work report, or the like, the data is treated as one piece of sample data each time the report is saved. When the sample data is a daily sales report, a daily work report, or the like, the data may instead be treated as one piece of sample data for each person in charge who created or updated the report, or as one piece of sample data for each person in charge and each business day. For example, when the sample data is data posted on an SNS, each post is treated as one piece of sample data.

For example, for each of a plurality of pieces of sample data, the sample data providing server 16 stores (i) at least one of information indicating the creation time or update time of the sample data and information indicating the creator, updater, or administrator of the sample data, in association with (ii) the sample data. For each of the plurality of pieces of sample data, the sample data providing server 16 may store (i) at least one of information indicating the creation time or update time of the sample data and information indicating the creator, updater, or administrator of the sample data, (ii) information indicating the type of the sample data, and (iii) the sample data in association with one another. In response to a request from the index estimation system 100, the sample data providing server 16 sends the requested sample data to the index estimation system 100.
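One way to picture the association described above is a record type that requires at least one of the time information and the person information, with the type information optional. The class, field, and function names below are hypothetical, and the payload is simplified to a string for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SampleRecord:
    payload: str                          # the sample data itself
    timestamp: Optional[datetime] = None  # creation or update time
    person: Optional[str] = None          # creator, updater, or administrator
    kind: Optional[str] = None            # type of sample data (optional)

    def __post_init__(self) -> None:
        # At least one of the two kinds of metadata must be present.
        if self.timestamp is None and self.person is None:
            raise ValueError("need a timestamp, a person, or both")


def select_by_kind(records: Iterable[SampleRecord], kind: str) -> List[SampleRecord]:
    """Return the records whose type information matches the requested kind."""
    return [r for r in records if r.kind == kind]
```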

The sample data may be text data, audio data, image data, or data for any application program. The image data may be still image data or moving image data. Examples of information included in the sample data include information registered on various SNSs, information reported in various news media, information collected through various questionnaires, information obtained in negotiations with customers and the like, and information written in daily sales reports or daily work reports.

The sample data may include information presented through an information provider's utterances or gestures, or information perceived by the information provider. Examples of gestures include body movements, hand movements, mannerisms, and facial expressions. Examples of information on utterances, gestures, and the like include (i) information indicating the content of the utterance or the gesture, and (ii) information indicating a summary or supplementary explanation, written by the person in charge described above, of the information provider's utterance or gesture. Examples of the supplementary explanation include the background of the utterance or gesture and the impressions of the person in charge.

For example, when the sample data is a daily sales report, a daily work report, or the like, the person in charge of an interview creates or updates the report by writing into it information on the interviewee's utterances, gestures, and the like. The interviewee in this case may be an example of an information provider. For example, when the sample data is data posted on an SNS, the poster posts information that he or she has perceived on the SNS. The poster in this case may be an example of an information provider. The types of sample data are described in detail later.

[Overview of each unit of the index estimation system 100]
In the present embodiment, the communication unit 122 sends and receives information to and from at least one of the user terminal 12, the teacher data providing server 14, and the sample data providing server 16 via the communication network 10. For example, the communication unit 122 sends various requests to at least one of the user terminal 12, the teacher data providing server 14, and the sample data providing server 16. The communication unit 122 receives responses to those requests from at least one of the user terminal 12, the teacher data providing server 14, and the sample data providing server 16.

一実施形態において、通信部１２２は、ユーザ端末１２に対して、指標推定システム１００のユーザとのインタラクションを要求する。例えば、通信部１２２は、ユーザ端末１２に、指標推定システム１００のユーザに提示される情報を送信する。通信部１２２は、各種の入力画面の情報を送信してよい。ユーザ端末１２は、上記のインタラクションに関する要求に基づいて、通信部１２２から受け取った情報を、ユーザに提示する。情報の提示態様は特に限定されない。上記の情報は、画像として表示又は投影されてもよく、音声として出力されてもよい。ユーザ端末１２は、ユーザに対して情報の入力を要求又は催促してよい。ユーザ端末１２は、ユーザが入力した情報を、通信部１２２に送信してよい。これにより、通信部１２２は、指標推定システム１００のユーザがユーザ端末１２に入力した情報を取得することができる。 In one embodiment, the communication unit 122 requests the user terminal 12 to interact with the user of the index estimation system 100 . For example, the communication unit 122 transmits information presented to the user of the index estimation system 100 to the user terminal 12 . The communication unit 122 may transmit information of various input screens. The user terminal 12 presents the information received from the communication unit 122 to the user based on the interaction request. The information presentation mode is not particularly limited. The above information may be displayed or projected as an image, or may be output as audio. The user terminal 12 may request or prompt the user to input information. The user terminal 12 may transmit information input by the user to the communication unit 122 . Thereby, the communication unit 122 can acquire information input to the user terminal 12 by the user of the index estimation system 100 .

他の実施形態において、通信部１２２は、教師データ提供サーバ１４に対して、特定の調査結果に関するデータの送信を要求する。これにより、通信部１２２は、教師データ提供サーバ１４から、各種の調査結果のデータを取得することができる。取得された調査結果のデータは、例えば、モデル構築部１４４における機械学習用の教師データとして用いられる。 In another embodiment, the communication unit 122 requests the teacher data providing server 14 to transmit data related to specific survey results. As a result, the communication unit 122 can acquire various survey result data from the teacher data providing server 14 . The obtained survey result data is used, for example, as teacher data for machine learning in the model construction unit 144 .

さらに他の実施形態において、通信部１２２は、サンプルデータ提供サーバ１６に対して、特定のサンプルデータの送信を要求する。これにより、通信部１２２は、サンプルデータ提供サーバ１６から、各種のサンプルデータを取得することができる。取得されたサンプルデータは、例えば、指標推定部１６６における推定処理用の入力データとして用いられる。 In yet another embodiment, the communication unit 122 requests the sample data providing server 16 to transmit specific sample data. Thereby, the communication unit 122 can acquire various sample data from the sample data providing server 16 . The acquired sample data is used as input data for estimation processing in the index estimation unit 166, for example.

本実施形態において、入出力部１２４は、指標推定システム１００のユーザからの情報の入力を受け付ける。入出力部１２４は、指標推定システム１００のユーザに情報を提示する。入出力部１２４は、指標推定システム１００のユーザにより、指標推定システム１００のユーザインタフェースとして利用され得る。入出力部１２４は、キーボード、ポインティングデバイス、タッチパネル、マイク、カメラ、音声入力システム、ジェスチャ入力システムなどの入力装置を有してよい。入出力部１２４は、表示機器、投影機器、音声出力機器、振動機器などの出力装置を有してよい。 In the present embodiment, the input/output unit 124 receives input of information from the user of the index estimation system 100 . The input/output unit 124 presents information to the user of the index estimation system 100 . The input/output unit 124 can be used by the user of the index estimation system 100 as a user interface of the index estimation system 100 . The input/output unit 124 may have input devices such as a keyboard, pointing device, touch panel, microphone, camera, voice input system, and gesture input system. The input/output unit 124 may have an output device such as a display device, a projection device, an audio output device, or a vibration device.

本実施形態において、格納部１２６は、各種の情報を格納する。格納部１２６は、指標推定システム１００の情報処理において利用される情報を格納してよい。格納部１２６は、指標推定システム１００の情報処理において生成された情報を格納してよい。格納部１２６の詳細は後述される。 In this embodiment, the storage unit 126 stores various kinds of information. The storage unit 126 may store information used in the information processing of the index estimation system 100. The storage unit 126 may store information generated in the information processing of the index estimation system 100. Details of the storage unit 126 will be described later.

本実施形態において、要求受付部１２８は、指標推定システム１００に対する各種の要求を受け付ける。例えば、要求受付部１２８は、ユーザからの要求であって、指標推定システム１００に関する各種の設定を登録するための要求を受け付ける。要求受付部１２８は、受け付けられた要求を、当該要求の処理に適した要素に転送してよい。 In this embodiment, the request receiving unit 128 receives various requests for the index estimation system 100 . For example, the request receiving unit 128 receives a request from the user to register various settings regarding the index estimation system 100 . The request accepting unit 128 may forward the accepted request to an element suitable for processing the request.

指標推定システム１００に関する設定としては、指標推定システム１００が推定する指標の種類に関する設定、指標推定システム１００における機械学習に関する各種の設定、指標推定システム１００に入力されるサンプルデータに関する各種の設定などが例示される。指標の種類としては、企業短期経済観測調査に記載された各種の指数、景気ウォッチャー調査に記載された各種の指数、生産動態統計調査に記載された各種の指数、景気動向指数などが例示される。機械学習に関する設定としては、機械学習用の教師データに関する設定、学習モデルに関する設定などが例示される。教師データに関する設定としては、データのＵＲＩ、データ形式などが例示される。サンプルデータに関する設定としては、データのＵＲＩ、データ形式などが例示される。 Examples of settings related to the index estimation system 100 include settings for the types of indicators that the index estimation system 100 estimates, various settings for machine learning in the index estimation system 100, and various settings for the sample data input to the index estimation system 100. Examples of the types of indicators include the various indexes reported in the Short-term Economic Survey of Enterprises, the various indexes reported in the Economy Watchers Survey, the various indexes reported in the Current Production Statistics Survey, and economic trend indexes. Examples of settings related to machine learning include settings for the teacher data used in machine learning and settings for the learning model. Examples of settings related to teacher data include the URI of the data and the data format. Examples of settings related to sample data include the URI of the data and the data format.

一実施形態において、要求受付部１２８が、指標推定システム１００のユーザからの要求を受け付ける。要求受付部１２８は、ユーザ端末１２又は入出力部１２４を介して、ユーザからの要求を取得してよい。他の実施形態において、要求受付部１２８は、ユーザ端末１２、教師データ提供サーバ１４及びサンプルデータ提供サーバ１６の少なくとも１つからの要求を受け付けてよい。 In one embodiment, the request accepting unit 128 accepts requests from users of the index estimation system 100 . The request receiving unit 128 may acquire requests from users via the user terminal 12 or the input/output unit 124 . In another embodiment, the request receiving unit 128 may receive requests from at least one of the user terminal 12 , teacher data providing server 14 and sample data providing server 16 .

本実施形態において、教師データ取得部１４２は、教師データ提供サーバ１４に対して、特定の調査結果のデータの送信を要求する。調査結果のデータは、例えば、当該調査の種類と、当該調査の対象期間とにより特定される。これにより、教師データ取得部１４２は、教師データ提供サーバ１４から、特定の調査結果のデータを取得することができる。 In this embodiment, the teacher data acquisition unit 142 requests the teacher data providing server 14 to transmit specific survey result data. The survey result data is specified, for example, by the type of survey and the target period of the survey. As a result, the teacher data acquiring unit 142 can acquire data of specific survey results from the teacher data providing server 14 .

一実施形態において、教師データ取得部１４２は、取得された調査結果のデータを、格納部１２６に格納する。他の実施形態において、教師データ取得部１４２は、取得された調査結果のデータを、モデル構築部１４４に出力してよい。 In one embodiment, the teacher data acquisition unit 142 stores the acquired survey result data in the storage unit 126 . In another embodiment, the teacher data acquisition unit 142 may output the acquired survey result data to the model construction unit 144 .

本実施形態において、モデル構築部１４４は、教師データ取得部１４２が取得した調査結果のデータを教師データとして用いて、指標推定部１６６において利用される学習モデルを構築する。モデル構築部１４４の詳細は後述される。 In the present embodiment, the model construction unit 144 constructs a learning model used in the index estimation unit 166 by using the survey result data acquired by the teacher data acquisition unit 142 as teacher data. Details of the model construction unit 144 will be described later.

本実施形態において、サンプルデータ取得部１６２は、サンプルデータ提供サーバ１６に対して、特定のサンプルデータの送信を要求する。これにより、通信部１２２は、サンプルデータ提供サーバ１６から、各種のサンプルデータを取得することができる。サンプルデータ取得部１６２は、１以上のサンプルデータのそれぞれを、各サンプルデータの内容に関連する時期、各サンプルデータが記録された時期、又は、各サンプルデータを含む電子ファイルが作成若しくは更新された時期を示す情報と対応づけて取得してよい。サンプルデータ取得部１６２は、１以上のサンプルデータのそれぞれを、各サンプルデータの情報提供者の属性を示す情報と対応付けて取得してもよい。 In this embodiment, the sample data acquisition unit 162 requests the sample data providing server 16 to transmit specific sample data. The communication unit 122 can thereby acquire various sample data from the sample data providing server 16. The sample data acquisition unit 162 may acquire each of the one or more pieces of sample data in association with information indicating the period related to the content of that sample data, the period in which that sample data was recorded, or the period in which the electronic file containing that sample data was created or updated. The sample data acquisition unit 162 may also acquire each of the one or more pieces of sample data in association with information indicating the attributes of the information provider of that sample data.

サンプルデータは、例えば、（ｉ）当該サンプルデータの種類と、（ｉｉ）当該サンプルデータが作成若しくは更新された時刻、又は、当該時刻に関する範囲（上記の時刻、又は、当該時刻に関する範囲は、時期と称される場合がある。）とにより特定される。サンプルデータの種類は、例えば、当該サンプルデータを管理するサンプルデータ提供サーバ１６のＵＲＬにより特定される。サンプルデータの種類は、当該種類を識別するための識別情報により特定されてもよい。サンプルデータの種類は、当該サンプルデータの名称、作成者、更新者、及び、情報提供者の少なくとも１つにより特定されてもよい。サンプルデータの種類は、当該サンプルデータの内容を示す情報を提供した情報提供者の属性により特定されてもよい。情報提供者の属性は、年齢、性別、評価対象との関連度合などが例示される。情報提供者の属性の具体例は、上述されたヒアリング対象者の属性の具体例と同様であってよい。 Sample data is specified, for example, by (i) the type of the sample data and (ii) the time at which the sample data was created or updated, or a range relating to that time (the time, or the range relating to the time, is sometimes referred to as a period). The type of sample data is specified, for example, by the URL of the sample data providing server 16 that manages the sample data. The type of sample data may be specified by identification information for identifying the type. The type of sample data may be specified by at least one of the name, creator, updater, and information provider of the sample data. The type of sample data may be specified by the attributes of the information provider who provided the information indicating the content of the sample data. Examples of the attributes of the information provider include age, gender, and degree of relevance to the evaluation target. Specific examples of the attributes of the information provider may be the same as the specific examples of the attributes of the interviewee described above.

サンプルデータが作成又は更新された時期は、当該サンプルデータを格納する電子ファイルに当該サンプルデータが記録された時期であってもよく、当該電子ファイルが作成又は更新された時期であってもよい。なお、サンプルデータが作成又は更新された時期の代わりに、当該サンプルデータの内容に関連する時期が用いられてもよい。例えば、サンプルデータ中に、「２０１８年の１２月の売り上げは、前年比１０％増であった」というように、評価対象に関する時期を示す情報が含まれている場合、当該時期が、当該サンプルデータの内容に関連する時期として利用される。 The period in which the sample data was created or updated may be the period in which the sample data was recorded in the electronic file storing it, or the period in which that electronic file was created or updated. Instead of the period in which the sample data was created or updated, a period related to the content of the sample data may be used. For example, if the sample data contains information indicating a period related to the evaluation target, such as "Sales in December 2018 were up 10% year on year", that period is used as the period related to the content of the sample data.
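The use of a period mentioned in the content itself can be illustrated with a minimal Python sketch. The description does not say how such a period is detected, so the regular expression, the function name `content_period`, and the restriction to the "YYYY年(の)MM月" notation are all assumptions made for illustration only.

```python
import re
import unicodedata
from typing import Optional, Tuple

# Hypothetical pattern: a four-digit year followed by a one- or two-digit
# month, in the Japanese notation used by the example sentence.
_YEAR_MONTH = re.compile(r"(\d{4})年の?(\d{1,2})月")


def content_period(sentence: str) -> Optional[Tuple[int, int]]:
    """Return the (year, month) mentioned in the sentence, or None if absent."""
    # NFKC normalization folds full-width digits into their ASCII forms.
    normalized = unicodedata.normalize("NFKC", sentence)
    match = _YEAR_MONTH.search(normalized)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

When no period is found in the content, a caller could fall back to the creation or update time of the sample data, as the paragraph above allows.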

一実施形態において、サンプルデータ取得部１６２は、取得されたサンプルデータを、格納部１２６に格納する。サンプルデータ取得部１６２は、取得されたサンプルデータを、当該サンプルデータの種類を区別するための情報（種別情報と称される場合がある）と対応付けて、格納部１２６に格納してもよい。他の実施形態において、サンプルデータ取得部１６２は、取得されたサンプルデータを、指標推定部１６６に出力してよい。さらに他の実施形態において、取得されたサンプルデータがテキストデータ以外の形式のデータである場合、又は、取得されたサンプルデータにテキストデータ以外の形式のデータが含まれる場合、サンプルデータ取得部１６２は、取得されたサンプルデータを、テキストデータ生成部１６４に出力してよい。 In one embodiment, the sample data acquisition unit 162 stores the acquired sample data in the storage unit 126. The sample data acquisition unit 162 may store the acquired sample data in the storage unit 126 in association with information for distinguishing the type of the sample data (sometimes referred to as type information). In another embodiment, the sample data acquisition unit 162 may output the acquired sample data to the index estimation unit 166. In yet another embodiment, when the acquired sample data is in a format other than text data, or when the acquired sample data includes data in a format other than text data, the sample data acquisition unit 162 may output the acquired sample data to the text data generation unit 164.

本実施形態において、サンプルデータ取得部１６２が取得したサンプルデータがテキストデータ以外の形式のデータである場合、又は、当該サンプルデータにテキストデータ以外の形式のデータが含まれる場合、テキストデータ生成部１６４は、当該テキストデータ以外の形式のデータから、テキストデータを生成する。テキストデータ生成部１６４は、特定のサンプルデータから生成されたテキストデータを、当該サンプルデータの一部として、格納部１２６に格納してよい。テキストデータ生成部１６４は、特定のサンプルデータから生成されたテキストデータを、当該サンプルデータの一部として、指標推定部１６６に出力してもよい。 In this embodiment, if the sample data acquired by the sample data acquisition unit 162 is data in a format other than text data, or if the sample data includes data in a format other than text data, the text data generation unit 164 generates text data from data in a format other than the text data. The text data generation unit 164 may store text data generated from specific sample data in the storage unit 126 as part of the sample data. The text data generating section 164 may output text data generated from specific sample data to the index estimating section 166 as part of the sample data.

一実施形態において、テキストデータ生成部１６４は、サンプルデータに含まれる音声データに対して、音声認識処理を実行することで、当該音声データに含まれる人間の音声を記録したテキストデータを生成する。他の実施形態において、テキストデータ生成部１６４は、サンプルデータに含まれる画像データに対して、画像認識処理を実行することで、当該画像データに含まれる文字又は手話を記録したテキストデータを生成する。 In one embodiment, the text data generation unit 164 performs speech recognition processing on audio data included in the sample data, thereby generating text data that records the human speech contained in the audio data. In another embodiment, the text data generation unit 164 performs image recognition processing on image data included in the sample data, thereby generating text data that records the characters or sign language contained in the image data.

本実施形態において、指標推定部１６６は、サンプルデータ取得部１６２が取得したサンプルデータを用いて、ユーザにより指定された種類の指標の推定値を出力する。出力される指標の種類は、例えば、ユーザによる設定又は初期設定に基づいて決定される。 In the present embodiment, the index estimation unit 166 uses the sample data acquired by the sample data acquisition unit 162 to output the estimated value of the type of index designated by the user. The type of index to be output is determined based on user settings or initial settings, for example.

具体的には、まず、指標推定部１６６は、サンプルデータ取得部１６２が取得した複数のサンプルデータの中から、ユーザにより指定された期間に作成又は更新された複数のサンプルデータを抽出する。次に、指標推定部１６６は、抽出された複数のサンプルデータのそれぞれを１以上の文に分割することで、分析対象となる複数の文を得る。次に、指標推定部１６６は、分析対象となる複数の文の中から、経済活動に関連する可能性の高い文を、評価対象となる文として抽出する。 Specifically, first, the index estimation unit 166 extracts a plurality of sample data created or updated during a period designated by the user from among the plurality of sample data acquired by the sample data acquisition unit 162 . Next, the index estimation unit 166 obtains a plurality of sentences to be analyzed by dividing each of the plurality of extracted sample data into one or more sentences. Next, the index estimating unit 166 extracts sentences that are highly likely to be related to economic activity from among the plurality of sentences to be analyzed as sentences to be evaluated.

次に、指標推定部１６６は、評価対象となる文のそれぞれについて、当該文により示される経済活動の状態又は動向の程度を評価して、当該評価に対応するスコア（評価スコアと称される場合がある。）を付与する。次に、指標推定部１６６は、評価対象となる文のそれぞれに付与された評価スコアを、ユーザにより指定された指標の種類に応じて適切に処理することで、当該指標を算出する。 Next, for each sentence to be evaluated, the index estimation unit 166 evaluates the degree of the state or trend of economic activity indicated by that sentence and assigns a score corresponding to the evaluation (sometimes referred to as an evaluation score). The index estimation unit 166 then calculates the index by processing the evaluation scores assigned to the sentences to be evaluated in a manner appropriate to the type of index specified by the user.
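The processing order of the index estimation unit 166 (select by period, split into sentences, extract relevant sentences, score, aggregate) can be sketched in Python as follows. This is only an illustrative outline: the `Sample` class, the punctuation-based sentence splitter, the `is_relevant` and `score` callables standing in for the learning models, and the use of a plain mean as the aggregation are all assumptions, since the description leaves these details to later sections.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class Sample:
    text: str
    updated: date  # date on which the sample data was created or updated


def split_sentences(text: str) -> List[str]:
    """Naive splitter: cut after Japanese or Western sentence-ending marks."""
    parts = re.findall(r"[^。.!?！？]+[。.!?！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def estimate_index(
    samples: List[Sample],
    start: date,
    end: date,
    is_relevant: Callable[[str], bool],  # stand-in for the extraction model
    score: Callable[[str], float],       # stand-in for the scoring model
) -> Optional[float]:
    # 1. Keep only samples created or updated in the user-specified period.
    in_period = [s for s in samples if start <= s.updated <= end]
    # 2. Split each sample into sentences to obtain the sentences to analyze.
    sentences = [t for s in in_period for t in split_sentences(s.text)]
    # 3. Extract the sentences likely to relate to the evaluation target.
    targets = [t for t in sentences if is_relevant(t)]
    if not targets:
        return None
    # 4. Assign an evaluation score to each extracted sentence.
    scores = [score(t) for t in targets]
    # 5. Aggregate the scores; a simple mean is assumed here.
    return sum(scores) / len(scores)
```

Passing the two models as callables keeps the extraction step and the scoring step independently replaceable, which mirrors the separation between extraction and index calculation in the description.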

本実施形態によれば、指標推定システム１００は、各種のＳＮＳに登録された情報、各種のニュースで報道された情報、各種のアンケートにより収集された情報、顧客等との折衝において得られた情報（折衝記録に記載された情報と称される場合がある）、営業日報又は業務日報に記載された情報などのビッグデータを利用して、評価対象の評価に関連する指標を算出することができる。指標推定部１６６の詳細は後述される。 According to this embodiment, the index estimation system 100 can calculate an index related to the evaluation of the evaluation target by using big data such as information registered on various SNSs, information reported in various news outlets, information collected through various questionnaires, information obtained in negotiations with customers and others (sometimes referred to as information recorded in negotiation records), and information written in daily sales reports or daily business reports. Details of the index estimation unit 166 will be described later.

本実施形態においては、指標推定システム１００が、経済活動を評価対象とし、経済活動の状態又は動向を示す指標として各種の経済指標の推定値を出力する場合を例として、指標推定システム１００の詳細が説明された。しかしながら、指標推定システム１００の評価対象は、経済活動に限定されない。他の実施形態において、指標推定システム１００は、個人、団体若しくは法人、商品若しくはサービス、又は、地域若しくはランドマークの人気度又は知名度を評価対象としてもよい。 In this embodiment, the details of the index estimation system 100 have been described taking as an example the case where the index estimation system 100 treats economic activity as the evaluation target and outputs estimated values of various economic indicators as indicators of the state or trend of economic activity. However, the evaluation target of the index estimation system 100 is not limited to economic activity. In another embodiment, the index estimation system 100 may evaluate the popularity or name recognition of an individual, a group or corporation, a product or service, or a region or landmark.

また、本実施形態においては、指標推定システム１００が、政府、中央銀行などの公的機関が公表する経済活動に関する調査結果を教師データとして利用する場合を例として、指標推定システム１００の詳細が説明される。しかしながら、指標推定システム１００は本実施形態に限定されない。他の実施形態において、指標推定システム１００は、民間の調査機関による調査結果を教師データとして利用してもよい。 In addition, in this embodiment, the details of the index estimation system 100 are described taking as an example the case where the index estimation system 100 uses, as teacher data, the results of surveys on economic activity published by public institutions such as governments and central banks. However, the index estimation system 100 is not limited to this embodiment. In another embodiment, the index estimation system 100 may use survey results from private research organizations as teacher data.

指標推定システム１００は、機械学習装置の一例であってよい。通信部１２２は、条件取得部、種別情報取得部、期間情報取得部、テキストデータ取得部の一例であってよい。入出力部１２４は、条件取得部、種別情報取得部、期間情報取得部、テキストデータ取得部の一例であってよい。要求受付部１２８は、条件取得部、種別情報取得部、期間情報取得部、テキストデータ取得部の一例であってよい。モデル構築部１４４は、第１モデル構築部、及び、第２モデル構築部の一例であってよい。サンプルデータ取得部１６２は、種別情報取得部、及び、テキストデータ取得部の一例であってよい。テキストデータ生成部１６４は、種別情報取得部、及び、テキストデータ取得部の一例であってよい。指標推定部１６６は、抽出部、指数算出部の一例であってよい。 The indicator estimation system 100 may be an example of a machine learning device. The communication unit 122 may be an example of a condition acquisition unit, a type information acquisition unit, a period information acquisition unit, and a text data acquisition unit. The input/output unit 124 may be an example of a condition acquisition unit, a type information acquisition unit, a period information acquisition unit, and a text data acquisition unit. The request reception unit 128 may be an example of a condition acquisition unit, a type information acquisition unit, a period information acquisition unit, and a text data acquisition unit. The model builder 144 may be an example of a first model builder and a second model builder. The sample data acquisition unit 162 may be an example of a type information acquisition unit and a text data acquisition unit. The text data generation unit 164 may be an example of a type information acquisition unit and a text data acquisition unit. The index estimation unit 166 may be an example of an extraction unit and an index calculation unit.

景況感は、経済活動に関する評価の一例であってよい。経済活動は、評価対象の一例であってよい。調査対象は、評価対象の一例であってよい。公的な調査結果は、評価情報の一例であってよい。サンプルデータは、テキストデータの一例であってよい。経済指標は、指標の一例であってよい。ヒアリング対象者は、情報提供者の一例であってよい。 Business confidence may be an example of an assessment of economic activity. Economic activity may be an example of an evaluation target. A survey target may be an example of an evaluation target. Public survey results may be an example of evaluation information. Sample data may be an example of text data. An economic indicator may be an example of an indicator. A person to be interviewed may be an example of an information provider.

［指標推定システム１００の各部の具体的な構成］ [Specific Configuration of Each Part of Index Estimation System 100]
指標推定システム１００の各部は、ハードウエアにより実現されてもよく、ソフトウエアにより実現されてもよく、ハードウエアとソフトウエアとの組み合わせにより実現されてもよい。指標推定システム１００の構成要素の少なくとも一部がソフトウエアにより実現される場合、当該ソフトウエアにより実現される構成要素は、一般的な構成の情報処理装置において、当該構成要素に関する動作を規定したプログラムを起動することにより実現されてよい。 Each unit of the index estimation system 100 may be implemented by hardware, by software, or by a combination of hardware and software. When at least some of the components of the index estimation system 100 are implemented by software, those components may be realized by starting, on an information processing device with a general configuration, a program that defines the operations of those components.

プログラムは、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、メモリ、ハードディスクなどのコンピュータ読み取り可能な媒体に記憶されていてもよく、ネットワークに接続された記憶装置に記憶されていてもよい。プログラムは、コンピュータ読み取り可能な媒体又はネットワークに接続された記憶装置から、指標推定システム１００の少なくとも一部を構成するコンピュータにインストールされてよい。プログラムが実行されることにより、コンピュータが、指標推定システム１００の各部の少なくとも一部として機能してもよい。 The program may be stored in a computer-readable medium such as a CD-ROM, DVD-ROM, memory, hard disk, etc., or may be stored in a storage device connected to a network. The program may be installed in a computer that constitutes at least part of the index estimation system 100 from a computer-readable medium or a storage device connected to a network. A computer may function as at least part of each unit of the index estimation system 100 by executing the program.

コンピュータを指標推定システム１００の各部の少なくとも一部として機能させるプログラムは、指標推定システム１００の各部の動作を規定したモジュールを備えてよい。これらのプログラム又はモジュールは、データ処理装置、入力装置、出力装置、記憶装置等に働きかけて、コンピュータを指標推定システム１００の各部として機能させたり、コンピュータに指標推定システム１００の各部における情報処理方法を実行させたりする。 A program that causes a computer to function as at least part of each unit of the index estimation system 100 may include modules that define the operation of each unit of the index estimation system 100. These programs or modules act on a data processing device, an input device, an output device, a storage device, and the like to cause the computer to function as each unit of the index estimation system 100, or to cause the computer to execute the information processing method of each unit of the index estimation system 100.

プログラムに記述された情報処理は、当該プログラムがコンピュータに読込まれることにより、当該プログラムに関連するソフトウエアと、指標推定システム１００の各種のハードウエア資源とが協働した具体的手段として機能する。そして、上記の具体的手段が、本実施形態におけるコンピュータの使用目的に応じた情報の演算又は加工を実現することにより、当該使用目的に応じた指標推定システム１００が構築される。 The information processing described in the program functions as concrete means in which the software related to the program and various hardware resources of the index estimation system 100 work together when the program is read into the computer. . Then, the specific means described above realizes the calculation or processing of information according to the purpose of use of the computer in this embodiment, thereby constructing the index estimation system 100 according to the purpose of use.

上記のプログラムは、コンピュータに、機械学習方法を実行させるためのプログラムであってよい。上記の機械学習方法は、例えば、（ｉ）評価対象に関する評価、及び、（ｉｉ）評価対象の状態又は評価の理由を示す１以上の説明文が対応付けられた評価情報に含まれる１以上の説明文を教師データとして利用して、入力された文が、評価対象の状態、評価対象に対する評価又は評価の理由を示す文であるか否かを判定するための第１学習モデルを構築する第１モデル構築段階を有する。上記の機械学習方法は、例えば、第１モデル構築段階において構築された第１学習モデルを用いて、テキストデータに含まれる１以上の文の中から、評価対象に関連する文を抽出する抽出段階を有する。 The above program may be a program for causing a computer to execute a machine learning method. The machine learning method includes, for example, a first model construction stage of constructing a first learning model for determining whether or not an input sentence is a sentence indicating the state of the evaluation target, an evaluation of the evaluation target, or the reason for the evaluation. The first learning model is constructed using, as teacher data, one or more explanatory texts included in evaluation information in which (i) an evaluation regarding the evaluation target and (ii) one or more explanatory texts indicating the state of the evaluation target or the reason for the evaluation are associated with each other. The machine learning method also includes, for example, an extraction stage of extracting sentences related to the evaluation target from among one or more sentences included in text data, using the first learning model constructed in the first model construction stage.

図２は、格納部１２６の内部構成の一例を概略的に示す。本実施形態において、格納部１２６は、設定情報格納部２２２と、サンプルデータ格納部２２６と、教師データ格納部２２４と、モデル情報格納部２２８とを備える。 FIG. 2 schematically shows an example of the internal configuration of the storage unit 126. In this embodiment, the storage unit 126 includes a setting information storage unit 222, a sample data storage unit 226, a teacher data storage unit 224, and a model information storage unit 228.

本実施形態において、設定情報格納部２２２は、要求受付部１２８が受け付けた、指標推定システム１００に関する設定を示す情報を格納する。上記の設定としては、指標推定システム１００が推定する指標の種類に関する設定、指標推定システム１００における機械学習に関する各種の設定、指標推定システム１００に入力されるサンプルデータに関する各種の設定などが例示される。 In this embodiment, the setting information storage unit 222 stores information indicating settings regarding the index estimation system 100 received by the request receiving unit 128 . Examples of the above settings include settings related to the types of indicators estimated by the indicator estimation system 100, various settings related to machine learning in the indicator estimation system 100, and various settings related to sample data input to the indicator estimation system 100. .

本実施形態において、教師データ格納部２２４は、教師データ取得部１４２が取得した各種のデータを、モデル構築部１４４のモデル構築処理において利用される教師データとして格納する。教師データ格納部２２４は、例えば、複数の教師データのそれぞれについて、当該データの識別情報と、当該データの種類を示す情報及び当該データの対象期間を示す情報の少なくとも一方と、当該データとを対応付けて格納してよい。 In this embodiment, the teacher data storage unit 224 stores the various data acquired by the teacher data acquisition unit 142 as teacher data to be used in the model construction processing of the model construction unit 144. For example, for each of a plurality of pieces of teacher data, the teacher data storage unit 224 may store, in association with one another, identification information of the data, at least one of information indicating the type of the data and information indicating the target period of the data, and the data itself.

データの種類は、当該データの内容を示す情報を提供した情報提供者の属性であってよい。教師データが各種の調査結果のデータである場合、情報提供者としては、当該調査におけるヒアリング対象者が例示される。上記のデータが特定の調査結果のデータである場合、データの種類の具体例は、上述された調査の種類の具体例と同様であってよい。 The data type may be an attribute of an information provider who provided information indicating the content of the data. When the teacher data is data of various survey results, the information provider is exemplified by the interviewees in the survey. Where the data is of a particular survey result, examples of data types may be similar to the examples of survey types described above.

本実施形態において、サンプルデータ格納部２２６は、サンプルデータ取得部１６２が取得したサンプルデータを格納する。サンプルデータ格納部２２６は、複数のサンプルデータのそれぞれについて、（ｉ）当該サンプルデータの識別情報と、（ｉｉ）当該サンプルデータの種類を示す情報、当該サンプルデータの作成時刻又は更新時刻を示す情報、及び、当該サンプルデータの作成者、更新者又は管理者を示す情報の少なくとも１つと、（ｉｉｉ）当該サンプルデータとを対応付けて格納してよい。サンプルデータ格納部２２６は、複数のサンプルデータのそれぞれについて、当該サンプルデータとともに、又は、当該サンプルデータに代えて、テキストデータ生成部１６４が生成したテキストデータを格納してよい。 In this embodiment, the sample data storage unit 226 stores sample data acquired by the sample data acquisition unit 162 . The sample data storage unit 226 stores (i) identification information of the sample data, (ii) information indicating the type of the sample data, and information indicating the creation time or update time of the sample data for each of the plurality of sample data. , and at least one of information indicating the creator, updater, or administrator of the sample data, and (iii) the sample data may be stored in association with each other. The sample data storage unit 226 may store the text data generated by the text data generation unit 164 together with or instead of the sample data for each of the plurality of sample data.

本実施形態において、モデル情報格納部２２８は、モデル構築部１４４が構築した学習モデルに関する各種の情報を格納する。例えば、モデル情報格納部２２８は、複数の学習モデルのそれぞれについて、当該モデルの識別情報と、当該モデルのアルゴリズムを示す情報と、当該モデルのパラメータの値を示す情報とを対応付けて格納する。モデル情報格納部２２８は、複数の学習モデルのそれぞれについて、当該モデルに関する他の情報を格納してもよい。 In this embodiment, the model information storage unit 228 stores various information regarding the learning model constructed by the model construction unit 144 . For example, the model information storage unit 228 stores, for each of a plurality of learning models, identification information of the model, information indicating the algorithm of the model, and information indicating the parameter values of the model in association with each other. The model information storage unit 228 may store other information regarding each of the plurality of learning models.
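As a rough illustration of the association held by the model information storage unit 228 (model identifier, algorithm, and parameter values), the following Python sketch keeps one entry per model in memory and can serialize the entries to JSON. The class names `ModelInfo` and `ModelInfoStore`, the field names, and the JSON layout are hypothetical; the description does not prescribe a storage format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ModelInfo:
    model_id: str                 # identification information of the model
    algorithm: str                # information indicating the algorithm
    parameters: Dict[str, float]  # parameter names mapped to their values


class ModelInfoStore:
    """In-memory stand-in for the model information storage unit 228."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelInfo] = {}

    def put(self, info: ModelInfo) -> None:
        # Entries are keyed by model ID; a later entry replaces an earlier one.
        self._models[info.model_id] = info

    def get(self, model_id: str) -> ModelInfo:
        return self._models[model_id]

    def dump(self) -> str:
        """Serialize every stored entry as a JSON object keyed by model ID."""
        return json.dumps(
            {key: asdict(value) for key, value in self._models.items()},
            sort_keys=True,
        )
```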

図３は、データテーブル３００の一例を概略的に示す。データテーブル３００は、教師データ格納部２２４に格納された教師データの一例であってよい。データテーブル３００は、特定の期間を対象期間とする景気ウォッチャー調査の一例であってよい。 FIG. 3 schematically shows an example of a data table 300. The data table 300 may be an example of teacher data stored in the teacher data storage unit 224. The data table 300 may be an example of an Economy Watchers Survey covering a specific period.

景気ウォッチャー調査は、地域の景気に関連の深い動きを観察できる立場にある人々（ヒアリング対象者と称される場合がある。）の協力を得て、地域ごとに景気動向を的確かつ迅速に把握し、景気動向判断の基礎資料とすることを目的として実施される。景気ウォッチャー調査における調査項目としては、（ｉ）景気の現状に対する判断、（ｉｉ）現状に対する判断の理由、（ｉｉｉ）上記の理由に関する追加説明、及び、具体的状況の説明、（ｉｖ）景気の先行きに対する判断、（ｖ）先行きに対する判断の理由などが例示される。 The Economy Watchers Survey obtains the cooperation of people who are in a position to observe developments closely related to the regional economy (sometimes referred to as interviewees) to accurately and quickly grasp economic trends in each region. The survey is conducted with the aim of providing basic data for assessing economic trends. Survey items in the Economy Watchers Survey include (i) judgments on the current state of the economy, (ii) reasons for the judgments on the current state, (iii) additional explanations regarding the above reasons and explanations of specific conditions, and (iv) economic conditions. Judgment on the future, (v) reasons for the judgment on the future, etc. are exemplified.

本実施形態において、データテーブル３００は、経済活動の分野を示す情報３１２と、調査対象となる地域を示す情報３１４と、ヒアリング対象者の業種及び職種を示す情報３１６と、景気の現状判断を示す情報３２２と、判断の理由を示す情報３２４と、追加説明及び具体的状況の説明を示す情報３２６とを対応付けて格納する。データテーブル３００の各行は、評価情報の一例であってよい。経済活動の分野を示す情報３１２、調査対象となる地域を示す情報３１４、及び、ヒアリング対象者の業種及び職種を示す情報３１６のそれぞれは、調査の種類の一例であってよい。景気の現状判断を示す情報３２２は、評価対象に関する評価の一例であってよい。判断の理由を示す情報３２４は、評価の理由を示す説明文の一例であってよい。追加説明及び具体的状況の説明を示す情報３２６は、評価対象の状態を示す説明文の一例であってよい。 In this embodiment, the data table 300 includes information 312 indicating the field of economic activity, information 314 indicating the area to be surveyed, information 316 indicating the industry and occupation of the interviewee, and the judgment of the current state of the economy. Information 322, information 324 indicating the reason for the judgment, and information 326 indicating the additional explanation and explanation of the specific situation are stored in association with each other. Each row of the data table 300 may be an example of evaluation information. Each of the information 312 indicating the field of economic activity, the information 314 indicating the area to be investigated, and the information 316 indicating the industry and occupation of the interviewee may be examples of types of investigation. The information 322 indicating the judgment of the current state of the economy may be an example of an evaluation regarding an evaluation target. The information 324 indicating the reason for the judgment may be an example of a descriptive text indicating the reason for the evaluation. The information 326 indicating the additional explanation and the explanation of the specific situation may be an example of an explanatory text indicating the state of the evaluation target.
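A row of the data table 300 and the collection of its explanatory texts for use as teacher data can be sketched in Python as follows. The class name `EvaluationRow`, the field names, and the helper `explanatory_texts` are assumptions introduced for illustration; only the correspondence between the fields and the reference numerals 312 to 326 comes from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EvaluationRow:
    field: str        # information 312: field of economic activity
    region: str       # information 314: surveyed region
    occupation: str   # information 316: industry and occupation of interviewee
    judgment: str     # information 322: judgment of the current economy
    reason: str       # information 324: reason for the judgment
    explanation: str  # information 326: additional and situational explanation


def explanatory_texts(rows: List[EvaluationRow]) -> List[str]:
    """Collect the non-empty explanatory texts (324 and 326) of every row."""
    texts: List[str] = []
    for row in rows:
        for text in (row.reason, row.explanation):
            if text.strip():
                texts.append(text)
    return texts
```

The texts collected this way correspond to the explanatory texts that the first model construction stage uses as teacher data. How they are labeled and fed to a learner is not shown here.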

なお、データテーブル３００のデータ構造は本実施形態に限定されない。他の実施形態において、データテーブル３００は、経済活動の分野を示す情報３１２、調査対象となる地域を示す情報３１４、及び、ヒアリング対象者の業種及び職種を示す情報３１６、判断の理由を示す情報３２４の少なくとも１つを備えなくてもよい。さらに他の実施形態において、データテーブル３００は、データテーブル３００の各行を識別するための識別情報を格納するための列、調査の種類を示す情報を格納するための列、及び、調査が実施された時期を示す情報を格納するための列の少なくとも１つをさらに備えてもよい。 Note that the data structure of the data table 300 is not limited to this embodiment. In another embodiment, the data table 300 need not include at least one of the information 312 indicating the field of economic activity, the information 314 indicating the area to be surveyed, the information 316 indicating the industry and occupation of the interviewee, and the information 324 indicating the reason for the judgment. In yet another embodiment, the data table 300 may further include at least one of a column for storing identification information for identifying each row of the data table 300, a column for storing information indicating the type of survey, and a column for storing information indicating when the survey was conducted.

図４は、データテーブル４００の一例を概略的に示す。データテーブル４００は、サンプルデータ格納部２２６に格納されたサンプルデータの一例であってよい。本実施形態においては、サンプルデータとして、企業の営業担当者が、顧客との会話、折衝などの内容を記録した営業日報が入力された場合を例として、データテーブル４００の詳細が説明される。しかしながら、サンプルデータが本実施形態に限定されないことに留意されたい。本実施形態において、上記の顧客は、ヒアリング対象者の一例であってよい。 FIG. 4 schematically shows an example of a data table 400. The data table 400 may be an example of sample data stored in the sample data storage unit 226. In this embodiment, the details of the data table 400 are described using, as an example, a case in which the sample data that is input is a daily sales report in which a company's salesperson has recorded the content of conversations, negotiations, and the like with customers. Note, however, that the sample data is not limited to this embodiment. In this embodiment, the customer may be an example of an interviewee.

本実施形態において、データテーブル４００の各行（レコードと称される場合がある。）には、単一の文の情報が格納される。例えば、単一のサンプルデータに複数の文が含まれる場合、データテーブル４００は、当該サンプルデータに関する情報を、複数のレコードに分割して格納する。複数のレコードのそれぞれには、上記の複数の文のそれぞれに関する情報が格納される。 In this embodiment, each row (sometimes called a record) of the data table 400 stores information of a single sentence. For example, if a single sample data contains multiple sentences, the data table 400 divides and stores information about the sample data into multiple records. Information relating to each of the plurality of sentences is stored in each of the plurality of records.

本実施形態において、データテーブル４００は、サンプルＩＤ４１２と、センテンスＩＤ４１４と、各文の記録時刻を示す情報４１６と、各文の内容を示す情報４１８と、データの種類を示す情報４２０とを対応づけて格納する。サンプルＩＤ４１２は、複数のサンプルデータのそれぞれを識別することのできる情報であればよく、その詳細は特に限定されない。センテンスＩＤ４１４は、複数の文のそれぞれを識別することのできる情報であればよく、その詳細は特に限定されない。 In this embodiment, the data table 400 stores, in association with one another, a sample ID 412, a sentence ID 414, information 416 indicating the recording time of each sentence, information 418 indicating the content of each sentence, and information 420 indicating the data type. The sample ID 412 may be any information that can identify each of the plurality of sample data, and its details are not particularly limited. The sentence ID 414 may be any information that can identify each of the plurality of sentences, and its details are not particularly limited.
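The record layout of the data table 400 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `SentenceRecord`, the function `split_into_records`, the field names, and the punctuation-based sentence splitting are all assumptions introduced here.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SentenceRecord:
    """One row of a table shaped like data table 400 (field names are hypothetical)."""
    sample_id: str         # plays the role of sample ID 412
    sentence_id: int       # plays the role of sentence ID 414
    recorded_at: datetime  # plays the role of information 416 (recording time)
    content: str           # plays the role of information 418 (sentence content)
    data_type: str         # plays the role of information 420 (data type)


def split_into_records(sample_id, text, recorded_at, data_type):
    """Split one piece of sample data into one record per sentence.

    Assumption: a sentence ends at "." or "。". A real system would likely
    use a proper sentence splitter for the language at hand.
    """
    parts = re.split(r"(?<=[.。])\s*", text)
    sentences = [p.strip() for p in parts if p.strip()]
    return [
        SentenceRecord(sample_id, i, recorded_at, s, data_type)
        for i, s in enumerate(sentences, start=1)
    ]
```

A daily report containing two sentences would therefore yield two records that share the same sample ID but carry different sentence IDs.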

記録時刻を示す情報４１６は、複数の文のそれぞれが作成又は更新された時刻を示す。内容を示す情報４１８は、複数の文のそれぞれの内容を示す。データの種類を示す情報４２０は、複数の文のそれぞれの種類を示す。文の種類としては、当該文が含まれていたサンプルデータの種類、当該文の内容を示す情報を提供した情報提供者の属性などが例示される。サンプルデータが営業日報、業務日報又は折衝記録に関するテキストデータである場合、情報提供者としては、ヒアリング対象となった顧客などが例示される。 Information 416 indicating recording time indicates the time when each of the plurality of sentences was created or updated. Information indicating content 418 indicates the content of each of the plurality of sentences. The data type information 420 indicates the type of each of the plurality of sentences. Examples of the type of sentence include the type of sample data containing the sentence, the attribute of the information provider who provided the information indicating the content of the sentence, and the like. If the sample data is text data relating to daily business reports, daily business reports, or negotiation records, examples of information providers include customers who were interviewed.

図４に示された例によれば、データの種類を示す情報４２０として、情報提供者の属性を示す情報が格納されている。上記の属性は、例えば、サンプルデータ取得部１６２が、営業日報の文章又は各文を解析することにより、各文に付与される。上記の属性は、営業担当者が、営業日報の文章又は各文に対応する属性を入出力部１２４に入力することにより、各文に付与されてもよい。 According to the example shown in FIG. 4, information indicating attributes of the information provider is stored as the information 420 indicating the type of data. For example, the sample data acquisition unit 162 analyzes the sentences or each sentence in the business daily report, and the above attributes are given to each sentence. The above attributes may be given to each sentence by the salesperson inputting the sentence of the business daily report or the attribute corresponding to each sentence to the input/output unit 124 .

なお、データテーブル４００のデータ構造は本実施形態に限定されない。他の実施形態において、複数の文のそれぞれについて、データテーブル４００は、サンプルＩＤ４１２と、センテンスＩＤ４１４と、内容を示す情報４１８とを対応付けて格納する第１のデータテーブルと、複数のサンプルデータのそれぞれについて、サンプルＩＤ４１２と、記録時刻を示す情報４１６とを対応付けて格納する第２のデータテーブルとに分割されていてもよい。 Note that the data structure of the data table 400 is not limited to this embodiment. In another embodiment, the data table 400 may be divided into a first data table that stores, for each of the plurality of sentences, the sample ID 412, the sentence ID 414, and the information 418 indicating content in association with one another, and a second data table that stores, for each of the plurality of sample data, the sample ID 412 and the information 416 indicating the recording time in association with each other.

図５は、モデル構築部１４４の内部構成の一例を概略的に示す。本実施形態において、モデル構築部１４４は、抽出用モデル構築部５２２と、評価用モデル構築部５２４とを備える。 FIG. 5 schematically shows an example of the internal configuration of the model construction unit 144. In this embodiment, the model construction unit 144 includes an extraction model construction unit 522 and an evaluation model construction unit 524.

上述のとおり、モデル構築部１４４は、指標推定部１６６において利用される各種の学習モデルを構築する。上述のとおり、指標推定部１６６は、分析対象となる複数の文の中から、経済活動に関連する可能性の高い文を、評価対象となる文として抽出する。また、指標推定部１６６は、評価対象となる文のそれぞれについて、当該文により示される経済活動の状態又は動向の程度を評価して、当該評価に対応するスコアを付与する。 As described above, the model building section 144 builds various learning models used in the index estimating section 166 . As described above, the index estimating unit 166 extracts sentences that are highly likely to be related to economic activity from among the plurality of sentences to be analyzed as sentences to be evaluated. In addition, the index estimation unit 166 evaluates the degree of economic activity state or trend indicated by each sentence to be evaluated, and assigns a score corresponding to the evaluation.

本実施形態において、抽出用モデル構築部５２２は、上記の分析対象となる複数の文の中から、評価対象となる文を抽出するための学習モデルを構築する。具体的には、抽出用モデル構築部５２２は、（ｉ）評価対象に関する評価、及び、（ｉｉ）評価対象の状態又は評価の理由を示す１以上の説明文が対応付けられた評価情報に含まれる１以上の説明文を教師データとして利用して、入力された文が、評価対象の状態、評価対象に対する評価又は評価の理由を示す文であるか否かを判定するための学習モデルを構築する。 In this embodiment, the extraction model construction unit 522 constructs a learning model for extracting sentences to be evaluated from among the plurality of sentences to be analyzed. Specifically, the extraction model construction unit 522 uses, as teacher data, the one or more explanatory sentences contained in evaluation information in which (i) an evaluation regarding an evaluation target and (ii) one or more explanatory sentences indicating the state of the evaluation target or the reason for the evaluation are associated with each other, and constructs a learning model for determining whether an input sentence is a sentence indicating the state of the evaluation target, an evaluation of the evaluation target, or the reason for the evaluation.

例えば、抽出用モデル構築部５２２は、データテーブル３００を構成する複数のレコードの追加説明及び具体的状況の説明を示す情報３２６を教師データとして利用して、入力された文が、（ｉ）経済活動に対する評価、（ｉｉ）経済活動の状態、又は、（ｉｉｉ）当該評価の理由を示す文であるか否かを判定するための学習モデルを構築する。上記の学習モデルによれば、当該学習モデルに入力された複数の文のそれぞれは、追加説明及び具体的状況の説明を示す情報３２６に含まれる文に類似する文と、追加説明及び具体的状況の説明を示す情報３２６に含まれる文に類似しない文とに分類される。そして、追加説明及び具体的状況の説明を示す情報３２６に含まれる文に類似する文は、（ｉ）経済活動に対する評価、（ｉｉ）経済活動の状態、又は、（ｉｉｉ）当該評価の理由を示す文であると判定される。 For example, the extraction model construction unit 522 uses, as teacher data, the information 326 indicating the additional explanation and the description of the specific situation in the plurality of records constituting the data table 300, and constructs a learning model for determining whether an input sentence is a sentence indicating (i) an evaluation of economic activity, (ii) the state of economic activity, or (iii) the reason for the evaluation. With this learning model, each of the plurality of sentences input to the model is classified either as a sentence similar to the sentences contained in the information 326 or as a sentence not similar to them. A sentence similar to the sentences contained in the information 326 is then determined to be a sentence indicating (i) an evaluation of economic activity, (ii) the state of economic activity, or (iii) the reason for the evaluation.
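The "similar / not similar to the teacher sentences" decision can be illustrated with a deliberately simple stand-in. The sketch below swaps in bag-of-words cosine similarity against the teacher sentences; the patent does not prescribe this technique, and the class name `SimilarityExtractor`, the whitespace tokenization, and the threshold of 0.3 are assumptions made only for illustration.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector; assumes whitespace-tokenized text."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SimilarityExtractor:
    """Hypothetical stand-in for the extraction model: a sentence counts as
    relevant when it is close enough to any teacher sentence."""

    def __init__(self, teacher_sentences, threshold=0.3):
        self.teacher = [bow(s) for s in teacher_sentences]
        self.threshold = threshold  # assumed value, not taken from the source

    def is_relevant(self, sentence):
        vec = bow(sentence)
        best = max((cosine(vec, t) for t in self.teacher), default=0.0)
        return best >= self.threshold
```

In this sketch the teacher sentences play the role of the information 326; a trained classifier such as those listed in the description would replace the similarity rule in practice.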

上記の学習モデルの種類は、特に限定されない。学習モデルの種類としては、ニューラルネットワークモデル（ＮＮと略称される場合がある）、畳み込みニューラルネットワーク（ＣＮＮと略称される場合がある。）、ロジスティック回帰モデル（ＬＲと略称される場合がある）、シンプルワードエンベッディングモデル（ＳＷＥＭと略称される場合がある）、ロングショートタームメモリモデル（ＬＳＴＭと略称される場合がある）、ＢｉｄｉｒｅｃｔｉｏｎａｌＬＳＴＭなどが例示される。 The type of the above learning model is not particularly limited. Examples of learning model types include a neural network model (sometimes abbreviated as NN), a convolutional neural network (sometimes abbreviated as CNN), a logistic regression model (sometimes abbreviated as LR), a simple word embedding model (sometimes abbreviated as SWEM), a long short-term memory model (sometimes abbreviated as LSTM), and a bidirectional LSTM.

上記の学習モデルは、入力された文を、「（ｉ）経済活動に対する評価、（ｉｉ）経済活動の状態、又は、（ｉｉｉ）当該評価の理由を示す第１の文」、又は、「第１の文ではない第２の文」の何れかに分類する文章分類器を含んでよい。上記の学習モデルは、「評価対象の状態又は評価の理由を示す文」、又は、「評価対象の状態又は評価の理由を示す文ではない文」の何れかに分類する文章分類器を含んでもよい。文章分類器は、センテンスエンベッディングの生成器と、分類器とのペアにより構成されてよい。上記の学習モデルは、複数の文章分類器を含んでよい。 The above learning model may include a sentence classifier that classifies an input sentence as either "a first sentence indicating (i) an evaluation of economic activity, (ii) the state of economic activity, or (iii) the reason for the evaluation" or "a second sentence that is not a first sentence". The learning model may also include a sentence classifier that classifies a sentence as either "a sentence indicating the state of the evaluation target or the reason for the evaluation" or "a sentence that does not indicate the state of the evaluation target or the reason for the evaluation". A sentence classifier may consist of a pair of a sentence embedding generator and a classifier. The learning model may include a plurality of sentence classifiers.
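The pairing of a sentence embedding generator with a classifier can be sketched as below. This is only an illustration under stated assumptions: mean pooling of word vectors stands in for the embedding generator, a fixed-weight logistic function stands in for the classifier, and the class names, toy word vectors, and weights are all hypothetical rather than taken from the source.

```python
import math


class MeanPoolEmbedder:
    """Embedding generator: averages the vectors of known words
    (a simple pooling scheme; assumed here, not specified by the source)."""

    def __init__(self, word_vectors, dim):
        self.word_vectors = word_vectors
        self.dim = dim

    def embed(self, sentence):
        vecs = [self.word_vectors[w] for w in sentence.lower().split()
                if w in self.word_vectors]
        if not vecs:
            return [0.0] * self.dim
        return [sum(column) / len(vecs) for column in zip(*vecs)]


class LogisticClassifier:
    """Classifier: logistic function over a linear score, with given weights."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict_proba(self, x):
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


class SentenceClassifier:
    """The generator/classifier pair acting as one sentence classifier."""

    def __init__(self, embedder, classifier, threshold=0.5):
        self.embedder = embedder
        self.classifier = classifier
        self.threshold = threshold

    def is_first_sentence(self, sentence):
        proba = self.classifier.predict_proba(self.embedder.embed(sentence))
        return proba >= self.threshold
```

Because the two parts are separate objects, several sentence classifiers could share one embedding generator while using different classifiers, which is one way to read "a plurality of sentence classifiers".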

抽出用モデル構築部５２２は、サンプルデータの種類に応じた学習モデルを構築してもよい。抽出用モデル構築部５２２は、サンプルデータの種類に応じて、利用される学習モデルの種類を決定してよい。抽出用モデル構築部５２２は、サンプルデータの種類に応じて、利用される学習モデルの種類の数を決定してよい。抽出用モデル構築部５２２は、サンプルデータの種類に応じて、利用される学習モデルの組み合わせを決定してもよい。サンプルデータは、学習モデルの入力データの一例であってよい。入力データの他の例としては、サンプルデータに含まれる１以上の文のデータが挙げられる。 The extraction model construction unit 522 may construct a learning model according to the type of sample data. The extraction model construction unit 522 may determine the type of learning model to be used according to the type of sample data. The extraction model construction unit 522 may determine the number of types of learning models to be used according to the types of sample data. The extraction model construction unit 522 may determine the combination of learning models to be used according to the type of sample data. Sample data may be an example of input data for a learning model. Another example of input data is data of one or more sentences included in the sample data.

抽出用モデル構築部５２２は、サンプルデータに含まれる１以上の文のそれぞれと、各文の種類を示す情報とが対応付けられた情報を教師データとして用いて、上記の学習モデルを構築してもよい。これにより、例えば、各文に関する情報提供者の属性に応じた判定結果を出力する学習モデルが構築される。 The extraction model construction unit 522 may construct the above learning model using, as teacher data, information in which each of the one or more sentences included in the sample data is associated with information indicating the type of that sentence. In this way, for example, a learning model is constructed that outputs a determination result that depends on the attributes of the information provider for each sentence.

本実施形態において、評価用モデル構築部５２４は、上記の評価対象となる文のそれぞれに、評価スコアを付与するための学習モデルを構築する。具体的には、評価情報を教師データとして利用して、入力された文に評価スコアを付与するための学習モデルを構築する。 In this embodiment, the evaluation model construction unit 524 constructs a learning model for assigning an evaluation score to each sentence to be evaluated. Specifically, the evaluation information is used as teacher data to build a learning model for assigning evaluation scores to input sentences.

例えば、評価用モデル構築部５２４は、データテーブル３００を構成する複数のレコードの景気の現状判断を示す情報３２２と、追加説明及び具体的状況の説明を示す情報３２６とを教師データとして利用して、入力された文に評価スコアを付与するための学習モデルを構築する。 For example, the evaluation model construction unit 524 uses, as training data, information 322 indicating the judgment of the current state of the economy and information 326 indicating additional explanations and explanations of specific situations in the plurality of records that make up the data table 300. , builds a learning model for assigning evaluation scores to input sentences.

上記の学習モデルの種類は、特に限定されない。上記の学習モデルは、畳み込みニューラルネットワークを利用した回帰モデルであってよい。 The type of learning model described above is not particularly limited. The learning model described above may be a regression model using a convolutional neural network.

データテーブル３００において、景気の現状判断が段階的な区分により示されている場合、評価用モデル構築部５２４は、各区分に対応するスコアを決定してよい。これにより、各文に付与する評価スコアを連続的な数値で表現することができる。例えば、景気の現状判断が、「良くなる」、「やや良くなる」、「変わらない」、「やや悪くなる」及び「悪くなる」という５段階評価で表されている場合、評価用モデル構築部５２４は、「良くなる」という評価に２というスコアを付与する。同様に、評価用モデル構築部５２４は、「やや良くなる」、「変わらない」、「やや悪くなる」及び「悪くなる」という評価のそれぞれに、１、０、－１及び－２というスコアを付与する。これにより、評価用モデル構築部５２４が構築した学習モデルは、入力された文の評価スコアとして、－２から２までの範囲で任意の数値を付与する。 When the judgment of the current state of the economy is expressed in graded categories in the data table 300, the evaluation model construction unit 524 may determine a score corresponding to each category. This allows the evaluation score assigned to each sentence to be expressed as a continuous numerical value. For example, when the judgment of the current state of the economy is expressed on a five-level scale of "getting better", "getting slightly better", "unchanged", "getting slightly worse", and "getting worse", the evaluation model construction unit 524 assigns a score of 2 to the evaluation "getting better". Similarly, the evaluation model construction unit 524 assigns scores of 1, 0, -1, and -2 to the evaluations "getting slightly better", "unchanged", "getting slightly worse", and "getting worse", respectively. As a result, the learning model constructed by the evaluation model construction unit 524 assigns any numerical value in the range from -2 to 2 as the evaluation score of an input sentence.
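The category-to-score mapping described above amounts to a small lookup that turns graded judgments into regression targets. In the sketch below, the English label strings are renderings of the five grades chosen for illustration, and the names `SCORE_BY_LABEL` and `to_training_pairs` are hypothetical.

```python
# Scores for the five graded judgments, as given in the description:
# 2, 1, 0, -1 and -2 from the most positive to the most negative grade.
SCORE_BY_LABEL = {
    "better": 2,
    "slightly better": 1,
    "unchanged": 0,
    "slightly worse": -1,
    "worse": -2,
}


def to_training_pairs(records):
    """Turn (judgment_label, explanation_text) records into
    (text, numeric_score) pairs usable as regression teacher data."""
    return [(text, float(SCORE_BY_LABEL[label])) for label, text in records]
```

A regression model trained on such pairs can output any value between the extreme scores, which is how discrete survey grades become a continuous evaluation score.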

抽出用モデル構築部５２２は、第２モデル構築部の一例であってよい。評価用モデル構築部５２４は、第１モデル構築部の一例であってよい。 The extraction model construction unit 522 may be an example of a second model construction unit. The evaluation model construction unit 524 may be an example of a first model construction unit.

図６は、指標推定部１６６の内部構成の一例を概略的に示す。本実施形態において、指標推定部１６６は、種別判定部６２０と、評価対象抽出部６３０と、評価部６４０と、指標生成部６５０とを備える。本実施形態において、評価対象抽出部６３０は、キーワード型抽出部６３２と、機械学習型抽出部６３４とを有する。 FIG. 6 schematically shows an example of the internal configuration of the index estimation unit 166. In this embodiment, the index estimation unit 166 includes a type determination unit 620, an evaluation target extraction unit 630, an evaluation unit 640, and an index generation unit 650. In this embodiment, the evaluation target extraction unit 630 has a keyword type extraction unit 632 and a machine learning type extraction unit 634.

種別判定部６２０は、分析対象として指標推定部１６６に入力されたサンプルデータの種類を判定する。例えば、種別判定部６２０は、格納部１２６を参照して、入力されたサンプルデータの種別情報を取得し、当該種別情報に基づいて、サンプルデータの種類を判定する。種別判定部６２０は、入力されたサンプルデータの内容を解析して、当該サンプルデータの種類を判定してもよい。 The type determining unit 620 determines the type of sample data input to the index estimating unit 166 as an analysis target. For example, the type determination unit 620 refers to the storage unit 126 to obtain type information of the input sample data, and determines the type of sample data based on the type information. The type determination unit 620 may analyze the contents of the input sample data and determine the type of the sample data.

種別判定部６２０は、判定結果を示す情報を、例えば、評価対象抽出部６３０に出力する。種別判定部６２０は、判定結果を示す情報を、評価部６４０に出力してもよい。一実施形態において、種別判定部６２０は、サンプルデータごとに、当該サンプルデータの種類を示す情報を、評価対象抽出部６３０及び評価部６４０の少なくとも一方に出力する。他の実施形態において、種別判定部６２０は、サンプルデータに含まれる１以上の文のそれぞれについて、当該文の種類を示す情報を、評価対象抽出部６３０及び評価部６４０の少なくとも一方に出力する。 The type determination unit 620 outputs information indicating the determination result to the evaluation target extraction unit 630, for example. The type determination section 620 may output information indicating the determination result to the evaluation section 640 . In one embodiment, the type determination unit 620 outputs information indicating the type of sample data to at least one of the evaluation target extraction unit 630 and the evaluation unit 640 for each sample data. In another embodiment, the type determination section 620 outputs information indicating the type of each of one or more sentences included in the sample data to at least one of the evaluation target extraction section 630 and the evaluation section 640 .

種別判定部６２０は、入力されたサンプルデータの種類に関する判定結果を利用して、単一のサンプルデータを構成する１又は複数の文のそれぞれに対して、当該文の種類を示す情報を付与してもよい。種別判定部６２０は、各文の種類を示す情報を、サンプルデータ格納部２２６に格納してよい。例えば、種別判定部６２０は、各文の種類を示す情報を、データテーブル４００に格納する。 The type determination unit 620 uses the determination result regarding the type of the input sample data to add information indicating the type of the sentence to each of one or more sentences forming a single sample data. may The type determination section 620 may store information indicating the type of each sentence in the sample data storage section 226 . For example, the type determination unit 620 stores information indicating the type of each sentence in the data table 400 .

一実施形態において、種別判定部６２０は、単一のサンプルデータと、単一の種類とを対応付ける。例えば、種別判定部６２０は、単一のサンプルデータを構成する１又は複数の文のそれぞれに対して、当該文の種類を示す情報として、同一の情報を付与する。例えば、サンプルデータの種類を示す情報により、当該サンプルデータが特定のＳＮＳに投稿された情報であることが示される場合、種別判定部６２０は、当該サンプルデータに含まれる全ての文に対して、当該文が特定のＳＮＳに投稿された情報であることを示す情報を付与する。 In one embodiment, the type determination unit 620 associates a single sample data with a single type. For example, the type determination unit 620 assigns the same information as information indicating the type of the sentence to each of one or more sentences forming a single piece of sample data. For example, if the information indicating the type of the sample data indicates that the sample data is information posted on a specific SNS, the type determination unit 620, for all sentences included in the sample data, Information indicating that the sentence is information posted to a specific SNS is added.

他の実施形態において、種別判定部６２０は、単一のサンプルデータと、複数の種類とを対応付ける。例えば、種別判定部６２０は、単一のサンプルデータを構成する複数の文のうち、第１の群に属する１以上の文、及び、第２の群に属する１以上の文のそれぞれに対して、各群に属する文の種類を示す情報として、異なる情報を付与する。例えば、サンプルデータの種類を示す情報により、当該サンプルデータが折衝記録、営業日報又は業務日報に記録された情報であることが示される場合、種別判定部６２０は、第１のヒアリング対象者から得られた情報に関する文章に含まれる文には、第１のヒアリング対象者の属性を示す情報を付与し、第２のヒアリング対象者から得られた情報に関する文章に含まれる文には、第２のヒアリング対象者の属性を示す情報を付与する。 In another embodiment, the type determination unit 620 associates a single piece of sample data with a plurality of types. For example, among the plurality of sentences constituting a single piece of sample data, the type determination unit 620 assigns different information to the one or more sentences belonging to a first group and to the one or more sentences belonging to a second group, as the information indicating the type of the sentences in each group. For example, when the information indicating the type of the sample data indicates that the sample data is information recorded in a negotiation record, a daily sales report, or a daily work report, the type determination unit 620 assigns information indicating the attributes of a first interviewee to the sentences contained in the text concerning information obtained from the first interviewee, and assigns information indicating the attributes of a second interviewee to the sentences contained in the text concerning information obtained from the second interviewee.

本実施形態において、評価対象抽出部６３０は、設定情報格納部２２２を参照して、分析対象となる期間に関する設定情報を取得する。評価対象抽出部６３０は、サンプルデータ格納部２２６を参照して、分析対象となる期間に合致する時期に作成又は更新されたサンプルデータ（分析対象となるサンプルデータと称される場合がある）を取得する。評価対象抽出部６３０は、分析対象となる複数のサンプルデータのそれぞれに関するテキストデータを、各サンプルデータの内容に関連する時期、各サンプルデータが記録された時期、又は、各サンプルデータを含む電子ファイルが作成若しくは更新された時期を示す情報と対応づけて取得する。評価対象抽出部６３０は、分析対象となる複数のサンプルデータのそれぞれに関するテキストデータを、各サンプルデータの情報提供者の属性を示す情報と対応付けて取得してもよい。その後、評価対象抽出部６３０は、抽出用モデル構築部５２２が構築した学習モデルを用いて、上記のサンプルデータに含まれる１以上の文の中から、経済活動に関連する文を抽出する。 In this embodiment, the evaluation target extraction unit 630 refers to the setting information storage unit 222 and acquires setting information regarding the period to be analyzed. The evaluation target extraction unit 630 refers to the sample data storage unit 226 and acquires sample data created or updated at a time that falls within the period to be analyzed (sometimes referred to as sample data to be analyzed). The evaluation target extraction unit 630 acquires the text data of each of the plurality of sample data to be analyzed in association with information indicating the time to which the content of each sample data relates, the time at which each sample data was recorded, or the time at which the electronic file containing each sample data was created or updated. The evaluation target extraction unit 630 may also acquire the text data of each of the plurality of sample data to be analyzed in association with information indicating the attributes of the information provider of each sample data. Thereafter, the evaluation target extraction unit 630 uses the learning model constructed by the extraction model construction unit 522 to extract sentences related to economic activity from among the one or more sentences included in the sample data.

本実施形態において、評価対象抽出部６３０は、少なくとも、機械学習型抽出部６３４を利用して、サンプルデータに含まれる１以上の文の中から、経済活動に関連する文を抽出する。評価対象抽出部６３０は、抽出された文を、評価部６４０に出力する。 In this embodiment, the evaluation target extraction unit 630 uses at least the machine learning type extraction unit 634 to extract sentences related to economic activity from among one or more sentences included in the sample data. Evaluation target extraction unit 630 outputs the extracted sentence to evaluation unit 640 .

評価対象抽出部６３０は、キーワード型抽出部６３２及び機械学習型抽出部６３４を利用して、サンプルデータに含まれる１以上の文の中から、経済活動に関連する文を抽出してもよい。評価対象抽出部６３０は、サンプルデータの種類に基づいて、キーワード型抽出部６３２を用いた抽出処理と、機械学習型抽出部６３４を用いた抽出処理との組み合わせ方を決定してよい。 The evaluation target extraction unit 630 may use the keyword type extraction unit 632 and the machine learning type extraction unit 634 to extract sentences related to economic activity from one or more sentences included in the sample data. The evaluation target extraction unit 630 may determine how to combine the extraction processing using the keyword type extraction unit 632 and the extraction processing using the machine learning type extraction unit 634 based on the type of sample data.

本実施形態において、キーワード型抽出部６３２は、サンプルデータに含まれる１以上の文の中から、予め定められたキーワード又はキーフレーズに関する条件に合致する文を抽出する。例えば、キーワード型抽出部６３２は、サンプルデータに含まれる１以上の文の中から、キーワードを含む文、キーフレーズに合致する文、キーワードに類似する単語を含む文、及び、キーフレーズに類似する条件に合致する文の少なくとも１つ（キーワードなどに合致する文と称される場合がある。）を、経済活動に関連する文、又は、経済活動に関連する文の候補として抽出する。 In this embodiment, the keyword-type extracting unit 632 extracts sentences that match conditions related to a predetermined keyword or key phrase from one or more sentences included in the sample data. For example, the keyword-type extraction unit 632 selects, from one or more sentences included in the sample data, a sentence containing a keyword, a sentence matching a key phrase, a sentence containing a word similar to the keyword, and a sentence similar to the key phrase. At least one of the sentences matching the condition (sometimes referred to as a sentence matching a keyword or the like) is extracted as a sentence related to economic activity or a candidate for a sentence related to economic activity.

キーワード型抽出部６３２は、設定情報格納部２２２を参照して、キーワード又はキーフレーズに関する条件を示す設定情報を取得してよい。キーワード又はキーフレーズに関する設定情報は、キーワードを示す情報、及び、キーフレーズを示す情報の少なくとも一方を含む。キーワード又はキーフレーズに関する設定情報は、キーワードに類似する単語を含む文を抽出するか否かを示す情報、及び、キーフレーズに類似する条件に合致する文を抽出するか否かを示す情報の少なくとも一方を含んでもよい。 The keyword type extraction unit 632 may refer to the setting information storage unit 222 to acquire setting information indicating conditions related to keywords or key phrases. The setting information related to keywords or keyphrases includes at least one of information indicating keywords and information indicating keyphrases. The setting information related to keywords or keyphrases includes at least information indicating whether to extract sentences containing words similar to the keywords and information indicating whether to extract sentences matching conditions similar to the keyphrases. You may include one.

キーワード型抽出部６３２は、種別判定部６２０の判定結果に基づいて、抽出処理の条件を決定してもよい。これにより、入力されるデータの種類に応じた、適切な条件が設定される。一実施形態において、キーワード型抽出部６３２は、種別判定部６２０の判定結果に基づいて、キーワードに類似する単語を決定してよい。キーワード型抽出部６３２は、種別判定部６２０の判定結果に基づいて、キーワードに類似する単語の個数を決定してもよい。他の実施形態において、キーワード型抽出部６３２は、種別判定部６２０の判定結果に基づいて、キーフレーズに類似する条件を決定してよい。キーワード型抽出部６３２は、種別判定部６２０の判定結果に基づいて、キーフレーズに類似する条件の個数を決定してもよい。 The keyword type extraction unit 632 may determine conditions for extraction processing based on the determination result of the type determination unit 620 . Accordingly, an appropriate condition is set according to the type of data to be input. In one embodiment, the keyword type extractor 632 may determine words similar to the keyword based on the determination result of the type determiner 620 . The keyword type extraction section 632 may determine the number of words similar to the keyword based on the determination result of the type determination section 620 . In another embodiment, the keyword type extraction unit 632 may determine a condition similar to the key phrase based on the determination result of the type determination unit 620 . The keyword type extraction section 632 may determine the number of conditions similar to the key phrase based on the determination result of the type determination section 620 .

キーワード型抽出部６３２は、種別判定部６２０の判定結果に基づいて、サンプルデータに含まれる１以上の文の中から、キーワードを含む文、キーフレーズに合致する文、キーワードに類似する単語を含む文、及び、キーフレーズに類似する条件に合致する文の少なくとも１つを、（ｉ）経済活動に関連する文として抽出するか、又は、（ｉｉ）経済活動に関連する文の候補として抽出するかを決定してもよい。例えば、経済活動に関連する文として抽出された文は、評価部６４０に出力される。一方、経済活動に関連する文の候補として抽出された文は、機械学習型抽出部６３４に出力される。 Based on the determination result of the type determination unit 620, the keyword type extraction unit 632 may decide whether at least one of a sentence containing a keyword, a sentence matching a key phrase, a sentence containing a word similar to a keyword, and a sentence matching a condition similar to a key phrase, found among the one or more sentences included in the sample data, is to be extracted (i) as a sentence related to economic activity or (ii) as a candidate for a sentence related to economic activity. For example, a sentence extracted as a sentence related to economic activity is output to the evaluation unit 640. In contrast, a sentence extracted as a candidate for a sentence related to economic activity is output to the machine learning type extraction unit 634.

キーワードなどに合致する文だけでなく、キーワードなどに合致する文の近傍に配された１以上の文も、その他の文と比較して、経済活動に関連する文である可能性が高い。そこで、キーワード型抽出部６３２は、連続する２以上の文を含む文章であって、キーワードを含む文、キーフレーズに合致する文、キーワードに類似する単語を含む文、及び、キーフレーズに類似する条件に合致する文の少なくとも１つを含む文章を、経済活動に関連する文の候補として抽出してもよい。 Not only the sentence that matches the keyword, etc., but also one or more sentences placed near the sentence that matches the keyword, etc., are more likely to be sentences related to economic activity than other sentences. Therefore, the keyword-type extraction unit 632 extracts sentences that include two or more consecutive sentences and includes a keyword, a sentence that matches a key phrase, a sentence that includes a word similar to the keyword, and a sentence that is similar to the key phrase. A sentence that includes at least one sentence that satisfies the condition may be extracted as a candidate for a sentence related to economic activity.

この場合において、キーワード型抽出部６３２は、種別判定部６２０の判定結果に基づいて、上記の文章に含まれる文の個数を決定してもよい。一実施形態において、上記の文の個数は、サンプルデータに含まれるノイズが多いほど、上記の文の個数が少なくなるように設定されてよい。ここで、ノイズとは、経済活動に関連しない文を示す。これにより、評価部６４０に出力される１以上の文にノイズが混入することが抑制され得る。他の実施形態において、上記の文の個数は、サンプルデータに含まれるノイズが多いほど、上記の文の個数が多くなるように設定されてもよい。これにより、経済活動に関連する文の抽出漏れが抑制され得る。 In this case, the keyword type extraction unit 632 may determine the number of sentences included in the above sentence based on the determination result of the type determination unit 620 . In one embodiment, the number of sentences may be set such that the more noise included in the sample data, the fewer the number of sentences. Here, noise refers to sentences that are not related to economic activity. This can prevent noise from being mixed into one or more sentences output to the evaluation unit 640 . In another embodiment, the number of sentences may be set such that the more noise included in the sample data, the greater the number of sentences. As a result, omission of extraction of sentences related to economic activities can be suppressed.
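Extracting a keyword hit together with its neighboring sentences can be sketched as follows. The symmetric `window` parameter, the substring-based keyword test, and the function name `extract_with_context` are assumptions for illustration; the description only states that the number of sentences in the extracted passage may depend on the type determination result.

```python
def extract_with_context(sentences, keywords, window):
    """Return keyword-matching sentences plus up to `window` sentences on
    each side of every match, in original order and without duplicates.

    Assumption: a sentence matches when it contains a keyword as a substring.
    """
    selected = set()
    for i, sentence in enumerate(sentences):
        if any(keyword in sentence for keyword in keywords):
            start = max(0, i - window)
            stop = min(len(sentences), i + window + 1)
            selected.update(range(start, stop))
    return [sentences[i] for i in sorted(selected)]
```

A smaller window suits noisy sources when keeping unrelated sentences out matters most, while a larger window suits cases where missing a relevant sentence is the bigger concern, mirroring the two embodiments described above.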

本実施形態において、機械学習型抽出部６３４は、抽出用モデル構築部５２２が構築した学習モデルを利用して、サンプルデータに含まれる１以上の文の中から、経済活動に関連する文を抽出する。具体的には、機械学習型抽出部６３４は、サンプルデータに含まれる１以上の文の少なくとも一部を学習モデルに入力し、当該学習モデルが、経済活動の状態を示す文又は経済活動に関する評価の理由を示す文であると判定した文を、経済活動に関連する文として抽出する。 In this embodiment, the machine-learning extraction unit 634 uses the learning model constructed by the extraction model construction unit 522 to extract sentences related to economic activity from among one or more sentences included in the sample data. do. Specifically, the machine-learning extraction unit 634 inputs at least part of one or more sentences included in the sample data to a learning model, and the learning model outputs sentences indicating the state of economic activities or evaluations of economic activities. The sentence determined to be a sentence indicating the reason for is extracted as a sentence related to economic activity.

一実施形態において、機械学習型抽出部６３４は、分析対象となるサンプルデータの全てを、学習モデルに入力する。他の実施形態において、機械学習型抽出部６３４は、キーワード型抽出部６３２が経済活動に関連する文の候補として抽出した文を、学習モデルに入力する。さらに他の実施形態において、機械学習型抽出部６３４は、分析対象となるサンプルデータのうち、キーワード型抽出部６３２により抽出されなかった文を、学習モデルに入力する。 In one embodiment, the machine-learning extractor 634 inputs all of the sample data to be analyzed into the learning model. In another embodiment, the machine learning type extraction unit 634 inputs sentences extracted by the keyword type extraction unit 632 as candidate sentences related to economic activity to the learning model. In yet another embodiment, the machine-learning extraction unit 634 inputs sentences not extracted by the keyword-type extraction unit 632 out of the sample data to be analyzed into the learning model.
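The three ways of feeding sentences to the learning model can be compared in a short sketch. The enum `Mode`, the function `extract`, and the callables `keyword_match` and `model_predict` are hypothetical names; treating keyword hits as directly accepted in the third mode is an assumption based on the earlier statement that sentences extracted as related to economic activity are passed to the evaluation unit.

```python
from enum import Enum


class Mode(Enum):
    ML_ALL = 1            # the model sees every sentence
    ML_ON_CANDIDATES = 2  # the model sees only keyword-extracted candidates
    ML_ON_REMAINDER = 3   # the model sees only sentences the keywords missed


def extract(sentences, keyword_match, model_predict, mode):
    """Combine keyword-based and model-based extraction.

    keyword_match and model_predict are callables mapping a sentence to bool.
    """
    if mode is Mode.ML_ALL:
        return [s for s in sentences if model_predict(s)]
    if mode is Mode.ML_ON_CANDIDATES:
        return [s for s in sentences if keyword_match(s) and model_predict(s)]
    # ML_ON_REMAINDER: keyword hits are accepted as-is (assumption); thanks to
    # short-circuit evaluation the model is only called on the remainder.
    return [s for s in sentences if keyword_match(s) or model_predict(s)]
```

The second mode trades recall for precision by using keywords as a pre-filter, whereas the third uses the model to recover relevant sentences that contain no keyword.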

本実施形態において、評価部６４０は、評価対象抽出部６３０が抽出した文に、評価スコアを付与する。評価部６４０は、評価用モデル構築部５２４が構築した学習モデルを用いて、評価対象抽出部６３０が抽出した文に、評価スコアを付与してよい。例えば、評価部６４０は、評価用モデル構築部５２４が構築した学習モデルに、評価対象抽出部６３０が抽出した１以上の文のそれぞれを入力する。評価部６４０は、学習モデルの出力を、各文の評価スコアとして決定する。 In this embodiment, the evaluation unit 640 gives an evaluation score to the sentence extracted by the evaluation target extraction unit 630 . The evaluation unit 640 may assign an evaluation score to the sentence extracted by the evaluation target extraction unit 630 using the learning model constructed by the evaluation model construction unit 524 . For example, the evaluation unit 640 inputs each of the one or more sentences extracted by the evaluation target extraction unit 630 to the learning model constructed by the evaluation model construction unit 524 . The evaluation unit 640 determines the output of the learning model as the evaluation score of each sentence.

評価部６４０は、種別判定部６２０の判定結果を利用して、評価対象抽出部６３０が抽出した文に評価スコアを付与してもよい。例えば、評価部６４０は、評価用モデル構築部５２４が構築した学習モデルに、評価対象抽出部６３０が抽出した１以上の文のそれぞれと、各文に関する種別判定部６２０の判定結果とを入力する。評価部６４０は、学習モデルの出力を、各文の評価スコアとして決定する。 The evaluation unit 640 may use the determination result of the type determination unit 620 to assign an evaluation score to the sentence extracted by the evaluation target extraction unit 630 . For example, the evaluation unit 640 inputs each of the one or more sentences extracted by the evaluation target extraction unit 630 and the determination result of the type determination unit 620 regarding each sentence to the learning model constructed by the evaluation model construction unit 524. . The evaluation unit 640 determines the output of the learning model as the evaluation score of each sentence.

上述されたとおり、各文のデータの種類は、例えば、各文が含まれていたサンプルデータのＵＲＩ、当該サンプルデータの作成者又は更新者、当該サンプルデータに関する情報提供者の属性などに基づき決定される。評価部６４０は、種別判定部６２０の判定結果として、例えば、情報提供者の属性を利用する。これにより、評価部６４０は、評価対象抽出部６３０が抽出した複数の文のそれぞれに対して、各文に関する情報提供者の属性に基づいて、評価対象に関する評価を付与することができる。 As described above, the data type of each sentence is determined based on, for example, the URI of the sample data that contained the sentence, the creator or updater of the sample data, or the attributes of the information provider for the sample data. The evaluation unit 640 uses, for example, the attributes of the information provider as the determination result of the type determination unit 620. This allows the evaluation unit 640 to assign an evaluation regarding the evaluation target to each of the plurality of sentences extracted by the evaluation target extraction unit 630, based on the attributes of the information provider for each sentence.

The evaluation unit 640 may assign evaluation scores such that the deeper an information provider's knowledge of the evaluation target, the greater the influence that information from that provider has on the index generated by the index generation unit 650. For example, when the index generation unit 650 generates an index that can substitute for the Bank of Japan's Tankan survey, information provided by company executives is a more suitable source than information posted by anonymous contributors on an SNS that is open to an unspecified number of users. Accordingly, for example, when the degree of similarity between the attributes of the respondents to a public survey and the attributes of the information provider exceeds a predetermined criterion, the evaluation unit 640 determines a value larger than the output value of the learning model as the evaluation score when that output value is larger than a predetermined value, and determines a value smaller than the output value of the learning model as the evaluation score when that output value is smaller than the predetermined value.
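The adjustment rule described above can be illustrated with a minimal Python sketch. The specification only states the direction of the adjustment, so the linear form, the function name adjust_score, and the default values for similarity_threshold, pivot, and gain are all assumptions made for illustration.

```python
def adjust_score(model_output: float,
                 similarity: float,
                 similarity_threshold: float = 0.8,  # hypothetical "predetermined criterion"
                 pivot: float = 0.5,                 # hypothetical "predetermined value"
                 gain: float = 1.2) -> float:        # hypothetical amplification factor (> 1)
    """Push the score away from the pivot when the information provider
    resembles the respondents of the public survey."""
    if similarity <= similarity_threshold:
        # Provider is not similar enough: keep the model output unchanged.
        return model_output
    # Outputs above the pivot grow, outputs below the pivot shrink.
    return pivot + gain * (model_output - pivot)
```

With a gain greater than 1, an output above the pivot is raised and an output below the pivot is lowered, so sentences from similar providers influence the index more strongly in either direction.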

In this way, the attributes of the information provider associated with each sentence can be taken into account when determining the evaluation score. In surveys used as teacher data, such as the Economy Watchers Survey and the Current Survey of Production, individuals, groups, or corporations with specific attributes are selected as respondents. Therefore, when the attributes of the respondents to the survey used as teacher data match or resemble the attributes of the information provider associated with each sentence input for evaluation, assigning an evaluation score that reflects the degree of similarity improves the reliability of the index generated by the index generation unit 650.

The evaluation unit 640 may also assign an evaluation score to each sentence extracted by the evaluation target extraction unit 630 by using the determination result of the type determination unit 620 to correct the score output by the learning model constructed by the evaluation model construction unit 524. For example, the evaluation unit 640 first inputs each of the one or more sentences extracted by the evaluation target extraction unit 630 into the learning model constructed by the evaluation model construction unit 524, and determines the output of the learning model as a provisional value of the evaluation score of each sentence. Next, the evaluation unit 640 determines, as the evaluation score of each sentence, a value obtained by, for example, multiplying the output of the learning model by a correction coefficient that depends on the data type of that sentence.

The evaluation unit 640 uses, for example, the attribute of the information provider as the determination result of the type determination unit 620. This allows the evaluation unit 640 to assign, to each of the plurality of sentences extracted by the evaluation target extraction unit 630, an evaluation of the evaluation target based on the attribute of the information provider associated with that sentence.

For example, setting the correction coefficient so that its value increases as the degree of association between the information provider and the evaluation target increases can improve the accuracy of the index generated by the index generation unit 650. Likewise, setting the correction coefficient so that its value increases as the accuracy of the information provider's past economic forecasts increases can improve the accuracy of the economic index generated by the index generation unit 650.
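The two-step correction described above (a provisional score from the learning model, then multiplication by a coefficient chosen per data type) might look like the following Python sketch. The attribute labels and coefficient values in CORRECTION_COEFFICIENTS are hypothetical and are not taken from the specification.

```python
# Hypothetical coefficients: providers more closely associated with the
# evaluation target receive a larger coefficient.
CORRECTION_COEFFICIENTS = {
    "executive": 1.5,
    "analyst": 1.2,
    "anonymous_sns": 0.5,
}


def corrected_score(provisional_score: float, data_type: str) -> float:
    """Multiply the provisional model output by the coefficient for the
    sentence's data type; unknown types are left uncorrected (coefficient 1.0)."""
    coefficient = CORRECTION_COEFFICIENTS.get(data_type, 1.0)
    return provisional_score * coefficient
```

Falling back to a coefficient of 1.0 for unknown data types is a design choice of this sketch; the specification does not say how unclassified providers are handled.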

In this embodiment, the index generation unit 650 calculates an index indicating the state of economic activity in the analysis target period, based on the evaluation scores assigned to the plurality of sentences extracted by the evaluation target extraction unit 630. The method of calculating the index is determined as appropriate for the type of index. This allows the index estimation system 100 to output an estimated value of the index for the analysis target period.

In one embodiment, the index generation unit 650 calculates the index by aggregating the evaluation scores assigned to the plurality of sentences extracted by the evaluation target extraction unit 630. In another embodiment, the index generation unit 650 calculates the index using a statistic of those evaluation scores. Examples of the statistic include the mean, the median, quartiles, and the variance. In yet another embodiment, the index generation unit 650 calculates the index by substituting at least one of the evaluation scores and their statistics into a predetermined formula, or by processing them according to a predetermined algorithm. The index generation unit 650 may use the determination result of the type determination unit 620 to determine the parameters of the formula or algorithm. The index generation unit 650 may also use the attributes of the information provider to determine those parameters.
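As a rough illustration of the aggregation and statistic-based embodiments, the sketch below computes an index from a list of evaluation scores using only the Python standard library. The function name compute_index and its method argument are assumptions of this sketch; the specification leaves the actual formula open.

```python
from statistics import mean, median
from typing import Sequence


def compute_index(scores: Sequence[float], method: str = "mean") -> float:
    """Compute an index from per-sentence evaluation scores.

    method: "sum" aggregates the scores; "mean" and "median" use a statistic.
    """
    if not scores:
        raise ValueError("at least one evaluation score is required")
    if method == "sum":
        return sum(scores)
    if method == "mean":
        return mean(scores)
    if method == "median":
        return median(scores)
    raise ValueError(f"unsupported method: {method}")
```

For example, compute_index([1.0, 2.0, 3.0, 6.0], "median") uses the median, which is less sensitive to a single extreme score than the mean.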

The index generation unit 650 may normalize the calculated index. In one embodiment, the index generation unit 650 normalizes the index using the maximum and minimum values of the index over a period longer than the analysis target period. In another embodiment, the index generation unit 650 may normalize the index using the number of sentences to which evaluation scores have been assigned.
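The two normalization options can be sketched as follows. Min-max scaling is one plausible reading of "using the maximum and minimum values", and dividing by the sentence count is one plausible reading of "using the number of sentences"; both function names and both interpretations are assumptions of this sketch.

```python
from typing import Sequence


def min_max_normalize(value: float, history: Sequence[float]) -> float:
    """Scale the index to [0, 1] using the minimum and maximum of the index
    over a longer reference period (history)."""
    lowest, highest = min(history), max(history)
    if highest == lowest:
        # Degenerate history: no range to scale against.
        return 0.0
    return (value - lowest) / (highest - lowest)


def normalize_by_count(aggregated_score: float, sentence_count: int) -> float:
    """Divide an aggregated score by the number of scored sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return aggregated_score / sentence_count
```

Normalizing by sentence count keeps the index comparable between periods in which very different numbers of sentences were extracted.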

The type determination unit 620 may be an example of a type information acquisition unit. The evaluation target extraction unit 630 may be an example of a period information acquisition unit and of an extraction unit. The keyword-type extraction unit 632 may be an example of a condition acquisition unit and of a second extraction unit. The machine-learning-type extraction unit 634 may be an example of a first extraction unit. The evaluation unit 640 may be an example of an evaluation assigning unit. The index generation unit 650 may be an example of an index calculation unit.

FIG. 7 schematically shows an example of the internal configuration of the machine-learning-type extraction unit 634. In this embodiment, the machine-learning-type extraction unit 634 includes a learning model 720 and a determination unit 740. In this embodiment, the learning model 720 includes a sentence classifier 722, a sentence classifier 724, a sentence classifier 726, and a sentence classifier 728.

In this embodiment, each of the sentence classifier 722, the sentence classifier 724, the sentence classifier 726, and the sentence classifier 728 included in the learning model 720 outputs, to the determination unit 740, a score indicating the likelihood that an input sentence is a sentence indicating the state of economic activity or the reason for an evaluation of economic activity. When the sum of the scores output by the sentence classifier 722, the sentence classifier 724, the sentence classifier 726, and the sentence classifier 728 is greater than a predetermined threshold, the determination unit 740 extracts the input sentence as a sentence related to economic activity.
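The decision rule of the determination unit 740 (sum the classifier scores and compare the total with a threshold) can be sketched in Python as follows. The two toy classifiers are hypothetical stand-ins for the trained sentence classifiers 722 to 728, and the function names are assumptions of this sketch.

```python
from typing import Callable, Iterable, List

# A sentence classifier maps a sentence to a likelihood score.
SentenceClassifier = Callable[[str], float]


def is_related(sentence: str,
               classifiers: Iterable[SentenceClassifier],
               threshold: float) -> bool:
    """Return True when the summed classifier scores exceed the threshold."""
    total = sum(classifier(sentence) for classifier in classifiers)
    return total > threshold


def extract_related(sentences: Iterable[str],
                    classifiers: Iterable[SentenceClassifier],
                    threshold: float) -> List[str]:
    """Keep only the sentences judged to be related to economic activity."""
    classifier_list = list(classifiers)  # allow reuse across sentences
    return [s for s in sentences if is_related(s, classifier_list, threshold)]


# Hypothetical stand-ins for trained classifiers (not from the specification).
def toy_classifier_a(sentence: str) -> float:
    return 0.9 if "sales" in sentence else 0.1


def toy_classifier_b(sentence: str) -> float:
    return 0.8 if "orders" in sentence else 0.2
```

Summing the scores lets a strongly confident classifier compensate for a weakly confident one, which differs from majority voting, where each classifier contributes a single vote.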

In this embodiment, the sentence classifier 722 uses a TF-IDF model as the sentence embedding generator and an LR model as the classifier. The sentence classifier 724 uses an LSTM model as the sentence embedding generator and an NN model as the classifier. The sentence classifier 726 uses a CNN model as the sentence embedding generator and an NN model as the classifier. The sentence classifier 728 uses a SWEM model as the sentence embedding generator and an LR model as the classifier.
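To make the "embedding generator plus classifier" split concrete, the sketch below pairs an average-pooling sentence embedding with a logistic-regression scorer, roughly in the spirit of the sentence classifier 728. It assumes that SWEM refers to pooling word embeddings (here, the average-pooling variant) and that LR denotes logistic regression; the toy embedding table, weights, and function names are hypothetical and untrained.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 2-dimensional word embeddings (illustrative values only).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "sales": [1.0, 0.0],
    "rose": [0.5, 0.5],
    "weather": [0.0, 1.0],
}


def swem_average(tokens: Sequence[str],
                 embeddings: Dict[str, List[float]],
                 dim: int) -> List[float]:
    """Average the embeddings of known tokens to obtain a sentence embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known token: fall back to the zero vector
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def lr_score(features: Sequence[float],
             weights: Sequence[float],
             bias: float) -> float:
    """Logistic-regression score: sigmoid of the weighted feature sum."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the embeddings and the logistic-regression weights would be learned from the teacher data; the values here exist only to make the sketch executable.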

The configuration of the learning model 720 may be determined based on the determination result of the type determination unit 620. For example, the types of models used in the sentence classifiers that make up the learning model 720 are determined based on the determination result of the type determination unit 620. The combination of models used in those sentence classifiers may be determined based on the determination result of the type determination unit 620. The number of sentence classifiers that make up the learning model 720 may also be determined based on the determination result of the type determination unit 620.

With reference to FIGS. 8, 9, and 10, an overview is given of the information processing by which the evaluation target extraction unit 630 uses the keyword-type extraction unit 632 and the machine-learning-type extraction unit 634 to extract sentences related to economic activity from among one or more sentences included in the sample data. FIG. 8 schematically shows an example of information processing in the evaluation target extraction unit 630. FIG. 9 schematically shows another example of information processing in the evaluation target extraction unit 630. FIG. 10 schematically shows yet another example of information processing in the evaluation target extraction unit 630.

Note that the information processing in the evaluation target extraction unit 630 is not limited to these embodiments. In another embodiment, the evaluation target extraction unit 630 uses only the machine-learning-type extraction unit 634 to extract sentences related to economic activity from among the one or more sentences included in the sample data.

According to the embodiment shown in FIG. 8, all of the sample data to be analyzed is first input to the keyword-type extraction unit 632. In this embodiment, all sentences extracted by the keyword-type extraction unit 632 are input to the machine-learning-type extraction unit 634. Sentences not extracted by the keyword-type extraction unit 632, on the other hand, are not input to the machine-learning-type extraction unit 634.

According to the embodiment shown in FIG. 9, all of the sample data to be analyzed is first input to the keyword-type extraction unit 632. In this embodiment, sentences not extracted by the keyword-type extraction unit 632 are input to the machine-learning-type extraction unit 634. Sentences extracted by the keyword-type extraction unit 632, on the other hand, are output to the evaluation unit 640 without being input to the machine-learning-type extraction unit 634.

According to the embodiment shown in FIG. 10, all of the sample data to be analyzed is first input to the keyword-type extraction unit 632. In this embodiment, some of the sentences extracted by the keyword-type extraction unit 632 are output to the evaluation unit 640 without being input to the machine-learning-type extraction unit 634. The remaining sentences extracted by the keyword-type extraction unit 632 are input to the machine-learning-type extraction unit 634.

For example, consider a case in which the keyword-type extraction unit 632 extracts, as candidates for sentences related to economic activity, passages that consist of two or more consecutive sentences and include at least one of: a sentence containing a keyword, a sentence matching a key phrase, a sentence containing a word similar to a keyword, and a sentence matching a condition similar to a key phrase. In this case, the sentences that themselves contain a keyword, match a key phrase, contain a word similar to a keyword, or match a condition similar to a key phrase are relatively likely to be related to economic activity. These sentences are therefore output to the evaluation unit 640 without being input to the machine-learning-type extraction unit 634. The remaining sentences extracted by the keyword-type extraction unit 632, by contrast, are relatively unlikely to be related to economic activity, so they are input to the machine-learning-type extraction unit 634.
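The three routing patterns of FIGS. 8, 9, and 10 can be compared side by side in a short Python sketch. The predicates keyword_hit and ml_hit are hypothetical stand-ins for the keyword-type extraction unit 632 and the machine-learning-type extraction unit 634, and representing a passage as a list of consecutive sentences is an assumption of this sketch.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[str], bool]


def route_fig8(sentences: Sequence[str],
               keyword_hit: Predicate, ml_hit: Predicate) -> List[str]:
    """FIG. 8: only keyword-extracted sentences are passed to the ML extractor."""
    # "and" short-circuits, so ml_hit is never called on keyword misses.
    return [s for s in sentences if keyword_hit(s) and ml_hit(s)]


def route_fig9(sentences: Sequence[str],
               keyword_hit: Predicate, ml_hit: Predicate) -> List[str]:
    """FIG. 9: keyword hits go straight to evaluation; misses go to the ML extractor."""
    # "or" short-circuits, so ml_hit is never called on keyword hits.
    return [s for s in sentences if keyword_hit(s) or ml_hit(s)]


def route_fig10(passages: Sequence[Sequence[str]],
                keyword_hit: Predicate, ml_hit: Predicate) -> List[str]:
    """FIG. 10: a passage is a candidate if any of its sentences is a keyword hit.
    Within a candidate passage, keyword hits bypass the ML extractor and the
    remaining sentences are checked by it."""
    selected: List[str] = []
    for passage in passages:
        if not any(keyword_hit(s) for s in passage):
            continue  # passage not extracted by the keyword stage
        for sentence in passage:
            if keyword_hit(sentence) or ml_hit(sentence):
                selected.append(sentence)
    return selected
```

FIG. 8 reduces the load on the machine-learning stage at the cost of recall, FIG. 9 favors recall, and FIG. 10 sits between them by spending the machine-learning stage only on the less certain sentences of candidate passages.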

FIG. 11 schematically shows an example of a data table 1100. In this embodiment, each row of the data table 1100 may be an example of an evaluation result of the evaluation unit 640. In this embodiment, the data table 1100 holds a sample ID 1112, a sentence ID 1114, information 1116 indicating a recording time, and information 1118 indicating an evaluation score. The sample ID 1112, the sentence ID 1114, and the information 1116 indicating the recording time may have the same configuration as the sample ID 412, the sentence ID 414, and the information 416 indicating the recording time, respectively. The information 1118 indicating the evaluation score indicates the evaluation score assigned by the evaluation unit 640.
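One way to picture a row of the data table 1100 is as a small record type, as in the Python sketch below. The field names, the field types, and the helper scores_in_period are assumptions made for illustration; the specification does not define a concrete schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class EvaluationRow:
    sample_id: str         # corresponds to sample ID 1112
    sentence_id: str       # corresponds to sentence ID 1114
    recorded_at: datetime  # corresponds to information 1116 (recording time)
    score: float           # corresponds to information 1118 (evaluation score)


def scores_in_period(rows: Iterable[EvaluationRow],
                     start: datetime, end: datetime) -> List[float]:
    """Collect the scores of rows recorded in the half-open period [start, end)."""
    return [row.score for row in rows if start <= row.recorded_at < end]
```

Filtering rows by recording time in this way would yield the set of scores for an analysis target period, which could then be fed to an index calculation.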

FIG. 12 shows an example of a computer 3000 in which aspects of the present invention may be embodied in whole or in part. For example, the index estimation system 100 is implemented by the computer 3000.

A program installed on the computer 3000 can cause the computer 3000 to perform operations associated with a device according to an embodiment of the present invention or to function as one or more "units" of that device, can cause the computer 3000 to execute those operations or those one or more "units", and/or can cause the computer 3000 to execute a process according to an embodiment of the present invention or steps of that process. Such a program may be executed by the CPU 3012 to cause the computer 3000 to perform specific operations associated with some or all of the blocks of the flowcharts and block diagrams described in this specification.

The computer 3000 according to this embodiment includes a CPU 3012, a RAM 3014, a graphics controller 3016, and a display device 3018, which are interconnected by a host controller 3010. The computer 3000 also includes input/output units such as a communication interface 3022, a hard disk drive 3024, a DVD-ROM drive 3026, and an IC card drive, which are connected to the host controller 3010 via an input/output controller 3020. The computer further includes legacy input/output units such as a ROM 3030 and a keyboard 3042, which are connected to the input/output controller 3020 via an input/output chip 3040.

The CPU 3012 operates according to programs stored in the ROM 3030 and the RAM 3014, and thereby controls each unit. The graphics controller 3016 acquires image data generated by the CPU 3012 in a frame buffer or the like provided in the RAM 3014 or in the graphics controller itself, and causes the image data to be displayed on the display device 3018.

The communication interface 3022 communicates with other electronic devices via a network. The hard disk drive 3024 stores programs and data used by the CPU 3012 in the computer 3000. The DVD-ROM drive 3026 reads programs or data from the DVD-ROM 3001 and provides them to the hard disk drive 3024 via the RAM 3014. The IC card drive reads programs and data from an IC card and/or writes programs and data to the IC card.

The ROM 3030 stores a boot program or the like that is executed by the computer 3000 at activation, and/or programs that depend on the hardware of the computer 3000. The input/output chip 3040 may also connect various input/output units to the input/output controller 3020 via a parallel port, a serial port, a keyboard port, a mouse port, and the like.

The program is provided on a computer-readable storage medium such as the DVD-ROM 3001 or an IC card. The program is read from the computer-readable storage medium, installed in the hard disk drive 3024, the RAM 3014, or the ROM 3030, which are themselves examples of computer-readable storage media, and executed by the CPU 3012. The information processing described in these programs is read by the computer 3000 and brings about cooperation between the programs and the various types of hardware resources described above. A device or method may be constituted by realizing the operation or processing of information through the use of the computer 3000.

For example, when communication is performed between the computer 3000 and an external device, the CPU 3012 may execute a communication program loaded in the RAM 3014 and, based on the processing described in the communication program, instruct the communication interface 3022 to perform communication processing. Under the control of the CPU 3012, the communication interface 3022 reads transmission data stored in a transmission buffer area provided in a recording medium such as the RAM 3014, the hard disk drive 3024, the DVD-ROM 3001, or an IC card, and transmits the read data to the network, or writes data received from the network into a reception buffer area or the like provided on the recording medium.

The CPU 3012 may also cause all or a necessary portion of a file or database stored in an external recording medium, such as the hard disk drive 3024, the DVD-ROM drive 3026 (DVD-ROM 3001), or an IC card, to be read into the RAM 3014, and may perform various types of processing on the data in the RAM 3014. The CPU 3012 may then write the processed data back to the external recording medium.

Various types of information, such as various types of programs, data, tables, and databases, may be stored in the recording medium and subjected to information processing. The CPU 3012 may perform various types of processing on data read from the RAM 3014, including the various types of operations, information processing, conditional judgment, conditional branching, unconditional branching, and information search and replacement that are described throughout this disclosure and specified by the instruction sequence of a program, and writes the results back to the RAM 3014. The CPU 3012 may also search for information in files, databases, and the like in the recording medium. For example, when a plurality of entries, each having an attribute value of a first attribute associated with an attribute value of a second attribute, are stored in the recording medium, the CPU 3012 may search the plurality of entries for an entry whose attribute value of the first attribute matches a specified condition, read the attribute value of the second attribute stored in that entry, and thereby acquire the attribute value of the second attribute associated with the first attribute that satisfies the predetermined condition.
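The entry search described above amounts to filtering on the first attribute and reading out the associated second attribute. A minimal Python sketch follows; the tuple-based entry layout and the function name lookup_second_attribute are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

First = TypeVar("First")
Second = TypeVar("Second")


def lookup_second_attribute(
    entries: Iterable[Tuple[First, Second]],
    condition: Callable[[First], bool],
) -> List[Second]:
    """Return the second-attribute values of all entries whose
    first-attribute value satisfies the given condition."""
    return [second for first, second in entries if condition(first)]
```

For instance, with entries pairing a sentence ID with an evaluation score, a condition on the sentence ID returns the scores of the matching sentences.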

The programs or software modules described above may be stored in a computer-readable storage medium on or near the computer 3000. A recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can also be used as the computer-readable storage medium, so that the programs are provided to the computer 3000 via the network.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments. It is apparent to those skilled in the art that various modifications and improvements can be made to the embodiments. It is clear from the claims that forms incorporating such modifications or improvements can also be included in the technical scope of the present invention.

It should be noted that the processes, such as operations, procedures, steps, and stages, in the devices, systems, programs, and methods shown in the claims, the specification, and the drawings may be executed in any order unless the order is explicitly indicated by expressions such as "before" or "prior to", and unless the output of an earlier process is used in a later process. Even if the operation flows in the claims, the specification, and the drawings are described using "first", "next", and the like for convenience, this does not mean that they must be performed in that order.

10 communication network, 12 user terminal, 14 teacher data providing server, 16 sample data providing server, 100 index estimation system, 122 communication unit, 124 input/output unit, 126 storage unit, 128 request reception unit, 142 teacher data acquisition unit, 144 model construction unit, 162 sample data acquisition unit, 164 text data generation unit, 166 index estimation unit, 222 setting information storage unit, 224 teacher data storage unit, 226 sample data storage unit, 228 model information storage unit, 300 data table, 312 information, 314 information, 316 information, 322 information, 324 information, 326 information, 400 data table, 412 sample ID, 414 sentence ID, 416 information, 418 information, 420 information, 522 extraction model construction unit, 524 evaluation model construction unit, 620 type determination unit, 630 evaluation target extraction unit, 632 keyword-type extraction unit, 634 machine-learning-type extraction unit, 640 evaluation unit, 650 index generation unit, 720 learning model, 722 sentence classifier, 724 sentence classifier, 726 sentence classifier, 728 sentence classifier, 740 determination unit, 1100 data table, 1112 sample ID, 1114 sentence ID, 1116 information, 1118 information, 3000 computer, 3001 DVD-ROM, 3010 host controller, 3012 CPU, 3014 RAM, 3016 graphics controller, 3018 display device, 3020 input/output controller, 3022 communication interface, 3024 hard disk drive, 3026 DVD-ROM drive, 3030 ROM, 3040 input/output chip, 3042 keyboard

Claims

A machine learning device comprising:
a first model construction unit that constructs a first learning model for determining whether or not an input sentence is a sentence indicating an evaluation of an evaluation target, a state of the evaluation target, or a reason for the evaluation, using as teacher data one or more descriptions included in evaluation information in which (i) an evaluation regarding the evaluation target and (ii) the one or more descriptions indicating the state of the evaluation target or the reason for the evaluation are associated with each other; and
an extraction unit that extracts sentences related to the evaluation target from among one or more sentences included in text data, using the first learning model constructed by the first model construction unit,
wherein the text data includes information presented by an information provider's utterances or gestures, or information perceived by the information provider,
the machine learning device further comprising:
a second model construction unit that constructs, using the evaluation information as teacher data, a second learning model for assigning an evaluation regarding the evaluation target to an input sentence;
an evaluation assigning unit that assigns an evaluation regarding the evaluation target to the sentences extracted by the extraction unit, using the second learning model constructed by the second model construction unit;
an index calculation unit that calculates, based on the evaluations assigned by the evaluation assigning unit, an index indicating the state or trend of the evaluation target in a specific period; and
a text data acquisition unit that acquires each of a plurality of text data in association with attribute information indicating an attribute of the information provider of that text data,
wherein the evaluation assigning unit assigns, to each of a plurality of sentences extracted by the extraction unit from at least a part of the plurality of text data, an evaluation regarding the evaluation target based on the attribute of the information provider indicated by the attribute information corresponding to the text data that contains the sentence, and
the index calculation unit calculates the index based on the evaluation assigned to each of the plurality of sentences by the evaluation assigning unit.
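
By way of non-limiting illustration, the flow of the device above (first model, extraction, attribute-based evaluation, index) could be sketched as follows. The token-based models are toy stand-ins, not the learning models of the invention, and the names `EvaluationRecord`, `background`, `weights`, the +1/-1 evaluation scale, and the use of a mean as the index are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class EvaluationRecord:
    evaluation: int     # assumed scale: +1 favourable, -1 unfavourable
    descriptions: list  # sentences stating the state or the reason

def tokens(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())

def split_sentences(text):
    return [s.strip() for s in text.split(".") if s.strip()]

def build_first_model(records, background):
    # Toy first model: a sentence counts as relevant if it shares a word with
    # the descriptions that does not also occur in the (hypothetical)
    # background sentences used here as negative examples.
    described = {t for r in records for d in r.descriptions for t in tokens(d)}
    common = {t for s in background for t in tokens(s)}
    vocabulary = described - common
    return lambda sentence: any(t in vocabulary for t in tokens(sentence))

def build_second_model(records):
    # Toy second model: each word carries the mean evaluation of the
    # descriptions it appeared in; a sentence gets the mean over its words.
    total, count = defaultdict(float), Counter()
    for r in records:
        for d in r.descriptions:
            for t in set(tokens(d)):
                total[t] += r.evaluation
                count[t] += 1
    average = {t: total[t] / count[t] for t in total}
    def evaluate(sentence):
        values = [average[t] for t in tokens(sentence) if t in average]
        return sum(values) / len(values) if values else 0.0
    return evaluate

def calculate_index(texts, is_relevant, evaluate, weights):
    # Extract relevant sentences, scale each evaluation by a weight tied to
    # the provider's attribute, and average the results into one index.
    evaluations = []
    for text, attribute in texts:
        weight = weights.get(attribute, 1.0)
        for sentence in split_sentences(text):
            if is_relevant(sentence):
                evaluations.append(weight * evaluate(sentence))
    return sum(evaluations) / len(evaluations) if evaluations else 0.0

records = [
    EvaluationRecord(+1, ["Orders are increasing"]),
    EvaluationRecord(-1, ["Orders are falling sharply"]),
]
background = ["We are meeting the client on Monday"]
texts = [
    ("Visited the client on Monday. Orders are increasing.", "manager"),
    ("Orders are falling.", "staff"),
]
is_relevant = build_first_model(records, background)
evaluate = build_second_model(records)
index = calculate_index(texts, is_relevant, evaluate, {"manager": 2.0, "staff": 1.0})
```

In this sketch the attribute of the information provider only scales the evaluation; other ways of reflecting the attribute are equally conceivable.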

The machine learning device according to claim 1, wherein the extraction unit has:
a first extraction unit that inputs at least a part of the one or more sentences included in the text data into the first learning model, and extracts, as a sentence related to the evaluation target, a sentence that the first learning model determines to be a sentence indicating the state of the evaluation target or the reason for the evaluation;
a condition acquisition unit that acquires information indicating a keyword or key phrase related to the evaluation target; and
a second extraction unit that extracts, from among the one or more sentences included in the text data, at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase, as a sentence related to the evaluation target.
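
As a non-limiting illustration, the keyword-based extraction described above could be sketched as follows and combined with a model-based extraction. The function names, the use of regular expressions for key phrases, and the use of `difflib` string similarity as the notion of a "similar word" are assumptions of this example; the claim does not prescribe any particular similarity measure.

```python
import difflib
import re

def second_extraction(sentences, keywords, key_phrases=(), cutoff=0.8):
    """Return sentences that contain a keyword, match a key phrase,
    or contain a word similar to a keyword."""
    hits = []
    for sentence in sentences:
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        contains = any(k in words for k in keywords)
        matches = any(re.search(p, sentence, re.IGNORECASE) for p in key_phrases)
        similar = any(
            difflib.get_close_matches(k, words, n=1, cutoff=cutoff)
            for k in keywords
        )
        if contains or matches or similar:
            hits.append(sentence)
    return hits

def extract(sentences, is_relevant, keywords, key_phrases=()):
    """Union of the model-based and keyword-based extractions,
    kept in the original sentence order."""
    by_keyword = set(second_extraction(sentences, keywords, key_phrases))
    return [s for s in sentences if is_relevant(s) or s in by_keyword]

sentences = [
    "Shipments dropped this week",
    "Lunch was good",
    "Demand looks weak",
    "The shipment arrived late",
]
keyword_hits = second_extraction(sentences, ["shipment"])
# The lambda is a hypothetical stand-in for the first learning model.
related = extract(sentences, lambda s: "demand" in s.lower(), ["shipment"])
```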

The machine learning device according to claim 2, wherein
the second extraction unit extracts, from among the one or more sentences included in the text data, at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase, as a sentence related to the evaluation target or as a candidate for a sentence related to the evaluation target, and
the first extraction unit:
inputs the sentence extracted by the second extraction unit as a candidate for a sentence related to the evaluation target into the first learning model; and
extracts, as a sentence related to the evaluation target, a sentence that the first learning model determines to be a sentence indicating the state of the evaluation target or the reason for the evaluation.

The machine learning device according to claim 1, wherein the extraction unit has:
a first extraction unit that inputs at least a part of the one or more sentences included in the text data into the first learning model, and extracts, as a sentence related to the evaluation target, a sentence that the first learning model determines to be a sentence indicating the state of the evaluation target or the reason for the evaluation;
a condition acquisition unit that acquires information indicating a keyword or key phrase related to the evaluation target; and
a second extraction unit that extracts, from among the one or more sentences included in the text data, at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase, as a sentence related to the evaluation target or as a candidate for a sentence related to the evaluation target,
wherein the first extraction unit:
inputs the sentence extracted by the second extraction unit as a candidate for a sentence related to the evaluation target into the first learning model; and
extracts, as a sentence related to the evaluation target, a sentence that the first learning model determines to be a sentence indicating the state of the evaluation target or the reason for the evaluation.

The machine learning device according to claim 3 or 4, wherein the second extraction unit extracts, as a candidate for a sentence related to the evaluation target, a passage that contains two or more consecutive sentences and that includes at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase.
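
As a non-limiting illustration, extracting a run of consecutive sentences around each matching sentence could be sketched as follows. The window sizes `before` and `after` and the `matches` predicate are hypothetical parameters introduced for this example only.

```python
def candidate_passages(sentences, matches, before=1, after=1):
    """For each sentence satisfying `matches`, return a passage made of that
    sentence and its neighbouring sentences, joined in their original order."""
    passages = []
    for i, sentence in enumerate(sentences):
        if matches(sentence):
            start = max(0, i - before)
            end = min(len(sentences), i + after + 1)
            passages.append(" ".join(sentences[start:end]))
    return passages

sentences = ["We met the buyer.", "Sales fell.", "Stock is piling up.", "Lunch followed."]
passages = candidate_passages(sentences, lambda s: "sales" in s.lower())
```

Passing the surrounding sentences along with the matching one lets a later model-based step see context that a single short sentence would lack.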

The machine learning device according to any one of claims 3 to 5, further comprising a type information acquisition unit that acquires type information for distinguishing types of the text data,
wherein the second extraction unit determines, based on the type of the text data indicated by the type information, whether to extract at least one of a sentence containing the keyword, a sentence matching the key phrase, a sentence containing a word similar to the keyword, and a sentence matching a condition similar to the key phrase, from among the one or more sentences included in the text data, as a sentence related to the evaluation target or as a candidate for a sentence related to the evaluation target.
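
As a non-limiting illustration, the type-dependent handling described above could be sketched as follows. The type label `"daily_report"`, the set `DIRECT_TYPES`, and both predicates are hypothetical; which types are trusted enough to skip the model check is a design choice the claim leaves open.

```python
# Hypothetical set of text types whose keyword hits are accepted directly.
DIRECT_TYPES = {"daily_report"}

def extract_by_type(sentences, text_type, keyword_match, is_relevant):
    """Keyword hits are final results for trusted types; for other types they
    are only candidates and must also pass the first learning model."""
    hits = [s for s in sentences if keyword_match(s)]
    if text_type in DIRECT_TYPES:
        return hits
    return [s for s in hits if is_relevant(s)]

sentences = ["Orders rose", "Orders of lunch arrived", "Nothing to report"]
keyword_match = lambda s: "orders" in s.lower()
is_relevant = lambda s: "rose" in s.lower()  # stand-in for the first model
direct = extract_by_type(sentences, "daily_report", keyword_match, is_relevant)
checked = extract_by_type(sentences, "chat_log", keyword_match, is_relevant)
```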

The machine learning device according to any one of claims 1 to 6, wherein the first learning model includes a sentence classifier that classifies an input sentence as either a sentence indicating the state of the evaluation target or the reason for the evaluation, or a sentence that does not indicate the state of the evaluation target or the reason for the evaluation.

The machine learning device according to any one of claims 1 to 7, further comprising:
an index calculation unit that calculates, based on the evaluations assigned by the evaluation assigning unit, an index indicating the state or trend of the evaluation target in a specific period;
a period information acquisition unit that acquires information indicating the specific period; and
a text data acquisition unit that acquires each of a plurality of text data in association with time information indicating a time to which the content of that text data relates, a time at which that text data was recorded, or a time at which an electronic file containing that text data was created or updated,
wherein the extraction unit extracts sentences related to the evaluation target from among a plurality of sentences included in those text data, among the plurality of text data, for which the time indicated by the associated time information falls within the specific period,
the evaluation assigning unit assigns an evaluation regarding the evaluation target to each of a plurality of sentences extracted by the extraction unit from at least a part of the plurality of text data, and
the index calculation unit calculates the index based on the evaluation assigned to each of the plurality of sentences by the evaluation assigning unit.
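
As a non-limiting illustration, restricting the index to text data whose time information falls within the specific period could be sketched as follows. The `extract` and `evaluate` callables stand in for the extraction unit and the evaluation assigning unit, and taking the mean of the evaluations as the index is an assumption of this example.

```python
from datetime import date

def index_for_period(documents, start, end, extract, evaluate):
    """Average the evaluations of sentences extracted from documents whose
    associated date lies in [start, end]; None if nothing qualifies."""
    evaluations = []
    for text, when in documents:
        if start <= when <= end:
            evaluations.extend(evaluate(s) for s in extract(text))
    return sum(evaluations) / len(evaluations) if evaluations else None

documents = [
    ("good. bad.", date(2019, 1, 10)),
    ("good.", date(2019, 2, 5)),
    ("bad.", date(2019, 3, 1)),  # outside the period used below
]
extract = lambda text: [s.strip() for s in text.split(".") if s.strip()]
evaluate = lambda s: 1 if "good" in s else -1
index = index_for_period(documents, date(2019, 1, 1), date(2019, 2, 28), extract, evaluate)
```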

The machine learning device according to any one of claims 1 to 8, wherein the information presented by the information provider's utterances or gestures, or the information perceived by the information provider, is information described in a daily business report or a daily business report.

A program for causing a computer to function as the machine learning device according to any one of claims 1 to 9.

A machine learning method comprising:
a first model construction step in which a computer constructs a first learning model for determining whether or not an input sentence is a sentence indicating a state of an evaluation target, an evaluation of the evaluation target, or a reason for the evaluation, using as teacher data one or more descriptions included in evaluation information in which (i) an evaluation regarding the evaluation target and (ii) the one or more descriptions indicating the state of the evaluation target or the reason for the evaluation are associated with each other; and
an extraction step in which the computer extracts sentences related to the evaluation target from among one or more sentences included in text data, using the first learning model constructed in the first model construction step,
wherein the text data includes information presented by an information provider's utterances or gestures, or information perceived by the information provider,
the machine learning method further comprising:
a second model construction step in which the computer constructs, using the evaluation information as teacher data, a second learning model for assigning an evaluation regarding the evaluation target to an input sentence;
an evaluation assigning step in which the computer assigns an evaluation regarding the evaluation target to the sentences extracted in the extraction step, using the second learning model constructed in the second model construction step;
an index calculation step in which the computer calculates, based on the evaluations assigned in the evaluation assigning step, an index indicating the state or trend of the evaluation target in a specific period; and
a text data acquisition step in which the computer acquires each of a plurality of text data in association with attribute information indicating an attribute of the information provider of that text data,
wherein the evaluation assigning step includes assigning, to each of a plurality of sentences extracted in the extraction step from at least a part of the plurality of text data, an evaluation regarding the evaluation target based on the attribute of the information provider indicated by the attribute information corresponding to the text data that contains the sentence, and
the index calculation step includes calculating the index based on the evaluation assigned to each of the plurality of sentences in the evaluation assigning step.