JP7410066B2

JP7410066B2 - Information provision device, information provision method, and information provision program

Info

Publication number: JP7410066B2
Application number: JP2021024492A
Authority: JP
Inventors: 隼人小林; 徹清水; 立日暮; 毅司増山
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-02-18
Filing date: 2021-02-18
Publication date: 2024-01-09
Anticipated expiration: 2041-02-18
Also published as: JP2022126428A

Description

The present invention relates to an information providing device, an information providing method, and an information providing program.

In recent years, so-called Q&A (question answering) sites have become known, on which users share knowledge and wisdom over the Internet: one user (the questioner) posts a question, and other users (respondents) post answers to it. Because this type of Q&A site accumulates a large number of questions and answers (hereinafter, "question answers"), a service is desired that uses these question answers to create FAQs (Frequently Asked Questions), that is, collections of questions and answers in various categories. Conventionally, a technique for creating an FAQ from the clustering results of a plurality of question answers is known.

Japanese Unexamined Patent Application Publication No. 2020-166426 (JP2020-166426A)

However, the conventional technique described above does not always provide an FAQ (question and answer collection) that matches the user's intention.

For example, if the granularity or axis (direction) of the question answers in a category varies, that variation remains even when an FAQ is created from the clustering results of those question answers, and an FAQ that follows the user's intention may not be provided.

The present application has been made in view of the above, and its object is to provide a question and answer collection that matches the user's intention.

The information providing device according to the present application includes: a learning processing unit that performs distance learning, based on a user's instruction, on a plurality of question answers mapped to a predetermined semantic space; a clustering processing unit that clusters the distance-learned question answers; and an FAQ creation unit that summarizes each cluster containing the clustered question answers to create a question and answer collection.

According to one aspect of the embodiment, a question and answer collection that matches the user's intention can be created.

FIG. 1 is a diagram showing an example of the information providing device according to the present embodiment.
FIG. 2 is a diagram showing a configuration example of the information providing device according to the present embodiment.
FIG. 3 is a diagram showing an example of information stored in the question and answer storage unit according to the present embodiment.
FIG. 4 is a flowchart showing an example of the flow of processing according to the present embodiment.
FIG. 5 is a diagram showing an example of a hardware configuration.

Modes for carrying out the information providing device, information providing method, and information providing program according to the present application (hereinafter, "embodiments") are described in detail below with reference to the drawings. These embodiments do not limit the information providing device, information providing method, or information providing program according to the present application.

[Embodiment]
[1. Overview of the information providing device]
First, an example of the information providing device 10 according to the present embodiment is described with reference to FIG. 1. The following description takes, as an example of the processing executed by the information providing device 10, the automatic creation of an FAQ (question and answer collection) for a predetermined category specified by the user, using question and answer information accumulated on a so-called Q&A site. On this type of Q&A site, users share knowledge and wisdom over the Internet: one user (the questioner) posts a question, and other users (respondents) post answers to it.

In the present embodiment, a question text and an answer text are each text data containing one or more sentences. A sentence is a range of text data delimited by periods, exclamation marks, question marks, blanks, and the like. A question text is a document that is processed as having been created with the intention of asking a question, and an answer text is a document that is processed as having been created with the intention of answering a question text. In the following description, the person who instructs the information providing device 10 to create an FAQ is referred to as the "user" (利用者), as distinct from the Q&A site users (ユーザ) who post questions and answers. Information received from each Q&A site user, such as a question text and its corresponding answer text, may be collectively referred to as "question and answer information" or "question answers".
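The sentence definition above can be illustrated with a short sketch. This is only one possible reading: the exact delimiter set (ASCII and full-width punctuation plus any whitespace) and the function name `split_sentences` are assumptions, not details given in the source. Note that treating blanks as delimiters suits Japanese text; for English text it would split at every word.

```python
import re

# Delimiters named in the text: periods, exclamation marks, question marks, blanks.
# Including both ASCII and full-width forms is an assumption made for this sketch.
SENTENCE_DELIMITERS = re.compile(r"[.!?。！？\s]+")


def split_sentences(text):
    """Split text into sentences at the delimiters, dropping empty pieces."""
    return [piece for piece in SENTENCE_DELIMITERS.split(text) if piece]


sentences = split_sentences("入会方法は？書類は不要です。")
```

Here the question mark and the period each end one sentence, so the sample text yields two sentences.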

FIG. 1 is a diagram showing an example of the information providing device according to the present embodiment. The information providing device 10 shown in FIG. 1 is an information processing device that creates an FAQ, that is, a question and answer collection, and is realized by, for example, a server device or a cloud system. The information providing device 10 communicates with the terminal device 100 used by the user and with the web server 200 that manages the Q&A site via a network N (see, for example, FIG. 2), which may be any of various wireless communication networks such as 4G (4th Generation), 5G, LTE (Long Term Evolution), Wi-Fi (registered trademark), or wireless LAN (Local Area Network), or any of various wired communication networks.

The terminal device 100 is a mobile terminal device that is realized by a PC (Personal Computer), a server device, a smart television, or a smart device such as a smartphone or tablet, and that can communicate with the information providing device 10 via the network N. The terminal device 100 may also have a screen, such as a liquid crystal display, with a touch panel function, and may accept from the user various operations on content delivered from the information providing device 10, such as tap, slide, and scroll operations performed with a finger or a stylus.

In the example shown in FIG. 1, the terminal device 100 is used by a user U. Although FIG. 1 shows a single user U, the embodiment is not limited to this. Any number of users can connect to the information providing device 10 through their respective terminal devices, so an FAQ can be created as each user intends.

The web server 200 is an information processing device that manages the Q&A site, and is realized by, for example, a server device or a cloud system. The web server 200 accepts question texts and answer texts posted by Q&A site users and stores each question text in association with its corresponding answer texts. The web server 200 also provides users with the question answers found by searches on the Q&A site. In the present embodiment, the web server 200 is described as separate from the information providing device 10, but the two may be configured as a single unit.

[2. Example of processing]
Because the Q&A site managed by the web server 200 accumulates a large number of question answers, a service is desired that uses them to create and provide FAQs in various categories. Creating FAQs for various categories by hand requires an enormous labor cost, so it is preferable to create them automatically. On the other hand, when FAQs for various categories are created automatically, the granularity and axes (direction) of the question answers (headings) are not aligned, so an FAQ that follows the user's intention may not be provided. For example, when creating a credit-card-related FAQ, mixing question answers about different card companies makes the granularity of the question answers uneven. It is therefore preferable to reflect the user's intention to separate the question answers by card company. In addition, question answers about payment methods (smartphone payment, e-commerce) and question answers about payment processing companies lie on different axes. It is therefore preferable to reflect the user's intention that a grouping by payment method and a grouping by payment processing company not be mixed.

In the present embodiment, the information providing device 10 performs distance learning on, for example, question answers that have been vectorized in advance, and then performs clustering. This suppresses variation in the granularity and axes of the question answers and yields an FAQ that matches the user's intention. An example of each process executed by the information providing device 10 is described below.

[2-1. Example of preprocessing for conversion to vectors]
As shown in FIG. 1, the information providing device 10 receives question answers from the web server 200 (step S1). The web server 200 may send these question answers in response to an instruction from the information providing device 10, or it may send them periodically. The web server 200 may send all question answers each time, or only the difference from the previous transmission.

On receiving the question answers, the information providing device 10 performs preprocessing that converts them into vector data (step S2). In the present embodiment, the information providing device 10 generates an N-dimensional vector from each piece of text information using, for example, a learning model (an autoencoder) that extracts feature values from the text information of the question answers.

The learning model includes, for example, an input layer, a middle layer, and an output layer. The input layer is the layer into which information is input, and the output layer is the layer that, in response to that input, outputs information similar to the input information. In this configuration, the part from the input layer to the middle layer compresses the input information (encoding), and the part from the middle layer to the output layer restores the compressed information (decoding). The middle layer is thus the layer that represents the features of the information compressed between the input layer and the middle layer.

For example, the information providing device 10 inputs the text information of a question answer into the input layer of a predetermined learning model M, computes the value of each element (neuron) of the learning model M, and outputs information similar to the input text information from the output layer. In doing so, the information providing device 10 extracts, for example, the values of the elements (neurons) of the middle layer as feature values and generates N-dimensional vector data corresponding to the question answer. This vector data is represented as, for example, a sequence of N real numbers.

Such a learning model M can be realized by various classifiers such as a DNN (Deep Neural Network). The DNN may be a neural network of any configuration, such as an RNN (Recurrent Neural Network), a CNN (Convolutional Neural Network), or an LSTM (Long Short-Term Memory).
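The vectorization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the learning model M: the bag-of-words input encoding, the toy vocabulary, the layer sizes, the tanh activation, and the random (untrained) weights are all hypothetical. A real model would be trained so that the output layer reconstructs the input.

```python
import math
import random

# Hypothetical toy vocabulary; the source does not specify how text is encoded.
VOCAB = ["card", "join", "cancel", "billing", "statement", "points"]
N = 3  # assumed size of the middle layer, i.e. the dimension of the feature vector


def bag_of_words(text):
    """Encode text as word counts over VOCAB (an assumed input encoding)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def make_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights (untrained, for illustration)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, x):
    """Apply one dense layer followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in layer]


rng = random.Random(0)
encoder = make_layer(len(VOCAB), N, rng)  # input layer -> middle layer (encoding)
decoder = make_layer(N, len(VOCAB), rng)  # middle layer -> output layer (decoding)


def vectorize(text):
    """Return the middle-layer activations as the N-dimensional feature vector."""
    return forward(encoder, bag_of_words(text))


def reconstruct(text):
    """Run the full encode/decode pass; training would push this toward the input."""
    return forward(decoder, vectorize(text))


vector = vectorize("how do i join and get the card statement")
```

Only the encoder half is needed to produce vector data once the model is trained; the decoder exists so that the reconstruction error can drive training.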

Next, when the user U specifies a category (for example, credit cards) through the terminal device 100 (step S3), the information providing device 10 extracts the vector data corresponding to the question answers related to the field of credit cards (step S4). For example, the information providing device 10 can extract the vector data corresponding to question answers that contain vocabulary such as card company names, joining, cancellation, billing, and statements. With the extracted vector data, the information providing device 10 can obtain question answers mapped to a semantic space for the predetermined category. In the present embodiment, all accumulated question answers are vectorized by the learning model M and the vector data for the specified category is then extracted, but the embodiment is not limited to this. For example, the question answers for the specified category may first be extracted from all accumulated question answers and then vectorized by the learning model M.
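Category extraction by vocabulary can be sketched as below. The record layout, the IDs, the vectors, and the term list are hypothetical; the source only states that question answers containing category-related vocabulary are selected, not how the match is performed. A plain substring match is assumed here.

```python
# Hypothetical records: (question-answer ID, text, vector). IDs and vectors are made up.
records = [
    ("QA001", "Are any documents required to join Company A card?", [0.1, 0.9]),
    ("QA002", "How do I reset my router password?", [0.8, 0.2]),
    ("QA003", "Where can I see my billing statement?", [0.2, 0.7]),
]

# Assumed vocabulary for the "credit card" category, following the terms named in the text.
CATEGORY_TERMS = {"credit card": ["card", "join", "cancel", "billing", "statement"]}


def extract_vectors(records, category):
    """Return (ID, vector) pairs whose text contains any term of the category."""
    terms = CATEGORY_TERMS[category]
    return [
        (qa_id, vector)
        for qa_id, text, vector in records
        if any(term in text.lower() for term in terms)
    ]


selected = extract_vectors(records, "credit card")
```

The same filter could equally be applied to the raw text before vectorization, which corresponds to the alternative order mentioned in the text.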

[2-2. Example of distance learning processing]

Next, the user U sends, through the terminal device 100, an instruction reflecting the user U's intention regarding the plurality of question answers mapped to the semantic space for the predetermined category (step S5). On receiving this instruction, the information providing device 10 performs distance learning, based on the instruction, on the plurality of question answers mapped to the semantic space for the predetermined category (step S6). The instruction consists of, for example, a small number of examples for suppressing variation in the granularity and axes of the question answers.

When creating an FAQ related to finance, for example, it is not a problem if question answers about different credit card companies are mixed. In an FAQ related to credit cards, however, mixing question answers about different card companies makes the granularity of the question answers uneven. In addition, question answers about payment methods (smartphone payment, e-commerce) and question answers about payment processing companies lie on different axes. To provide an FAQ that is easy to use, it is therefore effective to suppress variation in the granularity and axes of the question answers. For this purpose, the information providing device 10 performs distance learning to adjust the distances between the vector data of the question answers in the semantic space (for example, their relative distances in the semantic space).

In this distance learning (also known as metric learning), training makes similar data relatively close in the semantic space and dissimilar data relatively far apart. The information providing device 10 can perform, for example, Mahalanobis distance learning. For this, as an example of the user's instruction (training data), a set of credit-card-related data about Company A (similar data) and a set of credit-card-related data about card companies other than Company A (Company B, Company C, and so on) (dissimilar data) are prepared. The information providing device 10 learns a predetermined covariance matrix from these sets of similar and dissimilar data. The information providing device 10 then applies the learned covariance matrix to the vector data of the question answers, so that credit-card-related data about Company A become relatively closer together while credit-card-related data about other card companies become relatively farther away. This increases the distance between credit-card-related question answers about Company A and those about other companies. The present embodiment describes one example of distance learning, but the embodiment is not limited to it.
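The effect of Mahalanobis distance learning can be sketched with a deliberately simplified variant. The assumptions are significant: the matrix is restricted to a diagonal one (one non-negative weight per dimension), the loss is a contrastive loss with a margin, and the toy vectors, margin, learning rate, and function names are all hypothetical. The source does not specify the learning algorithm, so this is not the patent's method.

```python
def mahalanobis_sq(weights, x, y):
    """Squared Mahalanobis distance with a diagonal matrix M = diag(weights)."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


def learn_diagonal_metric(similar, dissimilar, dim, margin=2.0, lr=0.1, steps=100):
    """Learn non-negative diagonal weights by gradient descent on a contrastive loss.

    Loss = sum over similar pairs of d^2
         + sum over dissimilar pairs of max(0, margin - d^2).
    """
    weights = [1.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in similar:  # pull similar pairs together
            for k in range(dim):
                grad[k] += (x[k] - y[k]) ** 2
        for x, y in dissimilar:  # push dissimilar pairs apart, up to the margin
            if mahalanobis_sq(weights, x, y) < margin:
                for k in range(dim):
                    grad[k] -= (x[k] - y[k]) ** 2
        # Gradient step, then projection onto weights >= 0 (M stays positive semidefinite).
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grad)]
    return weights


# Toy 2-D vectors: axis 0 separates card companies, axis 1 separates topics.
a_join, a_statement, b_join = [0.0, 0.0], [0.0, 1.0], [1.0, 0.0]
similar_pairs = [(a_join, a_statement)]   # both concern Company A
dissimilar_pairs = [(a_join, b_join)]     # Company A vs. Company B

weights = learn_diagonal_metric(similar_pairs, dissimilar_pairs, dim=2)
```

In this toy setting the learned weights emphasize the company axis and suppress the topic axis, so two Company A question answers end up closer to each other than a Company A and a Company B question answer on the same topic.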

[2-3. Example of clustering processing]
Next, the information providing device 10 performs clustering based on the distance-learned vector data (step S7). This clustering considers the distance between vector data (for example, the cosine distance) and forms hierarchical clusters from groups of question answers whose vector data are close to each other. In the present embodiment, the distances between the vector data of the question answers are adjusted by distance learning before clustering, so clustering is performed with variation in the granularity and axes of the question answers suppressed. Next, for each cluster formed, the information providing device 10 creates a heading for the question answers contained in that cluster and thereby creates the FAQ (step S8). The information providing device 10 analyzes the words of the question answers contained in a cluster and assigns a heading using characteristic words (for example, [Company A joining/cancellation], [Company A statements], [Company A points]). Instead of a heading, the central question answer of each cluster may be selected. Finally, the information providing device 10 provides the FAQ (question and answer collection) to the user by sending the created FAQ information to the terminal device 100 (step S9). In this configuration, the plurality of question answers mapped to the semantic space for the predetermined category undergo distance learning based on the user's instruction, so variation in the granularity and axes of the question answers is suppressed and an FAQ that follows the user's intention can be created and provided.

[3. Configuration of the information providing device]
An example of the functional configuration of the information providing device 10 described above is explained below. The following description shows an example of the functional configuration of an information providing device 10 that uses question and answer information accumulated on a Q&A site to create and provide a question and answer collection for a category desired by the user. FIG. 2 is a diagram showing a configuration example of the information providing device according to the present embodiment. As shown in FIG. 2, the information providing device 10 includes a communication unit 20, a storage unit 30, and a control unit 40.

The communication unit 20 is realized by, for example, a NIC (Network Interface Card). The communication unit 20 is connected to the network N by wire or wirelessly, and sends and receives information to and from the terminal device 100 and the web server 200.

The storage unit 30 is realized by, for example, a semiconductor memory element such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 30 includes a question and answer storage unit 31 and a learning model storage unit 32.

FIG. 3 is a diagram showing an example of information stored in the question and answer storage unit according to the present embodiment. The question and answer storage unit 31 stores various information about question answers, for example, question and answer IDs and vector data. In the example of FIG. 3, the question and answer storage unit 31 contains the items "question and answer ID", "question and answer", "question and answer information", and "vector information".

"Question and answer ID" is identification information for identifying a question answer. A question answer consists of a question document and its corresponding answer document. "Question and answer" indicates the specific name, content, or the like of the question answer identified by the question and answer ID. In the example of FIG. 3, "question and answer" is shown as text that identifies the content, such as "Company A card membership".

"Question and answer information" indicates information about the question answer identified by the question and answer ID. In the example of FIG. 3, the question and answer information is shown as text consisting of the question "Are any documents required when applying for a Company A card?" and the answer "In principle, none are required when applying. However, identity verification documents or the like may be required when you receive your Company A card." "Vector information" indicates the vector information corresponding to the question and answer information identified by the question and answer ID. In the example of FIG. 3, the question and answer information corresponding to the question and answer ID is shown as multidimensional (N-dimensional) vector information (vector data) "10, 24, 54, 2, ...".
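One row of the storage layout described for FIG. 3 could be modeled as a simple record. The class name, the field names, the ID value, and the split of "question and answer information" into separate question and answer fields are assumptions made for illustration; only the four item names and the sample texts come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuestionAnswerRecord:
    """One row of the question and answer storage unit 31 (field names are assumed)."""
    qa_id: str            # "question and answer ID"
    label: str            # "question and answer": short text identifying the content
    question: str         # question text of the "question and answer information"
    answer: str           # answer text of the "question and answer information"
    vector: List[float]   # "vector information": N-dimensional vector data


record = QuestionAnswerRecord(
    qa_id="QA001",  # hypothetical ID; the source does not show the ID format
    label="Company A card membership",
    question="Are any documents required when applying for a Company A card?",
    answer=(
        "In principle, none are required when applying. However, identity "
        "verification documents may be required when you receive the card."
    ),
    vector=[10.0, 24.0, 54.0, 2.0],  # leading values from FIG. 3; the rest is elided there
)
```

Keeping the vector alongside the text lets the extraction and clustering steps work on vectors while the FAQ creation step reads the original wording from the same record.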

The learning model storage unit 32 stores the learning model M that converts the text information of question answers into vector data (vectorization). This type of learning model M can be realized by various classifiers such as a DNN (Deep Neural Network). The DNN may be a neural network of any configuration, such as an RNN (Recurrent Neural Network), a CNN (Convolutional Neural Network), or an LSTM (Long Short-Term Memory).

Returning to FIG. 2, the description continues. The control unit 40 is, for example, a controller, and is realized by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs stored in a storage device inside the information providing device 10, using RAM as a work area. The control unit 40 can also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 2, the control unit 40 includes a preprocessing unit 41, an extraction unit 42, a distance learning unit 43, a clustering processing unit 44, and an FAQ creation unit 45. The preprocessing unit 41 uses the learning model M stored in the learning model storage unit 32 to generate N-dimensional vector data from the text information of the question answers.

For example, the preprocessing unit 41 inputs the text information of a question answer into the input layer of the learning model M, computes the value of each element (neuron) of the learning model M, and outputs information similar to the input text information from the output layer. In doing so, the information providing device 10 extracts, for example, the values of the elements (neurons) of the middle layer as feature values and generates N-dimensional vector data corresponding to the question answer. This vector data is represented as, for example, a sequence of N real numbers, and is stored in the question and answer storage unit 31 described above as the "vector information" corresponding to the question and answer ID.

The extraction unit 42 extracts the relevant vector data according to the user's instruction. For example, when credit cards are specified as the category, the extraction unit 42 extracts the vector data corresponding to the question answers related to the field of credit cards. In this case, the extraction unit 42 can extract, as information related to the field of credit cards, the vector data corresponding to question answers that contain vocabulary such as card company names, joining, cancellation, billing, and statements. With the extracted vector data, the information providing device 10 can obtain question answers mapped to a semantic space for the predetermined category (credit cards).

The distance learning unit 43 executes, in accordance with the user's instruction, distance learning that adjusts the distances between the vector data of the question-answers mapped to the semantic space for the predetermined category. The user's instruction is, for example, a set of examples (training data) for suppressing variation in the granularity and axes of the question-answers. When an FAQ is created, mixing question-answers about the individual credit card companies poses no problem in, say, an FAQ on finance in general, but in an FAQ on credit cards it makes the granularity of the question-answers uneven. In addition, question-answers about payment methods (smartphone payment, e-commerce) and question-answers about payment processing companies lie on different axes. Suppressing variation in the granularity and axes of the question-answers is therefore effective for providing an FAQ that is easy for users to use. Accordingly, the distance learning unit 43 performs distance learning on the question-answers mapped to the semantic space and adjusts the distances between their vector data (for example, their relative distances in the semantic space), thereby suppressing variation in granularity and axes.

In this distance learning, training is performed so that similar data become relatively close in the semantic space and dissimilar data become relatively far apart. The information providing device 10 can execute, for example, Mahalanobis distance learning. When this Mahalanobis distance learning is executed, sets of credit-card-related data about Company A (similar data) and sets of credit-card-related data about card companies other than Company A (Company B, Company C, ...) (dissimilar data) are prepared as an example of the user's instruction (training data). The distance learning unit 43 learns a predetermined covariance matrix using these sets of similar data and sets of dissimilar data. The distance learning unit 43 then operates on the vector data of the question-answers with the learned covariance matrix, so that the credit-card-related data about Company A become relatively close to one another and the credit-card-related data about other card companies become relatively far away. As a result, the question-answers about Company A's credit cards can be separated from those about other companies' credit cards.
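As a rough illustration of how similar and dissimilar pairs can reshape distances, the sketch below learns a diagonal Mahalanobis-style metric, where the distance is sqrt(sum_k w_k (x_k - y_k)^2). This is a deliberately simplified heuristic and not the covariance-matrix learning of the embodiment: each weight w_k is just the ratio of the mean squared difference over dissimilar pairs to that over similar pairs. The function names and the `eps` parameter are assumptions introduced for this example.

```python
import math


def learn_diagonal_metric(similar_pairs, dissimilar_pairs, eps=1e-6):
    """Return one weight per dimension (a diagonal metric matrix).

    A dimension gets a large weight when dissimilar pairs differ a lot
    along it and similar pairs differ little (heuristic, for illustration).
    """
    dim = len(similar_pairs[0][0])

    def mean_sq_diff(pairs):
        return [sum((x[k] - y[k]) ** 2 for x, y in pairs) / len(pairs)
                for k in range(dim)]

    s = mean_sq_diff(similar_pairs)
    d = mean_sq_diff(dissimilar_pairs)
    return [(d[k] + eps) / (s[k] + eps) for k in range(dim)]


def weighted_distance(x, y, weights):
    """Diagonal Mahalanobis-style distance under the given weights."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Toy data: dimension 0 loosely encodes the card company,
# dimension 1 encodes something unrelated to the company.
a1, a2 = [0.0, 0.0], [0.2, 1.0]   # Company A (similar to each other)
b1, b2 = [0.6, 0.0], [0.8, 1.0]   # Company B (dissimilar to Company A)
weights = learn_diagonal_metric([(a1, a2)], [(a1, b1), (a2, b2)])
```

With plain Euclidean distance (all weights equal to 1), `a1` is closer to `b1` than to `a2`; under the learned weights the order is reversed, which is the effect the paragraph above describes.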

The clustering processing unit 44 executes clustering processing on the basis of the distance-learned vector data. Taking into account the distance between vector data (for example, cosine distance), the clustering processing unit 44 forms hierarchical clusters from groups of question-answers whose vector data are close to one another. As the clustering method, Ward's method or the group average method can be used if a hierarchical structure is required, and the k-means method can be used if a non-hierarchical structure is required. The distance used in the clustering processing is not limited to cosine distance; an existing distance such as Euclidean distance may be used instead.
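The group average method with cosine distance mentioned above can be sketched in a few lines. This is a naive illustration under stated assumptions: the stopping rule based on a `threshold` parameter and the function names are introduced here and are not part of the embodiment, and the implementation favors readability over efficiency.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def group_average_clustering(vectors, threshold):
    """Agglomerative clustering with group-average linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    cosine distance, stopping when that distance exceeds `threshold`.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min(
            ((linkage(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Recording the order of the merges instead of stopping at a threshold would yield the hierarchical structure; the threshold is used here only to keep the example short.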

In this embodiment, before the clustering processing unit 44 executes the clustering processing, the distance learning unit 43 performs distance learning and thereby adjusts the distances between the vector data of the question-answers, so the clustering processing can be performed with variation in the granularity and axes of the question-answers suppressed.

For each cluster formed by the clustering processing, the FAQ creation unit 45 creates an FAQ by forming a heading for the question-answers contained in that cluster. The FAQ creation unit 45 analyzes the words of the question-answers contained in the cluster and assigns a heading built from characteristic words (for example, [Company A enrollment/cancellation], [Company A statements], [Company A points]). In this case, the FAQ creation unit 45 extracts the words (nouns and noun suffixes) contained in the text information of the cluster's question-answers and extracts characteristic words on the basis of their frequencies. Note that the FAQ creation unit 45 may select a central question-answer for each cluster instead of a heading. In this case, the FAQ creation unit 45 selects, as the central question-answer, the question-answer closest to the center point of the cluster (that is, the one with the highest similarity).
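Both options described above, a frequency-based heading and a central question-answer, can be sketched as follows. The sketch assumes that the nouns of each question-answer have already been extracted by some morphological analyzer (not shown), that the heading is simply the most frequent nouns, and that Euclidean distance to the mean vector defines "closest to the center point"; these choices and the function names are assumptions for illustration.

```python
from collections import Counter


def make_heading(noun_lists, top_n=2):
    """Build a heading from the `top_n` most frequent nouns in a cluster.

    `noun_lists` holds one list of nouns per question-answer.
    """
    counts = Counter(noun for nouns in noun_lists for noun in nouns)
    words = [word for word, _ in counts.most_common(top_n)]
    return "[" + " ".join(words) + "]"


def central_index(vectors):
    """Return the index of the vector closest to the cluster's mean."""
    dim = len(vectors[0])
    center = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(v, center))

    return min(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))
```

A frequency measure that discounts words common to every cluster (for example, a TF-IDF style score) would likely give more distinctive headings; raw counts are used here only for brevity.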

The FAQ creation unit 45 provides the FAQ (question-answer collection) to the user by transmitting information on the created FAQ to the terminal device 100 via the communication unit 20. The FAQ may be provided by a component other than the FAQ creation unit 45; for example, a separate providing unit may be added to perform this task. In this embodiment, the distance learning unit 43 performs distance learning, on the basis of the user's instruction, on the plurality of question-answers mapped to the semantic space for the predetermined category, so variation in the granularity and axes of the question-answers can be suppressed, and the FAQ creation unit 45 can create and provide an FAQ that matches the user's intention.

[4. Processing procedure]
Next, an example of the flow of processing executed by the information providing device 10 according to this embodiment is described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the flow of processing according to this embodiment. As shown in FIG. 4, the information providing device 10 performs preprocessing that converts the question-answers received from the web server 200 into vector data (step S101). Next, the information providing device 10 extracts the vector data corresponding to the question-answers related to the category given by the user's instruction (step S102). Next, on the basis of the user's instruction, the information providing device 10 executes distance learning on the plurality of question-answers mapped to the semantic space for the predetermined category (step S103). Next, the information providing device 10 performs clustering on the basis of the distance-learned vector data (step S104). Next, for each cluster formed by the clustering, the information providing device 10 creates an FAQ by forming a heading for the question-answers contained in that cluster (step S105), provides this FAQ to the user, and ends the processing. Note that the information providing device 10 may swap the order of step S101 and step S102. Steps S101, S102, and S103 may also be executed simultaneously.

[5. Modifications]
The information providing device 10 described above may be implemented in various forms other than the embodiment described above. Other embodiments of the information providing device 10 are therefore described below.

[5-1. Simultaneous execution of preprocessing and distance learning]
In the embodiment described above, the information providing device 10 performs the preprocessing that maps the plurality of question-answers to the semantic space for the predetermined category and then performs the distance learning processing on those question-answers. However, the preprocessing and the distance learning processing can also be carried out almost simultaneously so that both are learned in a single pass.

For example, the information providing device 10 acquires from the terminal device 100 a category specified by the user (a search query or an existing set of questions) and concrete examples of the division specified by the user (labels that the user attaches on the basis of search results). For the plurality of question-answers related to the specified category, the information providing device 10 uses one learning model (an autoencoder) to generate an N-dimensional vector from the text information of each question-answer. The information providing device 10 can also adjust the distances between the vectorized question-answers using another learning model generated by a similarity-learning method such as triplet loss. In this case, a triplet loss can be added that places examples intended for the same cluster close together and examples intended for different clusters far apart. A triplet loss that places questions of the same category close together and all others far apart can also be added.
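The triplet loss mentioned above has a compact standard form: for an anchor, a positive example (same intended cluster), and a negative example (different intended cluster), the loss is max(0, d(anchor, positive) - d(anchor, negative) + margin). The sketch below computes this loss and builds triplets from user-supplied labels. The Euclidean distance, the `margin` default, and the helper `make_triplets` are assumptions for illustration; how the loss is combined with the autoencoder objective is not specified in the text and is not shown.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def make_triplets(labels):
    """Enumerate (anchor, positive, negative) IDs from user labels.

    `labels` maps a question-answer ID to the label the user attached.
    """
    ids = list(labels)
    return [
        (a, p, n)
        for a in ids
        for p in ids if p != a and labels[p] == labels[a]
        for n in ids if labels[n] != labels[a]
    ]
```

Minimizing the sum of `triplet_loss` over the enumerated triplets pulls same-label examples together and pushes different-label examples apart, which matches the behavior described in the paragraph above.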

[5-2. Others]
As the clustering processing, spectral clustering can also be adopted, in which the acquired vector data of the question-answers are first converted into a graph using a method such as the ε-neighborhood method, the k-nearest-neighbor method, or the fully connected method, and clustering is then performed with attention to the connectivity of this graph. Because this clustering first reduces the data to a low-dimensional representation that is easy to cluster and then clusters it, the weights of the question-answer axes can be adjusted to fit the training data specified by the user.
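The graph-construction step can be illustrated with the ε-neighborhood method. The sketch below covers only that step plus a connectivity check by connected components; it is not full spectral clustering, which would additionally embed the nodes using eigenvectors of the graph Laplacian before clustering. The function names and the use of Euclidean distance are assumptions for illustration.

```python
import math


def epsilon_graph(vectors, eps):
    """Connect every pair of vectors whose Euclidean distance is <= eps."""
    adj = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) <= eps:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Group nodes that are reachable from one another (depth-first search)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components
```

Note that two points farther apart than `eps` can still land in the same component when a chain of close neighbors links them, which is the sense in which graph-based clustering attends to connectivity rather than to direct distance alone.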

In the embodiment described above, the similarity of texts is measured on the basis of the distance between texts (for example, cosine distance), but it may instead be measured using a concept graph such as WordNet. WordNet is a so-called concept dictionary in which word IDs are linked to concept IDs, so it can measure the similarity between words that are in a broader/narrower concept relationship, where the concepts are similar even though the words themselves are not. The information providing device 10 can measure the similarity of texts by using, for example, a learning model that learns vectors preserving a hierarchical structure in Poincaré space.
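For reference, the distance commonly used in the Poincaré ball model is d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2))), defined for points with norm less than 1. The sketch below only evaluates this distance; it assumes that the "Poincaré space" of the text refers to this model, and the learning of the vectors themselves is not shown.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit ball (norms below 1)."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(
        1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    )
```

Because the denominator shrinks near the boundary of the ball, the same Euclidean gap corresponds to a larger distance there than near the origin. This is why such embeddings suit hierarchies: general concepts can sit near the origin and specific ones near the boundary.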

[6. Effects]
As described above, the information providing device 10 includes the distance learning unit 43, which performs distance learning based on the user's instruction on a plurality of accumulated question-answer documents; the clustering processing unit 44, which clusters the distance-learned question-answer documents; and the FAQ creation unit 45, which creates an FAQ by summarizing each cluster containing the clustered question-answer documents. Variation in the granularity and axes of the question-answers can therefore be suppressed, and an FAQ that matches the user's intention can be created.

In addition, the information providing device 10 includes the preprocessing unit 41, which, before distance learning is performed, executes preprocessing that vectorizes the plurality of question-answer documents and maps them to a predetermined semantic space. Distance learning based on the user's instruction can therefore be performed easily on the question-answer documents mapped to the semantic space.

In addition, the distance learning unit 43 moves the question-answer documents within the semantic space by operating on the vectorized question-answer documents with a covariance matrix learned on the basis of the user's instruction, so variation in the granularity and axes of the question-answers can be suppressed easily.

In addition, the clustering processing unit 44 takes into account the distances between the plurality of vectorized question-answer documents and forms clusters from groups of question-answers whose vectors are close to one another, so clustering can be performed with variation in the granularity and axes of the question-answers suppressed.

In addition, the FAQ creation unit 45 attaches a heading to the question-answer documents contained in each cluster, so the FAQ can be created easily.

In addition, the preprocessing unit 41 and the distance learning unit 43 execute the preprocessing and the distance learning, respectively, at the same time, so distance learning based on the user's instruction can be processed quickly.

[7. Hardware configuration]
The information providing device 10 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 5. FIG. 5 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information providing device 10 according to the embodiment. The computer 1000 has a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates on the basis of programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via a network (communication network) N and sends it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the network N.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600, and outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in a recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information providing device 10 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 40 by executing a program loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them, but as another example, it may acquire these programs from another device via the network N.

Several embodiments of the present application have been described above in detail with reference to the drawings, but these are examples. The present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

[8. Others]
Of the processes described in the above embodiment and modifications, all or part of a process described as being performed automatically can be performed manually, and all or part of a process described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various kinds of information shown in each figure are not limited to the illustrated information.

The components of each illustrated device are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one, and all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the preprocessing unit 41, the extraction unit 42, the distance learning unit 43, the clustering processing unit 44, or the FAQ creation unit 45 may be connected via a network as a device external to the information providing device 10. Alternatively, separate devices may each have the preprocessing unit 41, the extraction unit 42, the distance learning unit 43, the clustering processing unit 44, or the FAQ creation unit 45, and may realize the functions of the information providing device 10 by being connected over a network and cooperating. That is, the configuration can be changed flexibly: the information providing device 10 may be realized by a plurality of server computers, and some functions may be realized by calling an external platform or the like through an API (Application Programming Interface), network computing, or similar means.

The embodiment and modifications described above can be combined as appropriate to the extent that the processing contents do not contradict one another.

The term "unit (section, module, unit)" used above can be read as "means", "circuit", or the like. For example, the preprocessing unit 41 can be read as preprocessing means or a preprocessing circuit.

10 Information providing device
20 Communication unit
30 Storage unit
31 Question-answer storage unit
32 Learning model storage unit
40 Control unit
41 Preprocessing unit
42 Extraction unit
43 Distance learning unit (learning processing unit)
44 Clustering processing unit
45 FAQ creation unit
100 Terminal device

Claims

An information providing device comprising:
a learning processing unit that performs, on a plurality of accumulated question-answer documents, distance learning that adjusts the relative distances between vector data of question-answers mapped to a semantic space for a predetermined category specified by a user;
a clustering processing unit that clusters the learned question-answer documents; and
an FAQ creation unit that creates a question-answer collection using, for each cluster containing the clustered question-answer documents, the question-answer closest to the center point of that cluster.

The information providing device according to claim 1, wherein the learning processing unit operates on the vectorized question-answer documents using a covariance matrix that has undergone predetermined learning using sets of similar data and sets of dissimilar data for the vector data, thereby adjusting the relative distances between the vector data of the question-answers within the semantic space.

The information providing device according to claim 2, wherein the clustering processing unit takes into account the distances between the plurality of vectorized question-answer documents and forms clusters from groups of question-answers corresponding to vector data that are close to one another.

The information providing device according to claim 1, wherein the FAQ creation unit attaches a heading to the question-answer documents contained in each cluster.

An information providing method executed by a computer, the method comprising:
a learning processing step of performing, on a plurality of accumulated question-answer documents, distance learning that adjusts the relative distances between vector data of question-answers mapped to a semantic space for a predetermined category specified by a user;
a clustering processing step of clustering the learned question-answer documents; and
an FAQ creation step of creating a question-answer collection using, for each cluster containing the clustered question-answer documents, the question-answer closest to the center point of that cluster.

An information providing program that causes a computer to execute:
a learning processing procedure of performing, on a plurality of accumulated question-answer documents, distance learning that adjusts the relative distances between vector data of question-answers mapped to a semantic space for a predetermined category specified by a user;
a clustering processing procedure of clustering the learned question-answer documents; and
an FAQ creation procedure of creating a question-answer collection using, for each cluster containing the clustered question-answer documents, the question-answer closest to the center point of that cluster.