JP7660849B2

JP7660849B2 - Learning system, attribute value extraction system, learning method, and program

Info

Publication number: JP7660849B2
Application number: JP2022073474A
Authority: JP
Inventors: 圭司新里; ▲彦▼迪夏; 維▲徳▼ 陳; 直樹吉永
Original assignee: University of Tokyo NUC; Rakuten Group Inc
Current assignee: University of Tokyo NUC; Rakuten Group Inc
Priority date: 2022-04-27
Filing date: 2022-04-27
Publication date: 2025-04-14
Anticipated expiration: 2042-04-27
Also published as: JP2023162816A

Description

The present disclosure relates to a learning system, an attribute value extraction system, a learning method, and a program.

Techniques for extracting attribute values related to various items, such as products or content, are conventionally known. For example, Non-Patent Documents 1 and 2 describe techniques in which item data related to an item and a query including an attribute of the item are input to a model corresponding to a trained attribute value extraction model, and information corresponding to a named entity, such as an attribute value, is extracted from the output of the model.

Non-Patent Document 1: Qifan Wang, Li Yang, Bhargav Kanagal, Sumit Sanghai, D. Sivakumar, Bin Shu, Zac Yu, and Jon Elsas. 2020. Learning to extract attribute value from product via question answering: A multi-task approach. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 47-55, Online. ACM.

Non-Patent Document 2: Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu, and Jiwei Li. 2020. A unified MRC framework for named entity recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5849-5859, Online. Association for Computational Linguistics.

However, the accuracy of the attribute values output by the models of Non-Patent Documents 1 and 2 may decrease depending on the query. The technique of Non-Patent Document 1 assumes that an attribute name is input as the query, and the technique of Non-Patent Document 2 assumes that a definitional description of the extraction target is input as the query. If a simple word is input as the query, for example, these models may fail to recognize the meaning of the query, and the accuracy of the attribute values they output may decrease.

One object of the present disclosure is to improve the accuracy of an attribute value extraction model.

A learning system according to the present disclosure includes: a model storage unit that stores an attribute value extraction model for extracting a second attribute value from first data including the second attribute value related to a first item for estimation, by using a first query including at least one first attribute value associated with a first attribute related to the first item; a second query acquisition unit that acquires a second query from which at least some of a plurality of third attribute values associated with a second attribute related to a second item for learning have been excluded; and a learning unit that trains the attribute value extraction model based on the second query and on second data including a fourth attribute value related to the second item.

According to the present disclosure, the accuracy of the output from the attribute value extraction model is improved.

FIG. 1 is a diagram showing an example of the overall configuration of a learning system.
FIG. 2 is a diagram showing an example of attributes and attribute values acquired from product data.
FIG. 3 is a diagram showing an example of an attribute value extraction model that uses a question-answering model.
FIG. 4 is a diagram showing an example of the attribute value extraction model of the present embodiment.
FIG. 5 is a diagram showing an example of a method for making the attribute value extraction model learn the incompleteness of knowledge.
FIG. 6 is a functional block diagram showing an example of functions realized by the learning system.
FIG. 7 is a diagram showing an example of a training database.
FIG. 8 is a diagram showing an example of an attribute database.
FIG. 9 is a flow chart showing an example of processing executed in the learning system.
FIG. 10 is a flow chart showing an example of processing executed in the learning system.
FIG. 11 is a diagram showing an example of a learning system and an attribute value extraction system according to a modified example.

[1. Overall configuration of the learning system]
An example of an embodiment of a learning system and an attribute value extraction system according to the present disclosure will be described. This embodiment takes as an example a case in which the learning system also includes the functions of the attribute value extraction system, but the learning system and the attribute value extraction system may be separate systems. The case in which they are separate systems is described in a modified example below.

FIG. 1 is a diagram showing an example of the overall configuration of the learning system. For example, the learning system 1 includes a server 10, a learning terminal 20, and an estimation terminal 30. Each of the server 10, the learning terminal 20, and the estimation terminal 30 can be connected to an arbitrary network N, such as the Internet or a LAN.

The server 10 is a server computer. The control unit 11 includes at least one processor. The storage unit 12 includes a volatile memory such as a RAM and a non-volatile memory such as a flash memory. The communication unit 13 includes at least one of a communication interface for wired communication and a communication interface for wireless communication.

The learning terminal 20 is a computer that trains the attribute value extraction model described below. For example, the learning terminal 20 is a personal computer, a smartphone, or a tablet terminal. The hardware configurations of the control unit 21, the storage unit 22, and the communication unit 23 are the same as those of the control unit 11, the storage unit 12, and the communication unit 13, respectively. The operation unit 24 is an input device such as a touch panel or a mouse. The display unit 25 is a liquid crystal display or an organic EL display.

The estimation terminal 30 is a computer that uses the trained attribute value extraction model described below. For example, the estimation terminal 30 is a personal computer, a smartphone, or a tablet terminal. The hardware configurations of the control unit 31, the storage unit 32, the communication unit 33, the operation unit 34, and the display unit 35 are the same as those of the control unit 11, the storage unit 12, the communication unit 13, the operation unit 24, and the display unit 25, respectively.

The programs stored in the storage units 12, 22, and 32 may be supplied via the network N. Each computer may also include at least one of a reading unit (for example, a memory card slot) that reads a computer-readable information storage medium and an input/output unit (for example, a USB port) for exchanging data with an external device. For example, a program stored in an information storage medium may be supplied via at least one of the reading unit and the input/output unit.

The learning system 1 only needs to include at least one computer. The computers included in the learning system 1 are not limited to the example in FIG. 1. For example, the estimation terminal 30 may exist outside the learning system 1. The learning system 1 may include only the server 10 and the learning terminal 20. The learning system 1 may include only one of the server 10 and the learning terminal 20. The learning system 1 may include one of the server 10 and the learning terminal 20 together with another computer.

[2. Overview of the learning system]
In this embodiment, the processing executed by the learning system 1 is described using an example in which attribute values corresponding to the attributes of a product are acquired from product data related to that product. A product is an object that is the subject of a commercial transaction. This embodiment uses products in an online shopping mall as an example, but the product may be any product, for example, a product traded in an internet auction, an online flea market, or a physical store.

Product data is information about the details of a product. For example, product data includes letters, numbers, other symbols, images, audio, video, or a combination of these. Because this embodiment uses products in an online shopping mall as an example, any text entered by a store representative (for example, the title of the product) corresponds to product data. For example, the contents of the product data can be viewed on the website or in the application of the online shopping mall.

An attribute is a category for classifying products. An attribute is sometimes called a category or a genre. Attributes may be defined hierarchically. When attributes are defined hierarchically, the higher an attribute is in the hierarchy, the more abstract its meaning, and the lower it is, the more specific its meaning. Hereinafter, the term "attribute" used alone means the name of the attribute. An attribute is expressed by letters, numbers, other symbols, or a combination of these. An attribute only needs to be able to classify products from a given viewpoint, and attributes can be defined from any viewpoint. For example, an attribute is information such as brand, color, material, size, function, pattern, or place of origin.

An attribute value is a value that indicates a specific classification. A plurality of attribute values are prepared in advance for an attribute. At least one of the attribute values prepared in advance is assigned to a product. The attribute values prepared in advance are candidates for the attribute values to be assigned to a product. An attribute value is expressed by letters, numbers, other symbols, or a combination of these. For example, when uploading the product data of a product, a store representative specifies an attribute and at least one of the attribute values prepared for that attribute. The specified attribute and attribute value are associated with the product data of that product.

For example, various brand names are prepared as attribute values for the attribute "brand." A product of a certain brand is assigned, from among the attribute values prepared for the attribute "brand," the attribute value indicating the name of that brand. Similarly, various colors such as black, white, and yellow are prepared in advance as attribute values for the attribute "color." A product that looks black is assigned, from among the attribute values prepared for the attribute "color," the attribute value indicating "black." The same applies to other attributes: any attribute values can be prepared in advance and assigned to products.

FIG. 2 is a diagram showing an example of attributes and attribute values acquired from product data. For example, when the learning terminal 20 accesses the website of an online shopping mall, a product page P including product data is displayed on the display unit 25. The product page P displays product data such as the store name, the product title, product images, and a detailed product description. If the store representative specified attributes and attribute values when uploading the product data, the specified attributes and attribute values are also displayed on the product page P as product data. The attributes and attribute values of a product are used as an index during searches.

If appropriate attribute values corresponding to the attributes of a product can be extracted from product data, various benefits are obtained. For example, store representatives no longer need to specify attribute values, which reduces their workload. Even if a store representative mistakenly specifies an inappropriate attribute value, it can be replaced with an appropriate attribute value acquired from the product data. New attribute values that are not currently prepared can also be extracted from product data. The extracted values can also be used for marketing by supporting a deeper understanding of customer preferences or the marketplace.

The learning system 1 therefore extracts attribute values from product data by using an attribute value extraction model. The attribute value extraction model is a model that uses machine learning. Various machine learning methods can be used; for example, supervised learning, semi-supervised learning, or unsupervised learning can each be applied to the attribute value extraction model.

This embodiment uses as an example an attribute value extraction model that uses a question-answering model (QA model) based on a model called BERT. However, the attribute value extraction model may be any model that takes an entity, of which product data is one example, and a query as input and extracts an attribute value, and its type is not limited. For example, the attribute value extraction model may be a Transformer-based model other than BERT, or a neural network model of the kind used before Transformers appeared.

FIG. 3 is a diagram showing an example of an attribute value extraction model that uses a question-answering model. The question-answering model M1 and the attribute value extraction model M2 in FIG. 3 are conventional models; they are described here to explain the basic mechanism. For details of the question-answering model M1 and the attribute value extraction model M2, refer to Non-Patent Documents 1 and 2 cited as prior art, or to the references cited therein. Note that in those documents, the models corresponding to the question-answering model M1 and the attribute value extraction model M2 may be called by other names.

For example, when a news article and a query are input, the question-answering model M1 outputs a response to the query. The question-answering model M1 has learned the relationship between pairs of a news article and a query and the corresponding responses. The query input to the question-answering model M1 is a question about the content of the news article. The response output from the question-answering model M1 is the answer to that question. The question-answering model M1 estimates the part of the news article that is appropriate as the answer and outputs that part.

In the example of FIG. 3, a news article N10 about the lifting of the state of emergency in Japan and a query Q11 asking when the state of emergency will be lifted are input to the question-answering model M1. The question-answering model M1 estimates that the "September 30" part of the news article N10 is appropriate as the time at which the state of emergency will be lifted, and outputs a response R12 that identifies this part. Such a question-answering model M1 can also be used to extract attribute values from product data.

For example, the attribute value extraction model M2 is a model created by repurposing the question-answering model M1. When product data and a query are input, the attribute value extraction model M2 outputs an attribute value as the response to the query. The attribute value extraction model M2 has learned the relationship between pairs of product data and a query and the attribute values included in the product data. The query input to the attribute value extraction model M2 is an attribute of the product indicated by the product data. The query can be regarded as a question asking which part of the product data contains the attribute value corresponding to the attribute. The response output from the attribute value extraction model M2 is the attribute value corresponding to the attribute given as the query. The attribute value extraction model M2 estimates the part of the product data that is appropriate as the attribute value and outputs that part.

In the example of FIG. 3, product data D20 describing the features of a bag and a query Q21 including the attribute "brand" are input to the attribute value extraction model M2. Although FIG. 3 depicts the product image as also being input to the attribute value extraction model M2, in practice only the text portion of the product data D20 is assumed to be input. The image portion of the product data D20 may nevertheless be input to the attribute value extraction model M2 after being converted into a combination of numerical values, such as a feature vector, by a feature extractor based on a machine learning model such as a neural network.

For example, before the product data D20 is uploaded to the online shopping mall, the product data D20 and the query Q21 are input to the attribute value extraction model M2. The attribute value extraction model M2 estimates that the "BBB bag" part of the product data is appropriate as the attribute value V22 (here, the brand name) corresponding to the attribute "brand" included in the query Q21, and outputs that part. Other attributes, such as the attribute "color," the attribute "material," the attribute "size," or the attribute "function," can also be input as queries to the attribute value extraction model M2.
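The idea of outputting "the part of the product data that is appropriate as the attribute value" can be sketched as a span selection over tokens. This is a minimal illustration, assuming the model assigns a start score and an end score to each token of the product text, as is common in extractive question answering; the description does not specify this decoding step, and the function name `best_span`, the parameter `max_len`, and the scores in the usage lines are hypothetical.

```python
from typing import List, Optional, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_len: int = 8,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the highest-scoring span.

    The span score is start_scores[start] + end_scores[end], with
    start <= end and at most max_len tokens in the span.
    """
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Hypothetical tokens and scores for a query about the attribute "function".
tokens = ["BBB", "bag", "waterproof", "leather"]
start = [0.1, 0.0, 2.0, 0.2]
end = [0.0, 0.1, 1.5, 0.3]
span = best_span(start, end)
if span is not None:
    print(" ".join(tokens[span[0]: span[1] + 1]))
```

In a real system the scores would come from the trained model; here they are hard-coded only to show how a span is turned back into an attribute value string.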

Because product attributes are often simple words, the query input to the attribute value extraction model M2 may be shorter and less specific than the query input to the question-answering model M1. As a result, the attribute value extraction model M2 may fail to recognize the meaning of the query, and if it does, the accuracy of its output decreases. In particular, in the case of an online shopping mall, when training data for the attribute value extraction model M2 is created from product data, attributes of popular products tend to dominate and training data for other products tends to be scarce. This sparsity of the training data is another factor that reduces the accuracy of the output from the attribute value extraction model M2.

Taking the attribute "function" as an example, a product such as an air conditioner has functions such as "timer" and "energy saving." A product such as a bag has, for the attribute "function," functions such as "waterproof," "anti-mold," or "stain-resistant." Thus, the same attribute "function" can carry a variety of meanings. The attribute value extraction model M2 may fail to recognize the meaning of a short, abstract query such as the attribute "function" and may be unable to output an appropriate attribute value. The same applies to other attributes, and the accuracy of the output from the attribute value extraction model M2 may decrease depending on the quality of the query.

Therefore, in this embodiment, the query is expanded by including not only the attribute of the product but also the attribute values associated in advance with that attribute. This makes the meaning of the query more specific and easier for the attribute value extraction model to recognize, which is expected to improve the quality of the query. Improving the quality of the query is also expected to mitigate the training data sparsity problem described above.

FIG. 4 is a diagram showing an example of the attribute value extraction model of this embodiment. The attribute value extraction model M3 is not conventional technology but a novel technique. The attribute value extraction model M3 is the same as the attribute value extraction model M2 in that it outputs a response to a query when product data and the query are input, but the content of the query differs. In the example of FIG. 4, the product data D30 is the same as the product data D20, but the query Q31 includes attribute values in addition to the attribute and is therefore expanded relative to the query Q21.

For example, suppose that the product indicated by the product data D30 is a bag and that the attribute input as the query Q31 is "function." As described above, the attribute "function" is short and abstract and has various meanings that also relate to products other than bags. Therefore, so that the attribute value extraction model M3 can recognize that the bag-related meaning is intended, a query Q31 is input that includes the attribute "function" and the attribute value "anti-mold," which is prepared in advance for the bag-related attribute "function." Although omitted from FIG. 4, in this embodiment the query Q31 includes not only the attribute value "anti-mold" but all attribute values associated with the attribute "function."
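The query expansion described above can be sketched as follows. This is a minimal illustration, assuming the attribute and its associated attribute values are simply joined into one string; the separator token `[SEP]`, the function name `build_expanded_query`, and the sample values are hypothetical and are not taken from the description.

```python
from typing import List

# Hypothetical separator placed between the attribute and each attribute value.
SEP = " [SEP] "


def build_expanded_query(attribute: str, known_values: List[str]) -> str:
    """Join an attribute name with the attribute values associated with it.

    With no known values, the result is the attribute name alone, which
    corresponds to the unexpanded query of the conventional approach.
    """
    return SEP.join([attribute] + list(known_values))


# Bag-related values for the attribute "function", as in the running example.
bag_function_values = ["anti-mold", "stain-resistant"]
print(build_expanded_query("function", bag_function_values))
```

Keeping the attribute name first means the unexpanded query is a prefix of the expanded one, so the same model input layout covers both cases.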

The product in FIG. 4 does not have an anti-mold function, but the attribute value "anti-mold" included in the query Q31 allows the attribute value extraction model M3 to recognize the meaning of the attribute "function." For example, the attribute value extraction model M3 can recognize that the attribute "function" included in the query Q31 refers to a function of a bag rather than a function of a product such as the air conditioner mentioned above. The attribute value extraction model M3 can therefore output, from the product data D30, an attribute value V32 such as "waterproof," which is a function name whose expression is semantically or orthographically close to the attribute value "anti-mold" included in the query Q31.

Beyond the example in FIG. 4, the attribute value extraction model M3 can likewise output the attribute value "waterproof" from the product data when the query contains another expression that is semantically or orthographically close to the target attribute value "waterproof," such as the attribute value "stain-resistant" (a function that prevents stains). The same applies not only to the attribute "function" but also when other queries including short, abstract attributes, such as the attribute "kind" or "type," are input to the attribute value extraction model M3.

As described above, in this embodiment the query is expanded by using the attribute values associated with the attribute of the product. The same applies to other attributes, such as the attribute "brand," the attribute "color," the attribute "material," the attribute "size," or the attribute "function." For example, if the attributes and attribute values of all products traded in the online shopping mall could be covered exhaustively, any product could be handled.

However, because the products in an online shopping mall are so diverse, covering all attributes and attribute values exhaustively is not realistic. For example, the products traded in an online shopping mall include attributes and attribute values that the administrator of the mall is not fully aware of. In actual operation, therefore, it is necessary to work with incomplete knowledge that does not cover all attributes and attribute values.

例えば、実際の運用で用いられる知識が不完全であることが属性値抽出モデルＭ３に学習されていない場合、属性値抽出モデルＭ３は、入力されたクエリが完全に正しいものとして、属性値の推定を行う可能性がある。この場合、例えば、属性値抽出モデルＭ３は、クエリに含まれる属性値との単純な文字列一致に基づいて、属性値の推定をすることがある。この場合、未知の属性及び属性値に対応することができないと考えられる。そこで、本実施形態では、属性値抽出モデルＭ３に知識の不完全さをあえて学習させることによって、実際の運用に対応できるようにしている。 For example, if the attribute value extraction model M3 has not learned that the knowledge used in actual operations is incomplete, the attribute value extraction model M3 may estimate an attribute value assuming that the input query is completely correct. In this case, for example, the attribute value extraction model M3 may estimate an attribute value based on a simple string match with the attribute value contained in the query. In this case, it is considered that unknown attributes and attribute values cannot be handled. Therefore, in this embodiment, the attribute value extraction model M3 is intentionally made to learn the incompleteness of knowledge, thereby enabling it to handle actual operations.

図５は、知識の不完全さを属性値抽出モデルＭ３に学習させる方法の一例を示す図である。例えば、訓練データベースＤＢ１には、属性値抽出モデルＭ３に学習させる商品の商品データ（図５では、商品のタイトル）と、当該商品の属性及び属性値と、の組み合わせが多数格納されている。図５の例では、バッテリーに関する商品の商品データに「ＡＢＣバッテリー１２Ｖ１４ＡＨＳＬＡＲｅｃｈａｒｇｅａｂｌｅ」といった文字列が含まれている。「ＡＢＣバッテリー」は、属性「ブランド」の属性値である。「１４ＡＨ」は、属性「公称容量」の属性値である。 Figure 5 is a diagram showing an example of a method for having the attribute value extraction model M3 learn about the incompleteness of knowledge. For example, the training database DB1 stores a large number of combinations of product data (product titles in Figure 5) for products to be learned by the attribute value extraction model M3, and the attributes and attribute values of the products. In the example of Figure 5, the product data for a battery-related product contains a character string such as "ABC Battery 12V 14AH SLA Rechargeable." "ABC Battery" is the attribute value of the attribute "Brand." "14AH" is the attribute value of the attribute "Nominal Capacity."

例えば、属性データベースＤＢ２には、訓練データベースＤＢ１に存在する属性と属性値のペアが多数格納されている。訓練データベースＤＢ１に格納された属性「公称容量」の属性値として、「１ａｈ」～「１００ａｈ」といった１００個の属性値が存在したとすると、属性「公称容量」と、これら１００個の属性値と、のペアが属性データベースＤＢ２に格納される。属性データベースＤＢ２には、他の属性のペアも多数格納されているものとする。 For example, the attribute database DB2 stores many pairs of attributes and attribute values that exist in the training database DB1. If there are 100 attribute values, such as "1 ah" to "100 ah", for the attribute "nominal capacity" stored in the training database DB1, pairs of the attribute "nominal capacity" and these 100 attribute values are stored in the attribute database DB2. It is assumed that the attribute database DB2 also stores many pairs of other attributes.

仮に、属性「公称容量」の１００個すべての属性値を含むクエリＱ４０を利用して属性値抽出モデルＭ３の学習を行ったとすると、属性値抽出モデルＭ３は、自身に入力される知識が完全であると認識する可能性がある。この場合、例えば、属性値抽出モデルＭ３は、属性値にばかり着目してしまい、属性に着目しなくなったり、未知の属性値を抽出できなくなったりする可能性がある。例えば、未知の属性がクエリとして入力されたり、利用可能な属性値が少ない属性がクエリとして入力されたりした場合に、属性値抽出モデルＭ３の精度が低下する可能性がある。 If the attribute value extraction model M3 were trained using a query Q40 that includes all 100 attribute values of the attribute "nominal capacity", the model might come to treat the knowledge input to it as complete. In that case, for example, the attribute value extraction model M3 may focus only on the attribute values and stop paying attention to the attribute, or may become unable to extract unknown attribute values. For example, if an unknown attribute is input as a query, or an attribute with few available attribute values is input as a query, the accuracy of the attribute value extraction model M3 may decrease.

そこで、本実施形態では、知識の不完全さを属性値抽出モデルＭ３に学習させるために、２つの手法が利用される。１つ目の手法では、属性データベースＤＢ２に格納された全ての属性値を属性値抽出モデルＭ３に学習させるのではなく、あえて一部を除外したうえで属性値抽出モデルＭ３に学習させるようにしている。以降、１つ目の手法を、ナレッジドロップアウト手法という。ナレッジドロップアウト手法では、本当は利用可能な属性値を意図的に少なくすることによって、知識の不完全さを属性値抽出モデルＭ３に学習させることができる。即ち、ナレッジドロップアウト手法は、属性値抽出モデルＭ３に学習させる知識を意図的に少なくする手法である。 Therefore, in this embodiment, two techniques are used to make the attribute value extraction model M3 learn the incompleteness of knowledge. In the first technique, rather than having the model learn all of the attribute values stored in the attribute database DB2, some of them are deliberately excluded before training. Hereinafter, the first technique is referred to as the knowledge dropout technique. By intentionally reducing the number of attribute values that are in fact available, the knowledge dropout technique lets the attribute value extraction model M3 learn that knowledge is incomplete. In other words, the knowledge dropout technique intentionally reduces the knowledge given to the attribute value extraction model M3 during learning.
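The knowledge dropout idea can be illustrated with a minimal Python sketch. The function name `knowledge_dropout`, the single uniform `drop_prob`, and the independent per-value draw are assumptions made for illustration; the description only states that some of the available attribute values are deliberately excluded.

```python
import random


def knowledge_dropout(values, drop_prob, rng):
    """Return the attribute values that survive dropout.

    Assumption: each value is excluded independently with the same
    probability drop_prob. rng.random() lies in [0.0, 1.0), so
    drop_prob=0.0 keeps everything and drop_prob=1.0 drops everything.
    """
    return [v for v in values if rng.random() >= drop_prob]


# The 100 nominal-capacity values "1ah" .. "100ah" from the example.
values = [f"{n}ah" for n in range(1, 101)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
kept = knowledge_dropout(values, drop_prob=0.5, rng=rng)
```

Only the values in `kept` would then be placed in the training query, so the model sees a deliberately incomplete list.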

２つ目の手法では、属性データベースＤＢ２に格納された属性値を何れも利用せずに、属性値抽出モデルＭ３に学習させるようにしている。ただし、一切の属性値を利用しない場合には、図３で説明した従来の属性値抽出モデルＭ２と同様に精度が低下する可能性があるので、本実施形態では、訓練データベースＤＢ１に商品データが格納された商品１つにつき、ナレッジドロップアウト手法を利用した学習と、属性値を全て除外して利用しない学習と、の２つの学習を行うようにしている。 In the second technique, the attribute value extraction model M3 is trained without using any of the attribute values stored in the attribute database DB2. However, if no attribute values are used, there is a possibility that the accuracy will decrease as with the conventional attribute value extraction model M2 described in FIG. 3. Therefore, in this embodiment, for each product whose product data is stored in the training database DB1, two types of learning are performed: learning using the knowledge dropout technique, and learning in which all attribute values are removed and not used.

第２の手法では、属性値の利用可能性を示す特別なトークンを、クエリに含めるようにしている。トークンは、語の単位である。図５の例では、スペースで区切られた語の単位をトークンとする。トークンは、スペースではなく、特別な記号で区切られた語の単位であってもよい。トークンは、何らかの意味を有する語になることもあるが、あるトークンだけを見ても人間が意味を理解できる語になるとは限らない。 In the second technique, a special token that indicates the availability of an attribute value is included in the query. A token is a unit of a word. In the example of Figure 5, a token is a unit of a word separated by a space. A token may be a unit of a word separated by a special symbol instead of a space. A token may be a word that has some meaning, but looking at a particular token alone does not necessarily mean that it is a word whose meaning can be understood by a human.

以降、第２の手法で利用される特別なトークンを、ナレッジトークンという。更に、第２の手法を、ナレッジトークン手法という。図５の例では、ナレッジトークンは、［Ｓｅｅｎ］と［Ｕｎｓｅｅｎ］といった文字列で表現される。ナレッジトークンが［Ｓｅｅｎ］であることは、属性値が利用可能であることを意味する。ナレッジトークンが［Ｕｎｓｅｅｎ］であることは、属性値が利用可能ではないことを意味する。 Hereinafter, the special token used in the second technique is referred to as a knowledge token, and the second technique is referred to as the knowledge token technique. In the example of FIG. 5, the knowledge token is expressed as a character string such as [Seen] or [Unseen]. A knowledge token of [Seen] means that attribute values are available. A knowledge token of [Unseen] means that attribute values are not available.

本実施形態では、属性データベースＤＢ２に格納された属性には、ペアとなる属性値が必ず存在するものとする。このため、学習段階では、本当は属性値が利用可能である属性に対し、擬似的に［Ｕｎｓｅｅｎ］のナレッジトークンが関連付けられる。図５の例であれば、学習時の属性値抽出モデルＭ３に対する入力Ｉ４２のように、属性「公称容量」には、本当は１００個の属性値が存在するが、擬似的に属性値が存在しないものとして、［Ｕｎｓｅｅｎ］のナレッジトークンが関連付けられる。それとは別に、入力Ｉ４１のように、ナレッジドロップアウト手法を利用した［Ｓｅｅｎ］のナレッジトークンを含むクエリ（除外されなかった属性値を含むクエリ）も属性値抽出モデルＭ３に学習される。 In this embodiment, it is assumed that every attribute stored in the attribute database DB2 has at least one paired attribute value. For this reason, in the learning stage, an [Unseen] knowledge token is artificially associated with an attribute whose attribute values are in fact available. In the example of FIG. 5, as in input I42 to the attribute value extraction model M3 during learning, the attribute "nominal capacity" actually has 100 attribute values, but it is treated as if it had none and is associated with the [Unseen] knowledge token. Separately, as in input I41, a query that contains the [Seen] knowledge token and is built with the knowledge dropout technique (a query containing the attribute values that were not excluded) is also learned by the attribute value extraction model M3.
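As a rough sketch of how the two training inputs per product might be assembled, the Python below builds one [Seen] query (after dropout) and one [Unseen] query for the same attribute. The space-separated layout `attribute token value ...`, the function name `build_training_queries`, and the fallback to [Unseen] when every value happens to be dropped are assumptions of this sketch; the exact serialization shown in FIG. 5 is not reproduced here.

```python
import random

SEEN, UNSEEN = "[Seen]", "[Unseen]"


def build_training_queries(attribute, values, drop_prob, rng):
    """Build the two queries used for one (product, attribute) pair.

    Returns (seen_query, unseen_query):
      seen_query   - attribute, knowledge token, and the values kept
                     after dropout (corresponds to input I41)
      unseen_query - attribute and [Unseen] only, with all values
                     withheld (corresponds to input I42)
    """
    kept = [v for v in values if rng.random() >= drop_prob]
    # Assumption: if dropout removed every value, mark the query as
    # [Unseen] rather than emitting [Seen] with an empty value list.
    token = SEEN if kept else UNSEEN
    seen_query = " ".join([attribute, token, *kept])
    unseen_query = " ".join([attribute, UNSEEN])
    return seen_query, unseen_query


seen_q, unseen_q = build_training_queries(
    "nominal capacity", ["12ah", "14ah"], drop_prob=0.0, rng=random.Random(0)
)
```

Each query would then be paired with the product title (for example "ABC Battery 12V 14AH SLA Rechargeable") to form one model input.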

以上のように、本実施形態では、ナレッジドロップアウト手法と、ナレッジトークン手法と、を利用して、属性値抽出モデルＭ３に知識の不完全さを学習させるようにしている。知識の不完全さを属性値抽出モデルＭ３に学習させることによって、実運用の環境に適した属性値抽出を実行できるので、属性値抽出モデルＭ３の精度が高まる。以降、学習システム１の詳細を説明する。 As described above, in this embodiment, the knowledge dropout method and the knowledge token method are used to have the attribute value extraction model M3 learn about the incompleteness of knowledge. By having the attribute value extraction model M3 learn about the incompleteness of knowledge, attribute value extraction suitable for the actual operation environment can be performed, thereby improving the accuracy of the attribute value extraction model M3. The learning system 1 will be described in detail below.

［３．学習システムで実現される機能］ [3. Functions realized by the learning system]
図６は、学習システム１で実現される機能の一例を示す機能ブロック図である。本実施形態では、属性値抽出モデルＭ３の学習に関する学習機能が学習端末２０により実現される場合を説明する。学習済みの属性値抽出モデルＭ３を利用した推定に関する推定機能が推定端末３０により実現される場合を説明する。以降、推定時及び学習時の各々で利用される商品、商品データ、属性、属性値、及びクエリを区別するために、商品、商品データ、属性、属性値、及びクエリに対し、下記のように名前を付ける。 FIG. 6 is a functional block diagram showing an example of the functions realized by the learning system 1. In this embodiment, a case will be described in which the learning function for training the attribute value extraction model M3 is realized by the learning terminal 20, and the estimation function for estimation using the trained attribute value extraction model M3 is realized by the estimation terminal 30. Hereinafter, to distinguish the products, product data, attributes, attribute values, and queries used at estimation time from those used at learning time, they are named as follows.

［推定時の各用語］ [Terms used during estimation]
第１商品：学習済みの属性値抽出モデルＭ３の処理対象となる商品 First product: the product to be processed by the trained attribute value extraction model M3
第１データ：第１商品の商品データ First data: the product data of the first product
第１属性：第１商品の属性 First attribute: an attribute of the first product
第１属性値：第１属性に関連付けられた属性値 First attribute value: an attribute value associated with the first attribute
第１クエリ：推定時に利用されるクエリ First query: the query used during estimation
第２属性値：第１データから抽出された属性値 Second attribute value: the attribute value extracted from the first data

［学習時の各用語］ [Terms used during learning]
第２商品：属性値抽出モデルＭ３の学習で利用される商品 Second product: a product used in training the attribute value extraction model M3
第２データ：第２商品の商品データ Second data: the product data of the second product
第２属性：第２商品の属性 Second attribute: an attribute of the second product
第３属性値：第２属性に関連付けられた属性値 Third attribute value: an attribute value associated with the second attribute
第２クエリ：学習時に利用されるクエリ Second query: the query used during learning
第４属性値：第２データに含まれる、学習時の正解となる属性値 Fourth attribute value: the attribute value contained in the second data that serves as the correct answer during learning

［３－１．サーバで実現される機能］ [3-1. Functions realized by the server]
モデル記憶部１００は、記憶部１２により実現される。モデル記憶部１００は、学習済みの属性値抽出モデルＭ３を記憶する。学習済みの属性値抽出モデルＭ３は、後述の学習部２０４による学習が完了した属性値抽出モデルＭ３である。本実施形態では、推定端末３０が、モデル記憶部１００に記憶された学習済みの属性値抽出モデルＭ３をダウンロードして利用する場合を説明するが、推定端末３０は、学習済みの属性値抽出モデルＭ３をダウンロードすることなく、学習済みの属性値抽出モデルＭ３をオンライン上で利用してもよい。モデル記憶部１００は、後述の訓練データベースＤＢ１及び属性データベースＤＢ２を記憶してもよい。 The model storage unit 100 is realized by the storage unit 12. The model storage unit 100 stores the trained attribute value extraction model M3, that is, the attribute value extraction model M3 for which training by the learning unit 204 described later has been completed. In this embodiment, a case will be described in which the estimation terminal 30 downloads and uses the trained attribute value extraction model M3 stored in the model storage unit 100, but the estimation terminal 30 may instead use the trained model online without downloading it. The model storage unit 100 may also store the training database DB1 and the attribute database DB2 described later.

［３－２．学習端末で実現される機能］ [3-2. Functions realized by the learning terminal]
モデル記憶部２００は、記憶部２２により実現される。第３属性値取得部２０１、確率決定部２０２、第２クエリ取得部２０３、及び学習部２０４は、制御部２１を主として実現される。これらの機能は、学習機能の一例である。 The model storage unit 200 is realized by the storage unit 22. The third attribute value acquisition unit 201, the probability determination unit 202, the second query acquisition unit 203, and the learning unit 204 are realized mainly by the control unit 21. These functions are examples of the learning function.

［モデル記憶部］ [Model storage unit]
モデル記憶部２００は、属性値抽出モデルＭ３の学習に必要なデータを記憶する。例えば、モデル記憶部２００は、推定用の第１商品に関する第１属性に関連付けられた少なくとも１つの第１属性値を含む第１クエリを利用して、第１商品に関する第２属性値を含む第１データから第２属性値を抽出するための属性値抽出モデルＭ３を記憶する。第１商品は、第１アイテムの一例である。このため、第１商品と記載した箇所は、第１アイテムと読み替えることができる。 The model storage unit 200 stores data necessary for training the attribute value extraction model M3. For example, the model storage unit 200 stores the attribute value extraction model M3, which uses a first query including at least one first attribute value associated with a first attribute of a first product for estimation to extract a second attribute value from first data that relates to the first product and includes the second attribute value. The first product is an example of a first item. Therefore, wherever "first product" appears, it can be read as "first item."

アイテムとは、属性及び属性値が付与される対象となる物である。アイテムは、商品のような有体物であってもよいし、サービス又はデータのような無体物であってもよい。アイテムは、任意の物であってよく、商品に限られない。例えば、アイテムは、宿泊施設に関するコンテンツ、レストランに関するコンテンツ、電子書籍、動画、楽曲、ウェブサイト、又はその他のコンテンツであってもよい。例えば、アイテムは、金融サービス又は通信サービスといったサービスの紹介文、ＳＮＳにおける投稿、電子メール等のメッセージ、又はその他の文書であってもよい。 An item is an object to which attributes and attribute values are assigned. An item may be a tangible object such as a product, or an intangible object such as a service or data. An item may be any object and is not limited to a product. For example, an item may be content related to accommodations, content related to restaurants, e-books, videos, music, websites, or other content. For example, an item may be an introduction to a service such as a financial service or a communication service, a post on a social networking site, a message such as an e-mail, or other document.

第１アイテムは、上記のようなアイテムのうち、学習済みの属性値抽出モデルＭ３による推定対象となるアイテムである。第１アイテムは、第２属性値の抽出対象となるアイテムである。第１アイテムは、第１データから第２属性値がまだ抽出されていないアイテムである。第１アイテムは、後述の第２アイテムと偶然同じになることもあるが、原則として、第２アイテムとは異なるものとする。 The first item is, among the items described above, an item to be processed by the trained attribute value extraction model M3 at estimation time. The first item is an item from which a second attribute value is to be extracted, that is, an item for which a second attribute value has not yet been extracted from the first data. The first item may coincidentally be the same as the second item described below, but in principle it is different from the second item.

第１データは、第１アイテムに関する何らかの内容を含むデータである。第１データは、第１アイテムの詳細に関するデータである。第１データは、第１アイテムの種類に応じたデータであればよく、第１商品のタイトルに限られない。例えば、第１アイテムが宿泊施設であれば、宿泊施設又は部屋のタイトル又は紹介文が第１データに相当してもよい。例えば、アイテムが電子書籍であれば、電子書籍の実データ部分が第１データに相当してもよい。本実施形態では、第１データは、第１商品の説明に関する第１文字列を含む。属性値抽出モデルＭ３は、当該第１文字列から第２属性値を抽出するための自然言語処理に関するモデルである。 The first data is data including some content related to the first item. The first data is data related to the details of the first item. The first data may be data according to the type of the first item, and is not limited to the title of the first product. For example, if the first item is an accommodation facility, the title or description of the accommodation facility or room may correspond to the first data. For example, if the item is an electronic book, the actual data portion of the electronic book may correspond to the first data. In this embodiment, the first data includes a first character string related to a description of the first product. The attribute value extraction model M3 is a model related to natural language processing for extracting a second attribute value from the first character string.

例えば、モデル記憶部２００は、学習前の属性値抽出モデルＭ３を記憶する。学習前の属性値抽出モデルＭ３は、パラメータが初期値の属性値抽出モデルＭ３である。学習部２０４が学習を開始した後は、モデル記憶部２００は、学習途中の属性値抽出モデルＭ３を記憶する。学習が完了した後は、モデル記憶部２００は、学習済みの属性値抽出モデルＭ３を記憶する。モデル記憶部２００は、属性値抽出モデルＭ３以外にも、訓練データベースＤＢ１及び属性データベースＤＢ２を記憶する。 For example, the model storage unit 200 stores the attribute value extraction model M3 before learning. The attribute value extraction model M3 before learning is an attribute value extraction model M3 with parameters of initial values. After the learning unit 204 starts learning, the model storage unit 200 stores the attribute value extraction model M3 in the middle of learning. After learning is completed, the model storage unit 200 stores the learned attribute value extraction model M3. In addition to the attribute value extraction model M3, the model storage unit 200 also stores a training database DB1 and an attribute database DB2.

図７は、訓練データベースＤＢ１の一例を示す図である。訓練データベースＤＢ１は、属性値抽出モデルＭ３の訓練データになりうるデータが格納されたデータベースである。訓練データは、第２データ及び第２クエリと、第４属性値と、の関係に関するデータである。例えば、訓練データベースＤＢ１には、第２データ、第２クエリに含まれる第２属性、及び第４属性値が関連付けられている。訓練データは、学習時に属性値抽出モデルＭ３に入力される入力部分と、属性値抽出モデルＭ３から出力されるべき出力部分と、のペアを含む。図７では、第２クエリのうちの第２属性だけが示されている。第２クエリに含まれる第３属性値は、属性データベースＤＢ２から補填される。 FIG. 7 is a diagram showing an example of the training database DB1. The training database DB1 stores data that can serve as training data for the attribute value extraction model M3. The training data describes the relationship between the second data and the second query on the one hand and the fourth attribute value on the other. For example, in the training database DB1, the second data, the second attribute included in the second query, and the fourth attribute value are associated with one another. The training data includes pairs of an input portion that is input to the attribute value extraction model M3 during learning and an output portion that the attribute value extraction model M3 should output. In FIG. 7, only the second attribute of the second query is shown. The third attribute values included in the second query are filled in from the attribute database DB2.

訓練データの入力部分は、学習済みの属性値抽出モデルＭ３に入力されるデータと同じ形式である。このため、推定時に入力される第１データ及び第１クエリの形式と、学習時に入力される第２データ及び第２クエリの形式と、は同じである。本実施形態では、第１データ及び第１クエリと、第２データ及び第２クエリと、が互いに文字形式である場合を説明するが、これらは、数字、その他の記号、又はこれらと文字の組み合わせといった任意の形式であってよい。例えば、属性値抽出モデルＭ３に文字が入力されるのではなく、文字の特徴量が入力されるのであれば、訓練データの入力部分として、第２データ及び第２クエリの特徴量が含まれてもよい。 The input portion of the training data has the same format as the data input to the trained attribute value extraction model M3. Therefore, the format of the first data and the first query input at the time of estimation is the same as the format of the second data and the second query input at the time of learning. In this embodiment, the first data and the first query, and the second data and the second query are both in character format, but they may be in any format, such as numbers, other symbols, or a combination of these and characters. For example, if character features are input to the attribute value extraction model M3 instead of characters, the input portion of the training data may include the features of the second data and the second query.

訓練データの出力部分は、学習済みの属性値抽出モデルＭ３から出力されるデータと同じ形式である。このため、推定時の出力の形式と、学習時の出力の形式と、は同じである。本実施形態では、後述の部分識別情報が出力される場合を説明するが、第２属性値及び第４属性値そのものが出力されてもよい。本実施形態では、第２属性値及び第４属性値が互いに文字形式である場合を説明するが、これらは、数字、その他の記号、又はこれらと文字の組み合わせといった任意の形式であってよい。例えば、属性値抽出モデルＭ３から属性値を示す文字列が出力されるのではなく、属性値を識別可能なＩＤ又は番号が出力されるのであれば、訓練データの出力部分として、第４属性値を識別可能なＩＤ又は番号が含まれてもよい。 The output portion of the training data has the same format as the data output from the trained attribute value extraction model M3. Therefore, the format of the output during estimation is the same as the format of the output during learning. In this embodiment, a case where partial identification information described below is output will be described, but the second attribute value and the fourth attribute value themselves may be output. In this embodiment, a case where the second attribute value and the fourth attribute value are both in character format will be described, but these may be in any format, such as numbers, other symbols, or a combination of these and characters. For example, if the attribute value extraction model M3 does not output a character string indicating the attribute value, but rather outputs an ID or number that can identify the attribute value, the output portion of the training data may include an ID or number that can identify the fourth attribute value.

本実施形態では、オンラインショッピングモールで実際に販売される商品が第１商品及び第２商品に相当する場合を説明する。即ち、オンラインショッピングモールに実際にアップロードされる商品データが第１データ及び第２データに相当する。例えば、第１データ及び第２データは、オンラインショッピングモールで販売される商品のタイトル、説明文、又はこれらの組み合わせである。第１データは、ユーザが入力した文字を含む。なお、訓練データは、オンラインショッピングモールで実際に販売される商品に基づいて作成されるのではなく、オンラインショッピングモールの管理者が手作業で作成してもよい。 In this embodiment, a case will be described in which the first and second products correspond to products actually sold at an online shopping mall. That is, product data actually uploaded to an online shopping mall corresponds to the first and second data. For example, the first and second data are the titles, descriptions, or combinations of products sold at an online shopping mall. The first data includes characters entered by a user. Note that the training data may be created manually by an administrator of the online shopping mall, rather than being created based on products actually sold at an online shopping mall.

第２クエリは、第２商品の属性値を問うための質問である。例えば、第２クエリは、第２属性と、当該第２属性に関連付けられた第３属性値と、を含むことができる。第２属性は、第２商品に関連付けられた属性である。例えば、第２属性は、店舗の担当者が指定した属性を示す文字を含む。第２属性は、オンラインショッピングモールの管理者により指定されてもよいし、第２属性を抽出するためのツールが利用されてもよい。本実施形態では、第２クエリに含まれる第３属性値は、ナレッジドロップアウト手法により決定されるので、訓練データベースＤＢ１には、第２クエリに含めるべき第２属性のみが示されている。第２クエリに含める第３属性値は、後述の属性データベースＤＢ２から取得される。 The second query is a question for inquiring about the attribute value of the second product. For example, the second query may include a second attribute and a third attribute value associated with the second attribute. The second attribute is an attribute associated with the second product. For example, the second attribute includes characters indicating an attribute specified by a store staff member. The second attribute may be specified by an online shopping mall manager, or a tool for extracting the second attribute may be used. In this embodiment, the third attribute value included in the second query is determined by a knowledge dropout method, so that only the second attribute to be included in the second query is shown in the training database DB1. The third attribute value to be included in the second query is obtained from the attribute database DB2 described below.

第４属性値は、第２商品の正解となる属性値である。第４属性値は、第２属性に関連付けられた複数の第３属性値のうちの何れかであってもよいし、当該複数の第３属性値の中には存在しない属性値であってもよい。例えば、第４属性値は、店舗の担当者が指定した属性値を示す文字を含む。なお、第４属性値は、店舗の担当者により指定されるのではなく、オンラインショッピングモールの管理者により指定されてもよいし、過去に作成した属性値抽出モデルＭ３により抽出されてもよい。ただし、過去に作成した属性値抽出モデルＭ３から抽出された第４属性値には、誤った第４属性値が含まれる可能性があるので、人手で作成した訓練データに基づいて学習された属性値抽出モデルＭ３よりも精度が悪くなる可能性がある。 The fourth attribute value is the correct attribute value for the second product. The fourth attribute value may be any one of a plurality of third attribute values associated with the second attribute, or may be an attribute value that does not exist among the plurality of third attribute values. For example, the fourth attribute value includes characters indicating an attribute value specified by a store staff member. The fourth attribute value may not be specified by a store staff member, but may be specified by an online shopping mall manager, or may be extracted by a previously created attribute value extraction model M3. However, the fourth attribute value extracted from the previously created attribute value extraction model M3 may include an incorrect fourth attribute value, and therefore may be less accurate than an attribute value extraction model M3 trained based on manually created training data.

図８は、属性データベースＤＢ２の一例を示す図である。属性データベースＤＢ２は、複数の属性の各々に対して予め用意された属性値が格納されたデータベースである。属性データベースＤＢ２は、属性と属性値の辞書ということもできる。例えば、属性データベースＤＢ２には、属性及び属性値のペアが多数格納される。属性データベースＤＢ２に格納された属性は、第１属性にもなりうるし、第２属性にもなりうる。属性データベースＤＢ２に格納された属性値は、第１属性値にもなりうるし、第３属性値にもなりうる。属性データベースＤＢ２には、後述の頻度が格納されてもよい。 Figure 8 is a diagram showing an example of the attribute database DB2. The attribute database DB2 is a database in which pre-prepared attribute values are stored for each of a plurality of attributes. The attribute database DB2 can also be a dictionary of attributes and attribute values. For example, the attribute database DB2 stores a large number of pairs of attributes and attribute values. An attribute stored in the attribute database DB2 can be a first attribute or a second attribute. An attribute value stored in the attribute database DB2 can be a first attribute value or a third attribute value. The attribute database DB2 may also store frequencies, which will be described later.

本実施形態では、属性データベースＤＢ２は、訓練データベースＤＢ１に基づいて作成されるものとする。例えば、学習端末２０は、訓練データの入力部分に含まれる第２属性と、当該入力部分に対応する出力部分である第４属性値と、の組み合わせを集計する。学習端末２０は、当該集計結果に基づいて、第２属性及び第４属性値のペアを、属性データベースＤＢ２に格納する。学習端末２０は、ある第２属性に対して閾値以上の頻度を有する第４属性値だけを属性データベースＤＢ２に格納してもよい。 In this embodiment, the attribute database DB2 is created based on the training database DB1. For example, the learning terminal 20 tallies the combinations of the second attribute included in the input portion of the training data and the fourth attribute value that is the corresponding output portion. Based on the tally, the learning terminal 20 stores the pairs of second attribute and fourth attribute value in the attribute database DB2. The learning terminal 20 may store in the attribute database DB2 only those fourth attribute values whose frequency for a given second attribute is equal to or greater than a threshold.
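The tallying step can be sketched in Python as follows. The row layout `(title, attribute, value)`, the function name `build_attribute_db`, the `min_freq` parameter, and the second battery title are hypothetical and exist only for illustration; the description does not specify a storage schema.

```python
from collections import Counter, defaultdict


def build_attribute_db(training_rows, min_freq=1):
    """Tally (attribute, correct value) pairs from training rows.

    training_rows: iterable of (title, attribute, value) tuples.
    Returns {attribute: {value: frequency}}, keeping only values
    whose frequency is at least min_freq.
    """
    counts = Counter((attr, value) for _, attr, value in training_rows)
    db = defaultdict(dict)
    for (attr, value), freq in counts.items():
        if freq >= min_freq:
            db[attr][value] = freq
    return dict(db)


# Made-up rows; only the first title appears in the description.
rows = [
    ("ABC Battery 12V 14AH SLA Rechargeable", "brand", "ABC Battery"),
    ("ABC Battery 12V 14AH SLA Rechargeable", "nominal capacity", "14AH"),
    ("ABC Battery 12V 7AH", "brand", "ABC Battery"),
    ("ABC Battery 12V 7AH", "nominal capacity", "7AH"),
]
attribute_db = build_attribute_db(rows, min_freq=2)
```

With `min_freq=2`, only the brand value that occurs twice is retained; with the default `min_freq=1`, every observed pair is stored together with its frequency.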

なお、属性データベースＤＢ２は、任意の方法によって作成可能であり、訓練データベースＤＢ１に基づいて作成されなくてもよい。例えば、学習端末２０は、オンラインショッピングモールで販売される商品に関する商品データベースに基づいて、属性データベースＤＢ２を作成してもよい。例えば、オンラインショッピングモールの管理者が、属性データベースＤＢ２を手作業で作成してもよい。例えば、オンラインショッピングモールで販売される商品に対して付与可能な属性及び属性値のリストが属性データベースＤＢ２として利用されてもよい。 The attribute database DB2 can be created by any method and does not have to be created based on the training database DB1. For example, the learning terminal 20 may create the attribute database DB2 based on a product database related to products sold at an online shopping mall. For example, an administrator of the online shopping mall may manually create the attribute database DB2. For example, a list of attributes and attribute values that can be assigned to products sold at an online shopping mall may be used as the attribute database DB2.

［第３属性値取得部］ [Third attribute value acquisition unit]
第３属性値取得部２０１は、第２クエリに含まれる少なくとも１つの第３属性値を取得する。例えば、第３属性値取得部２０１は、複数の第３属性値が格納された属性データベースＤＢ２の中から、第２属性に関連付けられた複数の第３属性値を取得する。属性データベースＤＢ２は、第２データベースの一例である。このため、属性データベースＤＢ２と記載した箇所は、第２データベースと読み替えることができる。第２データベースは、種々の第３属性値が格納されたデータベースであればよく、他の名前で呼ばれてもよい。 The third attribute value acquisition unit 201 acquires at least one third attribute value to be included in the second query. For example, the third attribute value acquisition unit 201 acquires, from the attribute database DB2 in which a plurality of third attribute values are stored, the plurality of third attribute values associated with the second attribute. The attribute database DB2 is an example of a second database. Therefore, wherever "attribute database DB2" appears, it can be read as "second database." The second database may be any database in which various third attribute values are stored, and may be called by another name.

本実施形態では、第３属性値取得部２０１は、複数の第３属性値が格納された属性データベースＤＢ２に格納された全ての第３属性値を取得する。例えば、第３属性値取得部２０１は、属性データベースＤＢ２を参照し、ある第２商品の第２属性に関連付けられた全ての第３属性値を取得する。第２商品として図５のバッテリーを例に挙げると、第２属性「公称容量」には、「１ａｈ」～「１００ａｈ」といった１００個の第３属性値が属性データベースＤＢ２に格納されているので、第３属性値取得部２０１は、これら１００個の第３属性値を全て取得する。第３属性値取得部２０１は、他の第２属性も同様に、全ての第３属性値を属性データベースＤＢ２から取得する。 In this embodiment, the third attribute value acquisition unit 201 acquires all the third attribute values stored in the attribute database DB2 in which a plurality of third attribute values are stored. For example, the third attribute value acquisition unit 201 refers to the attribute database DB2 and acquires all the third attribute values associated with the second attribute of a certain second product. Taking the battery in FIG. 5 as an example of the second product, the second attribute "nominal capacity" has 100 third attribute values such as "1 ah" to "100 ah" stored in the attribute database DB2, so the third attribute value acquisition unit 201 acquires all of these 100 third attribute values. The third attribute value acquisition unit 201 similarly acquires all the third attribute values for the other second attributes from the attribute database DB2.

なお、第３属性値取得部２０１は、属性データベースＤＢ２に格納された第３属性値のうちの一部のみを取得してもよい。例えば、第３属性値取得部２０１は、全ての第３属性値の中から、所定数の第３属性値をランダムに取得してもよい。例えば、第３属性値取得部２０１は、全ての第３属性値の中から、訓練データベースＤＢ１における頻度が高い順に所定数の第３属性値を取得してもよい。例えば、第３属性値取得部２０１は、全ての第３属性値の中から、訓練データベースＤＢ１における頻度が閾値以上の第３属性値を取得してもよい。 The third attribute value acquisition unit 201 may acquire only a portion of the third attribute values stored in the attribute database DB2. For example, the third attribute value acquisition unit 201 may randomly acquire a predetermined number of third attribute values from among all the third attribute values. For example, the third attribute value acquisition unit 201 may acquire a predetermined number of third attribute values from among all the third attribute values in descending order of frequency in the training database DB1. For example, the third attribute value acquisition unit 201 may acquire third attribute values whose frequency in the training database DB1 is equal to or greater than a threshold value from among all the third attribute values.
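The three partial-acquisition variants above (random sample, most frequent first, frequency threshold) might look like the following Python sketch. The function names and the `{value: frequency}` dictionary are assumptions, and the frequencies shown are made up.

```python
import random


def sample_random(freqs, k, rng):
    """Pick up to k values uniformly at random (without replacement)."""
    values = sorted(freqs)  # sort first so a seeded rng is repeatable
    return rng.sample(values, min(k, len(values)))


def top_k_by_frequency(freqs, k):
    """Pick the k most frequent values; ties are broken alphabetically."""
    return sorted(freqs, key=lambda v: (-freqs[v], v))[:k]


def at_least(freqs, threshold):
    """Pick every value whose frequency is at least threshold."""
    return [v for v, f in freqs.items() if f >= threshold]


# Hypothetical frequencies of three nominal-capacity values.
freqs = {"7ah": 5, "14ah": 9, "100ah": 1}
most_common = top_k_by_frequency(freqs, 2)
frequent = at_least(freqs, 5)
picked = sample_random(freqs, 2, random.Random(0))
```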

例えば、第３属性値取得部２０１は、属性データベースＤＢ２から第３属性値を取得するのではなく、訓練データベースＤＢ１又は他のデータベースから第３属性値を取得してもよい。第３属性値は、オンラインショッピングモールの管理者により指定されてもよい。例えば、属性値抽出モデルＭ３の学習時に、管理者が第３属性値を逐一指定する場合には、第３属性値取得部２０１は、当該逐一指定された第３属性値を取得してもよい。利用可能な第３属性値が１つだけの場合には、第３属性値取得部２０１は、１つの第３属性値だけを取得してもよい。 For example, the third attribute value acquisition unit 201 may acquire the third attribute value from the training database DB1 or another database, rather than acquiring the third attribute value from the attribute database DB2. The third attribute value may be specified by an administrator of an online shopping mall. For example, when the administrator specifies the third attribute value one by one during learning of the attribute value extraction model M3, the third attribute value acquisition unit 201 may acquire the third attribute value specified one by one. When only one third attribute value is available, the third attribute value acquisition unit 201 may acquire only one third attribute value.

［確率決定部］ [Probability determination unit]
確率決定部２０２は、第３属性値が除外される確率を決定する。この確率は、ナレッジドロップアウト手法において、仮の第２クエリには含められた第３属性値が、最終的な第２クエリには含まれなくなる確率である。この確率は、全ての第３属性値で共通の確率であってもよいしランダムに決定されてもよいが、本実施形態では、第３属性値の頻度に応じた確率であるものとする。 The probability determination unit 202 determines the probability that a third attribute value is excluded. In the knowledge dropout technique, this is the probability that a third attribute value included in the provisional second query is left out of the final second query. The probability may be common to all third attribute values or may be determined randomly, but in this embodiment it depends on the frequency of the third attribute value.

頻度は、第３属性値が実際の商品に利用されている頻度である。本実施形態では、ある第３属性値の頻度は、訓練データベースＤＢ１で当該第３属性値が出現する頻度である。頻度は、訓練データベースＤＢ１における出現数ということもできる。図８では省略しているが、属性データベースＤＢ２には、第３属性値の頻度も格納されているものとする。例えば、学習端末２０は、訓練データベースＤＢ１で第３属性値が出現した頻度を集計し、属性データベースＤＢ２に格納する。 The frequency is the frequency with which the third attribute value is used in actual products. In this embodiment, the frequency of a certain third attribute value is the frequency with which the third attribute value appears in the training database DB1. The frequency can also be referred to as the number of occurrences in the training database DB1. Although omitted in FIG. 8, the frequency of the third attribute value is also stored in the attribute database DB2. For example, the learning terminal 20 tallies the frequency with which the third attribute value appears in the training database DB1, and stores it in the attribute database DB2.

例えば、確率決定部２０２は、第３属性値ごとに、複数の第２データが格納された訓練データベースＤＢ１における当該第３属性値の頻度に基づいて、当該第３属性値の確率を決定する。訓練データベースＤＢ１は、第１データベースの一例である。このため、訓練データベースＤＢ１と記載した箇所は、第１データベースと読み替えることができる。第１データベースは、種々の第２商品の第３属性値が格納されたデータベースであればよく、他の名前で呼ばれてもよい。例えば、オンラインショッピングモールの商品データベースが第１データベースに相当してもよい。即ち、他のデータベースにおける頻度が利用されてもよい。 For example, the probability determination unit 202 determines the probability of each third attribute value based on the frequency of the third attribute value in a training database DB1 in which multiple second data are stored. The training database DB1 is an example of a first database. Therefore, the reference to training database DB1 can be read as the first database. The first database may be called by another name as long as it is a database in which the third attribute values of various second products are stored. For example, a product database of an online shopping mall may correspond to the first database. In other words, the frequency in another database may be used.

For example, the probability determination unit 202 determines the probability so that the higher the frequency of a third attribute value, the lower its probability of exclusion. Conversely, the lower the frequency, the higher the probability of exclusion. As a result, frequently used third attribute values are less likely to be excluded, which makes it easier for the attribute value extraction model M3 to learn them.

Data indicating the relationship between frequency and probability is assumed to be stored in advance in the model storage unit 200. This data may be in any format, for example a mathematical formula or a table. It may also be a machine learning model or part of a program. The probability determination unit 202 sets the probability of a third attribute value to the probability associated with that value's frequency.

For example, the probability determination unit 202 may determine the probability of each third attribute value based on a predetermined exclusion rate and the frequency of that third attribute value. The exclusion rate is a default probability. The exclusion rate may differ between third attribute values, but in this embodiment it is common to all third attribute values. The exclusion rate may be specified by the administrator of the online shopping mall, or may be determined dynamically based on an external database.

For example, if the exclusion rate is $r$ and the frequency of a certain third attribute value is $n_v$, the probability determination unit 202 sets the probability of this third attribute value to $r^{n_v}$. With $0 < r < 1$, this value decreases as $n_v$ increases, so frequent values are excluded less often. The probability may be calculated in other ways and is not limited to this example. For example, the probability may be the exclusion rate $r$ divided by the frequency $n_v$, or a value derived from the result of subtracting the frequency $n_v$ from the exclusion rate $r$. The probability determination unit 202 may also determine the probability without using the exclusion rate $r$.
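As a rough sketch of the calculation described above, the function below computes the exclusion probability from the exclusion rate r and the frequency n_v. The function name and the `method` switch are assumptions for illustration. The subtraction-based variant is omitted because the text does not say how the difference is mapped to a probability.

```python
def exclusion_probability(r, n_v, method="power"):
    """Return the probability of excluding a value that occurs n_v times."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("exclusion rate r must lie in [0, 1]")
    if n_v < 1:
        raise ValueError("frequency n_v must be at least 1")
    if method == "power":
        # Embodiment example: r raised to the power n_v.
        return r ** n_v
    if method == "ratio":
        # Alternative mentioned in the text: r divided by n_v.
        return r / n_v
    raise ValueError(f"unknown method: {method}")


rare = exclusion_probability(0.5, 1)    # value seen once
common = exclusion_probability(0.5, 3)  # value seen three times
```

With r = 0.5, a value seen once is excluded with probability 0.5, while a value seen three times is excluded with probability 0.125.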

[Second query acquisition unit]
The second query acquisition unit 203 acquires a second query. In this embodiment, third attribute values that are in fact available for learning are intentionally excluded in order to represent incomplete knowledge. The second query acquisition unit 203 therefore acquires a second query from which at least some of the plurality of third attribute values associated with the second attribute of the second product for learning have been excluded.

Excluding a third attribute value means not including it in the second query. In other words, intentionally leaving out of the second query a third attribute value that is in fact available for learning corresponds to excluding it. Likewise, leaving out of the second query a third attribute value that exists in the attribute database DB2 corresponds to excluding it. Exclusion of a third attribute value can also be called invalidation or deletion of the third attribute value.

This embodiment describes, as an example, the case where the second query includes the second attribute and at least some of the plurality of third attribute values are excluded. That is, the second query includes both the second attribute and third attribute values. The second query may instead omit the second attribute and include only the third attribute values that were not excluded. If a reasonable number of third attribute values are available and the meaning of the second query can be recognized to some extent from the third attribute values alone, the second query need not include the second attribute.

If the query Q40 in FIG. 5 corresponds to the second query, the second query includes a knowledge token, attribute tokens corresponding to the second attribute, attribute value tokens corresponding to the third attribute values, and SEP tokens. An SEP token is a special token that marks a boundary. For example, there is an SEP token marking the boundary between the second data and the second attribute (in FIG. 5, the SEP token between the title tokens and the knowledge token), an SEP token marking the boundary between the second attribute and the third attribute values (in FIG. 5, the SEP token between the attribute tokens and the attribute value tokens), and SEP tokens marking the boundaries between third attribute values (in FIG. 5, the SEP tokens between attribute value tokens). The CLS token in FIG. 5 is a special token placed at the beginning. The hidden states correspond to memory that holds the internal computations of the attribute value extraction model M3, and may hold information such as embeddings.
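The token layout described above can be sketched as a simple list-building function. The function name, the example tokens, and the absence of a trailing SEP token are assumptions; only the relative order of the CLS token, title tokens, knowledge token, attribute tokens, attribute value tokens, and separating SEP tokens follows the description of FIG. 5.

```python
def build_input(title_tokens, knowledge_token, attribute_tokens, value_token_lists):
    """Lay out model input tokens: title, then knowledge token, attribute, values."""
    # [CLS] title [SEP] knowledge-token attribute
    tokens = ["[CLS]", *title_tokens, "[SEP]", knowledge_token, *attribute_tokens]
    # One [SEP] before the first value and between consecutive values.
    for value_tokens in value_token_lists:
        tokens.append("[SEP]")
        tokens.extend(value_tokens)
    return tokens


example = build_input(
    title_tokens=["battery", "100ah"],
    knowledge_token="[Seen]",
    attribute_tokens=["capacity"],
    value_token_lists=[["2ah"], ["3ah"]],
)
```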

In the knowledge dropout method, the second query acquisition unit 203 excludes only some of the plurality of third attribute values, and acquires a second query that includes the remaining third attribute values that were not excluded. For example, if k third attribute values (k being an integer of 2 or more) are associated with a certain second attribute, excluding at least one but fewer than k of them corresponds to excluding only some of the plurality of third attribute values. The number or proportion of third attribute values to be excluded may be predetermined or may change dynamically.
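For the variant in which the number of excluded values is fixed in advance, one possible sketch is shown below. The helper name `exclude_some` and the use of uniform random selection are assumptions; the text only requires that at least one and fewer than k values are excluded.

```python
import random


def exclude_some(values, num_excluded, rng):
    """Drop exactly num_excluded values, keeping the original order of the rest."""
    k = len(values)
    if k < 2:
        raise ValueError("need at least two values (k >= 2)")
    if not 1 <= num_excluded < k:
        raise ValueError("must exclude at least 1 and fewer than k values")
    # Choose which positions to drop, uniformly at random (an assumption).
    dropped = set(rng.sample(range(k), num_excluded))
    return [v for i, v in enumerate(values) if i not in dropped]


all_values = ["1ah", "2ah", "3ah", "4ah"]
kept = exclude_some(all_values, 2, random.Random(0))
```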

In this embodiment, the second query acquisition unit 203 decides, for each third attribute value, whether to exclude it based on the probability associated with that third attribute value. Of the plurality of third attribute values, the second query acquisition unit 203 excludes only those it has decided to exclude based on this probability. For example, the probability associated with each third attribute value is the probability that depends on the frequency of that third attribute value.

In the example of the probability determination unit 202 described above, the probability of a third attribute value is $r^{n_v}$. The second query acquisition unit 203 decides, for each third attribute value, whether to exclude it based on its probability $r^{n_v}$. Various random drawing algorithms can be used for this decision. The second query acquisition unit 203 removes from the second query each third attribute value it has decided to exclude, and leaves in the second query each third attribute value it has not decided to exclude.
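One simple drawing algorithm is an independent random draw per value, sketched below. The function name and the dictionary of frequencies are assumptions for illustration, and the source does not prescribe this particular drawing scheme.

```python
import random


def drop_values(values, frequencies, r, rng):
    """Keep each value unless a uniform draw falls below its exclusion probability."""
    kept = []
    for value in values:
        p_exclude = r ** frequencies[value]  # r to the power n_v
        # rng.random() is uniform on [0, 1), so the value is excluded
        # with probability p_exclude.
        if rng.random() >= p_exclude:
            kept.append(value)
    return kept


values = ["1ah", "2ah", "3ah"]
frequencies = {"1ah": 1, "2ah": 20, "3ah": 5}  # hypothetical counts from DB1
final_values = drop_values(values, frequencies, r=0.5, rng=random.Random(42))
```

Because each value is drawn independently, a single call may exclude none or all of the values. A caller that needs "only some" excluded could redraw in those cases.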

In this embodiment, all third attribute values are first acquired, so the second query acquisition unit 203 excludes only some of all the third attribute values and acquires a second query that includes the remaining ones. For example, the second query acquisition unit 203 first acquires a tentative second query that includes all third attribute values. It then removes from the tentative second query the third attribute values it has decided to exclude, and thereby acquires the final second query.

For example, the second query acquisition unit 203 acquires a second query that includes a plurality of third attribute values. This second query is the tentative second query. The second query acquisition unit 203 may exclude some of the third attribute values in this query by replacing them with another value. The other value may be any predetermined value, for example a predetermined character string. It is sometimes called padding, and may be a character string with no particular meaning. Here, the other value is assumed to be a character string such as [PAD].

For example, the second query obtained after the excluded third attribute values have been replaced with the other value is the final second query. In the example of input I41 in FIG. 5, third attribute values such as "1ah", "4ah", and "5ah" are excluded. FIG. 5 shows the case where the excluded third attribute values are deleted from the tentative second query without being replaced. Alternatively, the other value replacing the excluded "1ah" may be inserted before "2ah" in input I41, and the other values replacing the excluded "4ah" and "5ah" may be inserted between "3ah" and "6ah".
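The two ways of producing the final second query, deletion and replacement with [PAD], can be contrasted in a short sketch. The function name is hypothetical, and the tentative value list is an assumption reconstructed from the values named in the FIG. 5 example.

```python
def finalize_query_values(tentative_values, excluded, pad_token=None):
    """Delete excluded values, or replace them with pad_token if one is given."""
    final = []
    for value in tentative_values:
        if value not in excluded:
            final.append(value)
        elif pad_token is not None:
            # Replacement keeps the position of the excluded value.
            final.append(pad_token)
    return final


tentative = ["1ah", "2ah", "3ah", "4ah", "5ah", "6ah"]  # assumed full list
excluded = {"1ah", "4ah", "5ah"}

deleted = finalize_query_values(tentative, excluded)
padded = finalize_query_values(tentative, excluded, pad_token="[PAD]")
```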

The second query described so far is the one used in the knowledge dropout method. In the knowledge token method, the second query acquisition unit 203 acquires a second query of a first type, which includes the [Seen] knowledge token indicating that third attribute values are available and at least some of the plurality of third attribute values. The [Seen] knowledge token is an example of first identification information, so descriptions of the [Seen] knowledge token can be read as descriptions of the first identification information.

The first identification information only needs to indicate that third attribute values are available, and information other than the [Seen] knowledge token may be used. For example, it may be a character string other than [Seen], a numerical value, or another symbol. However, the token used as the first identification information must not appear in titles, attribute names, or attribute values. Because it is not known what titles, attribute names, and attribute values will be input when the trained attribute value extraction model M3 performs estimation, the first identification information is marked, for example by enclosing it in [ ], so that it can be recognized as the first identification information. The marker is not limited to [ ] and may be another symbol. The second query of the first type is a second query that includes both the first identification information and third attribute values. In the example of FIG. 5, the part of input I41 from [Seen] onward corresponds to the second query of the first type.

In the knowledge token method, the second query acquisition unit 203 also acquires a second query of a second type, which includes second identification information indicating that third attribute values are not available and from which all of the plurality of third attribute values have been excluded. The [Unseen] knowledge token is an example of the second identification information, so descriptions of the [Unseen] knowledge token can be read as descriptions of the second identification information.

The second identification information only needs to indicate that third attribute values are not available, and information other than the [Unseen] knowledge token may be used. For example, it may be a character string other than [Unseen], a numerical value, or another symbol. Like the first identification information, the second identification information is marked, for example by enclosing it in [ ], so that it can be recognized as the second identification information. The second query of the second type is a second query that includes the second identification information. In the example of FIG. 5, the part of input I42 from [Unseen] onward corresponds to the second query of the second type, and it includes only the knowledge token and the second attribute.
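A minimal sketch of the two query types follows, including a guard for the requirement that the knowledge tokens do not appear in attribute names or attribute values. All function names are hypothetical, and queries are represented as plain token lists for simplicity.

```python
SEEN = "[Seen]"      # first identification information
UNSEEN = "[Unseen]"  # second identification information


def check_no_collision(texts):
    """Reject inputs that already contain one of the reserved knowledge tokens."""
    for text in texts:
        if SEEN in text or UNSEEN in text:
            raise ValueError(f"reserved knowledge token found in {text!r}")


def first_type_query(attribute, kept_values):
    """[Seen] query: knowledge token, attribute, and the non-excluded values."""
    if not kept_values:
        raise ValueError("the first type needs at least one attribute value")
    check_no_collision([attribute, *kept_values])
    return [SEEN, attribute, *kept_values]


def second_type_query(attribute):
    """[Unseen] query: knowledge token and attribute only, all values excluded."""
    check_no_collision([attribute])
    return [UNSEEN, attribute]


seen_query = first_type_query("capacity", ["2ah", "3ah", "6ah"])
unseen_query = second_type_query("capacity")
```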

The [Seen] and [Unseen] knowledge tokens may be inserted at any position, and the position is not limited to the example in FIG. 5. For example, these knowledge tokens may be inserted after the second attribute or after the third attribute values. It is sufficient that the knowledge tokens are inserted at a predetermined position.

[Learning unit]
The learning unit 204 trains the attribute value extraction model M3 based on the second query and second data that includes a fourth attribute value of the second product. The second product is an example of a second item, so references to the second product can be read as references to the second item. Among the items described in connection with the first item, the second item is one used for training the attribute value extraction model M3. The second item is an item for which the correct fourth attribute value has been identified, and it is used as training data for the attribute value extraction model M3.

The second data is data that includes some content about the second item, such as details of the second item. The second data may be any data suited to the type of the second item and is not limited to the title of the second product. For example, if the second item is an accommodation facility, the title or description of the facility or a room may correspond to the second data. If the second item is an e-book, the actual data of the e-book may correspond to the second data. In this embodiment, the second data includes a second character string describing the second product.

For example, the learning unit 204 trains the attribute value extraction model M3 so that, when the input part of a piece of training data is input, the output part of that training data is output. Training means adjusting the parameters of the attribute value extraction model M3. Various algorithms suited to the attribute value extraction model M3 can be used. For example, training may take the form of fine-tuning a pre-trained Transformer-based model such as BERT, and algorithms such as backpropagation or gradient descent may be used to find optimal parameters for other models.

In this embodiment, the learning unit 204 trains the attribute value extraction model M3 so that, when the second data and the second query are input, the model outputs a start point Pb and an end point Pe that identify the part of the second data containing the fourth attribute value. The start point Pb and the end point Pe are an example of partial identification information, so descriptions of the start point Pb and the end point Pe can be read as descriptions of the partial identification information. In the example of FIG. 5, the fourth attribute value in inputs I41 and I42 is the part "100Ah". If tokens are characters, training is performed so that "1" is the start point Pb and "h" is the end point Pe. If tokens are words and "100Ah" is split into "100" and "Ah", "100" is the start point Pb and "Ah" is the end point Pe. If tokens are subwords and "100Ah" is split into "100", "##A", and "##h", "100" is the start point Pb and "##h" is the end point Pe. The "##" prefix on a subword means that, in the original character string, the subword was joined to the preceding subword. As shown in FIG. 5, training is performed so that minor notational variations such as "AH", "ah", and "Ah" can be absorbed.
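To make the start point and end point labels concrete, the sketch below locates the token span of a known attribute value in a tokenized input. The function name is hypothetical, and matching case-insensitively when preparing labels is an assumption made here to mirror the notational variations mentioned above. It is not stated in the source as the way the labels are produced.

```python
def find_span(tokens, value_tokens):
    """Return (start, end) token indices of the first match, or None if absent."""
    if not value_tokens:
        return None
    n = len(value_tokens)
    target = [t.lower() for t in value_tokens]
    for start in range(len(tokens) - n + 1):
        window = [t.lower() for t in tokens[start:start + n]]
        if window == target:
            return start, start + n - 1  # end index is inclusive
    return None


# Subword-level example corresponding to "100Ah" -> "100", "##A", "##h".
subword_tokens = ["[CLS]", "battery", "100", "##A", "##h", "[SEP]"]
subword_span = find_span(subword_tokens, ["100", "##a", "##h"])

# Character-level example: "1" is the start point and "h" the end point.
char_span = find_span(list("pack 100Ah"), list("100ah"))
```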

The partial identification information may be any information that can identify the part of the second data containing the fourth attribute value. For example, it may be a pair consisting of the position of the character immediately before the fourth attribute value and the position of the character immediately after it. The partial identification information may also include information that identifies not only the start point Pb and the end point Pe of the fourth attribute value but also the characters in between. The attribute value extraction model M3 may output the fourth attribute value itself instead of the start point Pb and the end point Pe. In that case, the learning unit 204 trains the attribute value extraction model M3 so that, when the second data and the second query are input, the fourth attribute value contained in the second data is output.
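The alternative encodings mentioned above can be illustrated at the character level. Both helper names and the use of 0-based positions are assumptions for this sketch. A value at the very start of the text yields -1 as the "before" position.

```python
def boundary_pair(text, value):
    """Positions of the characters just before and just after the value."""
    start = text.find(value)
    if start < 0:
        return None
    end = start + len(value) - 1
    return start - 1, end + 1


def inside_mask(text, value):
    """Per-character flags marking every character of the value, not just its ends."""
    start = text.find(value)
    if start < 0:
        return [False] * len(text)
    end = start + len(value)
    return [start <= i < end for i in range(len(text))]


title = "Battery 100Ah pack"
pair = boundary_pair(title, "100Ah")
mask = inside_mask(title, "100Ah")
```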

In the knowledge token method, the learning unit 204 performs first learning of the attribute value extraction model M3 based on the second data and the second query of the first type. In this embodiment, the first learning is assumed to be the same as the learning in the knowledge dropout method. The learning unit 204 also performs second learning of the attribute value extraction model M3 based on the second data and the second query of the second type. That is, for a single second product, the learning unit 204 performs two kinds of learning: the first learning using the second query of the first type and the second learning using the second query of the second type.
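The pairing of first and second learning can be pictured as generating two training examples per product that share the same title and the same correct answer. The dictionary layout and the function name below are assumptions for illustration only.

```python
def training_examples(title, attribute, kept_values, answer):
    """Build one [Seen] example and one [Unseen] example for the same product."""
    seen_example = {
        "title": title,
        "query": ["[Seen]", attribute, *kept_values],  # first type
        "answer": answer,
    }
    unseen_example = {
        "title": title,
        "query": ["[Unseen]", attribute],              # second type
        "answer": answer,
    }
    return [seen_example, unseen_example]


examples = training_examples(
    title="Deep cycle battery 100Ah",
    attribute="capacity",
    kept_values=["2ah", "3ah", "6ah"],  # values left after knowledge dropout
    answer="100Ah",
)
```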

In this embodiment, the learning unit 204 performs the first learning and the second learning using multi-domain learning in which the availability of third attribute values is regarded as a domain. In this multi-domain learning, the presence or absence of attribute value knowledge is treated as a domain, inputs containing [Seen] or [Unseen] are artificially prepared as the data for each domain, and the attribute value extraction model M3 is trained for the two domains. For example, the learning unit 204 performs this multi-domain learning, which is inspired by the multi-domain learning method of "Denny Britz, Quoc Le, and Reid Pryzant. 2017. Effective domain mixing for neural machine translation. In Proceedings of the Second Conference on Machine Translation, pages 118-126, Copenhagen, Denmark. Association for Computational Linguistics.".

As a result of the first learning, when a query of the same type as the first type is input, the first domain corresponding to the first type is used for estimation. As a result of the second learning, when a query of the same type as the second type is input, the second domain corresponding to the second type is used for estimation. Estimation in the first domain focuses more on the attribute values in the query than estimation in the second domain does, while estimation in the second domain focuses more on the attribute in the query. In other words, estimation in the first domain places more weight on knowledge than estimation in the second domain. The learning unit 204 may also use a method other than multi-domain learning to perform learning with the second query of the first type and the second query of the second type.

When training of the untrained attribute value extraction model M3 stored in the model storage unit 200 is complete, the learning unit 204 records the trained attribute value extraction model M3 in the model storage unit 200. The learning unit 204 transmits the trained attribute value extraction model M3 to the server 10. On receiving the trained attribute value extraction model M3, the server 10 records it in the model storage unit 100.

[3-3. Functions realized by the estimation terminal]
The model storage unit 300 is realized by the storage unit 32. The availability determination unit 301, the first query acquisition unit 302, and the second attribute value extraction unit 303 are realized mainly by the control unit 31. These functions are examples of estimation functions.

[Model storage unit]
The model storage unit 300 stores the trained attribute value extraction model M3. For example, when the estimation terminal 30 downloads the trained attribute value extraction model M3 from the server 10, it records the model in the model storage unit 300. The model storage unit 300 also stores the attribute database DB2, which may likewise be downloaded from the server 10.

[Availability determination unit]
The availability determination unit 301 determines the availability of at least one first attribute value associated with the first attribute of the first product for estimation. In this embodiment, first attribute values may not be available for some first attributes. For example, when an unknown first attribute is input as the first query, no first attribute value is available. Thus, while the second attribute for learning is in principle stored in the attribute database DB2, the first attribute for estimation is not necessarily stored there. Moreover, even if the first attribute is stored in the attribute database DB2, a first attribute value is not necessarily associated with it, because no first attribute value may exist.

For example, the availability determination unit 301 refers to the attribute database DB2 and determines whether the first attribute exists in it. If the first attribute does not exist in the attribute database DB2, the availability determination unit 301 determines that no first attribute value is available. If the first attribute exists in the attribute database DB2, the availability determination unit 301 determines whether a first attribute value associated with the first attribute exists in the attribute database DB2.

If no first attribute value associated with the first attribute exists in the attribute database DB2, the availability determination unit 301 determines that no first attribute value is available. If such a first attribute value exists, it determines that a first attribute value is available. The method of determining availability is not limited to this example. For example, even if first attribute values associated with the first attribute exist, the availability determination unit 301 may determine that they are not available when their number is below a threshold or when their frequency is below a threshold.
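The two-step check described above, optionally combined with a count threshold, might look like the following. The dictionary standing in for the attribute database DB2, the function name, and the `min_values` parameter are assumptions. The frequency-based threshold is omitted for brevity.

```python
def is_available(attribute_db, attribute, min_values=1):
    """Return True if enough first attribute values exist for the attribute."""
    # Step 1: does the attribute exist in the database at all?
    values = attribute_db.get(attribute)
    if values is None:
        return False
    # Step 2: are there enough associated values (at least one by default)?
    return len(values) >= min_values


# Hypothetical stand-in for the attribute database DB2.
db2 = {
    "capacity": ["1ah", "2ah", "3ah"],
    "color": [],
}
```

For instance, `is_available(db2, "capacity")` is true, while both the unknown attribute "voltage" and the attribute "color" with no associated values are treated as unavailable.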

This embodiment describes, as an example, the case where a store staff member specifies the first attribute, but the first attribute may be acquired in any way. For example, the administrator of the online shopping mall may specify the first attribute. The first attribute may also be acquired by estimating it from the first data using a machine learning model that has learned the relationship between product data and attributes. As another example, an attribute related to the attribute that the store staff member specified for the first product may be acquired as the first attribute. The same applies to the first data, which may be acquired by any method and need not be specified by the store staff member. For example, the first data and the first attribute may be stored in the estimation terminal 30 or another computer and acquired from there.

［第１クエリ取得部］
第１クエリ取得部３０２は、推定用の第１商品に関する第１属性に関連付けられた少なくとも１つの第１属性値を含む第１クエリを取得する。本実施形態では、第１クエリは、原則として、第１属性と、少なくとも１つの第１属性値と、を含む。ただし、第１属性値が利用可能とは限らないので、この場合には、第１クエリは、第１属性を含むが第１属性値は含まないこともある。第１属性値が利用可能な場合には、第１クエリには、第１属性が１つだけ含まれてもよい。 [First query acquisition unit]
The first query acquisition unit 302 acquires a first query including at least one first attribute value associated with a first attribute related to a first product for estimation. In this embodiment, the first query includes, in principle, the first attribute and at least one first attribute value. However, the first attribute value is not always available; when it is not, the first query may include the first attribute but no first attribute value. When the first attribute value is available, the first query may include only one first attribute.

ナレッジトークン手法では、第１クエリ取得部３０２は、第１属性値が利用可能であると判定された場合には、第１属性値が利用可能なことを示す第３識別情報と、少なくとも１つの第１属性値と、を含む第３タイプの第１クエリを取得する。第３識別情報は、第１属性値の利用可能性を示すという点で第１識別情報とは異なるが、他の点は、第１識別情報と同様である。このため、本実施形態では、第３識別情報は、［Ｓｅｅｎ］の文字列である。第３タイプは、推定用の第１クエリのタイプという点で第１タイプとは異なるが、データとしての形式自体は、第１タイプと同様である。第３識別情報及び第３タイプの詳細は、第１識別情報及び第１タイプの詳細と同様である。 In the knowledge token method, when it is determined that the first attribute value is available, the first query acquisition unit 302 acquires a third type of first query including third identification information indicating that the first attribute value is available and at least one first attribute value. The third identification information differs from the first identification information in that it indicates the availability of the first attribute value, but is similar to the first identification information in other respects. For this reason, in this embodiment, the third identification information is the character string [Seen]. The third type differs from the first type in that it is the type of first query for estimation, but the format itself as data is similar to the first type. The details of the third identification information and the third type are similar to the details of the first identification information and the first type.

第１クエリ取得部３０２は、第１属性値が利用可能であると判定されない場合には、第１属性値が利用可能ではないことを示す第４識別情報を含む第４タイプの第１クエリを取得する。第４識別情報は、第１属性値の利用可能性を示すという点で第２識別情報とは異なるが、他の点は、第２識別情報と同様である。このため、本実施形態では、第４識別情報は、［Ｕｎｓｅｅｎ］の文字列である。第４タイプは、推定用の第１クエリのタイプという点で第２タイプとは異なるが、データとしての形式自体は、第２タイプと同様である。第４識別情報及び第４タイプの詳細は、第２識別情報及び第２タイプの詳細と同様である。 When it is not determined that the first attribute value is available, the first query acquisition unit 302 acquires a fourth type of first query including fourth identification information indicating that the first attribute value is unavailable. The fourth identification information differs from the second identification information in that it indicates the availability of the first attribute value, but is otherwise similar to the second identification information. For this reason, in this embodiment, the fourth identification information is the character string [Unseen]. The fourth type differs from the second type in that it is the type of first query for estimation, but the data format itself is the same as that of the second type. Details of the fourth identification information and the fourth type are similar to the details of the second identification information and the second type.
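As a rough illustration of the third and fourth query types, the sketch below builds a token list for the first query. The position of the knowledge token at the head of the query, the `[SEP]` separator string, and the function name `build_first_query` are assumptions for this example; the text only states that the query contains the identification information [Seen] or [Unseen], the first attribute, and (for the third type) the first attribute values.

```python
from typing import List

SEP = "[SEP]"        # separator token (assumed spelling)
SEEN = "[Seen]"      # third identification information
UNSEEN = "[Unseen]"  # fourth identification information


def build_first_query(attribute: str, values: List[str]) -> List[str]:
    """Build a third-type query if values are given, else a fourth-type query."""
    if values:
        # Third type: knowledge token, attribute, then each attribute value.
        tokens = [SEEN, attribute]
        for value in values:
            tokens += [SEP, value]
        return tokens
    # Fourth type: knowledge token and attribute only, no attribute values.
    return [UNSEEN, attribute]
```

For example, `build_first_query("color", ["red", "blue"])` yields a third-type query, while `build_first_query("color", [])` yields a fourth-type query.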

なお、本実施形態では、ナレッジドロップアウト手法は、学習時を想定したものであり、推定時を想定したものではないものとする。このため、第１クエリ取得部３０２は、第１属性値が利用可能であると判定された場合には、利用可能な第１属性値の全てが第１クエリに含まれるように、第１クエリを取得する。第１クエリは、利用可能な第１属性値の一部だけを含んでもよい。この場合、第１クエリ取得部３０２は、最も頻度が高い第１属性値だけを含む第１クエリを取得してもよい。他にも例えば、第１クエリ取得部３０２は、ランダムに選択した第１属性値を含む第１クエリ、頻度が閾値以上の全ての第１属性値を含む第１クエリ、又は頻度が高い順に所定数の第１属性値を含む第１クエリを取得してもよい。他にも例えば、第１属性値が埋め込み表現化された状態で第１クエリに含まれるようにしてもよい。 In this embodiment, the knowledge dropout method is assumed to be used during learning, not during estimation. Therefore, when it is determined that the first attribute value is available, the first query acquisition unit 302 acquires the first query so that all available first attribute values are included in it. Alternatively, the first query may include only a portion of the available first attribute values. In this case, the first query acquisition unit 302 may acquire a first query including only the most frequent first attribute value. As other examples, the first query acquisition unit 302 may acquire a first query including a randomly selected first attribute value, a first query including all first attribute values whose frequency is equal to or greater than a threshold, or a first query including a predetermined number of first attribute values in descending order of frequency. As yet another example, the first attribute values may be included in the first query in the form of embedded representations.
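The selection options listed above (all values, the most frequent value, a random value, values at or above a frequency threshold, or the top values by frequency) can be sketched as one helper. The function name `select_values`, the strategy labels, and the alphabetical tie-breaking are hypothetical choices for this example only.

```python
import random
from typing import Dict, List, Optional


def select_values(freqs: Dict[str, int], strategy: str = "all",
                  k: int = 1, threshold: int = 1,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Choose which first attribute values go into the first query."""
    # Sort by descending frequency; ties are broken alphabetically (assumption).
    ranked = sorted(freqs, key=lambda v: (-freqs[v], v))
    if strategy == "all":
        return ranked
    if strategy == "most_frequent":
        return ranked[:1]
    if strategy == "top_k":
        return ranked[:k]
    if strategy == "threshold":
        return [v for v in ranked if freqs[v] >= threshold]
    if strategy == "random":
        rng = rng or random.Random()
        return [rng.choice(ranked)] if ranked else []
    raise ValueError(f"unknown strategy: {strategy}")
```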

［第２属性値抽出部］
第２属性値抽出部３０３は、第１商品に関する第２属性値を含む第１データ、第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出する。例えば、第２属性値抽出部３０３は、第１データ及び第１クエリを、学習済みの属性値抽出モデルＭ３に入力する。学習済みの属性値抽出モデルＭ３は、第１データ及び第１クエリの埋め込みベクトル（埋め込み表現）を計算し、当該計算された埋め込みベクトルに基づいて、第２属性値の始点Ｐｂと終点Ｐｅを出力する。第２属性値抽出部３０３は、第１データのうち、始点Ｐｂから終点Ｐｅまでの部分を、第２属性値として抽出する。 [Second attribute value extraction unit]
The second attribute value extraction unit 303 extracts the second attribute value from the first data based on the first data including the second attribute value related to the first product, the first query, and the trained attribute value extraction model M3. For example, the second attribute value extraction unit 303 inputs the first data and the first query to the trained attribute value extraction model M3. The trained attribute value extraction model M3 calculates embedding vectors (embedded representations) of the first data and the first query, and outputs a start point Pb and an end point Pe of the second attribute value based on the calculated embedding vectors. The second attribute value extraction unit 303 extracts the portion of the first data from the start point Pb to the end point Pe as the second attribute value.
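The final extraction step, taking the portion of the first data between the start point Pb and the end point Pe, can be illustrated as below. The sketch assumes that Pb and Pe are token positions and that Pe is inclusive; the text does not specify either point, and the function name `extract_span` is hypothetical.

```python
from typing import List


def extract_span(tokens: List[str], start: int, end: int) -> List[str]:
    """Return tokens[start..end] (inclusive), or [] if the span is invalid."""
    # Reject spans that run backwards or fall outside the first data.
    if not (0 <= start <= end < len(tokens)):
        return []
    return tokens[start:end + 1]
```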

ナレッジトークン手法では、第２属性値抽出部３０３は、第１属性値が利用可能であると判定された場合には、第１商品に関する第２属性値を含む第１データ、第３タイプの第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出する。第２属性値抽出部３０３は、第１データと、第３タイプの第１クエリと、を学習済みの属性値抽出モデルＭ３に入力する。学習済みの属性値抽出モデルＭ３は、これらの埋め込みベクトルに応じた第２属性値の始点Ｐｂと終点Ｐｅを出力する。 In the knowledge token method, when it is determined that the first attribute value is available, the second attribute value extraction unit 303 extracts the second attribute value from the first data based on the first data including the second attribute value related to the first product, the first query of the third type, and the trained attribute value extraction model M3. The second attribute value extraction unit 303 inputs the first data and the first query of the third type to the trained attribute value extraction model M3. The trained attribute value extraction model M3 outputs the start point Pb and end point Pe of the second attribute value according to the embedding vectors of these inputs.

第２属性値抽出部３０３は、第１属性値が利用可能であると判定されない場合には、第１データ、第４タイプの第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出する。第２属性値抽出部３０３は、第１データと、第４タイプの第１クエリと、を学習済みの属性値抽出モデルＭ３に入力する。学習済みの属性値抽出モデルＭ３は、これらの埋め込みベクトルに応じた第２属性値の始点Ｐｂと終点Ｐｅを出力する。なお、属性値抽出モデルＭ３が、始点Ｐｂと終点Ｐｅを出力するのではなく、第２属性値そのものを出力する場合には、第２属性値抽出部３０３は、属性値抽出モデルＭ３から出力された第２属性値を取得すればよい。 If it is not determined that the first attribute value is available, the second attribute value extraction unit 303 extracts the second attribute value from the first data based on the first data, the first query of the fourth type, and the trained attribute value extraction model M3. The second attribute value extraction unit 303 inputs the first data and the first query of the fourth type to the trained attribute value extraction model M3. The trained attribute value extraction model M3 outputs the start point Pb and the end point Pe of the second attribute value according to these embedding vectors. Note that, if the attribute value extraction model M3 outputs the second attribute value itself instead of outputting the start point Pb and the end point Pe, the second attribute value extraction unit 303 may obtain the second attribute value output from the attribute value extraction model M3.

［４．学習システムで実行される処理］
図９及び図１０は、学習システム１で実行される処理の一例を示すフロー図である。図９及び図１０の処理は、制御部１１，２１，３１がそれぞれ記憶部１２，２２，３２に記憶されたプログラムに従って動作することによって実行される。図９及び図１０の処理が実行されるにあたり、訓練データベースＤＢ１及び属性データベースＤＢ２は、予め作成されているものとする。 [4. Processing Executed by the Learning System]
Figures 9 and 10 are flow diagrams showing an example of processing executed by the learning system 1. The processing in Figures 9 and 10 is executed by the control units 11, 21, and 31 operating in accordance with programs stored in the storage units 12, 22, and 32, respectively. When the processing in Figures 9 and 10 is executed, the training database DB1 and the attribute database DB2 are assumed to have been created in advance.

図９のように、学習端末２０は、訓練データベースＤＢ１に第２データが格納された第２商品のうち、学習対象の第２商品を決定する（Ｓ１）。学習対象の第２商品は、Ｓ２以降の処理の対象となる第２商品である。例えば、訓練データベースＤＢ１に第２データが格納された第２商品の中から、まだ属性値抽出モデルＭ３に学習させていない第２商品が学習対象として決定される。 As shown in FIG. 9, the learning terminal 20 determines a second product to be learned from among the second products whose second data is stored in the training database DB1 (S1). The second product to be learned is the second product that is the subject of processing from S2 onwards. For example, from among the second products whose second data is stored in the training database DB1, a second product on which the attribute value extraction model M3 has not yet been trained is determined as the learning target.

学習端末２０は、訓練データベースＤＢ１を参照し、学習対象の第２商品の第２データ及び第２属性を取得する（Ｓ２）。学習端末２０は、属性データベースＤＢ２を参照し、Ｓ２で取得された第２属性に関連付けられた全ての第３属性値を取得する（Ｓ３）。学習端末２０は、Ｓ２で取得された第２属性と、Ｓ３で取得された全ての第３属性値と、を含む仮の第２クエリを取得する（Ｓ４）。Ｓ４の時点でナレッジトークンが挿入されてもよいが、ここでは、まだナレッジトークンは挿入されないものとする。第２属性と第３属性値の間と、第３属性値同士の間と、には、ＳＥＰトークンが挿入される。 The learning terminal 20 refers to the training database DB1 and acquires the second data and the second attribute of the second product to be learned (S2). The learning terminal 20 refers to the attribute database DB2 and acquires all the third attribute values associated with the second attribute acquired in S2 (S3). The learning terminal 20 acquires a tentative second query including the second attribute acquired in S2 and all the third attribute values acquired in S3 (S4). A knowledge token may be inserted at the point of S4, but it is assumed here that no knowledge token is inserted yet. An SEP token is inserted between the second attribute and the third attribute value, and between the third attribute values.

学習端末２０は、第３属性値ごとに、除外レートと、当該第３属性値の頻度と、に基づいて、当該第３属性値が除外される確率を決定する（Ｓ５）。先述したように、除外レートは、予め指定されている。頻度は、属性データベースＤＢ２に格納されているものとする。学習端末２０は、第３属性値ごとに、当該第３属性値の確率に基づいて、当該第３属性値を除外するか否かを判定する（Ｓ６）。Ｓ６の判定により、全ての第３属性値のうちの一部が除外されると判定される。なお、極めて低い確率で、全ての第３属性値が除外されると判定されることもあるが、本実施形態では、この点は考えないものとする。 The learning terminal 20 determines, for each third attribute value, the probability that the third attribute value will be excluded based on the exclusion rate and the frequency of the third attribute value (S5). As described above, the exclusion rate is specified in advance. The frequency is assumed to be stored in the attribute database DB2. The learning terminal 20 determines, for each third attribute value, whether or not to exclude the third attribute value based on the probability of the third attribute value (S6). As a result of the determination in S6, it is determined that some of all third attribute values will be excluded. Note that there are cases where it is determined with an extremely low probability that all third attribute values will be excluded, but this point is not taken into consideration in this embodiment.
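Steps S5 and S6 can be sketched as follows. The concrete formula is not given in this section, so the one used here (exclusion rate scaled by one minus the value's relative frequency, so that frequent values are dropped less often) is purely an assumption for illustration, as are the function names.

```python
import random
from typing import Dict, Optional, Set


def exclusion_probabilities(freqs: Dict[str, int],
                            exclusion_rate: float) -> Dict[str, float]:
    """S5: assign each third attribute value a probability of being excluded.

    Assumed formula: p(v) = exclusion_rate * (1 - freq(v) / total_freq).
    """
    total = sum(freqs.values())
    if total == 0:
        # No frequency information: fall back to the plain exclusion rate.
        return {v: exclusion_rate for v in freqs}
    return {v: exclusion_rate * (1 - f / total) for v, f in freqs.items()}


def decide_exclusions(probs: Dict[str, float],
                      rng: Optional[random.Random] = None) -> Set[str]:
    """S6: independently decide for each value whether it is excluded."""
    rng = rng or random.Random()
    return {v for v, p in probs.items() if rng.random() < p}
```

Because each value is sampled independently, all values can occasionally be excluded at once, which matches the low-probability case that the text mentions and sets aside.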

学習端末２０は、Ｓ４で取得した仮の第２クエリのうち、Ｓ６で除外すると判定した第３属性値を他の値（例えば、パディング用のトークン）に置き換えることによって、第１タイプの第２クエリを取得する（Ｓ７）。Ｓ７では、［Ｓｅｅｎ］のナレッジトークンも第２クエリに挿入される。学習端末２０は、Ｓ４で取得した仮の第２クエリのうち、第３属性値が全て除外された第２タイプの第２クエリを取得する（Ｓ８）。Ｓ８では、［Ｕｎｓｅｅｎ］のナレッジトークンも第２クエリに挿入される。 The learning terminal 20 acquires a second query of the first type by replacing the third attribute values determined to be excluded in S6 from the provisional second query acquired in S4 with other values (e.g., padding tokens) (S7). In S7, the knowledge token of [Seen] is also inserted into the second query. The learning terminal 20 acquires a second query of the second type from the provisional second query acquired in S4, in which all third attribute values have been excluded (S8). In S8, the knowledge token of [Unseen] is also inserted into the second query.
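Steps S7 and S8 can be illustrated with the sketch below, which produces both query types from the same attribute and value list. The token spellings `[SEP]` and `[PAD]`, the placement of the knowledge token at the head of the query, and the function name `build_second_queries` are assumptions for this example.

```python
from typing import List, Set, Tuple

SEP = "[SEP]"
PAD = "[PAD]"        # assumed padding token used as the replacement value
SEEN = "[Seen]"      # first identification information
UNSEEN = "[Unseen]"  # second identification information


def build_second_queries(attribute: str, values: List[str],
                         excluded: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (first-type query, second-type query) for one training item."""
    # S7: keep the slot of each excluded value but overwrite it with padding.
    first_type = [SEEN, attribute]
    for value in values:
        first_type += [SEP, PAD if value in excluded else value]
    # S8: drop every third attribute value and mark the query as [Unseen].
    second_type = [UNSEEN, attribute]
    return first_type, second_type
```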

学習端末２０は、Ｓ２で取得した第２データ、第１タイプの第２クエリ、及び第２タイプの第２クエリに基づいて、マルチドメイン学習を利用して、属性値抽出モデルＭ３の学習を行う（Ｓ９）。Ｓ９では、学習端末２０は、学習対象の第２商品の第２データ及び第１タイプの第２クエリが入力された場合に、当該第２データにおける第４属性値の始点Ｐｂと終点Ｐｅが出力されるように、属性値抽出モデルＭ３の学習を行う。学習端末２０は、学習対象の第２商品の第２データ及び第２タイプの第２クエリが入力された場合に、当該第２データにおける第４属性値の始点Ｐｂと終点Ｐｅが出力されるように、属性値抽出モデルＭ３の学習を行う。 The learning terminal 20 trains the attribute value extraction model M3 using multi-domain learning, based on the second data acquired in S2, the second query of the first type, and the second query of the second type (S9). In S9, the learning terminal 20 trains the attribute value extraction model M3 so that, when the second data of the second product to be learned and the second query of the first type are input, the start point Pb and end point Pe of the fourth attribute value in the second data are output. Likewise, the learning terminal 20 trains the attribute value extraction model M3 so that, when the same second data and the second query of the second type are input, the start point Pb and end point Pe of the fourth attribute value in the second data are output.
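The pairing of one piece of second data with both query types in S9 can be sketched as the construction of two training examples that share the same target span. The sketch covers only data preparation, not the model or its optimizer. The dictionary layout, the exact-match search for the fourth attribute value, and the function names are assumptions for this example.

```python
from typing import Dict, List, Optional, Tuple


def find_span(data_tokens: List[str],
              value_tokens: List[str]) -> Optional[Tuple[int, int]]:
    """Locate the first exact occurrence of the value; end index is inclusive."""
    n = len(value_tokens)
    if n == 0:
        return None
    for i in range(len(data_tokens) - n + 1):
        if data_tokens[i:i + n] == value_tokens:
            return i, i + n - 1
    return None


def make_training_examples(data_tokens: List[str], value_tokens: List[str],
                           first_type_query: List[str],
                           second_type_query: List[str]) -> List[Dict]:
    """Create one example per query type, both labelled with the same Pb/Pe."""
    span = find_span(data_tokens, value_tokens)
    if span is None:
        return []
    start, end = span
    return [
        {"data": data_tokens, "query": query, "start": start, "end": end}
        for query in (first_type_query, second_type_query)
    ]
```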

学習端末２０は、属性値抽出モデルＭ３の学習を完了するか否かを判定する（Ｓ１０）。Ｓ１０の判定は、予め定められた条件に基づいて実行されるようにすればよい。例えば、訓練データベースＤＢ１に格納された全ての第２データが学習で利用された場合に、属性値抽出モデルＭ３の学習が完了してもよいし、一定数の第２データが学習で利用された場合に、属性値抽出モデルＭ３の学習が完了してもよい。 The learning terminal 20 determines whether or not learning of the attribute value extraction model M3 is complete (S10). The determination in S10 may be performed based on a predetermined condition. For example, learning of the attribute value extraction model M3 may be completed when all of the second data stored in the training database DB1 have been used in learning, or learning of the attribute value extraction model M3 may be completed when a certain number of second data have been used in learning.

Ｓ１０において、属性値抽出モデルＭ３の学習を完了すると判定されない場合（Ｓ１０：Ｎ）、Ｓ１の処理に戻る。この場合、次の学習対象の第２商品が決定されて属性値抽出モデルＭ３の学習が継続される。属性値抽出モデルＭ３の学習を完了すると判定された場合（Ｓ１０：Ｙ）、学習端末２０は、属性値抽出モデルＭ３の学習を完了し、サーバ１０に対し、学習済みの属性値抽出モデルＭ３を送信する（Ｓ１１）。サーバ１０は、学習済みの属性値抽出モデルＭ３を受信すると（Ｓ１２）、学習済みの属性値抽出モデルＭ３を記憶部１２に記録する（Ｓ１３）。以降、推定端末３０から学習済みの属性値抽出モデルＭ３が利用可能になる。 If it is not determined in S10 that the learning of the attribute value extraction model M3 is to be completed (S10: N), the process returns to S1. In this case, the next second product to be learned is determined, and the learning of the attribute value extraction model M3 continues. If it is determined that the learning of the attribute value extraction model M3 is to be completed (S10: Y), the learning terminal 20 completes the learning and transmits the trained attribute value extraction model M3 to the server 10 (S11). When the server 10 receives the trained attribute value extraction model M3 (S12), it records the model in the storage unit 12 (S13). Thereafter, the trained attribute value extraction model M3 becomes available to the estimation terminal 30.

図１０に移り、推定端末３０は、サーバ１０から、学習済みの属性値抽出モデルＭ３をダウンロードして記憶部３２に記録する（Ｓ１４）。推定端末３０は、記憶部３２に予め記憶された第１商品の第１データと、店舗の担当者により指定された第１属性と、を取得する（Ｓ１５）。推定端末３０は、属性データベースＤＢ２を参照し、第１属性値の利用可能性を判定する（Ｓ１６）。 Moving on to FIG. 10, the estimation terminal 30 downloads the trained attribute value extraction model M3 from the server 10 and records it in the storage unit 32 (S14). The estimation terminal 30 acquires the first data of the first product stored in advance in the storage unit 32 and the first attribute specified by the store staff member (S15). The estimation terminal 30 refers to the attribute database DB2 and determines the availability of the first attribute value (S16).

第１属性値を利用可能であると判定された場合（Ｓ１６：可）、推定端末３０は、第１属性に関連付けられた全ての第１属性値を取得する（Ｓ１７）。推定端末３０は、Ｓ１５で取得した第１属性と、全ての第１属性値と、を含む第３タイプの第１クエリを取得する（Ｓ１８）。推定端末３０は、第１データ、第３タイプの第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出し（Ｓ１９）、本処理は終了する。Ｓ１９では、属性値抽出モデルＭ３の第１ドメインを利用した推定が実行される。 If it is determined that the first attribute value is available (S16: Yes), the estimation terminal 30 acquires all first attribute values associated with the first attribute (S17). The estimation terminal 30 acquires a first query of the third type including the first attribute acquired in S15 and all the first attribute values (S18). The estimation terminal 30 extracts the second attribute value from the first data based on the first data, the first query of the third type, and the trained attribute value extraction model M3 (S19), and the process ends. In S19, estimation is performed using the first domain of the attribute value extraction model M3.

Ｓ１６において、第１属性値を利用可能であると判定されない場合（Ｓ１６：不可）、推定端末３０は、第１属性値を含まずに、第１属性を含む第４タイプの第１クエリを取得する（Ｓ２０）。推定端末３０は、第１データ、第４タイプの第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出し（Ｓ２１）、本処理は終了する。Ｓ２１では、属性値抽出モデルＭ３の第２ドメインを利用した推定が実行される。 If it is not determined in S16 that the first attribute value is available (S16: No), the estimation terminal 30 acquires a first query of the fourth type that includes the first attribute but no first attribute value (S20). The estimation terminal 30 extracts the second attribute value from the first data based on the first data, the first query of the fourth type, and the trained attribute value extraction model M3 (S21), and the process ends. In S21, estimation is performed using the second domain of the attribute value extraction model M3.
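The branch from S16 through S21 can be summarized in a short sketch. The trained model M3 is represented here by an arbitrary callable that returns token positions (Pb, Pe); that interface, the dictionary standing in for attribute database DB2, the token spellings, and the function name `run_estimation` are all assumptions for this example.

```python
from typing import Callable, Dict, List, Tuple

# Assumed interface for trained model M3: (data tokens, query tokens) -> (Pb, Pe)
SpanModel = Callable[[List[str], List[str]], Tuple[int, int]]


def run_estimation(data_tokens: List[str], attribute: str,
                   attribute_db: Dict[str, Dict[str, int]],
                   model: SpanModel) -> List[str]:
    """Build the third- or fourth-type first query, run the model, cut the span."""
    values = attribute_db.get(attribute, {})
    if values:
        # S16: available. S17-S18: include every stored value, most frequent first.
        ranked = sorted(values, key=lambda v: (-values[v], v))
        query = ["[Seen]", attribute]
        for value in ranked:
            query += ["[SEP]", value]
    else:
        # S16: not available. S20: attribute only, marked as [Unseen].
        query = ["[Unseen]", attribute]
    # S19 / S21: the same model handles both query types.
    start, end = model(data_tokens, query)
    return data_tokens[start:end + 1]
```

A single callable serves both branches, which mirrors the point made in the text that one model M3 covers both the available and unavailable cases.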

本実施形態の学習システム１によれば、第４属性値を含む第２データと、第２商品に関する第２属性に関連付けられた複数の第３属性値のうちの少なくとも一部が除外された第２クエリと、に基づいて、属性値抽出モデルＭ３に関する学習を行う。これにより、本当は学習で利用可能な第３属性値を除外することによって、知識の不完全さを属性値抽出モデルＭ３に学習させることができるので、属性値抽出モデルＭ３の精度が高まる。例えば、実運用では、完全には網羅しきれていない属性データベースＤＢ２といった不完全な知識を利用する必要がある。この不完全さを属性値抽出モデルＭ３に学習させることによって、未知の第１属性が入力されたり、属性データベースＤＢ２にほとんど第１属性値が存在しない第１属性が入力されたりしたとしても、属性値抽出モデルＭ３は、第１データから第２属性値を抽出可能になる。更に、従来の属性値抽出モデルＭ２に比べるとクエリ拡張を実現できるので、この点でも、属性値抽出モデルＭ３の精度が高まる。 According to the learning system 1 of the present embodiment, learning is performed on the attribute value extraction model M3 based on the second data including the fourth attribute value and the second query from which at least some of the multiple third attribute values associated with the second attribute related to the second product have been excluded. By excluding third attribute values that are in fact available for learning, the attribute value extraction model M3 can learn the incompleteness of knowledge, which improves its accuracy. For example, in actual operation, it is necessary to use incomplete knowledge, such as an attribute database DB2 that does not cover every attribute value. By having the attribute value extraction model M3 learn this incompleteness, the model can extract the second attribute value from the first data even if an unknown first attribute is input, or a first attribute for which the attribute database DB2 contains almost no first attribute values is input. Furthermore, unlike the conventional attribute value extraction model M2, query expansion can be realized, which also improves the accuracy of the attribute value extraction model M3.

また、学習システム１は、第１クエリは、第１属性と、少なくとも１つの第１属性値と、を含む。第２クエリは、第２属性を含み、複数の第３属性値のうちの少なくとも一部が除外される。これにより、第１属性値だけではなく第１属性も第１クエリに含めることができ、かつ、第２属性を第２クエリに含めることができるので、より効果的にクエリ拡張を実現できる。その結果、属性値抽出モデルＭ３の精度が高まる。 In addition, in the learning system 1, the first query includes the first attribute and at least one first attribute value. The second query includes the second attribute, and at least some of the multiple third attribute values are excluded. This allows not only the first attribute value but also the first attribute to be included in the first query, and allows the second attribute to be included in the second query, thereby achieving more effective query expansion. As a result, the accuracy of the attribute value extraction model M3 is improved.

また、学習システム１は、複数の第３属性値のうちの一部のみを除外し、複数の第３属性値のうち、除外されなかった残りの第３属性値を含む第２クエリを取得する。これにより、第２クエリにある程度の第３属性値を含めることができるので、クエリ拡張を実現しつつ、知識の不完全さを属性値抽出モデルＭ３に学習させることができる。 The learning system 1 also excludes only a portion of the multiple third attribute values, and obtains a second query that includes the remaining third attribute values that were not excluded from the multiple third attribute values. This allows a certain number of third attribute values to be included in the second query, making it possible to have the attribute value extraction model M3 learn about the incompleteness of knowledge while realizing query expansion.

また、学習システム１は、第３属性値ごとに、当該第３属性値に関連付けられた確率に基づいて、当該第３属性値を除外するか否かを決定し、複数の第３属性値のうち、確率に基づいて除外すると決定した一部のみを除外する。これにより、どの程度の第３属性値を除外せずに残すかを管理しやすくなるので、管理者が想定した通りに第３属性値を除外できる。例えば、管理者が想定しないほど多くの第３属性値が除外されてしまったり、管理者が想定したよりも少ない第３属性値しか除外されなかったりすることを防止できる。その結果、管理者が狙ったように、知識の不完全さを属性値抽出モデルＭ３に学習させることができる。 Furthermore, the learning system 1 determines, for each third attribute value, whether or not to exclude that third attribute value based on the probability associated with it, and excludes only the portion of the multiple third attribute values that has been determined to be excluded based on the probability. This makes it easier to manage how many third attribute values are retained, allowing third attribute values to be excluded as the administrator intends. For example, it is possible to prevent more third attribute values from being excluded than the administrator anticipates, or fewer than the administrator anticipates. As a result, the attribute value extraction model M3 can learn the incompleteness of knowledge as intended by the administrator.

また、学習システム１は、第３属性値ごとに、当該第３属性値に関連付けられた、当該第３属性値の頻度に応じた確率に基づいて、当該第３属性値を除外するか否かを決定する。これにより、例えば、第３属性値の頻度が高いほど、当該第３属性値が除外されにくくするといったことが可能になるので、頻出の第３属性値を属性値抽出モデルＭ３に学習させやすくなる。 Furthermore, the learning system 1 determines, for each third attribute value, whether or not to exclude the third attribute value based on a probability associated with the third attribute value according to the frequency of the third attribute value. This makes it possible, for example, to make it more difficult for the third attribute value to be excluded the higher the frequency of the third attribute value, making it easier to have the attribute value extraction model M3 learn frequently occurring third attribute values.

また、学習システム１は、第３属性値ごとに、予め定められた除外レートと、当該第３属性値の頻度と、に基づいて、当該第３属性値の確率を決定する。除外レートにより、どの程度の第３属性値を残すかを管理しやすくなるので、管理者が想定した通りに第３属性値を除外できる。 Furthermore, the learning system 1 determines the probability of each third attribute value based on a predetermined exclusion rate and the frequency of the third attribute value. The exclusion rate makes it easier to manage how many third attribute values are left, so that the administrator can exclude third attribute values as intended.

また、学習システム１は、全ての第３属性値のうちの一部のみを除外し、全ての第３属性値のうち、除外されなかった残りの第３属性値を含む第２クエリを取得する。これにより、ある程度の第３属性値を属性値抽出モデルＭ３に学習させることができる。 The learning system 1 also excludes only a portion of all the third attribute values, and obtains a second query that includes the remaining third attribute values that have not been excluded from all the third attribute values. This allows the attribute value extraction model M3 to learn a certain number of third attribute values.

また、学習システム１は、複数の第３属性値を含む第２クエリを取得し、第２クエリに含まれる複数の第３属性値のうちの一部を他の値に置き換えることによって、当該一部の第３属性値を除外する。これにより、他の値を第２クエリに含めることによって、知識の不完全さを属性値抽出モデルＭ３に学習させることができる。 The learning system 1 also acquires a second query including multiple third attribute values, and replaces some of the multiple third attribute values included in the second query with other values, thereby excluding the some third attribute values. In this way, by including the other values in the second query, it is possible to cause the attribute value extraction model M3 to learn about the incompleteness of knowledge.

また、学習システム１は、第２データと、第１タイプの第２クエリと、に基づいて、属性値抽出モデルＭ３に関する第１学習を行い、第２データと、第２タイプの第２クエリと、に基づいて、属性値抽出モデルＭ３に関する第２学習を行う。これにより、第３属性値を属性値抽出モデルＭ３に学習させつつ、より効率的に、知識の不完全さを属性値抽出モデルＭ３に学習させることができる。更に、第１属性値が利用可能な場合の属性値抽出モデルＭ３と、第１属性値が利用可能ではない場合の属性値抽出モデルＭ３と、を別々に作成する場合に比べて、１つの属性値抽出モデルＭ３にまとめることができるので、属性値抽出モデルＭ３の管理負担が軽減する。 The learning system 1 also performs a first learning process on the attribute value extraction model M3 based on the second data and the second query of the first type, and performs a second learning process on the attribute value extraction model M3 based on the second data and the second query of the second type. This allows the attribute value extraction model M3 to learn the third attribute value while more efficiently learning about the incompleteness of knowledge. Furthermore, compared to creating an attribute value extraction model M3 for when the first attribute value is available and an attribute value extraction model M3 for when the first attribute value is not available separately, they can be consolidated into a single attribute value extraction model M3, reducing the management burden of the attribute value extraction model M3.

また、学習システム１は、第３属性値の利用可能性をドメインとみなしたマルチドメイン学習を利用して、第１学習及び第２学習を行う。これにより、１つの属性値抽出モデルＭ３の中で、実際の推定時における第１属性値の利用可能性に応じて推定処理のスイッチが切り替わるようにすることができる。このため、第１属性値が利用可能な場合と利用可能ではない場合との両方に対応できるハイブリッドな属性値抽出モデルＭ３とすることができる。 The learning system 1 also performs the first learning and the second learning by using multi-domain learning in which the availability of the third attribute value is regarded as a domain. This allows the estimation process to be switched in one attribute value extraction model M3 depending on the availability of the first attribute value at the time of actual estimation. This allows for a hybrid attribute value extraction model M3 that can handle both cases where the first attribute value is available and cases where it is not available.

また、学習システム１は、第２データ及び第２クエリが属性値抽出モデルＭ３に入力された場合に、第２データにおける第４属性値の部分を識別可能な部分識別情報を属性値抽出モデルＭ３が出力するように、学習を行う。これにより、実際の推定時には、第１データの中のどの部分が第２属性値なのかを識別できるので、第１データから第２属性値を抽出しやすくなる。 Furthermore, the learning system 1 learns so that when the second data and the second query are input to the attribute value extraction model M3, the attribute value extraction model M3 outputs partial identification information capable of identifying the portion of the fourth attribute value in the second data. This makes it possible to identify which portion of the first data is the second attribute value during actual estimation, making it easier to extract the second attribute value from the first data.

また、学習システム１は、第１アイテムは、推定用の第１商品であり、第２アイテムは、学習用の第２商品である。これにより、オンラインショッピングモールで取引される商品の属性値抽出を精度よく行うことができる。 In addition, in the learning system 1, the first item is a first product for estimation, and the second item is a second product for learning. This allows for accurate extraction of attribute values of products traded in online shopping malls.

また、学習システム１は、第１商品に関する第２属性値を含む第１データ、第１属性値を含む第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出する。これにより、クエリ拡張を実現できるので、第２属性値を精度よく抽出できる。 The learning system 1 also extracts the second attribute value from the first data based on the first data including the second attribute value related to the first product, the first query including the first attribute value, and the trained attribute value extraction model M3. This enables query expansion, so that the second attribute value can be extracted with high accuracy.

また、学習システム１は、第１属性値が利用可能であると判定された場合には、第１アイテムに関する第２属性値を含む第１データ、第３タイプの第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出する。学習システム１は、第１属性値が利用可能であると判定されない場合には、第１データ、第４タイプの第１クエリ、及び学習済みの属性値抽出モデルＭ３に基づいて、第１データから第２属性値を抽出する。これにより、第１属性値が利用可能な場合と、第１属性値が利用可能ではない場合と、の何れの場合にも、第２属性値を精度よく抽出できる。 Furthermore, when it is determined that the first attribute value is available, the learning system 1 extracts the second attribute value from the first data based on the first data including the second attribute value related to the first item, the first query of the third type, and the learned attribute value extraction model M3. When it is not determined that the first attribute value is available, the learning system 1 extracts the second attribute value from the first data based on the first data, the first query of the fourth type, and the learned attribute value extraction model M3. This makes it possible to accurately extract the second attribute value in both cases, when the first attribute value is available and when the first attribute value is not available.

［５．変形例］
本開示は、以上に説明した実施形態に限定されるものではない。本開示の趣旨を逸脱しない範囲で、適宜変更可能である。 [5. Modifications]
The present disclosure is not limited to the above-described embodiment, and may be modified as appropriate without departing from the spirit and scope of the present disclosure.

図１１は、変形例における学習システム及び属性値抽出システムの一例である。図１１のように、学習システム１及び属性値抽出システム２が互いに別々のシステムであってもよい。属性値抽出システム２は、学習システム１により作成された学習済みの属性値抽出モデルを利用可能なシステムである。図１１の例では、学習システム１は、学習端末２０を含む。属性値抽出システム２は、サーバ１０及び推定端末３０を含む。サーバ１０、学習端末２０、及び推定端末３０の各々の機能は、実施形態で説明した通りである。属性値抽出システム２は、サーバ１０を含まずに、推定端末３０だけを含んでもよい。 Figure 11 is an example of a learning system and an attribute value extraction system in a modified example. As shown in Figure 11, the learning system 1 and the attribute value extraction system 2 may be separate systems. The attribute value extraction system 2 is a system that can use a trained attribute value extraction model created by the learning system 1. In the example of Figure 11, the learning system 1 includes a learning terminal 20. The attribute value extraction system 2 includes a server 10 and an estimation terminal 30. The functions of the server 10, the learning terminal 20, and the estimation terminal 30 are as described in the embodiment. The attribute value extraction system 2 may include only the estimation terminal 30 without including the server 10.

例えば、実施形態では、ナレッジドロップアウト手法及びナレッジトークン手法の両方が利用される場合を説明したが、学習端末２０は、ナレッジトークン手法を利用せずに、ナレッジドロップアウト手法だけを利用して、属性値抽出モデルＭ３の学習を行ってもよい。この場合、第１クエリ及び第２クエリは、ナレッジトークンを含まない。例えば、学習端末２０は、ナレッジドロップアウト手法を利用せずに、ナレッジトークン手法だけを利用して、属性値抽出モデルＭ３の学習を行ってもよい。この場合、第２クエリは、第３属性値が除外されずに、全ての第３属性値を含んでもよい。 For example, the embodiment describes a case in which both the knowledge dropout method and the knowledge token method are used, but the learning terminal 20 may train the attribute value extraction model M3 using only the knowledge dropout method, without the knowledge token method. In this case, the first query and the second query do not include a knowledge token. As another example, the learning terminal 20 may train the attribute value extraction model M3 using only the knowledge token method, without the knowledge dropout method. In this case, the second query may include all third attribute values, with none excluded.

例えば、学習端末２０で実現されるものとして説明した機能は、サーバ１０、推定端末３０、又は他のコンピュータで実現されてもよいし、複数のコンピュータで分担されてもよい。例えば、推定端末３０で実現されるものとして説明した機能は、サーバ１０、学習端末２０、又は他のコンピュータで実現されてもよいし、複数のコンピュータで分担されてもよい。 For example, the functions described as being realized by the learning terminal 20 may be realized by the server 10, the estimation terminal 30, or another computer, or may be shared among multiple computers. For example, the functions described as being realized by the estimation terminal 30 may be realized by the server 10, the learning terminal 20, or another computer, or may be shared among multiple computers.

［６．付記］
例えば、学習システム及び属性値抽出システムは、下記のような構成も可能である。 [6. Notes]
For example, the learning system and the attribute value extraction system can be configured as follows.

(1)
A learning system including:
a model storage unit that stores an attribute value extraction model for extracting, by using a first query including at least one first attribute value associated with a first attribute related to a first item for estimation, a second attribute value related to the first item from first data including the second attribute value;
a second query acquisition unit that acquires a second query from which at least some of a plurality of third attribute values associated with a second attribute related to a second item for learning have been excluded; and
a learning unit that trains the attribute value extraction model based on the second query and second data including a fourth attribute value related to the second item.
(2)
The learning system according to (1), wherein
the first query includes the first attribute and the at least one first attribute value, and
the second query includes the second attribute, with at least some of the plurality of third attribute values excluded.
(3)
The learning system according to (1) or (2), wherein the second query acquisition unit
excludes only some of the plurality of third attribute values, and
acquires the second query including the remaining third attribute values that were not excluded.
(4)
The learning system according to (3), wherein the second query acquisition unit
determines, for each third attribute value, whether to exclude that third attribute value based on a probability associated with it, and
excludes only those of the plurality of third attribute values that it has determined to exclude based on the probability.
(5)
The learning system according to (4), further including a probability determination unit that determines, for each third attribute value, the probability of that third attribute value based on its frequency in a first database storing a plurality of pieces of the second data, wherein
the second query acquisition unit determines, for each third attribute value, whether to exclude that third attribute value based on the probability that is associated with it and corresponds to its frequency.
(6)
The learning system according to (5), wherein the probability determination unit determines, for each third attribute value, the probability of that third attribute value based on a predetermined exclusion rate and the frequency of that third attribute value.
(7)
The learning system according to any one of (3) to (6), further including a third attribute value acquisition unit that acquires all of the third attribute values stored in a second database storing the plurality of third attribute values, wherein the second query acquisition unit
excludes only some of all the acquired third attribute values, and
acquires the second query including the remaining third attribute values that were not excluded.
(8)
The learning system according to any one of (3) to (7), wherein the second query acquisition unit
acquires the second query including the plurality of third attribute values, and
excludes some of the plurality of third attribute values included in the second query by replacing them with other values.
(9)
The learning system according to any one of (1) to (8), wherein
the first attribute value may be unavailable depending on the first attribute,
the second query acquisition unit
acquires a first type of the second query, which includes first identification information indicating that the third attribute values are available and at least some of the plurality of third attribute values, and
acquires a second type of the second query, which includes second identification information indicating that the third attribute values are not available and from which all of the plurality of third attribute values are excluded, and
the learning unit performs
first learning of the attribute value extraction model based on the second data and the first type of the second query, and
second learning of the attribute value extraction model based on the second data and the second type of the second query.
(10)
The learning system according to (9), wherein the learning unit performs the first learning and the second learning by using multi-domain learning in which the availability of the third attribute values is regarded as a domain.
(11)
The learning system according to any one of (1) to (10), wherein the learning unit performs the learning such that, when the second data and the second query are input to the attribute value extraction model, the attribute value extraction model outputs portion identification information that identifies the portion of the second data corresponding to the fourth attribute value.
(12)
The learning system according to any one of (1) to (11), wherein
the first item is a first product for estimation,
the second item is a second product for learning,
the first data includes a first character string relating to a description of the first product,
the second data includes a second character string relating to a description of the second product, and
the attribute value extraction model is a natural language processing model for extracting the second attribute value from the first character string.
(13)
An attribute value extraction system capable of using a trained attribute value extraction model created by the learning system according to any one of (3) to (8), the attribute value extraction system including:
a first query acquisition unit that acquires a first query including at least one first attribute value associated with a first attribute related to a first item for estimation; and
a second attribute value extraction unit that extracts a second attribute value related to the first item from first data including the second attribute value, based on the first data, the first query, and the trained attribute value extraction model.
(14)
An attribute value extraction system capable of using a trained attribute value extraction model created by the learning system according to (9) or (10), the attribute value extraction system including:
an availability determination unit that determines the availability of at least one first attribute value associated with a first attribute related to a first item for estimation;
a first query acquisition unit that, when the first attribute value is determined to be available, acquires a third type of first query including third identification information indicating that the first attribute value is available and the at least one first attribute value, and, when the first attribute value is not determined to be available, acquires a fourth type of first query including fourth identification information indicating that the first attribute value is not available; and
a second attribute value extraction unit that, when the first attribute value is determined to be available, extracts a second attribute value related to the first item from first data including the second attribute value, based on the first data, the third type of first query, and the trained attribute value extraction model, and, when the first attribute value is not determined to be available, extracts the second attribute value from the first data based on the first data, the fourth type of first query, and the trained attribute value extraction model.
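
As a non-limiting illustration of notes (4) to (6) and (8), the following Python sketch derives a per-value exclusion probability from a predetermined exclusion rate and the frequency of each value, and then excludes values either by omitting them or by replacing them with another value. The notes do not specify how the probability is computed from the rate and the frequency. The formula used here (the exclusion rate scaled by the value's frequency relative to the most frequent value) is purely an assumption chosen for illustration, as are the function names `exclusion_probabilities` and `exclude_values` and the `placeholder` parameter.

```python
import random
from collections import Counter


def exclusion_probabilities(training_values, exclusion_rate):
    """Map each attribute value to an exclusion probability.

    ASSUMPTION: probability = exclusion_rate * frequency / max frequency.
    The actual relationship between rate, frequency, and probability
    is not given in the notes; this is only one possible choice.
    """
    freq = Counter(training_values)
    if not freq:
        return {}
    max_freq = max(freq.values())
    return {value: exclusion_rate * count / max_freq
            for value, count in freq.items()}


def exclude_values(values, probabilities, rng, placeholder=None):
    """Decide per value whether to exclude it, as in note (4).

    If placeholder is None, excluded values are simply omitted.
    Otherwise each excluded value is replaced by the placeholder,
    which mirrors the replacement-based exclusion of note (8).
    """
    result = []
    for value in values:
        # rng.random() is in [0.0, 1.0), so probability 0.0 never excludes
        # and probability 1.0 always excludes.
        if rng.random() >= probabilities.get(value, 0.0):
            result.append(value)
        elif placeholder is not None:
            result.append(placeholder)
    return result
```

For instance, calling `exclusion_probabilities` on the values observed in a training database and passing the result to `exclude_values` together with a `random.Random` instance yields the attribute values that remain in the second query. Passing a seeded `random.Random` makes the exclusion reproducible across runs.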

1 Learning system, 2 Attribute value extraction system, N Network, P Product page, 10 Server, 11, 21, 31 Control unit, 12, 22, 32 Storage unit, 13, 23, 33 Communication unit, 20 Learning terminal, 24, 34 Operation unit, 25, 35 Display unit, 30 Estimation terminal, M1 Question answering model, M2, M3 Attribute value extraction model, Pb Start point, Pe End point, 100, 200 Model storage unit, 201 Third attribute value acquisition unit, 202 Probability determination unit, 203 Second query acquisition unit, 204 Learning unit, 300 Model storage unit, 301 Availability determination unit, 302 First query acquisition unit, 303 Second attribute value extraction unit, N10 News article, R12 Response, D20, D30 Product data, DB1 Training database, DB2 Attribute database, I41, I42 Input, Q11, Q21, Q31, Q40 Query, V22, V32 Attribute value.

Claims

A learning system including:
a model storage unit that stores an attribute value extraction model for extracting, by using a first query including a first attribute related to a first item for estimation and at least one first attribute value associated with the first attribute, a second attribute value related to the first item from first data including the second attribute value;
a second query acquisition unit that acquires a second query including a second attribute related to a second item for learning and, among a plurality of third attribute values associated with the second attribute, a third attribute value that has not been excluded; and
a learning unit that causes the attribute value extraction model to learn training data that includes, as an input portion, the second query and second data including a fourth attribute value that is related to the second item and associated with the second attribute, and that includes, as an output portion, information capable of identifying the fourth attribute value in the second data.

The learning system according to claim 1, wherein the second query acquisition unit
excludes only some of the plurality of third attribute values, and
acquires the second query including the remaining third attribute values that were not excluded.

The learning system according to claim 2, wherein the second query acquisition unit
determines, for each third attribute value, whether to exclude that third attribute value based on a probability associated with it, and
excludes only those of the plurality of third attribute values that it has determined to exclude based on the probability.

The learning system according to claim 3, further including a probability determination unit that determines, for each third attribute value, the probability of that third attribute value based on its frequency in a first database storing a plurality of pieces of the second data, wherein
the second query acquisition unit determines, for each third attribute value, whether to exclude that third attribute value based on the probability that is associated with it and corresponds to its frequency.

The learning system according to claim 4, wherein the probability determination unit determines, for each third attribute value, the probability of that third attribute value based on a predetermined exclusion rate and the frequency of that third attribute value.

The learning system according to claim 2, further including a third attribute value acquisition unit that acquires all of the third attribute values stored in a second database storing the plurality of third attribute values, wherein the second query acquisition unit
excludes only some of all the acquired third attribute values, and
acquires the second query including the remaining third attribute values that were not excluded.

The learning system according to claim 2, wherein the second query acquisition unit
acquires the second query including the plurality of third attribute values, and
excludes some of the plurality of third attribute values included in the second query by replacing them with other values.

The learning system according to any one of claims 1 to 7, wherein
the first attribute value may be unavailable depending on the first attribute,
the second query acquisition unit
acquires a first type of the second query, which includes first identification information indicating that the third attribute values are available and at least some of the plurality of third attribute values, and
acquires a second type of the second query, which includes second identification information indicating that the third attribute values are not available and from which all of the plurality of third attribute values are excluded, and
the learning unit performs
first learning of the attribute value extraction model based on the second data and the first type of the second query, and
second learning of the attribute value extraction model based on the second data and the second type of the second query.

The learning system according to claim 8, wherein the learning unit performs the first learning and the second learning by using multi-domain learning in which the availability of the third attribute values is regarded as a domain.

The learning system according to any one of claims 1 to 7, wherein the learning unit performs the learning such that, when the second data and the second query are input to the attribute value extraction model, the attribute value extraction model outputs portion identification information that identifies the portion of the second data corresponding to the fourth attribute value.

The learning system according to any one of claims 1 to 7, wherein
the first item is a first product for estimation,
the second item is a second product for learning,
the first data includes a first character string relating to a description of the first product,
the second data includes a second character string relating to a description of the second product, and
the attribute value extraction model is a natural language processing model for extracting the second attribute value from the first character string.

An attribute value extraction system capable of using a trained attribute value extraction model created by the learning system according to claim 2, the attribute value extraction system including:
a first query acquisition unit that acquires a first query including at least one first attribute value associated with a first attribute related to a first item for estimation; and
a second attribute value extraction unit that extracts a second attribute value related to the first item from first data including the second attribute value, based on the first data, the first query, and the trained attribute value extraction model.

An attribute value extraction system capable of using a trained attribute value extraction model created by the learning system according to claim 8, the attribute value extraction system including:
an availability determination unit that determines the availability of at least one first attribute value associated with a first attribute related to a first item for estimation;
a first query acquisition unit that, when the first attribute value is determined to be available, acquires a third type of first query including third identification information indicating that the first attribute value is available and the at least one first attribute value, and, when the first attribute value is not determined to be available, acquires a fourth type of first query including fourth identification information indicating that the first attribute value is not available; and
a second attribute value extraction unit that, when the first attribute value is determined to be available, extracts a second attribute value related to the first item from first data including the second attribute value, based on the first data, the third type of first query, and the trained attribute value extraction model, and, when the first attribute value is not determined to be available, extracts the second attribute value from the first data based on the first data, the fourth type of first query, and the trained attribute value extraction model.

A learning method in which a computer executes:
a second query acquisition step of acquiring a second query including a second attribute related to a second item for learning and, among a plurality of third attribute values associated with the second attribute, a third attribute value that has not been excluded; and
a learning step of causing an attribute value extraction model, which is for extracting, by using a first query including a first attribute related to a first item for estimation and at least one first attribute value associated with the first attribute, a second attribute value related to the first item from first data including the second attribute value, to learn training data that includes, as an input portion, the second query and second data including a fourth attribute value that is related to the second item and associated with the second attribute, and that includes, as an output portion, information capable of identifying the fourth attribute value in the second data.

A program for causing a computer to function as:
a second query acquisition unit that acquires a second query including a second attribute related to a second item for learning and, among a plurality of third attribute values associated with the second attribute, a third attribute value that has not been excluded; and
a learning unit that causes an attribute value extraction model, which is for extracting, by using a first query including a first attribute related to a first item for estimation and at least one first attribute value associated with the first attribute, a second attribute value related to the first item from first data including the second attribute value, to learn training data that includes, as an input portion, the second query and second data including a fourth attribute value that is related to the second item, associated with the second attribute, and different from the third attribute value, and that includes, as an output portion, information capable of identifying the fourth attribute value in the second data.