JP6856466B2

JP6856466B2 - Information processing systems, information processing methods, and programs

Info

Publication number: JP6856466B2
Application number: JP2017137664A
Authority: JP
Inventors: 朋哉山崎; 圭一郎永島
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-07-14
Filing date: 2017-07-14
Publication date: 2021-04-07
Anticipated expiration: 2037-07-14
Also published as: JP2019020940A

Description

The present invention relates to an information processing system, an information processing method, and a program.

An information retrieval system has been proposed that includes a learning component that analyzes stored information retrieval data to identify relevance patterns from past information retrieval behavior, and a search component that identifies a subset of current search results based at least in part on those relevance patterns (see Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-285982

Incidentally, it may be desirable to be able to provide more useful information related to a user's search behavior.

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing system, an information processing method, and a program capable of providing more useful information related to a user's search behavior.

One aspect of the present invention is an information processing system including: a collection unit that collects, for each group of users who entered the same query, the category paths assigned to each of a plurality of products or services selected by that user group on a sales site; a generation unit that generates a category tree for each query by integrating identical categories contained in the plurality of category paths collected by the collection unit for each user group; and a derivation unit that derives the similarity between a plurality of queries based on the similarity between the category trees generated by the generation unit for each of those queries.

According to one aspect of the present invention, more useful information related to a user's search behavior can be provided.

A diagram showing the usage environment of the server device 10 of the embodiment.

A diagram for explaining the category path P of the embodiment.

A diagram showing an example of the content of the product information of the embodiment.

A diagram showing an example of the content of the user behavior information of the embodiment.

A block diagram showing the configuration of the similarity analysis unit 300 of the embodiment.

A diagram showing an example of the set of category paths P collected by the category path collection unit 310 for a group of users who entered a certain query in the embodiment.

A diagram showing an example of the category tree T generated by the category path integration unit 321 of the embodiment.

A diagram showing another example of the category tree T generated by the category path integration unit 321 of the embodiment.

A diagram showing an example of how the similarity between category trees T is obtained in the embodiment.

A diagram showing an example of session division by the session division unit 400 of the embodiment.

A flowchart showing an example of the processing flow in the derivation stage of the category tree T of the embodiment.

A flowchart showing an example of the processing flow in the stage of session division and conversion prediction value derivation using the similarity between queries in the embodiment.

A diagram showing experimental results from a simulation of conversion prediction values using the server device 10 of the present embodiment.

Hereinafter, embodiments of an information processing system, an information processing method, and a program will be described with reference to the drawings. In the present embodiment, the information processing system is described as being applied to a server device that provides a sales site. The sales site is not limited to one rendered by a browser and also includes one rendered by an application program. The server device is communicably connected to user terminal devices via a network such as the Internet, and acquires information indicating users' search behavior on the sales site. The server device derives the similarity between a plurality of queries entered by a user and divides sessions based on the derived similarity. The server device also derives a predicted value of the user's conversion (the probability that the user takes a predetermined action) based on information about the divided sessions. The embodiment is described below.

FIG. 1 is a diagram showing the usage environment of the server device (information processing system) 10 of the embodiment. The server device 10 is communicably connected to user terminal devices UD and client terminal devices CD via a network NW. The network NW includes the Internet, a WAN (Wide Area Network), a LAN (Local Area Network), and the like. There are, for example, a plurality of user terminal devices UD and a plurality of client terminal devices CD communicably connected to the server device 10.

The user terminal device UD is an information processing device used by a user. The user terminal device UD has, for example, a browser or an application program for viewing the sales site that the server device 10 provides for selling products or services (hereinafter referred to as "products, etc."). The user terminal device UD is used by the user to search for products, etc. presented on the sales site, and to purchase or contract for them (including making reservations) (hereinafter simply referred to as "purchase"). A product, etc. is an example of an "object".

The client terminal device CD is an information processing device used by a client. The client terminal device CD is used by the client, for example, to register information on products, etc. to be sold (hereinafter referred to as "product information") in the server device 10.

The server device 10 provides a sales site for products, etc. The server device 10 stores the product information registered by the client terminal devices CD. When it receives input of a user's search behavior through a user terminal device UD, it delivers to that user terminal device UD a sales site page listing the product information narrowed down according to the search behavior.

The server device 10 of the present embodiment includes, for example, a product information acquisition unit 100, a behavior information acquisition unit 200, a similarity analysis unit 300, a session division unit 400, a calculation model learning unit 500, a CV prediction unit 600, an information output unit 700, and a storage unit 800.

All or some of the product information acquisition unit 100, the behavior information acquisition unit 200, the similarity analysis unit 300, the session division unit 400, the calculation model learning unit 500, the CV prediction unit 600, and the information output unit 700 are realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may instead be realized by hardware (a circuit unit, including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation between software and hardware.

The storage unit 800 is realized by, for example, a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, or a hybrid storage device combining two or more of these. All or part of the storage unit 800 may be realized by an external device accessible to the processor of the server device 10, such as a NAS (Network Attached Storage) or an external storage server. The storage unit 800 stores a product information database DB1, a behavior information database (past log) DB2, a category tree information database DB3, a calculation model information database DB4, and a CV predicted value information database DB5.

Next, each functional unit of the server device 10 will be described in detail.

First, the product information acquisition unit 100 will be described. The product information acquisition unit 100 acquires product information input from the client terminal devices CD through the network NW. "Product information" is information about the products, etc. presented on the sales site, for example the name, content, producer (provider), price, shipping fee, stock quantity, and scheduled delivery date of a product, etc. The "product information" includes category information indicating the category of the product, etc. The category information identifies the hierarchical category to which the product, etc. belongs, and includes the "category path P" described below.

FIG. 2 is a diagram for explaining the category path P. As shown in FIG. 2, the hierarchical categories include, in order from the broadest concept, a first layer, a second layer, ..., and an N-th layer (N is a natural number of 3 or more). For convenience of explanation, hierarchical categories with four layers are described here, but the hierarchy may have three or fewer layers, or five or more layers.

A "category path P" means a single virtual path in the hierarchical categories that runs from a category in the top layer, through categories in the intermediate layers, to one category in the lowest layer. In the example shown in FIG. 2, the category path P of a product belonging to the lowest-layer (fourth-layer) category "Western food" is "book (first layer) > childcare (second layer) > cooking (third layer) > Western food (fourth layer)". In each set of hierarchical categories, there are as many category paths P as there are categories in the lowest layer. A category path P is assigned to each product, etc. when its product information is registered in the server device 10 through a client terminal device CD. For example, when the lowest-layer category to which a newly registered product, etc. belongs is specified within the hierarchical categories of the sales site, a category path P is assigned to that product, etc.

FIG. 3 is a diagram showing an example of the content of the product information of the present embodiment. As shown in FIG. 3, the product information registered for a product, etc. includes, for example, information indicating which category the product, etc. belongs to in each of the plurality of layers. Registering, for each of the plurality of layers, information indicating which category the product, etc. belongs to is one example of "a category path P being assigned to a product, etc.". Alternatively, the case where information indicating which lowest-layer category the product, etc. belongs to is registered, and information indicating the correspondence between each lowest-layer category and its higher-layer categories can be referenced, is also an example of "a category path P being assigned to a product, etc.".
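The second way of assigning a category path P (only the lowest-layer category is registered, and the upper categories are looked up) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `PARENT` mapping and the function name `category_path` are hypothetical, and the sketch assumes category names are unique within the hierarchy.

```python
# Hypothetical parent map: each category points to its parent category;
# a first-layer category maps to None.
PARENT = {
    "book": None,
    "childcare": "book",
    "cooking": "childcare",
    "western food": "cooking",
}


def category_path(leaf, parent=PARENT):
    """Walk from a lowest-layer category up to the first layer and
    return the category path P ordered from the top layer down."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = parent[node]
    return tuple(reversed(path))


print(" > ".join(category_path("western food")))
# book > childcare > cooking > western food
```

Storing only the leaf keeps the product record small, at the cost of one lookup per layer when the path is needed.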

Next, the behavior information acquisition unit 200 will be described. The behavior information acquisition unit 200 acquires information on user behavior performed on the sales site through the user terminal devices UD. It acquires the user's behavior information each time a new action by the user is received. The behavior information acquisition unit 200 registers the acquired behavior information in the behavior information database DB2 and outputs it to the similarity analysis unit 300 and the CV prediction unit 600. The user's behavior information includes information about the user's search behavior and information about the user's specific behavior (described later).

"Search behavior" is, for example, an action for finding or narrowing down products, etc. It includes, for example, entering a query (search query), clicking a button or the like provided on the sales site for narrowing down products, etc., and clicking a button or the like provided on the sales site for changing the display order of products, etc. "Information about search behavior" includes the content of the search behavior performed by the user and the time at which it was performed. A "button" in the present application may be a virtual one displayed on a screen, and includes elements that can be regarded as buttons, such as radio boxes.

"Specific behavior" is, for example, an action directed at an individual product, etc. narrowed down by search behavior. For example, it is an action of selecting one or more products, etc. from among the plurality of products, etc. displayed on the sales site. It includes, for example, clicking a hyperlink to a detailed introduction page for a product, etc., and clicking a purchase button for purchasing a product, etc. "Information about specific behavior" includes the content of the specific behavior performed by the user and the time at which it was performed.

FIG. 4 is a diagram showing an example of the content of the user behavior information of the present embodiment. As shown in FIG. 4, the user behavior information associates each action of the user (for example, search behavior and specific behavior) with the time at which that action was performed on the sales site. The user behavior information is managed for each user.

Next, the similarity analysis unit 300 will be described. The similarity analysis unit 300 derives the similarity between a plurality of search behaviors based on the content of a user's search behaviors on the sales site. In the present embodiment, based on the content of a plurality of queries entered by users on the sales site, the similarity between those queries is derived as the similarity between the search behaviors.

Here, the term "query" as used in the present application is defined. A query is a request to a website expressed as characters in a predetermined format. A query is, for example, text entered directly into a website. In the present embodiment, however, the similarity analysis unit 300 is not limited to directly entered queries: when a search behavior that can be regarded as equivalent to a directly entered query is performed, that search behavior is also treated as the input of a query. A "search behavior that can be regarded as equivalent to a directly entered query" is, for example, a click on a hyperlink to a page related to a certain item (for example, a certain product, etc.).

In the present embodiment, when a plurality of terms (keywords) are entered into one search box at one time, the entered terms are treated together as a single query. For example, when "book novel" is entered in the search box, "book novel" is one query. In other words, a query entered as "book" and a query entered as "book novel" are treated as different queries.

FIG. 5 is a block diagram showing the configuration of the similarity analysis unit 300 of the present embodiment. For each group of users who entered the same query, the similarity analysis unit 300 collects the category paths assigned to each of the plurality of products, etc. selected by that user group. It then generates a category tree for each query based on the collected category paths, and derives the similarity between a plurality of queries based on the generated per-query category trees. This is described in detail below. As shown in FIG. 5, the similarity analysis unit 300 includes, for example, a category path collection unit 310, a category tree generation unit 320, and a similarity derivation unit 330.

For each group of users who entered the same query, the category path collection unit 310 collects the category paths P assigned to each of the plurality of products, etc. that the user group selected on the sales site. More specifically, the category path collection unit 310 identifies, among the users who performed search behavior on the sales site, the users who entered the same query (hereinafter referred to as the "specific query"). For example, it extracts the users who entered the specific query by referring to the user behavior information registered in the behavior information database DB2.

The category path collection unit 310 also identifies the products, etc. that a user who entered the specific query selected immediately after entering it. "Immediately after entering the specific query" means that no other query was entered between the input of the specific query and the selection of the product, etc. Other actions, such as changing the display order of products, etc., may occur between the input of the specific query and the selection. "Selecting (a product, etc.)" means, for example, that the specific behavior described above was performed on that product, etc. In the present embodiment, the category path collection unit 310 regards a click on (access to) a button or hyperlink related to an individual product, etc. as "the product, etc. being selected". Alternatively, the category path collection unit 310 may regard a product, etc. as "selected" only when it was actually purchased. The category path collection unit 310 refers to the user behavior information registered in the behavior information database DB2 and, based on the time information it contains, identifies the products, etc. that the user selected immediately after entering the specific query. The category path collection unit 310 executes this processing for all users who entered the same query.
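The "immediately after" rule can be sketched as a single pass over one user's time-ordered log. This is an illustrative sketch only: the `Event` record, its `kind` labels ("query", "sort", "select"), and the function name are hypothetical, and the sketch assumes that every selection made before the next query counts as selected immediately after the current one.

```python
from typing import NamedTuple


class Event(NamedTuple):
    time: int    # when the action was performed (any sortable timestamp)
    kind: str    # hypothetical labels: "query", "sort", or "select"
    value: str   # query text, sort key, or product id


def selected_right_after(events, target_query):
    """Return the product ids selected after target_query with no
    other query entered in between."""
    selected = []
    current_query = None
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "query":
            current_query = ev.value  # a new query replaces the active one
        elif ev.kind == "select" and current_query == target_query:
            selected.append(ev.value)
        # other actions (e.g. "sort") leave the active query unchanged
    return selected


log = [
    Event(1, "query", "book"),
    Event(2, "sort", "price"),      # allowed between query and selection
    Event(3, "select", "item-1"),
    Event(4, "query", "book novel"),
    Event(5, "select", "item-2"),   # belongs to "book novel", not "book"
]
print(selected_right_after(log, "book"))  # ['item-1']
```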

The category path collection unit 310 also refers to the product information registered in the product information database DB1 and collects the category path P assigned to each product, etc. that a user selected immediately after entering the specific query. When a plurality of users who entered the same query selected different products, etc., a plurality of category paths P corresponding to those products, etc. are collected. The category path collection unit 310 executes this processing for all products, etc. selected by the group of users who entered the same query.

FIG. 6 is a diagram showing an example of the set of category paths P collected by the category path collection unit 310 for a group of users who entered a certain query. It shows the category paths P of the products, etc. that users who entered "book" as the specific query selected (for example, by clicking a hyperlink) immediately after entering it.

As described above, for each group of users who entered the same query, the category path collection unit 310 collects the category paths P assigned to each of the plurality of products, etc. that the user group selected on the sales site. The category path collection unit 310 outputs the set of category paths P collected for each user group to the category tree generation unit 320. It performs this processing for every query entered on the sales site during a predetermined period.
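The collection step as a whole amounts to grouping category paths by query. The sketch below assumes the selections have already been extracted as (user, query, product id) triples; `PRODUCT_PATHS` is a hypothetical stand-in for the product information database DB1, and every category name other than "book", "childcare", "cooking", and "western food" is invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for DB1: product id -> category path P.
PRODUCT_PATHS = {
    "item-1": ("book", "childcare", "cooking", "western food"),
    "item-2": ("book", "childcare", "cooking", "japanese food"),
    "item-3": ("book", "novel", "mystery", "domestic"),
}


def collect_paths(selections, product_paths=PRODUCT_PATHS):
    """Group category paths by query.

    selections: iterable of (user, query, product_id) triples, one per
    product selected immediately after that query was entered.
    Returns a dict mapping each query to its list of category paths.
    """
    paths_by_query = defaultdict(list)
    for _user, query, product_id in selections:
        paths_by_query[query].append(product_paths[product_id])
    return dict(paths_by_query)


selections = [
    ("user-a", "book", "item-1"),
    ("user-b", "book", "item-2"),
    ("user-c", "book", "item-3"),
    ("user-c", "book novel", "item-3"),
]
collected = collect_paths(selections)
print(len(collected["book"]))  # 3
```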

Returning to FIG. 5, the category tree generation unit 320 generates a category tree T for each query by integrating identical categories contained in the plurality of category paths P that the category path collection unit 310 collected for each user group. The category tree generation unit 320 includes, for example, a category path integration unit 321 and a weighting unit 322.

The category path integration unit 321 integrates identical categories contained in the plurality of category paths P that the category path collection unit 310 collected for each user group. That is, the category path integration unit 321 determines whether, among the plurality of category paths P, there are two or more category paths P that have the same category in the same layer. If there are, it integrates that category and the categories above it, thereby converting the two or more category paths P into one category tree T. The category path integration unit 321 performs this integration in each layer of the hierarchical categories.

FIG. 7 is a diagram showing an example of the category tree T generated by the category path integration unit 321. The category tree T shown in FIG. 7 is generated from the set of category paths P shown in FIG. 6. As shown in FIG. 7, where two or more category paths P have the same category in the same layer, that category and the categories above it are integrated, and one category tree T is generated.

The category path integration unit 321 also sets a root node RN above the first-layer categories in the generated category tree T. The root node RN is treated as a virtual category set in a layer above the first layer (a zeroth layer). The root node RN is identical in the category trees T of all queries. The role of the root node RN is described later.
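Integrating category paths under a shared root can be sketched as building a prefix tree. This is a sketch under stated assumptions rather than the patent's implementation: the nested-dict representation, the `ROOT` label, and the sample paths (apart from "book > childcare > cooking > western food") are hypothetical. It identifies a category by its position under its parents, which coincides with "same category in the same layer" whenever each category has a single parent.

```python
ROOT = "RN"  # virtual zeroth-layer category shared by every query's tree


def build_tree(paths):
    """Integrate category paths into a nested dict of the form
    {category: {child: {...}}}, with all paths hanging off ROOT."""
    tree = {ROOT: {}}
    for path in paths:
        node = tree[ROOT]
        for category in path:
            # setdefault reuses an existing child, which is the integration step
            node = node.setdefault(category, {})
    return tree


paths = [
    ("book", "childcare", "cooking", "western food"),
    ("book", "childcare", "cooking", "japanese food"),
    ("book", "novel", "mystery", "domestic"),
]
tree = build_tree(paths)
print(sorted(tree[ROOT]["book"]))  # ['childcare', 'novel']
```

The three paths share "book", and two of them also share "childcare" and "cooking", so the result is a single tree with three leaves.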

The weighting unit 322 assigns a weight to each category contained in the category tree T generated by the category path integration unit 321. More specifically, it assigns weights according to how categories were integrated during generation of the category tree T: for example, it derives the magnitude of the weight assigned to a category from the number of identical categories that were integrated into it.

In the present embodiment, the weighting unit 322 first assigns a score to each category according to the number of identical categories integrated into it during generation of the category tree T. For example, when the category tree T shown in FIG. 7 is generated from the plurality of category paths P shown in FIG. 6, a score of "3" is assigned to the category "book", "2" to the category "childcare", "2" to the category "cooking", and "1" to each of the other categories.

The weighting unit 322 then derives the weight assigned to each category by normalizing the scores assigned to all categories contained in one category tree T, on a per-tree basis. In the example shown in FIG. 7, the weight of each category is derived by dividing its score by "12", the sum of the scores assigned to all categories. For example, a weight of "0.25" is assigned to the category "book", "0.167" to the category "childcare", "0.167" to the category "cooking", and "0.083" to each of the other categories.
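The scoring and normalization can be reproduced with a short sketch. The function name and the two unnamed paths are hypothetical (only "book", "childcare", "cooking", and "western food" come from the description); the paths are chosen so that the scores match the stated values of 3, 2, 2, and 1 with a total of 12. Each category is keyed by its path prefix, and the root node is not scored.

```python
from collections import Counter


def category_weights(paths):
    """Score each category by how many collected paths pass through it,
    then divide by the sum of all scores in the tree."""
    scores = Counter()
    for path in paths:
        for depth in range(1, len(path) + 1):
            scores[path[:depth]] += 1  # category identified by its prefix
    total = sum(scores.values())
    return {node: score / total for node, score in scores.items()}


paths = [
    ("book", "childcare", "cooking", "western food"),
    ("book", "childcare", "cooking", "japanese food"),
    ("book", "novel", "mystery", "domestic"),
]
weights = category_weights(paths)
print(round(weights[("book",)], 3))                                # 0.25
print(round(weights[("book", "childcare")], 3))                    # 0.167
print(round(weights[("book", "childcare", "cooking")], 3))         # 0.167
print(round(weights[("book", "novel")], 3))                        # 0.083
```

Because the scores are divided by their sum, the weights in one tree always add up to 1, which makes trees built from different numbers of paths comparable.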

カテゴリツリー生成部３２０は、前記所定期間の間に販売サイトに入力された全てのクエリに対して上記処理を行い、クエリ毎の重み付きカテゴリツリーＴを生成する。カテゴリツリー生成部３２０により生成されたクエリ毎の重み付きカテゴリツリーＴは、カテゴリツリー情報データベースＤＢ３に登録される。 The category tree generation unit 320 performs the above processing on all the queries input to the sales site during the predetermined period, and generates a weighted category tree T for each query. The weighted category tree T for each query generated by the category tree generation unit 320 is registered in the category tree information database DB3.

次に、図８を参照して、カテゴリパス統合部３２１により生成されたカテゴリツリーＴの別の例について説明する。図８に示す例では、あるクエリが複数の意味を持つ場合の例である。例えば、１つのクエリに含まれるタームが、ある「水産物の産地」と、その水産物の産地とは全く関係のない「ファッションブランド」とをそれぞれ指す場合である。この場合、そのクエリを入力したユーザがクエリの入力の直後にクリックした商品等は、上記水産物の産地に関連するカテゴリに属する商品等である場合と、上記ファッションブランドに関連するカテゴリに属する商品等である場合がある。この場合、２つの異なる階層状のカテゴリが存在することになるので、それぞれの商品等のカテゴリパスＰをそのままでは１つのカテゴリツリーＴに統合することができない。そこで本実施形態では、図８に示すように、第１階層のカテゴリのさらに上位にルートノードＲＮを設定している。そして、２つの異なる階層状のカテゴリから抽出された２つのカテゴリパスＰは、ルートノードＲＮで結合される。これにより、１つのクエリが複数の意味を持つ場合であっても、そのクエリに対応する１つのカテゴリツリーＴを生成することができる。 Next, another example of the category tree T generated by the category path integration unit 321 will be described with reference to FIG. 8. FIG. 8 shows an example in which a certain query has a plurality of meanings. For example, a term included in one query may refer both to a certain "production area for marine products" and to a "fashion brand" that has nothing to do with that production area. In this case, the product or the like that a user clicked immediately after entering the query may belong to a category related to the production area, or to a category related to the fashion brand. Since two different category hierarchies are then involved, the category paths P of the respective products cannot be integrated into one category tree T as they are. Therefore, in the present embodiment, as shown in FIG. 8, a root node RN is set above the first-layer categories. The two category paths P extracted from the two different category hierarchies are then joined at the root node RN. As a result, even if one query has a plurality of meanings, one category tree T corresponding to the query can be generated.
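A minimal sketch of joining category paths from unrelated hierarchies under a shared root node is given below. The nested-dictionary layout, the function name `merge_paths`, and the category names in `paths` are hypothetical; only the idea of a root node RN placed above the first-layer categories comes from the text.

```python
ROOT = "RN"  # root node placed above the first-layer categories


def merge_paths(paths):
    """Merge category paths into one nested-dict tree under a single root.

    Identical categories at the same position are merged into one node,
    so paths that start in different hierarchies still share the root.
    """
    tree = {ROOT: {}}
    for path in paths:
        node = tree[ROOT]
        for category in path:
            # Reuse the node if this category was already merged in.
            node = node.setdefault(category, {})
    return tree


# Hypothetical category paths clicked after one ambiguous query.
paths = [
    ("Food", "Seafood", "Crab"),
    ("Food", "Seafood", "Scallops"),
    ("Fashion", "Bags"),
]
tree = merge_paths(paths)
print(tree)
```

Here the two "Food" paths share their first two nodes, while the "Fashion" path only meets them at the root, which is what allows one tree per query even when the query is ambiguous.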

次に、図５に戻り、類似度導出部３３０について説明する。類似度導出部３３０は、ユーザの複数の検索行動の内容に基づき、前記複数の検索行動の間の類似度を導出する。本実施形態では、ユーザが入力したクエリの内容に基づき、複数のクエリ間の類似度を前記複数の検索行動の間の類似度として導出する。類似度導出部３３０は、比較対象の複数のクエリ間の類似度を、前記複数のクエリのそれぞれに対してカテゴリツリー生成部３２０により生成されたカテゴリツリーＴ同士の類似度に基づき導出する。本実施形態では、類似度導出部３３０は、木間編集距離を用いて２つの重み付きカテゴリツリーＴ同士の類似度を求める。類似度導出部３３０は、「第１導出部」の一例である。 Next, returning to FIG. 5, the similarity derivation unit 330 will be described. The similarity derivation unit 330 derives the similarity between a plurality of search actions of a user based on the contents of those search actions. In the present embodiment, the similarity between a plurality of queries is derived, based on the contents of the queries input by the user, as the similarity between the plurality of search actions. The similarity derivation unit 330 derives the similarity between the queries to be compared based on the similarity between the category trees T generated by the category tree generation unit 320 for the respective queries. In the present embodiment, the similarity derivation unit 330 obtains the similarity between two weighted category trees T by using a tree edit distance. The similarity derivation unit 330 is an example of the "first derivation unit".

図９は、２つの重み付きカテゴリツリーＴ同士の類似度の求め方の一例を示す図である。図９に示すように、類似度導出部３３０は、複数のクエリのそれぞれに対応するカテゴリツリーＴに含まれる複数のカテゴリ同士の一致度に基づき、前記複数のクエリ間の類似度を導出する。例えば、類似度導出部３３０は、複数のクエリのそれぞれに対応するカテゴリツリーＴに含まれるカテゴリ同士が一致するか否かを最上位のカテゴリ（例えばルートノードＲＮ）から順に判定し、前記最上位のカテゴリから連続してどれだけ多くのカテゴリが一致するかに基づき、複数のクエリ間の類似度を導出する。ここでは、最上位のカテゴリから連続して一致するカテゴリの数が多いほど、高い類似度が導出される。 FIG. 9 is a diagram showing an example of how to obtain the similarity between two weighted category trees T. As shown in FIG. 9, the similarity derivation unit 330 derives the similarity between a plurality of queries based on the degree of matching between the categories included in the category trees T corresponding to the respective queries. For example, the similarity derivation unit 330 determines, in order from the highest category (for example, the root node RN), whether or not the categories included in the category trees T corresponding to the respective queries match, and derives the similarity between the queries based on how many categories match consecutively from the highest category. Here, the larger the number of categories that match consecutively from the highest category, the higher the derived similarity.

図９に示す例では、図９中の左側のカテゴリツリーＴがクエリ「本」に対応するカテゴリツリーであり、図９中の右側のカテゴリツリーＴが「本」とは異なるクエリに対応するカテゴリツリーである。図９に示す例では、最上位のカテゴリ（ルートノードＲＮ）から見て、カテゴリ「本」、カテゴリ「育児」、カテゴリ「料理」が２つのカテゴリツリーＴで一致するカテゴリになる。 In the example shown in FIG. 9, the category tree T on the left side of FIG. 9 is the category tree corresponding to the query "book", and the category tree T on the right side of FIG. 9 is the category tree corresponding to a query different from "book". In the example shown in FIG. 9, viewed from the highest category (root node RN), the category "book", the category "childcare", and the category "cooking" are the categories that match between the two category trees T.

本実施形態では、類似度導出部３３０は、複数のクエリのそれぞれに対応するカテゴリツリーＴに含まれる複数のカテゴリ同士の一致度と、一致するカテゴリに付与された前記重みとに基づき、複数のクエリ間の類似度を導出する。例えば、類似度導出部３３０は、複数のクエリのそれぞれに対応するカテゴリツリーＴに含まれるカテゴリ同士が一致するか否かを最上位のカテゴリから順に判定し、一致するカテゴリに付与された前記重みの値を加算していくことで、複数のクエリ間の類似度を導出する。一致するカテゴリに付与された前記重みは、例えば、２つのカテゴリツリーＴにおいて小さい方の重みの値が採用される。 In the present embodiment, the similarity derivation unit 330 derives the similarity between a plurality of queries based on the degree of matching between the categories included in the category trees T corresponding to the respective queries and on the weights given to the matching categories. For example, the similarity derivation unit 330 determines, in order from the highest category, whether or not the categories included in the category trees T corresponding to the respective queries match, and derives the similarity between the queries by adding up the weight values given to the matching categories. As the weight of a matching category, for example, the smaller of the weight values in the two category trees T is adopted.

図９に示す例では、最上位のカテゴリから見て、カテゴリ「本」、カテゴリ「育児」、カテゴリ「料理」が２つのカテゴリツリーＴで一致するカテゴリである。そして、左側のカテゴリツリーＴのカテゴリ「本」に対して付与された重みは、「０．２５」であり、右側のカテゴリツリーＴのテゴリ「本」に対して付与された重みは、「０．２０」である。この場合、「０．２５」と「０．２０」とのうち小さい方である「０．２０」が加算対象の重みの値となる。同様に、２つのカテゴリツリーＴでカテゴリ「育児」に対して付与された重みの小さい方である「０．１５」、および２つのカテゴリツリーＴでカテゴリ「料理」に対して付与された重みの小さい方である「０．１５」が順に加算される。そして、これら「０．２０」、「０．１５」、および「０．１５」を加算した合計値である「０．５」が２つのカテゴリツリーＴの類似度を示す値として導出される。この導出される類似度の値が大きい程、２つのクエリ間の類似度が高いと見做される。 In the example shown in FIG. 9, viewed from the highest category, the category "book", the category "childcare", and the category "cooking" are the categories that match between the two category trees T. The weight given to the category "book" in the left category tree T is "0.25", and the weight given to the category "book" in the right category tree T is "0.20". In this case, "0.20", the smaller of "0.25" and "0.20", is the weight value to be added. Similarly, "0.15", the smaller of the weights given to the category "childcare" in the two category trees T, and "0.15", the smaller of the weights given to the category "cooking" in the two category trees T, are added in order. The total of "0.20", "0.15", and "0.15", namely "0.5", is derived as the value indicating the similarity between the two category trees T. The larger this derived value, the higher the similarity between the two queries is considered to be.
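The addition in this example can be sketched as follows. The tree layout (category name mapped to a weight and its child categories), the function name `tree_similarity`, the assumption that "childcare" and "cooking" sit directly under "book", and the non-matching leaf categories with their weights are all assumptions for illustration. The weights 0.25, 0.20, and 0.15 come from the text, and 0.167 is carried over from the FIG. 7 example on the assumption that the left tree is that tree.

```python
def tree_similarity(tree_a, tree_b):
    """Sum the smaller weight of every category matched from the top down.

    Each tree maps a category name to (weight, children). Descent stops at
    the first non-matching category, so only categories that match
    consecutively from the top contribute to the similarity.
    """
    total = 0.0
    for name, (weight_a, children_a) in tree_a.items():
        if name in tree_b:
            weight_b, children_b = tree_b[name]
            total += min(weight_a, weight_b)
            total += tree_similarity(children_a, children_b)
    return total


# Left tree: query "book". Leaf names and leaf weights are placeholders.
left = {
    "book": (0.25, {
        "childcare": (0.167, {"picture books": (0.083, {})}),
        "cooking": (0.167, {"recipes": (0.083, {})}),
    }),
}
# Right tree: a different query. Leaf names and leaf weights are placeholders.
right = {
    "book": (0.20, {
        "childcare": (0.15, {"education": (0.10, {})}),
        "cooking": (0.15, {"diet": (0.10, {})}),
    }),
}

similarity = tree_similarity(left, right)
print(f"{similarity:.2f}")
```

Taking the smaller weight caps each category's contribution at the share both queries give it, so a category that dominates only one of the two trees cannot inflate the similarity.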

また、類似度導出部３３０は、後述するコンバージョン予測値を導出する段階において、販売サイトに対して入力されるクエリ（すなわち、行動情報取得部２００により受け付けられるクエリ）を監視する。そして、類似度導出部３３０は、ユーザからクエリが新しく入力される毎に、新しく入力されたクエリに対応するカテゴリツリーＴをカテゴリツリー情報データベースＤＢ３から取得する。そして、類似度導出部３３０は、新しく入力されたクエリに対応するカテゴリツリーＴと、１つ前に入力されたクエリに対応するカテゴリツリーＴとを比較することで、新しく入力されたクエリと１つ前に入力されたクエリとの間の類似度を導出する。類似度導出部３３０は、新しく入力されたクエリと１つ前に入力されたクエリとの類似度が閾値以下の場合、新しく入力されたクエリと１つ前に入力されたクエリとの類似度が閾値以下であることを示す信号をセッション分割部４００に出力する。 Further, at the stage of deriving the conversion prediction value described later, the similarity derivation unit 330 monitors the queries input to the sales site (that is, the queries accepted by the behavior information acquisition unit 200). Each time a query is newly input by the user, the similarity derivation unit 330 acquires the category tree T corresponding to the newly input query from the category tree information database DB3. The similarity derivation unit 330 then derives the similarity between the newly input query and the immediately preceding query by comparing the category tree T corresponding to the newly input query with the category tree T corresponding to the immediately preceding query. When this similarity is equal to or less than a threshold value, the similarity derivation unit 330 outputs a signal indicating that fact to the session division unit 400.

次に、図１に戻り、セッション分割部４００について説明する。セッション分割部４００は、類似度導出部３３０により導出された複数の検索行動の間の類似度に基づき、前記複数の検索行動の間でセッションを分割する。本実施形態では、セッション分割部４００は、ユーザから入力された複数のクエリに対して類似度導出部３３０により導出された複数のクエリ間の類似度に基づき、複数のクエリをそれぞれ受け付けた時刻の間でセッションを分割する。「セッション」とは、例えば、クッキー等の状態管理機能の有効期間である。例えば、ウェブサイト内のあるウェブページにアクセスしてから所定時間経過（タイムアウト）するまでの期間が一つのセッションとして扱われる。また、セッションとは、ウェブサイト内のあるウェブページにアクセスしてから、当該ウェブサイト内の他のウェブページ、または他のウェブサイト内のウェブページに切り替わるまでの期間であってもよい。また、セッションとは、ウェブサイト内のあるウェブページにアクセスしてから、当該ウェブページを表示するウェブブラウザを閉じるまでの期間であってもよい。 Next, returning to FIG. 1, the session division unit 400 will be described. The session division unit 400 divides a session between a plurality of search actions based on the similarity between those search actions derived by the similarity derivation unit 330. In the present embodiment, the session division unit 400 divides the session between the times at which a plurality of queries input by the user were respectively received, based on the similarity between those queries derived by the similarity derivation unit 330. A "session" is, for example, the valid period of a state management function such as a cookie. For example, the period from access to a certain web page in a website until a predetermined time elapses (timeout) is treated as one session. A session may also be the period from access to a certain web page in a website until a switch to another web page in that website or to a web page in another website. A session may also be the period from access to a certain web page in a website until the web browser displaying that web page is closed.

図１０は、セッション分割部４００によるセッションの分割の一例を示す図である。本実施形態では、セッション分割部４００は、新しく入力されたクエリと１つ前に入力されたクエリとの類似度が閾値以下であることを示す信号を類似度分析部３００から受け取る場合、新しく入力されたクエリを受け付けた時刻と１つ前に入力されたクエリを受け付けた時刻との間でセッションを分割する。 FIG. 10 is a diagram showing an example of session division by the session division unit 400. In the present embodiment, when the session division unit 400 receives from the similarity analysis unit 300 a signal indicating that the similarity between the newly input query and the immediately preceding query is equal to or less than the threshold value, it divides the session between the time at which the newly input query was received and the time at which the immediately preceding query was received.

図１０に示す例では、クエリ「本小説」を受け付けた時刻と、クエリ「服」を受け付けた時刻との間でセッションが分割される。本実施形態のセッション分割部４００によれば、クエリ「本小説」を受け付けた時刻から、クエリ「服」を受け付けた時刻までの経過時間が短い場合でも、２つのクエリの類似度が閾値以下の場合、２つのクエリが受け付けられた時刻の間でセッションが分割される。なお、２つのクエリの入力の間にユーザの別の行動（例えば、商品等を絞り込むボタンに対するクリックなど）がある場合は、セッションは、新しく入力されたクエリを受け付けた時刻の直前で分割される。セッション分割部４００は、セッションを分割した場合、セッションを分割したこと、およびセッションを分割することで新しく始まるセッション（以下、「分割セッション」と称する）の開始時刻を、計算モデル学習部５００およびＣＶ予測部６００に出力する。 In the example shown in FIG. 10, the session is divided between the time at which the query "book novel" (「本小説」) was received and the time at which the query "clothes" was received. According to the session division unit 400 of the present embodiment, even when the elapsed time from the reception of the query "book novel" to the reception of the query "clothes" is short, the session is divided between the times at which the two queries were received if the similarity between the two queries is equal to or less than the threshold value. If there is another user action between the inputs of the two queries (for example, a click on a button for narrowing down products), the session is divided immediately before the time at which the newly input query was received. When the session division unit 400 divides a session, it outputs to the calculation model learning unit 500 and the CV prediction unit 600 the fact that the session has been divided and the start time of the session newly started by the division (hereinafter referred to as a "divided session").
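The threshold rule described above can be sketched as a pass over a time-ordered query log. The function name `split_sessions`, the similarity table, the timestamps, and the threshold of 0.2 are hypothetical; in the described system the similarity would come from comparing category trees T, and intervening non-query actions are not modeled here.

```python
def split_sessions(events, similarity, threshold):
    """Group (timestamp, query) events into divided sessions.

    A new session starts whenever the similarity between the new query and
    the immediately preceding query is equal to or less than the threshold,
    regardless of how little time has passed between them.
    """
    sessions = []
    previous_query = None
    for timestamp, query in events:
        if previous_query is None or similarity(previous_query, query) <= threshold:
            sessions.append([])  # the divided session starts at this event
        sessions[-1].append((timestamp, query))
        previous_query = query
    return sessions


# Hypothetical precomputed similarities between pairs of queries.
SIMILARITY = {
    frozenset(["book", "book novel"]): 0.5,
    frozenset(["book novel", "clothes"]): 0.0,
}


def lookup_similarity(query_a, query_b):
    return SIMILARITY.get(frozenset([query_a, query_b]), 0.0)


events = [(0, "book"), (40, "book novel"), (55, "clothes")]
sessions = split_sessions(events, lookup_similarity, threshold=0.2)
print(sessions)
```

Although "clothes" arrives only 15 time units after "book novel", it opens a new divided session because the similarity is below the threshold, which is the behavior a purely time-based split would miss.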

次に、図１に戻り、計算モデル学習部５００について説明する。計算モデル学習部５００は、ユーザの検索行動に基づきユーザのコンバージョン予測値を導出する計算モデルを学習させる。本実施形態では、計算モデル学習部５００により学習される計算モデルは、分割セッションにおけるユーザの検索行動に基づき、分割セッションの間にユーザに対して提示されている商品等に対するユーザのコンバージョン予測値を導出する計算モデルである。 Next, returning to FIG. 1, the calculation model learning unit 500 will be described. The calculation model learning unit 500 trains a calculation model that derives a user's conversion prediction value based on the user's search behavior. In the present embodiment, the calculation model learned by the calculation model learning unit 500 determines the user's conversion prediction value for the product or the like presented to the user during the division session based on the user's search behavior in the division session. It is a calculation model to be derived.

詳しく述べると、上記計算モデルに対する入力は、リクエスト素性情報と、セッション素性情報とを含む。一方で、上記計算モデルからの出力は、ユーザのコンバージョン予測値である。 More specifically, the input to the above calculation model includes request feature information and session feature information. On the other hand, the output from the above calculation model is the conversion prediction value of the user.

リクエスト素性情報は、例えば、ユーザの検索行動の内容と、その検索行動を行った時刻を示す情報である。検索行動の内容は、例えば、ユーザにより入力されたクエリの内容、販売サイトでの商品等の表示順（並び順）の変更に関するユーザのリクエストの内容、販売サイトでの商品等の絞り込み条件に関するユーザのリクエストの内容などである。ユーザにより入力されたクエリの内容は、例えばベクトル表現に変換されて上記計算モデルに入力される。ベクトル表現は、例えばクエリの内容を分散表現で表した密ベクトルであるが、クエリの内容を局所表現で表した疎ベクトルでもよい。販売サイトでの商品等の表示順は、商品等を人気順で表示するか、または価格順で表示するか、などである。販売サイトでの商品等の絞り込み条件は、商品等の在庫の有無、商品等の配達予定日が所定日数以内であること、送料が無料であること、などである。リクエスト素性情報は、行動情報取得部２００により取得されたユーザ行動情報に基づいて得ることができる。 The request feature information is, for example, information indicating the content of the user's search action and the time when the search action was performed. The content of the search behavior is, for example, the content of the query input by the user, the content of the user's request regarding the change of the display order (order) of the products, etc. on the sales site, and the user regarding the narrowing conditions of the products, etc. on the sales site. The content of the request. The content of the query input by the user is converted into, for example, a vector representation and input to the above calculation model. The vector representation is, for example, a dense vector in which the contents of the query are expressed in a distributed representation, but may be a sparse vector in which the contents of the query are represented in a local representation. The display order of the products and the like on the sales site is whether to display the products or the like in the order of popularity or the order of price. The conditions for narrowing down the products on the sales site are whether or not the products are in stock, the scheduled delivery date of the products is within a predetermined number of days, and the shipping fee is free. The request feature information can be obtained based on the user behavior information acquired by the behavior information acquisition unit 200.

セッション素性情報は、分割セッションに関する時刻情報、および分割セッションにおけるリクエスト素性の平均値などである。分割セッションに関する時刻情報は、例えば、分割セッション開始からの経過時間である。「分割セッション開始からの経過時間」とは、例えば、分割セッション開始からユーザによる検索行動（クエリの入力、商品等の表示順変更や絞り込みのリクエスト等）を受け付けた時点までの経過時間である。例えば、１つの分割セッションにおいてユーザが複数の検索行動を行った場合、「分割セッション開始からの経過時間」とは、例えば、分割セッション開始からユーザによる各検索行動を受け付けた各時点までの各経過時間である。「リクエスト素性の平均値」とは、例えば、分割セッションをユーザのリクエスト（例えばクエリの入力）の集合とみなしたとき、リクエスト毎に得た「リクエスト素性情報」のベクトル（上述したクエリの内容を分散表現または局所表現で表したベクトル）の平均値である。セッション素性情報は、セッション分割部４００および行動情報取得部２００により導出または取得された情報に基づいて得られる。 The session feature information includes time information on the divided session, the average value of the request features in the divided session, and the like. The time information on the divided session is, for example, the elapsed time from the start of the divided session. The "elapsed time from the start of the divided session" is, for example, the elapsed time from the start of the divided session to the time at which a search action by the user (input of a query, a request to change the display order of products or to narrow them down, and so on) is received. For example, when the user performs a plurality of search actions in one divided session, the "elapsed time from the start of the divided session" is each elapsed time from the start of the divided session to each time at which each search action by the user is received. The "average value of the request features" is, for example, the average of the "request feature information" vectors obtained for each request (the vectors representing the query contents in a distributed or local representation, as described above) when the divided session is regarded as a set of user requests (for example, query inputs). The session feature information is obtained based on the information derived or acquired by the session division unit 400 and the behavior information acquisition unit 200.
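As a rough illustration of the two session features named above, the sketch below computes the elapsed time of each request from the start of its divided session and the element-wise mean of the request feature vectors. The function name `session_features`, the timestamps (assumed to be in seconds), and the two-dimensional vectors are hypothetical; the text does not specify how query vectors are produced.

```python
def session_features(session_start, requests):
    """Return per-request elapsed times and the mean request feature vector.

    `requests` is a non-empty list of (timestamp, feature_vector) pairs that
    belong to one divided session, with all vectors of equal length.
    """
    elapsed = [timestamp - session_start for timestamp, _ in requests]
    vectors = [vector for _, vector in requests]
    # zip(*vectors) groups the values of each dimension across all requests.
    mean_vector = [sum(column) / len(vectors) for column in zip(*vectors)]
    return elapsed, mean_vector


# Hypothetical divided session starting at t = 100 s with three requests.
requests = [
    (100.0, [1.0, 0.0]),
    (130.0, [0.0, 1.0]),
    (190.0, [1.0, 1.0]),
]
elapsed, mean_vector = session_features(100.0, requests)
print(elapsed, mean_vector)
```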

コンバージョン予測値は、分割セッションの間にユーザに対して提示されている商品等に対してユーザが所定の行動をとる確率である。コンバージョン予測値は、種々の目的に応じて異なる定義が可能である。コンバージョン予測値の一例は、分割セッションの間にユーザに対して提示されている商品等をユーザが購入する確率である。ただし、コンバージョン予測値は、上記例に限らず、分割セッションの間にユーザに対して提示されている商品等のハイパーリンクをユーザがクリックする確率でもよいし、別の定義によるものでもよい。 The conversion prediction value is the probability that the user takes a predetermined action for the product or the like presented to the user during the divided session. Predicted conversion values can be defined differently for different purposes. An example of the conversion prediction value is the probability that the user purchases a product or the like presented to the user during the split session. However, the conversion prediction value is not limited to the above example, and may be the probability that the user clicks a hyperlink such as a product presented to the user during the split session, or may be based on another definition.

本実施形態では、計算モデル学習部５００は、過去の一定期間において行動情報取得部２００およびセッション分割部４００により取得または導出されたユーザの行動情報およびセッション情報を教師データとして上記学習モデルを学習させる。行動情報取得部２００により取得されたユーザの行動情報は、上記リクエスト素性情報と、ユーザが実際に上記所定の行動（例えば、商品等の購入）を行った否かを示す情報を含む。複数のユーザについて上記所定の行動が実際に行われた否かを示す情報（例えば、コンバージョン率）は、コンバージョン予測値に対する正解データである。セッション分割部４００により取得されたセッション情報は、複数のクエリ間の類似度に基づいて分割された分割セッションに関する時刻情報（例えば、分割セッション開始からコンバージョン予測値を導出する時点までの経過時間）を含む。 In the present embodiment, the calculation model learning unit 500 trains the above learning model using, as teacher data, the user behavior information and session information acquired or derived by the behavior information acquisition unit 200 and the session division unit 400 over a certain past period. The user behavior information acquired by the behavior information acquisition unit 200 includes the request feature information and information indicating whether or not the user actually performed the predetermined action (for example, purchase of a product or the like). The information indicating whether or not the predetermined action was actually performed for a plurality of users (for example, the conversion rate) is the correct answer data for the conversion prediction value. The session information acquired by the session division unit 400 includes time information on the divided sessions divided based on the similarity between a plurality of queries (for example, the elapsed time from the start of the divided session to the time at which the conversion prediction value is derived).

計算モデル学習部５００は、上記のような情報を教師データとした機械学習により、ユーザの行動情報およびセッション情報と、コンバージョン予測値との関係を示す計算モデルを学習する。機械学習の手法は、例えば、サポートベクターマシン（ＳＶＭ：Support Vector Machine）やディープランニングであるが、これらに限定されない。計算モデル学習部５００は、学習した計算モデルを、計算モデル情報データベースＤＢ４に登録する。 The calculation model learning unit 500 learns a calculation model showing the relationship between the user's behavior information and session information and the conversion prediction value by machine learning using the above information as teacher data. The machine learning method is, for example, a support vector machine (SVM) or deep learning, but is not limited to these. The calculation model learning unit 500 registers the learned calculation model in the calculation model information database DB4.

ＣＶ予測部６００は、コンバージョン予測値を求めたいユーザ（以下、「対象ユーザ」と称する）の検索行動に基づき、対象ユーザのコンバージョン予測値を導出する。ＣＶ予測部６００は、クエリ間類似度を用いて分割された分割セッションにおける対象ユーザの検索行動に基づき、前記分割セッションの間に提示されている商品等に対する対象ユーザのコンバージョン予測値を導出する。ＣＶ予測部６００は、分割セッションにおける対象ユーザの検索行動の内容と、分割セッションに関する時刻情報とに基づき、対象ユーザのコンバージョン予測値を導出する。例えば、ＣＶ予測部６００は、分割セッションにおける対象ユーザの検索行動の内容と、分割セッション開始からの経過時間とに基づき、対象ユーザのコンバージョン予測値を導出する。ＣＶ予測部６００は、「第２導出部」の一例である。 The CV prediction unit 600 derives the conversion prediction value of a user for whom the conversion prediction value is to be obtained (hereinafter referred to as the "target user") based on the search behavior of the target user. The CV prediction unit 600 derives the conversion prediction value of the target user for the products or the like presented during a divided session, based on the search behavior of the target user in the divided session divided using the inter-query similarity. The CV prediction unit 600 derives the conversion prediction value of the target user based on the content of the target user's search behavior in the divided session and the time information on the divided session. For example, the CV prediction unit 600 derives the conversion prediction value of the target user based on the content of the target user's search behavior in the divided session and the elapsed time from the start of the divided session. The CV prediction unit 600 is an example of the "second derivation unit".

詳しく述べると、ＣＶ予測部６００は、行動情報取得部２００により取得されたユーザの行動情報を受け取る。また、ＣＶ予測部６００は、セッション分割部４００により導出されたセッション情報（例えば、分割セッション開始からの経過時間）を受け取る。また、ＣＶ予測部６００は、計算モデル情報データベースＤＢ４を参照することで、計算モデル学習部５００により学習された計算モデルを読み出す。そして、ＣＶ予測部６００は、例えば、ユーザの各検索行動の内容と、分割セッション開始からの経過時間と、上記計算モデルとに基づき、コンバージョン予測値を導出する。ＣＶ予測部６００は、導出したコンバージョン予測値を示す情報を、ＣＶ予測値情報データベースＤＢ５に登録する。 More specifically, the CV prediction unit 600 receives the user's behavior information acquired by the behavior information acquisition unit 200. Further, the CV prediction unit 600 receives the session information derived by the session division unit 400 (for example, the elapsed time from the start of the division session). Further, the CV prediction unit 600 reads out the calculation model learned by the calculation model learning unit 500 by referring to the calculation model information database DB4. Then, the CV prediction unit 600 derives a conversion prediction value based on, for example, the content of each search behavior of the user, the elapsed time from the start of the split session, and the above calculation model. The CV prediction unit 600 registers the information indicating the derived conversion prediction value in the CV prediction value information database DB5.

なお上記に代えて、ＣＶ予測部６００は、行動情報取得部２００により取得されたユーザの行動情報と、セッション分割部４００により導出されたセッション情報とに基づき、分割セッション開始からのユーザの各検索行動が行われた各時点までの各経過時間を導出してもよい。そして、ＣＶ予測部６００は、ユーザの各検索行動の内容と、分割セッション開始からのユーザの各検索行動が行われた各時点までの各経過時間と、上記計算モデルとに基づき、コンバージョン予測値を導出してもよい。 Instead of the above, the CV prediction unit 600 may derive each elapsed time from the start of the divided session to each time at which each search action of the user was performed, based on the user behavior information acquired by the behavior information acquisition unit 200 and the session information derived by the session division unit 400. The CV prediction unit 600 may then derive the conversion prediction value based on the content of each search action of the user, each elapsed time from the start of the divided session to each time at which each search action was performed, and the above calculation model.

最後に、情報出力部７００について説明する。情報出力部７００は、所定の周期に応じて、または外部からの要求を受け付けた場合に、ＣＶ予測値情報データベースＤＢ５に登録されたコンバージョン予測値を読み出し、外部に出力する。 Finally, the information output unit 700 will be described. The information output unit 700 reads out the conversion predicted value registered in the CV predicted value information database DB 5 and outputs it to the outside according to a predetermined cycle or when a request from the outside is received.

次に、サーバ装置１０による処理の流れの一例について説明する。サーバ装置１０による処理の流れは、（１）カテゴリツリーの導出段階と、（２）クエリ間の類似度を用いたセッション分割およびコンバージョン予測値の導出段階と、に大きく分かれる。以下、これらの内容について説明する。 Next, an example of the processing flow by the server device 10 will be described. The processing flow by the server device 10 is roughly divided into (1) a category tree derivation stage and (2) a session division using the similarity between queries and a conversion prediction value derivation stage. Hereinafter, these contents will be described.

図１１は、カテゴリツリーの導出段階の処理の流れの一例を示すフローチャートである。図１１に示すように、まず、カテゴリパス収集部３１０は、同一のクエリを入力したユーザ群毎に、前記ユーザ群が選択した商品群に予め付与されているカテゴリパスＰを取得する（Ｓ１０１）。 FIG. 11 is a flowchart showing an example of the processing flow at the derivation stage of the category tree. As shown in FIG. 11, first, for each user group that input the same query, the category path collection unit 310 acquires the category paths P assigned in advance to the product group selected by that user group (S101).

次に、カテゴリツリー生成部３２０は、カテゴリパス収集部３１０により収集されたユーザ群毎のカテゴリパスＰの集まりに対して、その集まりの複数のカテゴリパスに含まれる同一のカテゴリを統合することで、カテゴリツリーＴを生成する（Ｓ１０２）。次に、カテゴリツリー生成部３２０は、カテゴリツリーの生成過程で統合された同一カテゴリに対して重みを付与する（Ｓ１０３）。 Next, the category tree generation unit 320 integrates the same categories included in a plurality of category paths of the group of category paths P for each user group collected by the category path collection unit 310. , Generate a category tree T (S102). Next, the category tree generation unit 320 assigns weights to the same category integrated in the category tree generation process (S103).

サーバ装置１０は、クエリ毎に、上記Ｓ１０１からＳ１０３の処理を行い、クエリ毎のカテゴリツリーＴを生成する。これにより、カテゴリツリーＴの導出段階が終了する。これにより、カテゴリツリーＴを用いたクエリ間類似度の導出と、クエリ間類似度を用いたセッションの分割が可能になる。この後、過去の一定期間におけるユーザの行動情報と、前記一定期間に関して求められた、クエリ間類似度によりセッション分割されたセッション情報）とを教師データとして、コンバージョン予測値を導出するための計算モデルが学習される。 The server device 10 performs the processes S101 to S103 for each query to generate a category tree T for each query. This completes the derivation stage of the category tree T, and makes it possible to derive the inter-query similarity using the category trees T and to divide sessions using the inter-query similarity. After that, the calculation model for deriving the conversion prediction value is trained using, as teacher data, the user behavior information for a certain past period and the session information obtained for that period by dividing sessions according to the inter-query similarity.

図１２は、クエリ間の類似度を用いたセッション分割およびコンバージョン予測値の導出段階の処理の流れの一例を示すフローチャートである。図１２に示すように、まず、行動情報取得部２００は、ユーザが新しいクエリを入力する毎に、ユーザが入力した新しいクエリを受け付ける（Ｓ２０１）。次に、類似度導出部３３０は、ユーザからクエリが新しく入力される毎に、新しく入力されたクエリに対応するカテゴリツリーＴをカテゴリツリー情報データベースＤＢ３から読み出す。そして、類似度導出部３３０は、新しく入力されたクエリに対応するカテゴリツリーＴと、１つ前に入力されたクエリに対応するカテゴリツリーＴとを比較する（Ｓ２０２）。これにより、類似度導出部３３０は、新しく入力されたクエリと１つ前に入力されたクエリとの間の類似度を導出する（Ｓ２０３）。なお、１つ前に入力されたクエリに対応するカテゴリツリーＴは、新しく入力されたクエリに対応するカテゴリツリーＴと同様にカテゴリツリー情報データベースＤＢ３から読み出されてもよいし、ＲＡＭのような揮発性メモリ上で保持されていてもよい。 FIG. 12 is a flowchart showing an example of the processing flow at the stage of session division using the inter-query similarity and derivation of the conversion prediction value. As shown in FIG. 12, first, the behavior information acquisition unit 200 accepts a new query each time the user inputs one (S201). Next, each time a query is newly input by the user, the similarity derivation unit 330 reads the category tree T corresponding to the newly input query from the category tree information database DB3. The similarity derivation unit 330 then compares the category tree T corresponding to the newly input query with the category tree T corresponding to the immediately preceding query (S202). The similarity derivation unit 330 thereby derives the similarity between the newly input query and the immediately preceding query (S203). The category tree T corresponding to the immediately preceding query may be read from the category tree information database DB3 in the same manner as the category tree T corresponding to the newly input query, or may be held in a volatile memory such as a RAM.

次に、類似度導出部３３０は、新しく入力されたクエリと１つ前に入力されたクエリとの間の類似度が閾値以下であるか否かを判定する（Ｓ２０４）。前記類似度が閾値よりも高い場合（Ｓ２０４：ＮＯ）、セッション分割部４００は、セッションを分割しない。この場合、Ｓ２０６の処理に進む。 Next, the similarity derivation unit 330 determines whether or not the similarity between the newly input query and the previously input query is equal to or less than the threshold value (S204). When the similarity is higher than the threshold value (S204: NO), the session dividing unit 400 does not divide the session. In this case, the process proceeds to S206.

一方で、前記類似度が閾値以下である場合（Ｓ２０４：ＹＥＳ）、セッション分割部４００は、新しく入力されたクエリが受け付けられた時刻と１つ前に入力されたクエリが受け付けられた時刻との間でセッションを分割する（Ｓ２０５）。セッション分割部４００は、セッションが分割されたこと、および新しく始まる分割セッションの開始時刻を示す情報をＣＶ予測部６００に出力する。 On the other hand, when the similarity is equal to or less than the threshold value (S204: YES), the session division unit 400 divides the session between the time at which the newly input query was received and the time at which the immediately preceding query was received (S205). The session division unit 400 outputs to the CV prediction unit 600 information indicating that the session has been divided and the start time of the newly started divided session.

ＣＶ予測部６００は、行動情報取得部２００により取得されたユーザの検索行動の内容と、セッション分割部４００により導出されたセッション情報と、上記計算モデルとに基づき、コンバージョン予測値を導出する（Ｓ２０６）。情報出力部７００は、ＣＶ予測部６００により導出されたコンバージョン予測値を示す情報を出力する。サーバ装置１０は、例えばユーザにより新しいクエリが入力される毎に、Ｓ２０１からＳ２０６の処理を繰り返す。 The CV prediction unit 600 derives the conversion prediction value based on the content of the user's search behavior acquired by the behavior information acquisition unit 200, the session information derived by the session division unit 400, and the above calculation model (S206). The information output unit 700 outputs information indicating the conversion prediction value derived by the CV prediction unit 600. The server device 10 repeats the processes S201 to S206, for example, each time a new query is input by the user.

図１３は、本実施形態のサーバ装置１０を用いたコンバージョン予測値のシミュレーションによる実験結果を示す図である。図１３中のセッション分割方法による「ユーザ」とは、ユーザ毎にセッションを分割したモデルを示し、「時間（３０分）」とは、１つ前のクエリの入力から新しいクエリの入力までの経過時間が３０分以上の場合にセッションを分割したモデルを示し、「ｗ２ｖ」は、ｗｏｒｄ２ｖｅｃを用いたモデルを示し、「カテゴリツリー」は上記実施形態で説明したモデルを示す。図１３に示すように、本実施形態のようなカテゴリツリーＴを用いたセッション分割によれば、他の手法に比べて、コンバージョン予測値の導出精度が向上することが確認された。特に、本実施形態のようなカテゴリツリーＴを用いたクエリ間類似度の導出方法によれば、「商品券」と「図書券」、または商品名のカタカナ表記と英文表記との間の類似度を、他の手法に比べて高く導出することができることも確認された。 FIG. 13 is a diagram showing the results of a simulation experiment on conversion prediction values using the server device 10 of the present embodiment. Among the session division methods in FIG. 13, "user" indicates a model in which sessions are divided per user, "time (30 minutes)" indicates a model in which the session is divided when the elapsed time from the input of the previous query to the input of a new query is 30 minutes or more, "w2v" indicates a model using word2vec, and "category tree" indicates the model described in the above embodiment. As shown in FIG. 13, it was confirmed that session division using the category tree T as in the present embodiment improves the derivation accuracy of the conversion prediction value compared with the other methods. In particular, it was also confirmed that the method of deriving the inter-query similarity using the category tree T as in the present embodiment yields a higher similarity than the other methods between "gift certificate" (「商品券」) and "book voucher" (「図書券」), and between the katakana and English spellings of a product name.

According to the server device 10 described above, more useful information related to the user's search behavior can be provided. That is, in the present embodiment, the server device 10 has a category path collection unit 310, a category tree generation unit 320, and a similarity derivation unit 330. The category path collection unit 310 collects, for each user group that has input the same query, the category paths P assigned to each of the plurality of products or the like that the user group selected on the sales site. The category tree generation unit 320 generates a category tree T for each query by integrating identical categories included in the plurality of category paths P collected for each user group by the category path collection unit 310. The similarity derivation unit 330 derives the similarity between a plurality of queries based on the similarity between the category trees T generated by the category tree generation unit 320 for each of the plurality of queries. With such a configuration, the similarity between a plurality of queries can be derived easily and accurately. This makes it possible to provide more useful information related to the user's search behavior.
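
The collection and generation steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the units 310 and 320: the log format (pairs of query and category path), the nested-dictionary tree, and the names `collect_paths_by_query` and `build_category_tree` are all hypothetical.

```python
from collections import defaultdict


def collect_paths_by_query(selection_log):
    """Group category paths by the query that preceded the selection.

    `selection_log` is assumed to be an iterable of
    (query, category_path) pairs, one per selected product.
    """
    paths = defaultdict(list)
    for query, category_path in selection_log:
        paths[query].append(tuple(category_path))
    return dict(paths)


def build_category_tree(category_paths):
    """Integrate identical categories of several paths into one tree.

    The tree is a nested dict: each key is a category and its value
    is the subtree of child categories. Paths that share a prefix
    share the corresponding nodes.
    """
    tree = {}
    for path in category_paths:
        node = tree
        for category in path:
            node = node.setdefault(category, {})
    return tree


# Hypothetical sample data.
selection_log = [
    ("sneakers", ["Fashion", "Shoes", "Sneakers"]),
    ("sneakers", ["Fashion", "Shoes", "Boots"]),
    ("sneakers", ["Fashion", "Bags"]),
    ("novel", ["Books", "Fiction"]),
]
paths_by_query = collect_paths_by_query(selection_log)
trees = {q: build_category_tree(p) for q, p in paths_by_query.items()}
```

Here the three paths collected for "sneakers" collapse into a single tree with one "Fashion" root and one shared "Shoes" node.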

In the present embodiment, the similarity derivation unit 330 derives the similarity between a plurality of queries based on the degree of matching between the categories included in the category trees T corresponding to the respective queries. With such a configuration, the similarity between a plurality of queries can be derived by relatively simple processing.

In the present embodiment, the similarity derivation unit 330 determines, in order from the top-level category, whether the categories included in the category trees T corresponding to the respective queries match, and derives the similarity between the plurality of queries based on how many categories match consecutively starting from the top-level category. With such a configuration, the similarity between a plurality of queries can be derived by relatively simple processing.
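
One possible reading of this top-down comparison is sketched below. It assumes each category tree is a nested dictionary (category mapped to its child subtree) and counts the categories that match consecutively from the top level; the function name `count_matching_from_top` and the sample trees are hypothetical, and how the count is turned into a final similarity value is left open.

```python
def count_matching_from_top(tree_a, tree_b):
    """Count categories that match consecutively from the top level.

    The walk descends only into categories present in both trees, so
    a category deeper in the tree is counted only if every ancestor
    above it also matched.
    """
    matched = 0
    for category in tree_a.keys() & tree_b.keys():
        matched += 1 + count_matching_from_top(tree_a[category],
                                               tree_b[category])
    return matched


# Hypothetical sample trees.
tree_a = {"Books": {"Comics": {"Shonen": {}}}}
tree_b = {"Books": {"Comics": {"Seinen": {}}, "Magazines": {}}}
tree_c = {"Toys": {"Comics": {}}}

shared_ab = count_matching_from_top(tree_a, tree_b)
shared_ac = count_matching_from_top(tree_a, tree_c)
```

In this sample, `tree_a` and `tree_c` both contain "Comics", but it is not counted because their top-level categories differ.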

In the present embodiment, the category tree generation unit 320 assigns weights to the categories integrated in the process of generating the category tree T. The similarity derivation unit 330 derives the similarity between a plurality of queries based on the degree of matching between the categories included in the category trees T corresponding to the respective queries and on the weights assigned to the matching categories. With such a configuration, assigning weights to the integrated categories allows the similarity of a plurality of category trees T to be determined more accurately. As a result, the accuracy of deriving the similarity between a plurality of queries can be improved.

In the present embodiment, the category tree generation unit 320 derives the weight to be assigned to a category according to the number of identical categories integrated into it in the process of generating the category tree T. With such a configuration, the degree of integration of the categories can be reflected, and the similarity of a plurality of category trees T can be derived more accurately.

In the present embodiment, the category tree generation unit 320 assigns a score to each category according to the number of identical categories integrated into it in the process of generating the category tree T, and derives the weights by normalizing, per category tree, the scores assigned to all categories included in that category tree. With such a configuration, the similarity of a plurality of category trees T can be determined accurately even when the size of the category tree T differs greatly from query to query.
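
The scoring and per-tree normalization can be sketched as follows. Two points are assumptions made only for illustration: the score of a category is taken to be the number of collected paths integrated into it, and normalization divides each score by the sum of all scores in the same tree. The function name `weighted_category_tree` and the flat representation (each node keyed by its full path from the top level) are likewise hypothetical.

```python
from collections import Counter


def weighted_category_tree(category_paths):
    """Return {node: weight} for one query's category tree.

    A node is keyed by the tuple of categories from the top level
    down to that node. Its score is the number of paths merged into
    it; weights are scores divided by the tree's total score, so the
    weights of one tree sum to 1 regardless of tree size.
    """
    scores = Counter()
    for path in category_paths:
        for depth in range(1, len(path) + 1):
            scores[tuple(path[:depth])] += 1
    total = sum(scores.values())
    return {node: score / total for node, score in scores.items()}


# Hypothetical sample data.
paths = [
    ("Fashion", "Shoes", "Sneakers"),
    ("Fashion", "Shoes", "Boots"),
    ("Fashion", "Bags"),
]
weights = weighted_category_tree(paths)
```

In this sample the scores are 3, 2, 1, 1, and 1 (total 8), so "Fashion" receives the weight 3/8 and "Fashion > Shoes" the weight 2/8.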

In the present embodiment, the similarity derivation unit 330 determines, in order from the top-level category, whether the categories included in the category trees T corresponding to the respective queries match, and derives the similarity between the plurality of queries by adding up the weight values assigned to the matching categories. With such a configuration, the accuracy of deriving the similarity between a plurality of queries can be further improved.
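
A sketch of the weighted addition follows. The passage does not say which tree's weight is added when a category matches, so this example assumes the smaller of the two weights is used; that choice, the function name `weighted_similarity`, and the sample weights are illustrative assumptions. Each tree is represented as a mapping from a node (the tuple of categories from the top level down) to its normalized weight.

```python
def weighted_similarity(weights_a, weights_b):
    """Add up the weights of categories present in both trees.

    Nodes are keyed by their full path from the top-level category,
    so in trees built from category paths a node can match only if
    all of its ancestors match too. For each matching node the
    smaller of the two weights is added (an assumption).
    """
    shared = weights_a.keys() & weights_b.keys()
    # Visiting shallow nodes first mirrors the top-down order of the
    # description; the resulting sum does not depend on the order.
    return sum(min(weights_a[node], weights_b[node])
               for node in sorted(shared, key=len))


# Hypothetical normalized weights for two queries.
weights_q1 = {("Gift",): 0.5, ("Gift", "Vouchers"): 0.5}
weights_q2 = {("Gift",): 0.5, ("Gift", "Books"): 0.25,
              ("Gift", "Vouchers"): 0.25}
weights_q3 = {("Shoes",): 1.0}

sim_12 = weighted_similarity(weights_q1, weights_q2)
sim_13 = weighted_similarity(weights_q1, weights_q3)
```

With weights that sum to 1 per tree, this variant returns 1 for identical trees and 0 for trees with no common top-level category.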

In the present embodiment, the server device 10 has a similarity derivation unit 330 and a session division unit 400. The similarity derivation unit 330 derives the similarity between a plurality of search behaviors based on the content of the user's plurality of search behaviors on the sales site. The session division unit 400 divides a session between the plurality of search behaviors based on the similarity derived by the similarity derivation unit 330. With such a configuration, sessions can be divided in a way that takes the user's intention into account. This makes it possible to provide more useful information related to the user's search behavior.

In the present embodiment, the similarity derivation unit 330 derives the similarity between a plurality of queries, as the similarity between the plurality of search behaviors, based on the content of the plurality of queries the user has input to the sales site. The session division unit 400 divides the session between the times at which the respective queries were input, based on the similarity between the plurality of queries. With such a configuration, the session can be divided based on the content of the queries. This makes it possible to provide more useful information related to the user's search behavior.
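
The division step can be sketched as follows. The example assumes that each query is compared with the query immediately before it and that the session is divided when the similarity falls below a threshold; the threshold value, the function name `split_sessions`, and the toy similarity table are hypothetical stand-ins for the output of the similarity derivation unit 330.

```python
def split_sessions(query_log, similarity, threshold=0.5):
    """Divide a chronological query log into sessions.

    `query_log` is a list of (input_time, query) pairs. A boundary is
    placed between two consecutive input times when the similarity of
    the two queries is below `threshold`.
    """
    sessions = []
    for entry in query_log:
        _, query = entry
        if sessions:
            _, previous_query = sessions[-1][-1]
            if similarity(previous_query, query) >= threshold:
                sessions[-1].append(entry)   # same intent, same session
                continue
        sessions.append([entry])             # first query or new intent
    return sessions


# Toy similarity table standing in for the category-tree similarity.
toy_scores = {frozenset({"gift certificate", "book voucher"}): 0.8}


def toy_similarity(a, b):
    return 1.0 if a == b else toy_scores.get(frozenset({a, b}), 0.0)


log = [(0, "gift certificate"), (40, "book voucher"), (95, "sneakers")]
sessions = split_sessions(log, toy_similarity)
```

Here the first two queries stay in one session because their similarity is high, and the boundary falls between the input times 40 and 95.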

In the present embodiment, the similarity derivation unit 330 handles not only directly input queries but also search behavior that can be regarded as equivalent to a directly input query: when such search behavior occurs, it is treated as the input of a query. With such a configuration, sessions may be divided more accurately than when they are divided using only directly input queries.

In the present embodiment, the server device 10 includes a CV prediction unit 600 that derives, based on the user's search behavior in a divided session obtained by dividing the session, a conversion prediction value of the user for the products or the like presented during the divided session. With such a configuration, a conversion prediction value can be derived that reflects session division taking the user's intention into account. The accuracy of deriving the conversion prediction value can therefore be improved.

In the present embodiment, the CV prediction unit 600 derives the user's conversion prediction value based on time information relating to the divided session. With such a configuration, the conversion prediction value is derived from the time information of a divided session that was divided in consideration of the user's intention, so the accuracy of deriving the conversion prediction value can be improved.

In the present embodiment, the CV prediction unit 600 derives the user's conversion prediction value based on the elapsed time from the start of the divided session. With such a configuration, the conversion prediction value is derived from the time elapsed since the start of a divided session that was divided in consideration of the user's intention, so the accuracy of deriving the conversion prediction value can be improved.
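
As a small illustration of the time information mentioned above, the elapsed time from the start of a divided session can be computed per query as shown below. How the calculation model consumes this value is not described in this passage, so the sketch stops at the feature itself; the function name `elapsed_since_session_start` and the sample timestamps are assumptions.

```python
from datetime import datetime, timedelta


def elapsed_since_session_start(session_timestamps):
    """Return, for each query in one divided session, the seconds
    elapsed since the first query of that session."""
    if not session_timestamps:
        return []
    start = session_timestamps[0]
    return [(ts - start).total_seconds() for ts in session_timestamps]


# Hypothetical divided session with three queries.
start = datetime(2017, 7, 14, 10, 0)
session = [start,
           start + timedelta(minutes=10),
           start + timedelta(minutes=12, seconds=30)]
elapsed = elapsed_since_session_start(session)
```

Because the clock restarts at each session boundary, the value depends directly on where the session division unit 400 places the boundaries.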

The server device 10 of the embodiment has been described above, but the embodiment is not limited to the above example. In the present application, "based on XX" means "based on at least XX" and includes cases based on another element in addition to XX. Furthermore, "based on XX" is not limited to cases where XX is used directly; it also includes cases based on the result of performing a calculation or processing on XX. "XX" is an arbitrary element.

Although modes for carrying out the present invention have been described above using an embodiment, the present invention is in no way limited to that embodiment, and various modifications and substitutions can be made without departing from the gist of the present invention.

10 ... server device (information processing system), 300 ... similarity analysis unit, 310 ... category path collection unit, 320 ... category tree generation unit, 330 ... similarity derivation unit, 400 ... session division unit, 600 ... CV prediction unit.

Claims

An information processing system comprising:
a collection unit that collects, for each user group that has input the same query, the category paths assigned to each of a plurality of products or services selected on a sales site by that user group;
a generation unit that generates a category tree for each query by integrating identical categories included in the plurality of category paths collected for each user group by the collection unit; and
a derivation unit that derives the similarity between a plurality of queries based on the similarity between the category trees generated by the generation unit for each of the plurality of queries.

The information processing system according to claim 1,
wherein the derivation unit derives the similarity between the plurality of queries based on the degree of matching between the categories included in the category trees corresponding to the respective queries.

The information processing system according to claim 1 or 2,
wherein the derivation unit determines, in order from the top-level category, whether the categories included in the category trees corresponding to the respective queries match, and derives the similarity between the plurality of queries based on how many categories match consecutively starting from the top-level category.

The information processing system according to any one of claims 1 to 3,
wherein the generation unit assigns weights to the categories integrated in the process of generating the category tree, and
the derivation unit derives the similarity between the plurality of queries based on the degree of matching between the categories included in the category trees corresponding to the respective queries and on the weights assigned to the matching categories.

The information processing system according to claim 4,
wherein the generation unit derives the weight to be assigned to a category according to the number of identical categories integrated into it in the process of generating the category tree.

The information processing system according to claim 4 or 5,
wherein the generation unit assigns a score to each category according to the number of identical categories integrated into it in the process of generating the category tree, and derives the weights by normalizing, per category tree, the scores assigned to all categories included in that category tree.

The information processing system according to claim 6,
wherein the derivation unit determines, in order from the top-level category, whether the categories included in the category trees corresponding to the respective queries match, and derives the similarity between the plurality of queries by adding up the weight values assigned to the matching categories.

The information processing system according to any one of claims 1 to 7,
further comprising a session division unit that, for a plurality of queries input by a user, divides a session between the times at which the plurality of queries were received, based on the similarity between the plurality of queries derived by the derivation unit.

An information processing method in which a computer:
collects, for each user group that has input the same query, the category paths assigned to each of a plurality of products or services selected on a sales site by that user group;
generates a category tree for each query by integrating identical categories included in the plurality of category paths collected for each user group; and
derives the similarity between a plurality of queries based on the similarity between the category trees generated for each of the plurality of queries.

A program that causes a computer to:
collect, for each user group that has input the same query, the category paths assigned to each of a plurality of products or services selected on a sales site by that user group;
generate a category tree for each query by integrating identical categories included in the plurality of category paths collected for each user group; and
derive the similarity between a plurality of queries based on the similarity between the category trees generated for each of the plurality of queries.