JP7807364B2

JP7807364B2 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7807364B2
Application number: JP2022198796A
Authority: JP
Inventors: 智山内; 開大福田
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2026-01-27
Anticipated expiration: 2042-12-13
Also published as: JP2024084495A

Description

The present application relates to an information processing device, an information processing method, and an information processing program.

Techniques for carrying out interactive communication in cooperation with a chatbot have been proposed. For example, Patent Document 1 proposes a technique for carrying out interactive communication through a digital board linked with a chatbot.

Patent Document 1: Re-publication of PCT International Publication No. 2020-240838

However, the conventional technique described above leaves room for improvement in efficiently eliciting information from service users of an online service through chatbot communication. For example, the conventional technique aims to keep users from abandoning a chat, not to efficiently obtain reviews from service users, and in that respect considerable room for improvement remains.

The present application has been made in view of the above, and an object thereof is to provide an information processing device, an information processing method, and an information processing program capable of efficiently collecting information from service users of an online service.

The information processing device according to the present application is an information processing device that controls processing related to a dialogue conducted through a chatbot with a service user of an online service, and includes a selection unit and an instruction unit. The selection unit selects a conversation sequence according to the state of the service user from among a plurality of conversation sequences, each of which predefines a conversation pattern indicating the content of a series of conversations expected in the dialogue. The instruction unit indicates the conversation sequence selected by the selection unit to an external device that executes the processing related to the dialogue.

According to one aspect of the embodiment, information can be collected efficiently from service users of an online service.

FIG. 1 is a diagram showing an overview of information processing according to the embodiment.
FIG. 2 is a diagram showing an overview of conversation sequences according to the embodiment.
FIG. 3 is a diagram showing an example of a conversation sequence instruction from the second server to the first server according to the embodiment.
FIG. 4 is a diagram schematically showing an overview of reinforcement learning according to the embodiment.
FIG. 5 is a block diagram showing a configuration example of the second server according to the embodiment.
FIG. 6 is a diagram showing an overview of conversation sequence information according to the embodiment.
FIG. 7 is a diagram showing an overview of information related to a selection model according to the embodiment.
FIG. 8 is a diagram showing an overview of user information according to the embodiment.
FIG. 9 is a flowchart showing an example of a processing procedure executed by the second server according to the embodiment.
FIG. 10 is a hardware configuration diagram showing an example of a computer that realizes the functions of the second server according to the embodiment or each modification.

Modes for carrying out the information processing device, the information processing method, and the information processing program according to the present application (hereinafter referred to as "embodiments") are described in detail below with reference to the drawings. The information processing device, the information processing method, and the information processing program according to the present application are not limited by these embodiments. The embodiments can be combined as appropriate to the extent that the processing content does not become contradictory. In the following embodiments, the same parts are given the same reference numerals, and redundant explanations are omitted.

(Embodiment)
1. System Configuration According to the Embodiment
First, the configuration of an information processing system SYS that includes a second server 200, which is an example of the information processing device according to the embodiment, will be described with reference to FIG. 1. FIG. 1 shows a configuration example of the information processing system SYS according to the embodiment. As shown in FIG. 1, the information processing system SYS according to the embodiment includes a user terminal 10, a first server 100, and a second server 200.

The user terminal 10, the first server 100, and the second server 200 are connected to a network such as the Internet (for example, the network N shown in FIG. 5). The user terminal 10 and the first server 100 can communicate with each other through the network. The first server 100 and the second server 200 can communicate with each other through the network. The user terminal 10 and the second server 200 may also communicate with each other through the network. The information processing system SYS shown in FIG. 1 may include more user terminals 10 than in the example shown in FIG. 1.

The user terminal 10 is an information processing terminal used by a service user U, who is a user of the various online services that the administrator of the first server 100 operates as a platform provider. For example, the user terminal 10 can be realized by a smartphone, a desktop PC (Personal Computer), a notebook PC, a tablet terminal, a mobile phone, a PDA (Personal Digital Assistant), or the like.

The user terminal 10 also has communication functions for wireless communication networks such as LTE (Long Term Evolution), 4G (4th Generation: fourth generation mobile communication system), and 5G (5th Generation: fifth generation mobile communication system), and for short-range wireless communication such as Bluetooth (registered trademark) and wireless LAN (Local Area Network), and can connect to the network using these communication functions.

The user terminal 10 can also display, for example, the web content of the various online services provided by the first server 100 using a web browser or an application. When the user terminal 10 receives control information for realizing information display processing from the first server 100 or the like, it performs the display processing in accordance with the control information.

The service user U can operate the user terminal 10 to browse the websites of the various online services displayed by a web browser and to use the web content displayed by the web browser. The service user U can also download a dedicated application program for using the websites of the various online services (hereinafter referred to as a "dedicated app") from the first server 100 and install it on the user terminal 10. In this case, the service user U can use the content of the various online services configured for the dedicated app by operating the dedicated app.

The first server 100 is an information processing device that provides the various online services to each service user. The first server 100 is typically a server device, but may also be realized by a mainframe, a workstation, or the like. When the first server 100 is realized by a server device, it may be realized by a single server device, or by a cloud system or the like in which multiple server devices and multiple storage devices operate in cooperation with each other.

The various online services provided by the first server 100 may include Internet connection, search services, SNS (Social Networking Service), e-commerce services, electronic payment services, online games, online banking services, online trading services, accommodation reservation services, ticket reservation services, video distribution services, music distribution services, news distribution services, map information services, route search services, route guidance services, line information services, operation information services, and weather information services. The various online services may also include API (Application Programming Interface) services corresponding to various applications.

In providing the various online services, the first server 100 creates a user account that includes a user ID, which is user identification information for identifying each service user (for example, the service user U). The user ID included in the user account is either set freely by the service user (for example, the service user U) when registering to use the various online services, or assigned individually by the first server 100. The first server 100 records a service usage history (an example of "user information"), which is the history of use of the online services, in association with the user account of each service user (for example, the service user U), and manages the recorded service usage history for each service user. The first server 100 can also distribute dedicated apps for using the various online services in response to a request from a service user (for example, the service user U).

The first server 100 also executes processing related to dialogues with service users (for example, the service user U) of the various online services through a chatbot. The processing of the first server 100 will be described later.

The second server 200 is an information processing device that controls processing related to dialogues conducted through the chatbot between service users of the various online services (for example, the service user U) and the first server 100. The second server 200 is typically a server device, but may also be realized by a mainframe, a workstation, or the like. When the second server 200 is realized by a server device, it may be realized by a single server device, or by a cloud system or the like in which multiple server devices and multiple storage devices operate in cooperation with each other. The second server 200 will be described later.

2. Overview of Information Processing According to the Embodiment
An overview of information processing according to the embodiment will be described below with reference to FIGS. 1 to 4. In the following description, the user terminal 10 may be referred to as the service user U. In other words, the service user U can be read as the user terminal 10.

In the following description, when there is no particular need to distinguish between the conversation sequence SQ1-1, the conversation sequence SQ2-1, and so on, they are collectively referred to as the "conversation sequence SQ."

FIG. 1 shows an overview of information processing according to the embodiment. As shown in FIG. 1, the first server 100 provides service content CT of the online service that the service user U is accessing, and executes processing related to the dialogue with the service user U through the dialogue screen of the chatbot CB displayed together with the service content CT. When providing the service content CT, the first server 100 can acquire attribute information indicating the attributes of the service user U. The first server 100 can also acquire the conversation history of the dialogue between the chatbot CB and the service user U, and information on the reactions of the service user U.

First, when executing the processing related to the dialogue with the service user U described above, the first server 100 sends an inquiry for the first conversation sequence SQ to the second server 200 (step S01). FIG. 2 shows an overview of conversation sequences according to the embodiment.

The conversation sequence SQ according to the embodiment is information that predefines a conversation pattern indicating the content of a series of conversations expected in a dialogue between the chatbot CB and a service user. The administrators of the first server 100 and the second server 200 identify multiple conversation patterns by extracting, as highly necessary (essential) units, the content (exchanges of utterances and responses) of the series of conversations conducted with a service user (for example, the service user U shown in FIG. 1). The administrators then set each of the identified conversation patterns as a conversation sequence SQ.

For example, the conversation sequence SQ1-1 shown in FIG. 2 consists of an utterance X1-1, an utterance X1-2, an utterance X1-3, and an utterance (question) Q1-1 as the content of a series of conversations uttered in chronological order. The utterance (question) Q1-1 is a question posed by the chatbot CB to the service user U. For example, the utterance (question) Q1-1 is associated with answer options from which the service user U selects an answer to the question posed by the chatbot CB. When the utterance (question) Q1-1 is displayed in the chatbot CB, the answer options are displayed together with it.

The conversation sequence SQ2-1 shown in FIG. 2 consists of an utterance X2-1, an utterance (question) Q2-1, an utterance X2-2, and an utterance (question) Q2-2 as the content of a series of conversations uttered in chronological order. The utterance (question) Q2-1 and the utterance (question) Q2-2 have the same properties as the utterance (question) Q1-1 described above.
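As an illustration only, the two conversation sequences of FIG. 2 could be represented as plain data along the following lines. The class names, field names, and answer-option strings are hypothetical and do not appear in the embodiment; the sequence numbers "SN101" and "SN201" are the ones used in the FIG. 3 example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Utterance:
    """One chatbot utterance; a question additionally carries answer options."""
    utterance_id: str
    answer_options: Tuple[str, ...] = ()  # empty for a plain utterance

    @property
    def is_question(self) -> bool:
        return len(self.answer_options) > 0


@dataclass(frozen=True)
class ConversationSequence:
    """A predefined conversation pattern: utterances in chronological order."""
    sequence_number: str
    utterances: Tuple[Utterance, ...]

    def questions(self) -> List[Utterance]:
        return [u for u in self.utterances if u.is_question]


# Placeholder answer options; the source does not give concrete wording.
OPTIONS = ("option A", "option B")

SQ1_1 = ConversationSequence(
    "SN101",
    (Utterance("X1-1"), Utterance("X1-2"), Utterance("X1-3"),
     Utterance("Q1-1", OPTIONS)),
)
SQ2_1 = ConversationSequence(
    "SN201",
    (Utterance("X2-1"), Utterance("Q2-1", OPTIONS),
     Utterance("X2-2"), Utterance("Q2-2", OPTIONS)),
)
```

Keeping the answer options on the question itself mirrors the description that the options are displayed together with the question.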

When the second server 200 receives the inquiry for the first conversation sequence from the first server 100, it selects the first conversation sequence SQ from among the multiple predefined conversation sequences SQ (step S02). The second server 200 then transmits an instruction for the selected first conversation sequence SQ to the first server 100 (step S03).

For example, the second server 200 may preset a first conversation sequence that is common to the various online services. Alternatively, the second server 200 may preset, for each online service, a first conversation sequence corresponding to that online service. In this case, the second server 200 acquires from the first server 100, together with the inquiry for the first conversation sequence SQ, service information for identifying, for example, the online service being used by the service user U who will be the counterpart in the dialogue with the chatbot CB. The second server 200 then selects the conversation sequence SQ associated in advance with the acquired service information as the first conversation sequence SQ.

The second server 200 may also preset a first conversation sequence SQ that corresponds to the attributes (such as demographic attributes or psychographic attributes) of the service user who will be the counterpart in the dialogue with the chatbot CB. In this case, the second server 200 acquires from the first server 100, together with the inquiry for the first conversation sequence SQ, attribute information indicating, for example, the attributes of the service user U who will be the counterpart in the dialogue with the chatbot CB. The second server 200 then selects the conversation sequence SQ associated with the acquired attribute information as the first conversation sequence SQ.
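The rule-based choice of the first conversation sequence described above can be sketched as a simple table lookup. This is a minimal sketch under assumptions: the embodiment presents the common, per-service, and per-attribute settings as alternatives, so the priority order used here (service, then attribute, then common default) is an assumption, and the service names, attribute keys, and the sequence numbers "SN110" and "SN120" are hypothetical.

```python
from typing import Mapping, Optional

# Common first sequence; "SN101" is the number used in the FIG. 3 example.
DEFAULT_INITIAL = "SN101"
# Hypothetical per-service and per-attribute settings.
BY_SERVICE = {"hotel_reservation": "SN110"}
BY_ATTRIBUTE = {("age_group", "20s"): "SN120"}


def select_initial_sequence(
    service_id: Optional[str] = None,
    attributes: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the sequence number of the first conversation sequence."""
    # Assumed priority: service-specific setting first.
    if service_id is not None and service_id in BY_SERVICE:
        return BY_SERVICE[service_id]
    # Then any attribute-specific setting.
    if attributes:
        for key, value in attributes.items():
            sequence = BY_ATTRIBUTE.get((key, value))
            if sequence is not None:
                return sequence
    # Otherwise the sequence common to all services.
    return DEFAULT_INITIAL
```

For example, `select_initial_sequence(service_id="hotel_reservation")` returns the service-specific entry, while a call with no matching service or attribute falls back to the common sequence.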

The second server 200 may also select the last conversation sequence SQ used in the dialogue between the chatbot CB and the service user U according to a predetermined rule.

When the first server 100 receives the instruction for the first conversation sequence SQ from the second server 200, it executes processing related to the dialogue with the service user U through the chatbot CB in accordance with the received first conversation sequence SQ (step S04). In the example of the dialogue screen of the chatbot CB shown in FIG. 1, information D-1 to D-3 corresponding to the utterances included in the conversation sequence SQ is displayed on the user terminal 10, based on the information transmitted from the first server 100, from top to bottom in the order set in the conversation sequence SQ.

When the first server 100 completes the dialogue based on the first conversation sequence SQ, it sends an inquiry for the next conversation sequence SQ to the second server 200 (step S05). At this time, the first server 100 also sends to the second server 200 information for identifying the immediately preceding conversation sequence SQ and information on the reaction of the service user U in the dialogue with the chatbot CB.

When the second server 200 receives the inquiry for the next conversation sequence SQ from the first server 100, it uses a selection model, which selects the conversation sequence SQ to be used in the dialogue through the chatbot CB, to select the next conversation sequence SQ according to the state of the service user U from among the multiple predefined conversation sequences SQ (step S06). The second server 200 then transmits an instruction for the selected next conversation sequence SQ to the first server 100 (step S07).

An example of a conversation sequence SQ instruction from the second server 200 to the first server 100 is described concretely below with reference to FIG. 3. FIG. 3 shows an example of a conversation sequence SQ instruction from the second server 200 to the first server 100 according to the embodiment.

As shown in FIG. 3, the first server 100 sends an inquiry for the first conversation sequence SQ to the second server 200 (step S11). At this time, the first server 100 sends attribute information (attribute UA) indicating the attributes of the service user U together with the inquiry for the first conversation sequence SQ.

When the second server 200 receives the inquiry for the first conversation sequence SQ from the first server 100, it selects the first conversation sequence SQ1-1 and transmits the sequence number "SN101" of the selected first conversation sequence SQ1-1 to the first server 100 (step S12). The second server 200 also retains the attribute information indicating the attributes of the service user U received from the first server 100.

When the first server 100 completes the dialogue with the service user U using the conversation sequence SQ corresponding to the sequence number "SN101" received from the second server 200, it sends an inquiry for the next conversation sequence SQ to the second server 200 (step S13). At this time, the first server 100 sends, together with the inquiry for the next conversation sequence SQ, the sequence number "SN101" of the immediately preceding conversation sequence SQ1-1 and the information "answer R-1" indicating the answer of the service user U in the dialogue with the chatbot CB.

When the second server 200 receives the inquiry for the next conversation sequence SQ from the first server 100, it uses the selection model to select the next conversation sequence SQ2-1 according to the state of the service user U, and transmits the sequence number "SN201" of the selected next conversation sequence SQ2-1 to the first server 100 (step S14). For example, the second server 200 inputs into the selection model, as information indicating the state of the service user U, the attribute information indicating the attributes of the service user U (an example of "user information"), the sequence number "SN101" of the immediately preceding conversation sequence (an example of the "conversation history"), and the information "answer R-1" indicating the answer of the service user U in the dialogue with the chatbot CB (an example of the "reaction of the service user"), and transmits the sequence number "SN201" output from the selection model to the first server 100.
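The exchange in steps S11 to S14 can be sketched from the second server's side as two handlers. This is a minimal sketch under assumptions: the class, method, and field names are hypothetical, the user ID key is an assumed way of retaining attribute information per service user, and the selection model is replaced by a stub function because the embodiment does not define its internals at this point.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


@dataclass(frozen=True)
class UserState:
    """Inputs to the selection model named in the FIG. 3 example."""
    attributes: Mapping[str, str]  # user information
    previous_sequence: str         # conversation history
    answer: str                    # reaction of the service user


SelectionModel = Callable[[UserState], str]


class SecondServer:
    def __init__(self, selection_model: SelectionModel,
                 initial_sequence: str = "SN101") -> None:
        self._selection_model = selection_model
        self._initial_sequence = initial_sequence
        self._attributes: Dict[str, Mapping[str, str]] = {}

    def on_initial_inquiry(self, user_id: str,
                           attributes: Mapping[str, str]) -> str:
        """Steps S11 to S12: retain the attributes, return the first sequence."""
        self._attributes[user_id] = dict(attributes)
        return self._initial_sequence

    def on_next_inquiry(self, user_id: str, previous_sequence: str,
                        answer: str) -> str:
        """Steps S13 to S14: build the user state and ask the selection model."""
        state = UserState(self._attributes.get(user_id, {}),
                          previous_sequence, answer)
        return self._selection_model(state)


def stub_model(state: UserState) -> str:
    # Stand-in for the learned selection model (assumption).
    return "SN201" if state.previous_sequence == "SN101" else "SN101"


server = SecondServer(stub_model)
first = server.on_initial_inquiry("U", {"attribute": "UA"})
second = server.on_next_inquiry("U", first, "R-1")
```

Only sequence numbers cross the boundary between the two servers in this sketch, which matches the description that the second server transmits a sequence number rather than the conversation content itself.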

The attribute information indicating the attributes of the service user U is not limited to static information such as demographic attributes and psychographic attributes, and may also include dynamic information such as location information and biometric information. In this case, each time the second server 200 receives an inquiry for the next conversation sequence SQ from the first server 100, it acquires dynamic information such as the location information and biometric information of the service user U, and can select the next conversation sequence SQ based on the state of the service user U as updated with the acquired dynamic information. The second server 200 can also acquire the service usage history of the service user U (such as purchase history and reservation history) from the first server 100 as information indicating the state of the service user, and use the acquired service usage history as input information when selecting a conversation sequence.
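One way to picture the inputs listed above is a record that is rebuilt on every inquiry, with the dynamic information fetched fresh each time. The following is a sketch under assumptions: all names are hypothetical, the fetch function stands in for whatever source of location or biometric information is actually used, and the answer label "R-2" is made up for the second call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Mapping


@dataclass(frozen=True)
class SelectionInput:
    static_attributes: Mapping[str, str]   # e.g. demographic attributes
    dynamic_attributes: Mapping[str, str]  # e.g. location, refreshed per inquiry
    usage_history: List[str]               # e.g. purchase or reservation history
    previous_sequence: str
    answer: str


def build_selection_input(
    static_attributes: Mapping[str, str],
    fetch_dynamic: Callable[[], Mapping[str, str]],
    usage_history: Iterable[str],
    previous_sequence: str,
    answer: str,
) -> SelectionInput:
    """Assemble the model input, fetching dynamic information at call time."""
    return SelectionInput(
        static_attributes=dict(static_attributes),
        dynamic_attributes=dict(fetch_dynamic()),
        usage_history=list(usage_history),
        previous_sequence=previous_sequence,
        answer=answer,
    )


# Hypothetical dynamic source that reports a different location on each call.
_locations = iter(["Tokyo", "Osaka"])


def fake_fetch_dynamic() -> Dict[str, str]:
    return {"location": next(_locations)}


first_input = build_selection_input(
    {"age_group": "30s"}, fake_fetch_dynamic, ["reservation"], "SN101", "R-1")
second_input = build_selection_input(
    {"age_group": "30s"}, fake_fetch_dynamic, ["reservation"], "SN201", "R-2")
```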

Returning to FIG. 1, the second server 200 sets a reward based on the reaction of the service user U in the dialogue conducted through the chatbot CB, and executes reinforcement learning of the selection model in units of the conversation sequences used in the dialogue (step S08). FIG. 4 schematically shows an overview of reinforcement learning according to the embodiment.

As shown in FIG. 4, in the reinforcement learning according to the embodiment, the selection model can be regarded as the reinforcement learning agent, and the dialogue between the chatbot CB and the service user U can be regarded as the reinforcement learning environment. In this case, reinforcement learning proceeds as follows. First, the selection model selects a conversation sequence SQ according to the state of the service user U, following a policy that it expects to produce the desired result. Here, the state of the service user U includes the attribute information indicating the attributes of the service user U, the conversation sequences used in the dialogue between the chatbot CB and the service user U (the conversation history), and the answers of the service user U in the dialogue with the chatbot CB (the reaction of the service user U). The conversation sequence SQ selected by the selection model is transmitted from the second server 200 to the first server 100.

Next, when the dialogue based on the conversation sequence SQ is completed at the first server 100, the conversation history is sent from the first server 100 to the second server 200, and the state of the service user U after the dialogue based on the conversation sequence SQ (the conversation sequence SQ used in the dialogue and the answer of the service user U in the dialogue) is fed back to the selection model. At the same time, a reward based on the reaction of the service user U is fed back to the selection model. The selection model then revises its policy based on the state of the service user U after the dialogue based on the conversation sequence SQ and on the reward based on the reaction of the service user U.

In other words, the second server 200 acquires from the first server 100 information on the answer (reaction) of the service user U in the dialogue with the chatbot CB as the change that the action of selecting the conversation sequence SQ has caused in the environment, namely the dialogue between the chatbot CB and the service user U. The second server 200 then sets, for the conversation sequence SQ selected by the selection model, a reward based on the reaction of the service user U (the answer in the dialogue) as an evaluation of that change.

In this way, the second server 200 can, for example, set a reward based on the reaction of the service user U in the immediately preceding conversation sequence SQ and execute reinforcement learning in units of conversation sequences. This learning optimizes the selection of the conversation sequence SQ by the selection model so as to maximize the reward obtained from the dialogue between the chatbot CB and the service user U.
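The agent and environment roles described above can be sketched as a single interaction step. This is only an illustrative sketch: the names `UserState` and `run_step`, and the representation of a reaction as a short string, are assumptions made for the example and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class UserState:
    """State of the service user as seen by the selection model (hypothetical layout)."""
    attributes: Tuple[str, ...]  # attribute information of the service user
    last_sequence: int           # sequence number used in the latest dialogue (0 = none yet)
    last_reaction: str           # reaction in the latest dialogue, e.g. "favorable"


def run_step(
    policy: Callable[[UserState], int],
    dialogue: Callable[[UserState, int], Tuple[str, int]],
    state: UserState,
) -> Tuple[int, UserState, int]:
    """One cycle: the agent selects a sequence, the environment returns reaction and reward."""
    sequence_number = policy(state)                       # action chosen by the selection model
    reaction, reward = dialogue(state, sequence_number)   # dialogue acts as the environment
    next_state = UserState(state.attributes, sequence_number, reaction)
    return sequence_number, next_state, reward
```

In this sketch `policy` stands in for the selection model on the second server 200, and `dialogue` stands in for the dialogue that the first server 100 conducts through the chatbot CB.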

During reinforcement learning, the reward that the second server 200 assigns to a conversation sequence SQ is set according to at least the content and the outcome of the conversation conducted using the immediately preceding conversation sequence SQ.

For example, the second server 200 may set the reward based on whether the reaction of the service user U was favorable in the dialogue using the immediately preceding (previous) conversation sequence SQ. Specifically, if the answer obtained from the service user U in that dialogue was a favorable answer, the second server 200 gives a positive reward (for example, +1) to the immediately preceding conversation sequence SQ. If the answer obtained from the service user U in that dialogue was not a favorable answer, the second server 200 gives a negative reward (for example, -1) to the immediately preceding conversation sequence SQ.

When setting the reward for the immediately preceding conversation sequence SQ, the second server 200 may also take into account how the reaction of the service user U has changed over past dialogues. For example, if the reaction of the service user U was favorable in both the dialogue before last and the immediately preceding (previous) dialogue, the second server 200 may give a positive reward (for example, +2) to the immediately preceding conversation sequence SQ. If the reaction was favorable in the dialogue before last but not favorable in the immediately preceding dialogue, the second server 200 may give no reward to the immediately preceding conversation sequence SQ. If the reaction was not favorable in either the dialogue before last or the immediately preceding dialogue, the second server 200 may give a negative reward (for example, -2) to the immediately preceding conversation sequence SQ. In this way, the second server 200 can be expected to optimize the selection model so that conversation sequences SQ are selected in accordance with changes in the reaction of the service user U.

As another example, the second server 200 may set the reward based on whether a predetermined conversion associated with the immediately preceding (previous) conversation sequence SQ was obtained from the service user U. Specifically, if the predetermined information associated with the immediately preceding conversation sequence SQ was obtained from the service user U (for example, information on the product that the service user U is looking for), the second server 200 gives a positive reward (for example, +1) to the immediately preceding conversation sequence SQ. If that predetermined information could not be obtained from the service user U, the second server 200 gives a negative reward (for example, -1) to the immediately preceding conversation sequence SQ.
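The reward rules given as examples above can be written down as small functions. This is a sketch only: the function names are hypothetical, and the value returned for the case "not favorable before last, favorable in the previous dialogue" is an assumption, because the description does not state a reward for that case.

```python
def reaction_reward(favorable: bool) -> int:
    """+1 if the previous dialogue drew a favorable answer, otherwise -1."""
    return 1 if favorable else -1


def trend_reward(before_last_favorable: bool, last_favorable: bool) -> int:
    """Reward that also considers the dialogue before last."""
    if before_last_favorable and last_favorable:
        return 2    # favorable twice in a row
    if before_last_favorable and not last_favorable:
        return 0    # reaction worsened: no reward
    if not before_last_favorable and not last_favorable:
        return -2   # not favorable twice in a row
    # Not favorable -> favorable is not specified in the text.
    # Assumption: fall back to the single-dialogue rule.
    return reaction_reward(last_favorable)


def conversion_reward(conversion_obtained: bool) -> int:
    """+1 if the predetermined conversion was obtained, otherwise -1."""
    return 1 if conversion_obtained else -1
```

The numeric values mirror the examples in the description (+1, -1, +2, -2); an actual system could scale or combine them differently.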

In this way, the second server 200 sets a reward based on the reaction of the service user U in the immediately preceding conversation sequence SQ and executes learning so that the selection of the conversation sequence SQ by the selection model is optimized. The second server 200 can also execute reinforcement learning of the selection model separately for each attribute of the service user U. By using the selection model, the second server 200 can thus increase the likelihood that a conversation sequence SQ yielding a desirable result is selected in accordance with the attributes and state of the service user U.

The second server 200 can execute reinforcement learning of the selection model using any method. As a value-based method, it may use Q-learning, SARSA, or the like; as a policy-based method, it may use a policy gradient method or the like.
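As one possible illustration of the value-based option, the following sketch applies tabular Q-learning with epsilon-greedy selection, treating each sequence number as an action. The class name `QSelector`, the hyperparameter values, and the use of a plain hashable value as the state are assumptions for the example; the description does not prescribe any particular implementation.

```python
import random
from collections import defaultdict


class QSelector:
    """Tabular Q-learning over conversation sequence numbers (illustrative only)."""

    def __init__(self, sequence_numbers, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(sequence_numbers)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration probability
        self.q = defaultdict(float)  # (state, sequence number) -> estimated value
        self.rng = random.Random(seed)

    def select(self, state):
        """Epsilon-greedy choice of the next conversation sequence."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update after one conversation sequence finishes."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Because one update is made per completed conversation sequence rather than per utterance, the table stays small, which is in line with the sequence-level learning described in this embodiment.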

Returning to FIG. 1, upon receiving the instruction for the next conversation sequence SQ from the second server 200, the first server 100 executes processing related to the dialogue with the service user U through the chatbot CB in accordance with the received conversation sequence SQ (step S09).

3. Configuration of the second server according to the embodiment
A configuration example of the second server 200 according to the embodiment is described with reference to FIG. 5, which shows that configuration example. As shown in FIG. 5, the second server 200 includes a communication unit 210, a storage unit 220, and a control unit 230.

(Regarding the communication unit 210)
The communication unit 210 is implemented by, for example, a NIC (Network Interface Card). The communication unit 210 is connected to the network N by wire or wirelessly. Via the network N, the second server 200 exchanges information with other devices such as the user terminal 10 and the first server 100.

(Regarding the storage unit 220)
The storage unit 220 is implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. For example, the storage unit 220 includes a conversation sequence storage unit 221, a selection model storage unit 222, and a user information storage unit 223.

(Conversation sequence storage unit 221)
The conversation sequence storage unit 221 stores information on the conversation sequences used in the dialogue with a service user (for example, the service user U shown in FIG. 1) through the chatbot CB. FIG. 6 is a diagram showing an overview of the conversation sequence information according to the embodiment.

As shown in FIG. 6, the conversation sequence information stored in the conversation sequence storage unit 221 has multiple items, such as a "sequence number" item, a "conversation pattern" item, an "answer-receiving content" item, and a "supported services" item. These items are associated with one another.

The "sequence number" item stores an identification number assigned individually to each conversation sequence in order to identify it. The "conversation pattern" item stores information on the conversation pattern included in the conversation sequence. The "answer-receiving content" item stores the content for receiving answers that is displayed in association with an utterance (question) included in the conversation pattern. If the answer-receiving content includes multiple answers for receiving an evaluation from the service user, each answer is associated in advance with an attribute value indicating whether it is a favorable answer. The "supported services" item stores information indicating the online services to which the conversation sequence applies.
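The items described above could be held in a record such as the following. This is a sketch under assumptions: the class name, field names, and types are hypothetical, and FIG. 6 may organize the information differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConversationSequence:
    """One entry of the conversation sequence information (hypothetical layout)."""
    sequence_number: int                 # identifies the conversation sequence
    conversation_pattern: List[str]      # utterances of the chatbot, in order
    answer_options: Dict[str, bool]      # answer label -> whether it is a favorable answer
    supported_services: List[str] = field(default_factory=list)

    def is_favorable(self, answer: str) -> bool:
        """Look up the favorable/unfavorable attribute value pre-assigned to an answer."""
        return self.answer_options[answer]
```

Storing the favorable flag next to each answer option means the reward can be derived from the selected answer alone, without interpreting free text.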

(Selection model storage unit 222)
The selection model storage unit 222 stores information on the selection models used when selecting a conversation sequence. FIG. 7 is a diagram showing an overview of the information on the selection models according to the embodiment.

As shown in FIG. 7, the information on the selection models stored in the selection model storage unit 222 according to the embodiment has multiple items, such as a "model ID" item, a "corresponding attribute" item, and a "model information" item. These items are associated with one another.

The "model ID" item stores identification information for identifying the selection model. The "corresponding attribute" item stores information indicating the attribute of the service users (for example, the service user U shown in FIG. 1) to which the selection model corresponds. The "model information" item stores the information that constitutes the selection model, such as information on its policy and its various parameters.

(User information storage unit 223)
The user information storage unit 223 stores information on service users who use the various online services (for example, the service user U shown in FIG. 1). FIG. 8 is a diagram showing an overview of the user information according to the embodiment.

As shown in FIG. 8, the user information stored in the user information storage unit 223 according to the embodiment has multiple items, such as a "user ID" item, an "attribute information" item, and a "dialogue history" item. These items are associated with one another.

The "user ID" item stores identification information for identifying a service user of the various online services (for example, the service user U shown in FIG. 1). The "attribute information" item stores information on the attributes of the service user, such as demographic attributes, psychographic attributes, location information, and biometric information. The "dialogue history" item stores the dialogue history, including the conversation sequences selected in the dialogue with the chatbot CB.

The user information storage unit 223 may also store the service usage history of the service user as part of the user information. For example, the second server 200 (control unit 230) can acquire the service usage history of the service user from the first server 100, associate it with the identification information (user ID) of the service user, and register it in the user information storage unit 223.

(Regarding the control unit 230)
The control unit 230 is implemented by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs stored in a storage device inside the second server 200, using the RAM as a work area. The control unit 230 can also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The control unit 230 shown in FIG. 5 includes a selection unit 231, an instruction unit 232, and a learning unit 233, and these units realize or execute the information processing functions and operations described below. The control unit 230 may have an internal configuration that is divided into multiple processing units, each realizing or executing part of those functions and operations. The control unit 230 is not limited to the configuration shown in FIG. 5 and may have any other configuration that performs the information processing described below. Functional units other than those shown in FIG. 5 may also be added to the control unit 230 as processes executed by the second server 200 are added.

(Selection unit 231)
The selection unit 231 selects a conversation sequence SQ corresponding to the state of a service user of an online service (for example, the service user U shown in FIG. 1) from among a plurality of conversation sequences SQ. Each conversation sequence SQ predefines a conversation pattern indicating the content of a series of conversations expected in the dialogue conducted with the service user through the chatbot. For example, the selection unit 231 selects a conversation sequence when it acquires an inquiry for a conversation sequence SQ from the first server 100 via the communication unit 210.

The selection unit 231 selects the conversation sequence from among the plurality of conversation sequences SQ stored in the conversation sequence storage unit 221 by using a selection model (for example, the selection model shown in FIG. 2). The selection model selects the conversation sequence SQ to be used in the dialogue on the basis of rewards that are set according to the reaction of the service user (for example, the service user U shown in FIG. 1) in the dialogue conducted through the chatbot. For example, the selection unit 231 acquires the selection model associated with the attribute information, which indicates the attributes of the service user, included in the inquiry for a conversation sequence SQ acquired from the first server 100. The selection unit 231 then inputs the most recent conversation sequence and the reaction of the service user U into the acquired selection model, and selects, from among the conversation sequences stored in the conversation sequence storage unit 221, the conversation sequence linked to the sequence number that the selection model outputs.
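The lookup flow described for the selection unit 231 can be sketched as follows. The inquiry keys (`attribute`, `last_sequence`, `reaction`), the function name, and the treatment of a selection model as a plain callable are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Assumption: a selection model maps (latest sequence number, latest reaction)
# to the sequence number of the next conversation sequence.
SelectionModel = Callable[[int, str], int]


def select_sequence(
    inquiry: dict,
    models: Dict[str, SelectionModel],
    sequences: Dict[int, str],
) -> Tuple[int, str]:
    """Pick the model for the user's attribute, then resolve its output to a stored sequence."""
    model = models[inquiry["attribute"]]  # model associated with the attribute information
    number = model(inquiry["last_sequence"], inquiry["reaction"])
    return number, sequences[number]      # sequence linked to the output sequence number
```

Here `models` plays the role of the selection model storage unit 222 and `sequences` that of the conversation sequence storage unit 221.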

(Instruction unit 232)
The instruction unit 232 instructs the first server 100 (an example of an "external device"), which executes processing related to the dialogue, to use the conversation sequence SQ selected by the selection unit 231. For example, the instruction unit 232 transmits the sequence number of the conversation sequence to the first server 100 via the communication unit 210.

(Learning unit 233)
The learning unit 233 performs reinforcement learning of the selection model, which selects the conversation sequence SQ to be used in the dialogue, by setting a reward based on the reaction of the service user (for example, the service user U shown in FIG. 1) in the dialogue conducted through the chatbot CB.

For example, the learning unit 233 performs reinforcement learning of the selection model by setting the reward based on at least whether the reaction of the service user was favorable in the dialogue using the immediately preceding conversation sequence SQ.

As another example, the learning unit 233 performs reinforcement learning of the selection model by setting the reward based on whether a predetermined conversion associated with the immediately preceding conversation sequence was obtained from the service user.

4. Processing procedure according to the embodiment
The procedure of the information processing executed by the second server 200 according to the embodiment is described below. FIG. 9 shows an example of the processing procedure executed by the second server 200 according to the embodiment. The processing procedure shown in FIG. 9 is executed by the control unit 230 of the second server 200 and is repeated while the second server 200 is operating.

As shown in FIG. 9, the selection unit 231 acquires an inquiry for a conversation sequence SQ from the first server 100 (step S101).

The selection unit 231 then uses the selection model to select a conversation sequence corresponding to the state of the service user (for example, the service user U shown in FIG. 1) who is conducting the dialogue with the chatbot CB (step S102).

The instruction unit 232 then instructs the first server 100 to use the conversation sequence selected by the selection unit 231 (step S103).

The learning unit 233 then sets a reward based on the reaction of the service user in the dialogue using the conversation sequence, executes reinforcement learning of the selection model (step S104), and ends the processing procedure shown in FIG. 9.
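Steps S101 to S104 can be lined up as one pass of a handler. The sketch below only fixes the order of the steps; the function name and the three injected callables are hypothetical stand-ins for the selection unit 231, the instruction unit 232, and the learning unit 233.

```python
from typing import Callable


def process_inquiry(
    inquiry: dict,
    select: Callable[[dict], int],
    instruct: Callable[[int], None],
    learn: Callable[[dict, int], None],
) -> int:
    """One pass of the procedure in FIG. 9 for an inquiry already received (S101)."""
    sequence_number = select(inquiry)   # S102: choose a sequence for the user's state
    instruct(sequence_number)           # S103: tell the first server which sequence to use
    learn(inquiry, sequence_number)     # S104: set the reward and update the selection model
    return sequence_number
```

Because the procedure is repeated while the server is operating, a caller would invoke such a handler once per inquiry received from the first server 100.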

5. Modifications
The information processing device, the information processing method, and the information processing program according to the present application may be implemented in various forms other than the embodiment described above. Modifications of the embodiment are described below.

(5-1. Conversation sequences)
The conversation sequences according to the above embodiment may be set for each predetermined topic, such as an item-search sequence, a today's-mood sequence, or a campaign sequence. For example, an item-search sequence may have a conversation pattern such as "What kind of book are you looking for? → Please select a genre → ...". A today's-mood sequence may have a conversation pattern such as "How are you feeling today? → ...". A campaign sequence may have a conversation pattern such as "We're running a great campaign today → ...".

The first server 100 may also execute control such as not displaying a given conversation N or more times (N is a natural number).

(5-2. Learning of the selection model)
In the above embodiment, the learning of the selection model executed on the second server 200 may be executed for each attribute of the service users. That is, for each group of service users having the same attribute, the second server 200 provides a selection model shared by those service users and executes reinforcement learning on it. In this case, the second server 200 may collect, at a predetermined timing, the reactions of the service users (the content of their answers in the dialogue) for each conversation sequence and execute reinforcement learning based on the collected reactions.
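Sharing one model per attribute and learning from reactions collected up to a given timing could look like the sketch below. The class name `AttributeModelRegistry`, the buffering scheme, and the `update` callback are assumptions for illustration and are not taken from the embodiment.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


class AttributeModelRegistry:
    """Keeps one shared selection model per user attribute and buffers collected reactions."""

    def __init__(self, make_model: Callable[[], Any]):
        self._make_model = make_model
        self.models: Dict[str, Any] = {}
        self.pending: Dict[str, List[Tuple[int, int]]] = defaultdict(list)

    def model_for(self, attribute: str) -> Any:
        """Return the model shared by all service users having this attribute."""
        if attribute not in self.models:
            self.models[attribute] = self._make_model()
        return self.models[attribute]

    def collect(self, attribute: str, sequence_number: int, reward: int) -> None:
        """Buffer one reaction (already converted to a reward) for later learning."""
        self.pending[attribute].append((sequence_number, reward))

    def train_batch(self, update: Callable[[Any, int, int], None]) -> int:
        """At the predetermined timing, apply every buffered sample to its model."""
        applied = 0
        for attribute, samples in self.pending.items():
            model = self.model_for(attribute)
            for sequence_number, reward in samples:
                update(model, sequence_number, reward)
                applied += 1
        self.pending.clear()
        return applied
```

A scheduler, which is outside this sketch, would call `train_batch` at the predetermined timing.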

(5-3. Chatbot)
In the above embodiment, the first server 100 may display a virtual character image corresponding to the chatbot CB on the dialogue screen of the chatbot CB. In doing so, the first server 100 may change the facial expression of the character image according to the content of the answers of the service user who is the dialogue partner. The first server 100 may also change the appearance of the character image according to, for example, the attributes of the service user.

6. Effects
The second server 200 according to the embodiment is an information processing device that controls processing related to a dialogue conducted with a service user of an online service through a chatbot, and includes the selection unit 231 and the instruction unit 232. The selection unit 231 selects a conversation sequence corresponding to the state of the service user from among a plurality of conversation sequences, each of which predefines a conversation pattern indicating the content of a series of conversations expected in the dialogue. The instruction unit 232 instructs the first server 100, which executes processing related to the dialogue, to use the conversation sequence selected by the selection unit 231.

For these reasons, the second server 200, which is an example of the information processing device according to the embodiment, can collect information efficiently from service users of online services. For example, by conducting the dialogue with the service user in units of predefined conversation sequences, the second server 200 can be expected to improve the quality of the user experience in the dialogue. That is, the second server 200 can realize a more natural conversation with the service user U through the chatbot CB. As a result, the dialogue with the chatbot CB is more likely to continue, and information can be collected efficiently from the service user through the dialogue.

The second server 200 further includes the learning unit 233, which executes, in units of conversation sequences, reinforcement learning of the selection model that selects the conversation sequence to be used in the dialogue. The learning unit 233 does so by assigning to each conversation sequence a reward based on the reaction of the service user in the dialogue conducted through the chatbot. The selection unit 231 selects the conversation sequence using the selection model.

The selection unit 231 also uses the selection model to select a conversation sequence based on the user information on the service user, the conversation history indicating the content of the most recent conversation, and the reaction of the service user in that conversation.

The second server 200 can therefore produce, for each service user, a natural conversation that follows that user's reactions during the conversation.

The user information includes attribute information indicating the attributes of the service user and the usage history of the online service.

The second server 200 can therefore produce a natural conversation suited to the attributes of the service user and to how the service user uses the service.

Because the second server 200 executes reinforcement learning of the selection model in units of conversation sequences, it can prevent the quality of the user experience in a dialogue using a conversation sequence from deteriorating due to the randomness inherent in reinforcement learning. Furthermore, by learning in units of conversation sequences, the second server 200 can reduce the amount of learning compared with ordinary learning that applies reinforcement learning to the chatbot CB, which makes the system more efficient.

The learning unit 233 performs reinforcement learning of the selection model by setting the reward based on at least whether the reaction of the service user was favorable in the dialogue using the immediately preceding conversation sequence.

The second server 200 can therefore optimize the selection of conversation sequences by the selection model so that the reaction of the service user in a dialogue using a conversation sequence is favorable.

The learning unit 233 also performs reinforcement learning of the selection model by setting the reward based on whether a predetermined conversion associated with the immediately preceding conversation sequence was obtained from the service user.

The second server 200 can therefore optimize the selection of conversation sequences by the selection model so that a desirable result is obtained from the service user through a dialogue using a conversation sequence.

7. Hardware configuration
The second server 200 according to the embodiment and the modifications described above is implemented by, for example, a computer 1000 configured as shown in FIG. 10. FIG. 10 is a hardware configuration diagram showing an example of a computer that implements the functions of the second server 200 according to the embodiment and the modifications.

The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 are connected by a bus 1090.

The arithmetic device 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The primary storage device 1040 is a memory device, such as a RAM, that temporarily stores data used by the arithmetic device 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various calculations and various databases are registered, and is implemented by a ROM (Read Only Memory), an HDD, a flash memory, or the like.

The output IF 1060 is an interface for transmitting information to be output to the output device 1010, such as a monitor or a printer, that outputs various types of information. It is implemented by, for example, a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020, such as a mouse, a keyboard, and a scanner, and is implemented by, for example, USB.

なお、入力装置１０２０は、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）などの光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリなどから情報を読み出す装置であってもよい。また、入力装置１０２０は、ＵＳＢメモリなどの外付け記憶媒体であってもよい。 The input device 1020 may be a device that reads information from optical recording media such as a CD (Compact Disc), DVD (Digital Versatile Disc), or PD (Phase Change Rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory. The input device 1020 may also be an external storage medium such as a USB memory.

ネットワークＩＦ１０８０は、ネットワークＮを介して他の機器からデータを受信して演算装置１０３０へ送り、また、ネットワークＮを介して演算装置１０３０が生成したデータを他の機器へ送信する。 The network IF 1080 receives data from other devices via the network N and sends it to the arithmetic unit 1030, and also transmits data generated by the arithmetic unit 1030 to other devices via the network N.

演算装置１０３０は、出力ＩＦ１０６０や入力ＩＦ１０７０を介して、出力装置１０１０や入力装置１０２０の制御を行う。例えば、演算装置１０３０は、入力装置１０２０や二次記憶装置１０５０からプログラムを一次記憶装置１０４０上にロードし、ロードしたプログラムを実行する。 The arithmetic unit 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic unit 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040 and executes the loaded program.

たとえば、コンピュータ１０００が実施形態に係る第２サーバ２００として機能する場合、コンピュータ１０００の演算装置１０３０は、一次記憶装置１０４０上にロードされたプログラム（たとえば、情報処理プログラム）を実行することにより、制御部２３０と同様の機能を実現する。すなわち、演算装置１０３０は、一次記憶装置１０４０上にロードされたプログラム（例えば、情報処理プログラム）との協働により、実施形態に係る第２サーバ２００による処理を実現する。 For example, when the computer 1000 functions as the second server 200 according to the embodiment, the arithmetic unit 1030 of the computer 1000 executes a program (e.g., the information processing program) loaded onto the primary storage device 1040, thereby realizing functions similar to those of the control unit 230. In other words, the arithmetic unit 1030 realizes the processing of the second server 200 according to the embodiment in cooperation with the program (e.g., the information processing program) loaded onto the primary storage device 1040.
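As a rough illustration of what such a program might look like, the following Python sketch models the selection unit 231 and the instruction unit 232 of the control unit 230, following the handling of initial and next inquiries recited in the claims. All class, field, and sequence names are hypothetical. The fallback order among the attribute-specific, service-specific, and common initial sequences, and the heart-rate rule used to pick the next sequence, are assumptions made for the example only and are not taken from the embodiment.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class UserState:
    """Hypothetical container for the state of a service user."""
    attributes: Dict[str, str] = field(default_factory=dict)  # static attributes
    dynamic: Dict[str, Any] = field(default_factory=dict)     # latest dynamic information


class SelectionUnit:
    """Rough counterpart of the selection unit 231 (rules are assumptions)."""

    def __init__(self, common_initial: str,
                 per_service: Dict[str, str],
                 per_attribute: Dict[str, str],
                 next_rule: Callable[[str, UserState], str]) -> None:
        self.common_initial = common_initial
        self.per_service = per_service
        self.per_attribute = per_attribute
        self.next_rule = next_rule

    def select_initial(self, service_id: str, state: UserState) -> str:
        # Assumed priority: attribute-specific, then service-specific, then common.
        for value in state.attributes.values():
            if value in self.per_attribute:
                return self.per_attribute[value]
        return self.per_service.get(service_id, self.common_initial)

    def select_next(self, previous: str, state: UserState,
                    dynamic_info: Dict[str, Any]) -> str:
        # Update the user's state with the dynamic information (e.g. location,
        # biometric data) acquired for this inquiry, then pick the next sequence.
        state.dynamic.update(dynamic_info)
        return self.next_rule(previous, state)


class InstructionUnit:
    """Rough counterpart of the instruction unit 232."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send  # hypothetical transport to the external device

    def instruct(self, sequence_id: str) -> None:
        self.send(sequence_id)


def example_next_rule(previous: str, state: UserState) -> str:
    # Purely illustrative rule: an elevated heart rate leads to a shorter sequence.
    heart_rate = state.dynamic.get("heart_rate", 0)
    return "seq_short_followup" if heart_rate > 100 else "seq_detailed_review"


# Minimal usage: answer an initial inquiry, then a next-sequence inquiry.
sent_to_external_device: list = []
selector = SelectionUnit("seq_common", {"shop": "seq_shop"},
                         {"20s": "seq_young"}, example_next_rule)
instructor = InstructionUnit(sent_to_external_device.append)
user = UserState(attributes={"age_group": "20s"})
first = selector.select_initial("shop", user)
instructor.instruct(first)
following = selector.select_next(first, user,
                                 {"location": (35.68, 139.76), "heart_rate": 72})
instructor.instruct(following)
```

Keeping the two units as separate classes mirrors the functional split in the embodiment. Merging them into a single class would not change the behavior.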

［８．その他］ [8. Other]
上記実施形態などにおいて説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。 Of the processes described in the above embodiments, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically using a known method. In addition, the information shown in the above description and drawings, including processing procedures, specific names, and various data and parameters, can be changed as desired unless otherwise specified.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。たとえば、第２サーバ２００の制御部２３０が有する選択部２３１および指示部２３２は、機能的に統合されていてもよい。また、たとえば、情報処理システムＳＹＳにおける第１サーバ１００および第２サーバ２００は、機能的および物理的に統合された単体の情報処理装置であってもよい。 Furthermore, the components of each device shown in the figure are conceptual and functional, and do not necessarily have to be physically configured as shown. In other words, the specific form of distribution and integration of each device is not limited to that shown, and all or part of them can be functionally or physically distributed and integrated in any unit depending on various loads, usage conditions, etc. For example, the selection unit 231 and instruction unit 232 of the control unit 230 of the second server 200 may be functionally integrated. Also, for example, the first server 100 and the second server 200 in the information processing system SYS may be a single information processing device that is functionally and physically integrated.

また、上述してきた各実施形態は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 Furthermore, the above-described embodiments can be combined as appropriate to the extent that the processing content is not contradictory.

以上、本願の実施形態をいくつかの図面に基づいて詳細に説明したが、これらは例示であり、発明の開示の欄に記載の態様を始めとして、当業者の知識に基づいて種々の変形、改良を施した他の形態で本発明を実施することが可能である。 The above describes in detail the embodiments of the present application based on several drawings, but these are merely examples, and the present invention can be implemented in other forms that incorporate various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the Disclosure of the Invention section.

また、上述してきた「部（section、module、unit）」は、「手段」や「回路」などに読み替えることができる。例えば、制御部は、制御手段や制御回路に読み替えることができる。 In addition, the above-mentioned "section, module, unit" can be interpreted as "means" or "circuit." For example, a control unit can be interpreted as a control means or a control circuit.

Ｎ ネットワーク N Network
ＳＹＳ 情報処理システム SYS Information Processing System
１０ 利用者端末 10 User Terminal
１００ 第１サーバ 100 First Server
２００ 第２サーバ 200 Second Server
２１０ 通信部 210 Communication Unit
２２０ 記憶部 220 Storage Unit
２２１ 会話用シーケンス記憶部 221 Conversation Sequence Storage Unit
２２２ 選択モデル記憶部 222 Selection Model Storage Unit
２２３ 利用者情報記憶部 223 User Information Storage Unit
２３０ 制御部 230 Control Unit
２３１ 選択部 231 Selection Unit
２３２ 指示部 232 Instruction Unit
２３３ 学習部 233 Learning Unit

Claims

1. An information processing device that controls processing related to a dialogue conducted between a service user of an online service and a chatbot, the information processing device comprising:
a selection unit that selects a conversation sequence from among a plurality of conversation sequences, each of which predefines a conversation pattern indicating the content of a series of conversations expected in the dialogue; and
an instruction unit that instructs an external device, which executes processing related to the dialogue, to use the conversation sequence selected by the selection unit,
wherein the selection unit:
upon receiving an inquiry for an initial conversation sequence from the external device, selects, from among the plurality of conversation sequences, an initial conversation sequence common to online services, an initial conversation sequence preset for each online service, or an initial conversation sequence corresponding to an attribute of the service user who is to be the other party in the dialogue; and
each time an inquiry for a next conversation sequence is received, acquires dynamic information including location information and biometric information of the service user, and selects the next conversation sequence based on the state of the service user updated based on the acquired dynamic information.

2. The information processing device according to claim 1, further comprising:
a learning unit that performs reinforcement learning of a selection model for selecting a conversation sequence to be used in the dialogue, by setting a reward for the conversation sequence based on a reaction of the service user in the dialogue conducted through the chatbot,
wherein the selection unit selects a conversation sequence using the selection model associated with attribute information that indicates an attribute of the service user and is included in the inquiry acquired from the external device.

3. The information processing device according to claim 2, wherein the selection unit inputs, into the selection model, the conversation sequence immediately preceding the next conversation sequence and the service user's reaction in the dialogue, and selects the conversation sequence linked to the sequence number output from the selection model.

4. The information processing device according to claim 2, wherein the learning unit performs reinforcement learning of the selection model by setting the reward based on whether or not the service user's reaction in the dialogue was favorable in at least the immediately preceding conversation sequence.

5. The information processing device according to claim 4, wherein the learning unit, when setting a reward for a previous conversation sequence, sets the reward according to a change in the reaction of the service user in the previous conversation.

6. The information processing device according to claim 2, wherein the learning unit performs reinforcement learning of the selection model by setting the reward based on whether or not a predetermined conversion associated with the immediately preceding conversation sequence has been obtained from the service user.

7. An information processing method for controlling processing related to a dialogue conducted between a service user of an online service and a chatbot, the method comprising:
a selection step of selecting a conversation sequence from among a plurality of conversation sequences, each of which predefines a conversation pattern indicating the content of a series of conversations expected in the dialogue; and
an instruction step of instructing an external device, which executes processing related to the dialogue, to use the conversation sequence selected in the selection step,
wherein the selection step includes:
upon receiving an inquiry for an initial conversation sequence from the external device, selecting, from among the plurality of conversation sequences, an initial conversation sequence common to online services, an initial conversation sequence preset for each online service, or an initial conversation sequence corresponding to an attribute of the service user who is to be the other party in the dialogue; and
each time an inquiry for a next conversation sequence is received, acquiring dynamic information including location information and biometric information of the service user, and selecting the next conversation sequence based on the state of the service user updated based on the acquired dynamic information.

8. An information processing program that causes a computer, which controls processing related to a dialogue conducted between a service user of an online service and a chatbot, to execute:
a selection procedure of selecting a conversation sequence from among a plurality of conversation sequences, each of which predefines a conversation pattern indicating the content of a series of conversations expected in the dialogue; and
an instruction procedure of instructing an external device, which executes processing related to the dialogue, to use the conversation sequence selected in the selection procedure,
wherein the selection procedure includes:
upon receiving an inquiry for an initial conversation sequence from the external device, selecting, from among the plurality of conversation sequences, an initial conversation sequence common to online services, an initial conversation sequence preset for each online service, or an initial conversation sequence corresponding to an attribute of the service user who is to be the other party in the dialogue; and
each time an inquiry for a next conversation sequence is received, acquiring dynamic information including location information and biometric information of the service user, and selecting the next conversation sequence based on the state of the service user updated based on the acquired dynamic information.
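The reward setting and selection model recited for the learning unit and the selection unit in the dependent claims can be pictured with the following Python sketch. The claims do not name a particular reinforcement learning algorithm, so tabular Q-learning with an epsilon-greedy policy is used here purely as a stand-in. The reward values, the reaction labels, and all function and class names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed model input: (previous sequence number, label of the user's reaction).
State = Tuple[int, str]


def compute_reward(favorable: bool, converted: bool) -> float:
    """Assumed reward: +1 / -1 for a favorable / unfavorable reaction,
    plus a bonus of 5 when the predetermined conversion was obtained."""
    reward = 1.0 if favorable else -1.0
    if converted:
        reward += 5.0
    return reward


def reward_from_change(before: float, after: float) -> float:
    """Assumed reward for an earlier sequence: the change in a numeric
    reaction score between the start and end of that conversation."""
    return after - before


class SelectionModel:
    """Tabular Q-learning stand-in for the selection model."""

    def __init__(self, sequence_numbers: List[int], alpha: float = 0.5,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> None:
        self.sequence_numbers = sequence_numbers
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q: Dict[Tuple[State, int], float] = defaultdict(float)
        self.rng = random.Random(seed)

    def select(self, state: State) -> int:
        # Explore with probability epsilon, otherwise output the sequence
        # number with the highest learned value for this state.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.sequence_numbers)
        return max(self.sequence_numbers, key=lambda a: self.q[(state, a)])

    def update(self, state: State, action: int, reward: float,
               next_state: State) -> None:
        # Standard Q-learning update toward reward + discounted best next value.
        best_next = max(self.q[(next_state, a)] for a in self.sequence_numbers)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Minimal usage: sequence 2 led to a favorable reaction and a conversion.
model = SelectionModel([1, 2, 3], epsilon=0.0)
model.update((0, "neutral"), 2, compute_reward(True, True), (2, "positive"))
chosen = model.select((0, "neutral"))
```

In this sketch the sequence number returned by `select` plays the role of the selection model's output, which the selection unit would then map to a stored conversation sequence.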