JP7733329B2 - Information processing device and program

Info

Publication number: JP7733329B2
Application number: JP2024027694A
Authority: JP
Inventors: 輝長岡; 春満信田
Original assignee: Mixi Inc
Current assignee: Mixi Inc
Priority date: 2020-01-23
Filing date: 2024-02-27
Publication date: 2025-09-03
Anticipated expiration: 2040-01-23
Also published as: JP2025156599A; JP2024051086A; JP7453513B2; JP2021117581A

Description

本発明は、情報処理装置及びプログラムに関する。 The present invention relates to an information processing device and a program.

情報処理装置のユーザに対して広告を提示する技術は、種々知られている。例えば、ユーザ間のチャットの内容に基づいて広告を選択することが特許文献１に開示されている。 Various technologies are known for presenting advertisements to users of information processing devices. For example, Patent Document 1 discloses a method for selecting advertisements based on the content of chats between users.

特開平１１－３３４８号公報 Japanese Unexamined Patent Application Publication No. H11-3348

上記従来例の技術では、ユーザ間のチャットにおいて明確に広告であることがわかるようにユーザに提示している。一方、ユーザと対話が可能な機械において、機械側の発話で広告を入れた発話をすると、ユーザとの会話が不自然なものとなってしまうという問題があった。 The conventional technology described above presents advertisements in chats between users in such a way that they are clearly recognizable as advertisements. In a machine that can converse with a user, however, inserting an advertisement into the machine's own utterances makes the conversation with the user unnatural.

本発明は上記実情に鑑みて為されたもので、ユーザと対話が可能な機械の発話に、自然な広告文を含めることができる情報処理装置及びプログラムを提供することを、その目的の一つとする。 The present invention was made in consideration of the above-mentioned circumstances, and one of its objectives is to provide an information processing device and program that can include natural advertising text in the speech of a machine that can interact with a user.

上記従来例の問題点を解決する本発明の一態様は、情報処理装置であって、広告の内容である広告文を保持する保持手段と、ユーザから入力される会話文に含まれる、予め定められた予約語を抽出する抽出手段と、前記抽出した予約語に基づいて、前記広告文のうち、ユーザに提示する広告文を選択する選択手段と、所定の基準に基づいて、前記会話文に続いて、選択された広告文を含む応答文の発話処理を実行する実行手段と、を備えることとしたものである。 One aspect of the present invention that solves the problems of the above-mentioned conventional examples is an information processing device that includes: storage means for storing advertising copy, which is the content of the advertisement; extraction means for extracting predetermined reserved words contained in conversational text entered by the user; selection means for selecting an advertising copy to present to the user from the advertising copy based on the extracted reserved words; and execution means for executing speech processing of a response text that includes the selected advertising copy following the conversational text based on predetermined criteria.

本発明によると、ユーザと対話が可能な機械の発話に、自然な広告文を含めることが可能な情報処理装置及びプログラムを提供できる。 This invention provides an information processing device and program that can include natural advertising text in the speech of a machine that can interact with a user.

本発明の実施の形態に係る情報処理システムの構成例を表すブロック図である。 FIG. 1 is a block diagram illustrating an example of the configuration of an information processing system according to an embodiment of the present invention.
本発明の実施の形態に係る端末装置の構成例を表すブロック図である。 FIG. 2 is a block diagram illustrating an example of the configuration of a terminal device according to the embodiment of the present invention.
本発明の実施の形態に係る情報処理システムが用いる広告文の保持例を表す説明図である。 FIG. 3 is an explanatory diagram illustrating an example of how advertising copy is stored in the information processing system according to the embodiment of the present invention.
本発明の実施の形態に係る情報処理システムで利用される設定情報の例を表す説明図である。 FIG. 4 is an explanatory diagram illustrating an example of setting information used in the information processing system according to the embodiment of the present invention.
本発明の実施の形態に係るサーバの例を表す機能ブロック図である。 FIG. 5 is a functional block diagram illustrating an example of a server according to the embodiment of the present invention.
本発明の実施の形態に係る情報処理システムで利用される会話文キューの内容例を表す説明図である。 FIG. 6 is an explanatory diagram illustrating an example of the contents of a conversation sentence queue used in the information processing system according to the embodiment of the present invention.
本発明の実施の形態に係る端末装置の例を表す機能ブロック図である。 FIG. 7 is a functional block diagram illustrating an example of a terminal device according to the embodiment of the present invention.
本発明の実施の形態に係る情報処理システムの動作例を表す流れ図である。 FIG. 8 is a flowchart illustrating an example of the operation of the information processing system according to the embodiment of the present invention.
本発明の実施の形態に係る情報処理システムの動作例を表すもう一つの流れ図である。 FIG. 9 is another flowchart illustrating an example of the operation of the information processing system according to the embodiment of the present invention.

本発明の実施の形態について図面を参照しながら説明する。本発明の実施の形態に係る情報処理システム１は、図１に例示するように、互いにネットワーク等の通信手段を介して通信可能に接続されたサーバ１０と、端末装置２０とを含んで構成される。 An embodiment of the present invention will be described with reference to the drawings. As shown in FIG. 1, an information processing system 1 according to an embodiment of the present invention includes a server 10 and a terminal device 20 that are communicatively connected to each other via a communication means such as a network.

本実施の形態の一例では、このサーバ１０が本発明の情報処理装置を実現する。この例のサーバ１０は、図１に示したように、制御部１１と、記憶部１２と、通信部１３とを含んで構成される。また、端末装置２０は、ロボットであり、図２に例示するように、脚部２１と、本体部２２とを少なくとも含み、本体部２２に、制御部３１と、記憶部３２と、センサ部３３と、表示部３４と、音声出力部３５と、通信部３６と、駆動部３７とを収納している。また脚部２１と本体部２２とは、少なくとも１軸まわりに回転可能なアクチュエータを介して連結されており、脚部２１に対して本体部２２の向きを回動可能となっている。 In one example of this embodiment, the server 10 realizes the information processing device of the present invention. As shown in FIG. 1, the server 10 in this example is configured to include a control unit 11, a memory unit 12, and a communication unit 13. The terminal device 20 is a robot, and as shown in FIG. 2, includes at least a leg unit 21 and a main body unit 22, which houses a control unit 31, a memory unit 32, a sensor unit 33, a display unit 34, an audio output unit 35, a communication unit 36, and a drive unit 37. The leg unit 21 and the main body unit 22 are connected via an actuator that can rotate around at least one axis, allowing the orientation of the main body unit 22 to be rotated relative to the leg unit 21.

サーバ１０の制御部１１は、ＣＰＵ等のプログラム制御デバイスであり、記憶部１２に格納されたプログラムに従って動作する。本実施の形態では、この制御部１１は、端末装置２０からアクションのリクエスト情報を受け入れる。またこの制御部１１は、当該受け入れたリクエスト情報に対応して、端末装置２０にて実行されるアクションを指示するアクション指示と、端末装置２０にて発声される音声の内容を表す文字列情報とを含むアクション情報を、上記リクエスト情報の送信元である端末装置２０へ送信する。この制御部１１の詳しい処理の内容については、後に説明する。 The control unit 11 of the server 10 is a program-controlled device such as a CPU, and operates according to a program stored in the memory unit 12. In this embodiment, the control unit 11 accepts action request information from the terminal device 20. In response to the accepted request information, the control unit 11 transmits action information to the terminal device 20, which is the sender of the request information, the action information including an action instruction that instructs the action to be executed by the terminal device 20 and character string information representing the content of the voice to be uttered by the terminal device 20. The detailed processing content of the control unit 11 will be explained later.

記憶部１２は、ディスクデバイスまたはメモリデバイスであり、制御部１１によって実行されるプログラムを保持する。この記憶部１２は、また、制御部１１のワークメモリとしても動作する。本実施の形態の一例では、この記憶部１２には、端末装置２０への指示を生成するための情報が格納されていてもよい。この情報の内容については後に述べる。 The storage unit 12 is a disk device or memory device that stores programs executed by the control unit 11. The storage unit 12 also functions as a work memory for the control unit 11. In one example of this embodiment, the storage unit 12 may also store information for generating instructions to the terminal device 20. The contents of this information will be described later.

また本実施の形態の一例では、この記憶部１２には、図３に例示するように、広告の対象となる商品等（Ｐ）ごとに、広告識別子（Ｉ）と、端末装置２０に対して配信される広告の内容である少なくとも一つの広告文（Ｅ）と、その出稿者を特定する情報（Ａ）と、広告ジャンルを特定するジャンル情報（Ｇ）とが関連付けて保持される。また、この広告文にはさらに、広告の対象となる商品等の情報を掲載したウェブページのＵＲＬ（Ｕ）が関連付けられていてもよい。このように、本実施の形態のある例では、この記憶部１２が本発明の保持手段として機能する。なお、広告文は、広告の対象となる商品等に対して一つとは限られず、複数あってもよい。 In one example of this embodiment, as shown in FIG. 3, the memory unit 12 stores, for each product (P) to be advertised, an advertising identifier (I), at least one advertising copy (E) that is the content of the advertisement to be delivered to the terminal device 20, information (A) that identifies the advertiser, and genre information (G) that identifies the advertising genre, all in association with each other. The advertising copy may also be associated with a URL (U) of a webpage that contains information about the product (P) to be advertised. In this way, in one example of this embodiment, the memory unit 12 functions as the storage means of the present invention. Note that the number of advertising copies per product (P) is not limited to one, and there may be multiple copies.
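The association described above can be pictured as one record per product. The following is a minimal sketch only; the class name `AdRecord`, the field names, and the sample values are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdRecord:
    product: str                 # (P) product or service being advertised
    ad_id: str                   # (I) advertisement identifier
    copies: List[str]            # (E) one or more advertising copies
    advertiser: str              # (A) information identifying the advertiser
    genre: str                   # (G) genre information
    url: Optional[str] = None    # (U) optional web page about the product

    def __post_init__(self) -> None:
        # The description calls for at least one advertising copy per product.
        if not self.copies:
            raise ValueError("at least one advertising copy is required")


# Hypothetical sample data, for illustration only.
pillow_ad = AdRecord(
    product="Example pillow",
    ad_id="ad-001",
    copies=[
        "I hear the Example pillow helps when you are sleep-deprived.",
        "A good pillow can make a big difference.",
    ],
    advertiser="Example Bedding Co.",
    genre="sleep",
    url="https://example.com/pillow",
)
```

Holding the copies as a list reflects the note that a product may have more than one advertising copy.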

さらに本実施の形態の一例では、この記憶部１２には、端末装置２０にて発話させるための会話文の候補を蓄積した会話文キューが保持されてもよい。この会話文キューの内容については後に述べる。 Furthermore, in one example of this embodiment, the storage unit 12 may also hold a conversation sentence queue that accumulates conversation sentence candidates to be spoken by the terminal device 20. The contents of this conversation sentence queue will be described later.

通信部１３は、ネットワークインタフェース等であり、制御部１１から入力される指示に従い、ネットワークを介して端末装置２０宛に種々の情報を送出する。またこの通信部１３は、ネットワークを介して受信した情報を、制御部１１に出力する。 The communication unit 13 is a network interface or the like, and sends various information to the terminal device 20 via the network in accordance with instructions input from the control unit 11. The communication unit 13 also outputs information received via the network to the control unit 11.

端末装置２０の制御部３１は、ＣＰＵ等のプログラム制御デバイスであり、記憶部３２に格納されたプログラムに従って動作する。本実施の形態では、この制御部３１は、所定のタイミングで、サーバ１０に対してアクションのリクエスト情報を送出する。またこの制御部３１は、サーバ１０からアクション情報を受信する。制御部３１は、アクション情報を受信すると、当該アクション情報に含まれる文字列情報に基づき、音声データを合成する。また制御部３１は、ここで合成された音声データを再生するとともに、受信したアクション情報の指示に従い、アニメーションの表示などの処理を実行する。 The control unit 31 of the terminal device 20 is a program-controlled device such as a CPU, and operates according to a program stored in the memory unit 32. In this embodiment, the control unit 31 sends action request information to the server 10 at a predetermined timing. The control unit 31 also receives action information from the server 10. Upon receiving the action information, the control unit 31 synthesizes audio data based on the character string information included in the action information. The control unit 31 also plays back the synthesized audio data and performs processing such as displaying animations in accordance with the instructions in the received action information.

本実施の形態の一例では、端末装置２０の制御部３１は、後に説明するセンサ部３３がユーザの音声の入力を受け入れると、当該入力された音声を文字列情報に変換する。この処理は、広く知られた音声認識の処理を用いることができ、制御部３１は例えば音声認識処理を実行する音声認識サーバに入力された音声の情報を送出し、認識した文字列情報を受信することでこの処理を実行してもよい。 In one example of this embodiment, when the sensor unit 33, which will be described later, receives a user's voice input, the control unit 31 of the terminal device 20 converts the input voice into character string information. This process can use widely known voice recognition processing, and the control unit 31 may perform this process, for example, by sending input voice information to a voice recognition server that performs voice recognition processing and receiving recognized character string information.

また制御部３１は、ユーザにより音声が入力されたことを契機（トリガ）として、サーバ１０に対してアクションのリクエスト情報を送出する。このリクエスト情報には、トリガを特定する情報（例えばユーザにより音声が入力された旨の情報）と、サーバ１０での処理に必要な情報、例えば、ここではユーザが入力した音声の認識結果である文字列情報とを含む。 The control unit 31 also sends action request information to the server 10, triggered by the user's voice input. This request information includes information identifying the trigger (e.g., information indicating that the user has input speech) and information necessary for processing by the server 10, which here is, for example, the character string information obtained by recognizing the speech input by the user.

すなわち制御部３１は、予め定められたトリガが発生したと判断すると、サーバ１０での処理に必要な情報を収集して、当該トリガを特定する情報とともに、当該収集した情報を含むリクエスト情報をサーバ１０へ送出することとなる。このトリガは、先の例のように、ユーザにより音声が入力されたことのほか、所定の時刻になった、など、任意に定め得る。この制御部３１の詳しい動作の内容についても後に説明する。 In other words, when the control unit 31 determines that a predetermined trigger has occurred, it collects the information necessary for processing on the server 10 and sends request information including the collected information to the server 10, along with information identifying the trigger. This trigger can be any trigger, such as when a user inputs voice, as in the previous example, or when a predetermined time arrives. The detailed operation of this control unit 31 will be explained later.

記憶部３２は、メモリデバイス等であり、制御部３１によって実行されるプログラムを保持する。この記憶部３２は、また、制御部３１のワークメモリとしても動作する。本実施の形態では、この記憶部３２には、上記トリガ（Ｎ）と、サーバ１０に送出するべき情報を特定する情報（Ｐ）等とを関連付けた設定情報が格納されていてもよい（図４）。この設定情報については後に具体的な例を挙げて説明する。 The storage unit 32 is a memory device or the like, and stores programs executed by the control unit 31. The storage unit 32 also operates as a work memory for the control unit 31. In this embodiment, the storage unit 32 may store setting information that associates the above-mentioned trigger (N) with, among other things, information (P) that specifies the information to be sent to the server 10 (FIG. 4). This setting information will be explained later with specific examples.
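As an illustration of how the terminal device might combine a trigger with the setting information to build a request, here is a minimal sketch. The trigger names, the `SETTINGS` table, the `collectors` mapping, and the request layout are all assumptions made for this example; the specification does not define a concrete format.

```python
from typing import Callable, Dict, List

# Hypothetical setting information: trigger (N) -> names of the information (P) to send.
SETTINGS: Dict[str, List[str]] = {
    "voice_input": ["recognized_text"],
    "scheduled_time": ["current_time"],
}


def build_request(trigger: str, collectors: Dict[str, Callable[[], str]]) -> dict:
    """Collect the information configured for `trigger` and wrap it in a request."""
    names = SETTINGS[trigger]  # raises KeyError if the trigger is not configured
    collected = {name: collectors[name]() for name in names}
    return {"trigger": trigger, "info": collected}


# Example: a voice-input trigger carrying the speech recognition result.
request = build_request(
    "voice_input",
    {"recognized_text": lambda: "Is there any news?"},
)
```

Keeping the trigger-to-information mapping in a table means that new triggers, such as a scheduled time, can be added without changing the request-building code.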

さらに、本実施の形態のある例では、この記憶部３２は、表示部３４に表示するべきアニメーションの画像データを格納している。具体的に記憶部３２は、笑顔の目の画像データ、涙の流れるアニメーションの目の画像データ…など目のアニメーションの画像データや、笑った状態で閉じた口の画像データ、泣いている状態での閉じた口の画像データ、発声中の口のアニメーションの画像データ…など、口のアニメーションの画像データ…といったように、表情を表す各部の複数の画像データを格納していてもよい。 Furthermore, in one example of this embodiment, the memory unit 32 stores image data of an animation to be displayed on the display unit 34. Specifically, the memory unit 32 may store multiple image data of parts representing facial expressions, such as image data of animated eyes, such as image data of smiling eyes, image data of animated eyes with tears flowing, etc., and image data of animated mouths, such as image data of a closed mouth in a smiling state, image data of a closed mouth in a crying state, image data of an animated mouth during speech, etc.

センサ部３３は、少なくとも音声センサであるマイクを含む。またこのセンサ部３３は、タッチセンサや、加速度センサ等を含んでもよい。このセンサ部３３は、各センサで検出した音声信号や、ユーザが触れた位置を表す情報、加速度の情報等を、制御部３１に出力する。 The sensor unit 33 includes at least a microphone, which is an audio sensor. The sensor unit 33 may also include a touch sensor, an acceleration sensor, etc. The sensor unit 33 outputs audio signals detected by each sensor, information indicating the position touched by the user, acceleration information, etc. to the control unit 31.

表示部３４は、液晶ディスプレイ等であり、制御部３１から入力される指示に従って、キャラクタの表情を表す画像データ等を表示する。音声出力部３５は、スピーカー等であり、制御部３１から入力される音声信号に従って音声を鳴動する。 The display unit 34 is a liquid crystal display or the like, and displays image data representing the character's facial expressions, etc., in accordance with instructions input from the control unit 31. The audio output unit 35 is a speaker or the like, and produces sound in accordance with audio signals input from the control unit 31.

通信部３６は、ネットワークインタフェースを含む。この通信部３６は、無線または有線にてネットワークを介してサーバ１０との間で情報を送受する。具体的に通信部３６は、制御部３１から入力される指示に従って、サーバ１０に対してリクエスト情報等を送出する。また、この通信部３６は、サーバ１０から受信した情報を制御部３１に出力する。 The communication unit 36 includes a network interface. This communication unit 36 sends and receives information to and from the server 10 via a network, either wirelessly or via a wired connection. Specifically, the communication unit 36 sends request information and the like to the server 10 in accordance with instructions input from the control unit 31. The communication unit 36 also outputs information received from the server 10 to the control unit 31.

駆動部３７は、制御部３１から入力される指示に従い、脚部２１に対して本体部２２を回転するようアクチュエータを駆動する。 The drive unit 37 drives the actuator to rotate the main body unit 22 relative to the legs 21 in accordance with instructions input from the control unit 31.

次に、本実施の形態のサーバ１０の制御部１１の動作について説明する。本実施の形態では、このサーバ１０の制御部１１は、図５に例示するように、受信部４１と、予約語抽出部４２と、広告選択部４３と、アクション情報生成部４４と、指示送信部４５とを含んで構成される。 Next, the operation of the control unit 11 of the server 10 in this embodiment will be described. In this embodiment, the control unit 11 of the server 10 includes a receiving unit 41, a reserved word extraction unit 42, an advertisement selection unit 43, an action information generation unit 44, and an instruction transmission unit 45, as illustrated in FIG. 5.

受信部４１は、端末装置２０からリクエスト情報を受信する。本実施の形態では、サーバ１０は、端末装置２０にて実行するべき処理（アクション）を要求するリクエスト情報を、当該端末装置２０から受け入れる。このリクエスト情報には、アクションの要求の原因（トリガ）を特定する情報を含む。トリガの種類については後述するが、例えばユーザによる音声の入力等がその一例となる。ユーザによる音声入力があったとのトリガに基づく上記リクエスト情報には、当該トリガを特定する情報とともに、ユーザにより入力された音声の内容を表す情報が含まれてもよい。ここでユーザにより入力された音声の内容を表す情報は、音声を認識した結果である文字列情報でよい。 The receiving unit 41 receives request information from the terminal device 20. In this embodiment, the server 10 accepts request information from the terminal device 20 requesting a process (action) to be executed by the terminal device 20. This request information includes information identifying the cause (trigger) of the action request. Types of triggers will be described later, but an example would be voice input by the user. The request information based on the trigger of voice input by the user may include information identifying the trigger as well as information representing the content of the voice input by the user. Here, the information representing the content of the voice input by the user may be character string information that is the result of voice recognition.

受信部４１は、ここで受け入れたリクエスト情報に含まれる、トリガを特定する情報や、ユーザにより入力された音声の内容を表す文字列情報等を、予約語抽出部４２と、アクション情報生成部４４とに出力する。 The receiving unit 41 outputs the information contained in the received request information, such as information identifying the trigger and character string information representing the content of the voice input by the user, to the reserved word extraction unit 42 and the action information generation unit 44.

予約語抽出部４２は、受信部４１が出力する文字列情報に含まれる予約語を抽出する。ここで予約語は予め定められたキーワードであり、本実施の形態の例では、予め複数のキーワードがそれぞれ予約語として列挙して記憶部１２に格納されている。予約語抽出部４２は、この列挙された予約語が、受信部４１が出力する文字列情報に含まれる場合、予約語を抽出して広告選択部４３に出力する。 The reserved word extraction unit 42 extracts reserved words contained in the character string information output by the receiving unit 41. Here, reserved words are predetermined keywords, and in this embodiment, a plurality of keywords are listed in advance as reserved words and stored in the storage unit 12. If the listed reserved words are contained in the character string information output by the receiving unit 41, the reserved word extraction unit 42 extracts the reserved words and outputs them to the advertisement selection unit 43.

例えば端末装置２０から受信したリクエスト情報に含まれる、ユーザが入力した会話文を表す文字列情報が、「最近、睡眠不足で…」というものであるとき、予約語として「睡眠不足」の語が列挙されていれば、この予約語抽出部４２は、「睡眠不足」の語を抽出して、広告選択部４３に出力する。 For example, if the string information representing a conversation entered by the user and included in the request information received from the terminal device 20 is "I've been sleep-deprived lately...", and the word "sleep-deprived" is listed as a reserved word, the reserved word extraction unit 42 extracts the word "sleep-deprived" and outputs it to the advertisement selection unit 43.
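The extraction in this example can be sketched as a simple substring search over the enumerated reserved words. The word list and function name below are hypothetical, and the case-insensitive comparison is an added assumption; a real implementation for Japanese text might use morphological analysis instead.

```python
from typing import Iterable, List

# Hypothetical list of reserved words enumerated in advance.
RESERVED_WORDS = ["sleep-deprived", "stiff shoulders", "travel"]


def extract_reserved_words(text: str, reserved: Iterable[str] = RESERVED_WORDS) -> List[str]:
    """Return every reserved word that occurs in the recognized text."""
    lowered = text.lower()
    return [word for word in reserved if word.lower() in lowered]


found = extract_reserved_words("I've been sleep-deprived lately...")
```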

広告選択部４３は、予約語抽出部４２が抽出した予約語に基づいて、記憶部１２に格納されている広告文のうち、ユーザに提示する広告文を選択する。具体的に、この広告選択部４３は、
（１）広告文に、抽出された予約語またはその類語が含まれる広告文を選択する、
（２）関連付けられているジャンル情報に、抽出された予約語が含まれる広告文を選択する、のいずれかの処理を実行して広告文を選択する。
The advertisement selection unit 43 selects the advertising copy to be presented to the user from among the advertising copy stored in the storage unit 12, based on the reserved words extracted by the reserved word extraction unit 42. Specifically, the advertisement selection unit 43 selects the advertising copy by performing one of the following processes:
(1) selecting advertising copy that contains the extracted reserved word or a synonym of it; or
(2) selecting advertising copy whose associated genre information contains the extracted reserved word.

ここで、広告選択部４３が上記（１）または（２）の処理を行ったときに、複数の広告文が選択されるときには、広告選択部４３は、当該複数の広告文のうちから一つの広告文をさらに選択（絞り込み選択）する。この絞り込み選択の方法は、ランダムに行われてもよいし、他の条件に基づいて行われてもよい。 Here, when the advertisement selection unit 43 performs the above process (1) or (2) and multiple advertisement copies are selected, the advertisement selection unit 43 further selects (narrows down and selects) one advertisement copy from among the multiple advertisement copies. This narrowing down and selection method may be performed randomly or based on other conditions.
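The two selection rules and the random narrowing-down step can be sketched as follows. This is only an illustration: the `ADS` data, the `rule` parameter, and the way synonyms are supplied are assumptions, and the specification leaves open narrowing by conditions other than random choice.

```python
import random

# Hypothetical stored advertising copy with genre information.
ADS = [
    {"id": "ad-001", "copy": "Feeling sleep-deprived? The Example pillow might help.", "genre": "bedding"},
    {"id": "ad-002", "copy": "There is a new herbal tea out.", "genre": "sleep-deprived, relaxation"},
    {"id": "ad-003", "copy": "How about a hot spring trip this weekend?", "genre": "travel"},
]


def select_ad(word, ads, rule=1, synonyms=(), rng=random):
    """Select one ad for a reserved word, or None if nothing matches."""
    if rule == 1:
        # Rule (1): the copy contains the reserved word or one of its synonyms.
        terms = (word, *synonyms)
        candidates = [ad for ad in ads if any(t in ad["copy"] for t in terms)]
    elif rule == 2:
        # Rule (2): the genre information contains the reserved word.
        candidates = [ad for ad in ads if word in ad["genre"]]
    else:
        raise ValueError("rule must be 1 or 2")
    # Narrowing down: pick one candidate at random when several match.
    return rng.choice(candidates) if candidates else None
```

For example, `select_ad("travel", ADS, rule=1)` finds nothing because no copy contains the word itself, while passing `synonyms=("trip",)` lets rule (1) pick up the hot spring ad.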

広告選択部４３は、ここで選択した一つの広告文を、会話文として記憶部１２に格納した会話文キューに登録する。このとき広告選択部４３は、当該広告文に関連付けて、期限の情報として、登録の時点から所定の時間だけ後の時間（例えば１時間）を関連付けて記録してもよい。後に説明するように、この期限が経過した会話文は発話されないよう制御されるので、このように期限の情報を設定すると、ユーザが予約語を発話してから相当の時間が経過した後に広告文が発話されることがなくなる。 The advertisement selection unit 43 registers the selected advertisement text as a conversational sentence in the conversational sentence queue stored in the memory unit 12. At this time, the advertisement selection unit 43 may associate the advertisement text with expiration information, which may be a predetermined time (e.g., one hour) from the time of registration. As will be explained later, conversational sentences are controlled so that they are not spoken after this expiration date, so setting expiration information in this way prevents the advertisement text from being spoken after a considerable amount of time has passed since the user uttered the reserved word.
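Registration with a deadline might look like the sketch below. The queue is modeled as a plain list of dictionaries and the one-hour default follows the example in the text; the field names and the function `register_ad` are hypothetical.

```python
from datetime import datetime, timedelta


def register_ad(queue: list, ad_copy: str, now: datetime,
                ttl: timedelta = timedelta(hours=1)) -> dict:
    """Append the selected ad copy to the conversation sentence queue with a deadline."""
    entry = {
        "deadline": now + ttl,   # the copy must not be spoken after this time
        "condition": None,       # no utterance condition in this sketch
        "text": ad_copy,
    }
    queue.append(entry)
    return entry


queue = []
registered = register_ad(queue, "I hear the Example pillow is good.", datetime(2024, 1, 1, 12, 0))
```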

アクション情報生成部４４は、受信部４１から入力される情報に基づいて、リクエストを送出した端末装置２０が実行するべきアクションを決定し、当該アクションを指示する情報（アクション指示）と、アクションの実行に必要となる情報（以下、パラメータ情報と呼ぶ）とを含むアクション情報を生成して指示送信部４５に出力する。 Based on the information input from the receiving unit 41, the action information generating unit 44 determines the action to be executed by the terminal device 20 that sent the request, generates action information including information instructing the action (action instruction) and information required to execute the action (hereinafter referred to as parameter information), and outputs the action information to the instruction transmitting unit 45.

本実施の形態の一例では、サーバ１０の記憶部１２には、端末装置２０への指示を生成するための情報として、発生条件と、トリガを特定する情報と、ユーザにより入力された音声の内容を表す情報と比較する情報（以下、比較文字列情報と呼ぶ。ただしこの比較文字列情報は、トリガの種類によってはなくてもよい）と、アクション情報の生成のためにサーバ１０が実行する処理を表す情報とを互いに関連づけたレコードを少なくとも一つ含む、アクションデータベースが格納されているものとする。 In one example of this embodiment, the memory unit 12 of the server 10 stores an action database containing at least one record that associates the following information for generating instructions to the terminal device 20: occurrence conditions, information identifying the trigger, information to be compared with information representing the content of the voice input by the user (hereinafter referred to as comparison string information; however, this comparison string information may not be necessary depending on the type of trigger), and information representing the process to be executed by the server 10 to generate action information.

アクション情報生成部４４は、受信部４１から入力されるトリガを特定する情報に関連付けられた、比較文字列情報（あれば）とアクション情報の生成のためにサーバ１０が実行するべき処理を表す情報とを取得する。 The action information generation unit 44 acquires comparison string information (if any) associated with the trigger-identifying information input from the receiving unit 41 and information indicating the processing that the server 10 should perform to generate action information.

そしてアクション情報生成部４４は、比較文字列情報が取得されれば（トリガを特定する情報に比較文字列情報が関連付けられていれば）、受信部４１が出力する文字列情報と当該比較文字列情報とを比較する。そして、アクション情報生成部４４は、受信部４１が出力する文字列情報が比較文字列情報に一致していると判断すると、取得した情報が表す処理を実行して、アクション情報を生成する。 If comparison string information is acquired (if comparison string information is associated with information identifying the trigger), the action information generation unit 44 compares the comparison string information with the string information output by the receiving unit 41. If the action information generation unit 44 determines that the string information output by the receiving unit 41 matches the comparison string information, it executes the process indicated by the acquired information and generates action information.

また、アクション情報生成部４４は、比較文字列情報が取得されていなければ、上記取得した情報が表す処理を実行して、アクション情報を生成する。 Furthermore, if comparison string information has not been acquired, the action information generation unit 44 executes the processing indicated by the acquired information to generate action information.

具体的な例として、ここではアクションデータベースには、「ユーザによる音声入力があった」旨のトリガを特定する情報と、ユーザにより入力された音声の内容を表す情報と比較するべき比較文字列情報として「＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊」などといった文字列の情報とに「ニュースの文字列情報を、インターネット上の所定のウェブサーバから取得し、当該文字列情報を読み上げるよう指示する」との情報を関連付けたレコードが記録されているものとする。 As a specific example, assume that the action database contains a record that associates (a) information identifying the trigger "voice input by the user occurred" and (b) comparison string information to be compared with the content of the user's voice input, such as "＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊" (roughly, "*news [object particle | topic particle] * [isn't there | tell me | read it aloud] *"), with (c) the information "obtain news string information from a specified web server on the Internet and instruct that the string information be read aloud."

なお、この比較文字列情報も正規表現で表されているものとする。従って上記の文字列は、「今日のニュースを教えて」や、「何かニュースはない？」といった文字列情報に合致することとなる。 Note that this comparison string information is also expressed as a regular expression. Therefore, the above string will match string information such as "Tell me today's news" or "Is there any news?"
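The lookup and comparison can be sketched with Python's `re` module. The pattern below is a hand translation of the comparison string, assuming that "＊" means any run of characters and "［a｜b］" means alternation; the record layout and process names are hypothetical, and the occurrence conditions held in each record are omitted.

```python
import re

# Hypothetical action database: trigger, optional comparison pattern, process name.
ACTION_DB = [
    {
        "trigger": "voice_input",
        "pattern": re.compile(r".*ニュース(?:を|は).*(?:ない|教えて|読みあげて).*"),
        "process": "read_news",
    },
    # A trigger that needs no comparison string, such as a scheduled time.
    {"trigger": "scheduled_time", "pattern": None, "process": "select_conversation"},
]


def find_process(trigger, text=None):
    """Return the process for the first record matching the trigger and text."""
    for record in ACTION_DB:
        if record["trigger"] != trigger:
            continue
        pattern = record["pattern"]
        if pattern is None:
            return record["process"]  # no comparison string: run unconditionally
        if text is not None and pattern.fullmatch(text):
            return record["process"]
    return None
```

With this pattern, both 「今日のニュースを教えて」 and 「何かニュースはない？」 resolve to the `read_news` process, while an utterance that does not mention news returns `None`.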

アクション情報生成部４４は、例えば受信部４１から「ユーザによる音声入力があった」旨のトリガを特定する情報と、ユーザにより入力された音声の内容を表す情報として「何かニュースはない？」といった文字列情報との入力を受け入れると、当該トリガを特定する情報を含むレコードをアクションデータベースから検索する。 For example, when the action information generation unit 44 receives input from the receiving unit 41, such as information identifying a trigger that "there has been voice input by the user" and character string information such as "Is there any news?" representing the content of the voice input by the user, it searches the action database for a record containing the information identifying the trigger.

ここではアクション情報生成部４４は、上記のレコードをアクションデータベースから見出すこととなり、当該レコードに含まれる、比較文字列情報と、受け入れた文字列情報とを比較する。上記の例では受け入れた文字列情報「何かニュースはない？」が、比較文字列情報「＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊」に合致すると判断されるので、アクション情報生成部４４は、検索で見出した上記のレコードに含まれる、サーバ１０が実行するべき処理を表す情報、例えば「（ステップ１）ニュースの文字列情報を、インターネット上の所定のウェブサーバから取得する、
（ステップ２）当該文字列情報を読み上げる指示を生成
（ステップ３）読み上げのときに再生するアニメーション情報を表示させる指示を生成する」を取得して、この情報に従った処理を実行する。
Here, the action information generation unit 44 finds the above record in the action database and compares the comparison string information contained in the record with the accepted string information. In the above example, the accepted string information "何かニュースはない？" ("Is there any news?") is judged to match the comparison string information "＊ニュース［を｜は］＊［ない｜教えて｜読みあげて］＊", so the action information generation unit 44 acquires the information, contained in the record found by the search, that represents the process to be executed by the server 10, for example:
(Step 1) obtain news string information from a specified web server on the Internet;
(Step 2) generate an instruction to read out the string information;
(Step 3) generate an instruction to display the animation information to be played during the reading.
It then executes processing in accordance with this information.

すなわちアクション情報生成部４４は、この読み出した情報に従って、インターネット上の所定のウェブサーバからニュースの文字列情報を取得する。またアクション情報生成部４４は、並列して行われるアクション処理の実行とともに表示するべきアニメーションの画像データを特定する情報（画像データのファイル名でよい）を含むアニメーション情報の表示指示を生成してもよい。 That is, in accordance with the information it has read, the action information generation unit 44 obtains news string information from a specified web server on the Internet. The action information generation unit 44 may also generate a display instruction for animation information. This instruction includes information (for example, the file name of the image data) identifying the animation image data to be displayed while the action processing is executed in parallel.

そしてこの例では、アクション情報生成部４４は、アクション指示とパラメータ情報とを含んだアクション情報を生成して指示送信部４５に出力する。ここでアクション指示には、文字列情報を読み上げるべき旨の指示と、アニメーション情報の表示指示とを含む。また、パラメータ情報には、上記取得した文字列情報と、アニメーションの画像データを特定する情報とを含む。 In this example, the action information generation unit 44 generates action information including an action instruction and parameter information and outputs it to the instruction transmission unit 45. Here, the action instruction includes an instruction to read out string information and an instruction to display animation information. Furthermore, the parameter information includes the acquired string information and information specifying the image data of the animation.
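The resulting action information can be pictured as a small structure holding the action instruction and the parameter information. The key names, instruction names, and file name below are hypothetical; the specification does not fix a wire format.

```python
def build_action_info(text: str, animation_file: str) -> dict:
    """Bundle the action instruction and parameter information for the terminal device."""
    return {
        # Action instruction: read the text aloud and show an animation.
        "action_instructions": ["speak_text", "show_animation"],
        # Parameter information needed to carry out those instructions.
        "parameters": {"text": text, "animation": animation_file},
    }


action_info = build_action_info("Here is today's news.", "mouth_speaking.png")
```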

またここで、サーバ１０が実行するべき処理を表す情報には「会話文の選択」の指示が含まれてもよい。このような指示が含まれる場合、アクション情報生成部４４は、当該指示に従い、例えば次のような方法で会話文を選択する。 In addition, the information representing the process to be executed by the server 10 may include an instruction to "select a conversation sentence." If such an instruction is included, the action information generation unit 44 will select a conversation sentence in accordance with the instruction, for example, in the following manner.

ここではサーバ１０の記憶部１２に格納される会話文キューは、図６に例示するように、期限（Ｔ）と、発話条件（Ｃ）と、会話文（Ｄ）とを関連付けて格納したものとなる。 Here, the conversation sentence queue stored in the storage unit 12 of the server 10 stores a deadline (T), an utterance condition (C), and a conversation sentence (D) in association with each other, as illustrated in FIG. 6.

アクション情報生成部４４は、会話文キューに格納されている会話文のうち、当該会話文に関連付けられた発話条件を満足する会話文を抽出する。ここで発話条件は例えば、発話してもよい時間帯を表す情報等である。この発話条件を満足するか否かの判断に必要な種々の情報、例えば現在日時（処理を実行している日時）の情報や、気象情報等は、アクション情報生成部４４が、ネットワークを介してＮＴＰ（Network Time Protocol）サーバや、所定のウェブサーバから取得すればよい。 The action information generation unit 44 extracts conversation sentences from the conversation sentences stored in the conversation sentence queue that satisfy the speech conditions associated with the conversation sentence. Here, the speech conditions are, for example, information indicating the time period during which speech is permitted. The action information generation unit 44 can obtain various information required to determine whether the speech conditions are satisfied, such as the current date and time (the date and time when processing is being performed) and weather information, from an NTP (Network Time Protocol) server or a specified web server via the network.

アクション情報生成部４４は、発話条件を満足するとして抽出した会話文のうちから一つを例えばランダムに選択する。なお、このときアクション情報生成部４４は、抽出した会話文に関連付けられている期限の情報を会話文キューから読み出し、当該期限が既に経過していたときには、会話文キューから抽出した会話文を削除してもよい。この場合、アクション情報生成部４４は、発話条件を満足するとして抽出した他の会話文（未選択の会話文）を選択する処理から繰り返す。 The action information generation unit 44 selects, for example, randomly, one of the conversation sentences extracted as satisfying the utterance conditions. At this time, the action information generation unit 44 may read information about the deadline associated with the extracted conversation sentence from the conversation sentence queue, and delete the extracted conversation sentence from the conversation sentence queue if the deadline has already passed. In this case, the action information generation unit 44 repeats the process from selecting another conversation sentence (an unselected conversation sentence) extracted as satisfying the utterance conditions.
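The selection loop described above (filter by utterance condition, pick at random, discard expired entries and retry) can be sketched as follows. Modeling each utterance condition as a callable that takes the current time is an assumption of this example, as are the field names.

```python
import random
from datetime import datetime
from typing import Optional


def pick_sentence(queue: list, now: datetime, rng=random) -> Optional[dict]:
    """Pick one speakable entry from the queue, deleting expired ones on the way."""
    # Keep only entries whose utterance condition (C) holds right now.
    candidates = [e for e in queue if e["condition"](now)]
    while candidates:
        entry = candidates.pop(rng.randrange(len(candidates)))
        deadline = entry["deadline"]
        if deadline is not None and deadline < now:
            # Deadline (T) has passed: delete it from the queue and try another.
            queue[:] = [e for e in queue if e is not entry]
            continue
        return entry
    return None
```

An entry with no deadline (`None`) never expires in this sketch, which suits ordinary conversation sentences that are not advertisements.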

アクション情報生成部４４は、選択した会話文が所定の基準を満足するか否かを判断する。例えばアクション情報生成部４４は、後に説明する指示送信部４５が記録している、リクエストを送出した端末装置２０のユーザに係る会話の履歴を読み出す。そしてアクション情報生成部４４は、現在記録されている会話の履歴に続いて、上記選択した会話文が発話されたときに自然な会話となるか否かを判断する。つまり、この例では、上記所定の基準は、履歴にある会話文と、選択された広告文とが連続性を有する、との基準となる。 The action information generation unit 44 determines whether the selected conversational text satisfies a predetermined criterion. For example, the action information generation unit 44 reads the conversation history relating to the user of the terminal device 20 that sent the request, which is recorded by the instruction transmission unit 45 (described later). The action information generation unit 44 then determines whether the selected conversational text will result in a natural conversation when spoken following the currently recorded conversation history. In other words, in this example, the predetermined criterion is that there is continuity between the conversational text in the history and the selected advertising text.

この判断は例えば、人間同士の間でなされた会話のテキストを機械学習したニューラルネットワーク等を用いて、現在記録されている会話の履歴に続く文として妥当であるか否か、すなわち会話に連続性があるか否かを判断させることで実現できる。このような処理は、いわゆる次文予測（Next Sentence Prediction：ＮＳＰ）として知られる処理である。次文予測を行うためのニューラルネットワークとしては、例えばＢＥＲＴとして知られるモデル（https://arxiv.org/pdf/1706.03762.pdf）を利用できる。このような次文予測を行うための機械学習の学習用データとしては、一対の会話文（第１の会話文と第２の会話文とする）と、当該第１，第２の会話文の連続性を表す情報とを互いに関連付けたものを用いる方法等、広く知られた学習用データ並びに、それを用いた機械学習処理方法を採用できる。 This determination can be made, for example, by using a neural network or the like that has been trained by machine learning on the text of conversations between people, and having it judge whether a sentence is appropriate as a continuation of the currently recorded conversation history, i.e., whether the conversation is continuous. Such processing is known as next sentence prediction (NSP). As a neural network for next sentence prediction, for example, the model known as BERT (https://arxiv.org/pdf/1706.03762.pdf) can be used. For the training data, widely known training data and machine learning methods that use it can be adopted, such as a method that uses pairs of conversational sentences (a first conversational sentence and a second conversational sentence), each pair associated with information indicating whether the first and second sentences are continuous.

このようにニューラルネットワークを利用して現在記録されている会話の履歴に続く文として、選択した会話文が妥当であるか否か、つまり会話の連続性を判断させた場合、ニューラルネットワークの出力は、その妥当性を数値として表したものとなる。そこでアクション情報生成部４４は、予め定めたしきい値を超える数値となるときに、自然な会話となると判断（連続性ありと判断）する。 When a neural network is used in this way to determine whether a selected conversational sentence is appropriate as a continuation of the currently recorded conversation history, i.e., whether there is continuity in the conversation, the output of the neural network represents that appropriateness as a numerical value. Therefore, the action information generation unit 44 determines that the conversation is natural (determines that there is continuity) when the numerical value exceeds a predetermined threshold value.
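The threshold decision described above can be sketched as follows. This is a minimal illustration, not the implementation in this embodiment: the names `is_natural_continuation` and `toy_scorer`, the threshold value 0.5, and the word-overlap scoring are all assumptions made for the example. In practice the scorer would be a next sentence prediction model, and the only property relied on here is that it returns a larger value for a more appropriate continuation.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: (conversation history, candidate sentence) -> validity value.
ContinuityScorer = Callable[[Sequence[str], str], float]


def is_natural_continuation(
    history: Sequence[str],
    candidate: str,
    scorer: ContinuityScorer,
    threshold: float = 0.5,  # assumed value; the source only says "predetermined"
) -> bool:
    """Return True only when the scorer's value exceeds the threshold."""
    return scorer(history, candidate) > threshold


def toy_scorer(history: Sequence[str], candidate: str) -> float:
    """Stand-in for an NSP model: share of candidate words found in the last utterance."""
    if not history:
        return 0.0
    last_words = set(history[-1].lower().split())
    candidate_words = candidate.lower().split()
    if not candidate_words:
        return 0.0
    return sum(w in last_words for w in candidate_words) / len(candidate_words)


history = ["I wish I could sleep more deeply"]
print(is_natural_continuation(history, "sleep more deeply with AA", toy_scorer))
print(is_natural_continuation(history, "nice weather today", toy_scorer))
```

Passing the scorer as a parameter keeps the threshold logic independent of whichever model produces the value.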

そしてアクション情報生成部４４は、上記のように連続性ありとの判断をしたときには、当該会話文（この会話文は前段の会話を引き継ぐ応答文でもあるので、他の会話文との区別を要する場合はここでは応答文と呼ぶ）の文字列情報を読み上げるべき旨の指示と、アニメーション情報の表示指示とを含むアクション指示を生成する。またアクション情報生成部４４は、上記選択した会話文の文字列情報と、アニメーションの画像データを特定する情報とを含むパラメータ情報を生成し、当該生成したアクション指示とパラメータ情報とをアクション情報として、端末装置２０へ送出するよう、指示送信部４５に指示する。 When the action information generation unit 44 determines that there is continuity as described above, it generates an action instruction including an instruction to read out the string information of the conversational sentence (this conversational sentence is also a response sentence that continues the previous conversation, and therefore will be referred to as a response sentence here when it is necessary to distinguish it from other conversational sentences) and an instruction to display animation information. The action information generation unit 44 also generates parameter information including the string information of the selected conversational sentence and information specifying image data of the animation, and instructs the instruction transmission unit 45 to send the generated action instruction and parameter information as action information to the terminal device 20.

指示送信部４５は、アクション情報生成部４４が生成したアクション情報を、受信部４１が受信したリクエスト情報の送信元である端末装置２０に対して送出する。なお、このとき指示送信部４５は、ユーザごとに、受信部４１が受け入れた、当該ユーザにより入力された音声の内容と、アクション情報生成部４４が指示した、当該ユーザの端末装置２０により発話される会話文の内容とを順次記録する。 The instruction sending unit 45 sends the action information generated by the action information generating unit 44 to the terminal device 20 that is the sender of the request information received by the receiving unit 41. At this time, the instruction sending unit 45 sequentially records, for each user, the content of the voice input by the user that is accepted by the receiving unit 41 and the content of the conversation that is instructed by the action information generating unit 44 to be spoken by the terminal device 20 of that user.

次に、端末装置２０の制御部３１の動作について説明する。本実施の形態では、制御部３１は、図７に例示するように、リクエスト送出部５１と、アクション情報受信部５２と、音声合成部５３と、アクション処理実行部５４とを機能的に含んで構成される。 Next, the operation of the control unit 31 of the terminal device 20 will be described. In this embodiment, the control unit 31 functionally includes a request sending unit 51, an action information receiving unit 52, a voice synthesis unit 53, and an action processing execution unit 54, as illustrated in FIG. 7.

リクエスト送出部５１は、予め定められたトリガが発生したと判断すると、サーバ１０での処理に必要な情報を収集して、当該トリガを特定する情報とともに、当該収集した情報を含むリクエスト情報をサーバ１０へ送出する。具体的にここでは、ユーザにより音声が入力されたことや、所定の時刻になったなどといったトリガを予め列挙して、設定情報に含め、記憶部３２に格納しておく。 When the request sending unit 51 determines that a predetermined trigger has occurred, it collects information necessary for processing by the server 10 and sends request information including the collected information along with information identifying the trigger to the server 10. Specifically, triggers such as voice input by the user or the arrival of a specified time are listed in advance, included in the setting information, and stored in the memory unit 32.

リクエスト送出部５１は、図４に例示した設定情報を参照して、発生条件（Ｃ）が満足されたと判断すると、当該満足した発生条件に関連付けられたトリガ（Ｎ）が発生したとして、当該トリガに関連付けられた、サーバ１０での処理に必要な情報（Ｐ）を参照する。 When the request sending unit 51 determines that the occurrence condition (C) is satisfied by referencing the setting information exemplified in Figure 4, it determines that a trigger (N) associated with the satisfied occurrence condition has occurred, and refers to information (P) associated with the trigger that is necessary for processing on the server 10.

なお、リクエスト送出部５１は、発生条件（Ｃ）ごとに、前回、当該発生条件が満足され、リクエスト情報を送出した日時を記録しておいてもよい。そしてリクエスト送出部５１は、満足された発生条件（Ｃ）に関連して記録された上記日時と現在日時との差が、発生条件に関連付けて設定情報に記録された（つまり例えば発生条件ごとに設定された）インターバル（Ｔ）を越えない場合は、リクエスト情報の送出を行わないよう制御してもよい。 The request sending unit 51 may record, for each occurrence condition (C), the date and time when the occurrence condition was last satisfied and request information was sent. The request sending unit 51 may then perform control so as not to send request information if the difference between the date and time recorded in association with the satisfied occurrence condition (C) and the current date and time does not exceed the interval (T) recorded in the setting information in association with the occurrence condition (i.e., set for each occurrence condition, for example).
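The interval control described above can be sketched as follows, assuming a hypothetical `RequestThrottle` class and hypothetical condition names. It records the time of the last sent request per occurrence condition and suppresses a new request while the elapsed time does not exceed the interval (T) set for that condition.

```python
from datetime import datetime, timedelta
from typing import Dict, Mapping


class RequestThrottle:
    """Per-condition send throttle (illustrative; names are assumptions)."""

    def __init__(self, intervals: Mapping[str, timedelta]) -> None:
        self._intervals = dict(intervals)            # condition -> interval (T)
        self._last_sent: Dict[str, datetime] = {}    # condition -> last send time

    def should_send(self, condition: str, now: datetime) -> bool:
        last = self._last_sent.get(condition)
        interval = self._intervals.get(condition)
        # Suppress when the elapsed time does not exceed the interval.
        if last is not None and interval is not None and now - last <= interval:
            return False
        # Record the time only when a request is actually sent.
        self._last_sent[condition] = now
        return True


throttle = RequestThrottle({"wake_word": timedelta(seconds=30)})
t0 = datetime(2024, 1, 1, 9, 0, 0)
print(throttle.should_send("wake_word", t0))
print(throttle.should_send("wake_word", t0 + timedelta(seconds=10)))
```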

そしてリクエスト送出部５１は、当該参照した情報で特定される、サーバ１０での処理に必要な情報を収集し、当該収集した情報と、発生したトリガを特定する情報（トリガ名でよい）とを含むリクエスト情報を、サーバ１０へ送出する。 The request sending unit 51 then collects the information required for processing on the server 10, as identified by the referenced information, and sends request information to the server 10, including the collected information and information identifying the trigger that occurred (which may be the trigger name).

アクション情報受信部５２は、サーバ１０からアクション情報を受信して、当該受信したアクション情報を、アクション処理実行部５４に出力する。 The action information receiving unit 52 receives action information from the server 10 and outputs the received action information to the action processing execution unit 54.

音声合成部５３は、後に説明するアクション処理実行部５４から入力される文字列情報に基づいて、音声データを合成する。この音声合成部５３は、合成して得られた音声データを、アクション処理実行部５４に出力する。 The voice synthesis unit 53 synthesizes voice data based on character string information input from the action processing execution unit 54, which will be described later. The voice synthesis unit 53 outputs the synthesized voice data to the action processing execution unit 54.

アクション処理実行部５４は、サーバ１０が送出したアクション情報から、アクション指示とパラメータ情報とを取り出し、アクション指示に従って処理を実行する。具体的に、上述の例のように、当該取得した文字列情報を読み上げるべき旨の指示と、上記アニメーション情報の表示指示とを含むアクション指示、及び、取得した文字列情報と、アニメーションの画像データとを含むパラメータ情報を含んだアクション情報を、アクション情報受信部５２が受信した場合について説明する。 The action processing execution unit 54 extracts the action instruction and parameter information from the action information sent by the server 10 and executes processing in accordance with the action instruction. Specifically, as in the example above, we will explain the case where the action information receiving unit 52 receives action information that includes an instruction to read out the acquired character string information and an instruction to display the animation information, as well as parameter information that includes the acquired character string information and image data of the animation.

この例ではアクション処理実行部５４は、取得した文字列情報を音声合成部５３に出力して、音声データを取得する。また、アクション処理実行部５４は、アクション情報に含まれる情報で特定されるアニメーションの画像データを記憶部３２から読み出す。そしてアクション処理実行部５４は、音声合成部５３が出力した音声データを、音声出力部３５に出力して、音声を鳴動させるとともに、上記読み出したアニメーションの画像データを表示部３４に出力して、アニメーションの画像データを再生する。 In this example, the action processing execution unit 54 outputs the acquired character string information to the voice synthesis unit 53 to obtain audio data. The action processing execution unit 54 also reads, from the storage unit 32, the image data of the animation specified by the information included in the action information. The action processing execution unit 54 then outputs the audio data produced by the voice synthesis unit 53 to the audio output unit 35 to play the audio, and outputs the read animation image data to the display unit 34 to play the animation.

［動作］ [Operation]
本実施の形態の情報処理システム１は、以上の構成を備えており、次の例のように動作する。なお以下の例ではサーバ１０の記憶部１２には、アクションの要求の原因であるトリガごとに、アクション情報の生成のためにサーバ１０が実行する処理を表す情報が関連付けられて、アクションデータベースとして格納されているものとする。 The information processing system 1 of this embodiment has the above configuration and operates as in the following example. In the following example, it is assumed that the storage unit 12 of the server 10 stores, as an action database, information representing the processing that the server 10 executes to generate action information, associated with each trigger that causes an action request.

以下の例では、このアクションデータベースに含まれる情報の例として、
・トリガを特定する情報（Ｔ）：ユーザが会話をしている
・実行する処理：
（ステップ１）会話文の文字列情報を選択
（ステップ２）当該文字列情報を読み上げる指示を生成
（ステップ３）読み上げのときに再生するアニメーション情報を表示させる指示を生成する
との情報が含まれるものとする。

In the following example, the action database is assumed to contain the following information:
・Information identifying the trigger (T): the user is having a conversation
・Processing to be executed:
(Step 1) Select the character string information of a conversational sentence
(Step 2) Generate an instruction to read out the character string information
(Step 3) Generate an instruction to display the animation information to be played during the read-out

また、サーバ１０の記憶部１２には、広告の情報として、
広告の対象（Ｐ）：睡眠改善薬ＡＡ
・広告識別子：１
・広告文（Ｅ）：
（１）「睡眠の質を改善するには睡眠改善薬ＡＡ」
（２）「睡眠改善薬ＡＡ、試してみた？」
・出稿者（Ａ）：株式会社Ａ
・ジャンル（Ｇ）：睡眠，深い，眠る，不眠，睡眠不足，睡眠改善薬，…
・ＵＲＬ（Ｕ）：http://…
広告の対象（Ｐ）：睡眠改善薬ＢＢ
・広告識別子：２
・広告文（Ｅ）：
（１）「病院でも使われている睡眠改善薬ＢＢ」
（２）「睡眠改善薬ＢＢは使ってみた？」
・出稿者（Ａ）：株式会社Ｂ
・ジャンル（Ｇ）：睡眠，入眠，不眠症，睡眠不足，睡眠改善薬，…
・ＵＲＬ（Ｕ）：http://…
…といったような広告の情報が格納されているものとする。

The storage unit 12 of the server 10 is also assumed to store advertisement information such as the following:
Advertising subject (P): sleep aid AA
・Advertisement identifier: 1
・Advertising copy (E):
(1) "To improve the quality of your sleep, use sleep aid AA"
(2) "Have you tried sleep aid AA?"
・Advertiser (A): Company A
・Genre (G): sleep, deep, to sleep, sleeplessness, sleep deprivation, sleep aid, ...
・URL (U): http://...
Advertising subject (P): sleep aid BB
・Advertisement identifier: 2
・Advertising copy (E):
(1) "Sleep aid BB, also used in hospitals"
(2) "Have you tried sleep aid BB?"
・Advertiser (A): Company B
・Genre (G): sleep, falling asleep, insomnia, sleep deprivation, sleep aid, ...
・URL (U): http://...

さらに端末装置２０の記憶部３２は、設定情報として、図４に例示したように、トリガごとに、発生条件（Ｃ）や当該トリガに関係してサーバ１０での処理に必要な情報を特定する情報（Ｐ）等を関連付けて格納している。 Furthermore, as illustrated in FIG. 4, the storage unit 32 of the terminal device 20 stores, as setting information, the occurrence condition (C) and the information (P) specifying the information required for processing on the server 10 in relation to the trigger, associated with each trigger.

以下の例では、この設定情報に、
・トリガを特定する情報（トリガ名Ｎ）：ユーザによる音声入力があった
・発生条件（Ｃ）：ユーザが所定のウェイクワードを発声した
・サーバ１０での処理に必要な情報を特定する情報（Ｐ）：ユーザが発話した内容の文字列情報
・トリガを特定する情報（トリガ名Ｎ）：ユーザが会話をしている
・発生条件（Ｃ）：音声データ鳴動後、１０秒以内にユーザが発声した
・サーバ１０での処理に必要な情報を特定する情報（Ｐ）：ユーザが発話した内容の文字列情報
…といった情報が含まれるものとする。ここでウェイクワードとは、ユーザがその語を発話したときに、音声入力の開始として認識するべき、「ねえ聞いてよ」や「起きてよ」等の語であり、予め定められているものとする。端末装置２０は、サーバ１０での処理に必要となるユーザが発話した内容の文字列情報から、このウェイクワードに相当する文字列部分を除いてもよい。

In the following example, the setting information is assumed to include information such as:
・Information identifying the trigger (trigger name N): there was voice input by the user
・Occurrence condition (C): the user uttered a predetermined wake word
・Information specifying the information required for processing on the server 10 (P): character string information of what the user said
・Information identifying the trigger (trigger name N): the user is having a conversation
・Occurrence condition (C): the user spoke within 10 seconds after audio data was played
・Information specifying the information required for processing on the server 10 (P): character string information of what the user said
Here, a wake word is a predetermined word or phrase, such as "Hey, listen to me" or "Wake up," that should be recognized as the start of voice input when the user utters it. The terminal device 20 may remove the portion corresponding to the wake word from the character string information of what the user said that is required for processing on the server 10.

またここでは発生条件（Ｃ）として、ユーザがウェイクワードを発声したことを条件としているが本実施の形態はこれに限られるものではなく、他の条件であっても構わない。 Furthermore, here, the occurrence condition (C) is the user uttering the wake word, but this embodiment is not limited to this and other conditions may also be used.

以下、このような設定の情報等を保持するサーバ１０と、端末装置２０との動作について、図８，図９を参照しながら説明する。 The operation of the server 10, which stores such setting information, and the terminal device 20 will be explained below with reference to Figures 8 and 9.

端末装置２０がユーザからの指示に従って、ニュースの情報等の音声データを鳴動した後、ユーザが当該音声データに対して反応して、端末装置２０に対し「私も最近睡眠不足でさあ…」などと発話する（図８のＳ１１）と、端末装置２０はこのユーザの音声を認識する処理を実行して（Ｓ１２）、ユーザが発話した音声に対応する文字列情報を取得する。既に述べたように、音声認識の処理は端末装置２０自身が行わなくても、ネットワークを介して音声認識処理のサービスにアクセスすることで行ってもよい。 After the terminal device 20 sounds audio data such as news information in accordance with instructions from the user, the user may respond to the audio data by saying something like, "I've been getting less sleep lately, too..." to the terminal device 20 (S11 in FIG. 8). The terminal device 20 then executes a process to recognize the user's voice (S12) and acquires character string information corresponding to the voice spoken by the user. As already mentioned, the voice recognition process does not have to be performed by the terminal device 20 itself; it can also be performed by accessing a voice recognition service via a network.

端末装置２０は、設定情報を参照して、いずれかのトリガの発生条件が満足されたかを調べる（Ｓ１３）。ここでは端末装置２０の音声データ鳴動後、すぐにユーザが音声を発声しているので、「ユーザが会話をしている」旨のトリガが発生したものとして（Ｓ１３：Ｙｅｓ）、設定情報に従い、ユーザが発話した内容の文字列情報を収集する。なお、ステップＳ１３において、いずれのトリガの発生条件も満足していないと判断すると（Ｓ１３：Ｎｏ）、端末装置２０は処理を終了する。 The terminal device 20 refers to the setting information to check whether the occurrence condition of any trigger has been satisfied (S13). Here, since the user speaks immediately after the terminal device 20 plays audio data, the trigger "the user is having a conversation" is deemed to have occurred (S13: Yes), and the character string information of what the user said is collected in accordance with the setting information. If it is determined in step S13 that no trigger's occurrence condition has been satisfied (S13: No), the terminal device 20 terminates processing.

ここでは、ユーザが発話した内容は既にステップＳ１２にて、ユーザが発話した内容の文字列情報を取得しているので、端末装置２０は、当該文字列情報と、発生したトリガを特定する情報（トリガ名「ユーザが会話をしている」）とを含むリクエスト情報をサーバ１０宛に送出する（Ｓ１４）。 In this case, since the content of what the user said has already been acquired as string information in step S12, the terminal device 20 sends request information to the server 10 that includes this string information and information identifying the trigger that occurred (trigger name "User is having a conversation") (S14).

サーバ１０では端末装置２０からのリクエスト情報を受信する。そしてサーバ１０は、当該リクエスト情報に含まれるトリガ名「ユーザが会話をしている」を参照して、当該トリガ名に関連付けられている情報を、アクションデータベースから取り出す（Ｓ１５）。またこのときサーバ１０は、リクエスト情報に含まれる会話文（ユーザの発話した内容を表す文字列情報）を、対応するユーザとの会話履歴に追加して蓄積する。 The server 10 receives the request information from the terminal device 20. The server 10 then refers to the trigger name "User is having a conversation" included in the request information and retrieves the information associated with that trigger name from the action database (S15). At this time, the server 10 also adds the conversational text included in the request information (character string information representing what the user said) to the conversation history with the corresponding user and stores it.

ステップＳ１５の処理では、サーバ１０で実行するべき処理を表す情報が取得される。そしてサーバ１０は当該取得した情報が表す処理を実行する（Ｓ１６）。ここでは会話文キューに含まれる会話文（例えば「えー、大丈夫？」）が応答文として一つ選択されて、当該応答文を読み上げる指示を生成し、また読み上げと並行してアニメーション情報を表示させる指示を生成してアクション情報として端末装置２０へ送出する処理が行われる。 In step S15, information representing the process to be executed by the server 10 is acquired. The server 10 then executes the process represented by the acquired information (S16). Here, one of the conversation sentences included in the conversation sentence queue (e.g., "Eh, are you okay?") is selected as a response sentence, an instruction to read out the response sentence is generated, and an instruction to display animation information in parallel with the reading out is generated and sent to the terminal device 20 as action information.

このときサーバ１０は、送出した応答文を、対応するユーザとの会話履歴に追加して蓄積する。 At this time, the server 10 adds the sent response sentence to the conversation history with the corresponding user and stores it.

またサーバ１０では、この処理と前後して、端末装置２０から受信したリクエスト情報に含まれる文字列情報（ユーザが入力した会話文）である、「私も最近睡眠不足でさあ…」を、例えば形態素解析により単語に区切り（各単語のうち活用のあるものは原形とする）、予め定められた予約語（予約語は原形で記録しておくものとする）として、「睡眠不足」の語を抽出する（Ｓ１７）。あるいはサーバ１０は、正規表現を用いたパターンマッチング等により、予め定められた予約語である「睡眠不足」の語を抽出してもよい。具体的にはユーザが入力した会話文に、予め定められた予約語のいずれかが含まれるか否かを順次調べ、いずれかの予約語が含まれると判断したときに、当該予約語が抽出されたものとすればよい。 Before or after this process, the server 10 also splits the character string information included in the request information received from the terminal device 20 (the conversational text entered by the user), "I've been getting less sleep lately, too...", into words, for example by morphological analysis (inflected words are converted to their base form), and extracts the predetermined reserved word "sleep deprivation" (睡眠不足) (reserved words are recorded in their base form) (S17). Alternatively, the server 10 may extract the predetermined reserved word "sleep deprivation" by pattern matching with regular expressions or the like. Specifically, the server 10 checks in turn whether the conversational text entered by the user contains any of the predetermined reserved words, and treats a reserved word as extracted when it determines that the text contains it.
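The pattern-matching alternative described above (checking each predetermined reserved word in turn) can be sketched as follows. The function name `extract_reserved_words` and the reserved-word list are assumptions for illustration; the morphological-analysis route is not shown because it depends on an external analyzer.

```python
import re
from typing import List, Sequence


def extract_reserved_words(utterance: str, reserved_words: Sequence[str]) -> List[str]:
    """Return the reserved words that occur in the utterance, in list order."""
    found = []
    for word in reserved_words:
        # re.escape treats the reserved word as a literal pattern.
        if re.search(re.escape(word), utterance):
            found.append(word)
    return found


# Hypothetical reserved-word list; the source only says the words are predetermined.
RESERVED = ["睡眠", "不眠", "睡眠不足"]
print(extract_reserved_words("私も最近睡眠不足でさあ…", RESERVED))
```

Note that literal matching also reports shorter reserved words contained in a longer one (here 睡眠 inside 睡眠不足), which word-level matching after morphological analysis would typically not do.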

サーバ１０は、次に、このユーザが入力した会話文に含まれる予約語を、ジャンル情報に含む広告文を選択する（Ｓ１８）。上述の例では、「睡眠の質を改善するには睡眠改善薬ＡＡ」と、「病院でも使われている睡眠改善薬ＢＢ」と、の少なくとも２つの広告文が選択される。なお、ここでは一つの広告識別子に関連付けて複数の広告文がある場合は、そのうち先頭の広告文を選択することとする。 The server 10 then selects advertising copy whose genre information includes a reserved word contained in the conversational text entered by the user (S18). In the above example, at least two advertising copies are selected: "To improve the quality of your sleep, use sleep aid AA" and "Sleep aid BB, also used in hospitals." Note that when multiple advertising copies are associated with one advertisement identifier, the first of them is selected here.
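The selection in step S18 can be sketched as follows, using the two example advertisements from the description. The `Ad` record and the function name `select_candidate_copies` are assumptions for illustration; the source does not specify how the advertisement information is stored internally.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Ad:
    """Hypothetical record for one advertisement entry."""
    ad_id: int
    advertiser: str
    copies: Tuple[str, ...]
    genres: FrozenSet[str]


ADS = [
    Ad(1, "株式会社Ａ",
       ("睡眠の質を改善するには睡眠改善薬ＡＡ", "睡眠改善薬ＡＡ、試してみた？"),
       frozenset({"睡眠", "深い", "眠る", "不眠", "睡眠不足", "睡眠改善薬"})),
    Ad(2, "株式会社Ｂ",
       ("病院でも使われている睡眠改善薬ＢＢ", "睡眠改善薬ＢＢは使ってみた？"),
       frozenset({"睡眠", "入眠", "不眠症", "睡眠不足", "睡眠改善薬"})),
]


def select_candidate_copies(reserved: Iterable[str], ads: Sequence[Ad]) -> List[Tuple[int, str]]:
    """For each ad whose genre list contains an extracted reserved word,
    return (ad identifier, first advertising copy)."""
    hits = set(reserved)
    return [(ad.ad_id, ad.copies[0]) for ad in ads if ad.copies and hits & ad.genres]


print(select_candidate_copies(["睡眠不足"], ADS))
```

A later step would narrow these candidates down to a single copy, for example at random, before registering it in the conversation sentence queue.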

例えば広告識別子：１に関わる広告文（Ｅ）には、
（１）「睡眠の質を改善するには睡眠改善薬ＡＡ」
（２）「睡眠改善薬ＡＡ、試してみた？」
の２つの広告文が関連付けられているものとしたが、ここではそのうち先頭の（１）「睡眠の質を改善するには睡眠改善薬ＡＡ」を選択するものとしておく。先頭の広告文以外の広告文が選択される例については後述する。

For example, the advertising copy (E) for advertisement identifier 1 was assumed to consist of two associated advertising copies:
(1) "To improve the quality of your sleep, use sleep aid AA"
(2) "Have you tried sleep aid AA?"
Here, the first of these, (1) "To improve the quality of your sleep, use sleep aid AA," is selected. An example in which an advertising copy other than the first one is selected is described later.

サーバ１０は、これらの選択した広告文からさらに、一つの広告文を絞り込んで選択する。この絞り込みの方法は既に述べたように、例えばランダムであってよい。ここでは、「睡眠の質を改善するには睡眠改善薬ＡＡ」が選択されたものとする。そしてサーバ１０は、選択した広告文を会話文キューに登録する（Ｓ１９）。 The server 10 further narrows down and selects one ad copy from these selected ad copies. As already mentioned, this narrowing down method may be random, for example. In this example, it is assumed that "To improve sleep quality, use sleep aid AA" has been selected. The server 10 then registers the selected ad copy in the conversation sentence queue (S19).

一方、端末装置２０は、ステップＳ１６でサーバ１０が送出したアクション情報に含まれる応答文を鳴動する（Ｓ２０）。ユーザが当該鳴動された応答文に対して回答するかたちで「もうちょっと深く眠れるといいんだけどね」などの会話文を発話すると、端末装置２０は、ステップＳ１２からＳ１４と同様の処理を実行して、この会話文の文字列情報を含むリクエスト情報を送信する（Ｓ２１）。 Meanwhile, the terminal device 20 plays the response sentence included in the action information sent by the server 10 in step S16 (S20). When the user replies to the played response sentence with a conversational sentence such as "I wish I could sleep a little more deeply," the terminal device 20 executes the same processing as steps S12 to S14 and transmits request information including the character string information of this conversational sentence (S21).

図９に移り、サーバ１０は、ステップＳ２１で端末装置２０が送出したリクエスト情報を受けて、このリクエスト情報に含まれる会話文（ユーザの発話した内容を表す文字列情報）を、対応するユーザとの会話履歴に追加して蓄積する。 Moving on to FIG. 9, the server 10 receives the request information sent by the terminal device 20 in step S21, adds the conversational text included in this request information (character string information representing what the user said) to the conversation history with the corresponding user, and stores it.

またサーバ１０は、当該リクエスト情報に含まれるトリガ名「ユーザが会話をしている」を参照して、当該トリガ名に関連付けられている情報を、アクションデータベースから取り出す（Ｓ２２）。ここでは、サーバ１０で実行するべき処理を表す情報が取得される。そしてサーバ１０は当該取得した情報が表す処理を実行する。ここでは会話文キューに含まれる会話文を、応答文として選択する処理が行われ、まず、ステップＳ１９で登録された広告文である「睡眠の質を改善するには睡眠改善薬ＡＡ」を含む会話文が抽出される（Ｓ２３）。 The server 10 also refers to the trigger name "User is having a conversation" included in the request information and retrieves the information associated with that trigger name from the action database (S22). Here, information representing the processing to be executed by the server 10 is obtained, and the server 10 executes the processing that this information represents. In this case, the processing selects a conversational sentence from the conversation sentence queue as the response sentence. First, the conversational sentences in the queue are extracted, including the advertising copy registered in step S19, "To improve the quality of your sleep, use sleep aid AA" (S23).

サーバ１０は、抽出した会話文について、順次、所定の基準を満足するか否かを判断する。具体的にサーバ１０は、抽出した会話文の一つ（未選択のもの）を候補として選択する（Ｓ２４）。そしてサーバ１０は、リクエスト情報を送出した端末装置２０のユーザに係る会話の履歴を参照し、現在蓄積されている会話の履歴に続いて、ステップＳ２４で選択した会話文が発話されたときに自然な会話となるか否かを判断する（Ｓ２５）。 The server 10 sequentially determines whether the extracted conversation sentences satisfy predetermined criteria. Specifically, the server 10 selects one of the extracted conversation sentences (an unselected one) as a candidate (S24). The server 10 then references the conversation history related to the user of the terminal device 20 that sent the request information, and determines whether the conversation sentence selected in step S24 would result in a natural conversation when spoken following the currently stored conversation history (S25).

一例としてサーバ１０は、人間同士の間でなされた会話のテキストを機械学習したＢＥＲＴのモデルを用いた次文予測処理により、ステップＳ２４で選択した会話文が、現在記録されている会話の履歴に続く文としての妥当性を表す数値（妥当であるほど大きい値となるものとする）を取得する。そしてサーバ１０は、取得した値が予め定めたしきい値を超える数値となるときに、自然な会話となる（連続性あり）と判断し、そうでないときには自然な会話とならない（連続性なし）と判断する。 As an example, the server 10 performs next sentence prediction using a BERT model trained on the text of conversations between humans, and obtains a numerical value indicating how appropriate the conversational sentence selected in step S24 is as a sentence following the currently recorded conversation history (the more appropriate the sentence, the larger the value). The server 10 then determines that the conversation is natural (there is continuity) when the obtained value exceeds a predetermined threshold, and otherwise determines that the conversation is not natural (there is no continuity).

具体的に、ここまでの処理で、サーバ１０には、ステップＳ２１でリクエスト情報を送出した端末装置２０のユーザとの間で、ユーザ：「私も最近睡眠不足でさあ…」端末装置２０の発話：「えー、大丈夫？」ユーザ：「もうちょっと深く眠れるといいんだけどね」といった会話が履歴として蓄積されている。 Specifically, through the processing up to this point, the server 10 has accumulated the following conversation with the user of the terminal device 20 that sent the request information in step S21 as history: User: "I've been getting less sleep lately, too..." Terminal device 20: "Eh, are you okay?" User: "I wish I could sleep a little more deeply."

そこでサーバ１０は、これらの会話文に続いて、ステップＳ２４で選択した会話文が、現在記録されている会話の履歴に続く文として妥当であるか否かを判断することとなる。 Then, following these conversational sentences, the server 10 determines whether the conversational sentence selected in step S24 is appropriate as a sentence that follows the currently recorded conversation history.

サーバ１０は、ステップＳ２５において、選択した会話文が発話されたときに自然な会話となると判断する（Ｓ２５：Ｙｅｓ）と、当該選択した会話文の文字列情報を読み上げるべき旨の指示を含むアクション情報を生成する（Ｓ２６）。そしてサーバ１０は、当該アクション情報を端末装置２０へ送出する（Ｓ２７）。 In step S25, if the server 10 determines that the selected conversational sentence will result in a natural conversation when spoken (S25: Yes), it generates action information including an instruction to read out the character string information of the selected conversational sentence (S26). The server 10 then sends the action information to the terminal device 20 (S27).

一方、ステップＳ２５において、選択した会話文が発話されたときに自然な会話とならないと判断する（Ｓ２５：Ｎｏ）と、サーバ１０は、処理Ｓ２４に戻って、未選択の会話文を選択する処理を続ける。なお、処理Ｓ２４において未選択の会話文がない場合は、予め定めた会話文の文字列情報を読み上げるべき旨の指示を含むアクション情報を生成して端末装置２０へ送出するなど、予め定めた処理を実行する（Ｓ２８：既定処理の実行）。 On the other hand, if it is determined in step S25 that the selected conversational sentence will not result in a natural conversation when spoken (S25: No), the server 10 returns to process S24 and continues the process of selecting an unselected conversational sentence. If there are no unselected conversational sentences in process S24, the server 10 executes a predetermined process, such as generating action information including an instruction to read out the character string information of a predetermined conversational sentence and sending it to the terminal device 20 (S28: Execute default process).
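The loop over steps S24 to S28 can be sketched as follows. The function name `choose_response` and the injected `is_natural` predicate are assumptions for illustration. The predicate stands in for the continuity determination of step S25, and the default sentence stands in for the predetermined processing of step S28.

```python
from typing import Callable, Iterable, Sequence


def choose_response(
    history: Sequence[str],
    candidates: Iterable[str],
    is_natural: Callable[[Sequence[str], str], bool],
    default_sentence: str,
) -> str:
    """Try each not-yet-selected candidate in order (S24) and return the first
    one judged to continue the history naturally (S25: Yes). When no candidate
    remains, fall back to a predetermined sentence (S28)."""
    for candidate in candidates:
        if is_natural(history, candidate):
            return candidate
    return default_sentence


history = ["I wish I could sleep a little more deeply"]
queue = ["By the way, it is sunny today", "To improve the quality of your sleep, use sleep aid AA"]
# Toy predicate for illustration only: accept a candidate mentioning "sleep".
print(choose_response(history, queue, lambda h, c: "sleep" in c, "I see."))
```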

端末装置２０では、サーバ１０がステップＳ２７、またはＳ２８で送出したアクション情報の指示に従い、文字列情報を読み上げる処理を実行する（Ｓ２９）。 The terminal device 20 executes a process of reading out the character string information in accordance with the instructions in the action information sent by the server 10 in step S27 or S28 (S29).

このサーバ１０のステップＳ２５の処理で、直前までの会話文の履歴に続いて、選択した広告文が発話されたときに自然な会話となると判断されると、サーバ１０はステップＳ２６で広告文を応答文として読み上げる指示を含むアクション情報を生成することとなる。 If the server 10 determines in step S25 that the selected advertisement copy will result in a natural conversation following the conversation history up to that point, the server 10 will generate action information in step S26 that includes an instruction to read the advertisement copy as a response.

そして端末装置２０では「睡眠の質を改善するには睡眠改善薬ＡＡ」という広告文が応答文として読み上げられることとなる。 Then, the terminal device 20 will read out the advertisement text "To improve your sleep quality, use sleep aid AA" as a response text.

［ＵＲＬ］ [URL]
なお、会話文キューには、会話文と関係してユーザに対して提供するべき、会話文以外の（鳴動される文字列情報以外の）情報が含まれていてもよい。例えば、サーバ１０は、広告文を選択して会話文キューに登録する際、当該広告文に関連付けられたＵＲＬの情報がある場合には、広告文を会話文として登録するとともに、当該ＵＲＬの情報を、会話文と関係してユーザに提供するべき情報として当該広告文に関連付けて会話文キューに登録しておく。 The conversation sentence queue may include information other than the conversational sentence (other than the character string information to be played) that should be provided to the user in relation to the conversational sentence. For example, when the server 10 selects an advertising copy and registers it in the conversation sentence queue, if there is URL information associated with the advertising copy, the server 10 registers the advertising copy as a conversational sentence and also registers the URL information in the conversation sentence queue, associated with the advertising copy, as information to be provided to the user in relation to the conversational sentence.

そしてサーバ１０は、会話文キューから広告文を選択して当該選択した広告文を鳴動する旨のアクション情報を生成したときには、広告文の鳴動とともに、予め定めた、「…詳しい情報のＵＲＬを送っておいたよ」というような文言を鳴動する指示を、当該生成したアクション情報に追加して、端末装置２０宛に送出する。 Then, when the server 10 selects an advertisement text from the conversation text queue and generates action information to sound the selected advertisement text, it adds to the generated action information an instruction to sound a predetermined message such as "...I've sent you a URL for more detailed information" in addition to sounding the advertisement text, and sends the action information to the terminal device 20.

そしてサーバ１０は、選択した広告文に関連付けて会話文キューに記録されているＵＲＬを本文に含む所定の電子メールを、アクション情報の宛先となった端末装置２０のユーザのメールアドレス（予め登録されているものとする）宛に送出する。 The server 10 then sends a specified email containing in its body the URL associated with the selected advertisement copy and recorded in the conversation text queue to the email address (assuming it is pre-registered) of the user of the terminal device 20 to which the action information is addressed.

これにより、端末装置２０では広告文が発話されるとともに、「詳しい情報のＵＲＬを送っておいたよ」との発話が行われ、このユーザは、自己宛のメールとして、当該広告に関連する情報を取得するためのＵＲＬを受信する。 As a result, the terminal device 20 will speak the advertisement copy and say, "I've sent you a URL with more detailed information," and the user will receive an email addressed to them containing a URL for obtaining information related to the advertisement.

また、ここではユーザのメールアドレスにＵＲＬを送信する例としたが、本実施の形態はこれに限られず、例えばユーザのスマートフォン等にインストールされた、サーバ１０や端末装置２０との通信を行うアプリケーション宛に送信されてもよい。この例では、サーバ１０は、選択した広告文に関連付けて会話文キューに記録されているＵＲＬを、アクション情報の宛先となった端末装置２０のユーザのスマートフォンにインストールされたアプリケーション宛に送信する。なお、アプリケーションに情報を表示させる技術としては、通知の技術等広く知られた技術を採用できるので、ここでの詳しい説明は省略する。 Furthermore, while the example given here is one in which a URL is sent to the user's email address, the present embodiment is not limited to this, and the URL may also be sent to an application installed on the user's smartphone or the like that communicates with the server 10 or the terminal device 20. In this example, the server 10 sends the URL associated with the selected advertisement copy and recorded in the conversation sentence queue to the application installed on the smartphone of the user of the terminal device 20 that is the destination of the action information. Note that widely known technologies such as notification technologies can be used as the technology for displaying information in an application, so a detailed explanation will be omitted here.

［過去に提供した広告に関する記録］ [Records of previously provided advertisements]
またここまでの説明では、会話文キューに登録する広告文を選択する際に、例えばジャンル情報に基づいて複数の広告文が選択されても、会話文として発話される広告文を絞り込んで選択する際に、ランダムな絞り込みを行うものとしていた。このため、ある日の会話から出稿者Ａによる広告文、「睡眠の質を改善するには睡眠改善薬ＡＡ」が発話されたとしても、翌日の会話からは出稿者Ｂによる広告文、「病院でも使われている睡眠改善薬ＢＢ」が発話されてしまうことがあり得る。 In the explanation so far, when advertising copy to be registered in the conversation sentence queue is selected, multiple advertising copies may be selected based on, for example, genre information, and the copy to be spoken as a conversational sentence is then narrowed down at random. Therefore, even if the advertising copy by advertiser A, "To improve the quality of your sleep, use sleep aid AA," is spoken in a conversation on one day, the advertising copy by advertiser B, "Sleep aid BB, also used in hospitals," may be spoken in a conversation on the following day.

つまり共通の広告対象について、互いに異なる出稿者からの広告文を発信してしまうことがあり得るが、これでは端末装置２０が一貫性のない会話を行うことになり、ユーザが端末装置２０を擬人的に扱うことができない。 In other words, for a common advertising subject, advertising copy from different advertisers may end up being delivered. The terminal device 20 would then hold inconsistent conversations, and the user would be unable to treat the terminal device 20 anthropomorphically.

そこで本実施の形態の一例では、サーバ１０は、ユーザの端末装置２０に対して、広告文を発話するよう指示したときに、ユーザごとに、当該広告文の広告識別子と、その発話を指示した日時とを関連付けて、広告履歴として記録する。そしてサーバ１０は広告文の選択の際に当該広告履歴を参照して選択を行う。 In one example of this embodiment, when the server 10 instructs a user's terminal device 20 to speak an advertisement copy, it associates the advertisement identifier of the advertisement copy with the date and time the instruction to speak was given, and records this as an advertisement history for each user. The server 10 then refers to the advertisement history when selecting an advertisement copy.

具体的に、サーバ１０は、ユーザが発話した会話文から抽出した予約語に基づいて、記憶部１２に格納されている広告文のうち、ユーザに提示する広告文を選択したときに、選択された広告文が複数あった場合は、次のようにして絞り込み選択を行う。すなわちサーバ１０は、広告履歴を参照し、選択されている広告文の広告識別子が記録されているか否かを検索する。そして選択されている広告文の広告識別子が広告履歴に記録されていれば、当該広告履歴に記録された発話を指示した日時のうち最も最近の日時の情報を取得する。これにより、予約語に基づいて選択した広告文のそれぞれについて、同じ出稿者からの同じ広告対象に係る広告文を最後に発話した日時の情報が得られる。 Specifically, when the server 10 selects advertising copy to present to the user from among those stored in the memory unit 12 based on reserved words extracted from conversational text spoken by the user, if there are multiple selected advertising copies, it narrows down the selection as follows: That is, the server 10 references the advertising history to check whether the advertising identifier of the selected advertising copy is recorded. If the advertising identifier of the selected advertising copy is recorded in the advertising history, it obtains information on the most recent date and time of the utterance instruction recorded in the advertising history. This provides information on the date and time when advertising copy related to the same advertising target from the same advertiser was last spoken for each of the advertising copies selected based on reserved words.

サーバ１０は、得られた日時情報のうち最も新しい日時の情報に係る広告文を絞り込み選択する。これにより、一度出稿者Ａによる広告文が発話された後は、その後も出稿者Ａによる広告文が発話されやすくなり、端末装置２０の会話の一貫性が維持される。 The server 10 narrows the selection to the advertising copy associated with the most recent of the obtained dates and times. As a result, once advertising copy from advertiser A has been spoken, copy from advertiser A is more likely to be spoken again afterwards, which keeps the conversation of the terminal device 20 consistent.
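The history-based narrowing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the in-memory `AdHistory` structure, the function names, and the fallback to the first candidate when no candidate has been spoken before are all assumptions made for the example.

```python
from datetime import datetime
from typing import Optional

# Hypothetical in-memory history: user_id -> list of (ad_id, instructed_at).
AdHistory = dict[str, list[tuple[int, datetime]]]


def record_utterance(history: AdHistory, user_id: str, ad_id: int, when: datetime) -> None:
    """Record that the server instructed the terminal to speak ad `ad_id`."""
    history.setdefault(user_id, []).append((ad_id, when))


def narrow_by_history(history: AdHistory, user_id: str, candidates: list[int]) -> Optional[int]:
    """Pick the candidate whose last utterance instruction is the most recent.

    Falls back to the first candidate when none of them has been spoken
    before (this fallback is an assumption; the text does not specify it).
    """
    if not candidates:
        return None
    last_spoken: dict[int, datetime] = {}
    for ad_id, when in history.get(user_id, []):
        if ad_id in candidates and (ad_id not in last_spoken or when > last_spoken[ad_id]):
            last_spoken[ad_id] = when
    if not last_spoken:
        return candidates[0]
    return max(last_spoken, key=last_spoken.__getitem__)


history: AdHistory = {}
record_utterance(history, "user-1", 2, datetime(2020, 1, 10, 20, 0))
record_utterance(history, "user-1", 1, datetime(2020, 1, 12, 21, 30))
chosen = narrow_by_history(history, "user-1", [1, 2, 3])  # ad 1 was spoken last
```

Keying the history by user keeps one user's conversation consistent without affecting which advertiser another user hears.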

また、この例のように、広告履歴を参照する場合、サーバ１０は、ユーザが発話した会話文から抽出した予約語を含むジャンル情報に関連付けられた広告識別子を取得し、当該取得した広告識別子に関連付けられた広告文のうち、先頭の広告文以外の広告文を選択することとしてもよい。 Furthermore, as in this example, when referencing the advertising history, the server 10 may acquire an advertising identifier associated with genre information containing reserved words extracted from the conversational text spoken by the user, and select an advertising copy other than the first advertising copy from among the advertising copies associated with the acquired advertising identifier.

この例によると、例えば広告識別子：１に関わる広告文（Ｅ）に（１）「睡眠の質を改善するには睡眠改善薬ＡＡ」（２）「睡眠改善薬ＡＡ、試してみた？」の２つがあるとき、初回は先頭の「睡眠の質を改善するには睡眠改善薬ＡＡ」が発話されるが、二回目以降は（広告履歴に記録されるため）、先頭以外の「睡眠改善薬ＡＡ、試してみた？」の広告文が発話されることとなり、より自然な会話を実現できる。 According to this example, suppose the advertisement copy (E) for advertisement identifier 1 consists of two entries: (1) "To improve the quality of your sleep, try Sleep Aid AA" and (2) "Have you tried Sleep Aid AA?". The first entry, "To improve the quality of your sleep, try Sleep Aid AA", is spoken the first time. From the second time onward (because the utterance has been recorded in the advertisement history), the non-first entry, "Have you tried Sleep Aid AA?", is spoken instead, which makes for a more natural conversation.
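A minimal sketch of this "first copy once, another copy afterwards" behaviour follows. The function name `pick_copy` and the `times_spoken` parameter are hypothetical, and rotating through the remaining copies is an assumption: the description only requires that a copy other than the first be selected once the advertisement appears in the history.

```python
def pick_copy(copies: list[str], times_spoken: int) -> str:
    """Return the first copy on the first utterance, a non-first copy afterwards.

    `times_spoken` is how many times this advertisement identifier already
    appears in the user's advertisement history.
    """
    if not copies:
        raise ValueError("an advertisement needs at least one copy")
    if times_spoken == 0 or len(copies) == 1:
        return copies[0]
    rest = copies[1:]
    # Rotate through the non-first copies (assumed policy).
    return rest[(times_spoken - 1) % len(rest)]


copies = [
    "To improve the quality of your sleep, try Sleep Aid AA.",
    "Have you tried Sleep Aid AA?",
]
first = pick_copy(copies, times_spoken=0)
second = pick_copy(copies, times_spoken=1)
```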

［広告を抑制する時間帯の設定］ [Setting time periods to suppress ads]
また、広告を行う時間帯として、ユーザが多忙な時間帯（例えば平日の朝など）は不適切であるため、サーバ１０は、会話文キューから端末装置２０が発話する会話文を選択する際に、端末装置２０の所在地の現在日時が予め忌避する時間帯として設定された時間帯であれば、広告文が選択されないよう制御してもよい。 Furthermore, since times when users are busy (such as weekday mornings) are inappropriate for advertising, when the server 10 selects a conversation sentence to be spoken by the terminal device 20 from the conversation sentence queue, the server 10 may control the advertisement sentence not to be selected if the current date and time at the location of the terminal device 20 is in a time period that has been set in advance as a time period to be avoided.

この例では、端末装置２０は、例えば自己の所在する地域の現在日時の情報を、リクエスト情報に含めてサーバ１０に通知すればよい。またサーバ１０は、広告文を会話文キューに登録する際、当該会話文キューに登録した広告文に関連付けて、当該会話文が広告文であることを表すフラグ情報を登録しておいてもよい。 In this example, the terminal device 20 may notify the server 10 of, for example, information about the current date and time in the area where the terminal device 20 is located, by including this information in the request information. Furthermore, when the server 10 registers advertising text in the conversation text queue, the server 10 may also register flag information indicating that the conversation text is advertising text, in association with the advertising text registered in the conversation text queue.

サーバ１０は、リクエスト情報に含められた現在日時の情報が、予め忌避する時間帯として設定された時間帯に含まれる場合は、広告文であることを表すフラグ情報に関連付けられた会話文を避けて、会話文キューから、発話の対象とする会話文を選択する。 If the current date and time information included in the request information falls within a time period that has been previously set as an avoided time period, the server 10 selects the conversation sentence to be spoken from the conversation sentence queue, avoiding conversation sentences associated with flag information indicating that the conversation sentence is an advertisement sentence.
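The queue selection with an avoided time period can be sketched as below. This is an illustrative sketch under stated assumptions: the `QueuedSentence` and `AvoidedPeriod` types, the weekday and time-of-day representation of a period, and the first-in-first-out scan are not taken from the description, which only says that flagged advertising sentences are skipped while the local time reported by the terminal falls inside an avoided period.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class QueuedSentence:
    text: str
    is_ad: bool = False  # flag registered together with an advertising copy


@dataclass
class AvoidedPeriod:
    weekdays: frozenset[int]  # Monday is 0, as in datetime.weekday()
    start: time
    end: time

    def contains(self, now: datetime) -> bool:
        return now.weekday() in self.weekdays and self.start <= now.time() < self.end


def select_sentence(
    queue: list[QueuedSentence],
    local_now: datetime,
    avoided: list[AvoidedPeriod],
) -> Optional[QueuedSentence]:
    """Return the first queued sentence, skipping ads during avoided periods."""
    suppress_ads = any(period.contains(local_now) for period in avoided)
    for sentence in queue:
        if suppress_ads and sentence.is_ad:
            continue
        return sentence
    return None


weekday_mornings = AvoidedPeriod(frozenset(range(5)), time(6, 0), time(9, 0))
queue = [
    QueuedSentence("Have you tried Sleep Aid AA?", is_ad=True),
    QueuedSentence("Did you sleep well?"),
]
busy = select_sentence(queue, datetime(2020, 1, 23, 7, 30), [weekday_mornings])   # a Thursday
free = select_sentence(queue, datetime(2020, 1, 25, 7, 30), [weekday_mornings])   # a Saturday
```

Using the local date and time sent in the request information, rather than the server clock, keeps the check correct for terminals in other time zones.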

［広告発話後の会話文の分析処理］ [Analysis of conversational text after advertisement speech]
サーバ１０は、広告文を応答文として発話されたときに、当該応答文の発話に対するユーザの反応を判定してもよい。具体的にサーバ１０は、広告文を発話した後にユーザから受け入れた会話文の内容について、広告に関係する内容の会話が続いているか否か（広告の対象に関わる語が後続の会話文に含まれているか否か）や、極性分析等の方法で肯定的であるか否定的であるかを分析する。 When an advertising copy is uttered as a response, the server 10 may determine the user's reaction to the response. Specifically, the server 10 analyzes the content of the conversation received from the user after the advertising copy is uttered to determine whether the conversation continues with content related to the advertisement (whether words related to the subject of the advertisement are included in the subsequent conversation) and whether the conversation is positive or negative using a method such as polarity analysis.

そしてサーバ１０は、この分析の結果を、ユーザごと、かつ、発話した広告文の広告識別子ごとに、広告効果の履歴として保持し、所定の処理に供する。 The server 10 then stores the results of this analysis as a history of advertising effectiveness for each user and for each advertising identifier of the spoken advertising copy, and uses this information for specified processing.

ここで所定の処理は、特定の出稿者が出稿した広告の広告識別子ごとの広告効果の履歴の統計を得て、当該出稿者に提示する処理等が考えられる。これにより、広告出稿者は、出稿した広告の発話頻度や、ユーザの反応等の情報を得ることができる。 The predetermined processing here could be, for example, compiling statistics on the advertising effectiveness history for each advertisement identifier of the advertisements placed by a specific advertiser and presenting them to that advertiser. This allows the advertiser to learn how often its advertisements were spoken and how users reacted to them.

また、所定の処理は、ユーザごとに、否定的、あるいは、広告後に広告の内容に係る会話が続かないと判断された広告識別子を抽出する処理であってもよい。この場合、当該ユーザに対しては、当該抽出した広告識別子の広告文を選択しないよう制御することとしてもよい。 The predetermined processing may also be a process of extracting, for each user, the advertisement identifiers for which the reaction was judged to be negative or for which no conversation about the content of the advertisement followed the advertisement. In this case, the server may be controlled so that advertising copy with the extracted advertisement identifiers is not selected for that user.
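The reaction analysis and the per-user suppression can be sketched as follows. Everything here is a hedged illustration: the word lists are a toy stand-in for a real polarity analyser, and the rule "block an advertisement only if every recorded reaction was negative or dropped the topic" is an assumption, since the description does not fix how many poor reactions trigger suppression.

```python
from dataclasses import dataclass

# Tiny illustrative lexicons standing in for a real polarity analyser (assumption).
POSITIVE = {"good", "great", "nice", "want", "interesting"}
NEGATIVE = {"no", "stop", "annoying", "boring", "hate"}


@dataclass
class Reaction:
    continued: bool  # the follow-up mentions a word tied to the advertised target
    polarity: int    # > 0 positive, < 0 negative, 0 neutral


def analyse_reaction(follow_up: str, ad_keywords: set[str]) -> Reaction:
    """Analyse the user's sentence that follows a spoken advertisement."""
    words = {w.strip(".,!?").lower() for w in follow_up.split()}
    continued = bool(words & {k.lower() for k in ad_keywords})
    polarity = len(words & POSITIVE) - len(words & NEGATIVE)
    return Reaction(continued, polarity)


# Effectiveness history keyed by (user_id, ad_id), as in the description.
EffectHistory = dict[tuple[str, int], list[Reaction]]


def blocked_ads(effect_history: EffectHistory, user_id: str) -> set[int]:
    """Ad ids whose every recorded reaction was negative or dropped the topic."""
    blocked: set[int] = set()
    for (uid, ad_id), reactions in effect_history.items():
        if uid == user_id and reactions and all(
            r.polarity < 0 or not r.continued for r in reactions
        ):
            blocked.add(ad_id)
    return blocked


effect_history: EffectHistory = {}
keywords = {"sleep", "AA"}
effect_history.setdefault(("user-1", 1), []).append(
    analyse_reaction("Stop, that is annoying.", keywords)
)
effect_history.setdefault(("user-1", 2), []).append(
    analyse_reaction("Sounds good, is AA sold nearby?", keywords)
)
suppressed = blocked_ads(effect_history, "user-1")
```

The same history could also be aggregated per advertisement identifier to produce the statistics presented to an advertiser.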

［広告のターゲット］ [Advertisement Targeting]
またここまでの説明では、広告識別子に関わる広告文等は、ユーザの属性、例えば年齢層や性別などに関わらず選択され得るものとなっていたが、本実施の形態はこれに限られない。例えば広告文は、ターゲットとなるユーザの属性を表す情報に関連付けて記録されてもよい。この場合、サーバ１０は、ユーザの端末装置２０に発話させる広告文を選択する際に、当該ユーザの生年月日や性別等の属性（予め登録しておく）を参照して、当該属性を表す情報に関連付けられた広告文のうちから、広告文を選択することとしてもよい。 In the above description, the advertisement copy associated with an advertisement identifier can be selected regardless of the user's attributes, such as age group or gender, but this embodiment is not limited to this. For example, the advertisement copy may be recorded in association with information representing the attributes of the target user. In this case, when selecting the advertisement copy to be spoken by the user's terminal device 20, the server 10 may refer to the user's attributes, such as date of birth or gender (registered in advance), and select the advertisement copy from among the advertisement copies associated with information representing those attributes.
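Attribute-based targeting of this kind might look like the sketch below. The `User` and `TargetedCopy` types and the age-range and gender fields are hypothetical; the description only states that copies are associated with target attributes and that the user's registered attributes are consulted at selection time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class User:
    birth_date: date
    gender: str


@dataclass
class TargetedCopy:
    text: str
    min_age: Optional[int] = None  # None means "no restriction"
    max_age: Optional[int] = None
    gender: Optional[str] = None


def age_on(birth_date: date, today: date) -> int:
    """Full years elapsed between `birth_date` and `today`."""
    before_birthday = (today.month, today.day) < (birth_date.month, birth_date.day)
    return today.year - birth_date.year - before_birthday


def copies_for_user(copies: list[TargetedCopy], user: User, today: date) -> list[TargetedCopy]:
    """Keep only the copies whose target attributes match the user."""
    age = age_on(user.birth_date, today)
    return [
        c for c in copies
        if (c.min_age is None or age >= c.min_age)
        and (c.max_age is None or age <= c.max_age)
        and (c.gender is None or c.gender == user.gender)
    ]


user = User(birth_date=date(1990, 6, 15), gender="female")
copies = [
    TargetedCopy("Copy for people in their 30s and 40s", min_age=30, max_age=49),
    TargetedCopy("Copy for people under 30", max_age=29),
    TargetedCopy("Copy for everyone"),
]
matching = copies_for_user(copies, user, today=date(2024, 2, 27))
```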

［基準に関する他の例］ [Other examples of the criterion]
ここで、広告文を発話するか否かの所定の基準は、会話の連続性を次文判定処理して、連続性ありとするか否かの基準であるとしたが本実施の形態はこれに限られず、会話の履歴に含まれるキーワードの数や頻度（単位時間あたりのキーワードの出現数）などに基づくものであってもよい。 Here, the predetermined criterion for whether or not to speak the advertising copy has been described as whether a next-sentence determination process judges the conversation to be continuous. This embodiment is not limited to this, however, and the criterion may instead be based on the number of keywords in the conversation history or on their frequency (the number of keyword occurrences per unit time).
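A frequency-based criterion of this kind can be sketched as follows. The window length, the per-minute unit, the threshold value, and the substring-based counting are all assumptions chosen for illustration; the description only says the criterion may depend on the number or frequency of keywords in the conversation history.

```python
from datetime import datetime, timedelta


def keyword_rate(
    history: list[tuple[datetime, str]],
    keywords: set[str],
    now: datetime,
    window: timedelta,
) -> float:
    """Keyword occurrences per minute within the trailing `window`."""
    count = 0
    for spoken_at, text in history:
        if now - window <= spoken_at <= now:
            lowered = text.lower()
            count += sum(lowered.count(k.lower()) for k in keywords)
    return count / (window.total_seconds() / 60)


def should_speak_ad(rate_per_minute: float, threshold: float = 0.2) -> bool:
    """Assumed criterion: speak the ad when the keyword rate reaches the threshold."""
    return rate_per_minute >= threshold


now = datetime(2020, 1, 23, 21, 0)
conversation = [
    (datetime(2020, 1, 23, 19, 0), "I could not sleep at all."),   # outside the window
    (datetime(2020, 1, 23, 20, 50), "I can't sleep lately."),
    (datetime(2020, 1, 23, 20, 58), "My sleep is light, and sleep feels hard."),
]
rate = keyword_rate(conversation, {"sleep"}, now, timedelta(minutes=10))
decision = should_speak_ad(rate)
```

A rate over a trailing window, rather than a raw count, prevents a keyword mentioned long ago from triggering an advertisement in an unrelated later conversation.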

［端末装置側での処理］ [Processing on the terminal device side]
またここまでの説明では、サーバ１０が本発明の情報処理装置の一例として機能し、広告を選択して、端末装置２０に発話させることとしていたが本実施の形態はこれに限られず、端末装置２０が本発明の情報処理装置の一例として機能して、サーバ１０に蓄積された広告文のうちから発話する広告文を選択してもよい。この例では、予約語抽出部４２と、広告選択部４３との動作を、端末装置２０の制御部３１が実行することとなる。またこの例では、制御部３１が、選択された広告文を発話する制御を行う。 In the explanation so far, the server 10 functions as an example of the information processing device of the present invention, selects an advertisement, and causes the terminal device 20 to speak it, but the present embodiment is not limited to this, and the terminal device 20 may function as an example of the information processing device of the present invention and select an advertisement copy to be spoken from among advertisement copies stored in the server 10. In this example, the control unit 31 of the terminal device 20 executes the operations of the reserved word extraction unit 42 and the advertisement selection unit 43. In this example, the control unit 31 also controls the speaking of the selected advertisement copy.

［実施形態の構成・効果］ [Configuration and Effects of the Embodiment]
本発明の実施の形態は、以上のように、広告文を予め保持し、会話の続きとして発話することの適否の判定を行って、適切と判断したときに広告文を発話することとしている。すなわち、本実施の形態の情報処理装置は、広告の内容である広告文を保持するディスクやサーバ等の保持手段と、ユーザから入力される会話文に含まれる、予め定められたキーワードである予約語を抽出する抽出手段と、前記抽出した予約語に基づいて、前記広告文のうち、ユーザに提示する広告文を選択する選択手段と、所定の基準に基づいて、前記会話文に続いて、選択された広告文を含む応答文の発話処理を実行する実行手段と、を備える情報処理装置である。この情報処理装置は、サーバまたはユーザ側に配される端末装置（ロボット等）として実現できる。 As described above, in an embodiment of the present invention, advertising copy is stored in advance, a determination is made as to whether it is appropriate to speak the advertising copy as a continuation of the conversation, and the advertising copy is spoken when this is judged appropriate. That is, the information processing device of this embodiment includes: a storage means, such as a disk or server, for storing the advertising copy, which is the content of the advertisement; an extraction means for extracting reserved words, which are predetermined keywords, contained in a conversational text input by a user; a selection means for selecting, based on the extracted reserved words, the advertising copy to present to the user from among the stored advertising copy; and an execution means for executing, based on predetermined criteria, speech processing of a response text that includes the selected advertising copy following the conversational text. This information processing device can be realized as a server or as a terminal device (such as a robot) placed on the user side.

これにより、自然な会話の中に広告文を含めることができ、広告の唐突感を低減できる。 This allows ad copy to be included in natural conversation, reducing the abruptness of the ad.

またここで、前記所定の基準は、前記会話文と、前記選択された広告文とが連続性を有するか否かであり、連続性を有する場合に前記応答文の発話処理を実行することとしてもよい。 Furthermore, the predetermined criterion may be whether or not there is continuity between the conversational text and the selected advertising text, and if there is continuity, speech processing of the response text may be executed.

これにより、自然な会話の流れの中に広告文を含めることができ、広告の唐突感を低減できる。 This allows ad copy to be included in the natural flow of conversation, making the ad feel less abrupt.

さらに前記保持手段は、前記広告文ごとに、当該広告文が表す広告のジャンルを表すジャンル情報を関連付けて保持しており、前記選択手段は、前記予約語に基づいて、ユーザに対して提示する広告として決定したジャンルに関連付けられた広告文のうちから、ユーザに提示する広告文を選択することとしてもよい。 Furthermore, the storage means may store genre information indicating the genre of the advertisement represented by each of the advertising copy in association with the advertising copy, and the selection means may select advertising copy to present to the user from advertising copy associated with the genre determined as the advertisement to be presented to the user based on the reserved words.

このように、ジャンルで広告文を絞り込むことで、基準に基づく判定の処理の負担を軽減できる。 In this way, narrowing down ad copy by genre reduces the processing burden of criteria-based judgment.

さらに前記実行手段は、広告を避けるべき時間帯として設定された時間帯など、予め定められた時間帯に、前記応答文の発話を抑制してもよい。この例によると、ユーザにとって不適切な時間帯に広告を提示することを避けることができる。 Furthermore, the execution means may suppress the utterance of the response sentence during a predetermined time period, such as a time period set as a time period when advertisements should be avoided. According to this example, it is possible to avoid presenting advertisements during times that are inappropriate for the user.

また前記応答文の発話処理が実行されたときに、当該応答文の発話処理に対するユーザの反応を判定する反応判定手段をさらに含み、前記反応判定手段による判定の結果を、発話した応答文の内容を表す情報に関連付けて蓄積して、後にその広告を繰り返す、あるいは抑制するといった制御の処理など、所定処理に供することとしてもよい。これにより、広告の効果に応じた制御を行うことが可能となる。 The system may further include a reaction determination means for determining the user's reaction to the response sentence utterance processing when the response sentence utterance processing is executed, and the result of the determination by the reaction determination means may be stored in association with information representing the content of the uttered response sentence, and may later be used for predetermined processing, such as control processing to repeat or suppress the advertisement. This makes it possible to perform control according to the effectiveness of the advertisement.

１情報処理システム、１０サーバ、１１制御部、１２記憶部、１３通信部、２０端末装置、２１脚部、２２本体部、３１制御部、３２記憶部、３３センサ部、３４表示部、３５音声出力部、３６通信部、３７駆動部、４１受信部、４２予約語抽出部、４３広告選択部、４４アクション情報生成部、４５指示送信部、５１リクエスト送出部、５２アクション情報受信部、５３音声合成部、５４アクション処理実行部。 1 Information processing system, 10 Server, 11 Control unit, 12 Memory unit, 13 Communication unit, 20 Terminal device, 21 Leg unit, 22 Main body unit, 31 Control unit, 32 Memory unit, 33 Sensor unit, 34 Display unit, 35 Audio output unit, 36 Communication unit, 37 Drive unit, 41 Receiving unit, 42 Reserved word extraction unit, 43 Advertisement selection unit, 44 Action information generation unit, 45 Instruction transmission unit, 51 Request transmission unit, 52 Action information reception unit, 53 Audio synthesis unit, 54 Action processing execution unit.

Claims

It has a processor and memory,
The advertisement content, such as the advertisement copy, advertisement identifier, and information identifying the advertiser, is stored in memory.
The processor:
extracting predetermined reserved words contained in conversational text input by a user;
Selecting an advertisement copy to be presented to the user from among a plurality of advertisement copies based on the extracted reserved words;
performing speech processing of a response sentence including the selected advertising copy following the conversation sentence based on predetermined criteria;
An advertising identifier that identifies the target of an advertisement related to the speech-processed advertising copy is recorded as an advertising history.
When the reserved word is extracted from a conversational sentence different from the conversational sentence, selecting, from among a plurality of advertisement copies, advertisement copies that are the same as the advertisement copy specified by the advertisement identifier recorded as the advertisement history,
executes speech processing of a response sentence including the selected advertisement copy;
the predetermined criterion is whether or not a response sentence including the selected advertisement copy results in a natural conversation when uttered following a conversation history;
If it is determined that the response sentence will be a natural conversation, speech processing is executed for the response sentence including the selected advertisement copy following the conversation sentence, and if it is determined that the response sentence will not be a natural conversation, another advertisement copy to be presented to the user is selected from a plurality of advertisement copies based on the extracted reserved words.
Information processing device.

The processor stores in a memory an advertisement copy, an advertisement identifier, and information identifying an advertiser, which are contents of the advertisement;
A processor is caused to extract predetermined reserved words included in a conversation sentence input by a user;
causing a processor to select, from the advertising copy, an advertising copy to be presented to a user based on the extracted reserved word;
causing a processor to execute speech processing of a response sentence including a selected advertising sentence following the conversation sentence based on predetermined criteria;
causing the processor to record in the memory, as an advertisement history, an advertisement identifier that identifies a target of an advertisement related to the speech-processed advertisement copy;
a processor, when the reserved word is extracted from a conversation sentence different from the conversation sentence, selecting, from among a plurality of advertisement copies, advertisement copies that are made by the same advertiser identified based on an advertisement identifier recorded as an advertisement history;
causing a processor to execute speech processing of a response sentence including the selected advertising copy;
the predetermined criterion is whether or not a response sentence including the selected advertisement copy results in a natural conversation when uttered following a conversation history;
When it is determined that the response sentence is a natural conversation, the processor is caused to execute speech processing of the response sentence including the selected advertisement copy following the conversation sentence, and when it is determined that the response sentence is not a natural conversation, the processor is caused to select another advertisement copy to be presented to the user from a plurality of advertisement copies based on the extracted reserved words.
program.

The processor stores in a memory the advertisement copy, the advertisement identifier, and information identifying the advertiser, which are the contents of the advertisement;
A processor extracts predetermined reserved words included in a conversation sentence input by a user;
The processor selects an advertising copy to be presented to the user from among a plurality of advertising copies based on the extracted reserved words;
a processor, based on predetermined criteria, performing speech processing of a response sentence including the selected advertising sentence following the conversation sentence;
The processor records, as an advertisement history, an advertisement identifier that identifies a target of an advertisement related to the speech-processed advertisement copy;
a processor, when the reserved word is extracted from a conversation sentence different from the conversation sentence, selecting, from among a plurality of advertisement copies, advertisement copies that are made by the same advertiser identified based on an advertisement identifier recorded as an advertisement history;
a processor performs speech processing of a response sentence including the selected advertisement sentence;
the predetermined criterion is whether or not a response sentence including the selected advertisement copy results in a natural conversation when uttered following a conversation history;
If it is determined that the response sentence constitutes a natural conversation, the processor executes speech processing of the response sentence including the selected advertisement sentence following the conversation sentence, and if it is determined that the response sentence does not constitute a natural conversation, the processor selects another advertisement sentence to be presented to the user from among a plurality of advertisement sentences based on the extracted reserved words.
Information processing methods.

A server and a terminal device are provided,
The server
The advertisement content, such as the advertisement copy, advertisement identifier, and information identifying the advertiser, is stored in memory.
extracting predetermined reserved words contained in conversational text input by a user;
Selecting an advertisement copy to be presented to the user from among a plurality of advertisement copies based on the extracted reserved words;
causing the terminal device to execute speech processing of a response sentence including the selected advertisement sentence following the conversation sentence based on a predetermined criterion;
An advertising identifier that identifies the target of an advertisement related to the speech-processed advertising copy is recorded as an advertising history.
When the reserved word is extracted from a conversational sentence different from the conversational sentence, selecting, from among a plurality of advertisement copies, advertisement copies that are the same as the advertisement copy specified by the advertisement identifier recorded as the advertisement history,
causing the terminal device to execute speech processing of a response sentence including the selected advertisement copy;
the predetermined criterion is whether or not a response sentence including the selected advertisement copy results in a natural conversation when uttered following a conversation history;
When it is determined that the response sentence is a natural conversation, the processor is caused to execute speech processing of the response sentence including the selected advertisement copy following the conversation sentence, and when it is determined that the response sentence is not a natural conversation, the processor is caused to select another advertisement copy to be presented to the user from a plurality of advertisement copies based on the extracted reserved words.
system.