JP7653828B2

JP7653828B2 - Robots and robot systems

Info

Publication number: JP7653828B2
Application number: JP2021074873A
Authority: JP
Inventors: 朋佳大橋; 峻戸村; 登宮本; 奈津子榎本
Original assignee: Tokyo Gas Co Ltd
Current assignee: Tokyo Gas Co Ltd
Priority date: 2021-04-27
Filing date: 2021-04-27
Publication date: 2025-03-31
Anticipated expiration: 2041-04-27
Also published as: JP2022169071A

Description

The present invention relates to a robot and a robot system.

Patent Document 1 describes a robot intended to promote or suppress interaction between a device and a subject. When the robot moves away from a child, it utters a voice notifying the child that it is leaving, such as "I'm a bit tired, so I'm going to have a rest" or "I've played a bit too much, so I'm going to take a break." When the robot stops interacting with the child, it utters a voice that announces it is going to sleep or that depicts it sleeping, such as "I'm getting sleepy" or "Zzz (snoring sound)."
Patent Document 2 describes a robot intended to keep children company and supervise them. The robot acquires information indicating the child's situation and, based on that situation, refers to a behavior storage unit that stores each situation in association with the behavior the robot should take when the situation occurs. It then determines and controls its own behavior accordingly.

Patent Document 1: JP 2018-176383 A
Patent Document 2: JP 2005-305631 A

Among robots, communication robots in particular owe their reason for existence to being a presence close to that of a parent or family member, a role that ordinary electrical appliances cannot fill. With the trend toward nuclear families, the declining birthrate, and the growing use of robots in ordinary homes, young children, who depend heavily on their parents and family, are expected to become increasingly dependent on robots. As a result, robots are expected to become indispensable to childcare and to children's growth from the parents' standpoint as well. Patent Documents 1 and 2 both propose robots that keep children company by speaking. However, the speech of the robots in Patent Documents 1 and 2 is either one-sided or simply matched to the content of the child's speech. A robot whose speech to a child is merely one-sided or simply matched to the child's speech is no different from an ordinary electrical appliance.

An object of the present invention is to provide a robot in which the content and characteristics of the speech directed at a child contribute to childcare and to the child's growth, something that cannot be achieved by conventional robots that speak one-sidedly or simply match their speech to the user's speech.

The invention described in claim 1 is a robot comprising: an extraction means for detecting that the voice of a first user's utterance contains a voice portion having a predetermined feature and extracting that voice portion; an estimation means for estimating an emotion pattern of the first user based on the feature of the voice portion extracted by the extraction means; and a determination means for determining the manner of a voice to be output to a second user in accordance with the emotion pattern estimated by the estimation means.
The invention described in claim 2 is the robot described in claim 1, wherein the determination means determines, as the manner of the voice to be output to the second user, a voice that has the same content as the first user's utterance but with a changed emotion pattern.
The invention described in claim 3 is the robot described in claim 1, further comprising a storage means for storing in advance a plurality of voice manner patterns corresponding to each of positive emotions and negative emotions of the first user, wherein the estimation means estimates, as the estimation of the first user's emotion pattern, which of the positive emotion pattern and the negative emotion pattern the first user's emotion pattern corresponds to, and the determination means selects and determines, from the plurality of voice manner patterns stored in the storage means, the voice manner pattern to be output to the second user in accordance with the first user's emotion pattern estimated by the estimation means.
The invention described in claim 4 is the robot described in claim 3, wherein the determination means selects and determines a positive emotion pattern as the voice manner pattern to be output to the second user regardless of the result of the estimation of the first user's emotion pattern by the estimation means.
The invention described in claim 5 is the robot described in claim 1, wherein the extraction means further extracts, from the voice of the first user's utterance, the contextual voice portions before and after the voice portion having the predetermined feature, and the estimation means estimates the first user's emotion pattern by further taking into account the features of those contextual voice portions extracted by the extraction means.
The invention described in claim 6 is a robot system comprising: an extraction means for extracting, when a robot detects that the voice of a first user's utterance contains a voice portion having a predetermined feature, that voice portion; an estimation means for estimating an emotion pattern of the first user based on the feature of the voice portion extracted by the extraction means; a determination means for determining the manner of a voice that the robot is to output to a second user in accordance with the emotion pattern estimated by the estimation means; and an output control means for controlling the robot to output a voice in the manner determined by the determination means.
The invention described in claim 7 is the robot system described in claim 6, wherein the determination means determines, as the manner of the voice to be output to the second user, a voice that has the same content as the first user's utterance but with a changed emotion pattern.
The invention described in claim 8 is the robot system described in claim 6, further comprising a storage means for storing in advance a plurality of voice manner patterns corresponding to each of positive emotions and negative emotions of the first user, wherein the estimation means estimates, as the estimation of the first user's emotion pattern, which of the positive emotion pattern and the negative emotion pattern the first user's emotion pattern corresponds to, and the determination means selects and determines, from the plurality of voice manner patterns stored in the storage means, the voice manner pattern to be output to the second user in accordance with the first user's emotion pattern estimated by the estimation means.
The invention described in claim 9 is the robot system described in claim 8, wherein the determination means selects and determines a positive emotion pattern as the voice manner pattern to be output to the second user regardless of the result of the estimation of the first user's emotion pattern by the estimation means.
The invention described in claim 10 is the robot system described in claim 6, wherein the extraction means further extracts, from the voice of the first user's utterance, the contextual voice portions before and after the voice portion having the predetermined feature, and the estimation means estimates the first user's emotion pattern by further taking into account the features of those contextual voice portions extracted by the extraction means.
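The three means recited in claim 1 can be pictured as a small pipeline. The following is only an illustrative sketch, not the patented implementation: the names `Segment`, `extract`, `estimate`, and `determine`, the text-based trigger matching, and the single `loudness` feature with its 0.7 threshold are all assumptions introduced here. The determination step follows the variant of claims 2 and 4 (same content, positive manner).

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Segment:
    text: str        # content of the extracted voice portion (hypothetical)
    loudness: float  # hypothetical feature, normalized to [0, 1]


def extract(utterance: str, loudness: float, triggers: Set[str]) -> Optional[Segment]:
    """Extraction means: return the portion matching a predetermined trigger, if any."""
    for phrase in sorted(triggers):
        if phrase in utterance:
            return Segment(text=phrase, loudness=loudness)
    return None


def estimate(segment: Segment) -> str:
    """Estimation means: toy rule mapping one feature to an emotion pattern."""
    return "negative" if segment.loudness > 0.7 else "positive"


def determine(segment: Segment, emotion: str) -> dict:
    """Determination means: keep the content, always choose a positive manner.

    The estimated emotion is accepted but deliberately ignored, mirroring the
    'regardless of the result of the estimation' wording of claim 4.
    """
    return {"content": segment.text, "emotion": "positive"}
```

For instance, `extract("Hurry up! We are late", 0.9, {"Hurry up!"})` yields a segment that `estimate` labels as negative, while `determine` still selects a positive manner for the output.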

According to the present invention, it is possible to provide a robot in which the content and characteristics of the speech directed at a child contribute to childcare and to the child's growth, something that cannot be achieved by conventional robots that speak one-sidedly or simply match their speech to the user's speech.

FIG. 1 is a diagram showing the hardware configuration of the childcare support robot system to which the present embodiment is applied.
FIG. 2 is a diagram showing the hardware configuration of the childcare support robot.
FIG. 3 is a diagram showing the functional configuration of the childcare support robot.
FIG. 4 is a diagram showing the functional configuration of the server.
FIG. 5 is a flowchart showing the processing flow of the childcare support robot.
FIG. 6 is a flowchart showing the processing flow of the server.
FIG. 7 is a diagram showing a specific example of the correspondence between emotion patterns, specific voice contents, and output modes stored in the database.
FIG. 8(A) is a diagram showing a specific example of the timing at which the childcare support robot system of FIG. 1 is applied, and FIG. 8(B) is a diagram showing specific examples of variations in the manner of the voice output from the childcare support robot.

[Hardware configuration of the childcare support robot system]
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a diagram showing the hardware configuration of a childcare support robot system 1 to which the present embodiment is applied.
The childcare support robot system 1 is a system that realizes a "childcare support robot service" (hereinafter referred to as "this service") provided by a service provider to a user Ua, who is the parent of a user Ub. The childcare support robot system 1 has a childcare support robot 10, a user terminal 50, and a server 70, all connected to a network 90 such as the Internet. This service supports childcare by means of the childcare support robot 10. The childcare support robot 10 detects that an utterance of user Ua, or an utterance of user Ub, who is the child of user Ua, contains a specific voice, that is, a voice portion having a predetermined feature. Using this detection as a trigger, the childcare support robot 10 outputs to user Ub a voice Vr that corresponds to the emotion of the speaker (user Ua or user Ub) carried by the specific voice.

The user Ua of this service is not limited to the parent of user Ub. User Ua may be any person in a position to supervise or educate user Ub, such as a grandparent or a childcare worker. In that case, user Ub is a person supervised or educated by user Ua, such as user Ua's grandchild or a child in user Ua's care. In every case, user Ub is assumed to be a preschool child, elementary school student, junior high school student, or high school student. In this specification, when there is no need to distinguish between user Ua and user Ub, both are collectively referred to as "user U."

When the childcare support robot 10 detects that a specific voice is included in either the voice Va spoken by user Ua to user Ub or the voice Vb spoken by user Ub, it extracts the voice data of the specific voice. The childcare support robot 10 estimates the emotion pattern carried by the specific voice by analyzing the features of the extracted voice data. Specifically, when the childcare support robot 10 detects that a specific voice is included in part or all of the voice Va, it analyzes the features of the specific voice to estimate the emotion pattern of user Ua at the time of uttering the voice Va. Likewise, when it detects that a specific voice is included in part or all of the voice Vb, it analyzes the features of the specific voice to estimate the emotion pattern of user Ub at the time of uttering the voice Vb.

A speaker's emotion patterns can be classified into positive emotion patterns and negative emotion patterns. Positive emotion patterns include, for example, "joy," "calmness," "happiness," and "interest." Negative emotion patterns include, for example, "sorrow," "anger," "fear," and "disgust." There are also patterns such as "surprise," which can be either positive or negative. The emotion patterns are not limited to these.
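The classification above can be written down as a small lookup. This is a minimal sketch under stated assumptions: the function name `polarity`, the `context_hint` parameter, and the return labels "undetermined" and "unknown" are hypothetical and do not appear in the source. The emotion names are those listed in the text.

```python
from typing import Optional

POSITIVE = {"joy", "calmness", "happiness", "interest"}
NEGATIVE = {"sorrow", "anger", "fear", "disgust"}
AMBIGUOUS = {"surprise"}  # can be positive or negative depending on the situation


def polarity(emotion: str, context_hint: Optional[str] = None) -> str:
    """Map an emotion pattern to 'positive' or 'negative'.

    For ambiguous patterns such as 'surprise', fall back on a hint derived
    from the surrounding context (hypothetical parameter).
    """
    if emotion in POSITIVE:
        return "positive"
    if emotion in NEGATIVE:
        return "negative"
    if emotion in AMBIGUOUS:
        if context_hint in ("positive", "negative"):
            return context_hint
        return "undetermined"
    # The text notes that the patterns are not limited to the listed ones.
    return "unknown"
```

Keeping "surprise" in a separate set makes explicit that its polarity cannot be decided from the label alone.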

The method by which the childcare support robot 10 extracts a specific voice and estimates the emotion pattern of its speaker is not particularly limited, and any method may be adopted. For example, commonly used methods of voice data analysis may be adopted. Specifically, a method may be adopted in which AI (artificial intelligence) performs machine learning, deep learning, or the like on the pitch, fundamental frequency, waveform periodicity, vocal tract resonance frequency, formant frequency bands, volume, voice quality and timbre, speed, frequency of waveform peaks, and so on of the spoken voice. The analysis covers not only the specific voice itself but also the voice data of the context before and after it. This is because the same word spoken in the same way can carry an entirely different meaning once the surrounding context is taken into account.
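To make the idea of analyzing the specific voice together with its surrounding context concrete, the sketch below computes two simple features for the segment and for the windows before and after it. It is an assumption-laden illustration, not the method of the source: root-mean-square amplitude stands in for volume, zero-crossing rate is only a rough proxy for frequency content, and the function names and the sample-index interface are hypothetical.

```python
import math
from typing import Dict, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as a stand-in for volume."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def describe(samples: Sequence[float]) -> Dict[str, float]:
    return {"rms": rms(samples), "zcr": zero_crossing_rate(samples)}


def features_with_context(
    samples: Sequence[float], start: int, end: int, context: int
) -> Dict[str, Dict[str, float]]:
    """Features of samples[start:end] plus `context` samples on each side."""
    lo = max(0, start - context)
    hi = min(len(samples), end + context)
    return {
        "before": describe(samples[lo:start]),
        "segment": describe(samples[start:end]),
        "after": describe(samples[end:hi]),
    }
```

A downstream estimator could then take all three feature groups as input, so that the same segment is judged differently depending on what precedes and follows it.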

Once the childcare support robot 10 has estimated the emotion pattern of the person who uttered the specific voice, it determines, according to the result, the mode in which the voice Vr is output to user Ub (hereinafter referred to as the "output mode"). The output mode comprises the content of the voice Vr to be output and the emotion pattern carried by the voice Vr. Output modes are managed in a database in association with each of a plurality of predetermined specific voices. Specifically, they are stored in a specific voice DB 41 provided in an area of the storage unit 13 of the childcare support robot 10 (see FIG. 2, described later) and in a specific voice DB 81 provided in an area of a storage unit (not shown) of the server 70 that has the same function as the storage unit 13. Specific examples of the output mode patterns stored in the database will be described later with reference to FIG. 7.

As a specific example of this service, suppose user Ua utters a voice Va saying "Hurry up!" in a manner that harshly scolds user Ub. In this case, the childcare support robot 10 converts the voice Va saying "Hurry up!" into a gentler voice Vr, such as "Let's hurry up," and outputs it to user Ub. In other words, when the emotion pattern carried by the speech from user Ua to user Ub is negative, the childcare support robot 10 outputs a voice Vr that softens the negative emotional impact of being scolded and helps keep user Ub emotionally stable.

As another example, suppose user Ua utters a voice Va that praises user Ub, such as "Amazing!" In this case, the childcare support robot 10 outputs a voice Vr that echoes the same words, "Amazing!" In other words, when the emotion pattern carried by the speech from user Ua to user Ub is positive, the childcare support robot 10 outputs a voice Vr that amplifies the positive emotional impact of being praised.
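The two examples above (softening "Hurry up!" and echoing "Amazing!") can be sketched as a lookup keyed by the specific voice and the estimated emotion pattern. This is a hypothetical illustration: the `OutputMode` class, the in-memory dictionary standing in for the specific voice DB, and the fallback rule are assumptions made here, and the actual database layout is the one described with reference to FIG. 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputMode:
    content: str  # what the voice Vr says
    emotion: str  # emotion pattern carried by the voice Vr


# Hypothetical stand-in for the specific voice DB: (specific voice, emotion) -> output mode
SPECIFIC_VOICE_DB = {
    ("Hurry up!", "negative"): OutputMode("Let's hurry up", "positive"),
    ("Amazing!", "positive"): OutputMode("Amazing!", "positive"),
}


def decide_output(specific_voice: str, emotion: str) -> OutputMode:
    """Look up the output mode for a detected specific voice.

    Fallback (an assumption of this sketch): echo the content in a positive manner.
    """
    entry = SPECIFIC_VOICE_DB.get((specific_voice, emotion))
    if entry is not None:
        return entry
    return OutputMode(specific_voice, "positive")
```

With these entries, a scolding "Hurry up!" is mapped to the gentler "Let's hurry up", while praise is echoed unchanged.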

When the voice Vr is output from the childcare support robot 10, user Ub may react. When user Ub utters a voice Vb in reaction, the presence or absence of a reaction and the data of that voice Vb are acquired as feedback information. This feedback information is used to verify after the fact whether the output mode of the voice Vr was appropriate, that is, whether it was on target or completely off the mark. The feedback information can therefore be used, for example, as training data for machine learning by AI (artificial intelligence).

Specifically, if user Ub utters a skeptical voice Vb such as "What are you saying?" or "What's that?" immediately after the voice Vr is output from the childcare support robot 10, the content of that voice is acquired as feedback information. This feedback information indicates that the voice Vr output from the childcare support robot 10 was of low appropriateness. In contrast, if user Ub utters an affirmative voice Vb such as "I got it" or "Thank you" immediately after the voice Vr is output, the content of that voice is likewise acquired as feedback information. This feedback information indicates that the voice Vr was of high appropriateness.

The feedback information is managed in a database in association with each of a plurality of predetermined specific voices. Specifically, the feedback information is stored in the specific voice DB 41 provided in an area of the storage unit 13 of the childcare support robot 10 (see FIG. 2, described later) and in the specific voice DB 81 provided in an area of a storage unit (not shown) of the server 70 that has the same function as the storage unit 13.

As described above, the childcare support robot 10 can perform AI (artificial intelligence) machine learning on the feedback information stored in the specific voice DB 41. The results of the machine learning serve as the basis for determining the output mode. By repeating the routine of acquiring feedback information, performing machine learning, determining the output mode, outputting the voice Vr, and acquiring feedback information again, the accuracy of the voice Vr output from the childcare support robot 10 can be improved over time. The machine learning on the feedback information may instead be performed on the server 70 side, described later. However, when the childcare support robot 10 operates as a standalone robot, the childcare support robot 10 itself performs the machine learning on the feedback information.
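The feedback routine can be illustrated with a deliberately simple tally in place of real machine learning. Everything here is a hypothetical sketch: the phrase lists reuse the examples given in the text, while `score_reaction`, `FeedbackStore`, and the scoring of +1, 0, and -1 are assumptions introduced for illustration, not the learning method of the source.

```python
from collections import defaultdict
from typing import Optional, Sequence

SKEPTICAL = ("What are you saying?", "What's that?")
AFFIRMATIVE = ("I got it", "Thank you")


def score_reaction(reaction: Optional[str]) -> int:
    """Score user Ub's reaction: +1 affirmative, -1 skeptical, 0 otherwise or none."""
    if reaction is None:
        return 0
    if any(phrase in reaction for phrase in AFFIRMATIVE):
        return 1
    if any(phrase in reaction for phrase in SKEPTICAL):
        return -1
    return 0


class FeedbackStore:
    """Running totals per (specific voice, candidate output) pair."""

    def __init__(self) -> None:
        self.totals = defaultdict(int)

    def record(self, specific_voice: str, candidate: str, reaction: Optional[str]) -> None:
        self.totals[(specific_voice, candidate)] += score_reaction(reaction)

    def best(self, specific_voice: str, candidates: Sequence[str]) -> str:
        """Pick the candidate with the highest accumulated score (first wins ties)."""
        return max(candidates, key=lambda c: self.totals[(specific_voice, c)])
```

Each pass through `record` followed by `best` corresponds to one turn of the routine: acquire feedback, update, determine the next output. A learned model would replace the tally in practice.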

Whether the emotion pattern of user Ua is positive or negative, the childcare support robot 10 can output a voice Vr in a positive manner to user Ub without being bound by that pattern. The childcare support robot 10 does not only output a voice Vr in a positive manner. Depending on the situation, it may also output a voice Vr in a negative manner. For example, if user Ua scolds user Ub while holding back their emotions, the childcare support robot 10 can output a voice Vr that admonishes user Ub somewhat more firmly. This helps user Ub understand what user Ua's attitude means. Likewise, if the voice Vb uttered by user Ub contains content that is disrespectful to an elder, the childcare support robot 10 can output a voice Vr that cautions user Ub on behalf of user Ua.

As described above, the childcare support robot 10 basically determines the output mode based on the result of analyzing the data of the specific voice and of the voice in the surrounding context. In doing so, the childcare support robot 10 also takes the user information of user Ua and user Ub into account. "User information" refers to information about user Ua and user Ub. It includes personal information such as each user's nickname, address (prefecture), age, and gender.

User information is acquired by being entered into the user terminal 50. In some cases, users Ua and Ub have separately registered user information with a service other than this service, and an arrangement exists between this service and the other service for sharing user Ub's user information. In such cases, subject to the consent of user U, information already registered as user information with the other service is also acquired as user information for this service. Specifically, the user information is acquired from an external server (not shown), such as the server that operates the other service. The childcare support robot 10 stores and manages the acquired user information in a user information DB 42 provided in an area of the storage unit 13 (see FIG. 2, described later).

An example in which user information is taken into account when determining the output mode is as follows. Suppose the user information states that user Ub's nickname is "XX-chan," address (prefecture) is "Osaka Prefecture," age is "5 years old," and gender is "female." In this case, when determining the output mode for the voice Vr directed at user Ub, the childcare support robot 10 takes into account the nickname "XX-chan," expressions characteristic of the Kansai region, and the fact that user Ub is a preschool girl. Because user information is taken into account in this way, the childcare support robot 10 can output a voice Vr that incorporates user Ub's personal information and therefore feels more familiar.
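The personalization step can be sketched as a function that attaches nickname, dialect, and register hints to the content of the voice Vr. The names `UserInfo` and `personalize`, the dictionary output, and the age threshold of 6 for "preschool" are assumptions made for this illustration. The set of Kansai prefectures is general knowledge rather than something stated in the source.

```python
from dataclasses import dataclass
from typing import Dict

# Prefectures commonly counted as the Kansai region (not taken from the source)
KANSAI = {"Osaka", "Kyoto", "Hyogo", "Nara", "Shiga", "Wakayama"}


@dataclass
class UserInfo:
    nickname: str
    prefecture: str
    age: int
    gender: str


def personalize(content: str, user: UserInfo) -> Dict[str, str]:
    """Attach user-specific hints to the content of the voice Vr."""
    return {
        "text": f"{user.nickname}, {content}",
        "dialect": "kansai" if user.prefecture in KANSAI else "standard",
        # Hypothetical rule: children under 6 are treated as preschoolers.
        "register": "preschool" if user.age < 6 else "school-age",
    }
```

For the example user in the text, `personalize("let's hurry up", UserInfo("XX-chan", "Osaka", 5, "female"))` addresses the child by nickname and flags Kansai phrasing and a preschool register for the speech synthesis stage.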

In addition, when user Ua specifies part of the output mode through an input operation, the childcare support robot 10 accepts the specification and outputs the voice Vr in an output mode that includes the specified content. User Ua can specify only "part" of the output mode for the following reason. If "all" aspects of the voice Vr output from the childcare support robot 10 could be specified in advance, the connection between the voices Va and Vb uttered by users Ua and Ub and the voice Vr output from the childcare support robot 10 would become weak. As a result, the original purpose, namely that "the content and characteristics of the robot's speech contribute to childcare and the growth of children," might not be achieved. One example of a partial specification is the selection of an "output mode setting," which allows a variation of the output mode to be chosen. Details of the output mode setting will be described later with reference to FIG. 8(B).

As shown in FIG. 1, the childcare support robot 10 can operate as a networked robot that forms part of the childcare support robot system 1. In places with a poor communication environment, it can also operate temporarily as a standalone robot. It can also be configured from the outset as a standalone robot with no communication function.

The user terminal 50 of the childcare support robot system 1 is a smartphone, tablet, personal computer, or the like. The user terminal 50 is carried by user Ua, the parent, or is placed in the space where user Ua lives. The user terminal 50 accepts input operations from user Ua. For example, it accepts input operations for configuring the various settings of the childcare support robot 10 and input operations for registering user information with this service. The "input operations" referred to here include manual input into a user interface displayed on the screen of the user terminal 50, voice input through a microphone or the like, and gesture input through a camera or the like.

ユーザ端末５０がスマートフォン、タブレット、パーソナルコンピュータで構成される場合には、本サービスを利用可能にする専用のアプリケーションソフトウェア（以下、「専用アプリ」と呼ぶ）をユーザ端末５０にインストールすることができる。また、専用アプリがインストールされない場合であっても、ユーザ端末５０のブラウザ機能を用いて、本サービスを利用可能にする専用のウェブサイトにアクセスすることでも本サービスを利用することができる。 When the user terminal 50 is configured as a smartphone, tablet, or personal computer, dedicated application software (hereinafter referred to as a "dedicated app") that enables the use of this service can be installed on the user terminal 50. Even if a dedicated app is not installed, the service can be used by accessing a dedicated website that enables the use of this service using the browser function of the user terminal 50.

育児支援ロボットシステム１を構成するサーバ７０は、システム全体を制御する情報処理装置であり、例えば専用アプリをユーザ端末５０に提供する。また、サーバ７０は、ユーザ端末５０への入力操作による出力態様の一部の指定を受け付ける。また、サーバ７０は、各種の情報をデータベースに記憶して管理する。具体的には、サーバ７０は、育児支援ロボット１０から取得したフィードバック情報を、後述する図２の記憶部１３と同様の機能を有する記憶部（図示せず）の一領域に設けられた特定音声ＤＢ８１に記憶して管理する。また、サーバ７０は、ユーザ端末５０等から取得したユーザ情報を、後述する図２の記憶部１３と同様の機能を有する記憶部（図示せず）の一領域に設けられたユーザ情報ＤＢ８２に記憶して管理する。また、サーバ７０は、ユーザ情報ＤＢ８２に記憶して管理しているユーザ情報の一部または全部を、必要に応じて育児支援ロボット１０およびユーザ端末５０に向けて送信する。 The server 70 constituting the childcare support robot system 1 is an information processing device that controls the entire system, and provides, for example, a dedicated application to the user terminal 50. The server 70 also accepts the designation of a part of the output mode by an input operation to the user terminal 50. The server 70 also stores and manages various information in a database. Specifically, the server 70 stores and manages feedback information acquired from the childcare support robot 10 in a specific voice DB 81 provided in one area of a storage unit (not shown) having a function similar to that of the storage unit 13 in FIG. 2 described later. The server 70 also stores and manages user information acquired from the user terminal 50, etc. in a user information DB 82 provided in one area of a storage unit (not shown) having a function similar to that of the storage unit 13 in FIG. 2 described later. The server 70 also transmits part or all of the user information stored and managed in the user information DB 82 to the childcare support robot 10 and the user terminal 50 as necessary.

また、図１のように育児支援ロボット１０をネットワーク型のロボットとして機能させる場合には、サーバ７０が、特定音声ＤＢ８１に記憶されているフィードバック情報を用いて、ＡＩ（人工知能）による機械学習を行うこともできる。機械学習の結果は、育児支援ロボット１０が出力態様を決定するための根拠情報として利用される。このように、フィードバック情報を用いた機械学習は、サーバ７０側で行う構成とすることもできるし、育児支援ロボット１０側でも行う構成とすることもできる。 In addition, when the childcare support robot 10 functions as a networked robot as shown in FIG. 1, the server 70 can perform machine learning using AI (artificial intelligence) using feedback information stored in the specific voice DB 81. The results of the machine learning are used as basis information for the childcare support robot 10 to determine the output mode. In this way, machine learning using feedback information can be configured to be performed on the server 70 side, or on the childcare support robot 10 side.

〔育児支援ロボットの構成〕
図２は、育児支援ロボット１０のハードウェア構成を示す図である。
育児支援ロボット１０は、コンピュータ装置を内蔵するいわゆるコミュニケーションロボットで構成される。育児支援ロボット１０は、自機全体を制御するＣＰＵ（Central Processing Unit）である制御部１１と、演算に際して作業エリアとして用いられるＲＡＭ（Random Access Memory）などのメモリ１２と、プログラムや各種設定データなどの記憶に用いられるＨＤＤ（Hard Disk Drive）や半導体メモリ等の記憶装置である記憶部１３とを有している。また、ネットワーク９０を介してデータの送受信を行う通信部１４を有している。また、ユーザＵからの入力操作を受け付けるタッチパネルなどの操作部１５と、ユーザＵに対して画像やテキスト情報などを表示する液晶ディスプレイなどからなる表示部１６と、表示部１６を制御する表示制御部１７とを有している。また、ユーザから発せられる音を録音するためのマイクなどからなる録音部１８と、ユーザＵに対して音声を出力するスピーカなどからなる音声出力部１９とを有している。さらに、ユーザＵのジェスチャの様子を撮像して静止画像または動画像の情報として取得するためのカメラなどからなる撮像部２０を有している。 [Configuration of childcare support robot]
FIG. 2 is a diagram showing the hardware configuration of the childcare support robot 10.
The childcare support robot 10 is a so-called communication robot with a built-in computer device. The childcare support robot 10 has a control unit 11, which is a central processing unit (CPU) that controls the robot as a whole; a memory 12, such as a random access memory (RAM), that is used as a working area during calculations; and a storage unit 13, which is a storage device, such as a hard disk drive (HDD) or a semiconductor memory, that is used to store programs and various setting data. The childcare support robot 10 also has a communication unit 14 that transmits and receives data via a network 90. The childcare support robot 10 also has an operation unit 15, such as a touch panel, that accepts input operations from the user U; a display unit 16, such as a liquid crystal display, that displays images and text information to the user U; and a display control unit 17 that controls the display unit 16. The childcare support robot 10 also has a recording unit 18, such as a microphone, for recording sounds made by the user U, and an audio output unit 19, such as a speaker, for outputting sound to the user U. The childcare support robot 10 further has an imaging unit 20, such as a camera, that captures the gestures of the user U and obtains them as still or moving image information.

なお、育児支援ロボットシステム１を構成するユーザ端末５０のハードウェア構成は、図２に示す育児支援ロボット１０のハードウェア構成と同様の構成を備えている。また、育児支援ロボットシステム１を構成するサーバ７０のハードウェア構成は、録音部１８、音声出力部１９、および撮像部２０を除いて図２に示す育児支援ロボット１０のハードウェア構成と同様の構成を備えている。このため、ユーザ端末５０およびサーバ７０のハードウェア構成の説明は省略する。 The hardware configuration of the user terminal 50 constituting the childcare support robot system 1 has a similar configuration to the hardware configuration of the childcare support robot 10 shown in FIG. 2. The hardware configuration of the server 70 constituting the childcare support robot system 1 has a similar configuration to the hardware configuration of the childcare support robot 10 shown in FIG. 2, except for the recording unit 18, the audio output unit 19, and the imaging unit 20. For this reason, a description of the hardware configuration of the user terminal 50 and the server 70 will be omitted.

図３は、育児支援ロボット１０の機能構成を示す図である。
育児支援ロボット１０は、ユーザＵにより発話された音声のデータを取得する音声取得部３１と、取得された音声のデータから特定の音声のデータを抽出する特定音声抽出部３２とを有する。また、育児支援ロボット１０は、音声を発話したユーザＵａおよびユーザＵｂの各々の感情のパターンを推定する感情パターン推定部３３と、出力態様のパターンを記憶する出力態様記憶部３４とを有する。また、育児支援ロボット１０は、ユーザ情報を取得するユーザ情報取得部３５と、ユーザ情報を記憶するユーザ情報記憶部３６と、出力態様の一部の指定を受け付ける出力態様受付部３７とを有する。また、育児支援ロボット１０は、出力態様を決定する出力態様決定部３８と、特定の音声のデータを変換する変換部３９と、音声の出力の制御を行う出力制御部４０とを有する。 FIG. 3 is a diagram showing the functional configuration of the childcare support robot 10.
The childcare support robot 10 has a voice acquisition unit 31 that acquires voice data uttered by the user U, and a specific voice extraction unit 32 that extracts specific voice data from the acquired voice data. The childcare support robot 10 also has an emotion pattern estimation unit 33 that estimates the emotion patterns of the users Ua and Ub who have uttered the voices, and an output mode storage unit 34 that stores the output mode patterns. The childcare support robot 10 also has a user information acquisition unit 35 that acquires user information, a user information storage unit 36 that stores the user information, and an output mode reception unit 37 that receives a designation of a part of the output mode. The childcare support robot 10 also has an output mode determination unit 38 that determines the output mode, a conversion unit 39 that converts the specific voice data, and an output control unit 40 that controls the output of the voices.

ユーザＵａから発話された音声Ｖａと、ユーザＵｂから発話された音声Ｖｂとのうち少なくとも一方の音声が入力されると、上述した図２の録音部１８によりその音声が録音される。音声取得部３１は、録音部１８により録音された音声Ｖａおよび音声Ｖｂの各々の情報を取得する。 When at least one of the voice Va spoken by the user Ua and the voice Vb spoken by the user Ub is input, the voice is recorded by the recording unit 18 in FIG. 2 described above. The voice acquisition unit 31 acquires information on each of the voice Va and voice Vb recorded by the recording unit 18.

特定音声抽出部３２は、ユーザＵａおよびユーザＵｂの各々の発話の音声Ｖａおよび音声Ｖｂの各々に、予め定められた特徴量の音声の部分が含まれることを検知して、これを特定の音声として抽出する。また、特定音声抽出部３２は、特定の音声の部分の前後の文脈の音声の部分を抽出する。 The specific voice extraction unit 32 detects that the voices Va and Vb of the respective utterances of the users Ua and Ub contain a voice portion with a predetermined characteristic amount, and extracts this as a specific voice. The specific voice extraction unit 32 also extracts the voice portions of the context before and after the specific voice portion.

感情パターン推定部３３は、特定音声抽出部３２により抽出された特定の音声の特徴量に基づいて、特定の音声を発話したユーザＵの感情のパターンを推定する。具体的には、例えば感情パターン推定部３３は、特定の音声を発話したユーザＵの感情のパターンの推定として、ポジティブな感情のパターンとネガティブな感情のパターンとのうちいずれかにあてはまるかを推定する。また、感情パターン推定部３３は、特定音声抽出部３２により抽出された特定の音声の前後の文脈の音声の部分の特徴量を考慮して、ユーザＵａおよびユーザＵｂの各々の感情のパターンを推定することもできる。 The emotion pattern estimation unit 33 estimates the emotion pattern of the user U who uttered the specific voice based on the features of the specific voice extracted by the specific voice extraction unit 32. Specifically, for example, the emotion pattern estimation unit 33 estimates whether the emotion pattern of the user U who uttered the specific voice corresponds to a positive emotion pattern or a negative emotion pattern. The emotion pattern estimation unit 33 can also estimate the emotion patterns of each of the users Ua and Ub by taking into account the features of the parts of the voice in the context before and after the specific voice extracted by the specific voice extraction unit 32.

出力態様記憶部３４は、特定の音声を一部または全部に含む、音声Ｖａまたは音声Ｖｂを発話したユーザＵの、ポジティブな感情とネガティブな感情との各々に対応する音声の態様のパターンを、出力態様のパターンとして記憶している。具体的には、出力態様記憶部３４は、複数の出力態様のパターンを、特定の音声に対応付けて特定音声ＤＢ４１に記憶させる。 The output mode storage unit 34 stores, as output mode patterns, voice mode patterns corresponding to each of the positive emotions and negative emotions of the user U who has uttered the voice Va or the voice Vb, which includes a specific voice in part or in whole. Specifically, the output mode storage unit 34 stores a plurality of output mode patterns in the specific voice DB 41 in association with the specific voice.

ユーザ情報取得部３５は、ユーザＵａ及びユーザＵｂの各々のユーザ情報を取得する。具体的には、ユーザ情報取得部３５は、入力操作にて入力されたユーザＵａ及びユーザＵｂの各々のユーザ情報を取得する。また、ユーザ情報取得部３５は、ユーザＵａ及びユーザＵｂの各々の本サービスの利用実績を、ユーザ情報として取得する。また、ユーザ情報取得部３５は、他サービスにてユーザＵａ及びユーザＵｂの各々のユーザ情報として既に登録されている情報のうち、ユーザＵの承諾が得られた情報を、本サービスにおけるユーザＵａ及びユーザＵｂの各々のユーザ情報として取得する。 The user information acquisition unit 35 acquires user information for each of the users Ua and Ub. Specifically, the user information acquisition unit 35 acquires user information for each of the users Ua and Ub that is input by an input operation. The user information acquisition unit 35 also acquires the usage history of the service for each of the users Ua and Ub as user information. The user information acquisition unit 35 also acquires, from information that has already been registered as user information for each of the users Ua and Ub in other services, information for which consent has been obtained from the user U, as user information for each of the users Ua and Ub in the service.

ユーザ情報記憶部３６は、ユーザ情報取得部３５により取得されたユーザＵａ及びユーザＵｂの各々のユーザ情報を、ユーザ情報ＤＢ４２に記憶して管理する。具体的には、ユーザ情報記憶部３６は、ユーザＵａおよびユーザＵｂの各々を一意に特定可能にする識別情報に、ユーザＵａおよびユーザＵｂの各々のユーザ情報を対応付けて、その識別情報をキーとしていつでも抽出可能な態様で管理する。 The user information storage unit 36 stores and manages the user information of each of the users Ua and Ub acquired by the user information acquisition unit 35 in the user information DB 42. Specifically, the user information storage unit 36 associates the user information of each of the users Ua and Ub with identification information that allows each of the users Ua and Ub to be uniquely identified, and manages the user information in a manner that allows the identification information to be extracted at any time as a key.
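
As a non-limiting illustration, the keyed management performed by the user information storage unit 36 may be sketched as follows. The class name `UserInfoStore`, its method names, and the sample fields (`role`, `age`, `usage_count`) are assumptions introduced only for this sketch; the embodiment states only that user information is associated with identification information and can be extracted with that identification information as a key.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserInfoStore:
    """Toy stand-in for the user information DB 42: records keyed by user ID."""

    records: Dict[str, dict] = field(default_factory=dict)

    def store(self, user_id: str, info: dict) -> None:
        # Merge new fields into any record already held for this identifier.
        self.records.setdefault(user_id, {}).update(info)

    def lookup(self, user_id: str) -> dict:
        # Return a copy so that callers cannot mutate the stored record.
        return dict(self.records.get(user_id, {}))


# Hypothetical sample data for the parent (Ua) and the child (Ub).
store = UserInfoStore()
store.store("Ua", {"role": "parent"})
store.store("Ub", {"role": "child", "age": 4})
store.store("Ub", {"usage_count": 12})  # e.g. usage history added later
```

In this sketch, information acquired at different times (input operations, usage history) accumulates under the same key, so a single lookup by identifier returns everything held for that user.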

出力態様受付部３７は、出力態様の一部についての指定を受け付ける。具体的には、出力態様受付部３７は、出力態様の一部を指定するための入力操作が行われると、その指定を受け付ける。この入力操作としては、ユーザ端末５０の画面に表示されたユーザインターフェースへの手入力操作、マイク等への音声による入力操作、カメラ等へのジェスチャによる入力操作等が挙げられる。 The output mode receiving unit 37 receives a designation for a part of the output mode. Specifically, when an input operation for designating a part of the output mode is performed, the output mode receiving unit 37 receives the designation. Examples of this input operation include manual input operation into a user interface displayed on the screen of the user terminal 50, voice input operation into a microphone or the like, input operation by gesture into a camera or the like, and the like.

出力態様決定部３８は、感情パターン推定部３３により推定された、特定の音声を発話したユーザＵの感情のパターンに応じて、ユーザＵｂに向けて出力する音声Ｖｒの出力態様を決定する。具体的には、例えば出力態様決定部３８は、出力態様の決定として、ユーザＵａの発話と内容を同一とし、感情のパターンを変換した音声Ｖｒを決定する。また、例えば出力態様決定部３８は、感情パターン推定部３３により推定された感情のパターンに応じて、出力態様記憶部３４に記憶されている複数の音声Ｖｒの出力態様のパターンのうちユーザＵｂに向けて出力する音声Ｖｒの出力態様のパターンを選択して決定する。 The output mode determination unit 38 determines the output mode of the voice Vr to be output to the user Ub according to the emotion pattern of the user U who uttered the specific voice, which is estimated by the emotion pattern estimation unit 33. Specifically, for example, the output mode determination unit 38 determines a voice Vr whose content is the same as the speech of the user Ua and whose emotion pattern has been converted, as the output mode determination. Also, for example, the output mode determination unit 38 selects and determines the output mode pattern of the voice Vr to be output to the user Ub from among the multiple output mode patterns of the voice Vr stored in the output mode storage unit 34 according to the emotion pattern estimated by the emotion pattern estimation unit 33.

また、出力態様決定部３８は、ユーザＵｂに向けて出力する音声Ｖｒの出力態様のパターンとして、感情パターン推定部３３による感情のパターンの推定の結果にかかわらず、ポジティブな感情のパターンを選択して決定してもよい。また、出力態様決定部３８は、例えば予め定められた選択基準や、ＡＩ（人工知能）がユーザＵによる本サービスの利用実績を学習した結果に基づいて、出力態様を決定してもよい。また、ユーザＵａの入力操作により、既に出力態様の一部が決定されている場合には、出力態様決定部３８は、出力態様の決定に際してその決定内容を反映させる。 The output mode determination unit 38 may select and determine a positive emotion pattern as the output mode pattern of the voice Vr to be output to the user Ub, regardless of the result of the emotion pattern estimation by the emotion pattern estimation unit 33. The output mode determination unit 38 may determine the output mode, for example, based on predetermined selection criteria or the result of AI (artificial intelligence) learning about the usage history of the service by the user U. In addition, if part of the output mode has already been determined by the input operation of the user Ua, the output mode determination unit 38 reflects the determined content when determining the output mode.
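
As a non-limiting illustration, the way the output mode determination unit 38 might combine the estimated emotion pattern, the option of always selecting a positive pattern, and a part designated in advance by the user Ua may be sketched as follows. The function name, the dictionary layout, and the value `"gentle"` are assumptions made for this sketch; the actual output modes are those described with reference to FIG. 8(B).

```python
from typing import Optional

POSITIVE, NEGATIVE = "positive", "negative"


def determine_output_mode(
    estimated_emotion: str,
    designated_part: Optional[dict] = None,
    always_positive: bool = False,
) -> dict:
    """Return a hypothetical output-mode record for the voice Vr."""
    # Start from the estimated emotion pattern, or select a positive pattern
    # regardless of the estimate when that option is enabled.
    mode = {"emotion": POSITIVE if always_positive else estimated_emotion}
    # A part already designated by the user Ua is reflected last,
    # so it overrides only the keys that it names.
    if designated_part:
        mode.update(designated_part)
    return mode


# Hypothetical designation of an "output mode" by the user Ua.
example = determine_output_mode(NEGATIVE, {"output_mode": "gentle"})
```

Because the designation covers only a part of the output mode, the remaining keys in this sketch are still derived from the utterance, which matches the stated reason for not allowing the whole output mode to be fixed in advance.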

変換部３９は、特定音声抽出部３２により抽出された特定の音声の出力態様を適宜変換する。具体的には、変換部３９は、抽出された特定の音声の出力態様を、出力態様決定部３８により決定された出力態様に変換する。具体的には、例えば出力態様決定部３８による決定の結果が、出力態様受付部３７により受け付けられた一部の指定内容に従ったものである場合には、変換部３９は、育児支援ロボット１０から音声Ｖｒが出力される際の出力態様を、出力態様受付部３７により指定された出力態様に変換する。 The conversion unit 39 appropriately converts the output mode of the specific voice extracted by the specific voice extraction unit 32. Specifically, the conversion unit 39 converts the output mode of the extracted specific voice into the output mode determined by the output mode determination unit 38. Specifically, for example, when the result of the determination by the output mode determination unit 38 is in accordance with some of the specified contents received by the output mode reception unit 37, the conversion unit 39 converts the output mode when the voice Vr is output from the child care support robot 10 into the output mode specified by the output mode reception unit 37.

なお、変換部３９による特定の音声の変換が具体的にどのような手法により行われるかは特に限定されず、例えば一般的に利用されている音声編集の手法を用いた加工が行われてもよい。この場合、変換部３９は、例えば波形編集、ノイズ除去、ボリューム調整、周波数調整、音圧調整、ブレス除去、ピッチ調整、イントネーション調整、音色調整等の手法により、特定の音声の変換を行う。 The specific method by which the conversion unit 39 converts the specific voice is not particularly limited, and may be, for example, a commonly used voice editing method. In this case, the conversion unit 39 converts the specific voice by, for example, waveform editing, noise removal, volume adjustment, frequency adjustment, sound pressure adjustment, breath removal, pitch adjustment, intonation adjustment, tone adjustment, etc.
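
As a non-limiting illustration of one of the techniques listed above, volume adjustment may be sketched as a gain applied to each sample. The function name and the assumption that the voice is held as signed 16-bit PCM samples are introduced only for this sketch; the embodiment does not limit the audio format or the conversion method.

```python
from typing import List

PCM16_MIN, PCM16_MAX = -32768, 32767


def adjust_volume(samples: List[int], gain: float) -> List[int]:
    """Scale signed 16-bit PCM samples by `gain`, clipping to the valid range."""
    adjusted = []
    for sample in samples:
        scaled = round(sample * gain)
        # Clip so that loud passages saturate instead of wrapping around.
        adjusted.append(max(PCM16_MIN, min(PCM16_MAX, scaled)))
    return adjusted


louder = adjust_volume([1000, -2000, 30000], 2.0)
```

A gain below 1.0 softens the voice and a gain above 1.0 raises it; the other listed techniques, such as pitch or intonation adjustment, would require signal processing beyond this sketch.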

出力制御部４０は、出力態様決定部３８により決定された出力態様の音声Ｖｒを出力する制御を行う。具体的には、出力制御部４０は、出力態様決定部３８により決定された出力態様になるように適宜変換された音声Ｖｒを、図２の音声出力部１９に出力させる制御を行う。なお、出力制御部４０が音声Ｖｒを出力するタイミングは特に限定されない。例えば、特定の音声の情報が抽出されると、直ちに出力態様の決定と変換とが行われて、音声出力部１９から出力されるようにしてもよい。また、例えば特定の音声の情報が抽出された後、予め設定された時間が経過する間に出力態様の決定と変換とが行われて、音声出力部１９から出力されるようにしてもよい。この場合、「予め設定された時間」はユーザＵが任意に設定できるものとし、例えば「１秒後」、「３秒後」といったように設定できるようにしてもよい。 The output control unit 40 controls to output the audio Vr in the output mode determined by the output mode determination unit 38. Specifically, the output control unit 40 controls to output the audio Vr, which has been appropriately converted so as to have the output mode determined by the output mode determination unit 38, to the audio output unit 19 in FIG. 2. The timing at which the output control unit 40 outputs the audio Vr is not particularly limited. For example, when specific audio information is extracted, the output mode may be determined and converted immediately, and the audio may be output from the audio output unit 19. Also, for example, after specific audio information is extracted, the output mode may be determined and converted within a preset time, and the audio may be output from the audio output unit 19. In this case, the "preset time" may be set arbitrarily by the user U, and may be set, for example, to "after 1 second" or "after 3 seconds".

〔サーバの構成〕
図４は、サーバ７０の機能構成を示す図である。
サーバ７０は、ユーザ情報を取得するユーザ情報取得部７１と、ユーザ情報を記憶するユーザ情報記憶部７２と、ユーザ情報の一部または全部を育児支援ロボット１０に送信する制御を行うユーザ情報送信制御部７３とを有している。また、サーバ７０は、出力態様の一部を指定する入力操作を受け付ける出力態様受付部７４と、出力態様の一部を決定する出力態様決定部７５とを有している。また、その決定内容を育児支援ロボット１０に送信する制御を行う出力態様送信制御部７６を有している。 [Server configuration]
FIG. 4 is a diagram showing the functional configuration of the server 70.
The server 70 has a user information acquisition unit 71 that acquires user information, a user information storage unit 72 that stores the user information, and a user information transmission control unit 73 that controls the transmission of part or all of the user information to the childcare support robot 10. The server 70 also has an output mode reception unit 74 that receives an input operation specifying a part of the output mode, and an output mode determination unit 75 that determines a part of the output mode. The server 70 also has an output mode transmission control unit 76 that controls the transmission of the determined content to the childcare support robot 10.

ユーザ情報取得部７１は、ユーザＵａの入力操作にて入力されたユーザＵａおよびユーザＵｂの各々のユーザ情報を取得する。また、ユーザ情報取得部７１は、ユーザＵａおよびユーザＵｂの各々による本サービスの利用実績を、ユーザＵａおよびユーザＵｂのユーザ情報として取得する。また、ユーザ情報取得部７１は、他サービスにてユーザＵａおよびユーザＵｂの各々のユーザ情報として既に登録されている情報のうち、ユーザＵの承諾が得られた情報を、本サービスにおけるユーザＵａおよびユーザＵｂの各々のユーザ情報として取得する。 The user information acquisition unit 71 acquires the user information of each of the users Ua and Ub input by the user Ua's input operation. The user information acquisition unit 71 also acquires the usage history of the service by each of the users Ua and Ub as the user information of the users Ua and Ub. The user information acquisition unit 71 also acquires, from the information that has already been registered as the user information of each of the users Ua and Ub in another service, information for which the user U has given consent, as the user information of each of the users Ua and Ub in the service.

ユーザ情報記憶部７２は、ユーザ情報取得部７１により取得されたユーザＵａおよびユーザＵｂの各々のユーザ情報を、ユーザ情報ＤＢ８２に記憶して管理する。具体的には、ユーザ情報記憶部７２は、ユーザＵａおよびユーザＵｂの各々を一意に特定可能にする識別情報にユーザ情報を対応付けて、その識別情報をキーとしていつでも抽出可能な態様で管理する。 The user information storage unit 72 stores and manages the user information of each of the users Ua and Ub acquired by the user information acquisition unit 71 in the user information DB 82. Specifically, the user information storage unit 72 associates the user information with identification information that allows each of the users Ua and Ub to be uniquely identified, and manages the user information in a manner that allows the identification information to be used as a key to be extracted at any time.

ユーザ情報送信制御部７３は、ユーザ情報ＤＢ８２に記憶されているユーザ情報の一部または全部を、育児支援ロボット１０に送信する制御を行う。なお、サーバ７０から育児支援ロボット１０に対するユーザ情報の送信の有無、および送信するユーザ情報の項目は、ユーザＵａの入力操作により任意に設定することができる。 The user information transmission control unit 73 controls the transmission of some or all of the user information stored in the user information DB 82 to the child care support robot 10. The user Ua can arbitrarily set whether or not to transmit user information from the server 70 to the child care support robot 10 and the items of user information to be transmitted.

出力態様受付部７４は、出力態様の一部の指定を受け付ける。具体的には、出力態様受付部７４は、出力態様の一部を指定するための入力操作が行われると、その指定を受け付ける。この入力操作は、ユーザ端末５０のユーザインターフェースへの手入力操作、マイク等への音声による入力操作、カメラ等へのジェスチャによる入力操作等が挙げられる。「出力態様の一部の指定」としては、例えば出力態様のバリエーションの指定を可能にする「出力モード」の指定が挙げられる。なお、出力モードの詳細については、図８（Ｂ）を参照して後述する。 The output mode receiving unit 74 receives the designation of a part of the output mode. Specifically, when an input operation for designating a part of the output mode is performed, the output mode receiving unit 74 receives the designation. This input operation can be a manual input operation to the user interface of the user terminal 50, an input operation by voice to a microphone or the like, an input operation by gesture to a camera or the like, and the like. An example of a "designation of a part of the output mode" is the designation of an "output mode" that allows the designation of variations in the output mode. Details of the output mode will be described later with reference to FIG. 8 (B).

出力態様決定部７５は、出力態様受付部７４により出力態様の一部についての指定が受け付けられた場合には、その指定に従った決定を行う。また、出力態様決定部７５は、例えば予め定められた基準や、ＡＩ（人工知能）を利用した機械学習の結果に基づいて、出力態様の一部を決定してもよい。出力態様送信制御部７６は、出力態様決定部７５によって出力態様の一部が決定されると、その決定された内容を育児支援ロボット１０に送信する制御を行う。 When the output mode determination unit 75 receives a designation for a part of the output mode from the output mode reception unit 74, the output mode determination unit 75 makes a decision according to the designation. The output mode determination unit 75 may also determine a part of the output mode based on, for example, predetermined criteria or the results of machine learning using AI (artificial intelligence). When a part of the output mode is determined by the output mode determination unit 75, the output mode transmission control unit 76 controls the transmission of the determined content to the child care support robot 10.

〔育児支援ロボットの処理〕
次に、育児支援ロボット１０の処理について、図５を用いて説明する。
図５は、育児支援ロボット１０の処理の流れを示すフローチャートである。
ユーザＵａの入力操作によってユーザＵａおよびユーザＵｂの各々のユーザ情報が入力されると（ステップ１０１でＹＥＳ）、ユーザ情報取得部３５は、ユーザＵａの入力操作にて入力されたユーザ情報を取得する（ステップ１０２）。ユーザ情報取得部３５により取得されたユーザ情報は、ユーザ情報記憶部３６によりユーザ情報ＤＢ４２に記憶されて管理される。これに対して、ユーザ情報が入力されていない場合には（ステップ１０１でＮＯ）、ステップ１０３の判断に進む。 [Childcare support robot processing]
Next, the processing of the childcare support robot 10 will be described with reference to FIG. 5.
FIG. 5 is a flowchart showing the flow of processing by the childcare support robot 10.
When the user information of each of the users Ua and Ub is input by the input operation of the user Ua (YES in step 101), the user information acquisition unit 35 acquires the user information input by the input operation of the user Ua (step 102). The user information acquired by the user information acquisition unit 35 is stored and managed in the user information DB 42 by the user information storage unit 36. On the other hand, if no user information has been input (NO in step 101), the process proceeds to the judgment in step 103.

ユーザＵａの入力操作によって出力態様の一部が指定されると（ステップ１０３でＹＥＳ）、出力態様受付部３７がその指定を受け付けて、出力態様決定部３８がその指定に従って出力態様の一部を決定する（ステップ１０４）。これに対して、出力態様の一部が指定されていない場合には（ステップ１０３でＮＯ）、ステップ１０５の判断に進む。 When part of the output mode is designated by the input operation of the user Ua (YES in step 103), the output mode acceptance unit 37 accepts the designation, and the output mode determination unit 38 determines part of the output mode in accordance with the designation (step 104). On the other hand, when part of the output mode is not designated (NO in step 103), the process proceeds to the judgment in step 105.

ユーザＵａから発話された音声Ｖａと、ユーザＵｂから発話された音声Ｖｂとのうち少なくとも一方の音声が入力されると（ステップ１０５でＹＥＳ）、その音声を録音部１８が録音して、音声取得部３１が、録音部１８により録音された音声Ｖａおよび音声Ｖｂの各々の情報を取得する（ステップ１０６）。これに対して、ユーザＵａから発話された音声Ｖａと、ユーザＵｂから発話された音声Ｖｂとのうち少なくとも一方の音声が入力されていない場合には（ステップ１０５でＮＯ）、ユーザＵａから発話された音声Ｖａと、ユーザＵｂから発話された音声Ｖｂとのうち少なくとも一方の音声が入力されるまでステップ１０５の判断が繰り返される。 When at least one of the voice Va spoken by user Ua and the voice Vb spoken by user Ub is input (YES in step 105), the recording unit 18 records the voice, and the voice acquisition unit 31 acquires information on each of the voices Va and Vb recorded by the recording unit 18 (step 106). On the other hand, when at least one of the voice Va spoken by user Ua and the voice Vb spoken by user Ub is not input (NO in step 105), the judgment in step 105 is repeated until at least one of the voice Va spoken by user Ua and the voice Vb spoken by user Ub is input.

特定音声抽出部３２は、ユーザＵａおよびユーザＵｂの各々の発話の音声Ｖａおよび音声Ｖｂの各々に、予め定められた特徴量の音声の部分が含まれることを検知して、これを特定の音声として抽出する（ステップ１０７）。感情パターン推定部３３は、ステップ１０７で特定音声抽出部３２により抽出された特定の音声の特徴量に基づいて、特定の音声を発話したユーザＵの感情のパターンを推定する（ステップ１０８）。 The specific voice extraction unit 32 detects that the voices Va and Vb of the users Ua and Ub each contain a voice portion with a predetermined characteristic amount, and extracts this as a specific voice (step 107). The emotion pattern estimation unit 33 estimates the emotion pattern of the user U who spoke the specific voice, based on the characteristic amount of the specific voice extracted by the specific voice extraction unit 32 in step 107 (step 108).

出力態様決定部３８は、ステップ１０８で感情パターン推定部３３により推定された、特定の音声を発話したユーザＵの感情のパターンに応じて、ユーザＵｂに向けて出力する音声Ｖｒの出力態様を決定する（ステップ１０９）。ここで、出力態様受付部３７により出力態様の一部の指定が受け付けられた場合には、その指定に従った決定が行われる。 The output mode determination unit 38 determines the output mode of the voice Vr to be output to the user Ub according to the emotional pattern of the user U who uttered the specific voice, estimated by the emotional pattern estimation unit 33 in step 108 (step 109). Here, if the output mode reception unit 37 receives a designation of a part of the output mode, a determination is made according to the designation.

変換部３９は、抽出された特定の音声の出力態様を、出力態様決定部３８により決定された出力態様に変換する（ステップ１１０）。ここで、ステップ１０９における出力態様決定部３８の決定の結果が、出力態様受付部３７により受け付けられた一部の指定内容に従ったものである場合、変換部３９は、出力態様受付部３７により一部の指定が受け付けられた出力態様に変換する。また、例えば出力態様決定部３８の決定の結果が、出力態様の決定に際してユーザ情報を考慮する旨の設定に従ったものである場合には、変換部３９は、出力態様を、ユーザ情報が考慮された出力態様に変換する。出力制御部４０は、出力態様決定部３８により決定された出力態様の音声Ｖｒを出力する制御を行う（ステップ１１１）。これにより処理が終了する。 The conversion unit 39 converts the output mode of the extracted specific voice into the output mode determined by the output mode determination unit 38 (step 110). Here, if the result of the determination by the output mode determination unit 38 in step 109 is in accordance with the part of the specification content accepted by the output mode acceptance unit 37, the conversion unit 39 converts into the output mode in which the part of the specification is accepted by the output mode acceptance unit 37. Also, for example, if the result of the determination by the output mode determination unit 38 is in accordance with a setting to take user information into consideration when deciding the output mode, the conversion unit 39 converts the output mode into an output mode in which the user information is taken into consideration. The output control unit 40 controls to output the voice Vr in the output mode determined by the output mode determination unit 38 (step 111). This ends the process.
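
As a non-limiting illustration, steps 107 to 111 of FIG. 5 may be sketched as a pipeline in which each functional unit is passed in as a callable. The function name, the use of text strings in place of audio data, and the early return when no specific voice is found are assumptions made for this sketch; the flowchart as described does not state what happens in that case.

```python
from typing import Callable, Optional


def process_utterance(
    audio: str,
    extract: Callable[[str], Optional[str]],
    estimate: Callable[[str], str],
    decide: Callable[[str, str], str],
    convert: Callable[[str, str], str],
) -> Optional[str]:
    """Run steps 107 to 110 on one recorded utterance and return the voice Vr."""
    specific = extract(audio)           # step 107: specific voice extraction unit 32
    if specific is None:
        return None                     # assumption: nothing is output
    emotion = estimate(specific)        # step 108: emotion pattern estimation unit 33
    mode = decide(specific, emotion)    # step 109: output mode determination unit 38
    return convert(specific, mode)      # step 110: conversion unit 39


# Toy stand-ins for the units; the caller would then output Vr (step 111).
vr = process_utterance(
    "Amazing! You did it",
    extract=lambda a: "Amazing!" if "Amazing!" in a else None,
    estimate=lambda s: "joy",
    decide=lambda s, e: e + "-tone",
    convert=lambda s, m: s + " [" + m + "]",
)
```

Passing the units as callables keeps the order of the steps visible while leaving each unit replaceable, for example by a learned model on either the robot side or the server side.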

〔サーバの処理〕
次に、サーバ７０の処理について、図６を用いて説明する。
図６は、サーバ７０の処理の流れを示すフローチャートである。
ユーザＵａの入力操作によってユーザＵａおよびユーザＵｂのユーザ情報が入力されると（ステップ７０１でＹＥＳ）、ユーザ情報取得部７１は、ユーザＵａの入力操作にて入力されたユーザＵａおよびユーザＵｂのユーザ情報を取得する（ステップ７０２）。ユーザ情報取得部７１により取得されたユーザＵａおよびユーザＵｂのユーザ情報は、ユーザ情報記憶部７２によりユーザ情報ＤＢ８２に記憶されて管理される。ユーザ情報送信制御部７３は、ユーザ情報ＤＢ８２に記憶されているユーザＵａおよびユーザＵｂのユーザ情報の一部または全部を育児支援ロボット１０に送信する制御を行う（ステップ７０３）。これに対して、ユーザＵａおよびユーザＵｂのユーザ情報が入力されていない場合には（ステップ７０１でＮＯ）、ステップ７０４の判断に進む。 [Server processing]
Next, the processing of the server 70 will be described with reference to FIG. 6.
FIG. 6 is a flowchart showing the flow of processing by the server 70.
When the user information of the users Ua and Ub is input by an input operation of the user Ua (YES in step 701), the user information acquisition unit 71 acquires that user information (step 702). The acquired user information is stored and managed in the user information DB 82 by the user information storage unit 72. The user information transmission control unit 73 then controls the transmission of part or all of the user information of the users Ua and Ub stored in the user information DB 82 to the childcare support robot 10 (step 703). On the other hand, when the user information of the users Ua and Ub has not been input (NO in step 701), the process proceeds to the judgment in step 704.

ユーザＵａの入力操作によって出力態様の一部が指定されると（ステップ７０４でＹＥＳ）、出力態様受付部７４がその指定を受け付けて、出力態様決定部７５がその指定に従って出力態様の一部を決定する（ステップ７０５）。出力態様の一部が決定されると、出力態様送信制御部７６は、その決定内容を育児支援ロボット１０に送信する制御を行う（ステップ７０６）。これに対して、出力態様の一部が指定されていない場合には（ステップ７０４でＮＯ）、出力態様の一部が決定されることなく処理は終了する。 When part of the output mode is specified by the input operation of the user Ua (YES in step 704), the output mode receiving unit 74 receives the specification, and the output mode determining unit 75 determines part of the output mode in accordance with the specification (step 705). When part of the output mode is determined, the output mode transmission control unit 76 controls the transmission of the determined content to the child care support robot 10 (step 706). On the other hand, when part of the output mode is not specified (NO in step 704), the process ends without determining part of the output mode.
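
As a non-limiting illustration, the server-side flow of steps 701 to 706 in FIG. 6 may be sketched as follows. The function name, the message labels `"user_info"` and `"output_mode"`, the `send` callback, and the sample values are assumptions made for this sketch. It also assumes that the flow continues to step 704 after step 703, and it transmits the whole record, whereas the embodiment allows part or all of the user information to be transmitted.

```python
from typing import Callable, Optional


def server_process(
    user_info: Optional[dict],
    designated_part: Optional[dict],
    user_db: dict,
    send: Callable[[str, dict], None],
) -> None:
    """One pass through the server flow, with the robot reached via `send`."""
    if user_info:                           # step 701: user information input?
        user_db.update(user_info)           # step 702, then stored in the DB
        send("user_info", dict(user_db))    # step 703: transmit to the robot
    if designated_part:                     # step 704: part of output mode designated?
        decided = dict(designated_part)     # step 705: decide according to designation
        send("output_mode", decided)        # step 706: transmit the decision
    # With no designation, the pass ends without deciding any part (NO in step 704).


# Collect transmitted messages in a list instead of using a real network.
sent = []
db = {}
server_process(
    {"Ub": {"role": "child"}},
    {"output_mode": "gentle"},  # hypothetical designation
    db,
    lambda kind, payload: sent.append((kind, payload)),
)
```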

以上の構成を有する育児支援ロボットシステム１によれば、育児支援ロボット１０が、あるときは子であるユーザＵｂの心をケアする存在となり、またあるときは親であるユーザＵａをサポートする存在となる。すなわち、育児支援ロボット１０は、ユーザＵａおよびユーザＵｂの家族の一員として、ユーザＵａと一緒に育児を支援することが可能となる。 According to the childcare support robot system 1 having the above configuration, the childcare support robot 10 sometimes becomes a presence that cares for the mental health of the user Ub, who is the child, and sometimes becomes a presence that supports the user Ua, who is the parent. In other words, the childcare support robot 10 becomes a member of the family of the user Ua and the user Ub, and is able to support childcare together with the user Ua.

〔具体例〕
次に、本サービスの具体例について、図７及び図８を用いて説明する。
図７は、データベースに記憶されている、感情のパターンと、特定音声の内容と、出力態様との対応関係の具体例を示す図である。 [Specific examples]
Next, a specific example of this service will be described with reference to FIGS. 7 and 8.
FIG. 7 is a diagram showing a specific example of the correspondence between emotion patterns, specific voice contents, and output modes stored in the database.

図７に示す情報は、特定音声ＤＢ４１および特定音声ＤＢ８１に記憶されている情報の一部である。なお、上述したように、特定音声ＤＢ４１および特定音声ＤＢ８１にはフィードバック情報が記憶されているが、図７には、具体例を説明するための情報として、感情のパターンと、特定音声の内容と、出力態様との対応関係のみが示されている。 The information shown in FIG. 7 is a portion of the information stored in specific voice DB41 and specific voice DB81. As described above, feedback information is stored in specific voice DB41 and specific voice DB81, but FIG. 7 only shows the correspondence between emotion patterns, specific voice content, and output modes as information for explaining specific examples.

図７に示すように、発話者（ユーザＵａおよびユーザＵｂ）の感情のパターンには、「ポジティブ」な感情を示すものと「ネガティブ」な感情を示すものとがある。このうち、「ポジティブ」な感情を示すものには、「歓喜」の感情を示すもの、「平穏」の感情を示すもの、その他図示はしないが、「幸福」、「関心」等の感情を示すものがある。そして、「歓喜」を示すものに対応する特定の音声の内容として、例えば「すごい！」、「えらい！」といったものがあり、各々に対応する育児支援ロボット１０の出力態様として、「すごい！」、「えらい！」といったものがある。また、「平穏」を示すものに対応する特定の音声の内容として、例えば「いい？」、「わかった？」といったものがあり、各々に対応する育児支援ロボット１０の出力態様として、「よく聞いてね」、「よくわかったよね」といったものがある。 As shown in FIG. 7, the emotional patterns of the speakers (users Ua and Ub) include those that indicate "positive" emotions and those that indicate "negative" emotions. Among these, the "positive" emotions include those that indicate "joy", those that indicate "calm", and others (not shown) that indicate emotions such as "happiness" and "interest". Specific voice contents that correspond to those that indicate "joy" include, for example, "Amazing!" and "Great!", and the output modes of the childcare support robot 10 that correspond to each of these include, for example, "Amazing!" and "Great!". Specific voice contents that correspond to those that indicate "calm" include, for example, "Okay?" and "Got it?", and the output modes of the childcare support robot 10 that correspond to each of these include, for example, "Listen carefully", and "You understand well, don't you?".

The "negative" patterns include those indicating "sorrow", those indicating "anger", and, although not shown, those indicating emotions such as "fear" and "disgust". The specific voice contents corresponding to "sorrow" include, for example, "That's terrible!" and "(crying)", and the corresponding output modes of the childcare support robot 10 are "What's wrong?" and "Don't cry", respectively. The specific voice contents corresponding to "anger" include, for example, "Hey!" and "No!", and the corresponding output modes of the childcare support robot 10 are "That's not good" and "No", respectively.

Suppose, for example, that user Ua utters the voice Va "Amazing!" with the emotion of "joy", as if to praise user Ub highly. The childcare support robot 10 then outputs the voice Vr "Amazing!" to user Ub in an output mode that expresses the emotion of "joy". Likewise, suppose that user Ua utters the voice Va "Great!" with the emotion of "joy", as if to praise user Ub. The childcare support robot 10 then outputs the voice Vr "Great!" to user Ub in an output mode that expresses the emotion of "joy".

Suppose also that user Ua utters the voice Va "Okay?" with the emotion of "calm", as if to encourage user Ub to listen attentively. The childcare support robot 10 then outputs the voice Vr "Listen carefully" to user Ub in an output mode that expresses the emotion of "calm". Likewise, suppose that user Ua utters the voice Va "Got it?" with the emotion of "calm", as if to admonish user Ub. The childcare support robot 10 then outputs the voice Vr "You understand well, don't you?" to user Ub in an output mode that expresses the emotion of "calm".

Suppose also that user Ub utters the voice Vb "That's terrible!" to user Ua with the emotion of "sorrow". The childcare support robot 10 then outputs the voice Vr "What's wrong?" to user Ub in an output mode that expresses the emotion of "calm". Likewise, suppose that user Ub, rather than saying specific words, utters a crying voice Vb (sobbing) with the emotion of "sorrow". The childcare support robot 10 then outputs the voice Vr "Don't cry" to user Ub in an output mode that expresses the emotion of "calm".

Suppose also that user Ua utters the voice Va "Hey!" with the emotion of "anger", as if to scold user Ub harshly. The childcare support robot 10 then outputs the voice Vr "That's not good" to user Ub in an output mode that expresses the emotion of "calm". Likewise, suppose that user Ua utters the voice Va "No!" with the emotion of "anger", as if to warn user Ub sternly. The childcare support robot 10 then outputs the voice Vr "No" to user Ub in an output mode that expresses the emotion of "calm".
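The correspondence walked through above can be sketched as a simple lookup table. This is only an illustrative sketch, not the implementation of the specific voice DB: the names `Response`, `FIG7_TABLE`, and `decide_output` are hypothetical, and the table holds only the eight example rows described for FIG. 7.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    text: str            # content of the voice Vr output by the robot
    output_emotion: str  # emotion expressed by the output mode


# Hypothetical table keyed by (estimated emotion pattern, specific voice content).
# The rows mirror the examples described for FIG. 7; a real database would
# also hold feedback information, which is omitted here.
FIG7_TABLE = {
    ("joy", "Amazing!"): Response("Amazing!", "joy"),
    ("joy", "Great!"): Response("Great!", "joy"),
    ("calm", "Okay?"): Response("Listen carefully", "calm"),
    ("calm", "Got it?"): Response("You understand well, don't you?", "calm"),
    ("sorrow", "That's terrible!"): Response("What's wrong?", "calm"),
    ("sorrow", "(crying)"): Response("Don't cry", "calm"),
    ("anger", "Hey!"): Response("That's not good", "calm"),
    ("anger", "No!"): Response("No", "calm"),
}


def decide_output(emotion: str, content: str) -> Optional[Response]:
    """Return the robot's response for a specific voice, or None if unlisted."""
    return FIG7_TABLE.get((emotion, content))
```

In this sketch, joyful praise is echoed with the same emotion, while sorrowful and angry utterances are answered in a calm output mode, matching the examples above.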

FIG. 8(A) is a diagram showing a specific example of the timing at which the childcare support robot system of FIG. 1 is applied.
As shown in FIG. 8(A), examples of the timing at which the childcare support robot system 1 is applied include when user Ub, a child, has succeeded at something, when user Ub has failed at something, and when user Ub is crying. When user Ub succeeds at something and user Ua, the mother, praises user Ub, the childcare support robot 10 outputs a voice Vr that praises user Ub, joining in with that praise. This amplifies the positive effect that being praised has on user Ub's mind.

When user Ub fails at something and user Ua, the mother, scolds user Ub harshly, the childcare support robot 10 does not join in but instead outputs a voice Vr that comforts user Ub. This softens the negative emotional effect of being scolded and helps stabilize user Ub's emotions. Conversely, when user Ub fails at something and user Ua scolds user Ub while restraining her emotions, the childcare support robot 10 can output a voice Vr that admonishes user Ub a little more firmly. This helps user Ub grasp the meaning of user Ua's attitude. By having user Ua and the childcare support robot 10 express opposing emotions in this way, effective childcare through a combination of tension and relaxation becomes possible.

When user Ub is crying for some reason, the childcare support robot 10 unconditionally outputs a voice Vr that comforts user Ub. This allows the childcare support robot 10 to provide emotional support as a member of the family of user Ua and user Ub. Note that the timings shown in FIG. 8(A) are merely examples. The childcare support robot system 1 is applied at any timing in the daily life of user Ub, a child, at which someone nearby ought to lend user Ub a helping hand.
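The timing-dependent behavior described for FIG. 8(A) can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the event labels ("success", "failure", "crying"), the emotion labels, and the names `Stance` and `choose_stance` are hypothetical and are not taken from the embodiment.

```python
from enum import Enum
from typing import Optional


class Stance(Enum):
    JOIN_PRAISE = "join in the praise"
    COMFORT = "comfort"
    ADMONISH = "admonish a little more firmly"


def choose_stance(child_event: str, parent_emotion: Optional[str]) -> Optional[Stance]:
    """Pick the robot's stance from the child's situation and the parent's emotion.

    child_event and parent_emotion use hypothetical labels chosen for this sketch.
    """
    if child_event == "crying":
        # A crying child is comforted unconditionally.
        return Stance.COMFORT
    if child_event == "success" and parent_emotion == "joy":
        # The parent praises, so the robot joins in.
        return Stance.JOIN_PRAISE
    if child_event == "failure" and parent_emotion == "anger":
        # The parent scolds harshly, so the robot softens the impact.
        return Stance.COMFORT
    if child_event == "failure" and parent_emotion == "calm":
        # The parent scolds with restraint, so the robot reinforces the message.
        return Stance.ADMONISH
    return None  # Situation not covered by the FIG. 8(A) examples.
```

The two "failure" branches reflect the idea that the robot expresses the emotion opposite to the parent's, giving the combination of tension and relaxation described above.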

FIG. 8(B) is a diagram showing specific examples of variations in the mode of the voice output from the childcare support robot.
As described above, the childcare support robot 10 outputs to user Ub a voice Vr for supporting user Ua in raising user Ub, in accordance with the emotions contained in the voice Va uttered by user Ua and the voice Vb uttered by user Ub. Many variations are conceivable for the output mode of the voice Vr output from the childcare support robot 10. As shown in FIG. 8(B), for example, these variations may be made selectable in advance through an input operation by user Ua, as named "output modes", one name per output mode. The output modes include "soft", "hard", "grandfather", "grandmother", "teacher", "neighborhood aunt", and "favorite character". The output modes are displayed on the user terminal 50 in a form that user Ua can select.

When "soft" is selected from the output modes shown in FIG. 8(B), the voice Vr is output in a softer manner than usual. In the example of FIG. 7 described above, when user Ua says "Amazing!", the childcare support robot 10 joins in by outputting the voice Vr "Amazing!"; if "soft" has been selected in advance, it instead outputs a voice Vr such as "Wonderful, isn't it!". When "hard" is selected from the output modes shown in FIG. 8(B), the voice Vr is output in a stiffer manner than usual. In the example of FIG. 7 described above, when user Ua says "Okay?", the childcare support robot 10 outputs the voice Vr "Listen carefully"; if "hard" has been selected in advance, it instead outputs a voice Vr such as "Please listen carefully".

When "grandfather" is selected from the output modes shown in FIG. 8(B), the voice Vr is output in the manner in which a typical grandfather would be likely to speak. In the example of FIG. 7 described above, when user Ua says "Great!", the childcare support robot 10 joins in by outputting the voice Vr "Great!"; if "grandfather" has been selected in advance, it instead outputs a voice Vr such as "That's impressive!". Similarly, when "grandmother", "teacher", or "neighborhood aunt" is selected from the output modes shown in FIG. 8(B), the voice Vr is output in the manner in which a typical grandmother, teacher, or neighborhood aunt, respectively, would be likely to speak.

When "favorite character" is selected from the output modes shown in FIG. 8(B), the voice Vr is output in the manner in which an anime or manga character that user Ub likes would be likely to speak. In this case, separately recorded audio of the relevant voice actor is used. In the example of FIG. 7 described above, when user Ub is crying, the childcare support robot 10 outputs the voice Vr "Don't cry" in the voice of the character that user Ub likes. This can be expected to enhance the effect of caring for user Ub's feelings. To select "favorite character", user Ua specifies in advance one or more characters that user Ub likes when entering user information into the user terminal 50.
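The effect of a preselected output mode can be sketched as a post-processing step applied to the default voice Vr. This is an illustrative sketch only: the names `Output`, `MODE_REPHRASINGS`, and `apply_output_mode` and the `recording:` voice identifier are hypothetical, and the table contains only the three rephrasings given as examples above.

```python
from typing import NamedTuple, Optional


class Output(NamedTuple):
    text: str   # content of the voice Vr
    voice: str  # which voice is used to speak it


# Hypothetical rephrasing table keyed by (output mode, default text).
# "Wonderful, isn't it!" is a rendering of the Japanese "すばらしいね！".
MODE_REPHRASINGS = {
    ("soft", "Amazing!"): "Wonderful, isn't it!",
    ("hard", "Listen carefully"): "Please listen carefully",
    ("grandfather", "Great!"): "That's impressive!",
}


def apply_output_mode(mode: str, default_text: str,
                      favorite_character: Optional[str] = None) -> Output:
    """Adjust the default voice Vr according to the preselected output mode."""
    if mode == "favorite character" and favorite_character is not None:
        # The wording is unchanged; a separately recorded voice is used instead.
        return Output(default_text, f"recording:{favorite_character}")
    # Fall back to the default wording when no rephrasing is registered.
    text = MODE_REPHRASINGS.get((mode, default_text), default_text)
    return Output(text, "default")
```

For example, `apply_output_mode("hard", "Listen carefully")` yields the stiffer wording, while the "favorite character" mode keeps the wording and swaps only the voice.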

Although the present embodiment has been described above, the present invention is not limited to this embodiment. Likewise, the effects of the present invention are not limited to those described in this embodiment. For example, users Ua and Ub are taken to include persons who have not yet started using the service but wish to use it.

In the embodiment described above, manual input, voice input, and gesture input are given as the forms of input operation by user Ua, but the input operation is not limited to these, and any input operation capable of conveying the intention of user Ua can be adopted. Also, in the embodiment described above, user U cannot specify all of the output modes in advance, but a configuration in which user U can specify all of the output modes in advance may also be adopted.

In the embodiment described above, "content" and "emotion pattern" are given as the elements that make up the output mode when the voice Vr is output from the childcare support robot 10, but other elements may also be included, such as the timing (pause) with which the voice Vr is output and the volume and speed at which it is output.

In the embodiment described above, user Ua performs the input operations for configuring the various settings of the childcare support robot 10 using a dedicated app installed on the user terminal 50, but the invention is not limited to this. For example, direct input to the childcare support robot 10 may be made possible.

In the embodiment described above, the conversion preserves the feature amounts of the voice Va uttered by user Ua as far as possible, but by making further use of AI (artificial intelligence), a conversion that substantially changes the feature amounts of the voice Va uttered by user Ua can also be performed. Specifically, the schedule of user Ub may be managed as part of the user information of user Ub so that a voice Vr containing more detailed information is output from the childcare support robot 10. In this case, for example, when user Ua utters the voice Va "Hurry up!" to user Ub, the childcare support robot 10 may convert it into a voice Vr containing detailed information, such as "We're leaving the house by 8 o'clock, so let's finish changing clothes by 7:50", and output it.
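The schedule-based conversion in this variation can be sketched as follows. The sketch is hypothetical throughout: the function name `convert_with_schedule`, the fixed trigger phrase, and the 10-minute margin (inferred only from the 8:00 and 7:50 times in the example) are assumptions, and a simple template stands in for whatever AI-based conversion would actually be used.

```python
from datetime import date, datetime, time, timedelta


def _clock(t: time) -> str:
    """Format a time as H:MM without a leading zero on the hour."""
    return f"{t.hour}:{t.minute:02d}"


def convert_with_schedule(voice_va: str, departure: time,
                          margin_minutes: int = 10) -> str:
    """Turn a terse "Hurry up!" into a concrete, schedule-aware message.

    departure is assumed to come from the child's managed schedule;
    margin_minutes is an assumed buffer before the departure time.
    """
    if voice_va != "Hurry up!":
        return voice_va  # Other utterances pass through unchanged in this sketch.
    # time objects do not support subtraction, so go through a datetime.
    deadline = (datetime.combine(date.today(), departure)
                - timedelta(minutes=margin_minutes)).time()
    return (f"We're leaving the house by {_clock(departure)}, "
            f"so let's finish changing clothes by {_clock(deadline)}")
```

Calling `convert_with_schedule("Hurry up!", time(8, 0))` reproduces the example message with an 8:00 departure and a 7:50 deadline.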

1...Childcare support robot system, 10...Childcare support robot, 31...Voice acquisition unit, 32...Specific voice extraction unit, 33...Emotion pattern estimation unit, 34...Output mode storage unit, 35...User information acquisition unit, 36...User information storage unit, 37...Output mode reception unit, 38...Output mode determination unit, 39...Conversion unit, 40...Output control unit, 50...User terminal, 70...Server, 71...User information acquisition unit, 72...User information storage unit, 73...User information transmission control unit, 74...Output mode reception unit, 75...Output mode determination unit, 76...Output mode transmission control unit, 90...Network

Claims

A robot comprising:
an extraction means for detecting that a voice of direct speech from a first user to a second user includes a voice portion having at least one predetermined feature among pitch, fundamental frequency, periodicity of the voice waveform, vocal tract resonance frequency, formant frequency band, voice volume, voice quality, tone of voice, speed, and frequency of occurrence of waveform peaks, and for extracting that voice portion as a specific voice;
an estimation means for estimating an emotion pattern of the first user based on a feature amount of the specific voice extracted by the extraction means; and
a determination means for determining a mode of a voice to be output to the second user in accordance with the emotion pattern estimated by the estimation means.

The robot according to claim 1, wherein the determination means determines, as the mode of the voice to be output to the second user, a voice having the same content as the speech of the first user and a changed emotion pattern.

The robot according to claim 1, further comprising a storage means for storing in advance a plurality of patterns of the mode of the voice corresponding to each of a positive emotion and a negative emotion of the first user, wherein
the estimation means estimates, as the emotion pattern of the first user, which of the positive emotion pattern and the negative emotion pattern the emotion of the first user corresponds to, and
the determination means selects and determines the pattern of the mode of the voice to be output to the second user from among the plurality of patterns stored in the storage means, in accordance with the emotion pattern of the first user estimated by the estimation means.

The robot according to claim 3, wherein the determination means selects and determines a positive emotion pattern as the pattern of the mode of the voice to be output to the second user, regardless of the result of the estimation of the emotion pattern of the first user by the estimation means.

The robot according to claim 1, wherein
the extraction means further extracts, from the voice of the first user, voice portions in the context before and after the voice portion having the predetermined feature, and
the estimation means estimates the emotion pattern of the first user by further considering features of the voice portions in the context before and after the voice portion having the predetermined feature extracted by the extraction means.

A robot system comprising:
an extraction means for extracting, as a specific voice, a voice portion having at least one predetermined feature among pitch, fundamental frequency, periodicity of the voice waveform, vocal tract resonance frequency, formant frequency band, voice volume, voice quality, tone of voice, speed, and frequency of occurrence of waveform peaks, when a robot detects that a voice of direct speech from a first user to a second user includes that voice portion;
an estimation means for estimating an emotion pattern of the first user based on a feature amount of the specific voice extracted by the extraction means;
a determination means for determining a mode of a voice to be output by the robot to the second user in accordance with the emotion pattern estimated by the estimation means; and
an output control means for controlling the robot so as to output a voice in the mode determined by the determination means.

The robot system according to claim 6, wherein the determination means determines, as the mode of the voice to be output to the second user, a voice having the same content as the speech of the first user and a changed emotion pattern.

The robot system according to claim 6, further comprising a storage means for storing in advance a plurality of patterns of the mode of the voice corresponding to each of a positive emotion and a negative emotion of the first user, wherein
the estimation means estimates, as the emotion pattern of the first user, which of the positive emotion pattern and the negative emotion pattern the emotion of the first user corresponds to, and
the determination means selects and determines the pattern of the mode of the voice to be output to the second user from among the plurality of patterns stored in the storage means, in accordance with the emotion pattern of the first user estimated by the estimation means.

The robot system according to claim 8, wherein the determination means selects and determines a positive emotion pattern as the pattern of the mode of the voice to be output to the second user, regardless of the result of the estimation of the emotion pattern of the first user by the estimation means.

The robot system according to claim 6, wherein
the extraction means further extracts, from the voice of the first user, voice portions in the context before and after the voice portion having the predetermined feature, and
the estimation means estimates the emotion pattern of the first user by further considering features of the voice portions in the context before and after the voice portion having the predetermined feature extracted by the extraction means.