JP7613796B2

JP7613796B2 - Line-of-sight control device and method, non-transitory storage medium, and computer program

Info

Publication number: JP7613796B2
Application number: JP2024522892A
Authority: JP
Inventors: カルロストシノリイシイ; 太健新谷
Original assignee: RIKEN
Current assignee: RIKEN
Priority date: 2022-05-27
Filing date: 2022-10-18
Publication date: 2025-01-15
Anticipated expiration: 2042-10-18
Also published as: WO2023228433A1; JPWO2023228433A1

Description

This invention relates to technology for controlling the gaze of an agent, such as a robot, during dialogue with people. This application claims priority to Japanese Patent Application No. 2022-086674, filed on May 27, 2022, and incorporates the entire contents of that application by reference.

A variety of conversational agents, including robots and virtual agents, are entering everyday society, and people have more and more opportunities to interact with them. Robots and other agents are increasingly seen in places where people gather, such as retail stores, hotel lobbies, and train stations.

When an agent such as a robot converses with people, smooth dialogue depends not only on the linguistic information contained in the utterances but also on non-verbal information such as gestures, facial expressions, prosody, and gaze. Linguistic and non-verbal information can also be regarded as media through which people consciously or unconsciously express their individuality.

Therefore, non-verbal information as well as linguistic information is highly significant when an agent such as a robot converses with people. In particular, controlling the agent's gaze during dialogue is important not only for keeping the dialogue smooth but also for letting the agent express individuality.

Non-Patent Document 1 reports, based on observation of two people working on a task together, that extroverts and introverts differ in how long they direct their gaze at the other person and how long they direct it at the object of the task. Non-Patent Document 1 also reports an experiment in which a robot's behavior was controlled based on this finding in order to examine the impression formed by people interacting with the robot.

However, the report in Non-Patent Document 1 is based on human behavior during a predetermined task in a specific environment. Its disclosure gives no guidance on how to control a robot's gaze in general dialogue.

In contrast, Patent Document 1, listed below, discloses technology for controlling a robot's gaze when it converses with one or more people. In that technology, the robot's gaze is directed toward the person who is speaking. If another person starts speaking while the first person is still speaking, however, the robot's gaze is by default not redirected toward the new speaker. Instead, the robot's degree of interest in the first speaker's utterance is calculated, and the gaze is redirected toward the new utterance only if that value is below a threshold.

According to Non-Patent Document 1, controlling the robot's gaze in this way prevents the unnatural behavior of the robot shifting its gaze here and there within a short period.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2022-057507

Non-Patent Document 1: Sean Andrist et al., "Look Like Me: Matching Robot Personality via Gaze to Increase Motivation," Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015.

In human dialogue, characteristic gaze movements are observed around the moment when the right to speak (the turn) passes from one person to another (a turn change). Moreover, even while a speaker is talking, the other participants do not necessarily look toward the speaker. A person may look somewhere other than at the speaker or, when there are three or more participants, at another participant who is not speaking. The same applies to the speaker's gaze. These gaze movements also differ with a person's personality (individuality). The technology disclosed in Patent Document 1 does not use such information to control the robot's gaze, so the resulting gaze movements are not necessarily natural.

A gaze control device is therefore desired that lets an agent such as a robot direct its gaze more naturally during dialogue with people.

A gaze control device according to a first aspect of the present invention controls the gaze of a robot in a multi-person dialogue. The device includes: gaze direction setting means for determining the robot's gaze direction, in response to arrival of a timing for determining the gaze direction, based on a combination of the robot's role in the multi-person dialogue and the state of the dialogue flow; and control parameter generation means for generating, in response to the gaze direction having been determined by the gaze direction setting means, control parameters for controlling the orientation of the robot's face and the direction of its eyeballs.

Preferably, the gaze direction setting means includes: direction determination model storage means for storing a direction determination model that defines, for each role, the probability that a participant faces each of a plurality of predetermined directions, according to the combination of the participants' roles in the multi-person dialogue and the state of the dialogue flow; probability distribution extraction means for extracting from the direction determination model, in response to arrival of the timing for determining the gaze direction, the probability distribution corresponding to the combination of the robot's role and the state of the dialogue flow; and first sampling means for sampling the robot's gaze direction from the probability distribution extracted by the probability distribution extraction means.
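The lookup-then-sample structure described above can be illustrated with a minimal Python sketch. The table name `DIRECTION_MODEL`, the role and state labels, and all probability values are hypothetical placeholders chosen for illustration; they are not taken from this document.

```python
import random

# Hypothetical direction determination model:
# (role, dialogue-flow state) -> probability of each predetermined direction.
# All numbers are illustrative placeholders.
DIRECTION_MODEL = {
    ("speaker", "speaking"): {
        "main_listener": 0.5, "sub_listener": 0.2, "averted": 0.3,
    },
    ("main_listener", "speaking"): {
        "speaker": 0.7, "sub_listener": 0.1, "averted": 0.2,
    },
}


def extract_distribution(role, state):
    """Probability distribution extraction: look up by (role, state)."""
    return DIRECTION_MODEL[(role, state)]


def sample_gaze_direction(role, state, rng=random):
    """First sampling: draw one direction from the extracted distribution."""
    dist = extract_distribution(role, state)
    directions = list(dist)
    weights = [dist[d] for d in directions]
    return rng.choices(directions, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which can be convenient when testing a controller built on such a table.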

More preferably, the plurality of directions in the direction determination model include the directions of the plurality of participants and a gaze aversion direction that differs from all of the participants' directions.

Still more preferably, the gaze direction setting means further includes: gaze aversion direction model storage means for storing a gaze aversion direction model, which is a probabilistic model for probabilistically determining the gaze aversion direction according to the combination of the robot's role and the state of the dialogue flow; and second sampling means for sampling the direction in which the robot averts its gaze from the gaze aversion direction model, in response to the gaze direction sampled by the first sampling means being the gaze aversion direction.
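The second sampling step runs only when the first sample is the aversion direction. The following Python sketch shows that conditional structure under stated assumptions: `AVERSION_MODEL`, the label `"averted"`, the four aversion directions, and the probabilities are all hypothetical.

```python
import random

# Hypothetical gaze aversion direction model:
# (role, dialogue-flow state) -> probability of each aversion direction.
AVERSION_MODEL = {
    ("speaker", "speaking"): {"up": 0.2, "down": 0.4, "left": 0.2, "right": 0.2},
}

AVERTED = "averted"  # assumed label for the aversion direction in the first stage


def resolve_gaze_direction(first_sample, role, state, rng=random):
    """Second sampling: refine the direction only when the gaze is averted."""
    if first_sample != AVERTED:
        return first_sample  # a participant's direction is used as-is
    dist = AVERSION_MODEL[(role, state)]
    directions = list(dist)
    weights = list(dist.values())
    return rng.choices(directions, weights=weights, k=1)[0]
```

Keeping the two stages separate means the aversion model can be replaced or refined without touching the table that decides whom the robot looks at.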

Preferably, the gaze control device further includes a duration calculation unit for calculating how long the robot's gaze is held, according to the combination of the robot's role, the state of the dialogue flow, and the gaze direction determined by the gaze direction setting means.

More preferably, the timing for determining the gaze direction differs between when the dialogue flow is in a turn-change state and when it is not.

Still more preferably, when the dialogue flow is in the turn-change state, the timing for determining the gaze direction is a predetermined timing within the turn-change state; when the dialogue flow is not in the turn-change state, it is the timing at which the duration most recently calculated by the duration calculation unit expires.
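The two timing rules can be expressed as a single predicate. This is a minimal sketch: the function name, its parameters, and the tolerance used to compare times are assumptions introduced for illustration.

```python
def is_decision_time(in_turn_change, now, scheduled_times, duration_expiry, tol=1e-9):
    """Return True when a new gaze direction should be determined at `now` (seconds).

    in_turn_change:  whether the dialogue flow is in the turn-change state
    scheduled_times: predetermined decision times within the turn-change state
    duration_expiry: time at which the most recently calculated gaze duration ends
    """
    if in_turn_change:
        # Turn-change state: decide only at the predetermined timings.
        return any(abs(now - t) <= tol for t in scheduled_times)
    # Otherwise: decide once the previously calculated duration has expired.
    return now >= duration_expiry
```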

Preferably, the gaze direction setting means includes: direction determination model storage means for storing a direction determination model that defines, for each role, the probability that a participant faces each of a plurality of predetermined directions, according to the combination of the participants' roles in the multi-person dialogue, the state of the dialogue flow, and the personality assumed for the robot; probability distribution extraction means for extracting from the direction determination model, in response to arrival of the timing for determining the gaze direction, the probability distribution corresponding to the combination of the robot's role, the state of the dialogue flow, and the personality; and first sampling means for sampling the robot's gaze direction from the probability distribution extracted by the probability distribution extraction means.

Still more preferably, the gaze direction setting means further includes: gaze aversion direction model storage means for storing a gaze aversion direction model, which is a probabilistic model for probabilistically determining the gaze aversion direction according to the combination of the robot's role, the state of the dialogue flow, and the personality; and second sampling means for sampling the direction in which the robot averts its gaze from the gaze aversion direction model, in response to the gaze direction sampled by the first sampling means being the gaze aversion direction.

Preferably, the gaze control device further includes a duration calculation unit for calculating how long the robot's gaze is held, according to the combination of the robot's role, the state of the dialogue flow, the personality, and the gaze direction determined by the gaze direction setting means.

More preferably, the timing for determining the gaze direction differs between when the dialogue flow is in a turn-change state and when it is not.

A gaze control method according to a second aspect of the present invention is a computer-implemented method for controlling the gaze of a robot in a multi-person dialogue. The method includes: a step in which the computer determines the robot's gaze direction, in response to arrival of a timing for determining the gaze direction, based on a combination of the robot's role in the multi-person dialogue and the state of the dialogue flow; and a step in which the computer generates, in response to the gaze direction having been determined in the preceding step, control parameters for controlling the orientation of the robot's face and the direction of its eyeballs.

A computer program according to a third aspect of the present invention is a computer program for controlling the gaze of a robot in a multi-person dialogue. The program causes a computer to function as: gaze direction setting means for determining the robot's gaze direction, in response to arrival of a timing for determining the gaze direction, based on a combination of the robot's role in the multi-person dialogue and the state of the dialogue flow; and control parameter generation means for generating, in response to the gaze direction having been determined by the gaze direction setting means, control parameters for controlling the orientation of the robot's face and the direction of its eyeballs.

As described above, this invention provides a gaze control device and method, and a computer program, that let an agent such as a robot direct its gaze more naturally during dialogue with people.

FIG. 1 schematically shows the setup of the preliminary experiment.
FIG. 2 is a schematic diagram illustrating how each participant's utterances were tagged in the preliminary experiment.
FIG. 3 is a schematic diagram illustrating the timing of turn changes between utterances.
FIG. 4 is a graph showing the frequency of each speaker's gaze directions at turn changes in the three-party dialogue of the preliminary experiment.
FIG. 5 is a graph showing the proportion of time each speaker spent in each gaze direction outside turn changes in the three-party dialogue of the preliminary experiment.
FIG. 6 is a table showing, for each dialogue participant, the proportion of gaze directed at each of the other participants while speaking and the gaze aversion time.
FIG. 7 is a graph showing, for a specific extroverted participant in the three-party dialogue of the preliminary experiment, the proportion of time spent in each gaze direction in each participation role.
FIG. 8 is a graph showing, for a specific introverted participant in the three-party dialogue of the preliminary experiment, the proportion of time spent in each gaze direction in each participation role.
FIG. 9 is a graph showing the distribution of the time an extroverted dialogue participant directed gaze at each participant while being the speaker.
FIG. 10 is a graph showing the distribution of the time an introverted dialogue participant directed gaze at each participant while being the speaker.
FIG. 11 is a graph showing how frequently participants in each role averted their gaze during utterances.
FIG. 12 is a graph showing a histogram of the distribution of dialogue participants' gaze aversion durations together with its approximation curve.
FIG. 13 is a diagram showing the proportion of pupil positions when dialogue participants avert their gaze.
FIG. 14 is a diagram showing the proportion of pupil positions when an extroverted dialogue participant averts gaze.
FIG. 15 is a diagram showing the proportion of pupil positions when an introverted dialogue participant averts gaze.
FIG. 16 is a block diagram showing the hardware configuration of a conversational robot system 150 according to an embodiment of the present invention.
FIG. 17 is a diagram showing the external appearance of the robot shown in FIG. 16.
FIG. 18 is a block diagram showing the hardware configuration of the computer shown in FIG. 16.
FIG. 19 is a block diagram showing the functional configuration of the gaze control device implemented by the robot control device shown in FIG. 16.
FIG. 20 is a schematic diagram showing the configuration of the gaze direction model shown in FIG. 19.
FIG. 21 is a diagram showing an example of the personality-specific, role-specific gaze direction model during speech shown in FIG. 20.
FIG. 22 is a schematic diagram showing an example of the configuration of the gaze aversion model shown in FIG. 19.
FIG. 23 is a schematic diagram showing an example of the personality-specific gaze direction model for gaze aversion at turn changes shown in FIG. 22.
FIG. 24 is a flowchart showing the control structure of a computer program that causes a computer to function as the gaze control device in the embodiment of the present invention.
FIG. 25 is a flowchart showing the control structure of a computer program that causes a computer to execute the state sensing step shown in FIG. 24.
FIG. 26 is a flowchart showing the control structure of a computer program that causes a computer to execute the step of determining the gaze direction and duration shown in FIG. 24.
FIG. 27 is a flowchart showing the control structure of a computer program that causes a computer to execute the gaze direction determining step shown in FIG. 24.
FIG. 28 is a flowchart showing the control structure of a computer program that causes a computer to execute the gaze duration determining step shown in FIG. 26.
FIG. 29 is a flowchart showing the control structure of a computer program that causes a computer to execute the step of determining the gaze direction during gaze aversion shown in FIG. 26.
FIG. 30 is a diagram showing the layout of the video screens used in the evaluation experiment.
FIG. 31 is a graph showing the results of the evaluation experiment.
FIG. 32 is a graph showing the results of the evaluation experiment.

In the following description and drawings, identical parts are given identical reference numbers, and detailed descriptions of them are not repeated. The embodiment below uses a humanoid robot as the agent, but the invention is not limited to such an embodiment. The robot need not be humanoid; it may take any form as long as it has eyeballs and is presumed to speak. The invention can also be applied to agents without a three-dimensional physical body, for example a virtual agent rendered as a two-dimensional image or a virtual agent rendered as a three-dimensional image in a virtual space.

Part 1: Preliminary experiment
1. Purpose of data collection
To make a robot's gaze movements natural in multi-person dialogue, it is necessary to examine the gaze of real people and collect the required information. To that end, we conducted the preliminary experiment described below. FIG. 1 shows its setup.

Referring to FIG. 1, the preliminary experiment was conducted as a three-party dialogue 50 among participants 60, 62, and 64. In FIG. 1, participants 60, 62, and 64 are all shown standing; however, because gaze direction matters in the preliminary experiment, the participants' relative positions had to be fixed. Chairs were therefore provided, as described later, and the participants conversed while seated. The participants were asked to talk freely, as in casual neighborhood gossip, with no particular purpose and no defined procedure for the discussion, and information on each participant's gaze direction was collected. Such conversations often become lively even though there are no explicit rules for how the discussion proceeds. For agents to take part in society, they should be capable of a wide range of interaction, from purposeful dialogue such as meetings to casual conversation.

In a three-party dialogue of this kind, who speaks, that is, who has the right to speak, is highly significant. Here, this right is called the speaking right, and the participant who holds it is called the speaker. The other participants take the role of hearers and are called listeners. The speaker changes from time to time, and such a transfer of the speaking right is called a turn change. At each turn change, each participant becomes either the speaker or a listener. The position each participant occupies in the dialogue in this way is called a participation role, or simply a role.

Past research has shown that dialogue participants consciously or unconsciously recognize their own role according to the state of the dialogue. That is, each participant is always aware of whether he or she is the speaker, the listener the speaker is mainly addressing, or a listener only slightly involved in the conversation. Here, the listener mainly addressed by the speaker is called the main listener (ML), and a listener only slightly involved in the conversation is called the sub-listener (SL).

Previous research has also shown that each participant's role in a conversation correlates with that participant's gaze movements. Following this idea, the preliminary experiment collected information on how the participants in the three-party dialogue 50 moved their gaze in each role.

2. Data analysis (labeling)
A. Data set
A total of nine people, men and women, took part in the preliminary experiment. From these participants, six groups of three were formed. Within each group, the three participants conversed freely, with no assigned topic. Each dialogue lasted between 20 and 30 minutes.

The participants in each group talked while seated on three chairs placed at the three vertices of a triangle. A camera was set up in front of each participant to record face and body movements from directly in front. Each participant wore a headset microphone, and audio data for each participant and video of the whole discussion were collected. After each dialogue ended, a transcript was created from the video and audio recordings.

Next, annotators labeled the transcript data, focusing on gaze direction, eyeball direction, possession of the speaking right, turn changes, and each participant's role in the dialogue. FIG. 2 shows an example of the resulting labeled dialogue data. The analysis described below was performed on this labeled data.

B. Data analysis
FIG. 2 shows examples of gaze label sequences 80, 82, and 84 for three participants A, B, and C. Referring to FIG. 2, the label "B" in participant A's gaze label sequence 80, for example, indicates that participant A was looking at participant B during that period. Likewise, the label "C" in gaze label sequence 80 indicates that participant A was looking at participant C. The label "gaze aversion" in gaze label sequence 80 indicates that participant A was looking at neither participant B nor participant C, that is, that participant A had looked away from the other participants during that period. In this specification, a participant not looking at any other participant is called "gaze aversion," and its duration is called the "gaze aversion duration."

Gaze label sequences 82 and 84 were created in the same way as gaze label sequence 80.
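As an illustration of how such label sequences lend themselves to analysis, the Python sketch below represents one participant's sequence as timed segments and derives the share of time per gaze target and the individual aversion durations. The segment format `(start_s, end_s, label)`, the label `"averted"`, and the sample values are assumptions made for this example; the document does not specify a data format.

```python
from collections import defaultdict

# Hypothetical gaze label sequence for participant A: (start_s, end_s, label).
labels_a = [
    (0.0, 2.0, "B"),
    (2.0, 3.5, "averted"),
    (3.5, 6.0, "C"),
    (6.0, 7.0, "averted"),
    (7.0, 10.0, "B"),
]


def time_share(labels):
    """Fraction of the labeled time spent on each gaze target."""
    totals = defaultdict(float)
    for start, end, label in labels:
        totals[label] += end - start
    whole = sum(totals.values())
    return {label: t / whole for label, t in totals.items()}


def aversion_durations(labels, averted="averted"):
    """Duration in seconds of each period of looking away from all participants."""
    return [end - start for start, end, label in labels if label == averted]
```

With the sample sequence, participant A spends half of the labeled time looking at B and a quarter each looking at C and looking away.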

C. Eyeball direction
When a person looks at another person, the label deviates little from the positions of the face and pupils. When a person averts gaze, on the other hand, the gaze, and the pupil position in particular, is very important. Because gaze direction can be determined only with limited accuracy, however, in this preliminary experiment the eyeball direction corresponding to the pupil position was labeled using nine directions: center, up, down, left, right, and the four diagonals.

Below each portion of gaze label sequence 80 in FIG. 2 marked "gaze aversion," a label indicating the direction of participant A's gaze at that time is given as additional information.

For example, a period marked "up" indicates that participant A's gaze was directed upward, and a period marked "upper right" indicates that it was directed to the upper right. Here, gaze takes into account the eyeball direction as well as the angle of the face, and the direction judged from these two factors together is taken as the gaze direction. Participants B and C were labeled in the same way.
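The nine-direction label set can be viewed as a 3 x 3 grid. The Python sketch below encodes it that way; the integer encoding of the horizontal and vertical components is a hypothetical convenience, and whether "left" and "right" are taken from the participant's or the observer's viewpoint is left unspecified here.

```python
# Nine eyeball-direction labels arranged as a 3x3 grid:
# rows run from up to down, columns from left to right.
GRID = [
    ["upper left", "up", "upper right"],
    ["left", "center", "right"],
    ["lower left", "down", "lower right"],
]


def eye_direction_label(horizontal, vertical):
    """Map coarse components to one of the nine labels.

    horizontal: -1 (left), 0 (center), +1 (right)
    vertical:   +1 (up),   0 (center), -1 (down)
    """
    if horizontal not in (-1, 0, 1) or vertical not in (-1, 0, 1):
        raise ValueError("components must be -1, 0, or +1")
    return GRID[1 - vertical][horizontal + 1]
```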

D. Possession of the speaking right and turn changes
FIG. 3 shows an actual turn change. In this example, participant A first holds the speaking right and makes utterance 100, as indicated by turn change label 102. Participant C then takes the speaking right and makes utterance 104, as indicated by turn change label 106. The timing of each turn change is judged by the annotator. Turn change label 106, for example, makes it clear that the speaking right moved from participant A to participant C.

In this experiment, the two-second period consisting of one second before and one second after the moment the speaking right changes hands was defined as turn change 108, and the movement of each participant's gaze direction during this period in particular was analyzed from the gaze labels.

E. Roles
In gaze analysis of three-party dialogue, each participant's role in the dialogue is important. In this preliminary experiment, the speaker, main listener, and sub-listener were defined as described above, and gaze was analyzed for each role separately for turn changes and for other times. The speaking right moves at a turn change. To make each participant's role at a turn change unambiguous, the speaker at a turn change is defined here as the person who takes the speaking right through the turn change; in FIG. 3, this is participant C. The main listener at a turn change is the person who held the speaking right immediately before and yielded it to the speaker through the turn change; in FIG. 3, this is participant A. The person not involved in the transfer of the speaking right is the sub-listener; in FIG. 3, this is participant B.
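The role definitions at a turn change amount to a simple assignment rule, sketched below in Python. The function name and the role strings are hypothetical; the rule itself follows the definitions given above.

```python
def roles_at_turn_change(participants, previous_holder, new_holder):
    """Assign roles at a turn change.

    The participant taking the speaking right is the speaker, the participant
    who yielded it is the main listener, and anyone else is a sub-listener.
    """
    if previous_holder == new_holder:
        raise ValueError("a turn change requires two different participants")
    roles = {}
    for p in participants:
        if p == new_holder:
            roles[p] = "speaker"
        elif p == previous_holder:
            roles[p] = "main_listener"
        else:
            roles[p] = "sub_listener"
    return roles
```

For the situation in FIG. 3, `roles_at_turn_change(["A", "B", "C"], "A", "C")` labels C as the speaker, A as the main listener, and B as the sub-listener.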

F. Analysis Results
a. During turn-taking
Figure 4 shows how the proportions of gaze targets change over time for each role during turn-taking. In Figure 4, the horizontal axis is time in seconds. At time t = 0.0, the speaker takes the turn and starts speaking. The vertical axis shows, as a proportion from 0.0 to 1.0, where the speaker is looking at time t, based on statistics over the data. For example, at t = 0.0 seconds in Figure 4(A), the values for ML, SL, and gaze aversion are 0.37, 0.19, and 0.44, respectively. This means that, over all observations at the moment the speaker began speaking, the speaker was looking at the main listener in a proportion of 0.37 and at the sub-listener in a proportion of 0.19, and was looking away in the remaining 0.44. The gaze direction was computed every 0.1 seconds, and Figure 4 shows how the distribution of the speaker's gaze targets changes over time during the 2-second interval spanning 1 second before and 1 second after the speaker takes the right to speak and begins speaking.
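As a minimal sketch of how the proportion at one time step could be computed from gaze labels, the following counts the labels observed at that step across turn changes. The function name and the label counts are hypothetical; the counts are chosen only so that the result reproduces the proportions quoted for t = 0.0.

```python
from collections import Counter

TARGETS = ("ML", "SL", "GA")  # main listener, sub-listener, gaze aversion


def gaze_proportions(labels):
    """Return the share of each gaze target among the labels of one time step."""
    if not labels:
        raise ValueError("no gaze labels for this time step")
    counts = Counter(labels)
    return {target: counts[target] / len(labels) for target in TARGETS}


# Hypothetical labels for the speaker at t = 0.0, one per observed turn change.
labels_at_t0 = ["ML"] * 37 + ["SL"] * 19 + ["GA"] * 44
print(gaze_proportions(labels_at_t0))
```

Repeating the same computation at each 0.1-second step from -1.0 to 1.0 seconds would give one curve per gaze target, which is the kind of plot the text describes.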

In the embodiment described below, the 2-second turn-taking interval is divided into three segments, taking the moment of the turn change as the origin (0 seconds): from -1.0 to -0.3 seconds, from -0.3 to 0.3 seconds, and from 0.3 to 1.0 seconds. The gaze direction is decided at the start of each segment. For this reason, in the preliminary experiment, statistics were collected and a model was built separately for each of these segments.
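The three-segment division can be illustrated with a short lookup. This is a sketch under assumptions: the names are hypothetical, and treating each segment as closed at its start and open at its end is a choice made here, since the text does not say which segment a boundary instant belongs to.

```python
# Segment boundaries in seconds, relative to the turn change at t = 0.
TURN_SEGMENTS = ((-1.0, -0.3), (-0.3, 0.3), (0.3, 1.0))


def segment_index(t):
    """Return the index (0, 1, 2) of the segment containing offset t.

    Each segment is treated as [start, end); None is returned when t lies
    outside the 2-second turn-taking interval.
    """
    for index, (start, end) in enumerate(TURN_SEGMENTS):
        if start <= t < end:
            return index
    return None


# Gaze direction is decided at the start of each segment.
decision_times = [start for start, _ in TURN_SEGMENTS]
print(decision_times, segment_index(0.0))
```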

・Speaker (SP)
Figure 4(A) shows how the speaker's (SP) gaze proportions change during turn-taking. One second before taking the turn, the speaker is most often looking at the main listener (ML), that is, the previous speaker. However, as the moment at which the speaker takes the right to speak and starts speaking (t = 0.0 seconds) approaches, the proportion of looking away from the main listener rises. The proportion of gaze aversion peaks about 0.1 seconds after the speaker starts speaking. After that, the proportion of looking at the main listener and the proportion of looking away become about equally high, and the gaze transition for the turn change ends. These results indicate that people tend to look at the previous speaker at the start of a turn change, look away when they begin speaking, and afterwards either keep looking away or look at the main listener, who is the previous speaker.

・Main listener (ML)
Figure 4(B) shows how the gaze proportions change for the main listener, who held the right to speak until just before the turn change. Various situations are possible at a turn change: the main listener may hand the right to speak to the speaker, the speaker may take it, or the speaker may change naturally. Overall, however, the main listener appears to know or to have decided who the next speaker will be before that person starts speaking, and tends to keep looking at the speaker throughout the turn-taking interval, from 1 second before the turn change to 1 second after yielding the right to speak.

・Sub-listener (SL)
Finally, Figure 4(C) shows how the gaze proportions change for the sub-listener, who was not involved in the turn change. In Figure 4(C), at t = -1.0 seconds, that is, 1 second before the speaker starts speaking, the sub-listener looks at the main listener and at the speaker in about equally high proportions. From there toward t = 1.0 seconds, that is, 1 second after the speaker starts speaking, the proportion of looking at the speaker rises. The sub-listener therefore appears to infer the next speaker between 1 second before the turn change and the turn change itself, looking toward either the main listener or the next speaker until then and shifting gaze to the next speaker afterwards.

b. Outside turn-taking (speech period)
Times other than turn-taking are referred to here as the speech period. The beginning and end of a speech period are considered to be affected by turn-taking. Therefore, in the gaze analysis of the speech period below, the first 2 seconds and the last 2 seconds of each speech period were excluded from the analysis. Figure 5 shows the proportions of each participant's gaze directions during the speech period. Figure 6 gives the specific values of the gaze proportions shown in Figure 5 in table form.
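The exclusion of the first and last 2 seconds can be written as a simple trimming step. The function name is hypothetical, and discarding speech periods of 4 seconds or less entirely is an assumption made here, since nothing remains of them after trimming.

```python
def analysis_window(start, end, margin=2.0):
    """Trim `margin` seconds from both ends of a speech period.

    Returns the (start, end) of the part that is analyzed, or None when the
    period is too short for anything to remain.
    """
    trimmed_start, trimmed_end = start + margin, end - margin
    if trimmed_start >= trimmed_end:
        return None
    return (trimmed_start, trimmed_end)


print(analysis_window(10.0, 20.0))
print(analysis_window(0.0, 3.0))
```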

In Figure 5, the horizontal axis represents the gaze direction, and the vertical axis represents the time spent looking in each direction as a proportion of the total time.

・Speaker
Referring to Figure 5(A), the speaker looks at the main listener slightly more often than elsewhere while speaking. Overall, however, the speaker distributes gaze in a balanced manner.

・Main listener
Referring to Figure 5(B), the main listener faces the speaker for nearly 70% of the speech period and looks at the sub-listener only rarely. The proportion of time the main listener spends looking away is considerably higher than the proportion spent looking at the sub-listener, but considerably lower than the proportion spent looking at the speaker.

・Sub-listener
Referring to Figure 5(C), the tendency for the sub-listener is almost the same as for the main listener. That is, the sub-listener also looks toward the speaker for nearly 70% of the speech period. The proportion of time spent looking away is higher than the proportion spent looking at the main listener, but lower than the proportion of time the main listener spends looking away.

c. Gaze movement by personality
On the hypothesis that gaze movement may also differ with personality, a separate preliminary experiment (the second preliminary experiment) was conducted, in addition to the preliminary experiment described above (the first preliminary experiment), to analyze gaze movement in relation to personality. In the second preliminary experiment, three-party dialogues were held with a total of 17 male and female participants. There were 14 sessions in total, each consisting of about 10 to 20 minutes of free conversation. The three speakers sat on three chairs placed at the vertices of an equilateral triangle with sides of about 2 meters, and each wore a headset microphone and an acceleration sensor. In addition, the movements of each participant were recorded with an image depth camera.

An index called BIG5 is used to describe personality. BIG5 explains human personality traits along five dimensions, one of which is extraversion. In this preliminary experiment, we focused on extraversion and aimed to examine how participants' gaze movements differ with extraversion.

In this experiment, two of the 17 participants who gave markedly different impressions of extraversion were selected, and their gaze movements were analyzed from the data obtained from each of them. In the following description, these two participants are referred to as Speaker A and Speaker B. Speaker A gave a relatively extraverted impression, and Speaker B gave a relatively introverted impression. By building separate models from the gaze analysis results for these two speakers, it may be possible for an agent such as a robot to express different personalities.

・As speaker
Figure 7(A) shows, for Speaker A in the speaker (SP) role, the proportions of time spent looking at the main listener (ML), looking at the sub-listener (SL), and averting gaze (GA). Figure 8(A) shows the corresponding results for Speaker B.

Figure 7(A) shows that when Speaker A is the speaker, the proportion of time spent looking at the sub-listener, and not only at the main listener, is relatively high. That is, the proportions of time spent looking at the main listener and at the sub-listener are nearly equal, and the proportion of time spent looking away is also about the same. Compared with Figure 5(A), the result of the earlier preliminary experiment that does not distinguish by personality, the probability that Speaker A looks at the main listener is relatively low.

In contrast, Figure 8(A) shows that when Speaker B is the speaker, Speaker B looks away for more of the time than Speaker A and tends not to look at the sub-listener much. Compared with Figure 5(A) in particular, the proportion of time Speaker B spends looking away is considerably higher.

・As main listener
Figure 7(B) shows, for Speaker A in the main listener role, the proportions of time spent looking at the speaker, looking at the sub-listener, and looking away. Figure 8(B) shows the corresponding results for Speaker B.

Referring to Figure 7(B), when Speaker A is the main listener, the proportion of time spent looking at the speaker is very high, with occasional gaze aversion. Referring to Figure 8(B), Speaker B also spends a high proportion of time looking at the speaker, as Speaker A does, but the proportion of time spent looking away is considerably higher than for Speaker A. For both Speaker A and Speaker B, the proportion of time spent looking at the sub-listener is low.

Comparing Figures 7(B) and 8(B) with Figure 5(B) shows that they follow the same tendency. It is notable, however, that Speaker B spends a high proportion of time looking away, as shown in Figure 8(B).

・As sub-listener
Figure 7(C) shows, for Speaker A in the sub-listener role, the proportions of time spent looking at the speaker, looking at the main listener, and looking away. Figure 8(C) shows the corresponding results for Speaker B.

Figure 7(C) shows that when Speaker A is the sub-listener, as in the main listener role, the proportion of time spent looking at the speaker is very high, and the proportions of time spent looking at the main listener or looking away are quite low. Figure 8(C) shows that when Speaker B is the sub-listener, the proportion of time spent looking at the speaker is high, but lower than for Speaker A. Instead, Speaker B spends a high proportion of time looking away, equal to the proportion of time spent looking at the speaker.

Comparing these with Figure 5(C) makes the difference between the tendencies of Speaker A and Speaker B clearer. Compared with the case where personality is not considered, Speaker A spends a higher proportion of time looking at the speaker and a lower proportion looking away. Conversely, Speaker B spends a lower proportion of time looking at the speaker and a higher proportion looking away. For Speaker B, the proportion of time spent looking at the main listener is also considerably lower.

d. Gaze duration by personality
For each of Speaker A and Speaker B, the distribution of gaze duration during the speech period (the period other than turn-taking) was analyzed for the speaker, main listener, and sub-listener roles. The results are shown in Figures 9(A) to (C) and Figures 10(A) to (C), respectively. For each condition, these figures show a histogram of gaze duration together with a distribution curve approximated by the χ² distribution expressed by the following formula:

In the above formula, n is the number of degrees of freedom of the χ² distribution, scale is a parameter that determines the amplitude of the distribution curve along the vertical axis, and loc is a parameter that determines the offset of the distribution curve along the horizontal axis.
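For reference, a χ² density with the three parameters n, loc, and scale is conventionally written in the location-scale form below. This is the standard parameterization found in common statistics libraries and is given here as an assumption about the intended formula, which is not reproduced in this text.

```latex
f(x;\, n, \mathrm{loc}, \mathrm{scale})
  = \frac{1}{\mathrm{scale}} \cdot
    \frac{y^{\,n/2 - 1}\, e^{-y/2}}{2^{\,n/2}\, \Gamma(n/2)},
\qquad
y = \frac{x - \mathrm{loc}}{\mathrm{scale}},
\qquad x > \mathrm{loc}
```

In this form, loc shifts the curve along the horizontal axis, while scale stretches it horizontally and, through the factor 1/scale, changes its height. The mean of the distribution is loc + scale · n.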

For each of Speaker A and Speaker B in the speaker role, the distributions of three kinds of duration were computed: the duration of looking at the main listener, the duration of looking at the sub-listener, and the duration of looking away. Figure 9 shows the results for Speaker A, and Figure 10 shows the results for Speaker B.

Referring to Figure 9(A), for Speaker A the approximating curve for the distribution of the duration of looking at the main listener is given by n = 3.4, loc = 0.15, scale = 0.2. The approximating curve for the distribution of the duration of looking at the sub-listener, shown in Figure 9(B), is given by n = 4.34, loc = 0.0, scale = 0.18. Accordingly, when Speaker A is the speaker, the duration of looking at the dialogue partners (main listener and sub-listener) tends to be longer than when Speaker B is the speaker. The approximating curve for the distribution of the duration of Speaker A's gaze aversion, shown in Figure 9(C), is given by n = 4.67, loc = 0.0, scale = 0.12. This result shows that Speaker A's gaze aversions tend to be short.

From the above, when Speaker A is the speaker, Speaker A tends to look at the dialogue partners for relatively long durations and to make occasional, relatively short gaze aversions.

For Speaker B, in contrast, referring to Figure 10(A), the approximating curve for the distribution of the duration of looking at the main listener is given by n = 7.9, loc = 0.0, scale = 0.1. The approximating curve for the distribution of the duration of looking at the sub-listener, shown in Figure 10(B), is given by n = 4.7, loc = 0.12, scale = 0.12. These results show that, for Speaker B, both the duration of looking at the main listener and the duration of looking at the sub-listener are relatively short. Furthermore, the approximating curve for the distribution of the duration of gaze aversion, shown in Figure 10(C), is given by n = 3.28, loc = 0.1, scale = 0.2. This result shows that Speaker B's gaze aversions tend to be long.

From the above, when Speaker B is the speaker, gaze aversions last longer, and Speaker B tends to look at the main listener and the sub-listener only occasionally and briefly.
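The fitted parameters above can be used to draw gaze durations for each speaker. The following is a minimal sketch under one assumption: that n, loc, and scale follow the usual location-scale convention, so that a duration is loc + scale · X with X drawn from a χ² distribution with n degrees of freedom. The function and table names are hypothetical; only the parameter values come from the text.

```python
import random

# (n, loc, scale) for the speaker role, as reported for Figures 9 and 10.
# ML = main listener, SL = sub-listener, GA = gaze aversion.
SPEAKER_PARAMS = {
    "A": {"ML": (3.4, 0.15, 0.2), "SL": (4.34, 0.0, 0.18), "GA": (4.67, 0.0, 0.12)},
    "B": {"ML": (7.9, 0.0, 0.1), "SL": (4.7, 0.12, 0.12), "GA": (3.28, 0.1, 0.2)},
}


def mean_duration(n, loc, scale):
    """Mean of loc + scale * X for X ~ chi-squared(n), whose mean is n."""
    return loc + scale * n


def sample_duration(rng, n, loc, scale):
    """Draw one duration in seconds.

    A chi-squared variable with n degrees of freedom is a gamma variable
    with shape n / 2 and scale 2, which the standard library can sample.
    """
    return loc + scale * rng.gammavariate(n / 2.0, 2.0)


rng = random.Random(0)
for who, table in SPEAKER_PARAMS.items():
    for target, params in table.items():
        print(who, target, round(mean_duration(*params), 3),
              round(sample_duration(rng, *params), 3))
```

Under this reading, the implied mean durations are about 0.83 s (ML), 0.78 s (SL), and 0.56 s (GA) for Speaker A, and about 0.79 s, 0.68 s, and 0.76 s for Speaker B, which is in line with the tendencies described above.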

The details of the gaze duration distributions for Speaker A and Speaker B in the main listener and sub-listener roles are not shown here. However, tendencies similar to those seen when these speakers were in the speaker role were observed in those distributions.

e. Gaze aversion
・Interval between gaze aversions
Figure 11 shows scatter plots of the number of times participants in the first preliminary experiment averted their gaze during an utterance. The horizontal axis is the time spent speaking, and the vertical axis is the number of times gaze was averted from a person. As noted above, the time on the horizontal axis excludes the beginning and end of the utterance. Figure 11 also shows a fitted line for gaze aversion. The regression coefficient r of the fitted line is 0.83, 0.82, and 0.67 for Figures 11(A), (B), and (C), respectively. Figures 11 and 6 show that the interval between gaze aversions does not depend on role.

・Eye direction
The gaze movement made when a person looks at another person is very simple. When a robot looks at a person, it is therefore sufficient to turn the robot's face toward that person in a way that does not look unnatural. When the robot averts its gaze from a dialogue partner, however, a simple movement alone turned out to be insufficient. In the recorded data of three-party dialogues between people, people were observed to avert their gaze by moving only their eyeballs. For this reason, as an analysis of the human eye movement that a robot needs when averting its gaze, we focused in particular on where a person's eyeballs are directed during gaze aversion. We also analyzed the distribution of how long people keep their gaze averted.

・Duration of gaze aversion
Figure 12 shows a histogram of the duration of gaze aversions made by people. This distribution is approximated by the following exponential distribution:

Here μ = 0.2 and λ = 0.63. Samples shorter than 0.2 seconds were considered unrealistic for implementation in an actual agent, so the analysis was limited to samples of 0.2 seconds or longer. The mean duration of the analyzed samples was 0.83 seconds, and the median was 0.55 seconds.
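A gaze-aversion duration can be drawn from such a shifted exponential distribution as sketched below. Because the formula is not reproduced in this text, it is unclear whether λ is a rate or a scale (mean) parameter; the sketch computes the mean under both readings. Treating λ as a scale gives 0.2 + 0.63 = 0.83 seconds, which coincides with the reported sample mean, so the sampler uses that reading as an assumption. All names are hypothetical.

```python
import random

MU = 0.2    # shortest duration considered, in seconds (from the text)
LAM = 0.63  # reported as lambda; its role (rate or scale) is an assumption


def mean_if_rate(mu=MU, lam=LAM):
    """Mean when lambda is a rate: density lam * exp(-lam * (x - mu))."""
    return mu + 1.0 / lam


def mean_if_scale(mu=MU, lam=LAM):
    """Mean when lambda is a scale: density (1 / lam) * exp(-(x - mu) / lam)."""
    return mu + lam


def sample_aversion_duration(rng, mu=MU, scale=LAM):
    """Draw one duration as mu plus an exponential variable with mean `scale`."""
    # random.expovariate takes the rate, i.e. 1 / mean.
    return mu + rng.expovariate(1.0 / scale)


rng = random.Random(0)
print(round(mean_if_rate(), 3), round(mean_if_scale(), 3))
print(round(sample_aversion_duration(rng), 3))
```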

・Pupil distribution during gaze aversion
When people avert their gaze from a dialogue partner, they do so not only with the face but mainly by moving the pupils (eyeballs). We also analyzed how many times people move their eyeballs during such a gaze aversion and what pattern they follow.

From the results of the first preliminary experiment, the eye direction of each speaker during gaze aversion was classified into nine directions, {forward, up, down, left, right, upper left, upper right, lower left, lower right}, and the proportion of each was computed. The results are shown in Figure 13. Figure 13 shows that, when averting gaze, people have a strong tendency to direct their pupils to two positions: the center and the lower center. For the other seven positions, there is some bias, but the pupils are distributed among them in roughly equal proportions.

・Pupil distribution by personality
The results of the second preliminary experiment were analyzed for pupil distribution during gaze aversion in the same way as for the first preliminary experiment. Figure 14 shows the results for Speaker A, and Figure 15 shows the results for Speaker B.

Referring to Figures 14 and 15, the second preliminary experiment shows, as in the first, that both Speaker A and Speaker B avert their eyes downward more often than upward. Speaker B, however, tends to avert the eyes downward more often than Speaker A. For both speakers, the eyeballs face forward (that is, the pupils are at the center) relatively often. In the three-party dialogue data obtained in the second preliminary experiment, the dialogue partners are located diagonally to the left and right. Eyeballs facing forward therefore mean that the speaker is looking away from both dialogue partners.

In Figures 14 and 15, the proportion of aversions toward the lower right is higher than in Figure 13, and also higher than for the other six positions that are neither the center nor the lower center. The cause of this bias is unknown. In view of the results in Figure 13, however, the embodiment described below is implemented so that, for the cases of Figures 14 and 15 as well, the pupils are directed almost evenly to the seven positions other than the center and the lower center during gaze aversion.
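One way to realize this allocation is to give the center and the lower center their own probabilities and share the remainder evenly among the other seven positions. The sketch below does this; the probabilities 0.3 and 0.3 are placeholders, since the values of Figures 13 to 15 are not given in the text, and all names are hypothetical.

```python
import random

# "center" corresponds to forward and "down" to the lower center.
DIRECTIONS = ("center", "up", "down", "left", "right",
              "upper_left", "upper_right", "lower_left", "lower_right")


def pupil_weights(p_center, p_down):
    """Build a probability table: two preferred positions, seven equal others."""
    rest = 1.0 - p_center - p_down
    if p_center < 0 or p_down < 0 or rest < 0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    others = [d for d in DIRECTIONS if d not in ("center", "down")]
    weights = {d: rest / len(others) for d in others}
    weights["center"] = p_center
    weights["down"] = p_down
    return weights


def sample_pupil_direction(rng, weights):
    """Pick one pupil direction according to the probability table."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


weights = pupil_weights(p_center=0.3, p_down=0.3)  # placeholder values
rng = random.Random(0)
print(sample_pupil_direction(rng, weights))
```

A personality-specific model would differ only in the two preferred probabilities, for example a larger downward share for the more introverted speaker.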

・Number of eye movements during gaze aversion
In both the first and second preliminary experiments, dialogue participants who averted their gaze tended to move their eyeballs several times rather than look at a single point for a long time. The longer the gaze aversion, the more eye movements tended to occur. Based on an analysis of the distribution of durations for each number of eye movements, in the embodiment described below the number of times the robot moves its eyeballs during a gaze aversion is one when the aversion lasts 0.7 seconds or less, two when it lasts 1.4 seconds or less, three when it lasts 2.1 seconds or less, and so on, increasing by one for every 0.7 seconds.
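The 0.7-second rule amounts to rounding the duration up to the next multiple of 0.7 seconds. A minimal sketch follows; the function name is hypothetical, and the intermediate rounding is added only to keep floating-point error (for example, 2.1 / 0.7 evaluating to slightly more than 3) from adding a spurious extra movement.

```python
import math

STEP_SECONDS = 0.7  # one additional eye movement per 0.7 s of gaze aversion


def eye_movement_count(duration):
    """Return how many times the eyeballs move during one gaze aversion."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    steps = round(duration / STEP_SECONDS, 9)  # absorb floating-point noise
    return max(1, math.ceil(steps))


for d in (0.2, 0.7, 0.71, 1.4, 2.1, 2.2):
    print(d, eye_movement_count(d))
```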

Second embodiment
1. System configuration
With reference to Figure 16, the configuration of a conversational robot system 150, which employs a gaze control system according to an embodiment of this invention to control a robot 160, is described below. For an agent such as a robot to converse with people, a speech recognition function, a speech synthesis function, and a dialogue function that uses them must be implemented. This embodiment assumes that existing implementations are used for these functions. To conduct a three-party dialogue, turn changes and the role of each dialogue participant must also be recognized dynamically. Recognizing turn changes is difficult, but a technique for recognizing the timing of turn changes, such as the one described in the following reference, can be used.

Chaoran Liu et al, “Turn-taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents”, INTERSPEECH 2017, [Online], August 20-24, 2017, Stockholm, Sweden, [Retrieved May 3, 2022], Internet <URL:https://isca-speech.org/archive_v0/Interspeech_2017/pdfs/0965.PDF>
The above reference uses a model that detects the timing of a turn change. In the present embodiment, by contrast, the point 1 second before the turn change must be detected. This can also be achieved by setting the timing to be detected to 1 second before the turn change and training the model with the method described in the above reference.

In the following embodiment, the recognition of the roles of the participants, including the robot, and the recognition of whether the robot should act to acquire the right to speak are assumed to be given from outside. However, the speaker can be identified, for example, by knowing the positions of the participants and localizing the sound source as described later. The main listener can be estimated from the face images of the dialogue participants, for example from the proportion of time the speaker faces each participant. If the robot is the main listener, it may simply start speaking when a turn change is detected. When the robot is the speaker, one possible approach is to use the face images of the other participants and designate as the main listener the participant who faces the robot for the longer time.
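The facing-time heuristic mentioned above can be sketched as follows. This is only an illustration: the function name and the input format (accumulated seconds per participant, obtained by some face-direction estimator not specified here) are assumptions.

```python
def estimate_main_listener(facing_seconds):
    """Pick the participant the speaker has faced for the longest time.

    facing_seconds maps each candidate listener to the number of seconds
    the speaker was facing that listener. Returns None when there is no
    usable observation.
    """
    total = sum(facing_seconds.values())
    if total <= 0:
        return None
    return max(facing_seconds, key=facing_seconds.get)


# Hypothetical observation: the speaker faced A for 4.2 s and B for 1.1 s.
print(estimate_main_listener({"A": 4.2, "B": 1.1}))
```

The same function applies when the robot is the speaker, with the input replaced by how long each of the other participants faced the robot.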

Figure 16 shows the hardware configuration of the conversational robot system 150 according to this embodiment. Referring to Figure 16, the conversational robot system 150 includes a probability model storage device 172 that stores probability models for controlling the gaze of the robot 160, an operation control PC (Personal Computer) 162 for controlling the gaze-related actions of the robot 160 using the probability models stored in the probability model storage device 172, and a network 176 to which the operation control PC 162 is connected. The robot 160 according to this embodiment is intended to take part in a three-party dialogue with two other dialogue participants.

The conversational robot system 150 further includes a human position sensor 178 for detecting the positions of the dialogue partners, and a human position recognition PC 180 connected to the human position sensor 178 and the network 176. The human position sensor 178 detects the position of a dialogue partner and provides a detection signal to the human position recognition PC 180. The human position recognition PC 180 calculates the position of the dialogue partner as seen from the robot 160 based on the signal from the human position sensor 178, and transmits it to the operation control PC 162 via the network 176.

The conversational robot system 150 further includes a microphone 164 and a voice processing PC 166, which recognizes the content of the dialogue participants' utterances, turn changes, and the role of each participant based on the audio signal received from the microphone 164, and transmits the results to the operation control PC 162 via the network 176. The microphone 164 is a microphone array, and the voice processing PC 166 can determine the position of a sound source from the output of the microphone 164. The voice processing PC 166 has a function of transmitting to the operation control PC 162 information indicating the position of the person who spoke, together with the utterance text, the turn change detection information, and the role of each participant. The operation control PC 162 has a function of determining whether the robot should acquire the right to speak, based on the content of the utterance text and on the turn change timing and roles recognized by the voice processing PC 166.

会話ロボットシステム１５０はさらに、発話テキストを受けてそのテキストに対応する音声信号を生成するための音声合成ＰＣ１７０と、音声合成ＰＣ１７０から音声信号を受けて音声に変換するためのスピーカ１６８とを含む。音声合成ＰＣ１７０による音声合成の声質は、ロボット１６０の外観にふさわしいものに選ばれる。The conversational robot system 150 further includes a voice synthesis PC 170 for receiving spoken text and generating a voice signal corresponding to the text, and a speaker 168 for receiving the voice signal from the voice synthesis PC 170 and converting it into voice. The voice quality of the voice synthesis by the voice synthesis PC 170 is selected to be suitable for the appearance of the robot 160.

会話ロボットシステム１５０はさらに、ネットワーク１７６に接続され、ネットワーク１７６を介して他のＰＣから発話テキストを受けたことに応答して、その発話に対する応答を生成し出力するための対話用ＰＣ１７４を含む。動作制御ＰＣ１６２は、音声処理ＰＣ１６６によりロボット１６０が発話権を取得すべきと判定されたときには、発話者の発話のテキストを対話用ＰＣ１７４に送る機能を持つ。対話用ＰＣ１７４がこの入力に対する応答の発話テキストを生成して動作制御ＰＣ１６２及び確率モデル記憶装置１７２に送信することにより、その発話テキストに対応する音声が音声合成ＰＣ１７０及びスピーカ１６８により生成され、動作制御ＰＣ１６２は発話に対応してロボット１６０の動きを制御できる。 The conversation robot system 150 further includes a dialogue PC 174 that is connected to the network 176 and that, in response to receiving an utterance text from another PC via the network 176, generates and outputs a response to that utterance. The operation control PC 162 has a function of sending the text of the speaker's utterance to the dialogue PC 174 when the voice processing PC 166 determines that the robot 160 should acquire the right to speak. The dialogue PC 174 generates an utterance text in response to this input and sends it to the operation control PC 162 and the probability model storage device 172, so that a voice corresponding to the utterance text is generated by the voice synthesis PC 170 and the speaker 168, and the operation control PC 162 can control the movement of the robot 160 in accordance with the utterance.

音声処理ＰＣ１６６は、マイクロフォン１６４の出力する音声信号に対して音声認識を行い、さらにマイクロフォン１６４の出力に基づいて音源定位を行って、音声認識の結果として得られるテキストと、発話者の位置を示す情報とを動作制御ＰＣ１６２に送信する機能を持つ音声認識部１９０とを含む。音声処理ＰＣ１６６はさらに、音声認識部１９０の出力するテキストと、マイクロフォン１６４の音声信号から得られる音声パワー及び基本周波数Ｆ０を含む韻律情報とを用いて、上記した参考文献により開示された方法に従ってターン交替を検出する処理と、ターン交替に伴う各参加者の役割を認識する処理とを実行し、結果を動作制御ＰＣ１６２に送信するためのターン認識部１９２を含む。The voice processing PC 166 includes a voice recognition unit 190 that performs voice recognition on the voice signal output by the microphone 164, and further performs sound source localization based on the output of the microphone 164, and transmits text obtained as a result of the voice recognition and information indicating the position of the speaker to the operation control PC 162. The voice processing PC 166 further includes a turn recognition unit 192 that performs a process of detecting turn alternation according to the method disclosed in the above-mentioned reference document using the text output by the voice recognition unit 190 and prosodic information including the voice power and fundamental frequency F0 obtained from the voice signal of the microphone 164, and a process of recognizing the role of each participant in the turn alternation, and transmits the results to the operation control PC 162.

図１７にロボット１６０の外観を示す。図１７に示すようにロボット１６０はやや小ぶりなロボットであり、上半身及び頭部を左右に回転させることができる。さらにロボット１６０の眼球には大きな瞳が描かれており、眼球の回転角度を制御することによりロボット１６０の視線方向を上下左右に移動させることができる。 Figure 17 shows the external appearance of robot 160. As shown in Figure 17, robot 160 is a relatively small robot, and its upper body and head can rotate left and right. Furthermore, the eyes of robot 160 have large pupils, and by controlling the rotation angle of the eyeballs, the line of sight of robot 160 can be moved up, down, left and right.

２．ハードウェア構成 2. Hardware Configuration
図１８は、例えば図１６に示す動作制御ＰＣ１６２として動作するコンピュータシステムのハードウェアブロック図である。図１６に示す音声処理ＰＣ１６６、音声合成ＰＣ１７０、及び人位置認識ＰＣ１８０についても動作制御ＰＣ１６２とほぼ同様の構成のコンピュータシステムにより実現できる。ここではこれらＰＣ１６２を代表するものとしてコンピュータシステム２５０の構成についてのみ述べることとし、個々のＰＣの構成の詳細については述べない。 FIG. 18 is a hardware block diagram of a computer system that operates as, for example, the operation control PC 162 shown in FIG. 16. The voice processing PC 166, the voice synthesis PC 170, and the human position recognition PC 180 shown in FIG. 16 can also be realized by computer systems having substantially the same configuration as the operation control PC 162. Here, only the configuration of computer system 250 is described as representative of these PCs, and the configuration of each individual PC is not described in detail.

図１８を参照して、このコンピュータシステム２５０はＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）ドライブ３０２を有するコンピュータ２７０と、いずれもコンピュータ２７０に接続された、ユーザと対話するためのキーボード２７４、マウス２７６、及びモニタ２７２とを含む。もちろんこれらはユーザ対話が必要となったときのための構成の一例であって、ユーザ対話に利用できる一般のハードウェア及びソフトウェア（例えばタッチパネル、音声入力、ポインティングデバイス一般）ならばどのようなものも利用できる。 Referring to FIG. 18, this computer system 250 includes a computer 270 having a DVD (Digital Versatile Disc) drive 302, and a keyboard 274, a mouse 276, and a monitor 272 for interacting with a user, all of which are connected to the computer 270. Of course, these are only one example of a configuration for cases where user interaction is required, and any general hardware and software usable for user interaction (e.g., a touch panel, voice input, pointing devices in general) can be used.

図１８を参照して、コンピュータ２７０は、ＤＶＤドライブ３０２に加えて、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）２９０と、ＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）２９２と、ＣＰＵ２９０、ＧＰＵ２９２、ＤＶＤドライブ３０２に接続されたバス３１０と、バス３１０に接続され、コンピュータ２７０のブートアッププログラムなどを記憶するＲＯＭ（Ｒｅａｄ－ＯｎｌｙＭｅｍｏｒｙ）２９６と、バス３１０に接続され、プログラムを構成する命令、システムプログラム、及び作業データなどを記憶するＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）２９８と、バス３１０に接続された不揮発性メモリであるＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）３００とを含む。ＳＳＤ３００は、ＣＰＵ２９０及びＧＰＵ２９２が実行するプログラム、並びにＣＰＵ２９０及びＧＰＵ２９２が実行するプログラムが使用するデータなどを記憶するためのものである。コンピュータ２７０はさらに、他端末との通信を可能とするネットワーク１７６への接続を提供するネットワークＩ／Ｆ（Ｉｎｔｅｒｆａｃｅ）３０８と、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）メモリ２８４が着脱可能で、ＵＳＢメモリ２８４とコンピュータ２７０内の各部との通信を提供するＵＳＢポート３０６とを含む。 Referring to FIG. 18, computer 270 includes, in addition to DVD drive 302, a CPU (Central Processing Unit) 290, a GPU (Graphics Processing Unit) 292, a bus 310 connected to CPU 290, GPU 292, and DVD drive 302, a ROM (Read-Only Memory) 296 connected to bus 310 and storing a boot-up program for computer 270, etc., a RAM (Random Access Memory) 298 connected to bus 310 and storing instructions constituting a program, system programs, working data, etc., and an SSD (Solid State Drive) 300 which is a non-volatile memory connected to bus 310. The SSD 300 is for storing programs executed by the CPU 290 and the GPU 292, and data used by the programs executed by the CPU 290 and the GPU 292. The computer 270 further includes a network I/F (Interface) 308 that provides a connection to a network 176 that enables communication with other terminals, and a Universal Serial Bus (USB) port 306 to which a USB memory 284 is detachable and that provides communication between the USB memory 284 and each unit in the computer 270.

コンピュータ２７０はさらに、マイクロフォン１６４及びスピーカ１６８とバス３１０とに接続され、ＣＰＵ２９０により生成されＲＡＭ２９８又はＳＳＤ３００に保存された音声信号、映像信号及びテキストデータをＣＰＵ２９０の指示に従って読み出し、アナログ変換及び増幅処理をしてスピーカ１６８を駆動したり、マイクロフォン１６４からのアナログの音声信号をデジタル化し、ＲＡＭ２９８又はＳＳＤ３００の、ＣＰＵ２９０により指定される任意のアドレスに保存したりする機能を持つ音声Ｉ／Ｆ３０４を含む。The computer 270 further includes an audio I/F 304 that is connected to the microphone 164 and speaker 168 and the bus 310, and has the function of reading out audio signals, video signals and text data generated by the CPU 290 and stored in the RAM 298 or SSD 300 in accordance with instructions from the CPU 290, performing analog conversion and amplification processing to drive the speaker 168, and digitizing the analog audio signal from the microphone 164 and storing it in any address of the RAM 298 or SSD 300 specified by the CPU 290.

上記実施形態においては、動作制御ＰＣ１６２、音声処理ＰＣ１６６、音声合成ＰＣ１７０、及び人位置認識ＰＣ１８０などの機能を実現するプログラムなどは、いずれも例えば図１８に示すＳＳＤ３００、ＲＡＭ２９８、ＤＶＤ２７８又はＵＳＢメモリ２８４、若しくはネットワークＩ／Ｆ３０８及びネットワーク１７６を介して接続された図示しない外部装置の記憶媒体などに格納される。典型的には、これらのデータ及びパラメータなどは、例えば外部からＳＳＤ３００に書込まれコンピュータ２７０の実行時にはＲＡＭ２９８にロードされる。In the above embodiment, programs for realizing the functions of the operation control PC 162, the voice processing PC 166, the voice synthesis PC 170, and the human position recognition PC 180 are all stored, for example, in the SSD 300, RAM 298, DVD 278, or USB memory 284 shown in FIG. 18, or in a storage medium of an external device (not shown) connected via the network I/F 308 and the network 176. Typically, these data and parameters are written, for example, from outside into the SSD 300 and loaded into the RAM 298 when the computer 270 is executed.

このコンピュータシステムを、図７に示す動作制御ＰＣ１６２、音声処理ＰＣ１６６、音声合成ＰＣ１７０、及び人位置認識ＰＣ１８０並びにその各構成要素の機能を実現するよう動作させるためのコンピュータプログラムは、ＤＶＤドライブ３０２に装着されるＤＶＤ２７８に記憶され、ＤＶＤドライブ３０２からＳＳＤ３００に転送される。又は、これらのプログラムはＵＳＢメモリ２８４に記憶され、ＵＳＢメモリ２８４はＵＳＢポート３０６に装着され、プログラムがＳＳＤ３００に転送される。又は、このプログラムはネットワーク１７６を通じてコンピュータ２７０に送信されＳＳＤ３００に記憶されてもよい。 Computer programs for operating this computer system so as to realize the functions of the operation control PC 162, the voice processing PC 166, the voice synthesis PC 170, and the human position recognition PC 180 shown in FIG. 7, and of each of their components, are stored on a DVD 278 loaded into the DVD drive 302 and transferred from the DVD drive 302 to the SSD 300. Alternatively, these programs are stored in a USB memory 284, the USB memory 284 is inserted into the USB port 306, and the programs are transferred to the SSD 300. Alternatively, the programs may be transmitted to the computer 270 via the network 176 and stored in the SSD 300.

プログラムは実行のときにＲＡＭ２９８にロードされる。もちろん、キーボード２７４、モニタ２７２及びマウス２７６を用いてソースプログラムを入力し、コンパイルした後のオブジェクトプログラムをＳＳＤ３００に格納してもよい。上記実施形態のようにスクリプト言語の場合には、キーボード２７４などを用いて入力したスクリプトをＳＳＤ３００に格納してもよい。仮想マシン上において動作するプログラムの場合には、仮想マシンとして機能するプログラムを予めコンピュータ２７０にインストールしておく必要がある。音声認識及び音声合成などにはニューラルネットワークが使用される。訓練済のニューラルネットワークを使用してもよいし、会話ロボットシステム１５０において訓練を行ってもよい。 The program is loaded into RAM 298 when executed. Of course, the source program may be input using keyboard 274, monitor 272 and mouse 276, and the compiled object program may be stored in SSD 300. In the case of a scripting language as in the above embodiment, a script input using keyboard 274 or the like may be stored in SSD 300. In the case of a program that runs on a virtual machine, a program that functions as a virtual machine must be installed in computer 270 in advance. A neural network is used for voice recognition and voice synthesis. A trained neural network may be used, or training may be performed in conversational robot system 150.

ＣＰＵ２９０は、その内部のプログラムカウンタと呼ばれるレジスタ（図示せず）により示されるアドレスに従ってＲＡＭ２９８からプログラムを読み出して命令を解釈し、命令の実行に必要なデータを命令により指定されるアドレスに従ってＲＡＭ２９８、ＳＳＤ３００又はそれ以外の機器から読み出して命令により指定される処理を実行する。ＣＰＵ２９０は、実行結果のデータを、ＲＡＭ２９８、ＳＳＤ３００、ＣＰＵ２９０内のレジスタなど、プログラムにより指定されるアドレスに格納する。アドレスによってはロボットのアクチュエータへの指令、音声信号などとしてコンピュータから出力される。このとき、プログラムカウンタの値もプログラムによって更新される。コンピュータプログラムは、ＤＶＤ２７８から、ＵＳＢメモリ２８４から、又はネットワーク１７６を介して、ＲＡＭ２９８に直接にロードしてもよい。なお、ＣＰＵ２９０が実行するプログラムの中で、一部のタスク（主として数値計算）については、プログラムに含まれる命令により、又はＣＰＵ２９０による命令実行時の解析結果に従って、ＧＰＵ２９２にディスパッチされる。 The CPU 290 reads the program from the RAM 298 at the address indicated by an internal register (not shown) called the program counter, interprets each instruction, reads the data required to execute the instruction from the RAM 298, the SSD 300, or another device at the address specified by the instruction, and executes the processing specified by the instruction. The CPU 290 stores the resulting data at the address specified by the program, such as in the RAM 298, the SSD 300, or a register in the CPU 290. Depending on the address, the data is output from the computer as a command to the robot's actuators, an audio signal, or the like. At this time, the value of the program counter is also updated by the program. The computer program may be loaded directly into the RAM 298 from the DVD 278, from the USB memory 284, or via the network 176. Note that, within the program executed by the CPU 290, some tasks (mainly numerical calculations) are dispatched to the GPU 292 according to instructions included in the program or according to the result of analysis performed when the CPU 290 executes the instructions.

コンピュータ２７０により上記した実施形態に係る各部の機能を実現するプログラムは、それら機能を実現するようコンピュータ２７０を動作させるように記述され配列された複数の命令を含む。この命令を実行するのに必要な基本的機能のいくつかはコンピュータ２７０上において動作するオペレーティングシステム（ＯＳ）若しくはサードパーティのプログラム、コンピュータ２７０にインストールされる各種ツールキットのモジュール又はプログラムの実行環境により提供される場合もある。したがって、このプログラムはこの実施形態のシステム及び方法を実現するのに必要な機能全てを必ずしも含まなくてよい。このプログラムは、命令の中で、所望の結果が得られるように制御されたやり方によって適切な機能又はモジュールなどを静的にリンクすることにより、又は動的に呼出すことにより、上記した各装置及びその構成要素としての動作を実行する命令のみを含んでいればよい。そのためのコンピュータ２７０の動作方法は周知なので、ここでは繰り返さない。The program for implementing the functions of each part according to the above-mentioned embodiment by the computer 270 includes a plurality of instructions written and arranged to operate the computer 270 to implement those functions. Some of the basic functions required to execute the instructions may be provided by an operating system (OS) or a third-party program running on the computer 270, or by modules of various toolkits or the execution environment of the program installed on the computer 270. Thus, the program does not necessarily include all of the functions required to implement the system and method of this embodiment. The program only needs to include instructions to perform the operations of each of the above-mentioned devices and their components by statically linking or dynamically calling appropriate functions or modules in a controlled manner to obtain the desired results. The method of operation of the computer 270 for this purpose is well known, so it will not be repeated here.

なお、ＧＰＵ２９２は並列処理を行うことが可能であり、機械学習に伴う多量の計算を同時並列的又はパイプライン的に実行できる。例えばプログラムのコンパイル時にプログラム中に発見された並列的計算要素、又はプログラムの実行時に発見された並列的計算要素は、随時、ＣＰＵ２９０からＧＰＵ２９２にディスパッチされ、実行され、その結果が直接に、又はＲＡＭ２９８の所定アドレスを介してＣＰＵ２９０に返され、プログラム中の所定の変数に代入される。 The GPU 292 is capable of parallel processing, and can execute a large amount of calculations associated with machine learning simultaneously in parallel or in a pipeline manner. For example, parallel calculation elements discovered in a program when the program is compiled, or parallel calculation elements discovered when the program is executed, are dispatched from the CPU 290 to the GPU 292 as needed, and executed. The results are returned to the CPU 290 directly or via a specified address in the RAM 298, and assigned to a specified variable in the program.

３．機能的構成 3. Functional Configuration
図１９に、この実施形態に係る会話ロボットシステム１５０の中の視線制御に関連する部分である視線制御システム３５０の機能的構成を示す。図１９を参照して、視線制御システム３５０は、ロボット１６０及び他の対話参加者の現在の役割を判定し出力するための役割判定部３６０と、ターン交替を検出するためのターン交替検出部３６２とを含む。この実施形態においてはターン交替の検出及び対話参加者の役割についてはその都度検出している。しかし、例えばあらかじめ準備したシナリオにそってロボットを含む３者対話を行う場合には、各参加者の役割及びターン交替はシナリオによりほぼ特定される。そのため、役割判定部３６０及びターン交替検出部３６２を設ける必要はない。後述する評価実験はそうした条件で視線制御を行っている。 FIG. 19 shows the functional configuration of the gaze control system 350, which is the part of the conversation robot system 150 according to this embodiment that relates to gaze control. Referring to FIG. 19, the gaze control system 350 includes a role determination unit 360 for determining and outputting the current roles of the robot 160 and the other dialogue participants, and a turn change detection unit 362 for detecting turn changes. In this embodiment, turn changes and the roles of the dialogue participants are detected each time. However, when, for example, a three-party dialogue including the robot is conducted according to a scenario prepared in advance, the roles of the participants and the turn changes are largely determined by the scenario. In that case, the role determination unit 360 and the turn change detection unit 362 need not be provided. The evaluation experiment described later performs gaze control under such conditions.

視線制御システム３５０はさらに、ターン交替時及び発話中のロボット１６０の視線方向を確率的に決定するための視線方向モデル３６４と、ロボット１６０の視線逸らし時の視線方向を確率的に決定するための視線逸らしモデル３６６と、ロボット１６０の個性（外向的、内向的、中立）に関する情報を記憶するための個性情報記憶部３６８とを含む。視線方向モデル３６４の詳細については図２０及び図２１を参照して後述する。視線逸らしモデル３６６の詳細については図２２及び図２３を参照して後述する。なお、視線方向モデル３６４及び視線逸らしモデル３６６はいずれも図１６に示す確率モデル記憶装置１７２に記憶される。個性情報記憶部３６８も実際には確率モデル記憶装置１７２により実現される。確率モデル記憶装置１７２は図１８のＳＳＤ３００により実現される。 The gaze control system 350 further includes a gaze direction model 364 for probabilistically determining the gaze direction of the robot 160 at turn changes and during speech, a gaze aversion model 366 for probabilistically determining the gaze direction of the robot 160 when it averts its gaze, and a personality information storage unit 368 for storing information about the personality of the robot 160 (extrovert, introvert, or neutral). Details of the gaze direction model 364 will be described later with reference to FIGS. 20 and 21. Details of the gaze aversion model 366 will be described later with reference to FIGS. 22 and 23. Both the gaze direction model 364 and the gaze aversion model 366 are stored in the probability model storage device 172 shown in FIG. 16. The personality information storage unit 368 is also in practice realized by the probability model storage device 172. The probability model storage device 172 is realized by the SSD 300 in FIG. 18.

視線制御システム３５０はさらに、ターン交替検出部３６２からターン交替が検出されたことを示す信号を受けたことに応答して、役割判定部３６０からロボット１６０及び他の対話参加者の役割に関する情報を受けて、視線方向モデル３６４、視線逸らしモデル３６６及び個性情報記憶部３６８を使用してロボット１６０の視線動作を制御するためのパラメータを作成するための視線動作生成部３７０と、視線動作生成部３７０により作成されたパラメータに従って、ロボット１６０の視線を制御するための各種のアクチュエータ（図示せず）を制御するための視線動作制御部３７２とを含む。 The gaze control system 350 further includes a gaze action generation unit 370 that, in response to receiving from the turn change detection unit 362 a signal indicating that a turn change has been detected, receives information on the roles of the robot 160 and the other dialogue participants from the role determination unit 360 and creates parameters for controlling the gaze action of the robot 160 using the gaze direction model 364, the gaze aversion model 366, and the personality information storage unit 368, and a gaze action control unit 372 that controls various actuators (not shown) for controlling the gaze of the robot 160 in accordance with the parameters created by the gaze action generation unit 370.

４．モデル構成 4. Model Configuration
Ａ．視線方向モデル３６４ A. Gaze direction model 364
図２０を参照して、視線方向モデル３６４は、個性及び役割の組み合わせに応じてあらかじめ準備される発話時視線方向モデル４００と、個性別及び役割の組み合わせに応じてあらかじめ準備される視線継続時間モデル４０２と、個性及び役割の組み合わせに応じてあらかじめ準備される、それぞれターン交替時の１回目、２回目、及び３回目の区間における視線方向の決定のためのターン交替時視線方向モデル４０４、４０６及び４０８とを含む。 Referring to FIG. 20, the gaze direction model 364 includes a gaze direction model 400 during speech that is prepared in advance according to a combination of personality and role, a gaze duration model 402 that is prepared in advance according to a combination of personality and role, and gaze direction models 404, 406, and 408 during turn alternation that are prepared in advance according to a combination of personality and role for determining the gaze direction in the first, second, and third sections during a turn alternation, respectively.

発話時視線方向モデル４００の構成は図６に示したものと実質的に同じである。図６に示したものは例えば外向的な参加者のモデルである。したがって発話時視線方向モデル４００は、図６に示したものに加えて、内向的な参加者のためのモデル、及び中立的な参加者のためのモデルを含む。The configuration of the gaze direction model 400 during speech is substantially the same as that shown in FIG. 6. The model shown in FIG. 6 is, for example, a model for an extroverted participant. Therefore, in addition to that shown in FIG. 6, the gaze direction model 400 during speech includes a model for an introverted participant and a model for a neutral participant.

図２１に、視線継続時間モデル４０２の構成の１例を示す。図２１を参照して、視線継続時間モデル４０２は、ロボット１６０の役割（発話者、メインリスナ及びサブリスナ）、及びロボット１６０に想定される個性（外向的、内向的、中立）の組み合わせに応じ、各視線先（自分の役割を除く他の参加者の役割）の各々について、χ２乗分布曲線の自由度（ｎ）、横軸方向のバイアス（ｌｏｃ）及び縦軸方向の振幅（ｓｃａｌｅ）の値を記憶したものである。これらの値をχ２乗分布の式に代入することで分布を特定し、その分布からランダムに値をサンプリングすることにより、視線方向の決定時にその継続時間も決定できる。 FIG. 21 shows an example of the configuration of the gaze duration model 402. Referring to FIG. 21, the gaze duration model 402 stores, for each gaze target (the roles of the other participants, excluding the robot's own role) and for each combination of the role of the robot 160 (speaker, main listener, or sub-listener) and the personality assumed for the robot 160 (extrovert, introvert, or neutral), the values of the degrees of freedom (n), the bias in the horizontal axis direction (loc), and the amplitude in the vertical axis direction (scale) of a chi-square distribution curve. Substituting these values into the chi-square distribution formula specifies the distribution, and sampling a value at random from that distribution determines the gaze duration at the same time as the gaze direction is determined.
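
The sampling described above can be illustrated with a minimal sketch. It assumes that loc shifts and scale stretches the sampled value, as in common statistics libraries; the text itself describes scale as an amplitude, so this reading is an assumption. The function name and the parameter values are hypothetical and are not taken from FIG. 21.

```python
import random

def sample_gaze_duration(n: float, loc: float, scale: float, rng: random.Random) -> float:
    """Draw one gaze duration (seconds) from a shifted, scaled chi-square distribution."""
    # A chi-square variable with n degrees of freedom is a Gamma variable
    # with shape n/2 and scale 2, which the standard library can sample.
    x = rng.gammavariate(n / 2.0, 2.0)
    # Assumption: loc shifts the value and scale stretches it.
    return loc + scale * x

# Hypothetical parameters for one (role, personality, gaze target) entry.
rng = random.Random(0)
durations = [sample_gaze_duration(n=3.0, loc=0.2, scale=0.5, rng=rng) for _ in range(1000)]
mean_duration = sum(durations) / len(durations)
```

Under this reading the mean duration is loc + scale * n, so the hypothetical entry above averages roughly 1.7 seconds.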

ターン交替時視線方向モデル４０４、４０６及び４０８は、発話時視線方向モデル４００と同じ構成を持つ。違いは、その格納した値が、ターン交替時の各区間に限定されているという点である。The gaze direction models 404, 406, and 408 during turn alternation have the same configuration as the gaze direction model during speech 400. The difference is that the stored values are limited to each interval during the turn alternation.

Ｂ．視線逸らしモデル３６６ B. Gaze aversion model 366
図２２に、図１９に示す視線逸らしモデル３６６の構成を示す。図２２を参照して、視線逸らしモデル３６６は、ターン交替時視線逸らし方向モデル４５０及び発話時視線逸らし方向モデル４５２を含む。これらモデルはいずれも、個性別に設けられる。 FIG. 22 shows the configuration of the gaze aversion model 366 shown in FIG. 19. Referring to FIG. 22, the gaze aversion model 366 includes a turn-change gaze aversion direction model 450 and a speech gaze aversion direction model 452. Both of these models are provided for each personality.

図２３にターン交替時視線逸らし方向モデル４５０の構成の一例を示す。図２３を参照して、ターン交替時視線逸らし方向モデル４５０は、例えば図１４に示す外向的な参加者の視線逸らし方向をモデル化したものである。この実施形態においては、ターン交替時視線逸らし方向モデル４５０は配列により実現されている。各要素は、図１４の中央の列及び右の列を最も左側の列の下に移動させたときの順番に配列されている。各要素は、視線逸らし方向（左上から右下までの９方向）と、その方向に視線を逸らす割合（確率）とを示す。図１４において「Ｌ」は左を、「Ｃ」は中央を、「Ｒ」は右を、「Ｕ」は上を、「Ｄ」は下を、それぞれ示す。例えば「ＬＤ」は左下を表し、「ＣＤ」は中央下を表す。 Figure 23 shows an example of the configuration of the turn-changing gaze aversion direction model 450. With reference to Figure 23, the turn-changing gaze aversion direction model 450 is a model of the gaze aversion direction of an extroverted participant shown in Figure 14, for example. In this embodiment, the turn-changing gaze aversion direction model 450 is realized by an array. Each element is arranged in the order when the center column and the right column in Figure 14 are moved below the leftmost column. Each element indicates the gaze aversion direction (nine directions from the top left to the bottom right) and the proportion (probability) of averting the gaze in that direction. In Figure 14, "L" indicates left, "C" indicates center, "R" indicates right, "U" indicates up, and "D" indicates down. For example, "LD" indicates bottom left, and "CD" indicates bottom center.

後述するように、プログラムがターン交替時視線逸らし方向モデル４５０を参照することにより、ロボット１６０の役割に応じ、ターン交替時にロボット１６０が視線逸らしを行うときの方向を確率的に定めることができる。As described below, the program can refer to the turn-changing gaze averting direction model 450 to probabilistically determine the direction in which the robot 160 will avert its gaze when changing turns, depending on the role of the robot 160.
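
As a rough illustration of such an array-based model, the sketch below stores nine (direction, probability) pairs in column order (left column, then center, then right) and picks a direction from a value p in [0, 1]. The probabilities are placeholders, not values from FIG. 14 or FIG. 23, and the labels for the middle row ("LC", "CC", "RC") and the function name are assumptions.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical model: left column top to bottom, then center, then right.
AVERSION_MODEL = [
    ("LU", 0.05), ("LC", 0.10), ("LD", 0.20),
    ("CU", 0.05), ("CC", 0.05), ("CD", 0.20),
    ("RU", 0.05), ("RC", 0.10), ("RD", 0.20),
]

def pick_aversion_direction(model, p):
    """Return the direction whose cumulative probability interval contains p."""
    cumulative = list(accumulate(prob for _, prob in model))
    # First index whose cumulative probability is strictly greater than p.
    index = bisect_right(cumulative, p)
    # Clamp to the last entry to absorb floating-point rounding at p close to 1.
    return model[min(index, len(model) - 1)][0]
```

With these placeholder values, p = 0.5 falls in the interval assigned to "CD", because the cumulative probability reaches 0.45 after "CC" and 0.65 after "CD".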

５．プログラム構成 5. Program Configuration
Ａ．全体構成 A. Overall Configuration
図２４に、この実施形態に係るロボット１６０を制御するために視線制御システム３５０が実行するプログラムの全体構成をフローチャート形式により示す。図２４を参照して、このプログラムは、一定の時間間隔をもって繰り返し起動され、起動時に対話の状態（ターン交替時か否か、各対話参加者の役割は何か、など）をセンシングするステップ４８０と、現時点が視線方向を決定するタイミングか否かに従って制御の流れを分岐させるステップ４８２とを含む。視線方向を決定するタイミングとは、例えばターン交替時であれば、ターン交替が発生するタイミングを０秒として－１秒、－０．３秒、及び０．３秒となったときである。ターン交替時以外、すなわち発話時には、その前の視線方向決定の際に決定された視線の継続時間が満了した時点である。その方向が視線逸らし方向であり、その継続時間が０．７秒以上となったときには、さらに０．７秒ごとに視線方向を決定し直す処理を行う。これらのタイミングについては、各イベントが発生したときにその時間をタイマに設定し、タイマが満了したか否かに従って検出できる。 FIG. 24 shows, in flowchart form, the overall configuration of the program executed by the gaze control system 350 to control the robot 160 according to this embodiment. Referring to FIG. 24, this program is started repeatedly at fixed time intervals and includes step 480 of sensing the state of the dialogue at startup (whether a turn change is in progress, what role each dialogue participant has, and so on), and step 482 of branching the flow of control according to whether the current time is a timing at which the gaze direction is to be determined. At a turn change, the gaze direction is determined at −1 second, −0.3 seconds, and 0.3 seconds, taking the moment of the turn change as 0 seconds. Outside a turn change, that is, during speech, it is determined when the gaze duration set at the previous determination expires. If that direction is a gaze aversion direction and its duration is 0.7 seconds or more, the gaze direction is additionally re-determined every 0.7 seconds. These timings can be detected by setting a timer when each event occurs and checking whether the timer has expired.
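
The timing rules above can be sketched as follows. The offsets and the 0.7-second interval come from the text; the function names and the way the re-determination rule is reduced to a single delay value are assumptions made for illustration.

```python
from typing import List

TURN_CHANGE_OFFSETS = (-1.0, -0.3, 0.3)  # seconds relative to the turn change (from the text)
AVERSION_REDECIDE_INTERVAL = 0.7         # seconds (from the text)

def turn_change_decision_times(turn_change_time: float) -> List[float]:
    """Absolute times at which the gaze direction is decided around a turn change."""
    return [turn_change_time + offset for offset in TURN_CHANGE_OFFSETS]

def next_decision_delay(is_aversion: bool, duration: float) -> float:
    """Seconds until the gaze direction is decided again during speech."""
    # Assumed reading: a long gaze aversion is revisited every 0.7 s,
    # while any other gaze simply runs for its sampled duration.
    if is_aversion and duration >= AVERSION_REDECIDE_INTERVAL:
        return AVERSION_REDECIDE_INTERVAL
    return duration
```

A timer set to the returned delay would then drive the branch in step 482.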

このプログラムはさらに、ステップ４８２の判定が肯定のときに実行され、現在のロボット１６０に割当てられた役割と、その個性と、発話時かターン交替時かという情報との組み合わせに従って、視線方向モデル３６４から選択したモデル、又はさらに視線逸らしモデル３６６から選択したモデルに従って視線方向とその継続時間とを決定するステップ４８４と、ステップ４８４の後、及びステップ４８２における判定が否定的である場合には直接に実行され、ロボット１６０の各部をステップ４８４で設定されたパラメータに従って制御してプログラムの実行を終了するステップ４８６とを含む。なお、このプログラムでは、図示していない他のステップにおいてロボットのための音声認識、音声合成、及び姿勢の制御などのための処理も実行される。ステップ４８６ではそうした処理に基づく制御も行われる。This program further includes step 484, which is executed when the determination in step 482 is positive, and determines the gaze direction and its duration according to a model selected from gaze direction model 364 or further from gaze aversion model 366, according to a combination of the role assigned to the current robot 160, its personality, and information on whether it is speaking or turning, and step 486, which is executed after step 484 or directly when the determination in step 482 is negative, and controls each part of the robot 160 according to the parameters set in step 484 and ends the execution of the program. Note that in this program, processing for voice recognition, voice synthesis, posture control, and the like for the robot is also executed in other steps not shown. Control based on such processing is also performed in step 486.

ステップ４８６では、ステップ４８４において決定されたパラメータに従って、タイマの進行にあわせてロボット１６０の視線を制御するために、ロボット１６０の頭部の向き、及び眼球の位置などが制御される。プログラムが実行されるたびに、時間が進行するため、プログラムが実行されるたびにロボット１６０の頭部の位置及び眼球の位置が変化していく。In step 486, the head direction and eye position of robot 160 are controlled in accordance with the parameters determined in step 484 in order to control the line of sight of robot 160 in accordance with the progression of the timer. Since time progresses each time the program is executed, the head position and eye position of robot 160 change each time the program is executed.

Ｂ．ステップ４８０（状態センシング） B. Step 480 (State Sensing)
図２５に、図２４のステップ４８０を実現するプログラムの構成をフローチャート形式により示す。図２５を参照して、ステップ４８０は、図１９に示す役割判定部３６０からロボット１６０及び各参加者の役割を示す情報を読み出すステップ５１０と、図１９に示すターン交替検出部３６２からターン状態を示す情報を読み出すステップ５１２と、図１９に示す個性情報記憶部３６８からロボット１６０に付与された個性を示す情報とを読み出してこのプログラムの実行を終了するステップ５１４とを含む。 FIG. 25 shows, in flowchart form, the configuration of a program for realizing step 480 in FIG. 24. Referring to FIG. 25, step 480 includes step 510 of reading information indicating the roles of the robot 160 and each participant from the role determination unit 360 shown in FIG. 19, step 512 of reading information indicating the turn state from the turn change detection unit 362 shown in FIG. 19, and step 514 of reading information indicating the personality given to the robot 160 from the personality information storage unit 368 shown in FIG. 19 and terminating execution of this program.

ロボット１６０に付与された個性は固定されていると考えてよい。したがって個性に関する情報は必ずしもプログラムの実行の繰り返しごとに読み出す必要はない。しかし個性だけではなく、感情などのように時間により変化することが想定される状態まで含めてロボット１６０に付与する場合には、図２５に示すようにプログラムの実行の繰り返しごとにその情報を読み出すことが望ましい。The personality given to the robot 160 can be considered to be fixed. Therefore, information about the personality does not necessarily need to be read out for each repetition of the program execution. However, if the robot 160 is to be given not only a personality but also states that are expected to change over time, such as emotions, it is desirable to read out that information for each repetition of the program execution, as shown in Figure 25.

Ｃ．ステップ４８４（視線方向と継続時間の決定） C. Step 484 (Determining Gaze Direction and Duration)
図２６を参照して、ステップ４８４は、ロボット１６０に割当てられた役割、ターン状態、及び個性に応じた視線方向モデル及び視線継続時間モデルを視線方向モデル３６４から選択し読み出すステップ５５０と、同じくロボット１６０に割当てられた役割、ターン状態、及び個性に応じたターン交替時視線逸らし方向モデル４５０又は発話時視線逸らし方向モデル４５２を視線逸らしモデル３６６から選択し読み出すステップ５５２とを含む。ステップ５５０では、ターン状態が発話時であれば図２０に示す発話時視線方向モデル４００が選択される。ターン状態がターン交替時であれば、タイミングに従いターン交替時視線方向モデル４０４及び４０６及び４０８のいずれか１つが選択される。 Referring to FIG. 26, step 484 includes step 550 of selecting and reading, from the gaze direction model 364, a gaze direction model and a gaze duration model corresponding to the role assigned to the robot 160, the turn state, and the personality, and step 552 of selecting and reading, from the gaze aversion model 366, the turn-change gaze aversion direction model 450 or the speech gaze aversion direction model 452 corresponding likewise to the role assigned to the robot 160, the turn state, and the personality. In step 550, if the turn state is speech, the gaze direction model 400 during speech shown in FIG. 20 is selected. If the turn state is a turn change, one of the turn-change gaze direction models 404, 406, and 408 is selected according to the timing.
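
The selection in step 550 amounts to a keyed lookup. The sketch below builds such a key from the role, the personality, and the turn state; the reference numerals 400, 404, 406, and 408 come from the text, while the enum types, the interval index, and the function name are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple

class Role(Enum):
    SPEAKER = "speaker"
    MAIN_LISTENER = "main_listener"
    SUB_LISTENER = "sub_listener"

class Personality(Enum):
    EXTROVERT = "extrovert"
    INTROVERT = "introvert"
    NEUTRAL = "neutral"

SPEECH_MODEL = 400                    # gaze direction model during speech
TURN_CHANGE_MODELS = (404, 406, 408)  # first, second, and third turn-change sections

def direction_model_key(role: Role, personality: Personality,
                        interval: Optional[int]) -> Tuple[int, Role, Personality]:
    """Key identifying the gaze direction model to read in step 550.

    interval is None during speech, or 0, 1, 2 for the turn-change sections.
    """
    model_id = SPEECH_MODEL if interval is None else TURN_CHANGE_MODELS[interval]
    return (model_id, role, personality)
```

A dictionary keyed this way could hold one probability table per combination, which is one possible layout for the models stored in the probability model storage device 172.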

このプログラムはさらに、ステップ５５２に続き、一様分布から［０，１］の範囲の値ｐをサンプリングするステップ５５４と、ステップ５５４においてサンプリングされた値ｐを用い、ステップ５５０において選択された視線方向モデルから視線方向を決定するステップ５５６とを含む。この明細書では、このようにして視線方向モデルを［０，１］の範囲でランダムにサンプリングされた値を用いて視線方向を決定することを「視線方向のサンプリング」と呼ぶ。ステップ５５６の詳細については図２７を参照して後述する。 Following step 552, the program further includes step 554 of sampling a value p in the range [0, 1] from a uniform distribution, and step 556 of determining the gaze direction from the gaze direction model selected in step 550 using the value p sampled in step 554. In this specification, determining the gaze direction from a gaze direction model in this way, using a value sampled at random from the range [0, 1], is referred to as "sampling the gaze direction." Details of step 556 will be described later with reference to FIG. 27.

このプログラムはさらに、ステップ５５４と同様、一様分布から［０，１］の範囲の値ｐをサンプリングするステップ５５８と、ステップ５５０で読み出された視線継続時間モデル４０２から、図２５に示す処理により設定されたロボット１６０の役割及びロボット１６０に付与された個性に従ったχ２乗分布のパラメータｎ、ｌｏｃ及びｓｃａｌｅを読み出しこれら値とステップ５５８においてサンプリングされた値ｐとを用いて視線の継続時間を決定するステップ５６０とを含む。ステップ５６０の詳細については図２８を参照して後述する。 This program further includes step 558 of sampling a value p in the range [0, 1] from a uniform distribution, as in step 554, and step 560 of reading, from the gaze duration model 402 read in step 550, the chi-square distribution parameters n, loc, and scale corresponding to the role of the robot 160 and the personality given to the robot 160 as set by the process shown in FIG. 25, and determining the gaze duration using these values and the value p sampled in step 558. Details of step 560 will be described later with reference to FIG. 28.

このプログラムはさらに、ステップ５５６で決定された視線方向が視線逸らしか否かに従って制御の流れを分岐させるステップ５６２を含む。ステップ５６２の判定が否定のときにはこのプログラムの実行は終了する。The program further includes step 562, which branches the flow of control depending on whether the gaze direction determined in step 556 is an averted gaze or not. If the determination in step 562 is negative, execution of the program ends.

このプログラムはさらに、ステップ５６２の判定が肯定的であることに応答して、一様分布から［０，１］の範囲の値ｐをサンプリングするステップ５６４と、ステップ５６４でサンプリングされた値ｐを用い、ステップ５５２で選択された視線逸らし方向モデルを用いて視線逸らし方向を決定してこのプログラムの実行を終了するステップ５６６とを含む。ステップ５６６の詳細については図２９を参照して後述する。 The program further includes step 564 of sampling a value p in the range [0, 1] from a uniform distribution in response to the determination in step 562 being affirmative, and step 566 of determining the gaze aversion direction from the gaze aversion direction model selected in step 552 using the value p sampled in step 564, and terminating execution of the program. Details of step 566 will be described later with reference to FIG. 29.
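
The flow of steps 554 through 566 can be summarized in a short sketch. The three callables stand in for the model lookups and are hypothetical, as are the "averted" label and the function name. The sketch also assumes that step 566 draws the aversion direction from the gaze aversion direction model read in step 552.

```python
import random
from typing import Callable, Optional, Tuple

AVERTED = "averted"  # hypothetical label for the gaze-aversion outcome

def decide_gaze(
    pick_target: Callable[[float], str],
    duration_from_p: Callable[[float], float],
    pick_aversion: Callable[[float], str],
    rng: random.Random,
) -> Tuple[str, float, Optional[str]]:
    """Return (gaze target, duration in seconds, aversion direction or None)."""
    target = pick_target(rng.random())        # steps 554 and 556
    duration = duration_from_p(rng.random())  # steps 558 and 560
    aversion_direction = None
    if target == AVERTED:                     # step 562
        # Steps 564 and 566: only reached when the gaze is averted.
        aversion_direction = pick_aversion(rng.random())
    return target, duration, aversion_direction

# Usage with stand-in callables; real models would replace the lambdas.
result_look = decide_gaze(lambda p: "speaker", lambda p: 0.5 + p,
                          lambda p: "LD", random.Random(1))
result_avert = decide_gaze(lambda p: AVERTED, lambda p: 0.5 + p,
                           lambda p: "LD", random.Random(1))
```

Passing the models in as callables keeps the control flow separate from how each probability table is stored.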

図２７に、図２６のステップ５５６の詳細をフローチャート形式で示す。なおこの図では、視線をメインリスナに向ける確率を表す変数として変数Ｓ_Ｍ、視線を発話者に向ける確率としてＳ_Ｓ、及び視線をサブリスナに向ける確率としてＳ_Ｂを、それぞれ用いる。図２７を参照して、ステップ５５６は、変数Ｓ_Ｓの値と変数Ｓ_Ｍの値とを合計した値を変数Ｓ_Ｍに代入し、このように値が算出された変数Ｓ_Ｍの値と変数Ｓ_Ｂの値とを合計した値を変数Ｓ_Ｂに代入するステップ６００を含む。この計算を行うことにより、変数Ｓ_Ｓには発話者の方向を向く確率が格納される。変数Ｓ_Ｍには発話者の方向を向く確率とメインリスナの方向を向く確率とが合算された値が格納される。変数Ｓ_Ｂには発話者、メインリスナ及びサブリスナの方向を向く確率が合算された値が格納される。 FIG. 27 shows the details of step 556 in FIG. 26 in flowchart form. In this figure, the variable S_M represents the probability of directing the gaze to the main listener, S_S the probability of directing the gaze to the speaker, and S_B the probability of directing the gaze to the sub-listener. Referring to FIG. 27, step 556 includes step 600 of assigning the sum of S_S and S_M to S_M, and then assigning the sum of the newly computed S_M and S_B to S_B. After this calculation, S_S holds the probability of looking toward the speaker, S_M holds the sum of the probabilities of looking toward the speaker and the main listener, and S_B holds the sum of the probabilities of looking toward the speaker, the main listener, and the sub-listener.

このプログラムはさらに、ステップ６００に続き値ｐが変数Ｓ_Ｓの値より小さいか否かに従い制御の流れを分岐させるステップ６０２と、ステップ６０２における判定が肯定的であるときに、視線方向を発話者の方向であると決定しこのプログラムの実行を終了するステップ６０４とを含む。 This program further includes, following step 600, step 602 of branching the flow of control according to whether the value p is smaller than the value of the variable S_S, and step 604 of determining, when the determination in step 602 is affirmative, that the gaze direction is the direction of the speaker and terminating execution of this program.

このプログラムはさらに、ステップ６０２における判定が否定的であることに応答して、値ｐが変数Ｓ_Ｍの値より小さいか否かに従って制御の流れを分岐させるステップ６０６と、ステップ６０６における判定結果が肯定的であることに応答して、視線方向をメインリスナの方向であると決定してこのプログラムの実行を終了するステップ６０８とを含む。 The program further includes step 606 of branching the flow of control, in response to the determination in step 602 being negative, according to whether the value p is smaller than the value of the variable S_M, and step 608 of determining, in response to the determination in step 606 being affirmative, that the gaze direction is the direction of the main listener and terminating execution of the program.

このプログラムはさらに、ステップ６０６における判定が否定的であることに応答して、値ｐが変数Ｓ_Ｂの値より小さいか否かに従って制御の流れを分岐させるステップ６１０と、ステップ６１０における判定が肯定的であるときに、視線方向をサブリスナの方向に決定してこのプログラムの実行を終了するステップ６１２と、ステップ６１０における判定が否定的であるときに、視線方向を視線逸らし方向であると決定してこのプログラムの実行を終了するステップ６１４とを含む。 The program further includes step 610 of branching the flow of control, in response to the determination in step 606 being negative, according to whether the value p is smaller than the value of the variable S_B, step 612 of determining, when the determination in step 610 is affirmative, that the gaze direction is the direction of the sub-listener and terminating execution of the program, and step 614 of determining, when the determination in step 610 is negative, that the gaze direction is a gaze aversion direction and terminating execution of the program.

ステップ６００における処理を行うことにより、こうしたアルゴリズムを用いて、値ｐとモデルの表す確率とに従い視線方向を決定できる。By performing the processing in step 600, such an algorithm can be used to determine the gaze direction according to the value p and the probability represented by the model.
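The selection in steps 600 to 614 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name, argument names, and string labels are assumptions chosen for readability, and the probabilities in the usage note are placeholders rather than values from any model.

```python
def choose_gaze_target(p, s_speaker, s_main, s_sub):
    """Pick a gaze target from a uniform sample p in [0, 1).

    s_speaker, s_main, and s_sub stand in for S_S, S_M, and S_B before
    step 600: the individual probabilities of looking at the speaker, the
    main listener, and the sub-listener. Whatever probability remains is
    the probability of averting the gaze.
    """
    # Step 600: turn the individual probabilities into cumulative thresholds.
    s_main = s_speaker + s_main
    s_sub = s_main + s_sub
    # Steps 602 to 614: compare p against each threshold in turn.
    if p < s_speaker:
        return "speaker"
    if p < s_main:
        return "main_listener"
    if p < s_sub:
        return "sub_listener"
    return "averted"
```

When the agent itself is the speaker, passing `s_speaker = 0` means the first branch can never be taken, so only the main listener, the sub-listener, or an averted gaze can be selected.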

FIG. 28 shows, in flowchart form, the control structure of the program executed in step 560 of FIG. 26. Referring to FIG. 28, this program includes step 650 of reading the values n, loc, and scale from the gaze duration model 402 selected in step 556 of FIG. 26, step 652 of sampling a value p from a uniform distribution over [0, 1] and assigning it to a variable x, and step 654 of calculating the gaze duration by substituting the values of n, loc, scale, and x into the previously described chi-square distribution equation for the gaze duration, and then terminating execution of the program.
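The duration equation itself is given earlier in the description and is not reproduced in this passage. The sketch below therefore rests on an assumption: that n, loc, and scale follow the usual location-scale convention for a chi-square distribution, so that the duration is loc + scale × (the chi-square quantile of x for n degrees of freedom). The function names are hypothetical, and the quantile is found by simple bisection using only the standard library.

```python
import math


def chi2_cdf(x, n):
    """CDF of a chi-square distribution with n degrees of freedom (series form)."""
    if x <= 0.0:
        return 0.0
    a, t = n / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    # Series for the regularized lower incomplete gamma function P(a, t).
    while term > total * 1e-16 and k < 10000:
        term *= t / (a + k)
        total += term
        k += 1
    return min(1.0, total * math.exp(-t + a * math.log(t) - math.lgamma(a)))


def gaze_duration(x, n, loc, scale):
    """Map a uniform sample x to loc + scale * (chi-square quantile of x)."""
    x = min(max(x, 0.0), 1.0 - 1e-12)  # keep the quantile finite
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < x:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):  # bisection on the monotone CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, n) < x:
            lo = mid
        else:
            hi = mid
    return loc + scale * (lo + hi) / 2.0
```

If the earlier equation uses a different parameterization, only the last line of `gaze_duration` would need to change. The inverse-transform step that turns the uniform sample x into a duration stays the same.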

図２９に図２６のステップ５６６を実現するプログラムの制御構造をフローチャート形式で示す。図２９を参照して、このプログラムは、プログラム中のループ処理のための繰り返し変数ｉに０を、図２３に示す各確率を配列番号０から順番に累積していくための変数Ｓに０を、それぞれ代入するステップ６８０と、ステップ６８０に続き、変数ｉの値が８より小さいか否かに従って制御の流れを分岐させるステップ６８２とを含む。ここで「８」とは、図２３に示す配列における添字の最大値である。 Figure 29 shows, in the form of a flowchart, the control structure of a program for implementing step 566 in Figure 26. Referring to Figure 29, this program includes step 680 for substituting 0 for repetition variable i for loop processing in the program, and 0 for variable S for accumulating each of the probabilities shown in Figure 23 in order starting from array number 0, and step 682 following step 680, for branching the flow of control depending on whether the value of variable i is less than 8. Here, "8" is the maximum value of the subscript in the array shown in Figure 23.

The program further includes step 684 of, when the determination in step 682 is affirmative, adding the probability P(i) of the array shown in FIG. 23 to the variable S, step 686 of branching the flow of control depending on whether the value of the variable p is smaller than the value of the variable S, and step 688 of, when the determination in step 686 is negative, that is, when the value of p is equal to or greater than the value of S, adding 1 to the value of the variable i and returning control to step 682. The program also includes step 690 of, when the determination in step 686 is affirmative, determining the gaze aversion direction to be the direction D(i) in the array of FIG. 23 and terminating execution of the program.

このプログラムはさらに、ステップ６８２における判定が否定的であることに応答して、視線逸らし方向を視線逸らし方向Ｄ（８）に決定しプログラムの実行を終了するステップ６９２を含む。 The program further includes step 692, in response to a negative determination in step 682, determining the gaze aversion direction to be gaze aversion direction D(8) and terminating execution of the program.

このプログラムを実行することにより、例えば図１４に示す確率の分布に従って視線逸らし方向を決定できる。 By executing this program, the direction of gaze aversion can be determined, for example, according to the probability distribution shown in Figure 14.
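The loop of steps 680 to 692 can be sketched as follows. This is an illustrative reading of the flowchart description, not the actual program: the function name is hypothetical, and `probabilities` and `directions` are stand-ins for the arrays P(i) and D(i) of FIG. 23, whose contents are not given in this passage.

```python
def choose_averted_direction(p, probabilities, directions):
    """Return the averted-gaze direction selected by a uniform sample p.

    probabilities[i] and directions[i] stand in for P(i) and D(i); both
    sequences must have the same length (nine entries in the embodiment).
    """
    last = len(directions) - 1  # 8 when there are nine directions
    s = 0.0  # running cumulative probability (variable S), step 680
    i = 0
    while i < last:  # step 682
        s += probabilities[i]  # step 684
        if p < s:  # step 686
            return directions[i]  # step 690
        i += 1  # step 688
    return directions[last]  # step 692: fall through to the final entry
```

Because the final entry is returned without a comparison, the function always yields a direction, even if rounding makes the accumulated probabilities sum to slightly less than 1.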

6. Operation
The conversational robot system 150 according to this embodiment operates as follows. The following description covers only the gaze control system 350 and refers mainly to FIGS. 24 to 29.

視線制御システム３５０が起動されると、図２４のステップ４８０が実行される。この例では、ロボット１６０を含む対話参加者の各役割、ターン状態としてはあらかじめ定める初期値として役割＝発話者、ターン状態として発話期間（ターン交替期間以外）が設定されているものとする。また図１９に示す個性情報記憶部３６８にはロボット１６０に付与する個性を示す情報が格納されているものとする。ここでは、ロボット１６０には「外向的」という個性が付与されているものとする。ステップ４８０ではこうした情報が図１８のＲＡＭ２９８に読み込まれる。 When the gaze control system 350 is started, step 480 in Figure 24 is executed. In this example, it is assumed that the roles of the dialogue participants, including the robot 160, and the turn state are set as predetermined initial values, with role = speaker and the turn state being the speaking period (other than the turn alternation period). It is also assumed that the personality information storage unit 368 shown in Figure 19 stores information indicating the personality to be given to the robot 160. Here, it is assumed that the robot 160 has been given the personality of "extroverted". In step 480, this information is read into the RAM 298 in Figure 18.

続いてステップ４８２において、視線方向決定のタイミングか否かが決定される。ターン状態の初期値として発話期間が設定されており、かつ視線継続時間のタイマがクリアされているものとすると、図２４のステップ４８２における判定は肯定となり、制御はステップ４８４に進む。 Next, in step 482, it is determined whether it is time to determine the gaze direction. If the speech period is set as the initial value of the turn state and the gaze duration timer is cleared, the determination in step 482 in FIG. 24 is positive and control proceeds to step 484.

図２６を参照して、ステップ５５０では、役割＝発話者、ターン状態＝発話期間、個性＝外向的に対応する発話時視線方向モデル４００が読み出され、ステップ５５２では役割＝発話者、個性＝外向的に対応する視線継続時間モデル４０２がそれぞれ選択される。さらにステップ５５４において値ｐがサンプリングされ、ステップ５５６が実行される。ステップ５５６においては、図２７に示す処理が実行され、視線方向が決定される。ここではロボット１６０の役割は発話者なので、視線方向としてはメインリスナ、サブリスナ及び視線逸らしのいずれかが選択される。 With reference to FIG. 26, in step 550, the gaze direction model 400 during speech corresponding to role=speaker, turn state=speaking period, and personality=extroversion is read out, and in step 552, the gaze duration model 402 corresponding to role=speaker and personality=extroversion is selected. Furthermore, in step 554, the value p is sampled, and step 556 is executed. In step 556, the process shown in FIG. 27 is executed, and the gaze direction is determined. Here, since the role of the robot 160 is a speaker, one of main listener, sub-listener, and averted gaze is selected as the gaze direction.

図２６に戻り、ステップ５５８において値ｐが再びサンプリングされる。続くステップ５６０においては以下のように視線継続時間が決定される。Returning to FIG. 26, the value p is again sampled in step 558. In the following step 560, the gaze duration is determined as follows:

Referring to FIG. 28, in step 650, the parameters n, loc, and scale of the chi-square distribution corresponding to the role of the robot 160 and the personality given to the robot 160 are read from the gaze duration model 402 read out in step 552 of FIG. 26. In step 652, the value p sampled in step 558 is assigned to the variable x. In the following step 654, the gaze duration is determined by substituting the parameters n, loc, and scale and the value of the variable x into the equation for calculating the duration.

Returning again to FIG. 26, in the following step 562, it is determined whether the gaze direction determined in step 556 is an averted gaze. Assume here that it is not. In that case, the process shown in FIG. 26 (step 484 in FIG. 24) ends immediately. Returning to FIG. 24, in step 486, control of the robot 160 is started according to the gaze direction and gaze duration determined in step 484. In step 486, control of the robot 160 other than gaze control is also executed. If the gaze direction determined in step 556 were an averted gaze, steps 564 and 566 of FIG. 26 would be executed to determine the direction in which the robot 160 averts its gaze. In that case, in step 486, the gaze of the robot 160 is controlled so that it points in a direction that is neither the main listener nor the sub-listener.

次に図２４に示す処理が起動されたものとする。ステップ４８０は１回目の処理と同様である。ステップ４８２においては、まだ視線方向の継続時間が満了していないのでステップ４８４の処理は実行されず、ステップ４８６の処理だけが実行される。したがって、ロボット１６０の視線方向の変化がまだ完了していなければ視線方向の変化が継続して実行され、完了していればその視線方向が維持される。 Next, assume that the processing shown in FIG. 24 is started. Step 480 is the same as the first processing. In step 482, since the duration of the gaze direction has not yet expired, the processing of step 484 is not executed, and only the processing of step 486 is executed. Therefore, if the change in the gaze direction of the robot 160 has not yet been completed, the change in the gaze direction continues, and if it has been completed, the gaze direction is maintained.

こうして図２４の処理が複数回実行された後、ターン交替が検出されたものとする。ここでは実際のターン交替のタイミングの１秒前が検出され図２４のステップ４８０においてロボット１６０に通知される。ターン交替のタイミングの１秒前は視線方向決定のタイミングである。したがって上記したステップ４８４の処理が再度実行されて、ロボット１６０の新しい役割と、ターン状態＝ターン交替期間と、ロボット１６０の個性とに応じて選択された視線方向モデルと視線継続モデルとを用いて新たな視線方向と新たな視線継続時間が決定される。その後、図２４のステップ４８６によりロボット１６０の視線の制御が新たなパラメータに従って開始される。 After the process of FIG. 24 is thus executed multiple times, it is assumed that a turn change has been detected. Here, one second before the actual timing of the turn change is detected and notified to robot 160 in step 480 of FIG. 24. One second before the timing of the turn change is the timing for determining the gaze direction. Therefore, the process of step 484 described above is executed again, and a new gaze direction and a new gaze duration are determined using a gaze direction model and a gaze duration model selected according to robot 160's new role, the turn state = turn change period, and the personality of robot 160. Thereafter, control of robot 160's gaze according to the new parameters is started in step 486 of FIG. 24.

以下、視線制御システム３５０は上記した処理を繰り返す。なお、既に述べたとおり、直前に決定した視線継続時間が満了した時、又はターン交替のタイミングを０秒として－１秒、－０．３秒、及び０．３秒のときにロボット１６０の視線方向の決定が行われる。Thereafter, the gaze control system 350 repeats the above-mentioned process. As already mentioned, the gaze direction of the robot 160 is determined when the gaze duration determined immediately before expires, or when the timing of the turn change is -1 second, -0.3 second, and 0.3 second, assuming that the timing is 0 second.
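The timing rule above (re-decide when the current gaze duration expires, or at -1 s, -0.3 s, and 0.3 s relative to the turn change) could be checked once per control cycle as sketched below. The function name, the idea of passing the previous and current cycle times, and the use of `None` for "no turn change known" are all assumptions made for this illustration.

```python
TURN_OFFSETS = (-1.0, -0.3, 0.3)  # seconds relative to the turn change at 0 s


def is_decision_time(prev_t, now_t, gaze_expires_at, turn_change_t=None):
    """Return True if a new gaze direction should be chosen in (prev_t, now_t].

    All arguments are times in seconds on one clock; turn_change_t is None
    when no turn change is currently predicted or detected.
    """
    if now_t >= gaze_expires_at:  # the previous gaze duration has run out
        return True
    if turn_change_t is None:
        return False
    # Fire once for each offset that falls inside the last control interval.
    return any(prev_t < turn_change_t + off <= now_t for off in TURN_OFFSETS)
```

Testing the half-open interval between two consecutive cycles, rather than testing for equality with the offset, means each offset triggers exactly one decision regardless of the control period.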

なお、実際の３者対話では、視線逸らしのときには、一点を長時間見ているのではなく、途中で視線方向が変化することが観測された。この変化の回数は、視線逸らしの時間が長いほど多くなる傾向が見られた。そこで、視線を逸らした回数ごとの継続時間の分布の解析結果により、継続時間が０．７秒以下で１回、１．４秒以下で２回、２．１秒以下で３回、というように、０．７秒ごとに１回、視線方向を変化させる回数を増やすこととした。 In addition, in actual three-way dialogues, it was observed that when people looked away, they did not look at one point for a long time, but rather changed their gaze direction midway. The number of changes tended to increase the longer the gaze was averted. Therefore, based on the analysis of the distribution of duration for each number of gaze averts, it was decided to increase the number of gaze direction changes by once every 0.7 seconds, i.e. once when the duration was 0.7 seconds or less, twice when it was 1.4 seconds or less, and three times when it was 2.1 seconds or less.
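The 0.7-second rule amounts to rounding the aversion duration up to the next multiple of 0.7 s. A minimal sketch, with hypothetical names:

```python
import math

CHANGE_INTERVAL_S = 0.7  # one additional gaze shift per 0.7 s of aversion


def aversion_shift_count(duration_s):
    """Number of gaze-direction changes for an aversion of the given length."""
    # Rounding first keeps exact multiples such as 2.1 s from spilling into
    # the next bin because of floating-point error.
    bins = math.ceil(round(duration_s / CHANGE_INTERVAL_S, 9))
    return max(1, bins)
```

For example, durations of 0.7 s, 1.4 s, and 2.1 s give 1, 2, and 3 changes respectively, matching the bins described above.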

この場合にも、基本的には図１３から図１５に示す分布に従って視線方向を変化させた。ただし視線逸らしの間にさらに視線方向を変化させる場合、それまでの視線方向を中央とし、中央以外に視線方向を変化させることを前提に図１３から図１５に示す分布を適用した。また視線逸らしの間に２回以上視線方向を変化させる場合には、実際の観察結果として、視線を直前の方向に戻す傾向が見られたため、実装としても直前の視線方向に戻すようにした。 In this case, too, the gaze direction was basically changed according to the distribution shown in Figures 13 to 15. However, if the gaze direction was changed further while looking away, the previous gaze direction was set as the center, and the distribution shown in Figures 13 to 15 was applied on the premise that the gaze direction was changed to a location other than the center. Also, when the gaze direction was changed two or more times while looking away, the actual observations showed a tendency for the gaze to return to the previous direction, so the implementation was also set to return the gaze direction to the previous direction.

Part 3: Evaluation Experiments
1. Evaluation Experiment 1
An evaluation experiment was conducted on the robot 160 according to the above embodiment to examine how observers perceive changes in its gaze direction when no personality is specified.

A. Model
In this embodiment, the gaze pattern for each role and the movement of the eyes when averting the gaze are important elements of the dialogue. The evaluation experiment therefore compared ratings of the robot 160's movements under the following four conditions.

a. Baseline (equal-ratio head model)
As a baseline, a model was used in which the robot 160, when its role is speaker, directs its gaze toward the two interlocutors at the same rate. The proportions of time spent looking at the first interlocutor, the second interlocutor, and elsewhere were set to approximately 3:3:4. The location the robot looks at was obtained by applying a two-dimensional Gaussian distribution with a variance of 4 degrees around a person's face or around the point between the two interlocutors. When the robot is a listener, it faces the speaker so as not to create a sense of incongruity.

b. Comparison model (equal-ratio head-eye model)
Eye movement is important in controlling a robot's gaze. This model therefore extends the equal-ratio model above so that the eyes move together with the head.

c. Evaluation model 1 (head-eye model according to the embodiment)
This model implements gaze control and gaze aversion based on the above embodiment.

d. Evaluation model 2 (head model according to the embodiment)
This model is the same as evaluation model 1 except that eye movement is removed and only head movement is performed.

B. Impression Evaluation
The effect most hoped for from this embodiment is that the robot's behavior appears natural. The evaluators were therefore asked, "Overall, did you feel that the robot's behavior was natural?"

C. Experimental Setup
For this evaluation experiment, a video was created for each of the above models in which the robot was given the speech of one participant from an actually recorded three-person dialogue. This yielded four videos. The evaluators watched these videos and rated the robot's gaze movements. Each evaluator was shown videos like the one in FIG. 30.

男女３０人（平均年齢３２．１歳、分散１０．３）に対して４つの動画を視聴してもらい、それぞれに対して印象評価をしてもらった。図３０にこの実験において使用したビデオ画像７２０の１例を示す。実験はあくまでロボット１６０の視線方向の変化に関するものである。したがってこの評価実験では、発話は実際の人間による３者対話を使用し、その中の一人の発話をロボット１６０が行うようにした。その際には、ロボット１６０の視線方向をロボット１６０の役割、個性、ターン状態に従って、上記実施形態と同様に動くように制御した。実際の３者対話についてはラベル付がされているため、図１６に示す役割判定部３６０及びターン交替検出部３６２などについては不要で適切なタイミングでそうしたラベルに応じた情報を視線動作生成部３７０に与えるようにした。Thirty men and women (average age 32.1 years, variance 10.3) were asked to watch four videos and to evaluate their impressions of each. Figure 30 shows an example of the video image 720 used in this experiment. The experiment is solely concerned with changes in the gaze direction of the robot 160. Therefore, in this evaluation experiment, a three-way dialogue between real people was used as the speech, and one of the people was spoken by the robot 160. In this case, the gaze direction of the robot 160 was controlled to move in the same manner as in the above embodiment according to the role, personality, and turn state of the robot 160. Since the actual three-way dialogue is labeled, the role determination unit 360 and turn change detection unit 362 shown in Figure 16 are not necessary, and information corresponding to the labels is provided to the gaze action generation unit 370 at an appropriate timing.

ビデオ画像７２０は、図３０に示すように中央にロボット１６０の動画を配置し、左右に他の２人の話者を示す簡単なキャラクタ画像７３０及び７３２を配置した。キャラクタ画像７３０及び７３２は３者対話であることを示すためのものだが、例えばロボット１６０の右側（図３０の左側）に存在する話者が発話しているときにはキャラクタ画像７３０の周りに発話していることを示すサインを表示するようにした。ロボット１６０の左側（向かって右側）の話者の場合も同様である。図３０は向かって右側の話者が発話している状況を示す。As shown in Figure 30, video image 720 has a moving image of robot 160 placed in the center, with simple character images 730 and 732 showing the other two speakers on the left and right. Character images 730 and 732 are intended to indicate that this is a three-way dialogue, and when a speaker on the right side of robot 160 (left side of Figure 30) is speaking, for example, a sign indicating that the speaker is speaking is displayed around character image 730. The same applies to the speaker on the left side of robot 160 (right side as you face it). Figure 30 shows a situation where the speaker on the right side is speaking.

評価実験では、このような４本の動画を男女３０人に視聴してもらい、印象評価を収集した。なお、この動画の視聴時には、向かって右側の話者の発話は評価者の右耳に、左側の話者の発話は評価者の左耳に、ロボットの発話は評価者の両耳に、それぞれ聞こえるようにした。こうした形で、前述した４つのモデルについてそれぞれ個別に印象評価をしてもらった。In the evaluation experiment, 30 men and women were asked to watch these four videos and their impressions were collected. When watching the videos, the speaker on the right side was heard in the evaluator's right ear, the speaker on the left side in the evaluator's left ear, and the robot's speech in both ears. In this way, the evaluators were asked to individually evaluate the impressions of each of the four models mentioned above.

The experimental procedure was as follows. First, the presentation order of the four videos was randomized to reduce order effects. Next, the evaluators rated their impression of each of the four videos individually on a 7-point scale (1: very unnatural, 4: neither, 7: very natural). This constituted one set. The set was repeated for three dialogue segments (each about one minute long), so that impressions of 12 videos in total were rated, and the results were aggregated.

D. Evaluation Results
In two of the three dialogue segments, no significant difference was found between the conditions. In the remaining segment, a significant difference was found between the conditions, as shown in FIG. 31. FIG. 31 shows the ratings for the question "Overall, did you feel that the robot's behavior was natural?" For each model, the mean, the standard error, and the results of multiple comparisons based on Ryan's method were calculated.

Ryan's method showed significant differences between the baseline model and evaluation model 1 (p = 0.020, ≤ .05), between the comparison model and evaluation model 1 (p = 0.001, ≤ .05), and between evaluation model 2 and evaluation model 1 (p = 0.008, ≤ .05).

These results show that, among the four models, the robot's behavior under evaluation model 1, which is based on the embodiment, was the most natural.

2. Evaluation Experiment 2
In addition to the setup of evaluation experiment 1, an experiment was conducted in which personality was also taken into account in the robot's gaze control, in order to evaluate the impression of personality (extroversion) conveyed by gaze behavior.

A. Experimental Method
To evaluate gaze behavior, dialogue segments in which speaker A or speaker B participated were first extracted from the three-way dialogue data. The conversational speech of each speaker was then used to generate gaze behavior for the robot, and videos were created. In the experiment, participants rated gaze behavior generated under several conditions. To examine how gaze behavior changes the impression of expressed personality, the robot's gaze behavior was generated using a model created from speaker A's data (PA-M) and a model created from speaker B's data (PB-M). For comparison, a condition that reproduced the speaker's own gaze behavior (PA-R or PB-R) and a condition that presented only the speech with no movement (PA-S or PB-S) were also prepared.

For speaker A and speaker B, conversation segments of about 40 seconds each were extracted, and videos were recorded under the four conditions described above and used in the experiment. The content of the videos was the same as that shown in FIG. 30, and the way the speech was presented to the evaluators was also the same as in evaluation experiment 1. To make it clear that interlocutors were present on the left and right (diagonally in front), character images representing the interlocutors were drawn on both sides of the screen, as shown in FIG. 7(b), and radiating lines were drawn whenever an interlocutor spoke. The interlocutors' voices were played to the left and right ears respectively, and the voice of speaker A or speaker B was played to both ears. In the audio-only videos, the target speaker was likewise shown as a motionless character image instead of the robot.

実験参加者は、各対話区間において、音声のみの動画（Ｓ－Ｍ）をまず視聴し、その後、ロボットの視線動作が含まれる３つの動画を視聴した。順序効果を減らすために、３つの動画の提示順序をランダム化した。各動画を視聴した直後に７点スケール（１：非常に内向的、４：どちらともいえない、７：非常に外向的）で印象評価をしてもらった。なお、実験参加者には視線動作を制御した実験であることは伝えなかった。In each dialogue section, participants first watched an audio-only video (S-M), and then watched three videos that included the robot's gaze movements. To reduce order effects, the presentation order of the three videos was randomized. Immediately after watching each video, participants were asked to rate their impressions on a 7-point scale (1: very introverted, 4: neutral, 7: very extroverted). Participants were not informed that gaze movements were controlled in this experiment.

B. Experimental Results
Forty-one men and women (mean age 37.5 years, standard deviation 14.1 years) participated in the experiment. FIG. 32 shows the results of the impression evaluation.

a. Speaker A's voice
FIG. 32(A) shows that, when only the voice is heard (PA-S), the impression of extroversion is on the extroverted side (4 or higher). When gaze behavior is controlled through the robot, the gaze behavior generation model built from speaker A's data (PA-M) gives an impression of extroversion equivalent to that obtained when speaker A's gaze behavior is reproduced (PA-R), whereas the model built from the data of speaker B, who is on the introverted side (PB-M), lowers the impression of extroversion significantly.

b. Speaker B's voice
FIG. 32(B) shows that, when only the voice is heard (PB-S), the impression is on the introverted side (4 or lower). When gaze behavior is controlled through the robot, the gaze behavior generation model built from speaker B's data (PB-M) gives an impression of extroversion equivalent to that obtained when speaker B's gaze behavior is reproduced (PB-R), whereas the model built from the data of speaker A, who is on the extroverted side (PA-M), raises the impression of extroversion significantly.

C. Discussion
These results show that, by switching between gaze generation models of speakers with different personalities (degrees of extroversion), the impression of personality conveyed can be changed even when the voice is the same. The results also suggest that the impression of extroversion can be controlled continuously by interpolating or extrapolating the model parameters.
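One way the suggested interpolation or extrapolation might look for a gaze-direction model is sketched below. This is only an illustration of the idea: the description does not specify a blending scheme, so the linear blend, the clipping of negative values, the renormalization, and all names here are assumptions.

```python
def blend_models(introvert, extrovert, alpha):
    """Blend two gaze-direction distributions over the same targets.

    alpha = 0 returns the introvert model, alpha = 1 the extrovert model;
    values outside [0, 1] extrapolate. Negative results are clipped to zero
    and the distribution is renormalized so that it sums to 1.
    """
    raw = {
        target: max(0.0, (1.0 - alpha) * introvert[target] + alpha * extrovert[target])
        for target in introvert
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("blended distribution has no probability mass")
    return {target: value / total for target, value in raw.items()}
```

The same linear blend could in principle be applied to the duration parameters n, loc, and scale, although whether the resulting impression varies smoothly with alpha would have to be confirmed experimentally.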

Part 4: Modifications
In the above embodiment, the model stores the probability of directing the gaze in each direction, and when the gaze direction is actually determined, these probabilities are summed and the gaze direction is then sampled using a random number. However, the present invention is not limited to such an embodiment. For example, the model may store values in which the probabilities have been summed in advance (cumulative probabilities). In the above embodiment, the gaze is averted in one of nine directions. However, the present invention is not limited to such an embodiment, and the directions may be classified more finely.
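The cumulative-probability variant can be sketched as follows, assuming the thresholds are computed once when the model is stored and then searched at sampling time. The function names are hypothetical, and the binary search via `bisect` is a design choice made here, not something stated in the description.

```python
from bisect import bisect_right
from itertools import accumulate


def build_cumulative(probabilities):
    """Precompute cumulative thresholds once, when the model is stored."""
    return list(accumulate(probabilities))


def sample_index(p, cumulative):
    """Return the first index whose cumulative threshold exceeds p."""
    # bisect_right counts the thresholds <= p, which is the chosen index;
    # min() maps any p beyond the last threshold to the final entry.
    return min(bisect_right(cumulative, p), len(cumulative) - 1)
```

With the thresholds stored in advance, adding more finely classified directions only lengthens the list and leaves the sampling code unchanged.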

さらに上記実施形態では、ロボットなどのエージェントの対話上の役割と、ターン状態と、個性との組み合わせにより視線方向モデル及び視線継続モデルを準備し利用している。しかしこの発明はそのような実施形態には限定されない。これらの中のいずれかを抜いた形でモデルを準備してもよい。またこれらとは異なる基準を視線方向モデル選択のために用いてもよい。例えば年齢、教師と学生、親と子などの社会的立場を基準としてもよい。 Furthermore, in the above embodiment, a gaze direction model and gaze continuation model are prepared and used based on a combination of the dialogue role, turn state, and personality of an agent such as a robot. However, the present invention is not limited to such an embodiment. A model may be prepared that omits any of these. Also, criteria other than these may be used to select a gaze direction model. For example, age, or social status such as teacher and student, parent and child, etc. may be used as criteria.

上記実施形態は３者対話に関するものである。しかし、上記実施形態の説明からも明らかなとおり、各発話者が区別され、その役割が分類できるような状況であれば４人以上の対話でも本発明を適用できる。The above embodiment relates to a three-party dialogue. However, as is clear from the description of the above embodiment, the present invention can also be applied to a dialogue involving four or more people, provided that each speaker can be distinguished and their role can be classified.

The embodiments disclosed herein are merely illustrative, and the present invention is not limited to the embodiments described above. The scope of the present invention is defined by each of the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

50 Three-way dialogue
60, 62, 64 Participants
80, 82, 84 Gaze label sequences
100, 104 Utterances
102, 106 Turn change labels
108 Turn change
150 Conversational robot system
160 Robot
162 Operation control PC
164 Microphone
166 Audio processing PC
168 Speaker
170 Voice synthesis PC
172 Probability model storage device
174 Dialogue PC
176 Network
178 Human position sensor
180 Human position recognition PC
190 Voice recognition unit
192 Turn recognition unit
250 Computer system
270 Computer
272 Monitor
274 Keyboard
276 Mouse
278 DVD
284 USB memory
290 CPU
292 GPU
296 ROM
298 RAM
300 SSD
302 DVD drive
304 Audio I/F
306 USB port
308 Network I/F
310 Bus
350 Gaze control system
360 Role determination unit
362 Turn change detection unit
364 Gaze direction model
366 Model
368 Personality information storage unit
370 Gaze action generation unit
372 Gaze action control unit
400 Gaze direction model during speech
402 Gaze duration model
404, 406, 408 Gaze direction models during turn change
450, 452 Direction models

Claims

1. A gaze control device for controlling the gaze of an agent in a multi-person dialogue of three or more people including the agent, comprising:
a gaze direction setting means for determining, in response to a timing for determining the gaze direction of the agent, the gaze direction of the agent based on a combination of the roles of the participants in the multi-person dialogue and a state of a dialogue flow; and
a control parameter generating means for generating control parameters for controlling a face orientation and an eye direction of the agent in response to the gaze direction of the agent being determined by the gaze direction setting means,
wherein the state of the dialogue flow includes a turn-taking state, which starts a predetermined time before a turn-taking and continues until a predetermined condition is satisfied after the turn-taking, and an utterance state, which is any state other than the turn-taking state, and
wherein the gaze direction setting means detects the start and end of the turn-taking state, determines the timing by a different method depending on the state of the dialogue flow, and, at each timing, determines the gaze direction of the agent using a different probability model depending on the state of the dialogue flow and the role of the agent in that state.

2. The gaze control device according to claim 1, wherein the gaze direction setting means comprises:
a direction determination model storage means for storing a direction determination model that defines, for each role, the probability that the gaze of the agent turns to each of a plurality of predetermined directions, according to a combination of the roles of the participants and the state of the dialogue flow;
a probability distribution extracting means for detecting the start and end of the turn-taking state, determining the timing according to the role of the agent in the multi-person dialogue and whether the dialogue flow is in the turn-taking state, and, in response to the timing being reached, extracting from the direction determination model a probability distribution corresponding to the combination of the role of the agent and the state of the dialogue flow; and
a first sampling means for sampling the gaze direction of the agent from the probability distribution extracted by the probability distribution extracting means.

3. The gaze control device according to claim 2, wherein the plurality of directions of the direction determination model include the directions of the participants other than the agent and an averting direction different from the direction of any of the participants.

4. The gaze control device according to claim 3, wherein the gaze direction setting means further comprises:
a gaze aversion direction model storage means for storing a gaze aversion direction model comprising a probability model for probabilistically determining the direction of gaze aversion according to a combination of the role of the agent and the state of the dialogue flow; and
a second sampling means for sampling, from the gaze aversion direction model, the direction in which the agent averts its gaze, in response to the gaze direction sampled by the first sampling means being the averting direction.
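
The two-stage sampling recited in claims 2 to 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the role names, state names, direction labels, and every probability value below are hypothetical placeholders chosen for illustration, and the models are represented as plain dictionaries keyed by (role, state).

```python
import random

# Hypothetical direction determination model: for each (role, state) pair,
# a categorical distribution over gaze targets. All values are illustrative
# assumptions, not figures from the patent.
DIRECTION_MODEL = {
    ("speaker", "utterance"): {"listener_a": 0.5, "listener_b": 0.2, "avert": 0.3},
    ("speaker", "turn_taking"): {"listener_a": 0.7, "listener_b": 0.1, "avert": 0.2},
    ("listener", "utterance"): {"speaker": 0.8, "other_listener": 0.1, "avert": 0.1},
}

# Hypothetical gaze aversion direction model, consulted only when the first
# stage selects "avert".
AVERSION_MODEL = {
    ("speaker", "utterance"): {"up": 0.2, "down": 0.4, "left": 0.2, "right": 0.2},
    ("speaker", "turn_taking"): {"up": 0.1, "down": 0.5, "left": 0.2, "right": 0.2},
    ("listener", "utterance"): {"up": 0.1, "down": 0.6, "left": 0.15, "right": 0.15},
}


def sample_from(distribution, rng):
    """Draw one label from a {label: probability} mapping."""
    labels = list(distribution)
    weights = list(distribution.values())
    return rng.choices(labels, weights=weights, k=1)[0]


def sample_gaze(role, state, rng):
    """Return (target, aversion_direction); the latter is None unless averting."""
    # First sampling: pick a gaze target for this role and dialogue state.
    target = sample_from(DIRECTION_MODEL[(role, state)], rng)
    if target != "avert":
        return target, None
    # Second sampling: pick where to look when averting the gaze.
    return target, sample_from(AVERSION_MODEL[(role, state)], rng)
```

For example, `sample_gaze("speaker", "utterance", random.Random(0))` returns a pair whose second element is set only when the first is `"avert"`. Keeping the aversion model separate mirrors the claim structure, in which the second sampling means is triggered by the result of the first.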

5. The gaze control device according to claim 1, further comprising a duration calculation unit for calculating, in response to the timing being reached, a duration of the gaze of the agent according to a combination of the role of the agent, the state of the dialogue flow, and the gaze direction determined by the gaze direction setting means.

6. The gaze control device according to claim 5, wherein, when the dialogue flow is in the turn-taking state, the timing is a predetermined timing within the turn-taking state, and, when the dialogue flow is not in the turn-taking state, the timing is the expiration of the duration most recently calculated by the duration calculation unit.

7. The gaze control device according to claim 6, wherein the timing for determining the gaze direction in the turn-taking state is each of a plurality of predetermined timings after the start of the turn-taking state.
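
The state-dependent timing rule of claims 6 and 7 might be sketched as follows. This is an illustrative assumption, not the patent's implementation: the offsets in `TURN_TAKING_OFFSETS`, the state labels, and the function name `next_decision_time` are all hypothetical, and times are plain seconds.

```python
# Hypothetical offsets, in seconds after the turn-taking state starts, at
# which a new gaze direction is determined. The values are assumptions.
TURN_TAKING_OFFSETS = (0.0, 0.5, 1.0)


def next_decision_time(state, last_decision_time, state_start, last_duration):
    """Return the time of the next gaze decision, or None if none remains.

    state: "turn_taking" or "utterance" (hypothetical labels).
    last_decision_time: time of the most recent gaze decision.
    state_start: time at which the current turn-taking state began.
    last_duration: gaze duration calculated at the most recent decision.
    """
    if state == "turn_taking":
        # Decisions fall on predetermined timings after the state starts.
        for offset in TURN_TAKING_OFFSETS:
            candidate = state_start + offset
            if candidate > last_decision_time:
                return candidate
        return None
    # Utterance state: decide again when the previous gaze duration expires.
    return last_decision_time + last_duration
```

With these placeholder values, a decision made at 10.0 s with a 1.5 s duration in the utterance state is followed by one at 11.5 s, whereas in a turn-taking state that began at 20.0 s the decisions fall at 20.0 s, 20.5 s, and 21.0 s regardless of the calculated duration.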

8. The gaze control device according to claim 1, wherein the gaze direction setting means comprises:
a direction determination model storage means for storing a direction determination model that defines, for each role, the probability that the gaze of the agent turns to each of a plurality of predetermined directions, according to a combination of the roles of the participants, the state of the dialogue flow, and a personality assumed for the agent;
a probability distribution extracting means for detecting the start and end of the turn-taking state, determining the timing according to the role of the agent in the dialogue and the state of the dialogue flow, and, in response to the timing being reached, extracting from the direction determination model a probability distribution corresponding to the combination of the role of the agent, the state of the dialogue flow, and the personality; and
a first sampling means for sampling the gaze direction of the agent from the probability distribution extracted by the probability distribution extracting means.

9. The gaze control device according to claim 8, wherein the plurality of directions of the direction determination model include the directions of the participants other than the agent and an averting direction different from the direction of any of the participants.

10. The gaze control device according to claim 9, wherein the gaze direction setting means further comprises:
a gaze aversion direction model storage means for storing a gaze aversion direction model comprising a probability model for probabilistically determining the direction of gaze aversion of the agent according to a combination of the role of the agent, the state of the dialogue flow, and the personality; and
a second sampling means for sampling, from the gaze aversion direction model, the direction in which the agent averts its gaze, in response to the gaze direction sampled by the first sampling means being the averting direction.

11. The gaze control device according to claim 8, further comprising a duration calculation unit for calculating, in response to the timing being reached, a duration of the gaze of the agent according to a combination of the role of the agent, the state of the dialogue flow, the personality, and the gaze direction determined by the gaze direction setting means.

12. The gaze control device according to claim 11, wherein, when the dialogue flow is in the turn-taking state, the timing for determining the gaze direction is a predetermined timing within the turn-taking state, and, when the dialogue flow is not in the turn-taking state, the timing is the expiration of the duration most recently calculated by the duration calculation unit.

13. The gaze control device according to claim 12, wherein the timing for determining the gaze direction in the turn-taking state is each of a plurality of predetermined timings after the start of the turn-taking state.

14. The gaze control device according to any one of claims 1 to 4 and claims 8 to 10, wherein the agent has an eyeball whose gaze direction can be controlled and a speaking function, and is selected from the group consisting of a robot, a three-dimensional virtual agent represented as a three-dimensional image in a virtual space, and a two-dimensional virtual agent represented as a two-dimensional image.

15. A gaze control method implemented by a computer for controlling the gaze of an agent in a multi-person dialogue of three or more people including the agent, comprising:
a step in which the computer determines, in response to a timing for determining the gaze direction, the gaze direction of the agent based on a combination of the roles of the participants in the multi-person dialogue and a state of a dialogue flow; and
a step in which the computer generates control parameters for controlling a face orientation and an eye direction of the agent in response to the gaze direction of the agent being determined in the step of determining the gaze direction,
wherein the state of the dialogue flow includes a turn-taking state, which starts a predetermined time before a turn-taking and continues until a predetermined condition is satisfied after the turn-taking, and an utterance state, which is any state other than the turn-taking state, and
wherein the step of determining the gaze direction includes detecting the start and end of the turn-taking state, determining the timing by a different method depending on the state of the dialogue flow, and, at each timing, determining the gaze direction of the agent using a different probability model depending on the state of the dialogue flow and the role of the agent in that state.

16. A computer program for controlling the gaze of an agent in a multi-person dialogue of three or more people including the agent, the computer program causing a computer to function as:
a gaze direction setting means for determining, in response to a timing for determining the gaze direction of the agent, the gaze direction of the agent based on a combination of the roles of the participants in the multi-person dialogue and a state of a dialogue flow; and
a control parameter generating means for generating control parameters for controlling a face orientation and an eye direction of the agent in response to the gaze direction of the agent being determined by the gaze direction setting means,
wherein the state of the dialogue flow includes a turn-taking state, which starts a predetermined time before a turn-taking and continues until a predetermined condition is satisfied after the turn-taking, and an utterance state, which is any state other than the turn-taking state, and
wherein the gaze direction setting means detects the start and end of the turn-taking state, determines the timing by a different method depending on the state of the dialogue flow, and, at each timing, determines the gaze direction of the agent using a different probability model depending on the state of the dialogue flow and the role of the agent in that state.

17. A gaze control device for controlling the gaze of an agent in a multi-person dialogue of three or more people including the agent, the gaze control device being realized by a computer having a processor,
wherein the processor is configured to:
determine, in response to a timing for determining the gaze direction of the agent, the gaze direction of the agent based on a combination of the roles of the participants in the multi-person dialogue and a state of a dialogue flow in the multi-person dialogue; and
generate control parameters for controlling a face orientation and an eye direction of the agent in response to the gaze direction of the agent being determined,
wherein the state of the dialogue flow includes a turn-taking state, which starts a predetermined time before a turn-taking and continues until a predetermined condition is satisfied after the turn-taking, and an utterance state, which is any state other than the turn-taking state, and
wherein the processor is further configured to detect the start and end of the turn-taking state, determine the timing by a different method depending on the state of the dialogue flow, and, at each timing, determine the gaze direction of the agent using a different probability model depending on the state of the dialogue flow and the role of the agent in that state.

18. A non-transitory computer-readable storage medium storing a computer program for causing a computer to function as the gaze control device according to claim 17.