JP6182464B2

JP6182464B2 - Image display system and image display method

Info

Publication number: JP6182464B2
Application number: JP2014012588A
Authority: JP
Inventors: 喜智大野; 貴司折目
Original assignee: Daiwa House Industry Co Ltd
Current assignee: Daiwa House Industry Co Ltd
Priority date: 2014-01-27
Filing date: 2014-01-27
Publication date: 2017-08-16
Anticipated expiration: 2034-01-27
Also published as: JP2015142168A

Description

The present invention relates to an image display system and an image display method for displaying an image of a speaker to a conversation partner, and more particularly to an image display system and an image display method capable of realizing a more realistic dialogue.

Communication technologies that allow people in remote locations to converse while viewing each other's video, such as teleconference systems, are already well known. In recent years, in order to realize a dialogue with a sense of presence, a technique has been developed that intentionally changes the line of sight of an interlocutor in the interlocutor's image displayed on a display screen (see, for example, Patent Document 1).

In the image display system described in Patent Document 1, while interlocutors converse while viewing each other's video, the system detects the line of sight of one interlocutor A as A views the video of the other interlocutor B on a display. Based on the detection result, the system reconstructs the video of interlocutor A that interlocutor B views on a display by changing the shape and position of the pupils in that video. With this configuration, when interlocutor A converses with a plurality of interlocutors B, for example, each interlocutor B can perceive that A's line of sight is directed toward a specific one of them.

Patent Document 1: JP 2012-70081 A

Further improvement in the sense of presence is demanded for dialogues that interlocutors conduct while viewing each other's images. Meeting this demand requires the development of a system and a method that apply a technique for changing the line of sight in an interlocutor's video, as in the image display system described in Patent Document 1, and that make the interlocutors feel as if they were conversing face to face.

The present invention has been made in view of the above problem. Its object is to provide, as an image display system and method for displaying an image of a speaker to a conversation partner, a system and a method capable of realizing a more realistic dialogue.

According to the image display system of the present invention, the above problem is solved by an image display system for displaying an image of a speaker to a conversation partner, the system comprising: a data acquisition unit that is provided on the speaker side and acquires data indicating video captured of the speaker; a display data generation unit that generates display data for displaying the image corrected from the video; an image display unit that is provided on the conversation partner side and displays the image by expanding the display data; and a detection unit that is provided on the conversation partner side and detects the conversation partner when the conversation partner performs an action satisfying a preset condition, wherein, when the detection unit detects the conversation partner who performed the action, the display data generation unit executes a first process of generating the display data for displaying the image corrected so that the speaker's line of sight is directed to the position of the conversation partner who performed the action, and, when the detection unit does not detect the conversation partner who performed the action for a predetermined time or longer, the display data generation unit executes a second process of generating the display data for displaying the image corrected so that the speaker's line of sight is directed to a preset position.

As described above, in the image display system of the present invention, when the conversation partner performs a predetermined action, the detection unit detects it. In conjunction with this detection, the display data generation unit generates display data for an image corrected so that the speaker's line of sight is directed to the position of the conversation partner who performed the action. Thus, when the speaker's image is displayed on the conversation partner side, the speaker's line of sight in the image is directed toward the conversation partner. This change in the line of sight gives the conversation partner the illusion that the speaker has reacted to the action, and as a result the conversation partner can feel as if conversing face to face with the speaker.
Further, according to the above configuration, during a non-detection period in which no conversation partner performing the predetermined action is detected, the speaker's line of sight in the speaker's image displayed to the conversation partner is directed to a preset position. This makes it possible to set the speaker's line of sight suitably during the non-detection period.
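The switching between the first process and the second process can be illustrated with a minimal sketch. It is not the patented implementation: the class name `GazeSelector`, the 2-D `Position` coordinates, and the timeout value are assumptions introduced for illustration. The text does not say where the line of sight points between the last detection and the end of the predetermined time, so the sketch assumes the last detected position is held.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical 2-D coordinates of a point in the partner-side room.
Position = Tuple[float, float]


@dataclass
class GazeSelector:
    """Chooses the point toward which the speaker's corrected gaze is directed."""

    preset_position: Position  # target used by the second process
    timeout_s: float           # the "predetermined time" (value is an assumption)
    last_detection_time: Optional[float] = None
    last_detected_position: Optional[Position] = None

    def update(self, now: float, detected_position: Optional[Position]) -> Position:
        """Return the gaze target for the current frame.

        detected_position is the position of a partner who performed the
        qualifying action, or None when the detection unit reports nothing.
        """
        if detected_position is not None:
            # First process: look at the partner who performed the action.
            self.last_detection_time = now
            self.last_detected_position = detected_position
            return detected_position
        if (
            self.last_detection_time is not None
            and self.last_detected_position is not None
            and now - self.last_detection_time < self.timeout_s
        ):
            # Assumption: hold the previous target until the timeout elapses.
            return self.last_detected_position
        # Second process: no detection for the predetermined time or longer.
        return self.preset_position
```

Calling `update` once per frame with the latest detection result yields the target that the gaze correction would then aim at.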

In the above image display system, when the detection unit detects the conversation partner who performed the action, the display data generation unit preferably generates the display data for displaying the image corrected so that both the speaker's line of sight and the speaker's face are directed to the position of the conversation partner who performed the action.
According to the above configuration, in the speaker's image displayed to the conversation partner, the speaker's line of sight and the speaker's face are directed to the position of the conversation partner who performed the predetermined action, so that a still more realistic dialogue can be realized. That is, because not only the speaker's line of sight but also the speaker's face is directed toward the conversation partner who performed the action, the degree to which that conversation partner feels as if conversing face to face with the speaker (the face-to-face quality) is further increased.

In particular, in a case where there are a plurality of conversation partners, when the detection unit does not detect the conversation partner who performed the action for the predetermined time or longer, the display data generation unit preferably executes the second process repeatedly so that the conversation partner located at the position toward which the speaker's line of sight is directed switches sequentially among the plurality of conversation partners.
According to the above configuration, during the non-detection period, the display data generation unit repeatedly executes the second process so that the conversation partner located at the position toward which the speaker's line of sight is directed switches sequentially. That is, in the speaker's image displayed to the conversation partners, the speaker's line of sight moves so that the conversation partner it rests on changes in turn. As a result, each conversation partner can feel as if the speaker were glancing at them in turn, which makes the image display during the dialogue more engaging.
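The sequential switching of the gaze target among several conversation partners can be sketched as a simple round-robin. This is an illustrative assumption, not the method disclosed in the patent: the function name `cycling_gaze_targets`, the coordinate representation, and the fixed visiting order are all hypothetical, and the text does not specify how long the gaze rests on each partner.

```python
from itertools import cycle
from typing import Iterator, Sequence, Tuple

# Hypothetical 2-D coordinates of each conversation partner.
Position = Tuple[float, float]


def cycling_gaze_targets(partner_positions: Sequence[Position]) -> Iterator[Position]:
    """Return an endless iterator that visits each partner position in turn.

    Each value drawn from the iterator is the preset position handed to one
    execution of the second process during the non-detection period.
    """
    if not partner_positions:
        raise ValueError("at least one conversation partner position is required")
    return cycle(partner_positions)
```

Drawing one target each time the second process runs would move the speaker's gaze from one partner to the next and then back to the first.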

The above image display system preferably further comprises: a display screen forming unit that constitutes a part of a building material, a piece of furniture, or a decorative article arranged in the building where the conversation partner is located, and that forms a display screen for the image; and a sensor that is provided in the building and that, taking at least one of an action performed by the conversation partner, the position of the conversation partner, the posture of the conversation partner, and a sound emitted by the conversation partner as a detection target, detects the detection target when it satisfies a preset second condition, wherein the display screen forming unit does not form the display screen and exhibits its appearance as the part during a period in which the sensor does not detect the detection target satisfying the second condition, and forms the display screen only during a period in which the sensor detects the detection target satisfying the second condition.
According to the above configuration, the display screen for the image is formed with the detection, on the conversation partner side, of at least one of that person's action, position, posture, and sound as a trigger. During a period in which the triggering detection target is not detected, the display screen is not formed; instead, the display screen forming unit exhibits its appearance as a part of the building material, furniture, or decorative article arranged in the building where the conversation partner is located. As a result, the display screen forming unit functions as a building material or the like during periods in which no dialogue with the speaker takes place and is inconspicuous in the building, so that its presence is hard to notice during those periods. On the other hand, as described above, when the detection target is detected on the conversation partner side, the display screen is formed with this detection as a trigger, so no particularly complicated operation is required to form the display screen.

Further, according to the image display method of the present invention, the above problem is solved by an image display method for displaying an image of a speaker to a conversation partner, the method comprising: acquiring, by a data acquisition unit provided on the speaker side, data indicating video captured of the speaker; generating, by a display data generation unit, display data for displaying the image corrected from the video; displaying the image, by an image display unit provided on the conversation partner side, by expanding the display data; and detecting the conversation partner, by a detection unit provided on the conversation partner side, when the conversation partner performs an action satisfying a preset condition, wherein, when the detection unit detects the conversation partner who performed the action, the display data generation unit executes a first process of generating the display data for displaying the image corrected so that the speaker's line of sight is directed to the position of the conversation partner who performed the action, and, when the detection unit does not detect the conversation partner who performed the action for a predetermined time or longer, the display data generation unit executes a second process of generating the display data for displaying the image corrected so that the speaker's line of sight is directed to a preset position.

According to the image display system and the image display method of the present invention, when the conversation partner performs a predetermined action, the speaker's image displayed to the conversation partner is corrected so that the speaker's line of sight is directed to the position of the conversation partner who performed the action. This makes it possible to realize a more realistic dialogue. That is, according to the present invention, when a speaker and a conversation partner in mutually distant places converse while viewing each other's video, the conversation partner can obtain an audiovisual effect as if conversing face to face with the speaker.
Further, during a non-detection period in which no conversation partner performing the predetermined action is detected, the speaker's line of sight in the speaker's image displayed to the conversation partner is directed to a preset position. This makes it possible to set the speaker's line of sight suitably during the non-detection period.

A diagram showing the speaker's image displayed on the conversation partner side.
A diagram showing the conversation partners' image displayed on the speaker side.
A conceptual diagram showing an image display system according to an embodiment of the present invention.
A block diagram showing the configuration of the image display system according to the embodiment of the present invention.
(A) and (B) of FIG. 4 are diagrams showing an example of the display screen forming unit of the present invention.
A diagram showing the functional configuration of each of the speaker-side server and the partner-side server.
A diagram showing the flow of the first process of the present invention.
An explanatory diagram on identifying the position of the person who performed the action.
An explanatory diagram on the decomposition of the speaker video.
An explanatory diagram on line-of-sight editing.
A diagram showing the corrected speaker image being displayed.
An explanatory diagram on face-orientation editing.
A diagram showing the re-corrected speaker image being displayed.
A diagram showing the flow of the second process of the present invention.
(A), (B), and (C) of FIG. 14 are diagrams showing the speaker image being displayed based on the display data generated by the second process.
A diagram showing the flow of data processing executed by the image display system according to the embodiment of the present invention (part 1).
A diagram showing the flow of data processing executed by the image display system according to the embodiment of the present invention (part 2).
A diagram showing the procedure for displaying the speaker image.

Hereinafter, an image display system and an image display method according to an embodiment of the present invention (hereinafter, the present embodiment) will be described with reference to the drawings. To make the description easy to follow, the following uses as a concrete example a case in which the speaker is Mr. A and the conversation partners are Mr. B, Mr. C, and Mr. D. Here, the "speaker" is a person who starts a dialogue (conversation) on his or her own initiative and speaks to the conversation partner. A "conversation partner", in contrast, is a listener of the speaker's talk who converses in response to it.

The following description also assumes that Mr. A, the speaker, is in a given building (for example, Mr. A's home) during the dialogue, and that Mr. B, Mr. C, and Mr. D, the conversation partners, are in a place different from Mr. A's (for example, a building other than Mr. A's home), with all three gathered in the same place to converse with Mr. A.

<< Configuration of the Image Display System According to the Present Embodiment >>
The image display system according to the present embodiment (the present system S) is used so that Mr. A, the speaker, and Mr. B, Mr. C, and Mr. D, the conversation partners, can converse while viewing each other's images. That is, by using the present system S, Mr. B, Mr. C, and Mr. D can converse while viewing Mr. A's image as shown in FIG. 1A. Similarly, Mr. A can converse while viewing the images of Mr. B, Mr. C, and Mr. D as shown in FIG. 1B. Here, FIG. 1A is a diagram showing the speaker's image displayed on the conversation partner side, and FIG. 1B is a diagram showing the conversation partners' image displayed on the speaker side.

The images of the speaker and of the conversation partners displayed in the present embodiment are described in more detail below with reference to FIGS. 1A and 1B. As shown in FIG. 1A, the image of Mr. A, the speaker, is shown on a display installed in the building where Mr. B and the others are located, and displays Mr. A's whole body and the surrounding space. Similarly, the image of Mr. B, Mr. C, and Mr. D, the conversation partners, is shown on a display installed in Mr. A's home, and displays the whole bodies of all three and their surrounding space.

As described above, the speaker and the conversation partners can converse from mutually distant places while viewing each other's whole-body images and surrounding spaces. Both sides can therefore converse while feeling as if they were in the same room, so that a realistic dialogue is realized. The "whole-body image" is the appearance of the entire body from head to feet; the person may be standing or seated, and the concept includes an appearance in which part of the body is hidden by an object placed in front.

In the present system S, in order to realize a realistic dialogue, communication units 1 and 2 are provided on Mr. A's side and on the side of Mr. B and the others, respectively, as shown in FIG. 2. FIG. 2 is a conceptual diagram of the present system S. As shown in the figure, the communication units 1 and 2 can exchange data with each other through a communication line 3 such as the Internet. Although not shown in FIG. 2, a relay server (proxy server) is generally interposed between the communication units 1 and 2. That is, data transmitted and received between the communication units 1 and 2 normally passes through this relay server.

The configuration of the communication units 1 and 2 is described below. First, the communication unit 1 provided on the speaker side (Mr. A's side) is composed of a server computer provided on the speaker side (hereinafter, the speaker-side server) 10A, audiovisual equipment, and the like, as shown in FIG. 3. FIG. 3 is a block diagram showing the configuration of the present system S including the communication units 1 and 2. The audiovisual equipment comprises a sound collecting device 21, an imaging device 22, an audio reproducing device 24, and a display (strictly speaking, a mirror that doubles as a display, referred to as the display mirror 25 and described later). The communication unit 1 according to the present embodiment further includes a human presence sensor 23 as a sensor for detecting a speaker who is in front of the display.

The speaker-side server 10A is the device at the core of the communication unit 1 and, as shown in FIG. 3, has a CPU 11, a memory 12 composed of ROM and RAM, a hard disk drive 13 (denoted HDD in FIG. 3), a communication interface 14 (denoted communication I/F in FIG. 3), and an I/O port 15. The speaker-side server 10A receives data transmitted from an external device connected to the communication line 3 (for example, a partner-side server 10B described later) and stores the data in the memory 12 or the hard disk drive 13. The memory 12 also stores a program (hereinafter, the dialogue program) that defines the series of data processing executed when Mr. A, the speaker, converses with Mr. B and the others. When the CPU 11 reads out and executes this dialogue program, Mr. A's video and audio are sent to the side of Mr. B and the others, and the images and audio of Mr. B and the others are displayed and reproduced on Mr. A's side.

The sound collecting device 21 collects Mr. A's voice and the sounds Mr. A makes, and is a known device such as a microphone. The sound collecting device 21 outputs an audio signal representing the collected sound, and the audio signal is input to the I/O port 15 provided in the speaker-side server 10A.

The imaging device 22 captures Mr. A's figure and the surrounding space, and is a known device such as a video camera. The imaging device 22 outputs a video signal representing the captured video, and the video signal is input to the I/O port 15 provided in the speaker-side server 10A.

When a person is present in its detection area, the human presence sensor 23 detects that person's position and outputs a signal indicating the detection result to the speaker-side server 10A. More specifically, the human presence sensor 23 according to the present embodiment has a known structure and detects the position of the speaker (Mr. A) when the speaker is in front of the display. That is, the human presence sensor 23 according to the present embodiment takes the position of a person as its detection target and detects the detection target when the position satisfies a preset condition.

The human presence sensor is not limited to one that detects a person's position. Any sensor may be used that takes at least one of a person's action, a person's posture, and a sound emitted by a person as its detection target and detects the detection target when it satisfies a preset condition. For example, a sound sensor that responds to human footsteps or voices may be used to detect a sound when it reaches or exceeds a predetermined volume. For such a configuration with sound as the detection target, a known configuration can be used, such as those described in JP 2013-73505 A and JP 2005-78347 A.
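The volume-threshold variant can be sketched as follows. This is a minimal illustration, not the configuration of the cited publications: the function names, the use of an RMS level in decibels relative to full scale, and the default threshold of -30 dB are all assumptions chosen for the example.

```python
import math
from typing import Sequence


def rms_level_db(samples: Sequence[float]) -> float:
    """Return the RMS level of audio samples in dB relative to full scale.

    Samples are assumed to be normalized to the range [-1.0, 1.0].
    Silence or an empty buffer yields negative infinity.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def sound_detected(samples: Sequence[float], threshold_db: float = -30.0) -> bool:
    """Report a detection when the level reaches or exceeds the threshold.

    The threshold value is a hypothetical stand-in for the "predetermined
    volume" mentioned in the text.
    """
    return rms_level_db(samples) >= threshold_db
```

A sensor loop would call `sound_detected` on each short buffer of microphone samples and pass the result to the server as the detection signal.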

The audio reproducing device 24 reproduces audio and is a known device such as a loudspeaker. The audio reproducing device 24 accepts a reproduction command that is output when the speaker-side server 10A expands data representing the conversation partner's voice. As a result, the conversation partner's voice is reproduced by the audio reproducing device 24.

The display is a device that forms a display screen for displaying the image of Mr. B and the others (hereinafter, the partner image), and corresponds to the display screen forming unit. The display accepts a display command that is output when the speaker-side server 10A expands the data for displaying the partner image. As a result, the partner image is displayed on the display screen of the display.

The display according to the present embodiment normally functions as a decorative article placed in the building (home) where Mr. A is located, specifically as a full-length mirror, and forms a display screen only when a dialogue takes place. The display according to the present embodiment is described below with reference to FIGS. 3 and 4. FIG. 4 is a diagram showing the display according to the present embodiment; (A) of the figure shows the state during non-dialogue periods, when no dialogue is taking place, and (B) of the figure shows the state during a dialogue.

As described above, the display according to the present embodiment constitutes a part of a full-length mirror placed in Mr. A's home, specifically its mirror surface portion. As shown in (A) of FIG. 4, during non-dialogue periods it does not form a display screen and exhibits its appearance as the mirror surface portion. During a dialogue, on the other hand, the display forms a display screen as shown in (B) of FIG. 4, and the partner image is displayed on that display screen.

As described above, the display according to the present embodiment is constituted by the display mirror 25, and the display screen can be freely switched between being formed and being erased. More specifically, the display mirror 25 incorporates a control circuit 26 and a light emitting unit 27, as shown in FIG. 3. When the control circuit 26 receives a display screen formation command output from the speaker-side server 10A and turns on the light emitting unit 27 in accordance with the command, a display screen is formed on the mirror surface portion of the display mirror 25.

During a period in which no display screen formation command is output, on the other hand, the control circuit 26 keeps the light emitting unit 27 turned off, so that the mirror surface portion of the display mirror 25 exhibits its original appearance. In this way, the display mirror 25, which forms a display screen during a dialogue, functions as a full-length mirror during non-dialogue periods. It is therefore inconspicuous in the interlocutor's home, and as a result the presence of the display screen is hard to notice. For the configuration of the display mirror 25, a known configuration can be used, such as the one described in International Publication No. 2009/122716.
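The behavior of the control circuit 26 and the light emitting unit 27 in the mirror that doubles as a display can be modeled with a small sketch. The class and method names are hypothetical, and the sketch only mirrors the on/off logic described in the text; it says nothing about the actual circuit or about the structure in the cited international publication.

```python
class DisplayMirrorModel:
    """Toy model of the mirror that doubles as a display (reference numeral 25)."""

    def __init__(self) -> None:
        # The light emitting unit (27) starts turned off: plain mirror surface.
        self.light_on = False

    def apply_command(self, formation_command_present: bool) -> str:
        """Mimic the control circuit (26) for one control cycle.

        The light emitting unit is lit only while a display screen formation
        command is being received; otherwise it is kept off.
        """
        self.light_on = formation_command_present
        return "display screen" if self.light_on else "mirror surface"
```

Feeding the model the command state on each cycle shows the surface alternating between its mirror appearance and a formed display screen.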

Incidentally, the display that makes the display screen hard to notice at normal times (when no dialogue is taking place) is not limited to the display-cum-mirror 25 and need not be a full-length mirror. That is, the display forming the display screen for the partner image may be anything that constitutes a part of a building material, a piece of furniture, or a decorative article placed in a building and that can switch freely between forming and erasing the display screen. For example, a building material forming a door or an interior wall of the building, or a part of a piece of furniture such as a chest of drawers, may be used as the display.

Next, the communication unit 2 provided on the conversation partner side (the side of Mr. B and the others) will be described. As shown in FIG. 3, it is composed of a server computer provided on the conversation partner side (hereinafter, the partner-side server) 10B, audiovisual equipment, and the like. The audiovisual equipment comprises the same devices as the speaker-side communication unit 1, namely a sound collection device 21, an imaging device 22, an audio reproduction device 24, and a display (strictly speaking, a display-cum-mirror 25). Since these are the same as the devices provided in the speaker-side communication unit 1, their description is omitted.

Further, in the present embodiment, the communication unit 2 on the conversation partner side is also provided with a human presence sensor 23, as is the speaker-side communication unit 1. The human presence sensor 23 on the conversation partner side is an example of a sensor and detects the position of a conversation partner (Mr. B, Mr. C, or Mr. D) who is in front of the display. In other words, it takes the position of a person as its detection target and detects that target when the position satisfies a preset condition (corresponding to the second condition; specifically, the condition that the person is located in front of the display).

Note that the human presence sensor 23 on the conversation partner side is likewise not limited to one that detects a person's position. It may be any sensor that takes at least one of a person's motion, a person's posture, and a sound made by the person as its detection target and detects that target when it satisfies a preset condition.

The partner-side server 10B has substantially the same configuration as the speaker-side server 10A, and a dialogue program is stored in its memory 12. When the CPU 11 reads and executes the dialogue program, the video and audio of Mr. B and the others are sent to Mr. A's side, and Mr. A's image and audio are displayed and reproduced on the side of Mr. B and the others.

The partner-side server 10B has a special function in addition to the functions of the speaker-side server 10A. When it receives video data representing the speaker's video from the speaker-side server 10A, the partner-side server 10B generates, based on that video data, display data for displaying an image of the speaker (hereinafter, the speaker image). When this display data is expanded, the speaker image is displayed on the display provided on the side of Mr. B and the others.

In the present embodiment, when generating the display data, the partner-side server 10B can generate display data for displaying an image corrected from the speaker's video represented by the video data received from the speaker-side server 10A. That is, the speaker image displayed to the conversation partner can be an image obtained by correcting the video captured of the speaker (put simply, an image that differs from the speaker's actual captured video).

To explain the correction of the speaker image more concretely, assume a case in which Mr. B, Mr. C, or Mr. D shows a predetermined reaction (for example, speaking or laughing) to what Mr. A says. In this case, the partner-side server 10B detects the reaction and identifies the position of the person who performed it (hereinafter, the action performer). Based on the identified position, the partner-side server 10B then generates display data for displaying a speaker image corrected so that Mr. A's line of sight is directed toward the position of the action performer.

When this display data is expanded, a speaker image in which Mr. A's line of sight is directed toward the action performer (hereinafter, the gaze-aligned image) is displayed on the side of Mr. B and the others, as shown in FIG. 10. Seeing Mr. A's line of sight in the gaze-aligned image, the action performer has the illusion that Mr. A has reacted to his or her action and feels as if conversing with Mr. A face to face.

In the present embodiment, after the gaze-aligned image is displayed, the partner-side server 10B further generates display data for displaying a speaker image corrected so that both Mr. A's line of sight and face are directed toward the position of the action performer. When this display data is expanded, a speaker image in which Mr. A's line of sight and face are directed toward the action performer (hereinafter, the secondary gaze-aligned image) is displayed on the side of Mr. B and the others, as shown in FIG. 12. Displaying the secondary gaze-aligned image makes the dialogue still more realistic. Put simply, because not only Mr. A's line of sight but also Mr. A's face is directed toward the action performer, the degree to which the action performer feels that he or she is conversing with Mr. A face to face, that is, the face-to-face quality, is further increased.

<< Configuration of the speaker-side server and the partner-side server >>
The configuration of each of the speaker-side server 10A and the partner-side server 10B, in particular the hardware configuration, has already been described. Below, the configuration of each server is described again from the functional standpoint with reference to FIG. 5. FIG. 5 is a diagram showing the configuration of each of the speaker-side server 10A and the partner-side server 10B in functional terms.

In functional terms, the speaker-side server 10A comprises a data acquisition unit 31, a data transmission unit 32, a data reception unit 33, a partner image display unit 34, and a partner audio reproduction unit 35, as shown in FIG. 5. These units are responsible for the data processing executed by the speaker-side server 10A; that is, they correspond to data processing units. Each of the five data processing units is realized by hardware devices such as the CPU 11, the memory 12, the hard disk drive 13, the communication interface 14, and the I/O port 15 cooperating with the dialogue program as software. Each data processing unit is described individually below.

(Data acquisition unit 31)
The data acquisition unit 31 acquires audio data and video data by digitizing and then encoding the signals, specifically the audio signal and the video signal, that the speaker-side server 10A receives from the sound collection device 21 and the imaging device 22 via the I/O port 15. Here, the audio data is data representing Mr. A's voice (speech) collected by the sound collection device 21. The video data is data representing the actual video captured of Mr. A; in the present embodiment it shows a full-body image of Mr. A in front of the display-cum-mirror 25 together with the surrounding space.

In the present embodiment, when the human presence sensor 23 on Mr. A's side detects the position of a person within its detection area, this triggers activation of the sound collection device 21 and the imaging device 22, and sound collection and video capture begin. In conjunction with this, the data acquisition unit 31 starts acquiring the two kinds of data.

(Data transmission unit 32)
When the data acquisition unit 31 acquires audio data and video data, the data transmission unit 32 transmits them to the partner-side server 10B as they are acquired. In the present embodiment, the data transmission unit 32 multiplexes the audio data and the video data and transmits them as a single piece of data (hereinafter, dialogue data).
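
The multiplexing into dialogue data, and the later extraction of the video or audio part from it, can be illustrated with a minimal sketch. The description does not specify a container format, so the length-prefixed layout below and the names `mux` and `demux` are assumptions made purely for illustration.

```python
import struct
from typing import Tuple

# Hypothetical container layout (not taken from the description):
#   4-byte big-endian audio length, 4-byte big-endian video length,
#   followed by the audio bytes and then the video bytes.
_HEADER = struct.Struct(">II")


def mux(audio: bytes, video: bytes) -> bytes:
    """Combine encoded audio and video into one piece of dialogue data."""
    return _HEADER.pack(len(audio), len(video)) + audio + video


def demux(dialogue: bytes) -> Tuple[bytes, bytes]:
    """Split dialogue data back into its audio and video parts."""
    audio_len, video_len = _HEADER.unpack_from(dialogue, 0)
    audio_start = _HEADER.size
    video_start = audio_start + audio_len
    audio = dialogue[audio_start:video_start]
    video = dialogue[video_start:video_start + video_len]
    return audio, video
```

A receiving unit that needs only one stream would call `demux` and keep the corresponding element of the returned pair.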

(Data reception unit 33)
The data reception unit 33 receives, through the communication line 3, the dialogue data transmitted from the partner-side server 10B. The received dialogue data is stored in a predetermined area of the memory 12 of the speaker-side server 10A or on the hard disk drive 13.

(Partner image display unit 34)
The partner image display unit 34 reads the dialogue data stored in the memory 12 or on the hard disk drive 13, extracts the video data from it, decodes and expands the video data, and outputs a display command to the display on Mr. A's side. On receiving this command, the display shows the partner image, that is, the full-body images of Mr. B and the others and the video of the surrounding space, on its display screen. In this way, the partner image display unit 34 displays the partner image by expanding the video data extracted from the dialogue data.

As described above, in the present embodiment the display provided on the speaker side is constituted by the display-cum-mirror 25. Before displaying the partner image, the partner image display unit 34 executes processing for causing a display screen to be formed on the mirror surface portion of the display-cum-mirror 25. This processing (hereinafter, the display screen formation command processing) is executed by the partner image display unit 34, triggered by satisfaction of a screen formation condition while the display-cum-mirror 25 is not forming a display screen. Here, the screen formation condition is a condition set in advance for causing the display-cum-mirror 25 to form a display screen; specifically, it is that the human presence sensor 23 on Mr. A's side detects the position of a person within its detection area.

More specifically, the detection area of the human presence sensor 23 is set in front of the display-cum-mirror 25, for example at a position somewhat closer to it than where a person stands when using the display-cum-mirror 25 as a full-length mirror. Therefore, as shown in FIG. 4(B), when Mr. A is at a position closer to the display-cum-mirror 25 than usual, the human presence sensor 23 detects the position of Mr. A within its detection area and outputs a signal indicating the detection result (hereinafter, the detection signal) to the speaker-side server 10A. When the detection signal is input to the speaker-side server 10A via the I/O port 15, the partner image display unit 34 executes the display screen formation command processing: it generates a command for causing the display-cum-mirror 25 to form a display screen (the display screen formation command) and outputs that command to the display-cum-mirror 25.

In the display-cum-mirror 25 that has received the display screen formation command, the control circuit 26 turns on the light emitting unit 27 in accordance with the command. As a result, a display screen is formed on the mirror surface portion of the display-cum-mirror 25, which until then had the appearance of a full-length mirror. The display-cum-mirror 25 keeps the display screen formed for as long as the human presence sensor 23 continues to detect the position of a person within its detection area. When the person leaves the detection area and the human presence sensor 23 no longer detects the detection target, the display-cum-mirror 25 erases the display screen and its mirror surface portion again presents the appearance of a full-length mirror. That is, in the present embodiment the display-cum-mirror 25 forms a display screen only while the human presence sensor 23 is detecting the position of a person within the detection area.
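
The relationship between detection by the human presence sensor 23 and the state of the light emitting unit 27 can be summarized in a minimal sketch. The class `MirrorDisplay` and the function `run` are hypothetical names introduced for illustration; the actual control circuit 26 is hardware and is not described at this level of detail.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MirrorDisplay:
    """Hypothetical model of the control circuit 26 and light emitting unit 27."""

    light_on: bool = False  # state of the light emitting unit 27

    def update(self, person_detected: bool) -> str:
        """Apply one sensor reading and return the resulting appearance."""
        # The display screen exists only while the sensor reports a person
        # inside its detection area; otherwise the mirror surface is shown.
        self.light_on = person_detected
        return "screen" if self.light_on else "mirror"


def run(readings: Iterable[bool]) -> List[str]:
    """Feed successive sensor readings and collect the appearance after each."""
    display = MirrorDisplay()
    return [display.update(reading) for reading in readings]
```

For example, readings of "absent, present, present, absent" yield mirror, screen, screen, mirror in that order.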

(Partner audio reproduction unit 35)
The partner audio reproduction unit 35 reads the dialogue data stored in the memory 12 or on the hard disk drive 13, extracts the audio data from it, decodes and expands the audio data, and outputs a reproduction command to the audio reproduction device 24 on Mr. A's side. On receiving this command, the audio reproduction device 24 reproduces the audio represented by the audio data, that is, the voices of Mr. B and the others.

Next, in functional terms, the partner-side server 10B comprises a data acquisition unit 41, a data transmission unit 42, a data reception unit 43, an action performer detection unit 44, a display data generation unit 45, a speaker image display unit 46, and a speaker audio reproduction unit 47, as shown in FIG. 5. These units are responsible for the data processing executed by the partner-side server 10B; that is, they correspond to data processing units. Each of the seven data processing units is realized by hardware devices such as the CPU 11, the memory 12, the hard disk drive 13, the communication interface 14, and the I/O port 15 cooperating with the dialogue program as software.

Of the seven data processing units, the data acquisition unit 41, the data transmission unit 42, and the data reception unit 43 handle data with different content but share their functions with the corresponding data processing units of the speaker-side server 10A (specifically, the data acquisition unit 31, the data transmission unit 32, and the data reception unit 33), so their description is omitted. The remaining data processing units are described below.

(Action performer detection unit 44)
The action performer detection unit 44 corresponds to a detection unit and detects an action performer based on the audio data and video data acquired by the data acquisition unit 41. More specifically, during the dialogue the conversation partners, Mr. B, Mr. C, and Mr. D, are in front of the display-cum-mirror 25 constituting the display; they watch Mr. A's image on the display screen formed on the display-cum-mirror 25 and listen to Mr. A's voice reproduced by the audio reproduction device 24. Meanwhile, their voices are collected by the sound collection device 21 and their video is captured by the imaging device 22. The audio and video signals are output successively to the partner-side server 10B, and the data acquisition unit 41 acquires audio data and video data from these signals.

The action performer detection unit 44 analyzes the audio data and video data acquired by the data acquisition unit 41 and determines whether any of Mr. B and the others is performing a qualifying action. Here, a qualifying action is an action that satisfies a condition set in advance for detecting an action performer; specifically, it is the action of laughing or speaking aloud. When the action performer detection unit 44 determines that someone is performing a qualifying action, it detects that person.

The method of detecting the action performer is not particularly limited. As one example, the volume and pitch of the voice may be determined from the audio data and the position of the sound source calculated from the result, while the position of each conversation partner is determined from the video data; the action performer is then detected by identifying the person located at the position corresponding to the calculated sound source position.
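
The final matching step of this example method can be sketched as follows. The sketch assumes that the sound source direction and each person's direction have already been estimated as horizontal angles; how those estimates are obtained is not covered here. The function name `find_action_performer` and the tolerance parameter are illustrative assumptions, not part of the description.

```python
from typing import Dict, Optional


def find_action_performer(
    source_angle_deg: float,
    person_angles_deg: Dict[str, float],
    tolerance_deg: float = 10.0,  # assumed matching tolerance, chosen arbitrarily
) -> Optional[str]:
    """Return the person whose position best matches the sound source.

    Returns None when nobody is within the tolerance, i.e. when no
    action performer is detected.
    """
    if not person_angles_deg:
        return None
    # Pick the person with the smallest angular distance to the source.
    nearest = min(
        person_angles_deg,
        key=lambda name: abs(person_angles_deg[name] - source_angle_deg),
    )
    if abs(person_angles_deg[nearest] - source_angle_deg) > tolerance_deg:
        return None
    return nearest
```

With three partners at -20, 0, and 25 degrees, a sound source estimated at 22 degrees is attributed to the partner at 25 degrees.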

(Display data generation unit 45)
The display data generation unit 45 extracts video data from the dialogue data transmitted from the speaker-side server 10A and generates display data for the speaker image based on that video data. In the present embodiment there are two types of data processing (display data generation processing) performed by the display data generation unit 45, and which one is executed depends on whether the action performer detection unit 44 has detected an action performer.

More specifically, when the action performer detection unit 44 detects an action performer, the display data generation unit 45 executes processing for generating display data that displays the gaze-aligned image described above as the speaker image. This processing corresponds to the first processing of the present invention and is hereinafter called the gaze-aligned image generation processing.

The gaze-aligned image generation processing is described following the procedure illustrated in FIG. 6. FIG. 6 is a diagram showing the flow of the gaze-aligned image generation processing. The processing is triggered when the action performer detection unit 44 detects an action performer and begins with a step of identifying the position of the action performer (S001). In step S001, the display data generation unit 45 identifies the position of the action performer from the sound source position that the action performer detection unit 44 determined when detecting the action performer. The action performer position identification step S001 is described in more detail below with reference to FIG. 7. FIG. 7 is an explanatory diagram concerning identification of the action performer's position.

To identify the position of the conversation partner who made the sound (the person on the far right in FIG. 7), the display data generation unit 45 analyzes the video data. More specifically, for the conversation partner who made the sound, it identifies the direction as seen from the imaging device 22 and the tilt angle from the position directly in front of the imaging device 22 (denoted by the symbol θ in FIG. 7). The method of identifying the action performer's position is not limited to the above; for example, a distance sensor or a position sensor may be used.
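
One way the tilt angle θ could be derived from a person's horizontal position in the captured frame is sketched below. It assumes an ideal pinhole camera with a known horizontal field of view, which the description does not state; `tilt_angle_deg` and its parameters are hypothetical.

```python
import math


def tilt_angle_deg(
    x_px: float,
    image_width_px: int,
    horizontal_fov_deg: float,
) -> float:
    """Angle of a person from the camera's forward axis, in degrees.

    Positive values mean the person appears to the right of the image
    center, negative values to the left. Assumes an ideal pinhole camera.
    """
    center_px = image_width_px / 2.0
    # Focal length in pixels follows from the horizontal field of view.
    focal_px = center_px / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((x_px - center_px) / focal_px))
```

Under these assumptions a person at the image center gives an angle of zero, and a person at the right edge gives half the field of view.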

After identifying the position of the action performer, the display data generation unit 45 extracts the video data from the dialogue data that the data reception unit 43 received from the speaker-side server 10A and executes speaker video decomposition processing on the extracted video data (S002). As shown in FIG. 8, this processing divides the speaker's video represented by the extracted video data into a video of the speaker's pupils (hereinafter, the pupil video), a video of the head excluding the pupils (hereinafter, the head video), and a video of the parts of the speaker other than the head together with the surrounding space (hereinafter, the torso video); it is realized by known image processing techniques. FIG. 8 is an explanatory diagram concerning decomposition of the speaker video.

After the speaker video decomposition processing, the display data generation unit 45 executes, on the data of the pupil video divided from the speaker video, processing that edits the shape of the pupils and their position relative to the eyeballs (S003). This gaze editing processing is executed to change the speaker's line of sight according to the action performer position identified in step S001. The gaze editing processing is described below with reference to FIG. 9. FIG. 9 is an explanatory diagram concerning gaze editing.

In the gaze editing processing, the video of the actual pupils (the portions shown in black in FIG. 9) is edited according to the action performer position. Specifically, the shape and position of the pupils are changed so that the line of sight shifts, in the direction in which the action performer is seen from the imaging device 22, by an amount corresponding to the tilt angle θ. Through this procedure, the pupil video becomes a video in which the position and shape of the pupils have been changed according to the position of the action performer, as indicated by the hatched portions in FIG. 9 (hereinafter, the edited pupil video). Known image editing techniques can be used to change the pupil shape and position in the gaze editing processing.
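
As a rough illustration of how an amount "corresponding to the tilt angle θ" might be computed, the sketch below models the eyeball as a sphere rotating about a vertical axis. This geometric model, the function name `edit_pupil`, and its parameters are assumptions introduced here; the description leaves the editing method to known techniques.

```python
import math
from typing import Tuple


def edit_pupil(
    theta_deg: float,
    eyeball_radius_px: float,
    pupil_width_px: float,
) -> Tuple[float, float]:
    """Return (horizontal shift, apparent width) of the pupil in pixels.

    Assumes a spherical eyeball rotated by theta about a vertical axis:
    the pupil center moves by r * sin(theta) and the pupil outline is
    foreshortened horizontally by cos(theta).
    """
    theta = math.radians(theta_deg)
    shift_px = eyeball_radius_px * math.sin(theta)
    width_px = pupil_width_px * math.cos(theta)
    return shift_px, width_px
```

The shift gives the new pupil position relative to the eyeball, and the reduced width gives the change in pupil shape, matching the two quantities the gaze editing processing modifies.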

After the gaze editing processing, the display data generation unit 45 generates display data for displaying an image that combines the head video and torso video extracted in the speaker video decomposition step S002 with the edited pupil video obtained in the gaze editing step S003 (S004). When the display data obtained in step S004 is expanded, the gaze-aligned image illustrated in FIG. 10 is displayed on the display as the speaker image. FIG. 10 is a diagram showing the gaze-aligned image being displayed.
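
The combination of partial videos in step S004 can be pictured as stacking layers, with the torso video at the bottom, the head video above it, and the edited pupil video on top. The sketch below represents each layer as a small grid in which `None` marks a transparent pixel; this representation and the name `composite` are illustrative assumptions, since the description only refers to known image processing techniques.

```python
from typing import List, Optional, Sequence

Layer = Sequence[Sequence[Optional[str]]]


def composite(*layers: Layer) -> List[List[Optional[str]]]:
    """Overlay layers from bottom (first) to top (last).

    A pixel of None is treated as transparent, so the pixel from the
    layer beneath it remains visible. All layers must share one size.
    """
    result = [list(row) for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                if pixel is not None:
                    result[y][x] = pixel
    return result
```

Calling `composite(torso, head, edited_pupil)` gives one frame in which the edited pupils replace the original ones while the rest of the speaker video is unchanged.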

The gaze-aligned image differs from the actual speaker video that would be displayed by expanding the video data in the dialogue data received from the speaker-side server 10A. More specifically, as shown in FIG. 10, the gaze-aligned image is the actual speaker video corrected so that the speaker's line of sight is directed toward the position of the action performer (Mr. D's position in the case shown in FIG. 10). In other words, the display data generated by combining the partial videos divided from the speaker video is data for displaying a speaker image corrected so that the speaker's line of sight is directed toward the action performer position identified in step S001. Known image processing techniques can be used to combine the partial images.

It is desirable that the speaker's line of sight in the gaze-aligned image change gradually, in keeping with normal human movement, rather than switch instantaneously toward the position of the action performer. Accordingly, in the gaze editing step S003 the line of sight is preferably edited so that it changes gradually toward the position of the action performer.

The time from the start of the gaze-aligned image generation processing, triggered by detection of the action performer, until the display data generated by that processing is expanded and the gaze-aligned image is displayed is preferably made equal to the time an ordinary person's neural circuitry takes to switch the line of sight. In that case, the speaker's line of sight in the speaker image changes more naturally.
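
A gradual change of the line of sight over a chosen duration could be produced by computing an intermediate gaze angle for every displayed frame. The sketch below uses smoothstep easing, which is a swapped-in technique chosen for illustration and not stated in the description; the function name `gaze_schedule` is hypothetical, and the duration is left as a parameter because the description gives no numerical value.

```python
from typing import List


def gaze_schedule(target_deg: float, duration_s: float, fps: int) -> List[float]:
    """Per-frame gaze angles moving from 0 degrees to target_deg.

    Smoothstep easing starts and ends slowly, so the line of sight does
    not jump to the target in a single frame.
    """
    frames = max(1, round(duration_s * fps))
    angles = []
    for i in range(1, frames + 1):
        t = i / frames                    # progress in (0, 1]
        eased = t * t * (3.0 - 2.0 * t)   # smoothstep easing
        angles.append(target_deg * eased)
    return angles
```

Each angle in the returned list would be used as θ for one frame of the gaze editing step, with the last frame reaching the full target angle.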

After generating the display data for the gaze-aligned image, the display data generation unit 45 further executes, on the data of the head video divided in the speaker video decomposition processing, processing that edits the orientation of the speaker's face (S005). This processing is executed to change the orientation of the speaker's face according to the action performer position identified in step S001. The face orientation editing processing is described below with reference to FIG. 11. FIG. 11 is an explanatory diagram concerning face orientation editing.

In the face orientation editing process, the head video is edited according to the action performer position. Specifically, taking as a reference the center line of the face when the speaker faces the front (indicated by a broken line in FIG. 11), the positions of facial parts such as the nose and mouth and the contour of the face are changed so that the speaker's face turns, in the direction in which the action performer is seen from the imaging device 22, by an amount corresponding to the tilt angle θ. Through this procedure, the head video changes from the front-facing state to a video in which the face orientation has been changed according to the position of the action performer, as shown in FIG. 11 (hereinafter, the edited head video). Any known image editing technique can be used to change the face orientation in the face orientation editing process.

After executing the face orientation editing process, the display data generation unit 45 generates display data for displaying an image that combines the edited head video obtained in the preceding step S005 with the remaining partial videos (S006). When the display data obtained in step S006 is expanded, the secondary line-of-sight alignment image shown in FIG. 12 is displayed on the display as the speaker image. FIG. 12 shows the secondary line-of-sight alignment image being displayed.

Like the line-of-sight alignment image described above, the secondary line-of-sight alignment image differs from the actual speaker video. As shown in FIG. 12, it is the actual speaker video corrected so that both the speaker's line of sight and the speaker's face are directed at the position of the action performer (Mr. D's position in the case shown in FIG. 10). In other words, the display data generated by recombining the partial videos divided from the speaker video is data for displaying a speaker image corrected so that the speaker's line of sight and face are directed at the position of the action performer identified in the action performer position identification step S001. As with combining the partial images to build the line-of-sight alignment image, any known image processing technique can be used to recombine the partial images to build the secondary line-of-sight alignment image.

The orientation of the speaker's face in the secondary line-of-sight alignment image should likewise change gradually, in keeping with natural human movement, rather than turn instantaneously toward the action performer. Accordingly, in the face orientation editing step S005, the speaker's face orientation is preferably edited so that it changes gradually toward the position of the action performer. It is also desirable that the speaker's line of sight change in conjunction with the change in face orientation. The pupil video is therefore preferably re-edited concurrently with the face orientation editing step S005, following the change in face orientation, and in this re-editing step the shape and position of the pupils are changed according to the amount of change in face orientation.

The line-of-sight alignment image generation process ends when the series of steps above is complete. The display data generation unit 45 repeats the line-of-sight alignment image generation process each time the action performer detection unit 44 detects an action performer.

On the other hand, when the action performer detection unit 44 has not detected an action performer for a predetermined time or longer, the display data generation unit 45 executes a process of generating display data for displaying a speaker image in which the speaker's line of sight is directed at a predetermined position (hereinafter, the glance image). This process corresponds to the second process of the present invention and is hereinafter called the glance image generation process.

The glance image generation process is described following the procedure shown in FIG. 13. FIG. 13 shows the flow of the glance image generation process. The process is executed when the time during which the action performer detection unit 44 has not detected an action performer reaches a predetermined time. This non-detection time, which is the execution condition for the glance image generation process, can be set arbitrarily.

The glance image generation process begins with a step of identifying the position of each conversation partner (S011). Specifically, in step S011 the display data generation unit 45 analyzes the video data of the conversation partners captured by the imaging device 22 and identifies the position of each conversation partner. More specifically, for each conversation partner it identifies the direction as seen from the imaging device 22 and the tilt angle θ from the front position of the imaging device 22. The method of identifying the positions is not limited to the above; for example, a position may be identified using a distance sensor or a position sensor.
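The specification does not say how the tilt angle θ is derived from the captured video. One common approach, sketched here under a simple pinhole-camera assumption, converts the horizontal pixel position of a detected face into an angle from the optical axis. The function name `tilt_angle_deg`, the known horizontal field of view, and the availability of a face x-coordinate are assumptions made for illustration only.

```python
import math


def tilt_angle_deg(face_x_px: float, image_width_px: int, hfov_deg: float) -> float:
    """Estimate the tilt angle from the camera's front direction, in degrees.

    Hypothetical helper assuming an ideal pinhole camera with no lens
    distortion. Positive values lie toward increasing x in the image.
    """
    half_width = image_width_px / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan((face_x_px - half_width) / focal_px))
```

Whether positive angles correspond to the right or the left as seen from the imaging device depends on whether the captured image is mirrored, which would need to be checked for the actual camera.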

After identifying the position of each conversation partner, the display data generation unit 45 extracts the video data from the conversation data that the data receiving unit 43 received from the speaker-side server 10A, and executes the speaker video decomposition process on the extracted video data (S012). This process is the same as the speaker video decomposition process in the line-of-sight alignment image generation process.

After executing the speaker video decomposition process, the display data generation unit 45 executes a process of editing the shape of the pupils and their position relative to the eyeballs in the pupil video data divided from the speaker video (S013). This line-of-sight editing process is executed to direct the speaker's line of sight at a preset position, specifically the position of one of the conversation partners Mr. B, Mr. C, and Mr. D. The procedure is the same as that of the line-of-sight editing process in the line-of-sight alignment image generation process, and any known image editing technique can be used to change the pupil shape and position.

After the line-of-sight editing process, the display data generation unit 45 generates display data for displaying an image that combines the edited pupil video obtained in the preceding step S013 with the head video and the video of the torso and other parts extracted in the speaker video decomposition step S012 (S014). When the display data obtained in step S014 is expanded, a speaker image in which the speaker's line of sight is directed at the position of one of the conversation partners, that is, the glance image, is displayed on the display.

Like the line-of-sight alignment image, the glance image differs from the actual speaker video that would be displayed by expanding the video data in the conversation data received from the speaker-side server 10A. More specifically, the glance image is the actual speaker video corrected so that the speaker's line of sight is directed at the position of one of the conversation partners. In other words, the display data generated by combining the partial videos divided from the speaker video is data for displaying a speaker image corrected so that the speaker's line of sight is directed at the position of a predetermined conversation partner among the positions identified in the conversation partner position identification step S011, for example the conversation partner located farthest to the right as seen from the imaging device 22. Any known image processing technique can be used to combine the partial images.

The speaker's line of sight in the glance image should also preferably shift gradually, in keeping with natural human movement, rather than switch instantaneously to the position of the predetermined conversation partner. Accordingly, in the line-of-sight editing step S013, the line of sight is preferably edited so that it changes gradually toward the position of the predetermined conversation partner.

The glance image generation process ends when the series of steps above is complete. While the non-detection period, in which no action performer is detected, continues, the display data generation unit 45 repeats the glance image generation process at fixed intervals. In doing so, it repeats the process so that the conversation partner at whom the speaker's line of sight is directed switches in sequence among the plurality of conversation partners.

More specifically, suppose that in one round of the glance image generation process, display data is generated for a glance image corrected so that the speaker's line of sight is directed at the position of the conversation partner located farthest to the right as seen from the imaging device 22 (that is, Mr. B). In the next round, display data is generated for a glance image corrected so that the speaker's line of sight is directed at the position of the conversation partner immediately to Mr. B's left (that is, Mr. C). In the round after that, display data is generated for a glance image corrected so that the speaker's line of sight is directed at the position of the conversation partner located farthest to the left as seen from the imaging device 22 (that is, Mr. D). Thereafter, the glance image generation process is repeated so that the speaker's line of sight switches in the order of Mr. B's position, Mr. C's position, and Mr. D's position.
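The right-to-left rotation among the conversation partners can be sketched as a sort followed by a repeating cycle. The sketch assumes each partner has a signed tilt angle θ in which larger values mean farther to the right as seen from the imaging device; that sign convention and the names `glance_order` and `glance_targets` are illustrative assumptions, not taken from the specification.

```python
from itertools import cycle
from typing import Iterator


def glance_order(partner_angles: dict[str, float]) -> list[str]:
    """Order partners from rightmost to leftmost (assumed: larger angle = right)."""
    return sorted(partner_angles, key=partner_angles.get, reverse=True)


def glance_targets(partner_angles: dict[str, float]) -> Iterator[str]:
    """Yield the glance target for each successive round, repeating forever."""
    return cycle(glance_order(partner_angles))
```

For example, with angles of 20, 0, and -20 degrees for B, C, and D, successive rounds yield B, C, D, B, C, and so on, matching the order described above.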

By repeating the glance image generation process in this way, the speaker's line of sight in the speaker image (that is, the glance image) displayed on the display on Mr. B's side changes periodically, and the conversation partner at the end of that line of sight switches in sequence, as shown in (A), (B), and (C) of FIG. 14. Parts (A), (B), and (C) of FIG. 14 show the speaker image being displayed based on display data generated by the glance image generation process. This visual effect lets each conversation partner feel as if the speaker were glancing at them in turn, which makes a conversation conducted while viewing the speaker image more engaging.

In this embodiment, only the speaker's line of sight is directed at the position of the predetermined conversation partner in the glance image, but a glance image in which the speaker's face as well as the speaker's line of sight is directed at that position may be displayed instead. In other words, the glance image generation process may generate display data for displaying a speaker image (glance image) corrected so that both the speaker's line of sight and the speaker's face are directed at the position of the predetermined conversation partner.

(Speaker image display unit 46 and speaker voice playback unit 47)
The speaker image display unit 46 functions as an image display unit in cooperation with the display on Mr. B's side. Strictly speaking, the speaker image display unit 46 expands the display data generated by the display data generation unit 45 and outputs a display command to that display. On receiving the display command, the display shows the line-of-sight alignment image, the secondary line-of-sight alignment image, or the glance image described above on its display screen. In this way, the speaker image display unit 46 displays the speaker image by expanding the display data.

Because the display on Mr. B's side is also constituted by a display mirror 25, the speaker image display unit 46 executes a display screen formation command process to form a display screen on the mirror surface portion of the display mirror 25 when displaying the speaker image. This process is the same as the one performed by the partner image display unit 34 of the speaker-side server 10A, so a description of its specific procedure is omitted.

The speaker voice playback unit 47 extracts the audio data from the conversation data received from the speaker-side server 10A, decodes and expands it, and outputs a playback command to the audio playback device 24 on Mr. B's side. On receiving the playback command, the audio playback device 24 plays back the sound indicated by the audio data, that is, Mr. A's voice.

<<Procedure of the image display method according to this embodiment>>
Next, the image display method according to this embodiment is described. As in the description so far, the following takes as a specific example the case in which the speaker is Mr. A and the conversation partners are Mr. B, Mr. C, and Mr. D.

The image display method according to this embodiment is implemented in the present system S by the communication units 1 and 2 of both the speaker and the conversation partners. Specifically, it is implemented by the server computers of the communication units 1 and 2 (the speaker-side server 10A and the partner-side server 10B) sequentially executing data processing in the flow shown in FIGS. 15 and 16. FIGS. 15 and 16 show the flow of data processing that the system S executes during a conversation between Mr. A and Mr. B and the others. The flow of this series of data processing is described below with reference to FIGS. 15 and 16.

The processing starts when Mr. A moves in front of the display mirror 25 installed in Mr. A's home and the human presence sensor 23 detects the position of Mr. A within its detection area (S021). When the speaker-side server 10A receives, via the I/O port 15, a signal indicating the detection result of the human presence sensor 23 (Yes in S021), it executes the display screen formation command process. As a result, the display mirror 25 on Mr. A's side transitions from the state in which its mirror surface portion has the appearance of a full-length mirror and forms a display screen on the mirror surface portion (S022). At the moment the display screen is formed on the display mirror 25 on Mr. A's side, it shows a predetermined standby screen.

Meanwhile, the sound collection device 21 and the imaging device 22 are activated along with execution of the display screen formation command process, so that video of Mr. A's full body and the surrounding space is captured and Mr. A's speech is collected (S023). The speaker-side server 10A then generates conversation data based on the output signals from the sound collection device 21 and the imaging device 22 and transmits the data to the partner-side server 10B (S024).

The partner-side server 10B receives the conversation data via the communication line 3 and stores it in the internal memory 12 or the hard disk drive 13 (S025). The partner-side server 10B then reads out the stored conversation data, extracts the audio data from it, and outputs to the audio playback device 24 a command to play back the sound indicated by the audio data. On receiving the playback command, the audio playback device 24 plays back that sound (S026). As a result, Mr. A's voice (speech) can be heard in the building where Mr. B and the others are located.

Meanwhile, in response to Mr. A's voice, Mr. B and the others move in front of the display mirror 25 installed in the building where they are located. When any of them enters the detection area of the human presence sensor 23 while Mr. A's voice is being played back, the human presence sensor 23 detects that person's position (S027). When the partner-side server 10B receives, via the I/O port 15, a signal indicating the detection result of the human presence sensor 23, it executes the display screen formation command process. As a result, the display mirror 25 on Mr. B's side transitions from the state in which its mirror surface portion has the appearance of a full-length mirror and forms a display screen on the mirror surface portion (S028).

After the display screen is formed, the partner-side server 10B executes data processing for displaying the speaker image. Through this data processing, Mr. A's voice is played back by the audio playback device 24 and the speaker image is displayed on the display screen (S029). The flow of the data processing for displaying the speaker image is described in detail later.

In addition, when the human presence sensor 23 in the building where Mr. B and the others are located detects the position of a person within its detection area, the sound collection device 21 and the imaging device 22 are activated. As a result, video of the full bodies of Mr. B and the others and the surrounding space is captured and their voices are collected (S030). The partner-side server 10B then generates conversation data based on the output signals from the sound collection device 21 and the imaging device 22 and transmits the generated conversation data to the speaker-side server 10A (S031).

The speaker-side server 10A receives the conversation data via the communication line 3 and stores it in the internal memory 12 or the hard disk drive 13 (S032). The speaker-side server 10A then reads out the stored conversation data, extracts the audio data and the video data from it, and expands both. It outputs to the audio playback device 24 a command to play back the sound indicated by the extracted audio data, and outputs to the display mirror 25 a command to display the video indicated by the extracted video data (that is, the partner image). As a result, the audio playback device 24 on Mr. A's side plays back the voices of Mr. B and the others, and video of their full bodies and the surrounding space is displayed on the display screen formed by the display mirror 25 (S033).

Thereafter, while the conversation between the two parties continues, the series of data processing described above is repeated in each of the communication units 1 and 2. When the human presence sensor 23 in Mr. A's home or in the building where Mr. B and the others are located no longer detects the position of a person within its detection area (S034, S035), the conversation ends and execution of the series of data processing also ends. The sound collection device 21 and the imaging device 22 then stop. In addition, when the human presence sensor 23 no longer detects a person within its detection area, the display screen formed on the display mirror 25 is erased in conjunction with this, and the mirror surface portion of the display mirror 25 again has the appearance of a full-length mirror.

Next, of the data processing described above, the data processing for displaying the speaker image is described in detail with reference to FIG. 17. FIG. 17 shows the procedure for displaying the speaker image. As shown in FIG. 17, this data processing is divided into two patterns depending on whether the partner-side server 10B has detected an action performer.

One processing pattern applies when the partner-side server 10B detects an action performer (Yes in S041). In this pattern, the partner-side server 10B executes the line-of-sight alignment image generation process described above (S042). The display data generated in that process is expanded successively by the partner-side server 10B (S043), so that the speaker image is displayed on the display on Mr. B's side. More specifically, when the line-of-sight alignment image generation process is executed, the line-of-sight alignment image, in which the speaker's line of sight is directed at the position of the action performer, is displayed first, followed by the secondary line-of-sight alignment image, in which both the speaker's line of sight and face are directed at that position. Once the process has been executed, the line-of-sight alignment image (strictly speaking, the secondary line-of-sight alignment image) continues to be displayed on the display for a fixed time.

When the fixed time has elapsed since the line-of-sight alignment image generation process was executed (Yes in S044), the partner-side server 10B again determines whether an action performer is detected (S041).

The other processing pattern applies when the partner-side server 10B does not detect an action performer (No in S041). In this pattern, the partner-side server 10B executes the glance image generation process described above (S045). The display data generated in that process is expanded successively by the partner-side server 10B (S046). As a result, a glance image in which the speaker's line of sight is directed at the position of a predetermined conversation partner is displayed as the speaker image on the display on Mr. B's side.

The glance image generation process is repeated at fixed intervals during the non-detection period in which the partner-side server 10B does not detect an action performer (S047, S048). As a result, in the speaker image displayed on the display on Mr. B's side, the direction of the speaker's line of sight switches in the order of Mr. B's position, Mr. C's position, and Mr. D's position.

If the partner-side server 10B detects an action performer after executing the glance image generation process (Yes in S047), the processing pattern shifts, as shown in FIG. 17, from the pattern that executes the glance image generation process to the pattern that executes the line-of-sight alignment image generation process.
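The two processing patterns of FIG. 17 can be summarized as a per-interval decision. The following is a simplified sketch, not the claimed implementation: it assumes detection is sampled once per fixed interval, ignores the hold time checked in S044 and the non-detection threshold, and assumes the glance rotation simply continues where it left off after a detection (the specification does not say whether it restarts). The function name `plan_speaker_images` and the mode labels are hypothetical.

```python
from itertools import cycle
from typing import Iterable, Optional


def plan_speaker_images(
    detections: Iterable[Optional[str]],
    partners_right_to_left: list[str],
) -> list[tuple[str, str]]:
    """For each interval, choose which process runs and whom the speaker faces.

    detections holds the detected action performer for each interval,
    or None when no one is detected.
    """
    glance_cycle = cycle(partners_right_to_left)
    plan = []
    for performer in detections:
        if performer is not None:
            # Action performer detected (Yes in S041 or S047):
            # run the line-of-sight alignment image generation process.
            plan.append(("line_of_sight_alignment", performer))
        else:
            # No action performer: run the glance image generation
            # process toward the next partner in the rotation.
            plan.append(("glance", next(glance_cycle)))
    return plan
```

Calling `plan_speaker_images([None, None, "D", None], ["B", "C", "D"])` yields glances at B and C, then alignment toward D once D is detected, then a glance at D as the rotation resumes.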

<<Other embodiments>>
The above embodiment describes one example of the image display system and image display method of the present invention. However, the embodiment is merely an example intended to facilitate understanding of the present invention and does not limit it. The present invention may be modified and improved without departing from its gist, and naturally includes its equivalents.

In the above embodiment, the actual speaker video is divided into partial videos in order to display a speaker image in which the speaker's line of sight and face orientation are corrected relative to the actual speaker video. Among the divided partial videos, the pupil video and the head video are edited, and the final display data of the speaker image is generated by combining the edited partial videos with the remaining partial videos. This is only one example of a procedure for generating the display data of the speaker image, and the display data may be generated by other procedures. For example, only the pupils or the head may be edited in the speaker video as it is (that is, in an undivided state) without dividing the speaker video into partial videos.

In the above embodiment, the partner server 10B generates the display data of the speaker image. However, the device that generates the display data is not limited to the partner server 10B; the speaker server 10A may generate it instead. Alternatively, a third server (not shown; for example, an ASP server or a server for a cloud service) capable of communicating with both the speaker server 10A and the partner server 10B may generate the display data.

In the above embodiment, when a conversation partner laughs or speaks during the conversation, the partner server 10B detects that conversation partner as the action performer and, with this as a trigger, executes the line-of-sight image generation process. However, the condition for detecting an action performer is not limited to laughing or speaking, and performing some other action may be set as the detection condition. For example, performing an action that moves the body, such as raising a hand or standing up, may be set as the detection condition.
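
One way to picture interchangeable detection conditions is as a list of predicates evaluated against each conversation partner. The sketch below is purely illustrative: the `Observation` fields, the 60 dB threshold, and the function names are assumptions made for the example and do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Observation:
    """Hypothetical per-partner sensor summary for one moment in time."""
    partner_id: str
    sound_level_db: float
    hand_raised: bool
    standing: bool


Condition = Callable[[Observation], bool]


def is_vocalizing(obs: Observation) -> bool:
    # Assumed threshold standing in for "laughing or speaking".
    return obs.sound_level_db >= 60.0


def is_raising_hand(obs: Observation) -> bool:
    return obs.hand_raised


def is_standing_up(obs: Observation) -> bool:
    return obs.standing


def detect_action_performer(observations: Iterable[Observation],
                            conditions: List[Condition]) -> Optional[str]:
    """Return the id of the first partner satisfying any configured
    condition, or None when nobody qualifies."""
    for obs in observations:
        if any(cond(obs) for cond in conditions):
            return obs.partner_id
    return None
```

Swapping `[is_vocalizing]` for `[is_raising_hand, is_standing_up]` changes the trigger from sound to body movement without touching the detection loop.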

In the above embodiment, the speaker image was described as an example of an image obtained by correcting the line of sight and face orientation. However, the image of the conversation partner (partner image) may likewise be displayed after its line of sight and face orientation are corrected from the actual video.

In the above embodiment, a case in which there are a plurality of conversation partners for a single speaker was described as an example, but the present invention is not limited to this. For example, there may be a plurality of speakers. In that case, when the display data of the speaker image is generated, the data processing for correcting the speaker's line of sight and face orientation (specifically, dividing, editing, and combining the speaker video) is executed for each speaker. There may also be only one conversation partner. However, from the viewpoint of making the functions of the present system S more effective, the above configuration in which there are a plurality of conversation partners is more desirable.

S: Present system (image display system)
1, 2: Communication unit
3: Communication line
10A: Speaker server
10B: Partner server
11: CPU
12: Memory
13: Hard disk drive
14: Communication interface
15: I/O port
21: Sound collecting device
22: Imaging device
23: Human presence sensor
24: Audio playback device
25: Display/mirror
26: Control circuit
27: Light emitting unit
31, 41: Data acquisition unit
32, 42: Data transmission unit
33, 43: Data reception unit
34: Partner image display unit
35: Partner audio playback unit
44: Action performer detection unit
45: Display data generation unit
46: Speaker image display unit
47: Speaker audio playback unit

Claims

An image display system for displaying an image of a speaker to a conversation partner, comprising:
a data acquisition unit that is provided on the speaker side and acquires data indicating video captured when the speaker is imaged;
a display data generation unit that generates display data for displaying the image corrected from the video;
an image display unit that is provided on the conversation partner side and displays the image by expanding the display data; and
a detection unit that is provided on the conversation partner side and detects the conversation partner when the conversation partner performs an operation that satisfies a preset condition,
wherein, when the detection unit detects the conversation partner who performed the operation, the display data generation unit executes a first process of generating the display data for displaying the image corrected so that the line of sight of the speaker is directed toward the position of the conversation partner who performed the operation, and
when the detection unit does not detect, for a predetermined time or longer, the conversation partner who performed the operation, the display data generation unit executes a second process of generating the display data for displaying the image corrected so that the line of sight of the speaker is directed toward a preset position.

The image display system according to claim 1, wherein, when the detection unit detects the conversation partner who performed the operation, the display data generation unit generates the display data for displaying the image corrected so that the line of sight of the speaker and the face of the speaker are directed toward the position of the conversation partner who performed the operation.

The image display system according to claim 1, wherein, in a case where there are a plurality of the conversation partners, when the detection unit does not detect, for the predetermined time or longer, the conversation partner who performed the operation, the display data generation unit repeatedly executes the second process while sequentially switching, among the plurality of conversation partners, the conversation partner located at the position toward which the line of sight of the speaker is directed.

The image display system according to any one of claims 1 to 3, further comprising:
a display screen forming unit that constitutes a part of a building material, furniture, or decoration arranged in a building where the conversation partner is located and that forms a display screen for the image; and
a sensor that detects a detection target satisfying a preset second condition, the detection target being at least one of an action performed by the conversation partner, a position where the conversation partner is located, a posture of the conversation partner, and a sound emitted by the conversation partner,
wherein the display screen forming unit presents the appearance of the part without forming the display screen during a period in which the sensor does not detect the detection target satisfying the second condition, and forms the display screen only during a period in which the sensor detects the detection target satisfying the second condition.

An image display method for displaying an image of a speaker to a conversation partner, comprising:
acquiring, by a data acquisition unit provided on the speaker side, data indicating video captured when the speaker is imaged;
generating, by a display data generation unit, display data for displaying the image corrected from the video;
displaying the image, by an image display unit provided on the conversation partner side, by expanding the display data;
detecting the conversation partner, by a detection unit provided on the conversation partner side, when the conversation partner performs an operation that satisfies a preset condition;
executing, by the display data generation unit when the detection unit detects the conversation partner who performed the operation, a first process of generating the display data for displaying the image corrected so that the line of sight of the speaker is directed toward the position of the conversation partner who performed the operation; and
executing, by the display data generation unit when the detection unit does not detect, for a predetermined time or longer, the conversation partner who performed the operation, a second process of generating the display data for displaying the image corrected so that the line of sight of the speaker is directed toward a preset position.