JP7614765B2

JP7614765B2 - Content presentation device and program

Info

Publication number: JP7614765B2
Application number: JP2020149411A
Authority: JP
Inventors: 数馬吉野; 裕之川喜田; 拓也半田; 健介久富
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2025-01-16
Anticipated expiration: 2040-09-04
Also published as: JP2022043909A

Description

Article 30, paragraph 2 of the Patent Act applies.
(1) Publisher: Japan Broadcasting Corporation. Date of website posting: July 27, 2020. Posting addresses: https://www.nhk.or.jp/info/pr/marukaji/assets/pdf/450.pdf and https://www.nhk.or.jp/strl/news/2020/7.html
(2) Publisher: Japan Broadcasting Corporation. Publication: Giken Dayori, August 2020 issue. Date of posting: August 6, 2020. Posting addresses: https://www.nhk.or.jp/strl/, https://www.nhk.or.jp/strl/publica/giken_dayori/185/1.html and https://www.nhk.or.jp/strl/publica/giken_dayori/185/pdf/dayori185.pdf
(3) Publisher: Dempa Shimbun Co., Ltd. Publication: Dempa Shimbun, August 3, 2020, page 3. Publication date: August 3, 2020.
(4) Publisher: Dempa Shimbun Co., Ltd. Publication: Dempa Shimbun, September 2, 2020, page 1. Publication date: September 2, 2020.

The present invention relates to a content presentation device and a program.

When experiencing augmented reality (AR) content and virtual reality (VR) content, there is a demand for a system that provides a sense of experiencing the content together with other people.
For example, Non-Patent Document 1 constructs a system that, in order to verify the effectiveness of shared VR experiences such as games, movies, and meetings, captures and transmits three-dimensional video in real time so that a person in another space is reproduced next to the user, while the user is reproduced next to the person in the other space.
Non-Patent Document 2 proposes an AR system that enables natural communication with a person in another space: the person in the other space is captured in three dimensions and reproduced in real time in the AR space that the user is experiencing. Non-Patent Document 2 also mentions applying the system to VR by changing the device.
Furthermore, there is a demand for an AR/VR system that allows a user to share a content experience in real time, while feeling the presence of the other party, not only with people who are physically distant but also with people in the same space.

Non-Patent Document 1: Simon et al., "EVERYDAY PHOTO-REALISTIC SOCIAL VR: COMMUNICATE AND COLLABORATE WITH AN ENHANCED CO-PRESENCE AND IMMERSION", IBC 2019
Non-Patent Document 2: Sergio et al., "Holoportation: Virtual 3D Teleportation in Real-time", UIST '16, October 16-19, 2016

For people in different spaces to share content in the same space, the positional relationships among the participants, including the position of the content (AR content, VR content, or mixed AR/VR content in which AR content and VR content are combined), must be shared without contradiction.

The present invention has been made in view of the above, and provides a content presentation device and a program that can give a user the sensation of experiencing content with a person in a different space while sharing the same space.

[1] In order to solve the above problem, a content presentation device according to one aspect of the present invention includes: a reference position setting unit that sets, in a three-dimensional space, first subject reference position information, which indicates a reference position and orientation for a first subject video (a video including a captured image of a first subject), and second subject reference position information, which indicates a reference position and orientation for a second subject video (a video including a captured image of a second subject); a surrounding video acquisition unit that acquires a surrounding video, which is a video of the surroundings of the device including the first subject; a first subject video extraction unit that extracts the first subject video from the surrounding video; a background video acquisition unit that acquires a background video, which is a video used as a background; a second subject video acquisition unit that acquires the second subject video from another content presentation device; a content acquisition unit that acquires content; a first subject position setting unit that sets the position and orientation of the first subject video in the three-dimensional space based on the first subject reference position information; a second subject position setting unit that sets the position and orientation of the second subject video in the three-dimensional space based on the second subject reference position information; a content position setting unit that sets the position in the three-dimensional space at which the content is displayed, based on a predetermined relative positional relationship between the position indicated by the first subject reference position information and the position indicated by the second subject reference position information; and a presentation unit that outputs a presentation video in which the first subject video whose position and orientation in the three-dimensional space have been set, the second subject video whose position and orientation in the three-dimensional space have been set, and the content whose position in the three-dimensional space has been set are displayed in the three-dimensional space against the background video. The relative positional relationship among the position in the three-dimensional space at which the content is displayed, the position indicated by the first subject reference position information, and the position indicated by the second subject reference position information, as well as the first subject reference position information and the second subject reference position information, are each shared with the other content presentation device, and the presentation unit generates the presentation video using a video see-through method.
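As a minimal sketch of how a content position could be derived from the two reference positions, the following Python fragment places the content a fixed distance in front of the midpoint between them. The specific relationship (midpoint plus a horizontal offset), the names `ReferencePose` and `content_position`, and the coordinate convention (y up, x-z as the floor plane) are assumptions made for illustration and are not taken from the specification.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencePose:
    """Hypothetical reference position (metres) and yaw (radians) for one subject video."""
    x: float
    y: float
    z: float
    yaw: float = 0.0  # orientation is carried along but not used in this sketch


def content_position(first: ReferencePose, second: ReferencePose, forward: float):
    """Return (x, y, z) located `forward` metres in front of the midpoint of the two references.

    "In front" is assumed to mean the horizontal direction perpendicular to the
    line joining the two reference positions.
    """
    mid_x = (first.x + second.x) / 2
    mid_y = (first.y + second.y) / 2
    mid_z = (first.z + second.z) / 2
    dx = second.x - first.x
    dz = second.z - first.z
    length = math.hypot(dx, dz)
    if length == 0:
        raise ValueError("reference positions must differ in the floor plane")
    # Rotate the unit vector joining the subjects by 90 degrees in the x-z plane.
    normal_x, normal_z = -dz / length, dx / length
    return (mid_x + forward * normal_x, mid_y, mid_z + forward * normal_z)


first = ReferencePose(-0.5, 0.0, 0.0)
second = ReferencePose(0.5, 0.0, 0.0)
position = content_position(first, second, forward=2.0)
```

Because the reference position information and the relative positional relationship are shared between the devices, each device evaluating a function of this kind on the same inputs would arrive at the same content position.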

[2] In one aspect of the present invention, in the content presentation device described above, the presentation unit localizes the sound included in the content based on the position in the three-dimensional space at which the content is displayed, and outputs the localized sound.

[3] In one aspect of the present invention, in the content presentation device described above, the presentation unit localizes the sound included in the second subject video based on the position of the second subject video in the three-dimensional space, and outputs the localized sound.
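The specification does not state how the sound is localized. As one hedged illustration, the sketch below uses constant-power stereo panning driven by the horizontal angle between the listener's facing direction and the sound source. The function name `stereo_gains`, the yaw convention (yaw 0 faces +z, positive angles turn toward +x), and the choice of stereo panning itself are assumptions.

```python
import math


def stereo_gains(listener_xz, listener_yaw, source_xz):
    """Return (left_gain, right_gain) for a source using constant-power panning.

    listener_xz, source_xz: (x, z) positions in the shared three-dimensional space.
    listener_yaw: facing direction in radians (assumed convention: 0 faces +z).
    """
    dx = source_xz[0] - listener_xz[0]
    dz = source_xz[1] - listener_xz[1]
    # Angle of the source relative to where the listener is facing.
    azimuth = math.atan2(dx, dz) - listener_yaw
    # Wrap into [-pi, pi] so that left and right are distinguished correctly.
    azimuth = math.atan2(math.sin(azimuth), math.cos(azimuth))
    # Map the frontal half-plane to a pan value in [-1, 1]; sources behind are clamped.
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))
    angle = (pan + 1.0) * math.pi / 4  # 0 (full left) .. pi/2 (full right)
    return math.cos(angle), math.sin(angle)


left, right = stereo_gains((0.0, 0.0), 0.0, (1.0, 0.0))  # source directly to the right
```

The same function could be applied both to the content position and to the position of the second subject video, since each is simply a sound source at a known position in the shared space.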

[4] In one aspect of the present invention, the content presentation device described above further includes an occlusion reproduction unit that determines, based on the position of the first subject video in the three-dimensional space and the position of the second subject video in the three-dimensional space, which of the first subject video and the second subject video is on the near side in the presentation video, and the presentation unit outputs the presentation video based on the determination result of the occlusion reproduction unit.
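A simple way to picture this determination is to compare the distance from the viewpoint to each subject video and draw the farther one first. The sketch below does this per object; the names `in_front` and `draw_order` are hypothetical, and a per-pixel depth comparison would be an equally plausible reading of the text.

```python
import math


def in_front(viewpoint, first_pos, second_pos):
    """Return "first" or "second" depending on which position is nearer to the viewpoint."""
    if math.dist(viewpoint, first_pos) <= math.dist(viewpoint, second_pos):
        return "first"
    return "second"


def draw_order(viewpoint, positions):
    """Return the keys of `positions` sorted from farthest to nearest (painter's algorithm)."""
    return sorted(positions, key=lambda name: math.dist(viewpoint, positions[name]), reverse=True)


viewpoint = (0.0, 1.6, 0.0)            # e.g. the head mounted display position
first_subject = (0.2, 1.2, 0.4)        # e.g. the wearer's own hand
second_subject = (0.0, 1.0, 2.0)       # the remote subject placed in the shared space
nearer = in_front(viewpoint, first_subject, second_subject)
order = draw_order(viewpoint, {"first": first_subject, "second": second_subject})
```

Drawing in the returned order means the nearer subject video overwrites the farther one wherever they overlap.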

[5] In one aspect of the present invention, in the content presentation device described above, the second subject position setting unit sets the position of the second subject video in the three-dimensional space, based on the second subject reference position information, such that the position of the feet of the second subject included in the second subject video coincides with the position of the bottom surface of the three-dimensional space.
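The foot alignment can be sketched as a vertical shift applied to the points of the second subject video. In the fragment below, the lowest point of the subject is assumed to be the feet, the points are assumed to be expressed relative to the reference position, and `place_on_floor` is a hypothetical name.

```python
def place_on_floor(points, reference_xz, floor_y=0.0):
    """Translate subject points so the lowest point rests on the floor at the reference position.

    points: iterable of (x, y, z) in the subject video's local coordinates (y up).
    reference_xz: (x, z) of the reference position in the shared space.
    floor_y: height of the bottom surface of the three-dimensional space.
    """
    points = list(points)
    foot_y = min(y for _, y, _ in points)  # assumption: lowest point corresponds to the feet
    lift = floor_y - foot_y
    ref_x, ref_z = reference_xz
    return [(x + ref_x, y + lift, z + ref_z) for x, y, z in points]


placed = place_on_floor([(0.0, 0.25, 0.0), (0.0, 2.0, 0.0)], reference_xz=(1.0, 3.0))
```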

[6] In one aspect of the present invention, in the content presentation device described above, the presentation unit displays the second subject video at a transparency less than a predetermined value.
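Displaying a video at a given transparency is commonly done by alpha blending. The following sketch blends one subject pixel over one background pixel and rejects any transparency that is not below a limit. The limit of 0.5 and the name `blend_pixel` are placeholders chosen for illustration; the specification gives no numeric value.

```python
def blend_pixel(subject_rgb, background_rgb, transparency, limit=0.5):
    """Blend a subject pixel over a background pixel.

    transparency: 0.0 is fully opaque, 1.0 is fully transparent.
    limit: hypothetical predetermined value that the transparency must stay below.
    """
    if not 0.0 <= transparency < limit:
        raise ValueError("transparency must be in [0, limit)")
    opacity = 1.0 - transparency
    return tuple(
        round(opacity * s + transparency * b)
        for s, b in zip(subject_rgb, background_rgb)
    )


pixel = blend_pixel((200, 100, 0), (0, 100, 200), transparency=0.25)
```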

[8] Another aspect of the present invention is a program that causes a computer to execute: a reference position setting process of setting, in a three-dimensional space, first subject reference position information, which indicates a reference position and orientation for a first subject video (a video including a captured image of a first subject), and second subject reference position information, which indicates a reference position and orientation for a second subject video (a video including a captured image of a second subject); a surrounding video acquisition process of acquiring a surrounding video, which is a video of the surroundings of the device including the first subject; a first subject video extraction process of extracting the first subject video from the surrounding video; a background video acquisition process of acquiring a background video, which is a video used as a background; a second subject video acquisition process of acquiring the second subject video from another content presentation device; a content acquisition process of acquiring content; a first subject position setting process of setting the position and orientation of the first subject video in the three-dimensional space based on the first subject reference position information; a second subject position setting process of setting the position and orientation of the second subject video in the three-dimensional space based on the second subject reference position information; a content position setting process of setting the position in the three-dimensional space at which the content is displayed, based on a predetermined relative positional relationship between the position indicated by the first subject reference position information and the position indicated by the second subject reference position information; and a presentation process of outputting a presentation video in which the first subject video whose position and orientation in the three-dimensional space have been set, the second subject video whose position and orientation in the three-dimensional space have been set, and the content whose position in the three-dimensional space has been set are displayed in the three-dimensional space against the background video. The relative positional relationship among the position in the three-dimensional space at which the content is displayed, the position indicated by the first subject reference position information, and the position indicated by the second subject reference position information, as well as the first subject reference position information and the second subject reference position information, are each shared with the other computer, and the presentation process generates the presentation video using a video see-through method.

According to the present invention, it is possible to provide the sensation of experiencing content with a person in a different space while sharing the same space.

FIG. 1 is a diagram showing an example of the configuration of a content presentation system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the configuration of a content presentation device according to the embodiment.
FIG. 3 is a diagram showing an example of the configuration of an imaging unit according to the embodiment.
FIG. 4 is a diagram showing an example of the configuration of a display unit according to the embodiment.
FIG. 5 is a diagram showing an example of an imaging process according to the embodiment.
FIG. 6 is a diagram showing an example of a display process according to the embodiment.
FIG. 7 is a diagram showing an example of a position setting process according to the embodiment.
FIG. 8 is a diagram showing an example of a video displayed on the head mounted display for setting the position of a subject according to the embodiment.
FIG. 9 is a diagram showing an example of the relative positional relationship among the content, the first subject video, and the second subject video according to the embodiment.
FIG. 10 is a diagram showing an example of drawing an AR or VR object according to the embodiment.
FIG. 11 is a diagram showing an example of occlusion reproduction for the first subject video and the second subject video according to the embodiment.
FIG. 12 is a diagram showing an example of occlusion reproduction for the second subject video and the content according to the embodiment.
FIG. 13 is a diagram showing an example of content that combines AR and VR according to the embodiment.
FIG. 14 is a diagram showing an example of content that combines AR and VR according to the embodiment.
FIG. 15 is a diagram showing an example of content that combines AR and VR according to the embodiment.

(Embodiment)
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a diagram showing an example of the configuration of a content presentation system S according to this embodiment. The content presentation system S includes a plurality of content presentation devices 1 (content presentation devices 1-1, 1-2), a server device 2, a plurality of head mounted displays H1 (head mounted displays H1-1, H1-2), a plurality of stereo cameras SC1 (stereo cameras SC1-1, SC1-2), and a plurality of RGBD cameras CM1 (RGBD cameras CM1-1, CM1-2). Because a user is photographed by the RGBD camera CM1, the user is referred to as a subject in this embodiment.

In the content presentation system S, a plurality of subjects exist in different spaces that are separate from each other. In the example shown in FIG. 1, the first subject P1 and the second subject P2 exist in the first space R1 and the second space R2, respectively. The first space R1 and the second space R2 are located apart from each other and may be separated by any distance. The first space R1 and the second space R2 may or may not be partitioned by a building wall or the like, and they may be different locations in the same room. The first space R1 is, for example, a room in the home of the first subject P1, and the second space R2 is, for example, a room in the home of the second subject P2, located away from the home of the first subject P1.

The plurality of content presentation devices 1 operate in cooperation while communicating with each other. In FIG. 1, the content presentation device 1-1 and the content presentation device 1-2 are installed in the first space R1 and the second space R2, respectively, and operate simultaneously. Each content presentation device 1 is used by one subject. In FIG. 1, the content presentation device 1-1 and the content presentation device 1-2 are used by the first subject P1 and the second subject P2, respectively. The content presentation devices 1 exchange information with each other via a network NW and the server device 2. The content presentation devices 1 and the server device 2 are each connected to the network NW by wireless or wired communication.

The content presentation device 1 is, for example, a personal computer (PC). The content presentation device 1 may be a mobile terminal device such as a smartphone. The content presentation device 1 may also be provided integrally with the head mounted display H1. For example, the content presentation device 1 may be built into the head mounted display H1 as a small terminal device, or the functions of the content presentation device 1 may be implemented as a program that is executed by a computing device built into the head mounted display H1.

The configuration on the first subject P1 side is described below; the configuration on the second subject P2 side is the same.
The first subject P1 wears the head mounted display H1-1. The head mounted display H1-1 includes a display device, a microphone, a speaker, and an HMD position measurement unit. The HMD position measurement unit measures the position and attitude of the head mounted display H1-1.
As an example, the stereo camera SC1-1 is provided integrally with the head mounted display H1. The stereo camera SC1-1 captures the scenery of the first space R1 as seen from the first subject P1, and a part of the body of the first subject P1. The part of the body of the first subject P1 is, for example, the subject's own hand or arm as seen by the first subject P1. The head mounted display H1-1 plays back the video captured by the stereo camera SC1-1 on its own display device.

The first subject P1 is photographed by the RGBD camera CM1-1 together with the scenery behind the first subject P1 in the first space R1. The first subject video PV1 is a video including a captured image of the first subject P1. In addition to the captured image of the first subject P1, the first subject video PV1 includes the voice of the first subject P1 picked up by the microphone provided in the head mounted display H1-1.

The video captured by the RGBD camera CM1-1 is transmitted to the content presentation device 1-2 via the content presentation device 1-1. Conversely, the video captured by the RGBD camera CM1-2 on the second subject P2 side is received by the content presentation device 1-1 via the content presentation device 1-2. The head mounted display H1-1 presents the presentation video V1-1 on its own display device. The presentation video V1-1 includes, against the background video BG1-1, a video of a part of the body of the first subject P1, the second subject video PV2 contained in the video captured by the RGBD camera CM1-2, and the content C1. Here, the background video BG1 is the video used as the background in the video played back by the head mounted display H1. In this embodiment, the background video BG1-1 used as the background in the video played back by the head mounted display H1-1 is, as an example, a video of the scenery of the first space R1 captured by the stereo camera SC1-1. The second subject video PV2 is a video including a captured image of the second subject P2. The second subject video PV2 includes the voice of the second subject P2.
The content C1 is displayed as augmented reality (AR) in the video played back by the head mounted display H1. The content C1 is, for example, a television broadcast or a three-dimensional video using AR.

Similarly, on the second subject P2 side, the head mounted display H1-2 presents the presentation video V1-2 on its own display device. The presentation video V1-2 includes, against the background video BG1-2, a video of a part of the body of the second subject P2, the first subject video PV1 contained in the video captured by the RGBD camera CM1-1, and the content C1. The content C1 is common to the presentation video V1-1 presented on the first subject P1 side and the presentation video V1-2 presented on the second subject P2 side.

As described above, the scenery of the space in which the subject exists is used as the background video BG1. Because the first subject P1 and the second subject P2 exist in different spaces that are separate from each other, the background video BG1-1 used on the first subject P1 side and the background video BG1-2 used on the second subject P2 side differ from each other.
Alternatively, a common video may be used as the background video BG1 on both the first subject P1 side and the second subject P2 side. For example, a video of the scenery of a third space other than the first space R1 and the second space R2 may be used as the background video BG1. The third space is, for example, a virtual space. This virtual space includes, for example, various facilities (such as broadcast studios, conference rooms, movie theaters, aquariums, art museums, concert venues, and entertainment facilities) and outdoor locations (such as coasts, mountains, parks, and tourist attractions). The virtual space may be a video of current scenery delivered by real-time streaming, or a video of scenery captured in the past. The virtual space may also be a place that does not actually exist, created by computer graphics.

In the content presentation system S, common content is presented to a plurality of subjects who exist in different spaces separate from each other, while making them feel as if they are in the same space.

[Configuration of the content presentation device]
The configuration of the content presentation device 1 will now be described with reference to FIGS. 2 to 4.
FIG. 2 is a diagram showing an example of the configuration of the content presentation device 1 according to this embodiment. The content presentation device 1 includes an imaging unit 11 and a display unit 12. The imaging unit 11 transmits the first subject video PV1 to the content presentation device 1 on the second subject P2 side. The display unit 12 displays the presentation video on the display device provided in the head mounted display H1.

At least some of the functions of the functional units of the content presentation device 1 described above may be realized using electronic circuits. Some or all of these functional units may also be realized using a computer and a program. Each functional unit has a storage means as necessary. The storage means is, for example, a flip-flop that holds a predetermined state on an electronic circuit, a variable in a program when a program is used, or memory allocated by execution of the program. Non-volatile storage means such as a magnetic hard disk drive or a solid-state drive (SSD) may also be used as necessary.

[Configuration of the imaging unit]
FIG. 3 is a diagram showing an example of the configuration of the imaging unit 11 according to this embodiment.
The RGBD imaging unit 31 captures an RGBD video, which is a video of the surroundings of the device itself (RGBD camera CM1-1). The RGBD video includes the first subject P1 present in the first space R1 and the scenery behind the first subject P1. The RGBD imaging unit 31 corresponds to the RGBD camera CM1-1 shown in FIG. 1.

The first subject position measurement unit 32 measures the position and attitude of the head mounted display H1-1. The first subject position measurement unit 32 is, for example, the HMD position measurement unit built into the head mounted display H1-1. Here, the position is information expressed as position coordinates in a three-dimensional space, and the attitude is information expressing the orientation of the head mounted display H1-1, for example as three-dimensional angle information. The HMD position measurement unit may acquire the position and attitude by, for example, incorporating a gyro sensor or a camera. Because self-position estimation is also possible from images captured by the stereo camera SC1-1 provided integrally with the head mounted display H1-1, the HMD position measurement unit may calculate the position and attitude from the video acquired by the stereo camera SC1-1. The HMD position measurement unit may also acquire the position and attitude by receiving an external beacon signal, or by receiving a signal such as infrared light that is emitted externally in synchronization with a clock in order to identify a location in real space. The HMD position measurement unit may also receive position and attitude information determined from images of the head mounted display H1-1 captured by a plurality of cameras (for example, the RGBD camera CM1-1) installed externally (for example, in the room that serves as the viewing space for the content).
The first subject position measurement unit 32 outputs information indicating the measured position and attitude of the head mounted display H1-1 to the content presentation device 1 as HMD position information.

The first subject audio acquisition unit 33 acquires the voice of the first subject P1 present in the first space R1. The first subject audio acquisition unit 33 is a microphone built into the head mounted display H1-1. The first subject audio acquisition unit 33 outputs information indicating the acquired voice of the first subject P1 to the content presentation device 1 as first subject audio information.

The imaging unit 11 includes a first subject video cutout unit 111 and a first subject video supply unit 112.
The first subject video cutout unit 111 cuts out the first subject video PV1 from the RGBD video captured by the RGBD imaging unit 31. As an example, the first subject video cutout unit 111 generates distance information (a depth map) from the distance data included in the RGBD video, and cuts out the first subject video PV1 from the RGBD video using the generated distance information and a preset distance (depth). Here, the distance information indicates, for each pixel of the RGBD video, the distance (depth) as seen from the RGBD camera CM1-1. In the RGBD video captured by the RGBD imaging unit 31, the part corresponding to the first subject P1 is closer to the RGBD camera CM1-1 than the background part other than the first subject P1, so the first subject video cutout unit 111 can cut out the first subject video PV1 from the RGBD video based on the distance information. The preset distance (depth) is set in advance according to the distance between the first subject P1 and the RGBD camera CM1-1. It is preferable that no other object is placed between the first subject P1 and the RGBD camera CM1-1.
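As an illustration only, the distance-based cutout described above can be sketched as a per-pixel depth threshold. The function name `cut_out_subject`, the list-of-rows image layout, and the convention that a depth of 0 or `None` marks an invalid measurement are assumptions made for this sketch and do not appear in the description.

```python
def cut_out_subject(rgb, depth, max_depth_m):
    """Keep pixels whose depth is valid and no farther than max_depth_m.

    rgb   : list of rows, each row a list of (r, g, b) tuples
    depth : list of rows of depths in meters (0 or None = invalid; assumption)
    Returns (mask, cutout); background pixels in cutout are None.
    """
    mask = [
        [d is not None and 0.0 < d <= max_depth_m for d in row]
        for row in depth
    ]
    cutout = [
        [px if keep else None for px, keep in zip(rgb_row, mask_row)]
        for rgb_row, mask_row in zip(rgb, mask)
    ]
    return mask, cutout


if __name__ == "__main__":
    rgb = [[(1, 1, 1), (2, 2, 2)], [(3, 3, 3), (4, 4, 4)]]
    depth = [[1.0, 3.0], [0.0, 1.5]]
    mask, cutout = cut_out_subject(rgb, depth, max_depth_m=2.0)
    print(mask)
    print(cutout)
```

A single threshold of this kind only works when nothing else stands between the subject and the camera, which is consistent with the preference stated above.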

The method by which the first subject video cutout unit 111 cuts out the first subject video PV1 from the RGBD video is not limited to a method based on distance information. The first subject video cutout unit 111 may cut out the first subject video PV1 from the RGBD video based on machine learning. In this case, the first subject video cutout unit 111 has, for example, learned the features of the first subject P1 in video in advance by machine learning. By referring to the trained model, the first subject video cutout unit 111 identifies the location in the RGBD video where the first subject P1 appears (information such as the coordinates of a region in the image) and thereby cuts out the first subject video PV1. As another example, the first subject video cutout unit 111 may identify the location in the RGBD video where the first subject P1 appears based on pattern matching using a template.

The first subject video supply unit 112 supplies the first subject video PV1 cut out by the first subject video cutout unit 111, the HMD position information output from the first subject position measurement unit 32, and the first subject audio information output from the first subject audio acquisition unit 33 to the other content presentation device 1 (content presentation device 1-2). Note that the first subject audio information is supplied as part of the first subject video PV1.

[Configuration of the display unit]
FIG. 4 is a diagram showing an example of the configuration of the display unit 12 according to the present embodiment. The display unit 12 includes an external information acquisition unit 13, a control command acquisition unit 14, a content acquisition unit 15, a processing unit 16, a data sharing unit 17, a second subject information acquisition unit 18, and a storage unit 19.

The external information acquisition unit 13 includes a surrounding video acquisition unit 131, a distance information acquisition unit 132, and a first subject position/orientation acquisition unit 133. The surrounding video acquisition unit 131 acquires the surrounding video E1 captured by the stereo camera SC1-1. The surrounding video E1 is video of the surroundings of the own device (content presentation device 1-1), including the first subject P1. The surrounding video E1 includes the voice of the first subject P1.

The distance information acquisition unit 132 acquires distance information (a depth map) corresponding to the surrounding video E1 acquired by the surrounding video acquisition unit 131. Here, the distance information acquisition unit 132 acquires (generates) the distance information (depth map) by applying stereo matching to the surrounding video E1 captured by the stereo camera SC1-1. Alternatively, a device other than the content presentation device 1 may calculate the distance information from the surrounding video E1 captured by the stereo camera SC1-1, and the distance information acquisition unit 132 may acquire the distance information calculated by that device. The acquisition of a distance image itself can be realized using existing technology.
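The description leaves the stereo matching method to existing technology. As a hedged illustration of the final step only, for a rectified stereo pair the depth of a pixel follows from its disparity as Z = f * B / d, where f is the focal length in pixels and B is the camera baseline. The function names and the numeric values below are assumptions for this sketch; the matching step that produces the disparity is not shown.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in meters for one pixel of a rectified stereo pair.

    Returns None when the disparity is not positive (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def depth_map_from_disparity(disparity, focal_length_px, baseline_m):
    """Apply disparity_to_depth to every pixel of a disparity map (list of rows)."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
        for row in disparity
    ]


if __name__ == "__main__":
    # Hypothetical camera: 1000 px focal length, 10 cm baseline.
    print(depth_map_from_disparity([[50.0, 100.0], [0.0, 25.0]], 1000.0, 0.1))
```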

The first subject position/orientation acquisition unit 133 acquires first subject position/orientation information from the head mounted display H1-1. The first subject position/orientation information indicates the position and orientation of the first subject P1. The first subject position/orientation acquisition unit 133 acquires, from the head mounted display H1-1, the information indicating the position and orientation of the head mounted display H1-1 as the first subject position/orientation information.

The control command acquisition unit 14 acquires a control command from an operation unit (not shown). A control command indicates an operation on the content. The operation unit accepts operations from the first subject P1. The operation unit includes, for example, a remote controller, a keyboard, or a game controller. The internal state of the content is updated in response to operations on the content. The internal state of the content includes information indicating the selected content (such as a program), whether the content is playing or stopped, the volume, the position and orientation at which the content is displayed, and the size of the area in which the content is displayed. Selecting content means, for example, selecting a program (or channel). The position at which the content is displayed is specified by a position in the left-right direction or the depth direction as viewed from the first subject P1.

The content acquisition unit 15 acquires the content C1 from the storage unit 19. In this embodiment, an example in which the content C1 is stored in advance in the storage unit 19 is described, but this is not limiting. The content acquisition unit 15 may acquire content from a source other than the content presentation device 1. For example, the content acquisition unit 15 may acquire content from a recording medium such as a DVD, a Blu-ray disc, or a hard disk device. Alternatively, the content acquisition unit 15 may acquire content distributed by communication or broadcast signals.

The processing unit 16 processes information from the external information acquisition unit 13, the control command acquisition unit 14, the content acquisition unit 15, the data sharing unit 17, and the second subject information acquisition unit 18, and calculates and outputs the presentation video V1 to be displayed on the display device 4. The configuration of the processing unit 16 is described in detail later.

The data sharing unit 17 performs processing that allows the content presentation device 1 (content presentation device 1-1) to share control commands with the other content presentation device 1 (content presentation device 1-2). The data sharing unit 17 transmits the control commands acquired by the control command acquisition unit 14 to the other content presentation device 1. The data sharing unit 17 also receives control commands from the other content presentation device 1. Operations on the content are therefore performed either by the first subject P1, as described above, or by the second subject P2, in which case the operation is received from the other content presentation device 1 (content presentation device 1-2).
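One way to picture the shared control commands is that both devices apply the same command sequence to the same kind of internal state and so arrive at the same result. The sketch below is illustrative only: the `ContentState` fields cover a subset of the internal state listed above, and the command names, the tuple encoding, and the 0 to 100 volume range are assumptions rather than details from the description. Network transport is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ContentState:
    """Subset of the content's internal state (field names are hypothetical)."""
    program: str = ""
    playing: bool = False
    volume: int = 10


def apply_command(state, command):
    """Return the new state after one (kind, value) control command."""
    kind, value = command
    if kind == "select":
        return replace(state, program=value)
    if kind == "play":
        return replace(state, playing=True)
    if kind == "stop":
        return replace(state, playing=False)
    if kind == "volume":
        return replace(state, volume=max(0, min(100, value)))
    return state  # unknown commands are ignored in this sketch


def apply_all(state, commands):
    """Apply a sequence of commands in order."""
    for command in commands:
        state = apply_command(state, command)
    return state


if __name__ == "__main__":
    shared = [("select", "news"), ("play", None), ("volume", 130)]
    device_1 = apply_all(ContentState(), shared)  # commands issued locally
    device_2 = apply_all(ContentState(), shared)  # same commands received
    print(device_1 == device_2, device_1)
```

The sketch assumes both devices see the commands in the same order; how ordering is guaranteed when both subjects operate at once is not addressed in the description.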

The second subject information acquisition unit 18 includes a second subject video acquisition unit 181 and a second subject position/orientation acquisition unit 182. The second subject video acquisition unit 181 acquires the second subject video PV2 from the other content presentation device 1 (content presentation device 1-2). The second subject video PV2 includes the voice of the second subject P2. The second subject video PV2 that the second subject video acquisition unit 181 acquires from the other content presentation device 1-2 consists of the second subject video PV2 cut out from the RGBD image captured by the RGBD imaging unit provided on the side of the other content presentation device 1-2 (corresponding to the RGBD imaging unit 31 shown in FIG. 3) and the voice of the second subject P2 acquired by the second subject audio acquisition unit provided on the side of the other content presentation device 1-2 (corresponding to the first subject audio acquisition unit 33 shown in FIG. 3). The second subject position/orientation acquisition unit 182 acquires second subject position/orientation information from the other content presentation device 1 (content presentation device 1-2) used by the second subject P2. The second subject position/orientation information indicates the position and orientation of the second subject P2.

The storage unit 19 stores various kinds of information, including the content C1 and the shared reference information 192. The shared reference information 192 indicates the reference for the positions and orientations at which the first subject video PV1, the second subject video PV2, and the content C1 are displayed in the presentation video V1. The shared reference information 192 includes the first subject reference position information M1 and the second subject reference position information M2, which are described later. The shared reference information 192 also includes the relative positional relationship among the position in the three-dimensional space T1 at which the content C1 is displayed, the position indicated by the first subject reference position information M1, and the position indicated by the second subject reference position information M2. The shared reference information 192 is shared between the content presentation device 1 (content presentation device 1-1) and the other content presentation device 1 (content presentation device 1-2). The positions indicated by the first subject reference position information M1 and the second subject reference position information M2 are described later. The storage unit 19 is configured using a storage device such as a magnetic hard disk device or a semiconductor storage device.

In this embodiment, an example in which the shared reference information 192 is stored in advance in the storage unit 19 is described, but this is not limiting. The content presentation device 1 may acquire the shared reference information 192 from an external device other than the content presentation device 1 (for example, the server device 2).

The processing unit 16 includes a position setting unit 161, a first subject video extraction unit 166, a background video acquisition unit 167, a mask generation unit 168, an occlusion reproduction unit 169, and a presentation unit 1610.
The position setting unit 161 sets the positions and orientations at which the first subject video PV1, the second subject video PV2, and the content C1 are displayed in the presentation video V1. The position setting unit 161 includes a reference position setting unit 162, a first subject position setting unit 163, a second subject position setting unit 164, and a content position setting unit 165.

The reference position setting unit 162 sets the first subject reference position information M1 and the second subject reference position information M2 in the three-dimensional space T1. The three-dimensional space T1 is the three-dimensional space used to calculate the position and orientation of each element (the first subject video PV1, the second subject video PV2, and the content C1) in order to generate the presentation video V1. The first subject reference position information M1 indicates the reference position and orientation for the first subject video PV1. The second subject reference position information M2 indicates the reference position and orientation for the second subject video PV2. As an example, the first subject reference position information M1 and the second subject reference position information M2 are each three points (also called markers) arranged in the three-dimensional space T1. Each of them indicates a position and an orientation in the three-dimensional space T1 by means of its three markers. Here, the orientation includes two components: the orientation with respect to the vertical direction, and the orientation with respect to the rotational direction perpendicular to the vertical direction. Because the orientation indicated by the first subject reference position information M1 includes both components, the first subject P1 can check whether the head mounted display H1-1 that he or she is wearing is tilted with respect to the three-dimensional space T1. The head mounted display H1-1 being tilted with respect to the three-dimensional space T1 means, for example, that when the first subject P1 wearing the head mounted display H1-1 stands upright on the floor of the first space R1, this floor is tilted with respect to the horizontal plane of the three-dimensional space T1. When the head mounted display H1-1 is tilted with respect to the three-dimensional space T1, the first subject P1 adjusts his or her posture to correct the orientation of the head mounted display H1-1.

The reference position setting unit 162 sets the second subject reference position information M2 in the three-dimensional space T1 based on a predetermined relative positional relationship between the position indicated by the second subject reference position information M2 and the position indicated by the first subject reference position information M1. The position indicated by the first subject reference position information M1 is, for example, the position of one of the three points of the first subject reference position information M1. Similarly, the position indicated by the second subject reference position information M2 is, for example, the position of one of the three points of the second subject reference position information M2. The predetermined relative positional relationship is included in the shared reference information 192 and is defined with respect to the origin and coordinate system set in the three-dimensional space T1. Once the origin and coordinate system are set in the three-dimensional space T1 and the position indicated by the first subject reference position information M1 is set, the position indicated by the second subject reference position information M2 is determined relative to the position indicated by the first subject reference position information M1. The position indicated by the second subject reference position information M2 is, for example, a position lying between the position indicated by the first subject reference position information M1 and the position where the RGBD camera CM1 is installed.

In this embodiment, an example in which the first subject reference position information M1 and the second subject reference position information M2 are each three markers is described, but this is not limiting. The first subject reference position information M1 and the second subject reference position information M2 may be any other information, as long as it indicates a position in the three-dimensional space T1, the orientation with respect to the vertical direction, and the orientation with respect to the rotational direction. For example, the first subject reference position information M1 and the second subject reference position information M2 may each be a three-dimensional local coordinate system. A local coordinate system is set for each point in the three-dimensional space T1, separately from the coordinate system set for the three-dimensional space T1 itself.
As another example, the first subject reference position information M1 and the second subject reference position information M2 may each be four or more markers.
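The equivalence between three markers and a local coordinate system can be illustrated with a small sketch. It assumes, purely for illustration, that the first marker gives the position, the second marker lies along the vertical direction from the first, and the third marker fixes the facing (rotational) direction; the description does not specify which marker plays which role. The function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v):
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("markers are coincident or collinear")
    return tuple(x / n for x in v)


def frame_from_markers(origin, up_marker, front_marker):
    """Build an orthonormal local frame (origin, right, up, front) from 3 markers."""
    up = _normalize(_sub(up_marker, origin))
    f = _sub(front_marker, origin)
    # Remove the component of f along `up` (Gram-Schmidt step).
    f = _sub(f, tuple(_dot(f, up) * u for u in up))
    front = _normalize(f)
    right = _cross(up, front)
    return origin, right, up, front


if __name__ == "__main__":
    print(frame_from_markers((0, 0, 0), (0, 2, 0), (0, 1, 3)))
```

Three non-collinear points are the minimum needed to fix both the vertical and the rotational orientation, which is why collinear markers are rejected here.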

The first subject position setting unit 163 sets the position and orientation of the first subject video PV1 in the three-dimensional space T1 based on the first subject reference position information M1.
The second subject position setting unit 164 sets the position and orientation of the second subject video PV2 in the three-dimensional space T1 based on the second subject reference position information M2.
The content position setting unit 165 sets the position in the three-dimensional space T1 at which the content C1 is displayed, based on a predetermined relative positional relationship with the position indicated by the first subject reference position information M1 and the position indicated by the second subject reference position information M2. This predetermined relative positional relationship is included in the shared reference information 192. It is, for example, a relationship in which the position indicated by the first subject reference position information M1, the position indicated by the second subject reference position information M2, and the position of the content C1 form a triangle whose three sides have predetermined lengths.
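As a hedged illustration of the triangle relationship, the content position can be found as the third vertex of a triangle whose other two vertices and side lengths are known. The sketch works in a two-dimensional horizontal plane and picks the vertex on the left of the direction from M1 to M2; both choices, like the function name `third_vertex`, are assumptions made here and are not stated in the description.

```python
import math


def third_vertex(p1, p2, d1, d2):
    """Return the point C with |C - p1| = d1 and |C - p2| = d2.

    p1, p2 : (x, y) positions indicated by M1 and M2 in the horizontal plane
    Of the two mirror-image solutions, the one to the left of p1 -> p2 is returned.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    base = math.hypot(dx, dy)
    if base == 0.0 or base > d1 + d2 or base < abs(d1 - d2):
        raise ValueError("side lengths do not form a triangle")
    # Distance from p1 to the foot of the altitude through C, along p1 -> p2.
    a = (d1 * d1 - d2 * d2 + base * base) / (2.0 * base)
    h = math.sqrt(max(0.0, d1 * d1 - a * a))  # altitude length
    ux, uy = dx / base, dy / base
    return (p1[0] + a * ux - h * uy, p1[1] + a * uy + h * ux)


if __name__ == "__main__":
    # 3-4-5 right triangle as a hypothetical layout.
    print(third_vertex((0.0, 0.0), (5.0, 0.0), 3.0, 4.0))
```

Because both devices hold the same shared reference information 192, computing the content position this way on each device would place the content consistently relative to both subjects.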

The first subject video extraction unit 166 extracts video of a predetermined subject (such as a person) from the surrounding video E1. The video of the predetermined subject includes, together with the first subject video PV1, video of the bodies of other people who are experiencing the same virtual reality content together in the same space (the first space R1). The first subject video extraction unit 166 performs processing to recognize the first subject P1 in the surrounding video E1 acquired by the surrounding video acquisition unit 131. The first subject video extraction unit 166 generates distance information (a depth map) from the surrounding video E1 and, based on the generated distance information and a preset depth, identifies the location in the surrounding video E1 where the predetermined subject appears (information such as the coordinates of a region in the image). As the result of the recognition processing, the first subject video extraction unit 166 supplies the position information of the region in the surrounding video E1 to the mask generation unit 168.
As described above, the video of the predetermined subject includes the first subject video PV1. The first subject video extraction unit 166 therefore extracts the first subject video PV1 from the surrounding video E1.

The first subject video extraction unit 166 may recognize the first subject P1 based on machine learning. In that case, the first subject video extraction unit 166 has learned the features of the predetermined subject in video in advance by machine learning. By referring to the trained model, the first subject video extraction unit 166 identifies the location in the surrounding video E1 where the predetermined subject appears (information such as the coordinates of a region in the image) and outputs that information.
The first subject video extraction unit 166 may also select the extraction algorithm according to the type of subject to be extracted. For example, when another person who is experiencing the same virtual reality content together in the same space (the first space R1) is located far away, the first subject video extraction unit 166 may use machine learning to extract the video of that person and use distance information to extract the video of other subjects.

The background video acquisition unit 167 acquires the background video BG1. The background video acquisition unit 167 cuts out the video (image) of the presentation area from the surrounding video E1 acquired by the surrounding video acquisition unit 131, and acquires the cut-out video of the presentation area as the background video BG1. The video of the presentation area may be only a part of the entire surrounding video E1 (for example, the part near the center). This makes it possible to match the viewing angle of the surrounding video E1 acquired by the surrounding video acquisition unit 131 to the viewing angle of the presentation video V1 displayed on the display device 4. The background video acquisition unit 167 supplies the cut-out video (image) to the presentation unit 1610. The processing of the background video acquisition unit 167 may include central projection processing.

The mask generation unit 168 generates mask information. The mask generation unit 168 generates mask information based on distance information and mask information based on the recognition result. The mask information based on distance information indicates, according to the distance information supplied from the distance information acquisition unit 132, whether or not each region is one in which the presentation video should be presented. The mask information based on the recognition result indicates whether or not each region is one in which a subject recognized by the first subject video extraction unit 166 (the body of the first subject P1, or a subject that is present in the first space R1 and shares the content with the first subject P1) exists. In this embodiment, the mask information indicates whether a region is one in which the first subject video PV1, the second subject video PV2, and the content C1 should be displayed, or one in which the surrounding video E1 should be displayed.

As described above, the mask generation unit 168 generates a mask based on distance information and a mask based on the recognition result. This allows the presentation unit 1610 to perform presentation as follows. For example, the user's own body (that is, the body of the first subject P1), the bodies of other people experiencing the same virtual reality content together in the same space (the first space R1), and the body of another person (the second subject P2) experiencing the same virtual reality content together in another space (the second space R2) can be presented in the virtual reality video. For a predetermined subject (such as a person) recognized by the first subject video cutout unit 111, the surrounding video E1 can be displayed regardless of the distance from the own device. For anything other than the specific subject (such as a person), presentation is based on the distance information. That is, objects relatively close to the own device are presented in the virtual reality space as part of the surrounding video E1, while objects relatively far from the own device are not presented as part of the surrounding video E1. In regions containing such relatively distant objects, the omnidirectional video Z1 is presented as the rearmost layer.

ここで、「比較的近い範囲」とは、例えば、人がその場から動くことなく（例えば、着座のまま）手を伸ばして触れられる範囲である。例えば、１メートル以内程度の範囲である。逆に「比較的遠い範囲」とは、例えば、２メートル以上程度の範囲である。その中間の距離の範囲（１メートル以上且つ２メートル以下）では、近距離用の周辺映像Ｅ１と、遠距離用の全天球映像Ｚ１とを混合した映像が提示されてもよい。
３次元空間Ｔ１は、近距離領域ＴＮ１と、遠距離領域ＴＦ１とからなる。近距離領域ＴＮ１は、上記した「比較的近い範囲」である。遠距離領域ＴＦ１は、上記した「比較的遠い範囲」である。 Here, the "relatively close range" is, for example, the range that a person can reach and touch by hand without moving from the spot (for example, while remaining seated), that is, a range within about 1 meter. Conversely, the "relatively far range" is, for example, a range of about 2 meters or more. In the intermediate distance range (1 meter or more and 2 meters or less), an image that mixes the peripheral image E1 for short distances with the omnidirectional image Z1 for long distances may be presented.
The three-dimensional space T1 is made up of a short distance region TN1 and a long distance region TF1. The short distance region TN1 is the above-mentioned "relatively close range." The long distance region TF1 is the above-mentioned "relatively far range."
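
The distance rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the linear cross-fade between 1 m and 2 m, the function names peripheral_weight and blend_pixel, and the representation of a pixel as an RGB tuple are all hypothetical. The description itself only states that the two images may be mixed in the intermediate range and that recognized subjects are shown regardless of distance.

```python
def peripheral_weight(distance_m, near_m=1.0, far_m=2.0):
    """Weight of the peripheral image E1 (1.0 = only E1, 0.0 = only Z1)."""
    if distance_m <= near_m:
        return 1.0
    if distance_m >= far_m:
        return 0.0
    # Assumption: linear cross-fade in the intermediate range.
    return (far_m - distance_m) / (far_m - near_m)


def blend_pixel(e1_rgb, z1_rgb, distance_m, is_recognized_subject=False):
    """Mix one pixel of E1 and Z1; a recognized subject always shows E1."""
    w = 1.0 if is_recognized_subject else peripheral_weight(distance_m)
    return tuple(round(w * e + (1.0 - w) * z) for e, z in zip(e1_rgb, z1_rgb))
```

With these assumed thresholds, an object at 0.5 m is drawn entirely from E1, an object at 3 m is replaced by Z1, and an object at 1.5 m is an even mix of the two.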

オクルージョン再現部１６９は、オクルージョンについての処理を行う。オクルージョンとは、手前に存在する物体を奥に存在する物体を隠すようにして描画することである。オクルージョン再現部１６９は、第１被写体映像ＰＶ１の３次元空間Ｔ１内の位置と、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置とに基づいて、提示映像Ｖ１において第１被写体映像ＰＶ１と第２被写体映像ＰＶ２とのいずれが手前側にあるかを判定する。オクルージョン再現部１６９は、判定結果を提示部１６１０に供給する。 The occlusion reproduction unit 169 performs processing regarding occlusion. Occlusion refers to rendering an object in the foreground so as to hide an object in the background. The occlusion reproduction unit 169 determines whether the first object video PV1 or the second object video PV2 is in the foreground in the presented video V1, based on the position of the first object video PV1 in the three-dimensional space T1 and the position of the second object video PV2 in the three-dimensional space T1. The occlusion reproduction unit 169 supplies the determination result to the presentation unit 1610.
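
As a rough sketch of the front/back determination, the following compares the distances of the two subject videos from the viewpoint. The description only says that the determination is based on the positions in the three-dimensional space T1, so the use of Euclidean distance from a viewpoint, the names nearer_subject and draw_order, and the far-to-near drawing order are assumptions made for illustration.

```python
import math


def nearer_subject(viewpoint, pv1_pos, pv2_pos):
    """Return which subject video is in the foreground as seen from viewpoint."""
    d1 = math.dist(viewpoint, pv1_pos)
    d2 = math.dist(viewpoint, pv2_pos)
    return "PV1" if d1 <= d2 else "PV2"


def draw_order(viewpoint, pv1_pos, pv2_pos):
    """Far-to-near order, so that the nearer video is drawn last and hides the other."""
    front = nearer_subject(viewpoint, pv1_pos, pv2_pos)
    back = "PV2" if front == "PV1" else "PV1"
    return [back, front]
```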

提示部１６１０は、提示映像Ｖ１を生成する。提示映像Ｖ１は、３次元空間Ｔ１内の位置及び向きが設定された第１被写体映像ＰＶ１と、３次元空間Ｔ１内の位置及び向きが設定された第２被写体映像ＰＶ２と、３次元空間Ｔ１内の位置が設定されたコンテンツＣ１とが、周辺映像Ｅ１を背景として３次元空間Ｔ１内に表示された映像である。提示部１６１０は、マスク生成部１６８によって生成されたマスク情報と、オクルージョン再現部１６９による判定結果とに基づいて、提示映像Ｖ１内において、第１被写体映像ＰＶ１、第２被写体映像ＰＶ２、及びコンテンツＣ１相互間の位置関係を反映して提示映像Ｖ１を生成する。なお、提示映像Ｖ１を生成する処理をレンダリングともいう。提示部１６１０は、生成した提示映像Ｖ１をディスプレイ装置４に出力する。 The presentation unit 1610 generates the presentation video V1. The presentation video V1 is an image in which the first object video PV1, the position and orientation of which are set in the three-dimensional space T1, the second object video PV2, the position and orientation of which are set in the three-dimensional space T1, and the content C1, the position of which is set in the three-dimensional space T1, are displayed in the three-dimensional space T1 against the background of the surrounding video E1. The presentation unit 1610 generates the presentation video V1 by reflecting the positional relationship between the first object video PV1, the second object video PV2, and the content C1 in the presentation video V1 based on the mask information generated by the mask generation unit 168 and the determination result by the occlusion reproduction unit 169. The process of generating the presentation video V1 is also referred to as rendering. The presentation unit 1610 outputs the generated presentation video V1 to the display device 4.

提示部１６１０は、提示映像Ｖ１をディスプレイ装置４に出力する処理において、第２被写体映像ＰＶ２に含まれる音声を、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置に基づいて定位させて出力する。また、提示部１６１０は、提示映像Ｖ１をディスプレイ装置４に出力する処理において、コンテンツＣ１に含まれる音声を、コンテンツＣ１が表示される３次元空間Ｔ１内の位置に基づいて定位させて出力する。なお、提示部１６１０は、コンテンツＣ１に含まれる音声を、コンテンツＣ１が表示される３次元空間Ｔ１内の位置以外から出力してもよい。 In the process of outputting the presentation video V1 to the display device 4, the presentation unit 1610 localizes and outputs the audio included in the second object video PV2 based on the position of the second object video PV2 in the three-dimensional space T1. In addition, in the process of outputting the presentation video V1 to the display device 4, the presentation unit 1610 localizes and outputs the audio included in the content C1 based on the position in the three-dimensional space T1 where the content C1 is displayed. Note that the presentation unit 1610 may output the audio included in the content C1 from a position other than the position in the three-dimensional space T1 where the content C1 is displayed.
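
The description does not specify how the audio is localized. One simple possibility, shown here purely as an assumption, is constant-power stereo panning driven by the direction of the sound source as seen from the listener. The function name stereo_gains, the floor-plane coordinates (x, z), and the yaw convention are hypothetical.

```python
import math


def stereo_gains(listener_pos, listener_yaw, source_pos):
    """Return (left_gain, right_gain) for a source on the floor plane.

    Positions are (x, z) pairs. listener_yaw is in radians, 0 means facing +z,
    and positive values turn toward +x (to the listener's right).
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    azimuth = math.atan2(dx, dz) - listener_yaw  # 0 = straight ahead
    pan = math.sin(azimuth)                      # -1 = full left, +1 = full right
    angle = (pan + 1.0) * math.pi / 4.0          # map to 0 .. pi/2
    # Constant-power law: left^2 + right^2 == 1 for every pan value.
    return math.cos(angle), math.sin(angle)
```

A source directly ahead yields equal gains, and a source directly to the right yields a right gain close to 1. This sketch does not distinguish front from back.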

ディスプレイ装置４は、提示部１６１０が出力する提示映像Ｖ１を表示する。ディスプレイ装置４は、ヘッドマウントディスプレイＨ１に内蔵される。ディスプレイ装置４は、画面上の領域ごとに、提示部１６１０から出力される提示映像Ｖ１を表示する。なお、ディスプレイ装置４は、例えば立体視のためのステレオ表示を行うものであってもよい。 The display device 4 displays the presentation image V1 output by the presentation unit 1610. The display device 4 is built into the head-mounted display H1. The display device 4 displays the presentation image V1 output from the presentation unit 1610 for each area on the screen. Note that the display device 4 may perform stereo display for stereoscopic vision, for example.

［撮影処理］
ここで本実施形態に係る撮影処理について説明する。撮影処理とは、ＲＧＢＤカメラＣＭ１によって被写体が撮像され、撮影部１１によって被写体の映像が他のコンテンツ提示装置１に送信される処理である。図５は、本実施形態に係る撮影処理の一例を示す図である。図５に示す撮影処理は、コンテンツ提示装置１が提示する提示映像Ｖ１の１フレーム分についての処理に対応する。なお、図５に示す撮影処理は、第１被写体映像ＰＶ１の側を例に取って説明するが、第２被写体映像ＰＶ２の側の撮影処理についても同様である。 [Shooting processing]
The shooting process according to this embodiment is described here. In the shooting process, the RGBD camera CM1 captures a subject, and the shooting unit 11 transmits the video of the subject to another content presentation device 1. FIG. 5 is a diagram showing an example of the shooting process according to this embodiment. The shooting process shown in FIG. 5 corresponds to the processing for one frame of the presentation video V1 presented by the content presentation device 1. The shooting process shown in FIG. 5 is described for the first subject video PV1 side as an example, but the shooting process on the second subject video PV2 side is the same.

ステップＳ１０：第１被写体映像切り出し部１１１は、ＲＧＢＤ撮影部３１によって撮影されたＲＧＢＤ映像を取得する。ＲＧＢＤ映像は、ＲＧＢＤ撮影部３１（ＲＧＢＤカメラＣＭ１－１）によって撮影され、ＲＧＢＤカメラＣＭ１－１の周辺の映像であって、第１被写体Ｐ１を含む。 Step S10: The first subject video clipping unit 111 acquires the RGBD video captured by the RGBD imaging unit 31. The RGBD video is captured by the RGBD imaging unit 31 (RGBD camera CM1-1), shows the surroundings of the RGBD camera CM1-1, and includes the first subject P1.

ステップＳ２０：第１被写体映像切り出し部１１１は、ＲＧＢＤ撮影部３１によって撮影されたＲＧＢＤ映像から第１被写体映像ＰＶ１を切り出す。ここで第１被写体映像切り出し部１１１は、予め設定された深さに基づいてＲＧＢＤ映像から第１被写体映像ＰＶ１を切り出す。なお、第１被写体映像切り出し部１１１は、機械学習に基いてＲＧＢＤ映像から第１被写体映像ＰＶ１を切り出してもよい。第１被写体映像切り出し部１１１は、切り出した第１被写体映像ＰＶ１を第１被写体映像供給部１１２に供給する。
ステップＳ３０：第１被写体映像供給部１１２は、第１被写体映像切り出し部１１１から供給される第１被写体映像ＰＶ１を他のコンテンツ提示装置１－２に送信する。 Step S20: The first subject video clipping unit 111 cuts out the first subject video PV1 from the RGBD video captured by the RGBD imaging unit 31. Here, the first subject video clipping unit 111 cuts out the first subject video PV1 from the RGBD video based on a preset depth. Note that the first subject video clipping unit 111 may instead cut out the first subject video PV1 from the RGBD video based on machine learning. The first subject video clipping unit 111 supplies the cut-out first subject video PV1 to the first subject video supply unit 112.
Step S30: The first subject video supply unit 112 transmits the first subject video PV1 supplied from the first subject video clipping unit 111 to the other content presentation device 1-2.
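
The depth-based cut-out of step S20 can be illustrated with the following minimal sketch. It assumes a frame represented as rows of (r, g, b, depth_m) tuples and treats a depth of 0 as an invalid measurement; both the representation and the name cut_out_subject are hypothetical and are not taken from the description.

```python
def cut_out_subject(rgbd_frame, max_depth_m):
    """Keep pixels whose depth is valid and within max_depth_m; blank the rest.

    rgbd_frame is a list of rows, each pixel a tuple (r, g, b, depth_m).
    Removed pixels are replaced by None (assumed to mean "transparent").
    """
    return [
        [px if 0.0 < px[3] <= max_depth_m else None for px in row]
        for row in rgbd_frame
    ]
```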

ステップＳ４０：第１被写体映像供給部１１２は、第１被写体位置測定部３２からＨＭＤ位置情報を取得する。ＨＭＤ位置情報は、上述したように第１被写体位置測定部３２（ヘッドマウントディスプレイＨ１－１に内蔵されるジャイロセンサー）によって測定されたヘッドマウントディスプレイＨ１－１の位置及び姿勢を示す。
ステップＳ５０：第１被写体映像供給部１１２は、第１被写体音声取得部３３から第１被写体音声情報を取得する。第１被写体音声情報は、上述したように第１被写体音声取得部３３（ヘッドマウントディスプレイＨ１－１に内蔵されるマイク）によって取得された第１被写体Ｐ１の音声を示す。
ステップＳ６０：第１被写体映像供給部１１２は、取得したＨＭＤ位置情報、及び第１被写体音声情報を他のコンテンツ提示装置１－２に送信する。 Step S40: The first subject video supply unit 112 acquires HMD position information from the first subject position measurement unit 32. As described above, the HMD position information indicates the position and attitude of the head-mounted display H1-1 measured by the first subject position measurement unit 32 (the gyro sensor built into the head-mounted display H1-1).
Step S50: The first subject video supply unit 112 acquires first subject audio information from the first subject audio acquisition unit 33. As described above, the first subject audio information indicates the audio of the first subject P1 acquired by the first subject audio acquisition unit 33 (the microphone built into the head-mounted display H1-1).
Step S60: The first subject video supply unit 112 transmits the acquired HMD position information and first subject audio information to the other content presentation device 1-2.

ステップＳ７０：撮影部１１は、撮影処理の終了条件が満たされたか否かを判定する。撮影処理の終了条件とは、例えば、撮影処理を終了する操作を撮影部１１が受け付けること、あるいはヘッドマウントディスプレイＨ１－１の電源がオフになることである。撮影部１１は、撮影処理の終了条件が満たされたと判定した場合（ステップＳ７０；ＹＥＳ）、撮影処理を終了する。一方、撮影部１１は、撮影処理の終了条件が満たされていないと判定した場合（ステップＳ７０；ＮＯ）、ステップＳ１０の処理を再度実行する。
以上で、撮影部１１は、撮影処理を終了する。 Step S70: The shooting unit 11 determines whether or not the end condition of the shooting process is satisfied. The end condition of the shooting process is, for example, that the shooting unit 11 accepts an operation to end the shooting process, or that the head-mounted display H1-1 is powered off. If the shooting unit 11 determines that the end condition is satisfied (step S70; YES), it ends the shooting process. If it determines that the end condition is not satisfied (step S70; NO), it executes the process of step S10 again.
With this, the shooting unit 11 ends the shooting process.

なお、撮影処理において、ステップＳ１０からステップＳ２０までの第１被写体映像ＰＶ１が切り出されるまでの処理と、ステップＳ４０のＨＭＤ位置情報を取得する処理と、ステップＳ５０の第１被写体音声情報を取得する処理とは、並列して実行されてもよい。また、ステップＳ３０の第１被写体映像ＰＶ１を送信する処理と、ステップＳ６０のＨＭＤ位置情報、及び第１被写体音声情報を送信する処理とは、並列してあるいは一度に実行されてもよい。第１被写体音声情報は、他のコンテンツ提示装置１－２に送信される前に、第１被写体映像ＰＶ１に含まれて送信されてもよい。 In the shooting process, the processing of steps S10 to S20 up to the cutting out of the first subject video PV1, the acquisition of the HMD position information in step S40, and the acquisition of the first subject audio information in step S50 may be executed in parallel. Likewise, the transmission of the first subject video PV1 in step S30 and the transmission of the HMD position information and the first subject audio information in step S60 may be executed in parallel or as a single transmission. The first subject audio information may also be embedded in the first subject video PV1 before it is transmitted to the other content presentation device 1-2.
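
The parallel variant described above might look like the following sketch for one frame. The three acquisition callables, the send callable, and the dictionary packet layout are placeholders invented for illustration. The description only states that the three acquisitions may run in parallel and that the two transmissions may be combined.

```python
from concurrent.futures import ThreadPoolExecutor


def capture_one_frame(grab_and_cut_video, read_hmd_pose, read_audio, send):
    """Run the three acquisitions concurrently, then send one combined packet."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        video = pool.submit(grab_and_cut_video)  # steps S10 to S20
        pose = pool.submit(read_hmd_pose)        # step S40
        audio = pool.submit(read_audio)          # step S50
        packet = {
            "video": video.result(),
            "hmd_pose": pose.result(),
            "audio": audio.result(),
        }
    send(packet)  # steps S30 and S60 performed as a single transmission
    return packet
```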

［表示処理］
次に本実施形態に係る表示処理について説明する。表示処理とは、ヘッドマウントディスプレイＨ１に内蔵されるディスプレイ装置４に提示映像Ｖ１を表示させる処理である。図６は、本実施形態に係る表示処理の一例を示す図である。図６に示す表示処理は、コンテンツ提示装置１が提示する提示映像Ｖ１の１フレーム分についての処理に対応する。なお、図６に示す表示処理は、第１被写体映像ＰＶ１の側を例に取って説明するが、第２被写体映像ＰＶ２の側の撮影処理についても同様である。 [Display Processing]
Next, the display process according to this embodiment is described. The display process causes the display device 4 built into the head-mounted display H1 to display the presentation video V1. FIG. 6 is a diagram showing an example of the display process according to this embodiment. The display process shown in FIG. 6 corresponds to the processing for one frame of the presentation video V1 presented by the content presentation device 1. The display process shown in FIG. 6 is described for the first subject video PV1 side as an example, but the same applies to the processing on the second subject video PV2 side.

ステップＳ１１０：提示部１６１０は、基準位置設定部１６２によって３次元空間Ｔ１内に設定された第１被写体基準位置情報Ｍ１、及び第２被写体基準位置情報Ｍ２をディスプレイ装置４に表示させる。 Step S110: The presentation unit 1610 causes the display device 4 to display the first subject reference position information M1 and the second subject reference position information M2 set in the three-dimensional space T1 by the reference position setting unit 162.

ステップＳ１２０：周辺映像取得部１３１は、ステレオカメラＳＣ１－１が撮影した周辺映像Ｅ１を取得する。ステレオカメラＳＣ１－１が撮影した周辺映像Ｅ１は、左側のカメラによって撮像された映像と、右側のカメラによって撮像された映像との組である。
ステップＳ１３０：提示部１６１０は、背景映像取得部１６７によって切り出された提示領域の映像をディスプレイ装置４に表示させる。ここで第１被写体映像ＰＶ１は、第１被写体映像抽出部１６６によって周辺映像Ｅ１から抽出されてディスプレイ装置４に表示される。 Step S120: The surrounding image acquisition unit 131 acquires the surrounding image E1 captured by the stereo camera SC1-1. The surrounding image E1 captured by the stereo camera SC1-1 is a set of an image captured by the left camera and an image captured by the right camera.
Step S130: The presentation unit 1610 causes the display device 4 to display the image of the presentation area cut out by the background image acquisition unit 167. Here, the first subject video PV1 is extracted from the surrounding image E1 by the first subject image extraction unit 166 and displayed on the display device 4.

ステップＳ１４０：第１被写体位置設定部１６３は、第１被写体映像ＰＶ１の３次元空間Ｔ１内の位置及び向きを第１被写体基準位置情報Ｍ１に基づいて設定する。ここで第１被写体位置設定部１６３は、レンダリングによって生成される提示映像Ｖ１において視点となる位置及び向きを調整することによって３次元空間Ｔ１における第１被写体映像ＰＶ１の位置を調整する。第１被写体映像ＰＶ１の位置が第１被写体基準位置情報Ｍ１に基づいて設定（調整）される前では、提示映像Ｖ１において視点となる位置及び向きは、ヘッドマウントディスプレイＨ１－１の仕様に基づいて設定されている初期の表示における所定の位置及び向きに設定されて提示映像Ｖ１が表示される。そのため、３次元空間Ｔ１内において第１被写体映像ＰＶ１の位置及び向きは、第１被写体基準位置情報Ｍ１が示す位置及び向きからは一般にはずれている。第１被写体位置設定部１６３は、提示映像Ｖ１において視点となる位置及び向きを第１被写体基準位置情報Ｍ１が示す位置及び向きに一致させる。これによって、第１被写体位置設定部１６３は、３次元空間Ｔ１内において、第１被写体映像ＰＶ１の位置及び向きを第１被写体基準位置情報Ｍ１が示す位置及び向きに一致させる。 Step S140: The first subject position setting unit 163 sets the position and orientation of the first subject video PV1 in the three-dimensional space T1 based on the first subject reference position information M1. Here, the first subject position setting unit 163 adjusts the position of the first subject video PV1 in the three-dimensional space T1 by adjusting the position and orientation of the viewpoint in the presentation video V1 generated by rendering. Before the position of the first subject video PV1 is set (adjusted) based on the first subject reference position information M1, the position and orientation of the viewpoint in the presentation video V1 are set to a predetermined position and orientation in the initial display set based on the specifications of the head-mounted display H1-1, and the presentation video V1 is displayed. Therefore, the position and orientation of the first subject video PV1 in the three-dimensional space T1 are generally deviated from the position and orientation indicated by the first subject reference position information M1. The first subject position setting unit 163 matches the position and orientation of the viewpoint in the presentation video V1 to the position and orientation indicated by the first subject reference position information M1. 
As a result, the first subject position setting unit 163 matches the position and orientation of the first subject video PV1 in the three-dimensional space T1 with the position and orientation indicated by the first subject reference position information M1.

ステップＳ１５０：提示部１６１０は、第２被写体映像取得部１８１によって取得された第２被写体映像ＰＶ２をディスプレイ装置４に表示させる。ここで提示部１６１０は、第２被写体映像ＰＶ２を所定未満の透過度において表示する。ここで所定未満の透過度とは、第１被写体Ｐ１にとって第２被写体映像ＰＶ２が透けていないと感じられる程度の透過度である。なお、提示部１６１０は、第２被写体映像ＰＶ２を所定以上の透過度において表示してもよい。
提示部１６１０は、第２被写体映像ＰＶ２を実写映像として表示する。実写映像として表示するとは、ＲＧＢＤカメラＣＭ１－２によって第２被写体Ｐ２が撮影された映像を、画素値をそのまま用いて質感を損なうことなく表示することである。 Step S150: The presentation unit 1610 displays the second subject video PV2 acquired by the second subject video acquisition unit 181 on the display device 4. Here, the presentation unit 1610 displays the second subject video PV2 at a transparency level less than a predetermined level. Here, the transparency level less than the predetermined level means a transparency level at which the first subject P1 does not feel that the second subject video PV2 is transparent. Note that the presentation unit 1610 may display the second subject video PV2 at a transparency level equal to or greater than a predetermined level.
The presentation unit 1610 displays the second subject video PV2 as live-action video. Displaying it as live-action video means displaying the video of the second subject P2 captured by the RGBD camera CM1-2 with its pixel values used as they are, so that the texture is not impaired.

提示部１６１０は、オクルージョン再現部１６９によって判定された第２被写体映像ＰＶ２と、提示領域に含まれる他の物体に対応する映像とのいずれが手前にあるかの判定結果に基づいて第２被写体映像ＰＶ２を表示させる。提示領域に含まれる他の物体には、第１被写体映像ＰＶ１、周辺映像Ｅ１に撮影された風景が含まれる。 The presentation unit 1610 displays the second object image PV2 based on the result of the determination by the occlusion reproduction unit 169 as to which of the second object image PV2 and an image corresponding to another object included in the presentation area is in the foreground. The other objects included in the presentation area include the first object image PV1 and the scenery captured in the surrounding image E1.

ステップＳ１６０：第２被写体位置設定部１６４は、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置及び向きを第２被写体基準位置情報Ｍ２に基づいて設定する。ここで第２被写体Ｐ２を撮影するステレオカメラＳＣ１－２は、自装置の位置から撮影される風景をみた場合の映像を、自装置の高さ方向及び左右方向の位置を撮影された映像の中心にしてそのままＲＧＢＤ映像として出力する。そのため、第２被写体位置設定部１６４は、３次元空間Ｔ１内において第２被写体映像ＰＶ２を投影する仮想的なカメラ（プロジェクタ）の位置及び向きを調整することによって、３次元空間Ｔ１内における第２被写体映像ＰＶ２の位置及び向きを調整する。以下の説明では、３次元空間Ｔ１内において第２被写体映像ＰＶ２を投影する仮想的なカメラを仮想カメラＶＣ１－１という。第２被写体映像ＰＶ２の位置が第２被写体基準位置情報Ｍ２に基づいて設定（調整）される前では、仮想カメラＶＣ１－１の位置及び向きは、ヘッドマウントディスプレイＨ１－１の仕様に基づいて設定されている初期の表示における所定の位置及び向きに設定されている。そのため、３次元空間Ｔ１内において第２被写体映像ＰＶ２の位置及び向きは、第２被写体基準位置情報Ｍ２が示す位置及び向きからは一般にはずれている。第２被写体位置設定部１６４は、３次元空間Ｔ１内において仮想カメラＶＣ１－１の位置及び向きを調整することによって、３次元空間Ｔ１内において、第２被写体映像ＰＶ２の位置及び向きを第２被写体基準位置情報Ｍ２が示す位置及び向きに一致させる。 Step S160: The second subject position setting unit 164 sets the position and orientation of the second subject video PV2 in the three-dimensional space T1 based on the second subject reference position information M2. Here, the stereo camera SC1-2 that captures the second subject P2 outputs the image of the scenery captured from the position of the device as an RGBD image with the height and left/right positions of the device as the center of the captured image. Therefore, the second subject position setting unit 164 adjusts the position and orientation of the second subject video PV2 in the three-dimensional space T1 by adjusting the position and orientation of the virtual camera (projector) that projects the second subject video PV2 in the three-dimensional space T1. In the following description, the virtual camera that projects the second subject video PV2 in the three-dimensional space T1 is called the virtual camera VC1-1. Before the position of the second subject video PV2 is set (adjusted) based on the second subject reference position information M2, the position and orientation of the virtual camera VC1-1 are set to a predetermined position and orientation in the initial display that is set based on the specifications of the head mounted display H1-1. 
Therefore, the position and orientation of the second subject video PV2 in the three-dimensional space T1 generally deviates from the position and orientation indicated by the second subject reference position information M2. The second subject position setting unit 164 adjusts the position and orientation of the virtual camera VC1-1 in the three-dimensional space T1 to match the position and orientation of the second subject video PV2 in the three-dimensional space T1 with the position and orientation indicated by the second subject reference position information M2.
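
The adjustment in step S160 amounts to finding a rotation and translation that bring the second subject video PV2 onto its reference markers. The sketch below is a simplified floor-plane version that uses only a head point and a front-direction point; restricting the problem to two dimensions, the function names, and the (x, z) coordinate convention are assumptions and do not describe how the virtual camera VC1-1 is actually controlled.

```python
import math


def yaw_of(origin, front):
    """Heading of the direction origin -> front, 0 = +z, positive toward +x."""
    return math.atan2(front[0] - origin[0], front[1] - origin[1])


def align_to_markers(head, front, marker_head, marker_front):
    """Return (yaw, tx, tz) that maps head onto marker_head and aligns headings."""
    yaw = yaw_of(marker_head, marker_front) - yaw_of(head, front)
    c, s = math.cos(yaw), math.sin(yaw)
    rotated_x = c * head[0] + s * head[1]
    rotated_z = -s * head[0] + c * head[1]
    return yaw, marker_head[0] - rotated_x, marker_head[1] - rotated_z


def apply_alignment(point, yaw, tx, tz):
    """Rotate a floor-plane point about the vertical axis, then translate it."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * point[0] + s * point[1] + tx, -s * point[0] + c * point[1] + tz)
```

Applying the returned transform to every point of the projected video moves the head onto the head marker and turns the subject to face the front marker.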

上述したように、ステップＳ１４０及びステップＳ１６０において、３次元空間Ｔ１内の第１被写体映像ＰＶ１、及び第２被写体映像ＰＶ２それぞれの位置及び向きが、第１被写体基準位置情報Ｍ１及び第２被写体基準位置情報Ｍ２に基づいてそれぞれ設定される。３次元空間Ｔ１内の第１被写体映像ＰＶ１、及び第２被写体映像ＰＶ２それぞれの位置及び向きが一度設定されると、以降のフレームにおいて、３次元空間Ｔ１内の第１被写体映像ＰＶ１、及び第２被写体映像ＰＶ２それぞれの位置及び向きの変化は、第１被写体基準位置情報Ｍ１及び第２被写体基準位置情報Ｍ２に基づいてそれぞれ設定された位置及び向きからの相対的な変化となる。コンテンツ提示システムＳでは、第１被写体基準位置情報Ｍ１、第２被写体基準位置情報Ｍ２、及び第１被写体基準位置情報Ｍ１が示す位置と第２被写体基準位置情報Ｍ２が示す位置との間の相対的な位置関係は、複数のコンテンツ提示装置１相互間において共有されている。したがって、以降のフレームにおける第１被写体基準位置情報Ｍ１及び第２被写体基準位置情報Ｍ２に基づいてそれぞれ設定された位置及び向きからの相対的な変化は、複数のコンテンツ提示装置１相互間において共有される。 As described above, in steps S140 and S160, the positions and orientations of the first subject video PV1 and the second subject video PV2 in the three-dimensional space T1 are set based on the first subject reference position information M1 and the second subject reference position information M2, respectively. Once these positions and orientations have been set, any change in the position and orientation of the first subject video PV1 or the second subject video PV2 in subsequent frames is a relative change from the position and orientation set based on the first subject reference position information M1 or the second subject reference position information M2, respectively. In the content presentation system S, the first subject reference position information M1, the second subject reference position information M2, and the relative positional relationship between the position indicated by the first subject reference position information M1 and the position indicated by the second subject reference position information M2 are shared among the plurality of content presentation devices 1. Therefore, the relative changes in subsequent frames from the positions and orientations set based on the first subject reference position information M1 and the second subject reference position information M2 are also shared among the plurality of content presentation devices 1.

ステップＳ１７０：提示部１６１０は、ディスプレイ装置４に表示されている第１被写体基準位置情報Ｍ１、及び第２被写体基準位置情報Ｍ２を非表示にする。 Step S170: The presentation unit 1610 hides the first subject reference position information M1 and the second subject reference position information M2 displayed on the display device 4.

ステップＳ１８０：提示部１６１０は、他のコンテンツ提示装置１－２から受信したＨＭＤ位置情報に基づいて、第２被写体映像ＰＶ２に含まれる音声を第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置に基づいて定位させて出力（再生）する。他のコンテンツ提示装置１－２から受信したＨＭＤ位置情報は、第２被写体Ｐ２に装着されているヘッドマウントディスプレイＨ１－２の位置を示す。第２被写体映像ＰＶ２に含まれる音声は、第２被写体映像ＰＶ２に第２被写体音声情報として含まれている。なお、提示部１６１０は、第２被写体映像ＰＶ２に含まれる音声を第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置以外の位置から出力してもよい。 Step S180: The presentation unit 1610 localizes and outputs (plays) the audio included in the second subject video PV2 based on the position of the second subject video PV2 in the three-dimensional space T1 based on the HMD position information received from the other content presentation device 1-2. The HMD position information received from the other content presentation device 1-2 indicates the position of the head-mounted display H1-2 attached to the second subject P2. The audio included in the second subject video PV2 is included in the second subject video PV2 as second subject audio information. Note that the presentation unit 1610 may output the audio included in the second subject video PV2 from a position other than the position in the three-dimensional space T1 of the second subject video PV2.

ステップＳ１９０：提示部１６１０は、操作部（不図示）に制御コマンドが入力されたか否かを判定する。操作部とは、第１被写体映像ＰＶ１側の操作部と、第２被写体映像ＰＶ２側の操作部の両方である。第１被写体映像ＰＶ１側の操作部に入力される制御コマンドは、制御コマンド取得部１４から取得される。第２被写体映像ＰＶ２側の操作部に入力される制御コマンドは、他のコンテンツ提示装置１－２からデータ共有部１７によって取得される。制御コマンド取得部１４は、制御コマンドが入力されたと判定した場合（ステップＳ１９０；ＹＥＳ）、ステップＳ２００の処理を実行する。一方、制御コマンド取得部１４は、制御コマンドが入力さていないと判定した場合（ステップＳ１９０；ＮＯ）、ステップＳ２２０の処理を実行する。 Step S190: The presentation unit 1610 determines whether or not a control command has been input to an operation unit (not shown). The operation units refer to both the operation unit on the first subject video PV1 side and the operation unit on the second subject video PV2 side. The control command input to the operation unit on the first subject video PV1 side is acquired from the control command acquisition unit 14. The control command input to the operation unit on the second subject video PV2 side is acquired by the data sharing unit 17 from another content presentation device 1-2. If the control command acquisition unit 14 determines that a control command has been input (step S190; YES), it executes the process of step S200. On the other hand, if the control command acquisition unit 14 determines that a control command has not been input (step S190; NO), it executes the process of step S220.

ステップＳ２００：データ共有部１７は、制御コマンド取得部１４が取得した制御コマンドを他のコンテンツ提示装置１－２に送信する。
ステップＳ２１０：制御コマンド取得部１４は、取得した制御コマンドに応じてコンテンツＣ１の内部状態を更新する。 Step S200: The data sharing unit 17 transmits the control command acquired by the control command acquisition unit 14 to the other content presentation device 1-2.
Step S210: The control command acquisition unit 14 updates the internal state of the content C1 in accordance with the acquired control command.
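
Steps S190 to S210 can be pictured as the small handler below. The dictionary form of a command and of the internal state of the content C1, the name handle_command, and the choice not to send a command back to the device it came from are all assumptions made for this sketch.

```python
def handle_command(command, state, send_to_peer, from_peer=False):
    """Apply a control command to the content state and share local commands."""
    if command is None:          # step S190: NO, nothing to do
        return state
    if not from_peer:
        send_to_peer(command)    # step S200: share a locally entered command
    new_state = dict(state)
    new_state.update(command)    # step S210: update the internal state
    return new_state
```

For example, a locally entered command {"playing": True} is sent to the peer once and then applied, whereas the same command received from the peer is applied without being sent again.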

ステップＳ２２０：マスク生成部１６８は、同一空間共有用マスク情報を生成する。同一空間共有用マスク情報は、第１の空間Ｒ１内に存在する被写体（第１被写体Ｐ１の体、または第１の空間Ｒ１に存在し第１被写体Ｐ１とコンテンツを共有する被写体）が存在する領域であるか否かを示す。
ステップＳ２３０：マスク生成部１６８は、ステップ２２０において生成した同一空間共有用マスク情報に、第２被写体映像ＰＶ２が存在する領域であるか否かを示す情報を追加する。 Step S220: The mask generating unit 168 generates mask information for sharing the same space. The mask information for sharing the same space indicates whether or not a region includes a subject present in the first space R1 (the body of the first subject P1, or a subject present in the first space R1 and sharing content with the first subject P1).
Step S230: The mask generating unit 168 adds, to the mask information for sharing the same space generated in step S220, information indicating whether or not each area is an area in which the second subject video PV2 exists.

ステップＳ２４０：マスク生成部１６８は、コンテンツの内部状態に応じてコンテンツＣ１が存在する領域であるか否かを示す情報を同一空間共有用マスク情報に追加する。
ステップＳ２５０：提示部１６１０は、同一空間共有用マスク情報に基づいて、コンテンツＣ１を提示映像Ｖ１に追加する。ここで提示部１６１０は、コンテンツＣ１に含まれる映像、及び音声を提示映像Ｖ１に追加する。 Step S240: The mask generating unit 168 adds information indicating whether or not the area includes the content C1 according to the internal state of the content to the mask information for sharing the same space.
Step S250: The presentation unit 1610 adds the content C1 to the presentation video V1 based on the mask information for sharing the same space. Here, the presentation unit 1610 adds the video and audio included in the content C1 to the presentation video V1.
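
The incremental construction of the mask in steps S220 to S240 can be sketched as a per-area label map. The label strings, the use of two-dimensional lists of booleans as input layers, and the rule that a later layer overwrites an earlier one are assumptions for illustration. In the embodiment, front/back relations are decided by the occlusion reproduction unit 169 rather than by layer order.

```python
def build_label_mask(local_mask, remote_mask, content_mask):
    """Label each area as "E1" (surroundings), "PV1", "PV2", or "C1"."""
    height = len(local_mask)
    width = len(local_mask[0]) if height else 0
    labels = [["E1"] * width for _ in range(height)]
    layers = (
        ("PV1", local_mask),    # step S220: subjects in the first space R1
        ("PV2", remote_mask),   # step S230: second subject video PV2
        ("C1", content_mask),   # step S240: content C1
    )
    for name, layer in layers:
        for y in range(height):
            for x in range(width):
                if layer[y][x]:
                    labels[y][x] = name
    return labels
```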

ステップＳ２６０：提示部１６１０は、生成した提示映像Ｖ１をディスプレイ装置４に出力し、表示させる。ここで提示部１６１０は、ビデオシースルー方式を用いて提示映像Ｖ１を生成する。ディスプレイ装置４は、ビデオシースルー方式を用いて提示映像Ｖ１を表示する。なお、ディスプレイ装置４は、光学シースルー方式を用いて提示映像Ｖ１を表示してもよい。 Step S260: The presentation unit 1610 outputs the generated presentation image V1 to the display device 4 for display. Here, the presentation unit 1610 generates the presentation image V1 using a video see-through method. The display device 4 displays the presentation image V1 using the video see-through method. Note that the display device 4 may display the presentation image V1 using an optical see-through method.

ステップＳ２７０：処理部１６は、表示処理の終了条件が満たされたか否かを判定する。表示処理の終了条件とは、例えば、ヘッドマウントディスプレイＨ１－１の電源がオフになることである。処理部１６は、表示処理の終了条件が満たされたと判定した場合（ステップＳ２７０；ＹＥＳ）、表示処理を終了する。一方、処理部１６は、表示処理の終了条件が満たされていないと判定した場合（ステップＳ２７０；ＮＯ）、ステップＳ１２０に戻って以降の処理を再度実行する。ここで、２回目以降の表示処理においては、ステップＳ１１０、ステップＳ１４０、ステップＳ１６０、及びステップＳ１７０の各処理は省略される。
以上で、表示部１２は表示処理を終了する。 Step S270: The processing unit 16 determines whether or not the end condition of the display process is satisfied. The end condition of the display process is, for example, that the power supply of the head mounted display H1-1 is turned off. When the processing unit 16 determines that the end condition of the display process is satisfied (step S270; YES), the processing unit 16 ends the display process. On the other hand, when the processing unit 16 determines that the end condition of the display process is not satisfied (step S270; NO), the processing unit 16 returns to step S120 and executes the subsequent processes again. Here, in the second and subsequent display processes, the processes of steps S110, S140, S160, and S170 are omitted.
With this, the display unit 12 ends the display process.
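
The control flow of the display process, including the steps that are skipped from the second iteration onward, can be summarized by the following sketch. It only lists step labels in execution order; the function name and the command_received flag are hypothetical.

```python
ONE_TIME_STEPS = {"S110", "S140", "S160", "S170"}


def steps_for_iteration(iteration, command_received=False):
    """Return the step labels executed in the given iteration (0 = first frame)."""
    steps = ["S110", "S120", "S130", "S140", "S150",
             "S160", "S170", "S180", "S190"]
    if command_received:
        steps += ["S200", "S210"]  # executed only when step S190 is YES
    steps += ["S220", "S230", "S240", "S250", "S260", "S270"]
    if iteration > 0:
        # Marker display and initial position setting are done only once.
        steps = [s for s in steps if s not in ONE_TIME_STEPS]
    return steps
```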

なお、図６に示した表示処理においては、第２被写体映像ＰＶ２の表示位置が設定された後に、コンテンツＣ１の表示位置が設定される場合の一例について説明したが、これに限られない。コンテンツＣ１の表示位置が設定された後に、第２被写体映像ＰＶ２の表示位置が設定されてもよい。 Note that, in the display process shown in FIG. 6, an example has been described in which the display position of the second object image PV2 is set and then the display position of the content C1 is set, but this is not limited thereto. The display position of the second object image PV2 may be set after the display position of the content C1 is set.

［位置設定］
ここで図７、図８を参照し、位置設定部１６１が第１被写体映像ＰＶ１、第２被写体映像ＰＶ２、及びコンテンツＣ１の３次元空間Ｔ１内のそれぞれの位置を設定する処理について説明する。図７は、本実施形態に係る位置設定処理の一例を示す図である。図７（Ａ）は、第１被写体Ｐ１の側の３次元空間Ｔ１－１における位置関係を示す。図７（Ｂ）は、第２被写体Ｐ２の側の３次元空間Ｔ１－２における位置関係を示す。３次元空間Ｔ１－１、及び３次元空間Ｔ１－２には、原点Ｏ１が予め設定されている。なお、図７（Ａ）では第１被写体Ｐ１の体の全体が示されているが、ヘッドマウントディスプレイＨ１－１に表示される提示映像Ｖ１では、第１被写体Ｐ１の体は、ステレオカメラＳＣ１－１によって撮影された部分のみがビデオシースルーに基づいて表示される。同様に、図７（Ｂ）では第２被写体Ｐ２の体の全体が示されているが、ヘッドマウントディスプレイＨ１－２に表示される提示映像Ｖ１では、第２被写体Ｐ２の体は、ステレオカメラＳＣ１－２によって撮影された部分のみがビデオシースルーに基づいて表示される。 [Position Settings]
Here, with reference to FIG. 7 and FIG. 8, the process by which the position setting unit 161 sets the respective positions of the first subject video PV1, the second subject video PV2, and the content C1 in the three-dimensional space T1 is described. FIG. 7 is a diagram showing an example of the position setting process according to this embodiment. FIG. 7(A) shows the positional relationship in the three-dimensional space T1-1 on the first subject P1 side. FIG. 7(B) shows the positional relationship in the three-dimensional space T1-2 on the second subject P2 side. An origin O1 is set in advance in each of the three-dimensional space T1-1 and the three-dimensional space T1-2. Note that although FIG. 7(A) shows the entire body of the first subject P1, in the presentation video V1 displayed on the head-mounted display H1-1, only the part of the body of the first subject P1 captured by the stereo camera SC1-1 is displayed by video see-through. Similarly, although FIG. 7(B) shows the entire body of the second subject P2, in the presentation video V1 displayed on the head-mounted display H1-2, only the part of the body of the second subject P2 captured by the stereo camera SC1-2 is displayed by video see-through.

第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３の組は、第１被写体基準位置情報Ｍ１の一例である。第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３は、原点Ｏ１に対する所定の相対位置に設定される。第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３それぞれの原点Ｏ１からの距離、及び原点Ｏ１からみた方向は予め決定されている。換言すれば、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３のそれぞれの位置を、３次元空間Ｔ１－１に予め設定された３次元座標系によって表した場合の各座標の値は予め決定されている。 The set of the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13 is an example of the first subject reference position information M1. The first subject first marker M11, the first subject second marker M12, and the first subject third marker M13 are set at a predetermined relative position with respect to the origin O1. The distance from the origin O1 and the direction seen from the origin O1 of each of the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13 are determined in advance. In other words, the values of each coordinate when the positions of the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13 are represented by a three-dimensional coordinate system set in advance in the three-dimensional space T1-1 are determined in advance.

第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３の組は、第２被写体基準位置情報Ｍ２の一例である。第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３は、原点Ｏ１に対する所定の相対位置に設定される。第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３それぞれの原点Ｏ１からの距離、及び原点Ｏ１からみた方向は予め決定されている。換言すれば、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３のそれぞれの位置を、３次元空間Ｔ１－１に予め設定された３次元座標系によって表した場合の各座標の値は予め決定されている。 The set of the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 is an example of the second subject reference position information M2. The second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 are set at a predetermined relative position with respect to the origin O1. The distance from the origin O1 and the direction seen from the origin O1 of each of the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 are determined in advance. In other words, the value of each coordinate when the position of each of the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 is represented by a three-dimensional coordinate system set in advance in the three-dimensional space T1-1 is determined in advance.

上述したように、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３は原点Ｏ１に対する所定の相対位置に設定されるため、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３は、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３に対する所定の相対位置に設定される。 As described above, the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13 are set at a predetermined relative position to the origin O1, and therefore the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 are set at a predetermined relative position to the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13.

第１被写体映像ＰＶ１は、例えば、第１被写体映像ＰＶ１の頭部、腰、第１被写体Ｐ１の正面の方向の所定の位置がそれぞれ、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３にそれぞれ一致するように表示される。ただし、ヘッドマウントディスプレイＨ１－１に表示される提示映像Ｖ１では、第１被写体Ｐ１の体のうちステレオカメラＳＣ１－１によって撮影された部分のみがビデオシースルーに基づいて第１被写体映像ＰＶ１として表示される。 The first subject video PV1 is displayed, for example, so that the head, waist, and predetermined positions in the direction of the front of the first subject P1 in the first subject video PV1 correspond to the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13, respectively. However, in the presented video V1 displayed on the head-mounted display H1-1, only the part of the body of the first subject P1 that is captured by the stereo camera SC1-1 is displayed as the first subject video PV1 based on the video see-through.

第２被写体映像ＰＶ２は、上述したように３次元空間Ｔ１－１において仮想カメラＶＣ１－１によって投影されて表示される。第２被写体映像ＰＶ２は、例えば、第２被写体映像ＰＶ２の頭部、腰、第２被写体Ｐ２の正面の方向の所定の位置がそれぞれ、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３にそれぞれ一致するように表示される。第２被写体映像ＰＶ２に第２被写体音声情報として含まれる音声は、第２被写体音声位置ＰＰ２に定位されて出力される。第２被写体音声位置ＰＰ２は、第２被写体映像ＰＶ２の頭部の位置（第２被写体第１マーカーＭ２１の位置）に基づいて設定される。第２被写体音声位置ＰＰ２は、例えば、第２被写体映像ＰＶ２の頭部の位置に設定される。あるいは第２被写体音声位置ＰＰ２は、第２被写体映像ＰＶ２の頭部の位置から高さ方向に所定の距離だけ離れた位置に設定されてもよい。第２被写体音声位置ＰＰ２は、第２被写体映像ＰＶ２の頭部の位置に基づいて設定された後、この頭部の位置の移動に連動して移動する。 The second subject video PV2 is projected and displayed by the virtual camera VC1-1 in the three-dimensional space T1-1 as described above. The second subject video PV2 is displayed so that, for example, the head, waist, and predetermined positions in the direction of the front of the second subject P2 in the second subject video PV2 correspond to the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23, respectively. The sound included in the second subject video PV2 as the second subject sound information is localized and output at the second subject sound position PP2. The second subject sound position PP2 is set based on the head position of the second subject video PV2 (the position of the second subject first marker M21). The second subject sound position PP2 is set, for example, to the head position of the second subject video PV2. Alternatively, the second subject sound position PP2 may be set to a position that is a predetermined distance away in the height direction from the head position of the second subject video PV2. The second subject sound position PP2 is set based on the head position in the second subject video PV2, and then moves in conjunction with the movement of this head position.
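The rule for the second subject sound position PP2 (placed at the head position, optionally shifted by a predetermined distance in the height direction, and then following the head) can be sketched as follows. This is only an illustrative sketch: the `Vec3` type, the function name `sound_position`, the choice of `y` as the height axis, and the numeric values are all assumptions that do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    x: float
    y: float  # assumed to be the height axis (hypothetical convention)
    z: float


def sound_position(head: Vec3, height_offset: float = 0.0) -> Vec3:
    """Return the point at which a subject's audio is localized.

    With height_offset == 0.0 the sound sits at the head position;
    otherwise it is shifted by that distance in the height direction.
    """
    return Vec3(head.x, head.y + height_offset, head.z)


# Recomputing the sound position from the latest head position makes it
# move in conjunction with the head.
head_before = Vec3(0.5, 1.5, 2.0)
head_after = Vec3(0.75, 1.5, 2.0)
pp2_before = sound_position(head_before, height_offset=0.25)
pp2_after = sound_position(head_after, height_offset=0.25)
```

The same helper would apply unchanged to the first subject sound position PP1 in the three-dimensional space T1-2.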

図７（Ｂ）に示す３次元空間Ｔ１－２においても、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、第１被写体第３マーカーＭ１３、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３は、図７（Ａ）に示す３次元空間Ｔ１－１と設定された位置と同じ位置に設定される。第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、第１被写体第３マーカーＭ１３、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３それぞれの原点Ｏ１からの距離、及び原点Ｏ１からみた方向は、３次元空間Ｔ１－１と、３次元空間Ｔ１－２との間で同じである。換言すれば、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、第１被写体第３マーカーＭ１３、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３それぞれの座標の値は、３次元空間Ｔ１－１と、３次元空間Ｔ１－２との間で同じである。第１被写体音声位置ＰＰ１は、例えば、第１被写体映像ＰＶ１の頭部の位置に設定される。あるいは第１被写体音声位置ＰＰ１は、第１被写体映像ＰＶ１の頭部の位置から高さ方向に所定の距離だけ離れた位置に設定されてもよい。第１被写体音声位置ＰＰ１は、第１被写体映像ＰＶ１の頭部の位置に基づいて設定された後、この頭部の位置の移動に連動して移動する。 Also in the three-dimensional space T1-2 shown in FIG. 7(B), the first subject first marker M11, the first subject second marker M12, the first subject third marker M13, the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 are set at the same positions as those set in the three-dimensional space T1-1 shown in FIG. 7(A). The distance from the origin O1 and the direction as viewed from the origin O1 of each of the first subject first marker M11, the first subject second marker M12, the first subject third marker M13, the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 are the same between the three-dimensional space T1-1 and the three-dimensional space T1-2. In other words, the coordinate values of the first subject first marker M11, the first subject second marker M12, the first subject third marker M13, the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23 are the same between the three-dimensional space T1-1 and the three-dimensional space T1-2. The first subject sound position PP1 is set, for example, at the position of the head in the first subject video PV1.
Alternatively, the first subject sound position PP1 may be set at a position a predetermined distance away in the height direction from the position of the head in the first subject video PV1. The first subject sound position PP1 is set based on the position of the head in the first subject video PV1, and then moves in conjunction with the movement of the position of the head.

第１被写体映像ＰＶ１は、３次元空間Ｔ１－２において仮想カメラＶＣ１－２によって投影されて表示される。図７に示す例では、第１被写体映像ＰＶ１の頭部、腰、第１被写体Ｐ１の正面の方向の所定の位置がそれぞれ、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３にそれぞれ一致されるように表示され、第２被写体映像ＰＶ２の頭部、腰、第２被写体Ｐ２の正面の方向の所定の位置がそれぞれ、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３にそれぞれ一致するように表示される場合の一例について説明したが、これに限られない。第１被写体映像ＰＶ１に含まれる部分のうち、第１被写体基準位置情報Ｍ１が示す位置に応じた所定の３つの部分が第１被写体基準位置情報Ｍ１が示す３つの位置にそれぞれ一致されるように表示されてもよい。例えば、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、及び第１被写体第３マーカーＭ１３それぞれの位置に応じて、第１被写体映像ＰＶ１の肩、腕、足の位置が、第１被写体映像ＰＶ１の第１被写体基準位置情報Ｍ１が示す位置への位置合わせに用いられてもよい。また、第１被写体基準位置情報Ｍ１が示す位置に応じて、第１被写体Ｐ１の周囲の所定の方向（正面の方向以外に、背面の方向、左右の方向など）の所定の位置が第１被写体映像ＰＶ１の第１被写体基準位置情報Ｍ１が示す位置への位置合わせに用いられてもよい。
第２被写体映像ＰＶ２の第２被写体基準位置情報Ｍ２が示す位置への位置合わせについても同様に、第２被写体映像ＰＶ２に含まれる部分のうち第２被写体基準位置情報Ｍ２が示す位置に応じた所定の３つの部分が第２被写体基準位置情報Ｍ２が示す位置に一致されるように表示されてもよい。 The first subject video PV1 is projected and displayed by the virtual camera VC1-2 in the three-dimensional space T1-2. In the example shown in FIG. 7, the head, waist, and predetermined positions in the front direction of the first subject P1 of the first subject video PV1 are displayed to match the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13, respectively, and the head, waist, and predetermined positions in the front direction of the second subject P2 of the second subject video PV2 are displayed to match the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23, respectively, but this is not limited to this. Of the parts included in the first subject video PV1, three predetermined parts according to the position indicated by the first subject reference position information M1 may be displayed to match the three positions indicated by the first subject reference position information M1, respectively. For example, the positions of the shoulders, arms, and legs of the first subject video PV1 may be used for alignment to the position indicated by the first subject reference position information M1 of the first subject video PV1, depending on the respective positions of the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13. Also, depending on the position indicated by the first subject reference position information M1, a predetermined position in a predetermined direction (besides the front direction, the back direction, the left and right directions, etc.) around the first subject P1 may be used for alignment to the position indicated by the first subject reference position information M1 of the first subject video PV1.
Similarly, with regard to aligning the second subject video PV2 to the position indicated by the second subject reference position information M2, three specific parts of the second subject video PV2 corresponding to the position indicated by the second subject reference position information M2 may be displayed so as to coincide with the positions indicated by the second subject reference position information M2.
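One possible way to compute such an alignment from three correspondences is sketched below. The specification does not state how the alignment is calculated, so this is an assumption: the sketch builds an orthonormal frame from three non-collinear points (for example head, waist, and a point in the front direction) and maps the frame of the subject video onto the frame of the markers. All function names and coordinate values are hypothetical.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def frame(head, waist, front):
    """Orthonormal basis anchored at the waist; the points must not be collinear."""
    e1 = normalize(sub(head, waist))                 # waist -> head axis
    f = sub(front, waist)
    e2 = normalize(sub(f, scale(e1, dot(f, e1))))    # front axis, orthogonal to e1
    e3 = cross(e1, e2)                               # completes the right-handed basis
    return e1, e2, e3


def make_alignment(src_pts, marker_pts):
    """Return a rigid transform taking the (head, waist, front) points of the
    subject video onto the three corresponding markers."""
    src_basis, dst_basis = frame(*src_pts), frame(*marker_pts)
    src_origin, dst_origin = src_pts[1], marker_pts[1]

    def apply(p):
        d = sub(p, src_origin)
        out = dst_origin
        for e_src, e_dst in zip(src_basis, dst_basis):
            out = add(out, scale(e_dst, dot(d, e_src)))
        return out

    return apply
```

The three points coincide with the three markers exactly only when the two triangles are congruent; otherwise the waist matches exactly and the other two points match in direction only.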

ここで、第１被写体映像ＰＶ１及び第２被写体映像ＰＶ２の位置の設定をヘッドマウントディスプレイＨ１－１に表示される映像を用いて説明する。図８は、本実施形態に係る被写体の位置を設定するためにヘッドマウントディスプレイＨ１－１に表示される映像の一例を示す図である。図８に示す例では、第１被写体Ｐ１は第１の空間Ｒ１においてヘッドマウントディスプレイＨ１－１を頭部に装着した状態で椅子ＯＢ１に座っている。第２被写体Ｐ２は、第２の空間Ｒ２において椅子に座っている。図８に示す例では、第２被写体映像ＰＶ２には、第２被写体Ｐ２の映像とともに第２被写体Ｐ２が座っている椅子ＯＢ２の映像が含まれる。 Here, the setting of the positions of the first subject image PV1 and the second subject image PV2 will be explained using images displayed on the head mounted display H1-1. FIG. 8 is a diagram showing an example of an image displayed on the head mounted display H1-1 to set the position of the subject according to this embodiment. In the example shown in FIG. 8, the first subject P1 is sitting on a chair OB1 in the first space R1 with the head mounted display H1-1 attached to his head. The second subject P2 is sitting on a chair in the second space R2. In the example shown in FIG. 8, the second subject image PV2 includes an image of the chair OB2 on which the second subject P2 is sitting, as well as an image of the second subject P2.

図８（Ａ）は、第２被写体映像ＰＶ２の位置が、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３に対してずれている状態を示す。図８（Ａ）では、第２被写体映像ＰＶ２における第２被写体Ｐ２の腰の位置（つまり、椅子ＯＢ２の座面の位置）と、第２被写体第２マーカーＭ２２の位置とがずれている。一方、第１被写体Ｐ１が座っている椅子ＯＢ１の座面の位置と、第１被写体第２マーカーＭ１２の位置とは一致しており、第１被写体映像ＰＶ１の位置は、第１被写体Ｐ１の位置が、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、第１被写体第３マーカーＭ１３に対してずれていない。 Figure 8 (A) shows a state in which the position of the second subject video PV2 is misaligned with respect to the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23. In Figure 8 (A), the position of the waist of the second subject P2 in the second subject video PV2 (i.e., the position of the seat of the chair OB2) is misaligned with the position of the second subject second marker M22. Meanwhile, the position of the seat of the chair OB1 on which the first subject P1 is sitting matches the position of the first subject second marker M12, and the position of the first subject P1 in the first subject video PV1 is not misaligned with respect to the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13.

第２被写体映像ＰＶ２の位置の調整は、上下方向、前後方向、左右方向について行われる。第２被写体映像ＰＶ２に含まれる第２被写体Ｐ２の足の位置は、３次元空間Ｔ１の底面の位置に一致するように上下方向の調整が行われる。つまり、第２被写体位置設定部１６４は、第２被写体基準位置情報Ｍ２に基づいて第２被写体映像ＰＶ２に含まれる第２被写体Ｐ２の足の位置を３次元空間Ｔ１の底面の位置に一致させて、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置を設定する。なお、第２被写体位置設定部１６４は、第２被写体Ｐ２の足の位置を３次元空間Ｔ１の底面の位置以外の位置に一致させて第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置を設定してもよい。 The position of the second subject video PV2 is adjusted in the up-down, front-back, and left-right directions. The position of the feet of the second subject P2 included in the second subject video PV2 is adjusted in the up-down direction so as to match the position of the bottom surface of the three-dimensional space T1. In other words, the second subject position setting unit 164 sets the position of the second subject video PV2 in the three-dimensional space T1 by matching the position of the feet of the second subject P2 included in the second subject video PV2 to the position of the bottom surface of the three-dimensional space T1 based on the second subject reference position information M2. Note that the second subject position setting unit 164 may set the position of the second subject video PV2 in the three-dimensional space T1 by matching the position of the feet of the second subject P2 to a position other than the position of the bottom surface of the three-dimensional space T1.
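The up-down adjustment described above amounts to a single vertical offset. A minimal sketch, assuming the second subject video is available as a list of 3D points with `y` as the height axis and that the lowest point corresponds to the feet (both assumptions; the names are hypothetical):

```python
def vertical_offset(feet_height, floor_height=0.0):
    """Offset that moves the feet onto the floor (bottom surface) height."""
    return floor_height - feet_height


def shift_vertically(points, dy):
    """Apply the same vertical offset to every point of the subject video."""
    return [(x, y + dy, z) for (x, y, z) in points]


# Hypothetical points of PV2: feet at height 0.25, head at height 1.75.
pv2_points = [(0.0, 0.25, 1.0), (0.0, 1.75, 1.0)]
feet = min(y for (_, y, _) in pv2_points)
aligned = shift_vertically(pv2_points, vertical_offset(feet))
```

Passing a different `floor_height` covers the variant in which the feet are matched to a position other than the bottom surface.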

第２被写体映像ＰＶ２の位置が、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３に対して調整されると、図８（Ｃ）に示す状態が得られる。図８（Ｃ）では、第２被写体映像ＰＶ２における第２被写体Ｐ２の腰の位置（つまり、椅子ＯＢ２の座面の位置）と、第２被写体第２マーカーＭ２２の位置とが一致している。 When the position of the second subject video PV2 is adjusted relative to the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23, the state shown in FIG. 8(C) is obtained. In FIG. 8(C), the position of the waist of the second subject P2 in the second subject video PV2 (i.e., the position of the seat of the chair OB2) coincides with the position of the second subject second marker M22.

一方、図８（Ｂ）では、第１被写体Ｐ１の位置が、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、第１被写体第３マーカーＭ１３に対してずれている状態を示す。図８（Ｂ）では、第１被写体Ｐ１が座っている椅子ＯＢ１の座面の位置と、第１被写体第２マーカーＭ１２の位置とがずれている。一方、第２被写体映像ＰＶ２における第２被写体Ｐ２の腰の位置（つまり、椅子ＯＢ２の座面の位置）と、第２被写体第２マーカーＭ２２の位置とは一致しており、第２被写体映像ＰＶ２の位置が、第２被写体第１マーカーＭ２１、第２被写体第２マーカーＭ２２、及び第２被写体第３マーカーＭ２３に対してずれていない。 On the other hand, FIG. 8(B) shows a state in which the position of the first subject P1 is misaligned with respect to the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13. In FIG. 8(B), the position of the seat of the chair OB1 on which the first subject P1 is sitting is misaligned with the position of the first subject second marker M12. On the other hand, the position of the waist of the second subject P2 in the second subject video PV2 (i.e., the position of the seat of the chair OB2) is consistent with the position of the second subject second marker M22, and the position of the second subject video PV2 is not misaligned with respect to the second subject first marker M21, the second subject second marker M22, and the second subject third marker M23.

第１被写体映像ＰＶ１の位置の調整は、前後方向、左右方向について行われる。ヘッドマウントディスプレイＨ１－１に表示される映像において、３次元空間Ｔ１の底面はヘッドマウントディスプレイＨ１－１の上下方向の位置に基づいて予め調整される。ヘッドマウントディスプレイＨ１－１の上下方向の位置と、正面の方向とがそれぞれ、３次元空間Ｔ１の底面の位置と、水平方向の位置とにそれぞれ一致するように予め調整されている場合には、第１被写体映像ＰＶ１の上下方向の位置の調整は行われない。ヘッドマウントディスプレイＨ１－１の上下方向の位置と、正面の方向とがそれぞれ、３次元空間Ｔ１の底面の位置と、水平方向の位置とにそれぞれ一致するように予め調整されていない場合には、第１被写体映像ＰＶ１の上下方向の位置の調整が行われる。 The position of the first object image PV1 is adjusted in the front-back and left-right directions. In the image displayed on the head mounted display H1-1, the bottom surface of the three-dimensional space T1 is pre-adjusted based on the up-down position of the head mounted display H1-1. If the up-down position and front direction of the head mounted display H1-1 have been pre-adjusted to match the bottom surface position and horizontal position of the three-dimensional space T1, respectively, then the up-down position of the first object image PV1 is not adjusted. If the up-down position and front direction of the head mounted display H1-1 have not been pre-adjusted to match the bottom surface position and horizontal position of the three-dimensional space T1, respectively, then the up-down position of the first object image PV1 is adjusted.

第１被写体映像ＰＶ１の位置が、第１被写体第１マーカーＭ１１、第１被写体第２マーカーＭ１２、第１被写体第３マーカーＭ１３に対して調整されると、図８（Ｃ）に示す状態が得られる。図８（Ｃ）では、第１被写体Ｐ１が座っている椅子ＯＢ１の座面の位置と、第１被写体第２マーカーＭ１２の位置とが一致している。
なお、第１被写体映像ＰＶ１、あるいは第２被写体映像ＰＶ２の位置の調整が完了すると、各マーカーは非表示の状態にされる。 When the position of the first subject image PV1 is adjusted relative to the first subject first marker M11, the first subject second marker M12, and the first subject third marker M13, the state shown in Fig. 8C is obtained. In Fig. 8C, the position of the seat of the chair OB1 on which the first subject P1 is sitting coincides with the position of the first subject second marker M12.
Note that once the position adjustment of the first subject video PV1 or the second subject video PV2 is complete, each marker is hidden.

ここで第２被写体映像ＰＶ２は、第２被写体Ｐ２がＲＧＢＤカメラＣＭ１－２に一方向から撮影されて得られる映像である。第２被写体映像ＰＶ２における第２被写体Ｐ２の上下方向の位置は、撮影部１１によって調整される。これは、ＲＧＢＤカメラＣＭ１－２の上下方向の位置（高さ）が第２被写体Ｐ２の目線と同程度の上下方向の位置（高さ）にあることが好ましいためである。 The second subject video PV2 is an image obtained by capturing the second subject P2 from one direction by the RGBD camera CM1-2. The vertical position of the second subject P2 in the second subject video PV2 is adjusted by the shooting unit 11. This is because it is preferable that the vertical position (height) of the RGBD camera CM1-2 is at a vertical position (height) approximately the same as the line of sight of the second subject P2.

第２被写体映像ＰＶ２における第２被写体Ｐ２の上下方向に垂直な方向の位置、及び当該方向についての向き（回転の向き）は、ＲＧＢＤカメラＣＭ１－２の位置によって調整されてもよいし、表示部１２によって調整されてもよいし、ＲＧＢＤカメラＣＭ１－２の位置による調整と表示部１２による調整とが組み合わされて調整されてもよい。本実施形態では、第２被写体映像ＰＶ２における第２被写体Ｐ２の上下方向に垂直な方向についての向きが適当でない場合には、一例として、第２被写体Ｐ２を撮影するＲＧＢＤカメラＣＭ１－２の上下方向に垂直な方向の位置、及び当該方向についての向き（回転の向き）が調整される。第２被写体映像ＰＶ２における第２被写体Ｐ２の上下方向に垂直な方向の位置、及び当該方向についての向き（回転の向き）は、ＲＧＢＤカメラＣＭ１－２の位置によって調整される場合には、ＲＧＢＤカメラＣＭ１－２の撮影範囲における第２被写体映像ＰＶ２の位置あるいは向きを一定にできる。 The position of the second subject P2 in the second subject video PV2 in the direction perpendicular to the up-down direction and the orientation in that direction (rotation direction) may be adjusted by the position of the RGBD camera CM1-2, or by the display unit 12, or may be adjusted by combining the adjustment by the position of the RGBD camera CM1-2 and the adjustment by the display unit 12. In this embodiment, if the orientation of the second subject P2 in the second subject video PV2 in the direction perpendicular to the up-down direction is not appropriate, as an example, the position of the RGBD camera CM1-2 that captures the second subject P2 in the direction perpendicular to the up-down direction and the orientation in that direction (rotation direction) are adjusted. If the position of the second subject P2 in the second subject video PV2 in the direction perpendicular to the up-down direction and the orientation in that direction (rotation direction) are adjusted by the position of the RGBD camera CM1-2, the position or orientation of the second subject video PV2 in the capture range of the RGBD camera CM1-2 can be kept constant.

なお、上述したようにＲＧＢＤカメラＣＭ１－２の向きが調整される代わりに、表示部１２によって第２被写体映像ＰＶ２における第２被写体Ｐ２の上下方向に垂直な方向の位置、及び当該方向についての向き（回転の向き）が変更されてもよい。その場合、表示部１２に備えられる第２被写体位置設定部１６４は、画像処理に基づいて第２被写体映像ＰＶ２の上下方向に垂直な方向の位置、及び当該方向についての向き（回転の向き）を変更する。第２被写体位置設定部１６４は、予め取得された第２被写体Ｐ２についての複数の方向から撮影した場合の複数の画像から、適当な画像が選択してもよい。または、第２被写体位置設定部１６４は、機械学習などを用いた画像処理に基づいて第２被写体Ｐ２を他の方向から撮影した場合の画像を生成することによって第２被写体映像ＰＶ２における第２被写体Ｐ２の向きを変更してもよい。 In addition, instead of adjusting the orientation of the RGBD camera CM1-2 as described above, the display unit 12 may change the position of the second subject P2 in the direction perpendicular to the up-down direction in the second subject video PV2 and the orientation (rotation direction) in that direction. In this case, the second subject position setting unit 164 provided in the display unit 12 changes the position of the second subject P2 in the direction perpendicular to the up-down direction in the second subject video PV2 and the orientation (rotation direction) in that direction based on image processing. The second subject position setting unit 164 may select an appropriate image from multiple images of the second subject P2 captured in advance from multiple directions. Alternatively, the second subject position setting unit 164 may change the orientation of the second subject P2 in the second subject video PV2 by generating an image of the second subject P2 captured from another direction based on image processing using machine learning or the like.

図７に戻って位置設定の説明を続ける。
コンテンツＣ１が表示される３次元空間Ｔ１内の位置は、原点Ｏ１に対する所定の相対位置に設定される。第１被写体基準位置情報Ｍ１が示す位置、及び第２被写体基準位置情報Ｍ２が示す位置は、原点Ｏ１に対する所定の相対位置に設定されているため、コンテンツＣ１が表示される位置は、第１被写体基準位置情報Ｍ１が示す位置、及び第２被写体基準位置情報Ｍ２が示す位置に対する所定の相対位置に設定されている。
コンテンツＣ１に含まれる音声は、コンテンツ音声位置ＰＣ１に定位されて再生される。コンテンツ音声位置ＰＣ１は、コンテンツＣ１が表示される位置に基づいて設定される。コンテンツ音声位置ＰＣ１は、例えば、コンテンツＣ１が表示される位置に設定される。なおコンテンツ音声位置ＰＣ１は、コンテンツＣ１が表示される位置から高さ方向に所定の距離だけ離れた位置に設定されてもよい。コンテンツ音声位置ＰＣ１は、コンテンツＣ１が表示される位置に基づいて設定された後、コンテンツＣ１が表示される位置の移動に連動して移動する。 Returning to FIG. 7, the description of the position setting will be continued.
The position in the three-dimensional space T1 where the content C1 is displayed is set to a predetermined relative position with respect to the origin O1. Since the position indicated by the first subject reference position information M1 and the position indicated by the second subject reference position information M2 are set to predetermined relative positions with respect to the origin O1, the position where the content C1 is displayed is set to a predetermined relative position with respect to the position indicated by the first subject reference position information M1 and the position indicated by the second subject reference position information M2.
The sound contained in the content C1 is localized at a content sound position PC1 and played back. The content sound position PC1 is set based on the position where the content C1 is displayed. The content sound position PC1 is set, for example, at the position where the content C1 is displayed. The content sound position PC1 may be set at a position a predetermined distance away in the height direction from the position where the content C1 is displayed. After being set based on the position where the content C1 is displayed, the content sound position PC1 moves in conjunction with the movement of the position where the content C1 is displayed.

ここで図９を参照し、コンテンツＣ１が表示される位置と、第１被写体映像ＰＶ１が表示される位置と、第２被写体映像ＰＶ２が表示される位置との関係について説明する。図９は、本実施形態に係るコンテンツＣ１、第１被写体映像ＰＶ１、及び第２被写体映像ＰＶ２間の相対的な位置関係の一例を示す図である。
図９（Ａ）は、第１被写体Ｐ１の側の３次元空間Ｔ１－１における位置関係を示す。図９（Ａ）において、位置Ａ１－１、Ａ２－１、Ａ３－１はそれぞれ、第１被写体Ｐ１の位置、第２被写体映像ＰＶ２が表示される位置、コンテンツＣ１が表示される位置を示す。位置Ａ２－１は、位置Ａ１－１と、ＲＧＢＤカメラＣＭ１－１が設置される位置との間の位置に設定される。または、ＲＧＢＤカメラＣＭ１－１が、位置Ａ１－１から位置Ａ２－１への方向に所定の距離だけ離れた位置に設置されてもよい。また、位置Ａ３－１は、位置Ａ１－１、Ａ２－１、Ａ３－１が所定の長さの３辺をもつ三角形の頂点に対応するように設定される。 Here, the relationship between the position where the content C1 is displayed, the position where the first object video PV1 is displayed, and the position where the second object video PV2 is displayed will be described with reference to Fig. 9. Fig. 9 is a diagram showing an example of the relative positional relationship between the content C1, the first object video PV1, and the second object video PV2 according to this embodiment.
FIG. 9(A) shows the positional relationship in the three-dimensional space T1-1 on the side of the first subject P1. In FIG. 9(A), positions A1-1, A2-1, and A3-1 respectively indicate the position of the first subject P1, the position where the second subject video PV2 is displayed, and the position where the content C1 is displayed. Position A2-1 is set at a position between position A1-1 and the position where the RGBD camera CM1-1 is installed. Alternatively, the RGBD camera CM1-1 may be installed at a position a predetermined distance away from position A1-1 in the direction toward position A2-1. Position A3-1 is set so that positions A1-1, A2-1, and A3-1 correspond to the vertices of a triangle having three sides of predetermined lengths.

図９（Ｂ）は、第２被写体Ｐ２の側の３次元空間Ｔ１－２における位置関係を示す。図９（Ｂ）において、位置Ａ１－２、Ａ２－２、Ａ３－２はそれぞれ、第１被写体映像ＰＶ１、第２被写体Ｐ２の位置、コンテンツＣ１が表示される位置を示す。位置Ａ１－２は、位置Ａ２－２と、ＲＧＢＤカメラＣＭ１－２が設置される位置との間の位置に設定される。または、ＲＧＢＤカメラＣＭ１－２が、位置Ａ２－２から位置Ａ１－２への方向に所定の距離だけ離れた位置に設置されてもよい。また、位置Ａ３－２は、位置Ａ１－２、Ａ２－２、Ａ３－２が所定の長さの３辺をもつ三角形の頂点に対応するように設定される。 Figure 9 (B) shows the positional relationship in the three-dimensional space T1-2 on the side of the second subject P2. In Figure 9 (B), positions A1-2, A2-2, and A3-2 respectively indicate the position of the first subject video PV1, the position of the second subject P2, and the position where the content C1 is displayed. Position A1-2 is set at a position between position A2-2 and the position where the RGBD camera CM1-2 is installed. Alternatively, the RGBD camera CM1-2 may be installed at a position a predetermined distance away from position A2-2 in the direction toward position A1-2. Furthermore, position A3-2 is set so that positions A1-2, A2-2, and A3-2 correspond to the vertices of a triangle having three sides of predetermined lengths.

ここで、３次元空間Ｔ１－１において位置Ａ３－１は、位置Ａ１－１、Ａ２－１、Ａ３－１によって形成される所定の長さの３辺をもつ三角形と、３次元空間Ｔ１－２において位置Ａ１－２、Ａ２－２、Ａ３－２によって形成される所定の長さの３辺をもつ三角形とは合同である。つまり、コンテンツ提示装置１においては、コンテンツＣ１が表示される３次元空間Ｔ１内の位置と第１被写体基準位置情報Ｍ１が示す位置と第２被写体基準位置情報Ｍ２が示す位置との間の相対的な位置関係は、他のコンテンツ提示装置１との間で共有されている。 Here, in the three-dimensional space T1-1, a triangle with three sides of a predetermined length formed by positions A1-1, A2-1, and A3-1 is congruent with a triangle with three sides of a predetermined length formed by positions A1-2, A2-2, and A3-2 in the three-dimensional space T1-2. In other words, in the content presentation device 1, the relative positional relationship between the position in the three-dimensional space T1 where the content C1 is displayed and the position indicated by the first subject reference position information M1 and the position indicated by the second subject reference position information M2 is shared with other content presentation devices 1.
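The congruence condition can be checked by comparing corresponding side lengths, since two triangles with equal corresponding sides are congruent. The following sketch uses hypothetical coordinates and function names; the specification gives no numeric values.

```python
import math


def side_lengths(a1, a2, a3):
    """Lengths of the sides A1-A2, A2-A3, and A3-A1."""
    return (math.dist(a1, a2), math.dist(a2, a3), math.dist(a3, a1))


def same_layout(tri_a, tri_b, tol=1e-6):
    """True if corresponding sides of the two triangles have equal length."""
    return all(abs(p - q) <= tol
               for p, q in zip(side_lengths(*tri_a), side_lengths(*tri_b)))


# Hypothetical positions (A1, A2, A3) in T1-1 and in T1-2.
layout_t1_1 = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 2.0))
layout_t1_2 = ((5.0, 0.0, 5.0), (5.0, 0.0, 6.0), (3.0, 0.0, 5.0))
shared = same_layout(layout_t1_1, layout_t1_2)
```

Because only distances are compared, the check holds regardless of where each device places the triangle in its own space.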

［ＡＲまたはＶＲオブジェクトの描画］
提示映像Ｖ１におけるＡＲまたはＶＲオブジェクトの描画の規則について説明する。図１０は、本実施形態に係るＡＲまたはＶＲオブジェクトの描画の一例を示す図である。上述したように、３次元空間Ｔ１は、第１被写体Ｐ１からの距離に応じて互いに区別される近距離領域ＴＮ１と、遠距離領域ＴＦ１とからなる。 [Drawing AR or VR objects]
The rules for drawing an AR or VR object in the presented image V1 will be described. Fig. 10 is a diagram showing an example of drawing an AR or VR object according to the present embodiment. As described above, the three-dimensional space T1 is composed of a close distance area TN1 and a long distance area TF1 that are distinguished from each other according to the distance from the first subject P1.

提示映像Ｖ１は、背景映像ＢＧ１と、物体映像Ｂ１とからなる。背景映像ＢＧ１は、周辺映像Ｅ１と、全天球映像Ｚ１とからなる。周辺映像Ｅ１は、近距離領域ＴＮ１における背景の映像として用いられる。周辺映像Ｅ１には、第１被写体Ｐ１が存在する周辺の例えば、床あるいは地面の映像とともに、第１被写体Ｐ１が存在する第１の空間Ｒ１に存在する様々な物体の映像が含まれる。図１０では、周辺映像Ｅ１に含まれる物体の映像として、住宅の柱の映像である映像Ｅ１０が示されている。全天球映像Ｚ１は、ＶＲオブジェクトであり、遠距離領域ＴＦ１における背景の映像として用いられる。 The presented image V1 is made up of a background image BG1 and an object image B1. The background image BG1 is made up of a surrounding image E1 and a celestial sphere image Z1. The surrounding image E1 is used as a background image in the near distance area TN1. The surrounding image E1 includes images of various objects existing in the first space R1 in which the first subject P1 exists, as well as images of the floor or ground, for example, in the vicinity of the first subject P1. In FIG. 10, an image E10, which is an image of a pillar of a house, is shown as an image of an object included in the surrounding image E1. The celestial sphere image Z1 is a VR object, and is used as a background image in the far distance area TF1.

物体映像Ｂ１に含まれる映像には、第１被写体映像ＰＶ１、第２被写体映像ＰＶ２、コンテンツＣ１、第３被写体映像ＰＶ３がある。第３被写体映像ＰＶ３は、第１の空間Ｒ１に存在する人物のうち遠距離領域ＴＦ１に存在する人物の映像である。第２被写体映像ＰＶ２、コンテンツＣ１は、ＡＲオブジェクトである。 The images included in the object image B1 include a first subject image PV1, a second subject image PV2, content C1, and a third subject image PV3. The third subject image PV3 is an image of a person who exists in the long distance area TF1 among people who exist in the first space R1. The second subject image PV2 and content C1 are AR objects.

本実施形態では、近距離領域ＴＮ１には、ステレオカメラＳＣ１によって撮影された第１被写体Ｐ１の第１の空間Ｒ１の風景の映像が表示される。遠距離領域ＴＦ１には、ＶＲの映像である全天球映像Ｚ１が最背面の背景として表示されてよい。なお、遠距離領域ＴＦ１には、全天球映像Ｚ１が表示されなくてもよい。 In this embodiment, a video of the scenery in the first space R1 of the first subject P1 captured by the stereo camera SC1 is displayed in the near field TN1. A celestial sphere video Z1, which is a VR video, may be displayed as the backmost background in the far field TF1. Note that the celestial sphere video Z1 does not necessarily have to be displayed in the far field TF1.

ここで以下の説明においては、背景映像ＢＧ１と物体映像Ｂ１とのうち全天球映像Ｚ１以外の映像を、Ａグループオブジェクトといい、全天球映像Ｚ１をＢグループオブジェクトという。
提示映像Ｖ１において、Ａグループオブジェクトはグループオブジェクトよりも手前に表示される。Ａグループオブジェクト同士は、第１被写体Ｐ１からの距離に基づいてオクルージョンが再現されて表示される。Ａグループオブジェクトは、近距離領域ＴＮ１、遠距離領域ＴＦ１、近距離領域ＴＮ１と遠距離領域ＴＦ１との境界上のいずれに表示されてもよい。ここで第１の空間Ｒ１に存在する様々な物体の映像、及び第３被写体映像ＰＶ３は、３次元空間Ｔ１において存在する位置が決まっているため、それらの位置に応じて表示される。 In the following description, the background image BG1 and the object image B1 other than the spherical image Z1 will be referred to as group A objects, and the spherical image Z1 will be referred to as group B objects.
In the presented image V1, group A objects are displayed in front of group B objects. Among group A objects, occlusion is reproduced based on their distance from the first subject P1. A group A object may be displayed in the near distance area TN1, in the far distance area TF1, or on the boundary between the near distance area TN1 and the far distance area TF1. Because the images of the various objects existing in the first space R1 and the third subject image PV3 have fixed positions in the three-dimensional space T1, they are displayed according to those positions.
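The drawing rule can be illustrated with a coarse back-to-front ordering: the group B object (the celestial sphere image) is drawn first, and group A objects are then drawn from farthest to nearest. This per-object ordering is a simplification assumed here for illustration; the specification does not state how the ordering is implemented, and the `DrawItem` type and the distances are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrawItem:
    name: str
    group: str       # "A" or "B"
    distance: float  # distance from the first subject P1 (arbitrary units)


def draw_order(items):
    """Back-to-front order: group B first, then group A from far to near."""
    return sorted(items,
                  key=lambda it: (0 if it.group == "B" else 1, -it.distance))


scene = [
    DrawItem("PV1", "A", 0.4),
    DrawItem("C1", "A", 2.0),
    DrawItem("Z1", "B", 0.0),   # distance is ignored for the backmost layer
    DrawItem("PV3", "A", 8.0),
    DrawItem("PV2", "A", 1.5),
]
order = [it.name for it in draw_order(scene)]
```

Objects drawn later overwrite earlier ones, so the nearest group A object ends up on top and the group B object always stays behind everything else.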

［オクルージョンの再現］
次に図１１及び図１２を参照し、オクルージョンの再現について説明する。図１１は、本実施形態に係る第１被写体映像ＰＶ１と第２被写体映像ＰＶ２についてのオクルージョンの再現の一例を示す図である。図１１（Ａ）では、第１被写体映像ＰＶ１として第１被写体Ｐ１の手の映像示されている。この第１被写体Ｐ１の手の位置が第２被写体映像ＰＶ２の位置よりも手前にある場合、提示映像Ｖ１において当該手が第２被写体映像ＰＶ２と重なる部分は第２被写体映像ＰＶ２よりも手前に表示される。つまり、この場合、第１被写体映像ＰＶ１は、当該手が見える態様で表示される。 [Recreating occlusion]
Next, the reproduction of occlusion will be described with reference to FIGS. 11 and 12. FIG. 11 is a diagram showing an example of the reproduction of occlusion for the first subject video PV1 and the second subject video PV2 according to this embodiment. In FIG. 11(A), an image of the hand of the first subject P1 is shown as the first subject video PV1. When the hand of the first subject P1 is nearer than the second subject video PV2, the part of the hand that overlaps the second subject video PV2 in the presented video V1 is displayed in front of the second subject video PV2. That is, in this case, the first subject video PV1 is displayed so that the hand is visible.

一方、図１１（Ｂ）では、第１被写体Ｐ１の手の位置は第２被写体映像ＰＶ２の位置よりも奥側ある。この場合、提示映像Ｖ１において当該手が第２被写体映像ＰＶ２と重なる部分は第２被写体映像ＰＶ２に隠されて表示される。つまり、この場合、第１被写体映像ＰＶ１は、当該手の一部分が見えない態様で表示される。 On the other hand, in FIG. 11(B), the position of the hand of the first subject P1 is further back than the position of the second subject image PV2. In this case, the part of the hand in the presented image V1 that overlaps with the second subject image PV2 is displayed hidden by the second subject image PV2. In other words, in this case, the first subject image PV1 is displayed in such a way that a part of the hand is not visible.
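The two cases of FIG. 11 can be illustrated with a per-pixel depth comparison on a single scanline. The specification states only that occlusion is reproduced according to distance; the per-pixel depth test, the function name, and the depth values below are assumptions made for illustration.

```python
def composite_row(layers):
    """For each pixel, return the label of the nearest layer covering it.

    layers: list of (label, depths), where depths[i] is the distance from
    the viewer at pixel i, or None if the layer does not cover that pixel.
    """
    width = len(layers[0][1])
    result = []
    for i in range(width):
        best_label, best_depth = None, float("inf")
        for label, depths in layers:
            d = depths[i]
            if d is not None and d < best_depth:
                best_label, best_depth = label, d
        result.append(best_label)
    return result


pv2 = ("PV2", [None, None, 1.5, 1.5, 1.5])

# FIG. 11(A): the hand is nearer than PV2, so it stays visible where they overlap.
hand_near = ("hand", [None, 0.5, 0.5, None, None])
row_a = composite_row([hand_near, pv2])

# FIG. 11(B): the hand is farther than PV2, so the overlapping part is hidden.
hand_far = ("hand", [None, 2.0, 2.0, None, None])
row_b = composite_row([hand_far, pv2])
```

Only pixel 2 is covered by both layers, and it is the only pixel whose result differs between the two cases.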

図１２は、本実施形態に係る第２被写体映像ＰＶ２とコンテンツについてのオクルージョンの再現の一例を示す図である。図１２では、コンテンツとしてコンテンツＣ３（エクセサイズをしているトレーナーのＡＲオブジェクト）が表示されている。第２被写体映像ＰＶ２は、コンテンツＣ３よりも手前にあるため、コンテンツＣ３は、コンテンツＣ３の一部（脚の部分）が、第２被写体映像ＰＶ２に含まれる椅子に隠れる態様で表示されている。 Figure 12 is a diagram showing an example of reproducing occlusion for the second subject video PV2 and content according to this embodiment. In Figure 12, content C3 (an AR object of a trainer doing exercises) is displayed as the content. Because the second subject video PV2 is in front of content C3, content C3 is displayed in such a way that part of content C3 (the legs) is hidden by the chair included in the second subject video PV2.

［コンテンツの例］
次に図１３から図１５を参照し、コンテンツ提示装置１によって提示されるＡＲとＶＲとを組み合わせた様々なコンテンツの例について説明する。図１３では、コンテンツとしてコンテンツＣ１３が表示されている。コンテンツＣ１３は、テレビ画面のＡＲオブジェクトである。図１３では、テレビ画面が空中に表示されている。図１４では、コンテンツとしてコンテンツＣ１４が表示されている。コンテンツＣ１４は、エクセサイズをしているトレーナーのＡＲオブジェクトである。コンテンツＣ１４は、トレーナーの３次元映像である。
図１３及び図１４では、Ｂグループオブジェクトである全天球映像Ｚ１は表示されておらず、遠距離領域ＴＦ１には、周辺映像Ｅ１が背景として表示されている。図１３及び図１４では、Ａグループオブジェクトである第１被写体映像ＰＶ１と、第２被写体映像ＰＶ２と、コンテンツとは、３次元空間Ｔ１における位置に応じてオクルージョンを再現して表示されている。 [Content example]
Next, various examples of content combining AR and VR presented by the content presentation device 1 will be described with reference to FIGS. 13 to 15. In FIG. 13, content C13 is displayed as the content. Content C13 is an AR object of a television screen. In FIG. 13, the television screen is displayed in the air. In FIG. 14, content C14 is displayed as the content. Content C14 is an AR object of a trainer performing exercises. Content C14 is a three-dimensional image of the trainer.
In FIGS. 13 and 14, the celestial sphere video Z1, which is a group B object, is not displayed, and the peripheral video E1 is displayed as a background in the far distance area TF1. In FIGS. 13 and 14, the first subject video PV1, the second subject video PV2, and the content, which are group A objects, are displayed with occlusion reproduced according to their positions in the three-dimensional space T1.

図１５では、全天球映像Ｚ１５が遠距離領域ＴＦ１に背景として表示されている。全天球映像Ｚ１５は、水族館の風景である。Ａグループオブジェクトである第１被写体映像ＰＶ１と、第２被写体映像ＰＶ２とは、Ｂグループオブジェクトである全天球映像Ｚ１５よりも手前に表示されている。Ａグループオブジェクトである第１被写体映像ＰＶ１と、第２被写体映像ＰＶ２は、３次元空間Ｔ１における位置に応じてオクルージョンを再現して表示されている。 In FIG. 15, a celestial sphere image Z15 is displayed as the background in the far distance area TF1. The celestial sphere image Z15 is a view of an aquarium. The first subject image PV1 and the second subject image PV2, which are group A objects, are displayed in front of the celestial sphere image Z15, which is a group B object. The first subject image PV1 and the second subject image PV2, which are group A objects, are displayed with occlusion reproduced according to their positions in the three-dimensional space T1.

以上に説明したように、本実施形態に係るコンテンツ提示装置１は、基準位置設定部１６２と、周辺映像取得部１３１と、第１被写体映像抽出部１６６と、背景映像取得部１６７と、第２被写体映像取得部１８１と、コンテンツ取得部１５と、第１被写体位置設定部１６３と、第２被写体位置設定部１６４と、コンテンツ位置設定部１６５と、提示部１６１０と、を備える。基準位置設定部１６２は、第１被写体Ｐ１が撮影された画像を含む映像である第１被写体映像ＰＶ１について基準となる位置及び向きを示す情報である第１被写体基準位置情報Ｍ１と、第２被写体Ｐ２が撮影された画像を含む映像である第２被写体映像ＰＶ２について基準となる位置及び向きを示す情報である第２被写体基準位置情報Ｍ２とを３次元空間Ｔ１内に設定する。周辺映像取得部１３１は、第１被写体Ｐ１を含む自装置の周辺の映像である周辺映像Ｅ１を取得する。第１被写体映像抽出部１６６は、周辺映像Ｅ１から第１被写体映像ＰＶ１を抽出する。背景映像取得部１６７は、背景として用いられる映像である背景映像ＢＧ１を取得する。第２被写体映像取得部１８１は、第２被写体映像ＰＶ２を他のコンテンツ提示装置１－２から取得する。コンテンツ取得部１５は、コンテンツＣ１を取得する。第１被写体位置設定部１６３は、第１被写体映像ＰＶ１の３次元空間Ｔ１内の位置及び向きを第１被写体基準位置情報Ｍ１に基づいて設定する。第２被写体位置設定部１６４は、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置及び向きを第２被写体基準位置情報Ｍ２に基づいて設定する。コンテンツ位置設定部１６５は、第１被写体基準位置情報Ｍ１が示す位置と、第２被写体基準位置情報Ｍ２が示す位置との間の所定の相対的位置関係に基づいてコンテンツＣ１が表示される３次元空間Ｔ１内の位置を設定する。提示部１６１０は、３次元空間Ｔ１内の位置及び向きが設定された第１被写体映像ＰＶ１と、３次元空間Ｔ１内の位置及び向きが設定された第２被写体映像ＰＶ２と、３次元空間Ｔ１内の位置が設定されたコンテンツＣ１とが、背景映像ＢＧ１を背景として３次元空間Ｔ１内に表示された映像である提示映像Ｖ１を出力する。コンテンツ提示装置１では、コンテンツＣ１が表示される３次元空間Ｔ１内の位置と第１被写体基準位置情報Ｍ１が示す位置と第２被写体基準位置情報Ｍ２が示す位置との間の相対的な位置関係、第１被写体基準位置情報Ｍ１、及び第２被写体基準位置情報Ｍ２がそれぞれ他のコンテンツ提示装置１－２との間で共有されている。 As described above, the content presentation device 1 according to the present embodiment includes the reference position setting unit 162, the surrounding image acquisition unit 131, the first subject image extraction unit 166, the background image acquisition unit 167, the second subject image acquisition unit 181, the content acquisition unit 15, the first subject position setting unit 163, the second subject position setting unit 164, the content position setting unit 165, and the presentation unit 1610.
The reference position setting unit 162 sets two pieces of information in the three-dimensional space T1: the first subject reference position information M1, which indicates a reference position and orientation for the first subject image PV1 (a video containing captured images of the first subject P1), and the second subject reference position information M2, which indicates a reference position and orientation for the second subject image PV2 (a video containing captured images of the second subject P2). The surrounding image acquisition unit 131 acquires the surrounding image E1, which is an image of the surroundings of the device itself and includes the first subject P1. The first subject image extraction unit 166 extracts the first subject image PV1 from the surrounding image E1. The background image acquisition unit 167 acquires the background image BG1, which is an image used as the background. The second subject image acquisition unit 181 acquires the second subject image PV2 from another content presentation device 1-2. The content acquisition unit 15 acquires the content C1. The first subject position setting unit 163 sets the position and orientation of the first subject image PV1 in the three-dimensional space T1 based on the first subject reference position information M1. The second subject position setting unit 164 sets the position and orientation of the second subject image PV2 in the three-dimensional space T1 based on the second subject reference position information M2. The content position setting unit 165 sets the position in the three-dimensional space T1 where the content C1 is displayed, based on a predetermined relative positional relationship between the position indicated by the first subject reference position information M1 and the position indicated by the second subject reference position information M2.
The presentation unit 1610 outputs a presentation image V1 in which the first subject image PV1, whose position and orientation are set in the three-dimensional space T1, the second subject image PV2, whose position and orientation are set in the three-dimensional space T1, and the content C1, whose position is set in the three-dimensional space T1, are displayed in the three-dimensional space T1 against the background image BG1. In the content presentation device 1, the following are each shared with the other content presentation device 1-2: the relative positional relationship among the position in the three-dimensional space T1 where the content C1 is displayed, the position indicated by the first subject reference position information M1, and the position indicated by the second subject reference position information M2; the first subject reference position information M1; and the second subject reference position information M2.
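
As an illustration of how a content position could be derived from the two shared reference positions, the following is a minimal sketch and not the method of the embodiment. It assumes that M1 and M2 are reduced to points on the floor plane, and that the "predetermined relative positional relationship" is a fixed offset from their midpoint along the horizontal direction perpendicular to the line M1-M2. The function name `content_position`, its parameters, and the sign convention of the offset are all hypothetical.

```python
import math


def content_position(m1, m2, perpendicular_offset, height):
    """Return an (x, y, z) position for the content.

    m1, m2: (x, z) floor-plane positions taken from the reference
            position information M1 and M2 (assumed representation).
    perpendicular_offset: distance from the midpoint of M1 and M2,
            measured perpendicular to the line M1-M2 (assumed rule).
    height: y coordinate at which the content is displayed.
    """
    dx, dz = m2[0] - m1[0], m2[1] - m1[1]
    length = math.hypot(dx, dz)
    if length == 0.0:
        raise ValueError("M1 and M2 must be at different positions")
    # Unit vector perpendicular to the line M1-M2 in the floor plane.
    px, pz = -dz / length, dx / length
    mid_x, mid_z = (m1[0] + m2[0]) / 2.0, (m1[1] + m2[1]) / 2.0
    return (mid_x + perpendicular_offset * px,
            height,
            mid_z + perpendicular_offset * pz)


if __name__ == "__main__":
    # Two subjects one metre apart; content two metres away from them.
    print(content_position((-0.5, 0.0), (0.5, 0.0), 2.0, 1.2))
```

Because the result depends only on M1, M2, and the offset, two devices that share those three values compute the same position, which is the property the shared relationship is meant to guarantee.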

この構成により、本実施形態に係るコンテンツ提示装置１では、異なる空間にいる人同士がコンテンツの位置を含めた互いの位置関係を矛盾なく共有できるため、異なる空間にいる人とコンテンツ（ＡＲコンテンツ、ＶＲコンテンツ、またはＡＲコンテンツとＶＲコンテンツとが混合したコンテンツであるＡＲ／ＶＲ混合コンテンツ）を同じ空間を共有しながら体験している感覚を提供できる。 With this configuration, the content presentation device 1 according to this embodiment allows people in different spaces to share each other's positional relationships, including the position of the content, without any inconsistencies, providing the sensation of experiencing content (AR content, VR content, or AR/VR mixed content that is a mixture of AR content and VR content) with people in different spaces while sharing the same space.

将来の放送コンテンツについては、ＡＲとＶＲとをシームレスに切り替えながら表示することが想定される。これを可能とし、なおかつ、実空間とバーチャルオブジェクトを同程度の品質で合成して表示できるシステムが必要とされる。そしてそのようなシステム内で、それぞれのユーザーがいずれの位置を基準点として空間を展開し、異なる空間をどのようにして矛盾なく共有するかを具体的に設計することが求められる。
コンテンツ提示装置１では、自分と遠方の人物とコンテンツの相対位置関係が矛盾することなく、同一エリアにいる人物だけでなく、物理的に同じ場所に存在しない人とも一緒に、言語外コミュニケーションも行いながら、ＡＲ／ＶＲ混合コンテンツを視聴体験することができる。ＡＲ／ＶＲ混合コンテンツは、将来の放送コンテンツとして想定されている。コンテンツ提示装置１では、ローカルエリア及びネットワークを介して家族や友人の様子を見ながら、一緒にＡＲコンテンツ、ＶＲコンテンツ、またはＡＲ／ＶＲ混合コンテンツを十分な視野角をもって視聴可能であり、実空間とバーチャル空間を同程度の品質で混合してシミュレーションできる環境を提供できる。 It is expected that future broadcast content will be displayed by seamlessly switching between AR and VR. A system that makes this possible and can display real space and virtual objects by compositing them with the same quality is required. In such a system, it is necessary to specifically design which position each user will use as the reference point to develop the space and how to share different spaces without contradiction.
With the content presentation device 1, a user can view AR/VR mixed content together with other people, both those in the same area and those who are not physically in the same place, while also communicating non-verbally, and without any contradiction in the relative positional relationship among the user, the distant person, and the content. AR/VR mixed content is envisioned as future broadcast content. With the content presentation device 1, a user can watch AR content, VR content, or AR/VR mixed content with a sufficient viewing angle together with family and friends, seeing how they are doing either locally or over a network, and the device can provide an environment in which real space and virtual space are mixed and simulated at a comparable level of quality.

また、本実施形態に係るコンテンツ提示装置１では、提示部１６１０は、コンテンツＣ１に含まれる音声を、コンテンツＣ１が表示される３次元空間Ｔ１内の位置に基づいて定位させて出力する。
この構成により、本実施形態に係るコンテンツ提示装置１では、コンテンツＣ１に含まれる音声がコンテンツＣ１が表示される３次元空間Ｔ１内の位置から聞こえてくるため、コンテンツＣ１を視聴する際の臨場感を増すことができる。 Furthermore, in the content presentation device 1 according to this embodiment, the presentation unit 1610 localizes and outputs the sound included in the content C1 based on the position in the three-dimensional space T1 where the content C1 is displayed.
With this configuration, in the content presentation device 1 of this embodiment, the sound contained in the content C1 can be heard from a position within the three-dimensional space T1 in which the content C1 is displayed, thereby increasing the sense of realism when viewing the content C1.
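
The description does not specify how the sound is localized. As one possible illustration, the sketch below uses constant-power stereo panning with simple inverse-distance attenuation, computed from the position of the sound source relative to the listener on the floor plane. The stereo output, the function name `stereo_gains`, and the parameter `ref_distance` are assumptions made for this example.

```python
import math


def stereo_gains(listener_pos, listener_yaw, source_pos, ref_distance=1.0):
    """Return (left_gain, right_gain) for a sound source.

    listener_pos, source_pos: (x, z) positions on the floor plane.
    listener_yaw: facing direction in radians; 0 faces +z and
                  positive angles turn toward +x (assumed convention).
    ref_distance: distance below which no attenuation is applied.
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dz)
    # Angle of the source relative to the facing direction.
    azimuth = math.atan2(dx, dz) - listener_yaw
    # Pan in [-1, 1]: -1 is fully left, +1 is fully right.
    pan = math.sin(azimuth)
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
    attenuation = ref_distance / max(distance, ref_distance)
    return (math.cos(angle) * attenuation, math.sin(angle) * attenuation)


if __name__ == "__main__":
    # Source two metres to the listener's right.
    print(stereo_gains((0.0, 0.0), 0.0, (2.0, 0.0)))
```

The same calculation would apply to the sound of the second subject image PV2, using the position set for PV2 in the three-dimensional space T1 as the source position.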

また、本実施形態に係るコンテンツ提示装置１では、提示部１６１０は、第２被写体映像ＰＶ２に含まれる音声を、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置に基づいて定位させて出力する。
この構成により、本実施形態に係るコンテンツ提示装置１では、第２被写体Ｐ２の音声が第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置から聞こえてくるため、異なる空間に存在する人が隣で音声を発しているかのように感じることができる。 Furthermore, in the content presentation device 1 according to this embodiment, the presentation unit 1610 localizes and outputs the sound included in the second subject image PV2 based on the position of the second subject image PV2 in the three-dimensional space T1.
With this configuration, in the content presentation device 1 of this embodiment, the voice of the second subject P2 is heard from the position of the second subject image PV2 within the three-dimensional space T1, so the user can feel as if a person in a different space were speaking right next to them.

また、本実施形態に係るコンテンツ提示装置１では、オクルージョン再現部１６９をさらに備える。オクルージョン再現部１６９は、第１被写体映像ＰＶ１の３次元空間Ｔ１内の位置と、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置とに基づいて、提示映像Ｖ１において第１被写体映像ＰＶ１と第２被写体映像ＰＶ２とのいずれが手前側にあるかを判定する。提示部１６１０は、オクルージョン再現部１６９の判定結果に基づいて提示映像Ｖ１を出力する。
この構成により、本実施形態に係るコンテンツ提示装置１では、異なる空間にいる人同士が互いの前後の位置関係を矛盾なく共有できるため、前後の位置関係に矛盾がある場合に比べてより自然に異なる空間に存在する人が隣に存在するかのように感じることができる。 Moreover, the content presentation device 1 according to this embodiment further includes an occlusion reproduction unit 169. The occlusion reproduction unit 169 determines which of the first subject image PV1 and the second subject image PV2 is in front in the presentation image V1, based on the position of the first subject image PV1 in the three-dimensional space T1 and the position of the second subject image PV2 in the three-dimensional space T1. The presentation unit 1610 outputs the presentation image V1 based on the determination result of the occlusion reproduction unit 169.
With this configuration, the content presentation device 1 according to this embodiment allows people in different spaces to share their front-to-back positional relationship without inconsistency, so a person in a different space feels more naturally present next to the user than when the front-to-back relationship is inconsistent.
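
A front/back decision of this kind can be illustrated with a distance-based draw order. The sketch below is a simplification that treats each object as a single point and sorts by distance from the viewpoint (a painter's-algorithm ordering); an actual implementation with RGBD data could instead compare depth per pixel. The function names and the dictionary representation are hypothetical.

```python
import math


def draw_order(viewpoint, objects):
    """Return object names sorted from farthest to nearest.

    viewpoint: (x, y, z) position of the viewer.
    objects: dict mapping a name to its (x, y, z) position in the
             three-dimensional space.
    """
    return sorted(objects,
                  key=lambda name: math.dist(viewpoint, objects[name]),
                  reverse=True)


def front_object(viewpoint, objects):
    """Return the name of the object nearest to the viewpoint."""
    return draw_order(viewpoint, objects)[-1]


if __name__ == "__main__":
    scene = {
        "PV1": (0.3, 1.0, 0.5),
        "PV2": (-0.5, 1.0, 1.5),
        "C1": (0.0, 1.0, 3.0),
    }
    eye = (0.0, 1.6, 0.0)
    # Drawing in this order lets nearer objects cover farther ones.
    print(draw_order(eye, scene))
```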

また、本実施形態に係るコンテンツ提示装置１では、第２被写体位置設定部１６４は、第２被写体基準位置情報Ｍ２に基づいて第２被写体映像ＰＶ２に含まれる第２被写体Ｐ２の足の位置を３次元空間Ｔ１の底面の位置に一致させて、第２被写体映像ＰＶ２の３次元空間Ｔ１内の位置を設定する。
この構成により、本実施形態に係るコンテンツ提示装置１では、異なる空間にいる人同士が互いの高さ（上下）方向の位置関係を矛盾なく共有できるため、高さ（上下）方向の位置関係に矛盾がある場合に比べてより自然に異なる空間に存在する人が隣に存在するかのように感じることができる。 Furthermore, in the content presentation device 1 according to this embodiment, the second subject position setting unit 164 sets the position of the second subject P2 included in the second subject video PV2 within the three-dimensional space T1 by matching the position of the feet of the second subject P2 included in the second subject video PV2 with the position of the bottom surface of the three-dimensional space T1 based on the second subject reference position information M2.
With this configuration, the content presentation device 1 of this embodiment allows people in different spaces to share each other's positional relationships in the height (up and down) direction without any inconsistencies, making it feel as if people in different spaces are next to each other more naturally than when there is an inconsistency in the positional relationships in the height (up and down) direction.
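
Matching the feet to the floor can be sketched as a vertical shift. The example below assumes the second subject image is available as a set of 3D points with the y axis pointing up, and that the lowest point belongs to the feet; both are assumptions for illustration, as is the function name `align_feet_to_floor`.

```python
def align_feet_to_floor(points, floor_y=0.0):
    """Shift points vertically so the lowest one lies on the floor.

    points: list of (x, y, z) tuples for the subject, y pointing up.
    floor_y: height of the bottom surface of the three-dimensional
             space (assumed to be a horizontal plane).
    """
    if not points:
        return []
    lowest_y = min(p[1] for p in points)
    shift = floor_y - lowest_y
    return [(x, y + shift, z) for (x, y, z) in points]


if __name__ == "__main__":
    # A subject captured 0.25 m above the sender's coordinate origin.
    subject = [(0.0, 0.25, 1.0), (0.0, 1.75, 1.0)]
    print(align_feet_to_floor(subject))
```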

また、本実施形態に係るコンテンツ提示装置１では、提示部１６１０は、第２被写体映像ＰＶ２を所定未満の透過度において表示する。
この構成により、本実施形態に係るコンテンツ提示装置１では、異なる空間にいる人同士が互いの外観を、透明な映像として表示される場合に比べて実際の外観に近づけて見ることできるため、異なる空間に存在する人の外観を実際の外観に近づけて当該人が隣に存在するかのように感じることができる。 Furthermore, in the content presentation device 1 according to this embodiment, the presentation unit 1610 displays the second subject image PV2 at a transparency lower than a predetermined level.
With this configuration, the content presentation device 1 of this embodiment allows people in different spaces to see each other's appearance closer to the actual appearance than when they are displayed as transparent images, so the user can feel as if the person in the different space were present next to them.
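
Displaying an image at a limited transparency can be illustrated with ordinary alpha blending. In the sketch below the threshold value 0.2, the 8-bit RGB pixel format, and the function name `blend_pixel` are assumptions; the description only states that the transparency is lower than a predetermined level.

```python
def blend_pixel(subject_rgb, background_rgb, transparency,
                max_transparency=0.2):
    """Blend one subject pixel over one background pixel.

    transparency: 0.0 is fully opaque, 1.0 is fully transparent.
    max_transparency: assumed predetermined level; the transparency
                      must be strictly lower than this value.
    """
    if not 0.0 <= transparency < max_transparency:
        raise ValueError("transparency must be in [0, max_transparency)")
    opacity = 1.0 - transparency
    return tuple(round(opacity * s + transparency * b)
                 for s, b in zip(subject_rgb, background_rgb))


if __name__ == "__main__":
    # A mostly opaque subject pixel over a black background.
    print(blend_pixel((200, 100, 50), (0, 0, 0), 0.1))
```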

また、本実施形態に係るコンテンツ提示装置１では、提示部１６１０は、ビデオシースルー方式を用いて提示映像Ｖ１を生成する。
この構成により、本実施形態に係るコンテンツ提示装置１では、提示映像Ｖ１において自分の体が自然な態様で表示されるため、異なる空間にいる人と同じ空間を共有しながら体験する感覚をより自然なものにできる。自然な態様とは、距離感、質感、応答性（ディレイ）などが実際のものに近いことである。 Moreover, in the content presentation device 1 according to this embodiment, the presentation unit 1610 generates the presentation image V1 using a video see-through method.
With this configuration, in the content presentation device 1 according to the present embodiment, one's own body is displayed in a natural manner in the presentation image V1, so that the sensation of sharing the same space with a person in a different space can be made more natural. A natural manner means that the sense of distance, texture, responsiveness (delay), etc. are close to the actual ones.

なお、データ共有部１７は、同期部の機能を有していてもよい。同期部は、提示部１６１０が出力する提示映像Ｖ１においてコンテンツＣ１、または全天球映像Ｚ１の時間方向の再生位置を、他のコンテンツ提示装置１－２との間で同期させる。コンテンツ提示装置１－１と、他のコンテンツ提示装置１－２とは、通信により、随時情報を交換できる。 The data sharing unit 17 may have the function of a synchronization unit. The synchronization unit synchronizes the playback position in the time direction of the content C1 or the spherical image Z1 in the presentation image V1 output by the presentation unit 1610 with another content presentation device 1-2. The content presentation device 1-1 and the other content presentation device 1-2 can exchange information at any time through communication.
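
One simple way to keep the playback position aligned between devices is to derive it from a shared start time. The sketch below assumes that both devices have exchanged a start time and read a common clock; clock synchronization itself is outside the sketch, and the function name `playback_position` and its parameters are hypothetical.

```python
def playback_position(shared_start_time, now, duration, loop=False):
    """Return the playback position in seconds.

    shared_start_time: start time agreed between the devices,
                       on a clock common to both (assumption).
    now: current time on the same clock.
    duration: length of the content in seconds.
    loop: if True, playback wraps around at the end.
    """
    elapsed = max(0.0, now - shared_start_time)
    if loop:
        return elapsed % duration
    return min(elapsed, duration)


if __name__ == "__main__":
    # Both devices evaluate the same expression and get the same result.
    print(playback_position(100.0, 112.5, 60.0))
```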

なお、本実施形態では、第１被写体映像切り出し部１１１が撮影部１１に備えられる場合の一例について説明したが、これに限られない。第１被写体映像切り出し部１１１は、表示部１２に備えられてもよい。その場合、第１被写体映像切り出し部１１１は、第２被写体映像切り出し部として表示部１２に備えられる。ここで第１被写体映像切り出し部１１１は、第２被写体映像切り出し部として表示部１２に備えられる場合について説明する。撮影部１１に備えられる被写体映像供給部は、ＲＧＢＤ撮影部３１によって撮影された第２被写体Ｐ２が撮影されたＲＧＢＤ映像を他のコンテンツ提示装置１に供給する。第２被写体映像取得部１８１は、他のコンテンツ提示装置１から供給されるＲＧＢＤ映像を取得する。表示部１２に備えられる第２被写体映像切り出し部は、第２被写体映像取得部１８１が取得したＲＧＢＤ映像から第２被写体映像ＰＶ２を切り出す。 In the present embodiment, an example in which the first subject image cutout unit 111 is provided in the imaging unit 11 has been described, but the present invention is not limited to this. The first subject image cutout unit 111 may be provided in the display unit 12. In this case, the first subject image cutout unit 111 is provided in the display unit 12 as a second subject image cutout unit. Here, a case in which the first subject image cutout unit 111 is provided in the display unit 12 as a second subject image cutout unit will be described. The subject image supply unit provided in the imaging unit 11 supplies the RGBD image of the second subject P2 captured by the RGBD imaging unit 31 to the other content presentation device 1. The second subject image acquisition unit 181 acquires the RGBD image provided from the other content presentation device 1. The second subject image cutout unit provided in the display unit 12 cuts out the second subject image PV2 from the RGBD image acquired by the second subject image acquisition unit 181.

なお、本実施形態では、一例として２台のコンテンツ提示装置１が同時に稼働している状況を示しているが、連携動作するコンテンツ提示装置１の数は、任意である。連携動作するコンテンツ提示装置１の数に応じて、複数人の被写体は、コンテンツの位置を含めた互いの位置関係を矛盾なく共有しながら共通のコンテンツを視聴できる。 In this embodiment, as an example, a situation in which two content presentation devices 1 are operating simultaneously is shown, but the number of content presentation devices 1 that operate in conjunction with each other is arbitrary. Depending on the number of content presentation devices 1 that operate in conjunction with each other, multiple subjects can view the same content while sharing their mutual positional relationships, including the position of the content, without any contradictions.

なお、上述した各実施形態におけるコンテンツ提示装置が有する機能の少なくとも一部をコンピューターで実現することができる。その場合、この機能を実現するためのプログラムをコンピューター読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピューターシステムに読み込ませ、実行することによって実現してもよい。なお、ここでいう「コンピューターシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピューター読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、ＵＳＢメモリー等の可搬媒体、コンピューターシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピューター読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、一時的に、動的にプログラムを保持するもの、その場合のサーバーやクライアントとなるコンピューターシステム内部の揮発性メモリーのように、一定時間プログラムを保持しているものも含んでもよい。また上記プログラムは、前述した機能の一部を実現するためのものであってもよく、さらに前述した機能をコンピューターシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよい。 At least a part of the functions of the content presentation device in each of the above-mentioned embodiments can be realized by a computer. In that case, a program for realizing the functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed to realize the functions. Note that the term "computer system" here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, DVD-ROMs, and USB memories, and to storage devices such as hard disks built into computer systems. The term "computer-readable recording medium" may also include media that hold a program temporarily and dynamically, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a certain period of time, such as a volatile memory inside a computer system that serves as a server or client in such a case. The above-mentioned program may realize only a part of the above-mentioned functions, and may also realize the above-mentioned functions in combination with a program already recorded in the computer system.

以上、図面を参照してこの発明の一実施形態について詳しく説明してきたが、具体的な構成は上述のものに限られることはなく、この発明の要旨を逸脱しない範囲内において様々な設計変更等をすることが可能である。 One embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to the above, and various design changes can be made without departing from the spirit of the present invention.

本発明は、例えば、映像コンテンツを提示するための装置や、映像コンテンツを提示するサービス等に利用することができる。但し、本発明の利用範囲はここに例示したものには限られない。 The present invention can be used, for example, in devices for presenting video content, services for presenting video content, etc. However, the scope of use of the present invention is not limited to the examples given here.

１、１－１、１－２…コンテンツ提示装置、１６２…基準位置設定部、１３１…周辺映像取得部、１６６…第１被写体映像抽出部、１６７…背景映像取得部、１８１…第２被写体映像取得部、１５…コンテンツ取得部、１６３…第１被写体位置設定部、１６４…第２被写体位置設定部、１６５…コンテンツ位置設定部、１６１０…提示部、Ｐ１…第１被写体、Ｐ２…第２被写体、ＰＶ１…第１被写体映像、ＰＶ２…第２被写体映像、Ｃ１…コンテンツ、Ｍ１…第１被写体基準位置情報、Ｍ２…第２被写体基準位置情報、Ｔ１…３次元空間、Ｅ１…周辺映像、ＢＧ１…背景映像 1, 1-1, 1-2... Content presentation device, 162... Reference position setting unit, 131... Surrounding image acquisition unit, 166... First subject image extraction unit, 167... Background image acquisition unit, 181... Second subject image acquisition unit, 15... Content acquisition unit, 163... First subject position setting unit, 164... Second subject position setting unit, 165... Content position setting unit, 1610... Presentation unit, P1... First subject, P2... Second subject, PV1... First subject image, PV2... Second subject image, C1... Content, M1... First subject reference position information, M2... Second subject reference position information, T1... Three-dimensional space, E1... Surrounding image, BG1... Background image

Claims

1. A content presentation device comprising:
a reference position setting unit that sets, in a three-dimensional space, first subject reference position information, which is information indicating a reference position and orientation for a first subject image, which is a video including a captured image of a first subject, and second subject reference position information, which is information indicating a reference position and orientation for a second subject image, which is a video including a captured image of a second subject;
a surrounding image acquisition unit that acquires a surrounding image that is an image of the surroundings of the device itself including the first subject;
a first subject image extraction unit that extracts the first subject image from the surrounding image;
a background image acquisition unit that acquires a background image that is an image to be used as a background;
a second subject image acquisition unit that acquires the second subject image from another content presentation device;
a content acquisition unit that acquires content;
a first subject position setting unit that sets a position and orientation of the first subject image in the three-dimensional space based on the first subject reference position information;
a second subject position setting unit that sets a position and orientation of the second subject image in the three-dimensional space based on the second subject reference position information;
a content position setting unit that sets a position in the three-dimensional space where the content is to be displayed based on a predetermined relative positional relationship between a position indicated by the first subject reference position information and a position indicated by the second subject reference position information; and
a presentation unit that outputs a presentation image in which the first subject image, the position and orientation of which are set in the three-dimensional space, the second subject image, the position and orientation of which are set in the three-dimensional space, and the content, the position of which is set in the three-dimensional space, are displayed in the three-dimensional space against the background image,
wherein a relative positional relationship among the position in the three-dimensional space where the content is displayed, the position indicated by the first subject reference position information, and the position indicated by the second subject reference position information, the first subject reference position information, and the second subject reference position information are each shared with the other content presentation device, and
the presentation unit generates the presentation image using a video see-through method.

2. The content presentation device according to claim 1, wherein the presentation unit localizes the sound included in the content based on the position in the three-dimensional space where the content is displayed, and outputs the localized sound.

3. The content presentation device according to claim 1, wherein the presentation unit localizes the sound included in the second subject image based on the position of the second subject image in the three-dimensional space, and outputs the localized sound.

4. The content presentation device according to claim 1, further comprising an occlusion reproduction unit that determines which of the first subject image and the second subject image is in front in the presentation image based on the position of the first subject image in the three-dimensional space and the position of the second subject image in the three-dimensional space,
wherein the presentation unit outputs the presentation image based on a determination result of the occlusion reproduction unit.

5. The content presentation device according to claim 1, wherein the second subject position setting unit sets the position of the second subject image in the three-dimensional space by matching the position of the feet of the second subject included in the second subject image with the position of the bottom surface of the three-dimensional space based on the second subject reference position information.

6. The content presentation device according to claim 1, wherein the presentation unit displays the second subject image at a transparency lower than a predetermined value.

7. A program for causing a computer to execute:
a reference position setting step of setting, in a three-dimensional space, first subject reference position information, which is information indicating a reference position and orientation for a first subject image, which is a video including a captured image of a first subject, and second subject reference position information, which is information indicating a reference position and orientation for a second subject image, which is a video including a captured image of a second subject;
a surrounding image acquisition step of acquiring a surrounding image that is an image of the surroundings of the device itself including the first subject;
a first subject image extraction step of extracting the first subject image from the surrounding image;
a background image acquisition step of acquiring a background image to be used as a background;
a second subject image acquisition step of acquiring the second subject image from another content presentation device;
a content acquisition step of acquiring content;
a first subject position setting step of setting a position and orientation of the first subject image in the three-dimensional space based on the first subject reference position information;
a second subject position setting step of setting a position and orientation of the second subject image in the three-dimensional space based on the second subject reference position information;
a content position setting step of setting a position in the three-dimensional space where the content is to be displayed based on a predetermined relative positional relationship between a position indicated by the first subject reference position information and a position indicated by the second subject reference position information; and
a presentation step of outputting a presentation image in which the first subject image, the position and orientation of which are set in the three-dimensional space, the second subject image, the position and orientation of which are set in the three-dimensional space, and the content, the position of which is set in the three-dimensional space, are displayed in the three-dimensional space against the background image,
wherein a relative positional relationship among the position in the three-dimensional space where the content is displayed, the position indicated by the first subject reference position information, and the position indicated by the second subject reference position information, the first subject reference position information, and the second subject reference position information are each shared with another computer, and
the presentation step generates the presentation image using a video see-through method.