JP7716232B2

JP7716232B2 - Image processing device, control method thereof, and program

Info

Publication number: JP7716232B2
Application number: JP2021089463A
Authority: JP
Inventors: 拓人川原
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-05-27
Filing date: 2021-05-27
Publication date: 2025-07-31
Anticipated expiration: 2041-05-27
Also published as: US20220385876A1; JP2022182119A

Description

The present invention relates to an image processing device, a control method thereof, and a program.

Recently, attention has focused on technology in which multiple cameras installed at different positions capture synchronized images from multiple viewpoints, and the resulting multi-view images are used to generate virtual viewpoint video. For example, Patent Document 1 discloses a technique in which multiple cameras are arranged to surround a subject, and the images of the subject captured by these cameras are used to generate an image from an arbitrary viewpoint. Technology that generates virtual viewpoint video from multi-view video in this way allows, for example, highlight scenes of soccer or basketball games to be viewed from various angles, giving viewers a greater sense of presence than ordinary video. It also makes it possible to create video showing artists from various angles when filming music events, for live streaming, or for music videos.

Patent Document 1: Japanese Patent Application Laid-Open No. 2008-015756

When filming music events, live broadcasts, music videos, and the like, it is common to switch between multiple videos captured simultaneously by multiple cameras. For example, a first camera captures a so-called "wide shot", ranging from a long shot that includes the subject's surroundings to a bust shot of the subject, and a second camera captures a so-called "tight shot", ranging from a bust shot to a close shot of the subject. Switching between the videos captured by the first and second cameras makes it possible to produce video covering a range of subject sizes. In this case, the first camera could be, for example, a virtual viewpoint for generating the virtual viewpoint video described above (referred to herein as a virtual camera), and the second camera could be a physical camera that captures video not used for the virtual viewpoint video (referred to herein as a real camera).

In general, a video switching device that switches between two videos to output one video changes from one video to the other instantaneously, so the picture changes abruptly at the moment of switching. This can make viewers feel uncomfortable. A known way to reduce this discomfort is to apply a video effect such as a fade-in or fade-out at the switch. However, the transition is still composed only of the video from the first camera and the video from the second camera, so an unnatural change in the video caused by the switch cannot be avoided.

One aspect of the present invention provides a technique for reducing unnatural changes in video when switching between two videos for output.

An image processing device according to one aspect of the present invention has the following configuration:
an acquisition means for acquiring information relating to a first video and a second video, at least one of which is a captured video obtained by an imaging device, the acquisition means acquiring information on a first viewpoint for obtaining the first video and information on a second viewpoint for obtaining the second video at a time corresponding to the time of the first video;
a setting means for setting, when the video to be output is switched from the first video to the second video, a period from the end of output of the first video to the start of output of the second video;
a first generating means for generating information on a virtual viewpoint for the period based on the information on the first viewpoint in the period and the information on the second viewpoint in the period;
a second generating means for generating a virtual viewpoint video for the period based on the information on the virtual viewpoint for the period; and
an output means for switching and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order,
wherein the first generating means generates the virtual viewpoint for the period based on the information on the first viewpoint, the information on the second viewpoint, and the ratio of the elapsed time from the start of the period to the total duration of the period.
An image processing device according to another aspect of the present invention has the following configuration:
an acquisition means for acquiring information relating to a first video and a second video, at least one of which is a captured video obtained by an imaging device, the acquisition means acquiring information on a first viewpoint for obtaining the first video and information on a second viewpoint for obtaining the second video at a time corresponding to the time of the first video;
a setting means for setting, when the video to be output is switched from the first video to the second video, a period from the end of output of the first video to the start of output of the second video;
a first generating means for generating information on a virtual viewpoint for the period based on the information on the first viewpoint in the period and the information on the second viewpoint in the period;
a second generating means for generating a virtual viewpoint video for the period based on the information on the virtual viewpoint for the period;
an output means for switching and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order; and
a setting means for setting a ratio in accordance with a user operation received during the period,
wherein the first generating means generates the virtual viewpoint for the period based on the information on the first viewpoint, the information on the second viewpoint, and the ratio set in accordance with the user operation.
An image processing device according to still another aspect of the present invention has the following configuration:
an acquisition means for acquiring information relating to a first video and a second video, at least one of which is a captured video obtained by an imaging device, the acquisition means acquiring information on a first viewpoint for obtaining the first video and information on a second viewpoint for obtaining the second video at a time corresponding to the time of the first video;
a setting means for setting, when the video to be output is switched from the first video to the second video, a period from the end of output of the first video to the start of output of the second video;
a first generating means for generating information on a virtual viewpoint for the period based on the information on the first viewpoint in the period and the information on the second viewpoint in the period;
a second generating means for generating a virtual viewpoint video for the period based on the information on the virtual viewpoint for the period; and
an output means for switching and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order,
wherein the first generating means generates the virtual viewpoint at each time in the period based on the information on the first viewpoint at that time and the information on the second viewpoint at that time.
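Written out, the ratio in the first aspect is as follows, where $t_s$ and $t_e$ (symbols introduced here, not in the claims) are the start and end of the period, and $\mathbf{p}_1(t)$, $\mathbf{p}_2(t)$ are the positions of the first and second viewpoints at time $t$. The second line is only an assumed example of how the ratio might weight the two viewpoints; the claims state that the virtual viewpoint is generated based on the two viewpoints and the ratio, not that the blend is linear.

$$
r(t) = \frac{t - t_s}{t_e - t_s}, \qquad t_s \le t \le t_e
$$

$$
\mathbf{p}_v(t) = \bigl(1 - r(t)\bigr)\,\mathbf{p}_1(t) + r(t)\,\mathbf{p}_2(t)
$$

Under this assumption, at $t = t_s$ the ratio is 0 and the virtual viewpoint coincides with the first viewpoint, and at $t = t_e$ it is 1 and coincides with the second. In the second aspect, $r$ would be taken from the user operation instead of being computed from elapsed time.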

According to the present invention, unnatural changes in video when switching between two videos for output are reduced.

FIG. 1 is a diagram showing an example of the overall configuration of an image processing system according to a first embodiment.
FIG. 2 is a flowchart showing a distribution video determination process according to the first embodiment.
FIG. 3 is a diagram showing a timeline for switching from a virtual viewpoint video to a real camera video.
FIGS. 4A to 4C are diagrams each showing an example of generation of virtual camera information according to the first embodiment.
FIG. 5 is a diagram showing another example of generation of virtual camera information according to the first embodiment.
FIG. 6 is a diagram illustrating an operation unit for specifying a switching ratio according to the first embodiment.
FIG. 7 is a diagram showing an example of the overall configuration of an image processing system according to a second embodiment.
FIG. 8 is a flowchart showing a distribution video determination process according to the second embodiment.
FIGS. 9A to 9C are diagrams each showing an example of generation of virtual camera information according to the second embodiment.
FIG. 10 is a block diagram showing an example of the hardware configuration of an image processing device.

Embodiments are described in detail below with reference to the attached drawings. The following embodiments do not limit the invention according to the claims. Although the embodiments describe multiple features, not all of these features are necessarily essential to the invention, and the features may be combined in any manner. In the attached drawings, identical or similar components are given the same reference numbers, and redundant descriptions are omitted.

First Embodiment
The following describes an image processing device that switches the output video from a video of a first viewpoint to a video of a second viewpoint. In the first embodiment, the first viewpoint is the viewpoint of a virtual imaging device used to generate virtual viewpoint video from multiple images captured by multiple imaging devices, and the second viewpoint is the viewpoint of a physical imaging device that captures video. That is, the video of the first viewpoint is a virtual viewpoint video, and the video of the second viewpoint is video captured by a real camera (hereinafter, real camera video). The following describes an example in which an image processing system that generates virtual viewpoint video, when switching from the virtual viewpoint video to the real camera video, generates a new virtual viewpoint video that smoothly connects the two videos.

FIG. 1 is a block diagram showing an example configuration of an image processing system that generates virtual viewpoint video according to the first embodiment. A camera group 101 consists of multiple imaging devices (hereinafter, cameras) that capture multi-viewpoint images of a shooting range for generating virtual viewpoint video. Each camera has an internal image sensor with a lens in front of it. The cameras are fixed around the shooting range, facing it. Camera control units 102 control the cameras of the camera group 101. One camera control unit 102 is provided for each camera and is connected to it by a camera control cable and a camera image output cable. The camera control units 102 are connected to one another, for example in a daisy chain, via local network cables or the like, and transmit the images from the camera group 101 to an image processing device 103 connected downstream. The network configuration connecting the camera control units 102 is not limited to a daisy chain; a star network in which each camera control unit is connected to the image processing device may be used instead.

The image processing device 103 has the function of generating and outputting virtual viewpoint video, that is, video seen from a virtual viewpoint, based on the images (multi-viewpoint images) acquired by the camera group 101. The functional configuration of the image processing device 103 is described below.

An image acquisition unit 104 acquires, from the camera control units 102, the captured images (multi-viewpoint images) acquired by the camera group 101. The image acquisition unit 104 also acquires in advance, as background images, images obtained by having the camera group 101 capture the shooting area while it contains no subject (foreground), and stores them in a background image storage unit 105. A separation unit 106 separates the subject (foreground) contained in a captured image of the shooting area from that image, for example by background subtraction. More specifically, the separation unit 106 compares the captured image with the background image acquired in advance and stored in the background image storage unit 105, and identifies the difference as the foreground, that is, the subject, thereby separating foreground from background. The separation unit 106 stores an image containing the separated foreground (hereinafter, foreground image) in a foreground image storage unit 107. The foreground/background separation method used by the separation unit 106 is not limited to background subtraction; any known method, such as one using a range (depth) image, may be used.
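Background subtraction as described above can be sketched as follows. This is a minimal illustration, not the separation unit 106 itself: it assumes grayscale images stored as lists of rows of integers, and the function names and the fixed threshold of 30 are hypothetical choices.

```python
def background_subtract(captured, background, threshold=30):
    """Return a binary foreground mask (1 = foreground) from per-pixel differences.

    Assumes same-size grayscale images given as lists of rows of ints (0-255).
    The threshold is a hypothetical tuning parameter.
    """
    if len(captured) != len(background) or any(
        len(a) != len(b) for a, b in zip(captured, background)
    ):
        raise ValueError("captured and background images must have the same size")
    return [
        [1 if abs(c - b) > threshold else 0 for c, b in zip(row_c, row_b)]
        for row_c, row_b in zip(captured, background)
    ]


def extract_foreground(captured, mask, fill=0):
    """Keep captured pixels where the mask is set and fill the rest (the foreground image)."""
    return [
        [c if m else fill for c, m in zip(row_c, row_m)]
        for row_c, row_m in zip(captured, mask)
    ]


background = [[10, 10, 10], [10, 10, 10]]
captured = [[10, 200, 12], [10, 180, 10]]
mask = background_subtract(captured, background)
print(mask)                                 # [[0, 1, 0], [0, 1, 0]]
print(extract_foreground(captured, mask))   # [[0, 200, 0], [0, 180, 0]]
```

A fixed global threshold is the simplest choice; real systems typically adapt it to lighting changes and camera noise.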

The foreground image storage unit 107 stores multiple foreground images separated by the separation unit 106 from the images captured by the camera group 101 installed around the shooting area (that is, foreground images acquired by multiple cameras, or from multiple viewpoints). A 3D model generation unit 108 acquires the foreground images from the foreground image storage unit 107 and generates a 3D model of the foreground, for example by applying the volume intersection method (visual hull) to the foreground images acquired from the multiple viewpoints. The generated foreground 3D model and its position information are stored in a 3D model storage unit 109.
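The volume intersection (visual hull) idea can be sketched as voxel carving: a voxel is kept only if it projects inside the foreground silhouette in every view. This is a minimal sketch under assumed conditions (a small cubic voxel grid and hypothetical orthographic cameras), not the 3D model generation unit 108's actual implementation, which would use calibrated perspective cameras.

```python
from itertools import product


def carve_visual_hull(silhouettes, projectors, grid_size):
    """Return the voxels whose projection lies inside every silhouette.

    silhouettes: list of sets of (u, v) foreground pixel coordinates, one per camera.
    projectors:  list of functions mapping a voxel (x, y, z) to a pixel (u, v),
                 in the same order as silhouettes.
    grid_size:   number of voxels along each axis of a cubic volume.
    """
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        # A voxel survives only if no camera sees it outside the subject's silhouette.
        if all(project(voxel) in sil for sil, project in zip(silhouettes, projectors)):
            hull.add(voxel)
    return hull


# Hypothetical orthographic cameras looking along the z, x and y axes.
projectors = [
    lambda p: (p[0], p[1]),  # front view: (x, y)
    lambda p: (p[2], p[1]),  # side view:  (z, y)
    lambda p: (p[0], p[2]),  # top view:   (x, z)
]
square = {(a, b) for a in range(2) for b in range(2)}
hull = carve_visual_hull([square, square, square], projectors, grid_size=4)
print(len(hull))  # 8: the 2x2x2 cube consistent with all three silhouettes
```

The visual hull is a conservative estimate: concavities invisible in every silhouette remain filled, which is one reason more cameras around the shooting area improve the model.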

A virtual camera information generation unit 110 generates virtual camera information in response to user operations, received from a user interface such as a joystick or other input units, that specify the position of the virtual viewpoint, the line-of-sight direction, and so on. The virtual camera information includes the position, orientation (line-of-sight direction), and angle of view (focal length) of the virtual viewpoint of the virtual viewpoint video (hereinafter also referred to as the virtual camera), together with time information. In other words, the virtual camera information generation unit 110 generates, for each time, the virtual viewpoint information needed to generate the virtual viewpoint video, in accordance with the operator's manipulation of the virtual camera through an input unit such as a joystick.
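One entry of virtual camera information could be represented as below. This is a hypothetical layout: the field names, the quaternion convention (w, x, y, z), the assumption that the camera looks along its local +z axis, and the 36 mm sensor width are all illustrative choices. The angle-of-view helper uses the standard pinhole relation between focal length and field of view.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCameraInfo:
    """Hypothetical record of one entry of virtual camera information."""
    time: float                                     # capture time in seconds (assumed unit)
    position: tuple[float, float, float]            # world coordinates
    orientation: tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    focal_length_mm: float                          # zoom setting


def line_of_sight(q):
    """Rotate the camera's local +z axis (assumed viewing direction) by quaternion q."""
    w, x, y, z = q
    return (2 * (x * z + w * y), 2 * (y * z - w * x), 1 - 2 * (x * x + y * y))


def horizontal_angle_of_view(focal_length_mm, sensor_width_mm=36.0):
    """Pinhole relation: angle = 2 * atan(sensor_width / (2 * f)), in degrees."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


cam = VirtualCameraInfo(time=0.0, position=(0.0, 1.6, -10.0),
                        orientation=(1.0, 0.0, 0.0, 0.0), focal_length_mm=18.0)
print(line_of_sight(cam.orientation))               # (0.0, 0.0, 1.0)
print(horizontal_angle_of_view(cam.focal_length_mm))  # about 90 degrees
```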

A virtual viewpoint video generation unit 111 generates virtual viewpoint video based on the time and the position, orientation, and angle of view of the virtual camera represented by the virtual camera information generated by the virtual camera information generation unit 110 or by a virtual camera information automatic generation unit 117 described later. For example, to generate the virtual viewpoint video, the virtual viewpoint video generation unit 111 acquires the foreground images for that time from the foreground image storage unit 107 and the foreground 3D model for that time from the 3D model storage unit 109, and generates a foreground image corresponding to the position, orientation, and angle of view of the virtual camera. The virtual viewpoint video generation unit 111 also acquires the background image stored in the background image storage unit 105 and a background 3D model prepared in advance, and generates a background image corresponding to the position, orientation, and angle of view of the virtual camera. The virtual viewpoint video generation unit 111 composites the generated foreground and background images and outputs the result as the virtual viewpoint video. The virtual viewpoint video is provided to a video switching unit 115 and becomes one of the candidates for the final output video.
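The final compositing step can be sketched as a per-pixel blend of the rendered foreground over the rendered background. This assumes, hypothetically, that the foreground renderer also outputs a per-pixel coverage value (alpha) in [0, 1]; the source does not specify how the two images are combined.

```python
def composite(foreground, alpha, background):
    """Blend rendered foreground over rendered background, pixel by pixel.

    alpha is a hypothetical per-pixel foreground coverage in [0, 1]
    (1 = subject only, 0 = background only). Images are grayscale lists of rows.
    """
    return [
        [a * f + (1.0 - a) * b for f, a, b in zip(row_f, row_a, row_b)]
        for row_f, row_a, row_b in zip(foreground, alpha, background)
    ]


frame = composite([[200, 200]], [[1.0, 0.5]], [[0, 100]])
print(frame)  # [[200.0, 150.0]]
```

Fractional alpha at the subject's boundary softens the edge between the two rendered layers; a binary mask would reduce this to a simple per-pixel selection.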

A real camera 112 is a camera that can capture the shooting range of the virtual camera independently of the camera group 101. The real camera 112 is not used to acquire images needed for the virtual viewpoint video; it is used, for example, to capture close-ups of a subject. In this embodiment, the term "real camera" distinguishes it both from the camera group 101, which acquires the images needed for the virtual viewpoint video, and from the virtual camera, which does not physically exist but is virtually placed at a position as if it were capturing the virtual viewpoint video. The captured video obtained by the real camera 112 is provided to the video switching unit 115 described later and becomes one of the candidates for the final output video.

A real camera information acquisition unit 113 acquires information including the position, orientation (line-of-sight direction), and angle of view (focal length) of the real camera 112. The real camera information acquisition unit 113 estimates the position and orientation of the real camera 112, for example, from where markers placed within the range over which the real camera 112 moves appear in images captured by the real camera 112. The estimation method is not limited to this; for example, the marker images may be obtained from a separate camera, attached to the real camera 112, that captures the markers for position estimation. Alternatively, without placing markers, the position and orientation of the real camera 112 may be estimated by identifying, in the images captured by the real camera 112, characteristic points whose positions are known.
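Marker-based estimation can be illustrated with a deliberately simplified case. This sketch assumes the camera's orientation and focal length (in pixels) are already known, that the camera looks along +z with no rotation, and that image coordinates are measured from the principal point; it then recovers only the camera position by linear least squares. A full system would estimate position and orientation together, for example with a Perspective-n-Point solver such as OpenCV's `solvePnP`. All function names here are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve the 3x3 linear system a @ x = b with Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("markers do not constrain the camera position")
    x = []
    for k in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][k] = b[i]
        x.append(_det3(m) / d)
    return tuple(x)


def project(point, camera_center, f):
    """Pinhole projection for a camera looking along +z with no rotation."""
    X, Y, Z = (p - c for p, c in zip(point, camera_center))
    return (f * X / Z, f * Y / Z)


def estimate_camera_position(markers, f):
    """Least-squares camera center from ((X, Y, Z), (u, v)) marker observations.

    Each observation gives two equations linear in the center (cx, cy, cz):
      f*cx - u*cz = f*X - u*Z   and   f*cy - v*cz = f*Y - v*Z
    """
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in markers:
        rows += [(f, 0.0, -u), (0.0, f, -v)]
        rhs += [f * X - u * Z, f * Y - v * Z]
    # Normal equations (A^T A) c = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(3)]
    return _solve3(ata, atb)


true_center = (1.0, 2.0, -10.0)
marker_points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 5.0)]
observations = [(p, project(p, true_center, 1000.0)) for p in marker_points]
print(estimate_camera_position(observations, 1000.0))  # approximately (1.0, 2.0, -10.0)
```

At least two markers with different image positions are needed; a single marker leaves the distance along the viewing ray undetermined, which the determinant check reports as an error.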

A video determination unit 114 selects and determines the output video from multiple output video candidates. The video determination unit 114 includes input units such as switches for selecting the video output and faders for adjusting volume and the like. It can also apply various video effects (transitions) when switching between videos. For example, it can decide to output the virtual viewpoint video, to switch from the virtual viewpoint video to the real camera video, or to apply a video effect such as a fade-in or fade-out at the switch. The video determination unit 114 transmits to the video switching unit 115 channel information specifying the selected video and information indicating the video effect to be executed at the switch. The video switching unit 115 selects a video from the candidates based on the information from the video determination unit 114 and outputs it to a video output unit 116. The video output unit 116 outputs the video supplied from the video switching unit 115 to the outside.
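A conventional dissolve transition of the kind mentioned above can be sketched as a weighted mix of two frames. This is a generic illustration with hypothetical names, assuming equal-size grayscale frames; it is not the behavior of the video switching unit 115.

```python
def dissolve(frame_a, frame_b, alpha):
    """Mix two equal-size grayscale frames: alpha=0 gives frame_a, alpha=1 gives frame_b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [
        [(1.0 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def fade_transition(frame_a, frame_b, num_frames):
    """Return the frames of a dissolve lasting num_frames frames (at least 2)."""
    if num_frames < 2:
        raise ValueError("a transition needs at least two frames")
    return [dissolve(frame_a, frame_b, i / (num_frames - 1)) for i in range(num_frames)]


print(fade_transition([[0]], [[100]], 3))  # [[[0.0]], [[50.0]], [[100.0]]]
```

Every intermediate frame is still a mix of the two source videos, with no change of viewpoint, which is the limitation of fades that this document points out.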

When the output video is switched from the virtual camera video to the real camera video, the virtual camera information automatic generation unit 117 automatically generates virtual camera information for obtaining a virtual viewpoint video that connects the videos before and after the switch. The virtual camera information generated by the virtual camera information automatic generation unit 117 serves as one of the video effects applied at a switch: when the virtual camera and the real camera differ in position, orientation (line-of-sight direction), or angle of view (focal length (zoom value)), new virtual camera information is automatically generated from the virtual camera information and the real camera information so that the video changes smoothly at the switch.
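One plausible way to generate the connecting virtual camera is to interpolate each parameter by a ratio r from 0 (virtual camera) to 1 (real camera). This sketch is an assumption, not the unit 117's documented method: it interpolates position and focal length linearly and orientation with quaternion slerp, and the `Viewpoint` type and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewpoint:
    position: tuple        # (x, y, z)
    orientation: tuple     # unit quaternion (w, x, y, z)
    focal_length: float    # e.g. in mm


def lerp(a, b, r):
    """Linear interpolation: r=0 gives a, r=1 gives b."""
    return a + (b - a) * r


def slerp(q0, q1, r):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                 # q and -q are the same rotation; take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:              # nearly identical: normalized lerp avoids dividing by ~0
        q = [lerp(a, b, r) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - r) * theta) / math.sin(theta)
    s1 = math.sin(r * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def blend_viewpoint(virtual, real, r):
    """Transition viewpoint at ratio r between the virtual and the real camera."""
    return Viewpoint(
        position=tuple(lerp(a, b, r) for a, b in zip(virtual.position, real.position)),
        orientation=slerp(virtual.orientation, real.orientation, r),
        focal_length=lerp(virtual.focal_length, real.focal_length, r),
    )


h = math.sqrt(0.5)
virtual_cam = Viewpoint((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), 24.0)
real_cam = Viewpoint((10.0, 0.0, 0.0), (h, 0.0, h, 0.0), 70.0)  # turned 90 deg about y
mid = blend_viewpoint(virtual_cam, real_cam, 0.5)
print(mid.position, mid.focal_length)  # (5.0, 0.0, 0.0) 47.0
```

Slerp keeps the rotation speed constant, which avoids the uneven turning that component-wise interpolation of angles can produce. Interpolating the angle of view instead of the focal length would give a different zoom profile; which feels more natural is a design choice.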

Next, the hardware configuration of the image processing device 103 that realizes the functional configuration described above will be described with reference to FIG. 10. The image processing device 103 has a CPU (central processing unit) 1001, a ROM (read-only memory) 1002, a RAM (random access memory) 1003, an auxiliary storage device 1004, a display unit 1005, an operation unit 1006, a communication I/F 1007, and a bus 1018.

The CPU 1001 realizes each function of the image processing device 103 shown in FIG. 1 by controlling the entire image processing device 103 using computer programs and data stored in the ROM 1002 and the RAM 1003. The image processing device 103 may have one or more pieces of dedicated hardware separate from the CPU 1001, and the dedicated hardware may execute at least part of the processing performed by the CPU 1001. Examples of dedicated hardware include an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array), and a DSP (digital signal processor). The ROM 1002 stores programs and the like that do not need to be changed. The RAM 1003 temporarily stores programs and data supplied from the auxiliary storage device 1004, as well as data supplied from outside via the communication I/F 1007. The auxiliary storage device 1004 consists of, for example, a hard disk drive, and stores various data such as image data and audio data.

The display unit 1005 consists of, for example, a liquid crystal display or LEDs, and displays a GUI (graphical user interface) through which the user operates the image processing device 103. The operation unit 1006 consists of, for example, a keyboard, mouse, joystick, or touch panel, and inputs various instructions to the CPU 1001 in response to user operations. The communication I/F 1007 is used for communication with devices outside the image processing device 103. For example, when the image processing device 103 is connected to an external device by wire, a communication cable is connected to the communication I/F 1007. When the image processing device 103 has a function for communicating wirelessly with external devices, the communication I/F 1007 includes an antenna. The bus 1018 connects the units of the image processing device 103 and transfers information among them.

In this embodiment, the display unit 1005 and the operation unit 1006 are inside the image processing device 103, but at least one of them may be a separate device outside the image processing device 103. In that case, the CPU 1001 may operate as a display control unit that controls the display unit 1005 and as an operation control unit that controls the operation unit 1006.

Next, the processing performed by the image processing device 103 configured as described above when switching between the virtual camera video and the real camera video will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the output video determination process performed by the image processing device of the first embodiment. FIG. 2 omits the process of storing images acquired by the camera group 101 in the foreground image storage unit 107 and the process of storing the foreground images separated by the separation unit 106 in the 3D model storage unit 109.

In step S201, the virtual viewpoint video generation unit 111 acquires the virtual camera information generated by the virtual camera information generation unit 110. In step S202, the virtual viewpoint video generation unit 111 generates a virtual viewpoint video based on the acquired virtual camera information. In step S203, the video switching unit 115 acquires switching information for the output video from the video determination unit 114. The switching information indicates, for example, the channel of the output video after the switch, as determined by the video determination unit 114, and the switching time. In step S204, the video switching unit 115 determines, based on the switching information acquired in step S203, whether to stop the output video. If so (YES in step S204), the video switching unit 115 stops the video output in step S205. If not (NO in step S204), the processing proceeds to step S206.

In step S206, the video switching unit 115 uses the switching information acquired in step S203 to determine whether to switch the output video. If it determines not to switch (NO in step S206), the video switching unit 115 continues outputting the current video in step S207, and the process returns to step S201. If it determines to switch (YES in step S206), the process proceeds to step S208.

In step S208, the video switching unit 115 determines whether to automatically generate virtual camera information when switching the output video.

If it determines that virtual camera information should not be automatically generated (NO in step S208), the video switching unit 115 proceeds to step S209. There, it immediately switches the video sent to the video output unit 116 to the post-switching video indicated by the switching information. For example, the output is switched directly to the real camera video captured by the real camera 112. The video being replaced is the virtual viewpoint video that the virtual viewpoint video generation unit 111 generated for the virtual viewpoint produced by the virtual camera information generation unit 110. The process then returns to step S201.

If it determines that virtual camera information should be automatically generated (YES in step S208), the process proceeds to step S210.

The switching information from the video determination unit 114 is also provided to the virtual camera information automatic generation unit 117. In step S210, the virtual camera information automatic generation unit 117 acquires the switching conditions from this switching information. The switching conditions include, for example, transition period information indicating the period (start time and end time) during which virtual camera information is to be automatically generated. The virtual camera information automatic generation unit 117 also acquires two inputs: the virtual camera information needed to generate the virtual viewpoint, from the virtual camera information generation unit 110, and the real camera information, from the real camera information acquisition unit 113. In step S211, the virtual camera information automatic generation unit 117 generates information on a new virtual viewpoint (virtual camera information) for use while switching videos. It bases this on the virtual camera information, the real camera information, and the switching conditions. In step S212, the virtual viewpoint video generation unit 111 generates a virtual viewpoint video based on the new virtual viewpoint. The video switching unit 115 outputs the virtual viewpoint video obtained from this new virtual viewpoint and then starts outputting the selected video (in this example, the real camera video). The process then returns to step S201.
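The branch order of steps S204, S206 and S208 can be summarized in a short sketch. It is a hypothetical illustration, not the device's implementation. The `SwitchingInfo` fields and the `Action` names are assumptions that stand in for the switching information described above. Generation of the videos themselves (S201, S202, S211, S212) is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    """Outcome of one pass through the Figure 2 decisions (hypothetical names)."""
    STOP_OUTPUT = auto()            # S205: stop outputting video
    CONTINUE_OUTPUT = auto()        # S207: keep the current output
    SWITCH_IMMEDIATELY = auto()     # S209: cut directly to the new video
    SWITCH_VIA_TRANSITION = auto()  # S210-S212: insert an auto-generated virtual viewpoint video


@dataclass
class SwitchingInfo:
    """Assumed stand-in for the switching information from the video determination unit 114."""
    stop_output: bool
    switch_output: bool
    auto_generate_camera_info: bool


def decide_action(info: SwitchingInfo) -> Action:
    """Evaluate the decisions in the same order as steps S204, S206 and S208."""
    if info.stop_output:                    # S204: YES
        return Action.STOP_OUTPUT
    if not info.switch_output:              # S206: NO
        return Action.CONTINUE_OUTPUT
    if not info.auto_generate_camera_info:  # S208: NO
        return Action.SWITCH_IMMEDIATELY
    return Action.SWITCH_VIA_TRANSITION     # S208: YES


if __name__ == "__main__":
    info = SwitchingInfo(stop_output=False, switch_output=True, auto_generate_camera_info=True)
    print(decide_action(info))
```

The check order matters: a stop request takes precedence over a switch request. The automatic-generation flag is consulted only once a switch has been decided.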

Figure 3 shows a timeline of the video switching process in the first embodiment. It is used below to explain how the virtual viewpoint videos, the real camera video, and the output video relate over time when the output is switched from the virtual camera to the real camera.

- The first virtual viewpoint video 301 is generated by the virtual viewpoint video generation unit 111. It is based on the virtual camera information generated by the virtual camera information generation unit 110 (also referred to as first virtual camera information).
- The real camera video 302 is the video captured and output by the real camera 112.
- The second virtual viewpoint video 303 is also generated by the virtual viewpoint video generation unit 111. It is based on the virtual camera information generated by the virtual camera information automatic generation unit 117 (also referred to as second virtual camera information).
- The output video 304 is the video that the video switching unit 115 selects and outputs from the three candidate videos above.

The horizontal axis represents time.

The virtual viewpoint video generation unit 111 generates and outputs the first virtual viewpoint video 301. It follows the virtual camera information that the virtual camera information generation unit 110 generates in response to the operator's virtual camera operations. The real camera 112 likewise outputs the real camera video 302 that it captures, while a camera operator controls its position, orientation, zoom, and so on. At time t0, the video determination unit 114 outputs switching information 310 to the video switching unit 115. This information indicates that, (t2 - t0) seconds later, the output will be switched from the first virtual viewpoint video 301 to the real camera video 302. The switch takes (t7 - t2) seconds and uses the second virtual viewpoint video 303. In the example of Figure 3, the transition period runs from time t2, when output of the first virtual viewpoint video ends, to time t7, when output of the real camera video 302 begins.

The switching information 310 received by the video switching unit 115 gives two instructions. The output video is to be switched from the first virtual viewpoint video 301 to the real camera video 302. As a switching condition, the second virtual viewpoint video 303 is to be used for the switch. The second virtual viewpoint video 303 is generated by the virtual viewpoint video generation unit 111, based on the virtual camera information from the virtual camera information automatic generation unit 117. The switching condition also sets the period from t2 to t7 as the transition period for the switch, that is, the period during which the second virtual viewpoint video is output.

When switching information 310 containing the switching conditions described above is output from the video determination unit 114, both step S206 and step S208 of Figure 2 result in YES. On receiving these conditions, the virtual camera information automatic generation unit 117 generates a new virtual viewpoint (also referred to as the second virtual viewpoint). This viewpoint is used to create the second virtual viewpoint video 303, which bridges the switch from the first virtual viewpoint video 301 to the real camera video 302 between times t2 and t7. To create the virtual viewpoint information for this interval, the unit first acquires the virtual camera information from the virtual camera information generation unit 110 and the real camera information from the real camera information acquisition unit 113. The virtual camera information includes the position, line-of-sight direction, and angle of view of the virtual viewpoint used to generate the first virtual viewpoint video 301. The real camera information includes the position, orientation, and angle of view of the real camera 112 capturing the real camera video 302.

Until time t2, the video switching unit 115 selects the first virtual viewpoint video 301 and outputs it to the video output unit 116. At time t2, it switches the video sent to the video output unit 116 to the second virtual viewpoint video 303. At time t7, it switches that video again, to the real camera video 302. The video output unit 116 outputs whatever video it receives from the video switching unit 115.
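The Figure 3 timeline reduces to one choice: which of the three candidate videos to output, depending on where the current time falls relative to t2 and t7. The sketch below assumes half-open intervals, so the second virtual viewpoint video is output from t2 up to, but not including, t7. The function name and the returned labels are illustrative only.

```python
def select_output(t: float, t_start: float, t_end: float) -> str:
    """Return the candidate video that the switching unit outputs at time t.

    t_start and t_end bound the transition period (t2 and t7 in Figure 3).
    """
    if t_end <= t_start:
        raise ValueError("the transition period must have positive length")
    if t < t_start:
        return "first virtual viewpoint video 301"
    if t < t_end:
        return "second virtual viewpoint video 303"
    return "real camera video 302"


if __name__ == "__main__":
    for t in (1.0, 2.0, 5.0, 7.0, 9.0):
        print(t, select_output(t, t_start=2.0, t_end=7.0))
```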

Figures 4A to 4C illustrate, in detail, an example of how the virtual camera information automatic generation unit 117 automatically generates virtual camera information in the first embodiment. Figure 4A shows the positions and orientations of three cameras at each time from t0 to t10:

- the virtual camera used to generate the first virtual viewpoint video 301;
- the virtual camera used to generate the second virtual viewpoint video 303;
- the real camera 112 that captures the real camera video 302.

The description below covers the positions of the virtual and real cameras, but other camera information (orientation, zoom state, etc.) can be calculated in the same way. The timeline from t0 to t10 corresponds to the timeline shown in Figure 3.

In Figures 4A to 4C, the arrows are drawn as follows.

- A black dashed arrow shows the first virtual camera information 401, that is, the positions given by the first virtual camera's position information from the virtual camera information generation unit 110. Between t0 and t10, the first virtual camera moves continuously along this arrow.
- A hollow (white) dashed arrow shows the real camera information 403, that is, the positions given by the real camera 112's position information from the real camera information acquisition unit 113. Between t0 and t10, the real camera 112 moves continuously along this arrow.
- A solid black arrow shows the movement of the second virtual camera given by the second virtual camera information 402.

The virtual camera information automatic generation unit 117 generates the second virtual camera information 402 starting from the virtual camera information at time t2. From there, the second virtual camera gradually approaches the real camera information at each time.

Figures 4B and 4C are used below to describe how the virtual camera information automatic generation unit 117 generates the position of the second virtual camera from the continuously changing positions of the first virtual camera and the real camera 112. In this example, the second virtual viewpoint's information is generated from three inputs: the first virtual camera's information, the real camera's information, and a ratio. The ratio is the time elapsed since the start of the transition period divided by the total length of the transition period.

Panel 4a in Figure 4B shows the positions of the first virtual camera, the real camera 112, and the second virtual camera at time t2. At time t2, the second virtual camera is at the same position as the first virtual camera.

Panel 4b shows the positions of the three cameras at time t3. The second virtual camera's position at t3 is determined by the ratio of the elapsed time (t3 - t2) to the total length of the transition period (t7 - t2). Take the line segment from the first virtual camera's position at t2 to the real camera 112's position at t3. The second virtual camera lies on this segment, a fraction (t3 - t2) / (t7 - t2) of the way from the first virtual camera toward the real camera 112. In other words, the second virtual camera's position during the transition period is a weighted average, based on this ratio, of two positions: the first virtual camera at t2 and the real camera 112 at the current time.

Panel 4c shows the positions of the three cameras at time t4. The second virtual camera's position at t4 is generated in the same way: it lies on the segment from the first virtual camera's position at t2 to the real camera 112's position at t4, a fraction (t4 - t2) / (t7 - t2) of the way toward the real camera 112.
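The Figure 4 construction is a linear interpolation whose start point is frozen at time t2 and whose end point tracks the real camera. The minimal sketch below makes three assumptions: positions are 3-D tuples in a common world coordinate system, the ratio is clamped to [0, 1] (an added safety measure the description does not specify), and the function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def lerp(a: Vec3, b: Vec3, alpha: float) -> Vec3:
    """Point on the segment from a to b, a fraction alpha of the way from a."""
    return (
        a[0] + alpha * (b[0] - a[0]),
        a[1] + alpha * (b[1] - a[1]),
        a[2] + alpha * (b[2] - a[2]),
    )


def transition_alpha(t: float, t2: float, t7: float) -> float:
    """Elapsed fraction (t - t2) / (t7 - t2) of the transition, clamped to [0, 1]."""
    if t7 <= t2:
        raise ValueError("the transition period must have positive length")
    return min(max((t - t2) / (t7 - t2), 0.0), 1.0)


def second_camera_position_fig4(
    first_cam_at_t2: Vec3, real_cam_at_t: Vec3, t: float, t2: float, t7: float
) -> Vec3:
    """Figure 4 method: the start point is frozen at the first virtual camera's
    position at t2, while the end point follows the real camera at time t."""
    return lerp(first_cam_at_t2, real_cam_at_t, transition_alpha(t, t2, t7))


if __name__ == "__main__":
    first_at_t2 = (0.0, 0.0, 0.0)  # first virtual camera at the start of the transition
    real_at_t5 = (10.0, 5.0, 0.0)  # real camera 112 at t5 (made-up coordinates)
    print(second_camera_position_fig4(first_at_t2, real_at_t5, t=5.0, t2=2.0, t7=7.0))
```

The same `first_cam_at_t2` is reused for every t in the transition. That fixed start point is what distinguishes this variant from the same-time method of Figure 5.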

Panel 4d in Figure 4C shows the positions of the first virtual camera, the real camera 112, and the second virtual camera at time t5. The second virtual camera's position at t5 is generated in the same way as at t3. It lies on the segment from the first virtual camera's position at t2 to the real camera 112's position at t5, a fraction (t5 - t2) / (t7 - t2) of the way toward the real camera 112.

Panel 4e in Figure 4C shows the positions at time t6. The second virtual camera again lies on the segment from the first virtual camera at t2 to the real camera 112 at t6, this time a fraction (t6 - t2) / (t7 - t2) of the way toward the real camera 112.

Panel 4f in Figure 4C shows the positions at time t7. The second virtual camera lies on the segment from the first virtual camera at t2 to the real camera 112 at t7, a fraction (t7 - t2) / (t7 - t2) = 1 of the way toward the real camera 112. That is, at time t7, the end of the transition period, the second virtual camera and the real camera 112 are at the same position.
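The Figure 4 construction can be written as a single formula. Here $\mathbf{p}_1$, $\mathbf{p}_r$ and $\mathbf{p}_2$ denote the positions of the first virtual camera, the real camera 112 and the second virtual camera; this notation is introduced only for convenience.

$$
\alpha(t) = \frac{t - t_2}{t_7 - t_2}, \qquad
\mathbf{p}_2(t) = \bigl(1 - \alpha(t)\bigr)\,\mathbf{p}_1(t_2) + \alpha(t)\,\mathbf{p}_r(t), \qquad t_2 \le t \le t_7 .
$$

Because $\alpha(t_2) = 0$ and $\alpha(t_7) = 1$, the formula gives $\mathbf{p}_2(t_2) = \mathbf{p}_1(t_2)$ and $\mathbf{p}_2(t_7) = \mathbf{p}_r(t_7)$. These are the two endpoint conditions stated above. For illustration only, suppose t2 through t7 are equally spaced. Then the ratios at t3, t4, t5 and t6 are 1/5, 2/5, 3/5 and 4/5.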

As described above, the first embodiment provides a transition period from time t2 to time t7 when switching from the first virtual camera's virtual viewpoint video to the real camera 112's video. During this period, information is generated for a second virtual camera that moves from the first virtual camera's position to the real camera 112's position. This information is derived from the information on the first virtual camera and on the real camera during the transition period. Therefore, even if the first virtual camera and the real camera are far apart, virtual camera information that interpolates between them can be generated automatically for the transition period. As a result, the switch from virtual camera video to real camera video can be presented without a jarring discontinuity.

The same process can also be applied when switching from real camera video to virtual camera video. In that case, the second virtual camera starts at the real camera 112's position at the beginning of the transition period and moves gradually toward the first virtual camera's position.
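For the reverse direction, only the roles of the two cameras change. The start point is the real camera 112 at the beginning of the transition period, and the end point follows the first virtual camera. The sketch below uses the same assumptions as before: 3-D tuple positions, the ratio clamped to [0, 1], and hypothetical function names.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def lerp(a: Vec3, b: Vec3, alpha: float) -> Vec3:
    """Point on the segment from a to b, a fraction alpha of the way from a."""
    return (
        a[0] + alpha * (b[0] - a[0]),
        a[1] + alpha * (b[1] - a[1]),
        a[2] + alpha * (b[2] - a[2]),
    )


def second_camera_position_reverse(
    real_cam_at_t2: Vec3, first_cam_at_t: Vec3, t: float, t2: float, t7: float
) -> Vec3:
    """Real-to-virtual switch: start frozen at the real camera's position at t2,
    end point following the first virtual camera at time t."""
    if t7 <= t2:
        raise ValueError("the transition period must have positive length")
    alpha = min(max((t - t2) / (t7 - t2), 0.0), 1.0)
    return lerp(real_cam_at_t2, first_cam_at_t, alpha)


if __name__ == "__main__":
    for t in (2.0, 4.5, 7.0):
        print(t, second_camera_position_reverse((10.0, 0.0, 0.0), (0.0, 0.0, 0.0), t, 2.0, 7.0))
```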

In Figures 4A to 4C, the second virtual camera depends on the first virtual camera's position only at the start of the transition period. After that, it gradually approaches the real camera's position without reference to the first virtual camera. The method is not limited to this, however. For example, the second virtual camera information 402 may be automatically generated using a method such as the one shown in Figure 5.

Figure 5 shows another example of a method for generating a virtual camera path for virtual viewpoint video in the first embodiment. Like Figures 4A to 4C, it shows the positions of the first virtual camera, the second virtual camera, and the real camera 112 at each time from t0 to t10. In this method, the second virtual camera information 402 is generated from the information on the first virtual camera and the real camera 112 at the same time. As in the method of Figures 4A to 4C, the first and second virtual cameras are at the same position at time t2.

The second virtual camera's position at time t3 lies on the line segment joining the positions of the first virtual camera and the real camera 112 at t3. It is a fraction (t3 - t2) / (t7 - t2) of the way from the first virtual camera toward the real camera 112. The same rule gives its position at later times, each time using the segment joining the two cameras' positions at that same time:

- at t4, a fraction (t4 - t2) / (t7 - t2) along the segment;
- at t5, a fraction (t5 - t2) / (t7 - t2);
- at t6, a fraction (t6 - t2) / (t7 - t2);
- at t7, a fraction (t7 - t2) / (t7 - t2) = 1.

As explained for Figure 4C (4f), at time t7, the end of the transition period, the second virtual camera and the real camera 112 are at the same position.
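The Figure 5 variant differs from the Figure 4 sketch in one respect: both endpoints of the segment are taken at the current time t. The minimal sketch below uses the same assumptions as before (3-D tuple positions, ratio clamped to [0, 1], hypothetical names). Its demo uses made-up straight-line trajectories for both cameras.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def lerp(a: Vec3, b: Vec3, alpha: float) -> Vec3:
    """Point on the segment from a to b, a fraction alpha of the way from a."""
    return (
        a[0] + alpha * (b[0] - a[0]),
        a[1] + alpha * (b[1] - a[1]),
        a[2] + alpha * (b[2] - a[2]),
    )


def second_camera_position_fig5(
    first_cam_at_t: Vec3, real_cam_at_t: Vec3, t: float, t2: float, t7: float
) -> Vec3:
    """Figure 5 method: interpolate between the two cameras' positions at the
    same time t, advancing by the elapsed fraction of the transition period."""
    if t7 <= t2:
        raise ValueError("the transition period must have positive length")
    alpha = min(max((t - t2) / (t7 - t2), 0.0), 1.0)
    return lerp(first_cam_at_t, real_cam_at_t, alpha)


if __name__ == "__main__":
    t2, t7 = 2.0, 7.0
    for step in range(6):
        t = t2 + step
        first = (t, 0.0, 0.0)   # first virtual camera drifting along x (made up)
        real = (10.0, t, 0.0)   # real camera 112 drifting along y (made up)
        print(t, second_camera_position_fig5(first, real, t, t2, t7))
```

Because nothing is frozen at t2, the path depends only on the current ratio and the cameras' current positions. This is what allows the direction reversal discussed next.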

As described above, the method shown in Figure 5 calculates the virtual camera position during the switch from the positions of the first virtual camera and the real camera 112 at the same time. This holds whichever way the switch goes, from virtual camera video to real camera video or the reverse. Consequently, the switch still looks natural even if the second virtual camera changes direction partway through. For example, it may be moving from the first virtual camera toward the real camera 112 and then turn back toward the first virtual camera.

In the two methods described above for automatically generating virtual camera information, the switch is defined by its start and end times. The method is not limited to this. The switch may instead be defined by its start time and the time it requires (the length of the transition period). This makes it easy to set the switching time in advance and to keep switching times uniform when producing the same video.
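A transition can also be specified by its start time and length rather than by start and end times. In that case, the end time is simply the start time plus the length, and the ratio becomes (t - t_start) / duration. The minimal sketch below uses a hypothetical function name and clamps the ratio to [0, 1] as an assumption.

```python
def transition_ratio(t: float, t_start: float, duration: float) -> float:
    """Elapsed fraction of a transition given by its start time and length.

    Equivalent to (t - t2) / (t7 - t2) with t7 = t_start + duration.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    return min(max((t - t_start) / duration, 0.0), 1.0)


if __name__ == "__main__":
    for t in (0.0, 2.0, 4.5, 7.0, 9.0):
        print(t, transition_ratio(t, t_start=2.0, duration=5.0))
```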

In the two methods described above, the second virtual camera's movement during the switch is determined by the ratio of the elapsed time to the transition period. This is not a limitation. For example, a ratio specified by user operation (hereinafter referred to as the transition ratio) may be used at each time in the transition period instead. To support this, the video determination unit 114 may be provided with an input unit. The input unit lets the user designate the video before switching and the video after switching, and has a fader for specifying the transition ratio. The second virtual viewpoint's position is then generated according to the user's operations on this input unit.

Figure 6 shows an example of an input unit 600 with which the transition ratio can be specified. User operations on the input unit 600 are output to the video determination unit 114. The input unit 600 has a row of pre-switching button switches 601 and a row of post-switching button switches 602, each with one button for each of channels 1 to 4. A fader 603 spans the two rows. The fader 603 moves in response to user operations, and its position indicates the transition ratio used when switching videos. In this embodiment, the first virtual camera's virtual viewpoint video is assigned to channel 1, and the real camera 112's video is assigned to channel 2.

In Figure 6(a), the fader 603 is at its top position. In this state, the video of the channel selected with the pre-switching button switches 601 is output. The channel 1 pre-switching button switch 601 is lit, so the channel 1 video (first virtual viewpoint video 301) is the video output from the video switching unit 115. Among the post-switching button switches 602, channel 2 is selected and lit, so the channel 2 video (real camera video 302) is the video to be output after switching.

When the fader 603 is moved down from the top position, the output switches from the first virtual camera's virtual viewpoint video to the second virtual camera's virtual viewpoint video. The second virtual camera's position is generated by the method described for Figures 4A to 4C or for Figure 5, using a transition ratio that corresponds to the fader 603's position. For example, the transition ratio can be set as the distance from the fader's top position to its current position, relative to the distance from its top position to its bottom position.

In the example of Figure 6(b), the fader 603 is two-fifths of the way from the top position to the bottom position. Take the line segment joining the positions of the first virtual camera and the real camera 112 at that time. The second virtual camera lies on this segment, 2/5 of the way from the first virtual camera toward the real camera 112 (similar to panel 4c in Figure 4B).

The transition period starts when the fader 603 begins to move from the state in Figure 6(a). It ends when the fader reaches the bottom position, as shown in Figure 6(c). At that point, the output switches from the second virtual camera's video to the real camera 112's video, completing the switch.
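The fader behaviour can be sketched as follows. The sketch rests on three assumptions. First, the fader reports its travel from the top position as a number. Second, the transition ratio is that travel divided by the full top-to-bottom travel. Third, the same-time interpolation of Figure 5 is used. All function names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def fader_ratio(travel_from_top: float, full_travel: float) -> float:
    """Transition ratio: distance from the top position to the current position
    divided by the distance from the top position to the bottom position."""
    if full_travel <= 0:
        raise ValueError("full_travel must be positive")
    return min(max(travel_from_top / full_travel, 0.0), 1.0)


def fader_output(ratio: float, pre_channel: int, post_channel: int) -> str:
    """Which video is output for a given fader ratio (Figure 6(a) to 6(c))."""
    if ratio <= 0.0:
        return f"channel {pre_channel}"   # fader at the top: pre-switching channel
    if ratio >= 1.0:
        return f"channel {post_channel}"  # fader at the bottom: switch completed
    return "second virtual camera"        # fader in between: transition video


def second_camera_position(first_cam_now: Vec3, real_cam_now: Vec3, ratio: float) -> Vec3:
    """Same-time interpolation driven by the fader ratio instead of elapsed time."""
    return (
        first_cam_now[0] + ratio * (real_cam_now[0] - first_cam_now[0]),
        first_cam_now[1] + ratio * (real_cam_now[1] - first_cam_now[1]),
        first_cam_now[2] + ratio * (real_cam_now[2] - first_cam_now[2]),
    )


if __name__ == "__main__":
    r = fader_ratio(40.0, 100.0)  # fader 2/5 of the way down, as in Figure 6(b)
    print(r, fader_output(r, 1, 2))
    print(second_camera_position((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), r))
```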

As described above, operating the fader 603 lets the user specify the transition ratio that the virtual camera information automatic generation unit 117 uses to generate virtual camera information when switching videos. This makes it easy to control both when the switch happens and how quickly the virtual camera approaches the real camera's state.

Switching from virtual viewpoint video to real camera video has been described above, but the process is not limited to that direction. It can also be applied to switching from real camera video to virtual viewpoint video. All that is required is that, of the two viewpoints involved, one is a virtual imaging device's viewpoint for generating virtual viewpoint video and the other is a physical imaging device's viewpoint that captures video. Here the two viewpoints are the first viewpoint, which gives the video before switching, and the second viewpoint, which gives the video after switching.

For a real-to-virtual switch, the output goes from the real camera video to the second virtual camera's virtual viewpoint video, and then to the first virtual camera's virtual viewpoint video. The virtual viewpoint video is generated as though the camera information transitions from the second virtual camera information to the first virtual camera information. The second virtual camera's virtual viewpoint video, generated by the virtual camera information automatic generation unit 117, can also be used in two further cases: switching between two virtual viewpoint videos from two virtual viewpoints, and switching between two real camera videos from two real cameras.

As described above, according to the first embodiment, a new virtual camera is generated when switching from a first video obtained from a first viewpoint to a second video obtained from a second viewpoint. This virtual camera interpolates between the first and second viewpoints. Virtual viewpoint video from the new virtual viewpoint is inserted between the first video and the second video. The switch therefore looks as if both videos had been captured from a single viewpoint (camera). Furthermore, switching smoothly between virtual viewpoint video and real camera video enables more dynamic visual expression than a real camera alone can capture.

<Second Embodiment>
The first embodiment described a process for generating information about a virtual viewpoint (the second virtual camera) based on information about the first virtual camera and information about the real camera. The virtual viewpoint information includes position, orientation (line-of-sight direction), focal length (zoom value), and so on. In the first embodiment, all of these are generated by the same process without distinction. In the second embodiment, the position information and the orientation information of the virtual viewpoint are generated by independent processes. Components equivalent to those in the first embodiment are given the same reference numerals, and detailed descriptions of them are omitted.

As described above, in the first embodiment the position information of the second virtual camera is generated from the position information of the first virtual camera and that of the real camera 112, so that the second virtual camera moves between them. Its orientation can be generated by a similar method. With that method, however, the subject to be captured may fall outside the shooting range of the second virtual camera, depending on the camera's orientation and focal length. To solve this problem, the second embodiment controls the position of the second virtual camera independently of its orientation and focal length.

Figure 7 is a block diagram showing an example configuration of the image processing system according to the second embodiment. It adds a subject identification unit 701 to the configuration of the first embodiment (Figure 1). The subject identification unit 701 identifies the subject being captured by the virtual camera or the real camera 112. It does so using two sources of information:
- camera information from the virtual camera information generation unit 110, the real camera information acquisition unit 113, and the automatic virtual camera information generation unit 117;
- information from the 3D model storage unit 109.

In addition, the image acquisition unit 104 provides the video acquired from the camera control unit 102 to the image switching unit 115 as well. This allows the image switching unit 115 to use video from the camera group 101, which is used to generate the virtual viewpoint video, as output video too.
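The sketch below illustrates the identification step. It assumes the foreground positions have already been projected into normalized image coordinates (0 to 1 on each axis). It then treats the visible subject closest to the image center as the subject "mainly being captured". That selection rule, and the names `subjects_in_frame` and `principal_subject`, are assumptions for illustration, not the criterion used by the subject identification unit 701.

```python
from __future__ import annotations


def subjects_in_frame(projected: dict[str, tuple[float, float]]) -> list[str]:
    """Return IDs of subjects whose projected point lies inside the frame."""
    return [sid for sid, (u, v) in projected.items()
            if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0]


def principal_subject(projected: dict[str, tuple[float, float]]) -> str | None:
    """Pick the visible subject nearest the image center (0.5, 0.5), if any."""
    visible = subjects_in_frame(projected)
    if not visible:
        return None

    def dist_sq(sid: str) -> float:
        u, v = projected[sid]
        return (u - 0.5) ** 2 + (v - 0.5) ** 2

    return min(visible, key=dist_sq)


# Hypothetical projections for the real camera 112's view.
real_cam_view = {"901": (0.92, 0.15), "902": (0.48, 0.52), "903": (1.30, 0.50)}
print(subjects_in_frame(real_cam_view))  # ['901', '902']
print(principal_subject(real_cam_view))  # 902
```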

Figure 8 is a flowchart showing the output video determination process according to the second embodiment. Steps equivalent to those of the first embodiment (Figure 2) are given the same step numbers. In step S801, the automatic virtual camera information generation unit 117 refers to the switching information. It determines whether the position and the orientation of the second virtual camera have different transition ratios during the transition period from the first virtual viewpoint video 301 to the real camera video 302. If the transition ratios are the same (NO in step S801), the process proceeds to step S211. If they are different (YES in step S801), the process proceeds to step S802.
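The branch in step S801 can be pictured as a comparison of the two transition schedules. This sketch assumes the switching information stores the position and orientation transition periods as (start, end) time pairs, so the ratios differ exactly when the periods differ. `SwitchingInfo` and `step_after_s801` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchingInfo:
    """Hypothetical switching information: (start, end) times in seconds."""
    position_period: tuple[float, float]
    orientation_period: tuple[float, float]


def step_after_s801(info: SwitchingInfo) -> str:
    """Return the next step of the Figure 8 flowchart after the S801 check."""
    if info.position_period != info.orientation_period:
        return "S802"  # ratios differ: generate position and orientation independently
    return "S211"  # same ratio: proceed to step S211 as in Figure 2


print(step_after_s801(SwitchingInfo((2.0, 7.0), (2.0, 4.0))))  # S802
print(step_after_s801(SwitchingInfo((2.0, 7.0), (2.0, 7.0))))  # S211
```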

In step S802, the automatic virtual camera information generation unit 117 generates the position, orientation, and angle of view of the second virtual camera for switching from virtual camera video to real camera video. It bases this on the information about the first virtual camera, the information about the real camera 112, and the switching conditions. From the switching conditions, the unit 117 acquires two periods:
- a position transition period, for moving from the position of the first virtual camera to the position of the real camera 112;
- an orientation transition period, for changing from the orientation of the first virtual camera to the orientation of the real camera 112.

In the switching conditions, for example, the two periods are set independently of each other, and each is defined by a start time and an end time. The unit 117 then calculates the position and orientation of the second virtual camera at each time. As in the first embodiment, an input unit 600 with a fader 603 for specifying the switching ratio may be used. In that case, a separate fader 603 is provided for each condition to be controlled independently.
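The following is a minimal sketch of the independent schedules in step S802. It assumes each transition period is given by a start and end time, and that the ratio at time t is the elapsed fraction of that period, clamped to [0, 1]. `TransitionPeriod` and `ratios_at` are hypothetical names. The example periods mirror the Figure 9 example (position from t2 to t7, orientation from t2 to t4), using arbitrary numeric times.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TransitionPeriod:
    start: float  # seconds
    end: float    # seconds

    def ratio_at(self, t: float) -> float:
        """Elapsed fraction of the period at time t, clamped to [0, 1]."""
        if self.end <= self.start:  # degenerate period: jump at the end time
            return 1.0 if t >= self.end else 0.0
        return min(max((t - self.start) / (self.end - self.start), 0.0), 1.0)


def ratios_at(position: TransitionPeriod, orientation: TransitionPeriod,
              t: float) -> tuple[float, float]:
    """Position and orientation ratios evaluated independently at time t."""
    return position.ratio_at(t), orientation.ratio_at(t)


pos_period = TransitionPeriod(2.0, 7.0)  # t2 .. t7
ori_period = TransitionPeriod(2.0, 4.0)  # t2 .. t4
for t in (2.0, 3.0, 4.0, 5.0, 7.0):
    print(t, ratios_at(pos_period, ori_period, t))
```

With these periods, the orientation ratio reaches 1 at t = 4 while the position ratio is still 0.4. This is the kind of decoupling that a separate fader per condition would control.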

The orientation of the second virtual camera may also be calculated with a transition ratio different from the position transition ratio, so that the subject included in the output video after switching is displayed preferentially. Figures 9A to 9C show an example of how step S802 generates virtual camera information in this way. The positions and orientations of the first virtual camera, the second virtual camera, and the real camera 112 at each time are as shown in Figure 4A. The first virtual camera is mainly capturing subject 901, which lies within its shooting range. The real camera 112 is mainly capturing subject 902, which lies within its shooting range. During the position transition period (times t2 to t7), the position of the second virtual camera moves from the position of the first virtual camera to the position of the real camera 112, as in the first embodiment. The orientation and focal length (zoom value) of the second virtual camera, on the other hand, change rapidly during the orientation transition period from time t2 to time t4, until its angle of view is equivalent to that of the real camera 112. From time t4 to time t7, the orientation and focal length are set so that the second virtual camera's angle of view stays equivalent to that of the real camera 112.
Here, an "equivalent angle of view" means an orientation and angle of view set so that the same subject appears in the videos from the respective viewpoints at approximately the same position. Alternatively, it may mean approximately the same size, or approximately the same position and size.
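Assume a simple pinhole camera. A subject of height H at distance d then appears with an image height proportional to f·H/d. So the second virtual camera can match the subject's apparent size in the real camera video by scaling its focal length with the ratio of the two distances. The sketch below shows that relation, plus a tolerance-based check of the "approximately the same position and size" criterion. The tolerances and the names `matching_focal_length` and `is_equivalent_view` are assumptions.

```python
import math


def matching_focal_length(f_real: float, dist_real: float,
                          dist_virtual: float) -> float:
    """Focal length giving the virtual camera the same apparent subject size.

    Pinhole model: image height ~ f * H / d, so f_v / d_v = f_r / d_r.
    """
    if dist_real <= 0 or dist_virtual <= 0:
        raise ValueError("distances must be positive")
    return f_real * dist_virtual / dist_real


def is_equivalent_view(center_a, height_a, center_b, height_b,
                       pos_tol=0.02, size_tol=0.05):
    """True if a subject appears at approximately the same position and size.

    Centers are normalized image coordinates; heights are fractions of the
    frame height. pos_tol is absolute, size_tol is relative.
    """
    same_position = math.hypot(center_a[0] - center_b[0],
                               center_a[1] - center_b[1]) <= pos_tol
    same_size = abs(height_a - height_b) <= size_tol * max(height_a, height_b)
    return same_position and same_size


# Real camera: 50 mm lens, subject 10 m away; second virtual camera is 20 m away.
print(matching_focal_length(50.0, 10.0, 20.0))  # 100.0
print(is_equivalent_view((0.50, 0.50), 0.30, (0.51, 0.50), 0.31))  # True
```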

The subject identification unit 701 can determine where each foreground object appears in the virtual viewpoint video of the first virtual camera. It uses the position, orientation, and focal length of the first virtual camera from the virtual camera information generation unit 110, together with the foreground positions from the 3D model storage unit 109. Similarly, from the position, orientation, and focal length of the real camera 112 and the foreground positions from the 3D model storage unit 109, it can determine where each foreground object appears in the real camera video. During the transition period in which virtual viewpoint video from the second virtual camera is output, the automatic virtual camera information generation unit 117 of this embodiment calculates the second virtual camera's orientation. The aim is for that camera to capture video with an angle of view equivalent to the post-switching video, that is, the video of the real camera 112.
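Determining where a foreground object appears in a camera's video amounts to projecting its 3D position using that camera's position, orientation, and focal length. Below is a minimal pinhole-projection sketch. It assumes the orientation is given as a world-to-camera rotation matrix, the camera looks along its +z axis, and the focal length and principal point are in pixels, with lens distortion ignored. `project_point` is a hypothetical name.

```python
def project_point(point, cam_pos, rot_world_to_cam, focal_px, principal_pt):
    """Project a 3D world point to pixel coordinates (u, v).

    Returns None when the point is behind the camera (non-positive depth).
    """
    # Translate into the camera frame, then rotate: p_cam = R (X - C).
    d = [point[i] - cam_pos[i] for i in range(3)]
    x, y, z = (sum(row[j] * d[j] for j in range(3)) for row in rot_world_to_cam)
    if z <= 0:
        return None
    cx, cy = principal_pt
    return (focal_px * x / z + cx, focal_px * y / z + cy)


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
# Foreground point 10 m in front of a camera at the origin, 1920x1080 image.
print(project_point((1.0, 0.5, 10.0), (0.0, 0.0, 0.0), IDENTITY,
                    1000.0, (960.0, 540.0)))  # (1060.0, 590.0)
```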

In Figure 9A, 9a shows, at time t2, the position 911 and orientation 912 of the first virtual camera and the position 931 and orientation 932 of the real camera 112. At time t2, the position and orientation of the second virtual camera are the same as the position 911 and orientation 912 of the first virtual camera. In Figure 9A, 9b shows the state at time t3: the position 913 and orientation 914 of the first virtual camera, the position 933 and orientation 934 of the real camera 112, and the position 951 and orientation 954 of the second virtual camera. The orientation 954 of the second virtual camera at time t3 is determined from two orientations. One is the orientation 912 of the first virtual camera at time t2 (shown as orientation 952). The other is the orientation 953, with which the second virtual camera would obtain an angle of view equivalent to that of the real camera 112 at time t3. In other words, orientation 954 is obtained by rotating from orientation 952 toward orientation 953 by the fraction (t3 - t2) / (t4 - t2) of the way between them.
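The description leaves the interpolation method open. One common choice, assumed here, is spherical linear interpolation (slerp) between unit quaternions. The start orientation is that of the first virtual camera at t2 (orientation 912/952). The target is the orientation that gives an equivalent angle of view at the current time (953 at t3). The ratio is (t - t2) / (t4 - t2), clamped to [0, 1]. The quaternion convention (w, x, y, z), the yaw-only example, and the function names are assumptions.

```python
import math


def slerp(q0, q1, r):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly parallel: normalized linear interpolation is stable
        q = tuple(a + r * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - r) * theta) / math.sin(theta)
    s1 = math.sin(r * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def yaw_quaternion(yaw_rad):
    """Rotation by yaw_rad about the vertical (y) axis."""
    return (math.cos(yaw_rad / 2), 0.0, math.sin(yaw_rad / 2), 0.0)


def second_camera_orientation(t, t2, t4, q_first_at_t2, q_equivalent_at_t):
    """Orientation of the second virtual camera during the orientation transition."""
    r = min(max((t - t2) / (t4 - t2), 0.0), 1.0)
    return slerp(q_first_at_t2, q_equivalent_at_t, r)


q_952 = yaw_quaternion(0.0)          # first virtual camera at t2
q_953 = yaw_quaternion(math.pi / 2)  # equivalent to the real camera at t3 (example)
q_954 = second_camera_orientation(3.0, 2.0, 4.0, q_952, q_953)
print(q_954)  # approximately the yaw-pi/4 quaternion
```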

In Figure 9B, 9c shows the state at time t4: the position 915 and orientation 916 of the first virtual camera, the position 935 and orientation 936 of the real camera 112, and the position 955 and orientation 956 of the second virtual camera. As at time t3, the orientation 956 of the second virtual camera at time t4 is determined from two orientations: the orientation 912 of the first virtual camera at time t2, and the orientation with which the second virtual camera would obtain an angle of view equivalent to that of the real camera 112 at time t4. At time t4, however, (t4 - t2) / (t4 - t2) = 1. The orientation 956 that gives an angle of view equivalent to that of the real camera 112 is therefore adopted directly as the second virtual camera's orientation at time t4.
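Written as formulas, the orientation rule of the Figure 9 example can be summarized as below. The symbols are as follows:
- q_1(t_2) is the orientation of the first virtual camera at t2.
- q_eq(t) is the orientation that gives an angle of view equivalent to that of the real camera 112 at time t.
- interp is whichever orientation interpolation is used (slerp, for example; the text does not specify one), assumed to return its second argument when the ratio is 1.

Clamping r(t) to 1 after t4 is an assumption consistent with the behavior described for t4 to t7.

```latex
r(t) = \min\!\left(1,\ \max\!\left(0,\ \frac{t - t_2}{t_4 - t_2}\right)\right),
\qquad
q_2(t) = \operatorname{interp}\big(q_1(t_2),\ q_{\mathrm{eq}}(t),\ r(t)\big)

r(t_3) = \frac{t_3 - t_2}{t_4 - t_2},
\qquad
r(t_4) = \frac{t_4 - t_2}{t_4 - t_2} = 1
\;\Rightarrow\; q_2(t_4) = q_{\mathrm{eq}}(t_4)
```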

In Figure 9B, 9d shows the state at time t5: the position 917 and orientation 918 of the first virtual camera, the position 937 and orientation 938 of the real camera 112, and the position 957 and orientation 958 of the second virtual camera. The orientation 958 is set so that the second virtual camera obtains an angle of view equivalent to that of the real camera 112 at time t5. Similarly, in Figure 9C, 9e shows the state at time t6: the position 919 and orientation 920 of the first virtual camera, the position 939 and orientation 940 of the real camera 112, and the position 959 and orientation 960 of the second virtual camera. The orientation 960 is set so that the second virtual camera obtains an angle of view equivalent to that of the real camera 112 at time t6. In Figure 9C, 9f shows, at time t7, the position 921 and orientation 922 of the first virtual camera and the position 941 and orientation 942 of the real camera 112. At time t7, the position and orientation of the second virtual camera are the same as the position 941 and orientation 942 of the real camera 112.

<Other Embodiments>
In the above embodiments, the real camera 112 has been described as a camera separate from the camera group 101 that generates the virtual viewpoint video, brought into the vicinity of the virtual viewpoint video's shooting range. However, the invention is not limited to this. Suppose, as in the second embodiment, that video from some or all of the cameras in the camera group 101 is sent to the image switching unit 115 and can be selected as output video. The real camera 112 may then be any one of the cameras in the camera group 101. This makes it easy to generate new virtual viewpoint video for the transition period, even when switching from virtual viewpoint video to the video of one of the cameras in the camera group 101 used to generate that virtual viewpoint video.

Furthermore, the virtual viewpoint for the transition period may be generated for each frame captured by the real camera 112 during the transition period (or for each frame of the virtual viewpoint video from the first virtual viewpoint). Alternatively, it may be generated at predetermined time intervals (for example, every 0.5 seconds).
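The two update schedules mentioned above are per-frame updates and updates at a fixed interval such as 0.5 seconds. The sketch below computes either schedule. It assumes the transition period is given by start and end times in seconds, and that exactly one of `frame_rate` or `interval` is supplied. `viewpoint_update_times` is a hypothetical helper, and how frames between updates are rendered is left open.

```python
from __future__ import annotations


def viewpoint_update_times(start: float, end: float, *,
                           frame_rate: float | None = None,
                           interval: float | None = None) -> list[float]:
    """Times in [start, end] at which a new virtual viewpoint is generated."""
    if (frame_rate is None) == (interval is None):
        raise ValueError("specify exactly one of frame_rate or interval")
    if frame_rate is not None:
        if frame_rate <= 0:
            raise ValueError("frame_rate must be positive")
        step = 1.0 / frame_rate  # one viewpoint per captured frame
    else:
        if interval <= 0:
            raise ValueError("interval must be positive")
        step = interval  # fixed time interval, e.g. 0.5 s
    times = []
    k = 0
    while start + k * step <= end + 1e-9:  # small tolerance for float round-off
        times.append(start + k * step)
        k += 1
    return times


print(viewpoint_update_times(2.0, 4.0, interval=0.5))  # [2.0, 2.5, 3.0, 3.5, 4.0]
print(len(viewpoint_update_times(2.0, 4.0, frame_rate=60.0)))
```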

The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium. One or more processors in a computer of that system or apparatus then read and execute the program. The invention can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The invention is not limited to the above-described embodiments, and various changes and modifications are possible without departing from the spirit and scope of the invention. Accordingly, the following claims are appended to make the scope of the invention public.

101: Camera group, 102: Camera control unit, 103: Image processing device, 104: Image acquisition unit, 105: Background image storage unit, 106: Separation unit, 107: Foreground image storage unit, 108: 3D model generation unit, 109: 3D model storage unit, 110: Virtual camera information generation unit, 111: Virtual viewpoint image generation unit, 112: Real camera, 113: Real camera information acquisition unit, 114: Image determination unit, 115: Image switching unit, 116: Image output unit, 117: Automatic virtual camera information generation unit

Claims

1. An image processing device comprising:
an acquisition means for acquiring information relating to a first image and a second image, at least one of which is an image captured by an imaging device, the acquisition means acquiring information about a first viewpoint for acquiring the first image and information about a second viewpoint for acquiring the second image at a time corresponding to the time of the first image;
a setting means for setting a period from the end of output of the first video to the start of output of the second video when switching the video to be output from the first video to the second video;
a first generating means for generating information about a virtual viewpoint for the period based on the information about the first viewpoint for the period and the information about the second viewpoint for the period;
a second generating means for generating a virtual viewpoint video for the period based on the information about the virtual viewpoint for the period; and
an output means for switching between and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order;
wherein the first generating means generates the virtual viewpoint for the period based on the information about the first viewpoint, the information about the second viewpoint, and the ratio of the elapsed time from the start of the period to the total time of the period.

2. The image processing device according to claim 1, wherein the first generating means generates virtual viewpoint information for the period based only on information about the first viewpoint at the start time of the period.

3. An image processing device comprising:
an acquisition means for acquiring information relating to a first image and a second image, at least one of which is an image captured by an imaging device, the acquisition means acquiring information about a first viewpoint for acquiring the first image and information about a second viewpoint for acquiring the second image at a time corresponding to the time of the first image;
a setting means for setting a period from the end of output of the first video to the start of output of the second video when switching the video to be output from the first video to the second video;
a first generating means for generating information about a virtual viewpoint for the period based on the information about the first viewpoint for the period and the information about the second viewpoint for the period;
a second generating means for generating a virtual viewpoint video for the period based on the information about the virtual viewpoint for the period;
an output means for switching between and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order; and
a setting means for setting a ratio in accordance with a user operation received during the period;
wherein the first generating means generates the virtual viewpoint for the period based on the information about the first viewpoint, the information about the second viewpoint, and the ratio set by the setting means.

4. The image processing device according to claim 1, wherein the first generating means generates a virtual viewpoint for the period by taking a weighted average of the information of the first viewpoint and the information of the second viewpoint based on the ratio.

5. The image processing device according to claim 1, wherein the first generating means generates a virtual viewpoint at each time during the period based on information about the first viewpoint at the start time of the period and information about the second viewpoint at that time.

6. An image processing device comprising:
an acquisition means for acquiring information relating to a first image and a second image, at least one of which is an image captured by an imaging device, the acquisition means acquiring information about a first viewpoint for acquiring the first image and information about a second viewpoint for acquiring the second image at a time corresponding to the time of the first image;
a setting means for setting a period from the end of output of the first video to the start of output of the second video when switching the video to be output from the first video to the second video;
a first generating means for generating information about a virtual viewpoint for the period based on the information about the first viewpoint for the period and the information about the second viewpoint for the period;
a second generating means for generating a virtual viewpoint video for the period based on the information about the virtual viewpoint for the period; and
an output means for switching between and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order;
wherein the first generating means generates a virtual viewpoint at each time during the period based on information about the first viewpoint at that time and information about the second viewpoint at that time.

7. The image processing device according to claim 1, further comprising an identification means for identifying a subject from the image captured from the second viewpoint,
wherein the first generating means generates information about the line-of-sight direction included in the information about the virtual viewpoint for the period based on the position of the subject identified by the identification means.

8. The image processing device according to claim 7, wherein the first generating means generates the line-of-sight direction information included in the virtual viewpoint information for the period based on (a) the line-of-sight direction of a virtual viewpoint for obtaining video of a shooting range in which the subject appears at the same position as in the image obtained from the second viewpoint, and (b) the line-of-sight direction of the first viewpoint at the start of the period.

9. The image processing device according to claim 7 or 8, wherein the first generating means generates information about the focal length of the virtual viewpoint for the period based on (a) the focal length of a virtual viewpoint for obtaining video of a shooting range in which the subject appears at the same size as in the image obtained from the second viewpoint, and (b) the focal length of the first viewpoint at the start of the period.

10. The image processing device according to claim 1, wherein one of the first image and the second image is a virtual viewpoint image generated based on a plurality of images captured by a plurality of imaging devices and a virtual viewpoint.

11. The image processing device according to claim 10, further comprising a connection means for connecting to a plurality of imaging devices for obtaining a plurality of images for generating a virtual viewpoint video,
wherein the second generating means generates the virtual viewpoint video based on the plurality of images.

12. A method for controlling an image processing apparatus, comprising:
an acquisition step of acquiring information relating to a first image and a second image, at least one of which is an image captured by an imaging device, the acquisition step acquiring information about a first viewpoint for capturing the first image and information about a second viewpoint for capturing the second image at a time corresponding to the time of the first image;
a setting step of setting a period from the end of output of the first video to the start of output of the second video when switching the video to be output from the first video to the second video;
a first generation step of generating information about a virtual viewpoint for the period based on the information about the first viewpoint for the period and the information about the second viewpoint for the period;
a second generation step of generating a virtual viewpoint video for the period based on the information about the virtual viewpoint for the period; and
an output step of switching and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order;
wherein, in the first generation step, a virtual viewpoint for the period is generated based on the information about the first viewpoint, the information about the second viewpoint, and the ratio of the elapsed time from the start of the period to the total time of the period.

13. A method for controlling an image processing apparatus, comprising:
an acquisition step of acquiring information relating to a first image and a second image, at least one of which is an image captured by an imaging device, the acquisition step acquiring information about a first viewpoint for capturing the first image and information about a second viewpoint for capturing the second image at a time corresponding to the time of the first image;
a setting step of setting a period from the end of output of the first video to the start of output of the second video when switching the video to be output from the first video to the second video;
a first generation step of generating information about a virtual viewpoint for the period based on the information about the first viewpoint for the period and the information about the second viewpoint for the period;
a second generation step of generating a virtual viewpoint video for the period based on the information about the virtual viewpoint for the period;
an output step of switching and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order; and
a setting step of setting a ratio in accordance with a user operation received during the period;
wherein, in the first generation step, a virtual viewpoint for the period is generated based on the information about the first viewpoint, the information about the second viewpoint, and the ratio set in the setting step.

14. A method for controlling an image processing apparatus, comprising:
an acquisition step of acquiring information relating to a first image and a second image, at least one of which is an image captured by an imaging device, the acquisition step acquiring information about a first viewpoint for capturing the first image and information about a second viewpoint for capturing the second image at a time corresponding to the time of the first image;
a setting step of setting a period from the end of output of the first video to the start of output of the second video when switching the video to be output from the first video to the second video;
a first generation step of generating information about a virtual viewpoint for the period based on the information about the first viewpoint for the period and the information about the second viewpoint for the period;
a second generation step of generating a virtual viewpoint video for the period based on the information about the virtual viewpoint for the period; and
an output step of switching and outputting the first video, the virtual viewpoint video for the period, and the second video, in that order;
wherein, in the first generation step, a virtual viewpoint at each time in the period is generated based on information about the first viewpoint at that time and information about the second viewpoint at that time.

15. A program for causing a computer to function as the image processing device according to any one of claims 1 to 11.