JP7557442B2

JP7557442B2 - 3DCG rendering device, system, method and program

Info

Publication number: JP7557442B2
Application number: JP2021138111A
Authority: JP
Inventors: 達也小林
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2021-08-26
Filing date: 2021-08-26
Publication date: 2024-09-27
Anticipated expiration: 2041-08-26
Also published as: JP2023032162A

Description

The present invention relates to a 3DCG rendering device, system, method, and program that render 3DCG from a user's viewpoint and output a rendered image, and in particular to a 3DCG rendering device, system, method, and program capable of outputting images with high-quality motion parallax even when the user's viewpoint changes rapidly.

There are two approaches to stereoscopic image presentation using 3DCG: one uses a head-worn display such as a head-mounted display (HMD), and the other uses a stationary 3D display such as a light field display.

Either approach requires accurately estimating the user's head pose (the positions of both eyes), rendering the 3DCG from viewpoints corresponding to the user's left and right eyes, and presenting the resulting two-view images to the respective eyes. This makes it possible to present a stereoscopic image as if the 3DCG existed right in front of the user.

If an image rendered from a viewpoint different from the actual one is presented, a phenomenon known as cybersickness (VR sickness) occurs and the quality of experience is significantly impaired. Moreover, the user's head pose keeps changing between the time the system renders the 3DCG and the time it is shown on the display. In particular, when the user deliberately moves their head, the display delay (Motion-to-Photon Latency) leads to a large head pose error, and the mismatch between the actual head pose and the head pose assumed by the image (a geometric inconsistency in the 3DCG display) can greatly impair the quality of experience.

Non-Patent Document 1 discloses a technique that reduces this error and improves image quality by predicting the positions of the user's eyes based on the delay time until the image is displayed.

Meanwhile, recent advances in mobile network technology have made it possible to transmit high-quality video with low latency, and 3DCG rendering systems have emerged in which rendering is performed on a cloud server and the HMD receives and plays back the video.

In such a system, the time required for network transmission is added to the display processing delay described above. The error between the head pose assumed at rendering time and the actual head pose at the moment the image is displayed therefore grows, and the quality of experience may deteriorate significantly.

Patent Document 1 discloses a technique in which, in addition to predicting the user's head pose on the server side, the HMD estimates the latest head pose after receiving the video and corrects the display position of the video to match the latest head pose.

Patent Document 2 discloses a technique in which an HMD estimates the latest head pose after a rendered image has been generated, re-predicts the head pose, and corrects the display position of the image to match the re-prediction result.

Patent Document 3 discloses, as a specific method of display correction, the use of an affine transformation (translation, rotation, and scaling) applied to the entire image or to each object region.

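Patent Document 1: Japanese Unexamined Patent Application Publication No. 2021-56783
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2017-76119
Patent Document 3: Japanese Translation of PCT Application Publication No. 2019-532395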

Non-Patent Document 1: S. Lee, B. Kang and D. Nam, "Position Prediction for Eye-tracking based 3D Display", Digital Holography and 3-D Imaging, 2019

With any of the above prior art, it is difficult to eliminate the geometric inconsistency in the motion parallax of the model caused by head pose prediction errors, and the quality of experience suffers particularly for 3DCG displayed at a depth close to the user.

In Non-Patent Document 1, prediction errors in the positions of the user's eyes translate directly into geometric inconsistencies in the displayed model. Because the eye positions are predicted by linear prediction, the inconsistency grows when the user's head suddenly starts or stops moving.

In Patent Documents 1 and 2, the image display position is corrected using the latest head pose information, but the correction is limited to translation of the image. Accurate correction is therefore possible when the user's head pose (the positions of both eyes) changes only by rotation without translation, but changes in appearance caused by translational motion of the head cannot be corrected, and geometric inconsistencies arise.

Changes in appearance caused by translational motion (motion parallax) arise from the front-to-back positional relationships among objects. Examples include differences in the speed at which each object's display position moves as the user's head translates, changes in occlusion as objects overlap one another, and changes in the appearance of a single object (e.g., when the head moves to the right, a face on the right side newly becomes visible, and when it moves to the left, a face on the left side newly becomes visible).

In Patent Document 3, motion parallax arising from the positional relationships among multiple objects is corrected relatively accurately, but changes in the appearance of a single object cannot be corrected. Geometric inconsistencies therefore occur, particularly when the 3DCG is displayed at a depth close to the user.

An object of the present invention is to solve the above technical problems and to provide a 3DCG rendering device, system, method, and program that transmit a rendered image together with its depth image to a display device and, when the image is corrected on the display device, perform three-dimensional image correction using the latest head pose and the depth image, thereby achieving accurate motion parallax even when the head pose prediction error is large.

To achieve the above object, the present invention provides a 3DCG rendering system in which a local terminal that displays a rendered image from a user's viewpoint and a rendering server that renders 3DCG from the user's viewpoint and transmits the result to the local terminal are connected via a network, the system being characterized by the following configurations.

(1) The local terminal comprises viewpoint estimation means for estimating the user's viewpoint at each time, as a reference viewpoint at a reference time and as a latest viewpoint at a latest time at which a delay time has elapsed from the reference time. The rendering server comprises rendering means for generating a rendered image of the reference viewpoint and outputting it together with its depth image. The local terminal further comprises image correction means for correcting, at the latest time, the rendered image of the reference viewpoint into a rendered image of the latest viewpoint based on the depth image, the reference viewpoint, and the latest viewpoint.

(2) The rendering server further comprises viewpoint prediction means for calculating, based on a predicted time that estimates the delay time τ and on the reference viewpoint, the user's viewpoint at a predicted time point (the reference time plus the predicted time) as a predicted viewpoint. The rendering means outputs a rendered image of the predicted viewpoint and its depth image, and the image correction means corrects the rendered image of the predicted viewpoint into a rendered image of the latest viewpoint based on the depth image, the predicted viewpoint, and the latest viewpoint.

(3) When the rendered image of the predicted viewpoint is corrected into the rendered image of the latest viewpoint, peripheral rendered images are used instead of, or together with, the rendered image of the predicted viewpoint.

(4) The image correction means calculates an optimal value of the predicted time based on the difference between the latest time and the predicted time point and feeds it back to the viewpoint prediction means, and the viewpoint prediction means calculates the predicted viewpoint using, as the predicted time point, the time at which the optimal value has elapsed from the reference time.

(5) The transmission bit rate from the rendering server to the local terminal is varied according to the difference between the latest viewpoint and the predicted viewpoint.
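As an illustration of configuration (4), the feedback of the predicted time could be sketched as follows. This is a minimal sketch and not the claimed method: the function name `update_predicted_time`, the proportional update rule, and the `gain` parameter are assumptions introduced here. The text only states that an optimal value is calculated from the difference between the latest time and the predicted time point.

```python
def update_predicted_time(tau_pred, t_latest, t_predicted, gain=0.2):
    """Return an updated predicted time tau' in seconds.

    t_latest    : time t + tau at which the image was actually corrected
    t_predicted : time t + tau' for which the server predicted the viewpoint
    gain        : hypothetical smoothing factor in (0, 1]; 1 adopts the
                  measured delay immediately
    """
    error = t_latest - t_predicted  # equals tau - tau'
    return tau_pred + gain * error


# Example: the server assumes 50 ms, but correction happens 70 ms after t.
tau = 0.050
for _ in range(30):
    tau = update_predicted_time(tau, t_latest=10.070, t_predicted=10.0 + tau)
```

With a constant actual delay, the predicted time approaches that delay geometrically. A smaller `gain` reacts more slowly but is less sensitive to jitter in individual measurements.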

(1) Because the rendered image of the reference viewpoint is corrected into the rendered image of the latest viewpoint based on its depth image and the latest viewpoint, the correction can reflect three-dimensional changes better than correction without a depth image. Highly accurate image correction is therefore possible even when the user's viewpoint changes greatly during rendering, and images with high-quality motion parallax can be presented to the user.

(2) Because the predicted viewpoint is calculated from the reference viewpoint and the rendered image of the predicted viewpoint is corrected into the rendered image of the latest viewpoint based on its depth image and the latest viewpoint, the amount of correction is smaller than when the rendered image of the reference viewpoint is corrected. This enables highly accurate image correction and further improves the quality of the rendered image.

(3) Using peripheral rendered images instead of, or together with, the rendered image of the predicted viewpoint mitigates the disocclusion problem while reducing the amount of communication between the rendering means and the image correction means.

(4) Because the optimal value of the predicted time is calculated based on the difference between the latest viewpoint and the predicted viewpoint and fed back to the viewpoint prediction means, the error between the latest viewpoint and the predicted viewpoint can be reduced. As a result, the amount of correction applied to the rendered image of the predicted viewpoint is smaller, which enables highly accurate image correction and further improves the quality of the rendered image.

(5) Because the transmission bit rate of the rendered image and the depth image is varied according to the difference between the predicted viewpoint and the latest viewpoint, a reduction in transmission delay and an improvement in quality of experience can both be achieved.

FIG. 1 is a functional block diagram of a 3DCG rendering system according to a first embodiment of the present invention.
FIG. 2 is a functional block diagram of the image correction unit.
FIG. 3 is a diagram comparing the appearance of rendering results.
FIG. 4 is a functional block diagram of a 3DCG rendering device according to a second embodiment of the present invention.
FIG. 5 is a diagram (part 1) showing a rendering method using predicted peripheral viewpoints.
FIG. 6 is a diagram (part 2) showing a rendering method using predicted peripheral viewpoints.
FIG. 7 is a diagram (part 3) showing a rendering method using predicted peripheral viewpoints.
FIG. 8 is a functional block diagram of a 3DCG rendering system according to a third embodiment of the present invention.
FIG. 9 is a functional block diagram of a 3DCG rendering device according to a fourth embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the main parts of a 3DCG rendering system according to a first embodiment of the present invention. The system is configured by connecting, via a network NW, a local terminal 1 (a display device in this embodiment) on which a user views 3DCG and a rendering server 2 that renders the 3DCG.

The display device 1 comprises a communication unit 10, a viewpoint estimation unit 11, an image correction unit 12, and a display unit 13. A video output interface may be provided in place of the display unit 13 so that a general-purpose display device can be connected. The rendering server 2 comprises a communication unit 20, a viewpoint prediction unit 21, and a rendering unit 22.

In the display device 1, the viewpoint estimation unit 11 estimates viewpoint information P, in a global coordinate system, of the user viewing the 3D model. In this embodiment, it estimates the viewpoint information Pt of each of the left and right eyes at a reference time t (hereinafter also referred to as the reference viewpoint Pt), and further estimates the viewpoint information Pt+τ of each eye at a latest time t+τ, at which a predetermined delay time τ has elapsed from the reference time t (hereinafter also referred to as the latest viewpoint Pt+τ).

Here, the reference time t is the instant at which the viewpoint information is estimated, and the delay time τ is the time that elapses until the image correction unit 12, described in detail later, corrects the rendered image generated on the basis of the reference viewpoint Pt into a rendered image of the latest viewpoint Pt+τ. The reference viewpoint Pt is transmitted to the rendering server 2 via the communication unit 10 and the network. The latest viewpoint Pt+τ is provided to the image correction unit 12.

The viewpoint estimation unit 11 estimates the user's viewpoint information P using any viewpoint estimation technique suited to the type of the display unit 13. For example, if the display is an HMD, a tracker mounted on the HMD can be used (such trackers generally estimate the head pose with six degrees of freedom using sensors such as cameras, accelerometers, and gyroscopes). If the display is a light field display, an eye tracker mounted on the display can be used (such trackers generally estimate the three-dimensional positions of both eyes using a high-speed camera and a depth sensor).

In this embodiment, the format of the viewpoint information P is not particularly limited. In general, the geometric transformation that maps each vertex of a 3D model onto the display surface requires a 4×4 model-view-projection (MVP) matrix. The MVP matrix can be calculated from, for example, the three-dimensional positions and viewing directions of both eyes in the global coordinate system (six degrees of freedom per eye) or the three-dimensional position and orientation of the head in the global coordinate system (six degrees of freedom in total), so it is desirable to handle the viewpoint information in one of these formats. The MVP matrix itself may also be handled as the viewpoint information.
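The relationship between an eye's position and viewing direction and an MVP matrix could be sketched as follows. This is a minimal sketch under stated assumptions: a right-handed, OpenGL-style convention with column vectors and a symmetric perspective frustum. All function names and the default field of view, aspect ratio, and clipping distances are hypothetical. An actual HMD would normally supply its own, often asymmetric, per-eye projection.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def view_matrix(eye, forward, up=(0.0, 1.0, 0.0)):
    """World-to-eye transform; the eye looks along -Z in its own frame."""
    f = _normalize(forward)
    r = _normalize(_cross(f, up))
    u = _cross(r, f)
    return [[r[0], r[1], r[2], -_dot(r, eye)],
            [u[0], u[1], u[2], -_dot(u, eye)],
            [-f[0], -f[1], -f[2], _dot(f, eye)],
            [0.0, 0.0, 0.0, 1.0]]


def perspective(fov_y_deg, aspect, near, far):
    """Symmetric OpenGL-style perspective projection."""
    s = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[s / aspect, 0.0, 0.0, 0.0],
            [0.0, s, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far),
             2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]


def mvp_matrix(eye, forward, model,
               fov_y_deg=90.0, aspect=1.0, near=0.1, far=100.0):
    """MVP = projection * view * model for one eye."""
    proj = perspective(fov_y_deg, aspect, near, far)
    return matmul4(proj, matmul4(view_matrix(eye, forward), model))


def project(mvp, point):
    """Apply the MVP matrix to a 3D point and return normalized device coords."""
    p = (point[0], point[1], point[2], 1.0)
    clip = [sum(mvp[i][j] * p[j] for j in range(4)) for i in range(4)]
    return [c / clip[3] for c in clip[:3]]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
left_mvp = mvp_matrix(eye=(0.0, 0.0, 5.0), forward=(0.0, 0.0, -1.0),
                      model=IDENTITY)
```

For binocular viewing, the same function would be called once per eye with that eye's position and direction, giving the two matrices used for the left and right images.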

In the rendering server 2, the communication unit 20 acquires the reference viewpoint Pt from the viewpoint estimation unit 11 via the network. Based on a predicted time τ' that estimates the delay time τ, the viewpoint prediction unit 21 calculates the user's viewpoint information P't+τ' at the predicted time point t+τ' (hereinafter also referred to as the predicted viewpoint P't+τ').

The predicted time τ' is an estimate of the delay time τ, which is generally called the Motion-to-Photon Latency and is the sum of various processing times such as the 3DCG rendering, the display processing, and the data communication between processing stages. Ideally the predicted time τ' equals the delay time τ, but τ fluctuates due to various factors and is difficult to predict accurately in advance. The predicted time τ' may therefore be a fixed value, or it may be made dynamically adjustable by the user.

The viewpoint prediction unit 21 calculates the user's predicted viewpoint P't+τ' at the predicted time point t+τ' using any viewpoint prediction technique. For example, a general time-series filter such as a Kalman filter can estimate future viewpoint information from time-series data of past viewpoint information. Non-Patent Document 1 discloses a method that applies two types of smoothing filters (a moving average and a Savitzky-Golay filter) to the time-series viewpoint data to remove jitter noise and then estimates the predicted viewpoint with a time-series filter.
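To make the prediction step concrete, the following minimal sketch extrapolates an eye position by fitting a least-squares line to recent samples. It is a deliberately simple stand-in, not the Kalman filter or the smoothing pipeline of Non-Patent Document 1. The function name `predict_viewpoint` and the sample format are assumptions made for this example, and only position (not orientation) is predicted.

```python
def predict_viewpoint(samples, tau_pred):
    """Predict the position at (last sample time + tau_pred).

    samples  : list of (t, (x, y, z)) with at least two distinct times
    tau_pred : predicted time tau' in seconds
    A straight line is fitted to each axis by least squares, which averages
    out some jitter, and is then evaluated at the target time.
    """
    n = len(samples)
    ts = [t for t, _ in samples]
    t_mean = sum(ts) / n
    var_t = sum((t - t_mean) ** 2 for t in ts)
    t_target = ts[-1] + tau_pred
    predicted = []
    for k in range(3):
        xs = [p[k] for _, p in samples]
        x_mean = sum(xs) / n
        slope = sum((t - t_mean) * (x - x_mean)
                    for t, x in zip(ts, xs)) / var_t
        predicted.append(x_mean + slope * (t_target - t_mean))
    return tuple(predicted)


# Head moving at constant velocity, sampled every 10 ms.
history = [(0.00, (0.0, 1.5, 0.00)),
           (0.01, (1.0, 1.5, -0.02)),
           (0.02, (2.0, 1.5, -0.04)),
           (0.03, (3.0, 1.5, -0.06))]
p_pred = predict_viewpoint(history, tau_pred=0.02)
```

A linear model of this kind is exact for constant-velocity motion but overshoots when the head suddenly starts or stops, which is the situation the depth-based correction on the display device is meant to absorb.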

The rendering unit 22 renders the 3DCG using the predicted viewpoint P't+τ' provided by the viewpoint prediction unit 21, thereby generating a rendered image of the predicted viewpoint P't+τ' and its depth image. This embodiment does not particularly limit the format of the 3D model, but assumes a general 3D model containing vertex coordinates, normals, colors, texture coordinates, and the like.

Here, the rendered image is the RGB color image that results from ordinary 3DCG rendering. It can be generated by calculating the model-view-projection matrix M't+τ' from the predicted viewpoint P't+τ' and rendering the 3DCG with it.

For binocular stereoscopic viewing, two different RGB color images are generated, one for each eye. In that case, two different model-view-projection matrices ML't+τ' and MR't+τ' are calculated for the left and right eyes, and rendering is performed with each.

When rendering the 3DCG from the input viewpoint information, the rendering unit 22 obtains the depth images D (D^L, D^R) for the left and right eyes from a memory area generally called the Z-buffer and provides them, together with the rendered images, to the image correction unit 12 via the network.
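The text does not state how the Z-buffer values are encoded. If the renderer uses a standard OpenGL-style perspective projection, the stored value in [0, 1] is nonlinear in distance and would first be converted to a metric depth before any back-projection. The following is a sketch under that assumption only; the function name `linearize_depth` and the `near` and `far` parameters are introduced for illustration.

```python
def linearize_depth(z_buffer, near, far):
    """Convert an OpenGL-style Z-buffer value in [0, 1] to eye-space depth.

    Assumes a standard perspective projection with the default depth range,
    where 0 maps to the near plane and 1 to the far plane.
    """
    z_ndc = 2.0 * z_buffer - 1.0
    return 2.0 * near * far / (far + near - z_ndc * (far - near))


depth_at_half = linearize_depth(0.5, near=0.1, far=100.0)
```

Because the encoding concentrates precision near the viewer, a buffer value of 0.5 corresponds to a depth only slightly beyond the near plane, not to the midpoint between the planes.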

In the display device 1, upon acquiring the predicted viewpoint P't+τ', the rendered image, and its depth image D (D^L, D^R) from the rendering unit 22 of the rendering server 2, the image correction unit 12 acquires the latest viewpoint Pt+τ at the latest time t+τ from the viewpoint estimation unit 11. It then performs image correction that approximately converts the rendered image, which was rendered at the predicted viewpoint P't+τ', into an image rendered at the latest viewpoint Pt+τ.

For binocular stereoscopic viewing, the image correction unit 12 applies the correction to the rendered images I^L and I^R of both eyes to generate corrected rendered images I^L'' and I^R'', and outputs these to the display unit 13 as the rendered images for display.

FIG. 2 is a functional block diagram showing the configuration of the image correction unit 12, whose main components are a corrected depth image generation unit 121 and a corrected image generation unit 122. The operation of the image correction unit 12 is described here using binocular stereoscopic viewing as an example.

The corrected depth image generation unit 121 generates corrected depth images D^L'', D^R'' corresponding to the latest viewpoint Pt+τ based on the predicted viewpoint P't+τ', the latest viewpoint Pt+τ, and the depth images D^L, D^R. The corrected image generation unit 122 generates corrected images (corrected rendered images) I^L'', I^R'' corresponding to the latest viewpoint Pt+τ based on the predicted viewpoint P't+τ', the latest viewpoint Pt+τ, the rendered images I^L, I^R, and the corrected depth images D^L'', D^R''.

The corrected depth image generation unit 121 generates the corrected depth images D^L'', D^R'' by applying a viewpoint transformation to the pixel coordinates of the depth images D^L, D^R. In this embodiment, based on the predicted viewpoint P't+τ' and the latest viewpoint Pt+τ, it calculates a rotation matrix R_dL and a translation vector t_dL that convert the three-dimensional coordinates X_d of the depth image D^L in the global coordinate system into the three-dimensional coordinates X_dL'' of the corrected depth image D^L''.

Next, each pixel coordinate (u_d, v_d) of the depth image D and its depth value d(u_d, v_d) are converted (back-projected) into three-dimensional coordinates (X_d^L'', Y_d^L'', Z_d^L'') by applying equation (1). Applying equation (2) to these three-dimensional coordinates yields the corresponding two-dimensional pixel coordinates (u_d^L'', v_d^L'') in the corrected depth image D^L''. As equation (3) shows, the depth value d^L''(u_d^L'', v_d^L'') is the Z_d^L'' component of the three-dimensional coordinates.
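Equations (1) to (3) themselves are not reproduced in this text. Under a standard pinhole camera model, relations of the kind described would take roughly the following form. This is a hedged reconstruction from the surrounding definitions and not the equations as published. It assumes that d is the depth along the optical axis and that R_dL and t_dL map coordinates from the frame of the predicted viewpoint to the frame of the latest viewpoint.

```latex
% (1) back-projection of pixel (u_d, v_d) with depth d, moved into the
%     frame of the latest viewpoint (assumed form)
\begin{equation}
\begin{pmatrix} X_d^{L''} \\ Y_d^{L''} \\ Z_d^{L''} \end{pmatrix}
  = R_{dL}\, d(u_d, v_d)\, K^{-1}
    \begin{pmatrix} u_d \\ v_d \\ 1 \end{pmatrix} + t_{dL}
\tag{1}
\end{equation}

% (2) projection into the corrected depth image (assumed form)
\begin{equation}
\begin{pmatrix} u_d^{L''} \\ v_d^{L''} \\ 1 \end{pmatrix}
  = \frac{1}{Z_d^{L''}}\, K
    \begin{pmatrix} X_d^{L''} \\ Y_d^{L''} \\ Z_d^{L''} \end{pmatrix}
\tag{2}
\end{equation}

% (3) the new depth value is the Z component
\begin{equation}
d^{L''}\!\left(u_d^{L''}, v_d^{L''}\right) = Z_d^{L''}
\tag{3}
\end{equation}
```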

Here, K is the camera intrinsic parameter matrix, and values suited to the display device 1 in use are assumed to be set in advance. Alternatively, an appropriate parameter matrix may be obtained from the display unit 13 via a predetermined API, or it may be determined automatically by applying a predetermined HMD calibration method when the system is first started.

In this embodiment, the depth image D^L is converted into the corrected depth image D^L'' by performing the above processing on all of its pixel coordinates. Repeating the same procedure for the right eye yields the corrected depth image D^R'' for the right eye.
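The per-pixel procedure described above amounts to a forward warp of the depth image. A minimal sketch follows, assuming a pinhole model with intrinsics fx, fy, cx, cy (the entries of K), metric depth along the optical axis, and a rigid transform X'' = R X + t between the two eye frames. The function names, the use of 0 to mark missing depth, and the nearest-pixel rounding are choices made for this example only.

```python
def _mat_vec3(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def warp_depth(depth, fx, fy, cx, cy, R, t):
    """Forward-warp a depth map (list of rows, 0 = no data) to a new view.

    Each valid pixel is back-projected to 3D, moved by X'' = R X + t,
    projected into the new view, and written with a nearest-depth test so
    that closer surfaces win when several pixels land on the same target.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            if d <= 0:
                continue
            point = [(u - cx) / fx * d, (v - cy) / fy * d, d]
            moved = [a + b for a, b in zip(_mat_vec3(R, point), t)]
            if moved[2] <= 0:
                continue  # behind the new viewpoint
            u2 = int(round(fx * moved[0] / moved[2] + cx))
            v2 = int(round(fy * moved[1] / moved[2] + cy))
            if 0 <= u2 < w and 0 <= v2 < h:
                if out[v2][u2] == 0.0 or moved[2] < out[v2][u2]:
                    out[v2][u2] = moved[2]
    return out


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
flat = [[2.0] * 4 for _ in range(2)]
shifted = warp_depth(flat, 2.0, 2.0, 1.5, 0.5, I3, [1.0, 0.0, 0.0])
```

In this example the flat surface at depth 2 shifts by one pixel, and the leftmost column is left empty. Such empty pixels are the holes (disocclusions) that later filtering or additional depth images have to fill.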

Instead of the depth image D, a reduced depth image Ds obtained by downscaling D (lowering its resolution) may be used to generate reduced corrected depth images Ds^L'', Ds^R'', which are then upscaled to obtain D^L'', D^R''. This reduces the number of pixels that must be transformed and can be expected to lower the processing load.

Because the generated corrected depth image D^L'' generally contains noise and holes, a median filter or a bilateral filter may be applied to remove noise and smooth it.
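As one possible form of the filtering mentioned above, the sketch below applies a 3×3 median filter that ignores missing pixels, so that small holes are filled from their valid neighbors. The convention that 0 marks a hole, the window size, and the function name are assumptions for this example. The text does not specify how holes are represented or which filter parameters are used.

```python
import statistics


def median_filter_depth(depth):
    """3x3 median filter over a depth map given as a list of rows.

    Pixels with value 0 are treated as holes and excluded from the window,
    so a hole takes the median of its valid neighbors. median_low is used
    so that the result is always an actual sample, never an average of a
    near and a far surface.
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for v in range(h):
        for u in range(w):
            window = [depth[j][i]
                      for j in range(max(0, v - 1), min(h, v + 2))
                      for i in range(max(0, u - 1), min(w, u + 2))
                      if depth[j][i] > 0]
            if window:
                out[v][u] = statistics.median_low(window)
    return out


noisy = [[2.0, 2.0, 2.0],
         [2.0, 9.0, 2.0],   # 9.0 is an outlier
         [2.0, 0.0, 2.0]]   # 0.0 is a hole
cleaned = median_filter_depth(noisy)
```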

When multiple depth images D are provided, the above processing is performed on every depth image D, and the corrected depth images obtained from them are merged. This yields a corrected depth image in which the holes in disocclusion regions have been filled.
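The text does not say how the corrected depth images are merged. One plausible rule, shown here purely as an assumption, is to keep for each pixel the nearest valid depth among all candidates, which mirrors an ordinary depth test. The function name and the use of 0 for missing data are likewise hypothetical.

```python
def merge_depth_maps(maps):
    """Merge corrected depth maps of equal size (0 = no data) per pixel.

    For each pixel the smallest positive depth is kept, so a hole in one
    map is filled by any other map that sees a surface there.
    """
    h, w = len(maps[0]), len(maps[0][0])
    merged = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            valid = [m[v][u] for m in maps if m[v][u] > 0]
            if valid:
                merged[v][u] = min(valid)
    return merged


from_center = [[0.0, 2.0, 2.0]]   # hole on the left after warping
from_side = [[5.0, 5.0, 0.0]]     # a side view sees the background there
merged = merge_depth_maps([from_center, from_side])
```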

The corrected image generation unit 122 takes the rendered image I (I^L, I^R) of the predicted viewpoint P't+τ' as input and generates the rendered images (corrected images) I^L'', I^R'' of the latest viewpoint Pt+τ by applying a viewpoint transformation to the pixel coordinates. Specifically, based on the predicted viewpoint P't+τ' and the latest viewpoint Pt+τ, it calculates a rotation matrix R_iL and a translation vector t_iL that convert the three-dimensional coordinates X_iL of the rendered image I^L in the global coordinate system into the three-dimensional coordinates X_iL'' of the corrected image I^L'', and uses equation (4) to calculate the pixel coordinates (u_iL, v_iL) of the rendered image I^L that correspond to the pixel coordinates (u_iL'', v_iL'') of the corrected image I^L''.
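Equation (4) is not reproduced in this text. Since the mapping runs from a pixel of the corrected image back to a pixel of the rendered image, a relation of the following form would be consistent with the description. This is a hedged reconstruction and not the published equation. It assumes a pinhole model, the corrected depth value d^{L''} at the target pixel, and a rigid transform X_iL'' = R_iL X_iL + t_iL.

```latex
% Back-project the target pixel with the corrected depth, undo the rigid
% transform, and project into the source image (assumed form).
\begin{equation}
\begin{pmatrix} X_{iL} \\ Y_{iL} \\ Z_{iL} \end{pmatrix}
  = R_{iL}^{-1} \left(
      d^{L''}\!\left(u_{iL}'', v_{iL}''\right) K^{-1}
      \begin{pmatrix} u_{iL}'' \\ v_{iL}'' \\ 1 \end{pmatrix}
      - t_{iL} \right),
\qquad
\begin{pmatrix} u_{iL} \\ v_{iL} \\ 1 \end{pmatrix}
  = \frac{1}{Z_{iL}}\, K
    \begin{pmatrix} X_{iL} \\ Y_{iL} \\ Z_{iL} \end{pmatrix}
\tag{4}
\end{equation}
```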

The color value of the corrected image I^L'' is then obtained by looking up the color value at the pixel coordinates (u_iL, v_iL) of the rendered image I^L. Repeating this processing for all pixel coordinates of the corrected image I^L'' generates the corrected image I^L'' for the left eye. Switching the rotation matrix and translation vector to those for the right eye and repeating the same processing generates the corrected image I^R'' for the right eye.
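The color lookup described above is a backward warp: for each pixel of the corrected image, the corresponding source pixel is located and its color copied. A minimal sketch follows, assuming a pinhole model with intrinsics fx, fy, cx, cy, a rigid transform X'' = R X + t from the predicted to the latest eye frame, and nearest-neighbor sampling. The function name and the `background` color used where no source pixel is found are hypothetical.

```python
def backward_warp(image, depth_new, fx, fy, cx, cy, R, t,
                  background=(0, 0, 0)):
    """Build the corrected image by sampling the source image.

    image     : source colors as a list of rows of (r, g, b) tuples
    depth_new : corrected depth map for the target view (0 = no data)
    R, t      : transform from source frame to target frame, X'' = R X + t
    """
    h, w = len(image), len(image[0])
    # The inverse of a rotation matrix is its transpose.
    r_inv = [[R[j][i] for j in range(3)] for i in range(3)]
    out = [[background] * w for _ in range(h)]
    for v2 in range(h):
        for u2 in range(w):
            d = depth_new[v2][u2]
            if d <= 0:
                continue
            target = [(u2 - cx) / fx * d, (v2 - cy) / fy * d, d]
            diff = [a - b for a, b in zip(target, t)]
            src = [sum(r_inv[i][j] * diff[j] for j in range(3))
                   for i in range(3)]
            if src[2] <= 0:
                continue  # behind the source viewpoint
            u = int(round(fx * src[0] / src[2] + cx))
            v = int(round(fy * src[1] / src[2] + cy))
            if 0 <= u < w and 0 <= v < h:
                out[v2][u2] = image[v][u]
    return out


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
src_image = [[(u, v, 0) for u in range(4)] for v in range(2)]
new_depth = [[2.0] * 4 for _ in range(2)]
corrected = backward_warp(src_image, new_depth, 2.0, 2.0, 1.5, 0.5,
                          I3, [1.0, 0.0, 0.0], background=(-1, -1, -1))
```

Sampling backward from the target image leaves no cracks between pixels, which is a common reason to warp the depth forward first and the color backward afterwards. Pixels whose source location falls outside the rendered image keep the background color.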

Techniques that generate an image from a new viewpoint out of images captured or rendered from a particular viewpoint are generally known as Novel View Synthesis (NVS), and many such methods exist. The image correction unit 12 of this embodiment is therefore not limited to the configuration of FIG. 2, and any NVS method can be applied.

また、本実施形態ではレンダリングサーバ2からディスプレイデバイス1へ伝送されるレンダリング画像およびその深度画像はH.264等の任意の映像符号化手法により符号化圧縮される。このとき、画像補正部12が予測視点P't+τ'と最新視点Pt+τとの差分を算出し、当該差分が所定の閾値以上であればレンダリングサーバ2へビットレート制御情報を提供し、通信部20はレンダリング画像のビットレート（画質・解像度）を下げるようにしても良い。 In this embodiment, the rendered image and its depth image transmitted from the rendering server 2 to the display device 1 are encoded and compressed by any video encoding method such as H.264. At this time, the image correction unit 12 calculates the difference between the predicted viewpoint P't+τ' and the latest viewpoint Pt+τ, and if the difference is equal to or greater than a predetermined threshold, it may provide bit rate control information to the rendering server 2, and the communication unit 20 may reduce the bit rate (image quality/resolution) of the rendered image.

As described, when the difference between the predicted viewpoint P't+τ' and the latest viewpoint Pt+τ is large, viewpoint prediction accuracy is low and the accuracy of image correction falls with it. In this embodiment, the bit rate of the rendered image is lowered according to that difference, which reduces the transmission delay; with a shorter delay, the accuracy of viewpoint estimation rises, and the accuracy of image correction can be expected to improve. The accuracy of image correction, and with it the quality of experience, can therefore be improved effectively even when viewpoint prediction is poor.

あるいは、画像補正部12が予測視点P't+τ'と最新視点Pt+τとの差分が所定の閾値以上の場合にレンダリングサーバ2へビットレート制御情報を提供し、通信部20は深度画像のビットレート（画質・解像度）を上げるようにしても良い。前記差分が大きいと視点予測の精度が低下し、画像補正の精度も低下するところ、本実施形態では当該差分に応じて深度画像のビットレートが上がり、精度が向上するので、画像補正の精度向上が期待できる。これにより、視点予測の精度が悪い場合でも効果的に画像補正の精度を向上させることができ、体感品質を向上させることができる。 Alternatively, the image correction unit 12 may provide bit rate control information to the rendering server 2 when the difference between the predicted viewpoint P't+τ' and the latest viewpoint Pt+τ is equal to or greater than a predetermined threshold, and the communication unit 20 may increase the bit rate (image quality/resolution) of the depth image. If the difference is large, the accuracy of viewpoint prediction decreases, and the accuracy of image correction also decreases. In this embodiment, however, the bit rate of the depth image increases according to the difference, improving accuracy, and therefore improving the accuracy of image correction can be expected. This makes it possible to effectively improve the accuracy of image correction even when the accuracy of viewpoint prediction is poor, thereby improving the quality of experience.

Here, when calculating the difference between the predicted viewpoint P't+τ' and the latest viewpoint Pt+τ, the image correction unit 12 may lower the bit rate of the rendered image, or raise the bit rate of the depth image, when the norm of the translation vector t_iL used in the viewpoint transformation (three-dimensional coordinate transformation) is equal to or greater than a predetermined threshold. The delay is then reduced, or the accuracy of the depth image improved, only when the amount of three-dimensional image correction is large, which improves the accuracy of image correction.
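The threshold test on the translation vector can be sketched as follows. This is a minimal Python illustration under stated assumptions: the threshold value, the scaling factors 0.5 and 1.5, and the names Bitrates and control_bitrates are all hypothetical, since the description does not specify how far the bit rate is changed.

```python
import math
from typing import NamedTuple, Sequence


class Bitrates(NamedTuple):
    color_kbps: int  # stream carrying the rendered (color) image
    depth_kbps: int  # stream carrying the depth image


def control_bitrates(t: Sequence[float], current: Bitrates,
                     threshold: float = 0.02,
                     raise_depth: bool = False) -> Bitrates:
    """Return the bit rates to request from the rendering server."""
    norm = math.sqrt(sum(c * c for c in t))  # norm of the translation vector
    if norm < threshold:
        return current  # small correction: leave both streams unchanged
    if raise_depth:
        # Alternative: spend more bits on depth to improve correction accuracy.
        return Bitrates(current.color_kbps, int(current.depth_kbps * 1.5))
    # Default: lower the color bit rate to shorten the transmission delay.
    return Bitrates(int(current.color_kbps * 0.5), current.depth_kbps)
```

For example, control_bitrates((0.03, 0.0, 0.0), Bitrates(8000, 2000)) requests a lower color bit rate because the norm 0.03 is at or above the assumed threshold of 0.02.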

なお、通信部10，20間の通信路は一般的なインターネット、WAN、LAN等のネットワークに限定されるものではなく、Bluetoothに代表されるp2pネットワークでも良いし、USBに代表されるシリアル通信でも良い。 The communication path between the communication units 10 and 20 is not limited to general networks such as the Internet, WAN, or LAN, but may be a p2p network such as Bluetooth, or a serial communication such as USB.

ディスプレイ部13は、画像補正部12が出力する補正画像をディスプレイに表示する。本実施形態では、ディスプレイ部13として片眼／両眼HMD、スマートフォン、タブレット、ライトフィールドディスプレイ等、様々なディスプレイデバイスを用いることが可能であり、特にディスプレイの種類を限定するものではない。一方、ディスプレイ部13が使用する補正画像の枚数に応じて、レンダリング部22や画像補正部12は適切な枚数（視点数）のレンダリング画像を生成する必要がある。 The display unit 13 displays the corrected image output by the image correction unit 12 on a display. In this embodiment, various display devices such as a monocular/binocular HMD, a smartphone, a tablet, a light field display, etc. can be used as the display unit 13, and the type of display is not particularly limited. Meanwhile, depending on the number of corrected images used by the display unit 13, the rendering unit 22 and the image correction unit 12 need to generate an appropriate number of rendering images (number of viewpoints).

For example, a monocular HMD, smartphone, or tablet requires one rendering image and one corrected image. A binocular HMD or a viewpoint-tracking light field display requires two of each. Furthermore, for a light field display that requires images from three or more viewpoints (a multi-view or super-multi-view display), the rendering unit 22 and the image correction unit 12 generate as many rendering images and corrected images as the display requires (its number of viewpoints).

Figure 3 compares how the image appears with this embodiment and with other methods, taking the image rendered at the actual positions of both eyes as the ground-truth image.

予測視点P't+τ'でのレンダリング画像は遅延時間τの影響により正解画像との乖離が大きくなる。また、予測視点P't+τ'でのレンダリング画像を最新視点Pt+τで補正する従来手法では、補正に立体的な変化を反映できないことが原因で中程度の乖離が生じる。これに対して、本実施形態のように予測視点P't+τ'でのレンダリング画像を最新視点Pt+τおよび深度画像で補正すれば、補正に立体的な変化を反映できるので乖離を少なくできる。 The rendering image at the predicted viewpoint P't+τ' deviates greatly from the correct image due to the influence of the delay time τ. Furthermore, in the conventional method of correcting the rendering image at the predicted viewpoint P't+τ' with the latest viewpoint Pt+τ, a moderate deviation occurs because the correction cannot reflect three-dimensional changes. In contrast, if the rendering image at the predicted viewpoint P't+τ' is corrected with the latest viewpoint Pt+τ and the depth image as in this embodiment, the correction can reflect three-dimensional changes, thereby reducing the deviation.

図4は、本発明の第2実施形態に係る3DCGレンダリング装置3の構成を示した機能ブロック図であり、前記と同一の符号は同一又は同等部分を表しているので、その説明は省略する。 Figure 4 is a functional block diagram showing the configuration of a 3DCG rendering device 3 according to a second embodiment of the present invention. The same reference numerals as those described above represent the same or equivalent parts, and therefore their explanation will be omitted.

上記の第1実施形態では、本発明を実現する構成がディスプレイデバイス1とレンダリングサーバ2とに分散配置され、ディスプレイデバイス1およびレンダリングサーバ2がネットワーク経由で協調動作するシステムにより本発明の機能を実現するものとして説明した。 In the above first embodiment, the configuration for implementing the present invention is distributed between the display device 1 and the rendering server 2, and the functions of the present invention are implemented by a system in which the display device 1 and the rendering server 2 operate cooperatively via a network.

これに対して、本実施形態では各機能が1台の3DCGレンダリング装置3で実現されている。このような3DCGレンダリング装置3は、汎用のコンピュータやサーバに各機能を実現するアプリケーション（プログラム）を実装することで構成できる。あるいはアプリケーションの一部をハードウェア化またはソフトウェア化した専用機や単能機としても構成できる。 In contrast, in this embodiment, each function is realized by a single 3DCG rendering device 3. Such a 3DCG rendering device 3 can be configured by implementing an application (program) that realizes each function on a general-purpose computer or server. Alternatively, it can be configured as a dedicated machine or a single-function machine in which part of the application is implemented as hardware or software.

本実施形態では、第1実施形態において考慮したネットワーク遅延を考慮する必要が無いので、前記視点予測部21は遅延時間τの予測時間τ'からネットワーク遅延相当分を除去できる。その結果、遅延時間τと予測時間τ'と差分が小さくなるので、画像補正部12による補正後の視点の精度を向上させることができる。 In this embodiment, there is no need to consider the network delay considered in the first embodiment, so the viewpoint prediction unit 21 can remove the network delay equivalent from the predicted time τ' of the delay time τ. As a result, the difference between the delay time τ and the predicted time τ' becomes smaller, so the accuracy of the viewpoint after correction by the image correction unit 12 can be improved.

また、上記の各実施形態では視点予測部21が予測視点P't+τ'の計算に用いる遅延時間τが固定的であるものとして説明したが、本発明はこれのみに限定されるものではなく、遅延時間τの予測時間τ'との差分に基づいて適応的に制御されても良い。 In addition, in each of the above embodiments, the delay time τ used by the viewpoint prediction unit 21 to calculate the predicted viewpoint P't+τ' has been described as being fixed, but the present invention is not limited to this, and the delay time τ may be adaptively controlled based on the difference between the delay time τ and the predicted time τ'.

すなわち、画像補正部12が最新時刻t+τと予測時刻t+τ'との差分を最小化する予測時間τ'の最適値を計算して視点予測部21へフィードバックし、視点予測部21が当該最適値を予測時間τ'として予測視点P't+τ'を計算するようにしても良い。これにより最新視点Pt+τに対する予測視点P't+τ'の誤差が小さくなり、画像補正部12による補正量が少なくなって精度の高い画像補正が可能となるので、レンダリング画像の品質を更に向上させることができる。 That is, the image correction unit 12 may calculate an optimal value of the predicted time τ' that minimizes the difference between the latest time t+τ and the predicted time t+τ', and feed this back to the viewpoint prediction unit 21, which may then use this optimal value as the predicted time τ' to calculate the predicted viewpoint P't+τ'. This reduces the error of the predicted viewpoint P't+τ' relative to the latest viewpoint Pt+τ, reduces the amount of correction by the image correction unit 12, and enables highly accurate image correction, thereby further improving the quality of the rendered image.
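The description does not state how the optimal value of τ' is computed. One simple possibility, shown here purely as an assumption, is to smooth the observed delay with an exponential moving average and feed the result back as the next predicted time. The class name PredictionHorizon and the smoothing factor alpha are hypothetical.

```python
class PredictionHorizon:
    """Tracks the predicted time tau' from observed delays (illustrative only)."""

    def __init__(self, initial_tau_pred: float, alpha: float = 0.2) -> None:
        self.tau_pred = initial_tau_pred  # current predicted time tau' in seconds
        self.alpha = alpha                # smoothing factor, 0 < alpha <= 1

    def update(self, t_ref: float, t_latest: float) -> float:
        """Fold one observed delay tau = t_latest - t_ref into tau'."""
        tau = t_latest - t_ref
        # Move tau' a fraction alpha toward the observed delay, shrinking
        # the gap between the latest time t+tau and the predicted time t+tau'.
        self.tau_pred += self.alpha * (tau - self.tau_pred)
        return self.tau_pred
```

In this sketch the image correction side would call update once per displayed frame and send the returned value to the viewpoint prediction side for the next prediction.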

ここで、第1および第2実施形態では、両眼立体視を行う場合にレンダリング部22が両眼用に2枚（左右）のレンダリング画像I^L，I^Rに対応した2枚の深度画像D^L，D^Rを取得し、レンダリング画像I^L，I^Rおよびその深度画像D^L，D^Rを画像補正部12へ提供するものとして説明したが、本発明はこれのみに限定されるものではない。 Here, in the first and second embodiments, it has been described that when performing binocular stereoscopic vision, the rendering unit 22 obtains two depth images D ^L , D ^R corresponding to two rendering images I ^L , I ^R for both eyes (left and right), and provides the rendering images I ^L , I ^R and the depth images D ^L , D ^R to the image correction unit 12, but the present invention is not limited to this.

For example, a single common depth image D^LR may be acquired by calculating, from the predicted viewpoint P't+τ', an MVP matrix MC't+τ' corresponding to the viewpoint at the center position between both eyes (the central viewpoint) and rendering with it. This reduces the amount of communication between the rendering unit 22 and the image correction unit 12 compared with transmitting two depth images D^L, D^R, so the transmission delay can be reduced, particularly in the first embodiment. The image correction unit 12 then uses the single common depth image D^LR to correct the two rendering images I^L, I^R into two rendering images I^L'', I^R'' corresponding to the latest viewpoint Pt+τ.

Alternatively, as shown in Figure 5, the viewpoint prediction unit 21 may predict, in addition to the predicted viewpoints P't+τ' (P^L', P^R') of the left and right eyes, viewpoints surrounding them (predicted peripheral viewpoints) P^L'2, P^R'2, for example viewpoints outside the predicted viewpoints P^L', P^R'. The rendering unit 22 may then generate peripheral rendering images I^L2, I^R2 corresponding to the predicted peripheral viewpoints P^L'2, P^R'2 in addition to the rendering images I^L, I^R corresponding to the predicted viewpoints P^L', P^R'.

The rendering images I^L, I^R and the peripheral rendering images I^L2, I^R2 are provided to the image correction unit 12 together with their respective depth images. However, because the depth images of the peripheral rendering images I^L2, I^R2 can be generated from the rendering images I^L, I^R of the predicted viewpoints P^L', P^R', which are different viewpoints, only the depth images of the rendering images I^L, I^R may be provided. The predicted peripheral viewpoints P^L'2, P^R'2 are not limited to positions outside the predicted viewpoints P^L', P^R'; positions above or below the predicted viewpoints P^L', P^R', for example, may also be used.

また、図6に示すように、視点予測部21はユーザの頭部運動の方向に応じて予測周辺視点の配置を調整しても良い。ここで、同図(a)はユーザの頭部運動の方向が鉛直方向の例を、同図(b)はユーザの頭部運動の方向が水平方向の例を示している。なお、ユーザの頭部運動の方向は同図(a)，(b)の鉛直方向や水平方向に限定されるものではなく、斜めの方向など、任意の方向を取ることが可能である。 As shown in FIG. 6, the viewpoint prediction unit 21 may adjust the placement of the predicted peripheral viewpoints depending on the direction of the user's head movement. Here, FIG. 6(a) shows an example where the direction of the user's head movement is vertical, and FIG. 6(b) shows an example where the direction of the user's head movement is horizontal. Note that the direction of the user's head movement is not limited to the vertical and horizontal directions in FIG. 6(a) and (b), and can be any direction, such as a diagonal direction.

例えば、視点予測部21は表示している3DCGモデルの中心、基準視点Ptおよび予測視点P't+τ'の3点の外積により、3DCGモデルの中心を通る、頭部運動の回転軸を算出する。そして、頭部運動の回転軸を中心として、予測視点P^L'，P^R'の一方を順方向に、他方を逆方向に、それぞれ回転角θだけ回転させた位置を予測周辺視点P^L'2，P^R'2として算出できる。ここで、回転角θは例えば5度とする等、固定値として指定しても良いし、ユーザの頭部運動の速度に応じて、当該速度が速いほど回転角θを大きくするなどの調整をしても良い。ユーザの頭部運動の方向を考慮して予測周辺視点の配置を調整することで、より高精度な補正画像を生成することが可能になる。 For example, the viewpoint prediction unit 21 calculates the rotation axis of the head movement passing through the center of the 3DCG model by the cross product of the three points of the center of the displayed 3DCG model, the reference viewpoint Pt, and the predicted viewpoint P't+τ'. Then, the predicted peripheral viewpoints P ^L'2 , P ^R'2 can be calculated by rotating one of the predicted viewpoints P ^L' , P ^R ' in the forward direction and the other in the reverse direction by a rotation angle θ around the rotation axis of the head movement. Here, the rotation angle θ may be specified as a fixed value, for example, 5 degrees, or may be adjusted according to the speed of the user's head movement, such as increasing the rotation angle θ as the speed increases. By adjusting the arrangement of the predicted peripheral viewpoints in consideration of the direction of the user's head movement, it becomes possible to generate a corrected image with higher accuracy.
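The geometry of this step can be sketched in Python as follows. The sketch takes the rotation axis as the normalized cross product of the vectors from the model center to the reference viewpoint and to the predicted viewpoint, and rotates points about that axis with Rodrigues' rotation formula. Which eye is rotated in the forward direction is an assumption made here (the left eye), and all function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotation_axis(center: Vec3, p_ref: Vec3, p_pred: Vec3) -> Vec3:
    """Unit axis of the head movement, passing through the model center."""
    n = cross(sub(p_ref, center), sub(p_pred, center))
    length = math.sqrt(dot(n, n))
    if length == 0.0:
        raise ValueError("viewpoints are collinear with the model center")
    return (n[0] / length, n[1] / length, n[2] / length)


def rotate_about(p: Vec3, center: Vec3, axis: Vec3, theta_deg: float) -> Vec3:
    """Rotate p by theta_deg about the unit axis through center (Rodrigues)."""
    th = math.radians(theta_deg)
    c, s = math.cos(th), math.sin(th)
    v = sub(p, center)
    kxv, kdv = cross(axis, v), dot(axis, v)
    return tuple(center[i] + v[i] * c + kxv[i] * s + axis[i] * kdv * (1.0 - c)
                 for i in range(3))


def peripheral_viewpoints(center: Vec3, p_ref: Vec3, p_pred: Vec3,
                          p_left: Vec3, p_right: Vec3,
                          theta_deg: float = 5.0) -> Tuple[Vec3, Vec3]:
    """Rotate one eye forward (+theta) and the other backward (-theta)."""
    axis = rotation_axis(center, p_ref, p_pred)
    return (rotate_about(p_left, center, axis, +theta_deg),
            rotate_about(p_right, center, axis, -theta_deg))
```

With this axis orientation, a positive angle turns a point in the same direction as the movement from the reference viewpoint to the predicted viewpoint. A speed-dependent θ could be passed in as theta_deg by the caller.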

The image correction unit 12 repeats the above processes using the rendering images I^L, I^R, the peripheral rendering images I^L2, I^R2, and the depth images to perform three-dimensional image correction. This resolves the problem known as disocclusion, in which surfaces of the 3D model that are newly exposed by motion parallax, and that the rendering image does not contain, appear as missing regions.

Alternatively, as shown in Fig. 7, the viewpoint prediction unit 21 may predict only the predicted peripheral viewpoints P^L'2, P^R'2, and the rendering unit 22 may generate only the peripheral rendering images I^L2, I^R2 based on the MVP matrices M^L2, M^R2 corresponding to the predicted peripheral viewpoints P^L'2, P^R'2. In this case, depth images of the peripheral rendering images I^L2, I^R2 are acquired by rendering with the same MVP matrices as those images. The predicted peripheral viewpoints P'2t+τ' (P^L'2, P^R'2) and their depth images are provided to the image correction unit 12 in place of the predicted viewpoint P't+τ' and its depth image.

The image correction unit 12 treats the predicted peripheral viewpoints P'2t+τ' and their depth images as the predicted viewpoint P't+τ' and its depth image, respectively, generates a corrected image corresponding to the latest viewpoint Pt+τ, and outputs it as the rendered image.

When peripheral rendering images and their depth images are input from the rendering unit 22, the image correction unit may check, while calculating the pixel coordinates (u_iL, v_iL) of the rendering image I^L corresponding to the pixel coordinates (u_iL'', v_iL'') of the corrected image I^L'', whether the transformed depth value d^IL matches the depth value at the corresponding position of the depth image.

If the values do not match (the difference is equal to or greater than a predetermined threshold), the pixel is judged not to appear in the rendering image I^L (to be occluded), and the same coordinate transformation is performed using another image (I^R or a peripheral rendering image) instead. When the depth values match (the difference is equal to or less than the predetermined threshold), the color value at the pixel coordinates of that rendering image may be referenced.

一般に、オクルージョンが発生して正しい対応関係にある個所が変換先の画像上で遮蔽されていることが起こり得るが、上記のチェックによってそれを検出可能である。また、同様の処理を右眼用の補正画像I^R''に対しても行う。これにより、左眼用の補正画像I^L''のピクセル座標u_iL'',v_iL''に対応する複数のレンダリング画像のピクセル座標の候補が得られた際に、正しい対応点からカラー値を参照することが可能になり、欠損が少なくなるだけでなく、高精度な補正画像を生成することが可能になる。 Generally, it is possible that occlusion occurs and a point in the correct correspondence is hidden in the converted image, but this can be detected by the above check. In addition, a similar process is also performed on the corrected image I ^R'' for the right eye. As a result, when candidates for pixel coordinates of multiple rendering images corresponding to pixel coordinates u _iL'' and v _iL'' of the corrected image I ^L'' for the left eye are obtained, it becomes possible to refer to color values from the correct corresponding points, which not only reduces loss but also enables the generation of a highly accurate corrected image.
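The depth consistency check and the fallback to other images can be sketched as follows. This is a minimal Python illustration that assumes each candidate source image has already been reprojected, so that for one corrected-image pixel the sampled color, the stored depth at the reprojected position, and the transformed depth are known. The names Candidate and select_color and the tolerance value are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

Color = Tuple[int, int, int]


class Candidate(NamedTuple):
    color: Color         # color sampled at the reprojected pixel
    stored_depth: float  # depth image value at the reprojected pixel
    warped_depth: float  # depth of the point after the coordinate transformation


def select_color(candidates: Sequence[Candidate],
                 tol: float = 0.01) -> Optional[Color]:
    """Return the color of the first candidate that passes the depth check."""
    for cand in candidates:
        # A mismatch means a nearer surface hides the point in that image.
        if abs(cand.warped_depth - cand.stored_depth) <= tol:
            return cand.color
    return None  # occluded in every available image: the pixel stays missing
```

The candidates would be ordered by preference, for example I^L first, then I^R, then the peripheral rendering images, so that the primary image is used whenever the point is visible in it.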

In equation (4) above, the color value of the corrected image I^L'' is obtained by referencing the color value at the pixel coordinates (u_iL, v_iL) in the rendering image I^L. When the rendering image I^R for the right eye or the peripheral rendering images I^L2, I^R2 are available, the color values at the corresponding pixel coordinates of those images can be referenced as well. This makes it possible to generate a corrected image I^L'' with fewer missing regions.

このように、レンダリング画像I^L，I^Rの代わりに周辺レンダリング画像I^L2，I^R2を用いることで、レンダリング部22と画像補正12の間の通信量を削減しつつディスオクルージョンの問題を軽減できるようになる。 In this way, by using the peripheral rendering images I ^L2 , I ^R2 instead of the rendering images I ^L , I ^R , it is possible to reduce the amount of communication between the rendering unit 22 and the image correction unit 12 while mitigating the problem of disocclusion.

なお、ここまでは両眼立体視を行う前提で、レンダリング部22が固定的に左右の眼に対応するレンダリング画像を生成して画像補正部12へ提供する場合について説明してきたが、視点推定部11からレンダリングが必要な視点数Npの入力を受けてレンダリングを行う視点数を調整しても良い。これにより不必要なレンダリングを抑え、計算機リソースを節約することが可能になる。 Up to this point, we have explained the case where the rendering unit 22 generates fixed rendering images corresponding to the left and right eyes and provides them to the image correction unit 12, assuming binocular stereoscopic vision. However, it is also possible to adjust the number of viewpoints to be rendered by receiving input of the number of viewpoints Np that require rendering from the viewpoint estimation unit 11. This makes it possible to reduce unnecessary rendering and save computer resources.

更に、上記の各実施形態ではレンダリングサーバ2に視点予測部21を設け、レンダリング画像がユーザに提供されると予測される時刻t+τ'における視点情報P't+τ'を予測し、当該予測視点P't+τ'に基づいてレンダリング画像が生成されるものとして説明した。 Furthermore, in each of the above embodiments, the rendering server 2 is provided with a viewpoint prediction unit 21, which predicts viewpoint information P't+τ' at time t+τ' when the rendered image is predicted to be provided to the user, and the rendered image is generated based on the predicted viewpoint P't+τ'.

しかしながら、本発明はこれのみに限定されるものではなく、図8に示した第3実施形態に係る3DCGレンダリングシステム、図9に示した第4実施形態に係る3DCGレンダリング装置のように視点予測部21を省略し、レンダリング部22が基準視点Ptに基づいてレンダリング画像Iおよびその深度画像Dを生成するようにしても良い。 However, the present invention is not limited to this, and the viewpoint prediction unit 21 may be omitted, as in the 3DCG rendering system according to the third embodiment shown in FIG. 8, and the 3DCG rendering device according to the fourth embodiment shown in FIG. 9, and the rendering unit 22 may generate a rendering image I and its depth image D based on the reference viewpoint Pt.

このような構成によれば、遅延時間τが大きいほど画像補正部12が深度画像Dに基づいてレンダリング画像を補正する際の補正量が増えるので、補正画像の品質が視点予測部21を設けた各実施形態との比較で劣化する可能性がある。 With this configuration, the larger the delay time τ, the greater the amount of correction that the image correction unit 12 makes when correcting the rendering image based on the depth image D, so there is a possibility that the quality of the corrected image will deteriorate compared to the embodiments in which a viewpoint prediction unit 21 is provided.

However, the delay time τ is small when the invention is applied to a 3DCG rendering device 3 that need not consider network delay, as in the fourth embodiment, or when it is applied to a 3DCG rendering system that must consider network delay, as in the third embodiment, provided that the system is used in an environment where low latency is guaranteed. In addition, the time that the viewpoint prediction unit 21 would have spent on viewpoint prediction is saved, so a sufficiently high-quality rendering image can be provided even without the viewpoint prediction unit 21.

そして、本発明によれば高品質なレンダリング画像を短時間で生成することができ、通信インフラ経由でもリアルタイムで提供することが可能となるので、地理的あるいは経済的な格差を超えて多くの人々に多様なエンターテインメントを提供できるようになる。その結果、国連が主導する持続可能な開発目標(SDGs)の目標9「レジリエントなインフラを整備し、包括的で持続可能な産業化を推進する」や目標11「都市を包摂的、安全、レジリエントかつ持続可能にする」に貢献することが可能となる。 The present invention makes it possible to generate high-quality rendering images in a short time and provide them in real time even via communication infrastructure, making it possible to provide a wide variety of entertainment to many people regardless of geographic or economic disparities. As a result, it will be possible to contribute to Goal 9 "Build resilient infrastructure and promote inclusive and sustainable industrialization" and Goal 11 "Make cities inclusive, safe, resilient and sustainable" of the United Nations-led Sustainable Development Goals (SDGs).

1…ディスプレイデバイス，2…レンダリングサーバ，10…通信部，11…視点推定部，12…画像補正部，13…ディスプレイ部，20…通信部，21…視点予測部，22…レンダリング部，121…補正深度画像生成部，122…補正画像生成部 1...display device, 2...rendering server, 10...communication unit, 11...viewpoint estimation unit, 12...image correction unit, 13...display unit, 20...communication unit, 21...viewpoint prediction unit, 22...rendering unit, 121...corrected depth image generation unit, 122...corrected image generation unit

Claims

1. A 3DCG rendering device that renders 3DCG from a user's viewpoint and outputs a rendered image, comprising:
a viewpoint estimation means for estimating the user's viewpoint at each time, as a reference viewpoint at a reference time t and as a latest viewpoint at a latest time t+τ, which is later than the reference time by a delay time τ;
a viewpoint prediction means for calculating, as a predicted viewpoint, the user's viewpoint at a predicted time t+τ' at which a predicted time τ' has elapsed from the reference time t, based on the predicted time τ' of the delay time τ and the reference viewpoint;
a rendering means for generating a rendering image of the predicted viewpoint and outputting the rendering image together with its depth image; and
an image correction means for correcting, at the latest time, the rendering image of the predicted viewpoint to a rendering image of the latest viewpoint based on the depth image, the predicted viewpoint, and the latest viewpoint,
wherein the device outputs the rendering image of the latest viewpoint.

2. The 3DCG rendering device according to claim 1, wherein
the viewpoint prediction means calculates predicted viewpoints of the left and right eyes of the user at the predicted time,
the rendering means outputs a rendering image of each predicted viewpoint and a depth image corresponding to an intermediate viewpoint between the predicted viewpoints, and
the image correction means corrects each rendering image of the predicted viewpoints to a rendering image of the latest viewpoint based on the depth image of the intermediate viewpoint, the predicted viewpoint, and the latest viewpoint.

3. The 3DCG rendering device according to claim 1, wherein
the viewpoint prediction means calculates a predicted viewpoint of the user at the predicted time and predicted peripheral viewpoints, which are viewpoints peripheral to it,
the rendering means outputs at least the peripheral rendering images and their depth images, out of a rendering image of the predicted viewpoint and peripheral rendering images of the predicted peripheral viewpoints, and
the image correction means corrects at least the peripheral rendering images, out of the rendering image of the predicted viewpoint and the peripheral rendering images of the predicted peripheral viewpoints, to a rendering image of the latest viewpoint.

4. The 3DCG rendering device according to claim 3, wherein
the viewpoint prediction means calculates predicted viewpoints of the left and right eyes of the user at the predicted time, and
the predicted peripheral viewpoints are calculated by rotating one of the predicted viewpoints of the left and right eyes in the forward direction and the other in the reverse direction by a predetermined angle around a rotation axis of head movement that passes through the center of the 3DCG model.

5. The 3DCG rendering device according to claim 4 , wherein the predetermined angle is set in accordance with a speed of head movement about the rotation axis, the faster the speed is, the larger the angle is set.

6. The 3DCG rendering device according to any one of claims 1 to 5, wherein the image correction means calculates an optimal value of the predicted time τ' based on the difference between the latest time and the predicted time and feeds it back to the viewpoint prediction means, and the viewpoint prediction means calculates the predicted viewpoint using, as the predicted time, the time at which the optimal value has elapsed from the reference time t.

7. A 3DCG rendering system configured by connecting, via a network, a local terminal that displays a rendered image from a user's viewpoint and a rendering server that renders 3DCG from the user's viewpoint and transmits it to the local terminal, wherein
the local terminal comprises
a viewpoint estimation means for estimating the user's viewpoint at each time, as a reference viewpoint at a reference time t and as a latest viewpoint at a latest time t+τ, which is later than the reference time by a delay time τ,
the rendering server comprises
a viewpoint prediction means for calculating, as a predicted viewpoint, the user's viewpoint at a predicted time t+τ' at which a predicted time τ' has elapsed from the reference time t, based on the predicted time τ' of the delay time τ and the reference viewpoint, and
a rendering means for generating a rendering image of the predicted viewpoint and outputting it together with its depth image,
the local terminal further comprises
an image correction means for correcting, at the latest time, the rendering image of the predicted viewpoint to a rendering image of the latest viewpoint based on the depth image, the predicted viewpoint, and the latest viewpoint, and
the system outputs the rendering image of the latest viewpoint.

8. The 3DCG rendering system according to claim 7, wherein
the viewpoint prediction means calculates predicted viewpoints of the left and right eyes of the user at the predicted time,
the rendering means outputs a rendering image of each predicted viewpoint and a depth image corresponding to an intermediate viewpoint between the predicted viewpoints, and
the image correction means corrects each rendering image of the predicted viewpoints to a rendering image of the latest viewpoint based on the depth image of the intermediate viewpoint, the predicted viewpoint, and the latest viewpoint.

9. The 3DCG rendering system according to claim 7, wherein
the viewpoint prediction means calculates a predicted viewpoint of the user at the predicted time and predicted peripheral viewpoints, which are viewpoints peripheral to it,
the rendering means outputs at least the peripheral rendering images and their depth images, out of a rendering image of the predicted viewpoint and peripheral rendering images of the predicted peripheral viewpoints, and
the image correction means corrects at least the peripheral rendering images, out of the rendering image of the predicted viewpoint and the peripheral rendering images of the predicted peripheral viewpoints, to a rendering image of the latest viewpoint.

10. The 3DCG rendering system according to claim 9, wherein
the viewpoint prediction means calculates predicted viewpoints of the left and right eyes of the user at the predicted time, and
the predicted peripheral viewpoints are calculated by rotating one of the predicted viewpoints of the left and right eyes in the forward direction and the other in the reverse direction by a predetermined angle around a rotation axis of head movement that passes through the center of the 3DCG model.

11. The 3DCG rendering system according to claim 10 , wherein the predetermined angle is set in accordance with a speed of head movement about the rotation axis, the faster the speed is, the larger the angle is set.

12. The 3DCG rendering system according to any one of claims 7 to 11, wherein the image correction means calculates an optimal value of the predicted time τ' based on the difference between the latest time and the predicted time and feeds it back to the viewpoint prediction means, and the viewpoint prediction means calculates the predicted viewpoint using, as the predicted time, the time at which the optimal value has elapsed from the reference time t.

13. The 3DCG rendering system according to claim 7 , wherein a transmission bit rate from said rendering server to said local terminal is variable according to a difference between said latest viewpoint and said predicted viewpoint.

14. A 3DCG rendering method in which a computer renders 3DCG from a user's viewpoint and outputs a rendered image, comprising:
estimating the user's viewpoint at each time, as a reference viewpoint at a reference time t and as a latest viewpoint at a latest time t+τ, which is later than the reference time by a delay time τ;
calculating, as a predicted viewpoint, the user's viewpoint at a predicted time t+τ' at which a predicted time τ' has elapsed from the reference time t, based on the predicted time τ' of the delay time τ and the reference viewpoint;
generating a rendering image of the predicted viewpoint and outputting it together with its depth image;
correcting, at the latest time, the rendering image of the predicted viewpoint to a rendering image of the latest viewpoint based on the depth image, the predicted viewpoint, and the latest viewpoint; and
outputting the rendering image of the latest viewpoint.

15. A 3DCG rendering program that renders 3DCG from a user's viewpoint and outputs the generated rendered image, the program causing a computer to execute:
a procedure of estimating the user's viewpoint at each time, as a reference viewpoint at a reference time t and as a latest viewpoint at a latest time t+τ, which is later than the reference time by a delay time τ;
a procedure of calculating, as a predicted viewpoint, the user's viewpoint at a predicted time t+τ' at which a predicted time τ' has elapsed from the reference time t, based on the predicted time τ' of the delay time τ and the reference viewpoint;
a procedure of generating a rendering image of the predicted viewpoint and outputting it together with its depth image;
a procedure of correcting, at the latest time, the rendering image of the predicted viewpoint to a rendering image of the latest viewpoint based on the depth image, the predicted viewpoint, and the latest viewpoint; and
a procedure of outputting the rendering image of the latest viewpoint.