JP7465227B2

JP7465227B2 - 3D model generation device, method and program

Info

Publication number: JP7465227B2
Application number: JP2021022869A
Authority: JP
Inventors: 良亮渡邊
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2021-02-16
Filing date: 2021-02-16
Publication date: 2024-04-10
Anticipated expiration: 2041-02-16
Also published as: JP2022124941A

Description

The present invention relates to an apparatus, method, and program for rapidly generating a 3D model of a subject from the images of multiple cameras.

The volume intersection method (visual hull method) disclosed in Non-Patent Document 1 is a widely known approach to generating a 3D model of a subject from multiple camera images. As shown in FIG. 14, it generates a 3D model by projecting binary silhouette images, each obtained by extracting the subject region from one camera image, into 3D space and retaining the part where all the projections intersect.
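The intersection step can be illustrated with a minimal sketch. It assumes idealized orthographic cameras aligned with the coordinate axes and silhouettes stored as sets of foreground pixels; a real system would use calibrated perspective projection. The names `carve`, `front`, and `side` are hypothetical and do not come from the patent.

```python
from itertools import product

def carve(silhouettes, projectors, grid_points):
    """Keep the grid points whose projection is foreground in every view."""
    kept = []
    for p in grid_points:
        # A point survives only if all silhouettes contain its projection,
        # i.e. it lies in the intersection of all back-projected silhouettes.
        if all(proj(p) in sil for sil, proj in zip(silhouettes, projectors)):
            kept.append(p)
    return kept

# Two toy orthographic views (assumption): front drops y, side drops x.
front = lambda p: (p[0], p[2])
side = lambda p: (p[1], p[2])
front_silhouette = {(1, 0), (1, 1)}   # foreground pixels seen from the front
side_silhouette = {(2, 0), (2, 1)}    # foreground pixels seen from the side

grid = list(product(range(4), range(4), range(2)))
model = carve([front_silhouette, side_silhouette], [front, side], grid)
print(model)
```

Only the grid points that are foreground in both views remain, which is the set intersection described above.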

The volume intersection method is used as one of the elemental technologies for recovering the shape of a 3D model in free-viewpoint video technology such as that described in Non-Patent Document 2. Free-viewpoint video technology reconstructs a 3D space from the images of multiple cameras so that the scene can be viewed even from angles where no camera is placed.

The smallest unit of a 3D model generated by the volume intersection method is called a voxel. A voxel is a cube of small volume. To generate a 3D model, a voxel grid is defined that fills the entire 3D region to be modeled with these cubes, and for each cell of the voxel grid it is determined whether that cell becomes part of the model.

Let M be the side length of this cube (the unit voxel size). The larger M is set, the more coarsely the 3D space is discretized, so the processing time of the volume intersection method becomes shorter, but the resulting 3D model has a coarse shape.

Conversely, a smaller unit voxel size M allows finer shapes to be recovered, but the number of units to compute grows and the processing time increases explosively. In free-viewpoint video applications in particular, voxels must be generated over a large space such as a sports stadium, so the computation time tends to become long.
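The growth in cost can be made concrete with a small calculation: the number of voxels needed to tile a fixed region scales with 1/M³. The sketch below uses integer centimetres to avoid rounding issues; the region size and the two voxel sizes are illustrative assumptions, not values from the patent.

```python
import math

def voxel_count(extent_cm, voxel_cm):
    """Number of cubic voxels of side voxel_cm needed to tile a box (integer cm)."""
    # -(-e // m) is integer ceiling division.
    return math.prod(-(-e // voxel_cm) for e in extent_cm)

# Hypothetical pitch-sized region: 100 m x 70 m x 10 m.
region_cm = (10000, 7000, 1000)
coarse = voxel_count(region_cm, 10)  # 10 cm voxels
fine = voxel_count(region_cm, 5)     # 5 cm voxels
print(coarse, fine, fine // coarse)
```

Halving M multiplies the voxel count by eight, which is why narrowing the region to be computed pays off more than a modest change in M.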

To address this problem, Non-Patent Document 3 and Patent Document 1 disclose techniques for speeding up the volume intersection method. In Non-Patent Document 3, when a 3D voxel model is generated, a model is first generated with a coarse unit voxel size Mb as shown in FIG. 15, and each connected region of voxels is treated as one subject to obtain a bounding box in 3D space.

The inside of each 3D bounding box is then modeled by the volume intersection method with a fine unit voxel size Ma (< Mb), which greatly reduces the processing time.
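The two-stage idea can be sketched as follows. This is a simplified illustration, not the algorithm of Non-Patent Document 3: the silhouette test is replaced by a hypothetical occupancy predicate `is_occupied`, a cell is tested only at its centre, and all occupied coarse cells are assumed to belong to one subject (no connected-component labeling).

```python
import math
from itertools import product

def occupied_cells(origin, shape, size, is_occupied):
    """Indices of the cells (side length `size`) whose centre passes the test."""
    cells = []
    for idx in product(*(range(n) for n in shape)):
        centre = tuple(o + (i + 0.5) * size for o, i in zip(origin, idx))
        if is_occupied(centre):
            cells.append(idx)
    return cells

def coarse_to_fine(space, mb, ma, is_occupied):
    """Coarse pass with voxel size mb, then a fine pass (size ma) inside its bounding box."""
    coarse_shape = tuple(s // mb for s in space)
    coarse = occupied_cells((0, 0, 0), coarse_shape, mb, is_occupied)
    if not coarse:
        return [], None, math.prod(coarse_shape)
    # Bounding box of the occupied coarse cells, in world units.
    lo = tuple(min(c[a] for c in coarse) * mb for a in range(3))
    hi = tuple((max(c[a] for c in coarse) + 1) * mb for a in range(3))
    fine_shape = tuple((h - l) // ma for l, h in zip(lo, hi))
    fine = occupied_cells(lo, fine_shape, ma, is_occupied)
    tests = math.prod(coarse_shape) + math.prod(fine_shape)
    return fine, lo, tests

# Toy subject: an axis-aligned block inside a 32 x 32 x 32 space.
subject = lambda p: 10 <= p[0] < 14 and 10 <= p[1] < 14 and 0 <= p[2] < 6
voxels, box_origin, tests = coarse_to_fine((32, 32, 32), 8, 1, subject)
print(len(voxels), box_origin, tests, 32 ** 3)
```

In this toy case the two passes need 576 occupancy tests instead of the 32768 required to test every fine voxel of the whole space.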

Patent Document 1 discloses a similar technique that speeds up 3D model generation by estimating a coarse 3D voxel model for the 3D space to be modeled, placing a finer voxel grid at the positions where the coarse 3D voxel model was generated, and repeating the process of generating a 3D model on this finer grid with the volume intersection method.

Patent Document 2 discloses a two-stage model generation method that improves model quality by preventing phantom (virtual-image) objects from arising in the volume intersection method. In the first stage, object positions are estimated by the volume intersection method to obtain object candidates. In the second stage, each candidate is judged to be either a subject or a phantom object, and the phantom objects are deleted. It also discloses a mechanism that refers to the positions where subjects were generated in past frames and makes a candidate close to such a position less likely to be judged a phantom object.

Non-Patent Document 4 discloses a technique that speeds up 3D model generation on a PC cluster by balancing the computational load.

Patent Document 1: JP 2018-063635 A
Patent Document 2: Japanese Patent No. 5454573
Patent Document 3: Japanese Patent Application No. 2019-153696

Non-Patent Document 1: A. Laurentini, "The visual hull concept for silhouette based image understanding", IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 150-162, (1994).
Non-Patent Document 2: J. Kilner, J. Starck, A. Hilton and O. Grau, "Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events", Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), pp. 177-184, (2007).
Non-Patent Document 3: J. Chen, R. Watanabe, K. Nonaka, T. Konno, H. Sankoh, S. Naito, "A Fast Free-viewpoint Video Synthesis Algorithm for Sports Scenes", 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), WeAT17.2, (2019).
Non-Patent Document 4: Yumi Iwashita, Ryo Kurazume, Kenji Hara, Seiichi Uchida, Kenichi Morooka, Tsutomu Hasegawa, "Fast 3D Shape Recovery of Real Objects Using Parallel Fast Level Set Method", Proceedings of the Robotics and Mechatronics Conference, 2P1-C13, (2006).
Non-Patent Document 5: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking", 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246-252, Vol. 2, (1999).
Non-Patent Document 6: J. Ruttle, M. Manzke and R. Dahyot, "Estimating 3D Scene Flow from Multiple 2D Optical Flows", 2009 13th International Machine Vision and Image Processing Conference, Dublin, 2009, pp. 1-6, doi: 10.1109/IMVIP.2009.8.
Non-Patent Document 7: Arun, Somani; Thomas S. Huang; Steven D. Blostein, "Least-square fitting of two 3-D point sets", IEEE Pattern Analysis and Machine Intelligence, (1987).

Speed-up techniques such as those of Non-Patent Document 3 and Patent Document 1 are effective. However, the experiments in Non-Patent Document 3 demonstrate real-time 3D model generation only up to the size of a volleyball court, and it has not been verified whether real-time computation is possible without loss of quality in a wide space such as a soccer stadium.

Considering the applications of free-viewpoint video technology, conceivable use cases include making a sports stadium viewable from free viewpoints, generating and viewing replay videos with arbitrary camerawork, and letting each user interactively watch free-viewpoint video from whichever viewpoint the user chooses. Real-time generation of 3D models over a wide space, such as an entire soccer or baseball stadium, is therefore essential.

Real-time performance might be secured by increasing the unit voxel sizes Ma and Mb, but doing so degrades quality.

Using a server with high computational capacity raises the server cost, which is an obstacle to practical use. In addition, in a use case such as 3D modeling of an entire stadium, the region is so large that not only the time to generate the fine voxels of unit voxel size Ma but also the time to generate the coarse voxels of unit voxel size Mb tends to increase.

A speed-up technique that uses a PC cluster, as in Non-Patent Document 4, is effective, but it increases cost because a large number of PCs must be prepared.

An object of the present invention is to solve the above technical problems and to speed up 3D model generation by estimating, from the positions where 3D models were generated in past frames, the direction and amount of movement of each target object between consecutive frames, determining the positions where 3D models are to be generated in the input frame, and thereby narrowing down the region in which 3D model calculation must be performed.

To achieve the above object, the present invention provides a 3D model generation device that generates a 3D model of a subject based on silhouette images extracted frame by frame from moving images captured by multiple cameras with different viewpoints, and is characterized by the following configurations.

(1) The device includes a means for storing a model generation history that contains the position where each 3D model was generated in past frames, a means for determining a model calculation area in the input frame based on the model generation history, and a means for generating a 3D model by performing 3D model calculation on the model calculation area.

(2) The device includes a means for classifying each input frame as a keyframe or a non-keyframe, and the means for generating the 3D model includes a means for generating a relatively high-resolution 3D model and a means for generating a low-resolution 3D model. In a keyframe, a high-resolution 3D model is generated in the region where a low-resolution 3D model was generated based on the silhouette images. In a non-keyframe, a high-resolution 3D model is generated by performing 3D model calculation on the model calculation area.

(3) In a non-keyframe, a low-resolution 3D model is generated by performing 3D model calculation on the model calculation area, and a high-resolution 3D model is then generated by performing 3D model calculation on the region where that low-resolution 3D model was generated.

(4) The means for determining the model calculation area sets, as the model calculation area, a region obtained by expanding the position where the 3D model was generated in a past frame.

(5) The device includes a means for classifying each 3D model into a class, and the means for determining the model calculation area expands the position where the 3D model was generated in a past frame by an expansion amount that depends on the class.

(6) The classifying means classifies each 3D model into a class based on the moving speed expected of its subject, and the means for determining the model calculation area uses a larger expansion amount for classes of faster-moving subjects.

(7) The device includes a means for estimating a velocity field of each 3D model based on the generation history of each 3D model over multiple past frames, and the means for determining the model calculation area determines the model calculation area of each 3D model based on its velocity field.

(8) The number of silhouette images used to generate the low-resolution 3D model in a non-keyframe is smaller than the number of silhouette images used to generate the low-resolution 3D model in a keyframe.

The present invention achieves the following effects.

(1) The model calculation area in the input frame is determined based on the position where each 3D model was generated in past frames, and 3D model calculation is performed only on that model calculation area, so 3D models can be generated at high speed.

(2) In a keyframe, a high-resolution 3D model is generated only in the region where a low-resolution 3D model was generated based on the silhouette images, whereas in a non-keyframe a high-resolution 3D model is generated by performing 3D model calculation on the model calculation area. 3D model generation can therefore be sped up, particularly in non-keyframes, while its quality is maintained.

(3) In a non-keyframe, a low-resolution 3D model is first generated by performing 3D model calculation on the model calculation area, and the high-resolution 3D model is then generated by performing 3D model calculation on the region where that low-resolution 3D model was generated. The model calculation area in non-keyframes can therefore be determined more accurately.

(4) The model calculation area is determined by expanding the position where the 3D model was generated in a past frame, so it can be determined with a computationally light calculation and 3D models can be generated at high speed.

(5) Each 3D model is classified into a class, and the position where it was generated in a past frame is expanded by an expansion amount that depends on the class, so the model calculation area of each 3D model can be set to an appropriate range that is neither too large nor too small.

(6) Each 3D model is classified into a class based on the moving speed expected of its subject, and a larger expansion amount is used for classes of faster-moving subjects, so the model calculation area can be set to an appropriate range even when the 3D models move at different speeds.

(7) The velocity field of each 3D model is estimated based on the generation history of each 3D model over multiple past frames, and the model calculation area of each 3D model is determined based on that velocity field, so the model calculation area can be set to an appropriate range even when the 3D models differ in moving speed or direction.

(8) The number of silhouette images used to generate the low-resolution 3D model in a non-keyframe is smaller than the number used in a keyframe, so the model calculation area can be determined more accurately while the increase in processing load in non-keyframes is kept to a minimum.

FIG. 1 is a functional block diagram showing the configuration of a 3D model generation device according to a first embodiment of the present invention.
FIG. 2 is a diagram showing an example of determining a 3D model calculation area by expanding the region where a 3D model was generated in a past frame.
FIG. 3 is a functional block diagram showing the configuration of a 3D model generation device according to a second embodiment of the present invention.
FIG. 4 is a diagram showing an example of dividing the space into a 3D bounding box for each 3D model.
FIG. 5 is a diagram showing an example of changing the expansion amount according to the class of a 3D model.
FIG. 6 is a functional block diagram showing the configuration of a 3D model generation device according to a third embodiment of the present invention.
FIG. 7 is a diagram showing an example of determining the model calculation area based on a velocity field.
FIG. 8 is a functional block diagram showing the configuration of a 3D model generation device according to a fourth embodiment of the present invention.
FIG. 9 is a diagram (part 1) showing an example of changing the 3D model generation method depending on the frame.
FIG. 10 is a diagram (part 2) showing an example of changing the 3D model generation method depending on the frame.
FIG. 11 is a diagram showing an example of determining the model calculation area of each 3D model in a non-keyframe based on an expansion amount that depends on the class.
FIG. 12 is a diagram showing an example of determining the model calculation area of each 3D model in a non-keyframe based on a velocity field.
FIG. 13 is a functional block diagram showing the configuration of a 3D model generation device according to a fifth embodiment of the present invention.
FIG. 14 is a diagram showing how a 3D model is generated by the volume intersection method.
FIG. 15 is a diagram showing an example of generating a 3D model in two stages.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main parts of a 3D model generation device 100 according to a first embodiment of the present invention. Its main components are a silhouette image acquisition unit 10, a 3D model generation unit 20, a history database (DB) 30, a calculation area determination unit 40, and a history registration unit 50.

Such a 3D model generation device 100 can be configured by installing applications (programs) that implement the respective functions on at least one general-purpose computer or server. Alternatively, it can be configured as a dedicated or single-function machine in which part of the application is implemented in hardware or software. This embodiment is described using an example in which a sports scene is captured by N cameras Cam1 to CamN and a 3D model is generated for each subject.

The silhouette image acquisition unit 10 acquires, frame by frame, the silhouette images used by the volume intersection method from multiple camera videos (multi-view video) that capture multiple subjects from different viewpoints. To form a 3D model by the volume intersection method, it is desirable to acquire silhouette images from two or more cameras. Each silhouette image is acquired as a binary mask image in which the subject regions to be modeled are white and all other regions are black. Such silhouette images can be computed with conventional techniques such as the background subtraction method disclosed in Non-Patent Document 5.

The history DB 30 stores a model generation history that includes information on the positions where 3D models were generated in past frames preceding the current input frame. The calculation area determination unit 40 determines the model calculation area in the input frame based on the model generation history of the past frames.

In this embodiment, the calculation area determination unit 40 includes an expansion unit 41. As illustrated in FIG. 2, the region obtained by expanding the model generation region r of, for example, the immediately preceding frame outward by a predetermined distance or number of voxels (hereinafter collectively called the expansion amount P) is set as the model calculation area R in the input frame. Although FIG. 2 is drawn in two dimensions for convenience, the expansion by P is actually performed in three-dimensional space.
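A minimal sketch of this expansion, assuming the model generation region r is represented by an axis-aligned 3D bounding box and that the result is clamped to the modeled space. Both the box representation and the clamping are assumptions made for illustration, and `expand_region` is a hypothetical name.

```python
def expand_region(r_min, r_max, p, space_min, space_max):
    """Expand box r by p on every side in 3D, clamped to the modeled space."""
    lo = tuple(max(a - p, s) for a, s in zip(r_min, space_min))
    hi = tuple(min(b + p, s) for b, s in zip(r_max, space_max))
    return lo, hi

# Region r from the previous frame (voxel indices) expanded by P = 2 voxels.
R = expand_region((10, 10, 0), (14, 14, 6), 2, (0, 0, 0), (32, 32, 32))
print(R)
```

The expansion costs a few comparisons per model, which is what makes this way of choosing the model calculation area cheap.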

In the 3D model generation unit 20, the voxel model calculation unit 21 places a voxel grid of unit voxel size Ma, chosen to suit generation of a 3D model that meets the required quality, in the three-dimensional model calculation area determined by the calculation area determination unit 40, and generates a high-resolution 3D voxel model by the volume intersection method using the silhouette images acquired by the silhouette image acquisition unit 10.

A voxel model is volume data made up of many voxels, but in many cases 3D model data is more convenient to handle as a polygon model. In this embodiment, therefore, the 3D model generation unit 20 is provided with a 3D model output unit 22, which converts the voxel model into a polygon model using a conversion technique such as the marching cubes method and outputs the 3D model as a polygon model.

The history registration unit 50 registers the 3D model generation history obtained for each frame in the history DB 30 as history information to be referenced when the model calculation area is determined in the next and subsequent frames.

According to this embodiment, the model calculation area in the input frame is determined based on the position where each 3D model was generated in past frames, and 3D model calculation is performed only on that model calculation area, so 3D models can be generated at high speed.

In addition, according to this embodiment, the model calculation area is determined by expanding the position where the 3D model was generated in a past frame, so it can be determined with a computationally light calculation and 3D models can be generated even faster.

FIG. 3 is a functional block diagram showing the configuration of a second embodiment of the present invention. Reference numerals that are the same as above denote the same or equivalent parts, so their description is omitted.

This embodiment is characterized in that the history registration unit 50 includes a classification unit 51 that classifies the 3D models generated from the silhouette images, for example by their typical moving speed, and in that the expansion unit 41 adaptively determines, for each 3D model and based on its classification result, the expansion amount P used when expanding the model generation region r to determine the model calculation area R.

The classification unit 51 classifies each 3D model generated in each frame into a class such as "person" or "ball" and registers the classification result, together with the position where each 3D model was generated, in the history DB 30 as the model generation history. Classification requires that the 3D models be input separately for each subject. This is achieved by treating each connected region of the input 3D model as a single object. Alternatively, as shown in FIG. 4, classification may be based on the 3D bounding box that encloses each 3D model.

Applicable classification approaches include (1) classification based on the size of the 3D model, (2) classification based on the position of the 3D model, and (3) classification based on deep learning or the like.

(1) Classification based on the size of the 3D model
Each subject can be classified based on the size and shape (overall size, length, width, and height) of its 3D voxel model. For example, to distinguish the "person" and "ball" subjects commonly seen in sports scenes, a voxel model larger than a predetermined threshold can be classified as a "person" and a smaller one as a "ball". Alternatively, a model whose 3D bounding box is a cube can be classified as a "ball" and one whose bounding box is a rectangular cuboid as a "person".

(2) Classification based on the position of the 3D model
For example, a 3D model formed at a height exceeding 10 m can be classified as a ball, with the probability that it is a person or a piece of equipment estimated to be low.

(3) Classification based on deep learning or the like
Because the shape of a 3D model is characteristic of each kind of subject, a predictive model can be built in advance by learning the relationship between model shape and subject with deep learning or a similar method, and each 3D model can then be classified by applying the predictive model to it.

Each of the above classification methods may be used alone, or several of them may be combined as appropriate.
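Approaches (1) and (2) can be combined in a simple rule-based sketch. The 10 m height rule follows the example in the text. The size threshold of 0.5 m, the choice of z as the height axis, and the function name `classify` are assumptions made for illustration.

```python
def classify(bbox_min, bbox_max, size_threshold=0.5, height_threshold=10.0):
    """Classify a 3D model as "person" or "ball" from its bounding box (metres)."""
    dims = [hi - lo for lo, hi in zip(bbox_min, bbox_max)]
    # Rule (2): anything formed entirely above the height threshold is a ball.
    if bbox_min[2] > height_threshold:
        return "ball"
    # Rule (1): a model smaller than the size threshold in every dimension is a ball.
    if max(dims) < size_threshold:
        return "ball"
    return "person"

print(classify((0.0, 0.0, 0.0), (0.6, 0.4, 1.8)))      # standing player
print(classify((5.0, 5.0, 1.0), (5.25, 5.25, 1.25)))   # small object near the ground
print(classify((0.0, 0.0, 12.0), (1.0, 1.0, 13.0)))    # object high in the air
```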

FIG. 5 schematically shows an example in which the expansion unit 41 adaptively determines the expansion amount P according to the class of the 3D model when determining the model calculation area R. In this embodiment, each time a 3D model is generated, the classification unit 51 classifies it as either a "person" or a "ball". Because a "ball" generally moves faster than a "person", the expansion unit 41 uses a larger expansion amount when determining the model calculation area of a ball than when determining that of a person.
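One way to choose a class-dependent expansion amount is to take the farthest distance a subject of that class can travel in one frame interval. This rule, the maximum speeds below, and the names `expansion_amount` and `expand_for_class` are assumptions made for illustration. The text only requires that faster classes receive a larger P.

```python
# Hypothetical per-class maximum speeds in metres per second.
MAX_SPEED = {"person": 12.0, "ball": 36.0}

def expansion_amount(max_speed_m_s, fps):
    """Farthest distance (m) a subject can move between consecutive frames."""
    return max_speed_m_s / fps

def expand_for_class(r_min, r_max, label, fps=60.0):
    """Expand the previous-frame region r by the expansion amount of its class."""
    p = expansion_amount(MAX_SPEED[label], fps)
    lo = tuple(a - p for a in r_min)
    hi = tuple(b + p for b in r_max)
    return lo, hi

print(expand_for_class((1.0, 1.0, 1.0), (2.0, 2.0, 2.0), "ball", fps=36.0))
```

With this rule the ball always receives a larger margin than the person at the same frame rate.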

本実施形態によれば、各3Dモデルをクラスに分類し、過去フレームにおける3Dモデル生成位置をクラスに応じた拡張量で拡張するので、3Dモデルごとにモデル計算領域を過不足の無い適切な範囲に決定できるようになる。 According to this embodiment, each 3D model is classified into a class, and the 3D model generation position in the previous frame is expanded by an expansion amount according to the class, so that the model calculation area can be determined to be an appropriate range for each 3D model, neither too much nor too little.

また、本実施形態によれば、各3Dモデルをその被写体に想定される移動速度に基づいて各クラスに分類し、移動速度のより速い被写体のクラスほど拡張量をより大きくするので、3Dモデルの移動速度に差がある場合でもモデル計算領域を過不足の無い適切な範囲に決定できるようになる。 In addition, according to this embodiment, each 3D model is classified into classes based on the expected moving speed of the subject, and the amount of expansion is increased for classes of subjects with faster moving speeds, so that even if there are differences in the moving speeds of the 3D models, the model calculation area can be determined to be an appropriate range that is neither excessive nor insufficient.
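
As a rough sketch of the class-dependent expansion described in this embodiment, the previous-frame bounding box of a model can be grown by a per-class amount P. The table values, the function name `model_calculation_area`, and the clipping to the reconstruction space are assumptions made for illustration and do not appear in the original.

```python
# Hypothetical per-class expansion amounts P (metres); faster classes get more.
EXPANSION_BY_CLASS = {"person": 0.25, "ball": 1.5}


def model_calculation_area(prev_min, prev_max, cls, space_min, space_max):
    """Expand the previous-frame bounding box by the class-specific amount P.

    The result is clipped to the reconstruction space, which is an added
    assumption of this sketch.
    """
    p = EXPANSION_BY_CLASS[cls]
    lo = tuple(max(a - p, s) for a, s in zip(prev_min, space_min))
    hi = tuple(min(b + p, s) for b, s in zip(prev_max, space_max))
    return lo, hi
```

For the same previous-frame box, the "ball" entry yields a larger area than the "person" entry, which mirrors the relationship illustrated in Figure 5.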

図6は、本発明の第3実施形態の構成を示した機能ブロック図であり、前記と同一の符号は同一または同等部分を表しているので、その説明は省略する。 Figure 6 is a functional block diagram showing the configuration of the third embodiment of the present invention. The same reference numerals as those described above represent the same or equivalent parts, and therefore their explanation will be omitted.

本実施形態では計算領域決定部40に速度場推定部42を設け、図7に示したように、過去フレーム間での3Dモデル生成位置の変化に基づいて3Dモデルごとに移動場を推定し、この推定結果に基づいて入力フレームにおけるモデル計算領域を決定するようにした点に特徴がある。 This embodiment is characterized in that the calculation area determination unit 40 is provided with a velocity field estimation unit 42 which, as shown in Figure 7, estimates a velocity field for each 3D model based on the change in the 3D model generation position between past frames, and the model calculation area in the input frame is determined based on this estimate.

速度場推定部42は、過去のフレーム間での各3Dモデルの生成位置の変化に基づいて自由視点制作対象となる3D空間中の各ボクセルグリッドの3D速度場（v_x, v_y, v_z）を推定する。3D速度場の推定には、例えば非特許文献6に開示されるように、各カメラの2Dオプティカルフローから3Dのオプティカルフローを再構成する技術を用いることができる。 The velocity field estimation unit 42 estimates the 3D velocity field (v_x, v_y, v_z) of each voxel grid in the 3D space targeted for free-viewpoint production, based on the change in the generation position of each 3D model between past frames. The 3D velocity field can be estimated using, for example, a technique that reconstructs a 3D optical flow from the 2D optical flow of each camera, as disclosed in Non-Patent Document 6.

また、過去のフレーム間のボクセル形成位置を点と見立てた3D点群データを形成し、被写体の3Dモデルごとに、非特許文献7が開示するICP (Iterative Closest Point) 法に代表される点群の位置合わせ手法を用い、位置合わせがなされた位置へと移動するものとして、（移動後の位置）-（移動前の位置）で当該ボクセルの速度場を算出するようにしても良い。 Alternatively, 3D point cloud data may be formed by treating the voxel formation positions in past frames as points. For each 3D model of a subject, a point cloud registration method such as the ICP (Iterative Closest Point) method disclosed in Non-Patent Document 7 is then applied, each voxel is assumed to move to its registered position, and the velocity field of the voxel is calculated as (position after movement) - (position before movement).

あるいは、前後するフレーム間で各3Dモデルの3Dバウンディングボックスを、例えば最も近い位置にある3Dバウンディングボックス同士を対応付けることで追跡し、前フレームの各3Dバウンディングボックスの重心位置から、後フレームの対応する各3Dバウンディングボックスの重心位置への移動ベクトルを、後フレームの3Dバウンディングボックス内の全てのボクセルの速度場と推定するようにしても良い。 Alternatively, the 3D bounding boxes of each 3D model can be tracked between adjacent frames, for example by matching the 3D bounding boxes that are closest to each other, and the motion vector from the center of gravity of each 3D bounding box in the previous frame to the center of gravity of the corresponding 3D bounding box in the subsequent frame can be estimated as the velocity field of all voxels within the 3D bounding box in the subsequent frame.
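
The bounding-box tracking variant above can be illustrated with a short sketch. It assumes each box is given as a `(min_xyz, max_xyz)` pair, matches every box in the later frame to the previous-frame box with the nearest centroid, and reports the displacement per frame interval. The function names are hypothetical, and the zero-velocity fallback for an empty history is an added assumption.

```python
import math


def centroid(box):
    """Centre point of an axis-aligned box given as (min_xyz, max_xyz)."""
    lo, hi = box
    return tuple((a + b) / 2 for a, b in zip(lo, hi))


def estimate_velocities(prev_boxes, curr_boxes):
    """Motion vector of each current box from its nearest previous box.

    The returned vector would be assigned to every voxel inside the
    corresponding current bounding box.
    """
    prev_centroids = [centroid(b) for b in prev_boxes]
    velocities = []
    for box in curr_boxes:
        c = centroid(box)
        if not prev_centroids:
            # No history to match against: assume no motion (sketch assumption).
            velocities.append((0.0, 0.0, 0.0))
            continue
        nearest = min(prev_centroids, key=lambda pc: math.dist(pc, c))
        velocities.append(tuple(ci - pi for ci, pi in zip(c, nearest)))
    return velocities
```

Nearest-centroid matching is the simplest association rule and can pair the wrong boxes when subjects cross paths.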

前記拡張部41は、速度場の大きい3Dモデルほど、そのモデル計算領域を決定する際の拡張量Pを大きくする。 The larger the velocity field of a 3D model, the larger the expansion amount P that the expansion unit 41 uses when determining that model's calculation area.
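
One possible way to make P grow with the velocity field is a linear rule, sketched below. The text does not specify the functional form, so the linear relationship, the constants, and the name `expansion_amount` are all assumptions made for illustration.

```python
import math

BASE_EXPANSION = 0.25  # hypothetical minimum expansion P for a static model
SPEED_GAIN = 1.0       # hypothetical extra expansion per unit of speed


def expansion_amount(velocity):
    """Expansion amount P that increases with the magnitude of the velocity."""
    speed = math.sqrt(sum(v * v for v in velocity))
    return BASE_EXPANSION + SPEED_GAIN * speed
```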

このとき、第2実施形態と同様に、履歴登録部50に分類部51を設けて各3Dモデルまたはその3Dバウンディングボックスを「人物」や「ボール」などにクラス分けし、前後するフレーム間で同一クラスかつ最も近い位置にある3Dモデルまたはその3Dバウンディングボックス同士を対応付けることで、クラスごとに異なる速度場を算出するようにしても良い。 In this case, similar to the second embodiment, a classification unit 51 may be provided in the history registration unit 50 to classify each 3D model or its 3D bounding box into a class such as "person" or "ball," and a different velocity field may be calculated for each class by associating 3D models or their 3D bounding boxes that are in the same class and in the closest position between successive frames.

本実施形態によれば、複数の過去フレームにおける各3Dモデルの生成履歴に基づいて各3Dモデルの速度場を推定し、3Dモデルごとにその速度場に基づいてモデル計算領域を決定するので、3Dモデルの移動速度や移動方向に差がある場合でもモデル計算領域を過不足の無い適切な範囲に決定できるようになる。 According to this embodiment, the velocity field of each 3D model is estimated based on the generation history of each 3D model in multiple past frames, and the model calculation area is determined for each 3D model based on that velocity field. This makes it possible to determine the model calculation area to be an appropriate range that is neither excessive nor insufficient, even if there is a difference in the movement speed or movement direction of the 3D models.

図8は、本発明の第4実施形態の構成を示した機能ブロック図であり、前記と同一の符号は同一または同等部分を表しているので、その説明は省略する。 Figure 8 is a functional block diagram showing the configuration of the fourth embodiment of the present invention. The same reference numerals as those described above represent the same or equivalent parts, and therefore their explanation will be omitted.

本実施形態では、3Dモデル生成部20が前記ボクセルモデル計算部21として、相対的に高解像度の3Dボクセルモデルを生成する高解像度モデル生成部21aおよび低解像度の3Dボクセルモデルを生成する低解像度モデル生成部21bを具備し、入力フレームの周期や種別に応じて各モデル生成部21a，21bを使い分けて、あるいは組み合わせて、3Dボクセルモデルを生成するようにした点に特徴がある。 In this embodiment, the 3D model generation unit 20 is characterized in that it includes, as the voxel model calculation unit 21, a high-resolution model generation unit 21a that generates a relatively high-resolution 3D voxel model and a low-resolution model generation unit 21b that generates a low-resolution 3D voxel model, and generates a 3D voxel model by selectively using or combining each of the model generation units 21a and 21b depending on the period and type of the input frames.

高解像度モデル生成部21aは、第1実施形態のボクセルモデル計算部21と同様に、単位ボクセルサイズがMaのボクセルグリッドを配置した３次元空間に、シルエット画像取得部10が取得したシルエット画像を用いた視体積交差法により高解像度の3Dボクセルモデルを生成する。 The high-resolution model generation unit 21a, like the voxel model calculation unit 21 in the first embodiment, generates a high-resolution 3D voxel model by a volume intersection method using the silhouette image acquired by the silhouette image acquisition unit 10 in a three-dimensional space in which a voxel grid with a unit voxel size of Ma is arranged.

低解像度モデル生成部21bは、単位ボクセルサイズがMb（＞Ma）のボクセルグリッドを配置した３次元空間に、シルエット画像取得部10が取得したシルエット画像を用いた視体積交差法により低解像度の3Dボクセルモデルを生成する。 The low-resolution model generation unit 21b generates a low-resolution 3D voxel model by a volume intersection method using the silhouette image acquired by the silhouette image acquisition unit 10 in a three-dimensional space in which a voxel grid with a unit voxel size of Mb (>Ma) is arranged.

入力フレーム識別部23は、今回の入力フレームがキーフレームおよび非キーフレームのいずれであるかを識別する。 The input frame identification unit 23 identifies whether the current input frame is a key frame or a non-key frame.

本実施形態では、図9に示すように、キーフレームでは低解像度モデル生成部21bが自由視点制作対象の3D空間に単位ボクセルサイズがMbのボクセルグリッドを配置し、シルエット画像を用いた視体積交差法により低解像度3Dボクセルモデルを生成する。 In this embodiment, as shown in FIG. 9, in a key frame the low-resolution model generation unit 21b places a voxel grid with a unit voxel size of Mb in the 3D space targeted for free-viewpoint production, and generates a low-resolution 3D voxel model by the volume intersection method using the silhouette images.

次いで、高解像度モデル生成部21aが前記低解像度3Dボクセルモデルの生成領域またはその3Dバウンディングボックス内のみに単位ボクセルサイズがMaのボクセルグリッドを配置し、改めてシルエット画像を用いた視体積交差法により高解像度3Dボクセルモデルを生成する。 Next, the high-resolution model generation unit 21a places a voxel grid with a unit voxel size Ma only within the generation area of the low-resolution 3D voxel model or its 3D bounding box, and again generates a high-resolution 3D voxel model by the volume intersection method using the silhouette image.

これに対して、非キーフレームでは高解像度モデル生成部21aが、前記計算領域決定部40が過去フレームにおける3Dモデルの生成履歴に基づいて決定したモデル計算領域のみに単位ボクセルサイズがMaのボクセルグリッドを配置し、シルエット画像を用いた視体積交差法により高解像度3Dボクセルモデルを生成する。 In contrast, for non-key frames, the high-resolution model generation unit 21a places a voxel grid with a unit voxel size Ma only in the model calculation area determined by the calculation area determination unit 40 based on the generation history of the 3D model in past frames, and generates a high-resolution 3D voxel model by the volume intersection method using the silhouette image.

前記入力フレーム識別部23は、各フレーム画像を例えばその入力順に、複数フレームに１フレームの割合でキーフレームと定義し、それ以外を非キーフレームと定義する。そして、図10に示すようにキーフレームでは低解像度3Dボクセルモデルを生成したのち、当該モデルの生成領域のみを対象に高解像度でモデルを生成する一方、非キーフレームでは第1実施形態と同様に、モデル生成履歴に基づいて決定したモデル計算領域のみを対象に高解像度でモデルを生成する。 The input frame identification unit 23 defines each frame image, for example in the order of input, as a key frame, at a ratio of one frame per multiple frames, and defines the rest as non-key frames. Then, as shown in FIG. 10, after generating a low-resolution 3D voxel model for a key frame, a model is generated at high resolution only for the generation area of that model, while for non-key frames, a model is generated at high resolution only for the model calculation area determined based on the model generation history, as in the first embodiment.
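
The frame identification and the per-frame processing it selects can be sketched as below. The interval of 30 frames and the stage labels are hypothetical. The text only states that one frame in every several is treated as a key frame.

```python
KEYFRAME_INTERVAL = 30  # hypothetical: one key frame every 30 input frames


def is_key_frame(frame_index, interval=KEYFRAME_INTERVAL):
    """Treat every `interval`-th frame, in input order, as a key frame."""
    return frame_index % interval == 0


def plan_frame(frame_index, interval=KEYFRAME_INTERVAL):
    """List the model generation stages to run for this frame."""
    if is_key_frame(frame_index, interval):
        # Key frame: coarse pass over the whole space, then a fine pass
        # restricted to the region where the coarse model was generated.
        return ["low_res_whole_space", "high_res_in_low_res_region"]
    # Non-key frame: fine pass only inside the history-based calculation area.
    return ["high_res_in_history_area"]
```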

本実施形態によれば、キーフレームではシルエット画像に基づいて低解像度の3Dモデルを生成した領域のみに高解像度の3Dモデルを生成する一方、非キーフレームではモデル計算領域を対象に3Dモデル計算を行って高解像度の3Dモデルを生成するので、キーフレームにおける2段階での3Dモデル生成による高速化のみならず、非キーフレームにおける3Dモデル生成でも、その品質を維持しながら高速化を実現できるようになる。 According to this embodiment, in keyframes, a high-resolution 3D model is generated only in the area where a low-resolution 3D model is generated based on a silhouette image, while in non-keyframes, a high-resolution 3D model is generated by performing 3D model calculations on the model calculation area. This not only speeds up the two-stage 3D model generation in keyframes, but also speeds up the generation of 3D models in non-keyframes while maintaining their quality.
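
The two-stage key-frame generation can be illustrated with the following simplified sketch. It replaces the silhouette test of the volume intersection method with a caller-supplied predicate `is_foreground`, which stands in for "the voxel projects inside the silhouette of every camera", and it tests voxel centres only. Both simplifications, and all function names, are assumptions of this sketch.

```python
def carve(lo, hi, size, is_foreground):
    """Centres of voxels of edge `size` inside the box [lo, hi] that pass the test."""
    counts = [int(round((h - l) / size)) for l, h in zip(lo, hi)]
    kept = []
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                centre = (lo[0] + (i + 0.5) * size,
                          lo[1] + (j + 0.5) * size,
                          lo[2] + (k + 0.5) * size)
                if is_foreground(centre):
                    kept.append(centre)
    return kept


def bounding_box(centres, size):
    """Axis-aligned box enclosing all voxels with the given centres."""
    half = size / 2
    lo = tuple(min(c[d] for c in centres) - half for d in range(3))
    hi = tuple(max(c[d] for c in centres) + half for d in range(3))
    return lo, hi


def key_frame_model(space_lo, space_hi, mb, ma, is_foreground):
    """Coarse pass with voxel size Mb, then fine pass with Ma inside its box."""
    coarse = carve(space_lo, space_hi, mb, is_foreground)
    if not coarse:
        return []
    lo, hi = bounding_box(coarse, mb)
    return carve(lo, hi, ma, is_foreground)
```

For a 4 x 4 x 4 space with Mb = 1 and Ma = 0.5, a full fine pass would test 512 voxels. If the coarse model occupies a 2 x 2 x 2 box, the coarse pass tests 64 voxels and the restricted fine pass another 64, which is where the speed-up comes from.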

なお、履歴登録部50が前記第2実施形態と同様に分類部51を重ねて具備する場合には、図11に示すように、高解像度モデル生成部21aは非キーフレームにおいて、クラス分類の結果に応じて適応的に拡張されたモデル計算領域に3Dモデルを生成する。 When the history registration unit 50 additionally includes a classification unit 51 as in the second embodiment, the high-resolution model generation unit 21a generates, in non-key frames, a 3D model in a model calculation area that has been adaptively expanded according to the classification result, as shown in FIG. 11.

また、計算領域決定部40が前記第3実施形態と同様に速度場推定部42を重ねて具備する場合には、図12に示すように、非キーフレームにおけるモデル計算領域予測に速度場の推定結果を利用しても良い。 Likewise, when the calculation area determination unit 40 additionally includes a velocity field estimation unit 42 as in the third embodiment, the estimated velocity field may be used to predict the model calculation area in non-key frames, as shown in FIG. 12.

さらに、図13に示した第5実施形態のように、非キーフレームにおいて前記計算領域決定部40が予測したモデル計算領域を対象に低解像度3Dモデル生成を実施し、低解像度3Dボクセルモデルまたはその3Dバウンディングボックスの生成領域のみを対象に前記高解像度モデル生成部21aが高解像度3Dモデル生成を実施するようにしても良い。 Furthermore, as in the fifth embodiment shown in FIG. 13, a low-resolution 3D model may be generated for the model calculation area predicted by the calculation area determination unit 40 in a non-key frame, and the high-resolution model generation unit 21a may generate a high-resolution 3D model for only the generation area of the low-resolution 3D voxel model or its 3D bounding box.

このとき、低解像度モデル生成部21bがキーフレームにおける低解像度3Dモデル生成と同様に単位ボクセルサイズをMbとして低解像度3Dモデル生成を実施すると計算負荷が増えることがある。そこで、予測したモデル計算領域に単位ボクセルサイズがMc（＞Mb）のボクセルグリッドを配置し、更に低解像度の3Dモデルを生成するようにしても良い。 At this time, if the low-resolution model generation unit 21b generates a low-resolution 3D model with a unit voxel size of Mb, as in the case of generating a low-resolution 3D model in a key frame, the calculation load may increase. Therefore, a voxel grid with a unit voxel size of Mc (>Mb) may be placed in the predicted model calculation area, and a 3D model with even lower resolution may be generated.

あるいは単位ボクセルサイズはMbとしたまま、モデル生成に用いるシルエット画像数（カメラ数）をキーフレームにおける低解像度3Dモデル生成時よりも少なくするようにしても良い。 Alternatively, the unit voxel size may be kept at Mb while the number of silhouette images (number of cameras) used for model generation is made smaller than when generating a low-resolution 3D model in key frames.

本実施形態によれば、非キーフレームにおいてはモデル計算領域を対象に3Dモデル計算を行って生成した低解像度3Dモデルの生成領域を対象に3Dモデル計算を行って高解像度3Dモデルを生成するので、非キーフレームにおけるモデル計算領域を更に正確に決定できる。 According to this embodiment, in non-key frames a low-resolution 3D model is first generated by performing 3D model calculation on the model calculation area, and a high-resolution 3D model is then generated by performing 3D model calculation only on the generation area of that low-resolution model, so the model calculation area in non-key frames can be determined more accurately.

なお、本実施形態では非キーフレームにおいてモデル計算領域に低解像度3Dモデルを生成する際にシルエット画像を用いたカメラのうち何台のカメラにおいて当該シルエット画像が前景であったかをボクセルごとに判断して、モデル生成尤度を算出することができる。例えば、全8台のカメラのうち7台のカメラのシルエット画像で前景となったボクセルについては、そのモデル生成尤度L_modelを7/8（＝0.875）として記録する。 In this embodiment, when a low-resolution 3D model is generated in the model calculation area in a non-key frame, a model generation likelihood can be calculated by determining, for each voxel, in how many of the cameras whose silhouette images were used the voxel was foreground. For example, for a voxel that was foreground in the silhouette images of seven of a total of eight cameras, the model generation likelihood L_model is recorded as 7/8 (= 0.875).
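
The model generation likelihood amounts to the fraction of cameras that see the voxel as foreground. A minimal sketch, assuming the per-camera silhouette test results for one voxel are already available as a list of booleans (the function name is hypothetical):

```python
def model_generation_likelihood(foreground_flags):
    """Fraction of cameras whose silhouette marks this voxel as foreground."""
    if not foreground_flags:
        raise ValueError("at least one camera is required")
    return sum(1 for flag in foreground_flags if flag) / len(foreground_flags)


# Example from the text: foreground in 7 of 8 cameras.
l_model = model_generation_likelihood([True] * 7 + [False])  # 0.875
```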

さらに、非キーフレームにおいてモデル計算領域に生成した低解像度3Dモデルのボクセルごとに、過去フレーム（例えば、前フレーム）における3Dモデル生成領域との距離を、例えば当該3Dモデル生成領域の重心位置や最近傍ボクセルからの距離として計測し、当該距離の逆数に基づいて履歴ベース生成尤度L_historyを計算する。このとき、当該3Dモデル生成領域と重複するボクセルには最大尤度（=1）を与えることができる。 Furthermore, for each voxel of the low-resolution 3D model generated in the model calculation area in a non-key frame, the distance to the 3D model generation area in a past frame (e.g., the previous frame) is measured, for example, as the distance from the centroid of that area or from its nearest voxel, and a history-based generation likelihood L_history is calculated based on the inverse of that distance. Voxels that overlap the 3D model generation area can be given the maximum likelihood (= 1).
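
A possible form of the history-based likelihood is sketched below using the nearest-voxel distance. The text only says the likelihood is based on the inverse of the distance and is 1 for overlapping voxels, so the specific formula 1 / (1 + d / voxel_size), the zero value for an empty history, and the function name are assumptions chosen so that the value stays in (0, 1] and equals 1 at zero distance.

```python
import math


def history_likelihood(voxel, prev_voxels, voxel_size):
    """Likelihood that decreases with distance to the previous-frame model.

    `voxel` is a voxel centre and `prev_voxels` are the voxel centres of the
    3D model generated in a past frame.
    """
    if not prev_voxels:
        return 0.0  # no history available (sketch assumption)
    d = min(math.dist(voxel, p) for p in prev_voxels)
    return 1.0 / (1.0 + d / voxel_size)
```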

そして、低解像度3Dモデルのボクセルごとに、前記モデル生成尤度L_modelおよび／または履歴ベース生成尤度L_historyが予め設定した閾値T1を上回るボクセルの領域のみ、あるいは次式(1)のようにモデル生成尤度L_modelと履歴ベース生成尤度L_historyとを加算した尤度が予め設定した閾値T2を上回るボクセルの領域のみ、を高解像度モデル計算領域に決定しても良い。 Then, among the voxels of the low-resolution 3D model, the high-resolution model calculation area may be limited to the region of voxels for which the model generation likelihood L_model and/or the history-based generation likelihood L_history exceeds a preset threshold T1, or to the region of voxels for which the sum of L_model and L_history exceeds a preset threshold T2, as in the following equation (1).

L_model + L_history > T2 (1)
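
Equation (1) and the T1 variant can be applied per voxel as in the sketch below. Each entry is assumed to be a tuple `(voxel, l_model, l_history)`. The function names and the threshold values used in any call are hypothetical.

```python
def select_by_sum(scored_voxels, t2):
    """Keep voxels satisfying equation (1): L_model + L_history > T2."""
    return [v for v, l_model, l_history in scored_voxels
            if l_model + l_history > t2]


def select_by_each(scored_voxels, t1, require_both=False):
    """Keep voxels whose L_model and/or L_history exceeds T1."""
    combine = all if require_both else any
    return [v for v, l_model, l_history in scored_voxels
            if combine(score > t1 for score in (l_model, l_history))]
```

The comparison is strict, so a voxel whose summed likelihood equals T2 exactly is not included in the high-resolution calculation area.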

そして、上記の各実施形態によれば高品質な被写体3Dモデルを通信インフラ経由でもリアルタイムで提供することが可能となるので、地理的あるいは経済的な格差を超えて多くの人々に多様なエンターテインメントを提供できるようになる。その結果、国連が主導する持続可能な開発目標(SDGs)の目標9「レジリエントなインフラを整備し、包括的で持続可能な産業化を推進する」や目標11「都市を包摂的、安全、レジリエントかつ持続可能にする」に貢献することが可能となる。 Furthermore, according to each of the above embodiments, it becomes possible to provide high-quality 3D models of subjects in real time even via communication infrastructure, making it possible to provide a variety of entertainment to many people regardless of geographic or economic disparities. As a result, it becomes possible to contribute to Goal 9 "Build resilient infrastructure and promote inclusive and sustainable industrialization" and Goal 11 "Make cities inclusive, safe, resilient and sustainable" of the Sustainable Development Goals (SDGs) led by the United Nations.

10…シルエット画像取得部，20…3Dモデル生成部，21…ボクセルモデル計算部，21a…高解像度モデル生成部，21b…低解像度モデル生成部，22…3Dモデル出力部，23…入力フレーム識別部，30…履歴データベース，40…計算領域決定部，41…拡張部，42…速度場推定部，50…履歴登録部，51…分類部，100…3Dモデル生成装置 10...Silhouette image acquisition unit, 20...3D model generation unit, 21...Voxel model calculation unit, 21a...High resolution model generation unit, 21b...Low resolution model generation unit, 22...3D model output unit, 23...Input frame identification unit, 30...History database, 40...Calculation area determination unit, 41...Expansion unit, 42...Velocity field estimation unit, 50...History registration unit, 51...Classification unit, 100...3D model generation device

Claims

1. A 3D model generation device that generates a 3D model of a subject based on silhouette images extracted frame by frame from a video sequence captured by a plurality of cameras with different viewpoints, comprising:
means for storing a model generation history including a generation position of each 3D model in past frames;
means for determining a model calculation area in an input frame based on the model generation history;
means for generating a 3D model by performing 3D model calculation on the model calculation area; and
means for classifying the input frame as a key frame or a non-key frame,
wherein the means for generating the 3D model includes means for generating a relatively high-resolution 3D model and means for generating a relatively low-resolution 3D model,
in a key frame, generates a high-resolution 3D model in the area where a low-resolution 3D model was generated based on the silhouette images, and
in a non-key frame, generates a high-resolution 3D model by performing 3D model calculation on the model calculation area.

2. The 3D model generation device according to claim 1, wherein, in the non-key frames, a high-resolution 3D model is generated by performing 3D model calculation on the generation area of a low-resolution 3D model that was generated by performing 3D model calculation on the model calculation area.

3. The 3D model generating device according to claim 1, wherein the means for determining the model calculation area determines an area obtained by expanding a 3D model generation position in a previous frame as the model calculation area.

4. The 3D model generation device according to claim 3, further comprising means for classifying each 3D model into a class, wherein the means for determining the model calculation area expands the 3D model generation position in the previous frame by an expansion amount according to the class.

5. The 3D model generation device according to claim 4, wherein the classifying means classifies each 3D model into a class based on an expected moving speed of the subject, and the means for determining the model calculation area uses a larger expansion amount for a class of subjects with a faster moving speed.

6. The 3D model generation device according to claim 1, further comprising means for estimating a velocity field of each 3D model based on a generation history of each 3D model in a plurality of past frames, wherein the means for determining the model calculation area determines the model calculation area for each 3D model on the basis of its velocity field.

7. The 3D model generation device according to claim 2, wherein, in the non-key frames, the number of silhouette images used when generating a low-resolution 3D model for the model calculation area is smaller than the number of silhouette images used when generating a low-resolution 3D model in the key frames.

8. The 3D model generation device according to claim 2, wherein, in the non-key frames, the unit voxel size used when generating a low-resolution 3D model for the model calculation area is larger than the unit voxel size used when generating a low-resolution 3D model in the key frames.

9. The 3D model generation device according to claim 2, wherein, in the non-key frames, for each voxel of a low-resolution 3D model generated for the model calculation area, at least one of a model generation likelihood based on the number of cameras in which the voxel was foreground and a history-based generation likelihood based on the inverse of the distance from the model generation position in a past frame is calculated, and a calculation area for a high-resolution 3D model is determined based on at least one of the model generation likelihood and the history-based generation likelihood.

10. A 3D model generation method in which a computer generates a 3D model of a subject based on silhouette images extracted frame by frame from a video sequence captured by a plurality of cameras with different viewpoints, comprising:
storing a model generation history including a generation position of each 3D model in past frames;
determining a model calculation area in an input frame based on the model generation history;
generating a 3D model by performing 3D model calculation on the model calculation area; and
classifying the input frame as a key frame or a non-key frame,
wherein, in a key frame, a high-resolution 3D model is generated in an area where a low-resolution 3D model was generated based on the silhouette images, and
in a non-key frame, a high-resolution 3D model is generated by performing 3D model calculation on the model calculation area.

11. A 3D model generation program for generating a 3D model of a subject based on silhouette images extracted frame by frame from a video sequence captured by a plurality of cameras with different viewpoints, the program causing a computer to execute:
a step of storing a model generation history including a generation position of each 3D model in past frames;
a step of determining a model calculation area in an input frame based on the model generation history;
a step of generating a 3D model by performing 3D model calculation on the model calculation area; and
a step of classifying the input frame as a key frame or a non-key frame,
wherein the step of generating the 3D model comprises:
in a key frame, generating a high-resolution 3D model in an area where a low-resolution 3D model was generated based on the silhouette images; and
in a non-key frame, generating a high-resolution 3D model by performing 3D model calculation on the model calculation area.