JP7657308B2

JP7657308B2 - Method, apparatus and system for generating a three-dimensional model of a scene

Info

Publication number: JP7657308B2
Application number: JP2023548990A
Authority: JP
Inventors: シャンユーチェン
Original assignee: Beike Technology Co Ltd
Current assignee: Beike Technology Co Ltd
Priority date: 2020-10-29
Filing date: 2021-08-24
Publication date: 2025-04-04
Anticipated expiration: 2041-08-24
Also published as: CN112312113A; US20220139030A1; JP2023546739A; CN112312113B; US11989827B2; WO2022088881A1

Description

The present disclosure relates to the field of three-dimensional (3D) reconstruction technology and, more specifically, to image processing for generating virtual reality (VR) data.

[Cross-Reference to Related Applications]
This application claims priority to Chinese Patent Application No. 202011180650.0, filed on October 29, 2020, the entire contents of which are incorporated herein by reference.

[Background Art]
3D reconstruction is the process of building a mathematical model of a 3D object that is suitable for representation and processing by a computer. Once established, the 3D model enables the computer to process, manipulate, and analyze the 3D object. 3D reconstruction is an important technique for building VR environments that represent the real world in a computer. In general, 3D reconstruction may include steps such as image acquisition, camera calibration, feature extraction, stereo matching, and reconstruction of the 3D model.

Conventional 3D reconstruction techniques usually rely on either a depth camera or a LIDAR (light detection and ranging) system for 3D modeling. A depth camera can provide high-resolution depth data. However, the accuracy of that depth data depends strongly on the distance between the depth camera and the objects in the scene, so the range over which a depth camera delivers high-precision depth data is limited. In addition, the working range of a depth camera is usually limited by its power and resolution. A LIDAR system, on the other hand, acquires depth data whose accuracy is high and relatively stable over a wide depth range, so it provides high-precision depth data across that range. However, the resolution of LIDAR depth data is low. As a result, the point cloud generated for 3D modeling is relatively sparse, and it is difficult to model small objects (such as water pipes and pens) in sufficient detail.

Therefore, there is a need for a reliable 3D reconstruction solution that combines a wide detection range for large scenes with high accuracy and high resolution in the generated 3D model.

A method, computer-readable medium, system, and apparatus for generating a 3D model of a scene are disclosed. Depth data acquired by an imaging means and a scanning means are combined so that a high-resolution depth image with consistently high accuracy can be generated for the 3D model.

In some embodiments, a method of generating a 3D model of a scene is provided. The method includes: acquiring, by an imaging means in a 3D modeling system, first depth data; acquiring, by a scanning means in the 3D modeling system, second depth data; receiving, by the 3D modeling system, color data; generating a 3D model of the scene based on the color data, the first depth data, and the second depth data; and displaying the 3D model of the scene in the 3D modeling system. The first depth data includes pixels of a plurality of frames of depth images. The second depth data includes depth data points of a plurality of image frames.

In some embodiments, each depth image in the first depth data is referred to as first data, and each image frame in the second depth data is referred to as second data. The method further includes determining a plurality of data pairs and, based on the data pairs, determining a positional relationship between the first depth data and the second depth data. Each data pair includes a piece of first data and the corresponding second data, and the first data and the corresponding second data in a pair include the same target object.
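The source does not say how the positional relationship is computed from the data pairs. One possible approach, offered only as a sketch: if each pair yields the 3D position of the same target object as located by both means (for example, an object centroid), a rigid transform can be fitted by least squares. To stay short and dependency-free, the sketch assumes the two means differ only by a rotation about the vertical z axis plus a translation (plausible for a level, tripod-mounted rig, but an assumption); a full 3D rotation would need a method such as Kabsch/Umeyama instead. The function name `estimate_yaw_translation` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def estimate_yaw_translation(src: Sequence[Point],
                             dst: Sequence[Point]) -> Tuple[float, Point]:
    """Fit dst ~= Rz(theta) @ src + t by least squares.

    src[i] and dst[i] are assumed to be the same target object as located by
    the imaging means and the scanning means. Assumes (hypothetically) that the
    two means differ only by a yaw rotation about z and a translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two paired points of equal count")
    n = len(src)
    cs = [sum(p[k] for p in src) / n for k in range(3)]  # centroid of src
    cd = [sum(q[k] for q in dst) / n for k in range(3)]  # centroid of dst
    c = s = 0.0
    for p, q in zip(src, dst):
        ax, ay = p[0] - cs[0], p[1] - cs[1]
        bx, by = q[0] - cd[0], q[1] - cd[1]
        c += ax * bx + ay * by  # sum of 2D dot products
        s += ax * by - ay * bx  # sum of 2D cross products
    theta = math.atan2(s, c)  # yaw that maximizes sum(b . R a)
    ct, st = math.cos(theta), math.sin(theta)
    # t = centroid(dst) - R @ centroid(src); a yaw rotation leaves z unchanged.
    t = (cd[0] - (ct * cs[0] - st * cs[1]),
         cd[1] - (st * cs[0] + ct * cs[1]),
         cd[2] - cs[2])
    return theta, t
```

With three or more well-spread pairs the estimate is usually stable; mismatched pairs would call for a robust wrapper such as RANSAC, which is not shown.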

In some embodiments, each piece of first data is tagged with a first extrinsic parameter indicating pose information of the imaging means, and each piece of second data is tagged with a second extrinsic parameter indicating pose information of the scanning means. The method further includes determining a first pose associated with the first data based on the first extrinsic parameter, determining a second pose associated with the second data based on the second extrinsic parameter, and, in response to the first pose and the second pose being similar, determining a data pair that includes the first data and the second data.
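A minimal sketch of pose-based pairing, assuming each frame carries a 3x3 rotation matrix and a translation vector derived from its extrinsic parameter. "Similar" is interpreted here as a relative rotation angle and a position difference that are both under thresholds; the threshold values (2 degrees, 5 cm) and all function names are illustrative assumptions, not values from the source.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vec3 = Sequence[float]
Frame = Tuple[str, Matrix3, Vec3]  # (frame id, rotation, translation)


def rotation_angle_between(r1: Matrix3, r2: Matrix3) -> float:
    """Angle in radians of the relative rotation R1^T @ R2."""
    # trace(R1^T @ R2) equals the element-wise sum of R1 * R2.
    tr = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp rounding error
    return math.acos(cos_angle)


def pair_by_pose(first: Sequence[Frame], second: Sequence[Frame],
                 max_angle_rad: float = math.radians(2.0),
                 max_dist: float = 0.05) -> List[Tuple[str, str]]:
    """Pair each depth image with the most similar-pose LIDAR frame, if any."""
    pairs = []
    for fid, r1, t1 in first:
        best, best_score = None, float("inf")
        for sid, r2, t2 in second:
            angle = rotation_angle_between(r1, r2)
            dist = math.dist(t1, t2)
            if angle <= max_angle_rad and dist <= max_dist:
                # Crude combined score (mixed units); used only to rank
                # candidates that already passed both thresholds.
                score = angle + dist
                if score < best_score:
                    best, best_score = sid, score
        if best is not None:
            pairs.append((fid, best))
    return pairs
```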

In some embodiments, the first extrinsic parameter and the second extrinsic parameter are output by the same pose sensor in the 3D modeling system.

In some embodiments, each piece of first data is tagged with a first timestamp indicating the time at which the imaging means acquired it, and each piece of second data is tagged with a second timestamp indicating the time at which the scanning means acquired it. The first data and the corresponding second data in a data pair have a time interval between them that is smaller than a threshold.
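Timestamp-based pairing can be sketched as a nearest-neighbour search in time with a cut-off. Because the two means may run at different frame rates, most depth images will have no partner within the threshold, and only near-coincident captures form pairs. The sketch assumes timestamps in seconds on a shared clock and the second list sorted ascending; the function name and threshold are illustrative.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def pair_by_timestamp(first_ts: Sequence[float],
                      second_ts: Sequence[float],
                      threshold: float) -> List[Tuple[int, int]]:
    """Pair each first-data timestamp with its nearest second-data timestamp.

    second_ts must be sorted ascending. A pair (i, j) is kept only if
    |first_ts[i] - second_ts[j]| < threshold.
    """
    pairs = []
    for i, t in enumerate(first_ts):
        k = bisect_left(second_ts, t)  # index where t would be inserted
        # The nearest neighbour is either just before or at the insertion point.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(second_ts)]
        if not candidates:
            continue
        j = min(candidates, key=lambda c: abs(second_ts[c] - t))
        if abs(second_ts[j] - t) < threshold:
            pairs.append((i, j))
    return pairs
```

For example, with a depth camera at 30 frames per second and a LIDAR at 10 frames per second (rates chosen only for illustration), `pair_by_timestamp([0.0, 1/30, 2/30, 0.1], [0.0, 0.1, 0.2], 0.01)` keeps only the two coincident captures, `(0, 0)` and `(3, 1)`.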

In some embodiments, the first data and the corresponding second data in a data pair are matched by identifying one or more of the same objects in both.
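The source does not specify how the common objects are recognized. Assuming some object detector has already produced a set of object labels for every depth image and every LIDAR frame, pairs could be chosen by label overlap, as in the sketch below. Jaccard similarity, the 0.5 threshold, and the function names are illustrative choices.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two label sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_by_objects(first: Dict[str, Set[str]],
                    second: Dict[str, Set[str]],
                    min_overlap: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each depth image with the LIDAR frame whose detected objects overlap most."""
    pairs: List[Tuple[str, str]] = []
    if not second:
        return pairs
    for fid, labels in first.items():
        # max() returns the first best candidate when scores tie.
        sid, other = max(second.items(), key=lambda kv: jaccard(labels, kv[1]))
        if jaccard(labels, other) >= min_overlap:
            pairs.append((fid, sid))
    return pairs
```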

In some embodiments, the method further includes determining a depth threshold and removing one or more data points from the first depth data in response to their depth values being greater than the depth threshold.
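The filtering step itself is simple. The sketch below operates on a depth image stored as nested lists and assumes that 0.0 marks an invalid pixel (a common convention, not stated in the source). It also assumes the threshold is chosen near the distance beyond which the depth camera's accuracy degrades, so that farther distances are left to the scanning means.

```python
from typing import List, Sequence


def mask_far_depths(depth_image: Sequence[Sequence[float]],
                    depth_threshold: float,
                    invalid: float = 0.0) -> List[List[float]]:
    """Return a copy of depth_image with every pixel deeper than the threshold set to `invalid`."""
    return [[d if d <= depth_threshold else invalid for d in row]
            for row in depth_image]
```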

In some embodiments, the method further includes: identifying, in the generated 3D model of the scene, one or more image regions whose resolution is below a resolution threshold; acquiring, by the imaging means in the 3D modeling system, first fill data comprising a plurality of depth images; and supplementing the generated 3D model of the scene with the first fill data to generate a new 3D model of the scene.

In some embodiments, the method further includes: identifying, in the generated 3D model of the scene, one or more image regions that do not have sufficient depth data points; acquiring, by the scanning means in the 3D modeling system, second fill data including depth data points of a plurality of image frames; and supplementing the generated 3D model of the scene with the second fill data to generate a new 3D model of the scene.
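Both checks described above (regions below a resolution threshold and regions lacking depth data points) can be approximated by counting points per grid cell. A minimal sketch, assuming the model's points are projected onto a plane (for example, a floor plan or a wall) and that a resolution threshold translates into a minimum point count per cell; the function name, grid size, and counts are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def find_sparse_cells(points: Iterable[Tuple[float, float, float]],
                      x_range: Tuple[float, float],
                      y_range: Tuple[float, float],
                      cell_size: float,
                      min_points: int) -> List[Tuple[int, int]]:
    """Return (ix, iy) cells over the given x/y extent holding fewer than min_points points."""
    counts: Counter = Counter()
    for x, y, _z in points:  # z is ignored: points are projected onto the x/y plane
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            ix = int((x - x_range[0]) // cell_size)
            iy = int((y - y_range[0]) // cell_size)
            counts[(ix, iy)] += 1
    nx = math.ceil((x_range[1] - x_range[0]) / cell_size)
    ny = math.ceil((y_range[1] - y_range[0]) / cell_size)
    # Empty cells (holes) count as 0 and are therefore reported too.
    return [(ix, iy) for ix in range(nx) for iy in range(ny)
            if counts[(ix, iy)] < min_points]
```

The returned cells indicate where fill data should be captured: close-up depth images for fine detail, or additional LIDAR frames for uncovered areas.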

In some embodiments, the imaging means acquires the plurality of depth images at a first frame rate, and the scanning means acquires the plurality of image frames at a second frame rate.

In some embodiments, the 3D modeling system includes a display that displays the 3D model of the scene in real time based on the acquired depth data.

In some embodiments, a system for generating a 3D model of a scene is provided. The system includes an imaging means, a scanning means, and one or more processors. The imaging means is configured to acquire first depth data, which includes pixels of a plurality of frames of depth images. The scanning means is configured to acquire second depth data, which includes depth data points of a plurality of image frames. The one or more processors are configured to receive the first depth data from the imaging means, the second depth data from the scanning means, and color data; generate a 3D model of the scene based on the first depth data, the second depth data, and the color data; and output the generated 3D model of the scene. The color data includes pixels of a plurality of color images.

In some embodiments, each depth image in the first depth data is first data, and each image frame in the second depth data is second data. The one or more processors in the 3D modeling system are further configured to determine a plurality of data pairs and to determine a positional relationship between the first depth data and the second depth data based on the data pairs. Each data pair includes a piece of first data and the corresponding second data, and the first data and the corresponding second data include the same target object.

In some embodiments, the 3D modeling system further includes one or more pose sensors configured to output extrinsic parameters indicating pose information of the imaging means and the scanning means in the 3D modeling system. Each piece of first data acquired by the imaging means is tagged with a first extrinsic parameter indicating pose information of the imaging means, and each piece of second data acquired by the scanning means is tagged with a second extrinsic parameter indicating pose information of the scanning means. The one or more processors in the 3D modeling system are further configured to determine a first pose associated with the first data based on the first extrinsic parameter, determine a second pose associated with the second data based on the second extrinsic parameter, and, in response to the first pose and the second pose being similar, determine a data pair that includes the first data and the second data.

In some embodiments, the one or more processors in the 3D modeling system are further configured to determine a depth threshold and to remove one or more data points from the first depth data in response to their depth values being greater than the depth threshold.

In some embodiments, the one or more processors in the 3D modeling system are further configured to identify, in the generated 3D model of the scene, one or more image regions whose resolution is below a resolution threshold; receive first fill data from the imaging means; and supplement the generated 3D model of the scene with the first fill data to generate a new 3D model of the scene. The first fill data includes pixels of a plurality of depth images.

In some embodiments, the one or more processors in the 3D modeling system are further configured to identify, in the generated 3D model of the scene, one or more image regions that do not have sufficient depth data points; receive second fill data from the scanning means; and supplement the generated 3D model of the scene with the second fill data to generate a new 3D model of the scene. The second fill data includes depth data points of a plurality of image frames.

In some embodiments, a non-volatile computer-readable medium is provided. The medium stores computer-executable instructions that, when executed by one or more processors, cause the one or more processors to facilitate: acquiring, by an imaging means in a 3D modeling system, first depth data; acquiring, by a scanning means in the 3D modeling system, second depth data; receiving, by the 3D modeling system, color data and generating a 3D model of a scene based on the first depth data, the second depth data, and the color data; and displaying the 3D model of the scene. The first depth data includes pixels of a plurality of frames of depth images. The second depth data includes depth data points of a plurality of image frames.

The technology of the present invention is described in more detail below with reference to exemplary figures, but is not limited to the examples shown. All features described and/or illustrated herein can be used alone or combined in different combinations. The features and advantages of the various examples will become apparent from the following detailed description with reference to the attached drawings, in which:
FIG. 1 illustrates an exemplary 3D virtual reality (VR) environment in accordance with one or more embodiments.
FIG. 2 illustrates a block diagram of an exemplary computer system in accordance with one or more embodiments.
FIG. 3 illustrates a process for generating a 3D model of a scene in accordance with one or more embodiments.
FIG. 4 illustrates an application scenario for generating a 3D model of a scene in accordance with one or more embodiments.
FIG. 5 illustrates a process for generating a 3D model of a scene in accordance with one or more embodiments.
FIG. 6 illustrates a process for generating a 3D model of a scene in accordance with one or more embodiments.
FIG. 7 illustrates a system for generating a 3D model of a scene in accordance with one or more embodiments.

The disclosure described herein provides techniques for generating a 3D model of a scene. Depth information is acquired using an imaging means and a scanning means. The imaging means, such as a depth camera, is configured to acquire depth data at distances below a threshold. Because the depth data acquired by the imaging means has high resolution, it provides detail in the 3D model. The scanning means, such as a LIDAR system, is configured to acquire depth data with consistent accuracy over a wide range of capture distances, so that distant objects can be modeled with high accuracy in the 3D model. The depth data points acquired by the imaging means and the scanning means are combined based on a positional relationship between them. In some examples, the positional relationship is determined based on pose information of the data acquisition devices, such as the imaging means and the scanning means. In other examples, the positional relationship is determined by identifying target objects that appear in multiple depth images and/or image frames of depth data points. Additionally and/or alternatively, the data points in the 3D model are rendered according to color information provided by a color imaging means. In this manner, a 3D model containing both depth and color information is generated.
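Rendering data points with color information can be sketched as projecting each 3D point into a color image and sampling the pixel it lands on. The sketch assumes the point is already expressed in the color camera's coordinate frame, that the color camera follows a pinhole model with intrinsics `fx`, `fy`, `cx`, `cy` (none of which appear in the source), and nearest-pixel sampling; the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def colorize_point(p: Tuple[float, float, float],
                   image: Sequence[Sequence[RGB]],
                   fx: float, fy: float, cx: float, cy: float) -> Optional[RGB]:
    """Return the color of the pixel that point p (color-camera frame) projects to, or None."""
    x, y, z = p
    if z <= 0:
        return None  # point is behind the camera
    u = round(fx * x / z + cx)  # column index
    v = round(fy * y / z + cy)  # row index
    if 0 <= v < len(image) and 0 <= u < len(image[0]):
        return image[v][u]
    return None  # projects outside the image
```

Points that fall outside every color image (or behind the camera) stay uncolored under this sketch; a real pipeline would also need to handle occlusion, which is not shown.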

By applying the techniques provided herein, 3D models with high resolution and high accuracy can be generated, improving the user experience of VR simulation applications.

FIG. 1 illustrates an exemplary 3D VR environment 100 according to some embodiments. As illustrated in FIG. 1, the 3D VR environment 100 may simulate or represent a residential unit, such as an apartment or a floor of a house. Note that the 3D VR environment 100 may include a VR representation of any indoor space or environment. Referring to FIG. 1, the 3D VR environment 100 may include one or more functional spaces, such as 110, 120, 130, 140, 150, and 160. As used herein, a functional space refers to an enclosed or partially enclosed space associated with a particular function. In some cases, a functional space may correspond to a room. For example, functional space 110 may correspond to a first bedroom, and functional space 130 may correspond to a second bedroom. In some examples, a functional space may correspond to an enclosed or partially enclosed space within or adjacent to a room. For example, functional space 140 may correspond to a closet. In other examples, a functional space may correspond to an area commonly used for a particular purpose. For example, functional space 120 may correspond to a kitchen area, functional space 150 to a dining area, and functional space 160 to a living room. Functional spaces 120, 150, and 160 may share the same room (e.g., an enclosed area) but may be considered different functional spaces because their functions differ.

FIG. 2 is a block diagram of an exemplary computer system 200 configured to implement various functions disclosed herein. For example, the computer system 200 may be configured as a server for creating or reconstructing the VR environment 100. In another example, the computer system 200 may be configured as a terminal device for displaying or enhancing the VR environment 100. As shown in FIG. 2, the computer system 200 may include a processor 210, a communication interface 220, a memory/storage 230, and a display 240. The memory/storage 230 may be configured to store computer-readable instructions that, when executed by the processor 210, cause the processor 210 to perform various operations disclosed herein. The memory 230 may be any type of mass storage, such as a volatile or non-volatile, magnetic, semiconductor-based, tape-based, optical, removable, non-removable, or other type of storage device or tangible computer-readable medium, including, but not limited to, read-only memory (ROM), flash memory, dynamic random access memory (DRAM), and/or static RAM (SRAM).

Processor 210 may be configured to perform operations according to instructions stored in memory 230. Processor 210 may include any suitable type of general-purpose or special-purpose microprocessor, digital signal processor, microcontroller, or the like. In some examples, processor 210 may be configured as a separate processor module dedicated to performing one or more of the operations disclosed herein. In other examples, processor 210 may be configured as a shared processor module that can also perform other operations unrelated to the operations disclosed herein.

The communication interface 220 may be configured to communicate information between the computer system 200 and other devices or systems. For example, the communication interface 220 may include an integrated services digital network (ISDN) card, a cable modem, a satellite modem, or another modem to provide a data communication connection. As another example, the communication interface 220 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. As yet another example, the communication interface 220 may include a high-speed network adapter, such as a fiber optic network adapter or a 10G Ethernet adapter. A wireless link may also be implemented by the communication interface 220. In such an implementation, the communication interface 220 may send and receive electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information over a network. The network typically includes a cellular communication network, a wireless local area network (WLAN), a wide area network (WAN), or the like.

The communication interface 220 may also include various I/O devices such as a keyboard, mouse, touchpad, touchscreen, microphone, camera, biosensor, etc. A user can input data into the terminal device via the communication interface 220.

The display 240 may be integrated as part of the computer system 200 or provided as a separate device communicatively coupled to the computer system 200. The display 240 may include a display device such as a liquid crystal display (LCD), a light-emitting diode (LED) display, a plasma display, or any other type of display, and may provide a graphical user interface (GUI) on the display for user input and data presentation. In some embodiments, the display 240 may include VR goggles, VR glasses, or other similar devices that provide an immersive VR experience. For example, the VR environment 100 may be displayed on the display 240. In some embodiments, the display 240 may be integrated as part of the communication interface 220.

FIG. 3 illustrates an exemplary process 300 for generating a 3D model of a scene according to one or more embodiments. Process 300 may be performed by the 3D modeling system and/or the device 200 according to computer-executable instructions stored in the memory 230 shown in FIG. 2. The 3D modeling system may include an imaging means, a scanning means, and/or one or more pose sensors. Additionally and/or alternatively, the 3D modeling system may include one or more platforms, motors, and/or actuators for positioning and repositioning one or more of its means, such as the imaging means and/or the scanning means. For example, the 3D modeling system may include one or more platforms, motors, and/or actuators for rotating and/or moving the imaging means independently of the scanning means. In some examples, the 3D modeling system may include a color imaging means (e.g., a color camera) configured to acquire color data associated with the images. In other examples, the 3D modeling system may acquire color data from an external color imaging means separate from the 3D modeling system.

The imaging means may be a depth camera configured to capture depth images of the scene. Each depth image consists of a plurality of pixels, each of which includes a depth value. The depth value may represent the distance between the position of the imaging means and the object depicted by one or more pixels. The scanning means (e.g., a LIDAR device) may be configured to scan the scene and collect a plurality of depth data points indicating a plurality of depth values. The one or more pose sensors may be configured to output a plurality of extrinsic parameters, which include position and rotation information related to the imaging means and the scanning means. For example, the extrinsic parameters may include a pose matrix consisting of a 3×3 rotation matrix and a 3×1 translation vector. In some variations, the imaging means may capture a plurality of images (e.g., five images), and the pose sensor may determine one or more pose matrices for those images. For example, the imaging means may remain stationary (e.g., at the same position) while capturing the images, in which case the pose sensor may determine a single pose matrix for all of them. The pose matrix may indicate the rotation (e.g., pitch, yaw, roll) and/or translation (e.g., x, y, z position) values of the imaging means at that position. In other examples, the imaging means may be at different positions when acquiring the images, and the pose sensor may determine, for each image, a pose matrix indicating the rotation and/or translation values of the imaging means at the time that image was acquired.
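The pose matrix described above (a 3×3 rotation plus a 3×1 translation) is commonly packed into a 4×4 homogeneous matrix. The sketch below builds one from yaw, pitch, roll and x, y, z. It assumes angles in radians and a Z-Y-X convention (R = Rz(yaw) @ Ry(pitch) @ Rx(roll)); the source does not state which convention the pose sensor uses, and the function name is hypothetical.

```python
import math
from typing import List


def pose_matrix(yaw: float, pitch: float, roll: float,
                x: float, y: float, z: float) -> List[List[float]]:
    """4x4 homogeneous pose [[R, t], [0, 0, 0, 1]] with R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    r = [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]
    # Append the translation column and the homogeneous bottom row.
    return [r[0] + [x], r[1] + [y], r[2] + [z], [0.0, 0.0, 0.0, 1.0]]
```

When the imaging means stays at one position for several images, a single such matrix can be shared by all of them, as the paragraph above describes.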

The extrinsic parameters may be used for localization and/or to compute position information for the images from the imaging means and the scanning means so that they lie in the same 3D coordinate system. In other words, the imaging means captures a first image and the scanning means acquires a second image, and the extrinsic parameters from the pose sensor may be used to align the first and second images in the same coordinate system. The 3D modeling system may further include the device 200 for processing the acquired data and/or an interface for transmitting the acquired data to the device 200 for processing. It will be appreciated, however, that process 300 may be performed in any suitable environment and that the following blocks may be performed in any suitable order.
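Aligning data from the two means in one coordinate system amounts to applying each sensor's pose to its own points. A minimal sketch, assuming each pose is a 4×4 homogeneous matrix that maps sensor coordinates to the shared coordinate system and that points are plain (x, y, z) tuples; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix4 = Sequence[Sequence[float]]


def transform_point(pose: Matrix4, p: Point) -> Point:
    """Apply a 4x4 homogeneous pose (sensor frame -> shared frame) to one point."""
    x, y, z = p
    return (pose[0][0] * x + pose[0][1] * y + pose[0][2] * z + pose[0][3],
            pose[1][0] * x + pose[1][1] * y + pose[1][2] * z + pose[1][3],
            pose[2][0] * x + pose[2][1] * y + pose[2][2] * z + pose[2][3])


def merge_into_common_frame(camera_points: Sequence[Point], camera_pose: Matrix4,
                            lidar_points: Sequence[Point], lidar_pose: Matrix4) -> List[Point]:
    """Express both sensors' points in the shared frame and concatenate them."""
    merged = [transform_point(camera_pose, p) for p in camera_points]
    merged += [transform_point(lidar_pose, p) for p in lidar_points]
    return merged
```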

At block 310, the 3D modeling system acquires first depth data. Specifically, the imaging means of the 3D modeling system captures a plurality of depth images of the scene. The imaging means may be or include one or more cameras, including but not limited to stereo cameras, structured light cameras, time-of-flight (TOF) cameras, and/or other types of depth cameras. Each depth image captured by the imaging means includes a plurality of pixels, and each pixel is associated with a depth data point whose depth value indicates the distance between the imaging means at the capture position (i.e., the position of the imaging means when the image is captured) and a spot in the scene. The first depth data includes the pixels of the depth images captured by the imaging means.
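To turn the pixels of a depth image into 3D depth data points, a common approach is pinhole back-projection. The sketch below is illustrative only: it assumes the camera intrinsics `fx`, `fy`, `cx`, `cy` are known, that each depth value is the z-depth along the optical axis (the text only says "distance"), and that a value of 0 marks a missing measurement. None of these details come from the source.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth image (rows of z-depth values in metres, 0 = no data)
    into a list of (x, y, z) points in the camera frame using a pinhole model.
    fx, fy, cx, cy are assumed intrinsic parameters of the imaging means."""
    points = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index
            if z <= 0:
                continue  # skip pixels without a valid depth measurement
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

The resulting camera-frame points can then be moved into the world frame with the pose matrix of the image in which they were captured.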

The scene may include one or more objects and/or imaging areas, and may be any type of scene containing any type or number of objects and/or imaging areas. For example, the scene may be an indoor or outdoor scene of a house. In some examples, at least some of the objects in the scene (e.g., a faucet and/or a door handle) may be imaged twice. For example, the scene may include partial scenes A, B, and C, each the size of the field of view of the imaging means, and the imaging means may capture two depth images of each partial scene. Additionally and/or alternatively, if partial scene A and partial scene B include a common object, the imaging means may capture one depth image of partial scene A and one of partial scene B; the common object is then imaged twice, once in each depth image.

In some examples, the imaging means may be placed in different poses (e.g., positions and/or shooting angles) to capture multiple depth images, such that the depth images captured of the partial scenes together cover the entire scene. The imaging means may be moved or rotated into these poses. For example, the imaging means may be mounted on a platform configured to move along a preset motion trajectory, and may capture a depth image each time it moves a preset distance (e.g., 0.2 m). In some variations, the platform may be configured to rotate around a preset object, and the imaging means may capture a depth image each time it rotates by a preset angle (e.g., 10 degrees). The platform may also rotate around the preset object while moving along the preset trajectory. In other words, the imaging means may capture a depth image each time it moves by the preset distance and/or rotates by the preset angle.
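The capture rule above (capture after moving a preset distance and/or rotating a preset angle) can be sketched as a small trigger object. This is a hypothetical illustration: the class name, the planar (x, y, heading) pose representation, and the "either condition triggers" policy are assumptions; only the 0.2 m and 10 degree defaults come from the examples in the text.

```python
import math

class CaptureTrigger:
    """Decide when to capture a new depth image: after the platform has moved
    at least `step_m` metres or rotated at least `step_deg` degrees since the
    last capture. Defaults follow the 0.2 m / 10 degree examples."""

    def __init__(self, step_m=0.2, step_deg=10.0):
        self.step_m = step_m
        self.step_deg = step_deg
        self.last = None  # (x, y, heading_deg) at the last capture

    def should_capture(self, x, y, heading_deg):
        if self.last is None:
            self.last = (x, y, heading_deg)
            return True  # always capture at the starting pose
        lx, ly, lh = self.last
        moved = math.hypot(x - lx, y - ly)
        # Smallest absolute heading change, handling wrap-around at 360 degrees.
        turned = abs((heading_deg - lh + 180.0) % 360.0 - 180.0)
        if moved >= self.step_m or turned >= self.step_deg:
            self.last = (x, y, heading_deg)
            return True
        return False
```

Measuring the heading change modulo 360 degrees keeps a turn from 355 to 3 degrees counted as 8 degrees rather than 352.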

In some examples, the 3D modeling system may perform keypoint detection during or after capturing the depth images of the scene. For example, the system may perform keypoint detection on one or more depth images of a partial scene captured by the imaging means to identify one or more target objects and/or image regions, and determine whether an identified target object requires supplemental depth data for one or more reasons (e.g., low resolution and/or missing or incomplete data points). For example, the scene may include a target object (e.g., a door handle), and the system may capture one or more depth images of the scene, each depicting a partial scene that includes the target object and shows a portion of the entire scene. In other words, the system may instruct the imaging means to capture a first depth image of the target object and then a further depth image that also includes the target object. Additionally and/or alternatively, the system may determine from the keypoint detection results (e.g., low resolution and/or missing data points) that a target object requires supplemental depth data, for example because the resolution of the target object is below a predetermined threshold. In this case, the processor of the 3D modeling system instructs the imaging means to move to a predetermined location from which the target object can be imaged, and then to capture depth images of the partial scene to provide supplemental depth data for the target object. The target object may be a predetermined object (e.g., one specified by user input). Additionally and/or alternatively, the target object may be an object that satisfies a preset condition (e.g., an object smaller than a predetermined volume, or occupying less than a predetermined area in the image).
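One way to decide whether a detected target object needs supplemental depth data is to inspect the pixels inside its bounding box. The sketch below assumes the keypoint detector already returns a box, treats a depth of 0 as a missing data point, and uses the count of valid pixels as a stand-in for "resolution". The function name and both thresholds are hypothetical.

```python
def needs_supplement(depth, box, min_valid_ratio=0.9, min_valid_pixels=400):
    """Return True if the target object inside `box` = (u0, v0, u1, v1)
    (half-open pixel bounds) lacks enough depth data: too many missing
    pixels (depth == 0) or too few valid pixels to resolve it in detail.
    Both thresholds are illustrative assumptions."""
    u0, v0, u1, v1 = box
    total = valid = 0
    for v in range(v0, v1):
        for u in range(u0, u1):
            total += 1
            if depth[v][u] > 0:
                valid += 1
    if total == 0:
        return True  # object not visible in this image at all
    return valid / total < min_valid_ratio or valid < min_valid_pixels
```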

At block 320, the 3D modeling system acquires second depth data of the scene. Specifically, the scanning means of the 3D modeling system scans the scene to collect a plurality of depth data points, each including a depth value that indicates the distance between the scanning means and a spot in the scene. In some examples, the scanning means may be a LIDAR system that scans an area with an energy source such as a laser and detects the energy reflected back from objects in the scanned area.
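For a pulsed time-of-flight LIDAR, one common operating principle though not one the text specifies, the depth value follows from the round-trip time of the reflected pulse: d = c·t / 2. The sketch also converts a single return into a Cartesian point. The azimuth/elevation convention (azimuth measured in the x-y plane from the x axis, elevation from the horizontal plane) is an assumption.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def tof_range(round_trip_s):
    """Distance to the reflecting surface for a pulsed time-of-flight
    measurement: the pulse travels out and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0

def lidar_point(range_m, azimuth_rad, elevation_rad):
    """Convert one LIDAR return (range, azimuth, elevation) into x, y, z
    in the scanner frame (assumed spherical-coordinate convention)."""
    horiz = range_m * math.cos(elevation_rad)
    return (horiz * math.cos(azimuth_rad),
            horiz * math.sin(azimuth_rad),
            range_m * math.sin(elevation_rad))
```

For example, a round trip of 20 ns corresponds to a range of about 3.0 m.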

As in block 310, the 3D modeling system may perform keypoint detection during or after scanning the scene. For example, the system may perform keypoint detection on one or more frames of depth data points acquired by the scanning means for a partial scene to identify one or more target objects and/or image regions, and determine whether an identified target object requires supplemental depth data for one or more reasons (e.g., low resolution and/or missing or incomplete data points). Based on the keypoint detection results, the processor of the 3D modeling system instructs the scanning means to scan a predetermined region to provide supplemental depth data for the target object.

In some examples, the relative position between the imaging means and the scanning means is fixed; in others, it is not. When it is not fixed, the relative position may be determined dynamically during depth data acquisition. In this case, the 3D modeling system may include one or more pose sensors that output extrinsic parameters indicating the position and rotation of the imaging means and/or the scanning means. In some variations, the system may include a single pose sensor shared by the imaging means and the scanning means; in other variations, it may include one pose sensor for the imaging means and a separate pose sensor for the scanning means.
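When each device has its own pose sensor, the relative pose between the imaging means and the scanning means can be recomputed at any moment from their two world poses: T_lidar_cam = inv(T_world_lidar) · T_world_cam. A minimal sketch, assuming both poses are 4×4 rigid transforms (sensor frame to world frame) stored as nested lists; the function names are hypothetical.

```python
def invert_pose(T):
    """Inverse of a rigid 4x4 pose: [R t; 0 1]^-1 = [R^T  -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [T[i][3] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]], [0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    """Product of two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def relative_pose(T_world_cam, T_world_lidar):
    """Pose that maps camera-frame points into the scanner frame."""
    return matmul4(invert_pose(T_world_lidar), T_world_cam)
```

Using the closed-form rigid inverse (transpose the rotation, rotate and negate the translation) avoids a general 4×4 matrix inversion.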

At block 330, the 3D modeling system generates a 3D model of the scene based on color data, the first depth data, and the second depth data. In some variations, the color data is obtained from images of the scene captured by a color imaging means (e.g., a color camera). In some examples, the 3D modeling system includes the color imaging means; in other examples, a set of color data captured by the color imaging means is transmitted to the 3D modeling system. The images captured by the color imaging means may be or include red-green-blue (RGB) images, grayscale images, and/or black-and-white images. The color data is formed from the pixels of the color images (e.g., it may include RGB values, grayscale values, and/or luminance values). The 3D modeling system may include a processor configured to process the received color data and depth data to generate the 3D model of the scene.
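Since the color data may hold RGB, grayscale, or luminance values, one conversion that may be needed is deriving luminance from RGB. The sketch uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which is one common convention chosen here as an assumption; the text does not say which weighting, if any, is used.

```python
def luminance(rgb):
    """Luma of an 8-bit (R, G, B) triple using ITU-R BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def color_to_gray(image):
    """Grayscale image (rows of ints 0-255) from rows of RGB triples."""
    return [[round(luminance(px)) for px in row] for row in image]
```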

In some examples, the 3D modeling system may combine the first depth data from the imaging means and the second depth data from the scanning means using the pose information of each. The processor of the 3D modeling system may obtain, from the pose sensor, extrinsic parameters including the position and rotation of the imaging means and the scanning means, and determine from them the position, rotation, and/or shooting angle of each in the same 3D coordinate system. For example, the imaging means acquires multiple depth images of the scene, each associated with extrinsic parameters indicating the pose of the imaging means at the time the image was captured. From the computed poses, the positional relationship between the depth images is determined, and the depth images can be combined according to that relationship. Additionally and/or alternatively, the depth data points acquired by the scanning means can be combined based on the positional relationship determined from the pose information of the scanning means, and the depth images from the imaging means can be combined with the depth data points from the scanning means based on the positional relationship determined from the pose information of both. In this way, the first and second depth data can be combined to generate a complete depth image of the scene. Additionally and/or alternatively, each color image captured by the color imaging means may be associated with extrinsic parameters indicating the pose of the color imaging means, so that the color images can likewise be combined according to their positional relationship. The color data formed from the color images may also be aligned with the depth data points of the scene based on the positional relationship determined from the pose information of the data acquisition means. A 3D model including both depth and color information can thus be generated.
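Aligning color data with depth data points via poses can be sketched as projecting each world-frame point into a color image and reading the pixel it lands on. This is a hypothetical illustration: it assumes the color camera's world-to-camera transform `T_cam_from_world` (the inverse of its pose) and pinhole intrinsics are known, and it uses nearest-pixel lookup without occlusion handling.

```python
def colorize(points_world, image, T_cam_from_world, fx, fy, cx, cy):
    """Attach an RGB value to each world-frame point by projecting it into a
    color image (pinhole model, nearest-pixel lookup). Points behind the
    camera or outside the image get None. Occlusion is not handled."""
    h, w = len(image), len(image[0])
    out = []
    for p in points_world:
        # Transform the point from the world frame into the color camera frame.
        x, y, z = (sum(T_cam_from_world[i][k] * p[k] for k in range(3)) + T_cam_from_world[i][3]
                   for i in range(3))
        if z <= 0:
            out.append((p, None))  # behind the camera
            continue
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        color = image[v][u] if 0 <= u < w and 0 <= v < h else None
        out.append((p, color))
    return out
```

A production pipeline would typically also check depth consistency (z-buffering) so that a point hidden behind another surface does not pick up that surface's color.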

In some variations, the 3D modeling system may combine the color data, the first depth data, and the second depth data based on the results of keypoint detection (e.g., one or more identified target objects). For example, the system may perform keypoint detection on each depth image in the first depth data to identify a plurality of target objects. Each target object is captured in at least two depth images from the imaging means, so the depth images can be combined by aligning the identified target objects across them. The second depth data from the scanning means includes multiple frames; the system may likewise perform keypoint detection on each frame to identify a plurality of target objects and combine the depth data points by aligning those objects across frames. Additionally and/or alternatively, the first and second depth data can be combined by aligning the target objects identified in both. Additionally and/or alternatively, the system may perform keypoint detection on the color images in the color data to identify a plurality of target objects, so that the color data can be aligned with the depth data based on the target objects present in both. The 3D modeling system can thus combine the color data, the first depth data, and the second depth data by aligning the target objects in the scene, generating a 3D model of the scene.
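Aligning target objects identified in two data sets amounts to estimating the rigid transform that maps one set of matched keypoints onto the other. The sketch below is a simplified stand-in, not the disclosed method: it assumes keypoint correspondences are already known and that pitch and roll are already resolved (e.g., gravity-aligned frames), so only a rotation about the vertical z axis plus a 3D translation remains. That reduced problem has a closed-form least-squares solution. A full 3D solution would typically use an SVD-based method such as the Kabsch algorithm. The function name is hypothetical.

```python
import math

def align_yaw_translation(src, dst):
    """Estimate the rotation angle about z and the 3D translation that map
    matched keypoints `src` onto `dst` in the least-squares sense.
    Assumes pitch and roll are already aligned."""
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]  # centroid of src
    cd = [sum(q[i] for q in dst) / n for i in range(3)]  # centroid of dst
    num = den = 0.0
    for p, q in zip(src, dst):
        ax, ay = p[0] - cs[0], p[1] - cs[1]
        bx, by = q[0] - cd[0], q[1] - cd[1]
        num += ax * by - ay * bx  # sum of 2D cross products
        den += ax * bx + ay * by  # sum of 2D dot products
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # t = centroid_dst - R(theta) * centroid_src
    t = (cd[0] - (c * cs[0] - s * cs[1]),
         cd[1] - (s * cs[0] + c * cs[1]),
         cd[2] - cs[2])
    return theta, t
```

The angle maximizes the sum of dot products between rotated source offsets and destination offsets, which is equivalent to minimizing the squared alignment error.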

After generating the 3D model, the 3D modeling system outputs it to a display. The 3D model can be used in VR applications.

FIG. 4 illustrates an exemplary application of generating a 3D model of a scene 400 according to one or more embodiments. The 3D modeling system 402 includes an imaging means 404 and a scanning means 406. The imaging means 404 captures a plurality of depth images of the scene 400 to form the first depth data. The depth images may include a partial scene 408 containing at least one object; for example, the partial scene 408 includes a desk lamp. In some examples, at least two of the depth images captured by the imaging means 404 include the partial scene 408. The scanning means 406 scans the scene 400 to collect a plurality of depth data points that form the second depth data. The 3D modeling system 402 then generates a 3D model of the scene based on the color data, the first depth data, and the second depth data.

The methods, systems, and apparatus described herein can generate a high-resolution 3D model of a scene with consistently high accuracy over a wide range of object distances. The high-quality 3D model is generated by combining depth data obtained from an imaging means, such as a depth camera, with depth data obtained from a scanning means, such as a LIDAR system.

Depth cameras offer high resolution and high frame rates, and can therefore capture the details of small objects in a scene. However, because they may be limited by their power or resolution, depth cameras are usually used at relatively short shooting distances: most have a shooting distance of 0.2 to 8 m, with a maximum usually within 10 m. Moreover, the accuracy of the depth data acquired by a depth camera depends strongly on the shooting distance. For example, a structured light camera can reach millimeter-level accuracy at shooting distances of 0.5 to 3 m, but its accuracy drops to tens of millimeters at 3 to 5 m, and beyond 5 m its accuracy may degrade to worse than 0.5 m. Scanning means such as LIDAR systems, by contrast, usually have a much longer detection range than depth cameras; commercially available LIDAR systems have ranging distances of 10 m, 30 m, 100 m, 300 m, or more. Their accuracy is also known to be high and consistent across the detection range. However, LIDAR systems typically cannot provide data points as dense as those of a depth camera, and therefore cannot capture the details of small objects in a scene.

The present disclosure provides methods, systems, and apparatus that combine depth data from the imaging means and the scanning means so that both small and distant objects in a scene are modeled with consistently high accuracy and fine detail.

Referring back to block 310 of FIG. 3, the 3D modeling system is configured to acquire the first depth data according to a predetermined resolution threshold. The imaging means first captures multiple depth images of the scene while moving and rotating to different positions. The processor of the 3D modeling system processes these depth images and obtains an overall depth image of the scene by combining their pixels. The processor then identifies one or more image regions of the scene whose resolution is below the predetermined resolution threshold, and instructs the imaging means to capture supplemental depth images of those regions. Each image region may be contained in a partial scene captured in a depth image. In some variations, for example, the imaging means may be instructed to capture a target number of depth images of a partial scene, determined by the complexity of that partial scene: the more complex the partial scene, the more depth images are captured. By supplementing the initial depth images with supplemental depth images of the identified regions, the resolution of those regions can be raised above the predetermined threshold. The resulting first depth data, comprising the pixels of the depth images captured by the imaging means, thus provides enough depth data points in each image region to meet the resolution requirement.
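The text does not say how the resolution of an image region is measured. As an assumption for illustration, the sketch below uses point count per square grid cell over a known floor-plan extent as a resolution proxy, and reports the cells that fall below a minimum count as candidates for supplemental depth images. The function name and parameters are hypothetical.

```python
import math

def low_resolution_cells(points, bounds, cell_size, min_points):
    """Split the region `bounds` = (x0, y0, x1, y1) into square cells of side
    `cell_size`, count the points whose (x, y) falls in each cell, and return
    the (col, row) indices of cells holding fewer than `min_points` points."""
    x0, y0, x1, y1 = bounds
    cols = math.ceil((x1 - x0) / cell_size)
    rows = math.ceil((y1 - y0) / cell_size)
    counts = [[0] * cols for _ in range(rows)]
    for x, y, *_ in points:  # ignore z and any extra fields
        c = int((x - x0) // cell_size)
        r = int((y - y0) // cell_size)
        if 0 <= c < cols and 0 <= r < rows:
            counts[r][c] += 1
    return [(c, r) for r in range(rows) for c in range(cols) if counts[r][c] < min_points]
```

Empty cells are reported too, because they are enumerated from the bounds rather than from the points.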

In some examples, the 3D modeling system may acquire the first depth data based on a predetermined depth threshold chosen according to a target accuracy. For example, the imaging means may be used to image data points or objects at distances between 0.3 m and 2.3 m, where the maximum depth error is 0.5 cm. The depth threshold may then be 2.3 m, and depth values greater than 2.3 m are flagged by the 3D modeling system as possibly not meeting the target accuracy. Additionally and/or alternatively, the user may set the depth threshold according to the desired accuracy. In other words, the first depth data may be adjusted according to the predetermined depth threshold; for example, every data point whose depth value exceeds the threshold may be removed from the first depth data. The depth threshold may be determined from the accuracy of the imaging means, which can be obtained by calibrating the imaging means or from parameters provided by the manufacturer. Calibration may be performed by placing the imaging means at multiple shooting distances from a target object and acquiring multiple depth images at each distance. For example, for an imaging means with a shooting range of 0.3 m to 4 m, the shooting distances may be set between 0.3 m and 4 m in predetermined increments (e.g., 0.1 m). The depth values in the pixels of the depth images are compared with the actual distance between the imaging means and the target object, calibrating the accuracy of the imaging means at each shooting distance. For example, the depth error may be 0.3 cm at 0.3 m, 0.2 cm at 0.8 m, 0.1 cm at 1.3 m, 0.3 cm at 1.8 m, 0.5 cm at 2.3 m, 1 cm at 2.8 m, 5 cm at 3.3 m, and 10 cm at 3.8 m. In this example, with a target accuracy of 0.5 cm, the depth threshold may be defined as 2.3 m.
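The threshold selection in this example can be sketched directly from the calibration values above: walk the calibrated distances in increasing order and keep the last distance before the error first exceeds the target. Stopping at the first failure (rather than taking the largest passing distance anywhere) reflects the assumption that accuracy degrades with distance. The function name is hypothetical.

```python
# Calibration results from the example above: (shooting distance in m, error in cm).
CALIBRATION = [(0.3, 0.3), (0.8, 0.2), (1.3, 0.1), (1.8, 0.3),
               (2.3, 0.5), (2.8, 1.0), (3.3, 5.0), (3.8, 10.0)]

def depth_threshold(calibration, target_error_cm):
    """Largest calibrated distance such that every calibrated distance up to
    and including it meets the target error; None if even the nearest fails."""
    threshold = None
    for distance, error in sorted(calibration):
        if error > target_error_cm:
            break
        threshold = distance
    return threshold

print(depth_threshold(CALIBRATION, 0.5))  # 2.3
```

With a stricter 0.3 cm target, the same table yields 1.8 m, matching the trade-off between accuracy and coverage described in the text.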

The depth threshold may be changed during data processing according to the accuracy and/or resolution requirements for the 3D model. For example, a stricter accuracy requirement calls for a smaller depth threshold, which leaves fewer data points in the first depth data at the cost of depth image resolution. Conversely, a larger depth threshold keeps more data points in the first depth data and yields a higher-resolution depth image, but requires relaxing the accuracy requirement. Additionally and/or alternatively, the depth threshold may be applied to adjust the second depth data; for example, a depth data point in the second depth data may be removed if its depth value is smaller than the depth threshold.
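Applying one depth threshold to both data sets splits the work by range: the imaging means covers the near field and the scanning means the far field. A minimal sketch, assuming each point is an (x, y, z, depth) tuple whose last field is the depth measured by the sensor that produced it; this representation and the function name are hypothetical. Following the removal rules in the text, points exactly at the threshold are kept in both sets.

```python
def apply_depth_threshold(first, second, threshold):
    """Keep imaging-means points with depth <= threshold and scanning-means
    points with depth >= threshold, then concatenate them.
    Points are (x, y, z, depth) tuples."""
    kept_first = [p for p in first if p[3] <= threshold]    # near field: depth camera
    kept_second = [p for p in second if p[3] >= threshold]  # far field: LIDAR
    return kept_first + kept_second
```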

In some examples, to increase processing speed and reduce computational complexity, the processor of the 3D modeling system may process the first depth data, as part of the data processing described for block 330 of FIG. 3, to reduce the number of depth data points it contains. This processing includes at least one of downsampling and filtering. In some examples, downsampling may be performed by merging overlapping pixels of the depth images: overlapping pixels corresponding to the same spot in the scene are merged into a single depth data point by averaging their depth values. Merging multiple data points into one in this way is equivalent to averaging repeated samples, which reduces errors caused by Gaussian-distributed jitter of the data points. Filtering may be performed by applying a filtering algorithm to remove spurious depth data points, such as flying points. Similar processing may be applied to the second depth data acquired by the scanning means (block 320 of FIG. 3) to reduce its number of data points.
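Both steps can be sketched on a point list. As stand-ins chosen for illustration (the text names neither technique), downsampling uses voxel-grid averaging, which merges points falling on the same small spot, and filtering uses a brute-force radius outlier filter that drops isolated points. Averaging n independent samples with zero-mean Gaussian jitter reduces the standard deviation by a factor of sqrt(n). The O(n²) neighbor search is for illustration only; function names are hypothetical.

```python
import math
from collections import defaultdict

def voxel_average(points, voxel):
    """Merge points that fall into the same cubic voxel of side `voxel`
    into their mean point."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel) for c in p)
        buckets[key].append(p)
    return [tuple(sum(c) / len(b) for c in zip(*b)) for b in buckets.values()]

def remove_isolated(points, radius, min_neighbors):
    """Drop points with fewer than `min_neighbors` other points within
    `radius` (brute-force radius outlier filter)."""
    kept = []
    for i, p in enumerate(points):
        n = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        if n >= min_neighbors:
            kept.append(p)
    return kept
```

In practice a spatial index such as a k-d tree would replace the quadratic neighbor count.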

In some examples, after block 330, the 3D modeling system may determine that the generated 3D model includes one or more image regions that do not reach the predetermined resolution threshold, for example regions containing objects represented by too few data points. The processor of the 3D modeling system then instructs the imaging means to capture supplemental depth images of those regions. In some examples, the imaging means moves and/or rotates to capture depth images of each region from multiple shooting angles, so that richer depth data is obtained for those regions. The pixels of the supplemental depth images may form first fill data, with which the processor supplements the generated 3D model to refine local detail and generate a new 3D model of the scene.

In some examples, the 3D modeling system can display the 3D model in real time based on the acquired depth data. The display is available during the data acquisition steps described above (e.g., blocks 310 and 320), the data processing steps (e.g., block 330), and/or the step of generating a new 3D model by capturing additional depth images. From this real-time display, the user can judge whether the presented 3D model meets accuracy, resolution, and/or completeness requirements; for example, whether supplemental depth data is needed and whether the imaging means or the scanning means should be used to capture it.

FIG. 5 shows an exemplary process 500 for generating a 3D model of a scene according to one or more embodiments. Process 500 may be performed by the aforementioned 3D modeling system and/or device 200 according to computer-executable instructions stored in memory 230 shown in FIG. 2. However, it will be appreciated that process 500 may be performed in any suitable environment and that the following blocks may be performed in any suitable order. Compared to process 300, process 500 may include an additional block that determines a plurality of data pairs and uses them to combine the first depth data from the imaging means with the second depth data from the scanning means.

At block 510, the 3D modeling system acquires first depth data. Block 510 may be similar to block 310 described above. Additionally and/or alternatively, the imaging means of the 3D modeling system may capture depth images at a first frame rate (e.g., the number of depth images captured by the imaging means per second). Each frame includes a plurality of pixels. The first depth data includes a plurality of frames acquired by the imaging means, and each depth-image frame in the first depth data is referred to as first data.

At block 520, the 3D modeling system acquires second depth data. Block 520 may be similar to block 320 described above. Additionally, the scanning means of the 3D modeling system may collect depth data points at a second frame rate. Each frame includes a plurality of depth data points collected by the scanning means, and the second frame rate indicates the number of data points collected by the scanning means per second. The second depth data includes a plurality of frames of depth data points collected by the scanning means, and each frame in the second depth data may be referred to as second data.

At block 530, the 3D modeling system determines a plurality of data pairs, each including first data and corresponding second data. For example, the first data, which may be a depth-image frame, is captured of an image region included in a partial scene. The image region may be an object in the scene. The corresponding second data may be determined as a frame acquired by scanning a partial scene that includes the same image region. Note that the partial scene captured in the first data and the partial scene captured in the second data may be the same or different.

In some examples, the imaging means and the scanning means of the 3D modeling system may be integrated on a platform, with the position of the imaging means fixed relative to the scanning means. The platform may further integrate an attitude sensor that provides extrinsic parameters indicating attitude information, such as the position and rotation of the platform. Communication connections may be established among the platform, the imaging means, the scanning means, and the attitude sensor so that each depth-image frame acquired by the imaging means (i.e., each piece of first data) is tagged with the extrinsic parameters provided by the attitude sensor at the time the frame was captured. The imaging means may record that capture time as a timestamp associated with the depth-image frame. Likewise, the extrinsic parameters output by the attitude sensor may carry a timestamp, generated by the attitude sensor, indicating when those parameters were generated. Based on this timestamp information, the 3D modeling system may select the extrinsic parameters to tag to a depth-image frame as those whose timestamp differs from the frame's timestamp by less than a predetermined threshold. Similarly, each frame of depth data points acquired by the scanning means may be tagged with the extrinsic parameters provided by the attitude sensor at the time the frame was scanned. The scanning means tags each frame of depth data points with a timestamp indicating when the frame was generated, and the 3D modeling system may, based on the timestamps, select the extrinsic parameters for that frame as those whose timestamp differs from the frame's timestamp by less than a predetermined threshold.
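The timestamp rule above (attach the extrinsic parameters whose timestamp lies within a threshold of the frame's timestamp) can be sketched as a nearest-neighbour lookup over sorted sensor timestamps. This is a minimal sketch under stated assumptions: timestamps are floats in seconds, the attitude-sensor readings are already sorted by time, and the names `tag_extrinsics` and `max_gap` are hypothetical.

```python
import bisect
from typing import List, Optional, Sequence


def tag_extrinsics(frame_times: Sequence[float],
                   pose_times: Sequence[float],
                   poses: Sequence[object],
                   max_gap: float) -> List[Optional[object]]:
    """For each frame timestamp, return the attitude reading with the nearest
    timestamp, or None if no reading is strictly closer than max_gap.

    Assumption: pose_times is sorted ascending and aligned with poses.
    """
    tagged = []
    for t in frame_times:
        i = bisect.bisect_left(pose_times, t)
        best, best_gap = None, max_gap
        # Only the readings just before and just after t can be nearest.
        for j in (i - 1, i):
            if 0 <= j < len(pose_times):
                gap = abs(pose_times[j] - t)
                if gap < best_gap:
                    best, best_gap = poses[j], gap
        tagged.append(best)
    return tagged


if __name__ == "__main__":
    print(tag_extrinsics([0.04, 0.16, 0.5], [0.0, 0.1, 0.2],
                         ["p0", "p1", "p2"], max_gap=0.05))
    # ['p0', 'p2', None]
```

A frame that receives `None` has no attitude reading close enough in time and would need to be skipped or re-tagged.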

In some examples, the platform may be moved and rotated to different positions for data acquisition. Once the platform has been moved and rotated to one of these positions, the imaging means may be instructed to capture one or more depth-image frames, the scanning means may be instructed to scan one or more frames of depth data points, and the attitude sensor may be instructed to output extrinsic parameters indicating the current position and rotation of the platform. The one or more depth-image frames captured by the imaging means at this position may form first data, which is tagged with the extrinsic parameters output by the attitude sensor at this position. Additionally and/or alternatively, the one or more frames of depth data points scanned by the scanning means at this position may form second data, which is likewise tagged with the extrinsic parameters output by the attitude sensor at this position. The 3D modeling system may then determine that the first data acquired at this position corresponds to the second data acquired at the same position. By moving and rotating the platform and repeating this acquisition at different capture positions, a plurality of data pairs, each including first data and the corresponding second data, may be obtained. In some examples, the platform may be fixed at one position in the scene and rotated 360 degrees to acquire data for the entire scene. In some examples, the platform may be hand-held or positioned by a user, and data acquisition may follow the user's movement. In some examples, the platform may be mounted on a moving body such as a robot or an autonomous vehicle. It will be understood that this disclosure does not limit the motion trajectory of the platform. In some examples, the platform may further integrate a color imaging means, and the images it captures may likewise be tagged with the extrinsic parameters output by the attitude sensor.
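Pairing first and second data acquired at the same platform position amounts to matching frames whose tagged poses agree. The sketch below assumes each pose is a hypothetical (translation, rotation-vector) tuple and treats two poses as "the same" when both parts differ by no more than small, illustrative tolerances. Comparing rotation vectors by Euclidean distance is a simplification that is only reasonable for small rotation differences away from 180 degrees.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pose = Tuple[Vec3, Vec3]  # assumed: (translation, rotation vector)


def poses_match(a: Pose, b: Pose, t_tol: float, r_tol: float) -> bool:
    """True if translations and rotation vectors agree within the tolerances."""
    (ta, ra), (tb, rb) = a, b
    return math.dist(ta, tb) <= t_tol and math.dist(ra, rb) <= r_tol


def pair_by_pose(first: Sequence[Tuple[str, Pose]],
                 second: Sequence[Tuple[str, Pose]],
                 t_tol: float = 0.01,
                 r_tol: float = 0.01) -> List[Tuple[str, str]]:
    """Pair each first-data frame with the first unused second-data frame
    whose tagged pose matches; frames without a match are left unpaired."""
    pairs = []
    used = set()
    for fid, fpose in first:
        for k, (sid, spose) in enumerate(second):
            if k not in used and poses_match(fpose, spose, t_tol, r_tol):
                pairs.append((fid, sid))
                used.add(k)
                break
    return pairs


if __name__ == "__main__":
    first = [("d0", ((0, 0, 0), (0, 0, 0))), ("d1", ((1, 0, 0), (0, 0, 1.57)))]
    second = [("s0", ((1, 0, 0.005), (0, 0, 1.57))), ("s1", ((0, 0, 0), (0, 0, 0.001)))]
    print(pair_by_pose(first, second))  # [('d0', 's1'), ('d1', 's0')]
```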

The attitude sensor may be at least one of an inertial measurement unit (IMU), a simultaneous localization and mapping (SLAM) unit in a LIDAR system, and a visual-inertial odometry (VIO) unit in a color camera. The platform of the 3D modeling system may include one or more attitude sensors; for example, each of the imaging means, the scanning means, and, optionally, the color imaging means may include its own attitude sensor. The extrinsic parameters attached to the first data, the second data, and the images captured by the color imaging means may be a combination of the extrinsic parameters output by multiple attitude sensors: in some examples, from both the IMU and the SLAM unit; in some examples, from both the IMU and the VIO unit; and in some examples, from the IMU, the SLAM unit, and the VIO unit. Alternatively, the extrinsic parameters of the imaging means and of the scanning means may each be obtained by calibrating those devices. The extrinsic parameters attached to the first data, the second data, and the images captured by the color imaging means can be used to calculate the poses of the data acquisition devices in a common 3D coordinate system.

In some examples, the 3D modeling system may determine the data pairs according to a preset time-interval threshold. For example, the first data and the corresponding second data in a data pair may be acquired within a time interval shorter than the preset threshold. Because they were acquired less than the threshold apart, the first data and the corresponding second data may contain depth data points of the same partial scene. In this manner, the 3D modeling system may use the preset time-interval threshold to determine a plurality of data pairs, each including first data and corresponding second data.
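With a preset time-interval threshold, data pairs can be formed by walking the two timestamp-sorted frame lists in step. This is one possible greedy, one-to-one pairing under assumed inputs (sorted float timestamps in seconds and a hypothetical `max_gap` threshold); it does not guarantee the globally closest matches.

```python
from typing import List, Sequence, Tuple


def pair_by_time(first_ts: Sequence[float],
                 second_ts: Sequence[float],
                 max_gap: float) -> List[Tuple[int, int]]:
    """Greedily pair two sorted timestamp lists one-to-one.

    Returns (i, j) index pairs with |first_ts[i] - second_ts[j]| < max_gap.
    """
    pairs = []
    i = j = 0
    while i < len(first_ts) and j < len(second_ts):
        dt = first_ts[i] - second_ts[j]
        if abs(dt) < max_gap:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dt > 0:
            j += 1  # second-data frame is too early: advance it
        else:
            i += 1  # first-data frame is too early: advance it
    return pairs


if __name__ == "__main__":
    depth_cam = [0.0, 0.033, 0.066, 0.1]  # e.g., a faster depth camera
    lidar = [0.0, 0.1, 0.2]               # e.g., a slower LIDAR
    print(pair_by_time(depth_cam, lidar, max_gap=0.02))  # [(0, 0), (3, 1)]
```

Because the faster device produces more frames, several of its frames simply remain unpaired.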

In some examples, the 3D modeling system may determine the data pairs based on keypoint detection performed on the first data and the second data. By performing keypoint detection on both, the 3D modeling system may identify the same image region in the first data and in the corresponding second data. The image region may be identified from a number of feature points, which may be predefined by a user. For example, in some variations, the feature points may be pixels at which the grayscale changes sharply, such as intersections of edges, and/or pixels identified on a target object. Based on the keypoint detection results, the 3D modeling system may determine that the first data contains an image region that is also contained in the second data, and pair the two accordingly. In this way, the 3D modeling system can determine a plurality of data pairs.
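As a rough illustration of "pixels at which the grayscale changes sharply, such as intersections of edges", the sketch below marks interior pixels where both the horizontal and the vertical central-difference gradients are large. This is a deliberately crude stand-in for a real corner detector (for example, Harris), uses a hypothetical `threshold`, and omits the descriptor matching that would be needed to decide that two frames show the same image region.

```python
from typing import List, Sequence, Tuple


def detect_keypoints(gray: Sequence[Sequence[float]],
                     threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of interior pixels whose horizontal and vertical
    gradient magnitudes both reach threshold (a crude edge-intersection test)."""
    h, w = len(gray), len(gray[0])
    keypoints = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y][x + 1] - gray[y][x - 1]) / 2.0  # horizontal central difference
            gy = (gray[y + 1][x] - gray[y - 1][x]) / 2.0  # vertical central difference
            if abs(gx) >= threshold and abs(gy) >= threshold:
                keypoints.append((y, x))
    return keypoints


if __name__ == "__main__":
    # 6x6 dark image containing a bright 3x3 square; its four corners are found.
    img = [[100 if 2 <= y <= 4 and 2 <= x <= 4 else 0 for x in range(6)]
           for y in range(6)]
    print(detect_keypoints(img, threshold=25))  # [(2, 2), (2, 4), (4, 2), (4, 4)]
```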

At block 540, the 3D modeling system combines the first depth data and the second depth data based on the plurality of data pairs to generate combined depth data. Specifically, for each data pair, the first data is combined with the corresponding second data.

For example, the 3D modeling system may combine the first and second depth data using feature points/target objects, pose information, and/or extrinsic parameters from the attitude sensor. For instance, the 3D modeling system may identify first and second depth data that have the same or substantially the same pose information (e.g., substantially equal translation and rotation vectors) and combine them to generate combined depth data. Additionally and/or alternatively, the 3D modeling system may identify one or more feature points and/or target objects that appear across multiple images/frames, determine the first and second depth data associated with those feature points and/or target objects, and combine that data to generate combined depth data. Additionally and/or alternatively, the 3D modeling system may use the extrinsic parameters to generate combined depth data in a common coordinate system (e.g., by transforming the first and second depth data into the same coordinate system and then combining them based on their coordinates in that system).
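The last option, transforming both kinds of depth data into a common coordinate system with their extrinsic parameters and then merging them, can be sketched as follows. The sketch assumes each set of extrinsic parameters is a sensor-to-world rotation matrix `R` and translation `t`, so that a point maps as p_world = R·p_sensor + t; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def to_common_frame(points: Sequence[Vec3], R: Mat3, t: Vec3) -> List[Vec3]:
    """Apply p_world = R @ p + t to each point (R is 3x3, row-major).

    Assumption: (R, t) maps sensor coordinates into the shared frame.
    """
    return [
        tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
        for p in points
    ]


def combine_depth(first_points, first_extrinsics, second_points, second_extrinsics):
    """Merge first and second depth data after moving both into the shared frame."""
    R1, t1 = first_extrinsics
    R2, t2 = second_extrinsics
    return to_common_frame(first_points, R1, t1) + to_common_frame(second_points, R2, t2)


if __name__ == "__main__":
    identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    rot_z_90 = ((0, -1, 0), (1, 0, 0), (0, 0, 1))  # 90 degrees about z
    merged = combine_depth([(0, 0, 1)], (identity, (0, 0, 0)),
                           [(1, 0, 0)], (rot_z_90, (1, 0, 0)))
    print(merged)  # [(0, 0, 1), (1, 1, 0)]
```

Plain concatenation keeps every point; a real system would typically also filter duplicates or weight the two sources by their expected accuracy.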

At block 550, the 3D modeling system generates a 3D model of the scene based on the color data and the combined depth data. In the generated 3D model, each data point has a depth value and a color value. The combined depth data includes depth-image frames acquired by the imaging means and/or frames of depth data points scanned by the scanning means, and each frame contains a partial scene. In some examples, the partial scenes in the frames may be joined according to the positional relationships among them. These positional relationships may be designed in advance: for example, the motion trajectory of the platform carrying the imaging means and the scanning means may be preset, and the positional relationships among the partial scenes determined from that preset trajectory. Similarly, the partial scenes in the color data can be joined based on a predetermined motion trajectory of the color imaging means used to capture the color images that form the color data. Alternatively, the positional relationships among the partial scenes may be determined dynamically during data acquisition, based either on the extrinsic parameters attached to the acquired frames or on image regions identified across different frames.
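For the case where the platform stays at one position and rotates through 360 degrees, a preset trajectory fixes each partial scene's pose in advance. The sketch below assumes, purely for illustration, `n_stops` evenly spaced stops about the vertical (z) axis with the platform at the origin, and uses those preset poses to place each frame's points into the shared frame. The helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def preset_rotation_poses(n_stops: int) -> List[Tuple[Mat3, Vec3]]:
    """Return (R, t) for n_stops evenly spaced yaw angles about the z axis.

    Assumption: the platform rotates in place, so every translation is zero.
    """
    poses = []
    for k in range(n_stops):
        a = 2.0 * math.pi * k / n_stops
        c, s = math.cos(a), math.sin(a)
        R = ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))
        poses.append((R, (0.0, 0.0, 0.0)))
    return poses


def place_frame(points: Sequence[Vec3], pose: Tuple[Mat3, Vec3]) -> List[Vec3]:
    """Map one frame's points into the shared frame using its preset pose."""
    R, t = pose
    return [tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
            for p in points]


if __name__ == "__main__":
    poses = preset_rotation_poses(4)  # stops at 0, 90, 180, and 270 degrees
    print(place_frame([(1.0, 0.0, 0.0)], poses[1]))  # approximately [(0.0, 1.0, 0.0)]
```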

In some examples, the depth value of each data point in the 3D model is determined from the combined depth data, which was transformed into a common coordinate system during the data processing steps described above. Similarly, the color data may be transformed into the same coordinate system based on extrinsic parameters indicating the position and rotation of the color imaging means when each color image was captured. Each data point in the 3D model is then rendered with the color value of the color data point determined to lie at the same coordinates.
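A common way to find "the color data point at the same coordinates" is to project each model point into a color image with a pinhole camera model. The sketch assumes an undistorted pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, world-to-camera extrinsics `(R, t)`, and a color image stored as rows of pixels; points behind the camera or outside the image receive no color. This is one possible technique, not necessarily the one used by the described system.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def colorize(points: Sequence[Vec3],
             R_wc: Sequence[Sequence[float]], t_wc: Vec3,
             fx: float, fy: float, cx: float, cy: float,
             image: Sequence[Sequence[object]]) -> List[Optional[object]]:
    """Look up a color for each point by pinhole projection into `image`.

    Assumption: p_cam = R_wc @ p_world + t_wc, no lens distortion.
    """
    h, w = len(image), len(image[0])
    colors = []
    for p in points:
        x, y, z = (sum(R_wc[r][c] * p[c] for c in range(3)) + t_wc[r] for r in range(3))
        if z <= 0:
            colors.append(None)  # behind the color camera
            continue
        u = round(fx * x / z + cx)  # column
        v = round(fy * y / z + cy)  # row
        colors.append(image[v][u] if 0 <= u < w and 0 <= v < h else None)
    return colors


if __name__ == "__main__":
    img = [[f"c{v}{u}" for u in range(3)] for v in range(3)]
    eye = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    pts = [(0, 0, 1), (1, 0, 1), (0, 0, -1), (-0.5, 0.5, 1)]
    print(colorize(pts, eye, (0, 0, 0), 2, 2, 1, 1, img))  # ['c11', None, None, 'c20']
```

Nearest-pixel lookup is the simplest choice; bilinear interpolation or occlusion checks would be natural refinements.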

In some examples, the 3D modeling system determines the depth and color values of the data points in the generated 3D model by aligning one or more identified image regions that appear in both the combined depth data and the color data.

FIG. 6 shows an exemplary process 600 for generating a 3D model of a scene according to one or more embodiments. Process 600 may be performed by the aforementioned 3D modeling system and/or device 200 according to computer-executable instructions stored in memory 230 shown in FIG. 2. However, it will be appreciated that process 600 may be performed in any suitable environment and that the following blocks may be performed in any suitable order. Compared to process 300, process 600 may include additional blocks that detect insufficient resolution and/or incomplete data in the acquired first/second depth data and then supplement the acquired depth data with new depth data.

At block 610, the 3D modeling system acquires first depth data. Block 610 may be similar to block 310 described above.

At block 620, the 3D modeling system acquires second depth data. Block 620 may be similar to block 320 described above.

At block 630, the 3D modeling system generates a 3D model of the scene based on the color data, the first depth data, and the second depth data. Block 630 may be similar to block 330 described above.

At block 640, in response to determining that one or more image regions in the 3D model of the scene do not have sufficient data points (e.g., a portion of the scene that could not be modeled, and/or a region with low resolution and/or missing or incomplete data), the processor of the 3D modeling system instructs the scanning means to acquire second fill data. Specifically, the processor instructs the scanning means to scan supplemental depth data points for the one or more image regions. In some examples, the scanning means is moved and/or rotated to scan the partial scene containing the one or more image regions. The depth data points acquired by scanning the one or more image regions may form the second fill data.

At block 650, the processor of the 3D modeling system supplements the generated 3D model of the scene with the second fill data to generate a new 3D model of the scene with improved completeness.
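Supplementing the model with fill data could be as simple as appending the new points, but a re-scan usually also re-covers areas that are already modeled. The sketch below is an assumption rather than the described method: it keeps only fill points that land in voxels not yet occupied by the existing model, with `voxel` as a hypothetical cell size.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def supplement(model_points: Sequence[Vec3],
               fill_points: Sequence[Vec3],
               voxel: float) -> List[Vec3]:
    """Append fill points that fall in voxels the model does not yet occupy.

    Assumption: occupancy on a uniform grid of size `voxel` marks a region
    as already modeled; at most one fill point is kept per new voxel.
    """
    def key(p: Vec3) -> Tuple[int, int, int]:
        return tuple(math.floor(c / voxel) for c in p)

    occupied = {key(p) for p in model_points}
    merged = list(model_points)
    for p in fill_points:
        k = key(p)
        if k not in occupied:
            occupied.add(k)
            merged.append(p)
    return merged


if __name__ == "__main__":
    model = [(0.1, 0.1, 0.1)]
    fill = [(0.2, 0.2, 0.2), (1.5, 0.1, 0.1), (1.6, 0.2, 0.2)]
    print(supplement(model, fill, voxel=1.0))  # [(0.1, 0.1, 0.1), (1.5, 0.1, 0.1)]
```

A coarse voxel favors completeness (filling holes); if the goal were finer local detail, as with the first fill data, a smaller voxel or no deduplication would be more appropriate.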

The processes disclosed above may be performed by a 3D modeling system 700 as shown in FIG. 7, according to one or more embodiments. The 3D modeling system 700 includes multiple components, such as an imaging means 710, a scanning means 720, one or more attitude sensors 730, one or more 3D modeling processors 740, a memory 750, and a display 760.

The imaging means 710 may be configured to capture a plurality of depth images of the scene at a first frame rate. Each depth image includes a plurality of pixels whose depth values indicate the distances between the imaging means and a plurality of points in the scene. Additionally, each depth image may be tagged with a timestamp indicating the time at which it was captured. The imaging means 710 outputs the captured depth images of the scene to the 3D modeling processor 740 as first depth data or first fill data.
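Before depth-image pixels can be merged with LIDAR points, each pixel is typically back-projected into a 3D point. The sketch below assumes a pinhole model with hypothetical intrinsics `fx, fy, cx, cy`, interprets each depth value as distance along the optical (z) axis rather than along the viewing ray, treats 0 as "no reading", and optionally drops points beyond a depth threshold, as recited in claim 1.

```python
from typing import List, Optional, Sequence, Tuple


def depth_image_to_points(depth: Sequence[Sequence[float]],
                          fx: float, fy: float, cx: float, cy: float,
                          max_depth: Optional[float] = None
                          ) -> List[Tuple[float, float, float]]:
    """Back-project a depth image (rows of z-depths) into camera-frame points.

    Assumptions: pinhole intrinsics, z-depth values, 0 means invalid.
    """
    points = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            if z <= 0:
                continue                   # no valid reading at this pixel
            if max_depth is not None and z > max_depth:
                continue                   # beyond the depth threshold: removed
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, float(z)))
    return points


if __name__ == "__main__":
    depth = [[1.0, 0.0],
             [2.0, 2.0]]
    print(depth_image_to_points(depth, 1, 1, 0, 0))
    # [(0.0, 0.0, 1.0), (0.0, 2.0, 2.0), (2.0, 2.0, 2.0)]
    print(depth_image_to_points(depth, 1, 1, 0, 0, max_depth=1.5))
    # [(0.0, 0.0, 1.0)]
```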

The scanning means 720 may be configured to scan a plurality of depth data points of the scene at a second frame rate. Each frame includes a plurality of depth data points whose depth values indicate the distances between the scanning means 720 and a plurality of points in the scene. Additionally, each frame of depth data points may be tagged with a timestamp indicating the time at which the frame was acquired. The scanning means 720 outputs the acquired depth data points to the 3D modeling processor 740 as second depth data or second fill data.

The attitude sensor 730 may be configured to determine and/or output extrinsic parameters for the acquired depth data. The extrinsic parameters include attitude information, such as the position and rotation of the device in which the attitude sensor 730 is embedded. The extrinsic parameters generated by the attitude sensor 730 may be tagged with a timestamp indicating when they were generated. Each data acquisition device, such as the imaging means 710 and the scanning means 720, may incorporate its own attitude sensor 730. Alternatively, a single attitude sensor 730 may output extrinsic parameters that are later assigned to the acquired frames of depth data. The attitude sensor 730 may output the timestamped extrinsic parameters to the 3D modeling processor 740 so that the 3D modeling processor 740 can determine, from the time intervals between timestamps, which extrinsic parameters to attach to each acquired frame.

The 3D modeling processor 740 may be configured to perform the steps described above based on computer-executable instructions stored in the memory 750. In some examples, the 3D modeling processor 740 receives color data from an external color imaging device. In some examples, the 3D modeling processor 740 receives color data from a color imaging means in the 3D modeling system 700.

The display 760 may be configured to display the 3D model generated by the 3D modeling processor 740. In some examples, the display 760 may be further configured to display a real-time 3D model of the scene during data acquisition.

The techniques described herein may be embodied in executable instructions stored on a computer-readable medium for use by or in connection with a processor-based instruction-executing machine, system, apparatus, or device. Those skilled in the art will appreciate that, in some embodiments, various types of computer-readable media may be included for storing data. As used herein, a "computer-readable medium" includes one or more of any suitable media for storing the executable instructions of a computer program such that an instruction-executing machine, system, apparatus, or device can read (or fetch) the instructions from the computer-readable medium and execute them to carry out the described embodiments. Suitable storage formats include one or more of electronic, magnetic, optical, and electromagnetic formats. A non-exhaustive list of conventional computer-readable media includes portable computer diskettes, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory devices, and optical storage devices such as portable compact discs (CDs) and portable digital video discs (DVDs).

It should be understood that the arrangement of components depicted in the accompanying figures is illustrative and that other arrangements are possible. For example, one or more of the elements described herein may be realized, in whole or in part, as electronic hardware components. Other elements may be implemented in software, hardware, or a combination of software and hardware. Moreover, some or all of these other elements may be combined, some may be omitted entirely, and additional components may be added while still achieving the functionality described herein. Thus, the subject matter described herein may be embodied in many different variations, all of which are contemplated to be within the scope of the claims.

To facilitate understanding of the subject matter described herein, many aspects are described in terms of sequences of operations. Those skilled in the art will recognize that the various operations may be performed by specialized circuits, by program instructions executed by one or more processors, or by a combination of both. The description of any sequence of operations herein is not intended to imply that the specific order described must be followed in order to perform that sequence. All methods described herein may be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context.

The use of the terms "a," "an," and "said," and similar referents in the context of describing the subject matter (particularly in the context of the following claims) shall be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The use of the term "at least one" followed by a list of one or more items (e.g., "at least one of A and B") shall be construed to mean one item selected from the listed items (A or B) or any combination of two or more of the listed items (A and B), unless otherwise indicated herein or clearly contradicted by context. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation, as the scope of protection sought is defined by the claims set forth below together with any equivalents thereof. The use of any and all examples, or exemplary language (e.g., "such as"), provided herein is intended merely to better illustrate the subject matter and does not pose a limitation on the scope of the subject matter unless otherwise claimed. In both the claims and the written description, the use of the term "based on" and other like phrases indicating a condition for bringing about a result is not intended to foreclose any other conditions that bring about that result. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention as claimed.

Claims

1. A method for generating a 3D model of a scene, comprising:
acquiring, by an imaging means included in a 3D modeling system, first depth data including pixels of a plurality of frames of depth images;
acquiring, by a scanning means included in the 3D modeling system, second depth data including depth data points of a plurality of image frames;
determining a depth threshold;
removing one or more data points in the first depth data in response to a depth value of the one or more data points being greater than the depth threshold;
receiving, by the 3D modeling system, color data for the scene including a plurality of color image pixels;
generating a 3D model of the scene based on the color data, the first depth data, and the second depth data; and
displaying the 3D model of the scene.
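The claim does not say how the depth threshold is chosen or how a removed data point is represented. Here is a minimal sketch of the removal step. It makes three assumptions. Each depth image is a grid of per-pixel depths in meters. A removed pixel is marked `None`. The threshold is a fixed value standing for the range in which the depth camera is taken to be accurate. The name `remove_far_pixels` and the 3.0 m default are hypothetical.

```python
from typing import List, Optional

# A depth image as rows of per-pixel depths in meters; None marks "no valid depth".
DepthImage = List[List[Optional[float]]]

# Hypothetical threshold: the range within which the depth camera is assumed accurate.
DEFAULT_DEPTH_THRESHOLD_M = 3.0


def remove_far_pixels(depth_image: DepthImage,
                      depth_threshold: float = DEFAULT_DEPTH_THRESHOLD_M) -> DepthImage:
    """Return a copy of depth_image with every pixel deeper than the threshold removed.

    A pixel is kept only if it has a depth value and that value does not exceed
    depth_threshold; all other pixels become None.
    """
    return [
        [d if d is not None and d <= depth_threshold else None for d in row]
        for row in depth_image
    ]


frame = [[0.5, 2.9, 3.5],
         [None, 4.0, 1.0]]
print(remove_far_pixels(frame))  # [[0.5, 2.9, None], [None, None, 1.0]]
```

Dropping the far pixels from the first depth data leaves long-range geometry to the scanning means. The background section describes that sensor's accuracy as stable over a wide depth range.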

2. The method of claim 1, wherein each of the depth images in the first depth data is first data and each of the image frames in the second depth data is second data, the method further comprising:
determining a plurality of data pairs, each of the plurality of data pairs including the first data and the corresponding second data, the first data and the corresponding second data including similar target objects; and
determining a positional relationship between the first depth data and the second depth data based on the plurality of data pairs.
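One way to express the positional relationship is as a rigid transform between the two sensors' coordinate frames. The sketch below assumes each data pair yields matched 3D points, for example the centroid of the same target object as seen in both data. It also restricts rotation to the vertical axis, which keeps the least-squares solution closed-form in pure Python. The function name `estimate_yaw_translation` is hypothetical. A full 3D rotation would instead call for an SVD-based fit such as the Kabsch algorithm, which is a different technique from the one shown here.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def estimate_yaw_translation(src: Sequence[Point],
                             dst: Sequence[Point]) -> Tuple[float, Point]:
    """Least-squares fit of dst ~= Rz(theta) * src + t from matched 3D points.

    Rotation is restricted to the vertical z-axis (an assumption made to keep the
    solution closed-form); z is therefore only shifted, never rotated.
    """
    if len(src) != len(dst) or not src:
        raise ValueError("need the same non-zero number of points in src and dst")
    n = len(src)
    cs = [sum(p[k] for p in src) / n for k in range(3)]  # centroid of src
    cd = [sum(q[k] for q in dst) / n for k in range(3)]  # centroid of dst

    # Accumulate cross and dot terms of the centered xy components.
    num = den = 0.0
    for p, q in zip(src, dst):
        px, py = p[0] - cs[0], p[1] - cs[1]
        qx, qy = q[0] - cd[0], q[1] - cd[1]
        num += px * qy - py * qx
        den += px * qx + py * qy
    theta = math.atan2(num, den)  # angle that best aligns the centered points

    # Translation maps the rotated src centroid onto the dst centroid.
    c, s = math.cos(theta), math.sin(theta)
    t = (cd[0] - (c * cs[0] - s * cs[1]),
         cd[1] - (s * cs[0] + c * cs[1]),
         cd[2] - cs[2])
    return theta, t
```

Pass the scanner-side points as `src` and the depth-camera-side points as `dst`. The returned angle and offset then map the second depth data into the frame of the first depth data before fusion.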

3. The method of claim 2, wherein each of the first data is associated with a first external parameter indicating posture information of the imaging means, and each of the second data is associated with a second external parameter indicating posture information of the scanning means, the posture information indicating position and rotation information, and determining the plurality of data pairs includes:
determining a first pose of the 3D modeling system associated with the first data based on the first external parameter;
determining a second pose of the 3D modeling system associated with the second data based on the second external parameter; and
determining a data pair including the first data and the second data in response to the first pose and the second pose being the same.
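Floating-point poses from a sensor rarely match exactly, so any implementation of the pose-based pairing recited above needs some notion of "the same." The sketch below assumes that two poses are the same when their positions agree within a distance tolerance and their headings agree within an angle tolerance. It simplifies rotation to a single yaw angle. The `Frame` class, `same_pose`, `pair_by_pose`, and both default tolerances are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    """One captured frame with the pose reported for the device that captured it."""
    frame_id: str
    position: Tuple[float, float, float]  # meters
    yaw_deg: float  # rotation simplified to a heading angle (assumption)


def same_pose(a: Frame, b: Frame, pos_tol: float = 0.01, ang_tol_deg: float = 0.5) -> bool:
    """Treat two poses as the same when position and heading agree within tolerances."""
    # Wrap the heading difference into [-180, 180) so 359.8 and 0.0 count as close.
    dyaw = abs((a.yaw_deg - b.yaw_deg + 180.0) % 360.0 - 180.0)
    return math.dist(a.position, b.position) <= pos_tol and dyaw <= ang_tol_deg


def pair_by_pose(first: List[Frame], second: List[Frame],
                 pos_tol: float = 0.01, ang_tol_deg: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each first-data frame with the first second-data frame at the same pose."""
    pairs = []
    for f in first:
        for s in second:
            if same_pose(f, s, pos_tol, ang_tol_deg):
                pairs.append((f.frame_id, s.frame_id))
                break  # keep only the first match for this frame
    return pairs


depth_frames = [Frame("d0", (0.0, 0.0, 0.0), 0.0), Frame("d1", (1.0, 0.0, 0.0), 90.0)]
scan_frames = [Frame("l0", (1.005, 0.0, 0.0), 90.2), Frame("l1", (0.0, 0.0, 0.002), 359.8)]
print(pair_by_pose(depth_frames, scan_frames))  # [('d0', 'l1'), ('d1', 'l0')]
```

When both external parameters come from a single attitude sensor, both devices' poses are expressed in a shared reference, which makes a direct comparison like this meaningful.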

4. The method of claim 3, wherein the first external parameter and the second external parameter are output from the same attitude sensor in the 3D modeling system.

5. The method of claim 2, wherein a first time stamp indicating the time at which the first data was acquired by the imaging means is associated with each of the first data, a second time stamp indicating the time at which the second data was acquired by the scanning means is associated with each of the second data, and the first data and the corresponding second data in each data pair have a time interval smaller than a threshold value.
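The imaging means and the scanning means may run at different frame rates, so timestamp-based pairing usually means finding the nearest frame in the other stream. The sketch below makes two assumptions: the scanner timestamps are sorted, and "nearest" means the smallest absolute time difference. It pairs a frame only when that difference is below the threshold. `pair_by_timestamp`, the example rates, and the 10 ms threshold are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def pair_by_timestamp(first_ts: Sequence[float], second_ts: Sequence[float],
                      max_interval: float) -> List[Tuple[int, int]]:
    """Pair each first-data timestamp with the nearest second-data timestamp.

    second_ts must be sorted in ascending order. Only pairs whose time interval is
    smaller than max_interval are returned, as (first index, second index).
    """
    pairs = []
    for i, t in enumerate(first_ts):
        k = bisect_left(second_ts, t)  # insertion point; neighbors are k-1 and k
        candidates = [j for j in (k - 1, k) if 0 <= j < len(second_ts)]
        if not candidates:
            continue  # second_ts is empty
        j = min(candidates, key=lambda idx: abs(second_ts[idx] - t))
        if abs(second_ts[j] - t) < max_interval:
            pairs.append((i, j))
    return pairs


# Hypothetical rates: depth camera at about 30 Hz, scanner at 10 Hz, 10 ms tolerance.
depth_ts = [0.0, 0.0333, 0.0667, 0.1]
scan_ts = [0.0, 0.1, 0.2]
print(pair_by_timestamp(depth_ts, scan_ts, max_interval=0.01))  # [(0, 0), (3, 1)]
```

The binary search keeps each lookup logarithmic in the length of the scanner stream. Depth frames that fall between scanner frames are left unpaired rather than forced into a poor match.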

6. The method of claim 2, wherein the first data and the corresponding second data in each data pair are identified as including one or more similar objects.

7. The method of claim 1, further comprising:
identifying one or more image regions in the generated 3D model of the scene that are below a resolution threshold;
acquiring, by the imaging means of the 3D modeling system, first fill data including pixels of a plurality of depth images; and
supplementing the generated 3D model of the scene with the first fill data to generate a new 3D model of the scene.
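The claims do not define how the resolution of a model region is measured. A minimal sketch, assuming the model is a point cloud, that a region is a cubic voxel, and that resolution is the number of points per voxel, so a region falls below the threshold when it holds fewer than `min_points` points. The names `voxel_of` and `low_resolution_voxels` are hypothetical. Only voxels that contain at least one point are examined; completely empty regions would need a separate coverage check.

```python
import math
from collections import Counter
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_of(p: Point, voxel_size: float) -> Voxel:
    """Integer grid cell containing point p."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def low_resolution_voxels(points: Iterable[Point], voxel_size: float,
                          min_points: int) -> Set[Voxel]:
    """Return occupied voxels holding fewer than min_points model points."""
    counts = Counter(voxel_of(p, voxel_size) for p in points)
    return {v for v, n in counts.items() if n < min_points}


model = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.3, 0.1, 0.4), (1.5, 0.2, 0.2)]
print(low_resolution_voxels(model, voxel_size=1.0, min_points=2))  # {(1, 0, 0)}
```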

8. The method of claim 1, further comprising:
identifying one or more image regions in the generated 3D model of the scene that do not have enough depth data points;
acquiring, by the scanning means of the 3D modeling system, second fill data including depth data points of a plurality of image frames; and
supplementing the generated 3D model of the scene with the second fill data to generate a new 3D model of the scene.
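Both fill-data variants end with a "supplementing" step. A minimal sketch of that step assumes the same voxel representation as a points-per-voxel check. The flagged regions arrive as a set of voxel indices, and the new model is the old point cloud plus only those fill points that land inside flagged voxels. `supplement_model` and its parameters are hypothetical, and a real pipeline would likely also re-mesh or re-texture with the color data.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_of(p: Point, voxel_size: float) -> Voxel:
    """Integer grid cell containing point p."""
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def supplement_model(model_points: Iterable[Point], fill_points: Iterable[Point],
                     flagged: Set[Voxel], voxel_size: float) -> List[Point]:
    """Build a new point set: the existing model plus fill points inside flagged voxels.

    Fill points that land in regions already judged adequate are ignored, so the
    new capture only adds data where the model was lacking.
    """
    added = [p for p in fill_points if voxel_of(p, voxel_size) in flagged]
    return list(model_points) + added


model = [(0.1, 0.1, 0.1)]
fill = [(1.2, 0.3, 0.3), (0.5, 0.5, 0.5), (1.8, 0.9, 0.1)]
print(supplement_model(model, fill, flagged={(1, 0, 0)}, voxel_size=1.0))
# [(0.1, 0.1, 0.1), (1.2, 0.3, 0.3), (1.8, 0.9, 0.1)]
```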

9. The method of claim 1, wherein the imaging means acquires the plurality of depth images at a first frame rate, and the scanning means acquires the plurality of image frames at a second frame rate.

10. The method of claim 1, wherein the 3D modeling system includes a display for displaying in real time the 3D model of the scene based on the acquired depth data.

11. A system for generating a 3D model of a scene, comprising:
an imaging means configured to acquire first depth data comprising pixels of a plurality of frames of depth images;
a scanning means configured to acquire second depth data comprising depth data points of a plurality of image frames; and
one or more processors configured to:
acquire the first depth data from the imaging means, the second depth data from the scanning means, and color data including a plurality of color image pixels;
determine a depth threshold;
remove one or more data points in the first depth data in response to a depth value of the one or more data points being greater than the depth threshold;
generate a 3D model of the scene based on the color data, the first depth data, and the second depth data; and
output the generated 3D model of the scene.

12. The system of claim 11, wherein each of the depth images in the first depth data is first data and each of the image frames in the second depth data is second data, and the one or more processors are further configured to:
determine a plurality of data pairs, each of the plurality of data pairs including the first data and the corresponding second data, the first data and the corresponding second data including similar target objects; and
determine a positional relationship between the first depth data and the second depth data based on the plurality of data pairs.

13. The system of claim 12, further comprising one or more attitude sensors configured to output external parameters indicative of attitude information of the imaging means and the scanning means in the system, wherein:
a first external parameter indicating attitude information of the imaging means is associated with each of the first data, and a second external parameter indicating attitude information of the scanning means is associated with each of the second data;
the attitude information indicates position and rotation information; and
the one or more processors are further configured to:
determine a first attitude of the system associated with the first data based on the first external parameter;
determine a second attitude of the system associated with the second data based on the second external parameter; and
determine a data pair including the first data and the second data in response to the first attitude and the second attitude being the same.

14. The system of claim 12, wherein a first time stamp indicating the time at which the first data was acquired by the imaging means is associated with each of the first data, a second time stamp indicating the time at which the second data was acquired by the scanning means is associated with each of the second data, and the first data and the corresponding second data in each data pair have a time interval that is less than a threshold value.

15. The system of claim 12, wherein the first data and the corresponding second data in each data pair are identified as including one or more similar objects.

16. The system of claim 15, wherein the one or more processors are further configured to:
identify one or more image regions in the generated 3D model of the scene that are below a resolution threshold;
receive, from the imaging means, first fill data comprising pixels of a plurality of depth images; and
fill the generated 3D model of the scene with the first fill data to generate a new 3D model of the scene.

17. The system of claim 12, wherein the one or more processors are further configured to:
identify one or more image regions in the generated 3D model of the scene that do not have sufficient depth data points;
receive, from the scanning means, second fill data including depth data points of a plurality of image frames; and
fill the generated 3D model of the scene with the second fill data to generate a new 3D model of the scene.

18. A non-volatile computer-readable medium having computer-executable instructions stored thereon, wherein the computer-executable instructions, when executed by one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 10.

19. A computer program for execution by a 3D modeling system having an imaging device, a scanning device, and one or more processors, wherein the computer program, when executed by the one or more processors, causes the 3D modeling system to perform the method of any one of claims 1 to 10.