JP7613690B2

JP7613690B2 - 3D point cloud processing device, 3D point cloud processing method, and 3D point cloud processing program

Info

Publication number: JP7613690B2
Application number: JP2021094793A
Authority: JP
Inventors: 泰洋八尾; 慎吾安藤; 潤島村; 涼一石川; 岳史大石
Original assignee: Nippon Telegraph and Telephone Corp; University of Tokyo NUC; NTT Inc USA
Current assignee: University of Tokyo NUC; NTT Inc; NTT Inc USA
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2025-01-15
Anticipated expiration: 2041-06-04
Also published as: JP2022186523A

Description

The disclosed technology relates to a 3D point cloud processing device, a 3D point cloud processing method, and a 3D point cloud processing program.

Non-Patent Document 1 discloses a technology that takes LiDAR (Light Detection and Ranging) measurements and stereo images and outputs a depth image in which the sparse LiDAR point cloud is upsampled based on the disparity estimated from the stereo images.

Non-Patent Document 2 discloses a technology that uses supervised deep learning to output, from LiDAR measurements and stereo images, a depth image in which the LiDAR point cloud is upsampled.

Non-Patent Document 3 discloses a technology that uses semi-supervised deep learning to output, from LiDAR measurements and stereo images, a depth image in which the LiDAR point cloud is upsampled. With the technology disclosed in Non-Patent Document 3, inference can be performed with the influence of projection errors removed, even when such errors are present.

Non-Patent Document 1: Maddern, Will, and Paul Newman. "Real-time probabilistic fusion of sparse 3D LIDAR and dense stereo." 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016.
Non-Patent Document 2: Park, Kihong, Seungryong Kim, and Kwanghoon Sohn. "High-precision depth estimation with the 3d lidar and stereo fusion." 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018.
Non-Patent Document 3: Cheng, Xuelian, et al. "Noise-aware unsupervised deep lidar-stereo fusion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

The problem with Non-Patent Document 1 is that it treats the projected LiDAR points as correct depth values and does not consider the case where projection errors are present.

The problems with Non-Patent Document 2 are that, like Non-Patent Document 1, it does not consider that the data contain projection errors, and that it requires high-density depth images for training, which are not easy to obtain.

The problem with Non-Patent Document 3 is that training requires a large number of correctly registered pairs of LiDAR measurements and stereo images, and such data are not easy to obtain.

As described above, the problem with Non-Patent Documents 1 and 2 is that they do not take projection errors in the LiDAR data into account, and the problem with Non-Patent Document 3 is that it requires a large amount of training data.

Furthermore, Non-Patent Documents 1 to 3 all create the upsampled data from the result of disparity estimation. Stereo disparity estimation loses distance measurement accuracy at long range. LiDAR, in contrast, measures distance accurately even at long range, but these methods do not exploit that property.

The disclosed technology has been made in view of the above points, and aims to provide a 3D point cloud processing device, a 3D point cloud processing method, and a 3D point cloud processing program that can accurately upsample a 3D point cloud obtained by measurement.

A first aspect of the present disclosure is a 3D point cloud processing device including: an input processing unit that receives a first image and a second image for which at least the relationship between the shooting positions has been determined in advance, and a 3D point cloud on the surface of an object for which at least the relationship between the shooting position and the measurement position has been determined in advance, and that determines the pixel position on the first image corresponding to each 3D point of the 3D point cloud; and a neighborhood selection unit that, based on the pixel positions on the first image corresponding to the 3D points of the 3D point cloud, selects, for each pixel position on the first image, a depth value from among the depth values of the 3D points corresponding to multiple neighboring pixel positions, such that the consistency between the first image and the second image is high.

A second aspect of the present disclosure is a 3D point cloud processing method in which: an input processing unit receives a first image and a second image for which at least the relationship between the shooting positions has been determined in advance, and a 3D point cloud on the surface of an object for which at least the relationship between the shooting position and the measurement position has been determined in advance, and determines the pixel position on the first image corresponding to each 3D point of the 3D point cloud; and a neighborhood selection unit, based on the pixel positions on the first image corresponding to the 3D points of the 3D point cloud, selects, for each pixel position on the first image, a depth value from among the depth values of the 3D points corresponding to multiple neighboring pixel positions, such that the consistency between the first image and the second image is high.

A third aspect of the present disclosure is a 3D point cloud processing program that causes a computer to function as the 3D point cloud processing device of the first aspect.

According to the disclosed technology, a 3D point cloud obtained by measurement can be upsampled accurately.

FIG. 1 is a schematic block diagram of an example of a computer that functions as the 3D point cloud processing device of the present embodiment.
FIG. 2 is a diagram showing an example of measurement points measured by a LiDAR sensor and the scenes captured by a first camera and a second camera.
FIG. 3 is a block diagram showing the configuration of the 3D point cloud processing device of the present embodiment.
FIG. 4(A) is a diagram showing an example of a first image, FIG. 4(B) is a diagram showing an example of a second image, and FIG. 4(C) is a diagram showing an example of the result of projecting a 3D point cloud onto the first image.
FIG. 5 is a diagram for explaining a method for extracting a neighborhood set.
FIG. 6 is a flowchart showing the 3D point cloud processing routine of the 3D point cloud processing device of the present embodiment.

An example of an embodiment of the disclosed technology is described below with reference to the drawings. Identical or equivalent components and parts are given the same reference symbols in each drawing. The dimensional ratios in the drawings are exaggerated for convenience of explanation and may differ from the actual ratios.

<Outline of this embodiment>
In this embodiment, a sparse 3D point cloud obtained by LiDAR measurement is upsampled using a first image and a second image captured by cameras as clues.

Specifically, the LiDAR projection is not assumed to be correct. Instead, for each pixel, the optimal depth value is selected from the 3D points projected around that pixel, based on an energy function consisting of a cost function and a smoothing term. This approach requires no training data and correctly upsamples the 3D point cloud measured by LiDAR even when the LiDAR projection information contains errors.

Furthermore, because this embodiment upsamples by assigning 3D points measured by LiDAR to pixels, it maintains measurement accuracy at long range, unlike methods that perform disparity estimation.

In summary, this embodiment has three characteristics: it uses no training data; it upsamples correctly even when there are projection errors between the LiDAR and the camera due to occlusion caused by vehicle movement or differences in sensor position; and, because it upsamples the 3D point cloud measured by LiDAR without estimating distance from image parallax, it suppresses the loss of accuracy at long range.

<Configuration of the 3D point cloud processing device according to this embodiment>
FIG. 1 is a block diagram showing the hardware configuration of the 3D point cloud processing device 10 according to the present embodiment.

As shown in FIG. 1, the 3D point cloud processing device 10 has a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a storage 14, an input unit 15, a display unit 16, and a communication interface (I/F) 17. These components are communicably connected to one another via a bus 19.

The CPU 11 is a central processing unit that executes various programs and controls each part. That is, the CPU 11 reads a program from the ROM 12 or the storage 14 and executes it using the RAM 13 as a working area. The CPU 11 controls each of the above components and performs various calculations according to the program stored in the ROM 12 or the storage 14. In this embodiment, the ROM 12 or the storage 14 stores a 3D point cloud processing program for upsampling a 3D point cloud. The 3D point cloud processing program may be a single program or a group of programs composed of multiple programs or modules.

The ROM 12 stores various programs and various data. The RAM 13 temporarily stores programs or data as a working area. The storage 14 is composed of an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores various programs, including the operating system, and various data.

The input unit 15 includes a pointing device such as a mouse, and a keyboard, and is used for various inputs, including a first image and a second image for which at least the relationship between the shooting positions has been determined in advance, and a 3D point cloud P on the surface of an object for which at least the relationship between the shooting position and the measurement position has been determined in advance. For example, as shown in FIG. 2, the 3D point cloud P measured by the LiDAR sensor 50, the first image captured by the first camera 52 (see FIG. 4(A)), and the second image captured by the second camera 54 (see FIG. 4(B)) are input to the input unit 15. The relationship between the shooting positions of the first camera 52 and the second camera 54 has been determined in advance, as has the relationship between the shooting position of the first camera 52 and the measurement position of the LiDAR sensor 50.

The first image and the second image are each a distortion-corrected RGB or grayscale image. The 3D point cloud P is the set of 3D points measured by the LiDAR sensor 50. Each 3D point is a three-dimensional vector, so when the 3D point cloud P contains N points, P is a set of N three-dimensional vectors.

The input unit 15 also receives the internal parameter K_1 of the first camera 52, the internal parameter K_2 of the second camera 54, the rotation matrix R_C between the first camera 52 and the second camera 54, the translation vector T_C between the first camera 52 and the second camera 54, the projection matrix R_L between the first camera 52 and the LiDAR sensor 50, and the translation vector T_L between the first camera 52 and the LiDAR sensor 50.

The internal parameters K_1 and K_2 of the first camera 52 and the second camera 54 are 3x3 camera intrinsic parameter matrices. The rotation matrix R_C between the first camera 52 and the second camera 54 is a 3x3 rotation matrix. The translation vector T_C between the first camera 52 and the second camera 54 is a three-dimensional vector. The projection matrix R_L between the first camera 52 and the LiDAR sensor 50 is a 3x3 rotation matrix. The translation vector T_L between the first camera 52 and the LiDAR sensor 50 is a three-dimensional vector.

The display unit 16 is, for example, a liquid crystal display, and displays various information including the result of upsampling the 3D point cloud P measured by the LiDAR sensor 50. The display unit 16 may be a touch panel and also function as the input unit 15.

The communication interface 17 is an interface for communicating with other devices, and uses a standard such as Ethernet (registered trademark), FDDI, or Wi-Fi (registered trademark).

Next, the functional configuration of the 3D point cloud processing device 10 is described. FIG. 3 is a block diagram showing an example of the functional configuration of the 3D point cloud processing device 10.

As shown in FIG. 3, the 3D point cloud processing device 10 functionally comprises an input processing unit 20, a neighborhood extraction unit 22, a neighborhood selection unit 24, and a smoothing unit 26.

Based on the first image, the second image, and the 3D point cloud P received by the input unit 15, the input processing unit 20 projects each 3D point of the 3D point cloud P onto the first image and determines the pixel position on the first image corresponding to each 3D point (see FIG. 4(C)). FIG. 4(C) shows an example in which one of the 3D points on the surface of the object drawn with dots is erroneously projected onto a pixel position inside the region of the first image that represents the white object.

Specifically, each 3D point of the 3D point cloud P is projected onto the first image, and the 3D points projected outside the area of the first image are removed from P. A set Q_1 is then obtained whose elements are combinations of the pixel position on the first image and the depth value for each remaining 3D point. The point indicated by each element of Q_1 is projected onto the second image, the elements whose points fall outside the area of the second image are removed from Q_1, and Q_1 is updated to the resulting set.

For example, the projection matrix R_L and the translation vector T_L are applied to each 3D point in the 3D point cloud P according to the following formula to obtain the 3D point cloud P_1. The 3D point cloud P_1 is a set of N three-dimensional vectors.

P_1 = R_L P + T_L

The 3D point cloud P_1 is projected onto the first image of the first camera 52 using the internal parameter K_1, the points projected outside the region of the first image are removed, and a set Q_1 of elements of the form (x coordinate in the first image, y coordinate in the first image, depth value of the 3D point) is obtained. For example, (x, y, d) calculated according to the following formulas for each 3D point p (∈ P_1) is taken as an element of the set Q_1.
(X, Y, d) = K_1 p
x = X / d
y = Y / d
The number of points projected within the region of the first image is denoted M_1. The correspondence between each element of the set Q_1 and each 3D point of the 3D point cloud P before the transformation is retained.
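The projection onto the first image can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names (`mat_vec`, `project_to_first_image`), the toy calibration values, and the rule that points with non-positive depth are discarded are all hypothetical. Each element of Q_1 carries the index of its source point so that the correspondence with P is retained.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project_to_first_image(points, R_L, T_L, K_1, width, height):
    """Return Q_1 as a list of (index into P, x, y, d) for points inside the image."""
    Q_1 = []
    for idx, p in enumerate(points):
        # P_1 = R_L P + T_L: LiDAR frame -> first-camera frame
        p1 = [a + b for a, b in zip(mat_vec(R_L, p), T_L)]
        # (X, Y, d) = K_1 p, then x = X / d, y = Y / d
        X, Y, d = mat_vec(K_1, p1)
        if d <= 0:  # assumption: points behind the camera are discarded
            continue
        x, y = X / d, Y / d
        if 0 <= x < width and 0 <= y < height:
            Q_1.append((idx, x, y, d))
    return Q_1


# Toy values (hypothetical): identity extrinsics, focal length 100, principal point (50, 50)
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
K = [[100, 0, 50], [0, 100, 50], [0, 0, 1]]
P = [[0.0, 0.0, 10.0], [1.0, 0.5, 10.0], [20.0, 0.0, 10.0]]
Q_1 = project_to_first_image(P, I3, [0.0, 0.0, 0.0], K, 100, 100)
print(Q_1)  # the third point projects to x = 250 and is dropped
```

In this toy run M_1 is 2: the first two points land inside the 100x100 image and the third falls outside it.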

Then, the rotation matrix R_C and the translation vector T_C are applied to each 3D point in the 3D point cloud P_1 according to the following formula to obtain the 3D point cloud P_2. The 3D point cloud P_2 is a set of N three-dimensional vectors.

P_2 = R_C P_1 + T_C

The 3D point cloud P_2 is projected onto the second image of the second camera 54 using the internal parameter K_2, the points projected outside the area of the second image are removed, and a set Q_2 of elements of the form (x coordinate in the second image, y coordinate in the second image, depth value of the 3D point) is obtained. The number of points projected within the area of the second image is denoted M_2. The correspondence between each element of the set Q_2 and each 3D point of the 3D point cloud P before the transformation is retained.

Using the correspondence with the 3D point cloud P, the elements of Q_1 whose source points also appear in Q_2 are extracted, and Q_1 is updated so that it consists only of elements whose points are visible in both images.
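The second-image check and the update of Q_1 might look as follows. This is a sketch only: the function name `visible_in_second`, the representation of Q_1 as (index, x, y, d) tuples, and the toy calibration are assumptions made for illustration.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def visible_in_second(P_1, R_C, T_C, K_2, width, height):
    """Indices of points in P_1 whose projection lands inside the second image."""
    keep = set()
    for idx, p1 in enumerate(P_1):
        # P_2 = R_C P_1 + T_C: first-camera frame -> second-camera frame
        p2 = [a + b for a, b in zip(mat_vec(R_C, p1), T_C)]
        X, Y, d = mat_vec(K_2, p2)
        if d > 0 and 0 <= X / d < width and 0 <= Y / d < height:
            keep.add(idx)
    return keep


# Toy values (hypothetical): the second camera is shifted along x
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
K = [[100, 0, 50], [0, 100, 50], [0, 0, 1]]
P_1 = [[0.0, 0.0, 10.0], [1.0, 0.5, 10.0]]
Q_1 = [(0, 50.0, 50.0, 10.0), (1, 60.0, 55.0, 10.0)]

keep = visible_in_second(P_1, I3, [-6.0, 0.0, 0.0], K, 100, 100)
# Keep only elements of Q_1 whose source point is also visible in the second image
Q_1 = [q for q in Q_1 if q[0] in keep]
print(Q_1)
```

Carrying the point index in each element is one simple way to realise the retained correspondence between Q_1, Q_2, and P.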

Based on the pixel positions on the first image corresponding to the 3D points of the 3D point cloud P, the neighborhood extraction unit 22 extracts, for each pixel position on the first image, the depth values of the 3D points corresponding to multiple neighboring pixel positions.

Specifically, for each pixel of the first image, the elements of the set Q_1 that have a nearby pixel position are extracted. This yields a neighborhood set Q_1_xy for each pixel position (x, y). The neighborhood set Q_1_xy of each pixel position (x, y) is a subset of Q_1. For a pixel position (x, y) for which no element with a nearby pixel position is found, Q_1_xy is the empty set.

The definition of "nearby" is arbitrary. For example, an element may be regarded as nearby if it lies within a certain radius of the pixel position (FIG. 5), or if it is among the K nearest neighbors of the pixel position. FIG. 5 shows an example in which the elements whose pixel positions lie within the search radius of the pixel position of interest are extracted as the neighborhood set Q_1_xy.
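The radius-based variant can be sketched with a brute-force search. The function name `neighborhood_sets`, the (q_x, q_y, d) tuple layout, and the toy data are assumptions; a spatial index such as a k-d tree would normally replace the inner loop for real image sizes.

```python
import math


def neighborhood_sets(Q_1, width, height, radius):
    """Map each integer pixel (x, y) to the elements of Q_1 within `radius` pixels.

    Q_1 is a list of (q_x, q_y, d). Pixels with no nearby element get an empty list.
    """
    sets = {}
    for y in range(height):
        for x in range(width):
            sets[(x, y)] = [q for q in Q_1
                            if math.hypot(q[0] - x, q[1] - y) <= radius]
    return sets


# Toy values (hypothetical): two projected points on a 5x3 image
Q_1 = [(1.2, 1.0, 5.0), (3.6, 0.4, 9.0)]
Q_1_xy = neighborhood_sets(Q_1, width=5, height=3, radius=1.5)
print(Q_1_xy[(1, 1)])  # only the first point is within 1.5 px
print(Q_1_xy[(0, 2)])  # no point nearby: empty neighborhood set
```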

In the belief propagation unit 32 described later, each pixel position (x, y) is assigned the depth value of one of the elements of its neighborhood set Q_1_xy.

For each pixel position (x, y) on the first image, the neighborhood selection unit 24 selects a depth value from the depth values of the elements of the neighborhood set Q_1_xy such that the consistency between the first image and the second image is high and the distance between the pixel position (x, y) and the pixel position of the selected element is small.

Specifically, the neighborhood selection unit 24 includes a cost calculation unit 30 and a belief propagation unit 32.

For each pixel position (x, y) on the first image and each element of its neighborhood set Q_1_xy, the cost calculation unit 30 projects the point defined by the pixel position (x, y) and the depth value of the element onto the second image, and computes the distance between the pixel value at (x, y) in the first image and the pixel value at the projected position in the second image. The cost calculation unit 30 then computes, for each such pixel position and element, a cost function expressed using two quantities: the distance between the pixel position (x, y) and the pixel position of the element, and the pixel-value distance obtained above.

Specifically, the cost calculation unit 30 computes the cost function for each element q = (q_x, q_y, d) in the neighborhood set Q_1_xy. The cost function evaluates how plausible it is that the pixel position (x, y) takes the depth value d of element q.

First, the position on the second image to which the pixel position (x, y) of the first image projects when it has depth value d is derived. This is obtained by back-projecting the point defined by the pixel position (x, y) and the depth value d with the inverse of the internal parameter K_1, applying the rotation matrix R_C and the translation vector T_C to transform it into the coordinate system of the second camera 54, and then projecting it onto the second image plane with the internal parameter K_2. The resulting pixel position on the second image is denoted (x', y'). For example, the coordinates (x', y') are calculated according to the following formulas.
(X', Y', d') = K_2 (R_C K_1^-1 (d×x, d×y, d) + T_C)
x' = X' / d'
y' = Y' / d'
With this, the cost function c_xyq for the case where the pixel position (x, y) takes the depth value of element q is obtained as follows.

c_xyq = w_S Stereo(I_1, I_2, x, y, x', y') + w_D Distance(x, y, q_x, q_y)
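The reprojection that yields (x', y') can be sketched as below. The sketch assumes a zero-skew intrinsic matrix K_1 so that its inverse can be written in closed form; that assumption, the function names, and the toy calibration are illustrative and not taken from the source.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def reproject(x, y, d, K_1, K_2, R_C, T_C):
    """Project pixel (x, y) of the first image, at depth d, into the second image."""
    fx, fy, cx, cy = K_1[0][0], K_1[1][1], K_1[0][2], K_1[1][2]
    # K_1^-1 (d*x, d*y, d), written out for a zero-skew K_1 (assumption)
    p1 = [d * (x - cx) / fx, d * (y - cy) / fy, d]
    # Coordinate change into the second camera, then projection with K_2
    p2 = [a + b for a, b in zip(mat_vec(R_C, p1), T_C)]
    X2, Y2, d2 = mat_vec(K_2, p2)
    return X2 / d2, Y2 / d2


# Toy values (hypothetical): cameras differ only by a shift along x
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
K = [[100, 0, 50], [0, 100, 50], [0, 0, 1]]
T_C = [-1.0, 0.0, 0.0]

near = reproject(60, 55, 10.0, K, K, I3, T_C)
far = reproject(60, 55, 20.0, K, K, I3, T_C)
print(near, far)  # the horizontal shift halves when the depth doubles
```

Each candidate depth in a neighborhood set gives a different (x', y') for the same pixel, which is what lets the Stereo term discriminate between candidates.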

Stereo is a function that evaluates the difference between the pixel value of the first image I_1 at pixel position (x, y) and the pixel value of the second image I_2 at pixel position (x', y'). Options include the difference in pixel values, the Hamming distance between census-transformed images, and the difference in pixel-value gradients (see Non-Patent Document 3); the weighting of these and the window size are arbitrary.

Distance is the distance between the pixel position (x, y) of the first image and the pixel position (q_x, q_y) of the element, for example the l1 distance, the l2 distance, or the Huber distance; a truncated distance may also be used.

w_S and w_D are the weights of the respective terms. The cost c_xyq is small when assigning the depth value d to the pixel position (x, y) makes the first and second images consistent and when the element q was projected close to (x, y). Depth values of 3D points projected near the pixel position are therefore more likely to be assigned.
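One possible instantiation of the cost function is sketched below. The text leaves the choice of Stereo and Distance open, so the absolute grayscale difference, the truncated l1 distance, the rounding of (x', y') to the nearest pixel, the infinite cost outside the second image, and the weight values are all assumptions made for illustration.

```python
def stereo_abs_diff(I_1, I_2, x, y, x2, y2):
    """Absolute grayscale difference; (x2, y2) is rounded to the nearest pixel."""
    u, v = int(round(x2)), int(round(y2))
    if not (0 <= v < len(I_2) and 0 <= u < len(I_2[0])):
        return float("inf")  # assumption: reprojections outside I_2 are rejected
    return abs(I_1[y][x] - I_2[v][u])


def truncated_l1(x, y, q_x, q_y, cap=5.0):
    """l1 distance between the pixel and the projected point, truncated at `cap`."""
    return min(abs(x - q_x) + abs(y - q_y), cap)


def cost(I_1, I_2, x, y, x2, y2, q_x, q_y, w_S=1.0, w_D=0.1):
    """c_xyq = w_S * Stereo(...) + w_D * Distance(...)."""
    return (w_S * stereo_abs_diff(I_1, I_2, x, y, x2, y2)
            + w_D * truncated_l1(x, y, q_x, q_y))


# Toy 1x3 grayscale images (hypothetical); I_2 is I_1 shifted by one pixel
I_1 = [[10, 20, 30]]
I_2 = [[20, 30, 40]]

# Candidate A: its depth reprojects pixel (1, 0) to (0, 0); the point sits 0.4 px away
c_a = cost(I_1, I_2, 1, 0, 0.0, 0.0, q_x=1.4, q_y=0.0)
# Candidate B: projected exactly onto the pixel, but its depth reprojects to (2, 0)
c_b = cost(I_1, I_2, 1, 0, 2.0, 0.0, q_x=1.0, q_y=0.0)
print(c_a, c_b)
```

Candidate B lies exactly on the pixel yet receives the higher cost because its depth is photometrically inconsistent, which is the situation a projection error produces.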

For each pixel position (x, y) on the first image, the belief propagation unit 32 selects the depth value of a 3D point based on an energy function expressed using the cost function and the difference from the depth values selected at adjacent pixels.

Specifically, the belief propagation unit 32 defines an energy function that adds a smoothing term to the cost function described above, and uses belief propagation to select, for each pixel position (x, y) of the first image, the depth value d(x, y) that minimizes the energy function. The energy function E is defined by the following formula.

norm(∇d(x, y)) is the norm (l1 distance, l2 distance, Huber distance, or the like) of the difference between adjacent depth values d(x, y); a truncated norm may also be used. The smaller the difference in depth values between adjacent pixels, the smaller norm(∇d(x, y)) becomes.
For example, norm(∇d(x, y)) is the sum of the norms (l1 distance, l2 distance, Huber distance, or the like) of the differences between the selected depth values, computed against the pixel positions adjacent to the right of and below the pixel position of interest. The calculation for the right neighbor is described concretely below; the calculation for the lower neighbor is analogous.
Suppose that at the pixel position of interest (x, y) in the first image the depth value d has been selected from Q_1_xy, and that at the adjacent pixel position (x+1, y) the depth value d' has been selected from the neighborhood set of (x+1, y). The difference (d - d') is derived and its norm is computed.
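The smoothing term can be sketched as follows, using the l1 norm over right and lower neighbors with optional truncation. The formula for E itself is not reproduced in this text, so the `energy` function below assumes a common form, the sum of the per-pixel costs plus a weighted smoothing term; the weight `lam` and all names are hypothetical.

```python
def smoothness(depth, trunc=None):
    """Sum over all pixels of l1 differences to the right and lower neighbours."""
    h, w = len(depth), len(depth[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < w and ny < h:
                    diff = abs(depth[y][x] - depth[ny][nx])
                    total += diff if trunc is None else min(diff, trunc)
    return total


def energy(unary, depth, lam=1.0, trunc=None):
    """Assumed form: E = sum of selected costs c_xyq + lam * sum of norm(grad d)."""
    data = sum(sum(row) for row in unary)
    return data + lam * smoothness(depth, trunc)


# Toy 2x2 assignment (hypothetical): one pixel takes a far depth
depth = [[5.0, 5.0],
         [5.0, 9.0]]
unary = [[0.5, 0.5],
         [0.5, 0.5]]  # cost c_xyq of the candidate chosen at each pixel

print(smoothness(depth))              # two neighbour pairs differ by 4
print(energy(unary, depth, lam=0.5))
print(energy(unary, depth, lam=0.5, trunc=2.0))
```

Truncation limits how much a genuine depth discontinuity, such as an object boundary, is penalised.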

エネルギー関数Ｅを最小化することにより、ステレオ誤差、投影位置の近さ、各画素位置の奥行値の滑らかさを考慮した上で最適な奥行値ｄ（ｘ，ｙ）を各画素位置（ｘ，ｙ）について定めることができる。 By minimizing the energy function E, it is possible to determine the optimal depth value d(x, y) for each pixel position (x, y), taking into account the stereo error, the proximity of the projection position, and the smoothness of the depth value at each pixel position.

エネルギー関数Ｅの最小化は、信念伝播（ＢｅｌｉｅｆＰｒｏｐａｇａｔｉｏｎ）という手法、より具体的にはループありの信念伝播（ＬｏｏｐｙＢｅｌｉｅｆＰｒｏｐａｇａｔｉｏｎ）によって行うことができる。 The energy function E can be minimized by a technique called belief propagation, more specifically, loopy belief propagation.

以上のように、信念伝播部３２により、以下の式に従って、エネルギー関数Ｅの最小化によって、全ての画素位置（ｘ，ｙ）の各々について近傍集合Ｑ＿１＿ｘｙから選択される奥行値ｄ（ｘ，ｙ）の組み合わせが求められる。 As described above, the belief propagation unit 32 determines a combination of depth values d(x, y) selected from the neighborhood set Q_1_xy for each of all pixel positions (x, y) by minimizing the energy function E according to the following formula:

ｄ（ｘ，ｙ）＝ａｒｇｍｉｎＥ d(x, y) = argmin E
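
As an illustration of how such a minimization could proceed, the following is a minimal min-sum loopy belief propagation sketch on a 4-connected pixel grid. It is not the implementation of the embodiment. It assumes that every pixel has a non-empty list of candidate depths (`cands[y][x]`) with matching per-candidate costs (`unary[y][x]`), that the pairwise term is a truncated l1 distance weighted by `lam`, and that messages are updated synchronously. All of these names and choices are hypothetical.

```python
def loopy_bp(cands, unary, lam=1.0, trunc=float("inf"), iters=10):
    """Select one candidate depth per pixel by min-sum loopy belief propagation.
    cands[y][x]: list of candidate depths; unary[y][x]: cost of each candidate."""
    h, w = len(cands), len(cands[0])
    pixels = [(x, y) for y in range(h) for x in range(w)]

    def nbrs(p):
        x, y = p
        return [(nx, ny) for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                if 0 <= nx < w and 0 <= ny < h]

    def labels(p):
        return cands[p[1]][p[0]]

    def cost(p):
        return unary[p[1]][p[0]]

    # msg[(p, q)][j]: message from p to q about q's j-th candidate
    msg = {(p, q): [0.0] * len(labels(q)) for p in pixels for q in nbrs(p)}
    for _ in range(iters):
        new = {}
        for (p, q) in msg:
            # cost of each candidate at p, using all incoming messages except q's
            base = [cost(p)[i] + sum(msg[(r, p)][i] for r in nbrs(p) if r != q)
                    for i in range(len(labels(p)))]
            out = [min(base[i] + lam * min(abs(dp - dq), trunc)
                       for i, dp in enumerate(labels(p)))
                   for dq in labels(q)]
            m0 = min(out)
            new[(p, q)] = [v - m0 for v in out]  # normalise to avoid drift
        msg = new

    result = [[0.0] * w for _ in range(h)]
    for p in pixels:
        belief = [cost(p)[i] + sum(msg[(r, p)][i] for r in nbrs(p))
                  for i in range(len(labels(p)))]
        result[p[1]][p[0]] = labels(p)[belief.index(min(belief))]
    return result
```

On a grid with loops the result is an approximate minimizer. On a single row or column the graph is a chain and the result is exact once `iters` is at least the chain length.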

平滑化部２６は、第１画像上の画素位置（ｘ，ｙ）の各々について選択された奥行値ｄ（ｘ，ｙ）を平滑化する。 The smoothing unit 26 smooths the depth value d(x, y) selected for each pixel position (x, y) on the first image.

ここで、奥行値ｄ（ｘ，ｙ）は、信念伝播部３２により近傍の画素位置に投影される３次元点の奥行値の割り当てによって導出されるので、不連続なマップとなっている。そこで、本実施形態に係る平滑化部２６は、連続関数の平滑化法であるＶａｒｉａｔｉｏｎａｌ法によって平滑化をし、平滑化後の奥行値ｄ＿ｖ（ｘ，ｙ）を生成し、深度画像として表示部１６により出力する。 Here, the depth value d(x, y) is derived by assigning depth values of three-dimensional points projected to nearby pixel positions by the belief propagation unit 32, and is therefore a discontinuous map. Therefore, the smoothing unit 26 according to this embodiment smooths using the Variational method, which is a smoothing method for continuous functions, generates the smoothed depth value d_v(x, y), and outputs it as a depth image on the display unit 16.

具体的には、Ｖａｒｉａｔｉｏｎａｌ法により、以下のエネルギー関数Ｅ＿Ｖを最小化する。 Specifically, the following energy function E_V is minimized using the Variational method.

ｎｏｒｍ１，ｎｏｒｍ２はｌ１距離、ｌ２距離、もしくはＨｕｂｅｒ距離などの距離である。また、Ｇは非特許文献４記載のＡＤＴ（ＡｎｉｓｏｔｒｏｐｉｃＤｉｆｆｕｓｉｏｎＴｅｎｓｏｒ）もしくはＢ－ＡＤＴ（ＢｉｎａｒｙＡｎｉｓｏｔｒｏｐｉｃＤｉｆｆｕｓｉｏｎＴｅｎｓｏｒ）であり、画像の位置ごとに平滑化項ｎｏｒｍ２（∇ｄ＿Ｖ（ｘ，ｙ））を重みづけするものである。 norm1 and norm2 are distances such as the l1 distance, l2 distance, or Huber distance. G is the ADT (Anisotropic Diffusion Tensor) or B-ADT (Binary Anisotropic Diffusion Tensor) described in Non-Patent Document 4, which weights the smoothing term norm2 (∇d_V(x, y)) for each position in the image.
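
The formula for E_V is not reproduced in this text. Given the terms described above, one plausible form is the following. It assumes that norm1 acts as a data term that keeps d_V close to the selected depth d, and it introduces a weight λ_V. Both the role of norm1 and the symbol λ_V are assumptions and are not stated in the source.

```latex
E_V = \sum_{(x,y)} \Bigl[ \mathrm{norm}_1\bigl(d_V(x,y) - d(x,y)\bigr)
      + \lambda_V \, \mathrm{norm}_2\bigl(G(x,y)\, \nabla d_V(x,y)\bigr) \Bigr]
```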

エネルギー関数Ｅ＿Ｖの最小化は非特許文献４に記載のようにｆｉｒｓｔｏｒｄｅｒｐｒｉｍａｌｄｕａｌａｌｇｏｒｉｔｈｍによって実施できる。これにより、以下の式に従って、平滑化後の、全ての画素位置（ｘ，ｙ）の各々についての奥行値ｄ＿Ｖ（ｘ，ｙ）の組み合わせが求められる。 The energy function E_V can be minimized by a first-order primal-dual algorithm, as described in Non-Patent Document 4. As a result, the combination of smoothed depth values d_V(x, y) for all pixel positions (x, y) is obtained according to the following formula:

ｄ＿Ｖ（ｘ，ｙ）＝ａｒｇｍｉｎＥ＿Ｖ d_V(x,y)=argmin E_V
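
To illustrate the effect of this kind of smoothing, the following sketch minimizes a deliberately simplified energy. It assumes both norms are squared l2 distances and omits the tensor G, so that the minimizer can be found by plain Jacobi iteration. This is a substitute technique chosen for brevity, not the first-order primal-dual algorithm of Non-Patent Document 4, and the names `smooth_l2`, `lam`, and `iters` are hypothetical.

```python
def smooth_l2(d, lam=1.0, iters=200):
    """Jacobi iteration for the simplified energy
        sum_p 0.5*(u_p - d_p)^2 + 0.5*lam * sum_{edges (p,q)} (u_p - u_q)^2
    on a 4-connected grid. d is indexed as d[y][x]; returns the smoothed map u."""
    h, w = len(d), len(d[0])
    u = [row[:] for row in d]
    for _ in range(iters):
        nxt = [row[:] for row in u]
        for y in range(h):
            for x in range(w):
                nb = [u[ny][nx]
                      for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                      if 0 <= nx < w and 0 <= ny < h]
                # stationarity condition of the energy with respect to u[y][x]
                nxt[y][x] = (d[y][x] + lam * sum(nb)) / (1.0 + lam * len(nb))
        u = nxt
    return u
```

Because this simplified energy is quadratic and isotropic, it blurs depth edges. Weighting the smoothing term per position, as G does in the description, is what allows discontinuities to be preserved.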

［非特許文献４］Yao, Yasuhiro, et al. "Discontinuous and Smooth Depth Completion with Binary Anisotropic Diffusion Tensor." IEEE Robotics and Automation Letters 5.4 (2020): 5128-5135. [Non-Patent Document 4] Yao, Yasuhiro, et al. "Discontinuous and Smooth Depth Completion with Binary Anisotropic Diffusion Tensor." IEEE Robotics and Automation Letters 5.4 (2020): 5128-5135.

＜本実施形態に係る３次元点群処理装置の作用＞ <Operation of the 3D point cloud processing device according to this embodiment>
次に、３次元点群処理装置１０の作用について説明する。 Next, the operation of the three-dimensional point cloud processing device 10 will be described.

図６は、３次元点群処理装置１０による３次元点群処理の流れを示すフローチャートである。ＣＰＵ１１がＲＯＭ１２又はストレージ１４から３次元点群処理プログラムを読み出して、ＲＡＭ１３に展開して実行することにより、３次元点群処理が行なわれる。また、３次元点群処理装置１０に、ＬｉＤＡＲセンサ５０によって計測された３次元点群Ｐと、第１カメラ５２によって撮影された第１画像と、第２カメラ５４によって撮影された第２画像と、が入力される。また、３次元点群処理装置１０に、第１カメラ５２の内部パラメータＫ＿１、第２カメラ５４の内部パラメータＫ＿２、第１カメラ５２と第２カメラ５４間の回転行列Ｒ＿Ｃ、第１カメラ５２と第２カメラ５４間の並進ベクトルＴ＿Ｃ、第１カメラ５２とＬｉＤＡＲセンサ５０間の投影行列Ｒ＿Ｌ、及び第１カメラ５２とＬｉＤＡＲセンサ５０間の並進ベクトルＴ＿Ｌが入力されているものとする。 Figure 6 is a flowchart showing the flow of 3D point cloud processing by the 3D point cloud processing device 10. The CPU 11 reads out a 3D point cloud processing program from the ROM 12 or storage 14, expands it in the RAM 13, and executes it to perform 3D point cloud processing. In addition, the 3D point cloud P measured by the LiDAR sensor 50, the first image taken by the first camera 52, and the second image taken by the second camera 54 are input to the 3D point cloud processing device 10. In addition, the internal parameters K_1 of the first camera 52, the internal parameters K_2 of the second camera 54, the rotation matrix R_C between the first camera 52 and the second camera 54, the translation vector T_C between the first camera 52 and the second camera 54, the projection matrix R_L between the first camera 52 and the LiDAR sensor 50, and the translation vector T_L between the first camera 52 and the LiDAR sensor 50 are input to the 3D point cloud processing device 10.

ステップＳ１００で、ＣＰＵ１１は、入力処理部２０として、入力部１５により受け付けた、第１画像及び第２画像と、３次元点群とを取得する。 In step S100, the CPU 11, as the input processing unit 20, acquires the first image, the second image, and the three-dimensional point cloud received by the input unit 15.

ステップＳ１０２で、ＣＰＵ１１は、入力処理部２０として、第１画像及び第２画像と、３次元点群とに基づいて、集合Ｑ＿１、Ｑ＿２を計算する。 In step S102, the CPU 11, as the input processing unit 20, calculates sets Q_1 and Q_2 based on the first and second images and the three-dimensional point cloud.
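
The construction of the set Q_1 in this step can be sketched as follows. The sketch assumes a pinhole camera whose internal parameters are given as a tuple (fx, fy, cx, cy), assumes that R_L and T_L map LiDAR coordinates into the first camera's coordinates as X_c = R_L X + T_L, and takes the depth value to be the z coordinate in the camera frame. These conventions and the function name `project_to_image` are assumptions for illustration.

```python
def project_to_image(points, R, T, K, width, height):
    """Project 3D points into an image and keep those that land inside it.
    points: iterable of (X, Y, Z); R: 3x3 nested list; T: length-3 sequence;
    K: (fx, fy, cx, cy). Returns a list of (u, v, depth) elements."""
    fx, fy, cx, cy = K
    kept = []
    for X in points:
        # sensor frame -> camera frame (assumed convention: Xc = R X + T)
        xc, yc, zc = (sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3))
        if zc <= 0:
            continue  # behind the camera, cannot be projected
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:  # drop points outside the image area
            kept.append((u, v, zc))
    return kept
```

The set Q_2 could be obtained in the same manner by projecting into the second image with that camera's parameters.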

ステップＳ１０４では、ＣＰＵ１１は、近傍抽出部２２として、３次元点群Ｐの各々の３次元点に対応する第１画像上の画素位置に基づいて、第１画像上の各画素位置（ｘ，ｙ）について、近傍集合Ｑ＿１＿ｘｙを抽出する。 In step S104, the CPU 11, as the neighborhood extraction unit 22, extracts a neighborhood set Q_1_xy for each pixel position (x, y) on the first image based on the pixel positions on the first image corresponding to each 3D point of the 3D point group P.
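
A brute-force sketch of this extraction step is shown below. It assumes that the neighborhood set for a pixel consists of the elements of Q_1 whose projected position lies within a fixed radius of that pixel. The radius criterion, the function name `extract_neighborhoods`, and the dictionary layout are assumptions, since this section does not restate how the neighborhood is defined.

```python
def extract_neighborhoods(q1, width, height, radius=2.0):
    """For every pixel (x, y), collect the elements (q_x, q_y, d) of q1 whose
    projected position is within `radius` pixels of (x, y).
    Returns a dict mapping (x, y) to a list of elements."""
    r2 = radius * radius
    return {
        (x, y): [q for q in q1 if (q[0] - x) ** 2 + (q[1] - y) ** 2 <= r2]
        for y in range(height)
        for x in range(width)
    }
```

This costs on the order of width × height × len(q1) operations. A spatial index or per-pixel bucketing would be the natural replacement for real image sizes.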

ステップＳ１０６では、ＣＰＵ１１は、コスト計算部３０として、第１画像上の画素位置（ｘ，ｙ）毎に、近傍集合Ｑ＿１＿ｘｙに含まれる各要素ｑ＝（ｑ＿ｘ，ｑ＿ｙ，ｄ）について、コスト関数を計算する。 In step S106, the CPU 11, as the cost calculation unit 30, calculates a cost function for each element q = (q_x, q_y, d) included in the neighborhood set Q_1_xy for each pixel position (x, y) on the first image.
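
One possible shape of such a cost is sketched below. It assumes grayscale images indexed as img[y][x], pinhole internal parameters given as (fx, fy, cx, cy), a transform X_2 = R_C X_1 + T_C from the first camera to the second, nearest-pixel sampling, and a weight `w_pos` on the distance between the pixel and the element's projected position. The exact form of the cost function is defined elsewhere in the document, so every one of these choices is a hypothetical stand-in.

```python
import math


def element_cost(x, y, q, img1, img2, K1, K2, R_C, T_C, w_pos=0.1):
    """Cost of assigning the depth of element q = (q_x, q_y, d) to pixel (x, y):
    pixel-value difference between the two images plus a proximity penalty."""
    qx, qy, d = q
    fx1, fy1, cx1, cy1 = K1
    fx2, fy2, cx2, cy2 = K2
    # back-project pixel (x, y) at depth d into the first camera's frame
    X1 = ((x - cx1) / fx1 * d, (y - cy1) / fy1 * d, d)
    # move into the second camera's frame (assumed convention: X2 = R_C X1 + T_C)
    X2 = [sum(R_C[i][j] * X1[j] for j in range(3)) + T_C[i] for i in range(3)]
    if X2[2] <= 0:
        return float("inf")
    u = int(round(fx2 * X2[0] / X2[2] + cx2))
    v = int(round(fy2 * X2[1] / X2[2] + cy2))
    if not (0 <= v < len(img2) and 0 <= u < len(img2[0])):
        return float("inf")  # projected outside the second image
    photo = abs(img1[y][x] - img2[v][u])    # consistency between the images
    prox = math.hypot(qx - x, qy - y)       # distance to the projected position
    return photo + w_pos * prox
```

A depth that reprojects the pixel onto a similar-looking pixel in the second image, and that comes from a nearby projected point, therefore receives a low cost.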

ステップＳ１０８では、ＣＰＵ１１は、信念伝播部３２として、第１画像上の画素位置（ｘ，ｙ）毎に、コスト関数及び隣接する画素において選択される奥行値との差分を用いて表されるエネルギー関数に基づいて、近傍集合Ｑ＿１＿ｘｙに含まれる各要素の奥行値の何れかを選択する。 In step S108, the CPU 11, as the belief propagation unit 32, selects, for each pixel position (x, y) on the first image, one of the depth values of the elements included in the neighborhood set Q_1_xy, based on an energy function expressed using the cost function and the difference from the depth value selected at an adjacent pixel.

ステップＳ１１０では、ＣＰＵ１１は、平滑化部２６として、第１画像上の画素位置の各々について選択された奥行値を平滑化し、平滑化された奥行値を深度画像として表示部１６により表示して、３次元点群処理ルーチンを終了する。 In step S110, the CPU 11, as the smoothing unit 26, smooths the depth value selected for each pixel position on the first image, displays the smoothed depth values as a depth image on the display unit 16, and ends the 3D point cloud processing routine.

以上説明したように、本実施形態に係る３次元点群処理装置は、ＬｉＤＡＲセンサにより計測された３次元点群の３次元点の各々が投影される第１画像上の画素位置に基づいて、第１画像上の画素位置の各々について、近傍の複数の画素位置に対応する３次元点の奥行値から、第１画像及び第２画像の間の整合性が高く、かつ近傍の画素位置との距離が近くなるように、奥行値を選択する。これにより、計測して得られた３次元点群を精度よくアップサンプリングすることができる。 As described above, the 3D point cloud processing device according to this embodiment selects a depth value for each pixel position on the first image based on the pixel position on the first image onto which each of the 3D points of the 3D point cloud measured by the LiDAR sensor is projected, from the depth values of the 3D points corresponding to multiple nearby pixel positions, so that there is high consistency between the first image and the second image and the distance to the nearby pixel positions is short. This allows the 3D point cloud obtained by measurement to be upsampled with high precision.

＜変形例＞ <Modifications>
なお、本発明は、上述した実施形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。 The present invention is not limited to the above-described embodiment, and various modifications and applications are possible without departing from the gist of the present invention.

例えば、ＬｉＤＡＲセンサによる計測で、３次元点群を取得する場合を例に説明したが、これに限定されるものではない。ＬｉＤＡＲセンサ以外のセンサを用いて、３次元点群を計測するようにしてもよい。 For example, the example described above uses a LiDAR sensor to measure and obtain a three-dimensional point cloud, but the present invention is not limited to this. A sensor other than a LiDAR sensor may be used to measure a three-dimensional point cloud.

また、第１画像と第２画像とが異なるカメラによって撮影された場合を例に説明したが、これに限定されるものではない。撮影位置の関係が予め求められていれば、第１画像と第２画像とが同じカメラによって撮影されたものでもよい。 The case where the first image and the second image are captured by different cameras has been described as an example, but the present invention is not limited to this. As long as the relationship between the shooting positions is determined in advance, the first image and the second image may be captured by the same camera.

また、上記各実施形態でＣＰＵがソフトウェア（プログラム）を読み込んで実行した各種処理を、ＣＰＵ以外の各種のプロセッサが実行してもよい。この場合のプロセッサとしては、ＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、ＦＰＧＡ（Ｆｉｅｌｄ－ＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ）等の製造後に回路構成を変更可能なＰＬＤ（ＰｒｏｇｒａｍｍａｂｌｅＬｏｇｉｃＤｅｖｉｃｅ）、及びＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）等の特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路等が例示される。また、３次元点群処理を、これらの各種のプロセッサのうちの１つで実行してもよいし、同種又は異種の２つ以上のプロセッサの組み合わせ（例えば、複数のＦＰＧＡ、及びＣＰＵとＦＰＧＡとの組み合わせ等）で実行してもよい。また、これらの各種のプロセッサのハードウェア的な構造は、より具体的には、半導体素子等の回路素子を組み合わせた電気回路である。 In addition, the various processes that the CPU executes by reading software (a program) in each of the above embodiments may be executed by various processors other than the CPU. Examples of such processors include a GPU (Graphics Processing Unit), a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed specifically to execute particular processing, such as an ASIC (Application Specific Integrated Circuit). The three-dimensional point cloud processing may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, multiple FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these various processors is an electric circuit that combines circuit elements such as semiconductor elements.

また、上記各実施形態では、３次元点群処理プログラムがストレージ１４に予め記憶（インストール）されている態様を説明したが、これに限定されない。プログラムは、ＣＤ－ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｋＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＤＶＤ－ＲＯＭ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｋＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、及びＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）メモリ等の非一時的（ｎｏｎ－ｔｒａｎｓｉｔｏｒｙ）記憶媒体に記憶された形態で提供されてもよい。また、プログラムは、ネットワークを介して外部装置からダウンロードされる形態としてもよい。 In addition, in each of the above embodiments, the 3D point cloud processing program is described as being pre-stored (installed) in the storage 14, but this is not limiting. The program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. The program may also be downloaded from an external device via a network.

以上の実施形態に関し、更に以下の付記を開示する。 The following notes are further provided with respect to the above embodiment.

（付記項１）
３次元点群処理装置であって、
メモリと、
前記メモリに接続された少なくとも１つのプロセッサと、
を含み、
前記プロセッサは、
少なくとも撮影位置の関係が予め求められている第１画像及び第２画像と、少なくとも前記撮影位置と計測位置との関係が予め求められている物体の表面上の３次元点群とを受け付け、前記３次元点群の３次元点の各々に対応する前記第１画像上の画素位置を求め、
前記３次元点群の３次元点の各々に対応する前記第１画像上の画素位置に基づいて、前記第１画像上の画素位置の各々について、近傍の複数の画素位置に対応する３次元点の奥行値から、前記第１画像及び前記第２画像の間の整合性が高くなるように、前記奥行値を選択する
ように構成される３次元点群処理装置。
(Additional Note 1)
A three-dimensional point cloud processing device comprising:
a memory; and
at least one processor connected to the memory,
wherein the processor is configured to:
receive a first image and a second image for which at least the relationship between the shooting positions is determined in advance, and a three-dimensional point cloud on the surface of an object for which at least the relationship between the shooting positions and the measurement position is determined in advance, and determine the pixel position on the first image corresponding to each of the three-dimensional points of the three-dimensional point cloud; and
select, for each pixel position on the first image, based on the pixel positions on the first image corresponding to the respective three-dimensional points of the three-dimensional point cloud, a depth value from among the depth values of the three-dimensional points corresponding to a plurality of neighboring pixel positions, such that consistency between the first image and the second image is high.

（付記項２）
３次元点群処理を実行するようにコンピュータによって実行可能なプログラムを記憶した非一時的記憶媒体であって、
前記３次元点群処理は、
少なくとも撮影位置の関係が予め求められている第１画像及び第２画像と、少なくとも前記撮影位置と計測位置との関係が予め求められている物体の表面上の３次元点群とを受け付け、前記３次元点群の３次元点の各々に対応する前記第１画像上の画素位置を求め、
前記３次元点群の３次元点の各々に対応する前記第１画像上の画素位置に基づいて、前記第１画像上の画素位置の各々について、近傍の複数の画素位置に対応する３次元点の奥行値から、前記第１画像及び前記第２画像の間の整合性が高くなるように、前記奥行値を選択する
非一時的記憶媒体。
(Additional Note 2)
A non-transitory storage medium storing a program executable by a computer to perform three-dimensional point cloud processing,
wherein the three-dimensional point cloud processing includes:
receiving a first image and a second image for which at least the relationship between the shooting positions is determined in advance, and a three-dimensional point cloud on the surface of an object for which at least the relationship between the shooting positions and the measurement position is determined in advance, and determining the pixel position on the first image corresponding to each of the three-dimensional points of the three-dimensional point cloud; and
selecting, for each pixel position on the first image, based on the pixel positions on the first image corresponding to the respective three-dimensional points of the three-dimensional point cloud, a depth value from among the depth values of the three-dimensional points corresponding to a plurality of neighboring pixel positions, such that consistency between the first image and the second image is high.

１０ ３次元点群処理装置
１４ ストレージ
１５ 入力部
１６ 表示部
２０ 入力処理部
２２ 近傍抽出部
２４ 近傍選択部
２６ 平滑化部
３０ コスト計算部
３２ 信念伝播部
５０ ＬｉＤＡＲセンサ
５２ 第１カメラ
５４ 第２カメラ

Reference Signs List
10 3D point cloud processing device
14 Storage
15 Input unit
16 Display unit
20 Input processing unit
22 Neighborhood extraction unit
24 Neighborhood selection unit
26 Smoothing unit
30 Cost calculation unit
32 Belief propagation unit
50 LiDAR sensor
52 First camera
54 Second camera

Claims

1. A three-dimensional point cloud processing device comprising:
an input processing unit that receives a first image and a second image for which at least the relationship between the shooting positions is determined in advance, and a three-dimensional point cloud on the surface of an object for which at least the relationship between the shooting positions and the measurement position is determined in advance, and determines the pixel position on the first image corresponding to each of the three-dimensional points of the three-dimensional point cloud; and
a neighborhood selection unit that selects, for each pixel position on the first image, based on the pixel positions on the first image corresponding to the respective three-dimensional points of the three-dimensional point cloud, a depth value from among the depth values of the three-dimensional points corresponding to a plurality of neighboring pixel positions, such that consistency between the first image and the second image is high.

2. The three-dimensional point cloud processing device according to claim 1, wherein the neighborhood selection unit,
for each pixel position on the first image,
for each of the three-dimensional points corresponding to the plurality of neighboring pixel positions, projects a point consisting of the pixel position on the first image and the depth value of the three-dimensional point onto the second image, and calculates a distance between the pixel value at the pixel position on the first image and the pixel value at the pixel position of the point projected onto the second image, and
the depth value is selected based on a cost function expressed using the distance between the pixel value at the pixel position on the first image and the pixel value at the pixel position of the point projected onto the second image ...

3. The three-dimensional point cloud processing device according to claim 2, wherein the neighborhood selection unit selects, for each pixel position on the first image, the depth value based on the cost function and an energy function expressed using a difference between the depth value selected at a pixel position of interest on the first image and the depth value selected at an adjacent pixel position.

4. The three-dimensional point cloud processing device according to any one of claims 1 to 3, further comprising a smoothing unit that smooths the depth value selected for each pixel position on the first image.

5. The three-dimensional point cloud processing device according to any one of claims 1 to 4, wherein the neighborhood selection unit selects, for each pixel position on the first image, based on the pixel positions on the first image corresponding to the respective three-dimensional points of the three-dimensional point cloud, the depth value from among the depth values of the three-dimensional points corresponding to a plurality of neighboring pixel positions, such that consistency between the first image and the second image is high and the distance to the neighboring pixel position is short.

6. The three-dimensional point cloud processing device according to any one of claims 1 to 5, wherein
the input processing unit:
projects each of the three-dimensional points of the three-dimensional point cloud onto the first image, removes the three-dimensional points projected outside the area of the first image, and obtains a set of elements, each being a combination of a pixel position on the first image and a depth value corresponding to one of the three-dimensional points remaining after the removal; and
projects the point of each element of the set onto the second image, removes the elements projected outside the area of the second image, and updates the set to the elements remaining after the removal, and
the neighborhood selection unit selects, for each pixel position on the first image, based on the pixel positions on the first image of the elements of the set, a depth value from among the depth values of the elements of the set located at a plurality of neighboring pixel positions, such that consistency between the first image and the second image is high and the distance to the neighboring pixel position is short.

7. A three-dimensional point cloud processing method in which:
an input processing unit receives a first image and a second image for which at least the relationship between the shooting positions is determined in advance, and a three-dimensional point cloud on the surface of an object for which at least the relationship between the shooting positions and the measurement position is determined in advance, and determines the pixel position on the first image corresponding to each of the three-dimensional points of the three-dimensional point cloud; and
a neighborhood selection unit selects, for each pixel position on the first image, based on the pixel positions on the first image corresponding to the respective three-dimensional points of the three-dimensional point cloud, a depth value from among the depth values of the three-dimensional points corresponding to a plurality of neighboring pixel positions, such that consistency between the first image and the second image is high.

8. A three-dimensional point cloud processing program for causing a computer to function as the three-dimensional point cloud processing device according to any one of claims 1 to 6.