JP7836005B2

JP7836005B2 - Learning device, three-dimensional reconstruction device, learning method, three-dimensional reconstruction method, and program

Info

Publication number: JP7836005B2
Application number: JP2024569919A
Authority: JP
Inventors: みずき田端; 陽祐竹内; 良牧野; 潤一郎玉松
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2023-01-11
Filing date: 2023-01-11
Publication date: 2026-03-26
Anticipated expiration: 2043-01-11
Also published as: JPWO2024150339A1; WO2024150339A1

Description

This disclosure relates to a learning device, a three-dimensional reconstruction device, a learning method, a three-dimensional reconstruction method, and a program.

Conventionally, it is known to input image features extracted from a panoramic image of the interior space of a structure into an estimation model trained by deep learning (a layout estimation network), have the model output features, and use those features to reconstruct the interior space of the structure in three dimensions (Non-Patent Documents 1 and 2).

It is also known that inputting shape features in addition to image features allows various objects to be reconstructed in three dimensions with high accuracy (Non-Patent Documents 3 and 4). In the example of Non-Patent Document 3, these objects include cars, bicycles, airplanes, and chairs.

Non-Patent Document 1: Sun, C., et al., “HorizonNet: Learning Room Layout With 1D Representation and Pano Stretch Data Augmentation”, CVPR, 2019.
Non-Patent Document 2: Yang, S.-T., et al., “DuLa-Net: A Dual-Projection Network for Estimating Room Layouts from a Single RGB Panorama”, CVPR, 2019.
Non-Patent Document 3: Xiao, Y., et al., “Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects”, BMVC, 2019.
Non-Patent Document 4: Li, Y., et al., “DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection”, CVPR, 2022.

However, the estimation model described above reconstructs the layout of a structure's interior space in three dimensions from image features extracted from a panoramic image. By the nature of such a layout estimation model, information about the interior-space layout cannot be supplied as an input from the outset. The model therefore has difficulty reconstructing the interior space as accurately as a model that receives both image features and shape features.

In view of these circumstances, an object of this disclosure is to provide a learning device, a three-dimensional reconstruction device, a learning method, a three-dimensional reconstruction method, and a program that can reconstruct the interior space of a structure in three dimensions with high accuracy.

The learning device according to this disclosure learns an estimation model for reconstructing the interior space of a structure in three dimensions from a panoramic image of that interior space. The learning device comprises:

- a teacher model estimation unit that uses a teacher model to output processed features when image features extracted from the panoramic image and shape features extracted from the shape information are input. Given the panoramic image and shape information indicating the shape of the interior space, the teacher model outputs processed features indicating characteristics of the correlation among the surfaces defining the interior space. It then decodes the processed features into a feature vector indicating the positions of the corners of those surfaces in the panoramic image;
- a student model estimation unit that uses a student model to output the processed features when image features extracted from the panoramic image are input. Given the panoramic image, the student model outputs the processed features and decodes them into the feature vector; and
- a student model learning unit that calculates a loss indicating the difference between the distribution of the processed features output using the teacher model and the distribution of those output using the student model. It trains the student model as the estimation model so that the loss is minimized on validation data.

The three-dimensional reconstruction device according to this disclosure reconstructs the interior space of a structure in three dimensions from a panoramic image of that interior space. It comprises:

- an image input unit that receives the panoramic image as input;
- an estimation unit that inputs the panoramic image into an estimation model trained by the learning device described above and causes the model to output a feature vector indicating the positions of the corners of the surfaces defining the interior space;
- a three-dimensional coordinate calculation unit that calculates, based on the feature vector, the three-dimensional coordinates of the points constituting the corners; and
- a post-processing unit that reconstructs the interior space in three dimensions based on the three-dimensional coordinates.

The learning method according to this disclosure is executed by a learning device that learns an estimation model for reconstructing the interior space of a structure in three dimensions from a panoramic image of that interior space. The method includes the following steps:

- using a teacher model to output processed features when image features extracted from the panoramic image and shape features extracted from the shape information are input. Given the panoramic image and shape information indicating the shape of the interior space, the teacher model outputs processed features indicating characteristics of the correlation among the surfaces defining the interior space. It then decodes the processed features into a feature vector indicating the positions of the corners of those surfaces in the panoramic image;
- using a student model to output the processed features when image features extracted from the panoramic image are input. Given the panoramic image, the student model outputs the processed features and decodes them into the feature vector; and
- calculating a loss indicating the difference between the distribution of the processed features output using the teacher model and the distribution of those output using the student model, and training the student model as the estimation model so that the loss is minimized on validation data.

The three-dimensional reconstruction method according to this disclosure is executed by a three-dimensional reconstruction device. The device reconstructs the interior space of a structure in three dimensions from a panoramic image of that interior space. The method includes the following steps:

- receiving the panoramic image as input;
- inputting the panoramic image into an estimation model trained by the learning method described above and causing the model to output a feature vector indicating the positions of the corners of the surfaces defining the interior space;
- calculating, based on the feature vector, the three-dimensional coordinates of the points constituting the corners; and
- reconstructing the interior space in three dimensions based on the three-dimensional coordinates.

A program according to this disclosure causes a computer to operate as the learning device described above.

Another program according to this disclosure causes a computer to operate as the three-dimensional reconstruction device described above.

According to this disclosure, the learning device, the three-dimensional reconstruction device, the learning method, the three-dimensional reconstruction method, and the program can reconstruct the interior space of a structure in three dimensions with high accuracy.

A diagram showing an example of the configuration of a learning device according to an embodiment of the present disclosure.
A diagram showing an example of a panoramic image.
A diagram showing an example of a feature vector.
A diagram showing an example of the configuration of a three-dimensional reconstruction device according to an embodiment of the present disclosure.
A diagram showing a three-dimensional image of a structure constructed by the three-dimensional reconstruction device shown in Figure 3.
A flowchart showing an example of the operation of the learning device shown in Figure 1.
A flowchart showing an example of the operation of the three-dimensional reconstruction device shown in Figure 3.
A hardware block diagram of the learning device.

This embodiment is described below with reference to the drawings as appropriate. In the drawings, identical or corresponding parts are given the same reference numerals, and their descriptions are omitted or simplified as appropriate. The embodiment described below is an example of the configuration of the present disclosure, and the present invention is not limited to it.

<Configuration of the learning device>
The learning device 100 learns an estimation model for reconstructing the interior space of a structure in three dimensions from a panoramic image of that interior space. As shown in Figure 1, the learning device 100 comprises a teacher model image input unit 11, a shape information input unit 12, a teacher model estimation unit 13, a student model image input unit 14, a student model estimation unit 15, and a student model learning unit 16. The learning device 100 is a computer, such as a server belonging to a cloud computing system or another computing system.

The teacher model image input unit 11, the shape information input unit 12, and the student model image input unit 14 are implemented by an input interface. The input interface is, for example, physical keys, capacitive keys, a pointing device, a touchscreen integrated with a display, or a microphone. The input interface may also include a communication interface, which may use standards such as Ethernet®, FDDI (Fiber Distributed Data Interface), or Wi-Fi®.

The teacher model estimation unit 13, the student model estimation unit 15, and the student model learning unit 16 are implemented by a controller. The controller may consist of dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), of a processor, or of both.

The teacher model image input unit 11 receives as input a panoramic image D1 of the interior space of a structure.

Figure 2A shows an example of a panoramic image. In this embodiment, the input to the learning device 100 is a single panoramic image with dimensions of 3 × 512 × 1024 (channels × height × width). As shown in Figure 2A, the panoramic image D1 is obtained by photographing the interior space of a structure with a camera or the like. The capture covers angles from -90° to +90° vertically and from 0° to 360° horizontally. The panoramic image D1 therefore shows the surfaces that define the interior space of the structure (e.g., the ceiling, walls, and floor).
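The angular ranges above imply an equirectangular layout, in which each pixel column corresponds to a horizontal angle and each row to a vertical angle. The Python sketch below converts between pixel positions and angles for the 3 × 512 × 1024 input. The function names are hypothetical. Two conventions are also assumptions rather than details given in the embodiment:

- the top row corresponds to +90°;
- angles are sampled at pixel centres.

```python
# Panorama size taken from the embodiment: 3 x 512 x 1024 (channels x height x width).
HEIGHT, WIDTH = 512, 1024

def pixel_to_angles(u, v, width=WIDTH, height=HEIGHT):
    """Map pixel (column u, row v) to (longitude, latitude) in degrees.

    Assumption: columns span 0..360 degrees left to right, rows span
    +90 (top) .. -90 (bottom) degrees, and angles refer to pixel centres.
    """
    longitude = (u + 0.5) / width * 360.0
    latitude = 90.0 - (v + 0.5) / height * 180.0
    return longitude, latitude

def angles_to_pixel(longitude, latitude, width=WIDTH, height=HEIGHT):
    """Inverse of pixel_to_angles; returns fractional (u, v) pixel coordinates."""
    u = longitude / 360.0 * width - 0.5
    v = (90.0 - latitude) / 180.0 * height - 0.5
    return u, v
```

For example, the pixel centre in the middle of the image, `pixel_to_angles(511.5, 255.5)`, maps to a longitude of 180° on the horizon (latitude 0°).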

The teacher model image input unit 11 may receive the panoramic image D1 by any method. For example, it may receive the image from an external device equipped with a camera, or read it from memory.

The shape information input unit 12 receives as input shape information D2, which indicates the shape of the interior space of the structure. For example, the shape information D2 may be a drawing of the interior shape of the structure, such as a plan view or a side view.

The teacher model estimation unit 13 uses the teacher model to output processed features when image features extracted from the panoramic image D1 and shape features extracted from the shape information D2 are input. The teacher model takes the panoramic image D1 and the shape information D2 indicating the shape of the interior space. It outputs processed features indicating characteristics of the correlation among the surfaces defining the interior space. It then decodes these processed features and outputs a feature vector indicating the positions of the corners of those surfaces in the panoramic image D1.

The feature vector indicates the positions of the corners of the surfaces that define the interior space. These surfaces include the ceiling, walls, and floor. The corners may be the boundaries between walls, between a wall and the floor, and between a wall and the ceiling.

Specifically, for each column of the panoramic image, the feature vector indicates:

- the position of the wall-ceiling boundary;
- the position of the wall-floor boundary; and
- the position of the wall-wall boundary.

The position of a wall-wall boundary may be a position where the probability that such a boundary exists exceeds a predetermined value. Alternatively, the feature vector may give the probability that a wall-wall boundary exists in each column instead of its position.
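As a rough illustration of the per-column representation, the sketch below stores the three rows of the feature vector and reads off corner columns from the wall-wall probabilities. The following choices are hypothetical and not part of the disclosure:

- the container `LayoutVector` and the function `corner_columns`;
- a default threshold of 0.5, standing in for the "predetermined value";
- an added local-maximum test, so that several neighbouring columns around one corner are not all reported.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LayoutVector:
    """Hypothetical container for the per-column feature vector of a W-column panorama.

    y_c[i]: wall-ceiling boundary position in column i
    y_f[i]: wall-floor boundary position in column i
    y_w[i]: probability that a wall-wall boundary (corner) lies in column i
    """
    y_c: List[float]
    y_f: List[float]
    y_w: List[float]

def corner_columns(y_w, threshold=0.5):
    """Return the columns whose wall-wall probability exceeds `threshold`
    and is a local maximum. The panorama is treated as horizontally
    circular, so column 0 neighbours column W-1 (an assumption)."""
    n = len(y_w)
    peaks = []
    for i, p in enumerate(y_w):
        left, right = y_w[(i - 1) % n], y_w[(i + 1) % n]
        # Using >= on the left and > on the right keeps one column per plateau.
        if p > threshold and p >= left and p > right:
            peaks.append(i)
    return peaks
```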

The teacher model is obtained by building a neural network that takes the panoramic image D1 and the shape information D2 as inputs and outputs a feature vector, and then training that network. The teacher model may be trained so that the teacher model loss L_T on validation data is minimized. L_T is calculated with a loss function. It represents the difference between two distributions: the processed features output by the teacher model feature processor 133 (described in detail later) and the true values.

The teacher model estimation unit 13 comprises a teacher model image feature extractor 131, a shape feature extractor 132, a teacher model feature processor 133, and a teacher model feature decoder 134.

The teacher model image feature extractor 131 extracts image features from the panoramic image D1 received by the teacher model image input unit 11. Image features are quantities representing image characteristics shown in the panoramic image D1, such as edges and corners inside the structure. The extractor 131 may use any method; for example, it may use the known HorizonNet.

The shape feature extractor 132 extracts shape features from the shape information D2 received by the shape information input unit 12, which indicates the shape of the interior space of the structure. The shape features can be, for example, three-dimensional coordinates representing the shape of the interior space. The shape feature extractor 132 may use any method to extract them.
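The embodiment leaves the extraction method open. The sketch below shows one hypothetical way to obtain three-dimensional coordinates as shape features. It assumes that:

- the room footprint can be read from a plan view as an ordered polygon;
- the ceiling height can be read from a side view.

The function name, the metre units, and the fixed floor-then-ceiling ordering are also assumptions.

```python
def plan_to_shape_features(plan_xy, ceiling_height):
    """Hypothetical shape-feature extraction from a floor plan and a ceiling height.

    plan_xy: ordered (x, y) room-corner positions read from a plan view, in metres.
    ceiling_height: floor-to-ceiling height read from a side view, in metres.
    Returns a flat list [x0, y0, 0, x0, y0, h, x1, y1, 0, ...], i.e. the 3D
    floor and ceiling point of every corner in a fixed order (an assumption).
    """
    features = []
    for x, y in plan_xy:
        features.extend([x, y, 0.0])              # floor point of this corner
        features.extend([x, y, ceiling_height])   # ceiling point of this corner
    return features
```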

The teacher model feature processor 133 receives image features extracted from the panoramic image D1 and shape features extracted from the shape information. From these, it outputs processed features indicating characteristics of the correlation among the surfaces defining the interior space. Specifically, its inputs are the image features extracted by the teacher model image feature extractor 131 and the shape features extracted by the shape feature extractor 132.

More specifically, the teacher model feature processor 133 can obtain the processed features by processing the image features and shape features with the methods described in References 1 to 3.
Reference 1: Yu, Z., et al., “Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering”, ICCV, 2017.
Reference 2: Yu, Z., et al., “Deep Modular Co-Attention Networks for Visual Question Answering”, CVPR, 2019.
Reference 3: Ben-younes, H., et al., “MUTAN: Multimodal Tucker Fusion for Visual Question Answering”, ICCV, 2017.
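The embodiment does not specify which of References 1 to 3 is used, or with what dimensions. The Python code below is a minimal sketch of the first option: multi-modal factorized bilinear (MFB) pooling in the spirit of Reference 1. It works in four steps:

1. Project both feature vectors to o·k dimensions.
2. Multiply the projections element-wise.
3. Sum-pool over windows of k.
4. Apply power normalisation, then L2 normalisation.

The matrices `U` and `V` stand in for learned weights, which is an assumption. The pure-Python list arithmetic is for readability only.

```python
import math

def matvec(w, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def mfb_fuse(x, y, U, V, k):
    """MFB-style fusion of an image feature x and a shape feature y.

    U has o*k rows of length len(x); V has o*k rows of length len(y).
    In a trained model U and V would be learned (assumed here to be given).
    Returns an o-dimensional fused feature with unit L2 norm
    (or all zeros if the pooled result is zero).
    """
    ux, vy = matvec(U, x), matvec(V, y)
    joint = [a * b for a, b in zip(ux, vy)]                      # element-wise product
    z = [sum(joint[j:j + k]) for j in range(0, len(joint), k)]   # sum-pool over windows of k
    z = [math.copysign(math.sqrt(abs(v)), v) for v in z]         # power normalisation
    norm = math.sqrt(sum(v * v for v in z)) or 1.0               # avoid division by zero
    return [v / norm for v in z]                                 # L2 normalisation
```

For example, with two-dimensional inputs, four projection rows, and `k = 2`, the fused output has two components and unit L2 norm.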

The teacher model feature decoder 134 decodes the processed features output by the teacher model feature processor 133. From them, it outputs a feature vector indicating the positions of the corners of the surfaces defining the interior space of the structure in the panoramic image D1. The teacher model feature decoder 134 may use, for example, HorizonNet to output the feature vector.

The student model image input unit 14 receives as input a panoramic image D1 of the interior of a structure. This is the same panoramic image D1 as the one received by the teacher model image input unit 11.

The student model estimation unit 15 uses the student model to output processed features when image features extracted from the panoramic image D1 are input. Given the panoramic image D1, the student model outputs processed features indicating characteristics of the correlation among the surfaces defining the interior space. It then decodes them to output a feature vector indicating the positions of the corners of those surfaces. The student model is obtained by training a neural network that takes the panoramic image D1 as input and outputs a feature vector.

The student model may be trained so that the student model loss L_S on validation data is minimized. L_S is calculated with a loss function. It is the difference between two distributions: the processed features output by the student model feature processor 152 (described in detail later) and the true values.

The student model estimation unit 15 comprises a student model image feature extractor 151, a student model feature processor 152, and a student model feature decoder 153.

The student model image feature extractor 151 extracts image features from the panoramic image D1 received by the student model image input unit 14.

The student model feature processor 152 outputs processed features when image features extracted from the panoramic image D1 are input to the student model. These processed features indicate characteristics of the correlation among the surfaces defining the interior space. Specifically, its input is the image features extracted by the student model image feature extractor 151. For example, the processor 152 can produce the processed features by applying the Transformer described in Reference 4 to those image features.
Reference 4: Vaswani, A., et al., “Attention Is All You Need”, NIPS, 2017.
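The core operation of the Transformer in Reference 4 is scaled dot-product attention, softmax(Q Kᵀ / sqrt(d_k)) V. The sketch below applies a single attention head to a sequence of per-column image features, so that every column can attend to every other column. This is one plausible way to relate surfaces seen in different parts of the panorama.

A full Transformer also uses multiple heads, feed-forward layers, residual connections, and normalisation. The single-head simplification and the function names here are assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.

    x: sequence of per-column image features, shape (L x d_model).
    wq, wk, wv: projection matrices, shape (d_model x d_k); in a trained
    model these would be learned (assumed here to be given).
    Returns an (L x d_k) sequence in which each column attends to all columns.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)  # attention weights over all columns
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d_k)])
    return out
```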

The student model feature decoder 153 decodes the processed features output by the student model feature processor 152. In this way it estimates a feature vector indicating the positions of the corners of the surfaces defining the interior space of the structure. Like the teacher model feature decoder 134, the decoder 153 may use, for example, HorizonNet to output this feature vector for the panoramic image D1 from the processed features.

The student model learning unit 16 trains the student model described above. It sets a loss function based on two distributions:

- the processed features output by the teacher model feature processor 133 using the teacher model; and
- the processed features output by the student model feature processor 152 using the student model.

Specifically, the student model learning unit 16 calculates a loss L_feature representing the difference between these two distributions. It then trains the student model as the estimation model so that L_feature on validation data is minimized.
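The embodiment states only that L_feature measures the difference between the teacher's and the student's processed-feature distributions. A common choice in knowledge distillation, assumed here rather than taken from the disclosure, is the Kullback-Leibler divergence between softmax-normalised features, optionally softened by a temperature:

```python
import math

def softmax(xs, temperature=1.0):
    """Softmax with temperature; higher temperatures give softer distributions."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def feature_distillation_loss(teacher_feat, student_feat, temperature=1.0):
    """KL(P_teacher || P_student) between softmax-normalised processed features.

    Assumption: the disclosure does not name the loss; KL divergence on
    softmax distributions is one common choice for distribution matching.
    """
    p = softmax(teacher_feat, temperature)
    q = softmax(student_feat, temperature)
    # Softmax outputs are strictly positive, so the logarithms are defined.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
```

The loss is zero when the student reproduces the teacher's features exactly, and positive otherwise.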

The feature vector output by the estimation model trained by the learning device 100 has size 3 × 1 × 1024. As described above, in one configuration the feature vector indicates the probability that a wall-wall boundary exists. In that case, for each column of the panoramic image, it gives three values:

- $y_c$: the ceiling-wall boundary position;
- $y_f$: the floor-wall boundary position;
- $y_w$: the probability that a wall-wall boundary (i.e., a corner) exists.

$y_w$ could be a binary vector with labels 0 or 1. However, the 1s would be sparse (for example, 4 out of 1024), so $y_w(i) = c^{d_x}$ may be used instead. Here $i$ denotes the $i$-th column, $d_x$ is the distance from the $i$-th column to the nearest column containing a wall-wall boundary, and $c$ is a constant (for example, $c = 0.96$).
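The soft target $y_w(i) = c^{d_x}$ can be generated from a list of annotated corner columns as below. This Python sketch assumes that distances wrap around the 0°/360° seam, because the left and right edges of a panorama are adjacent. The disclosure says only "the nearest column", and the function name is hypothetical.

```python
def corner_probability_target(corner_columns, width=1024, c=0.96):
    """Soft wall-wall boundary target y_w(i) = c ** d_x for every column i.

    corner_columns: column indices that contain a wall-wall boundary.
    d_x is the distance in columns from i to the nearest corner column.
    Assumption: distances wrap around the panorama seam.
    """
    target = []
    for i in range(width):
        d_x = min(min(abs(i - j), width - abs(i - j)) for j in corner_columns)
        target.append(c ** d_x)
    return target
```

With four corners in a 1024-column panorama, the target is 1.0 in the four corner columns and decays smoothly away from them. This gives the estimator a denser training signal than a binary vector would.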

Figure 2B shows an example of the feature vector output when the panoramic image shown in Figure 2A is input to the learning device 100. To make the correspondence between the two easier to see, the positions indicated by $y_c$ and $y_f$ are overlaid on the panoramic image. $y_w(i)$ is shown above the panoramic image, stretched vertically for readability.

<Configuration of the three-dimensional reconstruction device>
The three-dimensional reconstruction device 300 reconstructs the interior space of a structure in three dimensions from a panoramic image D1 of that interior space. The three-dimensional reconstruction device 300 is a computer, such as a server belonging to a cloud computing system or another computing system.

As shown in Figure 3, the three-dimensional reconstruction device 300 comprises a student model storage unit 31, an image input unit 32, an estimation unit 33, a three-dimensional coordinate calculation unit 34, and a post-processing unit 35.

The student model storage unit 31 is implemented by memory, such as an HDD (Hard Disk Drive), SSD (Solid State Drive), EEPROM (Electrically Erasable Programmable Read-Only Memory), ROM (Read Only Memory), or RAM (Random Access Memory). The image input unit 32 is implemented by an input interface.

The estimation unit 33 and the post-processing unit 35 are implemented by a controller. The controller may consist of dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), of a processor, or of both.

The student model storage unit 31 stores the student model (estimation model) trained by the learning device 100.

The image input unit 32 receives as input a panoramic image D1 of the interior space of the structure.

The estimation unit 33 inputs the panoramic image D1 to the student model trained by the learning device 100 described above. It causes the model to output a feature vector indicating the positions of the corners of the interior space in the panoramic image D1, and this output is the estimated feature vector.
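
The source does not state how the feature vector is decoded into discrete corners. The following is a minimal sketch under one assumption: a HorizonNet-style 1D layout (Non-Patent Document 1), in which y_w(i) is a per-column corner score for an equirectangular panorama. Under that assumption, corner columns can be picked as local maxima above a threshold. The function name `detect_corner_columns`, the `threshold` and `window` values, and the sample scores are all hypothetical.

```python
from typing import List, Sequence


def detect_corner_columns(
    y_w: Sequence[float], threshold: float = 0.5, window: int = 5
) -> List[int]:
    """Return the column indices i where y_w[i] is a local maximum above threshold.

    The sequence is treated as circular, because the left and right edges of
    an equirectangular panorama are adjacent. On a plateau of equal scores,
    only the leftmost column is kept.
    """
    width = len(y_w)
    corners = []
    for i, score in enumerate(y_w):
        if score < threshold:
            continue
        left = [y_w[(i - k) % width] for k in range(1, window + 1)]
        right = [y_w[(i + k) % width] for k in range(1, window + 1)]
        # Strictly greater than the left neighbours, at least as large as the right ones.
        if all(score > q for q in left) and all(score >= q for q in right):
            corners.append(i)
    return corners


# Hypothetical per-column corner scores for a 12-column panorama.
scores = [0.1, 0.9, 0.2, 0.1, 0.1, 0.8, 0.3, 0.1, 0.1, 0.1, 0.7, 0.1]
print(detect_corner_columns(scores, threshold=0.5, window=2))  # [1, 5, 10]
```

Wrapping the neighbourhood around the image edge matters here. Without it, a corner that straddles the seam of the panorama would be missed or detected twice.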

The three-dimensional coordinate calculation unit 34 calculates the three-dimensional coordinates of the points constituting the corners, based on the feature vector estimated by the estimation unit 33.
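
The coordinate calculation itself is not spelled out in the source. The sketch below rests on three assumptions:

- the panorama is equirectangular;
- the camera height is known (1.6 m is an arbitrary placeholder);
- as in HorizonNet, y_c and y_f give the image rows of the ceiling-wall and floor-wall boundaries at a corner column.

With latitudes φ_f < 0 (floor boundary) and φ_c > 0 (ceiling boundary), the horizontal distance is d = h_cam / tan(-φ_f) and the ceiling height above the camera is d · tan(φ_c). All function names are hypothetical.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def pixel_to_angles(u: float, v: float, width: int, height: int) -> Tuple[float, float]:
    """Map an equirectangular pixel (u, v) to (longitude, latitude) in radians.

    Longitude runs from -pi to pi across the image; latitude is +pi/2 at the
    top edge and -pi/2 at the bottom edge.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    return lon, lat


def corner_to_3d(u, v_ceil, v_floor, width, height, camera_height=1.6) -> Tuple[Point3, Point3]:
    """Return (floor_point, ceiling_point) for one corner column.

    The camera sits at the origin and y points up, so the floor lies at y = -camera_height.
    """
    lon, lat_floor = pixel_to_angles(u, v_floor, width, height)
    _, lat_ceil = pixel_to_angles(u, v_ceil, width, height)
    if lat_floor >= 0.0 or lat_ceil <= 0.0:
        raise ValueError("floor boundary must be below the horizon, ceiling boundary above it")
    dist = camera_height / math.tan(-lat_floor)  # horizontal distance to the corner
    ceil_y = dist * math.tan(lat_ceil)          # ceiling height above the camera
    x, z = dist * math.cos(lon), dist * math.sin(lon)
    return (x, -camera_height, z), (x, ceil_y, z)


# A corner straight ahead whose floor and ceiling boundaries are both 45 degrees off the horizon.
floor_pt, ceil_pt = corner_to_3d(511.5, 127.5, 383.5, width=1024, height=512)
print(floor_pt, ceil_pt)  # approximately (1.6, -1.6, 0.0) and (1.6, 1.6, 0.0)
```

Using the floor boundary to fix the distance, and only then the ceiling boundary to fix the height, is what lets a single camera-height assumption set the metric scale of the whole room.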

The post-processing unit 35 reconstructs the interior space in three dimensions based on the three-dimensional coordinates calculated by the three-dimensional coordinate calculation unit 34. Specifically, the post-processing unit 35 generates a three-dimensional image D3, as shown in Figure 4, from the three-dimensional coordinates, and outputs the three-dimensional image D3 to a display device or the like.
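
The source leaves open how the three-dimensional image D3 is built. Here is a minimal sketch, assuming a room whose ordered floor corners are known and whose floor and ceiling are horizontal. The outline is extruded into wall quads and written as Wavefront OBJ text, which common 3D viewers can display. The function names and the choice of OBJ as the output format are assumptions.

```python
from typing import List, Sequence, Tuple


def extrude_room(corners_xz: Sequence[Tuple[float, float]], floor_y: float, ceil_y: float):
    """Build vertices and faces for a room whose floor outline is corners_xz (in order)."""
    n = len(corners_xz)
    if n < 3:
        raise ValueError("a room outline needs at least three corners")
    vertices = [(x, floor_y, z) for x, z in corners_xz] + [(x, ceil_y, z) for x, z in corners_xz]
    # Each wall joins floor corners i and i+1 with the ceiling corners directly above them.
    walls = [(i, (i + 1) % n, (i + 1) % n + n, i + n) for i in range(n)]
    floor = tuple(range(n))
    # Reversed winding, so the ceiling normal points opposite to the floor normal.
    ceiling = tuple(range(2 * n - 1, n - 1, -1))
    return vertices, walls + [floor, ceiling]


def to_obj(vertices: List[Tuple[float, float, float]], faces) -> str:
    """Serialize to Wavefront OBJ text (OBJ face indices are 1-based)."""
    lines = [f"v {x:.4f} {y:.4f} {z:.4f}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


# A hypothetical 4 m x 3 m rectangular room, camera 1.6 m above the floor.
vertices, faces = extrude_room([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)], floor_y=-1.6, ceil_y=1.0)
print(to_obj(vertices, faces))
```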

<Operation of the learning device>
Next, the operation of the learning device 100 according to this embodiment is described with reference to Figure 5. Figure 5 is a flowchart showing an example of the operation of the learning device 100 according to this embodiment. This operation corresponds to an example of the method, performed by the learning device 100, of learning an estimation model for three-dimensionally reconstructing the interior space of a structure from a panoramic image D1 of that interior space.

First, in steps S11 to S15, the teacher model estimation unit 13 uses the teacher model to output processed features from the image features extracted from the panoramic image D1 and the shape features extracted from the shape information D2. As described above, the teacher model is a model that, when the panoramic image D1 and the shape information D2 indicating the shape of the interior space are input, outputs processed features representing features related to the correlation among the planes defining the interior space. It then decodes these processed features to output a feature vector indicating the positions of the corners of those planes.

Specifically, in step S11, the teacher model image input unit 11 receives a panoramic image D1 of the interior space of the structure.

In step S12, the teacher model image feature extractor 131 extracts image features from the panoramic image D1.

In step S13, the shape information input unit 12 receives shape information D2 indicating the shape of the interior space of the structure.

In step S14, the shape feature extractor 132 extracts shape features from the shape information D2.

In step S15, the teacher model estimation unit 13 takes as input the image features extracted from the panoramic image D1 and the shape features extracted from the shape information D2. It causes the teacher model to output processed features representing features related to the correlation among the planes defining the interior space. Specifically, the teacher model feature processor 133 inputs the image features extracted in step S12 and the shape features extracted in step S14 into the teacher model and causes it to output the processed features.
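
The source does not say how the teacher model feature processor 133 combines its two inputs. The sketch below uses the simplest stand-in, under these assumptions:

- the image features are per-column vectors and there is a single global shape feature vector;
- the shape vector is concatenated onto every column;
- a linear projection then brings the teacher's processed features to the same width as the student's, which any comparison of the two in step S19 requires.

All names, dimensions, weights, and the concatenation scheme itself are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fuse_columns(image_cols: Sequence[Sequence[float]], shape_feat: Sequence[float]) -> Matrix:
    """Append the global shape feature to every column of the image feature sequence."""
    return [list(col) + list(shape_feat) for col in image_cols]


def project_columns(cols: Sequence[Sequence[float]], weight: Sequence[Sequence[float]]) -> Matrix:
    """Apply a linear map (weight has shape out_dim x in_dim) to each column."""
    in_dim = len(weight[0])
    if any(len(col) != in_dim for col in cols):
        raise ValueError("column width does not match the weight matrix")
    return [[sum(w * x for w, x in zip(row, col)) for row in weight] for col in cols]


# Three panorama columns with 2-D image features and a 1-D shape feature.
image_cols = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shape_feat = [2.0]
fused = fuse_columns(image_cols, shape_feat)   # each column now has width 3
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]    # hypothetical learned 2 x 3 projection
teacher_processed = project_columns(fused, weight)
print(teacher_processed)  # [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
```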

Next, in steps S16 to S18, the student model estimation unit 15 uses the student model to output processed features from the image features extracted from the panoramic image D1. As described above, the student model is a model that, when the panoramic image D1 is input, outputs processed features representing features related to the correlation among the planes defining the interior space. It then decodes these processed features to output a feature vector indicating the positions of the corners of those planes in the panoramic image D1.

Specifically, in step S16, the student model image input unit 14 receives a panoramic image D1 of the interior of the structure.

In step S17, the student model image feature extractor 151 extracts image features from the panoramic image D1.

In step S18, the student model estimation unit 15 takes as input the image features extracted from the panoramic image D1. It causes the student model to output processed features representing features related to the correlation among the planes defining the interior space. Specifically, the student model feature processor 152 inputs the image features extracted in step S17 into the student model and causes it to output the processed features.

Next, in steps S19 and S20, the student model learning unit 16 trains the student model.

Specifically, in step S19, the student model learning unit 16 calculates a loss L_feature. This loss represents the difference between the distribution of the processed features output by the teacher model in step S15 and the distribution of the processed features output by the student model in step S18.
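
The source defines L_feature only as the difference between two distributions. A common choice for such a distillation loss, assumed here and not confirmed by the source, is the temperature-scaled Kullback-Leibler divergence between softmax-normalized features, averaged over the W panorama columns:

L_feature = (T² / W) · Σ_i Σ_k p_{i,k} (log p_{i,k} − log q_{i,k})

Here p_i = softmax(t_i / T) is computed from the teacher's processed features and q_i = softmax(s_i / T) from the student's. The T² factor follows the usual knowledge-distillation convention. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of xs / temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def feature_distillation_loss(teacher_cols, student_cols, temperature: float = 1.0) -> float:
    """Mean per-column KL(teacher || student) over softmax-normalized features, times T^2."""
    if len(teacher_cols) != len(student_cols) or not teacher_cols:
        raise ValueError("teacher and student must have the same, non-zero number of columns")
    total = 0.0
    for t, s in zip(teacher_cols, student_cols):
        p = softmax(t, temperature)
        q = softmax(s, temperature)
        # Terms with p == 0 contribute nothing (0 * log 0 is taken as 0).
        total += sum(pk * (math.log(pk) - math.log(qk)) for pk, qk in zip(p, q) if pk > 0.0)
    return total / len(teacher_cols) * temperature ** 2


teacher = [[2.0, 1.0], [1.0, 2.0]]
print(feature_distillation_loss(teacher, teacher))                     # 0.0
print(feature_distillation_loss(teacher, [[1.0, 2.0], [2.0, 1.0]]) > 0)  # True
```

Because the loss compares normalized distributions rather than raw values, the student is pushed to reproduce the relative pattern of the teacher's features. It is not required to match their absolute scale.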

In step S20, the student model learning unit 16 trains the student model so that the loss L_feature takes its minimum value on the validation data.
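
Step S20 states only that the loss should be minimal on the validation data. One plausible reading, assumed here, is checkpoint selection, which is the idea behind early stopping:

1. Train on the training data.
2. Evaluate L_feature on held-out validation data after every epoch.
3. Keep the parameters with the lowest validation loss.

`train_step` and `validation_loss` are hypothetical callables. In the toy usage, a single scalar parameter stands in for the real networks.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")


def train_with_validation(
    params: P,
    train_step: Callable[[P], P],
    validation_loss: Callable[[P], float],
    epochs: int,
) -> Tuple[P, float]:
    """Run `epochs` training steps and return the parameters with the lowest validation loss."""
    best_params, best_loss = params, validation_loss(params)
    for _ in range(epochs):
        params = train_step(params)
        loss = validation_loss(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Toy stand-in: a scalar "student" whose validation loss is lowest at 2.0.
# The fixed-size update overshoots, so the last epoch is not the best one.
best, best_loss = train_with_validation(
    0.0, train_step=lambda w: w + 0.5, validation_loss=lambda w: (w - 2.0) ** 2, epochs=6
)
print(best, best_loss)  # 2.0 0.0
```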

In the above description, the learning device 100 executes steps S16 to S18 after steps S11 to S15, but the order is not limited to this. For example, the learning device 100 may execute steps S11 to S15 after steps S16 to S18, or may execute steps S16 to S18 concurrently with steps S11 to S15.

Similarly, the learning device 100 executes steps S13 and S14 after steps S11 and S12, but the order is not limited to this. For example, the learning device 100 may execute steps S11 and S12 after steps S13 and S14, or may execute steps S13 and S14 concurrently with steps S11 and S12.

<Operation of the three-dimensional reconstruction device>
Next, the operation of the three-dimensional reconstruction device 300 according to this embodiment is described with reference to Figure 6. Figure 6 is a flowchart showing an example of the operation of the three-dimensional reconstruction device 300 according to this embodiment. This operation corresponds to an example of the method, performed by the three-dimensional reconstruction device 300, of three-dimensionally reconstructing the interior space of a structure from a panoramic image D1 of that interior space.

In step S31, the image input unit 32 receives a panoramic image D1 of the interior of the structure.

In step S32, the estimation unit 33 inputs the panoramic image D1 received in step S31 into the student model trained by the learning device 100 described above. It causes the model to output a feature vector indicating the positions of the corners of the interior space in the panoramic image D1, and this output is the estimated feature vector.

In step S33, the three-dimensional coordinate calculation unit 34 calculates the three-dimensional coordinates of the points constituting the corners, based on the feature vector estimated by the estimation unit 33.

In step S34, the post-processing unit 35 reconstructs the interior space in three dimensions based on the three-dimensional coordinates calculated in step S33.

As described above, the learning device 100 of this embodiment learns an estimation model for three-dimensionally reconstructing the interior space of a structure from a panoramic image D1 of that interior space. It comprises a teacher model estimation unit 13, a student model estimation unit 15, and a student model learning unit 16.

The teacher model estimation unit 13 uses a teacher model. When the panoramic image D1 and shape information D2 indicating the shape of the interior space are input, the teacher model outputs processed features representing features related to the correlation among the planes defining the interior space, and decodes them into a feature vector indicating the positions of the corners of those planes in the panoramic image. Given the image features extracted from the panoramic image D1 and the shape features extracted from the shape information D2, the teacher model estimation unit 13 outputs the processed features.

The student model estimation unit 15 uses a student model. When the panoramic image D1 is input, the student model outputs processed features of the same kind and decodes them into a feature vector indicating the positions of the corners of the planes defining the interior space in the panoramic image D1. Given the image features extracted from the panoramic image D1, the student model estimation unit 15 outputs the processed features.

The student model learning unit 16 calculates a loss L_feature representing the difference between the distribution of the processed features output by the teacher model estimation unit 13 and the distribution of the processed features output by the student model estimation unit 15. It then trains the student model as the estimation model so that the loss on the validation data is minimized.

As a result, the three-dimensional reconstruction device 300 can use the estimation model learned by the learning device 100 to estimate feature vectors without shape features, with accuracy comparable to that obtained when shape features are used. The three-dimensional reconstruction device 300 can therefore calculate, with high accuracy, the three-dimensional coordinates of the points constituting the corners of the interior space of a structure, and can reconstruct the interior space in three dimensions with high accuracy.

This has two example benefits. Workers may detect deterioration of the surfaces defining the interior space during inspections or the like; reflecting it in the three-dimensional image D3 yields a more accurate model of the structure showing the deteriorated areas. Reflecting furniture placed in the interior space, wallpaper applied to the surfaces defining it, and the like in the three-dimensional image D3 likewise improves modeling accuracy. In addition, because the three-dimensional reconstruction device 300 does not use shape features, it can estimate the three-dimensional coordinates quickly.

Furthermore, the three-dimensional reconstruction device 300 of this embodiment reconstructs the interior space of a structure in three dimensions from a panoramic image D1 of that interior space. It comprises:

- an image input unit 32 that receives the panoramic image D1;
- an estimation unit 33 that inputs the panoramic image D1 to an estimation model learned by the learning device 100 described above and causes it to output a feature vector indicating the positions of the corners of the planes defining the interior space;
- a three-dimensional coordinate calculation unit 34 that calculates, based on the feature vector, the three-dimensional coordinates of the points constituting the corners;
- a post-processing unit 35 that reconstructs the interior space in three dimensions based on the three-dimensional coordinates.

As a result, the three-dimensional reconstruction device 300 can calculate the three-dimensional coordinates of the planes defining the interior space with high accuracy without using shape features, and can therefore reconstruct the interior space in three dimensions with high accuracy.

This has two example benefits. Workers may detect deterioration of the surfaces defining the interior space during inspections or the like; reflecting it in the three-dimensional image D3 yields a more accurate model of the structure showing the deteriorated areas. Reflecting furniture placed in the interior space, wallpaper applied to the surfaces defining it, and the like in the three-dimensional image D3 likewise improves modeling accuracy. In addition, because the three-dimensional reconstruction device 300 does not use shape features, it can reconstruct the interior space in three dimensions quickly.

<Program>
The learning device 100 and the three-dimensional reconstruction device 300 described above can be implemented by a computer. Programs for causing a computer to function as the learning device 100 and as the three-dimensional reconstruction device 300 may also be provided, either stored on a storage medium or via a network. Figure 7 is a block diagram showing the schematic configuration of a computer 401 functioning as the learning device 100; a computer functioning as the three-dimensional reconstruction device 300 is configured in the same way. The computer 401 may be a general-purpose computer, a dedicated computer, a workstation, a PC (Personal Computer), an electronic notepad, or the like. Program instructions may be program code, code segments, or the like for executing the required tasks.

As shown in Figure 7, the computer 401 comprises a processor 410, a ROM (Read Only Memory) 420, a RAM (Random Access Memory) 430, a storage 440, an input unit 450, an output unit 460, and a communication interface (I/F) 470. These components are communicably connected to one another via a bus 480. The processor 410 is, specifically, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), an SoC (System on a Chip), or the like, and may consist of multiple processors of the same or different types.

The processor 410 controls each component and performs various arithmetic operations. That is, the processor 410 reads a program from the ROM 420 or the storage 440 and executes it using the RAM 430 as a work area. In doing so, it controls the components and performs the arithmetic operations according to that program. In the embodiment described above, the program according to this disclosure is stored in the ROM 420 or the storage 440.

The program may be stored on a storage medium readable by the computer 401, and can be installed on the computer 401 from such a medium. The storage medium storing the program may be a non-transitory storage medium. The non-transitory storage medium is not particularly limited and may be, for example, a CD-ROM, a DVD-ROM, or a USB (Universal Serial Bus) memory. The program may also be downloaded from an external device via a network.

The ROM 420 stores various programs and data. The RAM 430 temporarily stores programs or data as a work area. The storage 440 consists of an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores various programs and data, including an operating system.

The input unit 450 includes one or more input interfaces that receive user input operations and acquire information based on those operations. Examples of the input unit 450 include, without limitation, a pointing device, a keyboard, and a mouse.

The output unit 460 includes one or more output interfaces that output information. Examples of the output unit 460 include, without limitation, a display that outputs information as video and a speaker that outputs information as audio. When the output unit 460 is a touch-panel display, it also functions as the input unit 450.

The communication interface 470 is an interface for communicating with external devices.

The following additional notes are further disclosed with respect to the embodiments described above.
[Additional note 1]
A learning device for learning an estimation model for three-dimensionally reconstructing the interior space of a structure from a panoramic image of the interior space, the learning device comprising a controller, wherein the controller:
uses a teacher model, which, when the panoramic image and shape information indicating the shape of the interior space are input, outputs processed features representing features related to the correlation among the planes defining the interior space and decodes the processed features to output a feature vector indicating the positions of the corners of the planes defining the interior space in the panoramic image, to output the processed features when image features extracted from the panoramic image and shape features extracted from the shape information are input;
uses a student model, which, when the panoramic image is input, outputs processed features representing features related to the correlation among the planes defining the interior space and decodes the processed features to output a feature vector indicating the positions of the corners of the planes defining the interior space in the panoramic image, to output the processed features when image features extracted from the panoramic image are input; and
calculates a loss representing the difference between the distribution of the processed features output using the teacher model and the distribution of the processed features output using the student model, and trains the student model as the estimation model so that the loss takes its minimum value on validation data.
[Additional note 2]
A three-dimensional reconstruction device for three-dimensionally reconstructing the interior space of a structure from a panoramic image of the interior space, the device comprising an input interface that receives the panoramic image, and a controller, wherein the controller:
inputs the panoramic image to the estimation model trained by the learning device according to Additional note 1 and causes it to output a feature vector indicating the positions of the corners of the planes defining the interior space;
calculates, based on the feature vector, the three-dimensional coordinates of the points constituting the corners; and
reconstructs the interior space in three dimensions based on the three-dimensional coordinates.
[Additional note 3]
A learning method performed by a learning device that learns an estimation model for three-dimensionally reconstructing the interior space of a structure from a panoramic image of the interior space, the method comprising:
using a teacher model, which, when the panoramic image and shape information indicating the shape of the interior space are input, outputs processed features representing features related to the correlation among the planes defining the interior space and decodes the processed features to output a feature vector indicating the positions of the corners of the planes defining the interior space in the panoramic image, to output the processed features when image features extracted from the panoramic image and shape features extracted from the shape information are input;
using a student model, which, when the panoramic image is input, outputs processed features representing features related to the correlation among the planes defining the interior space and decodes the processed features to output a feature vector indicating the positions of the corners of the planes defining the interior space in the panoramic image, to output the processed features when image features extracted from the panoramic image are input; and
calculating a loss representing the difference between the distribution of the processed features output using the teacher model and the distribution of the processed features output using the student model, and training the student model as the estimation model so that the loss takes its minimum value on validation data.
[Additional note 4]
A three-dimensional reconstruction method performed by a three-dimensional reconstruction device that reconstructs the interior space of a structure in three dimensions from a panoramic image of the interior space, the method comprising:
receiving the panoramic image as input;
inputting the panoramic image to the estimation model trained by the learning method according to Additional note 3 and causing it to output a feature vector indicating the positions of the corners of the planes defining the interior space;
calculating, based on the feature vector, the three-dimensional coordinates of the points constituting the corners; and
reconstructing the interior space in three dimensions based on the three-dimensional coordinates.
[Additional note 5]
A non-transitory computer-readable medium storing a program for causing a computer to operate as the learning device according to Additional note 1.
[Additional note 6]
A non-transitory computer-readable medium storing a program for causing a computer to operate as the three-dimensional reconstruction device according to Additional note 2.

All documents, patent applications, and technologies mentioned herein are incorporated herein by reference to the same extent as if each individual document, patent application, or technology were specifically and individually indicated to be incorporated by reference.

Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many modifications and substitutions can be made within the spirit and scope of this disclosure. Accordingly, the present invention should not be construed as being limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims.

11 Teacher model image input unit
12 Shape information input unit
13 Teacher model estimation unit
14 Student model image input unit
15 Student model estimation unit
16 Student model learning unit
31 Student model storage unit
32 Image input unit
33 Estimation unit
34 Three-dimensional coordinate calculation unit
35 Post-processing unit
100 Learning device
131 Teacher model image feature extractor
132 Shape feature extractor
133 Teacher model feature processor
134 Teacher model feature decoder
151 Student model image feature extractor
152 Student model feature processor
153 Student model feature decoder
300 Three-dimensional reconstruction device
401 Computer
410 Processor
420 ROM
430 RAM
440 Storage
450 Input unit
460 Output unit
470 Communication interface
480 Bus

Claims

A learning device for learning an estimation model for three-dimensionally reconstructing the interior space of a structure from a panoramic image of the interior space, the learning device comprising:
a teacher model estimation unit that uses a teacher model, which, when the panoramic image and shape information indicating the shape of the interior space are input, outputs processed feature quantities indicating features relating to the correlation of the planes defining the interior space and decodes the processed feature quantities to output feature vectors indicating the positions, in the panoramic image, of the corners of the planes defining the interior space, and that outputs the processed feature quantities when image features extracted from the panoramic image and shape features extracted from the shape information are input;
a student model estimation unit that uses a student model, which, when image features extracted from the panoramic image are input, outputs processed feature quantities indicating features relating to the correlation of the planes defining the interior space and decodes the processed feature quantities to output feature vectors indicating the positions, in the panoramic image, of the corners of the planes defining the interior space, and that outputs the processed feature quantities when image features extracted from the panoramic image are input; and
a student model learning unit that calculates a loss representing the difference between the distribution of the processed feature quantities output using the teacher model and the distribution of the processed feature quantities output using the student model, and trains the student model as the estimation model so that the loss takes its minimum value on validation data.
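The claim requires only that the loss represent the difference between the distributions of the teacher's and the student's processed feature quantities; it does not name the measure. As a minimal sketch, assuming a common knowledge-distillation choice, each position's processed-feature vector (for example, one per panorama column) can be turned into a distribution with a temperature-scaled softmax and compared with the KL divergence from teacher to student. The function names, the `temperature` parameter, and the per-position layout are illustrative assumptions, not taken from this publication.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(
    teacher_feats: Sequence[Sequence[float]],
    student_feats: Sequence[Sequence[float]],
    temperature: float = 2.0,
) -> float:
    """Mean KL(teacher || student) over positions (e.g. panorama columns).

    Each row holds the processed-feature vector for one position (assumed layout).
    The teacher distribution is treated as a fixed target.
    """
    if len(teacher_feats) != len(student_feats) or not teacher_feats:
        raise ValueError("need the same, non-zero number of positions for both models")
    total = 0.0
    for t_row, s_row in zip(teacher_feats, student_feats):
        p = softmax(t_row, temperature)  # teacher distribution
        q = softmax(s_row, temperature)  # student distribution
        total += kl_divergence(p, q)
    return total / len(teacher_feats)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.3, 0.2]]
    student_close = [[1.9, 0.6, -0.9], [0.1, 0.3, 0.2]]
    student_far = [[-1.0, 0.5, 2.0], [0.3, 0.1, 0.2]]
    print(distillation_loss(teacher, teacher))  # 0.0: identical features
    print(distillation_loss(teacher, student_close) < distillation_loss(teacher, student_far))  # True
```

Averaging over positions keeps the loss scale independent of the panorama width. Many distillation setups also multiply the loss by `temperature ** 2`; scaling by a positive constant does not move the minimum.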

A three-dimensional reconstruction device for three-dimensionally reconstructing the interior space of a structure from a captured panoramic image of the interior space, the device comprising:
an image input unit that receives input of the panoramic image;
an estimation unit that inputs the panoramic image into an estimation model trained by the learning device according to claim 1 and outputs a feature vector indicating the positions of the corners of the planes defining the interior space;
a three-dimensional coordinate calculation unit that calculates the three-dimensional coordinates of the points constituting the corners based on the feature vector; and
a post-processing unit that reconstructs the interior space in three dimensions based on the three-dimensional coordinates.
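The claim does not say how the three-dimensional coordinates are computed from the corner positions. A minimal sketch, assuming an equirectangular panorama, a known camera height, and a HorizonNet-style representation in which each corner is given by its image column and the rows where the floor-wall and ceiling-wall boundaries cross that column: the floor ray fixes the horizontal distance to the corner, and the ceiling ray at that same distance fixes the ceiling height. The angle conventions, the 1.6 m default camera height, and the function names are assumptions for illustration.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def pixel_to_angles(u: float, v: float, width: int, height: int) -> Tuple[float, float]:
    """Map an equirectangular pixel position to (longitude, latitude) in radians.

    Assumed convention: u in [0, width) spans longitude [-pi, pi), and
    v in [0, height) runs from latitude +pi/2 (top row) down to -pi/2.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    return lon, lat


def corner_to_3d(
    u: float,
    v_floor: float,
    v_ceil: float,
    width: int,
    height: int,
    camera_height: float = 1.6,
) -> Tuple[Point3, Point3]:
    """Return (floor_point, ceiling_point) of one wall corner, camera at the origin.

    Axes: y points up, the floor is the plane y = -camera_height, and
    longitude 0 looks along +z.
    """
    lon, lat_floor = pixel_to_angles(u, v_floor, width, height)
    _, lat_ceil = pixel_to_angles(u, v_ceil, width, height)
    if lat_floor >= 0.0 or lat_ceil <= 0.0:
        raise ValueError("floor boundary must be below, ceiling boundary above the horizon")
    # The floor ray hits y = -camera_height at this horizontal distance.
    dist = camera_height / math.tan(-lat_floor)
    x, z = dist * math.sin(lon), dist * math.cos(lon)
    # The ceiling ray at the same horizontal distance gives the ceiling height.
    y_ceil = dist * math.tan(lat_ceil)
    return (x, -camera_height, z), (x, y_ceil, z)


if __name__ == "__main__":
    floor_pt, ceil_pt = corner_to_3d(512, 384, 128, width=1024, height=512)
    print(floor_pt, ceil_pt)  # roughly (0, -1.6, 1.6) and (0, 1.6, 1.6)
```

For a 1024 x 512 panorama, a corner at column 512 with the floor boundary at row 384 and the ceiling boundary at row 128 lies 45 degrees below and above the horizon, so it sits about 1.6 m in front of the camera with the ceiling about 1.6 m above the camera.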

A learning method performed by a learning device that learns an estimation model for three-dimensionally reconstructing the interior space of a structure from a panoramic image of the interior space, the learning method comprising:
a step of using a teacher model, which, when the panoramic image and shape information indicating the shape of the interior space are input, outputs processed feature quantities indicating features relating to the correlation of the planes defining the interior space and decodes the processed feature quantities to output feature vectors indicating the positions, in the panoramic image, of the corners of the planes defining the interior space, to output the processed feature quantities when image features extracted from the panoramic image and shape features extracted from the shape information are input;
a step of using a student model, which, when the panoramic image is input, outputs processed feature quantities indicating features relating to the correlation of the planes defining the interior space and decodes the processed feature quantities to output feature vectors indicating the positions, in the panoramic image, of the corners of the planes defining the interior space, to output the processed feature quantities when image features extracted from the panoramic image are input; and
a step of calculating a loss representing the difference between the distribution of the processed feature quantities output using the teacher model and the distribution of the processed feature quantities output using the student model, and training the student model as the estimation model so that the loss takes its minimum value on validation data.
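One plausible reading of training the student "so that the loss takes its minimum value on the validation data" is validation-based model selection: train for several epochs and keep the student parameters whose teacher-student loss on held-out data is lowest. The sketch below assumes that reading; `train_step` and `val_loss` are hypothetical callables standing in for one student training epoch and for the distribution loss evaluated on validation data.

```python
import copy
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # whatever represents the student's parameters


def train_with_validation_selection(
    init_state: S,
    train_step: Callable[[S], S],
    val_loss: Callable[[S], float],
    epochs: int,
) -> Tuple[S, float, List[float]]:
    """Run `epochs` training steps and return the state with the lowest validation loss.

    Returns (best_state, best_loss, history), where history[0] is the loss
    before training and history[k] the loss after epoch k.
    """
    state = init_state
    best_loss = val_loss(state)
    best_state = copy.deepcopy(state)
    history = [best_loss]
    for _ in range(epochs):
        state = train_step(state)
        loss = val_loss(state)
        history.append(loss)
        if loss < best_loss:  # keep a snapshot; later epochs may overfit
            best_loss = loss
            best_state = copy.deepcopy(state)
    return best_state, best_loss, history


if __name__ == "__main__":
    # Toy stand-ins: the "student" is one scalar, each epoch adds 0.5,
    # and the validation loss is lowest near 1.2.
    best, loss, hist = train_with_validation_selection(
        0.0, lambda w: w + 0.5, lambda w: (w - 1.2) ** 2, epochs=4
    )
    print(best)  # 1.0
```

In the toy run the validation loss falls and then rises again, so the parameters from the third evaluation, not the last one, are kept.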

A three-dimensional reconstruction method performed by a three-dimensional reconstruction device that three-dimensionally reconstructs the interior space of a structure from a panoramic image of the interior space, the method comprising:
a step of receiving input of the panoramic image;
a step of inputting the panoramic image into an estimation model trained by the learning method according to claim 3 and outputting a feature vector indicating the positions of the corners of the planes defining the interior space;
a step of calculating the three-dimensional coordinates of the points constituting the corners based on the feature vector; and
a step of reconstructing the interior space in three dimensions based on the three-dimensional coordinates.
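The post-processing step is likewise left open. Assuming, for illustration only, that the room is a vertical prism (a flat floor and ceiling joined by vertical walls), the ordered floor corners and one ceiling height are enough to build a simple mesh: one vertex per corner on the floor and on the ceiling, and one rectangular wall per pair of consecutive corners. `build_layout_mesh` and `floor_area` are hypothetical helpers, not part of the publication.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]  # (x, z) on the floor plane
Point3 = Tuple[float, float, float]


def build_layout_mesh(
    floor_corners_xz: Sequence[Point2], floor_y: float, ceiling_y: float
) -> Tuple[List[Point3], List[Tuple[int, int, int, int]]]:
    """Build a prism-shaped room from corners ordered around the room.

    vertices[i] is the floor vertex of corner i and vertices[n + i] its
    ceiling vertex; each wall quad is (floor_i, floor_j, ceil_j, ceil_i).
    """
    n = len(floor_corners_xz)
    if n < 3:
        raise ValueError("a room needs at least three corners")
    if ceiling_y <= floor_y:
        raise ValueError("the ceiling must be above the floor")
    vertices: List[Point3] = [(x, floor_y, z) for x, z in floor_corners_xz]
    vertices += [(x, ceiling_y, z) for x, z in floor_corners_xz]
    walls = []
    for i in range(n):
        j = (i + 1) % n  # wrap around so the last wall closes the room
        walls.append((i, j, n + j, n + i))
    return vertices, walls


def floor_area(floor_corners_xz: Sequence[Point2]) -> float:
    """Shoelace formula for the area of the (simple) floor polygon."""
    n = len(floor_corners_xz)
    twice_area = 0.0
    for i in range(n):
        x1, z1 = floor_corners_xz[i]
        x2, z2 = floor_corners_xz[(i + 1) % n]
        twice_area += x1 * z2 - x2 * z1
    return abs(twice_area) / 2.0


if __name__ == "__main__":
    corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    verts, walls = build_layout_mesh(corners, floor_y=-1.6, ceiling_y=1.0)
    print(len(verts), len(walls), floor_area(corners))  # 8 4 12.0
```

The shoelace area is a quick sanity check on the recovered floor polygon; it assumes the corners are ordered around the room and the polygon does not self-intersect.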

A program for causing a computer to function as the learning device according to claim 1.

A program for causing a computer to function as the three-dimensional reconstruction device according to claim 2.