JP6604502B2 - Depth map generation apparatus, depth map generation method, and program

Info

Publication number: JP6604502B2
Application number: JP2015141533A
Authority: JP
Inventors: 孝文青木; 康一伊藤; 修二酒井; 弘樹運天; 隆史渡邉
Original assignee: Tohoku University NUC; Toppan Inc
Current assignee: Tohoku University NUC; Toppan Inc
Priority date: 2015-07-15
Filing date: 2015-07-15
Publication date: 2019-11-13
Anticipated expiration: 2035-07-15
Also published as: JP2017027101A

Description

The present invention relates to a depth map generation apparatus, a depth map generation method, and a program for generating a depth map of an image.

Three-dimensional reconstruction based on multi-view images is attracting attention not only in the computer vision research community but also in a wide range of fields, such as digital archiving of cultural assets and the entertainment industry (see Patent Document 1).

Patent Document 1: JP 2013-019801 A

The three-dimensional reconstruction technique based on multi-view images described above uses triangulation and ultimately generates a three-dimensional point cloud by integrating the depth maps of the individual viewpoints.
However, if the three-dimensional point cloud calculated from the depth map of each viewpoint is merely transformed into the world coordinate system (hereinafter also referred to as world coordinates), many erroneous corresponding points remain in the final three-dimensional reconstruction result. This is because erroneous corresponding points exist in the depth maps of the viewpoints used for integration and affect the three-dimensional reconstruction result.

The present invention has been made in view of these circumstances and provides a depth map generation apparatus, a depth map generation method, and a program that, when a three-dimensional point cloud is generated, reduce erroneous corresponding points in the depth maps used for integration and thereby suppress the occurrence of erroneous corresponding points in the three-dimensional reconstruction result.

To solve the above problem, a depth map generation apparatus of the present invention includes: a depth map generation unit that generates depth maps from images of a plurality of different viewpoints using the phase-only correlation method, based on a multi-view stereo algorithm; and an erroneous corresponding point removal unit that removes erroneous corresponding points by performing a graph cut on a graph and assigning each node of the graph to the correct corresponding point label or the erroneous corresponding point label. The graph is built as follows. Each edge that connects the node corresponding to a pixel of a target viewpoint, which is the viewpoint in which erroneous corresponding points are to be detected, to the correct corresponding point node (the node carrying the correct corresponding point label) or to the erroneous corresponding point node (the node carrying the erroneous corresponding point label) is given a first weight obtained by comparing that pixel with the corresponding pixel of a reference viewpoint, which is another viewpoint near the target viewpoint. Each edge that connects the node of a pixel in the target viewpoint to the node of an adjacent pixel is given a second weight corresponding to the difference in depth between the pixel and the adjacent pixel in the target viewpoint. The graph cut is performed so that the sum of the weights of the edges crossed by the cutting line is minimized. The first weight consists of a first-A weight for the edge connecting the correct corresponding point node and the node corresponding to the pixel, and a first-B weight for the edge connecting the node corresponding to the pixel and the erroneous corresponding point node. The first-A weight is a value corresponding to the peak value of the phase-only correlation function with the other viewpoint and is set higher as the correlation value is higher. The first-B weight is a value corresponding to the difference in depth of the same pixel between the target viewpoint and the nearby reference viewpoint and is set higher as the difference is larger. The second weight is set higher as the difference in depth from the node of the adjacent pixel in the same viewpoint is smaller.
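The labeling that the graph cut produces can be illustrated on a toy example. The sketch below is only an illustration under stated assumptions: it treats a single row of four pixels, finds the minimum cut by exhaustive enumeration rather than by a max-flow algorithm, and uses hypothetical formulas for the weights (the text fixes only their monotonic behavior, not their exact form). All names and numbers are made up for the example.

```python
from itertools import product

def cut_cost(labels, w1a, w1b, w2):
    """Sum of the weights of the edges severed by a labeling.

    labels[i] is True when pixel i is kept as a correct corresponding point.
    """
    cost = 0.0
    for i, keep in enumerate(labels):
        # Keeping a pixel severs its edge to the erroneous-point node (weight 1B);
        # rejecting it severs its edge to the correct-point node (weight 1A).
        cost += w1b[i] if keep else w1a[i]
    for i in range(len(labels) - 1):
        if labels[i] != labels[i + 1]:
            cost += w2[i]  # the cut crosses the edge between neighboring pixels
    return cost

def best_labeling(w1a, w1b, w2):
    """Exhaustive minimum cut for a tiny 1-D row of pixels (illustration only)."""
    return min(product([True, False], repeat=len(w1a)),
               key=lambda labels: cut_cost(labels, w1a, w1b, w2))

# Hypothetical inputs: POC peak values, depth gaps to the reference viewpoint,
# and depths in the target viewpoint.
peaks = [0.9, 0.8, 0.2, 0.85]
ref_depth_gap = [0.1, 0.1, 2.0, 0.1]
depths = [1.00, 1.02, 3.00, 1.01]

w1a = peaks          # assumed form: higher correlation gives a higher weight
w1b = ref_depth_gap  # assumed form: larger depth gap gives a higher weight
# Assumed form: smaller depth difference between neighbors gives a higher weight.
w2 = [1.0 / (1.0 + abs(a - b)) for a, b in zip(depths, depths[1:])]

labels = best_labeling(w1a, w1b, w2)
```

With these numbers the third pixel, which has a low correlation peak and a large depth gap to the reference viewpoint, is the only one assigned the erroneous label. A practical implementation would replace the enumeration with a max-flow/min-cut solver over the full pixel grid.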

The depth map generation apparatus of the present invention further includes an artifact removal unit that removes artifacts between different viewpoints after the erroneous corresponding point removal unit has removed erroneous corresponding points. The artifact removal unit obtains the difference between the depth of a pixel in the target viewpoint and the depth obtained by transforming the corresponding pixel of a comparison viewpoint, which is a viewpoint separated from the target viewpoint by a predetermined distance, into the coordinate system of the target viewpoint, and compares this difference with a preset threshold to determine whether the pixel of the target viewpoint is an artifact.

In the depth map generation apparatus of the present invention, the threshold consists of a first threshold and a second threshold. When the difference is equal to or less than the first threshold, the artifact removal unit determines that the pixel of the target viewpoint is not an artifact, because the same pixel has been reconstructed at a similar depth in the target viewpoint and the comparison viewpoint. When the difference exceeds the first threshold and is less than the second threshold, the artifact removal unit determines that the pixel of the target viewpoint is an artifact, because the same pixel has been reconstructed at different depths in the two viewpoints. When the difference is equal to or greater than the second threshold, the artifact removal unit determines that the pixel of the target viewpoint is not an artifact, on the grounds that the pixels being compared belong to different regions.
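The three-way decision above can be written as a short function. This is a minimal sketch: the function and parameter names are hypothetical, and taking the absolute value of the depth difference is an assumption, since the text speaks only of "the difference".

```python
def is_artifact(depth_target, depth_from_comparison, th1, th2):
    """Classify a target-viewpoint pixel using two depth-difference thresholds.

    depth_from_comparison is the comparison-viewpoint depth already transformed
    into the target viewpoint's coordinate system. th1 < th2 is assumed.
    """
    diff = abs(depth_target - depth_from_comparison)  # assumption: absolute difference
    if diff <= th1:
        return False  # reconstructed at a similar depth: consistent, not an artifact
    if diff < th2:
        return True   # same pixel reconstructed at different depths: artifact
    return False      # very large gap: treated as a different region, not an artifact

# Hypothetical thresholds, in the same unit as the depth values.
TH1, TH2 = 0.01, 0.5
flags = [is_artifact(1.0, d, TH1, TH2) for d in (1.005, 1.2, 3.0)]
```

Only the middle case (a moderate disagreement between the two viewpoints) is flagged; the near-identical and the very different depths are both kept.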

In the depth map generation apparatus of the present invention, the artifact removal unit determines that the comparison viewpoint is not to be used for comparison with the target viewpoint when the parallax angle between the target viewpoint and the comparison viewpoint with respect to the reconstructed pixel is less than a preset third threshold.
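A possible way to evaluate this condition is sketched below. It assumes that the parallax angle is the angle, measured at the reconstructed three-dimensional point, between the rays toward the two camera centers; this definition, the function names, and the 5-degree threshold are assumptions for illustration and are not taken from the text.

```python
import math

def parallax_angle(point, cam_a, cam_b):
    """Angle in radians at `point` between the rays to two camera centers."""
    va = [a - p for a, p in zip(cam_a, point)]
    vb = [b - p for b, p in zip(cam_b, point)]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def usable_for_comparison(point, cam_target, cam_comparison, th3):
    """A comparison viewpoint is skipped when the parallax angle is below th3."""
    return parallax_angle(point, cam_target, cam_comparison) >= th3

point = (0.0, 0.0, 1.0)
th3 = math.radians(5.0)  # hypothetical third threshold
wide = usable_for_comparison(point, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), th3)
narrow = usable_for_comparison(point, (0.0, 0.0, 0.0), (0.01, 0.0, 0.0), th3)
```

Viewpoints that see the point from almost the same direction give a small angle, so their depth comparison carries little information and is skipped.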

In the depth map generation apparatus of the present invention, when depth maps are generated from the images of the plurality of different viewpoints, pixel matching between the images is performed by a hierarchical search that gradually increases the pixel resolution, and the depth value of each pixel in the depth map obtained after the matching at each layer of the hierarchical search is corrected using a weighted median filter.

In the depth map generation apparatus of the present invention, the weight of each pixel in the window of the weighted median filter is set according to its distance from the target pixel, the luminance value of the pixel in the image of that layer, and the peak value of the phase-only correlation function.
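The weighted median itself is simple to state in code. In the sketch below, `weighted_median` is the standard definition; `pixel_weight` is only one hypothetical way to combine the three factors named above (a Gaussian in spatial distance, a Gaussian in luminance difference from the center pixel, and the POC peak value), with made-up parameters `sigma_s` and `sigma_l`.

```python
import math

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("empty window")

def pixel_weight(dist, lum_diff, peak, sigma_s=2.0, sigma_l=10.0):
    """Hypothetical weight: near, similar-luminance, well-correlated pixels count more."""
    spatial = math.exp(-dist ** 2 / (2.0 * sigma_s ** 2))
    photometric = math.exp(-lum_diff ** 2 / (2.0 * sigma_l ** 2))
    return spatial * photometric * max(peak, 0.0)

# Hypothetical 1-D window: depth, distance to center, luminance difference, POC peak.
window = [(1.00, 1.0, 2.0, 0.9), (5.00, 0.0, 0.0, 0.1), (1.10, 1.0, 3.0, 0.8)]
depths = [w[0] for w in window]
weights = [pixel_weight(d, l, p) for _, d, l, p in window]
corrected = weighted_median(depths, weights)
```

Here the center pixel has an outlier depth and a low correlation peak, so its weight is small and the corrected depth is taken from a reliable neighbor.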

A depth map generation method of the present invention includes: a depth map generation step in which a depth map generation unit generates depth maps from images of a plurality of different viewpoints using the phase-only correlation method, based on a multi-view stereo algorithm; and an erroneous corresponding point removal step in which an erroneous corresponding point removal unit removes erroneous corresponding points by performing a graph cut on a graph and assigning each node of the graph to the correct corresponding point label or the erroneous corresponding point label. The graph is built as follows. Each edge that connects the node corresponding to a pixel of a target viewpoint, which is the viewpoint in which erroneous corresponding points are to be detected, to the correct corresponding point node (the node carrying the correct corresponding point label) or to the erroneous corresponding point node (the node carrying the erroneous corresponding point label) is given a first weight obtained by comparing that pixel with the corresponding pixel of a reference viewpoint, which is another viewpoint near the target viewpoint. Each edge that connects the node of a pixel in the target viewpoint to the node of an adjacent pixel is given a second weight corresponding to the difference in depth between the pixel and the adjacent pixel in the target viewpoint. The graph cut is performed so that the sum of the weights of the edges crossed by the cutting line is minimized. The first weight consists of a first-A weight for the edge connecting the correct corresponding point node and the node corresponding to the pixel, and a first-B weight for the edge connecting the node corresponding to the pixel and the erroneous corresponding point node. The first-A weight is a value corresponding to the peak value of the phase-only correlation function with the other viewpoint and is set higher as the correlation value is higher. The first-B weight is a value corresponding to the difference in depth of the same pixel between the target viewpoint and the nearby reference viewpoint and is set higher as the difference is larger. The second weight is set higher as the difference in depth from the node of the adjacent pixel in the same viewpoint is smaller.

A program of the present invention causes a computer to operate as: depth map generation means that generates depth maps from images of a plurality of different viewpoints using the phase-only correlation method, based on a multi-view stereo algorithm; and erroneous corresponding point removal means that removes erroneous corresponding points by performing a graph cut on a graph and assigning each node of the graph to the correct corresponding point label or the erroneous corresponding point label. The graph is built as follows. Each edge that connects the node corresponding to a pixel of a target viewpoint, which is the viewpoint in which erroneous corresponding points are to be detected, to the correct corresponding point node (the node carrying the correct corresponding point label) or to the erroneous corresponding point node (the node carrying the erroneous corresponding point label) is given a first weight obtained by comparing that pixel with the corresponding pixel of a reference viewpoint, which is another viewpoint near the target viewpoint. Each edge that connects the node of a pixel in the target viewpoint to the node of an adjacent pixel is given a second weight corresponding to the difference in depth between the pixel and the adjacent pixel in the target viewpoint. The graph cut is performed so that the sum of the weights of the edges crossed by the cutting line is minimized. The first weight consists of a first-A weight for the edge connecting the correct corresponding point node and the node corresponding to the pixel, and a first-B weight for the edge connecting the node corresponding to the pixel and the erroneous corresponding point node. The first-A weight is a value corresponding to the peak value of the phase-only correlation function with the other viewpoint and is set higher as the correlation value is higher. The first-B weight is a value corresponding to the difference in depth of the same pixel between the target viewpoint and the nearby reference viewpoint and is set higher as the difference is larger. The second weight is set higher as the difference in depth from the node of the adjacent pixel in the same viewpoint is smaller.

As described above, the present invention can provide a depth map generation apparatus, a depth map generation method, and a program that, when a three-dimensional point cloud is generated, reduce erroneous corresponding points in the depth maps used for integration and thereby suppress the occurrence of erroneous corresponding points in the three-dimensional reconstruction result.

FIG. 1 is a diagram showing the configuration of the arithmetic processing device in an embodiment of the present invention.
FIG. 2 is a diagram showing an outline of the one-dimensional phase-only correlation method in an embodiment of the present invention.
FIG. 3 is a diagram showing the relationship between a three-dimensional point and parallax in an embodiment of the present invention.
FIG. 4 is a diagram showing, for each stereo pair, the correlation function obtained by the one-dimensional phase-only correlation method in an embodiment of the present invention.
FIG. 5 is a flowchart showing the operation procedure by which the arithmetic processing device generates a mesh model in an embodiment of the present invention.
FIG. 6 is a diagram showing the operation procedure of the function calculation device at each layer of the image pyramid in an embodiment of the present invention.
FIG. 7 is a flowchart showing the depth search procedure in an embodiment of the present invention.
FIG. 8 is a diagram showing the matching windows defined for each image of a stereo pair in an embodiment of the present invention.
FIG. 9 is a flowchart showing the operation procedure of the function calculation device at each layer of the image pyramid in an embodiment of the present invention.
FIG. 10 is a diagram explaining the relationship among the depth maps of the upper, middle, and lower layers of the image pyramid.
FIG. 11 is a diagram showing the relationship between the coordinate system of the depth map and the coordinate system of the window.
FIG. 12 is a diagram explaining the graph cut processing that removes erroneous corresponding points from a depth map.
FIG. 13 is a diagram showing the removal of erroneous corresponding points from a depth map by the graph cut processing.
FIG. 14 is a diagram explaining the consistency between the depth maps of a plurality of viewpoints when an artifact occurs.
FIG. 15 is a diagram explaining the processing that determines whether a pixel of the target viewpoint is a correct corresponding point or an artifact.

An embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 shows the configuration of the arithmetic processing device. The arithmetic processing device 100 is, for example, a CPU (Central Processing Unit) and is connected to a storage device 200 and an output device 300 via a bus 400.

The arithmetic processing device 100 reads multi-view images from the storage device 200 and normalizes the parallax of the stereo pairs that make up these multi-view images. Here, normalization means, for example, using camera parameters (such as the coordinate system and the focal length) to convert the parallax of a stereo pair in which an object was imaged by cameras not installed on the same plane into the parallax between the reference viewpoint and the viewpoint of a camera virtually installed near the reference viewpoint (a neighboring viewpoint). Details of the normalization will be described later with reference to FIG. 3.

The arithmetic processing device 100 also defines and cuts out matching windows from the parallax-normalized stereo pairs and calculates the correlation function of the stereo pair for each cut-out matching window. Of the correlation functions calculated for the matching windows, the arithmetic processing device 100 integrates only those for which the correlation value map corr and the confidence value map conf are high, and generates a depth map dep based on the integrated correlation function. Here, a depth map is map information representing the depth of the surface shape of the object captured in the multi-view images. Based on the generated depth map dep, the arithmetic processing device 100 further generates a mesh model in which the surface shape of the object is reconstructed, and outputs data representing the generated mesh model to the output device 300.

In the following, the description continues on the assumption that a correlation function based on phase-only correlation (POC), an image matching technique that focuses on the phase component of the image signal, is used as an example of the correlation function. This correlation function is hereinafter referred to as the "POC function".

First, the case where the image deformation between stereo pairs is expressed using the translation amount of the matching window and the parallax magnification of the matching window will be described.

An image matching technique based on the one-dimensional phase-only correlation method (one-dimensional POC) will be described.
In a rectified stereo pair, the width direction of the matching window cut out from one image of the stereo pair (hereinafter, the "reference viewpoint image") coincides with the width direction of the matching window cut out from the other image (hereinafter, the "neighboring viewpoint image"). In this case, the cut-out matching windows are aligned along one dimension, so the arithmetic processing device 100 can use the one-dimensional POC function to calculate, with high accuracy, the translation amount of the corresponding point between the matching windows in which the point of interest on the object surface is captured.

FIG. 2 shows an outline of the one-dimensional phase-only correlation method. When one-dimensional image signals f(n) and g(n) are given for N points (N is a positive integer) in a one-dimensional image, the translation amount between f(n) and g(n) is calculated based on one-dimensional POC. Here, the discrete time index n of the one-dimensional image signals is n = -M, ..., M (M is a positive integer), and N = 2M + 1.

The one-dimensional discrete Fourier transform (DFT) of the one-dimensional image signal f(n) is expressed by Equation (1). The one-dimensional DFT of the one-dimensional image signal g(n) is expressed by Equation (2).

Here, k = -M, ..., M is the discrete frequency index, W_N = e^(-j2π/N) is the twiddle factor, A_F(k) and A_G(k) are the amplitude spectra, and θ_F(k) and θ_G(k) are the phase spectra. The normalized cross-power spectrum R(k) is expressed by Equation (3).

Here, G(k) with an overline denotes the complex conjugate of G(k), and θ_F(k) - θ_G(k) is the phase difference spectrum between the one-dimensional image signals f(n) and g(n).

The one-dimensional POC function r(n) is expressed by Equation (4) as the one-dimensional inverse discrete Fourier transform (IDFT) of the normalized cross-power spectrum R(k).
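The equation images for Equations (1) to (4) are not reproduced in this text. As a hedged reconstruction from the definitions given in the surrounding paragraphs (index range -M to M, twiddle factor W_N, amplitude and phase spectra, complex conjugate of G(k)), and following the usual formulation of one-dimensional POC, they would read as follows. The exact typography of the original equations is an assumption.

```latex
\begin{align}
F(k) &= \sum_{n=-M}^{M} f(n)\, W_N^{kn} = A_F(k)\, e^{j\theta_F(k)} \tag{1}\\
G(k) &= \sum_{n=-M}^{M} g(n)\, W_N^{kn} = A_G(k)\, e^{j\theta_G(k)} \tag{2}\\
R(k) &= \frac{F(k)\,\overline{G(k)}}{\left|F(k)\,\overline{G(k)}\right|}
      = e^{j\left(\theta_F(k)-\theta_G(k)\right)} \tag{3}\\
r(n) &= \frac{1}{N}\sum_{k=-M}^{M} R(k)\, W_N^{-kn} \tag{4}
\end{align}
```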

When the one-dimensional image signals f(n) and g(n) forming a stereo pair are translated relative to each other by a small translation amount δ, their POC function r(n) is expressed by Equation (5).

Equation (5) represents the general form of the POC function when the one-dimensional image signal is translated by a small amount δ. Here, α is a parameter introduced to express the height (peak value) of the correlation peak. When uncorrelated noise is added to the images, the correlation peak height α decreases, so in practice α ≤ 1. The height α of the correlation peak of the POC function therefore serves as a measure of the similarity between f(n) and g(n), and the position of the correlation peak corresponds to the translation amount δ between f(n) and g(n).
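The image for Equation (5) is likewise not reproduced here. In the usual formulation of one-dimensional POC, the peak model for a translation δ takes the following form; it is given as an assumed reconstruction that is consistent with the roles of α (peak height) and δ (peak position) described above, not as the verified original.

```latex
\begin{equation}
r(n) \simeq \frac{\alpha}{N}\,
\frac{\sin\!\left(\pi (n+\delta)\right)}{\sin\!\left(\dfrac{\pi}{N}(n+\delta)\right)} \tag{5}
\end{equation}
```

In this form the peak lies at n = -δ and its height tends to α as n + δ approaches zero.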

Therefore, the similarity between one-dimensional image signals is calculated from the correlation peak height α, and the translation amount δ between them is calculated with sub-pixel accuracy from the position of the correlation peak.
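The chain from the DFT to the POC function can be sketched with the Python standard library alone. The code below is an illustration under assumptions: it uses a direct O(N²) DFT with indices running from -M to M, the usual definitions of the normalized cross-power spectrum and its inverse DFT, and a made-up test signal. It omits the windowing, spectral weighting, and sub-pixel fitting discussed later in the text.

```python
import cmath
import math

def dft(signal, M):
    """Direct DFT with time and frequency indices both running over -M..M."""
    N = 2 * M + 1
    return [sum(signal[n + M] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(-M, M + 1))
            for k in range(-M, M + 1)]

def poc_1d(f, g):
    """Return the 1-D POC function r(n), n = -M..M, as a list (index n + M)."""
    N = len(f)
    M = (N - 1) // 2
    F, G = dft(f, M), dft(g, M)
    R = []
    for Fk, Gk in zip(F, G):
        cross = Fk * Gk.conjugate()
        # Keep only the phase; skip bins with (numerically) no energy.
        R.append(cross / abs(cross) if abs(cross) > 1e-12 else 0.0)
    return [sum(R[k + M] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(-M, M + 1)).real / N
            for n in range(-M, M + 1)]

# Hypothetical test signal and a copy shifted circularly by d samples,
# i.e. g(n) = f(n - d).
M, d = 8, 3
N = 2 * M + 1
f = [math.sin(0.7 * n) + 0.5 * math.cos(1.9 * n) + 0.1 * n for n in range(-M, M + 1)]
g = [f[(i - d) % N] for i in range(N)]

r = poc_1d(f, g)
peak_index = max(range(N), key=r.__getitem__)
peak_n = peak_index - M  # with this sign convention the peak appears at n = -d
```

For an exact circular shift the peak is sharp and close to 1; noise or deformation between the two signals lowers it, which is why the peak height is used as a similarity measure.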

In actual multi-view images, estimation errors may arise in the translation amount δ owing to distortion that accompanies changes in the baseline length between viewpoints and to noise in the digital images. The following describes several accuracy-improvement techniques that are important when calculating the translation amount δ using the one-dimensional POC function.

The one-dimensional discrete Fourier transform (DFT) assumes that the signal waveform repeats periodically, so discontinuity at the endpoints of the signal becomes a problem. To reduce the effect of this discontinuity, a window function w(n) is applied to the one-dimensional image signals f(n) and g(n). In the following, the one-dimensional Hanning window given by Equation (6) is used as the window function.
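A minimal sketch of the windowing step follows. Equation (6) is not reproduced in this text, so the code assumes the common form of the Hanning window for indices -M to M, w(n) = (1 + cos(πn/M)) / 2; the function names are hypothetical.

```python
import math

def hanning_window(M):
    """Assumed Hanning window w(n) = (1 + cos(pi * n / M)) / 2 for n = -M..M."""
    return [(1.0 + math.cos(math.pi * n / M)) / 2.0 for n in range(-M, M + 1)]

def apply_window(signal):
    """Taper a signal of odd length N = 2M + 1 toward zero at both ends."""
    M = (len(signal) - 1) // 2
    return [s * w for s, w in zip(signal, hanning_window(M))]

w = hanning_window(4)
tapered = apply_window([1.0] * 9)
```

The window equals 1 at the center and falls to 0 at both ends, so the periodic extension assumed by the DFT no longer has a jump at the boundary.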

It is also known that in natural images the energy is generally concentrated in the low-frequency components, and the energy of the high-frequency components is small by comparison. Consequently, when disturbances such as aliasing, noise, and distortion are added to an image, the signal-to-noise ratio (SNR) of the high-frequency components drops significantly.

The arithmetic processing device 100 (see FIG. 1) therefore applies a low-pass spectral weighting function H(k) when calculating the normalized cross-power spectrum R(k), in order to suppress the influence of unreliable high-frequency components. In the following, the Gaussian function expressed by Equation (7) is used as the spectral weighting function.

Here, σ is a constant representing the width of the Gaussian function. In the following, σ = √0.5 is used as an example. In this case, the correlation peak model takes the Gaussian form expressed by Equation (8).
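Equations (7) and (8) are not reproduced in this text. One self-consistent pair, given here purely as an assumed reconstruction, is a Gaussian weighting in the frequency domain together with the Gaussian peak it produces in the spatial domain (the inverse transform of a Gaussian is again a Gaussian):

```latex
\begin{align}
H(k) &= \exp\!\left(-\frac{2\pi^{2}\sigma^{2}k^{2}}{N^{2}}\right) \tag{7}\\
r(n) &\simeq \frac{\alpha}{\sqrt{2\pi}\,\sigma}\,
        \exp\!\left(-\frac{(n+\delta)^{2}}{2\sigma^{2}}\right) \tag{8}
\end{align}
```

With σ = √0.5 the peak model reduces to r(n) ≃ (α/√π) exp(-(n+δ)²).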

Because the translation amount δ is generally real-valued, the correlation peak of the POC function lies between sampling grid points, that is, between image pixels. The arithmetic processing device 100 therefore estimates the position of the correlation peak between pixels by fitting Equation (8), the correlation peak model, to the numerical values of the POC function actually calculated. Here, the correlation peak height α and the translation amount δ are the fitting parameters. In the following, the Levenberg-Marquardt method, one of the methods for solving nonlinear least-squares problems, is used as an example of the fitting technique.
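The fitting step can be sketched as a small hand-written Levenberg-Marquardt loop for the two parameters α and δ. The Gaussian peak model in the code is an assumed form of Equation (8) with σ² = 0.5, and the damping schedule (multiply or divide by 10) is a common textbook choice rather than something specified in the text. The sample data are synthetic.

```python
import math

SIGMA2 = 0.5  # sigma^2, matching sigma = sqrt(0.5) in the text

def peak_model(n, alpha, delta):
    """Assumed Gaussian peak model: alpha / sqrt(2*pi*sigma^2) * exp(-(n+delta)^2 / (2*sigma^2))."""
    return (alpha / math.sqrt(2.0 * math.pi * SIGMA2)
            * math.exp(-(n + delta) ** 2 / (2.0 * SIGMA2)))

def fit_peak(ns, rs, alpha=1.0, delta=0.0, iters=100):
    """Fit (alpha, delta) to POC samples rs at indices ns by Levenberg-Marquardt."""
    def sse(a, d):
        return sum((r - peak_model(n, a, d)) ** 2 for n, r in zip(ns, rs))

    lam, err = 1e-3, sse(alpha, delta)
    for _ in range(iters):
        jaa = jad = jdd = ga = gd = 0.0
        for n, r in zip(ns, rs):
            base = peak_model(n, 1.0, delta)
            ja = base                                  # d(model)/d(alpha)
            jd = -alpha * base * (n + delta) / SIGMA2  # d(model)/d(delta)
            e = r - alpha * base
            jaa += ja * ja; jad += ja * jd; jdd += jd * jd
            ga += ja * e; gd += jd * e
        a11, a22 = jaa * (1.0 + lam), jdd * (1.0 + lam)  # damped normal equations
        det = a11 * a22 - jad * jad
        if det == 0.0:
            break
        step_a = (a22 * ga - jad * gd) / det
        step_d = (a11 * gd - jad * ga) / det
        new_err = sse(alpha + step_a, delta + step_d)
        if new_err < err:   # accept the step and trust the model more
            alpha, delta, err, lam = alpha + step_a, delta + step_d, new_err, lam * 0.1
        else:               # reject the step and increase the damping
            lam *= 10.0
    return alpha, delta

# Synthetic samples around a peak with alpha = 0.8 and delta = 0.3.
ns = list(range(-3, 4))
rs = [peak_model(n, 0.8, 0.3) for n in ns]
alpha_hat, delta_hat = fit_peak(ns, rs)
```

In practice the initial δ would be taken from the integer position of the largest POC sample, and only a few samples around that maximum would be used for the fit.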

When applying the POC-based image matching technique to the correspondence between stereo pairs, the arithmetic processing device 100 defines and cuts out matching windows centered on the coordinates of the corresponding points in the images in which the point of interest on the object surface is captured, and calculates the POC function between the cut-out windows. It is difficult, however, to calculate the position of the correlation peak of the POC function with high accuracy from only one pair of windows, partly because of noise.

Therefore, the arithmetic processing device 100 defines and cuts out matching windows from each of a plurality of stereo pairs in which the point of interest is captured, and extracts one-dimensional image signals from the cut-out matching windows. The one-dimensional POC functions based on these one-dimensional image signals are then integrated by averaging them over the plurality of stereo pairs. Hereinafter, the averaged POC function is referred to as the "average POC function". Using the average POC function to calculate the correlation peak height α and the translation amount δ makes it possible to suppress the influence of noise.
This concludes the description of the image matching method based on the one-dimensional phase-only correlation method (one-dimensional POC).
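As a rough illustration of the averaging idea, the sketch below computes a one-dimensional POC function with the FFT and averages it over several signal pairs. The function names, the sign convention (the peak index gives the shift of `g` relative to `f`), and the synthetic test signals are assumptions made for this example; windowing and spectral weighting are omitted.

```python
import numpy as np

def poc_1d(f, g, eps=1e-12):
    # Normalized cross-phase spectrum, transformed back to the signal domain
    F = np.fft.fft(f)
    G = np.fft.fft(g)
    cross = np.conj(F) * G
    r = np.fft.ifft(cross / (np.abs(cross) + eps)).real
    return np.fft.fftshift(r)  # zero shift ends up at index len(f) // 2

def average_poc(pairs):
    # Average the POC functions of all (f, g) pairs sample by sample
    return np.mean([poc_1d(f, g) for f, g in pairs], axis=0)

rng = np.random.default_rng(0)
true_shift = 3
pairs = []
for _ in range(4):
    f = rng.standard_normal(64)
    g = np.roll(f, true_shift) + 0.2 * rng.standard_normal(64)  # shifted, noisy copy
    pairs.append((f, g))

r_ave = average_poc(pairs)
estimated_shift = int(np.argmax(r_ave)) - len(r_ave) // 2
```

Averaging keeps the common peak while the noise-induced side lobes of the individual POC functions partly cancel, which is the effect the text attributes to the average POC function.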

Returning to FIG. 1, the description of the configuration of the arithmetic processing device 100 continues. The arithmetic processing device 100 includes a function calculation device 110, a depth map generation device 120, and a mesh model generation device 130. The function calculation device 110 calculates the coordinates of the three-dimensional point (point of interest) captured in the images, based on the position coordinate of the correlation peak of the POC function, that is, the translation amount δ between the matching windows. The function calculation device 110 includes a normalization unit 111, a window setting unit 112, a function calculation unit 113, and a function integration unit 114.

The normalization unit 111 rectifies (parallelizes) the reference viewpoint image and the neighboring viewpoint image that form a stereo pair. The normalization unit 111 also normalizes the parallax of the plurality of stereo pairs based on the pixel of interest and the camera parameters. By normalizing the parallax of the plurality of stereo pairs, the function calculation device 110 can integrate the correlation functions even when the cameras are not placed on the same plane.

From the multi-viewpoint images V = {V_0, …, V_{F−1}} whose camera parameters (for example, coordinate system and focal length) are known, the normalization unit 111 reads from the storage device 200 a reference viewpoint image V_R ∈ V and the neighboring viewpoint images C = {C_0, …, C_{K−1}} ⊂ V − {V_R} captured from viewpoints near it. Here, F is the number of multi-viewpoint images, and K is the number of neighboring viewpoint images C.

Using the camera parameters, the normalization unit 111 rectifies the K stereo pairs, each consisting of the reference viewpoint image V_R and one of the neighboring viewpoint images C. The function calculation device 110 then defines and cuts out matching windows from the reference viewpoint image and the neighboring viewpoint image of each rectified stereo pair, and normalizes the parallax of the cut-out matching windows.

The normalization unit 111 normalizes the parallax of the matching windows cut out from the stereo pairs as follows. Here, the camera coordinates of the reference viewpoint image V_R are set in advance so as to coincide with the predetermined world coordinates.

To calculate, for a three-dimensional point M = (X, Y, Z), the parallax of the matching windows in the rectified stereo pair V_R-C_i (C_i ∈ C), the normalization unit 111 rectifies the reference viewpoint image V_R and the neighboring viewpoint image C_i using the camera parameters. Here, the rotation matrix R_i with respect to the camera coordinates of the reference viewpoint image V_R is expressed by Equation (9).

In this case, the relationship between the three-dimensional point M_i = (X_i, Y_i, Z_i) in the rectified stereo pair and the three-dimensional point M is expressed by Equation (10).

The relationship between the three-dimensional point M_i = (X_i, Y_i, Z_i) in the rectified stereo pair V_R-C_i and the parallax d_i is expressed by Equation (11).

Here, (u_i, v_i) denotes the pixel coordinates of the point corresponding to the three-dimensional point M_i in the rectified reference viewpoint image V_R, and (u_0i, v_0i) denotes the center of the angle of view in the rectified reference viewpoint image V_R. β_i denotes the focal length, and B_i denotes the baseline length of the stereo pair.

Based on Equations (10) and (11), the relationship between the three-dimensional point M and the parallax d_i is expressed by Equation (12).

The relationship expressed by Equation (13) also holds between the reference viewpoint image V_R and a neighboring viewpoint image C_j ∈ C − {C_i}.

Based on Equations (12) and (13), the coordinates of the three-dimensional point M in the predetermined world coordinates are expressed by Equations (14) to (16).

The relationship expressed by Equation (17) holds between the parallax d_i of the rectified stereo pair V_R-C_i and the parallax d_j of the rectified stereo pair V_R-C_j.

Therefore, the parallax d_i in the rectified stereo pair V_R-C_i and the parallax d_j in the rectified stereo pair V_R-C_j are related by a parallax magnification s that depends on the pixel coordinates of the corresponding point in the reference viewpoint image V_R and on the camera parameters. Here, the camera parameters are the focal length β_i and the baseline length B_i of the stereo pair.

In other words, the normalized parallax d is determined by taking the parallax magnification s of each stereo pair into account. Given the rectified stereo pairs V_R-C_i (i = 0, …, K−1), the relationship expressed by Equation (18) holds between the parallax d_i of each rectified stereo pair and the normalized parallax d. The normalization unit 111 normalizes the parallax of the rectified stereo pairs using Equation (18).

Here, s_i is the parallax magnification of each rectified stereo pair. For the pixel coordinates (u, v) of the corresponding point in the unrectified reference viewpoint image V_R, the parallax magnification s_i of each rectified stereo pair is expressed by Equation (19), which is based on Equation (17).

Here, |s| is expressed by Equation (20).

In this case, the three-dimensional point M_i in the rectified stereo pair V_R-C_i is expressed by Equation (21).

For each stereo pair whose parallax has been normalized, the window setting unit 112 defines a matching window centered on the corresponding point in the image in which the point of interest on the object surface is captured. The function calculation unit 113 calculates the POC function between the matching windows defined by the window setting unit 112. The function integration unit 114 integrates the plurality of POC functions calculated by the function calculation unit 113.

FIG. 3 shows the relationship between a three-dimensional point and parallax. The parallax in the rectified stereo pair V_R-C_i (C_i ∈ C) is determined based on a point M′ (= M + ΔM) located at a position displaced from the three-dimensional point M by a small amount ΔM = (ΔX, ΔY, ΔZ) along the line of sight passing through the three-dimensional point M and the reference viewpoint image V_R.

In this case, the three-dimensional point M′ is expressed by Equation (22), based on the parallax d_i in the rectified stereo pair V_R-C_i and Equation (12).

Here, when an error δ_i arises in the parallax d_i of the three-dimensional point M relative to the parallax d_i of the three-dimensional point M at its true position in the rectified stereo pair V_R-C_i, the relationship expressed by Equation (23) holds between the parallax d_i of the three-dimensional point M and the parallax d′_i of the three-dimensional point M′.

Assuming that the local image deformation consists only of translation, the translation amount δ_i between the matching window f_i, cut out centered on the corresponding point in the reference viewpoint image V_R, and the matching window g_i, cut out centered on the corresponding point in the neighboring viewpoint image C_i, is calculated based on the position coordinate of the correlation peak of the one-dimensional POC function between the matching windows f_i and g_i.

Here, if the translation amount δ_i in the rectified stereo pair V_R-C_i does not match the translation amount δ_j in the rectified stereo pair V_R-C_j (C_j ∈ C − {C_i}), the position coordinate of the correlation peak based on the rectified stereo pair V_R-C_i does not match that based on the rectified stereo pair V_R-C_j.

For example, in FIG. 3, the baseline length B_1 of the rectified stereo pair V_R-C_1 is longer than the baseline length B_0 of the rectified stereo pair V_R-C_0. Therefore, for the same change ΔZ in the depth direction, the translation amount δ_1 of the corresponding point in the rectified stereo pair V_R-C_1 is larger than the translation amount δ_0 of the corresponding point in the rectified stereo pair V_R-C_0.
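The baseline dependence can be checked numerically with the textbook relation for a rectified stereo pair, parallax = focal length × baseline / depth. This relation is used here as an assumption for illustration; the patent's own Equations (11) and (12) are not reproduced in this text, and all numeric values below are hypothetical.

```python
def parallax(beta, baseline, depth):
    # Textbook rectified-stereo relation (assumed): d = beta * B / Z
    return beta * baseline / depth

def parallax_change(beta, baseline, depth, delta_z):
    # Change in parallax when the point moves by delta_z along the depth axis
    return parallax(beta, baseline, depth) - parallax(beta, baseline, depth + delta_z)

beta = 1000.0        # focal length in pixels (hypothetical)
z, dz = 2.0, 0.01    # depth and depth change in metres (hypothetical)

delta_0 = parallax_change(beta, 0.10, z, dz)  # shorter baseline B_0
delta_1 = parallax_change(beta, 0.20, z, dz)  # longer baseline B_1
ratio = delta_1 / delta_0
```

Under this relation the parallax change scales linearly with the baseline, so doubling the baseline doubles the shift for the same ΔZ. This is why correlation peaks from different stereo pairs do not line up until the parallax is normalized.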

Returning to FIG. 1, the description of the configuration of the arithmetic processing device 100 continues. The window setting unit 112 enlarges or reduces the matching windows defined and cut out from each rectified stereo pair, based on the normalized parallax. The function calculation unit 113 calculates the one-dimensional POC function between the matching windows defined and cut out by the window setting unit 112.

The function integration unit 114 integrates the one-dimensional POC functions calculated by the function calculation unit 113 by averaging them over the stereo pairs. As a result, the position coordinates of the correlation peaks of the one-dimensional POC functions calculated for the individual matching windows are unified across the stereo pairs. Here, based on the normalized parallax d for the three-dimensional point M and the normalized parallax d′ (= d + δ) for the three-dimensional point M′, the three-dimensional point M′ is expressed by Equation (24).

Therefore, the translation amount between the matching window f_i, cut out centered on the corresponding point in the rectified stereo pair V_R-C_i, and the matching window g_i, cut out centered on the corresponding point in the neighboring viewpoint image C_i, is s_i δ. Accordingly, the window sizes of the matching windows f_i and g_i are each set to s_i w. The window size w is a predetermined reference size.

By enlarging or reducing the window of size s_i w by a factor of (1/s_i), the window size of the matching windows of the one-dimensional image signals is made common at the window size w. Hereinafter, the matching window f_i whose window size s_i w has been enlarged or reduced by a factor of (1/s_i) is denoted as matching window f_i′. Similarly, the matching window g_i whose window size s_i w has been enlarged or reduced by a factor of (1/s_i) is denoted as matching window g_i′.

The position coordinate of the correlation peak of the one-dimensional POC function calculated from the matching windows f_i′ and g_i′ corresponds to the translation amount δ between the matching windows f_i′ and g_i′. Here, the size of the matching window cut out from the rectified stereo pair V_R-C_j (C_j ∈ C − {C_i}) is the window size s_j w. When the window size s_j w, which differs from the window size s_i w, is enlarged or reduced by a factor of (1/s_j), the correlation peak of the one-dimensional POC function of the rectified stereo pair V_R-C_j comes to coincide with the correlation peak of the one-dimensional POC function of the rectified stereo pair V_R-C_i.
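One way to picture the window rescaling is to sample a window of width s_i w at w evenly spaced positions, which cuts out the window and rescales it by 1/s_i in a single step. The sketch below does this for a one-dimensional signal; the function name and the use of linear interpolation are assumptions, since the text does not state how the resampling is performed.

```python
import numpy as np

def cut_and_rescale(signal, center, s_i, w):
    # Sample positions spaced s_i apart so that a width of s_i * w maps onto w samples
    offsets = (np.arange(w) - w // 2) * s_i
    positions = center + offsets
    # Linear interpolation between pixels (assumed resampling method)
    return np.interp(positions, np.arange(len(signal)), signal)

signal = np.arange(100, dtype=float)  # simple ramp so the sampled values are easy to read
window = cut_and_rescale(signal, center=50, s_i=2.0, w=8)
```

A shift of s_i δ pixels in the original signal corresponds to a shift of δ samples in the rescaled window, which is what lets the POC peaks of different stereo pairs be compared on a common axis.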

In addition to the enlargement or reduction of the matching window described above, the window setting unit 112 may further enlarge or reduce the matching window so that the window size suits the surface shape of the target object.

When determining the enlargement or reduction factor of the matching window described above, the window setting unit 112 may also perform image matching for each of a plurality of matching windows with different window sizes (factors) and adopt the window size that yields the highest correlation peak of the one-dimensional POC function.

When three-dimensional restoration from multi-viewpoint images is considered, the window sizes of the matching windows cut out from the left and right images of the individual stereo pairs are related to one another across the stereo pairs (for example, stereo pairs located close to each other have similar window sizes). The window setting unit 112 may therefore constrain the size of the matching windows by comparing the window sizes of other stereo pairs with one another, based on multi-viewpoint images from three or more viewpoints.

FIG. 4 shows the correlation function obtained by the one-dimensional phase-only correlation method (one-dimensional POC function) for each stereo pair. FIG. 4(a) shows the one-dimensional POC functions obtained when the normalized parallax magnification s is not taken into account and matching windows of the same window size w are cut out from all rectified stereo pairs. In this case, the translation amount δ_i between the matching windows differs for each stereo pair. That is, the position coordinates of the correlation peaks of the calculated one-dimensional POC functions do not match across the stereo pairs.

In contrast, FIG. 4(b) shows the one-dimensional POC functions obtained when the normalized parallax magnification s is taken into account and matching windows of window size s_i w are cut out from each rectified stereo pair. In this case, because the normalized parallax magnification s is taken into account, the translation amount δ_i between the matching windows is the same for every stereo pair. That is, the position coordinates of the correlation peaks of the calculated one-dimensional POC functions match across the stereo pairs. The one-dimensional POC functions calculated from the rectified stereo pairs can therefore be integrated, because their peak positions coincide.

The function integration unit 114 integrates the one-dimensional POC functions calculated for the individual stereo pairs by averaging them over the plurality of stereo pairs. The coordinates of the three-dimensional point M (point of interest) in the predetermined world coordinates are expressed by Equation (25), using the translation amount δ based on the position coordinate of the correlation peak of the integrated one-dimensional POC function.

Here, if the three-dimensional point M does not appear in a neighboring viewpoint image C_i ∈ C because of occlusion (a state in which an object in front hides an object behind it), or if a matching window is cut out from a region with multiple parallaxes at an object boundary, a very large error is expected in the position coordinate of the correlation peak of the one-dimensional POC function calculated from that matching window. It is also known experimentally that, when the image deformation between matching windows cannot be approximated by translation alone, the correlation peak height α of the POC function calculated from those matching windows becomes low.

The function integration unit 114 therefore outputs to the depth map generation device 120 only those average POC functions (integrated one-dimensional POC functions) whose correlation peak height α is equal to or greater than a threshold th_corr (that is, whose image similarity is high).

The depth map generation device 120 generates a depth map based on the average POC functions output by the function integration unit 114. This allows the depth map generation device 120 to generate the depth map while suppressing the influence of occlusion, object boundaries, and the like. The depth map generation device 120 includes a depth map generation unit 121, a filter unit 122, an erroneous corresponding point removal unit 123, and an artifact removal unit 124.

The mesh model generation device 130 generates a mesh model based on the depth map calculated by the depth map generation device 120, and outputs data representing the generated mesh model to the storage device 200 and the output device 300.

The storage device 200 stores the reference viewpoint image V_R, the neighboring viewpoint images C, data representing the depth map, and data representing the mesh model. The storage device 200 also stores various parameters and programs used for the processing.

Data representing the mesh model is input to the output device 300. The output device 300 displays the restored surface shape of the object based on the data representing the mesh model. The output device 300 is, for example, a liquid crystal display device.

Next, the operation procedure of the function calculation device will be described.
FIG. 5 is a flowchart showing the operation procedure by which the arithmetic processing device generates a mesh model. The normalization unit 111 of the function calculation device 110 (see FIG. 1) reads the reference viewpoint image V_R and the neighboring viewpoint images C from the storage device 200 (step S1). The normalization unit 111 rectifies the reference viewpoint image V_R and the neighboring viewpoint images C that form the stereo pairs, and normalizes the parallax of the rectified stereo pairs (step S2).

The window setting unit 112 (see FIG. 1) enlarges or reduces the window size of the matching windows cut out from the stereo pairs, based on the parallax magnification (step S3). The function calculation unit 113 (see FIG. 1) calculates the one-dimensional POC function between the enlarged or reduced matching windows (step S4).

The function integration unit 114 (see FIG. 1) integrates those of the calculated one-dimensional POC functions whose correlation peak height is equal to or greater than the threshold (step S5). The depth map generation unit 121 (see FIG. 1) generates a depth map based on the integrated one-dimensional POC function (step S6). The mesh model generation device 130 generates a mesh model based on the depth map (step S7).

<Method of searching for depth using an image pyramid>
When searching for the depth of the three-dimensional point M (calculating its coordinates), the function calculation device 110 may combine the search with a coarse-to-fine search using an image pyramid. This allows the depth map generation device 120 to calculate the depth map representing the world coordinates of the group of three-dimensional points M with fewer matching operations than when no image pyramid is used.

The inputs to the depth search process are the reference viewpoint image V_R and its neighboring viewpoint images C = {C_0, …, C_{K−1}}. The outputs of the depth search process are a depth map dep (I_Z(u, v), described later), a correlation value map corr (I_α(u, v), described later), and a confidence value map conf. The parameters used in the depth map calculation process are the threshold th_corr for the correlation peak height, the number of levels H of the image pyramid, the window size w serving as the reference size, and the number of lines L used to calculate the average POC function.

FIG. 6 shows the operation procedure of the function calculation device at each level of the image pyramid. The normalization unit 111 (see FIG. 1) reads the reference viewpoint image V_R and the neighboring viewpoint images C captured from viewpoints near it from the storage device 200. The normalization unit 111 then rectifies the stereo pairs V_R-C_i (i = 0, …, K−1). Further, the function calculation unit 113 generates image pyramids (hierarchical images) of the rectified stereo pairs V_R-C_i.

Hereinafter, the reference viewpoint image of the rectified stereo pair V_R-C_i is denoted V_rect,i,0 and the neighboring viewpoint image is denoted C_rect,i,0. For the image pyramid levels h = 1, …, H−1, the image obtained by scaling the reference viewpoint image V_rect,i,0 by a factor of 2^(−h) is denoted V_rect,i,h, and the image obtained by scaling the neighboring viewpoint image C_rect,i,0 by a factor of 2^(−h) is denoted C_rect,i,h.
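A pyramid with levels h = 0, …, H−1, each half the size of the one below it, could be built as in the following sketch. The text does not specify the downsampling filter, so the 2×2 block averaging used here, as well as the function name, is an assumption for illustration.

```python
import numpy as np

def build_pyramid(image, levels):
    # Level 0 is the full-resolution image; level h is scaled by 2^(-h)
    pyramid = [image.astype(float)]
    for _ in range(1, levels):
        prev = pyramid[-1]
        rows = prev.shape[0] // 2 * 2  # drop a trailing odd row or column
        cols = prev.shape[1] // 2 * 2
        blocks = prev[:rows, :cols].reshape(rows // 2, 2, cols // 2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))  # average each 2x2 block
    return pyramid

image = np.arange(64, dtype=float).reshape(8, 8)
pyramid = build_pyramid(image, levels=3)
```

The coarse-to-fine search then starts at `pyramid[-1]` (the top level H−1) and refines the estimate down to `pyramid[0]`.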

The function calculation unit 113 (see FIG. 1) determines the start coordinate Z_init of the depth search by performing matching of the entire image at the top level H−1 of the image pyramid.
This start coordinate Z_init is determined by calculating the parallax for each stereo pair (i = 0, …, K−1) based on the position coordinate of the correlation peak of the one-dimensional POC function between the reference viewpoint image V_rect,i,H−1 and the neighboring viewpoint image C_rect,i,H−1.

For an arbitrary point m = (u, v) in the reference viewpoint image V_R, the function calculation unit 113 calculates the depth, the correlation value, and the confidence value, and stores the depth map dep(m), the correlation value map corr(m), and the confidence value map conf(m) in the storage device 200. The function calculation unit 113 calculates dep(m), corr(m), and conf(m) by repeating the process while moving the coordinates of the point m over the reference viewpoint image V_R.

Hereinafter, the corresponding point obtained by projecting the three-dimensional point M_i onto the reference viewpoint image V_rect,i,h (i = 0, …, K−1), which forms one image of the stereo pair, is denoted m_i,h = (u_i,h, v_i,h). The corresponding point obtained by projecting the three-dimensional point M_i onto the neighboring viewpoint image C_rect,i,h (i = 0, …, K−1), which forms the other image of the stereo pair, is denoted m′_i,h = (u′_i,h, v′_i,h).

FIG. 7 is a flowchart showing the depth search procedure. The normalization unit 111 calculates the parallax magnification s_i (i = 0, …, K−1) at an arbitrary point m on the reference viewpoint image V_R based on Equation (19). In doing so, the corresponding point m_i = (u_i, v_i) in the rectified reference viewpoint image V_rect,i,0 is calculated based on the projection matrix used to warp the images forming the rectified stereo pair (step Sa1).

The function calculation unit 113 initializes the image pyramid level to h = H−1 and the three-dimensional point to M = M_init, and starts the search for the coordinates of the three-dimensional point M in world coordinates from the top level H−1 of the image pyramid. Here, M_init is the three-dimensional point located at depth Z_init on the line of sight passing through the coordinates of the point m in the reference viewpoint image V_R (step Sa2).

The window setting unit 112 defines and cuts out a matching window f_i,h of size s_i w × L (window size × number of lines) from the reference viewpoint image V_rect,i,h, centered on the corresponding point m_i,h. The window setting unit 112 likewise defines and cuts out a matching window g_i,h of size s_i w × L from the neighboring viewpoint image C_rect,i,h, centered on the corresponding point m_i,h. The function calculation unit 113 calculates the one-dimensional POC function r_i,h based on the cut-out matching windows f_i,h and g_i,h (step Sa3).

The function integration unit 114 calculates the average POC function r_ave,h by averaging those one-dimensional POC functions r_i,h whose correlation peak height α_i is at least the threshold th_corr (step Sa4). Hereinafter, the number of one-dimensional POC functions satisfying α_i ≥ th_corr is denoted K'.
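The selection and averaging in step Sa4 can be sketched as follows. This is an illustrative sketch only: the peak height α_i is approximated here by the largest sample of each function, whereas the document obtains peak heights by function fitting, and the name `average_poc` is hypothetical.

```python
def average_poc(poc_functions, th_corr):
    """Average only the 1D POC functions whose peak height is >= th_corr.

    Returns (r_ave, k_prime); r_ave is None when no function passes.
    Assumption: the peak height is approximated by the maximum sample.
    """
    kept = [r for r in poc_functions if max(r) >= th_corr]
    k_prime = len(kept)  # corresponds to K' in the text
    if k_prime == 0:
        return None, 0
    n = len(kept[0])
    r_ave = [sum(r[t] for r in kept) / k_prime for t in range(n)]
    return r_ave, k_prime
```

Discarding low-peak functions before averaging keeps unreliable stereo pairs (for example, occluded views) from flattening the averaged peak.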

The function integration unit 114 performs function fitting on the average POC function r_ave,h to calculate the correlation peak height α of the average POC function and the position of its correlation peak (corresponding to the translation amount δ). The function integration unit 114 then updates the coordinates of the three-dimensional point M in world coordinates (the camera coordinates of the reference viewpoint image) based on the translation amount δ and Expression (25) (step Sa5).

The function integration unit 114 determines whether the current layer of the image pyramid is the lowest layer (step Sa6). If it is the lowest layer (step Sa6: Yes), the depth map generation unit 121 generates a depth map of the group of three-dimensional points M for which the correlation value map corr and the confidence value map conf are at or above their thresholds (step Sa7).

Here, the depth map dep(m) is defined in advance as in Expression (26), using the one-dimensional POC function r_ave,0 at layer h = 0 (the lowest layer) and the three-dimensional point M = (X, Y, Z). Similarly, the correlation value map corr(m) is defined in advance as in Expression (27), and the confidence value map conf(m) is defined in advance as in Expression (28).

On the other hand, if the layer of the image pyramid is not the lowest layer in step Sa6 (step Sa6: No), the function integration unit 114 subtracts 1 from the layer h and returns the process to step Sa3 (step Sa8).

As described above, the depth map generation unit 121 generates the depth map dep based on the correlation value map corr and the confidence value map conf that are at or above their thresholds, and can thereby calculate highly reliable coordinates for the group of three-dimensional points M.
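The control flow of steps Sa2 through Sa8 can be summarized in a short sketch. The two callables are hypothetical placeholders: `estimate_shift` stands in for steps Sa3 to Sa5 (windowing, POC averaging, peak fitting), and `update_depth` stands in for the update of Expression (25), which is not reproduced here. Only the depth component of the three-dimensional point is tracked, as a simplifying assumption.

```python
def coarse_to_fine_depth(num_layers, z_init, estimate_shift, update_depth):
    """Refine a depth from the top pyramid layer (h = H-1) down to h = 0.

    estimate_shift(h, z) -> delta  : hypothetical stand-in for steps Sa3-Sa5
    update_depth(z, delta, h) -> z : hypothetical stand-in for Expression (25)
    """
    z = z_init                                  # step Sa2: start from Z_init
    for h in range(num_layers - 1, -1, -1):     # steps Sa6 and Sa8: descend layers
        delta = estimate_shift(h, z)            # steps Sa3-Sa5 at layer h
        z = update_depth(z, delta, h)
    return z                                    # depth after the lowest layer
```

Each layer starts from the depth refined at the layer above, so the search range per layer can stay small even when the initial depth is coarse.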

The above describes the case in which image deformation between stereo pairs is expressed using the translation amount of the matching window and the parallax magnification of the matching window.

Next, a case is described in which image deformation between stereo pairs is expressed using not only the translation amount and the parallax magnification of the matching window but also the scaling ratio and the skew (distortion) of the matching window.

The images of a stereo pair deform non-linearly relative to each other, depending on the surface shape of the photographed object to be three-dimensionally restored and on the positional relationship of the cameras. The window setting unit 112 (see FIG. 1) of the function calculation device 110 determines the matching windows so as to reduce this image deformation between the stereo pair. More specifically, the window setting unit 112 determines the matching windows based on at least one of a scaling ratio based on a line segment and a slope based on a normal vector.

As a result, the function calculation device 110 can accurately calculate the correlation function of a stereo pair even when the image deformation between the stereo pair is large, for example when the baseline between the stereo pair is long or when the object to be three-dimensionally restored is strongly inclined with respect to the camera. The function calculation device 110 can calculate the coordinates of the three-dimensional point (point of interest) captured in the images with high accuracy. In addition, the depth map generation device 120 can generate a depth map by executing high-precision image matching.

In a rectified stereo pair, the epipolar lines are parallel to the vertical or horizontal coordinate axis, so the image deformation between the stereo pair is limited to one dimension. Therefore, if the object to be three-dimensionally restored is assumed to be locally planar, the image deformation between the matching windows of the rectified reference viewpoint image V_rect,i and the neighboring viewpoint image C_rect,i can be expressed using the scaling ratio of the matching window and the skew of the matching window (for example, the slope of the skew).

The following describes a method of calculating the scaling ratio ξ_i and the slope κ_i for the rectified stereo pair V_R-C_i, that is, the rectified stereo pair V_rect,i-C_rect,i, given a three-dimensional point M_i = [X_i, Y_i, Z_i] and the normal vector n_i = [n_X,i, n_Y,i, n_Z,i] at that point.

Note that the three-dimensional point M_i is expressed in the camera coordinate system of the rectified reference viewpoint image V_rect,i, and so is the normal vector n_i. The camera coordinates of the reference viewpoint image V_rect,i have been rotated, based on the camera parameters of the neighboring viewpoint image C_rect,i, when the reference viewpoint image V_rect,i and the neighboring viewpoint image C_rect,i of the stereo pair were rectified.

FIG. 8 shows the matching windows determined for each image of a stereo pair. In FIG. 8, the stereo pair V_rect,i-C_rect,i is assumed to be rectified so that the epipolar lines are parallel to the horizontal coordinate axis. The three-dimensional plane (object surface) is assumed to be strongly inclined with respect to the camera.

First, reduction of image deformation based on the scaling ratio ξ_i is described with reference to FIG. 8. Suppose that a unit line segment centered on the three-dimensional point M_i is placed on the line of intersection between the epipolar plane and the three-dimensional plane (object surface) determined by the three-dimensional point M_i and the normal vector n_i.

The length w_1,i of the segment obtained by projecting this unit line segment onto the reference viewpoint image V_rect,i is given by Expression (29). The length w_2,i of the segment obtained by projecting the unit line segment onto the neighboring viewpoint image C_rect,i is given by Expression (30).

Here, ψ_1,i is the angle between the optical axis and the line of sight in the reference viewpoint image V_rect,i, and ψ_2,i is the angle between the optical axis and the line of sight in the neighboring viewpoint image C_rect,i. Further, φ_1,i is the angle between the line of sight of the reference viewpoint image and the projection of the normal vector n_i onto the epipolar plane, and φ_2,i is the angle between the line of sight of the neighboring viewpoint image and the projection of the normal vector n_i onto the epipolar plane.

From Expressions (29) and (30), the relationship between the projected length w_1,i in the reference viewpoint image V_rect,i and the projected length w_2,i in the neighboring viewpoint image C_rect,i is given by Expression (31) or Expression (32).

The scaling ratio ξ_i of the matching window is given by Expression (33). Expression (33) also gives the scaling ratio ξ_i when the stereo pair V_rect,i-C_rect,i is rectified so that the epipolar lines are parallel to the vertical coordinate axis.

Based on the parallax magnification s_i and the window size w, the window setting unit 112 (see FIG. 1) sets the horizontal (width) window size in the neighboring viewpoint image C_rect,i to s_i w pixels and the vertical window size in the neighboring viewpoint image C_rect,i to L pixels (see the matching window at the lower right of FIG. 8).

In contrast, based on the parallax magnification s_i, the window size w, and the scaling ratio ξ_i, the window setting unit 112 sets the horizontal (width) window size in the reference viewpoint image V_rect,i to ξ_i s_i w pixels and the vertical window size in the reference viewpoint image V_rect,i to L pixels (see the matching window at the lower left of FIG. 8). In this way, the window setting unit 112 can reduce local image deformation between the stereo pair based on the scaling ratio ξ_i.

Note that if, instead of multiplying the horizontal window size in the reference viewpoint image V_rect,i by ξ_i, the horizontal window size in the neighboring viewpoint image C_rect,i were multiplied by 1/ξ_i, the position of the correlation peak of the one-dimensional POC function r_i,h would be multiplied by ξ_i. The position of the correlation peak of a one-dimensional POC function r_j,h calculated from another stereo pair V_rect,j-C_rect,j (j ∈ {1, ..., K-1} - {i}) would then no longer coincide with the position of the correlation peak of r_i,h.

Next, reduction of image deformation based on the slope κ_i is described with reference to FIG. 8.
A straight line is assumed on the three-dimensional plane (object surface) determined by the three-dimensional point M_i and the normal vector n_i. The line is chosen on the three-dimensional plane so that, when projected onto the reference viewpoint image V_rect,i, it passes through the corresponding point m_i = [u_i, v_i] of the three-dimensional point M_i in the reference viewpoint image V_rect,i and is parallel to the vertical coordinate axis.

The straight line assumed on the three-dimensional plane in this way is projected onto the neighboring viewpoint image C_rect,i as a straight line that passes through the corresponding point m_C_i = [u_C_i, v_i] of the three-dimensional point M_i in the neighboring viewpoint image C_rect,i and has slope κ_i. The straight line on the three-dimensional plane is represented by the points [X, Y, Z] that satisfy Expression (34).

Here, the coordinates (u_0i, v_0i) are the center coordinates of the reference viewpoint image V_rect,i.
Further, β_i is the focal length of the reference viewpoint image V_rect,i, ||t_cam,i|| is the baseline length between the stereo pair, and v'_i is an arbitrary vertical coordinate on the reference viewpoint image V_rect,i.

As shown in FIG. 8, when the neighboring viewpoint image C_rect,i lies in the positive X-axis direction relative to the reference viewpoint image V_rect,i, solving Expression (34) for the slope κ_i yields Expression (35).

On the other hand, when the neighboring viewpoint image C_rect,i lies in the negative X-axis direction relative to the reference viewpoint image V_rect,i, the sign of the slope κ_i is reversed. Expression (35) is therefore written more generally as Expression (36). Expression (36) also gives the slope κ_i when the stereo pair V_rect,i-C_rect,i is rectified so that the epipolar lines are parallel to the vertical coordinate axis.

The window setting unit 112 determines the matching window in the neighboring viewpoint image C_rect,i so that, for example, the slope of the skew is κ_i. The window setting unit 112 also determines the center coordinates of each line constituting the matching window based on the positional offset between the coordinate axis orthogonal to the epipolar line and the center coordinates of the matching window. In this way, the window setting unit 112 determines the matching window based on the slope κ_i so as to reduce local image deformation between the stereo pair.
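One way to picture the skewed window is to compute the center of each of its L lines. The sketch below assumes that κ_i is the horizontal displacement per unit of vertical offset from the window center and that the epipolar lines are horizontal; this reading of the slope, and the function name `skewed_line_centers`, are assumptions made for illustration and are not taken from Expressions (34) to (36).

```python
def skewed_line_centers(u_center, v_center, num_lines, kappa):
    """Return the (u, v) center of each line of a window with num_lines lines.

    Assumption: kappa is the horizontal displacement per unit vertical
    offset from the window center (horizontal epipolar lines).
    """
    half = (num_lines - 1) / 2.0
    centers = []
    for line in range(num_lines):
        dv = line - half                  # vertical offset from the center line
        centers.append((u_center + kappa * dv, v_center + dv))
    return centers
```

For example, with five lines and κ = 0.5, the top line is shifted one pixel to the left of the window center and the bottom line one pixel to the right, while the center line is unshifted.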

The function integration unit 114 (see FIG. 1) integrates the one-dimensional POC functions, calculated for each stereo pair based on the scaling ratio ξ_i and the slope κ_i, by averaging them across the stereo pairs. Of the average POC functions (integrated one-dimensional POC functions), the function integration unit 114 outputs to the depth map generation device 120 (see FIG. 1) only those whose correlation peak height α is at least the threshold th_corr (that is, those with high image similarity).

Based on the average POC functions output by the function integration unit 114, the depth map generation device 120 generates a depth map (I_Z(u, v), described later) from the multi-viewpoint images by a depth search using plane sweeping and a hierarchical search using the image pyramid.

<Method of searching for depth using an image pyramid>
FIG. 9 is a flowchart showing the operation procedure of the function calculation device at each layer of the image pyramid.
(Step Sb1) The normalization unit 111 (see FIG. 1) reads, from the storage device 200, the reference viewpoint image V_rect,i and the neighboring viewpoint image C_rect,i, both of which have known camera parameters. The normalization unit 111 generates K rectified stereo pairs V_rect,i-C_rect,i (i = 0, ..., K-1). For these K rectified stereo pairs, the normalization unit 111 generates an image pyramid with H layers in which the image is reduced by one half at each layer.

(Step Sb2) The normalization unit 111 calculates the parallax magnification s_i of each stereo pair for the coordinates m_i = [u_i, v_i] on the reference viewpoint image V_rect,i.

(Step Sb3) Because the normal vector n_i of the three-dimensional plane (object surface) to be restored has not yet been calculated, the function integration unit 114 (see FIG. 1) calculates the average POC function r_ave,h at the top layer H-1 of the image pyramid while varying the depth of the three-dimensional point M_i and the normal vector n_i (image matching).

Here, the step size for varying the depth of the three-dimensional point M_i may be a length corresponding to one quarter of the window size w. The average POC function r_ave,h may be calculated for nine normal vectors n obtained by rotating the normal vector n_i that directly faces the reference viewpoint image V_rect,i about the X axis and the Y axis within a range of ±(π/8).

(Step Sb4) The function integration unit 114 selects the depth of the three-dimensional point M_i and the normal vector n_i that give the highest correlation peak height α_i of the average POC function. The function integration unit 114 also updates the depth of the three-dimensional point M_i based on the position of the correlation peak of the average POC function r_ave,h. The normal vector n_i may be refined further, using the selected normal vector n_i as a reference.

(Step Sb5) Based on the normal vector n_i selected at the top layer H-1, the function integration unit 114 calculates the average POC function r_ave,h between the stereo pairs and updates the depth of the three-dimensional point M_i to the depth that gives the highest correlation peak height α_i of the average POC function. The function integration unit 114 also updates the depth of the three-dimensional point M_i based on the position of the correlation peak of the average POC function r_ave,h.

(Step Sb6) The function integration unit 114 determines whether the current layer of the image pyramid is the lowest layer h = 0. If it is the lowest layer (step Sb6: Yes), the function integration unit 114 advances the process to step Sb8. If it is not the lowest layer (step Sb6: No), the function calculation unit 113 advances the process to step Sb7.

(Step Sb7) The function integration unit 114 subtracts 1 from the layer h (moving the process to the next lower layer) and returns the process to step Sb5.
(Step Sb8) The function integration unit 114 takes the updated depth of the three-dimensional point M_i as the depth of the three-dimensional point M_i corresponding to the coordinates m_i.

(Step Sb9) The function integration unit 114 determines whether the depth of the three-dimensional point M_i has been determined for all coordinates m_i on the reference viewpoint image V_rect,i. If so (step Sb9: Yes), the function integration unit 114 outputs to the depth map generation device 120 only those average POC functions (integrated one-dimensional POC functions) whose correlation peak height α is at least the threshold th_corr (that is, those with high image similarity), and ends the process. If the depth has not yet been determined for some coordinates m_i on the reference viewpoint image V_rect,i (step Sb9: No), the function integration unit 114 returns the process to step Sb2.
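The nine candidate normal vectors mentioned for step Sb3 can be generated as in the sketch below. It assumes the candidates lie on a 3 × 3 grid of rotation angles {-π/8, 0, +π/8} about the X and Y axes, and that the front-facing normal is (0, 0, -1) in the camera coordinates of the reference viewpoint image; both choices, and the function names, are assumptions made for illustration.

```python
import math

def rotate_x(v, a):
    """Rotate vector v about the X axis by angle a (radians)."""
    x, y, z = v
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))

def rotate_y(v, a):
    """Rotate vector v about the Y axis by angle a (radians)."""
    x, y, z = v
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

def candidate_normals(base=(0.0, 0.0, -1.0), step=math.pi / 8):
    """Assumed 3 x 3 grid of rotations about X and Y, giving nine normals."""
    return [rotate_y(rotate_x(base, ax), ay)
            for ax in (-step, 0.0, step)
            for ay in (-step, 0.0, step)]
```

The fifth entry of the returned list (both angles zero) is the front-facing normal itself; the remaining eight tilt the assumed surface toward each side and corner.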

The function calculation device 110 then changes the reference viewpoint image V_rect,i whose depth is searched and executes the same processing for the new reference viewpoint image. The depth map generation device 120 (depth map generation unit 121) generates a depth map based on the average POC functions output by the function integration unit 114.
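The image pyramid of step Sb1, in which each layer halves the image, can be sketched as follows. The text does not specify the reduction filter, so averaging non-overlapping 2 × 2 blocks is an assumption of this sketch, as are the names `halve` and `build_pyramid`; images are plain nested lists with even dimensions at every layer.

```python
def halve(image):
    """Downsample a 2D list by averaging non-overlapping 2x2 blocks
    (assumed reduction filter; the source does not specify one)."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)] for r in range(rows)]

def build_pyramid(image, num_layers):
    """Layer 0 is the original image; each higher layer is half the size."""
    pyramid = [image]
    for _ in range(num_layers - 1):
        pyramid.append(halve(pyramid[-1]))
    return pyramid
```

The search then starts at `pyramid[num_layers - 1]` (layer H-1) and ends at `pyramid[0]`.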

<Repair of missing pixels using a weighted median filter>
In the hierarchical search using the image pyramid described above, the matching result of one pixel in an upper layer affects multiple pixels in the lower layers. Consequently, a mismatch at a pixel in an upper layer causes mismatches at many pixels in the final depth map. In addition, the higher the layer, the larger the window becomes relative to the image size, so the region affected by object boundaries and occlusion boundaries grows. To address this problem, the present embodiment uses a weighted median filter to improve the accuracy of the depth map. The weighted median filter corrects isolated mismatched points and artifacts at object boundaries and occlusion boundaries.

That is, in the present embodiment, the repair processing by the weighted median filter described above is applied to pixels missing from the upper-layer matching. This suppresses propagation to the lower layers of mismatches that occur sporadically in the upper layers and of artifacts that arise near object boundaries and occlusion boundaries.
In the present embodiment, the final depth map uses the depth estimation result of window matching based on POC (Phase-Only Correlation) for MVS (Multi-View Stereo) as is, so the map values of the final depth map are not updated with the weighted median filter.

At each layer of the image pyramid, each time the depth map I_Z^h(u, v) (depth map dep), the correlation value map I_α^h(u, v) (correlation value map corr), the normal vector map I_Ψ^h(u, v), and the normal vector map I_φ^h(u, v) are generated, the weighted median filter is used to repair the pixels missing from the matching process in each of the depth map I_Z^h(u, v), the correlation value map I_α^h(u, v), and the normal vector map I_Ψ^h(u, v). In each of these maps, the superscript h indicates the corresponding layer of the image pyramid. Here, u and v are coordinate values on the X axis and the Y axis, which form a two-dimensional plane in the three-dimensional space spanned by the X, Y, and Z axes.

FIG. 10 illustrates the relationship between the upper-layer, middle-layer, and lower-layer depth maps in the image pyramid. In FIG. 10, the upper-layer depth map I_Z^(H-1)(u, v), the middle-layer depth map I_Z^h(u, v), and the lower-layer depth map I_Z^0(u, v) together constitute the image pyramid. One pixel matched in the upper-layer depth map I_Z^(H-1)(u, v) corresponds to four pixels in the middle-layer depth map I_Z^h(u, v), and one pixel matched in the middle-layer depth map I_Z^h(u, v) corresponds to four pixels in the lower-layer depth map I_Z^0(u, v). One pixel matched in the upper-layer depth map therefore corresponds to 16 pixels in the lower-layer depth map. In the example of FIG. 10, the lowest layer, which is the original image data, is reduced by a factor of 2 vertically and horizontally at each higher layer, and matching is performed pixel by pixel. With a reduction by a factor of n vertically and horizontally, the number of pixels grows by a factor of n^2 per layer going down, so one missing pixel in an upper layer becomes many missing pixels as it propagates to the lower layers. The same applies within the image pyramid to the correlation value map I_α^h(u, v) and the normal vector map I_Ψ^h(u, v).
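As a worked check of these counts, let k be the number of layers descended (k is notation introduced here for illustration only). A single pixel then covers

```latex
N(k) = \left(n^{2}\right)^{k} = n^{2k}
\quad\text{pixels}, \qquad
n = 2:\;\; N(1) = 4,\;\; N(2) = 16 .
```

This matches the three-layer example of FIG. 10, where one upper-layer pixel corresponds to 4 middle-layer pixels and 16 lower-layer pixels.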

In the present embodiment, the weight function of the weighted median filter is the product of the bilateral weight in the original image (the Gaussian filter kernel combined with a weight based on the luminance difference) and the correlation value of the corresponding pixel in the correlation value map.
In the following, the weighted median filter of the present embodiment is described using the depth map I_Z^h(u, v) at layer h as an example.
The weighted median filter of the present embodiment determines the value of the target pixel, which is the pixel to be filtered (its depth value in the depth map), as the weighted median of the values (depth values in the depth map) of the other pixels in a surrounding region centered on the target pixel.

To calculate the weighted median, given a sequence (a_k)_{k = 0, 1, ..., N-1} sorted in ascending or descending order and the weights (ω_k)_{k = 0, 1, ..., N-1} of its elements (each corresponding to a pixel position), the weighted median a_med of the sequence (a_k) is obtained by Expression (37) below. In Expression (37), n and l are term indices of the sequence; l is the index of the term for the target pixel.

In the weighted median filter, for a target pixel whose missing data is to be repaired, the map values I_med(i, j) of the other pixels around the target pixel (a preset region centered on the target pixel and containing other pixels) and their weights ω_med(i, j) are sorted in ascending or descending order. The weighted median a_med is then computed from the sorted sequence and the weights (ω_k) of its elements. Here, (i, j) are coordinates within the window of the median filter, with i = -M_ω, ..., M_ω and j = -M_ω, ..., M_ω, and the window size is N_ω × N_ω pixels (N_ω = 2M_ω + 1).
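A weighted median over the window values can be sketched as follows. Expression (37) is not reproduced in this text, so the sketch uses the common definition (the first sorted value at which the cumulative weight reaches half of the total weight); this definition and the name `weighted_median` are assumptions, not necessarily the exact form of Expression (37).

```python
def weighted_median(values, weights):
    """Return the first value, in ascending order, at which the cumulative
    weight reaches half of the total weight (assumed common definition)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    pairs = sorted(zip(values, weights))   # sort values, keeping their weights
    half = total / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]                    # guard against rounding error
```

With equal weights this reduces to an ordinary (lower) median, while a single heavily weighted neighbor can dominate the result, which is how reliable pixels override isolated mismatches.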

FIG. 11 shows the relationship between the coordinate system of the depth map and the coordinate system of the window. In FIG. 11, the depth map I_Z^h(u, v) of the middle layer gives the depth of each pixel, where u is the vertical pixel coordinate and v is the horizontal pixel coordinate. The window of the weighted median filter corresponds to the map values I_med(i, j). As described above, i = −M_ω, ..., M_ω and j = −M_ω, ..., M_ω, and the window size (number of pixels) is N_ω × N_ω (N_ω = 2M_ω + 1).
Here, the correspondence between the pixel coordinates (u′, v′) of the target pixel in the coordinate system of the depth map I_Z^h(u, v) of layer h (middle layer) and the map value I_med(i, j) in the coordinate system of the window is expressed by the following equation (38).

The weight ω_med(i, j) is expressed by the following equation (39).

In equation (39), the pixel coordinates (u′, v′) of the target pixel denote the pixel at the center of the weighted median filter window in the coordinate system of the depth map I_Z^h(u, v). I_V^h(u, v) denotes the luminance value of each pixel of the image data captured by the camera of the target viewpoint at layer h of the image pyramid. σ_ω1 and σ_ω2 are arbitrary parameters.
Equation (32) described above is based on the following three assumptions.
- Within the same image data, pixels near the target pixel (at coordinates within a predetermined range) have depth values in the depth map close to that of the target pixel.
- Within the same image data, pixels whose luminance is close to that of the target pixel have depth values in the depth map close to that of the target pixel.
- The lower the peak value of the POC function at a pixel, the lower the reliability of its depth value in the depth map.

In equations (38) and (39), the depth map I_Z^h(u, v) and the luminance I_V^h(u, v) are taken to be 0 outside the image.
Using the map values I_med(i, j) of the pixels in the window, the map values (for example, depth values) are sorted to generate the sequence (a_k). The weights ω_med(i, j) of the pixels in the window are arranged in the corresponding order to generate the weight sequence (ω_k). The sequence (a_k) and the weight sequence (ω_k) are then substituted into equation (37) to calculate the weighted median a_med, which becomes the depth value of the target pixel at pixel coordinates (u′, v′) in the depth map I_Z^h(u, v) after weighted median filtering. That is, the assignment given by the following equation (40) is performed.

In the update of the depth map I_Z^h(u, v) using the weighted median filter described above, the pixel coordinates (u′, v′) of the target pixel are changed sequentially, the weighted median a_med is calculated for every pixel in the depth map I_Z^h(u, v), and the map value of each pixel is updated to the calculated weighted median.
Although the above description uses the depth map I_Z^h(u, v) as an example, the correlation value map I_α^h(u, v) and the normal vector map I_Ψ^h(u, v) are each updated with the weighted median filter in the same way as the depth map. In this case, the map value I_med(i, j) in the window is the value of the respective map. For the correlation value map I_α^h(u, v), the correlation value is substituted as the map value into the equation corresponding to equation (38). Likewise, for the normal vector map I_Ψ^h(u, v), the normal vector is substituted as the map value into the equation corresponding to equation (38).
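The per-pixel update can be sketched as follows. This is a hedged illustration, not the patented implementation: the names `weighted_median_1d` and `weighted_median_filter` are hypothetical, the caller-supplied `window_weights(u, v)` callback merely stands in for equation (39) (which is not reproduced here), and the weighted median uses the usual cumulative-weight definition as an assumption about equation (37). Values outside the image are treated as 0, as stated above. Each output pixel is computed from the unfiltered input map, which is also an assumption.

```python
import numpy as np

def weighted_median_1d(values, weights):
    """Weighted median of a 1-D array, or None if all weights are zero."""
    order = np.argsort(values, kind="stable")
    sorted_values, sorted_weights = values[order], weights[order]
    total = sorted_weights.sum()
    if total <= 0:
        return None
    cumulative = np.cumsum(sorted_weights)
    # First index whose cumulative weight reaches half of the total weight.
    index = int(np.searchsorted(cumulative, total / 2.0, side="left"))
    return sorted_values[index]

def weighted_median_filter(depth, window_weights, half_width):
    """Filter a 2-D map with an N x N window, N = 2 * half_width + 1.

    window_weights(u, v) must return an N x N array of non-negative weights
    for the window centered on pixel (u, v); it is a hypothetical hook.
    """
    height, width = depth.shape
    size = 2 * half_width + 1
    # Zero padding: map values outside the image are taken to be 0.
    padded = np.pad(depth, half_width, mode="constant", constant_values=0.0)
    output = depth.copy()
    for u in range(height):
        for v in range(width):
            window = padded[u:u + size, v:v + size]
            weights = np.asarray(window_weights(u, v), dtype=float)
            result = weighted_median_1d(window.ravel(), weights.ravel())
            if result is not None:
                output[u, v] = result  # corresponds to the assignment step
    return output
```

For example, a 3 × 3 map of depth 5 with a hole (0) at its center is repaired at the center when uniform weights are used; the same routine could be applied to a correlation value map by passing that map instead of the depth map.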

As described above, according to this embodiment, the map values of the depth map I_Z^h(u, v), the correlation value map I_α^h(u, v), and the normal vector map I_Ψ^h(u, v) are updated by the weighted median filter at every layer of the image pyramid other than the lowest layer. Missing pixels caused by window matching can therefore be repaired, and the propagation of the effect of missing pixels from an upper-layer map to the lower-layer maps can be suppressed.

<Removal of miscorresponding points in the depth map>
The miscorresponding point removal unit 123 of the depth map generation device 120 removes miscorresponding points and artifacts from the depth map of each viewpoint (the depth map dep generated by the depth map generation unit 121) by the miscorresponding point removal process and the artifact removal process described below (the proposed method). In the following, the depth map dep generated by the depth map generation unit 121 is written as the depth map I_Z(u, v).

- Miscorresponding point removal process
In this process, miscorresponding points are removed using the graph cut technique. Even if a point is a miscorresponding point in the depth map of one viewpoint, it may not be a miscorresponding point in the depth map of another (nearby) viewpoint that reconstructs the same three-dimensional (world) coordinate point. The miscorresponding point removal process of this embodiment therefore removes miscorresponding points from the depth map of the target viewpoint under evaluation by enforcing consistency with the depth maps of reference viewpoints, that is, other viewpoints whose camera positions are close. In particular, when a depth map is generated by hierarchical search over the layers of the image pyramid as described above, miscorresponding points spread as processing moves from the upper layers to the lower layers and tend to appear in clusters on the lower-layer depth maps.
In this embodiment, therefore, miscorresponding points that appear in clusters in the lower layers of the image pyramid are removed using graph cuts from graph theory, so as to enforce consistency between the viewpoint where the miscorresponding points occurred (the target viewpoint) and other nearby viewpoints (the reference viewpoints).

A graph cut is a technique that, in a weighted directed graph containing an S node (source node: inlier) and a T node (sink node: outlier), finds an S-T cut and performs binary labeling with the S node and the T node as the two labels. Here, the S-T cut is the cut line that minimizes the weight of the cut edges, which run from the S node to the node corresponding to each pixel or from the node corresponding to each pixel to the T node; each node is then labeled S or T.
The S-T cut is found by assigning the terms of an energy function to the edge weights of the directed graph and solving the resulting energy minimization problem. When the S-T cut is applied to image processing, the connectivity between the pixel of a given node and its neighboring pixels must also be taken into account.

FIG. 12 illustrates the graph cut process that removes miscorresponding points from the depth map I_Z(u, v), and shows the graph to be cut. For the target viewpoint V_R ∈ V, the graph cut solves a binary labeling problem: whether the node of the pixel m at each pixel coordinate (u, v) on the depth map I_Z(u, v) is a correct corresponding point (a correctly reconstructed three-dimensional point) or a miscorresponding point (an incorrectly reconstructed three-dimensional point). In FIG. 12, each node represents a pixel m, the S node (source node) carries the inlier label, and the T node (sink node) carries the outlier label. For each pixel node, an edge from the S node to the node (edge1) and an edge from the node to the T node (edge2) are shown.

To represent connectivity with adjacent pixels, bidirectional edges (edge3) are added between the node of each pixel and the nodes of its four neighboring pixels. For a pixel at pixel coordinates (u, v), the four neighbors are, for example, the pixels adjacent above, below, left, and right at pixel coordinates (u−1, v), (u, v−1), (u+1, v), and (u, v+1). Let N be the set of nodes of all pixels and E the set of edges between all pixels. A weight is assigned to each edge: E_data1(m) for the edge from the S node to each pixel node (edge1), E_data2(m) for the edge from each pixel node to the T node (edge2), and E_smooth(m_1 − m_2) for the edge (edge3) representing the relationship between the nodes of adjacent pixels (for example, m_1 and m_2). The graph cut then becomes the problem of minimizing the total energy E_all(N, E) of the following equation (41).
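The graph structure described here (edge1 from S, edge2 to T, bidirectional edge3 between 4-neighbors) can be illustrated with a small minimum-cut solver. The sketch below is an assumption-laden illustration: equation (41) is not reproduced, so the edge weights are simply passed in by the caller; the function name `min_cut_labels` and the `smooth_fn` callback are hypothetical; and a plain Edmonds-Karp max-flow is used for brevity, which the patent does not specify.

```python
from collections import deque

def min_cut_labels(data1, data2, smooth_fn):
    """Label each pixel True (stays with S: inlier) or False (T: outlier).

    data1[u][v]: weight of edge1 (S -> pixel); data2[u][v]: weight of edge2
    (pixel -> T); smooth_fn((u1, v1), (u2, v2)): weight of edge3.
    """
    height, width = len(data1), len(data1[0])
    source, sink = height * width, height * width + 1
    capacity = [dict() for _ in range(height * width + 2)]

    def add_edge(a, b, c):
        capacity[a][b] = capacity[a].get(b, 0.0) + c
        capacity[b].setdefault(a, 0.0)  # residual (reverse) edge

    for u in range(height):
        for v in range(width):
            node = u * width + v
            add_edge(source, node, data1[u][v])
            add_edge(node, sink, data2[u][v])
            for nu, nv in ((u + 1, v), (u, v + 1)):  # each neighbor pair once
                if nu < height and nv < width:
                    weight = smooth_fn((u, v), (nu, nv))
                    add_edge(node, nu * width + nv, weight)
                    add_edge(nu * width + nv, node, weight)

    def bfs():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            a = queue.popleft()
            for b, c in capacity[a].items():
                if c > 1e-12 and b not in parent:
                    parent[b] = a
                    if b == sink:
                        return parent
                    queue.append(b)
        return parent

    while True:
        parent = bfs()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        bottleneck, b = float("inf"), sink
        while parent[b] is not None:
            bottleneck = min(bottleneck, capacity[parent[b]][b])
            b = parent[b]
        b = sink
        while parent[b] is not None:
            capacity[parent[b]][b] -= bottleneck
            capacity[b][parent[b]] += bottleneck
            b = parent[b]

    # Nodes still reachable from S lie on the S side of the minimum cut.
    return [[(u * width + v) in parent for v in range(width)]
            for u in range(height)]
```

In a 1 × 3 example with edge1 weights [10, 1, 10] and edge2 weights [1, 10, 1], the middle pixel is labeled an outlier when the edge3 weight is 0, but is pulled to the inlier side when the edge3 weight is raised to 5, which shows how the smoothness term ties neighboring labels together.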

The weight E_data1(m) is the weight of the edge (edge1) from the S node to each pixel node, and expresses how likely the pixel corresponding to that node is to be a correct corresponding point. The larger E_data1(m) is, the less likely the edge from the S node to the node of pixel m is to be included in the S-T cut. This embodiment assumes that a pixel with a high POC function peak value is likely to be a correct corresponding point, so E_data1(m) is given by the following equation (42). In equation (42), λ_E1 represents the priority of the cost and σ_E1 is a parameter. I_α(m) is the correlation value map, which gives the correlation value of each pixel.

The weight E_data2(m), on the other hand, is the weight of the edge (edge2) from each pixel node to the T node, and expresses how likely the pixel corresponding to that node is to be a miscorresponding point. The larger E_data2(m) is, the less likely the edge from the node of pixel m to the T node is to be included in the S-T cut. This embodiment assumes that the larger the difference between the depth of the pixel in question and the depth of the pixel obtained by reprojecting it to a nearby viewpoint, the more likely the pixel is to be a miscorresponding point, so E_data2(m) is given by the following equation (43). In equation (43), λ_E2 represents the priority of the cost and σ_E2 is a parameter. C is the group of nearby viewpoints at the time V_R was created. Z′ is the depth of the pixel when the three-dimensional point of the pixel at viewpoint V_R is transformed into a three-dimensional point at another nearby viewpoint C_k. I_{Z,Ck}(m′) is the depth of the pixel m′ at the nearby viewpoint C_k that corresponds to pixel m of the target viewpoint.

In equation (43), the depth Z′ and the pixel m′ satisfy the relationship given by the following equation (44). In equation (44), R_Ck is the rotation from the camera coordinates of the target viewpoint V_R to the camera coordinates of another viewpoint C_k near V_R, and t_Ck is the corresponding translation. A and A_Ck are the intrinsic camera parameters of the viewpoint V_R and of the nearby viewpoint C_k, respectively. s is a scale factor (enlargement or reduction ratio). I_Z(m) is the depth of the target pixel m in the depth map I_Z(u, v) of the target viewpoint.
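Equation (44) itself is not reproduced in this text. The sketch below therefore assumes the standard pinhole relation s·m̃′ = A_Ck (R_Ck · I_Z(m) · A⁻¹ m̃ + t_Ck), with Z′ taken as the third component of the point in the camera coordinates of C_k. The function name `reproject` is hypothetical, and the intrinsic matrices are assumed to have (0, 0, 1) as their last row and to use the same component order as the pixel m.

```python
import numpy as np

def reproject(m, depth, A, A_ck, R_ck, t_ck):
    """Reproject pixel m of the target view into a nearby view C_k.

    Returns (m_prime, z_prime): the pixel position in C_k and the depth of
    the point in the camera coordinates of C_k (assumed form of eq. (44)).
    """
    m_h = np.array([m[0], m[1], 1.0])  # homogeneous pixel coordinates
    # Back-project to a 3-D point in the camera coordinates of V_R.
    point_target = depth * np.linalg.solve(A, m_h)
    # Move the point into the camera coordinates of C_k.
    point_neighbor = R_ck @ point_target + t_ck
    projected = A_ck @ point_neighbor  # equals s * (m', 1)
    m_prime = projected[:2] / projected[2]
    z_prime = point_neighbor[2]
    return m_prime, z_prime
```

The depth Z′ returned here would then be compared with I_{Z,Ck}(m′), the depth stored at m′ in the depth map of C_k; with an identity rotation, zero translation, and identical intrinsics, the function returns the input pixel and depth unchanged.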

The weight E_smooth(m_1 − m_2) expresses how readily the labels of a pixel and an adjacent pixel agree. The larger E_smooth(m_1 − m_2) is, the less likely the edge (edge3) between m_1 and m_2 is to be included in the S-T cut. This embodiment assumes that the correct and miscorresponding labels tend to change across an edge (edge3) where the depth of a pixel differs greatly from the depth of an adjacent pixel, so E_smooth(m_1 − m_2) is given by the following equation (45). In equation (45), λ_E3 represents the priority of the cost and σ_E2 is a parameter. I_Z(m_1) and I_Z(m_2) are the depths of the pixel m_1 and of the adjacent pixel m_2 in the depth map I_Z(u, v) of the target viewpoint.

Using the graph of FIG. 12 defined as above and equations (41) to (45), the miscorresponding point removal unit 123 finds the S-T cut for the depth map of the target viewpoint by the graph cut technique and performs the miscorresponding point removal process. Based on the S-T cut, the miscorresponding point removal unit 123 determines that pixels whose nodes carry the same label as the S node are correct corresponding points and that pixels whose nodes carry the same label as the T node are miscorresponding points. To remove the pixels determined to be miscorresponding points from the depth map, it sets their values to 0 in both the depth map I_Z(u, v) and the correlation value map I_α(u, v). The miscorresponding point removal unit 123 finds the S-T cut for every viewpoint in the multi-viewpoint image V and removes the miscorresponding points from the depth map of each viewpoint.

According to this embodiment, as described above, the correlation value of the POC function is used for weighting, each pixel is labeled as a correct corresponding point or a miscorresponding point by finding the S-T cut with a graph cut, and the pixels determined to be miscorresponding points are removed. The miscorresponding points that would otherwise remain when the three-dimensional point group is transformed into world coordinates are thereby removed effectively, which improves the accuracy of the three-dimensional shape reconstructed from the depth maps of the viewpoints.

FIG. 13 shows the removal of miscorresponding points by the graph cut process on the depth map. FIG. 13(a) shows the image data captured by the camera at the target viewpoint V_R. FIG. 13(b) shows the depth map generated from the image data of FIG. 13(a). FIGS. 13(c) and 13(d) show depth maps generated from the image data of cameras at reference viewpoints C_k, which are other nearby viewpoints. A graph such as that of FIG. 12 is created from FIG. 13(b), the weight of each edge is calculated from FIGS. 13(c) and 13(d), and each pixel m that retains its edge to the T node is taken to be a miscorresponding point. FIG. 13(e) shows the pixels m of these miscorresponding points in white. FIG. 13(f) shows the depth map obtained by deleting from FIG. 13(b) the pixels shown in white in FIG. 13(e).

- Artifact removal process
In this process, artifacts are removed by thresholding. An artifact is a structure, reconstructed by the multi-view stereo algorithm, that is generated on the surface of the imaged three-dimensional object but does not exist in reality. Artifacts tend to be reconstructed in regions that form object boundaries or occlusion boundaries at the target viewpoint. A multi-view stereo algorithm estimates the depth of three-dimensional coordinate points for each small region called a matching window. Consequently, for a background pixel at an object boundary, the depth of the foreground contained in the matching window is estimated, and the pixel is reconstructed as an artifact.

Unlike the miscorresponding points already described, the artifacts targeted by the removal process described next rarely show up as a depth difference with respect to the depth maps of viewpoints near the camera position of the target viewpoint. The miscorresponding point removal process described above therefore cannot remove them effectively. Instead, consistency must be enforced between the depth map of the target viewpoint and the depth map of another viewpoint whose camera position differs greatly from that of the target viewpoint.

However, between viewpoints whose camera positions differ greatly, the region of the target object that is reconstructed also tends to differ. Consequently, when a difference is found between the depth maps of the target viewpoint and the viewpoint being compared, it is difficult to judge whether the difference is caused by an artifact or arises because the two viewpoints reconstruct different regions of the three-dimensional shape.
In this embodiment, therefore, the ways in which depth maps that contain artifact pixels become inconsistent with one another are classified into cases using several depth maps, and artifact pixels are removed by the thresholding described below.

FIG. 14 illustrates the consistency between the depth maps of multiple viewpoints when artifacts occur. In FIG. 14(a), artifacts occur in the three-dimensional shapes reproduced from the depth maps of viewpoints View1, View2, and View3, whose camera positions all differ. A region observed from viewpoint View1 as an object boundary or occlusion boundary (for example, the boundary 502 formed by the protrusion of shape Object2) is not observed as an object boundary or occlusion boundary from viewpoints View2 and View3.

Similarly, a region observed from viewpoint View3 as an object boundary or occlusion boundary (for example, the boundary 501 formed by the protrusion of shape Object2) is not observed as such from viewpoints View1 and View2. As FIG. 14(a) shows, artifacts also tend to occur on the outside of a shape. The artifact reproduced from the depth map of View1 is therefore observed in front of the shapes reproduced from the depth maps of View2 and View3. Here, "in front" means closer to the viewpoint along the depth direction of each viewpoint.

FIG. 14(b) shows the consistency between the depth maps of viewpoints View1, View2, and View3 and the relationship between the pixels in the reproduced three-dimensional coordinates. The consistency and relationships of coordinate points A, A′, B, B′, C, C′, and D are as follows.
For coordinate points A and A′, the depth maps of View1 and View2 are not consistent with each other. However, coordinate point A of shape Object1, reconstructed from the depth map of View1, and coordinate point A′ of shape Object2, reconstructed from View2, are both pixels of correct corresponding points. Coordinate points A and A′ are an example in which the regions reconstructed from the depth maps of View1 and View2 differ. The difference between the depths in the depth maps used to reproduce A and A′ is large.

For coordinate points B and B′, and for coordinate points C and C′, the depth maps of View1 and View3 are not consistent. Coordinate point B, reconstructed from the depth map of View1, is a pixel of a correct corresponding point, whereas coordinate point B′, reconstructed from the depth map of View3, is an artifact pixel. Likewise, coordinate point C′, reconstructed from the depth map of View1, is an artifact pixel, whereas coordinate point C, reconstructed from the depth map of View3, is a pixel of a correct corresponding point. Coordinate points C′ and B′ arise at occlusion boundaries of View1 and View3, respectively.

Coordinate points B and B′, and coordinate points C and C′, are examples in which the regions reconstructed from the depth maps of View1 and View3 are the same. The difference between the depths in the depth maps used to reproduce B and B′, and C and C′, is small. When observed from View1, from which the correct corresponding point B was reconstructed, the artifact B′ is reconstructed at a position closer to View1 than B. When observed from View3, from which the correct corresponding point C was reconstructed, the artifact C′ is reconstructed at a position closer to View3 than C.

At coordinate point D, the difference between the depths in the depth maps of View2 and View3 is small and within a predetermined range, so the two depth maps are consistent. At the position of coordinate point D, the same coordinate point D in the same region of the object is therefore reconstructed from the depth map of View2 and the depth map of View3.

Based on the consistency of the depth maps and the correspondence of the coordinate points in FIG. 14 described above, a threshold is set for determining whether a pixel is an artifact or a correct corresponding point. The artifact removal unit 124 then determines, by threshold processing using the set threshold, whether each pixel in the depth map I_Z(u, v) of the target viewpoint V_R is a correct corresponding point or an artifact. The threshold is expressed by the following equation (46). The artifact removal unit 124 determines that a pixel m satisfying the following equation (46) is an artifact.

In equation (46) above, th_1 and th_2 are arbitrarily set parameters. Z' denotes the depth, in three-dimensional coordinates, obtained by transforming the coordinate point in the three-dimensional coordinates of the target viewpoint V_R into the coordinate system of another nearby viewpoint C_k. I_{Z,Ck}(m') denotes the depth of the pixel m' that corresponds to pixel m in the depth map of the other nearby viewpoint C_k.
The depth Z' and the pixel m' each satisfy the relationship of equation (44) already described. The viewpoint C_k in equation (46) above is a viewpoint that satisfies the following equation (47).
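Equations (46) and (47) are not reproduced in this text. Based only on the surrounding description (a normalized depth difference that marks an artifact when it exceeds th_1 and is below th_2, and a parallax angle that must exceed th_3 for a viewpoint to be used), they can be sketched as follows. The strictness of each inequality and the absence of an absolute value are assumptions taken from the wording of the description, not from the original equation images.

```latex
\begin{align}
th_1 < \frac{I_{Z,C_k}(m') - Z'}{I_{Z,C_k}(m')} < th_2 \tag{46} \\
\deg(V_R, C_k) > th_3 \tag{47}
\end{align}
```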

In equation (47) above, th_3 is an arbitrarily set parameter. deg(V_R, C_k) denotes the parallax angle between viewpoint V_R and another viewpoint C_k in the vicinity of viewpoint V_R.
When there are a plurality of other viewpoints C_k adjacent to the target viewpoint V_R, the artifact removal unit 124 takes all adjacent other viewpoints as candidates and evaluates each candidate using equation (46) above. If one or more of the candidates satisfy equation (46), the artifact removal unit 124 determines that the pixel in the target depth map is an artifact.
At this time, in equation (46), the artifact removal unit 124 normalizes the difference between the depth I_{Z,Ck}(m') and the depth Z' by dividing it by the depth I_{Z,Ck}(m'). A pixel m whose normalized difference is less than or equal to the threshold th_1 is regarded as a coordinate point restored as the same pixel as pixel m', and is therefore determined to be a correct corresponding point and is not removed.

On the other hand, the artifact removal unit 124 regards a pixel m whose normalized difference is greater than or equal to the threshold th_2 as a restoration of a different coordinate point in a different region, determines it to be a correct corresponding point, and does not remove it. The artifact removal unit 124 regards a pixel m whose normalized difference exceeds the threshold th_1 and is less than the threshold th_2 as having been restored as a pixel different from the corresponding pixel m' in the same region, and removes it as an artifact.
Further, another viewpoint whose parallax angle with respect to the target viewpoint is less than or equal to the threshold th_3 in equation (47) lies too close to the target viewpoint and is not suitable for artifact determination, so the artifact removal unit 124 does not use it as a viewpoint to compare with the target viewpoint.
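The three-way threshold decision described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the label strings, and the example threshold values are hypothetical, and the signed (non-absolute) difference follows the literal wording of the description.

```python
def normalized_depth_difference(depth_ref: float, z_prime: float) -> float:
    """Return (I_{Z,Ck}(m') - Z') / I_{Z,Ck}(m') as described for equation (46)."""
    if depth_ref <= 0.0:
        raise ValueError("reference depth must be positive")
    return (depth_ref - z_prime) / depth_ref


def classify_pixel(depth_ref: float, z_prime: float, th1: float, th2: float) -> str:
    """Classify pixel m of the target viewpoint against one reference viewpoint."""
    d = normalized_depth_difference(depth_ref, z_prime)
    if d <= th1:
        # Same point restored at (almost) the same depth: keep.
        return "correct (same point)"
    if d >= th2:
        # Depths differ so much that a different region was restored: keep.
        return "correct (different region)"
    # th1 < d < th2: same region, but restored at a different depth.
    return "artifact"


# Hypothetical threshold values, chosen only for illustration.
TH1, TH2 = 0.01, 0.1
print(classify_pixel(10.0, 9.95, TH1, TH2))
print(classify_pixel(10.0, 9.5, TH1, TH2))
print(classify_pixel(10.0, 8.0, TH1, TH2))
```

Because the difference is signed, a point that lies behind the reference depth yields a negative value and falls into the first branch under this reading.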

The artifact removal unit 124 evaluates every pixel in the depth map I_Z(u, v) of every viewpoint V in the multi-viewpoint image using equations (46) and (47). The artifact removal unit 124 then removes artifacts by setting each pixel m that satisfies the conditions of both equation (46) and equation (47) to 0 in the depth map I_Z(u, v) and in the correlation value map I_α(u, v).
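The removal step itself, setting flagged pixels to 0 in both maps, can be sketched as below. The nested-list representation indexed as [v][u] and the name `remove_artifacts` are assumptions made for this illustration only.

```python
def remove_artifacts(depth_map, corr_map, artifact_mask):
    """Set every pixel flagged in artifact_mask to 0 in both maps (in place)."""
    for v, mask_row in enumerate(artifact_mask):
        for u, is_artifact in enumerate(mask_row):
            if is_artifact:
                depth_map[v][u] = 0.0  # depth map I_Z(u, v)
                corr_map[v][u] = 0.0   # correlation value map I_alpha(u, v)
    return depth_map, corr_map


depth = [[2.0, 2.1], [2.2, 5.0]]
corr = [[0.9, 0.8], [0.7, 0.6]]
mask = [[False, False], [False, True]]
remove_artifacts(depth, corr, mask)
print(depth)  # [[2.0, 2.1], [2.2, 0.0]]
print(corr)   # [[0.9, 0.8], [0.7, 0.0]]
```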

FIG. 15 is a diagram illustrating the process of determining whether a pixel of the target viewpoint is a correct corresponding point or an artifact.
FIG. 15(a) shows a comparison between coordinate point A (pixel m) restored from the depth map of the target viewpoint View2 (V_R) and coordinate point A' restored from the depth map of another viewpoint View1 (C_k). In FIG. 15(a), the artifact removal unit 124 rotates coordinate point A of viewpoint View2 by the parallax angle deg(V_R, C_k) to transform it into the coordinate system of viewpoint View1. The artifact removal unit 124 then obtains the difference (I_{Z,Ck}(u, v) - Z') between the depth Z' of the transformed coordinate point A as seen from viewpoint View1 and the depth I_{Z,Ck}(u, v) of the corresponding coordinate point A' (pixel m') as seen from viewpoint View1.
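The coordinate transformation and depth comparison can be sketched as follows. Equation (44) is not reproduced in this section, so the sketch assumes a generic rigid transform (rotation R, translation t) followed by a pinhole projection; the helper names and the intrinsic parameters fx, fy, cx, cy are hypothetical.

```python
import math


def rotation_y(theta_rad):
    """Rotation matrix about the camera y axis (an illustrative choice of axis)."""
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def transform_point(R, t, X):
    """Rigid transform X' = R X + t from target-view to reference-view coordinates."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]


def project(fx, fy, cx, cy, Xc):
    """Pinhole projection; returns the pixel m' = (u, v) and the depth Z'."""
    x, y, z = Xc
    if z <= 0.0:
        raise ValueError("point is behind the reference camera")
    return (fx * x / z + cx, fy * y / z + cy), z


# Point A seen from the target viewpoint, moved into the reference viewpoint.
A_target = [0.0, 0.0, 2.0]
A_ref = transform_point(rotation_y(math.radians(60.0)), [0.0, 0.0, 0.0], A_target)
m_prime, z_prime = project(100.0, 100.0, 50.0, 50.0, A_ref)

depth_ref = 1.05                    # hypothetical I_{Z,Ck}(m') read from the reference depth map
difference = depth_ref - z_prime    # I_{Z,Ck}(m') - Z'
print(m_prime, z_prime, difference)
```

In practice the reference depth would be read from the depth map of C_k at the (rounded or interpolated) pixel m'; that lookup is omitted here.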

The artifact removal unit 124 then divides the obtained difference (I_{Z,Ck}(u, v) - Z') by I_{Z,Ck}(u, v) to obtain the normalized difference. The artifact removal unit 124 determines whether the normalized difference satisfies equation (46) above; because the normalized difference exceeds the threshold th_2, it determines that pixel m of coordinate point A is a correct corresponding point. At this time, the artifact removal unit 124 also determines whether viewpoint View1 (C_k) satisfies equation (47); because the parallax angle deg(V_R, C_k) between viewpoint View2 and viewpoint View1 exceeds the threshold th_3, it determines that viewpoint View1 is suitable for the artifact determination performed by comparison with viewpoint View2.

On the other hand, FIG. 15(b) shows a comparison between coordinate point B (pixel m) restored from the depth map of the target viewpoint View3 (V_R) and coordinate point B' restored from the depth map of another viewpoint View1 (C_k). In FIG. 15(b), the artifact removal unit 124 rotates coordinate point B of viewpoint View3 by the parallax angle deg(V_R, C_k) to transform it into the coordinate system of viewpoint View1. The artifact removal unit 124 then obtains the difference (I_{Z,Ck}(u, v) - Z') between the depth Z' of the transformed coordinate point B as seen from viewpoint View1 and the depth I_{Z,Ck}(u, v) of the corresponding coordinate point B' (pixel m') as seen from viewpoint View1.

The artifact removal unit 124 then divides the obtained difference (I_{Z,Ck}(u, v) - Z') by I_{Z,Ck}(u, v) to obtain the normalized difference. The artifact removal unit 124 determines whether the normalized difference satisfies equation (46) above; because the normalized difference is greater than or equal to the threshold th_1 and less than or equal to the threshold th_2, it determines that pixel m of coordinate point B is an artifact. At this time, the artifact removal unit 124 also determines whether viewpoint View1 (C_k) satisfies equation (47); because the parallax angle deg(V_R, C_k) between viewpoint View3 and viewpoint View1 exceeds the threshold th_3, it determines that viewpoint View1 is suitable for the artifact determination performed by comparison with viewpoint View3.
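The parallax-angle check of equation (47) can be illustrated as below. How deg(V_R, C_k) is computed is not given in this section; the sketch assumes it is the angle, measured at the restored 3D point, between the rays to the two camera centers. The function names and the threshold value in the usage lines are hypothetical.

```python
import math


def parallax_angle_deg(point, center_target, center_ref):
    """Angle in degrees at `point` between the rays to the two camera centers."""
    ray_a = [c - p for c, p in zip(center_target, point)]
    ray_b = [c - p for c, p in zip(center_ref, point)]
    norm_a = math.sqrt(sum(x * x for x in ray_a))
    norm_b = math.sqrt(sum(x * x for x in ray_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("point coincides with a camera center")
    cos_angle = sum(a * b for a, b in zip(ray_a, ray_b)) / (norm_a * norm_b)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_angle))


def usable_reference(point, center_target, center_ref, th3_deg):
    """A reference viewpoint is used only if the parallax angle exceeds th_3."""
    return parallax_angle_deg(point, center_target, center_ref) > th3_deg


P = (0.0, 0.0, 1.0)
print(parallax_angle_deg(P, (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)))
print(usable_reference(P, (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), th3_deg=10.0))
print(usable_reference(P, (1.0, 0.0, 0.0), (1.0, 0.0, 0.0), th3_deg=10.0))
```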

According to the present embodiment, as described above, artifact removal by threshold processing is performed on each pixel of the target viewpoint using reference viewpoints that are separated from it to the degree given by equation (47). This effectively removes artifacts that cannot be removed by the erroneous corresponding point removal process already described, and improves the accuracy with which the three-dimensional shape is restored from the depth map of each viewpoint.
In the present embodiment, a pixel is removed if even one of the plurality of reference viewpoints determines it to be an artifact. However, considering the possibility that the corresponding pixel of a reference viewpoint is itself an artifact, the artifact removal unit 124 may be configured to make the artifact determination by majority vote among the plurality of reference viewpoints.
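The two decision rules mentioned here (removal when any reference viewpoint flags the pixel, and the majority-vote alternative) can be contrasted in a short sketch. The function names are hypothetical, and treating a tie as "not an artifact" is an assumption, since the text does not say how ties are resolved.

```python
def is_artifact_any(flags):
    """Rule of the embodiment: one flagging reference viewpoint is enough."""
    return any(flags)


def is_artifact_majority(flags):
    """Alternative rule: more than half of the reference viewpoints must agree."""
    votes = list(flags)
    return 2 * sum(votes) > len(votes)


# One entry per usable reference viewpoint: True if equation (46) is satisfied.
flags = [True, False, False]
print(is_artifact_any(flags))       # True
print(is_artifact_majority(flags))  # False
```

The majority rule is more tolerant of a single unreliable reference viewpoint, at the cost of keeping some artifacts that only one viewpoint can detect.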

The embodiment of the present invention has been described above in detail with reference to the drawings. However, the specific configuration is not limited to this embodiment, and designs and the like within a scope that does not depart from the gist of the present invention are also included.

In addition, the processing may be executed by recording a program for realizing each device described above on a computer-readable recording medium, and causing a computer system to read and execute the program recorded on the recording medium. The "computer system" referred to here may include an OS and hardware such as peripheral devices.

Further, if a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment). The "computer-readable recording medium" refers to a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

Furthermore, the "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line.
The program may also realize only part of the functions described above.
Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

DESCRIPTION OF SYMBOLS: 100 … arithmetic processing device, 110 … function calculation device, 111 … normalization unit (reception unit, parallelization unit), 112 … window setting unit, 113 … function calculation unit, 114 … function integration unit, 120 … depth map generation device, 121 … depth map generation unit, 122 … filter unit, 123 … erroneous corresponding point removal unit, 124 … artifact removal unit, 130 … mesh model generation device, 200 … storage device, 300 … output device, 400 … bus, 110 … storage unit

Claims

A depth map generation device comprising:
a depth map generation unit that generates a depth map from images of a plurality of different viewpoints using a phase-only correlation method based on a multi-view stereo algorithm; and
an erroneous corresponding point removal unit that removes erroneous corresponding points by generating a graph, performing a graph cut on the graph such that the sum of the weighting numbers of the edges crossed by the cutting line of the graph cut is minimized, and thereby assigning each node in the graph to the label of the correct corresponding point or the label of the erroneous corresponding point, the graph being generated by assigning a first weighting number to each edge that connects a node corresponding to a pixel of a target viewpoint, which is the target of erroneous corresponding point detection, to each of a correct corresponding point node bearing the label of the correct corresponding point and an erroneous corresponding point node bearing the label of the erroneous corresponding point, the first weighting number being obtained by comparing the pixel with the corresponding pixel of a reference viewpoint that is another viewpoint in the vicinity of the target viewpoint, and by assigning, to each edge that connects the node of a pixel at the target viewpoint to the node of another pixel adjacent to that pixel, a second weighting number corresponding to the difference in depth between the pixel and the adjacent pixel,
wherein the first weighting number comprises
a first-A weighting number for the edge connecting the correct corresponding point node and the node corresponding to the pixel, and a first-B weighting number for the edge connecting the node corresponding to the pixel and the erroneous corresponding point node,
the first-A weighting number is a numerical value corresponding to the peak value of the phase-only correlation function with the other viewpoint, and is set higher as the correlation value is higher,
the first-B weighting number is a numerical value corresponding to the difference in depth of the same pixel relative to the reference viewpoint that is another viewpoint in the vicinity, and is set higher as the difference is larger, and
the second weighting number is set higher as the difference in depth from the node of an adjacent pixel in the same viewpoint is smaller.
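The labeling by graph cut recited above can be illustrated with a small s-t minimum cut. This is only a sketch under stated assumptions: the weights are passed in as plain numbers (the functions that derive them from the correlation peak and the depth differences are not given in this section), the maximum flow is computed with a textbook Edmonds-Karp search rather than any solver named in the source, and `min_cut_labels` is a hypothetical name.

```python
from collections import deque


def min_cut_labels(n_pixels, w_1a, w_1b, neighbor_edges):
    """Label pixels by an s-t minimum cut; True means "correct corresponding point".

    Node 0 is the correct-point terminal, node n_pixels + 1 the erroneous-point
    terminal, and pixel i is node i + 1.
    """
    n = n_pixels + 2
    s, t = 0, n - 1
    cap = [[0.0] * n for _ in range(n)]
    for i in range(n_pixels):
        cap[s][i + 1] += w_1a[i]   # first-A weight: correct terminal -> pixel
        cap[i + 1][t] += w_1b[i]   # first-B weight: pixel -> erroneous terminal
    for i, j, w in neighbor_edges:  # second weight between adjacent pixels
        cap[i + 1][j + 1] += w
        cap[j + 1][i + 1] += w

    def bfs_parents():
        """Breadth-first search in the residual graph starting from s."""
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break  # no augmenting path left: the flow is maximal
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Pixels still reachable from s lie on the "correct" side of the minimum cut.
    return [parent[i + 1] != -1 for i in range(n_pixels)]


# Three pixels in a row; the middle one has low correlation and a large depth mismatch.
print(min_cut_labels(3, [5, 1, 5], [1, 4, 1], [(0, 1, 1), (1, 2, 1)]))
# With stronger neighbor weights, smoothness outweighs the data terms.
print(min_cut_labels(3, [5, 1, 5], [1, 4, 1], [(0, 1, 3), (1, 2, 3)]))
```

In the first call the middle pixel is cut away from the correct-point terminal; in the second, the larger second weighting numbers keep all three pixels on the same side.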

The depth map generation device according to claim 1, further comprising an artifact removal unit that removes artifacts between the different viewpoints after the erroneous corresponding point removal unit has removed the erroneous corresponding points,
wherein the artifact removal unit obtains the difference between the depth of a pixel at the target viewpoint and the depth obtained by transforming the pixel of a comparison viewpoint, which is a viewpoint separated from the target viewpoint by a predetermined distance, into the coordinate system of the target viewpoint, and determines whether the pixel of the target viewpoint is an artifact by comparing the difference with a preset threshold.

The depth map generation device according to claim 2, wherein the threshold comprises a first threshold and a second threshold, and
the artifact removal unit:
when the difference is less than or equal to the first threshold, determines that the pixel of the target viewpoint is not an artifact, on the ground that the same pixel is restored at the same depth at both the target viewpoint and the comparison viewpoint;
when the difference exceeds the first threshold and is less than the second threshold, determines that the pixel of the target viewpoint is an artifact, on the ground that the same pixel is restored at different depths at the target viewpoint and the comparison viewpoint; and
when the difference is greater than or equal to the second threshold, determines that the pixel of the target viewpoint is not an artifact, on the ground that the pixel of the comparison viewpoint is a pixel in a region different from that of the pixel of the target viewpoint.

The depth map generation device according to claim 2 or claim 3, wherein the artifact removal unit determines that the comparison viewpoint is not to be used for comparison with the target viewpoint when the parallax angle between the target viewpoint and the comparison viewpoint with respect to the restored pixel is less than a preset third threshold.

The depth map generation device according to any one of claims 1 to 4, wherein, when the depth map is generated from the images of the plurality of different viewpoints, pixel matching between the images is performed by a hierarchical search that increases the pixel resolution stepwise, and
the depth value of each pixel in the depth map is corrected by applying a weighted median filter to the depth map after the matching process is completed at each level of the hierarchical search.

The depth map generation device according to claim 5, wherein the weight value of each pixel in the window of the weighted median filter is set according to the distance from the target pixel, the luminance value of the pixel in the image of that level, and the peak value of the phase-only correlation function.
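A weighted median over a filter window can be sketched as follows. The weighted-median routine is standard; the `pixel_weight` function, which combines spatial distance, luminance difference, and correlation peak as a product of Gaussian terms, is purely an assumed form, since the claim states only which quantities the weight depends on. The sigma parameters are hypothetical.

```python
import math


def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches half the total."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("total weight must be positive")
    accumulated = 0.0
    for value, weight in sorted(zip(values, weights)):
        accumulated += weight
        if accumulated >= total / 2.0:
            return value


def pixel_weight(distance, luminance_diff, poc_peak, sigma_s=2.0, sigma_l=10.0):
    """Assumed weight: near, similarly bright, well-correlated pixels count more."""
    spatial = math.exp(-(distance ** 2) / (2.0 * sigma_s ** 2))
    photometric = math.exp(-(luminance_diff ** 2) / (2.0 * sigma_l ** 2))
    return spatial * photometric * poc_peak


depths = [1.0, 2.0, 100.0]
print(weighted_median(depths, [1.0, 1.0, 1.0]))  # 2.0
print(weighted_median(depths, [1.0, 1.0, 5.0]))  # 100.0
```

With equal weights the outlier depth is rejected; the second call shows that the result follows whichever pixels carry most of the weight, which is why the choice of weights matters.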

A depth map generation method comprising:
a depth map generation process in which a depth map generation unit generates a depth map from images of a plurality of different viewpoints using a phase-only correlation method based on a multi-view stereo algorithm; and
an erroneous corresponding point removal process in which an erroneous corresponding point removal unit removes erroneous corresponding points by generating a graph, performing a graph cut on the graph such that the sum of the weighting numbers of the edges crossed by the cutting line of the graph cut is minimized, and thereby assigning each node in the graph to the label of the correct corresponding point or the label of the erroneous corresponding point, the graph being generated by assigning a first weighting number to each edge that connects a node corresponding to a pixel of a target viewpoint, which is the target of erroneous corresponding point detection, to each of a correct corresponding point node bearing the label of the correct corresponding point and an erroneous corresponding point node bearing the label of the erroneous corresponding point, the first weighting number being obtained by comparing the pixel with the corresponding pixel of a reference viewpoint that is another viewpoint in the vicinity of the target viewpoint, and by assigning, to each edge that connects the node of a pixel at the target viewpoint to the node of another pixel adjacent to that pixel, a second weighting number corresponding to the difference in depth between the pixel and the adjacent pixel,
wherein the first weighting number comprises
a first-A weighting number for the edge connecting the correct corresponding point node and the node corresponding to the pixel, and a first-B weighting number for the edge connecting the node corresponding to the pixel and the erroneous corresponding point node,
the first-A weighting number is a numerical value corresponding to the peak value of the phase-only correlation function with the other viewpoint, and is set higher as the correlation value is higher,
the first-B weighting number is a numerical value corresponding to the difference in depth of the same pixel relative to the reference viewpoint that is another viewpoint in the vicinity, and is set higher as the difference is larger, and
the second weighting number is set higher as the difference in depth from the node of an adjacent pixel in the same viewpoint is smaller.

A program for causing a computer to operate as:
depth map generation means for generating a depth map from images of a plurality of different viewpoints using a phase-only correlation method based on a multi-view stereo algorithm; and
erroneous corresponding point removal means for removing erroneous corresponding points by generating a graph, performing a graph cut on the graph such that the sum of the weighting numbers of the edges crossed by the cutting line of the graph cut is minimized, and thereby assigning each node in the graph to the label of the correct corresponding point or the label of the erroneous corresponding point, the graph being generated by assigning a first weighting number to each edge that connects a node corresponding to a pixel of a target viewpoint, which is the target of erroneous corresponding point detection, to each of a correct corresponding point node bearing the label of the correct corresponding point and an erroneous corresponding point node bearing the label of the erroneous corresponding point, the first weighting number being obtained by comparing the pixel with the corresponding pixel of a reference viewpoint that is another viewpoint in the vicinity of the target viewpoint, and by assigning, to each edge that connects the node of a pixel at the target viewpoint to the node of another pixel adjacent to that pixel, a second weighting number corresponding to the difference in depth between the pixel and the adjacent pixel,
wherein the first weighting number comprises
a first-A weighting number for the edge connecting the correct corresponding point node and the node corresponding to the pixel, and a first-B weighting number for the edge connecting the node corresponding to the pixel and the erroneous corresponding point node,
the first-A weighting number is a numerical value corresponding to the peak value of the phase-only correlation function with the other viewpoint, and is set higher as the correlation value is higher,
the first-B weighting number is a numerical value corresponding to the difference in depth of the same pixel relative to the reference viewpoint that is another viewpoint in the vicinity, and is set higher as the difference is larger, and
the second weighting number is set higher as the difference in depth from the node of an adjacent pixel in the same viewpoint is smaller.