JP3800905B2

JP3800905B2 - Image feature tracking processing method, image feature tracking processing device, and three-dimensional data creation method

Info

Publication number: JP3800905B2
Application number: JP2000048117A
Authority: JP
Inventors: 甲志明渡; 良介三高; 長生 濱田
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1999-07-27
Filing date: 2000-02-24
Publication date: 2006-07-26
Anticipated expiration: 2020-02-24
Also published as: JP2001101419A

Description

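[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an image feature tracking processing method, an image feature tracking processing device, and a three-dimensional data creation method that extract three-dimensional information about an object contained in a time-series group of two-dimensional images (a moving image).
[0002]
PRIOR ART
In recent years, three-dimensional computer graphics (hereinafter abbreviated as 3DCG) technology has advanced rapidly. 3DCG technology is also used in CAD systems and virtual reality systems. However, the modeling work of creating 3DCG data takes a great deal of time, and this is the main reason why the application fields of 3DCG cannot be expanded.
[0003]
To ease modeling, a technique has been put into practical use in which measurements are taken in real space with a three-dimensional measuring device (motion capture or the like) and the measured values are used as 3DCG data, thereby automating the modeling work. However, because such a device relies on stereoscopic vision it is very expensive, and because the spatial range that current devices can measure is relatively narrow, the uses of this technique are limited. In particular, this kind of device is not suited to measuring a wide area such as a city district, and at present 3DCG data for cityscapes and the like is created by hand.
[0004]
Therefore, as a technique that performs three-dimensional measurement with relatively inexpensive equipment and can be applied generally to various objects, techniques have been proposed that restore a three-dimensional shape from a two-dimensional moving image (time-series image group) obtained by monocular vision. For example, in the technique described in "Takeo Kanade et al.: Recovery of object shape and camera motion by the factorization method, Journal of the Institute of Electronics, Information and Communication Engineers, Vol. J76-D-II, No. 8, 1993/8" (hereinafter, the factorization method), features in the two-dimensional moving image are recognized, a feature point matrix is created by tracking how those features move through the moving image, and the three-dimensional shape of the object and the camera pose are restored by applying a factorization technique to the feature point matrix. With this technique, any error in the tracking process that builds the feature point matrix enlarges the error of the measurement result, so accurately tracking features in the moving image is an important problem.
[0005]
One technique for tracking features in a two-dimensional moving image for the purpose of three-dimensional measurement is described in Japanese Unexamined Patent Publication No. H10-111934. That publication describes, in applying the factorization method, designating a plurality of regions in one frame, applying several kinds of feature extraction methods to each region, and adopting the best result so that feature points are tracked accurately.
[0006]
Japanese Unexamined Patent Publication No. H10-255053 describes, in applying the stereo method, tracking feature points, then setting a suitable point in the image as the image origin, obtaining the motion trajectory of each feature point from the distance between that origin (a feature point that does not change under rotation of the imaging device) and the other feature points, evaluating whether the motion trajectories are correlated between two different image groups in order to detect tracking failures, and using only the correctly tracked feature points so that the three-dimensional shape is restored with high accuracy.
[0007]
PROBLEMS TO BE SOLVED BY THE INVENTION
However, the technique of Japanese Unexamined Patent Publication No. H10-111934 requires the operator to judge whether feature point tracking is correct and to give instructions accordingly, so the operator cannot leave the device while tracking is in progress.
[0008]
The technique of Japanese Unexamined Patent Publication No. H10-255053 can detect a feature tracking failure but cannot determine its cause. When a feature point can no longer be tracked partway through, it cannot tell whether this is a tracking error or has some other cause (for example, several objects are present and the tracked object has been hidden by another object). The failure therefore cannot be corrected, and parts of the image may become unmeasurable.
[0009]
Furthermore, both techniques track using local features extracted from the image, so tracking tends to fail when the image contains a repetitive pattern such as a grid. In addition, the three-dimensional information obtained by either technique consists of the coordinates of individual feature points. These are merely discrete coordinates in three-dimensional space, and the relationships between feature points are unknown. To use information measured by these techniques as 3DCG data, a separate processing means is therefore needed to generate data having surface information, such as polygons, from the discrete coordinates.
[0010]
The present invention has been made in view of the above. Its object is to provide an image feature tracking processing method, an image feature tracking processing device, and a three-dimensional data creation method with which three-dimensional information usable as 3DCG data can be obtained easily from a time-series group of two-dimensional images, with which the validity of tracking can be verified while the object is tracked through the time-series image group, and which track the object reliably while suppressing errors in the three-dimensional information.
[0011]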
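MEANS FOR SOLVING THE PROBLEMS
The invention of claim 1 comprises: a first process of imaging a stationary object with a TV camera from a plurality of different positions to acquire a plurality of images, setting in an image a tracking region that represents a face selected from the object by boundary elements and image feature quantities, creating a deformed region by deforming the tracking region set in one image so that it matches the boundary elements of a tracking region candidate in another image, and selecting, from the tracking region candidates, the tracking region corresponding to the tracking region set in the one image by comparing at least one of the boundary elements and the image feature quantities between the candidates in the other image and the deformed region; a second process of setting a reference region obtained by mapping into three-dimensional space a tracking region with few unimaged areas due to occlusion, frame-out, or the like, and, when at least part of a tracking region cannot be tracked in an image, judging whether an area that the TV camera cannot image has arisen on the basis of a comparison of the boundary elements of that tracking region and the reference region in three-dimensional space; and a third process of estimating why the tracking region can no longer be imaged by the TV camera on the basis of the positional relationship of the reference regions as imaged at each position of the TV camera, and correcting the tracking region according to the estimate so that tracking becomes possible and continues.
[0012]
The invention of claim 2 is the invention of claim 1 in which, in the first process, candidates corresponding to the boundary elements of the tracking region set in one image are extracted in another image, tracking candidates are generated from combinations of the boundary element candidates, a deformed region is then generated by deforming the tracking region set in the one image so that it matches a tracking candidate, and the tracking region in the other image is determined from among the tracking candidates by comparing, between tracking candidate and deformed region, an index selected from the shape of the boundary elements, pixel values, and image feature quantities.
[0013]
The invention of claim 3 is the invention of claim 2 in which edges are extracted in the other image, and those edges whose distance and direction relative to a line element constituting the tracking region in the one image are within specified ranges are selected as candidates for that line element.
[0014]
The invention of claim 4 is the invention of claim 2 in which edges are extracted in the other image, a Hough transform is applied to each pixel on the edges to extract continuous edges, and those edges whose distance and direction relative to a line element constituting the tracking region in the one image are within specified ranges are selected as candidates for that line element.
[0015]
The invention of claim 5 is the invention of claim 2 in which the shape in the neighborhood of a point element constituting the tracking region in the one image is used as a template, and candidates for the point element are extracted from the parts of the other image that match the template.
[0016]
The invention of claim 6 is the invention of claim 2 in which the pixel values of the tracking candidate and the deformed region are used as the index.
[0017]
The invention of claim 7 is the invention of claim 2 in which the mean luminance of the tracking region and the tracking candidate is used as the index.
[0018]
The invention of claim 8 is the invention of claim 2 in which the images are color images and the color of the tracking region and the tracking candidate is used as the index.
[0019]
The invention of claim 9 is the invention of claim 2 in which the spatial frequency distribution of the tracking region and the tracking candidate is used as the index.
[0020]
The invention of claim 10 is the invention of claim 2 in which a part where the change in pixel value exceeds a specified value is extracted within the tracking region and the tracking candidate, and at least one of the position, shape, and image features of that part is used as the index.
[0021]
The invention of claim 11 is the invention of claim 2 in which the images are color images, and a choice is made, according to the distribution pattern of luminance and color within the tracking region in the one image, among using the mean luminance or color of the tracking region and the tracking candidate as the index, using the spatial frequency distribution as the index, and extracting a part where the change in pixel value within the region exceeds a specified value and using at least one of its position, shape, and image features as the index.
[0022]
The invention of claim 12 is the invention of claim 2 in which, for a plurality of tracking regions, the pose of the TV camera at which the mean luminance is greatest among the plurality of images is estimated to be close to the direction of the light reflected from the face of the object corresponding to each tracking region, and the direction of illumination from the light source is estimated from the normal direction of that face and the direction of the reflected light.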
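[0023]
The invention of claim 13 is the invention of claim 1 in which, in the second process, tracking regions that correspond between a plurality of images are mapped into three-dimensional space, the mapped tracking region is taken as a reference region when the degree of agreement obtained from the correspondence of its boundary elements is at or above a threshold, the boundary elements of the reference region are compared with those of the tracking region corresponding to it, and the unimaged area within the tracking region is extracted from the change in boundary elements.
[0024]
The invention of claim 14 is the invention of claim 13 in which processing is limited to ranges over which the number of boundary elements does not change between the plurality of images, and, for each range, the contribution rate obtained at mapping is used as the degree of agreement, tracking regions whose contribution rate is at or above a threshold are taken as reference region candidates, and, when several candidates are obtained, the candidate with the largest area is adopted as the reference region.
[0025]
The invention of claim 15 is the invention of claim 14 in which the change in the number of boundary elements is examined for only one of line elements and point elements.
[0026]
The invention of claim 16 is the invention of claim 14 in which the contribution rate is obtained from the components of the diagonal matrix obtained when the tracking region is mapped into three-dimensional space by the factorization method.
[0027]
The invention of claim 17 is the invention of claim 13 in which the tracking region is mapped into three-dimensional space and compared with the reference region, and the area defined by point elements that have no counterpart is obtained as the unimaged area within the tracking region.
[0028]
The invention of claim 18 is the invention of claim 13 in which the tracking region is mapped into three-dimensional space and compared with the reference region, and, when a line element with no counterpart exists, that line element is regarded as part of the unimaged area within the tracking region.
[0029]
The invention of claim 19 is the invention of claim 1 in which, in the third process, all reference regions in three-dimensional space are projected onto the image plane determined by the position of the TV camera, and the cause of the unimaged area within the tracking region is determined on the basis of the positional relationship between reference regions within the image plane and the positional relationship of the reference regions to the image plane.
[0030]
The invention of claim 20 is the invention of claim 19 in which occlusion by another object is determined when another reference region exists between the reference region of interest and the TV camera and the two reference regions are not connected.
[0031]
The invention of claim 21 is the invention of claim 19 in which self-occlusion is determined when another reference region exists between the reference region of interest and the TV camera and the two reference regions are connected.
[0032]
The invention of claim 22 is the invention of claim 19 in which frame-out is determined when the reference region of interest is located at the periphery of the screen.
[0033]
The invention of claim 23 is the invention of claim 19 in which, when an unimaged area exists within the tracking region, edges are extended, the intersection of the extended edges is taken as a new point element, the tracking process is executed again, and occlusion is confirmed if the contribution rate at mapping into three-dimensional space improves.
[0034]
The invention of claim 24 comprises an image input unit that supplies images captured by a TV camera as input images, an image processing device that applies the image feature tracking processing method of claim 1 to the input images, a storage device that stores the input images and the images processed by the image processing device, display means for displaying the images processed by the image processing device, and region designating means for designating a tracking region to the image processing device.
[0035]
The invention of claim 25 is characterized in that, after an object is imaged with a TV camera from a plurality of different positions to acquire a plurality of images, a tracking region that represents a face selected from the object by boundary elements is set in an image, a deformed region is created by deforming the tracking region set in one image so that it matches the boundary elements of a tracking region candidate in another image, and the tracking region corresponding to the tracking region set in the one image is selected from the candidates by comparing the boundary elements of the candidates in the other image and the deformed region; further, a run of images in which a plurality of faces of the object are imaged and the same faces are imaged continuously is delimited as one phase image group; next, for the plurality of phase image groups, a three-dimensional shape is obtained for each phase image group by mapping the faces selected from the object into three-dimensional space; and three-dimensional data is created by applying coordinate transformations that bring the coordinate systems of the phase image groups into agreement with one another.
[0036]
The invention of claim 26 is the same as the invention of claim 25 up to obtaining a three-dimensional shape for each phase image group, and is characterized in that three-dimensional data is created by applying, to the three-dimensional shape obtained for each phase image group, a coordinate transformation that superimposes the line elements common to the other phase image groups.
[0037]
The invention of claim 27 is the invention of claim 25 or claim 26 in which a face selected from the object in the plurality of images captured by the TV camera is represented by boundary elements consisting of point elements and directed line elements that connect the point elements in sequence, the direction in which the point elements are traversed by following the line elements is monitored in each image obtained in time series, and the images up to the point where the traversal direction reverses are taken as one phase image group.
[0038]
The invention of claim 28 is the invention of claim 25 or claim 26 in which, in the coordinate transformation, transformation parameters for rotation and translation are obtained for each boundary element, and the coordinate transformation is performed using transformation parameters obtained by averaging the parameters obtained for the individual boundary elements.
[0039]
The invention of claim 29 is the invention of claim 25 or claim 26 in which the position of each point element is taken to be the midpoint between its position after the coordinate transformation has been applied to one of the three-dimensional shapes whose coordinate systems are being matched and the position of the corresponding point element in the other three-dimensional shape.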
【００４０】
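EMBODIMENTS OF THE INVENTION
(First embodiment)
FIG. 1 shows the apparatus used in the present invention. The two-dimensional images to be processed are input from an image input device 11 to an image feature tracking processing device 10. The image input device 11 is a TV camera, or a playback device that reproduces video from a recording medium (video tape, CD, DVD, or the like) on which a moving image captured by a TV camera has been recorded. The image feature tracking processing device 10 is built from a computer (a personal computer). The TV camera (hereinafter simply the camera) may be a simple home video camera or an industrial ITV camera. The embodiments described below deal with color images.
[0041]
As shown in FIG. 2, the images input to the image feature tracking processing device 10 are obtained by moving a camera 2 relative to an object 1 as indicated by arrow A, so that the viewpoint on the object 1 changes continuously. In other words, a time-series group of two-dimensional images (a moving image) with a continuously changing viewpoint on the object 1 is input to the image feature tracking processing device 10. The images input to the image input means 12 are captured so that the displacement of the object 1 between frames of the moving image is sufficiently small compared with the size of the object 1. That is, the camera 2 capturing the moving image moves relatively slowly. The object 1 is assumed to have a relatively large proportion of planar parts, not to move, and not to change shape over time. FIG. 3 shows an example of a moving image obtained by imaging the object 1 of FIG. 2. In FIG. 3 each frame represents one of the images V1 to V3, arranged in time series from left to right.
[0042]
The image feature tracking processing device 10 basically comprises: image input means 12, which is the interface to the image input device 11; a storage device 13, consisting of a hard disk and memory, for storing the moving image input through the image input means 12 and the processing results; an image processing device 14 that extracts three-dimensional information from the input time-series group of two-dimensional images; display means 15, consisting of a display device, for showing the input moving image and the processing results; and region designating means 16, consisting of a keyboard and mouse, for setting tracking regions corresponding to the object 1 and the like in the image processing device 14.
[0043]
The storage device 13 holds an image file F1 that stores the images input through the image input means 12, a tracking data file F2 used as a working file when tracking the object 1 whose three-dimensional information is sought in the time-series image group, and a three-dimensional shape data file F3 that stores the three-dimensional information obtained by the image processing device 14.
[0044]
The image processing device 14 includes region tracking means 17, which tracks the object 1 on the basis of the tracking regions set on the object 1 by the region designating means 16, and tracking evaluation means 18, which evaluates the validity of the tracking performed by the region tracking means 17. The image processing device 14 also includes shape restoration means 19, which generates three-dimensional information from the two-dimensional image information about the tracked object 1. The three-dimensional information generated by the shape restoration means 19 is used in the validity evaluation performed by the tracking evaluation means 18.
[0045]
The operation of the image processing device 14 is as follows. When images are input through the image input means 12, they are stored in the image file F1 of the storage device 13 and shown on the display means 15. The operator of the region designating means 16 looks at the displayed image and designates initial tracking regions in association with the object 1. That is, one image is selected from the time-series image group and the initial tracking regions are set in that image. In general, a tracking region is set for each face of the object 1, and all faces of the object 1 shown in the one image are designated as tracking regions. Designating tracking regions in this way removes unneeded information such as the background. FIG. 4 shows a state in which tracking regions D have been designated. A tracking region D is a closed region made up of a set of boundary elements, namely point elements and the line elements that join them. In FIG. 4 the white circles are point elements, and the line between two adjacent point elements is a line element. The boundary elements may also carry the connection relationship between tracking regions D, expressed by whether a line element is shared by two tracking regions D (the thick lines in FIG. 4 are line elements shared by two tracking regions D), and the image features of the region (color, texture, and so on).
[0046]
Representing a solid shape by line elements and point elements as boundary elements in this way is called boundary representation (B-REP) and is commonly used in 3DCG. Converting the input image information into a boundary representation greatly reduces the amount of data compared with handling the image information pixel by pixel, and also makes data conversion easier when the three-dimensional data is used. FIG. 5 shows an example of a tracking region D set by boundary representation. In FIG. 5 the hatched part is the tracking region D, and an annular region is defined by line elements s1 to s8 and point elements p1 to p8.
[0047]
The region designating means 16 can specify not only the tracking regions D on the object 1 but also the accuracy with which the object 1 is measured. For example, when three-dimensional measurement is performed on a piece of furniture 20 with drawers 21 as in FIG. 6, the accuracy can be specified as either measuring down to the handles 22 on the drawers 21 or treating the whole of the furniture 20 as a rectangular parallelepiped. When the object 1 in the image is not complex like furniture but a simple shape such as a combination of geometric shapes, each face of the object 1 may be set automatically as a tracking region D. That is, region dividing means that divides the interior of the object 1 into faces may be provided.
[0048]
The tracking regions D are stored in the tracking data file F2 in the storage device 13. The region tracking means 17 detects, in each image of the time-series image group, the region corresponding to the tracking region D set by the region designating means 16. That is, it tracks the face of interest of the object 1 through the images. To do so, the region tracking means 17 has a function that, from the tracking region D set in one image (one frame) of the time-series image group stored in the image file F1, estimates the shape of the tracking region D in the next frame and generates a deformed region. The deformed region is compared and matched against the image of the next frame, and its position in the next frame is detected. The deformed region is then regarded as the tracking region of that frame, and the position of the deformed region in the frame after that is detected. Deformed regions are generated in this way for all frames in turn, and their positions are stored as tracking data in the tracking data file F2.
[0049]
The function of the region tracking means 17 is now described in more detail. FIG. 7 shows one image (one frame) of the time-series image group with only one object 1 displayed. The tracking region D (hatched) in FIG. 7 was set in the preceding frame and does not coincide with any face of the object 1 in the frame shown. The region tracking means 17 therefore detects which part of the frame corresponds to each line element sA, sB, sC of the tracking region D. The image of the object 1 is assumed here to be an image from which edges have been extracted (an edge image). A well-known technique is used for edge extraction.
[0050]
The region tracking means 17 can generate a deformed region either from line elements or from point elements. When generating from line elements, it first extracts, from the edges of the object 1, those whose direction and distance are within predetermined ranges relative to each line element sA, sB, sC. Applying a Hough transform to the edge image of the object 1 gives the slope of each straight part of the edge image. The angular difference between each straight part found by the Hough transform and each line element sA, sB, sC is computed, and when this difference is within a predetermined range the straight part is associated with that line element. In the illustrated example, edges sa1 and sa2 correspond to line element sA, edges sb1 and sb2 to line element sB, and edge sc1 to line element sC.
[0051]
At this stage two edges sa1, sa2 are associated with line element sA and two edges sb1, sb2 with line element sB, so to define a single deformed region the edges sa1, sa2, sb1, sb2, sc1 must be matched one-to-one with the line elements sA, sB, sC. There are four possible combinations: sa1-sb1-sc1, sa1-sb2-sc1, sa2-sb1-sc1, and sa2-sb2-sc1. One of these combinations (hereinafter, tracking candidates) corresponds to the tracking region D. Because the edges are found by the Hough transform, information about edge length has been lost, and each combination corresponds to the figure drawn in solid lines in FIG. 8. The two-dot chain lines in FIG. 8 are edges not under consideration.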
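The enumeration of tracking candidates in this paragraph is a Cartesian product over the candidate edges of each line element. The following is a minimal sketch of that step only; the function name `tracking_candidates` and the dictionary layout are assumptions for illustration, not something the patent specifies.

```python
from itertools import product

def tracking_candidates(edge_candidates):
    """Enumerate every one-to-one assignment of candidate edges to line elements.

    edge_candidates maps each line element name to the list of edges that
    passed the direction and distance check (hypothetical data layout).
    """
    names = list(edge_candidates)
    return [dict(zip(names, combo))
            for combo in product(*(edge_candidates[n] for n in names))]

# The example of FIG. 7: two candidates for sA, two for sB, one for sC.
candidates = tracking_candidates({
    "sA": ["sa1", "sa2"],
    "sB": ["sb1", "sb2"],
    "sC": ["sc1"],
})
```

With the candidates of the example this gives the four combinations listed in the text, in the same order.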
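[0052]
Next, the tracking region D is deformed to fit each tracking candidate. Deforming the line elements sA, sB, sC so that they correspond to the tracking candidates of FIG. 8(a) to (d) produces four deformed regions E1 to E4 with line elements sA1 to sA4, sB1 to sB4, and sC1 to sC4, as in FIG. 9(a) to (d). FIG. 9(e) is the tracking region D before deformation. As shown in FIG. 10, the deformation finds the vectors Ma, Mb, Mc that join the point elements pA, pB, pC of the original tracking region D to the edge intersections pa, pb, pc of the deformed region E, and then interpolates Ma, Mb, Mc according to the position of each pixel pP in the tracking region D to obtain the position of the corresponding pixel pp in the deformed region E (vector Mp). The pixel value of each pixel of the deformed region E is also obtained by interpolation. This kind of processing is called warp deformation.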
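As an illustration of interpolating the vertex displacement vectors Ma, Mb, Mc, the sketch below warps a single point of a triangular region using barycentric weights. The patent does not state which interpolation scheme is used, so barycentric interpolation is an assumption here, as are the function names.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    wa = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    wb = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return wa, wb, 1.0 - wa - wb

def warp_point(p, src_tri, dst_tri):
    """Move p by the weighted mean of the three vertex displacement vectors."""
    # Displacement vectors (Ma, Mb, Mc in the text) from source to target vertices.
    moves = [(d[0] - s[0], d[1] - s[1]) for s, d in zip(src_tri, dst_tri)]
    weights = barycentric(p, *src_tri)
    mx = sum(w * m[0] for w, m in zip(weights, moves))
    my = sum(w * m[1] for w, m in zip(weights, moves))
    return (p[0] + mx, p[1] + my)
```

A vertex of the source triangle maps exactly onto the matching target vertex, and interior pixels move by a smooth blend of the three displacements. Pixel values would be resampled at the warped positions in a full implementation.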
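[0053]
When the region tracking means 17 generates a deformed region from point elements, it looks at the shape of the boundary elements near the point elements pA, pB, pC of the tracking region D. Suppose an image such as FIG. 11 has been obtained. The boundary elements within a predetermined range centered on each point element pA, pB, pC of the tracking region D are then as in FIG. 12(a) to (c). The shapes of FIG. 12 are used as templates for pattern matching within the frame (the frame following the one in which the tracking region D was set), and the parts with high similarity are extracted. Pattern matching need only be performed near the point elements in the frame and need not account for tilt, so the amount of processing is very small compared with pattern matching over the whole image. In the example of FIG. 11, point elements sa1, sa2, sa3 are selected for point element sA, point element sb1 for point element sB, and point element sc1 for point element sC. As shown in FIG. 13, three combinations, sa1-sb1-sc1, sa2-sb1-sc1, and sa3-sb1-sc1, are obtained as tracking candidates, and three deformed regions E1 to E3 are generated as in FIG. 14.
[0054]
Once the pixel value of each pixel px of the deformed region E has been obtained as above, the deformed region E is used to select, from the tracking candidates, the one corresponding to the tracking region D.
[0055]
First, when the pixel values (density values) and their derivatives within the tracking region D are within specified ranges, the pixel values are nearly constant or vary smoothly, so the pixel value of each pixel pp of the deformed region E is compared directly with the pixel value of each pixel of the tracking candidate. Because each deformed region E has been set to match the shape of its tracking candidate, taking the difference of pixel values at corresponding positions gives the pixel-value discrepancy. The sum of the absolute differences is computed and divided by the number of pixels in the tracking region D, and the result is used as the evaluation value. The evaluation value is computed for every tracking candidate, and the candidate with the smallest value is selected as the best tracking candidate and adopted as the new tracking region. In the example of FIG. 8 the tracking candidate of FIG. 8(b) is selected as the new tracking region, and in the example of FIG. 12 the tracking candidate of FIG. 13(a) is selected. In other words, the combination for which no other edge falls inside the tracking candidate (pa1-pb1-pc1 for line elements, pa1-pb1-pc1 for point elements) has the smallest evaluation value and becomes the new tracking region.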
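The evaluation value for smooth regions can be sketched as a mean absolute difference. In the sketch below the two regions are given as equal-length lists of pixel values at corresponding positions, and the divisor is the number of compared pixels. Both are simplifying assumptions, since the text divides by the pixel count of the tracking region D. The names are hypothetical.

```python
def evaluation_value(warped_pixels, candidate_pixels):
    """Sum of absolute pixel differences divided by the number of pixels."""
    if not warped_pixels or len(warped_pixels) != len(candidate_pixels):
        raise ValueError("regions must be non-empty and the same size")
    total = sum(abs(a - b) for a, b in zip(warped_pixels, candidate_pixels))
    return total / len(warped_pixels)

def select_candidate(pairs):
    """Return the name of the tracking candidate with the smallest evaluation value.

    pairs maps a candidate name to (deformed-region pixels, candidate pixels).
    """
    return min(pairs, key=lambda name: evaluation_value(*pairs[name]))
```

A candidate that takes in an extra edge or part of the background yields larger differences against its deformed region and is therefore rejected.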
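[0056]
When, on the other hand, part of the tracking region D has pixel values or derivatives outside the specified ranges, the pixel values have large local variations. For example, in a tracking region D containing a fine pattern (texture) such as the surface of a stone, the pixel values or derivatives exceed the specified ranges. On such a face the local differences in pixel value are large, so the pixel-value difference cannot be used as the evaluation value. Instead, the difference between an image feature quantity of the tracking region D (described below) and the image feature quantity obtained from the deformed region E1 is used as the evaluation value, and the tracking candidate with the smallest evaluation value is adopted as the new tracking region.
[0057]
As the image feature quantity, luminance information may be used when the luminance of the tracking region D is nearly uniform, and color information when the color varies little. That is, a tracking candidate is adopted as the new tracking region when the difference in image feature quantity between it and the tracking region D is within a specified value. Using color information requires color images. When color information is used, the hue value can be taken as, for example, the angle θ between the R-W line and the color Q of the tracking region in a chromaticity diagram such as FIG. 15 (R, G, B, W denote red, green, blue, and white).
[0058]
When the tracking region D contains a pattern that can be regarded as periodic, as in FIG. 16, information about spatial frequency is extracted by a Fourier transform or wavelet transform and used as the image feature quantity. That is, the spatial frequency distributions d1, d2 of the deformed region E and the tracking candidate are extracted as in FIG. 17, and comparing d1 with d2 allows a new tracking region to be selected from the tracking candidates.
[0059]
When characters or figures are written on the surface of the object, or a label is attached to it, as in FIG. 18, the tracking region D may contain a high-contrast part. In that case the writing or label is treated as a feature part F when comparing tracking candidate and deformed region E. For example, if the deformed region E is set as in FIG. 18(a) and the tracking candidate E' to be compared (containing feature part F') is set as in FIG. 18(b), then, as shown in FIG. 19, the distances L1, L2, L3 between the centroid G of the feature part F and the edge intersections pa, pb, pc, the area of the feature part F, and the mean luminance and mean hue of the feature part F can be used as image feature quantities to select a new tracking region from the tracking candidates.
[0060]
As described above, the evaluation method for selecting a new tracking region from the tracking candidates differs between three cases: the pixel values and derivatives in the tracking region D are within the specified ranges; the pixel values or derivatives in the tracking region D are outside the specified ranges; and writing or a label is present in the tracking region D. An evaluation method must therefore be chosen for each condition. The choice is automated by building histograms of luminance and hue and selecting the evaluation method according to the histogram pattern.
[0061]
When the luminance and hue histograms show no peak, as in FIG. 20, the image of the deformed region E is first-order differentiated and the sum of the derivative values divided by the number of pixels (that is, the mean derivative of the deformed region E) is obtained as a texture feature quantity. A threshold is set for the texture feature quantity. When the texture feature quantity is at or below the threshold, the derivatives in the tracking region D are within the specified range, so it is judged that there is no fine pattern and the sum of pixel-value differences is adopted as the evaluation method. When it exceeds the threshold, the derivatives exceed the specified range, so it is judged that there is a fine pattern and the method of comparing spatial frequency distributions is adopted.
[0062]
When at least one of luminance and hue shows a single peak, as in FIG. 21, mean luminance or mean hue is used. When both luminance and hue show several peaks, as in FIG. 22, it is likely that characters or figures are written on the face or a label is attached, so information about the feature part F is used.
[0063]
Using the distribution of luminance and hue in this way makes it possible to decide automatically which processing to perform. FIG. 23 shows the overall flow. After the initial tracking region D is set (S1), candidates corresponding to the boundary elements are extracted in the next frame (S2). Tracking candidates are then created from combinations of boundary elements (S3), and histograms of luminance and hue are built for the pixels in the tracking region D (S4). The number of peaks in each histogram is found (S5). If neither histogram has a peak (S6), the mean derivative is compared with the threshold (S7). If it is at or below the threshold, a new tracking region is selected from the tracking candidates by comparing pixel values alone (S8). If the mean derivative is greater than the threshold, the tracking region is selected by spatial frequency distribution (S9).
[0064]
If the luminance histogram has one peak (S10), the tracking region is selected using mean luminance (S11). If the luminance histogram does not have exactly one peak but the hue histogram does (S12), the tracking region is selected using mean hue (S13). If the luminance and hue histograms have peaks but both have two or more, writing or a label is assumed to be present, and features of that kind are used to select the tracking region (S14). The tracking candidate selected in this way becomes the new tracking region (S15), and processing moves to the next frame (S16).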
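The decision flow of steps S5 to S14 can be written as a small function. The sketch below takes the peak counts and the mean derivative as already computed. The returned method names are hypothetical labels. How a histogram pair with no peak in one histogram and several in the other should be handled is not spelled out in the text, so it falls through to the feature-part method here as an assumption.

```python
def choose_evaluation_method(lum_peaks, hue_peaks, mean_derivative, threshold):
    """Pick the comparison method from histogram peak counts (after FIG. 23)."""
    if lum_peaks == 0 and hue_peaks == 0:          # S6: no peak in either histogram
        if mean_derivative <= threshold:           # S7: smooth region
            return "pixel_difference"              # S8
        return "spatial_frequency"                 # S9: fine texture
    if lum_peaks == 1:                             # S10
        return "mean_luminance"                    # S11
    if hue_peaks == 1:                             # S12
        return "mean_hue"                          # S13
    return "feature_part"                          # S14: writing or label assumed
```

For example, a plain painted wall would typically take the first branch, while a face carrying a printed label would end at the feature-part method.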
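[0065]
Because the time-series image group is produced by moving the camera 2, part of the initial tracking region D may, depending on the positional relationship between the object 1 and the camera 2, be hidden by an object 3 (which may be another target object) lying between the object 1 of interest and the camera 2, as in FIG. 24 (the hidden part is hatched). The tracking region D of the object 1 may also be hidden because it lies on the far side of the object 1 from the camera 2, as in FIG. 25 (the hidden part is hatched). Thus the shape of the particular face selected as the tracking region D may change, or the face may be hidden and become untrackable. A face that was hidden and newly becomes exposed causes no problem, because it has not been designated as a tracking region D.
[0066]
When the shape of the tracking region D (the number of vertices and sides) changes as above, or the tracking region D2 is completely hidden, the shapes (numbers of point elements and line elements) of the deformed region and the tracking candidate do not match, so the degree of agreement between them falls. In such a case the boundary elements of the tracking region are increased or decreased according to the changed shape, and tracking continues. For example, when another object 3 lies between the camera 2 and the object 1 on which the tracking region is set, as in FIG. 24, the tracking region D1 is a quadrilateral as in FIG. 26(a), but once the part containing one vertex and bounded by the straight line shown as a two-dot chain line is hidden by the object 3, it changes to the pentagonal tracking region D2 of FIG. 26(b). In this case boundary elements (point elements and line elements) are added so that the tracking region D2 is a pentagon, and tracking continues. Such reshaping of the tracking region D2 is performed automatically as far as possible. When it cannot be done automatically, a message requesting reshaping of the tracking region D2 is shown on the display means 15, and the operator then reshapes the tracking region D2 with the region designating means 16.
[0067]
During tracking, the positions of the tracking regions are stored in sequence in the storage device 13, together with the image features, such as mean luminance, that were used to select the new tracking region from the tracking candidates. This information is used by the tracking evaluation means 18 and the shape restoration means 19, and serves to estimate the cause of an anomaly when a tracking region whose tracking went wrong is tracked again.
[0068]
For example, when a house is imaged outdoors in the daytime in fine weather, a strong single light source illuminates the object 1, and a strong shadow OM of the object 1 is cast, as shown in FIG. 27. Such a shadow OM produces edges, so a deformed region may be generated wrongly and tracking may be disturbed. Therefore the position of the single light source LM that forms the shadow OM is detected from the image features stored in the storage device 13, the occurrence of the shadow OM is predicted, and its influence is removed.
[0069]
In more detail, suppose, as in FIG. 28, that the tracking region D is illuminated by a single light source LM (natural light included) and imaged by the camera 2 from positions represented by camera pose vectors v1 to vn. The illumination vector r1, representing the direction of the light from the single light source LM, and the reflection vector r2, representing the direction of specular reflection at the tracking region D, make equal angles with the normal direction U of the face on which the tracking region D is set, and the apparent luminance of the tracking region D is greater the closer the camera pose vector v1 to vn is to the reflection vector r2.
[0070]
Accordingly, the mean luminance of the tracking region D set in each frame is stored in the storage device 13 during tracking. The shape restoration means 19 finds, for each tracking region D, the camera pose vector at which the frame with the greatest mean luminance was captured, and treats that camera pose vector as an approximation of the reflection vector r2. Once the reflection vector r2 is obtained, the illumination vector r1 can be estimated from its relationship to the normal direction U of the face on which each tracking region D is set. The spread of the estimates of r1 obtained for all tracking regions D is then evaluated, and when the spread is small it can be judged that a single light source is present. In that case the mean of the estimates of r1 obtained for the tracking regions D is used as the illumination vector r1.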
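Because r1 and r2 make equal angles with the normal U, the direction toward the light source is the mirror image of the reflection direction about U. The sketch below applies that relation, with all vectors pointing away from the surface. That sign convention and the function names are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def estimate_light_direction(brightest_view, normal):
    """Mirror the brightest viewing direction about the face normal.

    brightest_view: direction from the face toward the camera in the frame
                    with the greatest mean luminance (approximates r2).
    normal:         outward normal U of the face.
    Returns the unit direction from the face toward the light source.
    """
    r = normalize(brightest_view)
    u = normalize(normal)
    d = sum(a * b for a, b in zip(r, u))
    return tuple(2.0 * d * ui - ri for ui, ri in zip(u, r))
```

Running this per tracking region and checking that the resulting directions agree closely would correspond to the spread test the text uses to decide whether a single light source is present.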
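[0071]
As noted above, part or all of a tracking region D may become impossible to image while the camera 2 tracks it. Three kinds of cause are conceivable.
[0072]
These are: another object 3 lies between the camera 2 and the tracking region D set on the object 1 and hides the tracking region D, as in FIG. 29 (hereinafter, occlusion by another object); the tracking region D set on the object 1 lies on the back side of the object 1 as seen from the camera 2, as in FIG. 30 (hereinafter, self-occlusion); and part of the tracking region D leaves the field of view VF of the camera 2, as in FIG. 31 (hereinafter, frame-out).
[0073]
An area that the camera 2 cannot image (hereinafter, a non-imageable area) is extracted by the following procedure. First, a tracking region in which no non-imageable area has arisen during tracking is set as a reference region. Such a reference region is set using a tracking region whose number of boundary elements does not change across several images of the time-series image group. Either line elements or point elements may be used as the boundary elements for this purpose. For example, in the time-series images V11 to V13 of FIG. 32(a) the tracking regions D11 to D13 have five point elements and five line elements, and these numbers do not change. In the time-series images V21, V22 of FIG. 32(b) the tracking regions D21, D22 have four point elements and four line elements, and again the numbers do not change.
[0074]
To set a reference region, a tracking region whose number of boundary elements does not change across several images is selected as above, correspondences between point elements are established, and the tracking region is mapped into three-dimensional space by a technique such as the stereo method or the factorization method. For example, suppose that in two images as in FIG. 33(a) and (b) the coordinates of the point elements of the tracking regions D1, D2 are set as shown. If the factorization method is used to map the point elements as in FIG. 33(c), the relationship between the coordinates of the point elements of D1 and D2 in each image and the coordinates of the points in three-dimensional space can be written in the form of Equation 1.
[0075]
[Equation 1]

In Equation 1 the left matrix on the right-hand side represents the orientation of the camera 2. The first row (CX1 CY1 CZ1) and the third row (CX3 CY3 CZ3) are the vectors of the x and y axes of the image plane of the camera 2 when the image of FIG. 33(a) was obtained, and the second row (CX2 CY2 CZ2) and the fourth row (CX4 CY4 CZ4) are the vectors of the x and y axes of the image plane when the image of FIG. 33(b) was obtained. The middle matrix on the right-hand side is the diagonal matrix obtained by the factorization method, and the contribution rate K is obtained from its components by the following expression.
K = (a + b + c) / (a + b + c + d)
If no non-imageable area has arisen, there should be no mismatched point elements between the tracking regions set in the time-series image group. A suitable threshold is therefore set for the contribution rate K, and when the K obtained as above exceeds the threshold it is judged that no non-imageable area has arisen, and the region mapped into three-dimensional space is taken as a reference region candidate. Because a reference region candidate is obtained for each tracking region whose number of point elements does not change, several reference region candidates are generated for a single face in three-dimensional space. The candidate with the largest area among them is adopted as the reference region for that face.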
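The contribution rate and the choice of reference region can be sketched as follows. The diagonal components a, b, c, d are taken as given (for example from a singular value decomposition computed elsewhere), and K is generalized to "three largest components over the sum of all", which reduces to the expression above when there are four components. The candidate record layout and function names are assumptions.

```python
def contribution_rate(diagonal):
    """K = (a + b + c) / (a + b + c + d + ...), using the three largest components."""
    values = sorted((abs(v) for v in diagonal), reverse=True)
    total = sum(values)
    if total == 0:
        raise ValueError("diagonal components must not all be zero")
    return sum(values[:3]) / total

def select_reference_region(candidates, threshold):
    """Among candidates with K at or above threshold, return the largest by area.

    candidates: list of dicts with keys "diagonal" and "area" (hypothetical layout).
    Returns None when no candidate passes the threshold.
    """
    passed = [c for c in candidates
              if contribution_rate(c["diagonal"]) >= threshold]
    return max(passed, key=lambda c: c["area"]) if passed else None
```

The text says the rate must exceed the threshold while the claims say "at or above", and the sketch follows the claims. A value of K close to 1 means the tracked points are well explained by a rigid three-dimensional shape, which is why it can serve as a check on the point correspondences.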
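[0076]
Once the reference region is determined, non-imageable areas can be extracted. If point elements stop corresponding while the tracking region from which the reference region was determined is tracked through the time-series image group, it can be judged that a non-imageable area has arisen. For example, suppose a reference region DS is set as in FIG. 34, with point elements pS1 to pS4 as boundary elements. If, when the tracking region D is mapped into three-dimensional space by the stereo method or factorization method, it comes to contain five point elements p1 to p5 as shown in FIG. 34, then point elements p1 to p3 of the tracking region D correspond to point elements pS1 to pS3 of the reference region DS, but point elements p4 and p5 have no counterpart in the reference region DS. The area enclosed by the point elements pS4, p4, and p5, which have no counterpart in the other region, can therefore be regarded as the non-imageable area.
[0077]
The above description uses point elements. When line elements are used, as in FIG. 35, line elements s2, s3 of the tracking region D correspond to line elements sS2, sS3 of the reference region DS, and line elements s1, s4 of the tracking region D correspond to line elements sS1, sS4 of the reference region DS, but line element s5 of the tracking region D has no counterpart in the reference region DS. Line element s5 can therefore be judged to be part of the non-imageable area.
[0078]
After a non-imageable area has been extracted, it is determined whether its cause is occlusion by another object, self-occlusion, or frame-out. For this, all the reference regions obtained as above, mapped in three-dimensional space, are projected onto the image plane of each image of the time-series image group. Because the position in three-dimensional space of the image plane in which the non-imageable area arose is given by the position of the camera 2, projecting the reference regions onto that image plane shows whether the faces on which the reference regions are set overlap one another or extend outside the image plane. The distances between those faces and the camera 2 (that is, the image plane) also show on which face the non-imageable area has arisen. In what follows, a face on which a reference region is set is called a reference face.
[0079]
When another reference face lies in three-dimensional space between the camera 2 and the reference face on which the non-imageable area has arisen, and no reference face is continuous with that reference face, the cause is judged to be occlusion by another object. For example, in FIG. 36 a non-imageable area (hatched) has arisen on reference face SR1, and reference face SR2, which adjoins SR1 in the image, is not connected to it in three-dimensional space, so occlusion by another object is determined.
[0080]
When another reference face lies in three-dimensional space between the camera 2 and the reference face on which the non-imageable area has arisen, and a reference face continuous with that reference face exists, the cause is judged to be self-occlusion. For example, in FIG. 37 a non-imageable area (hatched) has arisen on reference face SR3, and reference faces SR4, SR5, which adjoin SR3 in the image, are also connected to it in three-dimensional space, so the non-imageable area on SR3 is judged to be due to self-occlusion.
[0081]
Further, as shown in FIG. 38, when part of the reference face SR6 on which the non-imageable area has arisen straddles the periphery of the image plane, frame-out is determined (the part of SR6 outside the screen is hatched).
[0082]
In this way, whether the cause of a non-imageable area is occlusion by another object, self-occlusion, or frame-out can be determined from the distances between the reference faces and the image plane and from the connection relationship between the reference face with the non-imageable area and the other reference faces.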
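The three-way decision described in paragraphs [0079] to [0081] reduces to a few boolean tests once the geometric predicates have been evaluated. The sketch below assumes those predicates are computed elsewhere from the projected reference faces. The function name, argument names, and the "undetermined" fallback are assumptions.

```python
def classify_cause(face_in_front, connected_in_3d, straddles_image_border):
    """Classify why part of a reference face cannot be imaged.

    face_in_front:          another reference face lies between this face and the camera.
    connected_in_3d:        that face shares a boundary with this face in 3D space.
    straddles_image_border: the projected face crosses the periphery of the image plane.
    """
    if face_in_front:
        # Connected faces belong to the same object, so the object hides itself.
        return "self_occlusion" if connected_in_3d else "occlusion_by_other"
    if straddles_image_border:
        return "frame_out"
    return "undetermined"
```

Occlusion is tested before frame-out, which follows the order of the procedure summarized in FIG. 40.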
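[0083]
After the cause of the non-imageable area has been determined as above, the determination is verified. On the reference face SR with the non-imageable area, edges E1, E2 are extended toward the non-imageable area as shown in FIG. 39, and the intersection of the extended edges is regarded as a new point element of the reference face SR for tracking. If the contribution rate described above improves, the non-imageable area is finally confirmed as being due to occlusion by another object or self-occlusion. An area hidden by another object may become imageable again as the camera 2 moves, so tracking continues. An area hidden by self-occlusion is judged not to become imageable again through movement of the camera 2, so tracking ends.
[0084]
FIG. 40 summarizes the procedure for handling non-imageable areas. First, to set the reference regions, tracking regions whose number of boundary elements does not change across several images are extracted (S1). The tracking regions forming one face of the object 1 are extracted from the several images, and the corresponding tracking regions are mapped into three-dimensional space by coordinate transformation (S2), and the contribution rate of the mapping is obtained (S3). After mapping and contribution rate calculation have been completed for all extractable faces (S4), the faces whose contribution rate is at or above the threshold are extracted as reference region candidates (S5). The candidate with the largest area is then extracted as the reference region (S6). Once the reference regions are determined, the non-imageable areas are extracted (S7), and the positional relationship in three-dimensional space between the reference region of interest and the other reference regions in the same image and the camera is obtained (S8). If another reference region is separate from the reference region with the non-imageable area (S9), occlusion by another object is determined (S10). If another reference region is connected to it (S11), self-occlusion is determined (S12). If neither has occurred, the position within the frame of the reference region with the non-imageable area is obtained (S13), and if the reference region is near the periphery of the image (S14), frame-out is determined (S15). After this has been done for all frames (S16), the edges corresponding to the non-imageable area are extended and tracking is executed again to confirm whether occlusion is present (S17).
[0085]
(Second embodiment)
The first embodiment was a method of detecting the presence of occlusion. This embodiment describes a method in which a run of images across which the faces contained in the image do not change is taken as a phase image group, the tracking regions are mapped into three-dimensional space for each phase image group, and the three-dimensional shapes obtained by the mapping are superimposed by coordinate transformation to obtain three-dimensional data for the whole of the object 1. The processing that represents the object 1 by boundary elements and tracks it can therefore be shared with the first embodiment. This embodiment can be used on its own for an object 1 that is not occluded by other objects, and it can also be run after the occlusion detection of the first embodiment, using the tracking results obtained there.
[0086]
To simplify the description, the object 1 is assumed to be a rectangular parallelepiped as in FIG. 41, where arrow A represents the movement of the camera 2. That is, the processing is described on the assumption that no occlusion by other objects occurs. The faces of the object 1 are defined as in FIG. 42: the top face is f1, the bottom face f2, the front left and front right faces f3 and f4, and the rear right and rear left faces f5 and f6. Seen from above, the faces f1 and f3 to f6 are related as in FIG. 43. Suppose the camera 2 is rotated relative to the object 1 about the axis that passes through the center of the top face f1 and is perpendicular to it (the axis through the intersection of the broken lines in FIG. 43, perpendicular to the plane of the figure), and images the object 1 obliquely from above at an angle of about 45 degrees to that axis. Every image (frame) obtained then contains the top face f1. When the camera 2 lies in a plane that contains the axis and is perpendicular to one of the faces f3 to f6 (positions e3 to e6 in FIG. 43), the image contains exactly one of the faces f3 to f6 besides the top face f1. When the camera 2 is at any other position (ranges d34, d45, d56, d63 in FIG. 43), the image contains two adjacent faces besides the top face f1.
[0087]
As the camera 2 moves counterclockwise from position e3 in FIG. 43, the content of the images changes as shown in FIG. 44 (each box in FIG. 44 represents an image). The imaging conditions are set so that the image groups k34, k45, k56, k63, which contain three faces (the top face f1 and two others), are separated by single images h3 to h6, which contain only two faces (the top face f1 and one other). That is, the conditions are set so that each of the three-face image groups k34, k45, k56, k63 consists of several consecutive images. A run of consecutive images containing the same faces is hereinafter called a phase image group. The conditions are also set so that the two-face images h3 to h6 between the phase image groups are obtained singly. With images captured under these conditions, the phase image groups k34, k45, k56, k63 can be separated at the images h3 to h6. In short, the sequence is cut wherever the number of faces in the image changes, and wherever several consecutive images contain the same faces they are gathered into a phase image group. Within each phase image group k34, k45, k56, k63 obtained in this way, the faces contained do not change.
[0088]
In this embodiment, tracking regions are set on the image of the object 1 acquired by the TV camera 2 using the region designating means 16 of FIG. 1 and a boundary representation (B-REP) consisting of line elements and point elements. The faces selected from the object 1 are tracked by the region tracking means 17. This tracking is the same as in the first embodiment: a tracking region is set for each face of the object 1, the shape of the tracking region in another image is estimated from the tracking region in one image to generate a deformed region, and the deformed region is matched against the other image to track its position. Tracking the faces of the object 1 in this way yields the information about which faces each image contains, so the phase image groups k34, k45, k56, k63 can be set as described above.
[0089]
Once the phase image groups k34, k45, k56, k63 are set, the tracking regions are mapped into three-dimensional space for each group. To set the phase image groups, the original moving image stored in the image file F1 is divided into the groups by means not shown in FIG. 1. The division uses either image processing means that automatically tracks faces by their contour lines in the two-dimensional images, or manual work by a person with several images displayed on the screen. Because the faces contained do not change within a phase image group, no point element disappears through self-occlusion in the images that make up the group, and within each group a known technique such as the factorization method or the stereo method can be applied to map the tracking regions into three-dimensional space.
[0090]
Suppose that the phase image groups k34, k45, k56, k63 shown in FIG. 43 have been used to map the tracking regions into three-dimensional space as in FIG. 45(a) to (d). The world coordinate systems of the three-dimensional shapes obtained from the different groups then differ. In FIG. 45(a) to (d) they are shown as X1-Y1-Z1, X2-Y2-Z2, X3-Y3-Z3, and X4-Y4-Z4. With a different world coordinate system for each three-dimensional shape, the three-dimensional data of the object 1 cannot be expressed in one common coordinate system, so the shapes are superimposed by the following procedure.
[0091]
When several three-dimensional shapes with different world coordinate systems are superimposed, the procedure differs according to whether the shapes have line elements in common.
[0092]
When common line elements exist, a coordinate transformation is applied to the shapes so that the common line elements coincide. For example, in FIG. 45 the line elements s13, s14, s15, s16 surrounding face f1 are present in all of FIG. 45(a) to (d), and line element s45, the boundary between faces f4 and f5, is present in FIG. 45(a) to (c). Other line elements are also common to several shapes, but it is desirable to choose line elements common to as many shapes as possible. In this embodiment, line elements with mutually different directions that meet at one point are chosen: line element s14, the boundary between faces f1 and f4; line element s15, the boundary between faces f1 and f5; and line element s45, the boundary between faces f4 and f5. Applying coordinate transformations that superimpose these three line elements s14, s15, s45 unifies the world coordinate systems of the four three-dimensional shapes, so that the three-dimensional data can be expressed in a single world coordinate system (X-Y-Z) as in FIG. 46.
[0093]
When no common line element exists, the tracking regions common to the three-dimensional shapes are superimposed instead. In FIG. 45 face f1 is present in every shape, so face f1 is superimposed. This is equivalent to a coordinate transformation that superimposes line elements s13, s14, s15, s16.
[0094]
When the world coordinate systems of the shapes obtained for the phase image groups k34, k45, k56, k63 are merged into one, any error in the shapes obtained from the individual groups also causes a discrepancy between the two world coordinate systems being superimposed, so that superimposing one boundary element may leave the other boundary elements out of alignment. Therefore a transformation matrix for the coordinate transformation is obtained for each boundary element (line element and point element) common to the two world coordinate systems, and the coordinate transformation is performed with a matrix whose elements are the means of the corresponding elements (that is, the rotation and translation parameters) of those matrices. Using the mean in this way determines the transformation matrix uniquely.
[0095]
Taking the shapes of FIG. 45(a) and (b) as an example, the point element p134, the intersection of the three faces f1, f3, f4, is located as in FIG. 47(a) and (b) in the two shapes. Applying the rotation r and translation m shown in FIG. 47(a) to point element p134 of FIG. 47(b) moves it to its position in FIG. 47(a), so the rotation r and translation m can be expected to bring the world coordinate systems of FIG. 45(a) and (b) into agreement. Such a coordinate transformation is generally written in matrix form, so it can be defined by its matrix elements. The elements of the transformation are therefore obtained not only for point element p134 but for other point elements as well (the more the better), the mean of each element is taken, and the coordinate transformation is applied with the matrix of those means.
[0096]
Using a transformation matrix whose elements are the element-wise means of the matrices obtained for several boundary elements yields a single three-dimensional shape in which the two shapes are superimposed, as in FIG. 48. The illustrated example is the result of transforming the shape of FIG. 47(b) so that it overlaps the shape of FIG. 47(a). Face f5 of the transformed shape is on the back side and not visible. In FIG. 48, reference signs with a prime (') denote the shape of FIG. 47(b) after the coordinate transformation. Because element-wise means are used for the transformation matrix, the transformed point element p134' does not coincide exactly with point element p134 at its position in FIG. 47(a). The midpoint of p134 and p134' is therefore finally taken as the position of the point element.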
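The two averaging steps of paragraphs [0094] to [0096] can be sketched with plain lists. The code follows the text literally by averaging matrix elements. Note as a caveat, not stated in the source, that an element-wise mean of rotation matrices is in general only approximately a rotation, so a practical implementation might re-orthonormalize the result. All names are hypothetical.

```python
def average_matrices(matrices):
    """Element-wise mean of equally sized transformation matrices."""
    count = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / count for j in range(cols)]
            for i in range(rows)]

def apply_transform(matrix, point):
    """Apply a 3x4 [R | t] matrix to a 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)

def midpoint(p, q):
    """Final position of a point element: midpoint of the two estimates."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))

# Two per-element estimates that differ only in translation (illustrative values).
m1 = [[1, 0, 0, 1.0], [0, 1, 0, 0.0], [0, 0, 1, 0.0]]
m2 = [[1, 0, 0, 3.0], [0, 1, 0, 2.0], [0, 0, 1, 0.0]]
mean_transform = average_matrices([m1, m2])
moved = apply_transform(mean_transform, (0.0, 0.0, 0.0))
merged = midpoint(moved, (2.0, 1.0, 1.0))
```

Here `moved` is the transformed position (p134' in the text's example) and `merged` is the position finally assigned to the point element.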
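[0097]
In this way several three-dimensional shapes can be merged into one set of three-dimensional data. In this embodiment face f2 is not imaged by the camera 2 and so cannot be defined exactly, but the line elements and point elements around face f2 are imaged, so face f2 is regarded as the plane expressed by those line elements and point elements.
[0098]
In the example above, the change in the number of faces was used to set the phase image groups k34, k45, k56, k63 in the moving image (time-series images), but the phase image groups may instead be delimited by whether any face becomes self-occluded, as follows. The boundary elements are not only represented by point elements and line elements, but each line element is also given a direction, making it a directed line element. The change in direction of the line elements that accompanies self-occlusion of a face is used to judge that the face, a tracking region, has become invisible, and the change of a tracked face from visible to invisible is taken as a boundary between phase image groups. Put another way, because the line elements connect the point elements in sequence, following them visits the point elements in a cycle. The direction of this cycle is monitored, and when it reverses in the image the face can be judged to have become invisible.
[0099]
Specifically, if, as the arrows in FIG. 49 show for the visible faces, the boundary elements of a face are defined so that following the line elements in their direction goes once around clockwise, then a face that has become invisible runs counterclockwise. In FIG. 49, for example, faces f1, f3, f4 are visible and faces f2, f5, f6 are invisible. Because the camera 2 images the object 1 in the positional relationship described above, face f1 is always visible and face f2 is always invisible, so f1 and f2 are excluded, and the faces judged for visibility are f3 to f6. If the line elements of the visible faces f3, f4 can be followed clockwise, those of the invisible faces f5, f6 are followed counterclockwise. The same holds when faces f3, f4 become invisible. In short, if the line elements are given directions so that following them goes once around the boundary of a face, whether they are followed clockwise or counterclockwise changes with whether the face is visible or invisible. This can be used to judge that a tracked face has become invisible through self-occlusion, and the point in the moving image (time-series images) at which a face set as a tracking region changes from visible to invisible can be taken as a boundary between phase image groups.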
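The traversal direction of a face's point elements can be read from the sign of the polygon's signed area (the shoelace formula), and phase image groups can then be cut wherever the set of visible faces changes. The sketch below assumes coordinates with the y axis pointing up, in which a negative signed area means clockwise. In image coordinates with y pointing down the sign is reversed. The function names and data layout are hypothetical.

```python
def signed_area(points):
    """Shoelace formula: positive for counterclockwise order (y axis up)."""
    shifted = points[1:] + points[:1]
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(points, shifted))

def is_clockwise(points):
    """True when following the point elements in order goes around clockwise."""
    return signed_area(points) < 0

def split_phases(visible_faces_per_frame):
    """Group consecutive frame indices that show the same set of visible faces."""
    groups = []
    previous = None
    for index, faces in enumerate(visible_faces_per_frame):
        faces = frozenset(faces)
        if faces != previous:
            groups.append([])       # the visible set changed: start a new group
            previous = faces
        groups[-1].append(index)
    return groups
```

In use, each tracked face would be tested with `is_clockwise` in every frame, the faces that pass collected into the visible set for that frame, and the resulting sequence handed to `split_phases`.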
【０１００】
【発明の効果】
請求項１の発明は、静止している対象物を異なる複数位置からＴＶカメラにより撮像して複数の画像を取得し、対象物から選択した面を境界要素と画像特徴量とで表した追跡領域として画像内に設定し、一つの画像内で設定した追跡領域を他の画像における追跡領域の候補の境界要素に一致するように変形させた変形領域を作成し、他の画像における追跡領域の候補と変形領域とについて境界要素と画像特徴量との少なくとも一方を比較することにより追跡領域の候補から一つの画像内で設定した追跡領域に対応する追跡領域を選択する第１の過程と、撮像されていない領域が少ない追跡領域を３次元空間にマッピングすることにより得られる基準領域を設定し、画像内において追跡領域の少なくとも一部が追跡できないときに複数の基準領域の３次元空間での位置関係に基づいてＴＶカメラにより撮像できない領域が生じたか否かを判断する第２の過程と、基準領域をＴＶカメラの各位置において撮像したときの基準領域の位置関係に基づいて追跡領域をＴＶカメラで撮像できなくなった原因を推定し、推定結果に基づいて追跡領域の追跡が可能となるように修正して追跡を継続させる第３の過程とを備えることを特徴とし、動画像の各画像間で追跡領域を追跡することによって３ＤＣＧのデータに用いることができる３次元情報を２次元画像から容易に得ることができるだけではなく、追跡領域のうち撮像されていない領域の発生を検出するとともに、撮像されていない原因を推定するから、追跡領域の追跡が一旦失敗したとしても原因に応じた対処を行って追跡領域を確実に追跡することが可能になる。その結果、得られた３次元情報の妥当性が検証されるとともに、誤差の発生も抑制されることになる。
【０１０１】
請求項２の発明は、請求項１の発明において、第１の過程で、一つの画像で設定された追跡領域の境界要素について他の画像上で対応する候補を抽出し、境界要素の候補の組み合わせによる追跡候補を生成した後、一つの画像で設定した追跡領域を追跡候補に一致するように変形させた変形領域を生成し、追跡候補と変形領域とについて境界要素の形状、画素値、画像特徴量から選択される指標を比較することにより追跡候補の中から他の画像における追跡領域を決定することを特徴とし、従来のような局所的な特徴を用いるのではなく、追跡領域を構成する線要素（境界線）や点要素（境界線の接合点）、あるいは画像の特徴を用いるから、追跡領域を従来より確実に追跡することが可能になる。
【０１０２】
請求項３の発明は、請求項２の発明において、他の画像においてエッジを抽出するとともに、エッジのうち一つの画像において追跡領域を構成する線要素との距離および方向が規定範囲内であるエッジを線要素の候補として選択することを特徴し、請求項４の発明は、請求項２の発明において、他の画像においてエッジを抽出するとともに、エッジ上の各画素にハフ変換を行って連続性を有したエッジを抽出し、このエッジのうち一つの画像において追跡領域を構成する線要素との距離および方向が規定範囲内であるエッジを線要素の候補として選択することを特徴とし、請求項５の発明は、請求項２の発明において、一つの画像において追跡領域を構成する点要素の近傍の形状をテンプレートとし、他の画像においてテンプレートにマッチングする部位から点要素の候補を抽出することを特徴としており、いずれにおいても候補となる境界要素の組み合わせによって有限個の追跡候補を生成するから、処理量を制限しながらも正確な追跡が可能になる。
【０１０３】
請求項６の発明は、請求項２の発明において、追跡候補と変形領域との画素値を指標に用いるので、画像が特徴を持たない場合でも追跡候補と変形領域との比較が可能である。
【０１０４】
請求項７の発明は、請求項２の発明において、追跡領域と追跡候補との平均輝度を指標に用いることを特徴とし、請求項８の発明は、請求項２の発明において、画像はカラー画像であって、追跡領域と追跡候補との色を指標に用いることを特徴とし、請求項９の発明は、請求項２の発明において、追跡領域と追跡候補との空間周波数分布を指標に用いることを特徴としており、いずれも画像内の特徴を利用することによって、数値の比較によって追跡候補と変形領域とを容易に比較することができる。
【０１０５】
請求項１０の発明は、請求項２の発明において、追跡領域と追跡候補の領域内で画素値の変化が規定値を超える部分を抽出し、この部分の位置、形状、画像特徴の少なくとも１要素を指標に用いるので、画像内の特徴を利用することで追跡候補と変形領域との比較をより確実に行うことができる。
【０１０６】
請求項１１の発明は、請求項２の発明において、画像はカラー画像であって、追跡領域と追跡候補との平均輝度または色を指標に用いる場合と、空間周波数分布を指標に用いる場合と、領域内で画素値の変化が規定値を超える部分を抽出し、この部分の位置、形状、画像特徴の少なくとも１要素を指標に用いる場合とを、一方の画像における追跡領域内での輝度および色の分布パターンに応じて選択することを特徴としており、輝度と色との情報を用いることで、追跡候補と変形領域との比較に適した方法を自動的に選択することができる。
【０１０７】
請求項１２の発明は、請求項２の発明において、複数の追跡領域について複数の画像のうち平均輝度が最大になるときのＴＶカメラの姿勢を各追跡領域に対応した対象物の面の反射光の方向に近いと推定し、各追跡領域に対応した対象物の面の法線方向と反射光の方向とから光源からの照射方向を推定することを特徴とし、光源の位置を推定することによって影の影響を除去することが可能になり、追跡領域の追跡に際して影の影響による追跡の誤りを防止することができる。
【０１０８】
請求項１３の発明は、請求項１の発明において、第２の過程では、複数の画像間で対応する追跡領域を３次元空間にマッピングし、当該追跡領域の境界要素の対応関係から求められる一致度が閾値以上であるときに３次元空間にマッピングした追跡領域を基準領域とし、基準領域に対応する追跡領域と基準領域との境界要素を比較し、境界要素の変化によって追跡領域中で撮像されていない領域を抽出することを特徴とし、画像から得られた対象物の３次元空間での位置を基準領域によって簡易的に再現することにより、対象物の位置関係を比較的容易に検証することができ、撮像されていない領域が生じるか否かを容易に検証することができる。
【０１０９】
請求項１４の発明は、請求項１３の発明において、複数の画像間で境界要素の個数に変化が生じない範囲に限定し、各々の範囲について、マッピング時に得られる寄与率を一致度に用いて寄与率が閾値以上である追跡領域を基準領域の候補とし、基準領域の候補が複数個得られるときには面積が最大になる候補を基準領域として採用することを特徴とし、この技術によって基準領域を求めることにより、対象物の１つの面によりよく対応した基準領域を設定することができるという利点がある。
【０１１０】
請求項１５の発明は、請求項１４の発明において、境界要素の個数の変化は、線要素と点要素との一方に着目しているので、処理量が少なく高速な処理が可能になる。
【０１１１】
請求項１６の発明は、請求項１４の発明において、寄与率は追跡領域を因子分解法により３次元空間にマッピングする際に得られる対角行列の成分から求めるので、因子分解法を用いた線形演算により安定して解を求めることができる。
【０１１２】
請求項１７の発明は、請求項１３の発明において、３次元空間に追跡領域をマッピングするとともに基準領域と比較し、互いに対応しない点要素によって規定される領域を追跡領域中で撮像されていない領域として求めることを特徴とし、請求項１８の発明は、請求項１３の発明において、３次元空間に追跡領域をマッピングするとともに基準領域と比較し、互いに対応しない線要素が存在するときに、この線要素を追跡領域中で撮像されていない領域の一部とみなすことを特徴としており、撮像されていない領域の範囲を容易に求めることができる。
【０１１３】
請求項１９の発明は、請求項１の発明において、第３の過程で、３次元空間内でのすべての基準領域をＴＶカメラの位置により決まる画像面に投影し、画像面内における基準領域同士の位置関係および画像面に対する基準領域の位置関係に基づいて、追跡領域中で撮像されていない領域が生じた原因を判定することを特徴とし、実際の対象物の位置関係を簡易的に再現した基準領域の位置関係によるシミュレーションを行って撮像されていない領域が生じた原因を容易に判定することができる。
【０１１４】
請求項２０の発明は、請求項１９の発明において、着目する基準領域とＴＶカメラとの間に他の基準領域が存在し、かつ両基準領域が接続されていないときに他者隠蔽と判定することを特徴とし、請求項２１の発明は、請求項１９の発明において、着目する基準領域とＴＶカメラとの間に他の基準領域が存在し、かつ両基準領域が接続されているときに自己隠蔽と判定することを特徴とし、請求項２２の発明は、請求項１９の発明において、着目する基準領域が画面の周縁に位置するときにフレームアウトと判定することを特徴としており、それぞれ基準領域同士および画像面との位置関係に基づいて撮像されていない領域が生じた原因を判断することができる。
【０１１５】
請求項２３の発明は、請求項１９の発明において、追跡領域中で撮像されていない領域が存在するときに、エッジ延長を行い、延長されたエッジの交点を新たな点要素として、追跡処理を再度実行し、３次元空間にマッピングする際に寄与率が向上すれば隠蔽と確定することを特徴とし、追跡領域の隠蔽によって追跡が不可能になる状態を回避することができ、追跡領域をより確実に追跡することが可能になる。
【０１１６】
請求項２４の発明は、ＴＶカメラにより撮像された画像入力画像として与える画像入力部と、入力された画像に対して請求項１記載の画像特徴追跡処理方法による処理を施す画像処理装置と、入力された画像および画像処理装置で処理された画像を格納する記憶装置と、画像処理装置での処理画像を表示する表示手段と、画像処理装置に対して追跡領域を指定する領域指定手段とを備えるものであり、請求項１の発明と同様の効果に加えて、ＴＶカメラとして通常のビデオカメラ等により撮像した動画像を用いて３次元情報を容易に得ることができる。
【０１１７】
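The invention of claim 25 is characterized in that: after an object is imaged by a TV camera from a plurality of different positions to acquire a plurality of images, a surface selected from the object is set in an image as a tracking region represented by boundary elements; a deformation region is created by deforming the tracking region set in one image so as to match the boundary elements of a tracking region candidate in another image; the tracking region corresponding to the tracking region set in the one image is selected from the tracking region candidates by comparing the boundary elements of the candidates and of the deformation region in the other image; a plurality of images in which a plurality of surfaces of the object are imaged and the same surfaces are imaged continuously are delimited as one phase image group; for the plurality of phase images, the surfaces selected from the object are then mapped to three-dimensional space for each phase image group to obtain a three-dimensional shape for each phase image group; and three-dimensional data is created by performing coordinate transformation so that the coordinate systems of the phase image groups coincide with one another. Because consecutive images of the moving image (time-series images) in which the same surfaces of the object are imaged form a phase image group, and the three-dimensional shape is obtained within the range of a phase image group, no surface becomes untrackable through self-concealment within a phase image group, and the three-dimensional shape can be obtained reliably and easily. In other words, no problem arises even if the conventionally known factorization method or stereo method is applied to obtain the three-dimensional shape. The three-dimensional shape can thus be obtained easily for each phase image group, after which the shapes obtained from the phase image groups can be combined into a single set of three-dimensional data by a coordinate transformation that brings their coordinate systems into agreement.
[0118]
The invention of claim 26 is characterized in that: after an object is imaged by a TV camera from a plurality of different positions to acquire a plurality of images, a surface selected from the object is set in an image as a tracking region represented by boundary elements; a deformation region is created by deforming the tracking region set in one image so as to match the boundary elements of a tracking region candidate in another image; the tracking region corresponding to the tracking region set in the one image is selected from the tracking region candidates by comparing the boundary elements of the candidates and of the deformation region in the other image; a plurality of images in which a plurality of surfaces of the object are imaged and the same surfaces are imaged continuously are delimited as one phase image group; for the plurality of phase images, the surfaces selected from the object are then mapped to three-dimensional space for each phase image group to obtain a three-dimensional shape for each phase image group; and three-dimensional data is created by performing coordinate transformation of the three-dimensional shape obtained for each phase image group so that the line elements common to the phase image groups are superimposed. Because consecutive images of the moving image (time-series images) in which the same surfaces of the object are imaged form a phase image group, and the three-dimensional shape is obtained within the range of a phase image group, no surface becomes untrackable through self-concealment within a phase image group, and the three-dimensional shape can be obtained reliably and easily. In other words, no problem arises even if the conventionally known factorization method or stereo method is applied to obtain the three-dimensional shape. The three-dimensional shape can thus be obtained easily for each phase image group, after which the shapes obtained from the phase image groups can be combined into a single set of three-dimensional data by a coordinate transformation that superimposes their line elements.
[0119]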
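The invention of claim 27 is characterized in that, in the invention of claim 25 or claim 26, the surface selected from the object in the plurality of images captured by the TV camera is represented by boundary elements consisting of point elements and directed line elements that connect the point elements in sequence; in each image obtained in time series, the direction in which the point elements are visited by following the line elements is monitored; and the images up to the point where that direction reverses are taken as one phase image group. This makes it possible to automate the delimiting of phase image groups.
[0120]
The invention of claim 28 is characterized in that, in the invention of claim 25 or claim 26, transformation parameters for rotation and translation are obtained for each boundary element, and the coordinate transformation is then performed using transformation parameters obtained by averaging the parameters obtained for the individual boundary elements. By averaging, the transformation parameters can be determined uniquely even when the parameters obtained from individual boundary elements disagree.
[0121]
The invention of claim 29 is characterized in that, in the invention of claim 25 or claim 26, the midpoint between the position of each point element after the coordinate transformation has been applied to one of the three-dimensional shapes whose coordinate systems are to be matched and the position of the corresponding point element in the other three-dimensional shape is taken as the position of that point element. Even if the positions of point elements differ after the coordinate transformation, the position of each point element can be determined uniquely.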
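[Brief description of the drawings]
FIG. 1 is a block diagram showing the apparatus used in an embodiment of the present invention.
FIG. 2 is a perspective view showing the object being imaged by the TV camera in the same embodiment.
FIG. 3 is a diagram showing the time-series image group obtained in the same embodiment.
FIG. 4 is a diagram showing an example of an image in the same embodiment.
FIG. 5 is a diagram showing an example of a tracking region in the same embodiment.
FIG. 6 is a diagram showing an example of an object in the same embodiment.
FIG. 7 is an operation explanatory diagram of the same embodiment.
FIG. 8 is a diagram showing an example of tracking candidates in the same embodiment.
FIG. 9 is a diagram showing an example of deformation regions in the same embodiment.
FIG. 10 is a diagram showing the process of generating a deformation region in the same embodiment.
FIG. 11 is an operation explanatory diagram of the same embodiment.
FIG. 12 is a diagram showing an example of templates in the same embodiment.
FIG. 13 is a diagram showing an example of tracking candidates in the same embodiment.
FIG. 14 is a diagram showing an example of deformation regions in the same embodiment.
FIG. 15 is an operation explanatory diagram of the same embodiment.
FIG. 16 is a diagram showing an example of a tracking region in the same embodiment.
FIG. 17 is an operation explanatory diagram of the same embodiment.
FIG. 18 is a diagram showing an example of a deformation region in the same embodiment.
FIG. 19 is an operation explanatory diagram of the same embodiment.
FIG. 20 is an operation explanatory diagram of the same embodiment.
FIG. 21 is an operation explanatory diagram of the same embodiment.
FIG. 22 is an operation explanatory diagram of the same embodiment.
FIG. 23 is an operation explanatory diagram of the same embodiment.
FIG. 24 is an operation explanatory diagram of the same embodiment.
FIG. 25 is an operation explanatory diagram of the same embodiment.
FIG. 26 is an operation explanatory diagram of the same embodiment.
FIG. 27 is a diagram showing an example in which a shadow occurs in the same embodiment.
FIG. 28 is an operation explanatory diagram of the same embodiment.
FIG. 29 is an operation explanatory diagram of the same embodiment.
FIG. 30 is an operation explanatory diagram of the same embodiment.
FIG. 31 is an operation explanatory diagram of the same embodiment.
FIG. 32 is an operation explanatory diagram of the same embodiment.
FIG. 33 is an operation explanatory diagram of the same embodiment.
FIG. 34 is an operation explanatory diagram of the same embodiment.
FIG. 35 is an operation explanatory diagram of the same embodiment.
FIG. 36 is an operation explanatory diagram of the same embodiment.
FIG. 37 is an operation explanatory diagram of the same embodiment.
FIG. 38 is an operation explanatory diagram of the same embodiment.
FIG. 39 is an operation explanatory diagram of the same embodiment.
FIG. 40 is an operation explanatory diagram of the same embodiment.
FIG. 41 is a perspective view showing the object being imaged by the TV camera in a second embodiment of the present invention.
FIG. 42 is a perspective view showing the object used in the second embodiment.
FIG. 43 is a plan view showing the object used in the second embodiment.
FIG. 44 is a diagram explaining the concept of phase image groups in the second embodiment.
FIG. 45 is a diagram showing three-dimensional shapes obtained from the individual phase image groups in the second embodiment.
FIG. 46 is a diagram showing the integrated three-dimensional shape in the second embodiment.
FIG. 47 is an operation explanatory diagram of the second embodiment.
FIG. 48 is an operation explanatory diagram of the second embodiment.
FIG. 49 is an operation explanatory diagram of the second embodiment.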
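[Explanation of symbols]
1 Target object
2 TV camera
3 Object
11 Image input device
12 Image input means
13 Storage device
14 Image processing device
15 Display means
16 Region designating means
17 Region tracking means
18 Tracking evaluation means
19 Shape restoration means
D Tracking region
E Deformation region
F1 Image file
F2 Tracking data file
F3 Three-dimensional shape data file
[0001]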
BACKGROUND OF THE INVENTION
The present invention relates to an image feature tracking processing method, an image feature tracking processing device, and a three-dimensional data creation method for extracting three-dimensional information about an object contained in a time-series group of two-dimensional images (a moving image).
[0002]
[Prior art]
In recent years, computer graphics (hereinafter abbreviated as 3DCG) technology has advanced rapidly. Such 3DCG technology is also used in CAD systems and virtual reality systems. However, the modeling work for creating 3DCG data takes a lot of time, and this is the main reason why the application field of 3DCG cannot be expanded.
[0003]
To make modeling easier, a technique has been put into practical use that automates modeling by measuring real space with a three-dimensional measuring device (motion capture, etc.) and using the measured values as 3DCG data. However, because such a device performs stereoscopic vision it is very expensive, and because the spatial region that current three-dimensional measuring devices can measure is relatively small, the applications of this technique are limited. In particular, this type of device is not suited to measuring a wide spatial region such as an urban area, and at present the 3DCG data needed to render a cityscape or the like is created by hand.
[0004]
Therefore, as a technique that performs three-dimensional measurement with relatively inexpensive equipment and can be applied generally to various kinds of objects, the restoration of three-dimensional shape from a two-dimensional moving image (time-series image group) captured with monocular vision has been proposed. For example, the technique described in “Takeo Kanade et al.: Restoration of Object Shape and Motion of Imaging Device by Factorization Method, Journal of the Institute of Electronics, Information and Communication Engineers, Vol. J76-D-II, No. 8, 1993/8” (hereinafter called the factorization method) recognizes features in a two-dimensional moving image, creates a feature point matrix that tracks how those features move through the moving image, and restores the three-dimensional shape of the object and the camera pose by applying factorization to the feature point matrix. With this technique, an error in the tracking process that creates the feature point matrix enlarges the error in the measurement result, so tracking the features in the moving image accurately is an important problem.
[0005]
One technique for tracking features in a two-dimensional moving image for the purpose of three-dimensional measurement is described in JP-A-10-111934. This publication describes that, in applying the factorization method, accurate feature point tracking is achieved by designating a plurality of regions in a one-frame image, applying a plurality of types of feature extraction methods to each region, and adopting the best result.
[0006]
Japanese Patent Laid-Open No. 10-255053 describes that, in applying the stereo method, after feature points have been tracked, an appropriate point in the image is set as the image origin; the motion trajectory of each feature point is obtained using the distance between the origin, which serves as a feature point that does not change with rotation of the imaging device, and the other feature points; whether the motion trajectories are correlated between two different image groups is evaluated to detect tracking failures; and only correctly tracked feature points are used, so that the three-dimensional shape is restored with high accuracy.
[0007]
[Problems to be solved by the invention]
However, in the technique described in Japanese Patent Application Laid-Open No. 10-111934, the operator must judge whether feature point tracking is correct and give instructions accordingly, so the operator cannot leave the apparatus while tracking is in progress.
[0008]
Further, the technique described in Japanese Patent Application Laid-Open No. 10-255053 can detect a failure of feature tracking but cannot determine the reason for the failure. When a feature point can no longer be tracked partway through, it cannot determine whether this is a tracking error or has another cause (for example, several objects are present and the tracked object has been hidden by another object), and so it cannot correct the failure. As a result, parts of the image may become unmeasurable.
[0009]
Furthermore, since both of the above techniques track using local features extracted from the image, tracking tends to fail when there is a repetitive pattern such as a lattice pattern. In addition, the three-dimensional information obtained by either technique consists of the coordinates of individual feature points. These are merely discrete coordinates in three-dimensional space, and the relationships among the feature points are unknown. Therefore, to use the measured information as 3DCG data, separate processing means is required that generates data carrying surface information, such as polygons, from the discrete coordinates.
[0010]
The present invention has been made in view of the above. Its object is to provide an image feature tracking processing method, an image feature tracking processing device, and a three-dimensional data creation method that make it possible to easily obtain, from a time-series group of two-dimensional images, three-dimensional information usable as 3DCG data, and that can verify the validity of tracking when an object is tracked through the time-series image group, thereby tracking the object reliably and suppressing errors in the three-dimensional information.
[0011]
[Means for Solving the Problems]
The invention of claim 1 comprises: a first step in which a stationary object is imaged by a TV camera from a plurality of different positions to obtain a plurality of images, a surface selected from the object is set in an image as a tracking region represented by boundary elements and image feature amounts, a deformation region is created by deforming the tracking region set in one image so as to match the boundary elements of a tracking region candidate in another image, and the tracking region corresponding to the tracking region set in the one image is selected from the tracking region candidates by comparing at least one of the boundary elements and the image feature amounts between the candidates and the deformation region in the other image; a second step in which a reference region is set by mapping into three-dimensional space a tracking region with few unimaged regions, such as those caused by concealment or frame-out, and, when at least part of the tracking region in an image cannot be tracked, it is determined whether a region that cannot be imaged by the TV camera has arisen, based on a comparison of the boundary elements of the tracking region and the reference region in three-dimensional space; and a third step in which the reason the tracking region could not be imaged by the TV camera is estimated from the positional relationship of the reference regions as captured at each position of the TV camera, and tracking is continued after the tracking region has been corrected, based on the estimation result, so that it can be tracked.
[0012]
The invention of claim 2 is characterized in that, in the first step of the invention of claim 1, candidates corresponding to the boundary elements of the tracking region set in one image are extracted from another image, tracking candidates are generated from combinations of the boundary element candidates, a deformation region is then generated by deforming the tracking region set in the one image so as to match each tracking candidate, and the tracking region in the other image is determined from the tracking candidates by comparing, between the tracking candidate and the deformation region, an index selected from the shape of the boundary elements and image feature amounts based on pixel values.
[0013]
The invention of claim 3 is characterized in that, in the invention of claim 2, edges are extracted from the other image, and an edge whose distance and direction relative to a line element of the tracking region in the one image are within specified ranges is selected as a candidate for that line element.
[0014]
The invention of claim 4 is characterized in that, in the invention of claim 2, edges are extracted from the other image, continuous edges are extracted by applying the Hough transform to the pixels on the edges, and an edge whose distance and direction relative to a line element of the tracking region in the one image are within specified ranges is selected as a line element candidate.
[0015]
The invention of claim 5 is characterized in that, in the invention of claim 2, the shape in the vicinity of a point element of the tracking region in the one image is used as a template, and point element candidates are extracted from the parts of the other image that match the template.
[0016]
The invention of claim 6 is characterized in that, in the invention of claim 2, pixel values of the tracking candidate and the deformation area are used as the index.
[0017]
According to a seventh aspect of the invention, in the second aspect of the invention, an average luminance of the tracking area and the tracking candidate is used as the index.
[0018]
The invention of claim 8 is characterized in that, in the invention of claim 2, the image is a color image, and colors of the tracking region and the tracking candidate are used as the index.
[0019]
A ninth aspect of the invention is characterized in that, in the second aspect of the invention, a spatial frequency distribution between the tracking region and the tracking candidate is used as the index.
[0020]
The invention of claim 10 is characterized in that, in the invention of claim 2, parts where the change in pixel value exceeds a specified value are extracted from within the tracking region and the tracking candidate, and at least one of the position, shape, and image feature of those parts is used as the index.
[0021]
The invention of claim 11 is characterized in that, in the invention of claim 2, the image is a color image, and one of the following is selected according to the luminance and color distribution pattern within the tracking region in the one image: using the average luminance or color of the tracking region and the tracking candidate as the index; using the spatial frequency distribution as the index; or extracting the parts within the region where the change in pixel value exceeds a specified value and using at least one of the position, shape, and image feature of those parts as the index.
[0022]
The invention of claim 12 is characterized in that, in the invention of claim 2, for each of a plurality of tracking regions, the direction of the light reflected from the surface of the object corresponding to that tracking region is estimated from the attitude of the TV camera in the image, among the plurality of images, in which the average luminance is highest, and the irradiation direction from the light source is estimated from the normal direction of the surface of the object corresponding to each tracking region and the direction of the reflected light.
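As a rough illustration of the light-source estimate described above, the sketch below assumes purely specular (mirror-like) reflection, which the text does not state: the camera direction at peak brightness is taken as the reflected ray and mirrored about the surface normal. The function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def estimate_light_direction(normal, to_camera):
    """Return the unit vector from the surface toward the light source.

    normal    -- surface normal of the tracked face
    to_camera -- direction from the face toward the camera in the frame
                 where the face's average luminance is highest
    Assumption: mirror reflection, so light = 2 (n . r) n - r.
    """
    n = normalize(normal)
    r = normalize(to_camera)
    dot = sum(a * b for a, b in zip(n, r))
    return tuple(2.0 * dot * a - b for a, b in zip(n, r))
```

With several tracked faces, the per-face estimates could be compared or averaged; how the patent combines them is not specified in this passage.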
[0023]
The invention of claim 13 is characterized in that, in the second step of the invention of claim 2, tracking regions that correspond across a plurality of images are mapped to three-dimensional space; when the degree of coincidence obtained from the correspondence of the boundary elements of the tracking regions is at or above a threshold, the tracking region mapped into three-dimensional space is taken as the reference region; the boundary elements of the reference region and of the tracking region corresponding to it are compared; and a region of the tracking region that has not been imaged is extracted from the change in the boundary elements.
[0024]
The invention of claim 14 is characterized in that, in the invention of claim 13, the plurality of images is divided into ranges within which the number of boundary elements does not change; for each range, the contribution rate obtained during mapping is used as the degree of coincidence; a tracking region whose contribution rate is at or above a threshold is taken as a reference region candidate; and, when a plurality of reference region candidates is obtained, the candidate with the largest area is adopted as the reference region.
[0025]
The invention of claim 15 is characterized in that, in the invention of claim 14, the change in the number of boundary elements focuses on one of a line element and a point element.
[0026]
The invention of claim 16 is characterized in that, in the invention of claim 14, the contribution rate is obtained from a component of a diagonal matrix obtained when the tracking region is mapped to a three-dimensional space by a factorization method.
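The selection rule of claims 14 and 16 can be sketched as follows. The exact formula for the contribution rate is not given in this passage; the sketch assumes it is the share of squared singular values (the diagonal components from the factorization) carried by the three largest, since an ideal rigid scene yields a rank-3 measurement matrix. The threshold value and all names are hypothetical.

```python
def contribution_rate(singular_values, rank=3):
    """Share of the squared singular values held by the `rank` largest ones."""
    squared = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squared)
    if total == 0.0:
        return 0.0
    return sum(squared[:rank]) / total

def pick_reference_region(candidates, threshold=0.95):
    """Pick the largest-area candidate whose contribution rate passes.

    candidates -- list of (name, area, singular_values) tuples
    Returns the chosen name, or None if no candidate reaches the threshold.
    """
    passing = [
        (area, name)
        for name, area, values in candidates
        if contribution_rate(values) >= threshold
    ]
    return max(passing)[1] if passing else None
```

A candidate with a large area but a low contribution rate is rejected before areas are compared, which mirrors the order of the two conditions in claim 14.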
[0027]
The invention of claim 17 is characterized in that, in the invention of claim 13, the tracking region is mapped into three-dimensional space and compared with the reference region, and the region defined by the point elements that do not correspond to each other is obtained as the region of the tracking region that has not been imaged.
[0028]
The invention of claim 18 is characterized in that, in the invention of claim 13, when the tracking region is mapped into three-dimensional space and compared with the reference region and there is a line element that has no counterpart, that line element is regarded as part of a region of the tracking region that has not been imaged.
[0029]
According to a nineteenth aspect of the present invention, in the first aspect, in the third step, all the reference areas in the three-dimensional space are projected on an image plane determined by the position of the TV camera, and the reference area in the image plane is projected. Based on the positional relationship between each other and the positional relationship of the reference area with respect to the image plane, the cause of the occurrence of an unimaged area in the tracking area is determined.
[0030]
The invention of claim 20 is characterized in that, in the invention of claim 19, concealment by another object is determined when another reference region exists between the reference region of interest and the TV camera and the two reference regions are not connected.
[0031]
The invention of claim 21 is characterized in that, in the invention of claim 19, self-concealment is determined when another reference region exists between the reference region of interest and the TV camera and the two reference regions are connected.
[0032]
The invention of claim 22 is characterized in that, in the invention of claim 19, frame-out is determined when the reference region of interest is located at the periphery of the screen.
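The three rules of claims 20 to 22 can be illustrated with a small classifier. This is only a sketch under simplifying assumptions that are not in the text: each projected reference region is summarized by its vertices and a single depth value, "lies between the region and the camera" is approximated by a bounding-box overlap with a nearer region, and concealment is tested before frame-out. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProjectedRegion:
    name: str
    points: list                          # projected (x, y) vertices on the image plane
    depth: float                          # representative distance from the camera
    neighbours: frozenset = frozenset()   # names of regions sharing a line element

def bbox(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def classify_cause(target, others, width, height, margin=1.0):
    """Guess why part of `target` is not imaged."""
    for other in others:
        nearer = other.depth < target.depth
        if nearer and boxes_overlap(bbox(target.points), bbox(other.points)):
            if other.name in target.neighbours:
                return "self-concealment"          # connected regions (claim 21)
            return "other-object concealment"      # unconnected regions (claim 20)
    x0, y0, x1, y1 = bbox(target.points)
    if x0 <= margin or y0 <= margin or x1 >= width - 1 - margin or y1 >= height - 1 - margin:
        return "frame-out"                         # region at the screen edge (claim 22)
    return "undetermined"
```

A real implementation would test polygon overlap and per-point depth rather than bounding boxes and a single depth value.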
[0033]
The invention of claim 23 is characterized in that, in the invention of claim 19, when an unimaged region exists in the tracking region, edges are extended, the intersection of the extended edges is taken as a new point element, the tracking process is executed again, and concealment is confirmed if the contribution rate improves when mapping to three-dimensional space.
[0034]
The invention of claim 24 comprises an image input unit that supplies an image captured by a TV camera as an input image, an image processing device that applies processing by the image feature tracking processing method of claim 1 to the input image, a storage device that stores the input image and the image processed by the image processing device, display means that displays the image processed by the image processing device, and region designating means that designates a tracking region to the image processing device.
[0035]
The invention of claim 25 is characterized in that: after an object is imaged by a TV camera from a plurality of different positions to acquire a plurality of images, a surface selected from the object is set in an image as a tracking region represented by boundary elements; a deformation region is created by deforming the tracking region set in one image so as to match the boundary elements of a tracking region candidate in another image; the tracking region corresponding to the tracking region set in the one image is selected from the tracking region candidates by comparing the boundary elements of the candidates and of the deformation region in the other image; a plurality of images in which a plurality of surfaces of the object are imaged and the same surfaces are imaged continuously are delimited as one phase image group; for the plurality of phase images, the surfaces selected from the object are then mapped to three-dimensional space for each phase image group to obtain a three-dimensional shape for each phase image group; and three-dimensional data is created by performing coordinate transformation so that the coordinate systems of the phase image groups coincide with one another.
[0036]
The invention of claim 26 is characterized in that: after an object is imaged by a TV camera from a plurality of different positions to acquire a plurality of images, a surface selected from the object is set in an image as a tracking region represented by boundary elements; a deformation region is created by deforming the tracking region set in one image so as to match the boundary elements of a tracking region candidate in another image; the tracking region corresponding to the tracking region set in the one image is selected from the tracking region candidates by comparing the boundary elements of the candidates and of the deformation region in the other image; a plurality of images in which a plurality of surfaces of the object are imaged and the same surfaces are imaged continuously are delimited as one phase image group; for the plurality of phase images, the surfaces selected from the object are then mapped to three-dimensional space for each phase image group to obtain a three-dimensional shape for each phase image group; and three-dimensional data is created by performing coordinate transformation of the three-dimensional shape obtained for each phase image group so that the line elements common to the phase image groups are superimposed.
[0037]
The invention of claim 27 is characterized in that, in the invention of claim 25 or claim 26, the surface selected from the object in the plurality of images captured by the TV camera is represented by boundary elements consisting of point elements and directed line elements that connect the point elements in sequence; in each image obtained in time series, the direction in which the point elements are visited by following the line elements is monitored; and the images up to the point where that direction reverses are taken as one phase image group.
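One way to monitor the traversal direction described in claim 27 is the sign of the polygon's signed area (the shoelace formula): the sign flips when the point elements, followed in line-element order, switch between counter-clockwise and clockwise. The sketch below assumes each frame supplies the point elements of one face in that order; the function names are hypothetical and the patent does not say which orientation test it uses.

```python
def signed_area(points):
    """Shoelace formula: positive when the vertex order is counter-clockwise
    in a y-up coordinate system, negative when clockwise."""
    total = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return 0.5 * total

def split_into_phases(frames):
    """Group consecutive frame indices until the traversal direction reverses.

    frames -- list of vertex lists, one per frame, each in line-element order
    """
    phases, current, previous_sign = [], [], 0
    for index, points in enumerate(frames):
        area = signed_area(points)
        sign = (area > 0) - (area < 0)
        if current and sign != 0 and previous_sign != 0 and sign != previous_sign:
            phases.append(current)
            current = []
        current.append(index)
        if sign != 0:
            previous_sign = sign
    if current:
        phases.append(current)
    return phases
```

Frames in which the face is seen exactly edge-on (zero area) are kept in the current phase here, which is a design choice of this sketch.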
[0038]
The invention of claim 28 is characterized in that, in the invention of claim 25 or claim 26, transformation parameters for rotation and translation are obtained for each boundary element, and the coordinate transformation is then performed using transformation parameters obtained by averaging the parameters obtained for the individual boundary elements.
[0039]
The invention of claim 29 is characterized in that, in the invention of claim 25 or claim 26, the midpoint between the position of each point element after the coordinate transformation has been applied to one of the three-dimensional shapes whose coordinate systems are to be matched and the position of the corresponding point element in the other three-dimensional shape is taken as the position of that point element.
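The averaging of claim 28 and the midpoint rule of claim 29 might look like the sketch below. The parameterization is an assumption: rotation is given as three angles in radians and translation as a 3D vector, and the angles are averaged with a circular mean so that values near the wrap-around point do not cancel. Averaging angles per axis is only reasonable when the per-element estimates are close to one another. All names are hypothetical.

```python
import math

def average_parameters(per_element):
    """Average ((rx, ry, rz), (tx, ty, tz)) pairs obtained per boundary element."""
    count = len(per_element)
    angles = tuple(
        math.atan2(
            sum(math.sin(rotation[k]) for rotation, _ in per_element) / count,
            sum(math.cos(rotation[k]) for rotation, _ in per_element) / count,
        )
        for k in range(3)
    )
    shift = tuple(sum(t[k] for _, t in per_element) / count for k in range(3))
    return angles, shift

def merge_points(transformed, reference):
    """Take the midpoint of corresponding point elements of two shapes.

    transformed -- {point name: (x, y, z)} after the coordinate transformation
    reference   -- {point name: (x, y, z)} in the other shape
    """
    return {
        name: tuple((a + b) / 2.0 for a, b in zip(transformed[name], reference[name]))
        for name in transformed
        if name in reference
    }
```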
[0040]
DETAILED DESCRIPTION OF THE INVENTION
(First embodiment)
FIG. 1 shows the apparatus used in the present invention. The two-dimensional images to be processed are input from the image input device 11 to the image feature tracking processing device 10. The image input device 11 is a TV camera, or a playback device that plays back video from a recording medium (video tape, CD, DVD, etc.) on which a moving image captured by a TV camera is recorded. The image feature tracking processing device 10 is configured using a computer (personal computer). The TV camera (hereinafter abbreviated as camera) may be a simple home video camera or an industrial ITV camera. The embodiment described below deals with color images.
[0041]
As shown in FIG. 2, the images input to the image feature tracking processing device 10 are images in which the viewpoint on the object 1 changes continuously as the camera 2 is moved relative to the object 1 as indicated by arrow A. That is, a time-series image group (moving image) of two-dimensional images in which the viewpoint on the object 1 changes continuously is input to the image feature tracking processing device 10. The images input to the image input means 12 are captured so that the displacement of the object 1 between frames of the moving image is sufficiently small compared with the size of the object 1; in other words, the camera 2 that captures the moving image moves at a relatively low speed. It is also assumed that the object 1 has a relatively large number of planar portions, and that it neither moves nor changes shape over time. FIG. 3 shows an example of a moving image obtained by imaging the object 1 shown in FIG. In FIG. 3, one frame means one of the images V1 to V3, which are arranged in time series from left to right.
[0042]
The image feature tracking processing device 10 basically comprises: image input means 12 serving as the interface with the image input device 11; a storage device 13, including a hard disk and memory, for storing the moving images input through the image input means 12 and the processing results; an image processing device 14 that extracts three-dimensional information from the input time-series group of two-dimensional images; display means 15, consisting of a display device, that displays the input moving images and the processing results; and region designating means 16, including a keyboard and a mouse, for setting in the image processing device 14 a tracking region corresponding to the object 1.
[0043]
The storage device 13 holds an image file F1 that stores the images input through the image input means 12, a tracking data file F2 used as a working file when tracking the object 1 through the time-series group of two-dimensional images in order to obtain three-dimensional information, and a three-dimensional shape data file F3 that stores the three-dimensional information obtained by the image processing device 14.
[0044]
The image processing device 14 includes region tracking means 17, which tracks the object 1 based on the tracking region set for the object 1 by the region designating means 16, and tracking evaluation means 18, which evaluates the validity of the tracking of the object 1 by the region tracking means 17. The image processing device 14 also includes shape restoration means 19, which generates three-dimensional information from the information in the two-dimensional images relating to the tracked object 1. The three-dimensional information generated by the shape restoration means 19 is used by the tracking evaluation means 18 to evaluate the validity of the tracking.
[0045]
The operation of the image processing device 14 is described below. First, when images are input through the image input means 12, they are stored in the image file F1 of the storage device 13 and displayed on the display means 15. The operator of the region designating means 16 looks at the image displayed on the display means 15 and designates the initial tracking region in association with the object 1. That is, one image is selected from the time-series image group and an initial tracking region is set in that image. In general, a tracking region is set for each surface of the object 1, and all surfaces of the object 1 visible in the image are designated as tracking regions. Designating the tracking regions in this way removes unnecessary information such as the background. FIG. 4 shows a state in which tracking regions D have been designated. A tracking region D is a closed region made up of a set of boundary elements, the boundary elements being point elements and the line elements that connect them. In FIG. 4, the white circles are point elements, and a line between two adjacent point elements is a line element. The boundary elements are accompanied by the connection relationship between tracking regions D, expressed by whether a line element is shared by two tracking regions D (the bold line in FIG. 4 is a line element shared by two tracking regions D), and by the image features (color, texture, etc.) within the region.
[0046]
As described above, the format that expresses a three-dimensional shape with line elements and point elements as boundary elements is called boundary representation (B-rep) and is commonly used in 3DCG. Converting the input image information into boundary representation greatly reduces the amount of data compared with handling the image pixel by pixel, and makes data conversion easier when the three-dimensional data is used. FIG. 5 shows an example of setting a tracking region D by boundary representation. In FIG. 5, the hatched portion is the tracking region D, an annular region defined by the line elements s1 to s8 and the point elements p1 to p8.
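A boundary representation of a tracking region, as described above, can be held in a very small data structure. The sketch below is illustrative only: the class and field names are hypothetical, point elements are keyed by labels such as "p1", and line elements are stored as directed (start, end) pairs. The closure test accepts regions with several loops, such as the annular region of FIG. 5.

```python
from dataclasses import dataclass, field

@dataclass
class TrackingRegion:
    points: dict                                   # point label -> (x, y) in the image
    lines: list                                    # directed line elements (start, end)
    features: dict = field(default_factory=dict)   # e.g. color or texture summaries

    def is_closed(self):
        """True when every point element starts one line element and ends one."""
        labels = sorted(self.points)
        starts = sorted(start for start, _ in self.lines)
        ends = sorted(end for _, end in self.lines)
        return starts == labels and ends == labels

def shared_line_elements(first, second):
    """Line elements common to two regions, ignoring direction.

    A non-empty result corresponds to the connection relationship
    between two tracking regions (the bold lines of FIG. 4).
    """
    def undirected(region):
        return {frozenset(line) for line in region.lines}
    return undirected(first) & undirected(second)
```

Sharing point labels between regions is what lets the connection relationship be read off directly; an implementation that keeps separate labels per region would need an explicit correspondence table instead.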
[0047]
The region designating means 16 can designate not only the tracking region D for the object 1 but also the accuracy with which the object 1 is measured. For example, when three-dimensional measurement is performed on furniture 20 having a drawer 21 as shown in FIG. 6, the accuracy can be designated: whether to measure down to the handle 22 provided on the drawer 21, or to treat the furniture 20 as a whole as a rectangular parallelepiped. When the object 1 in the image does not have a complicated shape like furniture but a simple shape such as a combination of geometric figures, each surface of the object 1 may be set automatically as a tracking region D. That is, region dividing means that divides the interior of the object 1 into its individual surfaces may be provided.
[0048]
The tracking region D is stored in the tracking data file F2 in the storage device 13. The region tracking means 17 detects, in each image of the time-series image group, the region corresponding to the tracking region D set by the region designating means 16. That is, the surface of interest on the object 1 is tracked through the images of the time-series image group. To do so, the region tracking means 17 has a function that, from the tracking region D set for one image (one frame) of the time-series image group stored in the image file F1, estimates the shape of the tracking region D in the next frame and generates a deformation region. The deformation region is compared with the image of the next frame to detect its position in that frame. The deformation region is then regarded as the tracking region of that frame, and its position in the following frame is detected in turn. In this way deformation regions are generated one after another for all frames, and their positions are stored as tracking data in the tracking data file F2.
[0049]
The function of the area tracking unit 17 will be described in more detail. FIG. 7 shows a state in which only one object 1 is displayed in one image (one frame) in the time-series image group. The tracking region (shaded portion) D in FIG. 7 is set in the previous frame in the time-series image group, and does not coincide with any surface of the object 1 in the frame shown in FIG. Therefore, the area tracking means 17 detects which part in the frame each line element sA, sB, sC of the tracking area D corresponds to. Here, it is assumed that the image of the object 1 is an image obtained by extracting an edge (referred to as an edge image). A well-known technique is used for edge extraction.
[0050]
The region tracking means 17 can generate a deformation region based either on line elements or on point elements. When generating a deformation region based on line elements, it first extracts, from the edges of the object 1, the edges whose direction and distance relative to each of the line elements sA, sB, and sC are within predetermined ranges. Specifically, applying the Hough transform to the edge image of the object 1 gives the inclination of each straight-line portion of the edge image, so the angle between each such straight-line portion and each of the line elements sA, sB, and sC is obtained, and when the angle difference is within a predetermined range the straight-line portion is associated with that line element. In the illustrated example, edges sa1 and sa2 correspond to line element sA, edges sb1 and sb2 to line element sB, and edge sc1 to line element sC.
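The direction-and-distance test described above can be sketched without any imaging library if the Hough transform is assumed to have already produced lines in the usual normal form, x cos(theta) + y sin(theta) = rho. The representation, the use of the line element's midpoint for the distance test, and all names and tolerances below are assumptions made for illustration.

```python
import math

def segment_direction(start, end):
    """Direction of a line element as an angle in [0, pi)."""
    return math.atan2(end[1] - start[1], end[0] - start[0]) % math.pi

def hough_direction(theta):
    """Direction of a Hough line; it is perpendicular to its normal angle theta."""
    return (theta + math.pi / 2.0) % math.pi

def angle_difference(a, b):
    """Smallest difference between two undirected line directions."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def candidate_edges(segment, hough_lines, max_angle, max_distance):
    """Keep the Hough lines close in direction and position to a line element.

    segment     -- ((x0, y0), (x1, y1)) line element from the previous frame
    hough_lines -- list of (rho, theta) pairs detected in the next frame
    """
    start, end = segment
    mid_x = (start[0] + end[0]) / 2.0
    mid_y = (start[1] + end[1]) / 2.0
    direction = segment_direction(start, end)
    kept = []
    for rho, theta in hough_lines:
        distance = abs(mid_x * math.cos(theta) + mid_y * math.sin(theta) - rho)
        if angle_difference(direction, hough_direction(theta)) <= max_angle and distance <= max_distance:
            kept.append((rho, theta))
    return kept
```

Because the camera is assumed to move slowly between frames, fairly tight tolerances would normally be enough to discard most unrelated edges.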
[0051]
At this stage, two edges sa1 and sa2 are associated with the line element sA, and two edges sb1 and sb2 are associated with the line element sB, whereas the edges must be associated with the line elements sA, sB, and sC on a one-to-one basis. Four combinations are therefore considered as candidates: one of sa1-sb1-sc1, sa1-sb2-sc1, sa2-sb1-sc1, and sa2-sb2-sc1 (each hereinafter referred to as a tracking candidate) corresponds to the tracking region D. Because the edges are obtained by the Hough transform, information on their length is lost, and each combination corresponds to a figure drawn with solid lines in FIG. 8. A two-dot chain line in FIG.
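The enumeration of tracking candidates amounts to a Cartesian product over the edges associated with each line element. A minimal sketch, using the edge names of the illustrated example (the variable names are hypothetical):

```python
from itertools import product

# Edges associated with each line element in the illustrated example.
associated = {
    "sA": ["sa1", "sa2"],
    "sB": ["sb1", "sb2"],
    "sC": ["sc1"],
}

# One edge per line element gives one tracking candidate.
tracking_candidates = ["-".join(combo) for combo in product(*associated.values())]
```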
[0052]
Next, the tracking area D is deformed so as to match each of the tracking candidates described above. That is, when the line elements sA, sB, and sC are deformed so as to coincide with the tracking candidates shown in FIGS. 8A to 8D, four deformation regions E1 to E4 having line elements sA1 to sA4, sB1 to sB4, and sC1 to sC4 are generated, as shown in FIGS. 9A to 9D. FIG. 9E shows the tracking area D before deformation. In deforming the tracking area D, as shown in FIG. 10, the vectors Ma, Mb, and Mc connecting the point elements pA, pB, and pC of the original tracking area D to the edge intersections pa, pb, and pc of the deformation area E are obtained. The vectors Ma, Mb, and Mc are then interpolated according to the position of each pixel pP in the tracking area D to obtain the vector Mp, which gives the position of the corresponding pixel pp in the deformation area E. The pixel value of each pixel in the deformation area E is also obtained by interpolation. This type of processing is called warp deformation processing.
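The interpolation of the vertex vectors can be illustrated for a triangular tracking area. The specification does not state which interpolation is used; the sketch below assumes barycentric weights, and the function names and coordinates are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb

def warp_point(p, src, dst):
    """Map pixel pP inside src = (pA, pB, pC) to pp inside dst = (pa, pb, pc)."""
    weights = barycentric(p, *src)
    # Vertex displacement vectors Ma, Mb, Mc.
    moves = [(d[0] - s[0], d[1] - s[1]) for s, d in zip(src, dst)]
    # Interpolated displacement Mp for this pixel (assumed barycentric blend).
    mx = sum(w * m[0] for w, m in zip(weights, moves))
    my = sum(w * m[1] for w, m in zip(weights, moves))
    return (p[0] + mx, p[1] + my)

src = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]   # pA, pB, pC
dst = [(1.0, 1.0), (12.0, 1.0), (1.0, 13.0)]   # pa, pb, pc
pp = warp_point((5.0, 0.0), src, dst)
```

Each vertex maps exactly onto its counterpart, and interior pixels move by a weighted blend of the three vertex vectors.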
[0053]
When the area tracking means 17 generates the deformation area based on point elements, attention is paid to the shape of the boundary elements in the vicinity of the point elements pA, pB, and pC of the tracking region D. Suppose that an image as shown in FIG. 11 is obtained; the boundary elements within a predetermined range centered on the point elements pA, pB, and pC of the tracking region D are then as shown in FIGS. 12 (a) to 12 (c). Pattern matching is therefore performed in the frame following the one in which the tracking region D was set, using the shapes of FIG. 12 as templates, and parts with high similarity are extracted. The pattern matching need only be performed in the vicinity of the point elements in that frame, and the inclination need not be considered, so the processing amount is very small compared with pattern matching over the entire image. In the example of FIG. 11, point elements sa1, sa2, and sa3 are selected for the point element sA, the point element sb1 is selected for the point element sB, and the point element sc1 is selected for the point element sC. That is, as shown in FIG. 13, three combinations sa1-sb1-sc1, sa2-sb1-sc1, and sa3-sb1-sc1 are obtained as tracking candidates, so three deformation regions E1 to E3 are generated, as shown in FIG.
[0054]
Once the pixel value of each pixel in the deformation area E has been obtained as described above, the deformation area E is used to select, from the tracking candidates, the one that corresponds to the tracking area D.
[0055]
First, when the pixel values (density values) and their differential values in the tracking region D are within the specified range, the pixel values are substantially constant or change smoothly, so the pixel value of each pixel pp in the deformation region E is compared directly with the pixel value of the corresponding pixel of the tracking candidate. Because each deformation region E is shaped to match its tracking candidate, the difference can be obtained pixel by pixel between corresponding parts. The sum of the absolute values of the pixel value differences is computed and divided by the number of pixels in the tracking region D, and the result is used as the evaluation value. Evaluation values are obtained for all tracking candidates, and the candidate with the smallest evaluation value is selected as the optimal tracking candidate and adopted as the new tracking area. In the example shown in FIG. 8, the tracking candidate shown in FIG. 8B is selected as the new tracking area, and in the example shown in FIG. 12, the tracking candidate shown in FIG. 13A is selected. In other words, the combination that does not include other edges in the tracking candidate (pa1-pb1-pc1 for a line element, pa1-pb1-pc1 for a point element) gives the minimum evaluation value and becomes the new tracking area.
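The evaluation value described here is a mean absolute difference. A minimal sketch, assuming the pixels of the deformation region and of each tracking candidate have already been sampled into equal-length lists in corresponding order (the function names and pixel values are hypothetical):

```python
def evaluation_value(deformed_pixels, candidate_pixels):
    """Sum of absolute pixel differences divided by the number of pixels."""
    if not deformed_pixels or len(deformed_pixels) != len(candidate_pixels):
        raise ValueError("pixel lists must be non-empty and of equal length")
    total = sum(abs(a - b) for a, b in zip(deformed_pixels, candidate_pixels))
    return total / len(deformed_pixels)

def select_candidate(deformed_by_candidate, image_by_candidate):
    """Return the candidate name with the smallest evaluation value."""
    return min(
        deformed_by_candidate,
        key=lambda name: evaluation_value(deformed_by_candidate[name],
                                          image_by_candidate[name]),
    )

# Hypothetical pixel samples for two tracking candidates.
deformed = {"c1": [100, 102, 98, 101], "c2": [100, 102, 98, 101]}
observed = {"c1": [100, 101, 99, 100], "c2": [100, 150, 98, 60]}
best = select_candidate(deformed, observed)
```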
[0056]
On the other hand, when the tracking region D contains a portion where the pixel value or the differential value exceeds the specified range, the pixel values vary strongly at a local level. For example, in a tracking region D that includes a fine pattern (texture) such as the surface of a stone, the pixel value or the differential value exceeds the specified range, and local pixel value differences on such a surface are large, so the difference between pixel values cannot be used as the evaluation value. Instead, the difference between an image feature amount (described later) of the tracking region D and the image feature amount obtained from the deformation region E1 is used as the evaluation value, and the tracking candidate that minimizes the evaluation value is adopted as the new tracking region.
[0057]
As the image feature amount, luminance information may be used when the luminance of the tracking region D is substantially uniform, and color information may be used when there is little variation in color. That is, a tracking candidate is adopted as the new tracking area when the difference between its image feature amount and that of the tracking area D is within a specified value. When color information is used, the images must be color images. In that case, for example, in the chromaticity diagram shown in FIG. (where R, G, B, and W denote red, green, blue, and white), the angle θ that the color Q of the tracking region forms with the line RW may be used as the hue value.
[0058]
Also, as shown in FIG. 16, when a pattern regarded as periodic exists in the tracking region D, spatial frequency information is extracted by a Fourier transform or wavelet transform and used as the image feature amount. That is, as shown in FIG. 17, the spatial frequency distributions d1 and d2 of the deformation area E and the tracking candidate are extracted and compared, whereby a new tracking area can be selected from the tracking candidates.
[0059]
As shown in FIG. 18, when characters or figures are written on the surface of an object or a label is attached to it, a portion with high contrast may occur in the tracking region D. In such a case, the tracking candidate and the deformation area E are compared using the characters, figures, or label as a feature portion F. For example, if the deformation area E is set as shown in FIG. 18A and the tracking candidate E′ (including a feature portion F′) to be compared is set as shown in FIG. 19, the distances L1, L2, and L3 between the center of gravity G of the feature portion F and the edge intersections pa, pb, and pc, the area of the feature portion F, and the average luminance and average hue of the feature portion F are used as image feature amounts, which makes it possible to select a new tracking area from the tracking candidates.
[0060]
As described above, the evaluation method used to select a new tracking area from the tracking candidates differs among three cases: the case where the pixel values and differential values in the tracking area D are within the specified range, the case where the pixel values or differential values in the tracking area D are outside the specified range, and the case where the tracking area D contains written characters or figures or a label. An evaluation method must therefore be selected according to the condition. This selection is automated by creating luminance and hue histograms and choosing the evaluation method according to the histogram pattern.
[0061]
That is, histograms are generated for luminance and hue, and when no peak occurs in the histograms as shown in FIG. 20, the image of the deformation region E is first-order differentiated and the sum of the differential values is divided by the number of pixels. In other words, the average of the differential values of the deformation area E is obtained as a texture feature amount. A threshold is set for the texture feature amount. When the texture feature amount is equal to or less than the threshold, the differential values in the tracking region D are within the specified range, so it is determined that there is no fine pattern, and the sum of pixel value differences is adopted as the evaluation method. When the threshold is exceeded, the differential values exceed the specified range, so it is determined that there is a fine pattern, and the method of comparing spatial frequency distributions is adopted.
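A texture feature amount of this kind can be sketched as below. The specification only says that the image is first-order differentiated; the use of forward differences in both directions and of absolute values is an assumption made here so that positive and negative gradients do not cancel, and the function name is hypothetical.

```python
def texture_feature(image):
    """Average absolute first-order difference over a 2-D list of pixel values."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            # Forward differences; treated as zero at the right and bottom borders.
            dx = image[y][x + 1] - image[y][x] if x + 1 < cols else 0
            dy = image[y + 1][x] - image[y][x] if y + 1 < rows else 0
            total += abs(dx) + abs(dy)
    return total / (rows * cols)

flat = [[5, 5], [5, 5]]          # smooth region
checker = [[0, 10], [10, 0]]     # fine pattern
smooth_value = texture_feature(flat)
pattern_value = texture_feature(checker)
```

Comparing the returned value with a threshold then decides between the pixel-difference method and the spatial-frequency method.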
[0062]
On the other hand, as shown in FIG. 21, when a single peak occurs in at least one of luminance and hue, the average luminance and average hue are used. Also, as shown in FIG. 22, when a plurality of peaks occur in both luminance and hue, it is highly likely that characters or figures are written or a label is attached, so information on the feature portion F is used.
[0063]
As described above, by using the luminance and hue distribution information, it is possible to automatically set which processing is performed. FIG. 23 shows the overall flow. That is, after setting the initial tracking area D (S1), corresponding candidates are extracted from the boundary elements in the next frame (S2). Next, a tracking candidate is created by a combination of boundary elements (S3), and a histogram of luminance and hue is created for the pixels in the tracking area D (S4). Here, the number of peaks occurring in the histogram is obtained (S5), and if there is no peak in both histograms (S6), the average value of the differential values is compared with a threshold value (S7), and if it is less than the threshold value, tracking is performed by comparing only pixel values. A new tracking area is selected from the candidates (S8). When the average value of the differential values is larger than the threshold value, the tracking region is selected according to the spatial frequency distribution (S9).
[0064]
On the other hand, if the luminance histogram has one peak (S10), the tracking area is selected using the average luminance (S11), and if the luminance histogram does not have one peak but the hue histogram does (S12), the tracking area is selected using the average hue (S13). If the luminance and hue histograms have peaks but there are two or more of them, written characters or figures or an attached label are considered to be present, so the tracking area is selected using this kind of feature (S14). The tracking candidate selected as described above is set as the new tracking area (S15), and the process proceeds to the next frame (S16).
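The branching in steps S5 to S14 can be summarized as a small decision function. This is a sketch only: peak counting is assumed to have been done elsewhere, and the function name and the returned labels are hypothetical.

```python
def choose_evaluation_method(luminance_peaks, hue_peaks, texture_feature, threshold):
    """Pick an evaluation method from histogram peak counts (steps S5 to S14)."""
    if luminance_peaks == 0 and hue_peaks == 0:          # S6
        if texture_feature <= threshold:                 # S7
            return "pixel_difference"                    # S8
        return "spatial_frequency"                       # S9
    if luminance_peaks == 1:                             # S10
        return "average_luminance"                       # S11
    if hue_peaks == 1:                                   # S12
        return "average_hue"                             # S13
    return "feature_portion"                             # S14

method = choose_evaluation_method(luminance_peaks=3, hue_peaks=2,
                                  texture_feature=4.0, threshold=5.0)
```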
[0065]
Because the time-series image group is generated by moving the camera 2, a part of the initial tracking area D may, depending on the positional relationship between the object 1 of interest and the camera 2, be concealed by an object 3 (or other objects) existing between the object 1 and the camera 2, as shown in FIG. 24 (the concealed part is indicated by a hatched portion). In addition, as shown in FIG. 25, the tracking region D of the object 1 may be concealed by coming to lie on the side of the object 1 opposite the camera 2 (the concealed part is indicated by a hatched portion). Thus the shape of the specific surface selected as the tracking region D may change, or the surface may be hidden so that it can no longer be tracked. A hidden surface that becomes newly exposed poses no problem, because it is not designated as the tracking region D.
[0066]
As described above, when the shape (number of vertices or sides) of the tracking region D changes, or when the tracking region D is completely hidden, the shapes (numbers of point elements and line elements) of the deformation region and the tracking candidate no longer match, so the degree of matching between them decreases. In such a case, boundary elements of the tracking area are added or removed according to the shape of the deformed tracking area, and tracking is continued. For example, as shown in FIG. 24, when another object 3 exists between the camera 2 and the object 1 for which the tracking area is set, the tracking area D1 is a square as shown in FIG., whereas once the part bounded by the straight line indicated by the two-dot chain line in the figure and including one vertex is concealed by the object 3, it is deformed into a pentagonal tracking area D2 as shown in FIG. In this case, boundary elements (point elements and line elements) are added so that the tracking area D2 becomes a pentagon, and the subsequent tracking is performed. Such modification of the tracking area D2 is performed automatically as far as possible; if it cannot be performed automatically, a message requesting modification of the tracking area D2 is displayed on the display means 15. When the message is displayed on the display means 15, the area specifying means 16 is operated to deform the tracking area D2.
[0067]
In the tracking process described above, the positions of the tracking area are stored sequentially in the storage device 13, together with the image features, such as average luminance, used when selecting a new tracking area from the tracking candidates. This information is used by the tracking evaluation means 18 and the shape restoration means 19, and also serves to estimate the cause of an abnormality when a tracking area in which a tracking abnormality has occurred is re-tracked.
[0068]
For example, when a house is imaged outdoors in the daytime in fine weather, a strong single light source illuminates the object 1, and a strong shadow OM is produced as shown in FIG. Since such a shadow OM has edges, a deformation area may be generated erroneously, hindering tracking and causing a tracking abnormality. Therefore, the position of the single light source LM forming the shadow OM is detected from the image features stored in the storage device 13, and the occurrence of the shadow OM is predicted so as to remove its influence.
[0069]
This will be described more specifically. As shown in FIG. 28, suppose that the tracking area D illuminated by a single light source (including natural light) LM is imaged by the camera 2 from the positions represented by the camera posture vectors v1 to vn. The illumination light vector r1, representing the direction of the light from the single light source LM, and the reflected light vector r2, representing the direction of the regular reflection in the tracking region D, form equal angles with the normal direction U of the surface on which the tracking region D is set, and the apparent luminance of the tracking region D increases as the camera posture vector among v1 to vn comes closer to the reflected light vector r2.
[0070]
Therefore, the average luminance of the tracking area D in each frame of the tracking process is stored in the storage device 13, and for each tracking area D the shape restoring means 19 obtains the camera posture vector, among v1 to vn, of the frame in which the average luminance is maximum, and regards it as an approximation of the reflected light vector r2. Once the reflected light vector r2 is obtained, the illumination light vector r1 can be estimated from its relationship to the normal direction U of the surface on which each tracking region D is set. The variation among the estimates of the illumination light vector r1 obtained for all the tracking regions D is then evaluated, and when the variation is small, it can be determined that a single light source exists. In that case, the average of the estimates obtained for the tracking regions D is used as the illumination light vector r1.
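The estimation can be sketched as a mirror reflection about the surface normal. The sign convention is an assumption made for this illustration: all vectors are treated as directions pointing away from the surface, so the estimated r1 points from the surface toward the light source. The function names, the use of the maximum angular deviation as the measure of variation, and the sample values are hypothetical.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def reflect(v, n):
    """Mirror unit vector v about unit normal n."""
    d = sum(a * b for a, b in zip(v, n))
    return tuple(2.0 * d * b - a for a, b in zip(v, n))

def estimate_illumination(camera_vectors, mean_luminances, normal):
    """Take the camera vector of the brightest frame as r2 and mirror it to get r1."""
    brightest = max(range(len(mean_luminances)), key=lambda i: mean_luminances[i])
    r2 = normalize(camera_vectors[brightest])
    return reflect(r2, normalize(normal))

def max_deviation_deg(estimates):
    """Largest angle between any estimate and the mean direction."""
    mean = normalize(tuple(sum(c) for c in zip(*estimates)))
    angles = []
    for e in estimates:
        cos_angle = max(-1.0, min(1.0, sum(a * b for a, b in zip(e, mean))))
        angles.append(math.degrees(math.acos(cos_angle)))
    return max(angles)

# Two hypothetical tracking regions lit by the same source.
r1_top = estimate_illumination([(1, 0, 1), (0, 0, 1), (-1, 0, 1)], [50, 80, 200], (0, 0, 1))
r1_side = estimate_illumination([(1, 0, 0), (1, 0, -1), (1, 0, 1)], [90, 180, 60], (1, 0, 0))
spread = max_deviation_deg([r1_top, r1_side])
```

A small spread across regions is consistent with a single light source; the threshold on the spread would have to be chosen for the application.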
[0071]
As described above, while the camera 2 images the tracking area D during tracking, it may become impossible to image part or all of the tracking area D. The following three cases are considered as causes.
[0072]
That is, there are three cases: the case where another object 3 exists between the camera 2 and the tracking area D set on the object 1 so that the tracking area D is concealed, as shown in FIG. 29 (hereinafter referred to as concealment by another object); the case where the tracking area D set on the object 1 is located on the back side of the object 1 as seen from the camera 2, as shown in FIG. (hereinafter referred to as self-concealment); and the case where a part of the tracking area D goes out of the field of view VF of the camera 2, as shown in FIG. 31 (hereinafter referred to as frame-out).
[0073]
An area that cannot be imaged by the camera 2 in this way (hereinafter referred to as an unimageable area) is extracted by the following procedure. First, a tracking area in which no unimageable area occurs during tracking is set as a reference area. Such a reference area is set using a tracking area whose number of boundary elements does not change across a plurality of images in the time-series image group. Either line elements or point elements may be used as the boundary elements. For example, in the images V11 to V13 of the time-series image group shown in FIG. 32A, the tracking areas D11 to D13 each have five point elements and five line elements, and the numbers of point elements and line elements do not change. In the images V21 and V22 shown in FIG. 32B, the tracking areas D21 and D22 each have four point elements and four line elements, and here too the numbers of point elements and line elements do not change.
[0074]
When setting the reference area, a tracking area whose number of boundary elements does not change across a plurality of images is selected as described above, correspondences between point elements are established, and a method such as the stereo method or the factorization method is applied to map the tracking area into three-dimensional space. For example, suppose that in the two images shown in FIGS. 33A and 33B the coordinates of the point elements of the tracking areas D1 and D2 are set as shown in FIGS. When each point element is mapped as shown in FIG. 33 (c) using the factorization method, the relationship between the coordinates of the point elements of the tracking regions D1 and D2 in each image and the coordinates of the points in three-dimensional space can be expressed in the form of Equation 1.
[0075]
[Expression 1]

In Equation 1, the left matrix on the right side represents the orientation of the camera 2: (CX1 CY1 CZ1) in the first row and (CX3 CY3 CZ3) in the third row are the vectors in the x-axis and y-axis directions of the image plane of the camera 2 when one of the two images was obtained, and (CX2 CY2 CZ2) in the second row and (CX4 CY4 CZ4) in the fourth row are the corresponding vectors when the other image was obtained. The central matrix on the right side is a diagonal matrix obtained by the factorization method, and the contribution rate K is obtained from its diagonal components a, b, c, and d by the following equation.
K = (a + b + c) / (a + b + c + d)
If no unimageable area has occurred, the point elements are considered to correspond correctly between the tracking areas set in the time-series image group. An appropriate threshold is therefore set for the contribution rate K. When the contribution rate K obtained as described above exceeds the threshold, it is determined that no unimageable area has occurred, and the area mapped into three-dimensional space is taken as a reference area candidate. Because a reference area candidate is obtained for each tracking area whose number of point elements among the boundary elements does not change, a plurality of reference area candidates corresponding to one surface are generated in three-dimensional space. Among these candidates, the one with the largest area is adopted as the reference area for that surface in three-dimensional space.
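The contribution rate can be illustrated with a singular value decomposition of the measurement matrix. This is a sketch under assumptions, not the computation of Equation 1 itself: the diagonal components a, b, c, d are taken to be the singular values in descending order, the rows are ordered as x of image 1, x of image 2, y of image 1, y of image 2, and the image coordinates are centered on their centroid as in the usual factorization method. The function name and the sample geometry are hypothetical, and NumPy is used for the decomposition.

```python
import numpy as np

def contribution_rate(measurements):
    """K = (a + b + c) / (a + b + c + d) from the singular values of a 4 x N matrix."""
    W = np.asarray(measurements, dtype=float)
    W = W - W.mean(axis=1, keepdims=True)        # assumed centroid registration
    s = np.linalg.svd(W, compute_uv=False)       # singular values, descending
    return s[:3].sum() / s.sum()

# Five hypothetical 3-D point elements (columns).
S = np.array([[0.0, 1.0, 1.0, 0.0, 0.5],
              [0.0, 0.0, 1.0, 1.0, 0.5],
              [0.0, 0.0, 0.2, 0.0, 1.0]])

# Orthographic cameras: image 2 is rotated 30 degrees about the y axis.
c, s_ = np.cos(np.radians(30)), np.sin(np.radians(30))
M = np.array([[1.0, 0.0, 0.0],    # x axis, image 1
              [c,   0.0, s_],     # x axis, image 2
              [0.0, 1.0, 0.0],    # y axis, image 1
              [0.0, 1.0, 0.0]])   # y axis, image 2

W_good = M @ S
K_good = contribution_rate(W_good)

W_bad = W_good.copy()
W_bad[3, 4] += 0.5                # one point element mismatched in image 2
K_bad = contribution_rate(W_bad)
```

With consistent correspondences the fourth singular value vanishes and K is close to 1; a mismatched point element lowers K. The example uses five points because, after centering, four points always yield a matrix of rank three or less, which would make K equal to 1 regardless of the correspondences.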
[0076]
Once the reference area is determined, the unimageable area can be extracted. That is, if point elements fail to correspond while the tracking area for which the reference area was determined is tracked through the time-series image group, it can be determined that an unimageable area has occurred. For example, suppose that the reference region DS shown in FIG. 34 is set, with point elements pS1 to pS4 as boundary elements, and that the tracking region D, when mapped to three-dimensional space by the stereo method or the factorization method, includes five point elements p1 to p5 as shown in FIG. The point elements p1 to p3 of the tracking area D correspond to the point elements pS1 to pS3 of the reference area DS, but the point elements p4 and p5 of the tracking area D have no corresponding point elements in the reference area DS. The area surrounded by the point elements pS4, p4, and p5, which have no counterparts between the reference area DS and the tracking area D, can therefore be regarded as the unimageable area.
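Finding the point elements that have no counterpart can be sketched as a nearest-match search in three-dimensional space. The matching rule (Euclidean distance within a tolerance), the function name, and the coordinates are assumptions made for this illustration.

```python
import math

def unmatched_point_elements(reference, tracked, tol=1e-2):
    """Return (reference names, tracked names) that have no counterpart."""
    remaining = dict(reference)
    tracked_only = []
    for name, p in tracked.items():
        match = next((r for r, q in remaining.items() if math.dist(p, q) <= tol), None)
        if match is None:
            tracked_only.append(name)
        else:
            del remaining[match]      # each reference point is matched at most once
    return sorted(remaining), tracked_only

# Reference region DS: a unit square. Tracking region D: the same square with one corner cut off.
reference = {"pS1": (0, 0, 0), "pS2": (1, 0, 0), "pS3": (1, 1, 0), "pS4": (0, 1, 0)}
tracked = {"p1": (0, 0, 0), "p2": (1, 0, 0), "p3": (1, 1, 0),
           "p4": (0.5, 1, 0), "p5": (0, 0.5, 0)}
missing_in_tracked, extra_in_tracked = unmatched_point_elements(reference, tracked)
```

Here the triangle formed by pS4, p4, and p5 is the region that would be treated as the unimageable area.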
[0077]
The above description uses point elements. When line elements are used, as shown in FIG., the line elements s2 and s3 of the tracking area D correspond to the line elements sS2 and sS3 of the reference area DS, and the line elements s1 and s4 of the tracking area D correspond to the line elements sS1 and sS4 of the reference area DS, but the line element s5 of the tracking area D has no corresponding line element in the reference area DS. It can therefore be determined that s5 is a part of the unimageable area.
[0078]
After the unimageable area has been extracted as described above, it is determined whether it is caused by concealment by another object, self-concealment, or frame-out. For this determination, with all the reference areas obtained as described above mapped in three-dimensional space, the reference areas are projected onto the image plane of each image in the time-series image group. Since the position in three-dimensional space of the image plane in which the unimageable area occurred is defined by the position of the camera 2, projecting the reference areas onto this image plane reveals whether the surfaces on which the reference areas are set overlap one another and whether they extend outside the image plane. In addition, the distance relationship between the camera 2 (that is, the image plane) and the surfaces on which reference areas are set reveals on which of those surfaces the unimageable area occurs. Hereinafter, a surface on which a reference area is set is referred to as a reference plane.
[0079]
When another reference plane lies between the camera 2 and the reference plane in which the unimageable area occurs in three-dimensional space, and no reference plane is continuous with the reference plane in which the unimageable area occurs, the cause is determined to be concealment by another object. For example, in FIG. 36, an unimageable area (shaded portion) occurs on the reference plane SR1, but the reference plane SR2, which is adjacent to SR1 in the image, is not connected to it in three-dimensional space, so the cause is determined to be concealment by another object.
[0080]
On the other hand, when another reference plane lies between the camera 2 and the reference plane in which the unimageable area occurs in three-dimensional space, and a reference plane continuous with the reference plane in which the unimageable area occurs exists, the cause is determined to be self-concealment. For example, in FIG. 37, an unimageable area (shaded portion) occurs on the reference plane SR3, and the reference planes SR4 and SR5, which are adjacent to SR3 in the image, are also connected to it in three-dimensional space. In this case, the unimageable area on the reference plane SR3 is determined to be due to self-concealment.
[0081]
Furthermore, as shown in FIG. 38, when a part of the reference plane SR6 in which the unimageable area occurs straddles the periphery of the image plane, the cause is determined to be frame-out (the reference plane SR6 extends off the screen). (The shaded area indicates the part that appears in
[0082]
As described above, whether the unimageable area is caused by concealment by another object, self-concealment, or frame-out can be determined from the distance between each reference plane and the image plane and from the connection relationship between the reference plane in which the unimageable area occurs and the other reference planes.
[0083]
After the cause of the unimageable area has been determined as described above, the determination result is verified. That is, on the reference plane SR in which the unimageable area occurs, the edges E1 and E2 are extended in the direction in which the unimageable area exists, as shown in FIG. 39, and tracking is performed with the intersection of the extended edges E1 and E2 regarded as a new point element of the reference plane SR. If the contribution rate described above improves, the unimageable area is finally determined to be caused by concealment by another object or by self-concealment. Because an unimageable area caused by concealment by another object may become imageable when the camera 2 moves, tracking is continued in that case, whereas an unimageable area caused by self-concealment is judged not to become imageable again through movement of the camera 2, and tracking is terminated.
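Extending two edges and taking their intersection as a new point element is a plain line-line intersection in the image plane. A minimal sketch, with each edge given by two of its points (the function name and coordinates are hypothetical):

```python
def extended_intersection(edge1, edge2, eps=1e-12):
    """Intersection of the infinite lines through edge1 and edge2, or None if parallel."""
    (x1, y1), (x2, y2) = edge1
    (x3, y3), (x4, y4) = edge2
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(den) < eps:
        return None
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / den
    py = (a * (y3 - y4) - (y1 - y2) * b) / den
    return (px, py)

# Visible parts of two edges whose common corner is hidden.
E1 = ((0.0, 0.0), (2.0, 0.0))
E2 = ((4.0, 3.0), (4.0, 1.0))
new_point_element = extended_intersection(E1, E2)
```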
[0084]
FIG. 40 summarizes the processing procedure for the unimageable area. First, to set a reference area, tracking areas whose number of boundary elements does not change across a plurality of images are extracted (S1). The tracking areas forming one surface of the object 1 are extracted from the plurality of images, and the corresponding tracking areas are subjected to coordinate transformation to map them into three-dimensional space (S2). The contribution rate of the mapping is then obtained (S3). After all extractable surfaces have been mapped into three-dimensional space and their contribution rates calculated (S4), the surfaces whose contribution rate is equal to or greater than the threshold are extracted as reference area candidates (S5), and the candidate with the largest area is extracted as the reference area (S6). Once the reference area is determined, the unimageable area is extracted (S7), and the positional relationship in three-dimensional space among the reference area of interest, the other reference areas in the same image, and the camera is obtained (S8). If another reference area is separated from the reference area in which the unimageable area occurs (S9), the cause is determined to be concealment by another object (S10); if another reference area is connected to it (S11), the cause is determined to be self-concealment (S12). If neither concealment by another object nor self-concealment has occurred, the position within the frame of the reference area in which the unimageable area occurs is obtained (S13), and when the reference area lies near the periphery of the image (S14), the cause is determined to be frame-out (S15).
After the above processing has been performed for all frames (S16), the edges corresponding to the unimageable area are extended and tracking is re-executed to determine the presence or absence of concealment (S17).
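The cause determination in steps S9 to S15 can be condensed into a small function. The three boolean inputs are assumed to have been computed from the three-dimensional mapping and the projection onto the image plane; the function name, parameter names, and returned labels are hypothetical.

```python
def classify_unimageable_cause(plane_in_front, front_plane_connected, near_image_border):
    """Classify the cause of an unimageable area (steps S9 to S15)."""
    if plane_in_front:
        if front_plane_connected:
            return "self_concealment"              # S11, S12
        return "concealment_by_another_object"     # S9, S10
    if near_image_border:
        return "frame_out"                         # S14, S15
    return "undetermined"

cause = classify_unimageable_cause(plane_in_front=True,
                                   front_plane_connected=False,
                                   near_image_border=False)
```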
[0085]
(Second Embodiment)
The first embodiment described a method for detecting the presence or absence of concealment. This embodiment describes a method in which a series of images in which the surfaces included in the image do not change is treated as a phase image group, the tracking area is mapped into three-dimensional space for each phase image group, and the three-dimensional shapes obtained by the mapping are superimposed by coordinate transformation to obtain three-dimensional data for the entire object 1. In this embodiment, the processing for representing the object 1 by boundary elements and tracking it can be shared with the first embodiment. This embodiment can be carried out on its own if the object 1 is not subject to concealment by another object; alternatively, its processing can be performed after the concealment detection of the first embodiment, using the results of the tracking performed in the first embodiment.
[0086]
In the following, for ease of explanation, the object 1 is assumed to have a rectangular parallelepiped shape as shown in FIG. In FIG. 41, the arrow A represents the movement of the camera 2. That is, the processing of this embodiment is described on the assumption that no concealment by another object occurs. Each surface of the object 1 is defined as shown in FIG. That is, the upper surface in FIG. 42 is f1, the lower surface is f2, the front left and front right surfaces in FIG. 42 are f3 and f4, respectively, and the back right and back left surfaces in FIG. 42 are f5 and f6, respectively. When such an object 1 is viewed from above, the relationship among the surfaces f1 and f3 to f6 is as shown in FIG. The camera 2 moves relative to the object 1 around an axis that passes through the center of the upper surface f1 of the object 1 and is orthogonal to the upper surface f1 (that is, an axis that passes through the broken line in FIG. 43 and is orthogonal to the plane of the drawing), and images the object 1 obliquely from above at an angle of about 45 degrees to the axis. Each image (frame) obtained by imaging therefore always includes the upper surface f1. Further, when the camera 2 is located in a plane that includes the above axis and is orthogonal to one of the surfaces f3 to f6 (when the camera 2 is at one of the positions e3 to e6 in FIG. 43), the image (frame) includes one of the surfaces f3 to f6 in addition to the upper surface f1. When the camera 2 is located elsewhere (ranges d34, d45, d56, and d63 in FIG. 43), each image includes two adjacent surfaces in addition to the upper surface f1.
[0087]
That is, when the position of the camera 2 is changed counterclockwise from the position e3 shown in FIG. 43, the content of each image changes as shown in FIG. 44 (each box in FIG. 44 represents an image). Here, the imaging conditions are set so that the image groups k34, k45, k56, and k63, each containing three surfaces (the upper surface f1 and two others), are obtained with single images h3 to h6, each containing only the upper surface f1 and one other surface, interposed between them. That is, the imaging conditions are set so that a plurality of consecutive images is obtained for each of the three-surface image groups k34, k45, k56, and k63. A plurality of consecutive images containing the same surfaces is hereinafter referred to as a phase image group. The imaging conditions are also set so that each of the two-surface images h3 to h6 lying between the phase image groups k34, k45, k56, and k63 is obtained as a single image. The images obtained under such imaging conditions can be divided into the phase image groups k34, k45, k56, and k63 and the images h3 to h6. In short, when the number of surfaces included in the images changes, the sequence is divided before and after the change, and a run of consecutive images containing the same surfaces is grouped into a phase image group. Within each of the phase image groups k34, k45, k56, and k63 obtained in this way, the included surfaces do not change.
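The division into phase image groups can be sketched as grouping consecutive frames by the set of surfaces they contain. The per-frame surface sets are assumed to come from the tracking step; the function name and the sample sequence are hypothetical.

```python
from itertools import groupby

def split_into_runs(faces_per_frame):
    """Group consecutive frame indices that show the same set of surfaces."""
    runs, start = [], 0
    for faces, frames in groupby(faces_per_frame):
        count = sum(1 for _ in frames)
        runs.append((faces, list(range(start, start + count))))
        start += count
    return runs

# Hypothetical sequence: h3, three frames of k34, h4, two frames of k45.
frames = (
    [frozenset({"f1", "f3"})]
    + [frozenset({"f1", "f3", "f4"})] * 3
    + [frozenset({"f1", "f4"})]
    + [frozenset({"f1", "f4", "f5"})] * 2
)
runs = split_into_runs(frames)

# Runs with more than one frame play the role of phase image groups.
phase_groups = [run for run in runs if len(run[1]) > 1]
```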
[0088]
In the present embodiment, the area designating unit 16 shown in FIG. 1 sets tracking areas on the image of the object 1 acquired by the TV camera 2, using a boundary representation (B-REP) consisting of line elements and point elements. The areas selected from the object 1 are then tracked by the area tracking means 17. This tracking processing is the same as in the first embodiment: a tracking area is set for each surface of the object 1, a deformation area is generated by estimating the shape of the tracking area in another image from the tracking area in one image, and the position of the tracking area is tracked by comparing the other image with the deformation area. Tracking the surfaces of the object 1 in this way yields information on which surfaces are contained in each image, so the phase image groups k34, k45, k56, and k63 can be set as described above.
[0089]
When the phase image groups k34, k45, k56, and k63 have been set as described above, the tracking areas are mapped to a three-dimensional space for each phase image group. To set the phase image groups, the original moving image stored in the image file F1 is cut into the phase image groups k34, k45, k56, and k63 by means not shown in the drawings. The original moving image is divided into the phase image groups either by an image processing means that automatically tracks surfaces by their contour lines in the two-dimensional images, or manually by an operator viewing a plurality of images displayed on a screen. Within each of the phase image groups k34, k45, k56, and k63, the surfaces contained do not change. Therefore, in the images constituting each phase image group, no point element disappears because of self-concealment, and within the range of each phase image group the tracking areas can be mapped to a three-dimensional space by applying a conventionally known method such as the factorization method or the stereo method.
[0090]
Now, suppose that the tracking areas have been mapped to three-dimensional space as shown in FIGS. 45A to 45D by using the phase image groups k34, k45, k56, and k63 shown in FIG. 44. The world coordinate systems of the three-dimensional shapes obtained from the respective phase image groups differ from one another. In FIGS. 45A to 45D, the world coordinate system of each three-dimensional shape is indicated by X1-Y1-Z1, X2-Y2-Z2, X3-Y3-Z3, and X4-Y4-Z4, respectively. When the world coordinate system differs for each three-dimensional shape obtained from the phase image groups in this way, the three-dimensional data of the object 1 cannot be expressed in one common coordinate system, so the three-dimensional shapes, each with its own world coordinate system, are superimposed by the following procedure.
[0091]
When a plurality of three-dimensional shapes having different world coordinate systems are overlaid, the procedure differs depending on whether there is a common line element between the plurality of three-dimensional shapes to be overlaid.
[0092]
When common line elements exist, coordinate conversion is applied to the target three-dimensional shapes so that the common line elements coincide. For example, in FIG. 45, the line elements s13, s14, s15, and s16 surrounding the surface f1 are common to FIGS. 45A to 45D, and the line element s45, the boundary line between the surfaces f4 and f5, is common to FIGS. 45A to 45C. Other line elements are also common to a plurality of three-dimensional shapes, but it is desirable to select line elements that are common to as many three-dimensional shapes as possible. In this embodiment, line elements that run in different directions and intersect at one point are selected: the line element s14, the boundary line between the surfaces f1 and f4; the line element s15, the boundary line between the surfaces f1 and f5; and the line element s45, the boundary line between the surfaces f4 and f5. By performing coordinate conversion so that these three line elements s14, s15, and s45 coincide, the world coordinate systems of the four three-dimensional shapes are unified, and the three-dimensional data can be represented in one world coordinate system (X-Y-Z).
[0093]
On the other hand, when there is no common line element, the tracking areas that exist in common in each three-dimensional shape are overlapped. That is, in FIG. 45, since the surface f1 is common to all three-dimensional shapes, the surface f1 is overlapped. This is equivalent to performing coordinate transformation so that the line elements s13, s14, s15, and s16 are overlapped.
[0094]
When the world coordinate systems of the three-dimensional shapes obtained for the phase image groups k34, k45, k56, and k63 are integrated into one world coordinate system as described above, any error in the three-dimensional shapes obtained from the phase image groups produces a discrepancy between the two world coordinate systems being superimposed, which may cause inconvenience. Therefore, a transformation matrix for the coordinate conversion is obtained for each boundary element (line element or point element) common to the two world coordinate systems, and the coordinate conversion is performed with a transformation matrix whose elements are the averages of the corresponding elements (that is, the transformation parameters relating to rotation and translation) of the matrices so obtained. Using the average values in this way allows the transformation matrix to be determined uniquely.
[0095]
Consider, with reference to FIGS. 47A and 47B, the point element p134, which is the intersection of the three surfaces f1, f3, and f4 in the two three-dimensional shapes of FIGS. 45A and 45B. If the point element p134 in FIG. 47B is subjected to a coordinate conversion consisting of the rotation r and the translation m shown in FIG. 47A, it can be moved to the position of the point element p134 in FIG. 47A, so the world coordinate systems of FIGS. 45A and 45B can be considered to be matched by a coordinate conversion consisting of the rotation r and the translation m. Since such a coordinate conversion is generally expressed in matrix form, it can be specified by the elements of a matrix. Accordingly, the elements of the coordinate conversion are obtained not only for the point element p134 but also for other point elements (the more the better), the average is taken for each element, and the coordinate conversion is performed with the transformation matrix whose elements are these averages.
[0096]
By using a transformation matrix whose elements are the averages, element by element, of the transformation matrices obtained for a plurality of boundary elements as described above, a single three-dimensional shape in which the two three-dimensional shapes are superimposed can be obtained, as shown in FIG. 48. In the illustrated example, the coordinate conversion is performed so that the three-dimensional shape of FIG. 47B overlaps the three-dimensional shape of FIG. 47A; in the converted three-dimensional shape, the surface f5 is on the back side and is not visible. In FIG. 48, a reference symbol with a prime (') denotes the three-dimensional shape of FIG. 47B after the coordinate conversion. When averages are used for the elements of the transformation matrix as described above, the point element p134' obtained by converting the point element at the position in FIG. 47B does not coincide exactly with the point element p134 at the position in FIG. 47A. Therefore, the midpoint between the point element p134 and the point element p134' is finally adopted as the position of the point element.
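The averaging and midpoint steps can be sketched as follows. This is only an illustration under stated assumptions: each per-boundary-element estimate is assumed to be available as a 4x4 homogeneous matrix, the helper names (`rigid_transform`, `average_transforms`, `apply_transform`, `midpoint`) are hypothetical, and the numeric values are invented sample data, not values from the specification.

```python
import math


def rigid_transform(angle_deg, translation):
    """4x4 homogeneous matrix: rotation about the Z axis, then translation."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def average_transforms(matrices):
    """Element-wise mean of the 4x4 matrices, as described in the text."""
    n = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / n for j in range(4)]
        for i in range(4)
    ]


def apply_transform(matrix, point):
    """Apply a 4x4 homogeneous matrix to a 3-D point."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[i][j] * vec[j] for j in range(4)) for i in range(3))


def midpoint(p, q):
    """Midpoint of two corresponding point elements."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


# Hypothetical estimates obtained from three different boundary elements.
estimates = [
    rigid_transform(89.0, (1.0, 0.0, 0.0)),
    rigid_transform(91.0, (1.1, 0.1, 0.0)),
    rigid_transform(90.0, (0.9, -0.1, 0.0)),
]
mean_matrix = average_transforms(estimates)

p134_a = (1.0, 1.0, 0.0)    # point element in the first shape (sample value)
p134_b = (1.0, 0.02, 0.0)   # same point element in the second shape (sample value)
p134_b_moved = apply_transform(mean_matrix, p134_b)   # corresponds to p134'
p134_final = midpoint(p134_a, p134_b_moved)
print(p134_b_moved, p134_final)
```

Note that an element-wise average of rotation matrices is in general not exactly a rotation matrix; when the individual estimates are close to one another, as assumed in this sample, the deviation is small.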
[0097]
As described above, a plurality of three-dimensional shapes can be integrated into one three-dimensional data. In the present embodiment, since the surface f2 is not captured by the camera 2, the surface f2 cannot be accurately defined, but the line elements and point elements that form the periphery of the surface f2 are captured by the camera 2. Therefore, it is regarded as a plane expressed by these line elements and point elements.
[0098]
In the above example, the change in the number of surfaces is used to set the phase image groups k34, k45, k56, and k63 from the moving image (time-series images). Alternatively, the phase image groups may be divided according to whether a surface undergoes self-concealment. That is, the boundary elements are expressed not merely as point elements and line elements but as directed line elements, by defining a direction for each line element, and the change in line-element direction that accompanies self-concealment of a surface is used to determine that the surface serving as a tracking area has become invisible. A phase image group is delimited when a surface serving as a tracking area changes from the visible state to the invisible state. In other words, since the line elements connect the point elements in sequence, following the line elements in order makes a circuit of the point elements; by monitoring the direction of this circuit, the surface can be determined to have become invisible when the direction of the circuit is reversed in the image.
[0099]
Specifically, if the boundary elements of each surface are defined so that following the directions of the line elements makes a clockwise circuit for a visible surface, as indicated by the arrows in FIG. 49, then the circuit for an invisible surface is counterclockwise. For example, in FIG. 49 the surfaces f1, f3, and f4 are visible and the surfaces f2, f5, and f6 are invisible. Because the camera 2 images the object 1 in the positional relationship described above, the surface f1 is always visible and the surface f2 is always invisible, so the surfaces f1 and f2 are excluded and the visible/invisible determination is applied to the surfaces f3 to f6. In this case, if the line elements of the visible surfaces f3 and f4 are followed clockwise, the line elements of the invisible surfaces f5 and f6 are followed counterclockwise. The same applies when the surfaces f3 and f4 become invisible. In short, if directions are assigned to the line elements so that following them makes a circuit of the boundary elements of a surface, the circuit is clockwise or counterclockwise depending on whether the surface is visible or invisible. It can therefore be determined that a surface serving as a tracking area has become invisible through self-concealment, and the phase image groups can be delimited at the point in the moving image (time-series images) where a surface set as a tracking area changes from visible to invisible.
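One way to monitor the direction of the circuit is the sign of the polygon area computed with the shoelace formula; the specification does not name a particular computation, so this choice is an assumption. The sketch also assumes image coordinates with the y axis pointing down, so that a positive shoelace sum corresponds to a clockwise circuit on the screen. The function names and sample polygons are hypothetical.

```python
def signed_area(points):
    """Shoelace formula over the point elements in line-element order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_visible(points):
    """Visible surfaces are defined to be traversed clockwise on screen.

    With image coordinates (y axis pointing down), a clockwise circuit on
    screen gives a positive shoelace sum.
    """
    return signed_area(points) > 0.0


def phase_boundaries(track):
    """Indices of frames where the visibility of the tracked surface flips."""
    states = [is_visible(polygon) for polygon in track]
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


# Hypothetical track of one surface: it narrows, then its circuit reverses.
track = [
    [(0, 0), (4, 0), (4, 3), (0, 3)],   # clockwise on screen: visible
    [(1, 0), (3, 0), (3, 3), (1, 3)],   # narrower, still clockwise
    [(3, 0), (1, 0), (1, 3), (3, 3)],   # same vertex order, circuit reversed
]
boundaries = phase_boundaries(track)
print([is_visible(p) for p in track], boundaries)
```

A frame index returned by `phase_boundaries` marks where a phase image group would be delimited for that surface.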
[0100]
[Effects of the Invention]
The invention of claim 1 comprises: a first process in which, after a stationary object is imaged by a TV camera from a plurality of different positions to obtain a plurality of images, a surface selected from the object is set in an image as a tracking area represented by boundary elements and an image feature amount, a deformation area is created by deforming the tracking area set in one image so as to match the boundary elements of a tracking area candidate in another image, and the tracking area corresponding to the tracking area set in the one image is selected from the tracking area candidates by comparing at least one of the boundary elements and the image feature amount between the tracking area candidates in the other image and the deformation area; a second process in which a reference area is set by mapping to a three-dimensional space a tracking area with few unimaged regions, and, when at least part of a tracking area can no longer be tracked in an image, whether a region that cannot be imaged by the TV camera has occurred is determined from the positional relationship in the three-dimensional space; and a third process in which the positional relationship of the reference areas as imaged from each position of the TV camera is estimated, the cause for which the tracking area can no longer be imaged by the TV camera is estimated from that result, and the tracking area is corrected on the basis of the estimate so that tracking can continue. By tracking the tracking area between the images of the moving image in this way, three-dimensional information usable as 3DCG data can be obtained easily from two-dimensional images. In addition, because the occurrence of an unimaged region in the tracking area is detected and its cause is estimated, even if tracking of the tracking area fails once, a countermeasure suited to the cause can be taken and the tracking area can be tracked reliably. As a result, the validity of the obtained three-dimensional information is verified and the occurrence of errors is suppressed.
[0101]
According to the invention of claim 2, in the invention of claim 1, the first process extracts, in another image, candidates corresponding to the boundary elements of the tracking area set in one image, generates tracking candidates from combinations of the boundary element candidates, generates a deformation area by deforming the tracking area set in the one image so as to match each tracking candidate, and determines the tracking area in the other image from among the tracking candidates by comparing the tracking candidate and the deformation area with respect to an index selected from the shape of the boundary elements, the pixel values, and the image features. Because the line elements (boundary lines) and point elements (boundary points of boundary lines) that constitute the tracking area, or the image features, are used instead of local features as in the prior art, the tracking area can be tracked more reliably than before.
[0102]
The invention of claim 3 is characterized in that, in the invention of claim 2, edges are extracted from another image, and an edge whose distance and direction relative to a line element constituting the tracking area in one image are within specified ranges is selected as a line element candidate. The invention of claim 4 is characterized in that, in the invention of claim 2, edges are extracted from another image, a Hough transform is applied to the pixels on the edges, and an edge whose distance and direction relative to a line element constituting the tracking area in one image are within specified ranges is selected as a line element candidate. The invention of claim 5 is characterized in that, in the invention of claim 2, the shape in the vicinity of a point element constituting the tracking area in one image is used as a template, and point element candidates are extracted from the portions of another image that match the template. In each case, a limited number of tracking candidates is generated from combinations of boundary element candidates, so accurate tracking is possible while the amount of processing is limited.
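The selection of line element candidates by distance and direction can be sketched as below. The sketch assumes that edges have already been extracted as straight segments given by two end points, measures distance from the midpoint of each edge to the reference line, and treats direction as undirected (modulo 180 degrees); these measurement choices, the function names, and the thresholds are all assumptions made for illustration.

```python
import math


def direction_deg(segment):
    """Undirected direction of a segment in degrees, in the range [0, 180)."""
    (x0, y0), (x1, y1) = segment
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


def angle_difference_deg(a, b):
    """Smallest difference between two undirected directions."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def point_line_distance(point, segment):
    """Perpendicular distance from a point to the infinite line of a segment."""
    (x0, y0), (x1, y1) = segment
    px, py = point
    length = math.hypot(x1 - x0, y1 - y0)
    return abs((x1 - x0) * (y0 - py) - (x0 - px) * (y1 - y0)) / length


def select_line_candidates(reference, edges, max_distance, max_angle_deg):
    """Keep edges whose distance and direction lie within the given ranges."""
    ref_direction = direction_deg(reference)
    selected = []
    for edge in edges:
        mid = ((edge[0][0] + edge[1][0]) / 2.0, (edge[0][1] + edge[1][1]) / 2.0)
        near = point_line_distance(mid, reference) <= max_distance
        aligned = angle_difference_deg(direction_deg(edge), ref_direction) <= max_angle_deg
        if near and aligned:
            selected.append(edge)
    return selected


reference = ((0, 0), (10, 0))        # line element in the previous image
edges = [
    ((0, 2), (10, 3)),               # close and nearly parallel
    ((0, 20), (10, 20)),             # parallel but too far away
    ((5, -3), (5, 3)),               # close but perpendicular
    ((10, 1), (0, 1)),               # close, parallel, opposite end-point order
]
candidates = select_line_candidates(reference, edges, max_distance=5.0, max_angle_deg=10.0)
print(candidates)
```

Restricting candidates in this way keeps the number of combinations of boundary element candidates small, which is the point made in the paragraph above.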
[0103]
In the invention of claim 6, in the invention of claim 2, since the pixel values of the tracking candidate and the deformation area are used as indices, the tracking candidate and the deformation area can be compared even when the image has no feature.
[0104]
The invention of claim 7 is characterized in that, in the invention of claim 2, the average luminance of the tracking area and the tracking candidate is used as an index. The invention of claim 8 is characterized in that, in the invention of claim 2, the image is a color image and the color of the tracking area and the tracking candidate is used as an index. The invention of claim 9 is characterized in that, in the invention of claim 2, the spatial frequency distribution of the tracking area and the tracking candidate is used as an index. In each case, by using features in the image, the tracking candidate and the deformation area can be compared easily by comparing numerical values.
[0105]
According to the invention of claim 10, in the invention of claim 2, a portion in which the change in pixel value exceeds a specified value is extracted from the tracking area and the tracking candidate, and at least one of the position, shape, and image feature of this portion is used as an index. By using features in the image, the tracking candidate and the deformation area can therefore be compared more reliably.
[0106]
According to the invention of claim 11, in the invention of claim 2, the image is a color image, and a method is selected, according to the distribution pattern of luminance and color within the tracking area in one image, from among: using the average luminance or the color of the tracking area and the tracking candidate as an index; using the spatial frequency distribution as an index; and extracting a portion of the area in which the change in pixel value exceeds a specified value and using at least one of the position, shape, and image feature of this portion as an index. By using the luminance and color information, a method suited to comparing the tracking candidate and the deformation area can be selected automatically.
[0107]
According to the invention of claim 12, in the invention of claim 2, for each of a plurality of tracking areas, the orientation of the TV camera in the image in which the average luminance of that tracking area is highest among the plurality of images is taken as the direction of the light reflected from the surface of the object corresponding to that tracking area, and the direction of irradiation from the light source is estimated from the normal direction of that surface and the direction of the reflected light. Estimating the position of the light source makes it possible to remove the influence of shadows, so tracking errors caused by shadows can be prevented when the tracking area is tracked.
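The estimation of the irradiation direction can be sketched under the assumption of mirror (specular) reflection, in which the direction toward the light source is the reflection of the direction toward the camera about the surface normal: l = 2(n·r)n − r for unit vectors n and r. The specification states only that the normal direction and the reflected-light direction are used, so the reflection model, the function names, and the sample observations below are assumptions.

```python
import math


def normalize(v):
    """Scale a 3-D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def estimate_light_direction(normal, to_camera):
    """Unit vector from the surface toward the light, assuming mirror reflection.

    normal:    surface normal of the tracked surface
    to_camera: direction from the surface toward the camera in the frame
               where the surface appears brightest
    """
    n = normalize(normal)
    r = normalize(to_camera)
    k = 2.0 * dot(n, r)
    return tuple(k * nc - rc for nc, rc in zip(n, r))


# Hypothetical observations of one tracking area:
# (average luminance, direction from the surface toward the camera).
observations = [
    (80.0, (0.0, 1.0, 1.0)),
    (140.0, (1.0, 0.0, 1.0)),
    (95.0, (-1.0, 0.0, 1.0)),
]
# The camera direction at maximum average luminance is taken as the
# direction of the reflected light.
brightest = max(observations, key=lambda obs: obs[0])
light = estimate_light_direction((0.0, 0.0, 1.0), brightest[1])
print(light)
```

Repeating this for several tracking areas gives several direction estimates, from which a light source position could in principle be inferred.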
[0108]
According to the invention of claim 13, in the invention of claim 2, the second process maps corresponding tracking areas in a plurality of images to a three-dimensional space; when the degree of coincidence obtained from the correspondence between the boundary elements of the tracking areas is equal to or greater than a threshold, the tracking area mapped to the three-dimensional space is used as a reference area; the boundary elements of a tracking area corresponding to the reference area are compared with those of the reference area; and a region of the tracking area that has not been imaged is extracted from the change in the boundary elements. Because the reference areas simply reproduce in three-dimensional space the positions of the object obtained from the images, the positional relationship of the object can be verified relatively easily, and it is easy to verify whether an unimaged region exists.
[0109]
According to the invention of claim 14, in the invention of claim 13, the plurality of images is limited to ranges in which the number of boundary elements does not change; for each range, the contribution rate obtained at the time of mapping is used as the degree of coincidence, and a tracking area whose contribution rate is equal to or greater than a threshold is set as a reference area candidate; when a plurality of reference area candidates is obtained, the candidate with the largest area is adopted as the reference area. This has the advantage that a reference area corresponding more closely to one surface of the object can be set.
[0110]
In the invention of claim 15, in the invention of claim 14, since the change in the number of boundary elements focuses on one of the line element and the point element, the processing amount is small and high-speed processing becomes possible.
[0111]
In the invention of claim 16, in the invention of claim 14, the contribution rate is obtained from the components of the diagonal matrix obtained when the tracking area is mapped to the three-dimensional space by the factorization method, so a solution can be obtained stably by the linear computation of the factorization method.
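The selection of a reference area from the contribution rate can be sketched as follows. The formula used here, the share of the three largest diagonal components (singular values) in the sum of all components, is an assumption motivated by the rank-3 property of the factorization method; this passage does not state the exact definition. The function names, threshold, and sample values are hypothetical.

```python
def contribution_rate(diagonal_components, rank=3):
    """Share of the `rank` largest diagonal components in the total.

    Assumed definition: (s1 + s2 + s3) / sum(all s_i).
    """
    values = sorted((abs(v) for v in diagonal_components), reverse=True)
    total = sum(values)
    if total == 0.0:
        raise ValueError("all diagonal components are zero")
    return sum(values[:rank]) / total


def select_reference_area(candidates, threshold):
    """Pick the largest-area candidate whose contribution rate passes the threshold.

    candidates: list of (name, diagonal_components, area).
    Returns the chosen name, or None when no candidate qualifies.
    """
    qualified = [
        (area, name)
        for name, components, area in candidates
        if contribution_rate(components) >= threshold
    ]
    if not qualified:
        return None
    return max(qualified)[1]


# Hypothetical tracking areas with diagonal components and image areas.
candidates = [
    ("f1", [10.0, 8.0, 6.0, 0.2, 0.1], 120.0),
    ("f3", [9.0, 7.0, 5.0, 0.5, 0.3], 150.0),
    ("f4", [9.0, 4.0, 3.0, 3.0, 2.0], 200.0),
]
reference = select_reference_area(candidates, threshold=0.95)
print(reference)
```

In this sample, f4 has the largest area but is rejected because its contribution rate is below the threshold, so the largest qualifying candidate is chosen instead.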
[0112]
The invention of claim 17 is characterized in that, in the invention of claim 13, when the tracking area is mapped to the three-dimensional space and compared with the reference area, a region defined by point elements that do not correspond to each other is regarded as a region of the tracking area that has not been imaged. The invention of claim 18 is characterized in that, in the invention of claim 13, when the tracking area is mapped to the three-dimensional space and compared with the reference area, a line element that has no counterpart is regarded as part of a region of the tracking area that has not been imaged. In either case, the extent of the unimaged region can be obtained easily.
[0113]
According to the invention of claim 19, in the invention of claim 1, the third process projects all reference areas in the three-dimensional space onto an image plane determined by the position of the TV camera, and determines the cause of the unimaged region in the tracking area from the positional relationship among the reference areas in the image plane and the positional relationship of the reference areas to the image plane. Because the reference areas simply reproduce the actual positional relationship of the object, the cause of an unimaged region can be determined easily by a simulation based on the positional relationship of the reference areas.
[0114]
The invention of claim 20 is characterized in that, in the invention of claim 19, when another reference area exists between the reference area of interest and the TV camera and the two reference areas are not connected, concealment by another object is determined. The invention of claim 21 is characterized in that, in the invention of claim 19, when another reference area exists between the reference area of interest and the TV camera and the two reference areas are connected, self-concealment is determined. The invention of claim 22 is characterized in that, in the invention of claim 19, frame-out is determined when the reference area of interest is located at the periphery of the screen. In each case, the cause of an unimaged region can be determined from the positional relationship among the reference areas and their relationship to the image plane.
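The three determinations can be summarized as a small decision function. This is a sketch only: it assumes that the geometric tests (whether another reference area lies between the area of interest and the camera, whether the two areas are connected, and whether the area lies at the periphery of the screen) have already been evaluated elsewhere. Treating the connected case as self-concealment is an interpretation of the passage, and the order in which the tests are applied is likewise an assumption; all names are hypothetical.

```python
from enum import Enum


class Cause(Enum):
    OTHER_OBJECT_CONCEALMENT = "concealment by another object"
    SELF_CONCEALMENT = "self-concealment"
    FRAME_OUT = "frame-out"
    UNDETERMINED = "undetermined"


def classify_cause(blocker_between, connected_to_blocker, at_screen_periphery):
    """Classify why part of a tracking area is not imaged.

    blocker_between:      another reference area lies between the reference
                          area of interest and the TV camera
    connected_to_blocker: that other reference area is connected to the
                          reference area of interest
    at_screen_periphery:  the reference area of interest lies at the edge
                          of the screen
    """
    if blocker_between:
        if connected_to_blocker:
            return Cause.SELF_CONCEALMENT
        return Cause.OTHER_OBJECT_CONCEALMENT
    if at_screen_periphery:
        return Cause.FRAME_OUT
    return Cause.UNDETERMINED


print(classify_cause(True, False, False).value)
print(classify_cause(True, True, False).value)
print(classify_cause(False, False, True).value)
```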
[0115]
According to the invention of claim 23, in the invention of claim 19, when an unimaged region exists in the tracking area, the edges are extended, the tracking process is executed again with the intersection of the extended edges as a new point element, and concealment is determined if the contribution rate at the time of mapping to the three-dimensional space improves. This avoids a situation in which tracking becomes impossible because the tracking area is concealed, so the tracking area can be tracked more reliably.
[0116]
The invention of claim 24 comprises an image input unit to which images captured by a TV camera are input, an image processing device that performs processing by the image feature tracking processing method of claim 1, a storage device that stores the input images and the images processed by the image processing device, a display unit that displays the images processed by the image processing device, and an area designating unit that designates a tracking area to the image processing device. In addition to effects similar to those of the invention of claim 1, three-dimensional information can be obtained easily from a moving image captured with an ordinary video camera or the like used as the TV camera.
[0117]
The invention of claim 25 is characterized in that: after a plurality of images is acquired by imaging an object with a TV camera from a plurality of different positions, a surface selected from the object is set in an image as a tracking area represented by boundary elements; a deformation area is created by deforming the tracking area set in one image so as to match the boundary elements of a tracking area candidate in another image; the tracking area corresponding to the tracking area set in the one image is selected from the tracking area candidates by comparing the boundary elements of the tracking area candidates in the other image with those of the deformation area; a plurality of consecutive images in which a plurality of surfaces of the object is imaged and the same surfaces are imaged is grouped into one phase image group; for the plurality of phase image groups, a three-dimensional shape is obtained for each phase image group by mapping the surfaces selected from the object to a three-dimensional space; and three-dimensional data is created by performing coordinate conversion on the three-dimensional shape obtained for each phase image group so that its coordinate system coincides with those of the other phase image groups. Because a plurality of consecutive images of the moving image (time-series images) in which the same surfaces of the object are imaged is treated as a phase image group and the three-dimensional shape is obtained within the range of each phase image group, no surface becomes untrackable through self-concealment within a phase image group, and the three-dimensional shape can be obtained reliably and easily. That is, no inconvenience arises even if the three-dimensional shape is obtained by applying the conventionally known factorization method or stereo method. Thus a three-dimensional shape can be obtained easily for each phase image group, and the shapes can then be combined into one set of three-dimensional data by performing coordinate conversion so that the coordinate systems of the three-dimensional shapes obtained from the respective phase image groups coincide.
[0118]
The invention of claim 26 is characterized in that: after a plurality of images is acquired by imaging an object with a TV camera from a plurality of different positions, a surface selected from the object is set in an image as a tracking area represented by boundary elements; a deformation area is created by deforming the tracking area set in one image so as to match the boundary elements of a tracking area candidate in another image; the tracking area corresponding to the tracking area set in the one image is selected from the tracking area candidates by comparing the boundary elements of the tracking area candidates in the other image with those of the deformation area; a plurality of consecutive images in which a plurality of surfaces of the object is imaged and the same surfaces are imaged is grouped into one phase image group; for the plurality of phase image groups, a three-dimensional shape is obtained for each phase image group by mapping the surfaces selected from the object to a three-dimensional space; and three-dimensional data is created by performing coordinate conversion on the three-dimensional shape obtained for each phase image group so that line elements common to the other phase image groups are superimposed on each other. Because a plurality of consecutive images of the moving image (time-series images) in which the same surfaces of the object are imaged is treated as a phase image group and the three-dimensional shape is obtained within the range of each phase image group, no surface becomes untrackable through self-concealment within a phase image group, and the three-dimensional shape can be obtained reliably and easily. That is, no inconvenience arises even if the three-dimensional shape is obtained by applying the conventionally known factorization method or stereo method. Thus a three-dimensional shape can be obtained easily for each phase image group, and the shapes can then be combined into one set of three-dimensional data by performing coordinate conversion so that the line elements of the three-dimensional shapes obtained from the respective phase image groups are superimposed.
[0119]
According to the invention of claim 27, in the invention of claim 25 or claim 26, each surface selected from the object in the plurality of images captured by the TV camera is expressed by boundary elements consisting of point elements and directed line elements that connect the point elements in sequence; in each image obtained in time series, the line elements are followed and the direction in which the point elements are traversed is monitored; and the phase image groups are delimited according to that direction. The process of dividing the images into phase image groups can thereby be automated.
[0120]
According to the invention of claim 28, in the invention of claim 25 or claim 26, in the coordinate conversion, transformation parameters relating to rotation and translation are obtained for each boundary element, and the coordinate conversion is then performed using the parameters obtained by averaging the transformation parameters over the boundary elements. Even when the transformation parameters obtained from the individual boundary elements do not agree, averaging them allows the transformation parameters of the coordinate conversion to be determined uniquely.
[0121]
According to the invention of claim 29, in the invention of claim 25 or claim 26, for the one three-dimensional shape to whose coordinate system the other is matched and the other three-dimensional shape after the coordinate conversion, the position of each point element is set to the midpoint of the positions of the corresponding point elements. Even if the positions of the point elements deviate after the coordinate conversion, the position of each point element can be determined uniquely.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an apparatus used in an embodiment of the present invention.
FIG. 2 is a perspective view showing an object being imaged by a TV camera in the above embodiment.
FIG. 3 is a diagram showing a time-series image group obtained in the above embodiment.
FIG. 4 is a diagram showing an example of an image in the above embodiment.
FIG. 5 is a diagram showing an example of a tracking area in the above embodiment.
FIG. 6 is a diagram showing an example of the object in the above embodiment.
FIG. 7 is a diagram explaining the operation of the above embodiment.
FIG. 8 is a diagram showing an example of tracking candidates in the above embodiment.
FIG. 9 is a diagram showing an example of a deformation area in the above embodiment.
FIG. 10 is a diagram showing the process of generating a deformation area in the above embodiment.
FIG. 11 is a diagram explaining the operation of the above embodiment.
FIG. 12 is a diagram showing an example of a template in the above embodiment.
FIG. 13 is a diagram showing an example of tracking candidates in the above embodiment.
FIG. 14 is a diagram showing an example of a deformation area in the above embodiment.
FIG. 15 is a diagram explaining the operation of the above embodiment.
FIG. 16 is a diagram showing an example of a tracking area in the above embodiment.
FIG. 17 is a diagram explaining the operation of the above embodiment.
FIG. 18 is a diagram showing an example of a deformation area in the above embodiment.
FIG. 19 is a diagram explaining the operation of the above embodiment.
FIG. 20 is a diagram explaining the operation of the above embodiment.
FIG. 21 is a diagram explaining the operation of the above embodiment.
FIG. 22 is a diagram explaining the operation of the above embodiment.
FIG. 23 is a diagram explaining the operation of the above embodiment.
FIG. 24 is a diagram explaining the operation of the above embodiment.
FIG. 25 is a diagram explaining the operation of the above embodiment.
FIG. 26 is a diagram explaining the operation of the above embodiment.
FIG. 27 is a diagram showing an example in which a shadow is generated in the above embodiment.
FIG. 28 is a diagram explaining the operation of the above embodiment.
FIG. 29 is a diagram explaining the operation of the above embodiment.
FIG. 30 is a diagram explaining the operation of the above embodiment.
FIG. 31 is a diagram explaining the operation of the above embodiment.
FIG. 32 is a diagram explaining the operation of the above embodiment.
FIG. 33 is a diagram explaining the operation of the above embodiment.
FIG. 34 is a diagram explaining the operation of the above embodiment.
FIG. 35 is a diagram explaining the operation of the above embodiment.
FIG. 36 is a diagram explaining the operation of the above embodiment.
FIG. 37 is a diagram explaining the operation of the above embodiment.
FIG. 38 is a diagram explaining the operation of the above embodiment.
FIG. 39 is a diagram explaining the operation of the above embodiment.
FIG. 40 is a diagram explaining the operation of the above embodiment.
FIG. 41 is a perspective view showing a state in which an object is imaged by the TV camera in the second embodiment of the present invention.
FIG. 42 is a perspective view showing an object used in the second embodiment.
FIG. 43 is a plan view showing an object used in the second embodiment.
FIG. 44 is a diagram explaining the concept of a phase image group in the second embodiment.
FIG. 45 is a diagram showing a three-dimensional shape being obtained from each phase image group in the second embodiment.
FIG. 46 is a diagram showing the three-dimensional shapes being integrated in the second embodiment.
FIG. 47 is a diagram explaining the operation of the second embodiment.
FIG. 48 is a diagram explaining the operation of the second embodiment.
FIG. 49 is a diagram explaining the operation of the second embodiment.
[Explanation of symbols]
1 object
2 TV camera
3 objects
11 Image input device
12 Image input means
13 Storage device
14 Image processing device
15 Display means
16 Area designation means
17 Area tracking means
18 Tracking evaluation means
19 Shape restoration means
D Tracking area
E Deformation area
F1 image file
F2 tracking data file
F3 3D shape data file

Claims

An image feature tracking processing method characterized by comprising: a first process of acquiring a plurality of images by imaging a stationary object with a TV camera from a plurality of different positions, setting a surface selected from the object as a tracking area represented in the image by boundary elements and image feature amounts, creating a deformed area by deforming the tracking area set in one image so as to match the boundary elements of a tracking area candidate in another image, and selecting, from the tracking area candidates, the tracking area corresponding to the tracking area set in the one image by comparing at least one of the boundary elements and the image feature amounts between the tracking area candidate and the deformed area in the other image; a second process of setting a reference area obtained by mapping a tracking area with few missing parts into a three-dimensional space and, when at least part of a tracking area cannot be tracked in an image, determining whether or not an area that cannot be imaged by the TV camera has arisen, based on the positional relationship of the reference areas in the three-dimensional space; and a third process of estimating the cause for which the tracking area can no longer be imaged by the TV camera, based on the positional relationship of the reference areas when they are imaged from each position of the TV camera, correcting the tracking area on the basis of the estimation result so that it can be tracked, and continuing the tracking.

The image feature tracking processing method according to claim 1, wherein, in the first process, candidates corresponding to the boundary elements of the tracking area set in the one image are extracted from the other image, tracking candidates are generated by combining the boundary element candidates, a deformed area is generated by deforming the tracking area set in the one image so as to match each tracking candidate, and the tracking area in the other image is determined from the tracking candidates by comparing, between the tracking candidate and the deformed area, an index selected from the shape of the boundary elements, the pixel values, and the image feature amounts.

The image feature tracking processing method according to claim 2, wherein edges are extracted in the other image and, among those edges, an edge whose distance and direction relative to a line element constituting the tracking area in the one image are within specified ranges is selected as a line element candidate.
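
The selection of line element candidates by distance and direction can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: edges are assumed to be already available as straight segments, distance is measured between segment midpoints, and direction is compared as an undirected orientation. The names `select_line_candidates`, `max_dist`, and `max_angle` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def midpoint(seg: Segment) -> Point:
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def orientation(seg: Segment) -> float:
    """Undirected orientation of a segment, in radians, folded into [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def select_line_candidates(reference: Segment, edges: List[Segment],
                           max_dist: float, max_angle: float) -> List[Segment]:
    """Keep edges close to the reference line element in position and direction."""
    ref_mid = midpoint(reference)
    ref_dir = orientation(reference)
    selected = []
    for edge in edges:
        mx, my = midpoint(edge)
        dist = math.hypot(mx - ref_mid[0], my - ref_mid[1])
        diff = abs(orientation(edge) - ref_dir)
        diff = min(diff, math.pi - diff)  # orientations wrap around at pi
        if dist <= max_dist and diff <= max_angle:
            selected.append(edge)
    return selected
```

For example, with a horizontal reference element, a nearby parallel edge is kept, while a perpendicular edge or a distant parallel edge is rejected.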

The image feature tracking processing method according to claim 2, wherein edges are extracted in the other image, a Hough transform is applied to the pixels on the edges to extract edges having continuity, and, among those edges, an edge whose distance and direction relative to a line element constituting the tracking area in the one image are within specified ranges is selected as a line element candidate.

The image feature tracking processing method according to claim 2, wherein the shape in the vicinity of a point element constituting the tracking area in the one image is used as a template, and a point element candidate is extracted from a portion of the other image that matches the template.

The image feature tracking processing method according to claim 2, wherein pixel values of the tracking candidate and the deformation area are used as the index.

The image feature tracking processing method according to claim 2, wherein an average luminance of the tracking area and the tracking candidate is used as the index.

The image feature tracking processing method according to claim 2, wherein the image is a color image, and colors of the tracking area and the tracking candidate are used as the index.

The image feature tracking processing method according to claim 2, wherein the spatial frequency distributions of the tracking area and the tracking candidate are used as the index.

The image feature tracking processing method according to claim 2, wherein portions in which the change in pixel value exceeds a specified value are extracted from the tracking area and the tracking candidate, and at least one of the position, shape, and image feature amount of these portions is used as the index.

The image feature tracking processing method according to claim 2, wherein the image is a color image, and one of the following is selected according to the distribution pattern of luminance and color within the tracking area of the one image: using the average luminance or the color of the tracking area and the tracking candidate as the index; using the spatial frequency distribution as the index; or extracting portions in which the change in pixel value within the area exceeds a specified value and using at least one of the position, shape, and image feature amount of those portions as the index.

The image feature tracking processing method according to claim 2, wherein, for each of a plurality of tracking areas, the posture of the TV camera at which the average luminance is maximum among the plurality of images is estimated to be close to the direction of the light reflected by the surface of the object corresponding to that tracking area, and the irradiation direction from the light source is estimated from the normal direction of the surface of the object corresponding to each tracking area and the direction of the reflected light.
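
One way to read the estimation above is through the law of mirror reflection: if the view with the highest average luminance looks along the reflected light, the direction toward the light source is the mirror image of that viewing direction about the surface normal. The sketch below assumes ideal specular reflection, which the claim does not state explicitly, and uses the hypothetical names `brightest_view` and `estimate_light_direction`. Both vectors point away from the surface.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Sequence[float]) -> Vec3:
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in v)


def brightest_view(mean_luminances: Sequence[float]) -> int:
    """Index of the image in which the tracking area has the highest mean luminance."""
    return max(range(len(mean_luminances)), key=lambda i: mean_luminances[i])


def estimate_light_direction(normal: Sequence[float], reflected: Sequence[float]) -> Vec3:
    """Mirror the reflected direction about the normal: l = 2 (n . r) n - r."""
    n = normalize(normal)
    r = normalize(reflected)
    k = 2.0 * sum(a * b for a, b in zip(n, r))
    return tuple(k * a - b for a, b in zip(n, r))
```

In use, `brightest_view` would pick the camera posture for a tracking area, the direction from the surface to that camera would be passed as `reflected`, and the estimates from several tracking areas could then be compared.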

The image feature tracking processing method according to claim 1, wherein, in the second process, corresponding tracking areas in a plurality of images are mapped into the three-dimensional space; when the degree of coincidence obtained from the correspondence between the boundary elements of the tracking areas is equal to or greater than a threshold, the tracking area mapped into the three-dimensional space is taken as a reference area; and the boundary elements of the reference area are compared with those of the tracking area corresponding to the reference area, so that an area that is not imaged in the tracking area is extracted from the change in the boundary elements.

The image feature tracking processing method according to claim 13, wherein the images are limited to ranges in which the number of boundary elements does not change between the images; for each range, the contribution rate obtained at the time of mapping is used as the degree of coincidence, and a tracking area whose contribution rate is equal to or greater than the threshold is taken as a reference area candidate; and, when a plurality of reference area candidates are obtained, the candidate with the largest area is adopted as the reference area.

The image feature tracking processing method according to claim 14, wherein the change in the number of boundary elements is evaluated for either the line elements or the point elements.

The image feature tracking processing method according to claim 14, wherein the contribution rate is obtained from the components of the diagonal matrix obtained when the tracking area is mapped into the three-dimensional space by the factorization method.
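
The following sketch illustrates how a contribution rate and a reference area might be derived. The exact formula is not given in the claims, so the sketch assumes that the contribution rate is the share of the three largest diagonal components (the rank expected of the measurement matrix in the factorization method) in the sum of all components. The names `contribution_rate` and `choose_reference`, and the candidate tuple layout, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical candidate record: (label, diagonal components, area).
Candidate = Tuple[str, Sequence[float], float]


def contribution_rate(diagonal: Sequence[float], rank: int = 3) -> float:
    """Share of the `rank` largest diagonal components in the total (assumed definition)."""
    values = sorted((abs(v) for v in diagonal), reverse=True)
    total = sum(values)
    if total == 0.0:
        raise ValueError("diagonal components are all zero")
    return sum(values[:rank]) / total


def choose_reference(candidates: List[Candidate], threshold: float) -> Optional[str]:
    """Among candidates at or above the threshold, adopt the one with the largest area."""
    passed = [c for c in candidates if contribution_rate(c[1]) >= threshold]
    if not passed:
        return None
    return max(passed, key=lambda c: c[2])[0]
```

A rate close to 1 indicates that the tracked boundary elements are well explained by a rigid three-dimensional shape, which is why it can serve as the degree of coincidence.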

The image feature tracking processing method according to claim 13, wherein the tracking area is mapped into the three-dimensional space and compared with the reference area, and an area defined by point elements that have no counterpart is obtained as an area that is not imaged in the tracking area.

The image feature tracking processing method according to claim 13, wherein the tracking area is mapped into the three-dimensional space and compared with the reference area, and, when there is a line element that has no counterpart, that line element is regarded as part of an area that is not imaged in the tracking area.

The image feature tracking processing method according to claim 1, wherein, in the third process, all reference areas in the three-dimensional space are projected onto the image plane determined by the position of the TV camera, and the cause of an area that is not imaged in the tracking area is determined from the positional relationship among the reference areas in the image plane and the positional relationship of the reference areas with respect to the image plane.

The image feature tracking processing method according to claim 19, wherein, when another reference area exists between the reference area of interest and the TV camera and the two reference areas are not connected, the cause is determined to be concealment by another object.

The image feature tracking processing method according to claim 19, wherein, when another reference area exists between the reference area of interest and the TV camera and the two reference areas are connected, the cause is determined to be self-concealment.

The image feature tracking processing method according to claim 19, wherein, when the reference area of interest is located at the periphery of the screen, the cause is determined to be frame-out.
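
The three determinations above (concealment by another object, self-concealment, frame-out) can be summarized in a small decision function. This is a sketch under assumptions: the tests for whether another reference area lies in front and whether the two areas are connected are taken as precomputed booleans, the order in which the conditions are checked is chosen here and is not specified by the claims, and the names `Cause`, `touches_border`, and `classify_cause` are hypothetical.

```python
from enum import Enum
from typing import Sequence, Tuple

Point = Tuple[float, float]


class Cause(Enum):
    OTHER_CONCEALMENT = "concealment by another object"
    SELF_CONCEALMENT = "self-concealment"
    FRAME_OUT = "frame-out"
    UNKNOWN = "unknown"


def touches_border(points: Sequence[Point], width: int, height: int,
                   margin: float = 2.0) -> bool:
    """True if any projected point lies within `margin` pixels of the image edge."""
    return any(
        x < margin or y < margin or x > width - 1 - margin or y > height - 1 - margin
        for x, y in points
    )


def classify_cause(occluder_in_front: bool, connected: bool,
                   projected_points: Sequence[Point],
                   width: int, height: int) -> Cause:
    """Classify why part of the reference area of interest is not imaged."""
    if occluder_in_front:
        # Connected areas belong to the same object, so the object hides itself.
        return Cause.SELF_CONCEALMENT if connected else Cause.OTHER_CONCEALMENT
    if touches_border(projected_points, width, height):
        return Cause.FRAME_OUT
    return Cause.UNKNOWN
```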

The image feature tracking processing method according to claim 19, wherein, when there is an area that is not imaged in the tracking area, the edges are extended, the tracking process is performed again using the intersection of the extended edges as a new point element, and concealment is confirmed if the contribution rate obtained when mapping into the three-dimensional space is improved.
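
Extending two edges and taking their intersection as a new point element amounts to intersecting the two infinite lines through the visible edge segments. A minimal sketch follows; the name `extended_intersection` and the tolerance `eps` are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def extended_intersection(seg_a: Segment, seg_b: Segment,
                          eps: float = 1e-12) -> Optional[Point]:
    """Intersection of the infinite lines through two segments, or None if parallel."""
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < eps:
        return None  # parallel or coincident lines have no unique intersection
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return (px, py)
```

For instance, a horizontal edge that is cut off at x = 1 and a vertical edge at x = 3 that starts at y = 1 meet, when extended, at the hidden corner (3, 0).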

An image feature tracking processing device comprising: image input means to which images picked up by a TV camera are input; an image processing device that processes the input images by the image feature tracking processing method according to claim 1; a storage device that stores the input images and the images processed by the image processing device; display means that displays the images processed by the image processing device; and area designation means that designates a tracking area to the image processing device.

A three-dimensional data creation method characterized by: acquiring a plurality of images by imaging an object with a TV camera from different positions; setting a surface selected from the object as a tracking area represented by boundary elements in the image; creating a deformed area by deforming the tracking area set in one image so as to match the boundary elements of a tracking area candidate in another image; selecting, from the tracking area candidates, the tracking area corresponding to the tracking area set in the one image by comparing the boundary elements of the tracking area candidate and the deformed area in the other image; dividing the images, in which a plurality of surfaces of the object are captured, into phase image groups such that a plurality of images in which the same surface is continuously captured form one phase image group; obtaining a three-dimensional shape for each phase image group by mapping the surface selected from the object into a three-dimensional space for each phase image group; and creating three-dimensional data by performing a coordinate transformation on the three-dimensional shape obtained for each phase image group so that its coordinate system coincides with that of the other phase image groups.

A three-dimensional data creation method characterized by: acquiring a plurality of images by imaging an object with a TV camera from different positions; setting a surface selected from the object as a tracking area represented by boundary elements in the image; creating a deformed area by deforming the tracking area set in one image so as to match the boundary elements of a tracking area candidate in another image; selecting, from the tracking area candidates, the tracking area corresponding to the tracking area set in the one image by comparing the boundary elements of the tracking area candidate and the deformed area in the other image; dividing the images, in which a plurality of surfaces of the object are captured, into phase image groups such that a plurality of images in which the same surface is continuously captured form one phase image group; obtaining a three-dimensional shape for each phase image group by mapping the surface selected from the object into a three-dimensional space for each phase image group; and creating three-dimensional data by performing a coordinate transformation on the three-dimensional shape obtained for each phase image group so that line elements common to the other phase image groups are superimposed.

The three-dimensional data creation method, wherein the surface selected from the object in the plurality of images captured by the TV camera is represented by boundary elements consisting of point elements and directed line elements that connect the point elements in sequence, the direction in which the point elements circulate is monitored by tracing them in each image, and the images up to the point at which the direction of circulation reverses form one phase image group.
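
The direction in which the point elements circulate can be monitored with the sign of the polygon's signed area (the shoelace formula): the sign flips when the same ordered point elements are seen from the other side. The sketch below is one possible realization, not necessarily the one used in the patent; the names `signed_area` and `split_into_phase_groups` are hypothetical. In image coordinates with the y axis pointing down the meaning of each sign is swapped, but a reversal is still detected as a sign change.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signed_area(polygon: Sequence[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise order with the y axis up."""
    pts = list(polygon)
    pairs = zip(pts, pts[1:] + pts[:1])
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)


def split_into_phase_groups(frames: Sequence[Sequence[Point]]) -> List[List[int]]:
    """Group consecutive frame indices until the circulation direction reverses."""
    groups: List[List[int]] = []
    current: List[int] = []
    prev_sign = 0
    for index, polygon in enumerate(frames):
        area = signed_area(polygon)
        sign = (area > 0) - (area < 0)
        # Start a new group only on a genuine reversal, ignoring degenerate frames.
        if current and sign != 0 and prev_sign != 0 and sign != prev_sign:
            groups.append(current)
            current = []
        current.append(index)
        if sign != 0:
            prev_sign = sign
    if current:
        groups.append(current)
    return groups
```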

The three-dimensional data creation method according to claim 25 or claim 26, wherein, in the coordinate transformation, transformation parameters relating to rotation and translation are obtained for each boundary element, and the coordinate transformation is then performed using transformation parameters obtained by averaging the parameters obtained for the individual boundary elements.

The three-dimensional data creation method according to claim 25 or claim 26, wherein the position of each point element is the midpoint between the position of that point element in one three-dimensional shape, after the coordinate transformation that makes the coordinate systems coincide has been applied, and the position of the corresponding point element in the other three-dimensional shape.