JP4244593B2

JP4244593B2 - Object extraction device, object extraction method, and image display device

Info

Publication number: JP4244593B2
Application number: JP2002236041A
Authority: JP
Inventors: 剛田中; 哲二郎近藤
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-08-13
Filing date: 2002-08-13
Publication date: 2009-03-25
Anticipated expiration: 2022-08-13
Also published as: JP2004078432A

Description

【０００１】
【発明の属する技術分野】
画像に表示されるオブジェクトを抽出するオブジェクト抽出装置、オブジェクト抽出方法、および画像表示装置に関する。
【０００２】
【従来の技術】
画像には、人物や物体などの複数のオブジェクトが表示される。従来、画像からオブジェクトを抽出する場合、特開平１２−２２２５８４号公報に記載のように、画像とともにオブジェクトの特徴を示す値を記録し、この値をもとにオブジェクトの抽出を行う方法がある。また、特開平９−２１２６４８号公報に記載のように、オプティカルフローを基にカメラ動きや画像の三次元構造を推定し、背景とオブジェクトを抽出する方法もある。
【０００３】
【発明が解決しようとする課題】
しかしながら、上述した方法は、単に画像からオブジェクトを抽出する技術であり、カメラマンが注目し、撮像している主要なオブジェクトを抽出することはできない。
【０００４】
本発明は、上記課題に鑑みてなされたものであって、画像から主要なオブジェクトを抽出するオブジェクト抽出装置、オブジェクト抽出方法および画像表示装置を提供することを目的とする。
【０００５】
【課題を解決するための手段】
上述の課題を解決するため、本発明にかかるオブジェクト抽出装置は、入力された画像の動きベクトルを計算する動きベクトル計算手段と、動きベクトルをもとにカメラ動きがあったか否かを推定するカメラ動き推定手段と、カメラ動き推定手段において、カメラ動きがあったと推定された場合、動きベクトルの大きさが閾値としてのオブジェクト判別値以下である領域を注目オブジェクトの候補として抽出する領域抽出手段と、動きベクトル計算手段によって計算された動きベクトルと複数種類のカメラ動きに対応した動きベクトルのパターンとを適合させる適合手段と、適合手段において適合したカメラ動きの種類をもとに注目オブジェクトが含まれる領域を注目領域として推定する注目領域推定手段とを有し、領域抽出手段は、注目領域内で注目オブジェクトを抽出する。
【０００６】
また、本発明にかかるオブジェクト抽出方法は、コンピュータに、入力された画像の動きベクトルを計算する動きベクトル計算工程と、動きベクトルをもとにカメラ動きがあったか否かを推定するカメラ動き推定工程と、カメラ動き推定工程において、カメラ動きがあったと推定された場合、動きベクトルの大きさが閾値としてのオブジェクト判別値以下である領域を注目オブジェクトの候補として抽出する領域抽出工程と、動きベクトル計算工程によって計算された動きベクトルと複数種類のカメラ動きに対応した動きベクトルのパターンとを適合させる適合工程と、適合工程において適合したカメラ動きの種類をもとに注目オブジェクトが含まれる領域を注目領域として推定する注目領域推定工程とを実行させ、領域抽出工程では、注目領域内で注目オブジェクトを抽出する。
【０００７】
また、本発明にかかる画像表示装置は、入力された画像の動きベクトルを計算する動きベクトル計算手段と、動きベクトルをもとにカメラ動きがあったか否かを推定するカメラ動き推定手段と、カメラ動き推定手段において、カメラ動きがあったと推定された場合、動きベクトルの大きさが閾値としてオブジェクト判別値以下である領域を注目オブジェクトの候補として抽出する領域抽出手段と、テロップが表示される領域を判定するテロップ領域判定手段を有し、カメラ動き推定手段は、テロップ領域を除いた範囲で動きベクトルの方向ごとの集計を行う。
【０００８】
【発明の実施の形態】
画像には、人物や物体などの複数のオブジェクトが表示される。画像に表示される複数のオブジェクトのうち、画像の主要部となるオブジェクト（以下、注目オブジェクトと呼ぶ）は、常にカメラの視野に収まるように撮影される。このため、注目オブジェクトは、画像内においてほとんど動きがないということが起こる。本発明を適用した画像表示装置は、動きベクトルを計算し、動きベクトルの大きさが所定の閾値より小さい領域を注目オブジェクトの表示される領域として選択することを特徴としている。
【０００９】
図１は、人物を注目オブジェクトとしたときの画像の例である。図１に示すように、注目オブジェクトは、画面の中心に配置されている。そして、注目オブジェクトが左に移動すると、カメラの視野も注目オブジェクトとともに左に移動する。また、注目オブジェクトが右に移動すると、カメラの視野も注目オブジェクトとともに右に移動する。
【００１０】
カメラの視野が注目オブジェクトとともに移動すると、図２に示すように、注目オブジェクト以外の物体や人物（以下、非注目オブジェクトと呼ぶ）および背景は、カメラの視野が移動した分だけ逆方向に移動する。
【００１１】
そのため、画像の動きベクトルを求めると、注目オブジェクトの動きベクトルは小さくなり、非注目ベクトルや背景の動きベクトルは大きくなる。本発明を適用した画像表示装置は、この原理を利用して、動きベクトルの小さい領域を注目オブジェクトとして特定する。なお、以下の説明ではカメラの視野の移動をカメラ動きと呼び、動きベクトルの方向を動き方向と呼ぶ。
【００１２】
図３は、注目オブジェクトを抽出する注目オブジェクト抽出部１の概略構成を示す機能ブロック図である。注目オブジェクト抽出部１は、１フレーム前の画像を記憶する遅延バッファ１１、動きベクトルを算出する動きベクトル計算部１２、テロップ領域の除去を行うテロップ領域除去部１３、カメラ動きがあるか否かを推定するカメラ動き推定部１４、動きベクトルの大きさが小さい領域を抽出する領域抽出部１５、複数の領域から注目オブジェクトが表示される領域を特定する領域選択部１６とを有する。
【００１３】
遅延バッファ１１は、入力画像を格納し、１フレーム前の入力画像を動きベクトル計算部１２に出力する。動きベクトル計算部１２は、入力画像と１フレーム前の入力画像と複数のブロックに分割し、ブロックごとの動きベクトルを計算し、テロップ領域除去部１３に出力する。
【００１４】
テロップ領域除去部１３は、動きベクトル計算部１２から出力された動きベクトルのうち、動きベクトルの大きさがゼロに近いブロックを選択し、このブロックの動きベクトルの大きさが所定時間以上継続したとき、このブロックの存在する領域をテロップ領域であると判定する。
【００１５】
カメラ動き推定部１４は、動きベクトルの方向をもとにカメラの撮像方向が移動したか否かを推定する。カメラ動き推定部１４は、動きベクトルを動き方向ごとに分類し、集計をとる。カメラ動き推定部１４は、ある動き方向の集計結果（動きベクトルの個数）が閾値以上であると、カメラ動きの方向はその動き方向と逆の方向であると推定する。
【００１６】
領域抽出部１５は、カメラ動き推定部１４の推定結果を入力し、カメラ動きがあった場合には、注目オブジェクトの抽出を開始する。領域抽出部１５は、動きベクトルの大きさが小さいブロックを抽出し、抽出したブロックを構成するブロックにラベリング処理を行う。ブロックは、ラベリング処理が施されることにより、割り付けられた番号ごとに画像を複数の領域を分類することができる。
【００１７】
領域選択部１６は、領域抽出部１５によってラベリングされた領域の中から注目オブジェクトを選択する。領域選択部１６は、領域抽出部１５によって一つの領域が選択された場合には、その領域を注目オブジェクトであると決定する。
【００１８】
また、領域選択部１６は、領域抽出部１５によって複数の領域が選択された場合には、ラベリングされた領域の面積、重心、アクティビティを計算し、これらの値をもとに最も注目オブジェクトらしい領域を注目オブジェクトとして選択する。
【００１９】
次に、図４を参照して、注目オブジェクト抽出部１の動作を説明する。注目オブジェクト抽出部１は、入力画像を読み込み（ステップＳ１）、入力画像は遅延バッファ１１と動きベクトル計算部１２に供給される。遅延バッファ１１は、入力画像を蓄積し、１フレーム前の入力画像を動きベクトルに出力する。動きベクトル計算部１２は、入力画像と遅延バッファ１１から出力される１フレーム前の入力画像とをブロックに分割し、ブロックごとの動きベクトルを計算する（ステップＳ２）。
【００２０】
テロップ領域除去部１３は、テロップ領域と判定されたブロックを除去するための処理である。テロップ領域の判定には、種々の方法がある。例えば、動きベクトル計算部１２から出力された動きベクトルのうち、動きベクトルの大きさがゼロに近いブロックを選択し、選択したブロックの動きベクトルの大きさが所定時間以上変化しないとき、このブロックの存在する領域をテロップ領域であると判定する方法が挙げられる（ステップＳ３）。
【００２１】
カメラ動き推定部１４は、動きベクトルを用いてカメラの撮像方向が移動したか否かを推定する。動きベクトルの方向とは、例えば、右向き、左向き、上向き、下向き、外向き、内向きの６方向である。カメラ動き推定部１４は、動きベクトルの方向ごとに集計をとり、ある方向の集計結果が動き有無判別値以上であると、その方向と逆の方向にカメラ動きがあったものと推定する（ステップＳ５；ＹＥＳ）。また、動きベクトルの方向ごとに集計をとったとき、全ての方向の集計結果が動き有無判別値を超えないならば、カメラ動きがないと推定し（ステップＳ５；ＮＯ）、注目オブジェクトの抽出処理を完了する。
【００２２】
領域抽出部１５には、動きベクトルの大小を判定するための閾値として、オブジェクト判別値が設定されている。領域抽出部１５は、カメラ動き推定部１４からの推定結果を入力し、カメラ動きがあった場合には（ステップＳ５；ＹＥＳ）、オブジェクト判別値と動きベクトルの大きさとを比較して、動きベクトルの大きさがオブジェクト判別値より小さいブロックを選択する。
【００２３】
次いで、領域抽出部１５は、選択したブロックにラベリング処理を施す。ラベリング処理とは、ブロックに番号をつける処理である。領域抽出部１５は、選択したブロックが隣接する場合には、隣接するブロックに同一の番号をつける。また、領域抽出部１５は、選択したブロックが離間している場合には、離間したブロックに異なる番号をつける。このラベリング処理によって、隣接したブロックは、一つの領域として分類され、離間したブロックは、異なる領域として分類される（ステップＳ７）。領域抽出部１５は、ラベリング処理を終了すると、ラベリング処理が施された画像データを領域選択部１６に出力する。
【００２４】
領域選択部１６は、領域抽出部１５が一つの領域を抽出した場合（ステップＳ８；ＮＯ）、この領域を注目オブジェクトとしてサブ表示装置に出力する。また、領域選択部１６は、領域抽出部１５が複数の領域を抽出した場合（ステップＳ８；ＹＥＳ）、領域ごとに面積、重心、アクティビティなどの領域の特性を示す値を計算する（ステップＳ９）。
【００２５】
領域選択部１６は、ステップＳ９で算出した面積、重心、アクティビティを使用して複数の領域のなかで最も注目オブジェクトらしい領域を選択する。注目オブジェクトの選択のためのいくつかの条件を例示する。例えば、領域選択部１６は、面積の大きい領域を注目オブジェクトが表示される可能性の高い領域とする。これは、注目オブジェクトの表示される領域は、他のオブジェクトの表示される領域と比較して、面積が大きい可能性が高いためである。
【００２６】
また、領域選択部１６は、アクティビティの高い領域を注目オブジェクトの表示される可能性の高い領域とする。カメラの焦点は、注目オブジェクトに合わせられている場合が多く、画像が鮮明になり、注目オブジェクトのアクティビティが高くなるためである。
【００２７】
また、領域選択部１６は、カメラの撮像方向が左に移動しているときは、画面の真ん中から右側に重心が位置する領域を注目オブジェクトが表示された可能性の高い領域とし、カメラの撮像方向が右に移動しているときは、画面の真ん中から左側に重心が位置する領域を注目オブジェクトが表示された可能性の高い領域であるとする。これは、カメラマンの構図のとり方に基づいた選択方法であり、カメラマンは、図５に示すように、注目オブジェクトが左に移動しているときは、画面の真ん中から右側に注目オブジェクトを配置し、注目オブジェクトが右に移動しているときは、画面の真ん中から左側に注目オブジェクトを配置する傾向があるためである。
【００２８】
領域選択部１６は、面積に基づく選択結果、重心に基づく選択結果、アクティビティに基づく選択結果を総合し、最も注目オブジェクトらしい領域を選択する（ステップＳ１０）。
【００２９】
このように、注目オブジェクト抽出部１は、注目オブジェクトとともにカメラの視野が変化したとき、画像の動きベクトルを算出して、動きベクトルの値の小さい領域を注目オブジェクトの表示領域として選択する。
【００３０】
また、注目オブジェクト抽出部１は、注目オブジェクトの選択方法として動きベクトルの他に、領域の面積、重心、アクティビティなどの値を参照して、注目オブジェクトを特定する。
【００３１】
次に、他の方法を用いた注目オブジェクト抽出部２について説明する。この注目オブジェクト抽出部２は、カメラ動きの方向をもとに注目オブジェクトの表示される注目領域を推定し、その範囲内で注目オブジェクトを探索することを特徴としている。
【００３２】
図６は、注目オブジェクト抽出部２の構成を示している。注目オブジェクト抽出部２は、カメラ動きの方向を検出するカメラ動き検出部２１、カメラ動きの方向を基に注目領域を推定し、注目領域内で動きベクトルの大きさが小さい領域を抽出する領域抽出部２２、領域抽出部２２によって抽出された領域の中で注目オブジェクトが表示される領域を選択する領域選択部２３とを有する。
【００３３】
カメラ動き検出部２１は、入力画像を複数のブロックから構成される大ブロックに分割し、大ブロックの動き方向が、パン、チルト、回り込み、前後移動のいずれに適合するかを比較しカメラ動きを推定する。大ブロックの動き方向は、大ブロックを構成するブロックの動き方向を集計することにより算出される。
【００３４】
以下、図７を参照してカメラ動き検出部２１の動作を詳しく説明する。カメラ動き検出部２１は、画像を４つの大ブロックに分割する（ステップＳ２０）。カメラ動き検出部２１は、大ブロックを構成するブロックの動きベクトルの動き方向を上向き、下向き、右向き、左向き、外向き、中心向きの６方向に分類し、動き方向ごとにブロックの個数を集計する（ステップＳ２１）。カメラ動き検出部２１は、カメラ動きがあったか否かを判別するための閾値として、動き方向判別値が設定されている。カメラ動き検出部２１は、ブロックの集計結果が動き方向判別値以上であるとき（ステップＳ２２；ＹＥＳ）、集計結果が動き方向判別値以上であった動きベクトルの大きさを読み出し、この動きベクトルの大きさのばらつきを計算する（ステップＳ２３）。カメラ動き検出部２１には、動きベクトルの大きさに一定の規則性があるか否かを判定する閾値として方向性判定値が設定されている。カメラ動き検出部２１は、ばらつきが方向性判定値以下であるとき（ステップＳ２４；ＹＥＳ）、この大ブロックでカメラ動きがあったと判定する（ステップＳ２５）。
【００３５】
また、カメラ動き検出部２１は、ステップＳ２２で動きベクトルの集計をとったとき、集計結果が動き方向判別値以下であるとき（ステップＳ２２；ＮＯ）、その大ブロックでの動きはないと判定し（ステップＳ２７）、処理を中止する。また、カメラ動き検出部２１は、ステップＳ２４において、動きベクトルの値のばらつきが方向性判定値以下であるとき（ステップＳ２４；ＮＯ）、その大ブロックでの動きはないと判定し（ステップＳ２７）、処理を終了する。
【００３６】
カメラ動き検出部２１は、すべての大ブロックで動き方向が判定されたか否かを判定する（ステップＳ２６）。カメラ動き検出部２１は、すべての大ブロックで動き方向が判定されると（ステップＳ２６；ＹＥＳ）、ステップＳ２８に処理を移行する。また、動き方向が判定されないブロックが存在すると（ステップＳ２６；ＮＯ）、カメラ動き検出部２１は、カメラ動きはなかったものとして処理を終了する。
【００３７】
次いで、カメラ動き検出部２１は、入力した画像のカメラ動きの種類を検出する。カメラ動きの種類は、図８に示すように、パン、チルト、回り込み、前後移動の４パターンある。カメラ動き検出部２１は、４つの大ブロックの動き方向がいずれのカメラ動きに適合するかを一種類ずつ比較していく。
【００３８】
カメラ動き検出部２１は、まず、カメラ動きの種類はパンであるか否かを判定する。パンとは、カメラを固定したまま水平方向に移動する物体を撮像する撮像方法である。パンで撮像した場合、動きベクトルの方向は、水平方向右向きもしくは水平方向左向きになる。カメラ動き検出部２１は、動きベクトルがパンのパターンに適合すると（ステップＳ２８；ＹＥＳ）、カメラ動きは、パンであると特定する（ステップＳ２９）。
【００３９】
また、ステップＳ２８において、動きベクトルがパンのパターンに適合しない場合（ステップＳ２８；ＮＯ）、カメラ動き検出部２１は、カメラ動きの種類がチルトである否かを判定する。チルトとは、カメラを固定したまま上下方向に移動する物体を撮像する撮像方法である。チルトで撮像した場合、動きベクトルの方向は、垂直方向上向きもしくは垂直方向下向きになる。カメラ動き検出部２１は、動きベクトルがチルトのパターンに適合すると（ステップＳ３０；ＹＥＳ）、カメラ動きは、チルトであると特定する（ステップＳ３１）。
【００４０】
ステップＳ３１において、動きベクトルがチルトのパターンに適合しない場合（ステップＳ３１；ＮＯ）、カメラ動き検出部２１は、カメラ動きの種類が回り込みであるか否かを判定する。回り込みとは、物体の周囲を回り込むようにカメラを移動させながら、物体を撮像する撮像方法である。回り込みで撮像した場合、動きベクトルの方向は、水平方向左向きもしくは水平方向右向きになる。回り込みの動きベクトルの方向とパンの動きベクトルの方向は、同一であるが動きベクトルの動き量が異なる。カメラ動き検出部２１は、動きベクトルが回り込みのパターンに適合すると（ステップＳ３２；ＹＥＳ）、カメラ動きは回り込みであると特定する（ステップＳ３３）。
【００４１】
ステップＳ３２において、動きベクトルが回り込みのパターンに適合しない場合（ステップＳ３２；ＮＯ）、カメラ動き検出部２１は、カメラ動きの種類が前後移動であるか否かを判定する。前後移動は、前後に動く物体を追いかけて撮像した状態である。手前に移動する物体を引いて撮像した場合、動きベクトルの方向は画面中心方向を向かう。また、奥に移動する物体を追って撮像した場合、動きベクトルの方向は画面の中心から画面の外側に向かう。カメラ動き検出部２１は、動きベクトルが前後移動のパターンに適合すると（ステップＳ３４；ＹＥＳ）、カメラ動きは前後移動であると特定する（ステップＳ３５）。また、動きベクトルが前後移動のパターンに適合しない場合（ステップＳ３４；ＮＯ）、カメラ動きはないと判定する（ステップＳ３６）。カメラ動き検出部２１は、以上のような手順でカメラ動きの有無およびカメラ動きの種類を検出し、検出結果を領域抽出部２２に出力する。
【００４２】
領域抽出部２２は、動き方向推定部によって推定されたカメラの撮像方向の移動方向をもとに、注目オブジェクトが表示される注目領域を推定し、注目領域の中で動きベクトルの値が閾値より小さいブロックを抽出する。注目領域は、カメラが移動している間常に撮影されている領域であり、注目領域の選択方法はカメラ動きの種類ごとに決められている。図９は、カメラ動きの種類と注目領域との関係を示している。カメラ動きが右方向へのパンであるとき、動きベクトルは左方向を向く。このときの注目領域は、画面の右側３分の２となる。また、カメラ動きが左方向のパンでするときの注目領域は、画面の左側３分の２となる。
【００４３】
カメラ動きが下方向のチルトの場合、動きベクトルは下方向を向く。このときの注目領域は、画面の下側３分の２となる。また、カメラ動きが上方向のチルトの場合、動きベクトルは上方向を向く。このときの注目領域は、画面の上側３分の２となる。
【００４４】
カメラ動きが回り込みの場合、注目領域は、画面の上下左右４分の１の領域を削った残りの領域となる。また、カメラ動きが前後移動の場合、注目領域は、画面の上下左右４分の１の領域を削った残りの領域となる。
【００４５】
領域抽出部２２は、カメラ動き検出部２１の検出結果をもとに注目領域を特定し、注目領域の中で動きベクトルの大きさがオブジェクト判別値以下であるマクロブロックを検索する。領域抽出部２２は、動きベクトルの大きさがオブジェクト判別値以下であるブロックにラベリング処理を行い、画像を領域に分割する。なお、ラベリング処理については、既に上述しているため、その説明を省略する。
【００４６】
領域選択部２３は、上述した注目オブジェクト抽出部２の領域選択部２３と同様の機能を有し、領域抽出部２２によって抽出された領域の面積、重心、アクティビティを算出し、この値を基に最も注目オブジェクトらしい領域を選択する。
【００４７】
以上のように、注目オブジェクト抽出部２は、カメラ動きの方向を推定し、この推定結果をもとに注目オブジェクトの表示される注目領域を推定する。そして、注目オブジェクト抽出部２は、注目領域で最も注目オブジェクトらしい領域を選択する。
【００４８】
次に、注目オブジェクト抽出部３１を適用した例について説明する。図１０は、注目オブジェクト抽出部３１を適用した画像表示装置３の構成を示す図である。画像表示装置３は、入力画像から注目オブジェクトを抽出する注目オブジェクト抽出部３１と、注目オブジェクトを拡大する拡大画像作成部３２、入力画像を表示するメイン表示装置３３と、注目オブジェクトを表示するサブ表示装置３４とを有する。
【００４９】
注目オブジェクト抽出部３１は、注目オブジェクトの表示される領域を抽出する。拡大画像作成部３２は、注目オブジェクトの拡大画像を作成し、作成した拡大画像をサブ表示装置３４に出力する。拡大画像作成部３２は、線形補間する処理によって拡大画像を作成してもよいし、その他の処理によって拡大画像を作成してもよい。また、拡大率は、固定であってもよいし、ユーザが指定してもよい。
【００５０】
メイン表示装置３３は入力画像をそのまま表示し、サブ表示装置３４は拡大した注目オブジェクトを表示する。メイン表示装置３３とサブ表示装置３４は、同時に見える位置に配置される。
【００５１】
このように、画像表示装置３は、入力画像の主要部である注目オブジェクトを抽出し、この注目オブジェクトを拡大した拡大画像をサブ表示装置３４に出力する。メイン表示装置３３とサブ表示装置３４は同時に見える位置に配置されており、ユーザは、入力画像と注目オブジェクトとを同時に観賞することができる。
【００５２】
次に、注目オブジェクト抽出部４１を適用した第２の例について説明する。図１１は、注目オブジェクト抽出部４１を適用した画像表示装置４の構成を示す図である。画像表示装置４は、入力画像から注目オブジェクトを抽出する注目オブジェクト抽出部４１と、注目オブジェクトの周囲に枠を形成する枠作成部４２と、入力画像を表示するメイン表示装置４３と、注目オブジェクトを表示するサブ表示装置４４とを有する。
【００５３】
注目オブジェクト抽出部４１は、注目オブジェクトの表示される領域を抽出する。枠作成部４２は、オブジェクトの表示される領域を入力し、注目オブジェクトの周囲を枠で囲む。枠作成部４２は、注目オブジェクトの周囲を枠で囲んだ入力画像をサブ表示装置４４に出力する。
【００５４】
メイン表示装置４３は入力画像をそのまま表示し、サブ表示装置４４は注目オブジェクトの周囲を枠で囲んだ入力画像を表示する。メイン表示装置４３とサブ表示装置４４は、同時に見える位置に配置される。
【００５５】
このように画像表示装置４は、注目オブジェクトの表示される領域を抽出し、この領域に周囲を枠で囲んだ画像をサブ表示装置４４に出力する。メイン表示装置４３とサブ表示装置４４は、同時に見える位置に配置されており、ユーザは、入力画像のどの位置が注目オブジェクトかを確認しながら、入力画像を観賞することができる。
【００５６】
次に、注目オブジェクト抽出部５１を適用した第３の例について説明する。図１２は、注目オブジェクト抽出部５１を適用した画像表示装置５の構成を示す図である。画像表示装置５は、入力画像から注目オブジェクトを抽出する注目オブジェクト抽出部５１と、注目オブジェクトの表示される領域を拡大する拡大画像作成部５２と、拡大画像を表示するサブ表示装置５５を選択するモニタ選択部５３と、入力画像を表示するメイン表示装置５４と、注目オブジェクトを表示する複数のサブ表示装置５５とを有する。
【００５７】
注目オブジェクト抽出部５１は、注目オブジェクトの表示される領域を抽出する。拡大画像作成部５２は、注目オブジェクトの表示される領域を拡大し、拡大画像を作成する。拡大画像作成部５２は、注目オブジェクトの拡大画像を作成し、作成した拡大画像をモニタ選択部５３に出力する。拡大画像作成部５２は、線形補間する処理によって拡大画像を作成してもよいし、その他の処理によって拡大画像を作成してもよい。また、拡大率は、固定であってもよいし、ユーザが指定してもよい。
【００５８】
モニタ選択部５３は、注目オブジェクトの表示される位置や動きベクトルの方向をもとに注目オブジェクトを表示するサブ表示装置５５を選択する。例えば、モニタ選択部５３は、注目オブジェクトの表示される位置が画面の右側であるとき、右側のサブ表示装置５５に拡大画像を表示し、注目オブジェクトの表示される位置が画面の左側であるとき、左側のサブ表示装置５５に拡大画像を表示させるような処理を行う。また、注目オブジェクトが画面の左から右に移動している際には、左側のサブ表示装置５５から右側のサブ表示装置５５に徐々に拡大画像が移動するような処理を行う。
【００５９】
メイン表示装置５４は入力画像をそのまま表示し、サブ表示装置５５は注目オブジェクトを表示する。サブ表示装置５５は、メイン表示装置５４の周囲に配置され、モニタ選択部５３から選択されたサブ表示領域にのみ、注目オブジェクトが表示される。
【００６０】
このように、画像表示装置５は、複数のサブ表示装置５５を有し、入力画像の変化に応じて注目オブジェクトを表示するサブ表示装置５５選択し、選択したサブ表示領域に注目オブジェクトを表示する。メイン表示装置５４とサブ表示装置５５とは、同時に見える場所に配置されており、ユーザは、入力画像と注目オブジェクトを同時に観賞することができる。
【００６１】
以上説明したように、本発明を適用した注目オブジェクト抽出部は、注目オブジェクトが常にカメラに収まるように撮影される性質を利用して、注目オブジェクトの抽出を行う。
【００６２】
注目オブジェクトの抽出には、動きベクトルを使用し、動きベクトルの大きさが小さい領域を注目オブジェクトが表示される領域であると選択する。また、動きベクトルの大きさが小さい領域が複数ある場合には、領域の面積、重心、アクティビティを算出し、これらの値に基づいて注目オブジェクトが表示される領域を特定する。
【００６３】
また、注目オブジェクト抽出部は、動きベクトルをもとにカメラ動きの方向を推定し、カメラ動きの方向をもとに注目オブジェクトの表示される注目領域を推定する。注目オブジェクト抽出部は、注目領域のなかから注目オブジェクトが表示される領域を抽出する。
【００６４】
なお、本発明は、上記実施の形態に限定されるものではなく、本発明の要旨を含む範囲での変形・改良は、本発明に含まれるものとする。例えば、画像表示装置の表示装置は、メイン表示装置とサブ表示装置と複数表示する必要はなく、メイン表示装置に注目オブジェクトを表示するようにしてもよい。
【００６５】
また、注目オブジェクトの表示領域は、複数の画面に表示させるだけでなく、例えば、注目オブジェクトの表示領域の解像度を上げたり、注目オブジェクトの表示位置に応じて音場制御をしてもよい。
【００６６】
【発明の効果】
上述のように、本発明にかかるオブジェクト抽出装置は、入力された画像の動きベクトルを計算し、動きベクトルをもとにカメラ動きがあったか否かを推定する。オブジェクト抽出装置は、カメラ動きがあったと推定すると、動きベクトルの大きさが所定の閾値以下である領域を注目オブジェクトの候補として抽出するため、画像から主要なオブジェクトを抽出することができる。
【００６７】
また、本発明にかかるオブジェクト抽出方法は、入力された画像の動きベクトルを計算し、動きベクトルをもとにカメラ動きがあったか否かを推定する。オブジェクト抽出装置は、カメラ動きがあったと推定すると、動きベクトルの大きさが所定の閾値以下である領域を注目オブジェクトの候補として抽出するため、画像から主要なオブジェクトを抽出することができる。
【００６８】
また、本発明にかかる画像表示装置は、入力された画像の動きベクトルを計算し、動きベクトルをもとにカメラ動きがあったか否かを推定する。オブジェクト抽出装置は、カメラ動きがあったと推定すると、動きベクトルの大きさが所定の閾値以下である領域を注目オブジェクトの候補として抽出するため、画像から主要なオブジェクトを抽出することができる。
【図面の簡単な説明】
【図１】人物を注目オブジェクトとしたときの画像を示す図である。
【図２】カメラ動きと被注目オブジェクトと背景との関係を示す図である。
【図３】注目オブジェクト抽出部の概略構成を示す機能ブロック図である。
【図４】オブジェクト抽出部の動作を示す図である。
【図５】オブジェクトの移動方向に応じたカメラの構図を示す図である。
【図６】注目オブジェクト抽出部の概略構成を示す機能ブロック図である。
【図７】カメラ動き検出部の動作を示す図である。
【図８】カメラ動きと動きベクトルの関係を示す図である。
【図９】カメラ動きと注目領域の関係を示す図である。
【図１０】画像表示装置の構成を示す機能ブロック図である。
【図１１】画像表示装置の構成を示す機能ブロック図である。
【図１２】画像表示装置の構成を示す機能ブロック図である。
【符号の説明】
１注目オブジェクト抽出部、２注目オブジェクト抽出部、３画像表示装置、４画像表示装置、５画像表示装置、１１遅延バッファ、１２動きベクトル計算部、１３テロップ領域除去部、１４カメラ動き推定部、１５領域抽出部、１６領域選択部、２１カメラ動き検出部、２２領域抽出部、２３領域選択部、３１注目オブジェクト抽出部、３２拡大画像作成部、３３メイン表示装置、４１注目オブジェクト抽出部、４２枠作成部、４３メイン表示装置、４４サブ表示装置、５１注目オブジェクト抽出部、５２拡大画像作成部、５３モニタ選択部、５４メイン表示装置、５５サブ表示装置
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an object extraction device, an object extraction method, and an image display device that extract an object displayed on an image.
[0002]
[Prior art]
An image shows a plurality of objects such as people and things. Conventionally, one method of extracting an object from an image, described in JP-A-12-222584, records values indicating the features of the object together with the image and extracts the object based on those values. Another method, described in Japanese Patent Application Laid-Open No. 9-212648, estimates camera motion and the three-dimensional structure of the image from optical flow and extracts the background and objects.
[0003]
[Problems to be solved by the invention]
However, the methods described above merely extract objects from an image; they cannot extract the main object that the camera operator is focusing on and shooting.
[0004]
The present invention has been made in view of the above problems, and an object thereof is to provide an object extraction device, an object extraction method, and an image display device that extract main objects from an image.
[0005]
[Means for Solving the Problems]
In order to solve the above-described problems, an object extraction device according to the present invention includes: motion vector calculation means for calculating motion vectors of an input image; camera motion estimation means for estimating, based on the motion vectors, whether there was camera motion; region extraction means for extracting, when the camera motion estimation means estimates that there was camera motion, a region in which the magnitude of the motion vector is equal to or less than an object discrimination value serving as a threshold, as a candidate for the object of interest; matching means for matching the motion vectors calculated by the motion vector calculation means against motion vector patterns corresponding to a plurality of types of camera motion; and attention area estimation means for estimating, based on the type of camera motion matched by the matching means, an area containing the object of interest as the attention area. The region extraction means extracts the object of interest within the attention area.
[0006]
In addition, the object extraction method according to the present invention causes a computer to execute: a motion vector calculation step of calculating motion vectors of an input image; a camera motion estimation step of estimating, based on the motion vectors, whether there was camera motion; a region extraction step of extracting, when the camera motion estimation step estimates that there was camera motion, a region in which the magnitude of the motion vector is equal to or less than an object discrimination value serving as a threshold, as a candidate for the object of interest; a matching step of matching the motion vectors calculated in the motion vector calculation step against motion vector patterns corresponding to a plurality of types of camera motion; and an attention area estimation step of estimating, based on the type of camera motion matched in the matching step, an area containing the object of interest as the attention area. In the region extraction step, the object of interest is extracted within the attention area.
[0007]
An image display device according to the present invention includes: motion vector calculation means for calculating motion vectors of an input image; camera motion estimation means for estimating, based on the motion vectors, whether there was camera motion; region extraction means for extracting, when the camera motion estimation means estimates that there was camera motion, a region in which the magnitude of the motion vector is equal to or less than an object discrimination value serving as a threshold, as a candidate for the object of interest; and telop area determination means for determining the area in which a telop (on-screen caption) is displayed. The camera motion estimation means tallies the motion vectors by direction over the range excluding the telop area.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
An image shows a plurality of objects such as people and things. Among these, the object that forms the main part of the image (hereinafter referred to as the object of interest) is always shot so that it stays within the camera's field of view. As a result, the object of interest often shows almost no movement within the image. The image display device to which the present invention is applied calculates motion vectors and selects a region in which the magnitude of the motion vector is smaller than a predetermined threshold as the region where the object of interest is displayed.
[0009]
FIG. 1 is an example of an image when a person is a target object. As shown in FIG. 1, the object of interest is arranged at the center of the screen. When the target object moves to the left, the field of view of the camera also moves to the left together with the target object. Further, when the target object moves to the right, the field of view of the camera also moves to the right together with the target object.
[0010]
When the camera's field of view moves together with the object of interest, as shown in FIG. 2, objects and people other than the object of interest (hereinafter referred to as non-target objects) and the background move in the opposite direction by the amount the field of view moved.
[0011]
Therefore, when the motion vectors of the image are computed, the motion vectors of the object of interest are small, while those of non-target objects and the background are large. The image display device to which the present invention is applied uses this principle to identify a region with small motion vectors as the object of interest. In the following description, movement of the camera's field of view is called camera motion, and the direction of a motion vector is called the motion direction.
[0012]
FIG. 3 is a functional block diagram showing the schematic configuration of the target object extraction unit 1, which extracts the object of interest. The target object extraction unit 1 includes a delay buffer 11 that stores the image of the previous frame, a motion vector calculation unit 12 that calculates motion vectors, a telop area removal unit 13 that removes telop areas, a camera motion estimation unit 14 that estimates whether there is camera motion, a region extraction unit 15 that extracts regions with small motion vectors, and a region selection unit 16 that identifies, from among multiple regions, the region in which the object of interest is displayed.
[0013]
The delay buffer 11 stores the input image and outputs the input image of the previous frame to the motion vector calculation unit 12. The motion vector calculation unit 12 divides the input image and the previous frame's input image into a plurality of blocks, calculates a motion vector for each block, and outputs the motion vectors to the telop area removal unit 13.
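The block-wise motion vector calculation can be sketched as follows. The source does not say which estimation method unit 12 uses, so this sketch assumes exhaustive block matching with a sum-of-absolute-differences (SAD) cost; the function name `block_motion_vectors` and the parameters `block` and `search` are hypothetical, and frames are plain lists of grayscale rows.

```python
def block_motion_vectors(prev, curr, block=4, search=2):
    """Return {(block_row, block_col): (dx, dy)} for each block of `curr`.

    Assumption: exhaustive SAD block matching against the previous frame.
    (dx, dy) is the displacement of the block content from prev to curr,
    with x to the right and y downward.
    """
    h, w = len(curr), len(curr[0])
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    sy, sx = by - dy, bx - dx  # block position in prev
                    if sy < 0 or sx < 0 or sy + block > h or sx + block > w:
                        continue  # candidate falls outside the frame
                    sad = sum(
                        abs(curr[by + y][bx + x] - prev[sy + y][sx + x])
                        for y in range(block)
                        for x in range(block)
                    )
                    # Prefer the lowest SAD; break ties with the smaller motion.
                    key = (sad, abs(dx) + abs(dy))
                    if best is None or key < best[0]:
                        best = (key, (dx, dy))
            vectors[(by // block, bx // block)] = best[1]
    return vectors
```

The previous frame passed as `prev` plays the role of the delay buffer 11 output; a real implementation would likely use a faster search than this exhaustive one.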
[0014]
The telop area removal unit 13 selects, from the motion vectors output by the motion vector calculation unit 12, blocks whose motion vector magnitude is close to zero. When a block's motion vector magnitude stays close to zero for a predetermined time or longer, the unit determines that the area containing the block is a telop area.
[0015]
The camera motion estimation unit 14 estimates whether or not the imaging direction of the camera has moved based on the direction of the motion vector. The camera motion estimation unit 14 classifies the motion vectors for each motion direction and calculates the total. The camera motion estimator 14 estimates that the direction of the camera motion is opposite to the direction of motion when the total result (number of motion vectors) of a certain motion direction is greater than or equal to the threshold value.
[0016]
The region extraction unit 15 receives the estimation result of the camera motion estimation unit 14 and, when there was camera motion, starts extracting the object of interest. It extracts blocks with small motion vectors and performs a labeling process on the extracted blocks. The labeling process assigns numbers to the blocks so that the image can be divided into multiple regions according to the assigned numbers.
[0017]
The region selection unit 16 selects the object of interest from the regions labeled by the region extraction unit 15. When the region extraction unit 15 has extracted a single region, the region selection unit 16 determines that this region is the object of interest.
[0018]
When the region extraction unit 15 has extracted a plurality of regions, the region selection unit 16 calculates the area, center of gravity, and activity of each labeled region and, based on these values, selects the region most likely to be the object of interest.
[0019]
Next, the operation of the target object extraction unit 1 will be described with reference to FIG. 4. The target object extraction unit 1 reads an input image (step S1), which is supplied to the delay buffer 11 and the motion vector calculation unit 12. The delay buffer 11 stores the input image and outputs the input image of the previous frame to the motion vector calculation unit 12. The motion vector calculation unit 12 divides the input image and the previous frame's input image output from the delay buffer 11 into blocks and calculates a motion vector for each block (step S2).
[0020]
The telop area removal unit 13 performs processing to remove blocks determined to be telop areas. There are various methods for determining a telop area. One example is to select, from the motion vectors output by the motion vector calculation unit 12, blocks whose motion vector magnitude is close to zero and, when the magnitude for a selected block does not change for a predetermined time or longer, to determine that the area containing the block is a telop area (step S3).
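The example telop test described above (a near-zero motion vector that persists for a set time) can be sketched as a per-block counter. This is only one possible reading: the class name `TelopDetector`, the tolerance `eps`, and the frame count `hold_frames` are hypothetical stand-ins for "close to zero" and "a predetermined time", which the source does not quantify.

```python
import math


class TelopDetector:
    """Flags blocks whose motion stays near zero for `hold_frames` frames."""

    def __init__(self, eps=0.5, hold_frames=30):
        self.eps = eps                  # assumed meaning of "close to zero"
        self.hold_frames = hold_frames  # assumed "predetermined time", in frames
        self.still_count = {}           # block -> consecutive still frames

    def update(self, vectors):
        """Feed one frame of {(row, col): (dx, dy)}; return the telop blocks."""
        telop = set()
        for block, (dx, dy) in vectors.items():
            if math.hypot(dx, dy) <= self.eps:
                self.still_count[block] = self.still_count.get(block, 0) + 1
            else:
                self.still_count[block] = 0  # motion resets the counter
            if self.still_count[block] >= self.hold_frames:
                telop.add(block)
        return telop
```

The returned set can then be excluded from the direction tally performed by the camera motion estimation unit 14.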
[0021]
The camera motion estimation unit 14 uses the motion vectors to estimate whether the imaging direction of the camera has moved. Motion vector directions are, for example, the six directions rightward, leftward, upward, downward, outward, and inward. The camera motion estimation unit 14 tallies the motion vectors by direction and, if the tally for some direction is equal to or greater than the motion presence/absence discrimination value, estimates that there was camera motion in the direction opposite to that direction (step S5; YES). If the tally for every direction is below the motion presence/absence discrimination value, it estimates that there is no camera motion (step S5; NO) and ends the extraction process.
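A minimal sketch of the step S5 decision follows. It covers only the four axis directions: the source does not explain how a vector is assigned to "outward" or "inward", so those two classes are left out here as a simplifying assumption. The names `direction_of`, `estimate_camera_motion`, `min_mag`, and `motion_threshold` are hypothetical; `motion_threshold` stands in for the motion presence/absence discrimination value.

```python
import math
from collections import Counter

OPPOSITE = {"right": "left", "left": "right", "up": "down", "down": "up"}


def direction_of(dx, dy, min_mag=1.0):
    """Classify a vector by its dominant axis (image y grows downward)."""
    if math.hypot(dx, dy) < min_mag:
        return None  # too small to count as motion in any direction
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


def estimate_camera_motion(vectors, telop_blocks=frozenset(), motion_threshold=4):
    """Return the estimated camera direction, or None if no camera motion."""
    counts = Counter()
    for block, (dx, dy) in vectors.items():
        if block in telop_blocks:
            continue  # telop blocks are excluded from the tally
        direction = direction_of(dx, dy)
        if direction is not None:
            counts[direction] += 1
    if not counts:
        return None
    direction, n = counts.most_common(1)[0]
    # The scene appears to move opposite to the camera.
    return OPPOSITE[direction] if n >= motion_threshold else None
```

For example, if most blocks move left in the image, the sketch reports a camera motion to the right.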
[0022]
The region extraction unit 15 holds an object discrimination value as a threshold for judging whether a motion vector is large or small. It receives the estimation result from the camera motion estimation unit 14 and, when there was camera motion (step S5; YES), compares the magnitude of each motion vector with the object discrimination value and selects the blocks whose motion vector magnitude is smaller than the object discrimination value.
[0023]
Next, the region extraction unit 15 performs a labeling process on the selected blocks. The labeling process assigns numbers to blocks. When selected blocks are adjacent, the region extraction unit 15 gives them the same number; when they are separated, it gives them different numbers. Through this labeling, adjacent blocks are grouped into one region and separated blocks into different regions (step S7). When labeling is finished, the region extraction unit 15 outputs the labeled image data to the region selection unit 16.
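The block selection and labeling steps amount to thresholding followed by connected-component labeling on the block grid. The sketch below assumes 4-connectivity (up, down, left, right) as the meaning of "adjacent", which the source does not specify; the names `select_small_motion_blocks`, `label_blocks`, and `object_threshold` are hypothetical.

```python
import math
from collections import deque


def select_small_motion_blocks(vectors, object_threshold=1.0):
    """Blocks whose motion magnitude is below the object discrimination value."""
    return {
        block
        for block, (dx, dy) in vectors.items()
        if math.hypot(dx, dy) < object_threshold
    }


def label_blocks(selected):
    """Assign one number per group of 4-connected blocks (breadth-first fill)."""
    labels = {}
    next_label = 1
    for start in sorted(selected):
        if start in labels:
            continue  # already part of an earlier region
        labels[start] = next_label
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in selected and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels
```

Blocks that share a label form one candidate region; a single distinct label corresponds to the "one region" branch of step S8.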
[0024]
When the region extraction unit 15 extracts one region (step S8; NO), the region selection unit 16 outputs this region as the object of interest to the sub display device. When the region extraction unit 15 extracts a plurality of regions (step S8; YES), the region selection unit 16 calculates, for each region, values indicating its characteristics, such as area, center of gravity, and activity (step S9).
[0025]
The region selection unit 16 selects a region that is most likely to be an object of interest from among a plurality of regions using the area, center of gravity, and activity calculated in step S9. Several conditions for selection of an object of interest are illustrated. For example, the region selection unit 16 sets a region having a large area as a region where the target object is highly likely to be displayed. This is because the area where the object of interest is displayed is likely to have a larger area than the area where other objects are displayed.
[0026]
In addition, the region selection unit 16 sets a region having a high activity as a region where a target object is highly likely to be displayed. This is because the focus of the camera is often focused on the object of interest, and the image becomes clear and the activity of the object of interest increases.
[0027]
Further, when the imaging direction of the camera is moving to the left, the region selection unit 16 treats a region whose center of gravity lies to the right of the middle of the screen as likely to show the object of interest; when the imaging direction is moving to the right, it treats a region whose center of gravity lies to the left of the middle as likely to show the object of interest. This selection rule is based on how camera operators compose shots: as shown in FIG. 5, a camera operator tends to place the object of interest to the right of the middle of the screen when it is moving left, and to the left of the middle when it is moving right.
[0028]
The region selection unit 16 combines the selection result based on area, the selection result based on the center of gravity, and the selection result based on activity, and selects the region most likely to be the object of interest (step S10).
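One way to combine the three criteria in step S10 is a simple vote, sketched below. The source does not say how the results are combined, so the voting scheme and the tie-break on area are assumptions, as are the names `select_region`, `activity` (a per-block sharpness score supplied by the caller), and `grid_width` (number of block columns).

```python
from collections import Counter


def select_region(regions, activity, camera_direction, grid_width):
    """Pick the region label most likely to be the object of interest.

    regions: {label: [(row, col), ...]}; activity: {(row, col): float}.
    """
    def area(label):
        return len(regions[label])

    def mean_activity(label):
        return sum(activity[b] for b in regions[label]) / area(label)

    def side_score(label):
        # Positive when the centroid lies on the side the operator is assumed
        # to favor: right of the middle for leftward camera motion, and
        # left of the middle for rightward camera motion.
        centroid_x = sum(c for _, c in regions[label]) / area(label)
        offset = centroid_x - (grid_width - 1) / 2
        return offset if camera_direction == "left" else -offset

    criteria = [area, mean_activity]
    if camera_direction in ("left", "right"):
        criteria.append(side_score)

    votes = Counter()
    for score in criteria:
        votes[max(regions, key=score)] += 1  # each criterion votes once
    # Assumed tie-break: prefer the larger region.
    return max(regions, key=lambda label: (votes[label], area(label)))
```

A weighted sum of normalized scores would be an equally plausible reading of "combines"; the vote is used here only because it needs no tuning constants.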
[0029]
Thus, when the visual field of the camera changes together with the target object, the target object extraction unit 1 calculates the motion vector of the image and selects a region with a small motion vector value as the display region of the target object.
[0030]
Further, in addition to motion vectors, the target object extraction unit 1 refers to values such as region area, center of gravity, and activity to identify the object of interest.
[0031]
Next, a target object extraction unit 2 that uses another method will be described. The target object extraction unit 2 estimates, from the direction of camera motion, the attention area in which the object of interest is displayed, and searches for the object of interest within that area.
[0032]
FIG. 6 shows the configuration of the target object extraction unit 2. The target object extraction unit 2 includes a camera motion detection unit 21 that detects the direction of camera motion, a region extraction unit 22 that estimates the attention area from the direction of camera motion and extracts regions with small motion vectors within the attention area, and a region selection unit 23 that selects, from the regions extracted by the region extraction unit 22, the region in which the object of interest is displayed.
[0033]
The camera motion detection unit 21 divides the input image into large blocks, each composed of a plurality of blocks, and estimates the camera motion by checking whether the motion directions of the large blocks match pan, tilt, wraparound, or forward/backward movement. The motion direction of a large block is calculated by tallying the motion directions of the blocks that make it up.
[0034]
Hereinafter, the operation of the camera motion detection unit 21 will be described in detail with reference to FIG. 7. The camera motion detection unit 21 divides the image into four large blocks (step S20). It classifies the motion directions of the motion vectors of the blocks that make up each large block into six directions (upward, downward, rightward, leftward, outward, and inward) and counts the number of blocks for each motion direction (step S21). The camera motion detection unit 21 holds a motion direction discrimination value as a threshold for determining whether there was camera motion. When the block count is equal to or greater than the motion direction discrimination value (step S22; YES), the camera motion detection unit 21 reads the magnitudes of the motion vectors in the direction whose count met the threshold and calculates the variation of these magnitudes (step S23). The camera motion detection unit 21 also holds a directionality determination value as a threshold for determining whether the motion vector magnitudes show a certain regularity. When the variation is equal to or less than the directionality determination value (step S24; YES), the camera motion detection unit 21 determines that there was camera motion in this large block (step S25).
[0035]
When the count tallied in step S22 is less than the motion direction discrimination value (step S22; NO), the camera motion detection unit 21 determines that there is no motion in that large block (step S27) and stops processing. Likewise, when the variation of the motion vector magnitudes in step S24 exceeds the directionality determination value (step S24; NO), the camera motion detection unit 21 determines that there is no motion in that large block (step S27) and ends processing.
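The per-large-block test (steps S21 to S27) and the all-blocks check of step S26 can be sketched as below. Two assumptions are made: the "variation" of the magnitudes is taken to be their population variance, and only the four axis directions are classified, because the source does not define how outward and inward vectors are recognized. The names `large_block_motion`, `frame_motion`, `count_threshold`, and `spread_threshold` are hypothetical stand-ins for the motion direction discrimination value and the directionality determination value.

```python
import math
from statistics import pvariance


def dominant_direction(dx, dy):
    """Classify a vector by its dominant axis (image y grows downward)."""
    if dx == 0 and dy == 0:
        return None
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


def large_block_motion(vectors, count_threshold, spread_threshold):
    """Return the motion direction of one large block, or None if no motion."""
    magnitudes = {}  # direction -> magnitudes of the vectors pointing that way
    for dx, dy in vectors:
        direction = dominant_direction(dx, dy)
        if direction is not None:
            magnitudes.setdefault(direction, []).append(math.hypot(dx, dy))
    if not magnitudes:
        return None
    direction, mags = max(magnitudes.items(), key=lambda kv: len(kv[1]))
    if len(mags) < count_threshold:
        return None  # step S22; NO
    if pvariance(mags) > spread_threshold:
        return None  # step S24; NO (magnitudes too irregular)
    return direction  # step S25


def frame_motion(large_blocks, count_threshold, spread_threshold):
    """Directions of all large blocks, or None if any block shows no motion."""
    directions = [
        large_block_motion(v, count_threshold, spread_threshold)
        for v in large_blocks
    ]
    return None if None in directions else directions
```

The list returned by `frame_motion` is what a later pattern-matching stage would compare against the pan, tilt, wraparound, and forward/backward patterns.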
[0036]
The camera motion detection unit 21 determines whether or not the motion direction has been determined for all the large blocks (step S26). When the motion direction is determined in all large blocks (step S26; YES), the camera motion detection unit 21 proceeds to step S28. If there is a block whose motion direction is not determined (step S26; NO), the camera motion detection unit 21 ends the process assuming that there is no camera motion.
[0037]
Next, the camera motion detection unit 21 detects the type of camera motion of the input image. As shown in FIG. 8, there are four types of camera motion: pan, tilt, wraparound, and forward/backward movement. The camera motion detection unit 21 compares the motion directions of the four large blocks with each type of camera motion, one by one, to determine which type they match.
[0038]
The camera motion detection unit 21 first determines whether the type of camera motion is pan. Pan is an imaging method for imaging an object that moves in the horizontal direction with the camera fixed. When imaging is performed with pan, the direction of the motion vector is horizontally rightward or horizontally leftward. When the motion vector matches the pan pattern (step S28; YES), the camera motion detection unit 21 specifies that the camera motion is pan (step S29).
[0039]
In step S28, if the motion vector does not match the pan pattern (step S28; NO), the camera motion detection unit 21 determines whether the type of camera motion is tilt. Tilt is an imaging method for imaging an object that moves up and down with the camera fixed. When imaging is performed with tilt, the direction of the motion vector is upward in the vertical direction or downward in the vertical direction. When the motion vector matches the tilt pattern (step S30; YES), the camera motion detection unit 21 specifies that the camera motion is tilt (step S31).
[0040]
In step S30, when the motion vector does not match the tilt pattern (step S30; NO), the camera motion detection unit 21 determines whether or not the type of camera motion is wraparound. Wraparound is an imaging method for capturing an object while moving the camera so as to circle around the object. When imaging is performed with wraparound, the direction of the motion vector is horizontally leftward or horizontally rightward. The wraparound motion vectors point in the same direction as the pan motion vectors, but the amount of motion differs. When the motion vector matches the wraparound pattern (step S32; YES), the camera motion detection unit 21 specifies that the camera motion is wraparound (step S33).
[0041]
In step S32, when the motion vector does not match the wraparound pattern (step S32; NO), the camera motion detection unit 21 determines whether or not the type of camera motion is forward/backward movement. Forward/backward movement is a state in which an object that moves back and forth is followed and imaged. When an object moving toward the camera is imaged while the camera pulls back, the motion vectors point toward the center of the screen. When an object moving away from the camera is followed and imaged, the motion vectors point from the center of the screen toward the outside of the screen. When the motion vector matches the forward/backward movement pattern (step S34; YES), the camera motion detection unit 21 specifies that the camera motion is forward/backward movement (step S35). If the motion vector does not match the forward/backward movement pattern (step S34; NO), it is determined that there is no camera motion (step S36). The camera motion detection unit 21 detects the presence or absence of camera motion and the type of camera motion by the above-described procedure, and outputs the detection result to the region extraction unit 22.
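The one-by-one comparison of steps S28 to S36 can be sketched as a cascade of tests. The actual patterns are those of FIG. 8, which are not reproduced here, so the matching rules below are assumptions made only for illustration: each large block is summarized by a hypothetical `(direction, magnitude)` pair, a pattern matches only when all large blocks share one direction, and wraparound is told apart from pan by a spread in magnitude between large blocks larger than a hypothetical `pan_tolerance`.

```python
def classify_camera_motion(large_blocks, pan_tolerance=1.0):
    """Return the camera motion type for the given large blocks, or None.

    large_blocks: list of (direction, magnitude) pairs, one per large block.
    """
    if not large_blocks:
        return None

    directions = {d for d, _ in large_blocks}
    magnitudes = [m for _, m in large_blocks]
    # Assumption: a pattern can only match if every large block agrees.
    direction = next(iter(directions)) if len(directions) == 1 else None
    spread = max(magnitudes) - min(magnitudes)

    # Step S28: horizontal motion with a uniform amount is treated as pan.
    if direction in ("left", "right") and spread <= pan_tolerance:
        return "pan"  # step S29
    # Step S30: vertical motion is treated as tilt.
    if direction in ("up", "down"):
        return "tilt"  # step S31
    # Step S32: horizontal motion whose amount differs (assumed rule).
    if direction in ("left", "right"):
        return "wraparound"  # step S33
    # Step S34: motion toward or away from the screen center.
    if direction in ("outward", "center"):
        return "forward/backward"  # step S35
    return None  # step S36: no camera motion


print(classify_camera_motion([("left", 4.0)] * 4))
```

The order of the tests mirrors the order in the text: pan is tried first, so the wraparound test only sees horizontal motion that has already failed the pan test.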
[0042]
The region extraction unit 22 estimates the attention area in which the attention object is displayed, based on the direction of camera motion estimated by the camera motion detection unit 21, and extracts the blocks in the attention area whose motion vector magnitude is smaller than the threshold. The attention area is an area that is always photographed while the camera is moving, and the method of selecting the attention area is determined for each type of camera motion. FIG. 9 shows the relationship between the type of camera motion and the attention area. When the camera motion is a pan to the right, the motion vector points to the left, and the attention area is the right two-thirds of the screen. When the camera motion is a pan to the left, the attention area is the left two-thirds of the screen.
[0043]
When the camera motion is a downward tilt, the motion vector points downward. The attention area at this time is the lower two-thirds of the screen. In addition, when the camera motion is an upward tilt, the motion vector is directed upward. The attention area at this time is the upper two-thirds of the screen.
[0044]
When the camera motion is wraparound, the attention area is the area that remains after cutting away the upper, lower, left, and right quarters of the screen. The same attention area is used when the camera motion is forward/backward movement.
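The relationship between camera motion and attention area described in paragraphs [0042] to [0044] can be written as a small lookup. This is a sketch only: the function name `attention_region`, the string keys, and the `(left, top, right, bottom)` rectangle convention are assumptions, and the left-pan case is taken as the mirror image of the right-pan case.

```python
def attention_region(camera_motion, width, height):
    """Return the attention area as (left, top, right, bottom).

    Coordinates are in pixels; right and bottom are exclusive.
    """
    center = (width // 4, height // 4, 3 * width // 4, 3 * height // 4)
    regions = {
        # Pan: keep two-thirds of the screen on the side the camera moves to.
        "pan_right": (width // 3, 0, width, height),
        "pan_left": (0, 0, 2 * width // 3, height),  # assumed mirror case
        # Tilt: keep two-thirds of the screen on the side the camera moves to.
        "tilt_down": (0, height // 3, width, height),
        "tilt_up": (0, 0, width, 2 * height // 3),
        # Wraparound and forward/backward: cut a quarter from every side.
        "wraparound": center,
        "forward_backward": center,
    }
    return regions[camera_motion]


print(attention_region("pan_right", 720, 480))
```

For a 720 x 480 screen, a pan to the right keeps the columns from 240 to 720, and wraparound keeps the central 360 x 240 rectangle.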
[0045]
The region extraction unit 22 identifies the attention area based on the detection result of the camera motion detection unit 21, and searches it for blocks whose motion vector magnitude is equal to or smaller than the object determination value. The region extraction unit 22 performs a labeling process on the blocks whose motion vectors are equal to or smaller than the object determination value, and divides the image into regions. Since the labeling process has already been described above, the description thereof is omitted.
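The thresholding and labeling performed inside the attention area can be illustrated as follows. This is a generic sketch, not necessarily the labeling process of the embodiment: it assumes a hypothetical 2D grid of per-block motion vector magnitudes, an attention area given in block coordinates as `(left, top, right, bottom)` with exclusive right and bottom, and 4-connected labeling by breadth-first search.

```python
from collections import deque


def label_still_blocks(magnitudes, region, object_value):
    """Label connected groups of low-motion blocks inside the region.

    Returns (labels, count), where labels is a grid of the same size as
    magnitudes holding 0 for unlabeled blocks and 1..count for regions.
    """
    left, top, right, bottom = region
    rows, cols = len(magnitudes), len(magnitudes[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0

    for y in range(top, bottom):
        for x in range(left, right):
            # Skip blocks already labeled or moving more than the threshold.
            if labels[y][x] or magnitudes[y][x] > object_value:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                # Visit the four neighbors (assumption: 4-connectivity).
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (left <= nx < right and top <= ny < bottom
                            and not labels[ny][nx]
                            and magnitudes[ny][nx] <= object_value):
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count


grid = [
    [0, 0, 9, 0],
    [0, 9, 9, 0],
    [9, 9, 9, 9],
]
labels, count = label_still_blocks(grid, (0, 0, 4, 3), 1)
print(count)
```

In this example the low-motion blocks form two separate groups, one in the upper-left corner and one along the right edge, so two candidate regions are produced for the subsequent region selection.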
[0046]
The region selection unit 23 has the same function as the region selection unit 16 of the attention object extraction unit 1 described above: it calculates the area, the center of gravity, and the activity of each region extracted by the region extraction unit 22 and, based on these values, selects the region most likely to be the attention object.
[0047]
As described above, the attention object extraction unit 2 estimates the direction of camera motion, and estimates the attention area where the attention object is displayed based on the estimation result. Then, the attention object extraction unit 2 selects, within the attention area, the region most likely to be the attention object.
[0048]
Next, an example in which the attention object extraction unit 31 is applied will be described. FIG. 10 is a diagram illustrating the configuration of the image display device 3 to which the attention object extraction unit 31 is applied. The image display device 3 includes an attention object extraction unit 31 that extracts an attention object from the input image, an enlarged image creation unit 32 that enlarges the attention object, a main display device 33 that displays the input image, and a sub display device 34 that displays the attention object.
[0049]
The attention object extraction unit 31 extracts the region where the attention object is displayed. The enlarged image creation unit 32 creates an enlarged image of the attention object and outputs the created enlarged image to the sub display device 34. The enlarged image creation unit 32 may create the enlarged image by a linear interpolation process or by other processes. The enlargement ratio may be fixed or specified by the user.
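As one possible form of the linear interpolation process mentioned above, the following sketch enlarges a grayscale image by bilinear interpolation. The function name `enlarge_bilinear`, the representation of the image as a list of rows of numbers, and the corner-aligned coordinate mapping are assumptions chosen for illustration; the text does not specify them.

```python
def enlarge_bilinear(image, scale):
    """Enlarge a grayscale image (list of rows) by bilinear interpolation."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = int(src_h * scale), int(src_w * scale)
    enlarged = []
    for y in range(dst_h):
        # Map the output row to a source coordinate so that the first and
        # last rows coincide with those of the source (corner alignment).
        sy = y * (src_h - 1) / (dst_h - 1) if dst_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(dst_w):
            sx = x * (src_w - 1) / (dst_w - 1) if dst_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Interpolate horizontally on the two rows, then vertically.
            upper = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            lower = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(upper * (1 - fy) + lower * fy)
        enlarged.append(row)
    return enlarged


small = [[0, 10], [20, 30]]
big = enlarge_bilinear(small, 2)
print(len(big), len(big[0]))
```

The enlargement ratio is passed as `scale`, so either a fixed value or a value specified by the user can be supplied.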
[0050]
The main display device 33 displays the input image as it is, and the sub display device 34 displays the enlarged attention object. The main display device 33 and the sub display device 34 are arranged at a position where they can be seen simultaneously.
[0051]
As described above, the image display device 3 extracts the attention object that is the main part of the input image, and outputs an enlarged image obtained by enlarging the attention object to the sub display device 34. The main display device 33 and the sub display device 34 are arranged at a position where they can be seen at the same time, and the user can view the input image and the object of interest at the same time.
[0052]
Next, a second example to which the attention object extraction unit 41 is applied will be described. FIG. 11 is a diagram illustrating the configuration of the image display device 4 to which the attention object extraction unit 41 is applied. The image display device 4 includes an attention object extraction unit 41 that extracts an attention object from the input image, a frame creation unit 42 that draws a frame around the attention object, a main display device 43 that displays the input image, and a sub display device 44 that displays the attention object.
[0053]
The attention object extraction unit 41 extracts the area where the attention object is displayed. The frame creation unit 42 receives the area where the attention object is displayed and surrounds the attention object with a frame. The frame creation unit 42 outputs the input image, with the attention object surrounded by the frame, to the sub display device 44.
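The frame drawing can be illustrated with a short sketch. It assumes a grayscale image stored as a list of rows and a bounding box `(left, top, right, bottom)` with inclusive pixel coordinates; the function name `draw_frame` and the frame value 255 are hypothetical.

```python
def draw_frame(image, box, value=255):
    """Return a copy of the image with a one-pixel frame drawn on the box."""
    left, top, right, bottom = box  # inclusive coordinates
    framed = [row[:] for row in image]  # leave the input image untouched
    for x in range(left, right + 1):
        framed[top][x] = value      # upper edge
        framed[bottom][x] = value   # lower edge
    for y in range(top, bottom + 1):
        framed[y][left] = value     # left edge
        framed[y][right] = value    # right edge
    return framed


blank = [[0] * 5 for _ in range(5)]
for row in draw_frame(blank, (1, 1, 3, 3)):
    print(row)
```

Because the function works on a copy, the unmodified input image remains available for the main display device while the framed copy is sent to the sub display device.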
[0054]
The main display device 43 displays the input image as it is, and the sub display device 44 displays the input image in which the object of interest is surrounded by a frame. The main display device 43 and the sub display device 44 are arranged at a position where they can be seen simultaneously.
[0055]
In this way, the image display device 4 extracts the area where the attention object is displayed and outputs an image in which this area is surrounded by a frame to the sub display device 44. The main display device 43 and the sub display device 44 are arranged at positions where they can be seen at the same time, so the user can view the input image while confirming where in the input image the attention object is located.
[0056]
Next, a third example to which the attention object extraction unit 51 is applied will be described. FIG. 12 is a diagram illustrating the configuration of the image display device 5 to which the attention object extraction unit 51 is applied. The image display device 5 includes an attention object extraction unit 51 that extracts an attention object from the input image, an enlarged image creation unit 52 that enlarges the area where the attention object is displayed, a monitor selection unit 53 that selects the sub display device 55 on which the enlarged image is displayed, a main display device 54 that displays the input image, and a plurality of sub display devices 55 that display the attention object.
[0057]
The attention object extraction unit 51 extracts the area where the attention object is displayed. The enlarged image creation unit 52 enlarges the area where the attention object is displayed to create an enlarged image, and outputs the created enlarged image to the monitor selection unit 53. The enlarged image creation unit 52 may create the enlarged image by a linear interpolation process or by other processes. The enlargement ratio may be fixed or specified by the user.
[0058]
The monitor selection unit 53 selects the sub display device 55 that displays the attention object based on the position where the attention object is displayed and the direction of the motion vector. For example, when the attention object is displayed on the right side of the screen, the monitor selection unit 53 displays the enlarged image on the right sub display device 55, and when the attention object is displayed on the left side of the screen, it displays the enlarged image on the left sub display device 55. Further, when the attention object is moving from the left to the right of the screen, the monitor selection unit 53 performs processing so that the enlarged image gradually moves from the left sub display device 55 to the right sub display device 55.
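The position-based part of this selection can be sketched in a few lines. The function name `select_sub_display`, the use of the horizontal center of gravity of the attention object, and the split at the middle of the screen are assumptions; the gradual hand-over between sub display devices for a moving object is not modeled here.

```python
def select_sub_display(centroid_x, screen_width):
    """Choose the sub display on the side where the attention object lies."""
    # Assumption: the screen is split at its horizontal midpoint.
    return "right" if centroid_x >= screen_width / 2 else "left"


print(select_sub_display(500, 720))
```

A fuller version would also take the direction of the motion vector into account, as the text describes, for example to begin moving the enlarged image toward the other sub display device before the object crosses the midpoint.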
[0059]
The main display device 54 displays the input image as it is, and the sub display devices 55 display the attention object. The sub display devices 55 are arranged around the main display device 54, and the attention object is displayed only on the sub display device 55 selected by the monitor selection unit 53.
[0060]
As described above, the image display device 5 includes the plurality of sub display devices 55, selects the sub display device 55 that displays the attention object in accordance with changes in the input image, and displays the attention object on the selected sub display device 55. The main display device 54 and the sub display devices 55 are disposed where they can be seen at the same time, so the user can view the input image and the attention object at the same time.
[0061]
As described above, the attention object extraction unit to which the present invention is applied extracts the attention object by using the property that the attention object is photographed so as to always stay within the camera's field of view.
[0062]
For extracting the target object, a motion vector is used, and an area where the magnitude of the motion vector is small is selected as an area where the target object is displayed. Further, when there are a plurality of regions having small motion vectors, the area, the center of gravity, and the activity of the region are calculated, and the region where the target object is displayed is specified based on these values.
[0063]
The attention object extraction unit estimates the direction of camera movement based on the motion vector, and estimates the attention area where the attention object is displayed based on the direction of camera movement. The target object extraction unit extracts a region where the target object is displayed from the target region.
[0064]
Note that the present invention is not limited to the above-described embodiment, and modifications and improvements within the scope including the gist of the present invention are included in the present invention. For example, the image display device need not include a plurality of display devices such as the main display device and the sub display devices, and may display the attention object on the main display device.
[0065]
In addition to displaying the target object display area on a plurality of screens, for example, the resolution of the target object display area may be increased, or the sound field may be controlled according to the display position of the target object.
[0066]
[Effects of the invention]
As described above, the object extraction device according to the present invention calculates a motion vector of an input image, and estimates whether there has been camera motion based on the motion vector. When the object extraction apparatus estimates that there is camera movement, the object extraction apparatus extracts a region having a motion vector magnitude equal to or smaller than a predetermined threshold value as a candidate for the object of interest, so that a main object can be extracted from the image.
[0067]
The object extraction method according to the present invention calculates a motion vector of an input image and estimates whether or not there has been camera motion based on the motion vector. When it is estimated that there has been camera motion, the object extraction method extracts a region whose motion vector magnitude is equal to or smaller than a predetermined threshold value as a candidate for the attention object, so that a main object can be extracted from the image.
[0068]
The image display apparatus according to the present invention calculates a motion vector of an input image and estimates whether or not there has been camera motion based on the motion vector. When it is estimated that there has been camera motion, the image display apparatus extracts a region whose motion vector magnitude is equal to or smaller than a predetermined threshold value as a candidate for the attention object, so that a main object can be extracted from the image.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating an image when a person is a target object.
FIG. 2 is a diagram illustrating a relationship among camera motion, an object of interest, and a background.
FIG. 3 is a functional block diagram illustrating a schematic configuration of a target object extraction unit.
FIG. 4 is a diagram illustrating an operation of an object extraction unit.
FIG. 5 is a diagram illustrating a composition of a camera according to a moving direction of an object.
FIG. 6 is a functional block diagram illustrating a schematic configuration of a target object extraction unit.
FIG. 7 is a diagram illustrating an operation of a camera motion detection unit.
FIG. 8 is a diagram illustrating the relationship between camera motion and motion vectors.
FIG. 9 is a diagram illustrating a relationship between camera movement and a region of interest.
FIG. 10 is a functional block diagram illustrating a configuration of an image display device.
FIG. 11 is a functional block diagram illustrating a configuration of an image display device.
FIG. 12 is a functional block diagram illustrating a configuration of an image display device.
[Explanation of symbols]
1 Attention object extraction unit, 2 Attention object extraction unit, 3 Image display device, 4 Image display device, 5 Image display device, 11 Delay buffer, 12 Motion vector calculation unit, 13 Telop area removal unit, 14 Camera motion estimation unit, 15 Region extraction unit, 16 Region selection unit, 21 Camera motion detection unit, 22 Region extraction unit, 23 Region selection unit, 31 Attention object extraction unit, 32 Enlarged image creation unit, 33 Main display device, 41 Attention object extraction unit, 42 Frame creation unit, 43 Main display device, 44 Sub display device, 51 Attention object extraction unit, 52 Enlarged image creation unit, 53 Monitor selection unit, 54 Main display device, 55 Sub display device

Claims

1. An object extraction device comprising:
motion vector calculation means for calculating motion vectors of an input image;
camera motion estimation means for estimating, based on the motion vectors, whether or not there has been camera motion;
region extraction means for extracting, when the camera motion estimation means estimates that there has been camera motion, a region in which the magnitude of the motion vector is equal to or less than an object determination value serving as a threshold, as a candidate for an attention object;
matching means for matching the motion vectors calculated by the motion vector calculation means against motion vector patterns corresponding to a plurality of types of camera motion; and
attention area estimation means for estimating, as an attention area, an area including the attention object based on the type of camera motion matched by the matching means,
wherein the region extraction means extracts the attention object within the attention area.

2. The object extraction device according to claim 1, wherein the camera motion estimation means totals the number of motion vectors for each motion vector direction and compares the total with a motion direction determination value serving as a threshold to estimate whether or not there has been camera motion.

3. The object extraction device according to claim 2, wherein, when the total is equal to or greater than the motion direction determination value, the camera motion estimation means calculates the variation in the magnitudes of the motion vectors pointing in that direction, compares the variation with a directionality determination value serving as a threshold, and thereby estimates whether or not there has been camera motion.

4. The object extraction device according to claim 1, further comprising region specifying means for calculating, when a region is extracted by the region extraction means, values indicating characteristics of the region and specifying the attention object based on those values.

5. The object extraction device according to claim 4, wherein the values indicating the characteristics of the region calculated by the region specifying means are the center of gravity, the area, and the activity of the region.

6. The object extraction device according to claim 1, wherein the types of camera motion are camera operations at the time of shooting, such as pan, tilt, wraparound, and forward/backward movement.

7. The object extraction device according to claim 1, further comprising telop area determination means for determining an area in which a telop is displayed,
wherein the camera motion estimation means totals the motion vectors in a range excluding the telop area.

8. An object extraction method that causes a computer to execute:
a motion vector calculation step of calculating motion vectors of an input image;
a camera motion estimation step of estimating, based on the motion vectors, whether or not there has been camera motion;
a region extraction step of extracting, when it is estimated in the camera motion estimation step that there has been camera motion, a region in which the magnitude of the motion vector is equal to or less than an object determination value serving as a threshold, as a candidate for an attention object;
a matching step of matching the motion vectors calculated in the motion vector calculation step against motion vector patterns corresponding to a plurality of types of camera motion; and
an attention area estimation step of estimating, as an attention area, an area including the attention object based on the type of camera motion matched in the matching step,
wherein, in the region extraction step, the attention object is extracted within the attention area.

9. An image display device comprising:
motion vector calculation means for calculating motion vectors of an input image;
camera motion estimation means for estimating, based on the motion vectors, whether or not there has been camera motion;
region extraction means for extracting, when the camera motion estimation means estimates that there has been camera motion, a region in which the magnitude of the motion vector is equal to or less than an object determination value serving as a threshold, as a candidate for an attention object; and
telop area determination means for determining an area in which a telop is displayed,
wherein the camera motion estimation means totals the motion vectors for each direction in a range excluding the telop area.

10. The image display device according to claim 9, wherein the camera motion estimation means totals the number of motion vectors for each motion vector direction and compares the total with a motion direction determination value serving as a threshold to estimate whether or not there has been camera motion.

11. The image display device according to claim 10, wherein, when the total is equal to or greater than the motion direction determination value, the camera motion estimation means calculates the variation in the magnitudes of the motion vectors pointing in that direction, compares the variation with a directionality determination value serving as a threshold, and thereby estimates whether or not there has been camera motion.

12. The image display device according to claim 9, further comprising region specifying means for calculating, when a region is extracted by the region extraction means, values indicating characteristics of the region and specifying the attention object based on those values.

13. The image display device according to claim 9, further comprising:
matching means for matching the motion vectors calculated by the motion vector calculation means against motion vector patterns corresponding to a plurality of types of camera motion; and
attention area estimation means for estimating, as an attention area, an area including the attention object based on the type of camera motion matched by the matching means,
wherein the region extraction means extracts the attention object within the attention area.

14. The image display device according to claim 9, further comprising:
a main display unit that displays the input image; and
at least one sub display unit that displays the attention object.

15. The image display device according to claim 14, further comprising image enlarging means for enlarging the attention object,
wherein the sub display unit displays the enlarged attention object.

16. The image display device according to claim 14, further comprising frame drawing means for surrounding the attention object with a frame,
wherein the sub display unit displays an image in which the attention object is surrounded by the frame.

17. The image display device according to claim 14, wherein a plurality of sub display units are arranged around the main display unit, the image display device further comprising display unit selection means for selecting, according to the content of the image, the sub display unit that displays the image of interest.