JP4158544B2

JP4158544B2 - Image search device

Info

Publication number: JP4158544B2
Application number: JP2003033844A
Authority: JP
Inventors: 典司加藤; 洋次鹿志村; 仁池田
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-02-12
Filing date: 2003-02-12
Publication date: 2008-10-01
Anticipated expiration: 2023-02-12
Also published as: JP2004246476A

Description

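[0001]
[Technical Field of the Invention]
The present invention relates to an image search device that searches image data, such as a photograph, for a specific image portion, such as a face.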
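[0002]
[Prior Art]
In recent years, it has been proposed to identify a specific object contained in a photograph or similar image, such as a person's face, and to perform predetermined processing based on the identified portion. Examples include detecting each person's face in a photograph and printing only the face portions, or detecting a person's face in video being captured and passing it to face authentication.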
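[0003]
Some conventional devices for recognizing objects such as face images deal with cases in which the object is hard to recognize because of its imaging state (tilt, size, lighting, and so on). They do this by adapting the imaging state to a predetermined imaging state (a reference state).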
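[0004]
Conventionally, this has been done in three steps. First, a neural network is trained on training image data captured under varying imaging states. The trained network is then used to estimate how far the imaging state of the photograph being processed deviates from the reference state. Finally, image processing is applied to correct that deviation.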
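[0005]
As an example of processing that detects a desired pattern in target image data, methods such as the kernel nonlinear subspace method disclosed in Patent Document 1 are known.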
[0006]
[Patent Document 1]
JP 2001-90274 A
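[0007]
[Problems to be Solved by the Invention]
However, a person's face, for example, varies in many ways: how far it is turned sideways or upward, how far the head is tilted, the lighting, and so on. Detecting deviation from a reference state in the conventional way therefore requires a large amount of training image data covering all of these variations. Moreover, training on so much data makes the neural network enormous, so the processing could not be completed in a realistic time.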
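[0008]
In addition, conventional object-recognition devices treat the entire photograph (or other source image) as the search range. The amount of data to be processed grows accordingly, and the processing load is heavy.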
[0009]
As a result, object-recognition devices have so far been largely confined to the laboratory, and these fundamental technical difficulties need to be removed.
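[0010]
The present invention was made in view of these circumstances. One of its objects is to provide a practical image search device that reduces the processing load of searching a photograph or similar image for a target object and that can be applied to a variety of uses.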
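[0011]
[Means for Solving the Problems]
The invention according to claim 1 is an image search device that searches target image data to be processed for an image data portion containing a search target. The device comprises:
- means for defining at least one search region in the target image data;
- conversion means that:
  - refers to a conversion database obtained by learning from a plurality of learning samples, each converted from a predetermined reference state by a mutually different conversion amount for at least one type of conversion and each associated with information on the conversion amount that returns it to the reference state;
  - acquires, for the search partial data contained in each defined search region, a conversion condition comprising the type and amount of conversion to be applied to bring that search partial data into the reference state; and
  - converts the search partial data into the reference state by repeatedly applying the conversion based on the acquired conversion condition until the conversion amount indicated by the conversion condition for each search partial data becomes "0";
- search means that refers to a search database acquired by learning from example image data of the search target in the reference state, and determines whether each piece of converted search partial data contains the search target; and
- means for generating and outputting search result information that identifies the region of the target image data corresponding to the search partial data determined to contain the search target.

The generated search result information is supplied to predetermined processing.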
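[0012]
The invention according to claim 2 is the image search device according to claim 1, wherein the conversion database includes a group of databases, each storing the conversion conditions learned for one type of conversion. The conversion means searches each database in the group and acquires, for the search partial data contained in each defined search region, a conversion condition comprising the type and amount of conversion to be applied to bring that search partial data into the reference state. It then converts the search partial data into the reference state by repeatedly applying the conversion based on the acquired conversion condition until the conversion amount indicated for each search partial data becomes "0".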
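[0013]
The invention according to claim 3 is an image search method for searching target image data to be processed for an image data portion containing a search target. Using a computer, the method comprises:
- a step in which search-region defining means defines at least one search region in the target image data;
- a step in which conversion means:
  - refers to a conversion database obtained by learning from a plurality of learning samples, each converted from a predetermined reference state by a mutually different conversion amount for at least one type of conversion and each associated with information on the conversion amount that returns it to the reference state;
  - acquires, for the search partial data in each defined search region, a conversion condition comprising the type and amount of conversion needed to bring that data into the reference state; and
  - converts the search partial data into the reference state by repeatedly applying the conversion until the conversion amount indicated for each search partial data becomes "0";
- a step in which search means refers to a search database acquired by learning from example image data of the search target in the reference state, and determines whether each piece of converted search partial data contains the search target; and
- a step in which output means generates and outputs search result information identifying the region of the target image data corresponding to the search partial data determined to contain the search target.

The generated search result information is supplied to predetermined processing.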
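[0014]
The invention according to claim 4 is a program for searching target image data to be processed for an image data portion containing a search target. The program causes a computer to function as:
- means for defining at least one search region in the target image data;
- conversion means that:
  - refers to a conversion database obtained by learning from a plurality of learning samples, each converted from a predetermined reference state by a mutually different conversion amount for at least one type of conversion and each associated with information on the conversion amount that returns it to the reference state;
  - acquires, for the search partial data in each defined search region, a conversion condition comprising the type and amount of conversion to be applied to bring that data into the reference state; and
  - converts the search partial data into the reference state by repeatedly applying the conversion based on the acquired conversion condition until the conversion amount indicated for each search partial data becomes "0";
- search means that refers to a search database acquired by learning from example image data of the search target in the reference state, and determines whether each piece of converted search partial data contains the search target; and
- means for generating and outputting search result information that identifies the region of the target image data corresponding to the search partial data determined to contain the search target.

The generated search result information is supplied to predetermined processing.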
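[0020]
[Embodiments of the Invention]
[Basic configuration]
Embodiments of the present invention will be described with reference to the drawings. As shown in FIG. 1, the image search device according to the embodiment comprises a control unit 11, a storage unit 12, a database unit 13, a display unit 14, an operation unit 15, and an external storage unit 16. These units are connected to one another via a bus; in other words, the device is implemented with an ordinary computer. The computer may also be built into another product, such as a camera.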
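[0021]
The control unit 11 operates according to a program stored in the storage unit 12 and executes four kinds of processing:
- search region defining processing, which defines at least one search region in the target image data to be processed;
- conversion processing, which converts each search region into the reference state;
- search processing, which detects the search regions that contain the search target; and
- predetermined processing that uses the search results.

The specific processing performed by the control unit 11 is described in detail later.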
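[0022]
The storage unit 12 stores the software executed by the control unit 11. It also serves as work memory that holds the various data the control unit 11 needs during processing. Concretely, the storage unit 12 can be implemented as a storage medium such as a hard disk, as semiconductor memory, or as a combination of these.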
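[0023]
As described later, the database unit 13 contains a conversion database 13a used in the conversion processing of the control unit 11 and a search database 13b used in the search processing. Concretely, the database unit 13 is a storage medium such as a hard disk. The storage unit 12 may double as the database unit 13, but the two are shown separately here for clarity.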
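[0024]
The display unit 14 is, for example, a display device or a printer, and displays information according to instructions from the control unit 11. The operation unit 15 is, for example, a keyboard or mouse; it accepts user operations and outputs their content to the control unit 11.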
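[0025]
The external storage unit 16 reads programs and data from a computer-readable removable medium (a type of storage medium), such as a CD-ROM or DVD-ROM. It outputs them to the control unit 11, which stores them in the storage unit 12. The program according to this embodiment can be distributed on a portable storage medium such as a CD-ROM and used after being copied into the storage unit 12 via the external storage unit 16. The program may instead be copied into the storage unit 12 from a server on a network via a communication unit (not shown).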
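[0026]
[Processing of the control unit 11]
The processing performed by the control unit 11 is now described concretely. In this embodiment, the image data to be processed (target image data) is input from outside via the external storage unit 16 or a communication unit (not shown) and stored in the storage unit 12. There may be one or more pieces of target image data. The user can operate the operation unit 15 to instruct the control unit 11 to search specific target image data for the search target (a processing start instruction). The control unit 11 then starts the processing shown in FIG. 2.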
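[0027]
The control unit 11 successively reduces the target image data and defines search regions in each reduced copy. For each search region, it performs the conversion to the reference state and the search processing. It then generates map data indicating which parts of each reduced copy contain the search target.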
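[0028]
Specifically, the control unit 11 first sets the reduction ratio S to the minimum reduction ratio (for example, 1x, i.e., no reduction) (S1) and reduces the target image data by the ratio S (S2). It then allocates in the storage unit 12 a map data area of the same size as the reduced target image data. The map data is initialized by setting every value in this area to "false" (S3). For example, if the reduced target image data is 1000 × 1000 pixels, 1000 × 1000 bits are allocated and every bit is initialized to "0".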
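[0029]
Next, the control unit 11 defines at least one search region in the reduced target image data (S4); this search region defining processing is described in detail later. It then selects one of the search regions (S5) and executes the conversion processing on it (S6), which is also described in detail later.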
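[0030]
The control unit 11 then executes search processing (S7): it determines whether the image data portion in the converted search region contains the search target. If it does (Yes), the map data values in the region corresponding to the converted search region are set to "true" (S8). The control unit then checks whether any search region remains unselected (S9). If so (Yes), it returns to S5, selects one of the unselected search regions, and continues (A).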
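[0031]
If the search processing in S7 determines that the search target is not contained (No), processing proceeds directly to S9. If no unselected search region remains in S9 (No), conversion and search processing have been completed for all search regions. The control unit then checks whether the current reduction ratio S exceeds a predetermined maximum reduction ratio (S10). If it does not (No), the control unit increases S (S11) and returns to S2 (B). Step S11 may, for example, raise S by a predetermined ratio. Alternatively, it may multiply the current ratio S by a predetermined factor ΔS to obtain the new ratio, S = S × ΔS.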
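[0032]
If in S10 the current reduction ratio S exceeds the predetermined maximum reduction ratio (Yes), the control unit uses the map data for each reduction ratio to determine which regions of the original (unreduced) target image data contain the search target (S12). The processing then ends.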
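The S1 to S12 loop of FIG. 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation:
- `reduce_image` uses nearest-neighbour subsampling.
- `define_regions`, `convert` and `contains_target` are hypothetical callbacks standing in for the search region defining, conversion and search processing.
- For simplicity, the map marks the pre-conversion window rather than the converted region.

```python
import numpy as np

def reduce_image(image, s):
    """Nearest-neighbour reduction by ratio s (s = 1 means no reduction)."""
    h, w = image.shape
    rows = (np.arange(max(1, int(h / s))) * s).astype(int)
    cols = (np.arange(max(1, int(w / s))) * s).astype(int)
    return image[np.ix_(rows, cols)]

def multiscale_search(image, define_regions, convert, contains_target,
                      s_min=1.0, s_max=4.0, delta_s=2.0):
    """Return a list of (S, map_data) pairs, one per reduction ratio."""
    maps = []
    s = s_min                                           # S1
    while True:
        reduced = reduce_image(image, s)                # S2
        map_data = np.zeros(reduced.shape, dtype=bool)  # S3: every cell "false"
        for x, y, w, h in define_regions(reduced, s):   # S4, S5, S9
            patch = convert(reduced[y:y + h, x:x + w])  # S6 (hypothetical callback)
            if contains_target(patch):                  # S7 (hypothetical callback)
                map_data[y:y + h, x:x + w] = True       # S8
        maps.append((s, map_data))
        if s > s_max:                                   # S10
            return maps                                 # S12 merges these maps
        s *= delta_s                                    # S11: S = S x dS
```

As in the flowchart, the ratio is checked against the maximum only after that ratio has been processed. The first ratio that exceeds `s_max` is therefore still searched.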
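[0033]
[Search region defining processing]
Next, the processing by which the control unit 11 defines search regions (search region defining processing) is described. It can use start-point information entered by the user. Alternatively, it can autonomously define regions that satisfy a predetermined condition.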
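[0034]
When user-entered information is used, the control unit 11 receives at least one start-point coordinate via the operation unit 15 or the like. For each start point, it defines as a search region a rectangle of predetermined size whose upper-left corner is that start point.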
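[0035]
When regions are defined autonomously, the control unit 11 starts at the upper-left corner of the (reduced) target image data (e.g., X = 0, Y = 0). It checks whether a rectangle of predetermined size at the start point satisfies a predetermined defining condition; if it does, that rectangle becomes a search region. This is repeated while moving the start point across the width in fixed steps (X = X + ΔX). When the start point passes the width of the target image data (X > width of the target image data), it moves down by a fixed amount (X = 0, Y = Y + ΔY) and the horizontal pass is repeated. In this way, every region of the target image data that satisfies the defining condition is defined as a search region.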
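[0036]
Here the start point moves by ΔX horizontally and ΔY vertically. These step sizes may be varied with the reduction ratio S of the target image data: taking ΔX and ΔY as the steps at 1x, the steps become ΔX/S and ΔY/S.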
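The raster scan of [0035] and [0036] can be sketched as below. This is an illustration under stated assumptions:
- `condition` is a hypothetical predicate standing in for the unspecified defining condition.
- Windows that would cross the image border are skipped (the source only says the scan stops when X exceeds the width).
- Steps are rounded to whole pixels, with a minimum of 1.

```python
import numpy as np

def define_search_regions(image, size, dx, dy, s, condition):
    """Raster-scan start points and keep size x size windows that satisfy condition."""
    step_x = max(1, round(dx / s))   # optional scaling dX/S from [0036]
    step_y = max(1, round(dy / s))   # optional scaling dY/S from [0036]
    h, w = image.shape
    regions = []
    y = 0
    while y + size <= h:             # move down: X = 0, Y = Y + dY
        x = 0
        while x + size <= w:         # move across: X = X + dX
            if condition(image[y:y + size, x:x + size]):
                regions.append((x, y, size, size))
            x += step_x
        y += step_y
    return regions
```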
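[0037]
[Conversion processing]
Next, the conversion processing is described. One characteristic feature of this embodiment is that the conversion is performed in stages, and each stage applies a conversion in a single degree of freedom. The control unit 11 computes predetermined feature vector information from the image data portion contained in a search region. This feature vector information is a vector made up of multiple feature elements chosen to suit the nature of the search target.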
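[0038]
In the description of this embodiment, the control unit 11 identifies the conversion by the kernel nonlinear subspace method. It uses this feature vector information together with the feature vector information stored in the conversion database 13a.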
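[0039]
[Contents of the conversion database 13a]
The kernel nonlinear subspace method is widely known as a way of classifying data into categories, so only an outline is given here:
- Consider a space F spanned by the feature elements as its basis. Each of multiple subspaces Ω contained in F is treated as a category into which data is classified.
- The feature vector information in F created from the data to be classified (call it Φ) is projected onto each subspace Ω (call the result φ).
- The subspace Ω with the smallest distance E between the unprojected vector Φ and the projected vector φ is found (tentatively called the nearest subspace).
- The data is judged to belong to the category represented by that subspace.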
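[0040]
In the learning stage, learning samples (example data for learning) that should belong to the same category must have the same nearest subspace Ω. To achieve this, one or both of the following are adjusted: the nonlinear mapping into F (i.e., the parameters of the kernel function and so on), and the hyperplanes separating the subspaces Ω of the categories.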
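The nearest-subspace classification of [0039] can be illustrated with a kernel subspace classifier. This is a swapped-in simplification, not the learning procedure of [0040]:
- The kernel is a fixed Gaussian (RBF), so k(z, z) = 1.
- Each category's subspace is spanned by the leading eigenvectors of that category's kernel matrix.
- No kernel parameters or hyperplanes are tuned.

The squared residual E² = ‖Φ(z)‖² − ‖projection of Φ(z)‖² is computed entirely through kernel evaluations. The names `KernelSubspace`, `r` and `gamma` are hypothetical.

```python
import numpy as np

def rbf(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelSubspace:
    """One category: an r-dimensional subspace of the kernel feature space F."""
    def __init__(self, samples, r, gamma):
        self.x, self.gamma = samples, gamma
        lam, v = np.linalg.eigh(rbf(samples, samples, gamma))
        idx = np.argsort(lam)[::-1][:r]                  # top-r eigenpairs
        # Scale so basis vectors u_i = sum_j a_ji * Phi(x_j) are orthonormal in F.
        self.a = v[:, idx] / np.sqrt(np.maximum(lam[idx], 1e-12))

    def sq_distance(self, z):
        """E^2 = ||Phi(z)||^2 - ||projection of Phi(z) onto the subspace||^2."""
        k = rbf(z[None, :], self.x, self.gamma)[0]      # k(x_j, z)
        coef = k @ self.a                               # <u_i, Phi(z)>
        return 1.0 - float(coef @ coef)                 # k(z, z) = 1 for RBF

def classify(z, subspaces):
    """Return (category, E^2) for the nearest subspace."""
    errs = {c: s.sq_distance(z) for c, s in subspaces.items()}
    best = min(errs, key=errs.get)
    return best, errs[best]
```

In the conversion database of [0041], each category would be one conversion amount for a given degree of freedom.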
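[0041]
In this embodiment, the conversion database 13a is built to determine how to convert the search target into the reference state, that is, the type and amount of conversion. For image data not known to be in the reference state, the conversion database 13a is acquired by learning so that the type and amount (category) of conversion to apply can be determined. A separate conversion database 13a is created for each type (degree of freedom) of conversion applied to the image: rotation, translation, and scaling. Each of these databases is acquired by learning with the conversion amount of its degree of freedom as the category.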
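[0042]
For this learning, the training process of the conversion database 13a in this embodiment generates learning samples as follows:
1. Multiple example images of the search target in a predetermined reference state are prepared.
2. For each example and each degree of freedom, multiple converted images are generated, each converted by a different amount in that degree of freedom.
3. The converted images generated for each degree of freedom become the learning samples for that degree of freedom.

Concretely, when faces are the search target, multiple images of faces in a predetermined reference state (predetermined shooting conditions and pose) are prepared. Converted images are then generated for each degree of freedom, such as rotation, translation, and scaling:
- Rotation: images rotated through angles from −180° to 180° in steps of, for example, 5° become the learning samples for the rotational degree of freedom.
- Translation: images shifted vertically and horizontally in 5-pixel steps become the learning samples for the translational degree of freedom.

Because these learning samples include degrees of freedom such as translation, they are generated by cropping. A window the size of the reference state is cropped from image data covering a larger area, moving the window 5 pixels at a time.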
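[0043]
In this way, each example image yields multiple images, with multiple conversions per degree of freedom. Each image is associated with information describing the conversion applied to it, such as the magnitude of the conversion amount.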
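[0044]
Here, to obtain images converted by mutually different amounts, the conversion amount is varied in fixed steps (e.g., 5° for rotation) and the resulting images are included in the learning samples. Instead of fixed steps, the conversion amount may also be drawn from random numbers, with each resulting image included in the learning samples.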
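Sample generation per [0042] to [0044] might look like the sketch below. These assumptions are not taken from the source:
- The reference crop sits at (`margin`, `margin`) inside a larger image of side `ref_size + 2 * margin`.
- Each translation sample is labelled with the shift that undoes its crop offset; the sign convention is illustrative.
- Rotation is reduced to enumerating the amounts, either on a fixed grid or at random, so no image rotation routine is needed.

```python
import numpy as np

def translation_samples(large, ref_size, margin, step=5):
    """Crop ref_size windows around the reference crop, moving `step` pixels at a time.

    Returns (patch, (dx, dy)) pairs; (dx, dy) is the hypothetical label: the shift
    that moves the patch back to the reference position.
    """
    c = margin                                   # reference crop origin (assumption)
    samples = []
    for oy in range(-margin, margin + 1, step):
        for ox in range(-margin, margin + 1, step):
            patch = large[c + oy:c + oy + ref_size, c + ox:c + ox + ref_size]
            samples.append((patch, (-ox, -oy)))
    return samples

def rotation_amounts(step=5, rng=None, n_random=0):
    """Angles from -180 to 180 degrees in fixed steps, or random angles ([0044])."""
    if rng is not None:
        return list(rng.uniform(-180.0, 180.0, size=n_random))
    return list(range(-180, 181, step))
```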
[0045]
Next, the conversion database 13a for each degree of freedom is trained on the learning samples for that degree of freedom.
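[0046]
[Operation of the conversion processing]
Using each conversion database 13a trained in this way, the control unit 11 performs the conversion processing on every search region defined by the search region defining processing, as follows:
1. The image data portion in the search region being processed is mapped to feature vector information in the space F. The image data portion can be regarded as a vector, e.g., a sequence of pixel values. The feature vector information is the set of features defined for each conversion database 13a, i.e., for each degree of freedom.
2. That mapping is projected onto each subspace Ω.
3. The conversion amount with the smallest distance E between the unprojected and projected feature vector information is determined.

The control unit 11 also computes the squared distance L = E² and holds it in the storage unit 12 as the error.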
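[0047]
A conversion amount is thus determined for each degree of freedom from the corresponding conversion database 13a. The control unit 11 selects one of these amounts based on a predetermined condition, for example the one with the smallest distance E. It then applies the conversion of the selected degree of freedom by the selected amount.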
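[0048]
For example, the conversion databases 13a might yield the following for the image data portion in a search region:
- Rotation: a 10° rotation brings the data closer to the reference state, with error Lr.
- Translation: a 5-pixel shift to the left brings it closer, with error Lp.

The conversion of the degree of freedom with the smallest error is selected. In this example, if Lr < Lp, a 10° rotation is applied to the search region to define a new search region. The image data portion in this new search region is again mapped to feature vector information in F and projected onto each subspace Ω. The process then repeats from determining the conversion amount that minimizes the distance E.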
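[0049]
When the conversion amounts for all degrees of freedom are "0" (i.e., no conversion), processing of the region ends. If unprocessed search regions remain, one of them becomes the next target of the conversion processing.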
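The stage-by-stage normalization of [0046] to [0049] can be sketched as a greedy loop. This is a minimal sketch under stated assumptions:
- Each hypothetical `estimators[dof]` returns the (amount, error L) that the conversion database for that degree of freedom would report.
- Each hypothetical `appliers[dof]` applies a conversion.
- Degrees of freedom proposing 0 are excluded before the minimum-error selection.
- `max_steps` guards against non-convergence.

The last two points are design choices, not taken from the source.

```python
from typing import Callable, Dict, Tuple

Estimator = Callable[[object], Tuple[float, float]]   # patch -> (amount, error L)
Applier = Callable[[object, float], object]           # (patch, amount) -> patch

def normalize(patch, estimators: Dict[str, Estimator],
              appliers: Dict[str, Applier], max_steps: int = 50):
    """Apply one degree of freedom per stage until every proposed amount is 0."""
    history = []
    for _ in range(max_steps):
        proposals = {dof: est(patch) for dof, est in estimators.items()}
        nonzero = {d: p for d, p in proposals.items() if p[0] != 0}
        if not nonzero:                       # all amounts are "0": reference state
            return patch, history
        dof = min(nonzero, key=lambda d: nonzero[d][1])   # smallest error L
        amount = nonzero[dof][0]
        patch = appliers[dof](patch, amount)
        history.append((dof, amount))
    return patch, history
```

The test treats a toy state tuple (angle, shift) as the "patch". That makes the order of stages easy to follow: translation (error 1.0) is applied before rotation (error 1.5).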
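[0050]
Here the image data portion in the defined search region is used as-is. Its resolution may instead be reduced to produce coarse data, and the conversion processing executed on that coarse data. In that case, each conversion database 13a is trained in advance with learning samples corresponding to the coarse data.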
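[0051]
The control unit 11 may also process the degrees of freedom in parallel rather than one after another. This covers computing the feature vector information, mapping onto subspaces, and evaluating distances and errors.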
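[0052]
The kernel nonlinear subspace method is used here as an example. Other methods, such as an autoencoder, may be used, provided they can classify data and evaluate the error of the classification.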
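[0053]
[Search processing]
Next, the search processing is described. It uses the search database 13b to decide whether the image data portion in each converted search region contains the search target. One concrete example of search processing is the method disclosed in JP 2002-329188 A, outlined below.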
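[0054]
[Learning process of the search database]
The search database 13b is formed by training a neural network on learning samples: example image data of the search target in the reference state. The control unit 11 receives multiple learning samples and computes, for each, a set of features (a feature vector) chosen in advance to suit the nature of the search target. This produces the learning data.

Using this learning data, the control unit 11 forms a lattice-space map by SOM (self-organizing map) on an M × M′ lattice space stored in the storage unit 12:
1. The distance between the input feature vector and the weight vector assigned to each lattice node is computed with a predetermined measure (e.g., Euclidean).
2. The node c with the smallest distance (the best-matching node) is found.
3. The weight vectors of multiple nodes in the neighbourhood of the best-matching node are updated using the input feature vector.

Repeating this forms the lattice-space map in the storage unit 12, in which the best-matching nodes for mutually similar feature vectors form contiguous regions. In other words, the lattice space holds a topology-preserving nonlinear projection from the multidimensional input (the feature vectors) onto a two-dimensional map. Updating the weights organizes the characteristic parts of the data, so that after learning, nodes that respond to similar data lie close to one another.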
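A compact self-organizing map illustrating the best-matching-node search and neighbourhood update of [0054]. The source gives no schedule, so these are assumptions:
- The neighbourhood is Gaussian.
- The learning rate and neighbourhood width decay linearly.
- Weights are initialized randomly.

`train_som` and `best_matching_node` are hypothetical names.

```python
import numpy as np

def train_som(data, m, m2, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train an m x m2 lattice of weight vectors on feature vectors `data`."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(m, m2, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(m), np.arange(m2), indexing="ij"), axis=-1)
    sigma0 = sigma0 or max(m, m2) / 2.0
    n_steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / n_steps
            lr = lr0 * (1 - frac)                          # decaying learning rate
            sigma = sigma0 * (1 - frac) + 0.5              # shrinking neighbourhood
            d = np.linalg.norm(weights - x, axis=-1)       # Euclidean measure
            c = np.unravel_index(np.argmin(d), d.shape)    # best-matching node c
            g2 = ((grid - np.array(c)) ** 2).sum(-1)       # lattice distance to c
            h = np.exp(-g2 / (2 * sigma ** 2))[..., None]  # neighbourhood weight
            weights += lr * h * (x - weights)
            t += 1
    return weights

def best_matching_node(weights, x):
    """Lattice index of the node whose weight vector is closest to x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)
```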
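[0055]
When learning from all the learning data is complete, the control unit 11 classifies each node of the lattice-space map into a category. This can be based, for example, on the distances between nodes (the distances between their weight vectors). The nodes are divided into two categories: groups that respond to image data resembling the search target (the search-target category), and groups that do not (the non-search-target category).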
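[0056]
[Operation of the search processing]
The control unit 11 allocates in the storage unit 12 an area for map data of the same size as the target image data and initializes its values to "false".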
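[0057]
Using the learned search database 13b, the control unit 11 computes a predetermined feature vector from the image data portion of each converted search region. It computes the distance between this feature vector and the weight vector of each node in the search database 13b, and identifies the closest node (the best-matching node). If that node belongs to the search-target category, the search region is judged to contain the search target. If it belongs to the non-search-target category, the search region is judged not to contain it.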
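The node categorization of [0055] and the decision of [0057] can be sketched as follows. [0055] only says the categorization may be based on distances between node weight vectors. This sketch assumes a hypothetical `seed_node` known to respond to the search target and a hypothetical distance `threshold`; both are illustrative.

```python
import numpy as np

def categorize_nodes(weights, seed_node, threshold):
    """True = search-target category: weight vector within threshold of the seed's."""
    d = np.linalg.norm(weights - weights[seed_node], axis=-1)
    return d <= threshold

def contains_target(weights, is_target, feature):
    """Judge a converted search region by the category of its best-matching node."""
    d = np.linalg.norm(weights - feature, axis=-1)
    c = np.unravel_index(np.argmin(d), d.shape)   # best-matching node
    return bool(is_target[c])
```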
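[0058]
[Operation of the control unit 11]
As described above, the control unit 11 of this embodiment:
1. successively reduces the target image data to be searched;
2. extracts from each reduced copy the regions to be searched;
3. converts the image data in each region into the reference state; and
4. determines whether the search target is contained in the image data of the converted search region.

Because the image data is brought into the reference state before the search, the position of a search region before conversion may be somewhat off. Likewise, it does not matter if the reduction ratio deviates somewhat from the reference state.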
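[0059]
Conventional methods generated reduced images at many levels, shrinking by 0.8x each time, and extracted search regions while shifting them one pixel at a time. This embodiment can instead reduce by 0.5x each time. Even when it autonomously extracts regions satisfying a predetermined condition, ΔX and ΔY can be set to, for example, 6 pixels. This greatly reduces the number of patterns subject to search processing, lightening the load of searching a photograph or similar image for the target object.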
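The reduction in the number of patterns claimed in [0059] can be checked with rough arithmetic. The 1000 × 1000 image size comes from [0028]. These are assumptions:
- The window is a hypothetical 32 × 32.
- Scales stop once the reduced image is smaller than the window.
- The stride is fixed at every level, ignoring the optional ΔX/S scaling of [0036].

```python
def count_windows(size, window, scale_step, stride):
    """Total number of window positions over all pyramid levels of a square image."""
    total, s = 0, 1.0
    while True:
        side = int(size * s)
        if side < window:                    # smallest level can no longer hold a window
            return total
        n = (side - window) // stride + 1    # positions per row (and per column)
        total += n * n
        s *= scale_step

conventional = count_windows(1000, 32, 0.8, 1)   # 0.8x steps, 1-pixel shifts
proposed = count_windows(1000, 32, 0.5, 6)       # 0.5x steps, 6-pixel shifts
```

Under these assumptions, the proposed setting examines well over an order of magnitude fewer windows than the conventional one.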
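[0060]
As search result information, the control unit 11 generates map data indicating the regions judged to contain the search target. One set of map data is generated for the target image data at each reduction ratio, each the size of the corresponding reduced image. These sets of map data are used together to determine the regions that contain the search target.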
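[0061]
For example, each set of map data may be enlarged by the magnification corresponding to its reduction ratio, so that all match the size of the original target image data, and then compared. A region that is "true" in every set of map data has been judged to contain the search target at every reduction ratio; such a region may be judged to contain the search target. Alternatively, a region that is "true" in any one set of map data may be judged to contain the search target.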
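Both merging rules of [0061] fit in a few lines. The sketch assumes nearest-neighbour enlargement, with each map given as a (reduction ratio S, Boolean array) pair as produced per ratio. `mode="all"` keeps regions that are true at every ratio, and `mode="any"` keeps regions true at any ratio.

```python
import numpy as np

def merge_maps(maps, shape, mode="all"):
    """Enlarge each (S, map) pair to `shape` and combine with AND ("all") or OR ("any")."""
    h, w = shape
    enlarged = []
    for s, m in maps:
        rows = np.minimum((np.arange(h) / s).astype(int), m.shape[0] - 1)
        cols = np.minimum((np.arange(w) / s).astype(int), m.shape[1] - 1)
        enlarged.append(m[np.ix_(rows, cols)])   # nearest-neighbour enlargement
    stack = np.stack(enlarged)
    return stack.all(axis=0) if mode == "all" else stack.any(axis=0)
```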
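[0062]
[Operation and application examples]
Next, application examples are described for the processing that finds regions of the target image data containing the search target.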
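[0063]
The image search device of this embodiment can, for example, count how many people appear in a photograph. A scanned photograph is used as the target image data, and the databases 13 are trained with human faces as the search target. The processing of FIG. 2 is then performed, and the number of regions that are "true" in the map data is counted. This can be used, for example, for traffic censuses.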
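Counting the "true" regions in [0063] amounts to counting connected components in the Boolean map. The sketch below assumes 4-connectivity and a breadth-first flood fill; the source does not say how adjacent true cells are grouped.

```python
from collections import deque
import numpy as np

def count_regions(map_data):
    """Count 4-connected groups of True cells in a Boolean map."""
    h, w = map_data.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for y in range(h):
        for x in range(w):
            if map_data[y, x] and not seen[y, x]:
                count += 1                          # new region found
                queue = deque([(y, x)])
                seen[y, x] = True
                while queue:                        # flood-fill the region
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and map_data[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count
```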
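[0064]
Similarly, with the databases 13 trained on human faces, image data captured digitally by a camera can serve as the target image data. The device detects where in the image a face appears, and the face in the detected portion can then be used for personal authentication.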
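[0065]
In another application, image data read from multiple photographs serves as the target image data:
1. The processing of FIG. 2 detects where human faces appear in the image data for each photograph.
2. The user specifies, as a search condition, where in the photograph a face should appear.
3. Detection results matching that condition are retrieved.
4. The photographs corresponding to the matching results are presented to the user.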
[0066]
Applying this to image data obtained from web servers on a network also makes it possible to find, among images published on the Web, those in which people are arranged in a particular way.
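[0067]
Applied to a camera, this example enables the following:
1. The user specifies a desired arrangement of faces in advance.
2. Each image captured digitally in continuous shooting is treated as target image data.
3. The processing of FIG. 2 detects the arrangement of faces in each image.
4. An image is saved to the storage unit 12 only when the detected arrangement matches the one specified by the user.

This makes it possible to take a photograph in which the faces are in the desired arrangement. It is useful, for example, when photographing two children, where it is hard to catch the moment both face the camera. The processing of FIG. 2 detects faces at the corresponding positions only when they face the front. It can also replace a self-timer or remote control for group photographs: the camera records the image when the person who set it up reaches the desired position in the frame.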
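[0068]
In this embodiment, the search target is not limited to human faces. The device can also be used, for example, to count parts moving on a conveyor in an industrial parts factory. When photographing nature or recording scientific experiments, it can record an image when the subject reaches a desired arrangement.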
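[0069]
[Modification of the processing]
It is also preferable to judge, at the search region defining stage, whether a region may contain the search target, and to define search regions based on that judgment. The reason is that regions defined autonomously by the predetermined condition may overlap one another. For example, for a single frontal face, separate search regions may be centred near the left ear, on the nose, and near the right ear. Not all of them need to be processed.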
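[0070]
In this case, the control unit 11 performs the processing shown in FIG. 3 instead of FIG. 2. Steps in FIG. 3 that are the same as in FIG. 2 bear the same reference signs, and their detailed description is omitted. The control unit 11 first sets the reduction ratio S to the minimum reduction ratio (S1) and reduces the target image data by the ratio S (S2). It then allocates in the storage unit 12 a map data area of the same size as the reduced target image data and initializes every value to "false" (S3). The control unit 11 also generates an array of Boolean values of the same size as the target image data, hereinafter called the gazing map (S20). Every Boolean value is initially "false".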
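[0071]
Next, the control unit 11 defines at least one search region in the reduced target image data (S4), using the autonomous processing already described. It then sets to "true" the gazing-map values in the areas corresponding to the search regions (S21). It selects one of the search regions (S22) and checks whether the gazing-map value at the centre of the corresponding area is true (S23). For a rectangle with upper-left corner (x1, y1) and lower-right corner (x2, y2), the centre is ((x1 + x2)/2, (y1 + y2)/2), rounded to integers by a predetermined method.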
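[0072]
If the value in S23 is true (Yes), the conversion processing (S6) and search processing (S7) are performed on the selected search region. If the search processing judges that the region contains the search target (Yes), two things happen. The map data values corresponding to the converted search region are set to "true" (S8). The gazing-map values in the area corresponding to that search region are set to false (S24). As a result, search regions whose centre falls in that area are not processed again.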
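[0073]
The control unit 11 then checks whether any value on the gazing map is still true (S25). If so (Yes), it defines a new search region centred on one of those points (S26), returns to S22, selects that search region, and continues. If the value in S23 is not true (No), processing moves to S25 to check whether any unselected search region remains (C).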
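[0074]
If the search processing in S7 determines that the search target is not contained (No), processing moves directly to S25. If no value on the gazing map is true in S25 (No), the control unit checks whether the current reduction ratio S exceeds the predetermined maximum reduction ratio (S10). If it does not (No), the control unit increases S (S11) and returns to S2. Step S11 may, for example, raise S by a predetermined ratio, or set S = S × ΔS with a predetermined factor ΔS.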
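[0075]
If in S10 the current reduction ratio S exceeds the predetermined maximum reduction ratio (Yes), the control unit uses the map data for each reduction ratio to determine which regions of the original (unreduced) target image data contain the search target (S12). The processing then ends.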
[0076]
With this processing, even if several search regions are defined for a single search target, neighbouring search regions are not processed redundantly.
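The gazing-map logic of [0070] to [0076] for a single reduction ratio can be sketched as below. This is an interpretation under several assumptions:
- The regions from S4 are processed first, and new regions are seeded from remaining true points (S26) only after they are exhausted.
- The centre is floored.
- A region that is searched without success has its centre point cleared.
- An S26 seed point is cleared after use, which guarantees termination.

`process` is a hypothetical callback standing in for the conversion and search processing (S6, S7).

```python
import numpy as np

def gazing_search(shape, regions, process, size):
    """Return the regions (x, y, w, h) in which `process` reports the search target."""
    H, W = shape
    gaze = np.zeros(shape, dtype=bool)                      # S20: all false
    for x, y, w, h in regions:                              # S21: mark regions true
        gaze[y:y + h, x:x + w] = True
    pending = [(r, None) for r in regions]
    found = []
    while True:
        while pending:
            (x, y, w, h), seed = pending.pop(0)             # S22
            cx, cy = (2 * x + w - 1) // 2, (2 * y + h - 1) // 2   # centre, floored
            if gaze[cy, cx]:                                # S23
                if process((x, y, w, h)):                   # S6 + S7
                    found.append((x, y, w, h))              # S8
                    gaze[y:y + h, x:x + w] = False          # S24
                else:
                    gaze[cy, cx] = False                    # assumption: do not revisit
            if seed is not None:
                gaze[seed] = False                          # assumption: ensures termination
        ys, xs = np.nonzero(gaze)                           # S25
        if len(ys) == 0:
            return found
        cy, cx = int(ys[0]), int(xs[0])                     # S26: new region centred here
        x0 = min(max(cx - (size - 1) // 2, 0), W - size)
        y0 = min(max(cy - (size - 1) // 2, 0), H - size)
        pending.append(((x0, y0, size, size), (cy, cx)))
```

In the test, the second region overlaps the first and is skipped because its centre has already been cleared (S24).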
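[0077]
[Modification of the search region defining processing]
In the search region defining processing, the control unit 11 may also decide whether to actually define a candidate region as a search region. The decision uses the entropy, hierarchical entropy, colour, or luminance variance of the image data inside the region, or a combination of two or more of these. For example, when searching for human faces, entropy is high around the periphery (outline) of a face. If the entropy exceeds a predetermined threshold, the candidate region is defined as a search region; otherwise it is skipped and processing continues. This reduces the number of search regions subject to conversion and search processing, lightening the processing load.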
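An entropy test of the kind described in [0077] might use the Shannon entropy of the grey-level histogram. This is a sketch under stated assumptions:
- The bin count is fixed at 16.
- Pixel values are 8-bit.
- The threshold is hypothetical.

Hierarchical entropy, colour and variance are not covered. A predicate like `passes_entropy_check` could serve as the defining condition in a raster scan.

```python
import numpy as np

def patch_entropy(patch, bins=16):
    """Shannon entropy (bits) of the grey-level histogram of an 8-bit patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def passes_entropy_check(patch, threshold=2.0):
    """Define the region only if its entropy exceeds the (hypothetical) threshold."""
    return patch_entropy(patch) > threshold
```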
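[0078]
[Further modification of the processing]
So far the control unit 11 has defined the search regions, but the user may instead specify them directly. The control unit 11 lets the user select the size and orientation of the target image data and displays a frame of the corresponding size on the display unit 14. The user then draws candidate search regions inside this frame. Concretely, the display unit 14 shows a screen such as that in FIG. 4, and the user draws a candidate search region R inside the frame B. When the user clicks the "Define" button, the control unit 11 defines as a search region the rectangle circumscribing each drawn candidate R.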
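The circumscribing rectangle of [0078] can be computed directly from the drawn points. This minimal sketch assumes the candidate R arrives as a list of integer (x, y) pixel coordinates. It returns the rectangle as (x, y, width, height) with inclusive pixel bounds; that representation is an assumption.

```python
def circumscribed_rect(points):
    """Smallest axis-aligned rectangle (x, y, w, h) containing all drawn points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
```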
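[0079]
Alternatively, predefined search-region patterns (templates) may be held in the storage unit 12. When the user is to specify search regions, the device presents a list of these templates, and the search regions are defined from the template the user selects.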
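[0080]
[Modification of the conversion processing]
When the user specifies the search regions, each region may be associated with information on the conversion to perform for it: the degree of freedom and amount of conversion, hereinafter called user-specified conversion information. This information is then used in the conversion processing. The user-specified conversion information is one example of the "information on the state of the search target" in the present invention. The conversion processing then works as follows.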
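[0081]
The control unit 11 first selects one of the defined search regions and converts it according to its associated conversion information (preliminary conversion). The image data portion in the preliminarily converted search region is mapped to feature vector information in F, and the mapping is projected onto each subspace Ω. Using the conversion databases 13a, the conversion amount that minimizes the distance E is then determined for each degree of freedom.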
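[0082]
For each degree of freedom, the conversion databases 13a yield two things: the conversion amount that brings the data closer to the reference state, and the corresponding error (the square of the distance E). The conversion of the degree of freedom with the smallest error is identified. If that degree of freedom is the one fixed by the user-specified conversion information for the selected search region, the control unit 11 moves on to the next-smallest error. It repeats this until it identifies a conversion in some other degree of freedom.

The control unit 11 then applies the identified conversion to the selected search region by the amount determined for that degree of freedom, defining a new search region. The image data portion in this new search region is again mapped to feature vector information in F and projected onto each subspace Ω. The process repeats from determining the conversion amount that minimizes E.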
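[0083]
Processing of the region ends in either of two cases: the conversion amounts for all degrees of freedom other than the user-specified one are "0" (i.e., no conversion), or there is no degree of freedom other than the user-specified one. If unprocessed search regions remain, one of them becomes the next target of the conversion processing.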
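The preliminary-conversion variant of [0080] to [0083] differs from plain greedy normalization in two ways. The user-specified conversion is applied first, and its degree of freedom is then excluded from selection. This is a minimal sketch with the same hypothetical `estimators`/`appliers` convention: each estimator returns (amount, error), and each applier returns the converted patch. `max_steps` is an added safeguard.

```python
def normalize_with_user_spec(patch, estimators, appliers, user_dof, user_amount,
                             max_steps=50):
    """Preliminary conversion, then greedy normalization excluding `user_dof`."""
    patch = appliers[user_dof](patch, user_amount)          # preliminary conversion
    history = [(user_dof, user_amount)]
    for _ in range(max_steps):
        proposals = {d: est(patch) for d, est in estimators.items() if d != user_dof}
        nonzero = {d: p for d, p in proposals.items() if p[0] != 0}
        if not nonzero:              # remaining amounts all "0", or no other freedom
            break
        dof = min(nonzero, key=lambda d: nonzero[d][1])     # smallest remaining error
        patch = appliers[dof](patch, nonzero[dof][0])
        history.append((dof, nonzero[dof][0]))
    return patch, history
```

As in the lying-person example of [0085], a face tilted by 80° and pre-rotated by −90° keeps its residual −10° tilt, because rotation is never re-estimated.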
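[0084]
Here too, the resolution of the image data portion may be reduced to produce coarse data, and the conversion processing executed on that coarse data. In that case, each conversion database 13a is trained in advance with learning samples corresponding to the coarse data.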
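[0085]
This conversion processing makes it possible to search image data with the state of the search target in each search region as part of the condition. Suppose a photograph contains a standing person with an upright face and a lying person whose face is rotated 90° from upright. The state of each face is specified as a search condition together with its search region. Assume the search database 13b was trained on upright human faces as the reference state.
- Upright person: the associated search region specifies a 0° rotation, so no rotation is performed during conversion processing. The search processing then detects only faces in the reference state, i.e., upright faces.
- Lying person: the search region specifies a −90° rotation. After the preliminary conversion rotates the region by −90°, the conversion processing performs no further rotation. The search processing then detects only faces that are in the reference state, with respect to rotation, after the preliminary conversion (i.e., faces tilted by 90°).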
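[0086]
[Alternative configuration achieving an equivalent effect]
Instead of the preliminary conversion above, the control unit 11 may execute the conversion processing as shown in FIG. 2. When it completes, the conversion actually performed is compared with the conversion given by the user-specified conversion information for the search region. If an equivalent conversion was performed, search processing is executed on that search region; if not, search processing for that region is skipped.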
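The comparison in [0086] needs a notion of "equivalent" conversion, which the source leaves open. This sketch assumes equivalence means the net amount applied in the user-specified degree of freedom lies within a hypothetical tolerance `tol` of the user-specified amount. `history` is a list of (degree of freedom, amount) pairs recorded during conversion.

```python
from collections import defaultdict

def net_conversion(history):
    """Sum the amounts applied per degree of freedom during conversion processing."""
    total = defaultdict(float)
    for dof, amount in history:
        total[dof] += amount
    return dict(total)

def should_search(history, user_dof, user_amount, tol=5.0):
    """Run search processing only if the net conversion matches the user's spec."""
    return abs(net_conversion(history).get(user_dof, 0.0) - user_amount) <= tol
```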
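[0087]
[Other modifications]
So far, the conversions in the conversion processing have been two-dimensional rotation, translation, and scaling. For scaling, the search region is enlarged or reduced and its image data portion is resized to the original, pre-scaling region size. For human faces, the conversions may also include three-dimensional rotation caused by pose, such as a bowed or turned head. In that case, an average three-dimensional model of the search target is assumed. Three-dimensional rotation can then be realized by projecting the image data portion onto that model.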
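[0088]
In the search processing, even a search target such as the human face may be divided into finer categories. These can include conditions such as age, sex, and whether the mouth is open.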
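[0089]
The flowcharts of FIGS. 2 and 3 show the control unit 11 processing the reduction ratios one after another. Because the processing at each ratio is independent, the control unit 11 may instead perform it in parallel.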
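[Brief Description of the Drawings]
[FIG. 1] A block diagram of the configuration of the image search device according to the embodiment of the present invention.
[FIG. 2] A flowchart showing an example of the processing of the control unit 11.
[FIG. 3] A flowchart showing another example of the processing of the control unit 11.
[FIG. 4] An explanatory diagram showing an example of the interface of a device to which the image search device according to the embodiment of the present invention is applied.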
[Explanation of Reference Signs]
11: control unit; 12: storage unit; 13: database unit; 14: display unit; 15: operation unit; 16: external storage unit.
BACKGROUND OF THE INVENTION
The present invention relates to an image search apparatus for searching for a specific image portion such as a face portion from image data such as a photograph.
[0002]
[Prior art]
In recent years, it has been considered that a specific object included in a photograph or the like, for example, a part such as a human face is specified and predetermined processing is performed based on the specified part. As an example, detect the face part of each person from the photographed picture and burn only the face part, or detect the face part of the person from the image being captured and use it for face authentication processing, Such a thing can be considered.
[0003]
In a device for recognizing a target object such as a conventional face image, in order to cope with the case where it is difficult to recognize the target object depending on the imaging state (tilt, size, illumination state, etc.) of the target object, Some perform processing adapted to the imaging state (reference state).
[0004]
Conventionally, in this process, specifically, a neural network is learned using learning image data taken while changing the imaging state, and a photograph that is a processing target using the learned neural network is used. It has been considered to detect how much the imaging state deviates from the reference state and perform image processing so as to correct the deviation.
[0005]
As an example of processing for detecting a desired pattern from target image data, a method such as a kernel nonlinear subspace method disclosed in Patent Document 1 is known.
[0006]
[Patent Document 1]
JP 2001-90274 A
[0007]
[Problems to be solved by the invention]
However, for example, in the case of a person's face, there are various changes such as lateral adjustment, upward adjustment, neck caulking condition, lighting condition, etc. When trying to detect a deviation from the conventional reference state, a neural network A large amount of learning image data used for learning is required in accordance with the various changes described above. Moreover, as a result of learning with such a large amount of image data, the scale of the neural network has become enormous, and it has been impossible to complete the processing within a realistic time.
[0008]
Further, in a conventional apparatus for recognizing a target object, processing is performed using the entire photograph or the like as a search source for the target object as a search range. For this reason, the amount of data to be processed also increases, and the processing load is great.
[0009]
As a result of these, the current state of the apparatus for recognizing an object is conventionally limited to a laboratory one, and it is necessary to eliminate such essential technical difficulties.
[0010]
The present invention has been made in view of the above circumstances, and provides a practical image search device that can reduce the load of processing for searching for an object to be searched from a photograph and can be applied to various uses. Is one of its purposes.
[0011]
[Means for Solving the Problems]
The invention according to claim 1 is an image search device for searching for a search target image data portion to be searched from within the target image data to be processed, wherein a search region is included in the target image data. A plurality of learning samples obtained by converting different conversion amounts from a predetermined reference state with respect to at least one means for defining at least one conversion type, and converting each learning sample to a reference state. Referring to the conversion database obtained by associating and learning the information on the quantity, for the search partial data included in each of the defined search areas, the conversion to be applied to make the search partial data the reference state A conversion condition including a type and a conversion amount is acquired, and conversion based on the acquired conversion condition is converted into a conversion condition for each search partial data. Refer to the search database acquired by learning using the conversion means for converting the search partial data into the reference state and the search target image data example in the reference state by repeatedly performing the conversion amount shown to “0”. Search means for determining whether or not each of the converted search partial data includes a search target; and on the target image data corresponding to the search partial data determined to include the search target Means for generating and outputting search result information for specifying the region, and the generated search result information is subjected to a predetermined process.
[0012]
According to a second aspect of the present invention, in the image search device according to the first aspect, the conversion database includes a database group for storing conversion conditions obtained by learning for each type of conversion, and the conversion means. Search for each database group included in the conversion database, and for the search partial data included in each of the defined search areas, the type and conversion of conversion to be applied to set the search partial data as a reference state The conversion condition including the amount is acquired, and the conversion based on the acquired conversion condition is repeatedly performed until the conversion amount indicated by the conversion condition for each search partial data becomes “0”, and the search partial data is used as a reference. It is supposed to be converted to a state.
[0013]
The invention according to claim 3 is an image search method for searching a search target image data portion to be searched from target image data to be processed, and defines a search area using a computer. A means for defining at least one search area in the target image data, and a plurality of conversion means converted by a different conversion amount from a predetermined reference state for at least one type of conversion. With respect to the search partial data included in each of the defined search regions with reference to the conversion database obtained by associating and learning the information on the conversion amount for each learning sample to be returned to the reference state. Acquiring a conversion condition including a conversion type and a conversion amount to be applied in order to set the search partial data as a reference state; The conversion based on the conversion condition is repeatedly performed until the conversion amount indicated by the conversion condition for each search partial data becomes “0”, and the search means converts the search partial data into a reference state, and the search means includes the reference state. A step of referring to a search database acquired by learning using an example of image data of a search target in the step of determining whether or not a search target is included in each of the converted search partial data, and output means, Generating and outputting search result information for specifying an area on the target image data corresponding to the search partial data determined to include the search target, and the generated search result information It is intended to be subjected to predetermined processing.
[0014]
The invention according to claim 4 is a program for searching a target image data portion to be searched from target image data to be processed, and a computer is searched in the target image data. A plurality of learning samples obtained by converting at least one region and means for converting at least one type of conversion from a predetermined reference state by different conversion amounts are used to return each learning sample to the reference state. With reference to a conversion database obtained by learning by associating conversion amount information, the conversion to be applied in order to set the search partial data as a reference state for the search partial data included in each of the defined search areas The conversion condition including the type and the conversion amount is acquired, and conversion based on the acquired conversion condition is applied to each search partial data. It is repeatedly acquired until the conversion amount indicated by all the conversion conditions becomes “0”, and learning is acquired using the conversion means for converting the search partial data into the reference state and the image data example of the search target in the reference state. Search means for referring to the search database and determining whether or not each of the converted search partial data includes a search target, and corresponding to the search partial data determined to include the search target It is made to function as a means for generating and outputting search result information for specifying an area on the target image data, and the generated search result information is subjected to a predetermined process.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
[Basic configuration]
Embodiments of the present invention will be described with reference to the drawings. As shown in FIG. 1, the image search apparatus according to the embodiment of the present invention includes a control unit 11, a storage unit 12, a database unit 13, a display unit 14, an operation unit 15, and an external storage unit 16. It is comprised including. These units are connected to each other via a bus, which is realized by using a general computer. This computer may be incorporated in another product, such as a camera.
[0021]
The control unit 11 operates according to a program stored in the storage unit 12, and includes a search area demarcation process that demarcates at least one search area from the target image data to be processed, and a reference state. A conversion process for conversion, a search process for detecting a search area including a search target, and a predetermined process using the search result are executed. Specific processing contents of these control units 11 will be described in detail later.
[0022]
The storage unit 12 stores software executed by the control unit 11. The storage unit 12 also operates as a work memory that holds various data required by the control unit 11 during the process. Specifically, the storage unit 12 can be realized as a storage medium such as a hard disk, a semiconductor memory, or a combination thereof.
[0023]
As will be described later, the database unit 13 is a database including a conversion database 13a used in the conversion process of the control unit 11 and a search database 13b used in the search process. The database unit 13 is specifically a storage medium such as a hard disk, and the storage unit 12 may also serve as the database unit 13, but is illustrated here separately for the sake of explanation.
[0024]
The display unit 14 is, for example, a display device or a printer device, and displays information according to an instruction input from the control unit 11. The operation unit 15 is a keyboard or a mouse, for example, and accepts a user operation and outputs the content of the operation to the control unit 11.
[0025]
The external storage unit 16 reads a program or data from a computer-readable removable medium (a type of storage medium) such as a CD-ROM or a DVD-ROM, and outputs the program or data to the control unit 11. The processing to be stored in is performed. The program according to the present embodiment can be distributed and stored in a portable storage medium such as a CD-ROM, and is copied to the storage unit 12 using the external storage unit 16 and used. Note that the program according to the present embodiment may be copied to the storage unit 12 not only from such a storage medium but also from a server on the network via a communication unit (not shown).
[0026]
[Processing of control unit 11]
Here, the content of the process of the control part 11 is demonstrated concretely. In the present embodiment, image data (target image data) to be processed is input from the outside via the external storage unit 16 or a communication unit (not shown) and stored in the storage unit 12. Here, the target image data may be one or plural. When the user operates the operation unit 15 to instruct the control unit 11 to perform a process of searching for a search target for specific target image data (instruction to start processing), the control unit 11 2 is started.
[0027]
The control unit 11 delimits the target image data sequentially, demarcates a search area for each reduced-converted target image data, performs conversion processing to the reference state and search processing for each search area, and performs each reduction Map data representing which part of the converted target image data contains the search target is generated.
[0028]
Specifically, the control unit 11 first sets the reduction rate S to the minimum reduction rate (for example, 1x, i.e., no reduction) (S1) and reduces the target image data at the reduction rate S (S2). It then secures on the storage unit 12 an area for map data of the same size as the reduced target image data and initializes the map data by setting every value in that area to "false" (S3). For example, if the reduced target image data is 1000 × 1000 pixels, an area of 1000 × 1000 bits is secured and each bit is initialized to "0".
[0029]
Next, the control unit 11 performs a process of defining at least one search area for the reduced target image data (S4). The search area defining process will be described in detail later. Then, one of the search areas is selected (S5), and conversion processing is executed for the search area (S6). This conversion process will also be described in detail later.
[0030]
The control unit 11 then determines whether the search target is included in the image data portion contained in the converted search area (search process, S7). If the search target is determined to be included (Yes), the values of the area on the map data corresponding to the converted search area are set to "true" (S8). The control unit 11 then checks whether any search area remains unselected (S9); if so (Yes), it returns to step S5, selects one of the unselected search areas, and continues processing (A).
[0031]
If, on the other hand, the search process in S7 determines that the search target is not included (No), the processing proceeds directly to S9. If no unselected search area remains in S9, that is, if the conversion and search processes have been completed for all search areas (No), the control unit 11 checks whether the currently set reduction rate S exceeds a predetermined maximum reduction rate (S10). If it does not (No), the reduction rate S is increased (S11), and the processing returns to step S2 and continues (B). The increase in S11 may, for example, raise the reduction rate S by a predetermined ratio; alternatively, the magnification represented by the reduction rate S may be multiplied by a predetermined multiplier ΔS, setting the new reduction rate as S = S × ΔS.
[0032]
If the currently set reduction rate S exceeds the predetermined maximum reduction rate in S10 (Yes), the control unit 11 determines, based on the map data generated for the target image data at each reduction rate, the area of the original (unreduced) target image data that contains the search target (S12), and the processing ends.
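The loop of FIG. 2 (S1 to S11) can be summarized by the following minimal Python sketch. It is an illustration under stated assumptions, not the embodiment itself: the callbacks `define_areas` and `convert_and_search` are hypothetical stand-ins for the search area defining process (S4) and for the conversion and search processes (S6, S7), areas are assumed to be `(x, y, width, height)` tuples in reduced-image pixels, and the default values of `s_max` and `delta_s` are arbitrary.

```python
from typing import Callable, List, Optional, Tuple

Area = Tuple[int, int, int, int]  # (x, y, width, height) in reduced-image pixels


def multiscale_search(
    width: int,
    height: int,
    define_areas: Callable[[int, int, float], List[Area]],        # hypothetical S4
    convert_and_search: Callable[[Area, float], Optional[Area]],  # hypothetical S6 + S7
    s_min: float = 1.0,
    s_max: float = 8.0,
    delta_s: float = 2.0,
) -> List[Tuple[float, List[List[bool]]]]:
    """Return one boolean map per reduction rate S, following FIG. 2."""
    maps = []
    s = s_min                                                  # S1: minimum reduction rate
    while s <= s_max:                                          # S10: stop once S passes the maximum
        w, h = max(1, int(width / s)), max(1, int(height / s))  # S2: reduced image size
        map_data = [[False] * w for _ in range(h)]             # S3: map initialised to "false"
        for area in define_areas(w, h, s):                     # S5/S9: visit every search area
            converted = convert_and_search(area, s)            # converted area if target found
            if converted is not None:                          # S7 "Yes"
                x, y, aw, ah = converted
                for yy in range(max(y, 0), min(y + ah, h)):    # S8: mark the area "true"
                    for xx in range(max(x, 0), min(x + aw, w)):
                        map_data[yy][xx] = True
        maps.append((s, map_data))
        s *= delta_s                                           # S11: S = S x delta_S
    return maps
```

Step S12 (deriving the area in the unreduced image from all maps) is omitted; having the callback return the converted area mirrors S8, which marks the area after conversion.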
[0033]
[Search area definition processing]
Next, the process by which the control unit 11 defines search areas (search area defining process) will be described. Search areas can be defined either from start-point information input by the user or by autonomously searching for and demarcating areas that satisfy a predetermined condition.
[0034]
When using information input by the user, for example, the control unit 11 takes at least one start-point coordinate input from the operation unit 15 or the like and defines, as a search area, a rectangular area of predetermined size whose upper-left corner is each start-point coordinate.
[0035]
When areas satisfying a predetermined condition are searched for and demarcated autonomously, the control unit 11 takes the upper-left corner of the (reduced) target image data (for example, X = 0, Y = 0) as the starting point and checks whether a rectangular area of predetermined size at that starting point satisfies a predetermined demarcation condition. If it does, the rectangular area is set as a search area. The starting point is then moved by a predetermined amount in the width direction (X = X + ΔX) and the check is repeated. When the starting point goes beyond the width of the target image data (X > width of the target image data), it is moved by a predetermined amount in the height direction (X = 0, Y = Y + ΔY), and the processing in the width direction is repeated. In this way, the areas satisfying the demarcation condition over the entire target image data are defined as search areas.
[0036]
Here, the starting point is moved by ΔX in the width direction and ΔY in the height direction. These amounts may be scaled according to the reduction rate S of the target image data: relative to the amounts ΔX and ΔY used for the unreduced (1x) image data, ΔX/S and ΔY/S may be used.
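A minimal sketch of the autonomous raster scan follows, assuming that rectangles must lie entirely inside the reduced image and that the demarcation condition is supplied as a hypothetical callback `condition(x, y)`; the step sizes follow the ΔX/S, ΔY/S scaling described above.

```python
from typing import Callable, List, Tuple


def define_search_areas(
    width: int, height: int,            # size of the reduced target image
    area_w: int, area_h: int,           # predetermined search-area size
    dx: int, dy: int,                   # steps ΔX, ΔY used at 1x
    s: float = 1.0,                     # reduction rate S
    condition: Callable[[int, int], bool] = lambda x, y: True,  # hypothetical demarcation condition
) -> List[Tuple[int, int, int, int]]:
    step_x = max(1, round(dx / s))      # ΔX / S
    step_y = max(1, round(dy / s))      # ΔY / S
    areas = []
    y = 0
    while y + area_h <= height:         # move down by ΔY/S after each row
        x = 0
        while x + area_w <= width:      # move right by ΔX/S within a row
            if condition(x, y):
                areas.append((x, y, area_w, area_h))
            x += step_x
        y += step_y
    return areas
```

For a 10 × 6 image with 4 × 4 areas and ΔX = ΔY = 6, halving the image (S = 2) halves the step to 3 pixels, so three areas per row are examined instead of two.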
[0037]
[Conversion processing]
Next, the conversion process will be described. One characteristic of the present embodiment is that the conversion is performed step by step, with each step performing a conversion for a single degree of freedom. The control unit 11 of the present embodiment calculates predetermined feature vector information from the image data portion included in the search area. The feature vector information is a vector made up of a plurality of feature elements selected according to the properties of the search target.
[0038]
In the present embodiment, the control unit 11 is described as determining the conversion by the kernel nonlinear subspace method, using this feature vector information and the information stored in the conversion database 13a.
[0039]
[Contents of conversion database 13a]
Since the kernel nonlinear subspace method is widely known as a method for classifying data into categories, a detailed description is omitted. In brief, in a space F spanned by the feature elements as bases, each of a plurality of subspaces Ω contained in F is treated as a category into which data can be classified. The feature vector information created from the data to be classified (for example, Φ) is projected onto each subspace Ω (the projection result being, for example, φ), the subspace Ω that minimizes the distance E between the feature vector information Φ before projection and φ after projection is detected, and the data is judged to belong to the category represented by that subspace Ω.
[0040]
Accordingly, at the learning stage, at least one of the nonlinear mapping (the mapping to the space F, i.e., the parameters contained in the kernel function, etc.) and the hyperplanes separating the subspaces Ω corresponding to the respective categories is adjusted so that the feature vector information of exemplary learning data (learning samples) that should belong to the same category is closest to the same subspace Ω.
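As an illustration only, the following Python sketch (using NumPy) implements one common form of the kernel subspace method: each category's subspace Ω is spanned by the leading principal directions of that category's samples in the feature space F, obtained by uncentred kernel PCA, and a query is assigned to the category with the smallest squared residual E². The Gaussian kernel, the `gamma` value and the subspace dimension `dim` are assumptions; neither the method of Japanese Patent Laid-Open No. 2001-90274 nor the parameter/hyperplane adjustment described above is reproduced.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * |a_i - b_j|^2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)


class KernelSubspace:
    """Subspace Omega of one category, spanned in F by the top `dim` kernel principal axes."""

    def __init__(self, samples, dim: int, gamma: float = 0.5):
        self.x = np.asarray(samples, dtype=float)
        self.gamma = gamma
        lam, vec = np.linalg.eigh(rbf(self.x, self.x, gamma))  # eigenvalues in ascending order
        top = np.argsort(lam)[::-1][:dim]                       # indices of the largest ones
        # Coefficients so that e_r = sum_i alpha[i, r] * Phi(x_i) are orthonormal in F.
        self.alpha = vec[:, top] / np.sqrt(lam[top])

    def residual(self, z) -> float:
        """Squared distance E^2 between Phi(z) and its projection onto Omega."""
        kz = rbf(self.x, np.atleast_2d(np.asarray(z, dtype=float)), self.gamma)[:, 0]
        coords = self.alpha.T @ kz              # coordinates of the projection
        return 1.0 - float(coords @ coords)     # k(z, z) = 1 for the Gaussian kernel


def classify(z, subspaces: dict):
    """Return the category whose subspace gives the smallest residual, plus all residuals."""
    errors = {label: sub.residual(z) for label, sub in subspaces.items()}
    return min(errors, key=errors.get), errors
```

Because only kernel evaluations are needed, Φ itself is never computed explicitly; the residual of the chosen category plays the role of the error used in the conversion step.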
[0041]
In the present embodiment, the conversion database 13a is formed to determine how the search target should be converted into the reference state (the type and amount of conversion). That is, the conversion database 13a is acquired by learning so that, for image data whose state is unknown, the type of conversion to perform and its amount (the category) can be determined. In the present embodiment, a separate conversion database 13a is created for each type of conversion (degree of freedom) applicable to an image, such as rotation, translation, and scaling. The conversion database 13a for each degree of freedom is acquired by learning the conversion amounts of the corresponding conversion as categories.
[0042]
To carry out this learning, learning samples are generated as follows in the learning process of the conversion database 13a of the present embodiment. A plurality of examples of image data of the search target in a predetermined reference state are prepared, and each example is converted by different conversion amounts for each degree of freedom, producing a plurality of converted image data. The converted image data generated for each degree of freedom serve as the learning samples for that degree of freedom. Specifically, when the search target is a face, a plurality of face images in a predetermined reference state (predetermined shooting conditions and posture) are prepared as examples, and each image is converted for each degree of freedom such as rotation, translation, and scaling. For rotation, images rotated in 5-degree steps over the range -180 to 180 degrees are used as the learning samples for the rotation degree of freedom. For translation, images shifted in 5-pixel steps vertically and horizontally are used as the learning samples for the translation degree of freedom. Because degrees of freedom such as translation are involved, these samples are generated from image data covering a region wider than the reference state, by cutting out a reference-state-sized area while shifting it 5 pixels at a time.
[0043]
In this way, for each of the image data examples, a plurality of images converted in several ways for each degree of freedom is generated, and each generated image is associated with information indicating what conversion was applied to it (the conversion amount, etc.).
[0044]
Here, to obtain images converted by different amounts, images converted while changing the conversion amount in predetermined steps (for example, 5 degrees for rotation) are included in the learning samples. Instead of using predetermined steps, however, the conversion amount may be determined by random numbers, and each resulting image included in the learning samples.
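The sample-generation scheme can be sketched as follows. This is a hypothetical illustration, not the embodiment's code: the functions only enumerate conversion amounts and the associated label (the conversion that returns a sample to the reference state), leaving the actual image rotation and cropping out. The `rng`/`n` option corresponds to the random-amount variant, and the margin value used in the usage is arbitrary.

```python
import random
from typing import List, Optional, Tuple


def rotation_samples(step: int = 5, rng: Optional[random.Random] = None,
                     n: int = 0) -> List[Tuple[float, float]]:
    """(applied angle, corrective angle) pairs for the rotation degree of freedom."""
    if rng is not None:
        angles = [rng.uniform(-180.0, 180.0) for _ in range(n)]  # random-amount variant
    else:
        angles = list(range(-180, 181, step))                    # -180 ... 180 in fixed steps
    return [(a, -a) for a in angles]  # label: rotate back by -a to reach the reference state


def translation_crops(ref_w: int, ref_h: int, margin: int, step: int = 5):
    """Reference-sized crops cut from a region `margin` pixels wider on every side.

    Returns ((x, y, w, h) crop in the wide region, (dx, dy) shift back to the reference crop).
    """
    samples = []
    for oy in range(0, 2 * margin + 1, step):
        for ox in range(0, 2 * margin + 1, step):
            # The reference-state crop sits at (margin, margin); this crop is offset from it.
            samples.append(((ox, oy, ref_w, ref_h), (margin - ox, margin - oy)))
    return samples
```

With the default 5-degree step, the rotation range yields 73 samples per example image (note that -180 and 180 degrees describe the same rotation).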
[0045]
Next, the conversion database 13a for each degree of freedom is trained using the learning samples for that degree of freedom.
[0046]
[Conversion operation]
Using each conversion database 13a learned in this way, the control unit 11 performs the conversion process on each search area defined by the search area defining process, as follows. The image data portion included in the search area being processed (which can be treated, for example, as a vector whose components are the pixel values) is mapped to feature vector information in the space F (defined for each conversion database 13a, that is, by the set of features defined for each degree of freedom of conversion), and the mapping is projected onto each subspace Ω. The conversion amount for which the distance E between the feature vector information before and after projection is minimized is then determined. The control unit 11 also calculates the square L of the distance E and stores it in the storage unit 12 as the error.
[0047]
A conversion amount is thus determined for each degree of freedom from the corresponding conversion database 13a. The control unit 11 selects one of these conversion amounts according to a predetermined condition (for example, the one with the smallest associated error) and applies the conversion of the selected degree of freedom by the selected amount.
[0048]
For example, suppose that, from the information learned in the conversion databases 13a for the respective degrees of freedom, it is found that the image data portion in the search area approaches the reference state through a 10-degree rotation for the rotation degree of freedom, with error Lr, and through a 5-pixel leftward shift for the translation degree of freedom, with error Lp. The conversion of the degree of freedom with the minimum error is then selected: in this example, if Lr < Lp, a 10-degree rotation is applied to the search area to define a new search area. The image data portion in the new search area is again mapped to feature vector information in the space F, the mapping is projected onto each subspace Ω, and the process is repeated from the step of determining the conversion amount that minimizes the distance E between the feature vector information before and after projection.
[0049]
When the conversion amount for every degree of freedom is "0" (i.e., no conversion), the conversion process for that search area ends at that stage; if any unprocessed search area remains, the conversion process is performed on one of them.
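The greedy loop described above, which converts one degree of freedom per step, can be sketched as follows. This is a minimal illustration under assumptions: each hypothetical estimator stands in for one conversion database 13a and returns `(conversion amount, error L)` for the current area, `apply` is a hypothetical function that performs a conversion, only degrees of freedom proposing a non-zero amount are considered for selection, and `max_steps` is a safety limit not mentioned in the text.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

A = TypeVar("A")  # whatever represents a search area (coordinates, image patch, ...)


def convert_to_reference(
    area: A,
    estimators: Dict[str, Callable[[A], Tuple[float, float]]],  # hypothetical: dof -> (amount, error L)
    apply: Callable[[A, str, float], A],                        # hypothetical conversion function
    max_steps: int = 50,
) -> Tuple[A, List[Tuple[str, float]]]:
    history: List[Tuple[str, float]] = []
    for _ in range(max_steps):
        results = {dof: est(area) for dof, est in estimators.items()}
        candidates = [dof for dof, (amount, _) in results.items() if amount != 0]
        if not candidates:                                 # every degree of freedom reports "0": done
            break
        dof = min(candidates, key=lambda d: results[d][1])  # smallest error L wins
        amount = results[dof][0]
        area = apply(area, dof, amount)                    # defines the new search area
        history.append((dof, amount))
    return area, history
```

With toy estimators in which rotation reports the smaller error (the Lr < Lp case), rotation is corrected first and translation second, after which both report "0" and the loop stops.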
[0050]
In the above, the image data portion in the defined search area is taken from the target image data as is. Alternatively, the resolution of the image data portion may be reduced to obtain coarse-grained data, and the conversion process may be performed on that data. In this case, each conversion database 13a is learned using learning samples corresponding to the coarse-grained data.
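One simple way to obtain the coarse-grained data is block averaging, sketched below. The text does not specify the resolution-reduction method, so the averaging and the handling of partial edge blocks (dropped) are assumptions.

```python
from typing import List, Sequence


def coarse_grain(pixels: Sequence[Sequence[float]], factor: int) -> List[List[float]]:
    """Average non-overlapping factor x factor blocks; partial blocks at the edges are dropped."""
    rows, cols = len(pixels) // factor, len(pixels[0]) // factor
    return [
        [
            sum(pixels[by * factor + i][bx * factor + j]
                for i in range(factor) for j in range(factor)) / (factor * factor)
            for bx in range(cols)
        ]
        for by in range(rows)
    ]
```

The same function would have to be applied to the learning samples so that the conversion databases see data at the same resolution.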
[0051]
The control unit 11 may also perform the calculation of feature vector information, the mapping and projection onto subspaces, and the distance and error evaluation for the different degrees of freedom in parallel rather than sequentially.
[0052]
Although the kernel nonlinear subspace method has been described here as an example, other methods such as an autoencoder may be used, provided they can classify data and evaluate the error of that classification.
[0053]
[Search process]
Next, the search process will be described. In this process, the search database 13b is used to determine, for the image data portion in each search area whose conversion process has been completed, whether the search target is included. One specific example of this search process is the method disclosed in Japanese Patent Laid-Open No. 2002-329188, outlined below.
[0054]
[Learning process of search database]
The search database 13b is formed by training a neural network with examples of the search target image data in the reference state as learning samples. The control unit 11 receives a plurality of learning samples and, for each, calculates a feature set (feature vector) selected in advance according to the properties of the search target, thereby generating learning data. Using this learning data, it forms a lattice space map on an M × M′ lattice space held in the storage unit 12 by means of a SOM (self-organizing map). That is, the control unit 11 calculates, with a predetermined measure (for example, the Euclidean measure), the distance between the feature vector given as learning data and the weight vector assigned to each lattice point, and detects the lattice point c with the minimum distance (the best-matching node). It then updates the weight vectors of the lattice points in the neighborhood of the best-matching node using the input feature vector. Repeating this process forms the lattice space map in the storage unit 12, in which the best-matching nodes for similar feature vectors form contiguous regions. In other words, the lattice space realizes a topology-preserving nonlinear projection of the multidimensional input feature vectors onto a two-dimensional map; the weight updates self-organize the characteristic features of the data so that, as a result of learning, lattice points responding to similar data lie close together.
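A compact SOM training loop is sketched below with NumPy to make the procedure concrete. It is an assumption-laden illustration rather than the method of Japanese Patent Laid-Open No. 2002-329188: the random initialisation, the Gaussian neighbourhood, the linearly decaying learning rate and radius, and the iteration count are all arbitrary choices.

```python
import numpy as np


def train_som(data, m: int, m2: int, iters: int = 2000, lr0: float = 0.5, seed: int = 0):
    """Train weight vectors on an m x m2 lattice; returns an array of shape (m, m2, d)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.uniform(data.min(axis=0), data.max(axis=0), size=(m, m2, data.shape[1]))
    gy, gx = np.mgrid[0:m, 0:m2]                         # lattice coordinates of every node
    sigma0 = max(m, m2) / 2.0
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)                          # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5              # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]                # one learning vector
        dist = ((weights - x) ** 2).sum(axis=-1)         # squared Euclidean distance per node
        cy, cx = np.unravel_index(np.argmin(dist), dist.shape)  # best-matching node c
        h = np.exp(-((gy - cy) ** 2 + (gx - cx) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)     # pull c and its neighbours toward x
    return weights


def best_matching_node(weights, x):
    """Lattice coordinates of the node whose weight vector is closest to x."""
    dist = ((weights - np.asarray(x, dtype=float)) ** 2).sum(axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))
```

The neighbourhood update is what makes nearby lattice points respond to similar inputs; without it the procedure reduces to plain vector quantization.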
[0055]
When learning on all the learning data is complete, the control unit 11 classifies the lattice points of the lattice space map into categories. This classification can be based, for example, on the distances between lattice points (the distances between their associated weight vectors), dividing them into the category of lattice groups that respond to image data similar to the search target (search-target category) and the category of the remaining lattice groups (non-search-target category).
[0056]
[Search operation]
The control unit 11 secures an area for storing map data having the same size as the target image data in the storage unit 12 and initializes the value of the area to “false”.
[0057]
Using the learned search database 13b, the control unit 11 calculates a predetermined feature vector from the image data portion of a search area whose conversion process has been completed. It then computes the distance between this feature vector and the weight vector associated with each lattice point in the search database 13b and identifies the lattice point with the minimum distance (the best-matching node). If that lattice point belongs to the search-target category, the search area is judged to contain the search target; if it belongs to the non-search-target category, the search area is judged not to contain it.
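The decision step can be sketched as below, assuming the trained lattice is given as a mapping from lattice coordinates to weight vectors and that the lattice points of the search-target category have already been determined; both inputs and the function name are hypothetical.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Node = Tuple[int, int]


def contains_target(feature: Sequence[float],
                    weights: Dict[Node, Sequence[float]],   # hypothetical trained lattice
                    target_nodes: Set[Node]) -> Tuple[bool, Node]:
    """Find the best-matching lattice point and report whether it is in the target category."""
    best = min(weights, key=lambda node: math.dist(weights[node], feature))  # Euclidean measure
    return best in target_nodes, best
```

Returning the best-matching node alongside the verdict makes it easy to inspect which part of the map responded.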
[0058]
[Operation of Control Unit 11]
As described above, the control unit 11 of the present embodiment successively reduces the target image data to be searched, extracts the areas to be searched from each reduced version, converts the image data in each area into the reference state, and then determines whether the search target is contained in the image data of the converted search area. Because the control unit 11 converts the image data to be searched into the reference state, it does not matter if the position of the search area before conversion is slightly off, or if the reduction rate deviates slightly from that of the reference state.
[0059]
Conventionally, multistage reduced image data is generated by repeated 0.8x reductions, and search areas are extracted while shifting one pixel at a time. In the present embodiment, by contrast, the reduction can proceed in much coarser steps (for example, 5 times), and when search areas are autonomously extracted as areas satisfying a predetermined condition, ΔX and ΔY can be set to about 6 pixels. This significantly reduces the number of patterns to be examined and thus the processing load of searching a photograph or the like for the search target.
[0060]
As search result information, the control unit 11 generates map data representing the areas determined to contain the search target. One set of map data is generated for each reduced version of the target image data, i.e., for each reduction rate. The area containing the search target is therefore determined by combining these multiple sets of map data (each the size of the corresponding reduced target image data).
[0061]
For example, each set of map data may be enlarged by the magnification corresponding to its reduction rate to the size of the original target image data, and the region that is "true" in all sets of map data (that is, the region judged to contain the search target in the target image data at every reduction rate) may be determined to contain the search target. Alternatively, a region that is "true" in any one set of map data may be determined to contain the search target.
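The two combination rules can be sketched as follows, assuming each map is a list of boolean rows paired with its reduction rate S and that nearest-neighbour enlargement is acceptable; `mode="all"` keeps regions that are true in every map, and `mode="any"` keeps regions that are true in at least one.

```python
from typing import List, Sequence, Tuple


def combine_maps(maps: Sequence[Tuple[float, Sequence[Sequence[bool]]]],
                 width: int, height: int, mode: str = "all") -> List[List[bool]]:
    """Enlarge each reduced map by its rate S and combine them with AND ("all") or OR ("any")."""
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    out = [[mode == "all"] * width for _ in range(height)]
    for s, grid in maps:
        gh, gw = len(grid), len(grid[0])
        for y in range(height):
            for x in range(width):
                # Nearest-neighbour lookup of original pixel (x, y) in the reduced map.
                v = bool(grid[min(int(y / s), gh - 1)][min(int(x / s), gw - 1)])
                out[y][x] = (out[y][x] and v) if mode == "all" else (out[y][x] or v)
    return out
```

The "all" rule favours precision (a detection must survive every scale), while "any" favours recall.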
[0062]
[Operation and application examples]
Next, an application example of processing for finding an area including a search target from target image data will be described.
[0063]
With the image search device of the present embodiment it is possible, for example, to count how many people appear in a photograph. The databases of the database unit 13 are trained with a human face as the search target, and a photograph that has been read in is used as the target image data. The process shown in FIG. 2 is then performed, and the number of "true" areas on the map data is counted. This can be used, for example, for traffic censuses.
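Counting the "true" areas amounts to counting connected regions in the map data. A minimal breadth-first sketch follows, assuming 4-connectivity; the text does not say how adjacent "true" cells are grouped.

```python
from collections import deque
from typing import Sequence


def count_regions(map_data: Sequence[Sequence[bool]]) -> int:
    """Number of 4-connected groups of true cells."""
    h, w = len(map_data), len(map_data[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if map_data[y][x] and not seen[y][x]:
                count += 1                       # a new, not yet visited region
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:                     # flood-fill the whole region
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and map_data[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count
```

Overlapping detections of the same face merge into one region, which is why connected regions, not individual "true" cells, are counted.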
[0064]
Similarly, using a database unit 13 trained with a human face as the search target, image data captured digitally by a camera can be used as the target image data to detect which part of the image contains the face, and personal authentication can then be performed using the face captured in the detected part.
[0065]
In another application, image data obtained by reading a plurality of photographs is used as the target image data, and the process shown in FIG. 2 detects in which part of the image data for each photograph a person's face appears. The user specifies, as a search condition, in which part of the photograph the person's face should appear; detection results matching the condition are searched for, and the photographs corresponding to the matching results are presented to the user.
[0066]
Applying this further to image data acquired from web servers on a network makes it possible to search images published on the web for those in which a person appears in a specific arrangement.
[0067]
Applying this example to a camera yields the following camera. The user specifies the desired arrangement of human faces in advance; each image captured continuously in digital form is used as target image data, the arrangement of the faces of the people it contains is detected by the processing of FIG. 2, and image data whose detection result matches the arrangement specified by the user is selectively stored in the storage unit 12. This makes it possible to take a photograph in which human faces are in the desired arrangement. It is also useful when it is difficult to capture a moment in which two people both face the front, for example when photographing two children, because the processing of FIG. 2 detects a face at the corresponding position only when it faces the front. Likewise, when taking a group photograph, instead of using a self-timer or remote control as in the past, the camera can record the image when the person who set it up reaches the desired position in the picture.
[0068]
Furthermore, in this embodiment, the search target is not limited to a human face. For example, in an industrial parts production factory, when the number of parts flowing on a conveyor is counted, or when a natural image is taken or a scientific experiment is recorded, an image where the subject is in a desired arrangement state is recorded. It can also be used for purposes such as.
[0069]
[Modification of processing]
At the search area defining stage, it is also preferable to determine whether a search area might contain the search target and to define search areas according to the result. This is because search areas defined by autonomously searching for areas that satisfy the predetermined condition can be expected to overlap one another. For example, for a single front-facing face, the area around the left ear, the area around the nose, and the area around the right ear may each be defined as separate search areas, and not all of them need to be processed.
[0070]
In this case, the control unit 11 performs the process shown in FIG. 3 instead of that of FIG. 2. In FIG. 3, steps identical to those in FIG. 2 bear the same reference numerals, and their detailed description is omitted. The control unit 11 first sets the reduction rate S to the minimum reduction rate (S1) and reduces the target image data at the reduction rate S (S2). It then secures on the storage unit 12 an area for map data of the same size as the reduced target image data and initializes it by setting every value to "false" (S3). The control unit 11 also generates an array of Boolean values of the same size as the target image data (hereinafter the "gaze map") (S20), with every Boolean value initially "false".
[0071]
Next, the control unit 11 defines at least one search area in the reduced target image data (S4), here using the autonomous process described above. It then sets to "true" the Boolean values of the gaze map in the areas corresponding to the search areas (S21). The control unit 11 then selects one of the search areas (S22) and checks whether the Boolean value at the center coordinates of the corresponding area on the gaze map is true (S23). For a rectangular area whose upper-left and lower-right corners are (x1, y1) and (x2, y2), the center coordinates are the point ((x1 + x2)/2, (y1 + y2)/2), converted to integers by a predetermined method.
[0072]
If the Boolean value is true in S23 (Yes), the conversion process (S6) and the search process (S7) are performed on the selected search area. If the search process determines that the search target is contained in the search area (Yes), the values of the area on the map data corresponding to the converted search area are set to "true" (S8), and the Boolean values of the area on the gaze map corresponding to the search area are set to "false" (S24). As a result, search areas whose center point falls within that area are not processed again.
[0073]
The control unit 11 then checks whether any Boolean value on the gaze map is true (S25). If one is (Yes), it demarcates a new search area centered on one of those coordinates (S26) and returns to S22 to select that search area and continue processing. If the Boolean value is not true in S23 (No), the processing proceeds to S25 (C).
[0074]
If, on the other hand, the search process in S7 determines that the search target is not included (No), the processing proceeds directly to S25. If no Boolean value on the gaze map is true in S25 (No), the control unit 11 checks whether the currently set reduction rate S exceeds the predetermined maximum reduction rate (S10); if it does not (No), the reduction rate S is increased (S11), and the processing returns to S2 and continues. As in FIG. 2, the increase in S11 may raise the reduction rate S by a predetermined ratio, or may set the new reduction rate as S = S × ΔS by multiplying the magnification represented by S by a predetermined multiplier ΔS.
[0075]
If the currently set reduction rate S exceeds the predetermined maximum reduction rate in S10 (Yes), the area of the original (unreduced) target image data that contains the search target is determined based on the map data for the target image data at each reduction rate (S12), and the processing ends.
[0076]
With this process, even if several search areas are defined for a single search target in the search area defining process, adjacent search areas are not processed redundantly.
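The gaze-map bookkeeping of FIG. 3 can be sketched as follows for a single reduction rate. This is a simplified, hypothetical illustration: areas are inclusive `(x1, y1, x2, y2)` rectangles of a common size, `process` is a hypothetical stand-in for S6 and S7, and, as an added assumption not stated in the text, the gaze-map cell at a processed center is cleared even when nothing is found so that the loop always terminates.

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # inclusive (x1, y1, x2, y2)


def gaze_map_search(width: int, height: int, areas: List[Rect],
                    process: Callable[[Rect], bool]) -> List[Rect]:
    if not areas:
        return []
    gaze = [[False] * width for _ in range(height)]             # S20

    def fill(r: Rect, value: bool) -> None:
        x1, y1, x2, y2 = r
        for y in range(max(y1, 0), min(y2, height - 1) + 1):
            for x in range(max(x1, 0), min(x2, width - 1) + 1):
                gaze[y][x] = value

    for r in areas:                                             # S21
        fill(r, True)
    aw, ah = areas[0][2] - areas[0][0], areas[0][3] - areas[0][1]
    pending, hits = list(areas), []
    while True:
        if pending:
            r = pending.pop(0)                                  # S22
        else:
            point = next(((x, y) for y in range(height) for x in range(width)
                          if gaze[y][x]), None)                 # S25
            if point is None:
                break
            px, py = point                                      # S26: new area centred here
            r = (px - aw // 2, py - ah // 2, px - aw // 2 + aw, py - ah // 2 + ah)
        cx, cy = (r[0] + r[2]) // 2, (r[1] + r[3]) // 2         # integer centre
        if not (0 <= cx < width and 0 <= cy < height) or not gaze[cy][cx]:
            continue                                            # S23 "No"
        gaze[cy][cx] = False                                    # assumption: guarantees termination
        if process(r):                                          # S6, S7
            hits.append(r)
            fill(r, False)                                      # S24
    return hits
```

In a usage with two heavily overlapping areas over one target, the second area is skipped because its center was cleared in S24 after the first area was found to contain the target.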
[0077]
[Variation of search area definition processing]
In the search area defining process, the control unit 11 may also decide whether actually to define a candidate area as a search area based on the entropy, hierarchical entropy, color, or luminance distribution of the image data in that area, or on a combination of two or more of these. For example, when searching for a person's face, the entropy is high at the periphery (contour) of the face, so the candidate area is actually defined as a search area only if its entropy exceeds a predetermined threshold; otherwise it is not set as a search area and processing continues. This rationally reduces the number of search areas subject to the conversion and search processes, lowering the processing load.
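The entropy criterion can be sketched as follows, assuming plain Shannon entropy of the grey-level histogram of a candidate patch and a caller-chosen threshold; hierarchical entropy, color, and luminance-distribution criteria are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(pixels: Sequence[Sequence[int]]) -> float:
    """Shannon entropy (bits) of the grey-level histogram of a patch."""
    counts = Counter(v for row in pixels for v in row)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def should_define(pixels: Sequence[Sequence[int]], threshold: float) -> bool:
    """Keep the candidate as a search area only if its entropy exceeds the threshold."""
    return entropy(pixels) > threshold
```

A flat patch has zero entropy and is rejected, while a patch with many distinct grey levels, such as a region crossing a face contour, passes.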
[0078]
[Still another variation of processing]
In the description so far, the control unit 11 defines the search areas, but the user may instead specify the search areas themselves. In this case, the control unit 11 has the user select the size and orientation of the target image data, displays a frame corresponding to that size on the display unit 14, and has the user draw search area candidates within the frame. Specifically, a screen such as that shown in FIG. 4 is displayed on the display unit 14, and a search area candidate R is drawn within the frame B. When the control unit 11 receives a click on the "define" button from the user, it defines, based on the drawn candidate R, the rectangle circumscribing R as a search area.
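Deriving the circumscribing rectangle from a drawn candidate can be sketched as below, assuming the candidate R arrives as a list of stroke points and that the result should be clipped to the frame B; both the point representation and the clipping are assumptions.

```python
from typing import Iterable, Tuple


def circumscribing_rect(points: Iterable[Tuple[int, int]],
                        frame: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    """Axis-aligned rectangle (x1, y1, x2, y2) enclosing the drawn points, clipped to frame B."""
    pts = list(points)
    if not pts:
        raise ValueError("no points drawn")
    fx1, fy1, fx2, fy2 = frame
    x1 = max(min(x for x, _ in pts), fx1)
    y1 = max(min(y for _, y in pts), fy1)
    x2 = min(max(x for x, _ in pts), fx2)
    y2 = min(max(y for _, y in pts), fy2)
    return x1, y1, x2, y2
```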
[0079]
Further, a pattern (template) of a search area defined in advance is stored in the storage unit 12, and when a search area specification is accepted, a list of the stored templates is presented to the user for selection. A search area may be defined based on the selected template.
[0080]
[Modification of conversion process]
When the user specifies the search areas, information on the conversion to be applied to each specified search area (information indicating the degree of freedom and amount of conversion; hereinafter "user-specified conversion information") may be associated with that area and used in the conversion process. This user-specified conversion information corresponds to an example of "information related to a search target state" in the present invention. The conversion process in this case is as follows.
[0081]
The control unit 11 first selects one of the defined search areas and converts it according to the conversion information associated with it (preliminary conversion). It then maps the image data portion in the preliminarily converted search area to feature vector information in the space F and projects the mapping onto each subspace Ω. Using the conversion databases 13a, it determines for each degree of freedom of conversion the conversion amount that minimizes the distance E between the feature vector information before and after projection.
[0082]
From the information learned in the conversion database 13a for each degree of freedom, the control unit 11 thus obtains, for the image data portion in the search area, the conversion amount that brings it closer to the reference state for each degree of freedom, together with the corresponding error (the square of the distance E), and identifies the conversion of the degree of freedom with the smallest error. If the identified degree of freedom is the same as that determined by the user-specified conversion information associated with the selected search area, the control unit 11 re-identifies the conversion of the degree of freedom with the next smallest error, repeating until it obtains a conversion of a degree of freedom other than the one determined by the user-specified conversion information. The control unit 11 then applies the conversion of the identified degree of freedom to the selected search area by the amount determined for it, thereby defining a new search area. The image data portion in the new search area is again mapped to feature vector information in the space F and projected onto each subspace Ω, and the process is repeated from the step of determining the conversion amount that minimizes the distance E between the feature vector information before and after projection.
[0083]
When the conversion amount for every degree of freedom other than the one fixed by the user-specified conversion information is “0” (that is, no conversion remains to be performed for any other degree of freedom), the process for that search area ends at that stage. If any search areas remain unprocessed, the conversion process is then performed on one of them.
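The selection loop described in the two preceding paragraphs can be sketched as follows. This is a toy abstraction, not the device's implementation: each search area is modeled by its remaining deviation per degree of freedom (in the device the state is the image data portion itself), and `toy_estimate` is a hypothetical stand-in for the per-degree-of-freedom conversion databases 13a. Excluding the user-specified degree of freedom up front is equivalent to "pick the next smallest error"; skipping candidates whose amount is already "0" is an added assumption that keeps the loop from re-applying an empty conversion.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, float]                   # hypothetical: remaining deviation per degree of freedom
Estimate = Dict[str, Tuple[float, float]]  # degree of freedom -> (conversion amount, error E^2)


def convert_to_reference(
    state: State,
    estimate: Callable[[State], Estimate],
    user_dof: Optional[str],
    max_iterations: int = 100,
) -> State:
    """Apply the lowest-error conversion repeatedly until no conversion is left.

    The degree of freedom fixed by the user-specified conversion information
    (`user_dof`) is never selected; the preliminary conversion handled it."""
    state = dict(state)
    for _ in range(max_iterations):
        candidates = {
            dof: (amount, error)
            for dof, (amount, error) in estimate(state).items()
            if dof != user_dof and amount != 0
        }
        # Every other degree of freedom reports "0" (no conversion): done.
        if not candidates:
            break
        # Choose the degree of freedom whose estimate has the smallest error.
        dof, (amount, _) = min(candidates.items(), key=lambda item: item[1][1])
        state[dof] = state.get(dof, 0.0) + amount
    return state


def process_search_areas(
    areas: List[Tuple[State, Optional[str]]],
    estimate: Callable[[State], Estimate],
) -> List[State]:
    """Run the conversion process on each (state, user_dof) search area in turn."""
    return [convert_to_reference(state, estimate, user_dof) for state, user_dof in areas]


def toy_estimate(state: State) -> Estimate:
    """Hypothetical estimator: undo the deviation in steps of at most 10,
    reporting the current deviation as the error."""
    return {dof: (max(-10.0, min(10.0, -dev)), abs(dev)) for dof, dev in state.items()}
```

With `toy_estimate`, an area whose user-specified degree of freedom is `"rotation"` keeps its rotation value untouched, while the other degrees of freedom are driven to zero, smallest error first.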
[0084]
Here too, the resolution of the image data portion may be reduced to obtain coarse-grained data, and the conversion process may be executed on that coarse-grained data. In that case, each conversion database 13a is likewise learned using learning samples that correspond to the coarse-grained data.
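The text does not say how the resolution is reduced. A minimal sketch, assuming simple block averaging over non-overlapping factor × factor tiles and dropping edge pixels that do not fill a whole tile, looks like this; `coarse_grain` is a hypothetical name. Whatever method is used, the same one would have to be applied to the learning samples of each conversion database 13a.

```python
from typing import List

Image = List[List[float]]


def coarse_grain(image: Image, factor: int) -> Image:
    """Reduce resolution by averaging non-overlapping factor x factor blocks.

    Rows/columns that do not fill a whole block are dropped (an assumption;
    the text does not specify border handling)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rows = len(image) // factor
    cols = (len(image[0]) // factor) if image else 0
    out: Image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```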
[0085]
Using this conversion process, it is possible, for example, to search for image data while taking into account the state of the search target in each search area. Suppose a photo contains both upright and lying people, so that some faces are upright and others are rotated 90 degrees from upright. The user then specifies, as a search condition, information on each state together with the corresponding search area. For example, if the search database 13b has been learned using upright faces as learning samples in the reference state, the search area associated with the upright state is designated to undergo a 0-degree rotation conversion. In this case, because no rotation conversion is performed in the conversion process, the search process detects only faces of people in the reference state, i.e., upright. If, on the other hand, the search area of a lying person (rotated 90 degrees from upright) is designated to undergo a -90-degree rotation conversion, the -90-degree conversion is performed as the preliminary conversion, and no further rotation conversion is performed in the conversion process. With respect to the rotational degree of freedom, the search process therefore detects only faces whose state after the preliminary conversion is the reference state (that is, faces of people tilted by 90 degrees).
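The preliminary conversion in this example is a quarter-turn rotation chosen per search area. The sketch below is a minimal illustration, assuming each search area is a small grid of pixel values and that positive angles mean counterclockwise rotation (the text does not fix a sign convention). `rotate_quarter_turns` and `preliminary_convert` are hypothetical helper names.

```python
from typing import Dict, List

Image = List[List[int]]


def rotate_quarter_turns(image: Image, degrees: int) -> Image:
    """Rotate by a multiple of 90 degrees; positive = counterclockwise (assumed)."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported here")
    turns = (degrees // 90) % 4  # -90 becomes three counterclockwise turns
    for _ in range(turns):
        # One counterclockwise quarter turn: transpose, then reverse row order.
        image = [list(row) for row in zip(*image)][::-1]
    return image


def preliminary_convert(
    areas: Dict[str, Image], user_rotation: Dict[str, int]
) -> Dict[str, Image]:
    """Apply each area's user-specified rotation (default 0 degrees)."""
    return {
        name: rotate_quarter_turns(img, user_rotation.get(name, 0))
        for name, img in areas.items()
    }
```

An upright area specified with 0 degrees passes through unchanged, while a lying face specified with -90 degrees is brought back to the upright reference orientation before the conversion process runs.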
[0086]
[Another configuration example for obtaining the same effect]
Alternatively, instead of the above process using preliminary conversion, the control unit 11 may execute the conversion process shown in FIG. 2 and, once it has finished, compare the conversion actually performed in the conversion process with the conversion fixed by the user-specified conversion information associated with the search area. If this comparison shows that a conversion equivalent to the one indicated in the user-specified conversion information was performed, the search process is executed for the converted search area; if not, the search process for that search area is skipped.
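The comparison step can be sketched as a filter over search areas. This assumes, hypothetically, that each conversion is recorded as a total amount per degree of freedom, that only the degrees of freedom named in the user-specified conversion information are compared, and that "equivalent" means equal within a small tolerance; the text defines none of these details.

```python
from typing import Dict, List, Tuple

Conversion = Dict[str, float]  # hypothetical: degree of freedom -> total amount applied


def equivalent(performed: Conversion, specified: Conversion, tol: float = 1e-6) -> bool:
    """True if every user-specified degree of freedom was converted by
    (approximately) the specified amount; other degrees of freedom are ignored."""
    return all(
        abs(performed.get(dof, 0.0) - amount) <= tol
        for dof, amount in specified.items()
    )


def areas_to_search(results: List[Tuple[str, Conversion, Conversion]]) -> List[str]:
    """Keep areas whose actual conversion matches the user specification;
    the search process is skipped for all others."""
    return [area for area, performed, specified in results if equivalent(performed, specified)]
```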
[0087]
[Other variations]
In the description so far, the conversions performed in the conversion process have been two-dimensional rotation, translation, and enlargement/reduction (enlargement or reduction of the search area, and of the original image data region before enlargement or reduction). Other conversions may also be included; for a human face, for example, three-dimensional rotation caused by changes in posture (looking downward or turning the head sideways). In this case, an average three-dimensional model of the search target is assumed, and three-dimensional rotation can be realized by using the image data portion projected onto that average three-dimensional model.
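One way to picture the average-model approach is to give every pixel a depth taken from the average three-dimensional model, rotate the resulting 3-D points, and project them back onto the image plane. The sketch below does this for rotation about the vertical axis (turning the head sideways) under several simplifying assumptions of our own: orthographic projection, nearest-pixel forward mapping with a z-buffer, and a hypothetical per-pixel `depth` grid standing in for the average model. Holes left by the forward mapping are filled with a constant. Rotation about the horizontal axis (looking up or down) would follow the same pattern.

```python
import math
from typing import List, Optional

Grid = List[List[float]]


def yaw_rotate(image: Grid, depth: Grid, degrees: float, fill: float = 0.0) -> Grid:
    """Re-render `image` as if the object turned by `degrees` about the vertical axis.

    `depth` is a hypothetical average 3-D model: one depth value per pixel,
    larger = closer to the camera. Orthographic projection and nearest-pixel
    forward mapping are simplifying assumptions; uncovered pixels get `fill`."""
    h, w = len(image), len(image[0])
    cx = (w - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out: Grid = [[fill] * w for _ in range(h)]
    zbuf: List[List[Optional[float]]] = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, z = x - cx, depth[y][x]
            # Rotate the point (dx, z) about the vertical (y) axis.
            new_x = cx + dx * cos_t + z * sin_t
            new_z = -dx * sin_t + z * cos_t
            tx = round(new_x)
            if 0 <= tx < w:
                current = zbuf[y][tx]
                # Keep the surface point nearest the camera (z-buffer test).
                if current is None or new_z > current:
                    zbuf[y][tx] = new_z
                    out[y][tx] = image[y][x]
    return out
```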
[0088]
Further, the search process may subdivide the search target into categories; even for human faces, for example, the search conditions may include age, sex, and whether or not the mouth is open.
[0089]
Further, in the flowcharts of FIG. 2 and FIG. 3, the control unit 11 performs the processing at each reduction rate sequentially. However, because the processing at each reduction rate is independent of the others, the processing at the different reduction rates may instead be performed in parallel.
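Because the per-rate runs share no state, they can be dispatched concurrently. The sketch below is a minimal illustration using Python's standard `concurrent.futures`; `process_at_rate` is a hypothetical stand-in for the FIG. 2 or FIG. 3 processing at a single reduction rate. Threads are used here for simplicity; for CPU-bound work in CPython, `ProcessPoolExecutor` is the usual substitute, provided the callable can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, TypeVar

R = TypeVar("R")


def run_all_reduction_rates(
    rates: Sequence[float],
    process_at_rate: Callable[[float], R],
    max_workers: int = 4,
) -> Dict[float, R]:
    """Run the per-rate processing concurrently and collect results by rate.

    `process_at_rate` must not share mutable state across rates; that
    independence is what makes the parallel version equivalent to the
    sequential one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `rates`.
        results = list(pool.map(process_at_rate, rates))
    return dict(zip(rates, results))
```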
[Brief description of the drawings]
FIG. 1 is a configuration block diagram of an image search apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an example of processing by the control unit 11.
FIG. 3 is a flowchart showing another example of processing by the control unit 11.
FIG. 4 is an explanatory diagram illustrating an example of an interface of an apparatus to which the image search apparatus according to the embodiment of the present invention is applied.
[Explanation of symbols]
11 control unit, 12 storage unit, 13 database unit, 14 display unit, 15 operation unit, 16 external storage unit.

Claims

An image search device for searching target image data to be processed for the image data portion of a search target,
Means for defining at least one search area in the target image data;
Conversion means for acquiring, for the search partial data included in each of the defined search areas, a conversion condition including the type and amount of conversion to be applied to bring that search partial data into the reference state, by referring to a conversion database obtained by learning, for at least one type of conversion, from a plurality of learning samples each converted from a predetermined reference state by a mutually different conversion amount, each learning sample being associated with information on the conversion amount that returns it to the reference state, and for converting the search partial data into the reference state by repeatedly performing conversion based on the acquired conversion condition until the conversion amount indicated by the conversion condition for each search partial data becomes “0”;
Search means for referring to a search database acquired by learning using an example of image data of a search target in the reference state, and determining whether a search target is included in each of the converted search partial data;
Means for generating and outputting search result information for specifying a region on the target image data corresponding to the search partial data determined to include a search target;
Including
An image search device, wherein the generated search result information is subjected to a predetermined process.

The image search device according to claim 1,
The conversion database includes a database group for storing conversion conditions obtained by learning for each type of conversion,
The conversion means refers to each database group included in the conversion database to acquire, for the search partial data included in each of the defined search areas, a conversion condition including the type and amount of conversion to be applied to bring that search partial data into the reference state, and converts the search partial data into the reference state by repeatedly performing conversion based on the acquired conversion condition until the conversion amount indicated by the conversion condition for each search partial data becomes “0”.

  An image search method for searching, using a computer, target image data to be processed for the image data portion of a search target,
  a step in which search area defining means defines at least one search area in the target image data;
  a step in which conversion means refers to a conversion database obtained by learning, for at least one type of conversion, from a plurality of learning samples each converted from a predetermined reference state by a mutually different conversion amount, each learning sample being associated with information on the conversion amount that returns it to the reference state; acquires, for the search partial data included in each of the defined search areas, a conversion condition including the type and amount of conversion to be applied to bring that search partial data into the reference state; and converts the search partial data into the reference state by repeatedly performing conversion based on the acquired conversion condition until the conversion amount indicated by the conversion condition for each search partial data becomes “0”;
  a step in which search means refers to a search database acquired by learning using examples of image data of the search target in the reference state, and determines whether the search target is included in each of the converted search partial data; and
  a step in which output means generates and outputs search result information specifying the region on the target image data corresponding to the search partial data determined to include the search target;
  Including
  An image search method, wherein the generated search result information is subjected to a predetermined process.

  A program for searching target image data to be processed for the image data portion of a search target, the program causing a computer to function as:
  means for defining at least one search area in the target image data;
  conversion means for acquiring, for the search partial data included in each of the defined search areas, a conversion condition including the type and amount of conversion to be applied to bring that search partial data into the reference state, by referring to a conversion database obtained by learning, for at least one type of conversion, from a plurality of learning samples each converted from a predetermined reference state by a mutually different conversion amount, each learning sample being associated with information on the conversion amount that returns it to the reference state, and for converting the search partial data into the reference state by repeatedly performing conversion based on the acquired conversion condition until the conversion amount indicated by the conversion condition for each search partial data becomes “0”;
  search means for referring to a search database acquired by learning using examples of image data of the search target in the reference state, and determining whether the search target is included in each of the converted search partial data; and
  means for generating and outputting search result information specifying the region on the target image data corresponding to the search partial data determined to include the search target;
  A program characterized in that the generated search result information is subjected to a predetermined process.