JP4099981B2

JP4099981B2 - Image recognition system, image recognition method, and image recognition program

Info

Publication number: JP4099981B2
Application number: JP2001369557A
Authority: JP
Inventors: 晃井上
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-12-04
Filing date: 2001-12-04
Publication date: 2008-06-11
Anticipated expiration: 2021-12-04
Also published as: JP2003168113A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像認識システム、画像認識方法および画像認識プログラムに関し、特に、画像として撮影された対象物体が、辞書に登録された物体かどうかを識別する、または辞書に登録された複数のカテゴリ中の一つに分類する画像認識システム、画像認識方法および画像認識プログラムに関する。
【０００２】
【従来の技術】
従来の画像認識システムの一例が、特開平１１−２６５４５２号公報（物体認識装置および物体認識方法）に記載されている。図１３は、この従来の画像認識システムの構成を示すブロック図である。この図に示すように、従来の画像認識システムは、画像入力部５０１と、辞書記憶部５０２と、部分空間間の角度計算部５０３と、認識部５０４とから構成されている。
【０００３】
画像入力部５０１は、複数方向から撮影された複数の画像を獲得する。辞書記憶部５０２には、あらかじめＭ次元の部分空間で表現された辞書データが、カテゴリごとに用意されている。部分空間間の角度計算部５０３は、まず画像入力部５０１によって獲得された入力画像群をＮ次元部分空間で表現する。具体的には、入力画像を１次元特徴データとみなして主成分分析し、Ｎ個の固有ベクトルを抽出し、このＮ個の固有ベクトルでＮ次元部分空間を規定する。部分空間間の角度計算部５０３は、さらに入力画像のＮ次元部分空間（以下、入力部分空間という）と辞書のＭ次元部分空間（以下、辞書部分空間という）との角度Θを、辞書のカテゴリごとに計算する。認識部５０４は、部分空間間の角度計算部５０３において算出された角度Θを比較し、角度Θが最も小さいカテゴリを認識結果として出力する。
【０００４】
図１４を参照して、より具体的に説明する。図１４において、５１１は入力画像から抽出された特徴データが分布する入力特徴分布、５１２は入力特徴分布５１１を含む入力部分空間、５２１は辞書データのあるカテゴリの作成に用いた特徴データが分布する辞書特徴分布、５２２は辞書特徴分布５２１を含む辞書部分空間である。
入力部分空間５１２の基底ベクトルをΨn（ｎ＝１，２，…，Ｎ）、辞書部分空間５２２の基底ベクトルをΦm（ｍ＝１，２，…，Ｍ）とすると、部分空間間の角度計算部５０３で、式（１）または式（２）のｘijを要素にもつ行列Ｘを計算する。
【０００５】
【数７】

【０００６】
行列Ｘの最大固有値として、部分空間５１２，５２２間の角度Θの余弦の二乗が求められる。角度Θの余弦の二乗が大きい（または、小さい）とき、角度Θは小さく（または、大きく）なり、また角度Θが小さい（または、大きい）とき、部分空間５１２，５２２間の類似度が大きく（または、小さく）なる。したがって、行列Ｘの最大固有値を、部分空間５１２，５２２間の類似度と言い換えることができる。
よって、部分空間間の角度計算部５０３で、各カテゴリに対し行列Ｘの最大固有値を求めて類似度とし、認識部５０４で、類似度が最大のカテゴリに入力画像を分類する。
【０００７】
【発明が解決しようとする課題】
図１５は、従来の画像認識システムの問題点を示す概念図である。
この図に示すように、同じ入力部分空間５１２内の異なる位置に３つの入力特徴分布５１１Ａ，５１１Ｂ，５１１Ｃが存在する場合には、３つの入力特徴分布５１１Ａ〜５１１Ｃは互いに離れた位置にあり、辞書部分空間５２２内の辞書特徴分布５２１との隔たりも明らかに異なるので、本来は異なる類似度が算出されなければならない。
【０００８】
しかし、従来の画像認識システムでは、入力部分空間５１２と辞書部分空間５２２との角度Θの余弦の二乗と等価な行列Ｘの最大固有値を類似度として用いているので、同じ入力部分空間５１２内に存在する３つの入力特徴分布５１１Ａ〜５１１Ｃと辞書部分空間５２２との類似度がすべて同じになってしまう。
このように、従来の画像認識システムでは、同じ入力部分空間５１２内の異なる位置に入力特徴分布がある場合には、辞書部分空間５２２との類似度がすべて同じとなり、判別できないという問題があった。
【０００９】
本発明はこのような課題を解決するためになされたものであり、その目的は、入力部分空間と辞書部分空間との角度に依存せず、入力画像群を正しく判別することができる画像認識システム、画像認識方法および画像認識プログラムを提供することにある。
【００１０】
【課題を解決するための手段】
このような目的を達成するために、本発明の画像認識システムは、同じ対象が撮影された複数の入力画像と予め登録された辞書データとを照合し認識結果を出力する画像認識システムであって、前記入力画像から得られた複数の入力特徴データの分布範囲に属する任意の一ベクトルである入力代表ベクトルと、前記入力特徴データによって張られる入力部分空間の基底である入力主成分ベクトルと、予め登録された辞書特徴データの分布範囲に属する任意の一ベクトルである辞書代表ベクトルと、前記辞書特徴データによって張られる辞書部分空間の基底である辞書主成分ベクトルとを用いて、前記入力部分空間と前記辞書部分空間との距離値を算出する距離算出手段と、前記距離値に基づいて前記認識結果を出力する識別を行う識別手段とを備えたことを特徴とする。
【００１１】
より具体的には、前記入力画像から得られた複数の入力特徴データの分布範囲に属する任意の一ベクトルである入力代表ベクトルと、前記入力特徴データと入力代表ベクトルとの差分ベクトルによって張られる入力部分空間の基底である入力主成分ベクトルと、予め登録された辞書特徴データの分布範囲に属する任意の一ベクトルである辞書代表ベクトルと、前記辞書特徴データによって張られる辞書部分空間の基底である辞書主成分ベクトルとを用いて、前記入力部分空間と前記辞書部分空間との距離値を算出する距離算出手段と、前記距離値に基づいて前記認識結果を出力する識別を行う識別手段とを備えてもよい。これにより、複数の入力特徴分布が同じ入力部分空間内に存在する場合でも、各入力特徴分布の配置が異なれば距離値も異なるので、入力画像群を正しく判別することができる。
【００１２】
この画像認識システムにおいて、入力主成分ベクトルおよび辞書主成分ベクトルが、直交基底であってもよい。直交基底である主成分ベクトルを用いて距離値の計算を行うことにより、直交基底でない場合と比較して、短時間で高精度の照合結果を得ることができ、認識率を向上させることができる。
また、入力代表ベクトルが、入力特徴データの平均ベクトルであり、入力主成分ベクトルが、入力特徴データから入力代表ベクトルを減算した成分のうち固有値が大きい方からＫ個の固有ベクトルであり、辞書代表ベクトルが、辞書特徴データの平均ベクトルであり、辞書主成分ベクトルが、辞書特徴データから辞書代表ベクトルを減算した成分のうち固有値が大きい方からＬ個の固有ベクトルであってもよい。
【００１３】
また、距離算出手段は、Ｌ個の辞書主成分ベクトルで形成される空間と入力代表ベクトルとの第１の距離を算出する入力投影距離算出手段と、Ｋ個の入力主成分ベクトルで形成される空間と辞書代表ベクトルとの第２の距離を算出する辞書投影距離算出手段と、第１および第２の距離から入力部分空間と辞書部分空間との距離値を算出する統合手段とを備えるものであってもよい。このように入力画像および辞書データから得られた多くのデータを有効に利用して距離値を算出し、この距離値を照合に用いるので、照合性能が向上し、高い認識率が得られる。
【００１４】
ここで、入力代表ベクトルをＶ₁、辞書代表ベクトルをＶ₂、入力主成分ベクトルをΨ_i（ｉ＝１，…，Ｋ）、辞書主成分ベクトルをΦ_j（ｊ＝１，…，Ｌ）とすると、入力投影距離算出手段は、式（３）により第１の距離ｄ₁を算出し、辞書投影距離算出手段は、式（４）により第２の距離ｄ₂を算出するものであってもよい。
【００１５】
【数８】

【００１６】
また、上述した画像認識システムにおいて、前記距離算出手段は、前記入力主成分ベクトルのそれぞれに対応するＫ個の重みと、前記辞書主成分ベクトルのそれぞれに対応するＬ個の重みとを用いるものであってもよい。
ここで、入力主成分ベクトルに対応する重みが、入力主成分ベクトルとなる固有ベクトルのＫ個の固有値であり、辞書主成分ベクトルに対応する重みが、辞書主成分ベクトルとなる固有ベクトルのＬ個の固有値であってもよい。
【００１７】
また、入力代表ベクトルをＶ₁、辞書代表ベクトルをＶ₂、入力主成分ベクトルをΨ_i（ｉ＝１，…，Ｋ）、入力主成分ベクトルに対応する重みをμ_i（ｉ＝１，…，Ｋ）、辞書主成分ベクトルをΦ_j（ｊ＝１，…，Ｌ）、辞書主成分ベクトルに対応する重みをλ_j（ｊ＝１，…，Ｌ）とすると、入力投影距離算出手段は、式（５）により第１の距離ｄ₁を算出し、辞書投影距離算出手段は、式（６）により第２の距離ｄ₂を算出するものであってもよい（ただし、σは任意の定数）。
【００１８】
【数９】

【００１９】
また、上述した画像認識システムにおいて、統合手段は、第１の距離ｄ₁および第２の距離ｄ₂から式（７）により入力部分空間と辞書部分空間との距離値Ｄを算出するものであってもよい。
Ｄ＝αｄ₁＋βｄ₂ ・・・（７）
（ただし、α，βは定数）
あるいは、統合手段は、第１の距離ｄ₁および第２の距離ｄ₂から式（８）により入力部分空間と辞書部分空間との距離値Ｄを算出するものであってもよい。
Ｄ＝αｄ₁・ｄ₂／（ｄ₁＋ｄ₂）・・・（８）
（ただし、αは定数）
【００２０】
また、上述した画像認識システムにおいて、前記複数の入力特徴データから前記入力代表ベクトルを生成する入力代表ベクトル生成手段と、前記入力特徴データから前記入力主成分ベクトルを生成する入力主成分データ生成手段とをさらに備えていてもよい。
【００２１】
また、上述した画像認識システムにおいて、前記辞書代表ベクトルを生成する辞書代表ベクトル生成手段と、前記辞書主成分ベクトルを生成し、前記辞書代表ベクトルおよび前記辞書主成分ベクトルを格納する辞書格納手段に出力する辞書主成分データ生成手段とをさらに備えていてもよい。
また、上述した画像認識システムは、認識対象が人間の顔画像である。これにより、人間の顔画像を用いて画像中の人物を同定するシステムを構成することができる。
【００２２】
また、本発明の画像認識方法は、同じ対象が撮影された複数の入力画像と予め登録された辞書データとを照合し認識結果を出力する画像認識方法であって、前記入力画像から得られた複数の入力特徴データの分布範囲に属する任意の一ベクトルである入力代表ベクトルと、前記入力特徴データによって張られる入力部分空間の基底である入力主成分ベクトルと、予め登録された辞書特徴データの分布範囲に属する任意の一ベクトルである辞書代表ベクトルと、前記辞書特徴データによって張られる辞書部分空間の基底である辞書主成分ベクトルとを用いて、前記入力部分空間と前記辞書部分空間との距離値を算出する距離算出ステップと、前記距離値に基づいて前記認識結果を出力する識別を行う識別ステップとを備えたことを特徴とする。
【００２３】
より具体的には、前記入力画像から得られた複数の入力特徴データの分布範囲に属する任意の一ベクトルである入力代表ベクトルと、前記入力特徴データと入力代表ベクトルとの差分ベクトルによって張られる入力部分空間の基底である入力主成分ベクトルと、予め登録された辞書特徴データの分布範囲に属する任意の一ベクトルである辞書代表ベクトルと、前記辞書特徴データによって張られる辞書部分空間の基底である辞書主成分ベクトルとを用いて、前記入力部分空間と前記辞書部分空間との距離値を算出する距離算出ステップと、前記距離値に基づいて前記認識結果を出力する識別を行う識別ステップとを備えていてもよい。
この画像認識方法において、入力主成分ベクトルおよび辞書主成分ベクトルが、直交基底であってもよい。
【００２４】
また、前記入力代表ベクトルは、前記入力特徴データの平均ベクトルであり、前記入力主成分ベクトルは、前記入力特徴データから前記入力代表ベクトルを減算した成分のうち、固有値が大きい方からＫ個の固有ベクトルであり、前記辞書代表ベクトルは、前記辞書特徴データの平均ベクトルであり、前記辞書主成分ベクトルは、前記辞書特徴データから前記辞書代表ベクトルを減算した成分のうち、固有値が大きい方からＬ個の固有ベクトルであってもよい。
【００２５】
また、前記距離算出ステップは、前記Ｌ個の辞書主成分ベクトルで形成される空間と前記入力代表ベクトルとの第１の距離を算出する入力投影距離算出ステップと、前記Ｋ個の入力主成分ベクトルで形成される空間と前記辞書代表ベクトルとの第２の距離を算出する辞書投影距離算出ステップと、前記第１および第２の距離から前記入力部分空間と前記辞書部分空間との前記距離値を算出する統合ステップとからなるものであってもよい。
【００２７】
また、前記入力代表ベクトルをＶ₁、前記辞書代表ベクトルをＶ₂、前記入力主成分ベクトルをΨ_i（ｉ＝１，…，Ｋ）、前記辞書主成分ベクトルをΦ_j（ｊ＝１，…，Ｌ）とすると、前記入力投影距離算出ステップは、式（９）により前記第１の距離ｄ₁を算出し、前記辞書投影距離算出ステップは、式（１０）により前記第２の距離ｄ₂を算出するものであってもよい。
【００２８】
【数１０】

【００２９】
また、前記距離算出ステップは、前記入力主成分ベクトルのそれぞれに対応するＫ個の重みと、前記辞書主成分ベクトルのそれぞれに対応するＬ個の重みとを用いるものであってもよい。
ここで、前記入力主成分ベクトルに対応する重みは、前記入力主成分ベクトルとなる前記固有ベクトルのＫ個の固有値であり、前記辞書主成分ベクトルに対応する重みは、前記辞書主成分ベクトルとなる前記固有ベクトルのＬ個の固有値であってもよい。
【００３０】
また、前記入力代表ベクトルをＶ₁、前記辞書代表ベクトルをＶ₂、前記入力主成分ベクトルをΨ_i（ｉ＝１，…，Ｋ）、前記入力主成分ベクトルに対応する前記重みをμ_i（ｉ＝１，…，Ｋ）、前記辞書主成分ベクトルをΦ_j（ｊ＝１，…，Ｌ）、前記辞書主成分ベクトルに対応する前記重みをλ_j（ｊ＝１，…，Ｌ）とすると、前記入力投影距離算出ステップは、式（１１）により前記第１の距離ｄ₁を算出し、前記辞書投影距離算出ステップは、式（１２）により前記第２の距離ｄ₂を算出するものであってもよい（ただし、σは任意の定数）。
【００３１】
【数１１】

【００３２】
また、前記統合ステップは、前記第１の距離ｄ₁および前記第２の距離ｄ₂から式（１３）により前記入力部分空間と前記辞書部分空間との前記距離値Ｄを算出するものであってもよい。
Ｄ＝αｄ₁＋βｄ₂ ・・・（１３）
（ただし、α，βは定数）
あるいは、前記統合ステップは、前記第１の距離ｄ₁および前記第２の距離ｄ₂から式（１４）により前記入力部分空間と前記辞書部分空間との前記距離値Ｄを算出するものであってもよい。
Ｄ＝αｄ₁・ｄ₂／（ｄ₁＋ｄ₂）・・・（１４）
（ただし、αは定数）
【００３３】
また、上述した画像認識方法は、前記複数の入力特徴データから前記入力代表ベクトルを生成する入力代表ベクトル生成ステップと、前記入力特徴データから前記入力主成分ベクトルを生成する入力主成分データ生成ステップとをさらに備えていてもよい。
また、上述した画像認識方法は、前記辞書代表ベクトルを生成する辞書代表ベクトル生成ステップと、前記辞書主成分ベクトルを生成し、前記辞書代表ベクトルおよび前記辞書主成分ベクトルを格納する辞書格納手段に出力する辞書主成分データ生成ステップとをさらに備えていてもよい。
また、上述した画像認識方法は、認識対象が人間の顔画像である。
【００３４】
【発明の実施の形態】
次に、本発明の実施の形態について、図面を参照して詳細に説明する。
【００３５】
（第１の実施の形態）
図１は、本発明の第１の実施の形態である画像認識システムの構成を示すブロック図である。図２は、図１に示す画像認識システムによる処理を概念的に示す図である。
図１に示す画像認識システムは、同じ対象を撮影して得られた複数の学習画像データからなる学習画像群を獲得する学習画像群入力部１と、この学習画像群入力部１より入力される学習画像群から辞書データを生成する学習部２と、この学習部２で生成される辞書データを撮影対象毎にカテゴリに分けて格納する辞書格納部３と、同じ対象を撮影して得られた複数の入力画像データからなる入力画像群を獲得する識別対象画像群入力部４と、辞書格納部３に格納されている辞書データを用いて識別対象画像群入力部４より入力される入力画像群から撮影対象を認識する照合部５とから構成されている。
【００３６】
学習画像群入力部１は、辞書として登録するためビデオカメラ等によって同一対象物体を撮影して得られたＭ個（Ｍは自然数）の静止画像を獲得し、これらを学習画像データ１ＡとしてカテゴリＩＤ１Ｂとともに学習部２に出力する。Ｍ個の学習画像データ１Ａは、カテゴリＩＤ１Ｂによって指定されたカテゴリに属するものとする。
学習部２は更に、学習画像特徴抽出部２１と、辞書代表ベクトル生成部２２と、辞書主成分データ生成部（辞書部分空間生成手段）２３とから構成されている。
【００３７】
学習画像特徴抽出部２１は、学習画像群入力部１より入力されるＭ個の学習画像データ１Ａから、認識に用いるＭ個の辞書特徴データ２１Ａを特徴抽出し、辞書代表ベクトル生成部２２および辞書主成分データ生成部２３に出力する。Ｍ個の辞書特徴データ２１Ａからなる辞書特徴データ群は、図２（ａ）に示す辞書特徴分布２１Ｂに分布しているものとする。
学習画像特徴抽出部２１の一例として、元の画像データに１次微分、２次微分フィルタを作用させた出力を、ラスタースキャンして１次元特徴データとして出力するものがある。また、学習画像特徴抽出部２１の他の例として、元の画像データをラスタースキャンして１次元特徴データとし、その平均を０、分散を１．０とするように、平均と分散を一定値に正規化するものがある。これにより、輪郭強調や雑音除去などが施された辞書特徴データ２１Ａが得られる。
なお、学習画像特徴抽出部２１は、仮に識別対象画像群入力部４から出力される入力画像データ４Ａが入力されたとしたら、照合部５の入力画像特徴抽出部５１と同じ特徴データを抽出するものである必要がある。
【００３８】
辞書代表ベクトル生成部２２は、Ｍ個の辞書特徴データ２１Ａからなる辞書特徴データ群を基に、この辞書特徴データ群を代表する１つのベクトルである辞書代表ベクトル２２Ａを生成し、辞書主成分データ生成部２３および辞書格納部３に出力する。辞書代表ベクトル２２Ａは、辞書特徴分布２１Ｂを含む辞書部分空間に属する任意のベクトルであり、原点Ｏを始点とし、辞書部分空間上のＰ点を終点とする。辞書代表ベクトル２２Ａの一例として、辞書特徴データ群の平均値（平均ベクトル）や中央値（中央ベクトル）などが挙げられる。
【００３９】
辞書主成分データ生成部２３は、Ｍ個の辞書特徴データ２１Ａのそれぞれから辞書代表ベクトル２２Ａを除いた後の成分を代表するＬ個のベクトルである辞書主成分ベクトル（Φj（ｊ＝１，・・・，Ｌ））２３Ａと、辞書主成分ベクトル（Φj）のそれぞれに対応する重み値（λj（ｊ＝１，・・・，Ｌ））２３Ｂを抽出し、辞書格納部３に出力する。辞書特徴データ２１Ａの特徴次元数をＤとすると、Ｌは１より大きく、ｍｉｎ（Ｍ，Ｄ）以下の自然数である。ｍｉｎ（Ｍ，Ｄ）は、ＭとＤの小さい方の数を表す。辞書主成分ベクトル２３Ａは、辞書特徴分布２１Ｂを含む辞書部分空間を表すベクトルである。
【００４０】
辞書特徴データ２１Ａから辞書代表ベクトル２２Ａを除く方法としては、辞書特徴データ２１Ａから辞書代表ベクトル２２Ａを減算する方法や、辞書特徴データ２１Ａの辞書代表ベクトル２２Ａに垂直な成分を計算する方法がある。
その後、Ｌ個の辞書主成分ベクトル２３Ａおよび重み値２３Ｂを抽出する方法としては、Ｍ個の辞書特徴データ２１Ａのそれぞれから辞書代表ベクトル２２Ａを除いた後の成分を主成分分析し、固有値が大きい方からＬ個の固有ベクトルを辞書主成分ベクトル２３Ａとして選択し、選択された辞書主成分ベクトル２３Ａに対応する固有値を重み値２３Ｂとして採用する方法がある。固有値および固有ベクトルの求め方は、一般的な多変量解析の文献に述べられており、例えば文献１（田中、脇本著、「多変量統計解析法」、現代数学社、pp.71-79, 1983）がある。
【００４１】
辞書格納部３は、例えば図３に示すように、Ｃ個（Ｃは自然数）のレコード記憶部３１，３２，・・・，３Ｃを有し、各レコード記憶部３１〜３Ｃは、それぞれレコード番号６１、辞書代表ベクトル６２、辞書主成分データ６３、カテゴリＩＤ６４を記憶することができる。辞書代表ベクトル６２として、辞書代表ベクトル生成部２２で生成された辞書代表ベクトル２２Ａを、辞書主成分データ６３として、辞書主成分データ生成部２３で生成されたＬ個の辞書主成分ベクトル２３Ａおよび重み値２３Ｂを、カテゴリＩＤ６４として、カテゴリＩＤ１Ｂを記憶する。このように辞書格納部３は、辞書代表ベクトル６２および辞書主成分データ６３を辞書データとしてカテゴリＩＤ６４にしたがって格納する。なお、同じカテゴリＩＤをもつ複数の辞書データを格納することも可能である。
【００４２】
識別対象画像群入力部４は、ビデオカメラ等によって同一対象物体を撮影して得られたＮ個（Ｎは自然数）の静止画像を獲得し、これらを入力画像データ４Ａとして照合部５に出力する。
照合部５は更に、入力画像特徴抽出部５１と、入力代表ベクトル生成部５２と、入力主成分データ生成部（入力部分空間生成手段）５３と、距離算出部５４と、識別部５５とから構成されている。
【００４３】
入力画像特徴抽出部５１は、識別対象画像群入力部４より入力されるＮ個の入力画像データ４Ａから、認識に用いるＮ個の入力特徴データ５１Ａを特徴抽出し、入力代表ベクトル生成部５２および入力主成分データ生成部５３に出力する。Ｎ個の入力特徴データ５１Ａからなる入力特徴データ群は、図２（ａ）に示す入力特徴分布５１Ｂに分布しているものとする。
入力画像特徴抽出部５１の一例として、元の画像データに１次微分、２次微分フィルタを作用させた出力を、ラスタースキャンして１次元特徴データとして出力するものがある。また入力画像特徴抽出部５１の他の例として、元の画像データをラスタースキャンして１次元特徴データとし、その平均を０、分散を１．０とするように、平均と分散を一定値に正規化するものがある。これにより、輪郭強調や雑音除去などが施された入力特徴データ５１Ａが得られる。
【００４４】
入力代表ベクトル生成部５２は、Ｎ個の入力特徴データ５１Ａからなる入力特徴データ群を基に、この入力特徴データ群を代表する１つのベクトルである入力代表ベクトル５２Ａを生成し、入力主成分データ生成部５３および距離算出部５４に出力する。入力代表ベクトル５２Ａは、入力特徴分布５１Ｂを含む入力部分空間に属する任意のベクトルであり、原点Ｏを始点とし、入力部分空間上のＱ点を終点とする。入力代表ベクトル５２Ａの一例として、入力特徴データ群の平均値（平均ベクトル）や中央値（中央ベクトル）などが挙げられる。
【００４５】
入力主成分データ生成部５３は、Ｎ個の入力特徴データ５１Ａのそれぞれから入力代表ベクトル５２Ａを除いた後の成分を代表するＫ個のベクトルである入力主成分ベクトル（Ψi（ｉ＝１，・・・，Ｋ））５３Ａと、入力主成分ベクトル（Ψi）のそれぞれに対応する重み値（μi（ｉ＝１，・・・，Ｋ））５３Ｂを抽出し、距離算出部５４に出力する。入力特徴データ５１Ａの特徴次元数をＤとすると、Ｋは１より大きく、ｍｉｎ（Ｎ，Ｄ）以下の自然数である。入力主成分ベクトル５３Ａは、入力特徴分布５１Ｂを含む入力部分空間を表すベクトルである。
【００４６】
入力特徴データ５１Ａから入力代表ベクトル５２Ａを除く方法としては、入力特徴データ５１Ａから入力代表ベクトル５２Ａを減算する方法や、入力特徴データ５１Ａの入力代表ベクトル５２Ａに垂直な成分を計算する方法がある。
Ｋ個の入力主成分ベクトル５３Ａおよび重み値５３Ｂを抽出する方法としては、Ｎ個の入力特徴データ５１Ａのそれぞれから入力代表ベクトル５２Ａを除いた後の成分を主成分分析し、固有値が大きい方からＫ個の固有ベクトルを入力主成分ベクトル５３Ａとして選択し、選択された入力主成分ベクトル５３Ａに対応する固有値を重み値５３Ｂとして採用する方法がある。
【００４７】
距離算出部５４は、入力代表ベクトル５２Ａ、入力主成分ベクトル５３Ａおよび重み値５３Ｂを用いて、辞書格納部３に格納されているＣ個のカテゴリに属する辞書データとの距離値を算出する。より具体的には、辞書格納部３のレコード記憶部３１〜３Ｃのそれぞれから、辞書代表ベクトル６２と、辞書主成分データ６３として記憶されているＬ個の辞書主成分ベクトル６３Ａおよび重み値６３Ｂとを読み出し、入力代表ベクトル５２Ａ、入力主成分ベクトル５３Ａおよび重み値５３Ｂを用いて、図２（ｂ）に示す入力特徴分布５１Ｂを含む入力部分空間と各カテゴリの辞書特徴分布２１Ｂ′を含む辞書部分空間との間の距離の値Ｄを算出し、距離値５４Ａとして識別部５５に順次出力する。
【００４８】
識別部５５は、Ｃ個のカテゴリとの距離値５４Ａに基づいて、入力画像データ４Ａに対する認識結果５Ａを出力する。この識別部５５は、例えば図４に示すように、最小値算出部７１と、閾値処理部７２とから構成される。最小値算出部７１は、Ｃ個のカテゴリとの距離値５４Ａの最小値を求める。閾値処理部７２は、最小値算出部７１によって求められた最小値を、あらかじめ決められた閾値と比較し、閾値より小さければ、最小値が得られたカテゴリを認識結果５Ａとして出力する。逆に、閾値以上であれば、入力画像データ４Ａは辞書には存在しないパターンであるという認識結果５Ａを出力する。
【００４９】
次に、図５および図６を参照し、距離算出部５４の構成および動作について詳述する。図５は、距離算出部５４の一構成例を示すブロック図である。図６は、距離算出部５４の動作を説明する概念図である。
図５に示すように、距離算出部５４は、入力投影距離算出部８１と、辞書投影距離算出部８２と、統合部８３とから構成される。
入力投影距離算出部８１は、図６に示す入力代表ベクトル５２Ａの終点Ｑと、Ｌ個の辞書主成分ベクトル６３Ａで形成される辞書部分空間６３Ｃとの距離を表す入力投影距離値（第１の距離）ｄ₁を算出する。入力代表ベクトル５２ＡをＶ₁、辞書代表ベクトル６２をＶ₂、σを任意の定数とすると、距離値ｄ₁を例えば式（１５）によって算出することができる。
【００５０】
【数１２】

【００５１】
計算の高速化のため、定数σ＝０とし、式（１６）のように簡略化してもよい。
【００５２】
【数１３】

【００５３】
式（１５）および式（１６）において、ベクトル（Ｖ₂−Ｖ₁）は、ベクトルＱＰであり、入力投影距離値ｄ₁は、入力代表ベクトル５２Ａの終点Ｑと、辞書部分空間６３Ｃとの距離値を表している。
辞書投影距離算出部８２は、図６に示す辞書代表ベクトル６２の終点Ｐと、Ｋ個の入力主成分ベクトル５３Ａで形成される入力部分空間５３Ｃとの辞書投影距離値（第２の距離）ｄ₂を算出する。同様に、距離値ｄ₂を例えば式（１７）によって算出することができる。
【００５４】
【数１４】

【００５５】
計算の高速化のため、定数σ＝０とし、式（１８）のように簡略化してもよい。
【００５６】
【数１５】

【００５７】
式（１７）および式（１８）において、ベクトル（Ｖ₁−Ｖ₂）は、ベクトルＰＱであり、辞書投影距離値ｄ₂は、辞書代表ベクトル６２の終点Ｐと、入力部分空間５３Ｃとの距離値を表している。
統合部８３は、入力投影距離値ｄ₁および辞書投影距離値ｄ₂の両方を用いて、距離値Ｄを算出する。例えば、式（１９）を用いて距離値Ｄを算出することができる。
Ｄ＝αｄ₁＋βｄ₂ ・・・（１９）
ただし、α，βは定数である。
また、式（２０）を用いてもよい。
Ｄ＝αｄ₁・ｄ₂／（ｄ₁＋ｄ₂）・・・（２０）
ただし、αは定数である。
【００５８】
図２（ｂ）において、入力特徴分布５１Ｂと辞書特徴分布２１Ｂ′との間の距離値Ｄは、ベクトルＰＱのノルムを計算することによっても得られるが、この方法では入力代表ベクトル５２Ａおよび辞書代表ベクトル６２の２個のデータしか用いないので、得られた距離値Ｄを実際の照合に用いても、照合性能が低く、高い認識率は得られない。これに対し、入力代表ベクトル５２Ａと辞書部分空間６３Ｃとの投影距離値ｄ₁と、辞書代表ベクトル６２と入力部分空間５３Ｃとの投影距離値ｄ₂とを統合することによって得られた距離値Ｄは、入力代表ベクトル５２Ａ、辞書代表ベクトル６２に加えて、Ｋ個の入力主成分ベクトル５３Ａ（および重み値５３Ｂ）と、Ｌ個の辞書主成分ベクトル６３Ａ（および重み値６３Ｂ）という、より多くのデータを利用して得られたものであるから、上述した方法と比較して、照合性能がはるかに高く、高い認識率が得られる。
【００５９】
また、入力部分空間５３Ｃと辞書部分空間６３Ｃとの角度ではなく、代表ベクトルおよび主成分ベクトルを用いて算出した分布間の距離値Ｄを照合に用いるので、図１５に示したような分布配置の場合でも、正確な照合が可能となる。
また、入力画像として複数の画像データ４Ａを用いるので、１つの入力画像データを用いて認識するシステムに比べると、照合性能がはるかに高い。
なお、距離値Ｄの計算に用いる辞書主成分ベクトル６３Ａおよび入力主成分ベクトル５３Ａは、ともに直交基底であることが好ましい。直交基底である主成分ベクトル６３Ａ，５３Ａを用いて距離値の計算を行うことにより、直交基底でない場合と比較して、短時間で高精度の照合結果を得ることができ、認識率を向上させることができるからである。
【００６０】
次に、図１に示した画像認識システムの辞書データ学習の動作について説明する。図７は、この辞書データ学習の動作の流れを示すフローチャートである。また、図８は、辞書学習に用いる学習画像の一例を示す図である。
まず、辞書学習に用いる学習画像群と、この学習画像群に対応するカテゴリＩＤを入力する（図７のステップＳ１）。この学習画像群は、例えば図８に示すように、カテゴリ１の学習画像データ９１、カテゴリ２の学習画像データ９２、カテゴリ３の学習画像データ９３のように、特定のカテゴリに属する複数（Ｍ個）の画像データからなる。
【００６１】
つぎに、入力されたＭ個の学習画像データのそれぞれに対して特徴抽出を行い、Ｍ個の辞書特徴データを得る（図７のステップＳ２）。つぎに、得られた辞書特徴データ群を代表する１つのベクトルを生成し、辞書代表ベクトルとする（図７のステップＳ３）。つぎに、辞書特徴データ群から辞書代表ベクトルを除いた成分について、その分布を代表するＬ個の辞書主成分ベクトルを含む辞書主成分データを生成する（図７のステップＳ４）。こうして得られた辞書代表ベクトルおよび辞書主成分データを辞書データとして、カテゴリＩＤによって分類し辞書格納部３に格納する（図７のステップＳ５）。
他のカテゴリについて学習するかどうかを判断し（図７のステップＳ６）、学習する場合には学習画像群の入力（図７のステップＳ１）から作業を繰り返す。作成が終了したら学習動作を終了する。
【００６２】
次に、図１に示した画像認識システムの認識動作について説明する。図９は、この認識動作の流れを示すフローチャートである。また、図１０は、認識対象の入力画像の一例を示す図である。
まず、認識対象の画像群を入力する（図９のステップＳ１１）。この入力画像群は、例えば図１０に示すように、同じ対象物体を撮影して得られた複数（Ｎ個）の画像データ９０からなる。
【００６３】
つぎに、入力されたＮ個の入力画像データのそれぞれに対して特徴抽出を行い、Ｎ個の入力特徴データを得る（図９のステップＳ１２）。つぎに、得られた入力特徴データ群を代表する１つのベクトルを生成し、入力代表ベクトルとする（図９のステップＳ１３）。つぎに、入力特徴データ群から入力代表ベクトルを除いた成分について、その分布を代表するＫ個の入力主成分ベクトルを含む入力主成分データを生成する（図９のステップＳ１４）。
【００６４】
つぎに、辞書格納部３にカテゴリ毎に格納されている辞書データを読み出し、入力代表ベクトルおよび入力主成分データを用いて、辞書データとの距離値をカテゴリ毎に計算する（図９のステップＳ１５）。そして、これらの中で最小の距離値を求める（図９のステップＳ１６）。
つぎに、最小距離値が閾値よりも小さいかどうかを判断する（図９のステップＳ１７）。最小距離値が閾値よりも小さいときは、最小距離となったカテゴリを認識結果として出力して終了する（図９のステップＳ１８）。逆に、最小距離値が閾値以上であるときは、該当クラスなしを出力して終了する（図９のステップＳ１９）。ここでは、最小距離値が閾値と等しい場合、ステップＳ１９に移行することとしたが、ステップＳ１８に移行するようにしてもよいことは言うまでもない。
【００６５】
（第２の実施の形態）
図１１は、本発明の第２の実施の形態である顔画像認識システムの構成を示すブロック図である。この顔画像認識システムは、顔画像検出部１００と、学習部１０２と、顔辞書格納部１０３と、照合部１０５とから構成されている。
顔画像検出部１００は、ビデオ映像などの画像シーケンスの各フレームから人間の顔が映っている顔画像データを選択する。辞書データ学習動作の際には、選択された顔画像データを学習部１０２に出力し、認識動作の際には、照合部１０５に出力する。顔画像データを選択する方法としては、人間の肌の色に近い色の領域の面積、動きのある領域の面積がある閾値以上になったときに顔があると判断する方法がある。また、人間が手動で顔が撮影された画像群を画面で見ながら選択する方法もある。
【００６６】
学習部１０２の動作は、図１における学習部２の動作と同じである。顔辞書格納部１０３の動作は、レコードとして人間の顔画像を対象とした辞書データが格納されることを除き、図１における辞書格納部３の動作と同じである。照合部１０５の動作は、図１における照合部５の動作と同じである。
図１１に示す顔画像認識システムは、人間の顔を対象とし、入力された画像に映る人物が誰なのかを認識することができ、セキュリティ、監視、ヒューマンインターフェース等に利用することができる。
【００６７】
（第３の実施の形態）
図１２は、本発明の画像認識システムである第３の実施の形態の構成を示すブロック図である。この画像認識システムは、プログラム制御により動作するコンピュータ１１０と、識別対象画像及び学習画像を取り込みコンピュータ１１０に出力するカメラ１２１と、コンピュータ１１０に対してオペレータが認識の指示及び学習の指示を与えるための操作卓１２２と、コンピュータ１１０から出力された認識結果を表示する表示装置１２３とから構成されている。コンピュータ１１０は、演算処理部１１１と記憶部１１２とインタフェース部（以下、Ｉ／Ｆ部という）１１３₁〜１１３₄とがバス１１４に接続された構成となっている。Ｉ／Ｆ部１１３₁〜１１３₃は、コンピュータ１１０の外部装置であるカメラ１２１、操作卓１２２、表示装置１２３とインタフェースをとる。
【００６８】
コンピュータ１１０の動作を制御する画像認識プログラムは、磁気ディスク、半導体メモリその他の記録媒体１２４に記録された状態で提供される。この記録媒体１２４をＩ／Ｆ部１１３₄に接続すると、演算処理部１１１は記録媒体１２４に書き込まれた画像認識プログラムを読み出し、記憶部１１２に格納する。その後、操作卓１２２からの指示に基づき、演算処理部１１１が記憶部１１２に格納された画像認識プログラムを実行し、図１に示した学習部２と、辞書格納部３と、照合部５とを実現する。
なお、画像認識プログラムは、インターネットなどの電気通信回線を介して提供されてもよい。
【００６９】
コンピュータ１１０は、図７および図９のフローチャートに示す動作を行う。すなわち、操作卓１２２より学習の指示があり、カメラ１２１から学習画像データ群が入力されるとともに操作卓から対応するカテゴリＩＤが入力されると、学習画像データ群の特徴抽出を行い、得られた辞書特徴データ群を基に辞書代表ベクトルおよび辞書主成分データを生成し、この辞書代表ベクトルおよび辞書主成分データをカテゴリＩＤによって分類し、記憶部１１２によって構成される辞書格納部に辞書データとして格納する。つぎに、他のカテゴリについて学習するかどうかを判断し、学習する場合には学習画像データの入力から作業を繰り返す。作成が終了したら学習動作を終了する。
【００７０】
また、操作卓１２２より認識の指示があり、カメラ１２１から識別対象の画像データ群が入力されると、特徴抽出を行い、得られた入力特徴データ群を基に入力代表ベクトルおよび入力主成分データを生成する。つぎに、辞書格納部にカテゴリ毎に格納されている辞書データを読み出し、入力代表ベクトルおよび入力主成分データを用いて、辞書データとの距離値をカテゴリ毎に計算する。そして、これらの中で最小の距離値を求め、最小距離値が閾値よりも小さいかどうかを判断する。最小距離が閾値よりも小さいときは、最小距離となったカテゴリを認識結果として表示装置１２３に表示し、逆に最小距離が閾値以上であるときは、該当クラスがない旨を表示装置１２３に表示し、認識動作を終了する。
なお、演算処理部１１１が画像認識プログラムを実行することにより、図１１に示した顔画像検出部１００と、学習部１０２と、顔辞書格納部１０３と、照合部１０５とを実現させることもできる。
【００７１】
【発明の効果】
以上説明したように、本発明では、Ｋ個の入力主成分ベクトル、入力代表ベクトル、Ｌ個の辞書主成分ベクトルおよび辞書代表ベクトルを用いて入力部分空間と辞書部分空間との距離値を算出し、算出された距離値を照合に用いる。これにより、複数の入力特徴分布が同じ入力部分空間内に存在する場合でも、各入力特徴分布の配置が異なれば距離値も異なるので、入力画像群を正しく判別することができる。
また、本発明では、複数の入力画像を用い、しかも複数の入力画像から得られた多くのデータを有効に利用して距離値を算出し、この距離値を照合に用いるので、１つの入力画像を用いて認識を行なう場合と比較して、照合性能がはるかに高く、高い認識率が得られる。
したがって、同一物体の照明による変動、向きによる変動、変形などを吸収し、頑強な認識システムおよび方法を構築することが可能となる。
【００７２】
また、直交基底である主成分ベクトルを用いて距離値の計算を行うことにより、直交基底でない場合と比較して、短時間で高精度の照合結果を得ることができ、認識率を向上させることができる。
また、辞書主成分ベクトルおよび辞書代表ベクトルを生成する手段を設けることにより、または、かかる処理を行うことにより、辞書データの内容を随時更新し、急激な内容変化に対応することができる。
また、入力される画像シーケンスから顔画像データを選択する手段を設けることにより、または、かかる処理を行なうことにより、人間の顔画像を用いて画像中の人物を同定するシステムを構築することができる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態である画像認識システムの構成を示すブロック図である。
【図２】図１に示す画像認識システムによる処理を概念的に示す図である。
【図３】辞書格納部の一構成例を示すブロック図である。
【図４】識別部の一構成例を示すブロック図である。
【図５】距離算出部の一構成例を示すブロック図である。
【図６】距離算出部の動作を説明する概念図である。
【図７】図１に示す画像認識システムの辞書データ学習の動作の流れを示すフローチャートである。
【図８】辞書学習に用いる学習画像の一例を示す図である。
【図９】図１に示す画像認識システムの認識動作の流れを示すフローチャートである。
【図１０】認識対象の入力画像の一例を示す図である。
【図１１】本発明の第２の実施の形態である顔画像認識システムの構成を示すブロック図である。
【図１２】本発明の画像認識システムである第３の実施の形態の構成を示すブロック図である。
【図１３】従来の画像認識システムの構成を示すブロック図である。
【図１４】従来の画像認識システムで用いられる類似度を示す概念図である。
【図１５】従来の画像認識システムの問題点を示す概念図である。
【符号の説明】
１…学習画像群入力部、１Ａ…学習画像データ、１Ｂ…カテゴリＩＤ、２…学習部、３…辞書格納部、４…識別対象画像群入力部、５…照合部、５Ａ…認識結果、２１…学習画像特徴抽出部、２１Ａ…辞書特徴データ、２１Ｂ，２１Ｂ′…辞書特徴分布、２２…辞書代表ベクトル生成部、２２Ａ…辞書代表ベクトル、２３…辞書主成分データ生成部、２３Ａ…辞書主成分ベクトル、２３Ｂ…重み値、３１〜３Ｃ…レコード記憶部、５１…入力画像特徴抽出部、５１Ａ…入力特徴データ、５１Ｂ…入力特徴分布、５２…入力代表ベクトル生成部、５２Ａ…入力代表ベクトル、５３…入力主成分データ生成部、５３Ａ…入力主成分ベクトル、５３Ｂ…重み値、５３Ｃ…入力部分空間、５４…距離算出部、５４Ａ…距離値、５５…識別部、６１…レコード番号、６２…辞書代表ベクトル、６３…辞書主成分データ、６３Ａ…辞書主成分ベクトル、６３Ｃ…辞書部分空間、６４…カテゴリＩＤ、７１…最小値算出部、７２…閾値処理部、８１…入力投影距離算出部、８２…辞書投影距離算出部、８３…統合部、９０…識別対象の入力画像データ、９１〜９３…学習画像データ、１００…顔画像検出部、１０２…学習部、１０３…顔辞書格納部、１０５…照合部、１１０…コンピュータ、１１１…演算処理部、１１２…記憶部、１１３…インタフェース部、１１４…バス、１２１…カメラ、１２２…操作卓、１２３…表示装置、１２４…記憶媒体、５０１…画像入力部、５０２…辞書記憶部、５０３…部分空間間の角度計算部、５０４…認識部、５１１，５１１Ａ〜５１１Ｃ…入力特徴分布、５１２…入力部分空間、５２１…辞書特徴分布、５２２…辞書部分空間、Ｄ，ｄ₁，ｄ₂…距離値、Ｏ…座標系の原点、Ｐ…辞書代表ベクトルの終点、Ｑ…入力代表ベクトルの終点、Ｓ１〜Ｓ６…学習動作のステップ、Ｓ１１〜Ｓ１９…認識動作のステップ、Ｖ₁…入力代表ベクトル、Ｖ₂…辞書代表ベクトル、Θ…角度。
[0001]
[Technical Field of the Invention]
The present invention relates to an image recognition system, an image recognition method, and an image recognition program, and in particular to an image recognition system, method, and program that identify whether a target object captured in an image is an object registered in a dictionary, or that classify it into one of a plurality of categories registered in the dictionary.
[0002]
[Prior art]
An example of a conventional image recognition system is described in Japanese Patent Laid-Open No. 11-265452 (object recognition apparatus and object recognition method). FIG. 13 is a block diagram showing the configuration of this conventional image recognition system. As shown in this figure, the conventional image recognition system includes an image input unit 501, a dictionary storage unit 502, an inter-subspace angle calculation unit 503, and a recognition unit 504.
[0003]
The image input unit 501 acquires a plurality of images taken from a plurality of directions. In the dictionary storage unit 502, dictionary data expressed in advance as an M-dimensional subspace is prepared for each category. The inter-subspace angle calculation unit 503 first represents the input image group acquired by the image input unit 501 as an N-dimensional subspace. Specifically, each input image is regarded as one-dimensional feature data, principal component analysis is performed, N eigenvectors are extracted, and these N eigenvectors define the N-dimensional subspace. The inter-subspace angle calculation unit 503 then calculates, for each category of the dictionary, the angle Θ between the N-dimensional subspace of the input images (hereinafter referred to as the input subspace) and the M-dimensional subspace of the dictionary (hereinafter referred to as the dictionary subspace). The recognition unit 504 compares the angles Θ calculated by the inter-subspace angle calculation unit 503 and outputs the category with the smallest angle Θ as the recognition result.
[0004]
A more specific description will be given with reference to FIG. 14. In FIG. 14, 511 is an input feature distribution over which the feature data extracted from the input images is distributed, 512 is an input subspace containing the input feature distribution 511, 521 is a dictionary feature distribution over which the feature data used to create one category of the dictionary data is distributed, and 522 is a dictionary subspace containing the dictionary feature distribution 521.
When the basis vectors of the input subspace 512 are Ψn (n = 1, 2, ..., N) and the basis vectors of the dictionary subspace 522 are Φm (m = 1, 2, ..., M), the inter-subspace angle calculation unit 503 calculates a matrix X whose elements are the x_ij of expression (1) or expression (2).
[0005]
[Expression 7]

[0006]
The square of the cosine of the angle Θ between the subspaces 512 and 522 is obtained as the maximum eigenvalue of the matrix X. When the square of the cosine of the angle Θ is large (or small), the angle Θ is small (or large), and when the angle Θ is small (or large), the similarity between the subspaces 512 and 522 is high (or low). The maximum eigenvalue of the matrix X can therefore be regarded as the similarity between the subspaces 512 and 522.
Accordingly, the inter-subspace angle calculation unit 503 obtains the maximum eigenvalue of the matrix X for each category and uses it as the similarity, and the recognition unit 504 classifies the input image into the category with the highest similarity.
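The conventional similarity described above can be sketched as follows. Expressions (1) and (2) are not reproduced in this text, so the sketch assumes the usual mutual-subspace form x_ij = Σ_m (Ψ_i·Φ_m)(Φ_m·Ψ_j) and assumes both bases are orthonormal; the function name `subspace_similarity` and the use of NumPy are illustrative choices, not part of the cited publication.

```python
import numpy as np

def subspace_similarity(psi: np.ndarray, phi: np.ndarray) -> float:
    """Return cos^2 of the smallest angle between two subspaces.

    psi: (N, D) array, rows are orthonormal basis vectors of the input subspace.
    phi: (M, D) array, rows are orthonormal basis vectors of the dictionary subspace.
    Assumed form: x_ij = sum_m (psi_i . phi_m)(phi_m . psi_j).
    """
    a = psi @ phi.T   # a[i, m] = psi_i . phi_m
    x = a @ a.T       # symmetric N x N matrix X
    # The largest eigenvalue of X is taken as the similarity.
    return float(np.linalg.eigvalsh(x).max())

# Two lines in 3-D space that meet at 45 degrees.
psi = np.array([[1.0, 0.0, 0.0]])
phi = np.array([[2 ** -0.5, 2 ** -0.5, 0.0]])
similarity = subspace_similarity(psi, phi)
```

Classification in the conventional system would then pick the category whose dictionary subspace gives the largest such value.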
[0007]
[Problems to be solved by the invention]
FIG. 15 is a conceptual diagram showing problems of a conventional image recognition system.
As shown in this figure, when three input feature distributions 511A, 511B, and 511C lie at different positions in the same input subspace 512, the three input feature distributions 511A to 511C are separated from one another and are clearly at different distances from the dictionary feature distribution 521 in the dictionary subspace 522, so different similarities should properly be calculated for them.
[0008]
However, because the conventional image recognition system uses as the similarity the maximum eigenvalue of the matrix X, which is equivalent to the square of the cosine of the angle Θ between the input subspace 512 and the dictionary subspace 522, the similarities between the dictionary subspace 522 and the three input feature distributions 511A to 511C lying in the same input subspace 512 all come out the same.
Thus the conventional image recognition system has the problem that, when input feature distributions lie at different positions in the same input subspace 512, their similarities to the dictionary subspace 522 are all the same and the distributions cannot be distinguished.
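The problem can be illustrated numerically with a toy case. The sketch below assumes one-dimensional subspaces through the origin in a two-dimensional feature space; the coordinates and helper names are invented for illustration and are not taken from FIG. 15. Three clusters lie on the same line, so the angle-based similarity is identical for all three, even though their mean positions are at clearly different distances from the dictionary cluster.

```python
import math

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cos2(u, w):
    """Squared cosine of the angle between the lines spanned by u and w."""
    dot = sum(a * b for a, b in zip(unit(u), unit(w)))
    return dot * dot

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical clusters, all on the x-axis (the shared input subspace).
clusters = {
    "511A": [[1.0, 0.0], [1.2, 0.0]],
    "511B": [[5.0, 0.0], [5.2, 0.0]],
    "511C": [[9.0, 0.0], [9.2, 0.0]],
}
# Hypothetical dictionary cluster on the line y = x.
dictionary = [[1.0, 1.0], [1.2, 1.2]]

# Angle-based similarity: depends only on the direction of each line.
similarity = {name: cos2(pts[0], dictionary[0]) for name, pts in clusters.items()}
# Distance between cluster means: depends on where each cluster sits.
separation = {name: euclid(mean(pts), mean(dictionary)) for name, pts in clusters.items()}
```

All three similarities are equal, while the separations grow from 511A to 511C, which is the discrepancy the invention sets out to remove.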
[0009]
The present invention has been made to solve this problem, and its object is to provide an image recognition system, an image recognition method, and an image recognition program that can correctly discriminate input image groups without depending on the angle between the input subspace and the dictionary subspace.
[0010]
[Means for Solving the Problems]
To achieve this object, the image recognition system of the present invention is an image recognition system that collates a plurality of input images in which the same object is photographed with dictionary data registered in advance and outputs a recognition result, and comprises: distance calculation means for calculating a distance value between an input subspace and a dictionary subspace by using an input representative vector, which is an arbitrary vector belonging to the distribution range of a plurality of input feature data obtained from the input images, input principal component vectors, which are a basis of the input subspace spanned by the input feature data, a dictionary representative vector, which is an arbitrary vector belonging to the distribution range of dictionary feature data registered in advance, and dictionary principal component vectors, which are a basis of the dictionary subspace spanned by the dictionary feature data; and identification means for performing identification and outputting the recognition result on the basis of the distance value.
[0011]
More specifically, the system may comprise: distance calculation means for calculating the distance value between the input subspace and the dictionary subspace by using an input representative vector, which is an arbitrary vector belonging to the distribution range of a plurality of input feature data obtained from the input images, input principal component vectors, which are a basis of the input subspace spanned by the difference vectors between the input feature data and the input representative vector, a dictionary representative vector, which is an arbitrary vector belonging to the distribution range of dictionary feature data registered in advance, and dictionary principal component vectors, which are a basis of the dictionary subspace spanned by the dictionary feature data; and identification means for performing identification and outputting the recognition result on the basis of the distance value. As a result, even when a plurality of input feature distributions lie in the same input subspace, the distance values differ if the distributions are located differently, so the input image group can be discriminated correctly.
[0012]
In this image recognition system, the input principal component vectors and the dictionary principal component vectors may be orthogonal bases. Calculating the distance value with principal component vectors that form an orthogonal basis yields a highly accurate collation result in a shorter time than with a non-orthogonal basis, and improves the recognition rate.
The input representative vector may be the mean vector of the input feature data, and the input principal component vectors may be the K eigenvectors with the largest eigenvalues of the components obtained by subtracting the input representative vector from the input feature data. Likewise, the dictionary representative vector may be the mean vector of the dictionary feature data, and the dictionary principal component vectors may be the L eigenvectors with the largest eigenvalues of the components obtained by subtracting the dictionary representative vector from the dictionary feature data.
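As a rough sketch of this step, the mean vector and the leading eigenvectors can be computed with a covariance eigendecomposition. The function name `representative_and_principal`, the 1/n normalisation of the covariance, and the use of NumPy are assumptions made for illustration; the same routine would serve for input data (K vectors) and dictionary data (L vectors).

```python
import numpy as np

def representative_and_principal(features: np.ndarray, k: int):
    """Return (mean vector, k principal vectors as rows, their eigenvalues).

    features: (n_samples, D) array, one feature vector per image.
    """
    mean = features.mean(axis=0)               # representative vector
    centered = features - mean                 # subtract the representative vector
    cov = centered.T @ centered / features.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return mean, eigvecs[:, order].T, eigvals[order]

# Three feature vectors that vary only along the first axis.
samples = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
v, basis, weights = representative_and_principal(samples, k=1)
```

Because `eigh` returns orthonormal eigenvectors, the rows of `basis` form an orthogonal basis, and the eigenvalues in `weights` can double as the per-vector weights mentioned later. For image-sized D, forming the D x D covariance is costly, and an SVD of `centered` would be a common substitute.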
[0013]
The distance calculation means may comprise input projection distance calculation means for calculating a first distance between the space formed by the L dictionary principal component vectors and the input representative vector, dictionary projection distance calculation means for calculating a second distance between the space formed by the K input principal component vectors and the dictionary representative vector, and integration means for calculating the distance value between the input subspace and the dictionary subspace from the first and second distances. Because the distance value is calculated by making effective use of the large amount of data obtained from the input images and the dictionary data, and this distance value is used for collation, collation performance improves and a high recognition rate is obtained.
[0014]
Here, where the input representative vector is V₁, the dictionary representative vector is V₂, the input principal component vectors are Ψ_i (i = 1, ..., K), and the dictionary principal component vectors are Φ_j (j = 1, ..., L), the input projection distance calculation means may calculate the first distance d₁ by equation (3), and the dictionary projection distance calculation means may calculate the second distance d₂ by equation (4).
[0015]
[Equation 8]
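Equations (3) and (4) are not reproduced in this text. A minimal sketch of a projection distance in this spirit, assuming orthonormal principal component vectors and a squared distance measured from one representative vector to the subspace placed at the other representative vector, might look like this (the function name `projection_distance` and the sample vectors are hypothetical):

```python
def projection_distance(point, origin, basis):
    """Squared distance from `point` to the subspace origin + span(basis).

    Assumes the rows of `basis` are orthonormal.
    """
    diff = [p - o for p, o in zip(point, origin)]
    total = sum(x * x for x in diff)
    # Energy of diff that lies inside the subspace.
    inside = sum(sum(x * b for x, b in zip(diff, vec)) ** 2 for vec in basis)
    return max(total - inside, 0.0)   # clamp tiny negative rounding errors

v1 = [0.0, 0.0, 0.0]         # input representative vector (illustrative)
v2 = [1.0, 2.0, 2.0]         # dictionary representative vector (illustrative)
psi = [[0.0, 1.0, 0.0]]      # input principal component vectors, K = 1
phi = [[1.0, 0.0, 0.0]]      # dictionary principal component vectors, L = 1

d1 = projection_distance(v1, v2, phi)   # input representative vs dictionary subspace
d2 = projection_distance(v2, v1, psi)   # dictionary representative vs input subspace
```

Unlike the angle-based similarity, both values change when either representative vector moves, even if the principal component vectors stay the same.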

[0016]
In the image recognition system described above, the distance calculation means may use K weights corresponding to the respective input principal component vectors and L weights corresponding to the respective dictionary principal component vectors.
Here, the weights corresponding to the input principal component vectors may be the K eigenvalues of the eigenvectors that serve as the input principal component vectors, and the weights corresponding to the dictionary principal component vectors may be the L eigenvalues of the eigenvectors that serve as the dictionary principal component vectors.
[0017]
Where the input representative vector is V₁, the dictionary representative vector is V₂, the input principal component vectors are Ψ_i (i = 1, ..., K), the weights corresponding to the input principal component vectors are μ_i (i = 1, ..., K), the dictionary principal component vectors are Φ_j (j = 1, ..., L), and the weights corresponding to the dictionary principal component vectors are λ_j (j = 1, ..., L), the input projection distance calculation means may calculate the first distance d₁ by equation (5), and the dictionary projection distance calculation means may calculate the second distance d₂ by equation (6) (where σ is an arbitrary constant).
[0018]
[Equation 9]
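Equations (5) and (6) are likewise not reproduced here, so the exact weighting is unknown from this text. The sketch below shows one hypothetical weighting only: each in-subspace component is scaled by λ/(λ + σ²), which was chosen because it reduces to an unweighted projection distance when σ = 0, matching the description's remark that σ may be set to 0 to simplify the calculation. The function name, the gain formula, and the sample values are all assumptions.

```python
def weighted_projection_distance(point, origin, basis, weights, sigma=0.0):
    """Squared projection distance with per-vector gains (assumed form).

    Each component along a basis vector is removed with gain w / (w + sigma^2),
    so directions with small eigenvalues are trusted less when sigma > 0.
    """
    diff = [p - o for p, o in zip(point, origin)]
    total = sum(x * x for x in diff)
    removed = 0.0
    for vec, weight in zip(basis, weights):
        denom = weight + sigma * sigma
        gain = weight / denom if denom > 0.0 else 0.0
        coeff = sum(x * b for x, b in zip(diff, vec))
        removed += gain * coeff * coeff
    return max(total - removed, 0.0)

v1 = [0.0, 0.0, 0.0]
v2 = [1.0, 2.0, 2.0]
phi = [[1.0, 0.0, 0.0]]   # dictionary principal component vectors
lam = [1.0]               # their eigenvalues used as weights

d1_weighted = weighted_projection_distance(v1, v2, phi, lam, sigma=1.0)
d1_plain = weighted_projection_distance(v1, v2, phi, lam, sigma=0.0)
```

The second distance d₂ would be obtained the same way with the roles swapped, using Ψ_i and μ_i.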

[0019]
Further, in the image recognition system described above, the integration means may calculate the distance value D between the input subspace and the dictionary subspace from the first distance d₁ and the second distance d₂ by equation (7).
D = αd₁ + βd₂ ... (7)
(where α and β are constants)
Alternatively, the integration means may calculate the distance value D between the input subspace and the dictionary subspace from the first distance d₁ and the second distance d₂ by equation (8).
D = αd₁·d₂ / (d₁ + d₂) ... (8)
(where α is a constant)
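The two integration rules translate directly into code. In this sketch the function names and default constants are illustrative, and returning 0 when both distances are 0 is an assumption made to avoid dividing by zero; the source does not say how that case is handled.

```python
def integrate_linear(d1, d2, alpha=1.0, beta=1.0):
    """Equation (7): weighted sum of the two projection distances."""
    return alpha * d1 + beta * d2

def integrate_harmonic(d1, d2, alpha=1.0):
    """Equation (8): product over sum of the two projection distances."""
    total = d1 + d2
    if total == 0.0:
        return 0.0   # assumed convention: coincident distributions give D = 0
    return alpha * d1 * d2 / total

D_sum = integrate_linear(8.0, 5.0, alpha=0.5, beta=0.5)
D_harm = integrate_harmonic(8.0, 5.0, alpha=2.0)
```

For non-negative distances, d₁·d₂ / (d₁ + d₂) never exceeds the smaller of the two, so equation (8) yields a small D whenever either projection distance is small, whereas equation (7) requires both to be small.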
[0020]
The image recognition system described above may further comprise input representative vector generation means for generating the input representative vector from the plurality of input feature data, and input principal component data generation means for generating the input principal component vectors from the input feature data.
[0021]
The image recognition system described above may further comprise dictionary representative vector generation means for generating the dictionary representative vector, and dictionary principal component data generation means for generating the dictionary principal component vectors and outputting them to dictionary storage means that stores the dictionary representative vector and the dictionary principal component vectors.
In the image recognition system described above, the recognition target is a human face image. This makes it possible to configure a system that identifies a person in an image by using human face images.
[0022]
The image recognition method of the present invention is an image recognition method that collates a plurality of input images in which the same object is photographed with dictionary data registered in advance and outputs a recognition result, and comprises: a distance calculation step of calculating a distance value between an input subspace and a dictionary subspace by using an input representative vector, which is an arbitrary vector belonging to the distribution range of a plurality of input feature data obtained from the input images, input principal component vectors, which are a basis of the input subspace spanned by the input feature data, a dictionary representative vector, which is an arbitrary vector belonging to the distribution range of dictionary feature data registered in advance, and dictionary principal component vectors, which are a basis of the dictionary subspace spanned by the dictionary feature data; and an identification step of performing identification and outputting the recognition result on the basis of the distance value.
[0023]
  More specifically, the method may include: a distance calculating step for calculating a distance value between the input subspace and the dictionary subspace, using an input representative vector that is an arbitrary vector belonging to the distribution range of a plurality of input feature data obtained from the input images, an input principal component vector that is a basis of the input subspace spanned by the difference vectors between the input feature data and the input representative vector, a dictionary representative vector that is an arbitrary vector belonging to the distribution range of dictionary feature data registered in advance, and a dictionary principal component vector that is a basis of the dictionary subspace spanned by the dictionary feature data; and an identifying step for performing identification to output the recognition result based on the distance value.
  In this image recognition method, the input principal component vector and the dictionary principal component vector may be orthogonal bases.
[0024]
  In addition, the input representative vector may be the average vector of the input feature data, and the input principal component vectors may be the K eigenvectors with the largest eigenvalues obtained from the components that remain after the input representative vector is subtracted from the input feature data. Likewise, the dictionary representative vector may be the average vector of the dictionary feature data, and the dictionary principal component vectors may be the L eigenvectors with the largest eigenvalues obtained from the components that remain after the dictionary representative vector is subtracted from the dictionary feature data.
[0025]
  The distance calculating step may consist of an input projection distance calculating step for calculating a first distance between the input representative vector and the space formed by the L dictionary principal component vectors, a dictionary projection distance calculating step for calculating a second distance between the dictionary representative vector and the space formed by the K input principal component vectors, and an integration step for calculating the distance value between the input subspace and the dictionary subspace from the first and second distances.
[0027]
  When the input representative vector is V₁, the dictionary representative vector is V₂, the input principal component vectors are Ψ_i (i = 1, ..., K), and the dictionary principal component vectors are Φ_j (j = 1, ..., L), the input projection distance calculating step may calculate the first distance d₁ according to equation (9), and the dictionary projection distance calculating step may calculate the second distance d₂ according to equation (10).
[0028]
[Expression 10]
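
The image holding equations (9) and (10) is not reproduced in this text, so the following Python sketch only illustrates what a projection distance of this kind can look like. It assumes, without confirmation from the text, that each distance is the squared length of the part of the difference between the two representative vectors that remains after projection onto orthonormal principal component vectors. The function name `projection_distance_sq` and the sample vectors are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def projection_distance_sq(point, anchor, basis):
    """Squared distance from `point` to the subspace through `anchor`
    spanned by the orthonormal vectors in `basis` (assumed form)."""
    diff = [a - p for a, p in zip(anchor, point)]
    residual = dot(diff, diff) - sum(dot(diff, b) ** 2 for b in basis)
    return max(residual, 0.0)  # clamp tiny negative rounding errors


# Hypothetical sample data: V1/V2 are representative vectors,
# psi/phi are orthonormal principal component vectors (K = L = 1).
v1 = [0.0, 0.0, 2.0]
v2 = [1.0, 1.0, 0.0]
psi = [[0.0, 0.0, 1.0]]
phi = [[1.0, 0.0, 0.0]]

d1 = projection_distance_sq(v1, v2, phi)  # input representative vs dictionary subspace
d2 = projection_distance_sq(v2, v1, psi)  # dictionary representative vs input subspace
```

The two calls are mirror images of each other: only the roles of the representative vectors and of the two bases are exchanged.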

[0029]
  The distance calculating step may also use K weights corresponding to the respective input principal component vectors and L weights corresponding to the respective dictionary principal component vectors.
  Here, the weights corresponding to the input principal component vectors may be the K eigenvalues of the eigenvectors that are the input principal component vectors, and the weights corresponding to the dictionary principal component vectors may be the L eigenvalues of the eigenvectors that are the dictionary principal component vectors.
[0030]
  Also, when the input representative vector is V₁, the dictionary representative vector is V₂, the input principal component vectors are Ψ_i (i = 1, ..., K), the weights corresponding to the input principal component vectors are μ_i (i = 1, ..., K), the dictionary principal component vectors are Φ_j (j = 1, ..., L), and the weights corresponding to the dictionary principal component vectors are λ_j (j = 1, ..., L), the input projection distance calculating step may calculate the first distance d₁ according to equation (11), and the dictionary projection distance calculating step may calculate the second distance d₂ according to equation (12) (where σ is an arbitrary constant).
[0031]
[Expression 11]

[0032]
  In the integration step, the distance value D between the input subspace and the dictionary subspace may be calculated from the first distance d₁ and the second distance d₂ according to equation (13).
      D = αd₁ + βd₂                  ... (13)
      (where α and β are constants)
  Alternatively, in the integration step, the distance value D between the input subspace and the dictionary subspace may be calculated from the first distance d₁ and the second distance d₂ according to equation (14).
      D = αd₁d₂ / (d₁ + d₂)          ... (14)
      (where α is a constant)
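
The two integration rules can be sketched in Python as follows. The function names and default constants are hypothetical, and returning 0 when both distances are zero is an assumption made here to avoid a division by zero; the text does not address that case.

```python
def integrate_weighted_sum(d1, d2, alpha=1.0, beta=1.0):
    """Weighted sum of the two projection distances: D = alpha*d1 + beta*d2."""
    return alpha * d1 + beta * d2


def integrate_ratio(d1, d2, alpha=1.0):
    """Product-over-sum form: D = alpha*d1*d2 / (d1 + d2)."""
    total = d1 + d2
    if total == 0.0:
        return 0.0  # assumption: both distances zero means zero overall distance
    return alpha * d1 * d2 / total


D_sum = integrate_weighted_sum(5.0, 2.0)
D_ratio = integrate_ratio(5.0, 2.0)
```

The weighted sum grows with either distance, whereas the product-over-sum form stays small whenever one of the two distances is small.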
[0033]
  The image recognition method described above may further include an input representative vector generation step for generating the input representative vector from the plurality of input feature data, and an input principal component data generation step for generating the input principal component vector from the input feature data.
  The image recognition method described above may also include a dictionary representative vector generating step for generating the dictionary representative vector, and a dictionary principal component data generation step for generating the dictionary principal component vector and outputting it to dictionary storage means that stores the dictionary representative vector and the dictionary principal component vector.
  In the image recognition method described above, the recognition target may be a human face image.
[0034]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments of the present invention will be described in detail with reference to the drawings.
[0035]
(First embodiment)
FIG. 1 is a block diagram showing a configuration of an image recognition system according to the first embodiment of the present invention. FIG. 2 is a diagram conceptually showing processing by the image recognition system shown in FIG.
The image recognition system shown in FIG. 1 includes: a learning image group input unit 1 that acquires a learning image group composed of a plurality of learning image data obtained by photographing the same object; a learning unit 2 that generates dictionary data from the learning image group input from the learning image group input unit 1; a dictionary storage unit 3 that stores the dictionary data generated by the learning unit 2, categorized by photographed object; an identification target image group input unit 4 that acquires an input image group composed of a plurality of input image data obtained by photographing the same object; and a collation unit 5 that recognizes the photographed object of the input image group input from the identification target image group input unit 4, using the dictionary data stored in the dictionary storage unit 3.
[0036]
The learning image group input unit 1 acquires M (M is a natural number) still images obtained by photographing the same target object with a video camera or the like for registration in the dictionary, and outputs them to the learning unit 2 as learning image data 1A together with a category ID 1B. The M pieces of learning image data 1A are assumed to belong to the category specified by the category ID 1B.
The learning unit 2 further includes a learning image feature extraction unit 21, a dictionary representative vector generation unit 22, and a dictionary principal component data generation unit (dictionary subspace generation means) 23.
[0037]
The learning image feature extraction unit 21 extracts M dictionary feature data 21A used for recognition from the M learning image data 1A input from the learning image group input unit 1, and outputs them to the dictionary representative vector generation unit 22 and the dictionary principal component data generation unit 23. It is assumed that the dictionary feature data group composed of the M dictionary feature data 21A is distributed in a dictionary feature distribution 21B shown in FIG.
One example of the learning image feature extraction unit 21 applies first-order and second-order differential filters to the original image data, raster-scans the filter output, and outputs it as one-dimensional feature data. Another example raster-scans the original image data into one-dimensional feature data and normalizes it so that the mean and variance take constant values, for example a mean of 0 and a variance of 1.0. Dictionary feature data 21A subjected to contour enhancement, noise removal, and the like is thereby obtained.
The learning image feature extraction unit 21 must be one that, if given the input image data 4A output from the identification target image group input unit 4, would extract the same feature data as the input image feature extraction unit 51 of the collation unit 5.
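
As an illustration of the second feature extraction example (raster scan followed by normalization to mean 0 and variance 1.0), a minimal Python sketch is given below. The function names and the tiny 2x2 image are hypothetical, and mapping a completely flat image to all zeros is an assumption made here.

```python
import math


def raster_scan(image):
    """Flatten a 2-D image (list of rows) into one-dimensional feature data."""
    return [float(p) for row in image for p in row]


def normalize(features):
    """Shift and scale the feature data to mean 0 and variance 1.0."""
    n = len(features)
    mean = sum(features) / n
    var = sum((x - mean) ** 2 for x in features) / n
    if var == 0.0:
        return [0.0] * n  # assumption: a flat image maps to the zero vector
    std = math.sqrt(var)
    return [(x - mean) / std for x in features]


image = [[10, 20], [30, 40]]  # hypothetical 2x2 grayscale image
feature = normalize(raster_scan(image))
```

Applying the same two functions to both learning images and input images keeps the two feature extraction units consistent with each other.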
[0038]
The dictionary representative vector generation unit 22 generates a dictionary representative vector 22A, a single vector representing the dictionary feature data group, from the dictionary feature data group composed of the M dictionary feature data 21A, and outputs it to the dictionary principal component data generation unit 23 and the dictionary storage unit 3. The dictionary representative vector 22A is an arbitrary vector belonging to the dictionary subspace including the dictionary feature distribution 21B, and has the origin O as its start point and a point P on the dictionary subspace as its end point. Examples of the dictionary representative vector 22A include the average value (average vector) and the median value (center vector) of the dictionary feature data group.
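
The two examples of a representative vector can be sketched in Python as follows. Taking the median separately in each dimension is an assumption; the text does not say how the median of a vector group is defined. The function names and sample data are hypothetical.

```python
import statistics


def average_vector(features):
    """Component-wise mean of a group of equal-length feature vectors."""
    return [statistics.fmean(column) for column in zip(*features)]


def median_vector(features):
    """Component-wise median (assumed interpretation of the 'center vector')."""
    return [statistics.median(column) for column in zip(*features)]


dictionary_features = [[1.0, 10.0], [2.0, 20.0], [9.0, 60.0]]
rep_mean = average_vector(dictionary_features)
rep_median = median_vector(dictionary_features)
```

The median variant is less affected by a single outlying image, such as the third sample above.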
[0039]
The dictionary principal component data generation unit 23 extracts dictionary principal component vectors (Φj (j = 1, ..., L)) 23A, which are L vectors representing the components that remain after the dictionary representative vector 22A is removed from each of the M dictionary feature data 21A, and weight values (λj (j = 1, ..., L)) 23B corresponding to the respective dictionary principal component vectors (Φj), and outputs them to the dictionary storage unit 3. When the number of feature dimensions of the dictionary feature data 21A is D, L is a natural number greater than 1 and less than or equal to min (M, D), where min (M, D) denotes the smaller of M and D. The dictionary principal component vectors 23A are vectors representing the dictionary subspace including the dictionary feature distribution 21B.
[0040]
Methods of removing the dictionary representative vector 22A from the dictionary feature data 21A include subtracting the dictionary representative vector 22A from the dictionary feature data 21A, and calculating the component of the dictionary feature data 21A that is perpendicular to the dictionary representative vector 22A.
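
The two removal methods can be compared with a short Python sketch. The function names and sample vectors are hypothetical; the perpendicular component is computed here by subtracting the orthogonal projection onto the representative vector, which is one standard way to obtain it.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def remove_by_subtraction(feature, rep):
    """Method 1: subtract the representative vector from the feature data."""
    return [x - r for x, r in zip(feature, rep)]


def remove_by_perpendicular(feature, rep):
    """Method 2: keep only the component perpendicular to the representative vector."""
    rr = dot(rep, rep)
    if rr == 0.0:
        return list(feature)  # nothing to remove for a zero representative vector
    scale = dot(feature, rep) / rr
    return [x - scale * r for x, r in zip(feature, rep)]


feature = [3.0, 4.0]
rep = [2.0, 0.0]
by_subtraction = remove_by_subtraction(feature, rep)
by_perpendicular = remove_by_perpendicular(feature, rep)
```

The results generally differ: subtraction shifts the data so that the representative vector becomes the origin, while the perpendicular method discards everything along the representative direction.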
As a method of then extracting the L dictionary principal component vectors 23A and the weight values 23B, principal component analysis may be performed on the components that remain after the dictionary representative vector 22A is removed from each of the M dictionary feature data 21A, the L eigenvectors with the largest eigenvalues selected as the dictionary principal component vectors 23A, and the eigenvalues corresponding to the selected dictionary principal component vectors 23A adopted as the weight values 23B. Methods for obtaining eigenvalues and eigenvectors are described in the general multivariate analysis literature, for example Reference 1 (Tanaka, Wakimoto, "Multivariate Statistical Analysis", Contemporary Mathematics, pp. 71-79, 1983).
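
A minimal Python sketch of this extraction follows. It uses power iteration with deflation, which is a technique chosen here for brevity and not the method of Reference 1. Dividing the covariance by the number of samples, the iteration count, and all names are assumptions. `count` must not exceed the number of feature dimensions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigenpairs(centered, count, iters=200, seed=0):
    """Return the `count` largest eigenvalues and unit eigenvectors of the
    covariance matrix of `centered` (data with the representative removed)."""
    n = len(centered)
    dim = len(centered[0])
    cov = [[sum(row[i] * row[j] for row in centered) / n for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    values, vectors = [], []
    for _component in range(count):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _step in range(iters):
            # Deflation: remove directions already found.
            for u in vectors:
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break  # remaining variance is numerically zero
            v = [x / norm for x in w]
        length = math.sqrt(dot(v, v))
        v = [x / length for x in v]
        values.append(dot(v, [dot(row, v) for row in cov]))  # Rayleigh quotient
        vectors.append(v)
    return values, vectors


centered = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
weights, principal_vectors = top_eigenpairs(centered, count=2)
```

The returned eigenvalues play the role of the weight values, and the eigenvectors form an orthonormal set that can serve as the principal component vectors.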
[0041]
For example, as shown in FIG. 3, the dictionary storage unit 3 has C (C is a natural number) record storage units 31, 32, ..., 3C, each of which can store a record number 61, a dictionary representative vector 62, dictionary principal component data 63, and a category ID 64. The dictionary representative vector 22A generated by the dictionary representative vector generation unit 22 is stored as the dictionary representative vector 62, the L dictionary principal component vectors 23A and weight values 23B generated by the dictionary principal component data generation unit 23 are stored as the dictionary principal component data 63, and the category ID 1B is stored as the category ID 64. In this way, the dictionary storage unit 3 stores the dictionary representative vector 62 and the dictionary principal component data 63 as dictionary data, classified by the category ID 64. A plurality of dictionary data having the same category ID may also be stored.
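
The record layout described above can be sketched as a small in-memory structure in Python. The class and field names are hypothetical, and assigning record numbers sequentially from 1 is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DictionaryRecord:
    record_number: int                    # record number 61
    representative: List[float]           # dictionary representative vector 62
    principal_vectors: List[List[float]]  # part of principal component data 63
    weights: List[float]                  # part of principal component data 63
    category_id: str                      # category ID 64


class DictionaryStore:
    """In-memory stand-in for the dictionary storage unit."""

    def __init__(self):
        self.records: List[DictionaryRecord] = []

    def add(self, representative, principal_vectors, weights, category_id):
        record = DictionaryRecord(len(self.records) + 1, representative,
                                  principal_vectors, weights, category_id)
        self.records.append(record)
        return record

    def by_category(self, category_id):
        """Several records may share one category ID."""
        return [r for r in self.records if r.category_id == category_id]


store = DictionaryStore()
store.add([0.0, 1.0], [[1.0, 0.0]], [2.0], "person-A")
second = store.add([0.5, 1.5], [[0.0, 1.0]], [1.0], "person-A")
```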
[0042]
The identification target image group input unit 4 acquires N (N is a natural number) still images obtained by photographing the same target object with a video camera or the like, and outputs them as the input image data 4A to the collation unit 5.
The collation unit 5 further includes an input image feature extraction unit 51, an input representative vector generation unit 52, an input principal component data generation unit (input subspace generation means) 53, a distance calculation unit 54, and an identification unit 55.
[0043]
The input image feature extraction unit 51 extracts N input feature data 51A used for recognition from the N input image data 4A input from the identification target image group input unit 4, and outputs them to the input representative vector generation unit 52 and the input principal component data generation unit 53. It is assumed that the input feature data group composed of the N input feature data 51A is distributed in an input feature distribution 51B shown in FIG.
One example of the input image feature extraction unit 51 applies first-order and second-order differential filters to the original image data, raster-scans the filter output, and outputs it as one-dimensional feature data. Another example raster-scans the original image data into one-dimensional feature data and normalizes it so that the mean and variance take constant values, for example a mean of 0 and a variance of 1.0. Input feature data 51A subjected to contour enhancement, noise removal, and the like is thereby obtained.
[0044]
The input representative vector generation unit 52 generates an input representative vector 52A, a single vector representing the input feature data group, from the input feature data group composed of the N input feature data 51A, and outputs it to the input principal component data generation unit 53 and the distance calculation unit 54. The input representative vector 52A is an arbitrary vector belonging to the input subspace including the input feature distribution 51B, and has the origin O as its start point and a point Q on the input subspace as its end point. Examples of the input representative vector 52A include the average value (average vector) and the median value (center vector) of the input feature data group.
[0045]
The input principal component data generation unit 53 extracts input principal component vectors (Ψi (i = 1, ..., K)) 53A, which are K vectors representing the components that remain after the input representative vector 52A is removed from each of the N input feature data 51A, and weight values (μi (i = 1, ..., K)) 53B corresponding to the respective input principal component vectors (Ψi), and outputs them to the distance calculation unit 54. When the number of feature dimensions of the input feature data 51A is D, K is a natural number greater than 1 and less than min (N, D). The input principal component vectors 53A are vectors representing the input subspace including the input feature distribution 51B.
[0046]
Methods of removing the input representative vector 52A from the input feature data 51A include subtracting the input representative vector 52A from the input feature data 51A, and calculating the component of the input feature data 51A that is perpendicular to the input representative vector 52A.
As a method of extracting the K input principal component vectors 53A and the weight values 53B, principal component analysis may be performed on the components that remain after the input representative vector 52A is removed from each of the N input feature data 51A, the K eigenvectors with the largest eigenvalues selected as the input principal component vectors 53A, and the eigenvalues corresponding to the selected input principal component vectors 53A adopted as the weight values 53B.
[0047]
The distance calculation unit 54 calculates a distance value to the dictionary data of each of the C categories stored in the dictionary storage unit 3, using the input representative vector 52A, the input principal component vectors 53A, and the weight values 53B. More specifically, it reads the dictionary representative vector 62 and the L dictionary principal component vectors 63A and weight values 63B stored as the dictionary principal component data 63 from each of the record storage units 31 to 3C of the dictionary storage unit 3. Using these together with the input representative vector 52A, the input principal component vectors 53A, and the weight values 53B, it calculates a distance value D between the input subspace including the input feature distribution 51B shown in FIG. 2B and the dictionary subspace including the dictionary feature distribution 21BB of each category, and sequentially outputs the result to the identification unit 55 as a distance value 54A.
[0048]
The identification unit 55 outputs a recognition result 5A for the input image data 4A based on the distance values 54A for the C categories. For example, as shown in FIG. 4, the identification unit 55 includes a minimum value calculation unit 71 and a threshold processing unit 72. The minimum value calculation unit 71 obtains the minimum of the distance values 54A over the C categories. The threshold processing unit 72 compares the minimum value obtained by the minimum value calculation unit 71 with a predetermined threshold and, if the minimum value is smaller than the threshold, outputs the category that gave the minimum value as the recognition result 5A. Conversely, if the minimum value is equal to or greater than the threshold, it outputs a recognition result 5A indicating that the input image data 4A is a pattern that does not exist in the dictionary.
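
The minimum-value and threshold processing can be sketched in Python as follows. The function name, the dictionary-of-distances input format, and the use of `None` for "not in the dictionary" are assumptions made for illustration.

```python
def identify(distances, threshold):
    """Return the category with the smallest distance if that distance is
    below `threshold`; otherwise return None (pattern not in the dictionary).

    `distances` maps a category ID to its distance value and must be non-empty.
    """
    best_category = min(distances, key=distances.get)
    if distances[best_category] < threshold:
        return best_category
    return None


distance_values = {"category-1": 0.9, "category-2": 0.4, "category-3": 1.7}
accepted = identify(distance_values, threshold=0.5)
rejected = identify(distance_values, threshold=0.4)  # equal to threshold: rejected
```

As in the text, a minimum distance exactly equal to the threshold is treated as "not in the dictionary" in this sketch.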
[0049]
Next, the configuration and operation of the distance calculation unit 54 will be described in detail with reference to FIGS. 5 and 6. FIG. 5 is a block diagram illustrating a configuration example of the distance calculation unit 54. FIG. 6 is a conceptual diagram illustrating the operation of the distance calculation unit 54.
As shown in FIG. 5, the distance calculation unit 54 includes an input projection distance calculation unit 81, a dictionary projection distance calculation unit 82, and an integration unit 83.
The input projection distance calculation unit 81 calculates an input projection distance value (first distance) d₁ representing the distance between the end point Q of the input representative vector 52A shown in FIG. 6 and the dictionary subspace 63C formed by the L dictionary principal component vectors 63A. When the input representative vector 52A is V₁, the dictionary representative vector 62 is V₂, and σ is an arbitrary constant, the distance value d₁ can be calculated by, for example, Equation (15).
[0050]
[Expression 12]

[0051]
In order to speed up the calculation, the constant σ = 0 may be set and simplified as shown in Expression (16).
[0052]
[Expression 13]

[0053]
In equations (15) and (16), the vector (V₂ - V₁) is the vector QP, and the input projection distance value d₁ represents the distance between the end point Q of the input representative vector 52A and the dictionary subspace 63C.
The dictionary projection distance calculation unit 82 calculates a dictionary projection distance value (second distance) d₂ between the end point P of the dictionary representative vector 62 shown in FIG. 6 and the input subspace 53C formed by the K input principal component vectors 53A. Similarly, the distance value d₂ can be calculated by, for example, Equation (17).
[0054]
[Expression 14]

[0055]
In order to speed up the calculation, the constant σ = 0 may be set and simplified as shown in Expression (18).
[0056]
[Expression 15]

[0057]
In equations (17) and (18), the vector (V₁ - V₂) is the vector PQ, and the dictionary projection distance value d₂ represents the distance between the end point P of the dictionary representative vector 62 and the input subspace 53C.
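
Since the images of equations (15) to (18) are not reproduced in this text, the weighting used in the following Python sketch is purely an assumption. Each projected component is scaled by weight / (weight + σ), a form chosen only because it reduces to the plain projection distance when σ = 0, matching the simplification described above. All names and sample values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_projection_distance_sq(point, anchor, basis, weights, sigma=0.0):
    """Assumed weighted form of the squared projection distance from `point`
    to the subspace through `anchor` spanned by orthonormal `basis` vectors."""
    diff = [a - p for a, p in zip(anchor, point)]
    total = dot(diff, diff)
    for vector, weight in zip(basis, weights):
        # sigma = 0 gives factor 1, i.e. the unweighted projection distance.
        factor = 1.0 if sigma == 0.0 else weight / (weight + sigma)
        total -= factor * dot(diff, vector) ** 2
    return max(total, 0.0)


v1 = [0.0, 0.0, 2.0]         # input representative vector (end point Q)
v2 = [1.0, 1.0, 0.0]         # dictionary representative vector (end point P)
phi = [[1.0, 0.0, 0.0]]      # dictionary principal component vectors
lam = [4.0]                  # dictionary weights (eigenvalues)

d1_plain = weighted_projection_distance_sq(v1, v2, phi, lam, sigma=0.0)
d1_weighted = weighted_projection_distance_sq(v1, v2, phi, lam, sigma=4.0)
```

Under this assumed form, a larger σ trusts directions with small eigenvalues less, so less of the difference vector is explained by the subspace and the distance grows. The second distance d₂ would be obtained by swapping the two representative vectors and passing the input principal component vectors and their weights instead.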
The integrating unit 83 calculates the distance value D using both the input projection distance value d₁ and the dictionary projection distance value d₂. For example, the distance value D can be calculated using Equation (19).
D = αd₁ + βd₂ ... (19)
Here, α and β are constants.
Alternatively, Equation (20) may be used.
D = αd₁d₂ / (d₁ + d₂) ... (20)
Here, α is a constant.
[0058]
In FIG. 2B, the distance value D between the input feature distribution 51B and the dictionary feature distribution 21BB could also be obtained simply as the norm of the vector PQ. However, that method uses only two pieces of data, the input representative vector 52A and the dictionary representative vector 62, so matching performance is low and a high recognition rate cannot be obtained when the resulting distance value D is used for actual matching. In contrast, the distance value D obtained by integrating the projection distance value d₁ between the input representative vector 52A and the dictionary subspace 63C with the projection distance value d₂ between the dictionary representative vector 62 and the input subspace 53C draws on more data: the input representative vector 52A, the dictionary representative vector 62, the K input principal component vectors 53A (and weight values 53B), and the L dictionary principal component vectors 63A (and weight values 63B). Collation performance is therefore much higher than with the method described above, and a high recognition rate is obtained.
[0059]
Further, collation uses the distance value D between the distributions, calculated from the representative vectors and the principal component vectors, rather than the angle between the input subspace 53C and the dictionary subspace 63C. Accurate collation is therefore possible even for a distribution arrangement such as that shown in FIG.
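
The difference between an angle-based similarity and a distance that also uses representative vectors can be illustrated with a small two-dimensional Python example. The data are hypothetical, and the squared projection distance used here is an assumed form, not the patent's own equation. Two input distributions share the same principal direction but sit at different positions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def projection_distance_sq(point, anchor, basis):
    """Squared distance from `point` to the subspace through `anchor`
    spanned by orthonormal `basis` vectors (assumed form)."""
    diff = [a - p for a, p in zip(anchor, point)]
    return max(dot(diff, diff) - sum(dot(diff, b) ** 2 for b in basis), 0.0)


# Dictionary: principal direction at 30 degrees, representative point P on it.
phi = [math.cos(math.radians(30)), math.sin(math.radians(30))]
p = [2.0 * c for c in phi]

# Two input distributions with the same principal direction psi
# but different representative points Q.
psi = [1.0, 0.0]
q_a = [1.0, 0.0]
q_b = [5.0, 0.0]

cos_sq = dot(psi, phi) ** 2                     # angle-based similarity
d1_a = projection_distance_sq(q_a, p, [phi])    # distance for distribution A
d1_b = projection_distance_sq(q_b, p, [phi])    # distance for distribution B
```

The angle-based value depends only on the two directions, so it is identical for both inputs, while the projection distance separates them.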
In addition, since a plurality of input image data 4A is used as input, collation performance is much higher than that of a system that performs recognition using a single input image.
It is preferable that the dictionary principal component vectors 63A and the input principal component vectors 53A used for calculating the distance value D both be orthogonal bases. By calculating the distance value using principal component vectors 63A and 53A that are orthogonal bases, a highly accurate collation result can be obtained in a shorter time than with non-orthogonal bases, and the recognition rate can be improved.
[0060]
Next, the dictionary data learning operation of the image recognition system shown in FIG. 1 will be described. FIG. 7 is a flowchart showing the flow of the dictionary data learning operation. FIG. 8 is a diagram illustrating an example of a learning image used for dictionary learning.
First, a learning image group used for dictionary learning and a category ID corresponding to the learning image group are input (step S1 in FIG. 7). For example, as shown in FIG. 8, a learning image group consists of a plurality of (M) image data belonging to a specific category, such as learning image data 91 of category 1, learning image data 92 of category 2, and learning image data 93 of category 3.
[0061]
Next, feature extraction is performed on each of the M learning image data input to obtain M dictionary feature data (step S2 in FIG. 7). Next, one vector representing the obtained dictionary feature data group is generated and used as a dictionary representative vector (step S3 in FIG. 7). Next, for the component excluding the dictionary representative vector from the dictionary feature data group, dictionary principal component data including L dictionary principal component vectors representing the distribution is generated (step S4 in FIG. 7). The dictionary representative vector and dictionary principal component data thus obtained are classified as dictionary data by category ID and stored in the dictionary storage unit 3 (step S5 in FIG. 7).
It is then determined whether another category is to be learned (step S6 in FIG. 7). If so, the operation is repeated from the input of the learning image group (step S1 in FIG. 7). If dictionary creation is complete, the learning operation ends.
[0062]
Next, the recognition operation of the image recognition system shown in FIG. 1 will be described. FIG. 9 is a flowchart showing the flow of this recognition operation. FIG. 10 is a diagram illustrating an example of an input image to be recognized.
First, a recognition target image group is input (step S11 in FIG. 9). For example, as shown in FIG. 10, the input image group includes a plurality (N) of image data 90 obtained by photographing the same target object.
[0063]
Next, feature extraction is performed on each of the input N pieces of input image data to obtain N pieces of input feature data (step S12 in FIG. 9). Next, one vector representing the obtained input feature data group is generated and set as an input representative vector (step S13 in FIG. 9). Next, for the component obtained by removing the input representative vector from the input feature data group, input principal component data including K input principal component vectors representing the distribution is generated (step S14 in FIG. 9).
[0064]
Next, the dictionary data stored for each category in the dictionary storage unit 3 is read, and the distance value to the dictionary data is calculated for each category using the input representative vector and the input principal component data (step S15 in FIG. 9). The minimum of these distance values is then obtained (step S16 in FIG. 9).
Next, it is determined whether or not the minimum distance value is smaller than a threshold value (step S17 in FIG. 9). When the minimum distance value is smaller than the threshold value, the category having the minimum distance is output as a recognition result, and the process ends (step S18 in FIG. 9). On the contrary, when the minimum distance value is equal to or greater than the threshold value, “no corresponding class” is output and the process ends (step S19 in FIG. 9). Here, when the minimum distance value is equal to the threshold value, the process proceeds to step S19, but it goes without saying that the process may proceed to step S18.
[0065]
(Second Embodiment)
FIG. 11 is a block diagram showing a configuration of a face image recognition system according to the second embodiment of the present invention. The face image recognition system includes a face image detection unit 100, a learning unit 102, a face dictionary storage unit 103, and a collation unit 105.
The face image detection unit 100 selects face image data showing a human face from each frame of an image sequence such as video. In the dictionary data learning operation, the selected face image data is output to the learning unit 102; in the recognition operation, it is output to the collation unit 105. One method of selecting face image data is to determine that a face is present when the area of a region whose color is close to human skin color, or the area of a moving region, exceeds a certain threshold. Another is to manually select, while viewing the screen, a group of images in which a human face appears.
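
The skin-color area method can be sketched in Python as follows. The RGB rule in `is_skin` and the 30% area threshold are crude, hypothetical choices made only for illustration; the text does not specify how closeness to skin color is measured or what threshold is used.

```python
def is_skin(pixel):
    """Very rough RGB test for a skin-like color (illustrative assumption)."""
    r, g, b = pixel
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b and r - min(g, b) > 15)


def frame_has_face(frame, min_skin_ratio=0.3):
    """Judge that a face is present when the skin-colored area, expressed as
    a fraction of all pixels, exceeds the threshold."""
    pixels = [p for row in frame for p in row]
    skin_count = sum(1 for p in pixels if is_skin(p))
    return skin_count / len(pixels) > min_skin_ratio


def select_face_frames(frames):
    """Keep only the frames judged to contain a face."""
    return [f for f in frames if frame_has_face(f)]


skin_frame = [[(200, 150, 120)] * 4 for _ in range(4)]  # all skin-like pixels
blue_frame = [[(0, 0, 255)] * 4 for _ in range(4)]      # no skin-like pixels
selected = select_face_frames([skin_frame, blue_frame])
```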
[0066]
The operation of the learning unit 102 is the same as the operation of the learning unit 2 in FIG. The operation of the face dictionary storage unit 103 is the same as the operation of the dictionary storage unit 3 in FIG. 1 except that dictionary data for human face images is stored as records. The operation of the collation unit 105 is the same as the operation of the collation unit 5 in FIG.
The face image recognition system shown in FIG. 11 can recognize the person appearing in input images of a human face, and can be used for security, monitoring, human interfaces, and the like.
[0067]
(Third embodiment)
FIG. 12 is a block diagram showing the configuration of the third embodiment of the image recognition system of the present invention. This image recognition system includes a computer 110 that operates under program control, a camera 121 that captures identification target images and learning images and outputs them to the computer 110, a console 122 through which an operator gives recognition instructions and learning instructions to the computer 110, and a display device 123 that displays the recognition result output from the computer 110. The computer 110 has a configuration in which an arithmetic processing unit 111, a storage unit 112, and interface units (hereinafter referred to as I/F units) 113₁ to 113₄ are connected to a bus 114. The I/F units 113₁ to 113₃ interface with the camera 121, the console 122, and the display device 123, which are external devices of the computer 110.
[0068]
An image recognition program for controlling the operation of the computer 110 is provided recorded on a magnetic disk, a semiconductor memory, or another recording medium 124. When the recording medium 124 is connected to the I/F unit 113₄, the arithmetic processing unit 111 reads out the image recognition program written on the recording medium 124 and stores it in the storage unit 112. Thereafter, in response to an instruction from the console 122, the arithmetic processing unit 111 executes the image recognition program stored in the storage unit 112, thereby realizing the learning unit 2, the dictionary storage unit 3, and the collation unit 5 illustrated in FIG.
The image recognition program may be provided via a telecommunication line such as the Internet.
[0069]
The computer 110 performs the operations shown in the flowcharts of FIGS. That is, when a learning instruction is given from the console 122, a learning image data group is input from the camera 121, and the corresponding category ID is input from the console 122, the computer 110 extracts features from the learning image data group, generates a dictionary representative vector and dictionary principal component data from the resulting dictionary feature data group, classifies them by category ID, and stores them as dictionary data in the dictionary storage unit implemented by the storage unit 112. It then determines whether another category is to be learned; if so, the operation is repeated from the input of learning image data. If dictionary creation is complete, the learning operation ends.
[0070]
Further, when there is a recognition instruction from the console 122 and an image data group to be identified is input from the camera 121, feature extraction is performed, and an input representative vector and input principal component data are based on the obtained input feature data group. Is generated. Next, the dictionary data stored for each category in the dictionary storage unit is read, and the distance value from the dictionary data is calculated for each category using the input representative vector and the input principal component data. Then, the minimum distance value among them is obtained, and it is determined whether or not the minimum distance value is smaller than the threshold value. When the minimum distance is smaller than the threshold value, the category having the minimum distance is displayed as a recognition result on the display device 123. Conversely, when the minimum distance is equal to or larger than the threshold value, the display device 123 displays that there is no corresponding class. Then, the recognition operation is finished.
It should be noted that the face image detection unit 100, the learning unit 102, the face dictionary storage unit 103, and the collation unit 105 shown in FIG. 11 can be realized by the arithmetic processing unit 111 executing the image recognition program.
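The final decision of the recognition operation (take the minimum distance value, then compare it with the threshold) can be sketched as below. The function name `recognize` and the use of `None` to represent "no corresponding category" are assumptions made for illustration.

```python
def recognize(distances, threshold):
    """Return the category ID with the minimum distance value, or None when
    that minimum is equal to or larger than the threshold.

    distances: mapping from category ID to distance value.
    """
    best = min(distances, key=distances.get)
    if distances[best] < threshold:
        return best
    return None  # no corresponding category

# Hypothetical distance values for two registered categories.
accepted = recognize({"A": 0.8, "B": 0.3}, threshold=0.5)
rejected = recognize({"A": 0.8, "B": 0.5}, threshold=0.5)
```

The second call shows the boundary case: a minimum distance equal to the threshold is rejected, matching the "equal to or larger than" wording in the text.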
[0071]
[Effects of the invention]
As described above, in the present invention, the distance value between the input subspace and the dictionary subspace is calculated using the K input principal component vectors, the input representative vector, the L dictionary principal component vectors, and the dictionary representative vector, and the calculated distance value is used for collation. As a result, even when a plurality of input feature distributions exist in the same input subspace, the distance value differs if the arrangement of the input feature distributions differs, so the input image group can be identified correctly.
Further, in the present invention, the distance value is calculated from a plurality of input images, making effective use of the large amount of data they provide, and this distance value is used for collation. Compared with recognition using only a single input image, collation performance is much higher and a high recognition rate can be obtained.
Therefore, it is possible to construct a robust recognition system and method that absorbs variations due to illumination of the same object, variations due to orientation, deformation, and the like.
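The effect described above can be illustrated with a small numerical sketch. The exact distance equations are not reproduced in this text, so the code assumes one plausible reading: each projection distance is the squared residual of the difference between the two representative vectors after removing its projection onto an orthonormal set of principal component vectors, and the two distances are combined by weighted addition. The function names and the toy vectors are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def projection_distance(point, origin, basis):
    """Squared distance from `point` to the space through `origin` spanned by
    the orthonormal vectors in `basis` (assumed form of the distance)."""
    diff = [p - o for p, o in zip(point, origin)]
    return dot(diff, diff) - sum(dot(diff, b) ** 2 for b in basis)

def distance_value(v1, input_basis, v2, dict_basis, alpha=1.0, beta=1.0):
    d1 = projection_distance(v1, v2, dict_basis)   # input representative vs. dictionary space
    d2 = projection_distance(v2, v1, input_basis)  # dictionary representative vs. input space
    return alpha * d1 + beta * d2

# Two input groups spread along the same direction but centred at different
# positions, compared against one dictionary category.
psi = [[0.0, 1.0, 0.0]]   # input principal component vector (K = 1)
phi = [[0.0, 1.0, 0.0]]   # dictionary principal component vector (L = 1)
v2 = [0.0, 0.0, 0.0]      # dictionary representative vector

d_a = distance_value([1.0, 0.0, 0.0], psi, v2, phi)
d_b = distance_value([3.0, 0.0, 0.0], psi, v2, phi)
```

Under these assumptions the two input groups receive different distance values (`d_a` and `d_b`), because their representative vectors differ even though their principal directions coincide; a measure based only on the angle between subspaces cannot separate them.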
[0072]
In addition, because the distance value is calculated using principal component vectors that form an orthogonal basis, a highly accurate collation result can be obtained in a shorter time than with a non-orthogonal basis, and the recognition rate can be improved.
In addition, by providing means for generating the dictionary principal component vector and the dictionary representative vector, or by performing such processing, the contents of the dictionary data can be updated as needed to cope with abrupt content changes.
In addition, by providing means for selecting face image data from an input image sequence, or by performing such processing, a system for identifying a person in an image using a human face image can be constructed.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an image recognition system according to a first embodiment of the present invention.
FIG. 2 is a diagram conceptually showing processing by the image recognition system shown in FIG.
FIG. 3 is a block diagram illustrating a configuration example of a dictionary storage unit.
FIG. 4 is a block diagram illustrating a configuration example of an identification unit.
FIG. 5 is a block diagram illustrating a configuration example of a distance calculation unit.
FIG. 6 is a conceptual diagram illustrating the operation of a distance calculation unit.
FIG. 7 is a flowchart showing a flow of dictionary data learning operation of the image recognition system shown in FIG. 1.
FIG. 8 is a diagram illustrating an example of a learning image used for dictionary learning.
FIG. 9 is a flowchart showing a flow of recognition operation of the image recognition system shown in FIG. 1;
FIG. 10 is a diagram illustrating an example of an input image to be recognized.
FIG. 11 is a block diagram showing a configuration of a face image recognition system according to a second embodiment of the present invention.
FIG. 12 is a block diagram showing a configuration of a third embodiment which is an image recognition system of the present invention.
FIG. 13 is a block diagram showing a configuration of a conventional image recognition system.
FIG. 14 is a conceptual diagram showing the similarity used in a conventional image recognition system.
FIG. 15 is a conceptual diagram showing problems of a conventional image recognition system.
[Explanation of symbols]
1 ... Learning image group input unit, 1A ... Learning image data, 1B ... Category ID, 2 ... Learning unit, 3 ... Dictionary storage unit, 4 ... Identification target image group input unit, 5 ... Collation unit, 5A ... Recognition result, 21 ... Learning image feature extraction unit, 21A ... Dictionary feature data, 21B, 21B' ... Dictionary feature distribution, 22 ... Dictionary representative vector generation unit, 22A ... Dictionary representative vector, 23 ... Dictionary principal component data generation unit, 23A ... Dictionary principal component vector, 23B ... Weight value, 31-3C ... Record storage unit, 51 ... Input image feature extraction unit, 51A ... Input feature data, 51B ... Input feature distribution, 52 ... Input representative vector generation unit, 52A ... Input representative vector, 53 ... Input principal component data generation unit, 53A ... Input principal component vector, 53B ... Weight value, 53C ... Input subspace, 54 ... Distance calculation unit, 54A ... Distance value, 55 ... Identification unit, 61 ... Record number, 62 ... Dictionary representative vector, 63 ... Dictionary principal component data, 63A ... Dictionary principal component vector, 63C ... Dictionary subspace, 64 ... Category ID, 71 ... Minimum value calculation unit, 72 ... Threshold processing unit, 81 ... Input projection distance calculation unit, 82 ... Dictionary projection distance calculation unit, 83 ... Integration unit, 90 ... Identification input image data, 91 to 93 ... Learning image data, 100 ... Face image detection unit, 102 ... Learning unit, 103 ... Face dictionary storage unit, 105 ... Collation unit, 110 ... Computer, 111 ... Arithmetic processing unit, 112 ... Storage unit, 113 ... Interface unit, 114 ... Bus, 121 ... Camera, 122 ... Console, 123 ... Display device, 124 ... Storage medium, 501 ... Image input unit, 502 ... Dictionary storage unit, 503 ... Angle calculation unit between subspaces, 504 ... Recognition unit, 511, 511A to 511C ... Input feature distribution, 512 ... Input subspace, 521 ... Dictionary feature distribution, 522 ... Dictionary subspace, D, d1, d2 ... Distance value, O ... Origin of coordinate system, P ... End point of dictionary representative vector, Q ... End point of input representative vector, S1 to S6 ... Steps of learning operation, S11 to S19 ... Steps of recognition operation, V1 ... Input representative vector, V2 ... Dictionary representative vector, Θ ... Angle.

Claims

An image recognition system that collates a plurality of input images obtained by photographing the same object with previously registered dictionary data and outputs a recognition result,
Distance calculating means for calculating a distance value between the input subspace and the dictionary subspace using: an input representative vector that is an average vector of a plurality of input feature data obtained from the input images; K input principal component vectors that are obtained by principal component analysis of the input feature data and form a basis of an input subspace spanned by the input feature data; a dictionary representative vector that is an average vector of dictionary feature data registered in advance; and L dictionary principal component vectors that are obtained by principal component analysis of the dictionary feature data and form a basis of a dictionary subspace spanned by the dictionary feature data;
Identification means for performing identification based on the distance value and outputting the recognition result;
The distance calculating means includes
Input projection distance calculating means for calculating a first distance between the space formed by the L dictionary principal component vectors and the input representative vector;
Dictionary projection distance calculating means for calculating a second distance between the space formed by the K input principal component vectors and the dictionary representative vector;
An image recognition system comprising: an integration unit that calculates the distance value between the input subspace and the dictionary subspace by weighted addition of the first and second distances.

An image recognition system that collates a plurality of input images obtained by photographing the same object with previously registered dictionary data and outputs a recognition result,
Distance calculating means for calculating a distance value between the input subspace and the dictionary subspace using: an input representative vector that is an average vector of a plurality of input feature data obtained from the input images; K input principal component vectors that are obtained by principal component analysis of first difference vectors between the input feature data and the input representative vector and form a basis of an input subspace spanned by the first difference vectors; a dictionary representative vector that is an average vector of dictionary feature data registered in advance; and L dictionary principal component vectors that are obtained by principal component analysis of second difference vectors between the dictionary feature data and the dictionary representative vector and form a basis of a dictionary subspace spanned by the second difference vectors;
Identification means for performing identification to output the recognition result based on the distance value;
The distance calculating means includes
Input projection distance calculating means for calculating a first distance between the space formed by the L dictionary principal component vectors and the input representative vector;
Dictionary projection distance calculating means for calculating a second distance between the space formed by the K input principal component vectors and the dictionary representative vector;
An image recognition system comprising: an integration unit that calculates the distance value between the input subspace and the dictionary subspace by weighted addition of the first and second distances.

The image recognition system according to claim 2,
The image recognition system, wherein the input principal component vector and the dictionary principal component vector are orthogonal bases.
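The orthogonality condition in this claim can be checked numerically. The sketch below is only an illustration: the function name `is_orthogonal_basis` and the tolerance are assumptions, and the check covers pairwise orthogonality of non-zero vectors, not unit length.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_orthogonal_basis(vectors, tol=1e-9):
    """True when every vector is non-zero and every distinct pair has a
    dot product of (approximately) zero."""
    for i, u in enumerate(vectors):
        if dot(u, u) <= tol:
            return False  # a zero vector cannot belong to a basis
        for v in vectors[i + 1:]:
            if abs(dot(u, v)) > tol:
                return False
    return True

ok = is_orthogonal_basis([[1.0, 0.0], [0.0, 2.0]])
not_ok = is_orthogonal_basis([[1.0, 0.0], [1.0, 1.0]])
```

With an orthogonal basis, the projection of a vector onto the subspace separates into independent per-vector terms, which is one reason such a basis is convenient for distance calculation.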

The image recognition system according to claim 2 or 3,
The input representative vector is an average vector of the input feature data,
The input principal component vectors are the K eigenvectors with the largest eigenvalues, obtained from the components that remain after subtracting the input representative vector from the input feature data,
The dictionary representative vector is an average vector of the dictionary feature data,
The image recognition system, wherein the dictionary principal component vectors are the L eigenvectors with the largest eigenvalues, obtained from the components that remain after subtracting the dictionary representative vector from the dictionary feature data.

The image recognition system according to any one of claims 1 to 4,
If the input representative vector is V1, the dictionary representative vector is V2, the input principal component vector is ψi (i = 1,..., K), and the dictionary principal component vector is Φj (j = 1,..., L),
The input projection distance calculation means calculates the first distance d1 by the equation (A),
The image recognition system, wherein the dictionary projection distance calculation means calculates the second distance d2 by the equation (B).

In the image recognition system according to any one of claims 1 to 4,
The distance calculating means uses K weights corresponding to each of the input principal component vectors and L weights corresponding to the dictionary principal component vectors, respectively.

The image recognition system according to claim 6,
The weights corresponding to the input principal component vectors are the K eigenvalues of the eigenvectors that serve as the input principal component vectors,
The weights corresponding to the dictionary principal component vectors are the L eigenvalues of the eigenvectors that serve as the dictionary principal component vectors.

The image recognition system according to claim 7,
When the input representative vector is V1, the dictionary representative vector is V2, the input principal component vector is ψi (i = 1,..., K), the weight corresponding to the input principal component vector is μi (i = 1,..., K), the dictionary principal component vector is Φj (j = 1,..., L), and the weight corresponding to the dictionary principal component vector is λj (j = 1,..., L),
The input projection distance calculating means calculates the first distance d1 by the equation (C),
The dictionary projection distance calculating means calculates the second distance d2 according to equation (D).

(Where σ is an arbitrary constant)

The image recognition system according to any one of claims 1 to 4, 5, or 8,
The integration means calculates the distance value D between the input subspace and the dictionary subspace from the first distance d1 and the second distance d2 by the equation (E).
D = αd1 + βd2 (E)
(Where α and β are constants)

The image recognition system according to any one of claims 1 to 4, 5, or 8,
The integration means calculates the distance value D between the input subspace and the dictionary subspace from the first distance d1 and the second distance d2 by the equation (F).
D = αd1 · d2 / (d1 + d2) (F)
(Where α is a constant)
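The two integration rules, equations (E) and (F), can be written directly in code. The function names and default constants below are assumptions made for illustration, and the handling of d1 + d2 = 0 in the second function is an added guard that the claims do not address.

```python
def integrate_weighted_sum(d1, d2, alpha=1.0, beta=1.0):
    """Equation (E): D = alpha * d1 + beta * d2."""
    return alpha * d1 + beta * d2

def integrate_ratio(d1, d2, alpha=1.0):
    """Equation (F): D = alpha * d1 * d2 / (d1 + d2).

    Returns 0.0 when d1 + d2 is zero (assumed convention, not from the claims).
    """
    total = d1 + d2
    if total == 0:
        return 0.0
    return alpha * d1 * d2 / total

# Hypothetical projection distances.
d_sum = integrate_weighted_sum(2.0, 6.0, alpha=0.5, beta=0.5)
d_ratio = integrate_ratio(2.0, 6.0)
```

Equation (E) grows with either distance, while equation (F) stays small whenever one of the two distances is small, so the choice affects how strictly both projection distances must agree.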

The image recognition system according to any one of claims 1 to 10,
Input representative vector generation means for generating the input representative vector from the plurality of input feature data;
An image recognition system further comprising input principal component data generation means for generating the input principal component vector from the input feature data.

The image recognition system according to any one of claims 1 to 10,
Dictionary representative vector generating means for generating the dictionary representative vector;
An image recognition system, further comprising: dictionary principal component data generating means for generating the dictionary principal component vector and outputting it to dictionary storage means that stores the dictionary representative vector and the dictionary principal component vector.

The image recognition system according to any one of claims 1 to 12,
An image recognition system, wherein a recognition target is a human face image.

An image recognition method for collating a plurality of input images obtained by photographing the same object with previously registered dictionary data and outputting a recognition result,
A distance calculating step of calculating a distance value between the input subspace and the dictionary subspace using: an input representative vector that is an average vector of a plurality of input feature data obtained from the input images; K input principal component vectors that are obtained by principal component analysis of the input feature data and form a basis of an input subspace spanned by the input feature data; a dictionary representative vector that is an average vector of dictionary feature data registered in advance; and L dictionary principal component vectors that are obtained by principal component analysis of the dictionary feature data and form a basis of a dictionary subspace spanned by the dictionary feature data;
An identification step of performing identification to output the recognition result based on the distance value,
The distance calculating step includes:
An input projection distance calculating step of calculating a first distance between a space formed by the L dictionary principal component vectors and the input representative vector;
A dictionary projection distance calculating step of calculating a second distance between the space formed by the K input principal component vectors and the dictionary representative vector;
An image recognition method comprising: an integration step of calculating the distance value between the input subspace and the dictionary subspace by weighted addition of the first and second distances.

An image recognition method for collating a plurality of input images obtained by photographing the same object with previously registered dictionary data and outputting a recognition result,
A distance calculating step of calculating a distance value between the input subspace and the dictionary subspace using: an input representative vector that is an average vector of a plurality of input feature data obtained from the input images; K input principal component vectors that are obtained by principal component analysis of first difference vectors between the input feature data and the input representative vector and form a basis of an input subspace spanned by the first difference vectors; a dictionary representative vector that is an average vector of dictionary feature data registered in advance; and L dictionary principal component vectors that are obtained by principal component analysis of second difference vectors between the dictionary feature data and the dictionary representative vector and form a basis of a dictionary subspace spanned by the second difference vectors;
An identification step of performing identification to output the recognition result based on the distance value;
The distance calculating step includes:
An input projection distance calculating step of calculating a first distance between a space formed by the L dictionary principal component vectors and the input representative vector;
A dictionary projection distance calculating step of calculating a second distance between the space formed by the K input principal component vectors and the dictionary representative vector;
An image recognition method comprising: an integration step of calculating the distance value between the input subspace and the dictionary subspace by weighted addition of the first and second distances.

The image recognition method according to claim 15, wherein
The image recognition method, wherein the input principal component vector and the dictionary principal component vector are orthogonal bases.

The image recognition method according to claim 15 or 16,
The input representative vector is an average vector of the input feature data,
The input principal component vectors are the K eigenvectors with the largest eigenvalues, obtained from the components that remain after subtracting the input representative vector from the input feature data,
The dictionary representative vector is an average vector of the dictionary feature data,
The image recognition method, wherein the dictionary principal component vectors are L eigenvectors in descending order of eigenvalues among components obtained by subtracting the dictionary representative vector from the dictionary feature data.

The image recognition method according to any one of claims 14 to 17,
If the input representative vector is V1, the dictionary representative vector is V2, the input principal component vector is ψi (i = 1,..., K), and the dictionary principal component vector is Φj (j = 1,..., L),
In the input projection distance calculating step, the first distance d1 is calculated by the equation (G),
In the image recognition method, the dictionary projection distance calculating step calculates the second distance d2 by the equation (H).

The image recognition method according to any one of claims 14 to 17,
The distance calculating step uses K weights corresponding to the input principal component vectors and L weights corresponding to the dictionary principal component vectors, respectively.

The image recognition method according to claim 19, wherein
The weights corresponding to the input principal component vectors are the K eigenvalues of the eigenvectors that serve as the input principal component vectors,
The weights corresponding to the dictionary principal component vectors are the L eigenvalues of the eigenvectors that serve as the dictionary principal component vectors.

The image recognition method according to claim 20, wherein
When the input representative vector is V1, the dictionary representative vector is V2, the input principal component vector is ψi (i = 1,..., K), the weight corresponding to the input principal component vector is μi (i = 1,..., K), the dictionary principal component vector is Φj (j = 1,..., L), and the weight corresponding to the dictionary principal component vector is λj (j = 1,..., L),
In the input projection distance calculating step, the first distance d1 is calculated by the equation (I),
In the image recognition method, the dictionary projection distance calculating step calculates the second distance d2 by the equation (J).

(Where σ is an arbitrary constant)

The image recognition method according to any one of claims 14 to 17, 18 or 21,
In the integration step, the distance value D between the input subspace and the dictionary subspace is calculated from the first distance d1 and the second distance d2 by the equation (K).
D = αd1 + βd2 (K)
(Where α and β are constants)

The image recognition method according to any one of claims 14 to 17, 18 or 21,
In the integration step, the distance value D between the input subspace and the dictionary subspace is calculated from the first distance d1 and the second distance d2 by the equation (L).
D = αd1 · d2 / (d1 + d2) (L)
(Where α is a constant)

The image recognition method according to any one of claims 14 to 23,
An input representative vector generation step of generating the input representative vector from the plurality of input feature data;
An image recognition method further comprising: an input principal component data generation step of generating the input principal component vector from the input feature data.

The image recognition method according to any one of claims 14 to 23,
A dictionary representative vector generating step for generating the dictionary representative vector;
An image recognition method, further comprising: a dictionary principal component data generation step of generating the dictionary principal component vector and outputting it to dictionary storage means that stores the dictionary representative vector and the dictionary principal component vector.

The image recognition method according to any one of claims 14 to 23,
An image recognition method, wherein the recognition target is a human face image.

An image recognition program for causing a computer to execute a process of collating a plurality of input images obtained by photographing the same object with previously registered dictionary data and outputting a recognition result,
A distance calculating step of calculating a distance value between the input subspace and the dictionary subspace using: an input representative vector that is an average vector of a plurality of input feature data obtained from the input images; K input principal component vectors that are obtained by principal component analysis of the input feature data and form a basis of an input subspace spanned by the input feature data; a dictionary representative vector that is an average vector of dictionary feature data registered in advance; and L dictionary principal component vectors that are obtained by principal component analysis of the dictionary feature data and form a basis of a dictionary subspace spanned by the dictionary feature data;
An identification step of performing identification to output the recognition result based on the distance value,
The distance calculating step includes:
An input projection distance calculating step of calculating a first distance between a space formed by the L dictionary principal component vectors and the input representative vector;
A dictionary projection distance calculating step of calculating a second distance between the space formed by the K input principal component vectors and the dictionary representative vector;
An image recognition program for causing a computer to execute an integration step of calculating the distance value between the input subspace and the dictionary subspace by weighted addition of the first and second distances.

An image recognition program for causing a computer to execute a process of collating a plurality of input images obtained by photographing the same object with previously registered dictionary data and outputting a recognition result,
A distance calculating step of calculating a distance value between the input subspace and the dictionary subspace using: an input representative vector that is an average vector of a plurality of input feature data obtained from the input images; K input principal component vectors that are obtained by principal component analysis of first difference vectors between the input feature data and the input representative vector and form a basis of an input subspace spanned by the first difference vectors; a dictionary representative vector that is an average vector of dictionary feature data registered in advance; and L dictionary principal component vectors that are obtained by principal component analysis of second difference vectors between the dictionary feature data and the dictionary representative vector and form a basis of a dictionary subspace spanned by the second difference vectors;
An identification step of performing identification for outputting the recognition result based on the distance value, and the distance calculation step includes:
An input projection distance calculating step of calculating a first distance between a space formed by the L dictionary principal component vectors and the input representative vector;
A dictionary projection distance calculating step of calculating a second distance between the space formed by the K input principal component vectors and the dictionary representative vector;
An image recognition program for causing a computer to execute an integration step of calculating the distance value between the input subspace and the dictionary subspace by weighted addition of the first and second distances.

In the image recognition program according to claim 28,
The image recognition program, wherein the input principal component vector and the dictionary principal component vector are orthogonal bases.

The image recognition program according to claim 26 or 30,
The input representative vector is an average vector of the input feature data,
The input principal component vectors are the K eigenvectors with the largest eigenvalues, obtained from the components that remain after subtracting the input representative vector from the input feature data,
The dictionary representative vector is an average vector of the dictionary feature data,
The image recognition program, wherein the dictionary principal component vectors are the L eigenvectors with the largest eigenvalues, obtained from the components that remain after subtracting the dictionary representative vector from the dictionary feature data.

The image recognition program according to any one of claims 27 to 30,
If the input representative vector is V1, the dictionary representative vector is V2, the input principal component vector is ψi (i = 1,..., K), and the dictionary principal component vector is Φj (j = 1,..., L),
In the input projection distance calculating step, the first distance d1 is calculated by the equation (M),
In the image recognition program, the dictionary projection distance calculating step calculates the second distance d2 according to the equation (N).

The image recognition program according to any one of claims 27 to 30,
The distance calculation step uses K weights corresponding to each of the input principal component vectors and L weights corresponding to the dictionary principal component vectors, respectively.

In the image recognition program according to claim 32,
The weights corresponding to the input principal component vectors are the K eigenvalues of the eigenvectors that serve as the input principal component vectors,
The image recognition program, wherein the weights corresponding to the dictionary principal component vectors are the L eigenvalues of the eigenvectors that serve as the dictionary principal component vectors.

In the image recognition program according to claim 33,
When the input representative vector is V1, the dictionary representative vector is V2, the input principal component vector is ψi (i = 1,..., K), the weight corresponding to the input principal component vector is μi (i = 1,..., K), the dictionary principal component vector is Φj (j = 1,..., L), and the weight corresponding to the dictionary principal component vector is λj (j = 1,..., L),
In the input projection distance calculating step, the first distance d1 is calculated by the equation (O),
The dictionary projection distance calculating step calculates the second distance d2 by the equation (P).

(Where σ is an arbitrary constant)

The image recognition program according to any one of claims 27 to 30, 31 and 34,
In the integration step, the distance value D between the input subspace and the dictionary subspace is calculated from the first distance d1 and the second distance d2 by the equation (Q).
D = αd1 + βd2 (Q)
(Where α and β are constants)

The image recognition program according to any one of claims 27 to 30, 31 and 34,
In the integration step, the distance value D between the input subspace and the dictionary subspace is calculated from the first distance d1 and the second distance d2 by the equation (R).
D = αd1 · d2 / (d1 + d2) (R)
(Where α is a constant)

In the image recognition program according to any one of claims 27 to 36,
An input representative vector generation step of generating the input representative vector from the plurality of input feature data;
An image recognition program, further comprising: an input principal component data generation step for generating the input principal component vector from the input feature data.

In the image recognition program according to any one of claims 27 to 36,
A dictionary representative vector generating step for generating the dictionary representative vector;
An image recognition program further comprising: a dictionary principal component data generation step of generating the dictionary principal component vector and outputting it to dictionary storage means that stores the dictionary representative vector and the dictionary principal component vector.

In the image recognition program according to any one of claims 27 to 36,
An image recognition program characterized in that a recognition target is a human face image.