JP4052609B2

JP4052609B2 - Pattern recognition apparatus and method

Info

Publication number: JP4052609B2
Application number: JP24147198A
Authority: JP
Inventors: 薫鈴木; 和広福井; 修山口
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-08-27
Filing date: 1998-08-27
Publication date: 2008-02-27
Anticipated expiration: 2018-08-27
Also published as: JP2000076440A

Description

【０００１】
【発明の属する技術分野】
本発明はパタンを認識する装置とその方法に関し、特に照明変動を受けた反射輝度パタンに対して安定なパタン認識装置とその方法に関する。
【０００２】
【従来の技術】
パタン認識の基本的な枠組みは、認識対象となるカテゴリに属するパタンの特徴を何らかの形式で表現し、未知入力パタンの特徴がこのカテゴリの特徴にどれくらい合致しているかを照合評価するというものである。このカテゴリの特徴を表現した情報を辞書と呼び、合致の度合を示す尺度を類似度と呼ぶが、この２つをどのように定義し、その処理系をいかに実装するかがパタン認識システムのポテンシャルを大きく左右する。
【０００３】
ここで、パタン認識の枠組みを簡単に説明する。
【０００４】
Ｎ個のスカラー量で構成されるパタンｐは各スカラー量を１つの軸に対応させ、各スカラー量の値を対応する軸上の座標としたＮ次元超空間Ｓの１点（あるいは原点からの位置ベクトルｖ）と看做すことができる。画像パタンｐの場合は各画素値がこのスカラー量に相当する。これは多くのパタン認識装置で用いられているパタンのベクトル化法である。このとき、２つのパタンがどれくらい似通っているかはパタン間の距離に基づく類似度を計算することで評価できる。これが最も簡単なパタン認識の枠組みである。
【０００５】
２つのパタン間の類似度の定義のしかたには様々あるが、例えば、２点間のユークリッド距離の逆数や、各点の位置ベクトル（これを特徴ベクトルと呼ぶ）の余弦（すなわち、一方から他方への射影長）などとして定義される。この場合、類似度が大きいほど２つのパタンは似ていることになる。
【０００６】
一般に、あるカテゴリに属する既知のパタンが複数与えられており、さらに未知のパタンが１つ与えられたとき、この未知パタンがこのカテゴリにどれくらい似通っているかを評価するには、単純に前述の２パタン間の類似度を計算することで行える。
【０００７】
この場合、未知パタンと全ての既知パタンとの類似度を個々に求め、例えば最も大きい類似度をこのカテゴリとの類似度とする。この方法はカテゴリの特徴を表わす辞書として全ての既知パタンを個々に保存しておく方法である。しかしながら、この方法だと既知パタンの数が増えるにつれて類似度計算のコストや既知パタンを記憶しておくメモリスペースなどが増大してしまう。また、既知パタンには相互に類似したパタンもあるため、同じ様なパタンと何回も類似度計算をするという無駄も発生する。
【０００８】
そこで、既知パタンを個々に保存しておくかわりにより少ない情報で既知パタン集合（すなわちカテゴリ）の特徴を記述する必要がでてくる。
【０００９】
カテゴリの特徴を表現する方法として、あるカテゴリに属するｋ本のＮ次元特徴ベクトル｛v ：v1,...,vk ｝（前述の既知パタンに相当）が複数与えられたとき、これら特徴ベクトルを主成分分析して得られるＭ（Ｍ≦Ｎ）本の正規直交ベクトル｛e ：e1,...,eM ｝を基底とするＭ次元超空間Ｌを辞書とする方法がある。このＭ次元超空間ＬはＮ次元超空間Ｓそのもの（Ｍ＝Ｎ）か、あるいはＮ次元超空間Ｓの次元数Ｍの部分空間Ｌ（Ｍ＜Ｎ）であるが、便宜上このＭ次元超空間Ｌを次元数Ｍの辞書部分空間と呼ぶ。
【００１０】
この方法によると、既知パタン（教示パタンとも学習サンプルとも言う）の数ｋがＮよりはるかに多くても、辞書としては高々Ｍ（≦Ｎ）本の基底ベクトルを保存しておくだけでよい。未知特徴ベクトルｖ（前述の未知パタンに相当）が１つ与えられたときの辞書部分空間Ｌとの類似度はこれまでに各種定義されているが、未知特徴ベクトルｖを辞書部分空間Ｌに射影した長さ（特徴ベクトルｖを部分空間Ｌの全ての基底ベクトル｛e ：e1,...,eM ｝に射影した長さの２乗和として計算する）を類似度と定義する方法が部分空間法と呼ばれるものである。
【００１１】
なお、部分空間法を含めた辞書表現と類似度の定義に関連した詳細な情報は、文献[1] （エルッキ・オヤ著（小川英光他訳）、" パターン認識と部分空間" 、産業図書、1986）や、文献[2] （飯島泰蔵著、" パターン認識理論" 、森北出版、1989）に開示されている。
【００１２】
以上で述べた各種従来手法は、辞書表現をベクトル（あるいはベクトル群）や部分空間で記述しているが、辞書との間の類似度計算において対象となる入力は全てベクトル（すなわち、ただ１つのパタン）であるという点が共通している。
【００１３】
一方、前田は文献[3] （前田賢一、" パターン認識方式" 、特開昭60-57475）において、辞書として部分空間Ld（部分空間法と同様である）を用いつつ、未知入力としては１つの入力パタンｐに人為的に位置ずれ変動を加えて生成した複数パタン｛p ：p1,...,Pk ｝による部分空間Li（入力部分空間と呼ぶ）を採用して、入力部分空間Liと辞書部分空間Ldの最大余弦を類似度（部分空間間類似度と呼ぶ）と定義する相互部分空間法を提案している。
【００１４】
また、山口らは文献[4] （山口修他、" 動画像を用いた顔認識システム" 、信学技報PRMU97-50 、1997）において、入力時系列パタンから入力部分空間Liを生成する相互部分空間法を提案し、人物顔認識への適用を行っている。この文献において、同じ辞書部分空間の組と同じ入力時系列パタンの組を与えて比較したとき、相互部分空間法は従来の部分空間法より優れた認識性能を発揮することが示されている。
【００１５】
特に、部分空間法では認識対象人物の顔向きや表情の変化により類似度が大きく振動する現象が現われたが、相互部分空間法では類似度が高い値で安定することが確認されている。
【００１６】
部分空間法などの類似度計算対象を１パタンとする認識手法と相互部分空間法との相違を図８を用いて模式的に説明する。図中の４１はカテゴリＡに属する辞書部分空間であり、図中の４２はカテゴリＢに属する辞書部分空間である。図中の" ×" はカテゴリＡに属する入力パタンであり、図上段に示されるように、パタン４３はカテゴリＡの辞書部分空間４１に近く、パタン４４はカテゴリＢの辞書部分空間４２に近い。そのため、類似度計算対象を１パタンと成す認識手法ではパタン４３はカテゴリＡに、パタン４４はカテゴリＢに類別されてしまう。一方、図下段に示されるように、入力パタンから入力部分空間４５を生成する相互部分空間法では、入力部分空間４５が最も接近するのはカテゴリＡの辞書部分空間４１であることから、入力パタンはカテゴリＡに類別されるのである。
【００１７】
このように、相互部分空間法の類似度はパタン４３のような本来のカテゴリに近いパタンの存在のおかげで、パタン４４のような本来のカテゴリから遠い変動パタンの存在に対して安定であることがわかる。
【００１８】
このような相互部分空間法における類似度安定性は定性的には以下の２つの安定性の相乗効果であると解釈できる。
【００１９】
（１）入力部分空間の安定性：相互部分空間法では、多数の特徴ベクトルにより入力部分空間が張られるので、多くの特徴ベクトルに常に現われるランダムな微小変動や、ある特徴ベクトルにのみ大きく現われる稀少変動は入力部分空間全体には影響しにくい。すなわち、入力部分空間の安定性は変動の規模と頻度に依存する。
【００２０】
（２）部分空間間類似度の安定性：相互部分空間法では、入力部分空間と辞書部分空間の最大余弦、すなわち双方の部分空間に各々属する２つのベクトルの組（無数にある）による余弦のうち最も大きい値（最大余弦）を類似度とするので、入力部分空間に多少の変動があってもベクトルの組が変わるだけで類似度そのものは変動しにくい。
【００２１】
以上の事柄を踏まえて文献[4] における現象を考察する。顔向きや表情の変化により常に未学習のランダムな微小変動を受けたパタンが入力され、稀に未学習の大きな変動（稀少変動）を受けたパタンが入力される。部分空間法では未学習の微小変動にさえ直ちに類似度が反応して振動してしまう。また、部分空間法では未学習の大きな変動に対しては類似度が一瞬のうちに容認できないほど低下してしまう。
【００２２】
一方、相互部分空間法では既学習変動／未学習変動の区別なく、それが微小変動や希少変動である限り入力部分空間が辞書部分空間の近くに留まることができる（入力部分空間の安定性）。そして、入力部分空間が辞書部分空間の近くに留まる限りは、それが多少変動しても部分空間間類似度は高い値を維持できる（部分空間間類似度の安定性）。
【００２３】
さらに、相互部分空間法では辞書部分空間に変動を学習させることで、変動の規模と頻度の両方に対する耐性を同時に強化することができる。この安定性と学習性は変動に対して頑健なパタン認識系を提供するうえで重要な性質である。
【００２４】
部分空間法（その他、入力を１パタンとする手法）では、所定期間の複数入力パタンに対する類似度を平均するなどの後処理により稀少変動に対する耐性を与えることができるが、そのような後処理によったとしても、個々の入力パタンに対する類似度が変動しているので、入力部分空間の安定性と部分空間間類似度の安定性の相乗効果に相当する類似度安定性を得ることは難しい。
【００２５】
ところで、物体をある照明環境下で撮影して得られた画像（すなわち反射輝度による画像）は、該物体の立体形状と表面反射率パタン、照明分光強度（色と明るさ）、照明方向と撮像方向などによって決まる。物体の立体形状と表面反射率パタンは該物体固有の特徴であり、撮影された画像にはその特徴が物体固有の画像パタン（すなわち反射輝度のパタン）として現われる。これが物体を画像パタン的に認識できる根拠である。
【００２６】
ところが、この画像パタンは物体の変形（例えば顔における表情変化などの立体形状変化）、照明変動（例えば時間帯や天候による照明分光強度や照明方向の変化）、および物体の姿勢変動（例えば顔における向きの変化などの撮像方向の変化）などが原因で様々な影響を受ける。特に照明方向については画像パタンの陰影の付き方が劇的に変わるという不安定さを抱えているにも関わらず、辞書部分空間に様々な照明方向のパタンを予め学習させることが難しい。未学習の照明変動は全ての入力パタンに未学習の輝度変動を与えてしまう。そのため、変動に強いとされる相互部分空間法においてさえ、その稀少変動と微小変動への耐性が保証されず、深刻な類似度低下と誤認識の原因となる。
【００２７】
照明変動の影響を図９を用いて模式的に説明する。図中の５１はカテゴリＡに属する辞書部分空間であり、図中の５２はカテゴリＢに属する辞書部分空間である。図中の" ×" はカテゴリＡに属する入力パタンであり、照明変動の影響により全体的に移動してしまっている。図上段に示されるように、そのような照明変動の中でもパタン５３はカテゴリＡの辞書部分空間５１に近く、パタン５４はカテゴリＢの辞書部分空間５２に近い。ただし、パタン全体の移動のためにカテゴリＡに近いパタンの数は減少してしまっている。このとき、図下段に示されるように、入力パタンから生成される入力部分空間５５はカテゴリＡの辞書部分空間５１ともカテゴリＢの辞書部分空間５２とも中途半端に距離があるため、その類似度は高くなく、入力パタンの具合によっては入力部分空間５５はカテゴリＢに類別されてしまうことすらある。
【００２８】
このような照明変動による類似度低下や誤認識を防ぐためには、照明の影響を受けにくく、かつ物体固有の特徴を温存した特徴ベクトルを用いて認識を行う必要がある。照明変動は対象物の反射輝度パタンに強い変動を与えるため、反射輝度パタンそのものを特徴ベクトルとする限り照明変動の影響は避けられない。一方、反射輝度パタンから導かれるある種の特徴ベクトルには照明条件に影響されにくい恒常性があることが知られている。
【００２９】
例えば、文献[5] （Yael Moses et al., "Face Recognition: the Problem of Compensation for Changes in Illumination Direction", Proceeding of ECCV'94, 1994）では、照明変動に対して有効な特徴量として反射輝度画像を微分して得られる輝度勾配画像（以下、微分画像と呼ぶ）が提案されている。
【００３０】
反射輝度画像への１階微分は輝度の変化率を求める操作に相当し、輝度がステップ状に変化する箇所（エッジ）を抽出する。反射輝度画像への２階微分は反射輝度変化率の変化率を求める操作に相当し、輝度が尾根状や谷状に変化する箇所を抽出する。
【００３１】
物体表面には物体特有の反射率の高低パタンがあり、反射率の低い箇所では照明強度に依らず常に低い反射輝度が、反射率の高い箇所では照明強度に依存して反射輝度が高低に変化する。このため、１階微分画像は（ａ）照明が当たる箇所における反射率の変化と（ｂ）高反射率箇所における陰影変化を表現している。（ａ）は物体の姿勢が変わらない限り照明変動による強度変動はあっても位置変動がないエッジであり、（ｂ）は物体の姿勢が変わらなくても照明変動により強度と位置がともに変動するエッジである。
【００３２】
照明変動に対する微分画像の第１の効果は、（ａ）が物体の姿勢に応じた物体固有の反射率パタンの特徴を保存していることである。これは微分パタンを物体認識に使用することの正当性を示している。照明変動に対する微分画像の第２の効果は、物体表面の反射率パタンの高低差が大きい、すなわち物体のテクスチャが強いほど（ａ）は（ｂ）より強く現われることである。特に物体が滑らかな曲面を有しているとき、照明による陰影はなだらかな変化を示すので（ｂ）はさらに弱くなる。これは反射輝度画像では大規模な変動であった未学習の照明変動が、微分画像では（ｂ）に関する未学習の変動に抑えられること（局所化と微小化）を意味する。微分画像はこのような効果により照明変動恒常性を有すると考えられる。しかし、一方では（ｂ）の存在により完全に照明変動の影響を除去しきれてはいない。したがって、（ｂ）の局所的微小変動を吸収できる認識手法と組み合わせて用いることが、パタン認識システムに照明変動恒常性を与えるために必要であることがわかる。
【００３３】
実際、文献[5] による実験では、類似度としてパタン間距離を用いたため、微分による認識性の向上が確認されなかったという結果に終わっている。この知見が意味するものは、特徴量の工夫により照明変動の影響を局所的な微小変動にまで抑え込むことができても、この微小変動に対して最も安定なパタン認識方式を選択しなければ望む効果が得られないということである。ここに従来示唆されていなかった相互部分空間法を用いる優位性が現れるのである。
【００３４】
相互部分空間法と照明変動の局所化／微小化を組み合わせる効果を図１０を用いて模式的に説明する。図中の６１はカテゴリＡに属する辞書部分空間であり、図中の６２はカテゴリＢに属する辞書部分空間である。図中の" ×" はカテゴリＡに属する入力パタンであり、照明変動が無い場合よりは遠いものの、照明変動の局所化／微小化により本来のカテゴリであるＡの辞書部分空間６１に近づいている。図上段に示されるように、パタン６３はカテゴリＡの辞書部分空間６１に近く、パタン６４はカテゴリＢの辞書部分空間６２に近い。
【００３５】
ただし、照明変動の局所化／微小化によるパタン全体の移動のためにカテゴリＡに近いパタンの数が増加している。このとき、図下段に示されるように、入力パタンから生成される入力部分空間６５が最も接近しているのはカテゴリＡの辞書部分空間６１であることから、照明変動の局所化／微小化により入力パタンは無事カテゴリＡに類別されるのである。一方、類似度計算対象を１パタンとする方法では、照明変動の有無やその局所化／微小化の有無に関わらず、正しく類別されるパタンの数が増減するのみで、相互部分空間法のような劇的な効果は望めない。
【００３６】
生きた人間の顔のように向きや表情が頻繁に変わる物体では、眉や目など低反射率部分のエッジ（ａ）は顔向きや表情に応じて移動し、陰影によるエッジ（ｂ）は照明方向が一定ならほとんど移動しない。すなわち、顔向きや表情が頻繁に変わる（生きた人間を撮影する）状況を前提とした場合には、（ａ）と（ｂ）ではその動き方が異なるのである。
【００３７】
ここで、辞書に学習された顔の特徴たる眉や目のエッジ（ａ）を基準に考えると、（ｂ）は（ａ）に対して顔向きなどに応じて相対的に移動する一貫性のない未学習の微小変動と看做せる。これは、特定の限られた照明条件で様々な顔向きや表情を学習させた文献[3][4]のような顔認識において成立するので、既に顔向き変化や表情変化への耐性が他の従来手法より優れていることが示され、かつ微小変動への高い耐性が望めるパタン認識手法、すなわち入力部分空間の安定性と部分空間間類似度の安定性を有する相互部分空間法を用いることに特段の意味が出てくる。ところが、従来提案されている微分画像の応用手法ではこの点に関して何ら考慮言及されておらず、その結果、微分画像のメリットを完全に引き出すことができず、十分な性能向上が成されなかったのである。これが従来技術における第１の課題である。
【００３８】
顔のように反射率パタンが明確に存在し、かつ滑らかな表面形状を有する物体では、照明による陰影は物体の反射輝度パタンの大局的な変化となって現われ、反射率変化はより細かい構造を持つ急峻な輝度変化となって現われる。前述の微分処理は反射輝度画像に対する高域通過フィルタとして働いており、空間周波数の低い大局的な輝度変化を除去して、空間周波数の高い微細な構造を浮き立たせる効果を有する。すなわち、文献[5] に示唆される微分処理に限らず、高域通過フィルタリングを施すことが照明変動の影響を軽減するのに有効であるという結論が得られる。したがって、高域通過フィルタリングと相互部分空間法を組み合わせることによってパタン認識システムは物体変形と物体姿勢変動と照明変動というおよそ全ての変動に対して従来にない安定性を獲得するのである。
【００３９】
さて、照明の影響が反射輝度画像の低周波成分に現われることを利用した別の特徴量も提案されている。文献[6] （赤松, " 顔画像照合装置",特開平5-20442 ）と文献[7] （赤松他, " 濃淡画像マッチングによるロバストな正面顔の識別法−フーリエスペクトルのKL展開の応用−", 信学論Vol.J76-D-II, No.7, 1993）では、特徴量として反射輝度画像の空間周波数成分の強度分布、すなわちフーリエスペクトルが用いられている。
【００４０】
フーリエスペクトルを特徴量とする最大のメリットは、その幾何変換不変性により画像パタン中の物体像の位置ずれ変動に影響されないことである。さらにフーリエスペクトルには照明変動の影響を緩和する効果もあることが文献[7] において示されている。これは照明変動が専らフーリエスペクトルの低周波領域にのみ影響を与えるため、照明変動の影響を低周波領域の強度変動という局所的な変動に抑えられるからである。この低周波領域の変動は微分における（ｂ）に相当する局所変動である。したがって、フーリエスペクトルの場合も微分の場合と同様にパタン認識手法として相互部分空間法を用いるメリットが存在する。
【００４１】
なお、微分画像もフーリエスペクトルもそれ単体にて反射輝度画像から照明変動成分を局所化／微小化する効果を有することは説明した通りである。ここで、これら単体による初期の照明変動恒常化処理を経てなお残留する照明変動成分を別の方法によりさらに削減することが考えられる。残留する照明変動を恒常化する手段についてはこれまでに明確にそれを示唆した例はなく、初期の照明変動を恒常化する手段と、なお残留する照明変動を恒常化する手段の組み合わせを実現することが従来技術における第２の課題である。
【００４２】
さらに、照明の影響を強く残す高反射率部分の低周波成分、すなわち照明変動成分として軽減されるべき陰影変化には物体の形状的特徴による反射輝度パタンが含まれている。反射輝度画像に対する高域フィルタリング処理はこのような特徴の情報を落とす操作であり、物体の反射率的特徴のみを認識に使おうとするものである。望むらくはこの両者を用いつつ照明変動の影響を軽減して高い認識性能を達成したいものである。その実現が従来技術における第３の課題である。
【００４３】
また、一般的な微分処理は少なくとも画素値の加重和計算、すなわち積和演算を必要としており、フーリエ展開はさらに多大な計算を必要とする。そのため、画像の画素数が大きくなるにつれて計算コストが容認できないほどに増大する可能性がある。
【００４４】
特に微分における積和演算は高域通過フィルタリングの基本的な実現方法であるため、これを高速化することは高域通過フィルタリングと相互部分空間法を組み合わせたパタン認識システムにおいて、その処理パフォーマンスを最大限まで向上させるためにも重要である。より高速な高域通過フィルタリングの実現が従来技術における第４の課題である。
【００４５】
【発明が解決しようとする課題】
本発明は以上の問題点に鑑みて成されたものであり、その第１の目的とするところは、照明変動を局所化／微小化する手段により相互部分空間法の照明変動への弱さを補うことにより両者の長所を融合し、対象の位置と姿勢の変動、対象の変形、ならびに照明変動などの変動全般に対して従来になく頑健なパタン認識装置とその方法を提供することである。
【００４６】
また、本発明の第２の目的は、照明変動を局所化／微小化する手段を経てなお残留する照明変動成分をさらに削減する別の手段を提供することにより、入力に現われる未学習の微小変動を一層減少させ、もって第１の目的とするところのパタン認識装置とその方法に一層頑健な耐照明変動性を実現することである。
【００４７】
また、本発明の第３の目的は、反射率パタンによる物体テクスチャ特徴と陰影パタンによる物体形状特徴の２つを有効に利用しつつ、照明変動の影響を局所化／微小化したパタン認識手段を提供することにより、入力に現われる物体特徴を最大限利用可能として、もって第１の目的とするところのパタン認識装置とその方法の認識能力を向上させることである。
【００４８】
また、本発明の第４の目的は、照明変動を局所化／微小化する手段を極めて高速簡便な処理により提供し、もって第１の目的とするところのパタン認識装置とその方法の処理能力を向上させることである。
【００４９】
【課題を解決するための手段】
本発明に係るパタン認識装置は、パタンを入力する手段と、該入力パタンから認識のための特徴ベクトルを抽出する手段と、該特徴ベクトルから入力部分空間を生成する手段と、該入力部分空間と辞書部分空間の間の類似度を計算する手段と、該類似度に基づいて物体のカテゴリを確定する手段とから成るパタン認識装置において、前記特徴ベクトルが、照明変動を局所化／微小化する１ないし複数の処理を経て抽出され、前記照明変動を局所化／微小化する１ないし複数の処理の１つが低輝度領域のレンジを拡大し、相対的に高輝度領域のレンジを圧縮する処理であることを特徴とする。
【００５０】
本発明に係るパタン認識装置は、パタンを入力する手段と、該入力パタンから認識のための特徴ベクトルを抽出する手段と、該特徴ベクトルから入力部分空間を生成する手段と、該入力部分空間と辞書部分空間の間の類似度を計算する手段と、該類似度に基づいて物体のカテゴリを確定する手段とから成るパタン認識装置において、前記特徴ベクトルの一部あるいは全部が、照明変動を局所化／微小化する１ないし複数の処理を経て抽出され、前記照明変動を局所化／微小化する１ないし複数の処理の１つが低輝度領域のレンジを拡大し、相対的に高輝度領域のレンジを圧縮する処理であり、前記入力部分空間が、照明変動を抑圧する制約部分空間に前記特徴ベクトルを主成分分析して得られる部分空間を射影する処理を経て生成され、前記制約部分空間は、同一照明条件下で収集された前記特徴ベクトルによってカテゴリ毎に張られる部分空間のカテゴリ間の差分部分空間の線形結合として生成されることを特徴とする。
【００５１】
本発明に係るパタン認識装置は、パタンを入力する手段と、該入力パタンから認識のための第１の特徴ベクトルを抽出する手段と、該入力パタンから認識のための第２の特徴ベクトルを抽出する手段と、該第１と第２の特徴ベクトルから入力部分空間を生成する手段と、該入力部分空間と辞書部分空間の間の類似度を計算する手段と、該類似度に基づいて物体のカテゴリを確定する手段とから成るパタン認識装置において、前記第１の特徴ベクトルが、照明変動を局所化／微小化する１ないし複数の処理を経て抽出され、前記照明変動を局所化／微小化する１ないし複数の処理の１つが低輝度領域のレンジを拡大し、相対的に高輝度領域のレンジを圧縮する処理であることを特徴とする。
【００５２】
本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、パタンに現われる照明変動の影響が空間周波数の低周波成分や高輝度画素領域に多く含まれることを利用し、照明変動を局所化／微小化する処理として、例えばパタンに対する高域通過フィルタリング、パタンに対する低輝度レンジ拡大処理、パタンに対するフーリエ展開処理を単独でまたは組み合わせて施すことにより、照明の影響を局所化／微小化した特徴ベクトルを生成して認識する。この結果、相互部分空間法の照明変動に対する弱さを補って本来の耐変動性を回復させ、変動全般に対して頑健なパタン認識装置を実現可能である。
【００５３】
本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、照明変動を局所化／微小化する処理を施して特徴ベクトルを生成し、さらに該特徴ベクトルによる入力部分空間を所定の制約部分空間に射影して照明変動の方向に広がりを持たない新たな入力部分空間を生成する。この結果、相互部分空間法の照明変動に対する弱さを補って本来の耐変動性を回復させ、変動全般に対してさらに頑健なパタン認識装置を実現可能である。
【００５４】
本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、照明変動を局所化／微小化する処理を施して第１の特徴ベクトルを生成し、これと別に照明変動の局所化／微小化処理を受けない第２の特徴ベクトルを生成し、両特徴ベクトルを適当に組み合わせることにより、照明の影響が軽減され、かつ物体固有の特徴を多く保存した入力部分空間を生成する。この結果、相互部分空間法の照明変動に対する弱さを補って本来の耐変動性を回復させ、変動全般に対して頑健かつ認識性能の向上したパタン認識装置を実現可能である。
【００５５】
本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、前記高域通過フィルタリング処理を微分処理で、なかでも計算コストの極めて少ないエンボス化処理で実現する。この結果、本処理の追加に伴う計算コストの増加をほとんど招くことなく、相互部分空間法の照明変動に対する弱さを補って本来の耐変動性を回復させ、変動全般に対して頑健かつ高速なパタン認識装置を実現可能である。
【００５６】
本発明によれば、前記照明変動を局所化／微小化する処理に先駆けてパタンの画素輝度値に対する輝度補正を行うことで、照明変動に対してより安定した特徴ベクトルを生成し、相互部分空間法の照明変動に対する弱さを補って本来の耐変動性を回復させ、変動全般に対して頑健かつ安定なパタン認識装置を実現可能である。
【００５７】
なお、以上の各装置に係る発明は方法に係る発明としても成立し、方法に係る発明は装置に係る発明としても成立する。
【００５８】
また、上記の発明は、相当する手順あるいは手段をコンピュータに実行させるためのプログラムを記録した機械読み取り可能な媒体としても成立する。
【００５９】
【発明の実施の形態】
本発明に係るパタン認識装置とその方法の実施例を図面にしたがって説明する。
【００６０】
（第１の実施例の説明）
以下、本発明に係るパタン認識装置とその方法の第１の実施例を説明する。図１に本実施例装置のブロック構成を示す。本実施例装置はパタン入力部１、変動恒常化ベクトル生成部２、入力部分空間生成部３、部分空間間類似度計算部４、認識カテゴリ確定部５、出力部６より成る。また、図２に本実施例装置における処理構成を示す。本実施例装置における処理はパタン入力処理Ｓ１、変動恒常化ベクトル生成処理Ｓ２、入力部分空間生成処理Ｓ３、部分空間間類似度計算処理Ｓ４、認識カテゴリ確定処理Ｓ５、出力処理Ｓ６より成る。
【００６１】
（第１の実施例：パタン入力部１の説明）
パタン入力部１はテレビカメラなどの撮像手段を用いて撮影された認識対象物のｋ個の反射輝度パタン｛p ：p1,...,Pk ｝を入力する（ステップＳ１）。このとき、パタン入力部１は単一の撮像位置による撮影時刻の異なる複数のパタンを入力することも可能であるし、複数の撮像位置による撮影時刻の同じあるいは異なる複数のパタンを入力することも可能である。また、パタン入力部１は単一の撮像手段を用いても、複数の撮像手段を用いてもよい。
【００６２】
（第１の実施例：変動恒常化ベクトル生成部２の説明）
照明の効果は高反射率部分において物体形状に由来する陰影を反射輝度パタン上に発生させる。照明の変動はこの陰影のパタン全域にわたる大規模な変動を誘うため、相互部分空間法の安定性が発揮されるように、この大規模な変動を局所的かつ微小な変動に抑え込む必要がある。変動恒常化ベクトル生成部２はこのような変動の局所化／微小化を行う機能ブロックであり、パタン入力部１による各反射輝度パタン｛p ：p1,...,Pk ｝から照明変動の影響を局所化／微小化させた特徴ベクトル｛v ：v1,...,vk ｝を生成出力する（ステップＳ２）。このために、本実施例装置における変動恒常化ベクトル生成部２は反射輝度パタンから照明変動の影響を緩和された特徴ベクトルを生成する方法として、少なくとも次の３つを単独であるいは組み合わせて用いることができる。
【００６３】
（Ａ）高域通過フィルタリング
（Ｂ）低輝度領域のレンジ拡大
（Ｃ）フーリエ展開
以下、上記（Ａ）〜（Ｃ）の方法について説明する。
【００６４】
（第１の実施例：変動恒常化ベクトル生成部２の説明：高域通過フィルタリング）
照明変動の局所化／微小化を行う第１の方法として、反射輝度パタン｛p ：p1,...,Pk ｝に対する高域通過フィルタリング（Ａ）が挙げられる。照明変動による高反射率部分に現われる陰影変動は空間周波数でいうところの低周波成分であることを利用して、反射輝度パタン｛p ：p1, ...,Pk｝に高域通過フィルタリング（あるいは高域強調フィルタリングでも近い効果を得られる）を施した画像パタンを特徴ベクトル｛v ：v1,...,vk ｝として生成することで、陰影変動を一貫性のない弱いエッジに変換する（局所化と微小化）。
【００６５】
高域通過フィルタリングの実現方法には幾つかの候補が考えられるが、最も簡単には反射輝度パタン｛p ：p1,...,Pk ｝に微分処理を施すことである。一般に微分処理は(2m+1)×(2n+1)画素の微分オペレータOpをw ×h 画素の画像パタン上で走査しつつ、オペレータOpの画素が持つ重み係数を該オペレータ画素と重なる画像パタン上の画素が持つ輝度値にかけた値（重み係数×輝度値）を求め、オペレータ画素全てのこのような計算値を加え合わせた値（重み係数と輝度値の積和）をオペレータ中心に対応する画像パタン上の微分値とする。オペレータが覆う範囲が画像パタン上のオペレータ中心画素の周辺近傍となることから、オペレータサイズが大きいほど中心画素から遠い画素までを考慮する、すなわち波長の長い低い空間周波数成分を検出するオペレータになる。逆に高い空間周波数成分を検出するためにはオペレータのサイズは小さい方がよい。なお、微分オペレータによる画像処理に関しては文献[8] （田中弘編著、" 画像処理応用技術" 、工業調査会、1989）に詳しい情報が記載されている。因にオペレータによる微分処理の有効範囲は画像パタンの左右両端各m 画素と上下両端各n 画素を取り除いた範囲となり、微分後のパタンはこの有効範囲の(w-2m)×(h-2n)画素のパタン、すなわち(w-2m)×(h-2n)次元特徴ベクトルとなる。これは有効範囲の外を中心画素としたオペレータは画像パタンをはみ出してしまうため、微分値が正確に求められないからである。
【００６６】
また、微分処理は微分オペレータOpの適用による加重和計算の他に、例えばレリーフ様の画像表現手法として知られているエンボス化処理によっても実現できる。エンボス化は元の画像パタンに同じ画像パタンを位置をずらして重ね、一方から他方を差し引いて得られた値を当該画素の値とするものである。エンボス化処理は極端に小さな微分オペレータを用いた１方向の微分処理であるとも看做せるが、画素値の減算のみで行えることから、一般的な微分オペレータによる加重和計算より遥かに高速に実行できる。特に１画素をずらして行われるエンボス化処理は、当該反射輝度パタンにおける最も高い空間周波数成分のみを抽出する高域通過フィルタとして機能するので、照明変動の影響たる低周波成分を少ない計算コストで除去するのに好都合である。
【００６７】
なお、エンボス化の場合は、画像パタンの重なる範囲が有効範囲となるので、特徴ベクトルもその重なり範囲の画素数次元のベクトルとなる。１画素ずらすエンボス化は無効範囲が高々１画素幅しかなく、サイズの大きな微分オペレータを適用する場合よりも有効範囲を広く保てる分さらに有利である。
【００６８】
エンボス化処理で画像パタンをずらす方向は当該画像パタンに現われる認識対象の特徴を表わすエッジの方向に垂直とすれば最も効果的である。このようにすることで、照明変動恒常化パタンは認識対象の特徴を十分残し、照明変動の影響を極力排除した画像となる。例えば、人間の顔には眉や目や口などの水平エッジが多いことから、画像パタンを垂直方向（顔の縦軸方向）にずらすエンボス化が有効であると考えられる。同様の方向性の問題は通常の微分オペレータについても言え、パタンの特徴を表わす重要なエッジ方向にのみ感度のある微分オペレータをかけることが肝要である。
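The one-pixel embossing described in paragraphs 0066 to 0068 can be sketched as follows. This is a minimal illustration, assuming NumPy and a grayscale pattern held as a 2-D array; the function names `emboss_vertical` and `to_feature_vector` and the sample data are hypothetical and do not come from the patent.

```python
import numpy as np

def emboss_vertical(image: np.ndarray, shift: int = 1) -> np.ndarray:
    """Subtract a vertically shifted copy of the pattern from itself.

    Only the region where the two copies overlap is valid, so an
    h x w input yields an (h - shift) x w output. shift must be >= 1.
    """
    img = image.astype(np.float64)
    return img[shift:, :] - img[:-shift, :]

def to_feature_vector(pattern: np.ndarray) -> np.ndarray:
    """Flatten the valid region into a feature vector."""
    return pattern.ravel()

# A 4 x 3 pattern with one horizontal edge between rows 1 and 2.
base = np.array([[10, 10, 10],
                 [10, 10, 10],
                 [50, 50, 50],
                 [50, 50, 50]])

v_dark = to_feature_vector(emboss_vertical(base))
# A uniform brightness offset (the lowest spatial frequency) cancels out.
v_bright = to_feature_vector(emboss_vertical(base + 30))
```

Because the operation needs only one subtraction per pixel, it is cheaper than a weighted sum over a larger operator. The vertical shift follows the text's suggestion for faces, whose dominant edges are horizontal.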
【００６９】
なお、高域通過フィルタリングにより生成出力された画像パタンには低反射率部分の反射率パタンに応じた強いエッジ（ａ）と高反射率部分の陰影に応じた弱いエッジ（ｂ）が存在している。このうち前者（ａ）は物体特徴を表わす確固とした画像特徴として認識対象にすべき成分であるが、後者（ｂ）は照明変動により位置と強度が変化するノイズ成分である。パタン全域にわたる照明変動を（ｂ）のようなノイズ成分に局所化／微小化するという目的は高域通過フィルタリングのみでも達成できるが、高域通過フィルタリング（照明変動恒常化）に続いて、この残留するノイズ成分（ｂ）を除去する処理（残留照明変動恒常化）を加えることにより照明変動は一層軽減され、相互部分空間法の変動安定性がさらに効果を発揮する。そこで、残留照明変動恒常化処理としてノイズ成分（ｂ）を除去するためには弱小エッジ除去処理を実行する。
【００７０】
具体的には、画素値の絶対値が所定閾値以下の画素値を０に置き換える。この結果生成される画像パタンは（ａ）のエッジのみを残した特徴ベクトルとなる。
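The weak-edge removal in paragraph 0070 (setting to 0 every pixel whose absolute value is at or below a threshold) might look like the sketch below. NumPy is assumed; the name `remove_weak_edges` and the threshold value are illustrative, since the patent does not say how the threshold is chosen.

```python
import numpy as np

def remove_weak_edges(edge_pattern, threshold: float) -> np.ndarray:
    """Zero out pixels whose absolute value is <= threshold.

    Strong edges of type (a) are kept; weak shading edges of type (b)
    are treated as residual illumination noise and suppressed.
    """
    out = np.asarray(edge_pattern, dtype=np.float64).copy()
    out[np.abs(out) <= threshold] = 0.0
    return out

cleaned = remove_weak_edges([40, -3, 5, -60, 0], threshold=5)
```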
【００７１】
（第１の実施例：変動恒常化ベクトル生成部２の説明：低輝度領域のレンジ拡大）
照明変動の局所化／微小化を行う第２の方法として、反射輝度パタン｛p ：p1,...,Pk ｝に対する低輝度領域のレンジ拡大処理（Ｂ）が挙げられる。照明変動が低反射率部分にあまり影響を与えないことを利用して、反射輝度パタンにおいて常に低反射輝度となる低反射率部分の画像情報を強調すべく、低輝度領域の輝度レンジを拡大し、相対的に高輝度領域の輝度レンジを縮小する。すなわち、画像パタン中の暗い部分の階調を広げ、明るい部分の階調を狭める。この結果、照明の影響を受けやすい高反射率部分の情報を狭い階調範囲に抑え込み（微小化）、物体の特徴を強く残す低反射率部分の情報を広い階調範囲で表現しなおす。具体的な計算方法としては、入力輝度に対する出力輝度の値を定義した輝度補正曲線や、入力輝度に対する出力輝度の値を格納した輝度補正テーブルを用意し、入力輝度に対する補正輝度値を求める。
【００７２】
また、低輝度領域の輝度レンジ拡大処理の極端な例として、反射輝度パタンの２値化という方法も可能である。反射輝度パタンの画素値を所定の閾値に照らし、閾値以上の高輝度なら所定値を、閾値未満の低輝度なら別の所定値を当該画素に与える。
【００７３】
いずれの方法においても、処理後のパタンの有効範囲は変化しないので、特徴ベクトルの次元は入力パタンの画素数w ×h と一致する。
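The luminance-correction table and the binarization of paragraphs 0071 to 0073 can be sketched as below. The patent does not specify the shape of the correction curve; the power-law curve with exponent `gamma` < 1 used here is an assumption chosen only because it widens dark tones and compresses bright ones. NumPy, 8-bit input, and all function names are likewise assumptions.

```python
import numpy as np

def make_low_range_expanding_lut(gamma: float = 0.5) -> np.ndarray:
    """Build a 256-entry correction table (assumed power-law curve)."""
    levels = np.arange(256) / 255.0
    return np.round(255.0 * levels ** gamma).astype(np.uint8)

def expand_low_luminance(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Look up the corrected luminance for every pixel (uint8 input)."""
    return lut[image]

def binarize(image: np.ndarray, threshold: int,
             low_value: int = 0, high_value: int = 255) -> np.ndarray:
    """Extreme case: one value at or above the threshold, another below."""
    return np.where(image >= threshold, high_value, low_value).astype(np.uint8)

lut = make_low_range_expanding_lut()
image = np.array([[0, 64], [191, 255]], dtype=np.uint8)
expanded = expand_low_luminance(image, lut)
binary = binarize(image, threshold=128)
```

Both operations act pixel by pixel, so the output keeps the w × h size of the input, as the text notes.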
【００７４】
（第１の実施例：変動恒常化ベクトル生成部２の説明：フーリエ展開）
照明変動の局所化／微小化を行う第３の方法として、文献[7] に開示される反射輝度パタン｛p ：p1,...,Pk ｝に対するフーリエ展開（Ｃ）が挙げられる。照明変動に伴う陰影変動は専ら空間周波数でいうところの低周波成分であることから、照明変動の影響は反射輝度パタン｛p ：p1,...,Pk ｝をフーリエ展開して得られる空間周波数別の強度（フーリエスペクトル）の低周波領域に多く現われる。k 個パタンのn 個の空間周波数について各々生成されたフーリエスペクトル｛f ：f1,...,fn ｝はそれぞれＮ次元特徴ベクトル｛v ：v1,...,vk ｝の１本と看做され、照明変動の影響はその中の低周波領域に対応した少数のスカラー量にのみ現われる（局所化）。
【００７５】
なお、フーリエ展開により生成される特徴ベクトルはＮ個のスペクトル成分から成るフーリエスペクトルである。フーリエスペクトルは物体の画像的特徴を空間周波数成分のパタンで表現しており、その低周波成分には照明による陰影の影響が依然保存されている。パタン全域にわたる照明変動をフーリエスペクトルの低周波成分に局所化するという目的はフーリエ展開のみでも達成できるが、フーリエ展開（照明変動恒常化）に続いて、この残留する低周波成分（ｂ）を除去する処理（残留照明変動恒常化）を加えることにより照明変動は一層軽減され、相互部分空間法の変動安定性がさらに効果を発揮する。そこで、残留照明変動恒常化処理として、Ｎ個のスペクトル成分のうち低周波成分に当たるn 個のスペクトル成分を削除した(N-n) 個のスペクトル成分で特徴ベクトルを再構成する。この結果、照明変動の影響を除去した特徴ベクトルが得られる。
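A Fourier-spectrum feature with the low-frequency components removed, as in paragraphs 0074 and 0075, could be sketched like this. NumPy's FFT is assumed. The patent only says that the n low-frequency components are deleted; selecting them with a square window of half-width `cutoff` around the zero frequency is an illustrative assumption, as is the function name.

```python
import numpy as np

def fourier_spectrum_feature(image: np.ndarray, cutoff: int = 1) -> np.ndarray:
    """Magnitude spectrum with the lowest frequencies deleted.

    Components whose horizontal and vertical frequency indices are both
    within `cutoff` of zero are dropped; the remaining (N - n) magnitudes
    form the feature vector.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image.astype(np.float64))))
    h, w = spectrum.shape
    fy = np.abs(np.arange(h) - h // 2)   # distance from the zero frequency (rows)
    fx = np.abs(np.arange(w) - w // 2)   # distance from the zero frequency (columns)
    radius = np.maximum(fy[:, None], fx[None, :])
    return spectrum[radius > cutoff]

rng = np.random.default_rng(0)
img = rng.random((8, 8))
feature = fourier_spectrum_feature(img)
```

The magnitude spectrum is unchanged by a circular shift of the image, which reflects the positional-shift tolerance the text attributes to this feature. A uniform brightness offset changes only the zero-frequency term, which is among those removed.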
【００７６】
（第１の実施例：変動恒常化ベクトル生成部２の説明：輝度補正・正規化処理）
また、高域通過フィルタリング（Ａ）、低輝度領域のレンジ拡大処理（Ｂ）、フーリエ展開（Ｃ）による照明変動恒常化に先駆けて、反射輝度パタン｛p ：p1,...,Pk ｝の輝度範囲や輝度ヒストグラムを補正・正規化しておく。このようにすることで、微分画像のエッジ強度、低反射率部分に相応した低輝度領域、フーリエスペクトルの強度が照明強度の変動如何に関わらず安定に求められる。輝度補正・正規化処理としては、反射輝度パタンに対するノルム正規化（画素値の２乗和が所定値となるよう、各画素値に共通の係数をかける）、正準化（画素値の平均が０、画素値の分散が１となるよう、各画素値から共通のオフセットを引き、さらに共通の係数をかける）、ヒストグラム平坦化（輝度ヒストグラムが平坦となるように各画素の輝度値を非線形に補正する）などの処理を施すものとする。
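The three luminance corrections named in paragraph 0076 (norm normalization, standardization, histogram flattening) can be sketched as follows. NumPy is assumed, the function names are hypothetical, and the inputs are assumed to be non-zero and non-constant so that no division by zero occurs. The histogram step uses the common cumulative-histogram mapping, which is one possible realization rather than the one the patent prescribes.

```python
import numpy as np

def norm_normalize(image, target: float = 1.0) -> np.ndarray:
    """Scale all pixels by one factor so the sum of squares equals target."""
    img = np.asarray(image, dtype=np.float64)
    return img * np.sqrt(target / np.sum(img ** 2))

def standardize(image) -> np.ndarray:
    """Subtract a common offset and scale so that mean = 0, variance = 1."""
    img = np.asarray(image, dtype=np.float64)
    return (img - img.mean()) / img.std()

def equalize_histogram(image) -> np.ndarray:
    """Non-linear correction that flattens the 8-bit luminance histogram."""
    img = np.asarray(image, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size
    return np.round(255.0 * cdf[img]).astype(np.uint8)

a = np.array([[10, 20], [30, 40]])
n = norm_normalize(a)
s = standardize(a)
e = equalize_histogram(a)
```

Standardization cancels any uniform gain and offset in illumination intensity, which is why the later feature extraction steps become less sensitive to overall brightness.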
【００７７】
（第１の実施例：変動恒常化ベクトル生成部２の説明：特徴量の組み合わせ等）
なお、照明変動恒常化のための方法は上記（Ａ）〜（Ｃ）の例に限定されない。例えば、照明変動恒常化として文献[9] （橋爪他、" 照明光の変動にロバストな色画像中の物体抽出" 、信学技報PRMU97-125、1997）に開示されるCCCI（Color Constant Color Indexing ）記述子を用いることも可能である。具体的には、各画素の輝度値の対数と該画素の隣接４近傍の輝度値の対数の加重和を利用する。これは照明が変動しても反射輝度パタン中の多点間の輝度比は変化しないという性質（仮定）を利用したものであり、照明変動の影響を微小化する効果を有する。要するに本発明においては、入力部分空間の安定性と部分空間間類似度の安定性の相乗効果による稀少変動と微小変動への耐性を相互部分空間法が発揮し、もって認識システムがほぼ全ての変動に対して高い安定性を得ることができるよう、特徴ベクトルの全域にわたって影響する照明変動を局所変動や微小変動に抑え込むことができさえすれば、変動の局所化や微小化のための手段の相違は問題とはならないのである。
【００７８】
また、照明変動恒常化（あるいは残留照明変動恒常化を加えた処理）を経て生成される特徴ベクトルは上記（Ａ）〜（Ｃ）ならびにその他の相当する方法を複数並列に用いて生成されるようにすることも可能である。すなわち、複数の方法により各々生成される特徴ベクトルの各スカラー量を適宜組み合わせた新たな特徴ベクトルを生成して用いることが可能である。例えば、高域通過フィルタリングとフーリエ展開による各特徴量を１つの特徴ベクトルに編集して利用してもよい。このようにすることで、１つの方法では得られなかった物体の特徴表現を利用して、そこに含まれる照明変動を局所化／微小化した安定な認識を達成可能となる。
【００７９】
（第１の実施例：入力部分空間生成部３の説明）
入力部分空間生成部３は、変動恒常化ベクトル生成部２によるＮ次元特徴ベクトル｛v ：v1,...,vk ｝からＫＬ展開などの主成分分析手法により固有値｛λ：λ1,...,λM ｝とＮ次元固有ベクトル｛e ：e1,...,eM ｝を抽出し、対応する固有値の大きいものから順にMi（Mi≦M ）本の固有ベクトル｛e ：e1,...,eMi｝を基底ベクトルとする入力部分空間Li（次元数Mi）を生成出力する（ステップＳ３）。このMiは次段の部分空間間類似度計算部４において類似度計算に使用される入力側の基底の数である。
【００８０】
（第１の実施例：部分空間間類似度計算部４の説明）
部分空間間類似度計算部４は、認識対象となるx 個のカテゴリに対応した辞書部分空間を保持しており、入力部分空間生成部３による入力部分空間Liと保持するx 個の辞書部分空間｛Ld：Ld1,...,Ldx ｝との部分空間間類似度｛s ：s1,...,sx ｝を全て計算し、該類似度とそれを示した辞書部分空間のカテゴリコード｛c ：c1,...,cx ｝とを対にした認識候補情報｛c ：c1,...,cx 、s ：s1,...,sx ｝を生成出力する（ステップＳ４）。なお、入力部分空間Liに対する射影行列をP 、辞書部分空間Ldに対する射影行列をQ とすると、部分空間間類似度s は行列PQP あるいはQPQ の最大固有値となる。このとき、射影行列P の生成に入力部分空間LiのMi本の基底ベクトルが、射影行列Q の生成に辞書部分空間LdのMd本の基底ベクトルが各々用いられる。部分空間間類似度の定義と計算法の数学的裏付けについては文献[3] などに開示されている。
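The inter-subspace similarity of paragraph 0080, the largest eigenvalue of PQP, can be sketched as below. NumPy is assumed, and each subspace is assumed to be given as a matrix whose columns are orthonormal basis vectors; the function names are hypothetical. Note that this eigenvalue equals the square of the maximum cosine between the two subspaces (the squared cosine of the smallest canonical angle).

```python
import numpy as np

def projection_matrix(basis: np.ndarray) -> np.ndarray:
    """Projection matrix onto the span of the orthonormal columns of basis."""
    return basis @ basis.T

def mutual_subspace_similarity(basis_in: np.ndarray,
                               basis_dict: np.ndarray) -> float:
    """Largest eigenvalue of PQP for input subspace Li and dictionary Ld."""
    p = projection_matrix(basis_in)     # P: built from the Mi input basis vectors
    q = projection_matrix(basis_dict)   # Q: built from the Md dictionary basis vectors
    # PQP is symmetric, so eigvalsh applies; it returns eigenvalues in ascending order.
    return float(np.linalg.eigvalsh(p @ q @ p)[-1])

# Two planes in 3-D space always share a line, so their similarity is 1
# up to rounding error.
plane_xy = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
tilted, _ = np.linalg.qr(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
s = mutual_subspace_similarity(plane_xy, tilted)
```

For high-dimensional patterns it is usually cheaper to take the largest singular value of `basis_in.T @ basis_dict` and square it, which gives the same value without forming N × N matrices.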
【００８１】
（第１の実施例：認識カテゴリ確定部５の説明）
認識カテゴリ確定部５は、部分空間間類似度計算部４による認識候補情報｛c ：c1,...,cx 、s ：s1,...,sx ｝の全類似度を比較し、所定閾値以上の最高類似度を示した辞書部分空間のカテゴリコードcoを認識結果として確定出力する。また、もし所定閾値以上の類似度を示す辞書部分空間が無い場合には認識可能なカテゴリが入力パタンに含まれていないものと確定し、カテゴリ無しを示す特別なコードを出力する（ステップＳ５）。
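The decision rule of paragraph 0081 can be sketched in a few lines of plain Python. The candidate list of (category code, similarity) pairs mirrors the recognition candidate information {c, s}; the sentinel `NO_CATEGORY`, the function name, and the threshold value are hypothetical.

```python
from typing import Sequence, Tuple

NO_CATEGORY = "NO_CATEGORY"  # hypothetical special code meaning "no category"

def decide_category(candidates: Sequence[Tuple[str, float]],
                    threshold: float) -> str:
    """Return the code with the highest similarity if it reaches the threshold."""
    best_code, best_sim = NO_CATEGORY, float("-inf")
    for code, sim in candidates:
        if sim > best_sim:
            best_code, best_sim = code, sim
    return best_code if best_sim >= threshold else NO_CATEGORY

result = decide_category([("A", 0.91), ("B", 0.62)], threshold=0.8)
```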
【００８２】
（第１の実施例：出力部６の説明）
出力部６は、少なくとも認識カテゴリ確定部５によるコードcoを装置外部に出力する（ステップＳ６）。
【００８３】
（第２の実施例の説明）
以上は、変動恒常化ベクトル生成部２（変動恒常化ベクトル生成処理Ｓ２）を経て照明変動の影響を局所化／微小化された反射輝度パタンを相互部分空間法で認識する形態の第１の実施例の説明であった。次に本発明に係るパタン認識装置とその方法の第２の実施例について説明する。
【００８４】
図３に本実施例装置のブロック構成を示す。本実施例装置はパタン入力部１１、変動恒常化ベクトル生成部１２、入力部分空間生成部１３、部分空間間類似度計算部１４、認識カテゴリ確定部１５、出力部１６、部分空間射影部１７より成る。また、図４に本実施例装置における処理構成を示す。本実施例装置における処理はパタン入力処理Ｓ１１、変動恒常化ベクトル生成処理Ｓ１２、入力部分空間生成処理Ｓ１３、部分空間間類似度計算処理Ｓ１４、認識カテゴリ確定処理Ｓ１５、出力処理Ｓ１６、部分空間射影処理Ｓ１７より成る。図中の１１〜１６（Ｓ１１〜Ｓ１６）は第１の実施例における１〜６（Ｓ１〜Ｓ６）と同じである。本実施例と第１の実施例との相違は、部分空間射影部１７（部分空間射影処理Ｓ１７）の追加である。
【００８５】
（第２の実施例：部分空間射影部１７の説明）
入力部分空間生成部１３により生成される部分空間Liは変動恒常化ベクトル生成部１２により照明変動を局所化／微小化された特徴ベクトルｖにより生成される次元数Miの部分空間であるが、照明変動の影響を完全には除去されていない。部分空間射影部１７は次元数Mcの制約部分空間Lcを保持しており、入力部分空間をこの制約部分空間Lcに射影した新しい部分空間Li' を生成出力する（ステップＳ１７）。このとき、入力部分空間Liを制約部分空間Lcに射影する計算は、入力部分空間LiのMi本の基底ベクトルをそれぞれLcに射影した新しいMi本のベクトルを求め、このベクトルを正規直交化して部分空間Li' の基底とすることで達成される。また、照明変動を制約する制約部分空間Lcは、同一照明条件下でｘ個のカテゴリのＭ次元部分空間｛Ls：Ls1,...,Lsx ｝を収集し、その任意の異なる２カテゴリ間のｙ個の差分部分空間｛LD：LD1,...,LDy ｝を求めて、それらの線形結合として与えられる。
【００８６】
なお、差分部分空間LDは、２つの部分空間Lsi,Lsj の正準角を与えるＭ組のベクトルu,v の差分ベクトル(u-v) を基底として与えられる。同一照明条件で生成される差分部分空間｛LD：LD1,...,LDy ｝は照明変動を含まないので、その線形結合たる制約部分空間Lcも照明変動方向への広がりを持たない。一方、差分部分空間｛LD：LD1,...,LDy ｝はカテゴリ間の差分なので、その線形結合たる制約部分空間Lcもカテゴリ間の差の方向に広がりを持っている。
【００８７】
したがって、制約部分空間Lcに射影された入力部分空間Liは照明変動方向の成分を含まず、カテゴリ間差分方向の成分を保存した入力部分空間Li' となる。すなわち、射影後の部分空間は照明変動を抑圧され、カテゴリ特徴を強調された部分空間となる。なお、ここで重要となる点は、制約部分空間Lcを得るための｛Ls：Ls1,...,Lsx ｝の収集は、照明条件が一定でよいことから簡単に実行できることである。
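The projection step of paragraph 0085 (project each of the Mi basis vectors of Li onto Lc, then orthonormalize) can be sketched as follows. NumPy is assumed, and an orthonormal basis of the constraint subspace Lc is assumed to be already available; building Lc from difference subspaces is not shown. The text asks only for orthonormalization; an SVD is used here instead of, say, Gram-Schmidt so that any direction that collapses under projection is dropped cleanly. The names and the tolerance are hypothetical.

```python
import numpy as np

def project_onto_constraint(basis_in: np.ndarray,
                            basis_constraint: np.ndarray,
                            tol: float = 1e-10) -> np.ndarray:
    """Project the input subspace Li onto Lc and return a basis of Li'.

    Both arguments hold orthonormal basis vectors as columns.
    """
    # Project every basis vector of Li onto Lc.
    projected = basis_constraint @ (basis_constraint.T @ basis_in)
    # Orthonormalize the projected vectors; discard near-zero directions.
    u, s, _ = np.linalg.svd(projected, full_matrices=False)
    return u[:, s > tol]

# Toy example: Lc is the x-y plane, so the z direction plays the role of
# a direction to be suppressed.
lc = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
li = np.array([[1.0], [0.0], [1.0]]) / np.sqrt(2.0)
li_projected = project_onto_constraint(li, lc)
```

The dictionary subspaces would be projected with the same function ahead of time, so that both sides of the similarity computation lie inside Lc.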
【００８８】
部分空間間類似度計算部１４は、予め制約部分空間Lcに射影された辞書部分空間｛Ld' ：Ld1',...,Ldx' ｝を保持しており、部分空間射影部１７による入力部分空間Li' と保持するx 個の辞書部分空間｛Ld' ：Ld1',...,Ldx' ｝との部分空間間類似度｛s ：s1,...,sx ｝を全て計算し、該類似度とそれを示した辞書部分空間のカテゴリコード｛c ：c1,...,cx ｝とを対にした認識候補情報｛c ：c1,...,cx 、s ：s1,...,sx ｝を生成出力する（ステップＳ１４）。なお、以上のように入力部分空間と辞書部分空間とを制約部分空間に射影した後、部分空間間類似度を計算する認識手法を制約相互部分空間法と呼ぶ。
【００８９】
変動恒常化ベクトル生成部１２における照明変動恒常化（あるいは残留照明変動恒常化を加えた処理）と部分空間射影部１７による制約相互部分空間法を併用することで、照明変動に影響されない類似度を計算利用することが可能になる。
【００９０】
（第３の実施例の説明）
以上は照明変動恒常化を施された特徴ベクトルを用いる実施態様であったが、本発明においては、照明変動恒常化を経て生成される特徴ベクトルと照明変動恒常化を経ないで生成される特徴ベクトルを組み合わせて利用することも可能である。
【００９１】
すなわち、各々の特徴ベクトルの各スカラー量を適宜組み合わせた新たな特徴ベクトルを生成して用いることで、照明変動恒常化により利用できなくなった物体特徴情報を照明変動恒常化を受けない特徴ベクトルから利用可能としつつ、両者を組み合わせた特徴ベクトルにおいては照明変動恒常化の効果として照明変動恒常化を施さないものよりも変動の規模が局所化／微小化される。この結果、照明変動恒常化では得られなかった物体の特徴表現を利用しつつ、そこに含まれる照明変動を緩和した安定な認識が可能となる。本発明に係るパタン認識装置とその方法の第３の実施例はこのような構成を与えられている。以下、この第３の実施例について説明する。
【００９２】
図５に本実施例装置のブロック構成を示す。本実施例装置はパタン入力部２１、変動恒常化ベクトル生成部２２、入力部分空間生成部２３、部分空間間類似度計算部２４、認識カテゴリ確定部２５、出力部２６、変動未恒常化ベクトル生成部２７より成る。また、図６に本実施例装置における処理構成を示す。本実施例装置における処理はパタン入力処理Ｓ２１、変動恒常化ベクトル生成処理Ｓ２２、入力部分空間生成処理Ｓ２３、部分空間間類似度計算処理Ｓ２４、認識カテゴリ確定処理Ｓ２５、出力処理Ｓ２６、変動未恒常化ベクトル生成処理Ｓ２７より成る。図中の２１〜２６（Ｓ２１〜Ｓ２６）は第１の実施例における１〜６（Ｓ１〜Ｓ６）と同じである。本実施例と第１の実施例との相違は、変動未恒常化ベクトル生成部２７（変動未恒常化ベクトル生成処理Ｓ２７）の追加である。
【００９３】
変動恒常化ベクトル生成部２２は、パタン入力部２１による反射輝度パタンから例えば高域通過フィルタリングによる照明変動恒常化（あるいは残留照明変動恒常化を加えた処理）を施したN1次元の第１の特徴ベクトルを生成する。このとき、第１の特徴ベクトルは照明変動を局所化／微小化されたベクトルであり、物体の形状的特徴を表現する陰影の情報をほとんど失って、物体のテクスチャ特徴を表現する強いエッジ情報のみを維持している。
【００９４】
変動未恒常化ベクトル生成部２７は、パタン入力部２１による反射輝度パタンから直接N2次元の第２の特徴ベクトルを生成する。第２の特徴ベクトルは反射輝度パタンそのものであり、物体形状による陰影情報をそのまま維持しているが、これはパタン全域にわたる照明変動の影響も同時に含んでいる。
【００９５】
入力部分空間生成部２３はN1次元の第１特徴ベクトルとN2次元の第２特徴ベクトルを合わせて(N1+N2) 次元の特徴ベクトルと見做し、入力される複数の(N1+N2) 次元の特徴ベクトルから入力部分空間を生成する。
【００９６】
なお、本実施例において、第１の特徴ベクトルをパタンの第１の所定領域のみを対象とした照明変動恒常化（あるいは残留照明変動恒常化を加えた処理）で生成し、第２の特徴ベクトルをパタンの第２の所定領域のみを対象とした他の手段により生成することで、物体特徴をより多く残しながら他無駄な情報を除去した状態で認識を行うことも可能である。例えば、顔認識において、第１の所定領域を強いエッジの多い眉や目や鼻や口周辺とし、第２の所定領域を額や頬とすることも可能である。このようにすることで、テクスチャ特徴を多く保存する領域に照明変動恒常化を施し、立体形状特徴を多く保存する領域に照明変動恒常化を施さないようにして、より多くの顔特徴情報を保存しつつ、照明変動を緩和した特徴量を用いることができる。
【００９７】
（変形実施例の説明）
なお、以上で述べた各実施例はその構成を組み合わせたり変形したりして実施することも可能である。
【００９８】
例えば、第１の特徴ベクトルを照明変動恒常化（あるいは残留照明変動恒常化を加えた処理）で生成し、第２の特徴ベクトルを他の手段により生成し、さらに両者を融合させた特徴ベクトルから入力部分空間を生成して制約部分空間への射影を行って類似度を求めることも可能である。このようにすることで、物体特徴をより多く残しながら照明変動をより強く除去して認識を行うことが可能となる。
【００９９】
また、第１の所定領域を例えば顔でいうところの眉や目や鼻や口周辺のような強いエッジの多い領域とし、第２の所定領域を例えば顔でいうところの額や頬のような陰影の強い領域として、第１の特徴ベクトルを第１の所定領域のみを対象とした照明変動恒常化（あるいは残留照明変動恒常化を加えた処理）で生成し、第２の特徴ベクトルを第２の所定領域のみを対象とした他の手段により生成し、さらに両者を融合させた特徴ベクトルから入力部分空間を生成したうえで制約部分空間への射影を行って類似度を求めるようにすることも可能である。このようにすることで、物体特徴をより多く残しながら照明変動その他無駄な情報を除去した状態で認識を行うことが可能である。
【０１００】
（媒体による実施形態の説明）
ところで、図７に例示するように、本発明に係る画像パタン認識装置とその方法を実現する情報（例えばプログラム）を記録媒体３１に記録し、該記録した情報を該記録媒体３１を経由して撮像手段３７を具備した装置３２や装置３３に適用したり、通信回線３５や３６を経由して、装置３３や撮像手段３８を具備した装置３４に適用することも可能である。すなわち、本発明は上述した実施形態に限定されるものではなく、その技術的範囲において種々変形して実施することができる。
【０１０１】
【発明の効果】
本発明によれば、屋外や窓辺の屋内などの外光影響のあるような照明変動環境下においても、該照明変動の影響を受けにくい画像パタン認識が可能になる。
【０１０２】
具体的には、本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、パタンに現われる照明変動の影響が空間周波数の低周波成分や高輝度画素領域に多く含まれることを利用した照明変動恒常化処理により、照明の影響を局所化／微小化された特徴ベクトルを生成して認識する。この結果、相互部分空間法の照明変動に対する弱さを補って本来の耐変動性を回復させ、変動全般に対して頑健なパタン認識装置を実現可能である。
【０１０３】
また、本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、照明変動恒常化処理の実施後に残留する変動成分を除去する処理を施すことにより、照明変動の影響を一層削減した特徴ベクトルを生成可能とし、変動全般に対してさらに頑健なパタン認識装置を実現可能である。
【０１０４】
また、本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、照明変動恒常化処理を施して特徴ベクトルを生成し、さらに残留した照明変動を部分空間の射影により恒常化する制約相互部分空間法を行うことにより、変動全般に対してさらに一層頑健なパタン認識装置を実現可能である。
【０１０５】
また、本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、照明変動恒常化処理を施して第１の特徴ベクトルを生成し、これと別に照明変動の恒常化を受けない第２の特徴ベクトルを生成して組み合わせることにより、照明変動を局所化／微小化しつつ物体固有の特徴を多く保存した特徴ベクトルを生成可能とし、変動全般に対して頑健かつ認識性能の向上したパタン認識装置を実現可能である。
【０１０６】
また、本発明によれば、立体物を撮影して得られる反射輝度パタンを認識するパタン認識装置において、照明変動恒常化処理を計算コストの極めて少ないエンボス化処理で実現することで、照明変動恒常化処理の追加に伴う計算コストの増加をほとんど招くことなく、変動全般に対して頑健かつ高速なパタン認識装置を実現可能である。
【０１０７】
また、本発明によれば、前記照明変動恒常化処理に先駆けてパタンの画素輝度値に対する補正を行うことで、照明変動に対してより安定した特徴ベクトルを生成可能とし、変動全般に対して頑健かつ安定なパタン認識装置を実現可能である。
【図面の簡単な説明】
【図１】本発明に係る第１の実施例装置のブロック構成を示した図である。
【図２】本発明に係る第１の実施例装置の処理構成を示した図である。
【図３】本発明に係る第２の実施例装置のブロック構成を示した図である。
【図４】本発明に係る第２の実施例装置の処理構成を示した図である。
【図５】本発明に係る第３の実施例装置のブロック構成を示した図である。
【図６】本発明に係る第３の実施例装置の処理構成を示した図である。
【図７】記録媒体等による実施形態を説明するための図である。
【図８】相互部分空間法の効果を説明するための図である。
【図９】照明変動の影響を説明するための図である。
【図１０】照明変動の局所化／微小化の効果を説明するための図である。
【符号の説明】
１：パタン入力部、
２：変動恒常化ベクトル生成部、
３：入力部分空間生成部、
４：部分空間間類似度計算部、
５：認識カテゴリ確定部、
６：出力部
[0001]
[Technical field of the invention]
The present invention relates to a pattern recognition apparatus and method, and more particularly to a pattern recognition apparatus and method that remain stable when the reflected-luminance pattern is subject to illumination variation.
[0002]
[Prior art]
The basic framework of pattern recognition is to represent, in some form, the features of the patterns belonging to a category to be recognized, and then to evaluate by matching how well the features of an unknown input pattern agree with the features of that category. The information representing the features of a category is called a dictionary, and the measure of the degree of agreement is called the similarity. How these two are defined, and how the processing system is implemented, largely determines the potential of a pattern recognition system.
[0003]
Here, the framework for pattern recognition is briefly explained.
[0004]
A pattern p composed of N scalar quantities can be regarded as a single point (or as the position vector v from the origin) in an N-dimensional hyperspace S, in which each scalar quantity is assigned to one axis and its value is the coordinate along that axis. In the case of an image pattern p, each pixel value corresponds to one of these scalar quantities. This is the pattern vectorization method used in many pattern recognition apparatuses. How similar two patterns are can then be evaluated by computing a similarity based on the distance between them. This is the simplest framework for pattern recognition.
[0005]
There are various ways to define the similarity between two patterns. For example, it can be defined as the reciprocal of the Euclidean distance between the two points, or as the cosine of the angle between their position vectors (called feature vectors), that is, the length of the projection of one onto the other. In either case, the greater the similarity, the more alike the two patterns are.
[0006]
In general, when a plurality of known patterns belonging to a certain category are given together with one unknown pattern, how closely the unknown pattern resembles the category can be evaluated simply by computing the two-pattern similarity described above.
[0007]
In this case, the similarity between the unknown pattern and each of the known patterns is computed individually and, for example, the largest value is taken as the similarity to the category. This amounts to storing every known pattern individually as the dictionary representing the features of the category. However, as the number of known patterns grows, the cost of the similarity computation and the memory needed to store the known patterns grow with it. Moreover, because some known patterns resemble one another, the similarity ends up being computed repeatedly against nearly identical patterns, which is wasteful.
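The pairwise similarities of [0005] and the category similarity of [0007] can be sketched in plain Python as follows. The function names and the small constant `eps` (added only to avoid division by zero for identical patterns) are assumptions. Note that the cosine coincides with a projection length only when the vectors have unit length.

```python
import math
from typing import Sequence

def inverse_euclidean_similarity(a: Sequence[float], b: Sequence[float],
                                 eps: float = 1e-12) -> float:
    """Reciprocal of the Euclidean distance between two patterns."""
    return 1.0 / (math.dist(a, b) + eps)

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def category_similarity(unknown: Sequence[float],
                        known: Sequence[Sequence[float]]) -> float:
    """Largest similarity between the unknown pattern and any stored pattern."""
    return max(cosine_similarity(unknown, k) for k in known)

known_patterns = [[1.0, 0.0], [1.0, 1.0]]
score = category_similarity([2.0, 2.1], known_patterns)
```

Every stored pattern is visited for each query, which is the cost and memory growth that the paragraph above points out.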
[0008]
Therefore, instead of storing the known patterns individually, it is necessary to describe the characteristics of the known pattern set (ie, category) with less information.
[0009]
One way of representing the features of a category is as follows: given k N-dimensional feature vectors {v: v1, ..., vk} belonging to a category (corresponding to the known patterns above), an M-dimensional hyperspace L whose basis is the M (M ≦ N) orthonormal vectors {e: e1, ..., eM} obtained by principal component analysis of these feature vectors is used as the dictionary. This M-dimensional hyperspace L is either the N-dimensional hyperspace S itself (M = N) or an M-dimensional subspace of S (M < N); for convenience, it is called the dictionary subspace of dimension M.
[0010]
According to this method, even if the number k of known patterns (also called teaching patterns or training samples) is far greater than N, the dictionary only needs to store at most M (≦ N) basis vectors. Various definitions have been proposed for the similarity between the dictionary subspace L and a single unknown feature vector v (corresponding to the unknown pattern above). The method that defines the similarity as the length of the projection of v onto L (computed as the sum of the squares of the lengths of the projections of v onto all the basis vectors {e: e1, ..., eM} of L) is called the subspace method.
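The dictionary subspace of [0009] and the subspace-method similarity of [0010] can be sketched as below. NumPy is assumed and the function names are hypothetical. Two further assumptions are made that the text does not state: the principal component analysis is applied to the uncentered autocorrelation matrix, and the unknown vector is scaled to unit length so that the similarity lies between 0 and 1.

```python
import numpy as np

def dictionary_subspace(known_vectors: np.ndarray, m: int) -> np.ndarray:
    """Return an N x m orthonormal basis from k x N known feature vectors."""
    corr = known_vectors.T @ known_vectors        # N x N autocorrelation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]         # indices of the m largest
    return eigvecs[:, order]

def subspace_similarity(v: np.ndarray, basis: np.ndarray) -> float:
    """Sum of squared projections of the unit-normalized v onto the basis."""
    v = v / np.linalg.norm(v)
    return float(np.sum((basis.T @ v) ** 2))

# Four known patterns that all lie in the x-y plane of a 3-D feature space.
known = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [2.0, -1.0, 0.0]])
basis = dictionary_subspace(known, m=2)
inside = subspace_similarity(np.array([3.0, 4.0, 0.0]), basis)
halfway = subspace_similarity(np.array([1.0, 0.0, 1.0]), basis)
```

Only the m basis vectors need to be stored, however many known patterns were used to compute them.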
[0011]
Detailed information on dictionary representations and similarity definitions, including the subspace method, is disclosed in Ref. [1] (Erkki Oja, translated by Hidemitsu Ogawa et al., "Pattern Recognition and Subspace", Sangyo Tosho, 1986) and Ref. [2] (Taizo Iijima, "Pattern Recognition Theory", Morikita Publishing, 1989).
[0012]
The conventional methods described above represent the dictionary as a vector (or a group of vectors) or as a subspace, but they have in common that the input to the similarity computation against the dictionary is always a single vector (that is, just one pattern).
[0013]
On the other hand, Maeda, in Ref. [3] (Kenichi Maeda, "Pattern Recognition Method", Japanese Patent Laid-Open No. 60-57475), proposes the mutual subspace method. A subspace Ld is used as the dictionary, as in the subspace method, while the unknown input is a subspace Li (called the input subspace) spanned by a plurality of patterns {p: p1, ..., pk} generated by artificially adding positional-shift variations to a single input pattern p. The maximum cosine between the input subspace Li and the dictionary subspace Ld is defined as the similarity (called the inter-subspace similarity).
[0014]
Yamaguchi et al., in Ref. [4] (Osamu Yamaguchi et al., "Face Recognition System Using Moving Images", IEICE Technical Report PRMU97-50, 1997), propose a mutual subspace method in which the input subspace Li is generated from an input time-series pattern, and apply it to human face recognition. That document shows that, when the same set of dictionary subspaces and the same set of input time-series patterns are given and compared, the mutual subspace method achieves better recognition performance than the conventional subspace method.
[0015]
In particular, with the subspace method the similarity oscillated widely as the face orientation and facial expression of the person being recognized changed, whereas with the mutual subspace method the similarity was confirmed to remain stable at a high value.
[0016]
The difference between the mutual subspace method and recognition methods, such as the subspace method, that compute the similarity for a single pattern is explained schematically with reference to FIG. 8. In the figure, 41 is the dictionary subspace of category A and 42 is the dictionary subspace of category B. Each "×" is an input pattern belonging to category A. As shown in the upper part of the figure, pattern 43 is close to the dictionary subspace 41 of category A, while pattern 44 is close to the dictionary subspace 42 of category B. Consequently, a recognition method that computes the similarity for a single pattern classifies pattern 43 into category A but misclassifies pattern 44 into category B. In contrast, as shown in the lower part of the figure, in the mutual subspace method, which generates the input subspace 45 from the input patterns, the dictionary subspace closest to the input subspace 45 is the dictionary subspace 41 of category A, so the input patterns are classified into category A.
[0017]
Thus, it can be seen that the similarity of the mutual subspace method remains stable even when a variation pattern far from the original category, such as the pattern 44, is present, thanks to the presence of patterns close to the original category, such as the pattern 43.
[0018]
The stability of the similarity in the mutual subspace method can be interpreted qualitatively as a synergistic effect of the following two kinds of stability.
[0019]
(1) Stability of the input subspace: In the mutual subspace method, the input subspace is spanned by a large number of feature vectors. Therefore, random minute fluctuations that appear constantly in many feature vectors, and rare fluctuations that appear only in certain feature vectors, are unlikely to affect the input subspace as a whole. That is, the stability of the input subspace depends on the magnitude and frequency of the fluctuation.
[0020]
(2) Stability of the similarity between subspaces: In the mutual subspace method, the similarity is the maximum cosine between the input subspace and the dictionary subspace, that is, the largest of the cosines formed by pairs of vectors taken one from each subspace (there are innumerable such pairs). Therefore, even if the input subspace varies somewhat, the similarity itself is unlikely to change; only the pair of vectors that attains the maximum changes.
[0021]
Based on the above, consider the phenomenon reported in Ref. [4]. Owing to changes in face orientation and facial expression, patterns subjected to unlearned random minute fluctuations are input constantly, and patterns subjected to unlearned large fluctuations (rare fluctuations) are input occasionally. In the subspace method, the similarity reacts immediately and oscillates even for unlearned minute fluctuations, and it drops unacceptably for unlearned large fluctuations.
[0022]
In the mutual subspace method, on the other hand, the input subspace can remain close to the dictionary subspace as long as the fluctuation is minute or rare, whether or not it has been learned (stability of the input subspace). As long as the input subspace stays close to the dictionary subspace, the inter-subspace similarity can maintain a high value even if the input subspace fluctuates slightly (stability of the similarity between subspaces).
[0023]
Further, in the mutual subspace method, the tolerance to both the magnitude and the frequency of the fluctuation can be simultaneously enhanced by learning the fluctuation in the dictionary subspace. This stability and learning are important properties in providing a pattern recognition system that is robust against fluctuations.
[0024]
In the subspace method (and in other methods that take a single input pattern), resistance to rare fluctuations can be provided by post-processing such as averaging the similarities of a plurality of input patterns over a predetermined period. Even so, because the similarity of each individual input pattern still varies, it is difficult to obtain stability comparable to that produced by the synergy of the stability of the input subspace and the stability of the inter-subspace similarity.
[0025]
Incidentally, an image obtained by photographing an object under a certain illumination environment (that is, an image formed by reflected luminance) depends on the three-dimensional shape and surface reflectance pattern of the object, the spectral intensity of the illumination (color and brightness), the illumination direction, and the imaging direction. The three-dimensional shape and the surface reflectance pattern are features unique to the object, and they appear in the captured image as an image pattern unique to the object (that is, a reflected luminance pattern). This is the basis on which an object can be recognized from an image pattern.
[0026]
However, this image pattern is affected in various ways by deformation of the object (for example, changes in three-dimensional shape such as a change of facial expression), illumination variation (for example, changes in illumination spectral intensity or illumination direction with time of day or weather), and variation in object posture (for example, changes in imaging direction such as a change in face orientation). The illumination direction in particular is a source of instability that changes the shading of the image pattern dramatically, yet it is difficult to learn patterns for the various illumination directions into the dictionary subspace in advance. Unlearned illumination variation imposes unlearned luminance variation on all input patterns. For this reason, even the mutual subspace method, which is regarded as resistant to variation, can no longer guarantee its resistance to rare and minute fluctuations, and serious drops in similarity and erroneous recognition result.
[0027]
The influence of illumination variation will be described schematically with reference to the figure. Reference numeral 51 in the figure denotes the dictionary subspace of category A, and 52 the dictionary subspace of category B. Each “×” in the figure is an input pattern belonging to category A; the patterns as a whole have moved under the influence of illumination variation. As shown in the upper part of the figure, under such illumination variation the pattern 53 is close to the category A dictionary subspace 51 and the pattern 54 is close to the category B dictionary subspace 52, but the movement of the patterns as a whole has reduced the number of patterns close to category A. As shown in the lower part of the figure, the input subspace 55 generated from these input patterns then lies roughly midway between the category A dictionary subspace 51 and the category B dictionary subspace 52, so the similarity decreases, and depending on the input patterns the input subspace 55 may even be classified into category B.
[0028]
In order to prevent such a decrease in similarity and erroneous recognition due to illumination fluctuations, it is necessary to perform recognition using feature vectors that are not easily affected by illumination and that preserve object-specific features. Since the illumination variation gives a strong variation to the reflected luminance pattern of the object, the influence of the illumination variation is inevitable as long as the reflected luminance pattern itself is a feature vector. On the other hand, it is known that certain feature vectors derived from the reflected luminance pattern have constancy that is hardly affected by illumination conditions.
[0029]
For example, Ref. [5] (Yael Moses et al., “Face Recognition: the Problem of Compensation for Changes in Illumination Direction”, Proceeding of ECCV'94, 1994) proposes, as a feature effective against illumination variation, a luminance gradient image obtained by differentiating the reflected luminance image (hereinafter referred to as a differential image).
[0030]
The first-order differentiation to the reflected luminance image corresponds to an operation for obtaining the luminance change rate, and a portion (edge) where the luminance changes stepwise is extracted. The second-order differentiation to the reflected luminance image corresponds to an operation for obtaining the rate of change of the reflected luminance change rate, and a portion where the luminance changes in a ridge shape or a valley shape is extracted.
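As a simple illustration of the two operations just described, the following Python/NumPy sketch applies first-order and second-order differences to one-dimensional luminance profiles. The profiles `step` and `ridge` are hypothetical examples, and finite differences (`np.diff`) are used here as a stand-in for differentiation.

```python
import numpy as np

# Hypothetical 1-D luminance profiles.
step = np.array([10, 10, 10, 10, 50, 50, 50, 50], dtype=float)  # stepwise change (edge)
ridge = np.array([10, 10, 10, 50, 10, 10, 10], dtype=float)     # ridge-shaped change

first = np.diff(step)          # first-order difference: luminance change rate
second = np.diff(ridge, n=2)   # second-order difference: change of the change rate

# The first difference is non-zero only where the step occurs.
edge_pos = int(np.argmax(np.abs(first)))     # index i means between step[i] and step[i + 1]
# The second difference has its largest magnitude at the ridge.
ridge_pos = int(np.argmax(np.abs(second)))   # index i is centered on ridge[i + 1]
```

Here `first` is zero everywhere except at the step, and `second` peaks at the position centered on the ridge sample, which matches the edge extraction and ridge/valley extraction described above.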
[0031]
The surface of an object has a pattern of high and low reflectance peculiar to that object. At locations where the reflectance is low, the reflected luminance is always low regardless of the illumination intensity, whereas at locations where the reflectance is high, the reflected luminance changes with the illumination intensity. For this reason, the first-order differential image represents (a) changes in reflectance at illuminated locations and (b) changes in shading at high-reflectance locations. (a) is an edge whose position does not change unless the posture of the object changes, even though its intensity varies with the illumination; (b) is an edge whose intensity and position both vary with the illumination even if the posture of the object does not change.
[0032]
The first effect of the differential image with respect to illumination variation is that (a) preserves the characteristics of the reflectance pattern unique to the object according to the posture of the object. This shows the validity of using the differential pattern for object recognition. The second effect is that (a) appears more strongly than (b) as the difference between high and low reflectance on the object surface increases, that is, as the texture of the object becomes stronger. In particular, when the object has a smooth curved surface, the shading due to illumination changes gently, so (b) is weakened further. This means that unlearned illumination variation, which was a large-scale variation in the reflected luminance image, can be suppressed in the differential image to an unlearned variation relating only to (b) (localization and miniaturization). The differential image is considered to be constant under illumination variation because of these effects. On the other hand, because (b) remains, the influence of illumination variation is not completely eliminated. Therefore, to give a pattern recognition system constancy under illumination variation, the differential image must be used in combination with a recognition method capable of absorbing the local minute fluctuations of (b).
[0033]
In fact, in the experiment reported in Ref. [5], the inter-pattern distance was used as the similarity, and no improvement in recognition from differentiation was confirmed. This finding means that even if the influence of illumination variation is suppressed to a local minute variation by a suitable choice of feature quantity, the desired effect is not obtained unless a pattern recognition method that is stable against such minute variations is chosen. This is where the advantage of using the mutual subspace method, which has not been suggested before, becomes apparent.
[0034]
The effect of combining the mutual subspace method with the localization/miniaturization of illumination variation will be described schematically with reference to the figure. Reference numeral 61 in the figure denotes the dictionary subspace of category A, and 62 the dictionary subspace of category B. Each “×” in the figure is an input pattern belonging to category A; the patterns are farther away than when there is no illumination variation, but the localization/miniaturization of the illumination variation has brought them closer to the dictionary subspace 61 of the original category A. As shown in the upper part of the figure, the pattern 63 is close to the category A dictionary subspace 61, and the pattern 64 is close to the category B dictionary subspace 62.
[0035]
However, owing to the movement of the patterns as a whole brought about by the localization/miniaturization of the illumination variation, the number of patterns close to category A increases. As shown in the lower part of the figure, the input subspace 65 generated from the input patterns is then closest to the dictionary subspace 61 of category A, and the input is safely classified into category A. In contrast, with a method that calculates similarity for one pattern at a time, the presence of illumination variation or of its localization/miniaturization merely increases or decreases the number of correctly classified patterns, and no dramatic effect can be expected.
[0036]
For an object whose orientation and expression change frequently, such as a live human face, the edges (a) of low-reflectance parts such as the eyebrows and eyes move according to the face orientation and expression, whereas the shading edges (b) hardly move as long as the illumination direction is constant. In other words, on the premise that the face orientation and expression change frequently (that is, that a live human is photographed), (a) and (b) move differently.
[0037]
Taking the eyebrow and eye edges (a), which are the facial features learned in the dictionary, as the reference, (b) can be regarded as an unlearned minute fluctuation with no consistency that moves relative to (a) according to the face orientation and the like. This is where the special significance arises of using, for face recognition, the mutual subspace method: a pattern recognition method that was shown in Refs. [3] and [4] to be superior to conventional methods when various face orientations and expressions were learned under specific, limited lighting conditions, and that is expected to be highly resistant to minute fluctuations because it combines the stability of the input subspace with the stability of the inter-subspace similarity. However, the previously proposed applications of differential images gave no consideration to this point; as a result, the merit of the differential image could not be fully exploited and sufficient performance improvement was not achieved. This is the first problem in the prior art.
[0038]
In an object such as a face, which has a clear reflectance pattern and a smooth surface shape, shading due to illumination appears as a global change in the reflected luminance pattern, whereas changes in reflectance appear as steeper luminance changes with finer structure. The differential processing described above functions as a high-pass filter on the reflected luminance image: it removes global luminance changes of low spatial frequency and thereby extracts the fine luminance structure of high spatial frequency. That is, not only the differential processing suggested in Ref. [5] but high-pass filtering in general can be concluded to be effective in reducing the influence of illumination variation. Therefore, by combining high-pass filtering with the mutual subspace method, a pattern recognition system can obtain unprecedented stability against almost all variations, including object deformation, object orientation variation, and illumination variation.
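The effect of high-pass filtering on a smooth global luminance change can be illustrated with a small Python/NumPy sketch. It is only a toy model under stated assumptions: the fine structure is random noise, the shading is an additive linear ramp (real shading is not simply additive), and a one-pixel horizontal difference serves as the high-pass filter. The names `cosine`, `horizontal_difference`, `texture` and `shading` are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine between two patterns treated as vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def horizontal_difference(img):
    """One-pixel horizontal difference, used here as a very simple high-pass filter."""
    return img[:, 1:] - img[:, :-1]

rng = np.random.default_rng(0)
texture = rng.random((64, 64))                           # fine, high-frequency structure
shading = np.tile(np.linspace(0.0, 5.0, 64), (64, 1))    # smooth, low-frequency change
shaded = texture + shading                               # same texture under the toy shading

# Similarity of the raw patterns versus similarity of the high-pass filtered patterns.
raw_cos = cosine(texture, shaded)
hp_cos = cosine(horizontal_difference(texture), horizontal_difference(shaded))
```

In this toy setting the filtered patterns remain much closer to each other than the raw patterns (`hp_cos` is larger than `raw_cos`), because the ramp contributes only a small constant to each one-pixel difference.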
[0039]
Another feature quantity that exploits the fact that the influence of illumination appears in the low-frequency components of the reflected luminance image has also been proposed. In Ref. [6] (Akamatsu, "Facial image matching device", Japanese Patent Laid-Open No. 5-20442) and Ref. [7] (Akamatsu et al., "Robust front face identification method using grayscale image matching -Application of KL expansion of Fourier spectrum-", IEICE Transactions, Vol. J76-D-II, No. 7, 1993), the intensity distribution of the spatial frequency components of the reflected luminance image, that is, the Fourier spectrum, is used as the feature quantity.
[0040]
The greatest merit of using the Fourier spectrum as a feature quantity is that, owing to its invariance under geometric transformation, it is not affected by positional deviation of the object image within the image pattern. Ref. [7] further shows that the Fourier spectrum mitigates the influence of illumination variation. This is because illumination variation affects only the low-frequency region of the Fourier spectrum, so its influence is confined to a local variation, namely an intensity variation in the low-frequency region. This variation in the low-frequency region is a local variation corresponding to (b) in the case of differentiation. Therefore, with the Fourier spectrum as well, there is merit in using the mutual subspace method as the pattern recognition method, just as with differentiation.
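The insensitivity of the Fourier spectrum to positional deviation can be checked with a short Python/NumPy sketch. It assumes that the feature quantity is the amplitude of the two-dimensional discrete Fourier transform and that the positional deviation is a circular shift (`np.roll`), for which the invariance is exact; for a non-circular shift of a real image it holds only approximately. The array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
pattern = rng.random((16, 16))                           # hypothetical luminance pattern
shifted = np.roll(pattern, shift=(3, 5), axis=(0, 1))    # circular positional deviation

# Amplitude (intensity distribution) of the spatial-frequency components.
amp = np.abs(np.fft.fft2(pattern))
amp_shifted = np.abs(np.fft.fft2(shifted))

# A shift changes only the phase of each component, so the amplitudes agree.
same_spectrum = bool(np.allclose(amp, amp_shifted))

# The amplitude spectrum flattened into a feature vector.
feature = amp.ravel()
```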
[0041]
As described above, the differential image and the Fourier spectrum each have, on their own, the effect of localizing/miniaturizing the illumination variation component of the reflected luminance image. It is then conceivable to reduce further, by another method, the illumination variation component that remains after this initial stabilization against illumination variation. However, no prior example clearly suggests a means for stabilizing the residual illumination variation, or realizes a combination of the means for the initial stabilization and a means for stabilizing the residual variation. This is the second problem in the prior art.
[0042]
Furthermore, the low-frequency component of the high-reflectance part, which is strongly affected by illumination, that is, the shading variation that is to be reduced as the illumination variation component, also contains a reflected luminance pattern arising from the shape characteristics of the object. High-pass filtering of the reflected luminance image discards such feature information, so that only the reflectance features of the object are used for recognition. It is desirable to achieve high recognition performance by reducing the influence of illumination variation while using both kinds of features. This is the third problem in the prior art.
[0043]
Further, general differentiation processing requires at least a weighted sum calculation of pixel values, that is, a product-sum operation, and Fourier expansion requires much more calculation. Therefore, the calculation cost may increase unacceptably as the number of pixels in the image increases.
[0044]
In particular, the product-sum operation in differentiation is a basic means of realizing high-pass filtering, so speeding it up is important for raising to the limit the processing performance of a pattern recognition system that combines high-pass filtering with the mutual subspace method. Realizing faster high-pass filtering is the fourth problem in the prior art.
[0045]
[Problems to be solved by the invention]
The present invention has been made in view of the above problems. A first object of the present invention is to provide a pattern recognition apparatus and method in which the weakness of the mutual subspace method to illumination variation is compensated for by means for localizing/miniaturizing the illumination variation, so that the advantages of the two are fused, and which are more robust than any conventional method against variations in general, such as variations in the position and orientation of the object, deformation of the object, and illumination variation.
[0046]
A second object of the present invention is to provide another means for further reducing the illumination variation component that still remains after the means for localizing/miniaturizing the illumination variation, thereby further reducing the unlearned minute fluctuations appearing in the input and giving the pattern recognition apparatus and method of the first object more robust resistance to illumination variation.
[0047]
A third object of the present invention is to provide pattern recognition means that localizes/miniaturizes the influence of illumination variation while making effective use of both the texture features of the object given by its reflectance pattern and the shape features of the object given by its shading pattern, so that the object features appearing in the input are used to the maximum extent, thereby improving the recognition capability of the pattern recognition apparatus and method of the first object.
[0048]
A fourth object of the present invention is to provide means for localizing/miniaturizing illumination variation by an extremely fast and simple process, thereby improving the processing capability of the pattern recognition apparatus and method of the first object.
[0049]
[Means for Solving the Problems]
The pattern recognition apparatus according to the present invention comprises means for inputting a pattern, means for extracting a feature vector for recognition from the input pattern, means for generating an input subspace from the feature vector, means for calculating a similarity between the input subspace and a dictionary subspace, and means for determining an object category based on the similarity, and is characterized in that the feature vector is extracted through one or more processes that localize/miniaturize illumination variation, and one of the one or more processes that localize/miniaturize the illumination variation is a process of expanding the range of the low-luminance region and relatively compressing the range of the high-luminance region.
[0050]
The pattern recognition apparatus according to the present invention comprises means for inputting a pattern, means for extracting a feature vector for recognition from the input pattern, means for generating an input subspace from the feature vector, means for calculating a similarity between the input subspace and a dictionary subspace, and means for determining an object category based on the similarity, and is characterized in that: part or all of the feature vector is extracted through one or more processes that localize/miniaturize illumination variation; one of the one or more processes that localize/miniaturize the illumination variation is a process of expanding the range of the low-luminance region and relatively compressing the range of the high-luminance region; the input subspace is generated through a process of projecting a subspace, obtained by principal component analysis of the feature vectors, onto a constrained subspace that suppresses illumination variation; and the constrained subspace is generated as a linear combination of inter-category difference subspaces of signal spaces, each signal space being defined for a category by feature vectors collected under the same lighting conditions.
[0051]
The pattern recognition apparatus according to the present invention comprises means for inputting a pattern, means for extracting a first feature vector for recognition from the input pattern, means for extracting a second feature vector for recognition from the input pattern, means for generating an input subspace from the first and second feature vectors, means for calculating a similarity between the input subspace and a dictionary subspace, and means for determining an object category based on the similarity, and is characterized in that the first feature vector is extracted through one or more processes that localize/miniaturize illumination variation, and one of the one or more processes that localize/miniaturize the illumination variation is a process of expanding the range of the low-luminance region and relatively compressing the range of the high-luminance region.
[0052]
According to the present invention, in a pattern recognition device that recognizes reflected luminance patterns obtained by photographing a three-dimensional object, the fact that the influence of illumination variation on the pattern is largely contained in the low spatial-frequency components and in the high-luminance pixel regions is exploited. As processes that localize/miniaturize the illumination variation, for example, high-pass filtering of the pattern, low-luminance range expansion of the pattern, and Fourier expansion of the pattern are performed alone or in combination, and feature vectors in which the influence of illumination is localized/miniaturized are generated and recognized. As a result, it is possible to realize a pattern recognition device that compensates for the weakness of the mutual subspace method with respect to illumination variation, restores its original resistance to variation, and is robust against variations in general.
[0053]
According to the present invention, in a pattern recognition apparatus for recognizing a reflected luminance pattern obtained by photographing a three-dimensional object, a feature vector is generated by performing processing for localizing / minimizing illumination fluctuations, and further, input by the feature vector By projecting the subspace onto a predetermined constrained subspace, a new input subspace that does not expand in the direction of illumination variation is generated. As a result, it is possible to realize a pattern recognition device that compensates for the weakness of the mutual subspace method with respect to illumination fluctuations and restores the original fluctuation resistance, and is more robust against fluctuations in general.
[0054]
According to the present invention, in a pattern recognition device that recognizes reflected luminance patterns obtained by photographing a three-dimensional object, a first feature vector is generated by processing that localizes/miniaturizes illumination variation; separately, a second feature vector that is not subjected to such processing is generated; and by appropriately combining both feature vectors, an input subspace is generated in which the influence of illumination is reduced and many object-specific features are preserved. As a result, it is possible to realize a pattern recognition device that compensates for the weakness of the mutual subspace method with respect to illumination variation, restores its original resistance to variation, and is both robust against variations in general and improved in recognition performance.
[0055]
According to the present invention, in a pattern recognition device that recognizes reflected luminance patterns obtained by photographing a three-dimensional object, the high-pass filtering is realized by differential processing, in particular by embossing, whose calculation cost is extremely low. As a result, it is possible to realize a pattern recognition device that, with almost no increase in calculation cost from the added processing, compensates for the weakness of the mutual subspace method with respect to illumination variation and restores its original resistance to variation.
[0056]
According to the present invention, by applying luminance correction to the pixel luminance values of the pattern before the process that localizes/miniaturizes the illumination variation, feature vectors that are more stable against illumination variation are generated. This makes it possible to compensate for the weakness of the mutual subspace method with respect to illumination variation, restore its original resistance to variation, and realize a pattern recognition device that is robust and stable against variations in general.
[0057]
The invention relating to each of the above devices is also established as an invention relating to the method, and the invention relating to the method is also established as an invention relating to the device.
[0058]
The above-described invention can also be realized as a machine-readable medium recording a program for causing a computer to execute a corresponding procedure or means.
[0059]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of a pattern recognition apparatus and method according to the present invention will be described with reference to the drawings.
[0060]
(Description of the first embodiment)
A first embodiment of a pattern recognition apparatus and method according to the present invention will be described below. FIG. 1 shows a block configuration of the apparatus of this embodiment. The apparatus according to the present embodiment includes a pattern input unit 1, a fluctuation constant vector generation unit 2, an input subspace generation unit 3, an intersubspace similarity calculation unit 4, a recognition category determination unit 5, and an output unit 6. FIG. 2 shows a processing configuration in the apparatus of this embodiment. The processing in the apparatus of the present embodiment includes pattern input processing S1, variation constant vector generation processing S2, input subspace generation processing S3, subspace similarity calculation processing S4, recognition category determination processing S5, and output processing S6.
[0061]
(First Embodiment: Description of Pattern Input Unit 1)
The pattern input unit 1 inputs k reflected luminance patterns {p: p1, ..., pk} of the recognition object, photographed using imaging means such as a television camera (step S1). The pattern input unit 1 can input a plurality of patterns captured at different times from a single imaging position, or a plurality of patterns captured at the same or different times from a plurality of imaging positions. The pattern input unit 1 may use a single imaging unit or a plurality of imaging units.
[0062]
(First Embodiment: Explanation of Fluctuation Constant Vector Generator 2)
The effect of illumination is to produce, in the high-reflectance portions of the reflected luminance pattern, shading derived from the object shape. Because illumination variation causes a large-scale variation across the entire shading pattern, this large-scale variation must be suppressed to a local, minute variation so that the stability of the mutual subspace method can take effect. The fluctuation constant vector generation unit 2 is the functional block that performs this localization/miniaturization: from each reflected luminance pattern {p: p1, ..., pk} supplied by the pattern input unit 1, it generates and outputs feature vectors {v: v1, ..., vk} in which the influence of illumination variation is localized/miniaturized (step S2). For this purpose, the fluctuation constant vector generation unit 2 of this embodiment can use at least the following three methods, alone or in combination, to generate feature vectors in which the influence of illumination variation is reduced.
[0063]
(A) High pass filtering
(B) Expanding the range of the low brightness area
(C) Fourier transform
Hereinafter, the methods (A) to (C) will be described.
[0064]
(First embodiment: description of the fluctuation constant vector generation unit 2: high-pass filtering)
A first method for localizing/miniaturizing the illumination variation is high-pass filtering (A) of the reflected luminance pattern {p: p1, ..., pk}. Exploiting the fact that the shading variation that appears in high-reflectance portions owing to illumination variation is a low-frequency component in terms of spatial frequency, an image pattern obtained by applying high-pass filtering (or high-frequency emphasis filtering) to the reflected luminance pattern {p: p1, ..., pk} is generated as the feature vector {v: v1, ..., vk}, whereby the shading variation is converted into inconsistent weak edges (localization and miniaturization).
[0065]
Several ways of realizing high-pass filtering are conceivable, but the simplest is to apply differential processing to the reflected luminance pattern {p: p1, ..., pk}. In general, differential processing scans a differential operator Op of (2m + 1) × (2n + 1) pixels over an image pattern of w × h pixels. At each position, the weighting coefficient of each operator pixel is multiplied by the luminance value of the image pixel it overlaps (weighting coefficient × luminance value), these products are summed over all operator pixels (a sum of products of weighting coefficients and luminance values), and the sum is taken as the differential value at the image pixel corresponding to the operator center. The operator covers the neighborhood of its center pixel on the image pattern, so the larger the operator, the farther from the center pixel the pixels it takes into account; that is, a large operator detects low spatial-frequency components with long wavelengths. Conversely, to detect high spatial-frequency components, the operator should be small. Details of image processing by differential operators are given in Ref. [8] (edited by Hiroshi Tanaka, “Image Processing Applied Technology”, Industrial Research Committee, 1989). The effective range of the differential processing is the image pattern minus m pixels at each of the left and right ends and n pixels at each of the upper and lower ends, and the pattern after differentiation is the (w - 2m) × (h - 2n)-pixel pattern of this effective range, that is, a (w - 2m) × (h - 2n)-dimensional feature vector. This is because an operator whose center pixel lies outside the effective range extends beyond the image pattern, so the differential value cannot be obtained correctly.
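The weighted-sum calculation and its effective range can be sketched as follows in Python with NumPy. The function name `apply_operator` and the 3 × 3 vertical-gradient operator are hypothetical examples chosen for illustration; the text above does not prescribe a particular operator.

```python
import numpy as np

def apply_operator(image, op):
    """Weighted sum of `op` at every position where the operator fits inside
    `image`; the result covers only the effective range."""
    h, w = image.shape
    oh, ow = op.shape                 # (2n + 1) rows by (2m + 1) columns
    n, m = oh // 2, ow // 2
    out = np.zeros((h - 2 * n, w - 2 * m))
    # Accumulate (weighting coefficient x luminance value) for each operator pixel.
    for dy in range(oh):
        for dx in range(ow):
            out += op[dy, dx] * image[dy:dy + h - 2 * n, dx:dx + w - 2 * m]
    return out

# Example operator (an assumption): 3x3 vertical-gradient operator, m = n = 1.
op = np.array([[-1, -1, -1],
               [ 0,  0,  0],
               [ 1,  1,  1]], dtype=float)

# Test pattern with h = 6, w = 8 whose luminance increases by 1 per row.
image = np.tile(np.arange(6, dtype=float)[:, None], (1, 8))

diff = apply_operator(image, op)      # shape (h - 2n, w - 2m) = (4, 6)
feature_vector = diff.ravel()         # (w - 2m) x (h - 2n) = 24 dimensions
```

For this ramp every differential value equals 6, since each of the three operator columns contributes the luminance difference of 2 between the rows above and below the center.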
[0066]
In addition to the weighted-sum calculation with the differential operator Op, differential processing can also be realized by, for example, embossing, known as a technique for rendering an image in relief. In embossing, a copy of the image pattern is shifted and superimposed on the original image pattern, and the value obtained by subtracting one overlapping pixel value from the other is used as the value of that pixel. Embossing can be regarded as a one-directional differential process using an extremely small differential operator, but because it needs only subtraction of pixel values, it runs much faster than the weighted-sum calculation of a general differential operator. In particular, embossing with a one-pixel shift functions as a high-pass filter that extracts only the highest spatial frequency components of the reflected luminance pattern, which makes it convenient for removing the low-frequency components affected by illumination variation at low computational cost.
[0067]
In the case of embossing, the range where the image patterns overlap is the effective range, so the feature vector is a vector with as many elements as there are pixels in the overlapping range. Embossing with a one-pixel shift is all the more advantageous because the effective range shrinks by at most one pixel width, so it stays wider than when a large differential operator is applied.
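Embossing as described here reduces to a single array subtraction over the overlapping range. The sketch below assumes non-negative shifts; the function name `emboss` and its parameters are illustrative, not taken from the source.

```python
import numpy as np

def emboss(image, dy=1, dx=0):
    """Subtract a copy of `image` shifted by (dy, dx) from the original.

    Only the overlapping range is returned, so the result has
    (h - dy) x (w - dx) pixels. dy and dx are assumed to be >= 0.
    """
    h, w = image.shape
    original = image[dy:, dx:].astype(float)        # original, cropped to the overlap
    shifted = image[:h - dy, :w - dx].astype(float) # shifted copy, same overlap
    return original - shifted

pattern = np.array([[1, 2], [4, 8], [9, 9]])
vertical = emboss(pattern, dy=1, dx=0)   # one-pixel shift along the vertical axis
```

A one-pixel vertical shift loses only one row, so the 2 × 3 pixel pattern above gives a 2 × 2 result.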
[0068]
The shift direction in embossing is most effective when it is perpendicular to the direction of the edges that represent the features of the recognition target in the image pattern. In this way, the pattern after illumination variation constancy processing becomes an image in which the features of the recognition target are sufficiently retained and the influence of illumination variation is eliminated as far as possible. For example, since human faces have many horizontal edges such as eyebrows, eyes, and mouths, embossing by shifting the image pattern in the vertical direction (the vertical axis direction of the face) is considered effective. The same consideration of direction applies to ordinary differential operators: it is important to apply a differential operator that is sensitive only to the important edge directions representing the features of the pattern.
[0069]
The image pattern generated and output by high-pass filtering contains strong edges (a), corresponding to the reflectance pattern of the low-reflectance part, and weak edges (b), corresponding to the shading of the high-reflectance part. Of these, the former (a) is the component to be recognized as an intrinsic image feature of the object, while the latter (b) is a noise component whose position and intensity change with illumination variation. The objective of localizing / miniaturizing the illumination variation, which otherwise extends over the entire pattern, into the noise component (b) can be achieved by high-pass filtering alone (illumination variation constancy processing). However, by following the high-pass filtering with a process that removes this residual noise component (b) (residual illumination variation constancy processing), the illumination variation is reduced further and the variation stability of the mutual subspace method works more effectively. Therefore, weak edge removal processing is executed as the residual illumination variation constancy processing to remove the noise component (b).
[0070]
Specifically, a pixel value whose absolute value is equal to or less than a predetermined threshold value is replaced with 0. The image pattern generated as a result is a feature vector that leaves only the edge of (a).
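The weak edge removal step can be sketched in a few lines. The function name `remove_weak_edges` and the threshold value used in the example are assumptions; the source only states that a predetermined threshold is used.

```python
import numpy as np

def remove_weak_edges(edges, threshold):
    """Replace every value whose absolute value is <= threshold with 0."""
    out = np.asarray(edges, dtype=float).copy()
    out[np.abs(out) <= threshold] = 0.0
    return out

filtered = np.array([-5.0, -1.0, 0.0, 2.0, 7.0])      # e.g. output of high-pass filtering
strong_only = remove_weak_edges(filtered, threshold=2.0)
```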
[0071]
(First embodiment: description of the variation constant vector generation unit 2: expansion of the range of the low luminance region)
As a second method for localizing / miniaturizing the illumination variation, range expansion processing (B) of the low luminance region of the reflected luminance pattern {p: p1, ..., pk} can be mentioned. Utilizing the fact that illumination variation has little effect on the low-reflectance part, the luminance range of the low luminance region is expanded in order to emphasize the image information of the low-reflectance part, which always has low reflected luminance in the reflected luminance pattern, and the luminance range of the relatively high luminance region is compressed. That is, the gradation of the dark part of the image pattern is widened and the gradation of the bright part is narrowed. As a result, the information of the high-reflectance part, which is easily affected by illumination, is suppressed (miniaturized) into a narrow gradation range, and the information of the low-reflectance part, which strongly retains the characteristics of the object, is re-expressed in a wide gradation range. As a specific calculation method, a brightness correction curve that defines the output brightness value for each input brightness, or a brightness correction table that stores the output brightness value for each input brightness, is prepared, and the corrected brightness value for each input brightness is obtained from it.
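A brightness correction table of this kind can be sketched as a lookup table. The source does not specify the shape of the correction curve; the power-law curve with exponent 0.5 below is an assumption chosen only because it widens the dark range and narrows the bright range, and the function names are hypothetical.

```python
import numpy as np

def make_correction_table(gamma=0.5, levels=256):
    """Build a lookup table: output luminance for each input luminance.

    A power-law curve with gamma < 1 (an assumed choice) expands the
    low-luminance range and compresses the high-luminance range.
    """
    x = np.arange(levels) / (levels - 1)
    return np.round((x ** gamma) * (levels - 1)).astype(np.uint8)

def expand_low_luminance(image, table):
    """Apply the table to an integer-typed (e.g. uint8) image."""
    return table[image]

table = make_correction_table()
pattern = np.array([[0, 64], [191, 255]], dtype=np.uint8)
corrected = expand_low_luminance(pattern, table)
```

With this table the input range 0 to 64 is spread over roughly half of the output range, while the input range 191 to 255 is squeezed into a much narrower band.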
[0072]
Also, as an extreme example of the process of enlarging the luminance range of the low luminance region, a method of binarizing the reflected luminance pattern is also possible. The pixel value of the reflection luminance pattern is compared with a predetermined threshold value, and if the luminance is higher than the threshold value, a predetermined value is given to the pixel, and if the luminance is lower than the threshold value, another predetermined value is given to the pixel.
[0073]
In either method, since the effective range of the pattern after processing does not change, the dimension of the feature vector matches the number of pixels w × h of the input pattern.
[0074]
(First embodiment: description of the fluctuation constant vector generation unit 2: Fourier expansion)
As a third method for localizing / miniaturizing the illumination variation, Fourier expansion (C) of the reflected luminance pattern {p: p1, ..., pk}, disclosed in the literature [7], can be mentioned. Since the shading variation accompanying illumination variation is a low-frequency component in terms of spatial frequency, the influence of illumination variation appears mostly in the low-frequency region of the intensity per spatial frequency (Fourier spectrum) obtained by Fourier expansion of the reflected luminance pattern {p: p1, ..., pk}. When each Fourier spectrum {f: f1, ..., fN}, generated over N spatial frequencies for each of the k patterns, is regarded as one of the N-dimensional feature vectors {v: v1, ..., vk}, the effect of illumination variation appears only in the small number of scalars corresponding to the low-frequency region (localization).
[0075]
Note that the feature vector generated by Fourier expansion is a Fourier spectrum composed of N spectral components. The Fourier spectrum expresses the image characteristics of an object as a pattern of spatial frequency components, and the influence of shading due to illumination is still preserved in the low-frequency components. The purpose of localizing the illumination variation, which otherwise extends over the entire pattern, into the low-frequency components of the Fourier spectrum can be achieved by Fourier expansion alone (illumination variation constancy processing). However, by following the Fourier expansion with a process that removes these remaining low-frequency components (residual illumination variation constancy processing), the illumination variation is reduced further and the variation stability of the mutual subspace method works more effectively. Therefore, as the residual illumination variation constancy processing, the feature vector is reconstructed from the (N-n) spectral components obtained by deleting, from the N spectral components, the n spectral components corresponding to low frequencies. As a result, a feature vector from which the influence of illumination variation is removed is obtained.
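The construction of an (N-n)-dimensional feature vector from the Fourier spectrum can be sketched as follows. The source does not say how the n low-frequency components are selected; the radial cutoff on the frequency index used here is an assumption, as is the function name `fourier_feature`.

```python
import numpy as np

def fourier_feature(image, cutoff):
    """Amplitude spectrum with the low-frequency components deleted.

    Components whose radial frequency index is <= cutoff are dropped
    (an assumed selection rule); the remaining N - n amplitudes form
    the feature vector.
    """
    spectrum = np.abs(np.fft.fft2(image))
    h, w = image.shape
    fy = np.fft.fftfreq(h) * h            # integer frequency indices along rows
    fx = np.fft.fftfreq(w) * w            # integer frequency indices along columns
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return spectrum[radius > cutoff]

rng = np.random.default_rng(0)
pattern = rng.random((8, 8))                   # N = 64 spectral components
feature = fourier_feature(pattern, cutoff=1)   # n = 5 components removed
```

For an 8 × 8 pattern and a cutoff of 1, the DC term and its four nearest neighbors are removed, leaving 59 components. A uniform brightness offset changes only the DC term, so it leaves this feature vector unchanged.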
[0076]
(First Embodiment: Explanation of Fluctuation Constant Vector Generator 2: Luminance Correction / Normalization Processing)
In addition, prior to high-pass filtering (A), low-luminance range expansion processing (B), and Fourier expansion (C), the variation in the luminance range and luminance histogram of the reflected luminance pattern {p: p1, ..., pk} is corrected and normalized. By doing so, the edge intensity of the differential image, the low luminance region corresponding to the low-reflectance part, and the intensity of the Fourier spectrum can be obtained stably regardless of variation in the illumination intensity. Examples of the luminance correction / normalization processing include norm normalization of the reflected luminance pattern (a common coefficient is applied to each pixel value so that the sum of squares of the pixel values becomes a predetermined value), standardization (a common offset is subtracted from each pixel value and a common coefficient is applied so that the mean of the pixel values is 0 and their variance is 1), and histogram flattening (the luminance value of each pixel is corrected nonlinearly so that the luminance histogram becomes flat).
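The three correction / normalization options named above can be sketched as follows. The function names are hypothetical, the histogram flattening uses the common cumulative-histogram mapping (one possible nonlinear correction, assumed here), and the sketch assumes a pattern that is not all zero and not constant.

```python
import numpy as np

def norm_normalize(p, target=1.0):
    """Scale all pixels by one coefficient so the sum of squares equals target."""
    p = np.asarray(p, dtype=float)
    return p * np.sqrt(target / np.sum(p ** 2))

def standardize(p):
    """Subtract a common offset and scale so that mean = 0 and variance = 1."""
    p = np.asarray(p, dtype=float)
    return (p - p.mean()) / p.std()

def flatten_histogram(p, levels=256):
    """Map each luminance through the cumulative histogram (uint8 input)."""
    hist = np.bincount(p.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / p.size
    return np.round(cdf[p] * (levels - 1)).astype(np.uint8)

pattern = np.array([[0, 50], [50, 200]], dtype=np.uint8)
flat = flatten_histogram(pattern)
```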
[0077]
(First embodiment: description of the fluctuation constant vector generation unit 2: combination of feature amounts, etc.)
In addition, the method of illumination variation constancy processing is not limited to the examples (A) to (C) above. For example, the CCCI (Color Constant Color Indexing) descriptor disclosed in the literature [9] (Hashizume et al., “Object extraction from color images robust to illumination light fluctuations”, IEICE Technical Report PRMU97-125, 1997) can also be used. Specifically, the weighted sum of the logarithm of the luminance value of each pixel and the logarithms of the luminance values of its four adjacent pixels is used. This exploits the property (assumption) that the luminance ratio between points in the reflected luminance pattern does not change even when the illumination varies, and it has the effect of miniaturizing the influence of illumination variation. In short, in the present invention the mutual subspace method is resistant to rare variations and minute variations thanks to the synergy between the stability of the input subspace and the stability of the subspace similarity. Therefore, as long as illumination variation that affects the entire feature vector can be suppressed to local and minute variations, so that the recognition system obtains high stability against almost all variations, the particular means used to localize and miniaturize the variation does not matter.
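The weighted sum of log luminances mentioned for the CCCI descriptor can be illustrated with a rough sketch. The source does not give the weights; the values used here (4 for the center pixel and -1 for each of the four neighbors, so that they sum to zero) are an assumption, as is the function name. The sketch also assumes strictly positive luminance values.

```python
import numpy as np

def log_ratio_descriptor(image):
    """Weighted sum of log luminance at each pixel and its 4 neighbors.

    The weights (4, -1, -1, -1, -1) are assumed; because they sum to zero,
    multiplying the whole image by a constant leaves the result unchanged.
    Border pixels are dropped, giving an (h-2) x (w-2) result.
    """
    logp = np.log(np.asarray(image, dtype=float))
    center = logp[1:-1, 1:-1]
    return (4.0 * center
            - logp[:-2, 1:-1] - logp[2:, 1:-1]    # upper and lower neighbors
            - logp[1:-1, :-2] - logp[1:-1, 2:])   # left and right neighbors

rng = np.random.default_rng(1)
pattern = rng.random((6, 7)) + 0.1             # strictly positive luminances
descriptor = log_ratio_descriptor(pattern)
brighter = log_ratio_descriptor(3.0 * pattern)  # same scene, 3x illumination
```

Because only luminance ratios enter the result, `descriptor` and `brighter` agree up to rounding error.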
[0078]
Further, the feature vector produced by illumination variation constancy processing (or by processing that adds residual illumination variation constancy processing) can also be generated by using several of the above methods (A) to (C), and other comparable methods, in parallel. That is, a new feature vector can be generated and used by appropriately combining the scalar quantities of the feature vectors produced by the individual methods. For example, the feature quantities obtained by high-pass filtering and by Fourier expansion may be assembled into one feature vector. In this way, stable recognition can be achieved by localizing / miniaturizing the illumination variation while using a feature expression of the object that cannot be obtained with a single method.
[0079]
(First Embodiment: Description of Input Subspace Generation Unit 3)
The input subspace generation unit 3 obtains M eigenvalues {λ: λ1, ..., λM} and N-dimensional eigenvectors {e: e1, ..., eM} from the N-dimensional feature vectors {v: v1, ..., vk} by a principal component analysis technique such as KL expansion. It then generates and outputs the input subspace Li (dimension Mi) whose basis vectors are the Mi (Mi ≦ M) eigenvectors {e: e1, ..., eMi} taken in descending order of the corresponding eigenvalues (step S3). This Mi is the number of input-side bases used for similarity calculation in the inter-subspace similarity calculation unit 4 in the next stage.
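Step S3 can be sketched as an eigendecomposition followed by selection of the leading eigenvectors. The sketch assumes that the KL expansion is taken on the autocorrelation matrix of the feature vectors without mean subtraction, which is a common choice in subspace methods but is not stated in the source; the function name is hypothetical.

```python
import numpy as np

def input_subspace(vectors, mi):
    """Return an orthonormal basis (N x mi) and its eigenvalues.

    `vectors` is a (k, N) array of feature vectors. The basis consists of
    the mi eigenvectors of the autocorrelation matrix (assumed form of the
    KL expansion) with the largest eigenvalues, in descending order.
    """
    V = np.asarray(vectors, dtype=float)
    C = V.T @ V / V.shape[0]                   # N x N autocorrelation matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:mi]     # indices of the mi largest
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(2)
features = rng.standard_normal((10, 6))        # k = 10 vectors, N = 6
basis, values = input_subspace(features, mi=3)
```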
[0080]
(First embodiment: description of similarity calculation unit 4 between subspaces)
The inter-subspace similarity calculation unit 4 holds dictionary subspaces corresponding to the x categories to be recognized. It calculates the subspace similarities {s: s1, ..., sx} between the input subspace Li from the input subspace generation unit 3 and the x dictionary subspaces {Ld: Ld1, ..., Ldx} that it holds, pairs each similarity with the category code {c: c1, ..., cx} of the corresponding dictionary subspace, and generates and outputs the result as recognition candidate information {c: c1, ..., cx, s: s1, ..., sx} (step S4). When the projection matrix onto the input subspace Li is P and the projection matrix onto a dictionary subspace Ld is Q, the subspace similarity s is the maximum eigenvalue of the matrix PQP or QPQ. Here, the Mi basis vectors of the input subspace Li are used to generate the projection matrix P, and the Md basis vectors of the dictionary subspace Ld are used to generate the projection matrix Q. The definition of similarity between subspaces and the mathematical support of the calculation method are disclosed in the literature [3].
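The similarity defined as the maximum eigenvalue of PQP can be sketched directly from orthonormal bases. The function name and the small two-dimensional example are illustrative assumptions.

```python
import numpy as np

def subspace_similarity(U, V):
    """Largest eigenvalue of PQP, where P = U U^T and Q = V V^T.

    U (N x Mi) and V (N x Md) must have orthonormal columns. The value
    equals the squared cosine of the smallest canonical angle.
    """
    P = U @ U.T
    Q = V @ V.T
    return float(np.max(np.linalg.eigvalsh(P @ Q @ P)))

x_axis = np.array([[1.0], [0.0]])
diagonal = np.array([[1.0], [1.0]]) / np.sqrt(2.0)
s = subspace_similarity(x_axis, diagonal)   # two lines 45 degrees apart
```

For orthonormal bases the same value can be obtained more cheaply as the square of the largest singular value of the small Mi × Md matrix UᵀV, which avoids forming the N × N projection matrices.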
[0081]
(First embodiment: description of recognition result determination unit 5)
The recognition result determination unit 5 compares all the similarities in the recognition candidate information {c: c1, ..., cx, s: s1, ..., sx}, and confirms and outputs as the recognition result the category code co of the dictionary subspace showing the highest similarity at or above a predetermined threshold. If no dictionary subspace shows a similarity equal to or greater than the predetermined threshold, it is determined that the input pattern contains no recognizable category, and a special code indicating no category is output (step S5).
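The decision rule of step S5 can be sketched as follows. The function name, the example category codes, and the use of `None` as the special "no category" code are assumptions made for illustration.

```python
def decide_category(codes, similarities, threshold, no_category=None):
    """Return the code with the highest similarity if it reaches the threshold.

    Otherwise return the special `no_category` code (None is an assumed
    placeholder for it).
    """
    best = max(range(len(codes)), key=lambda i: similarities[i])
    if similarities[best] >= threshold:
        return codes[best]
    return no_category

candidates = ["c1", "c2", "c3"]          # hypothetical category codes
scores = [0.42, 0.91, 0.77]
result = decide_category(candidates, scores, threshold=0.8)
rejected = decide_category(candidates, scores, threshold=0.95)
```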
[0082]
(First embodiment: description of output unit 6)
The output unit 6 outputs at least the code co from the recognition result determination unit 5 to the outside of the apparatus (step S6).
[0083]
(Description of the second embodiment)
The above was a description of the first embodiment, in which the reflected luminance pattern whose illumination variation has been localized / miniaturized through the variation constant vector generation unit 2 (variation constant vector generation processing S2) is recognized by the mutual subspace method. Next, a pattern recognition apparatus and method according to a second embodiment of the present invention will be described.
[0084]
FIG. 3 shows a block configuration of the apparatus of this embodiment. The apparatus according to the present embodiment comprises a pattern input unit 11, a variation constant vector generation unit 12, an input subspace generation unit 13, a subspace similarity calculation unit 14, a recognition category determination unit 15, an output unit 16, and a subspace projection unit 17. FIG. 4 shows a processing configuration in the apparatus of this embodiment. The processing in the apparatus of the present embodiment consists of pattern input processing S11, variation constant vector generation processing S12, input subspace generation processing S13, subspace similarity calculation processing S14, recognition category determination processing S15, output processing S16, and subspace projection processing S17. Elements 11 to 16 (S11 to S16) in the figures are the same as 1 to 6 (S1 to S6) in the first embodiment. The difference between the present embodiment and the first embodiment is the addition of the subspace projection unit 17 (subspace projection processing S17).
[0085]
(Second embodiment: description of subspace projection unit 17)
The subspace Li generated by the input subspace generation unit 13 is a subspace of dimension Mi generated from feature vectors v in which the illumination variation has been localized / miniaturized by the variation constant vector generation unit 12, but the effects of the variation are not completely eliminated. The subspace projection unit 17 holds a constrained subspace Lc of dimension Mc, and generates and outputs a new subspace Li′ obtained by projecting the input subspace Li onto the constrained subspace Lc (step S17). The projection of the input subspace Li onto the constrained subspace Lc is computed by projecting each of the Mi basis vectors of Li onto Lc to obtain Mi new vectors, and orthonormalizing these vectors to form the basis of the subspace Li′. The constrained subspace Lc that suppresses illumination variation is obtained by collecting M-dimensional subspaces {Ls: Ls1, ..., Lsx} of x categories under the same illumination condition, computing the y difference subspaces {LD: LD1, ..., LDy} between every pair of different categories, and taking their linear combination.
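The projection in step S17 (project each basis vector, then orthonormalize) can be sketched as follows. QR decomposition is used here for the orthonormalization, which is an assumed choice; the source does not name a method. The sketch also assumes the projected vectors remain linearly independent, and the function name is hypothetical.

```python
import numpy as np

def project_subspace(U, C):
    """Project the subspace spanned by U onto the subspace spanned by C.

    U is an (N x Mi) basis of the input subspace and C an (N x Mc)
    orthonormal basis of the constrained subspace. Each column of U is
    projected onto span(C), then the results are orthonormalized.
    """
    projected = C @ (C.T @ U)             # Mi projected basis vectors
    basis, _ = np.linalg.qr(projected)    # orthonormalize (reduced QR)
    return basis

rng = np.random.default_rng(4)
U = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # input subspace, Mi = 2
C = np.eye(5)[:, :3]                               # constrained subspace: first 3 axes
U_projected = project_subspace(U, C)
```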
[0086]
The difference subspace LD has as its basis the M difference vectors (u - v) of the M pairs of vectors u and v that give the canonical angles between two subspaces Lsi and Lsj. Since the difference subspaces {LD: LD1, ..., LDy} generated under the same illumination condition do not include illumination variation, the constrained subspace Lc, which is their linear combination, has no spread in the illumination variation direction. On the other hand, since each difference subspace {LD: LD1, ..., LDy} represents the difference between categories, the constrained subspace Lc also has a spread in the direction of the differences between categories.
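The difference vectors (u - v) can be sketched using the singular value decomposition of the product of the two bases, which is a standard way to obtain the vector pairs that give the canonical angles. The function name and tolerance are assumptions, and pairs whose difference vanishes (canonical angle of zero) are dropped in this sketch.

```python
import numpy as np

def difference_subspace(A, B, tol=1e-10):
    """Basis of the difference subspace between span(A) and span(B).

    A and B are (N x M) orthonormal bases. The SVD of A^T B yields the
    pairs (u_i, v_i) that give the canonical angles; the normalized
    differences u_i - v_i are returned as columns.
    """
    W, _, Zt = np.linalg.svd(A.T @ B)
    u = A @ W                 # canonical vectors in span(A)
    v = B @ Zt.T              # canonical vectors in span(B)
    d = u - v
    norms = np.linalg.norm(d, axis=0)
    keep = norms > tol        # drop directions shared by both subspaces
    return d[:, keep] / norms[keep]

rng = np.random.default_rng(5)
A = np.linalg.qr(rng.standard_normal((6, 2)))[0]
B = np.linalg.qr(rng.standard_normal((6, 2)))[0]
D = difference_subspace(A, B)
```

The difference vectors of distinct pairs are mutually orthogonal, so normalizing them is enough to obtain an orthonormal basis.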
[0087]
Therefore, the input subspace Li projected onto the constrained subspace Lc becomes an input subspace Li′ that contains no component in the illumination variation direction and retains the components in the directions of the differences between categories. In other words, the subspace after projection is one in which illumination variation is suppressed and category features are emphasized. An important point here is that the {Ls: Ls1, ..., Lsx} needed to obtain the constrained subspace Lc are easy to collect, because the illumination condition only needs to be held constant.
[0088]
The inter-subspace similarity calculation unit 14 holds dictionary subspaces {Ld′: Ld1′, ..., Ldx′} that have been projected onto the constrained subspace Lc in advance. It calculates all the subspace similarities {s: s1, ..., sx} between the input subspace Li′ from the subspace projection unit 17 and the x dictionary subspaces {Ld′: Ld1′, ..., Ldx′}, pairs each similarity with the category code {c: c1, ..., cx} of the corresponding dictionary subspace, and generates and outputs the recognition candidate information {c: c1, ..., cx, s: s1, ..., sx} (step S14). The recognition method that calculates the similarity between subspaces after projecting the input subspace and the dictionary subspace onto the constrained subspace in this way is called the constrained mutual subspace method.
[0089]
By combining the illumination variation constancy processing in the variation constant vector generation unit 12 (or processing that adds residual illumination variation constancy processing) with the constrained mutual subspace method realized by the subspace projection unit 17, it becomes possible to perform similarity calculation that is not affected by illumination variation.
[0090]
(Explanation of the third embodiment)
The above embodiments use feature vectors subjected to illumination variation constancy processing. However, in the present invention it is also possible to use a combination of a feature vector generated through illumination variation constancy processing and a feature vector generated without it.
[0091]
In other words, by generating and using a new feature vector that appropriately combines the scalar quantities of both feature vectors, the object feature information lost through illumination variation constancy processing can be taken from the feature vector that has not undergone it. In addition, in the combined feature vector, the scale of the variation is still localized / miniaturized by the effect of the constancy processing, compared with the case where no constancy processing is performed at all. As a result, stable recognition can be performed with reduced illumination variation while using feature expressions of the object that cannot be obtained from illumination variation constancy processing alone. The pattern recognition apparatus and method according to the third embodiment of the present invention has such a configuration. The third embodiment is described below.
[0092]
FIG. 5 shows a block configuration of the apparatus of this embodiment. The apparatus according to the present embodiment comprises a pattern input unit 21, a variation constant vector generation unit 22, an input subspace generation unit 23, a subspace similarity calculation unit 24, a recognition category determination unit 25, an output unit 26, and a variation non-constant vector generation unit 27. FIG. 6 shows a processing configuration in the apparatus of this embodiment. The processing in the apparatus of the present embodiment consists of pattern input processing S21, variation constant vector generation processing S22, input subspace generation processing S23, subspace similarity calculation processing S24, recognition category determination processing S25, output processing S26, and variation non-constant vector generation processing S27. Elements 21 to 26 (S21 to S26) in the figures are the same as 1 to 6 (S1 to S6) in the first embodiment. The difference between the present embodiment and the first embodiment is the addition of the variation non-constant vector generation unit 27 (variation non-constant vector generation processing S27).
[0093]
The variation constant vector generation unit 22 generates an N1-dimensional first feature vector from the reflected luminance pattern supplied by the pattern input unit 21 by applying illumination variation constancy processing (or processing that adds residual illumination variation constancy processing), for example high-pass filtering. The first feature vector is a vector in which the illumination variation is localized / miniaturized: most of the shading information that expresses the shape of the object is lost, and only the strong edge information that expresses the texture of the object is retained.
[0094]
The variation non-constant vector generation unit 27 directly generates an N2-dimensional second feature vector from the reflected luminance pattern by the pattern input unit 21. The second feature vector is the reflection luminance pattern itself, and the shadow information based on the object shape is maintained as it is, but this also includes the influence of illumination variation over the entire pattern.
[0095]
The input subspace generation unit 23 treats the N1-dimensional first feature vector and the N2-dimensional second feature vector together as one (N1 + N2)-dimensional feature vector, and generates an input subspace from a plurality of such (N1 + N2)-dimensional input feature vectors.
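Combining the two feature vectors amounts to a concatenation. In the sketch below, the first feature vector is produced by a one-pixel vertical embossing as an example of illumination variation constancy processing, and the second is the raw pattern itself; the function names are hypothetical.

```python
import numpy as np

def combined_feature(pattern, make_constant_feature):
    """Concatenate the N1-dim processed vector and the N2-dim raw vector."""
    raw = np.asarray(pattern, dtype=float)
    v1 = make_constant_feature(raw).ravel()   # first feature vector (N1 dims)
    v2 = raw.ravel()                          # second feature vector (N2 dims)
    return np.concatenate([v1, v2])           # (N1 + N2) dims

def vertical_emboss(p):
    """One-pixel vertical embossing, used here as the example processing."""
    return p[1:, :] - p[:-1, :]

pattern = np.arange(12).reshape(3, 4)
v = combined_feature(pattern, vertical_emboss)   # N1 = 8, N2 = 12
```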
[0096]
In the present embodiment, it is also possible to generate the first feature vector by applying illumination variation constancy processing (or processing that adds residual illumination variation constancy processing) only to a first predetermined region of the pattern, and to generate the second feature vector by other means applied only to a second predetermined region of the pattern. This allows recognition with other unnecessary information removed while more object features are retained. For example, in face recognition, the first predetermined region may be the area around the eyebrows, eyes, nose, and mouth, which has many strong edges, and the second predetermined region may be the forehead and cheeks. In this way, illumination variation constancy processing is applied in regions that hold many texture features and is not applied in regions that hold many three-dimensional shape features, so that a feature quantity with reduced illumination variation can be used while more facial feature information is preserved.
[0097]
(Description of Modified Example)
It should be noted that the embodiments described above can be implemented by combining or modifying the configurations.
[0098]
For example, the first feature vector may be generated by illumination variation constancy processing (or processing that adds residual illumination variation constancy processing), the second feature vector by other means, the input subspace from the feature vector that merges the two, and the similarity obtained after projecting onto the constrained subspace. In this way, it is possible to perform recognition by removing illumination variation more strongly while leaving more object features.
[0099]
In addition, the first predetermined region may be a region with many strong edges, such as the area around the eyebrows, eyes, nose, and mouth of a face, and the second predetermined region may be a region with strong shading, such as the forehead and cheeks. The first feature vector is then generated by applying illumination variation constancy processing (or processing that adds residual illumination variation constancy processing) only to the first predetermined region, the second feature vector is generated by other means from only the second predetermined region, the input subspace is generated from the feature vector that combines the two, and the similarity is obtained after projecting onto the constrained subspace. In this way, recognition can be performed with illumination variation and other unnecessary information removed while more object features are retained.
[0100]
(Description of embodiment by medium)
Incidentally, as illustrated in FIG. 7, information (for example, a program) for realizing the image pattern recognition apparatus and method according to the present invention can be recorded on a recording medium 31, and the recorded information can be applied via the recording medium 31 to the apparatus 32 and to the apparatus 33 including the imaging unit 37, or via the communication lines 35 and 36 to the apparatus 33 and to the apparatus 34 including the imaging unit 38. That is, the present invention is not limited to the above-described embodiments, and can be implemented with various modifications within its technical scope.
[0101]
[Effects of the Invention]
According to the present invention, image patterns can be recognized with little influence from illumination variation, even in environments where the illumination varies under the influence of outside light, such as outdoors or indoors near a window.
[0102]
Specifically, according to the present invention, in a pattern recognition device that recognizes a reflected luminance pattern obtained by photographing a three-dimensional object, a feature vector in which the influence of illumination is localized / miniaturized is generated and recognized by illumination variation constancy processing, which exploits the fact that the influence of illumination variation appearing in the pattern is concentrated in the low spatial frequency components and in the high-luminance pixel regions. As a result, the weakness of the mutual subspace method against illumination variation is compensated and its inherent resistance to variation is restored, so that a pattern recognition device that is robust against variation in general can be realized.
[0103]
Further, according to the present invention, in a pattern recognition device that recognizes a reflected luminance pattern obtained by photographing a three-dimensional object, performing a process that removes the variation component remaining after the illumination variation constancy processing generates a feature vector in which the influence of illumination variation is reduced further, so that a pattern recognition device that is more robust against variation in general can be realized.
[0104]
Further, according to the present invention, in a pattern recognition device that recognizes a reflected luminance pattern obtained by photographing a three-dimensional object, a feature vector is generated by illumination variation constancy processing, and the remaining illumination variation is further suppressed by the constrained mutual subspace method, which achieves constancy through subspace projection, so that a pattern recognition device that is even more robust against variation in general can be realized.
[0105]
According to the present invention, in a pattern recognition device that recognizes a reflected luminance pattern obtained by photographing a three-dimensional object, a first feature vector is generated by illumination variation constancy processing, and a second feature vector that has not undergone illumination variation constancy processing is generated and combined with it. This produces a feature vector that preserves many object-specific features while localizing / miniaturizing illumination variation, so that a pattern recognition device that is robust against variation in general and has improved recognition performance can be realized.
[0106]
Further, according to the present invention, in a pattern recognition device that recognizes a reflected luminance pattern obtained by photographing a three-dimensional object, the illumination variation constancy processing is realized by embossing, whose computational cost is extremely low. A pattern recognition device that is robust against variation in general can thus be realized with almost no increase in computational cost from adding the constancy processing.
[0107]
In addition, according to the present invention, correcting the pixel luminance values of the pattern prior to the illumination variation constancy processing generates a feature vector that is more stable with respect to illumination variation, so that a stable pattern recognition device that is robust against variation in general can be realized.
[Brief description of the drawings]
FIG. 1 is a diagram showing a block configuration of an apparatus according to a first embodiment of the present invention.
FIG. 2 is a diagram showing a processing configuration of the first embodiment apparatus according to the present invention.
FIG. 3 is a diagram showing a block configuration of a second embodiment apparatus according to the present invention.
FIG. 4 is a diagram showing a processing configuration of a second embodiment apparatus according to the present invention.
FIG. 5 is a diagram showing a block configuration of a third embodiment apparatus according to the present invention.
FIG. 6 is a diagram showing a processing configuration of a third embodiment apparatus according to the present invention.
FIG. 7 is a diagram for explaining an embodiment using a recording medium or the like.
FIG. 8 is a diagram for explaining the effect of the mutual subspace method.
FIG. 9 is a diagram for explaining the influence of illumination variation.
FIG. 10 is a diagram for explaining the effect of localization / miniaturization of illumination variation.
[Explanation of symbols]
1: Pattern input unit
2: Fluctuation constant vector generator
3: Input subspace generator
4: Subspace similarity calculation unit
5: Recognition category determination section
6: Output section

Claims

A pattern recognition apparatus comprising: means for inputting a pattern; means for extracting a feature vector for recognition from the input pattern; means for generating an input subspace from the feature vector; means for calculating a similarity between the input subspace and a dictionary subspace; and means for determining a category of an object based on the similarity,
wherein the feature vector is extracted through one or more processes for localizing / miniaturizing illumination variation, and one of the one or more processes is a process that expands the range of a low-luminance region and compresses the range of a relatively high-luminance region.

A pattern recognition apparatus comprising: means for inputting a pattern; means for extracting a feature vector for recognition from the input pattern; means for generating an input subspace from the feature vector; means for calculating a similarity between the input subspace and a dictionary subspace; and means for determining a category of an object based on the similarity,
wherein a part or all of the feature vector is extracted through one or more processes for localizing / miniaturizing illumination variation, and one of the one or more processes is a process that expands the range of a low-luminance region and compresses the range of a relatively high-luminance region, and
wherein the input subspace is generated through a process of projecting a subspace, obtained by principal component analysis of the feature vectors, onto a constrained subspace that suppresses illumination variation, and the constrained subspace is generated as a linear combination of difference subspaces between categories, the difference subspaces being taken between the subspaces spanned for each category by feature vectors collected under the same illumination condition.
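The projection step in the claim above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the constrained subspace is taken as a given orthonormal basis (its construction from difference subspaces is not shown), the similarity is assumed to be the squared cosine of the smallest canonical angle as in the mutual subspace method, and the names `pca_basis`, `project_to_constraint`, and `subspace_similarity` are hypothetical.

```python
import numpy as np

def pca_basis(vectors, dim):
    """Top-`dim` principal directions as orthonormal columns.

    `vectors` holds one feature vector per row. No mean is subtracted
    (an assumption; the claim only says "principal component analysis").
    """
    x = np.asarray(vectors, dtype=np.float64)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:dim].T

def project_to_constraint(basis, constraint_basis):
    """Project a subspace onto the constraint subspace, then re-orthonormalize.

    Assumes the projected basis keeps full column rank.
    """
    projected = constraint_basis @ (constraint_basis.T @ basis)
    q, _ = np.linalg.qr(projected)
    return q

def subspace_similarity(a, b):
    """Squared cosine of the smallest canonical angle between two subspaces."""
    s = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(min(s[0], 1.0) ** 2)

# Toy 4-D data: axes 0-1 carry identity, axes 2-3 stand in for illumination.
constraint = np.eye(4)[:, :2]  # hypothetical constraint subspace
inp = pca_basis([[1, 0, 1, 0], [2, 0, 2, 0]], dim=1)
dic = pca_basis([[1, 0, 0, 1], [3, 0, 0, 3]], dim=1)

before = subspace_similarity(inp, dic)
after = subspace_similarity(project_to_constraint(inp, constraint),
                            project_to_constraint(dic, constraint))
```

In this toy case the similarity rises from 0.25 to 1.0 once the components outside the constraint subspace are discarded. This is the intended effect when those components come from illumination rather than from the object itself.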

A pattern recognition apparatus comprising: means for inputting a pattern; means for extracting a first feature vector for recognition from the input pattern; means for extracting a second feature vector for recognition from the input pattern; means for generating an input subspace from the first and second feature vectors; means for calculating a similarity between the input subspace and a dictionary subspace; and means for determining a category of an object based on the similarity,
wherein the first feature vector is extracted through one or more processes for localizing / miniaturizing illumination variation, and one of the one or more processes is a process that expands the range of a low-luminance region and compresses the range of a relatively high-luminance region.

A pattern recognition method comprising: a step of inputting a pattern; a step of extracting a feature vector for recognition from the input pattern; a step of generating an input subspace from the feature vector; a step of calculating a similarity between the input subspace and a dictionary subspace; and a step of determining a category of an object based on the similarity,
wherein the feature vector is extracted through one or more processes for localizing / miniaturizing illumination variation, and one of the one or more processes is a process that expands the range of a low-luminance region and compresses the range of a relatively high-luminance region.

A pattern recognition method comprising: a step of inputting a pattern; a step of extracting a feature vector for recognition from the input pattern; a step of generating an input subspace from the feature vector; a step of calculating a similarity between the input subspace and a dictionary subspace; and a step of determining a category of an object based on the similarity,
wherein the feature vector is extracted through one or more processes for localizing / miniaturizing illumination variation, and one of the one or more processes is a process that expands the range of a low-luminance region and compresses the range of a relatively high-luminance region, and
wherein the input subspace is generated through a process of projecting a subspace, obtained by principal component analysis of the feature vectors, onto a constrained subspace that suppresses illumination variation, and the constrained subspace is generated as a linear combination of difference subspaces between categories, the difference subspaces being taken between the subspaces spanned for each category by feature vectors collected under the same illumination condition.

A pattern recognition method comprising: a step of inputting a pattern; a step of extracting a first feature vector for recognition from the input pattern; a step of extracting a second feature vector for recognition from the input pattern; a step of generating an input subspace from the first and second feature vectors; a step of calculating a similarity between the input subspace and a dictionary subspace; and a step of determining a category of an object based on the similarity,
wherein the first feature vector is extracted through one or more processes for localizing / miniaturizing illumination variation, and one of the one or more processes is a process that expands the range of a low-luminance region and compresses the range of a relatively high-luminance region.