JP6783207B2

JP6783207B2 - Object head region extractor and method

Info

Publication number: JP6783207B2
Application number: JP2017180155A
Authority: JP
Inventors: ホウアリサビリン; 浩嗣三功; 内藤　整; 整内藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-09-20
Filing date: 2017-09-20
Publication date: 2020-11-11
Anticipated expiration: 2037-09-20
Also published as: JP2019057031A

Description

The present invention relates to a head region extraction device and method for extracting the head region of an object from an arbitrary input image.

Patent Document 1 discloses a method for quickly and accurately detecting heads in an image by using a preliminary head detection model.

Non-Patent Document 1 discloses a method for detecting heads that uses HOG features in a cascade classifier consisting of a first-stage low-resolution single-head model classifier and a part-based model trained on the detections of the preceding stage.

Non-Patent Document 2 discloses a technique that detects head and shoulder positions using Haar features and HOG features in order to count people. In Non-Patent Document 2, so-called joint HOG features are created based on the symmetric structure of the detected head and shoulders, negative samples are rejected using a Haar classifier, and the final head-shoulder position is then estimated.

Patent Document 1: Japanese Patent Application No. 2012-522506

Non-Patent Document 1: E. Rehder, H. Kloeden, and C. Stiller, "Head Detection and Orientation Estimation for Pedestrian Safety," IEEE 17th Int. Conf. on ITSC, 2014.
Non-Patent Document 2: L. Chen, H. Wu, S. Zhao, and J. Gu, "Head-shoulder detection using joint HOG features for people counting and video surveillance in library," IEEE Workshop on ECA, 2014.

Non-Patent Documents 1 and 2 do not address the problem of how to distinguish individual heads when occlusion occurs among multiple heads. Patent Document 1 can detect multiple heads in an image, but its detection performance degrades when the saliency of the head's texture or color is low relative to the background.

An object of the present invention is to solve the above technical problems by recognizing the head region of each object during object recognition, so that even when occlusion occurs between objects and they cannot be told apart, the position and posture of each object can be estimated from the position and posture of its head region.

To achieve the above object, the present invention provides a head region extraction device and method for extracting the head region of an object from an arbitrary input image, characterized by the following configurations.

(1) The device comprises: head learning means for learning a texture pattern and an edge pattern of known head images; texture matching means for extracting head region candidates by performing matching based on the texture pattern on an input image; and pattern matching means for determining a head region by performing matching based on the texture pattern and the edge pattern on the head region candidates.

(2) The device further comprises head region candidate integration means for merging a plurality of mutually adjacent head region candidates into one based on their position information.

(3) The head learning means obtains the pixel values of each head image, calculates a pixel value mean and a pixel value variance for each pixel position, and learns the per-position pixel value mean and pixel value variance as the texture pattern.

(4) The head learning means converts the head images into the CIELab color space, and the means for calculating the pixel value mean and pixel value variance calculates the mean and variance of each channel value of the lightness dimension L and the color-opponent dimensions a and b.

(5) The head learning means comprises means for calculating a gradient value for each pixel block of each head image and means for calculating a gradient value mean and a gradient value variance for each pixel block position, and learns an edge pattern in which the gradient value at each pixel block position whose gradient value variance is below a predetermined reference value is represented by the gradient value mean.

(6) The texture matching means comprises means for determining whether the difference between each pixel value of the input image and the corresponding pixel value mean of the texture pattern is below a predetermined value defined in relation to the corresponding pixel value variance, and means for sequentially shifting a predetermined search range over the input image and calculating, for each search range, the proportion of pixels whose difference is not below the predetermined value. A search range in which this proportion is below a reference value is extracted as a head region candidate.

(7) The pattern matching means comprises means for calculating a first spatial variance matrix for each head region candidate based on the texture pattern and edge pattern of that candidate, and means for calculating a second spatial variance matrix based on the learned texture pattern and edge pattern. A head region candidate for which the difference between the first and second spatial variance matrices is below a predetermined reference value is determined to be a head region.

The present invention achieves the following effects.

(1) Because the head region of each object is recognized during object recognition, the position and posture of each object can be estimated from the position and posture of its head region even when occlusion occurs between objects and they cannot be told apart.

(2) For an arbitrary input image, the search range is first narrowed by extracting head region candidates through texture matching, and the head region is then determined by a search based on spatial correlation that uses the texture pattern and the edge pattern on the head region candidates only. The head region of an object can therefore be extracted accurately with a small processing load.

FIG. 1 is a functional block diagram showing the configuration of the main part of an object head extraction device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of extracting head images for learning.
FIG. 3 is a flowchart showing the texture pattern learning procedure.
FIG. 4 is a flowchart showing the edge pattern learning procedure.
FIG. 5 is a flowchart showing the texture matching procedure.
FIG. 6 is a flowchart showing the pattern matching procedure.
FIG. 7 is a diagram (part 1) showing the method of integrating head region candidates.
FIG. 8 is a diagram (part 2) showing the method of integrating head region candidates.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main part of an object head region extraction device according to an embodiment of the present invention. Such a device can be built by installing an application (program) that implements each function on a general-purpose computer or server. Alternatively, it can be built as a dedicated or single-purpose machine in which part of the application is implemented in hardware or ROM.

The head region extraction device of the present invention comprises a head learning unit 1, which learns head features from a large number of head images prepared in advance and known to be head regions, and a head identification unit 2, which identifies head regions in an arbitrary input image based on the learning result. As shown in FIG. 2, the database 3 stores a large number of head images extracted, for example by hand, from billboards of a large number of objects (humans).

In the head learning unit 1, the head pattern generation unit 101 includes a texture pattern learning unit 101a and an edge pattern learning unit 101b. The texture pattern learning unit 101a learns the texture pattern of the head by calculating the pixel value mean and pixel value variance at each coordinate position (x, y) of the head images acquired from the database 3. The learned texture pattern is stored in the texture pattern storage unit 102.

The edge pattern learning unit 101b performs edge detection and gradient calculation for each predetermined pixel block of the head images acquired from the database 3, and learns the edge pattern of the head by calculating the gradient mean and gradient variance. The learned edge pattern is stored in the edge pattern storage unit 103.

The head identification unit 2 receives a mask image of an object extracted from an arbitrary object image by applying, for example, background subtraction. The filtering unit 201 selectively removes only noise line segments from the input object mask image while preserving its edge components. The color space conversion unit 202 converts the color space of the input image from RGB to CIELab in order to improve the saliency of the texture structure.

The texture matching unit 203 performs texture matching on the input image in units of predetermined pixel blocks using the learning result held in the texture pattern storage unit 102, and extracts from the input image all head region candidates that are likely to be head regions. In this embodiment, as described in detail later, the learned texture pattern consists of the means u^L, u^a, u^b and the variances σ^L, σ^a, σ^b of the channel values of the lightness dimension L and the color-opponent dimensions a and b of the CIELab space at each pixel position (x, y).

The head region candidate integration unit 204 merges a plurality of mutually adjacent head region candidates based on their position information. The pattern matching unit 205 determines the head region from the extracted or merged head region candidates by using texture pattern matching and edge pattern matching together.

FIG. 3 is a flowchart showing the texture pattern learning procedure performed by the texture pattern learning unit 101a of the head learning unit 1.

In step S1, all head images are acquired from the database 3. In step S2, the head images, which are represented in the RGB color space, are converted into the CIELab color space in order to improve the saliency of the texture structure. In step S3, filtering is applied to each head image to remove noise components such as background pixels.
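The color space conversion of step S2 can be sketched as follows. The patent does not say which RGB primaries or white point are used, so this sketch assumes 8-bit sRGB input and the D65 reference white; the function name `srgb_to_lab` is hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELab (assumes sRGB primaries, D65 white)."""

    def linearize(c):
        # Undo the sRGB transfer curve to obtain linear light in [0, 1].
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # CIELab companding function with its linear segment near zero.
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3 * delta ** 2) + 4.0 / 29.0

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

Applied to every pixel of a head image, this yields the (L, a, b) channel values that the later steps average.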

In step S4, the channel values (L, a, b) in the CIELab color space are extracted at each pixel position (x, y) from all head images, and the mean u^*_xy (= u^L_xy, u^a_xy, u^b_xy) of each channel value at each pixel position (x, y) is calculated based on equation (1). Here, F is the number of head images acquired from the database 3, and I_i^* stands for each channel value at each pixel position (x, y) of the i-th head image.

In step S5, the variance σ^*_xy (= σ^L_xy, σ^a_xy, σ^b_xy) of each channel value at each pixel position (x, y) is calculated based on equation (2) from the channel values I_i^* of the CIELab color space extracted at each pixel position (x, y) of each head image and the means u^*_xy of those channel values.
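Equations (1) and (2) are not reproduced in this text. From the definitions of F, I_i^*, u^*_xy and σ^*_xy given above, they presumably take the form of a per-pixel sample mean and variance, for example as below. The 1/F normalization, and whether σ denotes the variance itself or its square root, are assumptions that cannot be confirmed from the text.

```latex
u^{*}_{xy} = \frac{1}{F}\sum_{i=1}^{F} I^{*}_{i}(x,y),
\qquad
\sigma^{*}_{xy} = \frac{1}{F}\sum_{i=1}^{F}\bigl(I^{*}_{i}(x,y) - u^{*}_{xy}\bigr)^{2},
\qquad * \in \{L, a, b\}
```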

Once the mean u^*_xy and variance σ^*_xy of each channel value in the CIELab color space of the head images have been obtained as described above, the texture pattern is constructed in step S6 based on equations (3) and (4).
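Steps S4 to S6 can be sketched as below. Because equations (3) and (4) are not available in this text, the sketch simply treats the per-pixel mean and variance grids as the texture pattern, which matches configuration (3) above. The function name, the nested-list image layout and the 1/F normalization are assumptions.

```python
def learn_texture_pattern(images):
    """Learn per-pixel, per-channel mean and variance from F same-sized images.

    images: list of F images; each image is a list of rows, each row a list of
            (L, a, b) tuples (assumed layout).
    Returns (mean, var), two grids of 3-tuples indexed as [y][x].
    """
    f = len(images)
    height, width = len(images[0]), len(images[0][0])
    mean = [[None] * width for _ in range(height)]
    var = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = [img[y][x] for img in images]
            # Mean of each channel over all F images at this pixel position.
            m = tuple(sum(s[c] for s in samples) / f for c in range(3))
            # Population variance (divide by F), an assumed normalization.
            v = tuple(sum((s[c] - m[c]) ** 2 for s in samples) / f for c in range(3))
            mean[y][x], var[y][x] = m, v
    return mean, var
```

For two 1x1 images holding (50, 0, 10) and (70, 4, 10), the learned mean is (60, 2, 10) and the variance is (100, 4, 0).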

FIG. 4 is a flowchart showing the edge pattern learning procedure performed by the edge pattern learning unit 101b of the head learning unit 1.

In step S11, all head images are acquired from the database 3. In step S12, the head images, which are represented in the RGB color space, are converted into grayscale images. In step S13, edge information is extracted by edge filtering.

In step S14, each filtered head image is divided into pixel blocks of 4×4 pixels, and an edge pattern ε_pq is constructed for each pixel block based on the HOG feature (here, the gradient value k).

In this embodiment, the position of each pixel block is expressed relative to the top-left pixel block, and k_pq denotes the gradient value of the pixel block that is p-th in the vertical direction and q-th in the horizontal direction. For each pixel block position pq, the relationship between the gradient value k and its variance σ is obtained, and the edge pattern ε_pq is constructed for each pixel block based on equation (5).

In this embodiment, for each pixel block position pq, it is determined whether all the gradient values k_pq of the corresponding pixel blocks of the head images are within ±2σ at that position. If they are, the edge pattern ε_pq of that position is set to the mean k'_pq of the gradient values k_pq of the pixel blocks at that position. If any gradient value k_pq falls outside ±2σ, the edge pattern ε_pq of that position is set to zero.
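The ±2σ rule described above can be sketched as follows. The per-block gradient values are taken as given inputs, since the text does not detail how k is derived from the HOG feature. The sketch assumes that the ±2σ band is centered on the per-position mean and that σ is the population standard deviation; the function name is hypothetical.

```python
import math


def learn_edge_pattern(block_gradients):
    """Build the edge pattern from per-image block gradient values.

    block_gradients: list of F grids, where block_gradients[i][p][q] is the
                     gradient value k_pq of image i (assumed layout).
    Returns a grid whose entry [p][q] is the mean gradient if every image's
    value lies within 2 sigma of that mean, and 0.0 otherwise.
    """
    f = len(block_gradients)
    rows, cols = len(block_gradients[0]), len(block_gradients[0][0])
    pattern = [[0.0] * cols for _ in range(rows)]
    for p in range(rows):
        for q in range(cols):
            ks = [grid[p][q] for grid in block_gradients]
            mean = sum(ks) / f
            sigma = math.sqrt(sum((k - mean) ** 2 for k in ks) / f)
            # Keep the mean only where the position is stable across images.
            if all(abs(k - mean) <= 2 * sigma for k in ks):
                pattern[p][q] = mean
    return pattern
```

A block position where nine images give 1.0 and one gives 10.0 is zeroed out, while a position where all ten images give 2.0 keeps the value 2.0.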

FIG. 5 is a flowchart showing the texture matching procedure performed by the texture matching unit 203. Matching with the texture pattern is performed in order to extract, from the input image supplied to the head identification unit 2, head region candidates that have a high probability of being the head of an object.

In this embodiment, given the texture pattern of equations (3) and (4), a 24×24-pixel block in the input image is used as the search range, and head region candidates are searched for over the entire input image while the search range is shifted 4 pixels at a time.

In step S21, the pixel value I_xy at each position (x, y) of the input image is acquired. In step S22, based on equation (6), the absolute value of the difference between each pixel value I_xy of the input image and the pixel value mean u_xy of the texture pattern at the corresponding position is compared with 2σ_xy, twice the variance of the texture pattern at that position. If equation (6) holds, the process proceeds to step S23 and the pixel value I_xy at that position is set (I_xy = 1); otherwise, the process proceeds to step S24 and the pixel value I_xy at that position is reset (I_xy = 0).

In step S25, it is determined whether the above processing has been completed for all pixels of the input image. If not, the process returns to step S21 and the above steps are repeated for the next pixel of interest.

Once all pixels of the input image have been set or reset, the process proceeds to step S26, where a 24×24-pixel search range is placed at the top left of the input image. In step S27, the proportion of non-zero pixels, that is, pixels whose value I_xy is not zero, within the search range is calculated. In step S28, this non-zero proportion is compared with a predetermined reference value; if it is lower than the reference value, the process proceeds to step S29 and the current search range is taken as a head region candidate.

In step S30, it is determined whether the entire input image has been searched. If not, the process proceeds to step S31, where the search range is shifted by 4 pixels horizontally or vertically to set the next search range, and then returns to step S27 to repeat the above steps for the new search range.
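The texture matching procedure can be sketched as a sliding-window search. Several points are assumptions here. Equation (6) is not reproduced in the text, so the sketch flags a pixel as non-zero when |I - u| exceeds 2σ, which is the reading consistent with steps S27 to S29. The flowchart flags every pixel before sliding the window and does not spell out how input positions map to pattern positions, so the sketch compares each window against a window-sized pattern directly. A single channel is used for brevity, and `max_ratio` is a hypothetical reference value.

```python
def find_head_candidates(image, mean, sigma, step=4, max_ratio=0.3):
    """Slide a pattern-sized window over the image and collect candidates.

    image: 2-D list of single-channel pixel values.
    mean, sigma: learned pattern grids of the window size (24x24 in the
                 embodiment; any size works here).
    Returns a list of (x, y, width, height) windows whose share of
    mismatched pixels is below max_ratio.
    """
    win_h, win_w = len(mean), len(mean[0])
    height, width = len(image), len(image[0])
    candidates = []
    for y0 in range(0, height - win_h + 1, step):
        for x0 in range(0, width - win_w + 1, step):
            # Count pixels that deviate from the learned mean by more than 2 sigma.
            mismatched = sum(
                1
                for dy in range(win_h)
                for dx in range(win_w)
                if abs(image[y0 + dy][x0 + dx] - mean[dy][dx]) > 2 * sigma[dy][dx]
            )
            if mismatched / (win_h * win_w) < max_ratio:
                candidates.append((x0, y0, win_w, win_h))
    return candidates
```

With a 24x24 pattern and `step=4`, this mirrors the search range and shift of the embodiment; overlapping windows around one head will usually all pass, which is why a merging stage follows.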

FIG. 6 is a flowchart showing the pattern matching procedure performed by the head region candidate integration unit 204 and the pattern matching unit 205. In this embodiment, pattern matching based on the texture pattern and the edge pattern is performed on all head region candidates to make the final determination of the head region.

In the pattern matching process, the previously learned texture pattern (u, σ) and edge pattern ε are correlated into a spatial covariance matrix ρ in units of pixel blocks (in this embodiment, the 4×4-pixel blocks described above).

Similarly, for each head region candidate extracted from the input image, the pixel mean u, pixel variance σ, and edge pattern ε are correlated into a spatial covariance matrix ρ in the same pixel block units. The spatial covariance matrices ρ are then compared block by block, and a head region candidate for which the proportion of blocks with a non-zero difference is low is determined to be a head region.

In step S51, as shown in FIGS. 7 and 8, the head region candidate integration unit 204 regards a plurality of head region candidates Hi_1 to Hi_5 at nearby positions as a group of candidates recognized for the same head and merges them into one. In this embodiment, the position index (x, y) of a head region candidate is defined as, for example, its center position, centroid position, or vertex coordinates. All head region candidates whose positional offset is within a predetermined range, for example x - 4 < x' < x + 4 and y - 4 < y' < y + 4 for candidates at (x, y) and (x', y'), are merged into a new head region candidate whose extent is the smallest rectangle enclosing them.
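The merging of step S51 can be sketched as below. The text does not say how groups are formed when offsets chain across several candidates, so this sketch assumes a simple seed-based grouping: each candidate joins the first group whose first member lies within `tol` pixels in both x and y. The top-left vertex is used as the position index, which is one of the options named above. The function name and `tol` parameter are hypothetical.

```python
def merge_candidates(candidates, tol=4):
    """Merge nearby (x, y, w, h) candidates into their bounding rectangles.

    A candidate joins the first group whose seed (first member) has a
    top-left corner closer than tol pixels in both x and y; otherwise it
    starts a new group. Each group becomes the smallest enclosing rectangle.
    """
    groups = []
    for rect in candidates:
        for group in groups:
            seed_x, seed_y = group[0][0], group[0][1]
            if abs(rect[0] - seed_x) < tol and abs(rect[1] - seed_y) < tol:
                group.append(rect)
                break
        else:
            groups.append([rect])

    merged = []
    for group in groups:
        x0 = min(r[0] for r in group)
        y0 = min(r[1] for r in group)
        x1 = max(r[0] + r[2] for r in group)
        y1 = max(r[1] + r[3] for r in group)
        merged.append((x0, y0, x1 - x0, y1 - y0))
    return merged
```

Because `tol` is a strict bound, candidates produced on an exact 4-pixel grid would need a larger `tol` to be merged; the value is left as a parameter for that reason.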

In step S52, one of the merged head region candidates is selected. In step S53, the superpixel I_pq defined by equations (7) and (8) is calculated from the current head region candidate for each 4×4-pixel block at position pq.

In equation (8), L_pq, a_pq, and b_pq denote the means of the channel values L, a, and b in the CIELab space observed in the pixel block at position pq. ω and Δs are the pattern coefficient size and the region coefficient distance, respectively; this embodiment assumes ω = 24 and Δs = 25. The pattern coefficient size is determined based on equation (9).

Here, P is the pattern size, and S is given by equation (10), where A is the superpixel region size and s is the superpixel size.

In step S54, the spatial covariance matrix ρ_pq is obtained by equation (11) as the sum of α times the edge information ε_pq and β times the texture information I_pq.

In step S55, it is determined whether the spatial covariance matrix ρ_pq has been calculated for all 4×4-pixel blocks in the current head region candidate. If not, the process returns to step S53 and the spatial covariance matrix ρ_pq is calculated in the same way for each remaining pixel block.

In step S56, the difference between the spatial covariance matrix ρ of each pixel block, calculated by the same procedure from the previously learned texture pattern (u, σ) and edge pattern ε, and the spatial covariance matrix ρ_pq of each pixel block calculated for the current head region candidate is computed block by block.

In step S57, it is determined whether the proportion of blocks for which the difference between the spatial covariance matrices ρ and ρ_pq is non-zero is below a predetermined reference value, for example 20%. If it is, the process proceeds to step S58 and the current head region candidate is determined to be a head region. If it is not, the process proceeds to step S59 and the current head region candidate is determined to be a non-head region.
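Steps S54 to S59 can be sketched as follows. Equations (7) to (11) are not reproduced in this text, so the sketch takes the per-block texture value I_pq and edge value ε_pq as given scalars and only illustrates the weighted sum and the 20% decision rule. The weights `alpha` and `beta` and the tolerance `tol` used to decide whether a difference counts as non-zero are hypothetical; only the 20% figure comes from the description.

```python
def spatial_covariance(edge, texture, alpha=0.5, beta=0.5):
    """Compute rho_pq = alpha * eps_pq + beta * I_pq for every block position."""
    return [
        [alpha * e + beta * t for e, t in zip(row_e, row_t)]
        for row_e, row_t in zip(edge, texture)
    ]


def is_head_region(rho_candidate, rho_learned, tol=1e-6, max_ratio=0.2):
    """Accept the candidate if few enough blocks differ from the learned pattern.

    A block counts as differing when |rho_candidate - rho_learned| > tol.
    Returns True when the share of differing blocks is below max_ratio.
    """
    differs = [
        abs(c - l) > tol
        for row_c, row_l in zip(rho_candidate, rho_learned)
        for c, l in zip(row_c, row_l)
    ]
    return sum(differs) / len(differs) < max_ratio
```

With six blocks, a candidate that differs from the learned pattern in one block (about 17%) is accepted, while one that differs in two blocks (about 33%) is rejected.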

In step S60, it is determined whether the head region determination based on the spatial covariance matrix ρ has been completed for all head region candidates. If not, the process returns to step S52 and the above steps are repeated for the remaining head region candidates.

According to this embodiment, because the head region of each object is recognized during object recognition, the position and posture of each object can be estimated from the position and posture of its head region even when occlusion occurs between objects and they cannot be told apart.

Further, according to this embodiment, the search range for an arbitrary input image is first narrowed by extracting head region candidates through texture matching, and the head region is then determined by a search based on spatial correlation that uses the texture pattern and the edge pattern on the head region candidates only. The head region of an object can therefore be extracted accurately with a small processing load.

1: head learning unit; 2: head identification unit; 3: database; 101: head pattern generation unit; 101a: texture pattern learning unit; 101b: edge pattern learning unit; 102: texture pattern storage unit; 103: edge pattern storage unit; 201: filtering unit; 202: color space conversion unit; 203: texture matching unit; 204: head region candidate integration unit; 205: pattern matching unit

Claims

A head region extraction device for extracting the head region of an object, comprising:
head learning means for learning a texture pattern and an edge pattern of head images;
texture matching means for extracting head region candidates by performing matching based on the texture pattern on an input image; and
pattern matching means for determining a head region by performing matching based on the texture pattern and the edge pattern on the head region candidates.

The head region extraction device for an object according to claim 1, further comprising head region candidate integration means for merging a plurality of mutually adjacent head region candidates into one based on their position information.

The head region extraction device for an object according to claim 1 or 2, wherein the head learning means includes means for obtaining the pixel values of each head image and calculating a pixel value mean and a pixel value variance for each pixel position, and
the pixel value mean and the pixel value variance obtained for each pixel position are learned as the texture pattern.

The head region extraction device for an object according to claim 2, wherein the head learning means includes means for converting the head images into the CIELab color space, and
the means for calculating the pixel value mean and the pixel value variance calculates the mean and variance of each channel value of the lightness dimension L and the color-opponent dimensions a and b.

The head region extraction device for an object according to claim 3 or 4, wherein the head learning means comprises:
means for calculating a gradient value for each pixel block of each head image; and
means for calculating a gradient value mean and a gradient value variance for each pixel block position,
and learns an edge pattern in which the gradient value at each pixel block position whose gradient value variance is below a predetermined reference value is represented by the gradient value mean.

The head region extraction device for an object according to claim 3, wherein the texture matching means comprises:
means for determining whether the difference between each pixel value of the input image and the corresponding pixel value mean of the texture pattern is below a predetermined value defined in relation to the corresponding pixel value variance; and
means for sequentially shifting a predetermined search range over the input image and calculating, for each search range, the proportion of pixels whose difference is not below the predetermined value,
and extracts as a head region candidate a search range in which that proportion is below a reference value.

The head region extraction device for an object according to claim 6, wherein the pattern matching means comprises:
means for calculating a first spatial variance matrix for each head region candidate based on the texture pattern and edge pattern of that candidate; and
means for calculating a second spatial variance matrix based on the learned texture pattern and edge pattern,
and determines as a head region a head region candidate for which the difference between the first spatial variance matrix and the second spatial variance matrix is below a predetermined reference value.

A head region extraction method in which a computer extracts the head region of an object, comprising:
learning a texture pattern and an edge pattern of head images;
extracting head region candidates by performing matching based on the texture pattern on an input image; and
determining a head region by performing matching based on the texture pattern and the edge pattern on the head region candidates.