JP5289993B2

JP5289993B2 - TRACKING DEVICE AND TRACKING METHOD

Info

Publication number: JP5289993B2
Application number: JP2009024190A
Authority: JP
Inventors: 浩輔松原
Original assignee: Olympus Imaging Corp
Current assignee: Olympus Imaging Corp
Priority date: 2009-02-04
Filing date: 2009-02-04
Publication date: 2013-09-11
Anticipated expiration: 2029-02-04
Also published as: JP2010183291A

Description

The present invention relates to a tracking device and a tracking method that sequentially process consecutive images and track a face appearing in each image.

Face detection techniques for detecting the face of a person or the like appearing in an image are conventionally known. Such a technique detects the position of a face in an image by, for example, template matching, detects the positions of facial feature points such as the eyes, nose, and mouth, and can thereby determine the size, orientation, and so on of the face. In an imaging apparatus such as a digital camera, for example, the subject image formed on the image sensor is displayed in real time (live view) and used as an electronic viewfinder. In recent years, apparatuses have become known that apply face detection to detect a face in the live view image and display a face frame indicating the detected face. The face detection result is also used for exposure and focus control. For example, Patent Literature 1 discloses a technique for performing focus detection based on the result of face detection (face recognition).

Separately, motion detection techniques are known that detect motion between images by performing pattern matching between consecutive images and calculating the amount of movement.

Patent Literature 1: JP 2006-227080 A

With conventional face detection techniques, detection accuracy can drop when, for example, the face is turned sideways or backward. Consequently, the face detection result can become unstable when the orientation of the face changes, and when the face frame described above is displayed on the live view image according to the face detection result, the frame may repeatedly appear and disappear, flickering and becoming hard to see.

In contrast, if motion detection is performed on a face detected by face detection, the motion of that face can be detected even when face detection fails because the orientation of the face or the like has changed greatly, so the face can be tracked without being lost. However, when multiple faces are detected in an image, performing motion detection on all of them increases the processing load. This is a particular problem for devices with limited processing capability, such as digital cameras. Furthermore, if face detection or motion detection takes too long, detection may become impossible in scenes with movement, or a photo opportunity may be missed, so stable face tracking could not be achieved.

The present invention has been made in view of the above, and an object thereof is to provide a tracking device and a tracking method capable of stably tracking faces appearing in consecutive images without increasing the processing load.

To solve the problems described above and achieve the object, a tracking device according to the present invention is a tracking device that tracks faces appearing in consecutive images, and includes: a face detection unit that sequentially processes the consecutive images and detects a plurality of faces in each image; a face region setting unit that sets, for each face detected by the face detection unit, a face region containing that face; a difficulty evaluation unit that evaluates the detection difficulty of each detected face based on the face detection result from the face detection unit; a face region selection unit that selects, from among the face regions containing the respective faces, face regions for which the detection difficulty of the face satisfies a predetermined condition; a motion detection target setting unit that, based on the selection result from the face region selection unit, switches whether each of the face regions is set as a target region for motion detection; and a motion detection unit that detects, between adjacent images, the motion of the target regions set by the motion detection target setting unit.

In the tracking device according to the present invention, in the above invention, the face region selection unit selects face regions containing a predetermined number of faces with high detection difficulty, and the motion detection target setting unit sets the face regions selected by the face region selection unit as the target regions.

In the tracking device according to the present invention, in the above invention, the face region selection unit selects face regions containing a predetermined number of faces with low detection difficulty, and the motion detection target setting unit does not set the face regions selected by the face region selection unit as the target regions.

In the tracking device according to the present invention, in the above invention, the face detection unit outputs at least one of face size, face position, face orientation, and face tilt as the face detection result, and the difficulty evaluation unit evaluates the detection difficulty of each face based on the face detection result of each face detected by the face detection unit, using as evaluation parameters one or more of at least face size, face position, face orientation, face tilt, change in face orientation, change in face tilt, face movement speed, and face movement direction.

In the tracking device according to the present invention, in the above invention, the difficulty evaluation unit estimates the possibility that each detected face will overlap another face, and evaluates the detection difficulty of each face using the estimation result as an evaluation parameter.

In the tracking device according to the present invention, in the above invention, the difficulty evaluation unit determines the movement speed and/or movement direction of each face from its face detection result, and estimates the possibility that each face will overlap another face based on the determination result.
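One way to picture the overlap estimate is a minimal sketch like the following. It assumes each face is represented by an axis-aligned box and a per-frame velocity, and it assumes linear extrapolation; the text only states that movement speed and/or direction are used, so the names `predict` and `may_overlap` and the extrapolation model are hypothetical.

```python
# Hypothetical sketch: predict whether two faces may overlap in a later frame.
# Box layout (x, y, width, height) and linear extrapolation are assumptions.
Box = tuple[float, float, float, float]
Velocity = tuple[float, float]  # displacement per frame (vx, vy)


def predict(box: Box, velocity: Velocity, frames: int = 1) -> Box:
    """Shift a box by its per-frame velocity over the given number of frames."""
    x, y, w, h = box
    vx, vy = velocity
    return (x + vx * frames, y + vy * frames, w, h)


def may_overlap(a: Box, va: Velocity, b: Box, vb: Velocity, frames: int = 1) -> bool:
    """Return True if the extrapolated boxes intersect."""
    ax, ay, aw, ah = predict(a, va, frames)
    bx, by, bw, bh = predict(b, vb, frames)
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
```

A face moving toward a stationary neighbor would be flagged, and that flag could then feed the difficulty evaluation as one more evaluation parameter.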

In the tracking device according to the present invention, in the above invention, the difficulty evaluation unit has a weighting unit that weights the evaluation parameters using weighting coefficients preset for each evaluation parameter, and evaluates the detection difficulty of each face based on the evaluation parameters weighted by the weighting unit.
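As a rough illustration of weighted evaluation, the sketch below combines the parameters as a weighted sum of absolute values. The weighted sum, the parameter names, and the weight values are all assumptions made for illustration; the text only states that each evaluation parameter has a preset weighting coefficient.

```python
# Hypothetical sketch of a weighted detection-difficulty score.
# Parameter names and weight values are illustrative, not from the patent.
WEIGHTS: dict[str, float] = {
    "yaw": 1.0,          # face orientation away from frontal, in degrees
    "tilt": 1.0,         # in-plane tilt, in degrees
    "yaw_change": 2.0,   # change in orientation since the previous frame
    "tilt_change": 2.0,  # change in tilt since the previous frame
    "speed": 0.5,        # movement speed in pixels per frame
}


def detection_difficulty(params: dict[str, float],
                         weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of evaluation parameters; a higher value means harder to detect."""
    return sum(weights[name] * abs(value)
               for name, value in params.items()
               if name in weights)
```

With these example weights, a frontal, still face scores 0, while a face turned 30 degrees and tilted 10 degrees scores 40.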

The tracking device according to the present invention, in the above invention, includes a face region determination unit that determines the face regions in the image based on the face regions containing the faces detected by the face detection unit and the motion of the target regions detected by the motion detection unit.

The tracking device according to the present invention, in the above invention, includes a display processing unit that switches between the consecutive images and displays them on a display unit, and the display processing unit displays a face frame indicating a face in the image according to the determined face region in the image.

The tracking device according to the present invention, in the above invention, includes: an imaging unit that images a subject frame by frame and sequentially generates the consecutive images; a shooting instruction unit that issues a shooting instruction; and an imaging condition setting unit that sets imaging conditions of the imaging unit using the latest face detection result detected by the face detection unit for the face in the determined face region in the image.

A tracking method according to the present invention is a tracking method for tracking faces appearing in consecutive images, and includes: a face detection step of sequentially processing the consecutive images and detecting a plurality of faces in each image; a face region setting step of setting, for each face detected in the face detection step, a face region containing that face; a difficulty evaluation step of evaluating the detection difficulty of each detected face based on the face detection result of the face detection step; a face region selection step of selecting, from among the face regions containing the respective faces, face regions for which the detection difficulty of the face satisfies a predetermined condition; a motion detection target setting step of switching, based on the selection result of the face region selection step, whether each of the face regions is set as a target region for motion detection; and a motion detection unit that detects, between adjacent images, the motion of the target regions set in the motion detection target setting step.

According to the present invention, for each face region containing a face detected by face detection, whether the region is set as a target region for motion detection can be switched according to the detection difficulty of the face evaluated from the face detection result. Motion detection can then be performed on the face regions set as target regions. This yields the effect that faces appearing in consecutive images can be tracked stably without increasing the processing load.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The present embodiment is described using an example in which the tracking device of the present invention is applied to a digital camera. The present invention is not limited by this embodiment. In the drawings, identical parts are denoted by identical reference signs.

(Embodiment)
FIG. 1 is a rear view of the digital camera 1. As shown in FIG. 1, the digital camera 1 includes a release switch (shutter switch) 3 disposed on the top surface of the camera body 2 for instructing the shooting timing; a power switch 4 and a menu switch 5 disposed on the back surface of the camera body 2; a cross key 6 having up, down, left, and right direction switches (up switch, down switch, left switch, and right switch); an OK switch 7 for confirming operations and the like; a mode dial 8 for switching among modes such as the shooting mode and the playback mode; a display unit 24 that displays various screens; and so on. The release switch 3 is, for example, a two-stage push button: a half press turns the first release switch ON, and a full press turns the second release switch ON. Although not shown, a flash, an imaging lens, and the like are disposed on the front surface of the camera body 2.

When the power switch 4 of the digital camera 1 is pressed to turn the power ON, the digital camera 1 becomes ready to shoot if the shooting mode is selected with the mode dial 8. In the shooting mode, the subject image entering through the imaging lens is output every frame (for example, every 1/30 second) and displayed in real time on the display unit 24 as a live view image, and the user presses the release switch 3 while watching the live view image to shoot still images or movies. If, on the other hand, the playback mode is selected with the mode dial 8 when the power is turned ON, the digital camera 1 enters the playback mode. In the playback mode, the user displays (plays back) still images and movies shot with the digital camera 1 on the display unit 24 for viewing.

First, an overview of the face detection function, one of the functions of the digital camera 1 of the present embodiment, is given. The digital camera 1 includes a face detection unit 17 (see FIG. 8) that detects the face of a person or the like appearing in an image, and performs face detection by image-processing the live view image captured every frame. It then displays a face frame indicating each detected face on the live view image.

FIG. 2 shows, in time series, an example of four live view images displayed in succession on the display unit 24. As described above, face detection detects the position of a face in an image by template matching and detects the positions of facial feature points such as the eyes, nose, and mouth, so detection accuracy can drop when the face is turned sideways or backward, when the face is strongly tilted, and so on. For example, in frame I11 of FIG. 2(a), the face of the person P in the live view image faces the front, so face detection succeeds and the face frame N11 is displayed. In frame I13 of FIG. 2(b), where the person P has moved, the face is turned sideways and strongly tilted, so face detection fails and no face frame is displayed. In the subsequent frame I15 of FIG. 2(c), the face of the person P still cannot be detected, and the face frame remains undisplayed. Then, in frame I17 of FIG. 2(d), the face of the person P is detected successfully and the face frame N14 is displayed.

As described, when the orientation or angle of the face changes between frames, the face frame may repeatedly appear and disappear. This happens particularly when a person moves vigorously, and the flickering face frame is hard to see. In addition, the face detection result is used for exposure and focus control, and these controls also become unstable. The present embodiment performs motion detection on face areas (face regions) in conjunction with face detection in the live view image, so that a face can be tracked without being lost even when face detection fails.

FIG. 3 schematically shows, in time series, an example of three live view images containing the faces F1 to F5 of five people. In FIG. 3, the faces detected by face detection in each of the frames I21, I23, and I25 are enclosed by dash-dot lines. The frames I21, I23, and I25 illustrate how the orientation, tilt, and so on of each face in the live view image change over time; they are not live view images of consecutive frames. For example, in frame I21 of FIG. 3(a), all five faces F1 to F5 are detected by face detection. In frame I23 of FIG. 3(b), face detection fails for the upper-left face F1 because the face is strongly tilted. In frame I25 of FIG. 3(c), in addition to the failure for the face F1 as in FIG. 3(b), face detection also fails for the lower-left and lower-right faces F2 and F5 because they have turned to face backward.

FIG. 4 shows an example of the same three live view images as FIG. 3. In FIG. 4, the faces detected by face detection in each of the frames I21, I23, and I25 are enclosed by dash-dot lines, and the face areas detected by motion detection in each frame are enclosed by dash-double-dot lines. As described in detail later, motion detection takes a face region (face area) containing a face detected by face detection as a motion detection area (target region), performs pattern matching between adjacent frames, and calculates the amount of movement. As long as the motion detection area can be matched between frames, the face can be tracked without being lost regardless of its orientation or tilt. In the example of FIG. 4, all five faces are found in each of the frames I21 to I25 shown in (a) to (c).

Face detection using template matching can detect multiple faces (for example, several tens) in a single image at the same time. When motion detection is performed on multiple face areas, however, each face area must be set as a motion detection area and pattern matching must be performed on each one individually, and doing all of this simultaneously increases the processing load. Securing the necessary processing speed therefore has a cost; for example, if motion detection is implemented in hardware, the circuit scale increases.

In the present embodiment, therefore, the number of motion detection areas on which motion detection can be performed simultaneously is determined in advance. The priority of each face is scored based on the face detection result, and when more faces appear in the live view image than the number of motion detection areas, face areas are selected in descending order of score, up to the number of motion detection areas, and set as motion detection areas. FIGS. 5 to 7 illustrate the principle of setting the motion detection areas and show the three live view images containing the faces F1 to F5 of five people that appear in FIGS. 3(a) to (c) and FIGS. 4(a) to (c). The number of motion detection areas is taken to be 3 here, but it can be set as appropriate according to the actual processing capability of the digital camera 1.
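The capped selection can be sketched as a top-N pick over the scored faces. This is a minimal illustration: the function name `select_motion_areas`, the face identifiers, and the score values are hypothetical, and only the limit of 3 comes from the text.

```python
import heapq

# Example limit from the text; in practice it depends on the camera's processing capability.
MAX_MOTION_AREAS = 3


def select_motion_areas(scores: dict[str, float],
                        limit: int = MAX_MOTION_AREAS) -> list[str]:
    """Pick up to `limit` face ids, highest priority score first."""
    if len(scores) <= limit:
        # Fewer faces than slots: every face area can be a motion detection area.
        return list(scores)
    return heapq.nlargest(limit, scores, key=scores.get)
```

For instance, with hypothetical scores `{"F1": 9, "F2": 3, "F3": 1, "F4": 7, "F5": 8}`, the call returns `["F1", "F5", "F4"]`.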

For example, when the faces F1 to F5 are detected by face detection, as indicated by the dash-dot lines in FIG. 5, a priority is scored for each of the faces F1 to F5. The scoring is described in detail later, but based on the face detection result for a face, the priority is scored higher the more difficult face detection in the next frame is expected to be. In other words, in the present embodiment, scoring the priority in this way amounts to evaluating the detection difficulty of each face detected by face detection. For example, when face detection shows that the face is turned away from the front or is tilted, or that the orientation or angle of the face has changed greatly compared with past frames, face detection for that face is considered likely to fail in the next frame, so such a face is given a high priority. The face areas to be used as motion detection areas are then selected based on the scored priorities.

For example, if scoring the priorities of the faces F1 to F5 shown in FIG. 5 gives high priority to the three faces F1, F4, and F5 enclosed by solid lines, the face areas of the faces F1, F4, and F5 in frame I21 are set as motion detection areas. The faces F1, F4, and F5 are then tracked by performing motion detection on each motion detection area, for example, against the next frame.

As a result of performing face detection in each frame in this way and performing motion detection on the face areas set as motion detection areas, in frame I23, for example, the face areas of the faces F1, F4, and F5 are each detected and tracked by motion detection, as indicated by the dash-double-dot lines in FIG. 6(a). Face detection, on the other hand, succeeds for the faces F2 to F5 and fails for the face F1. The face frame for the face F1, for which face detection failed in frame I23, can therefore continue to be displayed based on the motion detection result.
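How the two results might be combined into the face areas used for the face frames can be sketched as follows. This is only an illustration under assumptions: boxes are `(x, y, width, height)` tuples, motion is a per-face `(dx, dy)` vector, and a box from face detection is preferred over a motion-shifted box when both exist. The function name `confirm_face_areas` is hypothetical.

```python
# Hypothetical sketch: merge face detection and motion detection results.
Box = tuple[int, int, int, int]  # (x, y, width, height)
Vector = tuple[int, int]         # (dx, dy) movement between adjacent frames


def confirm_face_areas(prev_boxes: dict[str, Box],
                       detected: dict[str, Box],
                       motion: dict[str, Vector]) -> dict[str, Box]:
    """Use detected boxes where available; otherwise shift the previous box by its motion."""
    confirmed = dict(detected)
    for face_id, (dx, dy) in motion.items():
        if face_id not in confirmed and face_id in prev_boxes:
            x, y, w, h = prev_boxes[face_id]
            confirmed[face_id] = (x + dx, y + dy, w, h)
    return confirmed
```

In the frame I23 situation, the face F1 would have no entry in `detected` but would still receive a box from its motion vector, so its face frame stays on screen.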

In the present embodiment, the face areas to be used as motion detection areas for motion detection against the next frame are selected, for example, each time. Any face area that was found by motion detection but for which face detection failed, such as the face F1 in FIG. 6(a), is always selected as a motion detection area. If the number of face areas selected in this way is less than the number of motion detection areas, face areas with high priority, as scored for each face from the face detection result, are additionally selected.

In the example of FIG. 6(a), only one face area (face F1) was found by motion detection but failed face detection, which is fewer than the number of motion detection areas, 3, so motion detection can be performed on the face areas of two of the faces F2 to F5 for which face detection succeeded. Suppose that scoring the priority of each of the faces F2 to F5 based on the face detection result for frame I23 ranks them, from highest priority, as F5, F2, F4, F3. A face previously selected as a motion detection area may receive a low score. In this ranking, the face F2 is scored higher than the face F4, which was selected as a motion detection area last time. This is because, in frame I23, the face F2 has changed more from the immediately preceding frame than the face F4 has. In this case, the face areas of the faces F1, F2, and F5 enclosed by solid lines in FIG. 6(b) are set as motion detection areas, and the faces F1, F2, and F5 are tracked by performing motion detection on each motion detection area against the next frame.
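The per-frame reselection rule can be sketched as follows: faces that motion detection found but face detection missed are carried over unconditionally, and the remaining slots are filled by priority. The function name `choose_next_areas` and the numeric scores are hypothetical; only the ranking F5, F2, F4, F3 and the limit of 3 come from the example.

```python
def choose_next_areas(tracked: list[str],
                      detected_scores: dict[str, float],
                      limit: int = 3) -> list[str]:
    """Select motion detection areas for the next frame.

    tracked: face ids found by motion detection in the current frame.
    detected_scores: priority score per face id found by face detection
                     in the current frame.
    """
    # Faces tracked by motion detection but missed by face detection are always kept.
    mandatory = [face for face in tracked if face not in detected_scores][:limit]
    remaining = limit - len(mandatory)
    # Fill the remaining slots with the highest-priority detected faces.
    ranked = sorted(detected_scores, key=detected_scores.get, reverse=True)
    return mandatory + ranked[:remaining]


# Frame I23 example: F1 is tracked but not detected; F2 to F5 are detected.
areas = choose_next_areas(["F1", "F4", "F5"], {"F2": 8, "F3": 2, "F4": 5, "F5": 9})
print(areas)  # ['F1', 'F5', 'F2']
```

The result matches the selection in the example: F1, F2, and F5 become the motion detection areas, and F4 is dropped even though it was tracked in the previous round.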

As a result, in frame I25, for example, the face areas of the faces F1, F2, and F5 are each detected and tracked by motion detection, as indicated by the dash-double-dot lines in FIG. 7. Face detection, on the other hand, succeeds for the faces F3 and F4 and fails for the faces F1, F2, and F5. The face frames for the faces F1, F2, and F5, for which face detection failed in frame I25, can therefore continue to be displayed based on the motion detection result. By scoring the priority of each face detected by face detection in every frame and setting the face areas of high-priority faces as motion detection areas, the motion detection areas can be set appropriately. Limiting the number of motion detection areas to a predetermined number thus keeps the increase in processing load within an acceptable range while achieving stable tracking even when faces that are difficult to detect appear.

Next, the configuration of the digital camera 1 is described. FIG. 8 is a schematic block diagram showing a configuration example of the digital camera 1. As shown in FIG. 8, the digital camera 1 includes an imaging optical system 11, an image sensor 12, an AFE (Analog Front End) 13, a frame memory 14, a motion detection unit 15, an image processing unit 16, a face detection unit 17, a face area selection unit 18 serving as the face region selection unit and the motion detection target setting unit, a recording medium I/F 19, a recording medium holding unit 20, a recording medium 21, a video encoder 22, a display driver 23, a display unit 24, a video signal output terminal 25, an operation unit 26, a RAM 27, a ROM 28, a controller 29 serving as the display processing unit and the imaging condition setting unit, and so on.

The imaging optical system 11 includes an imaging lens, an aperture, a shutter, and so on, and forms the incident subject image on the image sensor 12. The image sensor 12 is a solid-state image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor; it receives light from the subject through the imaging optical system 11 and photoelectrically converts it to obtain image data (an analog electrical signal) frame by frame. The AFE 13 applies analog signal processing such as CDS (Correlated Double Sampling) and AGC (Automatic Gain Control) to the image data obtained by the image sensor 12, and then applies A/D conversion to convert it into a digital electrical signal. The image data digitized by the AFE 13 is output to the frame memory 14 and the motion detection unit 15 and is also temporarily recorded in the RAM 27.

The frame memory 14 is used as a working memory by the motion detection unit 15. It has an area for storing two frames of image data. While a live view image is displayed, it holds the image data of the live view image of the current frame (current frame image) and the image data of the live view image of the frame captured immediately before it (immediately preceding frame image).
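The two-frame behaviour of the frame memory 14 can be pictured with a small sketch. This is only an illustration under assumptions: the class name `FrameMemory` and its `push`, `previous`, and `current` members are hypothetical and do not appear in the source, and frames are represented by plain strings.

```python
from collections import deque


class FrameMemory:
    """Hypothetical two-slot buffer: immediately preceding frame and current frame."""

    def __init__(self):
        # maxlen=2 drops the oldest frame automatically when a third one arrives.
        self._frames = deque(maxlen=2)

    def push(self, frame):
        """Store a newly captured frame as the current frame."""
        self._frames.append(frame)

    @property
    def previous(self):
        # The preceding frame exists only once two frames have been stored.
        return self._frames[0] if len(self._frames) == 2 else None

    @property
    def current(self):
        return self._frames[-1] if self._frames else None


memory = FrameMemory()
for frame in ("frame0", "frame1", "frame2"):
    memory.push(frame)
```

After the three pushes, the buffer holds only the last two frames, which is all that frame-to-frame pattern matching needs.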

The motion detection unit 15 detects motion between frames based on the image data from the AFE 13. For example, it obtains the motion vector of each motion detection area between the images that are captured from the image sensor 12 frame by frame and output from the AFE 13, and thereby detects the movement of the motion detection area between frames. Specifically, the motion detection unit 15 uses the face areas selected by the face area selection unit 18 as motion detection areas. It then performs pattern matching between the immediately preceding frame image and the current frame image, which are input from the AFE 13 as needed and recorded in the frame memory 14, and calculates for each motion detection area a motion vector representing its amount of movement.

FIG. 9 shows an example of a motion detection area set in the immediately preceding frame image: within the angle-of-view range Ev of the immediately preceding frame image, a face area selected by the face area selection unit 18 is set as the motion detection area Ea1. FIG. 10 shows another example, in which a face area larger than the one in FIG. 9 is set as the motion detection area Ea2. Although FIGS. 9 and 10 each show a single motion detection area, in practice the face area selection unit 18 selects as many face areas as a predetermined number of motion detection areas, and the motion detection unit 15 sets each of these face areas as a motion detection area.

The motion detection unit 15 then performs pattern matching against the current frame image for the motion detection areas Ea1 and Ea2 set in the immediately preceding frame image as shown in FIGS. 9 and 10. To improve accuracy, the pattern matching is performed by setting a plurality of macroblocks within each motion detection area and obtaining a motion vector for each macroblock. In the example of FIG. 9, four macroblocks B are set in the motion detection area Ea1. In the example of FIG. 10, 25 macroblocks B are set in the motion detection area Ea2, so the number of macroblocks B depends on the size of the motion detection area. Alternatively, the same number of macroblocks may be set in every motion detection area, and the macroblock size can also be chosen as appropriate.
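Per-macroblock pattern matching can be sketched as an exhaustive block search. The source does not name a matching criterion, so the sum of absolute differences (SAD) used here is an assumption, as are the function names, the `search` range, and the representation of frames as nested lists of luminance values.

```python
def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(prev[py + j][px + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def block_motion_vector(prev, cur, x, y, size, search=2):
    """Return the (dx, dy) that minimises SAD for the block at (x, y) in prev."""
    height, width = len(cur), len(cur[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = x + dx, y + dy
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue  # candidate block would leave the image
            cost = sad(prev, cur, x, y, cx, cy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec


def area_motion_vectors(prev, cur, area, block):
    """Split area (x, y, w, h) into block x block macroblocks; one vector each."""
    ax, ay, aw, ah = area
    return [
        block_motion_vector(prev, cur, bx, by, block)
        for by in range(ay, ay + ah - block + 1, block)
        for bx in range(ax, ax + aw - block + 1, block)
    ]


# Synthetic 8 x 8 frames: the current frame is the previous one shifted right by 1 pixel.
prev_frame = [[x * x + 3 * y * y for x in range(8)] for y in range(8)]
cur_frame = [[row[x - 1] if x > 0 else 0 for x in range(8)] for row in prev_frame]
vectors = area_motion_vectors(prev_frame, cur_frame, (2, 2, 4, 4), 2)
```

Here a 4 x 4 motion detection area is divided into four 2 x 2 macroblocks, mirroring the four-macroblock layout of FIG. 9. How the per-macroblock vectors are combined into one vector for the area is not specified in this passage, so the sketch stops at the list of vectors.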

As shown in FIG. 8, the face detection unit 17 processes the image data digitized by the AFE 13 and recorded in the RAM 27 to detect faces, and temporarily records the face detection result in the RAM 27. For example, the face detection unit 17 detects the position of a face in the live view image by template matching, and detects the positions of facial feature points such as the eyes, nose, and mouth to determine the size, orientation, angle, and so on of the face. Face detection by the face detection unit 17 thus yields, for example, the presence or absence of a face in the live view image and the position, size, orientation, and angle (tilt) of each face. At least the latest face detection result is retained for each detected face.

The face area selection unit 18 selects as many face areas as the number of motion detection areas and, based on the selection, switches the motion detection areas used for motion detection against the next frame. In this embodiment, the face area selection unit 18 preferentially selects the face areas of high-priority faces, using the priority that the priority scoring unit 293 of the controller 29 assigns to each face detected by the face detection unit 17, and sets the face area of each selected face as a motion detection area.
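The selection rule described above amounts to taking the highest-priority faces up to the number of motion detection areas. A minimal sketch follows; the dictionary keys `priority` and `area` and the function name are hypothetical, chosen only for illustration.

```python
def select_motion_detection_areas(faces, max_areas):
    """Return the face areas of the max_areas highest-priority faces."""
    ranked = sorted(faces, key=lambda face: face["priority"], reverse=True)
    return [face["area"] for face in ranked[:max_areas]]


# Each area is (x, y, width, height); values are made up for the example.
faces = [
    {"priority": 12, "area": (10, 10, 40, 40)},
    {"priority": 30, "area": (100, 20, 60, 60)},
    {"priority": 21, "area": (200, 50, 30, 30)},
]
selected = select_motion_detection_areas(faces, max_areas=2)
```

With two motion detection areas available, the faces with priorities 30 and 21 are tracked and the remaining face is left to face detection alone.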

The image processing unit 16 reads the image data temporarily recorded in the RAM 27, applies various kinds of image processing to it, and converts it into image data suitable for recording, display, or the like. For example, when the image data of a captured image is recorded, or when recorded image data is displayed, it compresses or decompresses the image data according to the JPEG (Joint Photographic Experts Group) format or the like. The image data processed by the image processing unit 16 is output to the recording medium I/F 19 and recorded on the recording medium 21, or output to the video encoder 22 and displayed on the display unit 24.

The video encoder 22 sends the image data converted for display to the display driver 23. In the shooting mode, for example, the images captured from the image sensor 12 frame by frame and processed by the image processing unit 16 are displayed on the display unit 24 one frame after another, producing the live view display. In the playback mode, a captured image read from the recording medium 21 and processed by the image processing unit 16 is displayed on the display unit 24. The video encoder 22 also outputs image data for display, as needed, to an external device connected to the video signal output terminal 25. The display unit 24 displays captured images and live view images as well as various setting information of the digital camera 1, and is implemented by a display device such as an LCD (Liquid Crystal Display) or an EL (Electroluminescence) display.

The recording medium I/F 19 writes image data converted for recording, reads recorded image data, and so on, to and from the recording medium 21, which is removably held by the recording medium holding unit 20. The recording medium 21 is a memory card such as an xD-Picture Card (registered trademark) or a CompactFlash (registered trademark) card.

The operation unit 26 accepts various user operations, such as instructing the shooting timing, setting a mode such as the shooting mode or the playback mode, and setting shooting conditions, and notifies the controller 29 of the corresponding operation signals. It is implemented by button switches, dials, various sensors, and the like to which functions are assigned. The operation unit 26 includes the release switch 3, the power switch 4, the menu switch 5, the cross switch 6, the OK switch 7, and the mode dial 8 shown in FIG. 1.

The ROM 28 stores in advance the various camera programs that operate the digital camera 1 and implement its functions, together with the data used while these programs run. The RAM 27 is used as a working memory for the image processing unit 16 and the controller 29. For example, image data from the AFE 13 is temporarily recorded there, and the RAM 27 serves as a work area when generating the image data of the live view image to be displayed on the display unit 24 and when recording a captured image on the recording medium 21.

The controller 29 reads a camera program from the ROM 28 and executes it in response to operation signals from the operation unit 26 and the like, and controls the operation and memory of each unit of the digital camera 1, thereby governing the operation of the digital camera 1 as a whole. It also performs processing such as AF (autofocus), AE (automatic exposure), and AWB (automatic white balance). The controller 29 includes a shooting start instruction unit 291, a priority scoring unit 293 serving as a difficulty evaluation unit, a motion detection candidate setting unit 295 serving as a face region setting unit, and a face area determination unit 297 serving as a face region determination unit. The shooting start instruction unit 291 indicates the timing at which the shooting process starts. The priority scoring unit 293 scores the priority of each detected face based on the face detection result. In this embodiment, the priority scoring unit 293 uses face size, face position, face orientation, change in face orientation, face overlap, face angle, and change in face angle as evaluation parameters, and first obtains a score for each evaluation parameter. It then weights each score with the weighting coefficient preset for that evaluation parameter and sums the weighted scores to obtain the priority. The evaluation parameters are not limited to those listed. For example, the moving speed of a face may be determined from the face detection results and used as an evaluation parameter. The motion detection candidate setting unit 295 sets the next motion detection candidates based on the face areas obtained from the face detection result and the face areas obtained from the motion detection result. The face area determination unit 297 determines the face areas in the live view image of the current frame.

Next, the processing procedure performed by the digital camera 1 will be described. FIG. 11 is a flowchart showing the procedure of the basic process performed by the digital camera 1. When the power is turned on, the digital camera 1 performs processing according to the mode selected with the mode dial 8. That is, as shown in FIG. 11, when the currently selected mode is the shooting mode (step a1: Yes), the controller 29 captures an image of the subject image formed on the image sensor 12 (a live view image) (step a3) and proceeds to the face area detection process (step a5). In the face area detection process, the face areas in the live view image of the current frame captured in step a3 are detected by face detection and motion detection, and the face areas in this live view image are determined.

After the face area detection process, the controller 29 displays the live view image on the display unit 24 (step a7). The displayed live view image shows a face frame on each face area determined in step a5. If any face area was determined in step a5 (step a9: Yes), the controller 29 sets the imaging conditions based on the determined face areas and performs processing such as AF, AE, and AWB (step a11). Specifically, the controller 29 sets the imaging conditions using the latest face detection result available for the face of each face area determined in step a5. For a face area based on the face detection result (the face area of a face that was detected successfully this time), it uses the current face detection result. For a face area based on the motion detection result (the face area of a face whose detection failed this time), it uses the face detection result from the last time detection of that face succeeded. If no face area was determined in step a5 (step a9: No), the controller 29 performs AF, AE, AWB, and similar processing over the normal range (for example, the entire screen) (step a13).
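The choice of which face detection result feeds the imaging conditions in steps a9 to a13 can be sketched as follows. The function name and the dictionary keys `source`, `current_result`, and `last_successful_result` are hypothetical; the source describes the rule but not a data layout.

```python
def imaging_targets(confirmed_faces, full_frame):
    """Pick the detection result used for AF/AE/AWB for each determined face area."""
    if not confirmed_faces:
        # Step a13: no face area was determined, so use the normal range.
        return [full_frame]
    targets = []
    for face in confirmed_faces:
        if face["source"] == "face_detection":
            # Face detection succeeded this time: use the current result.
            targets.append(face["current_result"])
        else:
            # Area comes from motion detection: reuse the last successful result.
            targets.append(face["last_successful_result"])
    return targets


confirmed = [
    {"source": "face_detection", "current_result": "A@frame10",
     "last_successful_result": "A@frame9"},
    {"source": "motion_detection", "current_result": None,
     "last_successful_result": "B@frame7"},
]
targets = imaging_targets(confirmed, full_frame="entire screen")
```

In this example face A is handled with its current detection result, while face B, which is currently followed only by motion detection, falls back to the result from the last frame in which it was detected.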

Until the release switch 3 is pressed to the first stage and the first release switch is turned on (step a15: No), the process returns to step a3 and is repeated for each frame.

When the release switch 3 is pressed to the first stage and the first release switch is turned on (step a15: Yes), the controller 29 captures an image of the subject formed on the image sensor 12 in the same way as in step a3 (step a17). The process then proceeds to the face area detection process (step a19). In the face area detection process, the face areas in the live view image of the current frame captured in step a17 are detected by face detection and motion detection, and the face areas in this live view image are determined.

After the face area detection process, the controller 29 displays the live view image on the display unit 24 (step a21). The displayed live view image shows a face frame on each face area determined in step a19. If any face area was determined in step a19 (step a23: Yes), the controller 29 sets the imaging conditions based on the determined face areas and performs processing such as AF, AE, and AWB (step a25). Specifically, as in step a11, the controller 29 sets the imaging conditions using the latest face detection result available for the face of each face area determined in step a19. If no face area was determined in step a19 (step a23: No), the controller 29 performs AF, AE, AWB, and similar processing over the normal range (step a27).

Until the release switch 3 is pressed to the second stage and the second release switch is turned on (step a29: No), the process returns to step a17 and is repeated for each frame.

When the release switch 3 is pressed to the second stage and the second release switch is turned on (step a29: Yes), the process proceeds to the shooting process (step a31). Specifically, the shooting start instruction unit 291 instructs the start of the shooting process, taking the moment the second release switch is turned on as the shooting timing. The shooting process then starts and generates the image data of the captured image, which is recorded on the recording medium 21. When the shooting process starts, the live view display is temporarily suspended. It resumes after exposure, once the transfer and processing of the image data have finished.

The process then proceeds to step a33, where the controller 29 determines whether to end the shooting mode. If the shooting mode is to be ended (step a33: Yes), the process proceeds to step a43. If not (step a33: No), the process returns to step a3.

If the currently selected mode is not the shooting mode (step a1: No) but the playback mode (step a35: Yes), the controller 29 displays a list, for example as thumbnails, of the image data of still images and moving images that were shot earlier and recorded on the recording medium 21, and selects an image to play back from the list according to a user operation (step a37). The controller 29 then displays the selected image on the display unit 24 (step a39).

The process then proceeds to step a41, where the controller 29 determines whether to end the playback mode. If the playback mode is to be ended (step a41: Yes), the process proceeds to step a43. If not (step a41: No), the process returns to step a37.

In step a43, the controller 29 determines whether to end the basic process. For example, when the power switch 4 is pressed and power-off is instructed, the process ends (step a43: Yes). Otherwise (step a43: No), the process returns to step a1.

Next, the face area detection process performed in steps a5 and a19 of FIG. 11 will be described. FIG. 12 is a flowchart showing the detailed procedure of the face area detection process. In the face area detection process, the controller 29 first determines whether there are any motion detection candidates. Motion detection candidates are set in step b15 of FIG. 12. Consequently, in the first face area detection process performed after the shooting mode is selected, there are no motion detection candidates (step b1: No), and the process proceeds to step b9. That is, the face detection unit 17 performs face detection on the current frame image (step b9) and records the face detection result in the RAM 27 (step b11).

Next, the priority scoring unit 293 executes the priority scoring process (step b13). FIG. 13 is a flowchart showing the detailed procedure of the priority scoring process. The priority scoring process is performed for every face detected in step b9 of FIG. 12, and the priority scoring unit 293 scores the priority of each face based on the face detection result for that face.

First, the priority scoring unit 293 scores the face size based on the face detection result of step b9 in FIG. 12 (step c1). A large face is more important than a small one. The priority scoring unit 293 therefore assigns, for example, a higher score to a larger face.

Next, the priority scoring unit 293 scores the face position (step c3). The closer a face is to the center of the angle-of-view range, the more important it is. A face located at the edge of the angle-of-view range, on the other hand, may leave the frame in the next frame, so its importance is low. The priority scoring unit 293 therefore assigns, for example, a higher score the closer the face position is to the center of the angle-of-view range.
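Steps c1 and c3 state only that larger and more central faces score higher. One possible scoring, purely as an assumption, is the fraction of the frame covered by the face for size and a linear fall-off with distance from the frame center for position. The function names and the 0 to 1 score range are hypothetical.

```python
import math


def size_score(face_w, face_h, frame_w, frame_h):
    """Fraction of the frame covered by the face (larger face, higher score)."""
    return (face_w * face_h) / (frame_w * frame_h)


def position_score(cx, cy, frame_w, frame_h):
    """1.0 at the frame center, falling linearly to 0.0 at the corners."""
    half_w, half_h = frame_w / 2, frame_h / 2
    distance = math.hypot(cx - half_w, cy - half_h)
    return 1.0 - distance / math.hypot(half_w, half_h)


center_face = position_score(320, 240, 640, 480)
corner_face = position_score(0, 0, 640, 480)
edge_face = position_score(620, 240, 640, 480)
```

A face centered in a 640 x 480 frame gets the maximum position score, and a face near the right edge scores lower, reflecting the risk that it leaves the frame in the next frame.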

Next, the priority scoring unit 293 scores the face orientation (step c5). If a face is turned away from the front, it is likely to be turned away from the front in the next frame as well, so face detection is likely to fail. The priority scoring unit 293 therefore assigns, for example, a low score when the face is facing the front and a higher score the further the face is turned away from the front.

Next, the priority scoring unit 293 scores the change in face orientation (step c7). The RAM 27 retains the face detection results for the most recent several frames. The priority scoring unit 293 refers to these past results to calculate the change in face orientation and scores the calculated change. If the orientation of a face has changed over the past several frames, it is likely to change in the next frame as well. The larger the change, the more likely the orientation is to change greatly in the next frame, and the more likely face detection is to fail. The priority scoring unit 293 therefore assigns, for example, a higher score the larger the change in face orientation.
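The orientation change of step c7 can be estimated from the retained history in many ways. The sketch below assumes the orientation is available as a yaw angle in degrees per frame and uses the mean absolute frame-to-frame difference; the function names, the yaw representation, and the `max_change` normalisation constant are all assumptions.

```python
def orientation_change(yaw_history):
    """Mean absolute frame-to-frame change in yaw (degrees) over the history."""
    if len(yaw_history) < 2:
        return 0.0  # not enough frames to measure a change
    steps = [abs(b - a) for a, b in zip(yaw_history, yaw_history[1:])]
    return sum(steps) / len(steps)


def change_score(change, max_change=45.0):
    """Map a change to 0..1; larger changes score higher, capped at 1.0."""
    return min(change / max_change, 1.0)


steady = change_score(orientation_change([0, 0, 0]))
turning = change_score(orientation_change([0, 10, 30]))
```

A face that has been turning over the last frames receives a higher score than a steady one, which pushes it toward being tracked by motion detection. The same shape of calculation would apply to the face angle change scored later in step c15.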

Next, the priority scoring unit 293 estimates whether faces will overlap in the next frame (step c9). FIG. 14 illustrates face overlap estimation and schematically shows, in time series, an example of three live view images containing the faces F11 to F13 of three people. Looking at the faces F11 and F12, they approach each other between the frame I31 of FIG. 14(a) and the frame I33 of FIG. 14(b), and overlap in the frame I35 of FIG. 14(c). Overlap estimation predicts this kind of situation. The priority scoring unit 293 refers to the face detection results for the past several frames, determines the moving direction (the direction in which the orientation changes) and the moving speed (amount of movement) of each face from its position, orientation, and size, and estimates whether faces will overlap. In the frame I31 of FIG. 14(a), for example, the faces F11 to F13 are far apart, so it is estimated that they will not overlap in the next frame. In the frame I33 of FIG. 14(b), on the other hand, the faces F11 and F12 are close to each other, and from their moving directions and speeds it is estimated, for example, that they will overlap in the next frame I35.
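One simple way to realise the overlap estimation of step c9 is constant-velocity extrapolation of each face rectangle followed by a rectangle intersection test. This is an assumed model, not the method stated in the source, which also mentions orientation and size as inputs; the function names and the `(x, y, w, h)` rectangle format are hypothetical.

```python
def predict(face_prev, face_cur):
    """Extrapolate (x, y, w, h) one frame ahead assuming constant velocity."""
    x0, y0, _, _ = face_prev
    x1, y1, w, h = face_cur
    return (2 * x1 - x0, 2 * y1 - y0, w, h)


def rects_overlap(a, b):
    """True if two axis-aligned rectangles (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def will_overlap(track_a, track_b):
    """Each track is (previous rectangle, current rectangle) for one face."""
    return rects_overlap(predict(*track_a), predict(*track_b))


# Made-up tracks loosely following FIG. 14: F11 and F12 approach, F13 stays put.
f11 = ((0, 0, 40, 40), (20, 0, 40, 40))
f12 = ((120, 0, 30, 30), (90, 0, 30, 30))
f13 = ((300, 0, 30, 30), (300, 0, 30, 30))
```

With these tracks, F11 is predicted at x = 40 and F12 at x = 60 in the next frame, so their rectangles intersect, while F13 stays clear of both.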

As shown in FIG. 13, the priority scoring unit 293 then scores the face overlap based on the result of the overlap estimation (step c11). Specifically, based on the face detection results for the faces estimated to overlap, the priority scoring unit 293 assigns a low score to the face that will be hidden behind the other when they overlap. If, as estimated, that face overlaps another face in the next frame and is hidden behind it, no face frame needs to be displayed for it and no exposure or focus control is needed for it, so its importance is low. Which face ends up hidden can be judged from the face sizes: a larger face is closer to the camera, so when faces overlap, the smaller face is considered to be hidden behind the larger one. In the frame I35 of FIG. 14(c), for example, the small face F12 is hidden behind the large face F11. The score of such a hidden face is set lower than the scores of the other faces.
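The overlap scoring of step c11 can be sketched as lowering the score of the smaller face in each pair predicted to overlap. The function name, the use of face area as the size measure, and the concrete score values `low` and `normal` are assumptions made for illustration.

```python
def overlap_scores(face_areas, overlapping_pairs, low=0.0, normal=1.0):
    """Give the smaller face of each overlapping pair the low score.

    face_areas maps a face name to its area in pixels.
    """
    scores = {name: normal for name in face_areas}
    for a, b in overlapping_pairs:
        # The smaller face is assumed to be farther away and therefore hidden.
        hidden = a if face_areas[a] < face_areas[b] else b
        scores[hidden] = low
    return scores


areas = {"F11": 1600, "F12": 900, "F13": 900}
scores = overlap_scores(areas, [("F11", "F12")])
```

Here F12 is smaller than F11, so it is treated as the face that will be hidden and receives the low score, matching the situation in the frame I35.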

Next, the priority scoring unit 293 scores the face angle (step c13). If a face is tilted with respect to the vertical direction of the live view image, it is likely to be tilted in the next frame as well, so face detection is likely to fail. The priority scoring unit 293 therefore assigns, for example, a low score when the face is not tilted and a higher score the larger the tilt angle of the face.

Next, the priority scoring unit 293 scores the change in face angle (step c15). The priority scoring unit 293 refers to the face detection results for the past several frames to calculate the change in face angle and scores the calculated change. If the angle of a face has changed over the past several frames, it is likely to change in the next frame as well. The larger the change, the more likely the angle is to change greatly in the next frame, and the more likely face detection is to fail. The priority scoring unit 293 therefore assigns, for example, a higher score the larger the change in face angle.

Next, the priority scoring unit 293 weights each score (step c17). For example, a weighting coefficient is set in advance for each of the evaluation parameters (face size, face position, face orientation, change in face orientation, face overlap, face angle, and change in face angle), and each score is multiplied by its weighting coefficient. The weighting coefficients can be set as appropriate according to the importance of each evaluation parameter. For example, if large weighting coefficients are set for face orientation and face angle, which are evaluation parameters that can degrade the detection accuracy of the face detection unit 17, faces with high scores for these parameters are given a higher priority.

The priority scoring unit 293 then calculates the sum of the weighted scores as the priority (step c19). The process then returns to step b13 of FIG. 12 and proceeds to step b15. The priority scoring process described above is only an example. The scoring method and the evaluation parameters are not limited to those described and can be set as appropriate according to the face detection specifications and the like.
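Steps c17 and c19 together form a weighted sum over the seven evaluation parameters. The sketch below shows that calculation; the parameter key names and the numeric weights and scores are made-up illustration values, not values from the source.

```python
PARAMETERS = (
    "size", "position", "orientation", "orientation_change",
    "overlap", "angle", "angle_change",
)


def priority(scores, weights):
    """Weighted sum of the per-parameter scores (steps c17 and c19)."""
    return sum(weights[name] * scores[name] for name in PARAMETERS)


# Larger weights on orientation and angle favour faces that are hard to detect.
weights = {"size": 2, "position": 1, "orientation": 3, "orientation_change": 1,
           "overlap": 1, "angle": 3, "angle_change": 1}
scores = {"size": 1, "position": 1, "orientation": 0, "orientation_change": 0,
          "overlap": 1, "angle": 2, "angle_change": 0}
face_priority = priority(scores, weights)
```

With these values the tilted face contributes 3 x 2 = 6 from its angle score alone, which illustrates how a large weighting coefficient raises the priority of faces that face detection is likely to lose.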

In the subsequent step b15 of FIG. 12, the motion detection candidate setting unit 295 executes the motion detection candidate setting process. FIG. 15 is a flowchart showing the detailed procedure of the motion detection candidate setting process.

In the motion detection candidate setting process, the motion detection candidate setting unit 295 first sets face areas based on the motion detection result (step d1). As described above, in the first face area detection process performed after the shooting mode is selected, the motion detection of step b15 in FIG. 12 has not yet been performed, so no face area is set in step d1. In the second and subsequent face area detection processes, on the other hand, when it is determined in step b1 that there are motion detection candidates and motion detection is performed in step b5, as described later, the motion detection candidate setting unit 295 calculates, in this step d1, the position of each motion detection area in the current frame from the motion vector calculated for that motion detection area by the motion detection, and sets these positions as face areas based on the motion detection result. The motion detection candidate setting unit 295 then sets the face regions detected by the face detection as face areas based on the face detection result (step d3).

The motion detection candidate setting unit 295 then sets both the face areas based on the motion detection result and the face areas based on the face detection result in the current frame as the next motion detection candidates (step d5).

Next, the motion detection candidate setting unit 295 compares the positions of the face areas based on the motion detection result that have been set as the next motion detection candidates with the positions of the face areas based on the face detection result. If the position of any face area based on the face detection result matches the position of any face area based on the motion detection result, that is, if a face area based on the motion detection result and a face area based on the face detection result overlap (step d7: Yes), the motion detection candidate setting unit 295 excludes the face area based on the motion detection result, out of these overlapping face areas, from the next motion detection candidates (step d9). The process then returns to step b15 of FIG. 12 and proceeds to step b17.
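Steps d1 to d9 can be sketched in Python as follows. This is only an illustration under stated assumptions: face areas are modeled as axis-aligned rectangles, the overlap test is a simple rectangle intersection (the description does not specify how a position match is judged), and the names FaceArea, overlaps, and set_motion_detection_candidates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceArea:
    x: int
    y: int
    w: int
    h: int
    source: str  # "motion" or "detection"


def overlaps(a, b):
    """Assumed criterion: the two axis-aligned rectangles intersect."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def set_motion_detection_candidates(tracked, detected):
    """tracked: list of ((x, y, w, h), (dx, dy)) from the previous motion detection.
    detected: list of (x, y, w, h) from face detection on the current frame."""
    # Step d1: shift each motion detection area by its motion vector.
    motion_areas = [FaceArea(x + dx, y + dy, w, h, "motion")
                    for (x, y, w, h), (dx, dy) in tracked]
    # Step d3: face regions found by face detection.
    detection_areas = [FaceArea(x, y, w, h, "detection")
                       for x, y, w, h in detected]
    # Steps d5 to d9: keep every detection-based area and drop any
    # motion-based area that overlaps one of them.
    kept = [m for m in motion_areas
            if not any(overlaps(m, d) for d in detection_areas)]
    return kept + detection_areas
```

On the first pass after the shooting mode is selected, tracked is empty, so only the detection-based areas become candidates, which matches the behavior described for step d1.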

In step b17, the next motion detection candidates set by the motion detection candidate setting process of step b15 are confirmed as the face areas in the current frame. The process then returns to step a5 of FIG. 11 and proceeds to step a7, or returns to step a19 and proceeds to step a21. As a result, face frames are displayed at the face areas detected by this face area detection process on the live view image displayed in step a7 or step a21. The detected face areas are also used for exposure and focus control.

In the second and subsequent face area detection processes after the shooting mode is selected, as shown in FIG. 12, if the next motion detection candidates were set in step b15 of the previous face area detection process (step b1: Yes), the process proceeds to step b3, where the face area selection unit 18 executes the face area selection process. FIG. 16 is a flowchart showing the detailed procedure of the face area selection process.

In the face area selection process, the face area selection unit 18 first compares the number of motion detection candidates that have been set with the preset number N of motion detection areas. If the number of motion detection candidates is equal to or less than N (step e1: No), the face area selection unit 18 selects all the motion detection candidates (step e3). The process then returns to step b3 of FIG. 12 and proceeds to step b5.

If the number of motion detection candidates is greater than the number N of motion detection areas (step e1: Yes), the face area selection unit 18 proceeds to step e5. If the number L of face areas based on the motion detection result is 0 (step e5: Yes), the face area selection unit 18 proceeds to step e7. When L is 0, all the motion detection candidates that have been set are face areas based on the face detection result. In step e7, the face area selection unit 18 selects, from these face areas based on the face detection result, the face areas of N faces in descending order of the priority scored for each face. The process then returns to step b3 of FIG. 12 and proceeds to step b5.

If the number L of face areas based on the motion detection result is not 0 (step e5: No), the face area selection unit 18 proceeds to step e9. If the number of face areas based on the face detection result is 0 (step e9: Yes), the face area selection unit 18 proceeds to step e11. When the number of face areas based on the face detection result is 0, all the motion detection candidates that have been set are face areas based on the motion detection result, and their number is N. In step e11, the face area selection unit 18 selects all of these face areas based on the motion detection result. The process then returns to step b3 of FIG. 12 and proceeds to step b5.

If the number of face areas based on the face detection result is not 0 (step e9: No), the face area selection unit 18 proceeds to step e13. The face area selection unit 18 then selects all the face areas based on the motion detection result and, if these number fewer than N, also selects N - L face areas from the face areas based on the face detection result in descending order of priority. The process then returns to step b3 of FIG. 12 and proceeds to step b5.
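The branching of steps e1 to e13 can be summarized in a short Python sketch. The function name select_face_areas and the data layout (plain lists, with detection-based candidates carried as pairs of face area and priority) are assumptions made for illustration, not the data structures of the embodiment.

```python
def select_face_areas(motion_based, detection_based, n):
    """motion_based: face areas based on the motion detection result (count L).
    detection_based: (face_area, priority) pairs based on the face detection result.
    n: preset number of motion detection areas N."""
    # Detection-based areas ranked by priority, highest first.
    ranked = [area for area, _ in
              sorted(detection_based, key=lambda pair: pair[1], reverse=True)]
    # Steps e1 and e3: the candidates fit within N, so select them all.
    if len(motion_based) + len(detection_based) <= n:
        return list(motion_based) + ranked
    # Steps e5 and e7: L is 0, so take the N highest-priority faces.
    if not motion_based:
        return ranked[:n]
    # Steps e9 and e11: no detection-based areas, so select all motion-based areas.
    if not detection_based:
        return list(motion_based)
    # Step e13: all motion-based areas, then N - L detection-based areas by priority.
    remaining = max(n - len(motion_based), 0)
    return list(motion_based) + ranked[:remaining]
```

The design point the sketch makes visible is that motion-based face areas are never dropped in favor of detection-based ones; priority only decides which detection-based areas fill the remaining N - L slots.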

In step b5, the motion detection unit 15 performs motion detection in the current frame image using each face area selected in the face area selection process of step b3 as a motion detection area, and then records the motion detection result in the RAM 27 (step b7). The process then proceeds to step b9.

As described above, according to the present embodiment, faces in the live view image are detected by face detection, and the priority of each face can be scored according to the face detection result. For example, a high priority can be set for a face for which face detection in the next frame is expected to be difficult, such as when the person is moving vigorously. When face detection finds more faces in the live view image than the number for which motion detection can be performed (the number of motion detection areas), the face areas containing the high-priority faces can be selected for motion detection. In other words, faces for which face detection is expected to be difficult are selected preferentially and the movement of their face areas is detected by motion detection, so the faces appearing in successive images can be tracked stably, without losing sight of them and without increasing the processing load.

In the present embodiment, a face frame indicating a face can be displayed at each finally confirmed face area on the live view image. A stable, easy-to-view face frame display is therefore achieved, and problems such as flicker caused by the face frame repeatedly appearing and disappearing can be prevented.

In the embodiment described above, as many face areas as the preset number of motion detection areas are selected as motion detection areas, and motion detection is performed on the selected motion detection areas. When face areas based on the face detection result are selected as motion detection areas, they are selected in descending order of the priority scored for each face from the face detection result. Alternatively, among the faces detected by face detection, the face areas of low-priority faces may be selected and set so that motion detection is not performed on them.

In the embodiment described above, a face area that was found by motion detection but for which face detection failed is always selected as a motion detection area. Alternatively, the motion detection areas may be set taking the reliability of the motion detection result into account. For example, when the number of matching failures is large relative to the number of macroblocks B (see FIGS. 9 and 10) set in the motion detection area, or when the directions of the motion vectors obtained for the individual macroblocks B are not consistent, the reliability of the motion detection result is considered low. In such a case, the face area concerned may be excluded from the next motion detection areas. Referring to FIG. 7, suppose, for example, that the reliability of the motion detection result obtained using the face area of face F2 as a motion detection area was low. In this case, the face area of face F2 is excluded from motion detection with the next frame, and a high-priority face may instead be selected as a motion detection area from the face areas of faces F3 and F4, which are face areas based on the face detection result. In the illustrated example, face F4, for instance, would be selected as the motion detection area.
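One possible way to judge the reliability of a motion detection result from the two criteria mentioned above is sketched below in Python. The description gives no thresholds and no formula for directional consistency, so the function name is_reliable, the two threshold values, and the use of the length of the mean unit vector as a consistency measure are all assumptions.

```python
import math


def is_reliable(vectors, max_failure_ratio=0.5, min_consistency=0.7):
    """vectors: one entry per macroblock, (dx, dy) or None if matching failed.
    Both thresholds are hypothetical tuning values."""
    if not vectors:
        return False
    matched = [v for v in vectors if v is not None]
    # Criterion 1: too many macroblocks failed to match.
    if 1 - len(matched) / len(vectors) > max_failure_ratio:
        return False
    # Criterion 2: directional consistency, measured here as the length of
    # the mean unit vector (1.0 means all vectors point the same way).
    moving = [(dx, dy) for dx, dy in matched if (dx, dy) != (0, 0)]
    if not moving:
        return True  # every matched block is stationary, which is consistent
    ux = sum(dx / math.hypot(dx, dy) for dx, dy in moving) / len(moving)
    uy = sum(dy / math.hypot(dx, dy) for dx, dy in moving) / len(moving)
    return math.hypot(ux, uy) >= min_consistency
```

A face area for which this check returns False would be dropped from the next motion detection areas, freeing a slot for a detection-based face area chosen by priority.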

In the embodiment described above, face detection and motion detection are performed for every frame, but they may instead be performed at predetermined frame intervals. The frame intervals for face detection and for motion detection can also be set individually as appropriate. For example, motion detection may be performed every frame while face detection is performed once every several frames.
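Independent frame intervals for the two kinds of detection can be expressed as a simple modulo check, as in the following Python sketch. The function name schedule and the interval values (motion detection every frame, face detection every third frame) are illustrative assumptions only.

```python
def schedule(frame_index, motion_interval=1, face_interval=3):
    """Return (run_motion_detection, run_face_detection) for a frame.
    The interval values are illustrative, not taken from the description."""
    return (frame_index % motion_interval == 0,
            frame_index % face_interval == 0)


# Plan for the first six frames: motion detection runs on all of them,
# face detection only on frames 0 and 3.
plan = [schedule(i) for i in range(6)]
```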

The face area selection unit 18, the image processing unit 16, and the imaging start instruction unit 291, priority scoring unit 293, and motion detection candidate setting unit 295 that make up the controller 29 may be implemented in hardware, or may be implemented as software by executing a predetermined program. When they are implemented as software, a program for carrying out some or all of the processing shown in, for example, FIGS. 12, 13, 15, and 16 is recorded in the ROM 28. The controller 29 may then read out and execute this program to realize the functions of the face area selection unit 18, the image processing unit 16, the imaging start instruction unit 291, the priority scoring unit 293, and the motion detection candidate setting unit 295.

In the embodiment described above, an example in which the tracking device of the present invention is applied to a digital camera has been described. However, the application is not limited to digital cameras; the tracking device may also be applied to a camera attached to a mobile phone or a camera attached to a PC. It can also be applied when a moving image showing the faces of people or the like is played back on a personal computer or similar device.

FIG. 1 is a rear view of a digital camera.
FIG. 2 is a diagram showing, in time series, an example of live view images that are updated and displayed every frame.
FIG. 3 is a diagram showing an example of a face detection result for a live view image containing five faces.
FIG. 4 is a diagram showing an example of a face detection result and a motion detection result for a live view image containing five faces.
FIG. 5 is a diagram explaining the principle of setting motion detection areas.
FIG. 6 is another diagram explaining the principle of setting motion detection areas.
FIG. 7 is another diagram explaining the principle of setting motion detection areas.
FIG. 8 is a schematic block diagram showing a configuration example of the digital camera.
FIG. 9 is a diagram showing an example of a motion detection area set in the immediately preceding frame image.
FIG. 10 is a diagram showing another example of a motion detection area set in the immediately preceding frame image.
FIG. 11 is a flowchart showing the procedure of the basic processing performed by the digital camera.
FIG. 12 is a flowchart showing the detailed procedure of the face area detection process.
FIG. 13 is a flowchart showing the detailed procedure of the priority scoring process.
FIG. 14 is a diagram explaining face overlap estimation.
FIG. 15 is a flowchart showing the detailed procedure of the motion detection candidate setting process.
FIG. 16 is a flowchart showing the detailed procedure of the face area selection process.

1 Digital camera
2 Camera body
11 Imaging optical system
12 Imaging element
13 AFE
14 Frame memory
15 Motion detection unit
16 Image processing unit
17 Face detection unit
18 Face area selection unit
19 Recording medium I/F
20 Recording medium holding unit
21 Recording medium
22 Video encoder
23 Display driver
24 Display unit
25 Video signal output terminal
26 Operation unit
3 Release switch
4 Power switch
5 Menu switch
6 Cross key
7 OK switch
8 Mode dial
27 RAM
28 ROM
29 Controller
291 Imaging start instruction unit
293 Priority scoring unit
295 Motion detection candidate setting unit
297 Face area determination unit

Claims

1. A tracking device that tracks faces appearing in successive images, comprising:
a face detection unit that sequentially processes the successive images to detect a plurality of faces in the images;
a face region setting unit that sets, for each face detected by the face detection unit, a face region including that face;
a difficulty level evaluation unit that evaluates, based on the face detection result of the face detection unit, the detection difficulty level of each face detected by the face detection unit;
a face region selection unit that, based on the evaluation result of the difficulty level evaluation unit, selects a predetermined number of face regions from the face regions set by the face region setting unit, in descending order of the detection difficulty level of the faces;
a motion detection target setting unit that sets the face regions selected by the face region selection unit as target regions for motion detection; and
a motion detection unit that detects, between adjacent images, the motion of the target regions set by the motion detection target setting unit.

2. The tracking device according to claim 1, wherein the face detection unit outputs at least one of a face size, a face position, a face orientation, and a face inclination as the face detection result, and
the difficulty level evaluation unit evaluates the detection difficulty level of each face, based on the face detection result of each face detected by the face detection unit, using as evaluation parameters at least one of the face size, the face position, the face orientation, the face inclination, a change in the face orientation, a change in the face inclination, a face moving speed, and a face moving direction.

3. The difficulty evaluation unit estimates the possibility that each detected face overlaps with another face, and evaluates the detection difficulty of each face using the estimation result as the evaluation parameter. The tracking device described in 1.

4. The tracking device according to claim 3, wherein the difficulty level evaluation unit determines the moving speed and/or moving direction of each face from the face detection result of that face, and estimates, based on the determination result, the possibility that the face overlaps with another face.

5. The tracking device according to claim 2, wherein the difficulty level evaluation unit weights the evaluation parameters using a weighting coefficient set in advance for each evaluation parameter, and evaluates the detection difficulty level of each face based on the weighted evaluation parameters.

6. The tracking device according to claim 1, further comprising a face region determination unit that determines the face regions in the image based on the face regions including the faces detected by the face detection unit and the motion of the target regions detected by the motion detection unit.

7. The tracking device according to claim 6, further comprising a display processing unit that performs processing to display the successive images on a display unit while switching between them,
wherein the display processing unit displays a face frame indicating a face in the image in accordance with the face region in the image determined by the face region determination unit.

8. The tracking device according to claim 6, further comprising:
an imaging unit that images a subject frame by frame and sequentially generates the successive images;
a shooting instruction unit that issues a shooting instruction; and
an imaging condition setting unit that sets an imaging condition of the imaging unit using the latest face detection result detected by the face detection unit for the face in the face region in the image determined by the face region determination unit.

9. A tracking method for tracking faces appearing in successive images, comprising:
a face detection step of sequentially processing the successive images to detect a plurality of faces in the images;
a face region setting step of setting, for each face detected in the face detection step, a face region including that face;
a difficulty level evaluation step of evaluating, based on the face detection result of the face detection step, the detection difficulty level of each face detected in the face detection step;
a face region selection step of selecting, based on the evaluation result of the difficulty level evaluation step, a predetermined number of face regions from the face regions set in the face region setting step, in descending order of the detection difficulty level of the faces;
a motion detection target setting step of setting the face regions selected in the face region selection step as target regions for motion detection; and
a motion detection step of detecting, between adjacent images, the motion of the target regions set in the motion detection target setting step.