JP7630272B2 - Image processing device and control method thereof, imaging device, program, and storage medium

Info

Publication number: JP7630272B2
Application number: JP2020215656A
Authority: JP
Inventors: 亮太賀集
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-02-19
Filing date: 2020-12-24
Publication date: 2025-02-17
Anticipated expiration: 2040-12-24
Also published as: JP2021132369A

Description

The present invention relates to subject detection technology in image processing devices and imaging devices.

Digital cameras and similar products have long offered a tracking AF mode that detects human faces and eyes in the images continuously output from an image sensor and keeps optimizing the focus and exposure for the detected face or eye.

In recent years, machine learning techniques have made it possible to detect not only human faces and eyes but also various other types of subjects (Patent Document 1). In this technique, a specific subject is detected by inputting to a detector both an image and dictionary data trained on the subject to be detected. Changing the dictionary data input to the detector makes it possible to detect different types of subjects in an image.

As the number of subject types to be detected increases, so does the number of corresponding dictionary data types. In a device such as a digital camera, where the number of logic circuits constituting the detector and the processing power are limited, one conceivable control method detects multiple types of subjects by switching the dictionary data every frame. Here, one frame denotes the time or cycle needed to execute and complete the arithmetic processing and camera control for an image output from the image sensor.

Patent Document 1: JP 2010-154438 A

However, with a control method that merely switches the dictionary data for each frame in which an image is output, a subject that has already been detected cannot be detected in frames where the dictionary data corresponding to that subject is not input. As a result, the subject cannot be tracked stably.

The present invention has been made in view of the above problem, and its object is to provide an imaging device that can detect multiple types of subjects while still tracking a subject stably.

The image processing device according to the present invention comprises: storage means for storing a plurality of dictionary data for respectively detecting a plurality of different subjects from an image; detection means for detecting, in each frame of a plurality of frames of images obtained by imaging means, a subject corresponding to a part of the plurality of dictionary data by using that part of the dictionary data; and switching means for switching the dictionary data used by the detection means across the plurality of frames according to the subject detection result of the detection means, wherein the switching means switches the dictionary data used by the detection means such that dictionary data for which the corresponding subject has been detected by the detection means is used in subsequent frames at a shorter detection period than before the subject was detected, and such that the combination of dictionary data used by the detection means switches periodically from frame to frame.

According to the present invention, it is possible to provide an imaging device that can detect multiple types of subjects while still tracking a subject stably.

FIG. 1 is a side sectional view showing the configuration of a digital single-lens camera that is a first embodiment of the imaging device of the present invention.
FIG. 2 is a block diagram of the control system of the imaging device.
FIG. 3 is a flowchart explaining the one-frame operation of the imaging device.
FIG. 4 is a flowchart of the dictionary data priority calculation process in the first embodiment.
FIG. 5 is a flowchart for determining the dictionary data switching control method in the first embodiment.
FIG. 6 is a diagram showing examples of the detection period of each dictionary data.
FIG. 7 is a diagram showing an example of dictionary data control for a specific case.
FIG. 8 is a diagram showing examples of the detection period of each dictionary data when local dictionary data is defined.
FIG. 9 is a diagram showing an example of dictionary data control for a specific case in which local dictionary data is defined.
FIG. 10 is a diagram showing an example of the dictionary data priority table and detection periods in the second embodiment.
FIG. 11 is a diagram showing an example of dictionary data control for a specific case in the second embodiment.
FIG. 12 is a diagram showing subject detection frequency data in the third embodiment.
FIG. 13 is a diagram showing an example of the detection period of dictionary data in the third embodiment.
FIG. 14 is a flowchart showing the continuity check process.
FIG. 15 is a diagram showing a problem that arises, for example, at the start of continuous shooting.
FIG. 16 is a diagram showing a control concept that solves the problem shown in FIG. 15.
FIG. 17 is a flowchart showing control that realizes the control concept of FIG. 16.
FIG. 18 is a diagram showing the positional relationship of subjects when the main subject is reselected.

Embodiments are described in detail below with reference to the attached drawings. The following embodiments do not limit the invention according to the claims. Although the embodiments describe multiple features, not all of them are necessarily essential to the invention, and the features may be combined in any manner. In the attached drawings, identical or similar configurations are given the same reference numbers, and duplicate explanations are omitted.

(First Embodiment)
FIG. 1 is a side sectional view showing the configuration of a digital single-lens camera (hereinafter also referred to simply as the camera) 100, which is a first embodiment of the imaging device of the present invention. FIG. 2 is a block diagram showing the electrical configuration of the camera 100 of FIG. 1.

As shown in FIG. 1, in the camera 100 of this embodiment, a detachable, interchangeable lens unit 120 is attached to the front side (subject side) of the camera body 101. The lens unit 120 has a focus lens 121, an aperture 122, and the like, and is electrically connected to the camera body 101 via a mount contact portion 123. This makes it possible to adjust the amount of light taken into the camera body 101 and the focal position. The user can also adjust the focus lens 121 manually.

The image sensor 104 is composed of a CCD sensor, a CMOS sensor, or the like, and includes an infrared cut filter, a low-pass filter, and the like. During shooting, the image sensor 104 photoelectrically converts the subject image formed through the imaging optical system of the lens unit 120 and transmits a signal for generating a captured image to the calculation device 102. The calculation device 102 generates a captured image from the received signal, stores it in the image storage unit 107, and displays it on the display unit 105, such as an LCD. The shutter 103 shields the image sensor 104 from light when not shooting and opens during shooting to expose the image sensor 104.

Next, the configuration related to control is described with reference to FIG. 2. The calculation device 102 includes a multi-core CPU capable of processing multiple tasks in parallel, RAM, and ROM, as well as dedicated circuits for executing specific calculations at high speed. With this hardware, the calculation device 102 implements a control unit 201, a main subject calculation unit 202, a tracking calculation unit 203, a focus calculation unit 204, and an exposure calculation unit 205. The control unit 201 controls each part of the camera body 101 and the lens unit 120.

The main subject calculation unit 202 comprises a dictionary data priority calculation unit 211, a dictionary data switching control unit 212, a detector 213, and a main subject determination unit 214. The detector 213 detects specific areas (for example, a human face or eye, or a dog's face or eye) in an image. A specific area may not be detected at all, or multiple specific areas may be detected. Any known detection method, such as AdaBoost or a convolutional neural network, may be used. The detector may be implemented as a program running on a CPU, as dedicated hardware, or as a combination of the two. The type of subject detected can be changed by switching the dictionary data input to the detector 213.

The dictionary data is, for example, data in which the features of the corresponding subject are registered, and it describes control instructions for the logic circuit for each subject type. In this embodiment, the dictionary data for each subject is stored in the ROM of the calculation device 102 (that is, multiple types of dictionary data are stored). Because dictionary data exists for each subject type and each specific area, different types of subjects can be detected by switching the dictionary data. The dictionary data priority calculation unit 211 calculates the priority of the dictionary data at predetermined intervals, for example every frame, and based on the result the dictionary data switching control unit 212 determines the dictionary data to be input to the detector 213. The subject detection results obtained from the detector 213 are sent to the main subject determination unit 214, which determines the main subject from among the detected subjects and sets the main subject region. The main subject is determined by a known calculation method based on size, position, reliability of the detection result, and the like. If the detector 213 detects no subject or specific area, the main subject region is determined based on past detection results, feature quantities such as edges in the target frame, subject distance, and the like.

In this embodiment, the detector 213 performs subject detection using a CNN (convolutional neural network), and the dictionary data it uses to detect each subject consists of trained parameters generated in advance by machine learning of the CNN on an external device (PC) or on the imaging device 100.

The machine learning of the CNN may be performed by any method. For example, a predetermined computer such as a server may perform the machine learning, and the imaging device 100 may acquire the trained CNN from that computer. For example, the CNN of the subject detection unit 204 may be trained by having the predetermined computer perform supervised learning with training image data as input and the position of the subject corresponding to the training image data, and the like, as teacher data. Likewise, the CNN of the dictionary estimation unit 205 may be trained by having the predetermined computer perform supervised learning with training image data as input and the dictionary data corresponding to the subject in the training image data as teacher data. The trained parameters of the CNN are generated in this way. The CNN may also be trained on the imaging device 100 or on the image processing device described above.

The detector 213 used in this embodiment is assumed to have the processing capacity to detect multiple types of subjects per frame, but it does not always detect all subject types targeted by the camera within a single frame. In other words, this embodiment does not necessarily limit the processing capacity of the detector 213; rather, to conserve processing time and bus bandwidth, it detects multiple subject types, fewer than the full set, by switching dictionaries according to predetermined settings.

The tracking calculation unit 203 tracks the main subject region based on the detection information of the main subject. The focus calculation unit 204 calculates the control value of the focus lens 121 for focusing on the main subject region. The exposure calculation unit 205 calculates the control values of the aperture 122 and the image sensor 104 for properly exposing the main subject region.

The operation unit 106 includes a release switch, a mode dial, and the like, and the control unit 201 can receive shooting instructions, mode change instructions, and the like from the user through the operation unit 106. This concludes the description of the configuration of the camera 100 according to the first embodiment of the present invention.

Next, the one-frame operation of the camera 100 in this embodiment is described with reference to FIG. 3. The operation in the flowchart of FIG. 3 is executed repeatedly, once per frame.

In step S301, the control unit 201 reads out pixel signals from the image sensor 104 and generates image data from them. After the pixel signals and the generated image data are stored in the RAM, the process proceeds to step S302.

In step S302, the process proceeds to step S306 only for the first frame; otherwise it proceeds to step S303.

In step S303, the tracking calculation unit 203 performs tracking processing that calculates the position, in the current frame, of the main subject region set in the previous frame, using the tracking reference information generated in step S312 (described later) of the previous frame. Any known algorithm may be used for the tracking processing. For example, the feature quantity extracted by a predetermined feature extraction means from the main subject region set in the previous frame can serve as the tracking reference information, and tracking can be performed by searching the current frame for the region whose feature quantity is closest to it. The algorithm may be implemented as a program running on a CPU, as dedicated hardware, or as a combination of the two. After the tracking processing is complete, the tracking result (the position and size of the main subject region in the current frame) is stored in the RAM of the calculation device 102, and the process proceeds to step S304.
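The feature-distance search mentioned above can be illustrated with a minimal sketch. It assumes a grayscale image given as a list of rows, uses a coarse intensity histogram as the feature quantity, and exhaustively searches the current frame for the window with the smallest L1 distance to the reference. The names `patch_feature` and `track`, the histogram feature, and the exhaustive search are all illustrative assumptions; the embodiment permits any known tracking algorithm.

```python
def patch_feature(image, x, y, w, h, bins=4):
    """Coarse intensity histogram of the w x h window at (x, y).

    `image` is a list of rows of 0..255 integers (an assumed format).
    """
    hist = [0] * bins
    for row in image[y:y + h]:
        for value in row[x:x + w]:
            hist[min(value * bins // 256, bins - 1)] += 1
    return hist


def track(image, reference, w, h):
    """Return the (x, y) of the window whose feature is closest to `reference`."""
    best = None  # (distance, x, y)
    for y in range(len(image) - h + 1):
        for x in range(len(image[0]) - w + 1):
            feature = patch_feature(image, x, y, w, h)
            distance = sum(abs(a - b) for a, b in zip(feature, reference))
            if best is None or distance < best[0]:
                best = (distance, x, y)
    return best[1], best[2]
```

In this sketch the reference would be computed from the main subject region at the end of one frame and passed to `track` together with the next frame's image.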

In step S304, the focus calculation unit 204 uses the tracking result generated in step S303 and the signal information, image data, and the like generated in step S301 to calculate a control value for the focus lens 121 so that the region given by the tracking result is in focus. Any known method, such as a contrast method or a phase difference detection method, may be used to calculate the control value. When the calculation is complete, the focus calculation unit 204 notifies the control unit 201 of the control value. The control unit 201 controls the focus lens 121 through the mount contact portion 123 based on the notified control value. When this processing is finished, the process proceeds to step S305.

In step S305, the exposure calculation unit 205 uses the tracking result generated in step S303 and the signal information and image data generated in step S301 to calculate control values for the image sensor 104 and the aperture 122 so that the region given by the tracking result is properly exposed. Any known method may be used to calculate these control values. When the calculation is complete, the exposure calculation unit 205 notifies the control unit 201 of the control values. Based on the notified values, the control unit 201 controls the image sensor 104 and, through the mount contact portion 123, the aperture 122. When this processing is finished, the process proceeds to step S306.

In step S306, the control unit 201 reads the state of the operation unit 106. If the release switch is pressed, the process proceeds to step S307; otherwise it proceeds to step S308.

In step S307, the control unit 201 performs still image capture. The shutter 103 is driven based on the control values calculated in step S305 to expose the image sensor 104, and image data is generated from the pixel signals read out from the exposed image sensor 104. The generated image data is stored in an external storage medium such as an SD card. When this processing is finished, the process proceeds to step S308.

In step S308, the priority of each dictionary data is calculated. The details of this process are described later with reference to FIG. 4. The process then proceeds to step S309.

In step S309, the dictionary data switching control method is determined based on the calculated priority of each dictionary data. The details of this process are described later with reference to FIG. 5. The process then proceeds to step S310.

In step S310, the image data generated in step S301 and dictionary data are input to the detector 213, which detects the specific area of the subject corresponding to the dictionary data. The dictionary data to be input follows the switching control method determined in step S309. The process then proceeds to step S311.

In step S311, the main subject region is determined from the detection results obtained in step S310, for example by the method described above. The determination uses a known calculation method based on the detected part (for example, whole body, face, or eye), size, position, reliability of the detection result, and the like. Once the information on the determined main subject region has been stored in the RAM of the calculation device 102, the process proceeds to step S312.

In step S312, the tracking calculation unit 203 generates, from the main subject region information generated in step S311 and the image data generated in step S301, the tracking reference information to be used in the tracking processing of step S303 in the next frame. The generated tracking reference information is stored in the RAM of the calculation device 102.

When step S312 finishes, the one-frame operation of the camera 100 is complete. As long as the camera 100 continues to operate, the process returns to step S301 and repeats the processing described so far. This concludes the description of the one-frame operation of the camera 100 in this embodiment.
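The branching of the one-frame flow in FIG. 3 can be summarized in a short sketch. It only lists which steps run in a given frame and does not model the processing inside each step; the function name `steps_for_frame` and its two boolean parameters are illustrative assumptions, not names from the embodiment.

```python
def steps_for_frame(is_first_frame, release_pressed):
    """Return the step labels of FIG. 3 executed in one frame, in order."""
    steps = ["S301"]                       # read pixel signals, build image data
    if not is_first_frame:                 # S302: first frame skips to S306
        steps += ["S303", "S304", "S305"]  # tracking, focus, exposure
    if release_pressed:                    # S306: release switch check
        steps.append("S307")               # still image capture
    # Priority calculation, switching control, detection,
    # main subject determination, tracking reference generation.
    steps += ["S308", "S309", "S310", "S311", "S312"]
    return steps
```

For example, `steps_for_frame(False, False)` lists every step except S307, which matches a normal live-view frame in which the release switch is not pressed.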

Next, the method of calculating the dictionary data priorities in step S308 of FIG. 3 is described with reference to FIG. 4.

In step S400, the priority of each dictionary data set in the previous frame is initialized.

In step S401, whether a main subject region was output in the previous frame is determined by referring to the main subject region information stored in the RAM of the calculation device 102. If a main subject region was detected, the process proceeds to step S402; if no subject was detected, it proceeds to step S405.

In step S402, the priority of the dictionary data corresponding to the main subject detected in the previous frame is set to "high". In FIG. 4, the priority of dictionary data is set to one of two levels, "high" or "low". How priorities are set depends on the number of dictionary data and on the processing capacity of the detector (the number of types and the number of detections possible per frame).

In step S403, it is determined whether local dictionary data is defined for the main subject. For some subjects, both a whole area and a local area, which is a part of the whole area, are defined as detection areas. Specifically, the relationship is, for example, as follows.

(Person) Whole area: face; local area: eye
(Dog) Whole area: whole body or face; local area: eye

In this embodiment, detection of a local area (specific area) based on the above dictionary data is not performed on the entire captured image. Instead, an image obtained by cutting out, from the original captured image, the portion containing the whole area of the detected subject and its vicinity is input to the detector 213. In other words, if the target subject is a person, the eye, which is the local area, is detected only after the person's face has been detected. If corresponding local dictionary data exists, the process proceeds to step S404; if not, it proceeds to step S405.
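The cut-out of "the whole area and its vicinity" can be sketched as follows. The text does not say how large the vicinity is, so the `margin` parameter (a fraction of the box size added on every side) is purely an assumption, as are the function name and the `(x, y, w, h)` box format.

```python
def crop_for_local_detection(image_w, image_h, box, margin=0.2):
    """Expand a detected whole-area box by `margin` and clamp it to the image.

    `box` is (x, y, w, h) in pixels. Returns the crop rectangle in the
    same format; this rectangle is what would be passed to the detector
    together with the local dictionary data (for example, the eye).
    """
    x, y, w, h = box
    dx, dy = w * margin, h * margin
    left = max(0, int(x - dx))
    top = max(0, int(y - dy))
    right = min(image_w, int(x + w + dx))
    bottom = min(image_h, int(y + h + dy))
    return left, top, right - left, bottom - top
```

Restricting the local detection to this crop is also why a local area can only be found after its whole area has been detected.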

In step S404, the priority of the local dictionary data corresponding to the main subject is set to "high", and the process proceeds to step S405.

In step S405, it is determined whether any dictionary data has no priority assigned yet. If there is such dictionary data, the process proceeds to step S406; if not, the dictionary data priority calculation process of step S308 ends.

In step S406, the priority of the dictionary data that has no priority assigned yet is set to "low".
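Steps S400 to S406 can be sketched as a single function. The dictionary names and the `LOCAL_DICTIONARY` mapping below are hypothetical placeholders chosen for illustration, and the main subject of the previous frame is represented simply by the name of its dictionary data (or `None` if nothing was detected).

```python
HIGH, LOW = "high", "low"

# Hypothetical mapping from whole-area dictionary data to local dictionary data.
LOCAL_DICTIONARY = {"person_face": "person_eye", "dog": "dog_eye"}


def calc_priorities(dictionaries, prev_main_subject, local_map=LOCAL_DICTIONARY):
    """Assign 'high' or 'low' to each dictionary, following FIG. 4."""
    # S400: discard the priorities of the previous frame.
    priorities = {name: None for name in dictionaries}

    # S401: was a main subject detected in the previous frame?
    if prev_main_subject is not None and prev_main_subject in priorities:
        # S402: the main subject's dictionary gets high priority.
        priorities[prev_main_subject] = HIGH
        # S403/S404: so does its local dictionary, if one is defined.
        local = local_map.get(prev_main_subject)
        if local in priorities:
            priorities[local] = HIGH

    # S405/S406: everything still undecided gets low priority.
    for name, value in priorities.items():
        if value is None:
            priorities[name] = LOW
    return priorities
```

With a dog as the previous main subject, the dog and dog-eye dictionaries come out as "high" and all others as "low"; with no main subject, every dictionary is "low".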

Next, the method of determining the dictionary data switching control method in step S309 of FIG. 3 is described with reference to FIG. 5.

In step S500, the detection period of each dictionary data determined in the previous frame is initialized. Here, the detection period is a parameter, in units of [frame/detection], that expresses how many frames elapse per detection result. The unit is not limited to this; the period may instead be set as a time, for example [ms/detection].

The way of specifying which dictionary data the detector 213 uses for subject detection over multiple frames is also not limited to the above. For example, a plurality of data tables, each describing which combination of dictionary data the detector 213 uses in each frame, may be stored in advance in the ROM of the calculation device 102. The control unit 201 selects a table according to, for example, the user's setting of which subject to prioritize (person priority, animal (dog, cat, bird) priority, vehicle (two-wheeled, four-wheeled) priority, and so on) and loads it into the RAM. The detector 213 refers to the dictionary data in the order given by the table in the RAM and detects the subject corresponding to each dictionary data. The table in the RAM, which indicates the input order of the dictionary data, is rewritten as needed according to the subject detection status. For example, when a specific subject such as a bird is detected, the table is rewritten so that the dictionary data corresponding to a bird's eye is used preferentially, in order to detect the bird's eye next.

ステップＳ５０１では、検出周期が未設定で、優先度「高」の辞書データがあるか否かを判定する。そのような辞書データがある場合には、ステップＳ５０２に進み、その辞書の検出周期を１［ｆｒａｍｅ／検出］に設定する（毎フレーム検出結果を取得する辞書データとして選択する）。ステップＳ５０１、Ｓ５０２は、全ての優先度「高」の辞書データに検出周期を設定するまで繰り返し実行される。次にステップＳ５０３に進む。 In step S501, it is determined whether there is any dictionary data with a "high" priority for which the detection period has not been set. If such dictionary data is present, the process proceeds to step S502, where the detection period for that dictionary is set to 1 [frame/detection] (selected as dictionary data for which detection results are obtained for each frame). Steps S501 and S502 are repeatedly executed until detection periods have been set for all dictionary data with a "high" priority. Next, the process proceeds to step S503.

ステップＳ５０３では、残りの優先度「低」の辞書データの検出周期を複数フレームに設定する。具体的な優先度「低」の辞書データの検出周期の例については図６、図７を用いて後述する。次にステップＳ５０４に進む。 In step S503, the detection period for the remaining dictionary data with a "low" priority is set to multiple frames. Specific examples of detection periods for dictionary data with a "low" priority will be described later with reference to Figures 6 and 7. Next, the process proceeds to step S504.

ステップＳ５０４では、これまでのステップで決定された各辞書データの検出周期と１フレーム当りの検出可能回数とに基づいて、辞書データの制御スケジュールを決定する。具体的な辞書データの制御スケジュール例については図６、図７を用いて後述する。 In step S504, a control schedule for the dictionary data is determined based on the detection period of each dictionary data and the number of times detection is possible per frame determined in the previous steps. Specific examples of the control schedule for the dictionary data will be described later with reference to Figures 6 and 7.
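
Steps S500 to S504 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the dictionary names, and the round-robin sharing of the remaining detector slots are assumptions made for this example. Because the low-priority dictionaries simply share whatever slots are left, the interval they actually achieve can be longer than the nominal multi-frame period.

```python
from itertools import cycle


def assign_detection_periods(priorities, low_period=2):
    """Return {dictionary: detection period in frames} (steps S500 to S503)."""
    periods = {name: None for name in priorities}   # S500: reset all periods
    for name, priority in priorities.items():       # S501/S502: "high" -> every frame
        if priority == "high":
            periods[name] = 1
    for name, period in periods.items():            # S503: the rest -> multi-frame period
        if period is None:
            periods[name] = low_period
    return periods


def build_schedule(periods, slots_per_frame, num_frames):
    """Return, for each frame, the dictionaries fed to the detector (step S504)."""
    every_frame = [name for name, period in periods.items() if period == 1]
    rotating = [name for name, period in periods.items() if period > 1]
    if len(every_frame) > slots_per_frame:
        raise ValueError("more every-frame dictionaries than detector slots")
    # Assumption: multi-frame dictionaries take turns in the remaining slots.
    free_slots = min(slots_per_frame - len(every_frame), len(rotating))
    rotation = cycle(rotating)
    schedule = []
    for _ in range(num_frames):
        frame = list(every_frame)
        frame += [next(rotation) for _ in range(free_slots)]
        schedule.append(frame)
    return schedule


# Hypothetical case: a dog is the main subject, three detections per frame.
priorities = {"person": "low", "dog": "high", "cat": "low",
              "bird": "low", "four_wheel": "low", "two_wheel": "low"}
periods = assign_detection_periods(priorities)
schedule = build_schedule(periods, slots_per_frame=3, num_frames=3)
for number, frame in enumerate(schedule, start=1):
    print(number, frame)
```

In this sketch the dog dictionary occupies one slot in every frame and the other five dictionaries rotate through the two remaining slots.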

次に、具体的な辞書データの制御例について図６、図７を用いて説明する。本実施形態で想定する制御ケースでの、辞書データの種類と検出領域および検出器の処理制約について、図６（ａ）に示す。本実施形態では、辞書データとして、人、犬、猫、鳥、４輪、２輪に分類される被写体を検出するために６種類用意する。本実施形態では、検出器となる論理回路ブロックは１ブロックで、１フレーム分の画像信号が処理される時間で３回検出を掛けることが出来る。すなわち、１フレーム当たり３種類の辞書の処理が可能であるとする。４輪とは、４輪タイプの車両を検出するための辞書データを指し、車両としては例えば４輪を備えたラリーカーなどのレース用車両や一般的な乗用車が含まれる。また２輪とは、２輪タイプの車両を検出するための辞書データを指し、車両としては例えばモーターバイクや自転車が含まれる。 Next, a specific example of dictionary data control will be described with reference to Figs. 6 and 7. Fig. 6(a) shows the types of dictionary data, detection areas, and detector processing constraints in the control case assumed in this embodiment. In this embodiment, six types of dictionary data are prepared to detect subjects classified as people, dogs, cats, birds, four-wheeled vehicles, and two-wheeled vehicles. In this embodiment, the logic circuit block that serves as the detector is one block, and detection can be performed three times in the time it takes to process one frame of image signals. In other words, it is possible to process three types of dictionaries per frame. Four-wheeled vehicles refer to dictionary data for detecting four-wheeled vehicles, and examples of vehicles include racing vehicles such as rally cars with four wheels, as well as general passenger cars. Two-wheeled vehicles refer to dictionary data for detecting two-wheeled vehicles, and examples of vehicles include motorbikes and bicycles.

図６（ａ）の条件のもとに、図４，図５のＳ３０８、Ｓ３０９の両ステップによって、各辞書データの優先度及び検出周期が決定される。その結果を図６（ｂ）に示す。図６（ｂ）は、検出されている主被写体に応じて決定された各辞書の検出周期を示している。 Under the conditions in FIG. 6(a), the priority and detection period of each dictionary data are determined by steps S308 and S309 in FIG. 4 and FIG. 5. The results are shown in FIG. 6(b). FIG. 6(b) shows the detection period of each dictionary determined according to the detected main subject.

被写体が検出されていない場合は、全ての辞書データの検出周期は一律で２[ｆｒａｍｅ／検出]とする。一方、被写体として人が検出されている場合は、人についての辞書データのみ、検出周期を１[ｆｒａｍｅ／検出]とする。同様に、その他の被写体が検出されている場合は、検出されている被写体に対応する辞書データのみ、検出周期を１[ｆｒａｍｅ／検出]とする。 If no subject is detected, the detection period for all dictionary data is set to 2 [frame/detection]. On the other hand, if a person is detected as the subject, the detection period for only the dictionary data for people is set to 1 [frame/detection]. Similarly, if other subjects are detected, the detection period for only the dictionary data corresponding to the detected subject is set to 1 [frame/detection].

次に、図７を用いて、連続したフレームの途中で犬７０３が検出されるケースを想定した辞書データの制御例を示す。図７において、７０１は各フレームに対して、スケジューリングされている辞書データを表し、７０２は検出器２１３に入力される撮影画像を表している。 Next, an example of dictionary data control assuming a case where a dog 703 is detected in the middle of consecutive frames is shown using FIG. 7. In FIG. 7, 701 represents the dictionary data scheduled for each frame, and 702 represents the captured image input to the detector 213.

図７の例では、１～５フレーム目までは被写体が検出されていないので、全ての辞書データの検出周期は、２［ｆｒａｍｅ／検出］に設定されている。４フレーム目で犬７０３がフレーム内に現れ、５フレーム目の途中で犬の顔７０４が検出結果として出力され、犬が主被写体として判定される。６フレーム目以降は、犬の辞書データの検出周期が１［ｆｒａｍｅ／検出］で制御され、それ以外の辞書データ（人、猫、鳥、４輪、２輪）は２［ｆｒａｍｅ／検出］で制御される。６フレーム目以降は、検出されている主被写体に対応する辞書データが全てのフレーム（今回以降の毎フレーム）で検出器２１３に入力されるので、安定した被写体追尾が可能となる。また主被写体以外の辞書データも検出器に入力されているので、主被写体以外の種類の被写体が撮影画像内に現れたとしても、検出可能であり、主被写体を他の種類の被写体に変更することも可能となる。 In the example of FIG. 7, since no subject is detected in the first to fifth frames, the detection cycle for all dictionary data is set to 2 [frame/detection]. In the fourth frame, a dog 703 appears in the frame, and in the middle of the fifth frame, a dog's face 704 is output as a detection result, and the dog is determined to be the main subject. From the sixth frame onwards, the detection cycle for the dog dictionary data is controlled at 1 [frame/detection], and the other dictionary data (people, cats, birds, four-wheeled vehicles, two-wheeled vehicles) is controlled at 2 [frame/detection]. From the sixth frame onwards, dictionary data corresponding to the detected main subject is input to the detector 213 in all frames (every frame from this frame onwards), making it possible to stably track the subject. In addition, since dictionary data other than the main subject is also input to the detector, even if a type of subject other than the main subject appears in the captured image, it is possible to detect it and change the main subject to another type of subject.
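
The frame sequence described for FIG. 7 can be simulated with a short sketch. It assumes the six dictionaries and three detections per frame of FIG. 6(a); the names `simulate` and `detected_at` and the round-robin rotation are assumptions for this example, and the exact slot assignment in the figure may differ. In particular, with one slot reserved for the main subject, the other five dictionaries share two slots, so the interval they achieve here is somewhat longer than the nominal 2 [frame/detection].

```python
DICTIONARIES = ["person", "dog", "cat", "bird", "four_wheel", "two_wheel"]
SLOTS_PER_FRAME = 3  # detections per frame, as in FIG. 6(a)


def simulate(num_frames, detected_at):
    """Return the dictionaries used in each frame.

    detected_at maps a frame number to the subject type judged to be the
    main subject in that frame; the new control applies from the next frame.
    """
    main = None
    cursor = 0
    timeline = []
    for frame in range(1, num_frames + 1):
        fixed = [main] if main else []                 # main subject: every frame
        others = [d for d in DICTIONARIES if d != main]
        free = SLOTS_PER_FRAME - len(fixed)
        picked = [others[(cursor + i) % len(others)] for i in range(free)]
        cursor = (cursor + free) % len(others)         # continue the rotation
        timeline.append(fixed + picked)
        main = detected_at.get(frame, main)
    return timeline


timeline = simulate(num_frames=8, detected_at={5: "dog"})
for number, used in enumerate(timeline, start=1):
    print(number, used)
```

In this run the dog dictionary is not scheduled in frame 4, is scheduled in frame 5 (where the dog is detected), and is used in every frame from frame 6 onwards.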

次に、局所辞書データがある場合の辞書データの制御例について説明する。図８（ａ）は、辞書データの種類と検出器の処理制約を示している。また図８（ｂ）は、図４、図５のＳ３０８、Ｓ３０９の両ステップによって決定された、各辞書データ、局所辞書データの検出周期について示している。局所辞書データが定義されている被写体が主被写体として検出されると、局所辞書データも優先度「高」に設定され、検出周期１［ｆｒａｍｅ／検出］で制御される。従って、主被写体以外に対応する辞書データの検出周期は相対的に低下する。 Next, an example of dictionary data control when local dictionary data is present will be described. FIG. 8(a) shows the types of dictionary data and the processing constraints of the detector. FIG. 8(b) shows the detection periods of each dictionary data and local dictionary data determined by steps S308 and S309 in FIGS. 4 and 5. When a subject for which local dictionary data is defined is detected as the main subject, the local dictionary data is also set to a "high" priority and is controlled at a detection period of 1 [frame/detection]. Consequently, the dictionary data for subjects other than the main subject is input to the detector relatively less frequently.

図９は、図８によって決定された辞書データの検出周期による辞書データの制御例を示している。図９に示す５フレーム目で犬が検出され、６フレーム目から辞書データの制御方法が変更される。６フレーム目からは対応する局所辞書データも１［ｆｒａｍｅ／検出］の検出周期で検出器に入力され、犬の局所領域である瞳９０５が検出されるようになる。 Figure 9 shows an example of dictionary data control using the dictionary data detection cycle determined by Figure 8. A dog is detected in the fifth frame shown in Figure 9, and the dictionary data control method is changed from the sixth frame onwards. From the sixth frame onwards, the corresponding local dictionary data is also input to the detector at a detection cycle of 1 [frame/detection], and the dog's local region, the eye 905, is detected.
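
The effect of the local dictionary data on the other dictionaries can be checked with simple arithmetic. The sketch below assumes three detections per frame and five non-main dictionaries, as in FIG. 6(a), and round-robin sharing of the free slots; the constraints of FIG. 8(a) may differ, and the function name is hypothetical. The results are averages under that assumption, not the nominal periods listed in the figures.

```python
from fractions import Fraction


def average_interval(rotating, slots_per_frame, every_frame):
    """Average number of frames between uses of one rotating dictionary.

    rotating:        number of dictionaries sharing the free slots
    slots_per_frame: detections the detector can run per frame
    every_frame:     number of dictionaries used in every frame
    """
    free = slots_per_frame - every_frame
    if free <= 0:
        raise ValueError("no detector slot left for the other dictionaries")
    return max(Fraction(1), Fraction(rotating, free))


print(average_interval(6, 3, 0))  # no main subject
print(average_interval(5, 3, 1))  # dog dictionary every frame
print(average_interval(5, 3, 2))  # dog and dog-eye dictionaries every frame
```

Adding the local dictionary takes a second slot in every frame, so each remaining dictionary is used less often.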

以上説明したように、本実施形態では、辞書データ数と比較して検出器を構成する論理演算回路の数が少なく、処理能力が乏しいような装置において、被写体の検出状況に応じて、各フレームに用いる各辞書データの優先度を動的に算出する。そして、辞書データの制御方法を変更することによって、複数種類の被写体を検出可能でありながら、特定の被写体が検出された後は、安定した被写体追尾を可能とすることができる。 As described above, in this embodiment, in a device with a small number of logic operation circuits constituting a detector compared to the number of dictionary data and with poor processing power, the priority of each dictionary data to be used for each frame is dynamically calculated according to the subject detection situation. Then, by changing the control method of the dictionary data, it is possible to detect multiple types of subjects, while enabling stable subject tracking after a specific subject is detected.

（第２の実施形態） (Second Embodiment)
次に、本発明の第２の実施形態について説明する。第１の実施形態においては、検出された主被写体以外の各辞書データの優先度を一律で相対的に下げていた。本実施形態では、主被写体以外の各辞書データそれぞれに対して優先度を算出し、どの辞書データを優先的に検出器２１３に入力するかを決定する。各辞書データの優先度の優劣は、実際のユースケースに則して決定することが望ましい。 Next, a second embodiment of the present invention will be described. In the first embodiment, the priorities of all dictionary data other than that of the detected main subject were lowered uniformly. In this embodiment, a priority is calculated individually for each dictionary data other than that of the main subject, and it is determined which dictionary data is to be preferentially input to the detector 213. The relative priorities of the dictionary data are preferably determined in accordance with actual use cases.

犬が主被写体である場合を例に挙げると、犬と一緒に撮影されるユースケースが最も多い被写体は、人と考えられる。想定シーンとしては、散歩や、人が犬を抱きかかえる様子や、人と犬が遊ぶ様子などが考えられ、人の辞書データの優先度は高く設定するべきと考えられる。一方で、犬が４輪や２輪といった乗物と一緒に撮影されるユースケースは少ないと考えられるので、４輪や２輪の辞書データの優先度は低く設定するべきと考えられる。 For example, if a dog is the main subject, it is thought that the subject that is most often photographed together with a dog is a person. Possible scenes include a walk, a person holding a dog, or a person playing with a dog, so the priority of the dictionary data for people should be set high. On the other hand, it is thought that there are few use cases in which a dog is photographed together with a four-wheeled or two-wheeled vehicle, so the priority of the dictionary data for four-wheeled or two-wheeled vehicles should be set low.

実際の制御では、このような考え方に基づいて、各辞書データの優先度を予め設計テーブルとしてＲＯＭ上に用意しておき、検出された被写体の状態に応じて、そのテーブルを参照して辞書データの優先度を決定する。 In actual control, based on this concept, the priority of each dictionary data is prepared in advance as a design table stored in ROM, and the priority of the dictionary data is determined by referring to this table depending on the state of the detected subject.

図１０（ａ）は、辞書データの優先度テーブルの具体例を示している。ここでは、辞書データの優先度を５(高)～１(低)の５段階で表している。このテーブルに基づいて、各辞書データの検出周期を算出し、辞書データの切替制御方法を決定する。図１０（ｂ）に、図１０（ａ）の優先度に基づいて決定した各辞書データの検出周期の例を示す。 Figure 10(a) shows a specific example of a dictionary data priority table. Here, the priority of dictionary data is expressed in five stages from 5 (high) to 1 (low). Based on this table, the detection period for each dictionary data is calculated and a dictionary data switching control method is determined. Figure 10(b) shows an example of the detection period for each dictionary data determined based on the priority in Figure 10(a).
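
A table lookup of this kind might look like the sketch below. The priority values in the dog row follow only the qualitative reasoning in the text (person high, vehicles low); the concrete numbers, the single-row table, and the rule `period = 6 - priority` are assumptions for illustration and are not taken from FIG. 10.

```python
MAX_PRIORITY = 5  # priorities run from 5 (high) to 1 (low)

# Hypothetical design table: one row per main subject type.
PRIORITY_TABLE = {
    "dog": {"dog": 5, "person": 4, "cat": 3, "bird": 2,
            "four_wheel": 1, "two_wheel": 1},
}


def detection_periods(main_subject):
    """Map each dictionary's priority to a detection period in frames."""
    row = PRIORITY_TABLE[main_subject]
    # Assumed rule: priority 5 -> every frame, priority 1 -> every 5 frames.
    return {name: MAX_PRIORITY + 1 - priority for name, priority in row.items()}


dog_periods = detection_periods("dog")
print(dog_periods)
```

Any monotonic mapping from priority to period would serve the same purpose; the linear rule is chosen here only because it is easy to read.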

次に、図１１を用いて、連続したフレームの途中で犬１１０３が検出されるケースを想定した辞書データの制御例を示す。５フレーム目で犬１１０３が被写体として検出され、６フレーム目から辞書データ制御を変更している。６フレーム目から犬の局所領域である犬の瞳に対応する辞書データが検出器に入力されるので、犬の瞳１１０５が検出される。また犬以外の辞書データの検出周期は図１０（ｂ）に則ってスケジューリングされる。 Next, FIG. 11 shows an example of dictionary data control assuming a case where a dog 1103 is detected in the middle of consecutive frames. A dog 1103 is detected as a subject in the fifth frame, and dictionary data control is changed from the sixth frame onwards. Dictionary data corresponding to the dog's eyes, which are a local region of the dog, is input to the detector from the sixth frame onwards, and the dog's eyes 1105 are detected. The detection cycle for dictionary data other than the dog is scheduled according to FIG. 10(b).

上述したように、本実施形態では、各辞書データの優先度を、実際のユースケースに則して決定する。それによって、検出器を構成する論理演算回路の数や処理能力が限られている装置においても、様々な種類の被写体をより早く検出することができ、主被写体の変更もより迅速に行うことができるようになる。 As described above, in this embodiment, the priority of each dictionary data is determined in accordance with the actual use case. This makes it possible to detect various types of subjects more quickly and to change the main subject more quickly, even in a device with a limited number of logic operation circuits constituting the detector and limited processing power.

（第３の実施形態） (Third Embodiment)
次に、本発明の第３の実施形態について説明する。第３の実施形態では、図３に記載の各辞書データの優先度算出ステップを、過去の検出履歴に応じて行う。本実施形態が第１の実施形態と異なる点は、図３のステップＳ３０８、Ｓ３０９、Ｓ３１１のみであり、その他の処理は第１の実施形態と同じである。 Next, a third embodiment of the present invention will be described. In the third embodiment, the priority calculation step for each dictionary data shown in FIG. 3 is performed according to the past detection history. This embodiment differs from the first embodiment only in steps S308, S309, and S311 in FIG. 3, and the other processes are the same as those in the first embodiment.

ステップＳ３１１の、検出された被写体の中から主被写体を判定するプロセスにおいて、主被写体と同フレームで（同時に）検出されている被写体の検出情報を、被写体種類ごとに累積して被写体検出頻度データとして蓄積する処理を追加する。 In the process of determining the main subject from among the detected subjects in step S311, a process is added in which the detection information of subjects detected in the same frame (simultaneously) as the main subject is accumulated for each subject type and stored as subject detection frequency data.

図１２は、被写体種類ごとの被写体検出頻度データ例を棒グラフで示した図で、特定の被写体が主被写体として検出されている場合に、同一フレーム内でその他の種類の被写体が検出された検出頻度を表している。検出対象とする被写体は、第１の実施形態と同様に、人、犬、猫、鳥、４輪、２輪としていて、それぞれの被写体が主被写体として検出されている場合のその他の被写体の検出頻度を棒グラフで表している。ここで主被写体として検出されている被写体の種類自体の検出頻度は対象外となるので、Ｎ／Ａ（Not Applicable）と表記している。 Figure 12 is a bar graph showing an example of subject detection frequency data for each subject type, and indicates the detection frequency of other types of subjects being detected in the same frame when a specific subject is detected as the main subject. As in the first embodiment, the subjects to be detected are people, dogs, cats, birds, four-wheeled vehicles, and two-wheeled vehicles, and the bar graph shows the detection frequency of other subjects when each of these subjects is detected as the main subject. Here, the detection frequency of the type of subject itself detected as the main subject is not included, and is therefore indicated as N/A (Not Applicable).

次に、ステップＳ３０８において、前回のステップＳ３１１で蓄積処理された被写体検出頻度データを参照して、その時の主被写体として検出されている被写体種類に応じて、今回の被写体検出における各辞書データの優先度を決定する。検出頻度が高い被写体種類の辞書データの優先度が高く、検出頻度の低い被写体種類の辞書データの優先度は低く設定される。 Next, in step S308, the subject detection frequency data accumulated in the previous step S311 is referenced, and the priority of each dictionary data for the current subject detection is determined according to the subject type detected as the main subject at that time. The priority of dictionary data for subject types that are detected frequently is set to high, and the priority of dictionary data for subject types that are detected infrequently is set to low.

次に、ステップＳ３０９において、各辞書データの検出周期を既に決定した辞書データの優先度に基づいて決定する。図１３に、決定した各辞書データの検出周期の例を示す。 Next, in step S309, the detection period for each dictionary data is determined based on the priority of the dictionary data that has already been determined. FIG. 13 shows an example of the determined detection period for each dictionary data.
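
The accumulation added to step S311 and the priority decision in step S308 can be sketched with a co-occurrence counter. The class and method names are hypothetical, and ranking the dictionaries by raw count is an assumption; the embodiment only states that more frequently detected subject types receive higher priority.

```python
from collections import Counter, defaultdict


class DetectionHistory:
    """Counts, per main subject type, which other types appear in the same frame."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def record(self, main_subject, detected_types):
        """Addition to S311: accumulate subjects detected alongside the main subject."""
        for subject_type in set(detected_types):
            if subject_type != main_subject:   # the main subject itself is N/A
                self.counts[main_subject][subject_type] += 1

    def ranked_dictionaries(self, main_subject, all_types):
        """S308: order the other dictionaries, most frequently co-detected first."""
        counts = self.counts[main_subject]
        others = [t for t in all_types if t != main_subject]
        return sorted(others, key=lambda t: counts[t], reverse=True)


TYPES = ["person", "dog", "cat", "bird", "four_wheel", "two_wheel"]
history = DetectionHistory()
for _ in range(3):
    history.record("dog", ["dog", "person"])
history.record("dog", ["dog", "cat"])
ranking = history.ranked_dictionaries("dog", TYPES)
print(ranking)
```

Types that were never co-detected keep their original order at the end of the ranking because Python's sort is stable.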

上述したように、本実施形態では、各辞書データの優先度を過去の被写体検出頻度データに則して決定することによって、そのユーザに適した辞書データ制御を行うことができる。従って、検出器を構成する論理演算回路の数や処理能力が限られている装置においても、様々な種類の被写体をより早く検出することができ、主被写体の変更もより迅速に行うことができるようになる。 As described above, in this embodiment, the priority of each dictionary data is determined in accordance with past subject detection frequency data, making it possible to perform dictionary data control suited to the user. Therefore, even in a device with a limited number of logic operation circuits constituting a detector and limited processing power, it is possible to detect various types of subjects more quickly and to change the main subject more quickly.

（第４の実施形態） (Fourth Embodiment)
次に、本発明の第４の実施形態について説明する。被写体検出では誤検出防止などの目的で、被写体を連続して一定回数以上検出してから、その検出結果を採用するというアルゴリズムが用いられる（これ以降は、連続性チェックと呼ぶ）。 Next, a fourth embodiment of the present invention will be described. In subject detection, in order to prevent erroneous detection, an algorithm is used in which a detection result is adopted only after the subject has been detected consecutively a certain number of times or more (hereinafter referred to as the continuity check).

しかしながら、連写中などの表示や追尾制御において即時性が求められる撮影モードにおいては、連続性チェックを行わず、一度でも検出されればその結果を即時に主被写体として採用し、追尾制御、辞書データ制御及び枠表示への反映が行われる。 However, in shooting modes where immediacy is required for display and tracking control during continuous shooting, continuity checks are not performed, and if a subject is detected even once, the result is immediately adopted as the main subject and reflected in tracking control, dictionary data control, and frame display.

しかし、１回のみの検出結果では誤検出の可能性がある。例えば、２輪車の一部を犬として検出してしまうなどの誤検出の可能性がある。誤検出であっても連続して対象の被写体を検出し続ければ、追尾制御にも大きな問題は生じない。しかしながら、誤検出であるため検出状態が不安定となる場合が多く、被写体追尾に不具合を生じる恐れが高くなってしまう。 However, a detection result obtained only once may be a false detection, for example, part of a two-wheeled vehicle being detected as a dog. Even with a false detection, if the target subject continues to be detected in consecutive frames, no major problem arises in tracking control. In many cases, however, the detection state becomes unstable precisely because the detection is false, which increases the risk of subject tracking failures.

そこで本実施形態では、連写モードなどの連続性チェックを行わない撮影モードにおいて、１度検出し主被写体として採用した後も全種の辞書データを稼働させ続け、一定回数以上連続検出されたら、正しい主被写体として採用し直す。追尾制御、辞書データ制御、枠表示制御も採用し直された主被写体に合わせて変更する。 In this embodiment, therefore, in a shooting mode that does not perform a continuity check, such as a continuous shooting mode, all types of dictionary data continue to operate even after a subject is detected once and adopted as the main subject, and if it is detected continuously a certain number of times or more, it is re-adopted as the correct main subject. The tracking control, dictionary data control, and frame display control are also changed to match the re-adopted main subject.

上記の連続性チェックは図３におけるＳ３１１の主被写体判定処理において行われる。図１４に連続性チェックの処理内容について記載する。図３で説明したように、図３のフローチャートは１フレームごとに繰り返し行われるため、それに伴って、図１４のフローチャートの動作も１フレームごとに行われる。 The above continuity check is performed in the main subject determination process of S311 in FIG. 3. The continuity check process is described in FIG. 14. As explained in FIG. 3, the flowchart in FIG. 3 is repeated for each frame, and therefore the operation of the flowchart in FIG. 14 is also performed for each frame.

ステップＳ１４０１では、入力画像内に被写体検出結果があるか否かをチェックする。被写体検出結果があれば、ステップＳ１４０２に進む。被写体検出結果がなければ、ステップＳ１４０３で連続検出回数を０に初期化し、検出結果を採用しない（ステップＳ１４０７）。 In step S1401, it is checked whether or not there is a subject detection result in the input image. If there is a subject detection result, the process proceeds to step S1402. If there is no subject detection result, the number of consecutive detections is initialized to 0 in step S1403, and the detection result is not adopted (step S1407).

次にステップＳ１４０２で連続検出回数が０であるか、または直前のフレームで検出された被写体ＩＤと現フレームで検出された被写体ＩＤとが同じであるか否かをチェックする。フレーム間での被写体どうしの相関処理は連続性チェック処理の前に行われ、被写体ＩＤによって紐づけられる。連続検出回数が０であれば、初めての検出結果であるため、ステップＳ１４０４に進む。あるいは、直前の被写体ＩＤと現在の被写体ＩＤとが一致していれば同様にステップＳ１４０４に進む。例えば、連続検出回数が１以上であっても、直前の被写体ＩＤと現在の被写体ＩＤとが一致しているという条件が成り立てば、ステップＳ１４０４に進む。前述のいずれの条件も満たさない場合にはステップＳ１４０３に進む。 Next, in step S1402, it is checked whether the number of consecutive detections is 0 or whether the subject ID detected in the previous frame is the same as the subject ID detected in the current frame. Correlation processing between subjects between frames is performed before the continuity check processing, and subjects are linked by subject ID. If the number of consecutive detections is 0, this is the first detection result, so the process proceeds to step S1404. Alternatively, if the previous subject ID and the current subject ID match, the process similarly proceeds to step S1404. For example, even if the number of consecutive detections is 1 or more, if the condition that the previous subject ID and the current subject ID match is met, the process proceeds to step S1404. If none of the above conditions are met, the process proceeds to step S1403.

ステップＳ１４０３では連続検出回数を０に初期化し、ステップＳ１４０７により、この場合の検出結果は採用されない。 In step S1403, the number of consecutive detections is initialized to 0, and in step S1407, the detection result in this case is not adopted.

ステップＳ１４０４では、連続検出回数に１を加えてステップＳ１４０５に進む。連続検出回数は、図１４の動作が１回行われるごとにメモリなどに記憶される。図１４のステップＳ１４０２において、直前の被写体ＩＤと現在の被写体ＩＤとが一致しているという条件が複数フレーム続けば、上記のメモリに連続検出回数がカウントアップされていくこととなる。 In step S1404, the number of consecutive detections is incremented by 1, and the process proceeds to step S1405. The number of consecutive detections is stored in a memory or the like each time the operation in FIG. 14 is performed once. In step S1402 in FIG. 14, if the condition that the immediately preceding subject ID and the current subject ID match continues for multiple frames, the number of consecutive detections is counted up in the memory.

次にステップＳ１４０５では、連続検出回数が連続検出回数の閾値以上となっているかをチェックする。連写などの即時性が要求される撮影モードにおいては連続検出回数の閾値を１にセットし（連写モードとする）、それ以外の場合には１よりも大きな回数を連続検出回数の閾値にセットする（通常モードとする）。連続検出回数が連続検出回数の閾値を超えていれば、ステップＳ１４０６に進み、その時の検出結果を採用し、図１４の連続性チェック処理の後で主被写体として採用する。 Next, in step S1405, it is checked whether the number of consecutive detections is equal to or greater than the threshold for the number of consecutive detections. In a shooting mode that requires immediacy, such as continuous shooting, the threshold is set to 1 (continuous shooting mode); otherwise, the threshold is set to a number greater than 1 (normal mode). If the number of consecutive detections is equal to or greater than the threshold, the process proceeds to step S1406, where the detection result at that time is adopted; it is then adopted as the main subject after the continuity check process in FIG. 14.

連続検出回数が連続検出回数の閾値未満の場合には、ステップＳ１４０７に進み、その時の検出結果は採用しない。 If the number of consecutive detections is less than the threshold number of consecutive detections, proceed to step S1407 and do not use the detection result at that time.
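
The continuity check of FIG. 14 can be sketched as a small state machine that is updated once per frame. The class name, the use of `None` for "no detection result", and the integer subject IDs are assumptions for this example. The comparison uses "equal to or greater than", so that a threshold of 1 adopts a subject on its first detection.

```python
class ContinuityChecker:
    """Per-frame continuity check (steps S1401 to S1407)."""

    def __init__(self, threshold):
        self.threshold = threshold   # 1 in continuous shooting mode, >1 otherwise
        self.count = 0               # number of consecutive detections
        self.last_id = None          # subject ID seen in the previous frame

    def update(self, subject_id):
        """Return True if the detection result of this frame is adopted."""
        if subject_id is None:                               # S1401: nothing detected
            self.count, self.last_id = 0, None               # S1403
            return False                                     # S1407
        if self.count == 0 or subject_id == self.last_id:    # S1402
            self.count += 1                                  # S1404
            self.last_id = subject_id
            return self.count >= self.threshold              # S1405 -> S1406/S1407
        self.count, self.last_id = 0, subject_id             # S1403: ID changed
        return False                                         # S1407


normal = ContinuityChecker(threshold=4)
results = [normal.update(7) for _ in range(4)]
print(results)

burst = ContinuityChecker(threshold=1)
print(burst.update(7))
```

As in the flowchart, a frame in which the subject ID changes only resets the counter; the new subject starts counting from the following frame.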

次に図１５を用いて、連写開始時などにおける本実施形態で取り上げる課題について説明する。 Next, using Figure 15, we will explain the issues that this embodiment addresses when starting continuous shooting, etc.

図１５において、画像１５０１，１５０２は、２輪を撮影した入力画像を示し、枠１５０３はＡＦ枠を表している。またそれぞれの画像フレームに対する検出状態と追尾状態とを入力画像の下部に表示している。入力画像１５０１において、主被写体がない状態で連写撮影が開始されると、ＡＦ枠１５０３の部分において被写体検出を行い、それ以降は検出された被写体に対して画角全面で追尾を行う。データ１５０４は、各画像フレームにおいて入力される辞書データを表している。この例では、1フレームに対して３つの辞書データをセットできる場合を想定している。 In FIG. 15, images 1501 and 1502 are input images of a two-wheeled vehicle, and frame 1503 represents the AF frame. The detection state and tracking state for each image frame are shown below the input images. In input image 1501, when continuous shooting begins with no main subject, subject detection is performed within the AF frame 1503, and thereafter the detected subject is tracked across the entire angle of view. Data 1504 represents the dictionary data input in each image frame. This example assumes that three dictionary data can be set per frame.

図１５では、１フレーム目で２輪の一部を犬１５０５として誤検出してしまい、２フレーム以降では犬１５０５の検出結果が主被写体として選択され追尾が行われている。この画像における正しい検出結果は２輪であるが、２フレーム目以降は犬の辞書データの優先度を高める制御が行われているために２輪、４輪の乗り物の辞書データがセットされていない。そのため対象被写体を２輪として検出し直すことができない。この課題を解決する本実施形態での制御概念を図１６に示す。 In FIG. 15, part of a two-wheeled vehicle is erroneously detected as a dog 1505 in the first frame, and from the second frame onwards the detection result of the dog 1505 is selected as the main subject and tracked. The correct detection result for this image is a two-wheeled vehicle, but because control that raises the priority of the dog dictionary data is performed from the second frame onwards, the dictionary data for two-wheeled and four-wheeled vehicles is not set. The target subject therefore cannot be re-detected as a two-wheeled vehicle. The control concept of this embodiment that solves this problem is shown in FIG. 16.

図１６では、１フレーム目で２輪の一部を犬１５０５として検出した場合でも、２フレーム目以降の一定期間内は主被写体として選択した犬に合わせた辞書データ制御を行わず、主被写体がない時と同じ辞書データ制御（全種別の辞書データをセットする）を行う。図１６では、２フレーム目以降で正検出である２輪１５０６の検出結果が複数回得られ、８フレーム目で連続検出回数が連続検出回数の閾値（図１６では４回としている）を上回っている。そして次の９フレーム目から主被写体を犬から２輪に変更し、辞書データ制御も２輪に合わせて変更する。複数回検出された新たな被写体検出結果を主被写体として採用し直すことで、連写中における即時性を保ちながら誤検出を是正し、誤追尾の恐れを抑制することができる。 In FIG. 16, even if part of a two-wheeled vehicle is detected as a dog 1505 in the first frame, dictionary data control matched to the dog selected as the main subject is not performed for a certain period from the second frame onwards; instead, the same dictionary data control as when there is no main subject (all types of dictionary data are set) is performed. In FIG. 16, the detection result of the two-wheeled vehicle 1506, which is the correct detection, is obtained multiple times from the second frame onwards, and in the eighth frame the number of consecutive detections exceeds the threshold (four in FIG. 16). Then, from the ninth frame, the main subject is changed from the dog to the two-wheeled vehicle, and dictionary data control is changed accordingly. By re-adopting a new subject detection result that has been detected multiple times as the main subject, erroneous detections can be corrected while maintaining immediacy during continuous shooting, and the risk of erroneous tracking can be reduced.

次に図１６の制御概念を実現するための制御フローを図１７に示す。図１７の制御フローは、図３のステップＳ３１１の主被写体判定処理において行われる。図３で説明したように、図３のフローチャートは１フレームごとに繰り返し行われるため、それに伴って、図１７のフローチャートの動作も１フレームごとに行われる。 Next, FIG. 17 shows a control flow for realizing the control concept of FIG. 16. The control flow of FIG. 17 is performed in the main subject determination process of step S311 of FIG. 3. As explained in FIG. 3, the flowchart of FIG. 3 is repeatedly performed for each frame, and therefore the operation of the flowchart of FIG. 17 is also performed for each frame.

図１７では、ステップＳ１７０１で主被写体が既にあるか否かを判断している。主被写体が既にあれば処理を終了する。主被写体がなければ、ステップＳ１７０２に進み、連写モードか否かを判断する。 In FIG. 17, step S1701 determines whether or not a main subject is already present. If a main subject is already present, the process ends. If no main subject is present, the process proceeds to step S1702, where it is determined whether or not the continuous shooting mode is selected.

連写モードでなければ、連続性チェックは通常モードで行えばよいので、ステップＳ１７０６の連続性チェック（通常モード：連続検出回数＝４回）に進む。連写モードであれば、ステップＳ１７０３において暫定主被写体があるか否かをチェックする。ここで暫定主被写体とは、図１６で２～８フレーム目で主被写体としていた犬のように、連続性チェック（通常モード：連続検出回数＝４回）を経ずに主被写体として選択された被写体を示す。この被写体が暫定主被写体である場合は追尾対象にはなるが、辞書データ制御は主被写体がない場合の辞書データ制御と同様に、全種別の辞書データを一律にセットする。 If the continuous shooting mode is not selected, the continuity check can be performed in normal mode, so the process proceeds to the continuity check in step S1706 (normal mode: number of consecutive detections = 4). If the continuous shooting mode is selected, step S1703 checks whether there is a provisional main subject. Here, a provisional main subject is a subject selected as the main subject without passing the continuity check (normal mode: number of consecutive detections = 4), such as the dog treated as the main subject in frames 2 to 8 in FIG. 16. A provisional main subject is a tracking target, but for dictionary data control, all types of dictionary data are set uniformly, just as when there is no main subject.

ステップＳ１７０３において暫定被写体がない場合には、ステップＳ１７０４の連続性チェック（連写モード：連続検出回数＝１回）を行い、ステップＳ１７０５において検出結果があれば暫定主被写体として採用する。 If there is no provisional main subject in step S1703, the continuity check of step S1704 (continuous shooting mode: number of consecutive detections = 1) is performed, and if there is a detection result in step S1705, it is adopted as the provisional main subject.

次にステップＳ１７０９では、暫定主被写体フレーム数を０に初期化して処理を終了する。暫定主被写体フレーム数とは暫定主被写体がセットされてからの画像フレーム数をカウントしたものである。 Next, in step S1709, the number of provisional main subject frames is initialized to 0, and processing ends. The number of provisional main subject frames is the number of image frames counted since the provisional main subject was set.

またステップＳ１７０３において暫定被写体がある場合には、ステップＳ１７０８に進む。 If there is a provisional main subject in step S1703, the process proceeds to step S1708.

ステップＳ１７０８では、暫定主被写体フレーム数が暫定主被写体フレームの閾値を上回っているかを判定する。暫定主被写体フレーム数が暫定主被写体フレームの閾値を上回った場合は、暫定主被写体よりも確からしい被写体はないとして、ステップＳ１７１０において暫定主被写体を主被写体に変更し処理を終了する。なお、このフローチャートでは、暫定主被写体フレーム数は、図１７の動作が１回行われるごとにメモリなどに記憶され、暫定主被写体フレーム数がカウントアップされていくこととなる。 In step S1708, it is determined whether the number of provisional main subject frames exceeds the threshold value for provisional main subject frames. If the number of provisional main subject frames exceeds the threshold value for provisional main subject frames, it is determined that there is no subject more likely than the provisional main subject, and in step S1710, the provisional main subject is changed to the main subject and the process ends. Note that in this flowchart, the number of provisional main subject frames is stored in a memory or the like each time the operation in FIG. 17 is performed once, and the number of provisional main subject frames is counted up.

暫定主被写体フレーム数が暫定主被写体フレームの閾値以下であればステップＳ１７０６に進む。暫定主被写体フレームの閾値は暫定主被写体による制御状態を一定期間に制限するために設けられるもので、その値は少なくとも連続検出回数の閾値以上に設定される。 If the number of provisional main subject frames is equal to or less than the threshold for provisional main subject frames, the process proceeds to step S1706. This threshold is provided to limit the period during which control is based on the provisional main subject, and it is set to a value at least equal to the threshold for the number of consecutive detections.

次にステップＳ１７０６の連続性チェック（通常モード）を行い、連続検出回数の閾値を超える検出結果が得られれば、ステップＳ１７０７において、その検出結果を主被写体として採用し処理を終了する。 Next, a continuity check (normal mode) is performed in step S1706, and if a detection result that exceeds the threshold for the number of consecutive detections is obtained, in step S1707, that detection result is adopted as the main subject, and the process ends.
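
The flow of FIG. 17 can be sketched as a per-frame update of a small state object. Several details are assumptions made for this example: the names, a single candidate detection per frame (a subject type string or `None`), the value 8 for the provisional-frame limit, the point at which the provisional frame counter is incremented, and the "equal to or greater than" comparison in the normal-mode check. The region overlap condition of FIG. 18 is not included.

```python
from dataclasses import dataclass
from typing import Optional

NORMAL_THRESHOLD = 4          # normal-mode consecutive detections (FIG. 16 uses 4)
PROVISIONAL_FRAME_LIMIT = 8   # hypothetical; must be >= NORMAL_THRESHOLD


@dataclass
class MainSubjectState:
    main: Optional[str] = None
    provisional: Optional[str] = None
    provisional_frames: int = 0
    streak_id: Optional[str] = None
    streak: int = 0


def _normal_check(state, detection):
    """S1706: continuity check in normal mode, following the logic of FIG. 14."""
    if detection is None or (state.streak > 0 and detection != state.streak_id):
        state.streak, state.streak_id = 0, detection
        return False
    state.streak += 1
    state.streak_id = detection
    return state.streak >= NORMAL_THRESHOLD


def step(state, detection, continuous_mode):
    """Run the main subject determination for one frame."""
    if state.main is not None:                                   # S1701
        return state
    if continuous_mode:                                          # S1702
        if state.provisional is None:                            # S1703
            if detection is not None:                            # S1704/S1705
                state.provisional = detection
            state.provisional_frames = 0                         # S1709
            return state
        if state.provisional_frames > PROVISIONAL_FRAME_LIMIT:   # S1708
            state.main = state.provisional                       # S1710
            return state
        state.provisional_frames += 1                            # assumed count-up point
    if _normal_check(state, detection):                          # S1706
        state.main = detection                                   # S1707
    return state


# A false "dog" in frame 1, then four consecutive "two_wheel" detections.
state = MainSubjectState()
for detection in ["dog", None, None, None,
                  "two_wheel", "two_wheel", "two_wheel", "two_wheel"]:
    step(state, detection, continuous_mode=True)
print(state.provisional, state.main)
```

In this run the dog is held only as the provisional main subject, and the two-wheeled vehicle becomes the main subject once it has passed the normal-mode check.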

上述した一連の制御は連写中に追尾していた主被写体を見失い、新たな被写体を検出し直す場合においても適用することができる。 The above-mentioned series of controls can also be applied when the main subject being tracked during continuous shooting is lost and a new subject needs to be detected again.

また選択し直す被写体の検出領域とその近傍領域が、暫定被写体の検出領域と重複している場合にのみ、主被写体として選択し直すようにする。図１８に被写体の位置関係について示す。１８０１は画像領域、１８０２は暫定主被写体の検出領域、１８０３は選択し直す被写体の検出領域、１８０４は検出領域１８０３の近傍領域をそれぞれ表している。 Furthermore, only when the detection area of the subject to be reselected and its neighboring area overlap with the detection area of the provisional subject, the subject is reselected as the main subject. Figure 18 shows the positional relationship of the subjects. Reference numeral 1801 denotes the image area, 1802 denotes the detection area of the provisional main subject, 1803 denotes the detection area of the subject to be reselected, and 1804 denotes the neighboring area of detection area 1803.

図１８（Ａ）では、検出領域１８０２と重複している領域がないため、検出領域１８０３を新たな主被写体としては選択しない。 In FIG. 18(A), there is no area that overlaps with detection area 1802, so detection area 1803 is not selected as the new main subject.

図１８（Ｂ）、１８（Ｃ）では、検出領域１８０２と重複している領域があるため、検出領域１８０３を新たな主被写体として選択する。近傍領域１８０４は、選択し直す被写体の種別やそれまでの動き量の多さに応じて設定される。例えば選択し直す対象が動物の場合には、動きが速く数フレームの間の移動量が大きい場合が想定されるため、近傍領域１８０４は相対的に広めに設定される。また選択し直す対象が人の場合には、数フレーム内で大きく移動することは考えにくいため、相対的に狭くするといった設定方法が考えられる。 In Figures 18(B) and 18(C), there is an area that overlaps with detection area 1802, so detection area 1803 is selected as the new main subject. Nearby area 1804 is set depending on the type of subject to be reselected and the amount of movement up to that point. For example, if the subject to be reselected is an animal, it is expected that it will move quickly and move a lot over several frames, so nearby area 1804 is set relatively wide. On the other hand, if the subject to be reselected is a person, it is unlikely that they will move a lot over several frames, so one possible setting method is to make the nearby area relatively narrow.
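
The overlap test of FIG. 18 can be sketched with axis-aligned rectangles. The `Rect` type, the margin ratios per subject type, and the default ratio are hypothetical values chosen only to reflect the idea that fast-moving subjects get a wider neighboring area than people.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical margins, as a fraction of the detection area's width and height.
MARGIN_RATIO = {"person": 0.1, "dog": 0.5, "cat": 0.5, "bird": 0.5}
DEFAULT_MARGIN_RATIO = 0.25


def neighborhood(area, subject_type):
    """Expand a detection area (1803) into its neighboring area (1804)."""
    ratio = MARGIN_RATIO.get(subject_type, DEFAULT_MARGIN_RATIO)
    dx, dy = area.w * ratio, area.h * ratio
    return Rect(area.x - dx, area.y - dy, area.w + 2 * dx, area.h + 2 * dy)


def overlaps(a, b):
    """True if the two rectangles share any area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def may_reselect(candidate, subject_type, provisional):
    """Allow reselection only if the candidate's neighborhood overlaps area 1802."""
    return overlaps(neighborhood(candidate, subject_type), provisional)


provisional_area = Rect(0, 0, 10, 10)
candidate_area = Rect(12, 0, 10, 10)
print(may_reselect(candidate_area, "person", provisional_area))
print(may_reselect(candidate_area, "dog", provisional_area))
```

With these example numbers the same candidate position is rejected for a person but accepted for a dog, because the dog's neighboring area is wider.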

このようにすることによって、周囲の別被写体を誤って選択する恐れを軽減することができる。 This reduces the risk of accidentally selecting a different subject in the surrounding area.

上述したように、本実施形態では、連写モードなどの連続性チェックを行わない撮影モードにおいて、１度検出し主被写体として採用した後も全種別の辞書データを稼働させ続け、一定回数以上連続検出されたら、正しい主被写体として採用し直す。追尾制御、辞書データ制御、枠表示制御も採用し直された主被写体に合わせて変更する。このようにすることによって、連写モードにおいて求められる追尾制御及び枠表示制御の即時性を担保しつつ、誤検出による追尾性能の低下を抑制することができる。 As described above, in this embodiment, in a shooting mode that does not perform a continuity check, such as continuous shooting mode, all types of dictionary data continue to operate even after a subject is detected once and adopted as the main subject, and if it is detected continuously a certain number of times or more, it is re-adopted as the correct main subject. The tracking control, dictionary data control, and frame display control are also changed to match the re-adopted main subject. In this way, it is possible to suppress a decrease in tracking performance due to erroneous detection while ensuring the immediacy of tracking control and frame display control required in continuous shooting mode.

（他の実施形態） (Other Embodiments)
また本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読み出し実行する処理でも実現できる。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現できる。 The present invention can also be realized by a process in which a program for realizing one or more of the functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (e.g., an ASIC) that realizes one or more of the functions.

The invention is not limited to the above-described embodiments, and various changes and modifications are possible without departing from the spirit and scope of the invention. Accordingly, the following claims are appended to make the scope of the invention public.

The present invention is not limited to shooting with digital cameras or digital video cameras, and can also be installed in imaging devices such as surveillance cameras, web cameras, and mobile phones.

102: Calculation device, 104: Image sensor, 202: Main subject calculation unit, 203: Tracking calculation unit, 211: Dictionary data priority calculation unit, 212: Dictionary data switching control unit, 213: Detector, 214: Main subject determination unit

Claims

1. An image processing device comprising:
a storage means for storing a plurality of dictionary data for detecting a plurality of different subjects from an image;
a detection means for detecting, for each frame of a plurality of frames of images obtained by an imaging means, a subject corresponding to a part of the plurality of dictionary data by using that part of the dictionary data; and
a switching means for switching the dictionary data used by the detection means in the plurality of frames in response to a result of the subject detection by the detection means,
wherein the switching means switches the dictionary data used by the detection means so that dictionary data corresponding to a subject detected by the detection means is used in subsequent frames at a shorter detection period than before the subject was detected, and so that the combination of dictionary data used by the detection means is switched periodically for each frame.

2. The image processing device according to claim 1, wherein the detection means detects a main subject from the image.

3. The image processing apparatus according to claim 2, wherein the main subject is a subject on which the image capturing means adjusts at least one of focus and exposure.

4. The image processing device according to claim 1, wherein the storage means stores a plurality of types of tables for switching between the plurality of dictionary data for each frame, and the switching means switches the dictionary data to be used by the detection means using a table that has been set in advance by a user from among the plurality of types of tables.

5. The image processing device according to any one of claims 1 to 4, wherein, in response to detection of a corresponding subject using first dictionary data, the switching means switches the dictionary data used by the detection means so as to use second dictionary data for detecting a local area of the subject corresponding to the first dictionary data.

6. The image processing apparatus according to claim 5, wherein the switching means switches the dictionary data used by the detection means so that, when the second dictionary data is present, the second dictionary data is preferentially used together with the first dictionary data.

7. The image processing apparatus according to claim 1, wherein the switching means has a priority table for determining priorities of the plurality of dictionary data in accordance with the subject detected by the detection means.

8. The image processing apparatus according to claim 7, wherein the switching means determines the priorities of the plurality of dictionary data based on the priority table.

9. The image processing apparatus according to claim 1, wherein the switching means stores frequency data that is a cumulative count of the frequency of a type of object being detected simultaneously with a specific object.

10. The image processing apparatus according to claim 9, wherein the switching means determines priorities of the plurality of dictionary data based on the frequency data.

11. The image processing apparatus according to claim 1, wherein the plurality of dictionary data are convolutional neural networks trained by machine learning.

12. The image processing device according to any one of claims 1 to 11, wherein the switching means performs switching control of the dictionary data so that, even after a first subject has been detected and selected as a main subject in a specific shooting mode, if a second subject other than the first subject is detected consecutively a predetermined number of times or more, the second subject is re-selected as the main subject.

13. The image processing apparatus according to claim 12, wherein the switching means performs the switching control even when the main subject being tracked is lost in the specific shooting mode and the subject is redetected.

14. The image processing device according to claim 12, wherein the switching means reselects the main subject when the detection area of the second subject and an area including the vicinity thereof overlap with the detection area of the first subject.

15. An imaging device comprising:
an imaging means for capturing an image; and
the image processing device according to any one of claims 1 to 14.

16. A control method for an image processing device, comprising:
a detection step of detecting, for each frame of a plurality of frames of images obtained by an imaging means, a subject corresponding to a part of a plurality of dictionary data for detecting a plurality of different subjects from the image, using that part of the dictionary data; and
a switching step of switching the dictionary data used in the detection step in the plurality of frames according to a result of the subject detection in the detection step,
wherein, in the switching step, the dictionary data used in the detection step is switched so that dictionary data corresponding to a subject detected in the detection step is used in subsequent frames at a shorter detection period than before the subject was detected, and so that the combination of dictionary data used in the detection step is switched periodically for each frame.

17. A program for causing a computer to execute each step of the control method according to claim 16.

18. A computer-readable storage medium storing a program for causing a computer to execute each step of the control method according to claim 16.
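For illustration only, and forming no part of the claims, the per-frame dictionary switching recited in claims 1 and 16 might be sketched as follows. The sketch assumes the detector can run a fixed number of dictionaries per frame (`slots`, hypothetical), that dictionaries are rotated round-robin before any detection, and that after a detection the matching dictionary occupies one slot in every frame while the remaining slots keep rotating. The function name and the dictionary labels are likewise assumptions.

```python
from typing import List, Optional, Sequence


def dictionaries_for_frame(frame: int,
                           dictionaries: Sequence[str],
                           slots: int = 2,
                           detected: Optional[str] = None) -> List[str]:
    """Return the dictionary data to load for the given frame index."""
    if detected is None:
        # Before detection: plain round-robin over all dictionaries.
        start = (frame * slots) % len(dictionaries)
        return [dictionaries[(start + i) % len(dictionaries)]
                for i in range(slots)]
    # After detection: the detected subject's dictionary runs every frame,
    # which shortens its detection period to one frame.
    others = [d for d in dictionaries if d != detected]
    rest = slots - 1
    if not others or rest == 0:
        return [detected]
    # The remaining slots still rotate, so the combination changes
    # periodically from frame to frame.
    start = (frame * rest) % len(others)
    return [detected] + [others[(start + i) % len(others)]
                         for i in range(rest)]
```

With four dictionaries and two slots, each dictionary is used every second frame before detection. Once `dog` is detected, its dictionary is used every frame while the other slot cycles through the remaining three dictionaries.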