JP6168549B2

JP6168549B2 - Image processing apparatus, image processing method, program, and imaging apparatus

Info

Publication number: JP6168549B2
Application number: JP2013090492A
Authority: JP
Inventors: 熙張; 小林　達也
Original assignee: THE UNIVERSITY OF ELECTRO-COMMUNICATIONS
Current assignee: THE UNIVERSITY OF ELECTRO-COMMUNICATIONS
Priority date: 2013-04-23
Filing date: 2013-04-23
Publication date: 2017-07-26
Anticipated expiration: 2033-04-23
Also published as: JP2014215694A

Description

The present invention relates to an image processing apparatus, an image processing method, a program, and an imaging apparatus, and more particularly to a technique for obtaining an optimal image by combining a plurality of images.

In recent years, digital cameras and mobile phone and tablet terminals with a camera function have become widespread, and taking pictures with them is an everyday activity. A single shot, however, does not always yield the image the photographer intended. In particular, when there are many people in the frame, as in a group photo, someone may close their eyes at the moment of shooting, so it is difficult to capture a good expression on every subject in a single picture.

For this reason, a composite image in which every subject has a good expression is generated by shooting the group in a burst and combining the resulting captured images. For example, Patent Document 1 describes a technique in which the face image area with the highest expression evaluation value is extracted from a plurality of captured images and fitted into a background image from which the face image areas have been removed in advance, thereby producing a composite image in which all subjects have a good expression.

Patent Document 1: JP 2007-299297 A

When a face image area extracted from another captured image is fitted into a background image from which the face image area has been removed, as in Patent Document 1, the fitted face image area and the background image may be misaligned. The resulting composite image then looks unnatural because the face image area and the background image do not match.

Patent Document 1 also describes eliminating such misalignment by adjusting the position and size at which the face image area is composited. If the adjustment has to be repeated until the misalignment disappears, however, completing the composite image takes a long time. Moreover, even when the misalignment is eliminated, the boundary between the fitted face image area and the background image may remain visible as a line. In that case the technique of Patent Document 1 cannot remove the unnaturalness of the composite image.

The present invention has been made in view of these points, and its object is to easily generate a more natural composite image from a plurality of captured images.

The image processing apparatus of the present invention includes a captured image selection unit, a multi-resolution decomposition unit, a replacement processing unit, and a composite image generation unit, configured as follows. The captured image selection unit selects, from a plurality of continuously captured images, a replaced captured image, in which part of the image is to be replaced, and a replacement-side captured image, which supplies the replacing image. The multi-resolution decomposition unit decomposes into multiple resolutions a first area, which includes at least a replacement-area peripheral region (the region around the outer perimeter of the replacement area in the replacement-side captured image), and a second area, which includes at least a replaced-area peripheral region (the region around the outer perimeter of the replaced area in the replaced captured image). In each frequency component obtained by the decomposition, the replacement processing unit replaces at least part of the second area with at least part of the first area. The composite image generation unit generates a composite image either by a first process, which reconstructs the frequency components after replacement, or by a second process, which combines the reconstructed image obtained by the first process with the image inside the replacement area.

When the first area is the replacement-area peripheral region, the multi-resolution decomposition unit decomposes the replacement-area peripheral region (first area) and the replaced-area peripheral region (second area) into multiple resolutions; the replacement processing unit, in each frequency component of the two peripheral regions, replaces the inner-perimeter-side part of the replaced-area peripheral region with the inner-perimeter-side part of the replacement-area peripheral region; and the composite image generation unit generates the composite image by the second process. When the first area is the entire replacement-side captured image, the multi-resolution decomposition unit decomposes the entire replacement-side captured image (first area) and the entire replaced captured image (second area) into multiple resolutions; the replacement processing unit, in each frequency component of the two images, replaces the replaced area with the replacement area; and the composite image generation unit generates the composite image by the first process.

The image processing method of the present invention includes a captured image selection step, a multi-resolution decomposition step, a replacement processing step, and a composite image generation step. First, in the captured image selection step, a replaced captured image, in which part of the image is to be replaced, and a replacement-side captured image, which supplies the replacing image, are selected from a plurality of continuously captured images. Next, in the multi-resolution decomposition step, a first area, which includes at least a replacement-area peripheral region (the region around the outer perimeter of the replacement area in the replacement-side captured image), and a second area, which includes at least a replaced-area peripheral region (the region around the outer perimeter of the replaced area in the replaced captured image), are decomposed into multiple resolutions. Next, in the replacement processing step, at least part of the second area is replaced with at least part of the first area in each frequency component obtained by the decomposition. Then, in the composite image generation step, a composite image is generated either by a first process, which reconstructs the frequency components after replacement, or by a second process, which combines the reconstructed image obtained by the first process with the image inside the replacement area.

When the first area is the replacement-area peripheral region, the multi-resolution decomposition step decomposes the replacement-area peripheral region (first area) and the replaced-area peripheral region (second area) into multiple resolutions; the replacement processing step, in each frequency component of the two peripheral regions, replaces the inner-perimeter-side part of the replaced-area peripheral region with the inner-perimeter-side part of the replacement-area peripheral region; and the composite image generation step generates the composite image by the second process. When the first area is the entire replacement-side captured image, the multi-resolution decomposition step decomposes the entire replacement-side captured image (first area) and the entire replaced captured image (second area) into multiple resolutions; the replacement processing step, in each frequency component of the two images, replaces the replaced area with the replacement area; and the composite image generation step generates the composite image by the first process.

The program of the present invention causes a computer to execute a captured image selection step, a multi-resolution decomposition step, a replacement processing step, and a composite image generation step. First, in the captured image selection step, a replaced captured image, in which part of the image is to be replaced, and a replacement-side captured image, which supplies the replacing image, are selected from a plurality of continuously captured images. Next, in the multi-resolution decomposition step, a first area, which includes at least a replacement-area peripheral region (the region around the outer perimeter of the replacement area in the replacement-side captured image), and a second area, which includes at least a replaced-area peripheral region (the region around the outer perimeter of the replaced area in the replaced captured image), are decomposed into multiple resolutions. Next, in the replacement processing step, at least part of the second area is replaced with at least part of the first area in each frequency component obtained by the decomposition. Then, in the composite image generation step, a composite image is generated either by a first process, which reconstructs the frequency components after replacement, or by a second process, which combines the reconstructed image obtained by the first process with the image inside the replacement area.

When the first area is the replacement-area peripheral region, the multi-resolution decomposition step decomposes the replacement-area peripheral region (first area) and the replaced-area peripheral region (second area) into multiple resolutions; the replacement processing step, in each frequency component of the two peripheral regions, replaces the inner-perimeter-side part of the replaced-area peripheral region with the inner-perimeter-side part of the replacement-area peripheral region; and the composite image generation step generates the composite image by the second process. When the first area is the entire replacement-side captured image, the multi-resolution decomposition step decomposes the entire replacement-side captured image (first area) and the entire replaced captured image (second area) into multiple resolutions; the replacement processing step, in each frequency component of the two images, replaces the replaced area with the replacement area; and the composite image generation step generates the composite image by the first process.

The imaging apparatus of the present invention includes an imaging unit, a captured image selection unit, a multi-resolution decomposition unit, a replacement processing unit, a composite image generation unit, and a storage unit, configured as follows. The imaging unit captures a plurality of images in succession. The captured image selection unit selects, from the plurality of continuously captured images, a replaced captured image, in which part of the image is to be replaced, and a replacement-side captured image, which supplies the replacing image. The multi-resolution decomposition unit decomposes into multiple resolutions a first area, which includes at least a replacement-area peripheral region (the region around the outer perimeter of the replacement area in the replacement-side captured image), and a second area, which includes at least a replaced-area peripheral region (the region around the outer perimeter of the replaced area in the replaced captured image). In each frequency component obtained by the decomposition, the replacement processing unit replaces at least part of the second area with at least part of the first area. The composite image generation unit generates a composite image either by a first process, which reconstructs the frequency components after replacement, or by a second process, which combines the reconstructed image obtained by the first process with the image inside the replacement area. The storage unit stores the composite image generated by the composite image generation unit.

When the first area is the replacement-area peripheral region, the multi-resolution decomposition unit decomposes the replacement-area peripheral region (first area) and the replaced-area peripheral region (second area) into multiple resolutions; the replacement processing unit, in each frequency component of the two peripheral regions, replaces the inner-perimeter-side part of the replaced-area peripheral region with the inner-perimeter-side part of the replacement-area peripheral region; and the composite image generation unit generates the composite image by the second process. When the first area is the entire replacement-side captured image, the multi-resolution decomposition unit decomposes the entire replacement-side captured image (first area) and the entire replaced captured image (second area) into multiple resolutions; the replacement processing unit, in each frequency component of the two images, replaces the replaced area with the replacement area; and the composite image generation unit generates the composite image by the first process.

With the image processing apparatus or imaging apparatus configured as above, the first area, which includes at least the replacement-area peripheral region, and the second area, which includes at least the replaced-area peripheral region, are combined in each frequency band obtained by the multi-resolution transform. The composite image is then generated from the reconstructed image obtained after this combination, or from the reconstructed image together with the image inside the replacement area. When the first area is the replacement-area peripheral region, the replacement-area peripheral region (first area) and the replaced-area peripheral region (second area) are each decomposed into multiple resolutions, and in each frequency component of the two regions the inner-perimeter-side part of the replaced-area peripheral region is replaced with the inner-perimeter-side part of the replacement-area peripheral region. The composite image is generated by combining the frequency components after replacement with the image inside the replacement area. When the first area is the entire replacement-side captured image, the entire replacement-side captured image (first area) and the entire replaced captured image (second area) are decomposed into multiple resolutions, and in each frequency component of the two images the replaced area is replaced with the replacement area. The composite image is generated by reconstructing the frequency components after replacement. As a result, the boundary between the replacing portion and the replaced portion becomes inconspicuous where the image has been replaced.

According to the present invention, a more natural composite image can be easily generated from a plurality of captured images.

FIG. 1 is a schematic diagram outlining the present invention, showing an example of captured images containing good and poor expressions of persons.
FIG. 2 is a schematic diagram outlining the present invention, showing an example of generating a composite image as a best shot.
FIG. 3 is a block diagram showing a configuration example of an image processing apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of iris (dark part of the eye) area extraction according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of determining the degree of eye opening according to an embodiment of the present invention.
FIG. 6 is a diagram showing an example of acquiring lip-line pixels according to an embodiment of the present invention.
FIG. 7 is a diagram showing an example of determining how far the corners of the mouth are raised according to an embodiment of the present invention.
FIG. 8 is a diagram showing an example of setting an ellipse for replacement area extraction according to an embodiment of the present invention.
FIG. 9 is a diagram showing a configuration example of a multi-resolution decomposition unit according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of decomposition processing by the multi-resolution decomposition unit according to an embodiment of the present invention.
FIG. 11 is a diagram showing an example of image replacement processing according to the first embodiment of the present invention.
FIG. 12 is a diagram showing an example of image replacement processing in each frequency component according to the first embodiment of the present invention.
FIG. 13 is a diagram showing a configuration example of a reconstruction unit according to an embodiment of the present invention.
FIG. 14 is a flowchart showing an example of an image processing method according to the first embodiment of the present invention.
FIG. 15 is a diagram comparing a composite image according to the first embodiment of the present invention with a composite image generated by a conventional technique; FIG. 15A is the composite image produced by the conventional technique, and FIG. 15B is the composite image generated by the first embodiment.
FIG. 16 is a diagram showing an example of setting the area to be decomposed into multiple resolutions according to the second embodiment of the present invention.
FIG. 17 is a diagram showing an example of the position and range of the area to be decomposed into multiple resolutions according to the second embodiment of the present invention.
FIG. 18 is a diagram showing an example of image replacement processing according to the second embodiment of the present invention.
FIG. 19 is a diagram showing a configuration example of a composite image according to the second embodiment of the present invention.
FIG. 20 is a block diagram showing a configuration example of an image processing apparatus according to the second embodiment of the present invention.
FIG. 21 is a diagram explaining the pixel area required for the discrete wavelet transform, the image replacement and composition, and the inverse wavelet transform according to the second embodiment of the present invention.
FIG. 22 is a flowchart showing an example of an image processing method according to the second embodiment of the present invention.
FIG. 23 is a block diagram showing a configuration example of an imaging apparatus according to an embodiment of the present invention.

An example of an image processing apparatus and an image processing method according to embodiments of the present invention will be described in the following order with reference to the drawings. The present invention is not, however, limited to the following examples.
1. Outline of the present invention
2. First embodiment
3. Second embodiment
4. Configuration example of an imaging apparatus according to an embodiment of the present invention
5. Various modifications

[1. Outline of the present invention]
First, an outline of the present invention will be described with reference to FIGS. 1 and 2. FIG. 1 shows three captured images Po1, Po2, and Po3 taken in succession by an imaging apparatus (not shown). Captured images Po1 to Po3 show persons Ps1, Ps2, and Ps3 as subjects. In captured image Po1, the eyes of the person Ps2 at the center are closed. In captured image Po2, the eyes of the person Ps1 at the left end and the person Ps3 at the right end are closed. In captured image Po3, the eyes of the person Ps3 at the right end are closed.

On the other hand, these three captured images Po1 to Po3 also contain a good expression for each of persons Ps1, Ps2, and Ps3. The person Ps3 at the right end is smiling in captured image Po1, the person Ps2 at the center is smiling in captured image Po2, and the person Ps1 at the left end is smiling in captured image Po3.

Therefore, for each of persons Ps1 to Ps3, replacing the image of a face area without a good expression with the image of a face area with a good expression taken from captured images Po1 to Po3 should make it possible to generate a photograph containing good expressions of all the subjects (a best shot). If the captured image Po on which the replacement is performed differed from person to person, however, the result would merely be several separate composite images. One of captured images Po1 to Po3 is therefore selected as the replaced captured image that serves as the base of the best-shot composite image.

In the example shown in FIG. 1, captured image Po2 in the middle row is selected as the replaced captured image. The person Ps2 at the center of captured image Po2 has a good expression, so the image of face area Rf22 is used as is. For the person Ps3 on the right, the image of the face area with closed eyes is replaced with the image in face area Rf31 extracted from captured image Po1 in the top row. For the person Ps1 on the left, the image of the face area with closed eyes is replaced with the image in face area Rf13 extracted from captured image Po3 in the bottom row.
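The choice of source image for each person in the FIG. 1 example can be sketched as follows. This is only an illustration under stated assumptions: the numeric expression scores, the function name `plan_replacements`, and the 0.5 values for the two expressions the text does not describe are hypothetical, and how the base image itself is chosen is not modeled here.

```python
def plan_replacements(scores, base_image):
    """Map each person to the image that should supply their face area.

    scores[image][person] is a hypothetical expression score (higher is
    better). Persons whose best expression is already in base_image are
    left out of the plan, so their face area is used as is.
    """
    plan = {}
    for person, base_score in scores[base_image].items():
        best_image = max(scores, key=lambda img: scores[img][person])
        if best_image != base_image and scores[best_image][person] > base_score:
            plan[person] = best_image
    return plan


# Hypothetical scores mirroring FIG. 1: closed eyes score low, smiles high.
# 0.5 marks the two expressions that the description leaves unspecified.
scores = {
    "Po1": {"Ps1": 0.5, "Ps2": 0.1, "Ps3": 0.9},
    "Po2": {"Ps1": 0.2, "Ps2": 0.9, "Ps3": 0.1},
    "Po3": {"Ps1": 0.9, "Ps2": 0.5, "Ps3": 0.1},
}

plan = plan_replacements(scores, base_image="Po2")
print(plan)
```

With Po2 as the base, the plan takes Ps1 from Po3 and Ps3 from Po1 and leaves Ps2 untouched, which matches the face areas Rf13, Rf31, and Rf22 in the example.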

Through this processing, a composite image Pc in which all of persons Ps1 to Ps3 have a good expression is generated as the best shot, as shown in FIG. 2. In the present invention, the image replacement is performed in the discrete wavelet transform domain, which makes it possible to generate a natural composite image in which the boundary between the replacing portion and the replaced portion is inconspicuous.
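The idea of replacing image content in the wavelet domain rather than pixel by pixel can be sketched in one dimension. The code below is a minimal illustration, not the patented procedure: it assumes an average/difference Haar wavelet (this passage does not name a filter), works on a single row of samples instead of a two-dimensional image, and uses hypothetical function names. The signal length must be divisible by 2 to the power of `levels`.

```python
def haar_decompose(signal, levels):
    """Return (approximation, [detail at level 1, detail at level 2, ...])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    signal = list(approx)
    for detail in reversed(details):
        signal = [v for s, d in zip(signal, detail) for v in (s + d, s - d)]
    return signal


def replace_in_wavelet_domain(base, source, start, stop, levels):
    """Replace samples [start, stop) of base with source, band by band."""
    base_a, base_d = haar_decompose(base, levels)
    src_a, src_d = haar_decompose(source, levels)
    for level in range(levels):
        scale = 2 ** (level + 1)
        lo, hi = start // scale, -(-stop // scale)  # floor and ceiling
        base_d[level][lo:hi] = src_d[level][lo:hi]
    scale = 2 ** levels
    lo, hi = start // scale, -(-stop // scale)
    base_a[lo:hi] = src_a[lo:hi]
    return haar_reconstruct(base_a, base_d)


base = [float(v) for v in range(16)]
source = [100.0] * 16
result = replace_in_wavelet_domain(base, source, start=6, stop=10, levels=2)
print(result)
```

Because coarser bands cover wider spans, samples just outside the replaced range (indices 4 and 5 here) take their coarse components from the source and their finest detail from the base, so the transition is spread over several samples instead of being a hard cut. A longer wavelet filter would spread it further.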

[2. First embodiment]
(2-1. Configuration example of the image processing apparatus)
FIG. 3 shows an example of the internal configuration of an image processing apparatus 100 according to the first embodiment of the present invention. The image processing apparatus 100 takes as input a plurality of captured images Po shot in succession (burst shooting) by an imaging apparatus 200 and generates a composite image Pc as the best shot. The image processing apparatus 100 consists of a face detection unit 110, a captured image selection unit 120, and a composite image generation unit 130.

The face detection unit 110 includes a face part detection unit 111 and an eye/mouth part detection unit 112. The face part detection unit 111 detects the faces of persons in each of the plurality of captured images Po input from the imaging apparatus 200. The eye/mouth part detection unit 112 detects the eyes and mouth within each face detected by the face part detection unit 111.

The detection of faces by the face part detection unit 111 and the detection of eyes and mouths by the eye/mouth part detection unit 112 can be performed with commonly used face-region detection and facial-part detection functions. This embodiment uses the object detection provided as a function in OpenCV, which is based on Haar-like features and a classifier with a cascade structure. OpenCV is an open-source library for computer vision. Object detection with this classifier detects parts such as the face, eyes, and mouth by examining the light and dark intensity of the target region.

The captured image selection unit 120 includes a facial expression estimation unit 121 and a replacement-side/replaced captured image selection unit 122. The facial expression estimation unit 121 estimates a person's expression by analyzing the images of the eyes and mouth detected by the eye/mouth part detection unit 112. Based on the expression information estimated by the facial expression estimation unit 121, the replacement-side/replaced captured image selection unit 122 selects, from the plurality of captured images Po, a replacement-side captured image that contains a face with a good expression and a replaced captured image that serves as the base of the best shot. The processing performed by the captured image selection unit 120 is described in detail later with reference to FIGS. 4 to 7.

The composite image generation unit 130 includes a replacement region range determination unit 131, a multi-resolution decomposition unit 132, a replacement processing unit 133, and a reconstruction unit 134. The replacement region range determination unit 131 detects the face with the best expression in the replacement-side captured image selected by the replacement-side/replaced captured image selection unit 122. It then calculates the range to be cut out as the replacement region, using parameters obtained by analyzing the detected face image. The replacement region itself is cut out by the subsequent multi-resolution decomposition unit 132; only its range is determined here.

In this embodiment, an elliptical region that contains the person's entire face and is slightly larger than the face is cut out as the replacement region. The replacement region range determination unit 131 therefore calculates the major axis and minor axis of this ellipse. These can be calculated, for example, from the distance between the two eyes detected by the eye/mouth part detection unit 112. The calculation method is described in detail later.

Because the size of the ellipse is expected to differ for each person Ps in the image, the replacement region is determined separately for each person Ps whose face region needs to be replaced. In the example shown in FIG. 1, for the person Ps3, the replacement region range determination unit 131 determines, from the face image in the captured image Po1, the size of the face region Rf31 to be extracted as the replacement region, that is, the size of the ellipse. For the person Ps1, it determines the size of the face region Rf13 to be extracted as the replacement region from the face image in the captured image Po3.

The multi-resolution decomposition unit 132 decomposes both the replaced captured image and the replacement-side captured image selected by the replacement-side/replaced captured image selection unit 122 into multiple resolutions by discrete wavelet transform. The replacement processing unit 133 uses the ellipse whose size was determined by the replacement region range determination unit 131 to extract the face region, as the region to be updated, from each frequency component of the multi-resolution decomposition. Then, in each frequency component, it replaces the face region containing a face without a good expression with the face region containing a face with a good expression. The reconstruction unit 134 reconstructs the image by applying the inverse wavelet transform to the frequency components in which the face regions have been replaced. The composite image Pc is thereby generated as the best shot. The processing in each of these units of the composite image generation unit 130 is described in detail later with reference to FIGS. 8 to 13.

<2-1-1. Details of processing of captured image selection unit>
Next, the processing of the captured image selection unit 120 is described in detail. First, the processing of the facial expression estimation unit 121 is described with reference to FIGS. 4 to 7. The facial expression estimation unit 121 estimates the expression of a person Ps in a captured image Po by analyzing the images of the eye and mouth parts detected by the eye/mouth part detection unit 112. In this embodiment, an expression that satisfies the following two conditions is regarded as a good expression (a smile).
1. The eyes are open to some extent.
2. The corners of the mouth are raised the most among the faces of that person Ps in the plurality of captured images Po.
The degree of eye opening is judged by applying threshold processing to the single-eye region detected by the eye/mouth part detection unit 112 to detect the iris portion (the dark part of the eye), and then evaluating the size of the detected iris portion. The degree to which the mouth corners are raised is estimated by approximating the lip line with a quadratic function and evaluating the magnitude of the coefficient of its second-order term.

First, the method of estimating the degree of eye opening is described with reference to FIG. 4. FIG. 4 shows an example of the threshold processing applied to a single-eye region. The left side of FIG. 4 shows the single-eye region detected by the eye/mouth part detection unit 112, and the right side shows the result of applying threshold processing to that region. On the right side, the iris portion (the dark part of the eye) detected by the threshold processing is displayed as white pixels, and the degree of eye opening is judged from the number of these white pixels. The more pixels are detected as the iris portion, the more widely the eye can be estimated to be open.

Conversely, when very few pixels are detected as the iris portion, the eye can be estimated to be closed. The threshold used for this open/closed decision can be set to, for example, 90 pixels. An appropriate value can be chosen according to, for example, the size of the region detected as the single-eye region.
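
The open/closed decision described above can be sketched as follows. This is a minimal illustration, not the implementation in the patent: the function names, the luminance cutoff of 60, the synthetic eye regions, and the treatment of a count of exactly 90 as open are all assumptions. Only the 90-pixel decision threshold comes from the text.

```python
def count_dark_pixels(eye_region, luminance_threshold=60):
    """Count pixels darker than the cutoff in a 2-D list of luminance values (0-255).

    The cutoff of 60 is an assumed value; the text does not specify one.
    """
    return sum(1 for row in eye_region for value in row if value < luminance_threshold)


def is_eye_open(eye_region, min_iris_pixels=90, luminance_threshold=60):
    """Treat the eye as open when the dark (iris) area reaches min_iris_pixels."""
    return count_dark_pixels(eye_region, luminance_threshold) >= min_iris_pixels


# Synthetic 20x30 single-eye regions: bright skin (200) with a dark iris (30).
open_eye = [[30 if 5 <= r < 15 and 9 <= c < 21 else 200 for c in range(30)]
            for r in range(20)]
closed_eye = [[30 if r == 10 and 9 <= c < 21 else 200 for c in range(30)]
              for r in range(20)]

print(count_dark_pixels(open_eye), is_eye_open(open_eye))
print(count_dark_pixels(closed_eye), is_eye_open(closed_eye))
```

In practice both thresholds would need tuning to the size of the detected eye region and the lighting, as the text notes for the pixel-count threshold.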

In this embodiment, in addition to the open/closed decision, it is also determined whether the eye is in a smiling state. As described above, the more pixels are detected as the iris portion, the more widely the eye can be estimated to be open. However, the face with the most widely open eyes is not necessarily a smiling one. Because the outer corners of the eyes drop when a person smiles, the eyelids of a smiling eye may be lowered further than those of an expressionless eye. Eyes ranging from slightly closed to fully open are therefore regarded as smiling eyes. Whether an eye is in this state is determined by whether the number of pixels in the extracted iris region falls within a predetermined range.

For example, suppose that n captured images Po are input and that the single-eye region of the left eye of a particular person Ps is detected in each captured image Po. Let the numbers of pixels in the iris regions extracted from the detected single-eye regions be p1, p2, ..., pn. A detected eye is then judged to be a smiling eye when its iris pixel count p lies within the range given by [Math. 1] below.

[Math. 1]
$$\frac{1}{2}\max(p_1, p_2, \ldots, p_n) \le p \le \max(p_1, p_2, \ldots, p_n)$$

That is, the largest pixel count among the extracted iris regions is found, and eyes whose pixel count is at least 1/2 of that maximum are classified as smiling eyes.

FIG. 5 shows an example of the correspondence between the number of pixels in the iris region extracted from a single-eye region and the estimated degree of eye opening. The left side of FIG. 5 shows the portions containing the face of the person Ps2, extracted from the three captured images Po1 to Po3 and arranged from top to bottom in the order Po1, Po2, Po3. The person Ps2 has closed eyes in the captured image Po1 and open eyes in the captured images Po2 and Po3.

The right side of FIG. 5 shows the iris regions extracted from the single-eye regions detected in each captured image Po, together with their pixel counts p. The iris regions are shown as left/right pairs, sorted from top to bottom in descending order of pixel count.

The top row on the right side of FIG. 5 shows the iris regions extracted from the right-eye region Re2r and the left-eye region Re2l detected in the captured image Po2, which is shown in the middle row on the left side of FIG. 5. The iris region extracted from the right-eye region Re2r contains 243 pixels, and the one extracted from the left-eye region Re2l contains 235 pixels.

The middle row shows the iris regions extracted from the right-eye region Re3r and the left-eye region Re3l detected in the captured image Po3, which is shown in the bottom row on the left side of FIG. 5. The iris region extracted from the right-eye region Re3r contains 203 pixels, and the one extracted from the left-eye region Re3l contains 223 pixels.

The bottom row shows the iris regions extracted from the right-eye region Re1r and the left-eye region Re1l detected in the captured image Po1, which is shown in the top row on the left side of FIG. 5. The iris region extracted from the right-eye region Re1r contains 74 pixels, and the one extracted from the left-eye region Re1l contains 25 pixels.

Of these iris pixel counts p, the largest is 243. An eye whose iris region contains more than 243/2 = 121.5 pixels can therefore be estimated to be a smiling eye. In the example on the right side of FIG. 5, the eyes extracted from the captured image Po2 in the top row and from the captured image Po3 in the middle row are classified as smiling eyes.
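
The range check can be reproduced with the pixel counts quoted from FIG. 5. The sketch below is illustrative only; the function name and the flat list layout are assumptions, while the counts and the half-of-maximum rule come from the text.

```python
def smiling_eye_flags(pixel_counts):
    """Flag each iris pixel count p that satisfies max/2 <= p <= max."""
    upper = max(pixel_counts)
    lower = upper / 2
    return [lower <= p <= upper for p in pixel_counts]


# Iris pixel counts from FIG. 5, in the order
# Po2 right, Po2 left, Po3 right, Po3 left, Po1 right, Po1 left.
counts = [243, 235, 203, 223, 74, 25]
flags = smiling_eye_flags(counts)
print(flags)  # [True, True, True, True, False, False]
```

Because the lower bound is relative to the maximum, this check alone would still flag eyes when every count is small, which is presumably why it is used together with the absolute open/closed threshold.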

The iris regions extracted from the right-eye region Re1r and the left-eye region Re1l of the captured image Po1 contain 74 and 25 pixels, respectively, both fewer than the 90 pixels set as the threshold for the open/closed decision. Both eyes in the captured image Po1 are therefore estimated to be closed. A face containing such closed eyes, that is, the face of the person Ps2 in the captured image Po1, is not judged to be smiling even if the corners of the mouth are raised.

Next, the method of estimating how far the corners of the mouth are raised is described. When a person smiles, the corners of the mouth generally rise compared with an expressionless face, and the lip line then becomes convex downward. In this embodiment, the lip line is approximated by a quadratic function, and the degree of smiling is estimated from the magnitude of the coefficient of the second-order term. The larger this coefficient, the more strongly the curve is convex downward. The mouth region whose lip line yields the largest coefficient can therefore be estimated to be the one with the most raised mouth corners, that is, a smiling mouth region.

To approximate the lip line by a quadratic function, data points that carry the lip-line information are needed. In this embodiment, these data points are obtained by detecting the local minimum in each column of the Y (luminance) component of the mouth region. FIG. 6A shows an image in which the lip line has been detected, and FIG. 6B is a graph of the Y component along the vertical line indicated as line Ln in FIG. 6A, plotted against the vertical coordinate. The vertical axis of FIG. 6B represents the Y component (luminance value), and the horizontal axis represents the coordinate in the vertical (y) direction of the image.

Pixels on the lip line have lower luminance values than the pixels of skin, lips, and teeth in the same column of the image (for example, line Ln in FIG. 6A). When a column of the image is viewed as a one-dimensional signal, the lip line therefore coincides with a local minimum, and the lip-line pixels can be obtained by detecting these minima. In the example shown in FIG. 6B, the luminance value near y = 20 is detected as the minimum. Once the lip-line pixels have been detected, they are fitted to a quadratic function f(x) = ax^2 + bx + c by the least squares method. The larger the coefficient a, the more strongly the curve is convex downward, so the mouth region can be estimated to be a smiling one with raised mouth corners.
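
The two steps above (taking the darkest pixel in each column, then fitting a quadratic by least squares) can be sketched in plain Python. The function names are hypothetical, and the sketch takes the global minimum of each column as a simplification of the local-minimum detection in the text. It also assumes that the y coordinates passed to the fit increase upward, so that a larger a means more raised mouth corners; image rows usually increase downward, so they would need to be flipped first (for example y_up = height - 1 - y).

```python
def column_minima(luma):
    """Return (x, y) of the darkest pixel in each column of a 2-D luminance list.

    y is the image row index (increasing downward).
    """
    height, width = len(luma), len(luma[0])
    return [(x, min(range(height), key=lambda y: luma[y][x])) for x in range(width)]


def fit_quadratic(points):
    """Least squares fit of f(x) = a*x^2 + b*x + c; returns (a, b, c)."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sums of x^k
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sums of y*x^k
    # Augmented normal equations for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                factor = m[r][i] / m[i][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Points sampled from a known parabola recover its coefficients.
samples = [(x, 0.5 * x * x - 2 * x + 3) for x in range(-5, 6)]
a, b, c = fit_quadratic(samples)
```

With a numerical library available, a call such as NumPy's `polyfit` with degree 2 would replace `fit_quadratic`; the hand-written solver is used here only to keep the sketch self-contained.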

FIG. 7 shows the correspondence between the mouth-region images in the captured images Po and the detected lip lines. The left side of FIG. 7 shows the portions containing the face of the same person Ps2, extracted from the three captured images Po1 to Po3 and arranged from top to bottom in the order Po3, Po2, Po1.

The right side of FIG. 7 shows the lip line extracted from the mouth region Rm detected in each captured image Po, together with the value of the coefficient a of the quadratic function fitted to it. The coefficient a is 2.98 × 10^-5 for the lip line detected in the captured image Po3, 33.3 × 10^-5 for the captured image Po2, and 8.45 × 10^-5 for the captured image Po1. Of the three captured images Po on the left side of FIG. 7, the captured image Po2, in which the mouth corners are raised the most, yields the largest coefficient a.

The expression is estimated using the information on eye opening and on mouth-corner height obtained in this way. More specifically, the captured images Po are first divided into those with open eyes and those with closed eyes. Within each of the two groups, the captured images Po are sorted in descending order of the coefficient a of the quadratic function fitted to the lip line. The sorted captured images Po are then arranged with the open-eye group first, followed by the closed-eye group. Finally, the order determined in this way is assigned to each face as its evaluation value.

As a result, a captured image Po containing an expression with more widely open eyes and more raised mouth corners receives a higher evaluation value, and one containing an expression with more closed eyes and lower mouth corners receives a lower evaluation value. This assignment of evaluation values is carried out for every person in the captured images Po. The evaluation value of each captured image Po is obtained by summing the evaluation values of the faces of the persons in that image.

The replacement-side/replaced captured image selection unit 122 (see FIG. 3) selects the captured image Po with the highest evaluation value as the replaced captured image, which serves as the base of the best shot. It then selects, as replacement-side captured images, the captured images containing the face regions that will replace regions of the replaced captured image. Note that the faces of all persons Ps in the captured image Po selected as the replaced captured image may already be the highest-rated faces of those persons; in other words, the best expression of every person may be contained in a single captured image Po. In that case, no further processing is performed and that captured image Po is chosen as the best shot.
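
The ranking and selection steps can be sketched as follows. Several points are assumptions made for illustration: the text does not say how the sort order is turned into a number, so the sketch gives the best face of n a score of n and the worst a score of 1; the function names are hypothetical; and the values for Ps1 and Ps3 are invented so that the outcome matches FIG. 1. Only the values for Ps2 come from FIGS. 5 and 7.

```python
def rank_faces(faces):
    """faces: list of (image, eyes_open, a). Open eyes first, then larger a first."""
    ordered = sorted(faces, key=lambda f: (not f[1], -f[2]))
    return {f[0]: len(ordered) - rank for rank, f in enumerate(ordered)}


def select_base_image(scores_per_person):
    """Sum the face scores per image and return (best image, totals)."""
    totals = {}
    for scores in scores_per_person.values():
        for image, score in scores.items():
            totals[image] = totals.get(image, 0) + score
    return max(totals, key=totals.get), totals


def replacements_needed(scores_per_person, base):
    """Map each person whose best face is not in the base image to its source image."""
    best = {p: max(s, key=s.get) for p, s in scores_per_person.items()}
    return {p: image for p, image in best.items() if image != base}


observations = {
    "Ps1": [("Po1", False, 1e-5), ("Po2", False, 5e-5), ("Po3", True, 20e-5)],    # invented
    "Ps2": [("Po1", False, 8.45e-5), ("Po2", True, 33.3e-5), ("Po3", True, 2.98e-5)],
    "Ps3": [("Po1", True, 15e-5), ("Po2", False, 6e-5), ("Po3", False, 2e-5)],    # invented
}
scores = {person: rank_faces(faces) for person, faces in observations.items()}
base, totals = select_base_image(scores)
swaps = replacements_needed(scores, base)
print(base, totals, swaps)
```

When `swaps` is empty, the base image already holds the best face of every person and can be used as the best shot without further processing.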

<2-1-2. Details of processing of composite image generation unit>
[2-1-2-1. Details of processing of replacement region range determination unit]
Next, the processing in each unit of the composite image generation unit 130 is described in detail. First, the processing of the replacement region range determination unit 131 is described with reference to FIG. 8. The replacement region range determination unit 131 calculates the extent of the ellipse of the replacement region, which is extracted by the subsequent multi-resolution decomposition unit 132, from the distance between the two eyes of the face detected in the captured image Po.

The ellipse is preferably large enough to contain the entire face region and all of the hair. With an ellipse of this size, and with the region enclosed by the ellipse used as the replacement region, the boundary between the replacing portion and the replaced portion does not cross the hair. If the boundary did cross the hair, the boundary would appear unnatural after the multi-resolution decomposition and reconstruction performed in the subsequent processing.

In FIG. 8, the right-eye region Rer and the left-eye region Rel detected by the eye/mouth part detection unit 112 are each shown as a rectangle. The line Ep, drawn as an arrow connecting the center of the right-eye region Rer and the center of the left-eye region Rel, corresponds to the distance between the pupils. The midpoint of the line Ep is set as the center of the ellipse E, and the major axis rL and the minor axis rS of the ellipse E are determined with respect to this point. The ellipse E is positioned so that its inclination, that is, the angle θ of the major axis rL with respect to the horizontal direction, is 0°. The major axis rL and the minor axis rS of the ellipse E are given by Formulas 1 and 2 below.
Major axis rL = 1.5 × rh × Ep ... Formula 1
Minor axis rS = 1.5 × rw × Ep ... Formula 2

In Formulas 1 and 2, rh and rw are standard values for Japanese people of the ratio of face height to pupil distance and the ratio of face width to pupil distance, respectively, with the pupil distance taken as 1. These values were calculated from the data in the "Japanese Head Dimensions Database 2001" provided by the National Institute of Advanced Industrial Science and Technology (AIST). Specifically, the ratios of face height and face width to a pupil distance of 1 were obtained for men and women of each age group, and the results were averaged. This gave a face height ratio rh = 3.57 and a face width ratio rw = 2.49.

The multiplier 1.5 in Formulas 1 and 2 was chosen so that the replacement region extends to include the hair. A different value may therefore be used depending on the size of the range to be extracted as the replacement region.
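
As a worked example of Formulas 1 and 2, the sketch below computes the ellipse center and axis lengths from two eye centers. The function name and the sample coordinates are assumptions; the constants 3.57, 2.49, and 1.5 are the values given in the text.

```python
import math

RH = 3.57     # face height / pupil distance (value given in the text)
RW = 2.49     # face width / pupil distance (value given in the text)
MARGIN = 1.5  # multiplier chosen so that the region also covers the hair


def ellipse_from_eyes(right_eye_center, left_eye_center, margin=MARGIN):
    """Return (center, rL, rS) from the centers of the two detected eye regions."""
    (x1, y1), (x2, y2) = right_eye_center, left_eye_center
    ep = math.hypot(x2 - x1, y2 - y1)          # pupil distance Ep
    center = ((x1 + x2) / 2, (y1 + y2) / 2)    # midpoint of the line Ep
    return center, margin * RH * ep, margin * RW * ep


# Hypothetical eye centers 60 pixels apart.
center, r_long, r_short = ellipse_from_eyes((100, 200), (160, 200))
print(center, r_long, r_short)
```

For a pupil distance of 60 pixels this gives rL of about 321.3 and rS of about 224.1 pixels, centered midway between the eyes.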

[2-1-2-2. Details of processing of multi-resolution decomposition unit]
Next, the processing of the multi-resolution decomposition unit 132 is described in detail with reference to FIGS. 9 and 10. As described above, in this embodiment the image in the replaced region is replaced with the image in the replacement region in the discrete wavelet transform domain. That is, the captured images Po are decomposed into multiple resolutions by discrete wavelet transform, and the replacement is then carried out within each of the resulting frequency bands. The multi-resolution decomposition unit 132 performs this decomposition. In this embodiment, the entire replacement-side captured image, as the first region, and the entire replaced captured image, as the second region, are decomposed into multiple resolutions by the undecimated (non-decimated) wavelet transform, which is a kind of discrete wavelet transform. The undecimated wavelet transform performs multi-resolution decomposition and reconstruction without the resampling (downsampling and upsampling) of the signal that the ordinary discrete wavelet transform performs.

The discrete wavelet transform is a form of subband coding. The discrete wavelet transform of a one-dimensional signal x(n) can be expressed as an octave-band decomposition in which a two-channel filter bank splits the signal into bands and the filter bank is applied recursively to the low-frequency component. FIG. 9 shows the processing by which the multi-resolution decomposition unit 132 decomposes a one-dimensional discrete signal x(n) to three levels by discrete wavelet decomposition.

In FIG. 9, H(z) denotes a low-pass filter and G(z) a high-pass filter; the two are used as a pair. The low-pass filter H(z) and the high-pass filter G(z) are themselves upsampled each time the decomposition advances by one level. Because a captured image Po is a two-dimensional signal, this processing is applied in either the vertical or the horizontal direction and then in the other direction. The captured image Po is thereby decomposed into a low-frequency component LL, a vertical high-frequency component LH, a horizontal high-frequency component HL, and a diagonal high-frequency component HH.

FIG. 10 shows an example in which an image is decomposed by one level using the undecimated wavelet transform. The right side of FIG. 10 shows the result of applying the undecimated wavelet transform, first horizontally and then vertically, to the grayscale LENNA image shown on the left side. The transform yields the low-frequency component LL, the vertical high-frequency component LH, the horizontal high-frequency component HL, and the diagonal high-frequency component HH, shown clockwise from the upper left. Because no decimation is performed, each frequency component has the same size as the original image. When the captured image Po is a color image, each of the R (red), G (green), and B (blue) color components, or each of the luminance signal (Y) and the color-difference signals (U, V), must be decomposed separately by the undecimated wavelet transform.
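
A one-level, two-dimensional undecimated decomposition can be sketched as below. The text does not name the wavelet, so the unnormalized Haar pair (average and half-difference of neighboring samples) and the circular boundary handling are assumptions chosen for brevity, and the function names are hypothetical. The sketch filters horizontally first and then vertically, and each of the four bands keeps the size of the input.

```python
def haar_pair(signal, step=1):
    """Undecimated Haar low-pass and high-pass of a 1-D list (circular boundary)."""
    n = len(signal)
    low = [(signal[i] + signal[(i + step) % n]) / 2 for i in range(n)]
    high = [(signal[i] - signal[(i + step) % n]) / 2 for i in range(n)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_rows(image, step=1):
    """Apply haar_pair to every row; returns (low image, high image)."""
    pairs = [haar_pair(row, step) for row in image]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def decompose_2d(image, step=1):
    """Return (LL, LH, HL, HH), each the same size as the input image."""
    row_low, row_high = filter_rows(image, step)                             # horizontal pass
    ll, lh = (transpose(b) for b in filter_rows(transpose(row_low), step))   # vertical pass
    hl, hh = (transpose(b) for b in filter_rows(transpose(row_high), step))
    return ll, lh, hl, hh


image = [[10, 12, 40, 42],
         [11, 13, 41, 43],
         [90, 92, 20, 22],
         [91, 93, 21, 23]]
ll, lh, hl, hh = decompose_2d(image)
```

With this particular filter pair, adding the four bands element by element restores the input, which gives a quick check that no information is lost when the bands stay at full size.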

この非間引きウェーブレット変換を複数回繰り返してｎレベルまで分解してから再構成することで、ベストショットとして生成される合成画像Ｐｃにおける、置き換え部分と被置き換え部分との境目をより目立たなく、自然なものとすることができる。これは、分解レベルを上げることで処理が増えるため、境界線もより平滑化されるからである。実験の結果、少なくとも３レベルまで分解することで、一定の効果が得られることが分かっている。 By repeating this non-decimated wavelet transform to decompose the image down to level n and then reconstructing it, the boundary between the replacing portion and the replaced portion in the composite image Pc generated as the best shot becomes less noticeable and more natural. This is because raising the decomposition level increases the amount of filtering, so the boundary line is smoothed further. Experiments have shown that decomposing to at least three levels yields a certain effect.

１レベル分解によって得られた低周波成分ＬＬをさらに分解すると、低周波成分ＬＬ２、縦の高周波成分ＬＨ２、横の高周波成分ＨＬ２、斜めの高周波成分ＨＨ２が得られる。ここで得られた低周波成分ＬＬ２をさらに分解することで、低周波成分ＬＬ３、縦の高周波成分ＬＨ３、横の高周波成分ＨＬ３、斜めの高周波成分ＨＨ３が得られる。この処理を繰り返すことにより、所望のｎレベルまでの分解を行うことができる。 When the low-frequency component LL obtained by the one-level decomposition is further decomposed, a low-frequency component LL2, a vertical high-frequency component LH2, a horizontal high-frequency component HL2, and an oblique high-frequency component HH2 are obtained. By further decomposing the low frequency component LL2 obtained here, a low frequency component LL3, a vertical high frequency component LH3, a horizontal high frequency component HL3, and an oblique high frequency component HH3 are obtained. By repeating this process, it is possible to perform decomposition up to a desired n level.
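The level-by-level decomposition described above can be sketched as follows. This is a minimal illustration, not the implementation of the patent: it assumes a Haar filter pair and periodic boundary handling, and the names `analyze_axis` and `decompose` are hypothetical. The filter taps are spread by a factor of two at each level, which corresponds to upsampling the filter instead of decimating the image.

```python
import numpy as np

def analyze_axis(x, step, axis):
    """Undecimated Haar analysis along one axis, with filter taps `step` apart."""
    shifted = np.roll(x, step, axis=axis)  # periodic boundary (an assumption)
    low = (x + shifted) / 2.0
    high = (x - shifted) / 2.0
    return low, high

def decompose(image, levels):
    """Return the final LL band and one (LH, HL, HH) tuple per level."""
    ll = np.asarray(image, dtype=float)
    details = []
    for level in range(levels):
        step = 2 ** level                                # filter upsampled per level
        lo, hi = analyze_axis(ll, step, axis=1)          # horizontal pass
        ll_next, lh = analyze_axis(lo, step, axis=0)     # vertical pass on the low band
        hl, hh = analyze_axis(hi, step, axis=0)          # vertical pass on the high band
        details.append((lh, hl, hh))
        ll = ll_next                                     # only LL is decomposed further
    return ll, details
```

Because no decimation is performed, every returned band has the same shape as the input image. With this particular Haar pair, the final LL band plus all detail bands sums back to the input.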

［２−１−２−３．置換処理部の処理の詳細］ [2-1-2-3. Details of processing in the replacement processing unit]
次に、置換処理部１３３の処理の詳細について、図１１と図１２を参照して説明する。置換処理部１３３は、まず、置換領域範囲決定部１３１でサイズが決定された楕円Ｅを用いて、置換側撮影画像から置換領域を切り出す。そして、切り出した置換領域内の顔画像で、被置換撮影画像の対応する領域内の顔画像を置き換える。 Next, details of the processing of the replacement processing unit 133 will be described with reference to FIGS. 11 and 12. The replacement processing unit 133 first cuts out a replacement region from the replacement-side captured image using the ellipse E whose size was determined by the replacement region range determination unit 131. It then replaces the face image in the corresponding region of the replaced captured image with the face image in the cut-out replacement region.

図１１は、被置換撮影画像として選択された撮影画像Ｐｏ２のＲ（赤）のＬＬ成分Ｐｏ２ＬＬにおける、顔領域の合成処理の例を示した図である。図１１に示すように、置換撮影画像Ｐｏ１のＲのＬＬ成分Ｐｏ１ＬＬより切り取った顔領域Ｒｆ３１を、置き換えが行われる側のＬＬ成分Ｐｏ２ＬＬ上の該当する位置に配置する。このとき、置き換えが行われる側の顔領域の瞳孔間隔の中心の位置に、楕円Ｅで切り取られた顔領域Ｒｆ３１の中心が配置されるように、位置合わせを行う。楕円Ｅの傾き角θは、画像の水平方向に対して０°となるように調整される。位置合わせが終わった後は、置換領域として抽出された顔領域Ｒｆ３１の顔画像で、置き換えが行われる側のＬＬ成分Ｐｏ２ＬＬ上の顔画像を置き換える合成処理を行う。この置き換え処理は、Ｒ，Ｇ，Ｂの各色成分の各周波数成分で行う。 FIG. 11 is a diagram illustrating an example of face region synthesis in the R (red) LL component Po2LL of the captured image Po2 selected as the replaced captured image. As shown in FIG. 11, the face region Rf31 cut out from the R LL component Po1LL of the replacement-side captured image Po1 is placed at the corresponding position on the LL component Po2LL, the side on which the replacement is performed. Alignment is performed so that the center of the face region Rf31 cut out by the ellipse E coincides with the midpoint between the pupils of the face region being replaced. The tilt angle θ of the ellipse E is adjusted to 0° with respect to the horizontal direction of the image. After alignment, a synthesis process replaces the face image on the LL component Po2LL with the face image of the face region Rf31 extracted as the replacement region. This replacement is performed for every frequency component of each of the R, G, and B color components.

図１２に、Ｒ，Ｇ，Ｂの各色成分の各周波数成分における、顔領域の置き換え（合成）処理の例を示す。図１２の左側には、被置換撮影画像として選択された撮影画像Ｐｏ２の各周波数成分を示し、右側に、置換撮影画像として選択された撮影画像Ｐｏ１の各周波数成分を示す。 FIG. 12 shows an example of face region replacement (synthesis) for each frequency component of the R, G, and B color components. The left side of FIG. 12 shows the frequency components of the captured image Po2 selected as the replaced captured image, and the right side shows the frequency components of the captured image Po1 selected as the replacement-side captured image.

置換処理部１３３は、置換領域範囲決定部１３１でサイズが決定された楕円Ｅを用いて、置換側撮影画像である撮影画像Ｐｏ１から生成された各周波数成分から、置換領域としての顔領域Ｒｆ３１を抽出する。つまり、低周波成分Ｐｏ１ＬＬ、縦の高周波成分Ｐｏ１ＬＨ、横の高周波成分Ｐｏ１ＨＬ、斜めの高周波成分Ｐｏ１ＨＨのそれぞれより、顔領域Ｒｆ３１を抽出する。 Using the ellipse E whose size was determined by the replacement region range determination unit 131, the replacement processing unit 133 extracts the face region Rf31 as the replacement region from each frequency component generated from the captured image Po1, which is the replacement-side captured image. That is, the face region Rf31 is extracted from each of the low-frequency component Po1LL, the vertical high-frequency component Po1LH, the horizontal high-frequency component Po1HL, and the diagonal high-frequency component Po1HH.

そして、被置換撮影画像である撮影画像Ｐｏ２から生成された各周波数成分に対して、抽出した顔領域Ｒｆ３１の置き換え（合成）処理を行う。つまり、低周波成分Ｐｏ２ＬＬ、縦の高周波成分Ｐｏ２ＬＨ、横の高周波成分Ｐｏ２ＨＬ、斜めの高周波成分Ｐｏ２ＨＨのそれぞれに対して、置換領域と対応する被置換領域の顔画像を、置換領域である顔領域Ｒｆ３１内の顔画像で置き換える。顔画像の置き換え処理は、顔画像の置き換えが必要な人物の数だけ行う。図１に示した例で言えば、人物Ｐｓ１と人物Ｐｓ３に対して行う。 Then, the extracted face region Rf31 is substituted (synthesized) into each frequency component generated from the captured image Po2, which is the replaced captured image. That is, in each of the low-frequency component Po2LL, the vertical high-frequency component Po2LH, the horizontal high-frequency component Po2HL, and the diagonal high-frequency component Po2HH, the face image in the replaced region corresponding to the replacement region is replaced with the face image in the face region Rf31, which is the replacement region. The face image replacement is performed once for each person whose face image needs to be replaced. In the example shown in FIG. 1, it is performed for the person Ps1 and the person Ps3.
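The per-subband replacement can be illustrated with a boolean elliptical mask applied identically to every frequency component. The sketch below assumes the two images are already aligned so that the same pixel coordinates cover the face in both, and that the ellipse has a tilt of 0° as in the text. The names `ellipse_mask` and `replace_in_subbands` and the dictionary layout of the subbands are assumptions made for illustration.

```python
import numpy as np

def ellipse_mask(shape, center, semi_axes):
    """Boolean mask that is True inside an axis-aligned ellipse."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    cy, cx = center          # ellipse center (row, column)
    ay, ax = semi_axes       # semi-axis lengths (vertical, horizontal)
    return ((rows - cy) / ay) ** 2 + ((cols - cx) / ax) ** 2 <= 1.0

def replace_in_subbands(replaced_bands, source_bands, mask):
    """Copy the masked region of every source subband into the replaced image.

    Both arguments map a subband name such as "LL" or "HH" to an array of
    the same shape as `mask` (hypothetical layout).
    """
    return {
        name: np.where(mask, source_bands[name], replaced_bands[name])
        for name in replaced_bands
    }
```

For a color image the same call would be repeated for each color component, and once more for each person whose face is replaced.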

［２−１−２−４．再構成部の処理の詳細］ [2-1-2-4. Details of processing of the reconstruction unit]
次に、再構成部１３４での処理の詳細について、図１３を参照して説明する。図１３に示す再構成部１３４においても、１組のローパスフィルタＨ（ｚ）とハイパスフィルタＧ（ｚ）が３段に組まれている。１組のローパスフィルタＨ（ｚ）とハイパスフィルタＧ（ｚ）は、１レベル再構成する毎にフィルタをダウンサンプリングする。このように組まれたフィルタを通過することにより、多重解像度分解部１３２で分解された各周波数成分が逆ウェーブレット変換され、１つの画像に再構成される。 Next, details of the processing in the reconstruction unit 134 will be described with reference to FIG. 13. In the reconstruction unit 134 shown in FIG. 13 as well, pairs of the low-pass filter H (z) and the high-pass filter G (z) are arranged in three stages. Each pair of filters is downsampled every time one level is reconstructed. By passing through the filters arranged in this way, the frequency components decomposed by the multi-resolution decomposition unit 132 are inverse wavelet transformed and reconstructed into a single image.

（２−２．第１の実施形態例に係る画像処理方法の例） (2-2. Example of image processing method according to first embodiment)
続いて、本実施形態例による画像処理方法の例について、図１４のフローチャートを参照して説明する。まず、画像処理装置１００に、撮像装置２００で連続的に撮影された複数枚の撮影画像Ｐｏが取り込まれる（ステップＳ１）。次に、顔部分検出部１１１によって、複数枚の撮影画像Ｐｏのそれぞれで顔部分が検出され（ステップＳ２）、目・口部分検出部１１２によって、顔部分の中からさらに目と口の部分が検出される（ステップＳ３）。次に、目・口部分検出部１１２で検出された目と口の部分の画像に基づいて、表情推定部１２１によって、人物Ｐｓの表情の推定が行われる（ステップＳ４）。 Next, an example of the image processing method according to the present embodiment will be described with reference to the flowchart of FIG. 14. First, a plurality of captured images Po continuously captured by the imaging apparatus 200 are loaded into the image processing apparatus 100 (step S1). Next, the face part detection unit 111 detects a face part in each of the captured images Po (step S2), and the eye/mouth part detection unit 112 further detects the eye and mouth parts within each face part (step S3). The facial expression estimation unit 121 then estimates the facial expression of each person Ps based on the images of the eye and mouth parts detected by the eye/mouth part detection unit 112 (step S4).

続いて、表情推定部１２１で推定された表情に基づいて、置換側／被置換撮影画像選択部１２２によって、撮影画像Ｐｏ内に写っている各人物Ｐｓに対して、最も良い表情が写っている撮影画像Ｐｏが決定される（ステップＳ５）。そして、ステップＳ５で選ばれた各人物Ｐｓの最も良い表情を含む撮影画像Ｐｏが、すべて同一の撮影画像Ｐｏであるか否かが判断される（ステップＳ６）。すべて同一の撮影画像Ｐｏであった場合には、その撮影画像Ｐｏがベストショットとして採用される（ステップＳ７）。 Subsequently, based on the facial expressions estimated by the facial expression estimation unit 121, the replacement-side/replaced captured image selection unit 122 determines, for each person Ps appearing in the captured images Po, the captured image Po that shows that person's best facial expression (step S5). It is then determined whether the captured images Po selected in step S5 as containing each person's best expression are all the same captured image Po (step S6). If they are, that captured image Po is adopted as the best shot (step S7).

各人物Ｐｓの最も良い表情を含む撮影画像Ｐｏが、すべて同一の撮影画像Ｐｏではなかった場合には、置換側／被置換撮影画像選択部１２２によって、ベストショットの素地となる被置換撮影画像が決定される（ステップＳ８）。具体的には、撮影画像Ｐｏ内に写っている各人物Ｐｓの顔の評価値の合計値が最も高い撮影画像Ｐｏが、被置換撮影画像として選択される。 If the captured images Po containing each person's best facial expression are not all the same captured image Po, the replacement-side/replaced captured image selection unit 122 determines the replaced captured image that serves as the base of the best shot (step S8). Specifically, the captured image Po with the highest total of the face evaluation values of the persons Ps appearing in it is selected as the replaced captured image.
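The selection logic of steps S5 to S8 can be sketched as follows. The score matrix layout (one row per captured image, one column per person) and the function name `choose_images` are assumptions for illustration; how the expression evaluation values themselves are computed is outside the scope of this sketch.

```python
import numpy as np

def choose_images(scores):
    """Pick the base (replaced) image and the source image for each replaced face.

    scores[i][p] is the expression evaluation value of person p in image i.
    Returns (base_index, {person_index: source_image_index}).
    """
    scores = np.asarray(scores, dtype=float)
    best_per_person = scores.argmax(axis=0)            # step S5
    if np.all(best_per_person == best_per_person[0]):  # step S6
        return int(best_per_person[0]), {}             # step S7: no replacement needed
    base = int(scores.sum(axis=1).argmax())            # step S8: highest total score
    replacements = {
        person: int(image)
        for person, image in enumerate(best_per_person)
        if image != base
    }
    return base, replacements
```

An empty dictionary in the result means a single captured image already shows everyone's best expression and can be used as the best shot unchanged.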

そして、置換領域範囲決定部１３１によって、撮影画像Ｐｏ内の各人物Ｐｓの最も良い表情を含む置換側撮影画像に基づいて、置換領域としての顔領域を切り出すための楕円Ｅの長径と短径が算出される（ステップＳ９）。続いて、多重解像度分解部１３２によって、被置換撮影画像と置換側撮影画像とが離散ウェーブレット変換（非間引きウェーブレット変換）される（ステップＳ１０）。そして、被置換撮影画像の各周波数成分と置換側撮影画像の各周波数成分において、置換処理部１３３によって、楕円Ｅを用いて顔領域が抽出される（ステップＳ１１）。 Then, the replacement region range determination unit 131 calculates the major and minor axes of the ellipse E used to cut out the face region as the replacement region, based on the replacement-side captured image containing the best facial expression of each person Ps (step S9). Subsequently, the multi-resolution decomposition unit 132 applies the discrete wavelet transform (non-decimated wavelet transform) to the replaced captured image and the replacement-side captured image (step S10). Then, in each frequency component of the replaced captured image and of the replacement-side captured image, the replacement processing unit 133 extracts the face region using the ellipse E (step S11).

続いて、同じく置換処理部１３３によって、置換領域内の顔画像で、被置換撮影画像内の被置換領域内の顔画像を置き換える処理が行われる（ステップＳ１２）。最後に、顔画像の置き換えが行われた被置換撮影画像の各周波数成分が、再構成部１３４によって逆ウェーブレット変換されることにより、ベストショットとしての合成画像が生成される（ステップＳ１３）。 Subsequently, the replacement processing unit 133 likewise replaces the face image in the replaced region of the replaced captured image with the face image in the replacement region (step S12). Finally, the reconstruction unit 134 applies the inverse wavelet transform to the frequency components of the replaced captured image in which the face image has been replaced, thereby generating the composite image as the best shot (step S13).

本実施形態例によれば、置換領域内の画像による被置換撮影画像の置き換えが、離散ウェーブレット変換領域下で行われる。このため、逆ウェーブレット変換して画像が再構成される際に、それぞれの周波数成分がフィルタ処理されることになる。これにより、再構成された合成画像Ｐｃにおいて、置き換え部分と被置き換え部分との境目が目立たなくなる。 According to the present embodiment, the replaced captured image is overwritten with the image in the replacement region in the discrete wavelet transform domain. Consequently, each frequency component is filtered when the image is reconstructed by the inverse wavelet transform. As a result, the boundary between the replacing portion and the replaced portion becomes inconspicuous in the reconstructed composite image Pc.
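The smoothing effect of replacing in the transform domain can be seen in a one-dimensional sketch. It assumes an undecimated Haar filter pair with periodic boundaries and a synthesis step that averages the two redundant estimates of each sample; the function names are hypothetical and the filters of the patent are not specified here. A hard splice in the spatial domain would leave a step at the seam, whereas splicing the coefficients and then reconstructing spreads the transition over several samples.

```python
import numpy as np

def decompose_1d(x, levels):
    """Undecimated Haar analysis: returns the final low band and one high band per level."""
    approx = np.asarray(x, dtype=float)
    highs = []
    for level in range(levels):
        step = 2 ** level
        prev = np.roll(approx, step)            # prev[n] = approx[n - step]
        highs.append((approx - prev) / 2.0)
        approx = (approx + prev) / 2.0
    return approx, highs

def reconstruct_1d(approx, highs):
    """Inverse transform that averages the two redundant estimates of each sample."""
    for level in reversed(range(len(highs))):
        step = 2 ** level
        high = highs[level]
        same = approx + high                                     # estimate from position n
        ahead = np.roll(approx, -step) - np.roll(high, -step)    # estimate from position n + step
        approx = (same + ahead) / 2.0
    return approx

def splice_in_wavelet_domain(source, replaced, mask, levels):
    """Replace coefficients where `mask` is True, then reconstruct."""
    s_low, s_highs = decompose_1d(source, levels)
    r_low, r_highs = decompose_1d(replaced, levels)
    low = np.where(mask, s_low, r_low)
    highs = [np.where(mask, s, r) for s, r in zip(s_highs, r_highs)]
    return reconstruct_1d(low, highs)
```

With three levels and constant inputs of 1 and 0 joined at sample 16, the reconstructed signal ramps linearly over the samples just before the seam instead of jumping. With more levels the ramp widens, which is consistent with the statement that higher decomposition levels smooth the boundary further.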

また、本実施形態例によれば、置換領域として、人物の顔部分及び頭髪部分を含む領域が楕円Ｅで切り出される。これにより、例えば目の領域のみや、口の領域のみが置き換えられる場合と比較して、合成画像Ｐｃにおける人物Ｐｓの表情が自然なものとなる。また、置換領域を、矩形でなく楕円Ｅで切り出しているため、合成画像Ｐｃにおける置き換え部分と被置き換え部分との境目が、矩形で切り出した場合と比較してより一層目立ちにくくなる。 Further, according to the present embodiment, a region including the person's face and hair is cut out by the ellipse E as the replacement region. Compared with replacing, for example, only the eye region or only the mouth region, this makes the facial expression of the person Ps in the composite image Pc more natural. In addition, because the replacement region is cut out with the ellipse E rather than a rectangle, the boundary between the replacing portion and the replaced portion in the composite image Pc is even less noticeable than when a rectangle is used.

図１５に、離散ウェーブレット変換による分解を行わずに空間領域で合成を行った合成画像と、本実施形態例による合成画像Ｐｃとを示す。図１５Ａが従来の手法により生成した合成画像で、図１５Ｂが、本実施形態例により生成された合成画像Ｐｃである。図１５Ａ及び図１５Ｂの画像の左上端には人物Ｐｓの頬からあごにかかる領域が写っており、画像の下方には、人物Ｐｓの肩の部分が写っている。 FIG. 15 shows a composite image synthesized in the spatial domain without performing the decomposition by the discrete wavelet transform, and a composite image Pc according to the present embodiment. FIG. 15A is a composite image generated by a conventional method, and FIG. 15B is a composite image Pc generated by the present embodiment. A region from the cheek of the person Ps to the chin is shown in the upper left corner of the images of FIGS. 15A and 15B, and a shoulder portion of the person Ps is shown below the images.

図１５Ａに示す従来の手法による合成画像中には、顔周辺の領域とその外側の背景の領域との間に、右上から左下方向の線状の境界線を視認できる。一方、図１５Ｂに示す本実施形態例による合成画像Ｐｃにおいては、そのような境界線が見られない。つまり、より自然な合成画像Ｐｃが得られていることが分かる。 In the synthesized image by the conventional method shown in FIG. 15A, a linear boundary line from the upper right to the lower left can be visually recognized between the area around the face and the background area outside the face. On the other hand, such a boundary line is not seen in the composite image Pc according to the present embodiment shown in FIG. 15B. That is, it can be seen that a more natural composite image Pc is obtained.

〔３．第２の実施形態例〕 [3. Second Embodiment]
（３−１．第２の実施形態例の概要） (3-1. Overview of Second Embodiment)
上述した第１の実施の形態例では、離散ウェーブレット変換領域下で画像の合成を行うにあたり、被置換撮影画像と置換側撮影画像の全体を多重解像度に分解した。本実施形態例では、被置換撮影画像と置換側撮影画像のそれぞれ一部の領域のみを多重解像度に分解して再構成する。これにより処理に必要なデータ量が削減されるため、処理時間も短縮することができる。 In the first embodiment described above, the entire replaced captured image and the entire replacement-side captured image were decomposed into multiple resolutions in order to combine the images in the discrete wavelet transform domain. In the present embodiment, only a partial region of each of the replaced captured image and the replacement-side captured image is decomposed into multiple resolutions and reconstructed. This reduces the amount of data to be processed and therefore shortens the processing time.

図１６は、本実施形態例による、多重解像度への分解及び再構成を行う領域Ａｒを示す図である。領域Ａｒは、置換領域の外周周辺に形成される領域であり、置換領域の範囲を示す楕円Ｅを中心として、その内周側及び外周側に所定の幅を有するドーナツ型の領域として定義される。 FIG. 16 is a diagram showing the region Ar that is decomposed into multiple resolutions and reconstructed according to the present embodiment. The region Ar is formed around the periphery of the replacement region and is defined as a donut-shaped region that has a predetermined width on both the inner and outer sides of the ellipse E indicating the extent of the replacement region.

本実施形態例では、この領域Ａｒを、置換撮影画像と被置換撮影画像の両方から抽出する。図１７Ａは、置換側撮影画像としての撮影画像Ｐｏ１から抽出される領域Ａｒ３１（第１の領域）を示した図であり、図１７Ｂは、被置換撮影画像としての撮影画像Ｐｏ２から抽出される領域Ａｒ３２（第２の領域）を示した図である。いずれの図においても、領域Ａｒ３１と領域Ａｒ３２とを共に一点鎖線で示している。 In the present embodiment, this region Ar is extracted from both the replacement-side captured image and the replaced captured image. FIG. 17A shows the region Ar31 (first region) extracted from the captured image Po1 as the replacement-side captured image, and FIG. 17B shows the region Ar32 (second region) extracted from the captured image Po2 as the replaced captured image. In both figures, the regions Ar31 and Ar32 are indicated by dash-dot lines.

このように設定された領域Ａｒ３１内の画像と、領域Ａｒ３２内の画像とを、多重解像度分解部１３２が非間引きウェーブレット変換して、各周波数成分に分解する。置換処理部１３３は、置き換えられる側の領域Ａｒ３２内の画像の一部を、置換側の領域Ａｒ３１内の画像の一部で置き換える処理を行う。 The multi-resolution decomposition unit 132 applies the non-decimated wavelet transform to the image in the region Ar31 and the image in the region Ar32 set in this way, decomposing them into frequency components. The replacement processing unit 133 then replaces part of the image in the region Ar32 on the replaced side with part of the image in the region Ar31 on the replacement side.

図１８は、置換処理部１３３による置き換え処理の例を示した図である。図１８Ａの左上は、図１７Ｂに示した、良い表情でない顔領域の外周周辺の領域Ａｒ３２の低周波成分Ａｒ３２ＬＬを、右下がり斜線で示している。図１８Ａの右下には、図１７Ａに示した、良い表情の顔領域の外周周辺の領域Ａｒ３１の低周波成分Ａｒ３１ＬＬの内周部分（楕円Ｅの内側）のみを、右上がり斜線で示している。本実施形態例による置換処理部１３３は、被置換撮影画像から生成された低周波成分Ａｒ３２ＬＬの内周部分を、置換側撮影画像から生成された低周波成分Ａｒ３１ＬＬの内周部分で置き換えることを行う。これにより、図１８Ｂに示すように、被置換撮影画像から生成された低周波成分Ａｒ３２ＬＬと、置換側撮影画像から生成された低周波成分Ａｒ３１ＬＬとが合成される。 FIG. 18 is a diagram illustrating an example of replacement processing by the replacement processing unit 133. The upper left of FIG. 18A shows, with hatching that slopes down to the right, the low-frequency component Ar32LL of the region Ar32 around the periphery of the face region without a good expression shown in FIG. 17B. The lower right of FIG. 18A shows, with hatching that slopes up to the right, only the inner portion (inside the ellipse E) of the low-frequency component Ar31LL of the region Ar31 around the periphery of the face region with a good expression shown in FIG. 17A. The replacement processing unit 133 of the present embodiment replaces the inner portion of the low-frequency component Ar32LL generated from the replaced captured image with the inner portion of the low-frequency component Ar31LL generated from the replacement-side captured image. As a result, as shown in FIG. 18B, the low-frequency component Ar32LL generated from the replaced captured image and the low-frequency component Ar31LL generated from the replacement-side captured image are combined.

合成された低周波成分Ａｒ３１ＬＬと低周波成分Ａｒ３２ＬＬは、再構成部１３４で逆ウェーブレット変換されて再構成され、領域Ａｒ′とされる。 The synthesized low-frequency component Ar31LL and low-frequency component Ar32LL are subjected to inverse wavelet transform by the reconstruction unit 134 and reconstructed into the region Ar ′.

本実施形態例では、図１９に示すように、楕円Ｅ内の顔領域Ｒｆ３１の画像と、置き換え後に再構成された領域Ａｒ′の画像と、被置換撮影画像である撮影画像Ｐｏ２の背景画像とを合成することにより、ベストショットとしての合成画像Ｐｃを得る。 In the present embodiment, as shown in FIG. 19, the composite image Pc as the best shot is obtained by combining the image of the face region Rf31 inside the ellipse E, the image of the region Ar′ reconstructed after the replacement, and the background image of the captured image Po2, which is the replaced captured image.

（３−２．画像処理装置の構成例） (3-2. Configuration Example of Image Processing Device)
図２０は、本実施形態例による画像処理装置１００Ａの構成例を示した図である。図２０において、図３と対応する箇所には同一の符号を付してあり、重複する説明は省略する。図３に示した、第１の実施形態例に係る画像処理装置１００と構成が異なる点は、合成画像生成部１３０Ａの内部の構成である。本実施形態例では、再構成部１３４の後段にさらに合成画像生成処理部１３５が加わっている。合成画像生成処理部１３５は、多重解像度分解部１３２で多重解像度に分解されて再構成部１３４で再構成された領域Ａｒ′内の画像と、置換領域内の顔画像と、被置換撮影画像の背景画像とを合成して、合成画像Ｐｃを生成する。 FIG. 20 is a diagram illustrating a configuration example of an image processing apparatus 100A according to the present embodiment. In FIG. 20, parts corresponding to those in FIG. 3 are denoted by the same reference numerals, and redundant description is omitted. The image processing apparatus 100A differs from the image processing apparatus 100 of the first embodiment shown in FIG. 3 in the internal configuration of the composite image generation unit 130A. In the present embodiment, a composite image generation processing unit 135 is added after the reconstruction unit 134. The composite image generation processing unit 135 generates the composite image Pc by combining the image in the region Ar′, which was decomposed into multiple resolutions by the multi-resolution decomposition unit 132 and reconstructed by the reconstruction unit 134, the face image in the replacement region, and the background image of the replaced captured image.

なお、図１６に示した、置換撮影画像と被置換撮影画像の両方から抽出する領域Ａｒの幅としては、楕円Ｅを中心とした内周側と外周側に、それぞれ少なくとも１．５Ｌとる必要がある。ここでいう“Ｌ”とは、多重解像度分解部１３２で行う非間引きウェーブレット変換のフィルタ長である。領域Ａｒの幅は、この幅より広くする分には、いくらでも広くしてよい。 Note that the region Ar shown in FIG. 16, which is extracted from both the replacement-side captured image and the replaced captured image, must have a width of at least 1.5L on each of the inner and outer sides of the ellipse E. Here, “L” is the filter length of the non-decimated wavelet transform performed by the multi-resolution decomposition unit 132. The region Ar may be made arbitrarily wider than this minimum.

ここで、領域Ａｒの幅として、少なくとも３Ｌが必要となる理由について説明する。例えば、フィルタ長Ｌが５である一般的なフィルタの係数を“ｈ（ｎ）”とし、フィルタに対する入力を“ｘ（ｎ）”とすると、フィルタからの出力ｙ（ｎ）は、以下の式３で示すことができる。 Here, the reason why the region Ar requires a total width of at least 3L (1.5L on each side of the ellipse E) will be described. For example, let “h (n)” be the coefficients of a general filter with a filter length L of 5, and let “x (n)” be the input to the filter. The output y (n) of the filter can then be expressed by Equation 3 below.
ｙ（ｎ）＝ｈ（−２）×ｘ（ｎ＋２）＋ｈ（−１）×ｘ（ｎ＋１）＋ｈ（０）×ｘ（ｎ）＋ｈ（１）×ｘ（ｎ−１）＋ｈ（２）×ｘ（ｎ−２）・・・式３ y (n) = h (−2) × x (n + 2) + h (−1) × x (n + 1) + h (0) × x (n) + h (1) × x (n−1) + h (2) × x (n−2) ... (Equation 3)

すなわち、現時点ｎでの出力ｙ（ｎ）は、ｎ±Ｌ／２（ここでは２．５）の範囲における入力の影響を受ける。言い換えると、現時点ｎでの入力ｘ（ｎ）は、ｎ±Ｌ／２の範囲の出力ｙ（ｎ）に対して影響を与える。 That is, the output y (n) at the current time n is affected by the input in the range of n ± L / 2 (2.5 here). In other words, the input x (n) at the current time n affects the output y (n) in the range of n ± L / 2.
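Equation 3 is an ordinary convolution with a 5-tap filter centered on the current sample, and the range of influence can be checked numerically. In the sketch below the coefficient values are arbitrary placeholders chosen so that each tap is identifiable; the function name `filter_output` is hypothetical.

```python
import numpy as np

def filter_output(x, h):
    """Evaluate y(n) = sum over k in [-2, 2] of h(k) * x(n - k).

    `h` holds the five coefficients in the order h(-2), h(-1), h(0), h(1), h(2).
    Samples outside the input are treated as zero.
    """
    return np.convolve(np.asarray(x, dtype=float), np.asarray(h, dtype=float), mode="same")

# Placeholder coefficients h(-2)..h(2), not taken from the source.
h = [1.0, 2.0, 3.0, 4.0, 5.0]
impulse = np.zeros(11)
impulse[5] = 1.0                      # single non-zero input at n = 5
y = filter_output(impulse, h)
affected = np.nonzero(y)[0]           # outputs influenced by that input
```

Here `affected` covers positions 3 to 7, that is, two samples on either side of the input at position 5, which matches the statement that an input influences outputs within about L/2 of its position.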

図２１Ａは、本発明に係る非間引きウェーブレット変換から再構成までの処理を、ブロックで示した図である。非間引きウェーブレット変換のフィルタの長さＬは、（１＋２＋４＋・・・＋２^（ｎ−１））Ｎ＋１で示される。ここでいう“Ｎ”とは、ローパスフィルタＨ（ｚ）とハイパスフィルタＧ（ｚ）の次数である。図２１Ａに示すように、入力を“ｘ（ｎ）”、非間引きウェーブレット変換（ｕＤＷＴ）後の出力を“ｕ（ｎ）”、置き換え合成処理後の出力を“ｖ（ｎ）”、逆ウェーブレット変換後の出力を“ｙ（ｎ）”とする。 FIG. 21A is a block diagram showing the processing from the non-decimated wavelet transform through reconstruction according to the present invention. The length L of the non-decimated wavelet transform filter is given by (1 + 2 + 4 + ... + 2^(n−1)) N + 1, where “N” is the order of the low-pass filter H (z) and the high-pass filter G (z). As shown in FIG. 21A, the input is denoted “x (n)”, the output after the non-decimated wavelet transform (uDWT) is denoted “u (n)”, the output after the replacement and synthesis processing is denoted “v (n)”, and the output after the inverse wavelet transform is denoted “y (n)”.
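The filter length expression is a geometric series, so it has the closed form L = (2^n − 1) N + 1. The sketch below evaluates it and the corresponding 1.5L margin. It assumes that n is the number of decomposition levels, which the text does not state explicitly at this point, and the function names are hypothetical.

```python
import math

def effective_filter_length(order, levels):
    """L = (1 + 2 + ... + 2**(levels - 1)) * order + 1, in closed form."""
    return (2 ** levels - 1) * order + 1

def ring_half_width(order, levels):
    """Minimum margin of 1.5 L on each side of the ellipse, rounded up to whole pixels."""
    return math.ceil(1.5 * effective_filter_length(order, levels))
```

For example, a filter of order 4 at one level gives L = 5, the filter length used in the example of Equation 3, and a Haar filter (order 1) at three levels gives L = 8.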

図２１Ｂは、置換側撮影画像を分解して得られた周波数成分ｕ１（ｎ）の、“ｋ−２”〜“ｋ＋２”で示される各画素位置における輝度値の大きさを、矢印の長さで示した図である。図２１Ｃは、被置換撮影画像を分解して得られた周波数成分ｕ２（ｎ）の、“ｋ−２”〜“ｋ＋２”で示される各画素位置における輝度値の大きさを、矢印の長さで示した図である。図２１Ｄは、置き換え合成処理後の出力ｖ（ｎ）における、“ｋ−２”〜“ｋ＋２”で示される各画素位置での輝度値の大きさを、矢印の長さで示した図である。 FIG. 21B shows, by the lengths of the arrows, the luminance values at the pixel positions “k−2” to “k+2” of the frequency component u1 (n) obtained by decomposing the replacement-side captured image. FIG. 21C likewise shows the luminance values at the pixel positions “k−2” to “k+2” of the frequency component u2 (n) obtained by decomposing the replaced captured image. FIG. 21D shows, by the lengths of the arrows, the luminance values at the pixel positions “k−2” to “k+2” in the output v (n) after the replacement and synthesis processing.

例えば、図２１Ｄ中に縦線で示す“ｋ−１”と“ｋ”の間の位置で、画像の置き換え及び合成を行うものとする。この場合は、置き換え及び合成処理が行われることによって、ｋ±Ｌ／２の範囲の出力ｙ（ｎ）に影響が及ぶ。ｋ±Ｌ／２の範囲の出力ｙ（ｎ）を求めるためには、ｋ±Ｌ／２±Ｌ／２、すなわちｋ±Ｌの範囲の輝度値を、入力ｖ（ｎ）として入力する必要がある。さらに、ｋ±Ｌの範囲のｕ（ｎ）を計算するためには、ｋ±Ｌ±Ｌ／２、すなわちｋ±３Ｌ／２の範囲の輝度値を、入力ｘ（ｎ）として入力する必要がある。 For example, suppose the images are replaced and combined at the position between “k−1” and “k” indicated by the vertical line in FIG. 21D. In this case, the replacement and synthesis affect the output y (n) in the range k ± L/2. To obtain the output y (n) in the range k ± L/2, luminance values in the range k ± L/2 ± L/2, that is, k ± L, must be supplied as the input v (n). Furthermore, to calculate u (n) in the range k ± L, luminance values in the range k ± L ± L/2, that is, k ± 3L/2, must be supplied as the input x (n).

したがって、非間引きウェーブレット変換処理と、画像の置き換え及び合成処理と、逆ウェーブレット変換処理を行うには、ｋの位置に配置される楕円Ｅを中心として、±３Ｌ／２、すなわち±１．５Ｌの範囲の画素値が、入力として必要となる。このため、楕円Ｅで切り取られる置換領域内の画像で、置換領域と対応する被置換領域内の画像を置き換える場合には、楕円Ｅを中心として内周方向と外周方向にそれぞれ少なくとも１．５Ｌの幅を有する領域Ａｒを、多重解像度分解部１３２に対して入力する必要がある。 Therefore, performing the non-decimated wavelet transform, the image replacement and synthesis, and the inverse wavelet transform requires, as input, pixel values in the range ±3L/2, that is, ±1.5L, centered on the ellipse E located at position k. For this reason, when the image in the replaced region is replaced with the image in the corresponding replacement region cut out by the ellipse E, a region Ar having a width of at least 1.5L in each of the inward and outward directions from the ellipse E must be input to the multi-resolution decomposition unit 132.
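A donut-shaped region of this kind can be approximated by the area between two concentric ellipses whose semi-axes are enlarged and reduced by the margin. This is only an approximation of a constant-width band around the ellipse E, chosen here for simplicity; the function name `ring_mask` and its parameters are hypothetical.

```python
import numpy as np

def ring_mask(shape, center, semi_axes, half_width):
    """Boolean mask of the band between two concentric axis-aligned ellipses.

    `half_width` is the margin on each side of the ellipse, for example 1.5 L.
    """
    ay, ax = semi_axes
    if half_width >= min(ay, ax):
        raise ValueError("half_width must be smaller than both semi-axes")
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    cy, cx = center

    def inside(sy, sx):
        return ((rows - cy) / sy) ** 2 + ((cols - cx) / sx) ** 2 <= 1.0

    outer = inside(ay + half_width, ax + half_width)   # ellipse grown by the margin
    inner = inside(ay - half_width, ax - half_width)   # ellipse shrunk by the margin
    return outer & ~inner
```

Only the pixels where this mask is True would need to be passed through the wavelet decomposition and reconstruction in the second embodiment.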

（３−３．画像処理方法の例） (3-3. Example of image processing method)
続いて、本実施形態例に係る画像処理方法について、図２２のフローチャートを参照して説明する。本実施形態例に係る画像処理方法は、図１４に示した第１の実施形態例に係る画像処理と、ステップＳ１〜ステップＳ６までは共通である。したがって、図２２には、図１４のステップＳ６以降の処理を示す。 Next, the image processing method according to the present embodiment will be described with reference to the flowchart of FIG. 22. The method shares steps S1 to S6 with the image processing according to the first embodiment shown in FIG. 14. FIG. 22 therefore shows the processing from step S6 of FIG. 14 onward.

まず、置換側／被置換撮影画像選択部１２２によって、ベストショットの素地となる被置換撮影画像が決定される（ステップＳ２１）。そして、置換領域範囲決定部１３１によって、置換領域としての顔領域を切り出すための楕円Ｅの長径と短径が算出される（ステップＳ２２）。さらに、置換領域範囲決定部１３１によって、楕円Ｅを中心とした、楕円Ｅの内周側と外周側にそれぞれ１．５Ｌの幅を有する領域Ａｒが決定される（ステップＳ２３）。 First, the replacement-side/replaced captured image selection unit 122 determines the replaced captured image that serves as the base of the best shot (step S21). Then, the replacement region range determination unit 131 calculates the major and minor axes of the ellipse E used to cut out the face region as the replacement region (step S22). The replacement region range determination unit 131 further determines the region Ar, which has a width of 1.5L on each of the inner and outer sides of the ellipse E (step S23).

次に、多重解像度分解部１３２によって、被置換撮影画像と置換側撮影画像の両方から領域Ａｒが抽出され、抽出された領域Ａｒが離散ウェーブレット変換（非間引きウェーブレット変換）される（ステップＳ２４）。続いて、置換処理部１３３によって、多重解像度に分解された各周波数成分において、置換側撮影画像内の楕円Ｅの内周側の領域内の画像で、被置換撮影画像内の対応領域の内周側の領域の画像を置き換える処理が行われる（ステップＳ２５）。 Next, the multi-resolution decomposition unit 132 extracts the region Ar from both the replaced captured image and the replacement-side captured image, and applies the discrete wavelet transform (non-decimated wavelet transform) to the extracted regions Ar (step S24). Subsequently, in each frequency component obtained by the multi-resolution decomposition, the replacement processing unit 133 replaces the image in the region inside the ellipse E in the replaced captured image with the image in the corresponding region inside the ellipse E in the replacement-side captured image (step S25).

そして、再構成部１３４によって、置き換えが行われた各周波数成分における領域Ａｒ′の画像が逆ウェーブレット変換され、再構成される（ステップＳ２６）。最後に、合成画像生成処理部１３５によって、楕円Ｅで切り出された置換領域内の顔画像と、再構成された顔画像周辺の領域Ａｒ′と、被置換撮影画像の背景部分の画像とを用いて、ベストショットとしての合成画像Ｐｃが生成される（ステップＳ２７）。 Then, the reconstruction unit 134 applies the inverse wavelet transform to the image of the region Ar′ in each frequency component after the replacement and reconstructs it (step S26). Finally, the composite image generation processing unit 135 generates the composite image Pc as the best shot using the face image in the replacement region cut out by the ellipse E, the reconstructed region Ar′ around the face image, and the background image of the replaced captured image (step S27).
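The final assembly in step S27 amounts to layering three sources by region. A minimal sketch, assuming all arrays share one shape and that the ring region takes priority where it overlaps the face ellipse, could look like this. The function name `compose_best_shot` and its parameters are hypothetical.

```python
import numpy as np

def compose_best_shot(background, face_source, ring_image, face_mask, ring_mask):
    """Layer the face region and the reconstructed ring over the base image.

    background : replaced captured image (base of the best shot)
    face_source: replacement-side captured image, already aligned
    ring_image : reconstructed region Ar' placed in full-image coordinates
    face_mask  : True inside the ellipse E
    ring_mask  : True inside the region Ar'
    """
    out = np.where(face_mask, face_source, background)  # face interior from the source
    return np.where(ring_mask, ring_image, out)         # ring overrides both layers
```

Giving the ring priority means the pixels near the ellipse always come from the reconstructed region, which is where the seam is smoothed.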

上述した第２の実施形態例によれば、楕円Ｅを中心とした±１．５Ｌの領域Ａｒを、離散ウェーブレット変換領域下で置き換えることで、置き換え部分と被置き換え部分との境目が目立たない、自然な合成画像を得ることができる。 According to the second embodiment described above, performing the replacement in the discrete wavelet transform domain for the region Ar spanning ±1.5L around the ellipse E yields a natural composite image in which the boundary between the replacing portion and the replaced portion is inconspicuous.

また、離散ウェーブレット変換される領域が、楕円Ｅを中心とした±１．５Ｌの領域に限定されるため、計算時間が短縮され、ベストショットとしての合成画像Ｐｃが生成されるまでの時間も短くなる。したがって、例えば撮影画像Ｐｏが非常に高精細であり、データ量が多い場合等にも、処理を高速に行うことができる。 In addition, because the region subjected to the discrete wavelet transform is limited to ±1.5L around the ellipse E, the calculation time is reduced, and the time until the composite image Pc is generated as the best shot is shortened accordingly. Therefore, processing can be performed at high speed even when, for example, the captured image Po is of very high definition and the amount of data is large.

〔４．各種変形例〕 [4. Various modifications]
なお、上述した各実施形態例では、多重解像度分解部１３２が非間引きウェーブレット変換を行う例を挙げたが、これに限定されるものではなく、通常の離散ウェーブレット変換を行ってもよい。 In each of the above-described embodiments, the multi-resolution decomposition unit 132 performs the non-decimated wavelet transform. However, the present invention is not limited to this, and an ordinary discrete wavelet transform may be performed instead.

また、上述した各実施形態例では、顔領域の全体を含む領域を置換領域として抽出する例を挙げたが、これに限定されるものではない。目の部分や口の部分のみを、置換領域として抽出するようにしてもよい。 Further, in each of the above-described embodiments, an example in which an area including the entire face area is extracted as a replacement area has been described, but the present invention is not limited to this. Only the eye part and the mouth part may be extracted as replacement regions.

また、上述した各実施形態例では、良い表情を含む顔画像を置換領域として抽出し、良い表情でない顔画像を置き換える例を挙げたが、これに限定されるものではない。表情の善し悪しにかかわらず、任意の領域の画像を、対応する領域の画像で置き換えるようにしてもよい。 In each of the above-described embodiments, a face image including a good expression is extracted as a replacement area and a face image that does not have a good expression is replaced. However, the present invention is not limited to this. Regardless of whether the expression is good or bad, an image in an arbitrary area may be replaced with an image in a corresponding area.

〔５．本発明の一実施形態例に係る撮像装置の構成例〕 [5. Configuration example of imaging apparatus according to an embodiment of the present invention]
図２３は、本発明の画像処理装置を撮像装置に適用した場合の、撮像装置２００Ａの構成例を示すブロック図である。図２３に示す撮像装置２００Ａは、連写機能を備えるとともに、連写して得られた複数の撮影画像Ｐｏを用いて、ベストショットとしての合成画像Ｐｃを生成する。撮像装置２００Ａは、レンズ２０１と、撮像素子２０２と、信号処理部２０３と、画像処理部２０４と、タイミングジェネレータ（ＴＧ）２０５とを備える。 FIG. 23 is a block diagram illustrating a configuration example of the imaging apparatus 200A when the image processing apparatus of the present invention is applied to an imaging apparatus. The imaging apparatus 200A illustrated in FIG. 23 has a continuous shooting function and generates a composite image Pc as a best shot using a plurality of captured images Po obtained by continuous shooting. The imaging apparatus 200A includes a lens 201, an imaging element 202, a signal processing unit 203, an image processing unit 204, and a timing generator (TG) 205.

レンズ２０１は、単一のレンズ又は複数枚のレンズ群よりなり、被写体光を撮像装置２００内に取り込む。撮像部としての撮像素子２０２は、レンズ２０１を通して入射された被写体光を光電変換して画像信号を生成する。撮像素子２０２は、例えばＣＣＤ（Charge Coupled Device）イメージセンサやＣＭＯＳ（Complementary Metal Oxide Semiconductor）イメージセンサ等で構成される。 The lens 201 includes a single lens or a plurality of lens groups, and takes in subject light into the imaging device 200. An imaging element 202 as an imaging unit photoelectrically converts subject light incident through the lens 201 to generate an image signal. The image sensor 202 is configured by, for example, a CCD (Charge Coupled Device) image sensor, a CMOS (Complementary Metal Oxide Semiconductor) image sensor, or the like.

信号処理部２０３は、撮像素子２０２で得られた画像信号に対して、ＡＧＣ（Auto Gain Control）やＡＷＢ（Auto White Balance）、ガンマ補正等の各種信号処理を行う。また信号処理部２０３は、Ｒ，Ｇ，Ｂで入力された画像信号のフォーマットを、出力フォーマットに応じて変換する処理も行う。なお、画像信号がＹＵＶフォーマットで出力される場合には、次の画像処理部２０４で行う多重解像度分解処理は、ＲＧＢ信号の時と同様に、輝度信号、色差信号（Ｃｂ）、色差信号（Ｃｒ）のそれぞれに対して行う必要がある。 The signal processing unit 203 performs various kinds of signal processing, such as AGC (Auto Gain Control), AWB (Auto White Balance), and gamma correction, on the image signal obtained by the image sensor 202. The signal processing unit 203 also converts the format of the image signal input as R, G, and B according to the output format. When the image signal is output in the YUV format, the multi-resolution decomposition performed in the subsequent image processing unit 204 must be applied to each of the luminance signal, the color difference signal (Cb), and the color difference signal (Cr), just as it is applied to each component of an RGB signal.

The image processing unit 204 performs color processing such as demosaicing and color space conversion, resolution conversion, image filtering, and the like on the image signal processed by the signal processing unit 203. It also combines a plurality of captured images Po to obtain the composite image Pc as the best shot. The timing generator (TG) 205 generates drive timing pulses for the imaging element 202 and pulses for the signal processing unit 203 and the image processing unit 204, and supplies them to these units.
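The compositing step can be pictured as decomposing both images into frequency bands, swapping the coefficients that cover the region of interest in every band, and reconstructing. The sketch below does this for a single 1-D row with an unnormalized Haar transform. The Haar filter, the 1-D simplification, and the names `haar_decompose`, `haar_reconstruct`, and `replace_region` are assumptions chosen for brevity, not the filter or structure used in the embodiment.

```python
def haar_decompose(signal, levels):
    """Return (approx, details) where details[k] is the band of level k + 1."""
    approx = [float(v) for v in signal]
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2 ** levels")
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / 2 for a, b in pairs])  # high-frequency band
        approx = [(a + b) / 2 for a, b in pairs]         # low-frequency band
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarse to fine."""
    current = list(approx)
    for band in reversed(details):
        restored = []
        for a, d in zip(current, band):
            restored.extend((a + d, a - d))
        current = restored
    return current


def replace_region(base, donor, start, stop, levels):
    """Replace samples [start, stop) of `base` with `donor` in every band."""
    b_approx, b_details = haar_decompose(base, levels)
    d_approx, d_details = haar_decompose(donor, levels)
    for level, (b_band, d_band) in enumerate(zip(b_details, d_details), 1):
        step = 2 ** level
        lo, hi = start // step, -(-stop // step)  # floor and ceiling
        b_band[lo:hi] = d_band[lo:hi]
    step = 2 ** levels
    lo, hi = start // step, -(-stop // step)
    b_approx[lo:hi] = d_approx[lo:hi]
    return haar_reconstruct(b_approx, b_details)


base = [0] * 16          # row from the image being repaired
donor = list(range(16))  # row from the image supplying the replacement
merged = replace_region(base, donor, 4, 12, levels=2)
```

With the Haar filter and a region aligned to `2 ** levels`, the result is identical to copying the samples directly. Longer filters spread each coefficient over neighboring samples, which is what produces a gradual transition at the boundary.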

The imaging apparatus 200A also includes a memory 206 as a storage unit and a recording/playback processing unit (CODEC) 207. The memory 206 consists of a ROM (Read Only Memory) and a RAM (Random Access Memory). The RAM temporarily stores working data, such as the frequency components generated by the multi-resolution decomposition unit 132 and the images on which the replacement processing unit 133 performs replacement. The ROM stores the captured images Po, moving images, and the like captured by the imaging element 202. The recording/playback processing unit 207 compresses the image signal processed by the image processing unit 204 and writes it to the memory 206, and reads out and decompresses the compressed data stored in the memory 206.

The imaging apparatus 200A also includes a digital/analog conversion unit (D/A) 208, a video encoder 209, and a display unit 210. The D/A 208 converts the digital image signal processed by the image processing unit 204 into an analog image signal. The video encoder 209 encodes the image signal converted into an analog signal by the D/A 208 as video and outputs it to the display unit 210. The display unit 210 is a display that shows the video encoded by the video encoder 209, and is an LCD (Liquid Crystal Display) or an organic EL (Electro-Luminescence) display.

Furthermore, the imaging apparatus 200A includes a control unit 211, an external storage medium 212, an external storage medium I/F unit 213, and an operation input unit 214. The control unit 211 includes a CPU (Central Processing Unit) and the like, and controls each unit of the imaging apparatus 200A. The external storage medium 212 refers to a detachable storage medium known as removable media, or to an external storage medium such as a DVD (Digital Versatile Disc) or an HDD (Hard Disc Drive). The external storage medium I/F unit 213 controls the reading and writing of data to and from the external storage medium 212. The operation input unit 214 consists of buttons, knobs, keys, and the like, converts the operation entered by the user into an operation signal, and outputs the signal to the control unit 211.

FIG. 23 shows an example in which the image processing apparatus of the present invention is applied to the imaging apparatus 200A, which includes the display unit 210. The invention may also be applied to an imaging apparatus that has no display unit 210 and outputs the image signal to an external display device.

The image processing apparatus of the present invention is not limited to one used in a stand-alone environment. For example, it may be applied to an image processing apparatus used in an image processing service that processes and develops captured images Po transmitted over a network, or makes them into photo books or the like.

The series of processes in the embodiments described above can also be executed by software. In that case, the processes can be executed by a computer in which the program constituting the software is built into dedicated hardware, or by a computer on which programs for executing various functions are installed. For example, the program constituting the desired software may be installed and executed on a general-purpose personal computer or the like. The invention may also be applied to software running on a server connected to a network or in a cloud computing environment.

A recording medium storing the program code of software that implements the functions of the embodiments described above may also be supplied to a system or an apparatus. It goes without saying that the functions are likewise realized when a computer (or a control device such as a CPU) of that system or apparatus reads and executes the program code stored in the recording medium.

DESCRIPTION OF SYMBOLS: 100, 100A ... image processing apparatus, 110 ... face detection unit, 111 ... face part detection unit, 112 ... eye/mouth part detection unit, 120 ... captured image selection unit, 121 ... facial expression estimation unit, 122 ... replacement-side/replaced captured image selection unit, 130, 130A ... composite image generation unit, 131 ... replacement region range determination unit, 132 ... multi-resolution decomposition unit, 133 ... replacement processing unit, 134 ... reconstruction unit, 135 ... composite image generation processing unit, 200, 200A ... imaging apparatus, 201 ... lens, 202 ... imaging element, 203 ... signal processing unit, 204 ... image processing unit, 205 ... timing generator, 206 ... memory, 207 ... recording/playback processing unit, 208 ... digital/analog conversion unit, 209 ... video encoder, 210 ... display unit, 211 ... control unit, 212 ... external storage medium, 213 ... external storage medium I/F unit, 214 ... operation input unit

Claims

An image processing apparatus comprising:
a captured image selection unit that selects, from among a plurality of continuously captured images, a replaced captured image, which is the image in which replacement is performed, and a replacement-side captured image, which is the image that supplies the replacement;
a multi-resolution decomposition unit that decomposes into multiple resolutions a first region including at least a replacement-region peripheral region, which is a region around the outer periphery of a replacement region in the replacement-side captured image, and a second region including at least a replaced-region peripheral region, which is a region around the outer periphery of a replaced region in the replaced captured image;
a replacement processing unit that, in each frequency component obtained by the decomposition by the multi-resolution decomposition unit, replaces at least a part of the second region with at least a part of the first region; and
a composite image generation unit that generates a composite image by a first process of reconstructing the frequency components replaced by the replacement processing unit, or by a second process of combining the reconstructed image obtained by the first process with the image in the replacement region,
wherein, when the first region is the replacement-region peripheral region, the multi-resolution decomposition unit decomposes into multiple resolutions each of the replacement-region peripheral region as the first region and the replaced-region peripheral region as the second region, the replacement processing unit replaces, in the frequency components of the replacement-region peripheral region and of the replaced-region peripheral region obtained by the decomposition, the inner-periphery-side region of the replaced-region peripheral region with the inner-periphery-side region of the replacement-region peripheral region, and the composite image generation unit generates the composite image by the second process, and
wherein, when the first region is the entire replacement-side captured image, the multi-resolution decomposition unit decomposes into multiple resolutions the entire replacement-side captured image as the first region and the entire replaced captured image as the second region, the replacement processing unit replaces, in the frequency components of the entire replacement-side captured image and of the entire replaced captured image obtained by the decomposition, the replaced region with the replacement region, and the composite image generation unit generates the composite image by the first process.

The image processing apparatus according to claim 1, wherein the replacement-region peripheral region and the replaced-region peripheral region are regions formed on the inner-periphery side and the outer-periphery side of the outer periphery of the replacement region and of the replaced region, respectively, and
the inner-periphery-side region and the outer-periphery-side region each have a length of at least 1.5 times the length of the filter used in the multi-resolution decomposition unit.

The image processing apparatus according to claim 2, wherein the multi-resolution decomposition unit decomposes the first area and the second area into multi-resolution by discrete wavelet transform.

The image processing apparatus according to claim 3, further comprising:
a face detection unit that detects, from the plurality of captured images, a person's face and parts of the face; and
a facial expression estimation unit that analyzes the image of the face detected by the face detection unit and estimates its expression,
wherein the captured image selection unit selects, as the replacement-side captured image, a captured image that includes at least a face part whose expression is estimated to be good by the facial expression estimation unit, and selects the replaced captured image from among the captured images not selected as the replacement-side captured image.

The image processing apparatus, further comprising a replacement region range determination unit that determines the range to be extracted as the replacement region from the replacement-side captured image,
wherein the replacement region range determination unit determines, as the range to be extracted as the replacement region, a predetermined range that contains the entire face detected by the face detection unit and is wider than the entire face.

The image processing apparatus according to claim 5, wherein the replacement region range determination unit determines, as the range to be extracted as the replacement region, the range enclosed by an ellipse whose minor axis and major axis are obtained by multiplying by at least 1.5, respectively, the width and the length of the face that the face detection unit calculates from the distance between the two eyes of the face.
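The elliptical range in this claim can be sketched as follows. The claim states only that the face width and length are calculated from the distance between the eyes and then scaled by at least 1.5. The ratios `width_ratio` and `length_ratio` that map eye distance to face size, and the function names, are hypothetical parameters introduced here for illustration.

```python
def ellipse_axes(eye_distance, width_ratio, length_ratio, scale=1.5):
    """Return (minor_axis, major_axis) of the replacement-region ellipse.

    width_ratio and length_ratio are hypothetical: they stand for whatever
    rule the face detector uses to derive face size from eye distance.
    """
    if scale < 1.5:
        raise ValueError("the claim requires a factor of at least 1.5")
    face_width = eye_distance * width_ratio
    face_length = eye_distance * length_ratio
    return face_width * scale, face_length * scale


def inside_ellipse(x, y, cx, cy, minor_axis, major_axis):
    """True if (x, y) lies inside an upright ellipse centered at (cx, cy).

    The minor axis is horizontal (face width), the major axis vertical.
    """
    dx = (x - cx) / (minor_axis / 2)
    dy = (y - cy) / (major_axis / 2)
    return dx * dx + dy * dy <= 1.0


# Hypothetical numbers: eyes 60 px apart, face assumed 2x as wide and 3x as long.
minor_axis, major_axis = ellipse_axes(60, width_ratio=2.0, length_ratio=3.0)
center_inside = inside_ellipse(100, 100, 100, 100, minor_axis, major_axis)
```

A pixel mask for the replacement region follows by evaluating `inside_ellipse` for every pixel of the replacement-side captured image.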

The image processing apparatus according to claim 5, wherein the multi-resolution decomposition unit decomposes the first region and the second region into multiple resolutions by a non-decimated wavelet transform.
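A non-decimated wavelet transform skips the downsampling step, so every frequency band keeps the full length of the input and a position in the image maps to the same index in every band. The sketch below is one simple "a trous" Haar formulation for a 1-D signal with periodic boundaries. The filter, the boundary handling, and the names `swt_haar` and `iswt_haar` are assumptions for illustration, not details given in the claims.

```python
def swt_haar(signal, levels):
    """Non-decimated Haar transform; returns (approx, details).

    Every band has the same length as the input. At level k the two
    taps are 2 ** k samples apart instead of the signal being halved.
    """
    approx = [float(v) for v in signal]
    n = len(approx)
    details = []
    for level in range(levels):
        shift = 2 ** level
        partner = [approx[(i + shift) % n] for i in range(n)]  # periodic wrap
        details.append([(a - b) / 2 for a, b in zip(approx, partner)])
        approx = [(a + b) / 2 for a, b in zip(approx, partner)]
    return approx, details


def iswt_haar(approx, details):
    """Invert swt_haar: each level satisfies previous = approx + detail."""
    current = list(approx)
    for band in reversed(details):
        current = [a + d for a, d in zip(current, band)]
    return current


row = [3, 1, 4, 1, 5, 9, 2, 6]
approx, details = swt_haar(row, levels=2)
restored = iswt_haar(approx, details)
```

Because no band is subsampled, a replacement mask defined in pixel coordinates can be applied unchanged to every band, at the cost of storing `levels + 1` full-size arrays.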

An image processing method comprising:
a captured image selection step of selecting, from among a plurality of continuously captured images, a replaced captured image, which is the image in which replacement is performed, and a replacement-side captured image, which is the image that supplies the replacement;
a multi-resolution decomposition step of decomposing into multiple resolutions a first region including at least a replacement-region peripheral region, which is a region around the outer periphery of a replacement region in the replacement-side captured image, and a second region including at least a replaced-region peripheral region, which is a region around the outer periphery of a replaced region in the replaced captured image;
a replacement processing step of replacing, in each frequency component obtained by the decomposition into multiple resolutions, at least a part of the second region with at least a part of the first region; and
a composite image generation step of generating a composite image by a first process of reconstructing the frequency components subjected to the replacement, or by a second process of combining the reconstructed image obtained by the first process with the image in the replacement region,
wherein, when the first region is the replacement-region peripheral region, the multi-resolution decomposition step decomposes into multiple resolutions each of the replacement-region peripheral region as the first region and the replaced-region peripheral region as the second region, the replacement processing step replaces, in the frequency components of the replacement-region peripheral region and of the replaced-region peripheral region obtained by the decomposition, the inner-periphery-side region of the replaced-region peripheral region with the inner-periphery-side region of the replacement-region peripheral region, and the composite image generation step generates the composite image by the second process, and
wherein, when the first region is the entire replacement-side captured image, the multi-resolution decomposition step decomposes into multiple resolutions the entire replacement-side captured image as the first region and the entire replaced captured image as the second region, the replacement processing step replaces, in the frequency components of the entire replacement-side captured image and of the entire replaced captured image obtained by the decomposition, the replaced region with the replacement region, and the composite image generation step generates the composite image by the first process.

A program that causes a computer to execute:
a captured image selection step of selecting, from among a plurality of continuously captured images, a replaced captured image, which is the image in which replacement is performed, and a replacement-side captured image, which is the image that supplies the replacement;
a multi-resolution decomposition step of decomposing into multiple resolutions a first region including at least a replacement-region peripheral region, which is a region around the outer periphery of a replacement region in the replacement-side captured image, and a second region including at least a replaced-region peripheral region, which is a region around the outer periphery of a replaced region in the replaced captured image;
a replacement processing step of replacing, in each frequency component obtained by the decomposition into multiple resolutions, at least a part of the second region with at least a part of the first region; and
a composite image generation step of generating a composite image by a first process of reconstructing the frequency components subjected to the replacement, or by a second process of combining the reconstructed image obtained by the first process with the image in the replacement region,
wherein, when the first region is the replacement-region peripheral region, the multi-resolution decomposition step decomposes into multiple resolutions each of the replacement-region peripheral region as the first region and the replaced-region peripheral region as the second region, the replacement processing step replaces, in the frequency components of the replacement-region peripheral region and of the replaced-region peripheral region obtained by the decomposition, the inner-periphery-side region of the replaced-region peripheral region with the inner-periphery-side region of the replacement-region peripheral region, and the composite image generation step generates the composite image by the second process, and
wherein, when the first region is the entire replacement-side captured image, the multi-resolution decomposition step decomposes into multiple resolutions the entire replacement-side captured image as the first region and the entire replaced captured image as the second region, the replacement processing step replaces, in the frequency components of the entire replacement-side captured image and of the entire replaced captured image obtained by the decomposition, the replaced region with the replacement region, and the composite image generation step generates the composite image by the first process.

An imaging apparatus comprising:
an imaging unit that continuously captures a plurality of captured images;
a captured image selection unit that selects, from among the plurality of captured images captured by the imaging unit, a replaced captured image, which is the image in which replacement is performed, and a replacement-side captured image, which is the image that supplies the replacement;
a multi-resolution decomposition unit that decomposes into multiple resolutions a first region including at least a replacement-region peripheral region, which is a region around the outer periphery of a replacement region in the replacement-side captured image, and a second region including at least a replaced-region peripheral region, which is a region around the outer periphery of a replaced region in the replaced captured image;
a replacement processing unit that, in each frequency component obtained by the decomposition by the multi-resolution decomposition unit, replaces at least a part of the second region with at least a part of the first region;
a composite image generation unit that generates a composite image by a first process of reconstructing the frequency components replaced by the replacement processing unit, or by a second process of combining the reconstructed image obtained by the first process with the image in the replacement region; and
a storage unit that stores the composite image generated by the composite image generation unit,
wherein, when the first region is the replacement-region peripheral region, the multi-resolution decomposition unit decomposes into multiple resolutions each of the replacement-region peripheral region as the first region and the replaced-region peripheral region as the second region, the replacement processing unit replaces, in the frequency components of the replacement-region peripheral region and of the replaced-region peripheral region obtained by the decomposition, the inner-periphery-side region of the replaced-region peripheral region with the inner-periphery-side region of the replacement-region peripheral region, and the composite image generation unit generates the composite image by the second process, and
wherein, when the first region is the entire replacement-side captured image, the multi-resolution decomposition unit decomposes into multiple resolutions the entire replacement-side captured image as the first region and the entire replaced captured image as the second region, the replacement processing unit replaces, in the frequency components of the entire replacement-side captured image and of the entire replaced captured image obtained by the decomposition, the replaced region with the replacement region, and the composite image generation unit generates the composite image by the first process.