JP7686428B2

JP7686428B2 - IMAGE PROCESSING METHOD, MACHINE LEARNING MODEL GENERATION METHOD, IMAGE PROCESSING DEVICE, IMAGE PROCESSING SYSTEM, AND PROGRAM

Info

Publication number: JP7686428B2
Application number: JP2021066380A
Authority: JP
Inventors: 義明井田; 法人日浅
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-04-09
Filing date: 2021-04-09
Publication date: 2025-06-02
Anticipated expiration: 2041-04-09
Also published as: WO2022215375A1; JP2022161503A

Description

本発明は、機械学習を用いて複数の成分を補正する画像処理方法に関する。 The present invention relates to an image processing method that uses machine learning to correct multiple components.

撮像画像に含まれる収差成分とノイズ成分等の複数の成分を補正する場合、各成分で適切な補正強度を実現するためには成分ごとに補正を行うことが望ましい。従来、各成分が相関を持つ場合に撮像画像から一方の成分を変えずに他方の成分を補正することは困難であったが、近年、ディープラーニングを用いて一方の成分のみを高精度に補正できるようになってきた。特許文献１には、ニューラルネットワークを用いて、ノイズ成分を変えずに撮像画像から収差・回折によるぼけを補正した後にノイズ低減処理を行う方法が開示されている。 When correcting multiple components, such as aberration components and noise components, contained in a captured image, it is desirable to perform correction for each component in order to achieve an appropriate correction strength for each component. Conventionally, when the components are correlated, it has been difficult to correct one component from the captured image without changing the other component, but in recent years, it has become possible to correct only one component with high accuracy using deep learning. Patent Document 1 discloses a method of using a neural network to correct blurring due to aberration and diffraction from a captured image without changing the noise component, and then performing noise reduction processing.

Patent Document 1: JP 2020-144489 A

特許文献１の方法では、ぼけ補正の強度が調整された撮像画像に対してノイズ低減処理を行っている。そのため、鮮鋭度やアンダーシュートの調整のためにぼけ補正の強度を弱めた場合、ディープラーニングを用いずにノイズ低減処理を行うと収差成分が変化してしまう。一般にノイズ低減処理はぼかし処理になるため、ぼけ成分が広がり、色収差も広がって目立ってしまう。また、ぼけ成分を残差マップとして取得していないため、ノイズ低減処理後にぼけ補正の強度を調整できず、ぼけ補正の強度を調整するたびにノイズ低減処理を実行する必要がある。ディープラーニングの推定処理は計算負荷が大きいため、ぼけ補正の強度を調整するたびにディープラーニングでノイズ低減処理を行うと、計算負荷が著しく大きくなる。また、ぼけ成分とノイズ成分の相関を考慮していないため、ぼけ成分が変化した領域で輝度値の変化にノイズ成分が追従せず、不自然になってしまう。また、ノイズ成分を残差マップとして取得していないため、ぼけ成分を考慮してノイズ成分を修正することができない。 In the method of Patent Document 1, noise reduction processing is performed on a captured image in which the intensity of blur correction has been adjusted. Therefore, if the intensity of blur correction is weakened to adjust sharpness or undershoot, the aberration components will change if noise reduction processing is performed without using deep learning. Generally, noise reduction processing is a blurring process, so the blur components spread and chromatic aberration also spreads and becomes noticeable. In addition, since the blur components are not acquired as a residual map, the intensity of blur correction cannot be adjusted after noise reduction processing, and noise reduction processing must be performed every time the intensity of blur correction is adjusted. Since the estimation processing of deep learning requires a large calculation load, the calculation load will be significantly increased if noise reduction processing is performed using deep learning every time the intensity of blur correction is adjusted. In addition, since the correlation between the blur components and the noise components is not taken into consideration, the noise components do not follow the change in luminance value in the area where the blur components have changed, resulting in unnaturalness. In addition, since the noise components are not acquired as a residual map, the noise components cannot be corrected taking the blur components into consideration.

本発明は、相関を持つ複数の成分に対応する画像補正を適切な補正強度で行って自然な補正画像を取得可能な画像処理方法を提供することを目的とする。 The present invention aims to provide an image processing method that can perform image correction corresponding to multiple correlated components with appropriate correction strength to obtain a natural corrected image.

本発明の一側面としての画像処理方法は、少なくとも１つの機械学習モデルに入力画像を入力することで第１の成分に対応する第１の残差マップと第２の成分に対応する第２の残差マップとを取得する工程と、第１の残差マップと第２の残差マップとに基づいて第３の残差マップを生成する工程と、入力画像、第１の残差マップ、及び第３の残差マップに基づいて出力画像を取得する工程とを有することを特徴とする。 An image processing method according to one aspect of the present invention is characterized by comprising the steps of: acquiring a first residual map corresponding to a first component and a second residual map corresponding to a second component by inputting an input image to at least one machine learning model ; generating a third residual map based on the first residual map and the second residual map ; and acquiring an output image based on the input image, the first residual map, and the third residual map.

本発明によれば、相関を持つ複数の成分に対応する画像補正を適切な補正強度で行って自然な補正画像を取得可能な画像処理方法を提供することができる。 The present invention provides an image processing method that can perform image correction corresponding to multiple correlated components with appropriate correction strength to obtain a natural corrected image.

FIG. 1 is a block diagram of an image processing system according to a first embodiment.
FIG. 2 is an external view of the image processing system according to the first embodiment.
FIG. 3 is a flowchart relating to weight learning in the first embodiment.
FIG. 4 is a diagram showing the flow of learning the weights of a neural network in the first embodiment.
FIG. 5 is a flowchart relating to generation of a corrected image in the first embodiment.
FIG. 6 is a block diagram of an image processing flow according to the first embodiment.
FIG. 7 is a block diagram of an image processing flow according to a modified example of the first embodiment.
FIG. 8 is a block diagram of an image processing system according to a second embodiment.
FIG. 9 is an external view of the image processing system according to the second embodiment.
FIG. 10 is a block diagram of an image processing system according to a third embodiment.
FIG. 11 is a flowchart relating to image processing in the third embodiment.

以下、本発明の実施形態について、図面を参照しながら詳細に説明する。各図において、同一の部材については同一の参照符号を付し、重複する説明は省略する。 The following describes in detail an embodiment of the present invention with reference to the drawings. In each drawing, the same components are given the same reference symbols, and duplicate descriptions are omitted.

まず、実施例の具体的な説明を行う前に、本発明の要旨を説明する。本発明は、相関を持つ複数の成分に対応する画像補正を適切な補正強度で行い、自然な補正画像を取得する画像処理に関する。 First, before describing the specific embodiments, the gist of the present invention will be described. The present invention relates to image processing that performs image correction corresponding to multiple correlated components with appropriate correction strength to obtain a natural corrected image.

各成分は、画像の各画素位置における補正量に相当する２次元マップとして表現できる。各成分は、例えば、ノイズ成分、撮像系起因のぼけ成分、デフォーカス・手ぶれ・被写体ぶれによるぼけ成分、霧による散乱成分、リライティング成分、及び背景ぼけを修正する背景ぼけ成分等である。ノイズ成分とは、撮像画像に所定の分散で乱数的に発生し、撮像するたびに異なる分布で生じるランダムノイズである。ノイズ成分は、暗電流ノイズ、ショットノイズ、及び読み出しノイズを含む。撮像系起因のぼけ成分は、収差・回折、ローパスフィルタ、及び開口影響等を含む。リライティングは、撮像後に画像の光源環境を仮想的に変更することであり、光源付与や光源変更を含む。リライティングによる輝度変化成分を以下ではリライティング成分と呼ぶ。 Each component can be expressed as a two-dimensional map that corresponds to the amount of correction at each pixel position of the image. The components are, for example, noise components, blur components caused by the imaging system, blur components caused by defocus, camera shake, and subject motion, scattering components caused by fog, relighting components, and background blur components that correct background blur. Noise components are random noise that occurs randomly with a predetermined variance in the captured image and occurs with a different distribution each time imaging is performed. Noise components include dark current noise, shot noise, and readout noise. Blur components caused by the imaging system include aberration, diffraction, low-pass filters, and aperture effects. Relighting is a virtual change to the light source environment of an image after imaging, and includes adding a light source and changing the light source. The luminance change components caused by relighting are referred to as relighting components below.

各成分に対応する画像補正とは、ノイズの低減や付与、及びぼけの補正や付与等の各成分を変更して好ましい画像やユーザの編集意図を反映した画像とするために成分の変更を加えることである。本実施形態では、各成分に対応する残差マップを取得する。残差マップとは、画像に含まれる特定の成分の補正量（又は成分の値そのもの）を抽出した二次元マップであり、加算又は減算することで各成分に対応する補正が行われる。 Image correction corresponding to each component refers to modifying each component, such as reducing or adding noise, and correcting or adding blur, to create a preferable image or an image that reflects the user's editing intent. In this embodiment, a residual map corresponding to each component is obtained. A residual map is a two-dimensional map that extracts the amount of correction (or the component value itself) of a specific component contained in an image, and correction corresponding to each component is performed by adding or subtracting.
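As a minimal sketch of how a residual map could be applied, the snippet below adds a residual map to an image pixel by pixel, scaled by a strength factor. The sign convention (residual = corrected value minus input value), the `strength` parameter, and the name `apply_residual` are illustrative assumptions, not definitions taken from this document.

```python
def apply_residual(image, residual, strength=1.0):
    """Add `strength * residual` to `image`, pixel by pixel.

    Assumption: the residual map holds (corrected - input), so strength=1.0
    applies the full correction and strength=0.0 leaves the image unchanged.
    Images are plain lists of rows of floats with identical shapes.
    """
    if len(image) != len(residual) or any(
        len(a) != len(b) for a, b in zip(image, residual)
    ):
        raise ValueError("image and residual must have the same shape")
    return [
        [pixel + strength * delta for pixel, delta in zip(img_row, res_row)]
        for img_row, res_row in zip(image, residual)
    ]


image = [[10.0, 20.0], [30.0, 40.0]]
blur_residual = [[2.0, -4.0], [0.0, 8.0]]  # made-up values for illustration
half_corrected = apply_residual(image, blur_residual, strength=0.5)
```

Because the residual map is stored separately from the image, changing `strength` only requires repeating this cheap addition, not the step that produced the map.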

Multiple components can be inherently correlated. For example, the variance of a noise component depends on shooting conditions such as ISO sensitivity and on the image sensor, and when the noise includes shot noise, the variance also depends on the luminance value. When the luminance value of a pixel changes because a blur component or a fog scattering component has been corrected, the noise variance that the pixel would have had in the absence of blur or scattering differs from the variance actually present. If these corrections are made without considering the correlation, noise remains in the image in an unnatural way. For example, on pixels or image regions whose luminance has dropped because diffraction blur was removed, noise with a large variance corresponding to the luminance value before the blur removal is superimposed. Likewise, on pixels or image regions brightened by relighting, unnaturally small noise corresponding to the dark luminance value before relighting is superimposed. Furthermore, blur arising from the optical characteristics of the imaging system, defocus, camera shake, and subject motion spreads into surrounding pixels to a degree that depends on the brightness of the subject. Therefore, when the brightness of the subject changes due to lighting, the blur changes as well. Consequently, even when relighting is performed by image processing, correcting the blur without considering the correlation between components yields an image in which blur remains unnaturally or is overcorrected.
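The dependence of noise on luminance can be made concrete with a small numerical sketch. It assumes a commonly used simplified model in which the noise variance is a signal-dependent shot-noise term plus a constant readout term; the model, the function name `noise_std`, and all parameter values are assumptions for illustration and are not specified in this document.

```python
import math


def noise_std(luminance, shot_gain=0.5, read_variance=4.0):
    """Standard deviation of the noise at a given luminance.

    Assumed model (not from this document):
        variance = shot_gain * luminance + read_variance
    Both parameter values are arbitrary placeholders.
    """
    return math.sqrt(shot_gain * max(luminance, 0.0) + read_variance)


# A pixel that is bright only because blur leaked light into it.
luminance_before = 192.0  # captured value, blur included
luminance_after = 24.0    # value after the blur has been removed

std_before = noise_std(luminance_before)  # noise actually present
std_after = noise_std(luminance_after)    # noise expected at the new value
mismatch = std_before / std_after
```

With these placeholder values the noise left on the pixel is 2.5 times larger than the noise a pixel of that brightness would normally carry, which is the kind of mismatch the paragraph above describes.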

従来、複数の成分が相関を持つ場合に画像から一方の成分を変えずに他方の成分を補正することは困難であった。例えば、撮像系起因のぼけ成分を補正する場合、鮮鋭化処理を行うと周波数特性にゲインをかけることになり、ぼけ成分又は鮮鋭化処理の周波数特性に依存してノイズ成分が強調されてしまう。逆に、背景ぼけを大きくする場合、ぼかし処理をかけることになり、ノイズ成分が低減されてしまう。ノイズ成分を先に補正する場合でも、ノイズ低減処理は周囲の画像領域の情報を用いたぼかし処理になるため、ぼけ成分が大きくなってしまう。また、リライティング処理を行って輝度値を明るくしたり暗くしたりすると、ぼけ成分、ノイズ成分、及び霧による散乱成分が大きくなったり小さくなったりする。これらの相関する成分を切り分けて取得するには、シーンに応じた各成分の変化を捉えて、画像中の被写体に応じた処理を行う必要がある。 Conventionally, when multiple components are correlated, it has been difficult to correct one component from an image without changing the other. For example, when correcting blur components caused by the imaging system, sharpening processing applies a gain to the frequency characteristics, and noise components are emphasized depending on the frequency characteristics of the blur components or the sharpening processing. Conversely, when background blur is increased, blurring processing is applied, and noise components are reduced. Even when noise components are corrected first, the noise reduction processing is a blurring process that uses information from the surrounding image area, so the blur components become larger. In addition, when the luminance value is brightened or darkened by performing relighting processing, the blur components, noise components, and scattering components due to fog become larger or smaller. To separate and obtain these correlated components, it is necessary to capture the changes in each component according to the scene and perform processing according to the subject in the image.

In recent years, machine learning, and deep learning in particular, has made it possible to understand scenes and separate components, so that only one component can be corrected with high accuracy. This makes it possible to correct each of multiple correlated components accurately. As an example, consider applying noise reduction and sharpening by blur correction to an image containing a noise component and a blur component. Correcting only the blur component with high accuracy using machine learning first yields an image that contains the noise component but no blur component. Conventional noise reduction can then be applied to that image to reduce the noise component. However, noise reduction introduces blur even into an image that contains no blur component, so it is preferable to perform the noise reduction with machine learning as well. The appropriate degree of sharpness and noise depends on the rendering intent of the imaging device manufacturer and the editing intent of the user, so it is preferable that the correction strength be adjustable. Adjustable correction strength is also preferable because the correction processing can produce artifacts depending on the scene. Artifacts include blur caused by noise reduction, and undershoot and ringing caused by sharpening, and they can occur in some scenes even with high-accuracy correction using machine learning. For example, artifacts are likely in scenes in which a light source appears in the frame or the subject has high contrast. When the correction strength is adjusted in this way, the blur correction strength is set so that part of the blur component remains. The noise reduction executed afterward can therefore achieve high-accuracy correction unaffected by the blur component by using machine learning.

As described above, each component can be corrected by correcting the components one at a time. However, when the correction strength of an earlier stage is changed, the later-stage correction must be executed again, and because machine learning inference is generally computationally expensive, this re-execution makes adjusting the correction strength difficult. Therefore, in the present invention, each component is acquired as a corresponding residual map. Adjusting the correction strength based on the residual maps yields a corrected image in which each component is corrected with an appropriate strength. Because the components must be acquired separately, a machine learning model must be used for each. When multiple correlated components are corrected, correcting one component changes the magnitude that the other component should properly have, even if the other component itself is left unchanged. Therefore, in the present invention, one residual map is corrected based on the other residual map. This makes it possible to obtain a natural corrected image while adjusting the correction strength for multiple correlated components. The residual maps before and after this correction have different distributions. Here, two residual maps having different distributions means that they are not related by multiplying every pixel by the same value or adding the same offset to every pixel. This is because the correction of the residual map is not uniform across the frame but is performed per pixel or per image region, with the value of one component at each pixel or image region determining how the other component is corrected. This in turn follows from the fact that the correlation between components in the present invention is local: one component is affected according to the local value of the other. In addition, because each component can be acquired separately as a residual map, the correction can be physically grounded rather than empirical. Correlated components may have a physical order: a subject whose brightness has been changed by a light source is blurred by the optical characteristics of the imaging system, and noise arises when the blurred light is sensed. Because a component that arises physically later is affected by a component that arises earlier, it is preferable to correct the residual map corresponding to the later component based on the residual map corresponding to the earlier component.
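The definition of "different distributions" given above can be checked numerically. The following sketch fits a single global gain and offset between two maps by least squares and reports whether that fit reproduces the second map exactly. The function name `is_uniform_gain_offset`, the tolerance, and the sample values are hypothetical.

```python
def is_uniform_gain_offset(map_a, map_b, tol=1e-9):
    """Return True if map_b == gain * map_a + offset for ONE global
    (gain, offset) pair, i.e. the maps do not have "different
    distributions" in the sense used in the text."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a)
    if var_a == 0.0:
        # map_a is constant: only a constant map_b can be matched.
        return all(abs(y - mean_b) <= tol for y in b)
    # Least-squares fit of b = gain * a + offset.
    gain = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / var_a
    offset = mean_b - gain * mean_a
    return all(abs(gain * x + offset - y) <= tol for x, y in zip(a, b))


noise_residual = [[1.0, -2.0], [3.0, -4.0]]
# Same gain and offset everywhere: not a different distribution.
uniform = [[0.5 * v + 1.0 for v in row] for row in noise_residual]
# Per-pixel factors (made-up values) driven by another component.
local_factor = [[1.0, 0.5], [0.25, 1.0]]
local = [
    [f * v for f, v in zip(f_row, v_row)]
    for f_row, v_row in zip(local_factor, noise_residual)
]
```

Here `is_uniform_gain_offset(noise_residual, uniform)` is true, while the locally scaled map fails the check, matching the distinction the paragraph draws between frame-wide and per-pixel processing.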

なお、機械学習であれば前段の補正で１つの成分のみを補正するように学習しなくても、後段の補正で前段の処理を前提として学習することも可能である。この場合は後段で補正したい成分の変化を前提として学習するため、残差マップの修正が不要となる。しかしながら、前段の処理の有無や補正強度の変化に対応して学習すると学習条件が増えるため、学習の手間、データ量の増加、及び補正性能劣化につながる。また、前段の処理に変更がある場合や新たな処理を追加したい場合に後段の処理を見直す必要がある。 Note that with machine learning, it is possible to learn the latter correction assuming the former processing, rather than learning to correct only one component in the former correction. In this case, since learning is based on the change in the component to be corrected in the latter, there is no need to modify the residual map. However, learning in response to the presence or absence of former processing or changes in correction strength increases the learning conditions, leading to more learning work, an increase in data volume, and deterioration of correction performance. Also, if there are changes to the former processing or if new processing is to be added, the latter processing must be reviewed.

本実施例では、多層のニューラルネットワーク（機械学習モデル）を用いて撮像系の光学特性に由来するぼけ成分（第１の成分）の補正とノイズ成分（第２の成分）の補正を学習、実行させる画像処理システムについて説明する。ぼけ成分とノイズ成分は画像補正のそれぞれ異なるタスクに対応する。なお、本発明は、ぼけ成分の補正とノイズ成分の補正とを組み合わせた画像処理に限定されるものではなく、その他の画像処理にも適用可能である。 In this embodiment, an image processing system is described that uses a multi-layer neural network (machine learning model) to learn and execute correction of a blur component (first component) derived from the optical characteristics of an imaging system and correction of a noise component (second component). The blur component and the noise component correspond to different tasks of image correction. Note that the present invention is not limited to image processing that combines correction of a blur component and correction of a noise component, and can be applied to other types of image processing.

図１は、本実施例の画像処理システム１００のブロック図である。図２は、画像処理システム１００の外観図である。画像処理システム１００は、学習装置１０１、撮像装置１０２、画像推定装置（画像処理装置）１０３、表示装置１０４、記録媒体１０５、出力装置１０６、及びネットワーク１０７を有する。 Figure 1 is a block diagram of an image processing system 100 according to this embodiment. Figure 2 is an external view of the image processing system 100. The image processing system 100 includes a learning device 101, an imaging device 102, an image estimation device (image processing device) 103, a display device 104, a recording medium 105, an output device 106, and a network 107.

学習装置１０１は、記憶部１０１ａ、取得部１０１ｂ、生成部１０１ｃ、及び更新部１０１ｄを有する。 The learning device 101 has a memory unit 101a, an acquisition unit 101b, a generation unit 101c, and an update unit 101d.

撮像装置１０２は、光学系１０２ａ及び撮像素子１０２ｂを有する。光学系１０２ａは、被写体空間から撮像装置１０２に入射した光を集光する。撮像素子１０２ｂは、光学系１０２ａを介して形成された光学像（被写体像）を受光して（光電変換して）撮像画像を取得する。撮像素子１０２ｂは例えば、ＣＣＤ（ＣｈａｒｇｅＣｏｕｐｌｅｄＤｅｖｉｃｅ）センサや、ＣＭＯＳ（ＣｏｍｐｌｅｍｅｎｔａｒｙＭｅｔａｌ－ＯｘｉｄｅＳｅｍｉｃｏｎｄｕｃｔｏｒ）センサ等である。撮像装置１０２によって取得される撮像画像は、光学系１０２ａの収差や回折によるぼけと、撮像素子１０２ｂによるノイズとを含む。 The imaging device 102 has an optical system 102a and an imaging element 102b. The optical system 102a collects light incident on the imaging device 102 from the subject space. The imaging element 102b receives (photoelectrically converts) the optical image (subject image) formed via the optical system 102a to obtain an image. The imaging element 102b is, for example, a CCD (Charge Coupled Device) sensor or a CMOS (Complementary Metal-Oxide Semiconductor) sensor. The image obtained by the imaging device 102 includes blur due to aberration and diffraction of the optical system 102a and noise due to the imaging element 102b.

画像推定装置１０３は、記憶部１０３ａ、取得部（第１取得部）１０３ｂ、及び補正部（修正部、第２取得部）１０３ｃを有する。画像推定装置１０３は、撮像装置１０２から撮像画像を取得すると共に、少なくとも１つの機械学習モデルを用いてぼけ成分に対応するぼけ残差マップ（第１の残差マップ）とノイズ成分に対応するノイズ残差マップ（第２の残差マップ）とを取得する。その後、撮像画像（入力画像）とぼけ残差マップに基づいてノイズ残差マップを修正し、修正ノイズ残差マップ（第３の残差マップ）を取得する。そして、撮像画像とぼけ残差マップと修正ノイズ残差マップとに基づいてぼけとノイズとが適切な強度で補正された推定画像（補正画像、出力画像）を生成する。なお、画像推定装置１０３は、必要に応じて現像処理やその他の画像処理を行う機能を有する。 The image estimation device 103 has a storage unit 103a, an acquisition unit (first acquisition unit) 103b, and a correction unit (modification unit, second acquisition unit) 103c. The image estimation device 103 acquires a captured image from the imaging device 102, and acquires a blur residual map (first residual map) corresponding to the blur component and a noise residual map (second residual map) corresponding to the noise component using at least one machine learning model. Then, the noise residual map is corrected based on the captured image (input image) and the blur residual map to acquire a corrected noise residual map (third residual map). Then, an estimated image (corrected image, output image) in which blur and noise are corrected with an appropriate strength is generated based on the captured image, the blur residual map, and the corrected noise residual map. The image estimation device 103 has a function of performing development processing and other image processing as necessary.
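The data flow described above (two residual maps in, a third residual map derived from local values, then an output image) can be sketched as follows. This is only an illustration under stated assumptions: the residual maps are passed in directly instead of coming from a trained model, the noise residual is assumed to hold the noise value itself and is subtracted, and the per-pixel rule, the noise model in `noise_std`, and the parameters `blur_strength` and `noise_keep` are hypothetical choices that this document does not specify.

```python
import math


def noise_std(luminance, shot_gain=0.5, read_variance=4.0):
    """Assumed noise model: variance = shot_gain * luminance + read_variance."""
    return math.sqrt(shot_gain * max(luminance, 0.0) + read_variance)


def correct_noise_residual(image, blur_res, noise_res, blur_strength, noise_keep):
    """Build the third residual map from the input image, the blur residual
    map, and the noise residual map.

    Hypothetical rule: keep the fraction `noise_keep` of the noise, rescaled
    per pixel to the standard deviation expected at the luminance the pixel
    has after blur correction. The returned map is the amount to subtract.
    """
    corrected = []
    for img_row, blur_row, noise_row in zip(image, blur_res, noise_res):
        row = []
        for pixel, blur, noise in zip(img_row, blur_row, noise_row):
            before = pixel - noise                # noise-free, still blurred
            after = before + blur_strength * blur  # noise-free, blur corrected
            scale = noise_std(after) / noise_std(before)
            row.append((1.0 - noise_keep * scale) * noise)
        corrected.append(row)
    return corrected


def compose_output(image, blur_res, corrected_noise_res, blur_strength):
    """Output = input + strength * blur residual - corrected noise residual."""
    return [
        [p + blur_strength * b - n for p, b, n in zip(img_row, b_row, n_row)]
        for img_row, b_row, n_row in zip(image, blur_res, corrected_noise_res)
    ]


image = [[197.0, 94.0]]           # captured values (made up)
blur_residual = [[-168.0, 0.0]]   # first residual map (made up)
noise_residual = [[5.0, -2.0]]    # second residual map (made up)
third = correct_noise_residual(image, blur_residual, noise_residual, 1.0, 0.5)
output = compose_output(image, blur_residual, third, 1.0)
```

With these placeholder numbers, the first pixel drops from about 192 to 24 after blur correction, so its retained noise is scaled by 0.4. The second pixel is untouched by the blur correction and simply keeps half of its noise. Changing `blur_strength` or `noise_keep` only repeats this arithmetic on the stored maps.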

残差マップの取得において多層のニューラルネットワークが使用され、ウェイトの情報（機械学習モデルウェイト）は記憶部１０３ａから読み出される。ウェイトの情報は、学習装置１０１で学習されたものである。画像推定装置１０３は、事前にネットワーク１０７を介して記憶部１０１ａからウェイトの情報を読み出し、記憶部１０３ａに保存している。ウェイトの情報は、ウェイトの数値そのものでもよいし、符号化された形式の情報でもよい。ウェイトの学習、及びウェイトを用いた各残差マップの取得に関する詳細は後述する。 A multi-layer neural network is used to obtain the residual map, and weight information (machine learning model weights) is read from the memory unit 103a. The weight information is learned by the learning device 101. The image estimation device 103 reads the weight information from the memory unit 101a via the network 107 in advance and stores it in the memory unit 103a. The weight information may be the weight's numerical value itself, or may be information in an encoded format. Details regarding learning the weights and obtaining each residual map using the weights will be described later.

補正画像は、表示装置１０４、記録媒体１０５、及び出力装置１０６の少なくとも１つに出力される。表示装置１０４は例えば、液晶ディスプレイやプロジェクタ等である。ユーザは、表示装置１０４を介して、処理途中の画像を確認しながら編集作業などを行うことができる。記録媒体１０５は例えば、半導体メモリ、ハードディスク、及びネットワーク上のサーバ等である。出力装置１０６は例えば、プリンタ等である。 The corrected image is output to at least one of the display device 104, the recording medium 105, and the output device 106. The display device 104 is, for example, a liquid crystal display or a projector. The user can perform editing work while checking the image being processed via the display device 104. The recording medium 105 is, for example, a semiconductor memory, a hard disk, or a server on a network. The output device 106 is, for example, a printer.

以下、図３及び図４を参照して、本実施例の学習装置１０１により実行されるウェイト（ウェイトの情報）の学習方法（学習済みモデルの製造方法）に関して説明する。図３は、ウェイトの学習に関するフローチャートである。図３の各ステップの処理は、主に、学習装置１０１の取得部１０１ｂ、生成部１０１ｃ、又は更新部１０１ｄにより実行される。図４は、ニューラルネットワークのウェイトの学習の流れを示す図である。 The weight (weight information) learning method (method of manufacturing a trained model) executed by the learning device 101 of this embodiment will be described below with reference to Figs. 3 and 4. Fig. 3 is a flowchart related to weight learning. The processing of each step in Fig. 3 is mainly executed by the acquisition unit 101b, the generation unit 101c, or the update unit 101d of the learning device 101. Fig. 4 is a diagram showing the flow of learning the weights of a neural network.

In step S101, the acquisition unit 101b acquires original images (subject images). In this embodiment, an original image is a high-resolution (high-quality) image with little blur due to aberration or diffraction of the optical system 102a. Multiple original images are acquired, covering a variety of subjects, that is, edges of various strengths and directions, textures, gradations, flat regions, and so on. An original image may be a photographed image or an image generated by CG (computer graphics). In particular, a photographed image already contains blur due to aberration and diffraction, so downscaling it reduces the influence of the blur and yields a high-resolution (high-quality) image. If the original image already contains sufficient high-frequency components, downscaling may be omitted.

原画像は、撮像素子１０２ｂの輝度飽和値よりも高い信号値を有することが好ましい。これは、実際の被写体においても、特定の露出条件で撮像装置１０２により撮影を行った際、輝度飽和値に収まらない被写体が存在するためである。原画像として実写画像を使用する場合、ＨＤＲ撮影や撮像装置１０２より高いダイナミックレンジを持つ撮像装置で撮影することで取得可能である。撮像装置１０２と同等のダイナミックレンジを持つ撮像装置で撮影した画像を原画像とする場合、信号値を比例倍する等して高い信号値とすることも可能である。ただし、比例倍による階調の低下が学習結果に影響を与えない範囲で行うことが好ましい。また、原画像は、ノイズ成分を有していてもよい。この場合、原画像に含まれるノイズを含めて被写体であるとみなせるため、原画像のノイズは特に問題にならない。 It is preferable that the original image has a signal value higher than the brightness saturation value of the image sensor 102b. This is because there are actual subjects that do not fall within the brightness saturation value when photographed by the image capture device 102 under specific exposure conditions. When using a real image as the original image, it can be obtained by HDR photography or by photographing with an image capture device that has a higher dynamic range than the image capture device 102. When using an image photographed with an image capture device that has the same dynamic range as the image capture device 102 as the original image, it is also possible to obtain a high signal value by proportionally multiplying the signal value. However, it is preferable to do this within a range where the reduction in gradation due to proportional multiplication does not affect the learning results. In addition, the original image may have noise components. In this case, the noise contained in the original image can be considered to be the subject, so the noise in the original image is not a particular problem.

In step S102, the acquisition unit 101b acquires the blur used in the imaging simulation described later. First, the acquisition unit 101b acquires imaging conditions corresponding to the lens state (zoom, aperture, and focus distance) of the optical system 102a. The acquisition unit 101b then acquires the blur determined by the imaging conditions and the position in the frame. Here, the blur is the PSF (point spread function) or OTF (optical transfer function) of the optical system 102a. The blur can be obtained by optical simulation or measurement of the optical system 102a. For each original image, blur due to aberration and diffraction is acquired for a different lens state, image height, and azimuth. This makes it possible to perform imaging simulations covering multiple imaging conditions, image heights, and azimuths. If necessary, components such as an optical low-pass filter included in the imaging device 102 may be added to the applied blur.
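To illustrate how a PSF could be used in an imaging simulation, the sketch below convolves an image with a normalized PSF so that each pixel's light is spread over its neighbours. It is a simplified illustration: a single PSF is applied to the whole image, whereas the text describes blur that varies with imaging conditions and position in the frame. The PSF values and the function names are made up.

```python
def normalize_psf(psf):
    """Scale the PSF so its weights sum to 1 (total light is preserved)."""
    total = sum(sum(row) for row in psf)
    return [[v / total for v in row] for row in psf]


def blur_with_psf(image, psf):
    """2-D convolution of `image` with an odd-sized `psf`.

    Light that would land outside the image is discarded, and the output
    has the same shape as the input.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(psf), len(psf[0])
    cy, cx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            value = image[y][x]
            if value == 0.0:
                continue
            # Spread this pixel's light over its neighbours, weighted by the PSF.
            for ky in range(kh):
                for kx in range(kw):
                    ty, tx = y + ky - cy, x + kx - cx
                    if 0 <= ty < h and 0 <= tx < w:
                        out[ty][tx] += value * psf[ky][kx]
    return out


psf = normalize_psf([[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]])
point_source = [[0.0] * 5 for _ in range(5)]
point_source[2][2] = 80.0
blurred = blur_with_psf(point_source, psf)
```

Blurring a single bright pixel reproduces the PSF scaled by the pixel value, which is a quick sanity check for any PSF obtained by simulation or measurement.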

In step S103, the generation unit 101c generates learning data that combines correct data consisting of correct patches (correct images) with training data consisting of training patches (training images). The correct patches and training patches depend on the function or effect to be learned, and images corresponding to that function or effect are used as the correct patches and training patches. As the images used for the correct patches and training patches, images that differ in the first component and in the second component are generated. In this embodiment, a component-free patch is generated from the original image, and a blurred patch is generated by applying the blur from step S102 to the component-free patch. In addition, a noise patch and a noise-blurred patch are generated by adding a noise component to the component-free patch and the blurred patch, respectively. Multiple component-free patches and blurred patches are generated, with one or more patches generated per original image. In this embodiment, a component-free patch and its blurred patch show the same subject. In this embodiment, the component-free patch and the blurred patch are used as the correct patch and the training patch, respectively. A collection of many such pairs is used as learning data to train a machine learning model that acquires a blur residual map for correcting the aberration and diffraction of the optical system 102a. The component-free patch therefore needs to have less blur than the blurred patch. However, as described later, there may be cases in which no correction is performed depending on the conditions of the original image, so the learning data may include pairs in which the component-free patch and the blurred patch are the same image.
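The four kinds of patch named in step S103 can be produced from one clean patch as in the sketch below. The three-tap horizontal blur stands in for the blur obtained in step S102, and the noise model, its parameters, the independent noise draws for the two noisy patches, and all function names are assumptions made for illustration.

```python
import math
import random


def blur_rows(patch):
    """Stand-in blur: 3-tap horizontal kernel [0.25, 0.5, 0.25], edges repeated."""
    out = []
    for row in patch:
        w = len(row)
        out.append([
            0.25 * row[max(x - 1, 0)] + 0.5 * row[x] + 0.25 * row[min(x + 1, w - 1)]
            for x in range(w)
        ])
    return out


def add_noise(patch, rng, shot_gain=0.01, read_std=0.5):
    """Add zero-mean Gaussian noise whose variance grows with the signal."""
    return [
        [v + rng.gauss(0.0, math.sqrt(shot_gain * max(v, 0.0) + read_std ** 2))
         for v in row]
        for row in patch
    ]


def make_patch_set(clean_patch, seed=0):
    """Return the component-free, blurred, noise, and noise-blurred patches."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    blurred = blur_rows(clean_patch)
    return {
        "component_free": clean_patch,
        "blurred": blurred,
        "noise": add_noise(clean_patch, rng),
        "noise_blurred": add_noise(blurred, rng),
    }


clean = [[0.0, 0.0, 100.0, 100.0], [0.0, 0.0, 100.0, 100.0]]
patches = make_patch_set(clean)
# Pair used for the blur residual model: (training patch, correct patch).
blur_pair = (patches["blurred"], patches["component_free"])
```

Keeping all four variants of the same subject makes it straightforward to assemble different training and correct pairs depending on which component a model is meant to isolate.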

なお、パッチとは、既定の画素数（例えば、６４×６４画素など）を有する画像を指す。また、正解パッチと訓練パッチの画素数は、必ずしも一致する必要はない。本実施例では、多層のニューラルネットワークのウェイトの学習に、ミニバッチ学習を使用する。このため、ステップＳ１０３では、複数組の正解パッチと訓練パッチを生成する。ただし、本発明はこれに限定されるものではなく、オンライン学習又はバッチ学習を用いてもよい。 Note that a patch refers to an image having a predetermined number of pixels (e.g., 64 x 64 pixels). The number of pixels of the correct patch and the training patch does not necessarily have to be the same. In this embodiment, mini-batch learning is used to learn the weights of the multi-layered neural network. For this reason, in step S103, multiple pairs of correct patches and training patches are generated. However, the present invention is not limited to this, and online learning or batch learning may also be used.

本実施例では、記憶部１０１ａに記憶されている複数の原画像を被写体として、撮像シミュレーションを行うことにより、収差や回折によるぼけの影響が相対的に異なる無成分画像とぼけ画像とのペアを複数生成する。この時点では無成分画像とぼけ画像は学習データとして用いるパッチの画素数と同じかそれより大きい画素数を有する。そして、無成分画像とぼけ画像との複数のペアから同一位置で規定の画素サイズの部分領域を抽出することで、複数の無成分パッチとぼけパッチを取得する。本実施例では、原画像は未現像のＲＡＷ画像であり、無成分パッチとぼけパッチも同様にＲＡＷ画像である。ただし本発明は、これに限定されるものではなく、現像後の画像でもよい。また、部分領域の位置とは、部分領域の中心を指す。なお、本実施例では上記方法により、無成分パッチとぼけパッチを取得するが、本発明はこれに限定されるものではない。 In this embodiment, a plurality of pairs of component-free images and blurred images with relatively different effects of blurring due to aberration and diffraction are generated by performing an imaging simulation using a plurality of original images stored in the storage unit 101a as subjects. At this point, the component-free images and blurred images have the same or larger number of pixels as the number of pixels of the patch used as learning data. Then, a plurality of component-free patches and blurred patches are obtained by extracting partial areas of a specified pixel size at the same position from the plurality of pairs of component-free images and blurred images. In this embodiment, the original image is an undeveloped RAW image, and the component-free patches and blurred patches are also RAW images. However, the present invention is not limited to this, and may be images after development. In addition, the position of the partial area refers to the center of the partial area. Note that in this embodiment, the component-free patches and blurred patches are obtained by the above method, but the present invention is not limited to this.
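The extraction of partial areas at the same position from a paired component-free image and blurred image can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the function name `extract_patch_pairs`, the random choice of positions, and the use of the top-left corner (rather than the center) to index a partial area are assumptions made for the example.

```python
import numpy as np

def extract_patch_pairs(clean_img, blur_img, patch_size=64, num_patches=4, seed=0):
    """Crop patches at identical positions from a component-free / blurred image pair."""
    if clean_img.shape != blur_img.shape:
        raise ValueError("paired images must have the same shape")
    h, w = clean_img.shape[:2]
    if h < patch_size or w < patch_size:
        raise ValueError("images must be at least patch_size in each dimension")
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(num_patches):
        # Hypothetical sampling: a random top-left corner, shared by both images.
        top = int(rng.integers(0, h - patch_size + 1))
        left = int(rng.integers(0, w - patch_size + 1))
        window = (slice(top, top + patch_size), slice(left, left + patch_size))
        pairs.append((clean_img[window], blur_img[window]))
    return pairs
```

Because one window is applied to both images, each pair shows the same subject region and differs only in the blur component.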

また、撮像素子１０２ｂでノイズが発生するため、学習データにもノイズの付与を行う撮像素子１０２ｂのノイズ特性に対応した乱数を生成して付与すればよく、ＩＳＯ感度等の撮影条件を考慮してもよい。ぼけ画像と無成分画像で同一のノイズを付与することで、ノイズぼけ画像（ノイズぼけパッチ）とノイズ画像（ノイズパッチ）を取得する。付与するノイズはσ（ｘ，ｙ）・ｒ（ｘ，ｙ）で表され、以下の式（１）を満たす。
σ^２（ｘ，ｙ）＝［ｋ１（Ｓ１（ｘ，ｙ）－ＯＢ）＋ｋ０］×ＩＳＯ／１００　（１）
（ｘ，ｙ）は２次元の空間座標、Ｓ１（ｘ，ｙ）はノイズを付与する前のぼけパッチの座標（ｘ，ｙ）における画素の信号値である。ｒ（ｘ，ｙ）は標準偏差１の乱数マップの座標（ｘ，ｙ）における数値、σ（ｘ，ｙ）はノイズの標準偏差（σ^２（ｘ，ｙ）は分散）である。ＯＢはオプティカルブラック（黒レベルの画像）の信号値、ＩＳＯはＩＳＯ感度、ｋ１とｋ０はＩＳＯ感度１００における信号値に対する比例係数と定数である。比例係数ｋ１はショットノイズの影響を表し、定数ｋ０は暗電流や読み出しノイズの影響を表す。ｋ１とｋ０の値は、撮像素子１０２ｂのノイズ特性によって決まる。これにより、無成分パッチとぼけパッチの対応する画素（対応画素）に対して、共通のノイズが付与される。対応画素とは、被写体空間の同一の位置を撮像した画素、又は無成分パッチとぼけパッチの同一の位置の画素である。撮像素子１０２ｂの様々なＩＳＯ感度に対応する場合は、複数の無成分パッチ２０１とぼけパッチ２０２に対して、異なるＩＳＯ感度のノイズを付与する。 In addition, since noise occurs in the image sensor 102b, noise is also added to the learning data. A random number corresponding to the noise characteristics of the image sensor 102b may be generated and added, and shooting conditions such as ISO sensitivity may be taken into consideration. By adding the same noise to the blurred image and the component-free image, a noise blurred image (noise blurred patch) and a noise image (noise patch) are obtained. The noise to be added is represented by σ(x, y)·r(x, y) and satisfies the following formula (1).
σ²(x, y) = [k1(S1(x, y) - OB) + k0] × ISO/100 (1)
(x, y) is a two-dimensional spatial coordinate, and S1(x, y) is a signal value of a pixel at the coordinates (x, y) of the blurred patch before noise is added. r(x, y) is a numerical value at the coordinates (x, y) of a random number map with a standard deviation of 1, and σ(x, y) is the standard deviation of noise (σ ² (x, y) is the variance). OB is a signal value of optical black (black level image), ISO is ISO sensitivity, and k1 and k0 are a proportionality coefficient and a constant for the signal value at ISO sensitivity 100. The proportionality coefficient k1 represents the influence of shot noise, and the constant k0 represents the influence of dark current and readout noise. The values of k1 and k0 are determined by the noise characteristics of the image sensor 102b. As a result, a common noise is added to corresponding pixels (corresponding pixels) of the component-free patch and the blurred patch. A corresponding pixel is a pixel that captures the same position in the subject space, or a pixel that is at the same position in the component-free patch and the blurred patch. When various ISO sensitivities of the image sensor 102b are supported, noises of different ISO sensitivities are added to the multiple component-free patches 201 and blur patches 202.
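Formula (1) and the addition of a common noise to corresponding pixels can be sketched as follows. The sketch assumes a Gaussian random map for r(x, y) (the description only requires a standard deviation of 1) and clips negative variances to zero for signals below the black level; both choices, the function names, and any parameter values are assumptions of this example rather than values from the description.

```python
import numpy as np

def noise_sigma(signal, k1, k0, ob, iso):
    """Noise standard deviation from formula (1): sigma^2 = [k1*(S - OB) + k0] * ISO / 100."""
    variance = (k1 * (signal - ob) + k0) * iso / 100.0
    # Added safeguard (assumption): clip negative variance to zero before the square root.
    return np.sqrt(np.clip(variance, 0.0, None))

def add_shared_noise(clean_patch, blur_patch, k1, k0, ob, iso, seed=0):
    """Return (noise patch, noise blur patch) that share one noise realization."""
    rng = np.random.default_rng(seed)
    r = rng.standard_normal(blur_patch.shape)   # random map with standard deviation 1 (Gaussian assumed)
    # sigma is computed from S1, the blurred patch before noise is added.
    noise = noise_sigma(blur_patch, k1, k0, ob, iso) * r
    return clean_patch + noise, blur_patch + noise
```

Since the same noise array is added to both patches, their difference is unchanged by the noise, which is the property the later residual-map targets rely on.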

なお、実写画像を縮小して原画像とする場合、縮小とぼけの付与は順序を逆にしてもよい。ぼけの付与を先に行う場合、縮小を考慮して、ぼけのサンプリングレートを細かくする必要がある。ＰＳＦ（点像強度分布）ならば空間のサンプリング点を細かくし、ＯＴＦ（光学伝達関数）ならば最大周波数を大きくすればよい。 When reducing an actual image to create the original image, the order of reduction and blurring may be reversed. If blurring is performed first, the sampling rate of the blur must be finer to take into account the reduction. In the case of a PSF (point spread function), the spatial sampling points should be finer, and in the case of an OTF (optical transfer function), the maximum frequency should be increased.

付与するぼけには、歪曲収差を含めないことが好ましい。鮮鋭化処理（収差補正処理、ぼけ成分の補正）において、歪曲収差が大きいと、被写体の位置が変化し、無成分パッチとぼけパッチで被写体が異なる可能性があるためである。このため、本実施例で用いるぼけ残差マップは歪曲収差を含まない。歪曲収差はバイリニア補間やバイキュービック補間等を用いて、ぼけ補正後、個別に補正する。 It is preferable that the blur to be applied does not include distortion. This is because if distortion is large during sharpening processing (aberration correction processing, blur component correction), the position of the subject will change, and the subject may be different between the component-free patch and the blur patch. For this reason, the blur residual map used in this embodiment does not include distortion. Distortion is corrected separately after blur correction using bilinear interpolation, bicubic interpolation, etc.

ステップＳ１０４では、生成部１０１ｃは、ノイズぼけパッチを図４の入力データ２１２（訓練パッチ、訓練画像）として多層のニューラルネットワークに入力し、推定パッチ（推定画像）２１３を生成する。ミニバッチ学習のため、複数の入力データ２１２に対応する推定パッチ２１３を生成する。図４は、ステップＳ１０４からステップＳ１０５までの流れを示している。 In step S104, the generation unit 101c inputs the noise blur patch as input data 212 (training patch, training image) in FIG. 4 to a multi-layer neural network, and generates an estimated patch (estimated image) 213. For mini-batch learning, the generation unit 101c generates estimated patches 213 corresponding to multiple pieces of input data 212. FIG. 4 shows the flow from step S104 to step S105.

本実施例では、正解データ２１１としてノイズパッチを用いる。推定パッチ２１３は、ぼけを補正されたノイズぼけパッチであり、理想的には正解データ（正解パッチ、正解画像）２１１と一致する。ニューラルネットワークは、入力データ２１２（本実施例ではノイズぼけパッチ）と正解データ２１１（本実施例ではノイズパッチ）との差分に相当する推定残差マップ２１４を出力する。推定残差マップ２１４は、推定されたぼけ残差マップである。なお、本実施例では、図３に示されるニューラルネットワークの構成を使用するが、本発明はこれに限定されない。 In this embodiment, a noise patch is used as the correct answer data 211. The estimated patch 213 is a noise blur patch with blur corrected, and ideally matches the correct answer data (correct answer patch, correct answer image) 211. The neural network outputs an estimated residual map 214 that corresponds to the difference between the input data 212 (noise blur patch in this embodiment) and the correct answer data 211 (noise patch in this embodiment). The estimated residual map 214 is an estimated blur residual map. Note that in this embodiment, the neural network configuration shown in FIG. 3 is used, but the present invention is not limited to this.

図３中のＣＮは畳み込み層、ＤＣは逆畳み込み層を表す。ＣＮとＤＣのどちらでも、入力とフィルタの畳み込み、及びバイアスとの和が算出され、その結果を活性化関数によって非線形変換する。フィルタの各成分とバイアスの初期値は任意であり、本実施例では乱数によって決定する。活性化関数は例えば、ＲｅＬＵ（ＲｅｃｔｉｆｉｅｄＬｉｎｅａｒＵｎｉｔ）やシグモイド関数等を使うことができる。最終層を除く各層の出力は、特徴マップと呼ばれる。スキップコネクション２２２，２２３は、連続していない層から出力された特徴マップを合成する。特徴マップの合成は要素ごとの和をとってもよいし、チャンネル方向に連結（ｃｏｎｃａｔｅｎａｔｉｏｎ）してもよい。本実施例では、要素ごとの和をとる。スキップコネクション２２１は、入力データ２１２から推定された推定残差マップ２１４と、入力データ２１２との和を取り、推定パッチ２１３を生成する。複数の入力データ２１２のそれぞれに対して、推定パッチ２１３が生成される。 CN in FIG. 3 represents a convolutional layer, and DC represents a deconvolutional layer. In both CN and DC, the convolution of the input with a filter is calculated, a bias is added, and the result is nonlinearly transformed by an activation function. The initial values of the filter components and the bias are arbitrary, and in this embodiment, they are determined by random numbers. For example, ReLU (Rectified Linear Unit) or a sigmoid function can be used as the activation function. The output of each layer except the final layer is called a feature map. The skip connections 222 and 223 combine feature maps output from non-consecutive layers. The feature maps may be combined by taking an element-wise sum or by concatenating them in the channel direction. In this embodiment, the element-wise sum is taken. The skip connection 221 takes the sum of the estimated residual map 214 estimated from the input data 212 and the input data 212 to generate an estimated patch 213. An estimated patch 213 is generated for each of the multiple pieces of input data 212.
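The layer operation (filter, bias, activation) and the role of the skip connection 221 can be illustrated with a heavily simplified sketch. It is not the network of FIG. 3: it uses a single channel, odd-sized kernels, no deconvolution layers, and no intermediate skip connections 222 and 223, and the "convolution" is implemented as cross-correlation, as is common in deep learning libraries. The names `conv_layer` and `estimate_patch` are hypothetical.

```python
import numpy as np

def conv_layer(x, kernel, bias, activation=True):
    """Single-channel 'same' filtering of x with an odd-sized kernel, plus bias, then optional ReLU."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="constant")
    out = np.empty_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0) if activation else out

def estimate_patch(input_patch, kernels, biases):
    """Estimate a residual map with stacked layers, then add it to the input (skip connection 221)."""
    feature = input_patch
    for kernel, bias in zip(kernels[:-1], biases[:-1]):
        feature = conv_layer(feature, kernel, bias)       # hidden layers output feature maps
    # Final layer without activation so the residual can take either sign (a choice of this sketch).
    residual = conv_layer(feature, kernels[-1], biases[-1], activation=False)
    return input_patch + residual, residual
```

With all-zero weights the estimated residual is zero and the estimated patch equals the input, so the network only has to learn the correction rather than reproduce the whole image.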

ステップＳ１０５では、更新部１０１ｄは、推定パッチ２１３と正解データ２１１との誤差から、ニューラルネットワークのウェイトを更新する。ウェイトは、各層のフィルタの成分とバイアスを含む。ウェイトの更新には誤差逆伝搬法（Ｂａｃｋｐｒｏｐａｇａｔｉｏｎ）が使用されるが、本発明はこれに限定されない。ミニバッチ学習のため、正解データ２１１として入力された複数のノイズパッチとそれらに対応する推定パッチ２１３の誤差を求め、ウェイトを更新する。誤差関数（Ｌｏｓｓｆｕｎｃｔｉｏｎ）には、例えばＬ２ノルムやＬ１ノルム等を用いればよい。 In step S105, the update unit 101d updates the weights of the neural network based on the error between the estimated patch 213 and the correct answer data 211. The weights include the filter components and biases of each layer. Backpropagation is used to update the weights, but the present invention is not limited to this. For mini-batch learning, the error between the multiple noise patches input as the correct answer data 211 and the corresponding estimated patches 213 is calculated, and the weights are updated. For example, the L2 norm or the L1 norm may be used as the error function (loss function).

ステップＳ１０６では、更新部１０１ｄは、ウェイトの学習が完了したか否かを判定する。完了は、学習（ウェイトの更新）の反復回数が規定値に達したか、又は更新時のウェイトの変化量が規定値より小さいか等により判定することができる。未完と判定された場合、ステップＳ１０４に戻り、新たなノイズパッチとノイズぼけパッチを複数取得する。一方、完了と判定された場合、学習装置１０１（更新部１０１ｄ）は学習を終了し、ウェイトの情報を記憶部１０１ａに保存する。同一の乱数ノイズが付与されたノイズパッチとノイズぼけパッチを正解データ２１１及び入力データ２１２として学習することで、ニューラルネットワークは被写体とぼけ成分とノイズ成分を分離して学習することができる。そのため、ノイズ変動を抑えて被写体のみのぼけ成分に対応したぼけ残差マップを取得することができる。 In step S106, the update unit 101d determines whether the weight learning is complete. Completion can be determined by whether the number of iterations of learning (weight update) has reached a specified value, or whether the amount of change in weight at the time of update is smaller than a specified value. If it is determined to be incomplete, the process returns to step S104, and multiple new noise patches and noise blur patches are obtained. On the other hand, if it is determined to be complete, the learning device 101 (update unit 101d) ends the learning and stores the weight information in the storage unit 101a. By learning the noise patch and noise blur patch to which the same random noise has been added as the correct answer data 211 and the input data 212, the neural network can learn by separating the subject, the blur component, and the noise component. Therefore, it is possible to obtain a blur residual map corresponding to the blur component of the subject only while suppressing noise fluctuation.
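The loop over steps S104 to S106, including the two completion criteria (iteration count and weight change), can be sketched with a toy model. The sketch deliberately replaces the neural network with a single scalar gain trained by gradient descent on a mean squared error; the function name `train`, the learning rate, the tolerance, and the batch size are all hypothetical values chosen for illustration.

```python
import numpy as np

def train(inputs, targets, lr=0.1, max_iters=1000, tol=1e-8, batch_size=4, seed=0):
    """Toy loop: learn one scalar gain w so that w * input approximates target."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for iteration in range(1, max_iters + 1):
        # S104: draw a new mini-batch of training / correct answer patches.
        idx = rng.choice(len(inputs), size=batch_size, replace=False)
        x, y = inputs[idx], targets[idx]
        # S105: gradient of the mean squared error with respect to w, then update.
        grad = np.mean(2.0 * (w * x - y) * x)
        new_w = w - lr * grad
        # S106: stop when the weight change falls below the specified value.
        if abs(new_w - w) < tol:
            return new_w, iteration
        w = new_w
    # S106: stop when the number of iterations reaches the specified value.
    return w, max_iters
```

A real implementation would update all filter components and biases by backpropagation, but the control flow around the update is the same.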

本実施例では正解データ２１１に用いたノイズパッチと推定パッチ２１３とから推定誤差を取得したが、推定パッチ２１３を出力せずに推定残差マップ２１４とぼけ残差マップの正解とから推定誤差を取得してもよい。ぼけ残差マップの正解は、ノイズパッチとノイズぼけパッチとの差分であり、無成分パッチとぼけパッチとの差分に等しい。この場合、推定残差マップ２１４として推定されたぼけ残差マップを直接出力する機械学習モデルを学習することができる。 In this embodiment, the estimated error is obtained from the noise patch used as the correct answer data 211 and the estimated patch 213, but the estimated error may instead be obtained from the estimated residual map 214 and the correct answer for the blur residual map, without outputting the estimated patch 213. The correct answer for the blur residual map is the difference between the noise patch and the noise blur patch, which is equal to the difference between the component-free patch and the blur patch. In this case, it is possible to train a machine learning model that directly outputs the estimated blur residual map as the estimated residual map 214.
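The equality of the two differences, and the fact that the error is the same whether it is measured on patches or on residual maps, can be checked numerically. The sketch below uses synthetic arrays as stand-ins for real patches and a scaled copy of the correct answer as a stand-in for a network output; the mean squared difference is used as a simple L2-type error.

```python
import numpy as np

def l2_error(a, b):
    """Mean squared difference, used here as a simple L2-type error."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
clean = rng.uniform(100.0, 200.0, size=(8, 8))        # stand-in for a component-free patch
blur = clean + rng.uniform(-5.0, 5.0, size=(8, 8))    # stand-in for a blur patch
noise = rng.normal(0.0, 3.0, size=(8, 8))             # the same noise is added to both patches
noise_patch, noise_blur_patch = clean + noise, blur + noise

# Correct answer for the blur residual map: unaffected by the shared noise.
blur_residual_gt = noise_patch - noise_blur_patch

estimated_residual = 0.9 * blur_residual_gt           # stand-in for the network output
estimated_patch = noise_blur_patch + estimated_residual

loss_on_patches = l2_error(estimated_patch, noise_patch)
loss_on_residuals = l2_error(estimated_residual, blur_residual_gt)
```

Both losses agree because the input patch cancels out of the difference, so training on residual maps changes only what the model outputs, not what it is asked to learn.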

また、本実施例ではぼけ残差マップを取得する機械学習モデルの学習について説明したが、ノイズ残差マップを取得する機械学習モデルも同様に学習することができる。具体的には、図４において入力データ２１２はノイズぼけパッチのままで、正解データ２１１としてノイズパッチの代わりにぼけパッチを用いればよい。このとき、ぼけ成分は不変でノイズ成分のみ異なるパッチを用いて学習することで、ニューラルネットワークは被写体とぼけ成分とノイズ成分を分離して学習することができる。そのため、ぼけ成分の変化を抑えてノイズ成分に対応したノイズ残差マップを取得することができる。 In addition, although the present embodiment describes the learning of a machine learning model that acquires a blur residual map, a machine learning model that acquires a noise residual map can also be trained in a similar manner. Specifically, in FIG. 4, the input data 212 remains a noise blur patch, and a blur patch is used instead of a noise patch as the correct answer data 211. In this case, by learning using a patch in which the blur component is unchanged and only the noise component is different, the neural network can learn by separating the subject, the blur component, and the noise component. Therefore, it is possible to acquire a noise residual map that corresponds to the noise component while suppressing changes in the blur component.

この場合も、推定パッチ２１３を出力せずに推定残差マップ２１４とノイズ残差マップの正解とから推定誤差を取得してもよい。ノイズ残差マップの正解は、ノイズぼけパッチとぼけパッチとの差分であり、ノイズパッチと無成分パッチの差分に等しい。この場合、推定残差マップ２１４として推定されたノイズ残差マップを直接出力する機械学習モデルを学習することができる。 In this case, too, an estimated error may be obtained from the estimated residual map 214 and the correct solution of the noise residual map without outputting the estimated patch 213. The correct solution of the noise residual map is the difference between the noise blur patch and the blur patch, which is equal to the difference between the noise patch and the component-free patch. In this case, a machine learning model can be trained that directly outputs the noise residual map estimated as the estimated residual map 214.

ぼけ残差マップの正解を無成分パッチとぼけパッチとの差分から取得し、かつノイズ残差マップの正解をノイズパッチと無成分パッチとの差分から取得した場合、ノイズぼけパッチは使用しない。そのため、この場合にノイズぼけパッチを取得しなくともよい。 If the correct answer for the blur residual map is obtained from the difference between the component-free patch and the blur patch, and the correct answer for the noise residual map is obtained from the difference between the noise patch and the component-free patch, the noise blur patch is not used. Therefore, in this case, it is not necessary to obtain the noise blur patch.

また、本実施例ではぼけ残差マップを取得する機械学習モデルとノイズ残差マップを取得する機械学習モデルとは個別に学習するため、それぞれで異なるネットワーク構成をとってよい。 In addition, in this embodiment, the machine learning model that obtains the blur residual map and the machine learning model that obtains the noise residual map are trained separately, so each may have a different network configuration.

また、ニューラルネットワークは被写体とぼけ成分とノイズ成分を分離して学習することができるため、共通のネットワーク（一つのネットワーク）でぼけ残差マップとノイズ残差マップを共に出力してもよい。この場合、機械学習モデルでの学習、推定が共に１度でよくなるため、計算負荷を小さくすることができる。 In addition, since the neural network can learn by separating the subject, the blur component, and the noise component, a common network (one network) may output both the blur residual map and the noise residual map. In this case, both learning and estimation with the machine learning model need to be performed only once, so the computational load can be reduced.

以下、図５及び図６を参照して、画像推定装置１０３で実行される補正画像（推定画像、出力画像）の生成に関して説明する。図５は、補正画像の生成に関するフローチャートである。図６は、画像処理フローのブロック図である。図５の各ステップの処理は、主に、画像推定装置１０３の取得部１０３ｂ及び補正部１０３ｃにより実行される。 The generation of a corrected image (estimated image, output image) performed by the image estimation device 103 will be described below with reference to Figs. 5 and 6. Fig. 5 is a flowchart related to the generation of a corrected image. Fig. 6 is a block diagram of an image processing flow. The processing of each step in Fig. 5 is mainly performed by the acquisition unit 103b and the correction unit 103c of the image estimation device 103.

ステップＳ２０１では、取得部１０３ｂは、入力画像４０１とウェイトの情報を取得する。入力画像４０１は、未現像のＲＡＷ画像であり、本実施例では撮像装置１０２から送信されたものである。ウェイトの情報は、学習装置１０１から送信され、記憶部１０３ａに記憶された、ぼけ残差マップを取得する機械学習モデルとノイズ残差マップを取得する機械学習モデルのウェイトである。 In step S201, the acquisition unit 103b acquires the input image 401 and weight information. The input image 401 is an undeveloped RAW image, and in this embodiment, is transmitted from the imaging device 102. The weight information is the weights of the machine learning model that acquires the blur residual map and the machine learning model that acquires the noise residual map, which are transmitted from the learning device 101 and stored in the storage unit 103a.

ステップＳ２０２では、補正部１０３ｃは、ステップＳ２０１で取得したぼけ残差マップを取得する機械学習モデルに入力画像４０１を入力してぼけ残差マップ（第１の残差マップ）４０２を取得する。また、補正部１０３ｃは、ノイズ残差マップを取得する機械学習モデルに入力画像４０１を入力してノイズ残差マップ（第２の残差マップ）４０３を取得する。 In step S202, the correction unit 103c inputs the input image 401 to the machine learning model for obtaining the blur residual map, using the weights acquired in step S201, and obtains a blur residual map (first residual map) 402. The correction unit 103c also inputs the input image 401 to the machine learning model for obtaining the noise residual map and obtains a noise residual map (second residual map) 403.

ステップＳ２０３では、補正部１０３ｃは、ぼけ残差マップを用いた補正強度とノイズ残差マップを用いた補正強度を取得する。補正強度は、各成分の補正比率を用いればよく、本実施例では予め規定された値を用いるが、ユーザの指定する値を取得してもよい。 In step S203, the correction unit 103c obtains a correction intensity using the blur residual map and a correction intensity using the noise residual map. The correction intensity can be determined by using the correction ratio of each component. In this embodiment, a predefined value is used, but a value specified by the user may also be obtained.

ステップＳ２０４では、補正部１０３ｃは、入力画像４０１とぼけ残差マップ４０２に基づいてノイズ残差マップ４０３を修正し、修正ノイズ残差マップ（第３の残差マップ）４０４を取得する。まず、補正部１０３ｃは、ぼけ残差マップ４０２をステップＳ２０３で取得したぼけ成分の補正比率をかけて入力画像４０１に加える。例えば補正比率を０．５とすると、ぼけ成分を５０％補正した中間補正画像４１０が得られる。次に、式（１）の信号値Ｓ１（ｘ，ｙ）を入力画像４０１とした場合と中間補正画像４１０とした場合のノイズの標準偏差σ１（ｘ，ｙ），σ２（ｘ，ｙ）を取得する。ノイズ残差マップ４０３は、入力画像４０１に含まれるノイズ成分に相当する。ノイズ残差マップ４０３に比率σ２（ｘ，ｙ）／σ１（ｘ，ｙ）を画素ごとにかけることで、修正ノイズ残差マップ４０４が得られる。修正ノイズ残差マップ４０４は、中間補正画像４１０に含まれるノイズ成分に相当する。入力画像４０１の輝度値やぼけ残差マップの値は画面位置ごとに通常は異なるため、画素ごとにかける比率は異なる。したがって、修正ノイズ残差マップ４０４は、ノイズ残差マップ４０３の画素ごとに異なる係数を乗算され、異なる分布となる。これにより、画素ごとの各成分の相関を反映した自然な補正を可能とする残差マップを取得することができる。 In step S204, the correction unit 103c modifies the noise residual map 403 based on the input image 401 and the blur residual map 402 to obtain a modified noise residual map (third residual map) 404. First, the correction unit 103c multiplies the blur residual map 402 by the correction ratio of the blur component obtained in step S203 and adds the result to the input image 401. For example, if the correction ratio is 0.5, an intermediate corrected image 410 in which the blur component is corrected by 50% is obtained. Next, the noise standard deviations σ1(x,y) and σ2(x,y) are obtained by using the input image 401 and the intermediate corrected image 410, respectively, as the signal value S1(x,y) in formula (1). The noise residual map 403 corresponds to the noise component contained in the input image 401. The modified noise residual map 404 is obtained by multiplying the noise residual map 403 by the ratio σ2(x,y)/σ1(x,y) for each pixel. The modified noise residual map 404 corresponds to the noise component contained in the intermediate corrected image 410. Since the luminance values of the input image 401 and the values of the blur residual map usually differ from one screen position to another, the ratio applied differs for each pixel. Therefore, the modified noise residual map 404 is the noise residual map 403 multiplied by a coefficient that differs for each pixel, and it has a different distribution from the noise residual map 403. 
This makes it possible to obtain a residual map that enables natural correction that reflects the correlation of each component for each pixel.
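Step S204 can be sketched as follows, assuming the residual maps are already available as arrays and that adding a residual map removes the corresponding component, as in this embodiment. The function names and the small `eps` guard against division by zero are assumptions of the example; the sensor constants k1, k0, and OB would come from the noise characteristics of the image sensor.

```python
import numpy as np

def noise_sigma(signal, k1, k0, ob, iso):
    """Noise standard deviation from formula (1), with negative variance clipped to zero."""
    variance = (k1 * (signal - ob) + k0) * iso / 100.0
    return np.sqrt(np.clip(variance, 0.0, None))

def modify_noise_residual(input_img, blur_residual, noise_residual,
                          blur_ratio, k1, k0, ob, iso, eps=1e-12):
    """Return (intermediate corrected image 410, modified noise residual map 404)."""
    intermediate = input_img + blur_ratio * blur_residual     # blur corrected by blur_ratio
    sigma1 = noise_sigma(input_img, k1, k0, ob, iso)          # S1 = input image 401
    sigma2 = noise_sigma(intermediate, k1, k0, ob, iso)       # S1 = intermediate image 410
    # Per-pixel ratio sigma2 / sigma1; eps avoids division by zero (added safeguard).
    ratio = sigma2 / np.maximum(sigma1, eps)
    return intermediate, noise_residual * ratio
```

With a blur correction ratio of 0 the intermediate image equals the input, the ratio is 1 everywhere, and the noise residual map is returned unchanged.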

ステップＳ２０５では、補正部１０３ｃは、修正ノイズ残差マップ４０４にステップＳ２０３で取得したノイズ成分の補正比率をかけて中間補正画像４１０に加えることで推定画像４０５を取得する。取得される画像はＲＡＷ画像であるため、必要に応じて現像処理が実行される。 In step S205, the correction unit 103c multiplies the modified noise residual map 404 by the correction ratio of the noise component obtained in step S203 and adds the result to the intermediate corrected image 410 to obtain an estimated image 405. Since the obtained image is a RAW image, development processing is performed as necessary.

本実施例において、各成分の補正強度は推定画像を見ながら調整可能であってもよい。 In this embodiment, the correction strength of each component may be adjustable while viewing the estimated image.

また、本実施例の構成は上記に限定されない。例えば、補正強度の取得は、各成分の補正強度を使用する以前であればいつでもよい。 Furthermore, the configuration of this embodiment is not limited to the above. For example, the correction intensity may be obtained at any time before the correction intensity of each component is used.

また、本実施例では、修正ノイズ残差マップ４０４に補正比率をかけたものを中間補正画像４１０に加えて推定画像４０５を取得したが、本発明はこれに限定されない。入力画像４０１にぼけ残差マップ４０２と修正ノイズ残差マップ４０４を各成分の補正比率に応じて加えることで推定画像４０５を取得してもよい。 In addition, in this embodiment, the estimated image 405 is obtained by adding the modified noise residual map 404 multiplied by the correction ratio to the intermediate corrected image 410, but the present invention is not limited to this. The estimated image 405 may be obtained by adding the blur residual map 402 and the modified noise residual map 404 to the input image 401 according to the correction ratio of each component.
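The two ways of composing the estimated image 405 give the same result, which can be seen from a short sketch. The function names are hypothetical, and the residual maps are assumed to be precomputed arrays.

```python
import numpy as np

def compose_sequential(input_img, blur_residual, modified_noise_residual, blur_ratio, noise_ratio):
    """Steps S204 and S205: build the intermediate image 410, then add the scaled noise residual."""
    intermediate = input_img + blur_ratio * blur_residual
    return intermediate + noise_ratio * modified_noise_residual

def compose_direct(input_img, blur_residual, modified_noise_residual, blur_ratio, noise_ratio):
    """Alternative: add both scaled residual maps to the input image 401 in one expression."""
    return input_img + blur_ratio * blur_residual + noise_ratio * modified_noise_residual
```

In either form, only additions and multiplications on stored maps are needed when a correction ratio changes. The modified noise residual map itself depends on the blur correction ratio, so it is recomputed when that ratio changes, but neither machine learning model has to be run again.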

また、共通の機械学習モデル（一つの機械学習モデル）でぼけ残差マップ４０２とノイズ残差マップ４０３を共に出力してもよい。 In addition, both the blur residual map 402 and the noise residual map 403 may be output using a common machine learning model (one machine learning model).

また、標準偏差σ１（ｘ，ｙ）を取得する際の信号値Ｓ１（ｘ，ｙ）は入力画像４０１としたが、ノイズ成分を補正した後の画像としてもよい。すなわち、入力画像４０１から補正比率１でノイズ残差マップを加えた画像を用いてもよい。この方法によれば精度を向上でき、特にノイズ成分が大きい場合に向上する。 In addition, the signal value S1(x,y) when obtaining the standard deviation σ1(x,y) is the input image 401, but it may be an image after noise components have been corrected. In other words, an image obtained by adding a noise residual map to the input image 401 with a correction ratio of 1 may be used. This method can improve accuracy, especially when the noise components are large.

また、本実施例では残差マップを加えることで各成分を補正（除去）しているが、残差マップを減算することで各成分を除去できる符号の定義としてもよい。 In addition, in this embodiment, each component is corrected (removed) by adding a residual map, but the sign of the residual map may instead be defined so that each component is removed by subtracting the residual map.

また、図７の画像処理フローに沿って処理を進めてもよい。図７は、変形例の画像処理フローのブロック図である。まず、入力画像４０１をノイズ成分について学習した機械学習モデル（第２の学習モデル）に入力してノイズ補正画像４１１とノイズ残差マップ４０３を取得する。ノイズ補正画像４１１は、取得したノイズ成分を入力画像４０１から１００％差し引いた画像である。次に、ノイズ補正画像４１１をぼけ成分について学習した機械学習モデル（第１の学習モデル）に入力してノイズぼけ補正画像４１２とぼけ残差マップ４０２を取得する。ノイズぼけ補正画像４１２は、取得したぼけ成分をノイズ補正画像４１１から１００％差し引いた画像である。次に、ぼけ残差マップ４０２に（１－ぼけ成分の補正比率）をかけてノイズぼけ補正画像４１２から減算することでぼけ成分の補正強度を弱め、中間補正画像４１３を取得する。結果として、中間補正画像４１３は、ぼけ成分の補正比率を反映した補正画像となる。ノイズ残差マップ４０３は、入力画像４０１に含まれるノイズ成分に相当する。そこで、本実施例と同様に、ノイズ補正画像４１１と中間補正画像４１３の輝度値に対するノイズの各標準偏差の比率をかけることで、修正ノイズ残差マップ４０４を取得する。修正ノイズ残差マップ４０４に（１－ぼけ成分の補正比率）をかけて中間補正画像４１３から差し引くことでノイズ成分の補正強度を弱め、推定画像４０５を取得して完了する。 Processing may also be performed according to the image processing flow of FIG. 7. FIG. 7 is a block diagram of the image processing flow of the modified example. First, the input image 401 is input to a machine learning model (second learning model) that has learned about noise components to obtain a noise-corrected image 411 and a noise residual map 403. The noise-corrected image 411 is an image obtained by subtracting 100% of the acquired noise components from the input image 401. Next, the noise-corrected image 411 is input to a machine learning model (first learning model) that has learned about blur components to obtain a noise-blur-corrected image 412 and a blur residual map 402. The noise-blur-corrected image 412 is an image obtained by subtracting 100% of the acquired blur components from the noise-corrected image 411. Next, the blur residual map 402 is multiplied by (1-blur component correction ratio) and subtracted from the noise-blur-corrected image 412 to weaken the correction strength of the blur components, and an intermediate corrected image 413 is obtained. As a result, the intermediate corrected image 413 is a corrected image that reflects the correction ratio of the blur component. The noise residual map 403 corresponds to the noise component contained in the input image 401. 
Therefore, as in this embodiment, the modified noise residual map 404 is obtained by multiplying the noise residual map 403 by the ratio of the noise standard deviations corresponding to the luminance values of the noise corrected image 411 and the intermediate corrected image 413. The correction strength of the noise component is then weakened by multiplying the modified noise residual map 404 by (1 - blur component correction ratio) and subtracting the result from the intermediate corrected image 413, and the process is completed by obtaining the estimated image 405.
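The flow of FIG. 7 can be sketched as follows under several assumptions. The two models are represented by hypothetical callables `noise_model` and `blur_model` that return residual maps, and adding a residual map is taken to remove the component, as in this embodiment. For the final step the sketch uses (1 - noise correction ratio), which is an assumption of this example: the description above writes (1 - blur component correction ratio) at that step.

```python
import numpy as np

def noise_sigma(signal, k1, k0, ob, iso):
    """Noise standard deviation from formula (1), with negative variance clipped to zero."""
    variance = (k1 * (signal - ob) + k0) * iso / 100.0
    return np.sqrt(np.clip(variance, 0.0, None))

def variant_flow(input_img, noise_model, blur_model, blur_ratio, noise_ratio,
                 k1, k0, ob, iso, eps=1e-12):
    """Sketch of the modified flow: full correction first, then weaken each correction."""
    noise_residual = noise_model(input_img)                    # noise residual map 403
    noise_corrected = input_img + noise_residual               # image 411, noise fully removed
    blur_residual = blur_model(noise_corrected)                # blur residual map 402
    fully_corrected = noise_corrected + blur_residual          # image 412, blur fully removed
    intermediate = fully_corrected - (1.0 - blur_ratio) * blur_residual   # image 413
    sigma1 = noise_sigma(noise_corrected, k1, k0, ob, iso)
    sigma2 = noise_sigma(intermediate, k1, k0, ob, iso)
    modified = noise_residual * sigma2 / np.maximum(sigma1, eps)          # map 404
    # Assumption: the noise ratio is used here (see the note above the code).
    return intermediate - (1.0 - noise_ratio) * modified
```

Under this assumption, setting both ratios to 0 returns the input image and setting both to 1 returns the fully corrected image 412, which matches the intent of weakening each correction independently.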

本実施例及び変形例によれば、各成分の補正強度を調整しても機械学習モデルの計算を再度行う必要がない。ぼけ補正の強度を調整する場合はぼけ補正前後の輝度値に基づいて修正ノイズ残差マップを取得する工程から再度実行すればよい。したがって、計算負荷を大幅に低減でき、画像を見ながら補正強度を調整することも可能となる。 According to this embodiment and the modified example, even if the correction strength of each component is adjusted, there is no need to perform the calculation of the machine learning model again. When adjusting the intensity of blur correction, it is sufficient to execute the process again from the step of acquiring the corrected noise residual map based on the luminance values before and after blur correction. Therefore, the calculation load can be significantly reduced, and it is also possible to adjust the correction strength while viewing the image.

なお、本実施例では撮像系の光学特性に由来するぼけ成分の補正を扱ったが、他の要因（デフォーカスやぶれ等）のぼけに対しても、本発明は有効である。学習の際、ぼけパッチに付与するぼけを、デフォーカスやぶれ等に変更することで、それらの要因のぼけでもノイズ成分と分離して残差マップを取得することができる。 In this embodiment, the correction of blur components resulting from the optical characteristics of the imaging system has been described, but the present invention is also effective for blur caused by other factors (defocus, shake, etc.). By changing the blur added to the blur patch during learning to blur caused by defocus, shake, etc., blur due to these factors can also be separated from the noise components and obtained as a residual map.

また、本発明は、ぼけ成分の補正の代わりにデフォーカスぼけ（背景ぼけ）の変換にも適用可能である。デフォーカスぼけの変換とは、撮像画像中のデフォーカスぼけをユーザの望ましい形状、分布に変換する処理である。撮像画像のデフォーカスぼけには、ヴィネッティングによる欠け、二線ぼけ、非球面レンズの切削痕による輪帯模様、及びカタディオプティック光学系による中心の遮蔽等が含まれる。これらのデフォーカスぼけを、ニューラルネットワークによって、ユーザの望む形状、分布（例えば、フラットな円形や正規分布関数等）に変換する。デフォーカスぼけの変換を実現するニューラルネットワークは、以下の方法で学習することができる。同一の原画像に対し、撮像画像で発生するデフォーカスぼけを付与した撮像相当画像と、ユーザの望むデフォーカスぼけを付与した理想相当画像を、複数のデフォーカス量に対して生成する。ただし、デフォーカスぼけの変換は、合焦距離の被写体に対して変化を起こさないことが望まれるので、デフォーカス量がゼロの撮像相当画像と理想相当画像も生成する。生成された複数の撮像相当画像と理想相当画像から、それぞれ第１デフォーカスパッチと第２デフォーカスパッチを複数抽出し、これらにノイズを付与したノイズ第１デフォーカスパッチとノイズ第２デフォーカスパッチを取得する。本実施例と同様の学習を行うことで、デフォーカスぼけ変換の補正成分をノイズ成分と分離して残差マップとして取得することができる。 The present invention can also be applied to the conversion of defocus blur (background blur) instead of the correction of the blur component. The conversion of defocus blur is a process of converting the defocus blur in a captured image into a shape and distribution desired by the user. The defocus blur in a captured image includes chipping due to vignetting, two-line blur, annular patterns due to cutting marks on an aspheric lens, and central occlusion due to a catadioptric optical system. These defocus blurs are converted into a shape and distribution desired by the user (for example, a flat circle or a normal distribution function) by a neural network. A neural network that realizes the conversion of defocus blur can be trained by the following method. For the same original image, a captured equivalent image with the defocus blur generated in the captured image and an ideal equivalent image with the defocus blur desired by the user are generated for multiple defocus amounts. However, since it is desired that the conversion of defocus blur does not cause any change to the subject at the focal distance, a captured equivalent image and an ideal equivalent image with a defocus amount of zero are also generated. 
From the generated multiple captured equivalent images and ideal equivalent images, multiple first defocus patches and second defocus patches are extracted, and noise is added to these to obtain noisy first defocus patches and noisy second defocus patches. By performing learning similar to that in this embodiment, the correction components of the defocus blur conversion can be separated from the noise components and obtained as a residual map.

また、本発明は、ぼけ成分の補正の代わりにライティングの変換（リライティング）にも適用可能である。ライティングの変換を実現するニューラルネットワークは、以下の方法で学習することができる。同一の法線マップである原画像に対し、撮像画像で想定される光源環境でのレンダリングを行うことで撮像相当画像を生成する。同様に、法線マップに対し、ユーザの望む光源環境でのレンダリングによって理想相当画像を生成する。撮像相当画像と理想相当画像から、それぞれ第１ライティングパッチと第２ライティングパッチを複数抽出し、これらにノイズを付与したノイズ第１ライティングパッチとノイズ第２ライティングパッチを取得する。第１ライティングがリライティング前のライティングであり、第２ライティングがリライティング後のライティングに相当する。本実施例と同様の学習と行うことで、リライティング成分をノイズ成分と分離して残差マップとして取得することができる。 The present invention can also be applied to lighting conversion (relighting) instead of blur component correction. A neural network that realizes lighting conversion can be trained in the following manner. A captured equivalent image is generated by rendering an original image, which is the same normal map, in a light source environment assumed for the captured image. Similarly, an ideal equivalent image is generated by rendering the normal map in a light source environment desired by the user. A plurality of first lighting patches and second lighting patches are extracted from the captured equivalent image and the ideal equivalent image, respectively, and noise is added to these to obtain a noisy first lighting patch and a noisy second lighting patch. The first lighting corresponds to the lighting before relighting, and the second lighting corresponds to the lighting after relighting. By performing learning similar to that of this embodiment, the relighting component can be separated from the noise component and obtained as a residual map.

また、本発明は、ぼけ成分の補正の代わりにリライティング、ノイズ成分の補正の代わりにぼけ成分の補正としても適用可能である。撮像系の光学特性、デフォーカス、及びぶれに由来するぼけは、被写体の明るさに依存して周囲の画素に広がるぼけを生じさせる。そのため、ライティングによって被写体の明るさが変わると本来ぼけも同時に変化する。ＲＡＷ画像においては、被写体の明るさとぼけ成分は比例の関係で変化する。本実施例では、ぶれ補正前後の輝度変化に基づいてノイズの標準偏差変化を取得し、標準偏差変化の比率を乗算することで修正ノイズ残差マップを取得する。同様に、リライティング成分による光源補正の前後での輝度値の比に基づいて、画素ごとにぼけ残差マップを修正する。これにより、修正ぼけ残差マップが取得される。ぼけ成分とリライティング成分を分離して取得するニューラルネットワークは、以下の方法で学習することができる。同一の法線マップである原画像に対し、撮像画像で想定される光源環境でのレンダリングを行うことで撮像相当画像を生成する。同様に、法線マップに対し、ユーザの望む光源環境でのレンダリングによって理想相当画像を生成する。撮像相当画像と理想相当画像から、それぞれ第１ライティングパッチと第２ライティングパッチを複数抽出し、これらにぼけを付与したぼけ第１ライティングパッチとぼけ第２ライティングパッチを取得する。ここでは、第２成分がぼけ成分になるため、ぼけ第１ライティングパッチ及びぼけ第２ライティングパッチが本実施例のノイズぼけパッチ及びノイズパッチに相当する。本実施例と同様の学習を行うことで、リライティング成分をぼけ成分と分離して残差マップとして取得することができる。 The present invention can also be applied with relighting in place of blur component correction and with blur component correction in place of noise component correction. Blur resulting from the optical characteristics of the imaging system, defocus, and shake spreads to surrounding pixels depending on the brightness of the subject. Therefore, when the brightness of the subject changes due to lighting, the blur inherently changes at the same time. In a RAW image, the brightness of the subject and the blur component change in proportion to each other. In this embodiment, the change in the standard deviation of noise is obtained based on the brightness change before and after shake correction, and a modified noise residual map is obtained by multiplying by the ratio of the standard deviations. Similarly, the blur residual map is modified for each pixel based on the ratio of the brightness values before and after light source correction by the relighting component. This yields a modified blur residual map. A neural network that separates and obtains blur components and relighting components can be trained in the following manner. 
A captured equivalent image is generated by rendering the original image, which is the same normal map, in a light source environment assumed in the captured image. Similarly, an ideal equivalent image is generated by rendering the normal map in a light source environment desired by the user. A plurality of first lighting patches and second lighting patches are extracted from the captured equivalent image and the ideal equivalent image, respectively, and blurred first lighting patches and blurred second lighting patches are obtained by adding blur to these. Here, since the second component is the blur component, the blurred first lighting patch and blurred second lighting patch correspond to the noise blur patch and noise patch in this embodiment. By performing learning similar to that in this embodiment, the relighting component can be separated from the blur component and obtained as a residual map.
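The per-pixel modification of the blur residual map by the brightness ratio before and after light source correction can be sketched as follows. The sketch takes the proportional relationship between subject brightness and blur component in a RAW image at face value and scales the blur residual by the plain ratio of luminance values; the function name, the sign convention (adding a residual applies the correction), and the `eps` guard are assumptions of this example.

```python
import numpy as np

def modify_blur_residual(input_img, relight_residual, blur_residual, relight_ratio, eps=1e-12):
    """Return (relit image, modified blur residual map) using the per-pixel brightness ratio."""
    relit = input_img + relight_ratio * relight_residual      # image after light source correction
    # Ratio of luminance values after / before relighting; eps avoids division by zero.
    gain = relit / np.maximum(input_img, eps)
    return relit, blur_residual * gain
```

For example, where relighting doubles the brightness of a pixel, the blur residual at that pixel is doubled as well, so the blur correction follows the new brightness.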

また、ぼけ成分とリライティング成分を取得して画像補正を行う場合、ぼけ成分の補正前後での輝度値の比に基づいてリライティング成分の残差マップを修正してもよい。 In addition, when acquiring the blur component and the relighting component to perform image correction, the residual map of the relighting component may be corrected based on the ratio of brightness values before and after the blur component correction.

また、本発明は３つ以上の成分についても適用可能である。３つ以上の成分であっても各成分を分離して残差マップとして取得することができるため、２成分の場合と同様に成分間の相関を考慮しつつ、適宜各成分の残差マップを修正すればよい。例えば、ぼけ成分の補正、リライティング、及びノイズ成分の補正を行う場合、まずリライティング成分による光源補正の前後での輝度値の比に基づいて、画素ごとにぼけ残差マップを修正する。その後、リライティング成分による光源補正とぼけ成分の補正を両方適用しない場合の輝度値と両方適用した場合の輝度値の比に基づいて、画素ごとにノイズ成分マップを修正すればよい。 The present invention is also applicable to three or more components. Even with three or more components, each component can be separated and obtained as a residual map, so the residual map of each component may be modified as appropriate while taking the correlation between components into account, as in the two-component case. For example, when performing blur component correction, relighting, and noise component correction, the blur residual map is first modified for each pixel based on the ratio of luminance values before and after light source correction by the relighting component. The noise component map is then modified for each pixel based on the ratio between the luminance value obtained when neither the light source correction by the relighting component nor the blur component correction is applied and the luminance value obtained when both are applied.
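
The two-step ordering for three components can be sketched as follows. This is only an illustration under stated assumptions: residual maps are added to the image with correction strength 1, the ratio is taken as corrected luminance over uncorrected luminance, and the noise residual is scaled by the square root of that ratio (a shot-noise-like model). The paragraph above does not fix the mapping from luminance ratio to noise scaling, so `noise_gain` is a hypothetical parameter, as are all names and sample values.

```python
import math
from typing import Callable, List

Pixels = List[float]  # flattened single-channel image (assumed layout)


def correct_three_components(
    captured: Pixels,
    relight_res: Pixels,
    blur_res: Pixels,
    noise_res: Pixels,
    noise_gain: Callable[[float], float] = math.sqrt,  # assumed noise model
    eps: float = 1e-6,                                  # assumed division guard
) -> Pixels:
    """Apply relighting, blur, and noise residuals with ratio-based modification."""
    out = []
    for x, r_l, r_b, r_n in zip(captured, relight_res, blur_res, noise_res):
        relit = x + r_l
        # Step 1: scale the blur residual by the luminance ratio before and
        # after light source correction.
        r_b_mod = r_b * relit / max(x, eps)
        # Step 2: scale the noise residual using the ratio between the luminance
        # with both corrections applied and the luminance with neither applied.
        both = relit + r_b_mod
        r_n_mod = r_n * noise_gain(max(both, 0.0) / max(x, eps))
        out.append(both + r_n_mod)
    return out


# Hypothetical example values chosen so the ratios are exact.
result = correct_three_components(
    captured=[100.0, 50.0],
    relight_res=[100.0, 0.0],
    blur_res=[100.0, 150.0],
    noise_res=[-3.0, 2.0],
)
print(result)  # [394.0, 204.0]
```

Passing a different `noise_gain` (for example `lambda k: k` for a purely proportional model) changes only step 2, which keeps the ordering of the modifications separate from the choice of noise model.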

なお、本実施例では学習装置１０１と画像推定装置１０３が別体である場合について説明したが、本発明はこれに限定されない。学習装置１０１と画像推定装置１０３は一体であってもよい。すなわち、一体の装置内で学習（図３に示す処理）と推定（図５に示す処理）を行ってもよい。 In this embodiment, the case where the learning device 101 and the image estimation device 103 are separate has been described, but the present invention is not limited to this. The learning device 101 and the image estimation device 103 may be integrated. In other words, learning (the process shown in FIG. 3) and estimation (the process shown in FIG. 5) may be performed within a single device.

以上の構成により、相関を持つ複数の成分に対応する各補正強度を調整しつつ画像補正を行って自然な補正画像を取得することができる。 With the above configuration, it is possible to obtain a natural corrected image by performing image correction while adjusting the correction strength corresponding to multiple correlated components.

本実施例の画像処理システムは、補正画像の生成を撮像装置内の画像推定部で実行する点が実施例１の画像処理システムと異なる。 The image processing system of this embodiment differs from the image processing system of the first embodiment in that the generation of the corrected image is performed by an image estimation unit in the imaging device.

図８は、本実施例の画像処理システム３００のブロック図である。図９は、画像処理システム３００の外観図である。画像処理システム３００は、学習装置３０１と撮像装置３０２を有する。学習装置３０１と撮像装置３０２は、ネットワーク３０３を介して接続されている。学習装置３０１は、記憶部３１１、取得部３１２、生成部３１３、及び更新部３１４を有し、ニューラルネットワークで残差マップを取得するためのウェイト（ウェイトの情報）を学習する。撮像装置３０２は、光学系３２１、撮像素子３２２、画像推定部（画像処理装置）３２３、記憶部３２４、記録媒体３２５、表示部３２６、及びシステムコントローラ３２７を有する。撮像装置３０２は、被写体空間を撮像して撮像画像を取得し、読み出したウェイトの情報を用いて撮像画像からぼけ残差マップとノイズ残差マップを取得する。撮像装置３０２は、ノイズ残差マップを修正した修正ノイズ残差マップを用いて補正画像を生成する。画像推定部３２３は、取得部（第１取得部）３２３ａと補正部（修正部、第２取得部）３２３ｂを有し、記憶部３２４に保存されたウェイトの情報を用いて各残差マップを取得して、各残差マップに基づいて補正を実行する。 Figure 8 is a block diagram of the image processing system 300 of this embodiment. Figure 9 is an external view of the image processing system 300. The image processing system 300 has a learning device 301 and an imaging device 302. The learning device 301 and the imaging device 302 are connected via a network 303. The learning device 301 has a memory unit 311, an acquisition unit 312, a generation unit 313, and an update unit 314, and learns weights (weight information) for acquiring a residual map in a neural network. The imaging device 302 has an optical system 321, an imaging element 322, an image estimation unit (image processing device) 323, a memory unit 324, a recording medium 325, a display unit 326, and a system controller 327. The imaging device 302 captures an image of a subject space to acquire an image, and acquires a blur residual map and a noise residual map from the image using the read weight information. The imaging device 302 generates a corrected image using a modified noise residual map obtained by modifying the noise residual map. The image estimation unit 323 has an acquisition unit (first acquisition unit) 323a and a correction unit (modification unit, second acquisition unit) 323b, and acquires each residual map using the weight information stored in the memory unit 324, and performs correction based on each residual map.

ウェイトの情報は、学習装置３０１で事前に学習され、記憶部３１１に保存されている。撮像装置３０２は、記憶部３１１からネットワーク３０３を介してウェイトの情報を読み出し、記憶部３２４に保存する。補正画像は、記録媒体３２５に保存される。ユーザから補正画像の表示に関する指示が出された場合、保存された補正画像が読み出され、表示部３２６に表示される。なお、記録媒体３２５に既に保存された撮像画像を読み出し、画像推定部３２３で性能ずれ補正を行ってもよい。以上の一連の制御は、システムコントローラ３２７によって行われる。 The weight information is learned in advance by the learning device 301 and stored in the memory unit 311. The imaging device 302 reads out the weight information from the memory unit 311 via the network 303 and stores it in the memory unit 324. The corrected image is stored in the recording medium 325. When an instruction to display the corrected image is issued by the user, the stored corrected image is read out and displayed on the display unit 326. Note that an image already stored in the recording medium 325 may be read out and performance deviation correction may be performed by the image estimation unit 323. The above series of controls are performed by the system controller 327.

学習装置３０１で実行される機械学習モデルの学習は、実施例１で説明した機械学習モデルの学習と等しい。 The learning of the machine learning model performed by the learning device 301 is equivalent to the learning of the machine learning model described in Example 1.

画像推定部３２３で実行される補正画像の取得は、図５のフローチャートと同様に実施される。実施例１の取得部１０３ｂ及び補正部１０３ｃで実行される代わりに、主に、画像推定部３２３の取得部３２３ａ及び補正部３２３ｂにより実行される。 The acquisition of the corrected image performed by the image estimation unit 323 is performed in the same manner as in the flowchart of FIG. 5. Instead of being performed by the acquisition unit 103b and the correction unit 103c in the first embodiment, it is mainly performed by the acquisition unit 323a and the correction unit 323b of the image estimation unit 323.

本実施例の画像処理システムは、画像推定装置に対して画像処理の対象である撮像画像を送信し処理済みの出力画像（補正画像、推定画像）を画像推定装置から受信する処理装置（コンピュータ）を有する点で実施例１，２の画像処理システムと異なる。 The image processing system of this embodiment differs from the image processing systems of embodiments 1 and 2 in that it has a processing device (computer) that transmits a captured image that is the subject of image processing to the image estimation device and receives a processed output image (corrected image, estimated image) from the image estimation device.

図１０は、本実施例の画像処理システム６００のブロック図である。画像処理システム６００は、学習装置６０１、撮像装置６０２、画像推定装置（画像処理装置）６０３、及び処理装置（コンピュータ）６０４を有する。学習装置６０１及び画像推定装置６０３は例えば、サーバである。処理装置６０４は例えば、ユーザ端末（パーソナルコンピュータ又はスマートフォン）であり、ネットワーク６０５を介して画像推定装置６０３に接続されている。すなわち、画像推定装置６０３と処理装置６０４は、互いに通信可能に構成されている。画像推定装置６０３は、ネットワーク６０６を介して学習装置６０１に接続されている。すなわち、学習装置６０１と画像推定装置６０３は、互いに通信可能に構成されている。 Figure 10 is a block diagram of an image processing system 600 of this embodiment. The image processing system 600 has a learning device 601, an imaging device 602, an image estimation device (image processing device) 603, and a processing device (computer) 604. The learning device 601 and the image estimation device 603 are, for example, servers. The processing device 604 is, for example, a user terminal (personal computer or smartphone), and is connected to the image estimation device 603 via a network 605. That is, the image estimation device 603 and the processing device 604 are configured to be able to communicate with each other. The image estimation device 603 is connected to the learning device 601 via a network 606. That is, the learning device 601 and the image estimation device 603 are configured to be able to communicate with each other.

学習装置６０１の構成は実施例１の学習装置１０１と同様であるため、説明を省略する。また、撮像装置６０２の構成は実施例１の撮像装置１０２と同様であるため、説明を省略する。 The configuration of the learning device 601 is similar to that of the learning device 101 in the first embodiment, and therefore a description thereof will be omitted. Also, the configuration of the imaging device 602 is similar to that of the imaging device 102 in the first embodiment, and therefore a description thereof will be omitted.

画像推定装置６０３は、記憶部６０３ａ、取得部（第１取得部）６０３ｂ、補正部（修正部、第２取得部）６０３ｃ、及び通信部（受信部）６０３ｄを有する。記憶部６０３ａ、取得部６０３ｂ、及び補正部６０３ｃのそれぞれは、実施例１の画像推定装置１０３の記憶部１０３ａ、取得部１０３ｂ、及び補正部１０３ｃと同様である。通信部６０３ｄは、処理装置６０４から送信される要求を受信する機能と、画像推定装置６０３によって生成された出力画像を処理装置６０４に送信する機能とを有する。 The image estimation device 603 has a memory unit 603a, an acquisition unit (first acquisition unit) 603b, a correction unit (modification unit, second acquisition unit) 603c, and a communication unit (reception unit) 603d. The memory unit 603a, the acquisition unit 603b, and the correction unit 603c are similar to the memory unit 103a, the acquisition unit 103b, and the correction unit 103c of the image estimation device 103 in Example 1, respectively. The communication unit 603d has a function of receiving a request transmitted from the processing device 604, and a function of transmitting an output image generated by the image estimation device 603 to the processing device 604.

処理装置６０４は、通信部（送信部）６０４ａ、表示部６０４ｂ、画像処理部６０４ｃ、及び記録部６０４ｄを有する。通信部６０４ａは、撮像画像に対する処理を画像推定装置６０３に実行させるための要求を画像推定装置６０３に送信する機能と、画像推定装置６０３によって処理された出力画像を受信する機能とを有する。表示部６０４ｂは、種々の情報を表示する機能を有する。表示部６０４ｂによって表示される情報は例えば、画像推定装置６０３に送信する撮像画像や、画像推定装置６０３から受信した出力画像を含む。画像処理部６０４ｃは、画像推定装置６０３から受信した出力画像に対して画像処理を施す機能を有する。記録部６０４ｄは、撮像装置６０２から取得した撮像画像や、画像推定装置６０３から受信した出力画像等を記録する。 The processing device 604 has a communication unit (transmission unit) 604a, a display unit 604b, an image processing unit 604c, and a recording unit 604d. The communication unit 604a has a function of transmitting a request to the image estimation device 603 to cause the image estimation device 603 to execute processing on the captured image, and a function of receiving an output image processed by the image estimation device 603. The display unit 604b has a function of displaying various information. The information displayed by the display unit 604b includes, for example, the captured image to be transmitted to the image estimation device 603 and the output image received from the image estimation device 603. The image processing unit 604c has a function of performing image processing on the output image received from the image estimation device 603. The recording unit 604d records the captured image acquired from the imaging device 602, the output image received from the image estimation device 603, etc.

以下、図１１を参照して、本実施例の画像処理について説明する。図１１は、画像処理に関するフローチャートである。 The image processing of this embodiment will be described below with reference to FIG. 11. FIG. 11 is a flowchart of the image processing.

図１１の画像処理は、処理装置６０４を介してユーザにより画像処理開始の指示が成されたことを契機として開始される。まず、処理装置６０４における動作について説明する。 The image processing in FIG. 11 is started when a command to start image processing is given by the user via the processing device 604. First, the operation of the processing device 604 will be described.

ステップＳ７０１では、処理装置６０４は、撮像画像に対する処理の要求を画像推定装置６０３に送信する。なお、処理対象である撮像画像を画像推定装置６０３に送信する方法は問わない。例えば、撮像画像はステップＳ７０１の処理と同時に画像推定装置６０３にアップロードされてもよいし、ステップＳ７０１の処理より前に画像推定装置６０３にアップロードされてもよい。また、撮像画像は、画像推定装置６０３とは異なるサーバ上に記憶された画像でもよい。なお、ステップＳ７０１において、処理装置６０４は、撮像画像に対する処理の要求と共に、ユーザを認証するＩＤ情報等を送信してもよい。 In step S701, the processing device 604 transmits a request for processing the captured image to the image estimation device 603. The method of transmitting the captured image to be processed to the image estimation device 603 is not important. For example, the captured image may be uploaded to the image estimation device 603 simultaneously with the processing of step S701, or may be uploaded to the image estimation device 603 before the processing of step S701. The captured image may also be an image stored on a server different from the image estimation device 603. In step S701, the processing device 604 may transmit ID information for authenticating a user together with the request for processing the captured image.

ステップＳ７０２では、処理装置６０４は、画像推定装置６０３内で生成された出力画像を受信する。出力画像は、実施例１と同様に撮像画像に対して各成分が残差マップを用いて補正された画像である。 In step S702, the processing device 604 receives the output image generated in the image estimation device 603. The output image is an image in which each component of the captured image has been corrected using the residual map, as in Example 1.

次に、画像推定装置６０３の動作について説明する。 Next, the operation of the image estimation device 603 will be described.

ステップＳ８０１では、画像推定装置６０３は、処理装置６０４から送信された撮像画像に対する処理の要求を受信する。画像推定装置６０３は、撮像画像に対する処理が指示されたと判断し、ステップＳ８０２以降の処理を実行する。 In step S801, the image estimation device 603 receives a request for processing the captured image sent from the processing device 604. The image estimation device 603 determines that processing for the captured image has been instructed, and executes the processing from step S802 onwards.

ステップＳ８０２では、画像推定装置６０３は、ウェイトの情報を取得する。ウェイトの情報は、実施例１と同様の方法で学習された情報（学習済みモデル）である。画像推定装置６０３は、学習装置６０１からウェイトの情報を取得してもよいし、予め学習装置６０１から取得され、記憶部６０３ａに記憶されたウェイトの情報を取得してもよい。 In step S802, the image estimation device 603 acquires weight information. The weight information is information (trained model) learned in the same manner as in Example 1. The image estimation device 603 may acquire the weight information from the learning device 601, or may acquire weight information previously acquired from the learning device 601 and stored in the memory unit 603a.

ステップＳ８０３からステップＳ８０６までの処理はそれぞれ、実施例１で説明した図５のステップＳ２０２からステップＳ２０５までの処理と同様であるため、説明を省略する。 The processes from step S803 to step S806 are similar to the processes from step S202 to step S205 in FIG. 5 described in Example 1, and therefore will not be described.

ステップＳ８０７では、画像推定装置６０３は、出力画像を処理装置６０４に送信する。 In step S807, the image estimation device 603 transmits the output image to the processing device 604.
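
The exchange between the processing device 604 and the image estimation device 603 in steps S701 to S702 and S801 to S807 can be mimicked with a small in-process sketch. It is not a network implementation: the class names, the `upload` and `handle_request` methods, and the placeholder `estimate` callable (standing in for steps S803 to S806) are all hypothetical, and authentication and weight loading are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Image = List[float]  # flattened image (assumed layout)


@dataclass
class ImageEstimationDevice:
    """Stands in for device 603; `estimate` is a hypothetical placeholder for S803-S806."""
    estimate: Callable[[Image], Image]
    uploads: Dict[str, Image] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def upload(self, image_id: str, image: Image) -> None:
        # The captured image may be uploaded with or before the request.
        self.uploads[image_id] = image

    def handle_request(self, image_id: str) -> Image:
        self.log.append("S801")       # receive the processing request
        self.log.append("S802")       # acquire weight information (omitted here)
        output = self.estimate(self.uploads[image_id])
        self.log.append("S803-S806")  # residual maps, modification, correction
        self.log.append("S807")       # transmit the output image
        return output


@dataclass
class ProcessingDevice:
    """Stands in for device 604 (user terminal)."""
    server: ImageEstimationDevice

    def process(self, image_id: str, image: Image) -> Image:
        self.server.upload(image_id, image)
        # S701: send the request; S702: receive the output image.
        return self.server.handle_request(image_id)


# Hypothetical usage with a dummy correction that halves every pixel value.
server = ImageEstimationDevice(estimate=lambda img: [0.5 * v for v in img])
client = ProcessingDevice(server)
result = client.process("img-001", [2.0, 4.0])
print(result)      # [1.0, 2.0]
print(server.log)  # ['S801', 'S802', 'S803-S806', 'S807']
```

All correction work happens inside `ImageEstimationDevice`, so the client side only sends a request and receives the result.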

本実施例のように、補正処理を画像推定装置６０３内で行う場合、補正処理による処理負荷を画像推定装置６０３内で担うことができるため、処理装置６０４側に求められる処理能力を減じることができる。 When the correction process is performed within the image estimation device 603 as in this embodiment, the processing load due to the correction process can be borne within the image estimation device 603, thereby reducing the processing capacity required of the processing device 604.

以上説明したように、画像推定装置６０３を、画像推定装置６０３と通信可能に接続された処理装置６０４を用いて制御するように構成してもよい。 As described above, the image estimation device 603 may be configured to be controlled using a processing device 604 communicatively connected to the image estimation device 603.

［その他の実施例］ [Other Examples]
本発明は、上述の実施例の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program for implementing one or more of the functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (e.g., ASIC) that implements one or more of the functions.

以上、本発明の好ましい実施形態について説明したが、本発明はこれらの実施形態に限定されず、その要旨の範囲内で種々の変形及び変更が可能である。 The above describes preferred embodiments of the present invention, but the present invention is not limited to these embodiments, and various modifications and variations are possible within the scope of the gist of the invention.

１０３画像推定装置 103 Image estimation device
１０３ｂ取得部 103b Acquisition unit
１０３ｃ補正部 103c Correction unit

Claims

An image processing method comprising:
obtaining a first residual map corresponding to a first component and a second residual map corresponding to a second component by inputting an input image to at least one machine learning model;
generating a third residual map based on the first residual map and the second residual map; and
obtaining an output image based on the input image, the first residual map, and the third residual map.

The image processing method according to claim 1, characterized in that the first component and the second component correspond to different tasks of image correction.

3. The image processing method of claim 1, wherein the first residual map corresponds to a difference between the input image corrected for the first component and the input image, and the second residual map corresponds to a difference between the input image corrected for the second component and the input image.

The image processing method according to any one of claims 1 to 3, characterized in that the at least one machine learning model is trained to output the first residual map that changes the first component without changing the second component, and is trained to output the second residual map that changes the second component without changing the first component.

The image processing method according to claim 4, characterized in that the at least one machine learning model is composed of one machine learning model, and the one machine learning model is trained to output the first residual map that changes the first component without changing the second component, and is trained to output the second residual map that changes the second component without changing the first component.

The image processing method according to claim 4, characterized in that the at least one machine learning model includes a first learning model trained to output the first residual map that changes the first component without changing the second component, and a second learning model trained to output the second residual map that changes the second component without changing the first component.

The image processing method according to any one of claims 1 to 6, characterized in that the first component is at least one of a blur component, a scattering component due to fog, a relighting component, and a background blur component that corrects background blur.

The image processing method according to any one of claims 1 to 7, characterized in that the second component is a noise component.

The image processing method according to any one of claims 1 to 8, characterized in that the third residual map is obtained using luminance values of an image before and after correction using the first residual map.

The image processing method according to claim 9, characterized in that the third residual map is obtained by multiplying a coefficient obtained for each pixel based on the luminance value by a value of a corresponding pixel in the second residual map.

The image processing method according to any one of claims 1 to 10, characterized in that the output image is obtained by weighting the first residual map and the third residual map and adding them to an image based on the input image.

An image processing device comprising:
a first acquisition unit that acquires a first residual map corresponding to a first component and a second residual map corresponding to a second component by inputting an input image to at least one machine learning model;
a correction unit that generates a third residual map based on the first residual map and the second residual map; and
a second acquisition unit that acquires an output image based on the input image, the first residual map, and the third residual map.

An image processing system comprising the image processing device according to claim 12 and a control device capable of communicating with the image processing device, wherein
the control device has a transmission means for transmitting a request for execution of processing on a captured image to the image processing device, and
the image processing device has a receiving means for receiving the request, and generates the output image in response to the request.

A program for causing a computer to execute the image processing method according to any one of claims 1 to 11.