JP6401674B2 - Image processing method, image processing apparatus, and image processing program

Info

Publication number: JP6401674B2
Application number: JP2015150887A
Authority: JP
Inventors: 志織杉本; 信哉志水
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2015-07-30
Filing date: 2015-07-30
Publication date: 2018-10-10
Anticipated expiration: 2035-07-30
Also published as: JP2017033172A

Description

The present invention relates to an image processing method, an image processing apparatus, and an image processing program for generating a light field image.

Light field cameras, which record light ray information (light field images) including the direction of the light entering the image sensor, have come into practical use. As a result, research and development of new image processing techniques and applications that exploit light ray information, such as focus adjustment after capture and reconstruction of three-dimensional information, has become active. Methods of recording light ray information have existed for a long time; a well-known one arranges a large number of cameras densely and captures with them in synchronization (see, for example, Non-Patent Document 1). This method requires a system that arranges and synchronously drives a very large number of identical cameras, and it also demands strict camera calibration, so it has been considered difficult to put into practice in terms of cost and labor.

However, light field cameras that have recently come to be sold to the general public (see, for example, Non-Patent Document 2) place a microlens array in front of or behind the main lens inside the camera, which makes it possible to record light ray information in the same way as with an array of many cameras (different configurations are also possible by changing the position of the microlens array).

Non-Patent Document 1: B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” ACM Transactions on Graphics, vol. 24, p. 765, 2005.
Non-Patent Document 2: R. Ng, M. Levoy, G. Duval, M. Horowitz, and P. Hanrahan, “Light Field Photography with a Hand-held Plenoptic Camera,” Stanford Tech Rep. CTSR, pp. 1-11, 2005.

Unlike a conventional camera, a light field camera with a microlens array can record light ray information. With this recording method, however, spatial resolution and angular resolution are in a trade-off, and with practically available image sensors it is difficult to achieve the resolution expected of photographs taken with a conventional camera. In addition, attenuation of light by the microlens array leaves the amount of light insufficient, so the image is susceptible to noise, and adjusting the exposure to compensate makes it susceptible to motion blur as well. Thus, imaging with a light field camera has the problem that it is difficult to obtain the quality easily achieved with a conventional camera.

The present invention has been made in view of these circumstances, and its object is to provide an image processing method, an image processing apparatus, and an image processing program capable of easily generating a light field image.

One aspect of the present invention is an image processing method for generating a light field image from an input image containing focal blur and a light field dictionary consisting of a finite number of basis vectors such that any light field image can be represented as a linear combination of the basis vectors. The method includes a coefficient vector estimation step of estimating a coefficient vector for the light field dictionary that optimizes both the error between the input image and the result of a focusing process applied to the light field image, and the focal blur amount of each sub-aperture image of the light field image; and a light field image generation step of generating the light field image from the coefficient vector.

One aspect of the present invention is the above image processing method, further including a focal blur amount estimation step of estimating the focal blur amount of the sub-aperture image, wherein the coefficient vector estimation step estimates the coefficient vector based on the focal blur amount.

One aspect of the present invention is the above image processing method, wherein the focal blur amount estimation step uses the norm of the gradient of the sub-aperture image as the focal blur amount.

One aspect of the present invention is the above image processing method, wherein the focal blur amount estimation step uses the variance of the gradient of the sub-aperture image as the focal blur amount.

One aspect of the present invention is the above image processing method, wherein the focal blur amount estimation step uses, as the focal blur amount, the ratio of the gradients obtained by applying two different point spread functions to the sub-aperture image.

One aspect of the present invention is an image processing apparatus that generates a light field image from an input image containing focal blur and a light field dictionary consisting of a finite number of basis vectors such that any light field image can be represented as a linear combination of the basis vectors. The apparatus includes coefficient vector estimation means for estimating a coefficient vector for the light field dictionary that optimizes both the error between the input image and the result of a focusing process applied to the light field image, and the focal blur amount of each sub-aperture image of the light field image; and light field image generation means for generating the light field image from the coefficient vector.

One aspect of the present invention is an image processing program for causing a computer to execute the above image processing method.

According to the present invention, light ray information is restored by referring to a light field dictionary on the basis of an ordinary image that contains no light ray information, so a light field image can be generated easily.

FIG. 1 is a block diagram showing the configuration of a light field image generation device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the light field image generation device 100 shown in FIG. 1.

A light field image generation device according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the light field image generation device according to the embodiment. As shown in FIG. 1, the light field image generation device 100 includes an image input unit 101, a dictionary input unit 102, a patch generation unit 103, a coefficient vector estimation unit 104, and a light field image generation unit 105.

The image input unit 101 receives from outside an image that contains no light ray information and serves as the source of the light field image. This image is hereinafter called the input image. The dictionary input unit 102 receives a light field dictionary from outside; it is hereinafter called the LF dictionary. The patch generation unit 103 divides the input image into patches (small regions) of a predetermined size and generates an image patch group. The coefficient vector estimation unit 104 estimates, from the image patch group and the LF dictionary, a coefficient vector group in which each vector corresponds to one image patch. The light field image generation unit 105 generates a light field patch group from the coefficient vector group and the LF dictionary, and generates and outputs a light field image from the light field patch group. Hereinafter, a light field patch is called an LF patch and a light field image an LF image.

An LF image is an image that records information about the light rays entering an imaging system. An LF image may be represented in any way. In general, whereas an ordinary image is represented as a two-dimensional array of pixels arranged vertically and horizontally, an LF image is represented as a four-dimensional array that adds two further dimensions expressing the angle in two directions. When the spatial resolution is H (height) x W (width) and the angular resolution is N x M, the LF image can be represented as a four-dimensional H x W x N x M array (Reference 1: M. Levoy and P. Hanrahan, “Light field rendering,” Proc. 23rd Annu. Conf. Comput. Graph. Interact. Tech. - SIGGRAPH '96, pp. 31-42, 1996).

When an image focused at an arbitrary distance is generated from this LF image, its resolution is H x W unless resolution-increasing processing such as resampling or super-resolution is specifically applied. The LF image can also be represented as N x M multi-view images, each with H x W resolution. In that case the image of each viewpoint gathers the light ray information for one direction into an image, and is identical to the image each camera would capture if the same light ray information were captured with many cameras. Hereinafter, the image of each viewpoint in this representation is called a sub-aperture image. Alternatively, the raw image captured by an ordinary LF camera, that is, the image recorded on the image sensor through the main lens and the microlens array, may be used as is. Any other LF image format may also be output.
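The two representations described above can be illustrated with a minimal NumPy sketch. The axis order (H, W, N, M) follows the text; the function names `sub_aperture_image` and `to_view_stack` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def sub_aperture_image(lf, i, j):
    """Return the (i, j)-th sub-aperture image, shape (H, W), of an H x W x N x M array."""
    return lf[:, :, i, j]

def to_view_stack(lf):
    """Rearrange an H x W x N x M light field into N*M views, each of shape (H, W)."""
    h, w, n, m = lf.shape
    # Move the angular axes to the front, then merge them into one view index i*M + j.
    return lf.transpose(2, 3, 0, 1).reshape(n * m, h, w)

# Toy light field: spatial resolution 4 x 6, angular resolution 3 x 3.
H, W, N, M = 4, 6, 3, 3
lf = np.arange(H * W * N * M, dtype=float).reshape(H, W, N, M)
views = to_view_stack(lf)
```

Here `views[i * M + j]` holds the same pixels as `sub_aperture_image(lf, i, j)`, so the four-dimensional array and the multi-view stack carry identical information.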

Next, the processing operation of the light field image generation device 100 shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a flowchart showing the processing operation of the light field image generation device 100 shown in FIG. 1.

First, the image input unit 101 receives an image that contains no light ray information and serves as the source of the LF image (step S101). This image is hereinafter called the input image. Next, the dictionary input unit 102 receives the LF dictionary (step S102). Any LF dictionary may be used, but in this embodiment it is a dictionary consisting of a finite number of basis vectors such that any LF image can be expressed as a linear combination of these basis vectors. The basis vectors may be generated by any method, such as principal component analysis or independent component analysis; the description below assumes they were generated by sparse coding (Reference 2: M. Elad and M. Aharon, “Image denoising via sparse and redundant representation over learned dictionaries,” IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736-3745, 2006). It is thereby assumed that any LF image can be expressed with a very small number of coefficients.

Next, the patch generation unit 103 divides the input image into patches of a predetermined size and generates an image patch group (step S103). The patch size may be anything, but it is determined by the LF dictionary that is input. Patches may overlap. The description below assumes patches with a spatial resolution of w x h.
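Step S103 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `step` parameter (the stride between patch origins) and the function name `extract_patches` are assumptions introduced here, and pixels near the right and bottom borders are left uncovered when the image size is not compatible with the stride.

```python
import numpy as np

def extract_patches(image, h, w, step):
    """Split a 2-D image into h x w patches taken every `step` pixels.

    Returns (patches, positions). A step smaller than h or w yields
    overlapping patches, which the text explicitly allows.
    """
    H, W = image.shape
    patches, positions = [], []
    for y in range(0, H - h + 1, step):
        for x in range(0, W - w + 1, step):
            patches.append(image[y:y + h, x:x + w].copy())
            positions.append((y, x))  # top-left corner of the patch
    return np.stack(patches), positions

image = np.arange(64, dtype=float).reshape(8, 8)
patches, positions = extract_patches(image, h=4, w=4, step=2)
```

Keeping the positions alongside the patches makes it possible to put the restored LF patches back at the corresponding image locations later.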

Next, the coefficient vector estimation unit 104 estimates, from the image patch group and the LF dictionary, the coefficient vector of each image patch with respect to the LF dictionary (step S104). The conditions are that applying the focusing process to each restored LF patch reproduces the original image patch, and that the sub-aperture images obtained from the restored LF patch contain as little focal blur as possible. Any method may be used to estimate a coefficient vector satisfying these conditions; a method using sparse coding is described below.

In sparse coding, an arbitrary LF image vector is expressed by a coefficient vector with respect to an LF dictionary composed of LF bases. Letting L be the LF image to be restored, D the dictionary, and α the coefficient vector,

$$L = D\alpha \tag{1}$$

holds.

For the sub-aperture images obtained from the generated LF image to be free of focal blur, let I be the input image vector, R(f, d) the focusing operator for focal length f and depth of field d, C(i, j) the operator that extracts the (i, j)-th sub-aperture image, and E(x) the focal blur evaluation function. The coefficient vector α can then be estimated by minimizing an objective with three terms (Equation (2)): the first is the error between the input image I and the focusing result R(f, d)Dα, the second is the focal blur E of the sub-aperture images C(i, j)Dα, and the third is a sparsity term whose parameter is λ.
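As a reading aid, one objective consistent with the three terms and the symbols defined above can be written as follows. The squared L2 norm in the first term, the unweighted sum over the sub-aperture indices in the second term, and the L1 norm in the third term are assumptions made for illustration; the source may use different norms or weights.

```latex
\hat{\alpha} = \operatorname*{arg\,min}_{\alpha}\;
  \bigl\| I - R(f,d)\, D\alpha \bigr\|_2^2
  + \sum_{i,j} E\bigl( C(i,j)\, D\alpha \bigr)
  + \lambda \,\| \alpha \|_1
```

Once α̂ is found, the restored LF image follows from the dictionary as L = Dα̂.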

Any method may be used to create the focused image. Well-known methods include the shift-and-add method and the Fourier slice method (Reference 3: R. Ng, “Fourier slice photography,” ACM SIGGRAPH 2005 Pap. - SIGGRAPH '05, p. 735, 2005). These methods require the focal length and depth of field of the input image, which may be supplied from outside or estimated.
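A minimal sketch of shift-and-add refocusing on an H x W x N x M light field follows. The `slope` parameter (pixels of shift per view index) is a hypothetical stand-in for the focus setting; how it relates to the focal length f and depth of field d is not specified here. Shifts are rounded to whole pixels and wrap around at the borders, which a practical implementation would likely replace with interpolation and padding.

```python
import numpy as np

def shift_and_add(lf, slope):
    """Refocus by shifting each sub-aperture view and averaging all views.

    lf    : array of shape (H, W, N, M)
    slope : hypothetical focus parameter, pixels of shift per view index
    """
    h, w, n, m = lf.shape
    ci, cj = (n - 1) / 2.0, (m - 1) / 2.0  # central view index
    out = np.zeros((h, w))
    for i in range(n):
        for j in range(m):
            dy = int(round(slope * (i - ci)))
            dx = int(round(slope * (j - cj)))
            # np.roll wraps around at the image borders (simplification).
            out += np.roll(lf[:, :, i, j], shift=(dy, dx), axis=(0, 1))
    return out / (n * m)

lf = np.random.default_rng(0).random((6, 6, 3, 3))
refocused = shift_and_add(lf, slope=1.0)
```

With `slope=0.0` no view is shifted, so the result reduces to the plain average of all sub-aperture images.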

The coefficient vector may also be estimated without these parameters. For example, when restoring under the assumption that the focal length and depth of field of the input image equal those of the main lens in the imaging system of the LF image, a parameter-free focusing operator can be defined: in the shift-and-add method, the arithmetic mean of all sub-aperture images; in the Fourier slice method, a slice angle of 0.
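The sketch below combines the parameter-free focusing operator (the mean of all sub-aperture images) with a simple sparse estimate of the coefficient vector. It is a simplified illustration under stated assumptions: the focal blur term is omitted, the solver is plain iterative soft thresholding (ISTA) rather than whatever solver the embodiment uses, and the dictionary `D` is a random stand-in, not a learned LF dictionary.

```python
import numpy as np

def mean_refocus_matrix(hw, n_views):
    """Parameter-free focusing operator: average over all sub-aperture views.

    Assumes an LF patch of shape (h*w, n_views) flattened in C order.
    """
    return np.kron(np.eye(hw), np.full((1, n_views), 1.0 / n_views))

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - A a||_2^2 + lam * ||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the data-term gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + A.T @ (y - A @ a) / L, lam / L)
    return a

rng = np.random.default_rng(0)
hw, n_views, K = 16, 9, 40                   # 4x4 patch, 3x3 views, 40 atoms
D = rng.standard_normal((hw * n_views, K))   # stand-in LF dictionary (assumption)
R = mean_refocus_matrix(hw, n_views)
patch = rng.standard_normal(hw)              # stand-in input image patch
alpha = ista(R @ D, patch, lam=0.01)
lf_patch = (D @ alpha).reshape(hw, n_views)  # restored LF patch, L = D alpha
```

Because R is linear, the product `R @ D` can be formed once per dictionary and reused for every image patch.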

Any focal blur evaluation function may be used. For example, the gradient vector of the image may be computed and the number of nonzero pixels used as the evaluation function. With such an evaluation function, the second term of the objective becomes the count of nonzero entries in the gradient of each sub-aperture image, and the coefficient vector α is estimated by minimizing the resulting objective. The second term can also be replaced by an L2 norm by normalizing the gradient vector, in which case a solution is obtained quickly with ordinary L1 regularization.
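The nonzero-gradient count can be sketched as below. The forward-difference gradient, the tolerance `eps`, and the row-wise box blur used to produce a test image are all assumptions for illustration; the point is only that a blurred edge spreads over more pixels and therefore scores higher.

```python
import numpy as np

def gradient_l0(img, eps=1e-8):
    """Number of pixels whose forward-difference gradient magnitude exceeds eps."""
    gx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal differences
    gy = np.diff(img, axis=0, append=img[-1:, :])  # vertical differences
    return int(np.count_nonzero(np.hypot(gx, gy) > eps))

def box_blur_rows(img, radius):
    """Blur each row with a (2*radius+1)-tap box filter, using edge padding."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(img, ((0, 0), (radius, radius)), mode="edge")
    return np.stack([np.convolve(row, k, mode="valid") for row in padded])

sharp = np.zeros((8, 16))
sharp[:, 8:] = 1.0                 # ideal vertical step edge
blurred = box_blur_rows(sharp, 2)  # the same edge spread over several pixels

score_sharp = gradient_l0(sharp)
score_blurred = gradient_l0(blurred)
```

The score grows with blur, which matches the minimization formulation in which stronger focal blur must yield a larger evaluation value.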

Any normalization method may be used. For example, the gradient of the input image may be computed in advance and used for the normalization when estimating the coefficient vector.

Furthermore, by using the normalized gradient vector of the input image in place of the input image in the first term, the gradient operator and the normalization operator can be omitted from the second term.

Since the coefficient vector obtained in this case corresponds to the gradient of the LF image, the light field image generation unit 105 described later may perform processing that estimates the LF image from the gradient.

In the description above, because the optimization is posed as a minimization problem, the focal blur evaluation function takes larger values the stronger the focal blur. When the optimization is posed as a maximization problem, when the dual problem can be minimized, or when the first and second terms are separated into different problems and solved by iterative computation, an evaluation function that takes smaller values the stronger the focal blur may be used instead. For example, the variance of the image gradient may be evaluated. Alternatively, a transform to the frequency domain such as the Fourier transform or wavelet transform may be applied and the strength of the high-frequency components evaluated.
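Two evaluation functions of this second kind, which decrease as blur increases, can be sketched as follows. The central-difference gradient, the frequency `cutoff`, and the circular 3 x 3 box blur used for the comparison are assumptions made for this illustration.

```python
import numpy as np

def gradient_variance(img):
    """Variance of the gradient magnitude; it tends to drop as blur increases."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.var(np.hypot(gx, gy)))

def high_freq_energy(img, cutoff=0.25):
    """Sum of squared FFT magnitudes above `cutoff` (in cycles per pixel)."""
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    mask = np.hypot(fx, fy) > cutoff
    return float(np.sum(np.abs(F[mask]) ** 2))

def box_blur3(img):
    """Circular 3x3 box blur, used here only to produce a blurred test image."""
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(np.roll(img, s, axis=(0, 1)) for s in shifts) / 9.0

rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
blurred = box_blur3(sharp)
```

Both scores are lower for `blurred` than for `sharp`, so they would be maximized, or negated before being placed in a minimization objective.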

The description above uses functions whose evaluation value can be computed by linear operations, but an evaluation function obtained by nonlinear operations may also be used. For example, as shown in Reference 4 (S. Zhuo and T. Sim, “Defocus map estimation from a single image,” Pattern Recognit., vol. 44, no. 9, pp. 1852-1858, Sep. 2011), the blur amount can be estimated by comparing the gradient magnitudes of images filtered with Gaussian filters of two different parameters. This method assumes that the point spread function representing the focal blur is a Gaussian and estimates its parameter. Letting I be the original image, σ_1 and σ_2 (σ_1 < σ_2) the parameters of arbitrary Gaussian filters, G(x, y, σ) the Gaussian filter operator, and ∇ the gradient operator, the blur parameter σ(x, y) at position (x, y) is estimated from the ratio of the gradient magnitudes of the two filtered images.
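A minimal sketch of such a ratio-based estimate follows. It assumes an ideal step edge blurred by a Gaussian of unknown width σ: after filtering with σ_k, the gradient magnitude at the edge is proportional to 1/sqrt(σ² + σ_k²), so the ratio r = |∇G(σ_1)I| / |∇G(σ_2)I| gives σ = sqrt((σ_2² − r²σ_1²) / (r² − 1)). This closed form is derived here under that assumption and is not necessarily the expression used in the source. The helper names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 4 sigma."""
    r = int(np.ceil(4 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian filter with edge padding; output has the input's shape."""
    k = gaussian_kernel(sigma)
    p = np.pad(img, len(k) // 2, mode="edge")
    p = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, p)

def grad_mag(img):
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)

def blur_sigma_map(img, s1, s2, eps=1e-6):
    """Per-pixel blur estimate from the gradient ratio of two Gaussian-filtered copies.

    Only meaningful at edge pixels; values in flat regions are not reliable.
    """
    g1 = grad_mag(gaussian_blur(img, s1))
    g2 = grad_mag(gaussian_blur(img, s2))
    r2 = (g1 / np.maximum(g2, eps)) ** 2
    sigma2 = (s2**2 - r2 * s1**2) / np.maximum(r2 - 1.0, eps)
    return np.sqrt(np.clip(sigma2, 0.0, None))

edge = np.zeros((16, 64))
edge[:, 32:] = 1.0                                        # sharp step edge
est_sharp = blur_sigma_map(edge, 1.0, 2.0)
est_blur = blur_sigma_map(gaussian_blur(edge, 2.0), 1.0, 2.0)  # edge blurred with sigma = 2
```

At the edge column, `est_blur` lies near the true value of 2 and above `est_sharp`; pixel discretization keeps the estimate for the sharp edge from reaching exactly zero.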

The same approach can also be applied to other point spread functions, such as an averaging filter. Besides methods that assume a model of the point spread function and estimate its parameters, a method that estimates the point spread function directly may be used, as shown in Reference 5 (H. Zhang, J. Yang, Y. Zhang, and T. S. Huang, “Sparse Representation Based Blind Image Deblurring,” IEEE Int. Conf. Multimed. Expo, pp. 1-6, 2011).

Alternatively, instead of evaluating the focal blur of the sub-aperture images, the input image may be regarded as the convolution of the sub-aperture image at the center of the main lens with some point spread function, and an evaluation function that measures the blur caused by this point spread function may be used. Any method may be used to estimate the point spread function. For example, let A(x, y) be the vector of sub-aperture image pixels centered at position (x, y) with the same size as the kernel of the point spread function, b(x, y) the pixel value of the input image at (x, y), and k(x, y) the kernel of the point spread function. The kernel is then estimated as the k(x, y) for which the product of A(x, y) and k(x, y) best reproduces b(x, y).

When the patch size is larger than the kernel size, the point spread function may instead be estimated over the entire patch, for example.
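Estimating one kernel over a whole patch can be sketched as an ordinary least-squares problem: each interior pixel contributes one equation A(x, y) · k = b(x, y). The least-squares formulation, the restriction to interior pixels, and the function name `estimate_psf` are assumptions for this illustration; the kernel is applied without flipping (correlation form), which coincides with convolution for symmetric kernels.

```python
import numpy as np

def estimate_psf(sub_aperture, observed, ksize):
    """Least-squares kernel k with observed(y, x) ~= A(y, x) . k over the patch interior.

    sub_aperture : central sub-aperture image patch (2-D array)
    observed     : input image patch of the same shape
    ksize        : odd kernel side length
    """
    r = ksize // 2
    H, W = observed.shape
    rows, rhs = [], []
    for y in range(r, H - r):
        for x in range(r, W - r):
            # A(y, x): the ksize x ksize neighbourhood, flattened to a vector.
            rows.append(sub_aperture[y - r:y + r + 1, x - r:x + r + 1].ravel())
            rhs.append(observed[y, x])
    k, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return k.reshape(ksize, ksize)

# Synthetic check: blur a random patch with a known kernel, then recover it.
rng = np.random.default_rng(0)
a = rng.random((12, 12))
k_true = rng.random((3, 3))
k_true /= k_true.sum()
obs = np.zeros_like(a)
for y in range(1, 11):
    for x in range(1, 11):
        obs[y, x] = np.sum(a[y - 1:y + 2, x - 1:x + 2] * k_true)
k_est = estimate_psf(a, obs, 3)
```

A single pixel gives only one equation for all kernel entries, which is why pooling the equations over the patch makes the estimate well determined.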

The blur caused by the point spread function estimated in this way may be evaluated in any manner. For example, the blur may be regarded as larger the higher the variance of the kernel of the point spread function, and smaller the lower the variance.

In the description above, the coefficient vector is estimated using the input image directly as a vector, but processing such as normalization or redundancy reduction may be applied to the input image as needed. For example, the mean may be subtracted from each patch to form the image vector and added back after the LF patch is restored. Arbitrary transforms such as the DCT or Fourier transform, and processing such as quantization, may also be applied as needed.

Next, the light field image generation unit 105 generates an LF patch group from the coefficient vector group and the LF dictionary, and generates an LF image from the LF patch group. Any method may be used to generate the LF image. In general, every patch is placed at its corresponding image position, and the arithmetic mean is taken where multiple patches overlap. Finally, the generated LF image is output and processing ends (step S105).
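The placement and overlap averaging can be sketched as below. The function name `assemble` and the (y, x) top-left position convention are assumptions; the same routine works for 2-D image patches and for LF patches of shape (h, w, N, M).

```python
import numpy as np

def assemble(patches, positions, out_shape):
    """Place each patch at its (y, x) position and average where patches overlap."""
    acc = np.zeros(out_shape)      # running sum of patch values
    cnt = np.zeros(out_shape[:2])  # number of patches covering each pixel
    for p, (y, x) in zip(patches, positions):
        h, w = p.shape[:2]
        acc[y:y + h, x:x + w] += p
        cnt[y:y + h, x:x + w] += 1
    cnt = np.maximum(cnt, 1)       # pixels covered by no patch stay at zero
    return acc / cnt.reshape(cnt.shape + (1,) * (acc.ndim - 2))

# Round trip on a toy 8 x 8 x 2 x 2 light field with overlapping 4 x 4 patches.
lf = np.random.default_rng(0).random((8, 8, 2, 2))
positions = [(y, x) for y in range(0, 5, 2) for x in range(0, 5, 2)]
lf_patches = [lf[y:y + 4, x:x + 4] for (y, x) in positions]
restored = assemble(lf_patches, positions, lf.shape)
```

In this round trip the overlapping patches agree exactly, so averaging returns the original array; with independently restored LF patches the averaging smooths the seams between neighbours.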

In the embodiment above, the capture position of the input image is the same as that of the sub-aperture image at the center of the main lens of the LF, but it may be the same as that of another sub-aperture image of the LF, or a different position altogether. In such cases, assuming that a focused image at the same position as the input image can be generated from the LF image, and denoting that operator by R(f, d, X, Y, Z), the coefficient vector can be generated as in Equation (2) or Equation (3).

As described above, when a light field image is generated from an input image that has no light ray information, the coefficient vector used to express each image patch, obtained by dividing the input image, as a linear combination of the basis vectors that make up the LF dictionary is computed so as to minimize a predetermined error, and this coefficient vector is used so that the light field image can be generated easily from the input image.

In the embodiment described above, the order of some of the processing may be changed. All or part of the light field image generation device of the embodiment may also be implemented by a computer. In that case, a program for implementing the functions may be recorded on a computer-readable recording medium, and the program recorded on the medium may be read into and executed by a computer system. The “computer system” here includes an OS and hardware such as peripheral devices. The “computer-readable recording medium” refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The “computer-readable recording medium” may further include media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period, such as volatile memory inside the computer system acting as the server or client in that case. The program may implement only part of the functions described above, may implement them in combination with a program already recorded in the computer system, or may be implemented using hardware such as a PLD (Programmable Logic Device) or FPGA (Field Programmable Gate Array).

An embodiment of the present invention has been described above with reference to the drawings, but the embodiment is merely an illustration of the present invention, and the present invention is clearly not limited to it. Components may therefore be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential to generate a light field image from an ordinary image containing no light ray information and to perform focus adjustment, estimation of three-dimensional information, and the like.

DESCRIPTION OF SYMBOLS: 100: light field image generation device, 101: image input unit, 102: dictionary input unit, 103: patch generation unit, 104: coefficient vector estimation unit, 105: light field image generation unit

Claims

1. An image processing method for generating a light field image from an input image containing focal blur and a light field dictionary consisting of a finite number of basis vectors such that any light field image can be represented as a linear combination of the basis vectors, the method comprising:
a coefficient vector estimation step of estimating a coefficient vector for the light field dictionary that optimizes both the error between the input image and the result of a focusing process applied to the light field image, and the focal blur amount of each sub-aperture image of the light field image; and
a light field image generation step of generating the light field image from the coefficient vector.

2. The image processing method according to claim 1, further comprising a focal blur amount estimation step of estimating the focal blur amount of the sub-aperture image,
wherein the coefficient vector estimation step estimates the coefficient vector based on the focal blur amount.

3. The image processing method according to claim 2, wherein the focal blur amount estimation step uses the norm of the gradient of the sub-aperture image as the focal blur amount.

4. The image processing method according to claim 2, wherein the focal blur amount estimation step uses the variance of the gradient of the sub-aperture image as the focal blur amount.

5. The image processing method according to claim 2, wherein the focal blur amount estimation step uses, as the focal blur amount, the ratio of the gradients obtained by applying two different point spread functions to the sub-aperture image.

6. An image processing apparatus that generates a light field image from an input image containing focal blur and a light field dictionary consisting of a finite number of basis vectors such that any light field image can be represented as a linear combination of the basis vectors, the apparatus comprising:
coefficient vector estimation means for estimating a coefficient vector for the light field dictionary that optimizes both the error between the input image and the result of a focusing process applied to the light field image, and the focal blur amount of each sub-aperture image of the light field image; and
light field image generation means for generating the light field image from the coefficient vector.

7. An image processing program for causing a computer to execute the image processing method according to claim 1.