JP6202879B2

JP6202879B2 - Rolling shutter distortion correction and image stabilization processing method

Info

Publication number: JP6202879B2
Application number: JP2013105996A
Authority: JP
Inventors: 力松永
Original assignee: KABUSHIKI KAISYA HOUEI
Current assignee: KABUSHIKI KAISYA HOUEI
Priority date: 2013-05-20
Filing date: 2013-05-20
Publication date: 2017-09-27
Anticipated expiration: 2033-05-20
Also published as: JP2014229971A

Description

The present invention relates to video stabilization processing that simultaneously performs correction of motion distortion (rolling shutter distortion correction) and shake correction on video captured by a camera using a CMOS sensor.

Methods for estimating motion in video have long been studied and are broadly classified into feature-based methods and region-based methods. A feature-based method detects image features such as corner points and straight lines and estimates the image motion by matching them between images (frames). However, extracting features such as edges and matching them between frames requires comparatively large processing time and cost.

Because feature-based methods incur this cost in feature extraction and the subsequent matching, shake stabilization devices that use region-based methods have been actively developed and commercialized. However, all of them target camera video acquired with CCD sensors. They do not handle the motion distortion produced by CMOS sensors and cannot stabilize shaky CMOS video.

In addition, it has conventionally been impossible to distinguish, at the stage of the acquired video, intentional camera motion (for example, panning) from motion caused by unintended shake (blur). Consequently, image processing could not adequately remove only the unwanted shake from video acquired by a moving camera.

Patent Document 1 below discloses an imaging scheme that reduces distortion by modifying how video data is read out from a CMOS sensor, for example by reading multiple scanning lines simultaneously so that readout of the next scanning line starts while the current line is still being read.

Patent Document 2 below discloses a method of correcting distortion by video processing. However, it uses image feature points, and what it discloses is the technical idea of separately estimating the translation component and the distortion component caused by video shake.

Patent Document 1: JP 2011-142592 A
Patent Document 2: JP 2013-017155 A

Conventionally, neither feature-based nor region-based video motion estimation has supported correction of the motion distortion produced by CMOS sensors, so shaky video could not be stabilized. For example, when a subject in the frame moves faster than the scan speed of the raster scan, the video is known to be acquired with the shape of the subject distorted (motion distortion), and no effective correction method for this existed.

In addition, because intentional camera motion (for example, panning) could not be distinguished from motion caused by unintended shake (blur) at the stage of the acquired video, it was not possible to adequately remove only the unwanted shake from video acquired by a moving camera.

In the present invention, so that motion distortion and shake in video captured by a camera using a CMOS sensor can be corrected simultaneously, the motion distortion caused by the rolling shutter is described by a two-dimensional four-parameter affine transformation between two adjacent images (two frames). The transformation matrix is estimated optimally and then decomposed analytically to calculate the translation parameters.

The estimation uses a region-based direct method with the gradient constraint, without any image features or correspondences (that is, without a conventional feature-based method). For a moving camera, the shake component is removed from the time series of the estimated translation parameters by a recursive filter with level adaptation, so that only the shake (blur) in the video is corrected while the camera motion (that is, the movement of the field of view caused by panning or the like) is preserved.

Accordingly, the video stabilization processing method of the present invention is characterized in that, for video acquired from a CMOS sensor, the motion distortion caused by the rolling shutter is described by a two-dimensional four-parameter affine transformation between two adjacent images, the transformation matrix is estimated by a region-based direct method using the gradient constraint, and the matrix is then decomposed analytically to calculate the translation parameters.

Preferably, in the case of a moving camera, the video stabilization processing method of the present invention removes the shake component from the time series of the estimated translation parameters with a recursive filter that incorporates level adaptation.

More preferably, the analytical decomposition performed after the transformation matrix has been estimated by the region-based direct method using the gradient constraint is an analytical decomposition of the affine transformation matrix optimized by iterative updating with a linear solution method.

More preferably, the optimization by iterative updating with a linear solution method is computation using the least-squares method.

More preferably, the method simultaneously performs correction of motion distortion and correction of shake in video acquired from a CMOS sensor.

More preferably, no image features or correspondences are used when the transformation matrix is estimated by the region-based direct method using the gradient constraint.

More preferably, the method corrects only the shake in the video while preserving the movement of the field of view that accompanies camera motion.

The program of the present invention is a program for causing a computer to execute any of the video stabilization processing methods described above.

The storage medium of the present invention is a computer-readable storage medium storing the above program.

The video stabilizer of the present invention is a video stabilizer that executes any of the video stabilization processing methods described above.

The present invention makes it possible to process motion distortion and shake correction simultaneously for video captured by an imaging device using a CMOS sensor. For video containing shake from a moving camera, such as a panning camera, only the shake in the video can be corrected while the camera motion is preserved. Moreover, when the camera transitions from moving to fixed, or vice versa, motion distortion and shake can both be corrected simultaneously and faithfully.

FIG. 1 is a block diagram illustrating motion distortion correction and stabilization processing (blur correction processing) of CMOS camera video for a fixed camera.

FIG. 2 is a block diagram illustrating motion distortion correction and stabilization processing (blur correction processing) of CMOS camera video for a moving camera.

FIG. 3 is a diagram illustrating motion distortion caused by sequential exposure (raster scan) of a CMOS camera. The upper row shows a vertical line moving to the right in the image (the camera panned to the left) and the resulting distorted image. The lower row shows a circle moving downward in the image (the camera panned upward, also called tilting) and the resulting distorted image.

FIG. 4 is a diagram illustrating the motion distortion of a CMOS camera.

FIG. 5 shows simulated CMOS motion-distorted images of a lattice image for different directions of camera motion: (a) no camera motion; (b) the camera turns left and the image deforms to the right; (c) the camera turns right and the image deforms to the left; (d) the camera turns upward and the image stretches; (e) the camera turns downward and the image shrinks.

FIG. 6 shows the result of distortion correction for the simulated images obtained by distorting the lattice image of FIG. 5, where a four-parameter affine transformation matrix is calculated with the image without camera motion as the reference image. (a) is the reference image (no camera motion), and (b) to (e) are the correction results for the motion-distorted images (b) to (e) of FIG. 5. The image borders are left black so that the correction of the distortion is easy to see.

FIG. 7: (a) shows the averaged image of a translation-distorted image sequence (30 frames) generated using translation parameters drawn from normal random numbers with mean 0 and standard deviations of 5 and 3 pixels in the horizontal and vertical directions, respectively. (b) shows the averaged image of the image sequence obtained by distortion correction and stabilization, using the translation parameters estimated between successive pairs of adjacent images from the second frame onward with the first frame as the reference image. In the averaged image before processing (a), contours appear to overlap because of distortion and shake, whereas the averaged image after processing (b) appears sharp. The darkening near the image border is caused by cropping in the correction processing. (c) illustrates the trajectory of the translation parameters used to generate the distorted image sequence.

FIG. 8 illustrates part of a translation-distorted image sequence generated by applying translation parameters to a 2592 × 1944 pixel image taken with a digital camera (Canon IXY DIGITAL 500 (registered trademark)) and cropping a portion of it. The 640 × 480 pixel sequence was generated using translation parameters obtained by adding normal random numbers with mean 0 and standard deviation 1 pixel to the reference translation parameters (tx, ty) = (15, 3) in the horizontal and vertical directions. From left to right, the sequence can be regarded as video from a moving camera in uniform linear motion, panning up toward the upper left. The upper row is the original image sequence generated in this way, the middle row is the result of correction as a fixed camera, and the lower row is the result of correction as a moving camera.

FIG. 9 illustrates graphs of the time-series change of the translation parameters between adjacent image pairs, estimated from the translation-distorted image sequence of the moving camera. The upper graph shows the horizontal direction and the lower graph the vertical direction.

FIG. 10 is a block diagram illustrating the hierarchical motion estimation processing in the embodiment.

In the present invention, so that motion distortion and shake in video captured by a camera using a CMOS sensor can be corrected simultaneously, the motion distortion caused by the rolling shutter is described by a two-dimensional four-parameter affine transformation between two adjacent images. The transformation matrix is estimated optimally and then decomposed analytically to calculate the translation parameters.

The estimation uses a region-based direct method with the gradient constraint, without any image features or correspondences. For a moving camera, the shake component is removed from the time series of the estimated translation parameters by a recursive filter with level adaptation, so that only the shake (that is, the blur) in the video is corrected while the camera motion (that is, the movement of the field of view) is preserved.

As described above, so that motion distortion and shake in video captured by a camera using a CMOS sensor can be corrected simultaneously, in the case of a moving camera the present invention removes the shake component from the time series of the estimated translation parameters with a recursive filter augmented with level adaptation, and corrects only the shake in the video while preserving the camera motion. Because the filtering is level-adaptive, the correction also faithfully follows a transition from a moving camera to a fixed camera, or the reverse.

To this end, the motion distortion caused by the rolling shutter (sequential exposure) mechanism of the CMOS sensor is modeled as a two-dimensional affine transformation, the transformation matrix is estimated based on the gradient constraint, the estimated matrix is decomposed analytically to calculate the translation parameters that produce the motion distortion, and a recursive filter incorporating level adaptation is applied to the time series of the estimated translation parameters.

The method of the present invention can be implemented by a hardware device that processes baseband video signals, or by software that processes MXF files together with a computer-based device that executes it. Various other configurations are also possible by using a device that converts between MXF files and baseband video signals.

FIG. 1 is a block diagram illustrating motion distortion correction and stabilization processing of CMOS camera video for a fixed camera. In FIG. 1, motion estimation (Motion Estimation) estimates the four-parameter affine transformation matrix between two images, and distortion correction (Distortion Correction) is performed using the translation parameters obtained by decomposing that matrix.

Shake correction (Motion Stabilization) is then applied to the distortion-corrected result using the cumulative sum of the translation parameters. That is, the cumulative sum for the current frame is the cumulative sum for the previous frame plus the current translation parameters, where Z⁻¹ denotes a one-frame delay.
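The cumulative addition used in the fixed-camera case can be sketched in a few lines of Python. This is a minimal illustration, assuming the per-frame translation estimates are available as (tx, ty) pairs. The function name `accumulate_translations` and the sample values are hypothetical and do not come from the source.

```python
from itertools import accumulate


def accumulate_translations(per_frame):
    """Running sum of per-frame (tx, ty) translation estimates.

    Each output entry equals the previous running sum (the one-frame
    delay Z^-1) plus the translation estimated for the current frame.
    """
    def add(total, current):
        return (total[0] + current[0], total[1] + current[1])

    return list(accumulate(per_frame, add))


# Hypothetical estimates for three consecutive frame pairs.
offsets = accumulate_translations([(1.0, 2.0), (-1.0, 1.0), (3.0, 0.0)])
```

Each entry of `offsets` is the total displacement relative to the first frame, which is the quantity a fixed-camera stabilizer would compensate.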

For a moving camera, the translation parameters are not accumulated. Instead, they are smoothed directly with a low-pass filter (LPF), and shake correction (Motion Stabilization) is applied to the distortion-corrected result using the smoothed parameters, as shown in FIG. 2. FIG. 2 is a block diagram illustrating motion distortion correction and stabilization processing of CMOS camera video for a moving camera.

Using the smoothed translation parameters makes shake correction possible even for a camera that is moving, for example panning. When a recursive filter is used as the low-pass filter, only current and past data are used, so no extra frame delay arises, which is an advantage in terms of the delay of the overall processing.

In addition, a weighting coefficient that depends on the signal level difference is introduced into the recursive filter applied to the time series of estimated translation parameters, which represent the motion distortion between two images. For a first-order Butterworth recursive filter, the horizontal translation parameter is calculated as in [Equation 1] below (the vertical translation parameter is handled in the same way).

The filter has a parameter that adjusts the allowable range of the signal level difference. α₀, α₁, and β₁ are the first-order Butterworth recursive filter coefficients and are calculated as in [Equation 2] below.

The quantities used in [Equation 2] are the sampling period and the digital cutoff angular frequency.
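The following Python sketch shows one way a level-adaptive first-order Butterworth recursive filter could look. It is an illustration under stated assumptions, not the formulas of [Equation 1] and [Equation 2], which are not reproduced in this text. The coefficients come from the standard bilinear-transform design. The Gaussian weight on the level difference and the way it blends the low-pass output with the raw input are assumptions, and the names `butterworth1_coeffs`, `level_adaptive_filter`, and `sigma` are hypothetical.

```python
import math


def butterworth1_coeffs(omega_c, T):
    """First-order Butterworth low-pass coefficients (bilinear transform).

    omega_c: digital cutoff angular frequency, T: sampling period.
    Returns (alpha0, alpha1, beta1) for
        y[n] = alpha0*x[n] + alpha1*x[n-1] + beta1*y[n-1].
    """
    k = math.tan(omega_c * T / 2.0)
    alpha0 = alpha1 = k / (1.0 + k)
    beta1 = (1.0 - k) / (1.0 + k)
    return alpha0, alpha1, beta1


def level_adaptive_filter(signal, omega_c, T=1.0, sigma=5.0):
    """Recursive low-pass filter that follows large level changes.

    ASSUMPTION: the weight w is a Gaussian of the difference between the
    current input and the previous output. Small differences (shake) are
    smoothed; large differences (a change in camera motion) pass through.
    """
    a0, a1, b1 = butterworth1_coeffs(omega_c, T)
    out = [signal[0]]
    for n in range(1, len(signal)):
        x, x_prev, y_prev = signal[n], signal[n - 1], out[-1]
        lowpass = a0 * x + a1 * x_prev + b1 * y_prev
        w = math.exp(-((x - y_prev) ** 2) / (2.0 * sigma ** 2))
        out.append(w * lowpass + (1.0 - w) * x)
    return out


# Hypothetical horizontal translations: a pan of 15 pixels per frame with +/-1 pixel shake.
shaky_pan = [15.0 + (1.0 if n % 2 == 0 else -1.0) for n in range(20)]
smoothed = level_adaptive_filter(shaky_pan, omega_c=0.2 * math.pi)
```

With this weighting, frame-to-frame jitter around a steady pan is smoothed, while an abrupt change such as a pan stopping is tracked almost immediately instead of being blunted by the low-pass response.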

That is, the present invention is characterized in that shake and distortion can be corrected simultaneously, that motion is estimated without extracting or matching feature points, and that only unwanted shake is corrected in video from a moving camera. The level-adaptive filter that supports moving cameras is a particularly significant feature.

With a plain recursive filter, the change point at which the camera goes from moving to stationary, or conversely from stationary to moving, is blunted, so the correction is not performed correctly. With the level-adaptive filter described above, correction that faithfully follows such moving/stationary state changes becomes possible.

(Rolling shutter distortion correction and video stabilization without corresponding points)
In recent years, CMOS sensors have come into wide use, from low-priced mobile phone cameras to high-end digital single-lens reflex cameras (DSLRs). CMOS sensors allow lower cost, lower power consumption, and larger formats, but they differ greatly from conventional CCD sensors in that they acquire pixel data with a sequential exposure mechanism called a rolling shutter, which gives rise to motion distortion.

The present invention performs stabilizer processing on CMOS camera video captured with a rolling shutter mechanism, without using any image features or correspondences. The global motion between two adjacent images is described by a two-dimensional affine transformation, and it is estimated with a linear solution method rather than nonlinear optimization. The affine transformation matrix optimized by iterating the linear solution is decomposed analytically to calculate the translation parameters. In image simulation experiments, the motion distortion contained in video from both fixed and moving cameras is corrected, and shake is removed to stabilize the video. For a moving camera, the shake component is removed from the time series of the estimated translation parameters by recursive filtering, and only the shake in the video is corrected while the camera motion is preserved. Stabilization that faithfully follows the change is also achieved when the camera transitions from moving to fixed.

(CMOS motion distortion model)
A CMOS sensor has a different shutter mechanism from a CCD sensor. In a CCD sensor all pixels are exposed simultaneously, whereas a CMOS sensor uses sequential exposure by line scanning in order to achieve small size and low cost. Therefore, when the camera motion is very large relative to the scanning time, the time difference between the first and last lines of the CMOS sensor causes the CMOS camera video to be distorted according to the direction and type of camera motion. FIG. 3 shows how an object in the scene moves during the scanning time under such a rolling shutter mechanism. The upper row of FIG. 3 shows a vertical line moving to the right in the image (the camera panned to the left) and the resulting distorted image. The lower row shows a circle moving downward in the image (the camera panned upward, also called tilting) and the resulting distorted image.

Here, suppose that when a CMOS camera with an image size of V × H (vertical × horizontal) moves, a feature point x of an object in the captured scene moves by the image motion u during one frame period. Assuming translational motion,

and the velocity v is obtained by dividing u by the one-frame time T_f.

In image I_n, whose scan starts at t = 0 from the image origin at the top left of the screen, if the feature point x_n moves by u_n in one frame period, the elapsed time until pixel position x_dn = (x_dn, y_dn) is

and the CMOS distorted position x_dn satisfies the following CMOS motion distortion model, in which a variation term due to motion distortion is added to the undistorted position x_n.

Approximating

gives

Solving [Equation 8] for y_dn gives

Therefore, the distortion transformation due to translational motion between (x_n, y_n) and (x_dn, y_dn) is as follows.
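The steps described above can be written out as a worked derivation. The following is a sketch under stated assumptions, not a reproduction of the original equations. It assumes that u_n = (t_xn, t_yn), that the scan takes the full frame time T_f to cover the V rows, and that the elapsed time depends only on the row y_dn.

```latex
\begin{align*}
\mathbf{v}_n &= \frac{\mathbf{u}_n}{T_f}, \qquad \mathbf{u}_n = (t_{xn},\, t_{yn})^\top
  && \text{(velocity from the one-frame motion)} \\
t &\approx \frac{y_{dn}}{V}\, T_f
  && \text{(elapsed time to row } y_{dn}\text{, assumed approximation)} \\
\mathbf{x}_{dn} &= \mathbf{x}_n + \mathbf{v}_n t \approx \mathbf{x}_n + \frac{y_{dn}}{V}\,\mathbf{u}_n
  && \text{(undistorted position plus motion term)} \\
y_{dn} &= y_n + \frac{t_{yn}}{V}\, y_{dn}
  \;\Longrightarrow\; y_{dn} = \frac{V}{V - t_{yn}}\, y_n \\
x_{dn} &= x_n + \frac{t_{xn}}{V}\, y_{dn} = x_n + \frac{t_{xn}}{V - t_{yn}}\, y_n \\
\begin{pmatrix} x_{dn} \\ y_{dn} \\ 1 \end{pmatrix}
  &= \begin{pmatrix}
       1 & \dfrac{t_{xn}}{V - t_{yn}} & 0 \\
       0 & \dfrac{V}{V - t_{yn}} & 0 \\
       0 & 0 & 1
     \end{pmatrix}
     \begin{pmatrix} x_n \\ y_n \\ 1 \end{pmatrix}
\end{align*}
```

Under these assumptions, horizontal motion shears the image and vertical motion scales it along y, which is the kind of deformation the text attributes to a rolling shutter.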

Writing this as

it follows that

so the relationship between two adjacent images I_n and I_{n+1} under translational distortion is as follows (FIG. 4). FIG. 4 is a diagram illustrating the motion distortion of a CMOS camera.

Here,

Writing the transformation matrix A_{n,n+1} element by element gives

Setting this equal to

and equating each element of the matrices, then solving for t_{xn}, t_{yn}, t_{x,n+1}, and t_{y,n+1}, gives the following.

FIG. 5 shows simulated CMOS motion-distorted images of a lattice image for different directions of camera motion: (a) no camera motion; (b) the camera turns left and the image deforms to the right; (c) the camera turns right and the image deforms to the left; (d) the camera turns upward and the image stretches; (e) the camera turns downward and the image shrinks.
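A lattice simulation of this kind can be sketched in Python. The mapping below is an assumed form of the translational rolling shutter model, obtained by taking the elapsed scan time to be proportional to the row index, so that x_d = x + tx·y/(V − ty) and y_d = V·y/(V − ty). The function names and the sign convention (positive tx moves image content to the right, positive ty moves it downward) are hypothetical choices for this illustration.

```python
def distort_point(x, y, tx, ty, V):
    """Map an undistorted point to its rolling-shutter position (assumed model).

    tx, ty: image motion in pixels during one frame; V: image height in rows.
    """
    if ty >= V:
        raise ValueError("vertical motion must be smaller than the image height")
    scale = V / (V - ty)
    return x + tx * y / (V - ty), scale * y


def distort_lattice(width, height, step, tx, ty):
    """Distort the nodes of a regular lattice, row by row."""
    return [
        [distort_point(x, y, tx, ty, height) for x in range(0, width + 1, step)]
        for y in range(0, height + 1, step)
    ]


# Hypothetical case: the image content moves 20 pixels to the right per frame.
sheared = distort_lattice(640, 480, 160, tx=20.0, ty=0.0)
```

In this sketch, horizontal motion leaves the top row in place and shifts lower rows progressively, giving a shear. Downward image motion (positive ty) stretches the lattice vertically and upward motion shrinks it.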

(CMOS motion estimation by two-dimensional affine transformation)
The transformation matrix A_{n,n+1} that represents the distortion due to translational motion is a two-dimensional affine transformation, but it has only four degrees of freedom (unknown parameters). It is therefore called a four-parameter affine transformation here. CMOS motion estimation thus reduces to calculating a four-parameter affine transformation. A method for calculating the four-parameter affine transformation matrix is described below.

From [Equation 14], applying the four-parameter affine transformation A_{n,n+1} to the first image I_n makes it overlap the second image I_{n+1}, so the following relationship holds in the overlapping region.

A first-order approximation of the right-hand side of [Equation 22] by Taylor expansion gives

and a = (a2, a3, a5, a6) is obtained by minimizing the following objective function J. Hereafter, x_dn and y_dn are abbreviated as x and y.

where

Σ denotes the sum over all pixels in the overlapping region of the two images. [Equation 24] is a least-squares estimate under the gradient constraint. Differentiating J with respect to each parameter gives the following.

Therefore, it suffices to solve the following simultaneous equations.
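A single linear least-squares solve of the gradient constraint can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the normal equations of the source, which are not reproduced in this text. It assumes the affine matrix has the layout [[1, a2, a3], [0, a5, a6], [0, 0, 1]], that the first image satisfies first(x, y) ≈ second(x + a2·y + a3, a5·y + a6), and that the gradients are taken on the second image. The function name `estimate_four_param_affine` is hypothetical.

```python
import numpy as np


def estimate_four_param_affine(first, second):
    """Least-squares estimate of (a2, a3, a5, a6) from the gradient constraint.

    ASSUMED model: first(x, y) ~= second(x + a2*y + a3, a5*y + a6), so the
    displacement is dx = a2*y + a3 and dy = (a5 - 1)*y + a6, and every pixel
    contributes the linear constraint Ix*dx + Iy*dy + It = 0.
    """
    first = first.astype(float)
    second = second.astype(float)
    Iy, Ix = np.gradient(second)   # axis 0 is y (rows), axis 1 is x (columns)
    It = second - first            # temporal difference
    h, w = second.shape
    ys = np.broadcast_to(np.arange(h, dtype=float)[:, None], (h, w))
    # Unknown vector p = (a2, a3, a5 - 1, a6).
    design = np.column_stack([
        (Ix * ys).ravel(), Ix.ravel(), (Iy * ys).ravel(), Iy.ravel(),
    ])
    p, *_ = np.linalg.lstsq(design, -It.ravel(), rcond=None)
    return float(p[0]), float(p[1]), 1.0 + float(p[2]), float(p[3])
```

`np.linalg.lstsq` solves the same problem as the normal equations obtained by differentiating J, without forming the 4 × 4 system explicitly. Because no features or correspondences are involved, every pixel in the overlap contributes one row to the system.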

From a2, a3, a5, and a6 calculated in this way, the translation parameters t_{xn}, t_{yn}, t_{x,n+1}, and t_{y,n+1} are calculated using [Equation 18] to [Equation 21].

The values of t_{xn}, t_{yn}, t_{x,n+1}, and t_{y,n+1} obtained in this way could be used as initial values for optimization by, for example, the Gauss-Newton method. However, the four-parameter affine transformation matrix A_{n,n+1} is not linear in the translation parameters, so optimizing them directly is cumbersome. Instead, A_{n,n+1} is estimated optimally by iterative updating, and the result is then decomposed. The procedure for optimizing the above least-squares estimate by iteration is as follows.

Step 1: Given initial values, generate the transformed image of the first image and set J ← ∞ (a sufficiently large value).
Step 2: Using the second image and the transformed first image, calculate the spatiotemporal gradients Ix, Iy, and It at each pixel according to [Equation 25].
Step 3: Solve the following simultaneous equations.

Step 4: Update a2, a3, a5, and a6 as follows.

Step 5: Generate the transformed image of the first image using the updated parameters a, and calculate the residual J' = J(a). If J' ≤ J and |J − J'| < ε (a small threshold), return a and terminate. Otherwise, set J ← J' and return to Step 2.
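The control flow of Steps 1 to 5 can be sketched as a small loop. This is only an outline under stated assumptions: the update in Step 4 is taken to be additive (a ← a + Δa), since the update equation is not reproduced in the extracted text; the image warping, gradient computation, and linear solve of Steps 2 and 3 are hidden behind a caller-supplied function; and `refine`, `estimate_update`, `residual`, and the `max_iter` guard are hypothetical names and additions, not part of the patent.

```python
def refine(a0, estimate_update, residual, eps=1e-6, max_iter=100):
    """Iterate Steps 1-5 until the residual stops decreasing noticeably.

    a0              : initial parameters (a2, a3, a5, a6)
    estimate_update : function a -> increment; stands in for Steps 2-3
                      (warp the first image with a, compute Ix, Iy, It,
                      solve the simultaneous equations)
    residual        : function a -> J(a); stands in for the Step 5 residual
    """
    a = list(a0)
    J = float("inf")                                # Step 1: J <- infinity
    for _ in range(max_iter):                       # guard added for safety
        delta = estimate_update(a)                  # Steps 2-3
        a = [ai + di for ai, di in zip(a, delta)]   # Step 4 (assumed additive)
        J_new = residual(a)                         # Step 5
        if J_new <= J and abs(J - J_new) < eps:
            return a                                # converged
        J = J_new                                   # otherwise repeat from Step 2
    return a
```

On the first pass J is infinite, so the convergence test cannot succeed and at least two evaluations of the residual are always made.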

In practice, hierarchical motion estimation is performed: a hierarchy of images is generated by applying a Gaussian filter and decimating, and the parameters estimated between the lowest-resolution images are propagated to motion estimation between successively higher-resolution images.

FIG. 10 is a block diagram illustrating the hierarchical motion estimation process in the embodiment. As shown in FIG. 10, two adjacent motion-distorted images I_n and I_{n+1} are input. Each is filtered with a Gaussian filter Gσ and decimated to 1/2 the image size (↓2). The decimation is repeated to reduce the image size to 1/4. Taking the original image size as Level 0, the 1/2 image size is called Level 1 and the 1/4 image size Level 2.
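The three-level hierarchy described for FIG. 10 can be sketched in pure Python as follows. The Gaussian filter Gσ is not specified further in the text, so a separable [1, 2, 1]/4 kernel with replicated borders is used here purely as a stand-in; `blur_121`, `downsample2`, and `build_pyramid` are hypothetical names.

```python
def blur_121(img):
    """Separable [1, 2, 1] / 4 smoothing with replicated borders
    (a small stand-in for the Gaussian filter G_sigma)."""
    h, w = len(img), len(img[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # horizontal pass
    tmp = [[(img[r][clamp(c - 1, w)] + 2 * img[r][c] + img[r][clamp(c + 1, w)]) / 4.0
            for c in range(w)] for r in range(h)]
    # vertical pass
    return [[(tmp[clamp(r - 1, h)][c] + 2 * tmp[r][c] + tmp[clamp(r + 1, h)][c]) / 4.0
             for c in range(w)] for r in range(h)]


def downsample2(img):
    """Keep every second row and column (the "down-arrow 2" operation)."""
    return [row[::2] for row in img[::2]]


def build_pyramid(img, levels=3):
    """Return [Level 0, Level 1, Level 2, ...]; Level 0 is the original image."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2(blur_121(pyramid[-1])))
    return pyramid
```

For a 640 × 480 input, this yields 640 × 480, 320 × 240, and 160 × 120 images for Levels 0, 1, and 2.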

An initial value a(0) of the four-parameter affine transformation matrix parameters is given, and I_n at 1/4 image size is corrected (W) by the four-parameter affine transformation. A four-parameter affine transformation matrix is then estimated (M) against I_{n+1} at 1/4 image size. The estimate is added to a(0) to give a(1), which is used for the Level 1 processing at 1/2 image size.

The initial value a(0) of the four-parameter affine transformation matrix parameters may be, for example, the translation obtained by block matching between I_n and I_{n+1} at 1/4 image size. In that case, a(0) = (a2, a3, a5, a6) = (0, tx(0), 0, ty(0)), where (tx(0), ty(0)) is the translation obtained by block matching.

As with the Level 2 processing at 1/4 image size, the Level 1 processing at 1/2 image size corrects (W) I_n at 1/2 image size by the four-parameter affine transformation with parameters a(1). A four-parameter affine transformation matrix is then estimated (M) against I_{n+1} at 1/2 image size. The estimate is added to a(1) to give a(2), which is used for the Level 0 processing at the original image size.

As with the Level 1 processing at 1/2 image size, the Level 0 processing at the original image size corrects (W) I_n at the original image size by the four-parameter affine transformation with parameters a(2). A four-parameter affine transformation matrix is then estimated (M) against I_{n+1} at the original image size. The estimate is added to a(2) to give a(3), which is the final estimate of the four-parameter affine transformation matrix; this matrix is decomposed to calculate the translation parameters. Time-series processing is applied to the resulting translation parameters, and the four-parameter affine transformation matrix recomposed from the processed translation parameters is finally used to correct I_{n+1} at the original image size, which is then output.

(CMOS motion distortion correction and stabilization processing)
As described above, FIG. 1 is a block diagram illustrating motion distortion correction and stabilization of CMOS camera video for a fixed camera. In FIG. 1, the four-parameter affine transformation matrix between the two images I_n and I_{n+1} is estimated by motion estimation (Motion Estimation), and distortion correction (Distortion Correction) is performed using the translation parameters t_n = (tx_n, ty_n) obtained by decomposing it. The distortion-corrected result is then shake-corrected (Motion Stabilization) using τ_n = (τx_n, τy_n), the cumulative sum of the translation parameters. That is,

Here, Z⁻¹ denotes a one-frame delay.
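For the fixed-camera case, the cumulative addition with a one-frame delay amounts to a running sum of the per-frame translations, τ_n = τ_{n−1} + t_n. The sketch below shows this; the starting value τ = (0, 0) and the name `accumulate` are assumptions for illustration.

```python
def accumulate(translations):
    """Running sum of per-frame translations t_n = (tx_n, ty_n).

    Returns tau_n = tau_{n-1} + t_n for every frame; tau_{n-1} plays the
    role of the one-frame-delayed output (Z^-1).
    """
    tau_x = tau_y = 0.0   # assumed initial state
    out = []
    for tx, ty in translations:
        tau_x += tx
        tau_y += ty
        out.append((tau_x, tau_y))
    return out
```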

For a moving camera, the distortion-corrected result is shake-corrected (Motion Stabilization) not with the cumulative sum of t_n but with τ_n obtained by smoothing t_n with a low-pass filter (LPF) (FIG. 2). As shown in FIG. 2 described above, using the smoothed translation parameters makes shake correction possible even for a camera that is moving, for example panning. If a recursive filter is used as the low-pass filter, only current and past data are used, so no extra frame delay arises, which is advantageous in terms of the overall processing delay. For a first-order Butterworth recursive filter, the output is calculated as follows.

Here, ω_ac is the analog cutoff angular frequency prewarped for the bilinear transformation. With the digital cutoff angular frequency ω_c = 2πf_c, it is given as follows.

Ts is the sampling period; with sampling frequency fs, Ts = 1/fs.
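The filter equations are not reproduced in the extracted text, so the following is a sketch of the textbook first-order Butterworth low-pass filter obtained with the bilinear transformation and prewarping, which may differ in form from the patent's own equations. It uses the standard prewarping relation ω_ac = (2/Ts)·tan(ω_c·Ts/2) and the resulting recursion y[n] = b·(x[n] + x[n−1]) + a·y[n−1]. The function names, and the choice to initialize the filter state with the first sample, are assumptions.

```python
import math


def butterworth1_coeffs(fc, fs):
    """Coefficients (b, a) of y[n] = b*(x[n] + x[n-1]) + a*y[n-1]."""
    Ts = 1.0 / fs                                          # sampling period
    omega_c = 2.0 * math.pi * fc                           # digital cutoff
    omega_ac = (2.0 / Ts) * math.tan(omega_c * Ts / 2.0)   # prewarped analog cutoff
    k = omega_ac * Ts / 2.0
    return k / (1.0 + k), (1.0 - k) / (1.0 + k)


def butterworth1_filter(signal, fc, fs):
    """Causal first-order low-pass filtering of a 1-D sequence."""
    if not signal:
        return []
    b, a = butterworth1_coeffs(fc, fs)
    x_prev = y_prev = signal[0]   # assumed initial state: start at steady state
    out = []
    for x in signal:
        y = b * (x + x_prev) + a * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out
```

The recursion uses only the current sample and one past sample of input and output, which is why no additional frame delay is introduced. Horizontal and vertical translations can be filtered by calling the function once per axis, with the same or different cutoff frequencies.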

[Equation 31] is written in vector form covering both the horizontal and vertical directions. If the cutoff frequencies are the same, the filter coefficients are also the same, but a different cutoff frequency may be used for each direction.

However, at a transition from a moving camera to a fixed camera, the stronger the smoothing of the recursive filter, the more the change point is blunted by attenuation. This means that stabilization continues to be performed as for a moving camera for some time even though the camera has stopped. The same applies to the reverse transition from a fixed camera to a moving camera.

Therefore, a weighting coefficient that depends on the signal level difference is introduced into the recursive filter so that change points are not blunted by the attenuation of the filter. For a first-order Butterworth recursive filter, the horizontal translation parameter τx_n is calculated as follows.

Here, the weighting coefficient is given by the following.

σr is a parameter that adjusts the tolerated range of the signal level difference. The vertical translation parameter is calculated in the same way, with τx → τy and tx → ty.
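Since the weighted filter equations are missing from the extracted text, the sketch below shows only one plausible way such a level-adaptive recursive filter could work, not the patent's actual formula. It assumes a Gaussian weight on the difference between the current input and the previous output, w = exp(−(t_n − τ_{n−1})² / (2σr²)), and blends the ordinary first-order Butterworth output with the raw input using that weight. The weight form, the blending rule, and the name `level_adaptive_filter` are all assumptions.

```python
import math


def level_adaptive_filter(signal, fc, fs, sigma_r2):
    """First-order recursive low-pass filter with a level-dependent weight.

    sigma_r2 is the squared tolerance sigma_r^2 for the level difference.
    Small level differences (w close to 1) are smoothed as usual; large
    ones (w close to 0) pass through, so change points are not blunted.
    """
    if not signal:
        return []
    k = math.tan(math.pi * fc / fs)          # prewarped omega_ac * Ts / 2
    b, a = k / (1.0 + k), (1.0 - k) / (1.0 + k)
    x_prev = y_prev = signal[0]
    out = []
    for x in signal:
        smooth = b * (x + x_prev) + a * y_prev                  # plain recursion
        w = math.exp(-((x - y_prev) ** 2) / (2.0 * sigma_r2))   # assumed weight
        y = w * smooth + (1.0 - w) * x                          # assumed blend
        out.append(y)
        x_prev, y_prev = x, y
    return out
```

With this form, a sequence that drops abruptly from about 15 pixels per frame to 0 is followed almost immediately, whereas frame-to-frame jitter that is small relative to σr is still smoothed with the chosen cutoff frequency.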

In practice, for both fixed and moving cameras, distortion correction and shake correction are performed together by a single four-parameter affine transformation recomposed from the respective translation parameters t_n and τ_n. Pixel values at the sub-pixel coordinates produced by the transformation are calculated by interpolation from neighboring pixels.

(Image simulation experiments)

(1) Artificial image simulation experiment
FIG. 6 shows the result of distortion correction for the simulation images obtained by distorting the grid image of FIG. 5, where the four-parameter affine transformation matrix was calculated with the image without camera motion as the reference image. In the CMOS motion-distortion-corrected images of FIG. 6, (a) is the reference image (no camera motion), and (b) to (e) are the correction results for the motion-distorted images (b) to (e) of FIG. 5 caused by camera motion. The image borders are left black so that the correction of the distortion is easy to see.
FIG. 6 shows that distortion caused by both horizontal and vertical motion is corrected.

(2) Real image sequence simulation experiment (fixed camera)
Translationally distorted images were generated by applying translation parameters to a 3008 × 2000 pixel image taken with a digital camera (Nikon D40 (registered trademark)) and cropping part of it. The image sequence was generated using translation parameters drawn from normal random numbers with mean 0 and standard deviations of 5 and 3 pixels in the horizontal and vertical directions, respectively. The generated images are 640 × 480 pixels. The sequence can be regarded as fixed-point surveillance video from a fixed camera.

FIG. 7(a) shows the average image of the translationally distorted image sequence generated in this way (30 frames). With the first frame as the reference image, distortion correction and stabilization were performed from the second frame onward using the translation parameters estimated between successive pairs of adjacent images. FIG. 7(b) shows the average image of the processed image sequence. Whereas the average image before processing shows overlapping contours because of distortion and shake, the average image after processing appears sharp. The black areas near the image border are regions cut off by the correction. FIG. 7(c) illustrates the trajectory of the translation parameters used to generate the distorted image sequence.

The correction result is evaluated quantitatively by the peak signal-to-noise ratio (PSNR) of the squared-error image between two adjacent corrected images. The PSNR is obtained as follows from the mean luminance value of the squared-error image (mean noise power), MSE, and the maximum luminance value (maximum signal power), Imax². In the experiment, Imax was set to 255, the maximum 8-bit pixel value.
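The PSNR equation itself is not reproduced in the extracted text; the sketch below uses the standard definition PSNR = 10·log10(Imax² / MSE), which matches the quantities named above. The function name `psnr` and the representation of images as nested lists are assumptions for illustration.

```python
import math


def psnr(img_a, img_b, i_max=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    MSE is the mean of the squared-error image; i_max is the maximum
    pixel value (255 for 8-bit images).
    """
    total = 0.0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    mse = total / count
    if mse == 0.0:
        return float("inf")   # identical images
    return 10.0 * math.log10(i_max ** 2 / mse)
```

A higher value between adjacent corrected frames indicates that the frames are better aligned after correction.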

(3) Real image sequence simulation experiment (moving camera)
FIG. 8 shows part of a translationally distorted image sequence generated by applying translation parameters to a 2592 × 1944 pixel image taken with a digital camera (Canon IXY DIGITAL 500 (registered trademark)) and cropping part of it. The sequence was generated using translation parameters obtained by adding normal random numbers with mean 0 and standard deviation 1 pixel to the reference translation (tx, ty) = (15, 3) in the horizontal and vertical directions. The generated images are 640 × 480 pixels. In FIG. 8, the images, ordered from left to right, can be regarded as video from a moving camera in uniform linear motion, panning up toward the upper left. The top row of FIG. 8 is the original image sequence generated in this way, the middle row is the image sequence corrected as a fixed camera, and the bottom row is the image sequence corrected as a moving camera.

The result of distortion correction and stabilization performed as a fixed camera is completely stable with respect to the first frame. However, because the input is an image sequence from a moving camera, the cut-off region gradually grows as the view moves far from the reference frame.

In contrast, the result of distortion correction and stabilization performed as a moving camera shows that the correction follows the camera's movement. Here, to remove unwanted shake components and smooth the camera trajectory, first-order Butterworth recursive level-adaptive filtering was applied to the time series of the translation parameters estimated between adjacent images, and each frame was corrected using the resulting translation parameters. The cutoff frequency of the first-order Butterworth recursive level-adaptive filter was 0.01 Hz in both the horizontal and vertical directions, and σr² was 20 and 3, respectively.

FIG. 9 shows graphs of the time series of the translation parameters between adjacent images estimated from the translationally distorted image sequence of the moving camera; the upper graph shows the horizontal direction and the lower graph the vertical direction. FIG. 9 plots the time series of the translation parameters (Original), the translation parameters smoothed by a first-order Butterworth recursive low-pass filter with fc = 1 Hz (IIR), and the translation parameters smoothed by the first-order Butterworth recursive level-adaptive filter with fc = 0.01 Hz and σr² = 20 and 3 in the horizontal and vertical directions, respectively (IIR bilateral). The first-order Butterworth recursive level-adaptive filter removes the high-frequency shake components and makes the translation parameters smooth.

Furthermore, the translation parameters are then set to (tx, ty) = (0, 0), normal random numbers are added in the same way, and generation of the translationally distorted image sequence continues. This corresponds to a camera that moves and then stops, becoming a fixed camera. At the change point from moving camera to fixed camera at frame 30 in FIG. 9, the translation parameters smoothed by the first-order Butterworth recursive filter with a cutoff frequency of 1 Hz do not return to 0 immediately after the frame at which the camera stops, because of the attenuation caused by smoothing.

In contrast, the result of smoothing by the first-order Butterworth recursive level-adaptive filter removes the shake components of the moving camera while preserving the change point, and a smoothed result close to 0 is obtained after the camera stops. Simple thresholding is therefore sufficient to determine that the camera is fixed. When the absolute value of the smoothed translation parameter is at or below a threshold, the camera is judged to be fixed, and the stabilization correction uses the cumulative sum of the translation parameters from that frame onward instead of the smoothed translation parameter.
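The threshold rule can be sketched for a single axis as follows. This is an illustrative reading of the rule, not a confirmed implementation: the function name `stabilization_offsets`, the mode labels, and the decision to restart the cumulative sum each time the camera is newly judged to be fixed are assumptions.

```python
def stabilization_offsets(raw, smoothed, threshold):
    """Choose the stabilization offset per frame for one axis.

    raw       : estimated translation parameters t_n
    smoothed  : level-adaptive filter output for the same frames
    threshold : |smoothed| at or below this value means "fixed camera"

    Returns a list of (mode, offset) pairs, where mode is "moving"
    (offset = smoothed value) or "fixed" (offset = cumulative sum of
    t_n counted from the frame at which the camera was judged fixed).
    """
    result = []
    acc = None                       # None while the camera is judged to be moving
    for t, s in zip(raw, smoothed):
        if abs(s) <= threshold:
            acc = t if acc is None else acc + t
            result.append(("fixed", acc))
        else:
            acc = None               # leave fixed mode; restart the sum next time
            result.append(("moving", s))
    return result
```

In a two-axis setting the same test would presumably be applied to both the horizontal and vertical parameters before switching modes; the text does not specify how the two axes are combined.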

(Summary)
In the present invention, to correct motion distortion and shake simultaneously in video captured by a camera using a CMOS sensor, the motion distortion caused by the rolling shutter is described by a global two-dimensional four-parameter affine transformation between adjacent images. The transformation matrix is estimated optimally and then decomposed analytically to calculate the translation parameters. The estimation processes pixels directly, without using any image features or correspondences.

In image simulation experiments, the motion distortion contained in video from both fixed and moving cameras was corrected, and shake was removed to stabilize the video. For a moving camera, the shake components were removed from the time series of the estimated translation parameters by recursive level-adaptive filtering, so that only the shake in the video was corrected while the camera's movement was preserved. Stabilization that faithfully follows the change was also achieved at the transition from a moving camera to a fixed camera.

(Supplementary explanation)
Methods for stabilizing camera shake are divided into optical or mechanical methods, which correct shake by moving the camera lens or the camera itself based on information from an acceleration sensor attached to the camera, and electronic methods, which use image processing.

These devices are generally called video stabilizers. Electronic video stabilizers based on image processing are advantageous in many respects, such as the range of shake that can be corrected, device miniaturization, and durability. Because they can also be applied afterwards as post-processing of video that has already been acquired, they can be used when optical shake correction was insufficient, or to process archived video.

For example, the global motion of the entire frame between two images has been estimated by optical flow to stabilize camera shake [Non-Patent Document 1].

Non-Patent Document [1]
M. Irani, B. Rousso, and S. Peleg, Recovery of ego-motion using region alignment, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19-3 (1997), 268-272.

Stabilizing camera shake by image processing reduces to the problem of estimating the global motion of successive images.

Image motion estimation is a fundamental problem in image processing and computer vision. Much research has been done on it, and the approaches divide broadly into region-based methods and feature-based methods.

Among region-based methods, block matching is used in MPEG, the international standard for video compression coding [Non-Patent Document 2], and optical flow [Non-Patent Documents 3, 4] is widely used in computer vision. Both process gray-level pixels directly.

Non-Patent Document [2]
ISO/IEC-11172, Coding of moving pictures and associated audio for digital storage media up to 1.5 Mbits/s, 1993.

Non-Patent Document [3]
B. K. P. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence, 17 (1981), 185-203.

Non-Patent Document [4]
B. D. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, Proceedings of the 1981 DARPA Image Understanding Workshop, April, 1981, 121-130.

On the other hand, some methods, such as the phase correlation method [Non-Patent Document 5], operate in the frequency domain after a frequency transform of the image.

Non-Patent Document [5]
G. A. Thomas, Television motion measurement for DATV and other applications, BBC R&D Reports RD1987/11, 1987.

Feature-based methods use image feature points such as corners, or straight lines in the image. Kanazawa and Kanatani optimally computed the projective transformation between two images from feature points [Non-Patent Document 6]. Matsunaga stabilized swaying video by detecting the horizon in the video, in order to remove the image rotation and vertical motion contained in video shot from a ship at sea [Non-Patent Document 7].

Non-Patent Document [6]
Yasushi Kanazawa and Kenichi Kanatani, Image mosaic generation by stepwise matching (in Japanese), IEICE Transactions D-II, J86-D-II-6 (2003), 816-824.

Non-Patent Document [7]
Chikara Matsunaga, Stabilization of ship-motion video by horizon detection (in Japanese), Proceedings of the 15th Image Sensing Symposium (SSII2009), Yokohama (Pacifico Yokohama).

Most stabilizer processing to date assumes a camera with a CCD sensor, but stabilization for CMOS sensors has also been studied. Ringaby and Forssen [Non-Patent Document 8] stabilized mobile-phone camera video by calibrating the camera's intrinsic parameters in advance and then extracting and tracking feature points in the video. They described the camera motion by a three-dimensional rotation model and estimated its parameters by nonlinear optimization that minimizes the reprojection error. Stabilization was achieved by averaging the estimated parameters. Grundmann et al. [Non-Patent Document 9] divided the frame into blocks, computed a two-dimensional projective transformation between two adjacent images for each block, and corrected motion distortion by superimposing them. However, computing the projective transformations still relies on feature points in the video.

Non-Patent Document [8]
E. Ringaby and P.-E. Forssen, Efficient video rectification and stabilisation for cell-phones, International Journal of Computer Vision, 96-3 (2012), 335-352.

Non-Patent Document [9]
M. Grundmann, V. Kwatra, D. Castro, and I. Essa, Calibration-free rolling shutter removal, Proceedings of IEEE Conference on Computational Photography (ICCP2012), April, 2012.

The present invention performs stabilizer processing on CMOS camera video captured with a rolling shutter mechanism, without using any image features or correspondences. The global motion between two adjacent images is described by a two-dimensional affine transformation, which is estimated by a linear solution method rather than by nonlinear optimization. The translation parameters are calculated by analytically decomposing the affine transformation matrix optimized by iterating the linear solution.

Image simulation experiments confirmed that the motion distortion contained in video from both fixed and moving cameras is corrected and that shake is removed to stabilize the video.

For a moving camera, shake components are removed from the time series of the estimated translation parameters by recursive filtering, so that only the shake in the video is corrected while the camera's movement is preserved. Stabilization that faithfully follows the change can also be achieved at the transition from a moving camera to a fixed camera.

Furthermore, because the present invention is a video-processing method, it can correct recorded, stored video containing motion distortion and shake, captured by a camera using any CMOS sensor.

In conventionally known methods, extracting image feature points and matching them between two images takes processing time and cost. The present invention performs shake correction and distortion correction simultaneously by processing pixel values directly, without using any image features. The translational component and the distortion component can be estimated and corrected together.

The present invention is suitable for video processing in general, and for video surveillance and security in particular. It can serve as a basis for processing that estimates motion information in video and performs motion compensation, such as video stabilizers and frame-rate conversion.

Claims

A video stabilization processing method characterized in that, for video acquired from a CMOS sensor, motion distortion caused by a rolling shutter is described by a two-dimensional four-parameter affine transformation between two adjacent images, the transformation matrix is estimated by a region-based direct method using a gradient constraint, and translation parameters are then calculated by analytical decomposition,
and in that, in the case of a moving camera, a shake component is removed from the time-series change of the estimated translation parameters by a recursive filter that takes level adaptation into account.

The video stabilization processing method according to claim 1,
characterized in that the two-dimensional four-parameter affine transformation between the two images is calculated directly using gradient information of gray values between two pixels.

The video stabilization processing method according to claim 1 or 2,
characterized in that the analytical decomposition performed after estimating the transformation matrix by the region-based direct method using the gradient constraint is an analytical decomposition of the affine transformation matrix optimized by iterative updating using a linear solution method.

The video stabilization processing method according to claim 3,
characterized in that the optimization by iterative updating using the linear solution method is computed by a least-squares method.

The video stabilization processing method according to any one of claims 1 to 4,
characterized in that motion distortion deformation correction and shake correction are performed simultaneously on video acquired from a CMOS sensor.

The video stabilization processing method according to any one of claims 1 to 5,
characterized in that no image features or feature correspondences are used when the transformation matrix is estimated by the region-based direct method using gradient constraints.

The video stabilization processing method according to any one of claims 1 to 6,
characterized in that only the shaking in the video is corrected, while the movement of the field of view that accompanies camera movement is preserved.

A program for causing a computer to execute the video stabilization processing method according to any one of claims 1 to 7.

A computer-readable storage medium storing the program according to claim 8.

A video stabilizer that performs the video stabilization processing method according to any one of claims 1 to 7.

The video stabilizer according to claim 10,
characterized in that the recursive filter is used as a low-pass filter and, by using only current and past data, processing is performed in real time without introducing extra frame delay.

The video stabilization processing method according to any one of claims 1 to 7,
characterized in that the two-dimensional four-parameter affine transformation is obtained by modeling the motion distortion deformation caused by the rolling shutter as a two-dimensional geometric transformation of the image.

The video stabilization processing method according to any one of claims 1 to 7,
characterized in that, in the estimation by the region-based direct method using gradient constraints, the transformation matrix is estimated without feature point extraction and without matching of extracted feature points between the two images.

The video stabilization processing method according to any one of claims 1 to 7,
characterized in that, in removing the shaking component with the recursive filter, unnecessary motion due to shake is separated from intentional camera motion.