JP4688147B2

JP4688147B2 - Moving image processing device

Info

Publication number: JP4688147B2
Application number: JP2005175047A
Authority: JP
Inventors: 晴久加藤; 康弘滝嶋
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2005-06-15
Filing date: 2005-06-15
Publication date: 2011-05-25
Anticipated expiration: 2025-06-15
Also published as: JP2006350618A

Description

The present invention relates to a moving image processing apparatus, and in particular to a moving image processing apparatus that can track a moving object in a moving image efficiently, at high speed, and with high accuracy by using not only the decoded information of an encoded moving image but also its code information.

Methods of tracking a moving object in a moving image fall into two groups: tracking in the pixel domain and tracking in the code domain. Known pixel-domain methods include background subtraction and the KLT method. Background subtraction uses the sum of absolute differences between a background image and the target image to decide whether a moving object is present. The KLT method detects a moving object by solving a constraint equation on the image gradient.

As a code-domain method, an approach has been proposed that uses the code information of the compression-coded moving image itself. It detects moving objects by directly manipulating the parameters and code information obtained during compression coding. In the detection process, only the necessary information is selected and processed, centered on the motion prediction information held by each coding-unit block in the image, and the moving object is tracked according to the detection result.

However, when the moving image is encoded, a pixel-domain tracking method must first decode the code information of the encoded moving image down to the pixel domain and then perform all processing there, so the computational load of decoding and tracking becomes large.

A code-domain tracking method, on the other hand, depends on motion vectors that were chosen to improve compression efficiency. Motion detected from these vectors does not necessarily match the true motion, so tracking is unreliable. For example, in flat image regions the influence of noise is large and the motion vector often differs from the true motion, so motion detected from the motion vectors is particularly unreliable there.

For image understanding tasks such as detecting and tracking moving objects, it is effective to analyze not only the pixels themselves but also features in the frequency domain. However, if the content of an encoded moving image is decoded to the pixel domain and then transformed again into the frequency domain for analysis, the processing cost becomes large.

An object of the present invention is to solve the above problems and to provide a moving image processing apparatus that can track a moving object in a moving image efficiently, at high speed, and with high accuracy by using the code information of the encoded moving image itself as frequency-domain information.

To solve the above problems, the present invention provides a moving image processing apparatus that tracks a moving object in a moving image using the code information of an encoded moving image, basically characterized by comprising: an input unit that inputs a transmitted or stored encoded moving image; a decoding unit that extracts code information from the input encoded moving image and decodes the image; a feature point selection unit that uses the code information to select a plurality of feature points in regions that form a moving object and are rich in pixel variation; a motion estimation unit that uses the position information of the feature points and the decoded moving image to estimate motion in units of small regions that contain a feature point and are smaller than the motion search unit of the encoded moving image, thereby estimating the motion of the moving object; and a matching unit that tracks the moving object based on the motion estimated by the motion estimation unit.

The present invention detects and tracks a moving object using not only the decoded information of the encoded moving image but also its code information. A plurality of feature points in regions that form the moving object and are rich in pixel variation are selected using the code information, and detection and tracking are restricted to those feature points, so the processing cost is lower than with pixel-domain tracking. The processing cost is also lower than with code-domain tracking. Furthermore, the motion of the moving object is estimated by using the position information of the feature points and the decoded moving image to estimate motion in units of small regions that contain a feature point and are smaller than the motion search unit of the encoded moving image, so the moving object can be tracked with high accuracy.

The present invention is described below with reference to the drawings. FIG. 1 is a functional block diagram of an embodiment of a moving image processing apparatus according to the present invention. The moving image processing apparatus of this embodiment includes an input unit 10, a decoding unit 11, a feature point selection unit 12, a motion estimation unit 13, a matching unit 14, and a display unit 15.

The function of each unit is described in turn below. The description assumes that the input moving image is an encoded moving image coded according to the international standard MPEG-1 Video (ISO/IEC 11172-2), but the present invention can also be applied to moving images encoded by other methods and is not limited to a particular kind of encoded moving image.

The encoded moving image is input to the decoding unit 11 via the input unit 10. The input encoded moving image may have been transmitted from a remote location or read from a database inside or outside the apparatus.

The decoding unit 11 decodes the input encoded moving image and also extracts code information, such as coding coefficients and the generated code amount, from it. The extracted code information is sent to the feature point selection unit 12, and the decoded information is sent to the motion estimation unit 13.

The feature point selection unit 12 takes the code information sent from the decoding unit 11 as input and selects feature points that form a moving object and are easy to match between frames, so that the motion of the moving object can be captured accurately.

For example, a flat image region with little variation is unsuitable as a feature point because its motion is hard to estimate accurately during tracking. Image regions with heavy noise or random pixel values are also unsuitable. Conversely, an image region with a clear edge is suitable because its position is easy to identify. Image regions that contain a point where several edges intersect, such as a corner or an end point of an object, are especially suitable. The region of the moving object can be formed as a circumscribed region enclosing a plurality of feature points.

The following describes feature point selection that focuses on edges. FIG. 2 is a functional block diagram showing the functions of the feature point selection unit 12 in detail. The feature point selection unit 12 in this example sequentially performs three determinations, namely texture determination 21, distribution determination 22, and block boundary determination 23, to select feature points from the input code information.

First, texture determination 21 is described. Based on the texture of the coding coefficients, texture determination 21 excludes flat regions and noise regions and takes regions with moderate edges as feature point candidates. By examining the generated code amount and the DCT coefficients, feature point candidates can be selected in the code domain.

For example, the number of variable-length-coded bits (the generated code amount) can be examined, and regions where it falls within a predetermined range can be taken as feature point candidates. Alternatively, the length of the run of zeros that continues up to the highest-order quantized DCT coefficient can be examined, and regions where this length falls within a predetermined range can be taken as candidates. The sum of absolute values of the AC components of the DCT coefficients can also be examined in the same way. All of the information needed for these determinations is stored as code information, so it is easy to obtain.
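As an illustration only, the three texture tests described above might be sketched in Python as follows. The function names, the zigzag-ordered coefficient list, and every numeric range are assumptions made for this sketch and do not come from the specification. The text presents the three tests as alternatives; combining them with a logical AND is likewise a choice made here for compactness.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int]


def trailing_zero_run(zigzag: Sequence[int]) -> int:
    """Length of the run of zeros that extends to the highest-order coefficient."""
    run = 0
    for c in reversed(zigzag):
        if c != 0:
            break
        run += 1
    return run


def ac_abs_sum(zigzag: Sequence[int]) -> int:
    """Sum of absolute values of the AC coefficients (everything except index 0)."""
    return sum(abs(c) for c in zigzag[1:])


def in_range(value: int, bounds: Range) -> bool:
    return bounds[0] <= value <= bounds[1]


def is_texture_candidate(
    bits: int,
    zigzag: Sequence[int],
    bit_range: Range = (40, 400),        # hypothetical range for the generated code amount
    zero_run_range: Range = (20, 60),    # hypothetical range for the trailing zero run
    ac_sum_range: Range = (30, 600),     # hypothetical range for the AC absolute sum
) -> bool:
    """Keep blocks that are neither flat (too little data) nor noisy (too much)."""
    return (
        in_range(bits, bit_range)
        and in_range(trailing_zero_run(zigzag), zero_run_range)
        and in_range(ac_abs_sum(zigzag), ac_sum_range)
    )
```

With these hypothetical ranges, a block holding only a DC coefficient is rejected as flat, a block whose 64 coefficients are all nonzero is rejected as noise, and a block with a few low-frequency AC coefficients is kept as a candidate.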

The thresholds used in texture determination 21 may be changed adaptively according to quantization information such as the quantization parameter and according to coding parameters such as whether the coding coefficients come from a frame DCT or a field DCT.

Next, distribution determination 22 is described. Distribution determination 22 removes candidates with a simple shape from the feature point candidates selected by texture determination 21 and selects regions with moderate complexity as feature points. Such regions can be selected in the code domain based on, for example, the bias in the distribution of the DCT coefficients.

Specifically, the criterion can be that the sum of absolute values of the horizontal AC components of the DCT coefficients, the sum of absolute values of the vertical AC components, and the sum of absolute values of the other low-frequency AC components all exceed a threshold.
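A minimal sketch of this criterion, under stated assumptions: the block is an 8x8 list of lists indexed as block[v][u] with v the vertical and u the horizontal frequency, the "other low-frequency" components are taken to be the coefficients with 1 <= u, v <= 3, and the three thresholds are hypothetical values. None of these specifics appear in the specification.

```python
from typing import List

Block = List[List[int]]  # 8x8 DCT coefficients, block[v][u]


def passes_distribution_test(
    block: Block,
    th_horizontal: int = 30,  # hypothetical threshold
    th_vertical: int = 30,    # hypothetical threshold
    th_other: int = 20,       # hypothetical threshold
) -> bool:
    """True when horizontal, vertical and other low-frequency AC energy are all present."""
    # First row without DC: purely horizontal frequencies.
    horizontal = sum(abs(c) for c in block[0][1:])
    # First column without DC: purely vertical frequencies.
    vertical = sum(abs(block[v][0]) for v in range(1, 8))
    # Assumed definition of the remaining low-frequency AC components.
    other = sum(abs(block[v][u]) for v in range(1, 4) for u in range(1, 4))
    return horizontal > th_horizontal and vertical > th_vertical and other > th_other
```

A block whose energy sits only in the first row, as a single straight edge would produce, fails the test, while a corner-like block with energy in all three groups passes.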

Next, block boundary determination 23 is described. Texture determination 21 and distribution determination 22 operate on individual blocks, so they cannot detect feature points that straddle several blocks. For example, when an edge lies on a block boundary, the AC components of the DCT coefficients of the blocks do not become large, so a determination based on per-block DCT coefficients cannot detect such an edge.

To detect and select feature points that texture determination 21 and distribution determination 22 cannot detect, block boundary determination 23 constructs, from several blocks, a block that straddles them, and selects feature points hierarchically using blocks at a different level from those used by texture determination 21 and distribution determination 22.

For example, let the coefficients of four adjacent sets of DCT coefficients X, Y, Z, and W be x_i, y_i, z_i, and w_i, respectively, where i is the index of a coefficient when the DCT coefficients are arranged in the horizontal and vertical directions. To construct a block that straddles the four blocks in a 1/4 reduced image, Equation (1) gives the reduced block straddling the four blocks. Equation (1) approximates the IDCT (the decoding process) using the DC component and three low-frequency AC components of each of the four blocks.

Next, separating the second-derivative Laplacian filter, a high-pass filter used for edge detection, into a product of vectors gives Equation (2).

Applying the vectors of Equation (2) to the block reduced by Equation (1) expresses the convolution with the Laplacian filter as the matrix product of Equation (3). Collecting the constants of Equation (3) gives Equation (4). With Equation (4), the result of the Laplacian filter can be derived directly from the DCT coefficients.

When the criterion computed by Equation (4) is larger than a threshold, an edge is judged to exist at the block boundary and the location is added to the feature points. In evaluating Equation (4), the amount of computation for the product of a constant matrix and an arbitrary vector (a, b, c, d)^t (t denotes transposition) can be reduced by retaining intermediate results. FIG. 3 shows the computation process and the reduction in computation achieved by retaining intermediate results.

For example, when multiplying the two left-hand matrices to obtain the two elements of the first column of the result matrix, a, b, c, and d are set to x_0, x_8, z_0, and z_8, respectively. The other elements of the result matrix, and each element of the product of the result matrix and the right-hand matrix, can be obtained in the same way by assigning appropriate values to a, b, c, and d.

In the above embodiment, an IDCT approximated by the DC component and its three adjacent low-frequency AC components is combined with the high-pass filter to detect edges at block boundaries in a 1/4 reduced image, but edges can be detected in the same way at other resolutions by approximating the IDCT accordingly. Block boundary determination 23 only needs to establish whether an edge exists, so the IDCT approximation can be chosen freely, and it can be chosen to favor computation speed.
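The idea of block boundary determination can be sketched as follows. This is not a reconstruction of Equations (1) to (4), which are not reproduced in this text. It is a two-step illustration under several assumptions: the 8x8 DCT is orthonormal, each block is reduced to 2x2 pixels by scaling its coefficients 0, 1, 8, and 9 by 1/4 and applying a 2-point inverse DCT along each axis, the four blocks are laid out as X (top left), Y (top right), Z (bottom left), and W (bottom right), the Laplacian is the common 4-neighbor kernel, and the threshold is hypothetical. The specification instead merges the approximate IDCT and the filter into one constant matrix.

```python
from typing import List, Sequence

LowFreq = Sequence[float]  # (coefficient 0 = DC, 1 = horizontal AC, 8 = vertical AC, 9 = diagonal AC)


def reduce_block(coeffs: LowFreq) -> List[List[float]]:
    """Approximate a 2x2 reduced image of one 8x8 block from four low-frequency coefficients."""
    dc, ac_h, ac_v, ac_d = coeffs
    s = 1.0 / 8.0  # 1/4 rescaling times the 1/2 factor of the 2x2 orthonormal IDCT
    return [
        [s * (dc + ac_h + ac_v + ac_d), s * (dc - ac_h + ac_v - ac_d)],
        [s * (dc + ac_h - ac_v - ac_d), s * (dc - ac_h - ac_v + ac_d)],
    ]


def boundary_edge_strength(x: LowFreq, y: LowFreq, z: LowFreq, w: LowFreq) -> float:
    """Largest absolute Laplacian response on the 2x2 block straddling the four blocks."""
    bx, by, bz, bw = (reduce_block(c) for c in (x, y, z, w))
    # 4x4 reduced image; its central 2x2 pixels straddle all four original blocks.
    img = [bx[0] + by[0], bx[1] + by[1], bz[0] + bw[0], bz[1] + bw[1]]
    strength = 0.0
    for r in (1, 2):
        for c in (1, 2):
            lap = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1]
                   - 4.0 * img[r][c])
            strength = max(strength, abs(lap))
    return strength


def has_boundary_edge(x: LowFreq, y: LowFreq, z: LowFreq, w: LowFreq,
                      threshold: float = 20.0) -> bool:
    """Hypothetical threshold test on the response computed above."""
    return boundary_edge_strength(x, y, z, w) > threshold
```

In this sketch, four blocks that each contain only a DC coefficient have no AC energy of their own, yet a brightness step between the left and right pair still produces a strong response on the straddling block, which is the case the per-block tests miss.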

When the moving object to be tracked is known, a region in which feature points are to be selected can be specified manually from the start, and processing can be restricted to that region. The detailed shape of the moving object need not be specified; a rough shape such as a rectangle or a circle is sufficient. This is because several feature points suitable for tracking are selected automatically within the specified region, and the smallest polygon enclosing the selected feature points becomes the initial region for tracking the moving object.
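One way to obtain a smallest enclosing polygon is to compute the convex hull of the selected feature points. The sketch below uses Andrew's monotone chain algorithm; reading "smallest polygon" as the convex hull is an assumption of this example, and the function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Feature points that fall inside the hull do not change the initial region, so only the outermost points determine its shape.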

FIG. 4 is a functional block diagram showing the functions of the motion estimation unit 13 in detail. The motion estimation unit 13 has the functions of motion search 41 and prediction error determination 42. It evaluates the prediction error obtained during the search and estimates the motion of the moving object by individually tracking the feature points that form it. Because MVs (motion vectors) are stored in the MPEG code information, the motion of the moving object could be estimated from a weighted average of the MVs in its region. However, when the moving object is small, a slight increase or decrease in the number of macroblocks (MBs) can change the estimated motion greatly; conversely, when the moving object is large, the estimate can be affected by unreliable vectors such as MVs in flat areas. Because an MV is chosen to minimize the prediction error over an entire MB, it does not accurately represent the motion of a feature point. The present invention therefore re-estimates motion in units of small regions that contain a feature point.

From the standpoint of edge preservation, noise robustness, and computational load, the motion of a feature point is best estimated by finding the vector that minimizes a prediction error evaluation function. A small region containing a feature point is rich in pixel variation, so accurate motion can be estimated even in a region much smaller than an MPEG MB. Making the unit of motion estimation smaller than an MPEG MB not only reduces the amount of computation but also makes the minimization direction of the prediction error evaluation function more likely to coincide with the gradient direction.

Let the coordinates of a feature point be 〈x〉 = (x, y) and let I(〈x〉, t) be the pixel value at coordinates 〈x〉 and time t. The prediction error evaluation function e(〈v〉) for the motion vector 〈v〉 = (v_x, v_y) of the feature point is then defined by Equation (5).

Here, W denotes the small region containing the feature point, and w(〈x〉) denotes a weighting coefficient. Setting w(〈x〉) to 1 everywhere simplifies Equation (5), but the weighting coefficient can also be set freely, for example to the sum of absolute values of the AC components of the DCT coefficients or to the number of nonzero DCT coefficients.

The optimal motion vector 〈v_a〉 that minimizes e(〈v〉) can be obtained by starting from some initial vector 〈v_0〉 and iterating Equations (6) to (9). Here, I_x and I_y are the derivatives of I(〈x〉, t) in the x and y directions, given by Equations (10) and (11), and ΔI_k is the motion prediction error of I(〈x〉, t), given by Equation (12).

The iteration can be terminated when the norm of Δ〈v_k〉 falls below a predetermined threshold or when the number of iterations reaches a predetermined count. Using the MPEG MV as the initial vector 〈v_0〉 reduces the number of iterations needed to converge.
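Equations (5) to (12) are not reproduced in this text, so the following is only a sketch of an iteration of this general kind, not the specification's formulas. It assumes a squared-difference error summed over a square window with all weights set to 1, spatial derivatives taken as central differences on the earlier frame, a Gauss-Newton update of the vector, and bilinear interpolation of the later frame. The function names, window size, iteration limit, and tolerance are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def bilinear(img: Image, x: float, y: float) -> float:
    """Sample img at a fractional position, clamped to the image area."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.001)
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy


def refine_motion(prev: Image, curr: Image, cx: int, cy: int,
                  v0: Tuple[float, float] = (0.0, 0.0), half: int = 3,
                  max_iter: int = 20, eps: float = 1e-3) -> Tuple[float, float]:
    """Iteratively refine the motion of the window centred on (cx, cy), starting from v0."""
    vx, vy = v0  # v0 could be the MPEG motion vector of the enclosing macroblock
    samples = []
    gxx = gxy = gyy = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # derivative in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # derivative in y
            samples.append((x, y, ix, iy))
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        return (vx, vy)  # flat or single-edge window: the update is not well defined
    for _ in range(max_iter):
        bx = by = 0.0
        for x, y, ix, iy in samples:
            diff = prev[y][x] - bilinear(curr, x + vx, y + vy)  # prediction error
            bx += diff * ix
            by += diff * iy
        dvx = (gyy * bx - gxy * by) / det
        dvy = (gxx * by - gxy * bx) / det
        vx += dvx
        vy += dvy
        if math.hypot(dvx, dvy) < eps:  # norm of the update below the threshold
            break
    return (vx, vy)
```

The two stopping rules in the loop correspond to the termination conditions described above, and passing a nonzero v0 corresponds to seeding the search with the stored MV.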

In flat image regions and the like, motion vectors are affected by noise and the true motion is hard to capture, but such regions have already been excluded by the feature point selection unit 12. Even so, the true motion can be hard to capture at some of the feature points selected by the feature point selection unit 12, so it is preferable for the motion estimation unit 13 to exclude the corresponding image regions. For example, periodic edge regions with large DCT coefficients cannot be fully excluded by the feature point selection unit 12, yet they tend to induce erroneous motion vectors. To prevent this, the feature points (edge regions) selected by the feature point selection unit 12 are not all accepted as feature points; instead, the edge regions are classified based on the values of the error evaluation function obtained during the motion search, which increases their reliability as feature points.

As described above, the motion vector is chosen to minimize a prediction error evaluation function defined as, for example, the sum of squared or absolute pixel differences. In a periodic edge region, however, the function has local minima at several locations spaced at regular intervals. In a simple edge region, the function takes similar values along the direction of the edge.

To exclude periodic and simple edge regions, which tend to induce such erroneous motion vectors, the feature points are classified in the pixel domain using motion vector information. The classification is done by examining the distribution of the prediction error, for example by comparing the value of the prediction error evaluation function at the location indicated by the motion vector with its values at surrounding locations.

If the comparison shows that the value is sufficiently small only at the location indicated by the motion vector, the point is classified as a feature point with suitable complexity. If the value is similar at several locations, the point is classified as unsuitable. The comparison can use, for example, the location indicated by the motion vector and its eight adjacent neighbors, or neighboring locations several pixels away.
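A minimal sketch of this comparison, assuming the sum of absolute differences as the error measure, an integer motion vector, the eight adjacent displacements as the comparison set, and a hypothetical ratio that defines "sufficiently small". The function names and parameter values are illustrative only.

```python
from typing import List, Tuple

Image = List[List[int]]


def sad(prev: Image, curr: Image, cx: int, cy: int, dx: int, dy: int, half: int = 2) -> int:
    """Sum of absolute differences between the window in prev and the displaced window in curr."""
    total = 0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            total += abs(curr[y + dy][x + dx] - prev[y][x])
    return total


def is_reliable_feature(prev: Image, curr: Image, cx: int, cy: int,
                        mv: Tuple[int, int], half: int = 2, ratio: float = 0.5) -> bool:
    """Accept the point only if the error at mv is clearly below all eight neighbours."""
    e0 = sad(prev, curr, cx, cy, mv[0], mv[1], half)
    neighbours = [
        sad(prev, curr, cx, cy, mv[0] + dx, mv[1] + dy, half)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]
    # Along a straight edge some neighbour matches as well as mv does,
    # so min(neighbours) is close to e0 and the test fails.
    return e0 < ratio * min(neighbours)
```

In this sketch a corner yields a clear, isolated minimum and is accepted, while a straight edge yields an equally good match when shifted along the edge and is rejected. Detecting periodic patterns would need comparison points several pixels away, as the text notes.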

The matching unit 14 tracks the moving object based on the motion of the feature points estimated by the motion estimation unit 13. If the matching unit 14 is made to control a surveillance camera, a surveillance system that can detect and track moving objects can be built. Specifically, the matching unit 14 uses the motion vectors 〈v〉 of the feature points to associate feature points from frame to frame, forms a region enclosing the feature points to represent the moving object, and tracks its trajectory. Feature points can be associated between frames using the relative positions of the feature points and the spatial distance to the position predicted from a feature point's past position and its motion vector 〈v〉.

Taking additional information such as the color of each feature point into account during association further reduces the chance that a mistakenly selected feature point is wrongly associated.
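The association step might be sketched as below. It uses only the distance between each predicted position and the feature points found in the current frame, with a greedy nearest-neighbor assignment and a hypothetical distance limit; the relative-position and color cues mentioned above are left out. An axis-aligned bounding box stands in for the enclosing region. All names and values are assumptions of this sketch.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def match_features(prev_points: List[Point], motions: List[Point],
                   curr_points: List[Point], max_dist: float = 3.0) -> Dict[int, int]:
    """Map index in prev_points to index in curr_points using predicted positions."""
    matches: Dict[int, int] = {}
    used = set()
    for i, ((px, py), (vx, vy)) in enumerate(zip(prev_points, motions)):
        pred_x, pred_y = px + vx, py + vy  # past position plus motion vector
        best, best_d = None, max_dist
        for j, (qx, qy) in enumerate(curr_points):
            if j in used:
                continue
            d = math.hypot(qx - pred_x, qy - pred_y)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            matches[i] = best
            used.add(best)
    return matches


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Axis-aligned region enclosing the points: (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

Applying bounding_box to the matched points of each frame gives one region per frame, and the sequence of regions is the tracked trajectory.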

The display unit 15 displays the trajectory of the moving object tracked by the matching unit 14 superimposed on the decoded moving image. This display makes it easy to see how the moving object moved.

FIG. 1 is a functional block diagram of an embodiment of a moving image processing apparatus according to the present invention.
FIG. 2 is a functional block diagram showing the functions of the feature point selection unit.
FIG. 3 is an explanatory diagram showing the computation process in the boundary determination and the reduction in computation achieved by retaining intermediate results.
FIG. 4 is a functional block diagram showing the functions of the motion estimation unit.

Explanation of symbols

10: input unit, 11: decoding unit, 12: feature point selection unit, 13: motion estimation unit, 14: matching unit, 15: display unit, 21: texture determination, 22: distribution determination, 23: boundary determination, 41: motion search, 42: prediction error determination

Claims

1. A moving image processing apparatus for tracking a moving object in a moving image using code information of an encoded moving image, comprising:
an input unit that inputs a transmitted or stored encoded moving image;
a decoding unit that extracts code information from the input encoded moving image and decodes the image;
a feature point selection unit that uses the code information to select a plurality of feature points in regions that form a moving object and are rich in pixel variation;
a motion estimation unit that uses the position information of the feature points and the decoded moving image to estimate motion in units of small regions that contain a feature point and are smaller than the motion search unit of the encoded moving image, thereby estimating the motion of the moving object; and
a matching unit that tracks the moving object based on the motion of the moving object estimated by the motion estimation unit.

2. The moving image processing apparatus according to claim 1, further comprising a display unit that displays the trajectory of the moving object tracked by the matching unit superimposed on the decoded moving image.

3. The moving image processing apparatus according to claim 1, wherein the input unit allows a moving object region in the moving image to be specified, and the feature point selection unit initially selects feature points within that moving object region.

4. The moving image processing apparatus according to claim 1, wherein the decoding unit includes means for extracting the coding coefficients and the generated code amount of the encoded moving image as code information.

5. The moving image processing apparatus according to claim 1, wherein the feature point selection unit includes means for associating feature points from frame to frame and setting a circumscribed region enclosing the feature points as the moving object region.

6. The moving image processing apparatus according to claim 4, wherein the feature point selection unit includes means for selecting feature points by evaluating the texture and distribution of the coding coefficients and the block boundaries.

7. A moving image processing apparatus, wherein the feature point selection unit includes means for selecting feature point candidates by excluding flat regions and noise regions in the code domain based on the texture of the coding coefficients.

8. The moving image processing apparatus according to claim 7, wherein the feature point selection unit determines flat regions and noise regions in the code domain using the bit amount of the generated code, the number of consecutive zeros up to the highest-order coefficient of the coding coefficients, or the sum of absolute values of the AC components of the coding coefficients.

The moving image processing apparatus according to claim 7, wherein the feature point selection unit adaptively changes a threshold value for determining a flat region and a noise region in a code region.

The moving image processing apparatus according to claim 9, wherein the threshold value is a numerical value depending on quantization information for each block.
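
The flat-region and noise-region test with a quantization-dependent threshold can be illustrated with a minimal sketch. It uses only the AC absolute-sum criterion of the three named above. The function names, the base thresholds, the inverse scaling with the quantizer scale, and the reading of a "noise region" as a block with a very large AC sum are all assumptions made for illustration; the claims do not specify them.

```python
def ac_abs_sum(block):
    """Sum of absolute values of the AC coefficients of an 8x8 block.

    `block` is a list of 8 rows of 8 quantized DCT coefficients;
    block[0][0] is the DC coefficient and is excluded.
    """
    total = sum(abs(c) for row in block for c in row)
    return total - abs(block[0][0])


def classify_block(block, quant_scale, flat_base=4.0, noise_base=400.0):
    """Label a block 'flat', 'noise', or 'candidate'.

    Assumptions (not from the claims): both thresholds shrink in inverse
    proportion to the quantizer scale, because coarser quantization leaves
    smaller quantized coefficients; a very large AC sum is read as noise;
    the base values are illustrative only.
    """
    flat_thr = flat_base / quant_scale
    noise_thr = noise_base / quant_scale
    s = ac_abs_sum(block)
    if s < flat_thr:
        return "flat"
    if s > noise_thr:
        return "noise"
    return "candidate"
```

With these illustrative values, a block whose AC sum is 3 is treated as flat at quantizer scale 1 but remains a feature point candidate at scale 8, which is the kind of per-block adaptation the threshold is meant to provide.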

The moving image processing apparatus according to claim 6, wherein the feature point selection unit includes means for selecting feature points by excluding regions having a simple shape in the code region based on the distribution of the coding coefficients.

The moving image processing apparatus according to claim 11, wherein the feature point selection unit determines a region having a simple shape by using a bias in a distribution of encoding coefficients.

The moving image processing apparatus according to claim 12, wherein the feature point selection unit determines a region having a simple shape using differences among the sum of absolute values of the AC components in the horizontal direction, the sum of absolute values of the AC components in the vertical direction, and the sum of absolute values of the other low-frequency AC components.
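
One plausible reading of the simple-shape test is sketched below: the three absolute sums are computed from the low-frequency part of an 8x8 coefficient block, and a block is judged to have a simple shape when one direction dominates. The function names, the cut-off for "low-frequency", and the dominance ratio are hypothetical choices, not values taken from the claims.

```python
def directional_ac_sums(block, low=4):
    """Return (horizontal, vertical, other) absolute sums of low-frequency AC terms.

    Assumption: 'low-frequency' means indices below `low` in each direction.
    With block[v][u], the first row (v = 0) holds purely horizontal
    frequencies and the first column (u = 0) purely vertical ones.
    """
    horiz = sum(abs(block[0][u]) for u in range(1, low))
    vert = sum(abs(block[v][0]) for v in range(1, low))
    other = sum(abs(block[v][u]) for v in range(1, low) for u in range(1, low))
    return horiz, vert, other


def is_simple_shape(block, ratio=4.0):
    """True when the AC energy is biased toward a single direction.

    The stronger of the two directional sums must be at least `ratio`
    times the weaker directional sum plus the remaining low-frequency sum.
    A block with no directional AC energy is not reported as simple.
    """
    horiz, vert, other = directional_ac_sums(block)
    strongest = max(horiz, vert)
    rest = min(horiz, vert) + other
    return strongest > 0 and strongest >= ratio * rest
```

A block containing a single straight edge concentrates its energy in one of the two directional sums, while a corner-like block spreads energy across all three, so only the former is excluded by this check.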

The moving image processing apparatus according to claim 6, wherein the feature point selection unit includes means for selecting, in the code region, feature points in blocks that straddle a plurality of block boundaries when evaluating the block boundaries.

15. The moving image processing apparatus according to 1, wherein, when evaluating the block boundaries, the feature point selection unit calculates a determination criterion using a constant matrix obtained in advance by combining an approximate decoding process with a high-pass filter.

The moving image processing apparatus according to claim 15, wherein the feature point selection unit includes means for reducing the number of computations by holding calculation results during the determination criterion calculation process.
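
The idea of a constant matrix that combines approximate decoding with a high-pass filter can be shown with a small sketch. Everything concrete here is an assumption for illustration: a one-dimensional 8-point inverse DCT truncated to its first four coefficients stands in for the approximate decoding, the kernel [-1, 2, -1] stands in for the high-pass filter, and the sketch stays inside a single block rather than straddling block boundaries.

```python
import math

N = 8  # block length; a 1-D, 8-point transform is assumed for brevity


def idct_matrix(keep=4):
    """N x keep matrix mapping the first `keep` DCT coefficients to N pixels.

    Dropping the higher coefficients is the 'approximate decoding' assumed here.
    """
    rows = []
    for x in range(N):
        row = []
        for u in range(keep):
            scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            row.append(scale * math.cos((2 * x + 1) * u * math.pi / (2 * N)))
        rows.append(row)
    return rows


def highpass_matrix():
    """(N-2) x N matrix applying the kernel [-1, 2, -1] at interior pixels."""
    rows = []
    for x in range(1, N - 1):
        row = [0.0] * N
        row[x - 1], row[x], row[x + 1] = -1.0, 2.0, -1.0
        rows.append(row)
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


# Computed once and reused for every block: filter composed with decoding.
COMBINED = matmul(highpass_matrix(), idct_matrix())


def edge_strength(coeffs):
    """Largest absolute high-pass response, computed directly from coefficients."""
    return max(abs(r) for r in matvec(COMBINED, coeffs))
```

Because `COMBINED` is fixed, each block costs one small matrix-vector product instead of a decode followed by a filter, and holding it at module level is one simple way of keeping intermediate results to reduce computation.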

The moving image processing apparatus according to claim 1, wherein the motion estimation unit includes motion search means for estimating the motion of a feature point from the position information of the feature point, and determination means for reselecting feature points based on the prediction error obtained during the search using the decoded moving image.

18. The moving image processing apparatus according to claim 17, wherein the motion search means searches for a motion vector that minimizes a prediction error evaluation function in units of small regions including feature points.

The moving image processing apparatus according to claim 18, wherein the prediction error evaluation function includes a weighting factor arbitrarily set for a feature point.

20. The moving image processing apparatus according to claim 19, wherein the weighting factor can be the sum of absolute values of the AC components of the coding coefficients or the number of non-zero coefficients.
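
A weighted motion search of this kind might look like the following sketch. It assumes, for illustration only, that the evaluation function is a weighted sum of absolute differences over small windows centered on each feature point, that all points in the group share one displacement, and that an exhaustive search over a small radius is acceptable. Frames are plain lists of rows, and the function names are hypothetical.

```python
def window_sad(prev, curr, x, y, dx, dy, half):
    """Sum of absolute differences between the window centered on (x, y) in
    `prev` and the window displaced by (dx, dy) in `curr`."""
    return sum(
        abs(curr[y + dy + j][x + dx + i] - prev[y + j][x + i])
        for j in range(-half, half + 1)
        for i in range(-half, half + 1)
    )


def search_motion(prev, curr, points, weights, half=1, radius=2):
    """Find the displacement minimizing the weighted error over all points.

    `points` are (x, y) feature positions in `prev`; `weights` could be, for
    example, each point's AC absolute sum or non-zero coefficient count.
    Returns the best (dx, dy) and the error for every tested displacement.
    Windows must stay inside both frames; no bounds checking is done here.
    """
    errors = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            errors[(dx, dy)] = sum(
                w * window_sad(prev, curr, x, y, dx, dy, half)
                for (x, y), w in zip(points, weights)
            )
    best = min(errors, key=errors.get)
    return best, errors
```

Weighting by the AC absolute sum or the non-zero coefficient count lets strongly textured feature points influence the shared motion vector more than weakly textured ones.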

The moving image processing apparatus according to claim 17, wherein the determination means includes means for selecting feature points using the distribution of prediction errors.

The moving image processing apparatus according to claim 21, wherein the determination means includes means for selecting feature points for which the prediction error at the location indicated by the motion vector determined by the motion search means is sufficiently small and the prediction errors in its neighborhood are relatively large.
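
The selection rule based on the distribution of prediction errors can be sketched as a check on the error surface around the chosen motion vector. The input is assumed to be a mapping from each tested displacement to its prediction error; the function name, the absolute limit, and the ratio are illustrative values that the claims do not specify.

```python
def is_reliable(errors, best, max_best=10.0, min_ratio=3.0):
    """Keep a feature point only if its error surface has a sharp minimum.

    `errors` maps (dx, dy) to prediction error and `best` is the chosen vector.
    The point passes when the error at `best` is at most `max_best` and the
    mean error of the available surrounding displacements is at least
    `min_ratio` times the error at `best` (floored at 1.0 so that a zero
    error does not make the test trivially true).
    """
    bx, by = best
    neighbors = [
        errors[(bx + i, by + j)]
        for j in (-1, 0, 1)
        for i in (-1, 0, 1)
        if (i, j) != (0, 0) and (bx + i, by + j) in errors
    ]
    if not neighbors or errors[best] > max_best:
        return False
    mean_neighbor = sum(neighbors) / len(neighbors)
    return mean_neighbor >= min_ratio * max(errors[best], 1.0)
```

A point in a flat or repetitive area yields similar errors at many displacements and fails the ratio test, while a distinctive point yields a single deep minimum and is kept.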

The moving image processing apparatus according to claim 1, wherein the matching unit includes means for tracking the trajectory of the moving object by associating feature points between frames using at least one of the relative positional relationship among a plurality of feature points, the spatial distance between past feature points and predicted feature points, and color information for each feature point.
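
As a rough illustration of associating feature points between frames, the sketch below pairs predicted points with newly detected candidates using two of the three cues named above, spatial distance and color. The relative positional relationship is not modeled. The data layout, the cost mix, the greedy pairing strategy, and the cost limit are all assumptions made for this example.

```python
import math


def match_cost(pred, cand, color_weight=0.5):
    """Cost of pairing a predicted point with a detected candidate.

    Each point is a dict with 'pos' (x, y) and 'color' (a tuple of channel
    values). The cost adds the spatial distance to a weighted mean absolute
    color difference; the mixing weight is an illustrative assumption.
    """
    dist = math.dist(pred["pos"], cand["pos"])
    diffs = [abs(a - b) for a, b in zip(pred["color"], cand["color"])]
    return dist + color_weight * sum(diffs) / len(diffs)


def associate(predicted, candidates, max_cost=20.0):
    """Greedy one-to-one association, cheapest pairs first.

    Returns {predicted_index: candidate_index}; pairs whose cost exceeds
    `max_cost` are left unmatched.
    """
    pairs = sorted(
        (match_cost(p, c), pi, ci)
        for pi, p in enumerate(predicted)
        for ci, c in enumerate(candidates)
    )
    matches, used = {}, set()
    for cost, pi, ci in pairs:
        if cost > max_cost:
            break
        if pi not in matches and ci not in used:
            matches[pi] = ci
            used.add(ci)
    return matches
```

Chaining the matched positions frame by frame gives the trajectory of each feature point, and points left unmatched can be treated as lost or newly appeared.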