JP7652927B2 - Schemes for adjusting adaptive resolution for motion vector difference

Info

Publication number: JP7652927B2
Application number: JP2023560903A
Authority: JP
Inventors: Liang Zhao; Xin Zhao; Shan Liu
Original assignee: Tencent America LLC
Current assignee: Tencent America LLC
Priority date: 2022-01-24
Filing date: 2022-06-03
Publication date: 2025-03-27
Anticipated expiration: 2042-06-03
Also published as: JP2024513066A; JP2025094027A; WO2023140884A1; KR20230145144A; CN116830572B; CA3213660A1; EP4470212A4; CN116830572A; EP4470212A1; CN120343239A; AU2022434642A1

Description

Cross-reference to related applications
This application claims the benefit of priority to U.S. Non-Provisional Application No. 17/824,193, entitled "Schemes for Adjusting Adaptive Resolution for Motion Vector Difference," filed on May 25, 2022, and claims the benefit of priority to U.S. Provisional Application No. 63/302,518, entitled "Further Improvement for Adaptive MVD Resolution," filed on January 24, 2022. These prior applications are incorporated herein by reference in their entireties.

The present disclosure relates generally to video coding, and more particularly to methods and systems that provide schemes for setting different values of allowable motion vectors when adaptive resolution is implemented for motion vector differences.

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing of this application, are neither expressly nor impliedly admitted as prior art against the present disclosure.

Video coding and decoding can be performed using inter-picture prediction with motion compensation. Uncompressed digital video can include a series of pictures, each picture having a spatial dimension of, for example, 1920×1080 luma samples and associated fully sampled or subsampled chroma samples. The series of pictures can have a fixed or variable picture rate (also called frame rate) of, for example, 60 pictures per second or 60 frames per second. Uncompressed video has specific bitrate requirements for streaming or data processing. For example, video with a pixel resolution of 1920×1080, a frame rate of 60 frames per second, and 4:2:0 chroma subsampling at 8 bits per pixel per color channel requires close to 1.5 Gbit/s of bandwidth. One hour of such video requires more than 600 GByte of storage space.
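The bandwidth and storage figures above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name and the chroma_factor parameter are illustrative and not part of the disclosure.

```python
def raw_video_bitrate(width, height, fps, bit_depth, chroma_factor=0.5):
    """Return the uncompressed bitrate in bits per second.

    chroma_factor is the number of chroma samples per luma sample, summed
    over both chroma planes: 0.5 for 4:2:0 (two planes, each at one quarter
    of the luma resolution).
    """
    luma_samples = width * height
    samples_per_frame = luma_samples * (1 + chroma_factor)
    return samples_per_frame * bit_depth * fps


bits_per_second = raw_video_bitrate(1920, 1080, 60, 8)
gbit_per_second = bits_per_second / 1e9
gbyte_per_hour = bits_per_second * 3600 / 8 / 1e9
print(f"{gbit_per_second:.2f} Gbit/s, {gbyte_per_hour:.0f} GByte/hour")
# prints "1.49 Gbit/s, 672 GByte/hour"
```

The result agrees with the text: just under 1.5 Gbit/s, and more than 600 GByte for one hour.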

One purpose of video coding and decoding can be the reduction of redundancy in the uncompressed input video signal through compression. Compression can help reduce the aforementioned bandwidth and/or storage space requirements, in some cases by two orders of magnitude or more. Both lossless and lossy compression, as well as combinations thereof, can be employed. Lossless compression refers to techniques in which an exact copy of the original signal can be reconstructed from the compressed original signal via a decoding process. Lossy compression refers to coding/decoding processes in which the original video information is not fully retained during coding and is not fully recoverable during decoding. When lossy compression is used, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough to make the reconstructed signal useful for the intended application, albeit with some information loss. For video, lossy compression is widely employed in many applications. The amount of tolerable distortion depends on the application. For example, users of certain consumer video streaming applications may tolerate higher distortion than users of cinematic or television broadcasting applications. The compression ratio achievable by a particular coding algorithm can be selected or adjusted to reflect various distortion tolerances: in general, a higher tolerable distortion allows for coding algorithms that yield higher losses and higher compression ratios.

Video encoders and decoders can utilize techniques from several broad categories and steps, including, for example, motion compensation, Fourier transform, quantization, and entropy coding.

Video codec technologies can include techniques known as intra coding. In intra coding, sample values are represented without reference to samples or other data from previously reconstructed reference pictures. In some video codecs, a picture is spatially subdivided into blocks of samples. When all blocks of samples are coded in intra mode, that picture can be referred to as an intra picture. Intra pictures and their derivatives, such as Independent Decoder Refresh pictures, can be used to reset the decoder state and can therefore be used as the first picture in a coded video bitstream and a video session, or as a still image. The samples of a block after intra prediction can then be subjected to a transform into the frequency domain, and the transform coefficients so generated can be quantized before entropy coding. Intra prediction represents a technique that minimizes sample values in the pre-transform domain. In some cases, the smaller the DC value after the transform and the smaller the AC coefficients, the fewer bits are required at a given quantization step size to represent the block after entropy coding.

Conventional intra coding, such as that known from MPEG-2 generation coding technologies, does not use intra prediction. However, some newer video compression technologies include techniques that attempt to code or decode a block based on, for example, surrounding sample data and/or metadata that are obtained during the encoding and/or decoding of spatially neighboring blocks and that precede, in decoding order, the block of data being intra coded or decoded. Such techniques are henceforth called "intra prediction" techniques. Note that, in at least some cases, intra prediction uses reference data only from the current picture under reconstruction and not from other reference pictures.

There can be many different forms of intra prediction. When more than one such technique is available in a given video coding technology, the technique in use can be referred to as an intra prediction mode. One or more intra prediction modes may be provided in a particular codec. In certain cases, modes can have submodes and/or may be associated with various parameters, and the mode/submode information and intra coding parameters for blocks of video can be coded individually or collectively included in mode codewords. Which codeword is used for a given mode, submode, and/or parameter combination can affect the coding efficiency gain through intra prediction, and so can the entropy coding technology used to translate the codewords into a bitstream.

Certain modes of intra prediction were introduced with H.264, refined in H.265, and further refined in newer coding technologies such as the Joint Exploration Model (JEM), Versatile Video Coding (VVC), and the Benchmark Set (BMS). In general, for intra prediction, a predictor block can be formed using neighboring sample values that have become available. For example, available values of a particular set of neighboring samples along certain directions and/or lines may be copied into the predictor block. A reference to the direction in use can be coded in the bitstream or may itself be predicted.

Referring to FIG. 1A, depicted in the lower right is a subset of nine predictor directions drawn from the 33 possible intra predictor directions of H.265 (corresponding to the 33 angular modes of the 35 intra modes specified in H.265). The point where the arrows converge (101) represents the sample being predicted. The arrows represent the directions from which neighboring samples are used to predict the sample at 101. For example, arrow (102) indicates that sample (101) is predicted from one or more neighboring samples to the upper right, at a 45-degree angle from the horizontal direction. Similarly, arrow (103) indicates that sample (101) is predicted from one or more neighboring samples to the lower left of sample (101), at a 22.5-degree angle from the horizontal direction.

Still referring to FIG. 1A, depicted on the top left is a square block (104) of 4×4 samples (indicated by a bold dashed line). The square block (104) includes 16 samples, each labeled with an "S", its position in the Y dimension (e.g., row index), and its position in the X dimension (e.g., column index). For example, sample S21 is the second sample in the Y dimension (from the top) and the first sample in the X dimension (from the left). Similarly, sample S44 is the fourth sample in block (104) in both the Y and X dimensions. As the block is 4×4 samples in size, S44 is at the bottom right. Exemplary reference samples that follow a similar numbering scheme are also shown. A reference sample is labeled with an "R", its Y position (e.g., row index), and its X position (e.g., column index) relative to block (104). In both H.264 and H.265, prediction samples adjacent to the block under reconstruction are used.

Intra picture prediction of block 104 may begin by copying reference sample values from the neighboring samples according to a signaled prediction direction. For example, assume that the coded video bitstream includes signaling that, for this block 104, indicates the prediction direction of arrow (102); that is, samples are predicted from one or more prediction samples to the upper right, at a 45-degree angle from the horizontal direction. In that case, samples S41, S32, S23, and S14 are predicted from the same reference sample R05. Sample S44 is then predicted from reference sample R08.
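The copy rule in this example can be written down compactly: with the labeling of FIG. 1A, sample S at (row, column) takes the value of the reference sample in row 0 whose X position is row + column. The sketch below illustrates only this 45-degree case; the function names and the use of label strings as stand-ins for sample values are assumptions made for illustration.

```python
def predict_45_up_right(top_refs, size=4):
    """Fill a size x size block from the reference row above it.

    top_refs maps the X position of a reference sample in row 0
    (R01, R02, ...) to its value. Sample S[row][col] (1-based) copies
    reference R0(row + col), which lies up and to the right at 45 degrees.
    """
    return [[top_refs[row + col] for col in range(1, size + 1)]
            for row in range(1, size + 1)]


def sample(block, row, col):
    """Look up a predicted sample using the 1-based S(row)(col) labels."""
    return block[row - 1][col - 1]


# Hypothetical reference row: each label string stands in for a sample value.
top_refs = {x: f"R0{x}" for x in range(1, 9)}
block = predict_45_up_right(top_refs)

print([sample(block, r, c) for r, c in [(4, 1), (3, 2), (2, 3), (1, 4)]])
# prints ['R05', 'R05', 'R05', 'R05']
print(sample(block, 4, 4))
# prints R08
```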

In certain cases, the values of multiple reference samples may be combined, for example through interpolation, in order to calculate a reference sample, especially when the direction is not evenly divisible by 45 degrees.
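As an illustration of such a combination, the sketch below blends two adjacent reference samples with integer weights and rounding. The 1/32 weight granularity (shift=5) and the function name are assumptions chosen for the example; the disclosure does not fix a particular interpolation filter.

```python
def interpolate_reference(ref_a, ref_b, frac, shift=5):
    """Blend two adjacent reference samples with integer arithmetic.

    frac is the position between ref_a and ref_b in units of 1 / 2**shift
    (0 selects ref_a exactly). Adding half of the scale before the shift
    rounds the result to the nearest integer.
    """
    scale = 1 << shift
    if not 0 <= frac < scale:
        raise ValueError("frac must lie in [0, 2**shift)")
    return ((scale - frac) * ref_a + frac * ref_b + (scale >> 1)) >> shift


print(interpolate_reference(100, 132, 0))   # prints 100
print(interpolate_reference(100, 132, 8))   # prints 108
print(interpolate_reference(100, 132, 16))  # prints 116
```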

The number of possible directions has increased as video coding technology has continued to develop. In H.264 (2003), for example, nine different directions are available for intra prediction. That increased to 33 in H.265 (2013), and JEM/VVC/BMS, at the time of this disclosure, can support up to 65 directions. Experimental studies have been conducted to help identify the most suitable intra prediction directions, and certain entropy coding techniques may be used to encode those most suitable directions in a small number of bits, accepting a certain bit penalty for directions. Further, the directions themselves can sometimes be predicted from the neighboring directions used in the intra prediction of already decoded neighboring blocks.

FIG. 1B shows a schematic (180) that depicts 65 intra prediction directions according to JEM, to illustrate the increasing number of prediction directions in the various encoding technologies developed over time.

The manner in which bits representing intra prediction directions in the coded video bitstream are mapped to prediction directions may vary from one video coding technology to another, and can range, for example, from simple direct mappings of prediction direction to intra prediction mode, to codewords, to complex adaptive schemes involving most probable modes, and similar techniques. In all cases, however, there can be certain intra prediction directions that are statistically less likely to occur in video content than certain other directions. Because the goal of video compression is the reduction of redundancy, in a well-designed video coding technology those less likely directions may be represented by a larger number of bits than the more likely directions.
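The link between likelihood and bit cost can be made concrete with the ideal code length of a symbol, -log2(p) bits for probability p. The sketch below is illustrative only: the direction names and probabilities are hypothetical, and a real entropy coder approximates these lengths rather than achieving them exactly.

```python
import math


def ideal_code_length(probability):
    """Information content, in bits, of a symbol with the given probability."""
    return -math.log2(probability)


# Hypothetical probabilities for three intra prediction directions.
direction_probability = {"vertical": 0.5, "horizontal": 0.25, "diagonal_45": 0.125}
for name, p in direction_probability.items():
    print(f"{name}: {ideal_code_length(p):.0f} bits")
# prints "vertical: 1 bits", "horizontal: 2 bits", "diagonal_45: 3 bits"
```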

Inter picture prediction, or inter prediction, may be based on motion compensation. In motion compensation, sample data from a previously reconstructed picture or part thereof (a reference picture), after being spatially shifted in a direction indicated by a motion vector (MV henceforth), may be used to predict a newly reconstructed picture or picture part (e.g., a block). In some cases, the reference picture can be the same as the picture currently under reconstruction. MVs may have two dimensions, X and Y, or three dimensions, with the third dimension being an indication of the reference picture in use (akin to a time dimension).
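A minimal sketch of this spatial shift for integer-sample motion vectors follows. The function name, the picture layout as a list of rows, and the decision to reject out-of-picture references are assumptions for illustration; fractional MVs and boundary padding are deliberately left out.

```python
def motion_compensate(reference, top, left, height, width, mv):
    """Predict a block by copying a displaced block from the reference picture.

    reference is a 2D list of samples, (top, left) is the block position in
    the current picture, and mv = (mv_x, mv_y) is an integer-sample motion
    vector. Fractional vectors and picture-boundary padding are not handled.
    """
    mv_x, mv_y = mv
    src_top, src_left = top + mv_y, left + mv_x
    if src_top < 0 or src_left < 0:
        raise ValueError("displaced block starts outside the reference picture")
    if src_top + height > len(reference) or src_left + width > len(reference[0]):
        raise ValueError("displaced block ends outside the reference picture")
    return [row[src_left:src_left + width]
            for row in reference[src_top:src_top + height]]


# Toy 6x6 reference picture whose sample value encodes its position (10*y + x).
reference = [[10 * y + x for x in range(6)] for y in range(6)]
prediction = motion_compensate(reference, top=2, left=2, height=2, width=2, mv=(1, -1))
print(prediction)
# prints [[13, 14], [23, 24]]
```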

In some video compression techniques, a current MV applicable to a certain area of sample data can be predicted from other MVs, for example from MVs that relate to other areas of sample data spatially adjacent to the area under reconstruction and that precede the current MV in decoding order. Doing so can substantially reduce the overall amount of data required for coding the MVs by relying on the removal of redundancy in correlated MVs, thereby increasing compression efficiency. MV prediction can work effectively because, for example, when coding an input video signal derived from a camera (known as natural video), there is a statistical likelihood that areas larger than the area to which a single MV applies move in a similar direction in the video sequence and can therefore, in some cases, be predicted using a similar motion vector derived from the MVs of neighboring areas. As a result, the actual MV for a given area is similar or identical to the MV predicted from the surrounding MVs. Such an MV can in turn be represented, after entropy coding, in a smaller number of bits than would be used if the MV were coded directly rather than predicted from neighboring MVs. In some cases, MV prediction can be an example of lossless compression of a signal (namely, the MVs) derived from the original signal (namely, the sample stream). In other cases, MV prediction itself can be lossy, for example because of rounding errors when calculating a predictor from several surrounding MVs.
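The saving comes from coding only the difference between the actual MV and its predictor. The sketch below uses a component-wise median of three neighboring MVs as the predictor; that choice and all function names are assumptions for illustration, since the text does not prescribe a particular predictor.

```python
def median_predictor(neighbor_mvs):
    """Component-wise median of an odd number of neighboring MVs."""
    def median(values):
        ordered = sorted(values)
        return ordered[len(ordered) // 2]
    return (median([mv[0] for mv in neighbor_mvs]),
            median([mv[1] for mv in neighbor_mvs]))


def encode_mvd(mv, predictor):
    """Encoder side: the difference that would be written to the bitstream."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])


def decode_mv(mvd, predictor):
    """Decoder side: rebuild the MV from the same predictor and the difference."""
    return (mvd[0] + predictor[0], mvd[1] + predictor[1])


neighbors = [(12, -3), (13, -4), (12, -4)]   # hypothetical neighboring MVs
actual_mv = (13, -4)
predictor = median_predictor(neighbors)
mvd = encode_mvd(actual_mv, predictor)
print(predictor, mvd, decode_mv(mvd, predictor))
# prints (12, -4) (1, 0) (13, -4)
```

Because the neighbors move similarly, the difference (1, 0) is much smaller in magnitude than the MV itself and is cheaper to entropy code.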

Various MV prediction mechanisms are described in H.265/HEVC (ITU-T Rec. H.265, "High Efficiency Video Coding", December 2016). Of the many MV prediction mechanisms that H.265 specifies, described below is a technique henceforth referred to as "spatial merge".

Specifically, referring to FIG. 2, a current block (201) comprises samples that the encoder has found, during the motion search process, to be predictable from a previous block of the same size that has been spatially shifted. Instead of coding that MV directly, the MV can be derived from metadata associated with one or more reference pictures, for example from the most recent reference picture (in decoding order), using the MV associated with any one of five surrounding samples, denoted A0, A1, and B0, B1, B2 (202 through 206, respectively). In H.265, MV prediction can use predictors from the same reference picture that the neighboring block uses.
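A simplified sketch of how a decoder might pick an MV from the five surrounding positions follows. The checking order, the full duplicate pruning, and all names are assumptions for illustration and are not taken from the disclosure; an actual codec defines these details normatively.

```python
# Checking order assumed for this sketch only.
CANDIDATE_ORDER = ("A1", "B1", "B0", "A0", "B2")


def build_merge_list(neighbor_mvs, max_candidates=5):
    """Collect distinct MVs of available neighbors in the assumed order.

    neighbor_mvs maps a position name to an (x, y) MV, or to None when the
    neighbor is unavailable or not inter coded.
    """
    merge_list = []
    for name in CANDIDATE_ORDER:
        mv = neighbor_mvs.get(name)
        if mv is not None and mv not in merge_list:
            merge_list.append(mv)
    return merge_list[:max_candidates]


# Hypothetical neighborhood of the current block (201).
neighbor_mvs = {"A0": (4, 0), "A1": (4, 0), "B0": None, "B1": (5, -1), "B2": (4, 0)}
merge_list = build_merge_list(neighbor_mvs)
merge_index = 1  # in a real decoder this index would be parsed from the bitstream
print(merge_list, merge_list[merge_index])
# prints [(4, 0), (5, -1)] (5, -1)
```

Only the index into the list needs to be signaled, which is why merging can be cheaper than coding the MV directly.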

The present disclosure relates generally to video coding, and more particularly to methods and systems for signaling various motion vector or motion vector difference related syntaxes based on whether magnitude-dependent adaptive resolution is used for motion vector differences in inter prediction.

In an example implementation, a method for processing a current video block of a video stream is disclosed. The method may include receiving the video stream and determining that the current video block is inter coded based on a prediction block and a motion vector (MV), where the MV is to be derived from a reference motion vector (RMV) and a motion vector difference (MVD) for the current video block. The method further includes, in response to determining that the MVD is coded with adaptive MVD pixel resolution: determining a reference MVD pixel precision for the current video block; identifying a maximum allowable MVD pixel precision; determining a set of allowable MVD levels for the current video block based on the reference MVD pixel precision and the maximum allowable MVD pixel precision; and deriving the MVD from the video stream according to the set of allowable MVD levels and at least one MVD parameter signaled in the video stream for the current video block.

In the above implementation, the reference MVD pixel precision of the current video block is specified, signaled, or derived at the sequence level, picture level, frame level, superblock level, or coding block level.

In any one of the above implementations, the reference MVD pixel precision of the current video block depends on the MVD class associated with the MVD of the current video block.

In any one of the above implementations, the reference MVD pixel precision of the current video block depends on the magnitude of the MVD of the current video block. In any one of the above implementations, the maximum allowable MVD pixel precision is predefined.

In any one of the above implementations, the method may further include determining a current MVD class from among a predefined set of MVD classes. Determining the set of allowable MVD levels based on the reference MVD pixel precision and the maximum allowable MVD pixel precision may include excluding, from a reference MVD level set determined based on the reference MVD pixel precision and the current MVD class, the MVD levels associated with an MVD pixel precision equal to or higher than the maximum allowable MVD pixel precision, so as to determine the set of allowable MVD levels for the current video block.

In any one of the above implementations, the maximum allowable MVD pixel precision is 1/4 pixel. In any one of the above implementations, MVD levels associated with a precision of 1/8 pixel or higher are excluded from the set of allowable MVD levels for the current video block.
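One way to read the 1/4-pixel example is as a filter over a reference set of MVD levels: every level that can only be expressed at 1/8-pixel precision or finer is dropped. The sketch below follows that reading. The reference level set, the rule that a level's precision is given by its denominator in lowest terms, and all names are assumptions for illustration, not the normative derivation.

```python
from fractions import Fraction


def level_precision(level):
    """Step size 1/d in pixels, where d is the denominator of the level in lowest terms."""
    return Fraction(1, Fraction(level).denominator)


def allowable_levels(reference_levels, max_allowed_precision):
    """Keep levels whose step size is not finer than the maximum allowed precision.

    A smaller step size means a higher (finer) precision, so levels with a
    step below max_allowed_precision are excluded.
    """
    return [level for level in reference_levels
            if level_precision(level) >= max_allowed_precision]


# Hypothetical reference level set: 1/8, 2/8, ..., 8/8 pixel.
reference_levels = [Fraction(n, 8) for n in range(1, 9)]
allowed = allowable_levels(reference_levels, Fraction(1, 4))
print([str(level) for level in allowed])
# prints ['1/4', '1/2', '3/4', '1']
```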

In any one of the above implementations, the method may further include determining a current MVD class from among a predefined set of MVD classes. MVD levels associated with a fractional MVD precision may be included in the set of allowable MVD levels, regardless of the reference MVD precision, when the current MVD class is equal to or lower than a threshold MVD class.

In any one of the above implementations, the threshold MVD class may be the lowest MVD class among the predefined set of MVD classes.

In any one of the above implementations, the method may further include determining a magnitude of the MVD, where MVD levels associated with an MVD precision higher than a threshold MVD precision are allowed in the set of allowable MVD levels only when the magnitude of the MVD is equal to or smaller than a threshold MVD magnitude.

In any one of the above implementations, the threshold MVD magnitude is 2 pixels or smaller. In any one of the above implementations, the threshold MVD precision is 1 pixel. In any one of the above implementations, MVD levels associated with an MVD precision of 1/4 pixel or higher are allowed only when the magnitude of the MVD is 1/2 pixel or smaller. In any one of the above implementations, the maximum allowable MVD pixel precision may be no higher than the reference MVD pixel precision.
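The magnitude condition in the last example (1/4-pixel precision or finer only for magnitudes up to 1/2 pixel) can be sketched as a simple predicate. The function name, parameter names, and defaults below are assumptions chosen to mirror that one example; other threshold combinations mentioned above would use different values.

```python
from fractions import Fraction


def precision_allowed(step, magnitude,
                      fine_step=Fraction(1, 4),
                      threshold_magnitude=Fraction(1, 2)):
    """Return True if an MVD of the given magnitude may use the given step.

    step and magnitude are in pixels. Steps of fine_step or smaller (that
    is, a precision of fine_step or higher) are allowed only when the
    magnitude does not exceed threshold_magnitude.
    """
    if step <= fine_step:
        return magnitude <= threshold_magnitude
    return True


print(precision_allowed(Fraction(1, 4), Fraction(1, 2)))  # prints True
print(precision_allowed(Fraction(1, 4), Fraction(1)))     # prints False
print(precision_allowed(Fraction(1, 2), Fraction(4)))     # prints True
```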

In another example implementation, a method for processing a current video block of a video stream is disclosed. The method may include receiving the video stream; determining that the current video block is inter coded and associated with a plurality of reference frames; and determining, based on signaling in the video stream, whether adaptive motion vector difference (MVD) pixel resolution is applied to at least one of the plurality of reference frames.

In the above implementation, the signaling may include a single-bit flag indicating whether the adaptive MVD pixel resolution is applied to all of the plurality of reference frames or to none of them.

In any one of the above implementations, the signaling may include separate flags, each corresponding to one of the plurality of reference frames, indicating whether the adaptive MVD pixel resolution is applied to that reference frame.

In any one of the above implementations, the signaling may include, for each of the plurality of reference frames, an implicit indication that the adaptive MVD pixel resolution is not applied when the MVD corresponding to that reference frame is zero, and a single-bit flag indicating whether the adaptive MVD pixel resolution is applied when the MVD corresponding to that reference frame is non-zero.
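The per-reference-frame variant can be sketched as follows: a flag is consumed from the bitstream only for reference frames whose MVD is non-zero. The function name, the read_flag callable, and the representation of an MVD as an (x, y) tuple are assumptions for illustration; how a decoder learns that an MVD is zero before reading the flag is outside this sketch.

```python
def parse_adaptive_mvd_flags(mvds, read_flag):
    """Decide, per reference frame, whether adaptive MVD resolution applies.

    mvds holds one (x, y) MVD per reference frame. read_flag is a callable
    that returns the next one-bit flag from the bitstream. A flag is read
    only for non-zero MVDs; a zero MVD implies that adaptive resolution is
    not applied, so no bit is consumed for it.
    """
    flags = []
    for mvd in mvds:
        if mvd == (0, 0):
            flags.append(False)            # implicit indication
        else:
            flags.append(bool(read_flag()))
    return flags


# Toy bitstream holding a single flag bit; the first MVD is zero.
bits = iter([1])
flags = parse_adaptive_mvd_flags([(0, 0), (3, -2)], lambda: next(bits))
print(flags)
# prints [False, True]
```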

In another example implementation, a method for processing a current video block of a video stream is disclosed. The method may include receiving the video stream; determining that the current video block is inter coded based on a prediction block and a motion vector (MV), where the MV is to be derived from a reference motion vector (RMV) and a motion vector difference (MVD) for the current video block; determining a current MVD class of the MVD from among a predefined set of MVD classes; deriving, based on the current MVD class, at least one context for entropy decoding at least one explicit signaling in the video stream, where the at least one explicit signaling is included in the video stream to specify an MVD pixel resolution for at least one component of the MVD; and entropy decoding the at least one explicit signaling from the video stream using the at least one context to determine the MVD pixel resolution for the at least one component of the MVD.

In the above implementation, the at least one component of the MVD may include a horizontal component and a vertical component of the MVD, and the at least one context may include two separate contexts, each associated with one of the horizontal and vertical components of the MVD, where the horizontal and vertical components are associated with separate MVD pixel resolutions.
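A context derivation of this kind can be pictured as a mapping from the pair (MVD class, component) to a context index, with a separate bank of contexts for each component. The sketch below shows only that mapping. The number of classes (11), the index layout, and the names are assumptions for illustration; the entropy decoder that would consume the index is not modeled.

```python
def resolution_context(mvd_class, component, num_classes=11):
    """Map (MVD class, component) to a context index.

    component is "horizontal" or "vertical". Each component gets its own
    bank of num_classes contexts, so the statistics of the two components
    are tracked separately.
    """
    if not 0 <= mvd_class < num_classes:
        raise ValueError("mvd_class out of range")
    bank = {"horizontal": 0, "vertical": 1}[component]
    return bank * num_classes + mvd_class


print(resolution_context(0, "horizontal"))  # prints 0
print(resolution_context(0, "vertical"))    # prints 11
print(resolution_context(3, "vertical"))    # prints 14
```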

Aspects of the disclosure also provide a video encoding or decoding device or apparatus including circuitry configured to carry out any of the above method implementations.

Aspects of the disclosure also provide a non-transitory computer-readable medium storing instructions which, when executed by a computer for video decoding and/or encoding, cause the computer to perform the methods for video decoding and/or encoding.

Further features, the nature, and various advantages of the disclosed subject matter will be more apparent from the following detailed description and the accompanying drawings.

A schematic illustration of an exemplary subset of intra prediction directional modes.
An illustration of exemplary intra prediction directions.
A schematic illustration of a current block and its surrounding spatial merge candidates for motion vector prediction in one example.
A schematic illustration of a simplified block diagram of a communication system (300) in accordance with an example embodiment.
A schematic illustration of a simplified block diagram of a communication system (400) in accordance with an example embodiment.
A schematic illustration of a simplified block diagram of a video decoder in accordance with an example embodiment.
A schematic illustration of a simplified block diagram of a video encoder in accordance with an example embodiment.
A block diagram of a video encoder in accordance with another example embodiment.
A block diagram of a video decoder in accordance with another example embodiment.
A scheme of coding block partitioning according to example embodiments of the disclosure.
Another scheme of coding block partitioning according to example embodiments of the disclosure.
Another scheme of coding block partitioning according to example embodiments of the disclosure.
An example partitioning of a base block into coding blocks according to an example partitioning scheme.
An example ternary partitioning scheme.
An example quadtree binary tree coding block partitioning scheme.
A scheme for partitioning a coding block into multiple transform blocks, and the coding order of the transform blocks, according to example embodiments of the disclosure.
Another scheme for partitioning a coding block into multiple transform blocks, and the coding order of the transform blocks, according to example embodiments of the disclosure.
Another scheme for partitioning a coding block into multiple transform blocks according to example embodiments of the disclosure.
A flow chart of a method according to an example embodiment of the disclosure.
Another flow chart of a method according to an example embodiment of the disclosure.
A flow chart of a method according to an example embodiment of the disclosure.
A schematic illustration of a computer system in accordance with example embodiments of the disclosure.

Throughout this specification and the claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. The phrase "in one embodiment" or "in some embodiments" as used herein does not necessarily refer to the same embodiment, and the phrase "in another embodiment" or "in other embodiments" as used herein does not necessarily refer to a different embodiment. Likewise, the phrase "in one implementation" or "in some implementations" as used herein does not necessarily refer to the same implementation, and the phrase "in another implementation" or "in other implementations" as used herein does not necessarily refer to a different implementation. It is intended, for example, that the claimed subject matter include combinations of the exemplary embodiments/implementations in whole or in part.

In general, terminology may be understood at least in part from its usage in context. For example, terms such as "and", "or", or "and/or" as used herein may include a variety of meanings that may depend at least in part on the context in which such terms are used. Typically, "or", if used to associate a list such as A, B, or C, is intended to mean A, B, and C (here used in the inclusive sense) as well as A, B, or C (here used in the exclusive sense). In addition, the term "one or more" or "at least one" as used herein, depending at least in part on context, may be used to describe any feature, structure, or characteristic in a singular sense, or may be used to describe combinations of features, structures, or characteristics in a plural sense. Similarly, terms such as "a", "an", or "the" may be understood to convey a singular usage or a plural usage, depending at least in part on context. In addition, the term "based on" or "determined by" may be understood as not necessarily intended to convey an exclusive set of factors, and may instead allow for the existence of additional factors not necessarily expressly described, again depending at least in part on context.

FIG. 3 illustrates a simplified block diagram of a communication system (300) according to an embodiment of the present disclosure. The communication system (300) includes a plurality of terminal devices that can communicate with each other via, for example, a network (350). For example, the communication system (300) includes a first pair of terminal devices (310) and (320) interconnected via the network (350). In the example of FIG. 3, the first pair of terminal devices (310) and (320) may perform unidirectional transmission of data. For example, the terminal device (310) may code video data (e.g., a stream of video pictures captured by the terminal device (310)) for transmission to the other terminal device (320) via the network (350). The encoded video data may be transmitted in the form of one or more coded video bitstreams. The terminal device (320) may receive the coded video data from the network (350), decode the coded video data to recover the video pictures, and display the video pictures according to the recovered video data. Unidirectional data transmission may be implemented in media serving applications and the like.

In another example, the communication system (300) includes a second pair of terminal devices (330) and (340) that perform bidirectional transmission of coded video data, which may occur, for example, during a video conferencing application. For bidirectional transmission of data, in one example, each of the terminal devices (330) and (340) may code video data (e.g., a stream of video pictures captured by that terminal device) for transmission to the other of the terminal devices (330) and (340) via the network (350). Each of the terminal devices (330) and (340) may also receive the coded video data transmitted by the other terminal device, decode the coded video data to recover the video pictures, and display the video pictures on an accessible display device according to the recovered video data.

In the example of FIG. 3, the terminal devices (310), (320), (330), and (340) may be implemented as servers, personal computers, and smartphones, but the applicability of the underlying principles of the present disclosure is not so limited. Embodiments of the present disclosure may be implemented in desktop computers, laptop computers, tablet computers, media players, wearable computers, dedicated video conferencing equipment, and the like. The network (350) represents any number or types of networks that convey coded video data among the terminal devices (310), (320), (330), and (340), including, for example, wireline (wired) and/or wireless communication networks. The communication network (350) may exchange data in circuit-switched channels, packet-switched channels, and/or other types of channels. Representative networks include telecommunications networks, local area networks, wide area networks, and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network (350) may be immaterial to the operation of the present disclosure unless explicitly explained herein.

FIG. 4 illustrates, as an example of an application for the disclosed subject matter, the placement of a video encoder and a video decoder in a video streaming environment. The disclosed subject matter may be equally applicable to other video applications, including, for example, video conferencing, digital television broadcasting, gaming, virtual reality, and storage of compressed video on digital media including CDs, DVDs, memory sticks, and the like.

A video streaming system may include a video capture subsystem (413) that can include a video source (401), e.g., a digital camera, for creating a stream of video pictures or images (402) that are uncompressed. In one example, the stream of video pictures (402) includes samples recorded by a digital camera of the video source (401). The stream of video pictures (402), depicted as a bold line to emphasize its high data volume when compared to the encoded video data (404) (or coded video bitstreams), can be processed by an electronic device (420) that includes a video encoder (403) coupled to the video source (401). The video encoder (403) can include hardware, software, or a combination thereof to enable or implement aspects of the disclosed subject matter as described in more detail below. The encoded video data (404) (or encoded video bitstream (404)), depicted as a thin line to emphasize its lower data volume when compared to the stream of uncompressed video pictures (402), can be stored on a streaming server (405) for future use or sent directly to downstream video devices (not shown). One or more streaming client subsystems, such as the client subsystems (406) and (408) in FIG. 4, can access the streaming server (405) to retrieve copies (407) and (409) of the encoded video data (404). The client subsystem (406) can include a video decoder (410), for example, in an electronic device (430). The video decoder (410) decodes the incoming copy (407) of the encoded video data and creates an outgoing stream of video pictures (411) that are uncompressed and that can be rendered on a display (412) (e.g., a display screen) or another rendering device (not shown). The video decoder (410) may be configured to perform some or all of the various functions described in this disclosure.

In some streaming systems, the encoded video data (404), (407), and (409) (e.g., video bitstreams) can be encoded according to certain video coding/compression standards. Examples of those standards include ITU-T Recommendation H.265. In one example, a video coding standard under development is informally known as Versatile Video Coding (VVC). The disclosed subject matter may be used in the context of VVC and other video coding standards.

It is noted that the electronic devices (420) and (430) can include other components (not shown). For example, the electronic device (420) can include a video decoder (not shown), and the electronic device (430) can likewise include a video encoder (not shown).

FIG. 5 shows a block diagram of a video decoder (510) according to any embodiment of the present disclosure described below. The video decoder (510) can be included in an electronic device (530). The electronic device (530) can include a receiver (531) (e.g., receiving circuitry). The video decoder (510) can be used in place of the video decoder (410) in the example of FIG. 4.

The receiver (531) may receive one or more coded video sequences to be decoded by the video decoder (510). In the same or another embodiment, one coded video sequence may be decoded at a time, where the decoding of each coded video sequence is independent of the other coded video sequences. Each video sequence may be associated with multiple video frames or images. The coded video sequence may be received from a channel (501), which may be a hardware/software link to a storage device that stores the encoded video data or a streaming source that transmits the encoded video data. The receiver (531) may receive the encoded video data together with other data, such as coded audio data and/or ancillary data streams, which may be forwarded to their respective processing circuitry (not shown). The receiver (531) may separate the coded video sequence from the other data.

To combat network jitter, a buffer memory (515) may be disposed between the receiver (531) and an entropy decoder/parser (520) ("parser (520)" henceforth). In certain applications, the buffer memory (515) may be implemented as part of the video decoder (510). In other applications, it can be separate from and external to the video decoder (510) (not shown). In still other applications, there can be a buffer memory (not shown) outside the video decoder (510), for example to combat network jitter, and another additional buffer memory (515) inside the video decoder (510), for example to handle playback timing. When the receiver (531) is receiving data from a store/forward device of sufficient bandwidth and controllability, or from an isosynchronous network, the buffer memory (515) may not be needed, or can be small. For use on best-effort packet networks such as the Internet, a buffer memory (515) of sufficient size may be required, and its size can be comparatively large. Such a buffer memory may be implemented with an adaptive size, and may be implemented at least partially in an operating system or similar elements (not shown) outside the video decoder (510).

The video decoder (510) may include the parser (520) to reconstruct symbols (521) from the coded video sequence. Categories of those symbols include information used to manage the operation of the video decoder (510), and potentially information to control a rendering device such as a display (512) (e.g., a display screen) that may or may not be an integral part of the electronic device (530) but can be coupled to the electronic device (530), as shown in FIG. 5. The control information for the rendering device may be in the form of Supplemental Enhancement Information (SEI) messages or Video Usability Information (VUI) parameter set fragments (not shown). The parser (520) may parse/entropy-decode the coded video sequence that it receives. The entropy coding of the coded video sequence can be in accordance with a video coding technology or standard, and can follow various principles, including variable length coding, Huffman coding, arithmetic coding with or without context sensitivity, and so forth. The parser (520) may extract from the coded video sequence a set of subgroup parameters for at least one of the subgroups of pixels in the video decoder, based on at least one parameter corresponding to the subgroup. The subgroups can include Groups of Pictures (GOPs), pictures, tiles, slices, macroblocks, Coding Units (CUs), blocks, Transform Units (TUs), Prediction Units (PUs), and so forth. The parser (520) may also extract from the coded video sequence information such as transform coefficients (e.g., Fourier transform coefficients), quantizer parameter values, motion vectors, and so forth.

The parser (520) may perform an entropy decoding/parsing operation on the video sequence received from the buffer memory (515) so as to create the symbols (521).

Reconstruction of the symbols (521) can involve multiple different processing or functional units, depending on the type of the coded video picture or parts thereof (such as inter and intra picture, inter and intra block) and on other factors. Which units are involved, and how they are involved, may be controlled by the subgroup control information parsed from the coded video sequence by the parser (520). The flow of such subgroup control information between the parser (520) and the multiple processing or functional units below is not depicted, for simplicity.

Beyond the functional blocks already mentioned, the video decoder (510) can be conceptually subdivided into a number of functional units as described below. In a practical implementation operating under commercial constraints, many of these functional units interact closely with each other and can, at least partly, be integrated with each other. However, for the purpose of describing the various functions of the disclosed subject matter clearly, the conceptual subdivision into functional units is adopted in the disclosure below.

A first unit may include the scaler/inverse transform unit (551). The scaler/inverse transform unit (551) may receive quantized transform coefficients, as well as control information including information indicating which type of inverse transform to use, block size, quantization factor/parameters, quantization scaling matrices, and so forth, as symbol(s) (521) from the parser (520). The scaler/inverse transform unit (551) can output blocks comprising sample values that can be input into the aggregator (555).
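The scaling half of this unit can be pictured with a minimal Python sketch. It is illustrative only: the uniform step size `qstep`, the function name `scale_coefficients`, and the simple multiplicative use of the scaling matrix are assumptions made for this example and are not taken from the disclosure. The inverse transform that would follow is omitted.

```python
def scale_coefficients(levels, qstep, scaling_matrix=None):
    """Rescale quantized transform coefficient levels for one block.

    levels: 2-D list of integer quantized levels (as parsed from the bitstream).
    qstep: hypothetical uniform quantizer step size (an assumption of this sketch).
    scaling_matrix: optional per-frequency weights with the same shape as levels.
    """
    rows, cols = len(levels), len(levels[0])
    if scaling_matrix is None:
        # Flat scaling: every frequency position uses the same weight.
        scaling_matrix = [[1] * cols for _ in range(rows)]
    return [
        [levels[r][c] * qstep * scaling_matrix[r][c] for c in range(cols)]
        for r in range(rows)
    ]
```

In this sketch the rescaled coefficients would then be passed to whichever inverse transform the control information selects.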

In some cases, the output samples of the scaler/inverse transform unit (551) can pertain to an intra coded block, i.e., a block that does not use predictive information from previously reconstructed pictures but can use predictive information from previously reconstructed parts of the current picture. Such predictive information can be provided by an intra picture prediction unit (552). In some cases, the intra picture prediction unit (552) may generate a block of the same size and shape as the block under reconstruction, using information from surrounding blocks that have already been reconstructed and stored in the current picture buffer (558). The current picture buffer (558) buffers, for example, the partly reconstructed current picture and/or the fully reconstructed current picture. The aggregator (555), in some implementations, may add, on a per-sample basis, the prediction information that the intra prediction unit (552) has generated to the output sample information provided by the scaler/inverse transform unit (551).
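The per-sample addition performed by the aggregator can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `aggregate` is hypothetical, and the clipping of the sum to the valid sample range for a given bit depth is an assumption added for the example.

```python
def aggregate(prediction, residual, bit_depth=8):
    """Add a prediction block and a residual block sample by sample.

    prediction, residual: 2-D lists of integers with identical shapes.
    bit_depth: sample bit depth used for clipping (clipping is an assumption
    of this sketch; the text only describes the per-sample addition).
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

The same addition applies whether the prediction block comes from intra picture prediction or from motion compensated prediction.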

In other cases, the output samples of the scaler/inverse transform unit (551) can pertain to an inter coded, and potentially motion compensated, block. In such a case, a motion compensation prediction unit (553) can access the reference picture memory (557) to fetch samples used for inter picture prediction. After motion compensating the fetched samples in accordance with the symbols (521) pertaining to the block, these samples can be added by the aggregator (555) to the output of the scaler/inverse transform unit (551) (the output of unit (551) may be referred to as the residual samples or residual signal) so as to generate output sample information. The addresses within the reference picture memory (557) from which the motion compensation prediction unit (553) fetches prediction samples can be controlled by motion vectors, available to the motion compensation prediction unit (553) in the form of symbols (521) that can have, for example, an X component, a Y component (shift), and a reference picture component (time). Motion compensation may also include interpolation of sample values fetched from the reference picture memory (557) when sub-sample accurate motion vectors are in use, and may also be associated with motion vector prediction mechanisms and so forth.
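As a rough illustration of how a motion vector controls the fetch address and how sub-sample positions require interpolation, the sketch below fetches a prediction block from a single reference picture. The half-sample motion vector units, the bilinear averaging, the edge clamping, and the name `fetch_prediction` are all assumptions chosen for brevity; practical codecs typically use finer precision and longer interpolation filters, and the reference picture (time) component is left out here.

```python
def fetch_prediction(reference, x0, y0, width, height, mv_x, mv_y):
    """Fetch a motion-compensated block from one reference picture.

    reference: 2-D list of samples (rows of the reference picture).
    (x0, y0): top-left position of the current block, in full samples.
    (mv_x, mv_y): motion vector in half-sample units (an assumption).
    Half-sample positions are produced by rounded bilinear averaging.
    """
    ref_h, ref_w = len(reference), len(reference[0])

    def sample(x, y):
        # Clamp coordinates so that fetches outside the picture repeat edge samples.
        x = min(max(x, 0), ref_w - 1)
        y = min(max(y, 0), ref_h - 1)
        return reference[y][x]

    block = []
    for j in range(height):
        row = []
        for i in range(width):
            # Target position expressed in half-sample units.
            px = 2 * (x0 + i) + mv_x
            py = 2 * (y0 + j) + mv_y
            ix, iy = px >> 1, py >> 1  # full-sample part (floor)
            fx, fy = px & 1, py & 1    # 1 if the position is at a half sample
            a = sample(ix, iy)
            b = sample(ix + fx, iy)
            c = sample(ix, iy + fy)
            d = sample(ix + fx, iy + fy)
            # Rounded average; reduces to a when fx == fy == 0.
            row.append((a + b + c + d + 2) >> 2)
        block.append(row)
    return block
```

With a zero motion vector the function returns the co-located block, a vector of (2, 0) shifts the fetch one full sample to the right, and odd vector components produce averaged half-sample values.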

The output samples of the aggregator (555) can be subject to various loop filtering techniques in the loop filter unit (556). Video compression technologies can include in-loop filter technologies that are controlled by parameters included in the coded video sequence (also referred to as the coded video bitstream) and made available to the loop filter unit (556) as symbols (521) from the parser (520). These technologies can also be responsive to meta-information obtained during the decoding of previous (in decoding order) parts of the coded picture or coded video sequence, as well as to previously reconstructed and loop-filtered sample values. Several types of loop filters may be included as part of the loop filter unit (556) in various orders, as described in further detail below.

The output of the loop filter unit (556) can be a sample stream that can be output to the rendering device (512) as well as stored in the reference picture memory (557) for use in future inter picture prediction.

Certain coded pictures, once fully reconstructed, can be used as reference pictures for future inter picture prediction. For example, once a coded picture corresponding to a current picture is fully reconstructed and the coded picture has been identified as a reference picture (by, for example, the parser (520)), the current picture buffer (558) can become a part of the reference picture memory (557), and a fresh current picture buffer can be reallocated before commencing the reconstruction of the following coded picture.

The video decoder (510) may perform decoding operations according to a predetermined video compression technology adopted in a standard, such as ITU-T Recommendation H.265. The coded video sequence may conform to the syntax specified by the video compression technology or standard being used, in the sense that the coded video sequence adheres to both the syntax of the video compression technology or standard and the profiles documented in it. Specifically, a profile can select certain tools, from all the tools available in the video compression technology or standard, as the only tools available for use under that profile. To be standard-compliant, the complexity of the coded video sequence may also need to be within bounds defined by the level of the video compression technology or standard. In some cases, levels restrict the maximum picture size, the maximum frame rate, the maximum reconstruction sample rate (measured in, for example, megasamples per second), the maximum reference picture size, and so on. Limits set by levels can, in some cases, be further restricted through Hypothetical Reference Decoder (HRD) specifications and metadata for HRD buffer management signaled in the coded video sequence.
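The idea of a level bounding the complexity of a coded video sequence can be illustrated with a small conformance check. The sketch below is hypothetical throughout: the `LevelLimits` fields, the function `conforms_to_level`, and the numeric limits in the usage note are invented for illustration and are not the limits of any actual standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    """Hypothetical limits associated with one level (illustrative fields)."""
    max_picture_samples: int  # maximum picture size, in samples
    max_frame_rate: float     # maximum frames per second
    max_sample_rate: int      # maximum reconstruction sample rate, samples/s


def conforms_to_level(width, height, frame_rate, limits):
    """Return True if the sequence parameters stay within the level limits."""
    picture_samples = width * height
    return (
        picture_samples <= limits.max_picture_samples
        and frame_rate <= limits.max_frame_rate
        and picture_samples * frame_rate <= limits.max_sample_rate
    )
```

For example, with made-up limits `LevelLimits(2_000_000, 60.0, 70_000_000)`, a 1280x720 sequence at 60 frames per second passes the check, while a 1920x1080 sequence fails on picture size alone.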

In some exemplary embodiments, the receiver (531) may receive additional (redundant) data along with the encoded video. The additional data may be included as part of the coded video sequence(s). The additional data may be used by the video decoder (510) to properly decode the data and/or to more accurately reconstruct the original video data. The additional data can be in the form of, for example, temporal, spatial, or signal-to-noise ratio (SNR) enhancement layers, redundant slices, redundant pictures, forward error correction codes, and so on.

FIG. 6 shows a block diagram of a video encoder (603) according to an exemplary embodiment of the present disclosure. The video encoder (603) may be included in an electronic device (620). The electronic device (620) may further include a transmitter (640) (e.g., transmitting circuitry). The video encoder (603) can be used in place of the video encoder (403) in the example of FIG. 4.

The video encoder (603) may receive video samples from a video source (601) (which is not part of the electronic device (620) in the example of FIG. 6) that may capture video image(s) to be coded by the video encoder (603). In another example, the video source (601) may be implemented as a portion of the electronic device (620).

The video source (601) may provide the source video sequence to be coded by the video encoder (603) in the form of a digital video sample stream that can be of any suitable bit depth (for example, 8 bit, 10 bit, 12 bit, ...), any color space (for example, BT.601 YCrCb, RGB, XYZ, ...), and any suitable sampling structure (for example, YCrCb 4:2:0, YCrCb 4:4:4). In a media serving system, the video source (601) may be a storage device capable of storing previously prepared video. In a video conferencing system, the video source (601) may be a camera that captures local image information as a video sequence. Video data may be provided as a plurality of individual pictures or images that impart motion when viewed in sequence. The pictures themselves may be organized as a spatial array of pixels, where each pixel can comprise one or more samples depending on the sampling structure, color space, and so forth in use. A person having ordinary skill in the art can readily understand the relationship between pixels and samples. The description below focuses on samples.
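The relationship between pixels and samples under the two sampling structures named above can be made concrete with a short calculation. The sketch assumes one luma plane plus two chroma planes and even picture dimensions; the function name `samples_per_picture` and the divisor table are illustrative assumptions.

```python
def samples_per_picture(width, height, sampling="4:2:0"):
    """Count the samples in one picture with one luma and two chroma planes.

    sampling: "4:4:4" (chroma at full resolution) or "4:2:0" (chroma halved
    horizontally and vertically). Even width and height are assumed.
    """
    # (horizontal, vertical) chroma subsampling divisors per sampling structure.
    chroma_divisors = {"4:4:4": (1, 1), "4:2:0": (2, 2)}
    div_x, div_y = chroma_divisors[sampling]
    luma = width * height
    chroma = 2 * (width // div_x) * (height // div_y)
    return luma + chroma
```

Under these assumptions a 4:4:4 picture carries three samples per pixel, while a 4:2:0 picture carries 1.5 samples per pixel on average, which is why the text distinguishes pixels from samples.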

いくつかの例示的な実施形態によれば、ビデオエンコーダ（603）は、リアルタイムで、または用途によって必要とされる他の任意の時間制約の下で、ソースビデオシーケンスのピクチャをコーディングされたビデオシーケンス（643）にコーディングおよび圧縮し得る。適切なコーディング速度を強制することが、コントローラ（650）の1つの機能を構成する。いくつかの実施形態では、コントローラ（650）は、以下で説明されるように、他の機能ユニットに機能的に結合され、他の機能ユニットを制御し得る。簡潔にするために、結合は図示されていない。コントローラ（650）によって設定されるパラメータには、レート制御関連のパラメータ（ピクチャスキップ、量子化器、レート歪み最適化技法のラムダ値など）、ピクチャサイズ、ピクチャグループ（GOP）レイアウト、最大動きベクトル検索範囲などが含まれ得る。コントローラ（650）は、特定のシステム設計のために最適化されたビデオエンコーダ（603）に関連する他の適切な機能を有するように構成されることができる。 According to some example embodiments, the video encoder (603) may code and compress pictures of a source video sequence into a coded video sequence (643) in real time or under any other time constraint required by the application. Enforcing an appropriate coding rate constitutes one function of the controller (650). In some embodiments, the controller (650) may be operatively coupled to and control other functional units as described below. For the sake of brevity, couplings are not shown. Parameters set by the controller (650) may include rate control related parameters (picture skip, quantizer, lambda value for rate distortion optimization techniques, etc.), picture size, group of pictures (GOP) layout, maximum motion vector search range, etc. The controller (650) may be configured to have other appropriate functions associated with the video encoder (603) optimized for a particular system design.

In some example embodiments, the video encoder (603) may be configured to operate in a coding loop. As an oversimplified description, in one example, the coding loop may include a source coder (630) (e.g., responsible for creating symbols, such as a symbol stream, based on an input picture to be coded and one or more reference pictures) and a (local) decoder (633) embedded in the video encoder (603). Even though the embedded decoder (633) processes the video stream coded by the source coder (630) without entropy coding, the decoder (633) reconstructs the symbols to create sample data in a manner similar to how a (remote) decoder would create it (because any compression between the symbols and the coded video bitstream may be lossless in the video compression technologies considered in the disclosed subject matter). The reconstructed sample stream (sample data) is input to the reference picture memory (634). Because decoding of the symbol stream yields bit-exact results regardless of the decoder location (local or remote), the contents of the reference picture memory (634) are also bit-exact between the local encoder and the remote decoder. In other words, the prediction part of the encoder "sees" as reference picture samples exactly the same sample values that a decoder would "see" when using prediction during decoding. This fundamental principle of reference picture synchronicity (and the resulting drift if synchronicity cannot be maintained, for example because of channel errors) is used to improve coding quality.
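The role of the embedded decoder can be illustrated with a minimal one-dimensional sketch. It is not taken from the disclosure: the uniform quantizer, the "previous sample" predictor, and all function names are assumptions chosen only to show why an encoder that predicts from reconstructed data stays synchronized with the decoder, while one that predicts from the original data drifts.

```python
def quantize(residual, step):
    """Uniform quantizer; this is the lossy step of the sketch (assumed)."""
    return round(residual / step)


def dequantize(level, step):
    return level * step


def encode_closed_loop(samples, step):
    """Predict each sample from the previously RECONSTRUCTED sample."""
    levels = []
    reference = 0  # encoder and decoder agree on the initial reference
    for sample in samples:
        level = quantize(sample - reference, step)
        levels.append(level)
        # Embedded (local) decoder: rebuild what the remote decoder will hold.
        reference += dequantize(level, step)
    return levels


def encode_open_loop(samples, step):
    """Predict from the ORIGINAL previous sample (no embedded decoder)."""
    levels = []
    previous = 0
    for sample in samples:
        levels.append(quantize(sample - previous, step))
        previous = sample
    return levels


def decode(levels, step):
    """Remote decoder: accumulate dequantized residuals onto its reference."""
    reconstructed = []
    reference = 0
    for level in levels:
        reference += dequantize(level, step)
        reconstructed.append(reference)
    return reconstructed
```

In the closed-loop variant the reconstruction error of every sample is bounded by half the quantizer step, because each quantization error is fed back into the next prediction. In the open-loop variant the errors can accumulate without bound.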

「ローカル」デコーダ（633）の動作は、図5と併せて上記で詳細にすでに説明されている、ビデオデコーダ（510）などの「リモート」デコーダの動作と同じであり得る。図5も簡単に参照すると、しかしながら、シンボルが利用可能であり、エントロピーコーダ（645）およびパーサ（520）によるコーディングされたビデオシーケンスへのシンボルのエンコーディング／デコーディングが可逆であり得るため、バッファメモリ（515）およびパーサ（520）を含むビデオデコーダ（510）のエントロピーデコーディング部分は、エンコーダ内のローカルデコーダ（633）においては完全に実装されない場合がある。 The operation of the "local" decoder (633) may be the same as that of a "remote" decoder, such as the video decoder (510), already described in detail above in conjunction with FIG. 5. With brief reference also to FIG. 5, however, because symbols are available and the encoding/decoding of symbols into a coded video sequence by the entropy coder (645) and parser (520) may be lossless, the entropy decoding portion of the video decoder (510), including the buffer memory (515) and parser (520), may not be fully implemented in the local decoder (633) within the encoder.

この時点で言えることは、デコーダ内にのみ存在し得る構文解析／エントロピーデコーディングを除く任意のデコーダ技術もまた必然的に、対応するエンコーダにおいて、実質的に同一の機能形式で存在する必要があり得るということである。このため、開示された主題はデコーダ動作に焦点を当てる場合があり、この動作はエンコーダのデコーディング部分と同様である。よって、エンコーダ技術の説明は、包括的に説明されるデコーダ技術の逆であるので、省略され得る。特定の領域または態様においてのみ、エンコーダのより詳細な説明が以下に提供される。 At this point, it can be said that any decoder technology, except for parsing/entropy decoding, which may only be present in the decoder, may necessarily also need to be present in a corresponding encoder in substantially the same functional form. For this reason, the disclosed subject matter may focus on the decoder operation, which is similar to the decoding portion of the encoder. Thus, a description of the encoder technology may be omitted, since it is the inverse of the decoder technology described generically. Only in certain areas or aspects is a more detailed description of the encoder provided below.

In operation, in some example implementations, the source coder (630) may perform motion-compensated predictive coding, which codes an input picture predictively with reference to one or more previously coded pictures from the video sequence that were designated as "reference pictures." In this manner, the coding engine (632) codes differences (or residue) in the color channels between pixel blocks of the input picture and pixel blocks of the reference picture(s) that may be selected as prediction reference(s) for the input picture. The term "residue" and its adjectival form "residual" may be used interchangeably.

ローカルビデオデコーダ（633）は、ソースコーダ（630）によって作成されたシンボルに基づいて、参照ピクチャとして指定され得るピクチャのコーディングされたビデオデータをデコーディングし得る。コーディングエンジン（632）の動作は、有利には、非可逆プロセスであってもよい。コーディングされたビデオデータが（図6には示されていない）ビデオデコーダでデコーディングされ得るとき、再構築されたビデオシーケンスは、典型的には、いくつかの誤差を伴うソースビデオシーケンスのレプリカであり得る。ローカルビデオデコーダ（633）は、参照ピクチャに対してビデオデコーダによって実行され得るデコーディングプロセスを複製し、再構築された参照ピクチャが参照ピクチャキャッシュ（634）に記憶されるようにし得る。このようにして、ビデオエンコーダ（603）は、（送信誤差なしで）遠端（リモート）ビデオデコーダによって取得される再構築された参照ピクチャと共通の内容を有する再構築された参照ピクチャのコピーをローカルに記憶し得る。 The local video decoder (633) may decode the coded video data of pictures that may be designated as reference pictures based on the symbols created by the source coder (630). The operation of the coding engine (632) may advantageously be a lossy process. When the coded video data may be decoded in a video decoder (not shown in FIG. 6), the reconstructed video sequence may typically be a replica of the source video sequence with some errors. The local video decoder (633) may replicate the decoding process that may be performed by the video decoder on the reference pictures, such that the reconstructed reference pictures are stored in the reference picture cache (634). In this way, the video encoder (603) may locally store copies of reconstructed reference pictures that have common content with the reconstructed reference pictures obtained by the far-end (remote) video decoder (without transmission errors).

予測器（635）は、コーディングエンジン（632）のための予測検索を実行し得る。すなわち、コーディングされる新しいピクチャの場合、予測器（635）は、新しい画素のための適切な予測参照として役立ち得る、（候補参照画素ブロックとしての）サンプルデータまたは参照ピクチャ動きベクトル、ブロック形状などの特定のメタデータを求めて、参照ピクチャメモリ（634）を検索し得る。予測器（635）は、適切な予測参照を見つけるために、画素ブロックごとにサンプルブロックに対して動作し得る。場合によっては、予測器（635）によって取得された検索結果によって決定されるように、入力ピクチャは、参照ピクチャメモリ（634）に記憶された複数の参照ピクチャから引き出された予測参照を有してもよい。 The predictor (635) may perform a prediction search for the coding engine (632). That is, for a new picture to be coded, the predictor (635) may search the reference picture memory (634) for sample data (as candidate reference pixel blocks) or specific metadata such as reference picture motion vectors, block shapes, etc., that may serve as suitable prediction references for the new pixels. The predictor (635) may operate on sample blocks, pixel block by pixel block, to find a suitable prediction reference. In some cases, as determined by the search results obtained by the predictor (635), the input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (634).
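The prediction search described above can be sketched as an exhaustive block-matching search. The disclosure does not prescribe a search strategy or a matching criterion; the full search, the sum of absolute differences (SAD) cost, and the function names below are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized sample blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(picture, top, left, size):
    """Extract a size x size block whose top-left sample is at (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def motion_search(current, reference, top, left, size, search_range):
    """Return ((dx, dy), cost) of the best match within +/- search_range."""
    height, width = len(reference), len(reference[0])
    target = get_block(current, top, left, size)
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference picture.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(reference, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The returned displacement plays the role of the motion vector, and `search_range` corresponds loosely to the maximum motion vector search range that the controller (650) may set.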

コントローラ（650）は、例えば、ビデオデータをエンコーディングするために使用されるパラメータおよびサブグループパラメータの設定を含む、ソースコーダ（630）のコーディング動作を管理し得る。 The controller (650) may manage the coding operations of the source coder (630), including, for example, setting parameters and subgroup parameters used to encode the video data.

すべての前述の機能ユニットの出力は、エントロピーコーダ（645）内でエントロピーコーディングを受けることができる。エントロピーコーダ（645）は、ハフマンコーディング、可変長コーディング、算術コーディングなどといった技術に従ったシンボルの可逆圧縮により、様々な機能ユニットによって生成されたシンボルをコーディングされたビデオシーケンスに変換する。 The output of all the aforementioned functional units can be subjected to entropy coding in an entropy coder (645), which converts the symbols produced by the various functional units into a coded video sequence by lossless compression of the symbols according to techniques such as Huffman coding, variable length coding, arithmetic coding, etc.
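Of the lossless techniques named above, Huffman coding is the simplest to sketch. The following is a generic textbook construction, not the entropy coder (645) of the disclosure; symbols are assumed to be small integers, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Invert the prefix code; lossless because no code prefixes another."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Frequent symbols receive short codes, so the coded sequence is shorter than a fixed-length representation while the original symbols remain exactly recoverable.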

The transmitter (640) may buffer the coded video sequence(s) created by the entropy coder (645) to prepare them for transmission via a communication channel (660), which may be a hardware/software link to a storage device that stores the encoded video data. The transmitter (640) may merge the coded video data from the video encoder (603) with other data to be transmitted, for example, coded audio data and/or ancillary data streams (sources not shown).

コントローラ（650）は、ビデオエンコーダ（603）の動作を管理してもよい。コーディング中、コントローラ（650）は、各コーディングされたピクチャに特定のコーディングされたピクチャタイプを割り当ててもよく、これは、それぞれのピクチャに適用され得るコーディング技法に影響を及ぼす場合がある。例えば、ピクチャは、以下のピクチャタイプのうちの1つとして割り当てられることが多い。 The controller (650) may manage the operation of the video encoder (603). During coding, the controller (650) may assign a particular coded picture type to each coded picture, which may affect the coding technique that may be applied to the respective picture. For example, pictures are often assigned as one of the following picture types:

イントラピクチャ（Iピクチャ）は、予測のソースとしてシーケンス内の他のピクチャを使用せずにコーディングおよびデコーディングされ得るものであり得る。いくつかのビデオコーデックは、例えば、独立デコーダリフレッシュ（「IDR」）ピクチャを含む、異なるタイプのイントラピクチャを可能にする。当業者は、Iピクチャのそれらの変形形態、ならびにそれらのそれぞれの用途および特徴を認識している。 An intra picture (I-picture) may be one that can be coded and decoded without using other pictures in a sequence as a source of prediction. Some video codecs allow different types of intra pictures, including, for example, Independent Decoder Refresh ("IDR") pictures. Those skilled in the art are aware of these variations of I-pictures, as well as their respective uses and characteristics.

予測ピクチャ（Pピクチャ）は、各ブロックのサンプル値を予測するために、最大で1つの動きベクトルおよび参照インデックスを使用するイントラ予測またはインター予測を使用して、コーディングおよびデコーディングされ得るピクチャであり得る。 A predicted picture (P picture) may be a picture that can be coded and decoded using intra- or inter-prediction, which uses at most one motion vector and reference index to predict the sample values of each block.

双方向予測ピクチャ（Bピクチャ）は、各ブロックのサンプル値を予測するために、最大で2つの動きベクトルおよび参照インデックスを使用するイントラ予測またはインター予測を使用して、コーディングおよびデコーディングされ得るピクチャであり得る。同様に、複数予測ピクチャは、単一のブロックの再構築のために3つ以上の参照ピクチャおよび関連付けられたメタデータを使用することができる。 A bidirectionally predicted picture (B-picture) may be a picture that can be coded and decoded using intra- or inter-prediction, which uses up to two motion vectors and reference indices to predict the sample values of each block. Similarly, a multi-predicted picture may use more than two reference pictures and associated metadata for the reconstruction of a single block.

ソースピクチャは、一般に、複数のサンプルコーディングブロック（例えば、各々4×4、8×8、4×8、または16×16サンプルのブロック）に空間的に細分され、ブロックごとにコーディングされ得る。ブロックは、ブロックのそれぞれのピクチャに適用されたコーディング割り当てによって決定されるように、他の（すでにコーディングされた）ブロックを参照して予測的にコーディングされてもよい。例えば、Iピクチャのブロックは、非予測的にコーディングされてもよく、または同じピクチャのすでにコーディングされたブロックを参照して予測的にコーディングされてもよい（空間予測またはイントラ予測）。Pピクチャの画素ブロックは、1つの以前にコーディングされた参照ピクチャを参照して、空間予測を介して、または時間予測を介して、予測的にコーディングされてもよい。Bピクチャのブロックは、1つまたは2つの以前にコーディングされた参照ピクチャを参照して、空間予測を介して、または時間予測を介して、予測的にコーディングされてもよい。ソースピクチャまたは中間処理されたピクチャは、他の目的で他のタイプのブロックに細分されてもよい。コーディングブロックおよび他のタイプのブロックの分割は、以下でさらに詳細に説明するように、同じ方法に従ってもよく、従わなくてもよい。 A source picture may generally be spatially subdivided into multiple sample coding blocks (e.g., blocks of 4x4, 8x8, 4x8, or 16x16 samples each) and coded block by block. Blocks may be predictively coded with reference to other (already coded) blocks, as determined by the coding assignment applied to the respective picture of the block. For example, blocks of an I picture may be non-predictively coded or predictively coded with reference to already coded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of a P picture may be predictively coded via spatial prediction with reference to one previously coded reference picture or via temporal prediction. Blocks of a B picture may be predictively coded via spatial prediction with reference to one or two previously coded reference pictures or via temporal prediction. Source pictures or intermediate processed pictures may be subdivided into other types of blocks for other purposes. The division of coding blocks and other types of blocks may or may not follow the same method, as described in more detail below.

ビデオエンコーダ（603）は、ITU－T勧告H．265などの所定のビデオコーディング技術または規格に従ってコーディング動作を行い得る。その動作において、ビデオエンコーダ（603）は、入力ビデオシーケンスにおける時間および空間の冗長性を利用する予測コーディング動作を含む、様々な圧縮動作を行い得る。したがって、コーディングされたビデオデータは、使用されているビデオコーディング技術または規格によって指定された構文に準拠し得る。 The video encoder (603) may perform coding operations in accordance with a given video coding technique or standard, such as ITU-T Recommendation H.265. In its operations, the video encoder (603) may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancies in the input video sequence. Thus, the coded video data may conform to a syntax specified by the video coding technique or standard being used.

いくつかの例示的な実施形態では、送信器（640）は、エンコーディングされたビデオと共に追加のデータを送信し得る。ソースコーダ（630）は、コーディングされたビデオシーケンスの一部としてそのようなデータを含み得る。追加のデータは、時間／空間／SNR拡張層、冗長なピクチャおよびスライスなどの他の形式の冗長データ、SEIメッセージ、VUIパラメータセットフラグメントなどを含んでもよい。 In some example embodiments, the transmitter (640) may transmit additional data along with the encoded video. The source coder (630) may include such data as part of the coded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, etc.

ビデオは、時系列で複数のソースピクチャ（ビデオピクチャ）として取り込まれてもよい。イントラピクチャ予測（しばしばイントラ予測と略される）は、所与のピクチャにおける空間相関を利用し、インターピクチャ予測は、ピクチャ間の時間または他の相関を利用する。例えば、現在のピクチャと呼ばれる、エンコーディング／デコーディング中の特定のピクチャがブロックに分割され得る。現在のピクチャ内のブロックは、ビデオ内の以前にコーディングされたまだバッファリングされている参照ピクチャ内の参照ブロックと同様である場合、動きベクトルと呼ばれるベクトルによってコーディングされ得る。動きベクトルは、参照ピクチャ内の参照ブロックを指し、複数の参照ピクチャが使用されている場合、参照ピクチャを識別する第3の次元を有することができる。 Video may be captured as multiple source pictures (video pictures) in a time sequence. Intra-picture prediction (often abbreviated as intra prediction) exploits spatial correlation in a given picture, while inter-picture prediction exploits temporal or other correlation between pictures. For example, a particular picture being encoded/decoded, called the current picture, may be divided into blocks. If a block in the current picture is similar to a reference block in a previously coded, yet buffered, reference picture in the video, it may be coded by a vector, called a motion vector. A motion vector points to a reference block in the reference picture, and may have a third dimension that identifies the reference picture if multiple reference pictures are used.

In some example embodiments, a bi-prediction technique may be used for inter-picture prediction. According to such a bi-prediction technique, two reference pictures are used, such as a first reference picture and a second reference picture, both of which precede the current picture in decoding order (but may be in the past or the future, respectively, in display order). A block in the current picture may be coded by a first motion vector that points to a first reference block in the first reference picture and a second motion vector that points to a second reference block in the second reference picture. The block may be jointly predicted by a combination of the first reference block and the second reference block.
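A minimal sketch of such a combination follows. The disclosure only states that the two reference blocks are combined; the equal-weight average with rounding, integer-sample motion vectors, and the function names are assumptions made for illustration.

```python
def fetch_block(picture, top, left, mv, size):
    """Fetch the size x size block displaced by mv = (dx, dy) from (top, left)."""
    dx, dy = mv
    return [
        row[left + dx:left + dx + size]
        for row in picture[top + dy:top + dy + size]
    ]


def bi_predict(ref0, mv0, ref1, mv1, top, left, size):
    """Average the two motion-compensated reference blocks, rounding up at .5."""
    block0 = fetch_block(ref0, top, left, mv0, size)
    block1 = fetch_block(ref1, top, left, mv1, size)
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(block0, block1)
    ]
```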

さらに、マージモード技法が、インターピクチャ予測においてコーディング効率を改善するために使用されてもよい。 Furthermore, merge mode techniques may be used to improve coding efficiency in inter-picture prediction.

According to some example embodiments of the present disclosure, predictions such as inter-picture prediction and intra-picture prediction are performed in units of blocks. For example, a picture in a sequence of video pictures is partitioned into coding tree units (CTUs) for compression, and the CTUs in a picture may have the same size, such as 64×64 pixels, 32×32 pixels, or 16×16 pixels. In general, a CTU may include three parallel coding tree blocks (CTBs): one luma CTB and two chroma CTBs. Each CTU may be recursively quadtree split into one or more coding units (CUs). For example, a CTU of 64×64 pixels may be split into one CU of 64×64 pixels or four CUs of 32×32 pixels. Each of one or more of the 32×32 blocks may be further split into four CUs of 16×16 pixels. In some example embodiments, each CU may be analyzed during encoding to determine a prediction type for the CU from among various prediction types, such as an inter prediction type or an intra prediction type. The CU may be split into one or more prediction units (PUs) depending on the temporal and/or spatial predictability. In general, each PU includes one luma prediction block (PB) and two chroma PBs. In one embodiment, a prediction operation in coding (encoding/decoding) is performed in units of a prediction block. The split of a CU into PUs (or PBs of different color channels) may be performed in various spatial patterns. A luma or chroma PB may include a matrix of sample values (e.g., luma values), such as 8×8 pixels, 16×16 pixels, 8×16 pixels, or 16×8 pixels.
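The recursive quadtree split of a CTU into CUs can be sketched as follows. The split decision is left to a caller-supplied predicate because the disclosure does not fix how an encoder decides; `should_split`, `min_size`, and the tuple layout are hypothetical.

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Return the list of CUs, each as (x, y, size), covering one CTU.

    should_split(x, y, size) is a caller-supplied decision function; a real
    encoder would typically base it on rate-distortion analysis.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        # Visit the four quadrants in raster order.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(
                    quadtree_partition(x + dx, y + dy, half, min_size, should_split)
                )
        return cus
    return [(x, y, size)]
```

For example, splitting a 64×64 CTU once and then splitting only its top-left 32×32 quadrant yields three 32×32 CUs and four 16×16 CUs, matching the example in the text.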

FIG. 7 shows a diagram of a video encoder (703) according to another example embodiment of the present disclosure. The video encoder (703) is configured to receive a processing block (e.g., a prediction block) of sample values within a current video picture in a sequence of video pictures, and to encode the processing block into a coded picture that is part of a coded video sequence. The example video encoder (703) may be used in place of the video encoder (403) in the example of FIG. 4.

例えば、ビデオエンコーダ（703）は、8×8サンプルの予測ブロックなどの処理ブロックについてのサンプル値の行列を受信する。次いでビデオエンコーダ（703）は、例えば、レート歪み最適化（RDO）を使用して、処理ブロックがそれを使用して最良にコーディングされるのは、イントラモードか、インターモードか、それとも双予測モードかを決定する。処理ブロックがイントラモードでコーディングされると決定された場合、ビデオエンコーダ（703）は、イントラ予測技法を使用して処理ブロックをコーディングされたピクチャにエンコーディングし、処理ブロックがインターモードまたは双予測モードでコーディングされると決定された場合、ビデオエンコーダ（703）は、それぞれインター予測技法または双予測技法を使用して、処理ブロックをコーディングされたピクチャにエンコーディングし得る。いくつかの例示的な実施形態では、インターピクチャ予測のサブモードとして、動きベクトルが予測子の外側のコーディングされた動きベクトル成分の恩恵を受けずに1つまたは複数の動きベクトル予測子から導出されるマージモードが使用され得る。いくつかの他の例示的な実施形態では、対象ブロックに適用可能な動きベクトル成分が存在し得る。したがって、ビデオエンコーダ（703）は、処理ブロックの予測モードを決定するために、モード決定モジュールなど、図7に明示的に示されていない構成要素を含んでもよい。 For example, the video encoder (703) receives a matrix of sample values for a processing block, such as a predictive block of 8×8 samples. The video encoder (703) then determines, for example using rate-distortion optimization (RDO), whether the processing block is best coded using intra-mode, inter-mode, or bi-predictive mode. If it is determined that the processing block is coded in intra-mode, the video encoder (703) may encode the processing block into a coded picture using intra-prediction techniques, and if it is determined that the processing block is coded in inter-mode or bi-predictive mode, the video encoder (703) may encode the processing block into a coded picture using inter-prediction techniques or bi-prediction techniques, respectively. In some exemplary embodiments, a merge mode may be used as a sub-mode of inter-picture prediction, in which motion vectors are derived from one or more motion vector predictors without the benefit of coded motion vector components outside the predictors. In some other exemplary embodiments, there may be motion vector components applicable to the current block. Thus, the video encoder (703) may include components not explicitly shown in FIG. 7, such as a mode decision module, to determine the prediction mode of a processing block.
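A rate-distortion optimized mode decision is commonly expressed as minimizing a Lagrangian cost J = D + λ·R over the candidate modes. The sketch below assumes that the distortion and bit count of each candidate have already been measured; the mode names and the numbers in the usage note are made up for illustration.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def choose_mode(candidates, lam):
    """Pick the mode with the lowest RD cost.

    candidates maps a mode name to a (distortion, bits) pair.
    """
    return min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))
```

With candidates `{"intra": (100, 10), "inter": (60, 30), "bi_prediction": (50, 60)}`, a small λ favors the low-distortion bi-prediction mode, while a large λ favors the cheap intra mode. This is why the lambda value is listed among the rate-control parameters set by a controller.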

図7の例では、ビデオエンコーダ（703）は、図7の例示的な配置に示されたように互いに結合されたインターエンコーダ（730）、イントラエンコーダ（722）、残差計算器（723）、スイッチ（726）、残差エンコーダ（724）、汎用コントローラ（721）、およびエントロピーエンコーダ（725）を含む。 In the example of FIG. 7, the video encoder (703) includes an inter-encoder (730), an intra-encoder (722), a residual calculator (723), a switch (726), a residual encoder (724), a general controller (721), and an entropy encoder (725) coupled together as shown in the exemplary arrangement of FIG. 7.

The inter encoder (730) is configured to receive samples of a current block (e.g., a processing block), compare the block to one or more reference blocks in reference pictures (e.g., blocks in previous pictures and later pictures in display order), generate inter prediction information (e.g., a description of redundant information according to inter encoding techniques, motion vectors, and merge mode information), and calculate inter prediction results (e.g., a predicted block) based on the inter prediction information using any suitable technique. In some examples, the reference pictures are decoded reference pictures that are decoded based on the encoded video information using the decoding unit (633) embedded in the example video encoder (603) of FIG. 6 (shown as the residual decoder (728) of FIG. 7, as described in further detail below).

イントラエンコーダ（722）は、現在のブロック（例えば、処理ブロック）のサンプルを受信し、ブロックを同じピクチャ内のすでにコーディングされたブロックと比較し、変換後の量子化係数を生成し、場合によってはイントラ予測情報（例えば、1つまたは複数のイントラエンコーディング技法によるイントラ予測方向情報）も生成するように構成される。イントラエンコーダ（722）は、イントラ予測情報および同じピクチャ内の参照ブロックに基づいて、イントラ予測結果（例えば、予測ブロック）を計算し得る。 The intra encoder (722) is configured to receive samples of a current block (e.g., a processing block), compare the block to previously coded blocks in the same picture, generate transformed quantized coefficients, and possibly also generate intra prediction information (e.g., intra prediction direction information according to one or more intra encoding techniques). The intra encoder (722) may calculate an intra prediction result (e.g., a prediction block) based on the intra prediction information and a reference block in the same picture.

汎用コントローラ（721）は、汎用制御データを決定し、汎用制御データに基づいてビデオエンコーダ（703）の他の構成要素を制御するように構成されてもよい。一例では、汎用コントローラ（721）は、ブロックの予測モードを決定し、予測モードに基づいて制御信号をスイッチ（726）に提供する。例えば、予測モードがイントラモードである場合、汎用コントローラ（721）は、スイッチ（726）を制御して、残差計算器（723）が使用するためのイントラモード結果を選択させ、エントロピーエンコーダ（725）を制御して、イントラ予測情報を選択させてそのイントラ予測情報をビットストリームに含め、ブロックの予測モードがインターモードである場合、汎用コントローラ（721）は、スイッチ（726）を制御して、残差計算器（723）が使用するためのインター予測結果を選択させ、エントロピーエンコーダ（725）を制御して、インター予測情報を選択させてそのインター予測情報をビットストリームに含める。 The generic controller (721) may be configured to determine generic control data and control other components of the video encoder (703) based on the generic control data. In one example, the generic controller (721) determines a prediction mode for the block and provides a control signal to the switch (726) based on the prediction mode. For example, if the prediction mode is an intra mode, the generic controller (721) controls the switch (726) to select an intra mode result for use by the residual calculator (723) and controls the entropy encoder (725) to select intra prediction information and include the intra prediction information in the bitstream, and if the prediction mode for the block is an inter mode, the generic controller (721) controls the switch (726) to select an inter prediction result for use by the residual calculator (723) and controls the entropy encoder (725) to select inter prediction information and include the inter prediction information in the bitstream.

残差計算器（723）は、受信したブロックと、イントラエンコーダ（722）またはインターエンコーダ（730）から選択されたブロックについての予測結果との間の差（残差データ）を計算するように構成され得る。残差エンコーダ（724）は、残差データをエンコーディングして変換係数を生成するように構成され得る。例えば、残差エンコーダ（724）は、残差データを空間領域から周波数領域に変換して変換係数を生成するように構成され得る。次いで、変換係数は、量子化変換係数を取得するために量子化処理を受ける。様々な例示的な実施形態において、ビデオエンコーダ（703）は、残差デコーダ（728）も含む。残差デコーダ（728）は逆変換を行い、デコーディングされた残差データを生成するように構成される。デコーディングされた残差データは、イントラエンコーダ（722）およびインターエンコーダ（730）によって適切に使用され得る。例えば、インターエンコーダ（730）は、デコーディングされた残差データおよびインター予測情報に基づいてデコーディングされたブロックを生成することができ、イントラエンコーダ（722）は、デコーディングされた残差データおよびイントラ予測情報に基づいてデコーディングされたブロックを生成することができる。デコーディングされたブロックは、デコーディングされたピクチャを生成するために適切に処理され、デコーディングされたピクチャは、メモリ回路（図示せず）にバッファリングされ、参照ピクチャとして使用され得る。 The residual calculator (723) may be configured to calculate a difference (residual data) between a received block and a prediction result for a block selected from the intra-encoder (722) or the inter-encoder (730). The residual encoder (724) may be configured to encode the residual data to generate transform coefficients. For example, the residual encoder (724) may be configured to transform the residual data from the spatial domain to the frequency domain to generate transform coefficients. The transform coefficients then undergo a quantization process to obtain quantized transform coefficients. In various exemplary embodiments, the video encoder (703) also includes a residual decoder (728). The residual decoder (728) is configured to perform an inverse transform and generate decoded residual data. The decoded residual data may be used by the intra-encoder (722) and the inter-encoder (730) as appropriate. For example, the inter-encoder (730) can generate decoded blocks based on the decoded residual data and the inter-prediction information, and the intra-encoder (722) can generate decoded blocks based on the decoded residual data and the intra-prediction information. The decoded blocks are appropriately processed to generate decoded pictures, which can be buffered in a memory circuit (not shown) and used as reference pictures.
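The residual path (transform, quantization, and the inverse operations performed by the residual decoder) can be sketched with a separable orthonormal DCT-II and a uniform quantizer. The disclosure does not name a specific transform or quantizer, so both are assumptions, as are the function names; floating-point arithmetic is used for brevity where a real codec would use integer transforms.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT-II basis; row k is the k-th frequency."""
    return [
        [
            math.sqrt((1 if k == 0 else 2) / n)
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def residual_encode(residual, step):
    """Spatial residual -> quantized transform coefficients (levels)."""
    c = dct_matrix(len(residual))
    coeffs = matmul(matmul(c, residual), transpose(c))  # Y = C X C^T
    return [[round(v / step) for v in row] for row in coeffs]


def residual_decode(levels, step):
    """Quantized levels -> decoded spatial residual (inverse of the above)."""
    c = dct_matrix(len(levels))
    coeffs = [[v * step for v in row] for row in levels]
    recon = matmul(matmul(transpose(c), coeffs), c)  # X' = C^T Y' C
    return [[round(v) for v in row] for row in recon]
```

A flat residual block concentrates all of its energy in the single DC coefficient, which illustrates why transforming to the frequency domain before quantization tends to help compression.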

エントロピーエンコーダ（725）は、ビットストリームをエンコーディングされたブロックを含むようにフォーマットし、エントロピーコーディングを行うように構成され得る。エントロピーエンコーダ（725）は、ビットストリームに様々な情報を含めるように構成される。例えば、エントロピーエンコーダ（725）は、汎用制御データ、選択された予測情報（例えば、イントラ予測情報やインター予測情報）、残差情報、および他の適切な情報をビットストリームに含めるように構成され得る。インターモードまたは双予測モードのいずれかのマージサブモードでブロックをコーディングするとき、残差情報は存在しなくてもよい。 The entropy encoder (725) may be configured to format a bitstream to include the encoded block and perform entropy coding. The entropy encoder (725) may be configured to include various information in the bitstream. For example, the entropy encoder (725) may be configured to include general control data, selected prediction information (e.g., intra-prediction information or inter-prediction information), residual information, and other suitable information in the bitstream. When coding a block in a merged sub-mode of either an inter mode or a bi-prediction mode, the residual information may not be present.

図8は、本開示の他の実施形態による、例示的なビデオデコーダ（810）の図を示す。ビデオデコーダ（810）は、コーディングされたビデオシーケンスの一部であるコーディングされたピクチャを受信し、コーディングされたピクチャをデコーディングして再構築されたピクチャを生成するように構成される。一例では、ビデオデコーダ（810）は、図4の例のビデオデコーダ（410）の代わりに使用され得る。 FIG. 8 shows a diagram of an example video decoder (810) according to another embodiment of the disclosure. The video decoder (810) is configured to receive coded pictures that are part of a coded video sequence and to decode the coded pictures to generate reconstructed pictures. In one example, the video decoder (810) may be used in place of the example video decoder (410) of FIG. 4.

図8の例では、ビデオデコーダ（810）は、図8の例示的な配置に示されたように、互いに結合されたエントロピーデコーダ（871）、インターデコーダ（880）、残差デコーダ（873）、再構築モジュール（874）、およびイントラデコーダ（872）を含む。 In the example of FIG. 8, the video decoder (810) includes an entropy decoder (871), an inter-decoder (880), a residual decoder (873), a reconstruction module (874), and an intra-decoder (872) coupled together as shown in the example arrangement of FIG. 8.

The entropy decoder (871) may be configured to reconstruct, from the coded picture, certain symbols that represent the syntax elements of which the coded picture is made up. Such symbols may include, for example, the mode in which a block is coded (e.g., intra mode, inter mode, bi-predicted mode, or a merge submode or another submode), prediction information (e.g., intra prediction information or inter prediction information) that can identify certain samples or metadata used for prediction by the intra decoder (872) or the inter decoder (880), residual information in the form of, for example, quantized transform coefficients, and the like. In one example, when the prediction mode is the inter or bi-predicted mode, the inter prediction information is provided to the inter decoder (880), and when the prediction type is the intra prediction type, the intra prediction information is provided to the intra decoder (872). The residual information may be subject to inverse quantization and is provided to the residual decoder (873).

インターデコーダ（880）は、インター予測情報を受信し、インター予測情報に基づいてインター予測結果を生成するように構成されてもよい。 The inter decoder (880) may be configured to receive inter prediction information and generate inter prediction results based on the inter prediction information.

イントラデコーダ（872）は、イントラ予測情報を受信し、イントラ予測情報に基づいて予測結果を生成するように構成されてもよい。 The intra decoder (872) may be configured to receive intra prediction information and generate a prediction result based on the intra prediction information.

残差デコーダ（873）は、逆量子化を行って逆量子化変換係数を抽出し、逆量子化変換係数を処理して、残差を周波数領域から空間領域に変換するように構成されてもよい。残差デコーダ（873）はまた、（量子化器パラメータ（QP）を含めるために）特定の制御情報を利用してもよく、その情報は、エントロピーデコーダ（871）によって提供されてもよい（これは小さいデータ量の制御情報のみであり得るので、データパスは示されていない）。 The residual decoder (873) may be configured to perform inverse quantization to extract inverse quantized transform coefficients and process the inverse quantized transform coefficients to transform the residual from the frequency domain to the spatial domain. The residual decoder (873) may also utilize certain control information (to include quantizer parameters (QP)), which may be provided by the entropy decoder (871) (datapath not shown as this may be only a small amount of control information).

再構築モジュール（874）は、空間領域において、残差デコーダ（873）による出力としての残差と、（場合によって、インター予測モジュールまたはイントラ予測モジュールによる出力としての）予測結果を組み合わせて、再構築されたビデオの一部としての再構築されたピクチャの一部を形成する再構築されたブロックを形成するように構成され得る。視覚的品質を改善するために、デブロッキング動作などの他の適切な動作が行われてもよいことに留意されたい。 The reconstruction module (874) may be configured to combine, in the spatial domain, the residual as output by the residual decoder (873) and the prediction result (possibly as output by an inter prediction module or an intra prediction module) to form a reconstructed block that forms part of a reconstructed picture as part of the reconstructed video. It should be noted that other suitable operations, such as deblocking operations, may be performed to improve visual quality.
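The combination performed by the reconstruction module (874) can be illustrated with a minimal sketch. The function name `reconstruct_block`, the list-of-rows block layout, and the clipping to the sample bit depth are assumptions made for illustration; the description only states that the residual and the prediction result are combined in the spatial domain.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add a spatial-domain residual to a prediction block, sample by sample.

    Both inputs are lists of rows with identical dimensions. Clipping to
    [0, 2**bit_depth - 1] is an assumption for illustration only.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


# Hypothetical 2x2 block: inter or intra prediction plus decoded residual.
recon = reconstruct_block([[100, 250], [0, 10]], [[5, 10], [-3, -4]])
```

Filtering such as deblocking would operate on the output of this step and is not shown.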

ビデオエンコーダ（403）、（603）、および（703）、ならびにビデオデコーダ（410）、（510）、および（810）は、任意の適切な技法を使用して実装することができることに留意されたい。いくつかの例示的な実施形態では、ビデオエンコーダ（403）、（603）、および（703）、ならびにビデオデコーダ（410）、（510）、および（810）は、1つまたは複数の集積回路を使用して実装されることができる。他の実施形態では、ビデオエンコーダ（403）、（603）、および（703）、ならびにビデオデコーダ（410）、（510）、および（810）は、ソフトウェア命令を実行する1つまたは複数のプロセッサを使用して実装されることができる。 It should be noted that the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using any suitable technique. In some exemplary embodiments, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using one or more integrated circuits. In other embodiments, the video encoders (403), (603), and (703) and the video decoders (410), (510), and (810) may be implemented using one or more processors executing software instructions.

コーディングおよびデコーディングのためのブロック分割に目を向けると、一般的な分割は、ベースブロックから開始してもよく、事前定義されたルールセット、特定のパターン、分割ツリー、または任意の分割構造もしくは方式に従ってもよい。分割は、階層的かつ再帰的であってもよい。以下に記載される例示的な分割手順もしくは他の手順、またはそれらの組み合わせのいずれかに従ってベースブロックを分離または分割した後に、パーティションまたはコーディングブロックの最終セットが取得されてもよい。これらのパーティションの各々は、分割階層内の様々な分割レベルのうちの1つにあってもよく、様々な形状のパーティションであってもよい。パーティションの各々は、コーディングブロック（CB）と呼ばれる場合がある。以下にさらに記載される様々な例示的な分割実装形態では、結果として得られる各CBは、許容されるサイズおよび分割レベルのいずれかのCBであってもよい。そのようなパーティションは、そのためのいくつかの基本的なコーディング／デコーディング決定が行われ得、コーディング／デコーディングパラメータが最適化され、決定され、エンコーディングされたビデオビットストリームにおいてシグナリングされ得るユニットを形成し得るので、コーディングブロックと呼ばれる。最終パーティションにおける最高または最深のレベルは、コーディングブロック分割ツリー構造の深度を表す。コーディングブロックは、ルマコーディングブロックまたはクロマコーディングブロックであってもよい。各色のCBツリー構造は、コーディングブロックツリー（CBT）と呼ばれる場合がある。 Turning to block partitioning for coding and decoding, a general partitioning may start from a base block, may follow a predefined set of rules, a specific pattern, a partitioning tree, or any partitioning structure or scheme. The partitioning may be hierarchical and recursive. After isolating or dividing the base block according to any of the exemplary partitioning procedures described below or other procedures, or combinations thereof, a final set of partitions or coding blocks may be obtained. Each of these partitions may be at one of various partitioning levels in the partitioning hierarchy and may be partitions of various shapes. Each of the partitions may be referred to as a coding block (CB). In various exemplary partitioning implementations described further below, each resulting CB may be a CB of any of the allowed sizes and partitioning levels. Such partitions are referred to as coding blocks because they may form a unit for which some basic coding/decoding decisions may be made and coding/decoding parameters may be optimized, determined, and signaled in the encoded video bitstream. The highest or deepest level in the final partition represents the depth of the coding block partitioning tree structure. The coding block may be a luma coding block or a chroma coding block. 
The CB tree structure for each color is sometimes called the coding block tree (CBT).

すべての色チャネルのコーディングブロックは、まとめてコーディングユニット（CU）と呼ばれる場合がある。すべての色チャネルの階層構造は、まとめてコーディングツリーユニット（CTU）と呼ばれる場合がある。CTU内の様々な色チャネルの分割パターンまたは分割構造は、同じであってもなくてもよい。 The coding blocks of all color channels may be collectively referred to as a coding unit (CU). The hierarchical structure of all color channels may be collectively referred to as a coding tree unit (CTU). The partitioning pattern or structure of the various color channels within a CTU may or may not be the same.

いくつかの実装形態では、ルマチャネルおよびクロマチャネルに使用される分割ツリー方式または構造は、同じである必要はなくてもよい。言い換えれば、ルマチャネルおよびクロマチャネルは、別々のコーディングツリー構造またはパターンを有してもよい。さらに、ルマチャネルおよびクロマチャネルが同じコーディング分割ツリー構造を使用するか、異なるコーディング分割ツリー構造を使用するか、および使用されるべき実際のコーディング分割ツリー構造は、コーディングされているスライスがPスライスか、Bスライスか、Iスライスかに依存する場合がある。例えば、Iスライスの場合、クロマチャネルおよびルマチャネルは、別々のコーディング分割ツリー構造またはコーディング分割ツリー構造モードを有してもよいが、PスライスまたはBスライスの場合、ルマチャネルおよびクロマチャネルは、同じコーディング分割ツリー方式を共有してもよい。別々のコーディング分割ツリー構造またはモードが適用されるとき、ルマチャネルは、1つのコーディング分割ツリー構造によってCBに分割されてもよく、クロマチャネルは、他のコーディング分割ツリー構造によってクロマCBに分割されてもよい。 In some implementations, the split tree scheme or structure used for the luma channel and the chroma channel may not need to be the same. In other words, the luma channel and the chroma channel may have separate coding tree structures or patterns. Furthermore, whether the luma channel and the chroma channel use the same coding split tree structure or different coding split tree structures, and the actual coding split tree structure to be used, may depend on whether the slice being coded is a P slice, a B slice, or an I slice. For example, for an I slice, the chroma channel and the luma channel may have separate coding split tree structures or coding split tree structure modes, while for a P slice or a B slice, the luma channel and the chroma channel may share the same coding split tree scheme. When separate coding split tree structures or modes are applied, the luma channel may be split into CBs by one coding split tree structure, and the chroma channel may be split into chroma CBs by the other coding split tree structure.

いくつかの例示的な実装形態では、所定の分割パターンがベースブロックに適用されてもよい。図9に示すように、例示的な4方向分割ツリーは、第1の事前定義されたレベル（例えば、ベースブロックサイズとして、64×64ブロックレベルまたは他のサイズ）から開始してもよく、ベースブロックは、事前定義された最下位レベル（例えば、4×4レベル）まで階層的に分割されてもよい。例えば、ベースブロックは、902、904、906、および908によって示された4つの事前定義された分割オプションまたはパターンに従うことができ、Rとして指定されたパーティションは、図9に示された同じ分割オプションが最下位レベル（例えば、4×4レベル）まで下位スケールで繰り返され得るという点で、再帰分割が可能である。いくつかの実装形態では、図9の分割方式に追加の制限が加えられてもよい。図9の実装形態では、長方形パーティション（例えば、1：2／2：1の長方形パーティション）は、許容され得るが、再帰的であることは許容され得ず、一方、正方形パーティションは再帰的であることが許容される。必要に応じて、再帰による図9の後に続く分割により、コーディングブロックの最終セットが生成される。ルートノードまたはルートブロックからの分割深度を示すために、コーディングツリー深度がさらに定義されてもよい。例えば、64×64ブロックのルートノードまたはルートブロックに対するコーディングツリー深度は0に設定されてもよく、ルートブロックが図9に従ってさらに1回分割された後、コーディングツリー深度は1だけ増加する。64×64のベースブロックから4×4の最小パーティションまでの最大または最深のレベルは、上記の方式では（レベル0から開始して）4である。そのような分割方式が、色チャネルのうちの1つまたは複数に適用されてもよい。各色チャネルは、図9の方式に従って独立して分割されてもよい（例えば、各階層レベルにおける色チャネルの各々に対して、事前定義されたパターンの中の分割パターンまたはオプションが独立して決定されてもよい）。あるいは、2つ以上の色チャネルが図9の同じ階層パターンツリーを共有してもよい（例えば、各階層レベルにおける2つ以上の色チャネルに対して、事前定義されたパターンの中の同じ分割パターンまたはオプションが選択されてもよい）。 In some exemplary implementations, a predefined partitioning pattern may be applied to the base block. As shown in FIG. 9, an exemplary four-way partitioning tree may start at a first predefined level (e.g., 64×64 block level or other size as the base block size), and the base block may be partitioned hierarchically down to a predefined lowest level (e.g., 4×4 level). For example, the base block may follow four predefined partitioning options or patterns illustrated by 902, 904, 906, and 908, and the partition designated as R is capable of recursive partitioning in that the same partitioning option illustrated in FIG. 9 may be repeated at a lower scale down to the lowest level (e.g., 4×4 level). In some implementations, additional restrictions may be placed on the partitioning scheme of FIG. 9. In the implementation of FIG. 9, rectangular partitions (e.g., 1:2/2:1 rectangular partitions) may be allowed but not recursive, while square partitions are allowed to be recursive. 
Subsequent partitioning of FIG. 9 by recursion, if necessary, produces a final set of coding blocks. A coding tree depth may be further defined to indicate the division depth from the root node or root block. For example, the coding tree depth for a root node or root block of 64x64 blocks may be set to 0, and after the root block is further divided one time according to FIG. 9, the coding tree depth increases by 1. The maximum or deepest level from the 64x64 base block to the 4x4 smallest partition is 4 (starting from level 0) in the above scheme. Such a division scheme may be applied to one or more of the color channels. Each color channel may be divided independently according to the scheme of FIG. 9 (e.g., for each of the color channels at each hierarchical level, a division pattern or option in the predefined pattern may be determined independently). Alternatively, two or more color channels may share the same hierarchical pattern tree of FIG. 9 (e.g., for two or more color channels at each hierarchical level, the same division pattern or option in the predefined pattern may be selected).
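The recursion and depth counting described for FIG. 9 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the option names `"none"`, `"horz"`, `"vert"`, and `"split"`, the `choose` callback, and the mapping of these names to the four patterns of FIG. 9 are assumptions. Only the square partitions produced by `"split"` are recursed into, matching the restriction that rectangular partitions are not recursive.

```python
def max_coding_tree_depth(base_size=64, min_size=4):
    """Number of halvings needed to go from the base block to the smallest block."""
    depth = 0
    size = base_size
    while size > min_size:
        size //= 2
        depth += 1
    return depth


def split_block(x, y, size, choose, min_size=4, depth=0):
    """Recursively partition a square block; return (x, y, w, h, depth) leaves.

    choose(x, y, size, depth) is a hypothetical decision callback returning
    one of "none", "horz", "vert", or "split".
    """
    option = choose(x, y, size, depth) if size > min_size else "none"
    half = size // 2
    if option == "none":
        return [(x, y, size, size, depth)]
    if option == "horz":  # two wide rectangles, not split further
        return [(x, y, size, half, depth + 1), (x, y + half, size, half, depth + 1)]
    if option == "vert":  # two tall rectangles, not split further
        return [(x, y, half, size, depth + 1), (x + half, y, half, size, depth + 1)]
    if option == "split":  # four squares, each may recurse
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_block(x + dx, y + dy, half, choose, min_size, depth + 1)
        return leaves
    raise ValueError(f"unknown partition option: {option}")


# Always choosing the four-way split reaches 4x4 blocks at depth 4.
leaves = split_block(0, 0, 64, lambda x, y, size, depth: "split")
```

With a 64×64 base block and a 4×4 minimum, the deepest level reached is 4, consistent with the example above.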

図10は、再帰分割が分割ツリーを形成することを可能にする他の例示的な事前定義された分割パターンを示す。図10に示すように、例示的な10通りの分割構造またはパターンが事前定義されてもよい。ルートブロックは、事前定義されたレベルから（例えば、128×128レベルまたは64×64レベルのベースブロックから）開始し得る。図10の例示的な分割構造は、様々な2：1／1：2および4：1／1：4の長方形パーティションを含む。図10の2列目の1002、1004、1006、および1008で示される3つのサブパーティションを有するパーティションタイプは、「T型」パーティションと呼ばれ得る。「T型」パーティション1002、1004、1006、および1008は、左T型、上T型、右T型、および下T型と呼ばれる場合がある。いくつかの例示的な実装形態では、図10の長方形パーティションのどれもこれ以上細分されることは可能でない。ルートノードまたはルートブロックからの分割深度を示すために、コーディングツリー深度がさらに定義されてもよい。例えば、128×128ブロックのルートノードまたはルートブロックに対するコーディングツリー深度は0に設定されてもよく、ルートブロックが図10に従ってさらに1回分割された後、コーディングツリー深度は1だけ増加する。いくつかの実装形態では、1010のすべて正方形パーティションのみが、図10のパターンの後に続く分割ツリーの次のレベルへの再帰分割が可能であり得る。言い換えれば、再帰分割は、T型パターン1002、1004、1006、および1008内の正方形パーティションでは可能でない場合がある。必要に応じて、再帰による図10の後に続く分割手順により、コーディングブロックの最終セットが生成される。そのような方式が、色チャネルのうちの1つまたは複数に適用されてもよい。いくつかの実装形態では、8×8レベル未満のパーティションの使用に、より多くの柔軟性が加えられてもよい。例えば、場合によっては、2×2のクロマインター予測が使用されてもよい。 FIG. 10 illustrates another exemplary predefined partitioning pattern that allows the recursive partitioning to form a partitioning tree. As shown in FIG. 10, an exemplary ten-way partitioning structure or pattern may be predefined. The root block may start from a predefined level (e.g., from a base block at a 128×128 level or a 64×64 level). The exemplary partitioning structure of FIG. 10 includes various 2:1/1:2 and 4:1/1:4 rectangular partitions. A partition type having three subpartitions, shown as 1002, 1004, 1006, and 1008 in the second column of FIG. 10, may be referred to as a “T-type” partition. The “T-type” partitions 1002, 1004, 1006, and 1008 may be referred to as a left T-type, an upper T-type, a right T-type, and a lower T-type. In some exemplary implementations, none of the rectangular partitions of FIG. 10 may be further subdivided. A coding tree depth may be further defined to indicate the partitioning depth from the root node or root block. 
For example, the coding tree depth for a root node or root block of a 128×128 block may be set to 0, and after the root block is further divided one time according to FIG. 10, the coding tree depth increases by 1. In some implementations, only the all-square partitions of 1010 may allow recursive division to the next level of the division tree following the pattern of FIG. 10. In other words, recursive division may not be possible for the square partitions in the T-shaped patterns 1002, 1004, 1006, and 1008. If necessary, the division procedure following FIG. 10 by recursion generates a final set of coding blocks. Such a scheme may be applied to one or more of the color channels. In some implementations, more flexibility may be added to the use of partitions less than 8×8 levels. For example, in some cases, 2×2 chroma inter prediction may be used.

コーディングブロック分割についてのいくつかの他の例示的な実装形態では、ベースブロックまたは中間ブロックを四分木パーティションに分割するために四分木構造が使用されてもよい。そのような四分木分割は、任意の正方形パーティションに階層的かつ再帰的に適用されてもよい。ベースブロックまたは中間ブロックまたはパーティションがさらに四分木分割されるかどうかは、ベースブロックまたは中間ブロック／パーティションの様々なローカル特性に適合してもよい。ピクチャ境界における四分木分割が、さらに適合してもよい。例えば、サイズがピクチャ境界に収まるまでブロックが四分木分割を続けるように、ピクチャ境界で暗黙の四分木分割が行われてもよい。 In some other example implementations of coding block partitioning, a quadtree structure may be used to partition a base block or intermediate block into quadtree partitions. Such quadtree partitioning may be applied hierarchically and recursively to any square partitions. Whether a base block or intermediate block or partition is further quadtree partitioned may be adapted to various local characteristics of the base block or intermediate block/partition. The quadtree partitioning at the picture boundary may be further adapted. For example, an implicit quadtree partitioning may be done at the picture boundary such that a block continues to be quadtree partitioned until its size fits into the picture boundary.
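The implicit quadtree partitioning at a picture boundary can be sketched as below. The function name, the `(x, y, size)` tuple layout, and the handling of a block that is still crossing the boundary at the minimum size are assumptions for illustration.

```python
def implicit_boundary_split(x, y, size, pic_w, pic_h, min_size=4):
    """Quadtree-split a square block until every kept block fits in the picture.

    Returns a list of (x, y, size) blocks. Blocks lying entirely outside the
    picture are dropped.
    """
    if x >= pic_w or y >= pic_h:
        return []  # entirely outside the picture
    if x + size <= pic_w and y + size <= pic_h:
        return [(x, y, size)]  # fits, no forced split needed
    if size <= min_size:
        return [(x, y, size)]  # assumption: keep the block if it cannot be split
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += implicit_boundary_split(x + dx, y + dy, half, pic_w, pic_h, min_size)
    return blocks


# A 64x64 block at x=64 in a hypothetical 96x64 picture crosses the right edge.
blocks = implicit_boundary_split(64, 0, 64, pic_w=96, pic_h=64)
```

In this example the block is split once, and only the two 32×32 blocks inside the picture are kept.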

いくつかの他の例示的な実装形態では、ベースブロックからの階層二分割が使用されてもよい。そのような方式の場合、ベースブロックまたは中間レベルブロックは、2つのパーティションに分割されてもよい。二分割は、水平または垂直のいずれかであってもよい。例えば、水平二分割は、ベースブロックまたは中間ブロックを等しい左右のパーティションに分割してもよい。同様に、垂直二分割は、ベースブロックまたは中間ブロックを等しい上下のパーティションに分割してもよい。そのような二分割は、階層的かつ再帰的であってもよい。二分割方式を続けるべきかどうか、および方式がさらに続く場合、水平二分割が使用されるべきか、垂直二分割が使用されるべきかは、ベースブロックまたは中間ブロックの各々において決定されてもよい。いくつかの実装形態では、さらなる分割は、（一方または両方の次元の）事前定義された最低パーティションサイズで停止してもよい。あるいは、ベースブロックから事前定義された分割レベルまたは深度に達すると、さらなる分割を停止してもよい。いくつかの実装形態では、パーティションのアスペクト比は制限されてもよい。例えば、パーティションのアスペクト比は、1：4よりも小さく（または4：1よりも大きく）なくてもよい。そのため、4：1の垂直対水平アスペクト比を有する垂直ストリップパーティションは、各々が2：1の垂直対水平アスペクト比を有する上下のパーティションに垂直にさらに二分割され得るのみである。 In some other example implementations, a hierarchical bisection from the base block may be used. For such a scheme, the base block or mid-level block may be divided into two partitions. The bisection may be either horizontal or vertical. For example, a horizontal bisection may divide the base block or mid-block into equal left and right partitions. Similarly, a vertical bisection may divide the base block or mid-block into equal top and bottom partitions. Such bisection may be hierarchical and recursive. It may be determined at each of the base block or mid-block whether the bisection scheme should continue, and if the scheme continues further, whether a horizontal or vertical bisection should be used. In some implementations, further division may stop at a predefined minimum partition size (in one or both dimensions). Alternatively, further division may stop once a predefined division level or depth from the base block is reached. In some implementations, the aspect ratio of the partitions may be limited. For example, the aspect ratio of the partitions may not be smaller than 1:4 (or larger than 4:1). Therefore, a vertical strip partition with a 4:1 vertical to horizontal aspect ratio can only be further divided vertically into an upper and lower partition, each with a 2:1 vertical to horizontal aspect ratio.
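The aspect-ratio and minimum-size restrictions on bisection can be sketched as follows. The naming follows the convention of the paragraph above, in which a "horizontal" bisection yields left and right partitions and a "vertical" bisection yields top and bottom partitions. The function names and the default limits are assumptions for illustration.

```python
def ratio_ok(width, height, max_ratio):
    """True if the aspect ratio lies between 1:max_ratio and max_ratio:1."""
    return max(width, height) <= max_ratio * min(width, height)


def allowed_binary_splits(width, height, max_ratio=4, min_size=4):
    """Return the bisections that keep both halves within the size and ratio limits."""
    splits = []
    # "horizontal" bisection: equal left and right partitions (width is halved)
    if width % 2 == 0 and width // 2 >= min_size and ratio_ok(width // 2, height, max_ratio):
        splits.append("horizontal")
    # "vertical" bisection: equal top and bottom partitions (height is halved)
    if height % 2 == 0 and height // 2 >= min_size and ratio_ok(width, height // 2, max_ratio):
        splits.append("vertical")
    return splits


# A vertical strip with a 4:1 vertical-to-horizontal ratio (8 wide, 32 tall).
strip_options = allowed_binary_splits(8, 32)
```

For the 8×32 strip, only the split into top and bottom 8×16 partitions remains, because a left/right split would produce 4×32 partitions with an 8:1 ratio.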

さらにいくつかの他の例では、図13に示すように、ベースブロックまたは任意の中間ブロックを分割するために三分割方式が使用され得る。三元パターンは、図13の1302に示すように垂直に、または図13の1304に示すように水平に実装されてもよい。図13の例示的な分割比は、垂直または水平のいずれかで1：2：1として示されているが、他の比が事前定義されてもよい。いくつかの実装形態では、2つ以上の異なる比が事前定義されてもよい。そのような三分木分割が1つの連続するパーティション内のブロック中心に位置するオブジェクトを取り込むことが可能であるが、四分木および二分木が常にブロック中心に沿って分割しており、したがってオブジェクトを別々のパーティションに分割するという点で、そのような三分割方式は四分木または二分割構造を補完するために使用されてもよい。いくつかの実装形態では、例示的な三分木のパーティションの幅および高さは、さらなる変換を回避するために常に2の累乗である。 In yet some other examples, a three-way splitting scheme may be used to split the base block or any intermediate blocks, as shown in FIG. 13. The ternary pattern may be implemented vertically, as shown at 1302 in FIG. 13, or horizontally, as shown at 1304 in FIG. 13. The example split ratio in FIG. 13 is shown as 1:2:1 either vertically or horizontally, but other ratios may be predefined. In some implementations, two or more different ratios may be predefined. Such a three-way splitting scheme may be used to complement a quadtree or bipartition structure, in that such a ternary tree splitting can capture objects located at block centers in one contiguous partition, while quadtrees and bipartitions always split along block centers, thus splitting objects into separate partitions. In some implementations, the width and height of the partitions of the example ternary tree are always powers of two to avoid further transformations.
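A 1:2:1 three-way split can be sketched as below. The function name and the meaning of the `vertical` argument (sub-blocks placed side by side when true) are assumptions; the description does not specify which orientation corresponds to 1302 or 1304.

```python
def ternary_split(x, y, width, height, vertical):
    """Split a block 1:2:1 and return three (x, y, w, h) partitions.

    vertical=True places the three partitions side by side (the width is
    divided); vertical=False stacks them (the height is divided).
    """
    length = width if vertical else height
    if length % 4 != 0:
        raise ValueError("the split dimension must be divisible by 4 for a 1:2:1 ratio")
    q = length // 4
    if vertical:
        return [(x, y, q, height), (x + q, y, 2 * q, height), (x + 3 * q, y, q, height)]
    return [(x, y, width, q), (x, y + q, width, 2 * q), (x, y + 3 * q, width, q)]


# The middle partition of a 32x32 block spans x = 8..24 and contains the block center.
parts = ternary_split(0, 0, 32, 32, vertical=True)
```

When the split dimension is a power of two and at least 4, all three partitions also have power-of-two dimensions (for example 8, 16, and 8 for a length of 32).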

上記の分割方式は、異なる分割レベルで任意の方法で組み合わされ得る。一例として、上述された四分木および二分割方式は、ベースブロックを四分木－二分木（QTBT）構造に分割するために組み合わされてもよい。そのような方式では、ベースブロックまたは中間ブロック／パーティションは、指定された場合、事前定義された条件のセットに従う、四分木分割または二分割のいずれかであってもよい。特定の例が、図14に示されている。図14の例では、ベースブロックは、1402、1404、1406、および1408によって示すように、最初に4つのパーティションに四分木分割される。その後、結果として得られたパーティションの各々は、（1408などの）4つのさらなるパーティションに四分木分割されるか、または次のレベルで（例えば、両方とも対称である1402もしくは1406などの水平もしくは垂直のいずれかの）2つのさらなるパーティションに二分割されるか、または（1404などの）分割されないかのいずれかである。二分割または四分木分割は、1410の全体的な例示的な分割パターンおよび1420の対応するツリー構造／表現によって示すように、正方形パーティションに対して再帰的に可能にされてもよく、実線は四分木分割を表し、破線は二分割を表す。二分割が水平か垂直かを示すために、二分割ノード（非リーフバイナリパーティション）ごとにフラグが使用されてもよい。例えば、1410の分割構造と一致する1420に示すように、フラグ「0」は水平二分割を表すことができ、フラグ「1」は垂直二分割を表してもよい。四分木分割パーティションの場合、四分木分割は常にブロックまたはパーティションを水平と垂直の両方に分割して等しいサイズの4つのサブブロック／パーティションを生成するので、分割タイプを示す必要はない。いくつかの実装形態では、フラグ「1」は水平二分割を表すことができ、フラグ「0」は垂直二分割を表してもよい。 The above partitioning schemes may be combined in any manner at different partitioning levels. As an example, the quadtree and bipartitioning schemes described above may be combined to partition the base block into a quadtree-binary tree (QTBT) structure. In such a scheme, the base block or intermediate blocks/partitions may be either quadtree-partitioned or bipartitioned, if specified, subject to a set of predefined conditions. A specific example is shown in FIG. 14. In the example of FIG. 14, the base block is first quadtree-partitioned into four partitions, as shown by 1402, 1404, 1406, and 1408. Each of the resulting partitions is then either quadtree-partitioned into four further partitions (such as 1408) or bipartitioned into two further partitions at the next level (e.g., either horizontally or vertically, such as 1402 or 1406, both of which are symmetric), or not partitioned (such as 1404). 
Bisection or quadtree partitioning may be recursively enabled for square partitions, as shown by the overall example partitioning pattern in 1410 and the corresponding tree structure/representation in 1420, where solid lines represent quadtree partitioning and dashed lines represent bisection. A flag may be used for each bisection node (non-leaf binary partition) to indicate whether the bisection is horizontal or vertical. For example, flag "0" may represent horizontal bisection and flag "1" may represent vertical bisection, as shown in 1420, which matches the partitioning structure in 1410. In the case of quadtree partitioning, there is no need to indicate the type of partitioning, since quadtree partitioning always splits a block or partition both horizontally and vertically to generate four sub-blocks/partitions of equal size. In some implementations, flag "1" may represent horizontal bisection and flag "0" may represent vertical bisection.

QTBTのいくつかの例示的な実装形態では、四分木および二元分割ルールセットは、以下の事前定義されたパラメータおよびそれに関連する対応する関数によって表されてもよい。
－CTUサイズ：四分木のルートノードサイズ（ベースブロックのサイズ）
－MinQTSize：最小許容四分木リーフノードサイズ
－MaxBTSize：最大許容二分木ルートノードサイズ
－MaxBTDepth：最大許容二分木深さ
－MinBTSize：最小許容二分木リーフノードサイズ
QTBT分割構造のいくつかの例示的な実装形態では、CTUサイズは、クロマサンプルの2つの対応する64×64ブロックを有する128×128個のルマサンプルとして設定されてもよく（例示的なクロマサブサンプリングが考慮され使用される場合）、MinQTSizeは、16×16として設定されてもよく、MaxBTSizeは、64×64として設定されてもよく、MinBTSize（幅および高さの両方について）は、4×4として設定されてもよく、MaxBTDepthは、4として設定されてもよい。四分木分割は、四分木リーフノードを生成するために、最初にCTUに適用されてもよい。四分木リーフノードは、16×16のその最小許容サイズ（すなわち、MinQTSize）から128×128（すなわち、CTU size）までのサイズを有してもよい。ノードが128×128である場合、サイズがMaxBTSize（すなわち、64×64）を超えるので、二分木によって最初に分割されることはない。そうでない場合、MaxBTSizeを超えないノードは、二分木によって分割される可能性がある。図14の例では、ベースブロックは、128×128である。ベースブロックは、事前定義されたルールセットに従って、四分木分割のみが可能である。ベースブロックは、0の分割深度を有する。結果として得られた4つのパーティションの各々は、MaxBTSizeを超えない64×64であり、レベル1でさらに四分木分割または二分木分割されてもよい。プロセスは続く。二分木深度がMaxBTDepth（すなわち、4）に達すると、それ以上の分割は考慮されなくてもよい。二分木ノードの幅がMinBTSize（すなわち、4）に等しいとき、それ以上の水平分割は考慮されなくてもよい。同様に、二分木ノードの高さがMinBTSizeに等しいとき、それ以上の垂直分割は考慮されない。 In some example implementations of QTBT, the quadtree and binary splitting rule sets may be represented by the following predefined parameters and their associated corresponding functions:
- CTU size: Root node size of the quadtree (size of the base block)
- MinQTSize: Minimum allowable quadtree leaf node size
- MaxBTSize: Maximum allowable binary tree root node size
- MaxBTDepth: Maximum allowable binary tree depth
- MinBTSize: Minimum allowable binary tree leaf node size
In some exemplary implementations of the QTBT partitioning structure, the CTU size may be set as 128x128 luma samples with two corresponding 64x64 blocks of chroma samples (if exemplary chroma subsampling is considered and used), MinQTSize may be set as 16x16, MaxBTSize may be set as 64x64, MinBTSize (for both width and height) may be set as 4x4, and MaxBTDepth may be set as 4. A quadtree partition may be applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from its minimum allowed size of 16x16 (i.e., MinQTSize) to 128x128 (i.e., CTU size). If a node is 128x128, it will not be split first by the binary tree because the size exceeds MaxBTSize (i.e., 64x64). Otherwise, nodes that do not exceed MaxBTSize may be split by binary tree. In the example of FIG. 14, the base block is 128×128. The base block is only capable of quadtree splitting according to a predefined set of rules. The base block has a split depth of 0. Each of the resulting four partitions is 64×64, not exceeding MaxBTSize, and may be further quadtree or binary tree split at level 1. The process continues. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splits may be considered. When the width of a binary tree node is equal to MinBTSize (i.e., 4), no further horizontal splits may be considered. Similarly, when the height of a binary tree node is equal to MinBTSize, no further vertical splits may be considered.
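The QTBT rule set with the example parameter values above can be sketched as a function that lists the splits still permitted for a node. The class and function names and the labels `"QT"`, `"BT_HOR"`, and `"BT_VER"` are hypothetical. The sketch also assumes that quadtree splitting is not resumed once binary splitting has started (`bt_depth > 0`), which is a common QTBT convention but is not stated in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QTBTParams:
    ctu_size: int = 128      # quadtree root node size (luma samples)
    min_qt_size: int = 16    # MinQTSize
    max_bt_size: int = 64    # MaxBTSize
    max_bt_depth: int = 4    # MaxBTDepth
    min_bt_size: int = 4     # MinBTSize (width and height)


def allowed_splits(width, height, bt_depth, params=QTBTParams()):
    """Return the split types permitted for a node of the given size and binary depth."""
    splits = []
    # Quadtree split: square nodes above MinQTSize, before any binary split (assumption).
    if bt_depth == 0 and width == height and width > params.min_qt_size:
        splits.append("QT")
    # Binary splits: node must not exceed MaxBTSize and depth must be below MaxBTDepth.
    if max(width, height) <= params.max_bt_size and bt_depth < params.max_bt_depth:
        if width > params.min_bt_size:
            splits.append("BT_HOR")  # per the text, width == MinBTSize stops horizontal splits
        if height > params.min_bt_size:
            splits.append("BT_VER")  # per the text, height == MinBTSize stops vertical splits
    return splits


root_options = allowed_splits(128, 128, bt_depth=0)   # 128x128 exceeds MaxBTSize
level1_options = allowed_splits(64, 64, bt_depth=0)   # quadtree or binary split
```

The 128×128 base block admits only the quadtree split, while each 64×64 partition at level 1 may be quadtree split or binary split, as in the example of FIG. 14.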

いくつかの例示的な実装形態では、上記のQTBT方式は、ルマおよびクロマが同じQTBT構造または別々のQTBT構造を有するための柔軟性をサポートするように構成されてもよい。例えば、PスライスおよびBスライスの場合、1つのCTU内のルマCTBおよびクロマCTBは同じQTBT構造を共有してもよい。しかしながら、Iスライスの場合、ルマCTBはQTBT構造によってCBに分割されてもよく、クロマCTBは他のQTBT構造によってクロマCBに分割されてもよい。これは、CUがIスライス内の異なる色チャネルを参照するために使用されてもよく、例えば、Iスライス内のCUが、ルマ成分のコーディングブロックまたは2つのクロマ成分のコーディングブロックから構成されてもよく、PスライスまたはBスライス内のCUが、3つの色成分すべてのコーディングブロックから構成されてもよいことを意味する。 In some example implementations, the above QTBT scheme may be configured to support flexibility for luma and chroma to have the same QTBT structure or separate QTBT structures. For example, for P and B slices, the luma CTB and chroma CTBs in one CTU may share the same QTBT structure. However, for I slices, the luma CTB may be divided into CBs by a QTBT structure, and the chroma CTBs may be divided into chroma CBs by another QTBT structure. This means that a CU may be used to refer to different color channels in an I slice, e.g., a CU in an I slice may consist of a coding block of the luma component or coding blocks of the two chroma components, and a CU in a P or B slice may consist of coding blocks of all three color components.

いくつかの他の実装形態では、QTBT方式は、上述された三元方式で補完されてもよい。そのような実装形態は、マルチタイプツリー（MTT）構造と呼ばれる場合がある。例えば、ノードの二分割に加えて、図13の三分割パターンのうちの1つが選択されてもよい。いくつかの実装形態では、正方形ノードのみが三分割を受けることができる。三分割が水平であるか垂直であるかを示すために、追加のフラグが使用されてもよい。 In some other implementations, the QTBT scheme may be complemented with the ternary scheme described above. Such implementations may be referred to as multi-type tree (MTT) structures. For example, in addition to the bisection of the nodes, one of the trisection patterns of FIG. 13 may be selected. In some implementations, only square nodes may undergo trisection. An additional flag may be used to indicate whether the trisection is horizontal or vertical.

QTBT実装形態および三分割によって補完されたQTBT実装形態などの2レベルツリーまたはマルチレベルツリーの設計は、主に複雑性の低減によって動機付けられてもよい。理論的には、ツリーをトラバースする複雑性はT^Dであり、ここで、Tは分割タイプの数を表し、Dはツリーの深度である。深度（D）を低減しながらマルチタイプ（T）を使用することによって、トレードオフが行われてもよい。 The design of two-level or multi-level trees, such as the QTBT implementation and the QTBT implementation complemented by trisection, may be motivated primarily by reducing complexity. In theory, the complexity of traversing a tree is T^D, where T represents the number of partition types and D is the depth of the tree. A trade-off may be made by using multiple types (T) while reducing the depth (D).
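The T^D relationship can be made concrete with a short calculation. The function name and the numeric values are illustrative assumptions and are not taken from the disclosure.

```python
def traversal_complexity(num_split_types: int, depth: int) -> int:
    """Return T**D for T split types and tree depth D."""
    return num_split_types ** depth


# Illustrative values only: fewer types with a deeper tree versus
# more types with a shallower tree.
two_types_deep = traversal_complexity(2, 6)      # T = 2, D = 6
four_types_shallow = traversal_complexity(4, 3)  # T = 4, D = 3
```

Both configurations give the same value of 64, which illustrates how adding split types can be offset by limiting the depth.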

いくつかの実装形態では、CBはさらに分割されてもよい。例えば、CBは、コーディングプロセスおよびデコーディングプロセス中のイントラフレーム予測またはインターフレーム予測を目的として、複数の予測ブロック（PB）にさらに分割され得る。言い換えれば、CBは異なるサブパーティションにさらに分割されてもよく、そこで個々の予測決定／構成が行われてもよい。並行して、CBは、ビデオデータの変換または逆変換が行われるレベルを記述する目的で、複数の変換ブロック（TB）にさらに分割されてもよい。CBのPBおよびTBへの分割方式は、同じである場合もそうでない場合もある。例えば、各分割方式は、例えば、ビデオデータの様々な特性に基づいて独自の手順を使用して行われ得る。PBおよびTBの分割方式は、いくつかの例示的な実装形態では独立していてもよい。PBおよびTBの分割方式および境界は、いくつかの他の例示的な実装形態では相関されていてもよい。いくつかの実装形態では、例えば、TBは、PB分割後に分割されてもよく、特に、各PBは、コーディングブロックの分割の後に続いて決定された後、次いで1つまたは複数のTBにさらに分割されてもよい。例えば、いくつかの実装形態では、PBは、1つ、2つ、4つ、または他の数のTBに分割され得る。 In some implementations, the CB may be further divided. For example, the CB may be further divided into multiple prediction blocks (PBs) for the purpose of intra-frame or inter-frame prediction during the coding and decoding processes. In other words, the CB may be further divided into different sub-partitions, where individual prediction decisions/configurations may be made. In parallel, the CB may be further divided into multiple transform blocks (TBs) for the purpose of describing the level at which the transformation or inverse transformation of the video data is performed. The division scheme of the CB into PBs and TBs may or may not be the same. For example, each division scheme may be performed using a unique procedure based on, for example, various characteristics of the video data. The division scheme of the PBs and TBs may be independent in some exemplary implementations. The division schemes and boundaries of the PBs and TBs may be correlated in some other exemplary implementations. In some implementations, for example, the TBs may be divided after the PB division, and in particular, each PB may be determined following the division of the coding block and then further divided into one or more TBs. For example, in some implementations, the PB may be divided into one, two, four, or other number of TBs.

いくつかの実装形態では、ベースブロックをコーディングブロックに分割し、さらに予測ブロックおよび／または変換ブロックに分割するために、ルマチャネルおよびクロマチャネルは異なって処理されてもよい。例えば、いくつかの実装形態では、コーディングブロックの予測ブロックおよび／または変換ブロックへの分割は、ルマチャネルに対して許容されてもよいが、コーディングブロックの予測ブロックおよび／または変換ブロックへのそのような分割は、クロマチャネルに対して許容されない場合がある。そのような実装形態では、よって、クロマブロックの変換および／または予測は、コーディングブロックレベルでのみ行われ得る。他の例では、ルマチャネルおよびクロマチャネルの最小変換ブロックサイズが異なっていてもよく、例えば、ルマチャネルのコーディングブロックは、クロマチャネルよりも小さい変換ブロックおよび／または予測ブロックに分割されることが許容され得る。さらに他の例では、コーディングブロックの変換ブロックおよび／または予測ブロックへの分割の最大深度がルマチャネルとクロマチャネルとの間で異なっていてもよく、例えば、ルマチャネルのコーディングブロックは、クロマチャネルよりも深い変換ブロックおよび／または予測ブロックに分割されることが許容され得る。具体例として、ルマコーディングブロックは、最大2レベルだけ下がる再帰分割によって表すことができる複数のサイズの変換ブロックに分割されてもよく、正方形、2：1／1：2、および4：1／1：4などの変換ブロック形状、ならびに4×4から64×64の変換ブロックサイズが許容され得る。しかしながら、クロマブロックの場合、ルマブロックに指定された可能な最大の変換ブロックのみが許容されてもよい。 In some implementations, the luma and chroma channels may be processed differently to split the base blocks into coding blocks and further into prediction and/or transform blocks. For example, in some implementations, splitting of coding blocks into prediction and/or transform blocks may be allowed for the luma channel, but such splitting of coding blocks into prediction and/or transform blocks may not be allowed for the chroma channels. In such implementations, the transformation and/or prediction of chroma blocks may thus only be performed at the coding block level. In other examples, the minimum transform block sizes of the luma and chroma channels may be different, e.g., the coding blocks of the luma channel may be allowed to be split into smaller transform and/or prediction blocks than the chroma channels. In yet other examples, the maximum depth of the splitting of coding blocks into transform and/or prediction blocks may be different between the luma and chroma channels, e.g., the coding blocks of the luma channel may be allowed to be split into deeper transform and/or prediction blocks than the chroma channels.
As a specific example, a luma coding block may be divided into transform blocks of multiple sizes that can be represented by a recursive division down by up to two levels, allowing transform block shapes such as square, 2:1/1:2, and 4:1/1:4, as well as transform block sizes from 4x4 to 64x64. However, for chroma blocks, only the largest possible transform block designated for the luma block may be allowed.
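The difference between luma and chroma transform block choices in this specific example can be sketched as follows. The function names are hypothetical, and the sketch assumes that each level of recursive division halves both dimensions; the actual split pattern at each level is not specified in this paragraph.

```python
def luma_transform_block_sizes(cb_width, cb_height, max_levels=2, min_tb=4, max_tb=64):
    """List (width, height) transform block sizes reachable for a luma coding block.

    Assumes each division level halves both dimensions and that transform
    blocks are limited to the range min_tb..max_tb per dimension.
    """
    width, height = min(cb_width, max_tb), min(cb_height, max_tb)
    sizes = [(width, height)]
    for _ in range(max_levels):
        width, height = width // 2, height // 2
        if width < min_tb or height < min_tb:
            break
        sizes.append((width, height))
    return sizes


def chroma_transform_block_size(cb_width, cb_height, max_tb=64):
    """Chroma keeps only the largest transform block size, with no further division."""
    return (min(cb_width, max_tb), min(cb_height, max_tb))


luma_sizes = luma_transform_block_sizes(64, 64)
chroma_size = chroma_transform_block_size(32, 32)
```

Under these assumptions, a 64×64 luma coding block may use 64×64, 32×32, or 16×16 transform blocks, whereas a chroma block is transformed at a single size.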

コーディングブロックをPBに分割するためのいくつかの例示的な実装形態では、PB分割の深度、形状、および／または他の特性は、PBがイントラコーディングされるかインターコーディングされるかに依存してもよい。 In some example implementations for partitioning a coding block into PBs, the depth, shape, and/or other characteristics of the PB partition may depend on whether the PB is intra-coded or inter-coded.

コーディングブロック（または予測ブロック）の変換ブロックへの分割は、四分木分割および事前定義されたパターン分割を含むがそれらに限定されない様々な例示的な方式で、再帰的または非再帰的に、コーディングブロックまたは予測ブロックの境界での変換ブロックをさらに考慮して実施されてもよい。一般に、結果として得られた変換ブロックは、異なる分割レベルにあってもよく、同じサイズでなくてもよく、形状が正方形である必要がなくてもよい（例えば、それらはいくつかの許容されたサイズおよびアスペクト比を有する長方形であり得る）。さらなる例は、図15、図16、および図17に関連して以下でさらに詳細に説明される。 The division of the coding block (or prediction block) into transform blocks may be performed in various exemplary manners, including but not limited to quadtree division and predefined pattern division, recursively or non-recursively, further considering transform blocks at the boundaries of the coding block or prediction block. In general, the resulting transform blocks may be at different division levels, may not be the same size, and need not be square in shape (e.g., they may be rectangular with some allowed size and aspect ratio). Further examples are described in more detail below in connection with Figures 15, 16, and 17.

しかしながら、いくつかの他の実装形態では、上記の分割方式のいずれかを介して取得されたCBは、予測および／または変換のための基本または最小のコーディングブロックとして使用されてもよい。言い換えれば、インター予測／イントラ予測を行う目的で、かつ／または変換の目的で、これ以上の分割は行われない。例えば、上記のQTBT方式から取得されたCBは、予測を行うための単位としてそのまま使用されてもよい。具体的には、そのようなQTBT構造は、複数の分割タイプの概念を取り除く、すなわち、CU、PU、およびTUの分離を取り除き、上述したように、CU／CB分割形状についてのさらなる柔軟性をサポートする。そのようなQTBTブロック構造では、CU／CBは、正方形または長方形のいずれかの形状を有することができる。そのようなQTBTのリーフノードは、これ以上の分割なしに予測および変換処理のための単位として使用される。これは、CU、PU、およびTUがそのような例示的なQTBTコーディングブロック構造において同じブロックサイズを有することを意味する。 However, in some other implementations, the CB obtained via any of the above partitioning schemes may be used as a basic or smallest coding block for prediction and/or transformation. In other words, no further partitioning is performed for the purpose of performing inter/intra prediction and/or transformation. For example, the CB obtained from the above QTBT scheme may be used as it is as a unit for performing prediction. Specifically, such a QTBT structure removes the concept of multiple partition types, i.e., removes the separation of CU, PU, and TU, and supports more flexibility on CU/CB partition shapes as described above. In such a QTBT block structure, the CU/CB can have either a square or rectangular shape. The leaf nodes of such a QTBT are used as units for prediction and transformation processing without further partitioning. This means that the CU, PU, and TU have the same block size in such an exemplary QTBT coding block structure.

上記の様々なCB分割方式、ならびにPBおよび／またはTBへのCBのさらなる分割（PB／TB分割なしを含む）は、任意の方法で組み合わされ得る。以下の特定の実装形態は、非限定的な例として提供される。 The various CB division schemes described above, as well as further division of the CB into PB and/or TB (including no PB/TB division), may be combined in any manner. The following specific implementations are provided as non-limiting examples.

A specific example implementation of coding block and transform block partitioning is described below. In such an example implementation, a base block may be split into coding blocks using recursive quadtree splitting, or using a predefined splitting pattern described above (such as those in FIG. 9 and FIG. 10). At each level, whether further quadtree splitting of a particular partition should continue may be determined by local video data characteristics. The resulting CBs may be at various quadtree splitting levels and of various sizes. The decision on whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction may be made at the CB level (or at the CU level, for all three color channels). Each CB may be further split into one, two, four, or another number of PBs according to a predefined PB splitting type. Inside one PB, the same prediction process may be applied, and the relevant information may be transmitted to the decoder on a PB basis. After the residual block is obtained by applying the prediction process based on the PB splitting type, a CB can be partitioned into TBs according to another quadtree structure similar to the coding tree for the CB. In this particular implementation, a CB or a TB need not be limited to a square shape. Further, in this particular example, a PB may be square or rectangular for inter prediction and may only be square for intra prediction. A coding block may be split into, for example, four square TBs. Each TB may be further split recursively (using quadtree splitting) into smaller TBs, forming what is referred to as a residual quadtree (RQT).

Another example implementation for partitioning a base block into CBs, PBs, and/or TBs is further described below. For example, rather than using multiple partition unit types such as those shown in FIG. 9 or FIG. 10, a quadtree with a nested multi-type tree using binary and ternary split segmentation structures (e.g., the QTBT, or the QTBT with ternary splitting, described above) may be used. The separation of the CB, PB, and TB (i.e., the partitioning of a CB into PBs and/or TBs, and the partitioning of a PB into TBs) may be abandoned, except where needed for a CB whose size is too large for the maximum transform length, in which case the CB requires further splitting. This example partitioning scheme may be designed to support more flexibility in CB partition shapes, so that prediction and transform can both be performed at the CB level without further partitioning. In such a coding tree structure, a CB may have either a square or a rectangular shape. Specifically, a coding tree block (CTB) may first be partitioned by a quadtree structure. The quadtree leaf nodes may then be further partitioned by a nested multi-type tree structure. An example of the nested multi-type tree structure using binary or ternary splitting is shown in FIG. 11. Specifically, the example multi-type tree structure of FIG. 11 includes four splitting types, referred to as vertical binary splitting (SPLIT_BT_VER) (1102), horizontal binary splitting (SPLIT_BT_HOR) (1104), vertical ternary splitting (SPLIT_TT_VER) (1106), and horizontal ternary splitting (SPLIT_TT_HOR) (1108). The CBs then correspond to the leaves of the multi-type tree. In this example implementation, unless a CB is too large for the maximum transform length, this segmentation is used for both prediction and transform processing without any further partitioning. This means that, in most cases, the CB, PB, and TB have the same block size in the quadtree with nested multi-type tree coding block structure. The exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CB. In some implementations, in addition to binary or ternary splitting, the nested patterns of FIG. 11 may further include quadtree splitting.
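The four multi-type tree split types named above can be sketched as a small function that returns the sub-blocks of one split. This is only an illustrative sketch: the `Block` tuple layout and the `split_block` name are hypothetical, and the 1:2:1 proportions used for the ternary splits are an assumption, since the text above does not state the ratios.

```python
from typing import List, Tuple

# Hypothetical block representation: (x, y, width, height) in samples.
Block = Tuple[int, int, int, int]


def split_block(block: Block, split_type: str) -> List[Block]:
    """Return the sub-blocks produced by one multi-type tree split."""
    x, y, w, h = block
    if split_type == "SPLIT_BT_VER":  # vertical binary split: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if split_type == "SPLIT_BT_HOR":  # horizontal binary split: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split_type == "SPLIT_TT_VER":  # vertical ternary split, ASSUMED 1:2:1 widths
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if split_type == "SPLIT_TT_HOR":  # horizontal ternary split, ASSUMED 1:2:1 heights
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split type: {split_type}")
```

Applying `split_block` recursively to the leaves of a quadtree would give a nested structure of the kind described above, with each final leaf playing the role of a CB.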

One specific example of the quadtree with nested multi-type tree coding block structure of block partitioning (including quadtree, binary, and ternary splitting options) for one base block is shown in FIG. 12. In more detail, FIG. 12 shows that the base block 1200 is quadtree split into four square partitions 1202, 1204, 1206, and 1208. A decision to further use the multi-type tree structure of FIG. 11 and the quadtree for further splitting is made for each of the quadtree-split partitions. In the example of FIG. 12, partition 1204 is not further split. Partitions 1202 and 1208 each adopt another quadtree split. For partition 1202, the second-level quadtree-split top-left, top-right, bottom-left, and bottom-right partitions adopt third-level splitting of quadtree, horizontal binary splitting 1104 of FIG. 11, no splitting, and horizontal ternary splitting 1108 of FIG. 11, respectively. Partition 1208 adopts another quadtree split, and its second-level quadtree-split top-left, top-right, bottom-left, and bottom-right partitions adopt third-level splitting of vertical ternary splitting 1106 of FIG. 11, no splitting, no splitting, and horizontal binary splitting 1104 of FIG. 11, respectively. Two of the sub-partitions of the third-level top-left partition of 1208 are further split according to horizontal binary splitting 1104 and horizontal ternary splitting 1108 of FIG. 11, respectively. Partition 1206 adopts a second-level split pattern following vertical binary splitting 1102 of FIG. 11 into two partitions, which are further split at the third level according to horizontal ternary splitting 1108 and vertical binary splitting 1102 of FIG. 11. A fourth-level split is further applied to one of them according to horizontal binary splitting 1104 of FIG. 11.

In the specific example above, the maximum luma transform size may be 64×64, and the maximum supported chroma transform size may differ from that of luma, for example 32×32. Even though the example CBs of FIG. 12 above are generally not further split into smaller PBs and/or TBs, when the width or height of a luma or chroma coding block is larger than the maximum transform width or height, the luma or chroma coding block may be automatically split in the horizontal and/or vertical direction to meet the transform size restriction in that direction.

In the specific example above of partitioning a base block into CBs, and as described above, the coding tree scheme may support the ability for luma and chroma to have separate block tree structures. For example, for P and B slices, the luma and chroma CTBs in one CTU may share the same coding tree structure. For I slices, for example, luma and chroma may have separate coding block tree structures. When separate block tree structures are applied, the luma CTB may be partitioned into luma CBs by one coding tree structure, and the chroma CTBs may be partitioned into chroma CBs by another coding tree structure. This means that a CU in an I slice may consist of a coding block of the luma component or coding blocks of the two chroma components, whereas a CU in a P or B slice always consists of coding blocks of all three color components unless the video is monochrome.

When a coding block is further partitioned into multiple transform blocks, the transform blocks therein may be ordered in the bitstream following various orders or scanning manners. Example implementations for partitioning a coding block or prediction block into transform blocks, and the coding order of the transform blocks, are described in further detail below. In some example implementations, as described above, transform partitioning may support transform blocks of multiple shapes, e.g., 1:1 (square), 1:2/2:1, and 1:4/4:1, with transform block sizes ranging from, e.g., 4×4 to 64×64. In some implementations, if the coding block is smaller than or equal to 64×64, transform block partitioning may apply only to the luma component, such that for chroma blocks the transform block size is identical to the coding block size. Otherwise, if the coding block width or height is greater than 64, both the luma and chroma coding blocks may be implicitly split into multiples of min(W, 64)×min(H, 64) and min(W, 32)×min(H, 32) transform blocks, respectively.

In some example implementations of transform block partitioning, for both intra-coded and inter-coded blocks, a coding block may be further partitioned into multiple transform blocks with a partitioning depth of up to a predefined number of levels (e.g., two levels). The transform block partitioning depth and sizes may be related. For some example implementations, the mapping from the transform size at the current depth to the transform size at the next depth is shown in Table 1 below.

Based on the example mapping of Table 1, for a 1:1 square block, the next-level transform split may create four 1:1 square sub-transform blocks. Transform partitioning may stop, for example, at 4×4. As such, a transform size of 4×4 at the current depth corresponds to the same size of 4×4 at the next depth. In the example of Table 1, for a 1:2/2:1 non-square block, the next-level transform split may create two 1:1 square sub-transform blocks, whereas for a 1:4/4:1 non-square block, the next-level transform split may create two 1:2/2:1 sub-transform blocks.
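The mapping rules just described can be sketched as a function from a transform size at the current depth to the size at the next depth. This sketch encodes only the rules stated in the paragraph above (square blocks halve in both dimensions, 1:2/2:1 and 1:4/4:1 blocks halve their longer side, and splitting stops at 4×4); it does not reproduce Table 1 itself, and the function name is hypothetical.

```python
from typing import Tuple


def next_depth_transform_size(width: int, height: int) -> Tuple[int, int]:
    """Map a transform size at the current depth to the size at the next depth."""
    if width == 4 and height == 4:
        # Splitting stops at 4x4, so the size is unchanged.
        return (4, 4)
    if width == height:
        # 1:1 square block: four square sub-blocks, each half as wide and half as tall.
        return (width // 2, height // 2)
    long_side, short_side = max(width, height), min(width, height)
    if long_side in (2 * short_side, 4 * short_side):
        # 1:2/2:1 becomes two squares; 1:4/4:1 becomes two 1:2/2:1 blocks.
        # In both cases only the longer side is halved.
        if width > height:
            return (width // 2, height)
        return (width, height // 2)
    raise ValueError(f"unsupported transform shape: {width}x{height}")
```

For instance, `next_depth_transform_size(64, 16)` yields `(32, 16)`, a 2:1 block, and applying the function again yields `(16, 16)`.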

In some example implementations, for the luma component of an intra-coded block, additional restrictions may be applied to transform block partitioning. For example, for each level of transform partitioning, all the sub-transform blocks may be restricted to having equal size. For example, for a 32×16 coding block, a level 1 transform split creates two 16×16 sub-transform blocks, and a level 2 transform split creates eight 8×8 sub-transform blocks. In other words, the second-level split must be applied to all first-level sub-blocks to keep the transform units at equal sizes. An example of transform block partitioning for an intra-coded square block following Table 1 is shown in FIG. 15, together with the coding order illustrated by the arrows. Specifically, 1502 shows the square coding block. A first-level split into four equal-sized transform blocks according to Table 1 is shown in 1504, with the coding order indicated by the arrows. A second-level split of all of the first-level equal-sized blocks into 16 equal-sized transform blocks according to Table 1 is shown in 1506, with the coding order indicated by the arrows.

In some example implementations, for the luma component of an inter-coded block, the above restriction for intra coding may not apply. For example, after the first level of transform splitting, any one of the sub-transform blocks may be further split independently by one more level. The resulting transform blocks thus may or may not be of the same size. An example split of an inter-coded block into transform blocks, together with their coding order, is shown in FIG. 16. In the example of FIG. 16, the inter-coded block 1602 is split into transform blocks at two levels according to Table 1. At the first level, the inter-coded block is split into four transform blocks of equal size. Then only one of the four transform blocks (not all of them) is further split into four sub-transform blocks, resulting in a total of seven transform blocks of two different sizes, as shown by 1604. An example coding order of these seven transform blocks is shown by the arrows in 1604 of FIG. 16.
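The difference between the intra restriction (every first-level block is split again) and the inter case (any subset may be split again) can be illustrated with a small two-level quadtree sketch for a square block. The function names are hypothetical, and the raster ordering of the output is an assumption made for illustration; it is not meant to reproduce the coding order drawn in FIG. 15 or FIG. 16.

```python
from typing import Iterable, List, Tuple

# Hypothetical block representation: (x, y, width, height) in samples.
Block = Tuple[int, int, int, int]


def quad_split(block: Block) -> List[Block]:
    """Split a block into four equal quadrants (top-left, top-right, bottom-left, bottom-right)."""
    x, y, w, h = block
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh), (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]


def two_level_split(block: Block, split_further: Iterable[int]) -> List[Block]:
    """Split once, then split again only the first-level quadrants whose index is listed."""
    chosen = set(split_further)
    result: List[Block] = []
    for index, sub_block in enumerate(quad_split(block)):
        if index in chosen:
            result.extend(quad_split(sub_block))  # second-level split of this quadrant
        else:
            result.append(sub_block)              # quadrant kept as one transform block
    return result
```

With a 64×64 block, `two_level_split(block, [0])` returns seven transform blocks of two sizes, matching the inter example above, while `two_level_split(block, [0, 1, 2, 3])` returns 16 equal blocks, matching the uniform intra case.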

In some example implementations, for the chroma components, some additional restrictions on transform blocks may apply. For example, for the chroma components, the transform block size may be as large as the coding block size, but not smaller than a predefined size, e.g., 8×8.

In some other example implementations, for a coding block with either width (W) or height (H) greater than 64, both the luma and chroma coding blocks may be implicitly split into multiples of min(W, 64)×min(H, 64) and min(W, 32)×min(H, 32) transform units, respectively. Here, in the present disclosure, "min(a, b)" returns the smaller of a and b.
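The implicit split into min(W, max)×min(H, max) transform units can be sketched as a simple tiling. This is a minimal illustration with a hypothetical function name; it assumes that W and H are the dimensions of the block passed in and that they are whole multiples of the resulting unit size (as with power-of-two block sizes).

```python
from typing import List, Tuple


def implicit_transform_units(width: int, height: int, max_size: int) -> List[Tuple[int, int, int, int]]:
    """Tile a coding block with min(W, max_size) x min(H, max_size) transform units.

    Returns (x, y, unit_width, unit_height) tuples in raster order.
    Assumes width and height are multiples of the unit size.
    """
    unit_w = min(width, max_size)
    unit_h = min(height, max_size)
    return [
        (x, y, unit_w, unit_h)
        for y in range(0, height, unit_h)
        for x in range(0, width, unit_w)
    ]
```

For example, a 128×64 block with `max_size=64` is covered by two 64×64 units, and a 128×128 block with `max_size=32` is covered by sixteen 32×32 units.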

FIG. 17 further shows another alternative example scheme for partitioning a coding block or prediction block into transform blocks. As shown in FIG. 17, instead of using recursive transform partitioning, a predefined set of partitioning types may be applied to a coding block according to the transform type of the coding block. In the particular example shown in FIG. 17, one of six example partitioning types may be applied to split a coding block into various numbers of transform blocks. Such a scheme of generating transform block partitioning may be applied to either a coding block or a prediction block.

In more detail, the partitioning scheme of FIG. 17 provides up to six example partition types for any given transform type (where transform type refers to the type of, e.g., the primary transform, such as ADST). In this scheme, every coding block or prediction block may be assigned a transform partition type based on, for example, a rate-distortion cost. In an example, the transform partition type assigned to the coding block or prediction block may be determined based on the transform type of the coding block or prediction block. A particular transform partition type may correspond to a transform block split size and pattern, as shown by the six transform partition types illustrated in FIG. 17. A correspondence relationship between various transform types and the various transform partition types may be predefined. An example is shown below, with the capitalized labels indicating the transform partition types that may be assigned to the coding block or prediction block based on rate-distortion cost:

- PARTITION_NONE: Assigns a transform size that is equal to the block size.

- PARTITION_SPLIT: Assigns a transform size that is 1/2 the width and 1/2 the height of the block size.

- PARTITION_HORZ: Assigns a transform size with the same width as the block size and 1/2 the height of the block size.

- PARTITION_VERT: Assigns a transform size with 1/2 the width of the block size and the same height as the block size.

- PARTITION_HORZ4: Assigns a transform size with the same width as the block size and 1/4 the height of the block size.

- PARTITION_VERT4: Assigns a transform size with 1/4 the width of the block size and the same height as the block size.

In the example above, the transform partition types shown in FIG. 17 all contain uniform transform sizes for the partitioned transform blocks. This is merely an example and not a limitation. In some other implementations, mixed transform block sizes may be used for the partitioned transform blocks in a particular partition type (or pattern).
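The six partition types listed above can be summarized as width and height divisors applied to the block size. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the block count assumes the uniform-size case described above, not the mixed-size variants.

```python
from typing import Dict, Tuple

# (width divisor, height divisor) for each transform partition type listed above.
PARTITION_DIVISORS: Dict[str, Tuple[int, int]] = {
    "PARTITION_NONE": (1, 1),
    "PARTITION_SPLIT": (2, 2),
    "PARTITION_HORZ": (1, 2),
    "PARTITION_VERT": (2, 1),
    "PARTITION_HORZ4": (1, 4),
    "PARTITION_VERT4": (4, 1),
}


def transform_size(block_width: int, block_height: int, partition_type: str) -> Tuple[int, int]:
    """Return the transform size assigned by a partition type for a given block size."""
    div_w, div_h = PARTITION_DIVISORS[partition_type]
    return (block_width // div_w, block_height // div_h)


def transform_block_count(partition_type: str) -> int:
    """Number of equal-sized transform blocks covering the block (uniform case only)."""
    div_w, div_h = PARTITION_DIVISORS[partition_type]
    return div_w * div_h
```

For a 64×64 block, `transform_size(64, 64, "PARTITION_HORZ4")` gives `(64, 16)`, and `transform_block_count("PARTITION_HORZ4")` gives four such blocks stacked vertically.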

A PB (or a CB, also referred to as a PB when it is not further partitioned into prediction blocks) obtained from any of the partitioning schemes above may become an individual block for coding via either intra prediction or inter prediction. For inter prediction at a current PB, a residual between the current block and a prediction block may be generated, coded, and included in the coded bitstream.

Inter prediction may be implemented, for example, in a single-reference mode or a compound-reference mode. In some implementations, a skip flag may first be included in the bitstream for the current block (or at a higher level) to indicate whether the current block is inter-coded and is not to be skipped. If the current block is inter-coded, another flag may further be included in the bitstream as a signal indicating whether the single-reference mode or the compound-reference mode is used for the prediction of the current block. In the single-reference mode, one reference block may be used to generate the prediction block for the current block. In the compound-reference mode, two or more reference blocks may be used to generate the prediction block by, for example, weighted average. The compound-reference mode may also be referred to as a more-than-one-reference mode, a two-reference mode, or a multiple-reference mode. A reference block or reference blocks may be identified using a reference frame index or indices, and additionally using a corresponding motion vector or motion vectors that indicate the shift(s) in location between the reference block(s) and the current block, e.g., in horizontal and vertical pixels. For example, the inter-prediction block for the current block may be generated from a single reference block identified by one motion vector in a reference frame as the prediction block in the single-reference mode, whereas in the compound-reference mode, the prediction block may be generated by a weighted average of two reference blocks in two reference frames indicated by two reference frame indices and two corresponding motion vectors. The motion vector(s) may be coded and included in the bitstream in various manners.
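The weighted-average combination of reference blocks mentioned above can be sketched as follows. This is a minimal illustration with a hypothetical function name: integer weights with round-to-nearest division are an assumption made here, and the text above does not specify how weights are chosen or how rounding is performed.

```python
from typing import List, Sequence

SampleBlock = List[List[int]]  # rows of sample values


def compound_prediction(ref_blocks: Sequence[SampleBlock], weights: Sequence[int]) -> SampleBlock:
    """Combine equally sized reference blocks into one prediction block by weighted average.

    Uses integer weights and rounds to nearest (ASSUMED rounding rule).
    A single reference block with weight 1 reproduces single-reference prediction.
    """
    total = sum(weights)
    rows, cols = len(ref_blocks[0]), len(ref_blocks[0][0])
    return [
        [
            (sum(w * blk[r][c] for w, blk in zip(weights, ref_blocks)) + total // 2) // total
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

With two reference blocks and weights `[1, 1]` this is a plain average; unequal weights such as `[3, 1]` bias the prediction toward the first reference.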

In some implementations, an encoding or decoding system may maintain a decoded picture buffer (DPB). Some images/pictures may be maintained in the DPB waiting to be displayed (in a decoding system), and some images/pictures in the DPB may be used as reference frames to enable inter prediction (in a decoding system or an encoding system). In some implementations, the reference frames in the DPB may be tagged as either short-term references or long-term references for the current image being encoded or decoded. For example, a short-term reference frame may include a frame that is used for inter prediction of blocks in the current frame or in a predefined number (e.g., two) of subsequent video frames closest to the current frame in decoding order. A long-term reference frame may include a frame in the DPB that can be used to predict image blocks in frames that are more than the predefined number of frames away from the current frame in decoding order. Information about such tags for short-term and long-term reference frames may be referred to as a reference picture set (RPS) and may be added to the header of each frame in the encoded bitstream. Each frame in the encoded video stream may be identified by a picture order counter (POC), which is numbered according to the playback sequence, either in an absolute manner or relative to a picture group starting from, for example, an I-frame.

In some example implementations, one or more reference picture lists containing identification of short-term and long-term reference frames for inter prediction may be formed based on the information in the RPS. For example, a single picture reference list, denoted as the L0 reference (or reference list 0), may be formed for unidirectional inter prediction, whereas two picture reference lists, denoted as L0 (or reference list 0) and L1 (or reference list 1), one for each of the two prediction directions, may be formed for bidirectional inter prediction. The reference frames included in the L0 and L1 lists may be ordered in various predetermined manners. The lengths of the L0 and L1 lists may be signaled in the video bitstream. Unidirectional inter prediction may be in the single-reference mode, or in the compound-reference mode when the multiple references used for generating the prediction block by weighted average in the compound prediction mode are on the same side of the block to be predicted. Bidirectional inter prediction may only be in the compound mode, in that bidirectional inter prediction involves at least two reference blocks.

In some implementations, a merge mode (MM) for inter prediction may be implemented. In general, for merge mode, the motion vector in single-reference prediction, or one or more of the motion vectors in compound-reference prediction, for a current PB may be derived from other motion vector(s) rather than being computed and signaled independently. For example, in an encoding system, the current motion vector for the current PB may be represented by a difference between the current motion vector and one or more other already encoded motion vectors (referred to as reference motion vectors). Such a difference in motion vector, rather than the entirety of the current motion vector, may be encoded and included in the bitstream and may be linked to the reference motion vector. Correspondingly, in a decoding system, the motion vector corresponding to the current PB may be derived based on the decoded motion vector difference and the decoded reference motion vector linked with it. As a specific form of the general merge mode (MM) inter prediction, such inter prediction based on motion vector difference may be referred to as merge mode with motion vector difference (MMVD). MM in general, or MMVD in particular, may thus be implemented to exploit correlations between motion vectors associated with different PBs to improve coding efficiency. For example, neighboring PBs may have similar motion vectors, and thus the MVD may be small and can be coded efficiently. As another example, motion vectors may be correlated temporally (between frames) for similarly located/positioned blocks in space.
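The relationship between a current motion vector, a reference motion vector, and the signaled difference can be sketched as a simple round trip. The function names and the tuple representation of a motion vector are hypothetical; entropy coding of the difference is outside the scope of this sketch.

```python
from typing import Tuple

MotionVector = Tuple[int, int]  # (horizontal, vertical) components


def encode_mvd(current_mv: MotionVector, reference_mv: MotionVector) -> MotionVector:
    """Encoder side: represent the current MV as a difference from the reference MV."""
    return (current_mv[0] - reference_mv[0], current_mv[1] - reference_mv[1])


def decode_mv(reference_mv: MotionVector, mvd: MotionVector) -> MotionVector:
    """Decoder side: recover the current MV from the reference MV and the decoded MVD."""
    return (reference_mv[0] + mvd[0], reference_mv[1] + mvd[1])
```

When neighboring blocks move similarly, the difference is small: a current vector of `(33, -7)` against a reference of `(32, -8)` gives an MVD of `(1, 1)`, which is cheaper to code than the full vector.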

In some example implementations, an MM flag may be included in the bitstream during the encoding process to indicate whether the current PB is in merge mode. Additionally or alternatively, an MMVD flag may be included during the encoding process and signaled in the bitstream to indicate whether the current PB is in MMVD mode. The MM and/or MMVD flags or indicators may be provided at the PB level, the CB level, the CU level, the CTB level, the CTU level, the slice level, the picture level, and the like. In a particular example, both an MM flag and an MMVD flag may be included for a current CU, and the MMVD flag may be signaled right after the skip flag and the MM flag to specify whether the MMVD mode is used for the current CU.

In some example implementations of MMVD, a list of reference motion vector (RMV) or MV predictor candidates for motion vector prediction may be formed for a block being predicted. The list of RMV candidates may contain a predetermined number (e.g., two) of MV predictor candidate blocks whose motion vectors may be used for predicting the current motion vector. The RMV candidate blocks may include blocks selected from neighboring blocks in the same frame and/or temporal blocks (e.g., identically located blocks in preceding or subsequent frames of the current frame). These options represent blocks at spatial or temporal locations relative to the current block that are likely to have motion vectors similar or identical to that of the current block. The size of the list of MV predictor candidates may be predetermined. For example, the list may contain two or more candidates. To be on the list of RMV candidates, a candidate block may, for example, be required to have the same reference frame (or frames) as the current block, must exist (e.g., a boundary check needs to be performed when the current block is near the edge of the frame), and must already be encoded during the encoding process and/or already decoded during the decoding process. In some implementations, the list of merge candidates may first be populated with spatially neighboring blocks (scanned in a particular predefined order) if they are available and meet the conditions above, and then with temporal blocks if space is still available in the list. The neighboring RMV candidate blocks may be selected, for example, from the blocks to the left of and above the current block. The list of RMV predictor candidates may be dynamically formed at various levels (sequence, picture, frame, slice, superblock, etc.) as a dynamic reference list (DRL). The DRL may be signaled in the bitstream.
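The candidate list construction described above (spatial neighbors first, then temporal blocks, subject to the eligibility conditions) can be sketched as follows. The `Candidate` class, its fields, and the `build_rmv_list` function are hypothetical names; in particular, the single `available` flag is an assumed simplification that stands in for both the existence (boundary) check and the already-coded check.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    mv: Tuple[int, int]  # motion vector of the candidate block
    ref_frame: int       # reference frame index used by the candidate block
    available: bool      # ASSUMED flag: block exists in the frame and is already coded


def build_rmv_list(
    spatial: Sequence[Candidate],
    temporal: Sequence[Candidate],
    current_ref_frame: int,
    max_candidates: int = 2,
) -> List[Tuple[int, int]]:
    """Fill the RMV list with eligible spatial candidates first, then temporal ones."""
    rmv_list: List[Tuple[int, int]] = []
    # Spatial candidates are scanned first, in the order given; temporal ones follow.
    for candidate in list(spatial) + list(temporal):
        if len(rmv_list) >= max_candidates:
            break  # the list has a predetermined size
        if not candidate.available:
            continue  # missing (e.g., outside the frame) or not yet coded
        if candidate.ref_frame != current_ref_frame:
            continue  # must share the current block's reference frame
        rmv_list.append(candidate.mv)
    return rmv_list
```

An encoder could then signal which entry it chose as an index into this list; with two entries, a one-bit flag is enough.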

In some implementations, the actual MV predictor candidate used as the reference motion vector for predicting the motion vector of the current block may be signaled. If the RMV candidate list contains two candidates, a one-bit flag, referred to as a merge candidate flag, may be used to indicate the selection of the reference merge candidate. For a current block predicted in compound mode, each of the multiple motion vectors predicted using an MV predictor may be associated with a reference motion vector from the merge candidate list. The encoder may determine which RMV candidate more closely predicts the current coding block and signal the selection as an index into the DRL.

In some example implementations of MMVD, after an RMV candidate is selected and used as the base motion vector predictor for the motion vector to be predicted, a motion vector difference (MVD, or delta MV, representing the difference between the motion vector to be predicted and the reference candidate motion vector) may be calculated in the encoding system. Such an MVD may include information representing the magnitude of the MV difference and the direction of the MV difference, both of which may be signaled in the bitstream. The magnitude and the direction of the MV difference may be signaled in various ways.

In some example implementations of MMVD, a distance index may be used to specify the magnitude information of the motion vector difference and to indicate one of a set of predefined offsets, each representing a predefined motion vector difference from the starting point (the reference motion vector). The MV offset corresponding to the signaled index may then be added to either the horizontal or the vertical component of the starting (reference) motion vector. Whether the horizontal or the vertical component of the reference motion vector is offset may be determined by the direction information of the MVD. An example predefined relationship between the distance index and the predefined offsets is specified in Table 2.

In some example implementations of MMVD, a direction index may further be signaled and used to represent the direction of the MVD relative to the reference motion vector. In some implementations, the direction may be restricted to either the horizontal or the vertical direction. An example 2-bit direction index is shown in Table 3. In the example of Table 3, the interpretation of the MVD may vary according to information about the starting/reference MV. For example, when the starting/reference MV corresponds to a uni-prediction block, or to a bi-prediction block with both reference frame lists pointing to the same side of the current picture (i.e., the POCs of the two reference pictures are both larger than the POC of the current picture, or both smaller than the POC of the current picture), the sign in Table 3 may specify the sign (direction) of the MV offset added to the starting/reference MV. When the starting/reference MV corresponds to a bi-prediction block with the two reference pictures on different sides of the current picture (i.e., the POC of one reference picture is larger than the POC of the current picture and the POC of the other reference picture is smaller than the POC of the current picture), and the difference between the reference POC in picture reference list 0 and the current frame is greater than the difference between the reference POC in picture reference list 1 and the current frame, the sign in Table 3 may specify the sign of the MV offset added to the reference MV corresponding to the reference picture in picture reference list 0, and the offset for the MV corresponding to the reference picture in picture reference list 1 may have the opposite sign. Otherwise, if the difference between the reference POC in picture reference list 1 and the current frame is greater than the difference between the reference POC in picture reference list 0 and the current frame, the sign in Table 3 may specify the sign of the MV offset added to the reference MV associated with picture reference list 1, and the offset for the reference MV associated with picture reference list 0 may have the opposite sign.
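The offset selection and the POC-dependent sign rule described above can be sketched as follows. Because Table 2 and Table 3 are not reproduced in this section, `OFFSETS_QPEL` and `DIRECTIONS` are hypothetical stand-ins, and all function names are illustrative. The text does not say which list keeps the signaled sign when both POC differences are equal; this sketch assumes list 0 does.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]

# Hypothetical stand-ins for Table 2 and Table 3 (assumed values, quarter-pel units).
OFFSETS_QPEL = [1, 2, 4, 8, 16, 32, 64, 128]
DIRECTIONS = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}  # (x sign, y sign)


def mmvd_offset(distance_idx: int, direction_idx: int) -> MV:
    """Offset applied to exactly one component, chosen by the direction index."""
    step = OFFSETS_QPEL[distance_idx]
    sign_x, sign_y = DIRECTIONS[direction_idx]
    return (sign_x * step, sign_y * step)


def _add(mv: MV, offset: MV, sign: int = 1) -> MV:
    return (mv[0] + sign * offset[0], mv[1] + sign * offset[1])


def apply_offset(offset: MV, poc_cur: int,
                 mv_l0: Optional[MV] = None, poc_l0: Optional[int] = None,
                 mv_l1: Optional[MV] = None, poc_l1: Optional[int] = None
                 ) -> Tuple[Optional[MV], Optional[MV]]:
    """Return the offset MVs for list 0 and list 1 (None if the list is unused)."""
    if mv_l0 is None or mv_l1 is None:
        # Uni-prediction: the offset is added to the one available MV.
        return (_add(mv_l0, offset) if mv_l0 is not None else None,
                _add(mv_l1, offset) if mv_l1 is not None else None)
    d0, d1 = poc_l0 - poc_cur, poc_l1 - poc_cur
    if d0 * d1 > 0:
        # Both references on the same side: same sign for both lists.
        return _add(mv_l0, offset), _add(mv_l1, offset)
    if abs(d0) >= abs(d1):
        # List 0 is farther (or equal, by assumption): list 1 gets the opposite sign.
        return _add(mv_l0, offset), _add(mv_l1, offset, -1)
    # List 1 is farther: list 0 gets the opposite sign.
    return _add(mv_l0, offset, -1), _add(mv_l1, offset)
```

The sketch deliberately omits any POC-based scaling of the offset so that only the sign rule is shown.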

In some example implementations, the MVD may be scaled according to the POC difference in each direction. If the POC differences in both lists are the same, no scaling is needed. Otherwise, if the POC difference in reference list 0 is larger than that of reference list 1, the MVD for reference list 1 is scaled. If the POC difference of reference list 1 is greater than that of list 0, the MVD for list 0 may be scaled in the same way. If the starting MV is uni-predicted, the MVD is added to the available or reference MV.

In some example implementations of MVD coding and signaling for bidirectional compound prediction, in addition to or as an alternative to coding and signaling the two MVDs separately, symmetric MVD coding may be implemented such that only one MVD needs to be signaled and the other MVD may be derived from the signaled MVD. In such implementations, motion information including the reference picture indices of both list 0 and list 1 is signaled. However, only the MVD associated with, for example, reference list 0 is signaled, and the MVD associated with reference list 1 is not signaled but derived. Specifically, at the slice level, a flag referred to as "mvd_l1_0_flag" may be included in the bitstream to indicate whether reference list 1 is not signaled in the bitstream. If this flag is 1, indicating that reference list 1 is equal to zero (and thus not signaled), a bidirectional prediction flag referred to as "BiDirPredFlag" may be set to 0, meaning that there is no bidirectional prediction. Otherwise, if mvd_l1_0_flag is zero, BiDirPredFlag may be set to 1 if the nearest reference picture in list 0 and the nearest reference picture in list 1 form a forward and backward pair of reference pictures or a backward and forward pair of reference pictures, and both the list 0 and list 1 reference pictures are short-term reference pictures. Otherwise, BiDirPredFlag is set to 0. A BiDirPredFlag of 1 may indicate that a symmetric mode flag is additionally signaled in the bitstream. The decoder may extract the symmetric mode flag from the bitstream when BiDirPredFlag is 1. The symmetric mode flag may be signaled, for example, at the CU level (if needed), and may indicate whether the symmetric MVD coding mode is used for the corresponding CU. A symmetric mode flag of 1 indicates that the symmetric MVD coding mode is used, that only the reference picture indices of both list 0 and list 1 (referred to as "mvp_l0_flag" and "mvp_l1_flag") are signaled together with the MVD associated with list 0 (referred to as "MVD0"), and that the other motion vector difference, "MVD1", is to be derived rather than signaled. For example, MVD1 may be derived as -MVD0. Thus, only one MVD is signaled in the example symmetric MVD mode.

In some other example implementations for MV prediction, a harmonized scheme may be used to implement the general merge mode, MMVD, and some other types of MV prediction, for both single-reference mode and compound-reference mode MV prediction. Various syntax elements may be used to signal the manner in which the MV for the current block is predicted.
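Returning to the symmetric MVD coding described above, the BiDirPredFlag rule and the derivation MVD1 = -MVD0 can be sketched as follows. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and a "forward and backward pair" is modeled here as the two nearest reference pictures lying on opposite sides of the current picture in POC order.

```python
from typing import Tuple


def derive_bidir_pred_flag(mvd_l1_0_flag: int, poc_cur: int,
                           poc_l0_nearest: int, poc_l1_nearest: int,
                           l0_short_term: bool, l1_short_term: bool) -> int:
    """Return 1 when a symmetric mode flag would additionally be signaled."""
    if mvd_l1_0_flag == 1:
        return 0  # no bidirectional prediction in this case
    # Assumption: a forward/backward (or backward/forward) pair means the two
    # nearest references straddle the current picture in POC order.
    opposite_sides = (poc_l0_nearest - poc_cur) * (poc_l1_nearest - poc_cur) < 0
    return int(opposite_sides and l0_short_term and l1_short_term)


def derive_mvd1(mvd0: Tuple[int, int]) -> Tuple[int, int]:
    """In the example symmetric mode, MVD1 is derived as -MVD0."""
    return (-mvd0[0], -mvd0[1])
```

With these helpers, a decoder-side model would read MVD0 from the bitstream and call `derive_mvd1` only when both BiDirPredFlag and the symmetric mode flag are 1.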

For example, for the single-reference mode, the following MV prediction modes may be signaled:

NEARMV - Use one of the motion vector predictors (MVPs) in the list indicated by a DRL (dynamic reference list) index directly, without any MVD.

NEWMV - Use one of the motion vector predictors (MVPs) in the list signaled by a DRL index as a reference and apply a delta to the MVP (e.g., using an MVD).

GLOBALMV - Use a motion vector based on frame-level global motion parameters.

Similarly, for the compound-reference inter prediction mode, which uses two reference frames corresponding to the two MVs to be predicted, the following MV prediction modes may be signaled:

NEAR_NEARMV - For each of the two MVs to be predicted, use one of the motion vector predictors (MVPs) in the list signaled by a DRL index, without any MVD.

NEAR_NEWMV - To predict the first of the two motion vectors, use one of the motion vector predictors (MVPs) in the list signaled by a DRL index as the reference MV, without any MVD; to predict the second of the two motion vectors, use one of the MVPs in the list signaled by a DRL index as the reference MV in conjunction with an additionally signaled delta MV (an MVD).

NEW_NEARMV - To predict the second of the two motion vectors, use one of the motion vector predictors (MVPs) in the list signaled by a DRL index as the reference MV, without any MVD; to predict the first of the two motion vectors, use one of the MVPs in the list signaled by a DRL index as the reference MV in conjunction with an additionally signaled delta MV (an MVD).

NEW_NEWMV - For each of the two MVs, use one of the motion vector predictors (MVPs) in the list signaled by a DRL index as the reference MV, in conjunction with an additionally signaled delta MV.

GLOBAL_GLOBALMV - Use the MV from each reference based on its frame-level global motion parameters.

Thus, the term "NEAR" above refers to MV prediction that uses a reference MV without an MVD, as in the general merge mode, whereas the term "NEW" refers to MV prediction that uses a reference MV and offsets it with a signaled MVD, as in the MMVD mode. For compound inter prediction, both the reference base motion vectors and the motion vector deltas above may generally be different or independent between the two references, even though they may be correlated and such correlation may be exploited to reduce the amount of information needed to signal the two motion vector deltas. In such situations, joint signaling of the two MVDs may be implemented and indicated in the bitstream.
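The NEAR/NEW naming convention above determines, for each mode, which of the predicted motion vectors carries a signaled MVD. A minimal lookup sketch follows; the `MVD_SIGNALED` table and the `mvd_count` helper are hypothetical names introduced only to restate the mode list in data form.

```python
from typing import Dict, Tuple

# For each mode: one entry per predicted MV, True if an MVD is signaled for it.
MVD_SIGNALED: Dict[str, Tuple[bool, ...]] = {
    # Single-reference modes
    "NEARMV": (False,),
    "NEWMV": (True,),
    "GLOBALMV": (False,),
    # Compound-reference modes (first MV, second MV)
    "NEAR_NEARMV": (False, False),
    "NEAR_NEWMV": (False, True),
    "NEW_NEARMV": (True, False),
    "NEW_NEWMV": (True, True),
    "GLOBAL_GLOBALMV": (False, False),
}


def mvd_count(mode: str) -> int:
    """Number of MVDs that accompany the given MV prediction mode."""
    return sum(MVD_SIGNALED[mode])
```

Only NEW_NEWMV yields two MVDs in this table, which is the case where joint signaling of the two MVDs may become relevant.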

The dynamic reference list (DRL) described above may be used to hold a set of indexed motion vectors that is dynamically maintained and considered as candidate motion vector predictors.

In some example implementations, a predefined resolution for the MVD may be allowed. For example, a motion vector precision (or accuracy) of 1/8 pixel may be allowed. The MVDs described above for the various MV prediction modes may be constructed and signaled in various ways. In some implementations, various syntax elements may be used to signal the above motion vector differences in reference frame list 0 or list 1.

For example, a syntax element referred to as "mv_joint" may specify which components of the associated motion vector difference are non-zero. For an MVD, this is signaled jointly for all the non-zero components. For example, mv_joint may take the following values:
0 may indicate that there is no non-zero MVD along either the horizontal or the vertical direction,
1 may indicate that there is a non-zero MVD only along the horizontal direction,
2 may indicate that there is a non-zero MVD only along the vertical direction,
3 may indicate that there is a non-zero MVD along both the horizontal and the vertical directions.

When the "mv_joint" syntax element for an MVD signals that there is no non-zero MVD component, no further MVD information may be signaled. However, if the "mv_joint" syntax element signals that there are one or two non-zero components, additional syntax elements may further be signaled for each of the non-zero MVD components, as described below.
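The four "mv_joint" values listed above map directly onto a pair of per-component flags. The following minimal sketch uses a hypothetical helper name, `nonzero_components`, to show which components require further syntax elements.

```python
from typing import Tuple


def nonzero_components(mv_joint: int) -> Tuple[bool, bool]:
    """Return (horizontal_nonzero, vertical_nonzero) for an mv_joint value."""
    if mv_joint not in (0, 1, 2, 3):
        raise ValueError("mv_joint must be 0, 1, 2, or 3")
    horizontal = mv_joint in (1, 3)  # 1: horizontal only, 3: both
    vertical = mv_joint in (2, 3)    # 2: vertical only, 3: both
    return horizontal, vertical
```

A parser built on this helper would read the remaining per-component syntax elements once for each flag that is true, and none at all when both are false.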

For example, a syntax element referred to as "mv_sign" may be used to additionally specify whether the corresponding motion vector difference component is positive or negative.

As another example, a syntax element referred to as "mv_class" may be used to specify the class of the motion vector difference, among a predefined set of classes, for the corresponding non-zero MVD component. The predefined classes for motion vector differences may be used, for example, to divide the contiguous magnitude space of the motion vector difference into non-overlapping ranges, with each range corresponding to an MVD class. A signaled MVD class thus indicates the magnitude range of the corresponding MVD component. In the example implementation shown in Table 4 below, a higher class corresponds to motion vector differences having a range of larger magnitudes. In Table 4, the notation (n, m] is used to represent a range of motion vector differences greater than n pixels and smaller than or equal to m pixels.

In some other examples, a syntax element referred to as "mv_bit" may further be used to specify the integer part of the offset between the non-zero motion vector difference component and the starting magnitude of the correspondingly signaled MV class magnitude range. As such, mv_bit may indicate the magnitude or amplitude of the MVD. The number of bits needed in "mv_bit" to signal the full range of each MVD class may vary as a function of the MV class. For example, MV_CLASS_0 and MV_CLASS_1 in the implementation of Table 4 may need only a single bit to indicate an integer pixel offset of 1 or 2 from the starting MVD of 0, and each higher MV_CLASS in the example implementation of Table 4 may need progressively one more bit for "mv_bit" than the previous MV_CLASS.
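The relationship between MVD class, magnitude range, and "mv_bit" width can be sketched as follows. Table 4 is not reproduced in this section, so the sketch assumes ranges of (0, 2] for class 0 and (2^k, 2^(k+1)] pixels for class k >= 1, which is consistent with the ranges quoted elsewhere in this description (e.g., (32, 64] for MV_CLASS_5). The function names are hypothetical.

```python
from typing import Tuple


def class_range(mv_class: int) -> Tuple[int, int]:
    """Assumed (n, m] integer-pixel range for a class: n < magnitude <= m."""
    lower = 0 if mv_class == 0 else 1 << mv_class
    upper = 2 << mv_class
    return lower, upper


def mv_class_of(magnitude_pel: int) -> int:
    """Class whose assumed range contains a positive integer-pixel magnitude."""
    if magnitude_pel <= 0:
        raise ValueError("magnitude must be positive")
    if magnitude_pel <= 2:
        return 0
    return (magnitude_pel - 1).bit_length() - 1


def mv_bit_count(mv_class: int) -> int:
    """Bits needed to address every integer offset inside the class range."""
    lower, upper = class_range(mv_class)
    return (upper - lower - 1).bit_length()
```

Under these assumptions, classes 0 and 1 each need one bit, and every class above 1 needs one more bit than the class before it.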

In some other examples, a syntax element referred to as "mv_fr" may further be used to specify the first two fractional bits of the motion vector difference for the corresponding non-zero MVD component, and a syntax element referred to as "mv_hp" may be used to specify the third fractional bit (the high-resolution bit) of the motion vector difference for the corresponding non-zero MVD component. The two-bit "mv_fr" essentially provides 1/4-pixel MVD resolution, whereas the "mv_hp" bit may further provide 1/8-pixel resolution. In some other implementations, more than one "mv_hp" bit may be used to provide an MVD pixel resolution finer than 1/8 pixel. In some example implementations, additional flags may be signaled at one or more of the various levels to indicate whether MVD resolution of 1/8 pixel or finer is supported. If an MVD resolution is not applied to a particular coding unit, the above syntax elements for the corresponding unsupported MVD resolution may not be signaled.
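One plausible reading of how the syntax elements above combine into a signed MVD component, expressed in 1/8-pixel units, is sketched below. The function name and the exact arithmetic are assumptions made for illustration: the integer part is taken as the class start plus the "mv_bit" offset, "mv_fr" contributes the first two fractional bits, and "mv_hp" contributes the third.

```python
def mvd_component_eighth_pel(class_start_pel: int, int_offset_pel: int,
                             mv_fr: int, mv_hp: int, negative: bool) -> int:
    """Assemble a signed MVD component in 1/8-pixel units (illustrative only)."""
    if not 0 <= mv_fr <= 3:
        raise ValueError("mv_fr carries two fractional bits (0..3)")
    if mv_hp not in (0, 1):
        raise ValueError("mv_hp carries one fractional bit (0 or 1)")
    integer_pel = class_start_pel + int_offset_pel
    # Three fractional bits: two from mv_fr (1/4-pel steps), one from mv_hp (1/8-pel).
    fraction_eighths = (mv_fr << 1) | mv_hp
    magnitude = integer_pel * 8 + fraction_eighths
    return -magnitude if negative else magnitude
```

For instance, a class starting at 4 pixels with an integer offset of 1, `mv_fr` of 2, `mv_hp` of 1, and a negative sign would correspond to -5.625 pixels in this model.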

In some of the example implementations above, the fractional resolution may be independent of the MVD class. In other words, regardless of the magnitude of the motion vector difference, a predefined number of "mv_fr" and "mv_hp" bits may be used to signal the fractional MVD of a non-zero MVD component, providing similar motion vector resolution options.

However, in some other example implementations, the resolution of the motion vector difference may be differentiated across the various MVD magnitude classes. Specifically, high-resolution MVDs for the large MVD magnitudes of the higher MVD classes may not provide a statistically significant improvement in compression efficiency. As such, MVDs may be coded with decreasing resolution (integer pixel resolution or fractional pixel resolution) for the larger MVD magnitude ranges that correspond to higher MVD magnitude classes. Likewise, MVDs may generally be coded with decreasing resolution (integer pixel resolution or fractional pixel resolution) for larger MVD values. Such MVD class-dependent or MVD magnitude-dependent MVD resolution may generally be referred to as adaptive MVD resolution, amplitude-dependent adaptive MVD resolution, or magnitude-dependent MVD resolution. The term "resolution" may further be referred to as "pixel resolution". Adaptive MVD resolution may be implemented in various manners, as described by the example implementations below, to achieve better overall compression efficiency. In particular, the reduction in the number of signaling bits achieved by targeting a less precise MVD may outweigh the additional bits needed to code the inter-prediction residual that results from such a less precise MVD. This follows from the statistical observation that treating the MVD resolution of large-magnitude or high-class MVDs non-adaptively, at a level similar to that of small-magnitude or low-class MVDs, may not significantly increase the inter-prediction residual coding efficiency for blocks with large-magnitude or high-class MVDs. In other words, using a higher MVD resolution for large-magnitude or high-class MVDs may not yield much coding gain over using a lower MVD resolution.

In some general example implementations, the pixel resolution or precision of the MVD may decrease, or at least not increase, with increasing MVD class. Decreasing the pixel resolution of the MVD corresponds to a coarser MVD (or a larger step from one MVD level to the next). In some implementations, the correspondence between MVD pixel resolution and MVD class may be specified, predefined, or preconfigured, and thus need not be signaled in the encoded bitstream.

In some example implementations, the MV classes of Table 4 may each be associated with a different MVD pixel resolution.

In some example implementations, each MVD class may be associated with a single allowed resolution. In some other implementations, one or more MVD classes may be associated with two or more optional MVD pixel resolutions. The signaling of such an MVD class for a current MVD component in the bitstream may thus be followed by additional signaling indicating which optional pixel resolution is selected for the current MVD component.

In some example implementations, the adaptively allowed MVD pixel resolutions may include, but are not limited to, 1/64 pel (pixel), 1/32 pel, 1/16 pel, 1/8 pel, 1/4 pel, 1/2 pel, 1 pel, 2 pel, 4 pel, and so on (in descending order of resolution). As such, each of the ascending MVD classes may be associated with one of these resolutions in a non-ascending manner. In some implementations, an MVD class may be associated with two or more of the resolutions above, and the higher of those resolutions may be lower than or equal to the lower resolution of the preceding MVD class. For example, if MV_CLASS_3 of Table 4 is associated with optional 1 pel and 2 pel resolutions, then the highest resolution that MV_CLASS_4 of Table 4 could be associated with would be 2 pel. In some other implementations, the highest allowed resolution for an MV class may be higher than the lowest allowed resolution of the preceding (lower) MV class. However, the average of the allowed resolutions for ascending MV classes may only be non-ascending.
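The non-ascending constraint on resolutions described above can be checked mechanically. In this minimal sketch each class carries a list of allowed step sizes in pixels (a larger step means a lower resolution); the function names are hypothetical, and the check implements the stricter variant in which the finest step of a class may not be finer than the coarsest step of the preceding class.

```python
from fractions import Fraction
from typing import List, Sequence


def is_non_ascending(steps_per_class: Sequence[Sequence[Fraction]]) -> bool:
    """True if resolution never increases from one class to the next."""
    return all(min(current) >= max(previous)
               for previous, current in zip(steps_per_class, steps_per_class[1:]))


def finest_step_for_next_class(previous_steps: Sequence[Fraction]) -> Fraction:
    """Finest step (highest resolution) the following class may use."""
    return max(previous_steps)


# Example assignment: 1/8 pel, 1/4 pel, 1 pel, then optional 1 or 2 pel, then 2 pel.
example: List[List[Fraction]] = [
    [Fraction(1, 8)], [Fraction(1, 4)], [Fraction(1)],
    [Fraction(1), Fraction(2)], [Fraction(2)],
]
```

Here `finest_step_for_next_class(example[3])` evaluates to 2, matching the example in which a class following optional 1 pel and 2 pel resolutions can use at most 2 pel resolution.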

In some implementations, when fractional pixel resolution higher than 1/8 pel is allowed, the "mv_fr" and "mv_hp" signaling may be correspondingly expanded to more than 3 fractional bits in total.

In some example implementations, fractional pixel resolution may only be allowed for MVD classes that are equal to or lower than a threshold MVD class. For example, fractional pixel resolution may be allowed only for MV_CLASS_0 and disallowed for all other MV classes of Table 4. Likewise, the threshold may be any one of the other MV classes of Table 4, with fractional pixel resolution allowed only for MVD classes at or below it. For the other MVD classes above the threshold MVD class, only integer pixel resolution is allowed for the MVD. In such a manner, fractional resolution signaling, such as one or more of the "mv_fr" and/or "mv_hp" bits, need not be signaled for an MVD signaled with an MVD class equal to or higher than the threshold MVD class. For MVD classes with a resolution coarser than 1 pixel, the number of bits in the "mv_bit" signaling may be further reduced. For example, for MV_CLASS_5 of Table 4, the range of the MVD pixel offset is (32, 64], so 5 bits are needed to signal the entire range at 1 pel resolution. However, if MV_CLASS_5 is associated with 2 pel MVD resolution (a resolution lower than 1 pixel resolution), then 4 bits rather than 5 bits may be needed for "mv_bit", and neither "mv_fr" nor "mv_hp" needs to be signaled following the signaling of "mv_class" as MV_CLASS_5.
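The bit-count arithmetic in the MV_CLASS_5 example above can be reproduced with a short sketch. The class ranges are assumed to be (0, 2] for class 0 and (2^k, 2^(k+1)] for class k >= 1, since Table 4 is not reproduced in this section; the function names, and the choice of treating fractional syntax as present only at or below the threshold class, are illustrative assumptions.

```python
from typing import Dict, Tuple, Union


def class_range(mv_class: int) -> Tuple[int, int]:
    """Assumed (n, m] integer-pixel range for a class."""
    lower = 0 if mv_class == 0 else 1 << mv_class
    return lower, 2 << mv_class


def syntax_for_component(mv_class: int, frac_threshold_class: int,
                         step_pel: int = 1) -> Dict[str, Union[int, bool]]:
    """Which magnitude syntax accompanies mv_class at a given integer step size."""
    lower, upper = class_range(mv_class)
    levels = (upper - lower) // step_pel          # distinct offsets at this step
    fractional = mv_class <= frac_threshold_class  # assumed threshold rule
    return {
        "mv_bit_bits": max(levels - 1, 0).bit_length(),
        "mv_fr": fractional,
        "mv_hp": fractional,
    }
```

With a 1 pel step, class 5 spans 32 offsets and needs 5 bits; with a 2 pel step it spans 16 offsets and needs 4 bits, and no fractional syntax follows in either case when the threshold class is 0.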

In some example implementations, fractional pixel resolution may only be allowed for an MVD whose integer value is below a threshold integer pixel value. For example, fractional pixel resolution may only be allowed for MVDs smaller than 5 pixels. Corresponding to this example, fractional resolution may be allowed for MV_CLASS_0 and MV_CLASS_1 of Table 4 and disallowed for all other MV classes. As another example, fractional pixel resolution may only be allowed for MVDs smaller than 7 pixels. Corresponding to this example, fractional resolution may be allowed for MV_CLASS_0 and MV_CLASS_1 of Table 4 (with ranges below 5 pixels) and disallowed for MV_CLASS_3 and higher (with ranges above 5 pixels). For an MVD belonging to MV_CLASS_2, whose pixel range encompasses 5 pixels, fractional pixel resolution for the MVD may or may not be allowed depending on the "mv_bit" value. If the "mv_bit" value is signaled as 1 or 2 (such that the integer portion of the signaled MVD, calculated as the start of the pixel range for MV_CLASS_2 plus the offset of 1 or 2 indicated by "mv_bit", is 5 or 6), then fractional pixel resolution may be allowed. Otherwise, if the "mv_bit" value is signaled as 3 or 4 (such that the integer portion of the signaled MVD is 7 or 8), then fractional pixel resolution may not be allowed.
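The MV_CLASS_2 case above reduces to a single comparison on the integer part of the MVD. The following minimal sketch assumes a class start of 4 pixels for MV_CLASS_2 (Table 4 is not reproduced here) and uses a hypothetical function name.

```python
def fractional_allowed(class_start_pel: int, mv_bit_offset: int,
                       threshold_pel: int) -> bool:
    """Fractional bits are allowed only if the integer part is below the threshold."""
    integer_part = class_start_pel + mv_bit_offset
    return integer_part < threshold_pel


# MV_CLASS_2 with an assumed range start of 4 pixels and a 7-pixel threshold:
# offsets 1 and 2 give integer parts 5 and 6 (allowed),
# offsets 3 and 4 give integer parts 7 and 8 (not allowed).
mv_class_2_results = [fractional_allowed(4, offset, 7) for offset in (1, 2, 3, 4)]
```

Because the decision depends on "mv_bit", a decoder following this scheme would evaluate it only after the integer part has been parsed.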

In some other implementations, only a single MVD value may be allowed for MV classes equal to or higher than a threshold MV class. For example, such a threshold MV class may be MV_CLASS_2. Thus, MV_CLASS_2 and above may only be allowed to have a single MVD value, without fractional pixel resolution. The single allowed MVD value for these MV classes may be predefined. In some examples, the allowed single value may be the upper end of the respective range of each of these MV classes in Table 4. For example, MV_CLASS_2 through MV_CLASS_10 may be at or above the threshold class of MV_CLASS_2, and the single allowed MVD values for these classes may be predefined as 8, 16, 32, 64, 128, 256, 512, 1024, and 2048, respectively. In some other examples, the allowed single value may be the middle value of the respective range of each of these MV classes in Table 4. For example, MV_CLASS_2 through MV_CLASS_10 may be above the class threshold, and the single allowed MVD values for these classes may be predefined as 3, 6, 12, 24, 48, 96, 192, 384, 768, and 1536, respectively. Any other value within the range may also be defined as the single allowed resolution for the respective MVD class.

In the above implementations, when the signaled "mv_class" is equal to or above the predefined MVD class threshold, the "mv_class" signaling alone is sufficient to determine the MVD value. The magnitude and direction of the MVD are then determined from "mv_class" and "mv_sign".
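As a hedged illustration of the single-value scheme, the sketch below derives an MVD component from "mv_class" and "mv_sign" alone. The names are hypothetical. The class ranges (4, 8], (8, 16], ... and the convention that mv_sign = 1 means a negative component are assumptions, not statements of the disclosure.

```python
THRESHOLD_CLASS = 2  # hypothetical threshold MV class (MV_CLASS_2)


def single_value(mv_class: int, use_midpoint: bool = False) -> int:
    """Single allowed MVD magnitude for a class at or above the threshold.

    ASSUMPTION: the range of class c (c >= 1) is (2**c, 2**(c + 1)].
    """
    if mv_class < THRESHOLD_CLASS:
        raise ValueError("class is below the threshold; more syntax is needed")
    upper = 2 << mv_class          # 8 for MV_CLASS_2, 2048 for MV_CLASS_10
    lower = upper // 2
    return (lower + upper) // 2 if use_midpoint else upper


def decode_component(mv_class: int, mv_sign: int, use_midpoint: bool = False) -> int:
    """Signed MVD component; ASSUMPTION: mv_sign == 1 denotes a negative value."""
    magnitude = single_value(mv_class, use_midpoint)
    return -magnitude if mv_sign else magnitude
```

Under these assumptions the upper-end variant yields 8 through 2048 for MV_CLASS_2 through MV_CLASS_10, and the midpoint variant yields 6 through 1536.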

Thus, when an MVD is signaled for only one reference frame (from either reference frame list 0 or list 1, but not both) or is signaled jointly for two reference frames, the precision (or resolution) of the MVD may depend on the associated motion vector difference class in Table 3 and/or on the magnitude of the MVD.

In some other implementations, the pixel resolution or precision of the MVD may decrease, or at least not increase, as the MVD magnitude increases. For example, the pixel resolution may depend on the integer portion of the MVD magnitude. In some implementations, fractional pixel resolution may be allowed only for MVD magnitudes equal to or less than an amplitude threshold. At the decoder, the integer portion of the MVD magnitude may first be extracted from the bitstream. The pixel resolution may then be determined, after which a decision may be made as to whether any fractional MVD is present in the bitstream and needs to be parsed (e.g., if fractional pixel resolution is not allowed for the integer magnitude of a particular extracted MVD, the bitstream contains no fractional MVD bits to extract). The example implementations described above for MVD-class-dependent adaptive MVD pixel resolution also apply to MVD-magnitude-dependent adaptive MVD pixel resolution. In a particular example, MVD classes above or encompassing the magnitude threshold may be allowed only one predefined value.
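The decoder-side parsing order described above (integer portion first, then a decision on whether fractional bits exist) might be sketched as follows. This is a toy model under stated assumptions: the iterator `symbols` stands in for already entropy-decoded syntax values, fractional resolution is assumed to be allowed when the integer portion is below `threshold_px`, and the fraction is assumed to be coded in units of 1/8 pixel. None of these names come from the disclosure.

```python
from fractions import Fraction
from typing import Iterator


def parse_mvd_magnitude(symbols: Iterator[int], threshold_px: int,
                        frac_denominator: int = 8) -> Fraction:
    """Parse one MVD magnitude from a stream of decoded syntax values."""
    # Step 1: the integer portion is always present and is read first.
    integer = next(symbols)
    # Step 2: decide from the integer portion whether fractional bits exist.
    if integer >= threshold_px:
        # No fractional syntax is present, so nothing further is consumed.
        return Fraction(integer)
    # Step 3: fractional resolution is allowed, so the fraction is parsed.
    frac_units = next(symbols)
    return integer + Fraction(frac_units, frac_denominator)
```

The point of the design is that the parser never reads a fractional symbol the encoder did not write, so encoder and decoder stay in sync without extra signaling.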

The various example implementations above apply to the single reference mode. They also apply to the example NEW_NEARMV, NEAR_NEWMV, and/or NEW_NEWMV modes in compound prediction under MMVD. More generally, these implementations apply to adaptive resolution for any MVD.

In a particular example implementation of adaptive MVD pixel resolution, the MVD pixel resolution may be fractional for MVD magnitudes less than 1, while for MV classes equal to or higher than MV_CLASS_1 only a single MVD magnitude, equal to the end value of the corresponding MVD magnitude range in Table 4, may be allowed. In such an example, the allowed MVD values are shown in Table 4 for allowed fractional pixel resolutions of 1/8, 1/4, or 1/2 pixel.
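A minimal sketch of how such a set of allowed MVD values could be generated is given below. The function name and the class end values (2, 4, 8, ...) are assumptions made for illustration. For MV_CLASS_0, the sketch also assumes that the integer values 1 and 2 remain allowed in addition to the fractional values below 1.

```python
from fractions import Fraction
from typing import List


def allowed_levels(mv_class: int, frac_step: Fraction) -> List[Fraction]:
    """Allowed MVD magnitudes for one class under the scheme sketched above.

    ASSUMPTION: class end values are 2, 4, 8, ... (2 << mv_class).
    """
    if mv_class >= 1:
        # MV_CLASS_1 and higher: only the end value of the class range.
        return [Fraction(2 << mv_class)]
    # MV_CLASS_0: fractional steps below 1 pixel, then the integers 1 and 2.
    steps_per_pixel = int(1 / frac_step)
    below_one = [k * frac_step for k in range(1, steps_per_pixel)]
    return below_one + [Fraction(1), Fraction(2)]
```

For a 1/8-pixel resolution this gives 1/8, 2/8, ..., 7/8, 1, 2 for MV_CLASS_0, and a 1-pixel resolution reduces MV_CLASS_0 to (1, 2).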

For a coding block, whether adaptive MVD pixel resolution is used may be explicitly signaled or implicitly derived. When adaptive MVD pixel resolution is signaled as not being used, the different MVD classes may follow the MVD ranges shown in Table 4, and a non-adaptive MVD pixel resolution may be defined or signaled. Such a non-adaptive resolution may be fractional (e.g., 1/8, 1/4, or 1/2 pixel) or non-fractional (e.g., 1, 2, 4, ... pixels) and applies to all MVD classes. The non-adaptive resolution essentially determines the number of bits needed to signal the mv_bit, mv_fr, and mv_hp described above. If the non-adaptive resolution is fractional, it may determine only the number of bits needed to signal mv_fr and mv_hp for all MVD classes (independent of the class of the MVD), whereas the number of bits for signaling mv_bit may depend on the MVD class.

When adaptive MVD pixel resolution is signaled as being used, the allowed MVD levels or values may be predefined in an adaptive manner, such as those shown in Table 5, or may be signaled. For example, they may be signaled in the bitstream in various ways depending on the particular adaptive MVD resolution scheme. In the example of Table 5, a set of signaling syntax elements may be used to indicate a fractional resolution (e.g., 1/8 pixel) and a magnitude threshold (e.g., an MVD magnitude of 1 pixel) to which the signaled fractional resolution applies. Other sets of syntax elements (which may be more complex) may be used to signal other adaptive MVD resolution schemes. Such an indication of the adaptive MVD pixel resolution scheme may be signaled at one of various coding levels, such as the sequence level, picture level, frame level, slice level, superblock level, or coding block level.

In some example implementations, an overall adaptive MVD pixel resolution scheme, including but not limited to the one shown in Table 5, may be defined or signaled at a particular coding level (e.g., sequence level, picture level, frame level, slice level, or superblock level). Such a scheme may be further modified at the same or another coding level, so that the allowed MVD pixel resolution values for the various MVD classes are adjusted there. If no adjustment is made at a particular coding level, the signaled or predefined adaptive MVD pixel resolution scheme applies without modification. For example, the overall adaptive MVD pixel resolution scheme may be defined or signaled at the frame level while adjustments are made at one or more of the superblock level or coding block level, or vice versa.

Such an adjustment may be implemented as a restriction of the MVD precision or as an expansion of the MVD precision. Information related to such an adjustment may be predefined or signaled. A predefined adjustment may apply to all coding blocks. Alternatively, a predefined adjustment may be activated at various coding levels by signaling.

In some implementations, such an adjustment may be embodied as a maximum allowed MVD precision. For a particular coding block to which adaptive MVD resolution is applied, such a maximum allowed MVD precision may differ from the MVD pixel precision of the adaptive MVD pixel resolution scheme specified, signaled, or derived at the picture level, superblock level, or coding block level, as described above. In that situation, the allowed MVD resolution values for the various MVD classes may be determined by taking into account both the allowed values specified by or derived from the adaptive MVD pixel resolution scheme and the maximum allowed MVD precision.

For example, assume that the adaptive MVD pixel resolution scheme of Table 5 is predefined, signaled, or derived for a particular coding level. Further assume that the maximum allowed precision is 1/4 pixel, meaning that a precision of 1/8 pixel or higher is not allowed for any MVD class, regardless of the adaptive MVD resolution associated with Table 5. Then, by applying the maximum allowed pixel precision uniformly as a restriction on Table 5, the allowed MVD pixel levels or values for the various MVD classes may be modified as follows:

Disallowing a precision of 1/8 pixel or higher for all MVD classes by defining or signaling a maximum allowed MVD pixel precision of 1/4 pixel is merely one example. In another example, the maximum allowed pixel precision may be defined or signaled as 1/2 pixel. The allowed MVD values corresponding to MV_CLASS_0 above would then be (1/2, 1, 2) for adaptive resolution schemes with fractional pixel resolutions of 1/8 pixel, 1/4 pixel, and 1/2 pixel, and (1, 2) for a pixel resolution of 1 pixel.
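The restriction described above can be sketched as a filter that keeps only the values expressible at the maximum allowed precision. The function name is hypothetical, and the starting list for MV_CLASS_0 at 1/8-pixel resolution (1/8, 2/8, ..., 7/8, 1, 2) is assumed for illustration.

```python
from fractions import Fraction
from typing import List


def restrict_levels(levels: List[Fraction], max_precision: Fraction) -> List[Fraction]:
    """Keep only the MVD values that are multiples of the maximum allowed precision."""
    return [v for v in levels if v % max_precision == 0]


# ASSUMPTION: MV_CLASS_0 levels at 1/8-pixel resolution: 1/8 ... 7/8, 1, 2.
CLASS0_EIGHTH = [Fraction(k, 8) for k in range(1, 9)] + [Fraction(2)]
```

With a maximum allowed precision of 1/2 pixel, `restrict_levels(CLASS0_EIGHTH, Fraction(1, 2))` leaves (1/2, 1, 2), matching the example in the text.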

As illustrated in the above implementation in connection with Table 6, when adaptive MVD resolution is applied using an adaptive resolution scheme defined, signaled, or derived at a particular coding level together with an additional maximum allowed MVD precision defined or signaled at the same or a different coding level, the maximum allowed MVD precision may be required or constrained to be no higher than the MVD resolution of the adaptive resolution scheme. In other words, the MVD precision actually applied, derived by considering both the adaptive resolution scheme and the maximum allowed precision, is clipped by the MVD resolution of the adaptive resolution scheme (i.e., the maximum allowed precision has no effect when it is higher than the resolution defined, signaled, or derived from the adaptive resolution scheme).

In some other implementations, however, such clipping may not be required, and the defined or signaled maximum allowed MVD precision may control the actual MVD resolution of at least some MVD classes. In these implementations, when adaptive MVD resolution is applied (as indicated by definition, signaling, or derivation at the various coding levels described above), the adjustment of the MVD levels of at least some MVD classes may include increasing, rather than restricting, the adaptive MVD resolution defined, signaled, or derived in the adaptive MVD pixel resolution scheme, such as the one associated with Table 5. For example, the adjustment may allow a higher precision than that specified, signaled, or derived from the adaptive resolution scheme for MVD classes equal to or lower than a defined or signaled threshold MVD class level. Such a higher precision may be defined or signaled as the maximum allowed MVD precision, as described above. Such a maximum allowed precision may be imposed at or below the threshold MVD class level regardless of the MVD resolution specified, signaled, or derived in the adaptive resolution scheme. Specifically, such a threshold MVD class level may be (but need not be) MV_CLASS_0 (or the lowest MVD class level of a set of MVD classes, such as the set of MVD classes in Table 5). The maximum allowed pixel precision may be predefined or signaled, and may be fractional. As a specific example, if the threshold MVD class is MV_CLASS_0 in the adaptive resolution scheme of Table 5, the MVD pixel resolution of MV_CLASS_0 is a non-fractional 1 pixel, and the maximum allowed fractional pixel precision for the adjustment is 1/8, 1/4, or 1/2 pixel, the adjusted allowed MVD values are as follows:

In some example implementations that are alternatives to that of Table 7, a threshold MVD magnitude may be used in place of the threshold MVD class level. In these implementations, the higher precision may be imposed by the specified or signaled maximum allowed MVD precision for MVDs whose magnitude is equal to or less than the threshold MVD magnitude, rather than for MVDs at or below a threshold MVD class level. In such implementations, the mv_bit information, in addition to the mv_class information, may be signaled early enough in the video stream that the MVD magnitude can be determined in time to determine the allowed MVD values. For example, replacing the threshold MVD class with a threshold MVD magnitude of 1/2 pixel, and still assuming that the adaptive MVD resolution of MV_CLASS_0 is 1 pixel in the adaptive resolution scheme, turns Table 7 into Table 8 below:

In some other example implementations, the above adjustment may include allowing a particular precision and lower precisions (e.g., a fractional precision of 1/8, 1/4, or 1/2 pixel or lower) only when the MVD magnitude is equal to or less than a threshold MVD magnitude. In such implementations, again, the mv_bit information, in addition to the mv_class information, may be signaled early enough in the video stream that the MVD magnitude can be determined in time to determine the allowed MVD values.

In such implementations, no additional resolution is imposed on the MVD values derived from the adaptive resolution scheme (such as Table 5). Instead, when the MVD magnitude is greater than the threshold MVD magnitude, MVD values associated with a resolution equal to or higher than the defined or signaled precision level may be disallowed. Taking the example of Table 5 again, further assume that MVD values associated with a resolution equal to or higher than a defined or signaled precision of 1/8 pixel are disallowed for MVD magnitudes greater than a threshold MVD magnitude of 1/2 pixel. Table 5 is then adjusted as follows:

In particular, as shown above, the allowed MVD values for MV_CLASS_0 at a fractional resolution of 1/8 pixel, (1/8, 2/8, 3/8, 1/2, 5/8, 6/8, 7/8, 1, 2), are adjusted to (1/8, 2/8, 3/8, 1/2, 6/8, 1, 2): MVD values associated with 1/8-pixel precision are allowed and retained only at or below the threshold MVD magnitude of 1/2 pixel. Above a magnitude of 1/2 pixel, MVD values associated with 1/8-pixel precision, such as the 5/8-pixel and 7/8-pixel values, are disallowed.
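The magnitude-dependent pruning just described can be sketched as follows. The helper names are hypothetical. As an assumption, the resolution "associated with" a value is taken to be the coarsest power-of-two step on which that value lies (for example, 6/8 = 3/4 lies on the 1/4-pixel grid, while 5/8 needs the 1/8-pixel grid).

```python
from fractions import Fraction
from typing import List


def required_step(value: Fraction) -> Fraction:
    """Coarsest grid step on which the value lies (ASSUMED notion of its precision)."""
    return Fraction(1, value.denominator)


def drop_fine_levels_above(levels: List[Fraction], precision: Fraction,
                           threshold: Fraction) -> List[Fraction]:
    """Above the magnitude threshold, drop values needing `precision` or finer."""
    return [v for v in levels
            if v <= threshold or required_step(v) > precision]


# MV_CLASS_0 values at 1/8-pixel resolution, as listed in the text.
CLASS0_LEVELS = [Fraction(k, 8) for k in range(1, 9)] + [Fraction(2)]
```

Calling `drop_fine_levels_above(CLASS0_LEVELS, Fraction(1, 8), Fraction(1, 2))` removes only 5/8 and 7/8 and keeps 6/8, which reproduces the adjusted list given in the text.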

Similarly, in the example of Table 5, assume that MVD values associated with a resolution equal to or higher than a defined or signaled precision of 1/4 pixel are disallowed when the MVD magnitude is greater than the threshold magnitude of 1/2 pixel. Table 5 is then adjusted as follows:

In some of the above implementations, the threshold MVD magnitude may be 2 pixels or less, such as the threshold magnitude of 1/2 pixel given in the examples above.

The example implementations above are described with respect to a particular MVD, regardless of whether the inter prediction mode is a single reference mode or a compound reference mode. In some other example implementations for the compound reference mode, in which the MV is predicted from multiple reference frames, a set of definitions or signaling may be used to indicate whether adaptive MVD resolution is applied and to which of the multiple reference frames it is applied.

In some example implementations, when MVDs are signaled for multiple reference frames, one (or more) flags or indices may be signaled to indicate whether adaptive MVD resolution is applied.

For example, when MVDs are signaled for multiple reference frames (e.g., in the NEW_NEWMV mode described above, or in another compound reference inter prediction mode), a single flag or index may be signaled in the video stream to indicate whether adaptive MVD resolution is applied to the MVD signaling for all of the multiple reference frames. If this flag or index is 1 (or 0), adaptive MVD resolution is applied to the MVD signaling for all of the multiple reference frames. Otherwise, if this flag or index is 0 (or 1), adaptive MVD coding is not applied to the MVD signaling for any of the multiple reference frames. In such implementations, adaptive MVD resolution is applied to the multiple inter prediction reference frames in an all-or-nothing manner.

In some other examples, when MVDs are signaled for multiple reference frames (e.g., in the NEW_NEWMV mode described above for two-reference-frame compound inter prediction, or in another compound inter prediction mode), one flag or index may be signaled separately for each reference frame to indicate whether adaptive MVD resolution is applied to that reference frame. In such implementations, whether adaptive MVD resolution is applied may be determined individually for each reference frame. The decision may be made at the encoder independently for each of the multiple reference frames and signaled separately in the video stream.

In some example implementations, when MVDs are signaled for multiple reference frames, one flag or index may be signaled for each reference frame whose MVD is non-zero, to indicate whether adaptive MVD resolution is applied to that reference frame. Otherwise, the flag or index need not be signaled. In other words, if the MVD for a particular reference frame is signaled or indicated as zero, there is no need to determine whether adaptive MVD resolution is applied, and no corresponding signaling is needed in the video stream. In such implementations, however, the indication that the MVD is zero must be signaled before the determination of whether adaptive resolution is applied is made.
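The three signaling variants above (one joint flag, one flag per reference frame, and one flag only for reference frames with a non-zero MVD) can be contrasted in a short sketch. The function, the mode names, and the convention that a flag value of 1 means "applied" are assumptions for illustration only. `bits` stands in for already decoded flag values.

```python
from typing import Iterator, List


def parse_adaptive_flags(bits: Iterator[int], mvd_is_zero: List[bool],
                         mode: str) -> List[bool]:
    """Return, per reference frame, whether adaptive MVD resolution applies."""
    n = len(mvd_is_zero)
    if mode == "joint":
        # One flag covers all reference frames (all-or-nothing).
        flag = bool(next(bits))
        return [flag] * n
    if mode == "per_frame":
        # One flag is read for every reference frame.
        return [bool(next(bits)) for _ in range(n)]
    if mode == "per_frame_nonzero":
        # A flag is read only when that frame's MVD is known to be non-zero;
        # for a zero MVD no flag is consumed and False is reported.
        return [False if zero else bool(next(bits)) for zero in mvd_is_zero]
    raise ValueError(f"unknown mode: {mode}")
```

In the third variant the zero/non-zero status must already be known when the flag would be parsed, which mirrors the ordering requirement stated in the text.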

Turning further to the signaling of the MVD resolution, in some example implementations a flag or index may be signaled to explicitly indicate the MVD resolution of the current coding block, and the context used to entropy code that flag or index may depend on the MVD class associated with the MVD. Such a flag or index may indicate either the MVD resolution used to derive an adaptive resolution scheme such as Table 5, or the maximum allowed MVD precision described above.

In some example implementations of MVD resolution signaling, the various components of the MVD may be signaled separately. The MVD may include, for example, a horizontal component and a vertical component. A flag or index may be signaled for each of the horizontal and vertical components of the MVD to indicate the MVD resolution of that component.

In some example implementations, the MVD resolution flag or index may be signaled after the MVD class information. Depending on the value of the signaled MVD class information (MV_CLASS_0, MV_CLASS_1, MV_CLASS_2, and so on), a context value may be derived and used in signaling the MVD resolution flag or index that indicates the MVD resolution. In other words, the syntax element(s) for signaling the MVD resolution may be entropy coded using different contexts for different MVD classes or different groups of MVD classes.
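One possible way to derive such a context from the MVD class is sketched below. The grouping (a separate context for each of the lowest classes and one shared context for all higher classes) and the constant `NUM_CONTEXTS` are purely hypothetical; the disclosure does not fix a particular mapping.

```python
NUM_CONTEXTS = 3  # hypothetical number of contexts for the resolution flag


def resolution_flag_context(mv_class: int) -> int:
    """Map an MVD class to a context index for coding the MVD resolution flag.

    ASSUMPTION: MV_CLASS_0 and MV_CLASS_1 get their own contexts, and
    MV_CLASS_2 and higher share the last context.
    """
    if mv_class < 0:
        raise ValueError("mv_class must be non-negative")
    return min(mv_class, NUM_CONTEXTS - 1)
```

Because the class is parsed before the flag, the decoder can select the context with no additional signaling.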

FIG. 18 shows a flowchart 1800 of an example method following the principles underlying the above implementations for adaptive MVD resolution. The example decoding method flow starts at S1801. In S1810, a video stream is received. In S1820, it is determined that a video block is inter-coded based on a prediction block and a motion vector (MV), where the MV is to be derived from a reference motion vector (RMV) and a motion vector difference (MVD) for the video block. In S1830, in response to determining that the MVD is coded with adaptive MVD pixel resolution: a reference MVD pixel precision for the current video block is determined; a maximum allowed MVD pixel precision is identified; a set of allowed MVD levels for the current video block is determined based on the reference MVD pixel precision and the maximum allowed MVD pixel precision; and the MVD is derived from the video stream according to the set of allowed MVD levels and at least one MVD parameter signaled in the video stream for the current video block. The example method stops at S1899.

FIG. 19 shows a flowchart 1900 of an example method following the principles underlying the above implementations for adaptive MVD resolution. The example decoding method flow starts at S1901. In S1910, a video stream is received. In S1920, it is determined that a current video block is inter-coded and associated with multiple reference frames. In S1930, it is further determined, based on signaling in the video stream, whether adaptive motion vector difference (MVD) pixel resolution is applied to at least one of the multiple reference frames. The example method stops at S1999.

FIG. 20 shows a flowchart 2000 of an example method following the principles underlying the above implementations for adaptive MVD resolution. The example decoding method flow starts at S2001. In S2010, a video stream is received. In S2020, it is determined that a video block is inter-coded based on a prediction block and a motion vector (MV), where the MV is to be derived from a reference motion vector (RMV) and a motion vector difference (MVD) for the video block. In S2030, a current MVD class of the MVD among a predefined set of MVD classes is determined. In S2040, at least one context for entropy decoding at least one explicit signaling in the video stream is derived based on the current MVD class, the at least one explicit signaling being included in the video stream to specify an MVD pixel resolution for at least one component of the MVD. In S2050, the at least one explicit signaling from the video stream is entropy decoded using the at least one context to determine the MVD pixel resolution for the at least one component of the MVD. The example method stops at S2099.

In the embodiments and implementations of the present disclosure, any steps and/or operations may be combined or arranged in any number or order, as desired. Two or more of the steps and/or operations may be performed in parallel. The embodiments and implementations of the present disclosure may be used separately or combined in any order. Further, each of the methods (or embodiments), the encoder, and the decoder may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program stored in a non-transitory computer-readable medium. The embodiments of the present disclosure may be applied to a luma block or a chroma block. The term block may be interpreted as a prediction block, a coding block, or a coding unit (CU). The term block here may also be used to refer to a transform block. In the following, a block size may refer to the block width or height, the maximum of the width and height, the minimum of the width and height, the area (width * height), or the aspect ratio (width:height or height:width) of the block.

The techniques described above can be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, FIG. 21 shows a computer system (2100) suitable for implementing certain embodiments of the disclosed subject matter.

The computer software can be coded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that can be executed by one or more computer central processing units (CPUs), graphics processing units (GPUs), and the like, either directly or through interpretation, microcode execution, and the like.

The instructions can be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smartphones, gaming devices, and Internet of Things devices.

The components shown in FIG. 21 for the computer system (2100) are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated in the exemplary embodiment of the computer system (2100).

The computer system (2100) may include certain human interface input devices. Such human interface input devices may be responsive to input by one or more human users through, for example, tactile input (such as keystrokes, swipes, and data glove movements), audio input (such as voice and clapping), visual input (such as gestures), and olfactory input (not depicted). The human interface devices can also be used to capture certain media not necessarily directly related to conscious input by a human, such as audio (for example, speech, music, and ambient sound), images (for example, scanned images and photographic images obtained from a still image camera), and video (for example, two-dimensional video and three-dimensional video including stereoscopic video).

The input human interface devices may include one or more of the following (only one of each is depicted): a keyboard (2101), a mouse (2102), a trackpad (2103), a touch screen (2110), a data glove (not shown), a joystick (2105), a microphone (2106), a scanner (2107), and a camera (2108).

The computer system (2100) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (for example, tactile feedback by the touch screen (2110), the data glove (not shown), or the joystick (2105), although there can also be tactile feedback devices that do not serve as input devices), audio output devices (such as speakers (2109) and headphones (not depicted)), visual output devices (such as screens (2110), including CRT screens, LCD screens, plasma screens, and OLED screens, each with or without touch-screen input capability and each with or without tactile feedback capability, some of which may be capable of two-dimensional visual output or output in three or more dimensions through means such as stereographic output, virtual-reality glasses (not depicted), holographic displays, and smoke tanks (not depicted)), and printers (not depicted).

The computer system (2100) can also include human-accessible storage devices and their associated media, such as optical media including a CD/DVD ROM/RW (2120) with CD/DVD or similar media (2121), a thumb drive (2122), a removable hard drive or solid-state drive (2123), legacy magnetic media such as tape and floppy disk (not depicted), and specialized ROM/ASIC/PLD-based devices such as security dongles (not depicted).

Those skilled in the art should also understand that the term "computer-readable media" as used in connection with the presently disclosed subject matter does not encompass transmission media, carrier waves, or other transitory signals.

The computer system (2100) can also include an interface (2154) to one or more communication networks (2155). Networks can be, for example, wireless, wireline, or optical. Networks can further be local, wide-area, metropolitan, vehicular and industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet and wireless LANs; cellular networks including GSM, 3G, 4G, 5G, LTE, and the like; TV wireline or wireless wide-area digital networks including cable TV, satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including CANbus. Certain networks commonly require external network interface adapters attached to certain general-purpose data ports or peripheral buses (2149) (such as, for example, USB ports of the computer system (2100)); others are commonly integrated into the core of the computer system (2100) by attachment to a system bus as described below (for example, an Ethernet interface into a PC computer system or a cellular network interface into a smartphone computer system). Using any of these networks, the computer system (2100) can communicate with other entities. Such communication can be unidirectional and receive-only (for example, broadcast TV), unidirectional and send-only (for example, CANbus to certain CANbus devices), or bidirectional, for example to other computer systems using local or wide-area digital networks. Certain protocols and protocol stacks can be used on each of those networks and network interfaces, as described above.

The aforementioned human interface devices, human-accessible storage devices, and network interfaces can be attached to a core (2140) of the computer system (2100).

The core (2140) can include one or more central processing units (CPUs) (2141), graphics processing units (GPUs) (2142), specialized programmable processing units in the form of field programmable gate arrays (FPGAs) (2143), hardware accelerators for certain tasks (2144), graphics adapters (2150), and so forth. These devices, along with read-only memory (ROM) (2145), random-access memory (RAM) (2146), and internal mass storage (2147) such as internal non-user-accessible hard drives and SSDs, may be connected through a system bus (2148). In some computer systems, the system bus (2148) can be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices can be attached either directly to the core's system bus (2148) or through a peripheral bus (2149). In one example, the screen (2110) can be connected to the graphics adapter (2150). Architectures for a peripheral bus include PCI, USB, and the like.

The CPUs (2141), GPUs (2142), FPGAs (2143), and accelerators (2144) can execute certain instructions that, in combination, can make up the aforementioned computer code. That computer code can be stored in ROM (2145) or RAM (2146). Transitional data can also be stored in RAM (2146), whereas permanent data can be stored, for example, in the internal mass storage (2147). Fast storage and retrieval for any of the memory devices can be enabled through the use of cache memory, which can be closely associated with one or more of the CPUs (2141), GPUs (2142), mass storage (2147), ROM (2145), RAM (2146), and the like.

The computer-readable media can have computer code thereon for performing various computer-implemented operations. The media and computer code can be those specially designed and constructed for the purposes of the present disclosure, or they can be of the kind well known and available to those having skill in the computer software arts.

As a non-limiting example, the computer system (2100) having the architecture shown, and specifically the core (2140), can provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible, computer-readable media. Such computer-readable media can be media associated with the user-accessible mass storage introduced above, as well as certain storage of the core (2140) that is of a non-transitory nature, such as the core-internal mass storage (2147) or ROM (2145). The software implementing various embodiments of the present disclosure can be stored in such devices and executed by the core (2140). A computer-readable medium can include one or more memory devices or chips, according to particular needs. The software can cause the core (2140), and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to execute particular processes or particular parts of particular processes described herein, including defining data structures stored in RAM (2146) and modifying such data structures according to the processes defined by the software. In addition or as an alternative, the computer system can provide functionality as a result of logic hardwired or otherwise embodied in a circuit (for example, the accelerator (2144)), which can operate in place of or together with software to execute particular processes or particular parts of particular processes described herein. Reference to software can encompass logic, and vice versa, where appropriate. Reference to a computer-readable medium can encompass a circuit (such as an integrated circuit (IC)) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware and software.

While this disclosure has described several exemplary embodiments, there are alterations, permutations, and various substitute equivalents that fall within the scope of the disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods that, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.
Appendix A: Acronyms
JEM: Joint Exploration Model
VVC: Versatile Video Coding
BMS: Benchmark Set
MV: Motion Vector
HEVC: High Efficiency Video Coding
SEI: Supplemental Enhancement Information
VUI: Video Usability Information
GOPs: Groups of Pictures
TUs: Transform Units
PUs: Prediction Units
CTUs: Coding Tree Units
CTBs: Coding Tree Blocks
PBs: Prediction Blocks
HRD: Hypothetical Reference Decoder
SNR: Signal to Noise Ratio
CPUs: Central Processing Units
GPUs: Graphics Processing Units
CRT: Cathode ray tube
LCD: Liquid crystal display
OLED: Organic Light Emitting Diode
CD: Compact Disc
DVD: Digital Video Disc
ROM: Read-Only Memory
RAM: Random Access Memory
ASIC: Application Specific Integrated Circuit
PLD: Programmable Logic Device
LAN: Local Area Network
GSM: Global System for Mobile Communications
LTE: Long Term Evolution
CANBus: Controller Area Network Bus
USB: Universal Serial Bus
PCI: Peripheral Component Interconnect
FPGA: Field Programmable Gate Array
SSD: Solid State Drive
IC: Integrated Circuit
HDR: High Dynamic Range
SDR: Standard Dynamic Range
JVET: Joint Video Exploration Team
MPM: Most Probable Mode
WAIP: Wide-angle intra prediction
CU: Coding Unit
PU: Prediction Unit
TU: Transform Unit
CTU: Coding Tree Unit
PDPC: Position-dependent prediction combination
ISP: Intra Sub-Partitions
SPS: Sequence Parameter Set
PPS: Picture Parameter Set
APS: Adaptive Parameter Set
VPS: Video Parameter Set
DPS: Decoding Parameter Set
ALF: Adaptive Loop Filter
SAO: Sample Adaptive Offset
CC-ALF: Cross-component adaptive loop filter
CDEF: Constrained Directional Enhancement Filter
CCSO: Cross component sample offset
LSO: Local Sample Offset
LR: Loop restoration filter
AV1: AOMedia Video 1
AV2: AOMedia Video 2
MVD: Motion Vector Difference
CfL: Chroma from Luma
SDT: Semi Decoupled Tree
SDP: Semi Decoupled Partitioning
SST: Semi Separate Tree
SB: Superblock
IBC (or IntraBC): Intra-block copy
CDF: Cumulative density function
SCC: Screen Content Coding
GBI: Generalized Biprediction
BCW: Bi-prediction with CU-level weights
CIIP: Combined Intra-Inter Prediction
POC: Picture Order Count
RPS: Reference Picture Set
DPB: Decoded Picture Buffer
MMVD: Merge mode with motion vector difference

101 Sample
102 Arrow
103 Arrow
104 Block
201 Current Block
202 Sample
203 Sample
204 Sample
205 Sample
206 Sample
300 Communication System
310 Terminal Device
320 Terminal Device
330 Terminal Device
340 Terminal Device
350 Network
400 Communication System
401 Video Source
402 Video Picture Stream
403 Video Encoder
404 Encoded video data
405 Streaming Server
406 Client Subsystem
407 Video Data
408 Client Subsystem
409 Video Data
410 Video Decoder
411 Video Picture Output Stream
412 Display
413 Video Capture Subsystem
420 Electronic Device
430 Electronic Device
501 Channel
510 Video Decoder
512 Rendering Device
515 Buffer Memory
520 Parser
521 Symbols
530 Electronic Device
531 Receiver
551 Scaler/Inverse Transform Unit
552 Intra Prediction Unit
553 Motion Compensation Prediction Unit
555 Aggregator
556 Loop Filter
557 Reference Picture Memory
558 Current Picture Buffer
601 Video Source
603 Video Encoder
620 Electronic Device
630 Source Coder
632 Coding Engine
633 Decoder
634 Reference Picture Memory
635 Predictor
640 Transmitter
645 Entropy Coder
650 Controller
660 Communication Channel
703 Video Encoder
721 General-purpose controller
722 Intra Encoder
723 Residual Calculator
724 Residual Encoder
725 Entropy Encoder
726 Switch
728 Residual Decoder
730 Inter Encoder
810 Video Decoder
871 Entropy Decoder
872 Intra Decoder
873 Residual Decoder
874 Reconstruction Module
880 Inter Decoder
2100 Computer System
2101 Keyboard
2102 Mouse
2103 Trackpad
2105 Joystick
2106 Microphone
2107 Scanner
2108 Camera
2109 Speaker
2110 Touch Screen
2120 CD/DVD ROM/RW
2121 CD/DVD or similar media
2122 Thumb Drive
2123 Removable Hard Drive or Solid State Drive
2140 Core
2141 Central Processing Unit (CPU)
2142 Graphics Processing Unit (GPU)
2143 Field Programmable Gate Array (FPGA)
2144 Accelerator
2145 Read-Only Memory (ROM)
2146 Random Access Memory (RAM)
2147 Mass Storage
2148 System Bus
2149 Peripheral Bus
2150 Graphics Adapter
2154 Network Interface
2155 Communication Network

Claims

1. A method, executed by a video processing device, for processing a current video block of a video stream, comprising:
receiving the video stream;
determining that the current video block is inter-coded based on a prediction block and a motion vector (MV), wherein the MV is to be derived from a reference motion vector (RMV) and a motion vector difference (MVD) for the current video block; and
in response to determining that the MVD is coded with an adaptive MVD pixel resolution:
determining a reference MVD pixel precision for the current video block;
identifying a maximum allowable MVD pixel precision;
determining a set of allowable MVD levels for the current video block based on the reference MVD pixel precision and the maximum allowable MVD pixel precision; and
deriving the MVD from the video stream according to at least one MVD parameter signaled in the video stream for the current video block and the set of allowable MVD levels.

2. The method of claim 1, wherein the reference MVD pixel precision of the current video block is specified, signaled, or derived at a sequence level, a picture level, a frame level, a superblock level, or a coding block level.

3. The method of claim 2, wherein the reference MVD pixel precision of the current video block depends on an MVD class associated with the MVD of the current video block.

4. The method of claim 2, wherein the reference MVD pixel precision of the current video block depends on a magnitude of the MVD of the current video block.

5. The method of claim 2, wherein the maximum allowable MVD pixel precision is predefined.

6. The method of claim 1, further comprising determining a current MVD class from among a predefined set of MVD classes, wherein determining the set of allowable MVD levels for the current video block based on the reference MVD pixel precision and the maximum allowable MVD pixel precision comprises: excluding MVD levels associated with an MVD pixel precision equal to or higher than the maximum allowable MVD pixel precision from a reference MVD level set, determined based on the reference MVD pixel precision and the current MVD class, to obtain the set of allowable MVD levels for the current video block.

7. The method of claim 6, wherein the maximum allowable MVD pixel precision is 1/4 pixel.

8. The method of any one of claims 1 to 5, wherein MVD levels associated with a precision of 1/8 pixel or higher are excluded from the set of allowable MVD levels for the current video block.
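As a rough illustration of the exclusion recited in the preceding claim, the following Python sketch filters a candidate set of MVD levels so that none requires 1/8-pixel precision or finer. Representing MVD levels as fractions of a pixel, the helper names `required_step` and `allowable_levels`, and the sample candidate set are all assumptions made for this example; they are not taken from the disclosure.

```python
from fractions import Fraction

def required_step(level: Fraction) -> Fraction:
    """Sub-pixel grid a level needs: 1/denominator of the reduced fraction (1 for integers)."""
    return Fraction(1, level.denominator)

def allowable_levels(reference_levels, excluded_step=Fraction(1, 8)):
    """Keep only levels whose required grid is coarser than `excluded_step`."""
    return [lv for lv in reference_levels if required_step(lv) > excluded_step]

# Hypothetical reference set: every multiple of 1/8 pixel up to 1 pixel.
reference_levels = [Fraction(n, 8) for n in range(1, 9)]
allowed = allowable_levels(reference_levels)
print([str(lv) for lv in allowed])
```

With this candidate set, only the levels 1/4, 1/2, 3/4, and 1 pixel remain, because every other candidate lies on the 1/8-pixel grid.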

9. The method of claim 1, further comprising determining a current MVD class from among a predefined set of MVD classes, wherein an MVD level associated with a fractional MVD precision is included in the set of allowable MVD levels, regardless of the reference MVD pixel precision, if the current MVD class is less than or equal to a threshold MVD class.

10. The method of claim 9, wherein the threshold MVD class is the lowest MVD class among the predefined set of MVD classes.
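The class-dependent rule in the two preceding claims can be sketched as follows. The function name `levels_for_class`, the integer class numbering with 0 as the lowest class, and in particular the fallback behaviour for classes above the threshold (keeping only levels that are multiples of the reference precision) are assumptions for illustration; the claims only state what happens at or below the threshold class.

```python
from fractions import Fraction

def levels_for_class(candidates, reference_step, mvd_class, threshold_class=0):
    """Return the candidate levels permitted for the given MVD class."""
    if mvd_class <= threshold_class:
        # Low classes: fractional levels stay in, whatever the reference precision is.
        return list(candidates)
    # Assumption: higher classes keep only levels on the reference-precision grid.
    return [lv for lv in candidates if lv % reference_step == 0]

# Hypothetical candidates (in pixels) with a 1-pixel reference precision.
candidates = [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2)]
low_class = levels_for_class(candidates, Fraction(1), mvd_class=0)
high_class = levels_for_class(candidates, Fraction(1), mvd_class=3)
```

Here `low_class` keeps the half-pixel levels even though the reference precision is one pixel, while `high_class` retains only the integer-pixel levels.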

11. The method of any one of claims 1 to 5, further comprising determining a magnitude of the MVD, wherein an MVD level associated with an MVD precision higher than a threshold MVD precision is allowed in the set of allowable MVD levels only if the magnitude of the MVD is less than or equal to a threshold MVD magnitude.

12. The method of claim 11, wherein the threshold MVD magnitude is 2 pixels or less.

13. The method of claim 12, wherein the threshold MVD precision is 1 pixel.

14. The method of claim 11, wherein MVD levels associated with an MVD precision of 1/4 pixel or higher are allowed only if the magnitude of the MVD is 1/2 pixel or less.
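The magnitude gate described in the preceding claims can be sketched in a few lines of Python. The sketch treats each candidate level as the MVD magnitude in pixels and uses the hypothetical helper `level_allowed`; the default thresholds (1-pixel precision, 2-pixel magnitude) follow the values named in the claims, while everything else is an assumption for illustration.

```python
from fractions import Fraction

def level_allowed(level, threshold_step=Fraction(1), threshold_magnitude=Fraction(2)):
    """True if `level` may appear in the allowable set under the magnitude gate."""
    step = Fraction(1, level.denominator)  # grid the level needs (1 for integers)
    if step < threshold_step:
        # Finer than the threshold precision: only small MVDs may use it.
        return abs(level) <= threshold_magnitude
    return True

# Hypothetical candidate levels in pixels.
levels = [Fraction(1, 2), Fraction(3, 2), Fraction(5, 2), Fraction(3), Fraction(4)]
allowed = [lv for lv in levels if level_allowed(lv)]
```

With the defaults, 5/2 is rejected because it needs sub-pixel precision while exceeding 2 pixels. Passing `threshold_step=Fraction(1, 2)` and `threshold_magnitude=Fraction(1, 2)` approximates the variant in which 1/4-pixel or finer levels are allowed only up to 1/2 pixel.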

15. The method of any one of claims 1 to 5, wherein the maximum allowable MVD pixel precision is less than or equal to the reference MVD pixel precision.

16. A method, executed by a video processing device, for processing a current video block of a video stream, comprising:
receiving the video stream;
determining that the current video block is inter-coded and associated with a plurality of reference frames; and
determining, based on signaling in the video stream, whether an adaptive motion vector difference (MVD) pixel resolution is applied to at least one of the plurality of reference frames,
wherein the signaling includes a single-bit flag indicating whether the adaptive MVD pixel resolution is applied to all or none of the plurality of reference frames.

17. A method, executed by a video processing device, for processing a current video block of a video stream, comprising:
receiving the video stream;
determining that the current video block is inter-coded and associated with a plurality of reference frames; and
determining, based on signaling in the video stream, whether an adaptive motion vector difference (MVD) pixel resolution is applied to at least one of the plurality of reference frames,
wherein the signaling includes separate flags, each corresponding to one of the plurality of reference frames, indicating whether the adaptive MVD pixel resolution is applied to that reference frame.

18. A method, executed by a video processing device, for processing a current video block of a video stream, comprising:
receiving the video stream;
determining that the current video block is inter-coded and associated with a plurality of reference frames; and
determining, based on signaling in the video stream, whether an adaptive motion vector difference (MVD) pixel resolution is applied to at least one of the plurality of reference frames,
wherein the signaling comprises, for each of the plurality of reference frames:
an implicit indication that the adaptive MVD pixel resolution is not applied when the MVD corresponding to that reference frame is zero; and
a single-bit flag indicating whether the adaptive MVD pixel resolution is applied when the MVD corresponding to that reference frame is non-zero.
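The three signaling alternatives recited in the preceding independent claims (one joint flag, one flag per reference frame, and an implicit "off" for zero MVDs) can be contrasted with a short Python sketch. The bit source is modeled as a plain iterator of already-decoded 0/1 values, and the function names are hypothetical; a real decoder would obtain these flags through its entropy decoder, and the sketch assumes the zero or non-zero status of each MVD is known before the flag is read.

```python
from typing import Iterator, List, Tuple

def parse_joint(bits: Iterator[int], num_refs: int) -> List[bool]:
    """One flag switches adaptive MVD resolution on or off for all reference frames."""
    flag = bool(next(bits))
    return [flag] * num_refs

def parse_separate(bits: Iterator[int], num_refs: int) -> List[bool]:
    """One flag is read per reference frame."""
    return [bool(next(bits)) for _ in range(num_refs)]

def parse_implicit_zero(bits: Iterator[int], mvds: List[Tuple[int, int]]) -> List[bool]:
    """A zero MVD implies 'not applied'; a flag is read only for non-zero MVDs."""
    flags = []
    for mvd in mvds:
        if mvd == (0, 0):
            flags.append(False)           # implicit, no bit consumed
        else:
            flags.append(bool(next(bits)))
    return flags

# Compound prediction with two reference frames (illustrative values).
joint = parse_joint(iter([1]), 2)
separate = parse_separate(iter([1, 0]), 2)
stream = iter([1])
implicit = parse_implicit_zero(stream, [(0, 0), (3, -1)])
```

In the last call only one bit is consumed, since the first reference frame has a zero MVD and its flag is inferred rather than read.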

19. A method, executed by a video processing device, for processing a current video block of a video stream, comprising:
receiving the video stream;
determining that the current video block is inter-coded based on a prediction block and a motion vector (MV), wherein the MV is to be derived from a reference motion vector (RMV) and a motion vector difference (MVD) for the current video block;
determining a current MVD class of the MVD from among a predefined set of MVD classes;
deriving at least one context for entropy decoding at least one explicit signaling in the video stream based on the current MVD class, the at least one explicit signaling being included in the video stream to specify an MVD pixel resolution for at least one component of the MVD;
and entropy decoding the at least one explicit signaling from the video stream using the at least one context to determine the MVD pixel resolution for the at least one component of the MVD.

20. The method of claim 19, wherein the at least one component of the MVD includes a horizontal component and a vertical component of the MVD, and the at least one context includes two separate contexts each associated with one of the horizontal and vertical components of the MVD, the horizontal and vertical components being associated with separate MVD pixel resolutions.
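One simple way to realise class-dependent, per-component contexts as recited in the two preceding claims is to index a context table by component and MVD class. The sketch below is only an illustration: the class count of 11, the names `context_index` and `contexts`, and the flat probability table are assumptions and are not specified by the claims.

```python
NUM_MVD_CLASSES = 11  # assumed class count, for illustration only
COMPONENTS = ("horizontal", "vertical")

def context_index(mvd_class: int, component: str) -> int:
    """Map (component, MVD class) to a distinct context index."""
    if not 0 <= mvd_class < NUM_MVD_CLASSES:
        raise ValueError("mvd_class out of range")
    return COMPONENTS.index(component) * NUM_MVD_CLASSES + mvd_class

# One adaptive probability per context, all starting at 0.5 (hypothetical model).
contexts = [0.5] * (len(COMPONENTS) * NUM_MVD_CLASSES)

h_ctx = context_index(3, "horizontal")
v_ctx = context_index(3, "vertical")
```

Because the horizontal and vertical components of the same class map to different entries, each component's resolution flag can adapt its statistics independently.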

21. A video processing device comprising a memory for storing computer instructions and a processor, the processor being configured, when executing the computer instructions, to perform the method according to any one of claims 1 to 5 and 16 to 20.

22. A computer program product for causing a computer to carry out the method according to any one of claims 1 to 5 and 16 to 20.