JP7832938B2

JP7832938B2 - Multi-pass decoder-side motion vector refinement

Info

Publication number: JP7832938B2
Application number: JP2023530281A
Authority: JP
Inventors: ジ・ジャン; ハン・フアン; チュン－チ・チェン; ヤン・ジャン; ヴァディム・セレジン; マルタ・カルチェヴィッチ
Original assignee: Qualcomm Incorporated
Priority date: 2020-12-22
Filing date: 2021-12-21
Publication date: 2026-03-18
Anticipated expiration: 2041-12-21
Also published as: IL301896B1; KR20230123946A; AU2021409783A1; CO2023007958A2; WO2022140338A1; AU2021409783A9; IL301896A; MX2023007161A; TW202232951A; JP2023554236A; EP4268468A1; CL2023001849A1; CA3198095A1

Description

This application claims priority to U.S. Patent Application No. 17/556,142, filed December 20, 2021, and to U.S. Provisional Application No. 63/129,221, entitled "MULTI-PASS DECODER-SIDE MOTION VECTOR REFINEMENT," filed December 22, 2020, the entire content of each of which is incorporated herein by reference. U.S. Patent Application No. 17/556,142, filed December 20, 2021, claims the benefit of U.S. Provisional Application No. 63/129,221, filed December 22, 2020.

This disclosure relates to video encoding and video decoding.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. By implementing such video coding techniques, video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
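To make temporal prediction concrete, the following is a minimal Python sketch, assuming integer-sample motion vectors, no interpolation filter, and a caller that never reads outside the reference picture. The helper names `predict_block` and `residual` are hypothetical and are not taken from any standard or codec.

```python
from typing import List, Tuple

Picture = List[List[int]]  # sample values, indexed as picture[row][column]


def predict_block(ref: Picture, x: int, y: int, w: int, h: int,
                  mv: Tuple[int, int]) -> Picture:
    """Temporal prediction: copy the w x h block at (x, y), displaced by the
    integer motion vector mv = (mv_x, mv_y), from a reference picture."""
    mv_x, mv_y = mv
    return [[ref[y + mv_y + j][x + mv_x + i] for i in range(w)]
            for j in range(h)]


def residual(cur: Picture, x: int, y: int, pred: Picture) -> Picture:
    """Residual = current block minus its prediction."""
    return [[cur[y + j][x + i] - p for i, p in enumerate(row)]
            for j, row in enumerate(pred)]


# The current picture is the reference picture shifted right by one sample,
# so the motion vector (-1, 0) predicts the block exactly.
ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[8 * r + c - 1 for c in range(8)] for r in range(8)]
print(residual(cur, 4, 4, predict_block(ref, 4, 4, 2, 2, (-1, 0))))  # [[0, 0], [0, 0]]
print(residual(cur, 4, 4, predict_block(ref, 4, 4, 2, 2, (0, 0))))   # [[-1, -1], [-1, -1]]
```

A good motion vector leaves a residual of zeros (or small values), which is what makes inter prediction cheaper to code than the raw samples.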

In general, this disclosure describes techniques for decoder-side motion vector derivation. More specifically, this disclosure describes multi-pass decoder-side motion vector refinement techniques for use in video coding. In some draft video standards, the range of motion vector refinement may be too narrow to handle all cases. The techniques of this disclosure address this problem, which may result in more accurate motion prediction and, therefore, more accurate decoding and reproduction of encoded video data.

In one example, a method includes applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and decoding the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is sub-block-based and is applied to at least one second-pass sub-block of the block of video data, wherein a width of the second-pass sub-block is less than or equal to a width of the block of video data and a height of the second-pass sub-block is less than or equal to a height of the block of video data; and a third pass that is sub-block-based and is applied to at least one third-pass sub-block of the block of video data, wherein a width of the third-pass sub-block is less than or equal to the width of the second-pass sub-block and a height of the third-pass sub-block is less than or equal to the height of the second-pass sub-block.
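The size constraints of the three passes can be illustrated with a minimal Python sketch. It assumes power-of-two block dimensions, hypothetical default sub-block size limits of 16 and 8 samples (the text above only requires each pass's sub-blocks to be no larger than those of the previous pass), and a caller-supplied `refine` function standing in for whatever per-pass search is used. `tile`, `multi_pass_dmvr`, and `refine` are illustrative names, not part of the disclosure.

```python
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def tile(rect: Rect, sub_w: int, sub_h: int) -> List[Rect]:
    """Split rect into sub_w x sub_h sub-blocks in raster order
    (assumes the dimensions divide evenly, e.g., powers of two)."""
    x0, y0, w, h = rect
    return [(x0 + i, y0 + j, sub_w, sub_h)
            for j in range(0, h, sub_h) for i in range(0, w, sub_w)]


def multi_pass_dmvr(block: Rect, mv: MV,
                    refine: Callable[[Rect, MV, int], MV],
                    max2: int = 16, max3: int = 8) -> Dict[Rect, MV]:
    """Three refinement passes; returns the final MV of every third-pass
    sub-block. max2/max3 are hypothetical sub-block size limits."""
    _, _, w, h = block
    w2, h2 = min(w, max2), min(h, max2)    # second pass: w2 <= w, h2 <= h
    w3, h3 = min(w2, max3), min(h2, max3)  # third pass: w3 <= w2, h3 <= h2
    mv1 = refine(block, mv, 1)             # first pass: whole block
    result: Dict[Rect, MV] = {}
    for sb2 in tile(block, w2, h2):
        mv2 = refine(sb2, mv1, 2)          # starts from the first-pass MV
        for sb3 in tile(sb2, w3, h3):
            result[sb3] = refine(sb3, mv2, 3)  # starts from the second-pass MV
    return result


# Dummy refinement for demonstration only: add 1 to the horizontal component.
def step(rect: Rect, mv: MV, pass_index: int) -> MV:
    return (mv[0] + 1, mv[1])


out = multi_pass_dmvr((0, 0, 32, 16), (0, 0), step)
print(len(out), set(out.values()))  # 8 {(3, 0)}
```

With this dummy `refine`, a 32×16 block yields two 16×16 second-pass sub-blocks and eight 8×8 third-pass sub-blocks, each ending with motion vector (3, 0). In this sketch each pass starts from the result of the enclosing earlier pass, so later passes only adjust motion locally.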

In another example, a device includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and to decode the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is sub-block-based and is applied to at least one second-pass sub-block of the block of video data, wherein a width of the second-pass sub-block is less than or equal to a width of the block of video data and a height of the second-pass sub-block is less than or equal to a height of the block of video data; and a third pass that is sub-block-based and is applied to at least one third-pass sub-block of the block of video data, wherein a width of the third-pass sub-block is less than or equal to the width of the second-pass sub-block and a height of the third-pass sub-block is less than or equal to the height of the second-pass sub-block.

In another example, a non-transitory computer-readable medium stores instructions that, when executed, cause one or more processors to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and to decode the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is sub-block-based and is applied to at least one second-pass sub-block of the block of video data, wherein a width of the second-pass sub-block is less than or equal to a width of the block of video data and a height of the second-pass sub-block is less than or equal to a height of the block of video data; and a third pass that is sub-block-based and is applied to at least one third-pass sub-block of the block of video data, wherein a width of the third-pass sub-block is less than or equal to the width of the second-pass sub-block and a height of the third-pass sub-block is less than or equal to the height of the second-pass sub-block.

In another example, a device includes means for applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and means for decoding the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is sub-block-based and is applied to at least one second-pass sub-block of the block of video data, wherein a width of the second-pass sub-block is less than or equal to a width of the block of video data and a height of the second-pass sub-block is less than or equal to a height of the block of video data; and a third pass that is sub-block-based and is applied to at least one third-pass sub-block of the block of video data, wherein a width of the third-pass sub-block is less than or equal to the width of the second-pass sub-block and a height of the third-pass sub-block is less than or equal to the height of the second-pass sub-block.

In one example, a method includes applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine a refined motion vector, and coding the block based on the refined motion vector.

In another example, a device includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to perform any of the techniques of this disclosure.

In another example, a device includes at least one means for performing any of the techniques of this disclosure.

In another example, a computer-readable storage medium is encoded with instructions that, when executed, cause a programmable processor to perform any of the techniques of this disclosure.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

A block diagram illustrating an example video encoding and decoding system that may perform the techniques of this disclosure.
A conceptual diagram illustrating an example quadtree binary tree (QTBT) structure.
A conceptual diagram illustrating a corresponding coding tree unit (CTU).
A block diagram illustrating an example video encoder that may perform the techniques of this disclosure.
A block diagram illustrating an example video decoder that may perform the techniques of this disclosure.
A conceptual diagram illustrating example spatial neighboring MV candidates for merge mode.
A conceptual diagram illustrating example spatial neighboring MV candidates for AMVP mode.
A conceptual diagram illustrating an example TMVP candidate.
A conceptual diagram illustrating example MV scaling.
A conceptual diagram illustrating example template matching performed on a search area around an initial MV.
A conceptual diagram illustrating an example in which MVD0 and MVD1 are proportional based on temporal distance.
A conceptual diagram illustrating an example in which MVD0 and MVD1 are mirrored regardless of temporal distance.
A conceptual diagram illustrating an example 3×3 square search pattern in the search range [-8, 8].
A conceptual diagram illustrating example decoder-side motion vector refinement.
A conceptual diagram illustrating an example extended CU region used in BDOF.
A conceptual diagram illustrating an example three-pass DMVR technique.
A conceptual diagram illustrating example BDOF motion vector refinement.
A flowchart illustrating an example multi-pass DMVR technique of this disclosure.
A flowchart illustrating an example method for encoding a current block in accordance with the techniques of this disclosure.
A flowchart illustrating an example method for decoding a current block in accordance with the techniques of this disclosure.
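Several of the figures listed above refer to bilateral matching with mirrored motion vector differences (MVD1 = -MVD0) and a 3×3 square search pattern within a [-8, 8] search range. A minimal Python sketch of such a search follows, assuming integer-sample positions only, a sum of absolute differences (SAD) cost, and in-bounds sample access. Sub-sample refinement, early termination, and the exact cost used by the disclosure are not modeled, and `fetch`, `bilateral_cost`, and `square_search` are hypothetical names.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]
Picture = List[List[int]]


def fetch(ref: Picture, x: int, y: int, w: int, h: int, mv: MV) -> List[int]:
    """Samples of the w x h block at (x, y) displaced by integer mv."""
    return [ref[y + mv[1] + j][x + mv[0] + i]
            for j in range(h) for i in range(w)]


def bilateral_cost(ref0: Picture, ref1: Picture, x: int, y: int, w: int,
                   h: int, mv0: MV, mv1: MV) -> Callable[[MV], int]:
    """Cost of an MVD: SAD between the list-0 prediction at mv0 + mvd and
    the list-1 prediction at mv1 - mvd (mirrored MVDs)."""
    def cost(mvd: MV) -> int:
        p0 = fetch(ref0, x, y, w, h, (mv0[0] + mvd[0], mv0[1] + mvd[1]))
        p1 = fetch(ref1, x, y, w, h, (mv1[0] - mvd[0], mv1[1] - mvd[1]))
        return sum(abs(a - b) for a, b in zip(p0, p1))
    return cost


def square_search(cost: Callable[[MV], int], search_range: int = 8) -> MV:
    """Test the 3x3 square around the current best MVD, move to the
    cheapest strictly better neighbor, and stop when the center is best.
    Each MVD component stays within [-search_range, search_range]."""
    center = (0, 0)
    best = cost(center)
    while True:
        next_center = center
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                mvd = (center[0] + dx, center[1] + dy)
                if mvd == center or max(abs(mvd[0]), abs(mvd[1])) > search_range:
                    continue
                c = cost(mvd)
                if c < best:
                    best, next_center = c, mvd
        if next_center == center:
            return center
        center = next_center


# ref1 holds the content of ref0 shifted right by two samples, so the
# mirrored search settles on mvd = (-1, 0): list 0 moves one sample left
# and list 1 moves one sample right.
ref0 = [[24 * r + c for c in range(24)] for r in range(24)]
ref1 = [[24 * r + c - 2 for c in range(24)] for r in range(24)]
print(square_search(bilateral_cost(ref0, ref1, 8, 8, 4, 4, (0, 0), (0, 0))))  # (-1, 0)
```

Because the center only moves when the cost strictly decreases and the range is bounded, the loop always terminates.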

In some draft video standards, the range of motion vector refinement may be too narrow to handle all cases. This may lead to error-prone motion prediction and, therefore, less accurate decoding. The techniques of this disclosure address this problem and may result in more accurate motion prediction and, therefore, more accurate decoding and reproduction of encoded video data.

Figure 1 is a block diagram illustrating an example video encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure are generally directed to coding (encoding and/or decoding) video data. In general, video data includes any data for processing a video. Thus, video data may include raw, unencoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata, such as signaling data.

As shown in Figure 1, system 100 includes, in this example, a source device 102 that provides encoded video data to be decoded and displayed by a destination device 116. In particular, source device 102 provides the video data to destination device 116 via a computer-readable medium 110. Source device 102 and destination device 116 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, mobile devices, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, broadcast receiver devices, or the like. In some cases, source device 102 and destination device 116 may be equipped for wireless communication and thus may be referred to as wireless communication devices.

In the example of Figure 1, source device 102 includes video source 104, memory 106, video encoder 200, and output interface 108. Destination device 116 includes input interface 122, video decoder 300, memory 120, and display device 118. In accordance with this disclosure, video encoder 200 of source device 102 and video decoder 300 of destination device 116 may be configured to apply the techniques for decoder-side motion vector derivation. Thus, source device 102 represents an example of a video encoding device, while destination device 116 represents an example of a video decoding device. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 102 may receive video data from an external video source, such as an external camera. Likewise, destination device 116 may interface with an external display device rather than include an integrated display device.

System 100 as shown in Figure 1 is merely one example. In general, any digital video decoding device may perform techniques for decoder-side motion vector derivation. Source device 102 and destination device 116 are merely examples of such coding devices, in which source device 102 generates coded video data for transmission to destination device 116. This disclosure refers to a device that performs coding (encoding and/or decoding) of data as a "coding" device. Thus, video encoder 200 and video decoder 300 represent examples of coding devices, in particular, a video encoder and a video decoder, respectively. In some examples, source device 102 and destination device 116 may operate in a substantially symmetrical manner, such that each of source device 102 and destination device 116 includes video encoding and decoding components. Hence, system 100 may support one-way or two-way video transmission between source device 102 and destination device 116, e.g., for video streaming, video playback, video broadcasting, or video telephony.

In general, video source 104 represents a source of video data (i.e., raw, unencoded video data) and provides a sequential series of pictures (also referred to as "frames") of the video data to video encoder 200, which encodes data for the pictures. Video source 104 of source device 102 may include a video capture device, such as a video camera, a video archive containing previously captured raw video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 104 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In each case, video encoder 200 encodes the captured, pre-captured, or computer-generated video data. Video encoder 200 may rearrange the pictures from the received order (sometimes referred to as "display order") into a coding order for coding. Video encoder 200 may generate a bitstream including the encoded video data. Source device 102 may then output the encoded video data via output interface 108 onto computer-readable medium 110 for reception and/or retrieval by, e.g., input interface 122 of destination device 116.
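As an illustration of reordering pictures from display order into coding order, the following minimal Python sketch assumes a hypothetical I/P/B structure in which each B picture references the anchor (I or P) pictures on both sides, so an anchor must be coded before the B pictures displayed ahead of it. Real encoders typically use deeper hierarchical structures, and `coding_order` is an illustrative name only.

```python
from typing import List


def coding_order(types: List[str]) -> List[int]:
    """Return display indices in coding order: each I/P anchor is emitted
    before the B pictures that precede it in display order."""
    order: List[int] = []
    pending_b: List[int] = []
    for idx, picture_type in enumerate(types):
        if picture_type == "B":
            pending_b.append(idx)      # wait until the next anchor is coded
        else:
            order.append(idx)          # code the anchor first
            order.extend(pending_b)    # then the B pictures it unlocks
            pending_b = []
    order.extend(pending_b)  # simplification: trailing B pictures go last
    return order


print(coding_order(["I", "B", "B", "P", "B", "B", "P"]))  # [0, 3, 1, 2, 6, 4, 5]
```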

Memory 106 of source device 102 and memory 120 of destination device 116 represent general-purpose memories. In some examples, memories 106, 120 may store raw video data, e.g., raw video from video source 104 and raw, decoded video data from video decoder 300. Additionally or alternatively, memories 106, 120 may store software instructions executable by, e.g., video encoder 200 and video decoder 300, respectively. Although memory 106 and memory 120 are shown separately from video encoder 200 and video decoder 300 in this example, it should be understood that video encoder 200 and video decoder 300 may also include internal memories for functionally similar or equivalent purposes. Furthermore, memories 106, 120 may store encoded video data, e.g., output from video encoder 200 and input to video decoder 300. In some examples, portions of memories 106, 120 may be allocated as one or more video buffers, e.g., to store raw, decoded, and/or encoded video data.

Computer-readable medium 110 may represent any type of medium or device capable of transporting the encoded video data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to transmit encoded video data directly to destination device 116 in real time, e.g., via a radio frequency network or a computer-based network. Output interface 108 may modulate a transmission signal including the encoded video data, and input interface 122 may demodulate the received transmission signal, according to a communication standard such as a wireless communication protocol. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 102 to destination device 116.

In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.

In some examples, source device 102 may output encoded video data to file server 114 or another intermediate storage device that may store the encoded video data generated by source device 102. Destination device 116 may access stored video data from file server 114 via streaming or download.

File server 114 may be any type of server device capable of storing encoded video data and transmitting that encoded video data to destination device 116. File server 114 may represent a web server (e.g., for a website), a server configured to provide a file transfer protocol service (such as File Transfer Protocol (FTP) or File Delivery over Unidirectional Transport (FLUTE) protocol), a content delivery network (CDN) device, a hypertext transfer protocol (HTTP) server, a Multimedia Broadcast Multicast Service (MBMS) or Enhanced MBMS (eMBMS) server, and/or a network attached storage (NAS) device. File server 114 may, additionally or alternatively, implement one or more HTTP streaming protocols, such as Dynamic Adaptive Streaming over HTTP (DASH), HTTP Live Streaming (HLS), Real Time Streaming Protocol (RTSP), HTTP Dynamic Streaming, or the like.

Destination device 116 may access encoded video data from file server 114 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on file server 114. Input interface 122 may be configured to operate according to any one or more of the various protocols discussed above for retrieving or receiving media data from file server 114, or other such protocols for retrieving media data.

Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components that operate according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G, or the like. In some examples where output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee™), a Bluetooth™ standard, or the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include an SoC device to perform the functionality attributed to video encoder 200 and/or output interface 108, and destination device 116 may include an SoC device to perform the functionality attributed to video decoder 300 and/or input interface 122.

本開示の技法は、電波によるテレビジョン放送、ケーブルテレビジョン送信、衛星テレビジョン送信、dynamic adaptive streaming over HTTP(DASH)などのインターネットストリーミングビデオ送信、データ記憶媒体上に符号化されているデジタルビデオ、データ記憶媒体上に記憶されたデジタルビデオの復号、または他の適用例などの、様々なマルチメディア適用例のいずれかをサポートするビデオコーディングに適用され得る。 The techniques of this disclosure can be applied to video coding supporting any of a variety of multimedia applications, such as television broadcasting via radio waves, cable television transmission, satellite television transmission, internet streaming video transmission such as dynamic adaptive streaming over HTTP (DASH), digital video encoded on a data storage medium, decoding of digital video stored on a data storage medium, or other applications.

デスティネーションデバイス116の入力インターフェース122は、コンピュータ可読媒体110(たとえば、通信媒体、記憶デバイス112、ファイルサーバ114など)から、符号化されたビデオビットストリームを受信する。符号化されたビデオビットストリームは、ビデオブロックまたは他のコーディングされたユニット(たとえば、スライス、ピクチャ、ピクチャグループ、シーケンスなど)の特性および/または処理を記述する値を有するシンタックス要素などの、ビデオデコーダ300によっても使用されるビデオエンコーダ200によって定義されるシグナリング情報を含み得る。表示デバイス118は、復号されたビデオデータの復号されたピクチャをユーザに表示する。表示デバイス118は、液晶ディスプレイ(LCD)、プラズマディスプレイ、有機発光ダイオード(OLED)ディスプレイ、または別のタイプの表示デバイスなどの、様々な表示デバイスのいずれかを表し得る。 The input interface 122 of the destination device 116 receives an encoded video bitstream from the computer-readable medium 110 (e.g., a communication medium, the storage device 112, the file server 114, or the like). The encoded video bitstream may include signaling information defined by the video encoder 200 and also used by the video decoder 300, such as syntax elements having values that describe characteristics and/or processing of video blocks or other coded units (e.g., slices, pictures, groups of pictures, sequences, or the like). The display device 118 displays decoded pictures of the decoded video data to a user. The display device 118 may represent any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.

図1には示されないが、いくつかの例では、ビデオエンコーダ200およびビデオデコーダ300は各々、オーディオエンコーダおよび/またはオーディオデコーダと統合されてもよく、共通のデータストリームの中のオーディオとビデオの両方を含む多重化されたストリームを扱うために、適切なMUX-DEMUXユニット、または他のハードウェアおよび/もしくはソフトウェアを含んでもよい。適用可能な場合、MUX-DEMUXユニットは、ITU H.223マルチプレクサプロトコル、またはユーザデータグラムプロトコル(UDP)などの他のプロトコルに準拠し得る。 Although not shown in Figure 1, in some examples, the video encoder 200 and video decoder 300 may each be integrated with an audio encoder and/or audio decoder, and may include a suitable MUX-DEMUX unit or other hardware and/or software to handle multiplexed streams containing both audio and video in a common data stream. Where applicable, the MUX-DEMUX unit may comply with the ITU H.223 Multiplexer Protocol or other protocols such as the User Datagram Protocol (UDP).

ビデオエンコーダ200およびビデオデコーダ300は各々、1つまたは複数のマイクロプロセッサ、デジタル信号プロセッサ(DSP)、特定用途向け集積回路(ASIC)、フィールドプログラマブルゲートアレイ(FPGA)、ディスクリート論理、ソフトウェア、ハードウェア、ファームウェア、またはそれらの任意の組合せなどの、様々な適切なエンコーダおよび/またはデコーダ回路のいずれかとして実装され得る。技法が部分的にソフトウェアで実装されるとき、デバイスは、適切な非一時的コンピュータ可読媒体にソフトウェアのための命令を記憶し、本開示の技法を実行するために1つまたは複数のプロセッサを使用してハードウェアでその命令を実行し得る。ビデオエンコーダ200およびビデオデコーダ300の各々は、1つまたは複数のエンコーダまたはデコーダに含まれてもよく、それらのいずれもが、それぞれのデバイスの中で複合エンコーダ/デコーダ(コーデック)の一部として統合されてもよい。ビデオエンコーダ200および/またはビデオデコーダ300を含むデバイスは、集積回路、マイクロプロセッサ、および/または携帯電話などのワイヤレス通信デバイスを備え得る。 The video encoder 200 and the video decoder 300 each may be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the video encoder 200 and the video decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including the video encoder 200 and/or the video decoder 300 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

ビデオエンコーダ200およびビデオデコーダ300は、High Efficiency Video Coding(HEVC)とも呼ばれるITU-T H.265などのビデオコーディング規格、または、マルチビューおよび/もしくはスケーラブルビデオコーディング拡張などのそれらの拡張に従って動作し得る。代替として、ビデオエンコーダ200およびビデオデコーダ300は、Versatile Video Coding(VVC)とも呼ばれるITU-T H.266などの、他のプロプライエタリ規格または業界規格に従って動作し得る。VVC規格の草案は、Bross他、「Versatile Video Coding Editorial Refinements on Draft 10」、ITU-T SG 16 WP 3およびISO/IEC JTC 1/SC 29/WG 11のJoint Video Experts Team(JVET)、遠隔会議による第18回会合、2020年10月7～16日、JVET-T2001-v1(以後「VVC Draft 10」)に記載されている。しかしながら、本開示の技法は、いかなる特定のコーディング規格にも限定されない。 The video encoder 200 and video decoder 300 may operate in accordance with video coding standards such as ITU-T H.265, also known as High Efficiency Video Coding (HEVC), or their extensions, such as multiview and/or scalable video coding extensions. Alternatively, the video encoder 200 and video decoder 300 may operate in accordance with other proprietary or industry standards, such as ITU-T H.266, also known as Versatile Video Coding (VVC). The draft VVC standard is described in Bross et al., "Versatile Video Coding Editorial Refinements on Draft 10," ITU-T SG 16 WP 3 and the Joint Video Experts Team (JVET) of ISO/IEC JTC 1/SC 29/WG 11, 18th meeting held remotely, October 7-16, 2020, JVET-T2001-v1 (hereinafter "VVC Draft 10"). However, the techniques described herein are not limited to any specific coding standard.

一般に、ビデオエンコーダ200およびビデオデコーダ300は、ピクチャのブロックベースのコーディングを実行し得る。「ブロック」という用語は全般に、処理されるべきデータ(たとえば、符号化および/または復号プロセスにおいて符号化される、復号される、または別様に使用される)を含む構造を指す。たとえば、ブロックは、輝度および/または色度データのサンプルの2次元行列を含み得る。一般に、ビデオエンコーダ200およびビデオデコーダ300は、YUV(たとえば、Y、Cb、Cr)フォーマットで表されるビデオデータをコーディングし得る。すなわち、ピクチャのサンプルに対する赤、緑、および青(RGB)のデータをコーディングするのではなく、ビデオエンコーダ200およびビデオデコーダ300は、輝度成分および色度成分をコーディングしてもよく、色度成分は、赤の色調と青の色調の両方の色度成分を含んでもよい。いくつかの例では、ビデオエンコーダ200は、受信されたRGBフォーマットされたデータを符号化の前にYUV表現へと変換し、ビデオデコーダ300は、YUV表現をRGBフォーマットに変換する。代替として、前処理ユニットおよび後処理ユニット(図示せず)が、これらの変換を実行してもよい。 In general, the video encoder 200 and the video decoder 300 may perform block-based coding of pictures. The term "block" generally refers to a structure including data to be processed (e.g., encoded, decoded, or otherwise used in the encoding and/or decoding process). For example, a block may include a two-dimensional matrix of samples of luminance and/or chrominance data. In general, the video encoder 200 and the video decoder 300 may code video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for samples of a picture, the video encoder 200 and the video decoder 300 may code luminance and chrominance components, where the chrominance components may include both red-hue and blue-hue chrominance components. In some examples, the video encoder 200 converts received RGB-formatted data to a YUV representation prior to encoding, and the video decoder 300 converts the YUV representation back to the RGB format. Alternatively, pre- and post-processing units (not shown) may perform these conversions.
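As a minimal sketch of the RGB-to-YUV conversion mentioned above, the following assumes full-range BT.601 (JFIF-style) YCbCr coefficients and 8-bit samples. The disclosure does not specify a conversion matrix, and the function names are hypothetical.

```python
def _clip8(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple:
    """Convert one 8-bit RGB sample to (Y, Cb, Cr); full-range BT.601 is assumed."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clip8(y), _clip8(cb), _clip8(cr)


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Inverse conversion, as a post-processing step after decoding might perform."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clip8(r), _clip8(g), _clip8(b)
```

Because Y, Cb, and Cr are rounded to integers, a round trip through both functions may differ from the input by a sample value or two.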

本開示は一般に、ピクチャのデータを符号化または復号するプロセスを含むものとして、ピクチャのコーディング(たとえば、符号化および復号)に言及し得る。同様に、本開示は、ブロックのためのデータを符号化または復号するプロセス、たとえば予測および/または残差コーディングを含むものとして、ピクチャのブロックのコーディングに言及し得る。符号化されたビデオビットストリームは一般に、コーディングの決定(たとえば、コーディングモード)およびブロックへのピクチャの区分を表す、シンタックス要素に対する一連の値を含む。したがって、ピクチャまたはブロックをコーディングすることへの言及は全般に、ピクチャまたはブロックを形成するシンタックス要素に対する値をコーディングすることとして理解されるべきである。 This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data of the picture. Similarly, this disclosure may refer to coding of blocks of a picture to include the process of encoding or decoding data for the blocks, e.g., prediction and/or residual coding. An encoded video bitstream generally includes a series of values for syntax elements representative of coding decisions (e.g., coding modes) and of the partitioning of pictures into blocks. Thus, references to coding a picture or a block should generally be understood as coding values for the syntax elements forming the picture or block.

HEVCは、コーディングユニット(CU)、予測ユニット(PU)、および変換ユニット(TU)を含む、様々なブロックを定義する。HEVCによれば、ビデオコーダ(ビデオエンコーダ200など)は、四分木構造に従ってコーディングツリーユニット(CTU)をCUへと区分する。すなわち、ビデオコーダは、CTUおよびCUを4つの等しい重複しない正方形へと区分し、四分木の各ノードは、0個または4個のいずれかの子ノードを有する。子ノードのないノードは「リーフノード」と呼ばれることがあり、そのようなリーフノードのCUは、1つまたは複数のPUおよび/または1つまたは複数のTUを含むことがある。ビデオコーダはさらにPUおよびTUを区分し得る。たとえば、HEVCでは、残差四分木(RQT)はTUの区分を表す。HEVCでは、PUはインター予測データを表すが、TUは残差データを表す。イントラ予測されるCUは、イントラモード指示などのイントラ予測情報を含む。 HEVC defines various blocks, including coding units (CUs), prediction units (PUs), and transform units (TUs). According to HEVC, a video coder (such as the video encoder 200) partitions a coding tree unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, non-overlapping squares, and each node of the quadtree has either zero or four child nodes. Nodes without child nodes may be referred to as "leaf nodes," and CUs of such leaf nodes may include one or more PUs and/or one or more TUs. The video coder may further partition PUs and TUs. For example, in HEVC, a residual quadtree (RQT) represents the partitioning of TUs. In HEVC, PUs represent inter-prediction data, while TUs represent residual data. CUs that are intra-predicted include intra-prediction information, such as an intra-mode indication.
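The CTU-to-CU quadtree described above can be sketched as a recursive split in which every node has either zero or four children. This is a minimal illustration: the `should_split` callback (standing in for the encoder's decision or a parsed split flag), the minimum size of 8, and the function names are assumptions, not HEVC syntax.

```python
from typing import Callable, List, Tuple

Square = Tuple[int, int, int]  # (x, y, size) of a square block, in luma samples


def quadtree_partition(x: int, y: int, size: int,
                       should_split: Callable[[int, int, int], bool],
                       min_size: int = 8) -> List[Square]:
    """Recursively split a square CTU into four equal, non-overlapping squares.

    Returns the leaf nodes (the CUs). A node either becomes a leaf or has
    exactly four children, as in the HEVC coding quadtree.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Square] = []
        # Visit children in z-order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(quadtree_partition(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]
```

For example, `quadtree_partition(0, 0, 64, lambda x, y, s: s > 32)` yields four 32×32 CUs in z-order.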

別の例として、ビデオエンコーダ200およびビデオデコーダ300は、VVCに従って動作するように構成され得る。VVCによれば、(ビデオエンコーダ200などの)ビデオコーダは、ピクチャを複数のコーディングツリーユニット(CTU)に区分する。ビデオエンコーダ200は、四分木二分木(QTBT)構造またはマルチタイプ木(MTT)構造などの木構造に従って、CTUを区分し得る。QTBT構造は、HEVCのCUと、PUと、TUとの分離などの、複数の区分タイプという概念を取り払う。QTBT構造は、四分木区分に従って区分される第1のレベル、および二分木区分に従って区分される第2のレベルという、2つのレベルを含む。QTBT構造のルートノードはCTUに対応する。二分木のリーフノードはコーディングユニット(CU)に対応する。 As another example, the video encoder 200 and video decoder 300 may be configured to operate according to VVC. According to VVC, the video coder (such as the video encoder 200) divides a picture into multiple coding tree units (CTUs). The video encoder 200 may divide the CTUs according to a tree structure such as a quadtree-binary tree (QTBT) structure or a multi-type tree (MTT) structure. The QTBT structure eliminates the concept of multiple division types, such as the separation of CUs, PUs, and TUs in HEVC. The QTBT structure includes two levels: a first level divided according to quadtree divisions, and a second level divided according to binary tree divisions. The root node of the QTBT structure corresponds to a CTU. The leaf nodes of the binary tree correspond to coding units (CUs).

MTT区分構造では、ブロックは四分木(QT)区分、二分木(BT)区分、および1つまたは複数のタイプの三分木(triple tree)(TT)(三分木(ternary tree)(TT)とも呼ばれる)区分を使用して区分され得る。三分木(triple tree)または三分木(ternary tree)区分は、ブロックが3つのサブブロックに分割される区分である。いくつかの例では、三分木(triple tree)または三分木(ternary tree)区分は、中心を通って元のブロックを分割することなく、ブロックを3つのサブブロックへと分割する。MTTにおける区分タイプ(たとえば、QT、BT、およびTT)は、対称的であっても、または非対称であってもよい。 In MTT partitioning structures, blocks can be partitioned using quadtree (QT) partitions, binary tree (BT) partitions, and one or more types of triple tree (TT) partitions (also called ternary tree (TT) partitions). A triple tree or ternary tree partition is a partition in which a block is divided into three subblocks. In some examples, a triple tree or ternary tree partition divides a block into three subblocks without dividing the original block through a center. The partition types in MTT (e.g., QT, BT, and TT) can be symmetrical or asymmetrical.
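The following sketch lists the sub-block rectangles produced by each MTT split type for a W×H block. It assumes the 1:2:1 ratio that VVC uses for ternary splits, which matches the statement above that a ternary split does not cut through the block center. The mode names are hypothetical labels, and only symmetric splits are shown.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def split_block(x: int, y: int, w: int, h: int, mode: str) -> List[Rect]:
    """Return the sub-blocks produced by one QT, BT, or TT split."""
    if mode == "QT":  # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_HOR":  # horizontal split line: two halves stacked vertically
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_VER":  # vertical split line: two halves side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_HOR":  # three parts stacked vertically, heights in ratio 1:2:1
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_VER":  # three parts side by side, widths in ratio 1:2:1
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")
```

With the 1:2:1 ratio the middle part straddles the center line, so no partition boundary passes through the center of the original block.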

いくつかの例では、ビデオエンコーダ200およびビデオデコーダ300は、輝度成分および色度成分の各々を表すために単一のQTBT構造またはMTT構造を使用してもよく、他の例では、ビデオエンコーダ200およびビデオデコーダ300は、輝度成分のための1つのQTBT/MTT構造および両方の色度成分のための別のQTBT/MTT構造(またはそれぞれの色度成分のための2つのQTBT/MTT構造)などの、2つ以上のQTBTまたはMTT構造を使用してもよい。 In some examples, the video encoder 200 and the video decoder 300 may use a single QTBT or MTT structure to represent each of the luminance and chrominance components, while in other examples, the video encoder 200 and the video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT/MTT structure for the luminance component and another QTBT/MTT structure for both chrominance components (or two QTBT/MTT structures for the respective chrominance components).

ビデオエンコーダ200およびビデオデコーダ300は、HEVCに従った四分木区分、QTBT区分、MTT区分、または他の区分構造を使用するように構成され得る。説明を目的に、本開示の技法の説明は、QTBT区分に関連して提示される。しかしながら、本開示の技法は、四分木区分、または他のタイプの区分も使用するように構成される、ビデオコーダにも適用され得ることを理解されたい。 The video encoder 200 and the video decoder 300 may be configured to use quadtree partitioning according to HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures. For purposes of explanation, the techniques of this disclosure are described with respect to QTBT partitioning. However, it should be understood that the techniques of this disclosure may also be applied to video coders configured to use quadtree partitioning or other types of partitioning.

いくつかの例では、CTUは、ルマサンプルのコーディングツリーブロック(CTB)、3つのサンプルアレイを有するピクチャのクロマサンプルの2つの対応するCTB、またはモノクロームピクチャもしくはサンプルをコーディングするために使用される3つの別個の色平面およびシンタックス構造を使用してコーディングされたピクチャのサンプルのCTBを含む。CTBは、CTBへの成分の分割が区分であるような、何らかの値のNに対するサンプルのN×Nブロックであり得る。成分は、1つのアレイまたは4:2:0、4:2:2、もしくは4:4:4カラーフォーマットでピクチャを構成する3つのアレイ(ルマおよび2つのクロマ)のうちの1つからの単一のサンプル、あるいはアレイまたはモノクロームフォーマットでピクチャを構成するアレイの単一のサンプルである。いくつかの例では、コーディングブロックは、コーディングブロックへのCTBの分割が区分であるような、何らかの値のMおよびNに対するサンプルのM×Nブロックである。 In some examples, a CTU includes a coding tree block (CTB) of luma samples and two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or of a picture that is coded using three separate color planes and syntax structures used to code the samples. A CTB may be an N×N block of samples for some value of N such that the division of a component into CTBs is a partitioning. A component is an array, or a single sample, from one of the three arrays (luma and two chroma) that compose a picture in 4:2:0, 4:2:2, or 4:4:4 color format, or the array, or a single sample of the array, that composes a picture in monochrome format. In some examples, a coding block is an M×N block of samples for some values of M and N such that the division of a CTB into coding blocks is a partitioning.
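To make the relationship between the luma CTB and its chroma CTBs concrete, the sketch below computes per-component CTB dimensions for each color format named above. It assumes the usual chroma subsampling factors (4:2:0 halves both dimensions, 4:2:2 halves only the width, 4:4:4 keeps full resolution) and treats a monochrome picture as a single sample array. The function and table names are hypothetical.

```python
from typing import Dict, List, Tuple

# (horizontal, vertical) chroma subsampling factors for each color format
SUBSAMPLING: Dict[str, Tuple[int, int]] = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def ctb_dimensions(luma_ctb_size: int, chroma_format: str) -> List[Tuple[int, int]]:
    """Return (width, height) of the luma CTB followed by any chroma CTBs."""
    dims = [(luma_ctb_size, luma_ctb_size)]
    if chroma_format == "monochrome":
        return dims  # only one sample array, so only one CTB
    sub_w, sub_h = SUBSAMPLING[chroma_format]
    chroma = (luma_ctb_size // sub_w, luma_ctb_size // sub_h)
    return dims + [chroma, chroma]  # the Cb and Cr CTBs have the same size
```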

ブロック(たとえば、CTUまたはCU)は、ピクチャにおいて様々な方法でグループ化され得る。一例として、ブリックは、ピクチャの中の特定のタイル内のCTU行の長方形領域を指し得る。タイルは、ピクチャにおける特定のタイル列および特定のタイル行の中のCTUの長方形領域であり得る。タイル列は、ピクチャの高さに等しい高さおよび(たとえば、ピクチャパラメータセットなどにおいて)シンタックス要素によって指定される幅を有する、CTUの長方形領域を指す。タイル行は、(たとえば、ピクチャパラメータセットなどにおいて)シンタックス要素によって指定される高さおよびピクチャの幅に等しい幅を有する、CTUの長方形領域を指す。 Blocks (e.g., CTUs or CUs) may be grouped in various ways in a picture. As one example, a brick may refer to a rectangular region of CTU rows within a particular tile in a picture. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column refers to a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements (e.g., in a picture parameter set). A tile row refers to a rectangular region of CTUs having a height specified by syntax elements (e.g., in a picture parameter set) and a width equal to the width of the picture.
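Given tile column widths and tile row heights in units of CTUs (as could be signaled, e.g., in a picture parameter set), the tile containing a CTU can be located with a binary search over the cumulative boundaries. This is a minimal sketch; the function name and the raster ordering of tile indices are assumptions.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence


def tile_index(ctu_x: int, ctu_y: int,
               col_widths: Sequence[int], row_heights: Sequence[int]) -> int:
    """Return the raster-order index of the tile containing CTU (ctu_x, ctu_y).

    col_widths and row_heights give tile column widths and tile row heights in CTUs.
    """
    col_ends = list(accumulate(col_widths))   # exclusive right edge of each tile column
    row_ends = list(accumulate(row_heights))  # exclusive bottom edge of each tile row
    if not (0 <= ctu_x < col_ends[-1] and 0 <= ctu_y < row_ends[-1]):
        raise ValueError("CTU lies outside the picture")
    col = bisect_right(col_ends, ctu_x)  # first column whose end lies beyond ctu_x
    row = bisect_right(row_ends, ctu_y)
    return row * len(col_widths) + col
```

For a picture 8 CTUs wide and 4 CTUs high split into tile columns of widths 3 and 5 and tile rows of heights 2 and 2, CTU (3, 0) falls in tile 1 and CTU (7, 3) falls in tile 3.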

いくつかの例では、タイルは複数のブリックへと区分されてもよく、ブリックの各々はタイル内の1つまたは複数のCTU行を含んでもよい。複数のブリックに区分されないタイルも、ブリックと呼ばれることがある。しかしながら、タイルの真のサブセットであるブリックは、タイルと呼ばれないことがある。 In some examples, a tile may be partitioned into multiple bricks, each of which may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. However, a brick that is a proper subset of a tile may not be referred to as a tile.

ピクチャの中のブリックは、スライスにおいても並べられ得る。スライスは、単一のネットワーク抽象化レイヤ(NAL)ユニットに独占的に含まれ得る、整数個のピクチャのブリックであり得る。いくつかの例では、スライスは、いくつかの完全なタイル、または、1つのタイルの一連の連続する完全なブリックのみの、いずれかを含む。 The bricks in a picture may also be arranged in slices. A slice may be an integer number of bricks of a picture that may be exclusively contained in a single network abstraction layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a consecutive sequence of complete bricks of one tile.

本開示は、垂直方向および水平方向の寸法に関して、ブロック(CUまたは他のビデオブロックなど)のサンプル寸法を指すために、「N×N」および「N対N」を、たとえば16×16サンプルまたは16対16サンプルを交換可能に使用することがある。一般に、16×16 CUは、垂直方向に16個のサンプル(y=16)と水平方向に16個のサンプル(x=16)とを有する。同様に、N×N CUは、一般に、垂直方向にN個のサンプルと水平方向にN個のサンプルとを有し、ここでNは、非負の整数値を表す。CUの中のサンプルは、行および列として並べられ得る。その上、CUは、必ずしも水平方向に垂直方向と同じ数のサンプルを有する必要があるとは限らない。たとえば、CUはN×Mサンプルを備えることがあり、ここでMは必ずしもNと等しいとは限らない。 This disclosure may use "N×N" and "N by N" interchangeably to refer to the sample dimensions of a block (such as a CU or other video block) in terms of vertical and horizontal dimensions, e.g., 16×16 samples or 16 by 16 samples. In general, a 16×16 CU has 16 samples in the vertical direction (y = 16) and 16 samples in the horizontal direction (x = 16). Likewise, an N×N CU generally has N samples in the vertical direction and N samples in the horizontal direction, where N represents a nonnegative integer value. The samples in a CU may be arranged in rows and columns. Moreover, a CU need not have the same number of samples in the horizontal direction as in the vertical direction. For example, a CU may comprise N×M samples, where M is not necessarily equal to N.

ビデオエンコーダ200は、予測情報および/または残差情報、ならびに他の情報を表す、CUのためのビデオデータを符号化する。予測情報は、CUのための予測ブロックを形成するために、CUがどのように予測されるべきかを示す。残差情報は一般に、符号化の前のCUのサンプルと予測ブロックとの間のサンプルごとの差を表す。 The video encoder 200 encodes video data for CUs representing prediction and/or residual information, and other information. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information generally represents sample-by-sample differences between samples of the CU prior to encoding and the prediction block.

CUを予測するために、ビデオエンコーダ200は一般に、インター予測またはイントラ予測を通じて、CUのための予測ブロックを形成し得る。インター予測は一般に、以前にコーディングされたピクチャのデータからCUを予測することを指し、一方、イントラ予測は一般に、同じピクチャの以前にコーディングされたデータからCUを予測することを指す。インター予測を実行するために、ビデオエンコーダ200は、1つまたは複数の動きベクトルを使用して予測ブロックを生成し得る。ビデオエンコーダ200は一般に、たとえば、CUと参照ブロックとの差に関して、CUとよく一致する参照ブロックを特定するために、動き探索を実行し得る。ビデオエンコーダ200は、絶対差分和(SAD)、二乗差分和(SSD)、平均絶対差(MAD)、平均二乗差(MSD)、または他のそのような差分計算を使用して差分メトリックを計算し、参照ブロックが現在のCUとよく一致するかどうかを決定し得る。いくつかの例では、ビデオエンコーダ200は、単方向予測または双方向予測を使用して現在のCUを予測し得る。 To predict a CU, the video encoder 200 may generally form a prediction block for the CU through inter-prediction or intra-prediction. Inter-prediction generally refers to predicting the CU from data of a previously coded picture, whereas intra-prediction generally refers to predicting the CU from previously coded data of the same picture. To perform inter-prediction, the video encoder 200 may generate the prediction block using one or more motion vectors. The video encoder 200 may generally perform a motion search to identify a reference block that closely matches the CU, e.g., in terms of differences between the CU and the reference block. The video encoder 200 may calculate a difference metric using a sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), or other such difference calculations to determine whether a reference block closely matches the current CU. In some examples, the video encoder 200 may predict the current CU using uni-directional prediction or bi-directional prediction.
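A minimal sketch of the four difference metrics named above, together with an exhaustive integer-sample motion search that uses SAD as its cost. Practical encoders typically use faster search patterns and fractional-sample refinement; the full-search strategy and all function names here are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ssd(a: Block, b: Block) -> int:
    """Sum of squared differences."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def mad(a: Block, b: Block) -> float:
    """Mean absolute difference."""
    return sad(a, b) / (len(a) * len(a[0]))


def msd(a: Block, b: Block) -> float:
    """Mean squared difference."""
    return ssd(a, b) / (len(a) * len(a[0]))


def get_block(frame: Block, x: int, y: int, w: int, h: int) -> Block:
    """Extract the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def full_search(cur: Block, ref: Block, x: int, y: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Return (motion vector, SAD) of the best match for `cur`, located at (x, y)."""
    h, w = len(cur), len(cur[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue  # candidate lies outside the reference picture
            cost = sad(cur, get_block(ref, rx, ry, w, h))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("no candidate inside the reference picture")
    return best[1], best[0]
```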

VVCのいくつかの例は、インター予測モードと見なされ得るアフィン動き補償モードも提供する。アフィン動き補償モードでは、ビデオエンコーダ200は、ズームインもしくはズームアウト、回転、射影運動、または他の不規則な運動タイプなどの、非並進運動を表す2つ以上の動きベクトルを決定し得る。 Some examples of VVC also provide an affine motion compensation mode, which may be considered an inter-prediction mode. In affine motion compensation mode, the video encoder 200 may determine two or more motion vectors that represent non-translational motion, such as zooming in or out, rotation, perspective motion, or other irregular motion types.
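As an illustration of how two motion vectors can describe non-translational motion, the sketch below evaluates a 4-parameter affine model of the kind used in VVC. It assumes control-point motion vectors mv0 at the top-left corner and mv1 at the top-right corner of a block of width W, so that mv(x, y) = (a·x − b·y + mv0x, b·x + a·y + mv0y) with a = (mv1x − mv0x)/W and b = (mv1y − mv0y)/W. VVC evaluates such a model per 4×4 luma subblock in fixed-point arithmetic; this sketch uses floating point for readability, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Vec = Tuple[float, float]


def affine_mv(mv0: Vec, mv1: Vec, width: int, x: float, y: float) -> Vec:
    """4-parameter affine model: motion vector at position (x, y) inside the block."""
    a = (mv1[0] - mv0[0]) / width  # scaling-related term
    b = (mv1[1] - mv0[1]) / width  # rotation-related term
    return (a * x - b * y + mv0[0], b * x + a * y + mv0[1])


def subblock_mvs(mv0: Vec, mv1: Vec, width: int, height: int,
                 sub: int = 4) -> Dict[Tuple[int, int], Vec]:
    """One motion vector per sub x sub subblock, evaluated at the subblock center."""
    return {(sx, sy): affine_mv(mv0, mv1, width, sx + sub / 2, sy + sub / 2)
            for sy in range(0, height, sub) for sx in range(0, width, sub)}
```

When mv0 equals mv1, a and b are zero and every subblock receives the same vector, which reduces the model to ordinary translational motion.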

イントラ予測を実行するために、ビデオエンコーダ200は、予測ブロックを生成するためにイントラ予測モードを選択し得る。VVCのいくつかの例は、様々な方向モードを含む67個のイントラ予測モード、ならびに平面モードおよびDCモードを提供する。一般に、ビデオエンコーダ200は、現在ブロック(たとえば、CUのブロック)のサンプルをそれから予測すべき、その現在ブロックに対する近隣のサンプルを記述する、イントラ予測モードを選択する。そのようなサンプルは一般に、ビデオエンコーダ200がラスター走査順序で(左から右、上から下)CTUおよびCUをコーディングすると仮定して、現在ブロックと同じピクチャにおいて、現在ブロックの上、上および左、または左にあり得る。 To perform intra-prediction, the video encoder 200 may select an intra-prediction mode to generate the prediction block. Some examples of VVC provide sixty-seven intra-prediction modes, including various directional modes, as well as planar mode and DC mode. In general, the video encoder 200 selects an intra-prediction mode that describes neighboring samples of a current block (e.g., a block of a CU) from which to predict samples of the current block. Assuming the video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom), such samples are generally above, above and to the left of, or to the left of the current block in the same picture as the current block.
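As a concrete instance of predicting from neighboring samples, DC mode fills the block with the rounded average of the reconstructed samples directly above and to the left of it. This sketch handles square blocks only (for non-square blocks VVC averages only the longer side), assumes all neighbors are available, and uses hypothetical names.

```python
from typing import List, Sequence


def dc_predict(top: Sequence[int], left: Sequence[int]) -> List[List[int]]:
    """Fill an N x N block with the rounded mean of N top and N left neighbors."""
    n = len(top)
    if len(left) != n:
        raise ValueError("this sketch handles square blocks only")
    dc = (sum(top) + sum(left) + n) // (2 * n)  # adding n rounds to the nearest integer
    return [[dc] * n for _ in range(n)]
```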

ビデオエンコーダ200は、現在ブロックのための予測モードを表すデータを符号化する。たとえば、インター予測モードでは、ビデオエンコーダ200は、様々な利用可能なインター予測モードのいずれが使用されるかを表すデータ、ならびに、対応するモードのための動き情報を符号化し得る。単方向または双方向のインター予測のために、たとえば、ビデオエンコーダ200は、高度動きベクトル予測(AMVP)モードまたはマージモードを使用して動きベクトルを符号化し得る。ビデオエンコーダ200は、同様のモードを使用して、アフィン動き補償モードのために動きベクトルを符号化し得る。 The video encoder 200 encodes data representing the prediction mode for a current block. For example, for inter-prediction modes, the video encoder 200 may encode data representing which of the various available inter-prediction modes is used, as well as motion information for the corresponding mode. For uni-directional or bi-directional inter-prediction, for example, the video encoder 200 may encode motion vectors using advanced motion vector prediction (AMVP) mode or merge mode. The video encoder 200 may use similar modes to encode motion vectors for the affine motion compensation mode.
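The difference between the two modes can be sketched from the decoder's point of view: in AMVP mode the bitstream carries a predictor index and a motion vector difference (MVD) that is added to the selected predictor, whereas in merge mode the motion vector is copied from the indexed candidate. This minimal sketch assumes the candidate list has already been built (its construction is not shown) and uses the MVD magnitude as a stand-in for the encoder's rate cost; all names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def encode_amvp(mv: MV, candidates: List[MV]) -> Tuple[int, MV]:
    """Pick the predictor with the smallest |MVD| and return (index, MVD)."""
    idx = min(range(len(candidates)),
              key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]))
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_amvp(idx: int, mvd: MV, candidates: List[MV]) -> MV:
    """AMVP: motion vector = selected predictor + signaled difference."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


def decode_merge(idx: int, candidates: List[MV]) -> MV:
    """Merge: the motion vector is inherited unchanged from the indexed candidate."""
    return candidates[idx]
```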

ブロックのイントラ予測またはインター予測などの予測に従って、ビデオエンコーダ200は、ブロックのための残差データを計算し得る。残差ブロックなどの残差データは、対応する予測モードを使用して形成された、ブロックとブロックのための予測ブロックとの間のサンプルごとの差を表す。ビデオエンコーダ200は、サンプル領域の代わりに変換領域において変換されたデータを生成するために、1つまたは複数の変換を残差ブロックに適用し得る。たとえば、ビデオエンコーダ200は、離散コサイン変換(DCT)、整数変換、ウェーブレット変換、または概念的に類似の変換を、残差ビデオデータに適用し得る。加えて、ビデオエンコーダ200は、モード依存非分離可能二次変換(MDNSST: mode-dependent non-separable secondary transform)、信号依存変換、カルーネンレーベ変換(KLT)などの二次的な変換を、最初の変換に続いて適用し得る。ビデオエンコーダ200は、1つまたは複数の変換の適用に続いて変換係数を生成する。 Following prediction, such as intra-prediction or inter-prediction of a block, the video encoder 200 may calculate residual data for the block. The residual data, such as a residual block, represents sample-by-sample differences between the block and a prediction block for the block formed using the corresponding prediction mode. The video encoder 200 may apply one or more transforms to the residual block to produce transformed data in a transform domain instead of the sample domain. For example, the video encoder 200 may apply a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video data. Additionally, the video encoder 200 may apply a secondary transform following the first transform, such as a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve transform (KLT), or the like. The video encoder 200 produces transform coefficients following application of the one or more transforms.
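The sketch below applies a separable, orthonormal 2-D DCT-II to a square residual block and inverts it. It is illustrative only: deployed codecs use integer approximations of the DCT, and the function names are hypothetical. Because the DCT matrix C is orthonormal, the forward transform is C·X·Cᵀ and the inverse is Cᵀ·Y·C.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def forward_dct2d(block: Matrix) -> Matrix:
    """Residual samples -> transform coefficients: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def inverse_dct2d(coeffs: Matrix) -> Matrix:
    """Transform coefficients -> residual samples: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

A flat residual block compacts entirely into the DC coefficient, which illustrates why transforms help the subsequent quantization and entropy coding.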

上述のように、変換係数を生成するための任意の変換に続いて、ビデオエンコーダ200は、変換係数の量子化を実行し得る。量子化は一般に、変換係数を表すために使用されるデータの量をできるだけ低減するために変換係数が量子化され、さらなる圧縮を実現するプロセスを指す。量子化プロセスを実行することによって、ビデオエンコーダ200は、変換係数の一部またはすべてに関連するビット深度を低減し得る。たとえば、ビデオエンコーダ200は、量子化の間にnビット値をmビット値に切り捨ててもよく、ここで、nはmよりも大きい。いくつかの例では、量子化を実行するために、ビデオエンコーダ200は、量子化されるべき値のビット単位の右シフトを実行してもよい。 As noted above, following any transforms to produce transform coefficients, the video encoder 200 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. By performing the quantization process, the video encoder 200 may reduce the bit depth associated with some or all of the transform coefficients. For example, the video encoder 200 may round an n-bit value down to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, the video encoder 200 may perform a bitwise right-shift of the value to be quantized.
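The bit-depth reduction described above can be sketched as a sign-magnitude right shift. This is a simplification: HEVC and VVC derive the quantization step from a quantization parameter (QP) and may apply scaling lists, none of which is modeled here. The optional rounding offset and the function names are assumptions.

```python
def quantize(value: int, shift: int, rounding: bool = True) -> int:
    """Reduce a coefficient's magnitude by 2**shift using a bitwise right shift.

    With rounding=False this is plain truncation (an n-bit value becomes an
    (n - shift)-bit value); with rounding=True, half a step is added first.
    """
    offset = (1 << (shift - 1)) if rounding and shift > 0 else 0
    magnitude = (abs(value) + offset) >> shift
    return -magnitude if value < 0 else magnitude


def dequantize(level: int, shift: int) -> int:
    """Approximate inverse: scale the quantized level back up by 2**shift."""
    return level * (1 << shift)
```

For example, truncating the 10-bit value 1023 by two bits gives the 8-bit value 255, and quantizing 100 with a shift of 3 gives 13, which dequantizes to 104 rather than 100; that difference is the quantization error.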

量子化に続いて、ビデオエンコーダ200は、変換係数を走査し、量子化された変換係数を含む2次元行列から1次元ベクトルを生成し得る。走査は、より高いエネルギー(それゆえより低い周波数)の変換係数をベクトルの前方に置き、より低いエネルギー(それゆえより高い周波数)の変換係数をベクトルの後方に置くように設計され得る。いくつかの例では、ビデオエンコーダ200は、量子化された変換係数を走査するためにあらかじめ定められた走査順序を利用して直列化されたベクトルを生成し、次いで、ベクトルの量子化された変換係数をエントロピー符号化し得る。他の例では、ビデオエンコーダ200は、適応走査を実行し得る。1次元ベクトルを形成するために量子化された変換係数を走査した後、ビデオエンコーダ200は、たとえばコンテキスト適応バイナリ算術コーディング(CABAC)に従って、1次元ベクトルをエントロピー符号化し得る。ビデオエンコーダ200はまた、ビデオデータを復号する際にビデオデコーダ300によって使用するための符号化されたビデオデータに関連するメタデータを記述するシンタックス要素に対する値をエントロピー符号化し得る。 Following quantization, the video encoder 200 may scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix containing the quantized transform coefficients. The scan may be designed to place higher-energy (and therefore lower-frequency) transform coefficients at the front of the vector and lower-energy (and therefore higher-frequency) transform coefficients at the back of the vector. In some examples, the video encoder 200 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector, and then entropy encode the quantized transform coefficients of the vector. In other examples, the video encoder 200 may perform an adaptive scan. After scanning the quantized transform coefficients to form the one-dimensional vector, the video encoder 200 may entropy encode the one-dimensional vector, e.g., according to context-adaptive binary arithmetic coding (CABAC). The video encoder 200 may also entropy encode values for syntax elements describing metadata associated with the encoded video data, for use by the video decoder 300 in decoding the video data.
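One predefined scan order with the low-to-high frequency property described above is the classic zig-zag scan, sketched below. It is illustrative only: HEVC and VVC instead use diagonal scans over 4×4 coefficient groups, and the function names here are hypothetical.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in zig-zag order, DC first."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]  # runs down-left
        if s % 2 == 0:
            cells.reverse()  # even diagonals run up-right instead
        order.extend(cells)
    return order


def scan_block(block: List[List[int]]) -> List[int]:
    """Serialize a square 2-D coefficient block into a 1-D vector."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Coefficients near the top-left (low-frequency) corner come first, so the typically zero-valued high-frequency coefficients cluster at the end of the vector.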

CABACを実行するために、ビデオエンコーダ200は、送信されるべきシンボルにコンテキストモデル内のコンテキストを割り当て得る。コンテキストは、たとえば、シンボルの隣接する値が0であるかどうかに関連し得る。確率決定は、シンボルに割り当てられたコンテキストに基づき得る。 To perform CABAC, the video encoder 200 may assign a context within a context model to a symbol to be transmitted. The context may relate, for example, to whether neighboring values of the symbol are zero-valued. The probability determination may be based on the context assigned to the symbol.
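A context model can be pictured as a small table of adaptive probabilities, with the context for each bin chosen from neighboring values. The sketch below assumes a context index equal to the number of nonzero neighbors (left and above) and a single-rate exponential probability update in 15-bit fixed point. It is a simplified illustration, not the exact probability estimator specified for CABAC in HEVC or VVC, and the class and function names are hypothetical.

```python
from typing import List

PROB_BITS = 15
ONE = 1 << PROB_BITS  # fixed-point representation of probability 1.0


def select_context(left: int, above: int) -> int:
    """Context index 0, 1, or 2: how many of the two neighbors are nonzero."""
    return int(left != 0) + int(above != 0)


class ContextModel:
    """Per-context adaptive estimate of P(bin == 1), simplified."""

    def __init__(self, num_contexts: int = 3, rate: int = 4) -> None:
        self.p: List[int] = [ONE // 2] * num_contexts  # every context starts at 0.5
        self.rate = rate  # larger rate means slower adaptation

    def prob_one(self, ctx: int) -> float:
        return self.p[ctx] / ONE

    def update(self, ctx: int, bin_value: int) -> None:
        """Move the estimate a fraction 2**-rate of the way toward the observed bin."""
        target = ONE if bin_value else 0
        self.p[ctx] += (target - self.p[ctx]) >> self.rate
```

An arithmetic coder would then spend fewer bits on bins that match the current estimate of their context, which is where the compression gain comes from.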

ビデオエンコーダ200はさらに、ビデオデコーダ300への、ブロックベースのシンタックスデータ、ピクチャベースのシンタックスデータ、およびシーケンスベースのシンタックスデータなどのシンタックスデータを、たとえば、ピクチャヘッダ、ブロックヘッダ、スライスヘッダ、または、シーケンスパラメータセット(SPS)、ピクチャパラメータセット(PPS)、もしくはビデオパラメータセット(VPS)などの他のシンタックスデータにおいて生成し得る。ビデオデコーダ300は、対応するビデオデータをどのように復号するかを決定するために、そのようなシンタックスデータを同様に復号し得る。 The video encoder 200 may further generate syntax data for the video decoder 300, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, for example, in the form of a picture header, block header, slice header, or other syntax data such as a sequence parameter set (SPS), picture parameter set (PPS), or video parameter set (VPS). The video decoder 300 may similarly decode such syntax data to determine how to decode the corresponding video data.

このようにして、ビデオエンコーダ200は、符号化されたビデオデータ、たとえば、ブロック(たとえば、CU)へのピクチャの区分ならびにブロックに対する予測および/または残差情報を記述するシンタックス要素を含む、ビットストリームを生成し得る。最終的に、ビデオデコーダ300は、ビットストリームを受信し、符号化されたビデオデータを復号し得る。 In this way, the video encoder 200 can generate a bitstream containing encoded video data, for example, syntax elements describing the division of a picture into blocks (e.g., CUs) and prediction and/or residual information for those blocks. Finally, the video decoder 300 can receive the bitstream and decode the encoded video data.

一般に、ビデオデコーダ300は、ビットストリームの符号化されたビデオデータを復号するために、ビデオエンコーダ200によって実行されたプロセスと逆のプロセスを実行する。たとえば、ビデオデコーダ300は、ビデオエンコーダ200のCABAC符号化プロセスと逆ではあるが実質的に同様の方式で、CABACを使用してビットストリームのシンタックス要素に対する値を復号し得る。シンタックス要素は、ピクチャをCTUに区分するための区分情報、およびQTBT構造などの対応する区分構造に従った各CTUの区分を定義して、CTUのCUを定義し得る。シンタックス要素はさらに、ビデオデータのブロック(たとえば、CU)に対する予測および残差情報を定義し得る。 In general, the video decoder 300 performs a process reciprocal to that performed by the video encoder 200 to decode the encoded video data of the bitstream. For example, the video decoder 300 may decode values for syntax elements of the bitstream using CABAC in a manner substantially similar to, albeit reciprocal to, the CABAC encoding process of the video encoder 200. The syntax elements may define partitioning information for partitioning a picture into CTUs, and the partitioning of each CTU according to a corresponding partition structure, such as a QTBT structure, to define the CUs of the CTU. The syntax elements may further define prediction and residual information for blocks (e.g., CUs) of video data.

残差情報は、たとえば量子化された変換係数によって表され得る。ビデオデコーダ300は、ブロックの量子化された変換係数を逆量子化し逆変換して、ブロックのための残差ブロックを再生し得る。ビデオデコーダ300は、シグナリングされた予測モード(イントラ予測またはインター予測)および関連する予測情報(たとえば、インター予測のための動き情報)を使用して、ブロックのための予測ブロックを形成する。ビデオデコーダ300は次いで、予測ブロックと残差ブロックとを(サンプルごとに)組み合わせて、元のブロックを再生し得る。ビデオデコーダ300は、ブロックの境界に沿った視覚的なアーティファクトを減らすために、デブロッキング処理を実行することなどの、追加の処理を実行し得る。 Residual information can be represented, for example, by quantized transformation coefficients. The video decoder 300 may reconstruct the residual block for the block by inverse quantizing and inverse transforming the quantized transformation coefficients of the block. The video decoder 300 uses the signaled prediction mode (intra-prediction or inter-prediction) and associated prediction information (e.g., motion information for inter-prediction) to form a predicted block for the block. The video decoder 300 may then combine the predicted block and the residual block (sample by sample) to reconstruct the original block. The video decoder 300 may perform additional processing, such as deblocking, to reduce visual artifacts along the block boundaries.
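The final reconstruction step, adding the residual to the prediction sample by sample, can be sketched as below. Clipping the result to the valid sample range for the bit depth is an assumption of this sketch, consistent with how HEVC and VVC clip reconstructed samples; deblocking is not modeled, and the function name is hypothetical.

```python
from typing import List


def reconstruct(pred: List[List[int]], resid: List[List[int]],
                bit_depth: int = 8) -> List[List[int]]:
    """Add the residual to the prediction per sample, clipped to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max_val, max(0, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, resid)]
```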

本開示の技法によれば、方法は、少なくとも1つの改良された動きベクトルを決定するためにビデオデータのブロックのための動きベクトルにマルチパスデコーダ側動きベクトル改良(DMVR)を適用するステップと、少なくとも1つの改良された動きベクトルに基づいてブロックを復号するステップとを含み、マルチパスDMVRは、ブロックベースでありビデオデータのブロックに適用される第1のパスと、サブブロックベースでありビデオデータのブロックの少なくとも1つの第2パスサブブロックに適用される第2のパスであって、第2パスサブブロックの幅がビデオデータのブロックの幅以下であり、第2パスサブブロックの高さがビデオデータのブロックの高さ以下である、第2のパスと、サブブロックベースでありビデオデータのブロックの少なくとも1つの第3パスサブブロックに適用される第3のパスであって、第3パスサブブロックの幅が第2パスサブブロックの幅以下であり、第3パスサブブロックの高さが第2パスサブブロックの高さ以下である、第3のパスとを備える。 According to the techniques of this disclosure, a method includes applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and decoding the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is subblock-based and is applied to at least one second-pass subblock of the block of video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of video data and a height of the second-pass subblock is less than or equal to a height of the block of video data; and a third pass that is subblock-based and is applied to at least one third-pass subblock of the block of video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock and a height of the third-pass subblock is less than or equal to the height of the second-pass subblock.
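To make the three-pass structure concrete, the sketch below refines a bi-predicted block's motion vector pair in three nested passes: once for the whole block, then for each second-pass subblock, then for each third-pass subblock. Each pass starts from the result of the previous one and honors the width and height constraints stated above. The refinement used in every pass is an assumed integer-sample bilateral-matching search (the L0 vector moves by +d and the L1 vector by −d, with SAD between the two reference blocks as the cost). The 8×8 and 4×4 subblock sizes, the search range of 2, and all function names are hypothetical choices for illustration, not values taken from this section.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]
MV = Tuple[int, int]              # integer-sample motion vector (dx, dy)
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def _fits(frame: Frame, rect: Rect, mv: MV) -> bool:
    """True if the block displaced by mv lies entirely inside the reference frame."""
    x, y, w, h = rect
    return (0 <= x + mv[0] and x + mv[0] + w <= len(frame[0])
            and 0 <= y + mv[1] and y + mv[1] + h <= len(frame))


def _bilateral_cost(ref0: Frame, ref1: Frame, rect: Rect, mv0: MV, mv1: MV) -> int:
    """SAD between the L0 block at mv0 and the L1 block at mv1."""
    x, y, w, h = rect
    return sum(abs(ref0[y + j + mv0[1]][x + i + mv0[0]] - ref1[y + j + mv1[1]][x + i + mv1[0]])
               for j in range(h) for i in range(w))


def refine(ref0: Frame, ref1: Frame, rect: Rect, mv0: MV, mv1: MV,
           search_range: int) -> Tuple[MV, MV]:
    """One pass: mirrored integer search around (mv0, mv1)."""
    best_cost, best_d = None, (0, 0)
    # Try the unrefined pair first so that ties keep the starting vectors.
    offsets = [(0, 0)] + [(dx, dy) for dy in range(-search_range, search_range + 1)
                          for dx in range(-search_range, search_range + 1) if (dx, dy) != (0, 0)]
    for dx, dy in offsets:
        c0, c1 = (mv0[0] + dx, mv0[1] + dy), (mv1[0] - dx, mv1[1] - dy)
        if not (_fits(ref0, rect, c0) and _fits(ref1, rect, c1)):
            continue
        cost = _bilateral_cost(ref0, ref1, rect, c0, c1)
        if best_cost is None or cost < best_cost:
            best_cost, best_d = cost, (dx, dy)
    return (mv0[0] + best_d[0], mv0[1] + best_d[1]), (mv1[0] - best_d[0], mv1[1] - best_d[1])


def split_rect(rect: Rect, sub_w: int, sub_h: int) -> List[Rect]:
    """Split rect into subblocks of at most sub_w x sub_h samples."""
    x, y, w, h = rect
    return [(x + i, y + j, min(sub_w, w - i), min(sub_h, h - j))
            for j in range(0, h, sub_h) for i in range(0, w, sub_w)]


def multi_pass_dmvr(ref0: Frame, ref1: Frame, block: Rect, mv0: MV, mv1: MV,
                    pass2_size: Tuple[int, int] = (8, 8),
                    pass3_size: Tuple[int, int] = (4, 4),
                    search_range: int = 2) -> Dict[Rect, Tuple[MV, MV]]:
    """Return the refined (mv0, mv1) for every third-pass subblock of `block`."""
    _, _, w, h = block
    # Size constraints: second-pass <= block, third-pass <= second-pass.
    s2w, s2h = min(pass2_size[0], w), min(pass2_size[1], h)
    s3w, s3h = min(pass3_size[0], s2w), min(pass3_size[1], s2h)
    p1 = refine(ref0, ref1, block, mv0, mv1, search_range)       # first pass: whole block
    refined: Dict[Rect, Tuple[MV, MV]] = {}
    for sub2 in split_rect(block, s2w, s2h):
        p2 = refine(ref0, ref1, sub2, p1[0], p1[1], search_range)  # second pass
        for sub3 in split_rect(sub2, s3w, s3h):                    # third pass
            refined[sub3] = refine(ref0, ref1, sub3, p2[0], p2[1], search_range)
    return refined
```

In this sketch, later passes can only adjust the vectors found by earlier passes within a small window, so the block-level pass captures the dominant motion and the subblock passes adapt it locally.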

本開示の技法によれば、デバイスは、ビデオデータを記憶するように構成されるメモリと、回路で実装されメモリに通信可能に結合される1つまたは複数のプロセッサとを含み、1つまたは複数のプロセッサは、少なくとも1つの改良された動きベクトルを決定するためにビデオデータのブロックのための動きベクトルにマルチパスデコーダ側動きベクトル改良(DMVR)を適用し、少なくとも1つの改良された動きベクトルに基づいてブロックを復号するように構成され、マルチパスDMVRは、ブロックベースでありビデオデータのブロックに適用される第1のパスと、サブブロックベースでありビデオデータのブロックの少なくとも1つの第2パスサブブロックに適用される第2のパスであって、第2パスサブブロックの幅がビデオデータのブロックの幅以下であり、第2パスサブブロックの高さがビデオデータのブロックの高さ以下である、第2のパスと、サブブロックベースでありビデオデータのブロックの少なくとも1つの第3パスサブブロックに適用される第3のパスであって、第3パスサブブロックの幅が第2パスサブブロックの幅以下であり、第3パスサブブロックの高さが第2パスサブブロックの高さ以下である、第3のパスとを備える。 According to the techniques of this disclosure, a device includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of the video data to determine at least one refined motion vector, and to decode the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is subblock-based and is applied to at least one second-pass subblock of the block of video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of video data and a height of the second-pass subblock is less than or equal to a height of the block of video data; and a third pass that is subblock-based and is applied to at least one third-pass subblock of the block of video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock and a height of the third-pass subblock is less than or equal to the height of the second-pass subblock.

According to the techniques of this disclosure, a non-transitory computer-readable medium stores instructions that, when executed, cause one or more processors to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and to decode the block based on the at least one refined motion vector. The multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is subblock-based and is applied to at least one second-pass subblock of the block of video data, wherein the width of the second-pass subblock is less than or equal to the width of the block of video data and the height of the second-pass subblock is less than or equal to the height of the block of video data; and a third pass that is subblock-based and is applied to at least one third-pass subblock of the block of video data, wherein the width of the third-pass subblock is less than or equal to the width of the second-pass subblock and the height of the third-pass subblock is less than or equal to the height of the second-pass subblock.

According to the techniques of this disclosure, a device includes means for applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and means for decoding the block based on the at least one refined motion vector. The multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is subblock-based and is applied to at least one second-pass subblock of the block of video data, wherein the width of the second-pass subblock is less than or equal to the width of the block of video data and the height of the second-pass subblock is less than or equal to the height of the block of video data; and a third pass that is subblock-based and is applied to at least one third-pass subblock of the block of video data, wherein the width of the third-pass subblock is less than or equal to the width of the second-pass subblock and the height of the third-pass subblock is less than or equal to the height of the second-pass subblock.

According to the techniques of this disclosure, a method includes applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine a refined motion vector, and coding the block based on the refined motion vector.

According to the techniques of this disclosure, a device includes a memory configured to store video data and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to perform any of the techniques of this disclosure.

According to the techniques of this disclosure, a device includes at least one means for performing any of the techniques of this disclosure.

According to the techniques of this disclosure, a computer-readable storage medium is encoded with instructions that, when executed, cause a programmable processor to perform any of the techniques of this disclosure.

This disclosure may generally refer to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values for syntax elements and/or other data used to decode encoded video data. That is, video encoder 200 may signal values for syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As noted above, source device 102 may transport the bitstream to destination device 116 substantially in real time, or not in real time. The latter might occur, for example, when syntax elements are stored to storage device 112 for later retrieval by destination device 116.

Figures 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure 130 and a corresponding coding tree unit (CTU) 132. The solid lines represent quadtree splitting, and the dashed lines represent binary tree splitting. At each split (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used; in this example, 0 indicates horizontal splitting and 1 indicates vertical splitting. For quadtree splitting, the splitting type does not need to be indicated, because a quadtree node splits a block horizontally and vertically into four subblocks of equal size. Accordingly, video encoder 200 may encode, and video decoder 300 may decode, syntax elements (such as splitting information) for the region tree level of QTBT structure 130 (i.e., the solid lines) and syntax elements (such as splitting information) for the prediction tree level of QTBT structure 130 (i.e., the dashed lines). Video encoder 200 may encode, and video decoder 300 may decode, video data, such as prediction and transform data, for CUs represented by terminal leaf nodes of QTBT structure 130.
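Below is a minimal sketch of the split semantics just described. It tracks blocks as hypothetical `(x, y, width, height)` tuples. A quadtree split always yields four equal quadrants, so it needs no direction flag. A binary split reads the signaled flag, using 0 for horizontal and 1 for vertical as in this example. The function name `split_node` is illustrative only.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def split_node(rect: Rect, kind: str, bt_flag: int = 0) -> List[Rect]:
    """Return the child blocks produced by one QTBT split."""
    x, y, w, h = rect
    if kind == "QT":
        # Quadtree split: four equal quadrants, so no direction flag is signaled.
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if kind == "BT":
        if bt_flag == 0:
            # Flag 0: horizontal split into top and bottom halves (height is divided).
            return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
        # Flag 1: vertical split into left and right halves (width is divided).
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    raise ValueError(f"unknown split kind: {kind}")


print(split_node((0, 0, 64, 64), "BT", 1))  # [(0, 0, 32, 64), (32, 0, 32, 64)]
```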

In general, CTU 132 of Figure 2B may be associated with parameters that define the sizes of blocks corresponding to nodes of QTBT structure 130 at the first and second levels. These parameters may include a CTU size (representing the size of CTU 132 in samples), a minimum quadtree size (MinQTSize, representing the minimum allowed quadtree leaf node size), a maximum binary tree size (MaxBTSize, representing the maximum allowed binary tree root node size), a maximum binary tree depth (MaxBTDepth, representing the maximum allowed binary tree depth), and a minimum binary tree size (MinBTSize, representing the minimum allowed binary tree leaf node size).

The root node of a QTBT structure corresponding to a CTU may have four child nodes at the first level of the QTBT structure, each of which may be partitioned according to quadtree partitioning. That is, each node of the first level is either a leaf node (having no child nodes) or has four child nodes. The example of QTBT structure 130 represents such nodes as a parent node and child nodes connected by solid-line branches. If a node of the first level is not larger than the maximum allowed binary tree root node size (MaxBTSize), that node may be further partitioned by a respective binary tree. Binary tree splitting of a node may be repeated until the nodes resulting from the split reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes with dashed-line branches. A binary tree leaf node is referred to as a coding unit (CU), which is used for prediction (e.g., intra-picture or inter-picture prediction) and transform without any further partitioning. As discussed above, CUs may also be referred to as "video blocks" or "blocks."

In one example of the QTBT partitioning structure, the CTU size is set to 128×128 (luma samples, with two corresponding 64×64 chroma sample blocks), MinQTSize is set to 16×16, MaxBTSize is set to 64×64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. Quadtree partitioning is first applied to the CTU to generate quadtree leaf nodes. The quadtree leaf nodes may have sizes from 16×16 (i.e., MinQTSize) to 128×128 (i.e., the CTU size). If a quadtree leaf node is 128×128, it is not further split by a binary tree, because its size exceeds MaxBTSize (i.e., 64×64 in this example). Otherwise, the quadtree leaf node may be further partitioned by a binary tree. A quadtree leaf node is therefore also the root node of a binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4 in this example), no further splitting is permitted. A binary tree node with a width equal to MinBTSize (4 in this example) implies that no further vertical splitting (i.e., dividing of the width) is permitted for that node. Similarly, a binary tree node with a height equal to MinBTSize implies that no further horizontal splitting (i.e., dividing of the height) is permitted for that node. As noted above, binary tree leaf nodes are referred to as CUs and are further processed according to prediction and transform without further partitioning.
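The example parameter values and stopping rules above can be written as a few predicates. This is an illustrative sketch. `QtbtParams`, `can_split_qt`, and `allowed_bt_splits` are hypothetical names. The checks cover only the constraints stated in this paragraph (MinQTSize, MaxBTSize, MaxBTDepth, MinBTSize), not every rule a real codec applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QtbtParams:
    # Values from the example QTBT configuration described above.
    ctu_size: int = 128
    min_qt_size: int = 16
    max_bt_size: int = 64
    max_bt_depth: int = 4
    min_bt_size: int = 4


def can_split_qt(size: int, p: QtbtParams = QtbtParams()) -> bool:
    """A square quadtree node may split only if its children stay >= MinQTSize."""
    return size // 2 >= p.min_qt_size


def allowed_bt_splits(w: int, h: int, bt_depth: int,
                      p: QtbtParams = QtbtParams()) -> List[str]:
    """Binary splits permitted for a w x h node at the given binary tree depth."""
    # A binary tree root (quadtree leaf) larger than MaxBTSize gets no binary splits,
    # and no node splits further once MaxBTDepth is reached.
    if max(w, h) > p.max_bt_size or bt_depth >= p.max_bt_depth:
        return []
    splits = []
    if h > p.min_bt_size:  # height equal to MinBTSize: no horizontal split
        splits.append("horizontal")
    if w > p.min_bt_size:  # width equal to MinBTSize: no vertical split
        splits.append("vertical")
    return splits


print(allowed_bt_splits(128, 128, 0))  # []
print(allowed_bt_splits(4, 16, 2))     # ['horizontal']
```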

Figure 3 is a block diagram illustrating an example video encoder 200 that may perform the techniques of this disclosure. Figure 3 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 200 according to the techniques of VVC (ITU-T H.266) and HEVC (ITU-T H.265). However, the techniques of this disclosure may be performed by video encoding devices configured according to other video coding standards.

In the example of Figure 3, video encoder 200 includes video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, decoded picture buffer (DPB) 218, and entropy encoding unit 220. Any or all of video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, DPB 218, and entropy encoding unit 220 may be implemented in one or more processors or in processing circuitry. For instance, the units of video encoder 200 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, video encoder 200 may include additional or alternative processors or processing circuitry to perform these and other functions.

Video data memory 230 may store video data to be encoded by the components of video encoder 200. Video encoder 200 may receive the video data stored in video data memory 230 from, for example, video source 104 (Figure 1). DPB 218 may act as a reference picture memory that stores reference video data for use in the prediction of subsequent video data by video encoder 200. Video data memory 230 and DPB 218 may be formed by any of a variety of memory devices. Examples include dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), and other types of memory devices. Video data memory 230 and DPB 218 may be provided by the same memory device or by separate memory devices. In various examples, video data memory 230 may be on-chip with the other components of video encoder 200, as illustrated, or off-chip relative to those components.

In this disclosure, references to video data memory 230 should not be interpreted as limited to memory internal to video encoder 200, unless specifically described as such. Nor should they be interpreted as limited to memory external to video encoder 200, unless specifically described as such. Rather, references to video data memory 230 should be understood as referring to memory that stores video data that video encoder 200 receives for encoding (e.g., video data for a current block that is to be encoded). Memory 106 of Figure 1 may also provide temporary storage of outputs from the various units of video encoder 200.

The various units of Figure 3 are illustrated to assist with understanding the operations performed by video encoder 200. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits are circuits that provide particular functionality and whose executable operations are preset. Programmable circuits are circuits that can be programmed to perform various tasks and that provide flexible functionality in the operations they can perform. For instance, programmable circuits may execute software or firmware that causes them to operate in the manner defined by the instructions of that software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive or output parameters), but the types of operations they perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable). In some examples, one or more of the units may be integrated circuits.

Video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits, and/or programmable cores formed from programmable circuits. In examples where the operations of video encoder 200 are performed by software executed on the programmable circuits, memory 106 (Figure 1) may store the instructions (e.g., object code) of the software that video encoder 200 receives and executes. Alternatively, another memory within video encoder 200 (not shown) may store such instructions.

Video data memory 230 is configured to store received video data. Video encoder 200 may retrieve a picture of the video data from video data memory 230 and provide the video data to residual generation unit 204 and mode selection unit 202. The video data in video data memory 230 may be raw video data that is to be encoded.

Mode selection unit 202 includes motion estimation unit 222, motion compensation unit 224, and intra-prediction unit 226. Mode selection unit 202 may include additional functional units to perform video prediction according to other prediction modes. As examples, mode selection unit 202 may include a palette unit, an intra-block copy unit (which may be part of motion estimation unit 222 and/or motion compensation unit 224), an affine unit, a linear model (LM) unit, or the like.

Mode selection unit 202 generally coordinates multiple encoding passes to test combinations of encoding parameters and the rate-distortion values that result from those combinations. The encoding parameters may include the partitioning of CTUs into CUs, prediction modes for the CUs, transform types for the residual data of the CUs, quantization parameters for the residual data of the CUs, and so on. Mode selection unit 202 may ultimately select the combination of encoding parameters whose rate-distortion values are better than those of the other tested combinations.
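To illustrate the final selection step, the sketch below scores each tested combination with a Lagrangian cost J = D + λ·R and keeps the combination with the lowest cost. The Lagrangian form, the multiplier `lam`, and the distortion and rate numbers are assumptions for illustration. This section only says that the combination with the better rate-distortion value is chosen.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_best(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Pick the tested parameter combination with the lowest cost.

    candidates maps a label to (distortion, rate_bits) measured for that combination.
    """
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))


tested = {"split+inter": (100.0, 10.0), "no_split+intra": (80.0, 30.0)}
print(select_best(tested, lam=0.5))  # no_split+intra (cost 95.0 vs 105.0)
print(select_best(tested, lam=2.0))  # split+inter (cost 120.0 vs 140.0)
```

The two calls show that the winning combination depends on how heavily rate is weighted against distortion.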

Video encoder 200 may partition a picture retrieved from video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. Mode selection unit 202 may partition a CTU of the picture according to a tree structure, such as the QTBT structure or the quadtree structure of HEVC described above. As described above, video encoder 200 may form one or more CUs by partitioning a CTU according to the tree structure. Such a CU may also be referred to generally as a "video block" or "block."
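Because a picture is divided into a grid of CTUs, the number of CTUs follows from a ceiling division of the picture dimensions. The minimal sketch below assumes the 128×128 CTU size used in the QTBT example, and `ctu_grid` is a hypothetical helper name.

```python
from typing import Tuple


def ctu_grid(pic_width: int, pic_height: int, ctu_size: int = 128) -> Tuple[int, int]:
    """Number of CTU columns and rows needed to cover a picture."""
    # Ceiling division: CTUs on the right and bottom edges may extend past the picture.
    cols = -(-pic_width // ctu_size)
    rows = -(-pic_height // ctu_size)
    return cols, rows


cols, rows = ctu_grid(1920, 1080)
print(cols, rows, cols * rows)  # 15 9 135
```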

In general, mode selection unit 202 also controls its components (e.g., motion estimation unit 222, motion compensation unit 224, and intra-prediction unit 226) to generate a prediction block for a current block (e.g., a current CU, or in HEVC, the overlapping portion of a PU and a TU). For inter-prediction of a current block, motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously coded pictures stored in DPB 218). In particular, motion estimation unit 222 may calculate a value representing how similar a potential reference block is to the current block. The metric may be, for example, the sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared difference (MSD), or the like. Motion estimation unit 222 may generally perform these calculations using sample-by-sample differences between the current block and the reference block under consideration. Motion estimation unit 222 may identify the reference block with the lowest resulting value, which indicates the reference block that most closely matches the current block.
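A minimal integer-sample full search using SAD, one of the metrics listed above, might look like the following. The exhaustive square search window, the `search_range` parameter, and the plain nested-list representation of samples are assumptions for illustration. Practical encoders typically use faster search patterns and add fractional-sample refinement.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]


def sad(cur: Block, ref: Block, rx: int, ry: int) -> int:
    """Sum of absolute differences between cur and the same-size area of ref at (rx, ry)."""
    return sum(abs(cur[i][j] - ref[ry + i][rx + j])
               for i in range(len(cur)) for j in range(len(cur[0])))


def full_search(cur: Block, ref: Block, x0: int, y0: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Integer motion search around (x0, y0): return (motion vector, SAD) of the best match."""
    h, w = len(cur), len(cur[0])
    best: Optional[Tuple[Tuple[int, int], int]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 + dx, y0 + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + w > len(ref[0]) or ry + h > len(ref):
                continue
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    if best is None:
        raise ValueError("no candidate position inside the reference picture")
    return best


# Reference picture with distinct sample values; the current 2x2 block located at
# (2, 2) actually matches the reference area starting at (3, 2).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [row[3:5] for row in ref[2:4]]
print(full_search(cur, ref, 2, 2, 2))  # ((1, 0), 0)
```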

Motion estimation unit 222 may form one or more motion vectors (MVs) that define the positions of the reference blocks in the reference pictures relative to the position of the current block in the current picture. Motion estimation unit 222 may then provide the motion vectors to motion compensation unit 224. For example, for uni-directional inter-prediction, motion estimation unit 222 may provide a single motion vector, whereas for bi-directional inter-prediction, it may provide two motion vectors. Motion compensation unit 224 may then generate a prediction block using the motion vectors. For example, motion compensation unit 224 may retrieve data of the reference block using the motion vector. As another example, if the motion vector has fractional sample precision, motion compensation unit 224 may interpolate values for the prediction block according to one or more interpolation filters. Moreover, for bi-directional inter-prediction, motion compensation unit 224 may retrieve data for the two reference blocks identified by the respective motion vectors and combine the retrieved data, e.g., through sample-by-sample averaging or weighted averaging.
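The sample-by-sample averaging used for bi-directional inter-prediction can be sketched as a weighted average with rounding. The integer weights, the denominator of 8, and the function name `combine_bi` are illustrative assumptions. Actual codecs perform this combination at a higher intermediate bit depth and may restrict the allowed weights.

```python
from typing import List

Block = List[List[int]]


def combine_bi(pred0: Block, pred1: Block, w0: int = 4, w1: int = 4,
               log2_den: int = 3) -> Block:
    """Sample-wise (weighted) average of two prediction blocks, rounded to nearest.

    With w0 = w1 = 4 and a denominator of 8 (log2_den = 3) this is a plain average.
    """
    offset = 1 << (log2_den - 1)
    return [[(w0 * a + w1 * b + offset) >> log2_den for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]


print(combine_bi([[10, 20]], [[13, 20]]))     # [[12, 20]]
print(combine_bi([[10]], [[18]], w0=6, w1=2))  # [[12]]
```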

As another example, for intra-prediction, or intra-prediction coding, intra-prediction unit 226 may generate the prediction block from samples neighboring the current block. For example, for directional modes, intra-prediction unit 226 may generally combine the values of neighboring samples mathematically and populate these calculated values in the defined direction across the current block to produce the prediction block. As another example, for DC mode, intra-prediction unit 226 may calculate the average of the samples neighboring the current block and generate the prediction block so that each of its samples equals this resulting average.
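Below is a sketch of the DC-mode computation described above. It assumes the prediction averages all `width` samples above and all `height` samples to the left, with round-to-nearest integer division. Some standards average only the longer side for non-square blocks, so treat this as illustrative rather than normative.

```python
from typing import List


def dc_predict(top: List[int], left: List[int], width: int, height: int) -> List[List[int]]:
    """Fill a width x height block with the rounded mean of its top and left neighbors."""
    count = width + height
    dc = (sum(top[:width]) + sum(left[:height]) + count // 2) // count
    return [[dc] * width for _ in range(height)]


print(dc_predict([100] * 4, [120] * 4, 4, 4)[0])  # [110, 110, 110, 110]
```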

Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives a raw, unencoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block. The resulting sample-by-sample differences define a residual block for the current block. In some examples, residual generation unit 204 may also determine differences between sample values within the residual block to generate a residual block using residual differential pulse code modulation (RDPCM). In some examples, residual generation unit 204 may be formed using one or more subtractor circuits that perform binary subtraction.
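The residual computation and a horizontal RDPCM step can be sketched as follows. The horizontal direction and the function names are assumptions chosen for illustration. The inverse function shows that a decoder can recover the residual with a running sum along each row.

```python
from itertools import accumulate
from typing import List

Block = List[List[int]]


def residual(cur: Block, pred: Block) -> Block:
    """Sample-by-sample difference between the current block and its prediction."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def rdpcm_horizontal(res: Block) -> Block:
    """Keep each row's first sample; replace the rest by the difference from the left neighbor."""
    return [[row[0]] + [row[j] - row[j - 1] for j in range(1, len(row))] for row in res]


def undo_rdpcm_horizontal(diff: Block) -> Block:
    """Inverse: a running sum along each row restores the residual."""
    return [list(accumulate(row)) for row in diff]


res = residual([[10, 12, 15]], [[9, 9, 9]])
print(res, rdpcm_horizontal(res))  # [[1, 3, 6]] [[1, 2, 3]]
```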

In examples where mode selection unit 202 partitions CUs into PUs, each PU may be associated with a luma prediction unit and corresponding chroma prediction units. Video encoder 200 and video decoder 300 may support PUs of various sizes. As noted above, the size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of the luma prediction unit of the PU. Assuming that the size of a particular CU is 2N×2N, video encoder 200 may support PU sizes of 2N×2N or N×N for intra-prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N, or similar for inter-prediction. Video encoder 200 and video decoder 300 may also support asymmetric partitioning for PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter-prediction.
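The partition shapes listed above can be tabulated for a 2N×2N CU. For the asymmetric modes, the sketch uses a one-quarter/three-quarter split: 2N×nU, for example, gives a 2N×(N/2) upper PU and a 2N×(3N/2) lower PU. This split follows HEVC's asymmetric motion partitioning and is not spelled out in this section. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def pu_partitions(n: int) -> Dict[str, List[Tuple[int, int]]]:
    """(width, height) of each PU of a 2N x 2N CU, per partition mode."""
    s = 2 * n
    q, tq = n // 2, 3 * n // 2  # one quarter and three quarters of the CU side
    return {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
        "NxN": [(n, n)] * 4,
        "2NxnU": [(s, q), (s, tq)],  # upper PU is the short one
        "2NxnD": [(s, tq), (s, q)],  # lower PU is the short one
        "nLx2N": [(q, s), (tq, s)],  # left PU is the narrow one
        "nRx2N": [(tq, s), (q, s)],  # right PU is the narrow one
    }


for mode, pus in pu_partitions(16).items():
    print(mode, pus)
```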

In examples where mode selection unit 202 does not further partition a CU into PUs, each CU may be associated with a luma coding block and corresponding chroma coding blocks. As above, the size of a CU may refer to the size of the luma coding block of the CU. Video encoder 200 and video decoder 300 may support CU sizes of 2N×2N, 2N×N, or N×2N.

For other video coding techniques, such as intra-block copy mode coding, affine mode coding, and linear model (LM) mode coding, to name a few examples, mode selection unit 202 generates a prediction block for the current block being encoded via the respective units associated with those coding techniques. In some examples, such as palette mode coding, mode selection unit 202 may not generate a prediction block. Instead, it may generate syntax elements that indicate how to reconstruct the block based on a selected palette. In such modes, mode selection unit 202 may provide these syntax elements to entropy encoding unit 220 to be encoded.

As described above, residual generation unit 204 receives the video data for the current block and the corresponding prediction block. Residual generation unit 204 then generates a residual block for the current block. To generate the residual block, residual generation unit 204 calculates sample-by-sample differences between the prediction block and the current block.

Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 may apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loeve transform (KLT), or a conceptually similar transform to a residual block. In some examples, transform processing unit 206 may perform multiple transforms on a residual block, e.g., a primary transform and a secondary transform, such as a rotational transform. In some examples, transform processing unit 206 does not apply any transform to a residual block.
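As a concrete instance of the DCT mentioned above, the following sketch applies a separable, orthonormal floating-point DCT-II to a small residual block, transforming rows first and then columns. Real codecs use scaled integer approximations of such transforms, so this is a conceptual illustration only.

```python
import math
from typing import List


def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


flat = dct_2d([[10] * 4 for _ in range(4)])
print(round(flat[0][0], 6))  # 40.0: a flat residual has only a DC coefficient
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared residual samples. The transform itself is therefore lossless; loss enters only at quantization.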

Quantization unit 208 may quantize the transform coefficients in a transform coefficient block to produce a quantized transform coefficient block. Quantization unit 208 may quantize the transform coefficients of a transform coefficient block according to a quantization parameter (QP) value associated with the current block. Video encoder 200 (e.g., via mode selection unit 202) may adjust the degree of quantization applied to the transform coefficient blocks associated with the current block by adjusting the QP value associated with the CU. Quantization may introduce loss of information, so the quantized transform coefficients may have lower precision than the original transform coefficients produced by transform processing unit 206.
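The effect of the QP on quantization can be sketched with the step-size relationship used in H.264/AVC and HEVC. In that relationship the step size doubles every 6 QP steps and equals 1 at QP 4. The rounding rule and the function names are assumptions. The example also shows the information loss noted above, since -7 comes back as -8.

```python
import math
from typing import List


def q_step(qp: int) -> float:
    """Quantization step size: doubles every 6 QP and equals 1 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: List[float], qp: int) -> List[int]:
    step = q_step(qp)
    # Round half away from zero; real encoders often use a dead zone or RDOQ instead.
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(levels: List[int], qp: int) -> List[float]:
    step = q_step(qp)
    return [lvl * step for lvl in levels]


levels = quantize([10.0, -7.0, 0.4], qp=10)  # step size 2.0
print(levels, dequantize(levels, qp=10))     # [5, -4, 0] [10.0, -8.0, 0.0]
```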

The inverse quantization unit 210 and the inverse transform processing unit 212 may apply inverse quantization and inverse transforms, respectively, to a quantized transform coefficient block to reconstruct a residual block from the transform coefficient block. The reconstruction unit 214 may produce a reconstructed block corresponding to the current block (albeit potentially with some degree of distortion) based on the reconstructed residual block and a prediction block generated by the mode selection unit 202. For example, the reconstruction unit 214 may add samples of the reconstructed residual block to corresponding samples from the prediction block generated by the mode selection unit 202 to produce the reconstructed block.
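
The sample-wise addition described above can be sketched as follows. The sketch assumes that the sum is clipped to the valid range for the sample bit depth, as is common practice in codecs such as HEVC and VVC. The function name and the list-of-lists block layout are hypothetical.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add residual samples to prediction samples and clip to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(pred, resid)
    ]


print(reconstruct([[250, 10], [128, 128]], [[10, -20], [3, -3]]))
# [[255, 0], [131, 125]]
```
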

The filter unit 216 may perform one or more filter operations on reconstructed blocks. For example, the filter unit 216 may perform deblocking operations to reduce blockiness artifacts along edges of CUs. Operations of the filter unit 216 may be skipped in some examples.

The video encoder 200 stores reconstructed blocks in the DPB 218. For instance, in examples where operations of the filter unit 216 are not performed, the reconstruction unit 214 may store reconstructed blocks to the DPB 218. In examples where operations of the filter unit 216 are performed, the filter unit 216 may store the filtered reconstructed blocks to the DPB 218. The motion estimation unit 222 and the motion compensation unit 224 may retrieve from the DPB 218 a reference picture, formed from the reconstructed (and potentially filtered) blocks, to inter-predict blocks of subsequently encoded pictures. In addition, the intra-prediction unit 226 may use reconstructed blocks of the current picture in the DPB 218 to intra-predict other blocks in the current picture.

In general, the entropy encoding unit 220 may entropy encode syntax elements received from other functional components of the video encoder 200. For example, the entropy encoding unit 220 may entropy encode quantized transform coefficient blocks from the quantization unit 208. As another example, the entropy encoding unit 220 may entropy encode prediction syntax elements from the mode selection unit 202 (e.g., motion information for inter-prediction or intra-mode information for intra-prediction). The entropy encoding unit 220 may perform one or more entropy encoding operations on the syntax elements, which are another example of video data, to generate entropy-encoded data. For example, the entropy encoding unit 220 may perform a context-adaptive variable length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an Exponential-Golomb encoding operation, or another type of entropy encoding operation on the data. In some examples, the entropy encoding unit 220 may operate in a bypass mode, in which syntax elements are not entropy encoded.
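
Of the entropy coding operations listed above, Exponential-Golomb coding is the simplest to show in full. The sketch below implements order-0 Exp-Golomb, the ue(v) code used for many header syntax elements in H.264, HEVC, and VVC. The value n is coded as a run of leading zeros followed by the binary form of n + 1. Bits are kept as a Python string for readability, and the function names are hypothetical.

```python
def exp_golomb_encode(n):
    """Order-0 Exp-Golomb code for a non-negative integer, as a bit string."""
    if n < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    info = bin(n + 1)[2:]                 # binary of n + 1 without the '0b' prefix
    return "0" * (len(info) - 1) + info   # leading-zero prefix, then the info bits


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string; return (value, rest)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1
    return int(bits[zeros:end], 2) - 1, bits[end:]


stream = "".join(exp_golomb_encode(v) for v in [0, 1, 2, 3, 4])
values = []
while stream:
    v, stream = exp_golomb_decode(stream)
    values.append(v)
print(values)  # [0, 1, 2, 3, 4]
```
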

The video encoder 200 may output a bitstream that includes the entropy-encoded syntax elements needed to reconstruct blocks of a slice or picture. In particular, the entropy encoding unit 220 may output the bitstream.

The operations described above are described with respect to a block. Such description should be understood as describing operations for a luma coding block and/or chroma coding blocks. As described above, in some examples, the luma coding block and chroma coding blocks are the luma and chroma components of a CU. In some examples, the luma coding block and chroma coding blocks are the luma and chroma components of a PU.

In some examples, operations performed with respect to a luma coding block need not be repeated for the chroma coding blocks. As one example, operations to identify a motion vector (MV) and reference picture for a luma coding block need not be repeated to identify an MV and reference picture for the chroma blocks. Rather, the MV for the luma coding block may be scaled to determine the MV for the chroma blocks, and the reference picture may be the same. As another example, the intra-prediction process may be the same for the luma coding block and the chroma coding blocks.
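
One way to picture the luma-to-chroma MV scaling mentioned above is to convert a luma MV into a displacement measured in chroma samples. This sketch assumes 4:2:0 sampling (chroma subsampled by 2 in each direction) and a quarter-sample luma MV as in HEVC. The helper is hypothetical and is not the procedure of this disclosure. Exact fractions make the half-resolution effect visible. Equivalently, in 4:2:0 the same MV value can be read at twice the fractional precision on the chroma grid (1/8 sample for a quarter-sample luma MV).

```python
from fractions import Fraction


def chroma_displacement(mv_luma, luma_frac_bits=2, sub_w=2, sub_h=2):
    """Convert a luma MV into a displacement measured in chroma samples.

    mv_luma is in units of 1/2**luma_frac_bits luma samples (2 -> quarter-sample).
    sub_w / sub_h are the chroma subsampling factors (2, 2 for 4:2:0).
    """
    unit = 1 << luma_frac_bits
    mvx, mvy = mv_luma
    return Fraction(mvx, unit * sub_w), Fraction(mvy, unit * sub_h)


# Quarter-sample luma MV (6, -4) = (1.5, -1.0) luma samples -> (0.75, -0.5) chroma samples.
dx, dy = chroma_displacement((6, -4))
print(float(dx), float(dy))  # 0.75 -0.5
```
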

Figure 4 is a block diagram illustrating an example video decoder 300 that may perform the techniques of this disclosure. Figure 4 is provided for purposes of explanation and does not limit the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes the video decoder 300 according to the techniques of VVC (ITU-T H.266) and HEVC (ITU-T H.265). However, the techniques of this disclosure may be performed by video coding devices that are configured according to other video coding standards.

In the example of Figure 4, the video decoder 300 includes a coded picture buffer (CPB) memory 320, an entropy decoding unit 302, a prediction processing unit 304, an inverse quantization unit 306, an inverse transform processing unit 308, a reconstruction unit 310, a filter unit 312, and a decoded picture buffer (DPB) 314. Any or all of the CPB memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and DPB 314 may be implemented in one or more processors or in processing circuitry. For instance, the units of the video decoder 300 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, the video decoder 300 may include additional or alternative processors or processing circuitry to perform these and other functions.

The prediction processing unit 304 includes a motion compensation unit 316 and an intra-prediction unit 318. The prediction processing unit 304 may include additional units to perform prediction in accordance with other prediction modes. As examples, the prediction processing unit 304 may include a palette unit, an intra-block copy unit (which may form part of the motion compensation unit 316), an affine unit, a linear model (LM) unit, or the like. In other examples, the video decoder 300 may include more, fewer, or different functional components. The motion compensation unit 316 may include a multi-pass DMVR (MP DMVR) unit 317, which is described below in the discussion of the motion compensation unit 316.

The CPB memory 320 may store video data, such as an encoded video bitstream, to be decoded by the components of the video decoder 300. The video data stored in the CPB memory 320 may be obtained, for example, from computer-readable medium 110 (Figure 1). The CPB memory 320 may include a CPB that stores encoded video data (e.g., syntax elements) from an encoded video bitstream. The CPB memory 320 may also store video data other than syntax elements of a coded picture, such as temporary data representing outputs from the various units of the video decoder 300. The DPB 314 generally stores decoded pictures, which the video decoder 300 may output and/or use as reference video data when decoding subsequent data or pictures of the encoded video bitstream. The CPB memory 320 and the DPB 314 may be formed by any of a variety of memory devices, such as DRAM (including SDRAM), MRAM, RRAM, or other types of memory devices. The CPB memory 320 and the DPB 314 may be provided by the same memory device or by separate memory devices. In various examples, the CPB memory 320 may be on-chip with other components of the video decoder 300, or off-chip relative to those components.

Additionally or alternatively, in some examples, the video decoder 300 may retrieve coded video data from memory 120 (Figure 1). That is, memory 120 may store data as discussed above with respect to the CPB memory 320. Likewise, memory 120 may store instructions to be executed by the video decoder 300 when some or all of the functionality of the video decoder 300 is implemented in software to be executed by processing circuitry of the video decoder 300.

The various units shown in Figure 4 are illustrated to assist with understanding the operations performed by the video decoder 300. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Similar to Figure 3, fixed-function circuits refer to circuits that provide particular functionality and whose executable operations are preset. Programmable circuits refer to circuits that can be programmed to perform various tasks and that provide flexible functionality in the operations that can be performed. For instance, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

The video decoder 300 may include ALUs, EFUs, digital circuits, analog circuits, and/or programmable cores formed from programmable circuits. In examples where the operations of the video decoder 300 are performed by software executing on the programmable circuits, on-chip or off-chip memory may store the instructions (e.g., object code) of the software that the video decoder 300 receives and executes.

The entropy decoding unit 302 may receive encoded video data from the CPB and entropy decode the video data to reproduce syntax elements. The prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, and filter unit 312 may generate decoded video data based on the syntax elements extracted from the bitstream.

In general, the video decoder 300 reconstructs a picture on a block-by-block basis. The video decoder 300 may perform a reconstruction operation on each block individually (where the block currently being reconstructed, i.e., decoded, may be referred to as the "current block").

The entropy decoding unit 302 may entropy decode syntax elements defining the quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as a quantization parameter (QP) and/or transform mode indication(s). The inverse quantization unit 306 may use the QP associated with the quantized transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for the inverse quantization unit 306 to apply. The inverse quantization unit 306 may, for example, perform a bitwise left-shift operation to inverse quantize the quantized transform coefficients. The inverse quantization unit 306 may thereby form a transform coefficient block that includes transform coefficients.
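
To show where the left shift enters inverse quantization, here is a sketch modeled on the integer scaling of the HEVC design, as we recall it. It multiplies by a factor from a six-entry table indexed by QP % 6, left-shifts by QP / 6 (so each increase of 6 in QP doubles the result), and then applies a rounded right shift. The table values, the flat scaling-list factor m = 16, and the shift formula for default (non-extended) precision are assumptions to check against the HEVC specification. The function name is hypothetical.

```python
# Scaling factors indexed by QP % 6 (HEVC-style levelScale, as recalled).
LEVEL_SCALE = [40, 45, 51, 57, 64, 72]


def dequantize_level(level, qp, bit_depth=8, log2_tb_size=2, m=16):
    """Integer inverse quantization of one coefficient level.

    m = 16 corresponds to a flat scaling list; the final clip keeps the
    result within a 16-bit signed coefficient range.
    """
    bd_shift = bit_depth + log2_tb_size - 5
    scaled = (level * m * LEVEL_SCALE[qp % 6]) << (qp // 6)  # left shift by QP / 6
    value = (scaled + (1 << (bd_shift - 1))) >> bd_shift       # rounded right shift
    return max(-32768, min(32767, value))


# Raising QP by 6 doubles the reconstructed coefficient.
print(dequantize_level(10, qp=22), dequantize_level(10, qp=28))  # 2560 5120
```
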

After the inverse quantization unit 306 forms the transform coefficient block, the inverse transform processing unit 308 may apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, the inverse transform processing unit 308 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loève transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the transform coefficient block.

Furthermore, the prediction processing unit 304 generates a prediction block according to the prediction information syntax elements that were entropy decoded by the entropy decoding unit 302. For example, if the prediction information syntax elements indicate that the current block is inter-predicted, the motion compensation unit 316 may generate the prediction block. In this case, the prediction information syntax elements may indicate a reference picture in the DPB 314 from which to retrieve a reference block, as well as a motion vector identifying the location of the reference block in the reference picture relative to the location of the current block in the current picture. The motion compensation unit 316 may generally perform the inter-prediction process in a manner substantially similar to that described with respect to the motion compensation unit 224 (Figure 3).

In some examples, the motion compensation unit 316 may include a multi-pass DMVR unit 317. The multi-pass DMVR unit 317 may apply multi-pass DMVR to a motion vector for a block of video data to determine a refined motion vector. The multi-pass DMVR may include a first pass that is block-based and is applied to the block of video data. The multi-pass DMVR may include a second pass that is sub-block-based and is applied to at least one second-pass sub-block of the block of video data. The multi-pass DMVR may include a third pass that is sub-block-based and is applied to at least one third-pass sub-block of the block of video data. The width of a second-pass sub-block may be less than or equal to the width of the block of video data, and the height of a second-pass sub-block may be less than or equal to the height of the block of video data. The width of a third-pass sub-block may be less than or equal to the width of a second-pass sub-block, and the height of a third-pass sub-block may be less than or equal to the height of a second-pass sub-block. Further examples and descriptions of multi-pass DMVR techniques are provided later in this disclosure.
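
The size relationships among the three passes can be illustrated by computing which regions each pass operates on. This is a minimal sketch under stated assumptions. The sub-block sizes (16×16 for the second pass and 8×8 for the third pass) are hypothetical defaults, because the passage only constrains their ordering. Block dimensions are assumed to be powers of two, so that each sub-block size divides the block size. The refinement search performed within each region is not modeled, and all names are hypothetical.

```python
def tile(width, height, sb_w, sb_h):
    """Split a width x height block into sub-blocks of at most sb_w x sb_h.

    Assumes power-of-two dimensions so that the sub-block size divides the block size.
    """
    w, h = min(width, sb_w), min(height, sb_h)
    return [(x, y, w, h) for y in range(0, height, h) for x in range(0, width, w)]


def multipass_dmvr_layout(width, height, pass2_size=(16, 16), pass3_size=(8, 8)):
    """Return (x, y, w, h) regions for each pass; the refinement itself is not shown."""
    pass1 = [(0, 0, width, height)]              # pass 1: block-based, the whole block
    pass2 = tile(width, height, *pass2_size)     # pass 2: second-pass sub-blocks
    pass3 = tile(width, height, *pass3_size)     # pass 3: third-pass sub-blocks
    (_, _, w2, h2), (_, _, w3, h3) = pass2[0], pass3[0]
    # Ordering from the description: block >= second-pass sub-block >= third-pass sub-block.
    if not (w2 <= width and h2 <= height and w3 <= w2 and h3 <= h2):
        raise ValueError("third-pass sub-blocks must not exceed second-pass sub-blocks")
    return pass1, pass2, pass3


p1, p2, p3 = multipass_dmvr_layout(32, 16)
print(len(p1), len(p2), len(p3))  # 1 2 8
```
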

As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, the intra-prediction unit 318 may generate the prediction block according to an intra-prediction mode indicated by the prediction information syntax elements. Again, the intra-prediction unit 318 may generally perform the intra-prediction process in a manner substantially similar to that described with respect to the intra-prediction unit 226 (Figure 3). The intra-prediction unit 318 may retrieve data of samples neighboring the current block from the DPB 314.

The reconstruction unit 310 may reconstruct the current block using the prediction block and the residual block. For example, the reconstruction unit 310 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the current block.

The filter unit 312 may perform one or more filter operations on reconstructed blocks. For example, the filter unit 312 may perform deblocking operations to reduce blockiness artifacts along edges of the reconstructed blocks. Operations of the filter unit 312 are not necessarily performed in all examples.

The video decoder 300 may store the reconstructed blocks in the DPB 314. For instance, in examples where operations of the filter unit 312 are not performed, the reconstruction unit 310 may store reconstructed blocks to the DPB 314. In examples where operations of the filter unit 312 are performed, the filter unit 312 may store the filtered reconstructed blocks to the DPB 314. As discussed above, the DPB 314 may provide reference information to the prediction processing unit 304, such as samples of the current picture for intra-prediction and of previously decoded pictures for subsequent motion compensation. Moreover, the video decoder 300 may output decoded pictures (e.g., decoded video) from the DPB 314 for subsequent presentation on a display device, such as the display device 118 of Figure 1.

In this manner, the video decoder 300 represents an example of a video decoding device that includes a memory configured to store video data, and one or more processors implemented in circuitry and communicatively coupled to the memory. The one or more processors are configured to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and to decode the block based on the at least one refined motion vector. The multi-pass DMVR comprises: a first pass that is block-based and is applied to the block of video data; a second pass that is sub-block-based and is applied to at least one second-pass sub-block of the block of video data, wherein the width of the second-pass sub-block is less than or equal to the width of the block of video data and the height of the second-pass sub-block is less than or equal to the height of the block of video data; and a third pass that is sub-block-based and is applied to at least one third-pass sub-block of the block of video data, wherein the width of the third-pass sub-block is less than or equal to the width of the second-pass sub-block and the height of the third-pass sub-block is less than or equal to the height of the second-pass sub-block.

The video decoder 300 also represents an example of a video decoding device that includes a memory configured to store video data, and one or more processing units implemented in circuitry. The processing units are configured to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine a refined motion vector, and to decode the block based on the refined motion vector.

This disclosure relates to decoder-side motion vector derivation techniques (e.g., template matching, bilateral matching, decoder-side MV refinement, bi-directional optical flow, etc.). The techniques of this disclosure may be applied to any existing video codec, such as HEVC (High Efficiency Video Coding), VVC (Versatile Video Coding), or Essential Video Coding (EVC), or may serve as an efficient coding tool in any future video coding standard. This section first reviews the HEVC and JEM techniques, as well as the ongoing work in Versatile Video Coding (VVC), that relate to this disclosure.

Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions. The algorithm description of Versatile Video Coding and Test Model 10 (VTM 10.0) is referred to as JVET-T2002 and is available from https://jvet-experts.org/.

The CU structure and motion vector prediction in HEVC are now discussed. In HEVC, the largest coding unit in a slice is called a coding tree block (CTB) or coding tree unit (CTU). A CTB contains a quadtree, the nodes of which are coding units.

In the HEVC main profile, the size of a CTB can range from 16×16 to 64×64 (although technically an 8×8 CTB size can be supported). A coding unit (CU) can be the same size as a CTB or as small as 8×8. Each CU is coded with one mode, i.e., inter mode or intra mode. When a CU is inter coded, the CU may be further partitioned into two or four prediction units (PUs), or it may remain a single PU when no further partitioning applies. When two PUs are present in one CU, they can be half-size rectangles (each half the size of the CU), or two rectangles of which one is one-quarter and the other three-quarters of the size of the CU.
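
The PU shapes described above can be enumerated explicitly. The sketch below lists the PU dimensions for each HEVC-style inter partition mode, using the conventional mode names (2N×2N, 2N×N, N×2N, N×N, and the asymmetric 2N×nU, 2N×nD, nL×2N, nR×2N). It deliberately ignores the standard's restrictions on when N×N and the asymmetric modes are allowed. The function name is hypothetical.

```python
def pu_sizes(cu_size, mode):
    """Return (width, height) of each PU for an HEVC-style inter partition mode."""
    s, half, quarter = cu_size, cu_size // 2, cu_size // 4
    modes = {
        "2Nx2N": [(s, s)],                           # no further partitioning: one PU
        "2NxN": [(s, half)] * 2,                     # two equal horizontal halves
        "Nx2N": [(half, s)] * 2,                     # two equal vertical halves
        "NxN": [(half, half)] * 4,                   # four quarters
        "2NxnU": [(s, quarter), (s, s - quarter)],   # 1/4 above, 3/4 below
        "2NxnD": [(s, s - quarter), (s, quarter)],   # 3/4 above, 1/4 below
        "nLx2N": [(quarter, s), (s - quarter, s)],   # 1/4 left, 3/4 right
        "nRx2N": [(s - quarter, s), (quarter, s)],   # 3/4 left, 1/4 right
    }
    return modes[mode]


print(pu_sizes(32, "2NxnU"))  # [(32, 8), (32, 24)]
```
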

When a CU is inter coded, each PU has one set of motion information, which is derived with a unique inter prediction mode.

Motion vector prediction is now discussed. In the HEVC standard, there are two inter prediction modes for a PU, named merge mode (skip is considered a special case of merge) and advanced motion vector prediction (AMVP) mode.

In either AMVP mode or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The MV(s) of the current PU, as well as the reference indices in merge mode, are generated by taking one candidate from the MV candidate list. For example, the video decoder 300 may maintain the MV candidate list.

The MV candidate list contains up to five candidates for merge mode and only two candidates for AMVP mode. A merge candidate may contain a set of motion information, e.g., MVs corresponding to both reference picture lists (list 0 and list 1) and the corresponding reference indices. If a merge candidate is identified by a merge index, the reference pictures used for prediction of the current block, as well as the associated motion vectors, are determined. In AMVP mode, by contrast, an AMVP candidate contains only an MV. Therefore, for each potential prediction direction from either list 0 or list 1, the video encoder 200 may explicitly signal a reference index, together with an MV predictor (MVP) index into the MV candidate list. In AMVP mode, the predicted MVs can be further refined.

The video decoder 300 may derive the candidates for both modes similarly, from the same spatial and temporal neighboring blocks.

Figures 5A and 5B are conceptual diagrams illustrating example spatial neighboring MV candidates for merge mode and AMVP mode, respectively. Spatial MV candidates are derived from the neighboring blocks shown in Figures 5A and 5B for a specific PU (PU0), although the methods of generating the candidates from the blocks differ between merge mode and AMVP mode.

In merge mode, up to four spatial MV candidates can be derived for PU0 500 in the numbered order shown in Figure 5A: left (0, A1), above (1, B1), above-right (2, B0), below-left (3, A0), and above-left (4, B2). For example, the video decoder 300 may derive up to four spatial MV candidates for PU0 500 using the order described above.
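
The checking order above can be sketched as a loop that visits A1, B1, B0, A0, B2 and stops after four candidates. In this simplified sketch, each available neighbor's motion is compared against every candidate already chosen, and duplicates are discarded. This is an assumption made for brevity: HEVC compares only selected pairs of positions. Representing motion as an (mv, ref_idx) tuple, with None for an unavailable neighbor, is likewise hypothetical.

```python
# Checking order described for merge mode: A1, B1, B0, A0, B2.
MERGE_SPATIAL_ORDER = ["A1", "B1", "B0", "A0", "B2"]


def spatial_merge_candidates(neighbors, max_spatial=4):
    """Collect up to max_spatial spatial merge candidates.

    neighbors maps a position name to its motion info, or to None when the
    neighbor is unavailable (e.g., outside the picture or intra coded).
    """
    candidates = []
    for pos in MERGE_SPATIAL_ORDER:
        if len(candidates) == max_spatial:
            break
        motion = neighbors.get(pos)
        if motion is None or motion in candidates:  # skip unavailable or duplicate motion
            continue
        candidates.append(motion)
    return candidates


nbrs = {"A1": ((1, 0), 0), "B1": ((1, 0), 0), "B0": None, "A0": ((2, 3), 1), "B2": ((0, 5), 0)}
print(spatial_merge_candidates(nbrs))  # [((1, 0), 0), ((2, 3), 1), ((0, 5), 0)]
```
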

As shown in Figure 5B, in AMVP mode, the neighboring blocks of PU0 502 are divided into two groups: a left group consisting of blocks 0 and 1, and an above group consisting of blocks 2, 3, and 4. For example, the video decoder 300 may divide the neighboring blocks into the left group and the above group. For each group, a potential candidate in a neighboring block that refers to the same reference picture as the one indicated by the signaled reference index has the highest priority to be chosen to form the final candidate of the group. It is possible that none of the neighboring blocks contains a motion vector pointing to the same reference picture. Therefore, if such a candidate cannot be found, the first available candidate is scaled to form the final candidate, so that the difference in temporal distance is compensated.
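
The per-group selection rule above can be sketched as follows. The function first looks for a neighbor that already refers to the signaled reference picture. Failing that, it scales the first available MV by the ratio of picture order count (POC) distances. As a simplifying assumption, the scaling uses a floating-point ratio for readability. The standard uses fixed-point arithmetic and places further constraints on when scaling is allowed. The data layout and names are hypothetical.

```python
def amvp_group_candidate(group, target_ref_poc, cur_poc):
    """Pick one AMVP candidate from a neighbor group (left or above).

    group lists neighbors in checking order; each entry is (mv, ref_poc),
    or None when the neighbor has no usable motion.
    """
    available = [n for n in group if n is not None]
    # Highest priority: a neighbor that already points at the signaled reference picture.
    for mv, ref_poc in available:
        if ref_poc == target_ref_poc:
            return mv
    if not available:
        return None
    # Otherwise, scale the first available MV by the ratio of temporal (POC) distances.
    mv, ref_poc = available[0]
    tb = cur_poc - target_ref_poc
    td = cur_poc - ref_poc
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))


left = [None, ((8, -4), 12)]   # first neighbor unavailable; second refers to POC 12
print(amvp_group_candidate(left, target_ref_poc=16, cur_poc=20))  # (4, -2)
```
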

Temporal motion vector prediction in HEVC is discussed here. Video decoder 300 may add a temporal motion vector predictor (TMVP) candidate, if enabled and available, to the MV candidate list after any spatial motion vector candidates. The process of motion vector derivation for the TMVP candidate is the same for both merge mode and AMVP mode. However, the target reference index for the TMVP candidate in merge mode may always be set to 0.

Figures 6A and 6B are conceptual diagrams illustrating an example TMVP candidate and MV scaling, respectively. The primary block location for TMVP candidate derivation is the bottom-right block outside of the co-located PU, shown in Figure 6A as block "T" 600. This location compensates for the bias toward the above and left blocks used to generate the spatial neighboring candidates. However, if that block is located outside of the current CTB row (shown as block 602), or if its motion information is not available, the block is substituted with center block 604 of PU0 606.
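The position fallback can be sketched as below. The function name, its parameters, and the `has_motion` callback are hypothetical; the sketch checks only the CTB-row and picture-boundary conditions described above and omits details such as aligning the position to the motion storage grid.

```python
from typing import Callable, Tuple

def tmvp_position(x: int, y: int, w: int, h: int, log2_ctb: int,
                  pic_w: int, pic_h: int,
                  has_motion: Callable[[int, int], bool]) -> Tuple[int, int]:
    """Luma position used to fetch the co-located motion for a PU at (x, y)."""
    # Primary choice: the block just outside the bottom-right corner ("T").
    bx, by = x + w, y + h
    same_ctb_row = (by >> log2_ctb) == (y >> log2_ctb)
    inside_pic = bx < pic_w and by < pic_h
    if same_ctb_row and inside_pic and has_motion(bx, by):
        return bx, by
    # Fallback: the center of the PU.
    return x + (w >> 1), y + (h >> 1)
```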

The motion vector for the TMVP candidate is derived from the co-located PU of the co-located picture, which is indicated at the slice level. The motion vector of the co-located PU is called the co-located MV.

Similar to the temporal direct mode in AVC, to derive the TMVP candidate, co-located MV 610 needs to be scaled to compensate for temporal distance differences, as shown in Figure 6B. For example, video decoder 300 may scale co-located MV 610 to compensate for the temporal distance differences.

Other aspects of motion prediction in HEVC are discussed here. Several aspects of merge mode and AMVP mode are worth mentioning, as follows.

Motion vector scaling: The value of an MV is proportional to the distance between pictures in presentation time. An MV associates two pictures: the reference picture and the picture containing the motion vector (the containing picture, i.e., the picture that contains the block being predicted with the motion vector). When an MV is used to predict another MV, the temporal distance between the containing picture and the reference picture is calculated based on picture order count (POC) values.

For a motion vector to be predicted, both its associated containing picture and its reference picture may be different. Therefore, a new distance (based on POC) is calculated, and the MV is scaled based on these two POC distances. For example, video decoder 300 may calculate the new distance based on POC and scale the MV based on the two POC distances. For a spatial neighboring candidate, the containing pictures of the two MVs are the same, while the reference pictures are different. In HEVC, MV scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates.
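For reference, the following sketch is modeled on the fixed-point MV scaling of the HEVC specification (clipped POC distances, a reciprocal with 14-bit precision, and a clipped scale factor). To our understanding the constants follow H.265, but treat it as an illustration rather than a conformance reference; the function names are hypothetical.

```python
def clip3(lo: int, hi: int, v: int) -> int:
    return max(lo, min(hi, v))

def div_trunc(a: int, b: int) -> int:
    # Integer division that rounds toward zero, like "/" in the spec text.
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

def scale_mv_component(mv: int, tb: int, td: int) -> int:
    """Scale one MV component by the POC-distance ratio tb / td.

    tb: POC distance for the MV being predicted (current picture to target ref).
    td: POC distance of the candidate MV (its containing picture to its ref).
    """
    td = clip3(-128, 127, td)
    tb = clip3(-128, 127, tb)
    tx = div_trunc(16384 + (abs(td) >> 1), td)
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)
    prod = dist_scale * mv
    sign = -1 if prod < 0 else 1
    return clip3(-32768, 32767, sign * ((abs(prod) + 127) >> 8))
```

For example, halving the distance (tb = 1, td = 2) maps an MV component of 64 to 32, and a negative td flips the direction.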

Artificial motion vector candidate generation: If an MV candidate list is not complete, artificial MV candidates may be generated and inserted at the end of the list until the list has all of its candidates (i.e., the list is full).

In merge mode, there are two types of artificial MV candidates: combined candidates, which are derived only for B slices, and zero candidates, which are used only for AMVP if the first type does not provide enough artificial candidates.

For each pair of candidates that are already in the candidate list and that have the necessary motion information, a bi-directional combined MV candidate is derived by combining the MV of the first candidate, which refers to a picture in list 0, with the MV of the second candidate, which refers to a picture in list 1.
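A simplified sketch of combined bi-directional candidate generation. The candidate representation is hypothetical, and iterating over every ordered pair of existing candidates is an assumption; HEVC uses its own fixed pair order and additional checks that are not modeled here.

```python
from itertools import permutations
from typing import List, Optional, Tuple

MV = Tuple[int, int]
# A candidate holds optional (mv, ref_idx) motion for list 0 and for list 1.
Cand = Tuple[Optional[Tuple[MV, int]], Optional[Tuple[MV, int]]]

def combined_bipred_candidates(cands: List[Cand], max_len: int) -> List[Cand]:
    out = list(cands)
    for i, j in permutations(range(len(cands)), 2):
        if len(out) >= max_len:
            break  # the list is full
        l0 = cands[i][0]  # list-0 motion of the first candidate
        l1 = cands[j][1]  # list-1 motion of the second candidate
        if l0 is not None and l1 is not None:
            out.append((l0, l1))
    return out
```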

Pruning process for candidate insertion: Candidates from different blocks may happen to be identical, which decreases the efficiency of a merge/AMVP candidate list. A pruning process may be applied to solve this problem. During the pruning process, video decoder 300 compares a candidate against the other candidates in the current candidate list to avoid inserting identical candidates, to a certain extent. To reduce complexity, the pruning process may be applied to only a limited number of candidates, instead of comparing each potential candidate with all other existing candidates.
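The idea of limiting comparisons can be sketched as below. The comparison table reflects, to our understanding, the limited pairs that HEVC checks for spatial merge candidates, but it should be read as an illustrative assumption; real pruning compares full motion data rather than a single MV.

```python
from typing import Dict, List, Optional, Tuple

MV = Tuple[int, int]

# Each position is compared only with the listed positions, not with every
# candidate already in the list.
PRUNE_AGAINST: Dict[str, List[str]] = {
    "A1": [], "B1": ["A1"], "B0": ["B1"], "A0": ["A1"], "B2": ["A1", "B1"],
}

def pruned_spatial_list(neighbors: Dict[str, Optional[MV]],
                        max_cands: int = 4) -> List[MV]:
    out: List[MV] = []
    for pos in ["A1", "B1", "B0", "A0", "B2"]:
        if len(out) == max_cands:
            break
        mv = neighbors.get(pos)
        if mv is None:
            continue  # neighbor unavailable
        if any(neighbors.get(p) == mv for p in PRUNE_AGAINST[pos]):
            continue  # identical to a compared neighbor: skip it
        out.append(mv)
    return out
```

The second test case shows the trade-off: B0 duplicates A1 but is only compared with B1, so the duplicate survives.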

Template matching prediction is discussed here. Template matching (TM) prediction is a special merge mode based on Frame-Rate Up Conversion (FRUC) techniques. In TM prediction mode, the motion information of a block is not signaled but is derived at the decoder side by video decoder 300. TM prediction is applied to both AMVP mode and regular merge mode. In AMVP mode, the MVP candidate is selected using basic template matching, picking the candidate with the smallest difference between the current block template and the reference block template. In regular merge mode, video encoder 200 signals a TM mode flag to indicate the use of TM, and TM is then applied to the merge candidate indicated by the merge index for MV refinement.

Figure 7 is a conceptual diagram illustrating example template matching performed on a search area around an initial MV. As shown in Figure 7, template matching may be used to derive the motion information of the current CU. Deriving the motion information may include finding the closest match between template 700 (the top and/or left neighboring blocks of the current CU) in current picture 702 and block 704 (e.g., of the same size as the template) in reference picture 706. With an AMVP candidate selected based on the initial matching error, its MVP is refined by template matching. With a merge candidate indicated by the signaled merge index, its merged MVs corresponding to reference picture list0 (L0) and reference picture list1 (L1) are refined independently by template matching, and the less accurate MV is then refined again using the more accurate MV as a prior. For example, video decoder 300 may receive and parse the signaled merge index and apply template matching to the merged MVs to refine them.

Cost function: When a motion vector points to a fractional sample position, video decoder 300 may use motion-compensated interpolation. To reduce complexity, bilinear interpolation, instead of the regular 8-tap discrete cosine transform interpolation filter (DCT-IF) interpolation, is used both for template matching and for generating the templates on the reference pictures. The matching cost C of template matching may be calculated as follows:

$$C = \mathrm{SAD} + w \cdot \left( \left| MV_x - MV_x^s \right| + \left| MV_y - MV_y^s \right| \right)$$

where w is a weighting factor empirically set to 4, and MV and MV^s indicate the currently tested MV and the initial MV (e.g., the MVP candidate in AMVP mode or the merged motion vector in merge mode), respectively. The sum of absolute differences (SAD) between the current template and the reference template may be used as the distortion term of the template matching cost.
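The cost C translates directly into code. A minimal sketch, assuming the templates are given as flat sample lists and that MV and MV^s are in the same units; the function name is hypothetical.

```python
from typing import Sequence, Tuple

def tm_cost(cur_template: Sequence[int], ref_template: Sequence[int],
            mv: Tuple[int, int], mv_init: Tuple[int, int], w: int = 4) -> int:
    """C = SAD + w * (|MV_x - MV_x^s| + |MV_y - MV_y^s|)."""
    if len(cur_template) != len(ref_template):
        raise ValueError("templates must have the same size")
    sad = sum(abs(a - b) for a, b in zip(cur_template, ref_template))
    # The MV term penalizes candidates that drift away from the initial MV.
    mv_penalty = abs(mv[0] - mv_init[0]) + abs(mv[1] - mv_init[1])
    return sad + w * mv_penalty
```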

When TM is used, motion is refined using luma samples only. The derived motion is used for both luma and chroma in motion-compensated (MC) inter prediction. After the MV is determined, the final MC is performed using an 8-tap interpolation filter for luma and a 4-tap interpolation filter for chroma. For example, video decoder 300 may refine the motion using only luma samples.

Search method: MV refinement may be a pattern-based MV search with the template matching cost as the criterion. Two search patterns are supported for MV refinement: a diamond search and a cross search. For example, video decoder 300 may use a diamond search or a cross search for MV refinement. The MV is directly searched at quarter luma sample motion vector difference (MVD) accuracy with a diamond pattern, followed by quarter luma sample MVD accuracy with a cross pattern, and then by one-eighth luma sample MVD refinement with a cross pattern. The search range of MV refinement may be set equal to (-8, +8) luma samples around the initial MV.
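The three-stage search can be sketched as a generic pattern search. This is an illustration under stated assumptions: MVs are held in 1/8-luma-sample units so that quarter-sample steps are 2 and eighth-sample steps are 1, the diamond and cross offsets below are hypothetical pattern definitions, and `cost` stands in for the template matching cost C.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]  # assumed to be in 1/8-luma-sample units

# Hypothetical pattern definitions (offsets before scaling by the step size).
DIAMOND: List[MV] = [(0, 2), (1, 1), (2, 0), (1, -1),
                     (0, -2), (-1, -1), (-2, 0), (-1, 1)]
CROSS: List[MV] = [(0, 1), (1, 0), (0, -1), (-1, 0)]

def pattern_search(cost: Callable[[MV], float], start: MV, pattern: List[MV],
                   step: int, mv_init: MV, search_range: int,
                   max_iter: int = 64) -> MV:
    best, best_cost = start, cost(start)
    for _ in range(max_iter):
        cx, cy = best
        moved = False
        for dx, dy in pattern:
            cand = (cx + dx * step, cy + dy * step)
            # Keep every candidate within +/- search_range of the initial MV.
            if (abs(cand[0] - mv_init[0]) > search_range
                    or abs(cand[1] - mv_init[1]) > search_range):
                continue
            c = cost(cand)
            if c < best_cost:
                best, best_cost, moved = cand, c, True
        if not moved:  # the center is the best point of the pattern
            break
    return best

def tm_refine(cost: Callable[[MV], float], mv_init: MV) -> MV:
    rng = 8 * 8  # +/-8 luma samples in 1/8-sample units
    mv = pattern_search(cost, mv_init, DIAMOND, 2, mv_init, rng)  # 1/4-sample
    mv = pattern_search(cost, mv, CROSS, 2, mv_init, rng)         # 1/4-sample
    return pattern_search(cost, mv, CROSS, 1, mv_init, rng)       # 1/8-sample
```

The coarse stages move quickly toward the minimum, and the final one-eighth-sample cross stage settles on the exact grid point.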

Bilateral matching prediction is discussed here. Bilateral matching (BM) prediction, also known as bilateral merge, is another merge mode based on FRUC techniques. Once a decision is made to apply BM mode to a block, two initial MVs (MV0 and MV1) are derived by using a signaled merge candidate index to select a merge candidate in the constructed merge list. Video decoder 300 may perform a bilateral matching search around MV0 and MV1 and derive the final MV0' and MV1' based on the minimum bilateral matching cost.

Figures 8A and 8B are conceptual diagrams illustrating, respectively, an example in which MVD0 and MVD1 are proportional to the temporal distances and an example in which MVD0 and MVD1 are mirrored regardless of the temporal distances. The motion vector differences MVD0 (denoted by MV0'-MV0) and MVD1 (denoted by MV1'-MV1), which point to the two reference blocks, may be proportional to the temporal distances (TD), e.g., TD0 800 and TD1 802, between current picture 804 and the two reference pictures 806 and 808. Figure 8A shows an example of MVD0 and MVD1 in which TD1 802 is four times TD0 800.

However, there is an optional design in which MVD0 and MVD1 are mirrored regardless of the temporal distances TD0 and TD1. Figure 8B shows an example of mirrored MVD0 and MVD1 in which TD1 812 is four times TD0 810.
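The two MVD designs can be contrasted in a few lines. A sketch, assuming one reference picture precedes and the other follows the current picture (so MVD1 points in the direction opposite to MVD0) and that td0 and td1 are positive temporal distances; the function name is hypothetical.

```python
from typing import Tuple

def derive_mvd1(mvd0: Tuple[float, float], td0: float, td1: float,
                mirrored: bool) -> Tuple[float, float]:
    """MVD1 from MVD0 for the proportional or the mirrored design."""
    if mirrored:
        # Mirrored design: same magnitude, opposite direction, TDs ignored.
        return (-mvd0[0], -mvd0[1])
    # Proportional design: magnitude scaled by TD1 / TD0, opposite direction.
    ratio = td1 / td0
    return (-mvd0[0] * ratio, -mvd0[1] * ratio)
```

With TD1 equal to four times TD0, as in Figures 8A and 8B, the proportional design makes MVD1 four times longer than MVD0, whereas the mirrored design keeps the two equal in length.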

Figure 9 is a conceptual diagram illustrating an example 3×3 square search pattern in the search range [-8, 8]. Bilateral matching may include performing a local search around the initial MV0 and MV1 to derive the final MV0' and MV1'. To apply the local search, video decoder 300 may apply a 3×3 square search pattern to loop through the search range [-8, 8]. In each search iteration, the bilateral matching costs of the eight surrounding MVs in the search pattern are calculated and compared with the bilateral matching cost of the center MV. The MV with the minimum bilateral matching cost becomes the new center MV in the next search iteration. The local search terminates when the current center MV has the minimum cost within the 3×3 square search pattern, or when the local search reaches a predefined maximum number of search iterations. For example, video decoder 300 may perform bilateral matching as described herein. In the example of Figure 9, 3×3 search pattern 902 is searched around initial MV 900. In the first iteration, the MV with the lowest cost among the eight surrounding MVs is MV 904. In the second iteration, video decoder 300 then repeats search pattern 902 around MV 904. In this example, the MV finally selected after N iterations is MV 906.
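The local search described above can be sketched as follows. `cost` is a hypothetical callback standing in for the bilateral matching cost of a candidate, and the maximum iteration count is an assumed value, since the text only says that it is predefined.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]

def square_search(cost: Callable[[MV], float], init: MV,
                  search_range: int = 8, max_iter: int = 16) -> MV:
    """Iterative 3x3 square search within +/- search_range of init."""
    center, center_cost = init, cost(init)
    for _ in range(max_iter):
        best, best_cost = center, center_cost
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue  # the center cost is already known
                cand = (center[0] + dx, center[1] + dy)
                if (abs(cand[0] - init[0]) > search_range
                        or abs(cand[1] - init[1]) > search_range):
                    continue  # outside the search range
                c = cost(cand)
                if c < best_cost:
                    best, best_cost = cand, c
        if best == center:  # the center has the minimum cost: stop
            break
        center, center_cost = best, best_cost
    return center
```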

Figure 10 is a conceptual diagram illustrating example decoder-side motion vector refinement. To increase the accuracy of the MVs of merge mode, decoder-side motion vector refinement (DMVR) may be applied, as in VVC Draft 10. For example, video decoder 300 may apply DMVR. In a bi-prediction operation, a refined MV is searched around the initial MVs in reference picture list0 (L0) and reference picture list1 (L1). The DMVR method calculates the distortion between the two candidate blocks in L0 and L1. For example, video decoder 300 may calculate the distortion between the two candidate blocks. As shown in Figure 10, the SAD between blocks 1000 and 1002 is calculated for each MV candidate around the initial MV. For example, video decoder 300 may determine the SAD between blocks 1000 and 1002. The MV candidate with the lowest SAD becomes the refined MV and is used to generate the bi-predicted signal.

The refined MV derived by the DMVR technique is used to generate the inter prediction samples and is also used in temporal motion vector prediction for coding future pictures. Video decoder 300 may use the original MV in the deblocking process and in spatial motion vector prediction for coding future CUs.

DMVR in VVC Draft 10 is a subblock-based merge mode with a predefined maximum processing unit of 16×16 luma samples. When the width and/or height of a CU is larger than 16 luma samples, the CU may be further split into subblocks with a width and/or height equal to 16 luma samples. For example, video decoder 300 may split a larger CU into subblocks with a width and/or height equal to 16 luma samples.
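The subblock split is simple arithmetic. A sketch with a hypothetical function name, assuming CU dimensions that are multiples of the resulting subblock size:

```python
from typing import List, Tuple

def dmvr_subblocks(cu_w: int, cu_h: int, max_size: int = 16
                   ) -> List[Tuple[int, int, int, int]]:
    """Split a CU into DMVR processing subblocks as (x, y, width, height)."""
    sb_w, sb_h = min(cu_w, max_size), min(cu_h, max_size)
    return [(x, y, sb_w, sb_h)
            for y in range(0, cu_h, sb_h)    # rows of subblocks
            for x in range(0, cu_w, sb_w)]   # subblocks within a row
```

A 32×8 CU, for instance, yields two 16×8 subblocks, each refined independently.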

An example search scheme is discussed here. In DMVR, the search points around the initial MV and the MV offsets obey the MV difference mirroring rule discussed above. In other words, any candidate MV pair checked by video decoder 300 when implementing DMVR, derived from the initial pair (MV0, MV1), obeys the following two equations:

$$MV0' = MV0 + MV\_offset$$

$$MV1' = MV1 - MV\_offset$$

where MV_offset represents the refinement offset between the initial MV and the refined MV in one of the reference pictures. The refinement search range is two integer luma samples from the initial MV. The search includes an integer sample offset search stage and a fractional sample refinement stage.

For the integer sample offset search, a 25-point full search may be applied. For example, video decoder 300 may perform the 25-point full search. The SAD of the initial MV pair is calculated first. If the SAD of the initial MV pair is smaller than a threshold, the integer sample stage of DMVR is terminated. Otherwise, the SADs of the remaining 24 points are calculated and checked in raster scan order. The point with the smallest SAD is selected as the output of the integer sample offset search stage. To reduce the penalty of the uncertainty of DMVR refinement, the original MV may be favored during the DMVR process: the SAD between the reference blocks referred to by the initial MV candidates may be decreased by 1/4 of its value.
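The integer stage can be sketched as below. The `sad` callback and the `threshold` value are assumptions (the text does not specify the threshold), and the sketch applies the 1/4 bias to the initial pair's SAD after the early-termination check.

```python
from typing import Callable, Tuple

Offset = Tuple[int, int]

def dmvr_integer_search(sad: Callable[[Offset], int], threshold: int,
                        search_range: int = 2) -> Offset:
    """25-point full search over mirrored integer offsets in [-2, 2] x [-2, 2].

    sad(offset) is a hypothetical callback returning the SAD between the L0
    block at MV0 + offset and the L1 block at MV1 - offset.
    """
    center_sad = sad((0, 0))
    if center_sad < threshold:
        return (0, 0)  # early termination of the integer stage
    # Favor the original MV by reducing its SAD by 1/4.
    best, best_sad = (0, 0), center_sad - (center_sad >> 2)
    for dy in range(-search_range, search_range + 1):      # raster scan order
        for dx in range(-search_range, search_range + 1):
            if (dx, dy) == (0, 0):
                continue
            s = sad((dx, dy))
            if s < best_sad:
                best, best_sad = (dx, dy), s
    return best
```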

The integer sample search is followed by fractional sample refinement. For example, video decoder 300 may perform the integer sample search and then the fractional sample refinement. To reduce computational complexity, the fractional sample refinement may be derived using a parametric error surface equation instead of an additional search with SAD comparisons. The fractional sample refinement is conditionally invoked based on the output of the integer sample search stage: when the integer sample search stage terminates with the center having the smallest SAD in either the first or the second iteration of the search, the fractional sample refinement is further applied.

In parametric error surface-based sub-pixel offset estimation, the cost at the center position and the costs at the four positions neighboring the center are used to fit a two-dimensional (2-D) parabolic error surface equation of the following form:

$$E(x,y) = A\,(x - x_{\min})^2 + B\,(y - y_{\min})^2 + C$$

where (x_min, y_min) corresponds to the fractional position with the least cost, A and B are constants, and C corresponds to the minimum cost value. By solving the above equation using the cost values of the five search points, the position of the minimum (x_min, y_min) is computed as:

$$x_{\min} = \frac{E(-1,0) - E(1,0)}{2\left(E(-1,0) + E(1,0) - 2E(0,0)\right)}$$

$$y_{\min} = \frac{E(0,-1) - E(0,1)}{2\left(E(0,-1) + E(0,1) - 2E(0,0)\right)}$$

Because E(0,0) is the smallest of the integer cost values, x_min and y_min are automatically constrained to lie between -8 and 8 when expressed in units of 1/16 sample. This corresponds to a half-pel offset at the 1/16-pel MV accuracy of VVC Draft 10. To obtain the sub-pixel-accurate refinement delta MV, the computed fractional (x_min, y_min) is added to the integer-distance refinement MV.
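The closed-form solution can be sketched as below. For readability it uses floating-point division and rounding to 1/16-sample units, which is an assumption; a decoder would use integer arithmetic. The function name is hypothetical.

```python
from typing import Tuple

def error_surface_offset(e_c: int, e_l: int, e_r: int, e_t: int, e_b: int
                         ) -> Tuple[int, int]:
    """Sub-pel offset (x_min, y_min) in 1/16-sample units from five costs.

    e_c = E(0,0), e_l = E(-1,0), e_r = E(1,0), e_t = E(0,-1), e_b = E(0,1).
    Assumes e_c is the smallest cost, so each result lies in [-8, 8].
    """
    def axis(e_minus: int, e_plus: int) -> int:
        denom = 2 * (e_minus + e_plus - 2 * e_c)
        if denom == 0:  # flat along this axis: no fractional offset
            return 0
        return int(round(16 * (e_minus - e_plus) / denom))
    return axis(e_l, e_r), axis(e_t, e_b)
```

When a neighbor ties with the center, the offset reaches the half-sample limit of -8 or 8, which illustrates the constraint described above.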

Bilinear interpolation and sample padding are discussed here. These techniques may be applied by video decoder 300. In VVC Draft 10, the resolution of MVs is 1/16 luma sample. Samples at fractional positions may be interpolated using an 8-tap interpolation filter. In DMVR, the search points surround the initial fractional-pel MV with integer sample offsets, so the samples at those fractional positions need to be interpolated for the DMVR search. To reduce computational complexity, a bilinear interpolation filter is used to generate the fractional samples for the search in DMVR. Another benefit is that, by using the bilinear filter with a 2-sample search range, DMVR does not access more reference samples than the normal motion compensation process. After the refined MV is obtained with the DMVR search, the normal 8-tap interpolation filter is applied to generate the final prediction. To avoid accessing more reference samples than the normal motion compensation process, samples that are not needed for the interpolation process based on the original MV, but are needed for the interpolation process based on the refined MV, are padded from the available samples.
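A minimal sketch of a bilinear fetch with padding. It assumes 2-tap weights (16 − f, f) at 1/16-sample phase f with round-to-nearest after each pass, and it models padding as clamping coordinates to the available sample area. The intermediate precision of an actual decoder is not reproduced, and the function name is hypothetical.

```python
from typing import List

def bilinear_sample(ref: List[List[int]], x: int, y: int, fx: int, fy: int) -> int:
    """Bilinear sample at (x + fx/16, y + fy/16), with fx and fy in [0, 15].

    Coordinates outside the available area are clamped, so the nearest
    available sample is repeated (padding).
    """
    h, w = len(ref), len(ref[0])

    def px(xx: int, yy: int) -> int:
        return ref[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    def horiz(yy: int) -> int:
        # Horizontal pass: weights (16 - fx, fx), rounded, divided by 16.
        return (px(x, yy) * (16 - fx) + px(x + 1, yy) * fx + 8) >> 4

    top, bottom = horiz(y), horiz(y + 1)
    # Vertical pass on the two horizontally filtered values.
    return (top * (16 - fy) + bottom * fy + 8) >> 4
```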

Example enabling conditions for DMVR are discussed here. In one example, DMVR may be enabled if all of the following conditions are satisfied: 1) CU-level merge mode is used with a bi-prediction MV; 2) relative to the current picture, one reference picture is in the past and the other reference picture is in the future; 3) the distances (e.g., POC differences) from both reference pictures to the current picture are the same; 4) the CU has more than 64 luma samples; 5) both the CU height and the CU width are greater than or equal to 8 luma samples; 6) the bi-prediction with CU weights (BCW) weight index indicates equal weights; 7) weighted prediction (WP) is not enabled for the current block; and 8) combined inter-intra prediction (CIIP) mode is not used for the current block.
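The eight conditions can be expressed as a single predicate. The `CuInfo` fields are hypothetical names for the information each condition needs; conditions 2 and 3 are checked with POC values.

```python
from dataclasses import dataclass

@dataclass
class CuInfo:               # hypothetical container for the checked fields
    merge_bipred: bool      # CU-level merge mode with a bi-prediction MV
    poc_cur: int
    poc_ref0: int
    poc_ref1: int
    width: int
    height: int
    bcw_equal_weight: bool  # BCW weight index indicates equal weights
    wp_enabled: bool        # weighted prediction enabled for the block
    ciip: bool              # CIIP mode used for the block

def dmvr_enabled(cu: CuInfo) -> bool:
    d0 = cu.poc_cur - cu.poc_ref0
    d1 = cu.poc_cur - cu.poc_ref1
    opposite_sides = d0 * d1 < 0          # one past and one future reference
    same_distance = abs(d0) == abs(d1)    # equal POC distances
    return (cu.merge_bipred and opposite_sides and same_distance
            and cu.width * cu.height > 64
            and cu.width >= 8 and cu.height >= 8
            and cu.bcw_equal_weight and not cu.wp_enabled and not cu.ciip)
```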

Bidirectional optical flow is discussed here. Video decoder 300 may use bidirectional optical flow (BDOF) to refine the bi-prediction signal of the luma samples in a CU at the 4×4 subblock level. As its name indicates, BDOF mode is based on the optical flow concept, which assumes that the motion of an object is smooth. For each 4×4 subblock, a motion refinement (v_x, v_y) is calculated by minimizing the difference between the L0 and L1 prediction samples. The motion refinement is then used to adjust the bi-predicted sample values in the 4×4 subblock. The following steps are applied in the BDOF process.

First, the horizontal and vertical gradients of the two prediction signals, ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j), k = 0, 1, are computed by directly calculating the difference between two neighboring samples, e.g.,

$$\frac{\partial I^{(k)}}{\partial x}(i,j) = \left(I^{(k)}(i+1,j) \gg \mathit{shift1}\right) - \left(I^{(k)}(i-1,j) \gg \mathit{shift1}\right)$$

$$\frac{\partial I^{(k)}}{\partial y}(i,j) = \left(I^{(k)}(i,j+1) \gg \mathit{shift1}\right) - \left(I^{(k)}(i,j-1) \gg \mathit{shift1}\right)$$

where I^(k)(i,j) is the sample value at coordinate (i,j) of the prediction signal in list k, k = 0, 1, and shift1 is derived from the luma bit depth, bitDepth; here, shift1 is set equal to 6.
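The gradient computation can be sketched as follows, assuming the prediction signal is stored as `pred[row][col]`, so that i is the column and j is the row; the function name is hypothetical.

```python
from typing import List, Tuple

def gradients(pred: List[List[int]], i: int, j: int,
              shift1: int = 6) -> Tuple[int, int]:
    """Horizontal and vertical gradients at column i, row j."""
    # Difference of the right and left neighbors, each shifted by shift1.
    gx = (pred[j][i + 1] >> shift1) - (pred[j][i - 1] >> shift1)
    # Difference of the lower and upper neighbors, each shifted by shift1.
    gy = (pred[j + 1][i] >> shift1) - (pred[j - 1][i] >> shift1)
    return gx, gy
```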

Then, the auto-correlation and cross-correlation terms of the gradients, S₁, S₂, S₃, S₅, and S₆, are calculated as follows:

$$S_1 = \sum_{(i,j)\in\Omega} \left|\psi_x(i,j)\right|, \qquad S_3 = \sum_{(i,j)\in\Omega} \theta(i,j)\cdot\left(-\mathrm{sign}\left(\psi_x(i,j)\right)\right)$$

$$S_2 = \sum_{(i,j)\in\Omega} \psi_x(i,j)\cdot\mathrm{sign}\left(\psi_y(i,j)\right)$$

$$S_5 = \sum_{(i,j)\in\Omega} \left|\psi_y(i,j)\right|, \qquad S_6 = \sum_{(i,j)\in\Omega} \theta(i,j)\cdot\left(-\mathrm{sign}\left(\psi_y(i,j)\right)\right)$$

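where

$$\psi_x(i,j) = \left(\frac{\partial I^{(1)}}{\partial x}(i,j) + \frac{\partial I^{(0)}}{\partial x}(i,j)\right) \gg \mathit{shift3}$$

$$\psi_y(i,j) = \left(\frac{\partial I^{(1)}}{\partial y}(i,j) + \frac{\partial I^{(0)}}{\partial y}(i,j)\right) \gg \mathit{shift3}$$

$$\theta(i,j) = \left(I^{(0)}(i,j) \gg \mathit{shift2}\right) - \left(I^{(1)}(i,j) \gg \mathit{shift2}\right)$$

Ω is a 6×6 window around the 4×4 subblock, the value of shift2 is set equal to 4, and the value of shift3 is set equal to 1.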
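The correlation sums can be sketched over one 6×6 window. The definitions of ψ_x and ψ_y used here (the sum of the two lists' gradients right-shifted by shift3) follow the VVC-style BDOF formulation and are an assumption of this sketch; θ and the sums follow the equations above. The function name is hypothetical.

```python
from typing import List, Tuple

def sign(v: int) -> int:
    return (v > 0) - (v < 0)

def bdof_correlations(I0: List[List[int]], I1: List[List[int]], x0: int, y0: int,
                      shift1: int = 6, shift2: int = 4, shift3: int = 1
                      ) -> Tuple[int, int, int, int, int]:
    """S1, S2, S3, S5, S6 over the 6x6 window around the 4x4 subblock whose
    top-left sample is (x0, y0). Arrays are indexed [row][col] and must cover
    one extra sample around the window for the gradients."""
    s1 = s2 = s3 = s5 = s6 = 0
    for j in range(y0 - 1, y0 + 5):        # 6 rows of the window
        for i in range(x0 - 1, x0 + 5):    # 6 columns of the window
            gx0 = (I0[j][i + 1] >> shift1) - (I0[j][i - 1] >> shift1)
            gy0 = (I0[j + 1][i] >> shift1) - (I0[j - 1][i] >> shift1)
            gx1 = (I1[j][i + 1] >> shift1) - (I1[j][i - 1] >> shift1)
            gy1 = (I1[j + 1][i] >> shift1) - (I1[j - 1][i] >> shift1)
            psi_x = (gx1 + gx0) >> shift3  # assumed VVC-style definition
            psi_y = (gy1 + gy0) >> shift3
            theta = (I0[j][i] >> shift2) - (I1[j][i] >> shift2)
            s1 += abs(psi_x)
            s2 += psi_x * sign(psi_y)
            s3 += theta * -sign(psi_x)
            s5 += abs(psi_y)
            s6 += theta * -sign(psi_y)
    return s1, s2, s3, s5, s6
```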

The motion refinement (v_x, v_y) is then derived from the cross-correlation and auto-correlation terms as follows:

[Equation image not available]

where ⌊·⌋ is the floor function.

Based on the motion refinement and the gradients, the following adjustment b(x,y) is calculated for each sample in the 4×4 subblock:

[Equation image not available]

Finally, the BDOF samples of the CU are calculated by adjusting the bi-prediction samples as follows:

$$\mathrm{pred}_{\mathrm{BDOF}}(x,y) = \left(I^{(0)}(x,y) + I^{(1)}(x,y) + b(x,y) + o_{\mathrm{offset}}\right) \gg \mathit{shift5}$$

where shift5 is set equal to Max(3, 15 - BitDepth), and the variable o_offset is set equal to (1 << (shift5 - 1)).
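The final combination step is a single rounding shift. A sketch, assuming the adjustment b(x,y) has already been computed (its equation is not reproduced here) and omitting any clipping to the output sample range; the function name is hypothetical.

```python
def bdof_sample(i0: int, i1: int, b: int, bit_depth: int = 10) -> int:
    """pred_BDOF = (I0 + I1 + b + o_offset) >> shift5."""
    shift5 = max(3, 15 - bit_depth)
    o_offset = 1 << (shift5 - 1)  # rounding offset: half of 2**shift5
    return (i0 + i1 + b + o_offset) >> shift5
```

For 10-bit video, shift5 is 5 and o_offset is 16.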

These values are chosen so that the multipliers in the BDOF process do not exceed 15 bits and the maximum bit width of the intermediate parameters in the BDOF process is kept within 32 bits.

Figure 11 is a conceptual diagram illustrating an example extended CU region used in BDOF. To derive the gradient values, some prediction samples I^(k)(i,j) in list k (k = 0, 1) outside of the current CU boundaries may need to be generated. As shown in Figure 11, BDOF uses one extended row/column around the boundaries of CU 1100. To control the computational complexity of generating the out-of-boundary prediction samples, the prediction samples in the extended area (e.g., the outermost positions) are generated by taking the reference samples at nearby integer positions directly, without interpolation (using the floor() operation on the coordinates), while the normal 8-tap motion compensation interpolation filter is used to generate the prediction samples within CU 1100 (e.g., the shaded or patterned positions within CU 1100). These extended sample values may be used in gradient calculation only. For the remaining steps in the BDOF process, if any sample and gradient values outside of the boundaries of CU 1100 are needed, they may be padded (i.e., repeated) from their nearest neighbors.
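The extended-area construction can be sketched as below. `interp` is a hypothetical stand-in for the regular 8-tap interpolation of interior samples. MVs are assumed to be in 1/16-sample units, reference coordinates are assumed to lie inside `ref`, and any precision alignment between border and interior samples is omitted.

```python
from typing import Callable, List

def bdof_extended_pred(ref: List[List[int]], x0: int, y0: int, w: int, h: int,
                       mv_x: int, mv_y: int,
                       interp: Callable[[int, int], int]) -> List[List[int]]:
    """(h + 2) x (w + 2) prediction block for one list.

    Interior positions use interp(x, y); the one-sample border copies the
    reference sample at the floor()ed integer position, without interpolation.
    """
    out = []
    for j in range(-1, h + 1):
        row = []
        for i in range(-1, w + 1):
            if 0 <= i < w and 0 <= j < h:
                row.append(interp(x0 + i, y0 + j))
            else:
                # ">> 4" floors negative values too, i.e. floor(mv / 16).
                rx = x0 + i + (mv_x >> 4)
                ry = y0 + j + (mv_y >> 4)
                row.append(ref[ry][rx])
        out.append(row)
    return out
```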

BDOF is used to refine the bi-prediction signal of a CU at the 4×4 subblock level (e.g., subblock 1102). In one example, BDOF may be applied to a CU if the CU satisfies all of the following conditions: 1) the CU is coded using a "true" bi-prediction mode, i.e., one of the two reference pictures precedes the current picture in display order and the other follows it; 2) the CU is not coded using affine mode or ATMVP merge mode; 3) the CU has more than 64 luma samples; 4) both the CU height and the CU width are greater than or equal to 8 luma samples; 5) the BCW weight index indicates equal weights; 6) WP is not enabled for the current CU; and 7) CIIP mode is not used for the current CU.
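As with DMVR, these conditions reduce to a single predicate. A sketch with hypothetical parameter names; note that, unlike the DMVR conditions, equal POC distances to the two reference pictures are not required.

```python
def bdof_applicable(poc_cur: int, poc_ref0: int, poc_ref1: int,
                    affine_or_atmvp: bool, width: int, height: int,
                    bcw_equal_weight: bool, wp_enabled: bool,
                    ciip: bool) -> bool:
    # "True" bi-prediction: the references lie on opposite sides in display order.
    true_bipred = (poc_cur - poc_ref0) * (poc_cur - poc_ref1) < 0
    return (true_bipred and not affine_or_atmvp
            and width * height > 64 and width >= 8 and height >= 8
            and bcw_equal_weight and not wp_enabled and not ciip)
```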

In VVC Draft 10, DMVR is subblock-based, with subblocks of at most 16×16 luma samples. The refined MV of each subblock has a delta MV (Δhor, Δver) from the original MV, where Δhor and Δver are the motion vector offsets in the horizontal and vertical directions, respectively. The value range of Δhor and Δver is determined by the DMVR search range, which in VVC Draft 10 is [-2, 2]. Therefore, the refined motion vector has an offset of at most ±2 pel from the original MV in both the horizontal and vertical directions.

A delta MV range of ±2 pel may be too small for some blocks. For a block whose best delta MV lies outside this range, the video decoder 300 cannot derive the optimal refined MV using DMVR with that range.

The delta MV range can be widened by enlarging the DMVR search range. For example, the DMVR search range may be enlarged to [-8, 8], in which case the refined motion vector may differ from the original MV by up to ±8 pel in both the horizontal and vertical directions.

However, enlarging the search range increases the complexity of the DMVR process. For example, with a fixed search range of [-8, 8], the video decoder 300 needs to perform more than 11 times as many DMVR searches for a DMVR-coded block as with a search range of [-2, 2]. In addition, a subset of the subblocks in a DMVR-coded block may have similar refined MVs. The subblock-based DMVR process nevertheless performs MV refinement for every subblock, even when the MVs derived for that subset could be similar or identical. Conversely, one sub-area of a subblock may have a different optimal refined MV than other sub-areas of the same subblock. Because DMVR in VVC Draft 10 operates on 16×16 luma-sample subblocks, the video decoder 300 cannot derive different refined MVs for, for example, 8×8 or 4×4 sub-areas within a 16×16 subblock.
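The "more than 11 times" figure follows from counting integer search positions, assuming an exhaustive search over a square range [-r, r] in both directions:

```python
def search_points(r: int) -> int:
    """Integer positions in a full search over [-r, r] x [-r, r]."""
    return (2 * r + 1) ** 2

small, large = search_points(2), search_points(8)
print(small, large, large / small)  # 25 289 11.56
```

Early termination or sparse search patterns would reduce both counts. The ratio applies to full search only.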

Techniques that may improve the DMVR process are disclosed herein.

Example 1. In this example, the refined motion vectors of the subblocks within a W×H coding block are derived by a multi-pass decoder-side motion vector refinement (multi-pass DMVR) process. A predetermined number N may represent the total number of passes of the multi-pass DMVR technique. The video decoder 300 may use these multi-pass DMVR techniques.

Figure 12 is a conceptual diagram illustrating an example three-pass DMVR technique. In this example, a 32×16 coding block 1200 starts with an initial MV, MV_org. The first pass may be block-based, so it may use the entire block 1200, such as the current PU or CU. A video decoder 300 implementing the first pass 1202 may generate a refined MV, MV_pass1. The second pass may be subblock-based. In this example, the video decoder 300 may divide block 1200 into two 16×16 subblocks, subblocks 1204A and 1204B. A video decoder 300 implementing the second pass may generate a refined MV for each of subblock 1204A (MV_(pass2,0)) and subblock 1204B (MV_(pass2,1)). For the third pass, the video decoder 300 may divide block 1200 into eight 8×8 subblocks, subblocks 1208A through 1208H. As shown, a video decoder 300 implementing the third pass may generate a refined MV for each of subblocks 1208A through 1208H.

For example, the video decoder 300 may apply multi-pass DMVR to a motion vector for a block of video data (e.g., block 1200) to determine a refined motion vector, and may decode the block based on the refined motion vector. The multi-pass DMVR may include three passes:
1) a first pass that is block-based and is applied to the block of video data (the first-pass block);
2) a second pass that is subblock-based and is applied to at least one second-pass subblock of the block of video data, where the width of the second-pass subblock is less than or equal to the width of the first-pass block and the height of the second-pass subblock is less than or equal to the height of the first-pass block; and
3) a third pass that is subblock-based and is applied to at least one third-pass subblock, where the width of the third-pass subblock is less than or equal to the width of the second-pass subblock and the height of the third-pass subblock is less than or equal to the height of the second-pass subblock.

The multi-pass DMVR technique starts with the original motion vector MV_org of a W×H coding block, which may be a PU or a CU. The first pass may be block-based and may derive a refined motion vector MV_pass1 for the entire W×H coding block. MV_pass1 may be stored and used as the initial motion vector for subsequent passes.

The second pass may be subblock-based, e.g., based on one or more subblocks of the W×H coding block. A subblock in the second pass (second-pass SB) may have a predetermined maximum size sbW_1×sbH_1. The W×H coding block may be divided into K1 second-pass SBs, where K1 ≥ 1. Each second-pass SB may have a size M1×N1, where M1 ≤ W and N1 ≤ H. Each second-pass SB may have the initial motion vector MV_pass1 (i.e., the MV derived in the first pass). The second pass may derive a refined motion vector MV_(pass2,i) for each second-pass SB, where i is the index of the second-pass SB and 0 ≤ i ≤ K1-1. MV_(pass2,i) may be stored and used as the initial motion vector for subsequent passes.

The third pass may be subblock-based, e.g., based on one or more subblocks of each second-pass subblock. A subblock in the third pass (third-pass SB) has a predetermined maximum size sbW_2×sbH_2, where sbW_2 ≤ sbW_1 and sbH_2 ≤ sbH_1. Each i-th second-pass SB may be divided into K2 third-pass SBs, where K2 ≥ 1, so the total number of third-pass SBs in the W×H coding block may be K2×K1. Each third-pass SB may have a size M2×N2, where M2 ≤ sbW_1 and N2 ≤ sbH_1. Each third-pass SB within the i-th second-pass SB may have the initial motion vector MV_(pass2,i) (i.e., the MV derived in the second pass). The third pass derives a refined motion vector MV_(pass3,j) for each third-pass SB, where j is the index of the third-pass SB and 0 ≤ j ≤ K2×K1-1. MV_(pass3,j) may be stored and used as the initial motion vector for subsequent passes.

In some examples, the multi-pass DMVR technique continues up to a P-th pass. The video decoder 300 performing the MV refinement may derive MV_(passP,i) for each subblock in the P-th pass (P-th-pass SB), where i is the index of the P-th-pass SB within the W×H coding block. MV_(passP,i) is the refined MV of the i-th subblock and may be stored and used to derive the prediction block of the current coding block.
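The hierarchy of Example 1 can be sketched as a loop. Each pass splits every area left by the previous pass into subblocks no larger than that pass's maximum size. Each subblock then starts from the MV of the area it was split from. This sketch relies on the following assumptions:
- the per-pass refinement is a caller-supplied function that stands in for bilateral matching, BDOF, or another method;
- MVs are integer pairs;
- subblock sizes divide the parent sizes evenly, as power-of-two sizes do.

`multi_pass_dmvr` and `Refine` are hypothetical names. Setting the first pass's maximum size to W×H makes that pass block-based.

```python
from typing import Callable, List, Tuple

MV = Tuple[int, int]
Area = Tuple[int, int, int, int, MV]          # (x, y, w, h, mv)
Refine = Callable[[int, int, int, int, MV], MV]

def multi_pass_dmvr(W: int, H: int, mv_org: MV,
                    passes: List[Tuple[int, int, Refine]]) -> List[Area]:
    """Run the passes in order. Each pass splits every area left by the
    previous pass into subblocks of at most sb_w x sb_h and refines each one,
    starting from the MV of the area it was split from."""
    areas: List[Area] = [(0, 0, W, H, mv_org)]
    for sb_w, sb_h, refine in passes:
        nxt: List[Area] = []
        for ax, ay, aw, ah, mv_init in areas:
            w, h = min(sb_w, aw), min(sb_h, ah)
            for y in range(ay, ay + ah, h):
                for x in range(ax, ax + aw, w):
                    nxt.append((x, y, w, h, refine(x, y, w, h, mv_init)))
        # index this pass's subblocks in raster order over the whole block
        areas = sorted(nxt, key=lambda a: (a[1], a[0]))
    return areas

def nudge(x: int, y: int, w: int, h: int, mv: MV) -> MV:
    # toy refinement: move the MV one unit to the right per pass
    return (mv[0] + 1, mv[1])

# Figure 12 layout: 32x16 block, passes of 32x16, 16x16 and 8x8
result = multi_pass_dmvr(32, 16, (0, 0), [(32, 16, nudge), (16, 16, nudge), (8, 8, nudge)])
print(len(result), result[0])  # 8 (0, 0, 8, 8, (3, 0))
```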

Example 2. In the technique of Example 1, when both the p-th pass and the preceding ((p-1)-th) pass of the DMVR technique are subblock-based, the size of a p-th-pass subblock may be less than or equal to the size of a subblock in the preceding pass.

In the technique of Example 1, the value range of the delta motion vector MV(Δhor, Δver) in the p-th pass may be predetermined, e.g., minDeltaHorPassP ≤ Δhor ≤ maxDeltaHorPassP and minDeltaVerPassP ≤ Δver ≤ maxDeltaVerPassP. When the p-th pass is not the first pass (i.e., p > 1), the value range of Δhor and Δver in the p-th pass may be no larger than the value range in the preceding pass, e.g., minDeltaHorPassP ≥ minDeltaHorPass(P-1), maxDeltaHorPassP ≤ maxDeltaHorPass(P-1), minDeltaVerPassP ≥ minDeltaVerPass(P-1), and maxDeltaVerPassP ≤ maxDeltaVerPass(P-1). Because the p-th pass can start from the refined motion vector of the preceding pass, the overall value range of the delta (final refined) motion vector is extended compared to single-pass DMVR.
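The nesting constraint and its effect on the overall range can be checked with a short sketch. It assumes that every pass runs and that each pass's offset adds to the MV left by the previous pass. Under those assumptions the reachable total offset is the sum of the per-pass bounds. The example uses the horizontal ranges of Example 8, and the function names are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (minDelta, maxDelta) for one direction, in pel

def is_nested(ranges: List[Range]) -> bool:
    """True if each pass's range lies within the preceding pass's range."""
    return all(lo >= prev_lo and hi <= prev_hi
               for (prev_lo, prev_hi), (lo, hi) in zip(ranges, ranges[1:]))

def overall_range(ranges: List[Range]) -> Range:
    """Widest total offset from the original MV when every pass runs,
    since each pass starts from the MV refined by the pass before it."""
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)

hor = [(-8, 8), (-8, 8), (-2, 2)]
print(is_nested(hor), overall_range(hor))  # True (-18, 18)
```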

In the technique of Example 1, when the video decoder 300 decides to divide the current coding block into K subblocks, the subblocks may be ordered in raster-scan order from the top-left to the bottom-right of the current coding block.
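The raster-scan order can be generated with the index formula i = (sbY / sbH) × (W / sbW) + (sbX / sbW), which the detailed procedure later in this description also uses. The sketch below measures sbX and sbY relative to the block's top-left corner, which is an assumption. `subblock_raster` is a hypothetical name. The wrap-around tests the old sbX before either coordinate changes.

```python
from typing import Iterator, Tuple

def subblock_raster(W: int, H: int, sb_w: int, sb_h: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, sbX, sbY) for the subblocks of a W x H block, top-left to
    bottom-right, with positions relative to the block's top-left corner."""
    sbX = sbY = 0
    while sbY < H:
        i = (sbY // sb_h) * (W // sb_w) + (sbX // sb_w)
        yield i, sbX, sbY
        # advance to the next column, or wrap to the start of the next row
        if sbX + sb_w < W:
            sbX += sb_w
        else:
            sbX, sbY = 0, sbY + sb_h

print(list(subblock_raster(32, 32, 16, 16)))
# [(0, 0, 0), (1, 16, 0), (2, 0, 16), (3, 16, 16)]
```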

Example 3 - Skipping the p-th pass of the DMVR technique. In the technique of Example 1, a predetermined number N may represent the total number of passes of the multi-pass DMVR technique. A video decoder 300 implementing the multi-pass DMVR technique may skip one or more passes when deriving the final refined MV. In other words, the video decoder 300 may derive the final refined motion vector by applying only a subset of the passes of the multi-pass DMVR technique. Skipping the p-th pass of the DMVR technique may reduce the complexity of the video decoder 300.

Whether to skip the p-th pass of the DMVR technique may be decided based on the results of the preceding passes. For example, if the preceding passes derive a refined motion vector that is already close to optimal, the p-th pass may be skipped.

For example, the video decoder 300 may apply a shortened multi-pass DMVR to the motion vector for a block. The video decoder 300 may determine to skip a given pass of the multi-pass DMVR for the block and, based on that determination, skip the given pass. The determination may be based on the result of the preceding pass, e.g., when the refined MV of the preceding pass is already close to optimal. Further refinement might, for example, be unlikely to change the MV at the MV (sub-pel) resolution, or its cost might outweigh its benefit.
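One way to make the skip decision concrete is a cost threshold proportional to block area. The detailed procedure later in this description uses this form, with costThPass2 = thFactorPass2 × W × H (e.g., thFactorPass2 = 1). A minimal sketch, where the function name is hypothetical:

```python
def skip_next_pass(min_cost_prev: float, w: int, h: int,
                   th_factor: float = 1.0) -> bool:
    """Skip the next pass when the best bilateral cost of the previous pass
    is below th_factor * w * h, i.e. a small average cost per sample."""
    return min_cost_prev < th_factor * w * h

print(skip_next_pass(100, 16, 16))  # True: 100 < 256
print(skip_next_pass(300, 16, 16))  # False: 300 >= 256
```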

Example 4 - Subblock-based first-pass DMVR technique. In some hardware designs, the maximum block size for the motion compensation process may be constrained, so a larger coding block may be divided into multiple subblocks for hardware processing. In some examples, the multi-pass DMVR technique may start with a subblock size of min{P,W}×min{Q,H} for the first pass, where P and Q are predetermined integer values determined by the hardware constraints.
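A sketch of the first-pass split, assuming P and Q divide W and H when they are smaller. The function name is hypothetical, and P = Q = 64 in the example is an illustrative value, not one given in this description.

```python
from typing import List, Tuple

def first_pass_subblocks(W: int, H: int, P: int, Q: int) -> List[Tuple[int, int, int, int]]:
    """Split a W x H block into first-pass subblocks of min(P, W) x min(Q, H),
    returned as (x, y, w, h) in raster order."""
    sw, sh = min(P, W), min(Q, H)
    return [(x, y, sw, sh) for y in range(0, H, sh) for x in range(0, W, sw)]

print(first_pass_subblocks(128, 32, 64, 64))  # [(0, 0, 64, 32), (64, 0, 64, 32)]
```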

In Examples 1 and 3, the first pass of the DMVR technique may be block-based. When the multi-pass DMVR technique instead starts with a subblock-based pass, that first pass may be referred to as a subblock-based first-pass DMVR technique or a skip-first-pass DMVR technique. The video decoder 300 may apply the subblock-based first-pass DMVR technique.

Example 5 - Skipping the p-th pass of the DMVR technique for a sub-area of a coding block. For a W×H coding block as in Examples 1 and 3, a predetermined number N may represent the total number of passes of the multi-pass DMVR technique. The video decoder 300 may derive the refined motion vectors for one sub-area of the coding block by applying N passes of the DMVR technique. It may derive the refined motion vectors for a different sub-area of the coding block by applying M passes, where M < N. In other words, the video decoder 300 may skip one or more passes when deriving the final refined motion vector for a given sub-area of the coding block. A sub-area may include one or more subblocks of the coding block.

For example, the video decoder 300 may apply a shortened multi-pass DMVR to the motion vector of a block. The video decoder 300 may determine to skip a given subblock-based pass of the multi-pass DMVR for a particular sub-area of the block that includes one or more subblocks (e.g., the different sub-area of the coding block mentioned in the preceding paragraph). Based on that determination, it may skip the given subblock-based pass for that sub-area. The determination may be based on the result of the preceding pass, e.g., when the refined MV of the preceding pass is already close to optimal. Further refinement might, for example, be unlikely to change the MV at the MV (sub-pel) precision, or its cost might outweigh its benefit.
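Applied per sub-area, the same kind of cost test selects which subblocks run the next subblock-based pass. Each remaining subblock keeps the MV from the previous pass. The sketch assumes a per-subblock best bilateral cost from the previous pass and a threshold of th_factor × w × h. Both the threshold form and the name `subareas_to_refine` are hypothetical.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]  # top-left (x, y) of a w x h subblock

def subareas_to_refine(costs: Dict[Pos, float], w: int, h: int,
                       th_factor: float = 1.0) -> List[Pos]:
    """Return, in raster order, the subblocks that still run the next
    subblock-based pass; the others keep the MV from the previous pass."""
    threshold = th_factor * w * h
    return sorted((pos for pos, cost in costs.items() if cost >= threshold),
                  key=lambda p: (p[1], p[0]))

costs = {(0, 0): 50, (16, 0): 400, (0, 16): 256, (16, 16): 10}
print(subareas_to_refine(costs, 16, 16))  # [(16, 0), (0, 16)]
```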

Example 6 - Deriving the refined motion vector in the p-th pass of DMVR. This example describes several decoder-side motion vector refinement techniques. In the multi-pass DMVR technique, the video decoder 300 may apply bilateral-matching-based motion vector refinement, described below, in at least one pass. It may also, or instead, apply BDOF-based motion vector refinement in at least one pass. In other words, at least one pass of the multi-pass DMVR may include applying BDOF, and/or at least one pass may include applying bilateral matching. In one example, the first pass includes applying bilateral matching, the second pass includes applying bilateral matching, and the third pass includes applying BDOF.

Figure 13 is a conceptual diagram illustrating an example of BDOF motion vector refinement. Deriving a refined motion vector by bi-directional optical flow is described here. In this example, the video decoder 300 may derive the refined motion vector in the p-th pass of the DMVR technique by using bi-directional optical flow (BDOF). The BDOF MV refinement may be as follows:
Mv0' = Mv0 + bioMv
Mv1' = Mv1 - bioMv
Here, Mv0 and Mv1 are the initial MVs of the current block/subblock at the start of the p-th pass, in reference picture 0 1300 and reference picture 1 1302, respectively. Mv0' and Mv1' are the BDOF-refined MVs of the current block in reference picture 0 1300 and reference picture 1 1302, respectively. bioMv is the BDOF delta MV.

In the BDOF MV refinement process, bioMv(Δhor, Δver) may be derived by the following steps.
1) As discussed above, derive the horizontal and vertical gradients ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j), k = 0, 1, from the prediction signals predSig0 and predSig1.
2) As discussed above, derive the auto-correlations and cross-correlations of the gradients, S1, S2, S3, S5, and S6, from the derived horizontal and vertical gradients and the prediction signals predSig0 and predSig1.
3) Derive the two parameters v_x and v_y from S1, S2, S3, S5, and S6, where m is a predetermined value, e.g., m = 3.
4) Derive the delta MV bioMv(Δhor, Δver) as follows:
Δhor = Clip3(minDeltaHorPass3, maxDeltaHorPass3, (v_x + 2^(n-1)) >> n)
Δver = Clip3(minDeltaVerPass3, maxDeltaVerPass3, (v_y + 2^(n-1)) >> n)
where:
n is a predetermined value, e.g., n = 3.
minDeltaHorPass3 is a predetermined value, e.g., minDeltaHorPass3 = -2.
maxDeltaHorPass3 is a predetermined value, e.g., maxDeltaHorPass3 = 2.
minDeltaVerPass3 is a predetermined value, e.g., minDeltaVerPass3 = -2.
maxDeltaVerPass3 is a predetermined value, e.g., maxDeltaVerPass3 = 2.

The following describes how the video decoder 300 derives a refined motion vector by bilateral matching. In the p-th pass, bilateral matching searches around the two initial motion vectors MV0 and MV1 within predetermined local search areas in reference picture 0 and reference picture 1, respectively. The final MV0' and MV1' are derived based on the minimum bilateral matching cost.

The local search area of bilateral matching for a coding block has a horizontal search range, e.g., [sMinHor, sMaxHor], and a vertical search range, e.g., [sMinVer, sMaxVer]. The size of this local search area may be (sMaxHor - sMinHor + 1) × (sMaxVer - sMinVer + 1).

As in Example 2, when the delta motion vector MV(Δhor, Δver) in the p-th pass has a predetermined value range, the search range may be determined by the value range of the delta motion vector in the p-th pass of the DMVR technique, as follows:
sMinHor ≥ minDeltaHorPassP
sMaxHor ≤ maxDeltaHorPassP
sMinVer ≥ minDeltaVerPassP
sMaxVer ≤ maxDeltaVerPassP
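A bilateral-matching search over this area can be sketched as an integer-pel full search. The delta is added to MV0 and subtracted from MV1, and candidates are scored by sum of absolute differences (SAD). SAD is an assumed cost here, since this section does not specify the bilateral matching cost. The sketch also assumes the reference arrays already include enough margin for the search. As in the detailed procedure later in this description, the minimum cost starts at a maximum value and only a strictly smaller cost replaces the best delta. `sad` and `bilateral_search` are hypothetical names.

```python
import math
from typing import List, Tuple

Plane = List[List[int]]

def sad(ref0: Plane, ref1: Plane, x0: int, y0: int,
        x1: int, y1: int, w: int, h: int) -> int:
    """Sum of absolute differences between two w x h windows."""
    return sum(abs(ref0[y0 + j][x0 + i] - ref1[y1 + j][x1 + i])
               for j in range(h) for i in range(w))

def bilateral_search(ref0: Plane, ref1: Plane, pos0: Tuple[int, int],
                     pos1: Tuple[int, int], w: int, h: int,
                     s_hor: int, s_ver: int) -> Tuple[Tuple[int, int], float]:
    """pos0/pos1: block top-left displaced by the initial MV0/MV1.
    MV0 moves by +delta and MV1 by -delta; ties keep the earliest delta."""
    best_cost: float = math.inf        # "maximum cost value"
    best = (0, 0)
    for dv in range(-s_ver, s_ver + 1):
        for dh in range(-s_hor, s_hor + 1):
            cost = sad(ref0, ref1, pos0[0] + dh, pos0[1] + dv,
                       pos1[0] - dh, pos1[1] - dv, w, h)
            if cost < best_cost:
                best_cost, best = cost, (dh, dv)
    return best, best_cost

# Toy check: the same texture displaced in opposite directions in the two
# references, so the cost is zero only at delta (1, -1).
def texture(x: int, y: int) -> int:
    return (3 * x * x + 5 * y * y + x * y) % 251

ref0 = [[texture(x - 1, y + 1) for x in range(16)] for y in range(16)]
ref1 = [[texture(x + 1, y - 1) for x in range(16)] for y in range(16)]
print(bilateral_search(ref0, ref1, (5, 5), (5, 5), 4, 4, 2, 2))  # ((1, -1), 0)
```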

Further decoder-side motion vector refinement methods are described here. The refined motion vector may be derived by an alternative decoder-side motion vector derivation technique, e.g., template matching or decoder-side motion vector derivation (DMVD). A video decoder 300 implementing the p-th pass of the multi-pass DMVR technique may use one of the motion vector refinement methods described in this disclosure. However, the details of the DMVR technique may differ from those described in this document and still be within the scope of this disclosure.

Example 7 - Deriving the prediction signal in the p-th pass of the DMVR technique by applying an interpolation filter or by using the prediction signal of a preceding pass. As in Example 6, the motion vector refinement technique in the p-th pass starts with the initial motion vectors of the p-th pass and the prediction signals in the reference pictures. The prediction signal in a reference picture may be derived by applying an interpolation filter with the initial motion vector information for that reference picture.

In this example, the video decoder 300 may:
1) derive the prediction signal in the p-th pass of the DMVR technique by applying an interpolation filter, where the interpolation filter may be determined by the MV refinement technique of the p-th pass (e.g., bilateral matching, BDOF, etc.); and/or
2) derive the prediction signal in the p-th pass of the DMVR technique by using the prediction signal of a preceding pass.

The following describes how the video decoder 300 derives the prediction signal in the p-th pass of the DMVR technique by applying an interpolation filter. In DMVR techniques for deriving refined motion vectors, such as bilateral matching, a simplified interpolation filter may be used to generate the motion-compensated samples for the search. For example, a bilinear interpolation filter may be used to generate the non-integer samples for the search process in bilateral matching or DMVR.
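A generic integer bilinear interpolator illustrates the kind of simplified filter meant here. It is a sketch, not the filter of any particular codec. It assumes 1/16-pel fractional offsets (prec = 4), rounds once at the end, and reads the right and lower neighbors, so the reference must be padded. `bilinear` is a hypothetical name.

```python
from typing import List

Plane = List[List[int]]

def bilinear(ref: Plane, x_int: int, y_int: int, fx: int, fy: int,
             prec: int = 4) -> int:
    """Sample at (x_int + fx / 2^prec, y_int + fy / 2^prec) with a 2x2
    bilinear kernel in integer arithmetic and a single final rounding."""
    one = 1 << prec
    a, b = ref[y_int][x_int], ref[y_int][x_int + 1]
    c, d = ref[y_int + 1][x_int], ref[y_int + 1][x_int + 1]
    top = a * (one - fx) + b * fx            # horizontal pass, row y_int
    bottom = c * (one - fx) + d * fx         # horizontal pass, row y_int + 1
    total = top * (one - fy) + bottom * fy   # vertical pass
    return (total + (1 << (2 * prec - 1))) >> (2 * prec)

ref = [[0, 16], [32, 48]]
print(bilinear(ref, 0, 0, 0, 0), bilinear(ref, 0, 0, 8, 0), bilinear(ref, 0, 0, 8, 8))  # 0 8 24
```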

In some examples, when a BDOF-based technique is applied to derive the refined motion vector in the p-th pass, as in Example 6, the input may be samples generated by motion compensation using the original (non-simplified) interpolation filter.

In other examples, when a BDOF-based technique is applied to derive the refined motion vector in the p-th pass, as in Example 6, the input may be samples generated by motion compensation using a simplified interpolation filter, such as a bilinear interpolation filter.

The following describes how the video decoder 300 derives the prediction signal in the p-th pass of the DMVR technique by using the prediction signal of a preceding pass. In one example, the video decoder 300 may determine whether to use the prediction signal of the preceding pass by checking the precision of the delta motion vector in the preceding pass. For example:
1) when the delta motion vector in the preceding pass has integer-pel precision, the prediction signal in the p-th pass of the DMVR technique may be derived by using the prediction signal of the preceding pass; and
2) when the refined motion vector in the preceding pass is identical to the initial motion vector in the preceding pass, the prediction signal in the p-th pass of the DMVR technique may be derived by using the prediction signal of the preceding pass.
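The two reuse conditions can be expressed as one check. The sketch assumes that MVs and deltas are stored in 1/16-pel units (frac_bits = 4), so a delta is integer-pel when its low four bits are zero. Python's `&` gives the same result for negative values in two's complement. `reuse_previous_prediction` is a hypothetical name.

```python
from typing import Tuple

MV = Tuple[int, int]

def reuse_previous_prediction(delta_prev: MV, mv_init_prev: MV,
                              mv_refined_prev: MV, frac_bits: int = 4) -> bool:
    """True if pass p may reuse pass p-1's prediction signal: the previous
    delta is integer-pel, or the previous pass left the MV unchanged."""
    mask = (1 << frac_bits) - 1
    integer_pel = (delta_prev[0] & mask) == 0 and (delta_prev[1] & mask) == 0
    unchanged = mv_refined_prev == mv_init_prev
    return integer_pel or unchanged

print(reuse_previous_prediction((32, -16), (0, 0), (32, -16)))  # True: whole-pel delta
print(reuse_previous_prediction((8, 0), (0, 0), (8, 0)))        # False: half-pel delta
```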

Example 8 - Example of three-pass decoder-side motion refinement. In this example, the video decoder 300 uses a three-pass decoder-side motion refinement technique, which includes the following three passes:
1) The first pass is block-based. The refined motion vector is derived by applying bilateral-matching-based motion vector refinement. The delta motion value range is, e.g., [-8, 8] in the horizontal direction and [-8, 8] in the vertical direction.
2) The second pass is subblock-based. The refined motion vector is derived by applying bilateral-matching-based motion vector refinement. The maximum subblock size is, e.g., 16×16 luma samples; that is, a second-pass subblock has a predetermined maximum width of 16 luma samples and a predetermined maximum height of 16 luma samples. The delta motion value range is, e.g., [-8, 8] in the horizontal direction and [-8, 8] in the vertical direction.
3) The third pass is subblock-based. The refined motion vector is derived by applying BDOF-based motion vector refinement. The maximum subblock size is, e.g., 8×8 luma samples; that is, a third-pass subblock has a predetermined maximum width of 8 luma samples and a predetermined maximum height of 8 luma samples. The delta motion value range is, e.g., [-2, 2] in the horizontal direction and [-2, 2] in the vertical direction.

For example, the delta motion value range for at least one of the first pass or the second pass may be [-8, 8] horizontally and [-8, 8] vertically, and the delta motion value range for the third pass may be [-2, 2] horizontally and [-2, 2] vertically.
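The Example 8 configuration can be written down as data and checked against the constraints of Example 2. Those constraints require that later passes use subblocks no larger, and delta ranges no wider, than the preceding pass. `PassConfig`, `EXAMPLE8`, and `consistent` are hypothetical names. The sketch applies one delta range to both directions, which holds for the example values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class PassConfig:
    method: str                        # "bilateral" or "bdof"
    max_sb: Optional[Tuple[int, int]]  # max subblock (w, h); None = whole block
    delta: Tuple[int, int]             # (min, max) delta, same for both axes

EXAMPLE8 = [
    PassConfig("bilateral", None, (-8, 8)),      # pass 1: block-based
    PassConfig("bilateral", (16, 16), (-8, 8)),  # pass 2: up to 16x16
    PassConfig("bdof", (8, 8), (-2, 2)),         # pass 3: up to 8x8
]

def consistent(passes: List[PassConfig]) -> bool:
    """Later subblocks and delta ranges must not exceed the preceding pass's."""
    for prev, cur in zip(passes, passes[1:]):
        if prev.max_sb is not None and cur.max_sb is not None:
            if cur.max_sb[0] > prev.max_sb[0] or cur.max_sb[1] > prev.max_sb[1]:
                return False
        if cur.delta[0] < prev.delta[0] or cur.delta[1] > prev.delta[1]:
            return False
    return True

print(consistent(EXAMPLE8))  # True
```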

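The techniques described above may be applied by the video decoder 300 of a video coding system. The following is a detailed example of multi-pass DMVR. To decode an inter-predicted block in a picture from a bitstream, the video decoder 300 may perform the techniques described herein using all or a subset of the following steps.
1) Derive the position component (x, y) as the top-left luma position of the current block by decoding syntax elements in the bitstream.
2) Derive the size of the current block as a width value W and a height value H by decoding syntax elements in the bitstream.
3) Determine, from decoded elements in the bitstream, that the current block is an inter-predicted block.
4) Derive the motion vector components (mvL0 and mvL1) and the reference indices (refPicL0 and refPicL1) of the current block from decoded elements in the bitstream.
5) Infer a flag from decoded elements in the bitstream, where the flag indicates whether decoder-side motion vector derivation (e.g., DMVR, bilateral merge, template matching, etc.) is applied to the current block. The flag may be inferred in the same way as the DMVR enabling conditions discussed earlier in this disclosure, although it is not limited to that method. In another example, the flag may be signaled explicitly in the bitstream to avoid complex condition checks by the video decoder 300.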
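6) (Pass 1) If, according to the value of the flag, the decision is not to apply DMVR (bilateral merge or template matching) to the current block, set the motion vectors mvL0 and mvL1 as the motion vectors MV0_pass1 and MV1_pass1, respectively. Otherwise (the decision is to apply DMVR to the current block), the following applies:
  (a) Set mvL0 and mvL1 of the current block as the initial motion vectors of the current block.
  (b) Determine the variables sHor and sVer as follows:
    sHor = maximum(maxDeltaHorPass1, W × sFactor)
    sVer = maximum(maxDeltaVerPass1, H × sFactor)
  where:
    maxDeltaHorPass1 is a predetermined variable (e.g., 8);
    maxDeltaVerPass1 is a predetermined variable (e.g., 8);
    sFactor is a predetermined variable (e.g., 0.5);
    sHor specifies the DMVR search range [-sHor, sHor] in the horizontal direction;
    sVer specifies the DMVR search range [-sVer, sVer] in the vertical direction.
  (c) Derive the prediction signal predSig0 from reference picture 0 by using the derived mvL0 and refPicL0. The width of predSig0 is equal to W + 2 × sHor, and the height of predSig0 is equal to H + 2 × sVer.
  (d) Derive the prediction signal predSig1 from reference picture 1 by using the derived mvL1 and refPicL1. The width of predSig1 is equal to W + 2 × sHor, and the height of predSig1 is equal to H + 2 × sVer.
  (e) Set the variable minCostPass1 to the maximum cost value.
  (f) Set the variable best delta MV (Δhor_best, Δver_best) to the delta MV (0, 0).
  (g) Loop over each delta MV (Δhor, Δver), or a subset of them, within the search range of the current block, where -sVer ≤ Δver ≤ sVer and -sHor ≤ Δhor ≤ sHor:
    (i) Derive the bilateral matching cost bilCost at the current delta MV (Δhor, Δver).
    (ii) If bilCost is less than minCostPass1:
      (a) Set minCostPass1 equal to bilCost.
      (b) Set the best delta MV (Δhor_best, Δver_best) equal to MV(Δhor, Δver).
  (h) Derive the refined motion vector (mvL0 + MV(Δhor_best, Δver_best)) as the motion vector MV0_pass1.
  (i) Derive the refined motion vector (mvL1 - MV(Δhor_best, Δver_best)) as the motion vector MV1_pass1.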
7)(パス2)水平方向におけるサブブロックの数numSbXおよび垂直方向におけるサブブロックの数numSbY、サブブロックの幅sbWidthPass2、ならびに高さsbHeightPass2を次のように導出する。
numSbX=(W>thW)?(W/thW):1
numSbY=(H>thH)?(H/thH):1
sbWidthPass2=(W>thW)?thW:W
sbHeightPass2=(H>thH)?thH:H
ここで、thWおよびthHはそれぞれ、第2のパスのための最大のサブブロックの幅と高さを示す所定の整数値である(たとえば、thW=thH=16)。
(a)前述のフラグの値に従って、決定が現在ブロックにDMVR(バイラテラルマージまたはテンプレートマッチング)を適用しないというものである場合、動きベクトルMV0_pass1およびMV1_pass1をそれぞれ、各サブブロックのための動きベクトルMV0_(pass2,i)およびMV1_(pass2,i)として設定し、それ以外の場合(決定が現在ブロックにDMVRを適用するというものである場合)、次のことが当てはまる。
(b)(パス2を飛ばすかどうかを確認する)(thFactorPass2×W×H)に等しい変数costThPass2を導出し、thFactorPass2は所定の値であり、たとえばthFactorPass2=1である。minCostPass1がcostThPass2未満である場合、MV0_pass1およびMV1_pass1をそれぞれ、各サブブロックのための動きベクトルMV0_{(pass2, i)}およびMV1_{(pass2, i)}として設定し、それ以外の場合(minCostPass1がcostThPass2以上である場合)、次のことが当てはまる。
(i)位置成分(sbX,sbY)=(x,y)を現在ブロックの第1のサブブロックの左上ルマ位置として設定する。
(ii)左上から右下へと各サブブロックに対して、
(a)変数i=(sbY/sbHeightPass2)*(W/sbWidthPass2)+(sbX/sbWidthPass2)を現在サブブロックインデックスとして設定する。
(b)MV0_pass1およびMV1_pass1を、現在サブブロックのための初期動きベクトルとして設定する。
(c)変数sHorおよびsVerを次のように決定する。
sHor=maximum(maxDeltaHorPass2, sbWidthPass2×sFactor)
sVer=maximum(maxDeltaVerPass2, sbHeightPass2×sFactor)
ここで、
maxDeltaHorPass2は所定の変数(たとえば、8)であり、maxDeltaVerPass2は所定の変数(たとえば、8)である。
sFactorは所定の変数(たとえば、0.5)である。
sHorは、パス2のための水平方向における探索範囲[-sHor,sHor]を指定する。
sVerは、パス2のための垂直方向における探索範囲[-sVer,sVer]を指定する。
(d)導出されたMV0_pass1およびrefPicL0を使用することによって、参照ピクチャ0から予測信号predSig0を導出する。predSig0の幅はsbWidthPass2+2×sHorに等しい。predSig0の高さはsbHeightPass2+2×sVerに等しい。
(e)導出されたMV1_pass1およびrefPicL1を使用することによって、参照ピクチャ1から予測信号predSig1を導出する。predSig1の幅はsbWidthPass2+2×sHorに等しい。predSig1の高さはsbHeightPass2+2×sVerに等しい。
(f)変数minCostPass2を最大のコスト値に設定する。
(g)変数best delta MV(Δhor_best,Δver_best)をデルタMV(0,0)に設定する。
(h)現在サブブロックの探索範囲内のデルタMV(Δhor,Δver)の各々またはサブセットをループする。-sVer≦Δver≦sVer、-sHor≦Δhor≦sHorである。
(i)現在デルタMV(Δhor,Δver)におけるバイラテラルマッチングコストbilCostを導出する。
(ii)bilCostがminCostPass2未満である場合、
(a)minCostPass2をbilCostに等しく設定する。
(b)best delta MV(Δhor_best,Δver_best)をMV(Δhor,Δver)に等しく設定する。
(i)改良された動きベクトル(MV0_pass1+MV(Δhor_best,Δver_best))をMV0_(pass2,i)の動きベクトルとして導出する。
(j)改良された動きベクトル(MV1_pass1-MV(Δhor_best,Δver_best))をMV1_(pass2,i)の動きベクトルとして導出する。
(k)サブブロックの左上ルマ位置を次のように更新する。
sbX=(sbX+sbWidthPass2)<W?sbX+sbWidthPass2:0
sbY=(sbX+sbWidthPass2)<W?sbY:sbY+sbHeightPass2
8)ビットストリームの中の要素を復号することからフラグを推測し、フラグは、双方向オプティカルフローが現在ブロックに適用されるかどうかを示す。フラグの推測方式は、限定はされないが、上に記載された例と同じであり得る。別の例では、このフラグは、デコーダにおける複雑な条件の確認を避けるために、ビットストリームにおいて明示的にシグナリングされ得る。
9)(パス3)前述のフラグの値に従って、決定が現在ブロックにBDOFを適用するというものであるとき、次のことが当てはまる。
(a)水平方向におけるサブブロックの数numSbXおよび垂直方向におけるサブブロックの数numSbY、サブブロックの幅sbWidthPass3、ならびに高さsbHeightPass3を次のように導出する。
numSbX=(W>thW)?(W/thW):1
numSbY=(H>thH)?(H/thH):1
sbWidthPass3=(W>thW)?thW:W
sbHeightPass3=(H>thH)?thH:H
ここで、thWおよびthHはそれぞれ、第3のパスのための最大のサブブロックの幅と高さを示す所定の整数値である(たとえば、thW=thH=8)。
(b)(thFactorPass3×sbWidthPass3×sbHeightPass3)に等しい変数costThPass3を導出し、thFactorPass3は所定の値であり、たとえばthFactorPass3=32である。
(c)位置成分(sbX,sbY)=(x,y)を現在ブロックの第1のサブブロックの左上ルマ位置として設定する。
(d)左上から右下へと各サブブロックに対して、
(i)変数i=(sbY/sbHeightPass3)*(W/sbWidthPass3)+(sbX/sbWidthPass3)をパス3の現在サブブロックインデックスとして設定する。
(ii)変数j=(sbY/sbHeightPass2)*(W/sbWidthPass2)+(sbX/sbWidthPass2)をパス2の現在サブブロックインデックスとして設定する。
(iii)MV0_(pass2,j)およびMV1_(pass2,j)を、現在サブブロックのための初期動きベクトルとして設定する。
(iv)導出されたMV0_(pass2,j)およびrefPicL0を使用することによって、参照ピクチャ0から予測信号predSig0を導出する。
(v)導出されたMV1_(pass2,j)およびrefPicL1を使用することによって、参照ピクチャ1から予測信号predSig1を導出する。
(vi)現在サブブロックのpredSig0とpredSig1との間のひずみコスト距離を導出する。
(vii)(サブエリアパス3をスキップするかどうかを確認する)ひずみコスト距離がcostThPass3未満である場合、MV0_(pass2,j)およびMV1_(pass2,j)をそれぞれ、現在サブブロックのための改良された動きベクトルMV0_(pass3,i)およびMV1_(pass3,i)として設定し、それ以外の場合(ひずみコスト距離がcostThPass3以上である場合)、次のことが当てはまる、
(a)上で論じられたように、予測信号predSig0およびpredSig1から、水平方向および垂直方向の勾配∂I^(k)/∂x(i,j)および∂I^(k)/∂y(i,j)、k=0,1を導出する。
(b)上で論じられたように、導出された水平方向および垂直方向の勾配ならびに予測信号predSig0およびpredSig1から、勾配の自己相関および相互相関、S1、S2、S3、S5、およびS6を導出する。
(c)2つのパラメータv_xおよびv_yを次のように導出する。
ここで、mは所定の値である。たとえば、m=3である。
(d)デルタMV bioMV(Δhor,Δver)を次のように導出する。
Δhor=clip3(minDeltaHorPass3, maxDeltaHorPass3, ((v_x+2^(n-1))≫n))
Δver=clip3(minDeltaVerPass3, maxDeltaVerPass3, ((v_y+2^(n-1))≫n))
ここで、
nは所定の値である。たとえば、n=3である。
minDeltaHorPass3は所定の値である。たとえば、minDeltaHorPass3=-2である。
maxDeltaHorPass3は所定の値である。たとえば、maxDeltaHorPass3=2である。
minDeltaVerPass3は所定の値である。たとえば、minDeltaVerPass3=-2である。
maxDeltaVerPass3は所定の値である。たとえば、maxDeltaVerPass3=2である。
(e)改良された動きベクトル(MV0_(pass2,j)+bioMV(Δhor,Δver))をMV0_(pass3,i)の動きベクトルとして導出する。
(f)改良された動きベクトル(MV1_(pass2,j)-bioMV(Δhor,Δver))をMV1_(pass3,i)の動きベクトルとして導出する。
(viii)サブブロックの左上ルマ位置を次のように更新する。
sbX=(sbX+sbWidthPass3)<W?sbX+sbWidthPass3:0
sbY=(sbX+sbWidthPass3)<W?sbY:sbY+sbHeightPass3
10)ビデオ復号のための各サブブロックの改良された動きベクトルMV0_(pass3,i)およびMV1_(pass3,i)を使用して、予測されるブロックを導出する。 The techniques described above can be applied by the video decoder 300 of the video coding system. The following is a detailed example of multi-pass DMVR. To decode inter-predicted blocks of a picture from a bitstream, the video decoder 300 may implement the techniques described herein using all or a subset of the following steps.
1) Derive the position component (x,y) as the top-left luma position of the current block by decoding the syntax elements in the bitstream.
2) The size of the current block is derived as width W and height H by decoding the syntax elements in the bitstream.
3) By decoding the elements in the bitstream, determine that the current block is an inter-predicted block.
4) By decoding the elements in the bitstream, the motion vector components (mvL0 and mvL1) and reference indices (refPicL0 and refPicL1) of the current block are derived.
5) Infer a flag by decoding the elements in the bitstream; the flag indicates whether decoder-side motion vector derivation (e.g., DMVR, bilateral merge, template matching) is applied to the current block. The flag may be inferred in the same way as the DMVR activation conditions discussed earlier in this disclosure, but is not limited to that method. In another example, this flag may be explicitly signaled in the bitstream to avoid complex conditional checks by the video decoder 300.
6) (Pass 1) If, according to the value of the flag mentioned above, the decision is not to apply DMVR (bilateral merge or template matching) to the current block, set the motion vectors mvL0 and mvL1 as the motion vectors MV0_pass1 and MV1_pass1, respectively; otherwise (if the decision is to apply DMVR to the current block), the following applies:
(a) Set the mvL0 and mvL1 of the current block as the initial motion vectors for the current block.
(b) Determine the variables sHor and sVer as follows:
sHor=maximum(maxDeltaHorPass1, W×sFactor)
sVer=maximum(maxDeltaVerPass1, H×sFactor)
Here,
maxDeltaHorPass1 is a predetermined variable (for example, 8).
maxDeltaVerPass1 is a predetermined variable (for example, 8).
sFactor is a predetermined variable (for example, 0.5).
sHor defines the horizontal DMVR search range [-sHor, sHor].
sVer defines the vertical DMVR search range [-sVer, sVer].
(c) Derive the prediction signal predSig0 from reference picture 0 by using the derived mvL0 and refPicL0. The width of predSig0 is equal to W + 2 × sHor. The height of predSig0 is equal to H + 2 × sVer.
(d) Derive the prediction signal predSig1 from reference picture 1 by using the derived mvL1 and refPicL1. The width of predSig1 is equal to W + 2 × sHor. The height of predSig1 is equal to H + 2 × sVer.
(e) Set the variable minCostPass1 to the maximum cost value.
(f) Set the variable best delta MV(Δhor_best,Δver_best) to delta MV(0,0).
(g) Loop over each delta MV (Δhor,Δver), or a subset of the delta MVs, within the search range of the current block, where -sVer≦Δver≦sVer and -sHor≦Δhor≦sHor.
(i) Derive the bilateral matching cost bilCost at the current delta MV(Δhor,Δver).
(ii) If bilCost is less than minCostPass1,
(a) Set minCostPass1 to be equal to bilCost.
(b) Set the best delta MV(Δhor_best,Δver_best) to be equal to MV(Δhor,Δver).
(h) Derive the refined motion vector (mvL0+MV(Δhor_best,Δver_best)) as the motion vector MV0_pass1.
(i) Derive the refined motion vector (mvL1-MV(Δhor_best,Δver_best)) as the motion vector MV1_pass1.
7) (Pass 2) Derive the number of subblocks in the horizontal direction numSbX, the number of subblocks in the vertical direction numSbY, the subblock width sbWidthPass2, and the subblock height sbHeightPass2 as follows:
numSbX=(W>thW)?(W/thW):1
numSbY=(H>thH)?(H/thH):1
sbWidthPass2=(W>thW)?thW:W
sbHeightPass2=(H>thH)?thH:H
Here, thW and thH are predetermined integer values that represent the maximum subblock width and height, respectively, for the second pass (for example, thW = thH = 16).
(a) If, according to the value of the flag mentioned above, the decision is not to apply DMVR (bilateral merge or template matching) to the current block, set the motion vectors MV0_pass1 and MV1_pass1 as the motion vectors MV0_(pass2,i) and MV1_(pass2,i), respectively, for each subblock; otherwise (if the decision is to apply DMVR to the current block), the following applies:
(b) (Check whether to skip pass 2) Derive a variable costThPass2 equal to (thFactorPass2 × W × H), where thFactorPass2 is a predetermined value, for example thFactorPass2 = 1. If minCostPass1 is less than costThPass2, set MV0_pass1 and MV1_pass1 as the motion vectors MV0_(pass2,i) and MV1_(pass2,i), respectively, for each subblock; otherwise (if minCostPass1 is greater than or equal to costThPass2), the following applies:
(i) Set the position component (sbX,sbY)=(x,y) as the top-left luma position of the first subblock of the current block.
(ii) For each subblock, from the top left to the bottom right,
(a) Set the variable i = (sbY/sbHeightPass2)*(W/sbWidthPass2)+(sbX/sbWidthPass2) as the current subblock index.
(b) Set MV0_pass1 and MV1_pass1 as the initial motion vectors for the current subblock.
(c) Determine the variables sHor and sVer as follows:
sHor=maximum(maxDeltaHorPass2, sbWidthPass2×sFactor)
sVer=maximum(maxDeltaVerPass2, sbHeightPass2×sFactor)
Here,
maxDeltaHorPass2 is a predetermined variable (for example, 8), and maxDeltaVerPass2 is a predetermined variable (for example, 8).
sFactor is a predetermined variable (for example, 0.5).
sHor specifies the horizontal search range [-sHor,sHor] for pass 2.
sVer specifies the vertical search range [-sVer,sVer] for pass 2.
(d) Derive the prediction signal predSig0 from reference picture 0 by using the derived MV0_pass1 and refPicL0. The width of predSig0 is equal to sbWidthPass2 + 2 × sHor. The height of predSig0 is equal to sbHeightPass2 + 2 × sVer.
(e) Derive the prediction signal predSig1 from reference picture 1 by using the derived MV1_pass1 and refPicL1. The width of predSig1 is equal to sbWidthPass2 + 2 × sHor. The height of predSig1 is equal to sbHeightPass2 + 2 × sVer.
(f) Set the variable minCostPass2 to the maximum cost value.
(g) Set the variable best delta MV(Δhor_best,Δver_best) to delta MV(0,0).
(h) Loop over each delta MV (Δhor,Δver), or a subset of the delta MVs, within the search range of the current subblock, where -sVer≦Δver≦sVer and -sHor≦Δhor≦sHor.
(i) Derive the bilateral matching cost bilCost at the current delta MV(Δhor,Δver).
(ii) If bilCost is less than minCostPass2,
(a) Set minCostPass2 to be equal to bilCost.
(b) Set the best delta MV(Δhor_best,Δver_best) to be equal to MV(Δhor,Δver).
(i) Derive the refined motion vector (MV0_pass1+MV(Δhor_best,Δver_best)) as the motion vector MV0_(pass2,i).
(j) Derive the refined motion vector (MV1_pass1-MV(Δhor_best,Δver_best)) as the motion vector MV1_(pass2,i).
(k) Update the top-left luma position of the subblock as follows, where both conditions are evaluated with the value of sbX before the update:
sbX=(sbX+sbWidthPass2)<W?sbX+sbWidthPass2:0
sbY=(sbX+sbWidthPass2)<W?sbY:sbY+sbHeightPass2
8) Infer a flag from decoding elements in the bitstream; the flag indicates whether bidirectional optical flow (BDOF) is applied to the current block. The flag may be inferred in the same way as in the example above, but is not limited to that method. In another example, this flag may be explicitly signaled in the bitstream to avoid complex conditional checks in the decoder.
9) (Pass 3) When, according to the value of the flag mentioned above, the decision is to apply BDOF to the current block, the following applies:
(a) Derive the number of subblocks in the horizontal direction numSbX, the number of subblocks in the vertical direction numSbY, the subblock width sbWidthPass3, and the subblock height sbHeightPass3 as follows:
numSbX=(W>thW)?(W/thW):1
numSbY=(H>thH)?(H/thH):1
sbWidthPass3=(W>thW)?thW:W
sbHeightPass3=(H>thH)?thH:H
Here, thW and thH are predetermined integer values that represent the maximum subblock width and height, respectively, for the third pass (for example, thW = thH = 8).
(b) Derive a variable costThPass3 equal to (thFactorPass3 × sbWidthPass3 × sbHeightPass3), where thFactorPass3 is a predetermined value, for example, thFactorPass3 = 32.
(c) Set the position component (sbX,sbY)=(x,y) as the top-left luma position of the first subblock of the current block.
(d) For each subblock, starting from the top left and going down to the bottom right,
(i) Set the variable i = (sbY/sbHeightPass3)*(W/sbWidthPass3)+(sbX/sbWidthPass3) as the current subblock index for pass 3.
(ii) Set the variable j = (sbY/sbHeightPass2)*(W/sbWidthPass2)+(sbX/sbWidthPass2) as the current subblock index for pass 2.
(iii) Set MV0_(pass2,j) and MV1_(pass2,j) as the initial motion vectors for the current subblock.
(iv) Derive the prediction signal predSig0 from reference picture 0 by using the derived MV0_(pass2,j) and refPicL0.
(v) Derive the prediction signal predSig1 from reference picture 1 by using the derived MV1_(pass2,j) and refPicL1.
(vi) Derive the distortion cost between predSig0 and predSig1 for the current subblock.
(vii) (Check whether to skip pass 3 for this sub-area) If the distortion cost is less than costThPass3, set MV0_(pass2,j) and MV1_(pass2,j) as the refined motion vectors MV0_(pass3,i) and MV1_(pass3,i), respectively, for the current subblock; otherwise (if the distortion cost is greater than or equal to costThPass3), the following applies:
(a) As discussed above, derive the horizontal and vertical gradients ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j), k=0,1, from the prediction signals predSig0 and predSig1.
(b) As discussed above, derive the autocorrelations and cross-correlations of the gradients, S1, S2, S3, S5, and S6, from the derived horizontal and vertical gradients and the prediction signals predSig0 and predSig1.
(c) Derive the two parameters v_x and v_y as follows:
Here, m is a predetermined value. For example, m = 3.
(d) The delta MV bioMV(Δhor,Δver) is derived as follows.
Δhor=clip3(minDeltaHorPass3, maxDeltaHorPass3, ((v_x+2^(n-1))≫n))
Δver=clip3(minDeltaVerPass3, maxDeltaVerPass3, ((v_y+2^(n-1))≫n))
Here,
n is a predetermined value. For example, n=3.
minDeltaHorPass3 is a predetermined value. For example, minDeltaHorPass3 = -2.
maxDeltaHorPass3 is a predetermined value. For example, maxDeltaHorPass3 = 2.
minDeltaVerPass3 is a predetermined value. For example, minDeltaVerPass3 = -2.
maxDeltaVerPass3 is a predetermined value. For example, maxDeltaVerPass3 = 2.
(e) Derive the refined motion vector (MV0_(pass2,j)+bioMV(Δhor,Δver)) as the motion vector MV0_(pass3,i).
(f) Derive the refined motion vector (MV1_(pass2,j)-bioMV(Δhor,Δver)) as the motion vector MV1_(pass3,i).
(viii) Update the top-left luma position of the subblock as follows, where both conditions are evaluated with the value of sbX before the update:
sbX=(sbX+sbWidthPass3)<W?sbX+sbWidthPass3:0
sbY=(sbX+sbWidthPass3)<W?sbY:sbY+sbHeightPass3
10) Derive the predicted block for video decoding using the refined motion vectors MV0_(pass3,i) and MV1_(pass3,i) of each subblock.
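The bilateral matching search in steps 6) and 7) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: only integer-sample offsets are searched, the bilateral cost bilCost is assumed to be a plain sum of absolute differences (SAD) between the mirrored list-0 and list-1 windows, and predSig0/predSig1 are passed in as already-fetched padded NumPy arrays. The function names `search_range` and `bilateral_search` are hypothetical, and the search-range formula follows the text as written.

```python
import numpy as np

def search_range(width, height, max_delta_hor=8, max_delta_ver=8, s_factor=0.5):
    # sHor = maximum(maxDeltaHor, W x sFactor), sVer likewise, as written in the text.
    s_hor = max(max_delta_hor, int(width * s_factor))
    s_ver = max(max_delta_ver, int(height * s_factor))
    return s_hor, s_ver

def bilateral_search(pred_sig0, pred_sig1, width, height, s_hor, s_ver):
    """Integer full search; returns ((dHor, dVer), minCost).

    pred_sig0 / pred_sig1 are padded windows of size
    (height + 2*s_ver) x (width + 2*s_hor) fetched at the initial MVs.
    A delta (dh, dv) is applied as +delta to list 0 and -delta to list 1.
    """
    min_cost = np.iinfo(np.int64).max  # "maximum cost value"
    best = (0, 0)                      # best delta MV starts at (0, 0)
    p0 = pred_sig0.astype(np.int64)
    p1 = pred_sig1.astype(np.int64)
    for dv in range(-s_ver, s_ver + 1):
        for dh in range(-s_hor, s_hor + 1):
            b0 = p0[s_ver + dv:s_ver + dv + height, s_hor + dh:s_hor + dh + width]
            b1 = p1[s_ver - dv:s_ver - dv + height, s_hor - dh:s_hor - dh + width]
            bil_cost = int(np.abs(b0 - b1).sum())  # SAD used as bilCost (assumption)
            if bil_cost < min_cost:                # strict "<" as in the text
                min_cost, best = bil_cost, (dh, dv)
    return best, min_cost
```

With a zero-cost match planted at offset +(1, 2) in list 0 and -(1, 2) in list 1 (for example, windows `base[0:24, 0:24]` and `base[4:28, 2:26]` of one random array), `bilateral_search(..., 8, 8, 8, 8)` returns `((1, 2), 0)`. The same function serves pass 2 when called once per subblock with windows fetched at MV0_pass1 and MV1_pass1.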

例9 - マルチパスDMVR技法のすべてのパスが飛ばされる。マルチパスDMVR技法のすべてのパスが飛ばされるとき、最後のパス(パスP)における各サブブロックのための最後の改良された動きベクトルMV_(passP,i)は、初期動きベクトルMV_Orgに等しい。 Example 9 - All passes of the multi-pass DMVR technique are skipped. When all passes of the multi-pass DMVR technique are skipped, the final refined motion vector MV_(passP,i) for each subblock in the last pass (pass P) is equal to the initial motion vector MV_Org.

たとえば、例8のように、ビデオデコーダ300は、DMVR(たとえば、バイラテラルマージまたはテンプレートマッチング)を現在ブロックに適用する条件によって、BDOFベースの動きベクトル改良(パス3)を現在ブロックに適用するかどうかを決定し得る。たとえば、上の例の8におけるステップ5において、ビデオデコーダ300は、現在ブロックにDMVRを適用しないと決定し、マルチパスDMVR技法すべての3つのパスが飛ばされる。例8のステップ10における各サブブロックのための改良された動きベクトルMV0_(pass3,i)およびMV1_(pass3,i)はそれぞれ、mvL0およびmvL1に等しい。たとえば、ビデオデコーダ300は、DMVRをブロックに適用しないと決定し得る。DMVRをブロックに適用しないという決定に基づいて、ビデオデコーダ300は、マルチパスDMVRのすべてのパスを飛ばし、初期動きベクトルに基づいてブロックを復号し得る。 For example, as in Example 8, the video decoder 300 may determine whether to apply BDOF-based motion vector refinement (pass 3) to the current block depending on the conditions for applying DMVR (e.g., bilateral merge or template matching) to the current block. For example, if in step 5 of Example 8 above the video decoder 300 decides not to apply DMVR to the current block, all three passes of the multi-pass DMVR technique are skipped, and the refined motion vectors MV0_(pass3,i) and MV1_(pass3,i) for each subblock in step 10 of Example 8 are equal to mvL0 and mvL1, respectively. For example, the video decoder 300 may decide not to apply DMVR to a block. Based on that decision, the video decoder 300 may skip all passes of the multi-pass DMVR and decode the block based on the initial motion vectors.

図14は、本開示の例示的なマルチパスDMVR技法を示すフローチャートである。ビデオデコーダ300は、改良されたMVを決定するために、ビデオデータのブロックのためのMVにマルチパスDMVRを適用し得る(1400)。たとえば、ビデオデコーダ300は、ブロックベースである第1のパスと、サブブロックベースである第2のパスであって、第2パスサブブロックの幅が第1パスブロックの幅以下であり、第2パスサブブロックの高さが第1パスブロックの高さ以下である、第2のパスと、サブブロックベースである第3のパスであって、第3パスサブブロックの幅が第2パスサブブロックの幅以下であり、第3パスサブブロックの高さが第2パスサブブロックの高さ以下である、第3のパスとを含む、マルチパスDMVRを適用し得る。 Figure 14 is a flowchart illustrating an example multi-pass DMVR technique of this disclosure. The video decoder 300 may apply multi-pass DMVR to the MV of a block of video data to determine a refined MV (1400). For example, the video decoder 300 may apply multi-pass DMVR that includes a block-based first pass; a subblock-based second pass, where the width of a second-pass subblock is less than or equal to the width of the first-pass block and the height of a second-pass subblock is less than or equal to the height of the first-pass block; and a subblock-based third pass, where the width of a third-pass subblock is less than or equal to the width of a second-pass subblock and the height of a third-pass subblock is less than or equal to the height of a second-pass subblock.

ビデオデコーダ300は、改良されたMVに基づいてブロックをコーディングし得る(1402)。たとえば、ビデオデコーダ300は、改良されたMVを使用してブロックを予測し得る。 The video decoder 300 may code the block based on the refined MV (1402). For example, the video decoder 300 may predict the block using the refined MV.

いくつかの例では、ビデオデータのブロックの少なくとも1つの第3パスサブブロックは、ビデオデータのブロックの少なくとも1つの第2パスサブブロックに対してサブブロックである。いくつかの例では、ビデオデコーダ300は、ビデオデータのブロックのための少なくとも1つの第1の改良された動きベクトルを導出するために第1のパスを適用し、第2のパスにおいて少なくとも1つの第1の改良された動きベクトルを使用し得る。たとえば、ビデオデコーダ300は、第2のパスのための初期動きベクトルとして第1の改良された動きベクトルを使用し得る。いくつかの例では、ビデオデコーダ300は、少なくとも1つのそれぞれの第2パスサブブロックのための少なくとも1つの第2の改良された動きベクトルを導出するために第2のパスを適用し、第3のパスにおいて少なくとも1つの第2の改良された動きベクトルを使用し得る。たとえば、ビデオデコーダ300は、第2のパスの1つまたは複数のそれぞれのサブブロックのための1つまたは複数の第2の改良された動きベクトルを導出し、第3のパスのための初期動きベクトルとして1つまたは複数の第2の改良された動きベクトルを使用し得る。いくつかの例では、ビデオデコーダ300は、少なくとも1つのそれぞれの第3パスサブブロックのための少なくとも1つの第3の改良された動きベクトルを導出するために第3のパスを適用し、少なくとも1つの第3の改良された動きベクトルとして少なくとも1つの改良された動きベクトルを決定し得る。 In some examples, at least one third-pass subblock of the block of video data is a subblock of at least one second-pass subblock of the block. In some examples, the video decoder 300 may apply the first pass to derive at least one first refined motion vector for the block of video data and use the at least one first refined motion vector in the second pass. For example, the video decoder 300 may use the first refined motion vector as the initial motion vector for the second pass. In some examples, the video decoder 300 may apply the second pass to derive at least one second refined motion vector for at least one respective second-pass subblock and use the at least one second refined motion vector in the third pass. For example, the video decoder 300 may derive one or more second refined motion vectors for one or more respective subblocks of the second pass and use them as initial motion vectors for the third pass. In some examples, the video decoder 300 may apply the third pass to derive at least one third refined motion vector for at least one respective third-pass subblock, and determine the at least one refined motion vector to be the at least one third refined motion vector.
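Because each pass starts from the previous pass's output and its delta is mirrored between the two lists, the final MVs of a subblock equal the initial MVs plus (list 0) or minus (list 1) the sum of the per-pass deltas. A minimal Python sketch of that chaining, assuming MVs are integer (horizontal, vertical) tuples and all deltas are in the same units; the helper names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components

def apply_mirrored_delta(mv0: MV, mv1: MV, delta: MV) -> Tuple[MV, MV]:
    """List 0 moves by +delta and list 1 by -delta."""
    return ((mv0[0] + delta[0], mv0[1] + delta[1]),
            (mv1[0] - delta[0], mv1[1] - delta[1]))

def chain_passes(mv_l0: MV, mv_l1: MV, pass_deltas: List[MV]) -> Tuple[MV, MV]:
    """Feed the refined MVs of each pass in as the initial MVs of the next pass."""
    mv0, mv1 = mv_l0, mv_l1
    for delta in pass_deltas:
        mv0, mv1 = apply_mirrored_delta(mv0, mv1, delta)
    return mv0, mv1

# One subblock with hypothetical deltas found by passes 1, 2 and 3.
print(chain_passes((10, -4), (-6, 2), [(1, 2), (-3, 0), (2, -1)]))
# prints ((10, -3), (-6, 1))
```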

いくつかの例では、マルチパスDMVRの少なくとも1つのパスは、BDOFを適用すること、またはバイラテラルマッチングを適用することを含む。いくつかの例では、第1のパスはバイラテラルマッチングを適用することを含み、第2のパスはバイラテラルマッチングを適用することを含み、第3のパスはBDOFを適用することを含む。 In some examples, at least one pass of the multi-pass DMVR includes applying BDOF or applying bilateral matching. In some examples, the first pass includes applying bilateral matching, the second pass includes applying bilateral matching, and the third pass includes applying BDOF.

いくつかの例では、少なくとも1つの第2パスサブブロックは、16ルマサンプルという所定の最大の幅および16ルマサンプルという所定の最大の高さを有する。いくつかの例では、少なくとも1つの第3パスサブブロックは、8ルマサンプルという所定の最大の幅および8ルマサンプルという所定の最大の高さを有する。 In some examples, at least one second-pass subblock has a predetermined maximum width of 16 luma samples and a predetermined maximum height of 16 luma samples. In some examples, at least one third-pass subblock has a predetermined maximum width of 8 luma samples and a predetermined maximum height of 8 luma samples.
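The subblock partitioning of steps 7) and 9)(a), and the index j that maps each third-pass subblock to the second-pass subblock containing it, can be sketched as follows. The sketch assumes block-relative positions (the block's top-left luma position treated as (0, 0)) and block dimensions that are multiples of thW and thH; it generates raster-order positions directly instead of applying the sbX/sbY update formulas. Function names are hypothetical.

```python
def subblock_layout(w, h, th_w, th_h):
    # numSbX, numSbY, sbWidth, sbHeight as in steps 7) and 9)(a).
    num_sb_x = w // th_w if w > th_w else 1
    num_sb_y = h // th_h if h > th_h else 1
    sb_w = th_w if w > th_w else w
    sb_h = th_h if h > th_h else h
    return num_sb_x, num_sb_y, sb_w, sb_h

def raster_positions(w, h, sb_w, sb_h):
    # Block-relative top-left luma positions, from top-left to bottom-right.
    return [(x, y) for y in range(0, h, sb_h) for x in range(0, w, sb_w)]

def subblock_index(sb_x, sb_y, w, sb_w, sb_h):
    # i = (sbY / sbHeight) * (W / sbWidth) + (sbX / sbWidth), integer division.
    return (sb_y // sb_h) * (w // sb_w) + (sb_x // sb_w)
```

For a 32x16 block with the example sizes 16 (pass 2) and 8 (pass 3), the pass-3 subblock at (24, 8) has index i = 7 at 8x8 granularity and index j = 1 at 16x16 granularity, so it starts from MV0_(pass2,1) and MV1_(pass2,1).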

いくつかの例では、第1のパスまたは第2のパスのうちの少なくとも1つのためのデルタ動き値の範囲は、水平方向において[-8,8]および垂直方向において[-8,8]であり、第3のパスのためのデルタ動き値の範囲は、水平方向において[-2,2]および垂直方向において[-2,2]である。 In some examples, the range of delta motion values for at least one of the first pass or the second pass is [-8,8] in the horizontal direction and [-8,8] in the vertical direction, and the range of delta motion values for the third pass is [-2,2] in the horizontal direction and [-2,2] in the vertical direction.
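For the third pass, the [-2,2] range comes from the clip3 in step 9): the BDOF parameters v_x and v_y are rounded by a right shift of n bits and then clipped. A small Python sketch with the example values n = 3 and [-2, 2]; the derivation of v_x and v_y is not reproduced here, so the inputs are arbitrary integers. Python's `>>` on negative integers is an arithmetic (flooring) shift, which is assumed to match the ≫ in the text. Function names are hypothetical.

```python
def clip3(lo, hi, v):
    # clip3(min, max, value): limit value to [min, max].
    return lo if v < lo else hi if v > hi else v

def bdof_delta(v_x, v_y, n=3, lo=-2, hi=2):
    # Delta = clip3(min, max, (v + 2^(n-1)) >> n): round, then clip.
    d_hor = clip3(lo, hi, (v_x + (1 << (n - 1))) >> n)
    d_ver = clip3(lo, hi, (v_y + (1 << (n - 1))) >> n)
    return d_hor, d_ver

print(bdof_delta(11, -13))    # prints (1, -2)
print(bdof_delta(100, -100))  # prints (2, -2): both components clipped
```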

いくつかの例では、ビデオデータのブロックは第1のブロックである。いくつかの例では、ビデオデコーダ300は、短縮されたマルチパスDMVRをビデオデータの第2のブロックのための動きベクトルに適用し得る。たとえば、ビデオデコーダ300は、第2のブロックのためのマルチパスDMVRの所与のパスを飛ばすと決定し、第2のブロックのためのマルチパスDMVRの所与のパスを飛ばすという決定に基づいて、第2のブロックのためのマルチパスDMVRの所与のパスを飛ばし得る。いくつかの例では、ビデオデコーダ300は、先行するパスの結果に基づいて所与のパスを飛ばすと決定し得る。 In some examples, the block of video data is a first block. In some examples, the video decoder 300 may apply a shortened multi-pass DMVR to the motion vector for a second block of video data. For example, the video decoder 300 may determine to skip a given pass of the multi-pass DMVR for the second block and, based on that determination, skip the given pass for the second block. In some examples, the video decoder 300 may determine to skip the given pass based on the result of a preceding pass.
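The pass-skipping rules (all passes skipped when DMVR is not applied, and pass 2 skipped when minCostPass1 is below costThPass2 = thFactorPass2 × W × H) can be summarized as control flow. A minimal Python sketch, assuming a hypothetical interface in which each pass is a callable that takes the current MV pair and returns (delta MV, minimum cost); pass 2 is collapsed to a single call for the whole block rather than one call per subblock, and pass 3 is omitted.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]
PassFn = Callable[[MV, MV], Tuple[MV, int]]  # hypothetical: returns (delta, minCost)

def refine_block(mv_l0: MV, mv_l1: MV, w: int, h: int, dmvr_flag: bool,
                 pass1: PassFn, pass2: PassFn, th_factor_pass2: int = 1):
    if not dmvr_flag:
        # DMVR not applied: every pass is skipped and the initial MVs are kept.
        return mv_l0, mv_l1, "all passes skipped"
    (dh, dv), min_cost_pass1 = pass1(mv_l0, mv_l1)
    mv0 = (mv_l0[0] + dh, mv_l0[1] + dv)
    mv1 = (mv_l1[0] - dh, mv_l1[1] - dv)
    cost_th_pass2 = th_factor_pass2 * w * h
    if min_cost_pass1 < cost_th_pass2:
        # The pass-1 result is already good enough: pass 2 is skipped.
        return mv0, mv1, "pass 2 skipped"
    (dh, dv), _ = pass2(mv0, mv1)
    return (mv0[0] + dh, mv0[1] + dv), (mv1[0] - dh, mv1[1] - dv), "pass 2 run"
```

For an 8x8 block with thFactorPass2 = 1, costThPass2 is 64, so a pass-1 cost of 10 ends refinement after pass 1, while a cost of 1000 lets pass 2 run.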

いくつかの例では、ビデオのブロックは第1のブロックである。いくつかの例では、ビデオデコーダ300は、短縮されたマルチパスDMVRをビデオデータの第2のブロックのための動きベクトルに適用し得る。たとえば、ビデオデコーダ300は、ビデオデータの第2のブロックの特定のサブエリアのためのマルチパスDMVRの所与のサブブロックベースのパスを飛ばすと決定してもよく、特定のサブエリアは第2のブロックの1つまたは複数のサブブロックを備える。たとえば、ビデオデコーダ300は、第2のブロックの特定のサブエリアのためのマルチパスDMVRの所与のサブブロックベースのパスを飛ばすという決定に基づいて、第2のブロックの特定のサブエリアのためのマルチパスDMVRの所与のサブブロックベースのパスを飛ばし得る。いくつかの例では、ビデオデコーダ300は、先行するパスの結果に基づいて所与のパスを飛ばすと決定し得る。 In some examples, the block of video data is a first block. In some examples, the video decoder 300 may apply a shortened multi-pass DMVR to the motion vector for a second block of video data. For example, the video decoder 300 may determine to skip a given subblock-based pass of the multi-pass DMVR for a specific sub-area of the second block, where the specific sub-area comprises one or more subblocks of the second block. Based on that determination, the video decoder 300 may skip the given subblock-based pass for that sub-area of the second block. In some examples, the video decoder 300 may determine to skip the given pass based on the result of a preceding pass.
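Skipping a subblock-based pass for individual sub-areas, as in step 9)(vii), reduces to a per-subblock threshold test. A minimal Python sketch, assuming the distortion cost is SAD (the text does not fix the metric), the prediction signals at the pass-2 MVs are given as aligned H x W arrays, and the example threshold thFactorPass3 = 32 applies; the function name is hypothetical.

```python
import numpy as np

def pass3_refine_mask(pred_sig0, pred_sig1, sb_w, sb_h, th_factor_pass3=32):
    """In raster order: True where pass 3 (BDOF) runs, False where it is skipped."""
    h, w = pred_sig0.shape
    cost_th_pass3 = th_factor_pass3 * sb_w * sb_h
    mask = []
    for y in range(0, h, sb_h):
        for x in range(0, w, sb_w):
            b0 = pred_sig0[y:y + sb_h, x:x + sb_w].astype(np.int64)
            b1 = pred_sig1[y:y + sb_h, x:x + sb_w].astype(np.int64)
            distortion = int(np.abs(b0 - b1).sum())   # SAD (assumed metric)
            mask.append(distortion >= cost_th_pass3)  # below threshold: skip
    return mask
```

With 8x8 subblocks the threshold is 32 × 64 = 2048, so a subblock whose samples differ by 10 everywhere (SAD 640) keeps its pass-2 MVs, while one differing by 100 everywhere (SAD 6400) is refined.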

いくつかの例では、ブロックはビデオデータの第1のブロックである。いくつかの例では、ビデオデコーダ300は、ビデオデータの第2のブロックにDMVRを適用しないと決定し得る。DMVRを第2のブロックに適用しないという決定に基づいて、ビデオデコーダ300は、第2のブロックのためのマルチパスDMVRのすべてのパスを飛ばし、第2のブロックのための初期動きベクトルに基づいて第2のブロックを復号し得る。 In some examples, the block is a first block of video data. In some examples, the video decoder 300 may determine not to apply DMVR to a second block of video data. Based on the determination not to apply DMVR to the second block, the video decoder 300 may skip all passes of the multi-pass DMVR for the second block and decode the second block based on the initial motion vector for the second block.

図15は、本開示の技法による、現在ブロックを符号化するための例示的な方法を示すフローチャートである。現在ブロックは現在CUを備え得る。ビデオエンコーダ200(図1および図3)に関して説明されるが、他のデバイスが図15の方法と同様の方法を実行するように構成され得ることを理解されたい。 Figure 15 is a flowchart illustrating an exemplary method for encoding a current block using the technique of this disclosure. A current block may comprise a current CU. While the video encoder 200 (Figures 1 and 3) is described, it should be understood that other devices may be configured to perform a similar method to that shown in Figure 15.

この例では、ビデオエンコーダ200は最初に、現在ブロックを予測する(350)。たとえば、ビデオエンコーダ200は、現在ブロックのための予測ブロックを形成し得る。次いで、ビデオエンコーダ200は、現在ブロックのための残差ブロックを計算し得る(352)。残差ブロックを計算するために、ビデオエンコーダ200は、元の符号化されていないブロックと現在ブロックのための予測ブロックとの間の差分を計算し得る。次いで、ビデオエンコーダ200は、残差ブロックを変換し、残差ブロックの変換係数を量子化し得る(354)。次に、ビデオエンコーダ200は、残差ブロックの量子化された変換係数を走査し得る(356)。走査の間、または走査に続いて、ビデオエンコーダ200は、変換係数をエントロピー符号化し得る(358)。たとえば、ビデオエンコーダ200は、CAVLCまたはCABACを使用して変換係数を符号化し得る。次いで、ビデオエンコーダ200は、ブロックのエントロピー符号化されたデータを出力し得る(360)。 In this example, the video encoder 200 initially predicts the current block (350). For example, the video encoder 200 may form a prediction block for the current block. The video encoder 200 may then calculate a residual block for the current block (352). To calculate the residual block, the video encoder 200 may calculate a difference between the original, unencoded block and the prediction block for the current block. The video encoder 200 may then transform the residual block and quantize its transform coefficients (354). Next, the video encoder 200 may scan the quantized transform coefficients of the residual block (356). During or following the scan, the video encoder 200 may entropy encode the transform coefficients (358). For example, the video encoder 200 may encode the transform coefficients using CAVLC or CABAC. The video encoder 200 may then output the entropy-encoded data of the block (360).

図16は、本開示の技法による、ビデオデータの現在ブロックを復号するための例示的な方法を示すフローチャートである。現在ブロックは現在CUを備え得る。ビデオデコーダ300(図1および図4)に関して説明されるが、他のデバイスが図16の方法に類似の方法を実行するように構成され得ることを理解されたい。 Figure 16 is a flowchart illustrating an exemplary method for decoding the current block of video data using the technique of this disclosure. The current block may comprise the current CU. While the video decoder 300 (Figures 1 and 4) is described, it should be understood that other devices may be configured to perform a method similar to that shown in Figure 16.

ビデオデコーダ300は、エントロピー符号化された予測情報および現在ブロックに対応する残差ブロックの変換係数のエントロピー符号化されたデータなどの、現在ブロックのためのエントロピー符号化されたデータを受信し得る(370)。ビデオデコーダ300は、エントロピー符号化されたデータをエントロピー復号して、現在ブロックのための予測情報を決定し、残差ブロックの変換係数を再生し得る(372)。ビデオデコーダ300は、現在ブロックのための予測ブロックを計算するために、たとえば、現在ブロックのための予測情報によって示されるようなイントラ予測モードまたはインター予測モードを使用して、現在ブロックを予測し得る(374)。現在ブロックを予測することの一部として、ビデオデコーダ300は、限定はされないが図14の技法を含む、本開示のマルチパスDMVR技法のいずれかを使用し得る。次いで、ビデオデコーダ300は、量子化された変換係数のブロックを作成するために、再生された変換係数を逆走査し得る(376)。次いで、ビデオデコーダ300は、変換係数を逆量子化し、逆変換を変換係数に適用して、残差ブロックを生成し得る(378)。ビデオデコーダ300は、予測ブロックおよび残差ブロックを合成することによって、現在ブロックを最終的に復号し得る(380)。 The video decoder 300 may receive entropy-encoded data for the current block, such as entropy-encoded prediction information and entropy-encoded data for the transform coefficients of a residual block corresponding to the current block (370). The video decoder 300 may entropy decode the entropy-encoded data to determine the prediction information for the current block and to reproduce the transform coefficients of the residual block (372). The video decoder 300 may predict the current block, e.g., using an intra- or inter-prediction mode as indicated by the prediction information for the current block, to calculate a prediction block for the current block (374). As part of predicting the current block, the video decoder 300 may use any of the multi-pass DMVR techniques of this disclosure, including, but not limited to, the technique of Figure 14. The video decoder 300 may then inverse scan the reproduced transform coefficients to create a block of quantized transform coefficients (376). The video decoder 300 may then inverse quantize the transform coefficients and apply an inverse transform to them to produce a residual block (378). The video decoder 300 may ultimately decode the current block by combining the prediction block and the residual block (380).
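Step (380), combining the prediction block with the residual block, amounts to a sample-wise addition followed by clipping to the valid sample range, as is usual in HEVC/VVC-style decoders. A minimal NumPy sketch, assuming 10-bit video by default; the function name is hypothetical.

```python
import numpy as np

def reconstruct(pred, residual, bit_depth=10):
    # Add prediction and residual, then clip to [0, 2^bitDepth - 1].
    total = pred.astype(np.int32) + residual.astype(np.int32)
    return np.clip(total, 0, (1 << bit_depth) - 1)

print(reconstruct(np.array([[1000, 5]]), np.array([[50, -10]])))  # prints [[1023    0]]
```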

本開示は以下の非限定的な条項を含む。 This disclosure includes the following non-limiting clauses.

条項1A. ビデオデータをコーディングする方法であって、改良された動きベクトルを決定するためにビデオデータのブロックのための動きベクトルにマルチパスデコーダ側動きベクトル改良(DMVR)を適用するステップと、改良された動きベクトルに基づいてブロックをコーディングするステップとを備える、方法。 Clause 1A. A method of coding video data, the method comprising: applying multi-pass decoder-side motion vector refinement (DMVR) to motion vectors for a block of video data to determine refined motion vectors; and coding the block based on the refined motion vectors.

条項2A. マルチパスDMVRのパスの総数が所定の整数である、条項1Aに記載の方法。 Clause 2A. The method of Clause 1A, wherein the total number of passes of the multi-pass DMVR is a predetermined integer.

条項3A. マルチパスDMVRが、ブロックベースである第1のパスと、サブブロックベースである第2のパスと、サブブロックベースである第3のパスとを備える、条項1Aまたは条項2Aに記載の方法。 Clause 3A. The method of Clause 1A or Clause 2A, wherein the multi-pass DMVR comprises a first pass that is block-based, a second pass that is subblock-based, and a third pass that is subblock-based.

条項4A. 第1のパスを適用するステップが、第1の改良された動きベクトルを導出する、条項3Aに記載の方法。 Clause 4A. The method of Clause 3A, wherein applying the first pass derives a first refined motion vector.

条項5A. 改良された動きベクトルが第2のパスにおいて使用され、第2のパスのサブブロックが所定の最大の幅および所定の最大の高さを有し、第2のパスを適用するステップが、第2のパスの少なくとも1つのそれぞれのサブブロックのための第2の改良された動きベクトルを導出する、条項4Aに記載の方法。 Clause 5A. The method of Clause 4A, wherein the refined motion vector is used in the second pass, subblocks of the second pass have a predetermined maximum width and a predetermined maximum height, and applying the second pass derives a second refined motion vector for at least one respective subblock of the second pass.

条項6A. 各々の第2の改良された動きベクトルが第3のパスにおいて使用され、第3のパスのサブブロックが所定の最大の幅および所定の最大の高さを有し、第3のパスを適用するステップが、第3のパスの少なくとも1つのそれぞれのサブブロックのための第3の改良された動きベクトルを導出する、条項5Aに記載の方法。 Clause 6A. The method of Clause 5A, wherein each second refined motion vector is used in the third pass, subblocks of the third pass have a predetermined maximum width and a predetermined maximum height, and applying the third pass derives a third refined motion vector for at least one respective subblock of the third pass.

条項7A. マルチパスDMVRが反復的である、条項1Aから6Aのいずれかの組合せに記載の方法。 Clause 7A. The method of any combination of Clauses 1A through 6A, wherein the multi-pass DMVR is iterative.

条項8A. パスのサブブロックが、先行するパスのブロックまたはサブブロックのサイズ以下である、条項1Aから7Aのいずれかの組合せに記載の方法。 Clause 8A. The method of any combination of Clauses 1A through 7A, wherein the subblocks of a pass are less than or equal in size to the blocks or subblocks of the preceding pass.

条項9A. マルチパスDMVRの所与のパスを飛ばすかどうかを決定するステップと、所与のパスを飛ばすという決定に基づいてマルチパスDMVRの所与のパスを飛ばすステップとをさらに備える、条項1Aから8Aのいずれかの組合せに記載の方法。 Clause 9A. The method of any combination of Clauses 1A through 8A, further comprising: determining whether to skip a given pass of the multi-pass DMVR; and skipping the given pass of the multi-pass DMVR based on a determination to skip the given pass.

条項10A. 所与のパスを飛ばすかどうかを決定するステップが、先行するパスからの所与の改良された動きベクトルが最適であると決定するステップを備える、条項9Aに記載の方法。 Clause 10A. The method of Clause 9A, wherein determining whether to skip the given pass comprises determining that a given refined motion vector from a preceding pass is optimal.

条項11A. 所与のサブブロックサイズを(P,W)の小さい方および(Q,H)の小さい方であるものと決定するステップをさらに備え、PおよびQがあらかじめ定められた整数である、条項1Aから10Aのいずれかの組合せに記載の方法。 Clause 11A. A method according to any combination of Clauses 1A to 10A, further comprising the step of determining that a given subblock size is the smaller of (P, W) and the smaller of (Q, H), wherein P and Q are predetermined integers.

条項12A. PおよびQがハードウェア制約に基づく、条項11Aに記載の方法。 Clause 12A. The method described in Clause 11A, where P and Q are based on hardware constraints.

条項13A. 特定のサブブロックのためのマルチパスDMVRのサブブロックパスを飛ばすかどうかを決定するステップと、サブブロックパスを飛ばすという決定に基づいて特定のサブブロックのためのマルチパスDMVRのサブブロックパスを飛ばすステップとをさらに備える、条項1Aから12Aのいずれかの組合せに記載の方法。 Clause 13A. The method of any combination of Clauses 1A through 12A, further comprising: determining whether to skip a subblock pass of the multi-pass DMVR for a particular subblock; and skipping the subblock pass of the multi-pass DMVR for the particular subblock based on a determination to skip the subblock pass.

Clause 14A. The method of Clause 13A, wherein determining whether to skip the subblock pass comprises determining that a refined motion vector of the subblock from a preceding pass is optimal.

Clause 15A. The method of any combination of Clauses 1A through 14A, wherein at least one pass of the multi-pass DMVR comprises applying bi-directional optical flow.

Clause 16A. The method of any combination of Clauses 1A through 15A, wherein at least one pass of the multi-pass DMVR comprises applying bilateral matching.

Clause 17A. The method of any combination of Clauses 1A through 16A, wherein at least one pass of the multi-pass DMVR comprises applying template matching.

Clause 18A. The method of any combination of Clauses 1A through 17A, wherein at least one pass of the multi-pass DMVR comprises applying an interpolation filter.

Clause 19A. The method of any combination of Clauses 1A through 18A, wherein at least one pass of the multi-pass DMVR comprises applying a simplified interpolation filter.

Clause 20A. The method of any combination of Clauses 1A through 19A, further comprising determining whether to skip all passes of the multi-pass DMVR, and, based on a determination to skip all passes of the multi-pass DMVR, coding the block based on an initial motion vector.

Clause 21A. The method of any of Clauses 1A through 20A, wherein coding comprises decoding.

Clause 22A. The method of any of Clauses 1A through 21A, wherein coding comprises encoding.

Clause 23A. A device for coding video data, the device comprising one or more means for performing the method of any of Clauses 1A through 22A.

Clause 24A. The device of Clause 23A, wherein the one or more means comprise one or more processors implemented in circuitry.

Clause 25A. The device of either Clause 23A or 24A, further comprising a memory to store the video data.

Clause 26A. The device of any combination of Clauses 23A through 25A, further comprising a display configured to display decoded video data.

Clause 27A. The device of any combination of Clauses 23A through 26A, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

Clause 28A. The device of any combination of Clauses 23A through 27A, wherein the device comprises a video decoder.

Clause 29A. The device of any combination of Clauses 23A through 28A, wherein the device comprises a video encoder.

Clause 30A. A computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to perform the method of any of Clauses 1A through 22A.
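Clauses 8A and 11A limit how each pass may partition the block. The sketch below assumes per-pass (P, Q) caps of 16 and 8 luma samples, taken from Clauses 8B and 9B; Clause 11A itself only requires predetermined integers. Under that assumption it computes the unit size each pass would use and the grid of subblocks that tiles the block. The function names are hypothetical and serve only to illustrate the arithmetic.

```python
def subblock_size(w, h, p, q):
    """Clause 11A: a subblock is min(P, W) wide and min(Q, H) tall."""
    return min(p, w), min(q, h)


def pass_unit_sizes(w, h, caps):
    """Unit size used by each pass for a w x h block.

    `caps` holds one hypothetical (P, Q) pair per pass, or None for a
    block-based pass that works on the whole block. Raises ValueError if a
    pass would use larger units than the pass before it (ruled out by Clause 8A).
    """
    sizes = []
    prev = (w, h)
    for cap in caps:
        cur = (w, h) if cap is None else subblock_size(w, h, *cap)
        if cur[0] > prev[0] or cur[1] > prev[1]:
            raise ValueError(f"pass unit {cur} is larger than previous unit {prev}")
        sizes.append(cur)
        prev = cur
    return sizes


def subblock_origins(w, h, sw, sh):
    """Top-left offsets, in raster order, of the sw x sh units tiling a w x h block."""
    return [(x, y) for y in range(0, h, sh) for x in range(0, w, sw)]


if __name__ == "__main__":
    caps = [None, (16, 16), (8, 8)]  # assumed caps: block, 16x16, 8x8
    for w, h in [(64, 32), (32, 8), (8, 8)]:
        sizes = pass_unit_sizes(w, h, caps)
        counts = [len(subblock_origins(w, h, *s)) for s in sizes]
        print((w, h), sizes, counts)
```

For a 32×8 block, the min() rule gives 16×8 second-pass units rather than 16×16. Blocks no larger than a cap are therefore handled as a single unit in that pass.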

Clause 1B. A method of decoding video data, the method comprising: applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of the video data to determine at least one refined motion vector; and decoding the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and applied to the block of video data; a second pass that is subblock-based and applied to at least one second-pass subblock of the block of video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of video data and a height of the second-pass subblock is less than or equal to a height of the block of video data; and a third pass that is subblock-based and applied to at least one third-pass subblock of the block of video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock and a height of the third-pass subblock is less than or equal to the height of the second-pass subblock.

Clause 2B. The method of Clause 1B, wherein the at least one third-pass subblock of the block of video data is a subblock of the at least one second-pass subblock of the block of video data.

Clause 3B. The method of Clause 1B or Clause 2B, wherein applying the first pass derives at least one first refined motion vector for the block of video data, and the at least one first refined motion vector is used in the second pass.

Clause 4B. The method of Clause 3B, wherein applying the second pass derives at least one second refined motion vector for each respective second-pass subblock of the at least one second-pass subblock, and the at least one second refined motion vector is used in the third pass.

Clause 5B. The method of Clause 4B, wherein applying the third pass derives at least one third refined motion vector for each respective third-pass subblock of the at least one third-pass subblock, and the at least one refined motion vector is determined to be the at least one third refined motion vector.

Clause 6B. The method of any combination of Clauses 1B through 5B, wherein at least one pass of the multi-pass DMVR comprises applying bi-directional optical flow (BDOF) or applying bilateral matching.

Clause 7B. The method of Clause 6B, wherein the first pass comprises applying bilateral matching, the second pass comprises applying bilateral matching, and the third pass comprises applying BDOF.

Clause 8B. The method of any combination of Clauses 1B through 7B, wherein the at least one second-pass subblock has a predetermined maximum width of 16 luma samples and a predetermined maximum height of 16 luma samples.

Clause 9B. The method of any combination of Clauses 1B through 8B, wherein the at least one third-pass subblock has a predetermined maximum width of 8 luma samples and a predetermined maximum height of 8 luma samples.

Clause 10B. The method of any combination of Clauses 1B through 9B, wherein a range of delta motion values for at least one of the first pass or the second pass is [-8, 8] in the horizontal direction and [-8, 8] in the vertical direction, and a range of delta motion values for the third pass is [-2, 2] in the horizontal direction and [-2, 2] in the vertical direction.

Clause 11B. The method of any combination of Clauses 1B through 10B, wherein the block of video data is a first block, the method further comprising applying a shortened multi-pass DMVR to a motion vector for a second block of the video data, wherein applying the shortened multi-pass DMVR comprises: determining to skip a given pass of the multi-pass DMVR for the second block; and skipping, based on the determination to skip the given pass, the given pass of the multi-pass DMVR for the second block.

Clause 12B. The method of Clause 11B, wherein determining to skip the given pass is based on a result of a preceding pass.

Clause 13B. The method of any combination of Clauses 1B through 12B, wherein the block of video data is a first block, the method further comprising applying a shortened multi-pass DMVR to a motion vector for a second block of the video data, wherein applying the shortened multi-pass DMVR comprises: determining to skip a given subblock-based pass of the multi-pass DMVR for a particular sub-area of the second block, the particular sub-area comprising one or more subblocks of the second block; and skipping, based on the determination to skip the given subblock-based pass, the given subblock-based pass of the multi-pass DMVR for the particular sub-area of the second block.

Clause 14B. The method of Clause 13B, wherein determining to skip the given subblock-based pass is based on a result of a preceding pass.

Clause 15B. The method of any combination of Clauses 1B through 10B, wherein the block is a first block of the video data, the method further comprising: determining not to apply DMVR to a second block of the video data; skipping, based on the determination not to apply DMVR to the second block, all passes of the multi-pass DMVR for the second block; and decoding the second block based on an initial motion vector for the second block.

Clause 16B. A device for decoding video data, the device comprising: a memory configured to store the video data; and one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of the video data to determine at least one refined motion vector, and to decode the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and applied to the block of video data; a second pass that is subblock-based and applied to at least one second-pass subblock of the block of video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of video data and a height of the second-pass subblock is less than or equal to a height of the block of video data; and a third pass that is subblock-based and applied to at least one third-pass subblock of the block of video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock and a height of the third-pass subblock is less than or equal to the height of the second-pass subblock.

Clause 17B. The device of Clause 16B, wherein the at least one third-pass subblock of the block of video data is a subblock of the at least one second-pass subblock of the block of video data.

Clause 18B. The device of Clause 16B or Clause 17B, wherein the one or more processors are configured to apply the first pass to derive at least one first refined motion vector for the block of video data, and to use the at least one first refined motion vector in the second pass.

Clause 19B. The device of Clause 18B, wherein the one or more processors are configured to apply the second pass to derive at least one second refined motion vector for each respective second-pass subblock of the at least one second-pass subblock, and to use the at least one second refined motion vector in the third pass.

Clause 20B. The device of Clause 19B, wherein the one or more processors are configured to apply the third pass to derive at least one third refined motion vector for each respective third-pass subblock of the at least one third-pass subblock, and to determine the at least one refined motion vector to be the at least one third refined motion vector.

Clause 21B. The device of any combination of Clauses 16B through 20B, wherein at least one pass of the multi-pass DMVR comprises applying bi-directional optical flow (BDOF) or applying bilateral matching.

Clause 22B. The device of Clause 21B, wherein the first pass comprises applying bilateral matching, the second pass comprises applying bilateral matching, and the third pass comprises applying BDOF.

Clause 23B. The device of any combination of Clauses 16B through 22B, wherein the at least one second-pass subblock has a predetermined maximum width of 16 luma samples and a predetermined maximum height of 16 luma samples.

Clause 24B. The device of any combination of Clauses 16B through 23B, wherein the at least one third-pass subblock has a predetermined maximum width of 8 luma samples and a predetermined maximum height of 8 luma samples.

Clause 25B. The device of any combination of Clauses 16B through 24B, wherein a range of delta motion values for at least one of the first pass or the second pass is [-8, 8] in the horizontal direction and [-8, 8] in the vertical direction, and a range of delta motion values for the third pass is [-2, 2] in the horizontal direction and [-2, 2] in the vertical direction.

Clause 26B. The device of any combination of Clauses 16B through 25B, wherein the block of video data is a first block and the one or more processors are configured to apply a shortened multi-pass DMVR to a motion vector for a second block of the video data, and wherein, to apply the shortened multi-pass DMVR to the motion vector for the second block, the one or more processors are configured to determine to skip a given pass of the multi-pass DMVR for the second block and, based on the determination to skip the given pass, skip the given pass of the multi-pass DMVR for the second block.

Clause 27B. The device of Clause 26B, wherein the one or more processors are configured to determine to skip the given pass based on a result of a preceding pass.

Clause 28B. The device of any combination of Clauses 16B through 27B, wherein the block of video data is a first block and the one or more processors are configured to apply a shortened multi-pass DMVR to a motion vector for a second block of the video data, and wherein, to apply the shortened multi-pass DMVR to the motion vector for the second block, the one or more processors are configured to: determine to skip a given subblock-based pass of the multi-pass DMVR for a particular sub-area of the second block of the video data, the particular sub-area comprising one or more subblocks of the second block; and skip, based on the determination to skip the given subblock-based pass, the given subblock-based pass of the multi-pass DMVR for the particular sub-area of the second block.

Clause 29B. The device of Clause 28B, wherein the one or more processors are configured to determine to skip the given subblock-based pass based on a result of a preceding pass.

Clause 30B. The device of any combination of Clauses 16B through 25B, wherein the block is a first block of the video data, and the one or more processors are further configured to: determine not to apply DMVR to a second block of the video data; skip, based on the determination not to apply DMVR to the second block, all passes of the multi-pass DMVR for the second block; and decode the second block based on an initial motion vector for the second block.

Clause 31B. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors to apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of video data to determine at least one refined motion vector, and to decode the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and applied to the block of video data; a second pass that is subblock-based and applied to at least one second-pass subblock of the block of video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of video data and a height of the second-pass subblock is less than or equal to a height of the block of video data; and a third pass that is subblock-based and applied to at least one third-pass subblock of the block of video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock and a height of the third-pass subblock is less than or equal to the height of the second-pass subblock.

Clause 33B. A device for coding video data, the device comprising: means for applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of the video data to determine at least one refined motion vector; and means for decoding the block based on the at least one refined motion vector, wherein the multi-pass DMVR comprises: a first pass that is block-based and applied to the block of video data; a second pass that is subblock-based and applied to at least one second-pass subblock of the block of video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of video data and a height of the second-pass subblock is less than or equal to a height of the block of video data; and a third pass that is subblock-based and applied to at least one third-pass subblock of the block of video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock and a height of the third-pass subblock is less than or equal to the height of the second-pass subblock.
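The three-pass cascade of Clauses 1B and 7B through 10B can be illustrated with a simplified sketch. It is not the reference implementation and rests on several assumptions:

- Only integer-sample positions are searched, so no interpolation is performed.
- SAD is used as the matching cost, and references are clamped at the picture edge.
- Bilateral matching uses mirrored offsets: MV0 + d in the list-0 reference and MV1 - d in the list-1 reference.
- The third pass uses a [-2, 2] bilateral search as a plain stand-in for BDOF, which Clause 7B names for that pass.

The function names and the `PASSES` table are hypothetical.

```python
# Hypothetical pass table: (max subblock width, max subblock height, search range).
# None means block-based. Sizes and ranges follow Clauses 8B to 10B.
PASSES = [(None, None, 8), (16, 16, 8), (8, 8, 2)]


def fetch(ref, x, y, w, h):
    """Copy a w x h patch at integer position (x, y), clamping at picture edges."""
    rows, cols = len(ref), len(ref[0])
    return [[ref[min(max(y + j, 0), rows - 1)][min(max(x + i, 0), cols - 1)]
             for i in range(w)] for j in range(h)]


def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def bilateral_search(ref0, ref1, x, y, w, h, mv0, mv1, rng):
    """Full integer search over mirrored offsets d in [-rng, rng]^2."""
    def cost(dx, dy):
        p0 = fetch(ref0, x + mv0[0] + dx, y + mv0[1] + dy, w, h)
        p1 = fetch(ref1, x + mv1[0] - dx, y + mv1[1] - dy, w, h)
        return sad(p0, p1)

    best, best_cost = (0, 0), cost(0, 0)  # ties keep the incoming MV pair
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            c = cost(dx, dy)
            if c < best_cost:
                best, best_cost = (dx, dy), c
    return best, best_cost


def multi_pass_dmvr(ref0, ref1, x, y, w, h, mv0, mv1, passes=PASSES):
    """Refine (mv0, mv1) pass by pass; each pass splits the unit it receives.

    Returns (x, y, w, h, mv0, mv1) for every unit produced by the last pass.
    Assumes w and h are multiples of the subblock caps (e.g. powers of two).
    """
    if not passes:
        return [(x, y, w, h, mv0, mv1)]
    max_w, max_h, rng = passes[0]
    sw = w if max_w is None else min(max_w, w)
    sh = h if max_h is None else min(max_h, h)
    units = []
    for sy in range(y, y + h, sh):
        for sx in range(x, x + w, sw):
            (dx, dy), _ = bilateral_search(ref0, ref1, sx, sy, sw, sh, mv0, mv1, rng)
            new0 = (mv0[0] + dx, mv0[1] + dy)
            new1 = (mv1[0] - dx, mv1[1] - dy)
            units += multi_pass_dmvr(ref0, ref1, sx, sy, sw, sh, new0, new1, passes[1:])
    return units


if __name__ == "__main__":
    # Synthetic pictures: ref1 is ref0 shifted 4 samples to the right, so the
    # mirrored search should settle on mv0 = (-2, 0) and mv1 = (2, 0).
    ref0 = [[x * x + 3 * y * y for x in range(64)] for y in range(64)]
    ref1 = [[(x - 4) ** 2 + 3 * y * y for x in range(64)] for y in range(64)]
    # Expected: four 8x8 units, each with mv0 = (-2, 0) and mv1 = (2, 0).
    for unit in multi_pass_dmvr(ref0, ref1, 24, 24, 16, 16, (0, 0), (0, 0)):
        print(unit)
```

Each pass starts from the motion vectors produced by the previous pass, so later passes only search a small neighborhood around an already refined result. This mirrors the chaining described in Clauses 3B to 5B. Skipping a pass, as in Clauses 11B to 14B, would amount to removing its entry from `passes` for a given block or sub-area.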

It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more DSPs, general-purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the terms “processor” and “processing circuitry,” as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

102 Source Device
104 Video Source
106 Memory
108 Output Interface
110 Computer-Readable Medium
112 Storage Device
114 File Server
116 Destination Device
118 Display Device
120 Memory
122 Input Interface
130 QTBT Structure
132 Coding Tree Unit
200 Video Encoder
202 Mode Selection Unit
204 Residual Generation Unit
206 Transform Processing Unit
208 Quantization Unit
210 Inverse Quantization Unit
212 Inverse Transform Processing Unit
214 Reconstruction Unit
216 Filter Unit
218 Decoded Picture Buffer
220 Entropy Encoding Unit
222 Motion Estimation Unit
224 Motion Compensation Unit
226 Intra-Prediction Unit
230 Video Data Memory
300 Video Decoder
302 Entropy Decoding Unit
304 Prediction Processing Unit
306 Inverse Quantization Unit
308 Inverse Transform Processing Unit
310 Reconstruction Unit
312 Filter Unit
314 DPB
316 Motion Compensation Unit
317 MPDMVR
318 Intra-Prediction Unit
320 CPB Memory
500 PU0
502 PU0
600 Block
602 Block
610 Collocated MV
700 Template
702 Current Picture
706 Reference Picture
800 TD0
802 TD1
804 Current Picture
806 Reference Picture
808 Reference Picture
810 TD0
812 TD1
900 Initial MV
902 3×3 Search Pattern
904 MV
906 Last Selected MV
1000 Block
1002 Block
1100 CU
1102 Subblock
1200 Coding Block
1202 First Pass
1204A-1204B Subblocks
1208A-1208H Subblocks

Claims

A method of decoding video data, the method comprising:
applying multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of the video data to determine at least one refined motion vector; and
decoding the block based on the at least one refined motion vector,
wherein the multi-pass DMVR comprises:
a first pass that is block-based and is applied to the block of the video data;
a second pass that is subblock-based and is applied to at least one second-pass subblock of the block of the video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of the video data, and a height of the second-pass subblock is less than or equal to a height of the block of the video data; and
a third pass that is subblock-based and is applied to at least one third-pass subblock of the block of the video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock, a height of the third-pass subblock is less than or equal to the height of the second-pass subblock, and the at least one third-pass subblock of the block of the video data is a subblock of the at least one second-pass subblock of the block of the video data.

The method of claim 1, wherein:
applying the first pass derives at least one first refined motion vector for the block of the video data, and the at least one first refined motion vector is used in the second pass;
applying the second pass derives at least one second refined motion vector for each of the at least one second-pass subblock, and the at least one second refined motion vector is used in the third pass; and
applying the third pass derives at least one third refined motion vector for each of the at least one third-pass subblock, and the at least one refined motion vector is determined to be the at least one third refined motion vector.
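The data flow recited in claims 1 and 2 can be traced with a minimal Python sketch. It is a hypothetical model, not the decoder's implementation: the three refinement functions are placeholders supplied by the caller, block dimensions are assumed to be powers of two, and the 16 and 8 luma-sample subblock maxima are borrowed from claim 4 as defaults. It shows one pass-1 motion vector for the whole block, one pass-2 motion vector per second-pass subblock seeded from pass 1, and one pass-3 motion vector per third-pass subblock, each nested inside and seeded from its second-pass subblock.

```python
from typing import Callable, Dict, Iterator, Tuple

MV = Tuple[int, int]              # (mv_x, mv_y); units left abstract in this sketch
Rect = Tuple[int, int, int, int]  # (x, y, width, height) in luma samples
RefineFn = Callable[[Rect, MV], MV]  # hypothetical per-pass refinement callback


def tile(rect: Rect, max_w: int, max_h: int) -> Iterator[Rect]:
    """Split rect into equal subblocks no larger than max_w x max_h.

    Assumes power-of-two dimensions, so min(width, max_w) divides width.
    """
    x0, y0, w, h = rect
    sw, sh = min(w, max_w), min(h, max_h)
    for y in range(y0, y0 + h, sh):
        for x in range(x0, x0 + w, sw):
            yield (x, y, sw, sh)


def multipass_dmvr(block: Rect, init_mv: MV,
                   pass1: RefineFn, pass2: RefineFn, pass3: RefineFn,
                   sub2: int = 16, sub3: int = 8) -> Dict[Rect, MV]:
    """Return the final (pass-3) MV for every third-pass subblock."""
    mv1 = pass1(block, init_mv)              # block-based first pass
    final: Dict[Rect, MV] = {}
    for sb2 in tile(block, sub2, sub2):
        mv2 = pass2(sb2, mv1)                # seeded with the pass-1 result
        for sb3 in tile(sb2, sub3, sub3):    # nested inside sb2
            final[sb3] = pass3(sb3, mv2)     # seeded with the pass-2 result
    return final


# Toy refinements that only add fixed offsets, to make the data flow visible.
out = multipass_dmvr((0, 0, 32, 16), (4, -2),
                     pass1=lambda r, mv: (mv[0] + 1, mv[1]),
                     pass2=lambda r, mv: (mv[0], mv[1] + 1),
                     pass3=lambda r, mv: (mv[0] + r[0] // 8, mv[1]))
print(len(out))             # 8
print(out[(24, 8, 8, 8)])   # (8, -1)
```

With these toy offset functions, the 32×16 block yields two second-pass subblocks and eight third-pass subblocks, and each final vector accumulates the offsets of all three passes.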

The method of claim 1, wherein at least one pass of the multi-pass DMVR comprises applying bi-directional optical flow (BDOF) or bilateral matching,
and wherein the first pass comprises applying bilateral matching, the second pass comprises applying bilateral matching, and the third pass comprises applying BDOF.
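Claim 3 names bilateral matching for the first two passes. A common formulation of bilateral matching in decoder-side refinement, assumed here rather than taken from the claims, tests a delta applied with opposite signs to the two motion vectors (MV0 + Δ in reference list 0, MV1 - Δ in reference list 1) and keeps the delta whose two predictions differ least. The sketch below further assumes an integer-sample full search, a SAD cost, and plain 2-D lists standing in for reference pictures; sub-sample interpolation, cost weighting, and BDOF are omitted, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Picture = List[List[int]]  # luma samples indexed [row][col]
MV = Tuple[int, int]


def fetch(pic: Picture, x: int, y: int, w: int, h: int) -> List[List[int]]:
    """Integer-sample block fetch with edge clamping (no interpolation)."""
    rows, cols = len(pic), len(pic[0])
    return [[pic[min(max(y + j, 0), rows - 1)][min(max(x + i, 0), cols - 1)]
             for i in range(w)] for j in range(h)]


def sad(a: List[List[int]], b: List[List[int]]) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def bilateral_match(ref0: Picture, ref1: Picture, x: int, y: int, w: int, h: int,
                    mv0: MV, mv1: MV, search_range: int) -> Tuple[MV, int]:
    """Return (best_delta, best_cost) over a square integer search window.

    Each candidate uses mirrored offsets: mv0 + delta in ref0, mv1 - delta in ref1.
    """
    best_delta: MV = (0, 0)
    best_cost: Optional[int] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            p0 = fetch(ref0, x + mv0[0] + dx, y + mv0[1] + dy, w, h)
            p1 = fetch(ref1, x + mv1[0] - dx, y + mv1[1] - dy, w, h)
            cost = sad(p0, p1)
            if best_cost is None or cost < best_cost:
                best_delta, best_cost = (dx, dy), cost
    assert best_cost is not None  # search_range >= 0 guarantees one candidate
    return best_delta, best_cost


pic = [[31 * c + 17 * r for c in range(32)] for r in range(32)]
print(bilateral_match(pic, pic, 8, 8, 4, 4, (0, 0), (2, 0), 2))  # ((1, 0), 0)
```

Using one picture for both references with MV0 = (0, 0) and MV1 = (2, 0), the mirrored predictions coincide only at Δ = (1, 0), which the search returns with zero cost.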

The method of claim 1, wherein the at least one second-pass subblock has a predetermined maximum width of 16 luma samples and a predetermined maximum height of 16 luma samples, and/or
the at least one third-pass subblock has a predetermined maximum width of 8 luma samples and a predetermined maximum height of 8 luma samples.

The method of claim 1, wherein a delta motion range for at least one of the first pass or the second pass is [-8, 8] horizontally and [-8, 8] vertically, and a delta motion range for the third pass is [-2, 2] horizontally and [-2, 2] vertically.
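Claim 5 bounds how far each pass may move the motion vector it starts from. The sketch below tabulates those ranges and reports how many integer positions an exhaustive search over each window would visit. It assumes the ranges are inclusive and treats deltas as integer steps, since the claim does not state the unit; the pass labels and helper names are hypothetical, and while the claim requires [-8, 8] only for at least one of the first two passes, the table applies it to both.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # inclusive (min, max)

# Delta motion ranges from the claim as (horizontal, vertical).
# Keys are hypothetical labels; [-8, 8] is applied to both of the first two
# passes here, although the claim requires it for at least one of them.
PASS_RANGES: Dict[str, Tuple[Range, Range]] = {
    "pass1": ((-8, 8), (-8, 8)),
    "pass2": ((-8, 8), (-8, 8)),
    "pass3": ((-2, 2), (-2, 2)),
}


def in_range(pass_name: str, delta: Tuple[int, int]) -> bool:
    """True if an integer delta (dx, dy) lies inside the pass's window."""
    (h_lo, h_hi), (v_lo, v_hi) = PASS_RANGES[pass_name]
    return h_lo <= delta[0] <= h_hi and v_lo <= delta[1] <= v_hi


def window_positions(pass_name: str) -> int:
    """Number of integer positions an exhaustive search would test."""
    (h_lo, h_hi), (v_lo, v_hi) = PASS_RANGES[pass_name]
    return (h_hi - h_lo + 1) * (v_hi - v_lo + 1)


print(window_positions("pass1"))   # 289 (17 x 17)
print(window_positions("pass3"))   # 25 (5 x 5)
print(in_range("pass3", (3, 0)))   # False
```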

The method of claim 1, wherein the block of the video data is a first block, the method further comprising applying a shortened multi-pass DMVR to a motion vector for a second block of the video data, wherein applying the shortened multi-pass DMVR comprises either:
determining, based on a result of a preceding pass, to skip a given pass of the multi-pass DMVR for the second block, and
skipping the given pass of the multi-pass DMVR for the second block based on the determination to skip the given pass;
or
determining, based on a result of a preceding pass, to skip a given subblock-based pass of the multi-pass DMVR for a particular sub-area of the second block of the video data, wherein the particular sub-area comprises one or more subblocks of the second block, and
skipping the given subblock-based pass of the multi-pass DMVR for the particular sub-area of the second block based on the determination to skip the given subblock-based pass for the particular sub-area.

The method of claim 1, wherein the block is a first block of the video data, the method further comprising:
determining not to apply DMVR to a second block of the video data;
based on the determination not to apply DMVR to the second block, skipping all passes of the multi-pass DMVR for the second block; and
decoding the second block based on an initial motion vector for the second block.
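Claims 6 and 7 describe control flow around the passes: a shortened multi-pass DMVR that skips a pass, or a subblock-based pass for some sub-areas, based on a preceding pass's result, and skipping every pass when DMVR is not applied. The claims do not say what result triggers the skip, so the cost threshold `early_exit_cost` and the rule "stop once a pass's matching cost is at or below the threshold" are assumptions for illustration, as are all function names. Each pass is modeled as a callable returning a refined motion vector and its matching cost.

```python
from typing import Callable, Dict, List, Tuple

MV = Tuple[int, int]
PassFn = Callable[[MV], Tuple[MV, int]]  # returns (refined MV, matching cost)


def shortened_dmvr(init_mv: MV, passes: List[PassFn], apply_dmvr: bool,
                   early_exit_cost: int) -> Tuple[MV, List[int]]:
    """Return the final MV and the indices of the passes that were run."""
    if not apply_dmvr:
        # DMVR not applied: skip every pass and decode with the initial MV.
        return init_mv, []
    mv: MV = init_mv
    ran: List[int] = []
    for i, run_pass in enumerate(passes):
        mv, cost = run_pass(mv)
        ran.append(i)
        if cost <= early_exit_cost:
            # Hypothetical rule: a low cost from this pass skips the remaining passes.
            break
    return mv, ran


def subareas_needing_pass(prev_costs: Dict[int, int],
                          early_exit_cost: int) -> List[int]:
    """Sub-area indices that still get the next subblock-based pass.

    The other sub-areas skip it, based on the preceding pass's per-sub-area cost.
    """
    return sorted(i for i, c in prev_costs.items() if c > early_exit_cost)


passes = [lambda mv: ((mv[0] + 1, mv[1]), 50),
          lambda mv: ((mv[0], mv[1] + 1), 4),
          lambda mv: ((mv[0] + 9, mv[1]), 0)]
print(shortened_dmvr((0, 0), passes, True, 8))    # ((1, 1), [0, 1])
print(shortened_dmvr((3, 3), passes, False, 8))   # ((3, 3), [])
print(subareas_needing_pass({0: 2, 1: 20, 2: 9, 3: 8}, 8))  # [1, 2]
```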

A device for decoding video data, the device comprising:
a memory configured to store the video data; and
one or more processors implemented in circuitry and communicatively coupled to the memory, the one or more processors being configured to:
apply multi-pass decoder-side motion vector refinement (DMVR) to a motion vector for a block of the video data to determine at least one refined motion vector; and
decode the block based on the at least one refined motion vector,
wherein the multi-pass DMVR comprises:
a first pass that is block-based and is applied to the block of the video data;
a second pass that is subblock-based and is applied to at least one second-pass subblock of the block of the video data, wherein a width of the second-pass subblock is less than or equal to a width of the block of the video data, and a height of the second-pass subblock is less than or equal to a height of the block of the video data; and
a third pass that is subblock-based and is applied to at least one third-pass subblock of the block of the video data, wherein a width of the third-pass subblock is less than or equal to the width of the second-pass subblock, a height of the third-pass subblock is less than or equal to the height of the second-pass subblock, and the at least one third-pass subblock of the block of the video data is a subblock of the at least one second-pass subblock of the block of the video data.

The device of claim 8, wherein the one or more processors are configured to:
apply the first pass to derive at least one first refined motion vector for the block of the video data, and use the at least one first refined motion vector in the second pass;
apply the second pass to derive at least one second refined motion vector for each of the at least one second-pass subblock, and use the at least one second refined motion vector in the third pass; and
apply the third pass to derive at least one third refined motion vector for each of the at least one third-pass subblock, and determine the at least one refined motion vector to be the at least one third refined motion vector.

The device of claim 8, wherein at least one pass of the multi-pass DMVR comprises applying bi-directional optical flow (BDOF) or bilateral matching, and wherein the first pass comprises applying bilateral matching, the second pass comprises applying bilateral matching, and the third pass comprises applying BDOF.

The device of claim 8, wherein the at least one second-pass subblock has a predetermined maximum width of 16 luma samples and a predetermined maximum height of 16 luma samples, and/or
the at least one third-pass subblock has a predetermined maximum width of 8 luma samples and a predetermined maximum height of 8 luma samples.

The device of claim 8, wherein a delta motion range for at least one of the first pass or the second pass is [-8, 8] horizontally and [-8, 8] vertically, and a delta motion range for the third pass is [-2, 2] horizontally and [-2, 2] vertically.

The device of claim 8, wherein the block of the video data is a first block, and the one or more processors are configured to apply a shortened multi-pass DMVR to a motion vector for a second block of the video data, wherein, to apply the shortened multi-pass DMVR to the motion vector for the second block, the one or more processors are configured either to:
determine, based on a result of a preceding pass, to skip a given pass of the multi-pass DMVR for the second block, and
skip the given pass of the multi-pass DMVR for the second block based on the determination to skip the given pass;
or to:
determine, based on a result of a preceding pass, to perform a given subblock-based pass of the multi-pass DMVR for a particular sub-area of the second block of the video data, wherein the particular sub-area comprises one or more subblocks of the second block, and
perform the given subblock-based pass of the multi-pass DMVR for the particular sub-area of the second block based on the determination to perform the given subblock-based pass for the particular sub-area.

The device of claim 8, wherein the block is a first block of the video data, and the one or more processors are further configured to:
determine not to apply DMVR to a second block of the video data;
based on the determination not to apply DMVR to the second block, skip all passes of the multi-pass DMVR for the second block; and
decode the second block based on an initial motion vector for the second block.

A computer program that, when executed, causes one or more processors to perform the method of any one of claims 1 to 7.