NZ745288B2

NZ745288B2 - Decoding video data using a two-level multi-type-tree framework

Info

Publication number: NZ745288B2
Application number: NZ745288A
Authority: NZ
Inventors: Jianle Chen; Hsiao Chiang Chuang; Marta Karczewicz; Xiang Li; Li Zhang; Xin Zhao; Feng Zou
Original assignee: Qualcomm Incorporated
Filing date: 2017-03-21
Publication date: 2024-12-03

Abstract

An example device for decoding video data includes a video decoder configured to decode one or more syntax elements at a region-tree level of a region-tree of a tree data structure for a coding tree block (CTB) of video data, the region-tree having one or more region-tree nodes including region-tree leaf and non-leaf nodes, each of the region-tree non-leaf nodes having at least four child region-tree nodes, decode one or more syntax elements at a prediction-tree level for each of the region-tree leaf nodes of one or more prediction trees of the tree data structure for the CTB, the prediction trees each having one or more prediction-tree leaf and non-leaf nodes, each of the prediction-tree non-leaf nodes having at least two child prediction-tree nodes, each of the prediction leaf nodes defining respective coding units (CUs), and decode video data for each of the CUs.
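The two-level structure described above can be illustrated with a minimal sketch. It is not the claimed decoder: the split labels (`quad`, `bin_h`, `bin_v`, `tri_h`, `tri_v`), the `Node` class, and the function names are hypothetical, and the 1/4, 1/2, 1/4 ratio used for the center-side triple split is an assumption that this record does not state. A region-tree leaf that carries a binary or triple split acts as the root of a prediction tree, and every unsplit node yields one CU rectangle.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

REGION_SPLITS = {"quad"}                                   # region-tree level
PREDICTION_SPLITS = {"bin_h", "bin_v", "tri_h", "tri_v"}   # prediction-tree level


@dataclass
class Node:
    """Hypothetical tree node; split == "none" marks a leaf."""
    split: str = "none"
    children: List["Node"] = field(default_factory=list)


def split_rect(rect: Rect, split: str) -> List[Rect]:
    """Return the child rectangles produced by one split of rect."""
    x, y, w, h = rect
    if split == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)]
    if split == "bin_h":  # top and bottom halves
        hh = h // 2
        return [(x, y, w, hh), (x, y + hh, w, h - hh)]
    if split == "bin_v":  # left and right halves
        hw = w // 2
        return [(x, y, hw, h), (x + hw, y, w - hw, h)]
    if split == "tri_h":  # assumed 1/4, 1/2, 1/4 of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, h - 2 * q), (x, y + h - q, w, q)]
    if split == "tri_v":  # assumed 1/4, 1/2, 1/4 of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, w - 2 * q, h), (x + w - q, y, q, h)]
    raise ValueError(f"unknown split type: {split}")


def collect_cus(node: Node, rect: Rect, level: str = "region") -> List[Rect]:
    """Walk the region tree, then the prediction trees, and list CU rectangles."""
    if node.split == "none":
        return [rect]  # an unsplit node defines one CU
    if node.split in REGION_SPLITS:
        if level != "region":
            raise ValueError("quad split is only allowed at the region-tree level")
        child_level = "region"
    elif node.split in PREDICTION_SPLITS:
        child_level = "prediction"  # this node is inside a prediction tree
    else:
        raise ValueError(f"unknown split type: {node.split}")
    rects = split_rect(rect, node.split)
    if len(rects) != len(node.children):
        raise ValueError("child count does not match split type")
    cus: List[Rect] = []
    for child, child_rect in zip(node.children, rects):
        cus.extend(collect_cus(child, child_rect, child_level))
    return cus


# Example: a 64x64 CTB split into four regions; two regions are split further
# by prediction trees (one center-side triple split, one binary split).
ctb = Node("quad", [
    Node("tri_v", [Node(), Node(), Node()]),
    Node("bin_h", [Node(), Node()]),
    Node(),
    Node(),
])
cus = collect_cus(ctb, (0, 0, 64, 64))
```

In this example the CTB yields seven CUs that together cover the full 64x64 area. Once the walk has entered a prediction tree, a quad split is rejected, which mirrors the separation of the two levels.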

Claims

1. A method of decoding video data, the method comprising: decoding one or more syntax elements at a region-tree level of a region-tree of a tree data structure for a coding tree block (CTB) of video data, the region-tree having one or more region-tree nodes including zero or more region-tree non-leaf nodes and one or more region-tree leaf nodes, each of the region-tree non-leaf nodes having four child region-tree nodes; determining, using the syntax elements at the region-tree level, how the region-tree nodes are split into the child region-tree nodes; decoding one or more syntax elements at a prediction-tree level for each of the region-tree leaf nodes of one or more prediction trees of the tree data structure for the CTB, the prediction trees each having root nodes corresponding to one or more of the region-tree leaf nodes and one or more prediction-tree nodes including zero or more prediction-tree non-leaf nodes and one or more prediction-tree leaf nodes, each of the prediction-tree non-leaf nodes having either two child prediction tree nodes or three child prediction tree nodes obtained using a center-side triple partitioning, with at least one prediction-tree non-leaf node having three child prediction tree nodes obtained using a center-side triple partitioning, each of the prediction-tree leaf nodes defining respective coding units (CUs); determining, using the syntax elements at the prediction-tree level, how the prediction-tree nodes are split into the child prediction-tree nodes; and decoding video data, including prediction data and transform data, for each of the CUs based at least in part on the syntax elements at the region-tree level and the syntax elements at the prediction-tree level, wherein the prediction data indicates a prediction mode for forming a predicted block for a corresponding one of the CUs, and wherein the transform data includes transform coefficients representing transformed residual data for the corresponding one of the CUs.

2. The method of claim 1, wherein decoding the syntax elements at the region-tree level and decoding the syntax elements at the prediction-tree level comprises decoding one or more no-further-split tree types for at least one of the region-tree level or the prediction-tree level, wherein a no-further-split tree type means that no further splitting is permitted.

3. The method of claim 1 or 2, further comprising decoding data representing a maximum region-tree depth for the region-tree level.

4. The method of claim 3, further comprising decoding data representing a maximum prediction-tree depth for the prediction-tree level.

5. The method of claim 4, wherein a sum of the maximum region-tree depth and the maximum prediction-tree depth is less than a maximum total depth value.

6. The method of any one of claims 3 to 5, wherein decoding the data representing the maximum region-tree depth comprises decoding the data representing the maximum region-tree depth from one or more of a sequence parameter set (SPS), a picture parameter set (PPS), or a slice header.

7. The method of any one of claims 1 to 6, further comprising decoding one syntax element that jointly represents both a maximum region-tree depth for the region-tree level and a maximum prediction-tree depth for the prediction-tree level.

8. The method of claim 7, wherein decoding the one syntax element that jointly represents the maximum region-tree depth and the maximum prediction-tree depth comprises decoding the one syntax element from one or more of a sequence parameter set (SPS), a picture parameter set (PPS), or a slice header.

9. The method of any one of claims 1 to 8, further comprising inferring that at least one node, comprising at least one of the region-tree nodes or at least one of the prediction-tree nodes, is split without decoding split data for the node.

10. The method of claim 9, wherein inferring comprises inferring that the at least one node is split based on a block to which the node corresponds crossing at least one of a picture boundary, a slice boundary, or a tile boundary.

11. The method of any one of claims 1 to 10, further comprising decoding data indicating whether the region-tree depth and the prediction-tree depth overlap with each other.

12. The method of any one of claims 1 to 11, wherein the video data for each of the CUs comprises one or more of a skip flag, a merge index, a syntax element representing whether the CU is predicted using inter-mode or intra-mode, mode prediction information, motion information, transform information, residual information, or quantization information.

13. The method of any one of claims 1 to 12, wherein decoding the one or more syntax elements at the region-tree level comprises decoding one or more syntax elements representing splitting information for the prediction trees before decoding the video data for each of the CUs.

14. The method of claim 13, further comprising determining the number of CUs using the splitting information before decoding the video data for each of the CUs.

15. The method of any one of claims 1 to 14, wherein decoding the syntax elements at the region-tree level comprises decoding one or more syntax elements representing one or more enabled coding tools at the region-tree level.

16. The method of claim 15, further comprising applying the coding tools across boundaries of the CUs when the boundaries of the CUs are within a common region as indicated by one of the region-tree leaf nodes of the region-tree.

17. The method of claim 15 or 16, wherein decoding the data representing one or more of the enabled coding tools comprises decoding overlapped block motion compensation (OBMC) mode information in each of the region-tree leaf nodes representing whether OBMC is enabled for blocks of the video data corresponding to the one of the region-tree leaf nodes.

18. The method of claim 15 or 16, wherein decoding the data representing one or more of the enabled coding tools comprises decoding overlapped transforms information in each of the region-tree leaf nodes representing whether overlapped transforms are enabled for blocks of the video data corresponding to the one of the region-tree leaf nodes, wherein overlapped transforms comprises a coding tool for which a transform block is permitted to overlap a boundary between two prediction blocks of a region corresponding to the one of the region-tree leaf nodes.

19. The method of claim 15 or 16, wherein decoding the data representing one or more of the enabled coding tools comprises decoding data in one or more of the region-tree leaf nodes indicating whether all of the CUs within a region corresponding to the region-tree leaf node are coded using one of skip mode, merge mode, intra-mode, inter-mode, or frame-rate upconversion (FRUC) mode, and wherein when all of the CUs within one of the regions are coded using a common mode, the method further comprises preventing decoding of mode information for the CUs at the CU level.

20. The method of any one of claims 1 to 19, wherein decoding the syntax elements at the region-tree level comprises decoding at least one of sample adaptive offset (SAO) parameters or adaptive loop filter (ALF) parameters.

21. The method of any one of claims 1 to 20, further comprising decoding one or more center-side triple tree syntax elements in at least one of the region-tree level or the prediction-tree level.

22. The method of any one of claims 1 to 21, further comprising calculating respective quantization parameters (QPs) for each of the CUs, wherein calculating the respective QPs comprises determining base QPs for each of the region-tree leaf nodes and calculating the respective QPs based on the base QPs of the corresponding region-tree leaf nodes of the CUs.

23. The method of any one of claims 1 to 22, further comprising encoding the video data prior to decoding the video data.

24. A device for decoding video data, the device comprising: a memory configured to store video data; and a processor implemented in circuitry and configured to carry out the method of any one of claims 1 to 23.

25. The device of claim 24, further comprising at least one of: a display configured to display the decoded video data; or a camera configured to capture the video data.

26. The device of claim 24 or 25, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

27. A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to carry out the method of any one of claims 1 to 23.

[Figure: block diagram showing source device 12, destination device 14, video source 18, display device 32, video encoder 20, video decoder 30, output interface 22, and input interface 28.]
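Claims 3, 9, and 10 describe a maximum region-tree depth and the inference that a node is split, without decoding split data, when its block crosses a picture boundary. The following is a minimal sketch of that parsing behaviour for the region-tree level only, not the claimed bitstream syntax. The function names, the one-bit split flag, the default `max_depth`, and the choice to skip blocks lying entirely outside the picture are all assumptions; slice and tile boundaries are omitted, and a block that still crosses the boundary at the depth limit is simply returned as a leaf.

```python
from typing import Iterator, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def crosses_picture_boundary(rect: Rect, pic_w: int, pic_h: int) -> bool:
    """True when the block extends past the right or bottom picture edge."""
    x, y, w, h = rect
    return x + w > pic_w or y + h > pic_h


def parse_region_tree(bits: Iterator[int], rect: Rect, pic_w: int, pic_h: int,
                      depth: int = 0, max_depth: int = 3) -> List[Rect]:
    """Return region-tree leaf rectangles, reading a flag only when needed."""
    x, y, w, h = rect
    if x >= pic_w or y >= pic_h:
        return []  # assumption: blocks fully outside the picture are skipped
    if depth >= max_depth:
        split = False  # depth limit reached: no flag is read
    elif crosses_picture_boundary(rect, pic_w, pic_h):
        split = True   # inferred split: no flag is read
    else:
        split = next(bits) == 1  # hypothetical one-bit split flag
    if not split:
        return [rect]
    hw, hh = w // 2, h // 2
    leaves: List[Rect] = []
    for cx, cy in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)):
        leaves += parse_region_tree(bits, (cx, cy, hw, hh), pic_w, pic_h,
                                    depth + 1, max_depth)
    return leaves


# Example: a 64x64 CTB at x = 64 in a 96x64 picture crosses the right edge,
# so its split is inferred; only the two in-picture children read a flag.
flags = iter([0, 0])
leaves = parse_region_tree(flags, (64, 0, 64, 64), pic_w=96, pic_h=64)
```

In this example two flags are consumed rather than three, because the split of the CTB itself is inferred from its position.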