Method and apparatus for selecting transform selection in an encoder and decoder
11558613 · 2023-01-17
Inventors
- Christopher HOLLMANN (UPPSALA, SE)
- Davood SAFFAR (Solna, SE)
- Jacob STRÖM (Stockholm, SE)
- Per Wennersten (Årsta, SE)
CPC classification
- H04N19/12 (ELECTRICITY)
- H04N19/70 (ELECTRICITY)
- H04N19/46 (ELECTRICITY)
International classification
- H04N19/13 (ELECTRICITY)
- H04N19/12 (ELECTRICITY)
Abstract
Methods and apparatuses are provided for transform selection in the encoding and decoding of video blocks.
Claims
1. A method performed by a decoder, the method comprising: obtaining an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding; parsing a flag of the at least one flag to determine if the flag is set to signal that a transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate a decoded video block; responsive to when the flag is determined to be set to signal that the transform is to be used in both the horizontal direction and in the vertical direction: decoding the encoded video block in the horizontal direction and the vertical direction using the transform to generate the decoded video block; responsive to when the flag is determined to be set to signal that the transform is not to be used in both the horizontal direction and in the vertical direction when decoding: parsing another flag of the at least one flag to determine in which one of the horizontal direction or the vertical direction the transform is to be used to decode the encoded video block and in which other one of the horizontal direction or the vertical direction another transform is to be used to decode the encoded video block; and decoding the encoded video block using the transform in the one of the horizontal direction or the vertical direction and using the another transform in the other one of the horizontal direction or the vertical direction to generate the decoded video block.
2. The method of claim 1, wherein the transform comprises one of two available alternative transforms, the method further comprising determining which one of the two available alternative transforms is to be used to decode the encoded video block based on parsing the flag.
3. The method of claim 2, wherein one of the two available alternative transforms is a Discrete Sine Transformation, DST-7, and the other of the two available alternative transforms is a Discrete Cosine Transformation, DCT-8.
4. The method of claim 1, wherein the flag is a second flag, the transform is a second transform, the another flag is a third flag, and the another transform is a third transform, and before parsing the second flag to determine if the second flag is set to signal that the second transform is to be used to decode the encoded video block in both the horizontal direction and the vertical direction, further comprising: parsing a first flag of the at least one flag to determine if the first flag is set to signal that a first transform of the plurality of transforms is to be used to decode the encoded video block in the horizontal direction and the vertical direction to generate the decoded video block; responsive to when the first flag is determined to be set to signal that the first transform is to be used to decode the encoded video block in both the horizontal direction and in the vertical direction, decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block; wherein the parsing the second flag is performed responsive to when the first flag is determined to be set to signal that the first transform is not to be used to decode the encoded video block in both the horizontal direction and in the vertical direction.
5. The method of claim 1, wherein the transform comprises a DST-7 transform.
6. The method of claim 1, wherein the another transform comprises one of a DCT-2 transform or a DCT-8 transform.
7. The method of claim 1, wherein the encoded video block is decoded in one of the horizontal and vertical directions using the transform and is decoded in the other one of the horizontal and vertical directions using the another transform.
8. The method of claim 4, wherein the first transform comprises a DCT-2 transform.
9. A decoder comprising: at least one processor; at least one memory coupled to the at least one processor, said at least one memory comprising instructions executable by the at least one processor, which cause the at least one processor to perform operations comprising: obtaining an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding; parsing a flag of the at least one flag to determine if the flag is set to signal that a transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate a decoded video block; responsive to when the flag is determined to be set to signal that the transform is to be used in both the horizontal direction and in the vertical direction: decoding the encoded video block in the horizontal direction and the vertical direction using the transform to generate the decoded video block; responsive to when the flag is determined to be set to signal that the transform is not to be used in both the horizontal direction and in the vertical direction when decoding: parsing another flag of the at least one flag to determine in which one of the horizontal direction or the vertical direction the transform is to be used to decode the encoded video block and in which other one of the horizontal direction or the vertical direction another transform is to be used to decode the encoded video block; and decoding the encoded video block using the transform in the one of the horizontal direction or the vertical direction and using the another transform in the other one of the horizontal direction or the vertical direction to generate the decoded video block.
10. The decoder of claim 9, wherein the transform comprises one of two available alternative transforms, and wherein the at least one memory further comprises instructions which cause the at least one processor to determine, based on parsing the flag, which one of the two available alternative transforms is to be used to decode the encoded video block.
11. The decoder of claim 10, wherein one of the two available alternative transforms is a Discrete Sine Transformation, DST-7, and the other of the two available alternative transforms is a Discrete Cosine Transformation, DCT-8 transform.
12. The decoder of claim 9, wherein the flag is a second flag, the transform is a second transform, the another flag is a third flag, and the another transform is a third transform, and before parsing the second flag to determine if the second flag is set to signal that the second transform is to be used to decode the encoded video block in both the horizontal direction and the vertical direction, further comprising: parsing a first flag of the at least one flag to determine if the first flag is set to signal that a first transform of the plurality of transforms is to be used to decode the encoded video block in the horizontal direction and the vertical direction to generate the decoded video block; responsive to when the first flag is determined to be set to signal that the first transform is to be used to decode the encoded video block in both the horizontal direction and in the vertical direction, decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block; wherein the parsing the second flag is performed responsive to when the first flag is determined to be set to signal that the first transform is not to be used to decode the encoded video block in both the horizontal direction and in the vertical direction.
13. The decoder of claim 9, wherein the transform comprises a DST-7 transform.
14. The decoder of claim 9, wherein the another transform comprises one of a DCT-2 transform or a DCT-8 transform.
15. The decoder of claim 9, wherein the encoded video block is decoded in one of the horizontal and vertical directions using the transform and is decoded in the other one of the horizontal and vertical directions using the another transform.
16. A method performed by an encoder, the method comprising: obtaining a video block for encoding; determining a characteristic of the video block; responsive to when the characteristic is determined to be a type that indicates a multiple transform selection is used: selecting a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally intensive for at least one processor to use or least likely to be used in encoding the video block; testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction; selecting a combination from the combinations of the plurality of transforms that provides the lowest rate distortion; encoding the video block using the selected combination to generate an encoded video block; responsive to when the characteristic is determined to be a type that indicates a multiple transform selection is not to be used: encoding the video block using one transform in the horizontal direction and the vertical direction.
17. The method of claim 16, wherein selecting the first transform comprises selecting a transform that is similar to another transform of the plurality of transforms and is more computationally intensive for the at least one processor than the other transform of the plurality of transforms.
18. The method of claim 16, further comprising determining the plurality of transforms that are to be tested.
19. The method of claim 18, wherein the plurality of transforms comprises at least three of a Discrete Cosine Transformation, a DCT-2 transform, a DCT-8 transform, and a Discrete Sine Transformation, DST-7 transform.
20. The method of claim 16, wherein the characteristic of the video block comprises a dimension of the video block.
21. The method of claim 1, wherein if the context-based adaptive arithmetic coding uses intra prediction, context selection used for one or more of the at least one flag is determined based on the larger dimension, width or height, of the block and direction of the intra prediction.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate certain non-limiting embodiments of inventive concepts. In the drawings:
DETAILED DESCRIPTION
(9) Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
(10) The following description presents various embodiments of the disclosed subject matter. These embodiments are presented as teaching examples and are not to be construed as limiting the scope of the disclosed subject matter. For example, certain details of the described embodiments may be modified, omitted, or expanded upon without departing from the scope of the described subject matter.
(13) According to other embodiments, processor circuit 201 may be defined to include memory so that a separate memory circuit is not required. As discussed herein, operations of the encoder 100 may be performed by processor 201 and/or network interface 205. For example, processor 201 may control network interface 205 to transmit communications to decoder 108 and/or to receive communications through network interface 104 from one or more other network nodes/entities/servers such as other encoder nodes, depository servers, etc. Moreover, modules may be stored in memory 203, and these modules may provide instructions so that when instructions of a module are executed by processor 201, processor 201 performs respective operations.
(15) According to other embodiments, processor circuit 301 may be defined to include memory so that a separate memory circuit is not required. As discussed herein, operations of the decoder 108 may be performed by processor 301 and/or network interface 305. For example, processor 301 may control network interface 305 to receive communications from encoder 100. Moreover, modules may be stored in memory 303, and these modules may provide instructions so that when instructions of a module are executed by processor 301, processor 301 performs respective operations.
(16) A potential advantage provided by the inventive concepts described herein is reduced encoder run time, obtained by limiting the number of transform combinations to be evaluated when the encoder is implemented in software. When the encoder is implemented in hardware, the complexity reduction may take another form, such as lower silicon area usage instead of shorter run time.
(17) The embodiments described herein reduce the complexity of both the encoder and the decoder by replacing, for certain block sizes, a transform that is computationally expensive or infrequently used with another transform. For example, in an encoder configured to operate under the VVC standard, the DCT-8, which is relatively computationally expensive, may be replaced for certain block sizes by the DCT-2, which is less computationally expensive.
(18) Furthermore, compression efficiency is increased by coding the bins of emt_cu_flag and emt_tu_idx with CABAC contexts.
(19) A further improvement is reduced memory usage, since no transform coefficients for the replaced transform (e.g., the size-32 DCT-8) have to be stored in memory. In a hardware implementation, this may translate to a smaller silicon area.
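To make the memory argument concrete, the sketch below builds the standard orthonormal floating-point basis matrices for DCT-2, DST-7 and DCT-8. VVC itself specifies scaled integer approximations of these matrices, which are not reproduced here. An N-point transform is an N×N matrix, so a size-32 DCT-8 amounts to 32 × 32 = 1024 coefficients that need not be stored once it is replaced. The function names are illustrative and not taken from any codec implementation.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-2 basis; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        w = math.sqrt(0.5) if k == 0 else 1.0  # DC row scaling
        rows.append([w * math.sqrt(2.0 / n) * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def dst7_matrix(n):
    """Orthonormal DST-7 basis."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1)) for j in range(n)]
            for k in range(n)]


def dct8_matrix(n):
    """Orthonormal DCT-8 basis."""
    s = math.sqrt(4.0 / (2 * n + 1))
    return [[s * math.cos(math.pi * (2 * k + 1) * (2 * j + 1) / (4 * n + 2)) for j in range(n)]
            for k in range(n)]


def coefficient_count(matrix):
    """Number of coefficients that would have to be stored for this matrix."""
    return sum(len(row) for row in matrix)


if __name__ == "__main__":
    print(coefficient_count(dct8_matrix(32)))  # 1024
```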
(20) For example, in an implementation based on an anchor using VTM-2.0.1 according to the Common Test Conditions (CTC) as described in [1], the compression efficiency (average luma BD-rate) is improved by 0.07% in the All Intra (AI) configuration and by 0.02% in the Random Access (RA) configuration. At the same time, the encoding time is reduced to 85% (AI) and 95% (RA) of the anchor, respectively. There is minimal, if any, impact on decoder complexity, and to the extent there is an impact, it is favorable. One reason is that the computationally expensive combination of DCT-8 horizontally and DCT-8 vertically is removed from use. When the same modifications are implemented in VTM-3.0, the improvements in compression efficiency are 0.03% (AI) and 0.01% (RA), while the encoder run time is reduced to 88% (AI) and 98% (RA), respectively.
(21) In the description that follows, an encoder and a decoder configured to operate in accordance with portions of the VVC standard are used to describe the inventive concepts. The concepts described herein may also be implemented in other standards.
(23) Change 1 to change 6 are reflected in
(24) Prior to describing various embodiments based on the above changes, an overview of how the encoder 100 and decoder 108 operate with the changes implemented shall be described. Turning now to
(25) Responsive to the characteristic being of a type that indicates that a multiple transform selection (MTS) component is used, the encoder 100 in operation 703 selects, from the plurality of transforms used by the MTS component, a first transform that is either the most computationally expensive or the least likely to be used in encoding the video block. For example, when the transforms used by the MTS component are DCT-2, DST-7, and DCT-8, the DCT-8 is often the most computationally expensive to use. In such scenarios, the DCT-8 transform may be selected and designated as the first transform.
(26) In operation 705, the encoder 100 tests combinations of transforms without testing a combination where the first transform is used both in the horizontal direction and in the vertical direction. For example, the DCT-8 transform in the scenario described in operation 703 would not be tested in both the horizontal direction and the vertical direction.
(27) In operation 707, the combination that provides the lowest rate distortion among the tested combinations is selected. Other decision factors may also be used in selecting the combination. For example, if one of the transforms is preferred over another transform and both transforms have comparable rate distortions, the preferred transform may be used.
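Operations 703 to 707 can be sketched as an exhaustive rate-distortion search over a candidate list from which the double use of the first transform has been removed. This is a minimal sketch under stated assumptions: `rd_cost` is a hypothetical callback standing in for the encoder's actual transform, quantization, entropy coding and distortion measurement, and for simplicity every horizontal/vertical pairing of the listed transforms is a candidate, whereas the embodiments below evaluate smaller, fixed lists.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (horizontal transform, vertical transform)


def candidate_combinations(transforms: Sequence[str], first: str) -> List[Pair]:
    """All horizontal/vertical pairs except `first` used in both directions (operation 705)."""
    return [(h, v) for h, v in product(transforms, repeat=2)
            if not (h == first and v == first)]


def select_combination(block, transforms: Sequence[str], first: str,
                       rd_cost: Callable[[object, str, str], float]) -> Pair:
    """Pick the tested pair with the lowest rate-distortion cost (operation 707).

    `rd_cost` is hypothetical. min() keeps the earliest candidate on ties,
    so list order can encode a preference between equal costs.
    """
    return min(candidate_combinations(transforms, first),
               key=lambda pair: rd_cost(block, pair[0], pair[1]))


# Usage with a toy cost function that makes DST-7/DCT-8 the cheapest pair.
toy_costs = {("DST-7", "DCT-8"): 1.0}
best = select_combination(None, ("DCT-2", "DST-7", "DCT-8"), "DCT-8",
                          lambda _b, h, v: toy_costs.get((h, v), 5.0))
print(best)  # ('DST-7', 'DCT-8')
```

Because Python's `min` returns the first of several equal minima, ties resolve toward earlier candidates, which is one simple way to realize the preference rule of operation 707.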
(28) In operation 709, the video block is encoded using the combination selected to generate an encoded block. In operation 711, the encoded block is transmitted to a decoder, such as decoder 108, with flags that are used by the decoder to determine which combination was used in encoding and is to be used in decoding the encoded block.
(29) Responsive to the characteristic not being of the type, the video block is encoded using a default transform in both the horizontal and vertical directions. In one embodiment, the DCT-2 transform may be used as the default transform. In operation 715, the encoded block is transmitted to the decoder, such as decoder 108, with flags that are used by the decoder to determine which combination was used in encoding and is to be used in decoding the encoded block.
(30) Turning now to
(31) In operation 805, the video block is decoded using the first transform in both the horizontal direction and the vertical direction responsive to the first flag having a value associated with the first transform being used in both directions (e.g., the first flag is equal to a first value). For example, the DCT-2 transform may be used in both the horizontal direction and the vertical direction to decode the video block.
(32) In operation 807, a second flag is parsed responsive to the first flag setting having a value associated with the first transform not being used in both directions. The second flag is parsed to determine the second flag setting. The flag setting may indicate whether a second transform is to be used to decode the encoded video block in both the horizontal direction and the vertical direction. For example, in one embodiment, the setting may be a binary setting of a 1 or a 0. In other words, the second flag is equal to a first value or a second value. A setting of 1 may indicate the second transform is to be used in both directions. In other embodiments, a setting of 0 may be used to indicate the second transform is to be used in both directions.
(33) The second transform may be one of two transforms. The second flag may be parsed to determine which of the two transforms is to be used to decode the video block. For example, the two transforms in one embodiment may be the DST-7 transform and the DCT-8 transform.
(34) In operation 809, the video block is decoded using the second transform in both the horizontal direction and the vertical direction responsive to the second flag having a value associated with the second transform being used in both directions (e.g., the second flag is equal to a first value). For example, the DST-7 transform may be used in both the horizontal direction and the vertical direction to decode the video block in operation 809.
(35) In operation 811, a third flag is parsed responsive to the second flag setting having a value associated with the second transform not being used in both directions. The third flag is parsed to determine the third flag setting. The third flag setting may indicate in which one of the horizontal direction and the vertical direction the second transform is to be used to decode the encoded video block, with a third transform used to decode in the other direction. This may be a first preferred transform combination. For example, in one embodiment, the setting may be a binary setting of 1 or 0. A setting of 1 may indicate that the second transform is to be used in the horizontal direction and the third transform in the vertical direction. In other embodiments, a setting of 0 may be used to indicate that the second transform is to be used in the horizontal direction and the third transform in the vertical direction. This may be a second preferred transform combination. In an embodiment, the third transform may be the first transform.
(36) In operation 813, the video block is decoded using the second transform in either the horizontal direction or the vertical direction based on the setting of the third flag. For example, the DST-7 transform may be used in the horizontal direction and either the DCT-2 or DCT-8 transform used in the vertical direction to decode the video block in operation 813. Alternatively, the DST-7 transform may be used in the vertical direction and either the DCT-2 or DCT-8 transform used in the horizontal direction to decode the video block in operation 813.
(37) In operation 815, the decoder may output the decoded video block to a media player for playback of the decoded video block.
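The flag cascade of paragraphs (31) to (36) can be sketched as follows. This assumes the flag polarities of Table 3 (a value of 0 ends the cascade) and assumes the flags arrive already entropy-decoded; CABAC context modeling is not shown. The parameter `other` denotes the transform paired with DST-7 in operation 813, i.e., DCT-8, or DCT-2 for block sizes where DCT-8 is replaced. The function name and parameters are illustrative.

```python
from typing import Iterator, Tuple

Pair = Tuple[str, str]  # (horizontal transform, vertical transform)


def decode_transform_pair(flags: Iterator[int], other: str = "DCT-8") -> Pair:
    """Map up to three parsed flag values to a (horizontal, vertical) pair.

    Only as many flags as needed are consumed from `flags`.
    """
    if next(flags) == 0:        # first flag: DCT-2 in both directions
        return ("DCT-2", "DCT-2")
    if next(flags) == 0:        # second flag: DST-7 in both directions
        return ("DST-7", "DST-7")
    if next(flags) == 0:        # third flag: DST-7 horizontally, `other` vertically
        return ("DST-7", other)
    return (other, "DST-7")     # `other` horizontally, DST-7 vertically


print(decode_transform_pair(iter([1, 1, 0])))  # ('DST-7', 'DCT-8')
```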
(38) Turning now to
(39) In operation 903, responsive to the first criterion being met, the decoder selects the transform combination from one of: the first transform in both the vertical direction and the horizontal direction; the third transform in both the vertical direction and the horizontal direction; the first transform in the vertical direction and the third transform in the horizontal direction; and the third transform in the vertical direction and the first transform in the horizontal direction.
(40) In operation 905, responsive to the first criterion not being met, the decoder selects the transform combination from one of: the first transform in both the vertical direction and the horizontal direction; the third transform in both the vertical direction and the horizontal direction; the second transform in the vertical direction and the third transform in the horizontal direction; and the third transform in the vertical direction and the second transform in the horizontal direction.
(41) In operation 907, the decoder decodes the block using the selected combination. In operation 909, the decoder may transmit the decoded block towards a media player.
(42) The first transform in the embodiments described below is the DCT-2 transform, the second transform is the DCT-8 transform, and the third transform is the DST-7 transform. In the description of the embodiments that follows, the first criterion is block size.
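With the assignments of paragraph (42), the two candidate sets of operations 903 and 905 differ only in which transform is paired with DST-7. The sketch below reads operation 905 as the case where the first criterion is not met, which is how the embodiments below apply it; the function name is illustrative.

```python
FIRST, SECOND, THIRD = "DCT-2", "DCT-8", "DST-7"


def candidate_pairs(criterion_met: bool):
    """(horizontal, vertical) candidates for operation 903 (met) or 905 (not met)."""
    partner = FIRST if criterion_met else SECOND
    return [
        (FIRST, FIRST),      # first transform in both directions
        (THIRD, THIRD),      # third transform in both directions
        (THIRD, partner),    # third horizontally, partner vertically
        (partner, THIRD),    # partner horizontally, third vertically
    ]


print(candidate_pairs(True))
```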
(43) In a first embodiment, change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 32 samples. In this first embodiment all blocks of size 16×16 or smaller evaluate the following combinations:
(44) DCT-2 horizontally and DCT-2 vertically
(45) DST-7 horizontally and DST-7 vertically
(46) DST-7 horizontally and DCT-8 vertically
(47) DCT-8 horizontally and DST-7 vertically
(48) For blocks of size 32×N or N×32 in the first embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
(49) DCT-2 horizontally and DCT-2 vertically
(50) DST-7 horizontally and DST-7 vertically
(51) DST-7 horizontally and DCT-2 vertically
(52) DCT-2 horizontally and DST-7 vertically
(53) The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16×16 or smaller, the decoded bins can indicate the following combinations:
(54) DCT-2 horizontally and DCT-2 vertically
(55) DST-7 horizontally and DST-7 vertically
(56) DST-7 horizontally and DCT-8 vertically
(57) DCT-8 horizontally and DST-7 vertically
(58) If the block is of size 32×N or N×32 in the first embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of
(59) DCT-2 horizontally and DCT-2 vertically
(60) DST-7 horizontally and DST-7 vertically
(61) DST-7 horizontally and DCT-2 vertically
(62) DCT-2 horizontally and DST-7 vertically
(63) Table 1 shows where DCT-2 and DCT-8 are used in the first embodiment:
(64) TABLE 1
Block height \ Block width | 4 | 8 | 16 | 32
4 | DCT-8 | DCT-8 | DCT-8 | DCT-2
8 | DCT-8 | DCT-8 | DCT-8 | DCT-2
16 | DCT-8 | DCT-8 | DCT-8 | DCT-2
32 | DCT-2 | DCT-2 | DCT-2 | DCT-2
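Table 1 follows directly from the first-embodiment rule that DCT-8 is replaced by DCT-2 whenever at least one block dimension is 32 samples. The sketch below regenerates the table from that rule; rows are block heights and columns block widths, and the helper names are illustrative.

```python
SIZES = (4, 8, 16, 32)


def dct8_replaced(width, height):
    """First embodiment: DCT-8 is replaced when either side is 32 samples."""
    return width == 32 or height == 32


def partner_table(replaced):
    """Transform paired with DST-7 for every width/height combination."""
    return [["DCT-2" if replaced(w, h) else "DCT-8" for w in SIZES] for h in SIZES]


for h, row in zip(SIZES, partner_table(dct8_replaced)):
    print(h, row)
```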
(65) In a second embodiment, change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks of size 16×32, 32×16 or 32×32. In this embodiment all blocks of size 16×16 or smaller, 4×32, 8×32, 32×4 and 32×8 evaluate the following combinations:
(66) DCT-2 horizontally and DCT-2 vertically
(67) DST-7 horizontally and DST-7 vertically
(68) DST-7 horizontally and DCT-8 vertically
(69) DCT-8 horizontally and DST-7 vertically
(70) For blocks of size 32×16, 16×32 or 32×32 in the second embodiment, the following combinations are evaluated:
(71) DCT-2 horizontally and DCT-2 vertically
(72) DST-7 horizontally and DST-7 vertically
(73) DST-7 horizontally and DCT-2 vertically
(74) DCT-2 horizontally and DST-7 vertically
(75) The decoder is able to determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16×16 or smaller, 4×32, 8×32, 32×4 or 32×8 the decoded bins can indicate the following combinations:
(76) DCT-2 horizontally and DCT-2 vertically
(77) DST-7 horizontally and DST-7 vertically
(78) DST-7 horizontally and DCT-8 vertically
(79) DCT-8 horizontally and DST-7 vertically
(80) If the block is of size 32×16, 16×32 or 32×32 in the second embodiment (i.e., the first criterion of
(81) DCT-2 horizontally and DCT-2 vertically
(82) DST-7 horizontally and DST-7 vertically
(83) DST-7 horizontally and DCT-2 vertically
(84) DCT-2 horizontally and DST-7 vertically
(85) Table 2 shows where DCT-2 and DCT-8 are used in the second embodiment:
(86) TABLE 2
Block height \ Block width | 4 | 8 | 16 | 32
4 | DCT-8 | DCT-8 | DCT-8 | DCT-8
8 | DCT-8 | DCT-8 | DCT-8 | DCT-8
16 | DCT-8 | DCT-8 | DCT-8 | DCT-2
32 | DCT-8 | DCT-8 | DCT-2 | DCT-2
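Likewise, Table 2 can be regenerated from the second-embodiment rule, under which DCT-8 is replaced only for 16×32, 32×16 and 32×32 blocks. For the 4 to 32 sample sizes in the table this is assumed equivalent to "the shorter side is at least 16 and the longer side is 32". The helper names are illustrative.

```python
SIZES = (4, 8, 16, 32)


def dct8_replaced_second(width, height):
    """Second embodiment: DCT-8 is replaced only for 16x32, 32x16 and 32x32."""
    return min(width, height) >= 16 and max(width, height) == 32


def partner_table(replaced):
    """Rows are block heights, columns block widths, as in Table 2."""
    return [["DCT-2" if replaced(w, h) else "DCT-8" for w in SIZES] for h in SIZES]


for h, row in zip(SIZES, partner_table(dct8_replaced_second)):
    print(h, row)
```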
(87) In a third embodiment, changes 1, 3 and 4 are done for all block sizes. If a step to the right in
(88) TABLE 3
Horizontal transform | Vertical transform | mts_cu_flag | mts_dst_flag | mts_tu_flag
DCT-2 | DCT-2 | 0 | - | -
DST-7 | DST-7 | 1 | 0 | -
DST-7 | DCT-X | 1 | 1 | 0
DCT-X | DST-7 | 1 | 1 | 1
(89) The decoder will parse the flags and determine the combination of transforms based on the decoded bins. With respect to
(90) TABLE 4
mts_cu_flag | mts_dst_flag | mts_tu_flag | Horizontal transform | Vertical transform
0 | - | - | DCT-2 | DCT-2
1 | 0 | - | DST-7 | DST-7
1 | 1 | 0 | DST-7 | DCT-X
1 | 1 | 1 | DCT-X | DST-7
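Tables 3 and 4 describe a prefix-free code: each combination is signaled by one to three bins, and no codeword is a prefix of another, so the decoder can stop reading as soon as the bins read so far match a row. The sketch below stores Table 3 as a dictionary, derives Table 4 as its inverse, and decodes by prefix matching. DCT-X is kept as a placeholder exactly as in the tables; the names are illustrative.

```python
from typing import Iterable, Tuple

# Table 3: (horizontal, vertical) -> (mts_cu_flag, mts_dst_flag, mts_tu_flag), unused bins omitted
TABLE_3 = {
    ("DCT-2", "DCT-2"): (0,),
    ("DST-7", "DST-7"): (1, 0),
    ("DST-7", "DCT-X"): (1, 1, 0),
    ("DCT-X", "DST-7"): (1, 1, 1),
}
# Table 4 is the decoder-side view: bins -> (horizontal, vertical)
TABLE_4 = {bins: pair for pair, bins in TABLE_3.items()}


def decode_bins(bins: Iterable[int]) -> Tuple[Tuple[str, str], int]:
    """Return the transform pair and the number of bins consumed."""
    prefix: Tuple[int, ...] = ()
    for b in bins:
        prefix += (b,)
        if prefix in TABLE_4:
            return TABLE_4[prefix], len(prefix)
    raise ValueError("bin string ended before a codeword was complete")


print(decode_bins([1, 0, 1]))  # (('DST-7', 'DST-7'), 2)
```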
(91) In a set of embodiments, changes 1, 3, 4 and 5 are done for all block sizes. As an example, the more preferred combination as described in change 5 can be marked by setting the mts_tu_flag to ‘1’ and the less preferred combination as described in change 5 can be marked by setting the mts_tu_flag to ‘0’.
(92) In a fifth embodiment that is one of the set of embodiments, if the block uses intra prediction, the combination of DST-7 horizontally and DCT-X vertically is regarded as more preferred if the intra direction is closer to horizontal than to vertical. Conversely, if the intra direction is closer to vertical than to horizontal, the combination of DCT-X horizontally and DST-7 vertically is regarded as more preferred. Thus, the decoder will determine the combination based on the intra direction of the block.
(93) If the intra direction is, for example, purely horizontal and the decoder reads the mts_tu_flag as ‘1’, it will use a transform combination of DST-7 horizontally and DCT-X vertically. If the flag is read as ‘0’, the decoder will use a transform combination of DCT-X horizontally and DST-7 vertically.
(94) If the intra direction is, for example, purely vertical and the decoder reads the mts_tu_flag as ‘1’, it will use a transform combination of DCT-X horizontally and DST-7 vertically. If the flag is read as ‘0’, the decoder will use a transform combination of DST-7 horizontally and DCT-X vertically.
(95) In a sixth embodiment that is one of the set of embodiments, if the block uses inter prediction, the combination of DST-7 horizontally and DCT-X vertically is regarded as more probable if the block has a larger width than height. If the block has a larger height than width, the combination of DCT-X horizontally and DST-7 vertically is regarded as more probable.
(96) If the block has, for example, a size of 16×4 samples and the decoder reads the mts_tu_flag as ‘1’, it will use a transform combination of DST-7 horizontally and DCT-X vertically. If the flag is read as ‘0’, the decoder will use a transform combination of DCT-X horizontally and DST-7 vertically.
(97) If the block has, for example, a size of 4×16 samples and the decoder reads the mts_tu_flag as ‘1’, it will use a transform combination of DCT-X horizontally and DST-7 vertically. If the flag is read as ‘0’, the decoder will use a transform combination of DST-7 horizontally and DCT-X vertically.
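The inter-prediction rule of the sixth embodiment can be sketched as a function of block shape and the parsed mts_tu_flag, with a value of 1 selecting the more probable combination as in paragraph (91). The text does not say how square blocks are handled; treating them like wide blocks here is an arbitrary assumption, and the function name is illustrative.

```python
from typing import Tuple

Pair = Tuple[str, str]  # (horizontal transform, vertical transform)


def inter_pair(width: int, height: int, mts_tu_flag: int, other: str = "DCT-X") -> Pair:
    """Sixth embodiment: mts_tu_flag == 1 selects the more probable combination."""
    if width >= height:  # wide block (squares included by assumption)
        more, less = ("DST-7", other), (other, "DST-7")
    else:                # tall block
        more, less = (other, "DST-7"), ("DST-7", other)
    return more if mts_tu_flag == 1 else less


print(inter_pair(16, 4, 1))  # ('DST-7', 'DCT-X')
```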
(98) In the embodiments above, a 45-degree prediction direction is equally close to vertical and to horizontal. Therefore, the decoder and encoder have to agree on a tie-breaking rule so that 45-degree directions are treated in the same manner. In the set of embodiments above, this is handled by treating 45-degree directions as more vertical than horizontal. In a different embodiment, it may be advantageous to use a different tie-breaking rule, such as treating 45-degree directions as horizontal. Another possibility is to place the boundary at an angle other than 45 degrees. As an example, it may be advantageous to treat not only 45-degree directions as vertical but also, for example, 43-degree directions, although mathematically they are closer to horizontal. In general, any angle can therefore be used in the tie-breaking rule, not just diagonal directions.
(99) Another case where a tie-breaking rule should be defined is the non-directional intra prediction modes (planar and DC). In the set of embodiments above, these modes are treated as more horizontal than vertical. In a slightly different embodiment, it might be advantageous to treat them as more vertical than horizontal. For example, in one implementation, intra modes 0-34 are treated as being closer to horizontal and intra modes 35-66 as being closer to vertical.
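The fifth embodiment, combined with the example mode split of paragraph (99), can be sketched as below. This assumes the 67-mode numbering used in VVC (0 planar, 1 DC, 18 purely horizontal, 50 purely vertical) and that an mts_tu_flag of 1 selects the more preferred combination, as in paragraph (91). The function names are illustrative.

```python
from typing import Tuple

Pair = Tuple[str, str]  # (horizontal transform, vertical transform)


def closer_to_horizontal(intra_mode: int) -> bool:
    """Example split from the text: modes 0-34 horizontal-ish, 35-66 vertical-ish."""
    if not 0 <= intra_mode <= 66:
        raise ValueError("expected an intra mode in 0..66")
    return intra_mode <= 34


def intra_pair(intra_mode: int, mts_tu_flag: int, other: str = "DCT-X") -> Pair:
    """Fifth embodiment: mts_tu_flag == 1 selects the more preferred combination."""
    if closer_to_horizontal(intra_mode):
        more, less = ("DST-7", other), (other, "DST-7")
    else:
        more, less = (other, "DST-7"), ("DST-7", other)
    return more if mts_tu_flag == 1 else less


print(intra_pair(50, 1))  # ('DCT-X', 'DST-7')
```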
(100) In a seventh embodiment, change 6 is used for intra coded blocks. The selection of which context to use for encoding and decoding the mts_cu_flag is made based on the longer side of the block and the intra direction. The intra directions are divided into two groups, one where using the DCT-2 horizontally and vertically is more preferred and one where using the DCT-2 horizontally and vertically is less preferred. These groups can be identical for different block sizes. Using the DCT-2 both horizontally and vertically can for example be more preferred if the intra mode is close to horizontal or vertical. In the same example, the combination would be less preferred if the intra direction is close to diagonal.
(101) Turning to
(102) Responsive to the block being of size 32×N or N×32, where N can be 4, 8, 16 or 32, and the intra direction being close to diagonal (i.e., it passes neither the horizontal closeness test nor the vertical closeness test in operation 1103), for instance if it is purely diagonal, a different context is chosen in operation 1107, for example a second context with id 1.
(103) In operation 1108, the decoder determines whether the block is of size 16×N or N×16, where N can be 4, 8 or 16. Responsive to the block being of size 16×N or N×16 and the intra direction being close to horizontal or close to vertical (i.e., it passes either the horizontal closeness test or the vertical closeness test in operation 1111), for instance if it is purely vertical, a different context is chosen in operation 1113, for example a third context with id 2.
(104) Responsive to the block being of size 16×N or N×16, where N can be 4, 8 or 16, and the intra direction being close to diagonal (i.e., it passes neither the horizontal closeness test nor the vertical closeness test in operation 1111), for instance if it is purely diagonal, a different context is chosen in operation 1115, for example a fourth context with id 3.
(105) In operation 1117, the decoder determines whether the block is of size 8×8, 8×4, 4×8 or 4×4. Responsive to the block being of size 8×8, 8×4, 4×8 or 4×4 and the intra direction being close to horizontal or close to vertical (i.e., it passes either the horizontal closeness test or the vertical closeness test in operation 1119), for instance if it is purely horizontal, a different context is chosen in operation 1121, for example a fifth context with id 4.
(106) Responsive to the block being of size 8×8, 8×4, 4×8 or 4×4 and the intra direction being close to diagonal (i.e., it passes neither the horizontal closeness test nor the vertical closeness test in operation 1119), for instance if it is purely diagonal, a different context is chosen in operation 1123, for example a sixth context with id 5.
(107) This can be summarized in the following table:
(108) TABLE 5
Block size | Intra direction closer to horizontal or vertical | Intra direction closer to diagonal
32×N or N×32 | id 0 | id 1
16×N or N×16 | id 2 | id 3
4×N, 8×N, N×4 or N×8 | id 4 | id 5
(109) As described for the previous set of embodiments, the encoder and decoder should share a set of tie-breaking rules for prediction directions that are equally close to horizontal and vertical. Tie-breaking rules should also be defined for the non-directional intra prediction modes planar and DC. For example, in one implementation, the intra modes 10-22 may be treated as close to horizontal, the intra modes 46-57 as close to vertical, and the remaining intra modes 0-9, 23-45 and 58-66 as close to diagonal.
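Combining Table 5 with the example mode ranges above gives a compact context-selection rule for the seventh embodiment. The sketch below is illustrative only and rests on several assumptions:

- the 0-66 mode numbering used above;
- the Table 5 row is chosen by the longer block side, as stated for this embodiment;
- block sides range from 4 to 32 samples.

The function names are hypothetical.

```python
def intra_direction_class(intra_mode: int) -> str:
    """Classify an intra mode using the example ranges of the seventh embodiment.

    Modes 10-22 are treated as horizontal and modes 46-57 as vertical. All
    remaining modes (0-9, 23-45 and 58-66, including planar and DC) are
    treated as diagonal.
    """
    if not 0 <= intra_mode <= 66:
        raise ValueError(f"intra mode out of range: {intra_mode}")
    if 10 <= intra_mode <= 22:
        return "horizontal"
    if 46 <= intra_mode <= 57:
        return "vertical"
    return "diagonal"


def mts_cu_flag_context_intra(width: int, height: int, intra_mode: int) -> int:
    """Return the context id (0-5) for mts_cu_flag following Table 5."""
    longer_side = max(width, height)
    if longer_side == 32:
        size_row = 0  # 32xN or Nx32
    elif longer_side == 16:
        size_row = 1  # 16xN or Nx16
    elif longer_side in (4, 8):
        size_row = 2  # 4xN, 8xN, Nx4 or Nx8
    else:
        raise ValueError(f"unsupported block size {width}x{height}")
    # Even ids: close to horizontal or vertical. Odd ids: close to diagonal.
    is_diagonal = intra_direction_class(intra_mode) == "diagonal"
    return 2 * size_row + int(is_diagonal)


if __name__ == "__main__":
    print(mts_cu_flag_context_intra(32, 8, 50))
    print(mts_cu_flag_context_intra(8, 4, 2))
```

Laying the ids out as 2 × row + column mirrors the two-column structure of Table 5. An implementation could equally store the table explicitly.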
(110) In an eighth embodiment, change 6 is used for inter coded blocks. The context used for encoding and decoding the mts_cu_flag is selected based on the block size and shape. For example, the six contexts can be selected as follows:
a) if the block is of size 4×32 or 32×4, one context is used, for example with identifier (id) 0;
b) if the block is of size 4×16, 8×32, 32×8 or 16×4, a different context is used, for example with id 1;
c) if the block is of size 4×8 or 8×4, a different context is used, for example with id 2;
d) if the block is of size 8×16, 16×32, 32×16 or 16×8, a different context is used, for example with id 3;
e) if the block is of size 16×16 or 32×32, a different context is used, for example with id 4;
f) if the block is of size 8×8 or 4×4, a different context is used, for example with id 5.
(111) The eighth embodiment can be summarized in Table 6:
(112) TABLE 6
Block height \ Block width | 4 | 8 | 16 | 32
4 | id 5 | id 2 | id 1 | id 0
8 | id 2 | id 5 | id 3 | id 1
16 | id 1 | id 3 | id 4 | id 3
32 | id 0 | id 1 | id 3 | id 4
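Table 6 can be implemented directly as a lookup, as in the sketch below (the helper names are hypothetical).

The sketch also checks an observation that is not stated in the text but holds for every entry of Table 6: the context id depends on the block's aspect ratio, and the 1:2 and square shapes are further split by size. The rule-based variant is one possible alternative formulation, not the method described here.

```python
SIZES = (4, 8, 16, 32)

# Context ids from Table 6, indexed as TABLE_6[height][width].
TABLE_6 = {
    4: {4: 5, 8: 2, 16: 1, 32: 0},
    8: {4: 2, 8: 5, 16: 3, 32: 1},
    16: {4: 1, 8: 3, 16: 4, 32: 3},
    32: {4: 0, 8: 1, 16: 3, 32: 4},
}


def mts_cu_flag_context_inter(width: int, height: int) -> int:
    """Look up the mts_cu_flag context id for an inter block (Table 6)."""
    try:
        return TABLE_6[height][width]
    except KeyError:
        raise ValueError(f"unsupported block size {width}x{height}") from None


def context_from_shape(width: int, height: int) -> int:
    """Alternative rule inferred from the structure of Table 6 (an observation)."""
    short_side, long_side = sorted((width, height))
    ratio = long_side // short_side
    if ratio == 8:
        return 0  # 4x32, 32x4
    if ratio == 4:
        return 1  # 4x16, 8x32, 32x8, 16x4
    if ratio == 2:
        return 2 if short_side == 4 else 3  # 4x8/8x4 vs 8x16, 16x32, ...
    return 4 if short_side >= 16 else 5  # square: 16x16/32x32 vs 4x4/8x8


# The two formulations agree on every block size covered by Table 6.
assert all(
    mts_cu_flag_context_inter(w, h) == context_from_shape(w, h)
    for w in SIZES
    for h in SIZES
)
```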
(113) In a ninth embodiment, change 1 is applied for all block sizes where the MTS tool is allowed. Change 2 is applied for all blocks where at least one dimension has a length of 16 or 32 samples. In this embodiment, the following combinations are evaluated for all blocks of size 8×8 or smaller:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-8 vertically;
- DCT-8 horizontally and DST-7 vertically.
(114) For blocks of size 16×N, N×16, 32×N or N×32 in the ninth embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-2 vertically;
- DCT-2 horizontally and DST-7 vertically.
(115) The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 8×8 or smaller, the decoded bins can indicate the following combinations:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-8 vertically;
- DCT-8 horizontally and DST-7 vertically.
(116) If the block is of size 16×N, N×16, 32×N or N×32 in the ninth embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of
(117) Table 7 shows where DCT-2 and DCT-8 are used in the ninth embodiment:
(118) TABLE 7
Block height \ Block width | 4 | 8 | 16 | 32
4 | DCT-8 | DCT-8 | DCT-2 | DCT-2
8 | DCT-8 | DCT-8 | DCT-2 | DCT-2
16 | DCT-2 | DCT-2 | DCT-2 | DCT-2
32 | DCT-2 | DCT-2 | DCT-2 | DCT-2
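The candidate set evaluated in the ninth embodiment can be expressed as a function of the block dimensions. This sketch uses a hypothetical function name and represents transforms as plain strings.

In the mixed combinations, the second transform is DCT-8 when neither side exceeds 8 samples and DCT-2 otherwise, which reflects change 2. The list always has four entries, so the number of combinations to signal is the same for all block sizes.

```python
from __future__ import annotations

ALLOWED_SIDES = (4, 8, 16, 32)


def candidate_combinations_ninth(width: int, height: int) -> list[tuple[str, str]]:
    """Return the (horizontal, vertical) pairs evaluated in the ninth embodiment."""
    if width not in ALLOWED_SIDES or height not in ALLOWED_SIDES:
        raise ValueError(f"unsupported block size {width}x{height}")
    # Change 2: blocks with a side of 16 or 32 samples use DCT-2 instead of DCT-8.
    other = "DCT-8" if max(width, height) <= 8 else "DCT-2"
    return [
        ("DCT-2", "DCT-2"),
        ("DST-7", "DST-7"),
        ("DST-7", other),
        (other, "DST-7"),
    ]


if __name__ == "__main__":
    for size in ((8, 8), (16, 4)):
        print(size, candidate_combinations_ninth(*size))
```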
(119) In a tenth embodiment, change 1 is applied for all block sizes where the MTS tool is allowed. Change 2 is applied for all blocks where at least one dimension has a length of 32 or 4 samples. In this embodiment, the following combinations are evaluated for all blocks of size 8×8, 8×16, 16×8 or 16×16:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-8 vertically;
- DCT-8 horizontally and DST-7 vertically.
(120) For blocks of size 4×N, N×4, 32×N or N×32 in the tenth embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-2 vertically;
- DCT-2 horizontally and DST-7 vertically.
(121) The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 8×8, 8×16, 16×8 or 16×16, the decoded bins can indicate the following combinations:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-8 vertically;
- DCT-8 horizontally and DST-7 vertically.
(122) If the block is of size 4×N, N×4, 32×N or N×32 in the tenth embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of
(123) Table 8 shows where DCT-2 and DCT-8 are used in the tenth embodiment:
(124) TABLE 8
Block height \ Block width | 4 | 8 | 16 | 32
4 | DCT-2 | DCT-2 | DCT-2 | DCT-2
8 | DCT-2 | DCT-8 | DCT-8 | DCT-2
16 | DCT-2 | DCT-8 | DCT-8 | DCT-2
32 | DCT-2 | DCT-2 | DCT-2 | DCT-2
(125) In an eleventh embodiment, change 1 is applied for all block sizes where the MTS tool is allowed. Change 2 is applied for all blocks where at least one dimension has a length of 32 samples or the block has a size of 4×4 samples. In this embodiment, the following combinations are evaluated for all blocks of size 16×16 or smaller other than 4×4:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-8 vertically;
- DCT-8 horizontally and DST-7 vertically.
(126) For blocks of size 4×4, 32×N or N×32 in the eleventh embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-2 vertically;
- DCT-2 horizontally and DST-7 vertically.
(127) The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16×16 or smaller other than 4×4, the decoded bins can indicate the following combinations:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-8 vertically;
- DCT-8 horizontally and DST-7 vertically.
(128) If the block is of size 4×4, 32×N or N×32 in the eleventh embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of
(129) Table 9 shows where DCT-2 and DCT-8 are used in the eleventh embodiment:
(130) TABLE 9
Block height \ Block width | 4 | 8 | 16 | 32
4 | DCT-2 | DCT-8 | DCT-8 | DCT-2
8 | DCT-8 | DCT-8 | DCT-8 | DCT-2
16 | DCT-8 | DCT-8 | DCT-8 | DCT-2
32 | DCT-2 | DCT-2 | DCT-2 | DCT-2
(131) In a twelfth embodiment, changes 1 and 2 are applied for all block sizes where the MTS tool is allowed. In this embodiment, the following combinations are evaluated for all blocks:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-2 vertically;
- DCT-2 horizontally and DST-7 vertically.
(132) The decoder can determine the correct combination of transforms based on the parsed flags. The decoded bins can indicate the following combinations:
- DCT-2 horizontally and DCT-2 vertically;
- DST-7 horizontally and DST-7 vertically;
- DST-7 horizontally and DCT-2 vertically;
- DCT-2 horizontally and DST-7 vertically.
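The ninth to twelfth embodiments differ only in which block sizes use DCT-2 in place of DCT-8. The sketch below encodes each rule as a predicate (hypothetical names) and rebuilds the layout of Tables 7, 8 and 9, with rows indexed by block height and columns by block width. For the twelfth embodiment, every entry is DCT-2.

```python
from __future__ import annotations

from typing import Callable

SIZES = (4, 8, 16, 32)


def dct8_ninth(w: int, h: int) -> bool:
    # Change 2 applies when a side is 16 or 32 samples.
    return max(w, h) <= 8


def dct8_tenth(w: int, h: int) -> bool:
    # Change 2 applies when a side is 4 or 32 samples.
    return w in (8, 16) and h in (8, 16)


def dct8_eleventh(w: int, h: int) -> bool:
    # Change 2 applies when a side is 32 samples or the block is 4x4.
    return max(w, h) < 32 and (w, h) != (4, 4)


def dct8_twelfth(w: int, h: int) -> bool:
    # Change 2 applies to every block size.
    return False


def transform_table(uses_dct8: Callable[[int, int], bool]) -> list[list[str]]:
    """Rows indexed by block height, columns by block width (as in Tables 7-9)."""
    return [["DCT-8" if uses_dct8(w, h) else "DCT-2" for w in SIZES] for h in SIZES]


if __name__ == "__main__":
    rules = (("ninth", dct8_ninth), ("tenth", dct8_tenth),
             ("eleventh", dct8_eleventh), ("twelfth", dct8_twelfth))
    for name, rule in rules:
        print(name)
        for h, row in zip(SIZES, transform_table(rule)):
            print(f"{h:>4}", *row)
```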
(133) In a further set of embodiments, change 7 is incorporated. A new flag, called mts_same_flag, is signaled to indicate whether a block uses the same transform in both the horizontal and vertical directions. In one embodiment, if the flag has the value ‘1’, the block uses identical transforms in both directions, whereas if the flag has the value ‘0’, two different transforms are used.
(134) In an embodiment, the mts_same_flag indicates that a block uses the same transform in both the horizontal and vertical directions. An additional flag, mts_tu_idx, is signaled to indicate whether DCT-8 or DST-7 is used in both directions.
(135) In another embodiment, the mts_same_flag indicates that a block uses different transforms in the horizontal and vertical directions. An additional flag, mts_tu_idx, is signaled to indicate whether to use DCT-8 in the horizontal direction and DST-7 in the vertical direction, or DST-7 in the horizontal direction and DCT-8 in the vertical direction.
(136) The processing in the decoder works analogously: the decoder first parses the mts_same_flag and then parses the mts_tu_idx to determine the correct combination of transforms.
(137) In another embodiment, the decoder parses the mts_same_flag, which indicates that the same transform is to be used in both the horizontal and vertical directions. The decoder then parses the mts_tu_idx, which indicates whether DST-7 or DCT-8 is used in both directions.
(138) In another embodiment, the decoder parses the mts_same_flag, which indicates that two different transforms are to be used for the current block. The decoder then parses the mts_tu_idx to determine whether to use DCT-8 in the horizontal direction and DST-7 in the vertical direction, or DST-7 in the horizontal direction and DCT-8 in the vertical direction.
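The decoder-side parsing of change 7 can be sketched as below. The sketch works on bins that the arithmetic decoder has already decoded, and the function names are hypothetical.

The mapping of mts_tu_idx values to transforms is an assumption made for illustration. The text only states which two combinations mts_tu_idx selects between for each value of mts_same_flag, not which value selects which.

```python
from __future__ import annotations

from typing import Iterator


def transforms_from_mts_flags(mts_same_flag: int, mts_tu_idx: int) -> tuple[str, str]:
    """Map decoded flag values to a (horizontal, vertical) transform pair.

    Assumed (hypothetical) value mapping:
      mts_same_flag == 1: mts_tu_idx 0 -> DST-7 in both directions,
                          mts_tu_idx 1 -> DCT-8 in both directions
      mts_same_flag == 0: mts_tu_idx 0 -> DST-7 horizontal, DCT-8 vertical,
                          mts_tu_idx 1 -> DCT-8 horizontal, DST-7 vertical
    """
    if mts_same_flag not in (0, 1) or mts_tu_idx not in (0, 1):
        raise ValueError("flags must be 0 or 1")
    if mts_same_flag == 1:
        same = "DST-7" if mts_tu_idx == 0 else "DCT-8"
        return (same, same)
    return ("DST-7", "DCT-8") if mts_tu_idx == 0 else ("DCT-8", "DST-7")


def parse_mts_flags(bins: Iterator[int]) -> tuple[str, str]:
    """Consume the two bins in signaling order: mts_same_flag, then mts_tu_idx."""
    mts_same_flag = next(bins)
    mts_tu_idx = next(bins)
    return transforms_from_mts_flags(mts_same_flag, mts_tu_idx)
```

With this structure, the first bin separates "same transform" from "different transforms" and the second bin resolves the remaining choice, so the two flags together cover the four DST-7/DCT-8 combinations.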
(139) Thus, disabling one of the transform combinations enables the CABAC coding to be changed by replacing the two existing flags described herein with two new flags. Another key aspect is to replace one transform by a different transform in certain cases.