Method and apparatus for chrominance processing in video coding and decoding
10827169 · 2020-11-03
Assignee
Inventors
- James Alexander Gamei (Surrey, GB)
- Nicholas Ian Saunders (Basingstoke, GB)
- Karl James Sharman (Newbury, GB)
- Paul James Silcock (Swindon, GB)
Cpc classification
H04N19/12
ELECTRICITY
H04N19/122
ELECTRICITY
H04N19/119
ELECTRICITY
H04N19/649
ELECTRICITY
H04N19/13
ELECTRICITY
H04N19/80
ELECTRICITY
H04N19/129
ELECTRICITY
H04N19/157
ELECTRICITY
H04N19/44
ELECTRICITY
International classification
H04N19/119
ELECTRICITY
H04N19/157
ELECTRICITY
H04N19/129
ELECTRICITY
H04N19/80
ELECTRICITY
H04N19/122
ELECTRICITY
H04N19/44
ELECTRICITY
Abstract
A method of video coding in respect of a 4:2:2 chroma subsampling format includes dividing image data into transform units. In a case of a non-square transform unit, the method includes splitting the non-square transform unit into square blocks prior to applying a spatial frequency transform. The method further includes applying a spatial frequency transform to the square blocks to generate corresponding sets of spatial frequency coefficients.
Claims
1. A method of video coding, comprising: dividing, by circuitry, image data into transform units; when a transform unit is a non-square transform unit, splitting the non-square transform unit into square blocks and applying a spatial frequency transform to the square blocks to generate corresponding sets of spatial frequency coefficients; and associating, by the circuitry, intra-prediction mode angles for square prediction units with different intra-prediction mode angles for non-square prediction units.
2. A method according to claim 1, further comprising: combining the sets of spatial frequency coefficients relating to the square blocks derived from a transform unit.
3. The method according to claim 1, wherein the non-square transform unit is rectangular, and the splitting comprises selecting respective square blocks on either side of a center axis of the rectangular transform unit.
4. The method according to claim 1, wherein the transform unit has twice as many samples in a vertical direction as in a horizontal direction.
5. The method according to claim 1, wherein, in respect of transform units of an intra-prediction unit, the splitting is performed before generating predicted image data in respect of the intra-prediction unit.
6. The method according to claim 1, for a 4:2:2 chroma subsampling format, further comprising: interpolating a chroma intra-prediction unit having a height twice that of a corresponding 4:2:0 format prediction unit using a chroma filter employed for the corresponding 4:2:0 format prediction unit; and using only alternate vertical values of the interpolated chroma prediction unit.
7. The method according to claim 1, further comprising: deriving a luma motion vector for a prediction unit; and independently deriving a chroma motion vector for the prediction unit.
8. The method according to claim 1, further comprising: indicating that luma residual data is to be included in a bitstream losslessly; and independently indicating that chroma residual data is to be included in the bitstream losslessly.
9. The method according to claim 1, further comprising: defining one or more quantization matrices as difference values with respect to quantization matrices defined for a different chroma subsampling format.
10. A method of video decoding, comprising: applying a spatial frequency transform to blocks of spatial frequency coefficients to generate two or more corresponding square blocks of samples; and combining by circuitry the two or more square blocks of samples into a non-square transform unit.
11. The method according to claim 10, further comprising: splitting a block of spatial frequency coefficients into two or more sub-blocks; and applying the spatial frequency transform separately to each of the sub-blocks.
12. The method according to claim 10, wherein the non-square transform unit is rectangular, and the combining comprises concatenating the respective square blocks on either side of a center axis of the rectangular transform unit.
13. The method according to claim 10, wherein the transform unit has twice as many samples in a vertical direction as in a horizontal direction.
14. The method according to claim 10, wherein the video coding is in respect of a 4:2:2 chroma subsampling format.
15. A non-transitory computer readable medium including computer program instructions, which when executed by a computer causes the computer to perform the method of claim 1.
16. A video coding apparatus, comprising: circuitry configured to divide image data into transform units; a splitter configured to, when a transform unit is a non-square transform unit, split the non-square transform unit into square blocks; and a spatial frequency transformer configured to apply a spatial frequency transform to the square blocks to generate corresponding sets of spatial frequency coefficients, wherein the circuitry is further configured to associate intra-prediction mode angles for square prediction units with different intra-prediction mode angles for non-square prediction units.
17. The video coding apparatus of claim 16, configured to encode video data in a 4:2:2 chroma sampling format.
18. A video decoding apparatus, comprising: a spatial frequency transformer configured to apply a spatial frequency transform to blocks of spatial frequency coefficients to generate two or more corresponding square blocks of samples; and circuitry configured to combine the two or more square blocks of samples into a non-square transform unit.
19. The video decoding apparatus of claim 18, configured to decode video data in a 4:2:2 chroma sampling format.
20. A video capture, storage, display, transmission and/or reception apparatus comprising the coding apparatus according to claim 16.
21. A video capture, storage, display, transmission and/or reception apparatus comprising the decoding apparatus according to claim 18.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) A more complete appreciation of the disclosure and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DESCRIPTION OF THE EMBODIMENTS
(10) An apparatus and methods for chrominance processing in high efficiency video codecs are disclosed. In the following description, a number of specific details are presented in order to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, to a person skilled in the art that these specific details need not be employed to practice the present disclosure. Conversely, specific details known to the person skilled in the art are omitted for the purposes of clarity where appropriate.
(11) So-called high efficiency codecs according to the HEVC standards and/or proposals will be described purely by way of example. The terms HEVC or high efficiency are not to be considered limiting on the technical nature of the present disclosure or the embodiments.
(12) Video coding and decoding of the type to be discussed below makes use of a forward encoding path which encodes a residual image block representing differences between an image block and a predicted version of that image block. The image block used in generating the predicted image block is actually a decoded version of the image block rather than the original image block. The reason for this is to ensure that the encoder and the decoder are both working with the same source data, given that the original input image block is not available at the decoder. So, an encoder also includes a reverse decoding path, as described below with reference to
(13) Block Structure
(14) As noted above, the proposed HEVC standard uses a particular chroma sampling scheme known as the 4:2:0 scheme. The 4:2:0 scheme can be used for domestic/consumer equipment. However, several other schemes are possible.
(15) In particular, a so-called 4:4:4 scheme would be suitable for professional broadcasting, mastering and digital cinema, and in principle would have the highest quality and data rate.
(16) Similarly, a so-called 4:2:2 scheme could be used in professional broadcasting, mastering and digital cinema with some loss of fidelity.
(17) These schemes and their corresponding PU and TU block structures are described below.
(18) In addition, other schemes include the 4:0:0 monochrome scheme.
(19) In the 4:4:4 scheme, each of the three Y, Cb and Cr channels has the same sample rate. In principle therefore, in this scheme there would be twice as much chroma data as luma data.
(20) Hence in HEVC, in this scheme each of the three Y, Cb and Cr channels would have PU and TU blocks that are the same size; for example an 8×8 luma block would have corresponding 8×8 chroma blocks for each of the two chroma channels.
(21) Consequently in this scheme there would generally be a direct 1:1 relationship between block sizes in each channel.
(22) In the 4:2:2 scheme, the two chroma components are sampled at half the sample rate of luma (for example using vertical or horizontal subsampling). In principle therefore, in this scheme there would be as much chroma data as luma data.
(23) Hence in HEVC, in this scheme the Cb and Cr channels would have different size PU and TU blocks to the luma channel; for example an 8×8 luma block could have corresponding 4 wide × 8 high chroma blocks for each chroma channel.
(24) Notably therefore in this scheme the chroma blocks would be non-square.
(25) In the currently proposed HEVC 4:2:0 scheme, the two chroma components are sampled at a quarter of the sample rate of luma (for example using vertical and horizontal subsampling). In principle therefore, in this scheme there is half as much chroma data as luma data.
(26) Hence in HEVC, in this scheme again the Cb and Cr channels have different size PU and TU blocks to the luma channel. For example an 8×8 luma block would have corresponding 4×4 chroma blocks for each chroma channel. Consequently in general all of the CU, PU and TU blocks in this scheme are square, in particular for intra-prediction.
(27) The above schemes are colloquially known in the art as channel ratios, as in a 4:2:0 channel ratio; however it will be appreciated from the above description that in fact this does not always mean that the Y, Cb and Cr channels are compressed or otherwise provided in that ratio. Hence whilst referred to as a channel ratio, this should not be assumed to be literal. In fact, the ratios for the 4:2:0 scheme are 4:1:1 (the ratios for the 4:2:2 scheme and 4:4:4 scheme are in fact correct).
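The block-size and data-rate relationships described above for the three schemes can be illustrated with a minimal Python sketch. The table and function names are hypothetical, and the 4:2:2 entry assumes horizontal subsampling, in line with the 4 wide × 8 high example given above.

```python
# Subsampling factors (horizontal, vertical) applied to each chroma channel.
# Assumption: 4:2:2 is subsampled horizontally only (4 wide x 8 high example).
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def chroma_block_size(luma_width, luma_height, scheme):
    """Return (width, height) of each chroma block co-located with a luma block."""
    sub_x, sub_y = CHROMA_SUBSAMPLING[scheme]
    return luma_width // sub_x, luma_height // sub_y


def chroma_to_luma_data_ratio(scheme):
    """Total chroma samples (Cb plus Cr) divided by the number of luma samples."""
    sub_x, sub_y = CHROMA_SUBSAMPLING[scheme]
    return 2.0 / (sub_x * sub_y)


for scheme in CHROMA_SUBSAMPLING:
    print(scheme, chroma_block_size(8, 8, scheme), chroma_to_luma_data_ratio(scheme))
```

For an 8×8 luma block this yields 8×8, 4×8 and 4×4 chroma blocks respectively, with twice as much, as much, and half as much chroma data as luma data.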
(28) 4:2:0 Block Structure
(29) Referring to
(30) Briefly, the Largest Coding Unit (LCU) is the root picture object. It typically covers an area equivalent to 64×64 luma pixels and is recursively split to form a tree-hierarchy of Coding Units (CUs) being either 64×64, 32×32, 16×16 or 8×8 pixels. The three channels have the same CU tree-hierarchy. The smallest permitted recursion is down to a CU of 8×8 pixels.
(31) The leaf CUs are then split into Prediction Units (PUs). The three channels have the same PU structure (with one possible exception where PUs are 4×4 luma pixels for intra-prediction).
(32) These leaf CUs are also split into Transform Units (TUs), which can in turn be split again, up to a maximum of 16 TUs per CU. The smallest TU size is 4×4 pixels; the largest is 32×32 pixels. The three channels have the same TU structure (again with one possible exception where TUs are 4×4 luma pixels).
(33) 4:4:4 Block Structure Variants
(34) It has been appreciated that both 4:2:0 and 4:4:4 schemes have square PU blocks for intra-prediction coding. Moreover, currently the 4:2:0 scheme permits 4×4 pixel PU & TU blocks.
(35) In an embodiment of the present disclosure, it is consequently proposed that for the 4:4:4 scheme the recursion for CU blocks is permitted down to 4×4 pixels rather than 8×8 pixels, since as noted above in the 4:4:4 mode the luma and chroma blocks will be the same size (the chroma data is not subsampled) and so for a 4×4 CU no PU or TU will need to be less than the already allowed minimum of 4×4 pixels.
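As a minimal illustration of the CU recursion limits discussed above, the following Python sketch lists the square CU sizes reachable by repeatedly halving the LCU. The function name and its default arguments are hypothetical.

```python
def cu_sizes(lcu_size=64, min_cu_size=8):
    """List the square CU edge lengths reachable by recursively halving the LCU."""
    sizes = []
    size = lcu_size
    while size >= min_cu_size:
        sizes.append(size)
        size //= 2  # each split halves the CU in both dimensions
    return sizes


print(cu_sizes())               # 4:2:0 style recursion, down to 8x8
print(cu_sizes(min_cu_size=4))  # proposed 4:4:4 variant, down to 4x4
```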
(36) Similarly, in the 4:4:4 scheme, in an embodiment of the present disclosure each of the Y, Cr, Cb channels, or the Y and the two Cr, Cb channels together, could have respective CU tree-hierarchies. A flag may then be used to signal which hierarchy or arrangement of hierarchies is to be used. This approach could also be used for a 4:4:4 RGB colour space scheme.
(37) 4:2:2 Block Structure Variants
(38) In the example of an 8×8 CU in the 4:2:0 scheme, this results in four 4×4 luma PUs and one 4×4 chroma PU. Hence in the 4:2:2 scheme, having twice as much chroma data, one option in this case is to have two 4×4 chroma PUs. However, it has been appreciated that using one non-square 4×8 chroma PU in this case would be more consistent with other non-square 4:2:2 PUs.
(39) As can be seen from
(40) However, as noted previously, the 4:2:2 scheme can have non-square PUs. Consequently in an embodiment of the present disclosure it is proposed to allow non-square TUs for the 4:2:2 scheme.
(41) For example, whilst a 16×16 4:2:2 luma TU could correspond with two 8×8 4:2:2 chroma TUs for each chroma channel (Cb & Cr), in this embodiment it could instead correspond with one 8×16 4:2:2 chroma TU for each chroma channel (Cb & Cr).
(42) Similarly, four 4×4 4:2:2 luma TUs could correspond with two 4×4 4:2:2 chroma TUs for each chroma channel (Cb & Cr), or in this embodiment could instead correspond with one 4×8 4:2:2 chroma TU for each chroma channel (Cb & Cr). Here, the 4×8 TU is an example of a rectangular TU. It is an example of a 4:2:2 TU which has twice as many samples in a vertical direction as in a horizontal direction. Other sizes of TU may be used, for example other rectangular TUs and/or other TUs which have twice as many samples in a vertical direction as in a horizontal direction. For example, the following sizes may be considered: 2×4, 8×16, 16×32 and so on.
(43) Having non-square chroma TUs, and hence fewer TUs, may be more efficient as they are likely to contain less information. However this may affect the transformation and scanning processes of such TUs, as will be described later.
(44) Finally, for the 4:4:4 scheme it may be preferable to have the TU structure channel-independent, and selectable at the sequence, picture, slice or finer level.
(45) As noted above, NSQT is currently disabled in the 4:2:0 scheme of HEVC. However, if for inter-picture prediction, NSQT is enabled and asymmetric motion partitioning (AMP) is permitted, this allows for PUs to be partitioned asymmetrically; thus for example a 16×16 CU may have a 4×16 PU and a 12×16 PU. In these circumstances, further considerations of block structure are important for each of the 4:2:0 and 4:2:2 schemes.
(46) For the 4:2:0 scheme, in NSQT the minimum width/height of a TU is restricted to 4 luma/chroma samples:
(47) Hence in a non-limiting example a 16×4/16×12 luma PU structure has four 16×4 luma TUs and four 4×4 chroma TUs, where the luma TUs are in a 1×4 vertical block arrangement and the chroma TUs are in a 2×2 block arrangement.
(48) In a similar arrangement where the partitioning was vertical rather than horizontal, a 4×16/12×16 luma PU structure has four 4×16 luma TUs and four 4×4 chroma TUs, where the luma TUs are in a 4×1 horizontal block arrangement and the chroma TUs are in a 2×2 block arrangement.
(49) For the 4:2:2 scheme, in NSQT as a non-limiting example a 4×16/12×16 luma PU structure has four 4×16 luma TUs and four 4×8 chroma TUs, where the luma TUs are in a 4×1 horizontal block arrangement; the chroma TUs are in a 2×2 block arrangement.
(50) However, it has been appreciated that a different structure can be considered for some cases. Hence in an embodiment of the present disclosure, in NSQT as a non-limiting example a 16×4/16×12 luma PU structure has four 16×4 luma TUs and four 8×4 chroma TUs, but now the luma and chroma TUs are in a 1×4 vertical block arrangement, aligned with the PU layout (as opposed to the 4:2:0 style arrangement of four 4×8 chroma TUs in a 2×2 block arrangement).
(51) Similarly a 32×8 PU can have four 16×4 luma TUs and four 8×4 chroma TUs, but now the luma and chroma TUs are in a 2×2 block arrangement.
(52) Hence more generally, for the 4:2:2 scheme, in NSQT the TU block sizes are selected to align with the asymmetric PU block layout. Consequently the NSQT usefully allows TU boundaries to align with PU boundaries, which reduces high frequency artefacts that may otherwise occur.
(53) Intra-Prediction
(54) 4:2:0 Intra-Prediction
(55) Turning now to
(56) HEVC allows chroma to have DC, Vertical, Horizontal, Planar, DM_CHROMA and LM_CHROMA modes.
(57) DM_CHROMA indicates that the prediction mode to be used is the same as that of the co-located luma PU (one of the 35 shown in
(58) LM_CHROMA indicates that co-located luma samples are used to derive the predicted chroma samples. In this case, if the luma PU from which the DM_CHROMA prediction mode would be taken selected DC, Vertical, Horizontal or Planar, that entry in the chroma prediction list is replaced using mode 34.
(59) It is notable that the prediction modes 2-34 sample an angular range from 45 degrees to 225 degrees; that is to say, one diagonal half of a square. This is useful in the case of the 4:2:0 scheme, which as noted above only uses square chroma PUs for intra-picture prediction.
(60) 4:2:2 Intra-Prediction Variants
(61) However, also as noted above the 4:2:2 scheme could have rectangular (non-square) chroma PUs.
(62) Consequently, in an embodiment of the present disclosure, for rectangular chroma PUs, a mapping table may be required for the direction. Assuming a 1-to-2 aspect ratio for rectangular PUs, then for example mode 18 (currently at an angle of 135 degrees) may be re-mapped to 123 degrees. Alternatively selection of current mode 18 may be remapped to a selection of current mode 22, to much the same effect.
(63) Hence more generally, for non-square PUs, a different mapping between the direction of the reference sample and the selected intra prediction mode may be provided compared with that for square PUs.
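A mapping table of this kind might be sketched in Python as follows. This is only an illustration: the table and function names are hypothetical, and the single entry shown (mode 18 to mode 22) is the one example given above; any further entries would have to be chosen by the codec designer.

```python
# Hypothetical remapping table for non-square chroma PUs.
# Only the 18 -> 22 entry is taken from the text; modes not listed are
# left unchanged in this sketch.
NON_SQUARE_MODE_REMAP = {18: 22}


def effective_intra_mode(selected_mode, pu_width, pu_height):
    """Return the prediction mode actually used for a PU of the given shape."""
    if pu_width == pu_height:
        return selected_mode  # square PUs keep the existing mapping
    return NON_SQUARE_MODE_REMAP.get(selected_mode, selected_mode)


print(effective_intra_mode(18, 4, 4))  # square PU: mode unchanged
print(effective_intra_mode(18, 4, 8))  # rectangular PU: remapped
```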
(64) More generally still, any of the modes, including the non-directional modes, may also be re-mapped based upon empirical evidence.
(65) It is possible that such mapping will result in a many-to-one relationship, making the specification of the full set of modes redundant for 4:2:2 chroma PUs. In this case, for example it may be that only 17 modes (corresponding to half the angular resolution) are necessary. Alternatively or in addition, these modes may be angularly distributed in a non-uniform manner.
(66) Similarly, the smoothing filter used on the reference sample when predicting the pixel at the sample position may be used differently; in the 4:2:0 scheme it is only used to smooth luma pixels, but not chroma ones. However, in the 4:2:2 and 4:4:4 schemes this filter may also be used for the chroma PUs. In the 4:2:2 scheme, again the filter may be modified in response to the different aspect ratio of the PU, for example only being used for a subset of near horizontal modes. An example subset of modes is preferably 2-18 and 34, or more preferably 7-14.
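The per-scheme use of the smoothing filter described above can be summarised in a short Python sketch. The function and constant names are hypothetical, and the function only reports whether the filter is permitted; any further conditions a codec applies before actually filtering are outside this sketch.

```python
WIDE_MODE_SUBSET = set(range(2, 19)) | {34}  # modes 2-18 and 34
NARROW_MODE_SUBSET = set(range(7, 15))       # modes 7-14


def smoothing_permitted(channel, scheme, mode, subset=NARROW_MODE_SUBSET):
    """Return True if reference-sample smoothing may be used for this PU."""
    if channel == "luma":
        return True
    if scheme == "4:2:0":
        return False           # chroma is not smoothed in 4:2:0
    if scheme == "4:2:2":
        return mode in subset  # restricted to a subset of modes
    return True                # 4:4:4 chroma handled like luma


print(smoothing_permitted("chroma", "4:2:0", 10))
print(smoothing_permitted("chroma", "4:2:2", 10))
print(smoothing_permitted("chroma", "4:2:2", 34, WIDE_MODE_SUBSET))
```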
(67) 4:4:4 Intra-Prediction Variants
(68) In the 4:4:4 scheme, the chroma and luma PUs are the same size, and so the intra-prediction mode for a chroma PU can be either the same as the co-located luma PU (so saving some overhead in the bit stream), or more preferably, it can be independently selected.
(69) In this latter case therefore, in an embodiment of the present disclosure one may have 1, 2 or 3 different prediction modes for the PUs in a CU:
(70) In a first example, the Y, Cb and Cr PUs may all use the same intra-prediction mode.
(71) In a second example, the Y PU may use one intra-prediction mode, and the Cb and Cr PUs both use another independently selected intra-prediction mode.
(72) In a third example, the Y, Cb and Cr PUs each use a respective independently selected intra-prediction mode.
(73) It will be appreciated that having independent prediction modes for the chroma channels (or each chroma channel) will improve the colour prediction accuracy.
(74) The selection of the number of modes could be indicated in the high-level syntax (for example at sequence, picture, or slice level). Alternatively, the number of independent modes could be derived from the video format; for example, GBR could have up to 3, whilst YCbCr could be restricted to up to 2.
(75) In addition to independently selecting the modes, the available modes may be allowed to differ from the 4:2:0 scheme in the 4:4:4 scheme.
(76) For example as the luma and chroma PUs are the same size, the chroma PU may benefit from access to all of the 35+LM_CHROMA+DM_CHROMA directions available. Hence for the case of Y, Cb and Cr each having independent prediction modes, then the Cb channel could have access to DM_CHROMA & LM_CHROMA, whilst the Cr channel could have access to DM_CHROMA_Y, DM_CHROMA_Cb, LM_CHROMA_Y and LM_CHROMA_Cb, where these replace references to the Luma channel with references to the Y or Cb chroma channels.
(77) Where the luma prediction modes are signalled by deriving a list of most probable modes and sending an index for that list, then if the chroma prediction mode(s) are independent, it may be necessary to derive independent lists of most probable modes for each channel.
(78) Finally, in a similar manner to that noted for the 4:2:2 case above, in the 4:4:4 scheme the smoothing filter used on the reference sample when predicting the pixel at the sample position may be used for chroma PUs in a similar manner to luma PUs.
(79) Inter-Prediction
(80) Each frame of a video image is a discrete sampling of a real scene, and as a result each pixel is a step-wise approximation of a real-world gradient in colour and brightness.
(81) In recognition of this, when predicting the Y, Cb or Cr value of a pixel in a new video frame from a value in a previous video frame, the pixels in that previous video frame are interpolated to create a better estimate of the original real-world gradients, to allow a more accurate selection of brightness or colour for the new pixel. Consequently the motion vectors used to point between video frames are not limited to an integer pixel resolution. Rather, they can point to a sub-pixel position within the interpolated image.
(82) 4:2:0 Inter-Prediction
(83) Referring now to
(84) For example for the 8×8 4:2:0 luma PU, interpolation is ¼ pixel, and so an 8-tap ×4 filter is applied horizontally first, and then the same 8-tap ×4 filter is applied vertically, so that the luma PU is effectively stretched 4 times in each direction, as shown in
(85) 4:2:2 Inter-Prediction Variants
(86) Referring now also to
(87) Whilst it may be possible therefore to use the existing 8-tap ×4 luma filter vertically on the chroma PU, in an embodiment of the present disclosure it has been appreciated that the existing 4-tap ×8 chroma filter would suffice for vertical interpolation as in practice one is only interested in the even fractional locations of the interpolated chroma PU.
(88) Hence
(89) 4:4:4 Inter-Prediction Variants
(90) By extension, the same principle of only using the even fractional results for the existing 4-tap ×8 chroma filter can be applied both vertically and horizontally for the 8×8 4:4:4 chroma PUs.
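The use of only the even fractional positions can be illustrated with a short Python sketch. It assumes, for illustration only, that a motion vector component is expressed in quarter luma samples and that the chroma filter offers eight fractional phases (0 to 7); the function name is hypothetical and no filter coefficients are modelled.

```python
def chroma_eighth_pel_phase(mv_component, chroma_subsampled):
    """Split a quarter-luma-sample MV component into chroma terms.

    Returns (whole_sample_offset, phase) where phase is the 1/8-sample
    position (0-7) selected in the x8 chroma filter.
    """
    if chroma_subsampled:
        # Chroma has half the luma resolution in this direction, so one
        # quarter luma sample equals one eighth chroma sample.
        eighths = mv_component
    else:
        # Chroma has the same resolution as luma in this direction, so
        # quarter samples land only on even eighth positions.
        eighths = mv_component * 2
    return eighths >> 3, eighths & 7


# 4:2:2 example: horizontal direction subsampled, vertical not.
print(chroma_eighth_pel_phase(5, True))   # horizontal
print(chroma_eighth_pel_phase(5, False))  # vertical: phase is always even
```

In a direction where chroma is not subsampled (vertically in 4:2:2, and both directions in 4:4:4), the odd phases of the filter are simply never selected.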
(91) Further Inter-Prediction Variants
(92) In one implementation of motion vector (MV) derivation, one vector is produced for a PU in a P-slice and two vectors for a PU in a B-slice (where a P-slice takes predictions from a preceding frame, and a B-slice takes predictions from a preceding and following frame, in a similar manner to MPEG P and B frames). Notably, in this implementation in the 4:2:0 scheme the vectors are common to all channels, and moreover, the chroma data is not used to calculate the motion vectors. In other words, all the channels use a motion vector based on the luma data.
(93) In an embodiment of the present disclosure, in the 4:2:2 scheme the chroma vector could be independent from luma (a vector for the Cb and Cr channels could be derived separately), and in the 4:4:4 scheme chroma vectors could further be independent for each of the Cb and Cr channels.
(94) Transforms
(95) In HEVC, most images are encoded using motion vectors with respect to previously encoded/decoded frames, with the motion vectors telling the decoder where, in these other decoded frames, to copy good approximations of the current image from. The result is an approximate version of the current image. HEVC then encodes the so-called residual, which is the error between that approximate version and the correct image. This residual requires much less information than specifying the actual image directly. However, it is still generally preferable to compress this residual information to reduce the overall bitrate further.
(96) In many encoding methods including HEVC, such data is transformed into the spatial frequency domain using an integer cosine transform (ICT), and typically some compression is then achieved by retaining low spatial frequency data and discarding higher spatial frequency data according to the level of compression desired.
(97) 4:2:0 Transforms
(98) The spatial frequency transforms used in HEVC are conventionally ones that generate coefficients in powers of 4 (for example 64 frequency coefficients) as this is particularly amenable to common quantisation/compression methods. The square TUs in the 4:2:0 scheme are all powers of 4 and hence this is straightforward to achieve.
(99) Even in the case of the currently not-enabled NSQT, some non-square transforms are available for non-square TUs, such as 4×16, but again notably these result in 64 coefficients, again a power of 4.
(100) 4:2:2 and 4:4:4 Transform Variants
(101) The 4:2:2 scheme can result in non-square TUs that are not powers of 4; for example a 4×8 TU has 32 pixels, and 32 is not a power of 4.
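The power-of-4 property can be checked with a few lines of Python; the function name is hypothetical.

```python
def is_power_of_4(n):
    """Return True when n is 1, 4, 16, 64 and so on."""
    if n <= 0:
        return False
    while n % 4 == 0:
        n //= 4
    return n == 1


print(is_power_of_4(4 * 4))   # square TU
print(is_power_of_4(4 * 16))  # NSQT TU with 64 coefficients
print(is_power_of_4(4 * 8))   # 4:2:2 chroma TU with 32 samples
```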
(102) In an embodiment of the present disclosure therefore, a non-square transform for a non-power of 4 number of coefficients may be used, acknowledging that modifications may be required to the subsequent quantisation process.
(103) Alternatively, in an embodiment of the present disclosure non-square TUs are split into square blocks having a power of 4 area for transformation, and then the resulting coefficients can be interleaved.
(104) For example, for 4×8 blocks (eight rows of four samples), odd/even rows of samples can be split into two square blocks, for example so that one of the square blocks takes the even rows and the other takes the odd rows. Alternatively, for 4×8 blocks the top 4×4 pixels and the bottom 4×4 pixels could form two square blocks, in other words by dividing the TU around a centre axis of the TU (a horizontal axis in this example). Alternatively again, for 4×8 blocks a Haar wavelet decomposition can be used to form a lower and an upper frequency 4×4 block. Corresponding recombining techniques are used to recombine decoded square blocks into a TU at the decoder (or in the reverse decoding path of the encoder).
(105) Any of these options may be made available, and the selection of a particular alternative may be signalled to or derived by the decoder.
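The three splitting options can be sketched in Python as follows, with a TU represented as a list of eight rows of four samples. The function names are hypothetical, and the Haar variant shown uses unnormalised sums and differences of vertically adjacent rows, which is one possible form assumed here for illustration.

```python
def split_alternate_rows(tu):
    """Even rows form one square block, odd rows the other."""
    return tu[0::2], tu[1::2]


def split_centre_axis(tu):
    """Top half and bottom half either side of the horizontal centre axis."""
    half = len(tu) // 2
    return tu[:half], tu[half:]


def split_haar(tu):
    """One-level vertical Haar split into low and high frequency blocks."""
    even, odd = tu[0::2], tu[1::2]
    low = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(even, odd)]
    high = [[a - b for a, b in zip(r0, r1)] for r0, r1 in zip(even, odd)]
    return low, high


# Example 4 wide x 8 high TU whose sample value encodes its position.
tu = [[row * 4 + col for col in range(4)] for row in range(8)]
for split in (split_alternate_rows, split_centre_axis, split_haar):
    first, second = split(tu)
    print(split.__name__, len(first), "x", len(first[0]), first[0], second[0])
```

Each option yields two 4×4 blocks, so the existing square transforms can be applied to each one.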
(106) Accordingly, at the encoder side, this represents an example of a method of video coding in respect of a 4:2:2 chroma subsampling format or another format, the method comprising:
(107) dividing image data into transform units;
(108) in the case of a non-square transform unit, splitting the non-square transform unit into square blocks prior to applying a spatial frequency transform; and
(109) applying a spatial frequency transform to the square blocks to generate corresponding sets of spatial frequency coefficients.
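The steps above can be sketched in Python as follows. This is a minimal illustration, not the codec's transform: a floating-point orthonormal DCT-II stands in for the integer transform, the split is made about horizontal axes, and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transform_square(block):
    """2-D transform of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    c_t = [list(col) for col in zip(*c)]
    return matmul(matmul(c, block), c_t)


def encode_transform_unit(tu):
    """Split a tall non-square TU into square blocks, then transform each."""
    height, width = len(tu), len(tu[0])
    # Assumes height is a whole multiple of width (for example 4 wide x 8 high).
    squares = [tu[top:top + width] for top in range(0, height, width)]
    return [transform_square(square) for square in squares]


flat_tu = [[10] * 4 for _ in range(8)]  # 4 wide x 8 high, constant value
coefficient_sets = encode_transform_unit(flat_tu)
print(len(coefficient_sets), round(coefficient_sets[0][0][0], 6))
```

A square TU passes through unchanged as a single block, whereas the 4×8 example produces two sets of 4×4 coefficients.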
(110) In embodiments, in respect of transform units of an intra-prediction unit, the splitting step may be performed before generating predicted image data in respect of that prediction unit. This can be useful because for intra-coding, the prediction is potentially based upon recently decoded TUs which could be others from the same PU.
(111) Optionally, the sets of spatial frequency coefficients relating to the square blocks derived from a transform unit may be recombined after the transform has been performed. But in other embodiments, the coefficients relating to the transformed square blocks may be encoded, stored and/or transmitted separately.
(112) As discussed above, the splitting may comprise applying a Haar transform. Alternatively, in the case that the non-square transform unit is rectangular, the splitting may comprise selecting respective square blocks either side of a centre axis of the rectangular transform unit. Alternatively, in the case that the non-square transform unit is rectangular, the splitting may comprise selecting alternate rows or columns of samples of the transform unit.
(114) A 4×8 TU is an example of a rectangular TU. It is an example of a TU in which there are twice as many samples in a vertical direction as in a horizontal direction.
(115) At the decoder side, a method of video decoding in respect of a 4:2:2 chroma subsampling format or other format may comprise applying a spatial frequency transform to blocks of spatial frequency coefficients to generate two or more corresponding square blocks of samples; and combining the two or more square blocks of samples into a non-square transform unit.
(116) In other words, spatial frequency coefficients for the square blocks may be handled (at least by the transform process) separately, with the resulting square blocks of samples being combined into the non-square TU.
(117) Prior to the transform process being applied, the coefficients may be delivered as respective sets (each corresponding to a square block) or as a combined set of coefficients. In the latter case, the method may include splitting a block of spatial frequency coefficients into two or more sub-blocks; and applying the spatial frequency transform separately to each of the sub-blocks.
(118) As above, various options are proposed for the combining operation. The combining may comprise applying an inverse Haar transform. Alternatively, in the case that the non-square transform unit is rectangular, the combining may comprise concatenating the respective square blocks either side of a centre axis of the rectangular transform unit. Alternatively, in the case that the non-square transform unit is rectangular, the combining may comprise selecting alternate rows or columns of samples of the transform unit from alternate ones of the square blocks.
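The combining options can be sketched in Python as follows, with each square block represented as a list of rows. The function names are hypothetical, and the inverse Haar step assumes the two blocks hold unnormalised sums and differences of vertically adjacent rows, which is an assumption of this sketch rather than a form stated in the text.

```python
def combine_centre_axis(top, bottom):
    """Concatenate the two square blocks either side of the centre axis."""
    return top + bottom


def combine_alternate_rows(even_rows, odd_rows):
    """Interleave rows taken alternately from the two square blocks."""
    rows = []
    for even, odd in zip(even_rows, odd_rows):
        rows.append(even)
        rows.append(odd)
    return rows


def combine_haar(low, high):
    """Inverse of a sum/difference Haar split, then interleave the rows."""
    # With low = a + b and high = a - b, both low + high and low - high
    # are even, so the integer divisions below are exact.
    even = [[(l + h) // 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    odd = [[(l - h) // 2 for l, h in zip(rl, rh)] for rl, rh in zip(low, high)]
    return combine_alternate_rows(even, odd)


print(combine_centre_axis([[0, 1]], [[2, 3]]))
print(combine_alternate_rows([[0, 1], [4, 5]], [[2, 3], [6, 7]]))
print(combine_haar([[4, 6, 8, 10]], [[-4, -4, -4, -4]]))
```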
(119) Other Transform Modes
(120) In the 4:2:0 scheme there is a proposed flag (the so-called qpprime_y_zero_transquant_bypass_flag) allowing the residual data to be included in the bit stream losslessly (without being transformed, quantised or further filtered). In the 4:2:0 scheme the flag applies to all channels.
(121) In an embodiment of the present disclosure, it is proposed that the flag for the luma channel is separate to the chroma channels. Hence for the 4:2:2 scheme, such flags should be provided separately for the luma channel and for the chroma channels, and for the 4:4:4 scheme, such flags should be provided either separately for the luma and chroma channels, or one flag is provided for each of the three channels. This recognises the increased chroma data rates associated with the 4:2:2 and 4:4:4 schemes, and enables, for example, lossless luma data together with compressed chroma data.
(122) For intra-prediction coding, mode-dependent directional transform (MDDT) allows the horizontal or vertical ICT (or both ICTs) for a TU to be replaced with an Integer Sine Transform depending upon the intra-prediction direction. In the 4:2:0 scheme this is not applied to chroma TUs. However in an embodiment of the present disclosure it is proposed to apply it to 4:2:2 and 4:4:4 chroma TUs.
(123) Quantisation
(124) In the 4:2:0 scheme, the quantisation calculation is the same for chrominance as for luminance. Only the quantisation parameters (QPs) differ.
(125) QPs for chrominance are calculated from the luminance QPs as follows:
$Qp_{Cb} = \text{scalingTable}[Qp_{luminance} + \text{chroma\_qp\_index\_offset}]$
$Qp_{Cr} = \text{scalingTable}[Qp_{luminance} + \text{second\_chroma\_qp\_index\_offset}]$
(126) Where the scaling table is defined as seen in
(127) Chrominance channels typically contain less information than luminance and hence have smaller-magnitude coefficients; this limitation on the chrominance QP may prevent all chrominance detail being lost at heavy quantisation levels.
(128) The QP-divisor relationship in the 4:2:0 scheme is such that an increase of 6 in the QP is equivalent to a doubling of the divisor. Hence the largest difference in the scaling table of 51−39=12 represents a factor-of-4 change in the divisor.
(129) However, in an embodiment of the present disclosure, for the 4:2:2 scheme, which potentially contains twice as much chroma information as the 4:2:0 scheme, the maximum chrominance QP value in the scaling table may be raised to 45 (halving the divisor). Similarly for the 4:4:4 scheme, the maximum chrominance QP value in the scaling table may be raised to 51 (the same divisor). In this case the scaling table is in effect redundant, but may be retained simply for operational efficiency (so that the system works by reference to a table in the same way for each scheme). Hence more generally in an embodiment of the present disclosure the chroma QP divisor is modified responsive to the amount of information in the coding scheme relative to the 4:2:0 scheme.
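The arithmetic behind these maximum values can be checked with a short Python sketch. It assumes only the stated relationship (the divisor doubles for every increase of 6 in QP) and a maximum luma QP of 51; the function name divisor_ratio is hypothetical.

```python
def divisor_ratio(qp_delta: int) -> float:
    """Ratio between two quantisation divisors whose QPs differ by qp_delta,
    given that the divisor doubles for every increase of 6 in QP."""
    return 2.0 ** (qp_delta / 6.0)


LUMA_QP_MAX = 51  # maximum luma QP, as implied by the 51 - 39 = 12 difference

if __name__ == "__main__":
    for scheme, chroma_qp_max in (("4:2:0", 39), ("4:2:2", 45), ("4:4:4", 51)):
        ratio = divisor_ratio(LUMA_QP_MAX - chroma_qp_max)
        print(f"{scheme}: max chroma QP {chroma_qp_max}, "
              f"luma divisor / chroma divisor = {ratio:g}")
```

The ratios come out as 4, 2 and 1 respectively, matching the factor-of-4, halved and unchanged divisors described above.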
(130) It is also notable that in the 4:2:0 scheme, the largest chroma TU is 16×16, whereas for the 4:2:2 scheme 16×32 TUs are possible, and for the 4:4:4 scheme, 32×32 chroma TUs are possible. Consequently in an embodiment of the present disclosure quantisation matrices (Qmatrices) for 32×32 chroma TUs are proposed. Similarly, Qmatrices should be defined for non-square TUs such as the 16×32 TU.
(131) Qmatrices could be defined by any one of the following: values in a grid (as for 4×4 and 8×8 Qmatrices); spatial interpolation from smaller or larger matrices (in HEVC, larger Qmatrices can be derived from smaller ones); relative to other Qmatrices, as difference values or deltas (hence only the deltas need to be sent); as a function of another Qmatrix, for example a scaling ratio relative to another matrix (hence only the coefficients of the function, such as the scaling ratio, need to be sent); as an equation or function, for example a piece-wise linear curve, exponential or polynomial (hence only the coefficients of the equation need to be sent to derive the matrix); or any combination of the above.
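Three of these derivation options can be sketched in Python as follows. This is an illustration under stated assumptions rather than a defined syntax: the function names are hypothetical, sample replication stands in for spatial interpolation, and results of the scaling ratio are simply rounded to the nearest integer.

```python
from typing import List

Matrix = List[List[int]]


def from_deltas(reference: Matrix, deltas: Matrix) -> Matrix:
    """Derive a Qmatrix from a reference matrix plus transmitted deltas."""
    return [[r + d for r, d in zip(ref_row, delta_row)]
            for ref_row, delta_row in zip(reference, deltas)]


def from_scaling_ratio(reference: Matrix, ratio: float) -> Matrix:
    """Derive a Qmatrix as a scaled copy of another; only `ratio` is sent."""
    return [[round(value * ratio) for value in row] for row in reference]


def upsample_by_replication(small: Matrix, factor: int) -> Matrix:
    """Derive a larger Qmatrix by repeating each entry factor x factor times
    (an assumed, simplest-possible form of spatial interpolation)."""
    height, width = len(small) * factor, len(small[0]) * factor
    return [[small[y // factor][x // factor] for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    base = [[16, 18], [18, 20]]
    print(from_deltas(base, [[0, 1], [-1, 2]]))
    print(from_scaling_ratio(base, 1.5))
    print(upsample_by_replication(base, 2))
```

In each case the decoder needs only the reference matrix plus a small amount of side information (deltas, a ratio, or a size factor) to rebuild the full matrix.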
(132) Other useful information includes an optional indicator of which other matrix the values are related to, namely the previous channel or the first (primary) channel; for example, the matrix for Cr could be a scaled version of the matrix for Y, or of the matrix for Cb, as indicated.
(133) The number of Q Matrices in HEVC 4:2:0 is currently 2 (Luma+Chroma) for each transform size. However, in an embodiment of the present disclosure 3 are provided for (Y+Cb+Cr) or (G+B+R) as applicable. Hence in the case of a 4:4:4 GBR scheme, it will be appreciated that either one set of quantisation matrices could be used for all channels, or three respective sets of quantisation matrices could be used.
(134) A similar principle may be applied to MPEG4-SStP for GBR, where again 2 or 3 matrices per transform size may be provided.
(135) Entropy Encoding
(136) Basic entropy encoding comprises assigning codewords to input data symbols, where the shortest available codewords are assigned to the most probable symbols in the input data. On average the result is a lossless but much smaller representation of the input data.
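As an illustration of this basic principle only, the following Python sketch computes Huffman codeword lengths, one classic way of giving the shortest codewords to the most probable symbols. Huffman coding is a substituted example and not the context-adaptive scheme discussed in this disclosure; the function name and the sample input are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code_lengths(data: str) -> Dict[str, int]:
    """Return the Huffman codeword length, in bits, for each symbol in data."""
    counts = Counter(data)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (subtree count, unique tie-breaker, {symbol: depth}).
    heap = [(count, index, {symbol: 0})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        count_a, _, depths_a = heapq.heappop(heap)
        count_b, _, depths_b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**depths_a, **depths_b}.items()}
        heapq.heappush(heap, (count_a + count_b, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


if __name__ == "__main__":
    sample = "aaaabbc"
    lengths = huffman_code_lengths(sample)
    total_bits = sum(lengths[symbol] for symbol in sample)
    print(lengths, total_bits)
```

For the sample input the most frequent symbol receives a 1-bit codeword and the two rarer symbols receive 2-bit codewords, giving 10 bits in total against 14 bits for a fixed 2-bit code.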
(137) This basic scheme can be improved upon further by recognising that symbol probability is often conditional on recent prior data, and consequently making the assignment process context adaptive.
(138) In such a scheme, context variables (CVs) are used to determine the choice of respective probability models, and such CVs are provided for in the HEVC 4:2:0 scheme.
(139) To extend entropy encoding to the 4:2:2 scheme, which for example will use 4×8 chroma TUs rather than 4×4 TUs for an 8×8 luma TU, optionally the context variables can be provided for by simply vertically repeating the equivalent CV selections.
(140) However, in an embodiment of the present disclosure the CV selections are not repeated for the top-left coefficients (the high-energy, DC and/or low spatial frequency coefficients), and instead new CVs are derived. In this case, for example, a mapping may be derived from the luma map. This approach may also be used for the 4:4:4 scheme.
(141) During coding, in the 4:2:0 scheme, a so-called zig-scan scans through the coefficients in order from high to low frequencies. However, again it is noted that the chroma TUs in the 4:2:2 scheme can be non-square, and so in an embodiment of the present disclosure a different chroma zig-scan is proposed, with the angle of the scan being tilted to make it more horizontal or, more generally, responsive to the aspect ratio of the TU.
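One way to picture a scan whose angle depends on the TU shape is the following Python sketch, which orders coefficient positions along weighted diagonals. It is an assumption-laden illustration, not the scan defined by HEVC or by this disclosure: the function name, the integer weights and the tie-breaking rule are all hypothetical, and the list runs from the DC position outwards (reverse it for the opposite direction).

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (x, y) coefficient position within the TU


def tilted_diagonal_scan(width: int, height: int,
                         x_weight: int = 1, y_weight: int = 1) -> List[Position]:
    """Order positions by the weighted diagonal x_weight*x + y_weight*y.

    Equal weights give an ordinary 45-degree diagonal scan; a larger y_weight
    makes each scan line more horizontal.
    """
    positions = [(x, y) for y in range(height) for x in range(width)]
    return sorted(positions,
                  key=lambda p: (x_weight * p[0] + y_weight * p[1], p[1]))


if __name__ == "__main__":
    print(tilted_diagonal_scan(4, 4)[:6])
    print(tilted_diagonal_scan(4, 8, x_weight=1, y_weight=2)[:6])
```

Exposing the weights as parameters lets the tilt be chosen from the TU aspect ratio, which is the general idea stated above; which weights suit a given TU shape is left open here.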
(142) Similarly, the neighbourhood for significance map CV selection and the c1/c2 system for greater-than-one and greater-than-two CV selection may be adapted accordingly.
(143) Likewise, in an embodiment of the present disclosure the last significant coefficient position (which becomes the start point during decoding) could also be adjusted for the 4:4:4 scheme, with last-significant positions for chroma TUs being coded differentially from the last-significant position in the co-located luma TU.
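The differential coding of the last significant position can be sketched in Python as below. The function names and the (x, y) tuple representation are hypothetical; the sketch relies on luma and chroma TUs having the same dimensions in the 4:4:4 scheme, so no position scaling is applied.

```python
from typing import Tuple

Position = Tuple[int, int]  # (x, y) of the last significant coefficient


def encode_last_position(chroma_last: Position, luma_last: Position) -> Position:
    """Express the chroma position as a difference from the co-located luma one."""
    return (chroma_last[0] - luma_last[0], chroma_last[1] - luma_last[1])


def decode_last_position(delta: Position, luma_last: Position) -> Position:
    """Recover the chroma position from the difference and the luma position."""
    return (luma_last[0] + delta[0], luma_last[1] + delta[1])


if __name__ == "__main__":
    luma, chroma = (5, 3), (4, 3)
    delta = encode_last_position(chroma, luma)
    print(delta, decode_last_position(delta, luma))
```

When the chroma and luma positions are close, the difference is small in magnitude and so tends to be cheaper to entropy encode than the absolute position.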
(144) The coefficient scanning can also be made prediction mode dependent for certain TU sizes. Hence a different scan order can be used for some TU sizes dependent on the intra-prediction mode.
(145) In the 4:2:0 scheme, mode dependent coefficient scanning (MDCS) is only applied for 4×4/8×8 luma TUs and 4×4 chroma TUs for intra prediction.
(146) In an embodiment of the present disclosure, it is proposed that in the 4:2:2 scheme MDCS is applied to 4×8 and 8×4 chroma TUs for intra prediction. Similarly, it is proposed that in the 4:4:4 scheme MDCS is applied to 8×8 and 4×4 chroma TUs.
(147) In-Loop Filters
(148) Deblocking
(149) Deblocking is applied to all CU, PU and TU boundaries, and the CU/PU/TU shape is not taken into account. The filter strength and size are dependent on local statistics, and deblocking has a granularity of 8×8 luma pixels.
(150) Consequently it is anticipated that the current deblocking applied for the 4:2:0 scheme should also be applicable for the 4:2:2 and 4:4:4 schemes.
(151) Sample Adaptive Offsetting
(152) In sample adaptive offsetting (SAO) each channel is completely independent. SAO splits the image data for each channel using a quad-tree, and the resulting blocks are at least one LCU in size. The leaf blocks are aligned to LCU boundaries and each leaf can run in one of three modes, as determined by the encoder (Central band offset, Side band offset or Edge offset). Each leaf categorises its pixels, and the encoder derives an offset value for each of the 16 categories by comparing the SAO input data to the source data. These offsets are sent to the decoder. The offset for a decoded pixel's category is added to its value to minimise the deviation from the source.
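The offset derivation and application described above can be sketched in Python as follows. Only the general mechanism comes from the text (categorise each pixel, derive one offset per category at the encoder, add it at the decoder); the equal-width intensity classifier, the 8-bit sample depth, the rounding of the mean difference and all names are assumptions made for illustration.

```python
from typing import List

NUM_CATEGORIES = 16  # number of categories, as stated in the text
BIT_DEPTH = 8        # assumed sample bit depth


def categorise(value: int) -> int:
    """Assumed classifier: 16 equal-width intensity bands."""
    return value >> (BIT_DEPTH - 4)


def derive_offsets(sao_input: List[int], source: List[int]) -> List[int]:
    """Encoder side: mean (source - input) difference for each category."""
    sums = [0] * NUM_CATEGORIES
    counts = [0] * NUM_CATEGORIES
    for decoded, original in zip(sao_input, source):
        category = categorise(decoded)
        sums[category] += original - decoded
        counts[category] += 1
    return [round(sums[c] / counts[c]) if counts[c] else 0
            for c in range(NUM_CATEGORIES)]


def apply_offsets(sao_input: List[int], offsets: List[int]) -> List[int]:
    """Decoder side: add the offset of each pixel's category, then clip."""
    max_value = (1 << BIT_DEPTH) - 1
    return [min(max_value, max(0, value + offsets[categorise(value)]))
            for value in sao_input]


if __name__ == "__main__":
    decoded = [10, 12, 200, 204]
    original = [12, 14, 197, 201]
    offsets = derive_offsets(decoded, original)
    print(apply_offsets(decoded, offsets))
```

Because the classifier depends only on the decoded pixel values, the decoder can reproduce the categorisation itself and needs to receive only the offsets.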
(153) In addition, SAO is enabled or disabled at picture level; if enabled for luma, it can also be enabled separately for each chroma channel. SAO will therefore be applied to chroma only if it is applied to luma.
(154) Consequently the process is largely transparent to the underlying block scheme and it is anticipated that the current SAO applied for the 4:2:0 scheme should also be applicable for the 4:2:2 and 4:4:4 schemes.
(155) Adaptive Loop Filtering
(156) In the 4:2:0 scheme, adaptive loop filtering (ALF) is disabled by default. However, in principle (that is, if allowed), ALF would be applied to the entire picture for chroma.
(157) In ALF, luma samples are sorted into one of 15 categories; each category uses a different Wiener-based filter.
(158) By contrast, in 4:2:0 chroma samples are not categorised; there is just one Wiener-based filter for Cb, and one for Cr.
(159) Hence in an embodiment of the present disclosure, in light of the increased chroma information in the 4:2:2 and 4:4:4 schemes, it is proposed that the chroma samples are categorised; for example with 7 categories for 4:2:2 and 15 categories for 4:4:4.
(160) Whilst in the 4:2:0 scheme ALF can be disabled for luma on a per-CU basis using an ALF control flag (down to the CU-level specified by the ALF control depth), it can only be disabled for chroma on a per-picture basis.
(161) Consequently in an embodiment of the present disclosure, the 4:2:2 and 4:4:4 schemes are provided with one or two channel specific ALF control flags for chroma.
(162) Syntax
(163) In HEVC, syntax is already present to indicate 4:2:0, 4:2:2 or 4:4:4 schemes, and is indicated at the sequence level. However, in an embodiment of the present disclosure it is proposed to also indicate 4:4:4 GBR coding at this level.
(164) HEVC Encoder
(165) Referring now to
(166) HEVC Decoder
(167) As discussed above, the reverse path of the decoder shown in
(168) Accordingly a decoder corresponding to the above encoder will be readily understood by a person skilled in the art to similarly comprise an intra-frame mode selector (corresponding to the selector 110) operable to select (for example, on the basis of data supplied by the encoder as part of the encoded bitstream) an intra-prediction mode, and an intra-frame mode predictor (corresponding to the predictor 120) which, responsive to that selection, is operable to select one of a plurality of predetermined orders of transform unit processing, so as to correspond with the encoding process for that data (otherwise the transmitted residual errors would not correspond to the errors in prediction at decoding). Hence such a decoder may also implement the methods described herein.
(169) The apparatus of
(170) As discussed above, features of the reverse path of the encoder of
(171) A HEVC or other decoder corresponding to the above encoder will be understood by a person skilled in the art. Such a decoder may implement at least the methods summarised in
(172) Summary
(173) In a summary embodiment of the present disclosure, a HEVC encoder as described above is operable to carry out methods described herein, including but not limited to the following.
(174) Referring to
(175) Referring to
(176) Referring to
(177) Referring to
(178) Referring to
(179) Referring to
(180) Referring to
(181) Referring to
(182) Referring to
(183) Referring to
(184) Referring to
(185) Referring to
(186) Referring to
(187) Referring to
(188) Referring to
(189) Referring to
(190) Referring to
(191) Referring to
(192) Referring to
(193) Referring to
(194) Finally, it will be appreciated that the methods disclosed herein may be carried out on conventional hardware suitably adapted as applicable by software instruction and/or by the inclusion or substitution of dedicated hardware.
(195) Thus the required adaptation to existing parts of a conventional equivalent device may be implemented in the form of a non-transitory computer program product or similar object of manufacture comprising processor implementable instructions stored on a data carrier such as a floppy disk, optical disk, hard disk, PROM, RAM, flash memory or any combination of these or other storage media, or in the form of a transmission via data signals on a network such as an Ethernet, a wireless network, the Internet, or any combination of these or other networks, or realised in hardware as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array) or other configurable circuit suitable for use in adapting the conventional equivalent device.
(196) In so far as embodiments of the disclosure have been described as being implemented, at least in part, by software-controlled data processing apparatus, it will be appreciated that a non-transitory machine-readable medium carrying such software, such as an optical disk, a magnetic disk, semiconductor memory or the like, is also considered to represent an embodiment of the present disclosure.
(197) It will be apparent that numerous modifications and variations of the present disclosure are possible in light of the above teachings. It is therefore to be understood that within the scope of the appended claims, the technology may be practiced otherwise than as specifically described herein.
(198) Embodiments of the disclosure may comprise video capture, storage, display, transmission and/or reception apparatus comprising a decoder as described above and/or an encoder as described above.
(199) Respective aspects and features of embodiments of the present disclosure are defined by the following numbered clauses. In the following clauses, the term "high efficiency" may optionally be deleted from the wording, as it refers just to an example of the use of embodiments of the present technology.
1. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format recursively splitting a largest coding unit down to a coding unit of 4×4 pixels.
2. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format providing respective coding unit tree hierarchies for each channel.
3. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format; and for that format enabling non-square quad-tree transforms; enabling asymmetric motion partitioning; and selecting transform unit block sizes to align with a resulting asymmetric prediction unit block layout.
4. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format; and for that format, associating intra-prediction mode angles for square prediction units with different intra-prediction mode angles for non-square prediction units.
5. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format, providing respective intra-prediction modes for two or more prediction units in a coding unit.
6. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format; and for that format, interpolating a chroma intra-prediction unit having a height twice that of a corresponding 4:2:0 format prediction unit using the chroma filter employed for the corresponding 4:2:0 format prediction unit; and using only alternate vertical values of the interpolated chroma prediction unit.
7. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format, interpolating a chroma prediction unit having dimensions twice those of a corresponding 4:2:0 format prediction unit using the chroma filter employed for the corresponding 4:2:0 format prediction unit; and using only alternate vertical and horizontal values of the interpolated chroma prediction unit.
8. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format and/or a 4:4:4 chroma subsampling format; and for either format, deriving a luma motion vector for a prediction unit; and independently deriving a chroma motion vector for that prediction unit.
9. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format, deriving a luma motion vector for a prediction unit; and independently deriving a respective chroma motion vector for each chroma channel for the prediction unit.
10. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format; and for that format, splitting non-square transform units into square blocks prior to applying a spatial frequency transform; and then combining the resulting coefficients.
11. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format and/or a 4:4:4 chroma subsampling format; and for either format, indicating that luma residual data is to be included in a bitstream losslessly; and independently indicating that chroma residual data is to be included in the bitstream losslessly.
12. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format, independently indicating for each channel whether residual data is to be included in a bitstream losslessly.
13. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format; and for that format, providing a quantisation parameter association table between luma and chroma quantisation parameters, where the maximum chroma quantisation parameter value is 6 smaller than the maximum luma quantisation parameter.
14. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format, providing a quantisation parameter association table between luma and chroma quantisation parameters, where the maximum chroma quantisation parameter value is the same as the maximum luma quantisation parameter.
15. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format, treating luma quantisation parameter values as chroma quantisation parameter values.
16. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format and/or a 4:4:4 chroma subsampling format; and for either format, defining one or more quantisation matrices as difference values with respect to quantisation matrices defined for a different chroma subsampling format.
17. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format and/or a 4:4:4 chroma subsampling format; and for either format, mapping an entropy encoding context variable from a luma context variable map for use with a chroma transform unit; and entropy encoding one or more coefficients of a chroma transform unit using the mapped context variable.
18. A method of high efficiency video coding, comprising the steps of: providing a 4:4:4 chroma subsampling format; and for that format, entropy encoding coefficients of luma and chroma transform units; and coding the last significant position for chroma transform units differentially from the last significant position in the co-located luma transform unit.
19. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format and/or a 4:4:4 chroma subsampling format; and for either format, enabling adaptive loop filtering; and categorising respective chroma samples into one of a plurality of categories each having a respective filter.
20. A method of high efficiency video coding, comprising the steps of: providing a 4:2:2 chroma subsampling format and/or a 4:4:4 chroma subsampling format; and for either format, enabling adaptive loop filtering; and providing at least a first adaptive loop filtering control flag for the chroma channels.
21. A computer program for implementing the steps of any preceding method clause.
22. A high efficiency video coding encoder arranged in operation to implement the steps of any preceding method clause.
23. A high efficiency video coding decoder arranged in operation to implement the steps of any one of clauses 1 to 9.
24. A method of high efficiency video coding substantially as described herein with reference to the accompanying drawings.
25. A high efficiency video coding encoder substantially as described herein with reference to the accompanying drawings.
26. A high efficiency video coding decoder substantially as described herein with reference to the accompanying drawings.
(200) It will be appreciated that these aspects and features, as well as the underlying embodiments to which they relate, may be applied in combination as technically appropriate.