Video encoding and decoding
11317102 · 2022-04-26
Assignee
Inventors
- Matteo NACCARI (London, GB)
- Marta Mrak (London, GB)
- Saverio Blasi (London, GB)
- Andre Seixas Dias (London, GB)
CPC classification
H04N19/70
ELECTRICITY
H04N19/198
ELECTRICITY
H04N19/46
ELECTRICITY
International classification
H04N19/70
ELECTRICITY
H04N19/46
ELECTRICITY
Abstract
The present invention relates to a method of decoding a video bitstream, the method comprising the steps of: receiving a bitstream representing: residual samples produced by subtracting encoder filtered motion compensated prediction samples from image samples; and motion vectors used in forming the motion compensated prediction samples; the encoder filtering process conducted on the motion compensated prediction samples at an encoder having at least one parameter; using said motion vectors to provide motion compensated prediction samples from a previously reconstructed image; decoder filtering said motion compensated prediction samples in accordance with said at least one parameter; and adding said filtered motion compensated prediction samples to said residual samples to reconstruct images. A system and apparatus corresponding to this method are also disclosed.
Claims
1. A method of decoding a video bitstream comprising the steps of: receiving a bitstream representing: residual samples produced by subtracting encoder filtered motion compensated prediction samples from image samples; and motion vectors used in forming the motion compensated prediction samples; the encoder filtering process conducted on the motion compensated prediction samples at an encoder having one or more parameters; using said motion vectors to provide motion compensated prediction samples from a previously reconstructed image; analysing the motion compensated prediction samples, and/or the previously reconstructed image, to infer at least one of said parameters of the encoder filtering process; decoder filtering said motion compensated prediction samples in accordance with said at least one inferred parameter; and adding said filtered motion compensated prediction samples to said residual samples to reconstruct images; wherein the step of analysing comprises determining the existence of, and/or the direction of, any predominant direction in the motion compensated prediction samples and/or the previously reconstructed image.
2. The method of claim 1, wherein the decoder filtering uses a filter aperture defining weighted contributions from a current motion compensated prediction sample and neighbouring motion compensated prediction samples in a block or other set of motion compensated prediction samples.
3. The method of claim 2, wherein said or one of said parameters, and/or said or one of said inferred parameters, relates to anisotropy of the filter aperture and/or wherein said or one of said parameters, and/or said or one of said inferred parameters, is a binary flag denoting isotropy or anisotropy of the filter aperture.
4. The method of claim 3, wherein said or one of said parameters, and/or said or one of said inferred parameters, denotes one or more of: a number of samples in said filter aperture; and the weight of the contribution of the current motion compensated prediction sample.
5. A method of encoding video comprising the steps of: forming motion compensated prediction samples using motion vectors and a reconstructed image; conducting an encoder filtering process on the motion compensated prediction samples based on one or more parameters; subtracting motion compensated prediction samples from image samples to form residual samples; and forming a bitstream representing the residual samples, the motion vectors and optionally at least one parameter of the encoder filtering process; wherein the method further comprises: analysing the motion compensated prediction samples and/or a previously reconstructed image to determine at least one of said parameters of the encoder filtering process; wherein the step of analysing comprises determining the existence of, and/or the direction of, any predominant direction in the motion compensated prediction samples and/or the previously reconstructed image.
6. The method of claim 5, wherein the encoder filtering uses a filter aperture defining weighted contributions from a current motion compensated prediction sample and neighbouring motion compensated prediction samples in a block or other set of motion compensated prediction samples.
7. The method of claim 6, wherein said or one of said parameters relates to anisotropy of the filter aperture.
8. The method of claim 6, wherein said or one of said parameters, denotes one or more of: a predominant direction of an anisotropic filter aperture and one of a set of quantised directions; a number of samples in said filter aperture; and the weight of the contribution of the current motion compensated prediction sample.
9. A method of decoding a video bitstream comprising the steps of: receiving a bitstream representing: residual samples produced by subtracting motion compensated prediction samples from image samples; and motion vectors used in forming the motion compensated prediction samples; an encoder filtering process conducted at an encoder having one or more parameters; using said motion vectors to provide motion compensated prediction samples from previously reconstructed image samples; and adding said motion compensated prediction samples to said residual samples to reconstruct images; characterized by: conducting an analysis at the decoder to infer at least one of said parameters of the encoder filtering process; and decoder filtering said motion compensated prediction samples or said previously reconstructed image samples in accordance with said at least one inferred parameter and optionally in accordance with at least one parameter represented in the bitstream; wherein the analysis comprises determining the existence of, and/or the direction of, any predominant direction in the motion compensated prediction samples and/or the previously reconstructed image.
10. The method of claim 9, wherein the step of analysing comprises determining the existence of and/or the direction of any predominant direction in an array of samples to infer a parameter or parameters relating to any anisotropy in the encoder filtering process.
11. The method of claim 10, wherein the array of samples comprises the motion compensated prediction samples or the previously reconstructed image.
12. The method of claim 9, wherein the decoder filtering uses a filter aperture defining weighted contributions from a current sample and neighbouring samples in a block or other set of samples.
13. The method of claim 12, wherein said or one of said parameters, and/or said or one of said inferred parameters, relates to anisotropy of the filter aperture.
14. The method of claim 12, wherein said or one of said parameters, and/or said or one of said inferred parameters, is a binary flag denoting isotropy or anisotropy of the filter aperture.
15. The method of claim 12, wherein said or one of said parameters, and/or said or one of said inferred parameters, denotes one or more of: a predominant direction of an anisotropic filter aperture and preferably one of a set of quantised directions; a number of samples in said filter aperture; and the weight of the contribution of the current motion compensated prediction sample.
16. The method of claim 2, wherein one of the inferred parameters denotes a predominant direction of an anisotropic filter aperture.
17. The method of claim 16, wherein one of the inferred parameters denotes one of a set of quantised directions.
18. The method of claim 1, wherein the step of analysing comprises determining the direction of any predominant direction in the motion compensated prediction samples, and/or the previously reconstructed image, so as to infer a parameter relating to any anisotropy in the encoder filtering process.
19. The method of claim 1, comprising analysing the motion compensated prediction samples to infer at least one parameter of the encoder filtering process; wherein the step of analysing comprises determining the existence of, and/or the direction of, any predominant direction in the motion compensated prediction samples.
20. The method of claim 1, comprising analysing the previously reconstructed image to infer at least one parameter of the encoder filtering process; wherein the step of analysing comprises determining the existence of, and/or the direction of, any predominant direction in the previously reconstructed image.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention will now be described by way of example with reference to the accompanying drawings, in which:
DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS
(6) There is shown in
(7) When a subsequent input frame is received, this subsequent input frame is combined with the reference frame (from the decoded picture buffer 46) to obtain a motion estimation 48. This determines suitable motion vectors to map blocks within the reference frame to corresponding blocks within the (subsequent) input frame. The reference frame (from the decoded picture buffer 46) and the vectors from the motion estimation 48 are then used to obtain a motion compensated prediction 12. The optimal MC prediction is then subtracted from the subsequent input frame to form a residual which is transformed 14, quantised 16, entropy encoded 18, and transmitted.
(8) To this extent, the encoder of
(9) In accordance with one aspect of this disclosure, the motion compensated prediction is passed to a smoothing filter which provides an alternative, smoothed motion compensated prediction.
(10) So, the general block schema of a hybrid motion compensated predictive video encoder is modified through the addition of a smoothing filter. The smoothing filter operates on the predictor P (provided by motion compensation) to obtain the alternative predictor P_MCPS. Given that the smoothing applied may not be beneficial for all coding blocks, e.g. because P already provides good coding efficiency, a flag is required in the bitstream for correct decoding. Therefore, at the encoder side, for each coding block, residuals are computed with and without smoothing and the mode which minimises the rate distortion Lagrangian cost is eventually selected. The steps in this aspect of the encoder workflow are set out below, using, for efficiency, a pseudo code in which MCPS denotes the smoothing process.
(11)
For each inter coding mode m do
    Set P equal to the predictor computed as specified by m
    Apply MCPS to P and set the obtained predictor to P_MCPS
    Compute the residuals R associated with P
    Compute the residuals R_MCPS associated with P_MCPS
    Encode (i.e. apply frequency transformation, quantisation and entropy coding) R and R_MCPS
    Measure the rates r and r_MCPS associated with R and R_MCPS, respectively
    Compute the distortions D and D_MCPS for R and R_MCPS, respectively
    Compute the Lagrangian costs J and J_MCPS
    If J ≤ J_MCPS, write into the bitstream a flag with value zero; otherwise, write a flag with value one
Endfor
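The mode decision above can be sketched in Python as follows. This is an illustrative toy model only: the pixel-domain quantisation, the non-zero-coefficient rate proxy and the lambda value are assumptions standing in for the real transform, quantisation and entropy-coding chain.

```python
import numpy as np

def encode_cost(block, predictor, lmbda, qstep=8.0):
    """Toy Lagrangian cost J = D + lambda * r. The residual is quantised
    in the pixel domain as a stand-in for transform coding (assumption)."""
    residual = block - predictor
    q = np.round(residual / qstep)            # toy quantisation
    recon = predictor + q * qstep             # toy reconstruction
    D = float(np.sum((block - recon) ** 2))   # distortion (SSE)
    r = int(np.count_nonzero(q))              # crude rate proxy
    return D + lmbda * r

def mcps_flag(block, P, P_mcps, lmbda=100.0):
    """Return the flag written to the bitstream: 0 keeps the plain
    predictor P, 1 selects the smoothed predictor P_mcps."""
    J = encode_cost(block, P, lmbda)
    J_mcps = encode_cost(block, P_mcps, lmbda)
    return 0 if J <= J_mcps else 1
```

The tie-breaking rule (flag 0 when J ≤ J_MCPS) mirrors the pseudo code: when both predictors cost the same, the unfiltered mode is preferred, avoiding the extra filtering work.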
(12) The smoothing process may be represented by a spatial filter having an aperture or kernel which defines which samples contribute to the output and with which weighting.
(13) In one example, the smoothing process MCPS uses two types of Moving Average (MA)-like, symmetric kernels. The first type is a 2D, cross-shaped filter, while the second corresponds to a 1D filter applied along one of four directions: horizontal, vertical, 45 and 135 degrees. Other sizes, as well as other filters and other directions, could also be used. Since the first type of filter is a 2D one, operating along both the horizontal and vertical dimensions, it will hereafter be denoted as isotropic, while the kernels of the second type are hereafter denoted as directional. As mentioned above, the amount of blur depends on the motion activity of different image areas. Accordingly, the size of the different kernels used may vary to further improve the coding efficiency. For the MCPS method, two example kernel sizes are used: 3 and 5, where the chroma components only use size 3. Overall, the (exemplary) kernels specified in MCPS are listed in
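The two kernel families might be sketched as follows. The uniform (moving-average) weights and exact tap positions are illustrative assumptions; the actual kernels are specified in the figures, which are not reproduced here.

```python
import numpy as np

def cross_kernel(size):
    """Isotropic, cross-shaped 2-D kernel of odd size 3 or 5.
    Uniform weights are an assumption for illustration."""
    k = np.zeros((size, size))
    c = size // 2
    k[c, :] = 1.0          # horizontal arm
    k[:, c] = 1.0          # vertical arm (centre tap set once overall)
    return k / k.sum()

def directional_kernel(size, direction):
    """1-D moving-average kernel embedded in a 2-D aperture along one of
    the four directions: 0 (horizontal), 90 (vertical), 45 or 135 degrees."""
    k = np.zeros((size, size))
    c = size // 2
    if direction == 0:                    # horizontal
        k[c, :] = 1.0
    elif direction == 90:                 # vertical
        k[:, c] = 1.0
    elif direction == 45:                 # bottom-left to top-right diagonal
        for i in range(size):
            k[size - 1 - i, i] = 1.0
    elif direction == 135:                # top-left to bottom-right diagonal
        np.fill_diagonal(k, 1.0)
    return k / k.sum()
```

With sizes 3 and 5 and the four directions, this reproduces the count given in the next paragraph: two isotropic kernels plus six directional luma kernels, eight in total.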
(14) In total there are eight different kernels and, if the encoder were to perform an exhaustive search for the best mode during rate distortion optimisation, eight full encodings of the resulting residuals would have to be evaluated in addition to the case of no smoothing. However, the resulting complexity may be prohibitive in some applications. To alleviate the computational burden, and/or to provide the advantages detailed below, the encoder according to this example selects only among the filter type (i.e. isotropic or directional) and the related size (i.e. 3 or 5). The remaining parameter, that is to say the directionality, is determined through a preliminary analysis of, in this case, the motion compensated prediction, to identify a predominant direction. This can be done by measuring and analysing gradients.
(15) For example and as shown in
(16)
(17) Where the sign ‘-’ accounts for the different reference system used in images with respect to the Cartesian 2D plane. The arctan function is approximated with integer arithmetic and 10-bit precision. The angle α is then uniformly quantised 34 into the aforementioned directions with a quantisation step of 45 degrees and a dead-zone extent of 22.5 degrees. After quantisation 34, a four-bin histogram is computed 36 and the GDD is determined 38 as the peak of the histogram. For the example depicted in
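The GDD analysis described above can be sketched as follows. This illustrative version uses floating-point arctan and numpy's central-difference gradients rather than the 10-bit integer approximation used in the patent, so it is a sketch of the principle, not the normative computation.

```python
import numpy as np

def gradient_dominant_direction(samples):
    """Infer the Gradient Dominant Direction of a sample array.
    Returns one of 0, 45, 90 or 135 degrees."""
    gy, gx = np.gradient(samples.astype(np.float64))
    # Minus sign: image rows grow downward, unlike the Cartesian y axis.
    # Fold into [0, 180) since a filtering direction has no polarity.
    angles = np.mod(-np.degrees(np.arctan2(gy, gx)), 180.0)
    # Uniform quantisation: step 45 degrees, dead-zone extent 22.5 degrees.
    bins = np.mod(np.round(angles / 45.0).astype(int), 4)
    hist = np.bincount(bins.ravel(), minlength=4)   # four-bin histogram
    return (0, 45, 90, 135)[int(np.argmax(hist))]   # peak gives the GDD
```

A horizontal intensity ramp yields per-sample gradients pointing along the x axis and hence a GDD of 0 degrees; transposing the ramp yields 90 degrees.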
(18) With the introduction of directional and variable size kernels, the general workflow presented above is extended as follows:
(19)
Foreach inter coding mode m do
    Set P equal to the predictor computed as specified by m
    Compute the residuals R associated with P
    Encode (i.e. apply frequency transformation, quantisation and entropy coding) R
    Measure the rate r and distortion D associated with R and compute the Lagrangian cost J
    Set J_BEST ← J
    Foreach smoothing mode s in { Isotropic, Directional } do
        Foreach kernel size ks in { 3, 5 } do
            If s == Isotropic
                Apply smoothing with cross-shaped filtering of size ks to get P_MCPS
            Else (s == Directional)
                Compute the GDD d
                Apply smoothing with 1D ks-sized filtering along d to get P_MCPS
            Endif
            Compute the residuals R_MCPS associated with P_MCPS
            Encode R_MCPS, measure r_MCPS, and compute D_MCPS and J_MCPS
            If J_MCPS < J_BEST
                Set J_BEST ← J_MCPS, s_BEST ← s and ks_BEST ← ks
            Endif
        Endfor
    Endfor
    If J_BEST == J
        Write into the bitstream a flag with value ‘0’
    Else
        Write into the bitstream a flag with value ‘1’, followed by s_BEST and ks_BEST
    Endif
Endfor
(20) At the decoder side, the signalled metadata specify information on whether smoothing is applied and, if this is the case, which type and size have been selected. In an HEVC example, metadata may be transmitted on a Coding Unit (CU) basis and refer to three colour components: hence if the metadata signal that MCPS is used, it will be applied to Y, Cb and Cr. The information associated with MCPS is conveyed with three binary flags denoted as follows in Table 1:
(21)
TABLE 1: Flags to signal the metadata required by MCPS

Flag name        Description
use_mcps_flag    Specifies whether MCPS is used or not
mcps_type        Specifies which type of filter is used (isotropic or directional)
mcps_size        Specifies which size to use for the smoothing (3 or 5)
(22) In an HEVC example, each flag is coded with CABAC using one context per flag. Finally, the use of MCPS may be signalled in both the Sequence Parameter Set (SPS) and the slice header. The slice level flag is only transmitted if MCPS is enabled at the SPS level. The value of the slice level flag is decided after the compression of all CTUs belonging to that slice is completed: if none of the CUs has use_mcps_flag equal to 1, the slice header flag is set to zero; otherwise it is set to one. It is worth noting that the value of the slice level flag should ideally be decided in an RDO sense, but this would require compressing each slice twice, which may be prohibitive in terms of encoder complexity. The skilled person will understand that different signalling techniques may be employed, particularly in the context of future standards.
(23) Given the use of integer arithmetic and short kernels, the additional complexity brought by MCPS (the processing required by the core MCPS algorithm) is quite limited.
(24) At the decoder, as shown for example in
(25) The decoder determines from the described binary flags: whether smoothing is to be used; whether the filter is isotropic or directional; and the filter size. However, where a directional smoothing filter is used, the decoder infers the direction of the filter itself. This is achieved by applying essentially the same GDD process to the motion compensated prediction.
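The decoder-side behaviour described above might be sketched as follows. The function names, edge-replication boundary handling and filter weights are illustrative assumptions; only the overall structure (three signalled flags, direction inferred by re-running the GDD analysis on the prediction) follows the description.

```python
import numpy as np

def smooth_1d(pred, size, direction):
    """1-D moving average of length `size` along 0, 45, 90 or 135 degrees,
    with edge replication (boundary handling is an assumption)."""
    dy, dx = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}[direction]
    h, w = pred.shape
    c = size // 2
    out = np.zeros((h, w))
    for k in range(-c, c + 1):
        ys = np.clip(np.arange(h)[:, None] + k * dy, 0, h - 1)
        xs = np.clip(np.arange(w)[None, :] + k * dx, 0, w - 1)
        out += pred[ys, xs]
    return out / size

def infer_direction(pred):
    """The same GDD analysis as at the encoder, re-run on the prediction."""
    gy, gx = np.gradient(pred.astype(np.float64))
    angles = np.mod(-np.degrees(np.arctan2(gy, gx)), 180.0)
    bins = np.mod(np.round(angles / 45.0).astype(int), 4)
    return (0, 45, 90, 135)[int(np.argmax(np.bincount(bins.ravel(), minlength=4)))]

def decoder_mcps(pred, use_mcps_flag, mcps_type, mcps_size):
    """Apply MCPS at the decoder from the three signalled flags; the
    filter direction itself is never read from the bitstream."""
    if use_mcps_flag == 0:
        return pred
    if mcps_type == 'isotropic':
        # cross-shaped smoothing approximated as the mean of a horizontal
        # and a vertical pass (weights are illustrative)
        return 0.5 * (smooth_1d(pred, mcps_size, 0) + smooth_1d(pred, mcps_size, 90))
    return smooth_1d(pred, mcps_size, infer_direction(pred))
```

Note that `decoder_mcps` takes only the three flags of Table 1 as side information; everything else is derived from the prediction samples themselves.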
(26) This achieves the important advantage that, for a minor increase in complexity at the decoder, the overhead imposed on the bitstream by this technique is restricted in this example to three binary flags.
(27) It will be understood that the direction of the filtering can be quantised more finely, with a larger number of filter kernels. Since the direction of filtering is inferred at the decoder, rather than signalled in the bitstream, such finer quantisation adds no signalling overhead.
(28) In modifications, other filter parameters can be deduced or determined by measurement at the encoder and then inferred at the decoder, so further reducing the amount of information that is required to be signalled in the bitstream.
(29) For example, a process similar to the GDD process could be used to determine the degree of anisotropy in the samples and thus whether an isotropic or directional filter is to be used. More particularly, the described histogram can be used to infer the type of filter to use. A like process would be conducted at the decoder to infer which type of filter to use.
(30) In another example, the size of the filters to use may be based on: the spatial resolution of the sequence; the direction of the filter; or the histogram of gradient directions—in particular the relative quantities of directions.
(31) The number and variety of filters used may be extended, for example to include additional directions and more complex kernel shapes. The weights of the coefficients, in particular the centre coefficient in the filter kernels, may be selected depending on, for example, visual characteristics of the content.
(32) In appropriate cases, the decoder may be capable of inferring all relevant parameters (including the existence) of the encoder filtering process. There would then be no necessity for any parameter to be represented in the bitstream.
(33) There are other modifications which may have useful application, particularly where computational resources are limited. Such limitations will of course more usually apply at the decoder.
(34) For example: given that MCPS is additional filtering cascaded to motion compensation, it might be assumed that motion compensation and smoothing could be combined by convolving the associated kernels. In particular, given that there are a number of discrete, different MCPS modes, some precomputed kernels could be stored in memory for later use. However, as follows from the description in previous sections and with reference to
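The kernel pre-combination contemplated above rests on the associativity of convolution: filtering once with the convolution of the two kernels matches interpolating and then smoothing. Both kernels below are toy examples chosen for illustration, not the interpolation or smoothing filters of any actual codec.

```python
import numpy as np

# Toy kernels (illustrative assumptions):
mc_kernel = np.array([-1.0, 5.0, 5.0, -1.0]) / 8.0   # interpolation-style filter
smoothing = np.ones(3) / 3.0                         # 1-D size-3 moving average

# Pre-combine both stages into a single kernel, as could be stored
# in memory for each discrete MCPS mode.
combined = np.convolve(mc_kernel, smoothing)

# One pass with the combined kernel equals the two-stage cascade.
x = np.arange(16.0)
two_pass = np.convolve(np.convolve(x, mc_kernel), smoothing)
one_pass = np.convolve(x, combined)
assert np.allclose(two_pass, one_pass)
```

As the surrounding text notes, this equivalence holds for the linear filtering itself; practical constraints discussed in the description may still make the cascaded arrangement preferable.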
(35) At the encoder, the GDD process will then be separated from the smoothing filter and will operate on the output of the decoded picture buffer. Similarly, at the decoder the GDD process will operate on the previously decoded picture.
(36) It will be recognised that current and proposed standards permit block-by-block selection of reference frames as well as motion vectors. Thus MCPS may be performed separately for each motion vector, or for each motion vector and reference frame combination.
(37) Various other modifications will be apparent to those skilled in the art, for example, while the detailed description has primarily considered the method being used for blurring, this method could also be used for de-blurring frames. This may involve the use of known blur detection techniques rather than GDD, and a de-blurring filter rather than a smoothing filter. More generally, this invention may be useful for reducing the residuals within any set of images wherein a certain effect is present in only one of the images.
(38) Extensions to the method, for example its use for de-blurring, may involve a change in the flags sent, where differing flags may be sent depending upon the filter to be applied. Corresponding to the de-blurring example, there may be a choice of flags for: no filter; blurring filter; de-blurring filter (and this could of course be extended to include further flags). Features of the application of the filter, for example the calculation of the gradient dominant direction, may still be calculated by the decoder to minimise the transmission cost.
(39) Alternatively, the type of filter to be applied may be inferred from the image being filtered, for example a blurred image may result in a de-blurring filter being applied. There may be a flag that is set only when an unusual filter is to be applied—such as using a blurring filter on an already blurred image.
(40) The filter may also be dependent upon historical data, for example if a filter had been applied to previous blocks or frames, the direction or the magnitude of the filter applied may be selected accordingly.
(41) It should be noted that the smoothing and related decisions described in this document can be applied at any level of granularity within the encoding loop, be it Coding Units or Prediction Units (in the case of the HEVC standard) or any kind of partitioning in other, possibly future, standards.
(42) Whilst attention has so far in this disclosure been focused on smoothing or otherwise filtering the motion compensated prediction, the ability to select between a wide range of filters, without a corresponding increase in the signalling burden placed on the bitstream, may be of advantage where the smoothing or other filtering is conducted on a reference frame prior to motion compensation, for example as shown in US2006/0171569.