LAST COEFFICIENT CODING FOR ADAPTIVE TRANSFORM BASED VIDEO COMPRESSION
20230039301 · 2023-02-09
Inventors
- Sebastien LASSERRE (Thorigné Fouillard, FR)
- Saurabh PURI (Rennes, FR)
- Patrick LE CALLET (LE PALLET, FR)
CPC classification
H04N19/91
ELECTRICITY
H04N19/132
ELECTRICITY
H04N19/12
ELECTRICITY
H04N19/463
ELECTRICITY
H04N19/192
ELECTRICITY
International classification
H04N19/12
ELECTRICITY
H04N19/13
ELECTRICITY
H04N19/132
ELECTRICITY
H04N19/463
ELECTRICITY
Abstract
Coding of the last coded coefficient position is performed by basing the coding of the y coordinate of the position of the last coded coefficient on knowledge of the size of the partial transform used to generate a block of coefficients from a block of video pixels. This enables a context adaptive coding of the last coded coefficient parameter to be performed much more efficiently.
Claims
1-15. (canceled)
16. A method for encoding, comprising: applying a partial transform to a block of image values to obtain a block of transform coefficients; and entropy encoding the block of transform coefficients, wherein the entropy encoding of the block of transform coefficients comprises entropy encoding a syntax element representing a coordinate of a position of a last coded transform coefficient in the block of transform coefficients, the entropy encoding of the syntax element depending on the size of the partial transform.
17. The method of claim 16, wherein information representative of the size of the partial transform is encoded in video data.
18. The method of claim 16, wherein the syntax element is a prefix resulting from a splitting of the coordinate of the position of the last coded transform coefficient in the block of transform coefficients into a prefix and a suffix, wherein the entropy encoding further comprises inferring the value of the prefix from the size of the partial transform.
19. The method of claim 16, wherein an adaptive set of partial transforms that are learned offline is used to transform the block of image values.
20. An apparatus comprising electronic circuitry configured to: apply a partial transform to a block of image values to obtain a block of transform coefficients; and entropy encode the block of transform coefficients, wherein the entropy encoding of the block of transform coefficients comprises entropy encoding a syntax element representing a coordinate of a position of a last coded transform coefficient in the block of transform coefficients, the entropy encoding of the syntax element depending on the size of the partial transform.
21. The apparatus of claim 20, wherein information representative of the size of the partial transform is encoded in video data.
22. The apparatus of claim 20, wherein the syntax element is a prefix resulting from a splitting of the coordinate of the position of the last coded transform coefficient in the block of transform coefficients into a prefix and a suffix, and wherein the entropy encoding further comprises inferring the value of the prefix from the size of the partial transform.
23. A method for decoding, comprising: entropy decoding a block of transform coefficients obtained by applying a partial transform to an input block of image values; and inverse transforming the block of transform coefficients using the partial transform to obtain an output block of image values, wherein the entropy decoding of the block of transform coefficients comprises entropy decoding a syntax element representing a coordinate of a position of a last coded transform coefficient in the block of transform coefficients, the entropy decoding of the syntax element depending on the size of the partial transform.
24. The method of claim 23, wherein information representative of the size of the partial transform is decoded from video data.
25. The method of claim 24, wherein the syntax element is a prefix resulting from a splitting of the coordinate of the position of the last coded transform coefficient in the block of transform coefficients into a prefix and a suffix, and wherein the entropy decoding further comprises inferring the value of the prefix from the size of the partial transform.
26. An apparatus comprising electronic circuitry configured to: entropy decode a block of transform coefficients obtained by applying a partial transform to an input block of image values; and inverse transform the block of transform coefficients using the partial transform to obtain an output block of image values, wherein the entropy decoding of the block of transform coefficients comprises entropy decoding a syntax element representing a coordinate of a position of a last coded transform coefficient in the block of transform coefficients, the entropy decoding of the syntax element depending on the size of the partial transform.
27. The apparatus of claim 26, wherein information representative of the size of the partial transform is decoded from video data.
28. The apparatus of claim 26, wherein the syntax element is a prefix resulting from a splitting of the coordinate of the position of the last coded transform coefficient in the block of transform coefficients into a prefix and a suffix, and wherein the entropy decoding further comprises inferring the value of the prefix from the size of the partial transform.
29. A non-transitory computer-readable storage medium storing program code instructions operative, when executed by a processor, to cause the processor to: entropy decode a block of transform coefficients obtained by applying a partial transform to an input block of image values; and inverse transform the block of transform coefficients using the partial transform to obtain an output block of image values, wherein the entropy decoding of the block of transform coefficients comprises entropy decoding a syntax element representing a coordinate of a position of a last coded transform coefficient in the block of transform coefficients, the entropy decoding of the syntax element depending on the size of the partial transform.
30. The storage medium of claim 29, wherein information representative of the size of the partial transform is decoded from video data.
31. The storage medium of claim 29, wherein the syntax element is a prefix resulting from a splitting of the coordinate of the position of the last coded transform coefficient in the block of transform coefficients into a prefix and a suffix, and wherein the entropy decoding further comprises inferring the value of the prefix from the size of the partial transform.
32. A non-transitory computer-readable storage medium storing program code instructions operative, when executed by a processor, to cause the processor to: apply a partial transform to a block of image values to obtain a block of transform coefficients; and entropy encode the block of transform coefficients, wherein the entropy encoding of the block of transform coefficients comprises entropy encoding a syntax element representing a coordinate of a position of a last coded transform coefficient in the block of transform coefficients, the entropy encoding of the syntax element depending on the size of the partial transform.
33. The storage medium of claim 32, wherein information representative of the size of the partial transform is encoded in video data.
34. The storage medium of claim 32, wherein the syntax element is a prefix resulting from a splitting of the coordinate of the position of the last coded transform coefficient in the block of transform coefficients into a prefix and a suffix, and wherein the entropy encoding further comprises inferring the value of the prefix from the size of the partial transform.
35. The storage medium of claim 32, wherein an adaptive set of partial transforms that are learned offline is used to transform the block of image values.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0033] The technical problem solved by the following embodiments is reducing the cost of coding the position of the last coded coefficient in a transformed block of pixels to which a 2D partial transform has been applied.
[0034] These embodiments are an improvement of the entropy coding scheme commonly used in video coding standards. One such video coding standard is the HEVC/H.265 standard, but the embodiments are not limited to that standard.
[0035] The main idea of the present principles is that if a partial transform is used on a block of pixel values, it is known that some transformed coefficients are necessarily zero, thus imposing some constraints on the position of the last coded coefficient and on the coordinates representing the position of the last coded coefficient (x,y). In the described embodiments, the focus is on using such constraints to efficiently encode the two coordinates (x,y) in case a partial transform is used to transform a block of pixels to transform coefficients.
[0036] The present ideas use the fact that not all coordinate pairs (x,y) are acceptable when the number m of coded coefficients is smaller than the block size n, so that the coding of (x,y) is more efficient. This can lead to improved compression performance.
[0037] In particular, if one uses partial transforms, the coding implicitly takes into account the fact that not all coordinate pairs (x,y) are acceptable, such that the coding of (x,y) is more efficient. In this particular embodiment with partial transforms, tests have been conducted on a modified HEVC standard using partial transforms, and it has been shown that an improved coding of (x,y) leads to a compression gain of about −0.5% BD-rate.
[0038] The ideas presented herein propose to keep the main part of the HEVC algorithm to code the coordinate pair (x,y), but to add one feature: the reduction of the possible range of y once x is known.
[0039] For instance, as in
[0040] The coordinate x of the last coded coefficient ranges from 0 (for positions 0, 1, 3 and 6) to 3 (only for position 9). Once x is known, the range of y actually depends on x. In the example above, one has:
[0041] x=0 or x=1 leads to y in the range [0,3]
[0042] x=2 leads to y in the range [0,2]
[0043] x=3 leads to y=0; in this case y does not even need to be coded
[0044] One easily understands that this coding process is decodable: assuming that m is known (it is encoded elsewhere in the stream), the decoder first decodes x, deduces the range of y, and then decodes y.
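The range deduction can be sketched as follows. The kept-position set below is hypothetical, chosen only to match the ranges listed above, since the figure giving the exact scan positions is not reproduced here.

```python
def y_ranges(kept_positions):
    """For each column x, the largest y among the (x, y) positions that
    the partial transform can produce bounds the y coordinate."""
    ranges = {}
    for x, y in kept_positions:
        ranges[x] = max(ranges.get(x, -1), y)
    return ranges

# Hypothetical kept set matching the ranges described above.
kept = [(x, y) for x in (0, 1) for y in range(4)] \
     + [(2, y) for y in range(3)] + [(3, 0)]
print(y_ranges(kept))  # → {0: 3, 1: 3, 2: 2, 3: 0}
```

The decoder can rebuild the same map from m, which is why decoding x first is enough to bound y.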
[0045] In HEVC, a coordinate x or y is split into a prefix and a suffix as shown in
[0046] For instance, if the coordinate value is 14, the prefix is 7 and the suffix is coded in 2 bits. The suffix is the remainder after the smallest value covered by that prefix is subtracted from the coordinate value, 14−12=2. If the coordinate value is 3, then the prefix is 3, but there is no suffix.
[0047] The prefix is binarized using a truncated unary coding. The truncation is performed based on the knowledge of the block size that provides an upper bound for the prefix value. Each bit of the binarized prefix is encoded using CABAC and a dedicated context.
[0048] For instance, in a 4×4 block the possible prefixes are 0, 1, 2 or 3, which are respectively binarized into 1, 01, 001 and 000 (truncated). In another example, in an 8×8 block, the possible prefixes range from 0 to 5 and are binarized into 1, 01, 001, 0001, 00001 and 00000 (truncated).
[0049] The suffix is binarized using a fixed length coding, and the fixed length suffix is encoded as is, using CABAC in bypass mode without using contexts. For instance, if the coordinate value is 14 in a 16×16 block, the prefix is 7 (binarized into 0000000) and the suffix is 2 (binarized into 10 and coded on 2 bits).
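The prefix/suffix split and the truncated unary binarization described above can be sketched as follows. The function names are illustrative, not HEVC syntax element names; the split follows the group-index scheme that reproduces the worked examples in the text.

```python
def split_coord(v):
    """Split a coordinate value into (prefix, suffix, suffix_bits).
    Prefixes 0-3 code the values 0-3 directly with no suffix; a larger
    prefix p covers the values starting at (2 + (p & 1)) << k with a
    k-bit fixed-length suffix, where k = (p >> 1) - 1."""
    if v < 4:
        return v, None, 0
    p = 4
    while True:
        k = (p >> 1) - 1
        base = (2 + (p & 1)) << k
        if v < base + (1 << k):
            return p, v - base, k
        p += 1

def truncated_unary(prefix, prefix_max):
    """prefix zeros, then a terminating 1 that is dropped at the bound."""
    return '0' * prefix + ('' if prefix == prefix_max else '1')

print(split_coord(14))        # → (7, 2, 2): prefix 7, suffix 2 on 2 bits
print(truncated_unary(3, 3))  # → '000', the truncated case in a 4x4 block
```

For coordinate 14 in a 16×16 block this gives prefix 7 (binarized into 0000000, since 7 is the bound) and suffix 2 on 2 bits, matching the example above.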
[0050] The binarization process into prefix and suffix is kept as in HEVC, using contexts only for the prefix. The main characteristic of the present principles is that the contexts of y can be reduced in range, depending on the value of x.
[0051] Usually, a transform T transforms a block of n pixels into m=n transformed coefficients. The process is invertible: applying the inverse transform T^−1 to the transformed coefficients gives the pixel values back. In the case of a partial transform P, the n pixels are transformed into fewer coefficients, m&lt;n. Equivalently, the missing n−m coefficients may be assumed to be set to zero. An “inverse” transform (of course not the mathematical inverse, because the partial transform is not invertible) is applied to the transformed coefficients to obtain an approximation of the initial pixel values. Typically, partially transformed coefficients are representative of the low-frequency information of the pixel block.
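As a sketch of this definition, using an arbitrary random orthogonal matrix standing in for a learned transform (the matrix and sizes are illustrative, not from the embodiments):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary n x n orthogonal transform, standing in for a learned one.
n = 8
T, _ = np.linalg.qr(rng.standard_normal((n, n)))

def partial_transform(T, x, m):
    """Forward: keep only the first m basis vectors, giving m coefficients.
    'Inverse': transpose of the partial matrix; this only approximates the
    input when m < n, since the partial transform is not invertible."""
    P = T[:m]                 # m x n
    c = P @ x                 # m transformed coefficients
    return c, P.T @ c         # coefficients and reconstructed block

x = rng.standard_normal(n)
c, x_full = partial_transform(T, x, n)   # m = n: exact reconstruction
_, x_part = partial_transform(T, x, 4)   # m < n: only an approximation
print(np.allclose(x_full, x))            # → True
```

Dropping rows of T is exactly the "forced to zero" option described later for the learned transform vectors.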
[0052] If a partial transform is used, it is known that the n−m discarded transformed coefficients are necessarily zero, thus imposing constraints on the position of the last coded coefficient and on the coordinates (x,y). In this case, by implicitly using these constraints, it is shown that the embodiment automatically handles the efficient encoding of the two coordinates (x,y) when a partial transform is used to transform the TU pixels into transformed coefficients.
[0053] Instead of systematic transforms such as the DCT or DST, an adaptive set of orthogonal transforms may be used, learned offline on a large training set using different classification and transform optimization schemes. This set of transforms is fed to the codec, and the best transform of the set is chosen in a Rate Distortion Optimization (RDO) loop. A more adaptive approach is to learn a set of orthogonal transforms for a particular intra frame of a sequence. This is referred to as an on-the-fly block-based transform learning scheme in the rest of the specification. This scheme is shown in
[0054] The block diagram of
[0055] Due to the energy compaction property of SVD, it is observed that the overhead cost can be considerably reduced by deriving an incomplete representation of the learned transform, where only the first ‘m’ vectors are transmitted to the decoder and the remaining (n−m) transform vectors are either generated using a completion algorithm similar to the Gram-Schmidt method or forced to zero, thus leading to a partial transform.
[0056] In order to illustrate the effect of dropping the last few vectors of a transform on the Bjontegaard Distortion rate (BD-rate), four non-separable optimized transforms of size 64×64 are learned on the 4K sequences ‘PeopleOnStreet’ and ‘Traffic’. Encoding tests are performed on these sequences where the first ‘m’ vectors are retained and the rest of the basis vectors are then completed using a completion algorithm.
[0057] It is observed in
[0058] Since the performance drop in terms of BD-rate when encoding only the first ‘m’ transform vectors depends on the video content, a content-adaptive method is necessary to estimate the optimal value of ‘m’. Intuitively, at low bit rates most of the coefficients are quantized to zero, while at high bit rates the coefficients' energy is significant even at high frequencies. Therefore, the value of ‘m’ depends on the content as well as on the quantization parameter QP.
[0059] On average, the residual signal energy is concentrated in the first few coefficients: the DC coefficient has on average the maximum energy, and the energy decreases towards the higher-frequency coefficients. Therefore, most of the high-frequency coefficients are quantized to zero. A simple threshold-based method can be applied to compute the best value of ‘m’, which is required to be coded along with the frame as an overhead.
[0060] Let E be the sum of the energies of the DCT coefficients. A threshold t is defined by multiplying a parameter p with E:
t = p·E
[0061] The value of ‘m’ can simply be computed as the number of coefficients with average energy greater than this threshold. The value of ‘p’ can be found experimentally. Table 1 shows the variation of the number of vectors ‘m’ above this threshold for chosen values of ‘p’, in terms of the percentage of the total number of vectors that are encoded. It is observed from Table 1 that at high QP the average number of vectors required is much smaller than at low QP. Moreover, the number of vectors ‘m’ also varies depending on the content.
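A minimal sketch of this thresholding (function and variable names are illustrative; the energy values are made up for the example):

```python
def choose_m(avg_energies, p):
    """Number of coefficient positions whose average energy exceeds
    t = p * E, where E is the total energy of the coefficients."""
    t = p * sum(avg_energies)
    return sum(1 for e in avg_energies if e > t)

# Energies typically decay from DC towards the high frequencies.
print(choose_m([100, 50, 20, 10, 5, 2, 1, 1], 0.01))  # → 6
```

A larger p raises the threshold and so yields a smaller m, consistent with the trend reported in Table 1.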
TABLE 1: Number of vectors to encode vs. threshold for different sequences
(percentage of the total number of vectors to encode)

 QP    ‘p’    PeopleOnStreet   Traffic   Nebuta   SteamLocomotive
 22   0.01        0.23          0.24      0.30        0.32
      0.005       0.27          0.29      0.32        0.35
      0.001       0.45          0.48      0.46        0.46
 27   0.01        0.20          0.22      0.28        0.31
      0.005       0.25          0.27      0.31        0.34
      0.001       0.38          0.39      0.38        0.43
 32   0.01        0.18          0.19      0.26        0.28
      0.005       0.21          0.22      0.29        0.30
      0.001       0.29          0.30      0.39        0.36
 37   0.01        0.14          0.15      0.23        0.23
      0.005       0.18          0.18      0.26        0.25
      0.001       0.23          0.23      0.30        0.30
[0062] When the present principles are applied using partial transforms, the range of y depends on x.
[0063] Consider again the example of
[0064] Some additional examples of binarization using this embodiment in the example of
[0065] (3,0) binarized into (000, nothing) instead of (000,0)
[0066] (0,0) binarized into (0,0) as in HEVC
[0067] (3,3) is an impossible coordinate
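The savings illustrated above can be made concrete by counting the truncated unary bins spent on y under the reduced bound. This is a sketch: y_max stands for the bound deduced from x, as in the ranges listed earlier.

```python
def y_prefix_bins(y, y_max):
    """Truncated unary bins spent on y when its range [0, y_max] is
    deduced from x: y bins at the bound (the terminating bin is dropped),
    y + 1 bins otherwise, and no bins at all when y_max == 0."""
    return y if y == y_max else y + 1

# (x, y) = (3, 0): y_max = 0, so y costs no bins at all, against
# 1 bin under the plain HEVC bound of the 4x4 block (y_max = 3).
print(y_prefix_bins(0, 0), y_prefix_bins(0, 3))  # → 0 1
```

This matches the (3,0) example, where y is not coded at all, while (0,0) is coded exactly as in HEVC.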
[0068] The embodiment with partial transforms easily extends to a coordinate y that uses a suffix, as in
[0069] In the HEVC/H.265 standard, a new tool for coding binary data has been introduced in the arithmetic coder, namely Context-Adaptive Binary Arithmetic Coding (CABAC). A binary symbol s, which takes the value 0 or 1, is coded with a probability p of being 1 and 1−p of being 0. This probability is deduced from a context and is adapted after each symbol coding.
[0070] A context value is an 8-bit value, see
[0071] The update of the context value is made following the process described in
[0072] The evolution is governed by two tables: transIdxMPS if the coded symbol is the MPS, and transIdxLPS if the coded symbol is not the MPS, i.e., it is the Least Probable Symbol (LPS). These tables are provided in Table 2, indexed by the entry p′, whose name is pStateIdx.
TABLE 2: Tables for the evolution of the context state (state transition table, Table 9-41 of the HEVC specification)

pStateIdx    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
transIdxLps  0  0  1  2  2  4  4  5  6  7  8  9  9 11 11 12
transIdxMps  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

pStateIdx   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
transIdxLps 13 13 15 15 16 16 18 18 19 19 21 21 22 22 23 24
transIdxMps 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

pStateIdx   32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
transIdxLps 24 25 26 26 27 27 28 29 29 30 30 30 31 32 32 33
transIdxMps 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48

pStateIdx   48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
transIdxLps 33 33 34 34 35 35 35 36 36 36 37 37 37 38 38 63
transIdxMps 49 50 51 52 53 54 55 56 57 58 59 60 61 62 62 63
[0073] The probability p_MPS of the symbol s being the MPS is quantized linearly on 8 bits, from 0 to 127. It is deduced from the context value by
p_MPS = (p′ + 64)/127 = (pStateIdx + 64)/127
and the probability p of the symbol s being 1 follows from p_MPS depending on the value of the MPS:
p = p_MPS if MPS = 1,
p = 1 − p_MPS if MPS = 0.
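Table 2 and the probability formula can be combined into a small state-update sketch. The MPS flip when an LPS is coded in state 0 follows the HEVC specification; function names are illustrative.

```python
# LPS transition table from Table 2; the MPS transition is simply an
# increment capped at 62 (state 63 is a fixed, quasi-deterministic state).
TRANS_IDX_LPS = [
     0,  0,  1,  2,  2,  4,  4,  5,  6,  7,  8,  9,  9, 11, 11, 12,
    13, 13, 15, 15, 16, 16, 18, 18, 19, 19, 21, 21, 22, 22, 23, 24,
    24, 25, 26, 26, 27, 27, 28, 29, 29, 30, 30, 30, 31, 32, 32, 33,
    33, 33, 34, 34, 35, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 63,
]

def update_context(p_state_idx, val_mps, symbol):
    """One CABAC context transition after coding one binary symbol."""
    if symbol == val_mps:
        p_state_idx = p_state_idx if p_state_idx == 63 else min(p_state_idx + 1, 62)
    else:
        if p_state_idx == 0:          # LPS coded in state 0: the MPS flips
            val_mps = 1 - val_mps
        p_state_idx = TRANS_IDX_LPS[p_state_idx]
    return p_state_idx, val_mps

def prob_mps(p_state_idx):
    """Quantized probability of the MPS: (pStateIdx + 64) / 127."""
    return (p_state_idx + 64) / 127

print(update_context(5, 1, 1))   # MPS coded → (6, 1)
print(update_context(0, 1, 0))   # LPS in state 0 → (0, 0), MPS flipped
```

Each coded symbol thus nudges the context toward the observed statistics, which is what makes the per-channel contexts described below effective.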
[0074] Context-adaptive coding is a powerful tool that allows the coding to dynamically follow the statistics of the channel to which the symbol belongs. Also, each channel should have its own context in order to avoid mixing statistics and losing the benefit of the process. This has led to the extensive use of many contexts in HEVC/H.265, which uses several hundred contexts in order to model many channels. For instance, among the channels using contexts, there are:
[0075] motion vector residuals,
[0076] TU coding flags,
[0077] last significant coefficient position,
[0078] Coding Group coding flags,
[0079] transformed coefficient significance flags,
[0080] transformed coefficient magnitude (greater than 1 and greater than 2) flags,
[0081] SAO data,
[0082] other data.
[0083] All of these contexts/channels also largely depend on the color channel, that is whether the channel is luma or chroma, the Transform Unit size, the position of the transformed coefficient, the neighboring symbol values and other factors.
[0084] As an example, the context of a Coding Group (CG) coding flag is chosen depending on whether the CG coding flags below and to the right of the current CG are 1, as shown in
[0085] All in all, the context choice depends on many factors, hence the large number of contexts. Of course, the decoder must update the context values exactly as the encoder did, to ensure synchronization with the encoder and correct parsing of the stream.
[0086] The embodiments described herein propose to reduce the range of potential y coordinate candidates by considering the size of the partial transform used to obtain the coefficients from the image block values.
[0087] The associated syntax needed to implement the described principles is the encoding of the value m, the size of the partial transform. This value can be coded at the frame or slice level, so that the improved binarization and truncated unary coding is also determinable at the decoder.
[0088] Regarding the decoding process, the principles described herein correspondingly modify the decoder so that it can decode the last coded coefficient position.
[0089] In such embodiments using partial transforms, the improved binarization and truncated unary coding of the y suffix is also impacted.
[0090] In addition to the above features of the present principles, the described embodiments provide that the coding of m, or information from which the subset of coefficients can be determined, is present in the bitstream. In addition, the last coded coefficient is provided by two coordinates (x,y) and the subset and the value of x are used to determine a range for y. Also, truncated coding is used for the y prefix and suffix based on the determined range.
[0091] The proposed idea is normative in the sense that it is present in the syntax of the video stream and implies a decoding method to be applied. As such, the present ideas can be implemented in a video standard, such as the successor to HEVC.
[0092] One embodiment of a method 1100 for coding a set of transformed coefficients is shown in
[0093] One embodiment of an apparatus 1200 for coding a set of transformed coefficients is shown in
[0094] An embodiment of a method 1300 for decoding a set of transformed coefficients is shown in
[0095] An embodiment of an apparatus 1400 for decoding a set of transformed coefficients is shown in
[0096] The aforementioned embodiments can be implemented in Set Top Boxes (STBs), modems, gateways or other devices that perform video encoding or decoding.
[0097] The functions of the various elements shown in the figures can be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
[0098] Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
[0099] The present description illustrates the present principles. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present principles and are included within its spirit and scope.
[0100] All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present principles and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
[0101] Moreover, all statements herein reciting principles, aspects, and embodiments of the present principles, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof.
[0102] Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
[0103] Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the present principles. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
[0104] In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
[0105] Reference in the specification to “one embodiment” or “an embodiment” of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.