Transform block coding
10887610 · 2021-01-05
Assignee
Inventors
- Joachim KEINERT (Erlangen, DE)
- Thomas RICHTER (Erlangen, DE)
- Herbert Thoma (Erlangen, DE)
- Christian SCHERL (Erlangen, DE)
- Manuel De Frutos López (Erlangen, DE)
- Siegfried Foessel (Erlangen, DE)
- Wolfgang HEPPNER (Erlangen, DE)
- Miguel Ángel Martínez Del Amor (Erlangen, DE)
- Sergej Wtjurin (Erlangen, DE)
Cpc classification
H04N19/645
ELECTRICITY
H04N19/129
ELECTRICITY
H04N19/184
ELECTRICITY
H04N19/647
ELECTRICITY
H04N19/463
ELECTRICITY
International classification
H04N7/18
ELECTRICITY
H04N19/129
ELECTRICITY
H04N19/645
ELECTRICITY
H04N19/184
ELECTRICITY
H04N19/463
ELECTRICITY
H04N19/64
ELECTRICITY
H04N19/156
ELECTRICITY
Abstract
Transform block coding is performed very efficiently in terms of computational complexity and compression ratio, by coding the magnitude bits of the transform coefficients distributed in a matrix, in which the magnitude bits of the spectral coefficients are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix. That is, magnitude bits within a certain column of the matrix belong to a certain spectral coefficient, while magnitude bits within a certain row of the matrix belong to a certain bit plane. In this configuration, the distribution of non-zero magnitude bits may be condensed towards one corner of the matrix, corresponding to, for instance, the least significant bit plane and corresponding to, by using a scan order among the transform coefficients which sorts the transform coefficients generally in a manner from lowest to highest frequency, the lowest frequency. Various low complexity variants are presented.
Claims
1. A decoder for decoding a transform block from a data stream, comprising a microprocessor: the microprocessor is configured to, decode, from the data stream, a profile leading through a matrix in which magnitude bits of spectral coefficients of the transform block are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, so that the profile indicates, in terms of position and value, first magnitude bits of the spectral coefficients of the transform block and the profile envelopes non-zero magnitude bits of the spectral coefficients in the matrix; and decode, second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the decoder is further configured to determine, a first predetermined bit plane among bit planes of the spectral coefficients with deriving, the first predetermined bit plane from information signaled in the data stream, or performing, the determination uniquely depending on an amount of data consumed in the data stream by a previously decoded portion of the data stream; and identify, the second magnitude bits out of the magnitude bits of the spectral coefficients using the first predetermined bit plane.
2. The decoder according to claim 1, wherein the decoder comprising the microprocessor, is configured to determine the first predetermined bit plane by computing, for each of several bit planes, a data coding consumption associated with coding the first magnitude bits and the second magnitude bits up to the respective bit plane completely, and selecting one of the several bit planes as the first predetermined bit plane by comparison of the data coding consumption for the several bit planes with a data consumption target.
3. The decoder according to claim 2, wherein the decoder comprising the microprocessor is configured to a) derive the data consumption target from information signaled in the data stream, or b) compute the data consumption target uniquely depending on an amount of data consumed in the data stream by a previously decoded portion of the data stream.
4. A decoder for decoding a transform block from a data stream, comprising a microprocessor: the microprocessor is configured to, decode, from the data stream, a profile leading through a matrix in which magnitude bits of spectral coefficients of the transform block are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, so that the profile indicates, in terms of position and value, first magnitude bits of the spectral coefficients of the transform block and the profile envelopes non-zero magnitude bits of the spectral coefficients in the matrix; and decode, second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the decoder is further configured to derive, from a signalization in the data stream a second predetermined bit plane for a transform block group to which the transform block belongs, and which represents spectral decompositions of a group of different portions of a two-dimensional image, and restrict, the decoding of the first magnitude bits and the decoding of the second magnitude bits to bit planes which are not more significant than the second predetermined bit plane and to spectral coefficients of a first subset of spectral coefficients, and further decode a second subset of spectral coefficients, at least comprising a DC coefficient, from the data stream, the further decoding revealing magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the second predetermined bit plane.
5. The decoder according to claim 4, wherein the decoder comprising the microprocessor is configured to, derive from a signalization in the data stream a third predetermined bit plane for a further transform block group to which the transform block belongs, and which represents spectral decompositions of a further group of different portions of the two-dimensional image, and perform the decoding of the second subset of spectral coefficients in a manner recovering magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the second predetermined bit plane, but non-indicative of magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the third predetermined bit plane.
6. The decoder according to claim 4, wherein the decoder is to decode a predetermined spectral coefficient of the code second subset of spectral coefficients from the data stream by VLC decoding a count of leading zero magnitude bits of the predetermined spectral coefficient from the data stream and decoding less significant magnitude bits of the predetermined spectral coefficient which pertain bit planes less significant than a bit plane determined by the count, from the data stream.
7. A decoder for decoding a transform block from a data stream, comprising a microprocessor: the microprocessor is configured to, decode, from the data stream, a profile leading through a matrix in which magnitude bits of spectral coefficients of the transform block are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, so that the profile indicates, in terms of position and value, first magnitude bits of the spectral coefficients of the transform block and the profile envelopes non-zero magnitude bits of the spectral coefficients in the matrix; and decode, second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the decoder is further configured to decode, a signalization from the data stream which identifies a selected scan order out of a plurality of scan orders, and use the selected scan order so that the magnitude bits of the spectral coefficients are entered into the matrix with the spectral coefficients of the transform block ordered along the row direction of the matrix in accordance with the selected scan order.
8. A method for decoding a transform block from a data stream, the method comprising the steps of: decoding, from the data stream, a profile leading through a matrix in which magnitude bits of spectral coefficients of the transform block are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, so that the profile indicates, in terms of position and value, first magnitude bits of the spectral coefficients of the transform block and the profile envelopes non-zero magnitude bits of the spectral coefficients in the matrix; and decoding, second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further comprises determining, a first predetermined bit plane among bit planes of the spectral coefficients with deriving, the first predetermined bit plane from information signaled in the data stream, or performing, the determination uniquely depending on an amount of data consumed in the data stream by a previously decoded portion of the data stream; and identifying, the second magnitude bits out of the magnitude bits of the spectral coefficients using the first predetermined bit plane.
9. A method for decoding a transform block from a data stream, the method comprising the steps of: decoding, from the data stream, a profile leading through a matrix in which magnitude bits of spectral coefficients of the transform block are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, so that the profile indicates, in terms of position and value, first magnitude bits of the spectral coefficients of the transform block and the profile envelopes non-zero magnitude bits of the spectral coefficients in the matrix; and decoding, second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further comprises deriving from a signalization in the data stream a second predetermined bit plane for a transform block group to which the transform block belongs, and which represents spectral decompositions of a group of different portions of a two-dimensional image, and the decoding of the first magnitude bits and the decoding of the second magnitude bits to bit planes which are not more significant than the second predetermined bit plane and to spectral coefficients of a first subset of spectral coefficients, and the method further comprises decoding a second subset of spectral coefficients, at least comprising a DC coefficient, from the data stream, the further decoding revealing magnitude bits of the second subset of spectral coefficients lying in bit planes more significant than the second predetermined bit plane.
10. A method for decoding a transform block from a data stream, the method comprising the steps of: decoding, from the data stream, a profile leading through a matrix in which magnitude bits of spectral coefficients of the transform block are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, so that the profile indicates, in terms of position and value, first magnitude bits of the spectral coefficients of the transform block and the profile envelopes non-zero magnitude bits of the spectral coefficients in the matrix; and decoding, second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further comprises decoding, a signalization from the data stream which identifies a selected scan order out of a plurality of scan orders, and using the selected scan order so that the magnitude bits of the spectral coefficients are entered into the matrix with the spectral coefficients of the transform block ordered along the row direction of the matrix in accordance with the selected scan order.
11. A non-transitory computer storage medium having stored thereon a computer program for performing method steps for, decoding a transform block from a data stream, executing by a microprocessor, comprising: decoding, from the data stream, a profile leading through a matrix in which magnitude bits of spectral coefficients of the transform block are arranged column-wise with the spectral coefficients of the transform block ordered along a row direction of the matrix, so that the profile indicates, in terms of position and value, first magnitude bits of the spectral coefficients of the transform block and the profile envelopes non-zero magnitude bits of the spectral coefficients in the matrix; and decoding, second magnitude bits of the spectral coefficients residing in the matrix at a lower significance side of the profile, wherein the method further comprises: determining, a first predetermined bit plane among bit planes of the spectral coefficients with deriving, the first predetermined bit plane from information signaled in the data stream, or performing, the determination uniquely depending on an amount of data consumed in the data stream by a previously decoded portion of the data stream; and identifying, the second magnitude bits out of the magnitude bits of the spectral coefficients using the first predetermined bit plane, when said computer program is run by a computer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) Embodiments of the present application are described below with respect to the figures, among which:
(2)
(3)
(4)
(5)
(6)
(7) the shading of the samples may correspond, for instance, to a one pixel value, whereas unshaded ones may correspond to a zero pixel value;
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)
(19)
(20)
(21)
(22)
(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
DETAILED DESCRIPTION OF THE INVENTION
(34) The following description of embodiments of the present application starts with a motivation of the underlying separated coding of profile magnitude bits and magnitude bits enveloped by the profile. Subsequently, the description involved with this motivation section is extended in three different ways with many variants also being presented. The four different aspects already outlined above are used in respective ones of the presented modified concepts. Finally, embodiments for the encoder and decoder are described, which use the separated coding of profile magnitude bits and magnitude bits enveloped by the profile and for each of the four aspects presented above, another implementation of the encoder and decoder embodiments is described with trying to reference sections of the description of the three modified concepts.
(35) 1. Coding Strategy
(36) Frequency transforms used for compression show an energy compaction property. This means that the relevant information is concentrated in a few frequency coefficients, while the other frequency coefficients have only low amplitudes.
(37) This property is exploited by the coding scheme embodiments outlined below. To this end, the frequency coefficients of a transform block as described with respect to Figs. A and B are ordered in such a way that in mean they have decaying amplitude. Since the ordering needs to be known by the decoder for proper decoding, the ordering scheme is fixed, or a limited number of orderings may be chosen as explained later on in Section 1.10. Please note that while for a single block frequency coefficients might not be decaying, this property holds when considering the entirety of blocks occurring in typical images.
(38) 1.1 Block Profile
(39)
(akj<i:f.sub.a.sup.f=0)(ak:f.sub.a.sup.i=1).
(40) In other words, the profile forms the convex hull of the most significant 1-bits. All bits that are situated in a bit plane below the profile bits are called refinement bits, since they gradually improve the precision of the coefficient value.
(41) With this definition, a block can be entropy encoded by efficiently describing the profile, and copying the refinement bits to the output bit stream without any modifications.
(42) 1.2 Definitions
(43) Let f.sub.k.sup.i be the i-th bit of frequency coefficient f.sub.k, f.sub.k.sup.0 being the LSB. Then the following definitions hold: f.sub.k is called significant in bit plane j, if and only if i>j:f.sub.k.sup.i=1. f.sub.k gets significant in bit plane j, if and only if it is significant in bit plane j1, but not in bit plane j
(44) 1.3 Profile Encoding
(45) In order to describe the form of the profile, the following symbols are used when encoding a bit f.sub.k.sup.i being part of the profile, where ZR is a so called zero run state:
(46) TABLE-US-00001 TABLE 1 Alphabet (symbol table) for profile encoding Typical Symbol value Meaning Condition Explanation EOP 0 End of ZR = 0 All remaining bits f.sub.a.sup.i, a k are zero. This avoids profile explicitly sending the cross hatched bits in FIG. 3. ONE 11 1-symbol ZR = 0 The currently encoded bit f.sub.k.sup.i is one. SZR 10 Start Zero ZR = 0 The currently coded bit f.sub.k.sup.i is zero, and a > Run k: f.sub.a.sup.t = 1. Consequently, the EOP needs not to be expected till reaching the next one bit. Consequently, when the zero-run state is entered (ZR = 1), an EOP symbol is illegal. CZR 0 Continue ZR = 1 The currently coded bit f.sub.k.sup.i is zero and the zero zero run run is continued (ZR = 1). EZR 1 End zero ZR = 1 The currently coded bit f.sub.k.sup.i is one and the zero run run is exit (ZR = 0)
(47)
(48)
(49) As illustrated in
(50) Encoding starts at the most significant bit plane with zero run disabled. In case no significant bit will appear any more in the current bit plane, and EOP symbol is sent, and encoding continues with the next lower bit plane of the same frequency coefficient. Otherwise, the profile bit is encoded in different ways depending the zero run mode is enabled or not. If zero run-mode is enabled, simply the bit value is output. Otherwise, two bits are used to encode the bit value.
(51) 1.4 Profile decoding
(52) In order to be able to decode the profile, the decoder also needs to track a zero run state. In case ZR=1, the decoder simply needs to read one bit in order to determine the value of the next. Otherwise (ZR=0), a two bit symbol needs to be read, possibly prepended by a string of zero-bits with variable length.
(53) 1.5 Advantages of the concept described so far are that all (or nearly all) dependencies are within a block, thereby increasing the possible parallelism. No grouping of coefficients is used, thereby reducing possible waste of bits. No negative impact results in case neighboring blocks have very different statistics. However, care should be taken to avoid waste of bits by a hull of a block which is not decaying. Coefficients should be brought into a favorable order.
(54) 1.6 Strategies for Encoding the Sign Bits
(55) Besides the profile and the data bits, also the sign bits need to be signaled to the decoder. For this end, two different strategies are possible: Whenever a coefficient is included in the profile, immediately emit a sign bit, independently of the coefficient value Only encode the sign bit when the quantized coefficient is unequal to zero. While this obviously increases encoding efficiency, it also increases complexity in the decoder due to an additional/longer feedback loop in the decoder. The decoder first needs to decode the coefficient value before it knows whether to read a sign bit or not.
(56) 1.7 Controlling the Granularity of Parallelism
(57) 1.7.1 General Concept
(58) Given that entropy coding exploits redundancy within a block, but not between blocks (except for possibly the DC coefficient when doing DC prediction), parallel encoding is rather straight forward. Parallel decoding, on the other hand is much more difficult, since the decoder needs to know where a block starts in the bit stream. Two strategies are possible to make this possible: Define a fixed coded length per block/slice/unit such that the decoder implicitly knows where to start reading the data for a given unit Signal the start of a block/slice/unit by means of length markers or pointers.
(59) Both strategies decrease the coding efficiency: Fixed block sizes might cause that a complicated block lacks bit rate, while a simple block disposes of more bits than needed to represent it with sufficient quality Length markers or pointers are supplementary information that are only used for parallel, but not for sequential decoding.
(60) Consequently, it is important to control the granularity of parallelism. This is done by the concept of a workgroup. A workgroup is a set of blocks that can only be decoded sequentially. The start of a workgroup, however, can be determined by the decoder without decoding previous workgroups in order to allow sufficient decoder parallelism.
(61) 1.7.2 Handling of Color Components
(62) For color images, a pixel consists of three (or even more) samples. These samples are typically processed in parallel by the frequency transform, such that in principle it is also possible to perform entropy encoding and decoding in parallel. Then, however, the entropy decoder needs to be capable to locate the individual color components in the codestream. This means that they need to be placed in different working groups.
(63) Alternatively, blocks belonging to the same spatial region, but different color components can be placed into the same working group. While this simplifies rate control in some cases (see Section 2.3), it increases the memory needed, since then more spatial regions need to be processed in parallel.
(64) Color subsampling is supported by allowing only an even number of blocks for the luminance channel within a workgroup. For each of the chroma channels, only half the number of blocks makes part of the working group.
(65) 1.8 Encoding the Profile of the DC Coefficients
(66) 1.8.1 Problem Formulation
(67) Frequency transforms, being it a 2D-DCT, or a wavelet transform, have the property to condense the signal energy into the low pass coefficients. Consequently, the DC coefficients have typically much larger values than the other frequency coefficients. This is illustrated in
(68) In order to allow for proper interpretation, the decoder first of all needs to know the number of zero bit planes. Given that the number of zero bit planes is expected to be similar for neighboring blocks, rate can be saved by signaling this quantity on a workgroup level instead of a block level. This means that the maximum DC coefficient determines the number of bits to transmit for all other blocks (of the same color component) as illustrated in
(69) The next step consists in signaling the profile of the blocks. However, usage of the principles from Section 1.3 would be inefficient, since for every block there is a significant drop in the profile between the DC coefficients and the first AC coefficient. Given that one output bit is used to signal a drop of the profile by one bit plane, this is again not very efficient. Instead, a more efficient representation within the workgroup may be used as outlined in the following.
(70) 1.8.2 Proposed Coding Mode
(71) Given that the statistics for the different color channels can be quite different, redundancy in signaling the profile can only be reasonably exploited within a color channel. In case a color channel in a workgroup only contains one block, there is no redundancy that can be exploited. Consequently, normal profile coding as described in section 1.3 can be applied.
(72) This means that in the following we can assume that the workgroup contains at least two blocks for the considered color component. Moreover, our strategy is to only output one variable length code per coefficient, since generation and decoding of VLC codes is relatively expensive in hardware due to the barrel shifters needed (see also Section 4.7).
(73) Let f.sub.a.sup.i(b, c) be the i-th bit of frequency coefficient f.sub.a in block b and color component c of the current workgroup. f.sub.a.sup.i=0(b, c) shall be the least significant bit, and f.sub.0(b, c) the DC coefficient. Then the coding algorithm is as follows 1. Compute the maximum number of AC bit planes in the complete workgroup:
(74)
(75)
(76)
(77) 1.8.3 Alternative Coding Mode
(78) An alternative way for encoding the profile of the DC coefficients is described in the following. It differs from Section 1.8.3 basically in the order by which different parts are signaled to the decoder: 1. Signal n.sub.maxn.sub.AC,max by a variable length code 2. For every color component, signal n.sub.DC,c,maxn.sub.AC,max
(79) The drawback of this approach is that the decoder needs to decode two variable length codes before it can decode the first DC coefficient. This, however, might have negative consequences on hardware implementations (see Section 3.8.1).
(80) 1.9 Scan Pattern Optimization
(81) 1.9.1 Problem Formulation
(82) As obvious from Section 1.1, the faster the profile is decaying, the more efficiently a code block can be represented. The form of the profile depends among others on the order by which the frequencies are visited during entropy coding. Thus, the scanning order also determines how fast a code block is decaying.
(83) To this end, consider two extreme cases illustrated in
(84) The first block consists of a vertical edge. Consequently, a vertical frequency transform can ideally compact the energy into the lowpass coefficients. A subsequent horizontal transform will then show many high frequency coefficients for the vertical low pass coefficients, while the other ones are zero. Similarly, for a horizontal edge, the horizontal transform can condense the signal into the low pass coefficients, such that a subsequent vertical transform will only show large frequencies for the horizontal low pass coefficients.
(85)
(86) Consequently, coding efficiency can be significantly improved by defining a set of possible scanning orders, dynamically select them on the encoder side and inform the decoder about this decision. Then both encoder and decoder can use the best possible scanning order in order to reduce the number of bits used to describe a transform block.
(87) 1.9.2 Brute Force Selection of the Scanning Order
(88) Let's suppose that we have defined a set of possible scanning orders as motivated in Section 1.9. Then the encoder can obviously perform entropy encoding with each possible scanning order and select that one with the smallest number of bits needed, or the smallest distortion.
(89) The first criterion is only applicable if the entropy coder performs lossless encoding of possibly pre-quantized frequency coefficients. For the more typical scenario of a rate constraint compression as illustrated in
(90) This can be approximated by selecting the scanning order that includes the largest number of bit planes from a code block for a given bit budget. In case the number of completely included bit planes is the same, the decision needs to be performed based on the included fractional bit planes. A fractional bit plane is a bit plane of a transform block that is only included partially in the codestream.
(91) Given that the typical zig-zag order weights the frequencies accordingly to their (visual) importance, one strategy could consist in selecting that scanning order where the number of contiguously encoded coefficients in the zig-zag scanning order is the largest one. Please note that this does not necessarily select the zig-zag scanning order itself, since it might abort encoding much earlier than a more optimized scanning order.
(92) 1.9.3 Heuristic for Selection of the Scanning Order
(93) While brute force selection of the scanning order delivers good image quality, it is typically too complex to be applied in hardware implementations with small footprint. To solve this problem, a measure can be used by the encoder that can select between, e.g. the zig-zag, the predominantly horizontal and the predominantly vertical scanning mode illustrated in
(94) If, for instance, the measure is greater than a first certain amount, then the predominantly vertical scanning order shown in
(95) If the measure is, for instance, smaller than a second certain amount (smaller than the first), then the predominantly horizontal scanning order shown in
(96) Otherwise, the traditional zig-zag scanning order may be applied.
(97) 1.10 Summary
(98) Section 1 has explained some concepts for subsequently exemplified coding schemes. In a nutshell, it exploits that frequency transform coefficients typically show decaying amplitudes. To this end, the profile, being the convex hull of the frequency coefficients is encoded, such that bits lying outside of the convex hull do not need to be signaled to the decoder.
(99) Sections 2-4 will now describe three variants of this entropy coding scheme.
(100) 2. Bit Plane Based Embodiment
(101) Profile-based entropy coding as explained in Section 1 is most easily applied when encoding the frequency coefficients bit plane by bit plane as done, for instance, in [2]. The following coding scheme renders plane-by-plane based magnitude bit coding more efficient and/or less complex to implement by paying special attention to: DC encoding Scan pattern adaptation Support of different rate control modes (see Section 2.2)
(102) In more detail, encoding is performed as follows: 1. Signal the selected scan pattern 2. Encode the DC overhang bits as explained in Section 1.8. The DC overhang bits are encoded on a coefficient base. In other words, all overhang bits for a DC coefficient as illustrated in
(103) 2.1 Bit Plane Coding for a Single Block
(104) In order to explain the bit plane based mode, let's consider the example block illustrated in
(105) Zero bit planes (marked with an orange shading), and DC overhang bits (marked with a gray shading) are handled in a special fashion as described in Section 1.8. Consequently, they shall not be further considered here. Hence, the first bit to encode is bit 8 of frequency f.sub.0. Since f.sub.0 is already significant in bit plane 8, f.sub.0.sup.8 can be simply written into the codestream. Given that all other coefficients f.sub.k are zero in bit plane 8, only an EOP-symbol will be sent before moving to the next bit plane.
(106) In the next bit plane 7, coefficient f.sub.0 is refined again by outputting f.sub.0.sup.7 into the codestream. Next, profile encoding is performed for coefficients f.sub.1-f.sub.4 as explained in Section 1.3. Since coefficient f.sub.4 gets significant, also its sign bit needs to be output. Finally the bit plane is terminated with an EOP symbol.
(107) In bit plane 6, first coefficients f.sub.0-f.sub.4 are refined by putting bits f.sub.0.sup.6-f.sub.4.sup.6 into the codestream. Then profile encoding is continued as explained in Section 1.3. For coefficients f.sub.6 and f.sub.9, a sign bit needs to be added as well. After profile encoding, the bit plane is terminated with an EOP symbol.
(108) For bit plane 5, bits f.sub.0.sup.5-f.sub.9.sup.5 are simply output to the codestream, followed by an EOP symbol. Bit plane 6 is handled in a similar way.
(109) When refining bit plane 3 by putting the bits f.sub.0.sup.3-f.sub.9.sup.3 into the codestream, the sign bits for coefficients f.sub.5 and f.sub.8 need to be added as well, since both coefficients are getting significant.
(110) 2.2 Termination of Encoding and Decoding
(111) Depending on the rate control strategy, different possibilities exist to decide when to stop encoding and decoding.
(112) 2.2.1 Bit Plane Based Coding Termination
(113) For the bit plane based coding termination, both encoder and decoder agree on a number of bit planes that should not be transmitted. The number of dropped bit planes can also be expressed by a quantization factor. It can be computed by the encoder and signaled to the decoder on different levels of granularity, such as blocks, workgroups, lines, slices, or even images. Alternatively, the quantization factor can be computed on both the encoder and the decoder side.
(114) In case the codestream layout imposes that the bit stream for a block, workgroup or slice terminates on a byte or word boundary, and the bits generated to include the desired bit plane would leave some spare bits, they can be simply used to refine the next bit plane for a small number of coefficients.
(115) The decoder can detect this situation and is free to decode them or simply skip them for reasons of complexity.
(116) 2.2.2 Bit Budget based Coding Termination
(117) In this case, coding termination is not controlled by a target bit plane, but by a target bit rate. This target bit rate can be a constant for the image, or it can be signaled by the encoder to the decoder on different levels of granularity, such as blocks, workgroups, lines or slices. Alternatively, both the encoder and the decoder can compute them in the same manner.
(118) As a result, both the encoder and the decoder know how many bits are available to encode a certain block (or workgroup, see Section 2.3). Consequently, both of them simply stop when this target bit rate is met, which is very simply due to the bit plane based entropy encoding.
(119) 2.3 Bit Plane Coding for a Workgroup
(120) Section 2.1 assumed that only a single transform block needs to be encoded or decoded. However, when combined with the bit budget based coding termination, this causes that the target rate needs to be met on a granularity of a block. Given, however, that some blocks might be more difficult to encode than other ones, this would result in low PSNR values, since the difficult block cannot be represented with enough precision.
(121) Consequently, Section 2.3.1 explains how this challenge can be addressed. Then Section 2.3.2 discusses the situation for a bit plane based coding termination.
(122) 2.3.1 Bit Budget Based Coding Termination
(123) As explained in Section 2.2.2, encoding and decoding shall be stopped when reaching a certain target bit budget. In the following, we assume that this budget is not given for a single block, but for a complete workgroup. By these means, difficult blocks in the workgroup can consume more bits than simple blocks, leading to an increased overall PSNR value.
(124) Then, however, the question arises, how to distribute the bit budget over the different blocks. This question can be solved by interleaving encoding and decoding of the different blocks. In more detail, bit plane i is encoded in all blocks before moving to bit plane i1 in any of the blocks. Within a bit plane i, first frequency coefficient f.sub.k.sup.i is encoded for all blocks, before moving to frequency f.sub.a.sup.i, a>k in any block.
(125) By these means, the bits are implicitly prioritized and the encoder continues coding the workgroup until the bit budget is exceeded.
(126) 7.3.2 Bit Plane based Coding Termination
(127) For the bit plane based coding termination, the distribution of the rate between the different blocks within a workgroup is implicitly defined. Consequently, no interleaving of the different blocks is necessary. The only exception is the filling to the next byte or word boundary.
(128) Here, ideally an interleaving of the frequency coefficients from the different blocks should happen. However, for complexity reasons, it is also possible to simply refine the blocks in sequential order.
(129) 2.4 Pseudo Code
(130) The pseudo code of
(131) 2.4.1 Definition of Functions
(132) bitplane(b,coeff): Returns bitplane b of coeff (in sign-magnitude representation) sign(coeff): Returns sign-bit of coeff
(133) 2.4.2 Encoding
(134) The encoding is illustrated in the pseudo code of
(135) 3. Coefficient Based Embodiment with Two Passes
(136) 3.1 Problem Formulation
(137) Low complexity compression as addressed by this invention typically targets compression ratios between 1:2 and 1:6.
(138) TABLE-US-00002 TABLE 2 Relation between input bit depth and compression rates (bpc = bits per component) 1:2 1:4 1:6 8 bits per pixel component 4 bpc 2 bpc 1.33 bpc 10 bits per pixel component 5 bpc 2.5 bpc.sup. 1.66 bpc 12 bits per pixel component 6 bpc 3 bpc 2 bpc
(139) This results in varying target compression rates in bits-per-component as shown by Table 2, ranging from 6 bpc (bits per component) up to 1.33 bpc. A worst cases analysis assuming that an input bit can be represented by a single output bit (like in refinement coding) then leads to the conclusion that a sample needs to be revisited in mean between 6 and 1.33 times when using the bit plane based embodiment of Section 2.
(140) In order to evaluate the impact of this fact, we need to consider in addition the target resolution of 4 k60 fps. This results in a pixel frequency of 4096.Math.2160.Math.60531 MHz. Assuming a processing clock of 150-200 Mhz for the entropy coder, this means processing 3-4 pixels or 9-12 samples in parallel. Given that a sample needs to be revisited up to 6 times, this ends up with up to 54-72 instances for the entropy coder. Given that workgroups can only be processed sequentially, 54-72 workgroups need to be buffered.
(141) Table 3 illustrates the resulting memory requirements, expressed in component rows. For instance, a value of 1.5 means that the buffer needs to be sufficiently large to hold 1.5 image rows with 3 components each. Please note that this memory has to be provided in addition to the memory used to perform the frequency transform.
(142) TABLE-US-00003 TABLE 3 Number of component rows that need to be buffered in order to achieve sufficient throughput. Workgroup sizes are 64, 128 and 256 pixels. Memory in component rows Spaltenbes
3 parallel units
4 parallel units Zellenbeschrift- per comp per comp Overall ungen
64 128 256 Max 64 128 256 Max max
8 bits per 0.5625 1.125 2.25 2.25 0.75 1.5 3 3 3 input comp 1:6 0.1875 0.375 0.75 0.75 0.25 0.5 1 1 1 1:4 0.28125 0.5625 1.125 1.125 0.375 0.75 1.5 1.5 1.5 1:2 0.5625 1.125 2.25 2.25 0.75 1.5 3 3 3
10 bits per 0.703125 1.40625 2.8125 2.8125 0.9375 1.875 3.75 3.75 3.75 input comp 1:6 0.234375 0.46875 0.9375 0.9375 0.3125 0.625 1.25 1.25 1.25 1:4 0.3515625 0.703125 1.40625 1.40625 0.46875 0.9375 1.875 1.875 1.875 1:2 0.703125 1.40625 2.8125 2.8125 0.9375 1.875 3.75 3.75 3.75
12 bits per 0.84375 1.6875 3.375 3.375 1.125 2.25 4.5 4.5 4.5 input comp 1:6 0.28125 0.5625 1.125 1.125 0.375 0.75 1.5 1.5 1.5 1:4 0.421875 0.84375 1.6875 1.6875 0.5625 1.125 2.25 2.25 2.25 1:2 0.84375 1.6875 3.375 3.375 1.125 2.25 4.5 4.5 4.5 Overall max 0.84375 1.6875 3.375 3.375 1.125 2.25 4.5 4.5 4.5
(143) Table 4 illustrates the number of parallel accesses to this memory. They need to be compared to the number of RAM modules used without any parallel access. Assuming a memory block size of approximately 16 kbits [3] as available in modern FPGAs, and 14 bits per DCT coefficient for 8 bit input images (including the gain of the color and the frequency transform), a single component row uses approximately 4 memory blocks. For all components, 12 memory blocks are thus sufficient, which is slightly less than the parallelism needed.
(144) TABLE-US-00004 TABLE 4 Number of entropy coder instances Maximum von instances parallel units per comp Zeilenbeschriftungen
3 4
8 bits 36 48 0.166666667 12 16 0.25 18 24 0.5 36 48
10 bits 45 60 0.166666667 15 20 0.25 22.5 30 0.5 45 60
12 bits 54 72 0.166666667 18 24 0.25 27 36 0.5 54 72 Overall Max 54 72
(145) Consequently, for better hardware efficiency the throughput per entropy encoder needs be improved, which is subject to the next subsections.
(146) 3.2 General Strategy
(147) From Table 2 it gets obvious that the bit plane based approach is particularly problematic for low compression ratios, since a coefficient needs to be revisited multiple times. As however a 1:2 compression leads to particularly good image quality, which allows much more subsequent editing operations than a 1:4 or even 1:6 compression, having a codec that also supports these operation points is very important.
(148) This can be achieved by switching from the bit plane to a coefficient based approach. The goal is to ideally consider each coefficient only one time. Unfortunately, this is very difficult to achieve. However, at least the number of times a coefficient needs to be revisited can be bounded and made independent of the compression ratio.
(149) The following subsections will describe a corresponding approach that can be used for bit budget based rate control as explained in Section 2.3.1. Section 4 then details an approach that is better suited for quantization based rate control as explained in Section 2.3.2.
(150) 3.3 Solution Concept for a Single Block
(151) In the proposed solution, entropy coding of a block is performed in two passes. In the first pass, only the profile of a block is encoded without emitting any refinement bits (bits without shading in
(152) For a more detailed explanation, let's reconsider the block shown in
(153) Next, profile coding continues in bit plane 7 by considering frequency coefficients f.sub.1-f.sub.4 in the same way as explained in Section 1.3. Then, bit plane 7 is terminated by means of an EOP symbol.
(154) The encoder continues this operation until the target bit budget which it is aware of is exceeded. To this end, it is important to notice that the encoder can easily determine the number of refinement bits used when terminating a bit plane. For instance, when terminating bit plane number 8, the encoder knows that in bit plane 7, only frequency coefficient f.sub.0 needs to be refined. Hence, before starting to encode the profile in bit plane 7, the encoder needs to allocate one bit for f.sub.0.sup.7, although the latter is not directly output to the codestream.
(155) Similarly, terminating bit plane 7, the encoder needs to allocate refinement bits for coefficients f.sub.0-f.sub.6 before starting to encode the profile of bit plane 6.
(156) By these means, while encoding the profile, the encoder can track how many bits will be used including the refinement bits needed. Consequently, it can simply stop profile encoding, when the target bit budget is exceeded. Next, it simply outputs for every coefficient the number of refinement bits needed.
(157) Overall, every coefficient needs to be revisited twice at most when assuming, that for every bit plane the maximum contributing frequency is computed beforehand as illustrated in
(158) Since each time a coefficient is visited at least one bit is output, for lower compression ratios, not all coefficients need to be revisited twice. For instance, when having an 8-bit input image and a compression ratio of 1:6, in the worst case each sample needs to be visited 1.33 times in mean. Consequently, encoding will be done much faster than for the bit plane based approach, because when refining a coefficient, much more bits can be output.
(159) In order to interpret the codestream, the decoder decodes the profile and tracks the number of refinement bits needed in the same way than the encoder did. Moreover, it is also aware of the available bit budget for the current block. Consequently, the decoder knows when to stop profile decoding and interprets the remaining bits as refinement bits.
(160) 3.4 Sign Bit
(161) The attentive reader might have observed that Section 3.3 did not discuss the sign bits. Ideally, a sign bit is only transmitted when the quantized coefficient is unequal zero. While for the encoder it is easily possible to determine exactly, when a sign bit needs to be emitted, the decoder cannot derive this information when decoding the profile, since the refinement bits will follow much later. This, however, is problematic, since during profile coding and decoding, also decision about coding abortion is performed, and this decision needs to be identical on encoder and decoder side.
(162) In order to solve this problem, there exist two different strategies: 1. Whenever a coefficient is included in the profile, also allocate a sign bit. By these means, a corresponding worst case analysis is performed, since some of the coefficients will never get significant, and hence theoretically do not need any sign bit. 2. When performing profile coding, only allocate a sign bit for the coefficients whose profile bit is one. Allocate a sign bit for the other coefficients when they are first refined, independent what the refinement value is. Please note that this option is much more difficult to implement in hardware.
(163) When doing the actual refinement coding, this sign bit might then indeed be emitted for every coefficient. Alternatively, the sign bit can be omitted when the quantized coefficient is zero. Then the block will not completely consume the available bit budget. Depending on the rate control, this bit budget can be allocated to the next block, or it can be used for refining some more coefficients. However, such an approach will significantly increase complexity.
(164) Please note that the sign bit can be easily omitted for coefficients where already from profile coding it is obvious that the value will be zero (coefficients f.sub.13-f.sub.15 in
(165) 3.5 Fractional Bit Planes
(166) Assuming that every coefficient will have an associated sign bit (except when profile coding already signals them as zero), both encoder and decoder can predict exactly how many bits will be needed for both profile and refinement coding. This allows generating so called fractional bit planes.
(167) Let's consider again
(168) 3.6 Encoding a Workgroup
(169) The concept described so far assumed encoding of a single block. Consequently, the rate constraint needs to hold for a single block. As already discussed in Section 2.3, this might lead to reduced image quality in case not all blocks are of the same complexity.
(170) Instead, the rate is better distributed within a workgroup. This can be achieved as described in the following.
(171) 3.6.1 Interleaving of Frequency Coefficients
(172) In accordance with Section 2.3.1, the bit budget can be distributed between blocks by interleaving the frequencies of different blocks. To this end, the profile is encoded bit plane by bit plane. Moreover, within a bit plane, the profile is encoded frequency by frequency instead of block:
(173) TABLE-US-00005 for bitplane bp = maxACbitplane -1 downto for each frequency f_i in increasing order foreach block b in current workgroup if f.sub.i.sup.bp of block b belongs to profile Emit profile bits for f.sub.i of block b end if end for end for allocate refinement bits end for
(174) Only when all blocks of a workgroup have terminated a bit plane bp, the corresponding refinement bits are allocated. In case the bit budget is not sufficient to refine all blocks completely, the available bit budget needs to be distributed among the blocks. In this case, the rate control ideally solves the following problem: Given a workgroup consisting of n blocks with m frequencies, and the profile has been encoded till bit plane b for all blocks. Then find two numbers n.sub.1[1, n] and 0k<m, such that for n.sub.1 blocks all frequencies f.sub.1f.sub.k having been included in the coded profile are refined until bit plane b1 and for nn.sub.1 blocks, all frequencies f.sub.1f.sub.k1 having been included in the coded profile are refined only till bit plane b.
(175) Since this causes a certain complexity for hardware implementation, an alternative solution consists in attributing the refinement bits in a first-come-first-served basis. Start with the first block and allocate the refinement bits needed. Then continue with the next block until the bit budget is exceeded.
(176) 3.6.2 Encoding without Frequency Interleaving
(177) Interleaving the frequencies as explained in Section 3.6.1 is rather complex to implement in hardware. The reason is that the hardware implementation needs to process one sample per clock. However, finding the block and frequency to process next is not obvious.
(178) In order to solve this challenge, it is important to notice that the unique reason for interleaving the frequencies of the different blocks are a precise rate control. In fact, all blocks shall a priori be refined until the same bit plane, and the remaining bit budget should then be attributed to the important coefficients. In this context, the lower is the frequency, the more important is a coefficient. Moreover, coefficients that are signaled as zero by the profile coding should obviously be excluded from the refinement.
(179) However, if this constraint is relaxed, it is also possible to process all frequencies of a block in a given bit plane before switching to the next bit plane:
(180) TABLE-US-00006 for bitplane bp = maxACbitplane -1 downto foreach block b in current workgroup for each frequency f_i in increasing order if f.sub.i.sup.bp of block b belongs to profile Emit profile bits for f.sub.i of block b end if end for end for allocate refinement bits end for
(181) Such an organization has the huge benefit of allowing high performance implementations as explained in Section 4.7.
(182) 3.7 Pseudo Code for the Encoder
(183) The concepts described in the previous sections are clarified in the following by corresponding pseudo code assuming frequency interleaving. Moreover, it uses a the most simple sign bit allocation in that every coefficient being included into the profile coding will contain a sign bit (see also Section 3.4). In addition, refinements bits are allocated in a first-come-first-served method as described in Section 3.6.1. For reasons of simplicity, the special handling of DC coefficients is excluded from the pseudo code.
(184) 3.7.1 Input coeff[b][f]: Coefficient for block b, frequency f maxFreq[b][bp]: contains f.sub.max.sup.bp for block b as defined in Equation (6-1) maxNumACBitplanes: maximum number of AC bit planes in current workgroup (see Section 6.9) bitbudget: Bit budget available for workgroup
(185) 3.7.2 Variables
(186) TABLE-US-00007 // Maximum frequency for a given bit plane per block Dim maxFreq[#BlocksPerWorkingGroup][#bitplanes] = {-1}; // which frequency has been encoded last Dim tailPos[#BlocksPerWorkingGroup] = {-1}; // Indicates whether the current bit plane is in run mode, assuming that // it contributes to the profile encoding Dim zeroRun[#BlocksPerWorkingGroup] = {};
(187) 3.7.3 Encode Working Group
(188) The pseudo code is depicted in
(189) 3.8 Pseudo Code for the Decoder
(190) 3.8.1 Input maxNumACBitplanes: maximum number of AC bit planes in current workgroup (see Section 1.8) bitbudget: Bit budget available for workgroup
(191) 3.8.2 Definitions Function peekBits(n) returns n bits from the buffer, but does not remove them from the buffer Function readBits(n) returns n bits from the buffer, and removes them from the buffer sign(coeff[c][b][f]) represents the sign associated with coeff[c][b][f] abs(coeff[c][b][f]) represents the absolute value of the coefficient coeff[c][b][f]
(192) 3.8.3 Variables
(193) TABLE-US-00008 // Maximum frequency for a given bit plane per block Dim maxFreq[#BlocksPerWorkingGroup][#bitplanes] = {-1}; // Maximum frequency coded for a block Dim tailPos[#BlocksPerWorkingGroup] = {-1}; // Indicates whether the current bit plane is in run mode, assuming that // it contributes to the profile encoding Dim zeroRun[#BlocksPerWorkingGroup] = {};
(194) 3.8.4 Decode Working Group
(195) The pseudo code for the decoder is depicted in
(196) 4. Coefficient Based Embodiment with one Pass
(197) 4.1 Problem Formulation
(198) While the bit plane based embodiment of Section 2 delivers good coding performance, it suffers from limited performance, in particular for small compression ratios like 1:2. This is solved by the coefficient based approach from Section 3, in that the maximum number of times a coefficient needs to be revisited is bounded to two.
(199) Unfortunately, the control flow is still rather complex due to frequency interleaving as discussed in Section 3.6.1. But even when avoiding frequency interleaving as described in Section 3.6.2, profile coding needs to interleave between blocks, since profile encoding needs to be in descending bit plane order. Finally, rate control is combined with profile coding, which complicates flexible rate control schemes that distribute rate between different workgroups. In addition, control flow within the entropy coder is getting more complex.
(200) In the following an alternative approach is presented that clearly distinguishes between rate control and entropy coding. The actual entropy coding is performed in one single pass. The rate control beforehand, on the other hand can easily be run in parallel, simplifying thus the hardware implementation.
(201) 3.2 Solution Concept
(202)
(203) Instead of a bit budget driven termination of the coding as discussed in Section 3, a bit plane based coding termination is applied. In other words, the decoder is informed about the performed quantization in terms of number of dropped bit planes. Signaling of the quantization factor can be achieved in the same manner than described in [4].
(204) The number of dropped bit planes needs to be computed by the rate control. This rate control can work on a block or workgroup level. Alternatively, it can even consider multiple working groups, such as a complete line of working groups. In the following, we use the term rate control group to define the set of coefficients on which the rate control is operating. Given that rate control is quite simple (see Section 4.3) it can be run in parallel for all blocks within a rate control group. Hence, a single rate control block can maintain a high throughput. Moreover, given that the result of the rate control is signaled to the decoder, sign bits can be handled in a precise manner in that sign bits are only emitted when the coefficient value is not zero.
(205) Once the entropy coder knows the desired quantization step size, entropy coding can happen on a coefficient base. First the entropy coder encodes the profile information for the current coefficient as explained in Section 1.3. Then it can immediately output the refinement bits, since the target quantization is already known before moving to the next coefficient.
(206) Please note that the refinement bits can be interleaved with the profile bits, or they can be put into separate FIFOs and then output sequentially. The latter option simplifies execution on GPUs due to better parallelism. Moreover, it allows unequal error protection of the codestream, in that the profile bits are better protected than the data bits, since a bit error in the data bits only have a local impact, while a bit error in the profile bits causes decoding-errors in large pixel areas. On the other hand, slightly larger output buffers are used. In fact, without separating profile and refinement bits, a single entropy coder instance would not need any output buffer. On the other hand, as soon as multiple entropy coder instances are used, an output buffer is needed in any case. The only difference is that for separate output of profile and refinement bits, two instead of one buffer per entropy coder instance are needed. This, however, is not a big problem. For instance, in case throughput considerations allow for, even the same memory block can be used, when profile bits are stored with increasing addresses, starting at memory address zero, and refinement bits are stored with decreasing addresses, starting at the largest possible memory address. Moreover, the number of profile bits is bounded, since only one or two profile bits can be output per coefficient plus the EOP symbols used, which are bounded by the bit depth of the coefficients.
(207) 4.3 Fractional Bit Planes
(208) 4.3.1 Problem Formulation
(209) With the insights of Section 1.5, the rate control of
(210)
(211) Consequently, for all blocks the bits bounded by a bold line are included. Moreover, the quantization factor for both blocks is typically set equal.
(212) By these means, only a very limited set of quantization points need to be checked, making it easy to implement it in hardware and software. However, on the other hand, quantization is rather coarse. Consequently, typically the rate associated to a workgroup cannot be met exactly, limiting the PSNR coding performance.
(213) The following subsections discuss a couple of strategies how to solve this issue.
(214) 4.3.2 Propagation of the Rate to the Next Rate Control Unit
(215) Bits that are not used in one rate control group will be passed to the next rate control group.
(216) Advantageously, this is simple to implement.
(217) Disadvantageously, at the end of the image, there might be some bits left that cannot be used, and this propagation concept may not be possible when every block should have a strictly fixed size. Further, this may increase the end-to-end latency of the system, since an additional smoothing buffer is used, and it may limit parallelism on encoder side.
(218) 4.3.3 Refinement Based on Budget Signaling or Tracking
(219) The rate control can compute the number of bits that will not be used for a rate control group. If both the encoder and the decoder know this number, they can refine some coefficients with an additional bit plane and stop this additional refinement when the extra bits are completely used.
(220) Advantageously, this also works when every block should have a strictly fixed size.
(221) Disadvantageously, an ideal solution would use frequency interleaving between blocks. In case the decoder cannot derive the bit budget available for refinement bits, they need to be explicitly signaled, thereby causing a corresponding overhead: In the worst case, one data bit needs to be encoded with 3 bits. Having for instance a working group with three color components and four 44 DCT blocks, the maximum number of bits that are not allocated by the rate control when only considering complete bit planes equals 3.Math.4.Math.4.Math.4.Math.31=575 bits. Encoding such a number uses 10 bits. Assuming a target compression of 4 bits per pixel, a workgroup consists of 4.Math.4.Math.4.Math.4=256 bits, such that the overhead is 3.9%. In case a workgroup consists of 16 DCT blocks, the overhead is only 1.2%
(222) 4.3.4 Use of Separate Output Buffer
(223) In this case, the refinement bits are placed in a separate buffer and append later on to the bit stream. Then the decoder can decode them in a second phase. Since the decoder does not need any rate control phase, the number of phases in encoder and decoder is identical.
(224) Advantageously, no signaling overhead is needed.
(225) Disadvantageously, this complicates the decoder design in terms of, for example, hardware resources and memory. An additional buffer may be used at the encoder side.
(226) 4.4 Pseudo Code for Encoder
(227) In order to clarify the descriptions of the previous sections, they are detailed in the following subsections by means of pseudo code. For the creation of fractional bit planes, the concepts of Section 4.3.3 are used.
(228) In order to keep the code as simple as possible, the special handling of the DC coefficients as explained in Section 1.8 is not included, but it could be.
(229) 4.4.1 Definitions sign(coeff[c][b][f]) outputs the sign associated with coeff[c][b][f] abs(coeff[c][b][f]) outputs the absolute value of the coefficient coeff[c][b][f]
(230) 4.4.2 Input to the Function coeff[c][b][f]: Coefficient for component c, block b, frequency f maxFreq[#components][#blocks][#bitplanes]. Signals for every bit plane and every block the maximum contributing frequency. If the complete bit plane is zero, the value equals1. lsbNumber: Result of the rate control defining to which bit plane the data are included remainingExtraBits: Number of bits left by the rate control that can be used for additional refinement
(231) 4.4.3 Variables
(232) TABLE-US-00009 // stop when block has been finished Dim BlockFinished[#components][#blocksPerWorkingGroup] = {}; // Currently processed profile bit plane Dim currentProfileBitplane[#components][#blocksPerWorkingGroup]; // Indicates whether the current bit plane is in run mode, assuming that // it contributes to the, profile encoding Dim zeroRun[#components][#blocksPerWorkingGroup] = {};
(233) 4.4.4 Encoding a Workgroup with Precise Fractional Bit Planes
(234) The pseudo code is depicted in
(235) 4.5 Pseudo Code for Decoder
(236) 4.5.1 Definitions Function peekBits(n) returns n bits from the buffer, but does not remove them from the buffer Function readBits(n) returns n bits from the buffer, and removes them from the buffer
(237) 4.5.2 Input to the Function lsbNumber: Result of the rate control defining to which bit plane the data are included remainingExtraBits: Number of bits left by the rate control that can be used for additional refinement
(238) 4.5.3 Decoding a Workgroup with Precise Fractional Bit Planes
(239) The pseudo code is depicted in
(240) 4.6 Simplification of Fractional Bit Plane Encoding
(241) Pseudo code marked at 10 in
(242) This can be prevented by using the bit budget for fractional coding only for refinement bits, but not for profile bits. In other words, fractional bit plane coding stops when reaching the hatched shaded bits in
(243) 4.6.1 Encoding a Workgroup with Limited Fractional Bit Planes
(244) The pseudo code is depicted in
(245) 4.6.2 Decoding a Workgroup with Limited Fractional Bit Planes
(246) The pseudo code is depicted in
(247) 4.7 Throughput Considerations and Codebook Reinterpretation
(248) For high-speed hardware implementations, it is important that during both profile and refinement encoding, one coefficient can be processed per clock cycle. Thanks to the fact, that the codec operates coefficient by coefficient, this is indeed possible as explained in the following.
(249) For refinement coding only the correct number of raw bits need to be output. Consequently, the throughput constraint is rather easily fulfilled for refinement coding.
(250) For profile coding, coding of AC and DC coefficients need to be distinguished.
(251) 4.7.1 AC Coefficient Coding
(252) For high performance hardware implementations, it is important to process only one variable length code word per clock cycle. Fortunately, this is easily possible by reinterpretation of the coding alphabet shown in Table 1.
(253) When encoding the profile of coefficient f.sub.a, and zero run state is active, a fixed code word of one bit needs to be emitted or read per coefficient. This is obviously trivial.
(254) When encoding the profile of a coefficient f.sub.a, and zero run state is not active, then the codeword consists of a variable number of zero bits, followed by a separating one-bit, followed by the bit value as depicted in
(255) The leading zeros correspond to EOP symbols. Their number hence determines how many bit planes need to be terminated and varies between zero and n.sub.max, where n.sub.max is the maximum number of bitplanes a coefficient can have. These leading zero bits are separated by a one bit, before the value of the profile bit follows.
(256) Thanks to this simple codeword structure, a high performance implementation is easily possible.
(257) 4.7.2 DC Coefficient Coding
(258) Coding of the profiles is also easy. In case a workgroup consists only of one block per color component, the profile for a DC coefficient is simply a variable length code for the number of zero bit planes. In case the workgroup consists for more than one block per color, the number of zero bit planes is only signaled once for all DC coefficients of a component. In addition, the number of maximum AC bit planes need to be signaled. However, all together, less than one variable length code word per DC coefficient is used, enabling a high throughput implementation as well.
(259) 4.8 Limitation of Code Word Length
(260) The maximum size of the variable codeword described in Section 3.8.1 equals n.sub.max+2 bits. Since longer codewords use larger barrel shifters in hardware, the maximum codeword size can be limited by introduction of a clipping threshold. Let x be the number of EOP symbols to encode. Then instead of outputting x zero bits, followed by a one, the following variable length code can be applied:
(261) TABLE-US-00010 If x < x Output x zero bits, followed by a one bit Else Output x zero bits, followed by a one bit, followed by x-x in binary representation
(262) 5. The above sections presented a motivation of the usage of the concept of coding the magnitude bits of spectral coefficients of a transform block in a manner distinguishing between profile on the one hand and magnitude bits enveloped by the profile on the other hand, and provided, in Sections 2 to 4, examples of how to exploit the concept of the separate coding of profile on the one hand and magnitude bits enveloped by the profile on the other hand, in a manner allowing for a less complex implementation of an encoder and decoder and/or an increase in coding efficiency. In the following, certain aspects applied in Sections 2 to 4 are specifically made the subject of embodiments described further below. Insofar, the embodiments for the encoder and decoder described below represent abstractions of the specific details set out in Sections 2 to 4. In order to alleviate associating the subsequently explained embodiments with the embodiments set out in Sections 2 to 4, the description outlined below contains references to Sections 2 to 4.
(263)
(264) For the sake of completeness, it is noted that in case if the transform block 14 representing the result of a subband spectral decomposition 22 such as a wavelet transform, block 14 is, for instance, of size nn with n=2.sup.N. A set of 2.sup.(Nl).Math.2.sup.(Nl), registered to one corner of block 14, would represent, for example, the lower frequency portion of subband l in x and y. This set or subarray would represent a quadrant of a subarray of subband l which comprises three further subarrays of size 2.sup.(Nl).Math.2.sup.(Nl) of spectral coefficients of subband l concerning a higher frequency portion of subband l in the horizontal direction and/or vertical direction. The 22 array of these subband l subarrays would then, in turn, represent a quadrant of an even larger subarray of block 14, the other three quadrants of which would represent subarrays of size 2.sup.(nl+1).Math.2.sup.(N1+1) and relate to subband l1 and so forth.
(265) As already denoted above, the encoder 12 is for encoding transform block 14 into a data stream. The data stream is illustrated in
(266) Encoder 12 is configured to code magnitude bits of spectral coefficients of transform block 14. That is, encoder 12 receives, or converts inbound spectral coefficients 16 into, a sign and magnitude representation. The sign is optional is, accordingly, not further taken into account in the following description of the encoder as well as with respect to the decoder described hereinafter. In order to efficiently code the magnitude bits, encoder 12 codes the magnitude bits by describing their distribution of zeros and ones in a matrix 32 in which the magnitude bits 34 of the spectral coefficients 16 are arranged, or into which the magnitude bits 34 are entered, column-wise with the spectral coefficients 16 of the transform block 14 ordered along a row direction 36 of the matrix 32. By dashed lines 38,
(267) It should be noted here that encoder 12 may or may not be provided with, or have access to, block 14 in a manner where the transform coefficients' magnitude bits 34 are already arranged in a manner so as to correspond to matrix 32. If not, encoder 12 takes the order of the block's 14 spectral coefficients 16 in accordance with the scan order 40 into account when coding the magnitude bits of the spectral coefficients in the manner outlined further below.
(268) When implementing encoder 12 in a manner complying with a first aspect of the present application, the encoder 12, in encoding the magnitude bits 34 of the spectral coefficients 16 into data stream 30, performs the coding of first magnitude bits of the spectral coefficients 16 into the data stream 30 first before coding second magnitude bits of the spectral coefficients 16 into data stream 30. The first magnitude bits were illustrated using hatching in
(269) Besides coding the first magnitude bits 46 which form profile 44, encoder 12 codes second magnitude bits of the spectral coefficient 16, with these second magnitude bits being ones residing in the matrix 32 beneath, i.e. at a lower significance side of, profile 44. In
(270) Before describing an embodiment for a decoder fitting to the encoder of
(271) As already noted above, encoder 12 may code profile 44, and the magnitude bits 46 forming the same, using symbols which might be code words of a variable length code. An example of such code words has been set out above with respect to Table 1 and the usage and mode of operation in order to code profile 44 using these symbols has been described with respect to
(272) While profile 44 may be possibly encoded in this manner with using, for instance, a variable length code in order to code the symbols output thereby, encoder 12 may alternatively use another approach. For instance, the aforementioned symbols of Table 1 could be coded using arithmetic coding. In
(273) As to the second magnitude bits 48, it has already been taught above that the magnitude bits 48 may be coded into data portion 78 of data stream 30 using a compression ratio of one, i.e.
(274) for each magnitude bit 48 one bit is written into data stream 30. For instance, the second magnitude bits 48 may be written into the data stream as they are, i.e. in a plane manner. Moreover, the second magnitude bits 48 may be coded into data portion 78 in a non-interleaved manner with respect to their membership to spectral coefficients 16. For instance, magnitude bits 48 of transform block 14 belonging to one coefficient 16 may be coded into data portion 78 immediately consecutively. Second magnitude bits 78 belonging to different coefficients 16 would then not be interleaved within data stream 30. An interleaving could also be left off between coefficients of different transform blocks 14.
(275) The description of the encoder of
(276) The decoder 80 is configured to decode the first magnitude bits 46 forming profile 44 and to decode the second magnitude bits 48 residing in matrix 32 at a lower significance side of profile 44, i.e. behind profile 44 when seen along the significance direction 42, or ones hit by the profile when the latter is projected along matrix column direction 42.
(277) The details set out above with respect to the encoder of
(278) However, as already indicated above, a different sort of coding/decoding of the profile magnitude bits 46 may be used. As far as the second magnitude bits 48 are concerned, the same is true. That is, decoder 80 may, in a plane manner, read the second magnitude bits 48 from data stream 30 and may, in this regard, read the magnitude bits 48 from the data stream in an non-interleaved manner with respect to the membership to transform coefficients 16, but alternatives would be available as well.
(279) After having described the encoder of
(280) In accordance with a first aspect of the present application, encoder 12 codes the first magnitude bits 46 into data stream 30 prior to, and in a non-interleaved manner to, the second magnitude bits 48, and likewise decoder 80 decodes the first magnitude bits 46 from the data stream 30 prior to, and in a non-interleaved manner relative to, the second magnitude bits 48 from data stream 30. As a result, the corresponding data sections or data portions 46 and 78 of data stream 30 do not interleave and are accordingly illustrated in a corresponding manner in
(281) An example of a mode of operation of encoder 12 and decoder 80 when corresponding to the first aspect of the present application is explained below with respects to
(282) Both tasks, i.e. the coding task of encoder 12 and the decoding task of decoder 80, start with obtaining a data rate target at step 110 and 112, respectively, or in alternative terms, a data amount, data consumption or bit budget reserved in the data stream 30 for the transform block or a transform block group to which the transform block belongs. Step 110 at the encoder side may involve encoder 12 receiving a data amount target from, for example, a rate control such as the rate control depicted in
(283) This knowledge may be used or may not be used in the subsequent step of coding the first magnitude bits 114 at the encoder side and decoding the first magnitude bits 116 at the decoder side. For example, the codec may be designed such that the data amount target suffices inevitably to completely code profile 44 down to the least significance bit plane, i.e. bit plane zero in
(284) Alternatively, the data amount target is taken into account when performing coding/decoding 114/116 of the first magnitude bits in the sense that the data rate target controls when to stop coding the profile 44. In other words, the data rate target may also be used in steps 114/116 to determine or identify the first magnitude bits until which same are coded into, or decoded from, the data stream such as, for instance, along the direction generally pointing along directions 36 and 42, respectively. As described above, the encoder could determine for each bit plane how much data would have to spent for coding the second magnitude bits completely within the respective bit plane in addition to coding the profile completely with respect to this bit plane, and accordingly encoder 12 is able to choose the least significant non-fractional bit plane till which it is feasible to code both the first magnitude bits and the second magnitude bits into the data stream 30 while complying with the data rate target. The decoder may determine at each transition from one bit plane to the next in decoding the profile bits, as to when the lest significant non-fractional bit plane has been reached. Again, potential remaining data amount still complying with the data rate target may distributed onto additional second magnitude bits below the least significant non-fractional bit plane. Further, the second magnitude bits thus identified at the encoder and decoder may be coded/decoded column-wise, i.e. without interleaving magnitude bits of different spectral coefficients.
(285) The concept presented in section 2 represents an example for the process described with respect to
(286) Before proceeding with describing implementation of encoder 12 of
(287) In accordance with the second aspect of the present application, encoder 12 is configured to determine, in the manner described above with respect to the first steps in
(288) As it turns out from the description brought forward above with respect to an implementation of encoder and decoder
(289) As shown in
(290) Again, it is clear that
(291) The statement performed with respect to the first aspect, according to which the coding of the first and second magnitude bits may be restricted to a certain subset of the coefficients of block 14, such as all but the DC coefficient and/or all but a subset of lowest frequency coefficients in scan order, is applicable to implementations of encoder and decoder corresponding to the second aspect of the present application as well.
(292) Similar statements performed above with respect to the second aspect of the present application or, to be more precise, when starting the description of implementations of the encoder and decoder of
(293) In order to examples of implementations of the encoder and decoder of
(294) This means that all the above described embodiments may be used to code the first and second magnitude bits, i.e. the profile magnitude bits 46 and enveloped magnitude bits 48, with merely restricting the coding to the sub array of matrix 32 relating to bit planes other than bit planes 150 and to coefficients of set 158. The first and second magnitude bits 46 and 48 merely lie within this sub array. The coding of the value of any coefficients within set 152 may take place before in coding order, the same applying to decoding.
(295) Taking into accounts bits 154 in coding the value of a coefficient in set 154 mans the following. For example, on a per coefficient basis, VLC coding may be used for coding the value and decoding the value. A predetermined spectral coefficient of the set 152 is, for example, coded by the encoder by mapping a value of the predetermined spectral coefficient, as represented by at least a subset of the magnitude bits of this predetermined spectral coefficient, which includes at least one magnitudes bit lying in bit planes 150, onto a variable length code 160 and writing the variable length code 160 into the data stream 30. For example, the value of this coefficient in set 152, is determined by the sequence 162 of magnitude bits from some most significant bit plane 164 among coefficients within set 152 within a certain set of transform blocks, to a least significant bit plane 166. In other words, the just-mentioned most significant bit plane 164 of bit sequence 162 may be signaled in the data stream 30 for a set of transform blocks which, for instance, covers all transform blocks of an image, is a the transform block group relating to the signalization of the least significant non-fractional bit plane, or is the transform block group, for which plane 156 is signaled in the data stream 30, i.e. the group of transform blocks relating to group 28. The latter groups may all be the same. Alternatively, the most significant bit plane 164 may be the most significant bit plane of all bit planes irrespective of the values of coefficients, i.e. bit plane n1 in nomenclature of
(296) That is, in accordance with the third aspect of the present application, the encoder signals in the data stream a second predetermined bit claim, such as the most significant non beyond profile bit plane indicated 156 in
(297) An alternative is illustrated in
(298) Instead of coding the VLC code directly, prediction, such as spatial prediction from a transform block corresponding to a neighboring sample block, and/or temporal prediction from a transform block corresponding to a spatially collocated sample block of a previously coded image of a video to which both images belong, may be used with restricting VLC coding to the prediction residual, i.e. the prediction of the count 167 with treating the plane written magnitude bits as they are (
(299) A possible implementation involving the aspect of possibly treating with more than one color component, is described above in section 1.8.2.
(300) Finally, the fourth aspect of the present application relates to an implementation of the encoder and decoder of
(301) With respect to the third and fourth aspects, it is noted that it may very well be that the first and second magnitude bits may be transmitted within a data stream in a bit plane manner rather than in a coefficient by coefficient manner. A similar statement would also be true for first and second aspects, respectively.
(302) Finally,
(303) Even though the above described embodiments comprise varying specific features, the main components of the encoder and decoder of the embodiments may be mutually applicable to within all embodiments.
(304) It is to be understood that in this specification, the signals on lines are sometimes named by the reference numerals for the lines or are sometimes indicated by the reference numerals themselves, which have been attributed to the lines. Therefore, the notation is such that a line having a certain signal is indicating the signal itself. A line can be a physical line in a hardwired implementation. In a computerized implementation, however, a physical line does not exist, but the signal represented by the line is transmitted from one calculation module to the other calculation module.
(305) Although the present invention has been described in the context of block diagrams where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
(306) Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some one or more of the most important method steps may be executed by such an apparatus.
(307) The inventive transmitted or encoded signal can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
(308) Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disc, a DVD, a Blu-Ray, a CD, a ROM, a PROM, and EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
(309) Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
(310) Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine readable carrier.
(311) Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
(312) In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
(313) A further embodiment of the inventive method is, therefore, a data carrier (or a non-transitory storage medium such as a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
(314) A further embodiment of the invention method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example, via the internet.
(315) A further embodiment comprises a processing means, for example, a computer or a programmable logic device, configured to, or adapted to, perform one of the methods described herein.
(316) A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
(317) A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
(318) In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods may be performed by any hardware apparatus.
(319) While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which will be apparent to others skilled in the art and which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
REFERENCES
(320) [1] AMBROISE RENAUD; BUYSSCHAERT CHARLES; PELLEGRIN PASCAL; ROUVROY GAEL, Method and Device for Display Stream Compression, U.S. Pat. No. 9,332,258 BB. [2] Thomas Richter, Sven Simon, Comparison of CPU and GPU Based Coding on Low-Complexity Algorithms for Display Signals, Proc. SPIE 8856, Applications of Digital Image Processing XXXVI, September 2013. [3] Xilinx, 7 Series FPGAs Overview, http://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf, accessed Oct. 8, 2014. [4] Joachim Keinert, Thomas Richter, Herbert Thoma, Miguel Angel Martinez del Amor, Sergej Wtjurin, Siegfried Fel, Christian Scherl, Manuel de Frutos Lpez, Wolfgang Heppner, Low complexity entropy coder for image/video coding, Patent submission. [5] AMBROISE RENAUD; BUYSSCHAERT CHARLES; PELLEGRIN PASCAL; ROUVROY GAEL, Method and Device for Display Stream Compression, U.S. Pat. No. 9,332,258 BB. [6] AMBROISE RENAUD; BUYSSCHAERT CHARLES; PELLEGRIN PASCAL; ROUVROY GAEL, Method and Device for display stream compression, EP2773122 A1. [7] Jean-Baptiste Lorent, TICO Lightweight Codec Used in IP Networked or in SDI Infrastructure, SMPTE RDD 35:2016. [8] Toshiaki Kojima, LLVCLow Latency Video Codec for Network Transfer, SMPTE RDD 34:2015. [9] D. A. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the I.R.E. September 1952, pp. 1098-1101. [10] W. B. Pennebaker, J. L. Mitchell, G. G. Langdon and R. B. Arps, An overview of the basic principles of the Q-Coder adaptive binary arithmetic coder, in IBM Journal of Research and Development, vol. 32, no. 6, pp. 717-726, November 1988. [11] Solomon W. Golomb, Run-Length Encodings, IEEE Transactions on Information Theory IT-12 (3). 1966, pp. 399-401. [12] Elias, Peter (March 1975), Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory. 21 (2): 194-203. doi:10.1109/tit.1975.1055349