Method and Apparatus for Complexity Control in High Throughput JPEG 2000 (HTJ2K) Encoding

20220394272 · 2022-12-08

    Inventors

    Cpc classification

    International classification

    Abstract

    Methods for management of encoding complexity for image and video encoding, for example for algorithms belonging to the JPEG 2000 family of standards, where the encoding process targets a given compressed size (i.e. a total coded length) for the image or for each frame of a video sequence. Described are a set of methods for complexity constrained encoding of HTJ2K code-streams, involving collection of local or global statistics for each sub-band (not for each code-block), generation of forecasts for the statistics of sub-band samples that have not yet been produced by spatial transformation and quantization processes, and the use of this information to generate a global quantization parameter, from which the coarsest bit-plane to generate in each code-block can be deduced. Coded length estimates can be generated in a manner that enables latency and memory to be separately optimized against encoded image quality, while maintaining low computational complexity.

    Claims

    1. A method for complexity constrained encoding of code-streams, including JPEG 2000 and High Throughput JPEG 2000 code-streams, subject to an overall target length constraint, involving the steps of: a. collecting information about the sub-band samples produced by spatial transformation; b. generation of coded length estimates from said collected information, for a plurality of potential bit-plane truncation points; c. determination of a quantization parameter (QP value) from these length estimates, such that after mapping the QP value to a base bit-plane index for each relevant code-block of each sub-band, the estimated overall coded length when truncating at these bit-plane indices is not expected to exceed said overall target length constraint; d. mapping said QP value to a base bit-plane index for each code-block; e. encoding each relevant code-block to the precision associated with the corresponding base bit-plane index, and encoding one or more additional coding passes from each such code-block; and f. subjecting all such generated coding passes to a post-compression rate-distortion optimization process to determine the final set of coding passes emitted from each code-block as the compressed result.

    2. The method of claim 1, where the block coding algorithm of JPEG 2000 Part-1 is employed.

    3. The method of claim 1, where the High Throughput block coding algorithm of JPEG 2000 Part-15 is employed, and the base bit-plane for a code-block corresponds to the first High Throughput (HT) Cleanup pass generated for that code-block.

    4. The method of claim 1, where coded length estimates are formed for all samples of each sub-band before determining a global QP value, whereupon the base bit-plane indices for all code-blocks are determined and encoding of the relevant coding passes can then proceed based on these indices.

    5. The method of claim 1, where coded length estimates are formed incrementally, as sub-band samples become available from the spatial transform.

    6. The method of claim 5, where the QP value is updated incrementally according to a coded length budget that starts out equal to the overall target length constraint, and code-blocks become available incrementally for coding, said code-blocks being identified as “dispatched code-blocks,” involving the following steps: a. collection of length estimates that are based on the samples that have become available for a sub-band into estimated length records; b. generation of forecast length records which anticipate the coded lengths that would be estimated for the samples that have not yet become available for the sub-band; c. incremental determination of a QP value, based on the estimated length records and forecast length records, such that after mapping the QP value to a base bit-plane index for each non-dispatched code-block of each sub-band, the estimated overall coded length for non-dispatched code-blocks, when truncating at these bit-plane indices, is not expected to exceed the remaining length budget; d. mapping said QP value to a base bit-plane index for non-dispatched code-blocks for which sub-band samples are available, such code-blocks being identified as “active code-blocks”; e. entering the active code-blocks into the set of dispatched code-blocks, while subtracting the estimated coded lengths, corresponding to truncation at their respective base bit-plane indices, from the coded length budget; f. encoding dispatched code-blocks to the precision associated with their respective base bit-plane indices, and encoding one or more additional coding passes from each such code-block; and g. making all such generated coding passes available to the post-compression rate-distortion optimization process.

    7. The method of claim 1, where coded length estimates for a collection of sub-band samples are formed using a simplified model of the block coding process that lacks one or more features of the actual encoding process so that the estimated length for a given bit-plane truncation point is almost certainly larger than an actual coded length at that same point.

    8. The method of claim 7, where coded length estimates for a collection of sub-band samples are formed from significance statistics, where a sample is significant at a bit-plane boundary if its quantized magnitude at that bit-plane would be non-zero.

    9. The method of claim 8, where coded length estimates for a collection of sub-band samples are formed from quad significance counts, where a quad consists of 4 samples, and the count for a given bit-plane identifies the number of quads from the collection that would contain at least one significant sample in that bit-plane.

    10. The method of claim 6, where length forecasts are obtained by extrapolating the estimated lengths determined from available sub-band samples within the same sub-band.

    11. The method of claim 6, where forecasts incorporate background information concerning typical estimated lengths for similar image content, that has been collected previously.

    12. The method of claim 6, where the complexity constrained encoding procedure is applied to video, and the length forecasts for a given sub-band within a current video frame are determined using the sub-band samples observed in a previous frame.

    13. The method of claim 12, where length forecasts for samples that are not yet available within a given sub-band of a current video frame are formed by combining the length estimates formed for corresponding sub-band samples in a previous frame, identified here as temporal forecasts, with extrapolated length estimates formed from available sub-band samples within the current frame, identified here as spatial forecasts.

    14. The method of claim 13 where temporal and spatial forecasts are combined based on a reliability measure that compares the consistency of the length estimates for the available sub-band samples across frames with the consistency of the length estimates in the previous frame for those sub-band samples that are available and non-available, respectively, in the current frame.

    15. The method of claim 1, where code-blocks of an image or video frame are partitioned into flush-sets, such that each flush-set has its own coded length constraints, and the QP generation, base bit-plane mapping, block encoding and post-compression rate-distortion optimization processes all operate flush-set by flush-set, based on the individual flush-set length constraints.

    16. An apparatus for complexity constrained encoding of code-streams, comprising an encoder which is arranged to implement the method of claim 1.

    17. (canceled)

    18. A non-volatile computer readable medium, storing instructions for controlling a computer to implement a method in accordance with claim 1.

    19. (canceled)

    20. The method of claim 6 where coded length estimates for a collection of sub-band samples are formed using a simplified model of the block coding process that lacks one or more features of the actual encoding process so that the estimated length for a given bit-plane truncation point is almost certainly larger than an actual coded length at that same point.

    21. The method of claim 20, where coded length estimates for a collection of sub-band samples are formed from significance statistics, where a sample is significant at a bit-plane boundary if its quantized magnitude at that bit-plane would be non-zero.

    22. The method of claim 10, where forecasts incorporate background information concerning typical estimated lengths for similar image content, that has been collected previously.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0038] Features and advantages of the present invention will become apparent from the following description of embodiments thereof, by way of example only, with reference to the accompanying drawings, in which:

    [0039] FIG. 1 is a diagram illustrating bit-plane contributions of the coding passes produced by the J2K-1 and HT algorithms;

    [0040] FIG. 2 is a block diagram of an HTJ2K encoding system based on the FBCOT (FAST Block Coding with Optimised Truncation) paradigm;

    [0041] FIG. 3 is a diagram illustrating a core complexity control method in accordance with an embodiment of the present invention;

    [0042] FIG. 4 is a diagram illustrating an on-line adaptive complexity control method in accordance with an embodiment of the present invention;

    [0043] FIG. 5 is a diagram illustrating generation and encoding of flush-sets for low latency encoding in accordance with an embodiment of the present invention, illustrated for two levels of Mallat-style DWT. Not all connections from stripe buffers to coding processes are shown, and

    [0044] FIG. 6 is a graph illustrating exploration of complexity control strategies at 1 bpb on video with abrupt changes in scene complexity, composed of six segments, four of which have very low complexity in the upper half of the frame.

    DETAILED DESCRIPTION OF THE EMBODIMENTS

    [0045] In the following description of embodiments of the invention, there are five “aspects” that are discussed.

    1st Aspect: QP Based Complexity Control Using Coded Length Estimates

    [0046] A key principle behind this embodiment is that a near-optimal set of quantization parameters for transformed imagery can all be described in terms of a single global parameter, identified here as QP (note that this is not the same as the QP parameter used in modern video codecs such as H.264/AVC, H.265/HEVC or AV1, but it plays a related role). We begin by explaining this property.

    [0047] Writing x as a vector that represents all samples in an image and y.sub.b[n] for the 2D sequence (indexed by n≡[n.sub.1, n.sub.2]) of sub-band samples from transform sub-band b, the relationship between transform and image domain representations can be expressed as

    [00001] $x = \sum_b \sum_n y_b[n]\, s_{b,n}$

    where s.sub.b,n is the synthesis vector associated with location n in band b. Then the expected squared error distortion D in the image domain, due to quantization of the sub-band samples, can be written as

    [00002] $D = E\left[\|x - x'\|^2\right] \approx \sum_b E\left[\|y_b - y_b'\|^2\right] G_b = \sum_b D_b\, G_b$

    where D.sub.b=E[∥y.sub.b−y.sub.b′∥.sup.2] is the expected squared quantization error for the samples of sub-band b and


    G.sub.b=∥s.sub.b,n∥.sup.2

    is an “energy gain factor,” which is the squared Euclidean norm (sum of squared sample values) of the synthesis vectors s.sub.b,n. Note that these are all translates of each other, so they all have the same Euclidean norm, except near the image boundaries.

    [0048] More generally, we are often interested in minimizing a visually weighted distortion metric, which can be written as

    [00003] $D = \sum_b D_b\, W_b G_b$

    where D.sub.b is still the expected mean squared quantization error for sub-band b and W.sub.b is a weighting factor that accounts for varying sensitivity of the human visual system to distortion in different spatial frequency bands.
    At high bit-rates, a common distortion-rate model for the sub-band sample quantization process gives

    [00004] $D_b \approx N_b\, g\, \sigma_b^2\, e^{-a L_b / N_b}$

    where N.sub.b is the number of samples in sub-band b, σ.sub.b.sup.2 is their variance, L.sub.b is the number of bits associated with the coded representation of these samples, and g and a can be taken as constants. All that matters is the exponential nature of the model, from which one can deduce that the sub-band quantization assignment that minimizes D subject to a constraint on the coded length L=Σ.sub.bL.sub.b, must minimize

    [00005] $D + \lambda L = \sum_b \left[ N_b\, g\, \sigma_b^2\, e^{-a L_b / N_b}\, W_b G_b + \lambda L_b \right]$

    for some λ>0, the solution to which is

    [00006] $g\, \sigma_b^2\, e^{-a L_b / N_b} = \frac{\lambda}{W_b G_b}, \quad \forall b$

    [0049] That is, the mean squared quantization error in sub-band b satisfies

    [00007] $\frac{D_b}{N_b} = \frac{\lambda}{W_b G_b}$

    [0050] Writing Δ.sub.b for the quantization step size selected for sub-band b and P.sub.b for the number of least significant magnitude bit-planes discarded from the quantized sub-band samples, we have

    [00008] $\frac{D_b}{N_b} \approx \frac{\Delta_b^2}{12} \cdot 2^{2 P_b}$

    and so the right number of bit-planes to discard for a given operating point λ is approximately

    [00009] $P_b \approx \frac{1}{2}\log_2\left(\frac{12}{W_b G_b \Delta_b^2}\right) + \frac{1}{2}\log_2 \lambda$

    [0051] Henceforth, we interpret the first and second terms on the right hand side of the above equation as a sub-band specific bias parameter β.sub.b and the global quantization parameter QP, respectively, that is:

    [00010] $QP \triangleq \frac{1}{2}\log_2 \lambda \quad \text{and} \quad \beta_b \triangleq \frac{1}{2}\log_2\left(\frac{12}{W_b G_b \Delta_b^2}\right)$

    [0052] The complexity control methods of this invention assign a “base Cleanup pass,” denoted Cup0, for code-blocks of sub-band b, to correspond with bit-plane


    P.sub.b=max{0,└QP+β.sub.b+R.sub.off┘}  (1)

    where R.sub.off is a rounding offset, such as R.sub.off=½. The actual value of R.sub.off is not really important, so long as it is consistent across all sub-bands b. The value of P.sub.b can be interpreted as the number of least significant magnitude bit-planes that will be effectively discarded from the quantized samples of sub-band b if the code-stream includes just this base Cleanup pass Cup0 for every code-block in the sub-band.

    [0053] With these preliminaries out of the way, we are now in a position to describe the core complexity control method of this invention, which is illustrated in FIG. 3.

    [0054] For each code-block of sub-band b, at most Z coding passes are generated, starting from bit-plane P.sub.b, where a preferred value for Z is 6. The value of P.sub.b is determined from equation (1), using a QP value that is quantized to an integer multiple of 1/G, where G can be interpreted as the number of “grid” points between successive integers. In FIG. 3, G=4, which is a good choice. In terms of G, QP can be expressed in terms of an integer F as

    [00011] $QP = \frac{F}{G}$

    and equation (1) can be rewritten as

    [00012] $P_b = \max\left\{0, \left\lfloor \frac{F + \beta_b G + R_{\text{off}} G}{G} \right\rfloor\right\} = \max\left\{0, \left\lfloor \frac{F + \beta_b'}{G} \right\rfloor\right\} \qquad (2)$

    where β.sub.b′=└β.sub.bG+R.sub.offG┘ is an integer bias for sub-band b. Equation (2) is implemented by the boxes marked with reference numerals 1 and 2 in FIG. 3.
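    To make the integer mapping concrete, here is a minimal Python sketch of equation (2); the function and parameter names are ours, chosen for illustration, and do not appear in the specification.

```python
import math

def base_bitplane(F, beta_b, G=4, R_off=0.5):
    """Equation (2): map grid index F to the base bit-plane P_b.

    beta_b is the sub-band bias; G is the number of grid points per
    integer QP step; beta_prime is the integer bias beta_b'.
    """
    beta_prime = math.floor(beta_b * G + R_off * G)
    return max(0, math.floor((F + beta_prime) / G))
```

    Because the max with zero clamps coarse operating points, strongly negative F values (very fine quantization) simply map every sub-band to bit-plane 0.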

    [0055] In order to find the value of F (i.e., QP), the number of bytes associated with each candidate value p for P.sub.b is estimated, forming an “estimated length record” (or vector) L.sup.(b), containing the estimated lengths L.sub.p.sup.(b) for each sub-band b and each feasible base bit-plane p. The actual length estimation method forms the 2nd aspect of this invention, as described later in this document. We note here, however, that it is very important that these estimates are conservative, meaning that the total number of coded bytes associated with the Cleanup pass for bit-plane p over all code-blocks of sub-band b should be close to but no larger than L.sub.p.sup.(b).

    [0056] The elements of each estimated length record L.sup.(b) are replicated G times and biased to form

    [00013] $\bar{L}_f^{(b)} = L_{p(b,f)}^{(b)}, \quad \text{where} \quad p(b,f) = \max\left\{0, \left\lfloor \frac{f + \beta_b'}{G} \right\rfloor\right\} \qquad (3)$

    [0057] That is, each element L.sub.p.sup.(b) of the L.sup.(b) vector is copied to G elements L̄.sub.f.sup.(b) of the L̄.sup.(b) vector, starting from the element with f=G·p−β.sub.b′. This is the function of the boxes marked with reference numerals 3 and 4 in FIG. 3.

    The expanded length vectors L̄.sup.(b) are accumulated to form L̄, whose elements are

    [00014] $\bar{L}_f = \sum_b \bar{L}_f^{(b)} = \sum_b L_{p(b,f)}^{(b)}, \quad \text{where} \quad p(b,f) = \max\left\{0, \left\lfloor \frac{f + \beta_b'}{G} \right\rfloor\right\}$

    [0058] Evidently, L̄.sub.f is the (conservative) estimated number of coded bytes produced by the base Cleanup passes Cup0 of all code-blocks, if the integer F is chosen to be equal to f. The QP selection operation (box marked with reference numeral 5 in FIG. 3) simply selects


    F=min{f|L̄.sub.f≤L.sub.max}  (4)

    where L.sub.max is the target maximum number of coded bytes. Since the length estimates are conservative, the actual number of coded bytes associated with all base Cleanup passes should be significantly smaller than L.sub.max, while the total number of coded bytes associated with the next finer Cleanup pass from all code-blocks is likely to be larger than L.sub.max. So long as Z>3 coding passes are generated for each code-block, therefore, the PCRD-opt algorithm is likely to have sufficient raw material from which to achieve a total compressed size that is very close to L.sub.max.
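    The grid expansion of equation (3) and the selection rule of equation (4) can be sketched as follows. This is a simplified illustration with hypothetical inputs; in particular, clamping p to the last available length entry is our assumption, standing in for the near-zero cost of bit-planes beyond the most significant one.

```python
import math

def select_F(length_records, betas, L_max, G=4, R_off=0.5, f_range=64):
    """Pick F = min{f | sum_b L_{p(b,f)}^{(b)} <= L_max} (equations (3)-(4)).

    length_records[b][p] holds the estimated length L_p^(b) for sub-band b
    at candidate base bit-plane p; betas[b] is the bias beta_b.
    """
    for f in range(-f_range, f_range):
        total = 0
        for L_b, beta_b in zip(length_records, betas):
            beta_prime = math.floor(beta_b * G + R_off * G)
            p = max(0, math.floor((f + beta_prime) / G))
            p = min(p, len(L_b) - 1)  # assumed: coarser planes cost ~nothing
            total += L_b[p]
        if total <= L_max:
            return f
    return f_range  # budget unattainable even at the coarsest recorded plane
```

    A real implementation would accumulate the expanded vectors L̄ once and scan them, rather than recomputing the per-band totals for every candidate f; the behaviour is the same.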

    [0059] It is worth emphasizing the fact that in this invention, the length estimation process is not itself the basis for rate control—i.e., generating a code-stream with the desired coded length. The length estimation process intentionally over-estimates the coded length, ideally by a significant margin, and the coded length estimation methods are preferably very simple, so that they are not suitable for reliable rate control. Instead, rate control is performed by a post-compression rate-distortion optimization procedure, exploiting the availability of multiple truncation points for the code-blocks—i.e., multiple coded lengths and associated distortions for a code-block. Indeed, one source of inspiration for the invention is our recent discovery (demonstrated experimentally later in this document) that it is possible to devise low complexity length estimators that have the desired level of conservatism with very high probability.

    [0060] While the complexity control method described here is targeted towards High Throughput JPEG 2000, using the HT block coder, it may have application to other media coding systems. Most notably, the same complexity control method can be used to constrain the number of coding passes generated by the J2K-1 block coding algorithm, such that at most Z−1 passes are generated beyond the Cleanup pass associated with the base bit-plane P.sub.b, for any given code-block in sub-band b. The J2K-1 block coder's coding efficiency is usually similar to that of the HT block coder (e.g., ˜10% better), so the same coded length estimation method can be used in both cases.

    [0061] The main difference between the HT and J2K-1 block coding algorithms is that J2K-1 is fully embedded, so the J2K-1 block coder must produce all coding passes that are associated with coarser bit-planes than the base bit-plane P.sub.b, so long as the code-block contains significant samples at those coarser bit-planes, whereas the HT block coder does not need to do this. Nonetheless, the fact that coding can stop Z−1 passes after the Cleanup pass at bit-plane P.sub.b can still represent a significant computational saving in comparison with naively generating all possible coding passes for each code-block.

    [0062] The method can also be used to produce code-streams that may contain a mixture of code-blocks that use the HT block coding algorithm and code-blocks that use the J2K-1 block coding algorithm.

    2nd Aspect: Coded Length Estimation Using Sub-Band Sample Statistics

    [0063] We turn our attention now to the question of how the estimated lengths L.sub.p.sup.(b) should be formed, as implemented by the boxes marked by reference numerals 6 and 7 in FIG. 3. The method models the coding cost of an algorithm for coding the quantization indices


    I.sub.b,p[n]=χ.sub.b[n]·μ.sub.b,p[n]

    where

    [00015] $\chi_b[n] = \begin{cases} 1 & y_b[n] \geq 0 \\ -1 & y_b[n] < 0 \end{cases} \quad \text{and} \quad \mu_{b,p}[n] = \left\lfloor \frac{|y_b[n]|}{\Delta_b 2^p} \right\rfloor$

    are the sign and quantized magnitude, respectively, noting that the effective quantization step size is the actual step size Δ.sub.b times 2.sup.p, since p is the number of least significant magnitude bits that are being discarded from each sample if we truncate beyond the Cleanup pass in bit-plane p.
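    As a minimal illustration of the sign and magnitude computation above (the function name is ours, for illustration only):

```python
def quantization_index(y, delta_b, p):
    """Sign chi and deadzone magnitude mu = floor(|y| / (delta_b * 2**p)),
    with p least-significant magnitude bit-planes discarded."""
    chi = 1 if y >= 0 else -1
    mu = int(abs(y) // (delta_b * (1 << p)))
    return chi * mu
```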

    [0064] The modeled algorithm codes the magnitudes μ.sub.b,p[n], and the signs χ.sub.b[n] of only those samples whose magnitude is non-zero. In preferred embodiments, the modeled algorithm is a crude approximation of the actual encoding algorithm; in particular, the model of the HT Cleanup encoder deliberately omits many of the features that make the actual HT Cleanup algorithm efficient, so as to obtain a conservative estimate of the coded length L.sub.p.sup.(b). We describe here a specific embodiment that is both efficient to compute and effective in practice.

    [0065] The first step of the estimation procedure involves the collection of quad significance statistics C.sub.b,p for each bit-plane p. Specifically, sub-band samples are partitioned into 2×2 quads, indexed by q≡[q.sub.1, q.sub.2], such that sample y.sub.b[n] belongs to quad q if


    2q.sub.1≤n.sub.1≤2q.sub.1+1 and 2q.sub.2≤n.sub.2≤2q.sub.2+1;

    then for each quad, a binary significance value σ.sub.b,p[q] is set to 1 in bit-plane p if any of its samples has non-zero magnitude μ.sub.b,p[n], i.e.,

    [00016] $\sigma_{b,p}[q] = \begin{cases} 0 & \mu_{b,p}[n] = 0 \;\; \forall n : 2q_1 \leq n_1 \leq 2q_1+1 \text{ and } 2q_2 \leq n_2 \leq 2q_2+1 \\ 1 & \text{otherwise} \end{cases}$

    and then the quad significance statistics are obtained by accumulating the σ.sub.b,p[q] values:

    [00017] $C_{b,p} = \sum_q \sigma_{b,p}[q]$

    [0066] Note that the statistics C.sub.b,p can be calculated without explicitly finding the individual μ.sub.b,p[n] values. It is sufficient to first form the maximum magnitude over each quad and quantize it to get

    [00018] $\bar{\mu}_b[q] = \left\lfloor \frac{1}{\Delta_b} \max_{\substack{2q_1 \leq n_1 \leq 2q_1+1 \\ 2q_2 \leq n_2 \leq 2q_2+1}} |y_b[n]| \right\rfloor$

    and then compare μ̄.sub.b[q] with 2.sup.p for each candidate bit-plane p. Specifically, C.sub.b,p is just the number of quads q for which μ̄.sub.b[q]≥2.sup.p.
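    The two-step collection procedure can be sketched as follows. This is a pure-Python illustration with names of our own choosing; a real implementation would vectorize the quad maxima and threshold comparisons.

```python
def quad_significance_counts(samples, delta_b, num_planes):
    """Collect C_{b,p}, p = 0..num_planes-1, from a 2D array of sub-band
    samples, via the per-quad quantized maximum magnitude mu_bar[q]."""
    H, W = len(samples), len(samples[0])
    counts = [0] * num_planes
    for q1 in range(0, H, 2):
        for q2 in range(0, W, 2):
            m = max(abs(samples[n1][n2])
                    for n1 in range(q1, min(q1 + 2, H))
                    for n2 in range(q2, min(q2 + 2, W)))
            mu_bar = int(m // delta_b)
            # quad q is significant in bit-plane p when mu_bar >= 2**p
            for p in range(num_planes):
                if mu_bar >= (1 << p):
                    counts[p] += 1
    return counts
```

    Note that the counts are non-increasing in p, since a quad significant at plane p+1 is necessarily significant at plane p.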

    [0067] The second step of the estimation procedure involves converting the statistics C.sub.b,p to estimated byte counts. The HT Cleanup encoding algorithm produces three byte-streams, known as the MagSgn byte-stream, the VLC byte-stream and the MEL byte-stream. The number of bits packed to the MagSgn byte-stream from a quad q depends upon a bound on the precision of the quantized magnitudes in the quad. In the actual algorithm, this bound is based on so-called “magnitude exponents” E.sub.b,p[q], such that E.sub.b,p[q]−1 is the number of bits required to represent μ.sub.b,p[n]−1 for any samples in the quad, and the extra 1 accounts for the need to communicate sign bits for the non-zero samples. For the simplified model here, it is more convenient to use the quantity


    P.sub.b,p[q]=min{P|μ̄.sub.b[q]<2.sup.(P+p)},

    noting that E.sub.b,p[q]−1≤P.sub.b,p[q]≤E.sub.b,p[q]. We adopt a very simplistic model, in which the 4 samples in quad q all receive P.sub.b,p[q] magnitude bits, with only the non-zero samples receiving a sign bit. The fraction of samples in the quad that are non-zero is modeled as 1−2.sup.−P.sup.b,p.sup.[q]. This model essentially assumes that the quantized sub-band sample magnitudes μ.sub.b,p[n] are uniformly distributed over the interval from 0 to 2.sup.P.sup.b,p.sup.[q]−1, which is a very conservative assumption for sub-band data that tends to follow something closer to a Laplacian probability distribution in practice. The total number of MagSgn bits under this model can be expanded as

    [00019] $\begin{aligned} M_{b,p} &= 4\left[C_{b,p} + C_{b,p+1} + C_{b,p+2} + \cdots\right] + 4\left[C_{b,p} - \tfrac{1}{2}\left(C_{b,p} - C_{b,p+1}\right) - \tfrac{1}{4}\left(C_{b,p+1} - C_{b,p+2}\right) - \cdots\right] \\ &= 4\left[C_{b,p} + C_{b,p+1} + C_{b,p+2} + \cdots\right] + 2\left[C_{b,p} + \tfrac{1}{2}C_{b,p+1} + \tfrac{1}{4}C_{b,p+2} + \cdots\right] \end{aligned}$

    [0068] In addition to the magnitude and sign bits, we adopt a simple model for the cost of communicating the P.sub.b,p[q] values. The actual HT Cleanup encoder communicates magnitude exponent bounds differentially, which is different from communicating the P.sub.b,p[q] values, first because magnitude exponents E.sub.b,p[q] are not identical to the P.sub.b,p[q] values (there is an offset of 1 in the magnitudes that are bounded), and then because there is a complex inter-sample (not just inter-quad) dependency in the way the HT block coder communicates exponent bounds. One could try to model all this, but preferred embodiments instead rely upon the assumption that a simple coder that describes the same information should provide an upper bound for the coded length of the actual Cleanup algorithm. The simple coder recommended here involves independent signaling of P.sub.b,p[q] for each significant quad, using a unary (comma) code, combined with adaptive run-length coding of the quad significance symbols (via runs of insignificant quads) with a coding mechanism that is assumed to achieve the 0.sup.th order entropy of the quad-significance symbols whenever the probability of significance is less than 0.5—the adaptive run-length coder cannot use less than 1 bit for each run, so it degenerates into emitting a single significance bit for each quad once the significance likelihood becomes larger than 0.5. These two aspects (quad significance coding and unary coding of P.sub.b,p[q] for significant quads) are somewhat analogous to the information communicated via the HT Cleanup coder's MEL and VLC byte-streams, respectively.

    [0069] For the unary code, a first bit indicates whether or not P.sub.b,p[q]>1, subject to the quad being significant (i.e., P.sub.b,p[q]>0); a second bit indicates whether or not P.sub.b,p[q]>2, subject to P.sub.b,p[q]>1; and so forth. Accordingly, the total number of unary code bits for the sub-band is simply


    $V_{b,p} = C_{b,p} + C_{b,p+1} + C_{b,p+2} + \cdots$
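    Both the MagSgn model of [00019] and the unary-code count above reduce to tail sums over the quad significance counts, as the following sketch shows (names are ours; C is a list of C.sub.b,p values indexed by p):

```python
def magsgn_and_unary_bits(C, p):
    """Model bit counts at base bit-plane p from significance counts C,
    where C[p+k] counts quads with P_{b,p}[q] > k.

    Magnitude bits:  4 * (C[p] + C[p+1] + ...)
    Sign bits:       2 * (C[p] + C[p+1]/2 + C[p+2]/4 + ...)
    Unary-code bits: V = C[p] + C[p+1] + ...
    """
    tail = C[p:]
    V = sum(tail)
    M = 4 * V + 2 * sum(c / (1 << k) for k, c in enumerate(tail))
    return M, V
```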

    [0070] For the run-length coding of significance, the number of bits is approximated as

    [00020] $R_{b,p} = Q_b \cdot H\left(\frac{C_{b,p}+1}{Q_b+1}\right) \qquad (5)$

    where Q.sub.b is the total number of quads for the sub-band and H(u) is the entropy of a binary random variable with probability min{0.5,u}. To evaluate the function H(u), preferred embodiments use a quantized log-like representation of u, such as that obtained by taking some of the most significant bits from a floating-point representation of u, as the index to a lookup table.
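    Equation (5) can be computed directly, as in this sketch; here we evaluate H exactly with math.log2 rather than via the quantized lookup table suggested above, which is an implementation choice of ours.

```python
import math

def run_length_bits(C_bp, Q_b):
    """Equation (5): R = Q_b * H((C_{b,p} + 1) / (Q_b + 1)), with H(u) the
    entropy of a binary variable with probability min(0.5, u)."""
    u = min(0.5, (C_bp + 1) / (Q_b + 1))
    H = -u * math.log2(u) - (1 - u) * math.log2(1 - u)
    return Q_b * H
```

    The clamp at 0.5 reproduces the degenerate behaviour noted earlier: once significance becomes likely, the model charges a full bit per quad.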

    [0071] In some embodiments, the adaptive nature of the run-length coding procedure employed by the HT Cleanup pass can be incorporated by accumulating quad significance statistics first over individual line-pairs j within the sub-band, so that C.sub.b,p=Σ.sub.jC.sub.b,p.sup.(j), and computing run-length bit counts separately for each line-pair j as

    [00021] $R_{b,p}^{(j)} = Q_b^{\text{line-pair}} \cdot H\left(\frac{C_{b,p}^{(j)}+1}{Q_b^{\text{line-pair}}+1}\right)$

    where Q.sub.b.sup.line-pair is the number of quads in a single line-pair. Then equation (5) is replaced by

    [00022] $R_{b,p} = \sum_j R_{b,p}^{(j)}$

    [0072] Either way, the final estimated byte count for bit-plane p of sub-band b is formed simply by adding up the three components developed above and dividing by 8, resulting in

    [00023] $L_{b,p} = \frac{M_{b,p} + V_{b,p} + R_{b,p}}{8} \qquad (6)$

    [0073] In practice, we observe that L.sub.b,p invariably over-estimates the number of bytes required by the HT Cleanup encoding of bit-plane p, but tends to be smaller than the number of bytes required by the HT Cleanup encoding of the next finer bit-plane p−1, which is the property required by the overall complexity control algorithm of FIG. 3 to be successful if the number of generated coding passes for each code-block is at least Z=4. For each sub-band b, there will be a maximum bit-plane P.sub.b.sup.max for which any quad is significant. For all p>P.sub.b.sup.max, C.sub.b,p=0 and so the only non-zero contribution to equation (6) is the very small cost R.sub.b,p of communicating the fact that all quads are insignificant. In particular, for p>P.sub.b.sup.max, the value of L.sub.b,p found using the above procedure depends at most on the sub-band dimensions. While P.sub.b.sup.max itself is data dependent, there is a well-defined bound P.sub.b.sup.bound that depends only on the quantization step size Δ.sub.b, the properties of the transform used to produce sub-band samples, and the bit-depth of the original image sample values, such that P.sub.b.sup.max≤P.sub.b.sup.bound. An implementation of the method can use this bound to determine the number of quad significance statistics C.sub.b,p that need to be collected for each sub-band b. In some applications, it is desirable to impose a fixed limit S on the number of quad significance statistics C.sub.b,p that are collected for sub-band b, regardless of Δ.sub.b, the transform properties or the image bit-depth. This can be done without significantly damaging the efficacy of the method, by taking advantage of the fact that for p<<P.sub.b.sup.max (i.e., at very high precisions), the following relationship tends to hold

    [00024] $L_{b,p} \approx L_{b,p+1} + \tfrac{5}{8} Q_b$

    [0074] This is because at very high precisions, most samples in the sub-band become significant, so that C.sub.b,p≈Q.sub.b, (V.sub.b,p−V.sub.b,p+1)≈Q.sub.b, (M.sub.b,p−M.sub.b,p+1)≈4Q.sub.b and (R.sub.b,p−R.sub.b,p+1)≈0. Using this relationship, embodiments can collect statistics C.sub.b,p and explicitly compute length estimates L.sub.b,p only for those bit-planes p such that


    P.sub.b.sup.bound≥p≥P.sub.b.sup.min=max{0,P.sub.b.sup.bound+1−S}

    deriving length estimates for each

    [00025] $p \in [0, P_b^{\min})$ from $L_{b,p} = L_{b,P_b^{\min}} + \tfrac{5}{8} Q_b \left(P_b^{\min} - p\right)$.
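    A sketch of how equation (6) and the extrapolation rule combine when only S bit-planes of statistics are collected; the list layout and names are our assumptions.

```python
def estimated_lengths(M, V, R, Q_b, P_min):
    """Byte counts L_{b,p} for all p, given explicit bit models only for
    p >= P_min (equation (6)), extrapolated below P_min at 5/8 byte per
    quad per additional bit-plane.

    M, V and R are lists indexed by p - P_min; the returned list is
    indexed directly by p.
    """
    explicit = [(m + v + r) / 8 for m, v, r in zip(M, V, R)]
    extrapolated = [explicit[0] + 5 / 8 * Q_b * (P_min - p)
                    for p in range(P_min)]
    return extrapolated + explicit
```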

    [0075] The reader will appreciate that the coded length estimation method described above is only one of many related methods that can be used to provide a conservative model for the number of bytes produced by the HT Cleanup encoding procedure. More elaborate models can be used that mimic the behaviour of the actual encoder more accurately, but practical experience suggests that these may not be justified, given the low complexity of the HT Cleanup encoder itself and the fact that the very simple model described above is sufficient even when the number of generated coding passes Z is small.

    3rd Aspect: Online QP Adaptation Using Forecast Statistics

    [0076] The complexity control method described in the 1st aspect requires coded lengths to be estimated from statistics collected from all sub-band samples, before the QP parameter can be computed. This in turn determines the coding passes that are generated by the block encoding process. As a result, the entire image, its quantized sub-band samples, or some equivalent set of data needs to be buffered in memory before the block encoding process can commence. In many cases this leads to a high memory complexity, even if computational complexity is low.

    [0077] This 3.sup.rd aspect avoids high memory complexity by updating the QP value dynamically, based on the sub-band samples that have actually been produced by the spatial transform process. FIG. 4 illustrates the method, focussing on just one sub-band b, but identifying the role played by other sub-bands. The spatial transform (Discrete Wavelet Transform here) is pipelined so that the entire image does not need to be buffered in memory. As is well-known, image lines can be pushed incrementally to the DWT (Discrete Wavelet Transform), in a top-down fashion, which incrementally produces lines of sub-band samples for each sub-band b, using only a modest amount of internal state memory. Of course, bottom-up and column-wise incremental pushing of image data can similarly be realized if appropriate, but in most applications the image data arrives in raster scan order, so we describe the approach specifically for that case here. Sub-band samples are collected within a memory buffer, from which they are consumed by the block encoding process.

    [0078] It is helpful to collect sub-band samples into stripes, where each stripe represents a whole number of code-blocks—typically one row of code-blocks—for the sub-band. As shown in the figure, at any given point in the process we can identify four categories of sub-band samples, as follows: [0079] 1. An “active stripe” k corresponds to sub-band samples that have been produced by the transform and are ready for a QP value (equivalently, an F value) to be assigned, allowing a base Cleanup bit-plane P.sub.b to be assigned and the block coding of these samples to proceed. [0080] 2. “Dispatched stripes” correspond to sub-band samples that have previously been active, having already been assigned a QP value (i.e., an F value) and hence a base Cleanup bit-plane P.sub.b for block encoding to proceed. These samples may well have been encoded already, but that is not a strict requirement. In parallel processing environments, code-blocks may be distributed to concurrent processing engines that may span more than one stripe, so that one or more dispatched stripes may still be in-flight while the QP value for the active stripe is being determined. The method described here does not require any tight synchronization between the complexity control and block encoding processes. [0081] 3. “Advance data” corresponds to sub-band samples that have been produced by the transform, but have not yet been collected into whole stripes, or their stripes have not yet been made ready for QP assignment. The height of the advance data can be understood as the delay between production of a new line of sub-band samples and the point at which those samples become active for QP assignment and encoding. Larger delays provide more known statistics from which to forecast the coded lengths of samples that lie beyond the active stripe, but this comes at the expense of more memory. 
In many applications, it is desirable to reduce the delay to 0, so that at the point when a new stripe becomes active, there are no advance data at all. [0082] 4. “Unseen data” corresponds to sub-band samples that have not yet been produced by the transform.

    [0083] As in the complexity control method described in the 1.sup.st aspect, sub-band samples are used to generate coded length estimates L.sub.p.sup.(b), for each candidate p for the base Cleanup pass (Cup0) bit-plane P.sub.b. A difference here is that the coded length estimates are collected into records that describe only one stripe of the sub-band. Specifically, the entries L.sub.k,p.sup.(b) in record L.sub.k.sup.(b) provide a conservative estimate for the number of coded bytes that will be produced by the encoding of the sub-band samples in stripe k of sub-band b, within the HT Cleanup pass for bit-plane p. In general, these lengths represent multiple code-blocks, specifically, all code-blocks that lie within stripe k.

    [0084] In some embodiments, the length estimates L.sub.k,p.sup.(b) may take fractional or floating-point values, rather than integers. The coded length estimation method described in the 2.sup.nd aspect of the invention is naturally adapted to estimating the length contributions from individual line-pairs within the sub-band, so it can be helpful to compute and aggregate line-pair length estimates with fractional precision. This also allows partial length estimates to be formed from any “advance data,” as defined above, and collected within a partial record L.sub.adv.sup.(b).

    [0085] In this aspect, a QP value is generated for the active stripe k within sub-band b, without waiting for all unseen data to become available. The method for generating these dynamic QP values is substantially similar to that described in the 1.sup.st aspect. The main differences are: [0086] 1. There is a need to forecast the coded lengths associated with sub-band samples that lie beyond the active stripe. [0087] 2. There is a need to keep track of the estimated number of bytes associated with the base Cleanup passes of code-blocks that belong to dispatched stripes, since those may have had different QP values, based on which their base bit-plane P.sub.b was previously committed.

    [0088] As shown in FIG. 4, the first issue mentioned here is addressed by a “length forecaster,” which produces a forecast length record Λ.sub.k.sup.(b) to represent the estimated lengths associated with all samples beyond the active stripe k. We discuss the forecasting process further below.

    [0089] The second issue is addressed by extracting the estimated length B.sub.k.sup.(b) from record L.sub.k.sup.(b) at the point when the base bit-plane P.sub.b is determined using the QP value (equivalently, the integer value F such that QP=F/G). This is done by the box marked with reference numeral 10 in the figure, which uses equation (2) to obtain P.sub.b for the code-blocks in stripe k and reports


    B.sub.k.sup.(b)=L.sub.k,p.sup.(b)|.sub.p=P.sub.b

    [0090] These B.sub.k.sup.(b) values are accumulated for all dispatched stripes of all sub-bands to keep track of the total number of committed bytes B.sub.acc. Note that the B.sub.k.sup.(b) and B.sub.acc values are scalar, whereas the L.sub.k.sup.(b) and Λ.sub.k.sup.(b) are vector-valued records, representing a multitude of hypotheses p for the base bit-plane that has yet to be determined for the active stripe k.

    [0091] The QP estimation procedure is similar to that employed in the 1.sup.st aspect, except that the target maximum number of bytes L.sub.max is reduced by the number of bytes that have already been committed, B.sub.acc, so that equation (4) becomes


    F=min{f|L.sub.f≤L.sub.max−B.sub.acc}  (7)

    and L̄ is computed online (i.e., adaptively) from the most recent forecasts and uncommitted active records of all sub-bands b. Specifically, writing k.sub.b for the index of the most recent active stripe for any given sub-band b, L̄.sub.k.sub.b.sup.(b) for the expanded and biased version of its estimated length record L.sub.k.sub.b.sup.(b), following equation (3), Λ̄.sub.k.sub.b.sup.(b) for the similarly expanded and biased version of its most recent forecast record Λ.sub.k.sub.b.sup.(b), and D.sup.(b) for the set of all dispatched stripe indices for sub-band b, L̄ is found from


    L̄=Σ.sub.b Λ̄.sub.k.sub.b.sup.(b)+Σ.sub.{b s.t. k.sub.b∉D.sup.(b)} L̄.sub.k.sub.b.sup.(b)  (8)

    [0092] To be concrete, the expand and bias operations of equation (3) here become

    [0093] L̄.sub.k.sub.b.sub.,f.sup.(b)=L.sub.k.sub.b.sub.,p(b,f).sup.(b) and Λ̄.sub.k.sub.b.sub.,f.sup.(b)=Λ.sub.k.sub.b.sub.,p(b,f).sup.(b), ∀f, where

    [00026] p(b,f)=max{0,⌊(f+β.sub.b)/G⌋}

    and equation (8) means that

    [00027] L̄.sub.f=Σ.sub.b Λ̄.sub.k.sub.b.sub.,f.sup.(b)+Σ.sub.{b s.t. k.sub.b∉D.sup.(b)} L̄.sub.k.sub.b.sub.,f.sup.(b), ∀f

    Note that L̄.sub.f accounts for the (conservative) estimated coded lengths of all sub-band samples that do not belong to dispatched stripes within their sub-bands, for each hypothesis f on the value F that will be found via equation (7). Dispatched stripes are excluded, because all dispatched stripes have already been assigned an F value (i.e., a QP) and committed bytes to the tally in B.sub.acc, as described above.
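The online F selection of equation (7) can be sketched as below. This is a minimal illustration, not the patented method: records are assumed to be Python lists indexed by bit-plane p (with byte counts decreasing as p grows coarser), the floor-based form of the QP-to-bit-plane map is an assumption, and all names are hypothetical.

```python
def select_F(forecasts, actives, betas, G, L_max, B_acc, num_f=64):
    """Return the smallest hypothesis f whose total estimated length fits
    within the remaining budget L_max - B_acc (equation (7) sketch).

    forecasts: per-sub-band forecast records (lists indexed by bit-plane p).
    actives:   per-sub-band uncommitted active-stripe records (same layout).
    betas:     per-sub-band bias beta_b for the assumed map
               p(b, f) = max(0, (f + beta_b) // G).
    """
    budget = L_max - B_acc
    for f in range(num_f):
        total = 0.0
        for b, rec in forecasts.items():
            p = max(0, (f + betas[b]) // G)
            total += rec[min(p, len(rec) - 1)]  # forecast beyond active stripe
            if b in actives:
                arec = actives[b]
                total += arec[min(p, len(arec) - 1)]  # uncommitted active stripe
        if total <= budget:
            return f  # first (finest-quality) f that satisfies the budget
    return num_f - 1  # fall back to the coarsest hypothesis
```

Dispatched stripes do not appear here: their committed bytes are already folded into B_acc, mirroring the exclusion described above.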

    [0094] The QP (i.e., F) assignment procedure can be executed each time a new active stripe becomes available for any sub-band. In this case, the second sum in equation (8) might involve only the sub-band b for which the assignment is being executed, all other sub-bands having had their most recent active stripes previously dispatched. However, the procedure can also be executed less frequently, waiting until several sub-bands have active stripes ready for QP assignment, so that the second sum in equation (8) involves multiple terms. Executing the QP assignment procedure less frequently reduces the overall computation associated with expanding, biasing and accumulating estimated and forecast length records, although this computation is not overly burdensome.

    [0095] We turn our attention now to the creation of forecast length records Λ.sub.k.sup.(b). As shown in FIG. 4, the information available for generating forecasts consists of any partial estimated length record L.sub.adv.sup.(b) that has already been formed from some or all of the advance data, together with the set of active and past estimated length records {L.sub.i.sup.(b)}.sub.0≤i≤k.

    [0096] Writing N.sub.k.sup.(b) for the number of sub-band lines represented by these length records, and H.sup.(b) for the height of the sub-band, a simple forecasting approach is to set

    [00028] Λ.sub.k.sup.(b)=L.sub.adv.sup.(b)+((H.sup.(b)−N.sub.k.sup.(b))/N.sub.k.sup.(b))·(L.sub.adv.sup.(b)+Σ.sub.i=0.sup.k L.sub.i.sup.(b))  (9)

    [0097] In some embodiments, this simple uniform average may be replaced by a weighted average that places more emphasis on more recent estimated length records; this can be beneficial when N.sub.k.sup.(b) is larger than the number of sub-band lines H.sup.(b)−N.sub.k.sup.(b) to which the forecast applies.
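The uniform forecast of equation (9) can be sketched as a short helper, assuming each vector record is a Python list indexed by the candidate bit-plane; the function name and sample numbers are hypothetical.

```python
def forecast_lengths(L_adv, L_records, N_k, H_b):
    """Uniform forecast of equation (9), elementwise over bit-plane hypotheses:
        Lambda_k = L_adv + ((H - N_k) / N_k) * (L_adv + sum_i L_i)

    L_adv:     partial estimates from advance data (list indexed by p).
    L_records: active and past stripe records, each a list indexed by p.
    N_k:       number of sub-band lines represented by these records.
    H_b:       total height of the sub-band.
    """
    # per-hypothesis total of everything seen so far (advance + stripes)
    seen = [a + sum(rec[p] for rec in L_records) for p, a in enumerate(L_adv)]
    scale = (H_b - N_k) / N_k  # unseen lines, expressed per seen line
    return [L_adv[p] + scale * seen[p] for p in range(len(L_adv))]
```

A weighted variant, as suggested above, would simply replace the uniform sum over `L_records` with a recency-weighted one.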

    [0098] At the beginning, when the transform has produced only a small number of sub-band samples, some sub-bands might not yet have accumulated any active stripe, dispatched or otherwise. For these sub-bands b, there is no most recent active stripe, so k.sub.b=−1. It is important that the first sum in equation (8) involves forecasts from every sub-band. In some embodiments, this requirement can be addressed by waiting until all sub-bands have active stripes before dispatching the first active stripe from any sub-band. However, this may consume significant memory resources in deep DWT hierarchies. A preferred approach is for all sub-bands to generate a first forecast length record Λ.sub.−1.sup.(b) as soon as they have accumulated any partial coded length estimates L.sub.adv.sup.(b). This may occur after the production of a single line-pair for the sub-band, so that the first execution of the QP assignment procedure is delayed until all sub-bands have received at least one pair of lines.

    [0099] To further reduce delay and memory consumption, some embodiments may generate initial forecasts for sub-bands deep in the DWT hierarchy before any data becomes available from the transform. This can be done by scaling the forecasts produced by other higher resolution sub-bands in accordance with the sampling densities involved. Fortunately, low resolution sub-bands, for which sub-band samples might not be available when the QP assignment procedure is first executed, tend to have very low sample densities, so that they tend to have only a small impact on the L vector that is used in the QP assignment of equation (7).

    [0100] In other embodiments, a “background” estimated length record L.sub.bg.sup.(b) may be employed for some or all sub-bands, containing lengths estimated offline, from other image data. An initial forecast can then be generated from this background data as

    [00029] Λ.sub.−1.sup.(b)=L.sub.adv.sup.(b)+((W.sup.(b)·(H.sup.(b)−N.sub.k.sup.(b)))/S.sub.bg.sup.(b))·L.sub.bg.sup.(b)

    where W.sup.(b) is the width of sub-band b and S.sub.bg.sup.(b) is the total number of samples from which the background length record L.sub.bg.sup.(b) was derived—alternatively the background length record may be kept in normalized form as L.sub.bg.sup.(b)/S.sub.bg.sup.(b). Some embodiments may include a contribution from any such background record when generating later forecasts Λ.sub.k.sup.(b), especially when k is small.

    [0101] The description provided above may appear to suggest that QP selection is a centralized process, using synchronized information from all sub-bands to form and distribute decisions that are used to determine the base bit-plane values P.sub.b for block encoding. However, the QP selection process can in fact be executed in a distributed fashion, using information that is not necessarily synchronized. In particular, each sub-band or group of sub-bands can be assigned its own local copy of the boxes marked with reference numeral 10 in FIG. 4, which accumulate committed bytes B.sub.k.sup.(b) and determine QP (equivalently F) values. In a distributed implementation, each such local instance of the committed byte accumulator and QP generation processes still needs input from all other sub-bands, but this input can be delayed or partially pre-aggregated, relative to the active stripes for which QP estimates are being actively generated. In particular, the only external input required for correct operation of the local QP generation process is the cumulative sum of all committed byte counts from external sub-bands b, together with an augmented forecast vector that accumulates the latest forecasts from the external sub-bands Λ.sub.k.sup.(b), plus any length estimates L.sub.k.sup.(b) from external active stripes which have not yet been committed.

    [0102] We finish our description of this 3.sup.rd aspect by pointing out that the complexity constraint method here does not include the encoding process itself. Unlike the adaptive quantization schemes employed by many conventional coders, actual encoded lengths are not used to determine the QP value. Moreover, the QP value itself does not directly determine the quantized sub-band sample values, since the block encoder generates not only the base Cleanup pass whose bit-plane P.sub.b depends on QP, but Z−1 additional coding passes. These properties mean that the block encoding processes can be largely decoupled from the complexity control procedure, allowing implementations to support the considerable parallelism afforded by independent block coding. Moreover, the PCRD-opt algorithm is free to optimize the distortion-length trade-off associated with individual code-blocks, at any point. In some embodiments, the PCRD-opt procedure may be executed only once all coded data for an image or frame has been generated, maximizing its opportunity to distribute bits non-uniformly according to scene complexity. In other embodiments, the PCRD-opt procedure may be executed incrementally so as to emit finalized code-stream content progressively, reducing memory consumption and latency. Even then, however, the frequency with which the PCRD-opt procedure is executed can be very different to the frequency with which the QP assignment procedure is executed, since the two are decoupled.

    [0103] These properties and opportunities all ultimately derive from the fact that the coded length estimation process is not used for rate control, as carefully expounded earlier.

    4th Aspect: Enhanced Forecasting for Video Applications

    [0104] For video applications, previously compressed frames can contribute to the forecasting of the coded lengths of unseen data in a current frame. As a starting point, the background length records L.sub.bg.sup.(b) mentioned above, can be derived from the estimated length records in sub-band b of a previously compressed frame, and this background information can not only be used to form the initial forecast length record Λ.sub.−1.sup.(b), but also contribute to regular forecast length records Λ.sub.k.sup.(b) in the early stripes (small k) of each sub-band. This 4.sup.th aspect goes further by providing a method for determining the reliability of length estimates from previous frames, and incorporating length estimates from previous frames into forecast lengths within a current frame based on this reliability.

    [0105] In this 4.sup.th aspect, a set of J “previous frame” summary length records P.sub.j.sup.(b) are kept for each sub-band b, where 0≤j<J and record j summarizes the coded lengths that were estimated for H.sub.j.sup.(b) lines from the sub-band in a previous frame. As a recommended example, J=6 summary records are kept for each sub-band, whose heights H.sub.j.sup.(b) divide the overall sub-band height H.sup.(b) roughly as follows:

    [00030] J=6 and (H.sub.0.sup.(b),H.sub.1.sup.(b),H.sub.2.sup.(b),H.sub.3.sup.(b),H.sub.4.sup.(b),H.sub.5.sup.(b))≈H.sup.(b)·(2.sup.−4,2.sup.−4,2.sup.−2,2.sup.−2,2.sup.−2,2.sup.−3)

    [0106] Using the length estimation method described in the 2.sup.nd aspect, coded length estimates are formed from quad significance statistics that can become available after each pair of sub-band lines has been produced; in this case the exact summary record heights H.sub.j.sup.(b) should be multiples of 2. These incremental length estimates are aggregated to form “current frame” summary length records C.sub.j.sup.(b), which become “previous frame” summary length records P.sub.j.sup.(b) in the next frame. In memory-efficient embodiments, a C.sub.j.sup.(b) record can overwrite the P.sub.j.sup.(b) record as soon as it has been completely produced. For simplicity, the summary record heights H.sub.j.sup.(b) are taken to be consistent across frames, so that C.sub.j.sup.(b) and P.sub.j.sup.(b) both represent coded length estimates for the same set of sub-band lines; however, variations of the method can readily be developed in which the heights are allowed to vary.
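The recommended height split can be sketched as follows; rounding each height to a multiple of 2 (so that each summary record covers whole line-pairs) and letting the final record absorb the remainder are assumptions of this illustrative sketch.

```python
def summary_heights(H_b):
    """Split the sub-band height H_b into J = 6 summary record heights with
    the recommended fractions (1/16, 1/16, 1/4, 1/4, 1/4, 1/8).

    Each height is rounded to a multiple of 2, since length estimates are
    aggregated from quad statistics over pairs of sub-band lines; the last
    record absorbs any rounding remainder so the heights sum to H_b.
    """
    fracs = [2**-4, 2**-4, 2**-2, 2**-2, 2**-2, 2**-3]
    heights = [2 * round(f * H_b / 2) for f in fracs[:-1]]
    heights.append(H_b - sum(heights))  # remainder goes to the final record
    return heights
```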

    [0107] Recall that N.sub.k.sup.(b) is the number of lines from sub-band b that have been used to form coded length estimates so far within the current frame. Let j.sub.k be the number of completed summary length records C.sub.j.sup.(b), so that C.sub.j.sub.k.sup.(b) denotes the next summary length record, which is currently being assembled. It follows that


    Σ.sub.j=0.sup.j.sup.k.sup.−1H.sub.j.sup.(b)≤N.sub.k.sup.(b)


    and


    H.sub.cur.sup.(b)=N.sub.k.sup.(b)−Σ.sub.j=0.sup.j.sup.k.sup.−1H.sub.j.sup.(b)

    is the number of sub-band lines that have already contributed length estimates to C.sub.j.sub.k.sup.(b).

    [0108] An aggregate vector of coded length estimates for all N.sub.k.sup.(b) lines from sub-band b that have been seen so far is


    C.sub.pre.sup.(b)=C.sub.j.sub.k.sup.(b)+Σ.sub.j=0.sup.j.sup.k−1C.sub.j.sup.(b)=L.sub.adv.sup.(b)+Σ.sub.i=0.sup.kL.sub.i.sup.(b)

    noting that we are treating the incomplete C.sub.j.sub.k.sup.(b) as an accumulator for length estimates from the line-pairs that have been seen since the completion of summary record C.sub.j.sub.k.sub.−1.sup.(b).

    [0109] A similar vector, representing the same number of sub-band lines in the previous frame, can be formed as

    [00031] P.sub.pre.sup.(b)=(H.sub.cur.sup.(b)/H.sub.j.sub.k.sup.(b))·P.sub.j.sub.k.sup.(b)+Σ.sub.j=0.sup.j.sup.k.sup.−1P.sub.j.sup.(b)

    [0110] Then, a vector that represents length estimates in the previous frame for the H.sup.(b)−N.sub.k.sup.(b) sub-band lines that do not currently have length estimates in the current frame can be formed as

    [00032] P.sub.post.sup.(b)=(1−H.sub.cur.sup.(b)/H.sub.j.sub.k.sup.(b))·P.sub.j.sub.k.sup.(b)+Σ.sub.j=j.sub.k.sub.+1.sup.J−1P.sub.j.sup.(b)
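The splitting of the previous frame's summary records into "seen" and "unseen" portions can be sketched as below, assuming each summary record is a Python list indexed by candidate bit-plane; the function name and test data are hypothetical.

```python
def split_previous_frame(P_records, heights, H_cur, j_k):
    """Form P_pre (previous-frame lengths for the lines already seen in the
    current frame) and P_post (for the lines not yet seen), elementwise:

        P_pre  = (H_cur / H_{j_k}) * P_{j_k} + sum_{j < j_k} P_j
        P_post = (1 - H_cur / H_{j_k}) * P_{j_k} + sum_{j > j_k} P_j

    P_records: previous-frame summary records, each a list indexed by p.
    heights:   the summary record heights H_j.
    H_cur:     lines already contributing to the in-progress record j_k.
    """
    n = len(P_records[0])
    w = H_cur / heights[j_k]  # fraction of record j_k covered so far
    P_pre = [w * P_records[j_k][p]
             + sum(P_records[j][p] for j in range(j_k)) for p in range(n)]
    P_post = [(1 - w) * P_records[j_k][p]
              + sum(P_records[j][p] for j in range(j_k + 1, len(P_records)))
              for p in range(n)]
    return P_pre, P_post
```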

    [0111] In this aspect of the invention, the reliability of inter-frame length estimates is compared with that of intra-frame length estimates via the two quantities

    [00033] Δ.sub.temporal.sup.(b)=|(1/N.sub.k.sup.(b))·U(C.sub.pre.sup.(b))−(1/N.sub.k.sup.(b))·U(P.sub.pre.sup.(b))| and Δ.sub.spatial.sup.(b)=|(1/(H.sup.(b)−N.sub.k.sup.(b)))·U(P.sub.post.sup.(b))−(1/N.sub.k.sup.(b))·U(P.sub.pre.sup.(b))|,

    where U(L) is a scalar measure of scene complexity derived from the estimated lengths vector L. A good choice for the function U( ) is an exponentially weighted sum, such as:


    U(L)=Σ.sub.p>0 2.sup.−p·(L.sub.p−L.sub.0)  (10)

    [0112] If Δ.sub.temporal.sup.(b)<Δ.sub.spatial.sup.(b), it is preferable to set the forecast vector Λ.sub.k.sup.(b) to P.sub.post.sup.(b), essentially assuming that the estimated byte counts for the H.sup.(b)−N.sub.k.sup.(b) missing sub-band lines in the current frame will be similar to those for the same sub-band lines in the previous frame—we call this “temporal forecasting.” Otherwise, it is preferable to generate Λ.sub.k.sup.(b) from C.sub.pre.sup.(b), using equation (9) as in the 3.sup.rd aspect—we call this “spatial forecasting.”

    [0113] Preferred embodiments of the invention avoid the extremes of pure temporal or pure spatial forecasting, when the sub-band has at least one active stripe in the current frame, by assigning

    [00034] Λ.sub.k.sup.(b)=L.sub.adv.sup.(b)+max{P.sub.post.sup.(b),((H.sup.(b)−N.sub.k.sup.(b))/N.sub.k.sup.(b))·Č.sub.pre.sup.(b)} if Δ.sub.temporal.sup.(b)<Δ.sub.spatial.sup.(b), and Λ.sub.k.sup.(b)=L.sub.adv.sup.(b)+max{P̌.sub.post.sup.(b),((H.sup.(b)−N.sub.k.sup.(b))/N.sub.k.sup.(b))·C.sub.pre.sup.(b)} if Δ.sub.temporal.sup.(b)≥Δ.sub.spatial.sup.(b)

    [0114] Here, the notation Ľ means that the individual coded length estimates L.sub.p within the vector L have been adjusted downwards, so that max{P,Č} is most likely constrained by P, while max{P̌,C} is most likely constrained by C. Specifically, a good choice for this downward adjustment process is to assign

    [00035] Ľ.sub.p=L.sub.p−1 if p>0, and Ľ.sub.p=0 if p=0
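The blended temporal/spatial selection can be sketched as follows. This is an illustrative reading of the formulas, not the patented method: the downward adjustment is implemented as Ľ.sub.0=0 and Ľ.sub.p=L.sub.p−1 for p>0 (an assumed reading of the adjustment rule), and all names and numbers are hypothetical.

```python
def U(L):
    """Scene-complexity measure of equation (10): sum_{p>0} 2^-p (L_p - L_0)."""
    return sum(2.0 ** -p * (L[p] - L[0]) for p in range(1, len(L)))

def check_adjust(L):
    """Downward adjustment (the 'check' notation): entry 0 becomes 0,
    and every other entry is reduced by 1."""
    return [0.0] + [x - 1.0 for x in L[1:]]

def choose_forecast(L_adv, C_pre, P_pre, P_post, N_k, H_b):
    """Blend temporal and spatial forecasts by elementwise max, favouring
    whichever source has the smaller normalized complexity discrepancy."""
    d_temporal = abs(U(C_pre) / N_k - U(P_pre) / N_k)
    d_spatial = abs(U(P_post) / (H_b - N_k) - U(P_pre) / N_k)
    scale = (H_b - N_k) / N_k
    spatial = [scale * c for c in C_pre]  # equation (9)-style extrapolation
    if d_temporal < d_spatial:
        # temporal forecast is more reliable: let P_post dominate the max
        blended = [max(p, s) for p, s in zip(P_post, check_adjust(spatial))]
    else:
        # spatial forecast is more reliable: let the scaled C_pre dominate
        blended = [max(p, s) for p, s in zip(check_adjust(P_post), spatial)]
    return [a + x for a, x in zip(L_adv, blended)]
```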

    [0115] It will be apparent to those skilled in the art, that there are many different ways to avoid the extremes of pure temporal forecasting versus pure spatial forecasting, while still using the relationship between Δ.sub.temporal.sup.(b) and Δ.sub.spatial.sup.(b) to favour temporal or spatial forecasts.

    [0116] It will also be apparent that the U(P.sub.pre.sup.(b)) values can be formed incrementally by applying equation (10) to each of the P.sub.j.sup.(b) records as it is about to be overwritten by a completed C.sub.j.sup.(b) record, and accumulating the results. This means that there is no need to provide storage for both the C.sub.j.sup.(b) and P.sub.j.sup.(b) summary records for a sub-band.

    [0117] Furthermore, it will be apparent that although the formulas for Δ.sub.temporal.sup.(b) and Δ.sub.spatial.sup.(b) presented above involve height ratios, the costly division operations in these ratios can be avoided by standard cross multiplication techniques, since we are only interested in determining which of Δ.sub.temporal.sup.(b) and Δ.sub.spatial.sup.(b) is smaller.

    5th Aspect: QP Adaptation for Low Latency Image and Video Encoding

    [0118] The foregoing aspects support high throughput of imagery and video, with high quality rate control—i.e., accurately targeting a coded length target for the image or each frame of the video. Aspects 3 and 4 of the invention support low memory configurations, in which neither the image nor the sub-band samples need to be buffered fully in memory prior to block encoding. In all cases, the PCRD-opt stage of the overall encoding process can be deferred until all block encoding has completed for the image or for a video frame (even for a group of video frames), allowing coded bits to be distributed non-uniformly over space (or even time), in accordance with scene complexity. In many cases, this is a good strategy, because the total amount of coded data that needs to be buffered ahead of the PCRD-opt stage is usually much smaller than the amount of image or sub-band data that it represents.

    [0119] In low latency applications, such as sub-frame latency video coding, one cannot wait until all coded data for an image or frame has been produced before executing the PCRD-opt process and emitting the final code-stream. Moreover, in such applications, one often needs to consider a communication channel with a constant, or at least constrained, bit-rate when determining the end-to-end latency. The natural way to address such applications in the context of JPEG 2000 is to collect stripes of code-blocks from each of the sub-bands in the image or video frame into “flush-sets,” as shown in FIG. 5, such that each sub-band is vertically partitioned into the same number of flush-sets, and each flush-set contains contributions from each sub-band that advance the coded image representation in a consistent way. For extremely low latencies, the number of vertical decomposition levels in the wavelet transform is often restricted to just 2 or 3, and rectangular code-blocks are employed that are much wider than they are high (e.g., 1024×4). JPEG 2000 precinct dimensions can be selected to ensure that code-block heights decrease by a factor of 2 with each level in the DWT hierarchy, and a spatially oriented progression order is selected so that the coded information for each flush-set can be emitted to the code-stream, as soon as it becomes available. The so-called PCRL (position, component, resolution, layer) progression order should usually be employed for low latency, where coded data appears in a vertical spatially progressive order (top to bottom), and for each spatial position the image components (usually colour planes) appear in order, and for each component at each spatial position successive resolutions appear in order, with successive quality layers (if more than one) of each precinct appearing consecutively. 
Vertical tiling can also be used to build flush-sets, but it is a less desirable approach, since introducing tile boundaries damages the properties of the DWT, reducing coding efficiency and potentially introducing visual artefacts into the decoded imagery at modest bit-rates.

    [0120] The 1.sup.st aspect is readily adapted to such low latency encoding environments by restricting the QP assignment process to just those sub-band samples and code-blocks that belong to a single flush-set. In a constant or constrained bit-rate environment, the number of coded bytes produced for any given flush-set is generally required to satisfy both a lower bound (underflow constraint) and an upper bound (overflow constraint). The upper bound on the compressed size for a flush-set becomes the L.sub.max parameter in FIG. 3. Once all sub-band lines for a flush-set have been produced by the transform, the method described in the 2.sup.nd aspect is used to create estimated coded length records for the flush-set, which are used to assign a QP value for just that flush-set, based on the flush-set's L.sub.max constraint. This QP (equivalently F) value is used to derive the base bit-plane P.sub.b for all code-blocks in the flush-set that belong to sub-band b, whereupon block encoding is performed. Finally, the PCRD-opt algorithm is applied to the generated coding passes to produce a rate-distortion optimal representation of the flush-set that conforms to the L.sub.max upper bound. If the generated content is unable to satisfy the lower bound (underflow constraint) for a flush-set, stuffing bytes can be inserted into code-block byte streams in a way that does not affect the decoded result—both the J2K-1 and HT block coding algorithms support the introduction of stuffing bytes to the coded content.
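The per-flush-set processing described above can be sketched as below: a hypothetical driver combining QP assignment against the flush-set's upper bound with stuffing-byte padding for the lower bound. The record layout, the floor-based QP-to-bit-plane map, the encode callback, and all names and numbers are assumptions of this sketch.

```python
def process_flush_set(records, betas, G, L_min, L_max, encode_fn):
    """Assign F for one flush-set from its own length-estimate records, then
    enforce the channel's lower bound with stuffing bytes.

    records: per-sub-band length-estimate records for this flush-set only
             (lists indexed by bit-plane p).
    encode_fn: callback that encodes the flush-set at hypothesis F and
               returns the coded length after PCRD-opt truncation, which is
               assumed to already satisfy the upper bound L_max.
    """
    F = None
    for f in range(256):
        total = 0.0
        for b, rec in records.items():
            p = max(0, (f + betas[b]) // G)  # assumed QP-to-bit-plane map
            total += rec[min(p, len(rec) - 1)]
        if total <= L_max:  # overflow constraint for this flush-set
            F = f
            break
    coded = encode_fn(F)
    stuffing = max(0, L_min - coded)  # underflow constraint met via stuffing
    return F, coded + stuffing
```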

    [0121] To further reduce latency and/or memory consumption, some embodiments may employ the spatial forecasting methods described in the 3.sup.rd aspect or (for video) the combined spatial and temporal forecasting methods described in the 4.sup.th aspect, to allow the block encoding process within a flush-set to commence before all sub-band samples of the flush-set have been produced by the transform.

    Experimental Results

    [0122] Here we provide some experimental evidence for the effectiveness of embodiments of the invention.

    [0123] First, we consider the compression of a single very large image. The image in question is an aerial photograph that measures 13333×13333, having RGB pixels with 24 bits each (8 bits/sample), occupying 533 MB on disk.

    [0124] We compress the image to various bit-rates, using mean squared error (MSE) as the optimization objective for the PCRD-opt procedure, so that the effect of complexity constrained encoding can be ascertained simply by measuring the peak signal-to-noise ratio (PSNR) of the decompressed code-stream, with the original image as reference. Code-block dimensions are 64×64 and the CDF 9/7 wavelet transform is employed, along with the usual irreversible decorrelating colour transform (RGB to YCbCr in this case). In each case, we compress the image to produce an HTJ2K code-stream conforming to JPEG 2000 Part-15, using the HT block coding algorithm exclusively. All compression and decompression are performed using the methods described hitherto.

    [0125] We compare the performance associated with “Full” HT encoding, in which the block encoder produces a large number of passes, starting from the first (coarsest) significant bit-plane of each code-block, against that obtained using the methods described in the 3.sup.rd aspect, with minimum delay (no “advance data” at the point when a stripe becomes active) and various numbers of HT coding passes Z. The length estimates themselves are formed using the 2.sup.nd aspect. PSNR results for these different approaches are reported in Table 1. Evidently, for this type of content at least, it is sufficient for the HTJ2K encoder to produce at most Z=6 coding passes per code-block, which is much less than the number of passes produced by “Full” HT encoding.

    TABLE 1. Compression of a large (0.5 GB) RGB aerial image to an HTJ2K code-stream, with various levels of HT encoder complexity control, corresponding to 3, 4 and 6 coding passes per code-block, followed by unweighted (PSNR-based) PCRD-opt rate control.

    bits/pel   JPH HT-FULL   JPH CPLEX-3p   JPH CPLEX-4p   JPH CPLEX-6p
    0.5 bpp    32.29 dB      32.00 dB       32.29 dB       32.29 dB
    1.0 bpp    34.62 dB      34.50 dB       34.62 dB       34.62 dB
    2.0 bpp    38.48 dB      38.30 dB       38.48 dB       38.48 dB
    3.0 bpp    42.20 dB      41.93 dB       42.16 dB       42.16 dB
    4.0 bpp    45.54 dB      45.35 dB       45.53 dB       45.53 dB

    [0126] Next, we explore the effectiveness of the combined spatial and temporal forecasting method described in the 4.sup.th aspect, in comparison with the spatial-only (intra frame) forecasting method described in the 3.sup.rd aspect. To do this, we construct an artificial 4K 4:4:4 RGB video sequence consisting of 48 frames with 5 scene cuts (6 segments), alternating between two different types of content. Four of the six segments are much easier to compress than the other two, with substantial amounts of overcast sky in the upper half of the picture, so that the scene complexity also varies strongly from the top to the bottom of the frame. The content is compressed to a bit-rate of 1 bit/pixel (bpp), with constant rate control, so that each frame has essentially the same compressed size, of

    [00036] 3840×2160×(1/8)=1,036,800 bytes.

    Each frame is encoded to produce an HTJ2K code-stream, conforming to JPEG 2000 Part-15, using the HT block coding algorithm exclusively. PSNR traces, from the average mean squared error in each frame across the R, G and B channels, are plotted in FIG. 6.

    [0127] The “PCRD-STATS” trace in this figure corresponds to the existing “PCRD-STATS” method for complexity constrained HTJ2K encoding that was mentioned in “Background to the Invention,” as described in WO2020/073098. In that method, the set of coding passes generated for a given code-block is based on the operating point selected by the PCRD-opt procedure for the same code-block in the preceding frame, introducing extra coding passes in such a way as to allow progressive adaptation to changes in the scene content over time, while not producing more than Z=6 coding passes per code-block. The first frame is treated differently to get things started; in particular, for the results plotted in FIG. 6, all possible coding passes are generated for each code-block of the first frame.

    [0128] The “CPLEX-S” trace in the figure corresponds to the method described in the 3rd aspect, again with minimum memory, while the “CPLEX-ST” trace corresponds to the method described in the 4th aspect. In both cases, the length estimates are formed from quad significance statistics, following the 2nd aspect, and at most Z=6 coding passes are generated for each code-block.

    [0129] The “HTFULL” trace in the figure corresponds to “Full” HT encoding, where many coding passes are produced for each code-block, starting from the first (coarsest) bit-plane in which any sample of the code-block is significant. This approach is inherently more complex (lower throughput) than the other methods, since many more coding passes are generated by the HT block encoder. One would expect “Full” HT encoding to also produce the highest PSNR, since the PCRD-opt procedure is configured to maximize PSNR (minimize MSE) and is provided with a larger set of options (coding passes) from which to determine the optimized truncation points for each generated code-stream. However, the “Full” HT encoding approach here uses the Kakadu™ software's slope threshold prediction feature, in which the distortion-length slope threshold for a frame is estimated from the thresholds used in previous frames, and the estimated slope is used to terminate the block encoding process early, where appropriate, rather than generating all possible coding passes. This strategy for complexity control is described in D. Taubman, “Software architectures for JPEG2000,” Proc. IEEE Int. Conf. DSP, Santorini, Greece, 2002, and has been used successfully by JPEG 2000 encoders for many years, but it exhibits a penalty in the transition from frames with high scene complexity to frames with low scene complexity (low to high PSNR), as evidenced by trace reference numeral 20 in the figure. (Note: The Kakadu software is a widely used implementation of the JPEG 2000 suite of standards, whose tools are often used to produce reference results for JPEG 2000 in academic and commercial settings; these tools are available at http://www.kakadusoftware.com.)
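The slope threshold prediction strategy can be sketched as follows. This is a minimal illustration of the general idea, assuming a simple exponential-smoothing predictor; it is not Kakadu's actual implementation, and the function names and smoothing rule are assumptions:

```python
def predict_threshold(history, alpha=0.5):
    """Predict the distortion-length slope threshold for the next
    frame by exponentially smoothing the thresholds selected by
    PCRD-opt in previous frames (most recent last)."""
    pred = history[0]
    for t in history[1:]:
        pred = alpha * t + (1 - alpha) * pred
    return pred

def encode_passes(pass_slopes, threshold):
    """Generate coding passes in order, terminating early once the
    estimated distortion-length slope of a pass falls below the
    predicted threshold, instead of producing all possible passes."""
    emitted = []
    for slope in pass_slopes:
        if slope < threshold:
            break
        emitted.append(slope)
    return emitted
```

When scene complexity drops suddenly, the threshold predicted from complex frames is too high for the new, simpler content, so useful fine passes are cut off; this is the transition penalty visible in trace 20.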

    [0130] Observe that the “PCRD-STATS” trace (reference numeral 21 in the figure) exhibits significant performance loss at scene transitions, especially in transitions from frames with low scene complexity to frames with high scene complexity (high to low PSNR). This is because the method assumes that local scene complexity will not change too rapidly from frame to frame, so it can end up skipping too many coarse bit-planes in a high complexity frame that follows a low complexity one.

    [0131] The purely intra-frame complexity control method from the 3rd aspect (the “CPLEX-S” curve, reference numeral 22) is more temporally robust than both “HTFULL” and “PCRD-STATS”, but suffers some quality degradation in the easily compressed portions of the video, where the upper part of each frame has very low scene complexity. This is not surprising, since the 3rd aspect of the invention uses coded length estimates from the upper portion of each sub-band, which has already been produced by spatial transformation, to generate forecast lengths for the remaining (unseen) portion of the sub-band.

    [0132] Overall, the combined spatial and temporal forecasting method from the 4th aspect (the “CPLEX-ST” curve, reference numeral 23) is superior to all the other methods, both in temporal robustness and in overall compressed image quality.

    [0133] The encoding methods of the above-described embodiments may be implemented by appropriate computing apparatus programmed with appropriate software. Embodiments of methods can be implemented in GPU deployments and in low and high latency hardware deployments.

    [0134] Where software is used to implement embodiments, the software can be provided on computer readable media, such as discs, as data signals on a network such as the internet, or in any other way.

    [0135] The above-described embodiments relate to use within the JPEG 2000 (J2K) and High Throughput JPEG 2000 (HTJ2K) formats. Embodiments of the invention are not limited to these formats; some embodiments may be used in other image processing formats, and embodiments may find application in other image processing contexts.

    [0136] It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.

    [0137] In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word “comprise” or variations such as “comprises” or “comprising” is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention.

    [0138] It is to be understood that, if any prior art publication is referred to herein, such reference does not constitute an admission that the publication forms a part of the common general knowledge in the art, in Australia or any other country.