RATE-CONTROL-AWARE RESHAPING IN HDR IMAGING
20230039038 · 2023-02-09
Abstract
Given an input image in a high dynamic range (HDR) which is mapped to a second image in a second dynamic range using a reshaping function, to improve coding efficiency, a reshaping function generator may adjust the codeword range of the HDR input under certain criteria, such as for noisy HDR images with a relatively-small codeword range. An example of generating a scaler for adjusting the HDR codeword range based on the original codeword range and a metric of the percentage of edge-points in the HDR image is provided. The adjusted reshaping function allows for more efficient rate control during the compression of reshaped images.
Claims
1. A method for generating a reshaping function, the method comprising: receiving one or more input images (120) in a first dynamic range; computing a first minimum luma value and a first maximum luma value in the one or more input images; computing (605) a first codeword range for a luma channel of the one or more input images based on the first minimum luma value and the first maximum luma value; computing (610) a noise metric for the luma channel of the one or more input images, wherein the noise metric comprises a metric of noisiness of the luma channel; computing (615) a scaler based on the first minimum luma value, the first maximum luma value and the noise metric; and depending on the scaler, generating a forward luma reshaping function mapping luma values in the one or more input images from a source luma codeword range to a target luma codeword range, wherein the forward luma reshaping function is constructed to map a minimum codeword value of the source luma codeword range to a minimum codeword value of the target luma codeword range, and a maximum codeword value of the source luma codeword range to a maximum codeword value of the target luma codeword range, comprising: if the scaler is bigger than one: computing a second minimum luma value and a second maximum luma value based on the first minimum luma value, the first maximum luma value and the scaler; generating (620) a second codeword range for the luma channel of the one or more input images based on the second minimum luma value and the second maximum luma value, wherein the second codeword range is larger than the first codeword range; and generating (630) the forward luma reshaping function using the second codeword range as the source luma codeword range; else generating (625) the forward luma reshaping function using the first codeword range as the source luma codeword range.
2. The method of claim 1, further comprising: depending on the scaler, generating, based on a minimum luma value, a maximum luma value, and the forward luma reshaping function, a forward chroma reshaping function mapping chroma values in the one or more input images from a source chroma codeword range to a target chroma codeword range, comprising: if the scaler is bigger than one: generating the forward chroma reshaping function using the second minimum luma value as the minimum luma value and the second maximum luma value as the maximum luma value; else generating the forward chroma reshaping function using the first minimum luma value as the minimum luma value and the first maximum luma value as the maximum luma value.
3. The method of claim 1, wherein computing the scaler comprises computing an exponential mapping based on the bit-depth resolution of the one or more input images, the first minimum luma value, the first maximum luma value and the noise metric.
4. The method of claim 1, wherein computing the scaler (M) comprises computing
M=max(βe.sup.−αδ,1.0), wherein δ denotes a function of the bit-depth resolution of the one or more input images, the first minimum luma value and the first maximum luma value, β denotes a function of the noise metric, and α is a function of the noise metric and a cut-off parameter C for which if δ≥C, then M=1.
5. The method of claim 4, wherein
6. The method of claim 1, wherein the second minimum luma value {tilde over (v)}.sub.L,min.sup.i and the second maximum luma value {tilde over (v)}.sub.L,max.sup.i are computed as:
{tilde over (v)}.sub.L,min.sup.i=max(0,v.sub.L,avg.sup.i−M×Δ.sub.L,1.sup.i),
{tilde over (v)}.sub.L,max.sup.i=min(2.sup.B.sup.v−1,v.sub.L,avg.sup.i+M×Δ.sub.L,2.sup.i), wherein B.sub.v denotes the bit-depth resolution of the one or more input images,
Δ.sub.L,1.sup.i=v.sub.L,avg.sup.i−v.sub.L,min.sup.i,
Δ.sub.L,2.sup.i=v.sub.L,max.sup.i−v.sub.L,avg.sup.i, wherein v.sub.L,min.sup.i, v.sub.L,max.sup.i, and v.sub.L,avg.sup.i denote the first minimum luma value, the first maximum luma value, and an average luma value in the one or more input images.
7. The method of claim 1, where computing the noise metric comprises: normalizing luma values in the one or more input images to [0, 1) to generate one or more normalized images; determining edge points in the one or more normalized images based on edge-detection operators and one or more thresholds; and determining a percentage of the determined edge points over the total number of pixels in the one or more normalized images.
8. The method of claim 7, wherein the edge-detection operators comprise the Sobel operators.
9. The method of claim 7, wherein computing the threshold for a j-th image in the one or more normalized images comprises computing
Δ.sub.j,L,1.sup.i=v.sub.j,L,avg.sup.i−v.sub.j,L,min.sup.i
Δ.sub.j,L,2.sup.i=v.sub.j,L,max.sup.i−v.sub.j,L,avg.sup.i, wherein v.sub.j,L,min.sup.i, v.sub.j,L,max.sup.i, and v.sub.j,L,avg.sup.i denote a minimum luma value in the j-th image, a maximum luma value in the j-th image, and an average luma value in the j-th image.
10. The method of claim 7, wherein determining the noise metric P.sup.i further comprises computing
P.sup.i=max(P.sub.j.sup.i), wherein P.sub.j.sup.i denotes the percentage of edge points in the j-th normalized image in the one or more normalized images.
11. The method of claim 2, further comprising: applying the forward luma reshaping function and the forward chroma reshaping function to map the one or more input images in the first dynamic range to one or more reshaped images in a second dynamic range; and encoding the one or more reshaped images to generate a coded bitstream.
12. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for executing with one or more processors a method in accordance with claim 1.
13. An apparatus comprising a processor and configured to perform the method recited in claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0020] Designing rate-control-aware reshaping functions for coding HDR images and video content is described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
SUMMARY
[0021] Example embodiments described herein relate to designing rate-control-aware reshaping functions for the efficient coding of HDR images. In an embodiment, in an apparatus comprising one or more processors, a processor receives one or more input images in a first dynamic range (e.g., a high-dynamic range), it computes a first codeword range for luma pixels in the one or more input images, it computes a noise metric for the luma pixels in the one or more input images, it computes a scaler to adjust the first codeword range based on the first codeword range and the noise metric, and if the scaler is bigger than one, then a) it generates a second codeword range for the luma pixels in the one or more input images based on the scaler and the first codeword range, wherein the second codeword range is larger than the first codeword range, and b) it generates a forward luma reshaping function mapping luma pixel values from the first dynamic range to a second dynamic range (e.g., a standard dynamic range) based on the second codeword range; else it generates the forward luma reshaping function based on the first codeword range.
[0022] Example HDR Coding System
[0023] As described in U.S. Pat. No. 10,032,262, “Block-based content-adaptive reshaping for high dynamic range images,” by A. Kheradmand et al., to be referred to as the '262 patent, which is incorporated herein by reference in its entirety,
[0024] Under this framework, given reference HDR content (120), corresponding SDR content (134) (also to be referred as base-layer (BL) or reshaped content) is encoded and transmitted in a single layer of a coded video signal (144) by an upstream encoding device that implements the encoder-side codec architecture. The SDR content is received and decoded, in the single layer of the video signal, by a downstream decoding device that implements the decoder-side codec architecture. Backward-reshaping metadata (152) is also encoded and transmitted in the video signal with the SDR content so that HDR display devices can reconstruct HDR content based on the SDR content and the backward reshaping metadata. Without loss of generality, in some embodiments, as in non-backward-compatible systems, SDR content may not be watchable on its own, but must be watched in combination with the backward reshaping function which will generate watchable SDR or HDR content. In other embodiments which support backward compatibility, legacy SDR decoders can still playback the received SDR content without employing the backward reshaping function.
[0025] As illustrated in
[0026] Examples of backward reshaping metadata representing/specifying the optimal backward reshaping functions may include, but are not necessarily limited to only, any of: inverse tone mapping function, inverse luma mapping functions, inverse chroma mapping functions, lookup tables (LUTs), polynomials, inverse display management coefficients/parameters, etc. In various embodiments, luma backward reshaping functions and chroma backward reshaping functions may be derived/optimized jointly or separately, may be derived using a variety of techniques, for example, and without limitation, as described in the '262 patent.
[0027] The backward reshaping metadata (152), as generated by the backward reshaping function generator (150) based on the SDR images (134) and the target HDR images (120), may be multiplexed as part of the video signal 144, for example, as supplemental enhancement information (SEI) messaging.
[0028] In some embodiments, backward reshaping metadata (152) is carried in the video signal as a part of overall image metadata, which is separately carried in the video signal from the single layer in which the SDR images are encoded in the video signal. For example, the backward reshaping metadata (152) may be encoded in a component stream in the coded bitstream, which component stream may or may not be separate from the single layer (of the coded bitstream) in which the SDR images (134) are encoded.
[0029] Thus, the backward reshaping metadata (152) can be generated or pre-generated on the encoder side to take advantage of powerful computing resources and offline encoding flows (including but not limited to content adaptive multiple passes, look ahead operations, inverse luma mapping, inverse chroma mapping, CDF-based histogram approximation and/or transfer, etc.) available on the encoder side.
[0030] The encoder-side architecture of
[0031] In some embodiments, as illustrated in
[0032] Optionally, alternatively, or in addition, in the same or another embodiment, a backward reshaping block 158 extracts the backward (or forward) reshaping metadata (152) from the input video signal, constructs the backward reshaping functions based on the reshaping metadata (152), and performs backward reshaping operations on the decoded SDR images (156) based on the optimal backward reshaping functions to generate the backward reshaped images (160) (or reconstructed HDR images). In some embodiments, the backward reshaped images represent production-quality or near-production-quality HDR images that are identical to or closely/optimally approximating the reference HDR images (120). The backward reshaped images (160) may be outputted in an output HDR video signal (e.g., over an HDMI interface, over a video link, etc.) to be rendered on an HDR display device.
[0033] In some embodiments, display management operations specific to the HDR display device may be performed on the backward reshaped images (160) as a part of HDR image rendering operations that render the backward reshaped images (160) on the HDR display device.
Rate-Control-Aware Reshaping
[0034] Rate control is an integral part of any video compression pipeline. The principle behind rate control is to adjust how much a picture is quantized to achieve a target bit rate. In general, more bits per frame corresponds to a better visual quality; however, allocating more bits per frame comes at the cost of increased bandwidth. Rate control tries to find a balance between a target bit rate and acceptable quality.
[0035] For example, under most rate-control schemes, pictures with complex textures are deemed visually significant and are allocated more bits during quantization; however, noisy images with perceptually irrelevant content may also exhibit complex textures and may end up being allocated more bits than necessary. Allocating more bits to perceptually irrelevant content means allocating fewer bits to real content, which is highly inefficient and may even result in lower overall visual quality.
[0036] As depicted in
Image Reshaping
[0037] During reshaping, each channel or color component of an HDR frame is mapped, separately, to the base layer (134). Mapping of the HDR channels to the base layer is commonly referred to as forward reshaping. For example, the luma channel in HDR is mapped to the base layer luma channel using a luma forward reshaping curve. The chroma channels in the HDR are separately mapped to the base layer chroma channels using their respective forward chroma reshaping curves. To achieve the highest visual quality, the base layer is designed to occupy most of the base layer codeword range; however, under certain conditions (e.g., for noisy HDR content with a small codeword range), there is no need to span the entire base layer codeword range, as doing so hardly improves the visual quality. Instead, the bit rate can be lowered by restricting such small-codeword-range HDR content to a smaller codeword range in the base layer.
[0038] Let the luma channel forward reshaping function be denoted by ƒ.sub.L, such that it maps HDR luma values v.sub.L to base layer luma values s.sub.L, i.e., s.sub.L=ƒ.sub.L(v.sub.L). In an embodiment, without limitation, the function may be monotonically non-decreasing.
[0039] As used herein, the terms “scene” or “group of pictures” refer to a set of consecutive frames in a video sequence with similar color or dynamic range characteristics. While example embodiments may refer to a scene, a forward reshaping function and/or a corresponding backward reshaping function as described herein may be constructed with similar methods for one of: a single image, a single group of pictures, a scene, a time window within a single scene or a single media program, etc.
[0040] Let the minimum and maximum HDR luma values in the i-th scene be given by v.sub.L,min.sup.i and v.sub.L,max.sup.i respectively. To generate a base layer of bit depth B.sub.s, the function ƒ.sub.L typically tries to map the minimum HDR luma value v.sub.L,min.sup.i to zero (or another legal minimum value) and the maximum HDR luma value v.sub.L,max.sup.i to the largest base layer codeword 2.sup.B.sup.s−1, that is:
ƒ.sub.L(v.sub.L,min.sup.i)=s.sub.L,min.sup.i=0
ƒ.sub.L(v.sub.L,max.sup.i)=s.sub.L,max.sup.i=2.sup.B.sup.s−1.
The HDR codewords in the range [v.sub.L,min.sup.i, v.sub.L,max.sup.i] are mapped to base layer luma codewords in the range [0, 2.sup.B.sup.s−1]. The range of the base layer luma codewords is then
r.sub.L.sup.i=s.sub.L,max.sup.i−s.sub.L,min.sup.i=2.sup.B.sup.s−1.
[0041] Irrespective of the range of the HDR codewords, the base layer typically spans the entire allowed codeword range.
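The endpoint mapping above can be sketched in a few lines. A linear curve is assumed here purely for illustration; the text only requires ƒ.sub.L to be monotonically non-decreasing, and the function name and 10-bit base layer default are choices of this sketch, not the patent's.

```python
def forward_luma_linear(v, v_min, v_max, bl_bits=10):
    """Toy linear forward luma reshaping: maps HDR luma codewords in
    [v_min, v_max] onto the full base-layer range [0, 2^bl_bits - 1],
    so f(v_min) = 0 and f(v_max) = 2^bl_bits - 1."""
    s_peak = (1 << bl_bits) - 1
    v = min(max(v, v_min), v_max)  # clip to the HDR scene range
    return round((v - v_min) * s_peak / (v_max - v_min))
```

For a scene with v_min=1000 and v_max=3000, the extremes map to 0 and 1023 and every interior codeword lands strictly in between, matching the endpoint conditions above.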
[0042] For chroma channels, the mapping is slightly different. Let ƒ.sub.C0 and ƒ.sub.C1 be the functions for forward mapping of the C0 and C1 channels respectively (e.g., Cb and Cr in a YCbCr representation). These functions should also be monotonically non-decreasing. In an embodiment, the range of base layer chroma codewords r.sub.Cx.sup.i is calculated using the following logic. As the procedure is the same for both C0 and C1, the chroma-related subscripts are replaced by Cx.
where K.sub.C0 and K.sub.C1 are constants smaller or equal to 1.0 and depend on the color space (e.g., set to default values of 0.5 for the IPTPQc2 color space, a variant of the ICtCp color space). The chroma forward reshaping functions ƒ.sub.C0 and ƒ.sub.C1 are independent of luma, but the range of the chroma codewords is still governed by the minimum and maximum luma values. After the codeword ranges are computed, the minimum and maximum of the base layer Cx channel codewords is,
The HDR Cx channel codewords in the range [v.sub.Cx,min.sup.i, v.sub.Cx,max.sup.i] are mapped to the base layer Cx channel codewords range [s.sub.Cx,min.sup.i, s.sub.Cx,max.sup.i]. In an embodiment, the function mapping HDR Cx channel codewords to the base layer codewords may be given by:
where the symbols v.sub.Cx.sup.i and s.sub.Cx.sup.i, respectively, represent any HDR or base layer codeword value.
Codeword-Range Adjustment for Improved Rate Control
[0043] Reshaping functions map HDR codewords to a range of base layer codewords. A wider range of codewords in the base layer helps maintain high visual quality; however, in certain cases (e.g., for a noisy HDR signal with a narrow codeword range), the number of bits required to compress such a base layer far outweighs the visual quality improvement. A rate-control-aware reshaping scheme should help an encoder intelligently allocate fewer bits to such HDR content, while regular HDR content passes through unchanged.
[0044] To design such a scheme, one must first detect HDR frames with such features and then adjust (e.g., reduce) the base layer codeword range. Reducing the base layer codeword range decreases the residuals during motion estimation. Smaller residuals reduce the magnitude of the DCT coefficients, thus requiring fewer bits after entropy coding.
[0045] In an embodiment, one way to decrease the base layer codeword range is by adjusting the perceived HDR codeword range, e.g., by artificially decreasing the minimum HDR value and/or increasing the maximum HDR value for the luma channel. Note that there is no change in the actual HDR image. One simply adjusts the computed minimum and maximum luma HDR values so that the forward luma reshaping curve is altered. These updated (adjusted or virtual) values are represented by {tilde over (v)}.sub.L,min.sup.i and {tilde over (v)}.sub.L,max.sup.i respectively, such that,
{tilde over (v)}.sub.L,min.sup.i≤v.sub.L,min.sup.i and {tilde over (v)}.sub.L,max.sup.i≥v.sub.L,max.sup.i. (6)
[0046] For the luma channel, the updated forward reshaping function {tilde over (ƒ)}.sub.L will map the virtual minimum and maximum values {tilde over (v)}.sub.L,min.sup.i and {tilde over (v)}.sub.L,max.sup.i to the base layer as:
{tilde over (ƒ)}.sub.L({tilde over (v)}.sub.L,min.sup.i)={tilde over (s)}.sub.L,min.sup.i=0.
{tilde over (ƒ)}.sub.L({tilde over (v)}.sub.L,max.sup.i)={tilde over (s)}.sub.L,max.sup.i=2.sup.B.sup.s−1.
[0047] As {tilde over (ƒ)}.sub.L is a monotonically non-decreasing function, the actual minimum and maximum values, i.e. v.sub.L,min.sup.i and v.sub.L,max.sup.i, will map to different base layer codewords:
{tilde over (ƒ)}.sub.L(v.sub.L,min.sup.i)≥0,
{tilde over (ƒ)}.sub.L(v.sub.L,max.sup.i)≤2.sup.B.sup.s−1.
With this updated luma forward reshaping curve, the mapped base layer luma codeword range will shrink, {tilde over (r)}.sub.L.sup.i≤r.sub.L.sup.i, as:
{tilde over (r)}.sub.L.sup.i={tilde over (ƒ)}.sub.L(v.sub.L,max.sup.i)−{tilde over (ƒ)}.sub.L(v.sub.L,min.sup.i)≤2.sup.B.sup.s−1=r.sub.L.sup.i.
In summary, decreasing the minimum and/or increasing the maximum HDR luma values will reduce the base layer luma codeword range and allow for more efficient rate control during the subsequent compression step.
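The shrinking effect can be sketched numerically. A linear reshaping curve over the widened (virtual) range is assumed here for illustration only; the function name and 10-bit base layer are conveniences of this sketch.

```python
def bl_luma_range_after_widening(v_min, v_max, vt_min, vt_max, bl_bits=10):
    """Widening the HDR extremes shrinks the mapped base-layer range:
    a curve built over the virtual range [vt_min, vt_max] maps the
    actual extremes v_min / v_max strictly inside [0, 2^bl_bits - 1].
    Returns the resulting base-layer luma range."""
    s_peak = (1 << bl_bits) - 1

    def f(v):  # illustrative linear reshaping curve
        return (v - vt_min) * s_peak / (vt_max - vt_min)

    return f(v_max) - f(v_min)  # shrunken BL luma range, <= s_peak
```

For example, with actual extremes [1000, 3000] and virtual extremes [500, 3500], the mapped base-layer luma range drops from 1023 to 682 codewords.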
[0048] For the chroma channels, the range of base layer codewords is dependent on the luma HDR minimum and maximum values. With the updated (virtual) values, the chroma base layer codeword ranges will also change as follows:
As the updated denominator {tilde over (v)}.sub.L,max.sup.i−{tilde over (v)}.sub.L,min.sup.i is larger, the codeword range reduces, i.e., {tilde over (r)}.sub.Cx.sup.i≤r.sub.Cx.sup.i. Note that the numerator of equation (10) is the same as before, as {tilde over (s)}.sub.L,max.sup.i−{tilde over (s)}.sub.L,min.sup.i=s.sub.L,max.sup.i−s.sub.L,min.sup.i. This will change the chroma base layer range [{tilde over (s)}.sub.Cx,min.sup.i, {tilde over (s)}.sub.Cx,max.sup.i] as follows:
[0049] By generating virtual minimum and maximum HDR luma values, the range of luma and chroma base layer codewords can be made smaller than before. This change can help to reduce the bit rate. The amount by which these values should be changed is controlled by the strength of a base layer codeword range reduction parameter, to be denoted as M. Details on how to derive M are provided later on.
[0050] As discussed, by adjusting the minimum luma HDR value v.sub.L,min.sup.i and the maximum luma HDR value v.sub.L,max.sup.i one may reduce the codeword range of the base layer, thus effectively reducing the bit rate. As depicted in
Δ.sub.L,1.sup.i=v.sub.L,avg.sup.i−v.sub.L,min.sup.i,
Δ.sub.L,2.sup.i=v.sub.L,max.sup.i−v.sub.L,avg.sup.i, (12)
where, v.sub.L,min.sup.i, v.sub.L,avg.sup.i and v.sub.L,max.sup.i denote the minimum, average and maximum HDR luma value values for scene i. The difference between the minimum and maximum HDR luma values, denoted as Δ.sub.L.sup.i for scene i, can be derived as
Δ.sub.L.sup.i=v.sub.L,max.sup.i−v.sub.L,min.sup.i=(v.sub.L,max.sup.i−v.sub.L,avg.sup.i)+(v.sub.L,avg.sup.i−v.sub.L,min.sup.i)=Δ.sub.L,1.sup.i+Δ.sub.L,2.sup.i. (13)
[0051] As an example, Table 1 shows in pseudo code an example algorithm for computing Δ.sub.L.sup.i. A luma channel frame j in scene i is denoted by F.sub.j,L.sup.i with width W.sub.L and height H.sub.L. A specific pixel in this frame is represented by F.sub.j,L.sup.i(m,n), where (m, n) is the location of the pixel. Let T be the total number of frames in the scene. Moreover, let v.sub.j,L,min.sup.i, v.sub.j,L,avg.sup.i and v.sub.j,L,max.sup.i be the minimum, average and maximum HDR luma values for frame j luma channel.
TABLE 1 Example pseudo code for computing Δ.sub.L.sup.i

// Compute frame minimum, average and maximum
For j = 0 → T − 1 {
    v.sub.j,L,min.sup.i = F.sub.j,L.sup.i(0,0)
    v.sub.j,L,max.sup.i = F.sub.j,L.sup.i(0,0)
    v.sub.j,L,sum.sup.i = 0
    For m = 0 → H.sub.L − 1
        For n = 0 → W.sub.L − 1 {
            v.sub.j,L,sum.sup.i = v.sub.j,L,sum.sup.i + F.sub.j,L.sup.i(m,n)
            If v.sub.j,L,min.sup.i > F.sub.j,L.sup.i(m,n)
                v.sub.j,L,min.sup.i = F.sub.j,L.sup.i(m,n)
            If v.sub.j,L,max.sup.i < F.sub.j,L.sup.i(m,n)
                v.sub.j,L,max.sup.i = F.sub.j,L.sup.i(m,n)
        }
    v.sub.j,L,avg.sup.i = v.sub.j,L,sum.sup.i/(W.sub.L × H.sub.L)
}
// Compute scene minimum, average and maximum
v.sub.L,min.sup.i = v.sub.0,L,min.sup.i
v.sub.L,max.sup.i = v.sub.0,L,max.sup.i
v.sub.L,sum.sup.i = 0
For j = 0 → T − 1 {
    v.sub.L,sum.sup.i = v.sub.L,sum.sup.i + v.sub.j,L,avg.sup.i
    If v.sub.L,min.sup.i > v.sub.j,L,min.sup.i
        v.sub.L,min.sup.i = v.sub.j,L,min.sup.i
    If v.sub.L,max.sup.i < v.sub.j,L,max.sup.i
        v.sub.L,max.sup.i = v.sub.j,L,max.sup.i
}
v.sub.L,avg.sup.i = v.sub.L,sum.sup.i/T
Δ.sub.L.sup.i = v.sub.L,max.sup.i − v.sub.L,min.sup.i
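The Table 1 pseudo code can be rendered in Python as follows. Frames are assumed to share the same dimensions within a scene, and the scene average is taken as the mean of the per-frame averages; the function name is a convenience of this sketch.

```python
def scene_luma_stats(frames):
    """Per-scene luma statistics in the spirit of Table 1.
    `frames` is a list of 2D lists (rows of luma codewords) for one
    scene. Returns (v_min, v_avg, v_max, delta_L), where delta_L is
    the scene codeword range v_max - v_min."""
    v_min = v_max = frames[0][0][0]
    avg_sum = 0.0
    for frame in frames:
        f_sum = 0
        n_px = 0
        for row in frame:
            for px in row:
                f_sum += px
                n_px += 1
                if px < v_min:
                    v_min = px
                if px > v_max:
                    v_max = px
        avg_sum += f_sum / n_px      # per-frame average
    v_avg = avg_sum / len(frames)    # scene average over T frames
    return v_min, v_avg, v_max, v_max - v_min
```

A two-frame, 2x2 scene with codewords 1..8 yields (1, 4.5, 8, 7).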
Computing Scaler M
[0052] In an embodiment, the value of the base layer codeword range reduction parameter M is greater than or equal to 1. When M=1, rate-control-aware reshaping is not performed. For M>1, the minimum and maximum HDR luma values v.sub.L,min.sup.i and v.sub.L,max.sup.i are translated to the virtual values {tilde over (v)}.sub.L,min.sup.i and {tilde over (v)}.sub.L,max.sup.i as:
{tilde over (v)}.sub.L,min.sup.i=max(0,v.sub.L,avg.sup.i−M×Δ.sub.L,1.sup.i), (14a)
{tilde over (v)}.sub.L,max.sup.i=min(2.sup.B.sup.v−1,v.sub.L,avg.sup.i+M×Δ.sub.L,2.sup.i), (14b)
In general, if M>1, as depicted in
[0053] As depicted in equation (14), in a typical embodiment, M may be used to adjust both v.sub.L,min.sup.i and v.sub.L,max.sup.i. In another embodiment, it may be desired to adjust only one of the two boundary values (e.g., to better preserve the highlights or the darks). Thus, one may leave v.sub.L,min.sup.i unchanged and only adjust v.sub.L,max.sup.i by using equation (14b), or one may leave v.sub.L,max.sup.i unchanged and adjust v.sub.L,min.sup.i by using equation (14a).
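Equations (14a) and (14b) can be sketched as follows; the 16-bit HDR default and the function name are assumptions of this sketch.

```python
def virtual_luma_extremes(v_min, v_avg, v_max, M, hdr_bits=16):
    """Equations (14a)/(14b): widen the HDR luma extremes around the
    scene average by the scaler M (M >= 1), clipped to the legal HDR
    codeword range [0, 2^hdr_bits - 1]."""
    peak = (1 << hdr_bits) - 1
    d1 = v_avg - v_min                  # Delta_L,1
    d2 = v_max - v_avg                  # Delta_L,2
    vt_min = max(0, v_avg - M * d1)     # eq. (14a)
    vt_max = min(peak, v_avg + M * d2)  # eq. (14b)
    return vt_min, vt_max
```

With M=1 the extremes are unchanged, and with M=2 both deltas around the average are doubled (subject to clipping), matching the behavior described above.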
[0055] In equation (14), one adjusts v.sub.L,min.sup.i and v.sub.L,max.sup.i by multiplying Δ.sub.L,1.sup.i and Δ.sub.L,2.sup.i by M, and these changes implicitly affect the SDR codeword range. Note that the exact same effect may be accomplished by directly narrowing the original SDR codeword range. For example, let
{tilde over (ƒ)}.sub.L(v.sub.L,avg.sup.i)=s.sub.L,avg.sup.i. (15)
Then, for an original SDR range [0, 2.sup.B.sup.s−1], the narrowed SDR range can be computed directly as:
{tilde over (s)}.sub.L,min.sup.i=s.sub.L,avg.sup.i−s.sub.L,avg.sup.i/M,
{tilde over (s)}.sub.L,max.sup.i=s.sub.L,avg.sup.i+(2.sup.B.sup.s−1−s.sub.L,avg.sup.i)/M, (16)
and
{tilde over (r)}.sub.L.sup.i=({tilde over (s)}.sub.L,max.sup.i−{tilde over (s)}.sub.L,min.sup.i)=(2.sup.B.sup.s−1)/M≤r.sub.L.sup.i.
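The direct SDR-range narrowing of equations (15)-(16) can be sketched as follows; the function name and 10-bit base layer are assumptions of this sketch.

```python
def narrowed_sdr_range(s_avg, M, bl_bits=10):
    """Equation (16): narrow the SDR (base-layer) range directly
    around the mapped scene average s_avg by the scaler M.
    Returns (s_min_new, s_max_new); the resulting span equals
    (2^bl_bits - 1) / M."""
    s_peak = (1 << bl_bits) - 1
    s_min_new = s_avg - s_avg / M
    s_max_new = s_avg + (s_peak - s_avg) / M
    return s_min_new, s_max_new
```

For s_avg=512 and M=2 the 10-bit range [0, 1023] narrows to [256, 767.5], i.e., exactly half the original span, consistent with the algebra above.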
Criteria for Adjusting the HDR Codeword Range
[0056] Generating virtual HDR luma values ensures that the base layer codeword ranges for the luma and chroma channels are reduced. This reduction results in a lower bit rate for encoding such a base layer; however, such an adjustment needs to be done only for select scenes. In an embodiment, without limitation, it is suggested that scenes having a small HDR codeword range and high-frequency, noise-like content should be mapped to a smaller SDR codeword range. In such a case, a three-step algorithm is proposed:
[0057] 1. Compute features for identifying noisy HDR content with a relatively-small codeword range
[0058] 2. Evaluate the strength of the base layer codeword range reduction from the feature values
[0059] 3. Construct modified forward reshaping curves and generate the base layer
[0060] As before, discussions may refer to a scene; however, a scene can be any group of contiguous frames or even a single frame. Given the minimum, average, and maximum luminance values in a scene, in an embodiment, Δ.sub.L.sup.i (see equation (13)), may be used to identify whether the input HDR content has a small codeword range or not. For example, if Δ.sub.L.sup.i is smaller than a certain threshold (e.g., Th.sub.HDR) then the scene can be classified as a scene with small HDR codeword range.
[0061] As known by a person skilled in the art, there exist a variety of algorithms to classify regions of a picture or a frame as “noisy.” For example, one may compute block-based statistics, such as the mean, variance, and/or standard deviation, and use these statistical data for classification. In an embodiment, without limitation, a metric of noisiness in the picture is derived based on the number of pixels in a frame classified as “edges.” Details are provided next.
[0062] In order to detect noisy content, one may start by computing the total number of edge points in each luma and chroma channel of a frame. Let I.sub.j,L.sup.i, I.sub.j,C0.sup.i and I.sub.j,C1.sup.i denote the normalized luma and chroma channels of frame j in scene i, i.e., the normalized versions of F.sub.j,L.sup.i, F.sub.j,C0.sup.i, and F.sub.j,C1.sup.i. Every pixel value in a normalized image is in the range [0, 1). Let B.sub.v be the bit-depth of the source HDR:
I.sub.j,L.sup.i(m,n)=F.sub.j,L.sup.i(m,n)/2.sup.B.sup.v,
I.sub.j,Cx.sup.i(m,n)=F.sub.j,Cx.sup.i(m,n)/2.sup.B.sup.v. (18)
[0063] Let the height and width of the luma channel image be H.sub.L, W.sub.L respectively, and let H.sub.Cx, W.sub.Cx denote the height and width of chroma channels. In an embodiment, one may apply any known in the art edge-detection techniques (e.g., the Sobel operator and the like) to identify edges in the luma channel. For example, in an embodiment, the Sobel operators are given by
Kernels Ψ.sub.1 and Ψ.sub.2 are used to evaluate horizontal and vertical gradient images G.sub.1,j,L.sup.i and G.sub.2,j,L.sup.i respectively, for the luma channel
G.sub.1,j,L.sup.i=Ψ.sub.1.Math.I.sub.j,L.sup.i and G.sub.2,j,L.sup.i=Ψ.sub.2.Math.I.sub.j,L.sup.i,
G.sub.j,L.sup.i=((G.sub.1,j,L.sup.i).sup.2+(G.sub.2,j,L.sup.i).sup.2).sup.1/2, (19)
where the symbol .Math. denotes a 2D convolution operator and G.sub.j,L.sup.i is the luma gradient magnitude image. The gradient images for chroma channels are also evaluated.
G.sub.1,j,Cx.sup.i=Ψ.sub.1.Math.I.sub.j,Cx.sup.i and G.sub.2,j,Cx.sup.i=Ψ.sub.2.Math.I.sub.j,Cx.sup.i,
G.sub.j,Cx.sup.i=((G.sub.1,j,Cx.sup.i).sup.2+(G.sub.2,j,Cx.sup.i).sup.2).sup.1/2. (20)
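Equations (19)-(20) can be sketched as follows. The standard 3×3 Sobel kernels are assumed here for Ψ.sub.1 and Ψ.sub.2, since the kernel definitions themselves are not reproduced in the text above, and the 2D convolution is written out with plain NumPy.

```python
import numpy as np

# Standard 3x3 Sobel kernels, assumed for Psi_1 (horizontal gradient)
# and Psi_2 (vertical gradient).
PSI_1 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
PSI_2 = PSI_1.T

def gradient_magnitude(img):
    """Equation (19): G = sqrt(G1^2 + G2^2), where G1 and G2 are the
    2D convolutions of a normalized image with the two Sobel kernels.
    Only the 'valid' region is kept, so the output is 2 pixels smaller
    in each dimension."""
    h, w = img.shape
    g1 = np.zeros((h - 2, w - 2))
    g2 = np.zeros_like(g1)
    for di in range(3):
        for dj in range(3):
            patch = img[di:di + h - 2, dj:dj + w - 2]
            # true convolution flips the kernel indices
            g1 += PSI_1[2 - di, 2 - dj] * patch
            g2 += PSI_2[2 - di, 2 - dj] * patch
    return np.sqrt(g1 ** 2 + g2 ** 2)
```

A constant image yields zero gradient everywhere, while a vertical step edge produces a strong response along the edge, which is the behavior the edge-point count relies on.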
[0064] After the gradient image is computed, the pixels having a gradient magnitude above a threshold Th.sub.j are designated as edge points. Choosing a fixed threshold may not be very reliable; thus, in an embodiment, an adaptive threshold may be preferable, computed as follows:
where the value of the threshold is within the range [0.001, 1]. The constant 0.001 puts a lower bound on the threshold value, so that it does not reduce to zero. Quantities Δ.sub.j,L,1.sup.i and Δ.sub.j,L,2.sup.i are intensity differences and are roughly analogous to gradients. Taking the minimum of
gives an estimate of the range of normalized pixel-wise differences within the image and provides a suitable threshold. Note that the value of the threshold is determined by the luma values only; however, the same threshold is applied to both the luma and chroma channel gradient images. Even though the threshold may change for each frame, it does not have a direct impact on the temporal consistency of the entire algorithm.
[0065] For noisy images with small codeword range, Δ.sub.j,L,1.sup.i or Δ.sub.j,L,2.sup.i are small, so the threshold Th.sub.j is lower and more points are detected as edge points. On the contrary, normal images have higher values for Δ.sub.j,L,1.sup.i or Δ.sub.j,L,2.sup.i, which increases the threshold and reduces the number of detected edge points.
[0066] Denote the percentage of edge pixels in the luma or chroma channels by P.sub.j,L.sup.i, P.sub.j,C0.sup.i and P.sub.j,C1.sup.i respectively. For each pixel in the gradient image, if the value is greater than or equal to the threshold Th.sub.j, it is regarded as an edge pixel. Suppose Ξ is the indicator function (equal to one when its argument is true and zero otherwise),
Let P.sub.j.sup.i be the maximum value among all the three channels of that frame and P.sup.i be the maximum of all the frames in the scene.
P.sub.j.sup.i=max(P.sub.j,L.sup.i,P.sub.j,C0.sup.i,P.sub.j,C1.sup.i),
P.sup.i=max(P.sub.j.sup.i),j=0,1,2, . . . T−1. (23)
[0067] Table 2 provides an example algorithm to compute the edge-point percentage in a frame using equations (19) to (23). In general, the percentage of edge pixels is a relevant feature for detecting high frequency noise. Noisy images have textures with pixels having drastically different intensities juxtaposed together. Gradients are generally high in those regions and density of edge pixels is high. On the contrary, smooth images have fewer edge pixels and lower percentage of edge pixels. Thus, one may combine the HDR luma intensity codeword range and edge point percentage features to compute the strength of base layer codeword range reduction M.
TABLE 2
Example algorithm to compute edge-point percentage in a frame
// Compute threshold for each frame
For j = 0 → T − 1 {
 Δ.sub.j,L,1.sup.i = v.sub.j,L,avg.sup.i − v.sub.j,L,min.sup.i
 Δ.sub.j,L,2.sup.i = v.sub.j,L,max.sup.i − v.sub.j,L,avg.sup.i
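The edge-point computation above can be sketched as follows. This is a minimal, numpy-only illustration: the gradient operators follow the Sobel operators mentioned in EEE 10, but the per-frame threshold Th.sub.j is taken as an input parameter rather than reproduced, and all function names are illustrative, not part of the described method.

```python
import numpy as np

def sobel_gradient_magnitude(channel):
    """Approximate gradient magnitude of a 2-D channel using 3x3 Sobel
    kernels, with the |Gx| + |Gy| magnitude approximation."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    padded = np.pad(channel.astype(np.float64), 1, mode="edge")
    h, w = channel.shape
    gx = np.zeros((h, w), dtype=np.float64)
    gy = np.zeros((h, w), dtype=np.float64)
    # Correlate with the two kernels by summing shifted, weighted windows.
    for dy in range(3):
        for dx in range(3):
            win = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * win
            gy += ky[dy, dx] * win
    return np.abs(gx) + np.abs(gy)

def edge_point_percentage(channel, threshold):
    """Percentage of pixels whose gradient magnitude meets or exceeds the
    per-frame threshold (pixels counted via the indicator grad >= Th)."""
    grad = sobel_gradient_magnitude(channel)
    return 100.0 * np.count_nonzero(grad >= threshold) / grad.size
```

Per equation (23), the per-frame value would then be maximized over the three channels and over all frames of the scene to obtain P.sup.i.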
Example of an M-Adaptation Scheme
[0068] As described earlier, one may use hard thresholds (e.g., Th.sub.HDR) on the feature values to determine if a certain HDR scene is noisy or has a small codeword range. One potential problem with hard thresholds is temporal inconsistency. For example, there can be a scenario where scene characteristics trigger rate-control-aware reshaping; however, the very next scene is processed with normal reshaping. This scenario will create temporal inconsistencies and potential visual quality problems, especially when each scene has only one frame. To avoid these problems, a novel technique without such hard thresholds is proposed.
[0069] In an embodiment, instead of a binary (e.g., yes/no) classification of a scene as noisy with a small codeword range, one may use the feature values to modulate the strength of the base layer range reduction M. If the feature values strongly indicate that rate-control-aware reshaping is needed, then a larger M value is applied. On the contrary, if feature values are weak indicators that rate-control-aware reshaping is needed, then a smaller M is employed. By applying the feature values to compute M, there is no need to explicitly classify each scene as noisy with a small codeword range. Details of the proposed approach are explained next.
[0070] Let δ and β denote parameters that control the strength of base layer range reduction M. These parameters depend on the feature values. The normalized HDR luma range δ is a fraction in the range [0, 1) and, in an embodiment, its value may depend on the range of HDR luma codewords in the scene, i.e., Δ.sub.L.sup.i, as:
δ=Δ.sub.L.sup.i/2.sup.B.sup.v.  (24)
[0071] In an embodiment, the rational number β is computed based on the edge-point percentage of that scene, i.e., P.sup.i, as:
β=max(1/P.sup.i,P.sup.i).  (25)
For P.sup.i=0, β=1. The value of β varies in [1.0, 100.0] as P.sup.i∈(0, 100.0]. The reason for choosing max(1/P.sup.i, P.sup.i) to calculate β is the following: the feature value P.sup.i is large for noisy scenes, while 1/P.sup.i is large for mostly flat scenes (those with little spatial variation). In both cases, a higher β, and thus stronger base layer codeword range reduction, is desired.
[0072] Given these two parameters, in an embodiment, the strength of base layer codeword range reduction M may be computed using an exponential mapping, as in
M=max(βe.sup.−αδ,1.0). (26)
[0073] The curve βe.sup.−αδ is monotonically decreasing. Let C (e.g., C=0.15) denote a constant representing the normalized cutoff HDR luma codeword range. For example, let δ≥C denote the range where rate-control-aware reshaping should not be triggered; in other words, M=1 when δ=C and thereafter. Substituting the values δ=C and M=1 in equation (26), one can compute the value of α as
M=1=βe.sup.−αC,
which implies
[0074] α=ln(β)/C.  (27)
[0075]
[0076] In Table 3, following the "if M>1" branch, instead of increasing the HDR codeword range, in an alternative embodiment one could also directly decrease the SDR codeword range, as described earlier (see equations (15)-(17)).
TABLE 3
Example M-adaptation algorithm
// Compute parameters δ and β from feature values
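The M-adaptation described above can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the function names are hypothetical, the default bit depth B.sub.v=16 and cutoff C=0.15 merely follow the examples in the text, and the range-widening helper mirrors the symmetric adjustment about the average luma value given later in EEE 8.

```python
import math

def compute_M(delta_L, P, B_v=16, C=0.15):
    """Strength of base-layer codeword-range reduction M, per the exponential
    mapping of equation (26), with delta and beta per equations (24)-(25).

    delta_L : HDR luma codeword range of the scene (v_max - v_min)
    P       : edge-point percentage of the scene, in [0, 100]
    B_v     : bit depth of the HDR input (assumed 16 here)
    C       : normalized cutoff range; for delta >= C, M = 1
    """
    delta = delta_L / (2 ** B_v)               # eq. (24): normalized range in [0, 1)
    beta = 1.0 if P == 0 else max(1.0 / P, P)  # eq. (25): large for noisy OR flat scenes
    alpha = math.log(beta) / C                 # eq. (27): enforces M = 1 at delta = C
    return max(beta * math.exp(-alpha * delta), 1.0)

def adjust_range(v_min, v_max, v_avg, M, B_v=16):
    """If M > 1, widen the HDR luma range about the average value
    (cf. EEE 8); otherwise leave the range unchanged."""
    if M <= 1.0:
        return v_min, v_max
    v_min_new = max(0, round(v_avg - M * (v_avg - v_min)))
    v_max_new = min(2 ** B_v - 1, round(v_avg + M * (v_max - v_avg)))
    return v_min_new, v_max_new
```

Because α=ln(β)/C, the mapping reduces to M=max(β.sup.1−δ/C, 1.0): scenes with a wide normalized range (δ≥C) always get M=1, while narrow-range scenes that are either very noisy or very flat get M growing with β, with no hard yes/no classification.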
Modifying the Forward Reshaping Mapping
[0077] Given the adjusted minimum and maximum HDR luma values {tilde over (v)}.sub.L,min.sup.i and {tilde over (v)}.sub.L,max.sup.i, the luma and chroma forward reshaping curves are computed using techniques known in the art (e.g., as in the '262 patent), using the virtual {tilde over (v)}.sub.L,min.sup.i and {tilde over (v)}.sub.L,max.sup.i as the minimum and maximum HDR luma values. In an embodiment, values outside the index range [v.sub.L,min.sup.i, v.sub.L,max.sup.i] are extrapolated by copying from the closest valid entry. In other words, all the entries in the index range [{tilde over (v)}.sub.L,min.sup.i, v.sub.L,min.sup.i) get the same value as the value at index v.sub.L,min.sup.i. Similarly, the codewords in the index range (v.sub.L,max.sup.i, {tilde over (v)}.sub.L,max.sup.i] get the same value as the value at index v.sub.L,max.sup.i. The unused codewords are assigned using the power curve that spans the same codeword range [{tilde over (v)}.sub.L,min.sup.i, {tilde over (v)}.sub.L,max.sup.i]. As a result, the forward LUT also maps the [{tilde over (v)}.sub.L,min.sup.i, {tilde over (v)}.sub.L,max.sup.i] HDR luma codeword range to the full SDR codeword range.
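The nearest-entry extrapolation above can be sketched as follows. This is a minimal numpy illustration under the assumption that the forward LUT is stored as a flat array over the full HDR codeword domain; the function name is hypothetical, and the power-curve assignment of unused codewords is not reproduced here.

```python
import numpy as np

def extend_flut(flut, v_min, v_max, v_min_new, v_max_new):
    """Extend a forward-reshaping LUT defined on [v_min, v_max] to a widened
    range [v_min_new, v_max_new] by copying the closest valid entry.

    flut : 1-D array covering the full HDR codeword domain (e.g. 2^B_v entries)
    """
    out = flut.copy()
    # Entries in [v_min_new, v_min) copy the value at index v_min.
    out[v_min_new:v_min] = flut[v_min]
    # Entries in (v_max, v_max_new] copy the value at index v_max.
    out[v_max + 1:v_max_new + 1] = flut[v_max]
    return out
```

With this flat-copy extension, the reshaped output stays constant over the widened tails, so the LUT remains monotone and still covers the adjusted HDR range.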
[0078]
[0079]
[0080] The discussion herein assumed out-of-loop reshaping, wherein forward and backward reshaping is performed outside of compression and decompression; however, similar techniques for HDR codeword-range adaptation may also be applicable to in-loop reshaping schemes, such as those presented in PCT Application Ser. No. PCT/US2019/017891, “Image reshaping in video coding using rate distortion optimization,” by P. Yin et al., filed on Feb. 13, 2019, which is incorporated herein by reference.
[0081] Example Computer System Implementation
[0082] Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control or execute instructions relating to generating rate-control-aware reshaping functions, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to rate-control-aware reshaping functions as described herein. The image and video dynamic range extension embodiments may be implemented in hardware, software, firmware and various combinations thereof.
[0083] Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the invention. For example, one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement methods for rate-control-aware reshaping functions as described above by executing software instructions in a program memory accessible to the processors. The invention may also be provided in the form of a program product. The program product may comprise any non-transitory and tangible medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of the invention. Program products according to the invention may be in any of a wide variety of non-transitory and tangible forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes, hard disk drives, optical data storage media including CD ROMs, DVDs, electronic data storage media including ROMs, flash RAM, or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.
[0084] Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a “means”) should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
EQUIVALENTS, EXTENSIONS, ALTERNATIVES AND MISCELLANEOUS
[0085] Example embodiments that relate to rate-control-aware reshaping functions for HDR images are thus described. In the foregoing specification, embodiments of the present invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and what is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
[0086] Various aspects of the present invention may be appreciated from the following enumerated example embodiments (EEEs):
EEE 1. A method for generating a reshaping function using one or more processors, the method comprising:
[0087] receiving one or more input images (120) in a first dynamic range;
[0088] computing (605) a first codeword range for luma pixels in the one or more input images;
[0089] computing (610) a noise metric for the luma pixels in the one or more input images;
[0090] computing (615) a scaler to adjust the first codeword range based on the first codeword range and the noise metric; and [0091] if the scaler is bigger than one: [0092] generating (620) a second codeword range for the luma pixels in the one or more input images based on the scaler and the first codeword range, wherein the second codeword range is larger than the first codeword range; and [0093] generating (630) a forward luma reshaping function mapping luminance pixel values from the first dynamic range to a second dynamic range based on the second codeword range; [0094] else [0095] generating (625) the forward luma reshaping function based on the first codeword range.
EEE 2. The method of EEE 1, further comprising:
[0096] if the scaler is bigger than one: [0097] generating a forward chroma reshaping function mapping chroma pixel values from the first dynamic range to the second dynamic range based on the second codeword range;
[0098] else [0099] generating the forward chroma reshaping function based on the first codeword range.
EEE 3. The method of EEE 2, further comprising:
[0100] applying the forward luma reshaping function and the forward chroma reshaping function to map the one or more input images in the first dynamic range to one or more reshaped images in the second dynamic range; and
[0101] encoding the one or more reshaped images to generate a coded bitstream.
EEE 4. The method of any of EEEs 1-3, where computing the first codeword range comprises
[0102] computing a first minimum luma pixel value and a first maximum luma pixel value in the one or more input images in the first dynamic range.
EEE 5. The method of EEE 4, wherein computing the scaler comprises computing an exponential mapping based on the first codeword range and the noise metric.
EEE 6. The method of EEE 5, wherein computing the scaler (M) comprises computing
M=max(βe.sup.−αδ,1.0),
wherein δ denotes a function of the first codeword range, β denotes a function of the noise metric, and α is a function of the noise metric and a cut-off parameter C for which if δ≥C, then M=1.
EEE 7. The method of EEE 6, wherein
δ=Δ.sub.L.sup.i/2.sup.B.sup.v,
wherein B.sub.v denotes bit-depth resolution of the one or more input images, and Δ.sub.L.sup.i denotes a difference of the first minimum luma pixel value from the first maximum luma pixel value,
β=max(1/P.sup.i,P.sup.i),
wherein P.sup.i denotes the noise metric, and
α=ln(β)/C.
EEE 8. The method of any of EEEs 4-7, wherein generating the second codeword range comprises computing second luma pixel minimum {tilde over (v)}.sub.L,min.sup.i and second luma pixel maximum {tilde over (v)}.sub.L,max.sup.i values as:
{tilde over (v)}.sub.L,min.sup.i=max(0,v.sub.L,avg.sup.i−M×Δ.sub.L,1.sup.i),
{tilde over (v)}.sub.L,max.sup.i=min(2.sup.B.sup.v−1,v.sub.L,avg.sup.i+M×Δ.sub.L,2.sup.i),
wherein B.sub.v denotes bit-depth resolution of the one or more input images, M denotes the scaler, and
Δ.sub.L,1.sup.i=v.sub.L,avg.sup.i−v.sub.L,min.sup.i,
Δ.sub.L,2.sup.i=v.sub.L,max.sup.i−v.sub.L,avg.sup.i,
wherein v.sub.L,min.sup.i, v.sub.L,max.sup.i, and v.sub.L,avg.sup.i denote the first minimum luma pixel value, the first maximum luma pixel value, and a first average luma pixel value in the one or more input images.
EEE 9. The method of any of EEEs 1-8, where computing the noise metric comprises:
[0103] normalizing pixel values in the one or more input images to [0, 1) to generate one or more normalized images;
[0104] determining edge points in the one or more normalized images based on edge-detection operators and one or more thresholds; and
[0105] determining the noise metric based on a percentage of the determined edge points over the total number of pixels in the one or more normalized images.
EEE 10. The method of EEE 9, wherein the edge-detection operators comprise the Sobel operators.
EEE 11. The method of EEE 9 or EEE 10, wherein computing the threshold for a j-th image in the one or more normalized images comprises computing
wherein B.sub.v denotes bit-depth resolution of the j-th image,
Δ.sub.j,L,1.sup.i=v.sub.j,L,avg.sup.i−v.sub.j,L,min.sup.i
Δ.sub.j,L,2.sup.i=v.sub.j,L,max.sup.i−v.sub.j,L,avg.sup.i,
wherein v.sub.j,L,min.sup.i, v.sub.j,L,max.sup.i, and v.sub.j,L,avg.sup.i denote a minimum luma pixel value in the j-th image, a maximum luma pixel value in the j-th image, and an average luma pixel value in the j-th image.
EEE 12. The method of any of EEEs 9-11, wherein determining the noise metric P.sup.i further comprises computing
P.sup.i=max(P.sub.j.sup.i),
wherein P.sub.j.sup.i denotes the percentage of edge points in the j-th normalized image in the one or more normalized images.
EEE 13. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for executing with one or more processors a method in accordance with any one of the EEEs 1-12.
EEE 14. An apparatus comprising a processor and configured to perform any one of the methods recited in EEEs 1-12.