IMAGE OPTIMIZATION IN MOBILE CAPTURE AND EDITING APPLICATIONS
20260030730 · 2026-01-29
Inventors
- Guan-Ming Su (Fremont, CA)
- Harshad Kadu (Santa Clara, CA)
- Tsung-Wei Huang (Sunnyvale, CA)
- Jon Scott McElvain (Manhattan Beach, CA)
- Tao Chen (Palo Alto, CA)
- Samir N. Hulyalkar (Los Gatos, CA)
CPC Classification
G06T 7/80 (Physics)
Abstract
HDR color patches are sampled throughout an HDR color space parameterized by a parameter. Reference SDR color patches, input HDR color patches and reference HDR color patches are generated from the sampled HDR color patches. An optimization algorithm is executed to generate an optimized forward reshaping mapping and an optimized backward reshaping mapping. The optimized forward reshaping mapping is used to forward reshape input HDR images into forward reshaped SDR images, whereas the optimized backward reshaping mapping is used to backward reshape the forward reshaped SDR images into backward reshaped HDR images.
Claims
1. A method comprising: extracting a set of standard dynamic range (SDR) image feature points from a training SDR image and extracting a set of high dynamic range (HDR) image feature points from a training HDR image; matching a subset of one or more SDR image feature points in the set of SDR image feature points with a subset of one or more HDR image feature points in the set of HDR image feature points; using the subset of one or more SDR image feature points and the subset of one or more HDR image feature points to generate a geometric transform to spatially align a set of SDR pixels in the training SDR image with a set of HDR pixels in the training HDR image; determining a set of pairs of SDR and HDR color patches, from the set of SDR pixels in the training SDR image and the set of HDR pixels in the training HDR image after the training SDR and HDR images have been spatially aligned by the geometric transform; generating an optimized SDR-to-HDR mapping, based at least in part on the set of pairs of SDR and HDR color patches derived from the training SDR image and the training HDR image; applying the optimized SDR-to-HDR mapping to one or more non-training SDR images to generate one or more corresponding non-training HDR images.
2. The method of claim 1, wherein the training SDR image and the training HDR image are captured by a capturing device operating in SDR and HDR capture modes, respectively, from a three-dimensional (3D) visual scene.
3. The method of claim 1, wherein the training SDR image and the training HDR image form a pair of training SDR and HDR images in a plurality of pairs of training SDR and HDR images; wherein the optimized SDR-to-HDR mapping is generated based at least in part on a plurality of sets of pairs of SDR and HDR color patches derived from the plurality of pairs of training SDR and HDR images.
4. The method of claim 1, wherein each SDR image feature point in the subset of one or more SDR image feature points is matched with a respective HDR image feature point in the subset of one or more HDR image feature points; wherein the SDR image feature point and the HDR image feature point are extracted from the training SDR image and the HDR image, respectively, using a common feature point extraction algorithm.
5. The method of claim 1, wherein the training SDR image and the training HDR image are obtained by performing respective camera distortion correction operations on each distorted training image in a pair of a distorted training standard dynamic range (SDR) image and a distorted training high dynamic range (HDR) image to generate a respective training image in a pair of a training SDR image and a training HDR image; wherein the set of SDR image feature points and the set of HDR image feature points correspond to corner pattern marks; wherein using the subset of one or more SDR image feature points and the subset of one or more HDR image feature points comprises generating a respective projective transform, in a pair of an SDR image projective transform and an HDR image projective transform, using the corner pattern marks detected from each training image in the pair of the training SDR image and the training HDR image.
6. The method of claim 5, wherein the training SDR image and the training HDR image are captured by a first capturing device operating in an SDR capture mode and a second capturing device operating in an HDR capture mode, respectively, from a common color chart image.
7. The method of claim 6, wherein the common color chart image is rendered on a screen of a common reference image display and captured from the screen by the first capturing device and the second capturing device.
8. The method of claim 5, wherein the respective camera distortion correction operations are based at least in part on camera-specific distortion coefficients generated from a camera calibration process performed with a camera used to acquire the training image.
9. The method of claim 5, wherein the set of SDR color patches and the set of HDR color patches are used to derive a three-dimensional mapping table (3DMT); wherein the optimized SDR-to-HDR mapping is generated based at least in part on the 3DMT.
10. The method of claim 5, wherein the optimized SDR-to-HDR mapping represents one of: tensor-product B-Spline (TPB) based mapping or a non-TPB-based mapping.
11. A method comprising: building sampled high dynamic range (HDR) color space points distributed throughout an HDR color space used to represent reconstructed HDR images; converting the sampled HDR color space points into standard dynamic range (SDR) color space points in a first SDR color space in which SDR images to be edited by an editing device are represented; determining a bounding SDR color space rectangle based on extreme SDR codeword values of the SDR color space points in the first SDR color space and determining an irregular three-dimensional (3D) shape from a distribution of the SDR color space points; building sampled SDR color space points distributed throughout the bounding SDR color space rectangle in the first SDR color space; using the sampled SDR color space points and the irregular shape to generate a boundary clipping 3D lookup table (3D-LUT) including lookup keys and corresponding lookup values, wherein the boundary clipping 3D-LUT uses the sampled SDR color space points as lookup keys; wherein, when the lookup key is within the irregular shape, the lookup key equals the lookup value; wherein, when the lookup key is outside the irregular shape, the lookup value is determined based on an index function that takes the irregular shape and the lookup key as input and returns a nearest neighbor inside the irregular shape to the lookup key as lookup value; performing clipping operations, based at least in part on the boundary clipping 3D-LUT, on an edited SDR image in the first SDR color space to generate a boundary clipped edited SDR image in the first SDR color space.
12. The method of claim 11, wherein the clipping operations include first using the bounding SDR color space rectangle to perform regular clipping on the edited SDR image to generate a regularly clipped edited SDR image and subsequently using the 3D-LUT to perform irregular clipping on the regularly clipped edited SDR image to generate the boundary clipped edited SDR image.
13. An apparatus comprising a processor and configured to perform the method recited in claim 1.
14. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for executing a method with one or more processors in accordance with the method recited in claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] An embodiment of the present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0016] In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be apparent, however, that the present disclosure may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present disclosure.
SUMMARY
[0017] Image optimizations such as those relating to tensor-product B-Spline (TPB) based image reshaping solutions are described herein for video capture applications including but not limited to mobile (video) capture applications. The TPB based solutions can provide or generate relatively accurate reconstructed images using robust predictors implemented with video codecs such as backward compatible video codecs. These predictors can operate with static mapping to gracefully handle or recover from transmission errors such as frame dropping, minimize power consumption such as battery power consumption, reduce data overheads and/or processing/transmission latencies, etc. In some operational scenarios, some or all of the TPB based solutions as described herein may be implemented with mobile applications that are to comply with relatively harsh battery or power constraints.
[0018] The TPB based image reshaping solutions can be adapted to operate in different use cases. Some of the use cases allow freedom to design and/or apply HDR-to-SDR mappings to generate SDR images to be encoded in (base layers of) video signals as described herein. Some of the use cases rely on mobile image signal processors (ISPs) with relatively limited programmable registers to generate SDR images to be encoded in (base layers of) video signals. Different architectures may be used to implement the TPB based solutions as described herein. Additionally, optionally or alternatively, these architectures or the TPB based solutions can be used to provide or support backward compatibility.
[0019] Given reference SDR images represented in an SDR color space (or a color space in an SDR domain), the TPB-based solutions can use a TPB optimization process to achieve or determine a maximum or widest HDR color space (or a color space in an HDR domain) to represent corresponding reconstructed HDR images predicted from the SDR images. The wider the HDR color space, the more color deviations are introduced in the SDR color space. The TPB optimization process can be implemented to achieve or determine an optimized balance point or tradeoff between achieving the maximum or widest HDR color space on one hand and introducing SDR color deviations on the other. Additionally, optionally or alternatively, the TPB optimization process as described herein can be implemented to incorporate neutral color processing or perform constrained optimization to help preserve gray levels represented in the SDR images in the reconstructed HDR images predicted from the SDR images.
[0020] As more and more videos are captured by various camera-equipped computing or mobile devices in practice, video editing is also becoming more and more popular. Users operating these devices may be allowed to adjust the look of captured videos manually or simply apply default theme templates to the captured videos. Depending on available computation resources and/or preferences of video editing tool designers/providers, video editing may be done either in a source domain, such as a source HDR domain in which source images are represented, or in a non-source domain, such as a generated SDR domain to which pre-edited images may be converted from the source images in the source domain.
[0021] In operational scenarios in which SDR images are to be encoded in a video signal, video editing operations in the HDR domain or the source domain will not adversely impact the ability of downstream devices of the video signal to reconstruct HDR images from the SDR images decoded from the video signal at the decoder side. This is so because these video editing operations do not interfere with HDR-to-SDR mappings used to generate the SDR images from HDR images or source images in the HDR domain at the encoder side, and also do not interfere with SDR-to-HDR mappings used to map back to, or reconstruct an approximation of, the HDR images at the decoder side.
[0022] However, in these operational scenarios, video editing operations in the SDR domain will likely adversely impact reversibility between the SDR domain and the HDR domain. For example, the edited SDR images may not be able to be mapped back to the original or source HDR images. Furthermore, a color space used to represent the SDR images may be limited to pre-defined ranges, such as SMPTE ranges, that pixel values or codewords cannot exceed. Color space conversion such as YUV-to-RGB conversion used for video editing operations may cause pixel values or codewords to be clipped, thereby damaging or harming the reversibility. Clipping techniques as described herein can be used to reduce or prevent adverse editing impacts on video editing operations including but not limited to those performed with mobile devices. A two-level TPB boundary clipping may be implemented in the RGB domain to support or preserve maximum edited colors in the SDR domain that can be used to propagate or map back to reconstructed images in the HDR domain.
[0023] Example embodiments described herein relate to image generation. Sampled HDR color space points distributed throughout an HDR color space are built. The HDR color space is parameterized by a color primary scaling parameter with a candidate value selected from among a plurality of candidate values. The color primary scaling parameter is used to compute color space coordinates of at least one of multiple color primaries delineating the HDR color space. Reference SDR color space points represented in a reference SDR color space, input HDR color space points represented in an input HDR color space, and reference HDR color space points represented in a reference HDR color space, are generated from the sampled HDR color space points in the HDR color space. A reshaping operation optimization algorithm is executed to generate a chain of an optimized forward reshaping mapping and an optimized backward reshaping mapping. The reshaping operation optimization algorithm uses the reference SDR color space points, the input HDR color space points and the reference HDR color space points as input. The optimized forward reshaping mapping is used to forward reshape input HDR images in the input HDR color space into forward reshaped SDR images in a forward reshaped SDR color space, whereas the optimized backward reshaping mapping is used to backward reshape the forward reshaped SDR images in the forward reshaped SDR color space into backward reshaped HDR images.
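The candidate scan over the color primary scaling parameter can be sketched in a few lines. The following is a toy numpy illustration, not the described TPB solution: a plain linear least-squares fit stands in for the TPB basis, clipping models SDR color deviation, and the `mu * s` term is an assumed proxy reward for wider HDR gamut coverage.

```python
import numpy as np

def sample_hdr_points(n=8):
    """Sampled HDR color space points on a uniform grid in [0, 1]^3."""
    g = np.linspace(0.0, 1.0, n)
    r, gr, b = np.meshgrid(g, g, g, indexing="ij")
    return np.stack([r.ravel(), gr.ravel(), b.ravel()], axis=1)

def fit_linear(src, dst):
    """Least-squares linear map (toy stand-in for a TPB basis fit)."""
    coef, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return coef

def optimize_chain(candidates, lam=0.5, mu=0.01):
    """Scan candidate scaling values; for each, jointly fit a forward and a
    backward map and keep the candidate with the lowest combined cost."""
    X = sample_hdr_points()
    best = None
    for s in candidates:
        ref_sdr = np.clip(X * s, 0.0, 1.0)       # wider gamut -> more SDR clipping
        F = fit_linear(X, ref_sdr)               # forward reshaping fit
        sdr = X @ F
        B = fit_linear(sdr, X)                   # backward reshaping fit
        hdr_err = np.mean((sdr @ B - X) ** 2)    # round-trip reconstruction error
        sdr_err = np.mean((sdr - ref_sdr) ** 2)  # SDR color deviation
        cost = hdr_err + lam * sdr_err - mu * s  # mu * s: assumed gamut reward
        if best is None or cost < best[0]:
            best = (cost, s, F, B)
    return best

cost, best_s, F, B = optimize_chain([1.0, 1.2, 1.5])
```

The combined cost mirrors the tradeoff described above: a wider candidate gamut is rewarded, while SDR color deviation and round-trip error penalize it.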
[0024] Example embodiments described herein relate to image generation. Sampled HDR color space points distributed throughout an HDR color space are built. The HDR color space is parameterized by a color primary scaling parameter with a candidate value selected from among a plurality of candidate values. The color primary scaling parameter is used to compute color space coordinates of at least one of multiple color primaries delineating the HDR color space. Input SDR color space points represented in an input SDR color space and reference HDR color space points represented in a reference HDR color space are generated from the sampled HDR color space points in the HDR color space. A reshaping operation optimization algorithm is executed to generate an optimized backward reshaping mapping. The reshaping operation optimization algorithm receives the input SDR color space points and the reference HDR color space points as input. The optimized backward reshaping mapping is used to backward reshape SDR images in the input SDR color space into backward reshaped HDR images.
[0025] Example embodiments described herein relate to image generation. A set of SDR image feature points is extracted from a training SDR image, whereas a set of HDR image feature points is extracted from a training HDR image. A subset of one or more SDR image feature points in the set of SDR image feature points is matched with a subset of one or more HDR image feature points in the set of HDR image feature points. The subset of one or more SDR image feature points and the subset of one or more HDR image feature points are used to generate a geometric transform to spatially align a set of SDR pixels in the training SDR image with a set of HDR pixels in the training HDR image. A set of pairs of SDR and HDR color patches is determined from the set of SDR pixels in the training SDR image and the set of HDR pixels in the training HDR image after the training SDR and HDR images have been spatially aligned by the geometric transform. An optimized SDR-to-HDR mapping is generated based at least in part on the set of pairs of SDR and HDR color patches derived from the training SDR image and the training HDR image. The optimized SDR-to-HDR mapping is applied to one or more non-training SDR images to generate one or more corresponding non-training HDR images.
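The alignment step in this embodiment can be sketched as follows. In practice the feature points might come from a detector such as SIFT or ORB with outlier rejection (e.g., RANSAC); this numpy-only toy assumes the correspondences are already matched and fits a 2D affine transform, a special case of the geometric transform, by least squares. All coordinates are hypothetical.

```python
import numpy as np

def estimate_affine(sdr_pts, hdr_pts):
    """Least-squares 2D affine transform mapping HDR feature points onto
    their matched SDR feature points."""
    A = np.hstack([hdr_pts, np.ones((len(hdr_pts), 1))])   # rows [x, y, 1]
    coef, *_ = np.linalg.lstsq(A, sdr_pts, rcond=None)     # shape (3, 2)
    return coef

def apply_affine(coef, pts):
    """Warp points with the fitted transform."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ coef

# Hypothetical matched keypoints: the HDR frame is a shifted, scaled
# version of the SDR frame.
hdr = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [50, 25]], float)
sdr = hdr * 0.5 + np.array([10.0, 20.0])

T = estimate_affine(sdr, hdr)
aligned = apply_affine(T, hdr)   # HDR pixels mapped onto the SDR frame
```

With the frames spatially aligned, co-located pixels can then be paired into the SDR/HDR color patches used for the mapping fit.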
[0026] Example embodiments described herein relate to image generation. Respective camera distortion correction operations are performed on each training image in a pair of a training standard dynamic range (SDR) image and a training high dynamic range (HDR) image to generate a respective undistorted image in a pair of an undistorted training SDR image and an undistorted training HDR image. A respective projective transform, in a pair of an SDR image projective transform and an HDR image projective transform, is generated using corner pattern marks detected from each undistorted image in the pair of the undistorted training SDR image and the undistorted training HDR image. Each projective transform in the pair of the SDR image projective transform and the HDR image projective transform is applied to a respective undistorted image in the pair of the undistorted training SDR image and the undistorted training HDR image to generate a respective rectified image in a pair of a rectified training SDR image and a rectified training HDR image. A set of SDR color patches is extracted from the rectified training SDR image, whereas a set of HDR color patches is extracted from the rectified training HDR image. An optimized SDR-to-HDR mapping is generated based at least in part on the set of SDR color patches and the set of HDR color patches derived from the training SDR image and the training HDR image. The optimized SDR-to-HDR mapping is applied to one or more non-training SDR images to generate one or more corresponding non-training HDR images.
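The rectification chain can be illustrated with the classic direct linear solve for a projective transform from four detected corner marks. The corner coordinates below are hypothetical, and camera distortion correction is assumed to have already been applied.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve the 8-unknown projective transform mapping four detected
    corner pattern marks (src) onto their canonical positions (dst)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_points(H, pts):
    """Apply the projective transform, with perspective division."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Hypothetical corner marks of a captured color chart, and the canonical
# corner positions of the rectified frame.
corners = np.array([[12, 8], [630, 22], [18, 470], [640, 480]], float)
canonical = np.array([[0, 0], [640, 0], [0, 480], [640, 480]], float)

H = homography_from_corners(corners, canonical)
rectified = warp_points(H, corners)
```

One such transform per training image (SDR and HDR) brings both captures of the color chart into the same canonical frame, so color patches can be read from corresponding cells.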
[0027] Example embodiments described herein relate to clipping operations on edited images. Sampled HDR color space points distributed throughout an HDR color space used to represent reconstructed HDR images are built. The sampled HDR color space points are converted into SDR color space points in a first SDR color space in which SDR images to be edited by an editing device are represented. A bounding SDR color space rectangle is determined based on extreme SDR codeword values of the SDR color space points in the first SDR color space. An irregular three-dimensional (3D) shape is determined from a distribution of the SDR color space points. Sampled SDR color space points distributed throughout the bounding SDR color space rectangle in the first SDR color space are built. The sampled SDR color space points and the irregular shape are used to generate a boundary clipping 3D lookup table (3D-LUT). The boundary clipping 3D-LUT uses the sampled SDR color space points as lookup keys. Clipping operations are performed, based at least in part on the boundary clipping 3D-LUT, on an edited SDR image in the first SDR color space to generate a boundary clipped edited SDR image in the first SDR color space.
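The boundary-clipping 3D-LUT construction can be sketched as follows. This is a toy: a sphere stands in for the irregular 3D shape derived from the distribution of converted SDR points, and a brute-force nearest-neighbour search stands in for the index function.

```python
import numpy as np

def build_clipping_lut(inside_fn, samples):
    """Boundary-clipping LUT: keys inside the irregular shape map to
    themselves; keys outside map to the nearest inside sample."""
    inside = np.array([inside_fn(p) for p in samples])
    inside_pts = samples[inside]
    lut = samples.copy()
    for i, p in enumerate(samples):
        if not inside[i]:
            d = np.sum((inside_pts - p) ** 2, axis=1)
            lut[i] = inside_pts[np.argmin(d)]   # nearest-neighbour fallback
    return lut

# Toy irregular shape: a sphere inside the bounding rectangle.
inside_fn = lambda p: np.sum((p - 0.5) ** 2) <= 0.25 ** 2

# Sampled SDR color space points throughout the bounding rectangle.
g = np.linspace(0.0, 1.0, 9)
keys = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
lut = build_clipping_lut(inside_fn, keys)
```

Every lookup value lands inside the shape, so applying the LUT to an edited SDR image clamps stray codewords back onto colors that remain mappable to the HDR domain.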
Reshaping Optimization in Image/Video Capture Applications
[0028] Reshaping optimization processes such as TPB and/or non-TPB optimization processes can be implemented or incorporated in video capture applications running on computing devices such as mobile devices in a variety of operational scenarios. The video capture applications with the reshaping optimization processes can be used to generate or output video signals such as base layer or SDR images encoded therein. The reshaping optimization processes can implement different solutions in different operational scenarios to generate or optimize reshaping operational parameters such as TPB and/or non-TPB coefficients to be used with the base layer or SDR images to generate, construct or reconstruct non-base-layer or HDR images with optimized picture quality.
[0029] For the purpose of illustration, the reshaping optimization processes, or the solutions implemented therein, may be classified into different types based on the specific SDR generation processes or sub-processes adopted by the video capturing applications, as well as on the specific reshaping path(s), i.e., the forward (reshaping) path and/or the backward (reshaping) path, subjected to reshaping optimization.
[0031] As used herein, white-box conversion refers to HDR-to-SDR mapping or conversion with well-defined conversion or mapping functions/equations such as those (e.g., publicly, etc.) specified or documented in standard-based or proprietary video coding specifications. In comparison, black-box conversion refers to HDR-to-SDR mapping or conversion with conversion or mapping operations not based on the well-defined conversion or mapping functions/equations. For example, black-box conversion may be implemented as internal image signal processes performed by image signal processors with no or little reliance on any well-defined conversion or mapping functions or equations (e.g., publicly, etc.) specified or documented in standard-based or proprietary video coding specifications.
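As an illustration of a "well-defined" conversion function of the kind white-box conversion relies on, the PQ EOTF published in SMPTE ST 2084 can be written directly from the standard's constants; a minimal numpy sketch:

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants, as published in the standard.
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_eotf(e):
    """PQ EOTF: nonlinear signal value in [0, 1] -> luminance in cd/m^2."""
    e = np.asarray(e, float)
    p = np.power(e, 1.0 / M2)
    return 10000.0 * np.power(np.maximum(p - C1, 0.0) / (C2 - C3 * p), 1.0 / M1)
```

A black-box conversion, by contrast, offers no such closed form: only input/output image pairs are observable, which is why the black-box scenarios below fit mappings from training data.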
[0032] By way of example but not limitation, in
[0033] As shown in
[0034] A downstream recipient device or a video decoder of the video signal can decode the reshaped SDR images from the video signal. The decoded reshaped SDR images at the decoder side may be the same as the reshaped SDR images at the encoder side, subject to errors introduced in compression/decompression, coding operations and/or data transmissions.
[0035] In the backward reshaping path as implemented by the downstream device, the reshaped SDR images may be backward reshaped into (backward) reshaped HDR images based at least in part on optimized backward reshaping operational parameters (denoted as backward TPB optimization) generated from the same joint forward-and-backward TPB optimization solution. The reshaped HDR images, generated with the optimized backward reshaping operational parameters in an output HDR color space, represent an approximation to or reconstructed version of the reference HDR images.
[0036] The joint forward-and-backward TPB optimization solution can generate the optimized forward and backward reshaping operational parameters to cover as wide a gamut as possible in the output HDR color space in which the reshaped or reconstructed HDR images generated at the decoder side are represented.
[0037] One or both of the optimized forward and backward reshaping operational parameters such as optimized forward and backward TPB coefficients can be implemented or represented in three-dimensional look-up table(s) or 3D-LUT(s) to reduce processing times in reshaping operations.
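A 3D-LUT replaces per-pixel evaluation of the fitted mapping with a grid lookup plus interpolation. The following minimal sketch implements trilinear interpolation into an (n, n, n, 3) table; an identity LUT is used purely for illustration, standing in for a table filled with precomputed reshaping outputs.

```python
import numpy as np

def apply_3dlut(lut, rgb):
    """Trilinear interpolation into an (n, n, n, 3) lookup table."""
    n = lut.shape[0]
    x = np.clip(rgb, 0.0, 1.0) * (n - 1)
    i0 = np.minimum(x.astype(int), n - 2)     # lower grid corner per channel
    f = x - i0                                # fractional position in the cell
    out = np.zeros_like(rgb, dtype=float)
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = (np.where(dr, f[..., 0], 1 - f[..., 0])
                     * np.where(dg, f[..., 1], 1 - f[..., 1])
                     * np.where(db, f[..., 2], 1 - f[..., 2]))
                out += w[..., None] * lut[i0[..., 0] + dr,
                                          i0[..., 1] + dg,
                                          i0[..., 2] + db]
    return out

# Identity LUT: interpolation should reproduce the input colors.
n = 5
g = np.linspace(0.0, 1.0, n)
ident = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1)
colors = np.array([[0.2, 0.7, 0.4], [1.0, 0.0, 0.5]])
out = apply_3dlut(ident, colors)
```

The table size trades memory for accuracy: evaluating the TPB mapping once per grid node amortizes its cost over all pixels.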
[0038] Since the forward and backward reshaping operational parameters such as forward and backward TPB coefficients are jointly or concurrently designed or optimized in the joint forward and backward TPB optimization process, the supported reshaped SDR and HDR color spaces used to represent the reshaped SDR and HDR images can be jointly or concurrently designed or optimized in the same process.
[0039] In some operational scenarios, the optimized forward and backward reshaping operational parameters generated from the joint forward and backward TPB optimization process may be applied in a static single-layer backward compatible (SLBC) framework. Under this static framework, there is no need to (e.g., dynamically, etc.) obtain image-specific or image-dependent (or content-dependent) optimized forward and backward operational parameters, such as image-specific or image-dependent forward and backward TPB coefficients, for example on the fly while images are being processed.
[0040] Rather, under the static SLBC framework, the same or static optimized forward and backward operational parameters, such as the same optimized forward and backward TPB coefficients, can be obtained or generated once (for example, offline or before performing reshaping operations on any of the (input) reference HDR images or the reshaped SDR images) for forward reshaping all the (input) reference HDR images and backward reshaping the reshaped SDR images. In an example, a single set of static optimized forward and backward operational parameters can be generated offline by a system as described herein and configured/deployed in or used by a capturing device as described herein to perform image reshaping or reconstruction operations. In another example, multiple sets of static optimized forward and backward operational parameters can be generated offline by a system as described herein and configured/deployed in or used by a capturing device as described herein to select a specific set of static optimized forward and backward operational parameters to perform image reshaping or reconstruction operations.
[0041] Thereafter, the optimized static forward TPB coefficients can be applied by the upstream device to forward reshape some or all of the (input) reference HDR images (e.g., a sequence of consecutive or sequential (input) reference HDR images, etc.) to generate the reshaped SDR images to be encoded in the (SLBC) video signal, whereas the optimized static backward TPB coefficients can be applied by the downstream recipient device of the video signal to some or all of the reshaped SDR images (e.g., a sequence of consecutive or sequential reshaped SDR images, etc.) decoded from the video signal to generate or reconstruct the reshaped HDR images.
[0043] By way of example but not limitation, in
[0044] As shown in
[0045] A downstream recipient device or a video decoder of the video signal can decode the ISP SDR images from the video signal. The decoded ISP SDR images at the decoder side may be the same as the ISP SDR images at the encoder side, subject to errors introduced in compression/decompression, coding operations and/or data transmissions.
[0046] In the backward reshaping path as implemented by the downstream device, the ISP SDR images may be backward reshaped into (backward) reshaped HDR images based at least in part on optimized backward reshaping operational parameters (denoted as backward TPB optimization) generated from the backward only TPB optimization solution. The reshaped HDR images, generated with the optimized backward reshaping operational parameters in an output HDR color space, represent an approximation to or reconstructed version of the reference HDR images.
[0047] The backward only TPB optimization solution can generate the optimized backward reshaping operational parameters to cover as wide a gamut as possible in the output HDR color space in which the reshaped or reconstructed HDR images generated at the decoder side are represented.
[0048] The optimized backward reshaping operational parameters such as optimized backward TPB coefficients can be implemented or represented in a three-dimensional look-up table or 3D-LUT to reduce processing times in reshaping operations.
[0049] In some operational scenarios, the backward reshaping operational parameters generated from the backward only TPB optimization process may be applied in a static single-layer inverse display mapping (SLiDM) framework. Under this static framework, there is no need to (e.g., dynamically, etc.) obtain image-specific or image-dependent (or content-dependent) optimized backward operational parameters, such as image-specific or image-dependent backward TPB coefficients.
[0050] Rather, under the static SLiDM framework, the same or static optimized backward operational parameters, such as the same optimized backward TPB coefficients, can be obtained or generated once (for example, offline or before performing reshaping operations on any of the reshaped SDR images) for backward reshaping the ISP SDR images. In an example, a single set of static optimized backward operational parameters can be generated offline by a system as described herein and configured/deployed in or used by a capturing device as described herein to perform image reshaping or reconstruction operations. In another example, multiple sets of static optimized backward operational parameters can be generated offline by a system as described herein and configured/deployed in or used by a capturing device as described herein to select a specific set of static optimized backward operational parameters to perform image reshaping or reconstruction operations.
[0051] Thereafter, the optimized static backward TPB coefficients can be applied by the downstream recipient device of the video signal to some or all of the ISP SDR images (e.g., a sequence of consecutive or sequential ISP SDR images, etc.) decoded from the video signal to generate or reconstruct the reshaped HDR images.
[0052] The ISP SDR images encoded in the video signal may or may not be identical to a desired SDR look such as represented by the reference images. The operational parameters or settings thereof in the programmable ISP can be optimized to approximate the reference images. Hence, in the WB operational scenarios of
[0054] Reshaping operational parameters in these operational scenarios can be generated with training SDR images and training HDR images generated or acquired by the same video capture device, such as the same mobile device. These training SDR images and training HDR images form a plurality of SDR and HDR image pairs, each of which includes a training SDR image and a training HDR image corresponding to the training SDR image.
[0055] A training SDR image and a training HDR image in the same SDR and HDR image pair may be acquired at different time instances/points (e.g., a few milliseconds apart, a few fractional seconds apart, a few seconds apart, etc.) using the same capturing device and the same ISP. The training SDR and HDR images may not be exactly spatially aligned or temporally aligned with each other, as it is difficult if not impossible to maintain the same shooting positions at the different time instances/points and to process the training SDR and HDR images identically. For example, the training SDR image can be locally tone mapped or locally enhanced, whereas the training HDR image may be generated or obtained from multiple camera exposures. Accordingly, in the BB1 operational scenarios, relationships between pixel or codeword values in the training SDR image and corresponding pixel or codeword values in the training HDR image in the same image pair may be treated as, or assumed to be, a black box.
[0056] Training SDR and HDR images in each of the image pairs may be first spatially aligned. Spatially aligned training SDR and HDR images in the image pairs may be used to determine or find out matching color pairs. These matching color pairs can then be used to generate the optimized reshaping operational parameters such as the optimized TPB and/or non-TPB coefficients for reshaping or mapping (e.g., non-training, reference, etc.) SDR images back to backward reshaped or reconstructed HDR images approximating (e.g., non-training, reference, etc.) HDR images.
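The fit from matching color pairs can be sketched as a least-squares solve. Here a second-order polynomial basis is a hypothetical stand-in for the TPB basis, and the matched SDR/HDR colors are synthetic.

```python
import numpy as np

def poly_features(rgb):
    """Second-order cross-channel features (a simple stand-in for a
    tensor-product B-Spline basis)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([np.ones_like(r), r, g, b,
                     r * r, g * g, b * b, r * g, g * b, r * b], axis=1)

def fit_backward_mapping(sdr_colors, hdr_colors):
    """Least-squares fit of SDR-to-HDR coefficients from matched color pairs."""
    coef, *_ = np.linalg.lstsq(poly_features(sdr_colors), hdr_colors, rcond=None)
    return coef

# Synthetic matched pairs: the HDR color is a smooth nonlinear lift of
# the SDR color (toy relationship, for illustration only).
rng = np.random.default_rng(0)
sdr = rng.uniform(0.0, 1.0, size=(500, 3))
hdr = sdr ** 2 * 4.0

coef = fit_backward_mapping(sdr, hdr)
pred = poly_features(sdr) @ coef
```

With real captures, the matched pairs come from the spatially aligned training images, and the fitted coefficients serve as the optimized backward reshaping operational parameters.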
[0057] As shown in
[0058] A downstream recipient device or a video decoder of the video signal can decode the reference SDR images from the video signal. The decoded reference SDR images at the decoder side may be the same as the reference SDR images at the encoder side, subject to errors introduced in compression/decompression, coding operations and/or data transmissions.
[0059] In the backward reshaping path as implemented by the downstream device, the reference SDR images may be backward reshaped into (backward) reshaped HDR images based at least in part on optimized backward reshaping operational parameters (denoted as backward TPB optimization) generated from the black-box backward only TPB optimization design/solution. The reshaped HDR images (generated with the optimized backward reshaping operational parameters in an output HDR color space) represent an approximation to or reconstructed version of reference HDR images that can or could be generated by the same capture device.
[0060] The black-box backward only TPB optimization design/solution can generate the optimized backward reshaping operational parameters to cover as wide a range as possible of the output HDR color space in which the reshaped or reconstructed HDR images generated at the decoder side are represented.
[0061] The optimized backward reshaping operational parameters such as optimized backward TPB coefficients can be implemented or represented in a three-dimensional look-up table or 3D-LUT to reduce processing times in reshaping operations.
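The 3D-LUT representation mentioned in this paragraph is typically evaluated with trilinear interpolation over a regular grid of precomputed outputs. The following minimal Python sketch (an illustrative, hypothetical implementation, not the optimized coefficients themselves) shows such a lookup for a single output channel:

```python
def trilinear_lookup(lut, r, g, b):
    """Interpolate a value from a cubic 3D-LUT.

    lut[i][j][k] holds the precomputed output for grid point
    (i/(n-1), j/(n-1), k/(n-1)); r, g, b are normalized to [0, 1].
    """
    n = len(lut)

    def locate(v):
        # Clamp into [0, 1], then find the lower grid index and fraction.
        t = min(max(v, 0.0), 1.0) * (n - 1)
        i = min(int(t), n - 2)
        return i, t - i

    (i, fr), (j, fg), (k, fb) = locate(r), locate(g), locate(b)
    acc = 0.0
    # Weighted sum over the 8 surrounding grid nodes.
    for di, wi in ((0, 1 - fr), (1, fr)):
        for dj, wj in ((0, 1 - fg), (1, fg)):
            for dk, wk in ((0, 1 - fb), (1, fb)):
                acc += wi * wj * wk * lut[i + di][j + dj][k + dk]
    return acc
```

A separate LUT (or a vector-valued grid) would be evaluated per output channel; practical implementations often use tetrahedral rather than trilinear interpolation for a similar cost.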
[0062] In some operational scenarios, the backward reshaping operational parameters generated from the black-box backward only TPB optimization design/solution may be applied in a SLiDM framework. Under this static framework, there is no need to (e.g., dynamically, etc.) obtain optimized image-specific or image-dependent backward operational parameters such as image-specific or image-dependent (or content-dependent) backward TPB coefficients.
[0063] Rather, under the static SLiDM framework, the same or static optimized backward operational parameters such as the same optimized backward TPB coefficients can be obtained or generated once (for example, offline or before performing reshaping operations on any of the reshaped SDR images) for backward reshaping the spatially aligned training SDR images generated by the upstream capture device to reshaped or reconstructed HDR images approximating the spatially aligned training HDR images generated by the same capture device.
[0064] Thereafter, the optimized static backward TPB coefficients can be applied by the downstream recipient device of the video signal to some or all of the (e.g., non-training, etc.) reference SDR images (e.g., a sequence of consecutive or sequential reference SDR images, etc.) decoded from the video signal to generate or reconstruct reshaped HDR images approximating reference HDR images that can or could be generated from the same upstream capture device that generates or captures the reference SDR images.
[0065] As TPB optimization is only used in the backward path, the resultant look of the reshaped HDR images generated from backward reshaping the reference SDR images may have relatively large deviations from a desired look of the reference HDR images in the BB1 operational scenarios.
[0066] In some BB1 operational scenarios, as illustrated in
[0067]
[0068] Reshaping operational parameters in the BB2 operational scenarios can be generated with training SDR images and training HDR images generated or acquired respectively by the two video capture devices, such as two mobile devices of different makes and/or models. These training SDR images and training HDR images form a plurality of SDR and HDR image pairs, each of which includes a training SDR image and a training HDR image corresponding to the training SDR image.
[0069] A training SDR image and a training HDR image in the same SDR and HDR image pair may be acquired using the same image such as the same checkerboard image rendered on the same or similar image display(s). In the BB2 operational scenarios, relationships between pixel or codeword values in the training SDR image and corresponding pixel or codeword values in the training HDR image in the same image pair may be treated as, or assumed to be, a black box.
[0070] Training SDR and HDR images in each of the image pairs may be first spatially aligned. Spatially aligned training SDR and HDR images in the image pairs may be used to determine or find out matching color patches rendered with test images such as checkerboard images on the same image display or the same image display type. These matching color patches can then be used to construct three-dimensional look-up tables (3D-LUTs) to map SDR pixel or codeword values of color patches in the test images to corresponding HDR pixel or codeword values of the same color patches.
[0071] The 3D-LUTs derived based in part or in whole on the test images can be used to generate optimized static or dynamic reshaping operational parameters for reshaping or mapping (e.g., non-training, reference, etc.) SDR images back to backward reshaped or reconstructed HDR images approximating (e.g., non-training, reference, etc.) HDR images.
[0072] As shown in
[0073] A downstream recipient device or a video decoder of the video signal can decode the reference SDR images acquired with the first capture device from the video signal. The decoded reference SDR images at the decoder side may be the same as the reference SDR images at the encoder side, subject to errors introduced in compression, decompression, coding operations and/or data transmissions.
[0074] In the backward reshaping path as implemented by the downstream device, the reference SDR images may be backward reshaped into (backward) reshaped HDR images based at least in part on optimized backward reshaping operational parameters (denoted as Dynamic backward function optimization) generated from the BB2 reshaping optimization design/solution. The reshaped HDR images (generated with the optimized backward reshaping operational parameters in an output HDR color space) represent an approximation to or reconstructed version of reference HDR images that can or could be generated by a second different capture device.
[0075] The BB2 reshaping optimization design/solution can generate the optimized backward reshaping operational parameters to cover as wide a range as possible of the output HDR color space in which the reshaped or reconstructed HDR images generated at the decoder side are represented.
[0076] In some operational scenarios, the backward reshaping operational parameters generated from the BB2 reshaping optimization design/solution may be applied in a static or dynamic SLiDM framework.
[0077] As the BB2 reshaping optimization is only used in the backward path, the resultant look of the reshaped HDR images generated from backward reshaping the reference SDR images may have relatively large deviations from a desired look of the reference HDR images. The first and second capture devices may be mobile devices that can operate differently in their respective video capturing applications, for example with different exposure times, different video processing operations, different mapping functions/relationships, etc.
[0078] A dynamic SLiDM framework, in which dynamic mapping with image-dependent or image-specific reshaping operational parameters is used, may backward reshape SDR images of the first capture device to HDR images that can or could be generated by the second capture device, especially in operational scenarios in which the first and second capture devices operate with relatively large differences between respective video/image capture applications running on the first and second capture devices and/or between respective ISPs used in the first and second capture devices. For example, in a training phase, multiple sets of training SDR and HDR images or image pairs can be used to derive multiple sets of optimized reshaping operational parameters respectively for the multiple sets of training SDR and HDR images or image pairs. In a deployment or application phase, specific image characteristics such as overall brightness of some or all regions of a specific image can be dynamically evaluated when the specific image is being processed. A specific set of optimized reshaping operational parameters can be adaptively and/or dynamically selected for the specific image from among the multiple sets of optimized reshaping operational parameters based on the specific image characteristics of the specific image in relation to or in comparison with respective image characteristics of the multiple sets of training SDR and HDR images or image pairs.
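The adaptive selection described above can be sketched as a nearest-match search over pre-trained parameter sets. The interface below is hypothetical (the paragraph does not fix a concrete selection criterion); mean luma stands in for the "specific image characteristics":

```python
def select_parameter_set(image_luma, trained_sets):
    """Pick the reshaping-parameter set whose training images had the
    closest mean luma to the image being processed.

    trained_sets: list of (mean_training_luma, parameter_set) pairs.
    Hypothetical interface; a practical selector may combine several
    image characteristics rather than a single luma statistic.
    """
    return min(trained_sets, key=lambda s: abs(s[0] - image_luma))[1]
```

For example, with parameter sets trained on dark, mid-tone and bright content, an image whose mean luma is 0.75 would select the "bright" set.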
White Box Joint TPB Forward and Backward Optimization (WFB)
[0079] A number of design factors may be considered for reshaping optimization to help increase coverage of a supported color space in which reshaped images are to be represented. First, tensor-product bi-spline (TPB) predictors with optimized TPB coefficients generated from reshaping optimization may be used to obtain or achieve a relatively higher prediction accuracy than other types of predictors, including but not limited to MMR predictors. Multi-knot and continuity properties of tensor-product bi-spline predictors or prediction functions can be exploited or used to cover a relatively wide color range or color space portion with relatively high accuracy. Example TPB predictors can be found in U.S. Provisional Patent Application Ser. No. 62/908,770, Tensor-product B-spline predictor, by Guan-Ming Su, Harshad Kadu, Qing Song and Neeraj J. Gadgil, filed on Oct. 1, 2019, the contents of which are entirely incorporated herein by reference as if fully set forth herein.
[0080] In comparison, while multi-piece MMR predictors or prediction functions might be better than single-piece MMR predictors or prediction functions, discontinuity between different MMR pieces would likely be introduced with the multi-piece MMR predictors or prediction functions, thereby negatively impacting and even prohibiting usage of the multi-piece MMR predictors or prediction functions in many operational scenarios. Example MMR based operations are described in U.S. Pat. No. 8,811,490, which is incorporated by reference in its entirety as if fully set forth herein.
[0081] TPB predictors can be advantageously used in SLBC based video codec. More specifically, forward TPB predictors can be used to generate forward reshaped SDR images approximating reference SDR images, whereas backward TPB predictors can be used to generate backward reshaped or reconstructed HDR images approximating reference HDR images. Example usage of TPB predictors with SLBC based video codec can be found in U.S. Provisional Patent Application Ser. No. 63/255,057, Tensor-product B-spline prediction for HDR video in mobile applications, by H. Kadu et al., filed on Oct. 13, 2021, the contents of which are entirely incorporated herein by reference as if fully set forth herein. In addition, a BESA (Backward Error Subtraction for signal Adjustment) algorithm/method with modification to support neutral color preservation can be used in a pipeline of chained reshaping functions to optimize the forward and backward TPB predictors together to achieve a relatively high reversibility to or a relatively accurate approximation of the reference HDR images by the backward reshaped or reconstructed HDR images. Example BESA algorithm/method can be found in U.S. Provisional Patent Application Ser. No. 63/013,063, Reshaping functions for HDR imaging with continuity and reversibility constraints, by G-M. Su, filed on Apr. 21, 2020, U.S. Provisional Patent Application Ser. No. 63/013,807, Iterative optimization of reshaping functions in single-layer HDR image codec, by G-M. Su and H. Kadu, filed on Apr. 22, 2020, and PCT Application Ser. No. PCT/US2021/028475, filed on Apr. 21, 2021, the contents of which are entirely incorporated herein by reference as if fully set forth herein.
[0082] While TPB predictors have better prediction accuracy than other types of predictors, a relatively large signal or bitrate overhead may be incurred to transmit TPB coefficients in a video signal. In addition, a relatively large computational cost may be incurred to construct TPB equations or basis functions used in the TPB predictors.
[0083] In some operational scenarios, to avoid or reduce the signal or bitrate overhead and computational cost, built-in or static TPB predictors may be used to reshape some or all images in an entire video sequence without changing TPB coefficients used in the static TPB predictors during playing back the video sequence. More specifically, some or all of the (built-in or static) TPB coefficients can be cached or stored in a video application such as a video capture/editing application without needing to transmit these TPB coefficients explicitly through or in a video signal or bitstream encoded with images to be reshaped by the built-in or static TPB predictors. In an example, some or all of the TPB coefficients can be pre-loaded or preconfigured in a downstream recipient device before the video signal or bitstream is received and processed by the downstream recipient device. Additionally, optionally or alternatively, multiple sets of TPB coefficients can be pre-loaded or preconfigured in a downstream recipient device before the video signal or bitstream is received and processed by the downstream recipient device. The video applications can simply choose or select a specific set of TPB coefficients to use with the static TPB predictors from among the multiple sets of TPB coefficients, for example based on a simple indicator (e.g., a binary indicator, a multi-bit indicator, etc.) signaled or transmitted in the video signal or bitstream. Example static TPB predictors or prediction functions can be found in the previously mentioned U.S. Provisional Patent Application Ser. No. 63/255,057.
[0084] In some operational scenarios, mobile devices may be used to host and run video capture and/or editing applications. Users of the mobile devices can edit captured images in these video capture and/or editing applications. The edited images such as edited HDR images (which, for example, may be intended to be displayed or viewed on image displays of non-mobile devices with display capabilities higher or larger than those of image displays of the mobile devices) can have luminance ranges and/or color ranges or color distributions higher, larger, wider and/or broader than those of the (original) captured images such as HDR images originally captured from camera sensors of the mobile devices. For example, the captured HDR images can often be limited by camera sensors and ISP outputs of the mobile devices to be represented in a relatively narrow color space or gamut such as P3, whereas the edited HDR images can be represented in a relatively wider color space or gamut such as the entire R.2020 color space.
[0085] To design a static mapping for reshaping operations, it is highly desirable to optimize forward/backward TPB coefficients to cover dynamic range(s) as high as possible and color space(s) or gamut(s) as wide as possible, while realizing bitrate and computational efficiency as much as possible.
[0086] Full HDR revertability, in which a reconstructed HDR image is identical to the reference HDR image, would require colors that are distinguishable in an original or reconstructed HDR domain (or color space) to remain distinguishable in a reshaped SDR domain (or color space), so that the distinguished colors in the SDR domain can be mapped back to distinguishable colors in the HDR domain. Hence, the full revertability would likely need one-to-one mappings from HDR to SDR as well as from SDR back to HDR. The wider the HDR domain or color space is to support, the more codewords the SDR domain or color space needs to have. Given the total number of available SDR pixel or codeword values in the SDR domain (e.g., corresponding to a lower bit depth than that of the HDR domain, etc.) is typically smaller than the total number of needed HDR pixel or codeword values in the HDR domain, the full HDR revertability may not be possible in some operational scenarios. Indeed, some captured or edited images may contain combinations of diffusive and/or specular colors forming color distributions such as shown as an irregular shape in
[0087] In many operational scenarios, a non-linear reshaping mapping or function can be used to distribute available pixel or codeword values relatively efficiently and to generate a (forward) reshaped SDR image that approximates the reference SDR image as closely as possible and that helps the reconstructed HDR image approximate the reference HDR image as closely as possible, albeit the reshaped SDR image generated with the non-linear reshaping mapping or function may not be identical to the reference SDR image but rather may contain some deviations from it.
[0088] Additionally, optionally or alternatively, in some operational scenarios, to maintain the same or similar looks of the reference SDR and HDR images in the reshaped SDR and HDR images, reshaping optimization as described herein can be performed with neutral color constraints that preserve neutral colors in color mapping performed with the reshaping operations.
[0089] Given their capabilities to approximate functions with relatively high non-linearities, TPB predictors or functions can be incorporated in the reshaping operations to support or realize a relatively wide HDR color space for representing the backward reshaped or reconstructed HDR images.
[0090] A potential downside is that the TPB predictors may need a relatively large number of TPB coefficients to represent or approximate non-linear SDR-to-HDR and/or HDR-to-SDR mapping functions. In order to prevent possible ill-defined conditions, numerical instabilities, slow convergence issues, etc., that may occur in solving optimization problems for TPB coefficients, static TPB predictors may be generated beforehand and deployed with video codecs such as those used by upstream devices and/or downstream devices before these video codecs are used to process video sequences such as capture and/or edited video sequences.
[0091] For the purpose of illustration only, in the operational scenarios as shown in
[0092] While it may not be possible in all scenarios to cover the entire R.2020 color space, a subset or sub-space (e.g., a specific color gamut delineated as a triangle formed by color primaries in a color coordinate system, etc.) in the R.2020 color space may be selected or chosen by the reshaping optimization operations to maintain the HDR reversibility to the reference HDR images and acceptable SDR approximation of the reference SDR images in the reshaped HDR and SDR images.
[0093] By way of illustration but not limitation, the subset or sub-space in the R.2020 color space (or HDR color space for representing the reshaped HDR images) may be defined or characterized by a specific white point and three specific color primaries (red, green, blue). The specific white point may be selected or fixed to be the D65 white point. There may exist considerable design freedom to select the three specific color primaries from many possible color primary combinations for the subset or sub-space in the R.2020 color space to be supported by the reshaping operations. The reshaping optimization techniques as described herein can be used to determine or select the specific color primaries for the subset or sub-space in the R.2020 color space as optimized color primaries for realizing or reaching maximum perceptual quality and/or color coding efficiency and/or revertability between SDR and HDR images.
[0094] A MacAdam ellipse represents a just noticeable color difference in that the HVS may not be able to distinguish color differences within the same ellipse. The Pointer's gamut may represent all diffusive colors that can be perceived by the HVS. Per the MacAdam ellipses and Pointer's gamut, green colors are less important or less distinguishable/perceptible by the HVS than red and blue colors. Hence, in some operational scenarios, the specific optimized color primaries for the subset or sub-space in the R.2020 color space to represent the reshaped or reconstructed HDR images can be selected to cover the red and blue colors more than the green colors, especially when the one-to-one mapping relations cannot be supported by SDR-to-HDR and HDR-to-SDR mappings in the reshaping operations.
[0095] As used herein, color primaries may also be referred to as primary colors and may be used to define corners of a polygon such as a triangle representing a specific color space or gamut such as a standard-specified color space, a display-supported color space, a video-signal-supported color space, etc. For example, a standard color space with a standard-specified white point can be represented by a triangle whose corners are specified by three standard-specified color primaries in the CIExy color space coordinate system or CIExy chromaticity diagram. CIExy coordinates of the color primaries (red or R, green or G, blue or B) and white points respectively defining the R.709, P3 and R.2020 color spaces are specified in TABLE 1 below.
TABLE 1

          R                  G                  B                  White point
R.709     (0.6400, 0.3300)   (0.3000, 0.6000)   (0.1500, 0.0600)   (0.3127, 0.3290)
P3        (0.6800, 0.3200)   (0.2650, 0.6900)   (0.1500, 0.0600)   (0.3127, 0.3290)
R.2020    (0.7080, 0.2920)   (0.1700, 0.7970)   (0.1310, 0.0460)   (0.3127, 0.3290)
[0096] CIExy coordinates of the color primaries of each of the standard color spaces in TABLE 1 and the corresponding white point may be denoted as (x.sub.R.sup.(c), y.sub.R.sup.(c)), (x.sub.G.sup.(c), y.sub.G.sup.(c)), (x.sub.B.sup.(c), y.sub.B.sup.(c)) and (x.sub.W.sup.(c), y.sub.W.sup.(c)), respectively, where (c) denotes the standard color space.
[0097] Hence, the P3 color space may be characterized by CIExy coordinates of the P3 color primaries and the P3 white point as follows: (x.sub.R.sup.(P3), y.sub.R.sup.(P3))=(0.6800, 0.3200), (x.sub.G.sup.(P3), y.sub.G.sup.(P3))=(0.2650, 0.6900), (x.sub.B.sup.(P3), y.sub.B.sup.(P3))=(0.1500, 0.0600) and (x.sub.W.sup.(P3), y.sub.W.sup.(P3))=(0.3127, 0.3290). Similarly, the R.2020 color space may be characterized by CIExy coordinates of the R.2020 color primaries and the R.2020 white point as follows: (x.sub.R.sup.(R2020), y.sub.R.sup.(R2020))=(0.7080, 0.2920), (x.sub.G.sup.(R2020), y.sub.G.sup.(R2020))=(0.1700, 0.7970), (x.sub.B.sup.(R2020), y.sub.B.sup.(R2020))=(0.1310, 0.0460) and (x.sub.W.sup.(R2020), y.sub.W.sup.(R2020))=(0.3127, 0.3290).
[0098] Denote a backward reshaped HDR color space for representing backward reshaped or reconstructed HDR images as an (a) color space. Hence, CIExy coordinates of the color primaries and white point defining the (a) color space may be denoted as (x.sub.R.sup.(a), y.sub.R.sup.(a)), (x.sub.G.sup.(a), y.sub.G.sup.(a)), (x.sub.B.sup.(a), y.sub.B.sup.(a)) and (x.sub.W.sup.(a), y.sub.W.sup.(a)), respectively.
[0099] As noted, green colors are less important or less distinguishable/perceptible by the HVS than red and blue colors. In some operational scenarios, the red and blue color primaries of the (a) color space may be selected to match those of the R.2020 color space, as follows: (x.sub.R.sup.(a), y.sub.R.sup.(a))=(x.sub.R.sup.(R2020), y.sub.R.sup.(R2020)); (x.sub.B.sup.(a), y.sub.B.sup.(a))=(x.sub.B.sup.(R2020), y.sub.B.sup.(R2020)).
[0100] In addition, like the R.709, P3 and R.2020 color spaces, all of which are specified with the D65 white point, the (a) color space can also be specified with the D65 white point, as follows: (x.sub.W.sup.(a), y.sub.W.sup.(a))=(0.3127, 0.3290).
[0101] The green color primary of the (a) color space can be selected along the line between the green color primary (x.sub.G.sup.(P3), y.sub.G.sup.(P3)) of the P3 color space and the green color primary (x.sub.G.sup.(R2020), y.sub.G.sup.(R2020)) of the R.2020 color space in the CIExy coordinate system or chromaticity diagram. Any point along the line between the two green color primaries of the P3 color space and the R.2020 color space can be represented as a linear combination of these two green color primaries with a weight factor a, as follows: (x.sub.G.sup.(a), y.sub.G.sup.(a))=a(x.sub.G.sup.(P3), y.sub.G.sup.(P3))+(1-a)(x.sub.G.sup.(R2020), y.sub.G.sup.(R2020)).
[0102] Hence, the optimization problem of finding the maximum support from the (a) color space for the R.2020 color space can be simplified as a problem to select the weight factor a. When a=0, the (a) color space becomes the entire R.2020 color space. When a=1, as illustrated in
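Since paragraph [0102] states that a=0 recovers the full R.2020 color space, the linear combination above can be sketched directly in code, with a=1 correspondingly yielding the P3 green primary (the CIExy values are taken from TABLE 1):

```python
# CIExy green primaries from TABLE 1
G_P3 = (0.2650, 0.6900)
G_R2020 = (0.1700, 0.7970)

def green_primary(a):
    """Green primary of the (a) color space: a linear combination of the
    P3 and R.2020 green primaries, with a=0 giving R.2020 and a=1 giving P3."""
    return tuple((1 - a) * c2020 + a * cp3
                 for cp3, c2020 in zip(G_P3, G_R2020))
```

Intermediate values of a trade gamut coverage (toward R.2020) against revertability (toward the narrower P3-like sub-space).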
Joint Color Space and TPB Optimization
[0103] As the range or coverage by the backward reshaped (a) color space over the R.2020 color space may be controlled by the parameter a, an overall TPB (based reshaping) optimization problem becomes how to optimize TPB coefficients in both forward and backward paths, such that (i) the forward reshaped SDR domain (or color space) for representing reshaped SDR images is close to a reference SDR domain (or color space) for representing reference SDR images, especially in neutral colors or color space portions which are more sensitive to the HVS than non-neutral colors or color space portions, and (ii) the backward reshaped or reconstructed HDR domain (or color space) for representing backward reshaped or reconstructed HDR images is as close to identical as possible to the reference HDR domain (or color space) for representing reference HDR images. Ideally, the backward reshaped or reconstructed HDR images are identical to the reference HDR images if a perfect reconstruction or full revertability can be realized.
[0104] As illustrated in
[0105]
[0106] Block 302 comprises setting a current (e.g., to-be-iterated, etc.) value for the parameter a to the next candidate value (e.g., initially the first candidate value, etc.) in the plurality of candidate values for the parameter a. Given a candidate backward reshaped HDR color space with the current value for the parameter a, optimized reshaping operational parameters such as optimized TPB coefficients can be obtained in one or more subsequent process flow blocks of
[0107] Block 304 comprises building sample points, or preparing two sampled data sets, in the candidate backward reshaped HDR color space. By way of example but not limitation, the candidate backward reshaped HDR color space may be a Hybrid Log Gamma (HLG) RGB color space (referred to as the (a) RGB color space HLG).
[0108] The first of the two sampled data sets is a uniformly sampled data set of color patches. Each color patch in the uniformly sampled data set of color patches is characterized or represented by a respective RGB color (denoted as
uniformly sampled from the (a) RGB color space HLG, which includes three dimensions denoted as R-axis, G-axis and B-axis for the R, G and B component colors, respectively. Each axis or dimension of the (a) RGB color space HLG is normalized to a value range of [0, 1] and sampled by N.sub.R, N.sub.G, and N.sub.B units or divisions, respectively. Here, each of N.sub.R, N.sub.G, and N.sub.B represents a positive integer greater than one (1). Hence, the total number of color patches or sampled data points in the uniformly sampled data set of color patches is given as follows:
[0109] Each uniformly sampled data point or RGB color {dot over (v)}.sub.ijk.sup.(u) in the uniformly sampled data set of color patches is given as follows:
[0110] For simplicity, (i, j, k) in expression (5) above can be vectorized or simply denoted as p. Correspondingly, the uniformly sampled data point or RGB color {dot over (v)}.sub.ijk.sup.(u) can be simply denoted as {dot over (v)}.sub.p.sup.(u). All N.sub.u nodes (each of which represents a unique combination of a specific R-axis unit/partition, a specific G-axis unit/partition and a specific B-axis unit/partition) can be grouped or collected into a vector/matrix as follows:
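The uniform sampling of blocks [0108] through [0110] can be sketched as follows, assuming the grid includes both endpoints of each normalized axis (the exact spacing of expression (5) is not reproduced here, so that detail is an assumption):

```python
from itertools import product

def uniform_color_patches(n_r, n_g, n_b):
    """Uniformly sample the normalized (a) RGB color space on an
    n_r x n_g x n_b grid (endpoints included), yielding the
    N_u = n_r * n_g * n_b color patches collected into one list
    (the vector/matrix of expression (6))."""
    return [(i / (n_r - 1), j / (n_g - 1), k / (n_b - 1))
            for i, j, k in product(range(n_r), range(n_g), range(n_b))]
```

For example, `uniform_color_patches(17, 17, 17)` would produce a 17-per-axis grid of 4913 sampled RGB colors.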
[0111] The second of the two sampled data sets prepared or built in block 304 is a neutral color data set. This second data set includes a plurality of neutral colors or neutral color patches (also referred to as gray colors or gray color patches).
[0112] The second data set may be used to preserve input gray color patches in an input domain as output gray color patches in an output domain when the input gray color patches in the input domain are mapped or reshaped into the output gray color patches in reshaping operations as described herein. The input gray color patches in the input domain (or input color space) may be given increased weighting in the optimization problem as compared with other color patches to reduce the likelihood of these input gray color patches being mapped to non-gray color patches in the output domain (or output color space) by the reshaping operations.
[0113] The second data set (the gray color data set or the data set of gray colors) may be prepared or built by uniformly sampling the R, G, B values along the line connecting a first gray color (0, 0, 0) and a second gray color (1, 1, 1) in the RGB domain (e.g., the (a) RGB color space HLG, etc.), giving rise to N.sub.n nodes or gray color patches, as follows:
[0114] All the N.sub.n nodes in the second data set can be grouped or collected into a neutral color vector/matrix as follows:
[0115] The neutral color vector/matrix in expression (8) above can be repeated N.sub.t (a positive integer no less than one (1)) times to generate N.sub.nN.sub.t neutral color patches in the (now repeated) second data set, as follows:
[0116] The repetition of neutral colors in the repeated second data set of neutral colors increases weighting of the neutral or gray colors as compared with other colors. Accordingly, the neutral colors can be preserved more in the optimization problem than the other colors.
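The neutral-color data set and its N.sub.t-fold repetition (expressions (7) through (9)) can be sketched as a short helper; the endpoint-inclusive spacing along the gray line is an assumption, as the exact formula of expression (7) is not reproduced here:

```python
def neutral_color_patches(n_n, n_t):
    """Sample n_n gray patches along the (0,0,0)-(1,1,1) line and repeat
    the set n_t times, so neutral colors carry n_t-fold weight in the
    optimization relative to the uniformly sampled color patches."""
    grays = [(g, g, g) for g in (i / (n_n - 1) for i in range(n_n))]
    return grays * n_t
```

Concatenating this list with the output of the uniform sampler yields the combined vector/matrix of expression (10), with N = N.sub.nN.sub.t + N.sub.u rows.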
[0117] The first data set of (all sampled) colors and the second (repeated) data set of neutral colors in expressions (6) and (9) can be collected or placed together in a single combined vector/matrix, as follows:
[0118] The total number of vector/matrix elements (repeated and non-repeated color patches) in the combined vector/matrix is N=N.sub.nN.sub.t+N.sub.u. Each vector/matrix element or color patch (row) in expression (10) above may be denoted as follows:
[0119] Block 306 comprises converting color values of the color patches (rows) represented in the vector/matrix elements of the combined vector/matrix
in the (a) RGB color space (or the (a) RGB color space HLG) to corresponding color values in a standard-based R.2020 color space or an R.2020 RGB color space HLG.
[0120] The three red, green and blue primary colors in the (a) RGB color space HLG, which correspond to three corners or points of a triangle defining the (a) RGB color space HLG in the CIExy chromaticity diagram, can be converted from CIE xy values to CIE XYZ values via the following equations (with Y normalized to one (1)): X=x/y; Y=1; Z=(1-x-y)/y (12)
[0121] Given
as the red, green and blue primary colors in (x, y) or CIE xy values, the same red, green and blue primary colors in (X, Y, Z) or CIE XYZ values, respectively denoted as
can be obtained using the conversion equations in expressions (12) above. Similarly, given
as the white point in (x, y) or CIE xy values, the same white point in (X, Y, Z) or CIE XYZ values, denoted as
can be obtained using the same conversion equations in expressions (12) above.
[0122] A 3×3 conversion matrix denoted as P.sup.(a).fwdarw.XYZ can be constructed to convert the color values of the color patches (rows) represented in the vector/matrix elements of the combined vector/matrix
in (x, y) or CIE xy values to corresponding (X, Y, Z) or CIE XYZ values, as follows
where ⊙ denotes elementwise dot multiplication between the two matrixes on the right hand side (RHS) in expression (13); and the two matrixes on the RHS in expression (13) can be constructed as follows:
[0123] An overall 3×3 conversion matrix denoted as P.sup.(a).fwdarw.R2020 can be constructed to convert the color values of the color patches (rows) represented in the vector/matrix elements of the combined vector/matrix in the (a) RGB color space HLG to corresponding color values in the R.2020 RGB color space HLG, as follows:
where P.sup.XYZ.fwdarw.R2020 denotes a 3×3 conversion matrix that converts XYZ color values to corresponding values in the R.2020 RGB color space HLG.
[0124] Hence, in block 306, the color values of the color patches (rows) represented in the vector/matrix elements of the combined vector/matrix
in the (a) RGB color space (or the (a) RGB color space HLG) can be converted to the corresponding color values in a standard-based R.2020 color space or an R.2020 RGB color space HLG, as follows:
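The construction of the (a)-to-XYZ matrix from chromaticity coordinates (expressions (12) through (14)) follows a standard colorimetric recipe: place the XYZ values of the three primaries in the matrix columns, then scale each column so that RGB=(1, 1, 1) maps to the white point. The sketch below solves for the scales with Cramer's rule instead of forming the elementwise product of expression (13), but the resulting matrix is the same:

```python
def xy_to_XYZ(x, y):
    """CIE xy chromaticity to XYZ tristimulus, normalized so Y = 1
    (expression (12))."""
    return (x / y, 1.0, (1.0 - x - y) / y)

def rgb_to_xyz_matrix(red_xy, green_xy, blue_xy, white_xy):
    """3x3 matrix taking linear RGB in the given primaries to CIE XYZ.
    Columns are the primaries' XYZ values, scaled so that RGB=(1,1,1)
    maps to the white point's XYZ."""
    cols = [xy_to_XYZ(*p) for p in (red_xy, green_xy, blue_xy)]
    w = xy_to_XYZ(*white_xy)
    M = [[cols[c][r] for c in range(3)] for r in range(3)]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    # Solve M @ s = w for the per-primary scales s by Cramer's rule.
    d = det3(M)
    s = []
    for c in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][c] = w[r]
        s.append(det3(Mc) / d)
    return [[M[r][c] * s[c] for c in range(3)] for r in range(3)]
```

Composing this matrix with an XYZ-to-R.2020 matrix (the inverse of the matrix built from the R.2020 primaries) yields P.sup.(a).fwdarw.R2020 of expression (15).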
[0125] Block 308 comprises converting color values of the color patches (rows) represented in the vector/matrix elements of the vector/matrix
in the R.2020 RGB color space HLG to corresponding color values (denoted as
in an R.2020 YCbCr color space HLG, as follows:
where
denotes a conversion function or matrix that converts the (R.2020 RGB) color values in the R.2020 RGB color space HLG to the corresponding (HDR R.2020 YCbCr) color values in the R.2020 YCbCr color space HLG.
[0126] Each color patch (row) in
may be denoted as follows:
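The RGB-to-YCbCr step of expression (17) can be sketched with the BT.2020 non-constant-luminance luma weights (K.sub.R=0.2627, K.sub.B=0.0593); the exact conversion matrix used in the pipeline is not reproduced here, so these coefficients are an illustrative assumption:

```python
KR, KB = 0.2627, 0.0593   # BT.2020 (non-constant-luminance) luma weights
KG = 1.0 - KR - KB

def rgb_to_ycbcr(r, g, b):
    """Normalized R.2020 RGB -> YCbCr (Y in [0, 1]; Cb, Cr in [-0.5, 0.5])."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr
```

Note that any gray input (r = g = b) maps to Cb = Cr = 0, consistent with the neutral-color preservation goal of the optimization.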
[0127] Block 310 comprises converting or content mapping color values of the color patches (rows) represented in the vector/matrix elements of the vector/matrix
in the R.2020 RGB color space HLG to corresponding color values (denoted as
in an R.709 SDR YCbCr color space, as follows:
where HLG→W_SDR( ) denotes a white box HDR-to-SDR mapping function that converts the HDR R.2020 RGB color values
in the R.2020 RGB color space HLG to corresponding R.709 SDR RGB color values (denoted as
in the R.709 SDR RGB color space; and
denotes a conversion function or matrix that then converts the R.709 SDR RGB color values
in the R.709 SDR RGB color space to the corresponding color values
in the R.709 SDR YCbCr color space. A non-limiting example of the white box HDR-to-SDR mapping function is described in BT.2390-10, High dynamic range television for production and international programme exchange, (November 2021), which is incorporated herein by reference in its entirety. Other examples of white-box HDR-to-SDR mapping functions may include, but are not necessarily limited to only, any of: standard based HDR-to-SDR conversion functions, proprietary HDR-to-SDR conversion functions, HDR-to-SDR conversion functions with linear and/or non-linear domains, HDR-to-SDR mapping functions using gamma functions, power functions, etc.
[0128] Each color patch (row) in
may be denoted as follows:
[0129] Block 312 comprises converting or content mapping color values of the color patches (rows) represented in the vector/matrix elements of the vector/matrix
in the R.2020 RGB color space HLG to corresponding color values (denoted as
in an R.2020 YCbCr color space PQ, as follows:
where HLG→PQ( ) denotes an HLG-to-PQ conversion function that converts the HDR R.2020 RGB HLG color values
in the R.2020 RGB color space HLG to corresponding HDR R.2020 RGB PQ color values (denoted as
in the R.2020 HDR RGB color space PQ, for example using transfer functions described in the previously mentioned SMPTE 2084 and Rec. ITU-R BT.2100; and
denotes a conversion function or matrix that then converts the HDR R.2020 RGB PQ color values
in the R.2020 HDR RGB color space PQ to the corresponding color values
in the R.2020 YCbCr color space PQ, for example using transfer functions described in the previously mentioned SMPTE 2084 and Rec. ITU-R BT.2100.
[0130] Each color patch (row) in
may be denoted as follows:
[0131] Block 314 comprises taking
as input to a (TPB) BESA algorithm, which may be enhanced or modified with neutral color preservation, to obtain or generate forward and backward reshaping operational parameters such as forward and backward TPB coefficients.
[0132] A BESA algorithm is an iterative algorithm in which each current iteration may modify a reference SDR signal according to backward prediction errors measured or determined in a previous iteration.
[0133] A modified BESA algorithm as described herein may implement neutral color preservation and avoid modifying neutral color patches. To do so, a set of neutral color patch indexes that identify or correspond to neutral color patches may be generated as follows:
where δ represents a neutral color value threshold that determines or specifies a maximum range of color deviation, in absolute value, within which color values qualify as neutral colors. Example values for the neutral color value threshold δ may include, but are not necessarily limited to only, any of: 1/2048, 1/1024, etc.
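To make the construction concrete, here is a minimal sketch, assuming normalized YCbCr codewords in [0, 1], the chroma-neutral value 0.5, and δ as the threshold symbol (names are illustrative, not from the source):

```python
import numpy as np

def neutral_color_indexes(cb, cr, delta=1.0 / 2048):
    """Indexes of color patches whose Cb and Cr codewords both deviate
    from the neutral value 0.5 by no more than the threshold delta."""
    cb = np.asarray(cb, dtype=float)
    cr = np.asarray(cr, dtype=float)
    return np.where((np.abs(cb - 0.5) <= delta) & (np.abs(cr - 0.5) <= delta))[0]

# Patches 0 and 1 are within delta of gray; patch 2 is strongly colored.
idx = neutral_color_indexes([0.5, 0.5002, 0.9], [0.5, 0.4999, 0.5])
```

With δ = 1/2048, only patches whose Cb and Cr codewords both sit within δ of 0.5 are collected into the neutral set.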
[0134] Let ch denote a channel in (e.g., three channels of, etc.) the forward reshaped SDR domain or color space. Let F denote forward reshaping or forward path. Let B denote backward reshaping or backward path.
[0135] A per-channel forward generation matrix (also referred to as a design matrix) denoted as
can be used to predict forward reshaped SDR color patches as characterized by corresponding forward reshaped SDR codewords from input HDR color patches
as characterized by corresponding input HDR codewords derived with expression (19). The forward generation matrix can be generated or pre-calculated from cross-channel forward TPB basis functions
stored or cached in computer memory, and fixed in all iterations for the forward path, as follows:
where the cross-channel forward TPB basis functions
take or accept the color patches (or rows) in
as shown in expression (20) above as input parameters.
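The structure of the generation matrix in expression (28) can be sketched as follows. The tensor-product (cross-channel) construction follows the description above, but the concrete 1-D basis functions used here (a constant and an identity function) are illustrative placeholders, not the actual TPB B-spline basis:

```python
import numpy as np
from itertools import product

def generation_matrix(patches, basis_1d):
    """Build a design matrix S: one row per color patch, one column per
    cross-channel basis function. Each cross-channel basis value is the
    product of per-channel 1-D basis values (tensor product)."""
    patches = np.asarray(patches, dtype=float)   # shape (P, 3): Y, Cb, Cr
    cols = []
    for by, bcb, bcr in product(basis_1d, repeat=3):
        cols.append(by(patches[:, 0]) * bcb(patches[:, 1]) * bcr(patches[:, 2]))
    return np.stack(cols, axis=1)                # shape (P, len(basis_1d)**3)

# Hypothetical 1-D basis: constant and identity (a real TPB uses B-splines).
basis = [lambda t: np.ones_like(t), lambda t: t]
S = generation_matrix(np.random.rand(10, 3), basis)
```

Because the basis functions and sampled input HDR color patches are fixed, this matrix can be pre-calculated once and cached for all forward-path iterations.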
[0136] The forward reshaped SDR color patches may be predicted by multiplying the forward generation matrix in expression (28) with forward TPB coefficients. Example predictions of reshaped codewords (which are equivalent or similar to color patches herein) with TPB coefficients are described in the previously mentioned U.S. Provisional Patent Application Ser. No. 62/908,770.
[0137] The forward reshaped SDR color patches or codewords can be backward reshaped into backward reshaped HDR color patches or codewords, for example through TPB based backward reshaping operations.
[0138] In the backward path, at each iteration (e.g., k-th iteration, etc.) in the BESA algorithm, a per-channel backward generation matrix denoted as
can be used to predict backward reshaped HDR color patches as characterized by corresponding backward reshaped HDR codewords from the forward reshaped SDR color patches as characterized by the corresponding forward reshaped SDR codewords derived in the forward path. A backward generation matrix for each iteration can be generated from cross-channel forward TPB basis functions
and not fixed in all iterations for the backward path, as follows:
where
represent the forward reshaped SDR color patches or codewords.
[0139] The backward reshaped HDR color patches may be predicted by multiplying the backward generation matrix in expression (29) with backward TPB coefficients.
[0140] Backward prediction errors for each iteration in the BESA algorithm can be determined by comparing the backward reshaped HDR color patches or codewords collected in a vector/matrix with reference HDR color patches or codewords in a per-channel backward observation vector/matrix derived from expression (25). The per-channel backward observation vector/matrix can be generated or pre-calculated from expression (25), stored or cached in computer memory, and fixed in all iterations, as follows:
[0141] A reference SDR signal or reference SDR color patches or codewords may be used as prediction targets for the forward reshaping operations to generate the forward reshaped SDR color patches or codewords to approximate the reference SDR color patches or codewords. In the BESA algorithm, the reference SDR color patches or codewords
where k denotes an iteration index, starting from zero (0) for the first iteration in the BESA algorithm, may be initialized to the SDR color patches or codewords derived with expression (22) above for the very first iteration (k=0), and thereafter may be modified at the end of each iteration based in part or in whole on the prediction errors determined for that iteration. The modified reference SDR color patches or codewords may be used in the next iteration of the BESA algorithm as prediction targets for the forward reshaping operations, which generate forward reshaped SDR color patches or codewords that approximate the modified reference SDR color patches or codewords.
[0142] Per-channel reference SDR color patches or codewords at iteration k may be used to construct a vector/matrix
as follows:
[0143] Before the first iteration, the vector in expression (31) above is set to the original reference SDR signal represented by expressions (22) and (23), as follows:
[0144] At iteration k, optimized values for the (e.g., per-channel, etc.) forward TPB coefficients (denoted as
can be generated via a least-squares solution to an optimization problem that minimizes differences between the forward reshaped SDR color patches or codewords and the reference SDR color patches or codewords determined for the iteration, as follows:
[0145] The predicted SDR color patches or codewords at iteration k for channel ch can be computed from the optimized values for the (e.g., per-channel, etc.) forward TPB coefficients, as follows:
[0146] At iteration k, optimized values for the (e.g., per-channel, etc.) backward TPB coefficients (denoted as
can be generated via a least-squares solution to an optimization problem that minimizes differences between the backward reshaped HDR color patches or codewords and the reference HDR color patches or codewords, the latter of which are fixed for all the iterations in the BESA algorithm, as follows:
[0147] The predicted HDR color patches or codewords at iteration k for channel ch can be computed from the optimized values for the (e.g., per-channel, etc.) backward TPB coefficients, as follows:
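Both per-channel fits above reduce to ordinary least squares against a generation/design matrix. A minimal sketch, assuming the matrix and the target codewords are available as NumPy arrays (names are illustrative):

```python
import numpy as np

def solve_tpb_coefficients(S, target):
    """Least-squares fit of per-channel TPB coefficients m minimizing
    ||S @ m - target||^2, where S is the generation matrix and target
    holds the reference codewords for one channel at this iteration."""
    m, *_ = np.linalg.lstsq(S, target, rcond=None)
    return m

# Synthetic check: with a consistent, full-rank system the fit recovers
# the coefficients that generated the target exactly (up to float error).
rng = np.random.default_rng(0)
S = rng.random((50, 8))
m_true = rng.random(8)
m_hat = solve_tpb_coefficients(S, S @ m_true)
```

The same call serves both paths: the forward fit uses the fixed forward generation matrix, while the backward fit rebuilds its generation matrix from the forward reshaped SDR codewords of the current iteration.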
[0148] The backward prediction error is then computed and propagated back to the non-neutral portion of the reference SDR signal.
[0149] As noted, the backward prediction errors for each iteration in the BESA algorithm can be determined by comparing the backward reshaped HDR color patches or codewords with the reference HDR color patches or codewords derived from expression (25), the latter of which are fixed for all the iterations in the BESA algorithm.
[0150] Among the backward prediction errors, backward prediction errors for non-neutral colors (or non-gray colors) can be propagated back to update or modify the reference SDR color patches or codewords for these non-neutral colors (or non-gray colors). The modified SDR color patches or codewords for the non-neutral colors (or non-gray colors) can be combined with the (non-modified) reference SDR color patches or codewords for the neutral colors (or gray colors) to serve as prediction targets for the forward path in the next or upcoming iteration of the BESA algorithm.
[0151] In some operational scenarios, the backward prediction error can be computed as differences between the original or reference HDR signal (or the reference HDR color patches/codewords therein) and a predicted HDR signal (or the backward reshaped HDR color patches/codewords therein) at iteration k, as follows:
[0152] In response to determining that a color patch i is in the neutral color set, reference SDR codewords for the color patch in the Cb and Cr channels of the reference SDR signal for the upcoming iteration (k+1) can be set to gray color values such as 0.5, as follows:
[0153] Otherwise, in response to determining that a color patch i is not in the neutral color set, the reference SDR codewords for the color patch in the Cb and Cr channels of the reference SDR signal for the upcoming iteration (k+1) can be set to updated or modified values according to the backward prediction errors, as follows:
where ε represents a backward prediction error threshold above which a backward prediction error is propagated to update the color patch of a non-neutral color; μ^ch represents a fixed or configurable learning rate used to scale the backward prediction error; and Δ represents an upper bound, or maximum allowed magnitude, for modifying the reference codewords of the color patch of the non-neutral color. An example value of ε may be, but is not necessarily limited to only, 0.000005. An example value of μ^ch may be, but is not necessarily limited to only, 1. An example value of Δ may be, but is not necessarily limited to only, 2 (which effectively imposes no limit, as SDR and/or HDR codewords herein may be normalized to a normalized domain or range of 0 to 1).
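A sketch of one reference-signal update combining expressions (38) through (40); the parameter names (eps for the error threshold, lr for the learning rate, max_step for the upper bound) and the sign convention of the update are illustrative assumptions:

```python
import numpy as np

def update_reference_sdr(ref_sdr, bwd_error, neutral_idx,
                         eps=5e-6, lr=1.0, max_step=2.0):
    """One BESA reference-signal update for a chroma channel (sketch).
    Non-neutral patches move by the scaled backward prediction error,
    clamped to +/- max_step, and only when the error magnitude exceeds
    eps; neutral patches are pinned to gray (0.5)."""
    ref = np.asarray(ref_sdr, dtype=float).copy()
    err = np.asarray(bwd_error, dtype=float)
    step = np.clip(lr * err, -max_step, max_step)   # bounded correction
    move = np.abs(err) > eps                        # error threshold gate
    ref[move] += step[move]
    ref[neutral_idx] = 0.5                          # neutral color preservation
    return ref

# Patch 2 is neutral, so it stays at 0.5 even though its error is large.
ref = update_reference_sdr([0.4, 0.6, 0.5], [0.01, 0.0, 0.2], neutral_idx=[2])
```

The modified reference then serves as the forward-path prediction target in the next iteration, while neutral patches remain pinned to gray throughout.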
[0154] The neutral color preservation enforced with expressions (38) through (40) prevents neutral colors in the reference SDR signal from deviating to or toward non-neutral colors across all the iterations of the BESA algorithm. As a result, gray-level-related perceptual qualities in reshaped images are improved.
[0155] The BESA algorithm may be iterated up to a total number of iterations. In some operational scenarios, the total number of iterations for the BESA algorithm may be specifically selected to balance color deviations in reshaped SDR and/or HDR images. For example, different total numbers of iterations in the BESA algorithm may generate forward and backward TPB coefficients that produce reshaped SDR images of different SDR looks and reshaped HDR images of different HDR looks. A total number of iterations, such as ten, fifteen, etc., that generates reshaped SDR and HDR images of relatively high quality looks can be selected for the BESA algorithm.
[0156] The BESA algorithm with neutral color preservation can be performed for each candidate value for the parameter a. For example, in block 314, the BESA algorithm with neutral color preservation can be performed for the current candidate value for the parameter a.
[0157] Block 316 comprises determining whether the current candidate value for the parameter a is the last candidate value in the plurality of candidate values for the parameter a. If so, the process flow goes to block 318. Otherwise, the process flow goes back to block 302.
[0158] Block 318 comprises selecting an optimal or optimized value for the parameter a as well as computing (or simply selecting already computed) the optimized forward and backward TPB coefficients for the (a) RGB color space corresponding to the optimized value for the parameter a.
[0159] The selection of the optimized value for the parameter a can be formulated as an optimization problem to find (a specific value set for)
such that (1) the (a) RGB color space represents the largest HDR color space (e.g., to cover the largest portion of R.2020 RGB color space, etc.) to achieve minimized HDR prediction errors denoted as
and (2) forward reshaped SDR color patches or codewords have relatively small or acceptable SDR color deviations denoted as
as follows:
where w is a weighting factor used to balance between HDR and SDR errors.
[0160] This optimization problem can be solved by checking results from each a RGB color space using its corresponding optimal or optimized TPB coefficients m_{F,a}^{ch} and m_{B,a}^{ch}. Among all possible or candidate a values, the smallest a value (corresponding to the largest color space coverage in the R.2020 color space) that can achieve minimized HDR errors with relatively small SDR errors is selected. The HDR prediction errors
and the SDR color deviations
may be subjective (e.g., with input from human observers, etc.) or objective distortion functions (e.g., with no or little input from human observers, etc.), such as those measured with an image dataset from a database. Example distortion functions may include, but are not necessarily limited to only, any of: mean squared error (or deviation) or MSE, root mean squared error (or deviation) or RMSE, sum of absolute difference or SAD, peak signal-to-noise ratio or PSNR, structural similarity index (SSIM), etc.
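A sketch of the candidate selection in block 318, assuming per-candidate HDR and SDR distortions have already been measured over an image dataset (the example distortion values and the tie-break toward the smaller a value are illustrative assumptions):

```python
def select_optimal_a(candidates, hdr_err, sdr_err, w=0.5):
    """Pick the candidate a minimizing the weighted combined distortion
    w * hdr_err(a) + (1 - w) * sdr_err(a), breaking ties toward the
    smaller a (the larger color space coverage in R.2020)."""
    return min(sorted(candidates),
               key=lambda a: (w * hdr_err[a] + (1 - w) * sdr_err[a], a))

# Hypothetical measured distortions (e.g., MSE) per candidate a value.
hdr = {0.25: 0.020, 0.50: 0.010, 0.75: 0.010}
sdr = {0.25: 0.030, 0.50: 0.006, 0.75: 0.008}
a_opt = select_optimal_a(hdr.keys(), hdr, sdr, w=0.5)
```

Here the weighting factor w balances HDR prediction errors against SDR color deviations, mirroring the formulation above.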
[0161] The figures referenced here (figure callouts were not preserved in this text) plot distortions
of the backward reshaped HDR images as compared with reference HDR images which the backward reshaped HDR images are to approximate.
[0162]-[0163] The vertical axes of those figures show the respective measured distortions.
[0164] In some operational scenarios, in block 318, the image dataset used to determine distortions in the reshaped SDR and HDR images and select the optimized value for the parameter a based in part or in whole on the distortions may include video bitstreams acquired with one or more specific video capturing devices such as mobile phone(s) (e.g., supporting video coding profiles in SMPTE ST 2094, supporting Dolby Vision Profile 8.4, etc.). A non-limiting example of the optimized value for the parameter a may be, but is not necessarily limited to only, 0.5, which determines the largest color space coverage in the R.2020 color space for the backward reshaped HDR domain or color space. If a wider backward reshaped color space (e.g., corresponding to a value less than the optimized value for the parameter a, etc.) is used or supported, reshaped SDR colors and codewords can start to deviate from reference SDR colors and codewords with relatively large distortions. This is because enlarging the backward reshaped HDR domain or color space means squeezing distinguishable forward reshaped SDR colors (or codewords) more tightly into the forward reshaped SDR domain or color space, causing the forward reshaped SDR colors to move away from the original 3D positions represented by the reference SDR colors (or codewords) that they are to approximate.
[0165] As noted, forward and backward TPB coefficients for TPB based reshaping operations can be determined for a given value, such as the optimized value, for the parameter a. These TPB coefficients can be multiplied with a forward or backward generation matrix constructed with TPB basis functions using input codewords as input parameters to generate forward or backward reshaped codewords, which may entail numerous computations, as TPB basis function values would be computed for each of the numerous pixels of an image.
[0166] In some operational scenarios, to speed up or reduce computations, forward and backward 3D-LUTs may be constructed from pre-computed TPB basis function values from sampled values and the optimized forward and backward TPB coefficients that are generated for the optimized value for the parameter a. The forward and backward 3D-LUTs, or lookup nodes/entries therein, may be pre-built before being deployed at runtime to process input images, and applied with relatively simple lookup operations in the forward and backward paths or corresponding forward and backward reshaping operations performed therein with respect to the input images at the runtime.
[0167] Denote the optimized value for the parameter a as a^opt. Denote the corresponding optimal forward and backward TPB coefficients as
respectively.
[0168] The forward 3D-LUT can be used to forward reshape input HDR colors (or codewords) in an input HDR domain or color space such as the R.2020 domain or color space (in which the a color space is contained) into forward reshaped SDR colors (or codewords) in the forward reshaped SDR domain or color space.
[0169] The a color space identified inside the input HDR domain or color space such as the R.2020 (container) domain or color space may be used to clip out HDR colors or codewords represented outside the a color space. The forward TPB coefficients can be applied to input HDR colors or codewords in the a color space identified inside the input HDR domain or color space such as the R.2020 (container) domain or color space to generate predicted or forward reshaped SDR colors or codewords for each lookup node or entry in the forward 3D-LUT. As a result, the forward 3D-LUT includes a plurality of lookup nodes or entries each of which maps or forward reshapes a respective input (cross-channel or three-channel) HDR color or codeword to a corresponding predicted or forward reshaped (cross-channel or three-channel) SDR color or codeword.
[0170] In some operational scenarios, a forward 3D-LUT construction process includes a first step in which a 3D uniform sampling grid is prepared in the input HDR domain or color space such as the R.2020 (container) domain or color space.
[0171] By way of illustration but not limitation, the input HDR domain or color space may be the R.2020 YCbCr color space HLG, which includes three dimensions or channels, namely the Y-axis, Cb-axis and Cr-axis. Input HDR values (denoted as
made up of normalized values in a value range of [0, 1] in each of the axes can be uniformly sampled in each dimension or channel, as follows:
where the total numbers of per-axis sampled values along the Y, Cb and Cr axes are denoted as N_Y, N_Cb, and N_Cr, respectively. Hence, the combined total number of sampled values across all these axes is N_u = N_Y × N_Cb × N_Cr.
[0172] While valid values in the entire R.2020 YCbCr color space may be limited (not to cover the entire 3D cube of YCbCr values), the entire sampling value grid may be used to cover the entire 3D cube of YCbCr values in order to reduce the likelihood of codeword deviations caused by compression operations.
[0173] For simplicity, (i, j, k) may be vectorized or denoted as p. Hence, the sampled values
in the R.2020 YCbCr color space may be denoted as
A vector/matrix can be constructed by collecting all N.sub.u nodes together, as follows:
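The first step of the forward 3D-LUT construction can be sketched as follows, assuming normalized [0, 1] codewords and illustrative per-axis sample counts:

```python
import numpy as np

def build_sampling_grid(n_y=17, n_cb=17, n_cr=17):
    """Uniformly sample the normalized [0, 1] YCbCr cube; returns an
    (n_y * n_cb * n_cr, 3) matrix with one vectorized node p per row."""
    y = np.linspace(0.0, 1.0, n_y)
    cb = np.linspace(0.0, 1.0, n_cb)
    cr = np.linspace(0.0, 1.0, n_cr)
    grid = np.stack(np.meshgrid(y, cb, cr, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)          # collect all N_u nodes together

nodes = build_sampling_grid(3, 3, 3)    # small grid for illustration
```

As noted above, the full cube is sampled even where YCbCr values may be invalid, to reduce the likelihood of codeword deviations caused by compression.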
[0174] The forward 3D-LUT construction process includes a second step in which the sampled values in the R.2020 YCbCr color space HLG may be converted to corresponding values (or colors) in the R.2020 RGB color space HLG, as follows:
[0175] The forward 3D-LUT construction process includes a third step in which the converted values in the R.2020 RGB color space HLG may be converted to corresponding values (or colors) in the a RGB color space HLG corresponding to the optimized value (denoted as a.sup.opt) for the parameter a, as follows:
[0176] The forward 3D-LUT construction process includes a fourth step in which the converted values in the (a) RGB color space HLG may be clipped as follows:
[0177] The forward 3D-LUT construction process includes a fifth step in which the clipped values in the a RGB color space HLG may be converted to corresponding values (or colors) in the R.2020 RGB color space HLG, as follows:
[0178] The forward 3D-LUT construction process includes a sixth step in which the values in the R.2020 RGB color space HLG derived with expression (47) above may be converted to corresponding values (or colors) in the R.2020 YCbCr color space HLG, as follows:
[0179] The forward 3D-LUT construction process includes a seventh step in which, for each lookup node/entry in the forward 3D-LUT, the optimized forward TPB coefficients are used with a forward generation matrix
built with the input HDR colors or codewords (or codeword values) in the R.2020 YCbCr color space HLG derived with expression (48) above as input parameters to forward TPB basis functions to obtain the mapped or forward reshaped SDR colors or codewords (or codeword values)
as follows:
[0180] In expression (49) above, the forward generation matrix
can be built or constructed using the input HDR colors or codewords (or codeword values)
as input to the forward TPB basis functions, as follows:
[0181] The mapped or forward reshaped SDR colors or codewords (or codeword values)
may be used as lookup values of the lookup nodes/entries of the forward 3D-LUT, whereas the input HDR colors or codewords (or codeword values) used as the input parameters to the forward TPB basis functions may be used as lookup keys of the lookup nodes/entries of the forward 3D-LUT. At runtime, mapped or forward reshaped SDR colors or codewords can be simply looked up in the forward 3D-LUT with lookup keys as input HDR colors or codewords to be forward reshaped into the mapped or forward reshaped SDR colors or codewords.
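The runtime lookup can be sketched as below; for simplicity this uses nearest-node quantization rather than interpolation (a deployed LUT would typically interpolate, e.g., trilinearly), and the flat index layout is an assumption:

```python
import numpy as np

def lut_lookup_nearest(lut_values, ycbcr, n=17):
    """Runtime forward reshaping via a pre-built 3D-LUT: quantize each
    normalized YCbCr input to its nearest grid node and return the
    stored forward reshaped SDR value for that node."""
    q = np.clip(np.rint(np.asarray(ycbcr) * (n - 1)).astype(int), 0, n - 1)
    flat = (q[..., 0] * n + q[..., 1]) * n + q[..., 2]   # assumed Y-major layout
    return lut_values[flat]

n = 17
lut = np.arange(n ** 3)   # placeholder lookup values, one per grid node
out = lut_lookup_nearest(lut, np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]), n)
```

All the expensive TPB basis evaluations are paid once at LUT build time; each pixel at runtime costs only a quantization and an index.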
[0182] The backward 3D-LUT as mentioned previously can be used to backward reshape the reshaped SDR colors (or codewords) in the forward reshaped SDR domain or color space into backward reshaped HDR colors (or codewords) in the backward reshaped HDR domain or color space such as the R.2020 domain or color space (in which the a color space is contained).
[0183] In some operational scenarios, a backward 3D-LUT construction process, which may be simpler than the forward 3D-LUT construction process as previously discussed, may be implemented or performed to build or construct the backward 3D-LUT. In some operational scenarios, to speed up or reduce computations, the backward 3D-LUT may be constructed from pre-computed TPB basis function values from sampled values and the optimized backward TPB coefficients that are generated for the optimized value for the parameter a. The backward 3D-LUT or lookup nodes/entries therein may be pre-built before being deployed at runtime to process input images such as forward reshaped SDR images, and applied with relatively simple lookup operations in the backward path or corresponding backward reshaping operations performed therein with respect to the input images at the runtime.
[0184] The pre-built backward 3D-LUT can be deployed at the decoder side. A recipient downstream device at the decoder side can receive and decode a video signal encoded with forward reshaped SDR images in the forward reshaped SDR domain or color space, and use the 3D-LUT to apply backward reshaping to the forward reshaped SDR images to generate backward reshaped HDR images in the backward reshaped HDR domain or color space such as the a color space contained in the R.2020 domain or color space.
[0185] While a shape formed or delineated by the full boundary of the forward reshaped SDR domain or color space mapped with TPB-based forward reshaping operations may not be a simple 3D cube, forward reshaped SDR values in the forward reshaped SDR domain or color space may be clipped to the tightest or smallest 3D cube (e.g., a 3D rectangle, a rescaled 3D cube from a 3D rectangle, etc.) that contains or supports all forward reshaped SDR values represented in all the lookup nodes/entries of the forward 3D-LUT, for example without clipping these SDR values represented in the forward 3D-LUT.
[0186] In some operational scenarios, the backward 3D-LUT construction process comprises a first step in which a full sampling value grid may be prepared in the forward reshaped SDR domain or color space such as the entire R.709 SDR YCbCr color space.
[0187] The backward 3D-LUT construction process comprises a second step in which min and max values for each dimension or (color) channel in the forward reshaped SDR domain or color space may be determined among the forward reshaped SDR values in the forward 3D-LUT and used to limit (input) data ranges of the forward reshaped SDR (e.g., YCbCr, etc.) colors or codewords that serve as input for the backward path.
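The second step can be sketched as a per-channel clamp, assuming the forward 3D-LUT's output SDR values are held in an (N, 3) array:

```python
import numpy as np

def limit_sdr_ranges(forward_lut_sdr, sdr_input):
    """Clamp incoming forward reshaped SDR codewords to the per-channel
    [min, max] spanned by the forward 3D-LUT's output values, so that the
    backward path only sees codewords the forward path can produce."""
    lo = forward_lut_sdr.min(axis=0)
    hi = forward_lut_sdr.max(axis=0)
    return np.clip(sdr_input, lo, hi)

# Hypothetical forward-LUT outputs spanning [0.1, 0.8] x [0.2, 0.7] x [0.3, 0.6].
lut_vals = np.array([[0.1, 0.2, 0.3], [0.8, 0.7, 0.6]])
clamped = limit_sdr_ranges(lut_vals, np.array([[0.0, 0.5, 1.0]]))
```

This mirrors the observation above that the forward reshaped SDR domain is not a full 3D cube, so the backward LUT's input grid is limited to the occupied ranges.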
[0188] The backward 3D-LUT construction process includes a third step in which, for each lookup node/entry in the backward 3D-LUT, the optimized backward TPB coefficients are used with a backward generation matrix
(e.g., as illustrated in expression (29) above, etc.) built with the forward reshaped SDR colors or codewords (or codeword values) in the clipped or limited (input) data ranges derived in the second step of the backward 3D-LUT construction process as input parameters to backward TPB basis functions to obtain the mapped or backward reshaped HDR colors or codewords (or codeword values).
[0189] The mapped or backward reshaped HDR colors or codewords (or codeword values) may be used as lookup values of the lookup nodes/entries of the backward 3D-LUT, whereas the input or forward reshaped SDR colors or codewords (or codeword values) used as the input parameters to the backward TPB basis functions may be used as lookup keys of the lookup nodes/entries of the backward 3D-LUT. At runtime, mapped or backward reshaped HDR colors or codewords can be simply looked up in the backward 3D-LUT with lookup keys as input or forward reshaped SDR colors or codewords to be backward reshaped into the mapped or backward reshaped HDR colors or codewords.
White Box Backward Optimization (WB)
[0190] Forward TPB-based reshaping or a corresponding forward 3D-LUT may not be used in all video capturing and/or editing devices owing to costs and/or computation overheads involved in implementing or operating with forward TPB-based reshaping or forward 3D-LUT in video capturing and/or editing applications.
[0191] In some operational scenarios, a hardware-based solution such as implemented with an available ISP may be used in a video capturing and/or editing device to perform HDR-to-SDR conversion such as HDR HLG to SDR image generation. An existing ISP pipeline deployed with the device can operate with one or more programmable parameters to generate, convert and/or output SDR images based in part or in whole on corresponding HDR (e.g., HLG, etc.) images acquired by the device.
[0192] In the forward path, the programmable parameters for the ISP pipeline can be specifically set or configured to cause the ISP pipeline to output SDR images that approximate, as closely as possible, reference SDR images generated with a white box HDR-to-SDR conversion function.
[0193] In the backward path, backward TPB-based reshaping or a corresponding backward 3D-LUT may be used to generate backward reshaped HDR images from the SDR images outputted from the ISP pipeline. Backward TPB coefficients used in the backward path may be optimized, for example to cover a backward reshaped HDR domain or color space such as the R.2020 color space as much as possible.
[0194]
[0195] In some operational scenarios, the ISP pipeline may be a relatively limited programmable module implemented with an ISP (hardware) to approximate a white-box HDR-to-SDR conversion function such as a white box (e.g., known, well-defined, etc.) conversion function to convert HDR HLG images to corresponding SDR images in block 330.
[0196] The ISP pipeline may include or implement (1) a first set of three one-dimensional lookup tables (1D-LUTs) for conversion from HLG RGB to linear RGB, (2) followed by a 3×3 matrix for conversion from HDR images in an HDR domain or color space such as the R.2020 color space to an SDR domain or color space such as the R.709 color space, and (3) followed by a second set of three 1D-LUTs implementing a BT.1886 linear-to-non-linear (gamma) conversion with a (e.g., standard-based, etc.) HLG opto-optical transfer function (OOTF).
[0197] For the purpose of illustration only, the HDR images may be HDR HLG images retrieved from an image dataset or database as shown in
[0198] The process flow of
[0199] Block 322 comprises setting a current (e.g., to-be-iterated, etc.) value for the design parameter γ^BT1886 to the next candidate value (e.g., initially the first candidate value, etc.) in the plurality of candidate values for the design parameter γ^BT1886. Given the current value for the design parameter γ^BT1886, the ISP SDR images can be compared with the reference SDR images in one or more subsequent process flow blocks of
[0200] Block 324 comprises applying the first set of 1D-LUTs to the HDR HLG (RGB) images retrieved from the database to generate corresponding HDR linear RGB images, as follows:
where ch denotes the red, green or blue channel;
denotes HDR HLG codewords in the (input) HDR HLG images;
denotes HDR linear codewords in the corresponding HDR linear images generated from the first set of 1D-LUTs as represented in expression (51); the parameters (a, b, c) may be set to (0.17883277, 0.28466892, 0.55991073).
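Since expression (51) itself is not reproduced here, the sketch below uses the standard HLG inverse OETF from Rec. ITU-R BT.2100 with the quoted (a, b, c) constants, which is the per-channel conversion the first set of 1D-LUTs would tabulate:

```python
import numpy as np

# Standard HLG constants, matching the (a, b, c) values quoted above.
A, B, C = 0.17883277, 0.28466892, 0.55991073

def hlg_to_linear(v):
    """Per-channel HLG-to-linear conversion (HLG inverse OETF per
    Rec. ITU-R BT.2100): quadratic segment below 0.5, exponential above."""
    v = np.asarray(v, dtype=float)
    low = v <= 0.5
    out = np.empty_like(v)
    out[low] = (v[low] ** 2) / 3.0
    out[~low] = (np.exp((v[~low] - C) / A) + B) / 12.0
    return out

lin = hlg_to_linear([0.0, 0.5, 1.0])
```

In hardware this function is sampled into the three 1D-LUTs rather than evaluated per pixel.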
[0201] Block 326 comprises applying the 3×3 matrix to the corresponding HDR linear RGB images to generate corresponding SDR linear RGB images. The 3×3 matrix may be given as follows:
[0202] SDR linear RGB codewords in the corresponding SDR linear RGB images generated with the 3×3 matrix in expression (52) above may be denoted as
[0203] Block 328 comprises applying the second set of 1D-LUTs to the SDR linear RGB codewords
in the corresponding SDR linear RGB images to generate corresponding ISP SDR codewords (denoted as
in corresponding ISP SDR images.
[0204] In some operational scenarios, as noted, the second set of 1D-LUTs merges or combines the linear to non-linear SDR conversion with the HLG OOTF given as follows:
where
denotes intermediate SDR codewords generated from the SDR linear RGB codewords
in the corresponding SDR linear RGB images; L_w = 100 and L_b = 0.
[0205] The linear to non-linear SDR conversion used to generate the ISP SDR codewords
from the intermediate SDR codewords
in expression (53) above may be the linear to gamma conversion as defined in BT.1886, as follows:
where γ^BT1886 represents the previously mentioned design parameter.
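A sketch of the per-channel linear-to-gamma mapping the second set of 1D-LUTs would tabulate; the exponent 1/γ (the encoding direction of the BT.1886 EOTF) and the input clipping to [0, 1] are assumptions:

```python
import numpy as np

def linear_to_bt1886(v, gamma=2.115):
    """BT.1886-style linear-to-nonlinear (inverse EOTF) conversion used
    by the second set of 1D-LUTs; gamma is the tunable design parameter."""
    return np.clip(np.asarray(v, dtype=float), 0.0, 1.0) ** (1.0 / gamma)

s = linear_to_bt1886([0.0, 1.0])
```

The default gamma here is the example optimized value quoted below; in the flow of blocks 322 through 336 it is the quantity being swept.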
[0206] Blocks 322 through 328 of
[0207] Block 332 comprises determining (e.g., quality, etc.) differences between the ISP SDR images generated from the HDR HLG to SDR conversion function .sub.HLG.fwdarw.ISPSDR in blocks 322 through 328 and the reference SDR images generated from the white box HDR-to-SDR conversion function in block 330. A quality assessment function such as MSE, RMSE, SAD, PSNR, SSIM, or the like, may be used to compute the differences.
[0208] Block 334 comprises determining whether the current candidate value for the design parameter γ^BT1886 is the last candidate value in the plurality of candidate values for the design parameter γ^BT1886. If so, the process flow goes to block 336. Otherwise, the process flow goes back to block 322.
[0209] Block 336 comprises selecting an optimal or optimized value for the design parameter γ^BT1886. The selection of the optimized value for the design parameter γ^BT1886 can be formulated as an optimization problem to find a specific value (denoted as γ^BT1886,opt) for the design parameter γ^BT1886 among the plurality of candidate values such that differences between the ISP SDR images and the reference SDR images, as computed with a relatively large image dataset or database, are minimized, as follows:
[0211] An example value of γ^BT1886,opt may be, but is not necessarily limited to only, 2.115.
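The selection loop of blocks 322 through 336 amounts to a one-dimensional grid search over candidate gamma values. A toy sketch, in which the conversion pipeline is reduced to a bare gamma encode and the candidate grid and test data are assumptions:

```python
def grid_search_gamma():
    def encode(vals, gamma):
        # stand-in for the full HLG-to-ISP-SDR pipeline of blocks 322-328
        return [v ** (1.0 / gamma) for v in vals]

    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    linear = [i / 10.0 for i in range(1, 11)]
    # Stand-in for the reference SDR images from the white-box conversion:
    reference = encode(linear, 2.115)
    candidates = [2.0 + 0.005 * i for i in range(61)]  # 2.000 .. 2.300
    # Pick the candidate minimizing the quality difference (here, MSE)
    return min(candidates, key=lambda g: mse(encode(linear, g), reference))
```

With the reference generated at gamma 2.115, the search recovers that candidate exactly.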
SDR to PQ TPB Optimization
[0212] As noted, in the WFB use cases or operational scenarios, a BESA algorithm may be used under the SLBC framework to generate optimized reshaping operational parameters used by forward and backward reshaping operations to generate forward reshaped SDR images and backward reshaped HDR images. In comparison, in the WB use cases or operational scenarios, only backward reshaping may be performed under the SLiDM framework on non-forward-reshaped SDR images (such as ISP SDR images generated with an ISP pipeline implemented with a video capturing and/or editing device) encoded in an outputted video signal to generate corresponding backward reshaped or reconstructed HDR images.
[0213] In the WB use cases or operational scenarios, forward TPB reshaping from an input HDR domain or color space such as the R.2020 HDR color space HLG to a forward reshaped SDR domain or color space such as the R.709 SDR YCbCr color space could not be used to help utilize the full ranges of codewords supported by the R.709 SDR YCbCr color space. ISP SDR codewords in the ISP SDR images encoded in the outputted video signal are often hard limited to an R.709 color space portion generated or supported by the video capturing and/or editing device or the ISP pipeline implemented therein. It may be difficult for the ISP SDR codewords in the hard limited R.709 color space portion to be further expanded or mapped back by backward reshaping, such as backward TPB based reshaping, to a backward reshaped or reconstructed HDR domain or color space.
[0214] However, as in the WFB use cases or operational scenarios, in the WB use cases or operational scenarios, optimized reshaping operational parameters such as TPB coefficients can be generated in reshaping optimization to help increase coverage of a supported color space in which backward reshaped HDR images are to be represented. In many WB operational scenarios, the backward reshaped or reconstructed HDR domain or color space realized with the backward TPB based reshaping can be only slightly larger than the R.709 color space, as compared with the largest (a) color space achievable in the WFB use cases or operational scenarios.
[0215] For the purpose of illustration only, in the operational scenarios as shown in
[0216] By way of illustration but not limitation, the subset or sub-space in the R.2020 color space (or HDR color space for representing the reshaped HDR images) may be defined or characterized by a specific white point and three specific color primaries (red, green, blue). The specific white point may be selected or fixed to be the D65 white point.
[0217] Denote a backward reshaped HDR color space for representing backward reshaped or reconstructed HDR images as an (b) color space. Hence, CIExy coordinates of the color primaries and white point defining the (b) color space may be denoted as
respectively. The white point
of the (b) color space can be specified as the D65 white point.
[0218] Each of the color primaries of the (b) color space can be selected along the line between a respective color primary
of the P3 color space and a respective color primary
of the R.2020 color space in the CIExy coordinate system or chromaticity diagram. Any point along the line between the two respective color primaries of the P3 color space and the R.2020 color space can be represented as a linear combination of these two respective color primaries with a weight factor b, as follows:
[0219] Hence, the optimization problem of finding the maximum support from the (b) color space for the R.2020 color space can be simplified as a problem to select the weight factor b. When b=0, the (b) color space becomes the entire R.2020 color space. When b=1, as illustrated in
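This per-primary interpolation can be sketched directly. The CIE xy chromaticities below are the standard P3 and R.2020 primary values; the helper name is an assumption for illustration:

```python
# Standard CIE xy chromaticities of the P3 and R.2020 color primaries.
P3 = {"R": (0.680, 0.320), "G": (0.265, 0.690), "B": (0.150, 0.060)}
R2020 = {"R": (0.708, 0.292), "G": (0.170, 0.797), "B": (0.131, 0.046)}

def primary(channel, b):
    # Linear combination along the P3-to-R.2020 line with weight factor b:
    # b=0 yields the R.2020 primary, b=1 yields the P3 primary.
    (x3, y3), (x20, y20) = P3[channel], R2020[channel]
    return (b * x3 + (1.0 - b) * x20, b * y3 + (1.0 - b) * y20)
```

Sweeping b over the plurality of candidate values then sweeps the (b) color space between R.2020 and P3.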
[0220] As illustrated in
[0221]
[0222] Block 342 comprises setting a current (e.g., to be iterated, etc.) value for the parameter b to the next candidate value (e.g., initially the first candidate value, etc.) in the plurality of candidate values for the parameter b. Given a candidate backward reshaped HDR color space with the current value for the parameter b, optimized reshaping operational parameters such as optimized TPB coefficients can be obtained in one or more subsequent process flow blocks of
[0223] Block 344 comprises building sample points, or preparing two sampled data sets, in the candidate backward reshaped HDR color space. By way of example but not limitation, the candidate backward reshaped HDR color space may be a Hybrid Log Gamma (HLG) RGB color space (referred to as the (b) RGB color space HLG).
[0224] Similar to block 304 of
in the uniformly sampled data set of color patches is given as
for i ∈ {0, . . . , N_R−1}, j ∈ {0, . . . , N_G−1}, k ∈ {0, . . . , N_B−1}.
[0225] For simplicity, (i, j, k) can be vectorized or simply denoted as p. Correspondingly, the uniformly sampled data point or RGB color
can be simply denoted as
All N_u nodes (each of which represents a unique combination of a specific R-axis unit/partition, a specific G-axis unit/partition, a specific B-axis unit/partition) can be grouped or collected into a vector/matrix as follows:
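A minimal sketch of this uniform sampling and row-wise vectorization (the per-axis node counts and the function name are hypothetical):

```python
def uniform_rgb_grid(n_r, n_g, n_b):
    # Uniformly sample the RGB unit cube with n_r x n_g x n_b nodes; each
    # node (i, j, k) is mapped to a normalized RGB row, and the rows are
    # emitted in vectorized index order p.
    return [(i / (n_r - 1), j / (n_g - 1), k / (n_b - 1))
            for i in range(n_r)
            for j in range(n_g)
            for k in range(n_b)]
```

The returned list of rows plays the role of the N_u-row vector/matrix of sampled color patches.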
[0226] The second of the two sampled data sets prepared or built in block 344 is a neutral color data set. This second data set includes a plurality of neutral colors or neutral color patches (also referred to as gray colors or gray color patches).
[0227] The second data set may be used to preserve input gray color patches in an input domain as output gray color patches in an output domain when the input gray color patches in the input domain are mapped or reshaped into the output gray color patches in reshaping operations as described herein. The input gray color patches in the input domain (or input color space) may be given increased weighting in the optimization problem as compared with other color patches to reduce the likelihood of these input gray color patches being mapped to non-gray color patches in the output domain (or output color space) by the reshaping operations.
[0228] The second data set (the gray color data set or the data set of gray colors) may be prepared or built by uniformly sampling the R, G, B values along the line connecting a first gray color (0, 0, 0) and a second gray color (1, 1, 1) in the RGB domain (e.g., the (b) RGB color space HLG, etc.), giving rise to N_n nodes or gray color patches, as follows:
[0229] All the N_n nodes in the second data set can be grouped or collected into a neutral color vector/matrix as follows:
[0230] The neutral color vector/matrix in expression (8) above can be repeated N_t (a positive integer no less than one (1)) times to generate N_n·N_t neutral color patches in the (now repeated) second data set, as follows:
[0231] The repetition of neutral colors in the repeated second data set of neutral colors increases weighting of the neutral or gray colors as compared with other colors. Accordingly, the neutral colors can be preserved more in the optimization problem than the other colors.
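The gray-axis sampling and repetition-based weighting can be sketched as follows (node count and repetition factor are hypothetical):

```python
def neutral_color_set(n_n, n_t):
    # n_n gray nodes along the line from (0, 0, 0) to (1, 1, 1), then
    # repeated n_t times so neutral colors carry n_t-fold weight in the
    # downstream least-squares optimization.
    base = [(i / (n_n - 1),) * 3 for i in range(n_n)]
    return base * n_t
```

Concatenating this list with the uniformly sampled set yields the combined N = N_n·N_t + N_u rows described next.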
[0232] The first data set of (all sampled) colors and the second (repeated) data set of neutral colors in expressions (61) and (64) can be collected or placed together in a single combined vector/matrix, as follows:
[0233] The total number of vector/matrix elements (repeated and non-repeated color patches) in the combined vector/matrix
is N = N_n·N_t + N_u. Each vector/matrix element or color patch (row) in
in expression (10) above may be denoted as follows:
[0234] Block 346 comprises converting color values of the color patches (rows) represented in the vector/matrix elements of the combined vector/matrix
in the (b) RGB color space (or the (b) RGB color space HLG) to corresponding color values in a standard-based R.2020 color space or an R.2020 RGB color space HLG, as follows:
where P^(b)→R2020 denotes a conversion matrix from the (b) color space to the R.2020 RGB color space HLG, which can be constructed similarly to how P^(a)→R2020 is constructed in the process flow of
where P^(b)→XYZ denotes a conversion matrix from the (b) color space to the XYZ color space that can be constructed similarly to how P^(a)→XYZ is constructed in the process flow of
[0235] Block 348 comprises converting or content mapping color values of the color patches (rows) represented in the vector/matrix elements of the vector/matrix
in the R.2020 RGB color space HLG to corresponding ISP SDR color values (denoted as
in an R.709 SDR RGB color space and then further to corresponding color values (denoted as
in an R.709 SDR YCbCr color space, as follows:
where HLG→ISPSDR( ) represents an ISP equation such as the HDR HLG to SDR conversion function implemented by the ISP pipeline with the optimized value for the design parameter γ^BT1886 generated from the process flow of
[0236] Each color patch (row) in
may be denoted as follows:
[0237] Block 350 comprises converting or content mapping color values of the color patches (rows) represented in the vector/matrix elements of the vector/matrix
in the R.2020 RGB color space HLG to corresponding color values (denoted as
in an R.2020 YCbCr color space PQ, as follows:
[0238] Each color patch (row) in
may be denoted as follows:
[0239] Block 352 comprises taking
as input to a backward TPB optimization algorithm to generate optimized backward TPB coefficients for TPB based reshaping for the (b) color space corresponding to the current value for the parameter b.
[0240] To do so, a backward generation matrix may be constructed from TPB basis functions and SDR codewords represented in
as follows:
[0241] Backward prediction errors can be determined by comparing the backward reshaped HDR color patches or codewords collected in a vector/matrix with reference HDR color patches or codewords in a per-channel backward observation vector/matrix; these errors are minimized in solving the TPB optimization problem. The per-channel backward observation vector/matrix can be generated or pre-calculated from expression (71) above, stored or cached in computer memory, and fixed in all iterations, as follows:
[0242] Optimized values for the (e.g., per-channel, etc.) backward TPB coefficients (denoted as
can be generated via a least squared solution to the optimization problem that minimizes differences between the backward reshaped HDR color patches or codewords and the reference HDR color patches or codewords, as follows:
[0243] Per-channel predicted (or backward reshaped or reconstructed) HDR codeword for each channel ch can be computed as follows:
[0244] Block 354 comprises determining whether the current candidate value for the parameter b is the last candidate value in the plurality of candidate values for the parameter b. If so, the process flow goes to block 356. Otherwise, the process flow goes back to block 342.
[0245] Block 356 comprises selecting an optimal or optimized value for the parameter b as well as computing (or simply selecting already computed) the optimized forward and backward TPB coefficients for the (b) RGB color space corresponding to the optimized value for the parameter b.
[0246] Similar to the WFB use cases or operational scenarios, in the WB use cases or operational scenarios, the (b) color space is also part of the optimization. The (b) color space and the backward TPB coefficients can be jointly optimized. Hence, the optimization problem can be formulated to find (specific values for) {b,
such that the (b) color space is generated to cover the largest HDR color space or color space portion while achieving minimized HDR prediction errors denoted as
as follows:
[0247] An example optimized value for the parameter b may, but is not necessarily limited to only, 1, which corresponds to the P3 color space.
Black-Box TPB Backward Optimization in Single Device (BB1)
[0248] Some (dual-mode) video capturing devices or mobile devices support both SDR and HDR capture modes and hence can output either SDR or HDR images. Some (mono-mode) video capture devices or mobile devices only support the SDR capture mode. Under techniques as described herein, SDR-to-HDR mappings can be modeled or generated with the dual-mode video capturing devices. Some or all of these SDR-to-HDR mappings can then be applied, for example in downloaded and/or installed video capturing applications, to SDR images captured by either the dual-mode video capturing devices or the mono-mode video capturing devices to up-convert the SDR images acquired with the SDR capture mode to corresponding HDR images, regardless of whether these video capturing devices support the HDR capture mode or not.
[0249] In some operational scenarios, as the computation environment in, or processing capabilities of, mobile devices may be limited, a static (backward) 3D-LUT derived at least in part from TPB basis functions and optimized TPB coefficients may be used to represent a static SDR-to-HDR mapping that maps or backward reshapes SDR images, such as all SDR images in an SDR video sequence, to generate corresponding HDR images, such as all HDR images in a corresponding HDR video sequence.
[0250] The static SDR-to-HDR mapping may be modeled based at least in part on a plurality of image pairs of HDR and SDR images acquired with a specific camera of a dual-mode video capturing device. Each image pair in the image pairs of HDR and SDR images includes an SDR image as well as an HDR image corresponding to the SDR image. The SDR image and the HDR image depict the same visual scene in a real world subject to spatial alignment errors caused by spatial movements that may occur between a first time point at which the SDR image is captured by the specific camera operating in the SDR capture mode and a second time point at which the HDR image is captured by the same camera operating in the HDR capture mode. For example, for each visual scene in a physical world, the video capturing device can operate with the HDR mode to capture an HDR image in an image pair and use the SDR mode to capture the SDR image in the same image pair. This may be repeated to generate the plurality of image pairs of HDR and SDR images to generate a relatively large number of different color patches (or a relatively large number of different colors or different codewords) used to generate the static SDR-to-HDR mapping.
[0251] There are several challenges in using the captured SDR and HDR images to model the static SDR-to-HDR mapping. First, while both the SDR and HDR capture modes may use the same ISP inside the same video capturing device, the video capturing device can apply different image capture settings (e.g., exposure settings for the specific camera, etc.) for the same scene to obtain desired optimized picture qualities in the captured SDR and HDR images. As a result, the static SDR-to-HDR mapping may be able to model or approximate the HDR image capture settings actually implemented by the video capturing device only to some extent. Second, spatial alignment between the captured SDR and HDR images to be used to derive an image pair of SDR and HDR images may not be exact. For example, selecting a specific capture mode between the SDR and HDR capture modes may be performed via touching a screen of the video capture device, which can cause the video capture device to lose or move away from its previous spatial position and/or orientation. In addition, temporal misalignment between a captured SDR video/image sequence and a captured HDR video/image sequence can easily occur as these SDR and HDR sequences are captured at different time instances or durations. Some depicted visual objects in these sequences can move in or out of the field of camera view. Some local regions in the field of camera view can have occlusions or dis-occlusions from time to time.
[0252] In some operational scenarios, registration operations can be performed to resolve spatial and/or temporal alignment issues between a captured SDR image and a corresponding captured HDR image to generate an aligned SDR image and a corresponding aligned HDR image and to include the aligned SDR image and the corresponding aligned HDR image in an image pair of SDR and HDR images. The image pair can then be used to determine or establish SDR color patches or colors in the aligned SDR image and corresponding HDR color patches or colors in the corresponding aligned HDR image. These SDR and HDR color patches or colors can be used to derive at least some color patches in a set of corresponding SDR and HDR color patches or colors for the purpose of deriving or generating the static SDR-to-HDR mapping.
[0253] An SDR and HDR video capturing process may be performed to provide a comprehensive coverage of different scenes and/or exposures at different times of a day, such as from early morning to late night, for both indoor and outdoor scenes. For each scene, the same video capturing device can be used with both the SDR and HDR modes to capture SDR and HDR video sequences. In some operational scenarios, a specific frame/image such as the first frame/image of each of the video sequences may be selected or extracted for the purpose of building or deriving the plurality of image pairs of SDR and HDR images to be included in a training image dataset or database.
[0254]
[0255] Block 362 comprises receiving the captured SDR image and finding or extracting a set of SDR image feature points in the captured SDR image. Block 364 comprises receiving the captured HDR image and finding or extracting a set of HDR image feature points in the captured HDR image. The set of SDR image feature points and the set of HDR image feature points may be of the same set of feature point types.
[0256] The set of feature point types may include feature point types in a wide variety of different feature point types including, but not necessarily limited to only, any, some or all of: BRISK features detected with the Binary-Robust-Invariant-Scalable-Keypoints or BRISK algorithm; detected corners using the Features-from-Accelerated-Segment-Test or FAST algorithm; features detected from the KAZE algorithm; detected corners using a minimum eigenvalue algorithm; features generated with the maximally-stable-extremal-regions or MSER algorithm; keypoints detected with the Oriented-FAST-and-Rotated-BRIEF or ORB algorithm; features extracted from the scale-invariant-feature-transform or SIFT algorithm; features extracted from the Speeded-Up-Robust-Features or SURF algorithm; and so forth.
[0257] Each of some or all feature points in the set of SDR image feature points and the set of HDR image feature points may be represented with a feature vector or descriptor such as an array of (e.g., numerical, etc.) feature values.
[0258] Block 366 comprises matching some or all feature points in the set of SDR image feature points with some or all feature points in the set of HDR image feature points. For each feature point of a specific type in the SDR image in the set of SDR image feature points, a matching metric may be computed between the feature point in the SDR image and each feature point in feature points of the same type in the HDR image in the set of HDR image feature points. In a non-limiting example, the matching metric may be computed as a sum of absolute difference or SAD (or another metric function used to measure differences between two feature points) between a first feature vector representing the feature point in the SDR image and a second feature vector representing the feature point in the HDR image. A specific feature point with the lowest SAD or matching metric may be selected from among the feature points of the same type in the HDR image as a possible match for the feature point in the SDR image. In response to determining that the lowest SAD or matching metric is lower than a matching difference threshold, the specific feature point in the HDR image may be identified or determined as a match for the feature point in the SDR image; the feature point in the SDR image and the specific feature point in the HDR image form a pair of matched SDR and HDR feature points. Otherwise, the specific feature point may not be identified as such a match. This matching operation can be repeatedly performed for all feature points in the set of SDR image feature points, thereby giving rise to a set of pairs of matched SDR and HDR feature points.
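The nearest-descriptor-with-threshold matching just described can be sketched as follows (the data layout of id/descriptor pairs is an assumption for illustration):

```python
def match_features(sdr_feats, hdr_feats, threshold):
    # sdr_feats / hdr_feats: lists of (id, descriptor) pairs, where the
    # descriptors are equal-length numeric tuples of the same feature type.
    # Returns (sdr_id, hdr_id) pairs of matched feature points.
    def sad(a, b):
        # sum of absolute differences between two feature vectors
        return sum(abs(x - y) for x, y in zip(a, b))

    matches = []
    for sid, sdesc in sdr_feats:
        # best HDR candidate = lowest SAD against this SDR descriptor
        hid, best = min(((h, sad(sdesc, d)) for h, d in hdr_feats),
                        key=lambda t: t[1])
        if best < threshold:  # accept only sufficiently close matches
            matches.append((sid, hid))
    return matches
```

A feature point whose best candidate still exceeds the matching difference threshold contributes no pair, as in the block 366 description.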
[0259] Block 368 comprises computing a geometric transform such as a 3×3 2D affine transform between the SDR image and the HDR image based in part or in whole on the set of pairs of matched SDR and HDR feature points. The geometric transform may be computed or derived using (e.g., pixel rows and pixel columns, etc.) coordinates of matched feature points in the SDR and HDR images from the set of pairs of matched SDR and HDR feature points.
[0260] For each (e.g., the i-th, etc.) pair of matched SDR and HDR feature points, the 2D coordinate of the HDR feature point in the pair may be included in a first (row) vector of the form [x y 1], whereas the 2D coordinate of the SDR feature point may be included in a second (row) vector of the same form.
[0261] Denote the total number of pairs of matched SDR and HDR feature points in the set of pairs of matched SDR and HDR feature points as N. Vectors generated from 2D coordinates of feature points in all the pairs of matched SDR and HDR feature points in the set of pairs of matched SDR and HDR feature points can be collected together, as follows:
[0262] As noted, the geometric transform may be represented as a 3×3 matrix, as follows:
[0263] Mathematically, first and second vectors respectively representing SDR and HDR feature points in each pair of matched SDR and HDR feature points in the set of pairs of matched SDR and HDR feature points can be related to each other through the 3×3 matrix representing the geometric transform, as follows:
[0264] Or, for all N pairs of matched feature points in the set of pairs of matched SDR and HDR feature points, vectors representing SDR and HDR feature points can be related through the 3×3 matrix representing the geometric transform, as follows:
[0265] Values of matrix elements in the 3×3 matrix representing the geometric transform may be generated or obtained as a solution (e.g., a least squared solution, etc.) of an optimization problem to minimize transformation or alignment errors. More specifically, the optimization problem may be formulated as follows:
[0266] The optimized or optimal values for the 3×3 matrix representing the geometric transform can be obtained via the least squared solution to the optimization problem above, as follows:
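The least-squares fit of the 3×3 transform can be sketched with a tiny synthetic example, where the "matched" coordinates differ only by a translation (all coordinates are hypothetical):

```python
import numpy as np

# Hypothetical matched feature-point coordinates, one row [x, y, 1] per pair.
H = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
# Corresponding SDR coordinates: here the HDR frame is offset by (+2, +3).
S = H.copy()
S[:, 0] += 2.0
S[:, 1] += 3.0

# Least-squares fit of the 3x3 transform M with S ~= H @ M
M, _, _, _ = np.linalg.lstsq(H, S, rcond=None)
```

For this translation-only data the recovered M is the corresponding affine translation matrix, and applying it to the HDR coordinates reproduces the SDR coordinates exactly.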
[0267] Block 370 comprises applying the geometric transform to one of the SDR and HDR images. In the present example, the transform is applied to the HDR image so that HDR pixel locations in the HDR image are shifted to coincide with the SDR pixel locations in the SDR image, for example for each of three channels such as Y, Cb and Cr in which the HDR image is represented. As a result, most HDR pixels in the HDR image are spatially aligned with corresponding SDR pixels in the SDR image. Remaining HDR pixels, and remaining SDR pixels to which no HDR pixels are spatially aligned, may be excluded (e.g., assigned an out-of-range pixel value, etc.) from being used as (valid) color patches or colors for the purpose of generating the static SDR-to-HDR mapping or the TPB coefficients or the static backward 3D-LUT.
[0268] Block 372 comprises finding corresponding valid color patches or colors in the SDR and HDR images. The valid color patches or colors can be obtained from codeword values of spatially aligned pixels in the SDR and HDR images. As noted, the spatially aligned pixels can be generated by applying the geometric transform that is in turn generated from matched SDR and HDR feature points.
[0269] In some operational scenarios, to enhance spatial alignment accuracy or reliability of the geometric transform, after (initial) matched SDR and HDR feature points with matching metrics lower than the minimum matching difference threshold are identified, a further matching threshold may be applied to select or distinguish a subset of final matched SDR and HDR feature points from or among the (initial) matched SDR and HDR feature points.
[0270] For each spatially aligned pixel in the SDR image, an SDR codeword such as an SDR Y/Cb/Cr value may be determined for the spatially aligned pixel in the SDR image. For a co-located or spatially aligned pixelcorresponding to the spatially aligned pixel in the SDR imagein the spatially transformed HDR image generated with the geometric transform, a corresponding HDR codeword such as a corresponding HDR Y/Cb/Cr value may be determined for the spatially aligned pixel in the HDR image. If the transformed HDR pixel is not available (e.g., having a value of 0, etc.), then the HDR pixel may be discarded or prevented from being considered as a part of matched color patches or codewords.
[0271] In each pair (e.g., the i-th, etc.) of matched SDR and HDR color patches or colors such as generated from a pair of spatially aligned SDR and HDR pixels in the SDR and HDR images, the matched SDR and HDR color patches or colors (or codeword values) may be given, respectively, as follows:
[0272] Matched SDR and HDR color patches or colors (or codeword values) in all pairs of matched SDR and HDR color patches or colors, such as generated from all pairs of spatially aligned SDR and HDR pixels in SDR and HDR images of all image pairs of SDR and HDR images, can be collected together from all images and merged into two matrices, as follows:
[0273] Block 374 comprises solving or generating optimized or optimal TPB coefficients for backward reshaping or the static SDR-to-HDR mapping using the matrices
in expressions (88) and (89) above as input.
[0274] A per-channel backward generation matrix may be generated from backward TPB basis functions and codewords in the matrix
as follows:
[0275] A per-channel observation matrix for the SDR-to-HDR mapping may be constructed from the matrix
as follows:
[0276] The optimized or optimal backward TPB coefficients for channel ch can be solved via least squared solution, as follows:
[0277] The predicted or backward reshaped HDR value for channel ch can be computed as follows:
Black-Box Backward Reshaping Optimization in Two Devices (BB2)
[0278] In the BB2 operational scenarios, a (backward) reshaping mapping may be used to map (ISP captured) SDR images captured by a first video capturing device in an SDR capture mode to generate (ISP mapped) HDR images simulating HDR looks of (ISP captured) HDR images captured by a second different video capturing device operating in an HDR capture mode. In some operational scenarios, the first video capturing device may be a relatively low-end mobile phone that can only capture SDR images or pictures, whereas the second video capturing device may be a relatively high-end phone that can capture HDR images or pictures. While the first and second video capturing devices can operate with different hardware configurations and capabilities, the reshaping mapping is used to reshape the (ISP captured) SDR images captured by the first device into the (ISP mapped) HDR images approximating the (ISP captured) HDR images captured by the second device.
[0279] The reshaping (SDR-to-HDR) mapping may be modeled based at least in part on a plurality of image pairs formed by (training) SDR images captured by the first device and (training) HDR images captured by the second device. Each image pair in the image pairs of HDR and SDR images includes an SDR image as well as an HDR image corresponding to the SDR image.
[0280] In some BB2 operational scenarios, some or all image pairs in the plurality of image pairs may (e.g., initially, before image alignment operations, etc.) include captured SDR and HDR images depicting visual scenes such as natural indoor/outdoor scenes in a real world subject to spatial and/or temporal alignment errors relating to the first and second video capturing devices. As in the BB1 use cases or operational scenarios, in these BB2 use cases or operational scenarios, image alignment operations between corresponding SDR and HDR images in an image pair may be performed, for example, using a geometric transform built with selected feature points extracted from the SDR and HDR images. Subsequently, SDR and HDR color patches or colors determined from the aligned SDR and HDR images in image pair(s) may be used to generate a reshaping (SDR-to-HDR) mapping as described herein.
[0281] In some BB2 operational scenarios, instead of or in addition to taking images from natural scenes, some or all image pairs in the plurality of image pairs may (e.g., initially, before image alignment operations, etc.) include captured SDR and HDR images depicting color charts displayed on one or more reference image displays of the same type, for example in a lab environment. For example, the color charts can be generated as 16-bit full HD RGB (color chart) TIFF images. These TIFF images containing the color charts can be displayed as a perceptually quantized (PQ) video signal on the reference image display(s) such as PRM TV(s). The color charts displayed on the reference image display(s) can then be captured by the first and second video capturing devices, respectively.
[0282]
[0283] Different TIFF images may include different color charts with different pluralities of colors or different sets of distinct colors. Each color chart in a respective TIFF image among the different TIFF images may correspond to a respective plurality or set of distinct colors among the different pluralities or sets of distinct colors.
[0284] Each corner rectangle of four corner rectangles in the TIFF image of
[0285] To find correspondence/mapping relationships of as many colors as possible between the first and second video capturing devices, color charts in TIFF images as described herein may include as many different (sets or pluralities of) colors as possible (corresponding to as many different pixel/codeword values as possible), subject to display capabilities of the reference image display(s) to differentiate different colors. In addition, a color chart may be displayed with different overall intensities or illuminations for the purpose of causing the first and second video capturing devices to exercise different exposure settings under different illumination conditions. Accordingly, the same color displayed within a color chart in a TIFF image can be displayed with different intensities or illuminations on the reference image display(s).
[0286]
[0287] Block 382 comprises defining or determining a set of possible mean values (denoted as ) in a PQ domain or color space.
[0288] In operational scenarios in which a reference image display used to display TIFF images containing color charts is a PRM TV, the maximum and minimum luminance values supported by the reference image display may range between 1000 nits and 0 nits, or may span an even larger dynamic range and contrast ratio. By way of example but not limitation, minimum and maximum PQ values that can be displayed by the reference image display without clipping may be given as: P_0 = L2PQ(0) and P_1 = L2PQ(1000), respectively, where L2PQ( ) denotes a mapping function from (linear or non-PQ) luminance to (non-linear or PQ) luma codewords in the PQ domain or color space.
[0289] In a non-limiting example, the set Ψ.sub.μ can be defined as a set of different PQ luma codewords corresponding to (linear) luminance values evenly distributed, on a log scale, in a (linear) luminance value range from L.sub.lower to L.sub.upper. The lower bound L.sub.lower may be set to a value such as 10.sup.−4, whereas the upper bound L.sub.upper may be set to a value such as 10.sup.2.99 (e.g., a fractional exponent value selected for numerical stability, etc.), depending in part or in whole on image capturing capabilities (e.g., to capture the darkest and brightest luminances, etc.) of the first and second devices. The log scale is used as luma codewords in captured SDR and HDR images may be approximately linear to the log of luminance in the luminance value range. The total number of possible mean values in, or the magnitude |Ψ.sub.μ| of, the set of possible mean values may, but is not necessarily limited to only, be set as follows: |Ψ.sub.μ|=2048.
[0290] Block 384 comprises defining or determining a set of possible shape coefficients (denoted as Ψ.sub.a) used to generate possible variance values for the statistical distribution or distribution type (e.g., beta distribution, etc.).
[0291] In some operational scenarios, the beta distribution used to randomly select colors or codeword values is defined, or has support, over a scaled or closed value interval [0, 1]. Given PQ values over a PQ codeword value range of [P.sub.0, P.sub.1], a scaled mean value ν, where 0<ν<1 or within the closed value interval of the beta distribution, may be computed from a mean PQ value μ.sub.PQ, as follows: ν=(μ.sub.PQ−P.sub.0)/(P.sub.1−P.sub.0).
[0292] The scaled variance (denoted as σ.sup.2) for the beta distribution in the scaled or closed value interval [0, 1] can be adaptively set to be proportional to ν(1−ν), to avoid over/under exposures caused by large variance of (e.g., luminance in, etc.) color blocks during SDR and HDR image capturing by the first and second devices, as follows: σ.sup.2=ν(1−ν)/(1+a), where a denotes a shape coefficient that affects or determines the (e.g., more compressed, more expanded, etc.) shape of the beta distribution. Values for the shape coefficient a can be selected from a set of possible shape coefficients Ψ.sub.a to allow the generated color charts to have relatively high diversities in codeword values and/or colors resulted therefrom. A non-limiting example for the set of possible shape coefficients may be: Ψ.sub.a={3, 6, 9, 12}.
[0293] Block 386 comprises generating a plurality of all unique combinations of scaled mean values and shape coefficients using the set of possible mean values Ψ.sub.μ and the set of shape coefficients Ψ.sub.a for different instances of the statistical (e.g., beta, etc.) distributions. In the present example, the total number of unique combinations is given as follows: |Ψ.sub.μ|×|Ψ.sub.a|=2048×4=8192, where |·| denotes the total number of elements in, or the magnitude of, a set. In some operational scenarios, a different TIFF image containing a different color chart may be generated for each unique combination of scaled mean value and shape coefficient in the plurality of all unique combinations of scaled mean values and shape coefficients, thereby giving the total number of different color charts or corresponding different TIFF images as |Ψ.sub.μ|×|Ψ.sub.a|.
[0294] Block 388 comprises selecting a current combination of scaled mean and shape coefficient values, from among the plurality of all unique combinations of scaled mean values and shape coefficients, to be the next combination. A set or plurality of different pixel or codeword values respectively specifying or defining a set or plurality of different colors for a current color chart corresponding to the current combination of scaled mean and shape coefficient values can be generated from the beta distribution given the scaled mean ν in the current combination and a current scaled variance σ.sup.2 derived from the shape coefficient in the current combination.
[0295] Block 390 comprises calculating or defining the beta distribution, as follows: f(x; α, β)=x.sup.α−1(1−x).sup.β−1/B(α, β), where α and β are beta distribution parameters; B(α, β) denotes a normalization constant. The beta distribution parameters α and β can in turn be derived from the mean and variance values of the beta distribution, as follows: α=ν(ν(1−ν)/σ.sup.2−1); β=(1−ν)(ν(1−ν)/σ.sup.2−1).
[0296] Block 392 comprises generating a set of different (e.g., 144, etc.) colors (or pixel/codeword values) for the current color chart or the current color chart image based on a set of (e.g., 144, etc.) random numbers generated from the beta distribution. Each random number, denoted as x (where 0≤x≤1), can be scaled back or converted to a corresponding per-channel pixel or codeword value (referred to as a PQ value) within the PQ value range [P.sub.0, P.sub.1], as follows: PQ value=P.sub.0+x(P.sub.1−P.sub.0).
[0297] A set of PQ values can then be generated from scaling back or converting the set of random numbers into the PQ value range and used as a set of per-channel codeword values or codewords for the set of colors in the current color chart. In some operational scenarios, for each color chart, the same beta distribution (with the same mean and shape coefficient) is used, for example with three different sets of random numbers, to generate per-channel codewords for each of the multiple (e.g., RGB, etc.) channels of the color space in which display images are to be rendered by the reference image display. Accordingly, the average color that is generated from averaging colors represented in all color blocks in the color chart is close to a neutral or gray color.
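The per-chart sampling described in blocks 388 through 392 can be sketched as follows. This is a hedged illustration assuming the scaled variance is set to ν(1−ν)/(1+a) for a shape coefficient a, under which the general mean/variance inversion reduces to α=νa and β=(1−ν)a:

```python
import numpy as np

def sample_chart_codewords(nu, a, n_colors=144, p0=0.0, p1=0.7518, seed=0):
    """Draw per-channel PQ codewords for one color chart from a beta distribution.

    nu : scaled mean in (0, 1); a : shape coefficient (e.g., from {3, 6, 9, 12}).
    p0, p1 : PQ codeword range of the reference display (assumed values here).
    """
    var = nu * (1.0 - nu) / (1.0 + a)            # scaled variance sigma^2
    t = nu * (1.0 - nu) / var - 1.0              # common factor of the inversion
    alpha, beta = nu * t, (1.0 - nu) * t         # beta distribution parameters
    x = np.random.default_rng(seed).beta(alpha, beta, n_colors)  # in (0, 1)
    return p0 + x * (p1 - p0)                    # scale back to [P0, P1]

# One hypothetical chart: mid-gray scaled mean, smallest shape coefficient.
pq = sample_chart_codewords(nu=0.5, a=3)
```

Repeating the draw with three different seeds (one per R, G, B channel) gives the three per-channel codeword sets described above for a single chart.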
[0298] Block 394 comprises generating the current color chart or the central block of the corresponding TIFF image. The current color chart may include a set of (e.g., 2D square, etc.) color blocks, each of which is a single color given by a specific (cross-channel or composite) pixel or codeword value with three per-channel pixel or codeword values. Three sets of per-channel codewords generated from the same beta distribution, but with three different sets of random numbers, may be used by the reference image display to respectively drive rendering of red, blue, green channels of color blocks in the set of color blocks in the current color chart or the current TIFF image.
[0299] Additionally, optionally or alternatively, an RGB value used to specify or generate a background color for the background of the color chart image may be set to the mean PQ value μ.sub.PQ from which the scaled mean value ν of the beta distribution is derived.
[0300] Block 396 comprises determining whether the current combination of scaled mean and shape coefficient values is the last combination in the plurality of all unique combinations of scaled mean values and shape coefficients. If so, the process flow ends for generating the plurality of different color charts or different color chart images. Otherwise, the process flow goes back to block 388.
[0301] By way of illustration but not limitation, a total number denoted as n.sub.c of different colors or color blocks such as 144 may be generated for each color chart. The total number of colors or color blocks generated from the plurality of different color charts and color chart images may be given as: |Ψ.sub.μ|×|Ψ.sub.a|×n.sub.c=2048×4×144=1,179,648 colors or color blocks, which can be used to determine correspondence/match relationships between SDR color patches or codewords captured in the SDR capture mode with the first video capturing device and corresponding HDR color patches or codewords captured in the HDR capture mode with the second video capturing device.
[0302] In some operational scenarios, the plurality of TIFF images respectively containing the plurality of color charts can be iteratively displayed or rendered on the reference image display such as a PRM TV with a constant playback framerate and can be captured in SDR and HDR images by the first and second video capturing devices, respectively. As illumination of the reference image display may vary with viewing angles, the SDR and HDR images can be captured separately by the first and second video capturing devices (e.g., two phones, etc.) placed at the same position and orientation in relation or reference to the reference image display. Hence, SDR and HDR video signals or bitstreams containing captured SDR and HDR images of the color charts or TIFF images as rendered on the reference image display can be generated by the first and second devices or cameras thereof, respectively. Since the playback framerate of the reference image display may be kept constant, frame numbers of color charts can be determined relatively easily to establish an image pair in which a captured SDR image by the first video capturing device and a captured HDR image by the second video capturing device depict the same color chart in the plurality of color charts. The captured SDR and HDR images may be represented in SDR and HDR YCbCr (or YUV) color space, for example in YUV image/video files.
[0303]
[0304] Block 3002 comprises receiving captured (SDR or HDR) checkerboard images, which may be taken by a camera of one of the first and second video capturing devices from a displayed checkerboard image displayed on the reference image display. Blocks 3002 through 3006 of the same process flow of
[0305] Block 3004 comprises detecting checkerboard corners from the captured checkerboard images by the camera of the device.
[0306] Block 3006 comprises computing or calibrating camera distortion coefficients for the camera using the checkerboard corners detected from the captured checkerboard images. 3D (reference) coordinates of the checkerboard image displayed on the reference image display may be first determined in a 3D coordinate system stationary to the reference image display. Camera parameters such as distortion coefficients and other intrinsic parameters of the camera used by the device to generate the captured checkerboard images can be computed as optimized values that generate the best mapping (e.g., the least errors or least mismatch, etc.) between the 3D reference coordinates of the checkerboard corners and 2D (image) coordinates of the checkerboard corners in the captured checkerboard images. This calibration process can be performed in either YUV or RGB color space which the displayed checkerboard image or the captured checkerboard images may be converted to or represented in.
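The lens-distortion correction that such calibrated coefficients enable can be illustrated with the common radial (Brown-Conrady style) distortion model. This is a sketch under that assumption; the coefficients k1 and k2 below are hypothetical, not values from any actual calibration:

```python
import numpy as np

def distort(xy, k1, k2):
    """Apply radial distortion to normalized image coordinates (shape (N, 2))."""
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)

def undistort(xy_d, k1, k2, iters=25):
    """Invert the radial model by fixed-point iteration (no closed form exists)."""
    xy = np.array(xy_d, dtype=float)
    for _ in range(iters):
        r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
        xy = xy_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xy
```

In the calibration described above, k1 and k2 (and any further intrinsics) would be the optimized values that best map the 3D reference checkerboard corners to their 2D image positions.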
[0307]
[0308] The (camera-specific) distortion coefficients and intrinsic parameters of the cameras of the first and second video capturing devices obtained in processing blocks 3002 through 3006 can then be used to analyze or correlate between captured SDR and HDR images from TIFF images containing respective color charts as follows.
[0309] Block 3008 comprises receiving a captured (SDR or HDR) image of a TIFF image, displayed on the reference image display, containing a color chart as well as checkerboard corners (or checkerboard patterns in corners). The captured (SDR or HDR) image may be captured by a camera for which distortion coefficients have been obtained beforehand in block 3006 using the previously captured checkerboard images (which may not contain color charts).
[0310] Block 3010 comprises correcting or undistorting the captured image of the displayed TIFF image to correct camera lens distortion of the camera, for example using the distortion coefficients obtained in the camera calibration process such as block 3006.
[0311] Block 3012 comprises detecting checkerboard corners in the captured image. The checkerboard corners are captured from the checkerboard corners in (e.g., four corners in, etc.) the displayed TIFF image on the reference image display, which may be a PRM TV.
[0312] Block 3014 comprises estimating a projective transform between image coordinates of the captured image and image coordinates of the original TIFF image, which is displayed on the reference image display and captured in the received captured image.
[0313] Block 3016 comprises rectifying the captured image into the same spatial layout as the original TIFF image using the estimated projective transform.
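The projective transform estimation and rectification in blocks 3014 and 3016 can be sketched with the standard direct linear transform (DLT). This is a minimal illustration, not the disclosure's own solver:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 projective transform H with dst ~ H * src (DLT + SVD).

    src, dst : (N, 2) arrays of corresponding points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two homogeneous linear equations.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                 # fix the arbitrary projective scale

def apply_homography(H, pts):
    """Map 2D points through H with the perspective divide (rectification step)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

With the four detected checkerboard corners as correspondences, the estimated H warps every pixel of the captured image back into the layout of the original TIFF image.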
[0314]
[0315] Block 3018 comprises locating and extracting (a set of) individual color blocks from the color chart in the rectified captured image. As used herein, a color block is specified with a single corresponding (e.g., RGB, YCbCr, composite, etc.) pixel or codeword value. Corresponding individual codewords respectively specifying the individual color blocks in the captured image can be determined based on pixel or codeword values of pixels in these individual color blocks in the rectified captured image. In addition, individual original codewords respectively specifying individual original color blocks, corresponding to the individual color blocks of the rectified captured image, in the original TIFF image can be determined from an RGB or YUV file for the original TIFF image.
[0316] Correspondence/mapping relationships between SDR color blocks or codewords extracted from a (rectified) captured SDR image of an original color chart image and HDR color blocks or codewords extracted from a (rectified) captured HDR image of the same original color chart image can be established based in part or in whole on the individual original color blocks or codewords in the original TIFF image from which both the (rectified) SDR and HDR image are derived.
[0317] Using the plurality of TIFF images, a plurality of SDR images captured by the first video capturing device in the SDR capture mode from the displayed TIFF images, and a plurality of HDR images captured by the second video capturing device in the HDR capture mode from the same displayed TIFF images, a plurality of correspondence or mapping relationships may be established between a set of SDR colors and a set of HDR colors.
[0318] Several approaches may be used to map SDR images to HDR images.
[0319] In a first approach, as in some BB1 or BB2 operational scenarios, the mapped SDR and HDR color patches or colors can be used to generate (e.g., static, etc.) SDR-to-HDR reshaping operational parameters such as TPB coefficients. These TPB coefficients may be combined with TPB basis functions using SDR codewords of SDR images as input parameters to predict corresponding HDR codewords of reconstructed HDR images. A static 3D-LUT may be pre-built and deployed to a video capturing device (e.g., the first video capturing device, etc.) for mapping SDR images captured by the first video capturing device to reconstructed HDR images simulating HDR looks of the second video capturing device. Since HDR color blocks or codewords used to generate the reshaping operational parameters are extracted from captured (training) HDR images of the second video capturing device, predicted HDR codewords generated with the reshaping operational parameters are likely to provide mapped HDR looks in the reconstructed HDR images similar to actual HDR looks of actual HDR images captured by the second video capturing device.
[0320] In a second approach, in some BB2 operational scenarios, non-TPB optimization may be used to generate non-TPB reshaping operational parameters to be used in non-TPB reshaping operations to map captured SDR images taken by the first video capturing device to reconstructed HDR images simulating the HDR looks of the second video capturing device. A 3D-LUT for non-TPB reshaping operations may be directly built, without relatively high continuity and smoothness supported by TPB reshaping operations. This non-TPB second approach provides relatively high design freedom and may be relatively flexible, even though different colors represented in nearby 3D-LUT entries/nodes may or may not have relatively high continuity and smoothness. In some operational scenarios, with the relatively high design freedom, the non-TPB second approach may be more suitable than the first approach to the two-device or BB2 operational scenarios in which SDR images from one device are to be mapped to HDR images simulating HDR looks of another device.
[0321] The non-TPB second approach may make use of a 3D mapping table (3DMT) to build a backward reshaping mapping or a backward lookup table (BLUT) representing the backward reshaping mapping. The BLUT may be used to map SDR (e.g., cross-channel, three-channel, etc.) codewords of SDR images to HDR (e.g., chroma channel, per chroma-channel, etc.) codewords of reconstructed HDR images. Example operations to build backward reshaping mappings such as BLUTs from 3DMT can be found in U.S. patent application Ser. No. 17/054,495, filed on May 9, 2019, the contents of which are entirely incorporated herein by reference as if fully set forth herein.
[0322] For the purpose of illustration only, SDR codewords may be represented in three dimensions or channels Y, Cb, Cr in an SDR YCbCr color space. The 3D mapping table may be generated from a 3D histogram having a plurality of 3D histogram bins. The plurality of 3D histogram bins may correspond to a plurality of color space partitions generated by partitioning each dimension or channel with a respective positive integer among three positive integers Q.sub.y, Q.sub.Cb, Q.sub.Cr, thereby giving rise to a total of (Q.sub.y×Q.sub.Cb×Q.sub.Cr) 3D histogram bins.
[0323] Each 3D histogram bin (denoted as q) in the plurality of 3D histogram bins of the 3D histogram may be specified with a respective bin index or three respective quantized channel values q=(q.sub.y, q.sub.Cb, q.sub.Cr) in the dimensions or channels of the SDR (YCbCr) color space and store a pixel count for all SDR color patches or codewords within the SDR color space partition represented by the 3D histogram bin. All the bin indexes (or quantized channel values) for the plurality of 3D histogram bins can be collected into a set of bin index values denoted as Q, where Q=[Q.sub.y, Q.sub.Cb, Q.sub.Cr].
[0324] In addition, sums of HDR codewords mapped to SDR codewords within each 3D histogram bin in the plurality of 3D histogram bins may be computed and stored for the 3D histogram bin. These sums of the HDR codewords are also referred to as mapped HDR luma and chroma values.
[0325] An example procedure for generating SDR pixel counts and mapped HDR luma and chroma values for the 3D histogram bins of the histogram is illustrated in TABLE 2 below.
TABLE 2
// STEP 1: 3D SDR histogram and 3D-mapped HDR chroma values initialization
[0326] A (representative) SDR codeword may be taken at the center of the q-th 3D histogram bin. Representative SDR codewords for all the 3D histogram bins can be fixed for all SDR images/frames and precomputed with an example procedure as shown in TABLE 3 below.
TABLE 3
// Set for each bin the SDR mapping values
// Recall that the bin index q = (q.sub.y, q.sub.Cb, q.sub.Cr).
for (q.sub.y = 0; q.sub.y < Q.sub.y; q.sub.y++)
  for (q.sub.Cb = 0; q.sub.Cb < Q.sub.Cb; q.sub.Cb++)
    for (q.sub.Cr = 0; q.sub.Cr < Q.sub.Cr; q.sub.Cr++) {
[0327] Next, 3D histogram bins with non-zero (SDR) pixel counts may be identified in the plurality of 3D histogram bins and kept, where all other 3D histogram bins with zero (SDR) pixel counts are discarded or removed from the plurality of 3D histogram bins.
[0328] Let q.sub.0, q.sub.1, . . . , q.sub.k−1 denote all k 3D histogram bins each with a non-zero (SDR) pixel count, where k is a positive integer. Average values of mapped HDR luma and chroma values may be computed for each 3D histogram bin in the k 3D histogram bins with non-zero (SDR) pixel counts by dividing the sums of HDR luma and chroma values, to which SDR codewords of the SDR pixels are mapped, by the number or pixel count of SDR pixels in the 3D histogram bin, for example using a procedure as shown in TABLE 4 below.
TABLE 4
// Set for each bin the SDR mapping values
// The non-zero bin index q.sub.i = (q.sub.y, q.sub.Cb, q.sub.Cr).
for (i = 0; i < k; i++) {
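The accumulation and averaging that TABLES 2 through 4 outline can be sketched end to end. This is a hedged illustration with small hypothetical bin counts, not the disclosure's exact procedure:

```python
import numpy as np

def build_3dmt(sdr, hdr, Qy=4, Qcb=4, Qcr=4):
    """Build a 3D mapping table: per-bin SDR pixel counts and mean mapped HDR values.

    sdr, hdr : (N, 3) arrays of normalized YCbCr codewords in [0, 1).
    Returns (counts, hdr_mean) of shapes (Qy, Qcb, Qcr) and (Qy, Qcb, Qcr, 3).
    """
    Q = np.array([Qy, Qcb, Qcr])
    counts = np.zeros((Qy, Qcb, Qcr))
    sums = np.zeros((Qy, Qcb, Qcr, 3))
    # STEP 1: accumulate SDR pixel counts and sums of mapped HDR codewords per bin.
    idx = np.minimum((sdr * Q).astype(int), Q - 1)
    for (qy, qcb, qcr), h in zip(idx, hdr):
        counts[qy, qcb, qcr] += 1
        sums[qy, qcb, qcr] += h
    # STEP 2: average mapped HDR values over bins with non-zero pixel counts only;
    # zero-count bins stay undefined (NaN) and would be filled from neighbors.
    hdr_mean = np.full_like(sums, np.nan)
    nz = counts > 0
    hdr_mean[nz] = sums[nz] / counts[nz][:, None]
    return counts, hdr_mean
```

Each non-NaN node of `hdr_mean` corresponds to a 3D-LUT entry keyed by its bin index, as described in the next paragraph.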
[0329] The 3D-LUT can be built from the 3DMT represented by the 3D histogram storing respective SDR pixel counts in the 3D histogram bins and respective mapped HDR codeword values. Each node/entry in the 3D-LUT or 3DMT can be built with a mapping key represented by a specific SDR codeword (q.sub.y, q.sub.Cb, q.sub.Cr) represented in each 3D histogram bin of the k 3D histogram bins and a mapped HDR codeword as computed for the 3D histogram bin in TABLE 4 above. Additionally, optionally or alternatively, for those 3D histogram bins with zero SDR pixel count, the nearest or neighboring 3D bin(s) with the closest bin index(es) can be used to fill in missing mapped HDR value(s), for example through tri-linear interpolation.
[0330] As noted, example operations to build a 3D-LUT from 3DMT to serve as a non-TPB backward reshaping mapping or a BLUTfor example for predicting HDR chroma codewords per chroma channel from cross-channel SDR codewordsare described in the previously mentioned U.S. patent application Ser. No. 17/054,495.
[0331] In some BB2 operational scenarios, a luma backward reshaping mapping may be used to backward reshape SDR luma codewords of a SDR image into predicted or backward reshaped HDR luma codewords of a corresponding HDR image using a GPR-based model. Example generation of luma reshaping mappings using a GPR-based model can be found in U.S. Provisional Patent Application Ser. No. 62/887,123, Efficient user-defined SDR-to-HDR conversion with model templates, by Guan-Ming Su and Harshad Kadu, filed on Aug. 15, 2019, and PCT Application Ser. No. PCT/US2020/046032, filed on Aug. 12, 2020, the contents of which are entirely incorporated herein by reference as if fully set forth herein. For example, CDF matching curves may be generated from each training SDR-HDR image pair in a plurality of training SDR-HDR image pairs. Using a set of (e.g., 15, etc.) uniformly sampled SDR points over a SDR codeword range (e.g., the entire SDR codeword range, etc.), corresponding mapped HDR codewords can be found in each of the CDF matching curves. For each sample SDR point in the set of uniformly sampled SDR points, a GPR model may be constructed based on a histogram having a plurality of (e.g., 128, etc.) luma bins and used to generate a corresponding luma reshaping mapping.
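The CDF-matching step described above can be sketched as follows. This is a hedged illustration of histogram-based CDF matching only; the 15-point sampling and GPR fitting stages are omitted:

```python
import numpy as np

def cdf_match(sdr_luma, hdr_luma, sample_points, nbins=128):
    """For sampled SDR luma points, find the HDR codeword at the matching CDF value."""
    s_hist, s_edges = np.histogram(sdr_luma, bins=nbins, range=(0.0, 1.0))
    h_hist, h_edges = np.histogram(hdr_luma, bins=nbins, range=(0.0, 1.0))
    s_cdf = np.cumsum(s_hist) / len(sdr_luma)
    h_cdf = np.cumsum(h_hist) / len(hdr_luma)
    p = np.interp(sample_points, s_edges[1:], s_cdf)   # CDF value at each SDR point
    return np.interp(p, h_cdf, h_edges[1:])            # inverse HDR CDF at that value
```

For a training pair whose luma relationship is exactly linear, the matched points recover that linear mapping up to bin-width quantization, which is why a smooth GPR model is fitted over the sampled points afterward.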
TPB Optimization in Editing
[0332] Image/video editing is a common application in a video capturing device such as a mobile device to allow users to adjust color, contrast, brightness, or other user preferences. The image/video editing can be made or performed at the encoder side before captured images are compressed into a (compressed) video signal or bitstream. Additionally, optionally or alternatively, the image/video editing can be made or performed at the decoder side after the video signal or bitstream is decoded or decompressed.
[0333] Image/video editing operations as described herein can be performed in either or both of an HDR domain and an SDR domain. Image/video editing operations in an HDR domain can be relatively easily performed. For example, after HDR content or images are edited, the edited HDR content or images can be passed in as input or reference HDR images to a video pipeline that generates corresponding reshaped or ISP SDR content or images to be encoded in a video signal or bitstream. In comparison, image/video editing operations in an SDR domain can be relatively challenging. While HDR-to-SDR and/or SDR-to-HDR mappings can be designed or generated to ensure or enhance revertability between the SDR domain and the HDR domain, image/video editing operations performed on SDR images that are to be encoded into a video signal or bitstream are likely to cause difficulties or challenges for reverting or backward reshaping the edited SDR images to generate reconstructed HDR images that approximate the reference HDR images.
[0334]
[0335]
[0336] As shown in
[0337] As more colors are squeezed into the SDR domain or color space, codewords in the forward reshaped domain or color space such as the SDR YCbCr color space often may exceed a codeword value range, such as the SMPTE range, in the SDR RGB color space in which the image/video editing operations are performed. Applying the YCbCr-to-RGB conversion, which converts YCbCr codewords in an YCbCr codeword value range (normalized in a value range of [0, 1]) of the SDR YCbCr color space to RGB codewords of the SDR RGB color space, may cause some of the RGB codewords to exceed or go beyond the SMPTE range (normalized in a value range of [0, 1]) of the SDR RGB color space. While these out-of-range codeword values can be clipped, the clipped codeword values may not be able to be restored in the (e.g., TPB based, etc.) backward reshaping operations to non-clipped original HDR codewords. Additional operations as described herein may be used to enhance revertability and reduce or avoid visual artifacts in the image/video editing applications.
[0338]
[0339] In some operational scenarios, as illustrated in
[0340] In some operational scenarios, as illustrated in
[0341] In some operational scenarios, as illustrated in
[0342] In some operational scenarios, as illustrated in
[0343] Where the (input) SDR YCbCr image is obtained from an original HDR image through forward reshaping or ISP processing with a known (e.g., white box, ISP, etc.) HDR-to-SDR forward transform or mapping, the boundary may be determined and represented as a (TPB boundary clipping) 3D-LUT based in part or in whole on the HDR-to-SDR forward transform or mapping and/or any applicable color space conversion matrix(es). For example, using full grid sampling data that covers the entire HDR domain or color space in which the original HDR image is represented, boundary pixel or codeword values in the forward reshaped or ISP SDR color space can be determined.
[0344] After the image/video editing operations (Editing in RGB domain), the edited SDR RGB codewords can be scaled or squeezed, or clipped irregularly, into a 3D shape delineated or enclosed by the boundary. In some operational scenarios, since the converted scaled edited SDR YCbCr codewords are already placed within a corresponding boundary in the SDR YCbCr color space that is supported by TPB-based backward reshaping operations, the scaled edited SDR RGB codewords in the SDR RGB color space can be converted by the RGB-to-YCbCr conversion (RGB to YCbCr clipping conversion) into the converted scaled edited SDR YCbCr codewords without further clipping by the RGB-to-YCbCr conversion (RGB to YCbCr clipping conversion). In these operational scenarios as illustrated in
[0345] In some operational scenarios, as illustrated in
[0346] Where the (input) SDR YCbCr image is obtained from an original HDR image through forward reshaping or ISP processing with a known (e.g., white box, ISP, etc.) HDR-to-SDR forward transform or mapping, the boundary may be determined and represented with a (TPB boundary clipping) 3D-LUT based on the HDR-to-SDR forward transform or mapping and/or any applicable color space conversion matrix(es). For example, using full grid sampling data that covers the entire HDR domain or color space in which the original HDR image is represented, boundary pixel or codeword values in the forward reshaped or ISP SDR color space can be determined.
Construction of and Clipping with Boundary Clipping 3D-LUT
[0347] An HDR-to-SDR forward reshaping or mapping such as TPB based forward reshaping may be a non-linear function. While input HDR codewords in an input HDR domain or color space may be well organized within a simple 3D cube, the boundary of mapped SDR codewords generated from applying the (non-linear or TPB) HDR-to-SDR mapping may be relatively irregular unlike a 3D cube.
[0348] To perform (TPB) boundary clipping with respect to the relatively irregular boundary, a (TPB) boundary clipping 3D-LUT may be constructed. The 3D-LUT can be looked up with a SDR codeword as input (or lookup key) and returns the SDR codeword as value in response to determining that the SDR codeword is within the boundary. Otherwise, in response to determining that the SDR codeword is outside the boundary, the 3D-LUT can return a clipped SDR codeword, different from the original SDR codeword, where the clipped SDR codeword is within the boundary.
[0349] In some operational scenarios, boundary clipping can be implemented in two levels. In the first level, regular clipping is performed with ranges defined by minimum and maximum values (or lower and upper limits) that define a (3D) codeword range. In the second level, irregular clipping is performed with the (TPB) boundary clipping 3D-LUT.
[0350]
[0351] Block 3022 comprises preparing a 3D uniform sampling grid or set of sampled values {dot over (V)}.sub.YCbCr.sup.(FL),(R2020) (e.g., given in expressions (42) and (43) above, etc.) in an input HDR domain or color space such as the R.2020 (container) domain or color space. By way of illustration but not limitation, the HDR domain or color space represents an HDR YCbCr color space HLG.
[0352] The sampled values in the R.2020 YCbCr color space HLG may be converted to corresponding values in the R.2020 RGB color space HLG, as indicated in expression (44) above. The converted values in the R.2020 RGB color space HLG may be further converted to corresponding values in the a-RGB color space HLG corresponding to the optimized value (a.sup.opt) for the parameter a, as indicated in expression (45) above. The converted values in the a-RGB color space HLG may be clipped as indicated in expression (46) above and converted to corresponding clipped values in the R.2020 RGB color space HLG as indicated in expression (47) above. The clipped values in the R.2020 RGB color space HLG derived with expression (47) above may be converted to corresponding clipped values in the R.2020 YCbCr color space HLG, as indicated in expression (48) above.
[0353] Block 3024 comprises applying the TPB based forward reshaping (referenced as forward TPB) to the clipped HDR values (referenced as constrained input), with the TPB forward reshaping (in expression (50) above) built with the clipped HDR values in the R.2020 YCbCr color space HLG as input parameters to forward TPB basis functions, to obtain mapped or forward reshaped clipped SDR or R.709 YCbCr values, as follows:
[0354] Block 3026 comprises converting the R.709 YCbCr values to generate SDR or R.709 RGB values in an unclipped RGB domain or color space, as follows:
[0355] The R.709 RGB values as converted from the R.709 YCbCr values may include values outside a valid or designated range such as a value range of [0, 1].
[0356] For each channel, minimum and maximum values in the R.709 RGB values given in expression (100) may be measured or determined, as follows:
[0357] These extreme values can be used as lower and upper limits in block 3030 to construct or prepare a uniform sampling 3D grid or a set of sampling values in the unclipped RGB domain or color space.
[0358]
[0359] After image/video editing operations are performed on the SDR RGB image, a resultant (3D) codeword range or distribution of edited codewords or colors can be (e.g., much, etc.) wider than the irregular shape in the SDR RGB color space, thereby giving rise to many SDR RGB codewords not defined or supported by the backward reshaping operations.
[0360] In addition, it may be difficult to characterize, represent or approximate the actual 3D boundary of the irregular shape using analytical equations or multiple 2D planes that serve to cut a 3D cube in the RGB color space into the irregular shape.
[0361] As noted, a two-level clipping solution may be used to clip the edited codewords back into the space of maximum supported RGB colors as represented in the irregular shape. More specifically, at the first level, regular clipping is applied by clipping the edited codewords within a fixed 3D range. The regular clipping can make use of relatively simple clipping functions based on if-else or min( )/max( ) operations for the purpose of reducing the (input) edited codeword range to the 3D cube (including rectangle) defined or delimited by the fixed 3D range. At the second level, irregular clipping is applied by clipping those codewords, among the simple clipped edited codewords, that still lie outside the irregular shape into the space of maximum supported SDR RGB colors delineated or specified by the irregular shape. As the relatively irregular clipping boundary of the space of maximum supported SDR RGB colors does not lend itself to relatively simple clipping operations with analytical equations, simple codes or operations, a boundary clipping 3D-LUT can be used to carry out the irregular clipping at the second level of the two-level clipping solution.
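The two-level clipping can be sketched as follows. This is a hedged illustration in which the second-level boundary clipping 3D-LUT is represented as a dense grid of precomputed, already boundary-clipped RGB values looked up by nearest node:

```python
import numpy as np

def two_level_clip(rgb, lo, hi, boundary_lut):
    """Level 1: clip to the fixed 3D cube [lo, hi]; level 2: irregular clip via LUT.

    boundary_lut : (G, G, G, 3) grid whose nodes hold boundary-clipped RGB values.
    """
    clipped = np.clip(rgb, lo, hi)                       # regular min/max clipping
    G = boundary_lut.shape[0]
    idx = np.round((clipped - lo) / (hi - lo) * (G - 1)).astype(int)
    return boundary_lut[idx[..., 0], idx[..., 1], idx[..., 2]]
```

In practice a real implementation would interpolate (e.g., tri-linearly) between LUT nodes rather than snap to the nearest one; the nearest-node lookup keeps the sketch short.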
[0362] Block 3028 of
building an alphaShape with available alphaShape construction tools such as the MATLAB alphaShape( ) function. As used herein, an alphaShape refers to a bounding area or volume that envelops a set of 2-D or 3-D points, such as the set of SDR RGB codewords
in the present example.
[0363] An alphaShape object created to represent the alphaShape can be manipulated to generate a bounding polygon containing the set of SDR RGB codewords
tighten or loosen the fit of the alphaShape around the SDR RGB codewords
for example with nonconvex regions. Additional points may be added or removed, for example to suppress holes or regions or to simplify the alphaShape.
[0364] Denote a function to construct the alphaShape as S(·, r), where r denotes a radius parameter.
[0365] By passing the set of SDR RGB codewords
as the first input parameter to this function, a bounding polyhedron denoted as S.sup.(R709) can be constructed by the alphaShape construction function as follows:
[0366]
with different values of the radius parameter r. As shown, a larger value for the radius parameter r results in a larger polyhedron, which leads to coarser quantization if nearest neighbors are searched for the outside SDR RGB points or codewords and used to fill in clipped values for these outside points. On the other hand, a smaller r results in a smaller polyhedron, which leads to a higher likelihood of creating holes and a less effective boundary. The radius parameter r may be tuned to different values (e.g., 0.1, 0.5, etc.) for different video editing applications to maximize effectiveness or coverage of supported SDR colors and avoid introducing visual artifacts.
[0367] The alphaShape provides a boundary polyhedron as a clipping boundary for the irregular shape and allows determining whether a 3D point represented by a SDR RGB codeword is inside the irregular shape or outside the irregular shape.
[0368] Let I.sup.S (S, x) denote a binary function used to determine whether a 3D point or SDR RGB codeword (denoted as x) is inside the irregular shape or not. Assume a return binary value of 1 indicates inside the irregular shape and a return binary value of 0 indicates outside.
[0369] Let NN.sup.S (S, x) denote an index function that receives a given query 3D point or SDR RGB codeword such as x as the second input parameter and returns the nearest neighbor point in S for the query 3D point or SDR RGB codeword x.
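The membership test I.sup.S and the index function NN.sup.S described above can be sketched in Python. SciPy has no direct equivalent of the MATLAB alphaShape, so this illustrative sketch approximates the irregular shape by the convex hull of the supported codewords (via Delaunay triangulation) and uses a k-d tree for the nearest-neighbor lookup; the true alphaShape can be nonconvex, so this is a simplification, and all function names here are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay, cKDTree

def make_inside_test(points):
    """Approximate I^S: convex-hull membership test. find_simplex returns
    -1 for query points outside the triangulated hull."""
    tri = Delaunay(points)
    return lambda x: tri.find_simplex(x) >= 0

def make_nearest_neighbor(points):
    """Approximate NN^S: return the nearest point in the set for a query
    3D point or SDR RGB codeword."""
    tree = cKDTree(points)
    def nn(x):
        _, i = tree.query(x)
        return points[i]
    return nn
```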
[0370] As noted, the boundary clipping 3D-LUT can be constructed in the SDR RGB domain or color space to clip any SDR RGB codeword outside the irregular shape representing the space of maximum colors supported by the (TPB based forward and backward) reshaping operations.
[0371] Block 3030 comprises building the boundary clipping 3D-LUT as a full-grid 3D-LUT with a plurality of nodes/entries. These nodes/entries in the 3D-LUT include query SDR codewords as lookup keys and returned SDR codewords for the query SDR codewords as values.
[0372] The extreme values determined (in block 3026 of
may be used as lower or upper limits for a 3D cube (or rectangle). The query SDR codewords represented or included in the 3D-LUT may form a full grid that includes 3D points uniformly sampled over the 3D codeword range defined by the extreme values or lower/upper limits. For example, for each dimension or color channel represented by a corresponding axis among R-axis, G-axis, B-axis, the codeword value range between
in that dimension or color channel can be uniformly sampled into a respective total number of partitions among three total numbers of partitions N.sub.R, N.sub.G, and N.sub.B corresponding to three dimensions or color channels, as follows:
[0373] Hence, the total number of represented query SDR codewords in the 3D-LUT may be given as N.sub.u=N.sub.RN.sub.GN.sub.B.
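The full-grid construction described in the preceding paragraphs can be sketched as follows, assuming the per-channel lower/upper limits and partition counts are given; the function name and argument layout are illustrative, not from the specification.

```python
import numpy as np

def build_full_grid(lo, hi, n_r, n_g, n_b):
    """Uniformly sample the 3D cube [lo, hi] per channel into a full grid
    of query SDR RGB codewords; the total node count is N_u = N_R*N_G*N_B."""
    r = np.linspace(lo[0], hi[0], n_r)
    g = np.linspace(lo[1], hi[1], n_g)
    b = np.linspace(lo[2], hi[2], n_b)
    R, G, B = np.meshgrid(r, g, b, indexing="ij")
    # Flatten to an (N_u x 3) array of represented query codewords.
    return np.stack([R, G, B], axis=-1).reshape(-1, 3)
```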
[0374] Block 3032 comprises starting to execute a node processing loop for each node/entry in the 3D-LUT by selecting a current node/entry (e.g., in a sequential or non-sequential looping/iteration order, etc.) among the plurality of nodes/entries in the 3D-LUT.
[0375] For simplicity, (i, j, k) in expression (103) above can be vectorized as p. The current node/entry may be the p-th node/entry in the plurality of nodes/entries in the 3D-LUT. A lookup key for the p-th node/entry may be represented by the p-th represented query SDR RGB codeword denoted as u.sub.p on the LHS of expression (103) above. Let .sub.p denote a current output clipped SDR RGB codeword or value returned by the p-th node/entry of the 3D-LUT given the p-th represented query SDR RGB codeword as the lookup key.
[0376] Block 3034 comprises checking or determining whether the current or p-th represented query SDR RGB codeword u.sub.p for the current node/entry is inside the alphaShape S.sup.(R709) (or the shape constructed with an alpha shape construction function as shown in expression (102) above) generated in block 3028. In response to determining that the current or p-th represented query SDR RGB codeword u.sub.p for the current node/entry is inside the alphaShape S.sup.(R709), the process flow goes to block 3036. Otherwise, the process flow goes to block 3040.
[0377] The plurality of nodes/entries in the 3D-LUT includes a subset of nodes or entries each of which has a respective lookup key specified by a represented query SDR RGB codeword that lies within the alphaShape S.sup.(R709). Hence, the subset of nodes or entries include nodes or entries that are deemed to be within the alphaShape S.sup.(R709).
[0378] Block 3040 comprises finding the nearest node/entry among the subset of nodes/entries within the alphaShape S.sup.(R709) for the current node/entry. The nearest node/entry has the nearest represented query SDR RGB codeword (.sub.p) as the lookup key such that the nearest represented query SDR RGB codeword .sub.p has a minimum distance, as measured with a Euclidean or non-Euclidean distance in the SDR RGB color space, to the current or p-th represented query SDR RGB codeword u.sub.p as compared with all other represented query SDR RGB codewords for all other nodes/entries in the subset of nodes or entries within the alphaShape S.sup.(R709).
[0379] In some operational scenarios, the nearest represented query SDR RGB codeword .sub.p for the current or p-th represented query SDR RGB codeword u.sub.p may be given with the previously mentioned index function NN.sup.S (S, x), as follows:
[0380] The nearest represented query SDR RGB codeword .sub.p represents a boundary clipped value for the current or p-th represented query SDR RGB codeword u.sub.p. This boundary clipped value .sub.p is set to be the returned value for the current or p-th node entry in the 3D-LUT, whereas the current or p-th represented query SDR RGB codeword u.sub.p is set to be the lookup key for the current or p-th node entry in the 3D-LUT.
[0381] Block 3036 comprises setting the current or p-th represented query SDR RGB codeword u.sub.p as the returned value for the current or p-th node entry (in addition to being the lookup key) in the 3D-LUT and determining whether the current or p-th node/entry is the last node or entry in the plurality of nodes/entries in the 3D-LUT. In response to determining that the current or p-th node/entry is the last node or entry in the plurality of nodes/entries in the 3D-LUT, the process flow goes to block 3038. Otherwise, the process flow goes back to block 3032.
[0382] Block 3038 comprises outputting the 3D-LUT as the final boundary clipping 3D-LUT (denoted as
in the SDR RGB color space.
[0383] An example procedure for generating the final boundary clipping 3D-LUT
is illustrated in TABLE 5 below.
TABLE-US-00005 TABLE 5
for each node/entry u.sub.p in the 3D-LUT
  // check whether it is outside S
  // if it is outside, the nearest neighbor is found for clipping
  if I.sup.S(S.sup.(R709), u.sub.p) == 1 // inside
    .sub.p = u.sub.p
  else // outside
    = NN.sup.S(S.sup.(R709), u.sub.p)
    .sub.p = S.sup.(R709)( )
  end if
end for
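The node processing loop of TABLE 5 can be sketched in vectorized form. As an illustrative simplification, the alphaShape is again approximated here by the convex hull of the supported codewords, and the nearest neighbor is taken directly from the set of supported codeword points; names are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay, cKDTree

def fill_boundary_lut(grid, shape_points):
    """TABLE 5 sketch: a node inside the shape keeps its own codeword as the
    returned value; an outside node's value is replaced by its nearest
    neighbor among the supported codeword points."""
    hull = Delaunay(shape_points)
    inside = hull.find_simplex(grid) >= 0       # I^S per node
    tree = cKDTree(shape_points)
    values = grid.copy()
    _, idx = tree.query(grid[~inside])          # NN^S for outside nodes only
    values[~inside] = shape_points[idx]
    return values
```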
[0384] Given the (final) boundary clipping 3D-LUT
boundary clipping can be performed relatively efficiently on an (e.g., edited, etc.) SDR image using the previously mentioned two-level solution. As noted, regular clipping can be performed to make sure all (e.g., edited, etc.) SDR RGB codewords in the SDR image are within the extreme values or upper/lower limits for SDR RGB codewords in a SDR RGB color space in which image/video editing operations on the SDR image have been performed. The regular clipping is then followed by irregular clipping by passing the (regularly clipped if applicable) SDR codewords as lookup keys to the (final) boundary clipping 3D-LUT
boundary clipping and using returned values from the (final) boundary clipping 3D-LUT
as output (further irregularly clipped if applicable) SDR codewords in an output (e.g., clipped edited, etc.) SDR image.
[0385] In various operational scenarios including but not limited to those illustrated in
Example Process Flows
[0386]
[0387] In block 404, the system generates from the sampled HDR color space points in the HDR color space: (a) reference standard dynamic range (SDR) color space points represented in a reference SDR color space, (b) input HDR color space points represented in an input HDR color space, and (c) reference HDR color space points represented in a reference HDR color space.
[0388] In block 406, the system executes a reshaping operation optimization algorithm to generate a chain of an optimized forward reshaping mapping and an optimized backward reshaping mapping. The reshaping operation optimization algorithm uses the reference SDR color space points, the input HDR color space points and the reference HDR color space points as input.
[0389] In an embodiment, the optimized forward reshaping mapping is used to forward reshape input HDR images in the input HDR color space into forward reshaped SDR images in a forward reshaped SDR color space; the optimized backward reshaping mapping is used to backward reshape the forward reshaped SDR images in the forward reshaped SDR color space into backward reshaped HDR images.
[0390] In an embodiment, the sampled HDR color space points are built in the HDR color space without using any image.
[0391] In an embodiment, a plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings is generated by the reshaping operation optimization algorithm for the plurality of candidate values for the color primary scaling parameter; each chain in the plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings includes a respective optimized forward reshaping mapping and a respective optimized backward reshaping mapping.
[0392] In an embodiment, the sampled HDR color space points are mapped to the reference SDR color space points based at least in part on a predefined HDR-to-SDR mapping.
[0393] In an embodiment, a plurality of sets of prediction errors are computed for the plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings; each set of prediction errors in the plurality of sets of prediction errors is computed for a respective chain in the plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings; the plurality of sets of prediction errors is used to select a specific candidate value from among the plurality of candidate values for the color primary scaling parameter.
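The candidate selection described above can be sketched as a search over candidate values for the color primary scaling parameter. This is an illustrative sketch, assuming each candidate yields a forward/backward chain and that the selection criterion is the round-trip mean squared prediction error; the names `build_chain` and `select_candidate` are hypothetical.

```python
import numpy as np

def select_candidate(candidates, build_chain, hdr_points):
    """For each candidate scaling value, build a forward/backward reshaping
    chain, measure the round-trip HDR -> SDR -> HDR prediction error over the
    sampled points, and keep the candidate with the smallest error."""
    errors = []
    for c in candidates:
        forward, backward = build_chain(c)
        reconstructed = backward(forward(hdr_points))
        errors.append(np.mean((reconstructed - hdr_points) ** 2))
    return candidates[int(np.argmin(errors))]
```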
[0394] In an embodiment, the specific candidate value for the color primary scaling parameter is used to generate a specific chain of a specific optimized forward reshaping mapping and a specific optimized backward reshaping mapping.
[0395] In an embodiment, the specific optimized forward reshaping mapping is represented in a forward reshaping three-dimensional lookup table.
[0396] In an embodiment, the specific optimized backward reshaping mapping is represented in a backward reshaping three-dimensional lookup table.
[0397] In an embodiment, a video encoder applies the optimized forward reshaping mapping to a sequence of input HDR images to generate a sequence of forward reshaped SDR images and encodes the sequence of forward reshaped SDR images into a video signal.
[0398] In an embodiment, a video decoder decodes a sequence of forward reshaped SDR images from a video signal and applies the optimized backward reshaping mapping to the sequence of forward reshaped SDR images to generate a sequence of backward reshaped HDR images.
[0399] In an embodiment, a sequence of display images derived from the sequence of backward reshaped HDR images is rendered on an image display operating with the video decoder.
[0400] In an embodiment, the HDR color space and the input HDR color space share a common white point.
[0401] In an embodiment, the reshaping operation optimization algorithm represents a Backward-Error-Subtraction-for-signal-Adjustment (BESA) algorithm.
[0402]
[0403] In block 424, the system generates from the sampled HDR color space points in the HDR color space: (a) input SDR color space points represented in an input SDR color space and (b) reference HDR color space points represented in a reference HDR color space.
[0404] In block 426, the system executes a reshaping operation optimization algorithm to generate an optimized backward reshaping mapping. The reshaping operation optimization algorithm receives the input SDR color space points and the reference HDR color space points as input.
[0405] In an embodiment, the backward reshaping mapping is used to backward reshape SDR images in the input SDR color space into backward reshaped HDR images.
[0406] In an embodiment, the sampled HDR color space points are built in the HDR color space without using any image.
[0407] In an embodiment, a plurality of optimized backward reshaping mappings is generated by the reshaping operation optimization algorithm for the plurality of candidate values for the color primary scaling parameter; each optimized backward reshaping mapping in the plurality of optimized backward reshaping mappings is generated for a respective candidate value in the plurality of candidate values.
[0408] In an embodiment, a plurality of sets of prediction errors are computed for the plurality of optimized backward reshaping mappings; each set of prediction errors in the plurality of sets of prediction errors is computed for a respective optimized backward reshaping mapping in the plurality of optimized backward reshaping mappings; the plurality of sets of prediction errors is used to select a specific candidate value from among the plurality of candidate values for the color primary scaling parameter.
[0409] In an embodiment, the sampled HDR color space points are processed by a programmable ISP pipeline into the input SDR color space points based at least in part on an optimized value for a programmable configuration parameter of the programmable ISP pipeline.
[0410] In an embodiment, the optimized value for the programmable configuration parameter of the programmable ISP pipeline is determined by minimizing approximation errors between ISP SDR images generated by the programmable ISP pipeline from HDR images and reference SDR images generated by applying a predefined HDR-to-SDR mapping to the same HDR images.
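The ISP parameter optimization described above can be sketched as a simple grid search over candidate configuration values that minimizes the approximation error between ISP output and reference SDR images. All names here are illustrative, and the ISP pipeline is stood in by an arbitrary callable.

```python
import numpy as np

def tune_isp_parameter(candidates, isp_pipeline, hdr_images, reference_sdr):
    """Pick the programmable ISP configuration value whose SDR output best
    approximates the reference SDR images (mean squared error criterion)."""
    def cost(p):
        approx = [isp_pipeline(img, p) for img in hdr_images]
        return np.mean([np.mean((a - r) ** 2)
                        for a, r in zip(approx, reference_sdr)])
    return min(candidates, key=cost)
```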
[0411]
[0412] In block 444, the system matches a subset of one or more SDR image feature points in the set of SDR image feature points with a subset of one or more HDR image feature points in the set of HDR image feature points.
[0413] In block 446, the system uses the subset of one or more SDR image feature points and the subset of one or more HDR image feature points to generate a geometric transform to spatially align a set of SDR pixels in the training SDR image with a set of HDR pixels in the training HDR image.
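The geometric transform generation of block 446 can be sketched with a least-squares 2D affine fit from the matched feature point pairs; a full solution might instead estimate a projective homography, so this is a simplified stand-in with hypothetical names.

```python
import numpy as np

def estimate_affine(sdr_pts, hdr_pts):
    """Least-squares 2D affine transform mapping matched HDR feature points
    onto the corresponding SDR feature points."""
    n = len(hdr_pts)
    A = np.hstack([hdr_pts, np.ones((n, 1))])   # homogeneous [x, y, 1] rows
    coeffs, *_ = np.linalg.lstsq(A, sdr_pts, rcond=None)
    return coeffs                                # 3x2 affine coefficient matrix

def apply_affine(coeffs, pts):
    """Warp 2D points with the fitted affine transform."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ coeffs
```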
[0414] In block 448, the system determines a set of pairs of SDR and HDR color patches, from the set of SDR pixels in the training SDR image and the set of HDR pixels in the training HDR image after the training SDR and HDR images have been spatially aligned by the geometric transform.
[0415] In block 450, the system generates an optimized SDR-to-HDR mapping, based at least in part on the set of pairs of SDR and HDR color patches derived from the training SDR image and the training HDR image.
[0416] In block 452, the system applies the optimized SDR-to-HDR mapping to one or more non-training SDR images to generate one or more corresponding non-training HDR images.
[0417] In an embodiment, the training SDR image and the training HDR image are captured by a capturing device operating in SDR and HDR capture modes, respectively, from a three-dimensional (3D) visual scene.
[0418] In an embodiment, the training SDR image and the training HDR image form a pair of training SDR and HDR images in a plurality of pairs of training SDR and HDR images; the optimized SDR-to-HDR mapping is generated based at least in part on a plurality of sets of pairs of SDR and HDR color patches derived from the plurality of pairs of training SDR and HDR images.
[0419] In an embodiment, each SDR image feature point in the subset of one or more SDR image feature points is matched with a respective HDR image feature point in the subset of one or more HDR image feature points; the SDR image feature point and the HDR image feature point are extracted from the training SDR image and the HDR image, respectively, using a common feature point extraction algorithm.
[0420] In an embodiment, the common feature point extraction algorithm represents one of: a Binary-Robust-Invariant-Scalable-Keypoints algorithm; a Features-from-Accelerated-Segment-Test algorithm; a KAZE algorithm; a minimum eigenvalue algorithm; a maximally-stable-extremal-regions algorithm; an Oriented-FAST-and-Rotated algorithm; a scale-invariant-feature-transform algorithm; a Speeded-Up-Robust-Features algorithm; etc.
[0421]
[0422] In block 464, the system generates a respective projective transform, in a pair of an SDR image projective transform and an HDR image projective transform, using corner pattern marks detected from each undistorted image in the pair of the undistorted training SDR image and the undistorted training HDR image.
[0423] In block 466, the system applies each projective transform in the pair of the SDR image projective transform and the HDR image projective transform to a respective undistorted image in the pair of the undistorted training SDR image and the undistorted training HDR image to generate a respective rectified image in a pair of a rectified training SDR image and a rectified training HDR image.
[0424] In block 468, the system extracts a set of SDR color patches from the rectified training SDR image and extracts a set of HDR color patches from the rectified training HDR image.
[0425] In block 470, the system generates an optimized SDR-to-HDR mapping, based at least in part on the set of SDR color patches and the set of HDR color patches derived from the training SDR image and the training HDR image.
[0426] In block 472, the system applies the optimized SDR-to-HDR mapping to one or more non-training SDR images to generate one or more corresponding non-training HDR images.
[0427] In an embodiment, the training SDR image and the training HDR image are captured by a first capturing device operating in an SDR capture mode and a second capturing device operating in an HDR capture mode, respectively, from a common color chart image.
[0428] In an embodiment, the common color chart image is selected from a plurality of color chart images each of which includes a distinct distribution of color patches arranged in a two-dimensional color chart.
[0429] In an embodiment, the distinct distribution of color patches is generated with random colors randomly selected from a common statistical distribution with a specific combination of statistical mean and variance values.
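The random color chart generation described above can be sketched as sampling patch colors from a normal distribution with the specified mean and variance; the function name, seed handling, and [0, 1] clipping are illustrative assumptions.

```python
import numpy as np

def make_color_chart(rows, cols, mean, std, seed=0):
    """Generate a two-dimensional chart of random RGB color patches drawn
    from a normal distribution with the given mean/standard deviation,
    clipped to the [0, 1] codeword range."""
    rng = np.random.default_rng(seed)
    chart = rng.normal(mean, std, size=(rows, cols, 3))
    return np.clip(chart, 0.0, 1.0)
```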
[0430] In an embodiment, the common color chart image is rendered on and captured by the first capturing device and the second capturing device from a screen of a common reference image display.
[0431] In an embodiment, the respective camera distortion correction operations are based at least in part on camera-specific distortion coefficients generated from a camera calibration process performed with a camera used to acquire the training image.
[0432] In an embodiment, the set of SDR color patches and the set of HDR color patches are used to derive a three-dimensional mapping table (3DMT); the optimized SDR-to-HDR mapping is generated based at least in part on the 3DMT.
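The 3DMT derivation mentioned above can be sketched as follows: SDR codewords are quantized into a 3D bin grid, and the paired HDR codewords are averaged per occupied bin. This is an illustrative simplification, assuming SDR codewords normalized to [0, 1); the binning layout and names are hypothetical.

```python
import numpy as np

def build_3dmt(sdr, hdr, n_bins=8):
    """Sketch of a three-dimensional mapping table: quantize each SDR RGB
    codeword into an (n_bins^3) bin grid and average the paired HDR
    codewords falling into each occupied bin."""
    idx = np.clip((sdr * n_bins).astype(int), 0, n_bins - 1)
    keys = idx[:, 0] * n_bins * n_bins + idx[:, 1] * n_bins + idx[:, 2]
    sums = np.zeros((n_bins ** 3, 3))
    counts = np.zeros(n_bins ** 3)
    np.add.at(sums, keys, hdr)    # accumulate HDR codewords per bin
    np.add.at(counts, keys, 1)    # count samples per bin
    return {int(k): sums[k] / counts[k] for k in np.flatnonzero(counts > 0)}
```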
[0433] In an embodiment, the optimized SDR-to-HDR mapping represents one of: TPB based mapping or a non-TPB-based mapping.
[0434] In an embodiment, the optimized SDR-to-HDR mapping is one of: a static mapping that is applied to all non-training SDR images represented in a video signal, or a dynamic mapping that is generated based at least in part on a specific value distribution of SDR codewords of a non-training SDR image among non-training SDR images represented in the video signal.
[0435]
[0436] In block 484, the system converts the sampled HDR color space points into SDR color space points in a first SDR color space in which SDR images to be edited by an editing device are represented.
[0437] In block 486, the system determines a bounding SDR color space rectangle based on extreme SDR codeword values of the SDR color space points in the first SDR color space and determining an irregular 3D shape from a distribution of the SDR color space points.
[0438] In block 488, the system builds sampled SDR color space points distributed throughout the bounding SDR color space rectangle in the first SDR color space.
[0439] In block 490, the system uses the sampled SDR color space points and the irregular shape to generate a boundary clipping 3D-LUT. The boundary clipping 3D-LUT uses the sampled SDR color space points as lookup keys.
[0440] In block 492, the system performs clipping operations, based at least in part on the boundary clipping 3D-LUT, on an edited SDR image in the first SDR color space to generate a boundary clipped edited SDR image in the first SDR color space.
[0441] In an embodiment, the clipping operations include first using the bounding SDR color space rectangle to perform regular clipping on the edited SDR image to generate a regularly clipped edited SDR image and subsequently using the 3D-LUT to perform irregular clipping on the regularly clipped edited SDR image to generate the boundary clipped edited SDR image.
[0442] In an embodiment, a set of one or more SDR pixels in the SDR image to be edited is edited from one or more first luminance values to one or more second luminance values in the edited image; the one or more second luminance values are different from the one or more first luminance values.
[0443] In an embodiment, a set of one or more SDR pixels in the SDR image to be edited is edited from one or more first chrominance values to one or more second chrominance values in the edited image; the one or more second chrominance values are different from the one or more first chrominance values.
[0444] In an embodiment, an image detail depicted in the SDR image to be edited is removed in the edited SDR image.
[0445] In an embodiment, an image detail not depicted in the SDR image to be edited is added in the edited SDR image.
[0446] In an embodiment, the 3D-LUT includes one or more nodes each of which includes a lookup key and a lookup value; the lookup key equals the lookup value; the lookup key is within the irregular shape.
[0447] In an embodiment, the 3D-LUT includes one or more nodes each of which includes a lookup key and a lookup value; the lookup key is outside the irregular shape whereas the lookup value is within the irregular shape.
[0448] In an embodiment, the lookup value is determined based on an index function that takes the irregular shape and the lookup key as input and returns a nearest neighbor to the lookup key as output.
[0449] In an embodiment, a computing device such as a display device, a mobile device, a set-top box, a multimedia device, etc., is configured to perform any of the foregoing methods. In an embodiment, an apparatus comprises a processor and is configured to perform any of the foregoing methods. In an embodiment, a non-transitory computer readable storage medium stores software instructions which, when executed by one or more processors, cause performance of any of the foregoing methods.
[0450] In an embodiment, a computing device comprises one or more processors and one or more storage media storing a set of instructions which, when executed by the one or more processors, cause performance of any of the foregoing methods.
[0451] Note that, although separate embodiments are discussed herein, any combination of embodiments and/or partial embodiments discussed herein may be combined to form further embodiments.
Example Computer System Implementation
[0452] Embodiments of the present invention may be implemented with a computer system, systems configured in electronic circuitry and components, an integrated circuit (IC) device such as a microcontroller, a field programmable gate array (FPGA), or another configurable or programmable logic device (PLD), a discrete time or digital signal processor (DSP), an application specific IC (ASIC), and/or apparatus that includes one or more of such systems, devices or components. The computer and/or IC may perform, control, or execute instructions relating to the adaptive perceptual quantization of images with enhanced dynamic range, such as those described herein. The computer and/or IC may compute any of a variety of parameters or values that relate to the adaptive perceptual quantization processes described herein. The image and video embodiments may be implemented in hardware, software, firmware and various combinations thereof.
[0453] Certain implementations of the invention comprise computer processors which execute software instructions which cause the processors to perform a method of the disclosure. For example, one or more processors in a display, an encoder, a set top box, a transcoder or the like may implement methods related to adaptive perceptual quantization of HDR images as described above by executing software instructions in a program memory accessible to the processors. Embodiments of the invention may also be provided in the form of a program product. The program product may comprise any non-transitory medium which carries a set of computer-readable signals comprising instructions which, when executed by a data processor, cause the data processor to execute a method of an embodiment of the invention. Program products according to embodiments of the invention may be in any of a wide variety of forms. The program product may comprise, for example, physical media such as magnetic data storage media including floppy diskettes and hard disk drives, optical data storage media including CD ROMs and DVDs, electronic data storage media including ROMs, flash RAM, or the like. The computer-readable signals on the program product may optionally be compressed or encrypted.
[0454] Where a component (e.g. a software module, processor, assembly, device, circuit, etc.) is referred to above, unless otherwise indicated, reference to that component (including a reference to a means) should be interpreted as including as equivalents of that component any component which performs the function of the described component (e.g., that is functionally equivalent), including components which are not structurally equivalent to the disclosed structure which performs the function in the illustrated example embodiments of the invention.
[0455] According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
[0456] For example,
[0457] Computer system 500 also includes a main memory 506, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in non-transitory storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
[0458] Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
[0459] Computer system 500 may be coupled via bus 502 to a display 512, such as a liquid crystal display, for displaying information to a computer user. An input device 514, including alphanumeric and other keys, is coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
[0460] Computer system 500 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 500 to be a special-purpose machine. According to one embodiment, the techniques as described herein are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another storage medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
[0461] The term storage media as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
[0462] Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
[0463] Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 500 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 502. Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions. The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.
[0464] Computer system 500 also includes a communication interface 518 coupled to bus 502. Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522. For example, communication interface 518 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
[0465] Network link 520 typically provides data communication through one or more networks to other data devices. For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526. ISP 526 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the Internet 528. Local network 522 and Internet 528 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 520 and through communication interface 518, which carry the digital data to and from computer system 500, are example forms of transmission media.
[0466] Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518. In the Internet example, a server 530 might transmit a requested code for an application program through Internet 528, ISP 526, local network 522 and communication interface 518.
[0467] The received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution.
EQUIVALENTS, EXTENSIONS, ALTERNATIVES AND MISCELLANEOUS
[0468] In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what are claimed embodiments of the invention, and what is intended by the applicants to be claimed embodiments of the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
ENUMERATED EXEMPLARY EMBODIMENTS
[0469] The invention may be embodied in any of the forms described herein, including, but not limited to the following Enumerated Example Embodiments (EEEs) which describe structure, features, and functionality of some portions of embodiments of the present invention.
[0470] EEE1. A method comprising: [0471] building sampled high dynamic range (HDR) color space points distributed throughout an HDR color space, wherein the HDR color space is parameterized by a color primary scaling parameter with a candidate value selected from among a plurality of candidate values, wherein the color primary scaling parameter is used to compute color space coordinates of at least one of multiple color primaries delineating the HDR color space; [0472] generating from the sampled HDR color space points in the HDR color space: (a) reference standard dynamic range (SDR) color space points represented in a reference SDR color space, (b) input HDR color space points represented in an input HDR color space, and (c) reference HDR color space points represented in a reference HDR color space; [0473] executing a reshaping operation optimization algorithm to generate a chain of an optimized forward reshaping mapping and an optimized backward reshaping mapping, wherein the reshaping operation optimization algorithm uses the reference SDR color space points, the input HDR color space points and the reference HDR color space points as input; [0474] wherein the optimized forward reshaping mapping is used to forward reshape input HDR images in the input HDR color space into forward reshaped SDR images in a forward reshaped SDR color space; wherein the optimized backward reshaping mapping is used to backward reshape the forward reshaped SDR images in the forward reshaped SDR color space into backward reshaped HDR images.
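The sampling step of EEE1 can be illustrated with a minimal sketch. The grid spacing, the choice of which primary is scaled, the white point value, and the helper names (`sample_hdr_points`, `scale_primary`) are illustrative assumptions rather than part of the claimed method:

```python
def sample_hdr_points(n=5):
    """Uniformly sample a normalized HDR color cube at n steps per
    channel, yielding n**3 sampled (R, G, B) color space points."""
    step = 1.0 / (n - 1)
    return [(r * step, g * step, b * step)
            for r in range(n) for g in range(n) for b in range(n)]

def scale_primary(xy, alpha, white=(0.3127, 0.3290)):
    """Move a color primary's chromaticity coordinate toward (alpha < 1)
    or away from (alpha > 1) the white point by scaling parameter alpha,
    one plausible reading of the color primary scaling parameter."""
    return tuple(w + alpha * (p - w) for p, w in zip(xy, white))

points = sample_hdr_points(5)                    # 125 sampled points
red_scaled = scale_primary((0.708, 0.292), 0.9)  # Rec.2100 red primary, hypothetical alpha
```

In practice the candidate values for `alpha` would be swept (per EEE3/EEE5), with one forward/backward chain optimized per candidate.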
[0475] EEE2. The method of EEE1, wherein the sampled HDR color space points are built in the HDR color space without using any image.
[0476] EEE3. The method of EEE1 or EEE2, wherein a plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings is generated by the reshaping operation optimization algorithm for the plurality of candidate values for the color primary scaling parameter; wherein each chain in the plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings includes a respective optimized forward reshaping mapping and a respective optimized backward reshaping mapping.
[0477] EEE4. The method of any of EEE1-EEE3, wherein the sampled HDR color space points are mapped to the reference SDR color space points based at least in part on a predefined HDR-to-SDR mapping.
[0478] EEE5. The method of any of EEE1-EEE4, wherein a plurality of sets of prediction errors are computed for the plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings; wherein each set of prediction errors in the plurality of sets of prediction errors is computed for a respective chain in the plurality of chains of optimized forward reshaping mappings and optimized backward reshaping mappings; wherein the plurality of sets of prediction errors is used to select a specific candidate value from among the plurality of candidate values for the color primary scaling parameter.
[0479] EEE6. The method of EEE5, wherein the specific candidate value for the color primary scaling parameter is used to generate a specific chain of a specific optimized forward reshaping mapping and a specific optimized backward reshaping mapping.
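The selection loop of EEE5/EEE6 can be sketched as a grid search over candidate values. The stand-in mappings below are toy scale functions, not real reshaping mappings; the error metric and function names are assumptions:

```python
def round_trip_error(points, forward, backward):
    """Mean squared error between reference points and their
    forward-then-backward reshaped reconstructions."""
    total = 0.0
    for p in points:
        q = backward(forward(p))
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(points)

def select_candidate(points, chains):
    """Pick the candidate scaling-parameter value whose chain of
    forward/backward mappings yields the smallest prediction error."""
    return min(chains, key=lambda a: round_trip_error(points, *chains[a]))

# Hypothetical chains keyed by candidate parameter values: the 1.0 chain
# reconstructs exactly, the 0.9 chain overshoots by 5%.
chains = {
    0.9: (lambda p: tuple(x * 0.5 for x in p), lambda p: tuple(x * 2.1 for x in p)),
    1.0: (lambda p: tuple(x * 0.5 for x in p), lambda p: tuple(x * 2.0 for x in p)),
}
points = [(0.1, 0.2, 0.3), (0.7, 0.8, 0.9)]
best = select_candidate(points, chains)   # -> 1.0
```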
[0480] EEE7. The method of EEE6, wherein the specific optimized forward reshaping mapping is represented in a forward reshaping three-dimensional lookup table.
[0481] EEE8. The method of EEE6 or EEE7, wherein the specific optimized backward reshaping mapping is represented in a backward reshaping three-dimensional lookup table.
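A three-dimensional lookup table of the kind named in EEE7/EEE8 is typically applied with trilinear interpolation. The following sketch assumes a cubic grid of `n` nodes per axis stored as nested lists, which is one common layout but not specified by these embodiments:

```python
def lut_lookup(lut, n, rgb):
    """Trilinearly interpolate an n x n x n 3D-LUT. lut[i][j][k] holds
    the mapped (R, G, B) for grid node (i, j, k); rgb is a normalized
    input triplet in [0, 1]."""
    # Locate the surrounding cell and fractional offsets per channel.
    idx, frac = [], []
    for c in rgb:
        x = min(c, 1.0) * (n - 1)
        i = min(int(x), n - 2)
        idx.append(i)
        frac.append(x - i)
    i, j, k = idx
    fr, fg, fb = frac
    out = [0.0, 0.0, 0.0]
    # Blend the eight corner nodes of the enclosing cell.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((fr if di else 1 - fr) *
                     (fg if dj else 1 - fg) *
                     (fb if dk else 1 - fb))
                node = lut[i + di][j + dj][k + dk]
                for c in range(3):
                    out[c] += w * node[c]
    return tuple(out)

# Identity LUT on a 2x2x2 grid: each node maps to its own coordinates.
n = 2
lut = [[[(i, j, k) for k in range(n)] for j in range(n)] for i in range(n)]
mapped = lut_lookup(lut, n, (0.25, 0.5, 0.75))   # -> (0.25, 0.5, 0.75)
```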
[0482] EEE9. The method of any of EEE6-EEE8, wherein a video encoder applies the optimized forward reshaping mapping to a sequence of input HDR images to generate a sequence of forward reshaped SDR images and encodes the sequence of forward reshaped SDR images into a video signal.
[0483] EEE10. The method of any of EEE6-EEE9, wherein a video decoder decodes a sequence of forward reshaped SDR images from a video signal and applies the optimized backward reshaping mapping to the sequence of forward reshaped SDR images to generate a sequence of backward reshaped HDR images.
[0484] EEE11. The method of EEE10, wherein a sequence of display images derived from the sequence of backward reshaped HDR images is rendered on an image display operating with the video decoder.
[0485] EEE12. The method of any of EEE1-EEE10, wherein the HDR color space and the input HDR color space share a common white point.
[0486] EEE13. The method of any of EEE1-EEE10, wherein the reshaping operation optimization algorithm represents a Backward-Error-Subtraction-for-signal-Adjustment (BESA) algorithm with neutral color preservation.
[0487] EEE14. A method comprising: [0488] building sampled high dynamic range (HDR) color space points distributed throughout an HDR color space, wherein the HDR color space is parameterized by a color primary scaling parameter with a candidate value selected from among a plurality of candidate values, wherein the color primary scaling parameter is used to compute color space coordinates of at least one of multiple color primaries delineating the HDR color space; [0489] generating from the sampled HDR color space points in the HDR color space: (a) input standard dynamic range (SDR) color space points represented in an input SDR color space and (b) reference HDR color space points represented in a reference HDR color space; [0490] executing a reshaping operation optimization algorithm to generate an optimized backward reshaping mapping, wherein the reshaping operation optimization algorithm receives the input SDR color space points and the reference HDR color space points as input; [0491] wherein the optimized backward reshaping mapping is used to backward reshape SDR images in the input SDR color space into backward reshaped HDR images.
[0492] EEE15. The method of EEE14, wherein the sampled HDR color space points are built in the HDR color space without using any image.
[0493] EEE16. The method of EEE14 or EEE15, wherein a plurality of optimized backward reshaping mappings is generated by the reshaping operation optimization algorithm for the plurality of candidate values for the color primary scaling parameter; wherein each optimized backward reshaping mapping in the plurality of optimized backward reshaping mappings is generated for a respective candidate value in the plurality of candidate values.
[0494] EEE17. The method of any of EEE14-EEE16, wherein a plurality of sets of prediction errors are computed for the plurality of optimized backward reshaping mappings; wherein each set of prediction errors in the plurality of sets of prediction errors is computed for a respective optimized backward reshaping mapping in the plurality of optimized backward reshaping mappings; wherein the plurality of sets of prediction errors is used to select a specific candidate value from among the plurality of candidate values for the color primary scaling parameter.
[0495] EEE18. The method of any of EEE14-EEE17, wherein the sampled HDR color space points are processed by a programmable image signal processor (ISP) pipeline into the input SDR color space points based at least in part on an optimized value for a programmable configuration parameter of the programmable ISP pipeline.
[0496] EEE19. The method of any of EEE14-EEE18, wherein the optimized value for the programmable configuration parameter of the programmable ISP pipeline is determined by minimizing approximation errors between ISP SDR images generated by the programmable ISP pipeline from HDR images and reference SDR images generated by applying a predefined HDR-to-SDR mapping to the same HDR images.
[0497] EEE20. A method comprising: [0498] extracting a set of standard dynamic range (SDR) image feature points from a training SDR image and extracting a set of high dynamic range (HDR) image feature points from a training HDR image; [0499] matching a subset of one or more SDR image feature points in the set of SDR image feature points with a subset of one or more HDR image feature points in the set of HDR image feature points; [0500] using the subset of one or more SDR image feature points and the subset of one or more HDR image feature points to generate a geometric transform to spatially align a set of SDR pixels in the training SDR image with a set of HDR pixels in the training HDR image; [0501] determining a set of pairs of SDR and HDR color patches, from the set of SDR pixels in the training SDR image and the set of HDR pixels in the training HDR image after the training SDR and HDR images have been spatially aligned by the geometric transform; [0502] generating an optimized SDR-to-HDR mapping, based at least in part on the set of pairs of SDR and HDR color patches derived from the training SDR image and the training HDR image; [0503] applying the optimized SDR-to-HDR mapping to one or more non-training SDR images to generate one or more corresponding non-training HDR images.
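The matching-and-alignment steps of EEE20 can be sketched as follows. Nearest-neighbor descriptor matching stands in for the feature matching step, and, for brevity, the geometric transform is estimated as a pure translation; a full implementation would fit a homography or affine transform, typically with outlier rejection such as RANSAC. The feature tuple format, threshold, and function names are illustrative assumptions:

```python
def match_features(sdr_feats, hdr_feats, max_dist=0.5):
    """Greedy nearest-neighbor matching of (position, descriptor)
    feature points; returns pairs of matched positions."""
    def dist(d1, d2):
        return sum((a - b) ** 2 for a, b in zip(d1, d2)) ** 0.5
    pairs = []
    for pos_s, desc_s in sdr_feats:
        best = min(hdr_feats, key=lambda f: dist(desc_s, f[1]))
        if dist(desc_s, best[1]) <= max_dist:
            pairs.append((pos_s, best[0]))
    return pairs

def estimate_translation(pairs):
    """Least-squares translation aligning SDR points onto HDR points,
    the simplest instance of the geometric transform in EEE20."""
    n = len(pairs)
    dx = sum(h[0] - s[0] for s, h in pairs) / n
    dy = sum(h[1] - s[1] for s, h in pairs) / n
    return dx, dy

sdr = [((10, 20), (0.1, 0.9)), ((30, 40), (0.8, 0.2))]
hdr = [((12, 23), (0.1, 0.9)), ((32, 43), (0.8, 0.2))]
pairs = match_features(sdr, hdr)
offset = estimate_translation(pairs)   # -> (2.0, 3.0)
```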
[0504] EEE21. The method of EEE20, wherein the training SDR image and the training HDR image are captured by a capturing device operating in SDR and HDR capture modes, respectively, from a three-dimensional (3D) visual scene.
[0505] EEE22. The method of EEE20 or EEE21, wherein the training SDR image and the training HDR image form a pair of training SDR and HDR images in a plurality of pairs of training SDR and HDR images; wherein the optimized SDR-to-HDR mapping is generated based at least in part on a plurality of sets of pairs of SDR and HDR color patches derived from the plurality of pairs of training SDR and HDR images.
[0506] EEE23. The method of any of EEE20-EEE22, wherein each SDR image feature point in the subset of one or more SDR image feature points is matched with a respective HDR image feature point in the subset of one or more HDR image feature points; wherein the SDR image feature point and the HDR image feature point are extracted from the training SDR image and the training HDR image, respectively, using a common feature point extraction algorithm.
[0507] EEE24. The method of EEE23, wherein the common feature point extraction algorithm represents one of: a Binary-Robust-Invariant-Scalable-Keypoints algorithm; a Features-from-Accelerated-Segment-Test algorithm; a KAZE algorithm; a minimum eigenvalue algorithm; a maximally-stable-extremal-regions algorithm; an Oriented-FAST-and-Rotated-BRIEF algorithm; a scale-invariant-feature-transform algorithm; or a Speeded-Up-Robust-Features algorithm.
[0508] EEE25. A method comprising: [0509] performing respective camera distortion correction operations on each training image in a pair of a training standard dynamic range (SDR) image and a training high dynamic range (HDR) image to generate a respective undistorted image in a pair of an undistorted training SDR image and an undistorted training HDR image; [0510] generating a respective projective transform, in a pair of an SDR image projective transform and an HDR image projective transform, using corner pattern marks detected from each undistorted image in the pair of the undistorted training SDR image and the undistorted training HDR image; [0511] applying each projective transform in the pair of the SDR image projective transform and the HDR image projective transform to a respective undistorted image in the pair of the undistorted training SDR image and the undistorted training HDR image to generate a respective rectified image in a pair of a rectified training SDR image and a rectified training HDR image; [0512] extracting a set of SDR color patches from the rectified training SDR image and extracting a set of HDR color patches from the rectified training HDR image; [0513] generating an optimized SDR-to-HDR mapping, based at least in part on the set of SDR color patches and the set of HDR color patches derived from the training SDR image and the training HDR image; [0514] applying the optimized SDR-to-HDR mapping to one or more non-training SDR images to generate one or more corresponding non-training HDR images.
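The rectification step of EEE25 applies a projective transform to pixel coordinates. A minimal sketch, in which the 3x3 matrix `H` is a hypothetical transform (a real one would be estimated from the detected corner pattern marks):

```python
def apply_homography(H, point):
    """Map a pixel coordinate through a 3x3 projective transform H
    (row-major nested lists), with the usual homogeneous divide."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# Hypothetical transform: scale by 2 and translate by (5, -3).
H = [[2, 0, 5],
     [0, 2, -3],
     [0, 0, 1]]
rectified = apply_homography(H, (10, 10))   # -> (25.0, 17.0)
```

With a general homography the bottom row is not (0, 0, 1), which is why the divide by `w` is needed.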
[0515] EEE26. The method of EEE25, wherein the training SDR image and the training HDR image are captured by a first capturing device operating in an SDR capture mode and a second capturing device operating in an HDR capture mode, respectively, from a common color chart image.
[0516] EEE27. The method of EEE26, wherein the common color chart image is selected from a plurality of color chart images each of which includes a distinct distribution of color patches arranged in a two-dimensional color chart.
[0517] EEE28. The method of EEE27, wherein the distinct distribution of color patches is generated with random colors randomly selected from a common statistical distribution with a specific combination of statistical mean and variance values.
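The random color chart generation of EEE27/EEE28 can be sketched by drawing per-channel values from a Gaussian distribution. The mean, standard deviation, chart dimensions, and clipping to [0, 1] are illustrative parameter choices:

```python
import random

def random_color_chart(rows, cols, mean=0.5, std=0.2, seed=0):
    """Generate a rows x cols chart of random (R, G, B) patches drawn
    from a Gaussian with the given mean and standard deviation,
    clipped to the normalized [0, 1] codeword range."""
    rng = random.Random(seed)
    def channel():
        return min(1.0, max(0.0, rng.gauss(mean, std)))
    return [[(channel(), channel(), channel()) for _ in range(cols)]
            for _ in range(rows)]

chart = random_color_chart(4, 6)   # 24 random color patches
```

Varying the seed yields the plurality of color chart images with distinct patch distributions; varying `mean` and `std` yields the distinct statistical combinations described in EEE28.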
[0518] EEE29. The method of any of EEE25-EEE28, wherein the common color chart image is rendered on and captured by the first capturing device and the second capturing device from a screen of a common reference image display.
[0519] EEE30. The method of any of EEE25-EEE29, wherein the respective camera distortion correction operations are based at least in part on camera-specific distortion coefficients generated from a camera calibration process performed with a camera used to acquire the training image.
[0520] EEE31. The method of any of EEE25-EEE30, wherein the set of SDR color patches and the set of HDR color patches are used to derive a three-dimensional mapping table (3DMT); wherein the optimized SDR-to-HDR mapping is generated based at least in part on the 3DMT.
[0521] EEE32. The method of any of EEE25-EEE31, wherein the optimized SDR-to-HDR mapping represents one of: tensor-product B-Spline (TPB) based mapping or a non-TPB-based mapping.
[0522] EEE33. The method of any of EEE25-EEE32, wherein the optimized SDR-to-HDR mapping is one of: a static mapping that is applied to all non-training SDR images represented in a video signal, or a dynamic mapping that is generated based at least in part on a specific value distribution of SDR codewords of a non-training SDR image among non-training SDR images represented in the video signal.
[0523] EEE34. A method comprising: [0524] building sampled high dynamic range (HDR) color space points distributed throughout an HDR color space used to represent reconstructed HDR images; [0525] converting the sampled HDR color space points into standard dynamic range (SDR) color space points in a first SDR color space in which SDR images to be edited by an editing device are represented; [0526] determining a bounding SDR color space rectangle based on extreme SDR codeword values of the SDR color space points in the first SDR color space and determining an irregular three-dimensional (3D) shape from a distribution of the SDR color space points; [0527] building sampled SDR color space points distributed throughout the bounding SDR color space rectangle in the first SDR color space; [0528] using the sampled SDR color space points and the irregular shape to generate a boundary clipping 3D lookup table (3D-LUT), wherein the boundary clipping 3D-LUT uses the sampled SDR color space points as lookup keys; [0529] performing clipping operations, based at least in part on the boundary clipping 3D-LUT, on an edited SDR image in the first SDR color space to generate a boundary clipped edited SDR image in the first SDR color space.
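The two-stage clipping of EEE34/EEE35 can be sketched per pixel. Here an explicit set of valid points stands in for the precomputed boundary clipping 3D-LUT, and the nearest-neighbor snap plays the role of the index function described in EEE42; names and the squared-distance metric are assumptions:

```python
def clip_to_gamut(pixel, box_min, box_max, valid_points):
    """Two-stage clipping: first clip to the bounding SDR color space
    box (regular clipping), then snap to the nearest point of the
    irregular valid region (irregular clipping)."""
    # Stage 1: regular clipping against the bounding box.
    boxed = tuple(min(max(c, lo), hi)
                  for c, lo, hi in zip(pixel, box_min, box_max))
    # Stage 2: irregular clipping via nearest-neighbor lookup.
    if boxed in valid_points:
        return boxed
    return min(valid_points,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(boxed, p)))

valid = {(0.2, 0.2, 0.2), (0.5, 0.5, 0.5), (0.8, 0.8, 0.8)}
out = clip_to_gamut((1.5, 0.9, 0.9), (0.0,) * 3, (1.0,) * 3, valid)
# -> (0.8, 0.8, 0.8)
```

A real implementation would bake the stage-2 nearest-neighbor result into the 3D-LUT at build time so that runtime clipping is a single table lookup per pixel.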
[0530] EEE35. The method of EEE34, wherein the clipping operations include first using the bounding SDR color space rectangle to perform regular clipping on the edited SDR image to generate a regularly clipped edited SDR image and subsequently using the 3D-LUT to perform irregular clipping on the regularly clipped edited SDR image to generate the boundary clipped edited SDR image.
[0531] EEE36. The method of EEE34 or EEE35, wherein a set of one or more SDR pixels in the SDR image to be edited is edited from one or more first luminance values to one or more second luminance values in the edited image; wherein the one or more second luminance values are different from the one or more first luminance values.
[0532] EEE37. The method of any of EEE34-EEE36, wherein a set of one or more SDR pixels in the SDR image to be edited is edited from one or more first chrominance values to one or more second chrominance values in the edited image; wherein the one or more second chrominance values are different from the one or more first chrominance values.
[0533] EEE38. The method of any of EEE34-EEE37, wherein an image detail depicted in the SDR image to be edited is removed in the edited SDR image.
[0534] EEE39. The method of any of EEE34-EEE38, wherein an image detail not depicted in the SDR image to be edited is added in the edited SDR image.
[0535] EEE40. The method of any of EEE34-EEE39, wherein the 3D-LUT includes one or more nodes each of which includes a lookup key and a lookup value; wherein the lookup key equals the lookup value; wherein the lookup key is within the irregular shape.
[0536] EEE41. The method of any of EEE34-EEE40, wherein the 3D-LUT includes one or more nodes each of which includes a lookup key and a lookup value; wherein the lookup key is outside the irregular shape whereas the lookup value is within the irregular shape.
[0537] EEE42. The method of EEE41, wherein the lookup value is determined based on an index function that takes the irregular shape and the lookup key as input and returns a nearest neighbor to the lookup key as output.
[0538] EEE43. An apparatus comprising a processor and configured to perform any one of the methods recited in EEEs 1-42.
[0539] EEE44. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions for executing a method with one or more processors in accordance with any of the methods recited in EEEs 1-42.
[0540] EEE45. A computer system configured to perform any one of the methods recited in EEEs 1-42.