Method and apparatus for conversion of HDR signals
10848729 · 2020-11-24
CPC classification
G09G2320/0276
PHYSICS
G09G2360/16
PHYSICS
G09G2370/04
PHYSICS
H04N9/77
ELECTRICITY
H04N9/68
ELECTRICITY
International classification
H04N9/77
ELECTRICITY
H04N9/68
ELECTRICITY
Abstract
Described are concepts, systems and techniques related to processing an input video signal intended for a first display to produce an output signal appropriate for a second display. The concepts, systems and techniques include converting using one or more transfer functions arranged to provide relative scene light values and remove or apply rendering intent of the input or output video signal, wherein the removing or applying rendering intent alters luminance.
Claims
1. A method of processing a scene referred video signal representing a scene detected by a camera, into a display referred video signal relating to a display, the method comprising converting the scene referred video signal to the display referred video signal using one or more transfer functions arranged to: apply rendering intent to the scene referred video signal to provide display light values for the display, wherein the rendering intent depends on a peak display light value for the display and a surrounding luminance level for the display, and wherein applying the rendering intent comprises: applying an inverse opto-electrical transfer function to reverse the effect of an opto-electrical transfer function used to produce the scene referred video signal, wherein applying the inverse opto-electrical transfer function produces a scene light signal representing linear light from the scene detected by the camera; applying an opto-optical transfer function to the scene light signal to produce a display light signal representing linear light for presentation on the display; and encoding the display light signal using the inverse of an electro-optical transfer function of the display to produce the display referred video signal; wherein applying the rendering intent alters luminance without altering relative values of RGB components.
2. A method according to claim 1, further comprising scaling the scene referred video signal to convert between an absolute range and a relative range.
3. A method according to claim 1, wherein the one or more transfer functions are provided by a three-dimensional look up table.
4. A method according to claim 1, wherein the rendering intent is applied as a function of input RGB values according to any of:
5. A method of processing a display referred video signal relating to a display, into a scene referred video signal representing a scene detected by a camera, the method comprising converting the display referred video signal to the scene referred video signal using one or more transfer functions arranged to: remove rendering intent of the display referred video signal to provide relative scene light values, wherein the rendering intent depends on a peak display light value for the display and a surrounding luminance level for the display, and wherein removing the rendering intent comprises: applying an electro-optical transfer function of the display to the display referred video signal to generate a display light signal representing the linear light for presentation on the display; applying an inverse opto-optical transfer function to the display light signal to produce a scene light signal representing linear light from the scene detected by the camera; and encoding the scene light signal with an opto-electrical transfer function to produce the scene referred video signal; wherein removing the rendering intent alters luminance without altering relative values of RGB components.
6. A method according to claim 5, further comprising scaling the display referred video signal to convert between an absolute range and a relative range.
7. A method according to claim 5, wherein the one or more transfer functions are provided by a three-dimensional look up table.
8. A method according to claim 5, wherein the rendering intent is removed as a function of input RGB values according to any of:
9. A method of processing an input video signal referred to a first display, into an output video signal referred to a second display, the method comprising converting the input video signal to the output video signal using one or more transfer functions arranged to: (a) remove rendering intent of the first display to provide relative scene light values, wherein the rendering intent of the first display depends on a peak display light value for the first display and a surrounding luminance level for the first display, wherein removing the rendering intent of the first display comprises: applying an electro-optical transfer function of the first display to the input signal to generate a first display light signal representing linear light for presentation on the first display; and applying an inverse opto-optical transfer function to the first display light signal to generate a scene light signal representing the linear light from a scene detected by a camera; and (b) apply rendering intent of the second display, wherein the rendering intent of the second display depends on a peak display light value for the second display and a surrounding luminance level for the second display, and wherein applying the rendering intent of the second display comprises: applying an opto-optical transfer function to the scene light signal to produce a second display light signal representing linear light for presentation on the second display; and encoding the second display light signal using the inverse of an electro-optical transfer function of the second display to produce the output signal; wherein the removing or applying rendering intent alters luminance without altering relative values of RGB components.
10. A method according to claim 9, further comprising scaling the input video signal to convert between an absolute range and a relative range.
11. A method according to claim 9, wherein the one or more transfer functions are provided by a three-dimensional look up table.
12. A method according to claim 9, wherein removing the rendering intent of the first display, or applying the rendering intent of the second display, or both, is applied as a function of input RGB values according to any of:
13. A converter for processing a scene referred video signal representing a scene detected by a camera, into a display referred video signal relating to a display, the converter comprising processors programmed to perform, or dedicated hardware arranged to perform, one or more transfer functions to: apply rendering intent to the scene referred video signal to provide display light values for the display, wherein the rendering intent depends on a peak display light value for the display and a surrounding luminance level for the display, and wherein applying the rendering intent comprises: applying an inverse opto-electrical transfer function to reverse the effect of an opto-electrical transfer function used to produce the scene referred video signal, wherein applying the inverse opto-electrical transfer function produces a scene light signal representing linear light from the scene detected by the camera; applying an opto-optical transfer function to the scene light signal to produce a display light signal representing linear light for presentation on the display; and encoding the display light signal using the inverse of an electro-optical transfer function of the display to produce the display referred video signal; wherein applying the rendering intent alters luminance without altering relative values of RGB components.
14. A converter according to claim 13, further comprising means for scaling to convert between an absolute range and a relative range.
15. A device, or receiver, or set top box, or display, or transmitter, or apparatus being part of a studio chain, comprising the converter of claim 13.
16. A converter for processing a display referred video signal relating to a display, into a scene referred video signal representing a scene detected by a camera, the converter comprising processors programmed to perform, or dedicated hardware arranged to perform, one or more transfer functions to: remove rendering intent of the display referred video signal to provide relative scene light values, wherein the rendering intent depends on a peak display light value for the display and a surrounding luminance level for the display, and wherein removing the rendering intent comprises: applying an electro-optical transfer function of the display to the display referred video signal to generate a display light signal representing the linear light for presentation on the display; applying an inverse opto-optical transfer function to the display light signal to produce a scene light signal representing linear light from the scene detected by the camera; and encoding the scene light signal with an opto-electrical transfer function to produce the scene referred video signal; wherein removing the rendering intent alters luminance without altering relative values of RGB components.
17. A converter according to claim 16, further comprising means for scaling to convert between an absolute range and a relative range.
18. A device, or receiver, or set top box, or display, or transmitter, or apparatus being part of a studio chain, comprising the converter of claim 16.
19. A converter for processing an input video signal referred to a first display, into an output video signal referred to a second display, the converter comprising processors programmed to perform, or dedicated hardware arranged to perform, one or more transfer functions to: (a) remove rendering intent of the first display to provide relative scene light values, wherein the rendering intent of the first display depends on a peak display light value for the first display and a surrounding luminance level for the first display, wherein removing the rendering intent of the first display comprises: applying an electro-optical transfer function of the first display to the input signal to generate a first display light signal representing linear light for presentation on the first display; and applying an inverse opto-optical transfer function to the first display light signal to generate a scene light signal representing the linear light from a scene detected by a camera; and (b) apply rendering intent of the second display, wherein the rendering intent of the second display depends on a peak display light value for the second display and a surrounding luminance level for the second display, and wherein applying the rendering intent of the second display comprises: applying an opto-optical transfer function to the scene light signal to produce a second display light signal representing linear light for presentation on the second display; and encoding the second display light signal using the inverse of an electro-optical transfer function of the second display to produce the output signal; wherein the removing or applying rendering intent alters luminance without altering relative values of RGB components.
20. A converter according to claim 19, further comprising means for scaling to convert between an absolute range and a relative range.
21. A device, or receiver, or set top box, or display, or transmitter, or apparatus being part of a studio chain, comprising the converter of claim 19.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The concepts, systems and techniques will be described in more detail by way of example with reference to the accompanying drawings.
DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
(11) The concepts sought to be protected may be embodied in a method of processing video signals to convert between video signals appropriate for one display and signals appropriate for a target display, devices for performing such conversion, transmitters, receivers and systems involving such conversion.
(12) An embodiment will be described in relation to processing which may be embodied in a component within a broadcast chain. The component may be referred to as a converter for ease of discussion, but it is to be understood as a functional module that may be implemented in hardware or software within another device or as a standalone component. The converter may be within production equipment, a transmitter or a receiver, or within a display. The functions may be implemented as a 3D look up table. Some background relating to video signals will be presented first for ease of reference.
(13) Scene Referred & Display Referred Signals
(14) High dynamic range (HDR) television offers the potential for delivering much greater impact than conventional, or standard, dynamic range (SDR) television. Standards for HDR television signals are needed to support the development and interoperability of the equipment and infrastructure needed to produce and deliver HDR TV. Two different approaches to HDR signal standardisation are emerging. These may be referred to as scene referred and display referred and are described below. It is likely that movies and videos will be produced using both types of signal. We have appreciated the need to interconvert between signals such as these two types of signal. This disclosure describes how to perform such conversions whilst maintaining the image quality and artistic intent embodied in the signals. Furthermore, with one type of signal (display referred), processing is also required to convert between signals intended to be shown on displays with different brightnesses. This disclosure also describes how to perform inter-conversions between different display referred signals. The main embodiment described is for HDR signals but the techniques described also apply to other signals representing moving images.
(15) A scene referred signal represents the relative luminance that would be captured by a camera, that is the light from a scene. Such signals typically encode dimensionless (i.e. normalised) values in the range zero to one, where zero represents black and one represents the brightest signal that can be detected without the camera sensor saturating. This type of signal is used in conventional television signals, for example as specified in international standard ITU-R BT 709. Such signals may be presented on displays with different peak luminance. For example the same signal may be shown on a professional display (used in programme production) with a peak luminance of 100 cd/m.sup.2 or a consumer TV with a peak luminance of 400 cd/m.sup.2 viewed in a home. This is supported by international standard ITU-R BT 1886. It defines an electro-optic transfer function (EOTF), which specifies how the signal is converted to light emitted (or reflected) from a display (or screen). In ITU-R BT 1886 the EOTF is parameterised by the peak luminance (and black level) of the display, thereby allowing image presentation on displays of different brightness. The signal from scanning conventional photo-chemical film stock, or from an electronic film camera also represents light from a scene and so is scene referred. Recently a scene referred HDR TV signal was proposed in BBC Research & Development White Paper WHP283. Similar signals have been proposed to the International Telecommunications Union (ITU) for standardisation. In summary, a scene referred signal provides relative luminance and so is dimensionless and represents the light captured by the image sensor in a camera.
(16) A different type of moving image signal, known as display referred, was defined for HDR movies, in SMPTE standard ST 2084 in 2014, and has also been proposed to the ITU for standardisation. This signal represents the light emitted from a display. Therefore this signal represents an absolute luminance level. For example the luminance of a pixel at a specified location on the display may be coded as 2000 cd/m.sup.2. In ST 2084 the signal range is zero to 10000 cd/m.sup.2. Note that in a display referred signal the values have dimension cd/m.sup.2 (or equivalent), whereas in a scene referred signal the values are relative and, therefore, dimensionless.
(17) We have appreciated that the absolute, rather than relative nature of display referred signals presents a difficulty if the signal value is brighter than the peak luminance of a display. For example consider a signal prepared or graded on a display with a peak luminance of 4000 cd/m.sup.2. This signal is likely to contain values close to the peak luminance of the display, 4000 cd/m.sup.2. If you now try to display such a signal on a display capable of only 48 cd/m.sup.2 (which is the brightness of a projected cinema image), we have appreciated the problem of displaying pixels that are supposed to be shown brighter than the display can manage.
(18) One way that has been used hitherto is to show pixels too bright for the display at its peak luminance. This is known as limiting or clipping. However, in this example, the specified luminance of many pixels will be greater than the capabilities of the cinema projector, resulting in large regions in which the image is severely distorted. Clearly clipping is not always a satisfactory method of presenting a display referred signal. This disclosure describes how to convert a display referred signal intended for display at a given brightness to be displayed at a different brightness, whilst preserving image quality and artistic intent.
(19) A key feature of moving image displays is rendering intent. Rendering intent is needed to ensure that the subjective appearance of reproduced pictures is close to the appearance of the real scene. Naively one might think that the luminance of an image should be a scaled version of that captured by the camera. For printed photographic images this is approximately correct; over most of the density range, the points lie near the straight line of unity gamma [described later] passing through the origin (Hunt, R. W. G., 2005. The Reproduction of Colour. ISBN 9780470024263, p 55). But for images displayed in dark surroundings (e.g. projected transparencies, movies, or television) it has long been known that an overall non-linearity between camera and display is required to produce subjectively acceptable pictures (see Hunt ibid, or Poynton, C. & Funt, B., 2014. Perceptual uniformity in digital image representation and display. Color Res. Appl., 39: 6-15). Rendering intent is, therefore, the overall non-linearity applied between camera and display so that the subjective appearance of the image best matches the real scene.
(20) Rendering intent is typically implemented using gamma curves, or approximations thereto, in both the camera and the display. A gamma curve is simply a power law relationship between the signal values and luminance. In the camera the relationship between the relative light intensity, L.sub.c (range [0:1]), detected by the camera, and the values encoded in the signal, V (range [0:1]), may be approximated by:
V=L.sub.c.sup.γ.sub.c
(21) Similarly, in the display, the relationship between the emitted light, L.sub.d (range [0:1], normalised to the peak display brightness), and the signal value V may be approximated by:
L.sub.d=V.sup.γ.sub.d
Therefore:
L.sub.d=L.sub.c.sup.γ.sub.cγ.sub.d
(23) If γ.sub.d=1/γ.sub.c then, overall, the camera/display system is linear, but this is seldom the case in practice. More generally the overall, end to end, system gamma is given by the product of γ.sub.c and γ.sub.d.
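The gamma relationships above can be sketched in a few lines of Python (a minimal illustration with made-up gamma values, not the standardised curves; the function names are invented for this sketch):

```python
# Toy model of the camera/display gamma chain described above.
# All signals are normalised to [0, 1]; gamma values are illustrative.

def camera_oetf(l_c, gamma_c):
    """Camera encoding: V = L_c ** gamma_c."""
    return l_c ** gamma_c

def display_eotf(v, gamma_d):
    """Display decoding: L_d = V ** gamma_d."""
    return v ** gamma_d

def end_to_end(l_c, gamma_c, gamma_d):
    """Overall system: L_d = L_c ** (gamma_c * gamma_d)."""
    return display_eotf(camera_oetf(l_c, gamma_c), gamma_d)

# If gamma_d == 1/gamma_c the overall system is linear:
assert abs(end_to_end(0.5, 0.5, 2.0) - 0.5) < 1e-12
```

With a camera gamma of 0.5 and a display gamma of 2.4, the end-to-end system gamma is 1.2, matching the reference-monitor figure quoted below.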
(24) Different rendering intents are used for different forms of image reproduction. Projected photographic transparencies use a system gamma of about 1.5. Movies typically apply a system gamma of about 1.56. Reference monitors, used in television production, apply a system gamma of about 1.2. The system gamma used depends primarily on the brightness of the display and the background luminance surrounding the display. Experimentally we have found that the system gamma providing the best subjective picture rendition may be approximated by:
(25)
(26) where L.sub.peak is the peak luminance of the picture, and L.sub.surround is the luminance surrounding the display. In any given viewing environment a more precise value of system gamma may be determined experimentally. Using such custom values of system gamma, rather than the approximate generic formula above, may improve the fidelity of the image conversion described below.
(27) Gamma curves have been found empirically to provide a rendering intent that subjectively yields high quality images. Nevertheless other similarly shaped curves might yield improved subjective quality. The techniques disclosed herein are described in terms of gamma curves, but the same techniques may be applied with curves of a different shape.
(28) Colour images consist of three separate colour components, red, green and blue, which affects how rendering intent should be applied. We have appreciated that applying a gamma curve to each component separately distorts the colour. It particularly distorts saturation but also, to a lesser extent, the hue. For example, suppose the red, green and blue components of a pixel have (normalised) values of (0.25, 0.75, 0.25). Now if we apply a gamma of 2, i.e. square the component values, we get (0.0625, 0.5625, 0.0625). We may note two results: the pixel has got slightly darker, and the ratio of green to blue and red has increased (from 3:1 to 9:1), which means that a green pixel has got even greener. In general we would not wish to distort colours when displaying them, so this approach is not ideal.
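The arithmetic in this example is easy to check (illustrative Python; the approach shown is the per-component one the text warns against):

```python
def per_component_gamma(rgb, gamma):
    # Apply gamma independently to each colour component, which
    # distorts saturation (and, to a lesser extent, hue).
    return tuple(c ** gamma for c in rgb)

rgb = (0.25, 0.75, 0.25)
out = per_component_gamma(rgb, 2.0)
# out is (0.0625, 0.5625, 0.0625): the pixel darkens and the
# green-to-red ratio grows from 3:1 to 9:1, so the green becomes
# even greener.
```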
(29) Rather than applying a gamma curve independently to each colour component we have appreciated we may apply it only to the luminance (loosely the brightness). The luminance of a pixel is given by a weighted sum of the colour components; the weights depend on the colour primaries and the white point. For example with HDTV, specified in ITU-R BT 709, luminance is given by:
Y=0.2126R+0.7152G+0.0722B
or, for the newer UHDTV, specified in ITU-R BT 2020, luminance is given by:
Y=0.2627R+0.6780G+0.0593B
where Y represents luminance and R, G and B represent the normalised, linear (i.e. without applying gamma correction), colour components.
(30) By applying a gamma curve, or rendering intent, to the luminance component only we can avoid colour changes in the display.
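The luminance-only approach can be sketched as follows (BT 2020 weights from above; illustrative code, not a normative implementation):

```python
# Apply rendering intent to luminance only: every component is scaled
# by the same factor Y_d / Y_s, so component ratios (and hence hue and
# saturation) are preserved.

def luminance_bt2020(r, g, b):
    # Luminance weights from ITU-R BT 2020, as quoted above.
    return 0.2627 * r + 0.6780 * g + 0.0593 * b

def apply_ootf_luminance_only(rgb, gamma):
    r, g, b = rgb
    ys = luminance_bt2020(r, g, b)
    if ys == 0:
        return (0.0, 0.0, 0.0)
    yd = ys ** gamma
    scale = yd / ys
    return (r * scale, g * scale, b * scale)

r, g, b = apply_ootf_luminance_only((0.25, 0.75, 0.25), 2.0)
# Unlike the per-component case, the green-to-red ratio is still 3:1.
assert abs(g / r - 3.0) < 1e-9
```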
(31) Image Signal Chain
(33) As shown in
(34) Conventionally the OETF is applied independently to the three colour components (although in principle it could be a non-separable, joint function of them). This allows it to be implemented very simply using three independent 1 dimensional lookup tables (1D LUTs). Similarly the EOTF has also, conventionally, been implemented independently on the three colour components. Typically the EOTF is implemented using three non-linear digital to analogue converters (DACs) immediately prior to the display panel, which is equivalent to using independent 1D LUTs. However, as discussed above, this leads to colour changes. So, ideally, the EOTF would be implemented as a combined function of the three colour components. This is a little more complex than using 1D LUTs but could be implemented in a three dimensional look up table (3D LUT).
(35) Only two of the OETF, the EOTF and the OOTF are independent. In functional notation:
OOTF.sub.R(R,G,B)=EOTF.sub.R(OETF.sub.R(R,G,B))
OOTF.sub.G(R,G,B)=EOTF.sub.G(OETF.sub.G(R,G,B))
OOTF.sub.B(R,G,B)=EOTF.sub.B(OETF.sub.B(R,G,B))
This is easier to see if we use the symbol ∘ to represent concatenation (the left-hand function is applied first). With this notation we get the following three relationships between these three non-linearities:
OOTF=OETF∘EOTF
EOTF=OETF.sup.-1∘OOTF
OETF=OOTF∘EOTF.sup.-1
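These relationships can be checked numerically with toy gamma curves (illustrative exponents only, here a 0.5 camera gamma and a 2.4 display gamma, giving an end-to-end system gamma of 1.2):

```python
# Toy transfer functions (illustrative values, not from any standard).
def oetf(l):
    return l ** 0.5        # camera encoding

def eotf(v):
    return v ** 2.4        # display decoding

def inv_oetf(v):
    return v ** 2.0        # inverse of the camera encoding

def ootf(l):
    # OOTF = OETF followed by EOTF; here L ** (0.5 * 2.4) = L ** 1.2.
    return eotf(oetf(l))

l = 0.3
assert abs(ootf(l) - l ** 1.2) < 1e-9
# EOTF = (inverse OETF) followed by OOTF:
assert abs(ootf(inv_oetf(oetf(l))) - eotf(oetf(l))) < 1e-9
```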
(36) The display referred signal chain looks superficially similar (and so is not illustrated) but the signal corresponds to display referred image data. A crucial difference is that the EOTF is fixed and does not vary with display brightness, display black level or the viewing environment (particularly the luminance surrounding the display). Rendering intent, or OOTF, must vary with display characteristics and viewing environment to produce a subjectively acceptable picture. Therefore, for a display referred signal, the OOTF, and hence the EOTF, must depend on the specific display on which the signal is to be presented and its viewing environment. For fixed viewing environment, such as viewing movies in a cinema, this is possible. For television, where the display and viewing environment are not known when the programme is produced, this is not practical. In practice display referred signals are intended for producing non-live programmes. The OETF is largely irrelevant as the image is adjusted by an operator until it looks right on the mastering display.
(37) Conversion from Scene Referred Signals to Display Referred Signals
(39) Thus OETF.sub.s.sup.-1 is the inverse OETF for the scene referred signal, OOTF is the desired rendering intent, discussed in more detail below, and EOTF.sub.d.sup.-1 is the inverse of the display EOTF.
(40) The design of the OOTF is described using gamma curves, but a similar procedure may be used for a psycho-visual curve other than a gamma curve. The OETF.sub.s.sup.-1 regenerates the linear light from the scene detected by the camera. From this we may calculate the (normalised) scene luminance Y.sub.s, for example for UHDTV,
Y.sub.s=0.2627R.sub.s+0.6780G.sub.s+0.0593B.sub.s
where the subscript s denotes values relating to the scene. We apply rendering intent to the scene luminance, for example using a gamma curve:
Y.sub.d=Y.sub.s.sup.γ
Here the appropriate gamma may be calculated using the approximate generic formula above, or otherwise. In calculating gamma we need to choose an intended peak image brightness, L.sub.peak, and the luminance surrounding the display, L.sub.surround. The surrounding luminance may be measured by sensors in the display or otherwise. Alternatively it may be estimated based on the expected, or standardised (reference), viewing environment. Once we know the displayed luminance we may calculate the red, green, and blue components to be presented on the display to implement the OOTF directly on each RGB component (Equation 1)
R.sub.d=L.sub.peakR.sub.s(Y.sub.d/Y.sub.s)
G.sub.d=L.sub.peakG.sub.s(Y.sub.d/Y.sub.s)
B.sub.d=L.sub.peakB.sub.s(Y.sub.d/Y.sub.s)
(41) where subscript d denotes values relating to the display. As noted above the scene referred data is dimensionless and normalised to the range [0:1], whereas display referred data has dimensions cd/m.sup.2. To convert to display referred values they should be multiplied (scaled) by the chosen peak image brightness, L.sub.peak. Finally the linear light values calculated this way should be encoded using the inverse of the display referred EOTF, EOTF.sub.d.sup.-1.
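The scene referred to display referred steps above can be sketched end to end as follows. This is a hedged illustration: the OETF and EOTF are stand-in gamma curves rather than the real standardised transfer functions, and the function name is invented:

```python
def scene_to_display(rgb_signal, gamma_oetf, system_gamma, l_peak, eotf_gamma):
    # 1. Invert the camera OETF (V = L ** gamma_oetf) to recover
    #    linear scene light.
    rs, gs, bs = (v ** (1.0 / gamma_oetf) for v in rgb_signal)
    # 2. Normalised scene luminance, using the BT 2020 weights.
    ys = 0.2627 * rs + 0.6780 * gs + 0.0593 * bs
    if ys == 0:
        return (0.0, 0.0, 0.0)
    # 3. Rendering intent (OOTF) applied to luminance only, then
    #    scaled to absolute display light (Equation 1).
    yd = ys ** system_gamma
    scale = l_peak * (yd / ys)
    rd, gd, bd = rs * scale, gs * scale, bs * scale   # cd/m^2
    # 4. Encode with the inverse display EOTF; a normalised gamma
    #    curve stands in for the real (e.g. absolute) EOTF here.
    return tuple((v / l_peak) ** (1.0 / eotf_gamma) for v in (rd, gd, bd))
```

With a system gamma of 1 and mutually inverse OETF/EOTF stand-ins the chain is transparent, which gives a simple sanity check.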
(42) The conversion may be implemented in a variety of ways. The individual components may be implemented using lookup tables and the scaling as an arithmetic multiplier. The OETF and EOTF may be implemented using 1D LUTs, but the OOTF requires a 3D LUT. Alternatively the conversion may conveniently be implemented using a single 3D LUT that combines all separate components.
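As one illustration of the single-LUT option, the whole conversion can be baked into a 3D table once and then applied per pixel. The sketch below uses nearest-neighbour lookup for brevity; a practical implementation would interpolate (e.g. trilinearly) between grid points, and the function names are invented:

```python
def bake_3d_lut(convert, n):
    # Sample an arbitrary RGB->RGB conversion on an n x n x n grid.
    grid = [i / (n - 1) for i in range(n)]
    return {(i, j, k): convert((grid[i], grid[j], grid[k]))
            for i in range(n) for j in range(n) for k in range(n)}

def lookup(lut, n, rgb):
    # Nearest grid point; real LUT hardware would interpolate.
    idx = tuple(min(n - 1, round(v * (n - 1))) for v in rgb)
    return lut[idx]
```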
(43) As a summary of the above, the embodiment applies an opto-optical transfer function (OOTF) as a step in the processing chain to provide the rendering intent appropriate to the target display. In addition, a scaling step is provided to convert between normalised values and absolute values. A particular feature of the embodiment is that the OOTF does not alter colour, more specifically it does not alter hue or saturation. This can be achieved by converting the signal from RGB to a separate luminance component, to which gamma is then applied. Preferably, however, the OOTF is applied directly to the RGB components in such a way that their relative values do not change, so that colour is not altered. In effect, this applies the OOTF directly to the RGB components so as to alter the overall luminance, but not the colour.
(45) Some signals have characteristics of both scene referred and display referred signals. This document refers to such signals as quasi scene referred signals. These include conventional SDR signals. For such signals an alternative method of conversion may yield higher quality results.
(46) For conventional SDR signals the rendering intent is standardised and does not vary with display brightness. This implies the signal has some dependence on the display brightness and viewing environment. The rendering intent will be appropriate provided the peak display luminance is constant relative to the surrounding luminance and there is some degree of latitude in this ratio. In practice, for SDR signals, the conditions for the rendering intent to be substantially correct are usually met even though the brightness of displays can vary substantially.
(47) When the highest quality conversion from a quasi-scene referred signal to a display referred signal is required it may be preferable to derive the linear scene light from the light intended to be shown on a reference display. This would take into account the rendering intent applied to the scene referred signal. Such an approach may also be beneficial for some HDR scene referred signals, such as proposed in BBC White Paper 283, which have similar characteristics to conventional SDR signals.
(48) The difference in the conversion technique, shown in
(49) Here the rendering intents, or OOTFs, are distinguished by subscripts. Subscript d indicates an OOTF used to create the display referred signal. Subscript r indicates the reference OOTF, that is, the OOTF that would be used if the signal were to be rendered onto a reference display. OOTF.sub.r.sup.-1 represents the inverse of the reference OOTF.sub.r; that is, it undoes OOTF.sub.r.
(50) The first functional block in the processing chain, EOTF.sub.r, applies the non-linearity specified for a reference monitor (display). This generates the linear light components that would be presented on a reference monitor. That is:
R.sub.r=EOTF.sub.r(R.sub.s)
G.sub.r=EOTF.sub.r(G.sub.s)
B.sub.r=EOTF.sub.r(B.sub.s)
where R.sub.r, G.sub.r, and B.sub.r are the linear light components on a (virtual) reference monitor. R.sub.s, G.sub.s, and B.sub.s are the non-linear (gamma corrected) quasi scene referred signals. Note that all signals are normalised to the range [0:1]. Note also that these equations assume the EOTF is applied independently to all colour components (e.g. implemented with a 1D LUT), which is usually the case but is not necessary to perform the conversion. Consider, for example, a UHD television signal for which the EOTF is (presumably) specified by ITU-R BT 1886, which may be approximated by a gamma curve with an exponent of 2.4. In this example, EOTF.sub.r(x)=x.sup.2.4, so that:
R.sub.r=R.sub.s.sup.2.4
G.sub.r=G.sub.s.sup.2.4
B.sub.r=B.sub.s.sup.2.4
Once the linear light components are known we may then calculate the reference luminance, Y.sub.r, as indicated above.
(51) In order to undo the implied system gamma (that is, implement OOTF.sub.r.sup.-1) we first consider that:
R.sub.r=R.sub.s(Y.sub.r/Y.sub.s)
G.sub.r=G.sub.s(Y.sub.r/Y.sub.s)
B.sub.r=B.sub.s(Y.sub.r/Y.sub.s)
where R.sub.s, G.sub.s, B.sub.s and Y.sub.s are the linear light components of the scene (which are what we are after). Assuming the rendering intent is a gamma curve (and assuming a zero black offset) then we have
Y.sub.s=Y.sub.r.sup.1/γ
This implies an implementation of the inverse OOTF is (Equation 2):
R.sub.s=R.sub.r(Y.sub.s/Y.sub.r)
G.sub.s=G.sub.r(Y.sub.s/Y.sub.r)
B.sub.s=B.sub.r(Y.sub.s/Y.sub.r)
With UHDTV, for example, which is standard dynamic range (SDR), we know that system gamma is 1.2 (see, for example, EBU-TECH 3321, EBU guidelines for Consumer Flat Panel Displays (FPDs), Annex A, 2007).
(53) So we now have explicit values for the linear light components corresponding to the scene (Scene Light). These may be used, as they were in relation to conversion from scene referred to display referred, to generate a display referred signal.
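A minimal sketch of this quasi scene referred recovery, assuming the 2.4 gamma approximation to the reference EOTF and the 1.2 system gamma quoted above (BT 2020 luminance weights; the function name is illustrative):

```python
def quasi_scene_to_scene_light(rgb_signal, eotf_gamma=2.4, system_gamma=1.2):
    # Reference display light: 1D EOTF applied per component,
    # approximated here by a pure 2.4 gamma curve.
    rr, gr, br = (v ** eotf_gamma for v in rgb_signal)
    # Reference luminance (BT 2020 weights).
    yr = 0.2627 * rr + 0.6780 * gr + 0.0593 * br
    if yr == 0:
        return (0.0, 0.0, 0.0)
    # Undo the implied system gamma on luminance only (Equation 2):
    # Y_s = Y_r ** (1 / gamma), scaling all components by Y_s / Y_r.
    ys = yr ** (1.0 / system_gamma)
    scale = ys / yr
    return (rr * scale, gr * scale, br * scale)
```

For a grey input of 0.5 each component comes out as (0.5 ** 2.4) ** (1/1.2) = 0.5 ** 2 = 0.25, and component ratios are preserved.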
(54) Conversion from Display Referred Signals to Scene Referred Signals
(55)
(56) Here the linear light intended to be presented on a display, Display Light, is first generated using the display EOTF.sub.d. This generates values with units of cd/m.sup.2. The display light is divided by the peak value of display light to produce a dimensionless normalised value. Then the rendering intent (OOTF.sub.d), which was applied to ensure the pictures looked subjectively correct, is undone by applying the inverse of the rendering intent (OOTF.sub.d.sup.-1). This generates a normalised signal representing the (linear) light that would have been detected by a camera viewing the real scene (Scene Light). Finally the linear scene light is encoded using the OETF.sub.r of the scene referred signal.
(57) The peak value of display light may either be provided as an input to the conversion process, or it may be determined by analysing the signal itself. Because the peak value to be displayed may change from frame to frame, it is more difficult to estimate the peak value of a live picture sequence (e.g. from a live sporting event) when the complete signal is not yet available. Note that when converting from a scene referred signal to a display referred signal the peak signal value must be chosen. In this reverse case, converting from a display referred signal to a scene referred signal, this same piece of information, the peak signal value, must be provided or estimated.
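A minimal sketch of estimating the peak from the signal itself is a running maximum over the frames seen so far. As noted above, for live material this can only reflect frames already received, so it may under-estimate the true peak of the complete sequence.

```python
def estimate_peak(frames):
    """Estimate the peak display light by scanning frames already seen.
    Each frame is an iterable of linear light values in cd/m^2. The
    running maximum is only as complete as the frames received so far."""
    peak = 0.0
    for frame in frames:
        peak = max(peak, max(frame))
    return peak
```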
(58) Inverting the OOTF.sub.d is the same process as is used in inverting the OOTF when converting quasi scene referred signals to display referred signals, above.
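The full chain of paragraph (56) may be sketched as follows. This is an illustration under simplifying assumptions: the display EOTF and the rendering intent are both taken as pure gamma curves (a real display referred signal, e.g. SMPTE ST 2084, would use its own EOTF in step 1), and BT.2020 luma coefficients are assumed for the luminance calculation.

```python
import numpy as np

LUMA = (0.2627, 0.6780, 0.0593)  # assumed BT.2020 luma coefficients

def display_to_scene(rgb_d, peak_cd_m2, gamma_d=2.4, system_gamma=1.2):
    """Display referred -> normalised scene light (Scene Light).
    1. EOTF_d: non-linear signal -> display light in cd/m^2.
    2. Divide by the peak display light (dimensionless normalised value).
    3. OOTF_d^-1: scale by Y_d**((1/gamma) - 1) to remove rendering intent.
    """
    rgb_d = np.asarray(rgb_d, dtype=float)
    light = peak_cd_m2 * np.power(rgb_d, gamma_d)   # step 1 (cd/m^2)
    norm = light / peak_cd_m2                       # step 2
    y_d = float(np.dot(LUMA, norm))
    scale = y_d ** (1.0 / system_gamma - 1.0) if y_d > 0 else 0.0
    return norm * scale                             # step 3
```

Encoding the result with the OETF.sub.r of the target scene referred signal would complete the conversion.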
(59)
(60) In this conversion the processing in the signal chain prior to Scene Light is the same as in method two, but the encoding of the Scene Light to generate the quasi scene referred signal is different. To encode Scene Light we first apply the reference OOTF.sub.r. This may be done by applying a gamma curve to the luminance component of the linear scene light, Y.sub.s, that is:
Y.sub.r=Y.sub.s.sup.γ
The individual colour components are then given by (Equation 3):
R.sub.r=R.sub.s(Y.sub.r/Y.sub.s)=R.sub.sY.sub.s.sup.γ-1
G.sub.r=G.sub.s(Y.sub.r/Y.sub.s)=G.sub.sY.sub.s.sup.γ-1
B.sub.r=B.sub.s(Y.sub.r/Y.sub.s)=B.sub.sY.sub.s.sup.γ-1
Scene light encoding is completed by applying the inverse of the reference EOTF.sub.r (e.g. ITU-R BT 1886).
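A sketch of this encoding step, again under the pure gamma approximations used above (system gamma 1.2, BT 1886 approximated as a gamma of 2.4) and with assumed BT.2020 luma coefficients:

```python
import numpy as np

LUMA = (0.2627, 0.6780, 0.0593)  # assumed BT.2020 luma coefficients

def scene_to_quasi(rgb_s, system_gamma=1.2, gamma_eotf=2.4):
    """Encode normalised linear scene light as a quasi scene referred
    signal: apply OOTF_r (Equation 3, scaling each component by
    Y_s**(gamma - 1)), then the inverse of the reference EOTF
    (BT 1886 approximated as a pure gamma of 2.4)."""
    rgb_s = np.asarray(rgb_s, dtype=float)
    y_s = float(np.dot(LUMA, rgb_s))
    scale = y_s ** (system_gamma - 1.0) if y_s > 0 else 0.0
    rgb_r = rgb_s * scale                       # reference linear light
    return np.power(rgb_r, 1.0 / gamma_eotf)    # inverse EOTF_r
```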
(61) Conversion Between Different Display Referred Signals
(62)
(63) Display referred signals differ in the peak level of signal they hold. Each signal relates to a specific display (hence display referred). The signal is incomplete without knowledge of the display, especially its peak level and the luminance level surrounding the display (because these values determine how the pictures should be rendered to achieve high subjective quality). This data may be conveyed with the signal as metadata, or the peak signal level may be measured, or estimated, from the signal itself, and the surrounding luminance measured, or inferred from standards documents or from knowledge of current production practice. SMPTE ST 2084 provides two Reference Viewing Environments in Annex B, for HDTV and Digital Cinema. The HDTV environment has a background luminance of 8 to 12 cd/m.sup.2. The Digital Cinema environment only states the light level reflected from the screen and does not, directly, indicate the background illumination, which must be estimated.
(64) A display referred signal may therefore be considered a container for signals produced (or mastered) on displays with different brightness and viewing environments.
(65) Since different display referred signals may relate to different mastering displays there is a need to convert between them. Furthermore, such conversion implicitly indicates how a signal, mastered at one peak brightness and surrounding illumination, may be reproduced at a different peak brightness and surrounding illumination. So this technique, for converting between display referred signals, may also be used to render a signal intended for one display, on a different display, in high quality. For example, a programme or movie may be mastered on a bright display supporting a peak luminance of 4000 cd/m.sup.2 (e.g. a Dolby Pulsar display), but may need to be shown on a dimmer monitor, e.g. an OLED display (perhaps 1000 cd/m.sup.2) or a cinema display (48 cd/m.sup.2). Prior to this disclosure no satisfactory automatic (algorithmic) method had been suggested to achieve this conversion/rendering. Instead the proponents of SMPTE ST 2084 suggest that the programme or movie be manually re-graded (i.e. adjusted) to provide a satisfactory subjective experience. Clearly an automatic method for performing this conversion potentially provides significant benefits in terms of both cost and simplified production workflows.
(66) This conversion may be implemented by concatenating the processing before Scene Light of the conversion from display referred to scene referred described above (i.e. a first EOTF.sub.d1, cascaded with a first scaling factor and an inverse first OOTF.sub.d1.sup.-1), with the processing after Scene Light of the conversion from scene referred to display referred (i.e. a second OOTF.sub.d2, cascaded with a second scaling factor and an inverse second EOTF.sub.d2.sup.-1). Note that the peak signal value for display referred signal 1 is needed to normalise the signal (scale 1). It is also needed, along with the background illumination, to calculate OOTF.sub.d1, which may be a gamma curve with gamma determined as above. Note that the peak signal value and background illumination are also needed for display 2. Peak signal 2 is used to multiply (scale 2) the normalised signal to produce an absolute (linear) signal with the correct magnitude and dimensions of cd/m.sup.2 (and the background illumination is used to calculate a second value of gamma). By appropriate selection of these peak signal values and background illuminations the signal can be converted between different display referred signals or rendered for display on a display other than that used for production (mastering).
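The concatenated chain of paragraph (66) may be sketched end-to-end as follows. All transfer functions are again assumed to be pure gamma curves for illustration (a real signal such as SMPTE ST 2084 would substitute its own EOTF), the rendering-intent gammas gamma1 and gamma2 stand for the values derived from each display's peak level and surround, and BT.2020 luma coefficients are assumed.

```python
import numpy as np

LUMA = (0.2627, 0.6780, 0.0593)  # assumed BT.2020 luma coefficients

def convert_display_referred(rgb1, peak1, gamma1, peak2, gamma2,
                             gamma_d=2.4):
    """Convert between display referred signals by concatenating:
    EOTF_d1 -> scale 1 -> OOTF_d1^-1 -> Scene Light ->
    OOTF_d2 -> scale 2 -> EOTF_d2^-1."""
    rgb1 = np.asarray(rgb1, dtype=float)
    # EOTF_d1 gives display 1 light in cd/m^2; scale 1 normalises it
    norm = (peak1 * np.power(rgb1, gamma_d)) / peak1
    # OOTF_d1^-1: remove display 1 rendering intent -> Scene Light
    y1 = float(np.dot(LUMA, norm))
    scene = norm * (y1 ** (1.0 / gamma1 - 1.0) if y1 > 0 else 0.0)
    # OOTF_d2: apply display 2 rendering intent
    y_s = float(np.dot(LUMA, scene))
    disp2 = scene * (y_s ** (gamma2 - 1.0) if y_s > 0 else 0.0)
    # Scale 2 to absolute light (cd/m^2), then encode with EOTF_d2^-1
    light2 = peak2 * disp2
    return np.power(light2 / peak2, 1.0 / gamma_d)
```

When the two displays share the same peak level and surround (so gamma1 equals gamma2), the chain reduces to the identity, as expected.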
(67) Conversion Between Scene Referred Signals and Quasi Scene Referred Signals
(68) For completeness, we will describe conversion between scene referred signals and quasi scene referred signals. Whilst these are not the main embodiments, similar steps are performed.
(69) The sections above consider 3 types of signal: a scene referred signal (e.g. a proprietary camera response curve such as Sony S-Log), a quasi scene referred signal (e.g. ITU-R BT 709, which uses ITU-R BT 1886 as a reference EOTF), or a display referred signal (e.g. SMPTE ST 2084). With three types of signal, 9 types of conversion are possible, and only 4 conversions are described above. The remaining conversions are between scene referred signals and quasi scene referred signals, which may also be useful. These conversions may be implemented by permuting the processing before and after Scene Light in the methods above.
(70) Conversion from a scene referred signal to a quasi-scene referred signal: This conversion may be implemented by concatenating the processing before Scene Light in
(71) Conversion from a quasi scene referred signal to a scene referred signal: This conversion may be implemented by concatenating the processing before Scene Light in
(72) Conversion from a quasi scene referred signal to a different quasi-scene referred signal: This conversion may be implemented by concatenating the processing before Scene Light in
(73) Conversion from a scene referred signal to a different scene referred signal: This conversion may be implemented by concatenating the processing before Scene Light in
(74) Accordingly, the concepts described herein should not be limited to disclosed embodiments but rather should be limited only by the spirit and scope of the appended claims. All publications and references cited herein are expressly incorporated herein by reference in their entirety.