METHODS AND APPARATUS FOR MOTION-BASED VIDEO TONAL STABILIZATION

Abstract

One general aspect for motion-based video tonal stabilization uses a keyframe and motion estimation techniques to determine the level of spatial correspondence between input images and the keyframe. When the level of spatial correspondence is high, tonal stabilization is performed through regression and power law tonal transformation to minimize the color differences between images caused by automatic camera parameters without a priori knowledge of the camera model. Tonal error accumulation is reduced by using long-term tonal propagation.

Claims

1-8. (canceled)

9. A method for correcting color and/or brightness instabilities of a sequence of images, said method comprising correcting color and/or brightness instabilities of at least one first image of said sequence according to a reference image, said reference image being determined between images of said sequence according to a number of motion based correspondences between said first image and said reference image.

10. The method of claim 9, wherein said reference image is a reference image previously used for correcting a second image of said sequence when said number of motion based correspondences is less than a first value.

11. The method of claim 10, wherein said second image is the last previously corrected image of said sequence.

12. The method of claim 9, wherein said reference image is different from another reference image previously used for correcting a third image of said sequence when said number of motion based correspondences is greater than a second value.

13. The method of claim 12, wherein said reference image is the last previously corrected image of said sequence.

14. The method of claim 9, wherein said method comprises determining said number of motion based correspondences, said determining comprising: performing motion estimation between said first image and said reference image; warping said first image to align with said reference image; and discarding values higher than a third value on a difference map regarding said aligned first and reference images.

15. The method of claim 14, wherein correcting color and/or brightness instabilities comprises: generating pairs of spatially corresponding points between said image and said reference image based on said difference map; and performing, on said first image, a transformation based on a color mapping regression over said pairs of spatially corresponding points.

16. The method of claim 15, wherein said transformation comprises a power law tonal transformation.

17. The method of claim 15, wherein said transformation uses a six-parameters color transformation model.

18. An apparatus adapted for correcting color and/or brightness instabilities of a sequence of images, said apparatus comprising a processing circuitry adapted for correcting color and/or brightness instabilities of at least one first image of said sequence according to a reference image, said reference image being determined between images of said sequence according to a number of motion based correspondences between said first image and said reference image.

19. The apparatus of claim 18, wherein said reference image is a reference image previously used for correcting a second image of said sequence when said number of motion based correspondences is less than a first value.

20. The apparatus of 19, wherein said second image is the last previously corrected image of said sequence.

21. The apparatus of claim 18, wherein said reference image is a reference image different from previously used for correcting a third image of said sequence when said number of motion based correspondences is greater than a second value.

22. The apparatus of claim 18, wherein said apparatus comprises a motion estimator adapted to determine said number of motion based correspondences.

23. The apparatus of claim 18, wherein said processing circuitry is adapted for determining said number of motion based correspondences, said determining comprising: performing motion estimation between said first image and said reference image; warping said first image to align with said reference image; and discarding values higher than a third value on a difference map of said aligned first and reference images.

24. The apparatus of claim 18, wherein said apparatus comprises an image processor adapted to align said first image and said reference image.

25. The apparatus of claim 24, wherein said apparatus comprises a comparator adapted to operate on said difference map to discard values higher than said third value.

26. The apparatus of claim 18, wherein correcting color and/or brightness instabilities comprises: generating pairs of spatially corresponding points between said image and said reference image based on said difference map; and performing, on said first image, a transformation based on a color mapping regression over said pairs of spatially corresponding points.

27. A computer readable storage medium having stored thereon computer program comprising program code instructions for executing a method for correcting color and/or brightness instabilities of a sequence of images, said method comprising correcting color and/or brightness instabilities of at least one first image of said sequence according to a reference image, said reference image being determined between images of said sequence according to a number of motion based correspondences between said first image and said reference image.

28. A non-transitory computer readable program storage product comprising program code instructions for executing, when said program is executed by a computer, the method according to claim 9.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The present principles can be better understood in accordance with the following exemplary figures, in which:

[0013] FIG. 1 shows original images and corrected examples through video tonal stability.

[0014] FIG. 2 shows one embodiment of a flowchart for video tonal stabilization.

[0015] FIG. 3 shows some steps for video tonal stabilization.

[0016] FIG. 4 shows channel-wise histogram specification mapping according to the principles.

[0017] FIG. 5 shows examples of images with different exposure and white balance.

[0018] FIG. 6 shows an example of images corrected with a power law model.

[0019] FIG. 7 shows one embodiment of a method for video tonal stabilization.

[0020] FIG. 8 shows data points extracted from color chart and estimate power law transformations.

[0021] FIG. 9 shows an example of exposure correction for a time lapse sequence.

[0022] FIG. 10 shows data points and estimated curves to correct the sequence of FIG. 9.

[0023] FIG. 11 shows an embodiment of an apparatus for video tonal stabilization.

[0024] FIG. 12 shows an example of tonal stabilization in a test sequence.

[0025] FIG. 13 shows an embodiment of a flowchart for video tonal stabilization.

DETAILED DESCRIPTION

[0026] The described embodiments are directed to methods and apparatus for motion-based video tonal stabilization.

[0027] Generally speaking, tonal stabilization can be described as searching for the transformations that minimize undesired color variations in multiple images of a sequence.

[0028] This section presents the rationale and the main contributions of the proposed method for video tonal stabilization. First of all, the aim is to conceive a method that has the following desired properties: 1) Accuracy in modeling the color instabilities observed between frames in a video; 2) Robustness against motion, occlusion and noise; 3) Computational simplicity to be implemented in a near real time application.

[0029] Observe in practice that the first property (model accuracy) is often in contradiction with the other properties of robustness and computational simplicity. Notably, in terms of tonal transformation, the radiometric calibration approach, which can be considered the most accurate model, is actually not robust against motion and occlusion, and is overly complex. Having this in mind, the proposed method has a goal of a good tradeoff between these three properties.

[0030] In addition, note that the state-of-the-art tonal stabilization method from a prior method does not meet the desired properties for tonal stabilization that were mentioned above. The main limitation of this method is to rely on spatial correspondences, however without applying motion compensation between the spatial coordinates of adjacent frames. Hence, the accuracy of spatial correspondences can be seriously compromised in the case of fast motion between two frames.

[0031] FIG. 2 shows a flow diagram of the proposed method for tonal stabilization and FIG. 3 presents a general overview of the proposed method. In order to achieve robustness against motion and occlusion (an important limitation of earlier methods), the dominant motion between a reference keyframe u.sub.k and the frame to be corrected u.sub.t is estimated. Then, these two frames are registered in order to compute color correspondences. Note that by cumulative motion, we are able to register u.sub.t and u.sub.k, even if they differ by several frames in time. Finally, the color correspondences are used to estimate a color transformation that is applied to correct the tonal instabilities.

[0032] The contributions of the proposed method in comparison to state-of-the-art can be summarized as the following:

[0033] 1. Motion driven method: use of accurate color correspondences between frames obtained by dominant motion estimation and compensation.

[0034] 2. Temporally longer tonal coherence, by using long term motion estimation obtained by motion accumulation.

[0035] 3. Proposal of a computationally simple yet efficient parametric model for color correction.

[0036] For the application of the proposed algorithm, some assumptions need to be made regarding the sequence to be corrected and the color fluctuations to be modeled. In particular, assume that:

[0037] 1. There are spatial correspondences (or redundancy in content) between neighbor frames in the sequence (no scene cuts);

[0038] 2. There is a global transformation which can compensate the colorimetric aberrations between the frames.

[0039] The first assumption is confirmed for every sequence composed of a single shot, as long as it does not pass through extreme variations of scene geometry (i.e., nearly total occlusion) or radiometry (i.e., huge changes in illumination or saturation). The second assumption implies that the observed color instability and consequently the camera response function are global (spatially invariant). In other words, the proposed method is not suitable for correction of local tonal instabilities such as local flicker observed in old archived films.

[0040] The following subsections discuss in detail each main step (in FIG. 3) of the proposed method. For the sake of simplicity, the tonal transformation model is described next, first assuming the simplest case of color correction between images without motion. In the sequence, the proposed model is presented to deal with the general case of tonal stabilization of sequences containing motion. Finally, the effectiveness of the method is described from experiments with real sequences and comparisons with the state-of-the-art.

Tonal Transformation Model

[0041] In this section, the tonal transformation model for correction of tonal instability is described. In particular, consider the case of tonal instability observed in images taken with the same camera, so that tonal variations are caused specifically by the camera automatic parameters.

[0042] According to one prior method, the complete color acquisition model is given by

[00001] $\begin{matrix} [\begin{matrix} u_{R} \\ u_{G} \\ u_{B} \end{matrix}] = F (T_{s} .Math. T_{w} [\begin{matrix} E_{R} \\ E_{G} \\ E_{B} \end{matrix}]) . & (1) \end{matrix}$

where F: custom-character .sup.3.fwdarw..sup.3 denote the color camera response, T.sub.s is a 3×3 matrix accounting for camera color space transform (constant over time), u is the observed intensity, E is the irradiance; T.sub.w is a diagonal matrix accounting for changes in white balance and exposure (varying over time) and given by

[00002] $\begin{matrix} T_{w} = [\begin{matrix} φ & 0 & 0 \\ 0 & ξ & 0 \\ 0 & 0 & ψ \end{matrix}] . & (2) \end{matrix}$

Let u.sub.0 and u.sub.1 be two perfectly registered images taken by the same camera, differing only with respect to white balance and exposure (so that these images have identical irradiance E). Denoting H=F(T.sub.s) as the component of the camera response that is constant, then

u.sub.0=[u.sub.0.sub.R, u.sub.0.sub.G, u.sub.0.sub.B].sup.T=H([φ.sub.0E.sub.R, ξ.sub.0E.sub.G,ψ.sub.0E.sub.B].sup.T) (3)

and

u.sub.1=[u.sub.1.sub.R, u.sub.1.sub.G, u.sub.1.sub.B].sup.T=H([φ.sub.1E.sub.R, ξ.sub.1E.sub.G, Ψ.sub.1E.sub.B].sup.T). (4)

Now, a simple approach to correct the tonal difference between u.sub.0 and u.sub.1 is to transform the colors of u.sub.0 to have the same tonal characteristics of u.sub.1, so that

[00003] $\begin{matrix} H^{- 1} (u_{o}) = [\begin{matrix} \frac{φ_{0}}{φ_{1}} & 0 & 0 \\ 0 & \frac{ξ_{0}}{ξ_{1}} & 0 \\ 0 & 0 & \frac{ψ_{0}}{ψ_{1}} \end{matrix}] .Math. H^{- 1} (u_{1}) & (5) \\ u_{0} = H ([\begin{matrix} \frac{φ_{0}}{φ_{1}} & 0 & 0 \\ 0 & \frac{ξ_{0}}{ξ_{1}} & 0 \\ 0 & 0 & \frac{ψ_{0}}{ψ_{1}} \end{matrix}] .Math. H^{- 1} (u_{1})) . & (6) \end{matrix}$

[0043] Hence, in theory tonal stabilization between images u.sub.0 and u.sub.1 can be achieved with a simple diagonal transformation performed in the camera sensor space (given by non-linear transformations H and H.sup.−1). This tonal stabilization model is inspired by radiometric calibration as an accurate procedure to perform camera color transfer when irradiances E=[E.sub.R, E.sub.G, E.sub.B] are known in the form of RAW images, allowing an estimate of H. However, for the problem of tonal stabilization, we are faced with videos taken with low-cost cameras, from which we cannot make the usual assumptions that are necessary to compute radiometric calibration. The assumption of multiple exposures from the same scene, which is required to estimate the camera response function may not be valid for some sequences, and RAW-sRGB correspondences are also not available in practice.

[0044] According to the desired properties just listed for video tonal stabilization, the radiometric calibration model, while accurate, is overly complex and not general enough to be applied for tonal stabilization of sequences from which the irradiances are not known. The question now is how to approximate this model, when the only information known are the intensities observed in u.sub.0 and u.sub.1?

[0045] While the observed images do not provide enough information to derive the exact color transformation that normalize their tonal characteristics, it is proposed herein that an effective solution for this problem comes from a tonal intensity mapping (such as a brightness or color transfer function), which can be computed by parametric or non-parametric estimation methods. Next, the pros and cons of each estimation approach are described and provide the motivation for the proposed choice.

Non-Parametric or Parametric Color Transformation

[0046] Non-parametric color transformation models do not make explicit assumptions on the type of transformation, allowing to model non-linear transformations, but at the risk of lack of regularity that would demand post-processing regularization.

[0047] Some notable examples of non-parametric color transformations are weighted interpolation and histogram specification. As previously discussed, a weighted interpolation (such as suggested in a prior art tonal stabilization method) has the drawback of being computationally complex both in terms of memory requirements and processing time. It is noted that a color interpolation such as proposed by another prior method is in fact a global transformation that is similar to a histogram specification, the main difference being that the interpolation is computed from spatial correspondences, while the histogram specification is computed from intensity cumulative histograms.

[0048] Classical histogram specification can be an efficient alternative to solve the problem of tonal stabilization (channel-wise specification requires only O(n log n) computations, where n is the number of pixels in an image). However, there are well known limitations of histogram specification. Indeed, it can lead to contrast stretching and quantization artifacts that would need post-processing, and range extrapolation of the transformation is not always possible, especially when dealing with color. Take for example the transformations illustrated in FIG. 4, which shows a channel-wise histogram specification. Note that the red and blue transformation curves in FIG. 4 are affected by sudden jumps, which turn out to produce strong artifacts in the resulting image after transformation.

[0049] On the other hand, parametric models assume that the transformation can be modeled by a given function (linear, affine, and polynomial, for example), so the problem is solved by estimating the coefficients of the transformation. While not very flexible to model any form of transformation, parametric models have the important advantage of being expressed by smooth and regular functions, well defined for the whole color range, so that extrapolation is not a problem. Furthermore, since the transformation is described by few parameters, it reduces the risk of oscillation in time.

[0050] Most white balance algorithms implemented in digital cameras adjust the channel scaling with a simple parametric model, which is a Von Kries diagonal transformation (In practice, some camera white balance algorithms compensate only the red and blue channels, leaving the green channel untouched) performed in RAW images. However, as described in the discussion of tonal transformation model, a diagonal model applied to sRGB images is not able to model non-linearities inherent to the camera response.

[0051] There is not enough information to derive the exact tonal transformation model for color stabilization, whether it is a parametric or non-parametric transformation. Hence, there is a need for a tonal transformation model that is simple enough to be quickly computed, and accurate enough to produce a visually pleasant tonal stabilized sequence. After performing experiments with different parametric and non-parametric models (histogram specification, splines interpolation, piece-wise linear function, diagonal model) the power law transformation has shown to best fit the criteria mentioned above.

Power Law Color Transformation

[0052] For the sake of simplicity, this subsection describes the proposed tonal transformation model assuming used to correct tonal instability in sequences containing no motion. The general case of sequences containing motion is approached later.

[0053] The assumption to correct non-linear tonal instabilities is that exposure differences between frames can be approximated by an exponential factor, while white balance correction can be approximated by diagonal color re-scaling. A parametric power law model is successful in jointly meeting these assumptions. Formally, let u.sub.k be a reference image and u.sub.t an image to be corrected, assuming the images are perfectly registered, a power law relationship between u.sub.t and u.sub.k is written as a function of the form

u.sub.k(x, c)=T(u.sub.t)=α.sub.cu.sub.t(x, c).sup.γ.sup.c, (7)

where c={r, g, b } denotes the image color channels, x ∈ Ω denotes the spatial coordinates over the domain Ω ⊂ custom-character .sup.2. The problem now is to estimate the optimal coefficients α.sub.c, γ.sub.c such that we minimize the mean square error

[00004] $\begin{matrix} \underset{α_{c}, γ_{c}}{argmin} .Math. \underset{x \in Ω}{.Math.} .Math. .Math. {(u_{k} (x, c) - α_{c} .Math. {u_{t} (x, c)}^{γ_{c}})}^{2} . & (8) \end{matrix}$

The minimization of non-linear Eq. 8 over α.sub.c and γ.sub.c has not an analytical solution, b but the power law in Eq. 7 can be rewritten

log u.sub.k(x, c)=γ.sub.c log u.sub.t(x, c)+log α.sub.c (9)

and can be solved by linear least squares fitting as an affine function defined in the logarithmic domain:

[00005] $\begin{matrix} .Math. \underset{E (49)}{\underset{}{\underset{x \in Ω}{.Math.} .Math. .Math. {(\log .Math. .Math. u_{k} (x, c) - (γ_{c} .Math. \log .Math. .Math. u_{t} (x, c) +))}^{2}}} & (10) \end{matrix}$

where custom-character =log α.sub.c. Now, solve Eq. 10 by setting

[00006] $\begin{matrix} \partial E = \frac{\partial E}{\partial γ_{c}} = 0 & (11) \end{matrix}$

to derive the well-known analytical solution to univariate linear regression:

[00007] $\begin{matrix} γ_{c} = \frac{{.Math.}_{x \in Ω} .Math. (u_{k} (x, c) - {\overline{u}}_{k} (x, c)) .Math. (u_{t} (x, c) - {\overline{u}}_{t} (x, c))}{{.Math.}_{x \in Ω} .Math. {(u_{t} (x, c) - {\overline{u}}_{t} (x, c))}^{2}} = \frac{Cov (u_{t}, u_{k} .Math. t)}{Var (u_{t})} & (12) \\ = {\overline{u}}_{k (c)} - γ_{c} .Math. {\overline{u}}_{t (c)} & (13) \\ .Math. α_{c} = \exp () . & (14) \end{matrix}$

[0054] This solution to obtain the coefficients α.sub.c and γ.sub.c has some desirable properties: it is computationally simple and exact, guaranteed to converge in O(n) iterations (linear in the number of n correspondent points, n=#Ω. As a remark, note that minimizing Eq. 8 is evidently not equivalent to minimize Equation 10. It is known that when fitting an affine function in the logarithmic domain, the loss function E also becomes logarithmic, meaning that residuals computed from low values will tend to have more weight than residuals computed from high values. For our application of color correction, this implies that the estimation can be specially sensitive to the presence of outliers in dark colors. Even though the analytical solution is fast and exact (for non-liner error), for higher regression accuracy in terms of linear mean squared error, the solution can be alternatively computed with a numerical method such as gradient descent.

[0055] The accuracy of the power law model using the R.sup.2 (coefficient of determination) can be evaluated, which gives some information about the accuracy of fit of the model. This coefficient is a statistical measure of how well the regression line approximates the real data points. For instance, a R.sup.2 of 1 indicates that the regression line perfectly fits the data. In this case, it is preferable to evaluate how good a regression line is fitted in the logarithmic domain, so the accuracy of fit is given by

[00008] $\begin{matrix} R^{2} = 1 - \frac{{.Math.}_{x \in Ω} .Math. {(\log .Math. .Math. u_{k} - T_{l} (\log .Math. .Math. u_{t}))}^{2}}{{.Math.}_{x \in Ω} .Math. {(\log .Math. .Math. u_{k} - \overline{\log .Math. .Math. u_{k}})}^{2}}, & (15) \end{matrix}$

where T.sub.1(log u.sub.t)=γ.sub.c log u.sub.t(x, c)+ custom-character .

[0056] The power law relationship can be illustrated with images taken from the same scene where there are variations in camera exposure and white balance. FIG. 5 shows a sequence of photographs of a same scene, taken by a smartphone. Each picture is adjusted (using the camera settings) to have a different exposure or white balance, so tonal changes can be analyzed by studying the color transfer function between the first picture and the following ones. More specifically, to take advantage of the Macbeth color chart as a reference, use the median color value of each color in the chart to estimate a power law transformation. FIG. 8 plots the graphics of the functional relationship between the colors from the first picture and the colors in the following pictures. Note that the left column of FIG. 8 plots the ordinary linear graphics, where a non-linear relationship is observed. In the right column is plotted the log-log graphic, where the functional relationship is now approximately linear. Finally, FIG. 6 shows the pictures of the same sequence after being corrected with the estimated power law color transformations. Note that the colors are effectively stabilized after the color correction. However, in a careful observation, it can still noted there are some color differences between the first picture and the following ones. This is due to the fact that the model is approximate, and will likely to fail in saturated colors that cannot be mapped without the aid of additional camera information. In addition, note that in this experiment the variations in white balance and exposure are extreme, it is unlikely to observe such extreme variations in white balance in a video shot. In addition, the appropriateness of fit R.sup.2 over the log domain is larger than 0.9 for all the computed regressions shown in this example, which shows that the relationship between color intensities is in fact approximatively linear in a logarithmic scale.

[0057] The proposed tonal transformation model can also be effectively applied to compensate only exposure instability. FIG. 9 shows five frames from the original and corrected version of a time lapse sequence. Sequences containing no motion, such as a time lapse video, are interesting to validate the tonal correction model, since the influence of motion outliers do not interfere in the accuracy of estimated tonal transformation. In FIG. 10 we show the plot of the RGB points and the RGB estimated curves to transform each frame according to the correspondences with the first frame. Note that the power law transformation fits well to the intensity distribution.

[0058] Finally, we note that a power law model for color transformation is commonly used for color grading in film post-production. The ASC CDL (American Society of Cinematographers Color Decision List) is a format for exchange of color grading parameters between equipment and software from different manufacturers. The format is defined by three parameters slope (α), offset (β) and power (γ), which are independently applied for each color channel:

T(u)=(αu+β).sup.γ. (16)

This transformation is usually applied in a color space specific to the color grading software (for example YRGB color space in DaVinci Resolve). Compared to ASC CDL, our parametric model is similarly based on power and slope coefficients, without offset, which advantageously allows us to compute the optimal parameters with an analytical expression.

Motion and Temporal Coherence Model

[0059] Although the assumption of perfectly registered images is a convenient starting point, it is evident that in practice, movement is observed in the majority of sequences. Motion estimation is proposed by the present methods to guarantee tonal stabilization by taking into account the movement not only between a pair of frames, but also between several frames in a sequence.

[0060] There are numerous motion estimation methods, some examples being dominant global motion estimation, dense optical flow, and sparse feature tracking. For the present task of estimating tonal transformations driven by motion based correspondences, assume that it is desirable to have a dense set of correspondences, so that we take advantage of correspondences between homogeneous intensity areas to estimate accurate color transformations.

[0061] In particular, the present techniques rely on dominant motion estimation between frames, mostly motivated by a tradeoff. Dominant motion is computationally simpler (potentially computed in real time) in comparison to dense optical flow, however, dominant motion does not provide pixel-wise accuracy. But dominant motion usually accounts for camera motion, and tonal instabilities seen in videos are normally correlated with the movement of the camera. In contrast to tasks that depend heavily on accurate motion (i.e. video motion stabilization), there is no need for a highly accurate motion description in order to estimate a color transformation that compensates tonal differences between frames.

[0062] Denoting u.sub.t:Ω.fwdarw. custom-character .sup.3 and u.sub.k:Ω.fwdarw..sup.3 as two neighboring frames in a sequence such that t=k+1,we can assume that u.sub.t and u.sub.k depict the same scene, differing only by a small spatial displacement. Then, the 2D motion between these frames can be described by a global transformation A, such that u.sub.k ({circumflex over (Ω)}) and u.sub.t(A(Ω)) denotes the registration (motion compensated alignment) of u.sub.k and u.sub.t, where {circumflex over (Ω)}.sub.k ⊂ Ω.sub.k is a subset of spatial coordinates in u.sub.k. More specifically, we represent A as a matrix that accounts for affine warping, which can be considered a good tradeoff between complexity and representativeness, taking into account scale, translation and rotation transformations between frames. Then,

[00009] $\begin{matrix} A (x) = [\begin{matrix} u (x) \\ v (x) \end{matrix}] & (17) \\ u (x) = a_{1} + a_{2} .Math. x_{1} + a_{3} .Math. x_{2} .Math. .Math. v (x) = a_{4} + a_{5} .Math. x_{1} + a_{6} .Math. x_{2}, & (18) \end{matrix}$

where x=(x.sub.1, x.sub.2) denotes the original pixel coordinates, A(x) is the affine flow vector modeled at point x and (α.sub.1, . . . , 6) are the estimated motion coefficients. We estimate the coefficients based on a robust parametric motion estimation methods in a prior art approach. That method computes the optimal affine motion coefficients in terms of spatiotemporal gradients by Iteratively Reweighted Least Squares (IRLS) with M-estimator loss function (Tukey's biweight). Such loss function is known to be more robust against motion outliers than usual quadratic error. That method also takes into account a brightness offset as a simple way of relaxing the brightness constancy assumption (which states that pixel intensities from the same object do not change over time) to deal with minor changes in scene illumination.

[0063] After considering the case of motion estimation between neighbor frames u and u.sub.k, for t=k+1, we generalize the approach for the case of an arbitrary k differing to t by several frames. In particular, for video tonal stabilization, it is desirable to take advantage of motion estimation between several frames in order to guarantee longer tonal coherence. However, long term motion estimation is a challenging problem and methods based on spatiotemporal gradients cannot deal with direct large motion estimation between frames. An usual workaround to deal with larger displacement is to estimate motion from multiple image resolutions, but even though, the multiresolution estimation is inaccurate in estimating large motion between several frames. In other words, reliable parametric dominant motion estimation based on this prior approach is limited only for interframe motion.

[0064] In practice, a simple accumulation of interframe motions is used as an approximation of long term motion, which can be used for the estimation of tonal transformations. Formally, assuming t>>k (the keyframe is in the “past”) and s=(t−k)−1 being the temporal scale for which the scenes in u.sub.t, and u.sub.k are overlapped, the accumulated affine motion from u.sub.t to u.sub.k is given by

A.sub.t,k=A.sub.t,t−1∘A.sub.t−1,t−2∘ . . . ∘A.sub.t−s,k, (19)

where A.sub.t,k denotes the motion coefficients estimated from frame u.sub.t to u.sub.k. Having an estimate of A.sub.t,k, we can warp u.sub.t to u.sub.k in order to get a registered pair of images with known motion compensated correspondent points, which are defined by

Ω.sub.t,k={(x, y)|x ∈ A.sub.t,k(Ω.sub.t), y ∈ {circumflex over (Ω)}.sub.k}, (20)

where {circumflex over (Ω)}.sub.k ⊂ Ω.sub.k. Nevertheless, it should be noted that the motion estimation is a rough global approximation and is likely to contain errors due to occlusions, non-dominant (object) motion, or simply inaccurate coefficients in A.sub.t,k. Hence, motion outliers can be discarded to guarantee an accurate color transformation. One approach is to compute a difference map between the aligned images, and consider that values higher than a threshold on this difference map will correspond to the outliers. However, the difference map being based on the residual of intensity values, it should be noted that intensity differences are not reliable under brightness and color changes between frames.

[0065] Thus, first compute a rough radiometric compensation of tonal differences between the aligned images as a measure to reduce the risk of confusing motion outliers with tonal differences. In the sequence, compute a difference map from the corrected warped frame, which will discard the motion outliers and keep the colorimetric differences-which are essential to estimate the color transformation.

[0066] Formally, the outlier removal approach can be summarized as the following. Let {circumflex over (Ω)}.sub.t,k be the set of correspondent spatial coordinates (motion overlap) shared between two frames u.sub.t and u.sub.k. {circumflex over (Ω)}.sub.t,k is computed by accumulating frame to frame motions—warp u.sub.t to align it to the keyframe u.sub.k in order to have u.sub.k({circumflex over (Ω)}.sub.k) registered to u.sub.t(A.sub.t,k(Ω.sub.t)). Since {circumflex over (Ω)}.sub.t,k contains motion outliers, reject outlier data, but first account for possible tonal differences between the aligned frames, so that these differences are not taken as outliers. Given (x, y) ∈ {circumflex over (Ω)}.sub.t,k, the tonal differences between the aligned current frame and key frame are compensated by a simple mean value shift:

ũ.sub.t(x,c)=(u.sub.t(x,c)−μ(u.sub.t(c))+μ(u.sub.k(c)) (21)

[0067] Finally, a set of corresponding spatial coordinates filtered by motion outliers is defined by

[00010] $\begin{matrix} Ω_{t, k} = {(x, y) \in {\hat{Ω}}_{t, k} | \frac{1}{3} .Math. \underset{c \in C}{.Math.} .Math. .Math. {[u_{k} (y, c) - {\tilde{u}}_{t} (x, c)]}^{2} < σ}, & (22) \end{matrix}$

where σ is the empirical noise, which can be an estimation (with a noise estimation method from a prior art method) or an approximation of the noise variance in u.sub.t and u.sub.k.

[0068] Based on the set of spatial correspondences between temporally distant frames Ω.sub.t,k, the present principles can estimate temporally coherent tonal transformations, so that tonal instabilities are compensated. By taking long term motion into account, it enforces that tonal coherency is not lost from frame to frame.

Motion Driven Tonal Stabilization

[0069] In describing motion driven tonal stabilization, first consider an ideal symmetric operator. This transformation has the desired property of being invariant with respect to the time direction, avoiding bias to the colors of the keyframe. This definition leads to a symmetric scale-time correction similar to the operator proposed in at least one other method:

[00011] $\begin{matrix} S_{t} (u_{t}) = {.Math.}_{i = - s}^{s} .Math. .Math. λ_{i} .Math. T_{i} (u_{t}), & (23) \end{matrix}$

where s is the temporal scale of the correction, T.sub.i is a tonal transformation weighted by λ.sub.i, assuming that λ.sub.i is a Gaussian weighting intended to give more importance to transformations estimated from frames that are temporally close to u.sub.t. This operator can be seen as a temporal smoothing which computes the tonal stabilization of u.sub.t as a combination of several weighted transformations. In practice, the S.sub.t operator requires the estimation of 2s transformations for every frame to be corrected, which is computationally expensive, and even if s is set to be small, the correction then risks to not be sufficiently effective. This approach fits well for high frequency flickering stabilization, because flicker can be filtered with an operator defined for a limited temporal scale. On the other hand, tonal fluctuations caused by camera parameters could need larger temporal scales to be properly corrected.

[0070] A faster and yet efficient alternative to operator S.sub.t is desired where less computations are required to correct each frame. In particular, control of undesired estimation bias and drift is needed through weighted transformations. For the sake of simplicity, assume that the starting point for the sequential tonal stabilization is the first frame of the sequence, then the solution for tonal stabilization can be seen as a temporal prediction, where the correct tonal appearance of u.sub.t is predicted based on previously known tonal states. This is typically the case of an application for sequential on-the-fly correction, for example, to compensate tonal fluctuations of a live camera in a video conference. Even for sequential tonal stabilization, the symmetric property can be approximated by combining forward and backward corrections.

[0071] Algorithm 1 shows the proposed sequential motion driven tonal stabilization. For each frame u.sub.t, we want to find a triplet of RGB transformations defined as T.sub.t(u.sub.t) which minimizes the tonal differences between u.sub.t and u.sub.k. Let M(u.sub.t,u.sub.k) denote a function that takes two frames as parameters, computes their motion estimation, warping and outlier rejection, producing as output reliable spatial correspondences. So, the tonal transformation is based on a regression over the set of data points given by the coordinates Ω.sub.t,k=M(u.sub.t,u.sub.k).

[0072] The tonal stabilization problem is solved by estimating the optimal coefficients α.sub.c, γ.sub.cthat transform the colors of u.sub.t to the colors of a keyframe u.sub.k such that we minimize the sum of squared error:

[00012] $\underset{α_{c}, γ_{c}}{argmin} .Math. \underset{(x, y) \in Ω_{t, k}}{.Math.} .Math. .Math. {(u_{k} (x, c) - α_{c} .Math. {u_{t} (y, c)}^{γ_{c}})}^{2} .$

Starting from u.sub.t, we repeat the main procedures illustrated in FIG. 2 (motion estimation, warping, color transform estimation, color correction) for all the following frames, while #Ω.sub.t,k≧ωn, where ω is the motion overlap threshold, usually set to 0.25 and n is the number of pixels in frame u.sub.t. When the cardinality #Ω.sub.t,k of the overlapped region between u.sub.t and u.sub.k is no longer large enough to allow for an accurate color estimation, the keyframe u.sub.k is updated to the previously corrected frame T.sub.t−1(u.sub.t−1).

[0073] As a regularization concern, we can ensure that the transformation frame T.sub.t(u.sub.t) does not deviate largely from the original content of u.sub.t by applying a temporal weighting λ:

û.sub.t(c)=α.sub.cu.sub.t(c).sup.γ.sup.c

T(u.sub.t)=λ(û.sub.t(c))+(1−λ)u.sub.t.

The parameter λ is set as a weight that decreases exponentially in function of the motion between the current frame and the keyframe (exponential forgetting factor). In addition, when we apply our method to compensate exposure variations, we can work with 16 bits images, so that intensities larger than 255 do not need to be clipped after color transformation. Then, we have as result a sequence that has an increased dynamic range over time, and the sequence could actually be visualized without losing intensity information in an appropriated high dynamic range display.

[0074] The steps illustrated in FIG. 13 (motion estimation, warping, color transform estimation, color correction) are repeated for all the following u.sub.t+m frames, until #Ω.sub.t,k<ω×n, where #Ω.sub.t,k denotes the cardinality of the corresponding set. When this condition is met, it means that the cardinality of the overlapped region between u.sub.t and u.sub.k is no longer large enough to allow for an accurate color estimation. In this case, the keyframe u.sub.k is updated to u.sub.t−1.

TABLE-US-00001 Algorithm 1 Motion driven tonal stabilization Input: Sequence of frames u.sub.t ε U, t = {1, ..., D} Output: Tonal stabilized sequence T.sub.t(u.sub.t), t = {1, ..., D} 1: k custom-character 1 # Initialize keyframe index 2: t k + 1 # Initialize current index 3: T.sub.1(u.sub.1) = u.sub.1 # First output frame is not transformed 4: while t ≦ D do 5: Ω.sub.t,k M(u.sub.t, u.sub.k) # Compute motion based correspondences 6: if #Ω.sub.t,k ≧ ω × n then # If there are enough correspondences: 7: for c custom-character {R, G, B} do # Perform tonal correction 8: α.sub.c, γ.sub.c arg min.sub.α,γ Σ.sub.(x,y)εΩt,k (u.sub.k(y,c) − αu.sub.t(x,c).sup.γ).sup.2 9: û.sub.t(c) α.sub.cu.sub.t(c).sup.γc 10: T.sub.t(u.sub.t(c)) λû.sub.t(c) + (1 − λ)u.sub.t(c) 11: end for 12: t custom-character t + 1 13: else # If there are not enough correspondences: 14: if k < t − 1 then 15: k t − 1 # Update keyframe 16: u.sub.k T.sub.t −1 (u.sub.t−1) 17: else 18: T.sub.t(u.sub.t) T.sub.t −1 (u.sub.t) 19: t t + 1 20: k custom-character t + 1 21: end if 22: end if 23: end while

[0075] In contrast to a prior method, which propagates transformations from frame to frame, the proposed method guarantees longer tonal coherency between the temporal neighborhood of a keyframe. In other words, this method propagates the tonal transformations from keyframe to keyframe, so that the accumulation of tonal error is controlled by using a larger temporal scale.

[0076] An important aspect of the video tonal stabilization problem is that complete temporal preservation of tonal appearance is not always desired, due to the inherent camera dynamic range limitations. In fact, tonal instabilities caused by camera automatic exposure can be perceptually disturbing, but if huge changes occur in camera exposure, the variation of tonal appearance should be kept to some degree, to avoid overexposure. In order to deal with this aspect, temporally weighted color transformations can be performed, or additionally, the dynamic range of the sequence in time can be increased.

Temporal Weighting As a regularization concern, to ensure that the transformation T.sub.t(u.sub.t) does not deviate largely from the original content of u.sub.t a weight λ is applied:

custom-character =λ(α(u.sub.t.sup.γ))+(1−λ)u.sub.t. (24)

A similar weighted correction is used in a prior method, where it is proposed to fix λ:=0.85 as a forgetting factor for recursive de-flickering. It is proposed that the weighting λ could vary over time, in function of the temporal distance or in function of the motion between u.sub.t, and u.sub.k, assuming that frames that are closer in content to the keyframe should receive higher weight in the tonal correction. Since the affine motion parameters A.sub.t,k that warp u.sub.t to u.sub.k are known, a rough spatial distance from these two frames can be computed and written as:

[00013] $\begin{matrix} λ = \exp (- λ_{0} .Math. \frac{.Math. V_{uk} .Math.}{p}), & (25) \end{matrix}$

where ∥V.sub.uk∥ denotes the norm of the dominant motion vector V.sub.uk, p is the maximum spatial displacement (number of rows+number of columns in the image), λ.sub.0 is the exponential decay rate (in practice here, set λ.sub.0:=0.5). Another possibility is to weight the correction in function of the temporal distance between u.sub.t and u.sub.k:

[00014] $\begin{matrix} λ = \exp (- λ_{0} .Math. \frac{.Math. t - k .Math.}{D}), & (26) \end{matrix}$

where D is the number of frames in the sequence. In this sense, the idea is to decrease the influence of frames which have large motion displacement from the current frame. A remark of interest, is that work done in the field of color perception has shown that chromatic and contrast sensitivity functions decrease exponentially when the velocity of stimuli increases. Therefore, it is offered here that the motion dependent λ has, to some degree, a perceptual motivation.

Increasing Temporal Dynamic Range

[0077] In practice, the proposed method guarantees strict tonal stabilization throughout the entire sequence, no matter if strong luminance changes occur. The result is visually pleasant for sequences in which luminance variation is smooth, however, when correcting sequences with significant changes in exposition (for example, from very dark to very bright environments), it has been observed that there is saturation and clipping in the final result. In order to deal with this problem, a higher dynamic range can be used, so that it is not necessary to clip color intensities larger than 2.sup.8−1=255 (maximum intensity value for each color channel in 8 bits images).

[0078] Larger intensities are possible by working with 16-bit images, so that intensities larger than 255 do not need to be clipped after color transformation. This results in a sequence that has an increased dynamic range over time, and the sequence can be visualized without losing intensity information in an appropriate high dynamic range display.

[0079] However, in practice, the sequence needs to be converted back to 8 bits in order to display it in standard low range displays. Instead of clipping all the intensities which extrapolate the limit, alternatively, a tone mapping operator of choice can be applied to render a low dynamic range image. In particular, a logarithmic tone map operator can be used. Given an intensity value i, and the maximum intensity value of the whole sequence z, a log tone mapping operator m is given by

[00015] $\begin{matrix} m (i) = 255 .Math. (\frac{\log (1 + \frac{i}{z})}{\log (2)}) . & (27) \end{matrix}$

FIG. 12 illustrates the potential problem of intensity clipping when applying tonal stabilization and the effects of attenuating it with a temporal tone map operator or with a temporal weighting.

Additional Implementation Details

[0080] In practice, an optional smoothing (bilateral filtering) is applied to u.sub.t and u.sub.k to reduce the influence of noise outliers in the estimation of tonal transformation. Note that this step is not necessary for well exposed sequences where tonal instability is mainly due to white balance fluctuations. Nevertheless, smoothing is recommended when working with sequences strongly affected by noise.

[0081] In order to save processing time, rescale (120 pixels wide) the original frames for both motion estimation and color transform estimation, which in turn do not produce noticeable loss in tonal stabilization accuracy. Furthermore, instead of applying the power law color transformation to correct the full original frame composed of N pixels, build one lookup table (LUT) per color channel, and then compute the power law independently for each LUT. This reduces the number of power law computations from 3×N (more than 16 million for 4k video resolution) to only 3×256=768.

[0082] In practice, it has been observed that the optimal value for ω, where ω can be seen as a geometric similarity threshold between u.sub.t, and u.sub.k, depends on the accuracy of the motion estimation. If the motion is not accurate, a greater value for ω can be preferable, so that the motion estimation error is less accumulated with time. In general, it has been observed that ω:=0.25 leads to stable color transformations in most cases.

[0083] The general aspects described herein have proposed an efficient tonal stabilization method, aided by composed motion estimation and power law tonal transformation. A simple six-parameters color transformation model is enough to provide tonal stabilization caused by automatic camera parameters, without the need to rely on any a priori knowledge about the camera model.

[0084] In contrast to the state of the art, the proposed algorithm is robust for sequences containing motion, it reduces tonal error accumulation by means of long-term tonal propagation, and it does not require high space and time computational complexity to be executed.

[0085] In addition, one of the main advantages of the proposed method is that it could be applied in practice online, giving it potential for real time video processing applications such as tonal compensation for video conferences or for live broadcast.

[0086] One embodiment of a method 700 for tonal stabilization of images is shown in FIG. 7. The method commences from start block 701 and proceeds to block 710 for determining motion based correspondences. This block can be comprised of performing motion estimation between an image and a keyframe, warping the image to align with the keyframe and discarding values higher than a threshold on a difference map of said aligned images. Block 710 generates a spatial correspondence function. Control then proceeds from block 710 to block 720 for determining whether the number of motion based correspondences between images is greater than some predetermined threshold value. If not, control proceeds from block 720 to block 730 to update the keyframe. Control then proceeds from block 730 to block 710 to repeat determination of motion based correspondences. If, in block 720, the number of motion based correspondences is greater than the predetermined threshold value, control proceeds to block 740 for performing color correction. Block 740 can be comprised of performing a regression over a set of points in the spatial correspondence function and performing a transformation on the image to minimize color differences between the image and the keyframe. After block 740, each of the tonal stabilized images can be added to an image sequence.

[0087] One embodiment of an apparatus 1100 for tonal stabilization of images is shown in FIG. 11. The apparatus is comprised of a motion estimator 1110 operating on an image, in signal connectivity with a first input of the motion estimator, and a keyframe, in signal connectivity with a second input of the motion estimator. The first and second outputs of the motion estimator are in signal connectivity with two inputs of image processor 1120. A third output of the motion estimator can provide motion estimation details to image processor 1120. Image processor 1120 aligns the image and the keyframe. An output of image processor 1120 is in signal connectivity with a comparator/discard function circuitry 1130 that operates on a difference map of the aligned image and keyframe and discards values higher than a threshold resulting in a spatial correspondence function. Comparator/discard function circuit 1130 also determines if the number of motion based correspondences between the input image and the keyframe is higher or lower than a predetermined threshold value. If the number of motion based correspondences between the input image and the keyframe is less than a predetermined threshold value, one output from comparator/discard function circuit 1130 is in signal connectivity with an input of keyframe update circuitry 1140 that updates the keyframe and causes control to proceed back to the motion estimator, using the updated keyframe, which is output from keyframe update circuit 1140.

[0088] If, however, comparator/discard function 1130 determines that the number of motion based correspondences between the input image and the keyframe is greater than a predetermined threshold value, first processor 1150 receives the output of comparator/discard function 1130, in signal connectivity with its input. First processor 1150 performs a regression over a set of points in the spatial correspondence function and sends its output to the input of second processor 1160. Second processor 1160 performs a transformation of the image to minimize color differences between the image and the keyframe. The tonal stabilized image can then be added back to a sequence of images being stabilized. First processor 1150 and second processor 1160 also receive the image and keyframe as inputs.

[0089] The present description illustrates the present principles. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present principles and are thereby included within the present principles.

[0090] All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present principles and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.

[0091] Moreover, all statements herein reciting principles, aspects, and embodiments of the present principles, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

[0092] Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the present principles. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which can be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

[0093] The functions of the various elements shown in the figures can be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions can be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which can be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and can implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.

[0094] Other hardware, conventional and/or custom, can also be included. Similarly, any switches shown in the figures are conceptual only. Their function can be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.

[0095] In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

[0096] Reference in the specification to “one embodiment” or “an embodiment” of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

[0097] It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This can be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.

[0098] These and other features and advantages of the present principles can be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present principles can be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.

[0099] Most preferably, the teachings of the present principles are implemented as a combination of hardware and software. Moreover, the software can be implemented as an application program tangibly embodied on a program storage unit. The application program can be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform can also include an operating system and microinstruction code. The various processes and functions described herein can be either part of the microinstruction code or part of the application program, or any combination thereof, which can be executed by a CPU. In addition, various other peripheral units can be connected to the computer platform such as an additional data storage unit and a printing unit.

[0100] It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks can differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present principles.

[0101] Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications can be effected therein by one of ordinary skill in the pertinent art without departing from the scope of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.

METHODS AND APPARATUS FOR MOTION-BASED VIDEO TONAL STABILIZATION

Inventors

Cpc classification

Classification Explorer

H04N23/81

ELECTRICITY

Classification Explorer

H04N23/6811

ELECTRICITY

Classification Explorer

H04N23/76

ELECTRICITY

Classification Explorer

H04N23/88

ELECTRICITY

Classification Explorer

H04N5/145

ELECTRICITY

Classification Explorer

H04N9/73

ELECTRICITY

International classification

Classification Explorer

H04N9/73

ELECTRICITY

Classification Explorer

H04N5/14

ELECTRICITY

Classification Explorer

H04N9/67

ELECTRICITY

Abstract

Claims

Description