Processing of impulse noise in a video sequence

Abstract

A method for processing data in a video sequence comprising impulse noise (“salt-and-pepper”, “snow”, or other type), comprising, for filtering the noise, an application of a recursive filter (S2), given by:
z(n)=z(n−1)+Δ if y(n)>z(n−1)
z(n)=z(n−1)−Δ if y(n)<z(n−1)
and z(n)=z(n−1) if y(n)=z(n−1)
where: y(n) designates an element in the n.sup.th image of the succession, not processed by application of the sign filter; z(n−1) designates an element having a position corresponding to y(n), in the (n−1).sup.th image of the succession, and processed by application of the sign filter; z(n) designates an element having a position corresponding to y(n), in the n.sup.th image of the succession, and processed by application of said sign filter, and Δ is a strictly positive coefficient.

Claims

1. A method for processing data in a video sequence comprising noise, the video sequence being formed of a succession of images, the method comprising, for filtering the noise, an application of a recursive filter hereafter called the “sign filter” and given by:
z(n)=z(n−1)+Δ if y(n)>z(n−1)
z(n)=z(n−1)−Δ if y(n)<z(n−1)
and z(n)=z(n−1) if y(n)=z(n−1) where: y(n) designates a pixel element in the n.sup.th image of the succession, not processed by any application of the sign filter; z(n−1) designates a pixel element having a position corresponding to the position of y(n), in the (n−1).sup.th image of the succession, and processed by application of the sign filter; z(n) designates a pixel element having a position corresponding to the position of y(n), in the n.sup.th image of the succession, and processed by application of said sign filter, and Δ is a strictly positive coefficient.

2. The method according to claim 1, wherein the images from the succession are processed pixel by pixel.

3. The method according to claim 1, wherein the noise is at least one of a “salt-and-pepper” and “snow” type impulse noises.

4. The method according to claim 1, wherein the noise is impulse noise and results from a radioactive radiation received by a sensor of a camera filming said video sequence.

5. The method according to claim 1, wherein the images of said video sequence present objects moving in front of a background of interest and said objects moving in the images are treated as noise.

6. The method according to claim 1, wherein the succession of images comprises an apparent movement of an image background in the succession of images, and wherein the method further comprises: incorporating the apparent movement as input to the sign filter.

7. The method according to claim 6, wherein the elements y(n), (n−1) and z(n) are image pixels, having the same position, the images from the succession being processed pixel by pixel, and wherein the application of the sign filter in the case of apparent movement is given by:
z(q,n)=z(T.sub.n(q),n−1)+Δ if y(q,n)>z(T.sub.n(q),n−1)
z(q,n)=z(T.sub.n(q),n−1)−Δ if y(q,n)<z(T.sub.n(q),n−1)
z(q,n)=z(T.sub.n(q),n−1) if y(q,n)=z(T.sub.n(q),n−1) with z(q,n) being the values taken by the n.sup.th image at the pixel with vector coordinates q and T.sub.n being the estimate of the transformation between the preceding image of rank n−1 and the current image of rank n in the succession.

8. The method according to claim 1, wherein, for the initial images of the succession until the n.sub.0.sup.th image, a forgetting-factor temporal filter is applied, without applying sign filtering, this forgetting-factor temporal filter being given by: $z (q, n) = \frac{z_{temp} (q, n)}{N (q, n)}$ where z.sub.temp(q,n)=(1−α).Math.y(q,n)+αz.sub.temp(T.sub.n(q),n−1) and N(q,n)=(1−α)+α.Math.N(T.sub.n(q),n−1); z.sub.temp(q,n) being a time variable and a being a forgetting factor included between 0 and 1. z(q,n) then being the values taken by the n.sup.th image at the pixel with vector coordinates q and T.sub.n being the estimate of a transformation by possible movement between the preceding image of rank n−1 and the current image of rank n in the succession, with n less than n.sub.0.

9. The method according to claim 8, wherein, in the absence of movement between successive images up to image n.sub.0, the forgetting-factor temporal filter is given by: $z (n) = \frac{z_{t e m p} (n)}{1 - α^{n}},$ with z.sub.temp(n)=(1−α)y(n)+αz.sub.temp(n−1) z.sub.temp(n) being a time variable and a being a forgetting factor included between 0 and 1.

10. The method according to claim 1, wherein, at least for images of the succession which follow an n.sub.0.sup.th image, the combination of the sign filter with a forgetting-factor temporal filter is applied, the result of this combination being given by: $s (n) = \frac{w_{t e m p} (n)}{1 - β^{n - n_{0} + 1}}$ w.sub.temp(n) being a temporal variable given by: w.sub.temp(n)=(1−β)z(n)+βw.sub.temp(n−1), where z(n) is the result of the application of the sign filter, and where β is a forgetting factor included between 0 and 1.

11. The method according to claim 1, wherein a value of the coefficient Δ is chosen as a function of a maximum value, I.sub.max, of the color level taken by the image elements, and in that the coefficient Δ is less than 20.Math.I.sub.max/255.

12. The method according to claim 11, wherein the coefficient Δ is included between 0 and 5.Math.I.sub.max/255.

13. A non-transitory computer storage medium, storing data of instructions of a computer program for implementing the method according to claim 1, when said instructions are executed by a processor.

14. A device comprising a processing unit comprising at least an input interface to receive data of a video sequence being formed of a succession of images, at least one memory to store temporarily said data, and a processor to process said data for implementing the method according to claim 1.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) Other advantages and features will become apparent upon reading the detailed description of the disclosed embodiments, presented as illustrative examples, and upon examination of the attached drawings in which:

(2) FIG. 1 shows the temporal median and average of the values taken by a pixel which could be contaminated by an impulse noise (p=20%, σ.sub.u=10 gray levels);

(3) FIG. 2 shows the temporal evolution of a pixel which could be contaminated (p=20%, σ.sub.u=10 gray levels) and output from the processing by sign filter (with Δ=.sup.3);

(4) FIG. 3 shows the observed and theoretical distribution of values taken by the output from the processing by sign filter used for the results from FIG. 2 (Δ=3);

(5) FIG. 4 compares the processing by sign filter (Δ=3) to the hybrid processing using two forgetting-factor temporal filters and the sign filter (Δ=3, n.sub.0=25 images, α=β=0.95) on a sometimes-contaminated pixel (σ.sub.u=10, p=20%);

(6) FIG. 5 shows a device for implementation according to an embodiment;

(7) FIG. 6 shows a processing applied to a sequence of images with movement according to an embodiment;

(8) FIG. 7 shows a processing according to an embodiment with two forgetting-factor filters, prior (Filt temp1) to the sign filter (FILT SGN) and concomitant (Filt temp2) with the sign filter.

DETAILED DESCRIPTION

(9) The disclosed embodiments propose a recursive approach as follows: by considering the values taken by a single pixel, denoted y(n) for image n, this approach consists of calculating the series z(n) defined by:
z(n)=z(n−1)+Δ×sign(y(n)z(n−1)) where sign(0)=0.

(10) In the following, this approach is called “processing by sign filter” or “fast temporal median.”

(11) Δ is a parameter of the algorithm whose adjustment expresses a compromise between the convergence speed and the residual standard deviation after convergence. Its adjustment is discussed below.

(12) There are multiple advantages to this sign filter.

(13) The required memory remains very small: only two images are used for calculating the output at moment n: the previous output image z(n−1) and the current raw image y(n). The real-time implementation is therefore immediate.

(14) There is nearly no complexity: for each new image and for each pixel, there are one addition, one multiplication, and one comparison (on the sign calculation).

(15) This approach is adapted advantageously and without difficulty to the specific case of a moving camera, for which the recurrence equation becomes:
z(q,n)=z(T(q),n−1)+Δ×sign(y(q,n)z(T(q),n−1))

(16) with z(q,n) being the values taken by the image at moment n at the pixel with vector coordinates q, and T being the estimate of the transformation between the preceding image and the current image. Methods for estimating this transformation are described in the literature for a translation and for the combination of a translation, a rotation, and a change of zoom factor.

(17) Thus a preliminary to processing the sign may consist of defining whether or not the camera is moving: if the camera is fixed, only the processing by sign filter is applied; and If the camera is meant to move, and/or if the zoom factor can vary (or the lighting), then, referring to FIG. 6, a preliminary step S1 of estimating the movement of the camera is applied: in a known manner, the displacement of the camera is calculated from successive images and this displacement, thus obtained in step S1, is then incorporated as input to the processing by sign filter to be applied in step S2.

(18) For example, a user can be provided a binary button for specifying whether the sequence was filmed with a fixed camera or moving camera (step S0), the latter case leading to launching the movement estimation algorithm (step S1), the result thereof being taken as input to each iteration of step S2, iteratively over n up to N.sub.imax (loop of steps S3 and S4, until S5 for n=N.sub.imax)

(19) In the preceding general equation, the coefficient Δ characterizes the importance given to the sign of the difference between the current image y(n) and the preceding image n−1 which has been processed recursively by the method according to an embodiment: z(n−1).

(20) The adjustment of the value of this coefficient Δ can be done as detailed below. This adjustment results from a compromise between the convergence time of the recursive sign filter and the final pertinence of the output of the filter which can for example be estimated by the variance after convergence.

(21) “Convergence time” is understood to mean both: the convergence time at startup of the processing; the time for adaptation to camera movements or zoom variations (the pixel in fact corresponding to a new scene element); and the time for adaptation to a change of scene, typically a lighting change.

(22) If one wants to give preference to a fast adaptation/convergence time, a high value is chosen for the coefficient Δ.

(23) If one wants to give preference to a small residual variance, a small value is chosen for the coefficient Δ.

(24) FIGS. 2 and 3 respectively show the temporal evolution and the distribution of values taken by the output from the processing by sign filter. This distribution is very close to the theoretical distribution.

(25) Typically, as output from the sign filter, the residual variance after convergence is given by:

(26) $Vr \approx \sqrt{\frac{π}{8}} \times \frac{{Δσ}_{u}}{1 - p}$

(27) The convergence time for a median change of amplitude A is given by:

(28) 0 $Tc \approx .Math. \frac{A}{Δ (1 - p)} .Math.$

(29) Here, the convergence time is given for a “salt-and-pepper” impulse noise comparable to black or white. In the case of pixels contaminated with white only, one would have: A/Δ(1−2p) in the case of a drop of the median value, and A/Δ in the case of an increase of the median value.

(30) The values taken by the pixels of the image are typically between 0 and 255 (the case of 8-bit coded values). In this case, it appears empirically that a good choice of value for the coefficient Δ may be between 0 and 20, and preferably between 0 and 5. More generally, if the values taken by the pixels are between 0 and I.sub.Max, the coefficient Δ can be chosen between 0 and 20.Math.I.sub.max/255, preferably between 0 and 5.Math.I.sub.max/255.

(31) The coefficient Δ may be set as input to the processing by sign filter in step S2, during an initialization step S6 shown in FIG. 6, and it is possible to change it if needed in case the obtained image or the time to obtain it is unsatisfactory. Typically, the processing can be such that a high value is chosen at initialization (Δ=5 for example) and then a small value (Δ=1 for example) after convergence begins. The beginning of convergence can be defined as a function of a sliding window value n.sub.1 (a number of successive images n.sub.1) for switching from Δ=5 to Δ=1.

(32) The performance of the filter according to an embodiment can be compared with the sliding-window median filter.

(33) The standard deviation of the output from the processing by application of the sign filter can in fact be compared to the standard deviation of the conventional calculation of the median over a sliding window of size N. This can be approximated by the following asymptotic result:

(34) $σ_{N} = \frac{1}{f_{Y} (y_{m e d}) \sqrt{4 N}}$

(35) Which, in this context, gives:

(36) $σ_{N} \approx \sqrt{\frac{π}{2 N}} \frac{σ_{u}}{1 - p}$

(37) For data values of the noise σ.sub.u (standard deviation for “conventional” additive noise, not impulse noise) and contamination level p, the minimum size of the sliding windows needed to obtain the same performance as the sign algorithm proposed here can be deduced:

(38) $N = \frac{\sqrt{2 π} σ_{u}}{Δ (1 - p)}$

(39) With for example σ.sub.u=10 gray levels and p=20%, it is found that the size of the sliding window necessary to obtain the same residual standard deviation is: N=10 images, for Δ=3, N=30 images, for Δ=1, and N=60 images, for Δ=0.5.

(40) One can then consider as a downside to this processing according to an embodiment the compromise to be made between the convergence time and the residual variance after conversion.

(41) However, it is possible to make use of hybrid processing in order to obtain a low residual variance without impacting the convergence time.

(42) A first solution consists of starting the processing with a linear filter, typically a normalized exponential forgetting, and then, after an initialization time n.sub.0 of several images, next switching to processing by application of the sign filter. This type of improvement can be useful in the case of a fixed camera, where a low residual variance is sought without penalizing the initial convergence time. On the other hand, this approach is less effective when needing to adapt quickly to a change of scene (and/or change of lighting or zoom in particular).

(43) In order to reduce the residual variance, a second solution consists of using another linear temporal filter, applied to z(t), typically with exponential forgetting. If β is the coefficient for this second filter (included between 0 and 1), this allows multiplying the residual variance by

(44) $\frac{1 - β}{1 + β} (< 1)$
(if it is assumed that the outputs from the sign filter are uncorrelated in time).

(45) This improvement can be particularly useful in the case of a scene which could shift, in order to: guarantee tracking of the scene (lighting) and/or of the camera (with a high A), and guarantee a low residual variance (with the second exponential filter).

(46) The two improvements can be used during a single processing.

(47) The hybrid processing with the two improvements is summarized below, with reference to FIG. 7, in the case of a fixed camera, for a given pixel of value y(n) in the n.sup.th image:

(48) For n=0, application of an initialization step S20:
y(0)=0,z.sub.temp(0)=0,w.sub.temp(n.sub.0)=0

(49) For n=1 to n.sub.0 (loop of steps S22 and S23), application in step S21 of the first forgetting-factor temporal filter, without sign filter:

(50) $z_{t e m p} (n) = (1 - α) y (n) + α z_{t e m p} (n - 1) And$ $at output s (n) = \frac{z_{t e m p} (n)}{1 - α^{n}}$

(51) For n>n.sub.0 (and until convergence in step S26), application in steps S24 and S25 of the sign filter and respectively of a second forgetting-factor temporal filter to the result of the sign filter:

(52) $z (n) = z (n - 1) + Δ \times sign (y (n) - z (n - 1))$ $w_{temp} (n) = (1 - β) z (n) + β w_{t e m p} (n - 1) And$ $at output s (n) = \frac{w_{t e m p} (n)}{1 - β^{n - n_{0} + 1}}$

(53) The performance of the processing by application of the sign filter and of the hybrid processing with improvements (much better performance in this second case) are illustrated by FIG. 4.

(54) Use of the hybrid processing and adjustment of its parameters (initialization time and exponential forgetting factors) depend on the type of application. For a fixed or slowly moving camera, a small value of the coefficient Δ is sufficient (0.5 or 1 gray level), on condition of initializing the processing with normalized exponential forgetting (α=0.9 for example). The second filter w.sub.temp) (is not necessarily useful.

(55) For a moving camera, or a variable scene (change of lighting for example), the processing must constantly adapt. A high coefficient of up to Δ=5 gray level could be chosen, followed by exponential forgetting β=0.9. The initialization filter (z.sub.temp) is not necessarily useful.

(56) For an optimum adjustment of the parameters, calculations comparing the residual variance and the convergence time during a change of median of amplitude A (typically due to a variation of the lighting of the filmed scene) can be helpful.

(57) In fact, in this case, the convergence time (for a salt-and-pepper type “black or white pixel” noise (not just white, “snow” type)) for a change of median of amplitude A (time to reach μA) is given by:

(58) $Tc \approx .Math. \frac{A}{Δ (1 - p)} .Math. + \frac{\log (1 - μ)}{\log β}$

(59) The residual variance after convergence can be approximated by:

(60) $V r \approx 3 * \sqrt{\frac{π}{8}} \times \frac{Δ σ_{u}}{1 - p} \times \frac{1 - β}{1 + β}$

(61) Now referring to FIG. 5, a device according to an embodiment comprises a processing unit UT comprising at least: an input interface IN for receiving a signal representative of pixels of successive images forming a video sequence (in fixed plane, and fixed or variable zoom and lighting) from a camera CAM, or even from a remote server (or a remote entity via a communication network); a buffer memory BUFF for temporarily storing image data (and typically the last image received (raw) and the last image processed, as well as, in case of movement, the next-to-last image received); a memory MEM for possibly storing other temporary variables and especially for storing instruction code of a computer program according to an embodiment, and also possibly parameter values for filters (forgetting factors, coefficient Δ, number of initial images n.sub.0, N.sub.imax, etc.); a human-machine interface HMI for entering values of filter parameters, this interface being connected to entry means ENTRY (keyboard, mouse, touchscreen, voice command, etc.) and display means DISP (screen, touchscreen, etc.); a processor PROC for reading the instruction code of the computer program from the memory MEM and applying the processing according to an embodiment to the image data from the buffer BUFF, where appropriate by relying on a movement estimation module MOV (estimate of movement of the camera by translation, rotation, zoom factor) if the position of the camera or the adjustments thereof are thought to be variable; an output interface OUT for delivering the denoised images which can be viewed on the display means DISP for example.

(62) The processing is done pixel by pixel, to provide an image restored from the value of the pixels at image n.

(63) The output of the processing may present several options: estimating in a simple manner the background of the video sequence in real time; detecting and possibly extracting one or more objects moving in the images, by subtraction of the background so obtained from the raw image; and/or again estimating a salt-and-pepper or snow type impulse noise in the video sequence; where applicable, restoring the sequence in case this impulse noise is present; and possibly delivering the images of this background or the denoised images, via the display means DISP or via an interface (not shown) for communicating image data denoised in this manner to a remote site.

(64) The restoration then aims to carry out the following operation:
image.sub.restored=image.sub.restored,previous,reset+Δ×sign{image.sub.raw−image.sub.restored,previous,reset}

(65) Two conventional filters (exponential forgetting) can be used in addition to this treatment: a first filter can be used to accelerate the initial convergence time for this processing; a second filter can be used to reduce the residual variance.

(66) We therefore count four possible combinations of filters: simple sign filter sign filter with initialization filter sign filter with residual variance reduction filter sign filter with initialization filter and residual variance reduction filter.

(67) The second and fourth combinations are detailed below.

(68) Here, the estimate of the geometric transformation between raw image n−1 and raw image n, denoted T.sub.n(q), is used. Continuing to denote the input images y(q,n) and the output images z.sub.restored(q,n), the following steps may be applied:

(69) Initialization: z(q, 0)=0; z.sub.restored(q,0)=0; T.sub.1(q)=q (transformation identity);

(70) N(q,0)=0 (normalization image) For n=1 to n.sub.0, . . . z(q,n)=(1−α).Math.y(q,n)+.Math.αz(T.sub.n(q),n−1) N(q,n)=(1−α)+α.Math.N(T.sub.n(q),n−1)

(71) and

(72) $z_{r e s t o r e d} (q, n) = \frac{z (q, n)}{N (q, n)};$ For n>n.sub.0:
z.sub.restored(q,n)=z.sub.restored(T.sub.n(q),n−1)+Δ×sign{y(q,n)−z.sub.restored(T.sub.n(q),n−1)}

(73) The following can be chosen as input values: n.sub.0=25 images (1 second); it is possible to go up to 50 images (2 seconds) Δ=3 (between 0 and 10)

(74) This value of Δ corresponds to pixel values varying between 0 and 255. For the case of pixels varying between 0 and 1, it is necessary to multiply by 1/255. For the case of pixels varying between 0 and MAX, it is necessary to multiply by MAX/255.

(75) This algorithm makes use of the values:
z(T.sub.n(q),n−1),N(T.sub.n(q),n−1) and z.sub.restored(T.sub.n(q),n−1),

(76) which are sometimes not available when the estimated transformation T.sub.n(q) removes a pixel from the image (because of movement of the image).

(77) In this case, one could then choose the following values:
z(T.sub.n(q),n−1)=0
N(T.sub.n(q),n−1)=0
z.sub.restored(T.sub.n(q),n−1)=y(q,n) or 0

(78) We will now describe the fourth combination, corresponding therefore to the representation in FIG. 7 as previously discussed. It involves the application of the sign filter with a preliminary initialization filter and a concomitant residual variance reduction filter.

(79) The processing can be presented as follows: Initialization: z(q,0)=0; T.sub.1(q)=.sub.t q; z.sub.temp(q,0)=0; z.sub.restored(q,0)=0; N(q,0)=0 and z.sub.restored,bis(q,0)=0 For n=1 to n.sub.0, the same processing as before is retained, with:

(80) 0 $z (q, n) = (1 - α) .Math. y (q, n) + α .Math. z (T_{n} (q), n - 1)$ $N (q, n) = (1 - α) + α .Math. N (T_{n} (q), n - 1) and$ $z_{r e s t o r e d} (q, n) = \frac{z (q, n)}{N (q, n)} z_{r e s t o r e d, b i s} (T (q), n_{0}) = z_{r e s t o r e d} (q, n_{0})$ next, for n>n.sub.0, the processing becomes a combination, for each pixel of image n, of a sign filter:
z.sub.restored(q,n)=z.sub.restored(T.sub.n(q),n−1)+Δ×sign{y(q,n)−z.sub.restored(T.sub.n(q),n−1)} and a forgetting filter:
z.sub.temp(q,n)=(1−β)z.sub.restored(q,n)+β×z.sub.temp(q,n−1) such that:

(81) $z_{r e s t o red, bis} (q, n) = \frac{z_{t e m p} (q, n)}{1 - β^{n - n_{0}}}$

(82) By default, we can take α=0.95 and β=0.9

(83) Here again, this processing involves the values:

(84) z(T.sub.n(q),n−1), N(T.sub.n(q),n−1) and z.sub.restored(T.sub.n(q),n−1), which may not be available when the estimated transformation T.sub.n(q) removes a pixel from the image (because of movement of the image).

(85) In this case, one could then choose the following values:
z(T.sub.n(q),n−1)=0
N(T.sub.n(q),n−1)=0
z.sub.restored(T.sub.n(q),n−1)=y(q,n) or 0

(86) It is thus shown that the recursive real-time estimation of the background of a video sequence allows restoring films highly degraded by impulse noise (“salt-and-pepper” or “snow” or actual dust (loose paper, particles, etc.) hiding a useful background and thus similar to impulse noise), without denaturing the original image as occurs with a linear filter applying an undesirable form of averaging to the succession of pixels.

(87) The advantages of the processing proposed here are multiple: the complexity and required memory are very low because the update of one pixel for image n is done using only the previously processed value (output n−1) and that of the current pixel (image n). The real-time implementation is therefore immediate, unlike an implementation based on conventional median filters. Furthermore, the processing is directly applicable to the case of a moving camera.

Processing of impulse noise in a video sequence

Assignee

Inventors

Cpc classification

Classification Explorer

H04N23/81

ELECTRICITY

Classification Explorer

H04N5/213

ELECTRICITY

Classification Explorer

G06T5/50

PHYSICS

Classification Explorer

G06T2207/10016

PHYSICS

Classification Explorer

G06T5/20

PHYSICS

Classification Explorer

G06T2207/10024

PHYSICS

Classification Explorer

G06T5/002

PHYSICS

Classification Explorer

G06T2207/20182

PHYSICS

International classification

Classification Explorer

G06T5/00

PHYSICS

Classification Explorer

G06T5/20

PHYSICS

Classification Explorer

H04N5/217

ELECTRICITY

Classification Explorer

H04N5/213

ELECTRICITY

Abstract

Claims

Description