Systems and methods for computer vision background estimation using foreground-aware statistical models
09767570 · 2017-09-19
CPC classification
G06V10/267
PHYSICS
Abstract
Systems and methods are disclosed for background modeling in a computer vision system for enabling foreground object detection. A video acquisition module receives video data from a sequence of frames. A fit test module identifies a foreground object from the video data and defines a foreground mask representative of the identified foreground object. A foreground-aware background estimation module defines a first background model from the video data and then further defines an updated background model from an association of a current frame of the video data, the first background model, and the foreground mask.
Claims
1. A system for parametric background modeling in a computer vision system for enabling foreground object detection, comprising: a video acquisition module for receiving video data from a sequence of frames acquired from an associated image capture device monitoring a scene of interest, wherein the video acquisition module includes at least one video capture device; a fit test module for performing pixel-wise goodness of fit tests for pixel values in the incoming frames received from the video acquisition module, wherein the fit test module identifies the pixels associated with a foreground object from the video data obtained from the at least one video capture device where the identified pixels are determined to belong to the foreground or to the background according to the individual pixel's RGB value and to the individual pixel's corresponding mixture model, the identified pixels defining a foreground mask representative of the identified foreground object; and a foreground-aware background estimation module for maintaining a parametric background model wherein the foreground-aware background estimation module includes a processor for updating individual pixel models in accordance with incoming pixel values of the current frame, whereby only the pixels outside of the foreground mask are updated with a new individual pixel model, leaving the models for pixels in the foreground mask unchanged.
2. The system of claim 1 wherein the foreground mask comprises a binary mask.
3. The system of claim 2 wherein the binary foreground mask is generated by thresholding a probability matrix output by the fit test module.
4. The system of claim 2 wherein the binary foreground mask is generated directly by the fit test module.
5. The system of claim 1 wherein the background models comprise a statistical estimation model representative of historical pixel values from the sequence of frames.
6. The system of claim 5 wherein the fit test module includes a processor for comparing pixel values between the background model and corresponding pixel values in the current frame.
7. The system of claim 6 wherein the fit test module computes a probability score representative of a confidence that the pixels in the current frame belong to a corresponding respective distribution.
8. The system of claim 7 wherein the foreground-aware background estimation module includes a processor for updating statistical models for the pixel values in accordance with the probability score returned by the fit test module.
9. The system of claim 6 wherein the background estimation module includes a processor for updating statistical models in accordance with incoming pixel values of the current frame for pixels outside of the foreground mask for precluding absorbing the foreground object into the updated background estimation model.
10. The system of claim 6 further including a probability matrix for estimating relative to a preselected threshold if a selected pixel is included in the foreground mask.
11. The system of claim 1 comprising a non-transitory storage medium storing instructions readable and executable by an electronic data processing device.
12. The system of claim 11 wherein the storage medium includes a cloud-based server complemented by a network of computers operatively interconnected via a network.
13. The system of claim 1 wherein the scene of interest comprises a sequence of frames stored in a storage medium.
14. A method for background estimation in a computer vision system for enhanced foreground object detection, comprising: receiving video data in a video acquisition module from a sequence of frames acquired from an associated image capture device monitoring a scene of interest; identifying a foreground object in the video data with a pixel-wise goodness of fit test for the frames received from the video acquisition module, wherein the pixel-wise goodness of fit test is made in a fit test module; defining a foreground mask representative of the identified foreground object by determining whether identified pixels belong to the foreground or the background according to the individual pixel's RGB value and the individual pixel's corresponding mixture model; defining a parametric background model from the video data in a foreground-aware background estimation module; and updating the background model in the foreground-aware background estimation module by selectively updating individual pixel models in accordance with incoming pixel values of the current frame, whereby only the pixels outside of the foreground mask are updated with a new individual pixel model, leaving the models for pixels in the foreground mask unchanged.
15. The method of claim 14 wherein defining the parametric background model includes defining a Gaussian mixture model.
16. The method of claim 14 wherein the background estimation module includes a processor for updating the statistical models in accordance with incoming pixel values of the current frame for pixels outside of the foreground mask for precluding absorbing the foreground object into the updated background estimation model.
17. The method of claim 14 further including estimating if a select pixel is included in the foreground mask with a probability matrix relative to a preselected threshold.
18. The method of claim 14 further including defining the updated background model in one of a retail application or traffic monitoring environment.
19. The method of claim 14 wherein the monitoring the scene of interest includes monitoring a sequence of frames stored in a storage medium.
20. The method of claim 14 wherein a pixel is deemed to belong to the background if its RGB value is within three standard deviations of any component in its corresponding Gaussian mixture.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(9) The following terminology and notation will be used throughout the description of the present embodiment.
(10) BG.sub.0: the initial background model (array of pixel-wise statistical models), extracted, e.g., by setting the means of the Gaussian components of each pixel to the pixel value and setting the covariance matrix to a diagonal matrix with large positive entries. Alternatively, BG.sub.0 can be loaded from a database of previous models of the same scene. A background model is said to have been initialized once the parameters that best describe the statistical models for every pixel are determined. Upon completion of the training phase, BG.sub.0 converges to BG.sub.1.
F.sub.i: the i-th video frame (grayscale or color), where i represents a temporal index.
BG.sub.i: the i-th background model (array of pixel-wise statistical models) used for foreground detection in conjunction with frame F.sub.i; this is the model available before an update occurs based on the newly incoming pixel samples in F.sub.i.
FG.sub.i: the i-th foreground binary mask obtained via comparison between BG.sub.i and F.sub.i.
BG.sub.i+1: the (i+1)-th background model obtained by updating the pixel-wise background models in BG.sub.i with the pixel values in F.sub.i; FG.sub.i+1 will subsequently be determined via comparison between BG.sub.i+1 and frame F.sub.i+1.
Traditional methods of using statistical models for background estimation include the following steps/operations.
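The initialization of BG.sub.0 described above can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation; the function name, the choice of K, and the initial variance value are all assumptions made for the example.

```python
import numpy as np

def init_background_model(first_frame, K=3, init_var=900.0):
    """Build BG_0: a K-component Gaussian mixture per pixel, with all
    component means set to the first observed pixel value and all
    variances set to a large positive value (reflecting low initial
    confidence), as described for the initial model above.
    K and init_var are illustrative choices, not values from the patent."""
    h, w, c = first_frame.shape
    # (h, w, K, c): every component's mean starts at the observed pixel value
    means = np.repeat(first_frame[:, :, None, :].astype(float), K, axis=2)
    variances = np.full((h, w, K), float(init_var))  # sigma^2 for sigma^2 * I covariances
    weights = np.full((h, w, K), 1.0 / K)            # uniform initial component weights
    return {"means": means, "variances": variances, "weights": weights}
```

During the training phase, these per-pixel parameters would then be refined frame by frame until the model converges to BG.sub.1.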
(11) Pixel Modeling
Statistical models for background estimation model the values of a pixel over time as the instantiations of a random variable with a given distribution. Background estimation is achieved by estimating the parameters of the distributions that accurately describe the historical behavior of pixel values for every pixel in the scene. Specifically, at frame n, what is known about a particular pixel located at coordinates (i,j) is the history of its values
{X.sub.1,X.sub.2, . . . ,X.sub.n}={I(i,j,m),1≦m≦n} (1)
where X.sub.n is the pixel value at frame n, I is the image sequence or video frame sequence, (i,j) are the pixel indices and m is the image frame index.
(12) The historical behavior can be described with different statistical models, including parametric models that assume an underlying distribution and estimate the relevant parameters, and non-parametric models such as kernel-based density estimation approaches. The present embodiments implement the proposed algorithm in terms of Gaussian mixture models, but it is equally applicable to other online modeling approaches. The recent history of values of each pixel is modeled as a mixture of K Gaussian distributions, so that the probability density function for the current value is
P(X.sub.t)=Σ.sub.i=1.sup.Kw.sub.itη(X.sub.t,μ.sub.it,Σ.sub.it) (2)
where w.sub.it is an estimate of the relative weight of the i-th Gaussian component in the mixture at time t, μ.sub.it is the mean value of the i-th Gaussian component in the mixture at time t, Σ.sub.it is the covariance matrix of the i-th Gaussian component in the mixture at time t, and η(·) is the Gaussian probability density function. μ.sub.it, Σ.sub.it, and w.sub.it are referred to as the parameters of the Gaussian mixture model. Note that when other statistical models are used, the relevant parameters will differ; initializing and updating a given statistical model updates the values of these parameters. Where color images are used, it is sometimes reasonable to assume that the different color channels are uncorrelated, in which case Σ.sub.it=σ.sub.it.sup.2I. This is not intended as a limiting statement, since non-diagonal covariance matrices are used in the more general case.
(13) Pixel modeling is usually conducted during the initialization/training phase of the background model. To this end, the first N frames (usually N˜100 in practice) are used to train the background model. A background model is said to have been initialized once the parameters that best describe the mixture of Gaussians (mean vectors and covariance matrices for each Gaussian component) for every pixel are determined. For simplicity, the initialization/training phase of the background model is omitted from the description of the system and it is assumed that the background model has been initialized upon the beginning of the foreground detection process.
(14) Foreground Pixel Detection
Foreground detection is performed by determining a measure of fit of each pixel value in the incoming frame relative to its constructed statistical model. In one embodiment, as a new frame comes in, every pixel value in the frame is checked against its respective mixture model so that a pixel is deemed to be a background pixel if it is located within T=3 standard deviations of the mean of any of the K components. Use of other values for T or membership/fit tests to determine pixel membership (e.g., maximum likelihood) is possible.
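The membership test above can be sketched as follows. This is a hypothetical illustration assuming isotropic σ.sup.2I covariances; the function name and array layout are choices made for the example, not part of the patent.

```python
import numpy as np

def is_background(pixel, means, variances, T=3.0):
    """Membership test of paragraph (14): a pixel is deemed background if
    its value lies within T standard deviations of the mean of any of the
    K mixture components (isotropic sigma^2 * I covariances assumed).

    pixel: (c,) value; means: (K, c); variances: (K,) per-component sigma^2.
    """
    dists = np.linalg.norm(means - pixel, axis=1)         # distance to each component mean
    return bool(np.any(dists <= T * np.sqrt(variances)))  # within T sigma of any component
```

A pixel failing this test for all K components would be flagged as foreground and contribute to the mask FG.sub.t.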
(15) Model Updating
If none of the K distributions match the current pixel value according to the membership test described above, the least probable distribution in the mixture is replaced with a distribution with mean equal to the incoming pixel value, some arbitrarily high variance, and a small weighting factor, the two latter statements reflecting the lack of confidence in the newly added component. The weights of the distributions are adjusted according to:
w.sub.i(t+1)=(1−α)w.sub.it+αM.sub.it (3)
where α is the learning or update rate and M.sub.it is an indicator variable equaling 0 for every component except the matching one (in which case M.sub.it=1), so that only the weight factor for the matching distribution is updated. Similarly, only the mean and standard deviation/covariance estimates for matching distributions are updated according to:
μ.sub.t+1=(1−ρ)μ.sub.t+ρX.sub.t (4)
σ.sub.t+1.sup.2=(1−ρ)σ.sub.t.sup.2+ρ(X.sub.t−μ.sub.t+1).sup.T(X.sub.t−μ.sub.t+1) (5)
where X.sub.t is the value of the incoming pixel and ρ=αη(X.sub.t|μ.sub.k,σ.sub.k.sup.2) is the learning rate for the parameters of the matching component of the distribution, k.
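Equations (3)-(5) can be sketched for a single pixel's mixture as follows. This is an illustrative sketch, assuming isotropic σ.sup.2I covariances; the function name and array shapes are assumptions for the example.

```python
import numpy as np

def update_matching_component(weights, means, variances, x, k, alpha=0.01):
    """Classic (foreground-unaware) update of one pixel's mixture after
    component k has matched the incoming value x, per Eqs. (3)-(5).

    weights: (K,); means: (K, c); variances: (K,) for sigma^2 * I covariances.
    """
    K = len(weights)
    M = np.zeros(K)
    M[k] = 1.0                                   # indicator: 1 only for the matching component
    new_w = (1 - alpha) * weights + alpha * M    # Eq. (3)

    # rho = alpha * eta(x | mu_k, sigma_k^2): Gaussian likelihood of x under component k
    c = len(x)
    d0 = x - means[k]
    lik = np.exp(-0.5 * (d0 @ d0) / variances[k]) / ((2 * np.pi * variances[k]) ** (c / 2))
    rho = alpha * lik

    new_m, new_v = means.copy(), variances.copy()
    new_m[k] = (1 - rho) * means[k] + rho * x              # Eq. (4)
    d1 = x - new_m[k]                                      # uses the updated mean mu_{t+1}
    new_v[k] = (1 - rho) * variances[k] + rho * (d1 @ d1)  # Eq. (5)
    return new_w, new_m, new_v
```

Note that only the matching component k is modified; the other components' means and variances are left untouched, consistent with the indicator variable M.sub.it.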
(16) With particular reference to
(17) The Fit Test Module 34 outputs a foreground estimate by performing pixel-wise goodness of fit tests of the values in each incoming frame received from the video acquisition module 32 relative to the background model maintained by the foreground-aware model update module. This module 34 implements the Fit Test operation that takes the current background model BG.sub.t and the most recent video frame F.sub.t, and outputs the current foreground mask FG.sub.t. More particularly, and with additional reference to
(18) The Foreground-Aware Background Model Update Module 36 stores the current background model BG.sub.t and updates it according to the foreground mask FG.sub.t output by the Fit Test Module 34, and the incoming frame F.sub.t 38 received from the video acquisition module. The result is an updated background model BG.sub.t+1 44 to be stored and used in the processing of the next and new incoming frame F.sub.t+1.
(19) For each pixel, the weights of the distributions are adjusted according to:
w.sub.i(t+1)=fg.sub.tw.sub.it+(1−fg.sub.t)((1−α)w.sub.it+αM.sub.it) (6)
where α is the learning or update rate, and M.sub.it is an indicator variable equaling 0 for every component except the matching one (in which case M.sub.it=1), so that only the weight factor for the matching distribution is updated; lastly, fg.sub.t is the binary value of the foreground mask FG.sub.t at the pixel whose model is being updated.
(20) Similarly, only the mean and standard deviation/covariance estimates for matching distributions are updated according to:
μ.sub.t+1=fg.sub.tμ.sub.t+(1−fg.sub.t)((1−ρ)μ.sub.t+ρX.sub.t) (7)
σ.sub.t+1.sup.2=fg.sub.tσ.sub.t.sup.2+(1−fg.sub.t)((1−ρ)σ.sub.t.sup.2+ρ(X.sub.t−μ.sub.t+1).sup.T(X.sub.t−μ.sub.t+1)) (8)
(21) where X.sub.t is the value of the incoming pixel, and ρ=αη(X.sub.t|μ.sub.k,σ.sub.k.sup.2) is the learning rate for the parameters of the matching component of the distribution, k. In other words, the weights and the parameters of each Gaussian component are updated in accordance with incoming pixels of the current frame for pixels outside of the foreground mask, thereby precluding absorbing the foreground object into the updated background estimation model.
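The foreground-aware form of Eqs. (6)-(8) can be sketched for one pixel as follows. Because fg.sub.t is binary, the convex combination fg·old + (1−fg)·updated reduces to freezing the model inside the mask and applying the standard update outside it. This is an illustrative sketch under the same isotropic-covariance assumption as above; the function name is a choice for the example.

```python
import numpy as np

def foreground_aware_update(weights, means, variances, x, k, fg, alpha=0.01):
    """One-pixel form of Eqs. (6)-(8): if fg == 1 the pixel lies inside the
    foreground mask and its mixture is left unchanged (the object is not
    absorbed into the background); if fg == 0 the standard update of
    Eqs. (3)-(5) is applied to the matching component k."""
    if fg:  # fg_t = 1: freeze the model for this pixel
        return weights.copy(), means.copy(), variances.copy()
    K = len(weights)
    M = np.zeros(K)
    M[k] = 1.0
    new_w = (1 - alpha) * weights + alpha * M
    c = len(x)
    d0 = x - means[k]
    lik = np.exp(-0.5 * (d0 @ d0) / variances[k]) / ((2 * np.pi * variances[k]) ** (c / 2))
    rho = alpha * lik
    new_m, new_v = means.copy(), variances.copy()
    new_m[k] = (1 - rho) * means[k] + rho * x
    d1 = x - new_m[k]
    new_v[k] = (1 - rho) * variances[k] + rho * (d1 @ d1)
    return new_w, new_m, new_v
```

Applied pixel-wise over the frame with FG.sub.t supplying each pixel's fg value, this produces the updated model BG.sub.t+1.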
(22) With particular reference to
(23) The effect of performing the updates in the manner described is that only the models for background pixels get updated at each frame, which eliminates the risk of a foreground object being absorbed into the background, an absorption that would inaccurately affect the background model for the affected pixels.
(24) With particular reference to
(25) When intermediate probabilities are available, the updating rules implemented by the foreground-aware background model update module are as follows:
w.sub.i(t+1)=(1−p.sub.t)w.sub.it+p.sub.t((1−α)w.sub.it+αM.sub.it) (9)
μ.sub.t+1=(1−p.sub.t)μ.sub.t+p.sub.t((1−ρ)μ.sub.t+ρX.sub.t) (10)
σ.sub.t+1.sup.2=(1−p.sub.t)σ.sub.t.sup.2+p.sub.t((1−ρ)σ.sub.t.sup.2+ρ(X.sub.t−μ.sub.t+1).sup.T(X.sub.t−μ.sub.t+1)) (11)
which reflects the estimated confidence of a pixel belonging to its respective background distribution.
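The soft (probability-weighted) form of the weight update can be sketched as follows. This is an illustrative sketch; p_bg stands for the pixel's background-membership confidence p.sub.t, and the function name is a choice for the example.

```python
import numpy as np

def soft_update_weights(weights, k, p_bg, alpha=0.01):
    """Soft-mask weight update: instead of a hard binary mask, each pixel
    carries a confidence p_bg of belonging to the background, and the
    standard weight update is blended in proportionally. p_bg = 1 recovers
    the full update; p_bg = 0 leaves the model untouched."""
    K = len(weights)
    M = np.zeros(K)
    M[k] = 1.0
    updated = (1 - alpha) * weights + alpha * M   # standard Eq. (3) update
    return (1 - p_bg) * weights + p_bg * updated  # confidence-weighted blend
```

The mean and variance updates blend in the same way, so pixels that are confidently foreground leave the model nearly untouched while uncertain pixels contribute partially.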
(26) The foregoing embodiments thus support an increased range of patterns of motion of objects for a given learning rate. They also have the advantage of being less sensitive to the choice of learning rate needed to achieve satisfactory detection performance, and they greatly improve detection performance with a relatively small learning rate value, which in turn enables responsiveness of the background model to fast changes in the appearance of the scene at a region of interest.
(27) The foregoing described modules, such as the Fit Test Module 34 and the Foreground-Aware Model Update Module 36, are intended to embody a computer or processor-based device, implemented in either hardware or software, and thus comprise non-transitory storage media storing instructions readable and executable by an electronic data processing device. Such a storage medium can also include a cloud-based server complemented by a network of operatively interconnected computers.
(28) It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.