Method for segmenting and tracking content in videos using low-dimensional subspaces and sparse vectors
09697614 ยท 2017-07-04
Assignee
Inventors
Cpc classification
G06F18/21345
PHYSICS
G06V10/7715
PHYSICS
G06V10/462
PHYSICS
International classification
Abstract
A method segments and tracks content in a video stream including sets of one or more images by first determining measured data from each set of one or more images. An adaptive step-size parameter and a low-dimensional subspace characterizing motion of the content the measured data are initialized. A sparse vector representing a sparse component that characterizes the motion of the content different from the motion of the content characterized by the low-dimensional subspace is determined. A change in the low-dimensional subspace for the measured data is determined using a proximal point iteration and the parameter, which is updated according to the change. A low-rank subspace matrix representing the low-dimensional subspace is updated according to the change and the parameter. Then, the low-rank matrix representing the low-dimensional subspace and the sparse vector are outputted.
Claims
1. A method for segmenting content in a video stream, comprising: acquiring the video stream generated from a camera and storing the video stream in a memory, wherein the video stream includes a sequence of images acquired of a scene by the camera, wherein each image includes pixels; using a processor in communication with the memory, such that the processor is configured for: using a stored subspace matrix and an adaptive step-size parameter to determine low-dimensional subspace coefficients for an input image in the sequence of images; determining a background component by multiplying the stored subspace matrix with the low-dimensional subspace coefficients, such that the background component represents motion of the content in the video stream characterized by the stored subspace matrix; determining a foreground sparse component from a difference between the pixels in the input image and the background component, such that the foreground sparse component characterizes the motion of the content in the video stream that is different from the motion of the content in the video stream characterized by the stored subspace matrix; determining a foreground-free image by subtracting the pixels in the input image from the foreground sparse component; updating the stored subspace matrix according to the adaptive step-size parameter, such that when the updated stored subspace matrix is multiplied by the low-dimensional subspace coefficients, the updated stored subspace matrix produces an image that is similar to the foreground-free image; updating the adaptive step-size parameter according to an amount of change in the updating of the stored subspace matrix; and storing the updated adaptive step-size parameter in the memory.
2. The method of claim 1, wherein the stored subspace matrix and the adaptive step-size parameter are available, and further comprising: determining a data misfit function to quantify a difference between the input image and a projection of the input image onto the stored subspace matrix; and determining the foreground sparse component and the low-dimensional subspace coefficients that minimize the data misfit function.
3. The method of claim 2, wherein the data misfit function is a l.sub.1-norm cost function.
4. The method of claim 1, wherein the input image is previously processed along with previously processed input images and new input images are available, further comprising: determining the stored subspace matrix and the adaptive step-size parameter from the updated stored subspace matrix and the adaptive step-size parameter of the previously processed input images.
5. The method of claim 1, further comprising: determining an error component that determines missing pixels from the input image using the stored subspace matrix.
6. The method of claim 1, further comprising: determining the amount of change in the updating of the stored subspace matrix by projecting a difference between the updated stored subspace matrix and a previous stored subspace matrix onto an orthogonal complement of the previous stored subspace matrix; and updating the adaptive step-size parameter by decreasing the adaptive step-size parameter when the amount of change is large and increasing the adaptive step-size parameter when the amount of change is small, relative to a previous amount of change.
7. The method of claim 1, wherein the input image is selected from a group consisting of pixel values, optical flow motion vectors or motion trajectories of feature points.
8. The method of claim 1, wherein the video stream is compressed, and further comprising: determining compressed motion vectors of each input image; determining the stored subspace matrix that characterizes a dominant motion flow in the video stream; determining the foreground sparse component that characterizes alternate motion flows from the dominant motion flow; and outputting the dominant motion flow separate from the alternate motion flow.
9. The method of claim 1, wherein each input image in the sequence of images is processed using the steps of claim 1 and stored in the memory.
10. The method of claim 1, wherein the adaptive step-size parameter controls a speed of convergence of an estimation of the stored subspace matrix.
11. The method of claim 1, wherein the low-dimensional subspace coefficients are determined using a least squares algorithm.
12. The system of claim 11, wherein a user interface in communication with the processor and the memory, acquires and stores the video stream in the memory upon receiving an input from a surface of the user interface by a user.
13. The method of claim 1, further comprising: outputting the foreground sparse component and the background component via an output interface in communication with the processor, wherein the foreground sparse component is used to identify objects of interest in the video stream and used to assist in further processing the video stream.
14. The method of claim 1, wherein storing the updated adaptive step-size parameter in the memory is later used to assist in updating a future stored subspace matrix.
15. A computer readable memory embodied thereon a program for segmenting content in a video stream, the program comprising instructions that when executed by a processor cause the processor to perform steps comprising: accessing the video stream generated from a camera stored in the computer readable memory, wherein the video stream includes a sequence of images acquired of a scene by the camera, wherein each image includes pixels; using a stored subspace matrix and an adaptive step-size parameter to determine low-dimensional subspace coefficients for an input image in the sequence of images; determining a background component by multiplying the stored subspace matrix with the low-dimensional subspace coefficients, such that the background component represents motion of the content in the video stream characterized by the stored subspace matrix; determining a foreground sparse component from a difference between the pixels in the input image and the background component, such that the foreground sparse component characterizes the motion of the content in the video stream that is different from the motion of the content in the video stream characterized by the stored subspace matrix; determining a foreground-free image by subtracting the pixels in the input image from the foreground sparse component; updating the stored subspace matrix according to the adaptive step-size parameter, such that when the updated stored subspace matrix is multiplied by the low-dimensional subspace coefficients, the updated stored subspace matrix produces an image that is similar to the foreground-free image; updating the adaptive step-size parameter according to an amount of change in the updating of the stored subspace matrix; and storing the updated adaptive step-size parameter in the computer readable memory.
16. The computer-readable memory of claim 15, wherein the input image is previously processed along with previously processed input images and new input images are available, further comprising: determining the stored subspace matrix and the adaptive step-size parameter from the updated stored subspace matrix and the adaptive step-size parameter of the previously processed input images.
17. The computer-readable memory of claim 15, further comprising: determining the amount of change in the updating of the stored subspace matrix by projecting a difference between the updated stored subspace matrix and a previous stored subspace matrix onto an orthogonal complement of the previous stored subspace matrix; and updating the adaptive step-size parameter by decreasing the adaptive step-size parameter when the amount of change is large and increasing the adaptive step-size parameter when the amount of change is small, relative to a previous amount of change.
18. A system for segmenting content in a video stream, comprising: at least one camera generating the video stream in communication with a computer readable memory to produce the video stream; an output interface; a processor in communication with the computer readable memory, is configured to: access the video stream, wherein the video stream includes a sequence of images acquired of a scene by the camera, wherein each image includes pixels; using a stored subspace matrix and an adaptive step-size parameter to determine low-dimensional subspace coefficients for an input image in the sequence of images; determining a background component by multiplying the stored subspace matrix with the low-dimensional subspace coefficients, such that the background component represents motion of the content in the video stream characterized by the stored subspace matrix; determining a foreground sparse component from a difference between the pixels in the input image and the background component, such that the foreground sparse component characterizes the motion of the content in the video stream that is different from the motion of the content in the video stream characterized by the stored subspace matrix; determining a foreground-free image by subtracting the pixels in the input image from the foreground sparse component; updating the stored subspace matrix according to the adaptive step-size parameter, such that when the updated stored subspace matrix is multiplied by the low-dimensional subspace coefficients, the updated stored subspace matrix produces an image that is similar to the foreground-free image; updating the adaptive step-size parameter according to an amount of change in the updating of the stored subspace matrix; and storing the updated adaptive step-size parameter in the computer readable memory.
19. The system of claim 18, wherein a user interface in communication with the processor and the computer readable memory, acquires and stores the video stream in the computer readable memory upon receiving an input from a surface of the user interface by a user.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
(3) The embodiments of our invention provide a method and system for segmenting and tracking objects in a data stream that lie in low dimensional subspaces using measurements of the data stream. For example, consider a sequence of images in a video stream where a large collection of objects in the video have a dominant motion trajectory that is static or changing slowly and where other objects in the video have a different motion trajectory than the dominant trajectory.
(4) In one embodiment, the method described in this invention separates a stationary and relatively large background component of measurements, in the form of images of a video arriving at a processor after the images are acquired, from typically sparse objects that characterize smaller moving objects in the video, usually in the foreground.
(5) In another embodiment of the invention, the measurements are motion vectors extracted from a sequence of images in a compressed video possibly corrupted with non-Gaussian noise. The motion vectors represent the optical flow in the video that tracks the motion of a large collection of objects in the video. The method segments and tracks a dominant optical flow using the low-dimensional subspace or a union of low-dimensional subspaces.
(6) In yet another embodiment, the measurements have missing data points. For example, if the motion vectors are extracted from only a subset of a sequence of images in a compressed video. The method of this invention determines the missing data points that correspond to the dominant optical flow after identifying the low-dimensional subspace or a union of low-dimensional subspaces.
(7) As shown in
(8) For first measured data 104 in the stream, e.g., a first image in the video sequence, an intial subspace matrix 131 and an initial step-size parameter 126 are initialized 105. A sparse component 111, in the form of a sparse vector, and subspace coefficients 112 are determined 110 from the first measured data using an iterative solver, for example an alternating direction method of multipliers (ADMM). Next, a change in the subspace 121 is determined 120 according to the first measured data 104, the sparse component 111, and the subspace coefficients 112. An adaptive step-size parameter 126 is updated 125 according to the change in subspace 121 is updated 120. The subspace matrix 131 is then updated 130 using the change in subspace 121 and the updated adaptive step-size parameter 126. For the second measured data and every subsequent measured data in the stream, the updated subspace matrix 131 and the updated adaptive step-size parameter are used as the initial subspace matrix and the initial step-size parameter. The process is repeated iteratively until all the measured data 101 from the data stream are processed.
(9) After the arrival of every new measured data from the stream, the moving objects 108 as represented by the sparse vector have been separated from the background as represented in the current subspace matrix 121.
(10)
(11) In one embodiment, the measured data correspond to features of interest points in a video sequence. A graph is constructed from the measured data using feature descriptors, such as a Scale Invariant Feature Transform (SIFT), a Fast Retina Keypoint (FREAK), a Binary Robust Invariant Scalable Keypoints (BRISK), etc., corresponding to the interest points in order to assign edges and weights between the interest points. The method then identifies one or a union of a low-dimensional subspaces that occupy a portion of a spectrum of the graph and that characterises the dominant association between the interest points. The method also segments the dominant association from sparse or non-Gaussian distributed associations that exist in the graph spectrum.
(12) Real-time Subspace Estimation
(13) We descibe real-time estimation of the low-dimensional subspace matrix 131 from incomplete streaming measurements 101, e.g., a compressed video, that may be corrupted with non-Gaussian noise. First, we describe our problem and define the notation used. Then, we describe minimizing a l-.sub.1-norm cost function between the measurements and their projection onto the subspace to determine the subspace coefficients 112, and sparse outliers 111. Then, the subspace is updated 130 using a proximal point iterative procedure, based on using least squares estimation, while updating 125 the adaptive step-size parameter 126.
(14) As advantages, our method does not restrict the subspace update to a Grassmannian manifold as in the prior art, and uses an adaptive step size. In addition, the method does not require an accurate initial estimate of the subspace, e.g., the subspace is set to a random subspace.
(15) Augmented Lagrangian-Based Proximal Point Iterative Procedure
(16) The method minimizes an augmented Lagrangian with the l.sub.1-norm cost function, and uses a smoothing term that maintains a proximity of the update to the previous subspace estimate over the variables (U.sub.t, s.sub.t, a.sub.t, y.sub.t). An objective cost can be represented by
(17)
where e.sub.t is supported on the complement of the selection operator .sub.t, denoted .sub.t.sup.c, such that .sub.t(e.sub.t)=0 and .sub.t.sup.c(e.sub.t)=.sub.t.sup.c(U.sub.ta.sub.t).
(18) Equation (2) is non convex in the variables U.sub.t and a.sub.t. Therefore, we follow the PETRELS and GRASTA approach of alternating the minimization over the variables (s.sub.t, a.sub.t, y.sub.t) and then the variables U.sub.t. By fixing U.sub.t, the minimizers of equation (2) are equal, i.e.,
(19)
(20) Then, the subspace U.sub.t is updated by taking a gradient step to minimize the function
(21)
using the adaptive step-size .
(22) Method
(23) As shown by the pseudocode in
(24)
where .sub.(x)=sign(x).Math.max{|x|,0} denotes an element-wise soft thresholding operator with threshold , k indicates the iteration number, and represents a Moore-Penrose pseudo-inverse of a matrix.
(25) In the second stage (steps 12-17), the subspace U.sub.t is updated (step 19) by minimizing equation (4) using
(26)
where I.sub.r is an rr identity matrix, and the step size .sub.t 126 is updated adaptively.
(27) Adaptive Step-Size Parameter
(28) For the adaptive step-size parameter, the method uses a regularizer .sub.t to control the speed of convergence of the estimation of the subspace 131. In particular, a smaller value of allows for faster adaptability of U.sub.t to a changing subspace, i.e., with a larger descent direction, whereas a larger value of only allows a slower change in U.sub.t.
(29) Consider a descent direction
D.sub.t=(U.sub.t-1+(b.sub.ts.sub.te.sub.t)a.sub.t.sup.T)(I.sub.r+a.sub.ta.sub.t.sup.T).sup.1U.sub.t-1, (7)
(30) and determine its projection onto an orthogonal complement of the previous subspace estimate to obtain the change in subspace 121
G.sub.t=(IU.sub.t-1U.sub.t-1.sup.)D.sub.t.(8)
(31) Then, the adaptive step-size parameter .sub.t 126 can be updated 125 according to
(32)
where
(33)
and l{1,0,1,2} are set according to predetermined thresholds for .sub.t. Here, sigmoid(x)=+2/(1+e.sup.10x) for some predefined .
(34) Similar to GRASTA, the intuition behind selecting such an update rule comes from the idea that if two consecutive subspace updates G.sub.t-1 and G.sub.t have the same direction, i.e., (G.sub.t-1, G.sub.t)>0, then the target subspace is still far from the current subspace estimate. Consequently, the updated step size .sub.t should be smaller to allow for fast adaptability, which is achieved by increasing .sub.t. Similarly, when (G.sub.t-1, G.sub.t)<0, the subspace update can oscilate around the target subspace and hence a larger .sub.t is needed. Note that when the product of the norm of the subspace updates (G.sub.t-1.sub.F.Math.G.sub.t.sub.F) is too small, e.g., smaller than 10.sup.6, we assume that the current subspace estimate is close to the target subspace, and we force .sub.t to decrease by the magnitude of the sigmoid.
(35) Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications may be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.