COMPUTER-IMPLEMENTED METHOD AND SYSTEM FOR PROCESSING VIDEO WITH TEMPORAL CONSISTENCY

20170345163 · 2017-11-30

    Abstract

    System and computer-implemented method for editing a video sequence with temporal consistency. The method includes the steps of: computing a motion field modeling temporal consistency between successive frames; defining an energy functional modeling the desired properties to be enforced on the video sequence; splitting the video sequence into two sets containing the even frames and the odd frames; computing the motion field between consecutive frames on the split sequences; recursively performing the splitting and motion-computation steps until each set to be split contains one frame to edit; minimizing the energy functional for each set containing one frame to edit; and merging the edited frames and outputting the edited video sequence.

    Claims

    1. A computer-implemented method for editing a video sequence with temporal consistency comprising the steps of: i) computing a motion field modeling temporal consistency between successive frames; ii) defining an energy functional modeling the desired properties to be enforced on the video sequence; iii) splitting the video sequence into two sets containing the even frames and the odd frames; iv) computing the motion field between consecutive frames on the split sequences; v) recursively performing steps iii) and iv) until the sets to be split contain one frame to edit; vi) minimizing the energy functional for each set containing one frame to edit; vii) merging the edited frames and outputting the edited video sequence.

    2. The computer-implemented method of claim 1, further comprising a preliminary step of identifying a frame in the video sequence having an edited object to be propagated.

    3. The computer-implemented method of claim 1, wherein the energy functional comprises a summation of editing energies depending on single frames and temporal coupling energies depending on pairs of consecutive frames.

    4. The computer-implemented method of claim 1, wherein the energy functional is based on at least one of the following models: global brightness change assumption; brightness constancy assumption; gradient constancy assumption.

    5. A system for editing a video sequence with temporal consistency comprising: a first processing means for computing a motion field (v) modeling temporal consistency between successive frames (f_i, f_i+1) and for defining an energy functional (J) modeling the properties to be enforced in the video sequence (f); a second processing means for splitting the video sequence (f) into two sets with even frames (f^even) and odd frames (f^odd), wherein the splitting is done recursively until there is only one frame to edit in each set; a third processing means for computing the motion field (v) between consecutive frames in each set after each split; a fourth processing means for minimizing the energy functional (J) on each set to obtain edited frames; a fifth processing means for merging the edited frames and outputting the entire edited video sequence (u).

    6. The system for editing a video sequence with temporal consistency of claim 5, comprising a sixth processing means for identifying a frame in a video sequence having an edited object to be propagated.

    7. A computer program product for editing a video sequence with temporal consistency comprising computer code instructions stored thereon that, when executed by a processor, cause the processor to perform the method of claim 1.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0082] A series of drawings which aid in better understanding the invention and which are expressly related to an embodiment of said invention, presented as a non-limiting example thereof, are very briefly described below.

    [0083] FIG. 1: Illustration of an editing domain inside the video domain. It also shows the temporal slice at time t.

    [0084] FIG. 2: Diagram of the SplitAndMinimize procedure.

    [0085] FIG. 3: Diagram of the VideoOddSplit procedure.

    [0086] FIG. 4: Diagram of the VideoEvenSplit procedure.

    [0087] FIG. 5: Diagram of the MotionOddSplit procedure.

    [0088] FIG. 6: Diagram of the MotionEvenSplit procedure.

    [0089] FIG. 7: Diagram of the Merge procedure.

    [0090] FIG. 8: An example of the type of video editing application that can be accelerated by the disclosed method. First row: the original sequence f to edit; second row: the first and last frames, manually edited; third row: the desired solution, automatically computed.

    [0091] FIG. 9: A block diagram of the main elements of the invention.

    DETAILED DESCRIPTION OF THE INVENTION

    [0092] The present embodiment describes a method for reducing the complexity and solving time of video editing schemes that assume temporal consistency. Let Ω×[1,T] be the spatio-temporal video domain, where Ω⊂ℝ² is the spatial domain and T≥1 is the number of frames in the video. Let u: Ω×[1,T]→ℝ^M be a given color (M>1) or grayscale (M=1) video such that u(x,t) represents the pixel value at location x in frame t. Moreover, let v: Ω×[1,T−1]→ℝ² be the corresponding motion field. This motion field gives the correspondence between pixels in frames t and t+1 in such a way that u(x,t)=u(x+v(x,t),t+1) (under the brightness constancy assumption); that is, v establishes the temporal coherence in the video.
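    The brightness constancy relation above says that warping frame t+1 back along the motion field should reproduce frame t. A minimal sketch of this check in Python/NumPy (the function name, the (H, W, 2) flow layout, and the nearest-neighbour sampling are our illustrative assumptions, not part of the patent):

```python
import numpy as np

def warp_forward(frame_next, v):
    """Sample frame t+1 at the displaced positions x + v(x,t), so that,
    under brightness constancy, warp_forward(u[t+1], v[t]) ~ u[t].
    v has shape (H, W, 2) with v[..., 0] = dx and v[..., 1] = dy."""
    h, w = frame_next.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # round displaced coordinates and clip them to the image domain
    xd = np.clip(np.rint(xs + v[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.rint(ys + v[..., 1]).astype(int), 0, h - 1)
    return frame_next[yd, xd]
```

In practice bilinear interpolation would replace the nearest-neighbour rounding, but the structure is the same.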

    [0093] By way of example, several algorithms written in pseudocode are defined below for a better understanding. These algorithms can be implemented in any processing device (e.g. a computer) to advantageously edit a video sequence according to the principles of the invention. In particular, the core of the invention lies in the sequence of steps of Algorithm 2.

    Algorithm 1: u←Minimize(J,f,v)

    Require:

    [0094] A grayscale or color video f(x,t), t∈[1,T], composed of T frames, and the motion field v(x,t), t∈[1,T−1].

    [0095] An energy functional J(u,v,f) to minimize, or the partial differential equation involving the given video f and the motion field v.

    Ensure: An edited video u(x,t), t∈[1,T].

    1. u ← minimum of J(u,v,f) with respect to u
    Algorithm 2: u←SplitAndMinimize(J,f,v)

    Require:

    [0096] A grayscale or color video f(x,t), t∈[1,T], composed of T frames, and the motion field v(x,t), t∈[1,T−1].

    [0097] An energy functional J(u,v,f) to minimize, or the partial differential equation involving the given video f and the motion field v.

    Ensure:

    [0098] An edited video u(x,t), t∈[1,T].

    1. nFrames ← number of frames
    2. if nFrames is 1 then
       a. u ← Minimize(J,f,v)
    3. else {nFrames > 1}
       a. f^odd(x,t) ← VideoOddSplit(f)
       b. f^even(x,t) ← VideoEvenSplit(f)
       c. v^odd(x,t) ← MotionOddSplit(v)
       d. v^even(x,t) ← MotionEvenSplit(v)
       e. u^odd ← SplitAndMinimize(J,f^odd,v^odd)
       f. u^even ← SplitAndMinimize(J,f^even,v^even)
       g. u ← Merge(u^odd,u^even)
    4. end if
    5. return u
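    The recursion of Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration on plain lists: the motion fields, the energy functional, and the "lid" frames are omitted, and `minimize_one` stands in for Algorithm 1 applied to a single remaining frame; all names are ours, not the patent's.

```python
def split_and_minimize(frames, minimize_one):
    """Recursive even/odd decomposition (Algorithm 2), sketched on plain
    lists. minimize_one(frames) edits the single remaining frame and
    returns the (one-element) list."""
    if len(frames) == 1:
        return minimize_one(frames)
    odd = frames[0::2]   # frames 1, 3, 5, ... (1-based numbering)
    even = frames[1::2]  # frames 2, 4, 6, ...
    u_odd = split_and_minimize(odd, minimize_one)
    u_even = split_and_minimize(even, minimize_one)
    # interleave the two partial solutions back into order (Algorithm 7)
    merged = []
    for i in range(max(len(u_odd), len(u_even))):
        if i < len(u_odd):
            merged.append(u_odd[i])
        if i < len(u_even):
            merged.append(u_even[i])
    return merged
```

Because each level of the recursion halves the sequence, the depth is O(log T), which is the source of the claimed acceleration over solving all T frames jointly.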
    Algorithm 3: f^odd ← VideoOddSplit(f)

    Require:

    [0107] A grayscale or color video f(x,t), t∈[1,T], composed of T frames.

    Ensure: A video f^odd(x,t) composed of the odd frames of f

    1. nFrames ← number of frames
    2. if nFrames is 1 then
       a. f^odd ← f
    3. else {nFrames > 1}
       a. j ← 1
       b. for i = 1 to nFrames do
          i. if i is odd then
             1. f^odd(x,j) ← f(x,i)
             2. j ← j + 1
          ii. end if
          iii. i ← i + 1
       c. end for
    4. end if
    5. return f^odd
    Algorithm 4: f^even ← VideoEvenSplit(f)

    Require:

    [0108] A grayscale or color video f(x,t), t∈[1,T], composed of T frames.

    Ensure: A video f^even(x,t) composed of the even frames of f

    1. nFrames ← number of frames
    2. if nFrames is 1 then
       a. f^even ← f
    3. else {nFrames > 1}
       a. j ← 1
       b. for i = 1 to nFrames do
          i. if i is even then
             1. f^even(x,j) ← f(x,i)
             2. j ← j + 1
          ii. end if
          iii. i ← i + 1
       c. end for
    4. end if
    5. return f^even
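    In a language with array slicing, the index bookkeeping of Algorithms 3 and 4 reduces to two strided slices; a sketch on Python lists (the function names are ours, and "odd"/"even" refer to the patent's 1-based frame numbering):

```python
def video_odd_split(frames):
    """Algorithm 3: keep frames 1, 3, 5, ... (1-based), i.e. slice [0::2]."""
    return frames[0::2]

def video_even_split(frames):
    """Algorithm 4: keep frames 2, 4, 6, ... (1-based), i.e. slice [1::2]."""
    return frames[1::2]
```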
    Algorithm 5: v^odd ← MotionOddSplit(v)

    Require:

    [0109] A motion field v(x,t), t∈[1,T−1], from a video composed of T frames.

    [0110] The number of frames nFrames of the corresponding video.

    Ensure: A motion field v^odd(x,t) that should be coherent with VideoOddSplit(f)

    1. nFrames ← number of frames
    2. if nFrames is 1 then
       a. v^odd ← v
    3. else {nFrames > 1}
       a. j ← 1
       b. for i = 1 to nFrames−1 do
          i. if i is odd then
             1. v^odd(x,j) ← v(x,i) + v(x + v(x,i), i+1)
             2. j ← j + 1
          ii. end if
          iii. i ← i + 1
       c. end for
    4. end if
    5. return v^odd
    Algorithm 6: v^even ← MotionEvenSplit(v)

    Require:

    [0111] A motion field v(x,t), t∈[1,T−1], from a video composed of T frames.

    [0112] The number of frames nFrames of the corresponding video.

    Ensure: A motion field v^even(x,t) that should be coherent with VideoEvenSplit(f)

    1. nFrames ← number of frames
    2. if nFrames is 1 then
       a. v^even ← v
    3. else {nFrames > 1}
       a. j ← 1
       b. for i = 1 to nFrames−1 do
          i. if i is even then
             1. v^even(x,j) ← v(x,i) + v(x + v(x,i), i+1)
             2. j ← j + 1
          ii. end if
          iii. i ← i + 1
       c. end for
    4. end if
    5. return v^even
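    The key update in Algorithms 5 and 6 (step b.i.1) chains two consecutive motion fields: the composed displacement at x is the first displacement plus the second displacement sampled at the displaced position, v'(x) = v(x,i) + v(x + v(x,i), i+1). A sketch with nearest-neighbour sampling (the function name, the (H, W, 2) array layout, and the rounding are our assumptions; bilinear interpolation would be more accurate):

```python
import numpy as np

def compose_flows(v_a, v_b):
    """Compose two consecutive motion fields as in Algorithms 5/6:
    v'(x) = v_a(x) + v_b(x + v_a(x)), where v_b is sampled at the
    displaced position with nearest-neighbour rounding.
    Both fields have shape (H, W, 2) with channels (dx, dy)."""
    h, w = v_a.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.rint(xs + v_a[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.rint(ys + v_a[..., 1]).astype(int), 0, h - 1)
    return v_a + v_b[yd, xd]
```

The rounding and clipping are exactly the kind of approximation errors that the correction steps mentioned for the Merge procedure are meant to absorb.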
    Algorithm 7: u ← Merge(u^odd, u^even)

    Require:

    [0113] Two color or grayscale video sequences (u^odd, u^even).

    [0114] The difference in their numbers of frames cannot be greater than one.

    Ensure: A new video u composed of u^odd and u^even

    1. nFrames^odd ← number of frames of u^odd
    2. nFrames^even ← number of frames of u^even
    3. i ← 1
    4. j ← 1
    5. while i ≤ nFrames^odd or i ≤ nFrames^even do
       a. u(x,j) ← u^odd(x,i)
       b. u(x,j+1) ← u^even(x,i)
       c. i ← i + 1
       d. j ← j + 2
    6. end while
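    Algorithm 7 simply interleaves the two partial solutions back into frame order; a sketch on Python lists (function name ours), with an explicit guard for the case where one sequence holds one more frame than the other:

```python
def merge(u_odd, u_even):
    """Algorithm 7: interleave the odd and even solutions.
    The lengths may differ by at most one frame."""
    assert abs(len(u_odd) - len(u_even)) <= 1
    out = []
    for i in range(max(len(u_odd), len(u_even))):
        if i < len(u_odd):
            out.append(u_odd[i])
        if i < len(u_even):
            out.append(u_even[i])
    return out
```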

    [0125] Now the following video editing problem is considered as an example of how to proceed according to the invention: Let f be a color video composed of 5 frames, as shown in the first row of FIG. 8, in which frames 1 and 5 have been manually edited by changing two sides of the box (second row). The idea is to automatically propagate the information of the first and last frames to the frames in between (frames 2, 3 and 4), as shown in FIG. 8, third row.

    [0126] One of the possible methods in the literature for solving this problem goes as follows. Let O⊂Ω×[1,3] be the editing domain (O corresponds to the sides of the box in FIG. 8) with a Lipschitz boundary [1] (to simplify, we can consider that O has a smooth boundary). Let O_t = {x∈Ω : (x,t)∈O}, t∈[1,3], i.e., O_t is the editing area of frame t. An illustration of these domains can be seen in FIG. 1. Moreover, let v be the correspondence map between the frames.

    [0127] The problem can be solved minimizing an energy functional. In this example, the global brightness change assumption is used:


    J(u) = ∫_0^T ∫_{Ω_t} ‖∇ ∂_v u(x,t)‖² dx dt  (15)

    [0128] where ∇ is the gradient operator and ∂_v is the convective derivative. Following the calculus of variations, the minimum of energy (15) is the solution of the Euler-Lagrange equation, given by the following fourth-order PDE (Partial Differential Equation):


    ∂_v* div ∇ ∂_v u(x,t) = 0,  (x,t)∈O  (16)

    where div is the spatial divergence, adjoint to −∇, and ∂_v* denotes the adjoint operator of the convective derivative, given by

    [00008] ∂_v* f = −∂f/∂t − div(v f).
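    This adjoint formula can be checked by formal integration by parts, writing the convective derivative as ∂_v g = ∂g/∂t + v·∇g and assuming the boundary terms vanish (e.g. compactly supported functions); a sketch of the computation:

```latex
\begin{aligned}
\int_\Omega\!\int_0^T (\partial_v g)\, f \,dt\,dx
&= \int_\Omega\!\int_0^T \Big(\frac{\partial g}{\partial t} + v\cdot\nabla g\Big) f \,dt\,dx \\
&= -\int_\Omega\!\int_0^T g\,\frac{\partial f}{\partial t}\,dt\,dx
   \;-\; \int_0^T\!\int_\Omega g\,\operatorname{div}(v f)\,dx\,dt \\
&= \int_\Omega\!\int_0^T g \Big({-}\frac{\partial f}{\partial t} - \operatorname{div}(v f)\Big)\,dt\,dx
 \;=\; \int_\Omega\!\int_0^T g\,(\partial_v^* f)\,dt\,dx .
\end{aligned}
```

The first term integrates by parts in time and the second uses the divergence theorem in space, which yields exactly ∂_v* f = −∂f/∂t − div(v f).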

    [0129] This equation is completed with Dirichlet boundary conditions:


    u(x,t) = u_0(x,t),  x∈Ω_t\O_t  (17)

    [0130] According to the present proposal, neither the whole video u_0 nor the whole correspondence map v is used. Informally, the method proceeds as follows: the inputs of the problem are the video f and the connectivity information between consecutive frames. This connectivity information (or motion field) is usually approximated by the optical flow v.

    [0131] The first step splits the input video sequence to be edited. The input video is split into two sets: the odd frames and the even frames (FIGS. 3 and 4, algorithms 3 and 4). The motion field also has to be split (FIGS. 5 and 6, algorithms 5 and 6). This step is done recursively until each video sequence has only one frame. In our example, at the end of the recursive splitting step we have 3 video sequences that are not really of one frame but of one frame plus the lids (the already edited frames). In this regard, the sequences are said to be of one frame because only one frame contains unknown values. Once all the videos of one frame plus the lids and their corresponding motion fields are created, the problem is solved independently for each small video. A merging step then re-composes the full output from the individual solutions of the many small problems, following FIG. 7 and algorithm 7. This merge may also include some correction steps responsible for fixing the possible errors generated by algorithms 5 and 6.

    [0132] Now, let us describe algorithm 2 step by step in the context of the previous video editing problem, for a video sequence f={f0, f1, f2, f3, f4} with the manually edited frames {f0, f4} and motion field v={v0, v1, v2, v3}.

    1. u ← SplitAndMinimize(J,f,v) (because f has more than one frame to edit)
       1.1. Split the original video sequence f={f0, f1, f2, f3, f4} following algorithms 3 and 4:
            f_o ← VideoOddSplit(f); f_o = {f0, f1, f3, f4}
            f_e ← VideoEvenSplit(f); f_e = {f0, f2, f4}
       1.2. Compute the new motion fields from v following algorithms 5 and 6:
            v_o ← MotionOddSplit(v); v_o = {v(x,0), v(x,1)+v(x+v(x,1),2), v(x,3)}
            v_e ← MotionEvenSplit(v); v_e = {v(x,0)+v(x+v(x,0),1), v(x,2)+v(x+v(x,2),3)}
       1.3. Solve the problem for each of f_o, f_e with their corresponding motion fields v_o, v_e:
            u_e ← Minimize(J,f_e,v_e) (because f_e has only one frame to edit); u_e = {f0, u_e1, f4}
            u_o ← SplitAndMinimize(J,f_o,v_o) (because f_o has more than one frame to edit)
            1.3.1. Split the video sequence f_o following algorithms 3 and 4:
                   f_oo ← VideoOddSplit(f_o); f_oo = {f0, f1, f4}
                   f_oe ← VideoEvenSplit(f_o); f_oe = {f0, f3, f4}
            1.3.2. Compute the new motion fields from v_o following algorithms 5 and 6:
                   v_oo ← MotionOddSplit(v_o); v_oo = {v_o(x,0), v_o(x,1)+v_o(x+v_o(x,1),2)}
                   v_oe ← MotionEvenSplit(v_o); v_oe = {v_o(x,0)+v_o(x+v_o(x,0),1), v_o(x,2)}
            1.3.3. Because the number of frames to edit in each of f_oo and f_oe is one, solve:
                   u_oo ← Minimize(J,f_oo,v_oo); u_oo = {f0, u_oo1, f4}
                   u_oe ← Minimize(J,f_oe,v_oe); u_oe = {f0, u_oe1, f4}
            1.3.4. Merge the solutions u_oo, u_oe:
                   u_o ← Merge(u_oo, u_oe); u_o = {f0, u_oo1, u_oe1, f4}
       1.4. Merge the solutions u_o, u_e:
            u ← Merge(u_o, u_e); u = {f0, u_oo1, u_e1, u_oe1, f4}
       1.5. Return the edited video u.
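    The whole worked example can be run end to end with a toy single-frame solver. In this sketch (all names ours), the lids are carried into both halves of each split as in the example, and the real energy minimization of Algorithm 1 is replaced by a stand-in that simply averages the two enclosing known frames (i.e. a quadratic temporal coupling with zero motion); the recursion structure, however, follows steps 1.1 to 1.5 above.

```python
def edit_recursive(frames, known):
    """Toy run of the 5-frame example. 'frames' is a list of frame values,
    known[i] marks the lids (already edited frames)."""
    unknown = [i for i, k in enumerate(known) if not k]
    if len(unknown) == 1:
        # stand-in for Minimize: average the neighbouring frames
        i = unknown[0]
        out = list(frames)
        out[i] = 0.5 * (frames[i - 1] + frames[i + 1])
        return out
    # keep the lids in both halves; split the interior frames odd/even
    interior = list(range(1, len(frames) - 1))
    odd_idx = [0] + interior[0::2] + [len(frames) - 1]
    even_idx = [0] + interior[1::2] + [len(frames) - 1]
    u_odd = edit_recursive([frames[i] for i in odd_idx],
                           [known[i] for i in odd_idx])
    u_even = edit_recursive([frames[i] for i in even_idx],
                            [known[i] for i in even_idx])
    # merge the partial results back into frame order
    out = list(frames)
    for j, i in enumerate(odd_idx):
        out[i] = u_odd[j]
    for j, i in enumerate(even_idx):
        out[i] = u_even[j]
    return out
```

With frames {f0, ?, ?, ?, f4} this produces edits for frames 2 to 4 exactly through the sub-problems f_o, f_e, f_oo, f_oe of the walkthrough.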

    [0169] As is apparent, the above algorithms can be coded as instructions in a suitable computer language for automatically performing the described operations when executed on a computer.

    [0170] FIG. 9 shows a block diagram that represents the main functional elements that manage data in an embodiment. The boxes 10-50 may refer to logical units defined in a computer or computers in a network. A video sequence (f) to be edited is received by a first processing means 10, which computes a motion field (v) that models temporal consistency between successive frames (f_i, f_i+1) and further defines an energy functional (J) that models the properties to be enforced in the video sequence (f). A second processing means 20 recursively splits the video sequence (f) into two sets with even frames (f^even) and odd frames (f^odd). This is done until there is only one frame to edit in each set. A third processing means 30 is in charge of computing the motion field (v) between consecutive frames in each set after each division. A fourth processing means 40 minimizes the energy functional (J) on each set to obtain edited frames. Lastly, a fifth processing means 50 merges the edited frames and provides the entire edited video sequence (u).

    [0171] Although the invention has been explained in relation to its preferred embodiment(s) as mentioned above, it is to be understood that many other modifications and variations can be made without departing from the scope of the present invention as defined by the appended claims.