Generation of alpha masks of video frames

11295424 · 2022-04-05

Abstract

Disclosed is an electronic device and a method to generate alpha masks of video frames in a video comprising a plurality of video frames including a first video frame and a second video frame following the first video frame, the method comprising: obtaining a first alpha mask of the first video frame; providing a first downscaled video frame, wherein the first downscaled video frame is a lower resolution version of the first video frame; providing a first downscaled alpha mask of the first alpha mask; estimating a first primary coefficient and a first secondary coefficient based on the first downscaled video frame and the first downscaled alpha mask; and generating a second alpha mask for the second video frame based on the first primary coefficient and the first secondary coefficient.

Claims

1. A method to generate alpha masks of video frames in a video comprising a plurality of video frames including a first video frame and a second video frame following the first video frame, the method comprising: obtaining a first alpha mask of the first video frame; estimating a first primary coefficient and a first secondary coefficient based on the first video frame and the first alpha mask, wherein the first alpha mask is, within a window of the first video frame, a linear function of the first video frame and wherein the first primary coefficient and the first secondary coefficient are coefficients of the linear function; and generating a second alpha mask for the second video frame based on the first primary coefficient and the first secondary coefficient.

2. The method of claim 1, wherein the first primary coefficient and the first secondary coefficient are constant for all pixels within a window of the first video frame.

3. The method of claim 1, wherein estimating the first primary coefficient and the first secondary coefficient comprises convolution operations.

4. The method of claim 1, wherein the obtaining the first alpha mask of the first video frame includes downscaling a first unscaled alpha mask of a first unscaled video frame.

5. The method of claim 4, wherein the ratio between the resolution of the first video frame and the resolution of the first unscaled video frame is between 1/2 and 1/100.

6. The method of claim 1, wherein the first alpha mask is defined by:
α_i^(t) = a_k^(t)·I_i^(t) + b_k^(t), wherein α is the first alpha mask, I is the first video frame, a is the first primary coefficient, and b is the first secondary coefficient, within a window w centered at pixel k with radius r.

7. The method of claim 1, wherein the plurality of video frames includes a third video frame, the method comprising: estimating a second primary coefficient and a second secondary coefficient based on the second video frame and the second alpha mask; and generating a third alpha mask for the third video frame based on the second primary coefficient and the second secondary coefficient.

8. The method of claim 7, wherein the plurality of video frames includes a fourth video frame, the method comprising: estimating a third primary coefficient and a third secondary coefficient based on a second keyframe alpha mask and the third video frame; and generating a fourth alpha mask for the fourth video frame based on the third primary coefficient and third secondary coefficient.

9. The method of claim 8, wherein generating the fourth alpha mask comprises updating the third primary coefficient and the third secondary coefficient to the fourth video frame.

10. The method of claim 8, wherein the method comprises estimating a second keyframe primary coefficient and a second keyframe secondary coefficient based on the second keyframe alpha mask, where the second keyframe alpha mask is calculated concurrently with the generation of the second alpha mask and/or the third alpha mask.

11. The method of claim 8, wherein the method comprises estimating a temporary fourth primary coefficient and a temporary fourth secondary coefficient based on the fourth alpha mask and the fourth video frame; and wherein generating the fourth alpha mask comprises using Kalman filtering to combine the temporary fourth primary coefficient and the temporary fourth secondary coefficient with the third primary coefficient and the third secondary coefficient.

12. An electronic device for generating alpha masks of video frames in a video, the electronic device comprising: a camera configured to provide the video comprising a plurality of video frames including a first video frame and a second video frame following the first video frame; a display configured to display the video frames of the video; and a processing unit configured to: obtain a first alpha mask of the first video frame; estimate a first primary coefficient and a first secondary coefficient based on the first video frame and the first alpha mask, wherein the first alpha mask is, within a window of the first video frame, a linear function of the first video frame and wherein the first primary coefficient and the first secondary coefficient are coefficients of the linear function; and generate a second alpha mask for the second video frame based on the first primary coefficient and the first secondary coefficient.

13. The electronic device of claim 12, wherein estimating the first primary coefficient and the first secondary coefficient comprises convolution operations.

14. The electronic device of claim 12, wherein the first alpha mask is defined by:
α_i^(t) = a_k^(t)·I_i^(t) + b_k^(t), wherein α is the first alpha mask, I is the first video frame, a is the first primary coefficient, and b is the first secondary coefficient, within a window w centered at pixel k with radius r.

15. The electronic device of claim 12, wherein the plurality of video frames includes a third video frame, the processing unit further configured to: estimate a second primary coefficient and a second secondary coefficient based on the second video frame and the second alpha mask; and generate a third alpha mask for the third video frame based on the second primary coefficient and the second secondary coefficient.

16. The electronic device of claim 15, wherein the plurality of video frames includes a fourth video frame, the processing unit further configured to: provide a third downscaled video frame, wherein the third downscaled video frame is a lower resolution version of the third video frame; estimate a third primary coefficient and a third secondary coefficient based on a second keyframe alpha mask and the third video frame; and generate a fourth alpha mask for the fourth video frame based on the third primary coefficient and third secondary coefficient.

17. The electronic device of claim 16, wherein the processing unit is configured to generate the fourth alpha mask by updating the third primary coefficient and the third secondary coefficient to the fourth video frame.

18. The electronic device of claim 16, wherein the processing unit is configured to estimate a second keyframe primary coefficient and a second keyframe secondary coefficient based on the second keyframe alpha mask, where the second keyframe alpha mask is calculated concurrently with the generation of the second alpha mask and/or the third alpha mask.

19. The electronic device of claim 16, wherein the processing unit is configured to estimate a temporary fourth primary coefficient and a temporary fourth secondary coefficient based on the fourth alpha mask and the fourth video frame; and wherein the processing unit is configured to generate the fourth alpha mask using Kalman filtering to combine the temporary fourth primary coefficient and the temporary fourth secondary coefficient with the third primary coefficient and the third secondary coefficient.

20. A non-transitory computer-readable medium having instructions encoded thereon which, when executed by a processing unit of an electronic device, cause the electronic device to generate alpha masks of video frames in a video comprising a plurality of video frames including a first video frame and a second video frame following the first video frame by: obtaining a first alpha mask of the first video frame; estimating a first primary coefficient and a first secondary coefficient based on the first video frame and the first alpha mask, wherein the first alpha mask is, within a window of the first video frame, a linear function of the first video frame and wherein the first primary coefficient and the first secondary coefficient are coefficients of the linear function; generating a second alpha mask for the second video frame based on the first primary coefficient and the first secondary coefficient.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above and other features and advantages will become readily apparent to those skilled in the art by the following detailed description of exemplary embodiments thereof with reference to the attached drawings, in which:

(2) FIG. 1 schematically illustrates an example of the method to generate alpha masks of video frames in a video.

(3) FIG. 2 schematically illustrates an example of the method to generate alpha masks of video frames in a video.

(4) FIG. 3 schematically illustrates an image or video frame comprising pixels and window(s).

(5) FIG. 4 schematically illustrates an asynchronous rendering of a slow keyframe model and model updating, while a fast temporal model continuously renders frames from the camera.

(6) FIG. 5 schematically illustrates a flow chart of a method to generate alpha masks of video frames.

(7) FIG. 6 schematically illustrates a flow chart of a method to generate alpha masks of video frames.

(8) FIG. 7 schematically illustrates an exemplary electronic device for generating alpha masks of video frames in a video.

DETAILED DESCRIPTION

(9) Various embodiments are described hereinafter with reference to the figures. Like reference numerals refer to like elements throughout. Like elements will, thus, not be described in detail with respect to the description of each figure. It should also be noted that the figures are only intended to facilitate the description of the embodiments. They are not intended as an exhaustive description of the claimed invention or as a limitation on the scope of the claimed invention. In addition, an illustrated embodiment need not have all the aspects or advantages shown. An aspect or an advantage described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced in any other embodiment even if not so illustrated, or if not so explicitly described.

(10) Throughout, the same reference numerals are used for identical or corresponding parts.

(11) FIG. 1 schematically illustrates an example of the method to generate alpha masks of video frames in a video comprising a plurality of video frames including a first video frame 2 and a second video frame 4 following the first video frame 2. The method comprises obtaining a first alpha mask 6 of the first video frame 2. The method comprises providing a first downscaled video frame, wherein the first downscaled video frame is a lower resolution version of the first video frame 2. The method comprises providing a first downscaled alpha mask of the first alpha mask 6. The method comprises estimating a first primary coefficient 8 and a first secondary coefficient 10 based on the first downscaled video frame and the first downscaled alpha mask. The method comprises generating a second alpha mask 12 for the second video frame 4 based on the first primary coefficient 8 and the first secondary coefficient 10.

(12) Thus FIG. 1 illustrates the method, which may be described as temporally propagating the coefficients of the alpha masks. The first video frame 2 is treated as a keyframe, and some method is used to obtain the first keyframe alpha mask 6. The first alpha mask 6 and the first video frame 2 are used to obtain the linear coefficients 8, 10. The linear coefficients 8, 10 for the first video frame 2 are propagated to the second video frame 4 and used together with the second video frame 4 to obtain the second alpha mask 12 for the second video frame 4. The second alpha mask 12 is then used to obtain the second primary coefficient 14 and the second secondary coefficient 16 for the second video frame 4, which are in turn propagated to the next frame, i.e. the third video frame, and the steps for generating the third alpha mask, fourth alpha mask, etc. are repeated for the following video frames.
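
The frame-to-frame propagation described above can be sketched as follows. This is a minimal illustration only: the per-window least-squares estimate is collapsed into a single global window, the regularization term eps is an assumption, and the function names are illustrative rather than part of the disclosed method.

```python
import numpy as np

def estimate_coefficients(frame, alpha, eps=1e-4):
    # Least-squares fit of alpha ~ a * I + b over the whole frame
    # (a simplification of the per-window estimate in the disclosure).
    mean_I, mean_a = frame.mean(), alpha.mean()
    cov = ((frame - mean_I) * (alpha - mean_a)).mean()
    var = ((frame - mean_I) ** 2).mean()
    a = cov / (var + eps)      # eps regularizes flat regions
    b = mean_a - a * mean_I
    return a, b

def propagate(frames, first_alpha):
    # Temporal propagation of FIG. 1: coefficients estimated from frame t
    # and its alpha mask predict the alpha mask of frame t+1.
    alphas = [first_alpha]
    for t in range(len(frames) - 1):
        a, b = estimate_coefficients(frames[t], alphas[-1])
        alphas.append(np.clip(a * frames[t + 1] + b, 0.0, 1.0))
    return alphas
```

If the first alpha mask really is a linear function of the first frame, the propagated mask closely tracks the next frame's intensities, as the fit recovers a close to 1 and b close to 0 in that degenerate case.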

(13) The first video frame 2 is for time t=0. The second video frame 4 is for time t=1. The third video frame is for time t=2 etc.

(14) As seen from FIG. 1, the images or video frames 2, 4 show driving cars on a street. The alpha masks 6, 12 show the alpha mask or shape of the dark car in the bottom left corner.

(15) FIG. 2 schematically illustrates an example of the method to generate alpha masks of video frames in a video comprising a plurality of video frames including a first video frame 2 and a second video frame 4 following the first video frame 2. The method comprises obtaining a first alpha mask 6 of the first video frame 2. The method comprises providing a first downscaled video frame, wherein the first downscaled video frame is a lower resolution version of the first video frame 2. The method comprises providing a first downscaled alpha mask of the first alpha mask 6. The method comprises estimating a first primary coefficient 8 and a first secondary coefficient 10 based on the first downscaled video frame and the first downscaled alpha mask. The method comprises generating a second alpha mask 12 for the second video frame 4 based on the first primary coefficient 8 and the first secondary coefficient 10.

(16) The plurality of video frames further includes a third video frame 18. The method comprises providing a second downscaled video frame, wherein the second downscaled video frame is a lower resolution version of the second video frame 4. The method comprises providing a second downscaled alpha mask of the second alpha mask 12. The method comprises estimating a second primary coefficient 14 and a second secondary coefficient 16 based on the second downscaled video frame and the second downscaled alpha mask. The method comprises generating a third alpha mask 20 for the third video frame 18 based on the second primary coefficient 14 and the second secondary coefficient 16.

(17) The third alpha mask 20 for the third video frame 18 is then used to obtain the third primary coefficient 22 and the third secondary coefficient 24 for the third video frame 18, which are then propagated to the next frame, i.e. the fourth video frame, and the steps for generating the fourth alpha mask, fifth alpha mask etc. are repeated for the following video frames.

(18) The first video frame 2 is for time t=0. The second video frame 4 is for time t=1. The third video frame is for time t=2 etc.

(19) Thus FIG. 2 is an illustration of the estimation sequence. Images (I.sup.(t)) 2, 4, 18 are video frames obtained from the video sequence, and the first alpha mask (α.sup.(0)) 6 is obtained by other means as a keyframe. The first coefficients ((a, b).sup.(t)) 8, 10 are obtained from the first alpha mask 6 and used together with the next video frame 4 to estimate the alpha mask 12 of that frame. The second coefficients ((a, b).sup.(t)) 14, 16 are obtained from the second alpha mask 12 and used together with the next video frame 18 to estimate the alpha mask 20 of that frame, etc. Dashed circles indicate observed variables 2, 4, 6, 18. Note that the first alpha mask 6 is illustrated as observed in this case, since it is a keyframe.

(20) FIG. 3 schematically illustrates an image or video frame 2, 4, 18. The video frame 2, 4, 18 comprises a plurality of pixels 26. For illustrative purposes the video frame 2, 4, 18 comprises 5×5 pixels 26; in practice a video frame may comprise many more pixels 26, such as 512×384, 1024×768, or 2048×1536. The primary coefficient and the secondary coefficient are assumed to be constant within a window 28, 28′ centered at pixel 26′, 26″.

(21) FIG. 3a) shows that the window 28 is centered at pixel 26′ with radius r. Thus the window 28 may have size (2r+1)×(2r+1). Alternatively the window may have size r×r.

(22) FIG. 3b) shows that window 28′ centered at pixel 26″ overlaps window 28 centered at pixel 26′. Thus there may be a window 28, 28′ centered at each pixel 26.

(23) FIG. 4 schematically illustrates an asynchronous rendering of a slow keyframe model and model updating, while a fast temporal model continuously renders frames from the camera. Black thick lines indicate that the process is busy.

(24) The present method, referred to as the temporal model, is able to predict the alpha mask for the next video frame, given the image for the next frame and the (predicted) alpha mask for the current frame. Sometimes it may be desirable to run a more complex model, such as a deep neural network, to get a better prediction (a keyframe). Usually such a keyframe model 30 is not fast enough to keep up with the camera frame rate 32, wherefore the fast temporal model 34 (the present method) is needed while the keyframe model 30 is running. However, assuming that the keyframe model 30 takes time Δk to process a frame 36, then by the time the keyframe model 30 has processed a frame 36 from time t, the result is outdated by Δk×fps frames. Say it takes 5 seconds to process a keyframe and the camera 32 has a frame rate of 60 frames per second (fps). This means that by the time the keyframe model 30 has finished, the camera 32 has delivered 300 frames and the result from the keyframe model 30 is “outdated” in some sense.

(25) FIG. 4 illustrates a solution to this problem. The process is as follows: The camera 32 delivers a frame 36 every 1/fps second and the temporal model 34 needs to process this in Δt<1/fps. This may be a strict requirement to keep up with the camera frame rate. The keyframe model 30 needs time Δk to process the frame 36, whereafter an update process needs to catch up with the temporal model 34, which takes time Δc. Notice that the temporal model 34 has a new state s2 available when the keyframe 30 and update processes have finished.

(26) The update process uses the very same temporal model 34, which puts further requirements on the processing time. The update process will have caught up with the camera 32 after x frames 36, counted from the time t at which the keyframe model 30 started processing:

(27) x·(1/fps) = Δk + Δc = Δk + x·Δt  (4), (5)
x·(1/fps − Δt) = Δk  (6)
x = Δk/(1/fps − Δt).  (7)

(28) Note that the longer the keyframe 30 takes to compute, the faster the temporal model 34 needs to be relative to the camera frame rate. As an example, set the keyframe processing time to Δk=2 seconds and the camera frame rate to 60 fps. With a temporal model 34 capable of 200 frames per second, i.e. Δt=0.005 s, the result from the keyframe 30 at time t will be ready after x=172 frames, which is a little under three seconds. It may be desirable to keep x small to have frequent keyframe updates.
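
Equation (7) and the worked example above can be checked numerically; the function name below is illustrative only.

```python
def catchup_frames(delta_k, fps, delta_t):
    """Frames x needed for the update process to catch up with the
    camera, per equation (7): x = delta_k / (1/fps - delta_t)."""
    frame_time = 1.0 / fps
    if delta_t >= frame_time:
        # Equation (7) has no solution: the temporal model cannot even
        # keep up with the camera, let alone catch up.
        raise ValueError("temporal model is slower than the camera frame rate")
    return delta_k / (frame_time - delta_t)

# Example from the text: 2 s keyframe, 60 fps camera, 200 fps temporal model.
x = catchup_frames(delta_k=2.0, fps=60, delta_t=1.0 / 200)
```

Here x evaluates to roughly 171.4, which rounds up to the 172 frames stated in the text, i.e. 172/60 ≈ 2.9 seconds.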

(29) This can be achieved by parallelizing either the update process and/or the keyframe process. Parallelizing solely the update process may only make sense when Δk<Δc.

(30) FIG. 5 schematically illustrates a flow chart of a method 100 to generate alpha masks of video frames in a video comprising a plurality of video frames including a first video frame and a second video frame following the first video frame. The method 100 comprises:

(31) In step 102 a first alpha mask of the first video frame is obtained.

(32) In step 104 a first downscaled video frame is provided, wherein the first downscaled video frame is a lower resolution version of the first video frame.

(33) In step 106 a first downscaled alpha mask of the first alpha mask is provided.

(34) In step 108 a first primary coefficient and a first secondary coefficient is estimated based on the first downscaled video frame and the first downscaled alpha mask.

(35) In step 110 a second alpha mask for the second video frame is generated based on the first primary coefficient and the first secondary coefficient.

(36) FIG. 6 schematically illustrates a flow chart of an exemplary method 200 to generate alpha masks of video frames, such as video frames in a video comprising a plurality of video frames including a first video frame, a second video frame and a third video frame. The method 200 comprises:

(37) In step 202 a first alpha mask (α.sup.(1)) of the first video frame (I.sup.(1)) is obtained.

(38) In step 204 a first downscaled video frame (Ĩ.sup.(1)) of the first video frame is provided. The downscaled video frame may be provided by, for example, bicubic interpolation of the first video frame.

(39) In step 206 a first downscaled alpha mask ({tilde over (α)}.sup.(1)) of the first alpha mask is provided. The downscaled alpha mask may be provided by, for example, bicubic interpolation of the first alpha mask.

(40) In step 208 a first primary coefficient (a.sup.(1)) and a first secondary coefficient (b.sup.(1)) is estimated based on the first downscaled video frame and the first downscaled alpha mask.

(41) Estimating 208 the first primary coefficient and the first secondary coefficient may comprise convolution operations of the downscaled video frame and/or of the downscaled alpha mask.

(42) Estimating 208 the first primary coefficient and the first secondary coefficient may comprise estimating 210 a downscaled first primary coefficient (ã.sup.(1)) and a downscaled first secondary coefficient ({tilde over (b)}.sup.(1)) based on the first downscaled video frame and the first downscaled alpha mask, and resizing 212 the downscaled first primary coefficient and the downscaled first secondary coefficient to obtain the first primary coefficient and the first secondary coefficient.

(43) Resizing 212 the downscaled first primary coefficient and the downscaled first secondary coefficient to obtain the first primary coefficient and the first secondary coefficient may include for example bilinear or bicubic resizing.
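
Steps 204-212 can be sketched as follows. This is a minimal illustration under stated assumptions: a factor-2 block-average downscale stands in for bicubic interpolation, nearest-neighbour upscaling stands in for the bilinear/bicubic resizing of step 212, the least-squares fit uses a single global window rather than per-pixel windows, and all function names are hypothetical.

```python
import numpy as np

def downscale2(img):
    # 2x downscale by averaging non-overlapping 2x2 blocks (a simple
    # stand-in for bicubic interpolation, steps 204/206).
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale2(img):
    # Nearest-neighbour 2x upscale (a stand-in for the bilinear or
    # bicubic resizing of step 212).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def estimate_fullres_coefficients(frame, alpha, eps=1e-4):
    # Steps 208-212: fit alpha ~ a * I + b on the downscaled images,
    # then resize the coefficients back to full resolution.
    I_s, A_s = downscale2(frame), downscale2(alpha)
    cov = ((I_s - I_s.mean()) * (A_s - A_s.mean())).mean()
    var = ((I_s - I_s.mean()) ** 2).mean()
    a = cov / (var + eps)
    b = A_s.mean() - a * I_s.mean()
    a_map = upscale2(np.full(I_s.shape, a))
    b_map = upscale2(np.full(I_s.shape, b))
    return a_map, b_map
```

Estimating on the downscaled pair and resizing only the (smooth) coefficient maps is what makes the temporal model cheap enough to run at camera frame rate.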

(44) In step 214 a second alpha mask (α.sup.(2)) for the second video frame (I.sup.(2)) is generated based on the first primary coefficient (a.sup.(1)) and the first secondary coefficient (b.sup.(1)), such as wherein α.sup.(2)=a.sup.(1)I.sup.(2)+b.sup.(1).
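
Step 214, applying the propagated coefficients to the next frame, is then a per-pixel linear map. A sketch assuming grayscale frames and per-pixel coefficient maps (i.e. the window-constant coefficients already expanded to one pair per pixel); clipping to [0, 1] is an added assumption for valid alpha values.

```python
import numpy as np

def apply_coefficients(a, b, next_frame):
    # alpha^(2) = a^(1) * I^(2) + b^(1), clipped to the valid alpha range.
    return np.clip(a * next_frame + b, 0.0, 1.0)
```
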

(45) A subsequent alpha mask, such as a third alpha mask for a third video frame may be generated by:

(46) In step 216 a second downscaled video frame (Ĩ.sup.(2)) of the second video frame (I.sup.(2)) is provided.

(47) In step 218 a second downscaled alpha mask ({tilde over (α)}.sup.(2)) of the second alpha mask (α.sup.(2)) is provided.

(48) In step 220 a second primary coefficient (a.sup.(2)) and a second secondary coefficient (b.sup.(2)) is estimated based on the second downscaled video frame and the second downscaled alpha mask.

(49) Estimating 220 the second primary coefficient (a.sup.(2)) and the second secondary coefficient (b.sup.(2)) may include the same operations as estimation 208 of the first primary coefficient and the first secondary coefficient.

(50) In step 222 the third alpha mask (α.sup.(3)) for the third video frame (I.sup.(3)) is generated based on the second primary coefficient (a.sup.(2)) and the second secondary coefficient (b.sup.(2)), such as wherein α.sup.(3)=a.sup.(2)I.sup.(3)+b.sup.(2).

(51) FIG. 7 schematically illustrates an electronic device 38, such as a smartphone or other computer device, for generating alpha masks of video frames in a video. The electronic device 38 comprises a camera 32 configured to provide the video comprising a plurality of video frames including a first video frame and a second video frame following the first video frame. The electronic device 38 comprises a display 40 configured to display the video frames of the video. The electronic device 38 comprises a processing unit 42 configured to obtain a first alpha mask of the first video frame; provide a first downscaled video frame, wherein the first downscaled video frame is a lower resolution version of the first video frame; provide a first downscaled alpha mask of the first alpha mask; estimate a first primary coefficient and a first secondary coefficient based on the first downscaled video frame and the first downscaled alpha mask; and generate a second alpha mask for the second video frame based on the first primary coefficient and the first secondary coefficient.

(52) Although particular features have been shown and described, it will be understood that they are not intended to limit the claimed invention, and it will be obvious to those skilled in the art that various changes and modifications may be made without departing from the scope of the claimed invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The claimed invention is intended to cover all alternatives, modifications and equivalents.

LIST OF REFERENCES

(53) 2 first video frame
4 second video frame
6 first alpha mask
8 first primary coefficient
10 first secondary coefficient
12 second alpha mask
14 second primary coefficient
16 second secondary coefficient
18 third video frame
20 third alpha mask
22 third primary coefficient
24 third secondary coefficient
26, 26′, 26″ pixels
28, 28′ window
30 keyframe model
32 camera
34 temporal model
36 frame
38 electronic device
40 display
42 processing unit
100 method
102 method step of obtaining a first alpha mask of the first video frame
104 method step of providing a first downscaled video frame, wherein the first downscaled video frame is a lower resolution version of the first video frame
106 method step of providing a first downscaled alpha mask of the first alpha mask
108 method step of estimating a first primary coefficient and a first secondary coefficient based on the first downscaled video frame and the first downscaled alpha mask
110 method step of generating a second alpha mask for the second video frame based on the first primary coefficient and the first secondary coefficient
200 method
202 method step of obtaining a first alpha mask (α.sup.(1)) of the first video frame (I.sup.(1))
204 method step of providing a first downscaled video frame (Ĩ.sup.(1)) of the first video frame
206 method step of providing a first downscaled alpha mask ({tilde over (α)}.sup.(1)) of the first alpha mask
208 method step of estimating a first primary coefficient (a.sup.(1)) and a first secondary coefficient (b.sup.(1)) based on the first downscaled video frame and the first downscaled alpha mask
210 method step of estimating a downscaled first primary coefficient (ã.sup.(1)) and a downscaled first secondary coefficient ({tilde over (b)}.sup.(1)) based on the first downscaled video frame and the first downscaled alpha mask
212 method step of resizing the downscaled first primary coefficient and the downscaled first secondary coefficient to obtain the first primary coefficient and the first secondary coefficient
214 method step of generating a second alpha mask (α.sup.(2)) for the second video frame (I.sup.(2)) based on the first primary coefficient (a.sup.(1)) and the first secondary coefficient (b.sup.(1)), such as wherein α.sup.(2)=a.sup.(1)I.sup.(2)+b.sup.(1)
216 method step of providing a second downscaled video frame (Ĩ.sup.(2)) of the second video frame (I.sup.(2))
218 method step of providing a second downscaled alpha mask ({tilde over (α)}.sup.(2)) of the second alpha mask (α.sup.(2))
220 method step of estimating a second primary coefficient (a.sup.(2)) and a second secondary coefficient (b.sup.(2)) based on the second downscaled video frame and the second downscaled alpha mask
222 method step of generating the third alpha mask (α.sup.(3)) for the third video frame (I.sup.(3)) based on the second primary coefficient (a.sup.(2)) and the second secondary coefficient (b.sup.(2)), such as wherein α.sup.(3)=a.sup.(2)I.sup.(3)+b.sup.(2)